The red car is controlled by a neural network that you can train. A neural network is a computer program loosely modeled on the way our brains process information and learn from it. Like humans, these networks get to know the world by trial and error.

Internally, the whole game runs on a grid system. You can see it if you change the Road Overlay to Full Map.

The code below sets up the network and its training options:

// a few things don't have var in front of them - they update already existing variables the game needs
lanesSide = 0;
patchesAhead = 1;
patchesBehind = 0;
trainIterations = 10000;

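// number of grid cells the car sees: its own lane plus lanesSide lanes on each side,
// times the number of patches ahead of and behind the car
var num_inputs = (lanesSide * 2 + 1) * (patchesAhead + patchesBehind);
var num_actions = 5;
// how many previous time steps (states and actions) are fed in alongside the current state
var temporal_window = 3;
var network_size = num_inputs * temporal_window + num_actions * temporal_window + num_inputs;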

var layer_defs = [];
layer_defs.push({
    type: 'input',
    out_sx: 1,
    out_sy: 1,
    out_depth: network_size
});
layer_defs.push({
    type: 'fc',
    num_neurons: 1,
    activation: 'relu'
});
layer_defs.push({
    type: 'regression',
    num_neurons: num_actions
});

var tdtrainer_options = {
    learning_rate: 0.001,
    momentum: 0.0,
    batch_size: 64,
    l2_decay: 0.01
};

var opt = {};
opt.temporal_window = temporal_window;
opt.experience_size = 3000;
opt.start_learn_threshold = 500;
opt.gamma = 0.7;
opt.learning_steps_total = 10000;
opt.learning_steps_burnin = 1000;
opt.epsilon_min = 0.0;
opt.epsilon_test_time = 0.0;
opt.layer_defs = layer_defs;
opt.tdtrainer_options = tdtrainer_options;

brain = new deepqlearn.Brain(num_inputs, num_actions, opt);

learn = function (state, lastReward) {
    brain.backward(lastReward);
    var action = brain.forward(state);

    draw_net();
    draw_stats();

    return action;
};
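
To make the size arithmetic in the listing concrete, here is a small sketch that recomputes num_inputs and network_size for a given set of grid parameters. The helper name inputSizes is hypothetical (it is not part of the game or of deepqlearn), and the second set of parameters is just an example; the formulas themselves are copied from the listing.

```javascript
// Hypothetical helper: mirrors the num_inputs / network_size formulas from the listing.
function inputSizes(lanesSide, patchesAhead, patchesBehind, numActions, temporalWindow) {
    // grid cells visible to the car: its own lane plus lanesSide lanes on each side,
    // times the patches ahead of and behind the car
    var numInputs = (lanesSide * 2 + 1) * (patchesAhead + patchesBehind);
    // current state, plus temporalWindow earlier states and temporalWindow earlier actions
    var networkSize = numInputs * temporalWindow + numActions * temporalWindow + numInputs;
    return { numInputs: numInputs, networkSize: networkSize };
}

// The values used in the listing: one lane, one patch ahead.
var starter = inputSizes(0, 1, 0, 5, 3);
console.log(starter.numInputs, starter.networkSize); // 1 19

// An example with a wider view: one lane to each side, 10 patches ahead, 2 behind.
var wider = inputSizes(1, 10, 2, 5, 3);
console.log(wider.numInputs, wider.networkSize); // 36 159
```

With the values from the listing the car sees a single grid cell, so most of the 19 network inputs are remembered actions rather than information about traffic. Widening the view grows the input layer quickly.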

 

Reinforcement Learning

  1. This car does not know anything about its surroundings. The only thing that is pre-programmed is safety: it will never crash into other cars. The rest of its behavior is determined by a neural network.
  2. This neural network takes as input data about the car’s surroundings, which is the car’s state.
  3. It processes the data through a hidden layer and learns from it.
  4. It then outputs one of five actions (num_actions in the code above), for example to move or to stay.
  5. The network is rewarded when the car chooses actions that result in it moving fast.
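
Steps 2 to 4 can be sketched as a hand-written forward pass through a network shaped like the one in the listing: some inputs, one hidden ReLU neuron, and five outputs, one per action. This is only an illustration, not how ConvNetJS implements it. The function names, the shortened three-value input, and all weights are made up for the example; in the game the weights are learned during training.

```javascript
// Illustrative forward pass: inputs -> 1 hidden ReLU neuron -> 5 linear outputs.
// All numbers below are made-up assumptions, not values from the game.
function relu(x) {
    return Math.max(0, x);
}

function forward(inputs, hidden, output) {
    // hidden layer: weighted sum of the inputs plus a bias, passed through ReLU
    var sum = hidden.bias;
    for (var i = 0; i < inputs.length; i++) {
        sum += hidden.weights[i] * inputs[i];
    }
    var h = relu(sum);
    // output layer: one estimated value per action
    return output.weights.map(function (w, a) {
        return w * h + output.biases[a];
    });
}

function bestAction(values) {
    // index of the largest estimated value
    var best = 0;
    for (var a = 1; a < values.length; a++) {
        if (values[a] > values[best]) {
            best = a;
        }
    }
    return best;
}

var inputs = [0.5, 1.0, 0.2]; // a shortened, made-up state
var hidden = { weights: [0.4, -0.1, 0.5], bias: 0.1 };
var output = { weights: [0.2, 1.0, -0.5, 0.3, 0.1], biases: [0.0, 0.1, 0.0, 0.2, 0.0] };

var values = forward(inputs, hidden, output);
var action = bestAction(values);
console.log(action); // 1
```

With a single hidden neuron, every action value depends on the same one number, which limits what the network can express. That is why num_neurons in the listing is a natural setting to experiment with.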

New states and actions follow one another as time progresses, and at each time step the network learns more about which combinations of state and action result in the fastest navigation through traffic. This type of Deep Learning is called Reinforcement Learning: the network is rewarded when the car chooses actions that make it move fast, and this reward reinforces the best behavior.
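
How a reward reinforces behavior can be illustrated with tabular Q-learning, a simpler relative of the deep Q-learning that the brain object performs. This is a swapped-in simplification: the game estimates values with a neural network, not a table. The two toy states, two toy actions, and the step size alpha are assumptions made for the example; gamma matches opt.gamma in the listing.

```javascript
// Tabular Q-learning sketch (a simplification; the game uses a neural network, not a table).
var gamma = 0.7; // discount factor, same value as opt.gamma in the listing
var alpha = 0.5; // step size, a made-up value for this example

// Q[state][action] is the estimated long-term reward: 2 toy states, 2 toy actions.
var Q = [[0, 0], [0, 0]];

function update(state, action, reward, nextState) {
    // best value currently believed to be reachable from the next state
    var bestNext = Math.max.apply(null, Q[nextState]);
    // the estimate is moved toward: immediate reward + discounted future value
    var target = reward + gamma * bestNext;
    Q[state][action] += alpha * (target - Q[state][action]);
}

update(1, 0, 1.0, 1); // in state 1, action 0 earned a reward of 1
update(0, 1, 0.0, 1); // in state 0, action 1 earned nothing but led to state 1

console.log(Q[1][0], Q[0][1]); // 0.5 0.175
```

The second update earns no immediate reward, yet its value still rises because it leads to a state where reward is available. This is how the car can come to prefer moves that only pay off a few time steps later.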