NeuralLearner


Neural networks are a natural choice as the control paradigm for embodied agents. An agent is trained on a set of data pairing sensory inputs with desired actions as outputs, and a learning algorithm such as back-propagation is used to teach the agent the optimal behavior. Other models, such as PolyWorld, used very general neural network structures and Hebbian learning.

The defining difficulty in our implementation was the acquisition of training data, a problem noted by other artificial life researchers as well. There is no input-output mapping inherent to artificial life simulations: one must first find a mapping that the neural network should estimate, and then acquire data based on that mapping. This requires the pre-existence of other agents, and the performance of the NeuralLearner is bounded by the performance of the model agent it learns from. Given this dependency, the project has two parts: search the entire input-output mapping space for a possible solution, then teach that solution to a neural network agent.

NeuralLearner implements a multilayer perceptron that makes all of the agent's decisions. The inputs to this network were the agent's current energy level; the presence and direction of other agents, food, and obstacles; and whether or not the agent could currently eat, mate, attack, or flee. The outputs were an action selection (move, eat, attack, flee, mate), a direction (north, south, east, or west), and a speed value.
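A minimal sketch of such a network follows. The single hidden layer of 10 units, the tanh and sigmoid activations, and the weight initialization are assumptions for illustration; the original text specifies only the 7 inputs and 13 outputs.

    import numpy as np

    class NeuralLearnerMLP:
        # Sketch only: hidden-layer size and activations are assumptions,
        # not part of the original description.
        def __init__(self, n_in=7, n_hidden=10, n_out=13, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
            self.b1 = np.zeros(n_hidden)
            self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
            self.b2 = np.zeros(n_out)

        def forward(self, x):
            # x: length-7 vector (energy level, sensed agents/food/obstacles,
            # and the can-eat/mate/attack/flee flags).
            h = np.tanh(x @ self.W1 + self.b1)
            # Sigmoid outputs keep each of the 13 action/direction/speed units
            # in (0, 1) so they can be thresholded to a 0/1 choice.
            return 1.0 / (1.0 + np.exp(-(h @ self.W2 + self.b2)))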
To acquire data for training the NeuralLearner, a random agent was first created. The intention was for this agent to randomly explore the artificial life world and record data that could later be used to train the network. Since the goal of a normal artificial life agent is to stay alive, the actions of the random agent were filtered, and the training set contained only the input-output pairs that either led to a direct increase in energy or kept the agent alive over a long period of time. Unfortunately, the random agent usually (about 80% of the time) made a decision that did not lead to useful data, making pure random exploration a very inefficient way to acquire it. To improve data acquisition, the random agent was pushed toward situations where it would have experiences, both good and bad.
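The experience filter described above might look like the following sketch. The record fields, the survival window of 50 steps, and the function name are hypothetical, since the original does not describe how experiences were recorded.

    SURVIVAL_WINDOW = 50  # assumed length of "a long period of time", in steps

    def filter_experiences(records):
        # records: list of dicts with hypothetical fields 'inputs', 'outputs',
        # 'energy_before', 'energy_after', and 'steps_survived_after'.
        training_set = []
        for r in records:
            gained_energy = r["energy_after"] > r["energy_before"]
            stayed_alive = r["steps_survived_after"] >= SURVIVAL_WINDOW
            # Keep only decisions that raised energy or preceded a long
            # stretch of survival, as described above.
            if gained_energy or stayed_alive:
                training_set.append((r["inputs"], r["outputs"]))
        return training_set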

The resulting data sets were used to perform offline training on the agent's neural network. The network was then used statically with the agent; that is, no further learning took place once it was deployed. As described above, the network had 7 inputs and 13 outputs. All data was normalized to the range [-0.5, 0.5]. Because this is a classification problem, the outputs were coded as 0's and 1's indicating the chosen action, direction, and speed.
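A sketch of the offline training step under those constraints is given below. Only the normalization to [-0.5, 0.5], the 7-input/13-output layer sizes, and the 0/1 output coding come from the text; the squared-error loss, plain gradient descent, learning rate, epoch count, and omission of bias terms are simplifying assumptions.

    import numpy as np

    def normalize(X):
        # Rescale each input column to [-0.5, 0.5], as described above.
        lo, hi = X.min(axis=0), X.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)
        return (X - lo) / span - 0.5

    def train(X, T, n_hidden=10, lr=0.1, epochs=500, seed=0):
        # X: (n_samples, 7) normalized inputs; T: (n_samples, 13) 0/1 targets
        # coding the chosen action, direction, and speed.
        rng = np.random.default_rng(seed)
        W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
        W2 = rng.normal(scale=0.1, size=(n_hidden, T.shape[1]))
        for _ in range(epochs):
            H = np.tanh(X @ W1)                    # hidden activations
            Y = 1.0 / (1.0 + np.exp(-(H @ W2)))    # sigmoid outputs
            dY = (Y - T) * Y * (1.0 - Y)           # output-layer delta (squared error)
            dH = (dY @ W2.T) * (1.0 - H ** 2)      # hidden-layer delta
            W2 -= lr * (H.T @ dY) / len(X)         # back-propagation weight updates
            W1 -= lr * (X.T @ dH) / len(X)
        return W1, W2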

 
   
Author: T. Ryan Fitz-Gibbon