Neural networks are a natural choice as the control paradigm for embodied
agents. An agent is trained on a data set pairing sensory inputs with desired
output actions, and a learning algorithm such as back-propagation is used to
teach the agent the optimal behavior. Other models, such as PolyWorld, used
very general neural network structures and Hebbian learning.
The defining difficulty in our implementation was the acquisition of training
data, a problem noticed by other artificial life researchers as well.
The problem is that there is no input-output mapping
inherent to artificial life simulations. One must find a mapping that the
neural network should estimate, and then acquire data based on that mapping.
This requires the pre-existence of other agents, and the performance of the
NeuralLearner will be determined by the performance of the model agent.
Given this dependency, this project has two parts: first search the
input-output mapping space for a possible solution, then teach that solution to
a neural network agent.
NeuralLearner implements a multilayer perceptron that was used to make all of
the agent’s decisions. The inputs to this network consisted of the agent’s
current energy level and the presence and direction of other agents, food, and
obstacles. The inputs also indicated whether or not the agent could currently
eat, mate, attack, or flee. The outputs of the network were an action selection
(move, eat, attack, flee, mate), a direction (north, south, east, or west), and
a speed value.
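A minimal sketch of such a network is given below, assuming a single sigmoid hidden layer; the hidden-layer size and the exact grouping of the thirteen output units are assumptions, since only the input and output counts are fixed here.

```python
import numpy as np

# Sketch of the NeuralLearner perceptron: 7 inputs and 13 outputs as described
# above; the hidden-layer size and weight initialization are assumptions.
N_INPUTS = 7     # energy level, agent/food/obstacle sensors, action-availability flags
N_HIDDEN = 10    # assumed hidden-layer size
N_OUTPUTS = 13   # action, direction, and speed units

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(N_INPUTS, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_OUTPUTS))
b2 = np.zeros(N_OUTPUTS)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """One forward pass: sigmoid hidden layer, sigmoid output layer."""
    h = sigmoid(x @ W1 + b1)
    return sigmoid(h @ W2 + b2)

# The 13 outputs are thresholded to select an action, a direction, and a speed.
sensors = np.zeros(N_INPUTS)        # normalized sensory input vector
decision = forward(sensors) > 0.5   # binary decision vector
```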
To acquire data for training the NeuralLearner, a random agent was first
created. The intention was for this agent to randomly explore the artificial
life world and record data to be used to train the network. The goal of a
normal artificial life agent is to stay alive, so the actions of the random
agent were filtered, and the training set contained only the input-output
pairs that either led to a direct increase in energy or kept the agent alive
over a long period of time. Unfortunately, the random agent usually (about 80%
of the time) made a decision that did not yield useful data, making the random
agent a very inefficient source of training data. To improve data acquisition,
the random agent was therefore pushed towards situations where it would have
meaningful experiences, both good and bad.
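The filtering step can be sketched as follows, assuming each recorded step carries the resulting energy change and the number of steps the agent survived afterwards; the record layout and the survival threshold are hypothetical.

```python
# Hypothetical sketch of the training-data filter: keep only input-output
# pairs that produced an immediate energy gain or that the agent executed
# while going on to survive for a long stretch.
SURVIVAL_WINDOW = 100  # assumed number of steps counted as "a long period of time"

def filter_training_data(episode):
    """episode: iterable of (inputs, outputs, energy_delta, steps_survived_after)."""
    kept = []
    for inputs, outputs, energy_delta, steps_survived_after in episode:
        if energy_delta > 0 or steps_survived_after >= SURVIVAL_WINDOW:
            kept.append((inputs, outputs))
    return kept
```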
The resulting data sets were used to perform offline training on the agent's
neural network. The network was then used statically by the agent; that is, no
further learning took place. As described above, the network used to learn
this data had 7 inputs and 13 outputs. All data was normalized to the range
[-0.5, 0.5]. This is a classification problem, so the outputs were coded as
0’s and 1’s indicating the choice of action, direction, and speed.
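A sketch of this preprocessing is shown below, assuming each raw input feature is rescaled linearly from its own range into [-0.5, 0.5] and each training target is a 13-element 0/1 vector; the split into 5 action units, 4 direction units, and 4 speed units is an assumption consistent with the output count above.

```python
import numpy as np

ACTIONS = ["move", "eat", "attack", "flee", "mate"]
DIRECTIONS = ["north", "south", "east", "west"]
N_SPEED_UNITS = 4  # assumed number of discrete speed units (5 + 4 + 4 = 13 outputs)

def normalize(raw, lo, hi):
    """Linearly map a raw value in [lo, hi] to the range [-0.5, 0.5]."""
    return (raw - lo) / (hi - lo) - 0.5

def encode_target(action, direction, speed_unit):
    """Build the 0/1 target vector marking the chosen action, direction, and speed."""
    t = np.zeros(len(ACTIONS) + len(DIRECTIONS) + N_SPEED_UNITS)
    t[ACTIONS.index(action)] = 1.0
    t[len(ACTIONS) + DIRECTIONS.index(direction)] = 1.0
    t[len(ACTIONS) + len(DIRECTIONS) + speed_unit] = 1.0
    return t

# Example: an "eat" decision heading north at the slowest assumed speed.
target = encode_target("eat", "north", 0)
```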