COMPLEX SYSTEMS

Current Research - MULTI-ROBOT LEARNING

One of the most desirable missions for robots is to reduce human exposure to dangerous tasks, for example in planetary exploration or hazardous waste cleanup. These application domains are characterized by the impossibility of obtaining a model accurate enough to program the robots with a step-by-step problem-solving procedure. Learning (i.e., a way to automatically improve the robot's performance in its environment) is therefore mandatory. In the case of cooperative robots, group learning is also mandatory to achieve redundancy (a major reason for cooperation) through on-the-fly reconfiguration of the group's tasks.

Cooperative robot learning raises, at a minimum, all the issues attached to robot learning [3]: a huge search space, a limited number of examples, and the necessity of generalization. Answers (biases) have been sought in improved exploration of the search space, reduction of the search space size, and reduction of the required number of samples. Over the years, reinforcement learning has emerged as the main learning approach in autonomous robotics, and lazy learning has become the leading bias (reducing the cost of an experiment to the time needed to test the quality of the learned behavior).
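To make the lazy bias concrete, the following Python sketch (the names and data layout are our own illustration, not taken from the cited papers) stores raw experiences and defers all computation to query time, so no model has to be fit before a behavior can be tested:

```python
# Illustrative lazy (instance-based) learning sketch. The names used here
# (lazy_q, the experience tuples) are hypothetical, not from the cited work.
# No model is built up front: raw samples are stored, and estimation work
# is deferred until a value is actually needed.

def lazy_q(experiences, situation, action):
    """Estimate the value of (situation, action) from the nearest
    stored experience that used the same action.

    experiences: list of (situation_vector, action, value) tuples.
    situation:   tuple of floats (e.g., normalized sensor readings).
    """
    candidates = [e for e in experiences if e[1] == action]
    if not candidates:
        return 0.0  # default for actions never tried

    def squared_dist(s):
        return sum((x - y) ** 2 for x, y in zip(s, situation))

    nearest = min(candidates, key=lambda e: squared_dist(e[0]))
    return nearest[2]
```

Because the stored samples are consulted only when queried, running a new experiment reduces to recording situation-action outcomes and replaying them, which is precisely the time saving the lazy bias provides.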

Reinforcement learning allows us to synthesize a robot behavior using only a qualitative measure of the performance of the desired behavior [2]. Q-learning, because it is a model-free learning method, is certainly the most widely used [1]. Lazy learning, also called instance-based learning, builds a non-explicit model of the robot-environment relation. Because there is no explicit model, sub-symbolic learning techniques must be used.
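A minimal tabular sketch of the Q-learning rule follows; the constants and names are illustrative, and the cited work [1] uses neural and lazy variants rather than a plain table:

```python
import random

ALPHA = 0.1    # learning rate (illustrative value)
GAMMA = 0.9    # discount factor
EPSILON = 0.2  # exploration rate

def q_update(Q, s, a, reward, s_next, actions):
    """One model-free update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Only the scalar reward is needed; no model of the environment."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + ALPHA * (reward + GAMMA * best_next - old)

def choose_action(Q, s, actions):
    """Epsilon-greedy: mostly exploit current estimates,
    occasionally try a random action."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```

The update uses only the scalar reinforcement signal, which is what makes the approach attractive when no environment model is available.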

Cooperative robot learning adds at least two issues:
- The necessary awareness of the other members of the team, which entails an increase in the size of the search space. Learning performance is highly dependent on the size of the search space [4].
- The need (in a lazy Q-learning approach) to relay to the individual robots the single piece of information associated with the team behavior: a measure of the performance of the whole group. How to distribute a global qualitative measure to each team member is also the central question in cooperative agent learning. We have proposed the PESSIMISTIC algorithm [5], which is, to our knowledge, the first successful attempt to apply sub-symbolic learning to a team of robots.

References

[1] Claude Touzet, "Neural Reinforcement Learning for Behaviour Synthesis," Robotics and Autonomous Systems, special issue on Learning Robot: the New Wave (N. Sharkey, guest editor), vol. 22, no. 3-4, December 1997, pp. 251-281.
[2] Juan Miguel Santos and Claude Touzet, "Exploration Tuned Reinforcement Function," to appear in Neurocomputing, 1999.
[3] Claude Touzet, "Bias Incorporation in Robot Learning," submitted to Autonomous Robots, 1998.
[4] Claude Touzet, "Robot Awareness in Cooperative Mobile Robot Learning," under revision for Autonomous Robots, 2000.
[5] Claude Touzet, "Distributed Lazy Q-Learning for Cooperative Mobile Robots," submitted to Autonomous Robots, 1999.





CESAR - Center for Engineering Science Advanced Research
Oak Ridge National Laboratory