The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Solving partially observable problems by evolution and learning of finite state machines

E. Sanchez, A. Pérez-Uribe, and B. Mesot

The 4th International Conference on Evolvable Systems: From Biology To Hardware (ICES2001), Tokyo, October 3-5, 2001, pp. 267-278
Abstract

Finite state machines (FSMs) have been successfully used to implement the control of an agent to solve particular sequential tasks. Nevertheless, finite state machines must be hand-coded by the engineer, which might be very difficult for complex tasks. Researchers have used evolutionary techniques to evolve finite state machines and find automatic solutions to sequential tasks. Their approach consists in encoding the state-transition table defining a finite state machine in the genome. However, the search space of such an approach tends to be unnecessarily huge. In this article, we propose an alternative approach for the automatic design of finite state machines using artificial evolution and learning techniques: the SOS-algorithm. We have obtained very impressive results in experimental work solving partially observable problems.

A robotics framework for studying the coevolution of signaling

A. Pérez-Uribe and M. Courant

Symposium on Emergence and Development of Embodied Cognition (EDEC'2001), 3rd International Conference on Cognitive Science, August 27-31, 2001, Beijing, China (to appear)
Abstract

In this paper, we propose a robotics framework for studying the coevolution of signaling. Our motivation is twofold. First, we propose a situated and embodied framework for signaler-receiver interaction, and second, we provide a promising approach for the study of mechanisms that would enable adaptive systems to access new information channels and to exploit implicit information in their environments. We present experimental results on a successful coevolution of signals that enable a very simple communication between two robots. Finally, we delineate some aspects of forthcoming research.

Learning to predict variable-delay rewards and its role in autonomous developmental robotics

Andrés Pérez-Uribe and M. Courant

6th International Work-conference on Artificial and Natural Neural Networks, IWANN'2001.
Abstract

Researchers in the new field of "developmental robotics" propose to provide robots with so-called developmental programs. Similar to the development of human infants, robots might use those programs to interact with humans and their environment for extended periods of time, and become smarter autonomously. In this paper we show how a neural network model developed by neuroscientists can be used by an autonomous robot to learn by trial-and-error when considering rewards delivered at arbitrary times, as would be the case of developmental robots interacting with humans in the real world.

A non-computationally-intensive neurocontroller for autonomous mobile robot navigation

Andrés Pérez-Uribe

Biologically inspired robot behavior engineering, R. J. Duro, J. Santos, M. Graña (Eds.), Springer-Verlag, 2002.
Abstract

This chapter presents a neurocontroller architecture for autonomous mobile robot navigation. The main characteristic of this neurocontroller is that it is non-computationally-intensive. It provides a learning robot with the capability to autonomously categorize input data from the environment, to deal with the stability-plasticity dilemma, and to learn a state-to-action mapping that enables it to navigate in a workspace while avoiding obstacles. The neurocontroller architecture is composed of three main modules: an adaptive categorization module, implemented by an unsupervised learning neural architecture called FAST (Flexible Adaptable-Size Topology); a reinforcement learning module (SARSA); and a short-term memory or planning module, intended to accelerate the learning of behaviors. We describe the use of our neurocontroller in three navigation tasks, each involving a different kind of sensor: 1) obstacle avoidance using infra-red proximity sensors, 2) foraging using a color CCD camera, and 3) wall-following using a grey-level linear vision system.

Using a time-delay actor-critic neural architecture with dopamine-like reinforcement signal for learning in autonomous robots

Andrés Pérez-Uribe

Emerging Neural Architectures based on Neuroscience, S. Wermter, J. Austin, D. Willshaw (Eds.), Springer-Verlag, LNAI 2036, pp. 522-533.
Abstract

Neuroscientists have identified a neural substrate of prediction and reward in experiments with primates. The so-called dopamine neurons have been shown to code an error in the temporal prediction of rewards. Similarly, artificial systems can "learn to predict" by the so-called temporal-difference (TD) methods. Based on the general resemblance between the effective reinforcement term of TD models and the response of dopamine neurons, neuroscientists have developed a TD-learning time-delay actor-critic neural model and compared its performance with the behavior of monkeys in the laboratory. We have used such a neural network model to learn to predict variable-delay rewards in a robot spatial choice task similar to the one used by neuroscientists with primates. Such an architecture implementing TD-learning appears to be a promising mechanism for robotic systems that learn from simple human teaching signals in the real world.

Learning and Foraging in Robot-bees

Andrés Pérez-Uribe and Beat Hirsbrunner

SAB2000 Proceedings Supplement Book

Abstract

Honey-bees have long served as a model organism for investigating insect navigation and collective behavior: they exhibit division of labor and are an example of insect societies where direct communication between workers enables cooperation in the task of collecting nectar and pollen for the colony. However, honey-bees seem to learn about their environment progressively before becoming foragers and displaying the very complex collective behaviors that have inspired researchers interested in collective intelligence. Motivated by recent research by biologists and neuroscientists on individual learning in honey-bees, we have implemented a Hebbian-learning model and tested it in a foraging task with an autonomous mobile robot (a robot-bee). Then, we used a second learning model that merges unsupervised learning and reinforcement learning techniques. We present some experimental results, as well as the advantages and disadvantages of both models, and describe future directions of research.

Learning to predict variable-delay rewards using an actor-critic architecture with dopamine-like reinforcement signal

Andrés Pérez-Uribe

Proceedings of the EmerNet'2000 Workshop on Current Computational Architectures Integrating Neural Networks and Neuroscience, Durham, UK, 8-9 August, 2000.
Abstract

Neuroscience researchers have identified a neural substrate of prediction and reward in experiments with primates. The so-called dopamine neurons have been shown to code an error in the temporal prediction of rewards. Similarly, artificial systems can "learn to predict" by the so-called temporal-difference (TD) methods. Based on the general resemblance between the expected reinforcement term of TD models and the response of dopamine neurons, neuroscientists have developed a TD model and compared its performance with the behavior of monkeys in the laboratory. We have used such a neural network model to learn to predict variable-delay rewards in a robot spatial choice task similar to the one used by neuroscientists with primates. It appears to be a promising mechanism for robotic systems that learn from simple human teaching signals in the real world.

The Risk of Exploration in Multi-Agent Learning Systems: A Case Study

Andrés Pérez-Uribe and Beat Hirsbrunner

Proceedings of the AGENTS-00/ECML-00 Joint Workshop on Learning Agents

Abstract

The design of a multi-agent system is rarely a trivial task. Some researchers have recently proposed using adaptation and learning techniques to alleviate this problem. Reinforcement learning techniques appear as a means to enable a group of autonomous agents to adapt their behaviors in order to cooperate and to collectively achieve a global task. However, such techniques have to deal with the well-known exploration-exploitation dilemma. This paper describes the use of the Bar problem, a variant of Arthur's famous El Farol bar problem, as a testbed for the study of the exploration-exploitation dilemma in multi-agent reinforcement learning systems. We present experimental comparisons between several explore/exploit strategies and highlight the risk of exploration in multi-agent learning systems.

Of implementing neural epigenesis, reinforcement learning, and mental rehearsal in a mobile autonomous robot

Andrés Pérez-Uribe

Proceedings of the Artificial Intelligence and the Simulation of Behaviour (AISB'2000) symposium on How to Design a Functioning Mind

Abstract

One of the key implications of functionalism is that minds can, in principle, be implemented with any physical substratum provided that the right functional relations are preserved. In this paper we present an architecture that implements neural epigenesis, reinforcement learning, and mental rehearsal, some of the functional building blocks that may enable us to build an artificial brain. However, we conclude that a new kind of machine, where the learning algorithms would emerge from the dynamics of the interconnection between the processing elements, is necessary for the implementation of cognitive abilities that are irreducible to a mechanistic computing algorithm.

