The documents contained in these directories are included by the
contributing authors as a means to ensure timely dissemination of
scholarly and technical work on a non-commercial basis. Copyright and all
rights therein are maintained by the authors or by other copyright holders,
notwithstanding that they have offered their works here electronically.
It is understood that all persons copying this information will adhere to
the terms and constraints invoked by each author's copyright. These works
may not be reposted without the explicit permission of the copyright holder.
Solving partially observable problems by evolution and learning of finite state machines
E. Sanchez, A. Pérez-Uribe, and B. Mesot
The 4th International Conference on Evolvable Systems: From Biology To Hardware (ICES2001), Tokyo, October 3-5, 2001, pp. 267-278
- Abstract
Finite state machines (FSM) have been successfully used to implement
the control of an agent to solve particular sequential
tasks. Nevertheless, finite state machines must be hand-coded by the
engineer, which might be very difficult for complex
tasks. Researchers have used evolutionary techniques to evolve finite
state machines and find automatic solutions to sequential tasks. Their
approach consists of encoding the state-transition table defining a
finite state machine in the genome. However, the search space of such
an approach tends to be unnecessarily large. In this article, we propose an
alternative approach for the automatic design of finite state machines
using artificial evolution and learning techniques: the
SOS-algorithm. We have obtained very impressive results in experimental
work on solving partially observable problems.
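
To make the direct encoding mentioned in the abstract concrete, here is a minimal sketch of a state-transition table stored as a flat genome, along with the resulting search-space size. It illustrates only the baseline encoding, not the SOS-algorithm, and the gene layout and every name (`decode_genome`, `run_fsm`, `search_space_size`) are assumptions made for illustration.

```python
from itertools import product


def decode_genome(genome, n_states, n_inputs, n_actions):
    """Decode a flat integer genome into a transition table.

    Assumed layout: each (state, input) pair owns two consecutive genes,
    the next state and the action to emit.
    """
    assert len(genome) == 2 * n_states * n_inputs
    table = {}
    pairs = product(range(n_states), range(n_inputs))
    for i, (state, symbol) in enumerate(pairs):
        next_state = genome[2 * i] % n_states
        action = genome[2 * i + 1] % n_actions
        table[(state, symbol)] = (next_state, action)
    return table


def run_fsm(table, inputs, start_state=0):
    """Feed a sequence of input symbols to the machine; return its actions."""
    state, actions = start_state, []
    for symbol in inputs:
        state, action = table[(state, symbol)]
        actions.append(action)
    return actions


def search_space_size(n_states, n_inputs, n_actions):
    """Number of distinct tables: one (next_state, action) choice per entry."""
    return (n_states * n_actions) ** (n_states * n_inputs)


if __name__ == "__main__":
    genome = [1, 0, 0, 1, 0, 1, 1, 0]  # 2 states x 2 inputs x 2 genes
    table = decode_genome(genome, n_states=2, n_inputs=2, n_actions=2)
    print(run_fsm(table, [0, 0, 1]))
    print(search_space_size(8, 4, 4))
```

Under this encoding the number of candidate machines grows exponentially with the number of table entries, which is the kind of growth the abstract refers to.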
A robotics framework for studying the coevolution of signaling
A. Pérez-Uribe and M. Courant
Symposium on Emergence and Development of
Embodied Cognition (EDEC'2001), 3rd International Conference on Cognitive Science, August 27-31, 2001, Beijing, China (to appear)
- Abstract
In this paper, we propose a robotics framework for studying the
coevolution of signaling. Our motivation is twofold. First, we propose
a situated and embodied framework for signaler-receiver
interaction, and second, we provide a promising approach for the study
of mechanisms that would enable adaptive systems to access new
information channels and to exploit implicit information in their
environments. We present experimental results on a successful
coevolution of signals that enable a very simple communication between
two robots. Finally, we delineate some aspects of forthcoming
research.
Learning to predict variable-delay rewards
and its role in autonomous developmental robotics
Andrés Pérez-Uribe and M. Courant
6th International Work-conference on Artificial and Natural Neural Networks, IWANN'2001.
- Abstract
Researchers in the new field of "developmental robotics" propose to
provide robots with so-called developmental programs. Similar to the
development of human infants, robots might use those programs to
interact with humans and their environment for extended periods of
time, and become smarter autonomously. In this paper we show how a
neural network model developed by neuroscientists can be used by an
autonomous robot to learn by trial-and-error when considering rewards
delivered at arbitrary times, as would be the case of developmental
robots interacting with humans in the real world.
A non-computationally-intensive neurocontroller for autonomous mobile robot navigation
Andrés Pérez-Uribe
Biologically inspired robot behavior engineering, R. J. Duro, J. Santos, M. Graña (Eds.), Springer-Verlag, 2002.
- Abstract
This chapter presents a neurocontroller architecture for autonomous
mobile robot navigation. The main characteristic of this
neurocontroller is that it is non-computationally-intensive. It
provides a learning robot with the capability to autonomously
categorize input data from the environment, to deal with the
stability-plasticity dilemma, and to learn a state-to-action mapping
that enables it to navigate in a workspace while avoiding obstacles.
The neurocontroller architecture is composed of three main modules: an
adaptive categorization module, implemented by an unsupervised
learning neural architecture called FAST (Flexible Adaptable-Size
Topology), a reinforcement learning module (SARSA), and a short-term
memory or a planning module, intended to accelerate the learning of
behaviors. We describe the use of our neurocontroller in three
navigation tasks, each involving a different kind of sensor: 1)
obstacle avoidance using infra-red proximity sensors, 2) foraging
using a color CCD camera, and 3) wall-following using a grey-level
linear vision system.
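
As a rough illustration of the reinforcement learning module named above, the following sketch shows the standard tabular SARSA update applied to a state-to-action mapping. It is not the chapter's implementation: the state labels stand in for categories that a module such as FAST might produce, and the action names, reward value, and parameter values are all hypothetical.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Standard on-policy SARSA update:

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a))
    """
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


if __name__ == "__main__":
    q = defaultdict(float)            # unseen (state, action) pairs start at 0
    q[("free", "forward")] = 1.0      # hypothetical previously learned value
    # The robot turned near a wall, was penalized, and will go forward next.
    new_value = sarsa_update(q, "near_wall", "turn", -1.0,
                             "free", "forward", alpha=0.5, gamma=0.9)
    print(new_value)
```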
Using a time-delay actor-critic neural
architecture with dopamine-like reinforcement signal for learning in
autonomous robots
Andrés Pérez-Uribe
Emerging Neural Architectures based on
Neuroscience, S. Wermter, J. Austin, D. Willshaw (Eds.),
Springer-Verlag, LNAI 2036, pp. 522-533.
- Abstract
Neuroscientists have identified a neural substrate of
prediction and reward in experiments with primates. The so-called
dopamine neurons have been shown to code an error in the temporal
prediction of rewards. Similarly, artificial systems can "learn to
predict" by the so-called temporal-difference (TD)
methods. Based on the general resemblance between the effective
reinforcement term of TD models and the response of dopamine neurons,
neuroscientists have developed a TD-learning time-delay actor-critic
neural model and compared its performance with the behavior of monkeys
in the laboratory. We have used such a neural network model to learn to
predict variable-delay rewards in a robot spatial choice task similar
to the one used by neuroscientists with primates. Such an architecture
implementing TD-learning appears to be a promising mechanism for robotic
systems that learn from simple human teaching signals in the real
world.
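
The prediction error at the heart of TD methods can be sketched in a few lines. The example below uses plain tabular TD(0) with one value per time step of a trial; this is a simplifying assumption and not the time-delay actor-critic neural model described in the abstract. The reward schedule, learning rate, and function names are hypothetical.

```python
def td_errors(values, rewards, gamma=1.0):
    """TD error per time step: delta_t = r_t + gamma * V(t+1) - V(t).

    The value after the last step of the trial is taken to be 0.
    """
    deltas = []
    for t in range(len(rewards)):
        v_next = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(rewards[t] + gamma * v_next - values[t])
    return deltas


def train(rewards, n_trials=500, alpha=0.1, gamma=1.0):
    """Repeat the same trial, nudging each value along its TD error."""
    values = [0.0] * len(rewards)
    for _ in range(n_trials):
        for t, delta in enumerate(td_errors(values, rewards, gamma)):
            values[t] += alpha * delta
    return values


if __name__ == "__main__":
    rewards = [0, 0, 0, 1]                 # reward arrives at the last step
    print(td_errors([0.0] * 4, rewards))   # before learning
    values = train(rewards)
    print(td_errors(values, rewards))      # after learning
```

Before learning, the only nonzero error occurs at the time of the reward; after repeated trials the reward is predicted and the errors shrink toward zero, which mirrors the qualitative behavior reported for dopamine neurons.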
Learning and Foraging in Robot-bees
Andrés Pérez-Uribe and Beat Hirsbrunner
SAB2000 Proceedings Supplement Book
- Abstract
Honey-bees have long served as a model organism for investigating
insect navigation and collective behavior: they exhibit division of
labor and are an example of insect societies where direct
communication between workers enables cooperation in the task of
collecting nectar and pollen for the colony. However, honey-bees seem
to learn about their environment progressively before becoming
foragers and displaying the very complex collective behaviors that
have inspired researchers interested in collective
intelligence. Motivated by recent research by biologists and
neuroscientists on individual learning in honey-bees, we have
implemented a Hebbian-learning model and tested it in a foraging task
with an autonomous mobile robot (a robot-bee). Then, we used a second
learning model that merges unsupervised learning and reinforcement
learning techniques. We present some experimental results, as well as
the advantages and disadvantages of both models, and describe future
directions of research.
Learning to predict variable-delay rewards using an actor-critic architecture with dopamine-like reinforcement signal
Andrés Pérez-Uribe
Proceedings of the EmerNet'2000 Workshop on Current Computational Architectures Integrating Neural Networks and Neuroscience, Durham, UK, 8-9 August, 2000.
- Abstract
Neuroscience researchers have identified a neural substrate of
prediction and reward in experiments with primates. The so-called
dopamine neurons have been shown to code an error in the temporal
prediction of rewards. Similarly, artificial systems can "learn to
predict" by the so-called temporal-difference (TD)
methods. Based on the general resemblance between the expected
reinforcement term of TD models and the response of dopamine neurons,
neuroscientists have developed a TD model and compared its performance
with the behavior of monkeys in the laboratory. We have used such a
neural network model to learn to predict variable-delay rewards in a
robot spatial choice task similar to the one used by neuroscientists
with primates. It appears to be a promising mechanism for robotic systems
that learn from simple human teaching signals in the real world.
The Risk of Exploration in Multi-Agent Learning Systems: A Case Study
Andrés Pérez-Uribe and Beat Hirsbrunner
Proceedings of the AGENTS-00/ECML-00 Joint Workshop on Learning Agents
- Abstract
The design of a multi-agent system is rarely a trivial task. Some
researchers have recently proposed to use adaptation and learning
techniques to alleviate this problem. Reinforcement learning
techniques appear as a means to enable a group of several autonomous
agents to adapt their behaviors in order to cooperate and to collectively
achieve a global task. However, such techniques have to deal with the
well-known exploration-exploitation dilemma. This paper describes the
use of the Bar problem, a variant of Arthur's famous El Farol bar
problem, as a testbed for the study of the exploration-exploitation
dilemma in multi-agent reinforcement learning systems. We present
experimental comparisons between several explore/exploit strategies
and highlight the risk of exploration in multi-agent learning systems.
Of implementing neural epigenesis, reinforcement learning,
and mental rehearsal in a mobile autonomous robot
Andrés Pérez-Uribe
Proceedings of the Artificial Intelligence and the Simulation of Behaviour (AISB'2000) symposium on How to Design a Functioning Mind
- Abstract
One of the key implications of functionalism is that minds can, in
principle, be implemented with any physical substratum provided that
the right functional relations are preserved. In this paper we present
an architecture that implements neural epigenesis, reinforcement
learning, and mental rehearsal, some of the functional building blocks
that may enable us to build an artificial brain. However, we conclude
that a new kind of machine, in which the learning algorithms would
emerge from the dynamics of the interconnection between the processing
elements, is necessary for the implementation of cognitive abilities
that are irreducible to a mechanistic computing algorithm.