The Laboratory
for Perceptual Robotics experiments with computational
principles underlying flexible, adaptable systems. We are concerned
with robot systems that must produce many kinds of behavior in
nonstationary environments. This implies that the objective of
behavior is constantly changing: for instance, when battery levels
change, or when nondeterminism in the environment causes dangerous
situations (or opportunities) to occur. We refer to these kinds of
problem domains as open systems - they are only partially
observable and partially controllable. To estimate hidden
state and to expand the set of achievable control transitions, we have
implemented temporally extended observations and actions,
respectively.
The kinds of world models developed in such systems are the product of
native structure, rewards, environmental stimuli, and experience.
We also consider redundant robot systems - i.e. those that
have many ways of perceiving important events and many ways of
manipulating the world to effect change. We employ distributed
solutions to multi-objective problems and
propose that hierarchical robot programs should be acquired
incrementally in a manner inspired by sensorimotor development
in human infants. We propose to grow functioning machine
agency by discovering and exploiting the great deal of
intrinsic structure (kinematic, dynamic, perceptual, motor) that
robot systems possess during ongoing interaction with the
world.
Finally, we study robot systems that collaborate with
humans and with other robots. A mixed-initiative system can take
actions derived from competing internal objectives as well as from
external peers and supervisors. Part of our goal concerns how such a
robot system can explain why it is behaving in a particular way
and can communicate effectively with others.
To address this goal, we integrate a collection of existing
mathematical and computational frameworks for robot systems that grow
knowledge structures grounded in activity and use them to make
intelligent choices. The major pieces of this integrated framework are
summarized in our glossary of terms
and techniques for constructing intelligent robots that acquire world
models from life-long interaction with humans, robots, and complex,
non-deterministic worlds.
The BEAST architecture has been developed at UMass to provide a
computational account of sensorimotor development for use in robot
programming. "Best-Effort" in the acronym refers to constraints arising from
the embodiment of the robot system: its sensory and motor systems, the suite of
reflexes derived from the control basis, the state of existing control
knowledge, and the influence of peers and supervisors.
Figure 1: BEAST - Best-Effort Adaptive-Optimal
Sensorimotor Transformations
The agent's most primitive actions are closed-loop controllers that are asymptotically stable and whose behavior is dominated by a discrete set of stable fixed points. This design is consistent with current research in infant motor development, adult motor control, and robotics. A closed-loop "action" is computed by continuously observing dependent variables with sensors and executing a greedy descent of objective functions F using independent effector variables. The control basis (Figure 1) is a generative basis for constructing such actions in the form of simple tracking controllers with references specified by events on sensory streams. The objective function is minimized by actuating some combination of the available effectors.
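A minimal sketch of such a primitive action, assuming a quadratic tracking objective (the function name, the gain, and the step rule here are illustrative stand-ins, not the lab's actual control basis):

```python
import numpy as np

def closed_loop_action(x0, reference, gain=0.5, tol=1e-6, max_steps=1000):
    """Greedy descent of F(x) = 0.5 * ||x - reference||^2.

    The effector command is the negative gradient of F, so the
    controller converges asymptotically to the single stable
    fixed point x = reference.
    """
    x = np.asarray(x0, dtype=float)
    ref = np.asarray(reference, dtype=float)
    errors = []                        # transient response, kept for later analysis
    for _ in range(max_steps):
        grad = x - ref                 # dF/dx for the quadratic objective
        errors.append(np.linalg.norm(grad))
        if errors[-1] < tol:
            break
        x = x - gain * grad            # actuate effectors down the gradient
    return x, errors

x_final, errors = closed_loop_action([1.0, -2.0], [0.0, 0.0])
```

The recorded error sequence is the transient response discussed next; keeping it around is what lets the same controller double as a sensor.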
The transient response of a closed-loop action can be used to classify the environmental stimuli that drive it. For example, if control error is growing, then the subject we are tracking is too fast. We are attempting to use prototypical patterns in the dynamic response to distinguish important environmental contexts. Therefore, a controller, in our parlance, produces both mechanical artifacts and information regarding the environmental context during temporally extended interactions with it. A state vector (q in Figure 1) is a list of previously acquired dynamic models that could explain a series of run-time observations.
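One way to sketch this idea, assuming contexts are characterized by the geometric growth or decay rate of the error transient (the labels and prototype rates below are hypothetical, not models the lab has published):

```python
import numpy as np

def classify_context(errors, prototypes):
    """Match an observed error transient against prototypical dynamic models.

    prototypes: dict mapping a context label to an expected per-step
    error ratio. Returns the label whose rate best explains the
    observed transient.
    """
    e = np.asarray(errors, dtype=float)
    ratios = e[1:] / np.maximum(e[:-1], 1e-12)   # per-step error ratio
    observed = np.mean(ratios)
    return min(prototypes, key=lambda k: abs(prototypes[k] - observed))

# Hypothetical prototype rates: decaying, constant, and growing error.
prototypes = {"converging": 0.5, "marginal": 1.0, "diverging": 1.5}
label = classify_context([2.0, 1.0, 0.5, 0.25], prototypes)
```

A growing-error transient would instead match the "diverging" prototype, signaling, for example, that the tracked subject is too fast for the current controller.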
This approach yields a Markov Decision Process (MDP), as illustrated in Figure 1. Machine learning techniques based on MDPs, like reinforcement learning (RL), are employed to learn policies for sequencing control decisions in order to optimize reward. The learning algorithm solves the temporal credit assignment problem by associating credit with elements of a behavioral sequence that lead to reward. However, RL depends on stochastic exploration, and our state space could be enormous for interesting robots. Moreover, any algorithm that depends on completely random exploration will take a long time, and will occasionally do something terribly unfortunate in order to learn about the consequences.
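The flavor of such learning can be shown with tabular Q-learning on a toy problem; the chain MDP, rewards, and parameters below are illustrative, not the lab's robot domain:

```python
import random
from collections import defaultdict

# Toy chain MDP: states 0..4, actions move left (-1) or right (+1),
# reward 1.0 for reaching state 4.
N = 5

def step(s, a):
    s2 = max(0, min(N - 1, s + a))
    reward = 1.0 if s2 == N - 1 else 0.0
    return s2, reward, s2 == N - 1      # next state, reward, terminal flag

random.seed(0)
Q = defaultdict(float)
alpha, gamma, eps = 0.5, 0.9, 0.2
for _ in range(500):                    # episodes of epsilon-greedy exploration
    s, done = 0, False
    while not done:
        a = random.choice([-1, 1]) if random.random() < eps else \
            max([-1, 1], key=lambda a: Q[(s, a)])
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, -1)], Q[(s2, 1)])
        # TD update: credit flows back along the behavioral sequence.
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

policy = [max([-1, 1], key=lambda a: Q[(s, a)]) for s in range(N - 1)]
```

Even in this five-state world the agent wanders at random before reward propagates back through the Q-table, which is the scaling and safety concern the text raises for real robots.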
The Human-Robot Interaction (HRI) interface provides many opportunities to interact with learning robots. A Discrete Event Dynamic System (DEDS) model (Figure 1) can be implemented to analyze and interact with the processes for acquiring behavior. Axioms in a DEDS framework can be used to prove that certain predicate states cannot occur by pruning the set of admissible actions. In a sense, the DEDS specification uses controllers to avoid uncontrollable states. But it can also be used to shape the early formation of a skill, as in classical approaches to shaping and maturation. The HRI also allows users to impose an external reward metric to generate a value function on legal states and admissible actions. Teleoperator inputs can be explained by projecting them onto admissible actions in the control basis. The human collaborator can be informed by predictions about future states that result from autonomous behavior. Adaptive interfaces such as these allow collaborators to communicate regarding future intentions.
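A toy stand-in for this kind of supervision, assuming a hypothetical one-step forward model and a single forbidden predicate (the 1-D world and the hazard state are illustrative, not an actual DEDS specification):

```python
# DEDS-style pruning: an action is admissible only if its predicted
# successor does not satisfy the forbidden predicate, so the learner
# never gets to choose it.
HAZARD = 3

def predict(state, action):
    """Hypothetical one-step forward model for a 1-D world."""
    return state + action

def forbidden(state):
    """Predicate state the supervisor must prove unreachable."""
    return state == HAZARD

def admissible(state, actions):
    return [a for a in actions if not forbidden(predict(state, a))]

safe = admissible(2, [-1, 1])   # moving right would enter the hazard state
```

Because pruning happens before action selection, any exploration policy layered on top, including the RL sketch above the learner might use, inherits the safety guarantee for free.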
Copyright Laboratory for Perceptual Robotics.