The Control Basis
The paradigm for programming robots at the Laboratory for Perceptual Robotics is the Control Basis
framework for discrete event dynamic systems [HuberThesis, CoelhoThesis]. This approach is designed to
provide a combinatoric basis for control that supports the
representation of declarative and procedural knowledge. Primitive
actions are closed-loop controllers that consist of an objective function
$\phi \in \Phi$, a subset of available "sensor" resource abstractions,
$\sigma \subseteq \Sigma$, and a subset of available "effector" resource abstractions,
$\tau \subseteq \mathcal{T}$. A fully specified controller with its sensor and
effector resources is denoted $\phi|^{\sigma}_{\tau}$.
The sequence of objective functions invoked captures the declarative
structure of a task, while the $\sigma$ and $\tau$ parameters
represent the procedural structure appropriate to the run-time
context.
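To make the combinatorics concrete, the following minimal Python sketch represents a controller $\phi|^{\sigma}_{\tau}$ as an (objective, sensor subset, effector subset) triple and enumerates every instantiation from small resource sets. The class and function names, the resource names (`reach`, `track`, `left_camera`, `pan_tilt`, and so on), and the choice to exclude empty subsets are all hypothetical assumptions made for illustration, not the lab's actual resource abstractions.

```python
from dataclasses import dataclass
from itertools import chain, combinations
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Controller:
    """A fully specified controller phi|^sigma_tau (hypothetical encoding)."""
    objective: str              # phi, drawn from the set of objective functions
    sensors: FrozenSet[str]     # sigma, a subset of the sensor resources
    effectors: FrozenSet[str]   # tau, a subset of the effector resources


def nonempty_subsets(items: Iterable[str]) -> List[FrozenSet[str]]:
    """All non-empty subsets of a resource set (empty subsets excluded by assumption)."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(1, len(items) + 1))]


def enumerate_controllers(objectives, sensors, effectors) -> List[Controller]:
    """Every (phi, sigma, tau) combination drawn from the given resources."""
    return [Controller(phi, sigma, tau)
            for phi in objectives
            for sigma in nonempty_subsets(sensors)
            for tau in nonempty_subsets(effectors)]


# Hypothetical resources, for illustration only.
OBJECTIVES = ["reach", "track"]
SENSORS = ["left_camera", "right_camera"]
EFFECTORS = ["pan_tilt", "left_arm"]

basis = enumerate_controllers(OBJECTIVES, SENSORS, EFFECTORS)
print(len(basis))  # 2 objectives x 3 sensor subsets x 3 effector subsets = 18
```

Even this toy resource set yields 18 distinct controllers, which illustrates how a small number of objectives and resources spans a large space of primitive actions.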
Concurrent control commands are constructed by projecting the output of one controller,
$\phi_2$, into the nullspace of the output of a higher-priority controller,
$\phi_1$. (Unless otherwise noted, we use $\phi_i$ to represent a particular
instantiation of a controller with defined objective, sensor, and
effector resources.) The nullspace $\mathcal{N}(\phi_1)$ of the control
command of $\phi_1$ is computed by

$$\mathcal{N}(\phi_1) = I - J_1^{\#} J_1,$$

where $J_1 = \partial \phi_1 / \partial \mathbf{q}$ is
the Jacobian matrix of the objective with respect to the configuration
variables $\mathbf{q}$, and $J_1^{\#}$ is its pseudoinverse [Nakamura1991].
Our shorthand for this nullspace projection is the "subject-to" operator
"$\triangleleft$" [HuberThesis]. For example, $\phi_2 \triangleleft \phi_1$
captures the case where the inputs derived from the
subordinate controller $\phi_2$ are projected into the nullspace
of the superior controller $\phi_1$.
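A minimal pure-Python sketch of the subject-to composition follows. It assumes that both controllers emit joint-space increments and that the composite command is $\Delta\mathbf{q} = \Delta\mathbf{q}_1 + \mathcal{N}(\phi_1)\,\Delta\mathbf{q}_2$ with $\mathcal{N}(\phi_1) = I - J_1^{\#} J_1$. It also assumes that $J_1$ has full row rank, so the right pseudoinverse $J_1^{\#} = J_1^T (J_1 J_1^T)^{-1}$ applies. The helper names (`subject_to`, `nullspace_projector`, and so on) and the 3-joint example are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting; a must be square and non-singular."""
    n = len(a)
    aug = [[float(x) for x in row] + e for row, e in zip(a, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pseudoinverse(j: Matrix) -> Matrix:
    """Right pseudoinverse J# = J^T (J J^T)^-1, valid when J has full row rank."""
    jt = transpose(j)
    return matmul(jt, inverse(matmul(j, jt)))


def nullspace_projector(j: Matrix) -> Matrix:
    """N = I - J# J projects joint-space vectors onto the nullspace of J."""
    n = len(j[0])
    eye = identity(n)
    pj = matmul(pseudoinverse(j), j)
    return [[eye[r][c] - pj[r][c] for c in range(n)] for r in range(n)]


def subject_to(dq2: Vector, dq1: Vector, j1: Matrix) -> Vector:
    """phi2 <| phi1: add phi2's command, projected into N(phi1), to phi1's command."""
    projected = matvec(nullspace_projector(j1), dq2)
    return [a + b for a, b in zip(dq1, projected)]


# Hypothetical 3-joint example: phi1 depends only on joints 0 and 1.
J1 = [[1.0, 1.0, 0.0]]
dq1 = [0.2, 0.2, 0.0]
dq2 = [1.0, 0.0, 1.0]
dq = subject_to(dq2, dq1, J1)
print(dq)  # approximately [0.7, -0.3, 1.0]
```

Because $J_1 \mathcal{N}(\phi_1) = 0$, the projected part of $\phi_2$'s command leaves $\phi_1$'s objective unchanged to first order. This is what makes $\phi_1$ the superior controller.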
Discrete Event Dynamic Systems
Using a control basis representation for primitive actions has several
desirable consequences. Closed-loop controllers provide asymptotically stable behavior that is robust to local perturbations. Furthermore, the error dynamics of a
controller, $(\phi, \dot{\phi})$, support a natural discrete abstraction of the underlying continuous state space. Under this abstraction, control events are defined by recognizable temporal patterns in the control error.
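The dynamic state of a controller $\phi_i$ can be characterized by a predicate
$p(\phi_i)$. In this paper we define three cases:

$$p(\phi_i) = \begin{cases} -1 & \text{if } \phi_i \text{ has an undefined reference,} \\ 0 & \text{if } \dot{\phi}_i < \epsilon, \\ 1 & \text{if } \dot{\phi}_i \geq \epsilon, \end{cases}$$

where $\epsilon$ is some small negative constant. A value of X for predicate
$p(\phi_i)$ characterizes an aggregate "don't care"
value over the three possible values and is often useful for state abstraction.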
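The predicate can be sketched in a few lines of Python. This sketch assumes the reading of the three cases in which -1 means the controller's reference is undefined, 0 means the error is still decreasing faster than $\epsilon$, and 1 means it has converged. It also assumes that $\dot{\phi}$ is estimated numerically, with `None` signalling an undefined reference. The default $\epsilon = -10^{-3}$ and the function names `predicate` and `state` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def predicate(phi_dot: Optional[float], epsilon: float = -1e-3) -> int:
    """Discrete controller state p(phi) from the error derivative phi_dot.

    phi_dot is None when the controller's reference is undefined
    (for example, no visual feature is in view). epsilon is a small
    negative constant; its value here is an assumption.
    """
    if epsilon >= 0.0:
        raise ValueError("epsilon must be a small negative constant")
    if phi_dot is None:
        return -1   # undefined reference
    if phi_dot < epsilon:
        return 0    # error still decreasing: transient
    return 1        # error no longer decreasing: converged


def state(phi_dots: Sequence[Optional[float]], epsilon: float = -1e-3) -> Tuple[int, ...]:
    """A schema state is the tuple of its controllers' predicates."""
    return tuple(predicate(d, epsilon) for d in phi_dots)


print(state([-0.0001, -0.5]))  # (1, 0)
print(state([-0.0001, None]))  # (1, -1)
```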
Asymptotically stable controllers, for which $\phi$ can be considered a
Lyapunov function, have negative definite $\dot{\phi}$. Schemas are combinations of primitive controllers defined over states composed of their respective predicates. Two such schemas are used in the experiments that follow: Search-Track and Grasp.
Figure 1 - The Search-Track schema, a sensorimotor schema constructed from the primitive closed-loop controllers $\phi_s$ and $\phi_t$, which finds and tracks a visual feature.
Figure 2 - The Grasp schema, a sensorimotor schema constructed from the primitive closed-loop controllers $\phi_c$ and $\phi_f$, which moves the fingers of a hand to a closed position. If an object is present, the fingers envelop the object and act to grasp it.
Search-Track is a schema constructed from two primitive, closed-loop
controllers. Position controller $\phi_s$ moves a stereo pan/tilt head
to a random configuration drawn from a probability distribution. Another
position controller, $\phi_t$, moves a visual feature to the image center.
Both objectives are addressed by actuating the pan/tilt motors of
the stereo head. Together, these controllers can be used to construct a variety of behaviors.
States $(p(\phi_s), p(\phi_t))$ and actions $\phi_s$ and $\phi_t$ define a Markov
Decision Process (MDP). A control policy that finds and tracks a visual feature can be defined over this MDP. Figure 1 shows the relevant
states and non-zero state transitions for this schema. The resulting policy moves the head to random pan/tilt locations, through the execution of
$\phi_s$, until a feature is found. When this event occurs, $\phi_t$ tracks the feature in the center of the image plane. Note that the transition from state (1,0) to state (1,-1) captures the situation in which the visual feature moves out of the image plane.
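The Search-Track policy described above can be sketched as a lookup over predicate patterns, which also shows how the "don't care" value X compresses the state space. In this sketch the search and track controllers are labelled `phi_s` and `phi_t` (our names). States are assumed to be ordered as (p(search), p(track)), and the policy table is a hypothetical encoding of "search while no feature is visible, otherwise track", not a transcription of Figure 1.

```python
from typing import Dict, Optional, Tuple

X = "X"  # "don't care" value for a predicate

State = Tuple[int, int]           # (p(phi_s), p(phi_t))
Pattern = Tuple[object, object]   # entries are -1, 0, 1, or X

# Hypothetical Search-Track policy expressed with don't-care patterns.
SEARCH_TRACK_POLICY: Dict[Pattern, str] = {
    (X, -1): "phi_s",  # no visual feature: move the head to a random pan/tilt pose
    (X, 0): "phi_t",   # feature in view: drive it toward the image center
    (X, 1): "phi_t",   # feature centered: keep tracking it
}


def matches(state: State, pattern: Pattern) -> bool:
    """True if every predicate equals its pattern entry or the entry is X."""
    return all(p == X or p == s for s, p in zip(state, pattern))


def select_action(state: State, policy: Dict[Pattern, str]) -> Optional[str]:
    """Return the action of the first matching pattern, or None if none match."""
    for pattern, action in policy.items():
        if matches(state, pattern):
            return action
    return None


for s in [(0, -1), (1, -1), (1, 0), (1, 1)]:
    print(s, select_action(s, SEARCH_TRACK_POLICY))
```

The three patterns cover all nine predicate combinations because the search controller's predicate never affects the choice of action.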
Figure 2 shows a Grasp schema that employs actions $\phi_c$ and $\phi_f$,
activated on states $(p(\phi_c), p(\phi_f))$. Position
controller $\phi_c$ flexes the fingers of the robot hand toward a reference
"closed" configuration. Force controller $\phi_f$ applies a reference force
at each fingertip contact so as to generate a wrench closure condition
that defines a grasp. Assuming that the hand begins in the "open"
configuration, Figure 2 specifies that the hand close by executing $\phi_c$.
In the course of closing, an object may be encountered which prevents any further motion (states (1,0) or (1,1)). If the wrench closure condition is not asserted when contact is made (state (1,0)), controller
$\phi_f$ executes until state (1,1) is reached. State (1,-1) captures the situation in which the hand reaches the "closed" configuration and no object is encountered. Under the right
conditions, the Grasp schema yields grasps that allow the hand to
move the target object.
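The sequencing just described can be sketched as a decision function over Grasp states. In this sketch the closing position controller is labelled `phi_c` and the force controller `phi_f` (our names). States are assumed to be ordered as (p(close), p(force)), and the outcome labels `GRASPED` and `EMPTY` are hypothetical. The two traces are invented episodes, not recorded data.

```python
from typing import List, Tuple

State = Tuple[int, int]  # (p(phi_c), p(phi_f))


def grasp_step(state: State) -> str:
    """Hypothetical reading of the Grasp schema: the action or outcome for a state."""
    p_close, _ = state
    if p_close == 0:
        return "phi_c"    # still closing toward the "closed" posture
    if state == (1, 0):
        return "phi_f"    # contact made, wrench closure not yet asserted
    if state == (1, 1):
        return "GRASPED"  # wrench closure achieved
    if state == (1, -1):
        return "EMPTY"    # hand closed without encountering an object
    raise ValueError(f"state {state} is not covered by this sketch")


def run(trace: List[State]) -> List[str]:
    """Map a sequence of observed states to the schema's decisions."""
    return [grasp_step(s) for s in trace]


# Hypothetical episodes: one closes on an object, one closes on nothing.
print(run([(0, -1), (0, -1), (1, 0), (1, 0), (1, 1)]))
print(run([(0, -1), (0, -1), (1, -1)]))
```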
Learning in the Control Basis
Machine learning techniques based on Markov Decision Processes (MDPs), such as reinforcement learning (RL), are used to learn policies that sequence control decisions so as to maximize reward. The learning algorithm solves the temporal credit assignment problem by associating credit with the elements of a behavioral sequence that lead to reward. However, RL depends on stochastic exploration, and the state space of an interesting robot can be enormous. Moreover, any algorithm that depends on completely random exploration will take a long time to learn and will occasionally do something harmful merely to discover its consequences. Several examples of learning in the Control Basis framework follow.
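As an illustration of temporal credit assignment, the sketch below runs tabular Q-learning with epsilon-greedy exploration on a tiny deterministic MDP loosely modelled on Search-Track. Q-learning is one standard RL algorithm, swapped in here for illustration, since the text does not specify which algorithm was used. The state names, transition table, and hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical.

```python
import random
from typing import Dict, Tuple

ACTIONS = ("search", "track")

# Hypothetical deterministic abstract MDP loosely modelled on Search-Track:
# (state, action) -> (next_state, reward). "tracking" is terminal.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("no_feature", "search"): ("feature_seen", 0.0),
    ("no_feature", "track"): ("no_feature", 0.0),   # undefined reference: no progress
    ("feature_seen", "search"): ("no_feature", 0.0),
    ("feature_seen", "track"): ("tracking", 1.0),
}


def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.2, max_steps: int = 20, seed: int = 0):
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in TRANSITIONS}
    for _ in range(episodes):
        s = "no_feature"
        for _ in range(max_steps):
            # Epsilon-greedy: explore at random with probability epsilon.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s_next, r = TRANSITIONS[(s, a)]
            if s_next == "tracking":
                target = r  # terminal state: nothing to bootstrap from
            else:
                target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
            # Temporal-difference update: credit flows back along the sequence.
            q[(s, a)] += alpha * (target - q[(s, a)])
            if s_next == "tracking":
                break
            s = s_next
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in ("no_feature", "feature_seen")}
print(policy)  # search when no feature is visible, track once one is
```

The reward is only received on the final transition. The search action that made tracking possible is credited through the discounted bootstrap term, which is the credit assignment the paragraph describes.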
Learning a Rotate Gait
The first policy Thing learned was to rotate in place. Here is a movie showing the training; the number in the lower left of the frame is the elapsed time of the learning episode. Our approach allowed Thing to learn this gait in about 11 minutes, on-line, in a single training episode. Knowledge of the motor synergies involved in rotating under varying conditions significantly improved the acquisition of other behaviors. For instance, given the prior rotate policy, Thing learned to translate in roughly half the time it needed without it, and the average translation per action was ultimately roughly double as well. We believe this phenomenon is consistent with the kind of staged, sequential development that human neonates exhibit, as observed by developmental psychologists such as Piaget.

This MDP shows all of the statically stable configurations for Thing. The highlighted transitions mark actions that move the robot between these states. The resulting behavior is the rotate gait seen in the movie. The four states in this policy are shown below.

Learning a Stable Grasp

A similar MDP can be constructed in which states represent all stable bimanual grasp configurations for Dexter. These include two-handed grasps as well as one-handed grasps in which gravity supplies a contact force.
Here Dexter demonstrates the redundancy available in bimanual grasps. The robot can transition between stable grasps formed by both hands and stable grasps formed by one hand and gravity. The video demonstrates a learned policy for these transitions that allows the robot to move the ball to locations far to its side that are not reachable while maintaining a bimanual grasp.


