The Control Basis

The paradigm for programming robots at the Laboratory for Perceptual Robotics is the Control Basis framework for discrete event dynamic systems [HuberThesis?,CoelhoThesis?]. This approach is designed to provide a combinatoric basis for control that supports the representation of both declarative and procedural knowledge. Primitive actions are closed-loop controllers, each consisting of an objective function \phi \in \Omega_\phi, a subset of the available "sensor" resource abstractions, \sigma \in \Omega_\sigma, and a subset of the available "effector" resource abstractions, \tau \in \Omega_\tau. A fully specified controller with its sensor and effector resources is denoted \phi^\sigma_\tau.

The sequence of objective functions invoked captures the declarative structure of a task, while the \sigma and \tau parameters represent the procedural structure appropriate to the run-time context.

Concurrent control commands are constructed by projecting the output of one controller, \phi_2, into the nullspace of the output of a higher-priority controller, \phi_1. (Unless otherwise noted, we use \phi_i to represent a particular instantiation of a controller with defined objective, sensor, and effector resources.) The projection onto the nullspace \mathcal{N}_1 of the control command of \phi_1 is computed as (I - J^+_1 J_1), where J_i is the Jacobian matrix of the objective with respect to the configuration variables \theta and J_i^+ is its pseudoinverse [Nakamura1991]. Our shorthand for this nullspace projection is the "subject-to" operator "\triangleleft" [HuberThesis]. For example, \phi_2 \triangleleft \phi_1 denotes the case in which the commands of the subordinate controller \phi_2 are projected into the nullspace of the superior controller \phi_1.
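
To make the "subject-to" composition concrete, here is a minimal NumPy sketch of \phi_2 \triangleleft \phi_1. It assumes that each controller's output is already expressed as a configuration-space increment (the names dtheta_1, dtheta_2, nullspace_projector and subject_to are hypothetical, chosen for illustration), and it uses numpy.linalg.pinv for J_1^+. The example Jacobian is invented: the superior objective depends only on joint 0.

```python
import numpy as np

def nullspace_projector(J1: np.ndarray) -> np.ndarray:
    """N_1 = I - J_1^+ J_1: projects configuration-space vectors onto the nullspace of J_1."""
    n = J1.shape[1]  # number of configuration variables theta
    return np.eye(n) - np.linalg.pinv(J1) @ J1

def subject_to(dtheta_2: np.ndarray, dtheta_1: np.ndarray, J1: np.ndarray) -> np.ndarray:
    """phi_2 <| phi_1: keep phi_1's command and add phi_2's command projected into N_1."""
    return dtheta_1 + nullspace_projector(J1) @ dtheta_2

# Hypothetical 3-DOF example: the superior objective depends only on joint 0.
J1 = np.array([[1.0, 0.0, 0.0]])
dtheta_1 = np.array([0.5, 0.0, 0.0])    # command from the superior controller phi_1
dtheta_2 = np.array([0.3, -0.2, 0.1])   # command from the subordinate controller phi_2
combined = subject_to(dtheta_2, dtheta_1, J1)
print(combined)  # approximately [0.5, -0.2, 0.1]: phi_2's joint-0 component is removed
```

Because J_1 (I - J_1^+ J_1) = 0, the projected part of \phi_2's command produces no first-order change in \phi_1's objective, which is what gives \phi_1 priority.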


Discrete Event Dynamic Systems

Using a control basis representation for primitive actions has desirable consequences. Closed-loop controllers provide asymptotically stable behavior that is robust to local perturbations. Furthermore, the error dynamics of a controller, (e, \dot{e}), support a natural discrete abstraction of the underlying continuous state space. Under this abstraction, control events are defined by recognizable temporal patterns in the control error.

The dynamic state of a controller \phi_i can be characterized by a predicate p_i. In this paper we define three cases:

p_i  =  \left\{ \begin{array}{rl} -1 & e_{i} \;\textrm{is undefined}\\ 0 & \dot{e}_{i} < \epsilon_\phi \\ 1 & \dot{e}_{i} \geq \epsilon_\phi \end{array} \right.,

where \epsilon_\phi is a small negative constant. A value of X for predicate p_i denotes an aggregate "don't care" value over the three possible values and is often useful for state abstraction. Asymptotically stable controllers, for which e(t) can be considered a Lyapunov function, have negative definite \dot{e}. Schemas are combinations of primitive controllers defined on states that consist of the controllers' respective predicates. Two such schemas are used in the experiments that follow: Search-Track and Grasp.
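
The predicate definition can be read as a small discretization function. The sketch below is one possible reading, not the authors' implementation: None stands for an undefined error, the value -1e-3 for \epsilon_\phi is an assumption, \dot{e} is estimated by a finite difference (also an assumption), and the string "X" plays the role of the don't-care value when matching states. The helper names are hypothetical.

```python
from typing import Optional, Sequence, Union

P_UNDEFINED = -1  # e_i is undefined (e.g. the controller's sensor input is absent)
P_DESCENDING = 0  # e_dot_i < eps: the error is still decreasing appreciably
P_CONVERGED = 1   # e_dot_i >= eps: the error has stopped decreasing

def predicate(e: Optional[float], e_dot: Optional[float], eps: float = -1e-3) -> int:
    """Map a controller's error dynamics (e, e_dot) to the predicate p_i.

    eps stands in for epsilon_phi; -1e-3 is an assumed value, not one from the text.
    """
    if e is None or e_dot is None:
        return P_UNDEFINED
    return P_DESCENDING if e_dot < eps else P_CONVERGED

def error_rate(e_prev: float, e_curr: float, dt: float) -> float:
    """Finite-difference estimate of e_dot from two error samples dt seconds apart."""
    return (e_curr - e_prev) / dt

Entry = Union[int, str]

def matches(state: Sequence[int], pattern: Sequence[Entry]) -> bool:
    """True if a predicate tuple matches a pattern in which "X" means don't care."""
    return len(state) == len(pattern) and all(
        q == "X" or p == q for p, q in zip(state, pattern)
    )

# A controller whose error falls from 0.50 to 0.40 over 0.1 s is still descending.
p = predicate(0.40, error_rate(0.50, 0.40, 0.1))
print(p)                             # 0
print(matches((1, -1), ("X", -1)))   # True
```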

Figure 1 - The Search-Track schema, a sensorimotor schema constructed from the primitive closed-loop controllers \phi_0 and \phi_1, which finds and tracks a visual feature.

Figure 2 - The Grasp schema, a sensorimotor schema constructed from the primitive closed-loop controllers \phi_2 and \phi_3, which moves the fingers of a hand toward a closed position. If an object is present, the fingers envelop it and controller \phi_3 is executed to grasp it.

Search-Track is a schema constructed from two primitive, closed-loop controllers. Position controller \phi_0 moves a stereo pan/tilt head to a random configuration drawn from a probability distribution. A second position controller, \phi_1, moves a visual feature to the image center. Both objectives are addressed by actuating the pan/tilt motors of the stereo head, and together the two controllers can be used to construct a variety of behaviors. States \mathcal S=(p_0,p_1) and actions \phi_0 and \phi_1 define a Markov Decision Process (MDP), over which a control policy can be defined that finds and tracks a visual feature. Figure 1 shows the relevant states and non-zero state transitions for this schema. The resulting policy moves the head to random pan/tilt locations, by executing \phi_0, until a feature is found; when this event occurs, the feature is tracked in the center of the image plane. Note that the transition from state (1,0) to state (1,-1) captures the situation in which the visual feature moves out of the image plane.
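
The Search-Track policy described above can be sketched as a function from the predicate state (p_0, p_1) to a controller, run against a toy simulation. Everything about the simulation is hypothetical: the field of view FOV, the tracking gain GAIN, the uniform pan/tilt distribution for \phi_0, the assumption that \phi_0 reaches its target in one step, and the use of an error threshold TOL in place of the \dot{e} test. The sketch also does not model a feature that leaves the image.

```python
import random

FOV = 0.3    # hypothetical half-width of the camera's field of view (rad)
GAIN = 0.5   # hypothetical proportional gain for the tracking controller phi_1
TOL = 1e-3   # hypothetical convergence tolerance standing in for the e_dot test

def search_track_policy(state):
    """Select the next controller for Search-Track state (p0, p1)."""
    _, p1 = state
    # p1 == -1: no feature in the image, so keep searching with phi_0;
    # otherwise the feature is visible and phi_1 centers it.
    return "phi_0" if p1 == -1 else "phi_1"

def visible(head, feature):
    """True if the feature lies inside the field of view for this pan/tilt pose."""
    return all(abs(f - h) <= FOV for h, f in zip(head, feature))

def observe(head, feature):
    """Recompute (p0, p1); phi_0 is assumed to reach its target in one step, so p0 = 1."""
    if not visible(head, feature):
        return (1, -1)
    err = max(abs(f - h) for h, f in zip(head, feature))
    return (1, 1 if err < TOL else 0)

def simulate(feature, steps=500, seed=0):
    """Run the Search-Track policy against a static feature at pan/tilt `feature`."""
    rng = random.Random(seed)
    head = (0.0, 0.0)
    state = observe(head, feature)
    for _ in range(steps):
        if search_track_policy(state) == "phi_0":
            # phi_0: jump to a pan/tilt target drawn uniformly (assumed distribution)
            head = (rng.uniform(-1.5, 1.5), rng.uniform(-0.8, 0.8))
        else:
            # phi_1: proportional step that moves the feature toward the image center
            head = tuple(h + GAIN * (f - h) for h, f in zip(head, feature))
        state = observe(head, feature)
    return head, state

head, state = simulate(feature=(0.9, 0.2))
print(state)  # (1, 1) once the feature has been found and centered
```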

Figure 2 shows a Grasp schema that employs actions \phi_2 and \phi_3 on states \mathcal S=(p_2,p_3). Position controller \phi_2 flexes the fingers of the robot hand toward a reference "closed" configuration. Force controller \phi_3 applies a reference force at each fingertip contact so as to produce the wrench closure condition that defines a grasp. Assuming that the hand begins in the "open" configuration, Figure 2 specifies that the hand first closes by executing \phi_2. While closing, the hand may encounter an object that prevents further motion (states (1,0) or (1,1)). If the wrench closure condition is not yet satisfied when contact is made (state (1,0)), controller \phi_3 executes until state (1,1) is reached. State (1,-1) captures the situation in which the hand reaches the "closed" configuration without encountering an object. Under the right conditions, the Grasp schema yields grasps that allow the hand to move the target object.
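
The Grasp schema can likewise be written as a policy table plus a set of terminal states. This sketch assumes the schema starts in state (0,-1) (fingers closing, no contacts, so e_3 is undefined), which the text does not state explicitly, and it uses two hypothetical outcome models, WITH_OBJECT and EMPTY_HAND, in which each step runs a controller until the next control event changes the predicate state.

```python
START = (0, -1)      # assumed initial state: fingers closing, no contacts yet
CONTACT = (1, 0)     # phi_2 blocked by an object; wrench closure not yet asserted
GRASPED = (1, 1)     # contact made and wrench closure asserted
EMPTY = (1, -1)      # hand reached "closed" without encountering an object

POLICY = {START: "phi_2", CONTACT: "phi_3"}
TERMINAL = {GRASPED: "grasp", EMPTY: "no_object"}

# Hypothetical outcome models: (state, controller) -> state after the next control event.
WITH_OBJECT = {(START, "phi_2"): CONTACT, (CONTACT, "phi_3"): GRASPED}
EMPTY_HAND = {(START, "phi_2"): EMPTY}

def run_grasp(model, state=START, max_steps=10):
    """Execute the Grasp policy until a terminal state; return (outcome, state trace)."""
    trace = [state]
    for _ in range(max_steps):
        if state in TERMINAL:
            return TERMINAL[state], trace
        state = model[(state, POLICY[state])]
        trace.append(state)
    raise RuntimeError("Grasp schema did not reach a terminal state")

print(run_grasp(WITH_OBJECT))  # ('grasp', [(0, -1), (1, 0), (1, 1)])
print(run_grasp(EMPTY_HAND))   # ('no_object', [(0, -1), (1, -1)])
```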


Learning in the Control Basis

Machine learning techniques based on Markov Decision Processes (MDPs?), such as reinforcement learning (RL), are employed to learn policies that sequence control decisions so as to optimize reward. The learning algorithm solves the temporal credit assignment problem by assigning credit to the elements of a behavioral sequence that lead to reward. However, RL depends on stochastic exploration, and the state space of an interesting robot can be enormous. Moreover, any algorithm that depends on completely random exploration will take a long time to learn, and will occasionally do something terribly unfortunate in order to learn about the consequences. Below are several learning examples in the Control Basis framework.
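
The text does not name a specific RL algorithm; as one common choice, the sketch below applies tabular, epsilon-greedy Q-learning to a toy version of the Grasp MDP, with controllers as actions and predicate tuples as states. The transition model, the reward of 1 for reaching (1,1), and the parameters alpha, gamma, epsilon and episodes are all hypothetical. Reward arrives only at the end of the sequence; the bootstrapped update is what carries credit back to the earlier decision to close the hand.

```python
import random
from collections import defaultdict

ACTIONS = ("phi_2", "phi_3")
START, CONTACT, GRASPED = (0, -1), (1, 0), (1, 1)  # Grasp-schema predicate states

def step(state, action):
    """Hypothetical deterministic model of the Grasp schema: (next_state, reward, done)."""
    if state == START and action == "phi_2":
        return CONTACT, 0.0, False   # closing the hand makes contact with an object
    if state == CONTACT and action == "phi_3":
        return GRASPED, 1.0, True    # wrench closure asserted: the only rewarded event
    return state, 0.0, False         # any other choice leaves the predicates unchanged

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    """Tabular, epsilon-greedy Q-learning over (predicate state, controller) pairs."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0.0
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                          # explore
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])    # exploit
            nxt, reward, done = step(state, action)
            # Bootstrapped target: reward at GRASPED propagates back to START over episodes.
            future = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * future - Q[(state, action)])
            state = nxt
            if done:
                break
    return Q

Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in (START, CONTACT)}
print(policy)  # {(0, -1): 'phi_2', (1, 0): 'phi_3'}
```

Even in this four-state toy, the agent wastes steps on useless actions while exploring, which hints at why purely random exploration scales poorly to a real robot's state space.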


Learning a Rotate Gait


The first policy Thing learned was how to rotate in place. Here's a movie showing the training; the number in the lower left of the frame is the elapsed time of the learning episode. Our approach made it possible for Thing to learn this gait on-line in about 11 minutes, in a single training episode. Knowledge of the motor synergies involved in rotating under varying conditions significantly improved the acquisition of other behavior. For instance, with the prior rotate policy Thing learned to translate in roughly half the time it took without it, and ultimately the average amount of translation per action was roughly double as well. We believe this phenomenon is consistent with the kind of staged, sequential development that human neonates exhibit, as observed by developmental psychologists such as Piaget.











This MDP shows all of the statically stable configurations for Thing. The highlighted transitions indicate actions that can be taken to move between these states, and the resulting behavior is the rotate gait seen in the movie. The four states in this policy are shown below.
















Learning a Stable Grasp










A similar MDP can be constructed in which states represent all stable bimanual grasp configurations for Dexter. These include two-handed grasps as well as one-handed grasps in which a contact force is supplied by gravity.

















Here Dexter demonstrates the redundancy available in bimanual grasps. The robot is able to transition among stable grasps formed by both hands and grasps formed by one hand and gravity. The video demonstrates a learned policy for these transitions that allows the robot to move the ball to locations far to its side, which are not reachable while maintaining a bimanual grasp.