The Control Basis
Learning in the Control Basis
Machine learning techniques based on Markov Decision Processes (MDPs), such as reinforcement learning (RL), are employed to learn policies for sequencing control decisions in order to optimize reward. The learning algorithm solves the temporal credit assignment problem by associating credit with the elements of a behavioral sequence that lead to reward. However, RL depends on stochastic exploration, and the state space can be enormous for interesting robots. Moreover, any algorithm that depends on completely random exploration will take a long time to learn, and will occasionally do something terribly unfortunate in order to learn about the consequences. Below are a number of learning examples in the Control Basis framework.
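As a point of reference before those examples, here is a minimal sketch of tabular Q-learning, one common RL algorithm, on a toy problem. It is not the Control Basis implementation and has nothing to do with Thing or Dexter: the five-state chain, the `step` dynamics, the reward of 1.0 at the goal, and every parameter value are assumptions chosen only for illustration. The sketch shows how credit for a reward received at the end of a sequence propagates back to the earlier decisions that led to it, and it deliberately uses the purely random exploration discussed above.

```python
import random

# Hypothetical toy problem (not a robot model): a chain of states 0..4,
# where reaching state 4 ends the episode with reward 1.0.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Deterministic toy dynamics: move one cell, clamped to the chain."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning(episodes=300, alpha=0.5, gamma=0.9, max_steps=1000, seed=0):
    """Learn action values from uniformly random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # completely random behavior
            nxt, reward, done = step(state, action)
            # Temporal credit assignment: the value of the successor
            # state is discounted and backed up to the current choice.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Best-valued action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

With these settings the greedy policy selects "step right" in every non-terminal state. Even on five states, random exploration spends many steps wandering before it reaches the goal, and that cost grows quickly with the size of the state space.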
Learning a Rotate Gait
The first policy Thing learned was how to rotate in place. Here's a movie showing the training; the number in the lower left of the frame is the elapsed time of the learning episode. The approach we employed made it possible for Thing to learn this gait in about 11 minutes, on-line, in a single training episode. Knowledge of the motor synergies involved in rotating under varying conditions significantly improved the acquisition of other behavior. For instance, Thing learned to translate in roughly half the time when given the prior rotate policy than it did without it, and ultimately the average amount of translation per action was roughly double as well. We believe that this phenomenon is consistent with the kind of staged, sequential development that human neonates exhibit, as observed by developmental psychologists like Piaget.

This MDP shows all of the statically stable configurations for Thing. The highlighted transitions mark the actions that move the robot between these states. The resulting behavior is the rotate gait seen in the movie. The four states in this policy are shown below.

Learning a Stable Grasp

A similar MDP can be constructed where states represent all stable bimanual grasp configurations for Dexter. These grasps include 2-handed grasps as well as 1-handed grasps where a contact force is supplied from gravity.
Here Dexter demonstrates the redundancy available in bimanual grasps. The robot is able to transition between stable grasps in which the ball is held between both hands and grasps in which it is held between one hand and gravity. The video demonstrates a learned policy for these transitions that allows the robot to move the ball to locations far to its side that are not reachable while maintaining a bimanual grasp.


