Roderic Grupen
Andrew Barto
Carole Beal
Neil Berthier
Paul Cohen
Andrew Fagg
Rachel Keen
Departments of Computer Science and Psychology
University of Massachusetts Amherst
|
|
| Grasping Cylinders: Top Approach |
|
grasp_top.mov
|
|
| Grasping Cylinders: Side Approach (two fingers form a virtual finger) |
|
|
grasp_peanutbutter.mov
|
| Whole Body Grasping |
|
wbg_move_left.mov
|
| Learning Grasp Location Affordances |
| grasp_afford.mov |
Inspired by our infant development work, we have applied a reinforcement learning technique to the problem of discovering an appropriate sequence of grasp and place actions. Rather than starting with a model of which grasp is appropriate for a given final object configuration, the robot learned through interaction with the environment to select a grip in anticipation of how the grasped object would be used in subsequent actions. The behavior the robot exhibited during learning was qualitatively similar to the development of grip selection seen in children performing a similar task.
|
|
The robot is presented with a jar in one of two orientations. The task is to place the jar vertically with the top facing upwards. Through the course of interacting with the jar, the robot must discover 1) the sequence of actions that will accomplish the task and 2) the visual features that will allow it to select the appropriate sequence for a given situation. The only feedback the robot receives is a signal when the task has been completed properly.
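As a rough illustration of this kind of learning problem, the sketch below trains a tabular Q-learning agent on a toy version of the jar task. The page does not say which reinforcement learning algorithm was used, so Q-learning is a stand-in, and everything else here is an assumption made for illustration: the action names, the `step` dynamics, the rule that a grasp succeeds when the arm matches the jar orientation, and the fact that the orientation is observed directly rather than extracted from visual features. The only reward is given when the jar is placed correctly.

```python
import random
from collections import defaultdict

# Hypothetical action set for the toy task (not taken from the original system).
ACTIONS = ["grasp_left", "grasp_right", "place"]


def step(state, action):
    """Toy dynamics. state = (jar_orientation, arm_holding_jar_or_None).

    Returns (next_state, reward, done). Reward is 1.0 only when the jar is
    placed with the arm that matches its orientation (an assumed rule).
    """
    orientation, held = state
    if action == "place":
        if held is None:
            return state, 0.0, False  # nothing in hand, nothing happens
        success = held == orientation
        return state, (1.0 if success else 0.0), True
    arm = "left" if action == "grasp_left" else "right"
    return (orientation, arm), 0.0, False


def train(episodes=5000, alpha=0.2, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = (rng.choice(["left", "right"]), None)
        for _ in range(10):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_plan(q, orientation, max_steps=10):
    """Follow the learned greedy policy from a fresh presentation of the jar."""
    state, plan = (orientation, None), []
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        plan.append(action)
        state, _, done = step(state, action)
        if done:
            break
    return plan


q_table = train()
for jar_orientation in ("left", "right"):
    print(jar_orientation, greedy_plan(q_table, jar_orientation))
```

In this toy setting the discount factor is what makes the agent prefer the direct two-step sequence over longer ones that regrasp the jar; the real robot's state, actions, and learning method may differ substantially.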
The following movies show individual trials during the learning process. In the first few trials, the robot is presented with the jar in only one orientation; this leads to the development of a "reflexive" response of reaching with the left arm (independent of the visual inputs). In the remaining trials, the jar is oriented randomly, requiring the robot to integrate the visual inputs into its decision-making process.
| Left Presentation Only: Prior to Learning |
|
applesauce3-easy-other-final.mov
applesauce3-easy-other-final.mp4 applesauce3-easy-other-final.avi
|
| Left Presentation Only: Strategy After Learning |
|
applesauce1-easy-final.mov
|
| Both Orientations Presented: Before Learning |
|
applesauce4-hard-late-final.mov
applesauce4-hard-late-final.mp4 applesauce4-hard-late-final.avi
|
| Both Orientations Presented: During Learning |
|
applesauce8-hard-early-final.mov
applesauce8-hard-early-final.mp4 applesauce8-hard-early-final.avi
|
| Both Orientations Presented: After Learning |
|
applesauce7-hard-optimal-final.mov
applesauce7-hard-optimal-final.mp4 applesauce7-hard-optimal-final.avi
|
|
The remote teleoperation of robots is one of the dominant modes of
robot control in applications involving hazardous environments,
including space. Here, a user is equipped with an interface that
conveys the sensory information being collected by the robot and
allows the user to command the robot's actions. The difficulty with
this form of interface is that the user becomes fatigued, often
within a short period of time. To alleviate this
problem, we are working with our colleagues at the
NASA Johnson Space
Center to develop user interfaces that anticipate the
actions of the user, allowing the robot to perform parts of the task itself, or
even to learn how to perform entire tasks autonomously.
Our approach is to use our automatic control techniques to aid in the recognition of the user's actions. Prior to the user demonstration, the control system enumerates the different grasping actions that can be used for each object in the workspace (essentially, the robot "imagines" what it would feel like to pick up every object). The movements produced by the user are then compared against each of these imagined actions. The action that best matches the user-driven movement is taken to be the explanation of that movement. Using this technique, we are able to recognize entire sequences of actions.
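The matching step can be sketched as a nearest-candidate search. The code below is a minimal illustration under stated assumptions, not the control system described here: each "imagined" action is approximated by a straight-line reach from the hand's starting position to an object, and the mismatch measure is the mean point-to-point distance between the observed and imagined paths. The function names, object positions, and observed trajectory are all hypothetical.

```python
import math


def imagined_reach(start, goal, n):
    """Straight-line stand-in for a controller's predicted reach (assumption)."""
    return [
        tuple(s + (g - s) * i / (n - 1) for s, g in zip(start, goal))
        for i in range(n)
    ]


def mismatch(observed, imagined):
    """Mean Euclidean distance between corresponding trajectory samples."""
    return sum(math.dist(p, q) for p, q in zip(observed, imagined)) / len(observed)


def recognize(observed, objects):
    """Return the object whose imagined reach best explains the observed motion."""
    start = observed[0]
    scores = {
        name: mismatch(observed, imagined_reach(start, pos, len(observed)))
        for name, pos in objects.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical workspace: object positions in metres.
objects = {
    "blue ball": (0.5, 0.2, 0.0),
    "yellow ball": (0.5, -0.3, 0.0),
}

# Hypothetical hand positions sampled while a user teleoperates a reach.
observed = [
    (0.00, 0.00, 0.30),
    (0.12, 0.06, 0.22),
    (0.26, 0.09, 0.16),
    (0.37, 0.16, 0.07),
    (0.50, 0.20, 0.01),
]

best, scores = recognize(observed, objects)
print(best, scores)
```

Recognizing a whole sequence would additionally require segmenting the user's motion into individual movements and repeating this comparison for each segment, which this sketch leaves out.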
|
|
| Demonstration of a sequence by a user through a teleoperation interface. In this example, the extracted sequence is: pick up the blue ball, place it on the pink target, pick up the yellow ball, and place it on the orange target. |
|
sequence_learn_v2_demo.mov
sequence_learn_v2_demo_small.avi
|
| Automated replay of the same action sequence in a novel situation. Note that the movements are smoother and are executed more quickly than when the user is in control. |
|
sequence_learn_v2_D.mov
|
Last modified: Fri Mar 26 10:51:00 2004