While robots are extensively used in factories, our industry hasn't yet been able to prepare them for working in human environments - for instance in houses or in human-operated factories. The main obstacle to these applications lies in the amplitude of the uncertainty inherent to the environments humans are used to work in, and in the difficulty in programming robots to cope with it. For instance, in robot-oriented environments, robots can expect to find specific tools and objects in specific places. In a human environment, obstacles may force one to find a new way of holding a tool, and new objects appear continuously and need to be dealt with. As it proves difficult to build into robots the knowledge necessary for coping with uncertain environments, the robotics community is turning to the development of agents that acquire this knowledge progressively and that adapt to unexpected events.
This thesis studies the problem of vision-based robotic grasping in uncertain environments. We aim to create an autonomous agent that develops grasping skills from experience, by interacting with objects and with other agents. To this end, we present a 3D object model for autonomous, visuomotor interaction. The model represents grasping strategies along with visual features that predict their applicability. It provides a robot with the ability to compute grasp parameters from visual observations. The agent acquires models interactively by manipulating objects, possibly imitating a teacher. With time, it becomes increasingly efficient at inferring grasps from visual evidence. This behavior relies on (1) a grasp model representing relative object-gripper configurations and their feasibility, and (2) a model of visual object structure, which aligns the grasp model to arbitrary object poses (3D positions and orientations).
The visual model represents object edges or object faces in 3D by probabilistically encoding the spatial distribution of small segments of object edges or the distribution of small patches of object surface. A model is learned from a few segmented 3D scans or stereo images of an object. Monte Carlo simulation provides robust estimates of the object's 3D position and orientation in cluttered scenes.
The grasp model represents the likelihood of success of relative object-gripper configurations. Initial models are acquired from visual cues or by observing a teacher. Models are then refined autonomously by ``playing' with objects and observing the effects of exploratory grasps. After the robot has learned a few object models, learning becomes a combination of cross-object generalization and interactive experience: grasping strategies are generalized across objects that share similar visual substructures; they are then adapted to new objects through autonomous exploration.
The applicability of our model is supported by numerous examples of pose estimates in cluttered scenes, and by a robot platform that shows increasing grasping capabilities as it explores its environment.