Products of experts for robotic manipulation.
POLONEZ BIS-1 project financed by the National Science Centre (NCN)
-
Principal Investigator (fellow): dr Marek Kopicki, e-mail: marek.kopicki(at)put.poznan.pl, tel: +48 616652809
Investigators: dr Sainul Ansary, Jakub Chudziński, Piotr Michałek
Mentor: prof. dr hab. inż. Piotr Skrzypczyński
Examples of manipulation scenarios (a-c), in columns: grasping a low rigid object (a) or a sheet of paper/cloth (b) from a table/flat surface/another object; opening a door (c). Point-based contact models (d-f): demonstration of point-based feature contacts, one for each robot link (d); transfer of contacts to a novel point cloud and the performed grasp (e, f).
Autonomous robotic manipulation in novel situations is widely considered an unsolved problem. Solving it would have an enormous impact on society and the economy. It would enable robots to help humans in factories and warehouses, but also at home with everyday tasks.
This project will advance the state of the art in learning algorithms, enabling a robot to autonomously perform complex dexterous manipulation in unfamiliar real-world situations, with novel objects and significant occlusions. In particular, the robot will grasp, push or pull rigid objects, and to some extent also non-rigid and articulated objects that can be found at home or in warehouses. Importantly, similarly to humans, the robot will also be able to envision the results of its actions without actually performing them. It could plan its actions, e.g. tilt a low object to be able to grasp it, bend a cloth before gripping and pulling it, slide a sheet of paper to the edge of a table, or open a door.
End-to-end/black-box approaches have made it possible to learn some of these complex tasks with almost no prior knowledge, for example by learning a direct mapping from RGB images to motor commands. While this is impressive, end-to-end manipulation learning typically requires prohibitive amounts of training data, scales poorly across tasks, does not involve prediction, and struggles to explain its behaviour, e.g. why a learned skill fails. On the other hand, it has been shown that exploiting the problem structure can help learning. For example, in bin-picking scenarios many successful approaches assume at least depth images or trajectory planners for robot control. In autonomous driving, adding representations such as depth estimation or semantic scene segmentation makes it possible to achieve the highest task performance.
We proposed a modularisation of the learned task which opens up black boxes and makes them explainable. We introduced the first algorithm capable of learning dexterous grasps of novel objects from just one demonstration. It models grasps as a product of independent experts: generative models of object-hand link contacts. This accelerates learning and provides a flexible problem structure. Each model is a density over possible SE(3) poses of a particular robotic link, trained from one or more demonstrations. Grasps can be selected according to the maximum likelihood of the product of the involved contact models, with their relative poses controlled by manifolds in hand posture space. Furthermore, we introduced an algorithm which can learn accurate kinematic predictions in pushing and grasping solely from experience, without any knowledge of physics.

However, our current algorithms and models either do not involve prediction (in grasping), are insensitive to task context and not robust to occlusions, or assume object exemplars (in prediction). In this project, we will overcome all these limitations by introducing hierarchical models which rely on CNN features and which can represent visible and occluded parts of objects, both hand link-object and object-object contacts, and the entire manipulation context. The CNN can be trained offline, while our hierarchical models can be efficiently trained during demonstration. Furthermore, we will enable learning of quantitative generative models of the SE(3) motion of objects for grasping, pushing and pulling actions. They can be trained from demonstration and experience, from depth and RGB images, all within a single algorithm. Finally, we will attempt to model the dynamical properties of contacts and learn from experience the maximum resistible wrench, i.e. the largest wrench that does not break a hand-object contact. This would enable prediction of slippage for a given hand-object contact in pushing, pulling and grasping.
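As a rough illustration of the product-of-experts grasp selection described above, the sketch below sums per-link contact log densities and picks the maximum-likelihood candidate grasp. The kernel form, the ContactExpert class and all parameter values are illustrative assumptions made for this sketch, not the project's actual models or API.

```python
# Minimal sketch: grasp selection as a product of experts, where each expert is
# a kernel density over link poses (simplified here to position + unit quaternion).
import numpy as np

class ContactExpert:
    """Density over poses of one robot link, built from demonstrated contacts."""
    def __init__(self, demo_positions, demo_quats, sigma_p=0.01, sigma_q=0.1):
        self.p = np.asarray(demo_positions)   # (K, 3) demonstrated link positions
        self.q = np.asarray(demo_quats)       # (K, 4) demonstrated unit quaternions
        self.sigma_p = sigma_p
        self.sigma_q = sigma_q

    def log_density(self, position, quat):
        """Kernel density estimate evaluated at a candidate link pose."""
        dp = np.linalg.norm(self.p - position, axis=1)
        dq = 1.0 - np.abs(self.q @ quat)      # sign-invariant quaternion distance
        w = np.exp(-0.5 * (dp / self.sigma_p) ** 2) * np.exp(-dq / self.sigma_q)
        return np.log(w.mean() + 1e-12)

def grasp_log_likelihood(experts, candidate_link_poses):
    """Product of experts = sum of per-link log densities."""
    return sum(e.log_density(p, q) for e, (p, q) in zip(experts, candidate_link_poses))

def select_grasp(experts, candidate_grasps):
    """Pick the candidate grasp (one pose per link) with maximum product likelihood."""
    scores = [grasp_log_likelihood(experts, g) for g in candidate_grasps]
    return candidate_grasps[int(np.argmax(scores))], max(scores)

if __name__ == "__main__":
    # Toy usage: two link experts and two candidate grasps (one pose per link).
    rng = np.random.default_rng(0)
    quat = np.array([0.0, 0.0, 0.0, 1.0])
    experts = [ContactExpert(rng.normal(scale=0.01, size=(20, 3)) + c,
                             np.tile(quat, (20, 1)))
               for c in ([0.1, 0.0, 0.0], [-0.1, 0.0, 0.0])]
    candidates = [
        [(np.array([0.1, 0.0, 0.0]), quat), (np.array([-0.1, 0.0, 0.0]), quat)],
        [(np.array([0.3, 0.0, 0.0]), quat), (np.array([-0.3, 0.0, 0.0]), quat)],
    ]
    best, score = select_grasp(experts, candidates)
```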
-
DepthDeepSDF
DepthDeepSDF performs the normalisation directly in the depth-image “space”, with (x, y) image coordinates and a depth coordinate z measured along the “depth” projection ray of the pinhole camera model.
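A minimal sketch of this parametrisation, assuming a pinhole camera with known intrinsics: a point is indexed by its pixel (u, v) and a depth z measured along the projection ray, and z is normalised to a fixed working range. The intrinsics, the normalisation range and all function names are assumptions for illustration, not DepthDeepSDF's implementation.

```python
# Sketch of the assumed depth-image-space parametrisation for a pinhole camera.
import numpy as np

def pixel_ray(u, v, fx, fy, cx, cy):
    """Unit direction of the projection ray through pixel (u, v)."""
    d = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return d / np.linalg.norm(d)

def point_on_ray(u, v, z, fx, fy, cx, cy):
    """3D point at distance z along the projection ray of pixel (u, v)."""
    return z * pixel_ray(u, v, fx, fy, cx, cy)

def normalise_depth(z, z_min=0.2, z_max=2.0):
    """Map the depth coordinate into [0, 1] within an assumed working range."""
    return (z - z_min) / (z_max - z_min)

# Example: back-project one pixel of a 640x480 depth image (assumed intrinsics).
fx = fy = 525.0
cx, cy = 319.5, 239.5
p = point_on_ray(320, 240, 0.8, fx, fy, cx, cy)
zn = normalise_depth(0.8)
```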
DeepSDF vs DepthDeepSDF performance comparison for the bottle category (distance values – the smaller, the better). Right: an example grasp performed by a Franka robotic arm with a 2-finger gripper.
Preliminary results of 3D reconstruction of occluded scenes and object parts were published and presented in “3D Object Reconstruction from a Single Depth View – Comparative Study” (Proc. PP-RAI 2023).
Grasping with an underactuated hand
Depending on the complexity of the object and the dynamics of the gripper fingers, the object can start moving. Still, our approach enables contact learning and planning that lead to successful grasps (figure, far right).
The example grasp trajectory begins with an initial set of contacts (left, visualised as a red region) and leads to an equilibrium state and contacts (middle). In general, apparently complex dynamics/trajectories often lead to a similar equilibrium state (right, at time tc).
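A minimal sketch of how such an equilibrium state could be detected during grasp execution, assuming object and finger-link poses are tracked over time: declare equilibrium (the time tc) once the tracked points stop moving beyond a small threshold for a short window. The thresholds, window length and function names are illustrative assumptions, not the project's method.

```python
# Sketch: detect the contact equilibrium time tc from a stream of tracked
# object/finger-link points recorded during grasp execution.
import numpy as np

def find_equilibrium(positions, timestamps, pos_tol=1e-3, window=10):
    """Return the first timestamp after which all tracked points stay within
    pos_tol for `window` consecutive samples, or None if motion never settles.

    positions: (T, N, 3) array of N tracked points over T time steps.
    timestamps: (T,) array of sample times.
    """
    positions = np.asarray(positions)
    for t in range(len(positions) - window):
        chunk = positions[t:t + window]
        # maximum displacement of any tracked point within the window
        motion = np.linalg.norm(chunk - chunk[0], axis=-1).max()
        if motion < pos_tol:
            return timestamps[t]          # candidate tc: motion has settled
    return None
```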
Hardware setup
Franka Emika Panda (FR3) robot on a workbench with a custom-made gripper for dexterous manipulation.
This research is part of the project No 2021/43/P/ST6/01921 co-funded by the National Science Centre and the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 945339.
This project is affiliated at the Robotics Division, Institute of Robotics and Machine Intelligence.
Poznan University of Technology.