Learning to Dexterously Pick or Separate Tangled-Prone Objects for Industrial Bin Picking

Xinyi Zhang1, Yukiyasu Domae2, Weiwei Wan1, Kensuke Harada1,2

1Osaka University, 2National Institute of Advanced Industrial Science and Technology (AIST)

Industrial bin picking for tangled-prone objects requires the robot to either pick up untangled objects or perform separation manipulation when the bin contains no isolated objects. The robot must flexibly select the appropriate action based on the current observation. This is challenging due to heavy occlusion in the clutter, elusive entanglement phenomena, and the need for skilled manipulation planning. In this paper, we propose an autonomous, effective, and general approach to picking up tangled-prone objects for industrial bin picking. First, we learn PickNet - a network that maps the visual observation to pixel-wise possibilities of picking isolated objects or separating tangled objects and infers the corresponding grasp. Then, we propose two effective separation strategies: dropping the entangled objects into a buffer bin to reduce the degree of entanglement, and pulling to separate the entangled objects in the buffer bin as planned by PullNet - a network that predicts the position and direction for pulling from visual input. To efficiently collect training data for PickNet and PullNet, we embrace the self-supervised learning paradigm using an algorithmic supervisor in a physics simulator. Real-world experiments show that our policy can dexterously pick up tangled-prone objects with a success rate of 90%. We further demonstrate the generalization of our policy by picking a set of unseen objects.

Paper

Latest version: here (with supplementary material)

IEEE Robotics and Automation Letters (RA-L), 2023.

Code is available on Github.

Supplementary Video

Results of predicting isolated/tangled objects using PickNet

PickNet outputs two heatmaps, PickMap and SepMap, which respectively indicate the grasp affordance of isolated and tangled objects in the clutter.
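A minimal sketch of how the two heatmaps could drive action selection. The threshold and tie-breaking rule here are illustrative assumptions, not the paper's exact policy:

```python
import numpy as np

def select_action(pickmap: np.ndarray, sepmap: np.ndarray, pick_thresh: float = 0.5):
    """Choose between picking an isolated object and separating tangled ones.

    pickmap/sepmap are HxW affordance heatmaps in [0, 1] (as output by
    PickNet); the 0.5 threshold is a hypothetical choice for illustration.
    Returns the action name and the selected (x, y) grasp pixel.
    """
    # Locate the peak of each heatmap.
    py, px = np.unravel_index(np.argmax(pickmap), pickmap.shape)
    sy, sx = np.unravel_index(np.argmax(sepmap), sepmap.shape)
    # Prefer picking an isolated object when its affordance is both
    # confident and no worse than the separation affordance.
    if pickmap[py, px] >= pick_thresh and pickmap[py, px] >= sepmap[sy, sx]:
        return "pick", (px, py)
    return "separate", (sx, sy)
```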

Results of planning pulling actions using PullNet

We rotate the input depth image by 8 angles, each representing one of 8 pulling directions (pointing to the right). Among the PullNet outputs (PullMaps), we select the rotation with the highest value.
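The rotate-and-score step above can be sketched as follows. Here `pullnet` is a stand-in for the trained network (depth image in, PullMap heatmap out), and mapping the selected pixel back to the original image frame is omitted for brevity:

```python
import numpy as np
from scipy.ndimage import rotate

def plan_pull(depth, pullnet, n_dirs=8):
    """Score n_dirs candidate pulling directions by rotating the depth
    image so each candidate direction points to the right, then keep
    the rotation whose PullMap peaks highest. A sketch: `pullnet` is
    assumed to map an image to a heatmap of the same shape."""
    best_score, best_angle, best_pos = -np.inf, 0.0, (0, 0)
    for k in range(n_dirs):
        angle = 360.0 * k / n_dirs
        rotated = rotate(depth, angle, reshape=False, order=1)
        pullmap = pullnet(rotated)
        y, x = np.unravel_index(np.argmax(pullmap), pullmap.shape)
        if pullmap[y, x] > best_score:
            best_score, best_angle, best_pos = pullmap[y, x], angle, (x, y)
    # best_pos lives in the rotated frame; a real system would map it
    # back into the original image before commanding the robot.
    return best_pos, best_angle, best_score
```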

Data collection in simulation

Simulated picking demonstration using various objects

The following videos show the demonstrations for data collection in a physics simulator.

The following figures show the ground truth of the PickNet/PullNet training data.
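For illustration, one common way to encode such pixel-wise ground truth is to place a Gaussian blob at each grasp pixel labeled by the simulated demonstrations; the paper's exact target encoding may differ:

```python
import numpy as np

def make_target_map(shape, points, sigma=5.0):
    """Build a hypothetical ground-truth heatmap with a Gaussian blob
    at each labeled (x, y) grasp pixel. sigma controls blob width;
    overlapping blobs are merged with a pixel-wise maximum."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    target = np.zeros(shape)
    for (x, y) in points:
        blob = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
        target = np.maximum(target, blob)
    return target
```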

Separation strategy 1: Dropping entangled objects into a buffer bin to separate them.

The following videos show the robot dropping entangled objects into the buffer bin.

Seen objects

Unseen objects

Separation strategy 2: Pulling to disentangle various objects.

The following videos show the robot untangling the objects by pulling combined with wiggling.

Seen objects

Unseen objects

Video: PullNet can plan pulling actions for various patterns of entanglement.

We test PullNet on different entanglement patterns in the buffer bin.

Seen object

Unseen object

Failure modes and limitations

Grasp failure

Challenging entanglement patterns

Unsuitable object shapes

Acknowledgements

This work was supported by Toyota Motor Corporation.