Robot trajectories used for learning end-to-end robot policies typically contain end-effector and gripper position, workspace images, and language. Policies learned from such trajectories are unsuitable for delicate grasping, which requires tightly coupled, precise gripper force and gripper position. We collect and make publicly available 130 trajectories with force feedback of successful grasps on 30 unique objects. Our current-based method for sensing force, albeit noisy, is gripper-agnostic and requires no additional hardware. We train and evaluate two diffusion policies: one with the collected force feedback (forceful) and one without (position-only). We find that forceful policies outperform position-only policies for delicate grasping and generalize to unseen delicate objects, while reducing grasp policy latency by nearly 4x relative to LLM-based methods. With these promising results on limited data, we hope to encourage others to invest in collecting force and similar tactile information in new datasets, enabling more robust, contact-rich manipulation in future robot foundation models.
We introduce a dataset (in RLDS format) of 130 successful adaptive grasp trajectories across 30 unique objects spanning two orders of magnitude in mass (1 g to 500 g) and varying deformability. Data are collected at 5 Hz from a MAGPIE gripper on a UR5 robot arm with a wrist-mounted RealSense D405 camera and a RealSense D435 camera overlooking a 55 cm square table.
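To make the trajectory contents concrete, here is a minimal sketch of what one RLDS-style step in such a dataset might look like. The field names, shapes, and action layout are assumptions inferred from the description above, not the released schema.

```python
# Illustrative single step of an RLDS-style grasp trajectory.
# All field names and shapes are assumptions, not the released schema.
import numpy as np

def make_step(instruction: str) -> dict:
    return {
        "observation": {
            "wrist_image": np.zeros((480, 640, 3), np.uint8),     # RealSense D405
            "overhead_image": np.zeros((480, 640, 3), np.uint8),  # RealSense D435
            "gripper_position": np.float32(0.0),  # gripper aperture, mm
            "applied_force": np.float32(0.0),     # commanded force, N
            "contact_force": np.float32(0.0),     # measured from current draw, N
        },
        "language_instruction": instruction,
        "action": np.zeros(2, np.float32),  # e.g. gripper position and force targets
        "is_terminal": False,
    }
```

A position-only variant of the dataset would simply drop the two force fields from the observation.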
To collect expert demonstrations, we employ DeliGrasp, which uses LLM-estimated object mass, friction coefficient, and spring constant as parameters in a proportional controller that increases applied force and gripper closure until a target contact force is measured. We command applied force by incrementing the torque limit on a Dynamixel motor (an equivalent, actuator-agnostic approach would be to increase supply current), and we measure contact force from the increased current draw.
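The control loop above can be sketched as follows. This is a minimal illustration of a proportional adaptive-grasp controller, not the DeliGrasp implementation; the function names, gains, and step sizes are assumptions.

```python
# Sketch of a DeliGrasp-style proportional grasp controller: step applied
# force and gripper closure until the measured contact force reaches a target.
# Names and constants are illustrative assumptions.

def adaptive_grasp(estimate_contact_force, target_force,
                   k_p=0.5, f_init=0.1, dx=0.5, max_steps=100):
    """Close the gripper until contact force reaches target_force.

    estimate_contact_force: callback mapping (closure_mm, applied_force_N)
        to a measured contact force in newtons (e.g. from motor current draw).
    Returns the final (closure_mm, applied_force_N).
    """
    closure, applied = 0.0, f_init
    for _ in range(max_steps):
        contact = estimate_contact_force(closure, applied)
        if contact >= target_force:
            return closure, applied  # stop: target contact force measured
        # Proportional update: step applied force toward the target and
        # close the gripper by a fixed increment.
        applied += k_p * (target_force - contact)
        closure += dx
    return closure, applied
```

Modeling an object as a linear spring that engages at a 20 mm aperture, `adaptive_grasp(lambda x, f: 0.2 * max(0.0, x - 20.0), 1.5)` closes until the simulated contact force reaches 1.5 N.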
We train diffusion policies on this data: one with the full feature set and one without force (position-only). In our experiments we deploy and evaluate the policies only during the stationary grasp portion of a trajectory. We qualify deformation failures manually, on a per-object, common-sense basis, and check for slip by raising the gripper vertically by 10 cm. Because the average adaptive grasp in the dataset completes in under 10 steps, for one "grasp" we roll out the policy for 15 steps at 4 Hz.
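The fixed-horizon rollout described above can be sketched as a simple timed loop. The policy and I/O interfaces here are assumptions for illustration, not our evaluation code.

```python
# Sketch of the fixed-horizon grasp evaluation: 15 policy steps at 4 Hz.
# The policy/observation/action interfaces are illustrative assumptions.
import time

def rollout_grasp(policy, get_observation, send_action,
                  n_steps=15, hz=4.0, sleep=time.sleep):
    """Run the stationary grasp portion of a trajectory for n_steps steps."""
    period = 1.0 / hz
    actions = []
    for _ in range(n_steps):
        t0 = time.monotonic()
        obs = get_observation()   # images, gripper position, (force feedback)
        action = policy(obs)      # e.g. next gripper position / force target
        send_action(action)
        actions.append(action)
        sleep(max(0.0, period - (time.monotonic() - t0)))  # hold the 4 Hz rate
    return actions
```

Injecting `sleep` makes the loop testable without real-time waits, and the per-step timing keeps the control rate close to 4 Hz regardless of policy inference latency.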
Across all objects, we find that forceful policies (82% success) outperform position-only policies (54% success). Position-only policies remain capable, perhaps because their training trajectories are artifacts of forceful adaptive grasping, just recorded without the force feedback, so the control law may be implicitly learned from vision, gripper position, and task instruction alone. Forceful policies generalize to unseen objects (80% success, versus 85% on seen objects), and position-only policies even improve on withheld objects (60%, up from 45%), potentially because relatively stiff objects like the egg and potato chip are forgiving of additional compression.
In the figure above we depict per-object grasp trajectories and forces, and observe that position-only policies uniformly compress objects more than forceful policies do. Position-only policies close the gripper more aggressively at the start and often continue aggressive closure past contact, resulting in deformation failures. Forceful policies flatten applied force as contact force increases for some objects (pepper, empty taco, blackberry, tomato), showing vestiges of the proportional control law used in the expert demonstrations; however, the policies still apply more force than is typically needed and have not fully learned the control characteristics. Additionally, while the objects span a large range of final gripper positions (5 to 65 mm), final applied forces lie in a narrower range (1.1 N to 2.3 N).
What's next?
We're trying to isolate and fully characterize the utility of force feedback in end-to-end robot policies.
Why this weird font?
The website template is adapted from Language to Rewards and ProgPrompt.