Tactile sensing, the sense of touch, is essential for robots to understand and interact effectively with their environment. Existing solutions for vision-based tactile sensors, however, are typically developed for a specific task and sensor type, and the diversity of sensors in shape, lighting, and surface markings makes universal solutions hard to build and inefficient to scale across applications. Moreover, collecting labeled data for critical properties such as force and slip is time-consuming and resource-intensive, further limiting the reach of tactile sensing technology in wide-ranging applications.
Meta AI has introduced Sparsh, a universal encoder for vision-based tactile sensors. "Sparsh," the Sanskrit word for "touch," represents a shift away from sensor-specific models towards a more flexible, scalable approach. Sparsh leverages advancements in self-supervised learning (SSL) to create tactile representations applicable to a wide array of vision-based tactile sensors. Unlike previous approaches that rely on task-specific, labeled data, Sparsh is trained on over 460,000 unlabeled tactile images sourced from various tactile sensors. By eliminating the need for labels, Sparsh opens possibilities for applications beyond the capabilities of conventional tactile models.
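To make the idea concrete, the minimal sketch below shows how a frozen vision backbone can turn raw, unlabeled tactile frames into general-purpose embeddings with no labels in the loop. It uses an off-the-shelf torchvision ViT as a stand-in for the actual Sparsh encoder; the backbone choice, input resolution, and random input tensors are illustrative assumptions, not the released model or its checkpoints.

```python
# Illustrative only: a generic ViT backbone standing in for a Sparsh-style
# tactile encoder. The 224x224 input size and ViT-B/16 choice are assumptions.
import torch
from torchvision.models import vit_b_16

# Build a ViT and strip the classification head so the forward pass
# returns a feature embedding rather than class logits.
encoder = vit_b_16(weights=None)
encoder.heads = torch.nn.Identity()
encoder.eval()

# Random stand-in for a batch of unlabeled tactile frames (e.g., DIGIT or
# GelSight images) resized to the backbone's resolution. No labels involved.
tactile_frames = torch.rand(8, 3, 224, 224)

with torch.no_grad():
    embeddings = encoder(tactile_frames)  # shape: (8, 768)

print(embeddings.shape)
```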
Sparsh is built on state-of-the-art SSL models such as DINO and the Joint-Embedding Predictive Architecture (JEPA), adapted to the tactile domain. This allows Sparsh to generalize across sensor types, such as DIGIT and GelSight, and to perform well across multiple tasks. The encoder family, pre-trained on more than 460,000 tactile images, serves as a foundation that reduces reliance on manually labeled data and makes downstream training more efficient. The Sparsh framework also includes TacBench, a benchmark of six touch-centric tasks: force estimation, slip detection, pose estimation, grasp stability, texture recognition, and dexterous manipulation. On these tasks, Sparsh models outperform conventional sensor-specific solutions by an average of over 95% while using only 33-50% of the labeled data those models require.
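The sketch below illustrates the frozen-encoder-plus-probe setup this style of benchmarking implies: the pretrained encoder stays fixed and only a small task head is trained on a limited labeled subset (here, a hypothetical 3-axis force-regression head). The head architecture, hyperparameters, and synthetic tensors are assumptions for illustration, not TacBench's actual protocol.

```python
# Sketch of a frozen-encoder probe: only the small head is trained,
# on a limited labeled subset. Sizes and data below are placeholders.
import torch
from torch import nn

embed_dim = 768            # matches the ViT-B stand-in above (assumption)
force_head = nn.Sequential(
    nn.Linear(embed_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 3),     # normal force + two shear components
)

optimizer = torch.optim.AdamW(force_head.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# Stand-ins for embeddings precomputed with the frozen encoder and for
# ground-truth force labels; in practice only a fraction (e.g., 33-50%)
# of the labeled data would be used.
embeddings = torch.randn(256, embed_dim)
forces = torch.randn(256, 3)

for epoch in range(10):
    pred = force_head(embeddings)
    loss = criterion(pred, forces)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```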
The implications of Sparsh are significant, particularly for robotics, where tactile sensing plays a crucial role in physical interaction and dexterity. By removing the dependence on task-specific labeled data, Sparsh paves the way for more advanced applications, including in-hand manipulation and dexterous planning. In the TacBench evaluations, Sparsh outperforms end-to-end task-specific models by an average of over 95%, meaning robots equipped with Sparsh-powered tactile sensing can better interpret their physical environment even with minimal labeled data. Sparsh has also proven highly effective on individual tasks, including slip detection, where it achieved the highest F1 score among the tested models, and texture recognition, offering a robust foundation for real-world robotic manipulation.
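As a rough illustration of how a slip-detection result like this is scored, the snippet below runs a placeholder binary probe over frozen tactile embeddings and reports an F1 score. The probe, threshold, and random data are purely hypothetical; only the metric itself (F1 on slip vs. no-slip predictions) comes from the description above.

```python
# Sketch of a slip-detection evaluation scored with F1.
# All data and the untrained probe below are placeholders.
import torch
from sklearn.metrics import f1_score

torch.manual_seed(0)

# Stand-ins for frozen-encoder embeddings of tactile sequences and labels.
embeddings = torch.randn(200, 768)
labels = torch.randint(0, 2, (200,))   # 1 = slip, 0 = stable grasp

# A trained binary probe would normally be used; a random linear layer
# stands in here just to show the evaluation flow.
probe = torch.nn.Linear(768, 1)
with torch.no_grad():
    preds = (torch.sigmoid(probe(embeddings)).squeeze(1) > 0.5).long()

print("slip-detection F1:", f1_score(labels.numpy(), preds.numpy()))
```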
Meta's introduction of Sparsh marks a significant step towards advancing physical intelligence through AI. By releasing this family of universal touch encoders, Meta aims to empower the research community to develop scalable solutions for robotics and AI. Sparsh's use of self-supervised learning allows it to bypass the expensive and laborious process of collecting labeled data, providing a more efficient path towards sophisticated tactile applications. Its ability to generalize across tasks and sensors, as demonstrated by its performance on the TacBench benchmark, highlights its transformative potential. As Sparsh gains wider adoption, we could see advances in fields ranging from industrial robotics to household automation, where physical intelligence and tactile precision are essential for effective performance. Sparsh could thus make a vital contribution to developing robots capable of tackling complex real-world tasks.