en.Wedoany.com Reported - Tashi Zhihang, in collaboration with the National University of Singapore, Shanghai Jiao Tong University, the Institute of Automation of the Chinese Academy of Sciences, and Fudan University, has published a paper titled "TacForeSight: Force-Guided Tactile World Model for Contact-Rich Manipulation" on a preprint platform. The research proposes a force-conditioned tactile world model that, for the first time, uses wrist force signals as a prior on future tactile states, predicts short-term contact evolution, and feeds those predictions into the robot's action generation pipeline.
In contact-rich manipulation tasks such as wiping, plugging, and tightening, contact states change continuously, and small deviations in force magnitude or position can easily cause the task to fail. Existing methods often rely on feedback signals and adjust only after the contact has already changed. The core idea of TacForeSight is to exploit the temporal relationship between force and tactile sensing: wrist force provides a leading signal for overall force trends, while tactile sensing reflects local contact details. Based on this, the team built the core module, TacForceWM, which encodes dual-finger tactile fields into compact tactile latent variables and uses high-frequency wrist force or torque signals to predict how those latent variables will evolve over the short term. Predicting in latent space avoids the computational cost of generating high-dimensional tactile images and lets the predictive information feed a lightweight action policy.

After predicting future tactile states, the system employs a Predictive Tactile-Conditioned Policy, which uses a cross-attention mechanism to explicitly model the relationship between the current contact and its predicted future trend. Action generation can therefore account for both the current contact and impending contact changes. In addition, a tactile-driven adaptive gating mechanism dynamically adjusts the weights of vision and tactile sensing according to the task phase: it emphasizes tactile control during contact-rich phases and relies on visual information during non-contact phases.
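The two mechanisms described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`cross_attend`, `tactile_gate`, `fuse`), the sigmoid form of the gate, the `threshold` and `sharpness` parameters, and the toy two-dimensional vectors are all assumptions chosen for illustration. The current tactile latent acts as the query over predicted future latents, and a contact-driven gate blends tactile and visual features.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def tactile_gate(contact_signal, threshold=1.0, sharpness=4.0):
    """Hypothetical sigmoid gate in (0, 1): near 1 in contact, near 0 in free space."""
    return 1.0 / (1.0 + math.exp(-sharpness * (contact_signal - threshold)))

def fuse(vision_feat, tactile_feat, gate):
    """Convex combination: gate weights tactile, (1 - gate) weights vision."""
    return [gate * t + (1.0 - gate) * v for v, t in zip(vision_feat, tactile_feat)]

# Toy data (assumed): one current tactile latent and three predicted future latents.
current_latent = [1.0, 0.0]
future_latents = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context = cross_attend(current_latent, future_latents, future_latents)

vision_feat = [1.0, 1.0]
g_free = tactile_gate(0.0)      # no contact: gate close to 0, vision dominates
g_contact = tactile_gate(3.0)   # firm contact: gate close to 1, tactile dominates
fused_free = fuse(vision_feat, context, g_free)
fused_contact = fuse(vision_feat, context, g_contact)
```

In this sketch the gate depends on a single scalar contact signal; a learned gate would typically be computed from the tactile features themselves.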


Experiments were conducted on a real robotic platform comprising a robotic arm, gripper, camera, six-axis force/torque sensor, and dual-finger tactile sensors, and covered five typical contact-rich tasks: vase wiping, card sliding, pipe insertion, bulb tightening, and flexible wire harness insertion. Results show an average completion rate of nearly 80% on the standard tasks, outperforming pure vision models, simple vision-tactile-force fusion, and baseline methods such as KineDex, FoAR, and RDP. Under dynamic perturbations in height, angle, and posture, completion rates were 90%, 85%, and 85%, respectively, averaging 86.7%. The model supports real-time inference at 20 Hz and can be embedded in high-frequency robot closed-loop control.
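As a quick check of the reported figures, the short calculation below reproduces the 86.7% average from the three perturbation results and converts the 20 Hz inference rate into a per-step time budget. The variable names are illustrative only.

```python
# Completion rates under dynamic perturbation, as reported in the article.
perturbation_rates = {"height": 90.0, "angle": 85.0, "posture": 85.0}
mean_rate = sum(perturbation_rates.values()) / len(perturbation_rates)

# A 20 Hz inference rate leaves this many milliseconds per inference step.
inference_hz = 20.0
period_ms = 1000.0 / inference_hz

print(f"mean completion rate: {mean_rate:.1f}%")   # 86.7%
print(f"time budget per step: {period_ms:.0f} ms")  # 50 ms
```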

Visualization of the latent variables shows that, in the bulb tightening and vase wiping tasks, the predicted tactile latent variables exhibit contact-related changes approximately 200 milliseconds earlier than the current tactile latent variables. On unseen force-tactile interaction segments such as pressing, twisting, and sliding, the latent variables extracted by the tactile encoder form separable clusters in t-SNE visualization, indicating that the model can discriminate between contact patterns. The work marks another advance for Tashi Zhihang in fine manipulation; in March, it released the OmniVTA visuo-tactile manipulation framework and the OmniViTac large-scale visuo-tactile dataset to help robots understand contact through vision and touch.
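A lead time such as the roughly 200 milliseconds mentioned above could be estimated by finding the time shift that best aligns the predicted and current latent traces. The sketch below is one simple way to do that and is not taken from the paper: the function `estimate_lead`, the synthetic step signals, and the assumed 20 Hz logging rate are all hypothetical.

```python
def mean_squared_error(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def estimate_lead(predicted, current, max_lag):
    """Return the shift k (in samples) minimizing the error between
    predicted[t] and current[t + k], i.e. how far `predicted` leads."""
    best_k, best_err = 0, float("inf")
    for k in range(max_lag + 1):
        n = len(current) - k
        err = mean_squared_error(predicted[:n], current[k:])
        if err < best_err:
            best_k, best_err = k, err
    return best_k

# Synthetic 1-D traces (assumed): contact shows up at sample 10 in the
# current latent, but already at sample 6 in the predicted latent.
current_trace = [0.0] * 10 + [1.0] * 10
predicted_trace = [0.0] * 6 + [1.0] * 14

sample_rate_hz = 20.0  # assumption: latents logged at the inference rate
lead_samples = estimate_lead(predicted_trace, current_trace, max_lag=8)
lead_ms = lead_samples * 1000.0 / sample_rate_hz
```

With these synthetic traces the best shift is 4 samples, which corresponds to 200 ms at the assumed 20 Hz. Real latent traces are multi-dimensional and noisy, so a norm or projection of the latent and a more robust alignment measure would likely be needed.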











