arXivDaily arXiv每日学术速递 周一至周五更新

视觉与机器人

机器人 / 具身智能

机器人、具身智能、机器人学习、操作、导航和具身世界模型。

今日/当前日期收录 8 信号源:cs.RO, cs.AI, cs.CV, cs.LG
2603.04531 2026-06-19 cs.RO 版本更新 95%

PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation

PTLD: 从仿真到现实的触觉潜在知识蒸馏用于灵巧操作

Rosy Chen, Mustafa Mukadam, Michael Kaess, Tingfan Wu, Francois R Hogan, Jitendra Malik, Akash Sharma

发表机构 * Carnegie Mellon University(卡内基梅隆大学) University of Washington(华盛顿大学) FAIR at Meta(Meta的FAIR团队) UC Berkeley(伯克利大学)

专题命中 机器人操作 :提出触觉蒸馏方法用于灵巧操作任务

AI总结 提出PTLD方法,通过真实世界触觉策略数据蒸馏鲁棒状态估计器,解决触觉仿真困难问题,在灵巧操作任务中相比纯本体感策略提升182%和57%。

详情
AI中文摘要

触觉灵巧操作对于自动化复杂家务任务至关重要,但学习有效控制策略仍然是一个挑战。虽然最近的工作依赖于模仿学习,但通过机器人遥操作或动觉教学获取多指手的高质量演示是困难的。另一种方法是,通过强化学习我们可以在仿真中学习技能,但快速且真实的触觉观测仿真具有挑战性。为了弥合这一差距,我们引入了PTLD:从仿真到现实的触觉潜在知识蒸馏,这是一种无需触觉仿真即可学习触觉操作技能的新方法。我们的关键思想不是模拟触觉传感器或纯粹依赖本体感策略进行零样本从仿真到现实的迁移,而是利用现实世界中的特权传感器收集真实的触觉策略数据。然后,这些数据用于蒸馏一个鲁棒的状态估计器,该估计器基于触觉输入运行。我们的实验表明,PTLD可以通过结合触觉感知显著改善在仿真中训练的本体感操作策略。在基准的掌内旋转任务中,PTLD相比纯本体感策略实现了182%的提升。我们还展示了PTLD能够学习具有挑战性的触觉掌内重定向任务,在该任务中,我们观察到达到的目标数量相比仅使用本体感提高了57%。网站:此 https URL。

英文摘要

Tactile dexterous manipulation is essential to automating complex household tasks, yet learning effective control policies remains a challenge. While recent work has relied on imitation learning, obtaining high quality demonstrations for multi-fingered hands via robot teleoperation or kinesthetic teaching is prohibitive. Alternatively, with reinforcement we can learn skills in simulation, but fast and realistic simulation of tactile observations is challenging. To bridge this gap, we introduce PTLD: sim-to-real Privileged Tactile Latent Distillation, a novel approach to learning tactile manipulation skills without requiring tactile simulation. Instead of simulating tactile sensors or relying purely on proprioceptive policies to transfer zero-shot sim-to-real, our key idea is to leverage privileged sensors in the real world to collect real-world tactile policy data. This data is then used to distill a robust state estimator that operates on tactile input. We demonstrate from our experiments that PTLD can be used to improve proprioceptive manipulation policies trained in simulation significantly by incorporating tactile sensing. On the benchmark in-hand rotation task, PTLD achieves a 182% improvement over a proprioception only policy. We also show that PTLD enables learning the challenging task of tactile in-hand reorientation where we see a 57% improvement in the number of goals reached over using proprioception alone. Website: https://akashsharma02.github.io/ptld-website/.

2510.08807 2026-06-19 cs.RO cs.LG 版本更新 90%

Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation

Humanoid Everyday:面向开放世界人形机器人操作的综合机器人数据集

Zhenyu Zhao, Hongyi Jing, Xiawei Liu, Jiageng Mao, Abha Jha, Hanwen Yang, Rong Xue, Sergey Zakharov, Vitor Guizilini, Yue Wang

发表机构 * University of Southern California(南加州大学) Toyota Research Institute(丰田研究院)

专题命中 机器人操作 :提供人形机器人灵巧操作数据集,含260任务

AI总结 提出Humanoid Everyday数据集,包含10.3k轨迹、260个任务的多模态数据,用于人形机器人灵巧操作、人机交互和移动操作研究,并配套云评估平台。

详情
AI中文摘要

从运动到灵巧操作,人形机器人在展示复杂的全身能力方面取得了显著进展。然而,当前大多数机器人学习数据集和基准主要关注固定机器人臂,少数现有人形数据集要么局限于固定环境,要么任务多样性有限,通常缺乏人机交互和下肢运动。此外,缺乏用于在人形数据上对基于学习的策略进行基准测试的标准化评估平台。在这项工作中,我们提出了Humanoid Everyday,一个大规模且多样化的人形操作数据集,其特点是涉及灵巧物体操作、人机交互、运动集成动作等广泛的任务多样性。利用高效的人工监督遥操作流水线,Humanoid Everyday聚合了高质量的多模态感官数据,包括RGB、深度、LiDAR和触觉输入,以及自然语言注释,包含10.3k条轨迹和超过300万帧数据,涵盖7个大类共260个任务。此外,我们对数据集上的代表性策略学习方法进行了分析,提供了它们在不同任务类别中的优势和局限性的见解。为了标准化评估,我们引入了一个基于云的评估平台,允许研究人员在我们的受控环境中无缝部署他们的策略并接收性能反馈。通过发布Humanoid Everyday以及我们的策略学习分析和标准化的基于云的评估平台,我们旨在推进通用人形操作的研究,并为现实世界中更有能力和具身化的机器人代理奠定基础。我们的数据集、数据收集代码和云评估网站在我们的项目网站上公开发布。

英文摘要

From loco-motion to dextrous manipulation, humanoid robots have made remarkable strides in demonstrating complex full-body capabilities. However, the majority of current robot learning datasets and benchmarks mainly focus on stationary robot arms, and the few existing humanoid datasets are either confined to fixed environments or limited in task diversity, often lacking human-humanoid interaction and lower-body locomotion. Moreover, there are a few standardized evaluation platforms for benchmarking learning-based policies on humanoid data. In this work, we present Humanoid Everyday, a large-scale and diverse humanoid manipulation dataset characterized by extensive task variety involving dextrous object manipulation, human-humanoid interaction, locomotion-integrated actions, and more. Leveraging a highly efficient human-supervised teleoperation pipeline, Humanoid Everyday aggregates high-quality multimodal sensory data, including RGB, depth, LiDAR, and tactile inputs, together with natural language annotations, comprising 10.3k trajectories and over 3 million frames of data across 260 tasks across 7 broad categories. In addition, we conduct an analysis of representative policy learning methods on our dataset, providing insights into their strengths and limitations across different task categories. For standardized evaluation, we introduce a cloud-based evaluation platform that allows researchers to seamlessly deploy their policies in our controlled setting and receive performance feedback. By releasing Humanoid Everyday along with our policy learning analysis and a standardized cloud-based evaluation platform, we intend to advance research in general-purpose humanoid manipulation and lay the groundwork for more capable and embodied robotic agents in real-world scenarios. Our dataset, data collection code, and cloud evaluation website are made publicly available on our project website.

2509.10416 2026-06-19 cs.RO 版本更新 85%

TASC: Task-Aware Shared Control for Relational Telemanipulation

TASC:面向关系遥操作的任务感知共享控制

Ze Fu, Pinhao Song, Yutong Hu, Renaud Detry

发表机构 * KU Leuven, Dept. Mechanical Engineering, Research unit Robotics, Automation and Mechatronics(KU莱顿机械工程系,机器人、自动化与机电一体化研究单位) KU Leuven, Dept. Electrical Engineering, Research unit Processing Speech and Images(KU莱顿电气工程系,语音与图像处理研究单位)

专题命中 机器人操作 :提出遥操作共享控制框架,涉及机器人操作和任务级意图推理。

AI总结 提出TASC框架,通过视觉构建开放词汇交互图推断任务级用户意图,并基于空间约束提供共享控制辅助,提升关系遥操作效率与泛化能力。

Comments Accepted to IROS 2026

详情
AI中文摘要

我们提出了TASC,一个面向关系遥操作的任务感知共享控制框架,该框架从仅运动输入中推断任务级用户意图并提供辅助。为了在没有预定义模板的情况下支持抓取关系任务,TASC从视觉输入构建一个开放词汇的交互图来表示功能性物体关系,并据此推断用户意图。然后,共享控制策略在抓取和物体交互过程中提供辅助,该辅助由视觉语言模型预测的空间约束引导。我们的方法解决了共享控制下关系遥操作的两个关键挑战:(1)从低级运动命令中推断任务级意图,以及(2)跨不同物体和任务的泛化辅助。在仿真和真实世界的实验表明,与先前方法相比,TASC提高了任务效率并减少了用户输入努力,同时实现了跨多种关系遥操作任务的零样本泛化。支持我们实验的代码在此https URL公开提供。

英文摘要

We present TASC, a Task-Aware Shared Control framework for relational telemanipulation that infers task-level user intent and provides assistance from motion-only input. To support prehensile relational tasks without predefined templates, TASC constructs an open-vocabulary interaction graph from visual input to represent functional object relationships, and infers user intent accordingly. A shared control policy then provides assistance during both grasping and object interaction, guided by spatial constraints predicted by a vision-language model. Our method addresses two key challenges in relational telemanipulation under shared control: (1) task-level intent inference from low-level motion commands, and (2) generalizable assistance across diverse objects and tasks. Experiments in both simulation and the real world demonstrate that TASC improves task efficiency and reduces user input effort compared to prior methods, while enabling zero-shot generalization across diverse relational telemanipulation tasks. The code that supports our experiments is publicly available at https://github.com/fitz0401/tasc.

2509.00271 2026-06-19 cs.RO 版本更新 85%

Learn from What We HAVE: History-Aware VErifier that Reasons about Past Interactions Online

从我们所拥有的学习:在线推理过去交互的历史感知验证器

Yishu Li, Xinyi Mao, Ying Yuan, Kyutae Sim, Ben Eisner, David Held

发表机构 * Robotics Institute, Carnegie Mellon University(卡内基梅隆大学机器人研究所) Computer Science and Technology, Tsinghua University(清华大学计算机科学与技术系)

专题命中 机器人操作 :提出历史感知验证器,用于机器人操作中的动作选择。

AI总结 提出历史感知验证器HAVE,通过解耦动作生成与验证,利用历史交互在线消除歧义,理论证明其提升期望动作质量,在多个模拟和真实环境中验证有效性。

Comments CoRL 2025

详情
AI中文摘要

我们引入了一种新颖的历史感知验证器(HAVE),通过利用过去的交互来在线消除不确定场景中的歧义。机器人经常遇到视觉上模糊的物体,这些物体的操作结果直到物理交互之前都是不确定的。虽然仅凭生成模型理论上可以适应这种模糊性,但在实践中,即使在以动作历史为条件的情况下,它们在模糊情况下也会获得次优性能。为了解决这个问题,我们提出明确地将动作生成与验证解耦:我们使用无条件的基于扩散的生成器来提出多个候选动作,并采用我们的历史感知验证器通过推理过去的交互来选择最有希望的动作。通过理论分析,我们证明了使用验证器显著提高了期望动作质量。在多个模拟和真实环境(包括铰接物体、多模态门和不均匀物体拾取)中的实证评估和分析证实了我们方法的有效性以及对基线的改进。我们的项目网站位于:this https URL

英文摘要

We introduce a novel History-Aware VErifier (HAVE) to disambiguate uncertain scenarios online by leveraging past interactions. Robots frequently encounter visually ambiguous objects whose manipulation outcomes remain uncertain until physically interacted with. While generative models alone could theoretically adapt to such ambiguity, in practice they obtain suboptimal performance in ambiguous cases, even when conditioned on action history. To address this, we propose explicitly decoupling action generation from verification: we use an unconditional diffusion-based generator to propose multiple candidate actions and employ our history-aware verifier to select the most promising action by reasoning about past interactions. Through theoretical analysis, we demonstrate that employing a verifier significantly improves expected action quality. Empirical evaluations and analysis across multiple simulated and real-world environments including articulated objects, multi-modal doors, and uneven object pick-up confirm the effectiveness of our method and improvements over baselines. Our project website is available at: https://liy1shu.github.io/HAVE_CoRL25/

2605.25005 2026-06-19 cs.RO 版本更新 80%

Stiffness Optimization for Concentrated Bending in Magnetically Actuated Catheters: Maintaining Steerability under Gradient Stiffness

磁驱动导管集中弯曲的刚度优化:在梯度刚度下保持可操控性

Jiewen Tan, Junnan Xue, Shing Shin Cheng, Shuang Song, Erli Lyu, Jiaole Wang

发表机构 * Harbin Institute of Technology (Shenzhen)(哈尔滨工业大学(深圳)) The Chinese University of Hong Kong(香港中文大学) Macao Polytechnic University(澳门理工学院)

专题命中 机器人操作 :磁驱动软导管刚度优化与操控

AI总结 针对磁驱动软导管在推送性与近端集中弯曲之间的权衡,提出一种刚度优化的多段磁驱动导管(SO-MAC),通过解耦转向-推进机构和梯度刚度架构,在推进过程中实现稳定的近端枢轴弯曲,同时远端被动自直以传递推进力。

详情
AI中文摘要

对于磁驱动软导管,实现高效的推送性(推进力传递)和近端集中弯曲以保持可操控性具有挑战性:较高的轴向/弯曲刚度可改善力传递但降低可操控性,而较低的刚度可实现大的近端集中弯曲,但在压缩推送载荷下增加扭结/屈曲风险。为了解决这一权衡,我们提出了一种刚度优化的多段磁驱动导管(SO-MAC),它集成了解耦的转向-推进机构与梯度刚度架构。SO-MAC在推进过程中将弯曲集中在稳定的近端枢轴周围,而远端部分通过优化的刚度分布和弹簧骨架的弹性恢复抵抗摩擦引起的扭结/屈曲,被动自直以传递推进力。在$0{-}180^{\circ}$的组合转向和推进过程中,枢轴保持稳定,远端尖端几乎直线地向目标方向推进。直径为1.5 mm的SO-MAC在其10 mm尖端处实现了高达$180^{\circ}$的转向,弯曲半径为3 mm,平均形状误差为$1.39 \pm 0.56$ mm,转向枢轴误差为$0.35 \pm 0.10$ mm。在支气管模型中的视觉反馈控制进一步验证了通过高度弯曲的分叉路径的鲁棒导航。

英文摘要

Achieving both efficient pushability (propulsion transmission) and proximally concentrated bending for steerability is challenging for magnetically actuated soft catheters: higher axial/bending stiffness improves force transmission but reduces steerability, whereas lower stiffness enables large, proximally concentrated bending yet increases kinking/buckling risk under compressive push loads. To address this trade-off, we propose a stiffness-optimized multi-segment magnetically actuated catheter (SO-MAC) that integrates a decoupled steering-advancement mechanism with a gradient-stiffness architecture. The SO-MAC concentrates bending about a stable proximal pivot during advancement while the distal section passively self-straightens to transmit propulsion, aided by the optimized stiffness distribution and elastic recovery of the spring backbone against friction-induced kinking/buckling. Over $0{-}180^{\circ}$ combined steering and advancement, the pivot remained stable and the distal tip advanced near-straight toward the target direction. A 1.5 mm-diameter SO-MAC achieved up to $180^{\circ}$ steering with a 3 mm bending radius at its 10 mm tip, with an average shape error of $1.39 \pm 0.56$ mm and a steering-pivot error of $0.35 \pm 0.10$ mm. Visual feedback control in a bronchial phantom further confirmed robust navigation through highly curved, bifurcating paths.

2508.21677 2026-06-19 cs.RO 版本更新 80%

Robust Convex Model Predictive Control with collision avoidance guarantees for robot manipulators

具有碰撞避免保证的机器人操作器鲁棒凸模型预测控制

Bernhard Wullt, Johannes Köhler, Per Mattsson, Mikeal Norrlöf, Thomas B. Schön

发表机构 * ABB robotics(ABB机器人公司) Department of Mechanical Engineering, Imperial College London(帝国理工学院机械工程系) Department of Information Technology, Uppsala University(乌普萨拉大学信息科技系)

专题命中 机器人操作 :鲁棒MPC实现工业机器人无碰撞运动

AI总结 提出一种结合鲁棒管MPC与走廊规划算法的凸MPC方案,在模型不确定下实现工业机器人快速无碰撞运动,优于基准方法。

详情
AI中文摘要

工业操作器通常在杂乱环境中运行,安全运动规划至关重要。然而,模型不确定性使任务更加复杂,导致保守的速度限制以减少干扰影响。因此,需要能够保证快速执行安全运动的控制方法。我们通过为操作器提出一种新颖的模型预测控制(MPC)方案来解决这一问题,其中两个主要组件是鲁棒管MPC和用于获得无碰撞运动的走廊规划算法。我们的方案形成凸MPC公式,可以快速求解,使方法具有实际应用价值。我们在模拟环境中展示了方法的有效性,该环境包含一个6自由度工业机器人在具有不确定模型参数的杂乱环境中运行。通过容忍更高水平的模型不确定性同时实现更快的运动,我们优于基准方法。

英文摘要

Industrial manipulators typically operate in cluttered environments, where safe motion planning is critical. However, model uncertainties further complicate this task, which leads to conservative speed limits to reduce the influence of disturbances. Hence, there is a need for control methods that can guarantee safe motions which are executed fast. We address this by suggesting a novel model predictive control (MPC) solution for manipulators, where our two main components are a robust tube MPC and a corridor planning algorithm to obtain collision-free motion. Our solution results in a convex MPC formulation, which we can solve fast, making our method practically useful. We demonstrate the efficacy of our method in a simulated environment with a 6 DOF industrial robot operating in cluttered environments with uncertain model parameters. We outperform benchmark methods by tolerating higher levels of model uncertainty while achieving faster motion.

2504.15535 2026-06-19 cs.RO 版本更新 80%

VibeCheck: Using Active Acoustic Tactile Sensing for Contact-Rich Manipulation

VibeCheck: 使用主动声学触觉传感进行接触丰富的操作

Kaidi Zhang, Do-Gon Kim, Eric T. Chang, Hua-Hsuan Liang, Zhanpeng He, Kathryn Lampo, Philippe Wu, Ioannis Kymissis, Matei Ciocarlie

发表机构 * Dept. of Mechanical Engineering(机械工程系) Dept. of Computer Science(计算机科学系) Dept. of Electrical Engineering(电气工程系) Columbia University(哥伦比亚大学)

专题命中 机器人操作 :主动声学触觉传感用于接触丰富的操作任务。

AI总结 本文构建了带有两个压电手指的主动声学传感夹爪,通过物体传递声学振动来感知其声学特性和接触状态,用于物体分类、抓取位置估计、内部结构姿态估计以及外部接触类型分类,并基于接触分类模型实现了鲁棒的插销任务。

Comments Published at IROS 2025. 8 pages, 7 figures

详情
AI中文摘要

物体的声学响应可以揭示其全局状态,例如材料属性或与外界的外部接触。在这项工作中,我们构建了一个主动声学传感夹爪,配备两个压电手指:一个用于生成信号,另一个用于接收信号。通过将一个手指的声学振动通过物体传递到另一个手指,我们能够洞察物体的声学特性和接触状态。我们使用该系统进行物体分类、估计抓取位置、估计内部结构的姿态,以及分类物体与环境的外部接触类型。利用我们的接触类型分类模型,我们解决了一个标准的长时域操作问题:插销插入。我们基于传感器的性能使用一个简单的模拟转移模型来训练一个模仿学习策略,该策略对分类器的不完美预测具有鲁棒性。最后,我们在UR5机器人上演示了该策略,仅使用主动声学传感作为反馈。视频可在此 https URL 找到。

英文摘要

The acoustic response of an object can reveal a lot about its global state, for example its material properties or the extrinsic contacts it is making with the world. In this work, we build an active acoustic sensing gripper equipped with two piezoelectric fingers: one for generating signals, the other for receiving them. By sending an acoustic vibration from one finger to the other through an object, we gain insight into an object's acoustic properties and contact state. We use this system to classify objects, estimate grasping position, estimate poses of internal structures, and classify the types of extrinsic contacts an object is making with the environment. Using our contact type classification model, we tackle a standard long-horizon manipulation problem: peg insertion. We use a simple simulated transition model based on the performance of our sensor to train an imitation learning policy that is robust to imperfect predictions from the classifier. We finally demonstrate the policy on a UR5 robot with active acoustic sensing as the only feedback. Videos can be found at https://roamlab.github.io/vibecheck .

2512.20014 2026-06-19 cs.RO cs.AI 版本更新 75%

Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting

Bring My Cup! 使用视觉注意力提示个性化视觉-语言-动作模型

Sangoh Lee, Sangwoo Mo, Wook-Shin Han

发表机构 * GSAI, POSTECH(POSTECH 人工智能研究所) IME, POSTECH(POSTECH 信息媒体研究所)

专题命中 机器人操作 :机器人操作个人物品

AI总结 针对VLA模型难以处理个性化指令的问题,提出无需训练的视觉注意力提示(VAP)方法,通过参考图像作为非参数记忆,利用开放词汇检测和嵌入匹配定位个人物品,并以视觉提示注入模型,在多个仿真和真实场景中显著提升成功率和正确物体操作。

Comments ICML 2026. Project page: https://vap-project.github.io/

详情
AI中文摘要

尽管视觉-语言-动作(VLA)模型能够很好地泛化到通用指令,但在处理个性化命令(如“bring my cup”)时却存在困难,因为机器人必须在视觉相似的物体中识别并操作特定实例。我们研究了这种操作个人物品的场景,其中VLA必须仅使用少量参考图像来识别并控制训练中未见过的用户特定物体。为了解决这一挑战,我们提出了视觉注意力提示(VAP),一种简单而有效的无需训练的感知适配器,为冻结的VLA模型赋予自上而下的选择性注意力。VAP将参考图像视为非参数视觉记忆,通过开放词汇检测和基于嵌入的匹配将个人物品定位到场景中,然后通过突出显示该物体并重写指令,将这种定位作为视觉提示注入模型。我们构建了两个仿真基准(Personalized-SIMPLER和Personalized-VLABench)以及一个真实桌面基准,用于评估多个机器人和任务上的个性化操作。实验表明,VAP在成功率和正确物体操作方面始终优于通用策略和令牌学习基线,有助于弥合语义理解与实例级控制之间的差距。

英文摘要

While Vision-Language-Action (VLA) models generalize well to generic instructions, they struggle with personalized commands such as "bring my cup," where the robot must act on one specific instance among visually similar objects. We study this setting of manipulating personal objects, in which a VLA must identify and control a user-specific object unseen during training using only a few reference images. To address this challenge, we propose Visual Attentive Prompting (VAP), a simple-yet-effective training-free perceptual adapter that equips frozen VLAs with top-down selective attention. VAP treats the reference images as a non-parametric visual memory, grounds the personal object in the scene through open-vocabulary detection and embedding-based matching, and then injects this grounding as a visual prompt by highlighting the object and rewriting the instruction. We construct two simulation benchmarks, Personalized-SIMPLER and Personalized-VLABench, and a real-world tabletop benchmark to evaluate personalized manipulation across multiple robots and tasks. Experiments show that VAP consistently outperforms generic policies and token-learning baselines in both success rate and correct-object manipulation, helping to bridge the gap between semantic understanding and instance-level control.