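arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.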

2606.13817 2026-06-15 cs.RO cs.LG New submission

ReactVLA: Fast and Lightweight Reactive Robot Manipulation via Improved Mean Flow Action Generation

Yanzhao Guo, Wenkai Chen, Jianwei Zhang

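Affiliations: Shanghai Jiao Tong University; Technical Aspects of Multimodal Systems (TAMS), Department of Informatics, Universität Hamburg

AI summary: Proposes the ReactVLA framework, which combines an improved Mean Flow action generator with an Attention Residuals mechanism for lightweight, low-latency real-time robot manipulation. On simulated and real-world tasks it improves task performance by up to 1.65x and inference speed by more than 4x.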

Abstract

Diffusion-based Vision-Language-Action (VLA) policies have demonstrated strong capability in modeling expressive and multimodal action distributions. However, their reliance on iterative sampling introduces substantial inference latency, which limits their applicability to reactive closed-loop robot manipulation. To address this limitation, we propose \texttt{ReactVLA}, a lightweight and low-latency VLA framework for real-time robotic manipulation. \texttt{ReactVLA} combines two complementary designs: (1) an improved Mean Flow (iMF) action generator that reduces expensive multi-step diffusion sampling to one-to-few-step action generation, and (2) Attention Residuals (AttnRes), a dynamic depth-wise feature routing mechanism that replaces uniform residual accumulation to better preserve task-relevant multimodal representations. We evaluate \texttt{ReactVLA} on large-scale simulation benchmarks, including LIBERO and RoboIMI, as well as real-world robotic manipulation tasks. Experimental results show that \texttt{ReactVLA} consistently outperforms similarly sized VLA baselines, including SmolVLA and $\pi_0$. On challenging precision manipulation tasks, \texttt{ReactVLA} achieves up to a 1.65$\times$ improvement in task performance while providing more than a 4$\times$ increase in inference speed compared with leading VLA models. Finally, it reduces real-world policy latency to below 38.6 ms, enabling fast reactive control on physical robot platforms. Please check out our project website at: https://game-loader.github.io/ReactVLA/.
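
The gap between multi-step sampling and one-step generation can be illustrated with a toy example. This is only a sketch under stated assumptions, not the ReactVLA implementation: the velocity field `K * z`, the time convention (noise at t=1, action at t=0), and every function name are hypothetical, and the average velocity is written in closed form where a mean-flow model would learn it.

```python
import math

K = 2.0  # decay rate of the toy velocity field v(z, t) = K * z (illustrative value)

def instantaneous_velocity(z, t):
    """Toy stand-in for a learned flow-matching velocity network."""
    return K * z

def average_velocity(z_t, r, t):
    """Closed-form mean velocity over [r, t] for the toy field.

    A mean-flow model would learn this quantity; here it follows analytically
    from z_r = z_t * exp(-K * (t - r)).
    """
    return z_t * (1.0 - math.exp(-K * (t - r))) / (t - r)

def sample_euler(z1, num_steps):
    """Integrate from t=1 (noise) down to t=0 (action) with explicit Euler steps."""
    z, h = z1, 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * h
        z = z - h * instantaneous_velocity(z, t)
    return z

def sample_mean_flow(z1):
    """One evaluation: jump from t=1 to r=0 using the average velocity."""
    return z1 - (1.0 - 0.0) * average_velocity(z1, 0.0, 1.0)

z1 = 1.0
exact = z1 * math.exp(-K)            # true endpoint of the toy flow
one_step_mean_flow = sample_mean_flow(z1)
one_step_euler = sample_euler(z1, 1)
hundred_step_euler = sample_euler(z1, 100)
```

In this toy setting a single Euler step with the instantaneous velocity lands far from the true endpoint, many Euler steps approach it, and a single step with the average velocity reaches it directly, which is the intuition behind replacing iterative sampling with one-to-few-step generation.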

2606.14375 2026-06-15 cs.RO cs.AI New submission

Elastic Queries Reinforcement Learning: Self-Aware Policy Execution for VLA Models

Ge Wang, Xinyu Tan, Xiang Li, Man Luo, Chengsi Yao, Shenhao Yan, Jiahao Yang, Fan Feng, Honghao Cai, Xiangyuan Wang, Zhixin Mai, Yiming Zhao, Yatong Han, Zhen Li

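Affiliations: Ising AI; CUHK-Shenzhen; PKU

AI summary: Proposes Elastic Queries Reinforcement Learning (EQRL), in which a lightweight latent-schedule adaptor dynamically adjusts a VLA model's inference steps and action chunk length, and critic ensemble disagreement estimates state difficulty, lowering inference cost while preserving or improving task success.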

Abstract

Vision-language-action (VLA) models are powerful action generators for robot manipulation, but they are typically executed with fixed inference and replanning schedules. This rigidity ignores the uneven difficulty of robot control: contact-rich or uncertain states may need more computation and fresher feedback, while easier states can often be handled with fewer inference steps and longer open-loop execution. We propose Elastic Queries Reinforcement Learning (EQRL), a framework that makes each VLA policy query elastic. A lightweight latent-schedule adaptor jointly selects the latent input, denoising budget, and action chunk length, without fine-tuning the underlying VLA model. To make scheduling difficulty-aware, EQRL trains a critic over the joint latent-schedule action and derives a state difficulty signal from critic ensemble disagreement. This signal guides compute toward difficult states, while a learned residual allows task-driven correction. We formulate variable chunk execution as query-level macro-action RL with chunk-dependent discounting and an amortized number-of-function-evaluations (NFE) budget. Across simulation and real-robot manipulation, EQRL reduces amortized inference cost while preserving or improving task success.
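
The idea of turning critic ensemble disagreement into a per-query schedule can be sketched as follows. This is a hand-written heuristic for illustration only: EQRL learns its adaptor with reinforcement learning, whereas the linear mapping, the value ranges, and the function names below are all assumptions that do not come from the paper.

```python
import statistics

def difficulty_from_ensemble(q_values):
    """Disagreement signal: population standard deviation of the critics' Q-estimates."""
    return statistics.pstdev(q_values)

def elastic_schedule(difficulty, max_difficulty=1.0,
                     step_range=(1, 10), chunk_range=(4, 32)):
    """Map a difficulty score to (denoising steps, action chunk length).

    Harder states get more denoising steps and shorter open-loop chunks.
    All ranges are made-up illustration values.
    """
    d = min(max(difficulty / max_difficulty, 0.0), 1.0)  # normalize and clip to [0, 1]
    steps = round(step_range[0] + d * (step_range[1] - step_range[0]))
    chunk = round(chunk_range[1] - d * (chunk_range[1] - chunk_range[0]))
    return steps, chunk

# Critics that agree signal an easy state; critics that disagree signal a hard one.
easy = elastic_schedule(difficulty_from_ensemble([0.80, 0.80, 0.80]))
hard = elastic_schedule(difficulty_from_ensemble([0.1, 0.9, 2.5, -0.5]))
```

With these illustrative ranges, the agreeing ensemble yields the cheapest schedule (one denoising step, longest chunk) and the disagreeing ensemble yields the most expensive one.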

2606.14665 2026-06-15 cs.RO New submission

EgoGuide: Egocentric Guidance for Efficient Robot-Free Demonstration Collection and Learning

Yue Xu, Mingtao Nie, Tianle Li, Hong Li, Yibo Luo, Siyuan Huang, Yong-Lu Li

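Affiliations: Shanghai Jiao Tong University; Shanghai Innovation Institute; Beijing Institute for General Artificial Intelligence (BIGAI)

AI summary: Proposes EgoGuide, a data collection interface that records synchronized wrist and head/egocentric observations with online visual-geometric quality guidance and pairs them with a Gated Egocentric Residual Policy, reducing the amount of data required and improving data efficiency.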

Abstract

Robot learning from real-world demonstrations is currently constrained by data scaling. Universal Manipulation Interface (UMI) provides an efficient robot-free data collection interface, yet current UMI-style pipelines often collect redundant demonstrations and lack global scene context. To improve data efficiency, we present EgoGuide, a collection interface that records synchronized wrist and head/egocentric observations and couples them with online visual-geometric data quality guidance. We also introduce a Gated Egocentric Residual Policy for robust learning from a viewpoint-varying egocentric camera, allowing head/egocentric context to correct ambiguous local observations while preserving stable wrist-view control. Real-world experiments show that EgoGuide reduces the required number of data episodes and improves data efficiency. The residual policy further improves robustness under visual occlusion. Project Page: https://silicx.github.io/EgoGuide

2606.14418 2026-06-15 cs.AI cs.LG cs.RO Cross-list

Improving Robot Generalist Policies via Flow Reversal Steering

Andy Tang, William Chen, Andrew Wagenmaker, Chelsea Finn, Sergey Levine

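Affiliations: Stanford University; UC Berkeley

AI summary: Proposes Flow Reversal Steering (FRS), which runs the flow policy in reverse to find the latent noise of suboptimal actions and maps them to the generalist policy's action modes, improving zero-shot control, behavioral cloning, and reinforcement learning.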

Abstract

Generalist policies can learn a wide range of skills from diverse robot datasets. In order to solve or improve on challenging new tasks, we need a way to infer and invoke the appropriate actions from the policy's rich behavioral prior, especially when directly commanding the policy fails. We focus on flow matching generalists and propose Flow Reversal Steering (FRS): a method that takes suboptimal but ``reasonable'' actions, finds their latent noises by passing them through the flow policy in reverse, and maps them to nearby generalist action modes. We evaluate FRS across many simulated and real-world manipulation settings. First, FRS can turn coarse semantic guidance from humans or vision-language models (VLMs) into corresponding good robot actions, improving zero-shot control. These gains can be distilled with behavioral cloning by training an auxiliary policy to output noises that the generalist maps to good actions -- showing up to 95% absolute task success rate boosts in under a minute of training. Finally, FRS enables policy improvement by bootstrapping reinforcement learning with semantic knowledge, improving on several tasks that standard RL fails to improve on.

2601.19810 2026-06-15 cs.LG cs.AI cs.RO Updated version

Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

Octavio Pappalardo

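Affiliations: University College London (UCL)

AI summary: Proposes ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy to optimize multi-episode exploration and adaptation, improving zero-shot and few-shot performance.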

Comments ICLR 2026; v2 adds link to code: https://github.com/Octavio-Pappalardo/ulee-jax

Journal ref: The Fourteenth International Conference on Learning Representations, 2026

Abstract

Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent's post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent's capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula. Code is available at: https://github.com/Octavio-Pappalardo/ulee-jax

2605.03065 2026-06-15 cs.LG cs.RO Updated version

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

Sarvesh Patil, Mitsuhiko Nakamoto, Manan Agarwal, Shashwat Saxena, Jesse Zhang, Giri Anantharaman, Cleah Winston, Chaoyi Pan, Douglas Chen, Nai-Chieh Huang, Zeynep Temel, Oliver Kroemer, Sergey Levine, Abhishek Gupta, Hongkai Dai, Paarth Shah, Max Simchowitz

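Affiliations: University of California, Berkeley

AI summary: Proposes OGPO, which uses off-policy critic networks and a modified PPO objective for sample-efficient fine-tuning of generative control policies, reaching state-of-the-art performance on a range of manipulation tasks and fine-tuning poorly initialized behavior cloning policies without expert data.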

Abstract

Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagates policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward. OGPO achieves state-of-the-art performance on manipulation tasks spanning multi-task settings, high-precision insertion, and dexterous control. To our knowledge, it is also the only method that can fine-tune poorly-initialized behavior cloning policies to near full task success with no expert data in the online replay buffer, and does so with little task-specific hyperparameter tuning. Through extensive empirical investigations, we demonstrate that OGPO drastically outperforms alternative methods on policy steering and learning residual corrections, and identify the key mechanisms behind its performance. We further introduce practical stabilization tricks, including success-buffer regularization, two-sided conservative advantages, and Q-variance reduction, to mitigate critic over-exploitation across state- and pixel-based settings. Beyond proposing OGPO, we conduct a systematic empirical study of GCP finetuning, identifying the stabilizing mechanisms and failure modes that govern successful off-policy full-policy improvement.

2606.13842 2026-06-15 cs.RO New submission

Efficient Domain-Adaptive Policy Learning via Kernel Representation with Application to Quadrotor Control under Non-Stationary Disturbances

Hongyu Zhou, Mingtian Tan, Vasileios Tzoumas

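Affiliations: University of Michigan, Ann Arbor

AI summary: Proposes an efficient domain-adaptive policy learning algorithm based on kernel representations. Unknown disturbances are modeled with random Fourier features, offline training takes only 50 seconds, and parameters are updated online in real time via least-squares estimation, handling non-stationary disturbances in quadrotor trajectory tracking.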

Abstract

We present an algorithm for efficient domain-adaptive policy learning via kernel representations. Learning domain-adaptive policies is challenging since it requires an environment representation that is both sufficiently expressive to model complex sim-to-real gaps during offline training, and computationally efficient enough to support rapid online adaptation during deployment. For instance, a quadrotor may encounter time-varying, non-stationary disturbances, such as sudden gusts of wind, payload shifts, or transitions between distinct flight regimes with and without ground effects. To address these challenges, we model unknown disturbances using a differentiable kernel approximation based on random Fourier features. During the offline training phase, we randomly sample kernel coefficients and bandwidth parameters to generate a rich diversity of disturbance profiles. We then optimize the control policy via differentiable simulation with analytical gradients, a process that takes only 50 seconds of training time on an RTX 4090 GPU. During hardware deployment, the policy adapts to non-stationary environments in real time by updating both the kernel coefficients and bandwidth through online least-squares estimation. We evaluate our method on quadrotor trajectory tracking tasks across high-fidelity numerical simulations and hardware experiments using Crazyflie, subjected to various disturbances, including complex aerodynamic effects, wind, ground effects, and payload fluctuations.
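
The combination of random Fourier features with online least squares can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the state is one-dimensional, the "true" disturbance `sin(2x)` is invented, only the kernel coefficients are estimated (the abstract says the bandwidth is adapted as well), and all names are hypothetical.

```python
import math
import random

def make_rff(num_features, bandwidth, rng):
    """Sample frequencies w ~ N(0, 1/bandwidth^2) and phases b ~ U(0, 2*pi) for a 1-D input."""
    ws = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return lambda x: [scale * math.cos(w * x + b) for w, b in zip(ws, bs)]

class RecursiveLeastSquares:
    """Online estimate of theta in the model d(x) ~ theta . phi(x)."""

    def __init__(self, dim, prior_var=100.0):
        self.theta = [0.0] * dim
        self.P = [[prior_var if i == j else 0.0 for j in range(dim)] for i in range(dim)]

    def predict(self, phi):
        return sum(t * p for t, p in zip(self.theta, phi))

    def update(self, phi, y):
        n = len(phi)
        Pphi = [sum(row[j] * phi[j] for j in range(n)) for row in self.P]
        denom = 1.0 + sum(p * q for p, q in zip(phi, Pphi))
        gain = [v / denom for v in Pphi]
        err = y - self.predict(phi)
        self.theta = [t + g * err for t, g in zip(self.theta, gain)]
        # P <- P - gain * (P phi)^T, valid because P stays symmetric
        self.P = [[self.P[i][j] - gain[i] * Pphi[j] for j in range(n)] for i in range(n)]

rng = random.Random(0)
features = make_rff(num_features=20, bandwidth=1.0, rng=rng)
true_disturbance = lambda x: math.sin(2.0 * x)  # invented stand-in for a wind or payload effect
xs = [rng.uniform(-2.0, 2.0) for _ in range(300)]

def mse(model):
    return sum((true_disturbance(x) - model.predict(features(x))) ** 2 for x in xs) / len(xs)

estimator = RecursiveLeastSquares(dim=20)
mse_before = mse(estimator)
for x in xs:  # one cheap update per incoming measurement
    estimator.update(features(x), true_disturbance(x))
mse_after = mse(estimator)
```

Each update costs a fixed number of operations in the feature dimension, which is what makes this kind of estimator attractive for real-time adaptation on small hardware.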

2606.13915 2026-06-15 cs.RO cs.SY eess.SY New submission

Learning Dynamic Swing-Up of an Inverted Pendulum using Remote Magnetic Actuation

Viacheslav Sydora, Jasan Zughaibi, Denis von Arx, Quentin Boehler, Michael Muehlebach

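Affiliations: University of Zurich; ETH Zurich; University of Strasbourg

AI summary: Addresses the lack of trajectory tracking far from equilibrium in electromagnetic navigation systems by combining trajectory optimization, time-varying LQR, and iterative learning control. Achieves the first magnetically actuated swing-up of an inverted pendulum, succeeding within six iterations, and shows that the learned ILC correction closely matches the torque discrepancy predicted by a high-fidelity magnetic field model.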

Abstract

Electromagnetic Navigation Systems (eMNS) have gained considerable attention for minimally invasive surgery and targeted drug delivery. While most of the literature relies on quasi-static control of these systems, recent work has demonstrated the benefits of dynamic approaches. However, trajectory tracking far from equilibrium states remains largely unaddressed. We close this gap by demonstrating the first swing-up of a magnetically actuated inverted pendulum using the clinically-ready Navion eMNS. Although the inverted pendulum is not clinically relevant in itself, the proposed method utilizes torques and forces as control objectives, making it applicable to other magnetically actuated devices such as catheters and guidewires. Our approach combines trajectory optimization that accounts for internal eMNS dynamics with time-varying Linear Quadratic Regulator (LQR) state feedback and Iterative Learning Control (ILC), which leverages previous trial data and the system's dynamic model to progressively refine the feedforward command. While LQR alone fails due to the complex phenomena of magnetic actuation, ILC enables successful swing-up within six iterations. Furthermore, post-experimental analysis reveals that the learned ILC correction closely matches the torque discrepancy predicted by high-fidelity magnetic field model calibration, suggesting learning and adaptation as a promising tool to deal with uncertainties in electromagnetic actuation arising, e.g., from patient-specific physiological motion patterns and field model calibration inaccuracies.
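
How iterative learning control refines a feedforward command from trial to trial can be shown on a toy plant. The sketch below is a plain P-type ILC law on an invented scalar system; the paper's ILC is model-based and runs alongside time-varying LQR feedback, neither of which is reproduced here, and the plant parameters, learning gain, and reference are all assumptions.

```python
import math

def rollout(u, a=0.3, b=0.5):
    """Simulate x[t+1] = a*x[t] + b*u[t] from x[0] = 0 and return x[1..N]."""
    x, out = 0.0, []
    for u_t in u:
        x = a * x + b * u_t
        out.append(x)
    return out

def ilc(reference, iterations=6, learning_gain=1.0):
    """P-type ILC: after each trial, add the scaled tracking error to the feedforward input."""
    u = [0.0] * len(reference)
    max_errors = []
    for _ in range(iterations + 1):
        y = rollout(u)
        e = [r - y_t for r, y_t in zip(reference, y)]   # e[t] is the error at time t+1
        max_errors.append(max(abs(v) for v in e))
        u = [u_t + learning_gain * e_t for u_t, e_t in zip(u, e)]
    return u, max_errors

N = 20
reference = [math.sin(math.pi * (t + 1) / N) for t in range(N)]
u_final, max_errors = ilc(reference)  # max_errors[k] is the peak error of trial k
```

For this particular plant and gain the trial-to-trial error map is a contraction, so the peak tracking error shrinks with every trial; on a real system the gain would have to be chosen from a model of the plant.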

2606.14063 2026-06-15 cs.RO cs.SY eess.SY New submission

Semidefinite Relaxations for Collision-Free Motion Planning

Bernhard Paus Graesdal, Alexandre Amice, Pablo A. Parrilo, Russ Tedrake

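Affiliations: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

AI summary: Studies collision-free motion planning for a point robot moving through spherical obstacles. Proposes a semidefinite relaxation, analyzes its tightness theoretically, and exploits symmetry to reduce computational cost, running 10 to 100 times faster than direct nonlinear programming.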

Abstract

We study semidefinite relaxations for collision-free motion planning. We focus on a point robot moving from start to goal through spherical obstacles in $\mathbb{R}^n$, subject to path continuity constraints and squared derivative costs; a setting that is conceptually simple yet captures the hardness of collision-free motion planning. We formulate this problem exactly as a nonconvex problem over polynomial curves, and present a natural semidefinite relaxation. We contribute two key theoretical insights; to our knowledge this is the first theoretical analysis of semidefinite relaxations for collision-free motion planning. First, we show that solving the convex relaxation is equivalent to solving, to global optimality, a related motion planning problem in a potentially higher-dimensional space. This geometric interpretation yields necessary and sufficient conditions for tightness, and a clear intuition for when the relaxation is loose. Second, we show that the relaxation admits a symmetry reduction that makes it significantly smaller than one might expect, with positive semidefinite cone sizes that scale linearly with the polynomial degree and are independent of the ambient dimension. The resulting relaxation is 10 to 100 times faster than direct nonlinear programming transcriptions solved with SNOPT and IPOPT, exhibits significantly lower variance in solve times, and reliably finds a locally optimal path for the original problem. We demonstrate its effectiveness as a convex steering function in an RRT planner for minimum-snap quadrotor planning with $C^4$ continuous trajectories.

2606.14270 2026-06-15 cs.RO cs.AI New submission

Robust Fall Recovery for Armless Bipedal-Wheeled Robots Via Force-Guided Learning

Haidong Hou, Zhangguo Yu, Tao Han, Hengbo Qi, Khaleel Ghazal, Yu Zhang, Yidong Du, Xuechao Chen, Fei Meng

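Affiliations: Beijing Institute of Technology

AI summary: For armless bipedal-wheeled robots, which cannot use external support to stand back up, proposes FTSR, a force-guided teacher-student framework that uses constrained reinforcement learning to gradually reduce reliance on an external assist force, achieving robust recovery from falls to stable locomotion.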

Comments 8 pages, 6 figures, accepted by IEEE Robotics and Automation Letters (RA-L)

DOI: 10.1109/LRA.2026.3701481
Journal ref: IEEE Robotics and Automation Letters, 2026

Abstract

Fall recovery is critical for autonomous legged locomotion. Existing methods have demonstrated that some legged robots, such as humanoids and quadrupeds, are capable of fall recovery from diverse postures by utilizing arms or coordinating multiple legs to generate support forces. Without arms or other legs to provide supportive assistance, a bipedal-wheeled robot must rely solely on the actuation of its legs, making recovery particularly difficult. To address this, we introduce FTSR (Force-guided Teacher-student framework with Stage-wise Rewards). The force-guided method constructs an external auxiliary force during simulation training that correlates directly with the robot's real-time height, explicitly formulating this force as an optimizable constraint. Through constrained reinforcement learning, the policy is guided toward reducing force dependency gradually and increasing the body height, developing internal recovery strategies despite having no arms for support. Height-progressive stage-wise rewards progressively structure posture stabilization during recovery and transition to sustained locomotion, integrated with a teacher-student architecture distilling privileged knowledge of force effects and recovery dynamics. After simulation training, the policy is deployed on a physical armless bipedal-wheeled robot and extensively evaluated. Experiments confirm robust and reliable fall recovery under diverse challenging conditions, demonstrating strong environmental adaptability and motion robustness, while maintaining full post-recovery motion capability. The framework also generalizes effectively to a high-DOF humanoid, confirming its practical generalizability. The project page is available at https://2350575870.github.io/force-guided.github.io/

2512.22484 2026-06-15 cs.RO math.DG Updated version

Asymmetric Friction in Geometric Locomotion

Ross L. Hatton, Yousef Salaman, Shai Revzen

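Affiliations: Robotics program at Oregon State University; Department of Electrical Engineering and Computer Science at the University of Michigan

AI summary: Introduces asymmetric friction into geometric locomotion models, replacing Riemannian metrics with Finsler metrics and extending the sub-Riemannian approach to a sub-Finslerian one in order to characterize a system's motion capabilities.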

Comments 23 pages, 15 figures

Abstract

Geometric mechanics models of locomotion have provided insight into how robots and animals use environmental interactions to convert internal shape changes into displacement through the world, encoding this relationship in a ``motility map''. A key class of such motility maps arises from (possibly anisotropic) linear drag acting on the system's individual body parts, formally described via Riemannian metrics on the motions of the system's individual body parts. The motility map can then be generated by invoking a sub-Riemannian constraint on the aggregate system motion under which the position velocity induced by a given shape velocity is that which minimizes the power dissipated via friction. The locomotion of such systems is ``geometric'' in the sense that the final position reached by the system depends only on the sequence of shapes that the system passes through, but not on the rate with which the shape changes are made. In this paper, we consider a far more general class of systems in which the drag may be not only anisotropic (with different coefficients for forward/backward and left/right motions), but also asymmetric (with different coefficients for forward and backward motions). Formally, including asymmetry in the friction replaces the Riemannian metrics on the body parts with Finsler metrics. We demonstrate that the sub-Riemannian approach to constructing the system motility map extends naturally to a sub-Finslerian approach and identify system properties analogous to the constraint curvature of sub-Riemannian systems that allow for the characterization of the system motion capabilities.

2602.01948 2026-06-15 cs.RO Updated version

A Unified Control Architecture for Macro-Micro Manipulation using an Active Remote Center of Compliance for Manufacturing Applications

Patrick Frank, Christian Friedrich

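Affiliations: Institute for Robotics and Intelligent Production Systems, University of Applied Sciences Karlsruhe (HKA)

AI summary: Proposes a new architecture that brings the macro manipulator into active interaction control, raising control bandwidth by a factor of 2.1 over the existing leader-follower approach and by a factor of 12.5 over traditional force control, and introduces surrogate models to simplify controller design.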

Comments 17 pages, 14 figures, submitted to Robotics and Computer-Integrated Manufacturing (RCIM)

Abstract

Macro-micro manipulators combine a macro manipulator with a large workspace, such as an industrial robot, with a lightweight, high-bandwidth micro manipulator. This enables highly dynamic interaction control while preserving the wide workspace of the robot. Traditionally, position control is assigned to the macro manipulator, while the micro manipulator handles the interaction with the environment, limiting the achievable interaction control bandwidth. To solve this, we propose a novel control architecture that incorporates the macro manipulator into the active interaction control. This leads to an increase in control bandwidth by a factor of 2.1 compared to the state-of-the-art architecture based on the leader-follower approach, and by a factor of 12.5 compared to traditional robot-based force control. Further, we propose surrogate models for a more efficient controller design and easy adaptation to hardware changes. We validate our approach by comparing it against the other control schemes in different experiments, such as collision with an object, following a force trajectory, and industrial assembly tasks.

2605.25782 2026-06-15 cs.RO Updated version

ParkourFormer: Integrating Predictive Supervision and Sequence Modeling into Parkour Locomotion

Yanheng Mai, Wenhao Xu, Zirui Huang, Yifei Fu, Shengwei Dong, Xinjue Wang, Kailun Huang, Yanzhe Xie, Renjing Xu

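Affiliations: The Hong Kong University of Science and Technology (Guangzhou); CLAI-LAB, CL-TECH; South China Agricultural University; Guangdong University of Technology

AI summary: Proposes ParkourFormer, a Transformer-based sequence modeling framework that predicts future proprioceptive states and fuses them with temporal features to generate actions, achieving high-success-rate locomotion control for humanoid parkour across multiple terrains.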

Comments Project Homepage: https://mronaldo-gif.github.io/parkourformer.github.io/

Abstract

Humanoid parkour requires locomotion policies to coordinate whole-body dynamics across rapidly changing terrains such as stairs, gaps, slopes, and obstacles. Existing reinforcement learning policies are largely reactive, mapping observations directly to actions without explicitly modeling future body states. Such modeling becomes critical in agile locomotion tasks where successful motion execution depends strongly on anticipating upcoming contact transitions and body dynamics. We present ParkourFormer, a Transformer-based sequence modeling framework that reformulates humanoid locomotion as a future-conditioned decision-making problem. The current robot state queries historical sensorimotor trajectories through cross-attention, while a lightweight prediction head forecasts short-horizon future proprioceptive states. The predicted future states, trained with supervised signals, are fused with temporal features to generate actions, enabling the policy to jointly reason over motion history and anticipated future dynamics. We evaluate ParkourFormer on a diverse multi-terrain humanoid parkour benchmark including stairs, gaps, slopes, rough terrain, and obstacle traversal. Experiments in simulation and on a real humanoid robot show that ParkourFormer achieves a 93.85% average traversal success rate on highly challenging terrains, with improvements of up to 47.12% over strong MLP, MoE-based MLP, and vanilla Transformer baselines, while maintaining a single unified policy across all terrain types. These results demonstrate that explicit future-state modeling significantly improves robustness and generalization for agile whole-body locomotion.

2606.14089 2026-06-15 cs.RO New submission

A Modular Dual-Arm Apple Harvesting Robot with Enhanced Field Performance

Keyi Zhu, Kyle Lammers, Chaaran Arunachalam, Kaixiang Zhang, Renfu Lu, Zhaojian Li

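Affiliations: Michigan State University; United States Department of Agriculture Agricultural Research Service

AI summary: Proposes a modular dual-arm apple harvesting robot whose vertically stacked arms work the upper and lower zones of a single tree simultaneously. With five advances, including foundation-model perception, 7th-order jerk-bounded trajectory generation, and a linear sweep harvesting strategy, it achieves an 80.0% harvesting success rate and a 7.53 s mean per-arm cycle time in commercial orchards, with 91.2% of fruit graded Extra Fancy.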

英文摘要

Robotic apple harvesting offers a promising solution to labor shortages in commercial orchards, but low throughput and poor performance in orchard environments hinder its commercial adoption. This paper presents a modular dual-arm apple harvesting robot that uses vertically stacked arms to enable simultaneous operation in the upper and lower zones of a single tree, simplifying platform positioning from multi-tree lateral repositioning to single-tree stops. Compared to our prior horizontal dual-arm system, the platform integrates 5 advances: (1) a foundation-model-based perception pipeline combining Grounding-DINO and EfficientViT-SAM for robust fruit localization in unstructured outdoor environments; (2) 7th-order jerk-bounded trajectory generation paired with a Control Barrier Function safety filter to achieve fast yet safe arm motions; (3) a linear sweep harvesting strategy with a 10 cm approach buffer and rotational detachment that improves picking reliability; (4) a temporal-logic-based dual-arm coordination policy with asynchronous vision-arm scheduling that maximizes usage of a shared vacuum source; and (5) field validation in 2 commercial orchards covering different apple varieties and tree architectures during the 2025 harvest season. Across the 1738 arm cycles collected in these field trials, the system achieved an 80.0% per-attempt success rate and a mean per-arm cycle time of 7.53 s. Fruit damage assessments confirmed that 91.2% of robotically harvested fruit retained the highest USDA grade (Extra Fancy), with bruise rates between 2.4% and 4.9%. With further improvements in the picking cycle time and handling of heavy foliage occlusions, this new modular robot design holds promise for commercial harvesting of apples.
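
The abstract names 7th-order jerk-bounded trajectory generation but does not give the formula. As a minimal sketch, assuming a rest-to-rest move with zero velocity, acceleration, and jerk at both ends, the lowest-degree polynomial meeting those eight boundary conditions is s(tau) = 35 tau^4 - 84 tau^5 + 70 tau^6 - 20 tau^7. The function names below are hypothetical, and this is an illustration rather than the authors' generator.

```python
def s7(tau):
    """Normalized position for normalized time tau in [0, 1]."""
    return 35 * tau**4 - 84 * tau**5 + 70 * tau**6 - 20 * tau**7


def s7_derivatives(tau):
    """Normalized velocity, acceleration, and jerk (derivatives w.r.t. tau)."""
    vel = 140 * tau**3 - 420 * tau**4 + 420 * tau**5 - 140 * tau**6
    acc = 420 * tau**2 - 1680 * tau**3 + 2100 * tau**4 - 840 * tau**5
    jerk = 840 * tau - 5040 * tau**2 + 8400 * tau**3 - 4200 * tau**4
    return vel, acc, jerk


def peak_normalized_jerk(samples=2001):
    """Largest |jerk| of the normalized profile, found by dense sampling."""
    return max(abs(s7_derivatives(i / (samples - 1))[2]) for i in range(samples))


def min_duration(distance, jerk_limit):
    """Shortest move time whose sampled peak jerk stays within jerk_limit.

    Physical jerk equals the normalized jerk scaled by distance / T**3.
    """
    return (abs(distance) * peak_normalized_jerk() / jerk_limit) ** (1.0 / 3.0)


def joint_position(q0, q1, t, duration):
    """Joint position at time t for a move from q0 to q1 lasting `duration`."""
    tau = min(max(t / duration, 0.0), 1.0)  # clamp so the arm holds at q1
    return q0 + (q1 - q0) * s7(tau)
```

Choosing the duration from the jerk limit, as `min_duration` does, is one simple way to keep the motion fast while respecting an actuator bound; a safety filter such as the Control Barrier Function mentioned in the abstract would sit on top of this and is not modeled here.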

2606.14188 2026-06-15 cs.RO cs.AI cs.LG cs.SY eess.SY math.OC new submission

Robustness without Wrinkles: Parallel Simulation and Robust MPC for Certified Deformable Manipulation

Wei-Chen Li, Jeffrey Fang, Sasanka Polisetti, Yuexi Song, Glen Chou

Affiliations: Georgia Institute of Technology

AI summary: Proposes CORD-SLS, a real-time control method that uses GPU-parallel differentiable simulation with contact smoothing for efficient gradient-based planning, combined with robust model predictive control and conformal-prediction calibration, achieving millisecond-level planning and high safety in rope and cloth manipulation.

Abstract

We present CORD-SLS, a real-time control method for safe deformable object manipulation, with a focus on ropes and cloth. At its core is a GPU-parallel differentiable simulator with contact smoothing which enables efficient gradient-based planning through intermittent contact. To robustly satisfy constraints under model and sensing uncertainty, we develop a real-time, GPU-parallel output-feedback robust model predictive control (MPC) algorithm that plans with this simulator. We further show that the simulator accelerates model-based RL for training neural manipulation policies. To improve real-world robustness, we use conformal prediction to calibrate visual-feedback and perception-error bounds for MPC, producing reachable tubes that enable high-probability safe control. We evaluate CORD-SLS on high-dimensional, contact-rich rope and cloth manipulation tasks in simulation and hardware, including obstacle avoidance, routing, folding, and smoothing. Across settings, CORD-SLS achieves millisecond-speed planning, exceeding baselines in safety, speed, and task success.
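
The abstract says conformal prediction calibrates the perception-error bounds used by the MPC. A minimal sketch of the standard split-conformal recipe follows, assuming exchangeable calibration errors; the function name and the numbers are made up for illustration, and this is not the CORD-SLS calibration code.

```python
import math


def conformal_bound(calibration_errors, alpha):
    """Error bound that a fresh error stays under with probability >= 1 - alpha.

    Split conformal prediction: take the ceil((n + 1) * (1 - alpha))-th
    smallest calibration error. If that rank exceeds n, there are too few
    samples to certify the requested level, so the bound is infinite.
    """
    n = len(calibration_errors)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_errors)[rank - 1]


# Hypothetical perception errors (metres) from 19 held-out frames.
errors = [0.01 * i for i in range(1, 20)]
bound_90 = conformal_bound(errors, alpha=0.1)
```

A robust controller can then treat `bound_90` as the radius of the perception uncertainty set, which is how a calibrated bound turns into a high-probability safety statement.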

2606.14250 2026-06-15 cs.RO new submission

SyLink Hand: A Synergy-Inspired Linkage-Driven Anthropomorphic Hand for Human-Like Dexterity

Hao Wu, Yanzhe Wang, Yu Feng, Yitong Li, Jingxiang Guo, Jian Liu, Jianshu Zhou

Affiliations: National University of Singapore; Zhejiang University

AI summary: Inspired by human hand synergies, proposes the SyLink Hand, an anthropomorphic dexterous hand that combines biomechanical synergy principles with linkage-driven mechanisms to achieve a high degree of anthropomorphism in appearance, kinematics, and function within a compact, low-cost architecture, showing that synergy-inspired linkage design balances anthropomorphism, mechanical simplicity, and functional versatility.

Abstract

Designing anthropomorphic robotic hands that balance functional dexterity with mechanical simplicity remains a significant challenge. Inspired by human hand synergies, this paper presents the SyLink Hand, an anthropomorphic dexterous hand that integrates biomechanical synergy principles with linkage-driven transmission mechanisms to achieve a high degree of anthropomorphism in appearance, kinematics, and functionality within a compact and cost-effective architecture. Biomechanical analysis of natural hand motions using motion capture gloves reveals strong kinematic correlations among hand joints, providing the basis for a simplified yet functional degree-of-freedom (DOF) configuration. Guided by these synergistic characteristics, optimized linkage mechanisms are employed to coordinate multiple joint motions and reproduce natural finger trajectories. A novel spherical four-bar linkage is further proposed to achieve decoupled flexion/extension (Flex/Ext) and abduction/adduction (Abd/Add) at the metacarpophalangeal joint within a compact form factor. The resulting prototype integrates 19 joints driven by 11 actuators, with a total mass of 520g and a manufacturing cost of approximately USD 400. Experimental evaluations demonstrate its human-like kinematic performance, high load-bearing capability, and versatile grasping and manipulation skills. These results validate that the synergy-inspired, linkage-based design effectively balances anthropomorphism, mechanical simplicity, and functional versatility, highlighting its potential for practical deployment in dexterity-demanding robotic applications.

2606.14531 2026-06-15 cs.RO new submission

Impedance MPC with Disturbance Estimation for Dexterous Hand Control

Yongyan Cao

AI summary: Proposes an actuator-agnostic impedance model predictive control framework that uses an algebraic feedforward to reduce the tendon transmission to a constant-coefficient double integrator and adds an encoder-only augmented Kalman disturbance estimate, achieving high-precision trajectory tracking with safe contact-force control.

Abstract

Dexterous hands must simultaneously track precise finger trajectories and maintain safe, compliant contact -- objectives in tension for any fixed-gain controller. We present an actuator-agnostic Impedance Model Predictive Control (Impedance MPC) framework for dexterous fingers, instantiating the constant-$A_d$ offset-free architecture established for physical human-robot interaction (pHRI); its stability, recursive-feasibility, and input-to-state-stability guarantees are inherited by preserving the architectural assumptions. An algebraic feedforward reduces the tendon transmission -- hydraulic, cable, pneumatic, twisted-string, or series-elastic -- to a constant-coefficient double integrator, so the QP cost inverse is precomputed offline and a 10-step receding-horizon quadratic program runs at 500 Hz while enforcing hard constraints on contact force (ISO/TS 15066), actuation limits, and jerk. An encoder-only augmented-Kalman disturbance state drives steady-state error to zero under any constant contact load. On a hydraulically actuated finger -- the worked example platform, adding pressure and cavitation constraints -- the 500 Hz Kalman MPC attains 0.5 mrad RMS, 0.1 mrad steady-state, and 6.6 mrad peak deflection under 1.5 Nm contact: 183$\times$, 1500$\times$, and 23$\times$ better than classical impedance. The realized first-move stiffness (18$\to$323 Nm/rad with update rate) is independently verified. The architecture scales to a 16-DOF LEAP Hand MuJoCo simulation, recovering from 2.5 N grasp-load disturbances within 0.7 s.
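
The encoder-only disturbance estimate described above can be illustrated with a small augmented Kalman filter. The sketch below assumes a double-integrator joint model with the disturbance expressed as an unknown constant acceleration, a 500 Hz sample rate as in the abstract, and hand-picked noise settings; all names and tuning values are hypothetical, and this is not the paper's controller.

```python
DT = 0.002  # 500 Hz update rate, as quoted in the abstract

# State x = [angle, angular velocity, disturbance acceleration].
A = [[1.0, DT, 0.5 * DT * DT],
     [0.0, 1.0, DT],
     [0.0, 0.0, 1.0]]
B = [0.5 * DT * DT, DT, 0.0]  # commanded acceleration enters like the disturbance


def step_model(x, u):
    """Propagate the augmented double integrator by one sample."""
    return [sum(A[i][j] * x[j] for j in range(3)) + B[i] * u for i in range(3)]


def predict(x, P, u, q_dist):
    """Kalman prediction; process noise acts only on the disturbance state."""
    AP = [[sum(A[i][k] * P[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    P_pred = [[sum(AP[i][k] * A[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
    P_pred[2][2] += q_dist
    return step_model(x, u), P_pred


def update(x, P, z, r_meas):
    """Kalman update with an encoder that measures the angle only."""
    s = P[0][0] + r_meas                      # innovation variance (scalar)
    K = [P[i][0] / s for i in range(3)]       # gain for H = [1, 0, 0]
    innovation = z - x[0]
    x_new = [x[i] + K[i] * innovation for i in range(3)]
    P_new = [[P[i][j] - K[i] * P[0][j] for j in range(3)] for i in range(3)]
    return x_new, P_new


def estimate_disturbance(true_d, steps=500, u=0.0):
    """Simulate a constant disturbance and return the filter's estimate of it."""
    truth = [0.0, 0.0, true_d]
    x = [0.0, 0.0, 0.0]
    P = [[1e-2, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 100.0]]
    for _ in range(steps):
        truth = step_model(truth, u)
        x, P = predict(x, P, u, q_dist=1e-2)
        x, P = update(x, P, z=truth[0], r_meas=1e-6)
    return x[2]
```

Feeding the estimated disturbance into the prediction model is what lets an MPC of this kind cancel a constant load instead of settling with an offset.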

2606.08555 2026-06-15 cs.RO updated version

Occupancy-Grounded Room Segmentation for Hierarchical 3D Scene Graphs

Carlos Cueto Zumaya, Iacopo Catalano, Jorge Peña-Queralta, Wallace Moreira Bessa

Affiliations: University of Turku; Centre for Artificial Intelligence, Zürich University of Applied Sciences

AI summary: Proposes anchoring room nodes to an occupancy decomposition when building hierarchical 3D scene graphs; on the Matterport3D dataset it recovers more room instances than the baseline.

Abstract

Hierarchical 3D scene graphs (3DSGs) for indoor robots organize geometric and semantic information across spatial scales, with a room layer that connects object-level perception to room-scale reasoning. Existing systems construct this layer from different spatial substrates (e.g., place clusters, wall planes, or segmentation outputs), and as a result, room nodes are not evaluated on a common geometric criterion. We present an occupancy-grounded 3DSG pipeline in which room nodes are anchored to tracked free-space regions derived from occupancy decomposition, giving each room an explicit polygonal footprint. We evaluate the pipeline on 12 Matterport3D scenes by matching predicted room polygons to annotated room instances and compare against Hydra, a representative state-of-the-art place-connectivity baseline. The results show that occupancy-grounded anchoring recovers substantially more room instances than place-connectivity construction, at the cost of lower precision, and that wall-accurate room boundaries remain an open problem for both methods. Code is available at https://github.com/crcz25/OccuSG.

2606.13878 2026-06-15 cs.RO new submission

AnyGoal: Vision-Language Guided Multi-Agent Exploration for Training-Free Lifelong Navigation

MoniJesu James, Marcelino Julio Fernando, Miguel Altamirano Cabrera, Dzmitry Tsetserukou

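Affiliations: University of California, Berkeley

AI summary: Proposes AnyGoal, a training-free multi-robot architecture in which a vision-language model (VLM) drives frontier exploration and agents coordinate through a shared 2D Gaussian Bayesian Value Map (BVM) that accumulates evidence across subtasks. It reaches a 52.4% subtask success rate on GOAT-Bench, 27.5 percentage points above Modular GOAT.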

Comments 17 pages, 3 figures

Abstract

End-to-end navigation policies trained on large simulation corpora degrade sharply when transferred to out-of-distribution scenes, categories, or goal modalities. Modular pipelines such as Modular GOAT are bottlenecked by closed-set object detection recall, while 3D snapshot-memory systems (e.g. 3D-Mem) accumulate dense, view-dependent representations that are heavy to maintain. We present AnyGoal, a training-free multi-robot architecture that places a Vision-Language Model (VLM) at the core of frontier-based exploration and coordinates agents through a shared 2D Gaussian Bayesian Value Map (BVM). The BVM maintains a per-pixel (mu, sigma^2) posterior over goal relevance, updated via precision-weighted fusion of VLM scores through a depth-cone mask, and is never reset between subtasks, yielding lifelong evidence accumulation. Frontiers are ranked by a convex blend of a VLM-as-judge softmax and a Bayesian UCB term on the BVM. A greedy allocator with spatial-separation penalty and commitment hysteresis distributes frontiers across agents without a centralized controller. On the full GOAT-Bench val unseen split (360 episodes, 2,669 subtasks), our dual-agent system achieves 52.4% Subtask SR at 12.7% SPL--state of the art under the strict physical regime (discrete 0.25 m steps, no teleportation, 42 deg HFOV) and a +27.5 pp improvement over Modular GOAT (24.9%). Single-agent AnyGoal achieves 41.9% Subtask SR, showing gains arise from the decision architecture. A four-way perception ablation shows that open-vocabulary detectors shift the dominant failure mode from exploration to goal verification.
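
The per-pixel (mu, sigma^2) update and the frontier ranking described above can be sketched as follows. This assumes the textbook Gaussian precision-weighted fusion and a simple UCB of mu + beta * sigma; the function names, the blend weight `lam`, and `beta` are hypothetical, and the depth-cone masking and multi-agent allocator are left out.

```python
import math


def fuse(mu, var, score, score_var):
    """Precision-weighted fusion of a prior N(mu, var) with one VLM score."""
    precision = 1.0 / var + 1.0 / score_var
    post_var = 1.0 / precision
    post_mu = post_var * (mu / var + score / score_var)
    return post_mu, post_var


def ucb(mu, var, beta=1.0):
    """Optimistic value: posterior mean plus beta standard deviations."""
    return mu + beta * math.sqrt(var)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_frontiers(vlm_logits, cells, lam=0.5, beta=1.0):
    """Order frontier indices by a convex blend of VLM softmax and UCB.

    `cells` holds the (mu, var) posterior at each frontier's map location.
    """
    probs = softmax(vlm_logits)
    scores = [lam * p + (1.0 - lam) * ucb(mu, var, beta)
              for p, (mu, var) in zip(probs, cells)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Because the posterior variance only shrinks as observations are fused, never resetting the map between subtasks means later goals start from everything already seen. The headline margin can also be checked directly from the abstract's figures: 52.4 - 24.9 = 27.5 percentage points.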

2606.13990 2026-06-15 cs.RO new submission

SplatlessDF: Continuous Distance Field Mapping with Non-Splatting Gaussians

Monisha Mushtary Uttsha, Lan Wu, Teresa Vidal-Calleja

Affiliations: UTS Robotics Institute, Faculty of Engineering and IT, University of Technology Sydney; School of Engineering, University of Western Australia

AI summary: Proposes SplatlessDF, a framework that uses anisotropic Gaussian elements from a spatial perspective to build a continuous distance field supporting distance and gradient queries; it can be coupled with 2D Gaussian splatting for unified modelling and is suited to robotic navigation.

Abstract

Recent Gaussian splatting (GS) methods have shown that scenes can be represented efficiently with optimisable Gaussians for high-quality reconstruction and rendering. In this paper, building on this principle, we introduce SplatlessDF, a continuous distance field (DF) mapping framework that uses anisotropic Gaussian elements from a spatial rather than photometric perspective. SplatlessDF directly parameterises the Gaussians and optimises to recover a differentiable DF, enabling distances and gradients to be queried in the spatial domain for downstream robotic tasks such as navigation. Furthermore, SplatlessDF can be coupled with 2D Gaussian splatting (2DGS), providing a unified framework based solely on Gaussian primitives that can learn continuous DF and surface models and supports photometric rendering. We consider two settings: a standalone DF-only formulation and a joint DF-rendering formulation coupled with 2DGS. Experiments show that the standalone formulation provides efficient and accurate distance and gradient queries, while the joint formulation improves rendering geometry and simultaneously models a continuous DF. These results highlight the potential of GS-style representations not only for surface modelling and rendering but also for mapping representations suited to robotic navigation.

2606.14160 2026-06-15 cs.RO new submission

Schrödinger's Navigator: Imagining an Ensemble of Future Trajectories for Zero-Shot Object Navigation

Yu He, Da Huang, Zhenyang Liu, Zixiao Gu, Qiang Sun, Guangnan Ye, Yanwei Fu, Yu-Gang Jiang

Affiliations: Fudan University; Shanghai Jiao Tong University; Shanghai University of International Business and Economics; Shanghai Innovation Institute

AI summary: Proposes a belief-aware framework that, at inference time, imagines multiple future scenes with a trajectory-conditioned 3D world model and combines adaptive occluder-aware sampling with a Future-Aware Value Map, improving hidden-target discovery and risk-aware path selection for zero-shot object navigation in occlusion-heavy environments.

Abstract

Zero-shot object navigation (ZSON) requires robots to find target objects in unseen environments without task-specific fine-tuning or pre-built maps, a key capability for general-purpose service robots. Yet methods that perform well in simulation often degrade in cluttered real-world scenes with severe occlusion and latent hazards, where large unseen regions make single-scene inference brittle and unsafe. We propose Schrödinger's Navigator, a belief-aware framework that reasons at inference time over multiple trajectory-conditioned imagined 3D futures. Given candidate paths, a trajectory-conditioned 3D world model predicts hypothetical observations and maintains a superposition of plausible scene realizations rather than committing to one map. An adaptive occluder-aware sampler directs imagination to uncertainty-critical regions, while a Future-Aware Value Map (FAVM) aggregates imagined futures for robust, proactive action selection. Experiments in simulation and on a physical Go2 quadruped show that Schrödinger's Navigator outperforms strong ZSON baselines, improving hidden-target discovery and risk-aware waypoint selection in occlusion-heavy navigation scenarios. These results highlight imagined 3D futures as a scalable and generalizable strategy for zero-shot navigation in uncertain real-world environments.

2606.14083 2026-06-15 cs.RO new submission

The N2D Haptic Glove: A Multi-Finger Glove for 2D Directional Force Feedback for Contact Rich Manipulation

Yao-Ting Huang, Jake Honma, Omar Hernandez, Logan Li, Kaitlin Calimbahin, Bryce Hackel, Michael C. Yip

Affiliations: University of California San Diego

AI summary: Proposes the N2D Haptic Glove, which uses capstan drives to render 2D flexion-extension force feedback at the fingertips, significantly reducing contact-force error and improving consistency in teleoperation.

Abstract

Humans rely on directional fingertip forces to probe and regulate contact during manipulation, yet most wearable haptic gloves render only vibration or single-axis force, leaving force direction ambiguous. Without directional cues, users must infer contact force from vision alone, often leading to over-pressing, inconsistent control, and reduced precision in robotic teleoperation. We present the N2D Haptic Glove, a multi-finger wearable device that renders planar flexion-extension fingertip forces using capstan-drive transmissions for high-transparency force feedback. Through benchtop validations and a user study involving haptic teleoperation of a robotic arm and hand, we demonstrate that compared to visual-only and single-axis haptic baselines, planar fingertip feedback significantly reduces contact force error during precise manipulation, improves trial-to-trial consistency, and enhances overall user experience in axial probing tasks. These findings establish the N2D Haptic Glove and directional finger-based haptic devices as a promising modality for contact-rich teleoperation, immersive virtual reality simulations, and robot learning from demonstrations. The N2D Haptic Glove's hardware and software system will be fully open-sourced at https://ucsdarclab.github.io/n2d-glove/.

2606.14218 2026-06-15 cs.RO cs.AI cs.LG new submission

Universal Manipulation Exoskeleton: Learning Compliant Whole-body Policies with Real-time Torque Feedback

Litian Liang, Jingxi Xu, Xinda Qi, Yujun Cai, Houzhu Ding, Luqi Wang, Zhixin Sun, Jyh-Herng Chow, Ming Yang, Mark Cutkosky

Affiliations: Ant Group; Stanford University

AI summary: Proposes the Universal Manipulation Exoskeleton (UME), which provides real-time haptic torque feedback and whole-body data collection so that robots can learn actively compliant policies for tasks such as mobile manipulation and force-mediated flipping in constrained spaces.

Abstract

For robots to work safely in household environments, they need to be compliant and react to torque and force feedback during contact. However, the majority of existing data collection pipelines still lack the ability to capture force and torque data for learning active compliant policies. In this paper, we present Universal Manipulation Exoskeleton (UME), an upper-limb exoskeleton that provides real-time haptic torque feedback while recording whole-arm configurations and joint torque signals for teleoperation. With transparent torque feedback, human operators can even unsheathe kinematically constrained objects while blindfolded. UME is low-cost, lightweight, and portable. Equipped with an embedded IMU, it enables teleoperation for mobile manipulation. With our proposed universal retargeting algorithm, UME can teleoperate a range of robots, including the 7DoF OpenArm, 7DoF Franka, and 6DoF X-ARM. We demonstrate that this combination of capabilities enables learning bimanual, whole-body, and active compliant policies that operate effectively in highly constrained spaces. The learned robust autonomous policies achieve high success rates across a variety of tasks, including long-horizon mobile manipulation, force-mediated box flipping, visually occluded box pushing, and space-constrained tabletop manipulation. Videos, code, and additional information can be found at https://ume-exo.github.io.

2606.14602 2026-06-15 cs.RO new submission

What Robots Do Matters More Than What They Look Like: Task Context Shapes Trust in Educational HRI

Anna-Maria Velentza, Konstantina Nikou, Anne-Gwenn Bosser, Nikolaos Fachantidis

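Affiliations: LIRES Robotics Lab, University of Macedonia

AI summary: A video-based experiment (N=81) finds a significant main effect of task type (teaching, instruction, requesting personal information) on trust and no significant effect of robot appearance, indicating that task context matters more than physical appearance.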

Comments Accepted in the 35th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2026), Kitakyushu, Fukuoka, Japan

Abstract

Socially assistive robots (SARs) are increasingly deployed in educational and information-sharing contexts, supported by advances in large language models that enable fluent real-time interaction. Despite the growing diversity of robot embodiments, it remains unclear whether a single robot appearance is appropriate across different interaction tasks or whether trust depends primarily on contextual factors. In this study, we examine how robot appearance and task type jointly influence trust in robots. Using a within-subjects video-based experiment (N = 81), participants evaluated three robots with distinct appearances while performing three educationally relevant tasks: teaching, procedural instruction, and personal-information discussion. Results from repeated-measures analyses show a strong main effect of task on trust, with participants reporting the highest trust during instructional guidance, moderate trust during teaching activities, and significantly lower trust when robots requested personal information. In contrast, robot appearance showed no significant main effect, and the interaction between appearance and task was marginal. These findings suggest that trust in human-robot interaction is shaped more strongly by task context than by physical embodiment alone. By focusing on future educators as end users, this work contributes empirical evidence toward task-aware robot deployment in educational environments and highlights the importance of aligning robot roles and behaviors with interaction goals rather than relying solely on anthropomorphic design.

2606.14617 2026-06-15 cs.RO cs.SY eess.SY new submission

Self-Improving VLA Policies: Selected Diffusion Noise for Spurious-Resistant Action Smoothing

Duc Minh Nguyen, Bao-Ngoc Dao, Tung M. Luu, Binh Gia Nguyen, Vinh Tong, Anji Liu, Vu N. Duong, Dung D. Le, Daniel Sonntag, Trung Le, Ngan Le, Jan Peter, An Thai Le, Minh Nhat Vu, Mathias Niepert, Khoa D. Doan, Duy M. H. Nguyen, Vien Anh Ngo

Affiliations: Center for AI Research, VinUniversity; VinRobotics; KAIST; University of Stuttgart; IMPRS-IS; National University of Singapore; DFKI; University of Oldenburg; Monash University; University of Arkansas; TU Darmstadt

AI summary: Proposes a training-free selected-diffusion-noise method that dynamically samples noise vectors to improve the robustness and action smoothness of vision-language-action policies, raising success rates by 8% in simulation and 10% in real-world settings.

Abstract

Diffusion-based Vision-Language-Action (VLA) policies enable strong generalization in robotic manipulation, but remain sensitive to spurious visual correlations and noisy action generation, leading to brittle behavior under perturbations. We introduce Selected Diffusion Noise (SDN), a simple, training-free test-time method that improves both robustness and success rate by leveraging the diffusion noise space as a controllable degree of freedom. SDN dynamically samples noise vectors that are maximally separated from a reference set to mitigate reliance on spurious cues, while selecting candidates that yield more coherent action trajectories. This dual objective encourages stable behavior even under object-masked observations and reduces action jitter without modifying model parameters. We evaluate SDN on two simulation benchmarks (Google Robot, Widow-X) and two real-world robotic datasets across multiple VLA policies, including pi_0, Groot-N1.5, and Groot-N1.6. SDN consistently improves success rates by +8% in simulation and +10% in real-world settings, while producing smoother and more stable actions. Our results highlight that diffusion noise selection can serve as an effective and general mechanism for enhancing VLA policies at test time.
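
The dual objective described above (noise far from a reference set, actions that stay coherent) can be sketched as a scoring rule over candidate noise vectors. Everything here is an assumption for illustration: the abstract does not give the scoring formula, so the sketch uses minimum Euclidean distance to the reference set minus a weighted roughness penalty, with a toy one-dimensional action trajectory and a stand-in `policy` callable.

```python
import math
import random


def sample_candidates(count, dim, seed=0):
    """Draw candidate noise vectors from a standard normal distribution."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


def min_distance(z, reference):
    """Distance from z to its nearest neighbour in the reference set."""
    return min(math.dist(z, r) for r in reference)


def roughness(actions):
    """Sum of squared step-to-step changes; lower means smoother actions."""
    return sum((b - a) ** 2 for a, b in zip(actions, actions[1:]))


def select_noise(candidates, reference, policy, weight=1.0):
    """Pick the candidate that is far from the reference set yet smooth.

    `policy` maps a noise vector to an action trajectory; it stands in for
    running the diffusion policy's denoiser from that initial noise.
    """
    def score(z):
        return min_distance(z, reference) - weight * roughness(policy(z))
    return max(candidates, key=score)
```

Because the selection only changes which initial noise is fed to the sampler, a rule of this shape leaves the policy's parameters untouched, which matches the training-free framing in the abstract.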

2606.14409 2026-06-15 cs.RO cs.AI new submission

Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

He Zhang, Lingzhu Xiang, Haitao Lin, Zeyu Huang, Minghui Wang, Dingyan Zhong, Yubo Dong, Yihao Wu, Yongming Rao, Dongsheng Zhang, Wanjia He, Ling Chen, Kai Huang, Jiahao Chen, Sichang Su, Xumin Yu, Ziyi Wang, Chengwei Zhu, Xiao Teng, Yuchun Guo, Yufeng Zhang, Yuandong Liu, Rui Wang, Zisheng Lu, Han Hu, Zhengyou Zhang

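Affiliations: University of Science and Technology of China; Tsinghua University

AI summary: Proposes HyVLA-0.5, an end-to-end robot learning stack covering data collection, model design, pre-training and fine-tuning, RL post-training, and real-world deployment, with the components working together.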

2606.14010 2026-06-15 cs.CV cs.LG cs.RO cross-list

A Multi-Objective Multi-Agent Reinforcement Learning Approach to Micro-Swarm Motion Optimization in Dynamic Flow Fields

Josef Berman, Oren Gal

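Affiliations: Hatter Department of Marine Technologies, Leon H. Charney School of Marine Sciences, University of Haifa

AI summary: Proposes a hybrid CFD and multi-objective multi-agent reinforcement learning framework that resolves gradient conflicts with PCGrad to optimize upstream progression, energy efficiency, and motion smoothness of a micro-robot swarm in oscillatory flow.

AI abstract (translated from Chinese)

Coordinating micro-robot swarms in physiologically realistic, time-dependent fluid environments remains an unsolved challenge for biomedical and environmental applications. We present a hybrid computational fluid dynamics and multi-objective multi-agent reinforcement learning framework that directly couples a high-fidelity incompressible Navier-Stokes solver with decentralized proximal policy optimization to learn physically consistent swarm control policies in oscillatory flow. Sixteen magnetically actuated micro-robots navigate a pulsatile arterial waveform while jointly optimizing upstream progression, energy conservation, and motion smoothness, with the objectives reconciled through PCGrad surgery. Without PCGrad, the energy-efficiency and smoothness rewards fall to near zero within 10,000 training steps while progress shows sustained large oscillations, confirming that gradient-conflict resolution is a structural requirement in this domain rather than an optional refinement. The converged policy achieves progress rewards of 6.5-7.0, sustained energy efficiency of 0.63-0.65, and near-maximal smoothness (0.97-0.99), improving on the brute-force baselines on the primary objective, whereas both baselines have negative energy efficiency throughout. Training reveals three emergent behavioral phases: a collective two-layer hydrodynamic throttling formation that suppresses peak channel velocity during forward flow, a cycle-synchronized ratchet mechanism that exploits flow reversals for upstream repositioning, and individualized final approaches as agents near the success boundary. These results show that time-dependent fluid-agent interactions can be captured directly within a multi-objective reinforcement learning loop, providing a physics-grounded paradigm for micro-swarm control in biomedical navigation, environmental monitoring, and industrial microfluidics.

Abstract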

Coordinating micro-robotic swarms in realistic, time-dependent fluid environments remains a major challenge for biomedical and environmental applications. We present a hybrid CFD-MO-MARL (Computational Fluid Dynamics-Multi Objective-Multi Agent Reinforcement Learning) framework that couples a high-fidelity incompressible Navier--Stokes solver with decentralized proximal policy optimization to learn swarm control policies in oscillatory flow. Sixteen magnetically actuated micro-robots were simulated to navigate a pulsatile arterial waveform within a 2 mm channel while jointly optimizing upstream progression, energy efficiency, and motion smoothness. Conflicting objectives are resolved using Projected Conflicting Gradient (PCGrad) surgery. Without PCGrad, energy and smoothness rewards collapse during training, demonstrating that gradient conflict resolution is essential for stable multi-objective learning. The converged policy achieves progress rewards of 6.5-7.0, energy efficiency of 0.63-0.65, and smoothness of 0.97-0.99, outperforming brute-force baselines by more than 8 reward units on the primary objective. Training reveals three emergent behaviors not encoded in the reward function: hydrodynamic throttling formations that reduce peak flow velocities, a cycle-synchronized ratchet mechanism that exploits flow reversals for upstream movement, and individualized final-approach strategies near the target boundary. These results demonstrate that physically realistic fluid--agent interactions can be integrated directly into multi-objective reinforcement learning, providing a scalable framework for micro-swarm control in biomedical navigation, environmental monitoring, and microfluidic systems.
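
The gradient surgery named in the abstract follows a simple rule: when two objectives' gradients have a negative inner product, project one onto the normal plane of the other before summing. The sketch below is a plain-Python illustration of that published PCGrad rule on toy vectors; the function names are hypothetical and it is not the authors' training code.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcgrad(grads, rng=random):
    """Combine per-objective gradients with PCGrad projection.

    For each objective i, visit the other objectives in random order and,
    whenever the current projected gradient conflicts with g_j (negative
    inner product), remove its component along g_j. The projected
    gradients are then summed into a single update direction.
    """
    projected = []
    for i, g in enumerate(grads):
        g = list(g)
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            g_j = grads[j]
            inner = dot(g, g_j)
            if inner < 0.0:  # conflict: drop the opposing component
                scale = inner / dot(g_j, g_j)
                g = [x - scale * y for x, y in zip(g, g_j)]
        projected.append(g)
    return [sum(column) for column in zip(*projected)]


# Toy example: a "progress" gradient that conflicts with an "energy" gradient.
combined = pcgrad([[1.0, 0.0], [-1.0, 1.0]])
```

In the toy example the naive sum would be [0.0, 1.0], cancelling the first objective's direction entirely, whereas the projected sum keeps a positive component along both gradients.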

2606.13840 2026-06-15 cs.RO cs.CV new submission

Multi-Agent Embodied Autonomous Driving: From V2X Information Exchange to Shared World Models

Senkang Hu, Zhengru Fang, Yihang Tao, Zihan Fang, Sam Tak Wu Kwong, Yuguang Fang

Affiliations: Lingnan University, Hong Kong

AI summary: This survey reviews the shift in autonomous driving from single-vehicle intelligence to multi-agent embodied systems, in which shared world models enable shared perception, intent inference, and cooperative planning, and it identifies research gaps in simulation-bound evaluation and real-time safety guarantees.

Abstract

Autonomous driving is shifting from isolated vehicle intelligence toward multi-agent embodied systems that share perception, infer intent, and coordinate action under uncertainty. This survey examines this transition through the lens of Shared World Models (SWMs): predictive cross-agent representations maintained across vehicles, infrastructure, and other traffic participants. We review more than 380 publications spanning vehicle-to-everything (V2X) communication, collaborative perception, inter-agent cognition, cooperative planning, end-to-end cooperative driving, and simulation and data engines for closed-loop validation. The organizing question is how exchanged observations become aligned state, intent-aware interaction, and coordinated downstream action. Across the surveyed literature, evaluation remains concentrated in simulation, curated benchmarks, and offline protocols. Foundation-model-based coordination also lacks verified real-time safety guarantees in open traffic. These gaps motivate key research priorities for multi-agent embodied autonomous driving (MAEAD): verifiable shared-state maintenance, robust intent and plan alignment, and safe coordinated action under communication, latency, and deployment constraints.


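2606.13883 2026-06-15 cs.RO new submission

Guided Diffusion with Distilled Vision-Language Reliability for Aerial Navigation

Ivan Valuev, Iana Zhura, Valerii Serpiva, Didar Seyidov, Dzmitry Tsetserukou

Affiliations: University of California, Berkeley; Stanford University

AI summary: Proposes a reliability-aware diffusion planner. A lightweight network distilled from a vision-language model produces a scene-level reliability heatmap, and the denoising process is steered around unreliable regions, which substantially lowers the obstacle-violation rate in UAV navigation and raises the mean reliability of traversed regions.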

Abstract

Autonomous UAV navigation is conventionally solved by pipelines that separate perception, mapping, and planning into distinct stages, which propagates errors, accumulates latency, and requires environment-specific retuning. End-to-end generative models remove these interfaces by mapping raw observations directly to trajectories, but inherit a subtle failure mode: trained on clean data, they cannot recognise when an observation is unreliable, and treat degraded regions such as glass, mirrors, and overexposed surfaces as valid evidence for planning. We present a reliability-aware diffusion planner for 3D UAV navigation. It conditions trajectory generation on the observation together with a scene-level reliability heatmap that marks where perception cannot be trusted, produced by a lightweight network that distils the open-vocabulary reasoning of a vision-language model within the real-time planning budget. To generalise to unseen environments without retraining, we steer the denoising process with a differentiable two-stage ESDF cost that treats physical obstacles from depth and virtual obstacles from highly unreliable regions on equal footing. In simulation and on a real quadrotor, our planner produces markedly safer trajectories than a state-of-the-art diffusion baseline, reducing the obstacle-violation rate from 40.3% to 9.6% and raising the mean reliability of traversed regions from 0.588 to 0.925. Ablating the reliability term alone drops mean reliability from 0.898 to 0.783, confirming it as the decisive component, while distillation runs the framework up to 2 times faster than the full vision-language model.


2606.14032 2026-06-15 cs.RO new submission

From Attacks to Curricula: Learnability-Guided Adversarial Training for Safe Autonomous Driving

Yuewen Mei, Tong Nie, Jie Sun, Haotian Shi, Wei Ma, Jian Sun

Affiliations: College of Transportation & Key Laboratory of Road and Traffic Engineering of Ministry of Education, Tongji University; Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University

AI summary: Proposes the AlignADV framework, which uses preference alignment to generate solvable scenarios and behavioural fingerprints to predict policy capability, then dynamically samples a curriculum to improve the convergence efficiency and safety of adversarial training for autonomous driving.

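An integrated interpretable control effectiveness learning and nonlinear control allocation methodology for overactuated aircrafts

Umut Demir, Aamir Ahmad, Walter Fichter

Affiliations: University of Stuttgart, Faculty of Aerospace Engineering and Geodesy, Institute of Flight Mechanics and Control (iFR)

AI summary: Proposes learning the control effectiveness mapping with Sparse Identification of Nonlinear Dynamics, combined with an online adaptation mechanism, to achieve efficient nonlinear control allocation for overactuated aircraft with both interpretability and low computational cost.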

Abstract

Nonlinear dynamics and the strong couplings that arise between multiple effectors undermine the assumptions behind conventional, linear control allocation techniques. When flight enters regimes where nonlinear effects dominate, linear allocators exhibit reduced accuracy due to increased model mismatch, which subsequently degrades performance and robustness of the flight control system. High-fidelity onboard models and black-box data-driven approaches can recover accuracy across the flight envelope, but respectively impose computational burdens prohibitive for real-time allocation and sacrifice the interpretability required for verification and fault diagnosis. This paper addresses these limitations by learning an explicit, physics-constrained analytical model of the control effectiveness mapping from representative flight data using Sparse Identification of Nonlinear Dynamics. The resulting mapping is compact, interpretable, and admits analytical derivatives, enabling efficient computation within nonlinear solvers that additionally incorporate actuator dynamics, without requiring an onboard model. An online adaptation mechanism monitors prediction residuals and refreshes the model when significant plant changes are detected, providing graceful reconfiguration under actuator failures and varying operating conditions. The methodology is evaluated on a high-fidelity nonlinear benchmark aircraft across a range of aggressive maneuvers, achieving accuracy comparable to a full nonlinear onboard model while substantially reducing computational cost relative to established baselines.


2503.14331 2026-06-15 cs.RO cs.CV cs.SY eess.SY updated version

ADAPT: An Autonomous Forklift for Construction Site Operation

Johannes Huemer, Markus Murschitz, Matthias Schörghuber, Lukas Reisinger, Thomas Kadiofsky, Christoph Weidinger, Mario Niedermeyer, Benedikt Widy, Marcel Zeilinger, Csaba Beleznai, Tobias Glück, Andreas Kugi, Patrik Zips

Affiliations: Center for Vision, Automation and Control; AIT Austrian Institute of Technology GmbH; Automation and Control Institute; Technische Universität Wien

AI summary: Presents ADAPT, an autonomous forklift that combines AI-driven perception with classical methods to achieve near-human-level logistics operation on unstructured construction sites, improving safety and efficiency.

Abstract

Efficient material logistics play a critical role in controlling costs and schedules in the construction industry. However, manual material handling remains prone to inefficiencies, delays, and safety risks. Autonomous forklifts offer a promising solution to streamline on-site logistics, reducing reliance on human operators and mitigating labor shortages. This paper presents the development and evaluation of ADAPT (Autonomous Dynamic All-terrain Pallet Transporter), a fully autonomous off-road forklift designed for construction environments. Unlike structured warehouse settings, construction sites pose significant challenges, including dynamic obstacles, unstructured terrain, and varying weather conditions. To address these challenges, our system integrates AI-driven perception techniques with traditional approaches for decision making, planning, and control, enabling reliable operation in complex environments. We validate the system through extensive real-world testing, comparing its continuous performance against an experienced human operator across various weather conditions. Our findings demonstrate that autonomous outdoor forklifts can operate near human-level performance, offering a viable path toward safer and more efficient construction logistics.


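2606.13746 2026-06-15 cs.RO new submission

Scalable Dynamic Tactile Sensing Enabled by Passive and Flexible Acoustic Waveguides

Guimin Long, Changhong Linghu, Chuanping Liu, Ke Xu, Xingjian Jing

Affiliations: Department of Mechanical Engineering, City University of Hong Kong

AI summary: Proposes a passive distributed tactile-sensing paradigm based on deep sub-wavelength acoustic waveguides. Elastic-membrane-capped Helmholtz resonators connected by a spring-reinforced microtube network give bending-invariant transmission. With a sparse microphone array and a lightweight neural network, a 64-node array read out by 4 microphones reaches 4 mm spatial resolution and >99% localization accuracy and supports waveform reconstruction of low-frequency signals. Prototypes include fingertip arrays, a tactile glove, and large-area skins.

Comments 40 pages, 6 figures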

Abstract

Artificial dynamic tactile sensing requires sensitivity, robustness, and compliance, yet existing technologies face trade-offs when scaling to large-area arrays, compounded by wiring complexity and cost. Here, we report a passive distributed paradigm using deep sub-wavelength acoustic waveguides that decouples performance from structural flexibility. Elastic-membrane-capped Helmholtz resonators interconnected by spring-reinforced microtubes form an enclosed network with invariant acoustic transmission under macroscopic bending. By sparsely embedding microphones, the system achieves real-time localization (4 mm highest spatial resolution; >99% accuracy in a 64-node sensing array with 4 microphones) and waveform reconstruction of low-frequency signals (<100 Hz). Fast Continuous Wavelet Transform and a lightweight neural network enable inference within 5.5 ms. We demonstrate conformable prototypes (fingertip arrays, a tactile glove, and large-area skins) that detect stimuli from single-hair contact to 5-mg particle impacts, arterial pulse waves, feather touches, and finger contact. This establishes a scalable, flexible, low-cost paradigm for next-generation human-machine interfaces.


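2606.14070 2026-06-15 cs.RO new submission

Development of a 3 in Sewer Pipe Inspection Robot with an Articulated Differential Mechanism using X-shaped Linkages

Shoya Umemura, Ryota Taniguchi, Atsushi Kakogawa

Affiliations: Ritsumeikan University

AI summary: Proposes an improved inspection robot for 3-inch sewer pipes whose articulated differential mechanism increases traction and obstacle-crossing ability, together with a cable-slack control method based on drive-wheel current sensing. Experiments verify its obstacle-crossing performance.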

Comments The 23rd International Conference on Ubiquitous Robots (UR 2026), 15-18 July, Osaka Ibaraki Campus, Ritsumeikan University, Ibaraki, Osaka, Japan

2606.13877 2026-06-15 cs.RO new submission

ContactWorld: What Matters in Vision-Tactile World Models for Contact-Rich Manipulation

Zhiyuan Zhang, Pokuang Zhou, Kaidi Zhang, Adeesh Desai, Temitope Amosa, Davood Soleymanzadeh, Jiuzhou Lei, Minghui Zheng, Yu She

Affiliations: School of Industrial Engineering, Purdue University; Department of Mechanical Engineering, Texas A&M University

AI summary: Across 12 contact-rich manipulation tasks, the study finds that spatially structured and temporally continuous representations such as point clouds markedly raise planning success rates, and that the effectiveness of tactile sensing depends on cross-modal representation compatibility.

Comments 32 pages, 12 figures, supplementary material included

Abstract

Contact-rich manipulation requires world models to reason over complex contact dynamics from multimodal sensory observations. However, it remains unclear which representation properties fundamentally support stable long-horizon planning in contact-rich settings. In this paper, we present ContactWorld, a benchmark and systematic empirical study of vision-tactile world models spanning 12 contact-rich manipulation tasks, including insertion, disassembly, screwing, and exploratory interaction. Across extensive experiments, we find that representations that are both spatially structured and temporally continuous consistently achieve the strongest planning performance. In particular, point-cloud observations improve average planning success rates from 20.7% with wrist-view observations and 22.0% with front-view observations to 32.1%. We further find that the effectiveness of tactile sensing depends critically on cross-modal representation compatibility rather than modality scaling alone. Combining point-cloud observations with tactile force-field representations, which preserve richer spatial structure and interaction dynamics, further improves performance to 36.1%, yielding the strongest overall planning performance across all evaluated tasks. Moreover, tactile sensing becomes increasingly important under long-horizon planning objectives, where compounding prediction errors and contact uncertainty accumulate over time. Together, these findings highlight the importance of representation structure, multimodal compatibility, and long-horizon robustness in vision-tactile world models for contact-rich robotic manipulation.


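2606.14058 2026-06-15 cs.RO new submission

Estimation of Ground Reaction Forces from Kinematic Data during Locomotion

Gautami Golani, Dong Anh Khoa To, Ananda Sidarta, Arun-Kumar Kaliya-Perumal, Oliver Roberts, Lek Syn Lim, Jim Patton, Domenico Campolo

Affiliations: Nanyang Technological University; Agency for Science, Technology and Research; National Healthcare Group

AI summary: Proposes a force-plate-free method that estimates ground reaction forces from marker-based motion capture data alone, computing the centre of mass from the kinematics of 16 body segments and decomposing the force into components. Experiments confirm feasibility.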

Abstract

Ground reaction forces (GRFs) provide fundamental insight into human gait mechanics and are widely used to assess joint loading, limb symmetry, balance control, and motor function. Despite their clinical relevance, GRFs remain underutilised in clinical workflows due to the practical limitations of force plate systems. In this work, we present a force-plate-free approach for estimating GRFs using only marker-based motion capture data. Because it estimates and decomposes GRFs from kinematics alone, the method is well suited to widespread clinical deployment. By using kinematics from sixteen body segments, we estimate the centre of mass (CoM) and compute GRFs, which are subsequently decomposed into individual components through a minimization-based approach. Through this framework, we can identify gait stance phases and provide access to clinically meaningful kinetic measures without a dedicated force plate system. Experimental results demonstrate the viability of CoM and GRF estimation based solely on kinematic data, supporting force-plate-free gait analysis.
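
The physical relation behind kinematics-only GRF estimation can be sketched with Newton's second law: the total ground reaction force equals body mass times the CoM acceleration plus the weight it supports. The Python sketch below is an illustration under stated assumptions, not the authors' implementation. The helper names, the z-up axis convention, the central-difference acceleration, and the segment masses supplied by the caller are all hypothetical, and the paper's minimization-based decomposition into individual components is not reproduced.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
G = 9.81  # gravitational acceleration in m/s^2; the z axis is assumed to point up


def centre_of_mass(positions: Sequence[Vec3], masses: Sequence[float]) -> Vec3:
    """Mass-weighted mean of the segment centre positions (hypothetical helper)."""
    total = sum(masses)
    return tuple(
        sum(m * p[axis] for m, p in zip(masses, positions)) / total
        for axis in range(3)
    )


def total_grf(com_track: Sequence[Vec3], body_mass: float, dt: float) -> List[Vec3]:
    """Total GRF per interior frame: F = m * (a_com + g * z_hat).

    The CoM acceleration is taken from central second differences, so the
    first and last frames of the track produce no output.
    """
    forces = []
    for i in range(1, len(com_track) - 1):
        prev, cur, nxt = com_track[i - 1], com_track[i], com_track[i + 1]
        acc = tuple((nxt[k] - 2.0 * cur[k] + prev[k]) / dt**2 for k in range(3))
        forces.append(
            (body_mass * acc[0], body_mass * acc[1], body_mass * (acc[2] + G))
        )
    return forces
```

For a motionless CoM the sketch returns a purely vertical force equal to body weight, and for a CoM in free fall it returns approximately zero, which are the two limiting cases one would expect.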


2606.08881 2026-06-15 cs.RO cs.AI updated version

Benchmarking Vision-Language-Action Models on SO-101: Failure and Recovery Analysis

Yi Yu, Xinchuan Qiu

Affiliations: Graduate School of Advanced Science and Engineering, Hiroshima University

AI summary: Presents a benchmark on the low-cost SO-101 robot platform that uses a failure taxonomy and recovery-aware metrics to systematically compare VLA and imitation-learning policies, and finds that execution instability is the dominant failure source.

Comments 13 pages, 9 figures

Abstract

Vision-Language-Action (VLA) models have demonstrated strong generalization in robotic manipulation, yet existing evaluations are primarily conducted in simulation or on expensive robotic platforms, leaving their robustness on affordable real-world robots largely unexplored. We present a standardized real-world benchmark for evaluating representative VLA and imitation learning policies on the low-cost SO-101 robotic platform. The benchmark comprises four representative manipulation tasks together with unified evaluation protocols, enabling systematic comparison under embodiment uncertainty. Using real-world teleoperated demonstrations, we fine-tune and evaluate $\pi_{0.5}$, SmolVLA, Wall-X, and ACT directly on the physical platform. Beyond conventional task success rates, the benchmark incorporates a structured failure taxonomy, semantic- and execution-level failure decomposition, and recovery-aware evaluation metrics to characterize policy robustness. Experimental results show that stronger pretrained VLA policies generally outperform the imitation learning baseline, although performance remains highly task-dependent under low-cost robotic deployment conditions. Execution instability emerges as the dominant failure source, while recovery capability varies substantially across architectures. These results highlight the importance of failure and recovery analysis beyond binary task success and establish SO-101 as a practical benchmark for evaluating embodied AI systems under realistic low-cost robotic deployment conditions.


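2606.12349 2026-06-15 cs.RO cs.SY eess.SY updated version

Traceable Virtual Sea Trials in the Marine Robotics Unity Simulator for Manoeuvring Assessment of Unmanned Surface Vehicles

Paria Rezayan

Affiliations: School of Engineering and Built Environment, Sheffield Hallam University

AI summary: To address the difficulty of obtaining data for identifying USV hydrodynamic derivatives, the paper builds a standardised virtual sea trial framework in the MARUS simulator. Automated TC/ZZ manoeuvre execution and a data acquisition and post-processing pipeline produce repeatable datasets aligned with IMO/ITTC metrics, and a case study demonstrates the framework.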

Abstract

Accurate identification of hydrodynamic derivatives is essential for precise control and autonomous navigation of Unmanned Surface Vehicles (USVs). However, acquiring high-fidelity manoeuvring data from physical sea trials is often constrained by cost, safety, and environmental disturbances. Standard manoeuvring trials, particularly Turning Circle (TC) and Zig-Zag (ZZ), remain fundamental to IMO and ITTC assessment procedures because they provide comparable performance metrics reflective of underlying hydrodynamic behaviour. This paper extends the open-source Marine Robotics Unity Simulator (MARUS) by introducing a standardised Virtual Sea Trial framework for automated execution and data generation of TC/ZZ manoeuvres. The framework provides traceable command-actuation logging, system-identification (SI)-focused data conditioning, and automated extraction of IMO/ITTC-aligned manoeuvring metrics. A key contribution is a dedicated TC/ZZ data acquisition and post-processing pipeline, improving the repeatability and auditability of simulator-based manoeuvres while producing SI-ready datasets for hydrodynamic-derivative identification and digital-twin workflows. The framework also provides explicit command-execution separation for differential-thrust steering, where manoeuvre inputs are recorded as ordered rudder-equivalent commands and realised actuation is logged as an execution-level proxy derived from applied thrust. Case study results demonstrate repeatable and IMO-compliant manoeuvre behaviour. For TC tests, the normalised advance differs by approximately 3.9% between port and starboard turns, while the tactical diameter differs by 4.6-4.7%. For ZZ tests, first and second overshoot excesses remain below 1 degree for both +/-10-degree and +/-20-degree manoeuvres, satisfying IMO criteria, while peak yaw rates range from approximately 4.1 to 5.8 degrees/second. Overall, the framework provides a repeatable and auditable virtual sea trial workflow for generating IMO/ITTC-aligned datasets and supports system identification, hydrodynamic-derivative estimation, and digital-twin calibration.
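
As an illustration of what extracting turning-circle metrics from a logged trajectory can look like, the Python sketch below reads the advance and the tactical diameter off a track and normalises both by hull length. It assumes the definitions commonly used in ship-manoeuvring practice (advance taken at a 90 degree heading change, tactical diameter at 180 degrees). The function names, the sample layout, and the synthetic circular track are hypothetical and are not taken from MARUS or from the paper's pipeline.

```python
import math
from typing import List, Sequence, Tuple

# (x along the initial course [m], lateral offset y [m], heading change [deg])
Sample = Tuple[float, float, float]


def turning_circle_metrics(track: Sequence[Sample], hull_length: float) -> Tuple[float, float]:
    """Return (advance / L, tactical diameter / L) for one turning-circle run."""
    advance = None
    tactical_diameter = None
    for x, y, dpsi in track:
        if advance is None and abs(dpsi) >= 90.0:
            advance = x  # distance along the original course at 90 degrees
        if abs(dpsi) >= 180.0:
            tactical_diameter = abs(y)  # lateral distance at 180 degrees
            break
    if advance is None or tactical_diameter is None:
        raise ValueError("track does not reach a 180 degree heading change")
    return advance / hull_length, tactical_diameter / hull_length


def circular_track(radius: float) -> List[Sample]:
    """Synthetic steady turn of constant radius, sampled every degree (test data only)."""
    return [
        (radius * math.sin(math.radians(d)), radius * (1.0 - math.cos(math.radians(d))), float(d))
        for d in range(0, 181)
    ]
```

On an ideal circle of radius R the sketch gives an advance of R and a tactical diameter of 2R, which is a convenient sanity check before applying it to simulator logs.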


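2606.14585 2026-06-15 cs.RO cs.AI new submission

Sensitivity Shaping for Latent Modeling

Hongzhan Yu, Chenghao Li, Ruipeng Zhang, Henrik Christensen, Sicun Gao

Affiliations: University of California San Diego

AI summary: To address the insufficient sensitivity of generative dynamics models when detecting policy-induced out-of-distribution (OOD) transitions, the paper proposes support-conditioned control-sensitivity regularization, which strengthens the local response to control input changes. Experiments show improved OOD detection and safer closed-loop planning.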

Abstract

Generative dynamics models enable planning in challenging robotic systems, but safe deployment requires reliably detecting policy-induced out-of-distribution (OOD) transitions. Existing methods typically treat the learned dynamics as fixed and attach post hoc support surrogates. We show that these surrogates can fail when the dynamics are locally insensitive to critical action choices: unsupported control actions may produce latent predictions that resemble demonstrated transitions, suppressing OOD signals despite large true predictive errors. To address this, we introduce support-conditioned control-sensitivity regularization, which promotes sensitive local response to control input changes in learned dynamics in high-support training regions. This preserves control-induced variation while limiting unstable extrapolation due to weak empirical support. Experiments in vision-based obstacle avoidance, manipulation, and real-robot navigation show improved OOD detection and safer closed-loop planning.


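2606.14536 2026-06-15 cs.LG cs.RO cs.SY eess.SY cross-list

快速多方开放性对话与社交机器人

Giulio Antonio Abbo, Maria Jose Pinto-Bernal, Martijn Catrycke, Tony Belpaeme

Affiliations: University of Amsterdam

AI summary: Presents a multi-party conversational system that combines multimodal perception with a large language model. The evaluation shows high engagement and addressee accuracy in parallel conversations and group discussion, with technical limitations that include audio-based speaker recognition errors and response latency.

Comments 15 pages, 5 figures, 4 tables; 2 appendices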

DOI: 10.3389/frobt.2026.1766383
Journal ref: Front. Robot. AI 13:1766383 (2026)

Abstract

Multi-party open-ended conversation remains a major challenge in human-robot interaction, particularly when robots must recognise speakers, allocate turns, and respond coherently under overlapping or rapidly shifting dialogue. This paper presents a multi-party conversational system that combines multimodal perception (voice direction of arrival, speaker diarisation, face recognition) with a large language model for response generation. Implemented on the Furhat robot, the system was evaluated with 30 participants across two scenarios: (i) parallel, separate conversations and (ii) shared group discussion. Results show that the system maintains coherent and engaging conversations, achieving high addressee accuracy in parallel settings (92.6%) and strong face recognition reliability (80-94%). Participants reported clear social presence and positive engagement, although technical barriers such as audio-based speaker recognition errors and response latency affected the fluidity of group interactions. The results highlight both the promise and limitations of LLM-based multi-party interaction and outline concrete directions for improving multimodal cue integration and responsiveness in future social robots.


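2508.18967 2026-06-15 cs.RO cs.CV updated version

Enhanced UAV Path Planning Using the Tangent Intersection Guidance (TIG) Algorithm

Hichem Cheriet, Khellat Kihel Badra, Chouraqui Samira

AI summary: Proposes the TIG algorithm, which generates feasible paths with an elliptic tangent intersection method and combines a heuristic rule with quadratic Bézier curve smoothing for efficient and safe UAV path planning in static and dynamic environments.

Comments Accepted for publication in JAMRIS Journal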

DOI: 10.14313/jamris-2026-018
Journal ref: Journal of Automation, Mobile Robotics and Intelligent Systems, 20(2), 30-52 (2026)

Abstract

Efficient and safe navigation of Unmanned Aerial Vehicles (UAVs) is critical for various applications, including combat support, package delivery, and search and rescue operations. This paper introduces the Tangent Intersection Guidance (TIG) algorithm, an advanced approach for UAV path planning in both static and dynamic environments. The algorithm uses the elliptic tangent intersection method to generate feasible paths. It generates two sub-paths for each threat, selects the optimal route based on a heuristic rule, and iteratively refines the path until the target is reached. Considering the UAV kinematic and dynamic constraints, a modified smoothing technique based on quadratic Bézier curves is adopted to generate a smooth and efficient route. Experimental results show that in static environments the TIG algorithm generates the shortest path in less time (from 0.01 seconds) and with fewer turning angles than the A*, PRM, RRT*, Tangent Graph, and Static APPATT algorithms. Furthermore, in completely unknown and partially known environments, TIG demonstrates efficient real-time path planning capabilities for collision avoidance, outperforming APF and Dynamic APPATT algorithms.
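
The abstract mentions smoothing with quadratic Bézier curves but does not describe the authors' modification. The Python sketch below therefore shows only the standard quadratic Bézier curve applied to a single sharp waypoint. The function names, the `offset` fraction that places the arc's endpoints on the two adjacent legs, and the sample count are illustrative assumptions, not parameters from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quadratic_bezier(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """Standard quadratic Bézier: B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2, t in [0, 1]."""
    u = 1.0 - t
    return (
        u * u * p0[0] + 2.0 * u * t * p1[0] + t * t * p2[0],
        u * u * p0[1] + 2.0 * u * t * p1[1] + t * t * p2[1],
    )


def smooth_corner(prev_pt: Point, corner: Point, next_pt: Point,
                  offset: float = 0.25, samples: int = 10) -> List[Point]:
    """Replace a sharp waypoint with a Bézier arc (hypothetical helper).

    The arc starts and ends a fraction `offset` of the way from the corner
    towards its neighbours and uses the corner itself as the control point.
    """
    entry = (corner[0] + offset * (prev_pt[0] - corner[0]),
             corner[1] + offset * (prev_pt[1] - corner[1]))
    exit_ = (corner[0] + offset * (next_pt[0] - corner[0]),
             corner[1] + offset * (next_pt[1] - corner[1]))
    return [quadratic_bezier(entry, corner, exit_, i / samples)
            for i in range(samples + 1)]
```

Because the arc is tangent to both legs at its endpoints, the smoothed path has a continuous heading through the corner. A real planner would also need to check the arc against obstacles and the vehicle's minimum turning radius, which this sketch does not do.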


1. Robot Learning, Imitation and Reinforcement Learning (12 papers)

FlowMo-WM: A World Model with Object Momentum and Hidden Ambient Drift

Output-Level Regularization Eliminates the Seed Lottery in Single-GPU VLA Fine-Tuning

An Attention-based Model for Robust Forecasting with Missing Modality

ReactVLA: Fast and Lightweight Reactive Robot Manipulation via Improved Mean Flow Action Generation

Elastic Queries Reinforcement Learning: Self-Aware Policy Execution for VLA Models

EgoGuide: Egocentric Guidance for Efficient Robot-Free Demonstration Collection and Learning

Causal Object-Centric Models for Planning with Monte Carlo Tree Search

X-Loco: Towards Generalist Humanoid Locomotion Control via Synergetic Policy Distillation

CoRe-MoE: Contrastive Reweighted Mixture of Experts for Multi-Terrain Humanoid Locomotion with Gait Adaptation

Improving Robotic Generalist Policies via Flow Reversal Steering

Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

2. Motion Planning, Control and Dynamics (7 papers)

Efficient Domain-Adaptive Policy Learning via Kernel Representation with Application to Quadrotor Control under Non-Stationary Disturbances

Learning Dynamic Swing-Up of an Inverted Pendulum using Remote Magnetic Actuation

Semidefinite Relaxations for Collision-Free Motion Planning

Robust Fall Recovery for Armless Bipedal-Wheeled Robots Via Force-Guided Learning

Asymmetric Friction in Geometric Locomotion

A Unified Control Architecture for Macro-Micro Manipulation using a Active Remote Center of Compliance for Manufacturing Applications

ParkourFormer: Integrating Predictive Supervision and Sequence Modeling into Parkour Locomotion

3. Manipulation, Grasping and Dexterous Hands (11 papers)

A Modular Dual-Arm Apple Harvesting Robot with Enhanced Field Performance

Robustness without Wrinkles: Parallel Simulation and Robust MPC for Certified Deformable Manipulation

SyLink Hand: A Synergy-Inspired Linkage-Driven Anthropomorphic Hand for Human-Like Dexterity

AERMANI-PLACE: Language Guided Object Placement with Aerial Manipulators

Spatially Conditioned Diffusion Policy: Learning Precise and Robust Manipulation with a Single RGB Camera

ORCA: A Platform for Open-Source Dexterity Research

Impedance MPC with Disturbance Estimation for Dexterous Hand Control

FAWAM: Force-Aware World Action Models for Closed-Loop Contact-Rich Manipulation

EquiDexFlow: Contact-Grounded SE(3)-Equivariant Dexterous Grasp Generative Flows

Bounding Boxes as Goals: Language-Conditioned Grasping via Neuro-Symbolic Planning

Digital Twin Driven Textile Classification and Foreign Object Recognition in Automated Sorting Systems

4. Navigation, Localization and SLAM (9 papers)

Occupancy-Grounded Room Segmentation for Hierarchical 3D Scene Graphs

AnyGoal: Vision-Language Guided Multi-Agent Exploration for Training-Free Lifelong Navigation

SplatlessDF: Continuous Distance Field Mapping with Non-Splatting Gaussians

GAIT: Legged Robot Proprioceptive State Estimation with Attention over Inertial-Leg Tokens

BIM-Loc: BIM-Integrated Discrepancy-Aware LiDAR-based Indoor Localization

FloVerse: Floor Plan-Guided Multi-Modal Navigation

ForestBack: Breadcrumb-Based Pedestrian Dead Reckoning for Infrastructure-Free Return Navigation

Cross-Stage Sensorimotor Perception Scheduling and Sparse Map Encoding for Efficient Edge Embodied Navigation

Schrödinger's Navigator: Imagining an Ensemble of Futures for Zero-Shot Object Navigation

5. Human-Robot Interaction and Collaborative Robots (5 papers)

The N2D Haptic Glove: A Multi-Finger Glove for 2D Directional Force Feedback for Contact Rich Manipulation

Universal Manipulation Exoskeleton: Learning Compliant Whole-body Policies with Real-time Torque Feedback

What Robots Do Matters More Than What They Look Like: Task Context Shapes Trust in Educational HRI

Whole-Body Impedance Model Predictive Control for Safe Physical Human--Robot Interaction on Floating-Base Platforms

Low-Burden LLM-Based Preference Learning: Personalizing Assistive Robots from Natural Language Feedback for Users with Paralysis

6. Embodied Intelligence and Vision-Language-Action Models (6 papers)

PhysVLA: Towards Physically-Grounded VLA for Embodied Robotic Manipulation

Self-Improving VLA Policies: Selected Diffusion Noise for Spurious-Robust Action Smoothing

Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

RT-VLA: Real-Time Vision-Language-Action Models via Knowledge Distillation

WAM4D: Fast 4D World Action Model via Spatial Register Tokens

Encoder Winners Do Not Reliably Transfer Across VLA Backbone Scale: A Frozen-Backbone Grafting Diagnostic

7. Multi-Robot and Swarm Systems (2 papers)

Optimality-Preserving Decomposition for Scalable QAOA in Natural-Language-Guided Multi-Drone Assignment

Micro-Swarm Locomotion Optimization in Dynamic Flow using Multi-Objective Multi-Agent Reinforcement Learning

8. Autonomous Vehicles, UAVs and Mobile Robots (8 papers)

Multi-Agent Embodied Autonomous Driving: From V2X Information Exchange to Shared World Models

Guided Diffusion with Distilled Vision-Language Reliability for Aerial Navigation

From Attacks to Curricula: Learnability-Guided Adversarial Training for Safe Autonomous Driving

Short-Horizon Position Accuracy of Single-Track Models: Implications for Motion Planning of Autonomous Vehicles

Selective Agentic Recovery for UAV Autonomy with a Persistent Mission Runtime

Safe Reinforcement Learning of Autonomous Highway Driving: A Unified Framework for Safety and Efficiency

An integrated interpretable control effectiveness learning and nonlinear control allocation methodology for overactuated aircrafts

ADAPT: An Autonomous Forklift for Construction Site Operation

9. Soft Robotics and Hardware Design (2 papers)

Scalable Dynamic Tactile Sensing Enabled by Passive and Flexible Acoustic Waveguides

Development of a 3 in Sewer Pipe Inspection Robot with an Articulated Differential Mechanism using X-shaped Linkages

10. Simulation, Datasets and Benchmarks (7 papers)

ContactWorld: What Matters in Vision-Tactile World Models for Contact-Rich Manipulation

ReactSim-Bench: Benchmarking Reactive Behavior World Model Simulation in Autonomous Driving

Kine2Go: Kinematic dataset for the Unitree Go2 robot with diverse gaits and motions

Instruct-Particulate: Scaling Feed-Forward 3D Object Articulation with Kinematic Control

Estimation of Ground Reaction Forces from Kinematic Data during Locomotion

Benchmarking Vision-Language-Action Models on SO-101: Failure and Recovery Analysis

Traceable Virtual Sea Trials in the Marine Robotics Unity Simulator for Manoeuvring Assessment of Unmanned Surface Vehicles

11. Safety, Robustness and Trustworthy Robotics (2 papers)