Robotics / Embodied AI - arXivDaily Topic

2606.19357 2026-06-19 cs.RO cs.AI New submission 95%

Physical Atari: A Robust and Accessible Platform for Real-time Reinforcement Learning on Robots

Khurram Javed, Joseph Modayil, Gloria Kennickell, Richard S. Sutton, John Carmack

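Affiliations: Keen Technologies; University of Alberta, Canada; Openmind Research Institute

Topic match: Robot learning. A platform for real-time reinforcement learning on robots that validates algorithms learning in the physical world.

AI summary: Proposes the Physical Atari platform, in which a robot actuates an Atari controller and game frames are rendered in real time, enabling reinforcement learning research in the physical world. It validates that algorithms can learn directly on a robot and shows that distribution shift can significantly degrade policy performance.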

Comments To appear at RLC 2026

Abstract

We built a robot called the Robotroller that actuates an Atari CX40+ controller and a device called the Atari Devbox that renders the game frame and the reward signal from the Arcade Learning Environment on a screen. The Robotroller and the Atari Devbox, together with an off-the-shelf camera and a desktop computer, constitute a system that can be used to study reinforcement learning algorithms in the physical world. We call the full system Physical Atari. In this paper, we detail the key decisions that make Physical Atari a robust and accessible platform. To make the system robust, we designed the Robotroller so that all movement is done through bearings, which reduces wear. Additionally, we wrote software that monitors the state of the servos at a high frequency and intervenes to limit stress. To make the system accessible, we used affordable off-the-shelf components and parts that can be manufactured using consumer 3D printers. Physical Atari can be built for under $1,000 and has been used for weeks of non-stop reinforcement learning experiments without any mechanical failures. We used it to validate that reinforcement learning algorithms can learn directly on robots and show that even small distribution shifts between learning and deployment can significantly degrade the performance of policies. Our results underscore the importance of on-device adaptation for strong performance on robots.

2601.02379 2026-06-19 cs.RO cs.AI Version update 95%

Movement Primitives in Robotics: A Comprehensive Survey

Nolan B. Gutierrez, Joseph M. Cloud, William J. Beksi

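Affiliations: Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, USA

Topic match: Robot learning. A comprehensive survey of movement primitives in robotics.

AI summary: Surveys movement primitive frameworks in robotics, covering methods that encode trajectories from human demonstrations, analyzing properties such as spring-damper dynamics, probabilistic coupling, and neural networks, and discussing applications and challenges.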

Comments 105 pages, 3 figures, and 6 tables

Abstract

Biological systems exhibit a continuous stream of movements, consisting of sequential segments, that allow them to perform complex tasks in a creative and versatile fashion. This observation has led researchers towards identifying elementary building blocks of motion known as movement primitives, which are well-suited for generating motor commands in autonomous systems, such as robots. In this survey, we provide an encyclopedic overview of movement primitive approaches and applications in chronological order. Concretely, we present movement primitive frameworks as a way of representing robotic control trajectories acquired through human demonstrations. Within the area of robotics, movement primitives can encode basic motions at the trajectory level, such as how a robot would grasp a cup or the sequence of motions necessary to toss a ball. Furthermore, movement primitives have been developed with the desirable analytical properties of a spring-damper system, probabilistic coupling of multiple demonstrations, using neural networks in high-dimensional systems, and more, to address difficult challenges in robotics. Although movement primitives have widespread application to a variety of fields, the goal of this survey is to inform practitioners on the use of these frameworks in the context of robotics. Specifically, we aim to (i) present a systematic review of major movement primitive frameworks and examine their strengths and weaknesses; (ii) highlight applications that have successfully made use of movement primitives; and (iii) examine open questions and discuss practical challenges when applying movement primitives in robotics.
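
The spring-damper property mentioned in the abstract can be illustrated with a minimal one-dimensional sketch. It assumes the widely used dynamic movement primitive form, tau * z' = alpha * (beta * (g - y) - z) + f and tau * y' = z, with the learned forcing term f set to zero. The gains (alpha = 25, beta = alpha / 4 for critical damping) and the function name `rollout_dmp` are illustrative choices, not values taken from the survey.

```python
def rollout_dmp(y0, goal, alpha=25.0, tau=1.0, dt=0.001, duration=2.0, forcing=None):
    """Integrate a 1-D DMP transformation system with explicit Euler steps."""
    beta = alpha / 4.0   # critical damping: converge to the goal without overshoot
    y, z = y0, 0.0       # position and scaled velocity
    trajectory = [y]
    for i in range(round(duration / dt)):
        # In a full DMP the forcing term is learned from a demonstration; zero here.
        f = forcing(i * dt) if forcing else 0.0
        z_dot = (alpha * (beta * (goal - y) - z) + f) / tau
        y_dot = z / tau
        z += z_dot * dt
        y += y_dot * dt
        trajectory.append(y)
    return trajectory


path = rollout_dmp(y0=0.0, goal=1.0)
print(round(path[-1], 3))
```

With the forcing term switched off, the system behaves as a critically damped spring pulled toward the goal, which is why such primitives reach the goal regardless of what shape the forcing term adds along the way.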

2606.19729 2026-06-19 cs.RO cs.AI New submission 90%

VOiLA: Vectorized Online Planning with Learned Diffusion Model for POMDP Agents

Marcus Hoerger, Rishikesh Joshi, Rahul Shome, Ian Manchester, Hanna Kurniawati

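Affiliations: Australian National University; The University of Sydney

Topic match: Robot learning. Proposes an online POMDP planning framework for robot planning.

AI summary: Proposes the VOiLA framework, which learns POMDP models with conditional diffusion models, accelerates sampling through distillation, integrates with a vectorized online planner, and achieves efficient online planning on three benchmark tasks and a physical robot.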

Comments Submitted to the 2026 International Symposium of Robotics Research (ISRR)

Abstract

Planning under uncertainty is an essential capability for autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for such a capability. Although POMDP-based planning has advanced significantly, its application to real-world problems is often limited by the difficulty of obtaining faithful POMDP models. We present Vectorized Online planning wIth Learned diffusion model for POMDP Agents (VOiLA), a framework that learns task-agnostic POMDP models for online planning under uncertainty. VOiLA learns transition and observation samplers using conditional diffusion models and learns observation-likelihood models for particle-based belief updates. To enable efficient online planning, the diffusion samplers are distilled into compact feedforward generators and integrated with the Vectorized Online POMDP Planner (VOPP), an online POMDP planner designed to leverage GPU parallelization. Experimental results indicate that the distillation strategy reduces sampling cost by up to nearly three orders of magnitude, making learned generative POMDP models practical for online planning. Evaluation of VOiLA on three benchmark problems indicates that VOiLA achieves equal or better performance than Recurrent Soft Actor Critic while using less than 10% of the training data, and generalizes much better to unseen environment configurations. Physical robot evaluation indicates that VOiLA, using models learned only from simulated data, generates a policy that successfully accomplishes the task in 10 of 10 runs.
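
The particle-based belief update that the abstract refers to can be sketched as a generic particle-filter step. This is a minimal illustration, not VOiLA's implementation: `noisy_step` and `gaussian_likelihood` are hand-written stand-ins for the learned transition sampler and the learned observation-likelihood model, and the one-dimensional state is an assumption made for brevity.

```python
import math
import random


def gaussian_likelihood(observation, state, sigma=0.5):
    """Stand-in for a learned observation-likelihood model p(o | s)."""
    return math.exp(-0.5 * ((observation - state) / sigma) ** 2)


def noisy_step(state, action, rng):
    """Stand-in for a learned transition sampler: 1-D motion with Gaussian noise."""
    return state + action + rng.gauss(0.0, 0.1)


def belief_update(particles, action, observation, transition_sampler, likelihood, rng):
    """One particle-filter step: propagate, weight by likelihood, resample."""
    propagated = [transition_sampler(s, action, rng) for s in particles]
    weights = [likelihood(observation, s) for s in propagated]
    total = sum(weights)
    if total == 0.0:
        # No particle explains the observation; keep the propagated set as is.
        return propagated
    weights = [w / total for w in weights]
    return rng.choices(propagated, weights=weights, k=len(propagated))


rng = random.Random(0)
belief = [rng.uniform(-5.0, 5.0) for _ in range(2000)]  # broad initial belief
for _ in range(5):
    belief = belief_update(belief, 0.0, 2.0, noisy_step, gaussian_likelihood, rng)
mean = sum(belief) / len(belief)
print(round(mean, 2))
```

After a few updates with the same observation, the particles concentrate around the observed value, which is the behavior an online planner relies on when it samples states from the belief.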

2606.19728 2026-06-19 cs.RO cs.AI New submission 90%

Bidirectional Tutoring for Developmental Motor Learning in Robots: Co-Developed Interaction Dynamics Support Stable Learning

Rui Fukushima, Jun Tani

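Affiliations: Okinawa Institute of Science and Technology Graduate University

Topic match: Robot learning. Proposes a bidirectional tutoring framework for robot motor-skill learning.

AI summary: Proposes a bidirectional tutoring framework in which a human or AI tutor and the robot adapt to each other dynamically. A free-energy-principle-based neural network enables stable sequence learning, and object manipulation experiments show consistent behavior and generalization.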

Comments 16 pages, 14 figures

Abstract

Infants are well known to develop their motor skills through dense interaction with caregivers. Although such social interaction is crucial for human development, motor-skill learning in robots is often treated as a unidirectional process in which robots passively receive demonstrations from tutors. This overlooks a key property of social interaction: it is inherently bidirectional, with tutor and learner dynamically adapting to each other. In such interactions, the robot's past experiences may function as prior constraints that shape the dynamics of their co-developed trajectories. We hypothesize that bidirectional tutoring allows such constraints to guide the formation of consistent behavioral patterns that preserve behavioral coherence and support generalization, whereas unidirectional interaction lacks such constraints and leads to broader, less consistent behavioral patterns. To examine this hypothesis, we conducted two experiments with a physical humanoid robot performing an object manipulation task: one involving human-robot interaction and another employing an AI tutor interacting with the real robot through an adaptive intervention mechanism designed to examine whether similar effects would emerge under more controlled conditions. We implement the developmental learning framework using a free-energy-principle-based neural network extended with generative replay, which supports stable sequence-by-sequence learning from single tutored episodes. Across both settings, bidirectional tutoring fostered consistent behaviors and stage-wise generalization, while the robot gradually required less tutor guidance. These results suggest that bidirectional tutoring, as an embodied and socially grounded approach, provides an effective scaffold for developmental motor learning in robots.

2606.19699 2026-06-19 cs.RO cs.LG cs.SY eess.SY New submission 90%

Comparative Study on Agility, Efficiency, and Impact Absorption of Bipedal Robots with Active Toes

Joong-Gil Kim, Wontae Ye, Geunwoo Cho, Seong-Ho Yun, Se-Hyoung Cho, Yong-Jae Kim

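Affiliations: School of Electrical, Electronics and Communication Engineering, Korea University of Technology and Education; Artificial Intelligence and Robotics Institute, Korea Institute of Science and Technology; Robot Innovation Hub, WIRobotics Inc.

Topic match: Robot learning. Compares bipedal robot performance with and without active toes.

AI summary: Proposes a 14-DOF biped robot whose toes emulate the lightweight, high-torque, robust nature of human toes. Using a high-fidelity simulation training environment to compare configurations with and without active toes, it finds that at 1.33 m/s walking the toe-equipped robot reduces CoT by 17.5% and heel-strike GRF by 5.0%, and that average and maximum path deviation fall by 25.0% and 34.0%, respectively.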

Comments 6 pages, 7 figures

Abstract

Human legs exhibit high efficiency, agility, and impact absorption, with toes playing a crucial role in these capabilities. While many attempts have been made to implement human-like toes in robots, they have not fully replicated human characteristics nor rigorously validated their benefits. We propose a 14-DOF biped robot emulating human toes' lightweight, high-torque, robust nature. To quantitatively analyze the effectiveness of the active toes in terms of agility, efficiency, and impact absorption, we developed a high-fidelity simulation training environment that reflects actual actuators with coupled transmissions and accurate power consumption. To ensure a fair comparison between configurations with and without active toes, we designed a minimal RL reward function and applied an identical training procedure to both. The simulation results indicate that, at 1.33 m/s walking, the toe-equipped robot reduced CoT by 17.5% and heel-strike GRF by 5.0% compared with the toe-ablation configuration. On the agility test, average and maximum path deviation decreased by 25.0% and 34.0%, respectively.
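
The abstract does not define CoT. Assuming it denotes the standard dimensionless cost of transport, mean power divided by weight times speed, the reported 17.5% reduction can be reproduced with a short worked example. The mass and power figures below are hypothetical and were chosen only so that the ratio matches the reported reduction; only the 1.33 m/s walking speed comes from the abstract.

```python
G = 9.81  # gravitational acceleration, m/s^2


def cost_of_transport(mean_power_w, mass_kg, speed_mps):
    """Dimensionless cost of transport: power / (weight * speed)."""
    return mean_power_w / (mass_kg * G * speed_mps)


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical robot mass and power draws; not values from the paper.
mass, speed = 30.0, 1.33
cot_without_toes = cost_of_transport(200.0, mass, speed)
cot_with_toes = cost_of_transport(165.0, mass, speed)
reduction = relative_reduction(cot_without_toes, cot_with_toes)
print(f"{reduction:.1%}")
```

Because mass and speed are identical in both configurations, the relative CoT reduction equals the relative reduction in mean power, which is why an identical training procedure and walking speed matter for a fair comparison.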

2606.19419 2026-06-19 cs.RO cs.AI New submission 90%

Playful Agentic Robot Learning

Junyi Zhang, Jiaxin Ge, Hanjun Yoo, Letian Fu, Zihan Yang, Yaowei Liu, Raj Saravanan, Shaofeng Yin, Justin Yu, Dantong Niu, Zirui Wang, Roei Herzig, Ken Goldberg, Yutong Bai, David M. Chan, Ion Stoica, Angjoo Kanazawa, Jiahui Lei, Haiwen Feng, Trevor Darrell

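Affiliations: University of California, Berkeley; Impossible Research

Topic match: Robot learning. Robots learn reusable skills through self-directed exploration.

AI summary: Proposes the RATs framework, in which robots learn reusable skills through self-directed play, yielding gains of 20.6 and 17.0 percentage points on LIBERO-PRO and MolmoSpaces, respectively.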

Comments Project page: https://playful-rats.github.io/

Abstract

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs, Robotics Agent Teams designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense, step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments in LIBERO-PRO and MolmoSpaces show that play-learned skills improve held-out downstream tasks over no-play and random-play baselines, with 20.6 and 17.0 percentage-point gains over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents by simply retrieving them into the context, improving RoboSuite and real-world transfer by 8.9 and 8.8 points, respectively, without finetuning the underlying model.

2605.23733 2026-06-19 cs.RO cs.AI Version update 90%

Any2Any: Efficient Cross-Embodiment Transfer for Humanoid Whole-Body Tracking

Ming Yang, Tao Yu, Feng Li, Hua Chen

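Affiliations: LimX Dynamics

Topic match: Robot learning. Cross-embodiment transfer for humanoid whole-body tracking.

AI summary: Proposes the Any2Any paradigm, which uses kinematic alignment and dynamics fine-tuning to transfer pretrained whole-body tracking models efficiently to new humanoid embodiments, reaching competitive tracking performance with only a small amount of data and compute.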

Comments Project Page: https://any2any.top/

Abstract

Whole-body tracking (WBT) models have become a key foundation for humanoid robots, enabling them to imitate diverse motions with high fidelity. Training such models from scratch requires large-scale data and computation, making rapid deployment on new humanoid platforms costly. This raises a natural question: Can pretrained WBT models transfer across embodiments with minimal adaptation? To answer this question, we propose Any2Any, a paradigm that efficiently transfers an existing WBT specialist to a new humanoid embodiment with only a small amount of data and compute. Any2Any first performs kinematic alignment between source and target humanoids, aligning their input and output spaces so that the pretrained source policy can be meaningfully reused on the target embodiment. Any2Any then performs dynamics adaptation by applying lightweight parameter-efficient fine-tuning (PEFT) components to selected dynamics-sensitive modules, preserving useful behavioral priors while enabling targeted adaptation to the target robot. Extensive experiments on multiple humanoid platforms and pretrained backbones show that Any2Any substantially accelerates convergence and reduces training cost compared with training from scratch, while achieving competitive or superior tracking performance. Notably, using only 1% of the compute and data required for full training, Any2Any successfully transfers Sonic models pre-trained on Unitree G1 to LimX Oli and LimX Luna. These results suggest that pretrained WBT specialists can be efficiently reused across embodiments, providing a scalable path toward deploying humanoid whole-body control on new robots. More results and videos are available on our project page: https://any2any.top/.
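
The abstract does not say which PEFT components Any2Any uses. As one common instance of parameter-efficient fine-tuning, a low-rank adapter in the style of LoRA is sketched below to show why such components are lightweight. The choice of LoRA, the class name `LowRankAdapter`, and the 64-by-64 layer size are assumptions made for illustration.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LowRankAdapter:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, frozen_weight, rank, scale=1.0):
        d_out, d_in = len(frozen_weight), len(frozen_weight[0])
        self.frozen_weight = frozen_weight              # pretrained, never updated
        self.a = [[0.01] * d_in for _ in range(rank)]   # rank x d_in, trainable
        self.b = [[0.0] * rank for _ in range(d_out)]   # d_out x rank, zero-initialized
        self.scale = scale

    def forward(self, x):
        base = matvec(self.frozen_weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_parameters(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


size = 64
weight = [[float(i == j) for j in range(size)] for i in range(size)]  # toy pretrained layer
layer = LowRankAdapter(weight, rank=2)
x = [float(i) for i in range(size)]
print(layer.trainable_parameters(), "trainable vs", size * size, "frozen")
```

Because B starts at zero, the adapted layer initially reproduces the pretrained layer exactly, so fine-tuning begins from the source policy's behavior while training only a small fraction of the parameters.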

2605.08525 2026-06-19 cs.RO cs.SY eess.SY Version update 90%

Model-Reference Adaptive Flight Control of a 95-mg Insect-Scale Flapping-Wing Aerial Robot

Francisco M. F. R. Gonçalves, Conor K. Trygstad, Néstor O. Pérez-Arancibia

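Affiliations: Washington State University

Topic match: Robot learning. Adaptive flight control of an insect-scale flapping-wing aerial robot.

AI summary: To handle parameter uncertainty and disturbances in insect-scale flapping-wing aerial robots, proposes a model-reference adaptive control (MRAC) architecture combined with a hybrid multiplicative extended Kálmán filter for high-precision position control, and validates hovering and trajectory tracking in experiments with a 95-mg robot.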

Comments Under review, 8 pages, 7 figures

Abstract

Due to the system's scale and complex fabrication, the model describing the dynamics of a flapping-wing insect-scale aerial robot is subject to parameter uncertainty; for example, in the inertia matrix and the actuator mapping of the flier. Furthermore, due to its low inertia, this type of robot is greatly affected by stochastic and systematic disturbances during flight, including power-wire tension, gusts, and undesired aerodynamic forces produced by wing misalignment. Therefore, the high-performance execution of complex maneuvers at the subdecigram scale requires the robot to adapt its behavior to counteract disturbances and model uncertainty. Toward this objective, we introduce a model-reference adaptive control (MRAC) architecture for high-performance position control of flapping-wing robotic insects that can be modeled as rigid bodies in the three-dimensional (3D) space. In addition, we demonstrate how the implementation of a hybrid multiplicative extended Kálmán filter for estimating current and desired angular velocities during flight significantly dampens attitude vibrations, especially along the roll and pitch degrees of freedom (DOFs), and also improves flight performance. To show the suitability, functionality, and high performance of the proposed approach, we conducted real-time hovering and trajectory-tracking 6-DOF flight control experiments with a 95-mg insect-scale aerial robot.
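
The idea behind model-reference adaptive control can be seen in a scalar textbook example, which is far simpler than the 6-DOF rigid-body controller the paper describes. The sketch below assumes a first-order plant with unknown parameters a and b (only the sign of b is assumed known and positive), a stable first-order reference model, and Lyapunov-based gain adaptation. All numerical values and the function name `simulate_mrac` are illustrative.

```python
def simulate_mrac(a=1.0, b=2.0, a_m=2.0, gamma=5.0, reference=1.0,
                  dt=0.001, duration=20.0):
    """Scalar MRAC with Lyapunov-based adaptation, integrated by explicit Euler.

    Plant:            x_dot  = a * x + b * u        (a, b unknown to the controller)
    Reference model:  xm_dot = -a_m * xm + a_m * r
    Control law:      u = k_x * x + k_r * r         (k_x, k_r adapted online)
    """
    x = xm = 0.0
    k_x = k_r = 0.0
    for _ in range(round(duration / dt)):
        u = k_x * x + k_r * reference
        error = x - xm                       # tracking error drives the adaptation
        x_dot = a * x + b * u
        xm_dot = -a_m * xm + a_m * reference
        k_x_dot = -gamma * error * x         # valid for b > 0
        k_r_dot = -gamma * error * reference
        x += x_dot * dt
        xm += xm_dot * dt
        k_x += k_x_dot * dt
        k_r += k_r_dot * dt
    return x, xm, k_x, k_r


x, xm, k_x, k_r = simulate_mrac()
print(round(x - xm, 4))
```

The open-loop plant here is unstable (a > 0), yet the adapted gains drive the plant state toward the reference model's state without the controller ever being told a or b, which mirrors the abstract's motivation of counteracting model uncertainty online.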

2602.04037 2026-06-19 cs.LG cs.RO Version update 90%

DADP: Domain Adaptive Diffusion Policy

Pengcheng Wang, Qinghang Liu, Haotian Lin, Yiheng Li, Guojian Zhan, Masayoshi Tomizuka, Yixiao Wang

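Affiliations: University of California, Berkeley, California, USA; Peking University, Beijing, China; Tsinghua University, Beijing, China

Topic match: Robot learning. Proposes a domain adaptive diffusion policy for robot control.

AI summary: Proposes DADP, which achieves robust zero-shot adaptation across environments with different dynamics through unsupervised disentanglement and domain-aware diffusion injection, outperforming prior methods on locomotion and manipulation tasks.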

Abstract

Learning domain adaptive policies that can generalize to unseen transition dynamics remains a fundamental challenge in learning-based control. Substantial progress has been made through domain representation learning to capture domain-specific information, thus enabling domain-aware decision making. We analyze the process of learning domain representations through dynamical prediction and find that selecting contexts adjacent to the current step causes the learned representations to entangle static domain information with varying dynamical properties. Such a mixture can confuse the conditioned policy, thereby constraining zero-shot adaptation. To tackle the challenge, we propose DADP (Domain Adaptive Diffusion Policy), which achieves robust adaptation through unsupervised disentanglement and domain-aware diffusion injection. First, we introduce Lagged Context Dynamical Prediction, a strategy that conditions future state estimation on a historical offset context; by increasing this temporal gap, we disentangle static domain representations without supervision by filtering out transient properties. Second, we integrate the learned domain representations directly into the generative process by biasing the prior distribution and reformulating the diffusion target. Extensive experiments on challenging benchmarks across locomotion and manipulation demonstrate the superior performance and generalizability of DADP over prior methods. More visualization results are available at https://outsider86.github.io/DomainAdaptiveDiffusionPolicy/.
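
The difference between an adjacent context and a lagged context comes down to which slice of the trajectory history is fed to the dynamics predictor. The sketch below shows that indexing only. The function name `lagged_context`, its parameters, and the exact windowing convention are assumptions, since the abstract does not specify them.

```python
def lagged_context(trajectory, t, context_len, lag):
    """Return `context_len` transitions that end `lag` steps before step `t`."""
    end = t - lag
    start = end - context_len
    if start < 0:
        raise ValueError("not enough history for this lag and context length")
    return trajectory[start:end]


steps = list(range(100))  # stand-in for a sequence of (state, action) transitions
adjacent = lagged_context(steps, t=50, context_len=5, lag=0)
lagged = lagged_context(steps, t=50, context_len=5, lag=20)
print(adjacent, lagged)
```

With a lag of zero the context directly precedes the prediction target and shares its transient state, whereas a larger lag leaves only information that persists over time, such as static domain parameters, as useful for the prediction.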

2505.17006 2026-06-19 cs.CV cs.RO Version update 90%

CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning

Jiange Yang, Yansong Shi, Haoyi Zhu, Mingyu Liu, Kaijing Ma, Yating Wang, Gangshan Wu, Tong He, Limin Wang

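Affiliations: Nanjing University; Shanghai AI Lab; University of Science and Technology of China; Zhejiang University; Fudan University; Tongji University

Topic match: Robot learning. Learns motion from videos for use in robot learning.

AI summary: Proposes CoMo, which learns continuous latent motion from internet videos using early temporal difference and temporal contrastive learning, avoiding the information loss of discretization. It generalizes zero-shot to generate pseudo action labels, and co-trained policies perform well in simulated and real-world experiments.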

Comments CVPR 2026

Abstract

Unsupervised learning of latent motion from Internet videos is crucial for robot learning. Existing discrete methods generally mitigate the shortcut learning caused by extracting excessive static backgrounds through vector quantization with a small codebook size. However, they suffer from information loss and struggle to capture more complex and fine-grained dynamics. Moreover, there is an inherent gap between the distribution of discrete latent motion and continuous robot action, which hinders the joint learning of a unified policy. We propose CoMo, which aims to learn more precise continuous latent motion from internet-scale videos. CoMo employs an early temporal difference (Td) mechanism to increase the shortcut learning difficulty and explicitly enhance motion cues. Additionally, to ensure latent motion better captures meaningful foregrounds, we further propose a temporal contrastive learning (Tcl) scheme. Specifically, positive pairs are constructed with a small future frame temporal offset, while negative pairs are formed by directly reversing the temporal direction. The proposed Td and Tcl work synergistically and effectively ensure that the latent motion focuses better on the foreground and reinforces motion cues. Critically, CoMo exhibits strong zero-shot generalization, enabling it to generate effective pseudo action labels for unseen videos. Extensive simulated and real-world experiments show that policies co-trained with CoMo pseudo action labels achieve superior performance with both diffusion and auto-regressive architectures.
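
Two ideas in the abstract lend themselves to a small illustration: a temporal difference cancels static background, and contrastive pairs can be built from frame indices alone. The sketch below uses tiny integer "frames" and one plausible indexing scheme for the pairs. Both helper functions and the pairing convention are assumptions, since the abstract does not give CoMo's exact formulation.

```python
def temporal_difference(frame_t, frame_next):
    """Pixel-wise difference; pixels that do not change become zero."""
    return [[b - a for a, b in zip(row_t, row_n)]
            for row_t, row_n in zip(frame_t, frame_next)]


def contrastive_pairs(t, gap, offset):
    """Frame-index pairs for an anchor clip (t, t + gap)."""
    anchor = (t, t + gap)
    positive = (t + offset, t + gap + offset)  # same motion, slightly later in time
    negative = (t + gap, t)                    # same frames with time reversed
    return anchor, positive, negative


background = 7
frame_a = [[background, background, 1], [background, background, background]]
frame_b = [[background, 1, background], [background, background, background]]
diff = temporal_difference(frame_a, frame_b)   # the object moved one pixel to the left
pairs = contrastive_pairs(t=10, gap=4, offset=1)
print(diff, pairs)
```

The difference image is nonzero only where the object moved, which is why an early temporal difference makes it harder for an encoder to take the shortcut of encoding the background. The time-reversed negative contains exactly the same pixels as the anchor, so only the direction of motion can tell them apart.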

2601.03040 2026-06-19 cs.RO cs.AI cs.LG Version update 90%

PiDR: Physics-Informed Inertial Dead Reckoning for Autonomous Platforms

Arup Kumar Sahoo, Itzik Klein

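Affiliations: Autonomous Navigation and Sensor Fusion Lab (ANSFL); Hatter Department of Marine Technologies; Charney School of Marine Sciences; University of Haifa

Topic match: Robot learning. Proposes a physics-informed inertial dead-reckoning framework for autonomous platforms.

AI summary: Proposes the PiDR framework, which incorporates inertial navigation principles into network training as a physics-informed residual, reducing trajectory drift in pure inertial navigation and improving positioning accuracy by more than 29% on mobile robot and autonomous underwater vehicle datasets.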

Comments 11 pages and 7 figures

Abstract

A fundamental requirement for full autonomy is the ability to sustain accurate navigation in the absence of external data, such as GNSS signals or visual information. In these challenging environments, the platform must rely exclusively on inertial sensors, leading to pure inertial navigation. However, the inherent noise and other error terms of the inertial sensors in such real-world scenarios will cause the navigation solution to drift over time. Although conventional deep-learning models have emerged as a possible approach to inertial navigation, they are inherently black-box in nature. Furthermore, they struggle to learn effectively with limited supervised sensor data and often fail to preserve physical principles. To address these limitations, we propose PiDR, a physics-informed inertial dead-reckoning framework for autonomous platforms in situations of pure inertial navigation. PiDR offers transparency by explicitly integrating inertial navigation principles into the network training process through the physics-informed residual component. PiDR plays a crucial role in mitigating abrupt trajectory deviations even under limited or sparse supervision. We evaluated PiDR on real-world datasets collected by a mobile robot and an autonomous underwater vehicle. We obtained more than 29% positioning improvement in both datasets, demonstrating the ability of PiDR to generalize across different platforms operating in various environments and dynamics. Thus, PiDR offers a robust, lightweight, yet effective architecture and can be deployed on resource-constrained platforms, enabling real-time pure inertial navigation in adverse scenarios.
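
The drift problem described in the abstract follows directly from double integration: a constant accelerometer bias b produces a position error of roughly 0.5 * b * t^2. The sketch below demonstrates this in one dimension and adds a simple kinematic consistency residual of the kind a physics-informed loss might penalize. The bias value and both function names are hypothetical, and the residual is an illustrative stand-in, not PiDR's actual loss term.

```python
def dead_reckon(accelerations, dt, v0=0.0, p0=0.0):
    """Integrate 1-D acceleration samples twice (explicit Euler) to get positions."""
    v, p = v0, p0
    positions = [p]
    for a in accelerations:
        p += v * dt
        v += a * dt
        positions.append(p)
    return positions


def kinematic_residual(positions, velocities, dt):
    """Mean squared mismatch between finite-difference velocity and given velocity."""
    terms = [((positions[k + 1] - positions[k]) / dt - velocities[k]) ** 2
             for k in range(len(velocities))]
    return sum(terms) / len(terms)


dt, n = 0.01, 6000                  # 60 s of samples at 100 Hz
bias = 0.05                         # hypothetical accelerometer bias, m/s^2
true_accel = [0.0] * n              # the platform is actually at rest
measured = [a + bias for a in true_accel]
drift = dead_reckon(measured, dt)[-1]
print(round(drift, 1), "m of position error after 60 s")
```

A small bias of 0.05 m/s^2 already yields about 90 m of position error after one minute, which is why pure inertial navigation needs additional constraints. A residual such as `kinematic_residual` is zero when predicted positions and velocities are mutually consistent, so adding it to a training loss penalizes physically implausible outputs even where labels are sparse.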

2511.16223 2026-06-19 cs.RO 90%

DynaMimicGen: A Data Generation Framework for Robot Learning of Dynamic Tasks

Vincenzo Pomponi, Paolo Franceschi, Stefano Baraldo, Loris Roveda, Oliver Avram, Luca Maria Gambardella, Anna Valente

Affiliations: Institute of Systems and Technologies for Sustainable Production (ISTePS); Department of Innovative Technologies (DTI); University of Applied Science and Arts of Southern Switzerland (SUPSI); Istituto Dalle Molle di studi sull’intelligenza artificiale (IDSIA); Department of Mechanical Engineering; Politecnico di Milano (PoliMi); Faculty of Informatics; Università della Svizzera Italiana (USI)

Topic match: Robot learning. Proposes the DynaMimicGen framework to generate dynamic-task data for robot learning.

AI summary: Proposes the DynaMimicGen framework, which generates data from a few human demonstrations, supports learning of dynamic tasks, and produces adaptable trajectories, improving robot performance in complex environments.

DOI: 10.1109/LRA.2026.3703978

Abstract

Learning robust manipulation policies typically requires large and diverse datasets, the collection of which is time-consuming, labor-intensive, and often impractical for dynamic environments. In this work, we introduce DynaMimicGen (D-MG), a scalable dataset generation framework that enables policy training from minimal human supervision while uniquely supporting dynamic task settings. Given only a few human demonstrations, D-MG first segments the demonstrations into meaningful sub-tasks, then leverages Dynamic Movement Primitives (DMPs) to adapt and generalize the demonstrated behaviors to novel and dynamically changing environments. Improving on prior methods that rely on static assumptions or simplistic trajectory interpolation, D-MG produces smooth, realistic, and task-consistent Cartesian trajectories that adapt in real time to changes in object poses, robot states, or scene geometry during task execution. Our method supports variation in scene layouts, object instances, and robot configurations, making it suitable for both static and highly dynamic manipulation tasks. We show that robot agents trained via imitation learning on D-MG-generated data achieve strong performance across long-horizon and contact-rich benchmarks, including tasks like cube stacking and placing mugs in drawers, even under unpredictable environment changes. By eliminating the need for extensive human demonstrations and enabling generalization in dynamic settings, D-MG offers a powerful and efficient alternative to manual data collection, paving the way toward scalable, autonomous robot learning.

2509.19658 2026-06-19 cs.RO cs.AI Version update 90%

RoboSSM: Scalable In-context Imitation Learning via State-Space Models

Youngju Yoo, Jiaheng Hu, Yifeng Zhu, Bo Liu, Qiang Liu, Roberto Martín-Martín, Peter Stone

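Affiliations: The University of Texas at Austin; KAIST; FAIR at Meta; Amazon; Sony AI

Topic match: Robot learning. State-space models for in-context imitation learning in robots.

AI summary: Proposes RoboSSM, which replaces Transformers with state-space models for in-context imitation learning, generalizes better to unseen and long-horizon tasks on the LIBERO benchmark, and shows for the first time that SSMs are an efficient and scalable backbone for ICIL.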

Comments IROS 2026

Abstract

In-context imitation learning (ICIL) enables robots to learn tasks from prompts consisting of just a handful of demonstrations. By eliminating the need for parameter updates at deployment time, this paradigm supports few-shot adaptation to novel tasks. However, recent ICIL methods rely on Transformers, which have computational limitations and tend to underperform when handling longer prompts than those seen during training. In this work, we introduce RoboSSM, a scalable recipe for in-context imitation learning based on state-space models (SSMs). Specifically, RoboSSM replaces Transformers with Longhorn, a state-of-the-art SSM that provides linear-time inference and strong extrapolation capabilities, making it well-suited for long-context prompts. Through diverse experiments on the LIBERO benchmark, we demonstrate the effectiveness of applying SSMs to ICIL, achieving better generalization to both unseen and long-horizon tasks than Transformer-based ICIL methods by handling longer contexts at test time. These results show for the first time that SSMs are an efficient and scalable backbone for ICIL. Our code is available at https://github.com/youngjuY/RoboSSM.
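
The linear-time inference attributed to SSMs comes from their recurrent form: each new token updates a fixed-size state at constant cost, regardless of how long the prompt already is. The sketch below shows a generic scalar linear recurrence. It is not Longhorn's update rule, which the abstract does not describe, and the parameter values are arbitrary.

```python
def ssm_scan(inputs, a, b, c, h0=0.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a sequence."""
    h = h0
    outputs = []
    for x in inputs:           # one constant-cost update per token: O(T) overall
        h = a * h + b * x
        outputs.append(c * h)
    return outputs, h


outputs, state = ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0)

# Processing the same sequence in two chunks, carrying the state across.
first, carried = ssm_scan([1.0, 0.0], a=0.5, b=1.0, c=2.0)
second, _ = ssm_scan([0.0, 0.0], a=0.5, b=1.0, c=2.0, h0=carried)
print(outputs, first + second)
```

Because the whole history is summarized in the state `h`, a longer prompt only means more constant-cost updates, in contrast to attention, where each new token attends to every previous one.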

2606.20521 2026-06-19 cs.CV New submission 85%

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

Juncheng Ma, Jianxin Bi, Yufan Deng, Xuanran Zhai, Kewei Zhang, Ye Huang, Bo Liang, Shukai Gong, Jiankai Tu, Xiaotian Tang, Jiaxin Li, Kaiqi Chen, Duomin Wang, Yuqi Wang, Bingyi Kang, Eric Huang, Zhiyang Dou, Zhen Dong, Enze Xie, Wojciech Matusik, Tat-Seng Chua, Daquan Zhou

Affiliations * PKU; NUS; MIT; UCSB; NVIDIA

Topic match Robot learning: human video for pretraining embodied foundation models

AI summary A systematic comparison finds that egocentric human video, processed through a carefully designed filtering and labeling pipeline, is not only viable for pretraining embodied foundation models but outperforms teleoperated real-robot data, validating a scalable paradigm of pretraining on human video and then adapting with a small amount of robot data.

Comments Github: https://github.com/DAGroup-PKU/HumanNet/

Abstract

Embodied foundation models are expected to benefit from data scaling like large language models, but face a much tighter data bottleneck. Teleoperated real-robot trajectories remain the dominant pretraining source due to their precise action supervision and embodiment alignment, yet their scalability is limited by high collection cost, acquisition difficulty, and low behavioral and environmental diversity. These limitations have sparked interest in egocentric human video as a scalable, substantially lower-cost, and more diverse alternative for embodied model pretraining. However, its effectiveness compared to teleoperated real-robot data remains underexplored. To address this question, we conduct a systematic study comparing egocentric human video and teleoperated real-robot trajectories as pretraining data sources for embodied foundation models, under fixed post-training and validation protocols. Surprisingly, we find that egocentric data, when processed through a carefully designed filtering and labeling pipeline, is not merely a viable substitute for model pretraining but can lead to superior performance. With the same amount of pretraining data, models pretrained on egocentric data achieve a 24% lower validation loss on real-robot action prediction, as well as 52.5% and 90% higher success rates on in-distribution and out-of-distribution real-robot task execution, respectively. This finding verifies a scalable paradigm for embodied foundation models: pretrain on egocentric human video to learn diverse world representations, then adapt with a small amount of labeled real-robot data for action-space alignment. We hope this study encourages broader exploration of egocentric data and offers guidance for data quality assessment before costly robot data collection.

2606.20495 2026-06-19 cs.RO New submission 85%

Increasing Resilience of Continuum Robots via Motion Planning Algorithms

Oxana Shamilyan, Ievgen Kabin, Zoya Dyka, Oleksandr Sudakov, Peter Langendoerfer

Affiliations * IHP – Leibniz-Institut für innovative Mikroelektronik; BTU Cottbus-Senftenberg; Technical Center, National Academy of Sciences of Ukraine

Topic match Robot learning: motion planning algorithms for continuum robots

AI summary An experimental study of how motion planning algorithms affect the resilience of continuum robots. The genetic algorithm and A* are modified to evaluate path quality with the Analytical Hierarchy Process; the genetic algorithm generates more diverse paths, which increases the robot's resilience.

Abstract

This paper presents an experimental study of motion planning for resilient continuum robots. In this study we focus mainly on multi-criteria decision-making, its application to path-planning algorithms, and its impact on the generated path and execution time. To do this, we used two well-known path-planning algorithms, namely the genetic algorithm and the A* algorithm, and modified them by adding the Analytical Hierarchy Process to evaluate the quality of the generated paths. In our experiment the Analytical Hierarchy Process considers four criteria, i.e. distance, motor damage, mechanical damage to the robot's arm, and accuracy, each considered to contribute to the resilience of a continuum robot. Using several criteria is necessary to increase the time until maintenance operations of the continuum robot. We conducted the experiments using two different simulated environments of the robot. Although we significantly simplified the robot's model and its environment, we still implemented some features of the environment based on the real robot prototype. In particular, one of the environments has single- as well as multi-path points, and the other consists of multi-path points only. The results show that, in contrast to A*, the execution time of the genetic algorithm does not depend on the environment's cardinality. It also generates more diverse paths, which increases the robot's resilience.
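
As a rough illustration of how the Analytical Hierarchy Process can turn the four criteria into a single path score, the sketch below derives priority weights with the row geometric-mean approximation and applies them to two candidate paths. The comparison matrix, the per-path metrics, and the names `ahp_weights` and `path_cost` are all made up for this example; the abstract does not give the authors' actual judgments or scoring formula.

```python
import math
from typing import Dict, List

CRITERIA = ["distance", "motor_damage", "mechanical_damage", "accuracy"]


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def path_cost(metrics: Dict[str, float], weights: List[float]) -> float:
    """Weighted sum of per-criterion costs, each pre-scaled to [0, 1] (lower is better)."""
    return sum(w * metrics[name] for w, name in zip(weights, CRITERIA))


# Hypothetical reciprocal matrix: entry [i][j] says how much more important
# criterion i is than criterion j (here damage matters 3x more than the rest).
pairwise = [
    [1.0, 1 / 3, 1 / 3, 1.0],
    [3.0, 1.0, 1.0, 3.0],
    [3.0, 1.0, 1.0, 3.0],
    [1.0, 1 / 3, 1 / 3, 1.0],
]
weights = ahp_weights(pairwise)

# Two hypothetical candidate paths produced by a planner.
short_path = {"distance": 0.2, "motor_damage": 0.9, "mechanical_damage": 0.8, "accuracy": 0.3}
gentle_path = {"distance": 0.6, "motor_damage": 0.3, "mechanical_damage": 0.2, "accuracy": 0.3}
best = min([short_path, gentle_path], key=lambda m: path_cost(m, weights))
```

With these invented weights the longer but gentler path wins, which is the kind of trade-off a multi-criteria score is meant to expose and a distance-only cost would miss.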

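2606.20389 2026-06-19 cs.RO New submission 85%

CoLI: A Reproducible Platform for Continuum Robot Learning via Monolithic 3D Printing and Isomorphic Teleoperation

Ziyuan Tang, Chenxi Xiao*

Affiliations * School of Information Science and Technology at ShanghaiTech University

Topic match Robot learning: a continuum robot learning platform supporting imitation learning and teleoperation

AI summary Proposes a continuum robot platform based on multi-material 3D printing and isomorphic teleoperation that simplifies fabrication, provides singularity-free mapped control, and supports imitation-learning-based autonomous control; hardware characterization and manipulation tasks validate its reproducibility and learning readiness.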

Comments 8 pages, 7 figures, 1 table, accepted by IROS2026

Abstract

Continuum robots offer strong potential for manipulation tasks due to their high degrees of freedom, compliant structures, and operational safety. However, their adoption in both research and practical applications has been hindered by reproducibility issues arising from complex fabrication and assembly processes, challenging kinematic modeling, and a lack of intuitive control interfaces. To address these challenges, we present a novel open-source continuum robot design. The platform features a simplified fabrication pipeline enabled by multi-material 3D printing, allowing the arm to be fabricated as a monolithic compliant structure with minimal assembly. Control is achieved through an isomorphic teleoperation interface that establishes a direct actuator-level mapping, eliminating the need for explicit kinematic modeling and providing a singularity-free mapping. Building on this hardware design, the platform further supports imitation-learning-based autonomous control. The proposed system is evaluated through hardware characterization and a set of manipulation tasks. Experimental results demonstrate that the platform provides a reproducible, learning-ready continuum robot system, accelerating algorithmic development and systematic benchmarking for the continuum robotics community.

2606.20365 2026-06-19 cs.RO cs.MA New submission 85%

An Infrastructure-less, Control-Independent Solution to Relative Localisation of a Team of Mobile Robots using Ranging Measurements

Paolo Golinelli, Tommaso Faraci, Daniele Fontanelli

Affiliations * Department of Industrial Engineering, University of Trento; Department of Information Engineering and Computer Science, University of Trento

Topic match Robot learning: cooperative localisation algorithm for teams of mobile robots

AI summary Proposes an anchor-less, fully decentralised cooperative localisation algorithm that relies only on local odometry, sparse ranging, and short-range communication, achieves team observability without controlling robot motion, and uses a multi-hypothesis Bayesian framework for robustness.

Abstract

The ability to localise teams of robots is essential for applications ranging from robotic fleets in unstructured environments to cooperative control and navigation tasks. In such contexts, fixed infrastructure is often unavailable, deployments must be fast and flexible, and system requirements must be minimal. We present a decentralised cooperative localisation algorithm that addresses all these challenges at once. The method is anchor-less, fully decentralised, and, unlike most existing approaches, does not require controlling the robots' motion to ensure team observability. It relies only on local odometry, sparse inter-agent ranging measurements, and short-range communication, all of which are widely available in practice. The algorithm adopts a multi-hypothesis Bayesian framework that maintains the entire set of feasible solutions, ensuring robustness under transient unobservable conditions. Moreover, through information sharing, each agent benefits from the estimates of the entire group, even in partially connected conditions.
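
To make the multi-hypothesis idea concrete, here is a toy sketch in which one robot keeps several weighted guesses of a neighbour's relative position, shifts them with odometry, and reweights them with a range measurement. It is not the authors' algorithm: it assumes a shared heading reference so that the relative displacement can be applied directly, and the names `predict`, `update`, and the noise value `sigma` are illustrative only.

```python
import math
from typing import List, Tuple

Hypothesis = Tuple[float, float, float]  # (x, y, weight) of a neighbour in our frame


def predict(hyps: List[Hypothesis], rel_dx: float, rel_dy: float) -> List[Hypothesis]:
    """Shift every hypothesis by the relative displacement reported by odometry."""
    return [(x + rel_dx, y + rel_dy, w) for x, y, w in hyps]


def update(hyps: List[Hypothesis], measured_range: float, sigma: float) -> List[Hypothesis]:
    """Reweight hypotheses with a Gaussian range likelihood, then renormalise."""
    scored = []
    for x, y, w in hyps:
        err = math.hypot(x, y) - measured_range
        scored.append((x, y, w * math.exp(-0.5 * (err / sigma) ** 2)))
    total = sum(w for _, _, w in scored)
    return [(x, y, w / total) for x, y, w in scored]


# A single range of 5 m cannot tell (3, 4) from its mirror image (3, -4).
hyps = [(3.0, 4.0, 0.5), (3.0, -4.0, 0.5)]
after_first = update(hyps, measured_range=5.0, sigma=0.2)

# After a relative move of 2 m along y, a second range separates the two.
moved = predict(after_first, rel_dx=0.0, rel_dy=2.0)
after_second = update(moved, measured_range=math.hypot(3.0, 6.0), sigma=0.2)
```

Both hypotheses survive the first, ambiguous measurement with equal weight, and neither is pruned afterwards; the second measurement simply shifts almost all the weight to the consistent one.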

2606.20209 2026-06-19 cs.RO cs.AI New submission 85%

FlowMaps: Modeling Long-Term Multimodal Object Dynamics with Flow Matching

Francesco Argenziano, Miguel Saavedra-Ruiz, Sacha Morin, Charlie Gauthier, Daniele Nardi, Liam Paull

Affiliations * Sapienza University of Rome; Université de Montréal; Mila - Quebec AI Institute

Topic match Robot learning: FlowMaps models object dynamics to improve robot navigation

AI summary Proposes FlowMaps, a latent flow matching model that learns multimodal spatio-temporal distributions over object locations and predicts the future locations of dynamic objects, improving robot navigation in changing household environments.

Abstract

Joint spatial and temporal understanding of 3D scenes is a crucial requirement for robots deployed in everyday household environments. Such agents must not only comprehend and navigate spatial layouts, but also reason about how these spaces evolve over time. In particular, humans interact with objects daily, causing them to change position throughout the environment and making it difficult for robots to reliably associate current observations with previously seen objects. However, these interactions are not random: human habits and routines induce spatio-temporally consistent patterns in object locations, which robotic agents can potentially learn and then exploit for downstream tasks such as navigation. To this end, we introduce FlowMaps, a latent flow matching model for estimating multimodal distributions over the future locations of dynamic objects in a continuous 3D space. By learning the implicit dependencies among objects and their temporal evolution, FlowMaps predicts likely changes in object locations conditioned on past human interactions, while supporting generalization across previously unseen environments that share similar object routines. To demonstrate the utility of this method, we deploy FlowMaps in a downstream dynamic Object Navigation task in both simulated and real-world environments. Across more than 600 episodes, FlowMaps outperforms state-of-the-art approaches, showing that modeling object dynamics through continuous, multimodal spatio-temporal distributions improves robotic search and navigation in changing household environments. Code and additional material are available at https://fra-tsuna.github.io/flowmaps/.
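
For readers unfamiliar with flow matching, the following sketch shows the common linear-path formulation: a training pair is built by interpolating between a noise sample and a data sample, and sampling integrates a velocity field with explicit Euler steps. It works on raw 3D positions with an oracle velocity field, whereas FlowMaps operates in a learned latent space whose details the abstract does not give; `flow_matching_pair`, `euler_sample`, and `oracle_velocity` are hypothetical names.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def flow_matching_pair(x0: Vector, x1: Vector, t: float) -> Tuple[Vector, Vector]:
    """Return the point x_t on the straight path from x0 to x1 and its target velocity."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]  # regression target for the network
    return x_t, velocity


def euler_sample(velocity_fn: Callable[[Vector, float], Vector], x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Stand-in for a trained network: the exact velocity toward one known object location.
target = [2.0, 4.0, 6.0]


def oracle_velocity(x: Vector, t: float) -> Vector:
    return [(b - a) / (1.0 - t) for a, b in zip(x, target)]


pair = flow_matching_pair([0.0, 0.0, 0.0], target, t=0.5)
sample = euler_sample(oracle_velocity, [0.0, 0.0, 0.0], steps=4)
```

In a real model, different noise draws would flow to different plausible locations, which is how a flow matching model represents a multimodal distribution rather than a single point estimate.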

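2606.20150 2026-06-19 cs.RO New submission 85%

Robust Assembly State Reasoning from Action Recognition for Human-Robot Collaboration

James Fant-Male, Roel Pieters

Affiliations * Cognitive Robotics group, Unit of Automation Technology and Mechanical Engineering, Tampere University

Topic match Robot learning: assembly state reasoning in human-robot collaboration

AI summary Studies methods for tracking assembly state from action recognition inputs, comparing logic-based, HMM, and neural network approaches; the best approach varies by task, and logic-based methods are more robust in highly variable scenarios.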

Comments Preprint accepted to the 35th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2026). 8 pages, 9 figures, 3 tables

Abstract

Human Action Recognition (HAR) is frequently investigated in Human-Robot Collaboration (HRC) research to understand what actions have been performed and hence the state of a collaborative task. Accurately tracking an assembly state from HAR is however not fully investigated, and in realistic scenarios is not a trivial task. This research systematically investigates and compares methods for tracking assembly state using action recognition inputs. Investigations using two diverse datasets and five state tracking approaches, including logic-based, Hidden Markov Model (HMM), and neural network (NN) methods, show that optimal approaches are not uniform across different tasks and that different methods fail under different circumstances. Testing is performed using both simulated inputs with varying noise levels and realistic inputs from a HAR model. Results show NN and HMM methods can perform well in tasks with limited variability, but for other scenarios logic-based approaches can be more robust. Methods which model expected action duration are also important for tasks with repeated actions where no additional sensing is provided.
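
One of the compared families, HMM-based tracking, can be sketched as a forward filter over assembly steps driven by noisy action labels. The three-step assembly, the transition and emission probabilities, and the observation sequence below are invented for illustration and are not taken from the paper's datasets.

```python
from typing import List


def forward_step(belief: List[float], transition: List[List[float]],
                 emission: List[List[float]], observation: int) -> List[float]:
    """One HMM filtering step: propagate the belief, then weight by the observation."""
    n = len(belief)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    unnormalised = [predicted[j] * emission[j][observation] for j in range(n)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical left-to-right assembly with three steps (state 2 is "finished").
transition = [
    [0.7, 0.3, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
# emission[state][label]: the recogniser reports the right action 80% of the time.
emission = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

belief = [1.0, 0.0, 0.0]
observations = [0, 1, 0, 1, 2]  # the third label is a spurious misrecognition
for label in observations:
    belief = forward_step(belief, transition, emission, label)
estimated_state = max(range(len(belief)), key=lambda s: belief[s])
```

The filter recovers from the single misrecognised label because later evidence outweighs it; with noisier inputs or repeated actions the belief can drift, which is consistent with the abstract's point that no single method is best in every scenario.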

2606.20104 2026-06-19 cs.LG cs.AI New submission 85%

Sensorimotor World Models: Perception for Action via Inverse Dynamics

Petr Ivashkov, Randall Balestriero, Bernhard Schölkopf

Affiliations * Max Planck Institute for Intelligent Systems; Department of Computer Science, Brown University; ELLIS Institute; ETH Zürich

Topic match Robot learning: world models for robot control

AI summary Proposes the sensorimotor world model (SMWM), a latent world model trained end-to-end with inverse dynamics regularization, which prevents representation collapse and learns compact, action-aligned representations, achieving competitive planning performance on 2D and 3D control tasks.

Abstract

Perception for action suggests that representations of the world should be shaped not by visual fidelity alone, but by their relevance for actions. At the same time, latent JEPA-style world models advocate learning compact predictive states from high-dimensional observations to facilitate the prediction of future states, but end-to-end training of these models is nontrivial because representations may collapse if our only goal is to construct a latent state that is easy to predict. We introduce a sensorimotor world model (SMWM): a latent world model trained end-to-end with inverse dynamics regularization. This single regularizer addresses both issues: it prevents representation collapse and induces action-aligned representations. By forcing latent states to preserve information about the action underlying a transition, it biases the model toward the controllable degrees of freedom of the environment while discarding uncontrollable distractors. This yields stable latent world models trained from offline, reward-free trajectories, without frozen encoders, exponential moving averages, or complex latent regularizers. Empirically, SMWM learns compact, interpretable latent spaces and enables competitive planning performance across simple 2D and 3D control tasks.
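
The collapse argument can be checked on a toy problem. The sketch below uses a one-dimensional world where the next state is the state plus the action and compares two hand-written encoders: one that maps everything to zero and one that keeps the state. The environment, the encoders, and the helper `losses` are assumptions made for this example, not the paper's model or training code.

```python
from statistics import mean
from typing import Callable, Tuple

# (state, action) pairs from a toy world where next_state = state + action.
transitions = [(0.0, 1.0), (1.0, -1.0), (2.0, 0.5), (-1.0, -0.5)]


def losses(encode: Callable[[float], float],
           predict_next: Callable[[float, float], float],
           inverse_dynamics: Callable[[float, float], float]) -> Tuple[float, float]:
    """Return (latent prediction loss, inverse dynamics loss) as mean squared errors."""
    pred_errors, inv_errors = [], []
    for state, action in transitions:
        z, z_next = encode(state), encode(state + action)
        pred_errors.append((predict_next(z, action) - z_next) ** 2)
        inv_errors.append((inverse_dynamics(z, z_next) - action) ** 2)
    return mean(pred_errors), mean(inv_errors)


mean_action = mean(a for _, a in transitions)

# Collapsed encoder: every state maps to 0, so prediction is trivially perfect,
# but the best the inverse model can do is guess the mean action.
collapsed = losses(lambda s: 0.0, lambda z, a: 0.0, lambda z, z_next: mean_action)

# Faithful encoder: the latent keeps the state, so both losses can reach zero.
faithful = losses(lambda s: s, lambda z, a: z + a, lambda z, z_next: z_next - z)
```

The collapsed encoder achieves zero prediction loss yet pays a nonzero inverse dynamics loss, so adding that term to the objective removes collapse as an optimum, which is the mechanism the abstract describes.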

2606.20056 2026-06-19 cs.RO New submission 85%

VFILC: Accurate Frequency Extrapolations in Imitation Learning via Sampling Frequency ILC

Nozomu Masuya, Toshiaki Tsuji, Sho Sakaino

Affiliations * Grad. School of Science; Technology University of Tsukuba Tsukuba, Japan; Engineering Saitama University Saitama, Japan; Information Engineering University of Tsukuba Tsukuba, Japan

Topic match Robot learning: an imitation learning method for robot speed extrapolation

AI summary Proposes VFILC, which combines variable-frequency imitation learning with feedforward and feedback iterative learning control, achieving accurate speed extrapolation on three tasks and reducing frequency error by up to 81%.

Comments 8 pages, 17 figures. Accepted at IROS 2026

Abstract

Conventional neural network (NN)-based imitation learning methods for variable-speed motion either restricted their scope to interpolated speeds, or generated unpredictable motions when extrapolating beyond trained velocity ranges. Variable-frequency imitation learning (VFIL) enabled extrapolations of speeds by linking the NN model's sampling frequency to the motion frequency, whereas its open-loop configuration caused frequency errors, especially in the extrapolated high-frequency settings. This study proposes variable-frequency imitation learning with iterative learning control (VFILC) based on a combination of VFIL and iterative learning control (ILC) with both feedforward and feedback parts, the former taking advantage of VFIL and the latter adjusting the frequency errors. The experimental results showed that the proposed method successfully and accurately extrapolated motion speeds and reduced frequency errors in all three tasks, and that the feedback especially reduced the frequency errors by a remarkable 81% in the wiping task and 50% in the shaking task, both compared to simple feedforward VFIL, when extrapolating at double the average speed in the training data. The proposed method also improved accuracy by 27% compared with VFIL even at an interpolated frequency for a contact-rich mixing task affected by complex friction traits.
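
The feedforward-plus-feedback structure can be illustrated with a textbook P-type iterative learning control loop on a commanded frequency. The abstract does not state the actual update law, so this is only a generic sketch: the plant that delivers 80% of the commanded frequency, the learning gain, and the name `run_ilc` are all assumptions.

```python
from typing import Callable, List, Tuple


def run_ilc(target_hz: float, plant: Callable[[float], float],
            learning_gain: float, trials: int) -> Tuple[float, List[float]]:
    """Refine a commanded frequency over repeated trials with a P-type ILC update."""
    command = target_hz  # feedforward guess: command exactly the desired frequency
    errors = []
    for _ in range(trials):
        achieved = plant(command)              # frequency actually realised this trial
        error = target_hz - achieved
        errors.append(error)
        command += learning_gain * error       # feedback correction for the next trial
    return command, errors


def sluggish_plant(command: float) -> float:
    """Hypothetical open-loop behaviour: the motion runs at 80% of the commanded frequency."""
    return 0.8 * command


final_command, errors = run_ilc(target_hz=2.0, plant=sluggish_plant, learning_gain=0.5, trials=10)
```

With these numbers the error shrinks by a factor of 0.6 per trial, so the open-loop error of the first trial is largely removed after a handful of repetitions.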

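2606.20048 2026-06-19 cs.RO New submission 85%

MirrorDuo: Reflection-Consistent Visuomotor Learning from Mirrored Demonstration Pairs

Zheyu Zhuang, Ruiyu Wang, Giovanni Luca Marchetti, Florian T. Pokorny, Danica Kragic

Affiliations * Division of Robotics, Perception and Learning

Topic match Robot learning: mirrored demonstrations to augment behaviour cloning for robot learning

AI summary Proposes MirrorDuo, which uses reflection consistency to generate a mirrored copy of each original demonstration as data augmentation, significantly improving behaviour cloning under the same data budget and supporting zero- and few-shot skill transfer.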

Comments Published in CoRL 2025

Journal ref CoRL 2025

Abstract

Image-based behaviour cloning leverages demonstrations captured from ubiquitous RGB cameras. However, it remains constrained by the cost of collecting diverse demos, especially for generalizing across workspace variations. We propose MirrorDuo, a reflection-based formulation that operates on image, proprioception, and full 6-DoF end-effector action tuples, generating a mirrored counterpart for each original demonstration, effectively achieving "collect one, get one for free". It can be applied as a data augmentation strategy for existing learning pipelines, such as standard behaviour cloning or diffusion policy, or as a structural prior for reflection-equivariant policy networks. By leveraging the overlap between the original and mirrored domains, MirrorDuo achieves significantly improved performance under the same data budget when demonstrations are evenly distributed across both sides of the workspace. When demonstrations are confined to one side, MirrorDuo enables efficient skill transfer to the mirrored workspace with as few as zero or five demos in the target arrangement.
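
A minimal sketch of what mirroring an (image, pose) sample could look like is given below. It assumes the mirror plane is the x-z plane of the workspace frame and that the camera view is symmetric about that plane, so a horizontal image flip matches the reflected pose; neither assumption comes from the abstract, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

SIGNS = [1.0, -1.0, 1.0]  # reflection M = diag(1, -1, 1) across the x-z plane


def mirror_position(p: List[float]) -> List[float]:
    """Reflect a 3D position: p' = M p."""
    return [s * v for s, v in zip(SIGNS, p)]


def mirror_rotation(r: Matrix) -> Matrix:
    """Reflect an orientation: R' = M R M, which is again a proper rotation."""
    return [[SIGNS[i] * r[i][j] * SIGNS[j] for j in range(3)] for i in range(3)]


def mirror_image(image: List[List[int]]) -> List[List[int]]:
    """Flip an image left to right (rows of pixel values)."""
    return [list(reversed(row)) for row in image]


def determinant(m: Matrix) -> float:
    """Determinant of a 3x3 matrix, used to check that R' is still a rotation."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# A +90 degree rotation about z becomes a -90 degree rotation after mirroring.
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
mirrored_rot = mirror_rotation(rot_z_90)
mirrored_pos = mirror_position([0.3, 0.2, 0.1])
mirrored_img = mirror_image([[1, 2, 3], [4, 5, 6]])
```

Conjugating by the reflection, rather than multiplying by it once, keeps the determinant at +1, so the mirrored orientation remains a valid end-effector rotation that a real robot can execute.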

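2606.19990 2026-06-19 cs.AI New submission 85%

Reward as An Agent for Embodied World Models

Pu Li, Zhigang Lin, Qiang Wu, Yongxuan Lv, Fei Wang, Shan You

Affiliations * ACE Robotics

Topic match Robot learning: a reward-agent framework for embodied world models

AI summary Proposes the Reward as an Agent framework and Dynamic-Aware Rollout Diversification, using robust verification to support broader exploration, mitigate reward hacking, and improve world model performance.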

Abstract

While RL has become a promising tool for refining world models, existing methods largely rely on conservative rollouts near the training distribution, limiting exploration, behavioral diversity, and richer dynamic discovery. In this work, we challenge this conservative paradigm. We argue that the core limitation is not exploration itself, but the lack of reliable verification strategies to support broader exploration. Without reliable verification, expanded exploration becomes highly susceptible to reward hacking, where policies exploit imperfect rewards without achieving genuine improvement. To evaluate this motivation, we instantiate our method in embodied world models, where physical plausibility, and task completion provide a rigorous testbed for scalable RL under complex dynamics. On the verification side, we introduce Reward as an Agent, an agentic reward framework that actively evaluates generated behaviors to provide robust reward signals and mitigate reward hacking under distribution shifts. On the exploration side, we introduce Dynamic-Aware Rollout Diversification through DynDiff-GRPO, which explicitly expands action-space exploration to diversify trajectories, broaden state-action coverage, and encourage richer embodied behaviors beyond conservative rollout regimes. By unifying Reward as an Agent with DynDiff-GRPO, we enable RL on a more reliable reward foundation with substantially diversified sampling, effectively mitigating reward hacking while yielding significant accuracy gains across multiple open-source world models, thereby demonstrating that broader exploration can scale successfully when grounded in robust verification.

2606.19928 2026-06-19 cs.RO New submission 85%

SWAP: Symmetric Equivariant World-Model for Agile Robot Parkour

Kaixin Lan, Ze Wang, Hongyi Li, Lei Jiang, Chaojie Fu, Chengkai Su, Choi Lam Wong, Yongbin Jin, Hongtao Wang

Affiliations * Center for X-Mechanics, Zhejiang University; ZJU-Hangzhou Global Scientific and Technology Innovation Center; Mirrorme Technology Co., Ltd.

Topic match Robot learning: a symmetric equivariant world model for quadruped parkour

AI summary Proposes SWAP, which embeds symmetry equivariance into the world model and the actor-critic networks, setting quadruped parkour records (leaping a 2.13 m gap and climbing a 1.63 m platform) and showing geometric generalization to unseen mirrored terrains and zero-shot transfer.

Abstract

While latent world models enable the proactive predictions required for extreme parkour, their purely data-driven nature forces them to redundantly encode left-right symmetric interactions as independent patterns. This inflates the learning burden and hinders the capture of geometric regularities, restricting the latent space's efficiency for downstream policies. To address this, we propose SWAP, an end-to-end equivariant symmetric world model. This framework embeds symmetry directly into both the world model and the actor-critic networks. In real-world tests, the robot leaps across a 2.13 m gap and climbs a 1.63 m platform, breaking records for quadruped parkour. Furthermore, the framework exhibits robust geometric generalization to unseen mirrored terrains and exceptional zero-shot transferability across diverse outdoor environments. These results demonstrate that symmetry equivariance is an effective structural prior for pushing the physical boundaries of learned legged locomotion.

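2606.19774 2026-06-19 cs.RO New submission 85%

Start Right, Arrive Right: Asynchronous Execution via Initial Noise Selection

Trong-Bao Ho, Quang-Tan Nguyen, Thien-Loc Ha, Gia-Binh Nguyen, Viet-Thanh Nguyen, Long Dinh, Minh N. Vu, Duy M. H. Nguyen, An Thai Le, Ngo Anh Vien

Affiliations * VinRobotics; VinUniversity; DFKI; University of Stuttgart; IMPRS-IS

Topic match Robot learning: initial noise selection to resolve action chunk inconsistency in asynchronous robot execution

AI summary To address inconsistency at action chunk boundaries when flow-based policies execute asynchronously, proposes the training-free PAINT method, which achieves prefix consistency through initial noise selection rather than trajectory steering, improving execution consistency and task performance on 12 simulated benchmarks and 6 real-world manipulation tasks.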

Comments First version 19 pages, project site: https://paint-action-chunking.github.io

Abstract

Action chunking enables robot policies to produce temporally coherent behavior, but generating multi-step action sequences with flow-based policies incurs latency that is incompatible with real-time control. Under asynchronous execution, the robot continues executing the current chunk while the next one is generated, causing even minor delays to create inconsistencies at chunk boundaries. Existing methods address this problem by steering generation toward the already executed action prefix. We instead show that prefix consistency can be achieved by selecting an appropriate initial noise before generation begins, allowing the unmodified flow ODE to produce a coherent next chunk. This reframes asynchronous inference as a noise selection problem rather than a trajectory steering problem. We introduce PAINT, a training-free method that finds this noise via backward Euler inversion and constructs the final chunk through a repainting rule. In summary, PAINT requires no gradients, retraining, or policy modification; yet it improves execution consistency and task performance across 12 simulated benchmarks and 6 real-world manipulation tasks spanning single-arm, bimanual, and humanoid embodiments. Website: https://paint-action-chunking.github.io.

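2606.19752 2026-06-19 cs.RO cs.AI New submission 85%

Temporal Self-Imitation Learning

Yinsen Jia, Boyuan Chen

Affiliations * Duke University

Topic match Robot learning: temporal self-imitation learning improves the efficiency of long-horizon robot manipulation

AI summary Proposes a temporal self-imitation learning framework that mines efficient successful trajectories and converts them into reusable supervision, improving learning efficiency and robustness on long-horizon robot manipulation tasks.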

Abstract

Long-horizon robot manipulation policies trained with reward shaping can still exploit dense rewards through inefficient interaction, while rare efficient behaviors may be forgotten during training. We argue that temporal efficiency itself provides a powerful and underutilized source of self-supervision for reinforcement learning. We introduce Temporal Self-Imitation Learning (TSIL), a reinforcement learning framework that mines temporally efficient successful trajectories generated during learning and converts them into reusable supervision for future policy improvement. TSIL progressively refines learning using configuration-conditioned adaptive temporal targets derived from fast successful trajectories, while preserving and replaying efficient behaviors through efficiency-weighted self-imitation learning. Across 15 distinct long-horizon manipulation tasks, TSIL consistently improves learning efficiency, task-completion efficiency, revisitation of fast successful behaviors, and robustness to unstable training conditions. More broadly, our results suggest that the temporal structure of successful behavior itself provides a scalable self-supervisory signal for reinforcement learning beyond manually engineered reward shaping alone.

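2606.19633 2026-06-19 cs.RO cs.AI New submission 85%

CTS-MoE: Implicit Terrain Adaptation via Mixture-of-Experts for Perceptive Locomotion

Francisco Affonso, Matheus P. Angarola, Ana Luiza Mineiro, Aditya Potnis, Marcelo Becker, Girish Chowdhary

Affiliations * University of Illinois Urbana-Champaign; University of São Paulo

Topic match Robot learning: CTS-MoE for perceptive locomotion with implicit terrain adaptation

AI summary For perceptive locomotion over discontinuous terrain, proposes CTS-MoE, which composes shared behaviors with a dense mixture-of-experts policy and perception-based gating and uses a multi-critic to prevent value interference, enabling end-to-end training and implicit terrain adaptation and outperforming baselines in simulation and on hardware.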

Abstract

Perceptive legged locomotion over discontinuous terrain (e.g., stairs, gaps, and obstacles) requires adaptive behavior, as a single conservative gait cannot produce the anticipatory maneuvers needed for abrupt topology changes. Cast as multi-task reinforcement learning, this problem introduces a tension between sharing and separation. Tasks use a common locomotion base but have conflicting rewards, so a policy must share behavior while avoiding value interference. Prior work addresses only one side, with monolithic policies sacrificing specialization and hierarchical sub-policies sacrificing generalization across transitions and unseen terrain. We propose CTS-MoE, which combines a dense mixture-of-experts actor with perception-based gating to compose shared behaviors and a multi-critic with task-specific value heads to prevent interference. The model is trained end-to-end in a single-stage concurrent teacher-student setup that handles partial observability and avoids sequential distillation, with task labels used only during training. At deployment, routing depends solely on perception, allowing terrain adaptation without a high-level selector or terrain classifier. Experiments on a Unitree Go1 in simulation and on hardware across seen and unseen terrains show task-aware specialization, with lower tracking error and higher success rates than monolithic baselines. Project Website: https://cts-moe.github.io/ .
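
The dense mixture-of-experts actor with perception-based gating can be sketched as a softmax-weighted blend of expert outputs, where the gate sees only perception features. The two toy experts, the two-dimensional perception vector, and the gate weights below are hypothetical; the real actor is a trained network whose architecture the abstract does not detail.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense_moe_action(perception: Vector, proprio: Vector, gate_weights: List[Vector],
                     experts: List[Callable[[Vector], Vector]]) -> Tuple[Vector, Vector]:
    """Blend every expert's action with gate values computed from perception only."""
    logits = [sum(w * f for w, f in zip(row, perception)) for row in gate_weights]
    gates = softmax(logits)
    expert_actions = [expert(proprio) for expert in experts]  # dense: all experts run
    dim = len(expert_actions[0])
    action = [sum(g * a[d] for g, a in zip(gates, expert_actions)) for d in range(dim)]
    return action, gates


# Hypothetical experts that ignore their input and emit a fixed two-joint command.
def jump_expert(proprio: Vector) -> Vector:
    return [1.0, 0.0]


def climb_expert(proprio: Vector) -> Vector:
    return [0.0, 1.0]


perception = [1.0, 0.0]                 # made-up features: [gap ahead, step ahead]
gate_weights = [[4.0, 0.0], [0.0, 4.0]]  # one row of gate weights per expert
action, gates = dense_moe_action(perception, [0.0], gate_weights, [jump_expert, climb_expert])
```

Since the gate never sees a task label, the same function can be called at deployment without a terrain classifier, which is the property the abstract highlights.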

2606.19598 2026-06-19 cs.RO 新提交 85%

Fail-RAG : A Retrieval Augmented Generation Informed Framework for Robot Failure Identification

Fail-RAG：一种基于检索增强生成的机器人故障识别框架

Ameya Salvi, Jie Hu

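Affiliations: Hitachi America, Ltd.

Topic match: Robot learning. Addresses failure detection in warehouse robot operations, which falls under robot learning.

AI summary: Proposes the Fail-RAG framework, which uses retrieval-augmented generation and vision-language models to detect robot operation failures by embedding failure images and context information and querying a failure database. On warehouse automation tasks, it raises average detection accuracy by 25 percentage points over off-the-shelf VLMs.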


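AI-generated abstract (translated from Chinese)

Industrial automation is seeing an evolution in robotics driven by both technological breakthroughs and societal changes: progress toward generalist robots and embodied and physical artificial intelligence, and a worsening labor shortage. An intelligent autonomous robot must not only move as planned but also react to unexpected events. This study focuses on unexpected events involving material-handling robots in warehouses, defines them as failures, and develops methods to detect failures in robot operations. Because environments and tasks are dynamic, the form a failure takes can change, and rule-based detection methods may break. We propose 'Fail-RAG', a failure detection framework based on retrieval-augmented generation (RAG), in which failure images and context information are embedded and queried against a failure database by computing similarity. A vision-language model (VLM) then analyzes the failure and provides details by following an instruction template. Fail-RAG was evaluated in simulation and physical experiments using fixed robot arms and a mobile manipulator on tasks common in warehouse automation. Compared with off-the-shelf VLMs, Fail-RAG raised average failure detection accuracy by 25 percentage points across five types of robot operations, indicating its effectiveness for real-world failure detection.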

English abstract

Industry automation is witnessing an evolution in robotics driven by both technological breakthroughs and societal changes: progress towards generalist robots, embodied and physical artificial intelligence (AI), and an increasing labor shortage in manufacturing. An intelligent autonomous robot needs to not only act according to planned motions but also react to any unexpected events. In this study, we focus on such unexpected events in warehouses where robots are used for material handling. Specifically, we refer to any unexpected events as failures and develop methods to detect robot-operation-related failures. Rule-based detection methods may break since the form of failures could change due to the dynamic nature of both environments and tasks. We propose 'Fail-RAG', a Retrieval Augmented Generation (RAG)-based failure detection framework where failure images and context information are embedded and queried against a failure database by calculating their similarities. Vision-Language Models (VLMs) are further used to analyze failures and provide details by following our instruction template. We evaluated the performance of Fail-RAG by conducting both simulation and physical experiments using fixed robot arms and a mobile manipulator for multiple tasks that are common in warehouse automation. Fail-RAG achieved 25 percentage points higher failure detection accuracy on average across five types of robot operations compared to using off-the-shelf VLMs, indicating its effectiveness for real-world failure detection.
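The retrieval step this abstract describes, in which an embedded failure observation is queried against a failure database by similarity, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Fail-RAG implementation: the embeddings are hand-written toy vectors, cosine similarity is one common choice of similarity measure, and the names `failure_db` and `retrieve_similar_failures` are hypothetical. The VLM analysis stage is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar_failures(query_embedding, failure_db, top_k=2):
    """Return the top_k (score, label) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, entry["embedding"]), entry["label"])
        for entry in failure_db
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical failure database with toy 3-dimensional embeddings.
# A real system would use an image/text embedding model instead.
failure_db = [
    {"label": "dropped item", "embedding": [1.0, 0.0, 0.0]},
    {"label": "gripper collision", "embedding": [0.0, 1.0, 0.0]},
    {"label": "misaligned placement", "embedding": [0.7, 0.7, 0.0]},
]

matches = retrieve_similar_failures([0.9, 0.1, 0.0], failure_db)
for score, label in matches:
    print(f"{label}: {score:.3f}")
```

In this toy setup the query lies closest to "dropped item", followed by "misaligned placement". In a pipeline of the kind the abstract outlines, the retrieved entries would then be passed to a VLM as context.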


2606.19531 2026-06-19 cs.CV cs.RO New submission 85%

ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?


Yuyang Zhang, Wenyao Zhang, Zekun Qi, He Zhang, Haitao Lin, Jingbo Zhang, Yao Mu, Xiaokang Yang, Wenjun Zeng, Xin Jin

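Affiliations: Shanghai Jiao Tong University; Eastern Institute of Technology; Tencent Robotics X; Tsinghua University; Zhongguancun Academy

Topic match: Robot learning. Uses an image editing model for robot action prediction.

AI summary: Proposes the ImageWAM framework, which replaces video generation with a pretrained image editing model for robot action prediction and uses the KV cache from edit denoising as world-action context. It outperforms baselines in multiple simulated and real-world experiments while cutting computation to 1/6 and latency to 1/4 of video-based WAMs.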

Comments Project Page: https://zhangwenyao1.github.io/ImageWAM/


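AI-generated abstract (translated from Chinese)

World Action Models (WAMs) typically rely on video generation to bridge visual world modeling and robot control. However, video-based WAMs face three coupled limitations: dense multi-frame future tokens make inference expensive, full video prediction spends capacity on action-irrelevant temporal and appearance details, and long-horizon future imagination may introduce errors that mislead action prediction. These issues raise a simple question: do world action models really need video generation? We propose ImageWAM, a simple WAM framework that repurposes pretrained image editing models for robot action prediction. Compared with video generation, image editing offers a better-matched prior: it only needs to model a target-frame transformation, it focuses on action-relevant visual differences between the current and target frames, and it grounds task instructions in localized visual changes through edit pretraining. In practice, ImageWAM does not decode the target frame at inference time; instead, it conditions a flow-matching action expert on the KV caches produced by image-editing denoising, using them as a compact world-action context. ImageWAM outperforms standard VLA baselines and matched competitive WAMs across multiple simulated and real-world experiments, without additional policy pretraining. It also reduces FLOPs to 1/6 and latency to 1/4 of those of video-based WAMs. Attention analysis further shows that the editing caches focus on task-relevant change regions, supporting image editing as an effective alternative to video-based world-action modeling.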

English abstract

World Action Models (WAMs) commonly rely on video generation to bridge visual world modeling and robot control. However, video-based WAMs face three coupled limitations: dense multi-frame future tokens make inference costly, full video prediction spends capacity on action-irrelevant temporal and appearance details, and long-horizon future imagination may introduce errors that mislead action prediction. These issues raise a simple question: Do world action models really need video generation? We propose ImageWAM, a simple WAM framework that repurposes pretrained image editing models for robot action prediction. In contrast to video generation, image editing provides a better-matched prior: it only needs to model a target-frame transformation, focuses on action-relevant current-to-target visual differences, and grounds task instructions to localized visual changes through edit pretraining. In practice, ImageWAM does not decode the target frame at inference time; instead, it conditions a flow-matching action expert on the KV caches produced by image-editing denoising, using them as a compact world-action context. ImageWAM outperforms standard VLA baselines and matching competitive WAMs without additional policy pretraining across different simulation and real-world experiments. It also reduces FLOPs to 1/6 and latency to 1/4 of video-based WAMs. Attention analysis further shows that editing caches focus on task-relevant change regions, supporting image editing as an effective alternative to video-based world-action modeling.


2605.28654 2026-06-19 cs.RO cs.SY eess.SY math.OC Version update 85%

Integrated Exploration-Aware UAV Route Optimization and Path Planning


Jimin Choi, Grant Stagg, Cameron K. Peterson, Max Z. Li

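Affiliations: Department of Aerospace Engineering, University of Michigan; Department of Electrical Engineering, Brigham Young University; Department of Aerospace Engineering, Department of Civil and Environmental Engineering, and Department of Industrial and Operations Engineering, University of Michigan

Topic match: Robot learning. Proposes exploration-aware UAV route optimization and path planning.

AI summary: Proposes an integrated exploration-aware UAV route optimization and path planning framework that uses a risk map, uncertain region-of-interest modeling, B-spline trajectory optimization, and online replanning to balance visits to reported sites against exploration for new information in hazard monitoring. Online replanning improves average KL reduction by 15.9% over the offline optimized planner.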


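AI-generated abstract (translated from Chinese)

Uncrewed aerial vehicles (UAVs) are increasingly used for exploration-driven monitoring in hazardous environments such as disaster zones, contaminated sites, wildfire areas, and damaged infrastructure, where limited flight endurance must be divided between visiting reported locations and gathering new information. In these settings, prior information about hazards is often incomplete, spatially imprecise, and liable to change during execution. For example, an initial report may identify a region where a hazard is likely to exist, but the actual hazard may be displaced, only partially observed, or not reported at all. We present an integrated exploration-aware UAV route optimization and path planning framework for hazard monitoring under uncertain and evolving prior information. The environment is represented as a spatial risk map in which each location has an associated belief about hazardous conditions. Reported hazards are modeled as uncertain regions of interest (ROIs) rather than confirmed target locations, so the UAV must inspect the reported areas while also using its limited flight endurance to explore informative regions. The proposed method solves a vehicle routing problem over the reported ROIs, augments the route with auxiliary pseudo-nodes to improve spatial coverage, allocates the remaining flight distance budget across route segments, and optimizes dynamically feasible B-spline trajectories for local exploration. During execution, UAV measurements update a grid-based belief map, and the remaining trajectory is replanned when new information and the remaining budget justify an adjustment. Across 48 scenario configurations, online replanning improves average KL reduction by 15.9% over the offline optimized planner and by 48.6% over straight-line traversal.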

English abstract

Uncrewed aerial vehicles (UAVs) are increasingly used for exploration-driven monitoring in hazardous environments such as disaster zones, contaminated sites, wildfire areas, and damaged infrastructure, where limited flight endurance must be allocated between visiting reported locations and gathering new information. In these settings, prior information regarding hazards is often incomplete, spatially imprecise, and subject to change during execution. For example, initial reports may identify a region where a hazard is likely to exist, but the actual hazard may be displaced, partially observed, or entirely unreported. We present an integrated exploration-aware UAV route optimization and path planning framework for hazard monitoring under uncertain and evolving prior information. The environment is represented as a spatial risk map, where each location has an associated belief of hazardous conditions. Reported hazards are modeled as uncertain regions of interest (ROIs) rather than confirmed target locations, requiring the UAV to inspect reported areas while also using its limited flight endurance to explore informative regions. The proposed method solves a vehicle routing problem over reported ROIs, augments the route with auxiliary pseudo-nodes to improve spatial coverage, allocates the remaining flight distance budget across route segments, and optimizes dynamically feasible B-spline trajectories for local exploration. During execution, UAV measurements update a grid-based belief map, and the remaining trajectory is replanned when new information and the remaining budget justify adaptation. Across 48 scenario configurations, online replanning improves average KL reduction by 15.9% over the offline optimized planner and 48.6% over straight-line traversal.
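To make the grid-based belief update and the KL-based evaluation in this abstract more concrete, the sketch below applies a Bayes update to per-cell hazard probabilities and measures how far the belief map is from a ground-truth map. Everything specific here is an assumption for illustration: the binary sensor model with `p_hit` and `p_false`, the per-cell Bernoulli KL divergence, and the function names are not taken from the paper, whose exact measurement model and metric definition are not given in the abstract.

```python
import math


def update_cell_belief(prior, detected, p_hit=0.9, p_false=0.1):
    """Bayes update of one cell's hazard probability after a binary reading.

    p_hit is the assumed probability of a detection when a hazard is present;
    p_false is the assumed probability of a detection when it is absent.
    """
    if detected:
        like_hazard, like_clear = p_hit, p_false
    else:
        like_hazard, like_clear = 1.0 - p_hit, 1.0 - p_false
    evidence = like_hazard * prior + like_clear * (1.0 - prior)
    return like_hazard * prior / evidence


def bernoulli_kl(p, q, eps=1e-9):
    """KL(Bernoulli(p) || Bernoulli(q)), with q clipped away from 0 and 1."""
    q = min(max(q, eps), 1.0 - eps)
    kl = 0.0
    if p > 0.0:
        kl += p * math.log(p / q)
    if p < 1.0:
        kl += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return kl


def map_kl(truth, belief):
    """Sum of per-cell KL divergences from the truth map to the belief map."""
    return sum(bernoulli_kl(t, b) for t, b in zip(truth, belief))


# Toy two-cell map: cell 0 is hazardous, cell 1 is clear.
truth = [1.0, 0.0]
belief = [0.5, 0.5]
kl_before = map_kl(truth, belief)

# The UAV observes both cells and gets one detection and one non-detection.
belief = [update_cell_belief(belief[0], True), update_cell_belief(belief[1], False)]
kl_after = map_kl(truth, belief)

print(f"KL reduction: {kl_before - kl_after:.3f}")
```

Under these assumptions, a planner that gathers more informative measurements drives the belief map closer to the truth and so achieves a larger KL reduction, which is the kind of quantity the reported 15.9% and 48.6% improvements compare.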
