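arXivDaily: arXiv daily academic digest, updated Monday through Friday

Vision and Robotics

Robotics / Embodied Intelligence

Robotics, embodied intelligence, robot learning, manipulation, navigation, and embodied world models.

Indexed for today / the current date: 33. Sources: cs.RO, cs.AI, cs.CV, cs.LG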
2511.02036 2026-06-18 cs.RO Version update 70%

TurboMap: GPU-Accelerated Local Mapping for Visual SLAM

Parsa Hosseininejad, Kimia Khabiri, Shishir Gopinath, Soudabeh Mohammadhashemi, Karthik Dantu, Steven Y. Ko

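Affiliations * Simon Fraser University; University at Buffalo

Topic match Robot learning: SLAM is a core technology for robot perception.

AI summary To address local-mapping latency in Visual SLAM, the paper proposes TurboMap, a backend that combines GPU parallelization with CPU optimization. By restructuring map point creation, map point fusion, and keyframe management, it achieves average local mapping speedups of 1.3x (EuRoC) and 1.6x (TUM-VI) while preserving accuracy.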

Comments Accepted for presentation at IROS 2026, preprint

Abstract

In real-time Visual SLAM systems, local mapping must operate under strict latency constraints, as delays degrade map quality and increase the risk of tracking failure. GPU parallelization offers a promising way to reduce latency. However, parallelizing local mapping is challenging due to synchronized shared-state updates and the overhead of transferring large map data structures to the GPU. This paper presents TurboMap, a GPU-parallelized and CPU-optimized local mapping backend that holistically addresses these challenges. We restructure Map Point Creation to enable parallel Keypoint Correspondence Search on the GPU, redesign and parallelize Map Point Fusion, optimize Redundant Keyframe Culling on the CPU, and integrate a fast GPU-based Local Bundle Adjustment solver. To minimize data transfer and synchronization costs, we introduce persistent GPU-resident keyframe storage. Experiments on the EuRoC and TUM-VI datasets show average local mapping speedups of 1.3x and 1.6x, respectively, while preserving accuracy.
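
The abstract does not describe how the GPU correspondence search is implemented. As a rough illustration of why such a search parallelizes well, the sketch below assumes binary descriptors stored as integers and a brute-force Hamming-distance match; the function names, the `max_dist` threshold, and the toy descriptors are all hypothetical and are not taken from TurboMap.

```python
def hamming(a, b):
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_one(query, candidates, max_dist):
    """Index of the closest candidate, or None if none is within max_dist."""
    best_idx, best_dist = None, max_dist + 1
    for idx, cand in enumerate(candidates):
        d = hamming(query, cand)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx


def match_all(queries, candidates, max_dist=2):
    # Each query only reads the shared candidates and writes its own result,
    # so this map has no cross-query dependencies. That independence is what
    # would let a GPU version assign one query per thread (an assumption here).
    return [match_one(q, candidates, max_dist) for q in queries]


# Toy 8-bit descriptors (hypothetical values).
queries = [0b10110010, 0b00001111, 0b11111111]
candidates = [0b00001110, 0b10110011, 0b01010101]
matches = match_all(queries, candidates)
print(matches)
```

The third query has no candidate within the distance threshold, so it stays unmatched. The shared-state updates that the abstract names as the hard part (creating and fusing map points) are deliberately left out of this sketch.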

2606.19154 2026-06-18 cs.RO New submission 65%

Viking Hill Dataset: A Lidar-Radar-Camera Dataset for Detection and Segmentation in Forest Scenes

Vladimír Kubelka, Oleksandr Kotlyar, Unal Artan, Martin Magnusson

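Affiliations * Örebro University; AASS research centre; Robot Navigation and Perception Lab

Topic match Robot learning: data collected on a robot platform for autonomous-navigation perception.

AI summary Presents the first multi-sensor forest dataset that includes 4D imaging radar, provides MinkowskiUNet baselines for semantic segmentation of radar and lidar point clouds, and evaluates how trunk segmentation quality varies with tree size.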

Comments 33 pages, 11 figures

Abstract

Autonomous robots operating under forest canopies need robust perception of trees and surrounding vegetation across varying seasonal conditions. Existing forestry datasets provide lidar or camera data with per-tree annotations, but none include co-registered 4D imaging radar -- a modality of growing interest for its resilience to visual degradation, surface contamination, and vegetation occlusion. We introduce a multi-sensor forest dataset collected by a mobile robot equipped with a high-resolution FMCW imaging radar, lidar, RGB camera, IMU, and RTK-GNSS. The site was recorded in two sessions under contrasting vegetation states, and 3D cuboid annotations -- including per-tree diameter estimates -- provide shared semantic labels across all three perception modalities. Furthermore, we provide baseline results for semantic segmentation of the radar and lidar point clouds using MinkowskiUNet. Radar achieves IoU scores competitive with lidar for dominant classes (ground 91%, canopy 86%) while lagging on geometrically fine structures such as tree trunks (56% vs. 74%). A cross-modality analysis further compares lidar and radar trunk segmentation against an RGB detection model, and a diameter-stratified evaluation reveals how trunk segmentation quality varies with tree size. Beyond segmentation, the co-registered multi-modal data and RTK-GNSS-aided reference positioning support research in mapping, localization, and sensor fusion under canopy. The dataset and annotation tools are publicly available.
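
The segmentation results above are reported as per-class IoU (intersection over union). A minimal sketch of that metric on per-point labels follows; the class ids and the toy label arrays are hypothetical and are not drawn from the dataset.

```python
def per_class_iou(pred, target, num_classes):
    """IoU per class over paired per-point labels; None if a class never occurs."""
    ious = []
    for c in range(num_classes):
        # Points where both prediction and ground truth say class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Points where at least one of them says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


# Hypothetical class ids: 0 = ground, 1 = canopy, 2 = trunk.
target = [0, 0, 0, 1, 1, 2, 2, 2]
pred = [0, 0, 1, 1, 1, 2, 0, 2]
ious = per_class_iou(pred, target, 3)
print(ious)
```

Because a single mislabeled point is a larger share of a small class, thin structures such as trunks tend to be penalized more heavily by IoU than large classes such as ground.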

2606.18315 2026-06-18 cs.LG cs.AI New submission 65%

Ghost Attractor Networks: Basin-Structured Dynamical Decoders for Closed-Loop Sequential Generation

Tianyu Wang, Ying Wang, Zhihao Liu, Xi Vincent Wang, Lihui Wang

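Affiliations * KTH Royal Institute of Technology; Department of Production Engineering, KTH Royal Institute of Technology; Department of Decision and Control Systems, KTH Royal Institute of Technology

Topic match Robot learning: proposes a dynamical decoder for generating robot action sequences.

AI summary Proposes Ghost Attractor Networks, a theoretically derived dynamical decoder that builds a basin-attractor structure for efficient closed-loop sequence generation. On robot action decoding, a 2.3M-parameter model matches the offline accuracy of a 1.07B-parameter Diffusion Transformer at 32 times lower latency.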

Abstract

Sequential output generation with large-scale Transformer and diffusion decoders pays a memory cost that grows with sequence length, plus iterative per-step computation. Replacing them with small feed-forward decoders restores efficiency but produces unstructured latent representations that limit closed-loop control: phase-conditioned action generation and cross-step latent carry-over both require a latent geometry with stable basins. This article proposes Ghost Attractor Networks, a theoretically derived dynamical decoder whose latent evolves under a learned potential with drift and produces a basin-attractor structure by construction. Three desiderata (multi-modality, decoder-level single-pass switching, and constant memory) motivate the potential-drift form, and mode transitions arise as saddle-node bifurcations with ghost-attractor escape. A hierarchical phase-space decomposition separates first-order basin convergence from second-order proprioceptive refinement. Empirically, a Ghost trained end-to-end with a behavioral-cloning and contrastive objective exhibits the predicted gradient-flow contraction in its potential, with the gradient norm decaying by 67 percent across five integration steps on 1430 held-out samples. Ghost is evaluated as a robotic action decoder. A 2.3-million-parameter Ghost matches the offline accuracy of a 1.07-billion-parameter Diffusion Transformer at 462 times fewer parameters and 32 times lower latency, and beats five alternative 2M-parameter decoders (MLP, Neural ODE, CVAE, Transformer, 1-step Diffusion) on offline mean squared error by 5.9 to 29 percent. On the LIBERO-10 closed-loop benchmark, phase conditioning on Ghost's basin-structured latent yields a 13.5 percentage-point success-rate gain over a feed-forward MLP baseline, and persistent-latent ensembling reaches a 95.7 percent final success rate.
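
The abstract attributes mode transitions to saddle-node bifurcations with ghost-attractor escape. As a textbook illustration of that phenomenon, and not the paper's model, the sketch below integrates the one-dimensional saddle-node normal form dz/dt = r + z^2, which can be read as gradient flow on the potential V(z) = -z^3/3 plus a constant drift r. For small r > 0 there is no fixed point, yet the state lingers near z = 0 (the "ghost") before escaping. The function names, step size, and interval are illustrative assumptions.

```python
import math


def passage_time(r, z_start=-1.0, z_end=1.0, dt=1e-3):
    """Euler-integrate dz/dt = r + z**2; return the time to go from z_start to z_end."""
    z, t = z_start, 0.0
    while z < z_end:
        z += dt * (r + z * z)
        t += dt
    return t


def analytic_time(r, a=1.0):
    """Closed-form passage time from -a to +a: (2 / sqrt(r)) * atan(a / sqrt(r))."""
    s = math.sqrt(r)
    return 2.0 / s * math.atan(a / s)


slow = passage_time(1e-4)  # drift barely past the bifurcation: long dwell near z = 0
fast = passage_time(1e-2)  # larger drift: the ghost is escaped sooner
print(slow, analytic_time(1e-4))
print(fast, analytic_time(1e-2))
```

The dwell time grows roughly like pi / sqrt(r) as r approaches zero, so a small change in drift turns a near-permanent stay into a quick escape. That sensitivity is the general mechanism a drift-controlled mode switch can exploit.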