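arXivDaily: arXiv daily academic digest, updated Monday through Friday

Vision and Robotics

Robotics / Embodied Intelligence

Robotics, embodied intelligence, robot learning, manipulation, navigation, and embodied world models.

Entries for today / current date: 1. Signal sources: cs.RO, cs.AI, cs.CV, cs.LG
2606.01605 2026-06-18 cs.RO Version update 85%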

Embedding Semantic Risk into Distance Fields and CBFs for Online Monocular Safe Control

Dawei Zhang, Nuo Chen, Shuo Liu, Roberto Tron, Zhiwen Fan

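Affiliations: Division of Systems Engineering, Boston University; Department of Mechanical Engineering, Boston University; Department of Electrical and Computer Engineering, Texas A&M University

Topic match: Embodied navigation. Monocular safe control, with semantic risk embedded into distance fields for navigation.

AI summary: Proposes an online monocular perception-to-control framework that embeds semantic risk directly into the Euclidean Signed Distance Field (ESDF), encoding risk before control optimization to enable semantic-aware safe navigation and teleoperation based on Control Barrier Functions (CBFs).

Details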
English abstract

We propose an online monocular perception-to-control framework that embeds semantic risk into the distance field used by Control Barrier Function (CBF)-based safe navigation and teleoperation. Many perception-based safety filters assign the same distance-based safety margin to all mapped obstacles or use semantics only as a downstream controller adjustment, rather than encoding semantic risk in the spatial representation. Our framework instead reasons online about obstacle geometry and class-dependent risk by embedding semantic information directly into the Euclidean Signed Distance Field (ESDF). This design encodes semantic risk before control optimization, so high-risk objects exert a larger spatial influence in the safety field while retaining efficient ESDF queries at runtime. Specifically, a foundation-model-based SLAM front end reconstructs dense 3-D geometry from monocular RGB video, while per-frame semantic segmentation provides pixel-level class labels that are fused into the reconstructed geometry. The resulting geometric-semantic representation is then converted into an ESDF, where semantic labels identify safety-relevant regions and impose class-dependent inflation before field computation. The semantic-aware ESDF provides the local distance values and spatial derivatives required by the CBF controller, while class-dependent gains further regulate the controller response. Extensive simulation and hardware experiments demonstrate online operation at 10-20 Hz and semantic-aware safe behavior in both teleoperation and autonomous navigation.
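
To make the idea of class-dependent inflation plus a CBF safety filter concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 2-D single-integrator robot (x_dot = u), replaces the voxel ESDF with an analytic signed distance to labeled obstacle points, and constrains only the nearest inflated obstacle, as a single distance-field query would. The names `CLASS_PARAMS`, `semantic_distance`, and `cbf_filter`, and all numeric values, are hypothetical. Interpreting the "class-dependent gain" as the CBF gain alpha is also an assumption.

```python
import math

# Hypothetical per-class parameters (not from the paper): inflation radius in
# metres and CBF gain alpha. A smaller alpha makes the filter brake earlier.
CLASS_PARAMS = {
    "box":    {"inflation": 0.2, "alpha": 2.0},
    "person": {"inflation": 0.8, "alpha": 0.5},
}


def semantic_distance(x, obstacles):
    """Return (h, grad_h, label) for the closest inflated obstacle, or None.

    h is the signed distance from x to the nearest obstacle point after
    class-dependent inflation; grad_h is its unit spatial gradient.
    Stands in for a semantic-aware ESDF lookup (simplifying assumption).
    """
    best = None
    for (px, py, label) in obstacles:
        dx, dy = x[0] - px, x[1] - py
        dist = math.hypot(dx, dy)
        h = dist - CLASS_PARAMS[label]["inflation"]
        if best is None or h < best[0]:
            # The gradient of ||x - p|| is the unit vector pointing away from p.
            grad = (dx / dist, dy / dist) if dist > 1e-9 else (1.0, 0.0)
            best = (h, grad, label)
    return best


def cbf_filter(x, u_nom, obstacles):
    """Minimally modify u_nom so that grad_h . u >= -alpha * h holds.

    With single-integrator dynamics and one constraint, the usual CBF
    quadratic program reduces to a closed-form projection.
    """
    query = semantic_distance(x, obstacles)
    if query is None:
        return u_nom  # no obstacles mapped, nothing to enforce
    h, grad, label = query
    alpha = CLASS_PARAMS[label]["alpha"]
    slack = grad[0] * u_nom[0] + grad[1] * u_nom[1] + alpha * h
    if slack >= 0.0:
        return u_nom  # nominal command already satisfies the CBF condition
    # Project onto the constraint boundary; grad has unit norm.
    return (u_nom[0] - slack * grad[0], u_nom[1] - slack * grad[1])


if __name__ == "__main__":
    obstacles = [(2.0, 0.0, "box"), (0.0, 2.0, "person")]
    # Both obstacles are 2 m away, but the person is inflated more, so it
    # dominates the safety field and a command toward it gets scaled back.
    print(cbf_filter((0.0, 0.0), (1.0, 0.0), obstacles))
    print(cbf_filter((0.0, 0.0), (0.0, 1.0), obstacles))
```

In this sketch the semantic risk enters twice: once through the inflation radius, which reshapes the distance field before any control computation, and once through alpha, which sets how quickly the robot may approach the inflated boundary. A grid-based ESDF would replace `semantic_distance` with a lookup and a finite-difference gradient.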