arXivDaily: arXiv daily academic digest, updated Monday through Friday

Vision and Robotics

Robotics / Embodied AI

Robotics, embodied AI, robot learning, manipulation, navigation, and embodied world models.

Indexed for today / current date: 3. Signal sources: cs.RO, cs.AI, cs.CV, cs.LG
2606.18664 2026-06-18 cs.SD cs.AI New submission 80%

NeuralMUSIC: A Hybrid Neural-Subspace Framework for Robot Sound Source Localization

Yizhuo Yang, Junqiao Fan, Shenghai Yuan, Lihua Xie

Affiliation: School of Electrical and Electronic Engineering, Nanyang Technological University

Topic match: Other Robotics (hybrid framework for robot sound source localization)

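AI summary: Proposes NeuralMUSIC, a hybrid framework that combines a neural-network estimate of the spatial covariance matrix with the classical MUSIC subspace method, and uses frequency attention fusion and self-supervised learning to improve the robustness and cross-domain generalization of robot sound source localization.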

Abstract

Reliable sound source localization is fundamental to robot audition, enabling autonomous robots to perceive spatial cues and operate effectively in dynamic environments. Classical methods such as Multiple Signal Classification (MUSIC) offer strong theoretical foundations but degrade under low signal-to-noise ratios. While deep learning-based approaches achieve promising performance, they often struggle with limited generalization across conditions. To address these challenges, we propose NeuralMUSIC, a hybrid neural-subspace framework for robotic sound source localization. Specifically, a neural network first estimates the spatial covariance matrix from multichannel microphone observations. The predicted covariance is then integrated into a classical MUSIC pipeline with eigenvalue decomposition (EVD) and pseudo-spectrum computation, followed by a Frequency Attention Fusion (FAF) module to produce the final DOA estimates. To improve data efficiency, we further introduce a Self-supervised Spatial Correlation Learning (SSCL) strategy that leverages unlabeled acoustic data to capture spatial structure. Extensive experiments across different robotic tasks demonstrate that NeuralMUSIC achieves competitive localization accuracy while exhibiting improved robustness and cross-domain generalization.
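
The MUSIC stage named in the abstract (covariance, eigendecomposition, pseudo-spectrum) can be sketched for the simplest case. This is a minimal pure-Python illustration, not the authors' implementation. It assumes a single narrowband source, a 4-microphone uniform linear array with half-wavelength spacing, and an ideal model covariance standing in for the network-predicted one. Power iteration replaces a full EVD, which is adequate only because one source is assumed. The FAF and SSCL components are not modeled, and all function names are hypothetical.

```python
import cmath
import math


def steering_vector(theta_deg, num_mics):
    """Far-field steering vector of a half-wavelength-spaced uniform linear array."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * m) for m in range(num_mics)]


def model_covariance(theta_deg, num_mics, signal_power=1.0, noise_power=0.1):
    """Ideal covariance R = p * a a^H + s * I (assumed stand-in for a network output)."""
    a = steering_vector(theta_deg, num_mics)
    return [
        [
            signal_power * a[i] * a[j].conjugate() + (noise_power if i == j else 0.0)
            for j in range(num_mics)
        ]
        for i in range(num_mics)
    ]


def dominant_eigenvector(cov, iterations=50):
    """Power iteration: unit-norm eigenvector of the largest eigenvalue."""
    n = len(cov)
    v = [1.0 + 0.0j] * n
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    return v


def music_pseudospectrum(cov, angles_deg):
    """Single-source MUSIC: 1 / (energy of a(theta) outside the signal subspace)."""
    n = len(cov)
    e_s = dominant_eigenvector(cov)
    spectrum = []
    for theta in angles_deg:
        a = steering_vector(theta, n)
        # |e_s^H a|^2 is the energy of a(theta) inside the signal subspace.
        in_signal = abs(sum(e.conjugate() * x for e, x in zip(e_s, a))) ** 2
        # ||a||^2 = n, so the remainder lies in the noise subspace.
        in_noise = max(n - in_signal, 1e-12)
        spectrum.append(1.0 / in_noise)
    return spectrum


angles = list(range(-90, 91))            # 1-degree search grid
cov = model_covariance(20, num_mics=4)   # source placed at 20 degrees
spectrum = music_pseudospectrum(cov, angles)
estimated_doa = angles[spectrum.index(max(spectrum))]
print(estimated_doa)
```

The pseudo-spectrum peaks where the steering vector is orthogonal to the noise subspace. With several sources or a wideband signal, a full eigendecomposition per frequency bin would be needed in place of power iteration.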

2606.18688 2026-06-18 cs.LG cs.AI New submission 70%

Dual-Channel Grounded World Modeling (DCGWM): Structural Prevention of Objective Interference Collapse via Heterogeneous External Grounding with Inward-Only Gradient Flow

Akshay Hazare

Affiliation: Independent Researcher

Topic match: Other Robotics (world-model representation learning, dual-channel grounding)

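AI summary: Proposes Dual-Channel Grounded World Modeling (DCGWM), which uses a partitioned latent space and inward-only gradient flow to structurally prevent the objective interference collapse caused by multi-objective grounding in Joint Embedding Predictive Architectures.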

Comments: Position paper. Experimental validation in progress.

Abstract

Joint Embedding Predictive Architectures (JEPAs) are a leading approach to world model representation learning. We identify a failure mode in JEPA-based world models grounded against two qualitatively distinct external signals: physical dynamics (sparse, high-magnitude, constraint-satisfying gradient corrections) and social-behavioral dynamics (diffuse, distribution-matching corrections). We term this Objective Interference Collapse (OIC): we argue that joint learning in a shared latent space causes the dominant channel to systematically collapse the subordinate channel's representational subspace, in a manner not resolvable by loss weighting alone. We propose Dual-Channel Grounded World Modeling (DCGWM), designed to structurally prevent OIC through a partitioned latent space (physical subspace Z_p, behavioral subspace Z_b) with inward-only gradient flow. A Physical Grounding Channel updates only Z_p via VICReg-style alignment to physical measurements; a Social-Behavioral Grounding Channel updates only Z_b via alignment to trajectories from an emergent multi-agent simulation. An Inter-Channel Interface Module couples the subspaces at the task level without cross-subspace gradients. An Asymmetric Grounding Adherence Loss penalizes rollout drift with a hard hinge for physical violations and a soft KL for behavioral divergence. A Generative Rendering Layer is architecturally isolated from the latent world model. We present three theoretical results: the partition removes the gradient-interference pathway implicated in OIC; each grounded subspace inherits anti-collapse guarantees from its alignment objective; and generative isolation is necessary under a stated assumption on the generative objective's geometry. This manuscript establishes the problem formulation and architecture; experimental validation is ongoing and will be reported in a future revision.
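
The abstract describes the Asymmetric Grounding Adherence Loss only in words: a hard hinge for physical violations and a soft KL term for behavioral divergence. The sketch below shows one plausible form. It is an assumption throughout: the function names, the weights, the zero margin, and the KL direction (rollout relative to reference) are all hypothetical and are not taken from the paper.

```python
import math


def hinge_penalty(violation, margin=0.0):
    """Zero while the constraint holds, linear once it is violated."""
    return max(0.0, violation - margin)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def adherence_loss(physical_violations, rollout_dist, reference_dist,
                   hinge_weight=10.0, kl_weight=1.0):
    """Assumed asymmetric form: heavy hinge on physics, soft KL on behavior."""
    physical_term = sum(hinge_penalty(v) for v in physical_violations)
    behavioral_term = kl_divergence(rollout_dist, reference_dist)
    return hinge_weight * physical_term + kl_weight * behavioral_term


# Constraints satisfied (non-positive violations) and matching distributions.
clean = adherence_loss([-0.5, 0.0], [0.5, 0.5], [0.5, 0.5])
# One physical constraint violated by 0.2.
violated = adherence_loss([0.2], [0.5, 0.5], [0.5, 0.5])
# Behavioral drift only.
drifted = adherence_loss([0.0], [0.5, 0.5], [0.9, 0.1])
print(clean, violated, drifted)
```

The asymmetry in this sketch comes from the two penalty shapes: the hinge contributes nothing until a constraint is broken and then grows linearly, while the KL term grows smoothly with any distributional mismatch.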

2606.18532 2026-06-18 cs.CR cs.AI cs.RO cs.SE New submission 60%

AI Sandboxes: A Threat Model, Taxonomy, and Measurement Framework

Inderjeet Singh, Haitham Mahmoud, Andrés Murillo

Affiliation: Fujitsu Research of Europe

Topic match: Other Robotics (involves physical AI and embodied autonomous systems)

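AI summary: Proposes a threat model, taxonomy, and measurement framework for AI sandboxes; formalizes the sandbox boundary and a weakest-link rule, defines a cyber-physical threat model, and works through three case studies.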

Comments: 50 pages, 8 figures, 10 tables

Abstract

AI systems are increasingly evaluated in bounded environments that combine isolation, simulation, instrumentation, supervision, and evidence capture. For physical AI, AIoT, and cyber-physical systems, this shift is not a matter of terminology: the system under test may sense, decide, actuate, communicate, and fail through physical processes, networked devices, and human operators. This article develops an assurance-oriented account of AI sandboxes as controlled environments for testing, evaluation, verification, and validation across digital AI, embodied autonomy, and cyber-physical deployments. We formalize the sandbox boundary and a weakest-link rule for composing per-dimension evidence into a bounded deployment claim; separate major sandbox archetypes; define a cyber-physical threat model that includes attacks on the assurance apparatus itself; and introduce a measurement framework spanning fidelity, controllability, observability, containment, reproducibility, and governance artifacts, instantiated on three worked case studies of real sandboxes. The resulting threat model, taxonomy, and measurement framework clarify what a sandbox can validly test, which risks it can contain, and what forms of evidence it can support for safety, security, and regulatory assurance.
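
The abstract names a weakest-link rule for composing per-dimension evidence but does not give its form. A natural reading is that the overall claim can be no stronger than the weakest dimension, which the sketch below illustrates. The ordinal 0-to-3 scoring, the function name, and the example scores are hypothetical. Only the six dimension names come from the abstract.

```python
# Measurement dimensions listed in the abstract.
DIMENSIONS = (
    "fidelity",
    "controllability",
    "observability",
    "containment",
    "reproducibility",
    "governance",
)


def weakest_link(evidence):
    """Return (limiting dimension, level): the claim is bounded by the minimum score."""
    missing = [d for d in DIMENSIONS if d not in evidence]
    if missing:
        # An unassessed dimension cannot support any claim.
        raise ValueError("missing evidence for: " + ", ".join(missing))
    limiting = min(DIMENSIONS, key=lambda d: evidence[d])
    return limiting, evidence[limiting]


# Hypothetical ordinal scores (0 = no evidence, 3 = strong evidence).
scores = {
    "fidelity": 3,
    "controllability": 2,
    "observability": 3,
    "containment": 1,
    "reproducibility": 2,
    "governance": 2,
}
limiting_dimension, claim_level = weakest_link(scores)
print(limiting_dimension, claim_level)
```

Under this reading, strong fidelity cannot compensate for weak containment: the supported claim stays at the level of the lowest-scoring dimension.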