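arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.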

2606.06130 2026-06-05 cs.RO

Towards Realistic 3D Sonar Simulation

Youssef Attia, Davide Costa, Francesco Wanderlingh, Filippo Campagnaro, Enrico Simetti

Affiliations: IEEE

AI summary: This paper proposes a modular architecture that combines GPU-accelerated graphics engines with physically grounded acoustic propagation principles, implements a volumetric 3D sonar model based on the Water Linked 3D-15 sensor in NVIDIA Isaac Sim, and validates it in a hardware-in-the-loop configuration.

Abstract

As underwater robotics research increasingly addresses complex 3D perception and autonomous navigation, the fidelity of sonar simulation has become a key factor in algorithm development. Current simulation frameworks typically rely on geometry-driven rendering, approximating 3D sonar as an underwater equivalent to LiDAR, which fails to account for fundamental acoustic phenomena such as refraction, multi-path interference, and phase-dependent signal formation. This paper proposes a modular architecture for realistic 3D sonar simulation that integrates GPU-accelerated graphics engines with physically grounded acoustic propagation principles. We implement a volumetric 3D sonar model within the NVIDIA Isaac Sim environment, modeled after the Water Linked 3D-15 sensor, and integrate it into a comprehensive underwater simulation framework. The system is validated through a hardware-in-the-loop configuration, where a modified FastLIO2 SLAM pipeline, executed on an NVIDIA Jetson Orin Nano, performs sensor fusion using synthetic 3D sonar, DVL, IMU, and pressure data. Finally, a qualitative comparison between simulated outputs and real-world data from harbor sheet-pile inspections is provided, characterizing the remaining sim-to-real gap and establishing a roadmap toward fully acoustics-driven volumetric sensing.

2606.06123 2026-06-05 cs.LG stat.ML

Adaptive state-action abstractions via rate-distortion

Fernando E. Rosas

Affiliations: Department of Informatics, University of Sussex; Department of Brain Science, Imperial College London; Centre for Eudaimonia and Human Flourishing, University of Oxford

AI summary: Proposes building soft state-action abstractions from rate-distortion principles and using a performance certificate to adjust abstraction granularity dynamically, achieving near-optimal performance while compressing state and action information.

Comments: 28 pages, 2 figures

Abstract

When learning to walk, infants seem to address a coarse version of the problem first - stay upright, reach the caregiver - and refine it only when further practice at that resolution stops paying off. Reinforcement learning offers multiple techniques for building simple versions of complex tasks, but lacks general principles for how to dynamically adjust the granularity of these abstractions during learning. This paper proposes one such principle: refine the abstraction as soon as the learning error within it becomes comparable to the error induced by the abstraction itself. Here, we investigate one way of formalising this principle via a performance certificate that decomposes value error into two terms: a learning error bound captured by a Bellman residual, and an abstraction error bound given by a bisimulation metric. The resulting switching strategy is implemented by soft state-action abstractions built from rate-distortion principles, whose resolution along state and action axes can be continuously adjusted. We validate this construction in a range of tabular settings, showing that near-optimal performance can be achieved under substantial lossy compression of state and action information.
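The switching principle stated in this abstract can be illustrated with a toy schedule. The sketch below is not the paper's performance certificate: the geometric decay of the learning error, the per-level abstraction-error bounds, and the name `refinement_schedule` are all illustrative assumptions.

```python
def refinement_schedule(abstraction_errors, initial_learning_error=1.0,
                        decay=0.5, max_steps=50):
    """Return the step indices at which the abstraction is refined.

    abstraction_errors: hypothetical abstraction-error bounds, one per
    resolution level, ordered from coarse to fine.
    """
    level = 0
    learning_error = initial_learning_error
    switches = []
    for step in range(max_steps):
        # Stand-in for one round of learning at the current resolution.
        learning_error *= decay
        can_refine = level < len(abstraction_errors) - 1
        # Refine once the learning error is no larger than the error
        # induced by the abstraction itself.
        if can_refine and learning_error <= abstraction_errors[level]:
            level += 1
            switches.append(step)
            # Assumption: a finer abstraction restarts the learning error.
            learning_error = initial_learning_error
    return switches


print(refinement_schedule([0.5, 0.1, 0.01]))
```

With a coarse level that is cheap to learn, the first switch happens early; finer levels take longer before further practice stops paying off.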

2606.06120 2026-06-05 cs.CV

Diff-CA: Separating Common and Salient Factors with Diffusion Models

Michaël Soumm, Alexandre Fournier Montgieux, Yunlong He, Pietro Gori, Alasdair Newson

Affiliations: INRIA at Univ. Grenoble Alpes; CEA List, Palaiseau; Télécom Paris, Institut Polytechnique de Paris

AI summary: Proposes a conditioning framework for diffusion models that uses weak supervision to decompose the image conditioning into a common and a salient factor, enabling factor separation for contrastive analysis while preserving high-fidelity generation quality.

Abstract

Contrastive Analysis aims to separate factors that are common between two data distributions from those that are salient to only one of them. Existing contrastive methods are based on generative models (e.g., VAEs or GANs) that often suffer from limited reconstruction and image quality, which hampers effective latent factor separation and limits their applicability to high-fidelity image generation and editing. We propose a novel conditioning framework for diffusion models that enables contrastive decomposition without compromising generation quality. We first train a prompt-free, image-conditioned diffusion model, and then learn to decompose the conditioning into a common and a salient factor, using weak supervision. We prove that the additive contrastive factorization, commonly assumed in prior work, is identifiable under mild conditions. This factorization enables targeted operations by swapping or interpolating only the salient factor.

2605.03413 2026-06-05 cs.LG cs.AI

Learning to Theorize the World from Observation

Doojin Baek, Gyubin Lee, Junyeob Baek, Hosung Lee, Sungjin Ahn

Affiliations: University of Washington

AI summary: Inspired by cognitive science, proposes the Learning-to-Theorize paradigm, in which the Neural Theorizer (NEO) model infers explicit explanatory theories from raw, non-textual observations, enabling explanation-driven generalization.

Abstract

What does it mean to understand the world? Contemporary world models often operationalize understanding as accurate future prediction in latent or observation space. Developmental cognitive science, however, suggests a different view: human understanding emerges through the construction of internal theories of how the world works, even before mature language is acquired. Inspired by this theory-building view of cognition, we introduce Learning-to-Theorize, a learning paradigm for inferring explicit explanatory theories of the world from raw, non-textual observations. We instantiate this paradigm with the Neural Theorizer (NEO), a probabilistic neural model that induces latent programs as a learned Language of Thought and executes them through a shared transition model. In NEO, a theory is represented as an executable, compositional program whose learned primitives can be systematically recombined to explain novel phenomena. Experiments show that this formulation enables explanation-driven generalization, allowing observations to be understood in terms of the programs that generate them.

2606.06109 2026-06-05 cs.CL cs.AI

Harnessing Structural Context for Entity Alignment Foundation Models

Xingyu Chen, Yuanning Cui, Zequn Sun, Wei Hu

Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; Nanjing University of Information Science and Technology, Nanjing, China; National Institute of Healthcare Data Science, Nanjing University, Nanjing, China

AI summary: Proposes the ContextEA framework, which strengthens the construction and exploitation of structural context through a cross-KG interaction encoder and a structural calibration decoder, outperforming strong baselines on 29 datasets and transferring better to unseen KGs.

Abstract

Entity alignment (EA) aims to identify equivalent entities across heterogeneous knowledge graphs (KGs) and is a key component of knowledge fusion and cross-KG reasoning. The recent EA foundation model demonstrates that alignment knowledge, once pretrained, can be directly applied to diverse previously unseen KG pairs. However, it still underuses structural context in two places: cross-KG interaction is weak during encoding, and final candidate ranking still relies too heavily on coarse similarity. We address these limitations with ContextEA, an enhanced encoder-decoder framework for transferable EA. On the encoder side, we introduce a cross-KG interaction encoder that unifies the two KGs with anchor bridges and performs earlier relation-aware cross-graph propagation. On the decoder side, we introduce a structural calibration decoder that calibrates alignment scores with entity-level, neighborhood-level, relation-level, and anchor-aware structural evidence. This design strengthens both structural context construction and structural context exploitation while remaining lightweight. Experiments on 29 EA datasets in OpenEA, SRPRS, and DBP show consistent gains over strong transferable baselines. Notably, the pretrained ContextEA already surpasses the finetuned baselines on all three benchmark groups, demonstrating substantially stronger transfer to unseen KGs. These results suggest that explicitly harnessing structural context is an effective direction for improving EA foundation models.

2606.06104 2026-06-05 cs.LG

A Sliced-Wasserstein Framework on Correlation Matrices for EEG Decoding

Chen Hu, Rui Wang, Jiale Zhou, Jingjun Yi, Shaocheng Jin, Yidong Song, Yefeng Zheng

Affiliations: Westlake University; School of Artificial Intelligence and Computer Science; Jiangnan University; Sun Yat-sen University

AI summary: Proposes a sliced-Wasserstein framework based on pullback Euclidean metrics, instantiates two correlation sliced-Wasserstein discrepancies, and builds a domain generalization method for EEG decoding; experiments on three datasets show improved generalization under distribution shift.

Comments: Accepted by KDD 2026

DOI: 10.1145/3770855.3818864

Abstract

Electroencephalography (EEG) offers noninvasive, millisecond-resolution recordings of neuronal activity and is widely used in neuroscience and healthcare. Many EEG decoding pipelines rely on covariance descriptors for their robustness to noise, but such representations are sensitive to channel-wise scaling. Recent studies have therefore advocated full-rank correlation matrices as a scale-invariant alternative for EEG decoding. In this paper, we propose a general framework for Sliced Wasserstein (SW) discrepancies on manifolds endowed with Pullback Euclidean Metrics (PEMs), termed Pullback Euclidean Metric Sliced Wasserstein (PEMSW). Within this framework, we instantiate two Correlation Sliced-Wasserstein (CorSW) discrepancies on the manifold of full-rank correlation matrices under two recently introduced correlation geometries, i.e., the Off-Log Metric (OLM) and Log-Scaled Metric (LSM). Building on CorSW, we further develop a domain generalization (DG) framework for EEG decoding. Experiments on three EEG datasets demonstrate improved generalization under distribution shifts, with low training overhead and no additional inference cost. The source code is available at https://github.com/ChenHu-ML/CorSW.
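The general idea of a sliced-Wasserstein discrepancy under a pullback Euclidean metric can be sketched as "embed each matrix into a flat vector space, then compare the embedded sets along random 1D projections". The code below is a minimal sketch under stated assumptions, not the authors' CorSW implementation (see their repository for that): `embed_correlation` is a hypothetical embedding that takes the off-diagonal part of the matrix logarithm, and `sliced_wasserstein` is a plain Monte Carlo estimate for two equal-size point sets.

```python
import numpy as np


def embed_correlation(c):
    """Hypothetical pullback map: off-diagonal entries of the matrix log."""
    eigvals, eigvecs = np.linalg.eigh(c)
    log_c = (eigvecs * np.log(eigvals)) @ eigvecs.T
    upper = np.triu_indices(c.shape[0], k=1)
    return log_c[upper]


def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between equal-size point sets.

    x, y: arrays of shape (n_points, dim), already embedded in a flat space.
    """
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # In 1D, optimal transport between equal-size sets matches sorted samples.
    proj_x = np.sort(x @ directions.T, axis=0)
    proj_y = np.sort(y @ directions.T, axis=0)
    return float(np.sqrt(np.mean((proj_x - proj_y) ** 2)))


rng = np.random.default_rng(1)
points = rng.normal(size=(50, 3))
print(sliced_wasserstein(points, points + 1.0))
```

Because the embedding is applied once per matrix and the comparison reduces to sorting, the cost of the discrepancy stays low, which is consistent with the low training overhead the abstract reports.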

2606.06103 2026-06-05 cs.CV

MS-DKC: A Dataset Knowledge Card Framework for Designing and Adapting Medical Image Segmentation Models

Tariq M. Khan, Syed Saud Naqvi, Thantrira Porntaveetus, Hamid Alinejad-Rokny, Shahzaib Iqbal, Imran Razzak, Mohammad AU Khan

Affiliations: Center of Excellence in Precision Medicine and Digital Health, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand; Department of Computer Engineering, COMSATS University Islamabad, Islamabad, Pakistan; School of Biomedical Engineering, UNSW, Sydney, NSW, Australia; Visiting Scholar (Collaborative Projects), Center of Excellence in Precision Medicine and Digital Health, Chulalongkorn University, Bangkok, Thailand; Department of Computing, Abasyn University Islamabad Campus (AUIC), Islamabad, Pakistan; Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia

AI summary: Proposes the MS-DKC framework, which explicitly records dataset characteristics (such as foreground occupancy, morphology, and boundary ambiguity) and maps them to failure modes, design priors, and risk-aligned criteria to guide the design and adaptation of medical image segmentation models; experiments on DRIVE, ISIC2018, and ACDC support dataset-conditioned design.

Abstract

Medical image segmentation is often framed as a search for stronger architectures, but this can obscure a more fundamental question: what does the dataset require from the model? In medical imaging, this requirement is shaped by foreground occupancy, morphology, boundary ambiguity, topology sensitivity, annotation quality, acquisition variation, and operating point. This paper introduces the Medical Segmentation Dataset Knowledge Card (MS-DKC), a framework for making these factors explicit. MS-DKC records dataset evidence through image/acquisition, morphology, supervision, context-dependence, and deployment-risk descriptors. These descriptors are mapped to failure modes, design priors, and risk-aligned criteria, making segmentation design more traceable than architecture-first comparison. We evaluate MS-DKC on DRIVE, ISIC2018, and ACDC, representing distinct regimes. DRIVE contains sparse, thin, branching vessels, favoring detail-preserving models, sensitivity-aware optimization, threshold analysis, and topology-aware metrics. DKC-TNet-v2 achieved Dice 0.8044 and IoU 0.6730 with 35103 parameters, while SA-UNetv2-DKC-AmbRef reached Dice 0.8141, IoU 0.6865, sensitivity 0.8265, specificity 0.9804, and AUC 0.9853. ISIC2018 involves compact but appearance-variable lesions; validation-constrained score-function selection on Att-Next-Topo/ATTNext produced MS-DKC-AttNextTopo-VCSF-NoAug with Dice 0.8872, IoU 0.8214, precision 0.9173, Boundary F1 0.4878, and ASSD 4.13, while plausible additions failed to improve the risk-aligned profile. ACDC provides a multi-class cardiac case, where MS-DKC recommends four-class softmax segmentation, class-balanced Dice/CE supervision, and class-wise surface evaluation. Overall, the results support dataset-conditioned design: different datasets require different priors, operating points, and evidence before a model can be judged appropriate.
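The overlap metrics quoted in this abstract (Dice, IoU, sensitivity, specificity) have standard definitions for binary masks. The sketch below shows those definitions only; the function name `segmentation_metrics` is illustrative, and the paper's exact evaluation protocol (thresholding, per-image versus pooled averaging) is not specified here and is not reproduced.

```python
import numpy as np


def segmentation_metrics(pred, target):
    """Standard overlap metrics for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))    # foreground predicted as foreground
    fp = int(np.sum(pred & ~target))   # background predicted as foreground
    fn = int(np.sum(~pred & target))   # foreground predicted as background
    tn = int(np.sum(~pred & ~target))  # background predicted as background

    def ratio(num, den):
        # Undefined ratios (empty denominators) are reported as NaN.
        return num / den if den > 0 else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


print(segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

For sparse structures such as the vessels in DRIVE, specificity is dominated by the large background, which is one reason the abstract pairs it with sensitivity and overlap scores.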

2606.06102 2026-06-05 cs.AI cs.LG

Step-adaptive multimodal fusion network with multi-scale cloud feature learning for ultra-short-term solar irradiance forecasting

Jingxin Zhang, Xiaoqin Wang

Affiliations: School of Automation, Southeast University

AI summary: Proposes a step-adaptive multimodal fusion network that extracts multi-scale cloud features with InceptionNeXt, modulates low-frequency information with a step-adaptive low-frequency compensation unit, and combines the result with meteorological time-series features for ultra-short-term solar irradiance forecasting.

Abstract

Ultra-short-term solar irradiance prediction is critical for photovoltaic system dispatch and power grid stability. Existing approaches suffer from three key shortcomings: single time-series models cannot capture the spatial dynamics of clouds under complex conditions, standard convolutions inadequately represent multi-scale cloud features, and fixed low-frequency compensation strategies fail to adapt to different prediction steps. To address these issues, this paper proposes a multi-source data fusion model for ultra-short-term irradiance prediction. The model first employs InceptionNeXt to extract multi-scale, multi-directional spatial features from ground-based cloud images. A step-adaptive low-frequency compensation unit is then introduced to dynamically modulate global low-frequency information based on the prediction step. Finally, the enhanced image features are combined with meteorological time-series features, and a TempAttnLSTM network captures global temporal dependencies for multi-step prediction. Experiments on the public NREL dataset and practical photovoltaic stations in Shandong illustrate the effectiveness of the proposed method compared with several state-of-the-art approaches.

2606.06100 2026-06-05 cs.CV

HyperVis: Continuous Latent Visual Relational Graphs on the Lorentz Hyperboloid for Compositional Reasoning

Moshiur Farazi, Sameera Ramasinghe, Mahbub Ahmed Turza, Shafin Rahman

Affiliations: Data Science and AI, University of Doha for Science and Technology, Qatar; Pluralis Research, Australia; Department of Electrical and Computer Engineering, North South University, Bangladesh

AI summary: To address the difficulty vision-language models have with inter-object relationships in compositional reasoning, proposes HyperVis, which computes a dense visual relation tensor, projects it onto a Lorentz hyperboloid, and enforces hierarchy through spatial physics (IoA-driven entailment cones and exterior-angle repulsion). It acts as a training-time regularizer that improves generative VQA and as an inference-time relational encoder that improves discriminative compositional scoring.

Abstract

Vision-Language Models (VLMs) struggle with compositional reasoning that requires understanding inter-object relationships. A natural remedy is to inject explicit scene graph triplets $\langle s, p, o \rangle$ from an off-the-shelf scene graph generator (SGG), but we show this backfires: discrete text labels collide with the continuous visual modality, degrading GQA accuracy from 60.38% to 58.86%. We propose HyperVis, which bypasses the SGG semantic bottleneck entirely. From $N$ class-agnostic region proposals, we compute a dense $O(N^2)$ visual relation tensor via spatially-biased cross-attention, project it onto a Lorentz hyperboloid, and enforce hierarchy through spatial physics, namely IoA-driven entailment cones and exterior-angle repulsion. We discover that HyperVis contributes in two complementary ways: (1) as a training-time regularizer, the hyperbolic relational losses shape LoRA representations that improve generative VQA (GQA 61.03% vs. 57.21% for LoRA fine-tuning without relational losses, recovering and surpassing the baseline); and (2) as an inference-time relational encoder, hyperbolic prefix tokens boost discriminative compositional scoring (SugarCrepe 79.94%, +6.25 pp over baseline). The learned curvature stabilises at $\kappa = 4.0$, an order of magnitude above prior hyperbolic VLMs where $\kappa$ typically collapses toward zero, indicating that continuous visual features genuinely require the exponential volume of strongly curved space. A controlled Euclidean ablation confirms this decomposition: the relational pipeline regularises LoRA comparably in flat space (GQA 60.81%), but the compositionality gain is specifically hyperbolic (SugarCrepe +4.58 pp over Euclidean), with entailment loss $\sim 6\times$ higher in Euclidean training. Code is available at TBA.
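Projecting Euclidean features onto a Lorentz hyperboloid is commonly done with the exponential map at the origin. The sketch below shows that standard construction only, under the assumption that the curvature parameter enters as $\langle x, x \rangle_L = -1/\kappa$ with $\kappa > 0$; the abstract does not state the paper's convention, and the function names are illustrative. The entailment cones and repulsion losses are not modeled.

```python
import numpy as np


def lorentz_inner(x, y):
    """Lorentzian inner product: minus the time term plus the space terms."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)


def exp_map_origin(v, kappa):
    """Lift Euclidean vectors v onto the hyperboloid <x, x>_L = -1/kappa.

    v: array of shape (..., dim), read as tangent vectors at the origin.
    Returns points of shape (..., dim + 1) with the time coordinate first.
    """
    sqrt_k = np.sqrt(kappa)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    norm = np.maximum(norm, 1e-12)  # avoid division by zero at the origin
    time = np.cosh(sqrt_k * norm) / sqrt_k
    space = np.sinh(sqrt_k * norm) * v / (sqrt_k * norm)
    return np.concatenate([time, space], axis=-1)


rng = np.random.default_rng(0)
features = 0.5 * rng.normal(size=(5, 3))
points = exp_map_origin(features, kappa=4.0)
print(lorentz_inner(points, points))  # each entry should be close to -0.25
```

Larger values of the curvature parameter make distances grow faster with the feature norm, which is the "exponential volume" effect the abstract refers to.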

2606.06099 2026-06-05 cs.AI

CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model

Zeyang Yue, Chenfei Yan, Feifei Zhao, Haibo Tong, Mengwen Xu, Xiaozhen Wang, Erliang Lin, Yi Zeng

Affiliations: School of Artificial Intelligence, Beihang University; BrainCog AI Lab, CASIA; Gaoling School of AI, Renmin University of China; Beijing-AISI; Beijing Key Laboratory of Safe AI and Superalignment; School of Artificial Intelligence, UCAS; Huawei Technologies Co., Ltd.

AI summary: Proposes the CogManip benchmark, which evaluates 15 manipulation strategy risks across 1,000 multi-turn interaction scenarios, finds significant risk heterogeneity among frontier models, and highlights the importance of prompt-based defense engineering.

Abstract

Whether Large Language Models (LLMs) exhibit covert psychological manipulation in complex human-AI interactions has garnered increasing safety concerns. However, existing AI safety benchmarks remain largely restricted to explicit rule compliance and static prompts, failing to capture the dynamic and covert nature of manipulative strategies in multi-turn dialogues. We introduce CogManip, a comprehensive benchmark that evaluates 15 manipulation strategy risks across 1,000 multi-turn interaction scenarios, validated by human experts. A systematic evaluation of 13 representative models, including frontier models like GPT-5.4 and DeepSeek-V3.2, reveals significant risk heterogeneities and illuminates the targeted direction for future defense. Further analysis of objective function perturbation reveals that DeepSeek-V3.2's manipulation tactics are highly sensitive to both negative and benign system prompts, demonstrating the critical necessity of prompt-based defense engineering and implicit goal auditing. CogManip offers a robust instrument and perspective for auditing the implicit psychological influence and dynamic strategy selection of modern LLMs.

2606.06098 2026-06-05 cs.CL cs.LG

IR3DE: A Linear Router for Large Language Models

Eros Fanì, Oğuzhan Ersoy

Affiliations: Gensyn

AI summary: Proposes IR3DE, a linear router based on ridge regression that cheaply and quickly selects the most suitable domain-expert LLM for each prompt, outperforms baselines in the reasoning setting, and supports adding or removing expert models dynamically.

Comments: Accepted at the ICML 2026 Workshop on Resource-Adaptive Foundation Model Inference

Abstract

Foundational Large Language Models (LLMs) demonstrate proficiency on a wide range of general tasks, and achieve remarkable results on various specialized tasks via domain-expert LLMs. With the ever-growing list of available LLMs, inference routers are being proposed to select the most appropriate LLM for each prompt. However, existing routing methods either optimize cost across weak-to-strong generalist LLMs or require substantial training to support domain-expertise routing. In this paper, we propose IR3DE, a Ridge Regression-based Router for Domain Experts that provides cheap and fast routing decisions for each prompt. We evaluate IR3DE in two Causal Language Modeling (CLM) settings where the tasks are next-token prediction for all domains, and one reasoning setting where each domain has its own distinct reasoning task. Despite being a linear router, IR3DE achieves performance comparable to the other baselines in both CLM settings, and surpasses them in the reasoning setting, with a normalized performance of 98.4%. Moreover, IR3DE enables the addition or removal of new domain experts without requiring the router to be retrained from scratch, allowing a dynamic set of LLMs to be served with minimal disruption to the router itself. Our code is available at: github.com/gensyn-ai/IR3DE.
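One reason a ridge-regression router can add or remove experts cheaply is that the closed-form solution factors into a shared projector and one independent weight vector per expert. The sketch below illustrates that property only; it is not the IR3DE implementation (see the authors' repository). The class name `RidgeRouter`, the prompt features, and the per-prompt expert scores are illustrative assumptions.

```python
import numpy as np


class RidgeRouter:
    """Toy linear router with one ridge-regression weight vector per expert."""

    def __init__(self, features, lam=1.0):
        x = np.asarray(features, dtype=float)  # (n_prompts, dim), assumed given
        dim = x.shape[1]
        # Shared projector (X^T X + lam I)^{-1} X^T, reused for every expert.
        self._projector = np.linalg.solve(x.T @ x + lam * np.eye(dim), x.T)
        self.weights = {}

    def add_expert(self, name, scores):
        # scores: hypothetical quality of this expert on each training prompt.
        # Fitting touches only this expert's weights.
        self.weights[name] = self._projector @ np.asarray(scores, dtype=float)

    def remove_expert(self, name):
        del self.weights[name]

    def route(self, x):
        x = np.asarray(x, dtype=float)
        return max(self.weights, key=lambda name: float(x @ self.weights[name]))


router = RidgeRouter(np.eye(2), lam=0.1)
router.add_expert("code", [1.0, 0.0])
router.add_expert("math", [0.0, 1.0])
print(router.route([1.0, 0.0]), router.route([0.0, 1.0]))
```

Adding a third expert calls `add_expert` once and leaves the existing weight vectors unchanged, which mirrors the "no retraining from scratch" property described in the abstract.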

2606.06096 2026-06-05 cs.LG cs.AI cs.CL

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

Paavo Parmas, Yongmin Kim, Kohsei Matsutani, Shota Takashiro, Soichiro Nishimori, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo

Affiliations: The University of Tokyo

AI summary: Proposes OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators for order-statistic objectives, implemented as a reward transformation that gives a unified, plug-and-play route to risk-averse, robust, and exploratory learning.

Abstract

Policy-gradient methods usually optimize expected return, but many real world applications care about distributional properties of returns: tail risk, outlier robustness, or best-of-K discovery. We introduce OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators for order-statistic objectives. OrderGrad optimizes finite-sample L-statistics, i.e., weighted averages of sorted rewards or costs, recovering objectives such as VaR, CVaR, trimmed means, medians, and top-m/best-of-K criteria by changing only the rank weights. For any fixed sample size and rank-weight vector, OrderGrad provides an unbiased gradient estimator for the corresponding order-statistic objective. The method is implemented as a simple reward transformation that can then be used in an otherwise standard policy-gradient or reparameterized update. We study the resulting estimator's variance behavior and evaluate it on tasks where mean optimization is mismatched to the deployment objective, including LLM math post-training and other tasks. OrderGrad provides a unified, plug-and-play route to risk-averse, robust, and exploratory learning. Code: https://github.com/paavo5/ordergrad
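A finite-sample L-statistic, as described in this abstract, is a weighted average of sorted rewards, and different objectives correspond to different rank-weight vectors. The sketch below computes only the objective value for a batch of rewards; it does not implement the OrderGrad gradient estimator, and the helper names are illustrative assumptions.

```python
import numpy as np


def l_statistic(rewards, rank_weights):
    """Weighted average of the rewards after sorting them in ascending order."""
    ordered = np.sort(np.asarray(rewards, dtype=float))
    return float(np.dot(rank_weights, ordered))


def lowest_k_weights(n, k):
    """CVaR-style weights: average of the k worst rewards."""
    w = np.zeros(n)
    w[:k] = 1.0 / k
    return w


def median_weights(n):
    """Weights selecting the sample median."""
    w = np.zeros(n)
    if n % 2 == 1:
        w[n // 2] = 1.0
    else:
        w[n // 2 - 1] = w[n // 2] = 0.5
    return w


def best_weights(n):
    """Best-of-n weights: all mass on the largest reward."""
    w = np.zeros(n)
    w[-1] = 1.0
    return w


rewards = [3.0, 1.0, 4.0, 1.0, 5.0]
n = len(rewards)
print(l_statistic(rewards, np.full(n, 1.0 / n)))     # plain mean
print(l_statistic(rewards, lowest_k_weights(n, 2)))  # tail-risk objective
print(l_statistic(rewards, median_weights(n)))       # outlier-robust objective
print(l_statistic(rewards, best_weights(n)))         # best-of-n objective
```

Only the weight vector changes between objectives, which is the sense in which the abstract calls the approach plug-and-play.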

2606.06094 2026-06-05 cs.AI cs.LG math.DS physics.med-ph

Integrating Mechanistic and Data-Driven Models for Neurological Disorders through Differentiable Programming

Shah Pallav Dhanendrakumar, Saikat Pal, Sitikantha Roy

Affiliations: Department of Applied Mechanics, Indian Institute of Technology Delhi; Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi

AI summary: Surveys hybrid modeling strategies that use differentiable programming to combine deep learning with physics-based solvers for diagnosis, prognosis, and treatment planning in neurological disorders, and argues that they outperform purely mechanistic or purely data-driven approaches.

Abstract

Advances in computational modeling, neuroimaging, and artificial intelligence are revolutionizing the modeling of neurological disorders for improved diagnostics, prognosis, and treatment planning. Mechanistic models provide valuable scientific insight into the disorders, but in practice they are often simplified with assumptions or computationally expensive and slow to solve. However, while purely data driven approaches provide speed and scalability, they require large, high quality data to train and generally suffer from interpretability and generalization issues. This perspective paper presents a structured overview of hybrid modeling strategies, which combine deep learning models with physics based solvers, and are categorized into parallel, series, and parallel-series architectures. Three main approaches that have been emphasized are residual modeling for missing or incomplete physics, Neural Ordinary Differential Equations (NODEs) for continuous time dynamics approximation, and solver in the loop that accelerates traditional solvers with neural approximations. These hybrid models integrate the governing differential equation based formulations and deep learning to characterize the evolution of neurological disorders, and promise advanced personalized neurological modeling. In addition, the study explores and proposes different hybrid configurations to improve diagnosis accuracy, predict disease progression, and inform treatment strategies across a range of neurological disorders. These capabilities outperform standalone mechanistic or purely data driven approaches, making hybrid modeling a powerful tool, especially in applications involving modeling the progression and treatment responses in neurological conditions such as brain tumors, Alzheimer's disease, and stroke.

2606.06090 2026-06-05 cs.AI

Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents

Yaoqi Chen, Haibin Lai, Yuru Feng, Chuyu Han, Qianxi Zhang, Baotong Lu, Menghao Li, Xinjiang Wang, Zhirui Wang, Shusen Xu, Zengzhong Li, Zewen Jin, Hao Wu, Cheng Li, Qi Chen

Affiliations: University of Science and Technology of China; Microsoft; Nanjing University; University of California, San Diego

AI summary: Observing that agents in long-horizon tasks depend on execution state rather than semantic similarity, proposes MAGE (Memory as Agent-Guided Exploration), which manages interactions in a hierarchical state tree to preserve state integrity and isolate errors; on MemoryArena it raises task success rate by 7.8 to 20.4 percentage points and cuts token consumption by 55.1%.

Comments: 16 pages

Abstract

LLM-based agents increasingly tackle long-horizon tasks with interdependent decisions, where each action reshapes future constraints and intermediate errors can cascade. Existing RAG and agent memory systems organize histories by semantic similarity, retrieving content-relevant entries at decision time. We argue that this design mismatches execution-state dependencies: it fragments decision trajectories and mixes valid and erroneous traces, hindering coherent state reconstruction and error isolation. We propose MAGE (Memory as Agent-Guided Exploration), an active execution-state manager that stores interactions in a hierarchical state tree. The agent derives its state from the active root-to-current path, combining subgoal summaries, recent traces, and hints from prior branches. Four coupled operations maintain the tree: Grow records new traces, Compress summarizes completed subgoals, Maintain validates summaries, and Revise restores a target boundary and resumes on a new branch. This design bounds context growth while preserving state integrity and isolating flawed segments from the active path. Experiments on MemoryArena show that MAGE improves the average task success rate by 7.8 to 20.4 pp over baselines, while reducing token consumption by 55.1%.
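The hierarchical state tree described in this abstract can be pictured as a small data structure. The sketch below is a hypothetical reading of three of the four operations (Grow, Compress, Revise), not the MAGE implementation: the class and method names, the plain-string summaries, and the way state is assembled are all assumptions, and Maintain (summary validation) and hints from prior branches are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    subgoal: str
    parent: Optional["Node"] = None
    traces: List[str] = field(default_factory=list)
    summary: Optional[str] = None
    abandoned: bool = False
    children: List["Node"] = field(default_factory=list)


class StateTree:
    def __init__(self, goal):
        self.root = Node(goal)
        self.current = self.root

    def _active_path(self):
        path, node = [], self.current
        while node is not None:
            path.append(node)
            node = node.parent
        return list(reversed(path))  # root first

    def open_subgoal(self, subgoal):
        child = Node(subgoal, parent=self.current)
        self.current.children.append(child)
        self.current = child
        return child

    def grow(self, trace):
        """Grow: record a new trace on the active node."""
        self.current.traces.append(trace)

    def compress(self, summary):
        """Compress: replace a finished subgoal's traces with a summary."""
        self.current.summary = summary
        self.current.traces.clear()
        self.current = self.current.parent

    def revise(self, boundary, subgoal):
        """Revise: abandon the path below `boundary` and start a new branch."""
        path = self._active_path()
        if not any(node is boundary for node in path):
            raise ValueError("boundary is not on the active path")
        for node in reversed(path):
            if node is boundary:
                break
            node.abandoned = True
        self.current = boundary
        return self.open_subgoal(subgoal)

    def active_state(self):
        """Assemble context from the root-to-current path only."""
        lines = []
        for node in self._active_path():
            lines.append(f"goal: {node.subgoal}")
            for child in node.children:
                if child.summary is not None and not child.abandoned:
                    lines.append(f"done: {child.summary}")
            lines.extend(f"trace: {t}" for t in node.traces)
        return lines
```

In this reading, traces recorded on an abandoned branch stay in the tree but never reach `active_state`, which is one way to isolate flawed segments from the active path while keeping context bounded.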

2606.06088 2026-06-05 cs.CL

CHALIS: A Challenge Dataset for Language Identification in Difficult Scenarios

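Michal Tichý, Jindřich Libovický

Affiliations: Charles University, Faculty of Mathematics and Physics; Institute of Formal and Applied Linguistics

AI summary: Introduces the CHALIS dataset for hard language-identification scenarios such as closely related languages and orthographic noise. Sentences from mutually intelligible language pairs are collected and orthographic noise is simulated. Four language-identification systems are evaluated and found to perform poorly on low-resource languages and transliterated input.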

Comments 7 pages

2606.06087 2026-06-05 cs.CL cs.AI

LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

Aofan Yu, Chenyu Zhou, Tianyi Xu, Zihan Guo, Rong Shan, Zhihui Fu, Jun Wang, Weiwen Liu, Yong Yu, Weinan Zhang, Jianghao Lin

Affiliations: Shanghai Jiao Tong University; Sun Yat-Sen University; Shanghai Innovation Institute; OPPO Research Institute

AI summary: Proposes the LatentSkill framework, which uses a pretrained hypernetwork to convert textual skills into plug-and-play LoRA adapters. Skill knowledge is stored in weight space rather than context space, which reduces prefill tokens and improves performance.

Comments 16 pages, 4 figures

Abstract

Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills into the prompt at every step incurs substantial context overhead and exposes skill content as plaintext. We present LatentSkill, a framework that converts textual skills into plug-and-play LoRA adapters through a pretrained hypernetwork. LatentSkill stores skill knowledge in weight space rather than context space, removing per-step skill tokens while preserving modular loading, scaling, and composition. On ALFWorld and Search-QA, LatentSkill outperforms the corresponding in-context skill baseline while using substantially fewer prefill tokens: it improves ALFWorld success by 21.4 and 13.4 points on the seen and unseen splits with 64.1% fewer prefill tokens, and improves Search-QA exact match by 3.0 points with 72.2% lower skill-token overhead. Further analysis shows that generated skill LoRAs form a structured semantic geometry, can be precisely controlled via the LoRA scaling coefficient, and can be composed through parameter-space arithmetic when skill components are aligned. These findings suggest that weight-space skills provide an efficient, modular, and less exposed substrate for extending LLM agents.
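The abstract mentions controlling a skill through the LoRA scaling coefficient and composing skills by parameter-space arithmetic. The sketch below shows only the generic LoRA arithmetic behind those two ideas, with tiny hand-written matrices. It does not reproduce the paper's hypernetwork or its alignment condition, and the helper names `matmul`, `lora_delta`, and `add` are illustrative assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_delta(B, A, scale=1.0):
    """Low-rank weight update scale * (B @ A); B is d_out x r, A is r x d_in."""
    return [[scale * v for v in row] for row in matmul(B, A)]


def add(*mats):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


W = [[1, 0], [0, 1]]               # frozen base weight (toy 2 x 2)
B1, A1 = [[1], [0]], [[0, 2]]      # rank-1 adapter standing in for skill 1
B2, A2 = [[0], [1]], [[3, 0]]      # rank-1 adapter standing in for skill 2

# Scaling: the coefficient sets how strongly skill 1 is applied.
half_skill_1 = add(W, lora_delta(B1, A1, scale=0.5))

# Composition: adapters are summed in parameter space.
merged = add(W, lora_delta(B1, A1, scale=0.5), lora_delta(B2, A2, scale=1.0))

# A zero coefficient recovers the base weights.
unloaded = add(W, lora_delta(B1, A1, scale=0.0))
```

Because the update lives in the weights, nothing about the skill has to appear in the prompt, which is the source of the prefill-token savings the abstract reports.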

2606.06081 2026-06-05 cs.AI cs.HC

A Framework for Measuring Appropriate Reliance on Set-Valued AI Advice

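Ranjan Mishra, Jakob Schoeffer

Affiliations: University of California, Berkeley; ETH Zurich

AI summary: Proposes the first formal framework for measuring appropriate reliance on set-valued AI advice in the sequential judge-advisor paradigm. It covers classification and regression tasks and defines new metrics that capture nuances existing measures overlook.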

Abstract

Appropriate reliance on AI advice has become a central research theme in human-AI collaboration. Existing frameworks have focused exclusively on point predictions as AI advice. However, set-valued AI advice (e.g., discrete sets or continuous intervals) is increasingly being used to communicate uncertainty and improve human decision making. In this paper, we develop the first formal framework for measuring appropriate reliance on set-valued AI advice within the sequential judge-advisor paradigm, spanning both classification and regression tasks. For classification, we first introduce the dimensions that are necessary for evaluating set-valued AI advice. We then define two metrics: correct reliance rate on AI and correct reliance rate on self, which jointly characterize appropriate reliance in this setting. For regression, we introduce quantity of AI reliance and quality of AI reliance, which respectively measure whether a decision maker utilized the AI advice and whether their reliance helped them get closer to the ground truth relative to their initial estimate. Through the application of our framework, we demonstrate how these metrics capture important nuances in human-AI collaboration that existing measures overlook.

2606.06080 2026-06-05 cs.LG cs.AI cs.CL

On Advantage Estimates for Max@K Policy Gradients

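Shota Takashiro, Soichiro Nishimori, Paavo Parmas, Yongmin Kim, Kohsei Matsutani, Gouki Minegishi, Yusuke Iwasawa, Takeshi Kojima, Yutaka Matsuo

Affiliations: The University of Tokyo

AI summary: To address the difficulty of post-training reasoning models under sparse rewards, proposes MaxPO, a new advantage estimation method whose Leave-Two-Out baseline yields centered advantages, reduces gradient variance, and improves performance.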

Abstract

Reinforcement learning with verifiable rewards is widely used for post-training reasoning models, but sparse outcome rewards make exploration difficult. A complementary approach is to optimize inference-time objectives such as pass@K and max@K directly, yet existing policy-gradient estimators for these objectives use different signals, baselines, and normalizations, making their relationships unclear. We study this issue through baseline design and advantage centering. Starting from the advantage estimator of a leading method in the field, we show that it is policy-gradient unbiased but yields a non-centered advantage. We then introduce a Leave-Two-Out baseline that preserves policy-gradient unbiasedness while making realized batch advantages exactly centered. The resulting method, MaxPO, has an efficient quadratic-time implementation and integrates naturally into group-based RL for LLM post-training. We further derive the canonical finite-batch advantage for max@K, providing a unified view of existing advantage estimators. Empirically, we verify that the L2O baseline reduces gradient variance and outperforms non-centered alternatives.
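For readers unfamiliar with the objectives, the sketch below estimates max@K (the expected best reward among K sampled completions) from a batch of n rewards, using the standard average over all size-K subsets in closed form. It is not the paper's MaxPO advantage or its Leave-Two-Out baseline, which the abstract does not spell out. The function names are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def max_at_k(rewards, k):
    """Unbiased estimate of E[max of k i.i.d. rewards] from n >= k samples."""
    n = len(rewards)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(rewards)")
    ordered = sorted(rewards)
    # The reward at 0-based sorted position i is the maximum of exactly
    # C(i, k - 1) of the C(n, k) size-k subsets.
    return sum(comb(i, k - 1) * r for i, r in enumerate(ordered)) / comb(n, k)


def brute_force_max_at_k(rewards, k):
    """Same quantity by enumerating every size-k subset (for checking only)."""
    subsets = list(combinations(rewards, k))
    return sum(max(s) for s in subsets) / len(subsets)


def pass_at_k(n, c, k):
    """Standard pass@k estimate with c correct completions out of n."""
    return 1.0 - comb(n - c, k) / comb(n, k)


dense = [0.2, 0.9, 0.4, 0.4, 0.0]
binary = [1, 0, 0, 1, 0, 0]

closed_form = max_at_k(dense, 2)
enumerated = brute_force_max_at_k(dense, 2)
binary_max = max_at_k(binary, 3)
binary_pass = pass_at_k(n=6, c=2, k=3)
```

With binary rewards the two objectives coincide, so `binary_max` and `binary_pass` agree. With graded rewards max@K also credits how good the best sample is.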

2606.06079 2026-06-05 cs.CL

SkillComposer: Learning to Evolve Agent Skills for Specification and Generalization

Qi Zhang, Zhaopeng Feng, Xiaonan Shi, Xiaomeng Hu, Chu Liu, Pengjun Xie, Xiaobin Wang, Jieping Ye, Bryan Hooi, Haobo Wang, Junbo Zhao

Affiliations: Zhejiang University; Tongyi Lab; National University of Singapore

AI summary: Proposes the SkillComposer framework, in which three learnable operations (create, improve, and merge) let language models self-evolve skills at inference time. It supports offline, online, and hybrid deployment modes and improves performance on multiple benchmarks.

Comments Under Review

Abstract

Agent skills, which consist of reusable strategies that guide agent reasoning and action, have shown strong potential for improving model capability at inference time. However, current skill construction methods treat the problem as one-shot extraction, overlooking a fundamental tension: a skill tailored to a specific task fails to transfer, while an abstracted skill often provides insufficient guidance. We attribute this fragility to the absence of explicit mechanisms for skill specification and generalization. To address this gap, we introduce SkillComposer, a framework that decomposes skill construction into three learnable operations: create, improve, and merge. Trained via a systematic rejection sampling recipe, SkillComposer enables language models to self-evolve skills at inference time and supports three deployment modes: offline for building generalized libraries, online for task-specific refinement, and hybrid for combining both. Comprehensive experiments on $τ^2$-Bench, LiveCodeBench v6, and AppWorld show that SkillComposer consistently outperforms baselines. Our SkillComposer-4B improves a 27B executor by up to +4.5 on agent tasks and +3.4 on code tasks, while generalizing across domains and task types unseen during training. Analysis reveals that merge and improve address orthogonal quality dimensions and that skill composition is a transferable meta-ability, providing a practical recipe for skill-augmented inference.

2606.06078 2026-06-05 cs.CV

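ReCache: Learning Budget-Aware Caching Schedules for Diffusion Models via REINFORCE

Mishan Aliev, Eva Neudachina, Ilya Bykov, Aleksandr Oganov, Kirill Struminsky, Aibek Alanov, Denis Rakitin

Affiliations: HSE University; Yandex Research

AI summary: Proposes ReCache, which uses policy gradients to learn the schedule of recomputed denoising steps that maximizes generation quality under a given compute budget. It needs no labelled data and is compatible with multiple caching mechanisms.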

Abstract

Modern diffusion models generate high-quality images and videos, but their iterative denoising process makes inference expensive. Feature caching accelerates sampling by reusing or predicting intermediate activations across neighboring denoising steps, exploiting the redundancy of computations along the reverse trajectory. In this work, we focus on the caching schedule: selecting which denoising steps should be fully recomputed. Existing schedules are either fixed (e.g. uniform) or chosen adaptively from per-step error heuristics; in both cases, the actual compute cost is a side-effect of hand-tuned thresholds rather than a quantity the user can specify. We propose ReCache, which inverts this: given a target budget k, it learns the recomputation schedule that maximizes generation quality, turning compute into a directly controllable input. ReCache trains via policy gradients, sidestepping backpropagation through full diffusion inference, and uses no labelled data. Generations from uncached inference serve as matching targets, paired with a reward for generation quality. ReCache is compatible with any caching mechanism, including feature reuse and feature forecasting; for each mechanism, a single trained policy adapts across computational budgets at inference time. ReCache consistently outperforms scheduling baselines: under a $\times5.04$ FLOPs reduction on FLUX, it reduces LPIPS by 31% (from 0.456 to 0.316) compared to DiCache; on Wan 2.1 at a $\sim \times2.6$ speedup, it drops LPIPS by 65% (from 0.480 to 0.169) and boosts the VBench score by 7% (5.6 points, from 70.4 to 76.0) over uniform HiCache. Code is available at https://github.com/thecrazymage/ReCache.

2606.06058 2026-06-05 cs.LG cs.AI cs.CL

MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following

Mohammad Mahdi Salmani-Zarchi, Zahra Rahimi, Heshaam Faili, Mohammad Javad Dousti

Affiliations: Department of Electrical and Computer Engineering, College of Engineering, University of Tehran; Department of Statistics, Mathematics and Computer Science, Allameh Tabataba’i University

AI summary: Addresses the instability of standard GRPO under discrete, low-dispersion rewards. MDP-GRPO combines multi-temperature sampling, dual-anchor advantages, prospect-theoretic shaping, and asymmetric KL regularization, and raises strict constraint satisfaction by up to 5.0% on FollowBench and other datasets.

Comments Accepted to ACL 2026 Main Conference. 14 pages, 9 figures

Abstract

Reinforcement learning with verifiable rewards is ideal for multi-constraint instruction following, yet standard group-relative policy optimization (GRPO) becomes unstable under discrete, low-dispersion rewards, where within-group reward distributions are frequently homogeneous. We identify and formalize three pathologies of z-score group normalization in this regime: low-variance amplification, mean-centering blindness, and zero-variance collapse. To address them, we propose MDP-GRPO, which stabilizes learning through (1) multi-temperature sampling to increase reward dispersion, (2) dual-anchor advantages to restore gradients in homogeneous groups and stop mean-centering blindness, (3) prospect-theoretic shaping to bound updates and penalize violations based on Kahneman and Tversky's theory, and (4) asymmetric KL regularization. Evaluated on FollowBench, IFEval, and a curated multi-constraint dataset, MDP-GRPO outperforms standard GRPO, improving strict constraint satisfaction by up to 5.0% on Llama-3.2-3B. Our method also enables stable convergence with small group sizes while preserving general capabilities on MMLU and ARC.
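The three pathologies can be seen directly in the usual z-score group normalization, where each advantage is (reward - group mean) / (group std + eps). The sketch below is a minimal illustration of that standard formula only. The reading of each pathology in the comments is an interpretation of the abstract's terms, the paper's formal definitions may differ, and none of the proposed MDP-GRPO fixes are implemented here.

```python
from statistics import mean, pstdev


def zscore_advantages(rewards, eps=1e-6):
    """Standard group-relative advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Zero-variance collapse (assumed reading): every completion in the group
# earns the same reward, so all advantages are zero and the group
# contributes no gradient.
all_pass = zscore_advantages([1.0, 1.0, 1.0, 1.0])

# Low-variance amplification (assumed reading): a 0.01 reward gap is
# rescaled into a large advantage because the group std is tiny.
near_tie = zscore_advantages([0.50, 0.50, 0.50, 0.51])

# Mean-centering blindness (assumed reading): shifting every reward by a
# constant leaves the advantages unchanged, so a uniformly poor group and
# a uniformly good group produce the same update.
low_group = zscore_advantages([0.0, 0.1, 0.0, 0.1])
high_group = zscore_advantages([0.9, 1.0, 0.9, 1.0])
```

Discrete constraint-satisfaction rewards make homogeneous or nearly homogeneous groups common, which is why the abstract's first remedy is to increase reward dispersion through multi-temperature sampling.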

2606.06055 2026-06-05 cs.AI

When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents

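Lingxiang Xu, Jiaoyun Yang, Min Hu, Hongtu Chen, Ning An

Affiliations: Hefei University of Technology; Harvard Medical School

AI summary: Proposes the RBI-Eval framework, which uses a probe set to compare model behavior with and without sensitive memory. It finds that current retrieval-augmented generation systems cannot avoid inappropriate integration of sensitive memory, and that memory-aware decisions are needed at both retrieval and generation time.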

Comments 21 pages, 10 figures

Abstract

Long-term memory enables language model agents to support personalized interactions, but it remains unclear when available memories warrant integration into responses. Existing memory evaluations emphasize retrieval accuracy and downstream task utility, while overlooking whether retrieved sensitive memory content is warranted in the current turn. We introduce RBI-Eval, a controlled measurement study built around a probe set that compares model behavior with and without access to sensitive memory under identical benign prompts. We evaluate four base LLMs against a matched no-memory reference across four memory-access settings: full-context exposure and three retrieval systems. Our results reveal substantial behavioral divergence. With memory available, the separation score for sensitive-memory integration decreases by 8.9% to 26.6% relative to the matched no-memory reference for GPT-5.4-mini, but by 51.1% to 82.9% for Claude-Sonnet-4.6, DeepSeek-V4-Flash, and Qwen3.5-9B. Control experiments on DeepSeek and GPT-5.4-mini show this effect is specific to sensitive content, rather than general personalization. Retrieval systems reduce exposure but do not eliminate integration once sensitive memory reaches the generator. These findings suggest safe personalization requires memory-aware decisions at both retrieval and generation time.

2606.06054 2026-06-05 cs.AI

Beyond Similarity: Trustworthy Memory Search for Personal AI Agents

Jiawen Zhang, Kejia Chen, Jiachen Ma, Yangfan Hu, Lipeng He, Yechao Zhang, Jian Liu, Xiaohu Yang, Tianwei Zhang, Ruoxi Jia

Affiliations: University of Science and Technology of China; Tsinghua University; National University of Singapore; University of California, Berkeley

AI summary: Addresses the trust gap in semantic-similarity-based memory retrieval for personal AI agents. Proposes MemGate, a lightweight memory plug-in that uses a query-conditioned neural gate for trustworthy memory search.

Abstract

Personal AI agents increasingly rely on long-term memory to provide persistent personalization across sessions. However, existing memory pipelines are largely driven by semantic similarity: memory data close to the current query is retrieved and injected into the model context. This creates a critical trustworthiness gap, since a semantically related memory may still be contextually inappropriate, leading to threats such as cross-domain leakage, sycophancy, tool-call drift, or memory-induced jailbreaks. In this paper, we study memory search as a trust boundary in personal AI agents. We evaluate representative agentic memory frameworks, including A-Mem, Mem0, and MemOS, together with OpenClaw, a real-world personal-agent environment with persistent state and tool-use capability. Our results show that long-term memory is not merely a utility layer, but a durable control channel that can reshape how agents interpret tasks and execute actions, leaving them highly susceptible to the aforementioned threats. To mitigate these vulnerabilities, we propose MemGate, a lightweight and deployable memory plug-in for trustworthy memory search, with only 9M parameters and a 35.1MB footprint. MemGate is inserted between the vector memory store and the backbone LLM, requiring no LLM modification, memory-database rewriting, or inference-time LLM judge. It applies a query-conditioned neural gate to candidate memory representations, turning raw similarity search into task-conditioned memory admission. Across multiple mainstream memory frameworks, real-world agent settings, and diverse LLM backbones, MemGate reduces memory-induced threats while preserving long-term memory utility.

2606.06053 2026-06-05 cs.LG

Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification

Haoyang Hong, Zichen Wang, Quanquan Gu, Huazheng Wang

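AI summary: Studies KL-regularized contextual bandits and episodic reinforcement learning with general function approximation under model misspecification. Proposes a KL misspecification formulation, analyzes a regression-based Gibbs policy update algorithm, and gives high-probability KL regret bounds with an explicit misspecification term.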

Comments Accepted by RLC 2026

2606.06049 2026-06-05 cs.RO

L-SDPPO: Policy Optimization of Spiking Diffusion Policy for Intra-vehicular Robotic Manipulation

Liwen Zhang, Dong Zhou, Guanghui Sun, Yifei Zheng, Yuhui Hu, Kaihong Ouyang, Zuoquan Zhao

Affiliations: Department of Control Science and Engineering, Harbin Institute of Technology; Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong

AI summary: Proposes the L-SDPPO framework, which optimizes a spiking diffusion policy with reinforcement learning and adds a state-dependent latency injection mechanism, achieving high success rates and low energy consumption on intra-vehicular robotic manipulation tasks.

Abstract

Intra-vehicular robots in spacecraft help reduce astronaut workload and improve mission efficiency. Recent research focuses on using deep learning methods to achieve the accurate control required for operations in these complex environments. However, objects exhibit unpredictable, unconstrained drift without gravitational damping. These factors demand robustness against complex multimodal action distributions. Diffusion policies (DP) can model these complex actions, but their iterative sampling process consumes too much energy for the limited power budgets of spacecraft. We therefore propose a low-energy intra-vehicular robotic manipulation framework, L-SDPPO, in which the Spiking Diffusion Policy (SDP) is optimized with a reinforcement learning (RL) algorithm. Furthermore, to address the insufficient perception of dynamic spatiotemporal features in microgravity, we propose the state-dependent latency injection (SDLI) mechanism, which mimics biological neural delays to dynamically regulate the timing of input information. Evaluation on five representative intra-vehicular daily tasks (e.g., hatch opening and precision container capping) shows that our method consistently achieves higher success rates and lower energy consumption than state-of-the-art robotic manipulation methods. These results demonstrate that our method is viable for intra-vehicular robotic manipulation.
