arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.16666 2026-06-03 cs.AI cs.CY cs.LG

Towards a Science of AI Agent Reliability

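Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan

Affiliations University of California, Berkeley

AI summary Proposes twelve concrete metrics that decompose AI agent reliability along four dimensions (consistency, robustness, predictability, and safety) and shows experimentally that capability gains have yielded only small improvements in reliability.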

Comments Accepted at ICML 2026. Interactive dashboard available at: https://hal.cs.princeton.edu/reliability

Abstract

AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents continue to fail in practice. This discrepancy highlights a fundamental limitation of current evaluations: compressing agent behavior into a single success metric obscures critical operational flaws. Notably, it ignores whether agents behave consistently across runs, withstand perturbations, fail predictably, or have bounded error severity. Grounded in safety-critical engineering, we provide a holistic performance profile by proposing twelve concrete metrics that decompose agent reliability along four key dimensions: consistency, robustness, predictability, and safety. Evaluating 15 models across two complementary benchmarks, we find that recent capability gains have only yielded small improvements in reliability. By exposing these persistent limitations, our metrics complement traditional evaluations while offering tools for reasoning about how agents perform, degrade, and fail.
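The gap between a single success metric and consistency across runs can be shown with a small Python sketch. It is not one of the paper's twelve metrics; the function names and the outcome table are illustrative assumptions.

```python
# Illustrative only: contrasts mean accuracy with an "all runs succeed" rate.
# The outcome table below is made up for the example.

def mean_accuracy(outcomes):
    """Average success rate over every (task, run) pair."""
    flat = [ok for runs in outcomes.values() for ok in runs]
    return sum(flat) / len(flat)

def all_runs_success_rate(outcomes):
    """Fraction of tasks that were solved in every repeated run."""
    return sum(all(runs) for runs in outcomes.values()) / len(outcomes)

outcomes = {
    "task_a": [True, True, True, True],
    "task_b": [True, False, True, True],
    "task_c": [True, True, False, True],
    "task_d": [False, True, True, True],
}
print(mean_accuracy(outcomes))          # 0.8125
print(all_runs_success_rate(outcomes))  # 0.25
```

In this toy table the agent looks about 81% accurate, yet only one task in four is solved reliably on every run, which is the kind of flaw a single aggregate score hides.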

2602.18084 2026-06-03 cs.LG

Balancing Symmetry and Efficiency in Graph Flow Matching

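Benjamin Honoré, Alba Carballo-Castro, Yiming Qin, Pascal Frossard

Affiliations LTS4, EPFL, Lausanne, Switzerland

AI summary Uses a controllable symmetry modulation scheme to study the trade-off between the cost of strict equivariance and convergence speed in graph generative models, finding that properly modulating symmetry accelerates convergence while delaying overfitting.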

Comments 15 pages, 11 figures

Abstract

Equivariance is central to graph generative models, as it ensures the model respects the permutation symmetry of graphs. However, strict equivariance can increase computational cost due to added architectural constraints, and can slow down convergence because the model must be consistent across a large space of possible node permutations. We study this trade-off for graph generative models. Specifically, we start from an equivariant discrete flow-matching model, and relax its equivariance during training via a controllable symmetry modulation scheme based on sinusoidal positional encodings and node permutations. Experiments first show that symmetry-breaking can accelerate early training by providing an easier learning signal, but at the expense of encouraging shortcut solutions that can cause overfitting, where the model repeatedly generates graphs that are duplicates of the training set. In contrast, properly modulating the symmetry signal can delay overfitting while accelerating convergence, allowing the model to reach stronger performance with 19% of the baseline training epochs.
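One way to picture a symmetry modulation knob built from sinusoidal positional encodings and node permutations is the Python sketch below. The `strength` parameter, the function names, and the way the encoding is added to node features are assumptions made for illustration; the paper's actual scheme may differ.

```python
import math
import random

def sinusoidal_encoding(position, dim):
    """Standard sinusoidal positional encoding for one integer position."""
    enc = []
    for i in range(dim):
        freq = 1.0 / (10000 ** (2 * (i // 2) / dim))
        angle = position * freq
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def modulated_node_inputs(node_features, strength, rng):
    """Add strength-scaled encodings under a random node ordering.

    strength = 0.0 leaves the inputs untouched (no ordering information);
    larger values inject more ordering information (hypothetical knob).
    """
    n = len(node_features)
    order = list(range(n))
    rng.shuffle(order)  # random node permutation
    dim = len(node_features[0])
    return [
        [x + strength * p for x, p in zip(feat, sinusoidal_encoding(order[i], dim))]
        for i, feat in enumerate(node_features)
    ]

rng = random.Random(0)
features = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
symmetric = modulated_node_inputs(features, 0.0, rng)
broken = modulated_node_inputs(features, 1.0, rng)
```

With `strength` at zero the nodes stay indistinguishable; raising it gives each node an index-dependent signature, which is the kind of easier but shortcut-prone signal the abstract describes.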

2502.08834 2026-06-03 cs.LG cs.AI stat.ML

Rex: A Family of Reversible Exponential (Stochastic) Runge-Kutta Solvers

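Zander W. Blasingame, Chen Liu

Affiliations University of Washington

AI summary Proposes the Rex family of solvers, which applies Lawson methods to convert explicit (stochastic) Runge-Kutta schemes into algebraically reversible ones for diffusion ODEs and SDEs, achieving near-machine-precision reconstruction and improving the performance of flow and diffusion models.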

Comments Accepted as an Oral presentation at ICML 2026

Abstract

Deep generative models based on neural differential equations have become state-of-the-art for many generation tasks. These models rely on ODE/SDE solvers that integrate from a prior distribution to the data distribution; in many applications it is also highly desirable to integrate in the inverse direction. Standard solvers, however, accumulate discretization errors that prohibit exact inversion, an inaccuracy that is unacceptable in precision-critical applications. Existing inversion methods suffer from poor stability and low order of convergence, and are strictly limited to the ODE setting. In this work, we propose Rex, a family of reversible exponential (stochastic) Runge-Kutta solvers obtained by applying Lawson methods to convert any explicit (stochastic) Runge-Kutta scheme into an algebraically reversible one for both diffusion ODEs and SDEs. Beyond a rigorous theoretical analysis -- establishing arbitrary-order convergence and a non-zero region of linear stability -- we empirically demonstrate that Rex achieves near-machine-precision reconstruction and improves Boltzmann sampling with flow models as well as image generation and editing with diffusion models.
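To illustrate what "algebraically reversible" means, the Python sketch below uses a generic coupled two-state construction in the spirit of reversible solvers from the literature. It is not Rex itself: the Euler increment, the coupling parameter `lam`, and the toy vector field are all assumptions for the demo.

```python
import math

def f(t, y):
    """Toy vector field dy/dt = -y (an assumption for the demo)."""
    return -y

def forward_step(n, y, z, h, lam):
    """Advance the coupled state (y, z) from step n to step n + 1."""
    y_next = lam * y + (1.0 - lam) * z + h * f(n * h, z)
    z_next = z + h * f((n + 1) * h, y_next)
    return y_next, z_next

def backward_step(n, y_next, z_next, h, lam):
    """Recover the state at step n by undoing forward_step algebraically."""
    z = z_next - h * f((n + 1) * h, y_next)
    y = (y_next - (1.0 - lam) * z - h * f(n * h, z)) / lam
    return y, z

h, lam, steps = 0.02, 0.9, 50
y, z = 1.0, 1.0
for n in range(steps):
    y, z = forward_step(n, y, z, h, lam)
y_end = y  # approximates exp(-1) at t = 1

for n in range(steps - 1, -1, -1):
    y, z = backward_step(n, y, z, h, lam)
reconstruction_error = max(abs(y - 1.0), abs(z - 1.0))
```

Because the backward step inverts the forward update in closed form, the initial state is recovered up to floating-point rounding rather than up to discretization error.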

2602.17149 2026-06-03 cs.LG cs.AI

TimeOmni-VL: Unified Models for Time Series Understanding and Generation

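Tong Guan, Sheng Pan, Johan Barthelemy, Zhao Li, Yujun Cai, Cesare Alippi, Ming Jin, Shirui Pan

Affiliations Tsinghua University

AI summary Proposes the TimeOmni-VL framework, which unifies time series understanding and generation through fidelity-preserving bidirectional mapping and understanding-guided generation.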

Comments Accepted by the Forty-third International Conference on Machine Learning (ICML 2026)

Abstract

Recent time series modeling faces a sharp divide between numerical generation and semantic understanding, with research showing that generation models often rely on superficial pattern matching, while understanding-oriented models struggle with high-fidelity numerical output. Although unified multimodal models (UMMs) have bridged this gap in vision, their potential for time series remains untapped. We propose TimeOmni-VL, the first vision-centric framework that unifies time series understanding and generation through two key innovations: (1) Fidelity-preserving bidirectional mapping between time series and images (Bi-TSI), which advances Time Series-to-Image (TS2I) and Image-to-Time Series (I2TS) conversions to ensure near-lossless transformations. (2) Understanding-guided generation. We introduce TSUMM-Suite, a novel dataset consisting of six understanding tasks rooted in time series analytics and coupled with two generation tasks. With a calibrated Chain-of-Thought, TimeOmni-VL is the first to leverage time series understanding as an explicit control signal for high-fidelity generation. Experiments confirm that this unified approach significantly improves semantic understanding and numerical precision, establishing a new frontier for multimodal time series modeling.

2602.17063 2026-06-03 cs.LG cs.AI cs.CL cs.CV

Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression

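Akira Sakai, Yuma Ichikawa

Affiliations Fujitsu Limited; Tokai University; RIKEN Center for AIP

AI summary Studies the sign-bit bottleneck in sub-bit model compression, explains the origin of weight-sign randomness through sign lock-in theory, and proposes a from-scratch low-rank sign-template training method to break through this bottleneck.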

Comments Accepted at the Forty-Third International Conference on Machine Learning (ICML 2026)

Abstract

Sub-bit model compression targets storage below one bit per weight; as magnitudes are aggressively compressed, the sign bit becomes a fixed-cost bottleneck. Across Transformers, CNNs, and MLPs, learned sign matrices resist low-rank approximation and are spectrally indistinguishable from an i.i.d. Rademacher baseline. This randomness gives rise to the lower bound of sub-bit model compression -- the one-bit wall. Despite this apparent randomness, most weights retain their initialization signs; flips primarily occur via rare near-zero boundary crossings, suggesting that sign-pattern randomness is largely inherited from initialization. We formalize this behavior with sign lock-in theory, a stopping-time analysis of sign flips under SGD noise. Under bounded updates and a rare re-entry condition into a small neighborhood of zero, the number of effective sign flips exhibits a geometric tail. Building on this mechanism, we introduce a from-scratch low-rank sign-template training method that prevents the emergence of this one-bit wall.
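The claim that most weights keep their initialization sign under bounded updates can be illustrated with a toy simulation. The Python sketch below replaces real SGD with bounded zero-mean noise, and the sizes and step counts are arbitrary assumptions, so it shows the mechanism only, not the paper's experiments.

```python
import random

def sign_retention(num_weights, steps, step_size, seed):
    """Fraction of weights whose final sign matches the initial sign."""
    rng = random.Random(seed)
    init = [rng.gauss(0.0, 1.0) for _ in range(num_weights)]
    weights = list(init)
    for _ in range(steps):
        # Bounded zero-mean noise stands in for SGD updates (toy assumption).
        weights = [w + rng.uniform(-step_size, step_size) for w in weights]
    kept = sum((w > 0) == (w0 > 0) for w, w0 in zip(weights, init))
    return kept / num_weights

retention = sign_retention(num_weights=2000, steps=100, step_size=0.01, seed=0)
print(retention)
```

Only weights that start very close to zero can cross the boundary, so the retained fraction stays near one in this setup, and the final sign pattern looks as random as the initialization that produced it.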

2602.14279 2026-06-03 cs.LG cs.AI cs.CL cs.SI

Whom to Query for What: Adaptive Group Elicitation via Multi-Turn LLM Interactions

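Ruomeng Ding, Tianwei Gao, Thomas P. Zollo, Eitan Bachmat, Richard Zemel, Zhun Deng

Affiliations University of North Carolina at Chapel Hill; Columbia University; Ben-Gurion University of the Negev

AI summary Addresses the reduction of uncertainty about group-level properties under a limited budget by proposing an adaptive group elicitation framework that combines an LLM-based expected information gain objective with heterogeneous graph neural network propagation, jointly selecting questions and respondents and improving population-level response prediction on three real-world datasets.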

Comments Published as a conference paper at ICML 2026

Abstract

Eliciting information to reduce uncertainty about latent group-level properties from surveys and other collective assessments requires allocating limited questioning effort under real costs and missing data. Although large language models enable adaptive, multi-turn interactions in natural language, most existing elicitation methods optimize what to ask with a fixed respondent pool, and do not adapt respondent selection or leverage population structure when responses are partial or incomplete. To address this gap, we study adaptive group elicitation, a multi-round setting where an agent adaptively selects both questions and respondents under explicit query and participation budgets. We propose a theoretically grounded framework that combines (i) an LLM-based expected information gain objective for scoring candidate questions with (ii) heterogeneous graph neural network propagation that aggregates observed responses and participant attributes to impute missing responses and guide per-round respondent selection. This closed-loop procedure queries a small, informative subset of individuals while inferring population-level responses via structured similarity. Across three real-world opinion datasets, our method consistently improves population-level response prediction under constrained budgets, including a >12% relative gain on CES at a 10% respondent budget.
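Scoring candidate questions by expected information gain can be sketched for the simplest case, a yes/no question and a discrete set of hypotheses. In the Python sketch below the likelihood tables are written by hand, whereas the paper derives its objective from an LLM; the hypothesis and question names are illustrative assumptions.

```python
import math

def entropy(p_yes):
    """Binary entropy in bits."""
    if p_yes in (0.0, 1.0):
        return 0.0
    return -(p_yes * math.log2(p_yes) + (1 - p_yes) * math.log2(1 - p_yes))

def expected_information_gain(prior, p_yes_given_h):
    """Mutual information between a yes/no answer and the hypothesis."""
    p_yes = sum(prior[h] * p_yes_given_h[h] for h in prior)
    conditional = sum(prior[h] * entropy(p_yes_given_h[h]) for h in prior)
    return entropy(p_yes) - conditional

prior = {"group_leans_a": 0.5, "group_leans_b": 0.5}
questions = {
    "discriminating": {"group_leans_a": 1.0, "group_leans_b": 0.0},
    "uninformative": {"group_leans_a": 0.5, "group_leans_b": 0.5},
}
scores = {q: expected_information_gain(prior, lik) for q, lik in questions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

A question whose answer separates the hypotheses scores a full bit, while one answered the same way under every hypothesis scores zero, so a limited query budget is spent on the former.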

2602.11908 2026-06-03 cs.AI cs.CL cs.LG

When Should LLMs Be Less Specific? Selective Abstraction for Reliable Long-Form Text Generation

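Shani Goren, Ido Galil, Ran El-Yaniv

Affiliations Technion; NVIDIA

AI summary Addresses the loss of valuable information when LLMs abstain at low confidence in long-form generation by proposing the Selective Abstraction framework, which replaces uncertain content with atom-wise abstractions, improving accuracy and reliability while preserving most of the original meaning.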

Abstract

LLMs are widely used, yet they remain prone to factual errors that erode user trust and limit adoption in high-risk settings. One approach to mitigate this risk is to equip models with uncertainty estimation mechanisms that abstain when confidence is low. However, this binary "all-or-nothing" approach is excessively restrictive in long-form settings, often discarding valuable information. We introduce Selective Abstraction (SA), a framework that enables LLMs to trade specificity for reliability by selectively reducing the detail of uncertain content. We first formalize SA through the lenses of selective risk and coverage. We then propose Atom-wise Selective Abstraction, a claim-level instantiation that decomposes responses into atomic claims (short, self-contained statements each expressing a single fact) and replaces uncertain atoms with higher-confidence, less specific abstractions. To evaluate this framework, we develop a novel end-to-end pipeline for open-ended generation that instantiates risk as factual correctness and measures coverage using an information-theoretic measure of retained information. Across six open-source models on the FactScore and LongFact-Objects benchmarks, atom-wise SA consistently outperforms existing baselines, improving the area under the risk-coverage curve (AURC) by up to 27.73% over claim removal, demonstrating that reducing specificity can boost accuracy and reliability while preserving most of the original meaning.
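The selective risk and coverage view can be sketched for the plain claim-removal baseline, where low-confidence claims are simply dropped. In the Python sketch below, coverage is the fraction of claims kept, which is a simplification: the paper measures coverage with an information-theoretic quantity, and the confidence values here are made up.

```python
def risk_coverage_points(claims):
    """Selective risk and coverage as low-confidence claims are dropped.

    claims: list of (confidence, is_correct) pairs.
    Returns (coverage, risk) pairs, one per number of retained claims.
    """
    ranked = sorted(claims, key=lambda c: c[0], reverse=True)
    points = []
    errors = 0
    for kept, (_, is_correct) in enumerate(ranked, start=1):
        errors += 0 if is_correct else 1
        points.append((kept / len(ranked), errors / kept))
    return points

claims = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
points = risk_coverage_points(claims)
for coverage, risk in points:
    print(f"coverage={coverage:.2f} risk={risk:.2f}")
```

Removal forces a choice between keeping an uncertain claim (raising risk) and dropping it (lowering coverage); replacing the claim with a less specific but more confident statement is the third option the abstract proposes.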

2602.11804 2026-06-03 cs.CV eess.IV

Efficient Segment Anything with Depth-Aware Fusion and Limited Training Data

Yiming Zhou, Xuenjie Xie, Panfeng Li, Albrecht Kunz, Ahmad Osman, Xavier Maldague

Affiliations University of Cambridge

AI summary Proposes a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors and, trained on only 11.2k samples (less than 0.1% of SA-1B), achieves higher segmentation accuracy than EfficientViT-SAM.

Journal ref ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1731-1735

Abstract

Segment Anything Models (SAM) achieve impressive universal segmentation performance but require massive datasets (e.g., 11M images) and rely solely on RGB inputs. Recent efficient variants reduce computation but still depend on large-scale training. We propose a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors. Depth maps are generated with a pretrained estimator and fused mid-level with RGB features through a dedicated depth encoder. Trained on only 11.2k samples (less than 0.1% of SA-1B), our method achieves higher accuracy than EfficientViT-SAM, showing that depth cues provide strong geometric priors for segmentation.

2602.10352 2026-06-03 cs.CL cs.AI cs.LG

Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs

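Keenan Pepper, Alex McKenzie, Florin Pop, Stijn Servaes, Martin Leitgab, Mike Vaiana, Judd Rosenblatt, Michael S. A. Graziano, Diogo de Lucena

Affiliations University of Washington

AI summary Trains lightweight adapters (a scalar affine adapter needing only d_model+1 parameters) on interpretability artifacts while keeping the language model entirely frozen, yielding reliable self-interpretation across tasks and model families and outperforming untrained baselines on sparse autoencoder feature labeling, topic identification, and decoding of bridge entities in multi-hop reasoning.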

Comments 26 pages, 18 tables, 17 figures. Code and data at https://github.com/agencyenterprise/selfie-adapters

Abstract

Self-interpretation methods prompt language models to describe their own internal states, but remain unreliable due to hyperparameter sensitivity. We show that training lightweight adapters on interpretability artifacts, while keeping the LM entirely frozen, yields reliable self-interpretation across tasks and model families. A scalar affine adapter with just $d_\text{model}+1$ parameters suffices: trained adapters generate sparse autoencoder feature labels that outperform the training labels themselves (70% vs 50% generation scoring at 70B scale), identify topics with 94% recall@1 versus 1% for untrained baselines, and decode bridge entities in multi-hop reasoning that appear in neither prompt nor response, surfacing implicit reasoning without chain-of-thought. The learned bias vector alone accounts for 85% of improvement, and simpler adapters generalize better than more expressive alternatives. Controlling for model knowledge via prompted descriptions, we find self-interpretation gains outpace capability gains from 7B to 72B parameters. Our results demonstrate that self-interpretation improves with scale, without modifying the model being interpreted.
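The parameter count $d_\text{model}+1$ is consistent with one shared scalar scale plus a bias vector of length $d_\text{model}$. The Python sketch below follows that reading, which is an assumption about the adapter's form; the values are arbitrary.

```python
def scalar_affine_adapter(vector, scale, bias):
    """Apply scale * vector + bias with one shared scalar scale."""
    return [scale * v + b for v, b in zip(vector, bias)]

def parameter_count(d_model):
    """One scalar scale plus a bias vector of length d_model."""
    return 1 + d_model

d_model = 4
activation = [1.0, -2.0, 0.5, 0.0]
bias = [0.1, 0.1, 0.1, 0.1]
adapted = scalar_affine_adapter(activation, scale=2.0, bias=bias)
print(adapted, parameter_count(d_model))
```

Under this reading, the abstract's finding that the bias vector alone accounts for 85% of the improvement means most of the gain comes from a fixed offset rather than from rescaling the input vector.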

2602.05302 2026-06-03 cs.AI

PieArena: Ranking and Profiling Language Agents in Realistic Negotiation Scenarios

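Chris Zhu, Sasha Cui, Will Sanok Dufallo, Runzhi Jin, Zhen Xu, Linjun Zhang, Daylian Cain

Affiliations Yale University; UC Berkeley; Bloomberg; Rutgers University

AI summary Introduces the PieArena benchmark, which evaluates LLM negotiation ability through multi-agent interactions, and develops a ranking model and behavioral profiles; joint-intentionality scaffolding yields large gains for mid- and lower-tier models, and a frontier model (GPT-5) matches or exceeds the human baseline.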

Abstract

We present an in-depth evaluation of LLMs' ability to negotiate, a central business task requiring strategic reasoning, theory of mind, and economic value creation. To do so, we introduce PieArena, a large-scale negotiation benchmark grounded in multi-agent interactions over realistic scenarios adapted from MBA negotiation courses at an elite business school. We evaluate language agents across three pairing regimes: mirror-play, cross-play, and human-LM play. We develop a ranking model for continuous negotiation payoffs that yields order-invariant, uncertainty-quantified leaderboards while correcting for systematic experimental asymmetries. We further study the effects of joint-intentionality agentic scaffolding and find asymmetric gains, with large improvements for mid- and lower-tier LMs and diminishing returns for frontier LMs. As calibration anchors, we collect human-human and human-LM negotiation data from trained business school students, finding that a representative frontier language agent (GPT-5) matches or exceeds this human baseline in our evaluation settings. Beyond deal outcomes, PieArena provides a multi-dimensional behavioral profile that reveals cross-model heterogeneity in instruction compliance, computation accuracy, as well as judge-assessed deception and reputation, illustrating the value of evaluation beyond outcome-only leaderboards.

2602.09708 2026-06-03 cs.LG cs.AI cs.CV cs.NA math.NA

Physics-informed diffusion models in spectral space

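Davide Gallon, Philippe von Wurstemberger, Patrick Cheridito, Arnulf Jentzen

Affiliations ETH Zürich

AI summary Proposes physics-informed spectral diffusion (PISD), which combines generative latent diffusion models with physics-informed machine learning, models PDE parameters and solutions by diffusion in a latent space of spectral representations, and enforces physics constraints and measurement conditions via diffusion posterior sampling, showing higher accuracy and computational efficiency than existing diffusion-based solvers on the Poisson, Helmholtz, and incompressible Navier-Stokes equations.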

Comments 18 pages, 10 figures

Abstract

We propose physics-informed spectral diffusion (PISD), a methodology that combines generative latent diffusion models with physics-informed machine learning to generate solutions of partial differential equations (PDEs) conditioned on partial observations, which includes, in particular, forward and inverse PDE problems. We learn the joint distribution of PDE parameters and solutions via a diffusion process in a latent space of scaled spectral representations, where Gaussian noise corresponds to functions with controlled regularity. This spectral formulation enables significant dimensionality reduction compared to grid-based diffusion models and ensures that the induced process in function space remains within a class of functions for which the PDE operators are well defined. Building on diffusion posterior sampling, we enforce physics-informed constraints and measurement conditions during inference, applying Adam-based updates at each diffusion step. We evaluate the proposed approach on Poisson, Helmholtz, and incompressible Navier-Stokes equations, demonstrating improved accuracy and computational efficiency compared with existing diffusion-based PDE solvers, which are state of the art for sparse observations. Code is available at https://github.com/deeplearningmethods/PISD.
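The idea that Gaussian noise in a scaled spectral representation corresponds to functions with controlled regularity can be illustrated with a sine series whose coefficients decay with the mode index. In the Python sketch below, the power-law scaling `k**(-decay)`, the sine basis, and all names are assumptions; the paper's scaling and basis may differ.

```python
import math
import random

def sample_coefficients(num_modes, decay, rng):
    """Gaussian noise scaled by k**(-decay); decay is a hypothetical knob."""
    return [rng.gauss(0.0, 1.0) / (k ** decay) for k in range(1, num_modes + 1)]

def evaluate(coefficients, x):
    """Evaluate the sine series sum_k c_k * sin(k * pi * x) on [0, 1]."""
    return sum(c * math.sin(k * math.pi * x)
               for k, c in enumerate(coefficients, start=1))

rng = random.Random(0)
coeffs = sample_coefficients(num_modes=16, decay=2.0, rng=rng)
samples = [evaluate(coeffs, i / 10) for i in range(11)]
print(samples)
```

Sixteen coefficients describe the whole function here, which hints at the dimensionality reduction relative to a grid, and a larger `decay` damps high-frequency modes so the sampled function is smoother.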

2602.08335 2026-06-03 cs.AI

Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System

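Yanming Li, Xuelin Zhang, WenJie Lu, Ziye Tang, Maodong Wu, Haotian Luo, Tongtong Wu, Zijie Peng, Hongze Mi, Yibo Feng, Naiqiang Tan, Chao Huang, Lian Peng, Li Shen

Affiliations University of Science and Technology of China

AI summary Addresses the credit assignment problem in multi-agent systems by proposing the SHARP framework, whose decomposed reward mechanism (a global broadcast reward, a Shapley-based marginal-credit reward, and a tool-process reward) enables precise credit attribution and significantly improves reinforcement learning performance.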

Abstract

Integrating Large Language Models (LLMs) with external tools via multi-agent systems offers a promising new paradigm for decomposing and solving complex problems. However, training these systems remains notoriously difficult due to the credit assignment challenge, as it is often unclear which specific functional agent is responsible for the success or failure of decision trajectories. Existing methods typically rely on sparse or globally broadcast rewards, failing to capture individual contributions and leading to inefficient reinforcement learning. To address these limitations, we introduce the Shapley-based Hierarchical Attribution for Reinforcement Policy (SHARP), a novel framework for optimizing multi-agent reinforcement learning via precise credit attribution. SHARP effectively stabilizes training by normalizing agent-specific advantages across trajectory groups, primarily through a decomposed reward mechanism comprising a global broadcast-accuracy reward, a Shapley-based marginal-credit reward for each agent, and a tool-process reward to improve execution efficiency. Extensive experiments across various real-world benchmarks demonstrate that SHARP significantly outperforms recent state-of-the-art baselines, achieving average match improvements of 23.66% and 14.05% over single-agent and multi-agent approaches, respectively.
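Shapley-based marginal credit can be illustrated by computing exact Shapley values for a tiny two-agent system. The Python sketch below uses the textbook definition (average marginal gain over all agent orderings); the agent names and the reward table are hypothetical, and SHARP's own estimator and reward decomposition are not reproduced here.

```python
from itertools import permutations

def shapley_values(agents, value):
    """Exact Shapley values by averaging marginal gains over all orderings."""
    totals = {a: 0.0 for a in agents}
    orderings = list(permutations(agents))
    for order in orderings:
        coalition = frozenset()
        for agent in order:
            with_agent = coalition | {agent}
            totals[agent] += value(with_agent) - value(coalition)
            coalition = with_agent
    return {a: t / len(orderings) for a, t in totals.items()}

# Hypothetical task reward for each subset of agents.
REWARD = {
    frozenset(): 0.0,
    frozenset({"planner"}): 0.2,
    frozenset({"tool_user"}): 0.0,
    frozenset({"planner", "tool_user"}): 1.0,
}
credits = shapley_values(["planner", "tool_user"], REWARD.__getitem__)
print(credits)
```

A globally broadcast reward would give both agents 1.0, whereas the Shapley split assigns roughly 0.6 and 0.4, reflecting that the planner contributes some value even on its own. Exact enumeration grows factorially with the number of agents, so it is practical only for small systems.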

2602.06960 2026-06-03 cs.CL cs.AI

InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

Yuchen Yan, Liang Jiang, Jin Jiang, Shuaicheng Li, Zujie Wen, Zhiqiang Zhang, Jun Zhou, Jian Shao, Yueting Zhuang, Yongliang Shen

Affiliations Tsinghua University

AI summary Proposes the InftyThink+ framework, which uses reinforcement learning to optimize when to summarize, what to preserve, and how to resume in iterative reasoning, improving AIME24 accuracy by 21% on DeepSeek-R1-Distill-Qwen-1.5B and reducing inference latency.

Comments ICML 2026: https://openreview.net/forum?id=tyul8kXaJU Project Page: https://zju-real.github.io/InftyThink-Plus Code: https://github.com/ZJU-REAL/InftyThink-Plus Models: https://huggingface.co/collections/yanyc/inftythink

Abstract

Large reasoning models achieve strong performance by scaling inference-time chain-of-thought, but this paradigm suffers from quadratic cost, context length limits, and degraded reasoning due to lost-in-the-middle effects. Iterative reasoning mitigates these issues by periodically summarizing intermediate thoughts, yet existing methods rely on supervised learning or fixed heuristics and fail to optimize when to summarize, what to preserve, and how to resume reasoning. We propose InftyThink+, an end-to-end reinforcement learning framework that optimizes the entire iterative reasoning trajectory, building on model-controlled iteration boundaries and explicit summarization. InftyThink+ adopts a two-stage training scheme with supervised cold-start followed by trajectory-level reinforcement learning, enabling the model to learn strategic summarization and continuation decisions. Experiments on DeepSeek-R1-Distill-Qwen-1.5B show that InftyThink+ improves accuracy by 21% on AIME24 and outperforms conventional long chain-of-thought reinforcement learning by a clear margin, while also generalizing better to out-of-distribution benchmarks. Moreover, InftyThink+ significantly reduces inference latency and accelerates reinforcement learning training, demonstrating improved reasoning efficiency alongside stronger performance.
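The quadratic-cost argument can be made concrete with back-of-the-envelope arithmetic. The Python sketch below counts causal attention pairs for one long chain versus several shorter segments that each restart from a summary; the cost model and the token counts are simplifying assumptions, not measurements from the paper.

```python
def attention_pairs(num_tokens):
    """Token pairs attended to when processing num_tokens causally."""
    return num_tokens * (num_tokens + 1) // 2

def single_chain_cost(total_tokens):
    """Cost of one uninterrupted chain of thought."""
    return attention_pairs(total_tokens)

def iterative_cost(total_tokens, segment_tokens, summary_tokens):
    """Cost when each segment restarts from a short summary (toy model)."""
    segments = total_tokens // segment_tokens
    per_segment = attention_pairs(summary_tokens + segment_tokens)
    return segments * per_segment

long_cost = single_chain_cost(32000)
iter_cost = iterative_cost(32000, segment_tokens=4000, summary_tokens=500)
print(long_cost, iter_cost)  # 512016000 81018000
```

Under these assumptions the segmented run does about a sixth of the attention work, and no single context ever exceeds 4,500 tokens, which is why when to cut and what to keep in the summary become the decisions worth learning.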

2602.07842 2026-06-03 cs.CL

Evaluating and Calibrating LLM Confidence on Questions with Multiple Correct Answers

Yuhan Wang, Shiyu Ni, Zhikai Ding, Zihang Zhan, Yuanzi Li, Keping Bi

Affiliations State Key Laboratory of AI Safety; Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Tsinghua University; Renmin University of China

AI summary Shows that existing confidence calibration methods break down on questions with multiple correct answers and proposes Semantic Confidence Aggregation (SCA), which aggregates confidence over multiple high-probability sampled responses and achieves state-of-the-art calibration under mixed-answer settings.

Abstract

Confidence calibration is essential for making large language models (LLMs) reliable, yet existing training-free methods have been primarily studied under single-answer question answering. In this paper, we show that these methods break down in the presence of multiple valid answers, where disagreement among equally correct responses leads to systematic underestimation of confidence. To enable a systematic study of this phenomenon, we introduce MACE, a benchmark of 12,000 factual questions spanning six domains with varying numbers of correct answers. Experiments across 15 representative calibration methods and four LLM families (7B-72B) reveal that while accuracy increases with answer cardinality, estimated confidence consistently decreases, causing severe miscalibration for questions with mixed answer counts. To address this issue, we propose Semantic Confidence Aggregation (SCA), which aggregates confidence over multiple high-probability sampled responses. SCA achieves state-of-the-art calibration performance under mixed-answer settings while preserving strong calibration on single-answer questions.
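Why disagreement among equally correct answers depresses confidence, and how aggregation counters it, can be shown with sampled answers. In the Python sketch below, exact string matching stands in for semantic grouping and the `min_share` threshold is a hypothetical parameter; this is an illustration of the idea, not the SCA algorithm.

```python
from collections import Counter

def top_answer_confidence(samples):
    """Confidence as the frequency of the single most common answer."""
    counts = Counter(samples)
    return counts.most_common(1)[0][1] / len(samples)

def aggregated_confidence(samples, min_share):
    """Sum the shares of every answer sampled at least min_share of the time."""
    counts = Counter(samples)
    kept = sum(c for c in counts.values() if c / len(samples) >= min_share)
    return kept / len(samples)

# Hypothetical samples for a question with several valid answers.
samples = ["Paris", "Lyon", "Paris", "Nice", "Lyon",
           "Paris", "Lyon", "Rome", "Nice", "Paris"]
single = top_answer_confidence(samples)
aggregated = aggregated_confidence(samples, min_share=0.2)
print(single, aggregated)  # 0.4 0.9
```

If the question were "name a city in France", the top answer alone suggests 40% confidence even though nine of the ten samples are correct; summing over the frequently sampled answers recovers a value much closer to the true accuracy.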

2602.07639 2026-06-03 cs.CL

Letting Tutor Personas Speak Up for LLMs: Learning Steering Vectors from Dialogue via Preference Optimization

Jaewook Lee, Alexander Scarlatos, Simon Woodhead, Andrew Lan

Affiliations University of Massachusetts Amherst; Eedi

AI summary Trains steering vectors with preference optimization to extract tutor persona information from human tutor-student dialogues and control LLM behavior, enabling diverse tutoring styles.

Comments Accepted to ACL 2026 BEA Workshop

Abstract

With the emergence of large language models (LLMs) as a powerful class of generative artificial intelligence (AI), their use in tutoring has become increasingly prominent. Prior works on LLM-based tutoring typically learn a single tutor policy and do not capture the diversity of tutoring styles. In real-world tutor-student interactions, pedagogical intent is realized through adaptive instructional strategies, with tutors varying the level of scaffolding, instructional directiveness, feedback, and affective support in response to learners' needs. These differences can all impact dialogue dynamics and student engagement. In this paper, we explore how tutor personas embedded in human tutor-student dialogues can be used to guide LLM behavior without relying on explicitly prompted instructions. We train a steering vector using preference optimization: an activation-space direction that guides model responses toward specific tutor personas. We find that this steering vector captures tutor-specific variation across dialogue contexts, improving semantic alignment with ground-truth tutor utterances and increasing preference-based evaluations, while largely preserving lexical similarity. Analysis of the learned scaling coefficients further reveals interpretable structure across tutors, corresponding to consistent differences in tutoring behavior. These results demonstrate that activation steering offers an effective and interpretable way for controlling tutor-specific variation in LLMs using signals derived directly from human dialogue data.

2511.16275 2026-06-03 cs.CL cs.AI

SeSE: Black-Box Uncertainty Quantification for Large Language Models Based on Structural Information Theory

Xingtao Zhao, Hao Peng, Dingli Su, Xianghua Zeng, Chunyang Liu, Jinzhi Liao, Philip S. Yu

Affiliations School of Cyber Science and Technology, Beihang University; School of Computer Science and Engineering, Beihang University; Didi Chuxing; Laboratory for Big Data and Decision, National University of Defense Technology; Department of Computer Science, University of Illinois Chicago

AI summary Proposes the SeSE framework, which builds an optimal hierarchical abstraction of the semantic space and computes its structural entropy for black-box uncertainty quantification of large language models; it theoretically generalizes semantic entropy and outperforms existing methods on long-form generation.

Comments Accepted by UAI 2026

Abstract

Reliable uncertainty quantification (UQ) is essential for deploying large language models (LLMs) in safety-critical scenarios, as it enables them to abstain from responding when uncertain, thereby avoiding hallucinations, i.e., plausible yet factually incorrect responses. However, while semantic UQ methods have achieved advanced performance, they overlook latent semantic structural information that could enable more precise uncertainty estimates. In this paper, we propose Semantic Structural Entropy (SeSE), a principled black-box UQ framework applicable to both open- and closed-source LLMs. To reveal the intrinsic structure of the semantic space, SeSE constructs its optimal hierarchical abstraction through an encoding tree with minimal structural entropy. The structural entropy of this encoding tree thus quantifies the inherent uncertainty within LLM semantic space after optimal compression. Additionally, unlike existing methods that primarily focus on simple short-form generation, we extend SeSE to provide interpretable, granular uncertainty estimation for long-form outputs. We theoretically prove that SeSE generalizes semantic entropy, the gold standard for UQ in LLMs, and empirically demonstrate its superior performance over strong baselines across 24 model-dataset combinations.
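For orientation, the semantic entropy baseline that the abstract says SeSE generalizes can be sketched in a few lines. This is only the plain baseline, not SeSE's encoding tree; the cluster labels are hypothetical and are assumed to come from some external check that groups sampled answers by meaning.

```python
import math
from collections import Counter

def semantic_entropy(cluster_ids):
    """Shannon entropy (in nats) of the empirical distribution over meaning clusters."""
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical cluster labels for five sampled answers to one prompt.
confident = ["A", "A", "A", "A", "A"]   # every sample carries the same meaning
uncertain = ["A", "B", "C", "A", "D"]   # samples disagree in meaning

print(semantic_entropy(confident), semantic_entropy(uncertain))
```

A high value signals that the sampled answers scatter over many meanings, which is the condition under which a system might abstain.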

2602.06219 2026-06-03 cs.RO cs.AI

Coupled Local and Global World Models for Efficient First Order RL

耦合局部与全局世界模型的高效一阶强化学习

Joseph Amigo, Rooholla Khorrambakht, Nicolas Mansard, Ludovic Righetti

发表机构 * Machines in Motion Laboratory, New York University, USA(纽约大学运动机器实验室) LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France(图卢兹大学LAAS-CNRS中心) Artificial and Natural Intelligence Toulouse Institute, Toulouse, France(图卢兹人工智能与自然智能研究所)

AI总结 提出一种通过解耦一阶梯度方法在数据驱动的世界模型内训练策略的方法,结合局部和全局世界模型实现高效梯度计算,在Push-T任务和四足机器人操作任务中显著优于PPO。

Comments Project website: https://coupled-global-local-wm-rl.pages.dev/

详情
AI中文摘要

世界模型为在标准模拟器难以处理的情况下更忠实地捕捉复杂动力学(包括接触和非刚性)以及复杂感官信息(如视觉感知)提供了一条有前景的途径。然而,这些模型的计算复杂度高,对流行的强化学习方法构成了挑战,这些方法已成功用于模拟器解决复杂运动任务,但在操作任务上仍存在困难。本文介绍了一种完全绕过模拟器的方法,在从机器人与真实环境交互中学习到的世界模型内部训练强化学习策略。其核心是通过一种新颖的解耦一阶梯度方法实现大规模扩散模型的策略训练:全尺度世界模型生成准确的前向轨迹,而轻量级潜在空间代理近似其局部动力学以实现高效梯度计算。这种局部与全局世界模型的耦合确保了高保真展开以及计算上可处理的微分。我们在Push-T操作任务上证明了该方法的有效性,其在样本效率上显著优于PPO。我们还通过四足机器人的自我中心物体操作任务进一步评估了该方法。这些结果共同表明,在数据驱动的世界模型内部学习是解决难以建模的图像空间强化学习任务的一条有前景的途径,无需依赖手工设计的物理模拟器。

英文摘要

World models offer a promising avenue for more faithfully capturing complex dynamics, including contacts and non-rigidity, as well as complex sensory information, such as visual perception, in situations where standard simulators struggle. However, these models are computationally complex to evaluate, posing a challenge for popular RL approaches that have been successfully used with simulators to solve complex locomotion tasks yet still struggle with manipulation. This paper introduces a method that bypasses simulators entirely, training RL policies inside world models learned from robots' interactions with real environments. At its core, our approach enables policy training with large-scale diffusion models via a novel decoupled first-order gradient (FoG) method: a full-scale world model generates accurate forward trajectories, while a lightweight latent-space surrogate approximates its local dynamics for efficient gradient computation. This coupling of a local and global world model ensures high-fidelity unrolling alongside computationally tractable differentiation. We demonstrate the efficacy of our method on the Push-T manipulation task, where it significantly outperforms PPO in sample efficiency. We further evaluate our approach through an ego-centric object manipulation task with a quadruped. Together, these results demonstrate that learning inside data-driven world models is a promising pathway for solving hard-to-model RL tasks in image space without reliance on hand-crafted physics simulators.

2602.05031 2026-06-03 cs.LG

Laplacian Representations for Decision-Time Planning

用于决策时规划的拉普拉斯表示

Dikshant Shehmar, Matthew Schlegel, Matthew E. Taylor, Marlos C. Machado

发表机构 * University of Cambridge(剑桥大学)

AI总结 本文提出利用拉普拉斯表示作为决策时规划的潜在空间,通过多时间尺度捕捉状态空间距离,并基于此设计层次规划算法ALPS,在离线目标条件强化学习任务中优于常用基线。

Comments Accepted at ICML 2026

Journal ref Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

在基于模型的强化学习中,使用学习到的模型进行规划仍然是一个关键挑战。在决策时规划中,状态表示至关重要,因为它们必须支持局部成本计算,同时保持长时程结构。在本文中,我们展示了拉普拉斯表示通过在多时间尺度上捕捉状态空间距离,为规划提供了一个有效的潜在空间。这种表示保留了有意义的距离,并自然地将长时程问题分解为子目标,同时也减轻了长预测范围内出现的复合误差。基于这些特性,我们引入了ALPS,一种层次规划算法,并证明它在来自OGBench(一个以前由无模型方法主导的基准)的离线目标条件强化学习任务选择上优于常用的基线。

英文摘要

Planning with a learned model remains a key challenge in model-based reinforcement learning (RL). In decision-time planning, state representations are critical as they must support local cost computation while preserving long-horizon structure. In this paper, we show that the Laplacian representation provides an effective latent space for planning by capturing state-space distances at multiple time scales. This representation preserves meaningful distances and naturally decomposes long-horizon problems into subgoals, also mitigating the compounding errors that arise over long prediction horizons. Building on these properties, we introduce ALPS, a hierarchical planning algorithm, and demonstrate that it outperforms commonly used baselines on a selection of offline goal-conditioned RL tasks from OGBench, a benchmark previously dominated by model-free methods.

2507.10419 2026-06-03 cs.LG cs.AI cs.CL stat.ML

Multiple Choice Learning of Low-Rank Adapters for Language Modeling

低秩适配器的多选学习用于语言建模

Victor Letzelter, Hugo Malard, Mathieu Fontaine, Gaël Richard, Slim Essid, Andrei Bursuc, Patrick Pérez

发表机构 * Institut National de la Recherche Scientifique (INRS)(国家科学研究院)

AI总结 提出LoRA-MCL训练方案,通过多选学习和低秩适配扩展语言模型的下一词预测,以在推理时解码多样且合理的句子延续。

Comments ICML 2026

详情
AI中文摘要

我们提出LoRA-MCL,一种训练方案,通过一种旨在推理时解码多样、合理的句子延续的方法,扩展语言模型中的下一词预测。传统语言建模是一个本质上不适定的问题:给定一个上下文,多个未来可能同样合理。我们的方法利用多选学习(MCL)和胜者全得损失,通过低秩适配有效处理歧义。我们提供了将MCL应用于语言建模的理论解释,假设数据来自混合分布。我们使用马尔可夫链混合来说明所提出的方法。然后,我们通过音频和视觉字幕以及机器翻译的实验证明,我们的方法在生成输出中实现了高多样性和相关性。我们发布了将LoRA-MCL应用于广泛语言模型的代码。

英文摘要

We propose LoRA-MCL, a training scheme that extends next-token prediction in language models with a method designed to decode diverse, plausible sentence continuations at inference time. Traditional language modeling is an intrinsically ill-posed problem: given a context, multiple futures may be equally plausible. Our approach leverages Multiple Choice Learning (MCL) and the winner-takes-all loss to efficiently handle ambiguity through Low-Rank Adaptation. We provide a theoretical interpretation of applying MCL to language modeling, assuming the data is generated from a mixture of distributions. We illustrate the proposed approach using mixtures of Markov chains. We then demonstrate with experiments on audio and visual captioning, as well as machine translation, that our method achieves high diversity and relevance in generated outputs. We release the code for applying LoRA-MCL to a wide range of language models.
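The winner-takes-all loss mentioned in the abstract can be illustrated with a minimal sketch: for each example, only the best of K hypotheses contributes to the loss. The function name and the toy loss values are illustrative assumptions, not taken from the LoRA-MCL code release.

```python
def winner_takes_all_loss(per_hypothesis_losses):
    """Return (mean WTA loss, winner indices) for a batch.

    per_hypothesis_losses[i][k] is the loss of hypothesis k on example i.
    Only the lowest-loss hypothesis per example is counted.
    """
    winners = [min(range(len(row)), key=row.__getitem__)
               for row in per_hypothesis_losses]
    total = sum(row[k] for row, k in zip(per_hypothesis_losses, winners))
    return total / len(per_hypothesis_losses), winners

# Toy batch: 2 examples, 3 hypotheses each (values are made up).
losses = [[2.0, 0.5, 1.0],
          [0.3, 0.9, 0.4]]
loss, winners = winner_takes_all_loss(losses)
print(loss, winners)
```

Because gradients would flow only through the winning hypothesis, different hypotheses can specialize on different plausible continuations.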

2602.03681 2026-06-03 cs.CL cs.LG

Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention Models

神经注意力搜索线性:迈向自适应令牌级混合注意力模型

Difan Deng, Andreas Bentzen Winje, Lukas Fehring, Marius Lindauer

发表机构 * University of Copenhagen(哥本哈根大学)

AI总结 提出NAtS-L框架,在同一层内对不同令牌自适应选择线性注意力或softmax注意力,以平衡效率与表达能力。

Comments 21 pages, 12 figures

详情
AI中文摘要

softmax变换器的二次计算复杂度已成为长上下文场景的瓶颈。相比之下,线性注意力模型系列为更高效的序列模型提供了有希望的方向。这些线性注意力模型将过去的KV值压缩成单个隐藏状态,从而在训练和推理期间有效降低复杂度。然而,它们的表达能力仍然受限于隐藏状态的大小。先前的工作提出交错使用softmax和线性注意力层,以在保持表达能力的同时降低计算复杂度。然而,这些模型的效率仍然受限于其softmax注意力层。在本文中,我们提出神经注意力搜索线性(NAtS-L)框架,该框架在同一层内对不同令牌同时应用线性注意力和softmax注意力操作。NAtS-L自动判断一个令牌是否可以由线性注意力模型处理(即仅具有短期影响且可编码为固定大小隐藏状态的令牌),或者是否需要softmax注意力(即包含与长期检索相关信息且需要为未来查询保留的令牌)。通过跨令牌搜索最优的门控DeltaNet和softmax注意力组合,我们展示了NAtS-L提供了一种强大而高效的令牌级混合架构。

英文摘要

The quadratic computational complexity of softmax transformers has become a bottleneck in long-context scenarios. In contrast, linear attention model families provide a promising direction towards a more efficient sequential model. These linear attention models compress past KV values into a single hidden state, thereby efficiently reducing complexity during both training and inference. However, their expressivity remains limited by the size of their hidden state. Previous work proposed interleaving softmax and linear attention layers to reduce computational complexity while preserving expressivity. Nevertheless, the efficiency of these models remains bottlenecked by their softmax attention layers. In this paper, we propose Neural Attention Search Linear (NAtS-L), a framework that applies both linear attention and softmax attention operations within the same layer on different tokens. NAtS-L automatically determines whether a token can be handled by a linear attention model, i.e., tokens that have only short-term impact and can be encoded into fixed-size hidden states, or requires softmax attention, i.e., tokens that contain information related to long-term retrieval and need to be preserved for future queries. By searching for optimal Gated DeltaNet and softmax attention combinations across tokens, we show that NAtS-L provides a strong yet efficient token-level hybrid architecture.
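The claim that linear attention compresses past key-value pairs into a single fixed-size state can be made concrete with a small sketch. This is plain, ungated, unnormalized causal linear attention, chosen for brevity; it is not Gated DeltaNet and does not implement the NAtS-L token routing.

```python
def linear_attention(queries, keys, values):
    """Causal linear attention with a running d_k x d_v state.

    The state accumulates the outer products k v^T of all tokens seen so far,
    so memory stays constant regardless of sequence length.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed-size memory
    outputs = []
    for q, k, v in zip(queries, keys, values):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]      # write the current token
        # Read: output = state^T q
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs

outs = linear_attention(queries=[[1, 0], [1, 1]],
                        keys=[[1, 0], [0, 1]],
                        values=[[1, 2], [3, 4]])
print(outs)
```

Softmax attention would instead keep every past key and value, which is what makes exact long-range retrieval possible at quadratic cost.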

2602.01483 2026-06-03 cs.LG cs.AI stat.ME

Causal Preference Elicitation

因果偏好启发

Edwin V. Bonilla, He Zhao, Daniel M. Steinberg

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出一种贝叶斯框架,通过主动查询局部边关系来集中有向无环图的后验分布,实现专家参与的因果发现。

详情
AI中文摘要

我们提出因果偏好启发,一种用于专家参与因果发现的贝叶斯框架,该框架主动查询局部边关系以集中有向无环图(DAG)的后验分布。从任何黑箱观测后验出发,我们使用一个三向似然模型对专家的噪声判断进行建模,该似然涵盖边的存在性和方向。后验推断采用灵活的粒子近似,并通过专家分类响应的期望信息增益准则高效选择查询。在合成图、蛋白质信号数据以及人类基因扰动基准上的实验表明,在严格的查询预算下,后验集中速度更快,且对有向效应的恢复能力得到提升。

英文摘要

We propose causal preference elicitation, a Bayesian framework for expert-in-the-loop causal discovery that actively queries local edge relations to concentrate a posterior over directed acyclic graphs (DAGs). From any black-box observational posterior, we model noisy expert judgments with a three-way likelihood over edge existence and direction. Posterior inference uses a flexible particle approximation, and queries are selected by an efficient expected information gain criterion on the expert's categorical response. Experiments on synthetic graphs, protein signaling data, and a human gene perturbation benchmark show faster posterior concentration and improved recovery of directed effects under tight query budgets.
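A rough sketch of an expected-information-gain score for one edge query under a particle posterior follows. The noise model (the expert reports the true relation with probability 1 - eps and each of the other two with eps/2) and all names are assumptions made for illustration; the paper's actual likelihood and criterion may differ.

```python
import math

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability outcomes."""
    return -sum(x * math.log(x) for x in p if x > 0)

def response_dist(state, eps):
    """Assumed expert model over 3 answers: 0 = i->j, 1 = j->i, 2 = no edge."""
    return [1 - eps if y == state else eps / 2 for y in range(3)]

def expected_information_gain(weights, states, eps=0.1):
    """Mutual information between the expert's answer and the particle index.

    weights[p] is the posterior weight of particle (DAG) p and states[p] is
    the relation that particle assigns to the queried pair.
    """
    marginal = [0.0, 0.0, 0.0]
    conditional = 0.0
    for w, s in zip(weights, states):
        lik = response_dist(s, eps)
        marginal = [m + w * l for m, l in zip(marginal, lik)]
        conditional += w * entropy(lik)
    return entropy(marginal) - conditional

agree = expected_information_gain([0.5, 0.5], [0, 0])     # particles agree
disagree = expected_information_gain([0.5, 0.5], [0, 1])  # particles disagree
print(agree, disagree)
```

Queries on pairs where the particles disagree score higher, so an active learner would ask about those first.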

2512.00956 2026-06-03 cs.LG cs.CL

WUSH: Near-Optimal Adaptive Transforms for LLM Quantization

WUSH: 面向LLM量化的近最优自适应变换

Jiale Chen, Vage Egiazarian, Roberto L. Castro, Torsten Hoefler, Dan Alistarh

发表机构 * University of Tartu(塔尔图大学)

AI总结 提出一种结合Hadamard基与数据依赖二阶矩的非正交变换WUSH,在标准RTN AbsMax缩放块量化器下实现权重-激活联合量化的闭式最优解,显著提升低比特量化精度并支持高效GPU实现。

Comments Published as a conference paper at the 43rd International Conference on Machine Learning (ICML 2026): https://openreview.net/forum?id=ZsECxUkbKB

详情
AI中文摘要

量化LLM权重和激活是实现高效部署的标准方法,但少数极端异常值会拉伸动态范围并放大低比特量化误差。先前的基于变换的缓解方法(例如Hadamard旋转)是固定的且与数据无关,其量化最优性尚不明确。我们推导了在标准RTN AbsMax缩放块量化器下,用于联合权重-激活量化的闭式最优线性块变换,涵盖整数和浮点格式。由此产生的构造WUSH将Hadamard骨干与数据依赖的二阶矩分量相结合,形成一种非正交变换,在温和假设下对FP和INT量化器证明是近最优的,同时支持高效的融合GPU实现。实验上,WUSH在最强Hadamard基线(例如,在Llama-3.1-8B-Instruct的MXFP4上,RTN平均提升+2.8个点,GPTQ提升+0.7个点)上改善了W4A4精度,同时通过FP4 MatMul实现了高达BF16的5.8倍每层吞吐量。源代码可在https://github.com/IST-DASLab/WUSH获取。

英文摘要

Quantizing LLM weights and activations is a standard approach for efficient deployment, but a few extreme outliers can stretch the dynamic range and amplify low-bit quantization errors. Prior transform-based mitigations (e.g., Hadamard rotations) are fixed and data-agnostic, and their optimality for quantization has remained unclear. We derive closed-form optimal linear blockwise transforms for joint weight-activation quantization under standard RTN AbsMax-scaled block quantizers, covering both integer and floating-point formats. The resulting construction, WUSH, combines a Hadamard backbone with a data-dependent second-moment component to form a non-orthogonal transform that is provably near-optimal for FP and INT quantizers under mild assumptions while admitting an efficient fused GPU implementation. Empirically, WUSH improves W4A4 accuracy over the strongest Hadamard-based baselines (e.g., on Llama-3.1-8B-Instruct in MXFP4, it gains +2.8 average points with RTN and +0.7 with GPTQ) while delivering up to 5.8$\times$ per-layer throughput over BF16 via FP4 MatMul. Source code is available at https://github.com/IST-DASLab/WUSH.
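The outlier problem and the effect of a fixed Hadamard rotation can be illustrated on a toy block. The sketch below uses a symmetric INT4 grid with round-to-nearest and AbsMax scaling and a normalized 4x4 Hadamard matrix; it shows only the data-agnostic baseline the abstract refers to, not the WUSH transform or the MXFP4 format, and the sample values are made up.

```python
def rtn_absmax_quantize(block, bits=4):
    """Quantize-dequantize a block: scale by its absolute max, round to nearest."""
    qmax = 2 ** (bits - 1) - 1          # 7 for a symmetric INT4 grid
    absmax = max(abs(x) for x in block)
    if absmax == 0:
        return list(block)
    scale = absmax / qmax
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in block]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Normalized 4x4 Hadamard matrix: symmetric and orthogonal, so it is its own inverse.
H4 = [[0.5 * s for s in row] for row in
      [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]]

def rotate(x):
    return [sum(h * xi for h, xi in zip(row, x)) for row in H4]

block = [0.1, -0.2, 0.3, 8.0]            # one outlier stretches the dynamic range
direct = rtn_absmax_quantize(block)
rotated = rotate(rtn_absmax_quantize(rotate(block)))  # rotate, quantize, rotate back

print(mse(block, direct), mse(block, rotated))
```

In this toy case the outlier forces the three small entries to zero under direct quantization, while the rotation spreads its magnitude across the block and lowers the error.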

2510.16392 2026-06-03 cs.AI

RGMem: Renormalization Group-inspired Memory Evolution for Language Agents

RGMem:基于重正化群启发的语言智能体记忆演化

Ao Tian, Yunfeng Lu, Xinxin Fan, Changhao Wang, Lanzhi Zhou, Yeyao Zhang, Yanfang Liu

发表机构 * School of Computer Science Engineering, Beihang University, Beijing, China School of Reliability Systems Engineering, Beihang University, Beijing, China State Key Laboratory of Complex & Critical Software Environment National Key Laboratory of Reliability State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences

AI总结 提出RGMem框架,利用重正化群思想对长期对话记忆进行多尺度粗粒化、阈值更新和重缩放,实现从事实到用户偏好的层次化整合,在LOCOMO和PersonaMem基准上超越现有记忆系统。

Comments Accepted to ICML 2026

详情
AI中文摘要

个性化和持续交互对于基于LLM的对话智能体至关重要,但有限的上下文窗口和静态参数记忆阻碍了对长期、跨会话用户状态的建模。现有方法(包括检索增强生成和显式记忆系统)主要在事实层面操作,难以从演化且可能冲突的对话中提炼稳定的偏好和深层用户特征。为应对这一挑战,我们提出RGMem,一种受重正化群(RG)多尺度组织和涌现观点启发的自演化记忆框架。RGMem将长期对话记忆建模为多尺度演化过程:情节交互被转化为语义事实和用户洞察,然后通过层次化粗粒化、阈值更新和重缩放逐步整合为动态演化的用户画像。通过明确分离快速变化的证据和慢变特征,并启用非线性、相变般的动力学,RGMem实现了超越平面检索或静态摘要的稳健个性化。在LOCOMO和PersonaMem基准上的大量实验表明,RGMem持续优于最先进的记忆系统,实现了更强的跨会话连续性并更好地适应演化的用户偏好。代码可在https://github.com/fenhg297/RGMem获取。

英文摘要

Personalized and continuous interactions are critical for LLM-based conversational agents, yet finite context windows and static parametric memory hinder the modeling of long-term, cross-session user states. Existing approaches, including retrieval-augmented generation and explicit memory systems, primarily operate at the fact level, making it difficult to distill stable preferences and deep user traits from evolving and potentially conflicting dialogues. To address this challenge, we propose RGMem, a self-evolving memory framework inspired by the renormalization group (RG) perspective on multi-scale organization and emergence. RGMem models long-term conversational memory as a multi-scale evolutionary process: episodic interactions are transformed into semantic facts and user insights, which are then progressively integrated through hierarchical coarse-graining, thresholded updates, and rescaling into a dynamically evolving user profile. By explicitly separating fast-changing evidence from slow-varying traits and enabling non-linear, phase-transition-like dynamics, RGMem enables robust personalization beyond flat retrieval or static summarization. Extensive experiments on the LOCOMO and PersonaMem benchmarks demonstrate that RGMem consistently outperforms SOTA memory systems, achieving stronger cross-session continuity and improved adaptation to evolving user preferences. Code is available at https://github.com/fenhg297/RGMem.

2510.02763 2026-06-03 cs.LG cs.AI

Fusing Multi- and Hyperspectral Satellite Data for Harmful Algal Bloom Monitoring with Self-Supervised and Hierarchical Deep Learning

融合多光谱和高光谱卫星数据用于有害藻华监测的自监督与分层深度学习

Nicholas LaHaye, Kelly M. Luis, Michelle M. Gierach

发表机构 * University of Colorado Boulder(科罗拉多大学博尔德分校)

AI总结 提出自监督机器学习框架SIT-FUSE,融合多传感器卫星反射率与TROPOMI太阳诱导荧光数据,通过分层深度聚类生成有害藻华严重程度和物种分类产品,在墨西哥湾和南加州验证了与实测数据的一致性。

详情
AI中文摘要

我们提出了一种自监督机器学习框架,用于利用多传感器卫星数据检测和绘制有害藻华(HABs)的严重程度和物种分类。通过融合来自运行极轨卫星仪器(VIIRS、MODIS、OLCI和OCI)的反射率数据与TROPOMI太阳诱导荧光(SIF),我们的框架SIT-FUSE无需每个仪器的标记数据集即可生成HAB严重程度和物种分类产品。该框架采用自监督表示学习和分层深度聚类,将浮游植物细胞丰度和物种分割成可解释的类别,并利用墨西哥湾和南加州(2018-2025年)的原位数据进行了验证。结果显示与总浮游植物、短凯伦藻和拟菱形藻属测量值高度一致。这项工作推进了在地面观测有限的环境中进行可扩展的HAB监测,同时通过分层嵌入实现探索性分析——这是将自监督学习应用于全球水生生物地球化学操作化的关键一步。

英文摘要

We present a self-supervised machine learning framework for detecting and mapping the severity and speciation of harmful algal blooms (HABs) using multi-sensor satellite data. By fusing reflectance data from operational polar-orbiting satellite-based instruments (VIIRS, MODIS, OLCI, and OCI) with TROPOMI solar-induced fluorescence (SIF), our framework, called SIT-FUSE, generates HAB severity and speciation products without requiring per-instrument labeled datasets. The framework employs self-supervised representation learning and hierarchical deep clustering to segment phytoplankton cell abundance and species into interpretable classes, validated against in-situ data from the Gulf of Mexico and Southern California (2018-2025). Results show strong agreement with total phytoplankton, Karenia brevis, and Pseudo-nitzschia spp. measurements. This work advances scalable HAB monitoring in environments where ground truth observations are limited, while enabling exploratory analysis via hierarchical embeddings, a critical step toward operationalizing self-supervised learning for global aquatic biogeochemistry.

2601.23229 2026-06-03 cs.AI cs.CC

Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs

$L_\infty$ 鲁棒 MDP 的策略迭代的强多项式时间复杂度

Ali Asadi, Krishnendu Chatterjee, Ehsan Goharshady, Mehrdad Karrabi, Alipasha Montaseri, Carlo Pagano

发表机构 * Institute for Computer Science, Austrian Academy of Sciences(奥地利科学院计算机科学研究所) Concordia University(康科迪亚大学)

AI总结 针对 $(s,a)$-矩形 $L_\infty$ 鲁棒 MDP 的折扣问题,证明了策略迭代算法在固定折扣因子下具有强多项式时间复杂度。

Comments To Appear in The 39th Annual Conference on Learning Theory (COLT'26)

详情
AI中文摘要

马尔可夫决策过程(MDP)是序列决策中的基本模型。鲁棒 MDP(RMDP)通过允许转移概率存在不确定性并针对最坏情况不确定性进行优化来扩展此框架。特别地,具有 $L_\infty$ 不确定性集的 $(s,a)$-矩形 RMDP 构成一个基础且富有表现力的模型:它们包含经典 MDP 和回合制随机博弈。我们考虑具有折扣收益的此模型。多项式时间和强多项式时间算法的存在性是这些优化模型的基本问题。对于 MDP,线性规划为任意折扣因子提供了多项式时间算法,而 Ye 的开创性工作为固定折扣因子建立了强多项式时间。将这些结果推广到 RMDP 仍然是一个重要的开放问题。在这项工作中,我们证明了鲁棒策略迭代算法在常数(固定)折扣因子下对于 $(s,a)$-矩形 $L_\infty$ RMDP 以强多项式时间运行,解决了一个重要的算法问题。

英文摘要

Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for an arbitrary discount factor, and the seminal work of Ye established strongly-polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show that a robust policy iteration algorithm runs in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor, resolving an important algorithmic question.
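To make the "worst-case realization" concrete, the sketch below solves the inner minimization for a single state-action pair and wraps it in one robust Bellman backup. It assumes the uncertainty set is an $L_\infty$ ball of a given radius around a nominal distribution, intersected with the probability simplex; the function names are illustrative, and this is not the paper's policy iteration algorithm or its analysis.

```python
def worst_case_expectation(p0, values, radius):
    """min over p of sum_i p[i] * values[i], subject to p being a distribution
    with |p[i] - p0[i]| <= radius for every i.

    Greedy solution: start every entry at its lower bound, then spend the
    remaining probability mass on the lowest-value states first.
    """
    lo = [max(0.0, p - radius) for p in p0]
    hi = [min(1.0, p + radius) for p in p0]
    p = lo[:]
    budget = 1.0 - sum(lo)
    for i in sorted(range(len(values)), key=values.__getitem__):
        add = min(hi[i] - lo[i], budget)
        p[i] += add
        budget -= add
    return sum(pi * vi for pi, vi in zip(p, values))

def robust_backup(rewards, nominal_by_action, values, radius, gamma):
    """One robust Bellman backup at a state: best action against the worst case."""
    return max(r + gamma * worst_case_expectation(p0, values, radius)
               for r, p0 in zip(rewards, nominal_by_action))

print(worst_case_expectation([0.5, 0.5], [0.0, 1.0], radius=0.2))
```

With radius 0 the inner problem reduces to the ordinary expectation, which recovers the classical MDP backup.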

2601.23169 2026-06-03 cs.LG cs.LO cs.SC

Names Don't Matter: Symbol-Invariant Transformer for Open-Vocabulary Learning

名称无关:面向开放词汇学习的符号不变Transformer

İlker Işık, Wenchao Li

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 提出一种符号不变Transformer机制,通过并行嵌入流和聚合注意力实现可互换令牌的重命名不变性,在开放词汇任务上取得显著性能提升。

Comments ICML 2026 Poster (Camera-Ready Version)

详情
AI中文摘要

当前的神经架构缺乏处理可互换令牌(即语义等价但可区分的符号,如绑定变量)的原则性方法。因此,在固定词汇表上训练的模型通常难以泛化到未见过的符号,即使底层语义保持不变。我们提出了一种新颖的基于Transformer的机制,该机制对可互换令牌的重命名具有可证明的不变性。我们的方法采用并行嵌入流来隔离输入中每个可互换令牌的贡献,并结合聚合注意力机制实现跨流的结构化信息共享。实验结果证实了我们方法的理论保证,并在需要泛化到新符号的开放词汇任务上展示了显著的性能提升。项目页面:https://bu-depend-lab.github.io/Symbol-Invariant-Transformer/

英文摘要

Current neural architectures lack a principled way to handle interchangeable tokens, i.e., symbols that are semantically equivalent yet distinguishable, such as bound variables. As a result, models trained on fixed vocabularies often struggle to generalize to unseen symbols, even when the underlying semantics remain unchanged. We propose a novel Transformer-based mechanism that is provably invariant to the renaming of interchangeable tokens. Our approach employs parallel embedding streams to isolate the contribution of each interchangeable token in the input, combined with an aggregated attention mechanism that enables structured information sharing across streams. Experimental results confirm the theoretical guarantees of our method and demonstrate substantial performance gains on open-vocabulary tasks that require generalization to novel symbols. Project page: https://bu-depend-lab.github.io/Symbol-Invariant-Transformer/

2601.22599 2026-06-03 cs.SD cs.HC

A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation

用于数据高效查询式通用声音分离的语义一致数据集

Kai Li, Jintao Cheng, Chang Zeng, Zijun Yan, Helin Wang, Zixiong Su, Bo Zheng, Xiaolin Hu

发表机构 * Department of Computer Science and Technology, Institute for AI, BNRist, Tsinghua University, Beijing, China(计算机科学与技术系,人工智能研究院,BNRist,清华大学,北京,中国) Shanda AI Research Tokyo(盛大人工智能研究院东京) IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, China(IDG/麦戈文脑科学研究院,清华大学,北京,中国) Johns Hopkins University(约翰霍普金斯大学) Chinese Institute for Brain Research (CIBR), Beijing, China(北京脑科学与类脑研究所(CIBR),北京,中国)

AI总结 提出自动管道通过语义一致合成协议消除事件共现,构建高质量合成数据集Hive,使模型在数据量极小的情况下达到与大规模训练模型相当的分离精度和泛化能力。

Comments Accepted to ICML 2026

详情
AI中文摘要

查询式通用声音分离是智能听觉系统的基础,旨在从混合声音中分离特定声源。尽管最近取得了进展,现有方法在复杂声学场景中仍存在残余干扰。这种性能限制主要源于数据瓶颈:野外数据集包含弱标签和严重的事件共现。这些缺陷导致模型学习背景噪声与目标类别之间的虚假相关性,而非鲁棒的声学特征。为解决这一问题,我们提出了一种自动管道,通过语义一致合成协议从野外数据集中挖掘高纯度单事件片段,从而消除事件共现。利用该管道,我们构建了Hive,一个包含2400小时原始音频的高质量合成数据集。实验结果表明,与在比Hive大约500倍的大数据集上训练的最先进模型SAM-Audio相比,在Hive上训练的某些开源模型达到了具有竞争力的分离精度和感知质量。此外,这些模型在分布外评估基准上表现出显著的零样本泛化能力。这些发现强调,优先考虑监督信号的纯度可以实现显著的数据效率,为以降低计算成本训练鲁棒的听觉基础模型提供了新范式。代码和数据集可在https://cslikai.cn/Hive获取。

英文摘要

Query-based universal sound separation is fundamental to intelligent auditory systems, aiming to isolate specific sources from mixtures. Despite recent advances, existing methods continue to suffer from residual interference in complex acoustic scenes. This performance limitation stems largely from a data bottleneck: in-the-wild datasets contain weak labels and severe co-occurrence of events. These flaws induce models to learn spurious correlations between background noise and target categories instead of robust acoustic features. To address this, we propose an automated pipeline that eliminates co-occurrence of events by mining high-purity single-event segments from in-the-wild datasets via a semantically consistent synthesis protocol. Utilizing this pipeline, we constructed Hive, a high-quality synthetic dataset comprising 2.4k hours of raw audio. Experimental results demonstrate that, compared with the state-of-the-art model SAM-Audio, which was trained on a huge dataset $\sim$500 times larger than Hive, certain open-source models trained on Hive achieve competitive separation accuracy and perceptual quality. Moreover, these models exhibit remarkable zero-shot generalization on out-of-distribution evaluation benchmarks. These findings highlight that prioritizing the purity of supervision signals enables significant data efficiency, offering a new paradigm for training robust auditory foundation models with reduced computational costs. Code and dataset are available at https://cslikai.cn/Hive.

2601.22443 2026-06-03 cs.LG cs.CV stat.CO stat.ML

Weak Diffusion Priors Can Still Achieve Strong Inverse-Problem Performance

弱扩散先验仍能实现强逆问题性能

Jing Jia, Wei Yuan, Sifan Liu, Liyue Shen, Guanyang Wang

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 研究弱扩散先验在逆问题中的鲁棒性,通过贝叶斯一致性和局部相关性分析揭示其在信息丰富测量下仍有效的原因。

Comments 37 pages, ICML 2026 spotlight. Code: https://github.com/jjia131/weak-diffusion-priors-inverse-problem, Project Page: https://jjia131.github.io/weak-diffusion-priors-inverse-problem/

详情
AI中文摘要

在卧室图像上训练的扩散模型能否恢复人脸图像?扩散模型被广泛用作逆问题的先验,但标准方法通常假设一个高保真模型,该模型在与未知信号高度匹配的数据上训练。实践中,常常必须使用不匹配或低保真的扩散先验。令人惊讶的是,这些弱先验的表现往往几乎与全强度的域内基线相当。我们研究了逆求解器何时以及为何对弱扩散先验具有鲁棒性。通过大量实验,我们发现当测量信息高度丰富(例如,大量观测像素)时,弱先验能够成功,并识别了它们失败的场景。为了解释这一行为,我们将贝叶斯一致性理论与局部相关性分析相结合:理论给出了高维测量使后验集中于真实信号附近的条件,而相关性分析表明弱先验和更强的自然图像先验可以共享相似的局部空间结构。这些结果为何时可以可靠地使用弱扩散先验提供了原则性依据。代码可在 https://github.com/jjia131/weak-diffusion-priors-inverse-problem 获取。

英文摘要

Can a diffusion model trained on bedrooms recover human faces? Diffusion models are widely used as priors for inverse problems, but standard approaches usually assume a high-fidelity model trained on data that closely match the unknown signal. In practice, one often must use a mismatched or low-fidelity diffusion prior. Surprisingly, these weak priors often perform nearly as well as full-strength, in-domain baselines. We study when and why inverse solvers are robust to weak diffusion priors. Through extensive experiments, we find that weak priors succeed when measurements are highly informative (e.g., many observed pixels), and we identify regimes where they fail. To explain this behavior, we combine Bayesian-consistency theory with local-correlation analysis: the theory gives conditions under which high-dimensional measurements make the posterior concentrate near the true signal, while the correlation analysis shows that weak and stronger natural-image priors can share similar local spatial structure. These results provide a principled justification for when weak diffusion priors can be used reliably. Code is available at https://github.com/jjia131/weak-diffusion-priors-inverse-problem.

2512.19347 2026-06-03 cs.RO

OMP: One-step Meanflow Policy with Directional Alignment

OMP: 一步均值流策略与方向对齐

Han Fang, Yize Huang, Yuheng Zhao, Paul Weng, Xiao Li, Yutong Ban

发表机构 * School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China(上海交通大学机械工程学院) Global College, Shanghai Jiao Tong University, Shanghai, China(上海交通大学全球学院) Duke Kunshan University, Jiangsu, China(杜克昆山大学)

AI总结 提出一步均值流策略(OMP),通过方向对齐机制和微分推导方程解决均值流在机器人操作中的谱偏差和梯度饥饿问题,实现高保真实时操控。

Comments Accepted as poster of ICML-2026

详情
AI中文摘要

机器人操作日益采用数据驱动的生成策略框架,但该领域面临持续的权衡:扩散模型推理延迟高,而基于流的方法通常需要复杂的架构约束。尽管在图像生成领域,均值流范式提供了单步推理的路径,但其直接应用于机器人领域受到关键理论病理的阻碍,特别是低速度区域中的谱偏差和梯度饥饿。为克服这些限制,我们提出了一步均值流策略(OMP),一种专为高保真实时操作设计的新型框架。我们引入轻量级方向对齐机制,以显式同步预测速度与真实均值速度。此外,我们实现了微分推导方程(DDE)来近似雅可比向量积(JVP)算子,该算子解耦前向和后向传播,显著降低内存复杂度。在Adroit和Meta-World基准上的大量实验表明,OMP在成功率和轨迹精度上优于最先进方法,特别是在高精度任务中,同时保持了单步生成的效率。

英文摘要

Robot manipulation has increasingly adopted data-driven generative policy frameworks, yet the field faces a persistent trade-off: diffusion models suffer from high inference latency, while flow-based methods often require complex architectural constraints. Although the MeanFlow paradigm offers a path to single-step inference in the image generation domain, its direct application to robotics is impeded by critical theoretical pathologies, specifically spectral bias and gradient starvation in low-velocity regimes. To overcome these limitations, we propose the One-step MeanFlow Policy (OMP), a novel framework designed for high-fidelity, real-time manipulation. We introduce a lightweight directional alignment mechanism to explicitly synchronize predicted velocities with true mean velocities. Furthermore, we implement a Differential Derivation Equation (DDE) to approximate the Jacobian-Vector Product (JVP) operator, which decouples forward and backward passes to significantly reduce memory complexity. Extensive experiments on the Adroit and Meta-World benchmarks demonstrate that OMP outperforms state-of-the-art methods in success rate and trajectory accuracy, particularly in high-precision tasks, while retaining the efficiency of single-step generation.

2601.21683 2026-06-03 cs.LG

Can Local Learning Match Self-Supervised Backpropagation?

局部学习能否匹配自监督反向传播?

Wu S. Zihan, Ariane Delrocq, Wulfram Gerstner, Guillaume Bellec

发表机构 * University of Zurich(苏黎世大学)

AI总结 本文通过理论分析和算法变体,证明局部自监督学习在深度非线性卷积网络中可接近全局反向传播自监督学习的性能,并在图像数据集上达到或超越现有最优水平。

Comments Accepted at ICML 2026; Code is available at https://github.com/zihan-wu/local-SSL

详情
AI中文摘要

虽然基于反向传播的端到端自监督学习(全局BP-SSL)已成为训练现代AI系统的核心,但局部自监督学习(local-SSL)理论在深度神经网络中构建功能表示方面仍面临挑战。为建立全局与局部规则之间的联系,我们首先发展了深度线性网络的理论:识别了局部SSL算法(如Forward-forward或CLAPP)实现与全局BP-SSL完全相同的权重更新的条件。从理论见解出发,我们随后开发了局部SSL算法的新变体,以近似深度非线性卷积神经网络中的全局BP-SSL。那些提高局部SSL与全局BP-SSL梯度更新相似性的变体在图像数据集(CIFAR-10、STL-10和Tiny ImageNet)上也表现出更好的性能。使用CLAPP损失函数的最佳局部SSL规则与使用InfoNCE或CPC类损失函数的可比全局BP-SSL性能相匹配,并在这些基准上改进了局部SSL的最新技术水平。

英文摘要

While end-to-end self-supervised learning with backpropagation (global BP-SSL) has become central for training modern AI systems, theories of local self-supervised learning (local-SSL) have struggled to build functional representations in deep neural networks. To establish a link between global and local rules, we first develop a theory for deep linear networks: we identify conditions for local-SSL algorithms (like Forward-Forward or CLAPP) to implement exactly the same weight update as a global BP-SSL. Starting from the theoretical insights, we then develop novel variants of local-SSL algorithms to approximate global BP-SSL in deep non-linear convolutional neural networks. Variants that improve the similarity between the gradient updates of local-SSL and those of global BP-SSL also show better performance on image datasets (CIFAR-10, STL-10, and Tiny ImageNet). The best local-SSL rule with the CLAPP loss function matches the performance of a comparable global BP-SSL with InfoNCE or CPC-like loss functions, and improves upon the state of the art for local-SSL on these benchmarks.
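For readers unfamiliar with the InfoNCE loss named in the abstract, a minimal single-anchor sketch is given below. It uses dot-product similarity and a temperature of 0.1 as illustrative assumptions; it is neither the CLAPP loss nor the paper's training setup.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive's
    similarity among the positive and all negatives."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[0]

# Toy 2-D embeddings (made up): a well-aligned positive vs. a misaligned one.
aligned = info_nce([1, 0], [1, 0], negatives=[[0, 1], [-1, 0]])
misaligned = info_nce([1, 0], [0, 1], negatives=[[1, 0], [-1, 0]])
print(aligned, misaligned)
```

The loss is small when the anchor is closer to its positive than to any negative, and grows when a negative is the better match.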