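arXivDaily: a daily arXiv digest that syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.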

2602.13697 2026-06-05 cs.AI cs.DB cs.LG

No Need to Train Your RDB Foundation Model

Linjie Xu, Yanlin Zhang, Quan Gan, Minjie Wang, David Wipf

Affiliations: University of Hong Kong, Shanghai X-Lab

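AI summary: This paper proposes an in-context learning (ICL) based relational database encoder that can be paired with existing single-table ICL foundation models, without retraining, to efficiently handle multiple related tables.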

Comments: International Conference on Machine Learning (ICML) 2026

Abstract

Relational databases (RDBs) contain vast amounts of heterogeneous tabular information that can be exploited for predictive modeling purposes. But since the space of potential targets is vast across enterprise settings, how can we avoid retraining a new model each time we wish to predict a new quantity of interest? Foundation models based on in-context learning (ICL) offer a convenient option, but so far are largely restricted to single-table operability. In generalizing to multiple interrelated tables, it is essential to compress variably-sized RDB neighborhoods into fixed-length ICL samples for consumption by the decoder. However, the details here are critical: unlike existing supervised learning RDB pipelines, we provide theoretical and empirical evidence that ICL-specific compression should be constrained within high-dimensional RDB columns where all entities share units and roles, not across columns where the relevance of heterogeneous data types cannot be determined without extensive label information. Conditioned on this restriction, we then demonstrate that encoder expressiveness is actually not compromised by excluding trainable parameters. Hence we arrive at a principled family of RDB encoders that can be seamlessly paired with already-existing single-table ICL foundation models, whereby no training or fine-tuning is required. From a practical standpoint, we develop scalable SQL primitives to implement the encoder stage, resulting in the easy-to-use open-source RDBLearn foundation model capable of robust performance on unseen datasets out of the box.
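
To make the column-wise compression idea concrete, here is a minimal Python sketch, assuming a toy schema in which each customer links to a variable number of order rows. Each numeric column of the related table is summarized on its own with fixed, parameter-free aggregates (count, mean, min, max), so every entity gets a fixed-length vector that a single-table ICL model could consume. The table, column names, and choice of aggregates are hypothetical; this is not RDBLearn's encoder, which the authors implement with SQL primitives.

```python
from statistics import mean

# Hypothetical child table: each order row references a customer_id.
ORDERS = [
    {"customer_id": 1, "amount": 20.0, "items": 2},
    {"customer_id": 1, "amount": 35.0, "items": 1},
    {"customer_id": 2, "amount": 5.0, "items": 4},
]

NUMERIC_COLUMNS = ["amount", "items"]


def encode_entity(entity_id, child_rows, columns=NUMERIC_COLUMNS):
    """Compress a variable-size neighborhood into a fixed-length vector.

    Aggregation happens within each column only (values in one column share
    units and role); no weights mix different columns, and nothing is trained.
    """
    related = [r for r in child_rows if r["customer_id"] == entity_id]
    features = [float(len(related))]  # neighborhood size
    for col in columns:
        values = [r[col] for r in related]
        if values:
            features += [mean(values), min(values), max(values)]
        else:
            features += [0.0, 0.0, 0.0]  # placeholder for empty neighborhoods
    return features


# Every entity maps to 1 + 3 * len(columns) = 7 features, regardless of fan-out.
for cid in (1, 2, 3):
    print(cid, encode_entity(cid, ORDERS))
```

Because no parameter ever weighs "amount" against "items", there is nothing to learn about the relative relevance of heterogeneous columns, which mirrors the restriction the abstract argues for.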

2602.13255 2026-06-05 cs.AI cs.MA

DPBench: Structural Determinants of Multi-Agent LLM Coordination Under Simultaneous Resource Contention

Najmul Hasan, Prashanth BusiReddyGari

Affiliations: Department of Mathematics and Computer Science, University of North Carolina at Pembroke

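AI summary: This paper proposes DPBench, a benchmark for evaluating coordination in multi-agent systems. By analyzing how different protocols, communication structures, and group sizes affect coordination success or failure, it reveals how multi-agent LLMs coordinate under resource contention.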

Comments: 20 pages, 4 figures

Abstract

We present DPBench, a benchmark for evaluating coordination in multi-agent systems built from large language models. Existing benchmarks measure task-level success under a fixed protocol; the structural conditions under which coordination succeeds or fails at all have not been characterised. DPBench adapts the Dining Philosophers problem into a controlled testbed where the action protocol, the communication structure, and the group size each vary independently. We evaluate six agents: GPT-5.2, Claude Opus 4.5, Grok 4.1, Gemini 2.5 Flash, Llama 4 Maverick, and a uniform-random baseline. Under simultaneous action at N=5 with the default prompt, deadlock ranges from 25.0% (95% Wilson CI [11.2, 46.9]) for GPT-5.2 to 90.0% [74.4, 96.5] for Gemini 2.5 Flash; sequential action is solved by four of the six. Holding the model fixed at Gemini 2.5 Flash, three protocol variables drive deadlock from 90% to within CI of zero: three rounds of pre-commitment communication (0.0% vs. single-round 86.7%), a prompt encoding a classical concurrency primitive (0.0% for resource-ordering and symmetry-breaking, against 100% for the minimal prompt), or doubling the group from N=5 to N=10 (90.0% to 10.0%). Single-round messaging and memory of past timesteps do not change the rate at the sample size we ran. Whether the same model coordinates or deadlocks is determined by the protocol, not by the model's capability.
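
The intervals quoted above are Wilson score intervals. A short sketch of the standard 95% Wilson formula follows. The trial counts used in the example (5 of 20 and 27 of 30) are assumptions chosen for illustration, since the abstract does not state sample sizes; with these counts the formula happens to reproduce the reported intervals.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives 95%)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Assumed counts (not given in the abstract): 5 deadlocks in 20 runs,
# 27 deadlocks in 30 runs.
for k, n in [(5, 20), (27, 30)]:
    lo, hi = wilson_interval(k, n)
    print(f"{k}/{n} = {100 * k / n:.1f}%  CI [{100 * lo:.1f}, {100 * hi:.1f}]")
```

Unlike the naive normal-approximation interval, the Wilson interval does not collapse to a single point at 0 events: 0 of 30 still gives an upper bound of about 11.4%.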

2602.12124 2026-06-05 cs.LG cs.CL

Alignment Risks from Capability-Seeking RL Training

Yujun Zhou, Yue Huang, Han Bao, Kehan Guo, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Werner Geyer, Nuno Moniz, Nitesh V Chawla, Xiangliang Zhang

Affiliations: University of California, Berkeley; Stanford University; University of Washington; University of Texas at Austin; University of Toronto; University of Cambridge

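AI summary: This paper studies the risk that language models trained with reinforcement learning in vulnerable environments learn to exploit implicit loopholes to maximize reward. It finds that these strategies are not limited to narrow tricks: they can transfer and propagate to some extent and, in some cases, are more persistent when learned through RL than through SFT. This indicates that AI safety work should extend to auditing and securing training environments, reward mechanisms, and evaluation channels.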

Comments: Accepted by ICML 2026

Abstract

While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk arises from capability-seeking RL training in vulnerable environments. We investigate whether language models, when trained with reinforcement learning (RL) in environments with implicit loopholes, can learn to exploit these flaws to maximize reward, even without being explicitly instructed to do so. To test this, we design a suite of four diverse "vulnerability games," each presenting a structural vulnerability related to context-conditional compliance, proxy metrics, reward tampering, and self-evaluation. Our experiments show that models often learn to exploit these vulnerabilities, discovering opportunistic strategies that increase reward while sometimes preserving or even improving standard task-performance metrics. More critically, we find that these exploitative strategies are not always narrow "tricks": they can transfer in structured but limited ways, propagate from a capable teacher model to other student models through SFT, and in several cases remain more persistent when learned through RL than when distilled through SFT. Our findings show that alignment risks from capability-seeking RL training can be difficult to detect with standard performance monitoring, suggesting that future AI safety work should extend beyond content moderation to auditing and securing training environments, reward mechanisms, and evaluation channels. Code is available at https://github.com/YujunZhou/Capability-seeking-RL-risk.
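
The reward-hacking dynamic described above can be illustrated with a deliberately tiny, hypothetical example: a two-action bandit whose learner sees only a proxy reward. The "exploit" action stands in for a loophole that scores well on the proxy while contributing nothing to the true objective. This is a toy analogue written for illustration, not one of the paper's four vulnerability games.

```python
def train_greedy_bandit(proxy_rewards, true_rewards, steps=1000, optimistic_init=2.0):
    """Greedy sample-average bandit trained only on the proxy reward.

    Optimistic initial values force each action to be tried once; after that
    the learner follows whichever action has the higher proxy estimate.
    """
    n_actions = len(proxy_rewards)
    q = [optimistic_init] * n_actions
    counts = [0] * n_actions
    proxy_total = true_total = 0.0
    for _ in range(steps):
        a = max(range(n_actions), key=lambda i: q[i])  # ties -> lowest index
        counts[a] += 1
        r = proxy_rewards[a]
        q[a] += (r - q[a]) / counts[a]  # incremental sample mean
        proxy_total += r
        true_total += true_rewards[a]  # never seen by the learner
    return q, counts, proxy_total, true_total


# Action 0 "solve": honest, proxy = true = 0.6.
# Action 1 "exploit": hypothetical loophole, proxy = 1.0 but true value 0.0.
q, counts, proxy_total, true_total = train_greedy_bandit([0.6, 1.0], [0.6, 0.0])
print(counts, proxy_total, true_total)
```

In this run the learner tries the honest action once and then settles on the loophole: the proxy return climbs to about 999.6 while the true return stays at 0.6, and nothing in the proxy signal flags the switch.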

2602.04809 2026-06-05 cs.LG cs.AI

Beyond Rewards in Reinforcement Learning for Cyber Defence

Elizabeth Bates, Chris Hicks, Vasilios Mavroudis

Affiliations: University of Cambridge

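AI summary: This paper studies how the structure of the reward function affects learning and policy behavior when reinforcement learning is used for cyber defence. By comparing sparse and dense reward functions, it reveals the complex relationships among rewards, the action space, and the risk of suboptimal policies.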

Abstract

Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gym environments using dense, highly engineered reward functions which combine many penalties and incentives for a range of (un)desirable states and costly actions. Dense rewards help alleviate the challenge of exploring complex environments but risk biasing agents towards suboptimal and potentially riskier solutions, a critical issue in complex cyber environments. We thoroughly evaluate the impact of reward function structure on learning and policy behavioural characteristics using a variety of sparse and dense reward functions, two well-established cyber gyms, a range of network sizes, and both policy gradient and value-based RL algorithms. Our evaluation is enabled by a novel ground truth evaluation approach which allows directly comparing between different reward functions, illuminating the nuanced inter-relationships between rewards, action space and the risks of suboptimal policies in cyber environments. Our results show that sparse rewards, provided they are goal aligned and can be encountered frequently, uniquely offer both enhanced training reliability and more effective cyber defence agents with lower-risk policies. Surprisingly, sparse rewards can also yield policies that are better aligned with cyber defender goals and make sparing use of costly defensive actions without explicit reward-based numerical penalties.
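
To make the dense/sparse distinction concrete, here is a sketch of the two reward styles for a hypothetical network-defence step. The state fields, action names, penalty weights, and action costs are all invented for illustration and are not taken from either cyber gym used in the paper.

```python
from dataclasses import dataclass


@dataclass
class NetworkState:
    """Hypothetical snapshot of a defended network after one step."""
    compromised_hosts: int
    critical_host_compromised: bool
    episode_over: bool


# Hypothetical per-action costs, as a dense-reward designer might assign them.
ACTION_COSTS = {"monitor": 0.0, "analyse": 0.1, "remove": 0.5, "restore": 1.0}


def dense_reward(state, action):
    """Engineered reward: penalties every step for compromise and costly actions."""
    reward = -0.1 * state.compromised_hosts
    if state.critical_host_compromised:
        reward -= 10.0
    reward -= ACTION_COSTS[action]
    return reward


def sparse_reward(state, action):
    """Goal-aligned sparse reward: signal only at episode end, no action costs."""
    if not state.episode_over:
        return 0.0
    return -1.0 if state.critical_host_compromised else 1.0


mid_episode = NetworkState(compromised_hosts=3, critical_host_compromised=False, episode_over=False)
print(dense_reward(mid_episode, "restore"), sparse_reward(mid_episode, "restore"))
```

The dense version prices every action and every compromised host at every step; the sparse version never penalises actions directly. The paper's finding is that goal-aligned sparse rewards of this kind can still yield policies that use costly actions sparingly.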

2602.10314 2026-06-05 cs.LG

Stop Training for the Worst: Progressive Unmasking Accelerates Masked Diffusion Training

Jaeyeon Kim, Jonathan Geuter, David Alvarez-Melis, Sham Kakade, Sitan Chen

Affiliations: Harvard University; Kempner Institute

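AI summary: This paper proposes Progressive UnMAsking (PUMA), which modifies the forward masking process so that training-time and inference-time masking patterns are aligned, thereby accelerating the training of masked diffusion models.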

Abstract

Masked Diffusion Models (MDMs) have emerged as a promising approach for generative modeling in discrete spaces. By generating sequences in any order and allowing for parallel decoding, they enable fast inference and strong performance on non-causal tasks. However, this flexibility comes with a training complexity trade-off: MDMs train on an exponentially large set of masking patterns, which is not only computationally expensive, but also creates a train-test mismatch between the random masks used in training and the highly structured masks induced by inference-time unmasking. In this work, we propose Progressive UnMAsking (PUMA), a simple modification of the forward masking process that aligns training-time and inference-time masking patterns, thereby focusing optimization on inference-aligned masks and speeding up training. Empirically, PUMA speeds up pretraining at the 125M scale by $\approx 2.5\times$ and offers complementary advantages on top of common recipes like autoregressive initialization. We open-source our codebase at https://github.com/JaeyeonKim01/PUMA.
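
The train-test mismatch can be seen in a small sketch contrasting two ways of producing masks for a length-n sequence: the standard MDM forward process, which masks each position independently with probability t, and the nested masks a sampler actually passes through when it reveals tokens one at a time in some order. The unmasking order is an assumed input here (a real sampler might choose it by confidence); this illustrates the gap only and does not reproduce PUMA's modified forward process.

```python
import random


def random_mask(seq_len, t, rng):
    """Standard MDM-style forward masking: each position masked independently w.p. t."""
    return [rng.random() < t for _ in range(seq_len)]


def progressive_masks(unmask_order):
    """Masks visited when an inference-time sampler reveals positions in a fixed order.

    Step k has the first k positions of `unmask_order` revealed (False) and the
    rest still masked (True), so the masks form a nested chain.
    """
    seq_len = len(unmask_order)
    masks = []
    for k in range(seq_len + 1):
        revealed = set(unmask_order[:k])
        masks.append([pos not in revealed for pos in range(seq_len)])
    return masks


rng = random.Random(0)
print(random_mask(6, 0.5, rng))
for m in progressive_masks([2, 0, 1]):
    print(m)
```

For n positions there are 2^n possible masks, but one decoding trajectory visits only n + 1 of them, which is the sense in which uniformly random masking spends training effort on patterns inference may never produce.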

2602.10106 2026-06-05 cs.RO

EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration

Modi Shi, Shijia Peng, Jin Chen, Haoran Jiang, Tianyu Li, Di Huang, Ping Luo, Hongyang Li, Li Chen

Affiliations: University of California, Berkeley; Tsinghua University

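AI summary: This paper proposes the EgoHumanoid framework, which co-trains a vision-language-action policy on abundant egocentric human demonstrations and a small amount of robot data, enabling humanoid robots to perform diverse loco-manipulation tasks in real-world environments. Experiments show that robot-free data significantly improves performance, especially in unseen environments.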

Comments: Project page: https://opendrivelab.com/EgoHumanoid

Abstract

Human demonstrations offer rich environmental diversity and scale naturally, making them an appealing alternative to robot teleoperation. While this paradigm has advanced robot-arm manipulation, its potential for the more challenging, data-hungry problem of humanoid loco-manipulation remains largely unexplored. We present EgoHumanoid, the first framework to co-train a vision-language-action policy using abundant egocentric human demonstrations together with a limited amount of robot data, enabling humanoids to perform loco-manipulation across diverse real-world environments. To bridge the embodiment gap between humans and robots, including discrepancies in physical morphology and viewpoint, we introduce a systematic alignment pipeline spanning from hardware design to data processing. We develop a portable system for scalable human data collection and establish practical collection protocols to improve transferability. At the core of our human-to-humanoid alignment pipeline lie two key components. View alignment reduces visual domain discrepancies caused by variations in camera height and perspective. Action alignment maps human motions into a unified, kinematically feasible action space for humanoid control. Extensive real-world experiments demonstrate that policies co-trained with robot-free egocentric data significantly outperform robot-only baselines, by 51%, particularly in unseen environments. Our analysis further reveals which behaviors transfer effectively and the potential for scaling human data.
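
The action-alignment idea (mapping human motion into a feasible robot action space) can be sketched, under strong simplifying assumptions, as a per-wrist retargeting step: express the wrist relative to the shoulder, rescale by the arm-length ratio, then clamp to the robot's reach. The function name, the spherical workspace, and the 0.95 reach margin are all hypothetical; EgoHumanoid's actual pipeline is described in the paper and is more involved.

```python
import math


def retarget_wrist(human_wrist, human_shoulder, human_arm_len, robot_arm_len,
                   robot_shoulder, max_reach_ratio=0.95):
    """Map a human wrist position (3D, metres) to a robot wrist target.

    1. Express the wrist relative to the shoulder.
    2. Rescale by the arm-length ratio to account for morphology.
    3. Clamp to a reachable sphere so the target stays kinematically feasible.
    """
    rel = [w - s for w, s in zip(human_wrist, human_shoulder)]
    scale = robot_arm_len / human_arm_len
    rel = [scale * c for c in rel]
    dist = math.sqrt(sum(c * c for c in rel))
    limit = max_reach_ratio * robot_arm_len
    if dist > limit:
        rel = [c * limit / dist for c in rel]  # project back onto the reach sphere
    return [s + c for s, c in zip(robot_shoulder, rel)]


print(retarget_wrist([0.45, 0.1, -0.2], [0.0, 0.0, 0.0], 0.6, 0.3, [0.0, 0.0, 1.0]))
```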

2602.09574 2026-06-05 cs.CL cs.AI cs.LG

Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs

Sora Miyamoto, Daisuke Oba, Naoaki Okazaki

Affiliations: University of Tokyo

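AI summary: This paper proposes Budget-Guided MCTS (BG-MCTS), a tree-search decoding algorithm that aligns the search policy with the remaining token budget to improve reasoning performance across different token budgets.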

Comments: Accepted at ICML 2026. Code: https://github.com/Sora-Miyamoto/bg-mcts
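
As a hedged illustration of coupling tree search to a token budget, here is a UCT-style selection rule whose exploration bonus is scaled by the fraction of budget remaining, so the search widens early and commits as tokens run out. The linear scaling is an assumption made for this sketch; it is not BG-MCTS's actual selection rule, which the summary above does not specify.

```python
import math


def budget_aware_uct(children, remaining_budget, total_budget, c=1.4):
    """Pick a child index using a UCT score whose exploration term shrinks with budget.

    `children` is a list of (visits, mean_value) pairs; every child is assumed
    to have been visited at least once.
    """
    frac = remaining_budget / total_budget  # 1.0 at the start, 0.0 when exhausted
    parent_visits = sum(v for v, _ in children)
    scores = [
        mean + c * frac * math.sqrt(math.log(parent_visits) / visits)
        for visits, mean in children
    ]
    return max(range(len(children)), key=lambda i: scores[i])


children = [(10, 0.7), (1, 0.5)]
print(budget_aware_uct(children, remaining_budget=4096, total_budget=4096))  # → 1 (explores)
print(budget_aware_uct(children, remaining_budget=40, total_budget=4096))    # → 0 (exploits)
```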

2602.08749 2026-06-05 cs.CV

Shifting the Breaking Point of Flow Matching for Multi-Instance Editing

Carmine Zaccagnino, Fabio Quattrini, Enis Simsar, Marta Tintoré Gazulla, Rita Cucchiara, Alessio Tonioni, Silvia Cascianelli

Affiliations: University of Bologna

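AI summary: To address semantic entanglement when flow matching models perform multi-instance editing, this paper proposes Instance-Disentangled Attention, which partitions joint attention operations to enforce binding between instance-specific text instructions and spatial regions, enabling instance-level editing in a single forward pass.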

Comments: Accepted at ICML 2026

Abstract

Flow matching models have recently emerged as an efficient alternative to diffusion, especially for text-guided image generation and editing, offering faster inference through continuous-time dynamics. However, existing flow-based editors predominantly support global or single-instruction edits and struggle with multi-instance scenarios, where multiple parts of a reference input must be edited independently without semantic interference. We identify this limitation as a consequence of globally conditioned velocity fields and joint attention mechanisms, which entangle concurrent edits. To address this issue, we introduce Instance-Disentangled Attention, a mechanism that partitions joint attention operations, enforcing binding between instance-specific textual instructions and spatial regions during velocity field estimation. We evaluate our approach on both natural image editing and a newly introduced benchmark of text-dense infographics with region-level editing instructions. Experimental results demonstrate that our approach promotes edit disentanglement and locality while preserving global output coherence, enabling single-pass, instance-level editing.
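
A minimal NumPy sketch of partitioned joint attention follows, assuming each text and image token has already been assigned an instance id (for example, image tokens from a region mask and text tokens from the instruction they belong to). The id-0 "shared" convention and the function names are assumptions for illustration; the paper's exact partitioning scheme may differ.

```python
import numpy as np


def instance_attention_mask(text_instance_ids, image_instance_ids):
    """Boolean mask over joint [text; image] tokens (True = may attend).

    Tokens of the same instance see each other; id 0 marks shared tokens
    (e.g. background) that see and are seen by everything. Tokens of two
    different non-zero instances are blocked from each other.
    """
    ids = np.concatenate([text_instance_ids, image_instance_ids])
    same = ids[:, None] == ids[None, :]
    shared = (ids[:, None] == 0) | (ids[None, :] == 0)
    return same | shared


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed pairs set to -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # each row keeps at least itself
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


text_ids = [1, 2]         # instruction 1, instruction 2
image_ids = [1, 1, 2, 0]  # two patches of region 1, one of region 2, one background
mask = instance_attention_mask(text_ids, image_ids)
rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(6, 4))
out = masked_attention(q, k, v, mask)
print(mask.astype(int))
```

Because blocked pairs receive exactly zero weight, changing the value vector of an instance-2 token cannot affect the output for an instance-1 instruction token, which is the binding property the abstract describes.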

2602.08503 2026-06-05 cs.CV cs.CL cs.LG

Learning Self-Correction in Vision-Language Models via Rollout Augmentation

Yi Ding, Ziliang Qiu, Bolian Li, Ruqi Zhang

Affiliations: University of Science and Technology of China

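AI summary: This paper proposes Octopus, a rollout-augmentation framework for reinforcement learning that recombines existing rollouts to generate dense self-correction examples, improving sample efficiency and stabilizing RL optimization. It also introduces a response-masking strategy that decouples self-correction from direct reasoning, achieving SOTA performance among open-source VLMs on 7 benchmarks.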

Comments: 18 pages

Journal ref: ICML 2026

Abstract

Self-correction is essential for solving complex reasoning problems in vision-language models (VLMs). However, existing reinforcement learning (RL) methods struggle to learn it, as effective self-correction behaviors emerge only rarely, making learning signals extremely sparse. To address this challenge, we propose correction-specific rollouts (Octopus), an RL rollout augmentation framework that synthesizes dense self-correction examples by recombining existing rollouts. This augmentation simultaneously improves sample efficiency due to rollout reuse and stabilizes RL optimization through balanced supervision. Furthermore, we introduce a response-masking strategy that decouples self-correction from direct reasoning, avoiding signal conflicts and enabling both behaviors to be learned effectively. Building on this, we introduce Octopus-8B, a reasoning VLM with controllable self-correction capability. Across 7 benchmarks, it achieves SoTA performance among open-source VLMs, outperforming the best RLVR baseline by 1.0 point while requiring only $0.72\times$ training time per step.
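
The recombination idea can be sketched as follows, assuming the rollouts for one prompt have already been labelled correct or incorrect by a verifier: splice a failed attempt, a correction cue, and a successful attempt into a new self-correction example, and mask the failed prefix out of the loss. The marker string, the all-pairs pairing, and the segment-level mask are hypothetical simplifications; Octopus's actual construction and its response-masking strategy are described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    text: str
    correct: bool


# Hypothetical cue inserted between the first attempt and the correction.
CORRECTION_MARKER = "\nWait, let me re-check that.\n"


def synthesize_self_correction(rollouts):
    """Pair each incorrect rollout with each correct one for the same prompt.

    Returns (segments, loss_mask) pairs: the flawed first attempt is kept as
    context but excluded from the loss, so only the correction is reinforced.
    """
    wrong = [r for r in rollouts if not r.correct]
    right = [r for r in rollouts if r.correct]
    examples = []
    for w in wrong:
        for c in right:
            segments = [w.text, CORRECTION_MARKER, c.text]
            loss_mask = [False, True, True]  # no loss on the flawed attempt
            examples.append((segments, loss_mask))
    return examples


rollouts = [Rollout("x = 3", False), Rollout("x = 4", True), Rollout("x = 5", False)]
for segments, mask in synthesize_self_correction(rollouts):
    print(mask, "".join(segments).replace("\n", " | "))
```

With two incorrect rollouts and one correct one, the sketch yields two self-correction examples without sampling any new text, which is the sense in which recombination reuses existing rollouts.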

2602.07834 2026-06-05 cs.LG math.DG

Interpretable Analytic Calabi-Yau Metrics via Symbolic Distillation

D Yang Eng

Affiliations: D Yang Eng

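AI summary: This paper studies how the pointwise determinant ratio of a Calabi-Yau metric can be described compactly using a small number of projective invariants. Using symbolic regression, it finds that low-order symmetric features capture most of the teacher's variation, and it verifies that the description remains consistent across a range of complex-structure moduli.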

Abstract

The pointwise determinant ratio \[ R_\psi(z)\equiv \log\!\left(\frac{\det g_{\mathrm{RF}}(z;\psi)}{\det g_{\mathrm{FS}}(z)}\right) \] measures how the Ricci-flat metric on the Dwork quintic departs from the Fubini-Study baseline. We ask whether this scalar observable can be described compactly in terms of a small number of projective invariants, and whether the same scaffold remains usable across complex-structure moduli. Using Donaldson's $k=10$ balanced metric as an algebraic teacher and symbolic regression on sampled points, we find that, within the restricted moduli-only feature class studied here, two low-order symmetric features, the power sum $p_2=\sum_i |z_i|^4$ and the cubic elementary symmetric polynomial $\sigma_3=e_3$, already capture most of the teacher variation. A degree-3 polynomial in $(p_2,\sigma_3)$ achieves held-out test $R^2=0.946$, while adding the remaining low-order symmetric generators changes this by less than $10^{-3}$. Within the same two-feature space, symbolic regression identifies a five-term rational-polynomial expression that matches the $k=10$ teacher with $R^2=0.9994$. Refitting the same functional scaffold across $\psi\in[0,0.8]$ keeps the mean determinant-ratio proxy $\langle R_\psi\rangle$ within $0.01\%$ of the local teachers on the sampled point clouds and yields smoothly varying fitted coefficients over the studied range. The holomorphic Yukawa coupling $\kappa_{111}=5$ is reproduced as a normalization check only. Taken together, these results provide a compact symbolic description of one metric-derived scalar observable on the Dwork family, while remaining bounded by the finite-$k$ teacher used for distillation rather than establishing a closed-form Ricci-flat metric.
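
The two features and the degree-3 polynomial basis can be sketched in NumPy. This assumes the point is normalised to unit norm and that $\sigma_3$ is the third elementary symmetric polynomial of $x_i = |z_i|^2$ (so that $p_2 = \sum_i x_i^2 = \sum_i |z_i|^4$); the abstract does not spell out what $e_3$ is evaluated on, and the sketch does not check that $z$ lies on the Dwork quintic.

```python
import itertools

import numpy as np


def symmetric_features(z):
    """Return (p2, sigma3) for a point z in C^5 after normalising to unit norm.

    Assumes x_i = |z_i|^2, p2 = sum_i x_i^2 = sum_i |z_i|^4, and
    sigma3 = e3(x) = sum over i < j < k of x_i * x_j * x_k.
    """
    z = np.asarray(z, dtype=complex)
    z = z / np.linalg.norm(z)  # fix the projective scale; phases drop out in |z_i|
    x = np.abs(z) ** 2
    p2 = float(np.sum(x**2))
    sigma3 = float(sum(x[i] * x[j] * x[k] for i, j, k in itertools.combinations(range(5), 3)))
    return p2, sigma3


def cubic_design_row(p2, sigma3):
    """All monomials p2^a * sigma3^b with a + b <= 3 (10 terms, including 1)."""
    return [p2**a * sigma3**b for a in range(4) for b in range(4 - a)]


print(symmetric_features([1, 1, 1, 1, 1]))
```

With unit normalisation the first power sum $\sum_i x_i$ is always 1, which is why $p_2$ is the lowest non-trivial power sum. Stacking `cubic_design_row` outputs into a matrix `X` and calling `np.linalg.lstsq(X, y, rcond=None)` against teacher values `y` would give an ordinary least-squares version of the degree-3 fit.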

2602.07428 2026-06-05 cs.CV

Row-Column Separated Attention Based Low-Light Image/Video Enhancement

Chengqi Dong, Zhiyuan Cao, Tuoshi Qi, Kexin Wu, Yixing Gao, Fan Tang

Affiliations: School of Artificial Intelligence, Jilin University, China; College of Software, Jilin University, China; Institute of Computing Technology, Chinese Academy of Sciences, China

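AI summary: This paper proposes a Row-Column Separated Attention (RCSA) module that improves a U-Net for low-light image and video enhancement, using global information to guide local information with fewer parameters and less computation, and proposes two temporal loss functions to maintain temporal consistency.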

DOI: 10.1111/cgf.15192

Abstract

U-Net structures are widely used for low-light image/video enhancement. Without proper guidance from global information, however, the enhanced images contain regions with strong local noise and lose detail. Attention mechanisms can better focus on and exploit global information, but applying attention over whole images can significantly increase the number of parameters and computations. We propose a Row-Column Separated Attention module (RCSA) inserted after an improved U-Net. The inputs to the RCSA module are the mean and maximum of each row and column of the feature map, which lets it use global information to guide local information with fewer parameters. We also propose two temporal loss functions to apply the method to low-light video enhancement while maintaining temporal consistency. Extensive experiments on the LOL, MIT Adobe FiveK image, and SDSD video datasets demonstrate the effectiveness of our approach. The code is publicly available at https://github.com/cq-dong/URCSA.
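
Below is a minimal NumPy sketch of the row/column idea, assuming the gating is a per-channel linear mix of the mean and max descriptors followed by a sigmoid. Only the inputs (row and column means and maxima) come from the abstract; the gating form, weight shapes, and names are assumptions, and the published RCSA module may combine these statistics differently.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def row_column_attention(feat, w_row, w_col):
    """Reweight a C x H x W feature map using row and column statistics.

    Row descriptors: mean and max over width   -> shape (C, H, 2)
    Column descriptors: mean and max over height -> shape (C, W, 2)
    Each descriptor pair is mixed by per-channel weights (w_row, w_col of
    shape (C, 2)) and squashed into (0, 1) gates.
    """
    row_desc = np.stack([feat.mean(axis=2), feat.max(axis=2)], axis=-1)  # (C, H, 2)
    col_desc = np.stack([feat.mean(axis=1), feat.max(axis=1)], axis=-1)  # (C, W, 2)
    row_gate = sigmoid(np.einsum("chk,ck->ch", row_desc, w_row))  # (C, H)
    col_gate = sigmoid(np.einsum("cwk,ck->cw", col_desc, w_col))  # (C, W)
    return feat * row_gate[:, :, None] * col_gate[:, None, :]


rng = np.random.default_rng(0)
C, H, W = 8, 32, 48
feat = rng.normal(size=(C, H, W))
w_row = rng.normal(size=(C, 2))
w_col = rng.normal(size=(C, 2))
out = row_column_attention(feat, w_row, w_col)
print(out.shape)
```

In this form the module has 4C weights for C channels, independent of image size, whereas full spatial attention would scale with (H·W)^2 interactions; every output pixel still depends on statistics of its entire row and column.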

2602.07253 2026-06-05 cs.AI cs.CL

From Out-of-Distribution Detection to Hallucination Detection: A Geometric View

Litian Liu, Reza Pourreza, Yubing Jian, Yao Qin, Roland Memisevic

Affiliations: University of California, Berkeley

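AI summary: By reframing hallucination detection as an out-of-distribution detection problem, this paper takes a geometric perspective to propose a training-free, single-sample detection method that achieves high accuracy on reasoning tasks.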

Comments: ICML 2026 main conference paper

Abstract

Detecting hallucinations in large language models is a critical open problem with significant implications for safety and reliability. While existing hallucination detection methods achieve strong performance in question-answering tasks, they remain less effective on tasks requiring reasoning. In this work, we revisit hallucination detection through the lens of out-of-distribution (OOD) detection, a well-studied problem in areas like computer vision. Treating next-token prediction in language models as a classification task allows us to apply OOD techniques, provided appropriate modifications are made to account for the structural differences in large language models. We show that OOD-based approaches yield training-free, single-sample-based detectors, achieving strong accuracy in hallucination detection for reasoning tasks. Overall, our work suggests that reframing hallucination detection as OOD detection provides a promising and scalable pathway toward language model safety.
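
To make the "next-token prediction as classification" framing concrete, here is a sketch of two classical training-free, single-sample OOD scores, maximum softmax probability (MSP) and the energy score, computed from next-token logits and averaged over a generated answer. These are standard OOD baselines shown for illustration, not the geometric detector proposed in the paper, and the mean aggregation over tokens is an assumption.

```python
import math


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]


def max_softmax_prob(logits):
    """MSP: probability of the predicted token (lower -> more OOD-like)."""
    return math.exp(max(log_softmax(logits)))


def energy(logits, temperature=1.0):
    """Energy score E = -T * logsumexp(logits / T) (higher -> more OOD-like)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def sequence_score(per_token_logits):
    """Aggregate token-level MSP into one confidence score per answer (mean)."""
    scores = [max_softmax_prob(l) for l in per_token_logits]
    return sum(scores) / len(scores)


print(sequence_score([[0.0, 0.0], [10.0, 0.0]]))
```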

2602.06773 2026-06-05 cs.LG stat.ML

On the Convergence of Multicalibration Gradient Boosting

Daniel Haimovich, Fridolin Linder, Lorenzo Perini, Niek Tax, Milan Vojnovic

Affiliations: Meta; LSE, Department of Statistics

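AI summary: This paper studies the convergence of multicalibration gradient boosting, proving that the magnitude of prediction updates decays as O(1/√T) and that convergence becomes linear under additional smoothness assumptions; experiments validate the theory and show that the method converges quickly.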

Comments: Under submission
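
A toy version of the procedure helps make "magnitude of prediction updates" concrete. The sketch below assumes group-indicator weak learners and squared loss, and greedily corrects the group with the largest mean residual. It is a simplified stand-in for multicalibration gradient boosting (which usually also conditions on the prediction value itself), not the algorithm analysed in the paper, and the data are made up.

```python
def multicalibration_boost(y, groups, f0=0.5, rounds=2000):
    """Greedy boosting that repeatedly fixes the worst-calibrated group.

    y: list of 0/1 labels; groups: dict name -> list of 0/1 membership flags.
    Each round picks the group with the largest |mean residual| and shifts its
    members' predictions by that residual (an exact coordinate step for
    squared loss). Returns final predictions and the sequence of update sizes.
    """
    n = len(y)
    f = [f0] * n
    update_sizes = []
    for _ in range(rounds):
        best_name, best_shift = None, 0.0
        for name, member in groups.items():
            idx = [i for i in range(n) if member[i]]
            shift = sum(y[i] - f[i] for i in idx) / len(idx)
            if abs(shift) > abs(best_shift):
                best_name, best_shift = name, shift
        if best_name is None:  # every group is already exactly calibrated
            break
        for i in range(n):
            if groups[best_name][i]:
                f[i] += best_shift
        update_sizes.append(abs(best_shift))
    return f, update_sizes


y = [1, 1, 0, 0, 1, 0]
groups = {"all": [1] * 6, "A": [1, 1, 1, 0, 0, 0], "B": [0, 1, 0, 1, 1, 0]}
f, update_sizes = multicalibration_boost(y, groups)
print(update_sizes[:5], [round(p, 3) for p in f])
```

Printing `update_sizes` shows the steps shrinking as every group's mean residual approaches zero; the paper's result concerns how fast such update magnitudes decay in the real algorithm.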

2602.05843 2026-06-05 cs.CL

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Hang Yan, Fangzhi Xu, Qiushi Sun, Jinyang Wu, Zixian Huang, Muye Huang, Jingyang Gong, Zichen Ding, Kanzhi Cheng, Yian Wang, Xinyu Che, Zeyi Sun, Jian Zhang, Zhangyue Yin, Haoran Luo, Ben Kao, Qika Lin

Affiliations: National University of Singapore

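AI summary: This paper proposes OdysseyArena, which evaluates large language models through long-horizon, active, and inductive interactions. It provides 120 tasks that measure inductive efficiency and long-horizon discovery, uses OdysseyArena-Challenge to test model stability at extreme interaction horizons, and reveals bottlenecks in the inductive abilities of frontier models in complex environments.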

Comments: 34 pages

Abstract

The rapid advancement of Large Language Models (LLMs) has catalyzed the development of autonomous agents capable of navigating complex environments. However, existing evaluations primarily adopt a deductive paradigm, where agents execute tasks based on explicitly provided rules and static goals, often within limited planning horizons. Crucially, this neglects the inductive necessity for agents to discover latent transition laws from experience autonomously, which is the cornerstone for enabling agentic foresight and sustaining strategic coherence. To bridge this gap, we introduce OdysseyArena, which re-centers agent evaluation on long-horizon, active, and inductive interactions. We formalize and instantiate four primitives, translating abstract transition dynamics into concrete interactive environments. Building upon this, we establish OdysseyArena-Lite for standardized benchmarking, providing a set of 120 tasks to measure an agent's inductive efficiency and long-horizon discovery. Pushing further, we introduce OdysseyArena-Challenge to stress-test agent stability across extreme interaction horizons (e.g., > 200 steps). Extensive experiments on 15+ leading LLMs reveal that even frontier models exhibit a deficiency in inductive scenarios, identifying a critical bottleneck in the pursuit of autonomous discovery in complex environments. Our code and data are available at https://github.com/xufangzhi/Odyssey-Arena
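
The contrast between deductive and inductive evaluation can be illustrated with a toy in which the agent is never told the transition rule and must narrow it down from its own interactions. The affine-modular "law" and the brute-force hypothesis elimination below are hypothetical and far simpler than OdysseyArena's four primitives.

```python
import itertools


def make_hidden_env(a, b, m):
    """Hypothetical environment with a latent transition law s' = (a*s + b) mod m."""
    return lambda s: (a * s + b) % m


def induce_rule(observations, m):
    """Return all (a, b) hypotheses consistent with the observed (s, s') pairs."""
    return [
        (a, b)
        for a, b in itertools.product(range(m), repeat=2)
        if all((a * s + b) % m == s_next for s, s_next in observations)
    ]


m = 7
step = make_hidden_env(3, 2, m)  # the rule the agent must discover
observations = []
state = 1
for _ in range(3):  # interact, record a transition, shrink the hypothesis space
    nxt = step(state)
    observations.append((state, nxt))
    state = nxt
    print(len(induce_rule(observations, m)), "hypotheses remain")
```

Here one transition leaves 7 candidate rules and a second pins the law down uniquely; realistic environments have far larger hypothesis spaces, which is where the long-horizon, active exploration the benchmark targets becomes necessary.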

2602.03410 2026-06-05 cs.CV

UnHype: CLIP-Guided Hypernetworks for Dynamic LoRA Unlearning

Piotr Wójcik, Maksym Petrenko, Wojciech Gromski, Przemysław Spurek, Maciej Zieba

Affiliations: Institute of Computer Science, University of Warsaw

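AI summary: This paper proposes the UnHype framework, which incorporates hypernetworks into single- and multi-concept LoRA training. It addresses the limitations of conventional LoRA methods, namely poor adaptability to concept semantics, difficulty balancing the removal of closely related concepts against preserving generalization, and poor scalability when many concepts must be removed at once, and demonstrates effectiveness on object erasure, celebrity erasure, and explicit content removal.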

Comments: 23 pages, 11 figures. Accepted at ICML 2026. Code: https://github.com/gmum/UnHype/ Project Page: https://gmum.github.io/UnHype/

Abstract

Recent advances in large-scale diffusion models have intensified concerns about their potential misuse, particularly in generating realistic yet harmful or socially disruptive content. This challenge has spurred growing interest in effective machine unlearning, the process of selectively removing specific knowledge or concepts from a model without compromising its overall generative capabilities. Among various approaches, Low-Rank Adaptation (LoRA) has emerged as an effective and efficient method for fine-tuning models toward targeted unlearning. However, LoRA-based methods often exhibit limited adaptability to concept semantics and struggle to balance removing closely related concepts with maintaining generalization across broader meanings. Moreover, these methods face scalability challenges when multiple concepts must be erased simultaneously. To address these limitations, we introduce UnHype, a framework that incorporates hypernetworks into single- and multi-concept LoRA training. The proposed architecture can be directly plugged into Stable Diffusion as well as modern flow-based text-to-image models, where it demonstrates stable training behavior and effective concept control. During inference, the hypernetwork dynamically generates adaptive LoRA weights based on the CLIP embedding, enabling more context-aware, scalable unlearning. We evaluate UnHype across several challenging tasks, including object erasure, celebrity erasure, and explicit content removal, demonstrating its effectiveness and versatility. See the code on GitHub: https://github.com/gmum/UnHype.
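
The core mechanism, generating LoRA weights from an embedding instead of storing one fixed adapter per concept, can be sketched with linear generators in NumPy. The linear hypernetwork, the zero-initialised B generator (a common LoRA convention), and the random vector standing in for a CLIP embedding are all assumptions; UnHype's actual architecture and training objective are described in the paper.

```python
import numpy as np


class LoRAHypernetwork:
    """Maps a concept embedding to LoRA factors for one linear layer.

    Produces A (rank x d_in) and B (d_out x rank) so that the adapted weight is
    W + (alpha / rank) * B @ A. B's generator starts at zero, so an untrained
    hypernetwork leaves the base layer unchanged.
    """

    def __init__(self, d_embed, d_in, d_out, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.d_in, self.d_out, self.rank, self.alpha = d_in, d_out, rank, alpha
        self.gen_A = rng.normal(0.0, 0.02, size=(d_embed, rank * d_in))
        self.gen_B = np.zeros((d_embed, d_out * rank))

    def lora_factors(self, embedding):
        A = (embedding @ self.gen_A).reshape(self.rank, self.d_in)
        B = (embedding @ self.gen_B).reshape(self.d_out, self.rank)
        return A, B

    def adapted_weight(self, W, embedding):
        A, B = self.lora_factors(embedding)
        return W + (self.alpha / self.rank) * B @ A


hyper = LoRAHypernetwork(d_embed=8, d_in=16, d_out=12)
W = np.ones((12, 16))
e = np.random.default_rng(1).normal(size=8)  # stand-in for a CLIP text embedding
W_new = hyper.adapted_weight(W, e)
print(W_new.shape)
```

Because the factors are a function of the embedding, one set of generator weights can serve many concepts, and each generated update still has rank at most `rank`.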

2602.02680 2026-06-05 cs.LG

FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment

Riccardo Zaccone, Stefanos Laskaridis, Marco Ciccone, Samuel Horváth

Affiliations: University of California, Berkeley

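AI summary: Proposes FlexRank, which uses nested low-rank weight decomposition with importance-based consolidation to extract submodels of varying capability from a pretrained model, enabling "train once, deploy everywhere" adaptive deployment.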

Comments: Accepted at ICML 2026 (Spotlight)

Journal ref: Proceedings of the 43rd International Conference on Machine Learning, PMLR, 2026

Abstract

The growing scale of deep neural networks, encompassing large language models (LLMs) and vision transformers (ViTs), has made training from scratch prohibitively expensive and deployment increasingly costly. These models are often used as computational monoliths with fixed cost, hindering adaptive deployment across different cost budgets. We argue that nested components, ordered by importance, can be extracted from pretrained models and selectively activated within the available computational budget. To this end, our proposed FlexRank method leverages low-rank weight decomposition with nested, importance-based consolidation to extract submodels of increasing capabilities. Our approach enables a "train-once, deploy-everywhere" paradigm that offers a graceful trade-off between cost and performance without training from scratch for each budget, advancing the practical deployment of large models.
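
A plain truncated SVD already has the nested, importance-ordered structure described above: every smaller-rank submodel is a prefix of the larger one, so a single factorisation serves every budget. The sketch below uses singular values as the importance score, which is an assumption; FlexRank's importance-based consolidation is more involved and is described in the paper.

```python
import numpy as np


def nested_low_rank(W):
    """Factor W once with SVD; components come out ordered by singular value."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt


def submodel_weight(U, s, Vt, rank):
    """Rank-k submodel: keep the k most important components (a nested prefix)."""
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
U, s, Vt = nested_low_rank(W)
errors = {}
for k in (4, 8, 16, 32):
    errors[k] = np.linalg.norm(W - submodel_weight(U, s, Vt, k)) / np.linalg.norm(W)
    print(f"rank {k:2d}: params {k * (64 + 32):5d}, relative error {errors[k]:.3f}")
```

By the Eckart-Young theorem each truncation is the best approximation of its rank in Frobenius norm, so error falls monotonically as the budget (parameter count `k * (64 + 32)`) grows.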

2602.02241 2026-06-05 cs.LG

Variational Entropic Optimal Transport

Roman Dyachenko, Nikita Gushchin, Kirill Sokolov, Petr Mokrov, Evgeny Burnaev, Alexander Korotin

Affiliations: Lomonosov Moscow State University; National Research Nuclear University MEPhI

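AI summary: Proposes Variational Entropic Optimal Transport (VarEOT), which uses an exact variational reformulation to turn the log-partition function into a tractable minimization, enabling efficient optimal transport learning without MCMC simulation. The method comes with finite-sample generalization bounds and universal function approximation results, and shows competitive or improved translation quality on synthetic data and unpaired image-to-image translation.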

Abstract

Entropic optimal transport (EOT) in continuous spaces with quadratic cost is a classical tool for solving the domain translation problem. In practice, recent approaches optimize a weak dual EOT objective that depends on a single potential, but doing so is computationally inefficient because of the intractable log-partition term. Existing methods typically resolve this obstacle in one of two ways: by significantly restricting the transport family to obtain closed-form normalization (via Gaussian-mixture parameterizations), or by using general neural parameterizations that require simulation-based training procedures. We propose Variational Entropic Optimal Transport (VarEOT), based on an exact variational reformulation of the log-partition $\log \mathbb{E}[\exp(\cdot)]$ as a tractable minimization over an auxiliary log-normalizer. This yields a differentiable learning objective optimized with stochastic gradients and avoids the need for MCMC simulations during training. We provide theoretical guarantees, including finite-sample generalization bounds and approximation results under universal function approximation. Experiments on synthetic data and unpaired image-to-image translation demonstrate competitive or improved translation quality, while comparisons among solvers that use the same weak dual EOT objective support the benefit of the proposed optimization principle. The code for our solver can be found at https://github.com/DrEternity/VarEOT.
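One standard exact variational form of the log-partition is $\log \mathbb{E}[e^{f}] = \min_{c}\{c + \mathbb{E}[e^{f-c}] - 1\}$, whose minimizer is $c^\star = \log \mathbb{E}[e^{f}]$ (set the derivative $1 - \mathbb{E}[e^{f-c}]$ to zero). Whether VarEOT uses exactly this form is an assumption here. The sketch only checks numerically that minimizing over the scalar $c$, a stand-in for the auxiliary log-normalizer, recovers the log-partition of an empirical sample. Because the objective is an expectation, its gradient in $c$ could equally be estimated on minibatches.

```python
import math

def log_mean_exp(fs):
    """Numerically stable log E[exp(f)] over an empirical sample."""
    m = max(fs)
    return m + math.log(sum(math.exp(f - m) for f in fs) / len(fs))


def variational_objective(c, fs):
    """c + E[exp(f - c)] - 1: convex in c, minimized at c = log E[exp(f)]."""
    return c + sum(math.exp(f - c) for f in fs) / len(fs) - 1.0


def fit_log_normalizer(fs, lr=0.5, steps=200):
    """Plain gradient descent on the scalar c.
    Starting at max(fs) keeps c above the optimum, where the curvature is <= 1,
    so a step size of 0.5 cannot overshoot."""
    c = max(fs)
    for _ in range(steps):
        grad = 1.0 - sum(math.exp(f - c) for f in fs) / len(fs)
        c -= lr * grad
    return c


fs = [0.3, -1.2, 2.0, 0.7]  # e.g. potential values evaluated on a sample
c_star = fit_log_normalizer(fs)
print(c_star, log_mean_exp(fs))  # the two values agree to many decimals
```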

2602.01196 2026-06-05 cs.LG

Unraveling the Hidden Dynamical Structure in Recurrent Neural Policies

Jin Li, Yue Wu, Mengsha Huang, Yuhao Sun, Hao He, Xianyuan Zhan

Affiliations: University of Science and Technology of China

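AI summary: By analyzing the hidden-state domain of recurrent policies learned across different training methods, model architectures, and tasks, this paper finds that stable cyclic structures consistently emerge during interaction with the environment. These structures resemble limit cycles in dynamical systems analysis, and their geometry corresponds to policy behavior, offering a new perspective for explaining the performance of recurrent policies.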

Abstract

Recurrent neural policies are widely used in partially observable control and meta-RL tasks. Their ability to maintain internal memory and adapt quickly to unseen scenarios gives them unparalleled performance compared with non-recurrent counterparts. However, the mechanisms underlying their superior generalization and robustness remain poorly understood. In this study, by analyzing the hidden-state domain of recurrent policies learned across a diverse set of training methods, model architectures, and tasks, we find that stable cyclic structures consistently emerge during interaction with the environment. If the policy and the environment are considered as a joint hybrid dynamical system, these cyclic structures bear a remarkable similarity to limit cycles in dynamical systems analysis. Moreover, we uncover that the geometry of these limit cycles has a structured correspondence with the policies' behaviors. These findings offer new perspectives on many desirable properties of recurrent policies: the emergence of limit cycles stabilizes both the policies' internal memory and the task-relevant environmental states while suppressing nuisance variability arising from environmental uncertainty, and the geometry of the limit cycles encodes relational structures of behaviors, facilitating skill adaptation in non-stationary environments.
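The notion of a limit cycle can be illustrated with a textbook two-dimensional system, the Hopf normal form, in which every trajectory except the fixed point at the origin is attracted to the same closed orbit (the unit circle). This is only a toy stand-in for a recurrent policy's hidden state coupled to an environment; the system, the Euler step size, and the horizon are assumptions chosen for illustration.

```python
import math

def step(x, y, dt=0.01):
    """One Euler step of the Hopf normal form dr/dt = r(1 - r^2), dtheta/dt = 1,
    written in Cartesian coordinates."""
    r2 = x * x + y * y
    dx = x - y - x * r2
    dy = x + y - y * r2
    return x + dt * dx, y + dt * dy


def final_radius(x0, y0, steps=5000):
    """Run the toy dynamics and return the distance from the origin at the end."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = step(x, y)
    return math.hypot(x, y)


# Trajectories started inside and outside the unit circle settle onto it.
for start in [(0.1, 0.0), (2.0, 0.0), (0.0, -1.5)]:
    print(start, round(final_radius(*start), 2))
```

The small residual offset from radius 1 comes from the explicit Euler discretization, not from the underlying dynamics.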

2602.00911 2026-06-05 cs.AI

Synapse: Federated Tool Routing via Typed Compendium Artifacts

Abhijit Chakraborty, Yash Shah, Vivek Gupta

Affiliations: MongoDB; Arizona State University

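AI summary: Proposes Synapse, a compendium for federated tool routing that uses typed federated artifacts to enable federated learning across clients, addressing privacy protection, conflict resolution, and cross-architecture transfer when clients run heterogeneous LLMs and share no data.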

Abstract

The unit of collaboration in federated learning determines which guarantees are even expressible. Flat units such as weights, prompts, and raw examples carry no type signature on which privacy, conflict resolution, or cross-model transfer can dispatch as well-defined operations. We propose typed federated artifacts: schema-validated objects whose declared field structure makes per-field differential privacy, schema-aware merging, and cross-architecture transfer first-class operations rather than heuristic approximations. We instantiate this as SYNAPSE, a compendium for federated tool routing across clients with frozen, heterogeneous LLMs and no shared data or weights, a setting that flat units cannot handle without either leaking gradients or discarding structure. The compendium admits a typed merge operator with field-wise conflict resolution, a formal DP guarantee on numeric metadata, and conditional retrieval-distortion and routing-stability results, empirically characterized on five distributions, including one where the contraction premise fails. A single compendium transfers across four LLM families (LLaMA 3.1-8B, LLaMA 3.2-3B, Mistral 7B, GPT-4o) with a loss of approximately 2 points, a capability that weight-sharing federation cannot provide without architectural matching.
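To make "typed artifact" concrete, here is a sketch of a hypothetical schema for one tool entry. Laplace noise is applied only to a numeric metadata field, and the merge resolves each field by its own rule. The field names, the merge rules, and the noise scale (`sensitivity / epsilon`, the standard Laplace-mechanism calibration) are assumptions for illustration, not SYNAPSE's actual schema or operators.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToolCard:
    """Hypothetical typed artifact describing one tool (schema invented for illustration)."""
    name: str
    version: int
    tags: frozenset = field(default_factory=frozenset)
    success_rate: float = 0.0  # numeric metadata: the only field that receives DP noise
    n_calls: int = 0


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(card, epsilon, sensitivity, rng):
    """Per-field DP sketch: perturb only success_rate. Clamping to [0, 1] is
    post-processing and does not weaken the Laplace-mechanism guarantee."""
    noisy = card.success_rate + laplace(sensitivity / epsilon, rng)
    return ToolCard(card.name, card.version, card.tags,
                    min(1.0, max(0.0, noisy)), card.n_calls)


def merge(a, b):
    """Field-wise conflict resolution: union the tags, take the call-weighted
    success rate, and keep the newer version number."""
    assert a.name == b.name, "only cards for the same tool can be merged"
    total = a.n_calls + b.n_calls
    rate = (a.success_rate * a.n_calls + b.success_rate * b.n_calls) / total if total else 0.0
    return ToolCard(a.name, max(a.version, b.version), a.tags | b.tags, rate, total)


a = ToolCard("search", 1, frozenset({"web"}), 0.8, 10)
b = ToolCard("search", 2, frozenset({"docs"}), 0.5, 30)
print(merge(a, b))
print(privatize(a, epsilon=1.0, sensitivity=0.1, rng=random.Random(0)))
```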

2511.20102 2026-06-05 cs.CL

SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space

Zhenyi Shen, Junru Lu, Lin Gui, Jiazheng Li, Yulan He, Di Yin, Xing Sun

Affiliations: King’s College London; Tencent Youtu Lab

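AI summary: Proposes SSA, a training framework that uses bidirectional attention-output alignment to address both the attention gap and the capability gap of sparse attention, achieving performance comparable to full attention.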

Comments 34 pages

Abstract

Sparse attention reduces the quadratic complexity of full self-attention but faces two challenges: (1) an attention gap, where applying sparse attention to full-attention-trained models causes performance degradation due to train-inference distribution mismatch, and (2) a capability gap, where models trained purely with sparse attention lack complete gradient flow, preventing them from matching full-attention performance. We propose SSA (Sparse Sparse Attention), a training framework that integrates both sparse and full attention with bidirectional attention-output alignment. We prove that the approximation error scales linearly with the attention mass dropped under sparse attention, and show that SSA's alignment objective substantially reduces this quantity compared to baselines. Experiments demonstrate that SSA achieves state-of-the-art performance under both inference modes, adapts smoothly to varying sparsity budgets, and demonstrates superior long-context capabilities.
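The link between approximation error and dropped attention mass can be checked on a single query. Let $p$ be the full softmax weights, $S$ the kept keys, $P_S=\sum_{i\in S}p_i$ and $m=1-P_S$. Then $o-o_S=-\tfrac{m}{P_S}\sum_{i\in S}p_iv_i+\sum_{i\notin S}p_iv_i$, and each part has norm at most $m\max_i\|v_i\|$, so $\|o-o_S\|\le 2m\max_i\|v_i\|$. This elementary bound is consistent with, but not necessarily identical to, the paper's result. The sketch below uses top-k selection as an assumed sparsity pattern.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(weights, values):
    """Weighted sum of value vectors."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def topk_sparse_attention(scores, values, k):
    """Keep the k highest-scoring keys, renormalize over them, and also
    return the attention mass that the full softmax put on the dropped keys."""
    p = softmax(scores)
    keep = set(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])
    kept_mass = sum(p[i] for i in keep)
    weights = [p[i] / kept_mass if i in keep else 0.0 for i in range(len(p))]
    return attend(weights, values), 1.0 - kept_mass


scores = [2.0, 1.5, 0.3, -0.5, -1.0]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.5, -1.0]]
full = attend(softmax(scores), values)
vmax = max(math.hypot(*v) for v in values)
for k in range(1, len(scores) + 1):
    sparse, dropped = topk_sparse_attention(scores, values, k)
    err = math.dist(full, sparse)
    # err never exceeds the bound 2 * dropped * vmax
    print(k, round(dropped, 4), round(err, 4), round(2 * dropped * vmax, 4))
```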

2601.22580 2026-06-05 cs.CL cs.AI cs.LG

SpanNorm: Reconciling Training Stability and Performance in Deep Transformers

Chao Wang, Bei Li, Jiaqi Zhang, Xinyu Liu, Yuchun Fan, Linkun Lyu, Xin Chen, Jingang Wang, Tong Xiao, Peng Pei, Xunliang Cai

Affiliations: Meituan Inc.; NLP Lab, School of Computer Science and Engineering; Northeastern University, Shenyang, China

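AI summary: Proposes SpanNorm, which combines the strengths of PreNorm and PostNorm to resolve the fundamental trade-off between training stability and performance in deep Transformers; theoretical analysis and experiments show that it outperforms conventional normalization schemes in both dense and Mixture-of-Experts (MoE) settings.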

Comments Accepted by ICML 2026

Abstract

The success of Large Language Models (LLMs) hinges on the stable training of deep Transformer architectures. A critical design choice is the placement of normalization layers, leading to a fundamental trade-off: the "PreNorm" architecture ensures training stability at the cost of potential performance degradation in deep models, while the "PostNorm" architecture offers strong performance but suffers from severe training instability. In this work, we propose SpanNorm, a novel technique designed to resolve this dilemma by integrating the strengths of both paradigms. Structurally, SpanNorm establishes a clean residual connection that spans the entire transformer block to stabilize signal propagation, while employing a PostNorm-style computation that normalizes the aggregated output to enhance model performance. We provide a theoretical analysis demonstrating that SpanNorm, combined with a principled scaling strategy, maintains bounded signal variance throughout the network, preventing the gradient issues that plague PostNorm models, and also alleviating the representation collapse of PreNorm. Empirically, SpanNorm consistently outperforms standard normalization schemes in both dense and Mixture-of-Experts (MoE) scenarios, paving the way for more powerful and stable Transformer architectures.
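The abstract contrasts two normalization placements. Below is a toy sketch of both on a single vector, assuming a fixed elementwise `sublayer` in place of attention/FFN and omitting LayerNorm's learnable gain and bias. Stacking PostNorm blocks keeps the activation scale pinned near 1, while the PreNorm residual stream grows with depth. SpanNorm's exact block formula is not given in the abstract, so it is not reproduced here.

```python
import math

def layer_norm(x, eps=1e-5):
    """LayerNorm without learnable gain/bias."""
    mu = sum(x) / len(x)
    var = sum((xi - mu) ** 2 for xi in x) / len(x)
    return [(xi - mu) / math.sqrt(var + eps) for xi in x]


def sublayer(x):
    # Stand-in for attention/FFN: any fixed nonlinear map works for the illustration.
    return [math.tanh(2.0 * xi + 0.1 * i) for i, xi in enumerate(x)]


def pre_norm_block(x):
    # PreNorm: x + F(LN(x)); the identity path is never normalized.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_norm_block(x):
    # PostNorm: LN(x + F(x)); the residual sum is normalized in every block.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def rms(v):
    return math.sqrt(sum(t * t for t in v) / len(v))


x = [0.5, -1.0, 2.0, 0.0]
h_pre, h_post = x, x
for _ in range(24):  # 24 stacked blocks
    h_pre, h_post = pre_norm_block(h_pre), post_norm_block(h_post)
print(round(rms(h_pre), 2), round(rms(h_post), 2))
```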

2601.21700 2026-06-05 cs.CL cs.AI cs.IR cs.MA cs.SI

Toward Culturally Aligned LLMs through Ontology-Guided Multi-Agent Reasoning

Wonduk Seo, Wonseok Choi, Junseo Koh, Juhyeon Lee, Hyunjin An, Minhyeong Yu, Jian Park, Qingshan Zhou, Seunghyun Lee, Yi Bu

Affiliations: KAIST

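AI summary: Proposes OG-MAR, an ontology-guided multi-agent reasoning framework that improves the cultural alignment and robustness of large language models while producing more transparent reasoning traces.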

Comments Accepted by ICML 2026 Regular Track

Abstract

Large Language Models (LLMs) increasingly support culturally sensitive decision making, yet often exhibit misalignment due to skewed pretraining data and the absence of structured value representations. Existing methods can steer outputs, but often lack demographic grounding and treat values as independent, unstructured signals, reducing consistency and interpretability. We propose OG-MAR, an Ontology-Guided Multi-Agent Reasoning framework. OG-MAR summarizes respondent-specific values from the World Values Survey (WVS) and constructs a global cultural ontology by eliciting relations over a fixed taxonomy via competency questions. At inference time, it retrieves ontology-consistent relations and demographically similar profiles to instantiate multiple value-persona agents, whose outputs are synthesized by a judgment agent that enforces ontology consistency and demographic proximity. Experiments on regional social-survey benchmarks across four LLM backbones show that OG-MAR improves cultural alignment and robustness over competitive baselines, while producing more transparent reasoning traces.
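At inference time, OG-MAR retrieves demographically similar profiles, and a judgment agent synthesizes the persona outputs. The sketch below covers only the retrieval and proximity-weighting steps. It assumes numeric demographic vectors, Euclidean distance, and precomputed persona answers in place of LLM agents. The ontology-consistency check is omitted, and every name and value below is hypothetical.

```python
import math

# Hypothetical profiles: a demographic feature vector plus the answer a
# persona agent built from that profile gave (precomputed here, not an LLM call).
profiles = [
    {"id": "A", "demo": [0.3, 0.7], "answer": "agree"},
    {"id": "B", "demo": [0.9, 0.1], "answer": "disagree"},
    {"id": "C", "demo": [0.2, 0.9], "answer": "agree"},
    {"id": "D", "demo": [0.8, 0.2], "answer": "disagree"},
]


def retrieve_similar(query, profiles, k):
    """Return the k profiles closest to the query in demographic space."""
    return sorted(profiles, key=lambda p: math.dist(query, p["demo"]))[:k]


def judge(query, personas):
    """Aggregate persona answers, weighting each by demographic proximity."""
    scores = {}
    for p in personas:
        weight = 1.0 / (1.0 + math.dist(query, p["demo"]))
        scores[p["answer"]] = scores.get(p["answer"], 0.0) + weight
    return max(scores, key=scores.get)


query = [0.3, 0.8]
selected = retrieve_similar(query, profiles, k=2)
print([p["id"] for p in selected], judge(query, selected))  # ['A', 'C'] agree
```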

2505.11766 2026-06-05 cs.LG cs.AI quant-ph

Reformulating Neural Operators in $d+1$ Dimensions for Embedding Evolution

Haoze Song, Zhihao Li, Xiaobo Zhang, Zecheng Gan, Zhilu Lai, Wei Wang

Affiliations: HKUST (Guangzhou); HKUST; Southwest Jiaotong University (SWJTU)

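AI summary: Reformulates neural operators in d+1 dimensions by introducing an auxiliary function dimension that models embedding evolution, improving the efficiency of embedding scaling. Fourier-based operators acting jointly on the physical and auxiliary domains yield a more efficient embedding-evolution module, and experiments show strong performance across multiple benchmarks.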

Abstract

Neural Operators (NOs) are powerful architectures for learning mappings between function spaces. While most advances focus on refining kernel parameterizations over the $d$-dimensional physical domain, the evolution of lifted embeddings remains underexplored, which often drives models toward computationally expensive embedding-scaling designs to improve approximation. In this paper, we introduce an auxiliary function dimension that models embedding evolution in operator form, thereby reformulating the NO pipeline in $d+1$ dimensions. We instantiate this framework via Fourier-based operators acting jointly on the physical and auxiliary domains, yielding a basis-diversified auxiliary evolution module as an alternative to brute-force embedding scaling. Across more than ten increasingly challenging benchmarks, ranging from the 1D heat equation to the highly nonlinear 3D Rayleigh-Taylor instability, our model consistently achieves the lowest relative $L_2$ error among the evaluated baselines. Crucially, this advantage is empirically supported by (1) controlled budget-aware comparisons against scaled and ablated baselines; (2) robustness under mixed-resolution training and super-resolution inference; and (3) zero-shot generalization to unseen temporal regimes. In addition, we present a broader set of design choices for lifting and recovery operators, demonstrating their impact on our model's predictive performance.
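The paper's operators act jointly on the physical and auxiliary dimensions. The sketch below shows only the standard one-dimensional Fourier-layer building block (an FNO-style spectral convolution with real per-mode weights) and the relative $L_2$ error used for evaluation. A naive $O(n^2)$ DFT keeps it dependency-free. The mode-truncation rule and the weights are illustrative assumptions.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_conv_1d(u, weights):
    """Scale Fourier mode |k| by weights[|k|] for |k| < len(weights) and zero
    the rest. Real weights shared by modes k and n-k keep conjugate symmetry,
    so the output of a real input stays real."""
    n = len(u)
    U = dft(u)
    out = []
    for k in range(n):
        freq = min(k, n - k)
        out.append(U[k] * weights[freq] if freq < len(weights) else 0j)
    return [z.real for z in idft(out)]


def relative_l2(pred, target):
    """||pred - target||_2 / ||target||_2."""
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    return num / math.sqrt(sum(t * t for t in target))


n = 8
u = [math.sin(2 * math.pi * t / n) + 0.3 * math.cos(6 * math.pi * t / n) for t in range(n)]
smooth = spectral_conv_1d(u, [1.0, 1.0])  # keep only modes |k| <= 1
print(round(relative_l2(smooth, u), 3))  # 0.287: the dropped mode-3 component
```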

2601.21288 2026-06-05 cs.AI cs.CV

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving

Weitong Lian, Zecong Tang, Haoran Li, Tianjian Gao, Yifei Wang, Zixu Wang, Lingyi Meng, Tengju Ru, Zhejun Cui, Yichen Zhu, Hangshuo Cao, Qi Kang, Tianxing Chen, Kaixuan Wang, Yu Zhang

Affiliations: University of Science and Technology of China

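AI summary: Proposes Drive-KD, which decomposes autonomous driving into a perception-reasoning-planning triad and transfers these capabilities via knowledge distillation. It builds capability-specific teacher models, mitigates cross-capability gradient conflicts with asymmetric gradient projection, validates generalization across model families and scales, and shows strong performance of the distilled models on autonomous driving tasks.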

Abstract

Autonomous driving is an important and safety-critical task, and recent advances in LLMs/VLMs have opened new possibilities for reasoning and planning in this domain. However, large models demand substantial GPU memory and exhibit high inference latency, while conventional supervised fine-tuning (SFT) often struggles to bridge the capability gaps of small models. To address these limitations, we propose Drive-KD, a framework that decomposes autonomous driving into a "perception-reasoning-planning" triad and transfers these capabilities via knowledge distillation. We identify layer-specific attention as the distillation signal to construct capability-specific single-teacher models that outperform baselines. Moreover, we unify these single-teacher settings into a multi-teacher distillation framework and introduce asymmetric gradient projection to mitigate cross-capability gradient conflicts. Extensive evaluations validate the generalization of our method across diverse model families and scales. Experiments show that our distilled InternVL3-1B model, with ~42 times less GPU memory and ~11.4 times higher throughput, achieves better overall performance than the pretrained 78B model from the same family on DriveBench, and surpasses GPT-5.1 on the planning dimension, providing insights toward efficient autonomous driving VLMs.
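The abstract does not define its asymmetric gradient projection. The sketch below assumes a PCGrad-style rule in which only auxiliary-capability gradients are projected, and only when they conflict with a primary capability's gradient (negative dot product). That one-directional treatment is one plausible reading of "asymmetric". Which capability counts as primary is a hypothetical choice here.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_if_conflicting(g, ref):
    """Remove from g its component along ref when the two conflict (g . ref < 0).
    Only g is modified, so the projection is one-directional."""
    d = dot(g, ref)
    norm2 = dot(ref, ref)
    if d >= 0 or norm2 == 0:
        return list(g)
    scale = d / norm2
    return [gi - scale * ri for gi, ri in zip(g, ref)]


def combine(primary, auxiliaries):
    """Sum the primary gradient with auxiliary gradients projected against it."""
    total = list(primary)
    for g in auxiliaries:
        total = [t + p for t, p in zip(total, project_if_conflicting(g, primary))]
    return total


g_plan = [1.0, 0.0]      # hypothetical "planning" gradient, treated as primary
g_percep = [-1.0, 1.0]   # conflicting "perception" gradient
print(project_if_conflicting(g_percep, g_plan))  # [0.0, 1.0]
print(combine(g_plan, [g_percep]))               # [1.0, 1.0]
```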

2601.19568 2026-06-05 cs.AI cs.SE

Learning Adaptive Parallel Execution for Efficient Code Localization

Ke Xu, Siyang Xiao, Ming Liang, Yichen Yu, Zhixiang Wang, Jingxuan Xu, Dajun Chen, Wei Jiang, Yong Li

Affiliations: Ant Group; Peking University; Beijing Jiaotong University

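AI summary: Proposes FuseSearch, which reformulates parallel code localization as a joint quality-efficiency optimization task and learns adaptive parallel strategies with two-stage SFT and RL training, improving the efficiency and performance of code localization.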

Comments Paper accepted to Findings of ACL 2026

Abstract

Code localization is a key bottleneck in automated software development pipelines. While concurrent tool execution can speed up discovery, current agents exhibit a 34.9% redundant invocation rate, which negates the benefits of parallelism. We propose FuseSearch, which reformulates parallel code localization as a joint quality-efficiency optimization task. By defining tool efficiency as the ratio of unique information gain to invocation count, we use a two-phase SFT and RL training approach to learn adaptive parallel strategies. Unlike fixed-breadth approaches, FuseSearch dynamically modulates search breadth according to task context, evolving from exploration to refinement. Evaluated on SWE-bench Verified, FuseSearch-4B achieves SOTA-level performance (84.7% file-level and 56.4% function-level F1) with a 93.6% speedup, using 67.7% fewer turns and 68.9% fewer tokens. The results indicate that efficiency-aware training naturally improves quality by eliminating noisy redundant signals, enabling high-performance, cost-effective localization agents.
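Tool efficiency is defined as unique information gain divided by invocation count. The sketch below assumes that "information gain" means newly discovered code locations, so a call that finds nothing new counts as redundant. The location strings are made up, and this is not FuseSearch's reward implementation.

```python
def tool_efficiency(call_results):
    """call_results: one set of discovered code locations per tool invocation.
    Returns (unique locations / invocations, redundant invocations / invocations)."""
    if not call_results:
        return 0.0, 0.0
    seen = set()
    redundant = 0
    for found in call_results:
        new = found - seen
        if not new:
            redundant += 1  # this call added no new information
        seen |= new
    n = len(call_results)
    return len(seen) / n, redundant / n


calls = [{"a.py:f"}, {"a.py:f", "b.py:g"}, {"b.py:g"}, {"c.py:h"}]
print(tool_efficiency(calls))  # (0.75, 0.25)
```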

2601.18383 2026-06-05 cs.AI cs.CL cs.LG

Dynamic Thinking-Token Selection for Efficient Reasoning in Large Reasoning Models

Zhenyuan Guo, Tong Chen, Wenlong Meng, Chen Gong, Xin Yu, Chengkun Wei, Wenzhi Chen

Affiliations: Zhejiang University

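AI summary: Proposes dynamic thinking-token selection: an analysis of reasoning traces shows that only a subset of critical tokens influences the final answer, which is used to improve the efficiency of large reasoning models.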

2508.11618 2026-06-05 cs.LG

Optimal CO2 storage management considering safety constraints in multi-stakeholder multi-site CCS projects: a Markov game perspective

Jungang Chen, Seyyed A. Hosseini

Affiliations: The University of Texas at Austin

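AI summary: Uses a Markov game approach to study how different coalition structures affect stakeholder objectives in multi-stakeholder, multi-site carbon capture and storage (CCS) projects, and proposes a multi-agent reinforcement learning framework with safety constraints for optimal storage management across stakeholders.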

Comments 58 pages

DOI: 10.1016/j.ijggc.2026.104683
Journal ref: Int. J. Greenh. Gas Control 149 (2026) 104683

Abstract

Carbon capture and storage (CCS) projects typically involve a diverse array of stakeholders or players from public, private, and regulatory sectors, each with different objectives and responsibilities. Given the complexity, scale, and long-term nature of CCS operations, determining whether individual stakeholders can independently maximize their interests or whether collaborative coalition agreements are needed remains a central question for effective CCS project planning and management. CCS projects are often implemented in geologically connected sites, where shared geological features such as pressure space and reservoir pore capacity can lead to competitive behavior among stakeholders. Furthermore, CO2 storage sites are often located in geologically mature basins that previously served as sites for hydrocarbon extraction or wastewater disposal in order to leverage existing infrastructure, which makes unilateral optimization even more complicated and unrealistic. In this work, we propose a paradigm based on Markov games to quantitatively investigate how different coalition structures affect the goals of stakeholders. We frame this multi-stakeholder multi-site problem as a multi-agent reinforcement learning problem with safety constraints. Our approach enables agents to learn optimal strategies while complying with safety regulations. We present an example where multiple operators are injecting CO2 into their respective project areas in a geologically connected basin. To address the high computational cost of repeated simulations of high-fidelity models, a previously developed surrogate model based on the Embed-to-Control (E2C) framework is employed. Our results demonstrate the effectiveness of the proposed framework in addressing the optimal management of CO2 storage when multiple stakeholders with various objectives and goals are involved.
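The abstract does not say how the safety constraints enter the multi-agent learning problem. Lagrangian relaxation is one common approach in constrained RL, and the sketch below assumes it. A made-up linear model of shared reservoir pressure stands in for the E2C surrogate, and the pro-rata split of the penalty is also an assumption. Only the per-operator penalized reward and the multiplier update are shown, not a full MARL training loop.

```python
# Hypothetical constants for a linear shared-pressure model (not from the paper).
P0, ALPHA, P_MAX = 10.0, 0.5, 20.0


def shared_pressure(rates):
    """Toy response of a geologically connected reservoir to all operators' injection."""
    return P0 + ALPHA * sum(rates)


def lagrangian_rewards(rates, lam):
    """Each operator earns its injected volume minus lam times its pro-rata share
    of any pressure-limit violation."""
    violation = max(0.0, shared_pressure(rates) - P_MAX)
    total = sum(rates)
    return [q - lam * violation * (q / total if total else 0.0) for q in rates]


def dual_update(lam, rates, lr=0.1):
    """Dual ascent on the multiplier: grow while the limit is exceeded, shrink otherwise."""
    return max(0.0, lam + lr * (shared_pressure(rates) - P_MAX))


print(lagrangian_rewards([8.0, 12.0], lam=1.5))   # at the limit: no penalty
print(lagrangian_rewards([10.0, 14.0], lam=1.5))  # over the limit: both penalized
print(dual_update(0.0, [10.0, 14.0]))             # multiplier rises
```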

2601.08510 2026-06-05 cs.CL cs.AI

STAGE: A Full-Screenplay Benchmark for Reasoning over Evolving Stories

Qiuyu Tian, Zequn Liu, Yiding Li, Fengyi Chen, Youyong Kong, Fan Guo, Yuyao Li, Jinjing Shen, Zhijing Xie, Yiyun Luo, Xin Zhang, Yingce Xia

Affiliations: Southeast University; Beijing Zhongguancun Academy; Nanjing Normal University; ZhuiWen Technology Co., Ltd.

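AI summary: Proposes the STAGE benchmark, which uses four tasks (knowledge graph construction, scene-level event summarization, long-context question answering, and role-playing) to comprehensively evaluate models' understanding of and reasoning about the narrative worlds of movie screenplays.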

Comments 66 pages, 9 figures

Abstract

Movie screenplays are rich long-form narratives that interleave complex character relationships, temporally ordered events, and dialogue-driven interactions. While prior benchmarks target individual subtasks such as question answering or dialogue generation, they rarely evaluate whether models can construct a coherent story world and use it consistently across multiple forms of reasoning and generation. We introduce STAGE (Screenplay Text, Agents, Graphs and Evaluation), a unified benchmark for narrative understanding over full-length movie screenplays. STAGE defines four tasks: knowledge graph construction, scene-level event summarization, long-context screenplay question answering, and in-script character role-playing, all grounded in a shared narrative world representation. The benchmark provides cleaned scripts, curated knowledge graphs, and event- and character-centric annotations for 150 films across English and Chinese, enabling holistic evaluation of models' abilities to build world representations, abstract and verify narrative events, reason over long narratives, and generate character-consistent responses.

2512.14338 2026-06-05 cs.LG

Implicit Bias and Invariance: How Hopfield Networks Efficiently Learn Graph Orbits

Michael Murray, Tenzin Chan, Kedar Karhadker, Christopher J. Hillar

Affiliations: Mathematical Sciences, University of Bath; Department of Mathematics, UCLA; Algebraic 4 New Theory AI

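AI summary: Examines the implicit invariance mechanism of Hopfield networks on learning problems with symmetry, revealing the implicit bias of gradient descent when learning graph isomorphism classes and its effect on sample complexity.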

Abstract

Many learning problems involve symmetries, and while invariance can be built into neural architectures, it can also emerge implicitly when training on group-structured data. We study this phenomenon in classical Hopfield networks and show they can infer the full isomorphism class of a graph from a small random sample. Our results reveal that: (i) graph isomorphism classes can be represented within a three-dimensional invariant subspace, (ii) using gradient descent to minimize energy flow (MEF) has an implicit bias toward norm-efficient solutions, which underpins a polynomial sample complexity bound for learning isomorphism classes, and (iii) across multiple learning rules, parameters converge toward the invariant subspace as sample sizes grow. Together, these findings highlight a unifying mechanism for generalization in Hopfield networks: a bias toward norm efficiency in learning drives the emergence of approximate invariance under group-structured data.
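The MEF objective can be written down directly. This minimal sketch assumes ±1 states, energy $E(x) = -\tfrac12 x^\top W x - b^\top x$ with symmetric, zero-diagonal $W$, and single-bit flips as neighbors. Flipping bit $i$ changes the energy by $2x_ih_i$, where $h_i$ is the local field, so the objective becomes $\sum_x \sum_i \exp(-x_i h_i(x))$. This follows the minimum-energy-flow form as we understand it. The graph-structured data and the invariant-subspace analysis from the paper are not reproduced, and the two patterns are arbitrary.

```python
import math

def local_fields(W, b, x):
    """h_i = sum_{j != i} W_ij x_j + b_i."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n) if j != i) + b[i] for i in range(n)]


def mef_loss(W, b, patterns):
    # E(x) - E(x with bit i flipped) = -2 x_i h_i, so each neighbor term
    # exp((E(x) - E(x')) / 2) equals exp(-x_i h_i).
    return sum(math.exp(-xi * hi)
               for x in patterns
               for xi, hi in zip(x, local_fields(W, b, x)))


def train_mef(patterns, lr=0.05, steps=300):
    """Plain gradient descent on the MEF objective over symmetric W and bias b."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for _ in range(steps):
        gW = [[0.0] * n for _ in range(n)]
        gb = [0.0] * n
        for x in patterns:
            h = local_fields(W, b, x)
            for i in range(n):
                e = math.exp(-x[i] * h[i])
                gb[i] -= e * x[i]
                for j in range(n):
                    if j != i:
                        # W_ij = W_ji enters both h_i and h_j; accumulating into
                        # both entries keeps the gradient symmetric.
                        gW[i][j] -= e * x[i] * x[j]
                        gW[j][i] -= e * x[i] * x[j]
        for i in range(n):
            b[i] -= lr * gb[i]
            for j in range(n):
                W[i][j] -= lr * gW[i][j]
    return W, b


patterns = [[1, 1, 1, -1, -1, -1], [1, -1, 1, -1, 1, -1]]
W, b = train_mef(patterns)
for x in patterns:
    h = local_fields(W, b, x)
    # True: x is a fixed point of the update x_i <- sign(h_i)
    print(all(xi * hi > 0 for xi, hi in zip(x, h)))
```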

2601.09236 2026-06-05 cs.LG cs.AI

Reward Learning through Ranking Mean Squared Error

Chaitanya Kharyal, Calarina Muslimani, Matthew E. Taylor

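AI summary: Proposes R4, a rating-based reinforcement learning method that learns reward functions from trajectory-rating pairs with a new ranking mean squared error loss, and performs well on robotic benchmarks.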

Abstract

Reward design remains a significant bottleneck in applying reinforcement learning (RL) to real-world problems. A popular alternative is reward learning, where reward functions are inferred from human feedback rather than manually specified. Recent work has proposed learning reward functions from human ratings rather than traditional binary preferences, enabling richer and potentially less cognitively demanding supervision. Building on this paradigm, we introduce a new rating-based RL method, Ranked Return Regression for RL (R4). At its core, R4 uses a novel ranking mean squared error loss that learns from a dataset of trajectory-rating pairs, treating the human-provided discrete ratings (e.g., bad, neutral, good) as ordinal targets. Unlike prior rating-based approaches, R4 offers formal guarantees: its solution set is provably minimal and complete under mild assumptions. Empirically, using both human-provided and simulated ratings, we demonstrate that R4 consistently matches or outperforms existing rating and preference-based RL methods on robotic benchmarks from OpenAI Gym and the DeepMind Control Suite. Code released at https://github.com/IRLL/R4.
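The abstract does not spell out the ranking mean squared error loss. The sketch below shows one hypothetical way to turn ordinal ratings into a differentiable ranking target. Predicted returns are converted to soft ranks with pairwise sigmoids and rescaled to [0, 1], then regressed onto rating levels rescaled the same way. This is an illustrative assumption, not R4's published loss; `tau` and the rescaling are arbitrary choices.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_rank(returns, tau=0.1):
    """Differentiable rank: rank_i ~ sum_j sigmoid((R_i - R_j) / tau), in [0, n-1].
    The self-comparison contributes sigmoid(0) = 0.5, which is subtracted."""
    return [sum(sigmoid((ri - rj) / tau) for rj in returns) - 0.5 for ri in returns]


def ranking_mse(returns, ratings, n_levels, tau=0.1):
    """Hypothetical ranking-MSE: normalized soft ranks of predicted returns
    versus ordinal rating levels rescaled to [0, 1]."""
    n = len(returns)
    assert n >= 2 and n_levels >= 2
    ranks = [r / (n - 1) for r in soft_rank(returns, tau)]
    targets = [k / (n_levels - 1) for k in ratings]
    return sum((r - t) ** 2 for r, t in zip(ranks, targets)) / n


ratings = [0, 1, 2]  # bad, neutral, good
print(ranking_mse([0.0, 1.0, 2.0], ratings, n_levels=3))  # near 0: order matches
print(ranking_mse([2.0, 1.0, 0.0], ratings, n_levels=3))  # about 0.667: order reversed
```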
