arXivDaily: arXiv daily academic digest, updated Monday to Friday
All subject categories: 2086
2605.06667 2026-05-08 cs.CV cs.AI cs.LG

ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

Omar El Khalifi, Thomas Rossi, Oscar Fossey, Thibault Fouque, Ulysse Mizrahi, Philip Torr, Ivan Laptev, Fabio Pizzati, Baptiste Bellot-Gurlet

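Affiliations * University of Oxford

AI summary ActCam is a zero-shot method for jointly controlling actor motion and camera trajectory in video generation; geometrically consistent conditioning improves camera adherence and motion fidelity.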

Comments SIGGRAPH 2026

Abstract

For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per-frame control of intrinsic and extrinsic camera parameters. ActCam builds on any pretrained image-to-video diffusion model that accepts conditioning in terms of scene depth and character pose. Given a source video with a moving character and a target camera motion, ActCam generates pose and depth conditions that remain geometrically consistent across frames. We then run a single sampling process with a two-phase conditioning schedule: early denoising steps condition on both pose and sparse depth to enforce scene structure, after which depth is dropped and pose-only guidance refines high-frequency details without over-constraining the generation. We evaluate ActCam on multiple benchmarks spanning diverse character motions and challenging viewpoint changes. We find that, compared to pose-only control and other pose and camera methods, ActCam improves camera adherence and motion fidelity, and is preferred in human evaluations, especially under large viewpoint changes. Our results highlight that careful camera-consistent conditioning and staged guidance can enable strong joint camera and motion control without training. Project page: https://elkhomar.github.io/actcam/.
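The two-phase conditioning schedule described in the abstract can be pictured with a small sketch. The Python below is illustrative only and is not the authors' implementation: the function name `conditioning_schedule`, the `depth_fraction` knob, and its default of 0.4 are assumptions, since the abstract does not say at which step depth is dropped.

```python
def conditioning_schedule(num_steps, depth_fraction=0.4):
    """Return, for each denoising step, the tuple of active conditions.

    depth_fraction is a hypothetical knob: the share of early steps that
    condition on both pose and sparse depth before depth is dropped.
    """
    if not 0.0 <= depth_fraction <= 1.0:
        raise ValueError("depth_fraction must be in [0, 1]")
    switch_step = round(num_steps * depth_fraction)
    schedule = []
    for step in range(num_steps):
        if step < switch_step:
            # Phase 1: pose plus sparse depth, to enforce scene structure.
            schedule.append(("pose", "depth"))
        else:
            # Phase 2: pose only, to refine detail without over-constraining.
            schedule.append(("pose",))
    return schedule
```

With 10 steps and `depth_fraction=0.4`, the first four steps use pose and depth and the remaining six use pose only.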

2605.06665 2026-05-08 cs.LG cs.AI

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

Minbin Huang, Han Shi, Chuanyang Zheng, Yimeng Wu, Guoxuan Chen, Xintong Yu, Yichun Yin, Hong Cheng

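Affiliations * The Chinese University of Hong Kong; Huawei Technologies; The University of Hong Kong

AI summary UniPool replaces per-layer expert sets with a globally shared expert pool, removing the need for expert parameters to grow linearly with depth and improving efficiency and effectiveness.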

Abstract

Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert capacity. However, recent analyses and our routing probe challenge this allocation rule: replacing a deeper layer's learned top-k router with uniform random routing drops downstream accuracy by only 1.0-1.6 points across multiple production MoE models. Motivated by this redundancy, we propose UniPool, an MoE architecture that treats expert capacity as a global architectural budget by replacing per-layer expert ownership with a single shared pool accessed by independent per-layer routers. To enable stable and balanced training under sharing, we introduce a pool-level auxiliary loss that balances expert utilization across the entire pool, and adopt NormRouter to provide sparse and scale-stable routing into the shared expert pool. Across five LLaMA-architecture model scales (182M, 469M, 650M, 830M, and 978M parameters) trained on 30B tokens from the Pile, UniPool consistently improves validation loss and perplexity over the matched vanilla MoE baselines. Across these scales, UniPool reduces validation loss by up to 0.0386 relative to vanilla MoE. Beyond raw loss improvement, our results identify pool size as an explicit depth-scaling hyperparameter: reduced-pool UniPool variants using only 41.6%-66.7% of the vanilla expert-parameter budget match or outperform layer-wise MoE at the tested scales. This shows that, under a shared-pool design, expert parameters need not grow linearly with depth; they can grow sublinearly while remaining more efficient and effective than vanilla MoE. Further analysis shows that UniPool's benefits compose with finer-grained expert decomposition.
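The structural idea, one expert pool shared by independent per-layer routers, can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the paper's architecture: each "expert" is a scalar gain, routing is plain top-k on a linear score rather than NormRouter, and `imbalance` is a hypothetical stand-in for a pool-level balance signal, not the auxiliary loss the authors use.

```python
import random


def top_k(scores, k):
    """Indices of the k largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


class SharedPoolMoE:
    """Toy model in which every layer routes into the same list of experts."""

    def __init__(self, num_layers, pool_size, k, seed=0):
        rng = random.Random(seed)
        # One global pool; each toy "expert" is just a scalar gain.
        self.experts = [rng.uniform(0.5, 1.5) for _ in range(pool_size)]
        # One independent router per layer, one weight per pooled expert.
        self.routers = [[rng.gauss(0.0, 1.0) for _ in range(pool_size)]
                        for _ in range(num_layers)]
        self.k = k
        self.usage = [0] * pool_size  # how often each pooled expert was chosen

    def forward(self, x):
        for router in self.routers:
            chosen = top_k([w * x for w in router], self.k)
            for e in chosen:
                self.usage[e] += 1
            # Average the outputs of the selected experts.
            x = sum(self.experts[e] * x for e in chosen) / self.k
        return x

    def imbalance(self):
        """Hypothetical balance measure over the whole pool: 1.0 when uniform."""
        total = sum(self.usage)
        if total == 0:
            return 1.0
        return len(self.usage) * sum((u / total) ** 2 for u in self.usage)
```

In this toy, the number of expert parameters is `pool_size` regardless of `num_layers`, which is the decoupling of depth from expert-parameter count that the abstract describes.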

2605.06664 2026-05-08 cs.CV cs.AI

BAMI: Training-Free Bias Mitigation in GUI Grounding

Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu

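Affiliations * Tsinghua University, China; Lenovo Research, China

AI summary This paper proposes BAMI, which uses coarse-to-fine focus and candidate selection to mitigate precision bias and ambiguity bias in GUI grounding, improving model accuracy in a training-free setting.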

Comments Accepted by CVPR 2026

Abstract

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution method, we identify that the primary sources of errors are twofold: high image resolution (leading to precision bias) and intricate interface elements (resulting in ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which incorporates two key manipulations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Our extensive experimental results demonstrate that BAMI significantly enhances the accuracy of various GUI grounding models in a training-free setting. For instance, applying our method to the TianXi-Action-7B model boosts its accuracy on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. Furthermore, ablation studies confirm the robustness of the BAMI approach across diverse parameter configurations, highlighting its stability and effectiveness. Code is available at https://github.com/Neur-IO/BAMI.
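A coarse-to-fine focus loop of the kind named in the abstract might look like the sketch below. Everything here is an assumption made for illustration: `predict` is a hypothetical callable standing in for a grounding model, `rounds` and `zoom` are invented parameters, and candidate selection is not shown. The actual procedure is in the authors' repository.

```python
def coarse_to_fine(predict, width, height, rounds=2, zoom=0.5):
    """Repeatedly crop around the current guess and predict again.

    predict(box) is a hypothetical model call. It receives a crop
    box = (left, top, right, bottom) in full-image pixels and returns a
    point (x, y), also in full-image pixels, inside that box.
    """
    box = (0.0, 0.0, float(width), float(height))
    point = predict(box)
    for _ in range(rounds):
        left, top, right, bottom = box
        w, h = (right - left) * zoom, (bottom - top) * zoom
        # Centre the smaller window on the current guess, clamped so the
        # new window stays inside the previous one.
        new_left = min(max(point[0] - w / 2, left), right - w)
        new_top = min(max(point[1] - h / 2, top), bottom - h)
        box = (new_left, new_top, new_left + w, new_top + h)
        point = predict(box)
    return point, box
```

The intuition is that a model whose localization error scales with the size of the image it sees should err less on each successive, smaller crop.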

2605.06662 2026-05-08 cs.RO

Multi-Robot Coordination in V2X Environments

John Pravin Arockiasamy, Alexey Vinel

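Affiliations * Karlsruhe Institute of Technology; Halmstad University

AI summary This paper proposes a V2X communication framework that introduces robot-centric facility-layer services to enable decentralized cooperation among social robots in complex urban traffic environments.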

Comments Accepted for publication at the IEEE Intelligent Transportation Systems Conference (ITSC), 2026

Abstract

This paper presents a Vehicle-to-Everything (V2X) communication framework that enables decentralized cooperation among social robots operating in complex urban traffic environments. Building on ETSI Cooperative Awareness and Maneuver Coordination services, the framework introduces two robot-centric facility-layer services: the Robot Awareness Service (RAS) and the Robot Maneuver Coordination Service (RMCS), realized through the Robot Awareness Message (RAM) and the Robot Maneuver Coordination Message (RMCM), respectively. RAS enables role-aware, task-oriented robot awareness while integrating externally detected Vulnerable Road Users (VRUs), including non-V2X pedestrians, into cooperative awareness. RMCS supports event-driven, low-latency coordination of robot maneuvers under explicitly established roles, without centralized infrastructure or prior pairing. A real-world proof of concept demonstrates deterministic multi-robot coordination between a humanoid robot and a quadrupedal robot assisting a pedestrian during a road-crossing scenario, governed by a formally specified finite-state coordination model. Complementary simulations evaluate robot-mediated VRU clustering in mixed V2X environments, showing that RAS-based clustering integrates non-V2X VRUs in safety-critical areas while reducing redundant transmissions from V2X-enabled VRUs, thereby lowering channel load. Together, the proposed services provide a scalable and standards-aligned foundation for integrating cooperative robots into future Connected, Cooperative, and Automated Mobility ecosystems.

2605.06660 2026-05-08 cs.LG cs.AI cs.CL

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao

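Affiliations * Department of Data Science, City University of Hong Kong; Hong Kong Institute of AI for Science, City University of Hong Kong; School of Intelligence Science and Technology, Peking University; Department of Statistics, University of Oxford

AI summary This paper proposes the VHG framework, which introduces an independent verifier to constrain the problem setter's reward, improving the validity and difficulty of generated problems; experiments show that it outperforms baselines on indefinite-integral and general mathematical reasoning tasks.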

Abstract

Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems - an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches either depend on expensive human expert involvement or adopt naive self-play paradigms, which frequently yield invalid problems due to reward hacking. This work introduces VHG, a verifier-enhanced hard problem generation framework built upon three-party self-play. By integrating an independent verifier into the conventional setter-solver duality, our design constrains the setter's reward to be jointly determined by problem validity (evaluated by the verifier) and difficulty (assessed by the solver). We instantiate two verifier variants: a Hard symbolic verifier and a Soft LLM-based verifier, with evaluations conducted on indefinite integral tasks and general mathematical reasoning tasks. Experimental results show that VHG substantially outperforms all baseline methods by a clear margin.
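One simple way to make a setter's reward depend jointly on validity and difficulty is to gate a difficulty score on the verifier's verdict. The sketch below assumes that form; the abstract does not give the reward formula, so the function `setter_reward` and its use of the solver's failure rate as the difficulty signal are hypothetical.

```python
def setter_reward(is_valid, solver_successes, solver_attempts):
    """Hypothetical setter reward: zero if the verifier rejects the problem,
    otherwise the fraction of solver attempts that failed."""
    if solver_attempts <= 0:
        raise ValueError("solver_attempts must be positive")
    if not 0 <= solver_successes <= solver_attempts:
        raise ValueError("solver_successes must be in [0, solver_attempts]")
    if not is_valid:
        # An invalid problem earns nothing, however "hard" it looks.
        return 0.0
    return 1.0 - solver_successes / solver_attempts
```

Without the validity gate, an ill-posed problem that no solver can answer would score as maximally difficult, which is the reward-hacking failure the abstract attributes to naive two-party self-play.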

2605.06658 2026-05-08 cs.CV

Relit-LiVE: Relight Video by Jointly Learning Environment Video

Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang

Affiliations * Nanjing University; Tsinghua University; The Hong Kong University of Science and Technology; University of Chinese Academy of Sciences; Huazhong University of Science and Technology; The Chinese University of Hong Kong, Shenzhen

AI summary Relit-LiVE introduces raw reference images and an environment video prediction formulation to achieve physically consistent video relighting without prior knowledge of camera pose, improving relighting quality and temporal stability on real-world scenes.

Comments Accepted at SIGGRAPH 2026. Project site: https://github.com/zhuxing0/Relit-LiVE

Abstract

Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera pose. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore, we propose a novel environment video prediction formulation that simultaneously generates relit videos and per-frame environment maps aligned with each camera viewpoint in a single diffusion process. This joint prediction enforces strong geometric-illumination alignment and naturally supports dynamic lighting and camera motion, significantly improving physical consistency in video relighting while easing the requirement of known per-frame camera pose. Extensive experiments demonstrate that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods across synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting. The Project is available at https://github.com/zhuxing0/Relit-LiVE.

2605.06656 2026-05-08 cs.LG cs.DM cs.ET math.OC

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

Jai Moondra, Ayela Chughtai, Bhargavi Lanka, Swati Gupta

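Affiliations * Carnegie Mellon University; MIT Sloan School of Management

AI summary This paper analyzes why global LLM leaderboards are misleading, shows that heterogeneity across languages undermines a single global ranking, and proposes the (λ, ν)-portfolio framework to address that heterogeneity.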

Abstract

Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of the decisive votes cancel out, and even the top 50 models according to the global BT ranking are statistically indistinguishable (pairwise win probabilities are at most 0.53 within the top 50 models). We trace this failure to strong, structured heterogeneity of opinions across language, task, and time. Moreover, we find an important characteristic - *language* plays a key role. Grouping by language (and families) increases the agreement of votes massively, resulting in two orders of magnitude higher spread in the ELO scores (i.e., very consistent rankings). What appears as global noise is in fact a mixture of coherent but conflicting subpopulations. To address such heterogeneity in supervised machine learning, we introduce the framework of $(λ, ν)$-portfolios, which are small sets of models that achieve a prediction error at most $λ$, "covering" at least a $ν$ fraction of users. We formulate this as a variant of the set cover problem and provide guarantees using the VC dimension of the underlying set system. On the Arena data, our algorithms recover just 5 distinct BT rankings that cover over 96% of votes at a modest $λ$, compared to the 21% coverage by the global ranking. We also provide a portfolio of 6 LLMs that cover twice as many votes as the top-6 LLMs from a global ranking. We further construct portfolios for a classification problem on the COMPAS dataset using an ensemble of fairness-regularized classification models and show that these portfolios can be used to detect blind spots in the data, which might be of independent interest to policymakers.
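Two ingredients of the abstract lend themselves to a short sketch: the Bradley-Terry win probability, and the set-cover view of a $(λ, ν)$-portfolio. The code below uses the textbook Bradley-Terry formula and the standard greedy set-cover heuristic. The greedy routine is an assumption chosen for illustration, not necessarily the authors' algorithm, and the `errors` table layout is hypothetical.

```python
import math


def bt_win_probability(score_a, score_b):
    """Bradley-Terry probability that a beats b, given log-strength scores."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def greedy_portfolio(errors, lam, nu):
    """Greedily pick models until at least a nu fraction of users is covered.

    errors maps a model name to a list of per-user errors (hypothetical
    layout). A user is "covered" by a model whose error is at most lam.
    Returns the chosen models and the coverage actually reached.
    """
    num_users = len(next(iter(errors.values())))
    target = math.ceil(nu * num_users)
    covered, portfolio = set(), []
    while len(covered) < target:
        best, best_gain = None, set()
        for model, errs in errors.items():
            if model in portfolio:
                continue
            gain = {u for u, e in enumerate(errs) if e <= lam} - covered
            if len(gain) > len(best_gain):
                best, best_gain = model, gain
        if best is None:
            break  # no remaining model covers a new user
        portfolio.append(best)
        covered |= best_gain
    return portfolio, len(covered) / num_users
```

For scale, a pairwise win probability of 0.53 corresponds to a Bradley-Terry score gap of ln(0.53/0.47), roughly 0.12, which illustrates how close the top models are under the global ranking.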

2605.06654 2026-05-08 cs.LG cs.AI math.OC

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

Yuxing Liu, Jianyu Wang, Tong Zhang

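Affiliations * UIUC; Apple

AI summary This paper finds that full finetuning with the same optimizer used in pretraining forgets less while maintaining performance during supervised finetuning. It introduces the notion of optimizer-model consistency and uses experiments and theoretical analysis to explain how the optimizer shapes the model and why the finetuning strategy matters.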

Abstract

Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff, i.e., forgetting less while achieving the same or better performance on the new task, than other optimizers and, possibly surprisingly, LoRA, during the supervised finetuning (SFT) stage. We term this phenomenon optimizer-model consistency. To better understand it, through controlled experiments and theoretical analysis, we show that: 1) optimizers can shape the models by having regularization effects on the activations, leading to different landscapes around the pretrained checkpoints; 2) in response to this regularization effect, the weight update in SFT should follow some specific structures to lower forgetting of the knowledge learned in pretraining, which can be obtained by using the same optimizer. Moreover, we specifically compare Muon and AdamW when they are employed throughout the pretraining and SFT stages and find that Muon performs worse when finetuned for reasoning tasks. With a synthetic language modeling experiment, we demonstrate that this can come from Muon's strong tendency towards rote memorization, which may hurt pattern acquisition with a small amount of data, as for SFT.

2605.06652 2026-05-08 cs.LG cs.AI cs.CL

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

Sushant Gautam, Finn Schwall, Annika Willoch Olstad, Fernando Vallecillos Ruiz, Birk Torpmann-Hagen, Sunniva Maria Stordal Bjørklund, Leon Moonen, Klas Pettersen, Michael A. Riegler

Affiliations * Simula Metropolitan Center for Digital Engineering; Oslo Metropolitan University; University of Oslo; Simula Research Laboratory; Norwegian Directorate of Health

AI summary This paper proposes a way to validate comparative LLM safety scoring without ground-truth labels by replacing label agreement with an instrumental-validity chain, validates the chain experimentally, and reports applications and results across different scenarios.

Comments SimpleAudit Repository: https://github.com/kelkalot/simpleaudit

Abstract

Many deployments must compare candidate language models for safety before a labeled benchmark exists for the relevant language, sector, or regulatory regime. We formalize this setting as benchmarkless comparative safety scoring and specify the contract under which a scenario-based audit can be interpreted as deployment evidence. Scores are valid only under a fixed scenario pack, rubric, auditor, judge, sampling configuration, and rerun budget. Because no labels are available, we replace ground-truth agreement with an instrumental-validity chain: responsiveness to a controlled safe-versus-abliterated contrast, dominance of target-driven variance over auditor and judge artifacts, and stability across reruns. We instantiate the chain in SimpleAudit, a local-first scoring instrument, and validate it on a Norwegian safety pack. Safe and abliterated targets separate with AUROC values between 0.89 and 1.00, target identity is the dominant variance component ($η^2 \approx 0.52$), and severity profiles stabilize by ten reruns. Applying the same chain to Petri shows that it admits both tools. The substantial differences arise upstream of the chain, in claim-contract enforcement and deployment fit. A Norwegian public-sector procurement case comparing Borealis and Gemma 3 demonstrates the resulting evidence in practice: the safer model depends on scenario category and risk measure. Consequently, scores, matched deltas, critical rates, uncertainty, and the auditor and judge used must be reported together rather than collapsed into a single ranking.
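The AUROC figures quoted for the safe-versus-abliterated contrast have a direct pairwise reading: the probability that a randomly chosen abliterated target receives a higher severity score than a randomly chosen safe one. The sketch below computes that quantity from two lists of scores. The score values and the choice of the abliterated targets as the positive class are assumptions for illustration; this is not code from SimpleAudit.

```python
def auroc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Here "positive" is assumed to mean an
    abliterated target and the score is assumed to be a severity score.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

A value of 1.00 means every abliterated target scored above every safe target, and 0.5 means the instrument cannot tell them apart.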

2605.06650 2026-05-08 cs.CL

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

Mingwei Xu, Hao Fang

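Affiliations * University of Washington

AI summary This paper proposes POPO, a framework that learns only from online positive rollouts and avoids negative rollouts, using a siamese network and a similarity penalty for stability; experiments on mathematical benchmarks show performance comparable to or better than GRPO.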

Abstract

Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for enhancing the reasoning ability of large language models (LLMs), owing to its deterministic verification. The community has witnessed a rapid shift from Proximal Policy Optimization (PPO) to Group Relative Policy Optimization (GRPO), in which GRPO replaces complicated advantage estimation with a simple estimate over grouped positive and negative rollouts. However, we note that negative rollouts may admit no gradation of failure severity, and the combinatorial vastness makes penalizing a few sampled negatives unlikely to cover a meaningful reward signal under sparse binary rewards. In this work, we propose Positive-Only Policy Optimization (POPO), a novel RLVR framework in which learning can occur exclusively via online positive rollouts. Specifically, POPO uses bounded importance sampling over the positive rollout set, so no disjoint negative rollouts are used for gradient guidance. We show that implicit negative gradients can emerge naturally by reinforcing the positive probability via rollout redistribution. POPO then stabilizes policy optimization through two mechanisms. First, it applies a siamese policy network with a momentum-based adaptation law for stabilized policy evolution. Second, it replaces the KL divergence with a bounded similarity penalty term in the siamese representation space. We conduct extensive experiments using publicly available, well-established text LLMs, e.g., the Qwen family, across mathematical benchmarks of all levels. Our experiments demonstrate that POPO achieves performance comparable to, or even superior to, GRPO. Notably, POPO achieves 36.67% on AIME 2025 with Qwen-Math-7B, compared with 30.00% for GRPO. Our ablation and sweep studies further illustrate the necessity and robustness of POPO's components.
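The two stabilizing mechanisms can be illustrated with generic building blocks. The sketch below assumes an exponential moving average for the momentum-based update and one minus cosine similarity for the bounded penalty. Both are common choices picked here for illustration; the abstract does not specify the exact adaptation law or similarity measure, so neither should be read as POPO's definition.

```python
import math


def momentum_update(target_params, online_params, momentum=0.99):
    """Assumed EMA rule: move the target branch slowly toward the online one."""
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_params, online_params)]


def similarity_penalty(rep_a, rep_b):
    """Assumed bounded penalty: 1 - cosine similarity, which lies in [0, 2]."""
    dot = sum(a * b for a, b in zip(rep_a, rep_b))
    norm_a = math.sqrt(sum(a * a for a in rep_a))
    norm_b = math.sqrt(sum(b * b for b in rep_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("representations must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)
```

Unlike a KL term, a penalty of this form cannot grow without bound when the two branches disagree, which is one plausible reading of why a bounded similarity term aids stability.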

2605.06646 2026-05-08 cs.LG

Inductive Venn-Abers and related regressors

Ivan Petej, Vladimir Vovk

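Affiliations * GitHub

AI summary This paper generalizes Venn-Abers regressors to unbounded regression by adding an element of conformal prediction; simulation and empirical studies suggest that the derived point regressors somewhat improve predictive efficiency for larger training sets.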

Comments 33 pages

Abstract

Venn-Abers predictors are probabilistic predictors that enjoy appealing properties of validity, but their major limitation is that they are applicable only to the case of binary classification, with a recent extension to bounded regression. We generalize them to the case of unbounded regression, which requires adding an element of conformal prediction. In our simulation and empirical studies we investigate the predictive efficiency of point regressors derived from Venn-Abers regressors and argue that they somewhat improve the predictive efficiency of standard regressors for larger training sets.

2605.06643 2026-05-08 cs.CV cs.AI cs.LG cs.MM

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

Hao Dong, Hongzhao Li, Shupan Li, Muhammad Haris Khan, Eleni Chatzi, Olga Fink

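Affiliations * ETH Zürich; Zhengzhou University; MBZUAI; EPFL

AI summary Using the MMDG-Bench benchmark, this paper evaluates multimodal domain generalization methods and finds that, under fair comparison, existing methods make limited progress and that results vary across domains and modality configurations, indicating that the problem is far from solved.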

Comments Code: https://github.com/lihongzhao99/MMDG_Benchmark

Abstract

Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly across datasets, modality configurations, and experimental settings. Furthermore, existing benchmarks focus predominantly on action recognition, often neglecting critical real-world challenges such as input corruptions, missing modalities, and model trustworthiness. This lack of standardization obscures a reliable assessment of the field's advancement. To address this issue, we introduce MMDG-Bench, the first unified and comprehensive benchmark for MMDG, which standardizes evaluation across six datasets spanning three diverse tasks: action recognition, mechanical fault diagnosis, and sentiment analysis. MMDG-Bench encompasses six modality combinations, nine representative methods, and multiple evaluation settings. Beyond standard accuracy, it systematically assesses corruption robustness, missing-modality generalization, misclassification detection, and out-of-distribution detection. With 7,402 neural networks trained in total across 95 unique cross-domain tasks, MMDG-Bench yields five key findings: (1) under fair comparisons, recent specialized MMDG methods offer only marginal improvements over the ERM baseline; (2) no single method consistently outperforms others across datasets or modality combinations; (3) a substantial gap to upper-bound performance persists, indicating that MMDG remains far from solved; (4) trimodal fusion does not consistently outperform the strongest bimodal configurations; and (5) all evaluated methods exhibit significant degradation under corruption and missing-modality scenarios, with some methods further compromising model trustworthiness.

2605.06642 2026-05-08 cs.CL cs.AI

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin

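Affiliations * The Chinese University of Hong Kong; Shanghai Artificial Intelligence Laboratory; University of Georgia; University of Oxford; Shenzhen Loop Area Institute

AI summary This paper proposes StraTA, a framework that introduces an explicit trajectory-level strategy to improve the sample efficiency and final performance of agentic reinforcement learning; experiments show that it outperforms baselines on ALFWorld, WebShop, and SciWorld.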

Comments 26 pages, 4 figures, 7 tables

Abstract

Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because current methods are largely purely reactive, which weakens both exploration and credit assignment over extended trajectories. In this work, we present Strategic Trajectory Abstraction (StraTA), a simple framework that introduces an explicit trajectory-level strategy into agentic reinforcement learning (RL). StraTA samples a compact strategy from the initial task state, conditions subsequent actions on that strategy, and trains strategy generation and action execution jointly with a hierarchical GRPO-style rollout design, further enhanced by diverse strategy rollout and critical self-judgment. Experiments on ALFWorld, WebShop, and SciWorld show that StraTA consistently improves both sample efficiency and final performance over strong baselines. StraTA reaches success rates of 93.1% on ALFWorld and 84.2% on WebShop. On SciWorld, StraTA attains a 63.5% overall score, outperforming frontier closed-source models.

2605.06641 2026-05-08 cs.AI cs.CV

GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation

Ziyu Zhai, Siyou Li, Juexi Shao, Juntao Yu

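Affiliations * Queen Mary University of London

AI summary This paper presents GlazyBench, the first benchmark dataset for AI-assisted glaze design, comprising 23,148 real glaze formulations and supporting glaze property prediction and image generation; baselines built with traditional machine learning and large language models show promising yet challenging results.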

Abstract

Developing ceramic glazes is a costly, time-consuming process of trial and error due to complex chemistry, placing a significant burden on independent artists. While recent advances in multimodal AI offer a modern solution, the field lacks the large-scale datasets required to train these models. We propose GlazyBench, the first dataset for AI-assisted glaze design. Comprising 23,148 real glaze formulations, GlazyBench supports two primary tasks: predicting post-firing surface properties, such as color and transparency, from raw materials, and generating accurate visual representations of the glaze based on these properties. We establish comprehensive baselines for property prediction using traditional machine learning and large language models, alongside image generation benchmarks using deep generative and large multimodal models. Our experiments demonstrate promising yet challenging results. GlazyBench pioneers a new research direction in AI-assisted material design, providing a standardized benchmark for systematic evaluation.

2605.06640 2026-05-08 cs.LG cs.AI

Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models

基于概念的溯因与对比解释:用于解释视觉模型的行为

Ronaldo Canizales, Divya Gopinath, Corina Păsăreanu, Ravi Mangal

发表机构 * Colorado State University(科罗拉多州立大学) KBR Inc.(KBR公司) Carnegie Mellon University(卡内基梅隆大学)

AI总结 本文提出基于概念的溯因与对比解释方法,用于解释视觉模型的行为,通过概念擦除过程建立因果关系,实现对单张图像和图像集合的预测理解。

详情
AI中文摘要

基于概念的解释方法有望用高层次、人类可理解的概念来解释深度神经网络的预测。然而,现有方法要么未能建立概念与模型预测之间的因果联系,要么表达能力有限,只能推断涉及单个概念的因果解释。与此同时,关于*形式化溯因与对比解释*的平行工作可以计算对模型结果具有因果相关性的最小输入特征集,但仅考虑像素等低层次特征。本文将这两条研究线索结合起来,提出*基于概念的溯因与对比解释*,用于刻画对模型结果具有因果相关性的最小高层次概念集。随后,我们提出一组算法,在使用*概念擦除*过程建立因果关系的同时枚举所有最小解释。通过适当地聚合这些解释,我们不仅能够理解模型对单张图像的预测,还能理解模型在一组表现出用户指定的共同*行为*的图像上的预测。我们在多个模型、数据集和行为上评估了该方法,并展示了其在计算有用且便于用户理解的解释方面的有效性。

英文摘要

*Concept-based explanations* offer a promising approach for explaining the predictions of deep neural networks in terms of high-level, human-understandable concepts. However, existing methods either do not establish a causal connection between the concepts and model predictions or are limited in expressivity and only able to infer causal explanations involving single concepts. At the same time, the parallel line of work on *formal abductive and contrastive explanations* computes the minimal set of input features causally relevant for model outcomes but only considers low-level features such as pixels. Merging these two threads, in this work, we propose the notion of *concept-based abductive and contrastive explanations* that capture the minimal sets of high-level concepts causally relevant for model outcomes. We then present a family of algorithms that enumerate all minimal explanations while using *concept erasure* procedures to establish causal relationships. By appropriately aggregating such explanations, we are not only able to understand model predictions on individual images but also on collections of images where the model exhibits a user-specified, common *behavior*. We evaluate our approach on multiple models, datasets, and behaviors, and demonstrate its effectiveness in computing helpful, user-friendly explanations.

2605.06639 2026-05-08 cs.LG cs.AI cs.CL cs.MA

Recursive Agent Optimization

递归智能体优化

Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang, Aviral Kumar, Graham Neubig

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Amazon AGI Labs(亚马逊人工智能实验室)

AI总结 本文提出递归智能体优化方法,通过递归代理训练实现更高效的推理扩展和泛化能力,提升任务处理效率和泛化性能。

详情
AI中文摘要

我们引入递归智能体优化(RAO),一种用于训练递归代理的强化学习方法:代理能够生成并委托子任务给新的自身实例。递归代理实现一种推理时间扩展算法,自然允许代理扩展到更长的上下文并泛化到更困难的问题。RAO提供一种训练模型以充分利用此类递归推理的方法,教导代理何时和如何委托和沟通。我们发现通过这种方式训练的递归代理在训练效率上更好,能够处理超出模型上下文窗口的任务,泛化到比训练任务更困难的任务,并且相比单代理系统能减少实际时间。

英文摘要

We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer contexts and generalize to more difficult problems via divide-and-conquer. RAO provides a method to train models to best take advantage of such recursive inference, teaching agents when and how to delegate and communicate. We find that recursive agents trained in this way enjoy better training efficiency, can scale to tasks that go beyond the model's context window, generalize to tasks much harder than the ones the agent was trained on, and can enjoy reduced wall-clock time compared to single-agent systems.

2605.06635 2026-05-08 cs.CL

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

被引用但未验证:解析和评估LLM深度研究代理中的源归属

Hailey Onweller, Elias Lumer, Austin Huber, Pia Ramchandani, Vamse Kumar Subbiah, Corey Feld

发表机构 * Commercial Technology and Innovation Office, PricewaterhouseCoopers, U.S(普华永道商业技术与创新办公室,美国)

AI总结 本文提出首个源归属评估框架,通过AST解析器大规模评估LLM生成的Markdown报告中的引用,从链接有效性、内容相关性和事实准确性三个维度验证引用可靠性,揭示了表面引用质量与事实可靠性之间的关键断层。

详情
AI中文摘要

大型语言模型(LLMs)驱动深度研究代理,将数百个网络来源的信息综合为引用报告,但这些引用无法可靠验证。当前方法或依赖模型自我引用,风险偏见,或采用检索增强生成(RAG)不验证来源可访问性、相关性或事实一致性。我们引入首个源归属评估框架,使用可复现的AST解析器大规模提取并评估LLM生成的Markdown报告中的内联引用。与仅验证声明的方法不同,该框架通过检索实际引用内容,使人类或模型评估者能根据来源判断每个引用。引用在三个维度上评估:(1)链接有效性验证URL可访问性,(2)相关内容测量主题一致性,(3)事实核查验证事实准确性。我们通过基于评分标准的LLM-as-a-judge评估者,对14个闭源和开源LLM在三个评估维度上进行基准测试。结果表明,即使最强的前沿模型保持链接有效性超过94%和相关性超过80%,但事实准确性仅为39-77%,而不到一半的开源模型能成功生成引用报告。研究深度的消融研究显示,事实核查准确性在两个前沿模型中平均下降42%,表明更多检索不产生更准确的引用。这些发现揭示了表面引用质量与事实可靠性之间的关键断层,且我们的框架提供了评估该断层的基础设施。

英文摘要

Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation (RAG) that does not validate source accessibility, relevance, or factual consistency. We introduce the first source attribution evaluation framework that uses a reproducible AST parser to extract and evaluate inline citations from LLM-generated Markdown reports at scale. Unlike methods that verify claims in isolation, our framework closes the loop by retrieving the actual cited content, enabling human or model evaluators to judge each citation against its source. Citations are evaluated along three dimensions: (1) Link Works verifies URL accessibility, (2) Relevant Content measures topical alignment, and (3) Fact Check validates factual accuracy against source content. We benchmark 14 closed-source and open-source LLMs across these three dimensions using rubric-based LLM-as-a-judge evaluators calibrated through human review. Our results reveal that even the strongest frontier models maintain link validity above 94% and relevance above 80%, yet achieve only 39-77% factual accuracy, while fewer than half of open-source models successfully generate cited reports in a one-shot setting. Ablation studies on research depth show that Fact Check accuracy drops by approximately 42% on average across two frontier models as tool calls scale from 2 to 150, demonstrating that more retrieval does not produce more accurate citations. These findings reveal a critical disconnect between surface-level citation quality and factual reliability, and our framework provides the evaluation infrastructure to assess the disconnect.
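
As a rough illustration of the extraction step described above, the sketch below pulls inline citations out of a Markdown report. It is a simplification under stated assumptions: it uses a regular expression where the paper uses an AST parser, it only handles `[text](url)` links, and the names `INLINE_LINK` and `extract_citations` are hypothetical rather than taken from the authors' framework.

```python
import re

# Hypothetical pattern: matches inline Markdown links of the form [anchor](http(s)://url).
# A regex stands in for the AST parser described in the abstract.
INLINE_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def extract_citations(markdown_text):
    """Return one record per inline link, with the line it supports as context."""
    citations = []
    for line_no, line in enumerate(markdown_text.splitlines(), start=1):
        for match in INLINE_LINK.finditer(line):
            citations.append(
                {
                    "line": line_no,            # 1-based line number in the report
                    "anchor": match.group(1),   # visible link text
                    "url": match.group(2),      # cited source URL
                    "claim": line.strip(),      # surrounding sentence(s) to check
                }
            )
    return citations


report = (
    "Sales rose 5% last year [1](https://example.com/a).\n"
    "This line has no citation.\n"
    "See [report](http://example.org/b) and [x](https://example.com/c)."
)
records = extract_citations(report)
```

Each record could then be passed to separate checks for the three dimensions named in the abstract (URL accessibility, topical relevance, factual accuracy); those checks are not sketched here.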

2605.06632 2026-05-08 cs.LG

Crafting Reversible SFT Behaviors in Large Language Models

在大型语言模型中构建可逆的SFT行为

Yuping Lin, Pengfei He, Yue Xing, Yingqian Cui, Jiayuan Ding, Subhabrata Mukherjee, Hui Liu, Zhen Xiang

发表机构 * Michigan State University(密歇根州立大学) Hippocratic AI(希波克拉底AI) University of Georgia(佐治亚大学)

AI总结 本文提出LCDD和SFT-Eraser方法,通过构建稀疏必要子网络实现对SFT诱导行为的可控逆向,验证了结构对行为因果必要性的重要性。

详情
AI中文摘要

监督微调(SFT)在大型语言模型中诱导出新行为,但未对这些行为在模型中的分布施加结构性约束。现有行为解释方法如回路归因方法,通过事后识别与SFT诱导行为相关的稀疏子网络,但这些相关性不意味着因果必要性,限制了在推理时对SFT诱导行为的选择性控制。本文提出一种替代方法:能否将SFT诱导行为有意压缩到稀疏、机理必要的子网络(称为载体)中,同时在推理时无需修改权重即可保持可控?我们提出(a)损失约束双下降(LCDD),通过联合优化路由掩码和模型权重,在显式效用预算下构建此类载体;(b)SFT-Eraser,通过在提取的载体通道上进行激活匹配优化,实现对SFT诱导行为的逆向。在多个模型家族上的安全、固定响应和风格行为中,LCDD生成的稀疏载体在保持目标行为的同时,当被SFT-Eraser触发时能实现强逆向。消融实验进一步证实稀疏结构是逆向的关键前提:相同的触发优化在标准SFT模型上失效,证实结构而非触发设计是关键因素。这些结果提供了直接证据,证明学习到的载体对行为具有因果必要性,指明了系统本地化和选择性抑制部署模型中SFT诱导行为的新方向。

英文摘要

Supervised fine-tuning (SFT) induces new behaviors in large language models, yet imposes no structural constraint on how these behaviors are distributed within the model. Existing behavior interpretation methods, such as circuit attribution approaches, identify sparse subnetworks correlated with SFT-induced behaviors post-hoc. However, such correlations do not imply *causal necessity*, limiting the ability to selectively control SFT-induced behaviors at inference time. We pursue an alternative by asking: can an SFT-induced behavior be deliberately compressed into a sparse, mechanistically necessary subnetwork, termed a *carrier*, while remaining controllable at inference time without weight modification? We propose (a) **Loss-Constrained Dual Descent (LCDD)**, which constructs such carriers by jointly optimizing routing masks and model weights under an explicit utility budget, and (b) **SFT-Eraser**, a soft prompt optimized via activation matching on extracted carrier channels, to reverse the SFT-induced behavior. Across safety, fixed-response, and style behaviors on multiple model families, LCDD yields sparse carriers that preserve target behaviors while enabling strong reversion when triggered by SFT-Eraser. Ablations further establish that the sparse structure is the key precondition for reversal: the same trigger optimization fails on standard SFT models, confirming that structure rather than trigger design is the operative factor. These results provide direct evidence that the learned carriers are causally necessary for the behaviors, pointing to a new direction for systematically localizing and selectively suppressing SFT-induced behaviors in deployed models.

2605.06629 2026-05-08 cs.LG

Hybrid Quantum-Classical GANs for the Generation of Adversarial Network Flows

用于生成对抗性网络流量的混合量子-经典GAN

Prateek Paudel, Nitin Jha, Abhishek Parakh, Mahadevan Subramaniam

发表机构 * Kennesaw State University(肯尼索州立大学) University of Nebraska Omaha(内布拉斯加大学奥马哈分校)

AI总结 本文提出混合量子-经典GAN框架,利用量子生成器生成模拟恶意流量的合成网络流,通过量子态编码减少计算开销,并测试其对经典IDS的绕过能力,探索量子机器学习在生成高级攻击流中的潜力。

Comments 14 pages

详情
AI中文摘要

经典生成对抗网络(GANs)已被用于生成能够攻击入侵检测系统(IDS)的对抗网络流量,但存在需要大量高维数据集、模式崩溃和计算开销大的缺点。本文提出混合量子-经典GAN(QC-GAN)框架,其中变分量子生成器利用潜在表示生成合成网络流量。通过将潜在向量编码为量子态,实现更丰富的潜在表示并减少计算开销。经典判别器在真实数据集(UNSW-NB15)和QC-GAN生成的伪造流量上进行训练。生成器旨在最小化判别器区分真实与伪造流量的能力,而判别器旨在最大化分类准确率。在攻击模型中,假设攻击者具有有限的量子计算能力,而判别器选择为经典模型。通过经典IDS模型(如随机森林分类器和基于卷积神经网络的分类器)测试生成的流量以绕过检测过程。本文旨在突出量子机器学习在生成高级攻击流中的可能性,并强调对经典IDS的应力测试。最后,评估硬件噪声对攻击的影响,提供IDS的新视角,强调需要量子鲁棒的防御系统。

英文摘要

Classical generative adversarial networks (GANs) have been applied to generate adversarial network traffic capable of attacking intrusion detection systems, but they suffer from shortcomings such as the need for large amounts of high-dimensional datasets, mode collapse, and high computational overhead. In this work, we propose a hybrid quantum-classical GAN (QC-GAN) framework where a variational quantum generator is used to generate synthetic network traffic flows mimicking malicious traffic using latent representations. Instead of sampling classical noise vectors, we encode the latent vector (the hidden features) as a quantum state, which is the basis for claiming more expressive latent representations and reducing computational overhead. A classical discriminator will be trained on real-world datasets (UNSW-NB15) and the proposed QC-GAN-generated fake network flows. In this configuration, the generator aims to minimize the discriminator's ability to distinguish real from fake traffic, while the discriminator aims to maximize its classification accuracy, in an iterative manner. In our attack model, we assume that the attacker is a state actor with access to limited quantum computing power, whereas the discriminator is chosen to be classical, as will likely be the case for most end users and organizations. We test the generated flows using classical intrusion detection system (IDS) models, such as a random forest classifier and a convolutional neural network-based classifier, for their ability to bypass the detection process. This work aims to highlight the possibilities of quantum machine learning as a means of generating advanced attack flows and stress testing classical IDS. Lastly, we further evaluate how hardware-based noise affects these attacks to offer a new perspective on IDS, highlighting the need for a quantum resilient defense system.

2605.06627 2026-05-08 cs.SD cs.LG

PianoCoRe: Combined and Refined Piano MIDI Dataset

PianoCoRe:综合与优化的钢琴MIDI数据集

Ilya Borovik

发表机构 * Skolkovo Institute of Science and Technology(斯科尔科沃科学技术研究院)

AI总结 PianoCoRe数据集整合并优化了多个开源钢琴数据集,包含250,046次演奏5,625首作品,提供高质量的MIDI数据及注释对齐功能,支持不同应用需求。

Comments Published in TISMIR. Project repository: https://github.com/ilya16/PianoCoRe

Journal ref Transactions of the International Society for Music Information Retrieval, 9(1), 144-163, 2026

详情
AI中文摘要

PianoCoRe数据集通过整合并优化多个开源钢琴数据集,提供了高质量的MIDI数据及注释对齐功能,包含250,046次演奏5,625首作品,由483位作曲家创作,总计21,763小时的演奏音乐。数据集分为多个层级子集以支持不同应用:从大规模分析和预训练(PianoCoRe-C和去重的PianoCoRe-B)到具有音符级注释对齐的表达性表演建模(PianoCoRe-A/A*)。其中,音符对齐子集PianoCoRe-A提供了到目前为止最大的开源157,207次演奏与1,591首乐谱对齐的集合。此外,数据集还贡献了MIDI质量分类器用于检测损坏和乐谱样式的转录,以及RAScoP对齐精修流程,用于清理时间对齐错误并插值缺失的音符。分析显示,精修过程减少了时间噪声并消除了节奏异常。此外,基于PianoCoRe训练的表达性表演渲染模型在面对未见过的曲目时表现出比基于原始或较小数据集训练的模型更强大的鲁棒性。PianoCoRe为下一代表达性钢琴表演研究提供了现成的基础。

英文摘要

Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora. The dataset contains 250,046 performances of 5,625 pieces written by 483 composers, totaling 21,763 h of performed music. PianoCoRe is released in tiered subsets to support different applications: from large-scale analysis and pre-training (PianoCoRe-C and deduplicated PianoCoRe-B) to expressive performance modeling with note-level score alignment (PianoCoRe-A/A*). The note-aligned subset, PianoCoRe-A, provides the largest open-source collection of 157,207 performances aligned to 1,591 scores to date. In addition to the dataset, the contributions are: (1) a MIDI quality classifier for detecting corrupted and score-like transcriptions and (2) RAScoP, an alignment refinement pipeline that cleans temporal alignment errors and interpolates missing notes. The analysis shows that the refinement reduces temporal noise and eliminates tempo outliers. Moreover, an expressive performance rendering model trained on PianoCoRe demonstrates improved robustness to unseen pieces compared to models trained on raw or smaller datasets. PianoCoRe provides a ready-to-use foundation for the next generation of expressive piano performance research.

2605.06625 2026-05-08 cs.CL

Parser agreement and disagreement in L2 Korean UD: Implications for human-in-the-loop annotation

L2韩语UD中的解析一致与不一致:对人类在环标注的启示

Hakyung Sung, Gyu-Ho Shin

发表机构 * Rochester Institute of Technology(罗切斯特技术学院) University of Illinois Chicago(伊利诺伊大学芝加哥分校)

AI总结 本文提出一种简化的人工在环流程,利用两个领域适应解析器的一致性进行第二语言韩语形态语法标注,通过对比解析器与人类判断发现其高度一致,支持半自动标注的可行性,并指出解析分歧集中在语言可预测领域。

Comments To be published in the 20th Linguistic Annotation Workshop

详情
AI中文摘要

我们提出了一种简化的人工在环流程,通过利用两个领域适应解析器之间的一致性,对第二语言(L2)韩语的形态语法标注进行简化。我们首先评估解析器一致性是否能作为标注正确性的代理,通过与独立人类判断进行比较。结果表明解析器与人类判断之间有很强的一致性,支持半自动L2-韩语UD标注的可行性。进一步分析显示,解析器的不一致集中在语言上可预测的领域,如语法关系区分和从句边界模糊性。尽管许多不一致案例可通过迭代模型优化解决,但其他案例反映了在解析和标注L2-韩语语料时固有的表示挑战。

英文摘要

We propose a simplified human-in-the-loop workflow for second language (L2) Korean morphosyntactic annotation by leveraging agreement between two domain-adapted parsers. We first evaluate whether parser agreement can serve as a proxy for annotation correctness by comparing it with independent human judgments. The results show strong correspondence between parser and human judgments, supporting the feasibility of semi-automatic L2-Korean UD annotation. Further analysis demonstrates that parser disagreements cluster in linguistically predictable domains such as grammatical-relation distinctions and clause-boundary ambiguity. While many disagreement cases are tractable for iterative model refinement, others reflect deeper representational challenges inherent in parsing and tagging L2-Korean corpora.

2605.06619 2026-05-08 cs.CL cs.CY

Algospeak, Hiding in the Open: The Trade-off Between Legible Meaning and Detection Avoidance

Algospeak,隐于公开之中:可读含义与规避检测之间的权衡

Jan Fillies, Ronald E. Robertson, Jeffrey Hancock

发表机构 * Stanford University(斯坦福大学) Freie Universität Berlin(柏林自由大学)

AI总结 本文研究了Algospeak(一类语言规避策略)在可理解性与规避检测之间的权衡,提出了“多数可理解调制”(MUM)概念,并通过实验估计了相应的阈值。

Comments Under Review

详情
AI中文摘要

随着大型语言模型(LLMs)越来越多地介入内容生成和内容审核,被称为Algospeak的语言规避策略加剧了规避者与检测者之间的协同演化。本研究基于联合行动模型,形式化了其背后的动态:Algospeak程度越高,可检测性和可理解性越低。此外,本文引入了多数可理解调制(MUM)的概念,将其定义为这样一个调制水平:在此之上,进一步的规避性改动虽能提高对检测器的规避能力,却会使大多数接收者无法理解。为了实证考察这一权衡,我们提出了一个可复现的框架,可基于现有分类法生成保留原意、调制水平可调的Algospeak风格变体。以新冠虚假信息作为首个示例场景,我们构建了一个包含700个调制条目的参考数据集,这些条目源自二十个基础句子,涵盖五个调制水平和七种策略。随后,我们使用七个不同的语言模型进行了两项相互关联的评估:一项通过意义恢复测试理解能力,另一项通过分类测试虚假信息检测。对调制水平进行曲线拟合可得到多数可理解调制阈值的估计,并支持跨策略和跨模型的敏感性分析,见图1。结果揭示了可理解性与调制之间的特征关系。本研究为理解Algospeak背后的动态奠定了基础,并提供了文中所述的框架、数据集和实验设置。

英文摘要

As large language models (LLMs) increasingly mediate both content generation and moderation, linguistic evasion strategies known as Algospeak have intensified the coevolution between evaders and detectors. This research formalizes the underlying dynamics grounded in a joint action model: when Algospeak increases, detectability and understandability decrease. Further, the concept of Majority Understandable Modulation (MUM) is introduced and defined as the modulation level at which additional evasive alteration increases detector evasion but loses comprehension for the majority of recipients. To empirically probe this trade-off, we introduce a reproducible framework that can be used to create meaning-preserving, Algospeak-style variants, based on an existing taxonomy and with tunable modulation levels. Using COVID-19 disinformation as a first proof-by-example setting, we construct a reference dataset of 700 modulated items, drawn from twenty base sentences across five modulation levels and seven strategies. We then run two linked evaluations with seven different language models: one testing for interpretation through meaning recovery and one for disinformation detection through classification. Curve fitting over modulation levels yields an estimate of the Majority Understandable Modulation threshold and enables sensitivity analyses across strategies and models, see Figure 1. Results reveal the characteristic relationships between understandability and modulation. This study lays the groundwork for understanding the dynamics behind Algospeak and provides the framework, dataset, and experimental setups described.

2605.06615 2026-05-08 cs.LG cs.AI cs.CL math.OC

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

何时以及为何SignSGD优于SGD:基于ℓ1范数下界的一个理论研究

Hongyi Tao, Dingzhi Yu, Lijun Zhang

发表机构 * State Key Laboratory of Novel Software Technology, Nanjing University(南京大学新型软件技术国家重点实验室) School of Artificial Intelligence, Nanjing University(南京大学人工智能学院)

AI总结 本文通过分析ℓ1范数平稳性、ℓ∞光滑性和可分离噪声模型,揭示SignSGD在稀疏噪声下比SGD更高效的原因,并将该框架推广到矩阵域,为Muon优化器给出相应的最优下界。

Comments Code is available at https://github.com/Dingzhen230/SignSGD_Outperforms_SGD

详情
AI中文摘要

基于符号的优化算法(如SignSGD和Muon)因在训练大型基础模型中的出色表现而备受关注。尽管在经验上取得了成功,我们仍缺乏对这些基于符号的方法何时以及为何优于普通SGD的理论理解。核心障碍在于,在标准光滑性和有限方差条件下,SGD已知是寻找以ℓ2范数度量的平稳点的极小极大最优方法,从而在标准设置中从根本上排除了基于符号的方法获得复杂度增益的可能。为克服这一障碍,我们借助ℓ1范数平稳性、ℓ∞光滑性和可分离噪声模型来分析基于符号的优化器,这些条件能更好地刻画符号更新逐坐标的性质。在这一不同的问题几何下,我们推导了SignSGD相匹配的上界和下界,并明确刻画了SignSGD可证明优于SGD的问题类别。具体而言,我们将SignSGD的上界与SGD的下界进行比较,表明在稀疏噪声下,SignSGD可将复杂度降低d倍,其中d为问题维度。此外,我们将这一框架推广到矩阵域,为Muon优化器给出等价的最优下界,证明将符号算子推广到矩阵后仍保持这种关于维度的最优缩放。最后,我们将理论界与实践联系起来,表明SignSGD的理论优势准确预测了其在1.24亿参数GPT-2模型预训练中更快的收敛。

英文摘要

Sign-based optimization algorithms, such as SignSGD and Muon, have garnered significant attention for their remarkable performance in training large foundation models. Despite this empirical success, we still lack a theoretical understanding of when and why these sign-based methods outperform vanilla SGD. The core obstacle is that under standard smoothness and finite variance conditions, SGD is known to be minimax optimal for finding stationary points measured by $\ell_2$-norms, thereby fundamentally precluding any complexity gains for sign-based methods in standard settings. To overcome this barrier, we analyze sign-based optimizers leveraging $\ell_1$-norm stationarity, $\ell_\infty$-smoothness, and a separable noise model, which can better capture the coordinate-wise nature of signed updates. Under this distinct problem geometry, we derive matched upper and lower bounds for SignSGD and explicitly characterize the problem class in which SignSGD provably dominates SGD. Specifically, we compare the \emph{upper bound of SignSGD} with the \emph{lower bound of SGD}, illustrating that SignSGD effectively reduces the complexity by a factor of $d$ under \emph{sparse noise}, where $d$ is the problem dimension. Furthermore, we elevate this framework to the matrix domain, providing an equivalent optimal lower bound for the Muon optimizer, proving that extending the sign operator to matrices preserves this optimal scaling with dimensionality. Finally, we bridge our theoretical bounds to practice, demonstrating that the theoretical superiority of SignSGD accurately predicts its faster convergence during the pretraining of a 124M parameter GPT-2 model.
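
For readers unfamiliar with the two update rules being compared, a minimal NumPy sketch follows. The function names, the learning rate, and the example vectors are illustrative assumptions, not code from the paper or its repository. The point is only that SignSGD discards gradient magnitudes and moves every nonzero coordinate by exactly the step size, which is why an $\ell_1$-type stationarity measure is a natural fit.

```python
import numpy as np


def sgd_step(theta, grad, lr):
    """Vanilla SGD: step against the (stochastic) gradient."""
    return theta - lr * grad


def signsgd_step(theta, grad, lr):
    """SignSGD: step against the coordinate-wise sign of the gradient."""
    return theta - lr * np.sign(grad)


def l1_stationarity(grad):
    """l1-norm of the gradient, the stationarity measure named in the abstract."""
    return float(np.abs(grad).sum())


theta = np.array([1.0, -2.0, 0.0])
grad = np.array([0.5, -3.0, 0.0])

theta_sgd = sgd_step(theta, grad, lr=0.1)       # coordinates move by lr * |g_i|
theta_sign = signsgd_step(theta, grad, lr=0.1)  # nonzero coordinates move by lr
```

Note that `np.sign(0.0)` is `0.0`, so a coordinate with an exactly zero gradient is left unchanged in this sketch.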

2605.06614 2026-05-08 cs.AI cs.CL

SkillOS: Learning Skill Curation for Self-Evolving Agents

SkillOS: 为自演化代理学习技能编目

Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee

发表机构 * Google Cloud AI Research(谷歌云人工智能研究) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Massachusetts Institute of Technology(麻省理工学院)

AI总结 SkillOS通过经验驱动的强化学习方法,解决自演化代理中复杂长期技能编目的学习问题,优于记忆-free和强记忆基线,在效果和效率上均表现优异,技能库逐步演变成更丰富的Markdown文件。

Comments 11 pages, 6 figures, 3 tables

详情
AI中文摘要

基于大语言模型的代理越来越多地被用于处理流式任务,但它们通常仍是一次性问题解决者,无法从过去交互中学习。从经验中提取的可重用技能为自演化提供了自然基质,其中高质量的技能编目是关键瓶颈。现有方法要么依赖手动技能编目,要么规定启发式技能操作,或训练短期地平线技能操作。然而,它们仍然难以从间接和延迟反馈中学习复杂长期编目策略。为解决这一挑战,我们提出了SkillOS,一种基于经验的强化学习训练配方,用于学习自演化代理中的技能编目。SkillOS将一个冻结的代理执行器(检索和应用技能)与可训练的技能编目器(更新外部SkillRepo)配对。为了为编目提供学习信号,我们设计了复合奖励,并基于技能相关的任务依赖性在分组任务流上进行训练,其中早期轨迹更新SkillRepo,而后期相关任务评估这些更新。在多轮代理任务和单轮推理任务中,SkillOS在效果和效率上均优于无记忆和强记忆基线,所学的技能编目器能跨不同执行器backbone和任务领域泛化。进一步分析显示,所学的编目器产生更针对性的技能使用,而SkillRepo中的技能逐步演变成更丰富的Markdown文件,编码更高层次的元技能。

英文摘要

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect and delayed feedback. To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we design composite rewards and train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time.

2605.06612 2026-05-08 cs.LG cs.ET stat.ML

Online Bayesian Calibration under Gradual and Abrupt System Changes

在线贝叶斯校准在渐进和突发系统变化下的应用

Yang Xu, Chiwoo Park

发表机构 * Department of Industrial and Systems Engineering, University of Washington, Seattle, WA 98195(华盛顿大学工业与系统工程系,华盛顿州西雅图 98195)

AI总结 本文提出BRPC框架,用于处理流数据中的系统渐进变化和突发变化,通过分离校准参数更新和偏差更新,提升校准精度和鲁棒性。

详情
AI中文摘要

贝叶斯模型校准是数字孪生和计算机实验的核心,通过估计校准参数和纠正系统偏差来对齐模型输出与现场观测。经典贝叶斯校准引入了潜变量参数和偏差函数,但存在参数-偏差混淆问题,并通常假设数据生成过程是平稳的。这些限制在现代数字孪应用中尤为突出,因为系统随时间演变,可能表现出渐进漂移和突发制度转换。虽然数据同化方法能够实现顺序更新,但通常不显式建模系统偏差,并在突发变化下效果较差。本文提出贝叶斯递归投影校准(BRPC),一种用于流数据的在线贝叶斯校准框架,以应对模拟器不匹配和非平稳性。BRPC通过将偏差无关的粒子更新用于校准参数和条件高斯过程更新用于偏差,扩展了投影校准到在线设置,从而在渐进系统演变下实现偏差感知的适应。为处理突发变化,BRPC集成了重启机制,用于检测制度转换并重置校准过程。本文为这两个组件建立了理论保证,包括在渐进演变下的跟踪性能以及重启机制的误报和检测行为。在合成和植物模拟基准上的实证研究显示,BRPC在渐进变化下提高了校准精度,而带有重启机制的BRPC在突发制度转换下相比滑动窗口贝叶斯校准和数据同化基线进一步提高了鲁棒性和预测性能。

英文摘要

Bayesian model calibration is central to digital twins and computer experiments, as it aligns model outputs with field observations by estimating calibration parameters and correcting systematic model bias. Classical Bayesian calibration introduces latent parameters and a discrepancy function to model bias, but suffers from parameter--discrepancy confounding and is typically formulated as an offline procedure under a stationary data-generating assumption. These limitations are restrictive in modern digital twin applications, where systems evolve over time and may exhibit gradual drift and abrupt regime shifts. While data assimilation methods enable sequential updates, they generally do not explicitly model systematic bias and are less effective under abrupt changes. We propose Bayesian Recursive Projected Calibration (BRPC), an online Bayesian calibration framework for streaming data under simulator mismatch and nonstationarity. BRPC extends projected calibration to the online setting by separating a discrepancy-free particle update for calibration parameters from a conditional Gaussian process update for discrepancy, preserving identifiability while enabling bias-aware adaptation under gradual system evolution. To handle abrupt changes, BRPC is integrated with restart mechanisms that detect regime shifts and reset the calibration process. We establish theoretical guarantees for both components, including tracking performance under gradual evolution and false-alarm and detection behavior for restart mechanisms. Empirical studies on synthetic and plant-simulation benchmarks show that BRPC improves calibration accuracy under gradual changes, while restart-augmented BRPC further improves robustness and predictive performance under abrupt regime shifts compared to sliding-window Bayesian calibration and data assimilation baselines.

2605.06611 2026-05-08 cs.LG cs.AI stat.ML

The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity

注意力陷阱的结构起源:方差差异、超级神经元和维度不均等

Siquan Li, Kaiqi Jiang, Jiacheng Sun, Tianyang Hu

发表机构 * The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)) Huawei Foundation Model Department(华为基础模型部门)

AI总结 本文揭示了大语言模型中注意力陷阱现象的结构成因,通过分析自注意力机制中的方差差异和前馈网络中超级神经元的激活,证明了维度不均等导致注意力陷阱的形成,并提出head-wise RMSNorm架构改进以稳定值聚合。

Comments Accepted to ICML 2026

详情
AI中文摘要

尽管注意力陷阱(attention sink)现象,即初始token不成比例地垄断注意力分数,在大型语言模型(LLMs)中普遍存在,但其结构起源仍不明确。本文为这一现象提供了机理解释。首先,我们将其根源追溯到自注意力机制固有的值聚合过程,该过程会导致系统性的方差差异。我们进一步证明,这种差异会因前馈网络(FFN)层中超级神经元的激活而被急剧放大。具体而言,通道稀疏的下投影引发了第一个token表示的维度不均等,从而必须形成注意力陷阱作为结构锚点。随后,我们通过两种受控干预验证了这一因果链:(i) 通过修改注意力掩码隔离聚合效应;(ii) 放大目标token表示的方差。两种干预均可在任意位置复现注意力陷阱。我们的机理理解为系统地控制陷阱的形成奠定了基础。最后,作为概念验证,我们提出head-wise RMSNorm,一种在预训练期间稳定值聚合输出的架构改进。实验表明,恢复各位置之间的统计均衡可显著加快收敛。

英文摘要

Despite the prevalence of the attention sink phenomenon in Large Language Models (LLMs), where initial tokens disproportionately monopolize attention scores, its structural origins remain elusive. This work provides a \textit{mechanistic explanation} for this phenomenon. First, we trace its root to the value aggregation process inherent in self-attention, which induces a systematic variance discrepancy. We further demonstrate that this discrepancy is drastically amplified by the activation of super neurons within Feed-Forward Network (FFN) layers. Specifically, the channel-sparse down-projections trigger a dimension disparity of the first-token representation, necessitating the formation of attention sinks as a structural anchor. Then, we validate this causal chain through two controlled interventions: (i) isolating the aggregation effect via attention mask modifications and (ii) amplifying the variance of targeted token representations. Both interventions can replicate attention sinks at arbitrary positions. Our mechanistic understanding offers a foundation for the systematic control of sink formation. Finally, as a proof of concept, we propose \textit{head-wise RMSNorm}, an architectural modification that stabilizes value aggregation outputs during pre-training. Our experiments demonstrate that restoring statistical parity across positions significantly accelerates convergence.
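
The sketch below is one plausible reading of two ideas in this abstract, written under explicit assumptions rather than taken from the paper. First, with causal masking the first position aggregates only its own value vector while later positions average over many, so their outputs have different variance; uniform attention weights are used here purely for illustration. Second, "head-wise RMSNorm" is assumed to mean rescaling each head's slice of the aggregated output to unit root-mean-square; the paper's exact placement and any learnable gain are not specified in the abstract. Both function names are hypothetical.

```python
import numpy as np


def causal_uniform_attention(values):
    """Aggregate values with uniform causal weights: position t averages values[0..t]."""
    seq_len = values.shape[0]
    weights = np.tril(np.ones((seq_len, seq_len)))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values


def headwise_rmsnorm(x, num_heads, eps=1e-6):
    """Rescale each head's slice of every token to (approximately) unit RMS."""
    seq_len, dim = x.shape
    heads = x.reshape(seq_len, num_heads, dim // num_heads)
    rms = np.sqrt((heads ** 2).mean(axis=-1, keepdims=True) + eps)
    return (heads / rms).reshape(seq_len, dim)


rng = np.random.default_rng(0)
values = rng.normal(size=(64, 256))          # 64 positions, model width 256
aggregated = causal_uniform_attention(values)

# Position 0 keeps the full variance of a single value vector; the last
# position averages 64 vectors, so its variance is much smaller.
first_var, last_var = aggregated[0].var(), aggregated[-1].var()

normalized = headwise_rmsnorm(aggregated, num_heads=8)
```

After normalization every (position, head) slice has roughly the same scale, which is the kind of statistical parity across positions the abstract refers to.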

2605.06609 2026-05-08 cs.LG stat.ML

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

Transformer 通过归一化梯度下降高效执行上下文逻辑回归

Chenyang Zhang, Yuan Cao

发表机构 * School of Computing & Data Science, The University of Hong Kong(香港大学计算机与数据科学学院)

AI总结 本文研究了softmax注意力机制的Transformer在线性分类数据上的上下文学习能力,通过构建多层Transformer实现上下文逻辑回归,证明其可通过训练单层自注意力层并循环应用获得,提供了训练收敛性和分布外泛化性的理论保障。

Comments 94 pages, 8 figures

详情
AI中文摘要

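Transformer已展现出卓越的上下文学习(ICL)能力。人们普遍认为,Transformer强大的ICL表现源于其能够在上下文上隐式执行某些算法,从而提升预测和生成效果。本文研究了采用softmax注意力的Transformer如何在线性分类数据上进行上下文学习。我们首先构造了一类能够执行上下文逻辑回归的多层Transformer,其中每一层恰好在上下文损失上执行一步归一化梯度下降。随后,我们证明所构造的Transformer可以通过以下方式获得:(i) 以一步梯度下降为监督训练单个自注意力层;(ii) 循环应用训练好的层以得到循环(looped)模型。我们给出了自注意力层的训练收敛保证以及循环模型的分布外泛化保证。我们的结果展示了softmax Transformer如何有效地充当上下文学习器,从而推进了对ICL机制的理论理解。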

英文摘要

Transformers have demonstrated remarkable in-context learning (ICL) capabilities. The strong ICL performance of transformers is commonly believed to arise from their ability to implicitly execute certain algorithms on the context, thereby enhancing prediction and generation. In this work, we investigate how transformers with softmax attention perform in-context learning on linear classification data. We first construct a class of multi-layer transformers that can perform in-context logistic regression, with each layer exactly performing one step of normalized gradient descent on an in-context loss. Then, we show that our constructed transformer can be obtained through (i) training a single self-attention layer supervised by one-step gradient descent, and (ii) recurrently applying the trained layer to obtain a looped model. Training convergence guarantees of the self-attention layer and out-of-distribution generalization guarantees of the looped model are provided. Our results advance the theoretical understanding of ICL mechanism by showcasing how softmax transformers can effectively act as in-context learners.
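
To make the target algorithm concrete, here is a minimal sketch of normalized gradient descent on a logistic loss over a small set of labeled examples, which plays the role of the "context". It runs the algorithm directly in NumPy and does not construct a transformer. The loss form, the Euclidean normalization, the step size, and the function names are assumptions for illustration; the paper's exact in-context loss and normalization may differ.

```python
import numpy as np


def logistic_loss_grad(w, X, y):
    """Gradient of (1/n) * sum_i log(1 + exp(-y_i * x_i^T w)), labels in {-1, +1}."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))   # -y_i * sigmoid(-margin_i)
    return (X * coeff[:, None]).mean(axis=0)


def normalized_gd(X, y, steps=10, lr=0.5, eps=1e-12):
    """Each step moves a fixed distance lr along the negative gradient direction."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        g = logistic_loss_grad(w, X, y)
        w = w - lr * g / (np.linalg.norm(g) + eps)
    return w


# Toy linearly separable context: the label is the sign of the first feature.
X = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = normalized_gd(X, y, steps=10, lr=0.5)
predictions = np.sign(X @ w)
```

In the abstract's construction, one transformer layer corresponds to one iteration of the loop above, and the looped model corresponds to running the loop for several steps.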

2605.06605 2026-05-08 cs.LG

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

需要多少次迭代才能突破限制?多轮LLM评估中的动态预算分配

Shai Feldman, Yaniv Romano

发表机构 * Department of Computer Science(计算机科学系) Technion, Israel(以色列理工学院) Departments of Electrical and Computer Engineering and of Computer Science(电气与计算机工程系和计算机科学系)

AI总结 本文提出DAPRO框架,通过动态预算分配在多轮LLM交互中提供事件发生时间的界限,解决静态方法效率低的问题,实验表明其在覆盖性和方差方面优于传统方法。

详情
AI中文摘要

评估和预测大型语言模型(LLMs)在多轮对话设置中的性能至关重要但计算成本高;关键事件——例如突破限制或代理成功完成任务——往往在多次交互后才出现。这些事件可能罕见,在任何可行的计算预算下可能未被观察到。最近的符合生存框架构建了可靠的下界预测界限(LPBs)以确定触发事件所需的迭代次数,但依赖静态预算分配,在多轮设置中效率低下。为了解决这个问题,我们引入了动态分配通过投影优化(DAPRO),这是第一个理论上有效的动态预算分配框架,用于在多轮LLM交互中界定时间到事件。我们证明DAPRO满足预算约束,并提供分布无关、有限样本覆盖保证,而无需假设先前符合生存方法中条件独立性之间的截断和事件时间。关键理论贡献是新的覆盖界,其规模与均截断权重的平方根而不是最坏情况权重相关,从而比先前工作提供更紧的保证。此外,DAPRO可用于在有限计算资源下获得无偏、低方差的总体评估指标估计,如突破率。全面实验在代理任务成功、对抗性突破、有毒内容生成和RAG幻觉使用LLM如Llama 3.1和Qwen 2.5上显示,DAPRO在覆盖性和方差方面优于静态基线,同时满足预算约束。

英文摘要

Evaluating and predicting the performance of large language models (LLMs) in multi-turn conversational settings is critical yet computationally expensive; key events -- e.g., jailbreaks or successful task completion by an agent -- often emerge only after repeated interactions. These events might be rare, and under any feasible computational budget, remain unobserved. Recent conformal survival frameworks construct reliable lower predictive bounds (LPBs) on the number of iterations to trigger the event of interest, but rely on static budget allocation that is inefficient in multi-turn setups. To address this, we introduce \emph{Dynamic Allocation via PRojected Optimization} (DAPRO), the first theoretically valid dynamic budget allocation framework for bounding the time-to-event in multi-turn LLM interactions. We prove that DAPRO satisfies the budget constraint and provides distribution-free, finite-sample coverage guarantees without requiring the conditional independence between censoring and event times assumed by prior conformal survival approaches. A key theoretical contribution is a novel coverage bound that scales with the square root of the mean censoring weight rather than the worst-case weight, yielding provably tighter guarantees than prior work. Furthermore, DAPRO can be employed to obtain unbiased, low-variance estimates of population-level evaluation metrics, such as the jailbreak rate, under limited computing resources. Comprehensive experiments across agentic task success, adversarial jailbreaks, toxic content generation, and RAG hallucinations using LLMs such as Llama 3.1 and Qwen 2.5 demonstrate that DAPRO consistently achieves coverage closer to the nominal level with lower variance than static baselines, while satisfying the budget constraint.

2605.06599 2026-05-08 cs.LG eess.AS

Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization

Abhijit Das, Sayantan Dutta

Affiliations * Science and Technology Organization, GE HealthCare

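AI summary This paper uses a functional-analytic approach to study how weight decay affects Transformer loss landscapes, proves that the regularized loss satisfies Villani's conditions for coercive energy functions, and derives convergence guarantees and generalization bounds that depend on the regularization strength and model dimension.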

Comments 17 pages, 10 figures

Details
AI abstract (translated from Chinese)

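Weight decay is widely used as a regularizer in large language models, but its precise role in shaping Transformer loss landscapes has received little theoretical study. This paper gives the first rigorous functional-analytic characterization of the standard Transformer objective, the cross-entropy loss combined with L² regularization, by proving that it satisfies Villani's conditions for coercive energy functions. Specifically, we show that the regularized loss F is infinitely differentiable, grows at least quadratically, has Gaussian-integrable tails, and satisfies the differential growth condition -ΔF + (1/s)||∇F||² → ∞ as ||θ|| → ∞ for all s > 0. From this structure we derive explicit log-Sobolev and Poincaré constants, C_LS ≤ λ⁻¹ + d/λ², which tie the regularization strength λ and model dimension d to finite-time convergence guarantees for noisy stochastic gradient descent and to PAC-Bayesian generalization bounds that tighten as λ increases. To validate the theory, we introduce a scalable Villani diagnostic Ψ_s(θ) = -ΔF + s⁻¹||∇F||² and estimate it efficiently with Hutchinson trace probes in models with more than 100M parameters. Experiments with GPT-Neo-125M on Penn Treebank and WikiText-103 confirm the predicted quadratic growth of Ψ_s, the spectral inflation of the Hessian, and exponential convergence consistent with our log-Sobolev analysis. These results show that weight decay not only improves generalization empirically but also establishes the mathematical conditions needed for fast Langevin mixing and theoretically grounded, curvature-aware optimization in deep learning.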

English abstract

Weight decay is widely used as a regularizer in large language models, yet its precise role in shaping Transformer loss landscapes remains theoretically underexplored. This paper provides the first rigorous functional-analytic characterization of the standard Transformer objective (cross-entropy loss with $L^2$ regularization) by proving it satisfies Villani's criteria for coercive energy functions. Specifically, we show that the regularized loss $\mathcal{F}$ is infinitely differentiable, grows at least quadratically, has Gaussian-integrable tails, and satisfies the differential growth condition $-\Delta\mathcal{F} + \tfrac{1}{s}\|\nabla\mathcal{F}\|^{2} \to \infty$ as $\|\theta\| \to \infty$ for all $s>0$. From this structure, we derive explicit log-Sobolev and Poincaré constants $C_{\mathrm{LS}} \leq \lambda^{-1} + d/\lambda^{2}$, linking the regularization strength $\lambda$ and model dimension $d$ to finite-time convergence guarantees for noisy stochastic gradient descent and PAC-Bayesian generalization bounds that tighten with increasing $\lambda$. To validate our theory, we introduce a scalable Villani diagnostic $\Psi_s(\theta) = -\Delta\mathcal{F} + s^{-1}\|\nabla \mathcal{F}\|^2$ and estimate it efficiently using Hutchinson trace probes in models with over 100M parameters. Experiments on GPT-Neo-125M across Penn Treebank and WikiText-103 confirm the predicted quadratic growth of $\Psi_s$, spectral inflation of the Hessian, and exponential convergence behavior consistent with our log-Sobolev analysis. These results demonstrate that weight decay not only improves generalization empirically but also establishes the mathematical conditions required for fast Langevin mixing and theoretically grounded curvature-aware optimization in deep learning.
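
The diagnostic $\Psi_s(\theta) = -\Delta\mathcal{F} + s^{-1}\|\nabla\mathcal{F}\|^2$ needs the Laplacian of the loss, i.e. the trace of its Hessian, which a Hutchinson estimator approximates as the average of $v^\top H v$ over random Rademacher probes $v$. The sketch below shows the idea on a toy problem and is not the authors' implementation: it assumes an L2-regularized logistic regression in place of a Transformer and uses a closed-form Hessian-vector product where a large model would use automatic differentiation. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def grad_F(theta, X, y, lam):
    # Gradient of F = mean cross-entropy + (lam / 2) * ||theta||^2.
    p = sigmoid(X @ theta)
    return X.T @ (p - y) / len(y) + lam * theta

def hvp_F(theta, v, X, y, lam):
    # Hessian-vector product H v, computed without forming H.
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)
    return X.T @ (w * (X @ v)) / len(y) + lam * v

def hutchinson_laplacian(theta, X, y, lam, num_probes, rng):
    # Average v^T H v over Rademacher probes; its expectation is trace(H).
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=theta.shape[0])
        total += v @ hvp_F(theta, v, X, y, lam)
    return total / num_probes

def exact_laplacian(theta, X, y, lam):
    # Reference value: sum of e_i^T H e_i (only feasible in low dimension).
    basis = np.eye(theta.shape[0])
    return sum(e @ hvp_F(theta, e, X, y, lam) for e in basis)

def villani_diagnostic(theta, X, y, lam, s, num_probes, rng):
    # Psi_s(theta) = -Laplacian(F) + ||grad F||^2 / s, with an estimated Laplacian.
    g = grad_F(theta, X, y, lam)
    lap = hutchinson_laplacian(theta, X, y, lam, num_probes, rng)
    return -lap + (g @ g) / s

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (rng.random(200) < 0.5).astype(float)
    direction = np.ones(5) / np.sqrt(5.0)
    for radius in (0.0, 1.0, 10.0, 100.0):
        psi = villani_diagnostic(radius * direction, X, y, lam=0.1, s=1.0,
                                 num_probes=64, rng=rng)
        print(f"||theta|| = {radius:6.1f}   Psi_s = {psi:10.3f}")
```

In this toy setting the weight-decay term contributes $\lambda\theta$ to the gradient, so $\|\nabla\mathcal{F}\|^2$ grows roughly like $\lambda^2\|\theta\|^2$ while the Hessian trace stays bounded, which is why the printed values increase with the radius.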

2605.06595 2026-05-08 cs.RO cs.AI cs.LG cs.MA

Cross-Modal Navigation with Multi-Agent Reinforcement Learning

Shuo Liu, Xinzichen Li, Christopher Amato

Affiliations * Khoury College of Computer Sciences

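AI summary This paper proposes the CRONA framework, which uses multi-agent reinforcement learning for cross-modal navigation and improves collaboration through auxiliary beliefs and a centralized multi-modal critic; experiments show that multi-agent methods outperform single-agent baselines on visual-acoustic navigation.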

Details
AI abstract (translated from Chinese)

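Robust embodied navigation depends on complementary sensory cues. In practice, however, high-quality, well-aligned multi-modal data is hard to obtain. Training a single large model is also difficult, because rich multi-modal inputs induce complex representations and greatly enlarge the policy space. Collaboration among lightweight, modality-specialized agents offers a scalable alternative: it allows flexible deployment and parallel execution while preserving the strengths of each modality. This paper proposes CRONA, a multi-agent reinforcement learning framework for cross-modal navigation. CRONA improves collaboration by using control-relevant auxiliary beliefs and a centralized multi-modal critic with access to the global state. Experiments on visual-acoustic navigation tasks show that multi-agent methods significantly outperform single-agent baselines in both performance and efficiency. We find that homogeneous collaboration with limited modalities suffices for short-range navigation under salient cues; heterogeneous collaboration among agents with complementary modalities is generally efficient and effective; and navigation in large, complex environments requires both richer multi-modal perception and greater model capacity.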

English abstract

Robust embodied navigation relies on complementary sensory cues. However, high-quality and well-aligned multi-modal data is often difficult to obtain in practice. Training a monolithic model is also challenging as rich multi-modal inputs induce complex representations and substantially enlarge the policy space. Cross-modal collaboration among lightweight modality-specialized agents offers a scalable paradigm. It enables flexible deployment and parallel execution, while preserving the strength of each modality. In this paper, we propose CRONA, a Multi-Agent Reinforcement Learning (MARL) framework for Cross-Modal Navigation. CRONA improves collaboration by leveraging control-relevant auxiliary beliefs and a centralized multi-modal critic with global state. Experiments on visual-acoustic navigation tasks show that multi-agent methods significantly improve performance and efficiency over single-agent baselines. We find that homogeneous collaboration with limited modalities is sufficient for short-range navigation under salient cues; heterogeneous collaboration among agents with complementary modalities is generally efficient and effective; and navigation in large, complex environments requires both richer multi-modal perception and increased model capacity.