arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.07068 2026-05-11 cs.CL cs.AI

WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems

Juan M. Huerta

Affiliations: Zinnia Tech Solutions

AI summary: This paper proposes the WiCER algorithm, which iteratively compiles, evaluates, and refines wiki memory to close the knowledge-compilation gap in LLM wiki systems, markedly improving knowledge retrieval quality and reducing catastrophic failures.

Abstract

The LLM Wiki pattern, which compiles domain knowledge into a persistent artifact and serves it to LLMs via KV cache inference, promises context access at sub-second latency with zero retrieval failure. Realizing this requires closing the compilation gap: having an LLM distill raw documents into a wiki without catastrophically discarding critical facts. We characterize this gap across 17 RepLiQA domains (6,800 questions). Full-context KV cache inference outperforms RAG on curated knowledge (4.38 vs. 4.08 out of 5, with 7.3x faster TTFT) but degrades below RAG at scale due to attention dilution, and blind compilation fails entirely (2.14 to 2.32 vs. 3.46, with a 53 to 60% catastrophic failure rate). To address the compilation gap, we propose WiCER (Wiki-memory Compile, Evaluate, Refine), an iterative algorithm inspired by counterexample-guided abstraction refinement (CEGAR). WiCER evaluates compiled wikis against diagnostic probes, identifies dropped facts, and forces their preservation in subsequent compilations. One to two iterations recover 80% of lost quality (mean 3.24 vs. 3.47 for raw full-context across the 15 topics with baselines) and reduce catastrophic failures by 55% in relative terms. An ablation across all 17 topics confirms that targeted diagnosis (+0.95), not generic pinning (+0.16), drives the gains. All code and benchmarks are released for reproducible research.
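The compile, evaluate, refine loop described in this abstract can be sketched roughly as follows. This is only an illustrative toy under stated assumptions, not the paper's implementation: `compile_wiki`, the probe format (question mapped to the fact it needs), and the fact budget are all hypothetical stand-ins, and a real system would presumably use an LLM for both compilation and probe scoring.

```python
def compile_wiki(facts, budget, pinned):
    """Toy 'compiler' (hypothetical): keep pinned facts, then fill the budget in order."""
    kept = [f for f in facts if f in pinned]
    for f in facts:
        if f not in pinned and len(kept) < budget:
            kept.append(f)
    return kept


def failed_probes(wiki, probes):
    """Return the facts whose diagnostic probe the compiled wiki cannot answer."""
    return {needed for needed in probes.values() if needed not in wiki}


def wicer(facts, probes, budget, max_iters=2):
    """Compile, evaluate against probes, and recompile with dropped facts pinned."""
    pinned = set()
    wiki = compile_wiki(facts, budget, pinned)
    for _ in range(max_iters):
        dropped = failed_probes(wiki, probes)
        if not dropped:
            break
        pinned |= dropped  # force preservation in the next compilation
        wiki = compile_wiki(facts, budget, pinned)
    return wiki
```

For example, with facts `["a", "b", "c", "d"]`, a budget of 2, and probes that need `"d"` and `"a"`, the first compilation drops `"d"`, and one refinement round pins it back in.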

2605.07067 2026-05-11 cs.LG

PolarAdamW: Disentangling Spectral Control and Schur Gauge-Equivariance in Matrix Optimisation

Haozhou Zhang

Affiliations: Department of Mathematics and Statistics

AI summary: PolarAdamW separates spectral control from Schur gauge-equivariance in matrix optimization. It outperforms Muon and AdamW on DeiT-Tiny, while Muon stays ahead on SO(3)-equivariant point-cloud regression.

Abstract

Muon's matrix-level update couples two distinct effects: spectral control via a polar map, and equivariance under orthogonal changes of multiplicity-space basis (Schur gauge-equivariance). We separate them with PolarAdamW, a controlled hybrid that preserves Muon's polar spectral-norm control but breaks the gauge-equivariance, since AdamW's coordinatewise preconditioner is basis-dependent. Algorithmically, PolarAdamW applies Muon's Newton-Schulz polar map to AdamW's preconditioned direction rather than to raw momentum, at per-iteration wall-time comparable to Muon. We prove that Muon's polar step is Schur gauge-equivariant on multiplicity matrices while AdamW's coordinatewise step is not. On DeiT-Tiny trained from scratch on four independently sampled 100-class subsets of ImageNet-1k, where multiplicity-basis freedom is trivial, PolarAdamW outperforms Muon by +1.93 pp in test accuracy on average and AdamW by +9.5 pp; under the 300-epoch DeiT-style recipe, it remains ahead of Muon by +1.37 pp and AdamW by +5.80 pp on average. On SO(3)-equivariant 3D point-cloud regression, where multiplicity-basis freedom is non-trivial, the ordering reverses: Muon outperforms PolarAdamW at every audited capacity, and the gap widens with capacity. Both matrix-polar optimisers continue to outperform AdamW. This double dissociation separates spectral control from Schur gauge-equivariance: the first composes well with AdamW preconditioning on standard transformers, while the second becomes consequential when multiplicity-basis freedom is structurally non-trivial.
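The core idea, applying a Newton-Schulz polar map to AdamW's preconditioned direction rather than to raw momentum, might look like the sketch below. It makes several simplifying assumptions that are not from the abstract: it uses the classical cubic Newton-Schulz iteration instead of whatever tuned polynomial Muon uses, it omits bias correction, weight decay, and learning-rate scaling, and the function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz_polar(m, steps=30):
    """Approximate the orthogonal polar factor of m with X <- 1.5 X - 0.5 X X^T X.

    Dividing by the Frobenius norm first puts every singular value in (0, 1],
    which is inside the convergence region of this cubic iteration.
    """
    fro = math.sqrt(sum(v * v for row in m for v in row))
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return x


def polar_adamw_direction(momentum, second_moment, eps=1e-8):
    """Coordinatewise AdamW-style preconditioning, followed by the polar map."""
    pre = [
        [mi / (math.sqrt(vi) + eps) for mi, vi in zip(rm, rv)]
        for rm, rv in zip(momentum, second_moment)
    ]
    return newton_schulz_polar(pre)
```

Because the preconditioner divides each coordinate separately, rotating the basis of `momentum` before this step generally changes the result, which is the basis dependence the abstract refers to.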

2605.07064 2026-05-11 cs.CV

Learning to Track Instance from Single Nature Language Description

Yaozong Zheng, Bineng Zhong, Qihua Liang, Shuimu Zeng, Haiying Xia, Shuxiang Song

Affiliations: Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University; University Engineering Research Center of Educational Intelligent Technology, Guangxi Normal University; University of Southampton

AI summary: This paper proposes a self-supervised vision-language tracking method whose Dynamic Token Aggregation Module improves semantic alignment, enabling instance tracking without bounding-box annotations.

Comments: CVPR 2026

Abstract

How can vision-language (VL) tracking be achieved using natural language descriptions from a video sequence without relying on any bounding-box ground truth? In this work, we achieve this goal by tackling self-supervised VL tracking, which aims to evaluate tracking capabilities guided by natural language descriptions. We introduce a novel self-supervised VL tracker that is capable of tracking any referred object by a language description. Unlike traditional methods that fuse all language and visual tokens equally, we propose an efficient Dynamic Token Aggregation Module, which treats each visual token unequally. The module consists of three main steps: i) Based on an anchor token, it selects multiple important target tokens from the template frame. ii) The selected target tokens are merged according to their attention scores and aggregated into the language tokens, thereby eliminating redundant visual token noise and enhancing semantic alignment. iii) Finally, the fused language tokens serve as guiding signals to extract potential target tokens from the search frame and propagate them to subsequent frames, enhancing temporal prompts and encouraging the tracker to autonomously learn instance tracking from unlabeled videos. This new modeling approach enables effective self-supervised learning of language-guided tracking representations without the need for large-scale bounding-box annotations. Extensive experiments on VL tracking benchmarks show that the proposed tracker surpasses SOTA self-supervised methods.

2605.07063 2026-05-11 cs.LG cs.AI

Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training

Pingbang Hu, Xueshen Liu, Z. Morley Mao, Jiaqi W. Ma

Affiliations: University of Illinois Urbana–Champaign; University of Michigan

AI summary: This paper proposes the Dr. Post-Training framework, which treats general training data as a data-induced regularizer that prevents overfitting to scarce target data, improving LLM post-training.

Abstract

Data selection methods address a critical challenge in LLM post-training: effectively leveraging scarce, high-fidelity target data alongside abundant but imperfectly aligned general training data. In this work, we move beyond the data-selection framing and introduce Dr. Post-Training (Data-Regularized Post-Training), a novel framework that reconceptualizes general training data as a data-induced regularizer that prevents overfitting to the scarce target objective, rather than as a pool for selection. Specifically, at each training step our framework constructs a feasible set of model update directions using the general training data and projects the model update direction specified by the scarce target data onto that feasible set. Standard training and existing data selection methods arise as special cases with different choices of the data-induced regularizer, corresponding to different points on a bias-variance spectrum with different regularization strengths. Building on this view, we propose a family of methods offering a richer design space and more flexible bias-variance tradeoffs. For practical LLM-scale use, we introduce careful system optimizations that realize these methods with minimal overhead. Extensive experiments across SFT, RLHF, and RLVR show that our methods consistently outperform state-of-the-art data selection baselines, and system benchmarks confirm their efficiency.
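To make the "project the target update onto a feasible set" step concrete, here is a minimal sketch under a strong assumption: the feasible set is taken to be the half-space of directions that do not conflict with a single general-data gradient. The abstract does not say how the feasible set is actually built, so this choice, and the function name, are purely hypothetical.

```python
def project_onto_halfspace(target_grad, general_grad):
    """Euclidean projection of target_grad onto {d : <d, general_grad> >= 0}.

    Assumed feasible set (not from the paper): update directions that do not
    point against the gradient computed on general training data.
    """
    dot = sum(t * g for t, g in zip(target_grad, general_grad))
    if dot >= 0.0:
        return list(target_grad)  # already feasible, leave unchanged
    norm_sq = sum(g * g for g in general_grad)
    # Remove exactly the component that violates the constraint.
    return [t - (dot / norm_sq) * g for t, g in zip(target_grad, general_grad)]
```

A larger feasible set regularizes less (closer to plain training on target data), while a smaller one leans more on the general data, which is one way to read the bias-variance spectrum mentioned in the abstract.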

2605.07058 2026-05-11 cs.CL cs.AI

MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments

Yicheng Gao, Xiaolin Zhou, Yahan Li, Yue Zhao, Ruishan Liu

Affiliations: University of Southern California; Arizona State University

AI summary: This paper proposes MedExAgent, a medical diagnosis agent trained with a two-stage pipeline that combines supervised fine-tuning on synthetic conversations with DAPO optimization, achieving effective diagnosis while controlling examination cost.

Abstract

Real-world clinical diagnosis is a complex process in which the doctor must obtain information both by interacting with the patient and by conducting medical exams. The doctor also needs to adapt to different patient personas, as well as to noisy and incomplete information that can arise at any point in the process. However, existing benchmarks for medical LLMs and methods for automatic diagnosis largely simplify this process, reducing it to single-turn question answering, noise-free conversations, or sequential exam ordering, and thereby ignore the interactive and uncertain nature of clinical diagnosis. In this paper, we address this gap by formalizing clinical diagnosis as a Partially Observable Markov Decision Process (POMDP) with three action types: questioning the patient, ordering medical exams as tool calls, and issuing a diagnosis. We also introduce a systematic noise model comprising seven patient noise types and three exam noise types. Using our proposed environment, we train an effective diagnosis agent, MedExAgent, through a two-stage pipeline that first performs supervised fine-tuning on synthetic conversations structured after the Calgary-Cambridge model for clinical interviews, and then applies DAPO to optimize a composite reward capturing diagnostic accuracy, tool call quality, and exam cost, including financial cost and patient discomfort. Through extensive experiments and ablation studies, we demonstrate that MedExAgent achieves diagnostic performance comparable to larger models while maintaining cost-efficient examination strategies.
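A composite reward of the kind described (diagnostic accuracy, tool call quality, and exam cost) could be sketched as below. The field names, the weights, and the way cost terms are normalized and averaged are all hypothetical; the abstract names the three components but not how they are combined.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    correct_diagnosis: bool
    valid_tool_calls: int
    total_tool_calls: int
    financial_cost: float      # assumed normalized to [0, 1]
    patient_discomfort: float  # assumed normalized to [0, 1]


def composite_reward(ep, w_acc=1.0, w_tool=0.2, w_cost=0.3):
    """Weighted sum with hypothetical weights: reward accuracy and well-formed
    tool calls, penalize the average of financial cost and discomfort."""
    accuracy = 1.0 if ep.correct_diagnosis else 0.0
    tool_quality = ep.valid_tool_calls / ep.total_tool_calls if ep.total_tool_calls else 1.0
    exam_cost = 0.5 * (ep.financial_cost + ep.patient_discomfort)
    return w_acc * accuracy + w_tool * tool_quality - w_cost * exam_cost
```

Under this sketch, two episodes with the same correct diagnosis are ranked by how cheaply the agent reached it, which is the behavior a cost-aware reward is meant to encourage.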

2605.07057 2026-05-11 cs.LG

Integrating Causal DAGs in Deep RL: Activating Minimal Markovian States with Multi-Order Exposure

Jiamin Xu, Jacqueline Maasch, Kyra Gan

Affiliations: Cornell Tech

AI summary: This paper studies how to construct MDP states that satisfy the Markov property in deep reinforcement learning and proposes MOSE, which improves performance by exposing multi-order historical states. It shows that minimal sufficiency is not enough and that controlled redundancy is needed to exploit causal state information.

Abstract

Online reinforcement learning (RL) relies on the Markov property for guaranteed performance, but real-world applications often lack well-defined states given raw observed variables. While causal RL has attracted growing interest, existing work typically assumes Markovian states are provided and focuses on using causality to accelerate learning, leaving a fundamental gap: given a longitudinal causal graph over observed variables, how does one construct MDP states that provably satisfy the Markov property? We address this by providing a procedure that constructs a provably minimal state representation. In deep RL, we observe that the minimal representation alone empirically fails to improve performance, indicating that neural networks cannot directly exploit Markovian minimality. To address this, we propose MOSE (Multi-Order State Exposure), which feeds multi-order historical state constructions into the same Q-function. MOSE consistently outperforms both the minimal state construction and single-window policies on common benchmarks and synthetic datasets. Including the minimal representation alongside MOSE can further improve performance. Our results establish a core principle for causal deep RL: minimal sufficiency is not enough, and controlled redundancy is necessary to unlock the benefit of causal state information.
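One way to picture "feeding multi-order historical state constructions into the same Q-function" is to concatenate history windows of several lengths into a single input vector. The sketch below assumes plain fixed-length windows and a hypothetical set of orders; the paper's constructions are derived from a causal graph, which this toy does not model.

```python
def window_state(history, order, pad=0.0):
    """State built from the last `order` observations, left-padded when history is short."""
    recent = history[-order:]
    flat = [x for obs in recent for x in obs]
    obs_dim = len(history[-1])
    return [pad] * (obs_dim * (order - len(recent))) + flat


def mose_input(history, orders=(1, 2, 4)):
    """Concatenate windows of several orders into one Q-function input (assumed layout)."""
    features = []
    for k in orders:
        features.extend(window_state(history, k))
    return features
```

The resulting vector is deliberately redundant: the most recent observation appears once per order, which matches the abstract's point that controlled redundancy, not minimality alone, helps the network.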

2605.07055 2026-05-11 cs.CV cs.AI

Pan-FM: A Pan-Organ Foundation Model with Saliency-Guided Masking for Missing Robustness

Qiangqiang Wu, Grace McIlvain, Zhou Yu, Junhao Wen

Affiliations: Laboratory of AI and Biomedical Science; Columbia University

AI summary: Pan-FM uses saliency-guided masking to address missing data in multimodal biomedical imaging, improving the robustness of pan-organ representation learning and outperforming single-organ and multi-organ baselines.

Abstract

Foundation models (FMs) have shown great promise in medical imaging, but most FMs are trained on unimodal data within isolated domains, such as brain MRI alone. Human aging and disease arise through coordinated biological processes across organs, which motivates multimodal FMs that learn whole-body representations. A key challenge, however, is that real-world multimodal biomedical data are often missing not at random, which can reduce statistical power, limit generalizability, and introduce bias. We propose Pan-FM, a pan-organ foundation model pre-trained on imaging from seven organs (Brain, Heart, Adipose, Liver, Kidney, Spleen, and Pancreas) under realistic missing-organ scenarios. Pan-FM uses a unified backbone that handles organ missingness during both training and inference, and is pre-trained with masking-based self-distillation. We find that naive multimodal pre-training leads to dominant-organ shortcut learning bias, with the model over-relying on dominant organs such as adipose and heart. To address this, we introduce Saliency-Guided Masking (SGM), which uses the model attention distribution to adaptively mask dominant organs during pre-training, thus encouraging more balanced cross-organ, whole-body learning. Notably, SGM introduces negligible computational overhead and can be seamlessly integrated into existing self-supervised learning frameworks to improve multi-organ representation learning. On the UK Biobank, Pan-FM achieves stronger prediction across 13 disease categories and 14 single disease entities than single-organ and multi-organ baselines, with improved robustness under missing-organ settings. Pan-FM serves as a scalable solution to realistic modality-missingness in multimodal learning in systems neuroscience and as a step toward more generalizable whole-body FMs.
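The idea of using the attention distribution to mask dominant organs more often can be illustrated with a small sampling routine. Everything specific here is an assumption made for illustration: the linear mapping from attention mass to masking probability, the `max_prob` cap, and the rule that at least one organ stays visible. Attention values are assumed to be positive.

```python
import random


def mask_probabilities(attention, max_prob=0.9):
    """Map per-organ attention mass to a masking probability.

    Hypothetical rule: the most-attended organ is masked with probability
    max_prob, and the others proportionally less often.
    """
    top = max(attention.values())
    return {organ: max_prob * a / top for organ, a in attention.items()}


def saliency_guided_mask(attention, rng, max_prob=0.9):
    """Sample the set of organs to mask for one pre-training step."""
    probs = mask_probabilities(attention, max_prob)
    masked = {organ for organ, p in probs.items() if rng.random() < p}
    if len(masked) == len(attention):
        # Keep the least-attended organ visible so the input is never empty.
        masked.discard(min(attention, key=attention.get))
    return masked
```

With attention such as `{"adipose": 0.5, "heart": 0.3, "brain": 0.2}`, adipose would be hidden most often, pushing the model to rely on the remaining organs.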

2605.07051 2026-05-11 cs.CL

NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models

George Boateng, Naafi Ibrahim, Samuel John, Philemon Badu, Patrick Agyeman-Budu, Jonathan Mensah, Kevin Yeboah, William Edor, Andrew Mensa-Onumah, Nana Yeboah, Victor Wumbor-Apin Kumbol

Affiliations: ETH Zurich; Charité - Universitätsmedizin Berlin; Kwame AI Inc.; Ashesi University

AI summary: This paper presents the NSMQ Riddles benchmark, built from riddles in Ghana's National Science and Maths Quiz, to evaluate the scientific and mathematical reasoning of large language models, and finds that even state-of-the-art models struggle with it.

Comments: 15 pages. Accepted at the 27th International Conference on Artificial Intelligence in Education

Abstract

Large Language Models (LLMs) have shown good performance on various science educational benchmarks, demonstrating their potential for use in science and mathematics education. Yet, LLMs tend to be evaluated on science and mathematical educational datasets from the Western world, with an underrepresentation of datasets from the Global South. Furthermore, these datasets tend to use multiple-choice answer options, which are trivial to evaluate. In this work, we present NSMQ Riddles, a novel benchmark of Scientific and Mathematical Riddles from Ghana's National Science and Maths Quiz (NSMQ) competition to evaluate LLMs. The NSMQ is an annual live TV competition that brings together Ghana's smartest senior secondary (high school) students, who compete in teams of two by answering questions in biology, chemistry, physics, and math over five rounds and five stages until a winning team is crowned for that year. NSMQ Riddles consists of 11 years of riddle questions (n=1.8K) from the 5th round, with each riddle containing a minimum of 3 clues. Students compete to be the first to guess the answer on any of the clues, with earlier clues being vaguer and fetching more points. The answers are usually a number, word, or short phrase, allowing for automatic evaluation. We evaluated state-of-the-art models: closed (GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6) and open models (Kimi-K2.5, DeepSeek-V3.1, GPT-OSS-120B) with high and low reasoning settings. Our evaluation shows that the dataset is challenging even for state-of-the-art LLMs, which performed worse than the best student contestants. This work contributes a novel and challenging benchmark for scientific and mathematical reasoning from the Global South towards enabling a true global benchmarking of LLMs' capabilities for science and mathematics education.
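Since answers are short and earlier clues are worth more, automatic scoring of a riddle could look roughly like this. The point values, the normalization rule, and the guess format are hypothetical; the abstract only states that earlier clues fetch more points and that answers allow automatic evaluation.

```python
def normalize(answer):
    """Lowercase, trim, drop a trailing period, and collapse whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def score_riddle(guesses, gold, points=(5, 4, 3)):
    """Score one riddle.

    guesses[i] is the model's guess after clue i + 1 (None means it passed).
    The points schedule is an assumed example, with later clues all earning
    the last value.
    """
    for i, guess in enumerate(guesses):
        if guess is not None and normalize(guess) == normalize(gold):
            return points[min(i, len(points) - 1)]
    return 0
```

For instance, passing on the first clue and answering "Entropy." on the second would score the second-clue value under this schedule.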

2605.07049 2026-05-11 cs.LG cs.AI

Towards Differentially Private Reinforcement Learning with General Function Approximation

Yi He, Xingyu Zhou

Affiliations: Wayne State University

AI summary: This paper gives the first theoretical guarantees for differentially private online reinforcement learning with general function approximation. Combining batched policy updates with the exponential mechanism, it shows that regret in the model-free setting matches the linear case, scaling as $\widetilde{O}(K^{3/5})$, and it uncovers fundamental gaps in recent results for private RL with linear function approximation.

Abstract

We present the first theoretical guarantees for differentially private online reinforcement learning (RL) with general function approximation, extending beyond prior work restricted to tabular and linear settings. Our approach combines a batched policy update scheme with the exponential mechanism, together with a novel regret analysis. We show that, even under general function approximation, the regret in the model-free setting under differential privacy matches the state of the art for the linear case, scaling as $\widetilde{O}(K^{3/5})$, where $K$ denotes the number of episodes. As an important by-product, we also establish the first regret bound for online RL with batch update that depends on the standard complexity measure of coverability, complementing existing results based on a newly introduced Eluder-Condition class. In addition, we uncover fundamental gaps in recent results for private RL with linear function approximation, thereby clarifying its landscape.
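For readers unfamiliar with the exponential mechanism mentioned in this abstract, the standard form samples a candidate with probability proportional to exp(epsilon * utility / (2 * sensitivity)). The sketch below shows only that generic mechanism; what the candidates and utility are in the paper's algorithm is not stated in the abstract, so the arguments here are placeholders.

```python
import math
import random


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility(c) / (2 * sensitivity)).

    `rng` is a random.Random instance; `utility` and `candidates` are
    placeholders for whatever the surrounding algorithm scores.
    """
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    shift = max(scores)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - shift) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

A small epsilon makes the choice close to uniform (more private, less accurate), while a large epsilon concentrates on the highest-utility candidate.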

2605.07048 2026-05-11 cs.LG cs.AI

Unlocking High-Fidelity Molecular Generation from Mass Spectra via Dual-Stream Line Graph Diffusion

Xujun Che, Xiuxia Du, Depeng Xu

Affiliations: Department of Software and Information Systems, University of North Carolina at Charlotte; Department of Bioinformatics and Genomics, University of North Carolina at Charlotte

AI summary: This paper proposes a dual-stream line graph diffusion model that alternates between atom-level and bond-level reasoning subproblems to improve the accuracy of molecular generation from mass spectra, reaching top-1 accuracy of 34.37% and 23.89% on two benchmarks and outperforming existing methods.

Abstract

De novo molecular generation from tandem mass spectra is a challenging inverse problem whose core difficulty lies in the circular dependency between atom-level and bond-level reasoning: determining a bond's type requires knowing its endpoint atoms' chemical environment, yet an atom's environment is in turn defined by its incident bonds. Existing graph diffusion methods process atoms and bonds within a single computation stream, where atom-bond information synchronization can only occur implicitly across layers. We argue that this single-stream paradigm, rather than the choice of any particular aggregation kernel, is a key architectural bottleneck. We propose DualLGD (Dual-stream Line Graph Diffusion), which reformulates molecular graph denoising as the alternating solution of two coupled subproblems: atom-level reasoning and bond-level reasoning, each operating in its own dedicated representation space. The line graph provides a natural mathematical construction for the bond space, in which bond angles, dihedrals, conjugation chains, and rings correspond to local topological motifs between bonds. Incidence-constrained bidirectional cross-attention synchronizes the two streams at every layer, ensuring that each atom attends only to its incident bonds and vice versa, respecting the fundamental chemical principle that an atom's environment is determined by its bonding context. On the NPLIB1 and MassSpecGym benchmarks, DualLGD achieves top-1 accuracy of 34.37% and 23.89%, approximately $3\times$ the previous state of the art. Ablation studies confirm the architecture as the primary source of improvement: DualLGD without any pre-training already surpasses the previous best fully pretrained model.
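The two structural ingredients named here, the line graph over bonds and the incidence constraint on cross-attention, are standard graph constructions and can be sketched directly. The function names and the bond-list input format are illustrative choices, not taken from the paper's code.

```python
from itertools import combinations


def line_graph(bonds):
    """Edges of the line graph: two bonds are adjacent iff they share an atom.

    bonds is a list of (atom_i, atom_j) pairs; the result uses bond indices.
    """
    return [
        (a, b)
        for a, b in combinations(range(len(bonds)), 2)
        if set(bonds[a]) & set(bonds[b])
    ]


def incidence_mask(num_atoms, bonds):
    """mask[i][b] is True iff atom i is an endpoint of bond b.

    Used as an attention mask, it lets each atom attend only to its incident
    bonds; its transpose gives the bond-to-atom direction.
    """
    return [[i in bond for bond in bonds] for i in range(num_atoms)]
```

For a four-atom chain with bonds `[(0, 1), (1, 2), (2, 3)]`, the line graph is itself a path on the three bonds, and each edge of it corresponds to one bond angle.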

2605.07042 2026-05-11 cs.AI cs.LG

The Context Gathering Decision Process: A POMDP Framework for Agentic Search

Chinmaya Kausik, Adith Swaminathan, Nathan Kallus

Affiliations: University of Michigan; Netflix

AI summary: This paper proposes the CGDP framework, which models LLM behavior as approximate Thompson sampling, introduces a predicate-based method that decomposes the search process, and derives two interventions that improve multi-hop reasoning and reduce token use.

Comments: 25 pages

Abstract

Large Language Model (LLM) agents are deployed in complex environments, such as massive codebases, enterprise databases, and conversational histories, where the relevant state far exceeds their context windows. To navigate these spaces, an agent must iteratively explore the environment to find relevant information. However, without explicit infrastructure, an agent's working memory can degrade into lossy representations of the search state, resulting in redundant work (e.g. repetitive looping) and premature stopping. In this work, we formalize this challenge as the Context Gathering Decision Process (CGDP), a specialized Partially Observable Markov Decision Process, where an agent's objective is to adaptively refine its belief state to isolate the necessary information for a task. We model an LLM's behavior as approximate Thompson Sampling within this CGDP, and introduce a predicate-based method that decomposes an LLM's implicit search into explicit and modular operations. We then derive two plug-and-play interventions for iterative LLM agents: a persistent, predicate-based belief state that bounds context while preserving multi-hop reasoning, and a programmatic exhaustion gate that halts unproductive search without premature stopping. Across four methods and three question-answering domains, we empirically validate that replacing an LLM's implicit state with our CGDP-motivated belief state improves multi-hop reasoning by up to 11.4%, while the modular programmatic exhaustion detection saves up to 39% of tokens without any degradation in agent performance. Ultimately, we argue that framing the LLM agent loop as a CGDP can guide the design of modular, non-interfering improvements to agentic search harnesses.
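The two interventions, a persistent predicate-based belief state and a programmatic exhaustion gate, can be pictured with a small data structure. This is a guess at the general shape only: representing predicates as strings and declaring exhaustion after a fixed number of steps with no new predicates (`patience`) are assumptions, not the paper's definitions.

```python
class BeliefState:
    """Persistent set of predicates (short factual strings) established so far."""

    def __init__(self, patience=2):
        self.predicates = set()
        self.patience = patience  # hypothetical stopping parameter
        self.stale_steps = 0

    def update(self, new_predicates):
        """Merge predicates found in the latest search step; return what was new."""
        gained = set(new_predicates) - self.predicates
        self.predicates |= gained
        self.stale_steps = 0 if gained else self.stale_steps + 1
        return gained

    def exhausted(self):
        """Programmatic gate: stop once `patience` consecutive steps add nothing new."""
        return self.stale_steps >= self.patience
```

Because the gate is computed from the belief state rather than asked of the LLM, it can be added to an existing agent loop without changing the prompts.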

2605.07041 2026-05-11 cs.RO cs.CV

Dr-BA: Separable Optimization for Direct Radar Bundle Adjustment & Localization

Daniil Lisus, Cedric Le Gentil, Timothy D. Barfoot

Affiliations: Robotics Institute, University of Toronto

AI summary: This paper presents Dr-BA, a framework that performs radar bundle adjustment and localization directly on 2D spinning radar intensity images. It exploits radar's robustness to precipitation, jointly estimates dense maps and sensor poses, and achieves state-of-the-art radar bundle adjustment and cross-session localization.

Comments: Accepted for presentation at RSS 2026

Abstract

This paper introduces Dr-BA, a first-of-its-kind radar bundle adjustment (BA) framework that operates directly on 2D spinning radar intensity images. Unlike camera or lidar sensors, radar is largely unaffected by precipitation, making it a critical modality for autonomous systems that require all-weather robustness. Existing state estimation approaches using spinning radar typically extract sparse point clouds from range-azimuth-intensity measurements and apply point cloud alignment techniques to estimate vehicle motion, scene structure, or to localize within an existing map. In contrast, Dr-BA uses the full radar returns from multiple scans to jointly estimate dense maps and sensor poses. By formulating the problem as a separable optimization, we derive an efficient and general solution that decouples pose estimation from mapping. In addition to solving the BA problem, this formulation naturally extends to direct radar-only localization (DRL) within a previously built map. Dr-BA achieves state-of-the-art radar-based BA and cross-session localization performance, demonstrated on more than 200 km of on-road data across five distinct routes. Our implementation is publicly available at https://github.com/utiasASRL/dr_ba.

2605.07040 2026-05-11 cs.CL cs.AI cs.CY

Cognitive Agent Compilation for Explicit Problem Solver Modeling

Hyeongdon Moon, Carolyn Rosé, John Stamper

Affiliations: Carnegie Mellon University

AI summary: This paper proposes the Cognitive Agent Compilation framework, which uses a strong teacher LLM to compile problem-solving knowledge into an explicit target agent, aiming to make problem solving more inspectable and editable in educational settings.

Comments: Accepted to AIED 2026 Blue Sky

Abstract

Large language models (LLMs) are widely used for tutoring, feedback generation, and content creation, but their broad pretraining makes them hard to constrain and poor substitutes for controllable learners. Educational systems often require inspectable and editable knowledge states: educators want to know what a system assumes the learner knows, and learners benefit when the system can justify actions in terms of explicit skills, misconceptions, and strategies. Inspired by cognitive architectures, we propose Cognitive Agent Compilation (CAC), a framework that uses a strong teacher LLM to compile problem-solving knowledge into an explicit target agent. CAC separates (i) knowledge representation, (ii) problem-solving policy, and (iii) verification and update rules, with the goal of making bounded problem solving more inspectable and editable in educational settings. We present an early proof of concept implemented with Small Language Models that surfaces key design trade-offs, particularly between explicit control and scalable generalization, and positions CAC as an initial step toward bounded-knowledge AI for educational applications.

2605.07039 2026-05-11 cs.LG

PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

Minghao Yan, Bo Peng, Benjamin Coleman, Ziqi Chen, Zhouhang Xie, Shuo Chen, Zhankui He, Noveen Sachdeva, Weili Wang, Ed H. Chi, Shivaram Venkataraman, Wang-Cheng Kang, Derek Zhiyuan Cheng, Beidou Wang

Affiliations: Google; University of Wisconsin–Madison; Google DeepMind

AI summary: PACEvolve++ uses a reinforcement learning framework to improve test-time policy adaptation in evolutionary search agents. A trainable advisor generates and assesses hypotheses, and a frontier model turns them into executable candidates, yielding faster convergence and more stable test-time training.

Abstract

Large language models have become drivers of evolutionary search, but most systems rely on a fixed, prompt-elicited policy to sample next candidates. This limits adaptation in practical engineering and research tasks, where evaluations are expensive, and progress depends on learning task-specific search dynamics. We introduce PACEvolve++, an advisor-model reinforcement learning framework for test-time policy adaptation in evolutionary search agents. PACEvolve++ decouples strategic search decisions from implementation: a trainable advisor generates, assesses, and selects hypotheses, while a stronger frontier model translates selected hypotheses into executable candidates. To train the advisor under non-stationary feedback, we propose a phase-adaptive approach that adapts its optimization strategy to different phases of the evolutionary process. Early in evolution, it uses group-relative feedback to learn broad search preferences; later, as reward gaps compress, it emphasizes best-of-$k$ frontier contribution to support stable refinement. Across expert-parallel load balancing, sequential recommendation, and protein fitness extrapolation, PACEvolve++ outperforms the state-of-the-art evolutionary search framework with frontier models, achieving faster convergence and stabilizing test-time training during evolutionary search.
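The phase-adaptive training signal, group-relative feedback early and best-of-k frontier contribution once reward gaps compress, might be sketched as follows. The exact formulas and the switching rule based on a reward-gap threshold are assumptions made for illustration; the abstract describes the two phases only qualitatively.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within a group of sampled hypotheses."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def frontier_contributions(rewards, best_so_far):
    """Credit only the best of the k samples, and only for its gain over the frontier."""
    top = max(range(len(rewards)), key=rewards.__getitem__)
    gain = max(0.0, rewards[top] - best_so_far)
    return [gain if i == top else 0.0 for i in range(len(rewards))]


def phase_adaptive_signal(rewards, best_so_far, gap_threshold=0.05):
    """Hypothetical switch: use group-relative feedback while rewards are spread
    out, and frontier contribution once the gap between samples compresses."""
    if max(rewards) - min(rewards) > gap_threshold:
        return group_relative_advantages(rewards)
    return frontier_contributions(rewards, best_so_far)
```

The motivation for switching is that standardized advantages amplify noise when all rewards are nearly equal, whereas frontier contribution stays zero unless a sample actually improves on the best result so far.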

2605.07038 2026-05-11 cs.LG cs.MA cs.RO

Learning Material-Aware Hamiltonian Risk Fields for Safe Navigation

Aditya Sai Ellendula, Yi Wang, Chandrajit Bajaj

Affiliations: Department of Computer Science, University of Texas at Austin; Oden Institute, University of Texas at Austin

AI summary: The paper proposes material-aware Hamiltonian risk fields: adding a context-energy term to the navigation policy makes evasive control risk-selective, and experiments confirm the behavior across several settings.

Abstract

Risk-aware navigation should be selective: a policy should expose evasive degrees of freedom only when the local scene admits a lower-risk feasible maneuver, and suppress them when no safer alternative exists. We show that adding one context-energy term to a port-Hamiltonian navigation policy produces a learned force channel with exactly this falsifiable signature. When the local risk field contains a feasible lower-risk direction, the induced context force activates toward it; when the apparent escape is blocked or not yet available, a route-aware gate suppresses lateral force rather than hallucinating an unsafe maneuver. A CVaR tail-risk objective focuses gradient updates on rare but consequential risk transitions. We validate the selectivity signature across four settings. In the primary delayed-required-escape benchmark, route-aware CVaR reduces premature force activation from 0.950 to 0.180 versus DWA while raising success from 0.480 to 0.810 with zero replans. On real off-road terrain (RELLIS-3D), route-aware enrichment achieves correct activation rate 0.837 and false activation rate 0.114, compared to 0.378/0.752 for scalar risk gradients. On static semantic maps (DFC2018), enrichment reduces catastrophic failure from 0.60 to 0.10 and oscillation by 90.7% while preserving path efficiency. In highway traffic, collisions drop from 100% to 0% when a lane escape is feasible; when no escape exists, the policy suppresses the lateral maneuver. The selectivity property follows from the gradient structure of the context energy rather than from training-time tuning.
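The abstract does not spell out the CVaR tail-risk objective, so the following is only a minimal sketch of the standard empirical CVaR, assuming per-rollout losses are already available as plain numbers. The function name `cvar` and the `alpha` parameter are illustrative and not taken from the paper.

```python
import math


def cvar(losses, alpha=0.9):
    """Empirical CVaR: the mean of the worst (1 - alpha) fraction of losses.

    `losses` and `alpha` are hypothetical names used for illustration only.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)  # largest (worst) losses first
    # Size of the tail; rounding guards against float noise such as 3.0000000000000004.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)
```

Because only the tail enters the average, a gradient taken through such an objective is driven by the rare high-loss rollouts, which is the sense in which a tail-risk objective concentrates updates on consequential transitions.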

2605.07023 2026-05-11 cs.CV

OneViewAll: Semantic Prior Guided One-View 6D Pose Estimation for Novel Objects

Yang Luo, Yan Gong, Yongsheng Gao, Jie Zhao, Xinyu Zhang, Huaping Liu

Author IEEE grades (institutions not listed): Senior Member, IEEE; Member, IEEE; Fellow, IEEE

AI summary: OneViewAll is a semantic-prior-guided framework that estimates 6D pose from a single reference view with a novel Project-and-Compare approach and no CAD model, reaching 92.5% ADD-0.1 accuracy and outperforming existing methods.

Abstract

In many practical 6D object pose estimation scenarios, we often have access to only a single real-world RGB-D reference view per object, typically without CAD models. Existing methods largely rely on explicit 3D models or multi-view data, which limits their scalability. To address this challenging single-reference model-free setting, we propose OneViewAll, a semantic-prior-guided framework that performs pose estimation via a novel Project-and-Compare paradigm. Instead of relying on computationally expensive CAD-based rendering, our method directly aligns reference and query observations within a projection-equivariant space. OneViewAll progressively integrates hierarchical semantic priors across three levels: (1) category- and scene-level priors for efficient hypothesis initialization; (2) object-level symmetry priors for geometry completion via mirror fusion; and (3) patch-level priors for discriminative refinement. Extensive experiments demonstrate that OneViewAll achieves 92.5% ADD-0.1 accuracy on the LINEMOD dataset using only one real reference view, significantly outperforming the CVPR 2025 baseline One2Any (52.6%). It also yields consistent improvements on YCB-V, Real275, and Toyota-Light while maintaining low inference latency. Our results underscore the efficacy of symmetry-aware projection in handling symmetric, texture-less, and occluded objects.

2605.07020 2026-05-11 cs.LG cs.AI

FlashMol: High-Quality Molecule Generation in as Few as Four Steps

Xinyuan Wei, Zian Li, Shaoheng Yan, Cai Zhou, Muhan Zhang

Affiliations: Institute for Artificial Intelligence, Peking University; Yuanpei College, Peking University; School of Intelligence Science and Technology, Peking University; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; State Key Laboratory of General Artificial Intelligence, Peking University

AI summary: FlashMol adapts distribution matching distillation to generate high-quality molecular conformations in as few as four steps; experiments show it outperforms conventional models in both speed and quality.

Abstract

Generating chemically valid 3D molecular conformations is critical for computational drug discovery. Classical diffusion-based models like GeoLDM perform well but require hundreds of steps, making large-scale in silico screening impractical. Recent efforts on few-step molecular generation have accelerated this process to 12-50 steps, but often at a substantial cost in sample stability. In this work, we present FlashMol, an ultra-fast molecule generative model producing high-quality molecular conformations in as few as 4 steps. To achieve this, we adapt distribution matching distillation (DMD), a reverse KL-divergence minimization objective, to the molecular domain for effective distillation. Considering the local minimization behavior of DMD, we respace the molecule generation timesteps, which provides the generator with a much better initialization and enables effective distillation. Additionally, to mitigate the mode-seeking behavior of DMD and improve diversity, we further regularize it with a Jensen-Shannon divergence term, which incorporates the mean-seeking behavior of the forward KL divergence. Extensive experiments on QM9 and GEOM-DRUG datasets demonstrate that FlashMol matches and even surpasses the original 1000-step teacher, achieving up to 250$\times$ acceleration in sampling speed while maintaining high molecular quality.

2605.07019 2026-05-11 cs.CV cs.AI

LensVLM: Selective Context Expansion for Compressed Visual Representation of Text

Roy Xie, Dan Friedman, Donghan Yu, Bowen Pan, Christopher Fifty, Jang-Hyun Kim, Xianzhi Du, Zhe Gan, Vivek Rathod, Bhuwan Dhingra

Affiliations: Apple; Duke University

AI summary: LensVLM uses selective context expansion to improve the accuracy of compressed visual representations of text, holding accuracy close to the full-text upper bound at 4.3x effective compression and generalizing to multimodal document and code tasks.

Abstract

Vision Language Models (VLMs) offer the exciting possibility of processing text as rendered images, bypassing the need for tokenizing the text into long token sequences. Since VLM image encoders map fixed-size images to a fixed number of visual tokens, varying rendering resolution provides a fine-grained compression knob. However, accuracy deteriorates quickly as compression increases: characters shrink below the vision encoder's effective resolution, making them indistinguishable. To address this, we propose LensVLM, an inference framework and post-training recipe that enables VLMs to scan compressed images, then selectively expand only the relevant images to their uncompressed form via learned tools. Building on Qwen3.5-9B-Base, LensVLM maintains accuracy comparable to the full-text upper bound at 4.3x effective compression and outperforms retrieval-based, text- and visual-compression baselines up to 10.1x effective compression across seven text QA benchmarks. LensVLM also generalizes to multimodal document and code understanding tasks, with the accuracy gain over baselines growing as compression increases. Our analysis validates this approach: training makes visual compression robust to rendering choices, and as compression grows the model increasingly relies on expanded content rather than unreliable visual reading. The analysis also yields practical tool-choice guidance: text expansion is preferable for rendered text, while high-resolution image expansion suits native documents whose layout cues carry task-relevant information.

2605.07011 2026-05-11 cs.LG

Dual-Agent Co-Training for Health Coaching via Implicit Adversarial Preference Optimization

Da Long, Lingyi Fu, Diya Michelle Rao, Jasmine Ruales Carrera, Yang Bai, Shandian Zhe

Affiliations: Kahlert School of Computing, University of Utah; Department of Health and Kinesiology, University of Utah

AI summary: The paper proposes a dual-agent framework that co-trains a health coach agent and a client simulator through implicit adversarial preference optimization, improving coaching quality.

Abstract

Motivational-interviewing-based health coaching is an effective approach for improving mental health and promoting healthy behavior change. However, the scarcity of trained human coaches and the high cost of coaching services make such support inaccessible to many people who could benefit from it. This motivates the development of AI health coaches that can provide scalable and affordable support. Existing methods typically optimize only one side of the interaction: they either train a dialogue agent against a fixed client environment or train a client simulator against a fixed assistant. This one-sided setup can limit exploration of the interaction space and may be inefficient at developing the capabilities required by the target agent and pushing its performance boundaries. In this paper, we propose a dual-agent framework that interactively co-trains both the health coach agent and the client simulator. The coach is optimized with DPO using Pareto-dominant response pairs identified by a multi-dimensional LLM judge. In turn, the client is trained adversarially by reversing these preferences, inducing an implicit adversarial training dynamic. We further show that this co-training process admits a natural stochastic-game interpretation. Extensive experiments demonstrate that our method effectively improves coaching quality across several important dimensions.
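The abstract says DPO pairs are built from Pareto-dominant responses scored on several dimensions. As a rough sketch of that selection step only, assuming each candidate response already carries a tuple of judge scores where higher is better, the pairing could look like the code below. All names are hypothetical, and the simple swap in `reversed_pairs` is just one reading of "reversing these preferences"; the abstract does not specify how the client's training pairs are actually formed.

```python
from itertools import permutations


def dominates(a, b):
    """True if score vector `a` is at least as good as `b` on every
    dimension and strictly better on at least one (Pareto dominance)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def preference_pairs(scored):
    """`scored` is a list of (response, score_vector) tuples.

    Returns (chosen, rejected) pairs in which the chosen response
    Pareto-dominates the rejected one. Incomparable responses yield no pair.
    """
    return [
        (resp_a, resp_b)
        for (resp_a, score_a), (resp_b, score_b) in permutations(scored, 2)
        if dominates(score_a, score_b)
    ]


def reversed_pairs(pairs):
    """Swap chosen and rejected (illustrative assumption, see lead-in)."""
    return [(rejected, chosen) for chosen, rejected in pairs]
```

Requiring dominance on every dimension discards pairs where the judge's dimensions disagree, so the resulting preference labels never trade one coaching criterion off against another.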

2605.07010 2026-05-11 cs.LG

Inductive Power Grid Cascading Failure Analysis with GRU-Gated Graph Attention

Tianxin Zhou, Xiang Li, Haibing Lu

AI summary: The paper proposes a GRU-gated graph attention network that identifies vulnerable transmission lines in power grids and transfers to unseen grids across inter-time and inter-domain settings.

Comments: 10 pages, 10 figures, IEEE format

Abstract

Identifying vulnerable transmission lines in power grids before a cascading failure occurs is challenging: existing methods can learn inter-line failure correlations from cascade data, but they are trained and evaluated on a single grid, and transferring the learned knowledge to an unseen grid remains an open problem. We address this by training a single Gated Recurrent Unit (GRU)-gated Graph Attention Network on combined cascading failure data from limited training grids and applying it directly to any unseen grid without retraining. A GRU gate controls what information each node retains or discards at each cascade iteration. Empirical evaluation shows that the model transfers zero-shot to multiple new grids spanning inter-time and inter-domain settings. Using information extracted from the trained model, we consistently identify more vulnerable lines than established structural and electrical baselines.

2605.07003 2026-05-11 cs.RO cs.SY eess.SY

AirBender: Adaptive Transportation of Bendable Objects Using Dual UAVs

Jiawei Xu, Longsen Gao, Rafael Fierro, David Saldaña

Affiliations: Autonomous and Intelligent Robotics Laboratory (AIRLab), Lehigh University; Electrical and Computer Engineering Department, The University of New Mexico

AI summary: The paper presents an adaptive controller that lets two UAVs cooperatively transport a bendable object without an explicit elasticity model. Lyapunov analysis shows the controller is asymptotically stable, and hardware experiments validate the approach.

Abstract

The interaction of robots with bendable objects in midair presents significant challenges in control, often resulting in performance degradation and potential crashes, especially for aerial robots due to their limited actuation capabilities and constant need to remain airborne. This paper presents an adaptive controller that enables two aerial vehicles to collaboratively follow a trajectory while transporting a bendable object without relying on explicit elasticity models. Our method allows on-the-fly adaptation to the object's unknown deformable properties, ensuring stability and performance in trajectory-tracking tasks. We use Lyapunov analysis to demonstrate that our adaptive controller is asymptotically stable. Our method is evaluated through hardware experiments in various scenarios, demonstrating the capabilities of using multirotor aerial vehicles to handle bendable objects.

2605.07002 2026-05-11 cs.AI math.ST stat.ML stat.TH

Adaptive auditing of AI systems with anytime-valid guarantees

Siyu Zhou, Patrick Vossler, Venkatesh Sivaraman, Yifan Mai, Jean Feng

Affiliations: University of California, San Francisco; Stanford University

AI summary: The paper proposes an adaptive auditing framework built on hypothesis tests from two dueling perspectives, giving statistically rigorous verification of AI system robustness from small samples, and proves that passing a stringent audit certifies the system as globally robust.

Abstract

A major bottleneck in characterizing the failure modes of generative AI systems is the cost and time of annotation and evaluation. Consequently, adaptive testing paradigms have gained popularity, where one opportunistically decides which cases and how many to annotate based on past results. While this framework is highly practical, its extreme flexibility makes it difficult to draw statistically rigorous conclusions, as it violates classical assumptions: the number of observations is typically limited (often 10 to 50 cases) and decisions regarding sampling and stopping are made in the midst of data collection rather than based on a pre-specified rule. To characterize what statistical inferences can be drawn from highly adaptive audits, we introduce a hypothesis testing framework from two 'dueling' perspectives: (i) the model's null that asserts there is no failure mode with performance below a target threshold versus (ii) the auditor's null that asserts they have a sampling strategy that will uncover a failure mode. Leveraging Safe Anytime-Valid Inference (SAVI), we formalize the auditor as conducting 'testing by betting', which translates into simultaneous e-processes for testing the dueling null hypotheses. Furthermore, if the auditor is sufficiently powerful, we prove that these two hypotheses are asymptotically inverses of each other, in that passage of a stringent audit does in fact certify the AI system as being globally robust. Empirically, we demonstrate that our proposed testing procedures maintain anytime-valid type-I error control, outperform pre-specified testing methods, and can reach statistically rigorous conclusions sometimes with as few as 20 observations.
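To make "testing by betting" concrete, here is a minimal sketch of a generic betting e-process for a much simpler null than the paper's: that the failure probability of each audited case is at most `p0`. It is a textbook construction under that stated assumption, not the authors' procedure, and the names `betting_e_process`, `first_rejection`, `p0`, and `bet` are illustrative.

```python
def betting_e_process(outcomes, p0, bet):
    """Wealth process for the null 'failure probability <= p0'.

    outcomes: iterable of 0/1 values, where 1 means a failure was observed.
    bet: fixed stake in [0, 1/p0]; this range keeps every factor non-negative.
    Under the null each factor has conditional mean at most 1, so the wealth
    is a non-negative supermartingale starting at 1.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie in (0, 1)")
    if not 0.0 <= bet <= 1.0 / p0:
        raise ValueError("bet must lie in [0, 1/p0]")
    wealth = 1.0
    path = []
    for x in outcomes:
        wealth *= 1.0 + bet * (x - p0)  # gain on failures, lose on successes
        path.append(wealth)
    return path


def first_rejection(path, alpha=0.05):
    """First step at which wealth reaches 1/alpha, or None if it never does.

    By Ville's inequality, rejecting at that step has type-I error at most
    alpha regardless of when the auditor chooses to stop.
    """
    threshold = 1.0 / alpha
    for step, wealth in enumerate(path, start=1):
        if wealth >= threshold:
            return step
    return None
```

For instance, with `p0 = 0.5` and `bet = 1.0`, each observed failure multiplies the wealth by 1.5, so a run of consecutive failures crosses the threshold of 20 at the eighth observation. The guarantee holds at every stopping time, which is what lets sampling and stopping decisions depend on the data seen so far.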

2605.06997 2026-05-11 cs.LG

Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators

Anupama Sridhar, Alexander Johansen

Affiliations: Stanford University

AI summary: Echo uses spectral Koopman operators to perform associative recall without a KV cache, addressing the memory bottleneck of conventional Transformers, and shows strong retrieval performance across several benchmarks.

Abstract

Long chain-of-thought reasoning and agentic tool-calling produce traces spanning tens of thousands of tokens, yet Transformer KV caches grow linearly with sequence length, creating a memory bottleneck on commodity hardware. State-space models offer constant-memory recurrence but suffer a memory cliff: retrieval accuracy collapses once the gap between a stored fact and its query exceeds the effective horizon of the recurrent state. We introduce Echo, a KV-cache-free associative recall architecture built around Spectral Koopman Attention (SKA), a drop-in replacement for attention layers that augments SSM blocks with a closed-form dynamical operator whose sufficient statistics are accumulated in constant memory with no KV cache. Echo fits a spectral linear system to the key and value history via kernel ridge regression and retrieves through a learned power-iterated filter, all from $O(r^{2})$ streaming state where $r$ is a small projection rank. On the Multi-Query Associative Recall benchmark, a pure Mamba-2 SSM fails to exceed chance accuracy (${\sim}3\%$) across all gap lengths and KV-pair counts, while at the 50M parameter scale SKA-augmented models achieve $100\%$ retrieval accuracy on every configuration tested, including distractor gaps of $4{,}096$ tokens with $32$ KV pairs. Across five additional transfer benchmarks including needle-in-a-haystack, tool-trace, and multi-hop retrieval, SKA consistently outperforms both pure SSM and SSM+Attention hybrids while maintaining constant inference memory. Ablations confirm that the spectral operator, not the prefix masking strategy, drives the retrieval gain.

2605.06993 2026-05-11 cs.AI stat.ML

Optimal Experiments for Partial Causal Effect Identification

Tobias Maringgele, Jalal Etesami

Affiliations: Technical University of Munich

AI summary: The paper studies how to select experiments under a cost budget so as to maximally tighten the bounds on a target causal query. It formulates the max-potency problem, proves it NP-hard, and uses graphical pruning criteria that substantially reduce the number of candidate experiments on random graphs and benchmark networks.

Abstract

Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcomes, a cost-constrained subset of experiments that maximally tightens bounds on a target query. We formalize this as the max-potency problem, where epistemic potency measures the worst-case reduction in bound width guaranteed by an experiment, and show that this problem is NP-hard via a reduction from 0-1 knapsack. Building on the polynomial-programming framework of Duarte et al. (2023), we give a general procedure for evaluating epistemic potency in discrete settings. To control the super-exponential search space, we introduce two graphical pruning criteria that depend only on the causal graph and the query: a novel path-interception rule that exploits district structure to certify zero potency in linear time, and an identifiability check based on the ID algorithm. On Erdős-Rényi random graphs and 11 bnlearn benchmark networks, the two criteria together prune 50-88% of candidate experiments on average without solving a single polynomial program. For the general subset search, we show that ID-pruned experiments are combinatorially inert, yielding a super-exponential reduction in the number of subsets evaluated. We close with an end-to-end demonstration on observational NHANES data, selecting optimal experiments for estimating the effect of physical activity on diabetes.

2605.06992 2026-05-11 cs.LG stat.ML

Why Does Agentic Safety Fail to Generalize Across Tasks?

Yonatan Slutzky, Yotam Alexander, Tomer Slor, Yoav Nagel, Nadav Cohen

Affiliations: Some Institute

AI summary: The paper examines why agentic safety fails to generalize across tasks, arguing that safety requirements make the relationship between a task and its safe execution more complex than the relationship between a task and its execution alone. Theory and experiments suggest that new approaches are needed to improve safety.

Abstract

AI agents are increasingly deployed in multi-task settings, where the task to perform is specified at test time, and the agent must generalize to unseen tasks. A major concern in such settings is safety: often, an agent must not only execute unseen tasks, but do so while avoiding risks and handling ones that materialize. Empirical evidence suggests that even when the ability to execute generalizes to unseen tasks, the ability to do so safely frequently does not. This paper provides theory and experiments indicating that failures of agentic safety to generalize across tasks are not merely due to limitations of training methods, but reflect an inherent property of safety itself: the relationship between a task and its safe execution is more complex than the relationship between a task and its execution alone. Theoretically, we analyze linear-quadratic control with $H_{\infty}$-robustness, and prove that the mapping from task specification to an optimal controller has higher Lipschitz constant with safety requirements than without, yielding a Lipschitz bound of independent interest. Empirically, we demonstrate our conclusions in simulated quadcopter navigation with a neural network agent and in CRM with an LLM agent. Our findings suggest that current efforts to enhance agentic safety may be insufficient, and point to a need for fundamentally different approaches.

2605.06990 2026-05-11 cs.CV cs.LG

TRAJGANR: Trajectory-Centric Urban Multimodal Learning via Geospatially Aligned Neural Representations

Maria Despoina Siampou, Gengchen Mai, Ni Lao, Jinmeng Rao, Neha Arora, Cyrus Shahabi, Shushman Choudhury

Affiliations: Google Research, Mountain View, CA; Google LLC, Mountain View, CA; Dept. of Computer Science, University of Southern California, Los Angeles, CA; SEAI Lab, Dept. of Geography and the Environment, The University of Texas at Austin, Austin, TX

AI summary: TrajGANR is a multimodal self-supervised learning framework that aligns continuous movement patterns with static, location-based observations, improving performance on urban understanding and mobility tasks over existing methods.

Abstract

Multimodal self-supervised learning (MSSL) has emerged as a key paradigm for pretraining geospatial foundation models. However, existing geospatial MSSL methods are mainly designed for static pairs of modalities, such as satellite imagery, street-view imagery, and text, where learning is driven by aligning observations from the same or nearby locations. This assumption breaks down for human mobility trajectories, which represent continuous movement along paths rather than discrete observations at individual locations. Although trajectories are important for urban understanding through their ability to capture human activity across roads, neighborhoods, and places over time, they remain largely underexplored in current geospatial MSSL frameworks. We present TrajGANR, a novel trajectory-centric geospatial MSSL framework that aligns continuous movement patterns with static, location-based observations. TrajGANR learns a continuous neural representation of trajectories at arbitrary points along each path, which enables fine-grained alignment with nearby street-view images, even when they are not co-located with any trajectory waypoints. We leverage this capability to introduce an MSSL objective that jointly aligns three modalities: trajectories, street-view images, and their geographic locations. We evaluate TrajGANR on four urban mobility and road understanding tasks. Across these tasks, TrajGANR consistently outperforms existing geospatial MSSL frameworks and a trajectory-specific foundation model. Ablation studies further demonstrate that our proposed MSSL objective and the multimodal learning framework are the primary drivers of these improvements, highlighting the importance of fine-grained geospatial alignment over coarser aggregation, as well as geospatial multimodal learning.

2605.06987 2026-05-11 cs.LG cs.GT econ.TH stat.ML

Response Time Enhances Alignment with Heterogeneous Preferences

Federico Echenique, Alireza Fallah, Baihe Huang, Michael I. Jordan

Affiliations: Department of Economics, University of California, Berkeley; Department of Computer Science and Ken Kennedy Institute, Rice University; Departments of Electrical Engineering and Computer Sciences, University of California, Berkeley; Departments of Electrical Engineering and Computer Sciences and Statistics, University of California, Berkeley; Inria Paris

AI summary: The paper adds users' response times to preference data to correct the bias that standard choice-only data introduces when preferences are heterogeneous, and proves that the resulting estimator identifies the population-average preference, improving the alignment of large language models with human preferences.

Abstract

Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable. To overcome this critical limitation, we demonstrate that augmenting preference datasets with a simple, secondary signal, the user's response time, can restore the identifiability of the population's average preference. By modeling each decision as a Drift-Diffusion Model (DDM), we introduce a novel, consistent estimator of heterogeneous preferences that successfully corrects the distortions of standard choice-only labels. We prove that our estimator asymptotically converges to the true average preference even in extreme cases where each anonymous labeler contributes only a single choice. Empirically, across both synthetic and real-world datasets, our method consistently outperforms standard baselines that otherwise fail and plateau at a bias floor. Because response times are essentially free to record and require zero user tracking or identification, our results open up new opportunities for future data-collection pipelines to improve the social benefit without requiring user-level identifiers or repeated elicitations.

2605.06982 2026-05-11 cs.LG

FastOmniTMAE: Parallel Clause Learning for Scalable and Hardware-Efficient Tsetlin Embeddings

Ahmed K. Kadhim, Lei Jiao, Rishad Shafik, Ole-Christoffer Granmo, Mayur Kishor Shende

Affiliations: Department of ICT, University of Agder; School of Engineering, Newcastle University

AI summary: FastOmniTMAE parallelizes training into evaluation and update stages, achieving up to 5x faster training on classification tasks, and enables efficient logic-based embedding learning on FPGA and SoC hardware.

Abstract

Embedding models in natural language processing (NLP) increasingly rely on deep architectures such as BERT, while simpler models such as Word2Vec provide efficient representations but limited interpretability. The Tsetlin Machine (TM) offers an alternative logic-based learning paradigm. Omni TM Autoencoder (Omni TM-AE) applies this paradigm to static embedding by exploiting automaton state distributions within a single clause layer, but its training process remains slow. In this work, we propose FastOmniTMAE, a reformulation of Omni TM-AE that replaces sequential training dependencies with a two-stage parallel process: evaluation and update. Using a Single-Run Multi-Environment Benchmark covering classification, similarity, and clustering, FastOmniTMAE achieves up to 5$\times$ faster training in classification while maintaining comparable embedding quality under both Spearman and Kendall similarity measures. To address the limited efficiency of TM training on conventional GPUs, we further implement FastOmniTMAE as a reusable accelerator on SoC-FPGA platforms. The Multi-Hardware Benchmark shows that FastOmniTMAE achieves similarity scores of 0.669 on a resource-constrained FPGA and 0.696 on an UltraScale+ SoC, demonstrating efficient logic-based embedding training with a small hardware footprint.

2605.06979 2026-05-11 cs.LG cs.AI stat.ML

PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction


Jonathn Chang, Arya Datla, Ziv Goldfeld

Affiliation: Cornell University

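AI summary: PLOT localizes causal variables with an optimal transport framework and locates causal abstractions efficiently across experiments of varying complexity, improving the efficiency and accuracy of causal interpretation.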

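AI abstract (translated from Chinese)

Causal abstraction provides a systematic way to explain the mechanisms of neural networks, aligning a high-level causal model with the network's low-level computation through counterfactual intervention analysis. Existing methods such as distributed alignment search (DAS) learn expressive subspace interventions, but the relevant neural site is not known in advance, so finding an intervention handle requires a computationally expensive search over candidate sites. The authors introduce PLOT (Progressive Localization via Optimal Transport), a transport-based framework that localizes causal variables from the geometry of the output effects of abstract and neural interventions. PLOT fits an optimal transport coupling between abstract variables and candidate neural sites, producing a global soft correspondence that can be calibrated into intervention handles. In simple settings, a single coupling over individual neurons suffices. In larger models, PLOT is applied progressively, moving from coarse sites such as tokens, timesteps, or layers to finer supports such as coordinate groups or PCA spans, and can optionally guide DAS using the localized signal. Across experiments of increasing complexity, transport-only PLOT handles are very fast and competitive in accuracy, while PLOT-guided DAS reaches DAS-level accuracy in a fraction of the full DAS runtime, offering an efficient localization engine for large-scale causal abstraction research.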

English abstract

Causal abstraction offers a principled framework for mechanistic interpretability, aligning a high-level causal model with the low-level computation realized by a neural network through counterfactual intervention analysis. Existing methods such as distributed alignment search (DAS) learn expressive subspace interventions, but the relevant neural site is unknown a priori, so finding a handle requires a computationally burdensome search over candidate sites. We introduce PLOT (Progressive Localization via Optimal Transport), a transport-based framework that localizes causal variables from the output effect geometry of abstract and neural interventions. PLOT fits an optimal transport coupling between abstract variables and candidate neural sites, yielding a global soft correspondence that can be calibrated into intervention handles. In simple settings, a single coupling over individual neurons suffices. In larger models, PLOT is applied progressively, moving from coarse sites such as tokens, timesteps, or layers to finer supports such as coordinate groups or PCA spans, and optionally guiding DAS based on the localized signal. Across experiments of increasing complexity, transport-only PLOT handles are exceedingly fast and competitive on accuracy, while PLOT-guided DAS reaches DAS-level accuracy at a fraction of full DAS runtime, providing an efficient localization engine for causal abstraction research at scale.
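To make the idea of a soft correspondence between abstract variables and candidate sites concrete, here is a minimal sketch using entropic-regularized optimal transport (Sinkhorn iterations). This is an illustration under stated assumptions, not PLOT itself: the abstract does not specify the solver, the cost function, or the calibration step, and the effect signatures, site names, squared-Euclidean cost, and `reg` value below are all hypothetical.

```python
import math

def sinkhorn(cost, row_mass, col_mass, reg=0.1, iters=500):
    """Entropic OT: return a coupling whose marginals approximate the given masses."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical "output effect" signatures: how intervening on each abstract
# variable, or on each candidate neural site, shifts two output features.
abstract_effects = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
site_effects = {"layer0": [0.9, 0.1], "layer1": [0.1, 0.8], "layer2": [0.5, 0.5]}

variables, sites = list(abstract_effects), list(site_effects)
cost = [[squared_distance(abstract_effects[a], site_effects[s]) for s in sites]
        for a in variables]
coupling = sinkhorn(cost,
                    [1 / len(variables)] * len(variables),
                    [1 / len(sites)] * len(sites))

# Read off the site that receives the most transported mass per variable.
best_site = {a: sites[max(range(len(sites)), key=lambda j: coupling[i][j])]
             for i, a in enumerate(variables)}
print(best_site)
```

The coupling is soft: a site whose effect resembles both variables (here `layer2`) receives mass from each, and only the final argmax turns the correspondence into a hard choice of site.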

2605.06978 2026-05-11 cs.CL cs.AI

Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries


Kun Zeng, Yu Huo, Siyu Zhang, Zi Ye, Yuecheng Zhuo, Haoyue Liu, Yuquan Lu, Junhao Wen, Xiaoying Tang

Affiliations: School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen; Sun Yat-sen University; University of California, San Diego; Taiyuan University of Technology

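AI summary: This paper proposes GoSkills, a group-structured skill retrieval method that builds anchor-centered skill groups from a typed skill graph, improving an agent's coverage of visible requirements under a limited skill budget and outperforming conventional skill retrieval methods.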

Comments: 30 pages, 4 figures, 24 tables

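AI abstract (translated from Chinese)

Skill-augmented agents increasingly rely on large reusable skill libraries, but retrieving relevant skills is not the same as providing usable context. Existing methods usually return atomic skills or dependency-aware bundles whose internal roles remain implicit, leaving the agent to infer the execution entry point, the support skills, the visible requirements, and the failure-avoidance guidance on its own. The authors introduce Group of Skills (GoSkills), an inference-time group-structured retrieval method that changes the retrieval object presented to the agent from a flat skill list to a compact, role-labeled execution context. GoSkills builds anchor-centered skill groups from a typed skill graph, expands support groups through a group graph, bottlenecks the selected group plan into a bounded set of atomic skill payloads, and renders a fixed execution contract with Start, Support, Check, and Avoid fields, without changing the downstream agent, the skill payloads, or the execution environment. Experiments on SkillsBench and ALFWorld show that GoSkills preserves visible-requirement coverage under a limited skill budget, outperforms flat skill-access baselines, and often improves reward and agent-only runtime relative to structural retrieval references.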

English abstract

Skill-augmented agents increasingly rely on large reusable skill libraries, but retrieving relevant skills is not the same as presenting usable context. Existing methods typically return atomic skills or dependency-aware bundles whose internal roles remain implicit, leaving the agent to infer the execution entry point, support skills, visible requirements, and failure-avoidance guidance. We introduce Group of Skills (GoSkills), an inference-time group-structured retrieval method that changes the agent-facing retrieval object from a flat skill list to a compact, role-labeled execution context. GoSkills builds anchor-centered skill groups from a typed skill graph, expands support groups through a group graph, bottlenecks the selected group plan into a bounded set of atomic skill payloads, and renders a fixed execution contract with Start, Support, Check, and Avoid fields, without changing the downstream agent, skill payloads, or execution environment. Experiments on SkillsBench and ALFWorld show that GoSkills preserves visible-requirement coverage under a small skill budget, improves over flat skill-access baselines, and often improves reward and agent-only runtime relative to structural retrieval references.
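As a rough illustration of what a role-labeled execution contract could look like, the sketch below models the four fields named in the abstract (Start, Support, Check, Avoid) as a small Python dataclass with a budget cap on atomic skills. Only the four field names come from the abstract. The class, its methods, the skill names, and the truncation rule are hypothetical, and the actual GoSkills rendering and bottleneck logic may differ.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExecutionContract:
    """Role-labeled context handed to the agent (hypothetical structure)."""
    start: str                                        # execution entry-point skill
    support: List[str] = field(default_factory=list)  # supporting skills
    check: List[str] = field(default_factory=list)    # visible requirements to verify
    avoid: List[str] = field(default_factory=list)    # failure-avoidance guidance

    def bounded(self, skill_budget: int) -> "ExecutionContract":
        """Keep at most `skill_budget` atomic skills (start plus support)."""
        if skill_budget < 1:
            raise ValueError("budget must leave room for the start skill")
        return ExecutionContract(
            start=self.start,
            support=self.support[: skill_budget - 1],
            check=list(self.check),
            avoid=list(self.avoid),
        )

    def render(self) -> str:
        """Render the contract as four labeled lines in a fixed order."""
        lines = [f"Start: {self.start}"]
        for label, items in (("Support", self.support),
                             ("Check", self.check),
                             ("Avoid", self.avoid)):
            lines.append(f"{label}: {', '.join(items) if items else '(none)'}")
        return "\n".join(lines)

# Hypothetical skills for a spreadsheet task; a budget of 3 keeps the start
# skill plus the first two support skills.
contract = ExecutionContract(
    start="open_spreadsheet",
    support=["parse_csv", "format_cells", "export_pdf"],
    check=["output file exists"],
    avoid=["overwriting the input file"],
)
rendered = contract.bounded(skill_budget=3).render()
print(rendered)
```

Keeping the field order fixed means the agent always finds the entry point first and the guidance last, regardless of how many skills the budget allows.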