arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.19974 2026-05-11 cs.CV

RL-RIG: A Generative Spatial Reasoner via Intrinsic Reflection

Tianyu Wang, Zhiyuan Ma, Qian Wang, Xinyi Zhang, Xinwei Long, Bowen Zhou

Affiliations: Zhiyuan College, Shanghai Jiao Tong University; Huazhong University of Science and Technology; National University of Singapore; Shanghai Jiao Tong University; Tsinghua University; Shanghai AI Laboratory

AI summary: RL-RIG follows a Generate-Reflect-Edit paradigm to better capture fine-grained spatial relationships in image generation. Evaluated with Scene Graph IoU and a VLM-based judging strategy, it outperforms existing models on spatial consistency.

Abstract

Recent advancements in image generation have achieved impressive results in producing high-quality images. However, existing image generation models still generally struggle with a spatial reasoning dilemma, lacking the ability to accurately capture fine-grained spatial relationships from the prompt and correctly generate scenes with structural integrity. To mitigate this dilemma, we propose RL-RIG, a Reinforcement Learning framework for Reflection-based Image Generation. Our architecture comprises four primary components: Diffuser, Checker, Actor, and Inverse Diffuser, following a Generate-Reflect-Edit paradigm to spark the Chain of Thought reasoning ability in image generation for addressing the dilemma. To equip the model with better intuition over generation trajectories, we further develop Reflection-GRPO to train the VLM Actor for edit prompts and the Image Editor for better image quality under a given prompt, respectively. Unlike traditional approaches that solely produce visually stunning yet structurally unreasonable content, our evaluation metrics prioritize spatial accuracy, utilizing Scene Graph IoU and employing a VLM-as-a-Judge strategy to assess the spatial consistency of generated images on LAION-SG dataset. Experimental results show that RL-RIG outperforms existing state-of-the-art open-source models by up to 11% in terms of controllable and precise spatial reasoning in image generation.
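
One plausible reading of the Scene Graph IoU metric named in the abstract is set overlap between relation triples extracted from the prompt and from the generated image. The sketch below assumes that reading; the `(subject, relation, object)` triple format and the function name `scene_graph_iou` are illustrative and are not taken from the paper.

```python
from typing import Iterable, Tuple

# Hypothetical triple format: (subject, relation, object).
Triple = Tuple[str, str, str]


def scene_graph_iou(pred: Iterable[Triple], target: Iterable[Triple]) -> float:
    """Intersection-over-union of two sets of relation triples."""
    pred_set, target_set = set(pred), set(target)
    union = pred_set | target_set
    if not union:
        return 1.0  # two empty graphs agree trivially
    return len(pred_set & target_set) / len(union)


target = [("cup", "on", "table"), ("lamp", "left of", "cup")]
pred = [("cup", "on", "table"), ("lamp", "right of", "cup")]

# One shared triple out of three distinct triples overall.
print(scene_graph_iou(pred, target))
```

Under this definition a single wrong spatial relation lowers the score even when every object is present, which is the kind of error the abstract says purely visual quality metrics miss.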

2602.17472 2026-05-11 cs.RO

A Cost-Effective and Climate-Resilient Air Pressure System for Rain Effect Reduction on Automated Vehicle Cameras

Mohamed Sabry, Joseba Gorospe, Cristina Olaverri-Monreal

Affiliations: Department Intelligent Transport Systems, Johannes Kepler University Linz

AI summary: This paper proposes a low-cost hardware solution that reduces the impact of rain on automated-vehicle cameras while supporting sustainable transportation goals, improving system reliability, and reducing resource consumption.

Abstract

Recent advances in automated vehicles have focused on improving perception performance under adverse weather conditions; however, research on physical hardware solutions remains limited, despite their importance for perception-critical applications such as vehicle platooning. Existing approaches, such as hydrophilic or hydrophobic lenses and sprays, provide only partial mitigation, while industrial protection systems are costly and do not scale to automotive deployment. To address these limitations, this paper presents a cost-effective hardware solution for rainy conditions, designed to be compatible with multiple cameras simultaneously. Beyond its technical contribution, the proposed solution supports sustainability goals in transportation systems. By enabling compatibility with existing camera-based sensing platforms, the system extends the operational reliability of automated vehicles without requiring additional high-cost sensors or hardware replacements. This approach reduces resource consumption, supports modular upgrades, and promotes more cost-efficient deployment of automated vehicle technologies, particularly in challenging weather conditions where system failures would otherwise lead to inefficiencies and increased emissions. The proposed system increased the pedestrian detection accuracy of a Deep Learning model from 8.3% to 41.6%.

2602.16548 2026-05-11 cs.LG

RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion

Tianmeng Hu, Yongzheng Cui, Biao Luo, Ke Li

Affiliations: University of Exeter; Central South University

AI summary: RIDER uses reinforcement learning to guide a diffusion model that directly optimizes 3D structural similarity, improving the structural accuracy and diversity of RNA inverse design.

Comments: Accepted as a conference paper at ICLR 2026

Abstract

The inverse design of RNA three-dimensional (3D) structures is crucial for engineering functional RNAs in synthetic biology and therapeutics. While recent deep learning approaches have advanced this field, they are typically optimized and evaluated using native sequence recovery, which is a limited surrogate for structural fidelity, since different sequences can fold into similar 3D structures and high recovery does not necessarily indicate correct folding. To address this limitation, we propose RIDER, an RNA Inverse DEsign framework with Reinforcement learning that directly optimizes for 3D structural similarity. First, we develop and pre-train a GNN-based generative diffusion model conditioned on the target 3D structure, achieving a 9% improvement in native sequence recovery over state-of-the-art methods. Then, we fine-tune the model with an improved policy gradient algorithm using four task-specific reward functions based on 3D self-consistency metrics. Experimental results show that RIDER improves structural similarity by over 100% across all metrics and discovers designs that are distinct from native sequences.

2602.13837 2026-05-11 cs.CV

A Causal Diffusion Model for Video Reconstruction from Ultra-Low-Bitrate Representations

Cem Eteke, Batuhan Tosun, Martin Piccolrovazzi, Alexander Griessel, Wolfgang Kellerer, Eckehard Steinbach

Affiliations: Chair of Media Technology; Chair of Communication Networks; Munich Institute of Robotics and Machine Intelligence; School of Computation Information and Technology, Technical University of Munich

AI summary: This paper proposes a causal diffusion model that reconstructs video from ultra-low-bitrate semantics and highly compressed frames by jointly modeling their complementary information, improving reconstruction quality over classical, neural, generative, and semantic baselines.

Abstract

We study video reconstruction from ultra-low-bitrate representations, where the primary challenge shifts from encoding to decoding. In this regime, reconstruction with classical and neural codecs introduces blur, while generative and semantic approaches often struggle to jointly preserve fidelity, temporal consistency, and perceptual quality. To address these limitations, we propose a causal video diffusion model that reconstructs videos from ultra-low-bitrate semantics and highly compressed frames by jointly modeling their complementary information. We further introduce temporal-only distillation from a bidirectional teacher to enable parameter-efficient training and causal few-step inference. Through extensive quantitative, qualitative, and subjective evaluation, we show that the proposed method outperforms classical, neural, generative, and semantic baselines in ultra-low-bitrate video reconstruction.

2602.13506 2026-05-11 cs.LG cs.AI math.OC

$γ$-weakly $θ$-up-concavity: A Unified Framework for Non-Convex Optimization Beyond DR-Submodular and OSS Functions

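Mohammad Pedramfar, Vaneet Aggarwal

Affiliations: Mila - Quebec AI Institute/McGill University; Purdue University

AI summary: This paper introduces the $γ$-weakly $θ$-up-concavity condition, a unified framework for non-convex optimization that covers DR-submodular and OSS functions, and obtains approximation guarantees for a wide range of problems via linearization.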

Abstract

Optimizing non-convex functions is a fundamental challenge across machine learning and combinatorial optimization. We introduce and study $γ$-weakly $θ$-up-concavity, a novel first-order condition that characterizes a broad class of such functions. This condition provides a powerful unifying framework, strictly generalizing both DR-submodular and One-Sided Smooth (OSS) functions while capturing broader forms of scale-dependent curvature, including accumulating-then-diminishing returns and flat-start behavior. Our central theoretical contribution demonstrates that $γ$-weakly $θ$-up-concave functions are upper-linearizable: for any feasible point, we can construct a linear surrogate whose gains provably approximate the original non-linear objective. A key technical contribution is a nonuniform upper-linearization argument yielding approximation coefficients that depend explicitly on the curvature parameters and the geometry of the feasible region. This linearizability yields immediate and unified approximation guarantees for a wide range of problems. Specifically, we obtain unified approximation guarantees for offline optimization as well as static and dynamic regret bounds in online settings via standard reductions to linear optimization. Moreover, our framework recovers the optimal approximation coefficient for DR-submodular maximization and improves existing approximation coefficients for OSS optimization, particularly over matroid constraints.

2602.13357 2026-05-11 cs.CV cs.AI

AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers

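Dong Liu, Yanxuan Yu, Ben Lengerich, Ying Nian Wu

Affiliations: Columbia University

AI summary: This paper proposes AdaCorrection, a framework that uses adaptive offset cache correction to improve generation quality and cache-reuse efficiency in Diffusion Transformers, reducing computational overhead while maintaining high fidelity.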

Abstract

Diffusion Transformers (DiTs) achieve state-of-the-art performance in high-fidelity image and video generation but suffer from expensive inference due to their iterative denoising structure. While prior methods accelerate sampling by caching intermediate features, they rely on static reuse schedules or coarse-grained heuristics, which often lead to temporal drift and cache misalignment that significantly degrade generation quality. We introduce AdaCorrection, an adaptive offset cache correction framework that maintains high generation fidelity while enabling efficient cache reuse across Transformer layers during diffusion inference. At each timestep, AdaCorrection estimates cache validity with lightweight spatio-temporal signals and adaptively blends cached and fresh activations. This correction is computed on-the-fly without additional supervision or retraining. Our approach achieves strong generation quality with minimal computational overhead, maintaining near-original FID while providing moderate acceleration. Experiments on image and video diffusion benchmarks show that AdaCorrection consistently improves generation performance.

2602.13224 2026-05-11 cs.AI cs.CL

A Geometric Taxonomy of Hallucinations in LLMs

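Javier Marín

Affiliations: Javier Marín

AI summary: This paper proposes hallucination detection based on the geometry of the embedding space, using three operational types to separate detectable from undetectable hallucinations, and validates the approach on datasets from different domains.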

Abstract

Hallucinations in deployed language models can have real consequences for downstream decisions in domains such as healthcare, legal, and financial services. In production, detection has to run on what the deployed system can see: the query, the response, and often a source document. White-box access to model internals and multi-sample querying are not generally available behind a third-party API. Within this setting (black-box, single-pass, only question/answer available) the dominant baseline is NLI, which returns a value but no diagnosis when it fails. We argue that operating directly on the geometry of the embedding space provides detection methods whose successes and failures are interpretable as structural properties of contrastive sentence-encoder training \citep{wang2020understanding}. The contribution is: given an operationally-motivated taxonomy, geometry predicts which types of hallucination are detectable and which are not, and the predictions hold. We propose three operational types organized by the relation of the response embedding to the plausibility region of grounded responses on the unit hypersphere, and derive from the alignment objective a prediction for each: (1) query-proximate unfaithfulness is detectable by an angular ratio; (2) confabulation outside the plausibility region produces a directional signature that outperforms NLI on expert-annotated error; (3) factual errors sharing vocabulary and frame with correct answers are not separable by angular geometry. To validate on content resembling deployment, we built a 212-pair human-confabulated dataset across nine domains using provoked confabulation.

2602.12852 2026-05-11 cs.AI

WebClipper: Efficient Evolution of Web Agents with Graph-based Trajectory Pruning

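Junjie Wang, Zequn Xie, Dan Yang, Jie Feng, Yue Shen, Duolin Sun, Meixiu Long, Yihan Jiao, Zhehao Tan, Jian Wang, Peng Wei, Jinjie Gu

Affiliations: Ant Group

AI summary: This paper proposes WebClipper, a framework that compresses web agent trajectories via graph-based pruning, reducing tool-call rounds by about 20% while improving accuracy, and introduces the F-AE Score metric to balance accuracy and efficiency.

Comments: ACL 2026 Main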

Abstract

Deep Research systems based on web agents have shown strong potential in solving complex information-seeking tasks, yet their search efficiency remains underexplored. We observe that many state-of-the-art open-source web agents rely on long tool-call trajectories with cyclic reasoning loops and exploration of unproductive branches. To address this, we propose WebClipper, a framework that compresses web agent trajectories via graph-based pruning. Concretely, we model the agent's search process as a state graph and cast trajectory optimization as a minimum-necessary Directed Acyclic Graph (DAG) mining problem, yielding pruned trajectories that preserve essential reasoning while eliminating redundant steps. Continued training on these refined trajectories enables the agent to evolve toward more efficient search patterns and reduces tool-call rounds by about 20% while improving accuracy. Furthermore, we introduce a new metric called F-AE Score to measure the model's overall performance in balancing accuracy and efficiency. Experiments demonstrate that WebClipper compresses tool-call rounds under excellent performance, providing practical insight into balancing effectiveness and efficiency in web agent design.
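
To make the idea of removing cyclic reasoning loops from a trajectory concrete, here is a deliberately simplified stand-in. It is not the minimum-necessary DAG mining procedure from the abstract: it only erases loops, cutting the steps between two visits to the same state. The state labels and the function name `erase_loops` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def erase_loops(states: Sequence[Hashable]) -> List[Hashable]:
    """Drop every cycle: when a state recurs, discard the steps since its first visit."""
    path: List[Hashable] = []
    position: Dict[Hashable, int] = {}  # state -> index in `path`
    for state in states:
        if state in position:
            # Revisit: cut the loop back to just after the first occurrence.
            cut = position[state] + 1
            for dropped in path[cut:]:
                del position[dropped]
            del path[cut:]
        else:
            position[state] = len(path)
            path.append(state)
    return path


# Hypothetical tool-call trajectory with one redundant detour.
trajectory = ["query", "search:A", "open:A1", "search:A", "open:A2", "answer"]
print(erase_loops(trajectory))
```

Loop erasure treats two steps as the same state only when their labels match exactly. A real system would need a state-equivalence notion and a way to decide which branches are necessary for the final answer, which is the part the graph formulation in the abstract addresses.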

2602.12162 2026-05-11 cs.LG

Amortized Molecular Optimization via Group Relative Policy Optimization

Muhammad bin Javaid, Hasham Hussain, Ashima Khanna, Berke Kisin, Jonathan Pirnay, Alexander Mitsos, Dominik G. Grimm, Martin Grohe

Affiliations: RWTH Aachen University, Department of Computer Science; Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability; University of Applied Sciences Weihenstephan-Triesdorf, Bioinformatics; RWTH Aachen University, Process Systems Engineering (AVT.SVT)

AI summary: This paper proposes AMORTIX, a model that handles heterogeneity in molecular structure optimization by normalizing rewards within groups. It performs strongly on single- and multi-target kinase inhibitor design and outperforms other methods.

Abstract

In structurally constrained molecular optimization, state-of-the-art methods restart an expensive oracle-driven search from scratch for every new input structure, scaling poorly to settings with many starting structures or expensive oracles. While amortized approaches that learn a transferable policy could in principle remove this bottleneck, existing methods struggle to generalize to diverse structural constraints at inference time. We present AMORTIX, an amortized Graph Transformer model that natively supports such constraints, optimizing molecular structures in a single forward pass with zero inference-time oracle calls. A central challenge for amortized training in this domain is that optimization difficulty varies drastically across starting structures. We show that, under this heterogeneity, standard reinforcement learning methods fail to stabilize training, and address this by normalizing rewards within groups of completions sharing the same starting structure. We evaluate on structurally constrained single- and multi-target kinase inhibitor design, and on a few-shot prodrug case study. AMORTIX outperforms both amortized and instance-optimization baselines on goal-directed scaffold decoration and ranks first among amortized methods on the PMO benchmark; the prodrug case study further demonstrates transfer of a learned modification rule to unseen drug structures. Code is available at https://github.com/Hash-hh/AMORTIX/.
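
The within-group reward normalization described in the abstract can be sketched as follows. This is a minimal illustration that assumes the usual mean and standard deviation form of group-relative advantages; the exact statistics used by AMORTIX are not stated in the abstract, and the names `group_relative_advantages`, `"easy"`, and `"hard"` are made up for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def group_relative_advantages(
    samples: Sequence[Tuple[str, float]], eps: float = 1e-8
) -> List[float]:
    """Normalize each reward by the mean and std of its own starting-structure group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for start, reward in samples:
        groups[start].append(reward)
    # Per-group statistics (population std; eps guards against zero spread).
    stats = {start: (mean(r), pstdev(r)) for start, r in groups.items()}
    return [
        (reward - stats[start][0]) / (stats[start][1] + eps)
        for start, reward in samples
    ]


# Two hypothetical starting structures with very different reward scales.
samples = [("easy", 0.9), ("easy", 0.7), ("hard", 0.2), ("hard", 0.0)]
advantages = group_relative_advantages(samples)
print(advantages)
```

With a single global baseline, every completion for the hard structure would look bad and every completion for the easy one would look good. Normalizing inside each group keeps the learning signal about which completion was better for the same starting point.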

2602.11162 2026-05-11 cs.CL

Retrieval Heads are Dynamic

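Yuping Lin, Zitao Li, Yue Xing, Pengfei He, Yingqian Cui, Yaliang Li, Bolin Ding, Jingren Zhou, Jiliang Tang

Affiliations: Michigan State University; Zoom Communications; Tongyi Lab, Alibaba Group

AI summary: This paper studies retrieval heads in LLMs from a dynamic perspective, revealing their temporal dynamics, their irreplaceability, and their correlation with hidden states, and experimentally validates the advantage of dynamic retrieval heads in generation tasks.

Comments: Accepted at ACL 2026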

Abstract

Recent studies have identified "retrieval heads" in Large Language Models (LLMs) responsible for extracting information from input contexts. However, prior works largely rely on static statistics aggregated across datasets, identifying heads that perform retrieval on average. This perspective overlooks the fine-grained temporal dynamics of autoregressive generation. In this paper, we investigate retrieval heads from a dynamic perspective. Through extensive analysis, we establish three core claims: (1) Dynamism: Retrieval heads vary dynamically across timesteps; (2) Irreplaceability: Dynamic retrieval heads are specific at each timestep and cannot be effectively replaced by static retrieval heads; and (3) Correlation: The model's hidden state encodes a predictive signal for future retrieval head patterns, indicating an internal planning mechanism. We validate these findings on the Needle-in-a-Haystack task and a multi-hop QA task, and quantify the differences on the utility of dynamic and static retrieval heads in a Dynamic Retrieval-Augmented Generation framework. Our study provides new insights into the internal mechanisms of LLMs.

2602.10512 2026-05-11 cs.LG cs.LO stat.ML

Exponential Sample Complexity Separation between Flat and Hierarchical Agentic Theorem Provers

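Sho Sonoda, Shunta Akiyama, Yuya Uezato

Affiliations: CyberAgent; RIKEN AIP

AI summary: By analyzing the data distribution produced by a teacher prover, this work explains how a hierarchical prover reduces sample requirements when subproofs repeat, by predicting a reusable proof graph.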

Abstract

Agentic theorem provers often introduce intermediate lemmas, proof sketches, or subgoal decompositions before returning to tactic-level search. This can look like an expensive detour: if proving lemmas is itself hard, why should a learned prover spend effort there? We give a statistical learning answer. Instead of worst-case proof complexity over all formulas, we study the biased data distribution produced by a teacher prover: initial theorem states together with successful verified proof traces. We model proof search as a deterministic finite-horizon MDP and analyze offline imitation learning from those traces. The success bounds depend on the average length of teacher proofs, how predictable the teacher's next action is, and how accurately the student learns that local prediction problem. A flat student learns from fully inlined traces, so repeated subproofs appear many times in its training and test-time certificate. A hierarchical student instead predicts a reusable proof DAG and solves each shared block once. When flattening duplicates the same hard local argument exponentially many times, the sufficient-sample certificate produced by our bounds can be exponentially smaller for the hierarchical learner. This gives a concrete statistical mechanism by which reusable proof structure helps verifier-based theorem proving.
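
The gap between a fully inlined trace and a reusable proof DAG can be seen on a toy example. The sketch below only counts proof steps, not samples, and the doubling construction (each lemma invokes the previous lemma twice) is an assumption chosen to make the blow-up visible; it is not taken from the paper.

```python
from functools import lru_cache
from typing import Dict, List


def doubling_dag(depth: int) -> Dict[str, List[str]]:
    """Hypothetical proof DAG in which lemma L{k} uses lemma L{k-1} twice."""
    dag: Dict[str, List[str]] = {"L0": []}
    for k in range(1, depth + 1):
        dag[f"L{k}"] = [f"L{k - 1}", f"L{k - 1}"]
    return dag


def flat_size(dag: Dict[str, List[str]], root: str) -> int:
    """Number of steps after inlining every lemma use (tree unfolding)."""

    @lru_cache(maxsize=None)
    def size(node: str) -> int:
        return 1 + sum(size(child) for child in dag[node])

    return size(root)


def dag_size(dag: Dict[str, List[str]], root: str) -> int:
    """Number of distinct blocks reachable from the root."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dag[node])
    return len(seen)


dag = doubling_dag(10)
print(flat_size(dag, "L10"), dag_size(dag, "L10"))
```

At depth 10 the inlined proof has 2^11 - 1 = 2047 steps while the DAG has 11 blocks, so a learner that must get every step right faces exponentially more local predictions in the flat representation.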

2602.09850 2026-05-11 cs.CV

Towards Explainable Industrial Anomaly Detection via Knowledge-Guided Latent Reasoning

Peng Chen, Chao Huang, Yunkang Cao, Chengliang Liu, Wei Wang, Wenqiang Wang, Mingbo Yang, Li Shen, Wenqi Ren, Xiaochun Cao

Affiliations: School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University; School of Artificial Intelligence and Robotics, Hunan University; Department of Computer and Information Science, University of Macau

AI summary: This paper proposes Reason-IAD, a framework that incorporates domain-specific textual descriptions and an entropy-driven latent reasoning mechanism to improve the accuracy and interpretability of industrial anomaly detection. Experiments show it outperforms existing methods across multiple tasks.

Abstract

Industrial anomaly detection demands precise reasoning over fine-grained defect patterns. However, existing multimodal large language models (MLLMs), pretrained on general-domain data, often struggle to capture category-specific anomalies, thereby limiting both detection accuracy and interpretability. To address these limitations, we propose Reason-IAD, a knowledge-guided dynamic latent reasoning framework for explainable industrial anomaly detection. Reason-IAD comprises two core components. First, a retrieval-augmented knowledge module incorporates category-specific textual descriptions into the model input, enabling context-aware reasoning over domain-specific defects. Second, an entropy-driven latent reasoning mechanism conducts iterative exploration within a compact latent space using optimizable latent think tokens, guided by an entropy-based reward that encourages confident and stable predictions. Furthermore, a dynamic visual injection strategy selectively incorporates the most informative image patches into the latent sequence, directing the reasoning process toward regions critical for anomaly detection. Extensive experimental results demonstrate that Reason-IAD consistently outperforms state-of-the-art methods across multiple tasks. The code will be publicly available at https://github.com/chenpeng052/Reason-IAD.

2602.09782 2026-05-11 cs.LG cs.AI cs.CL

Flexible Entropy Control in RLVR with a Gradient-Preserving Perspective

Kun Chen, Peng Shi, Fanfan Liu, Haibo Qiu, Zhixiong Zeng, Siqi Yang, Wenji Mao

Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences; MAIS, Institute of Automation, Chinese Academy of Sciences; Meituan

AI summary: Starting from the perspective of Gradient-Preserving Clipping, this paper proposes a dynamic clipping-threshold mechanism and dynamic entropy control strategies that effectively mitigate entropy collapse and improve performance on multiple benchmarks.

Comments: https://github.com/Kwen-Chen/Flexible-Entropy-Control

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a critical method for enhancing the reasoning capabilities of Large Language Models (LLMs). However, continuous training often leads to policy entropy collapse, characterized by a rapid decay in entropy that results in premature overconfidence, reduced output diversity, and vanishing gradient norms that inhibit learning. Gradient-Preserving Clipping is a primary factor influencing these dynamics, but existing mitigation strategies are largely static and lack a framework connecting clipping mechanisms to precise entropy control. This paper proposes reshaping entropy control in RL from the perspective of Gradient-Preserving Clipping. We first theoretically and empirically verify the contributions of specific importance sampling ratio regions to entropy growth and reduction. Leveraging these findings, we introduce a novel regulation mechanism using dynamic clipping thresholds to precisely manage entropy. Furthermore, we design and evaluate dynamic entropy control strategies, including increase-then-decrease, decrease-increase-decrease, and oscillatory decay. Experimental results demonstrate that these strategies effectively mitigate entropy collapse and achieve superior performance across multiple benchmarks.

2602.09229 2026-05-11 cs.LG cs.IR

When Does Embedding Magnitude Matter? A Cross-Task Functional-Symmetry Framework

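Xincan Feng, Taro Watanabe

Affiliations: Nara Institute of Science and Technology

AI summary: This paper proposes a 2x2 framework that independently controls query-side and document-side normalization and finds that QNorm and DNorm outperform cosine and dot product across tasks. Document magnitude affects inference scores, query magnitude modulates training gradients, and the Fisher Information Matrix condition number predicts which side to normalize.

Comments: Preliminary work. Under review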

Abstract

Cosine similarity normalizes both sides; dot product normalizes neither. We propose a 2x2 framework that independently controls query-side and document-side normalization, exposing two intermediate variants (QNorm, DNorm) that have not been previously studied. On retrieval with four encoders, evaluated in-domain on MS MARCO and out-of-domain on BEIR, BRIGHT, and multi-hop QA, the unilateral variants outperform both cosine and dot product, with relative gains of up to +72% out-of-domain and +24% on downstream RAG. Cross-evaluation reveals the mechanism: document magnitude scales inference scores while query magnitude modulates training gradients, and the Fisher Information Matrix condition number predicts which side to normalize. We then classify tasks by functional symmetry, defined as whether the aggregate scoring procedure treats Q and C as interchangeable, and test whether the mechanism extends beyond retrieval. On five additional task families (semantic textual similarity, CLIP, knowledge graph completion, few-shot classification, recommender systems), the coarse prediction (cosine for symmetric, magnitude-preserving for asymmetric) holds in every case examined; the unilateral variants beat Cosine on recommendation, and on few-shot classification DNorm beats both Cosine and the standard Euclidean default of Prototypical Networks.
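
The 2x2 framework can be written as one scoring function with two switches. The sketch below assumes that QNorm means normalizing only the query and DNorm means normalizing only the document, which is how the abstract reads; the function name `score` and its flags are illustrative.

```python
import math
from typing import Sequence


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def score(
    q: Sequence[float],
    d: Sequence[float],
    normalize_query: bool,
    normalize_doc: bool,
) -> float:
    """Inner product with optional L2 normalization on each side.

    (False, False) -> dot product     (True, True)  -> cosine
    (True, False)  -> QNorm (assumed) (False, True) -> DNorm (assumed)
    Zero vectors are not handled: normalizing one raises ZeroDivisionError.
    """
    s = sum(a * b for a, b in zip(q, d))
    if normalize_query:
        s /= _norm(q)
    if normalize_doc:
        s /= _norm(d)
    return s


q, d = [3.0, 4.0], [2.0, 0.0]
for name, flags in {
    "dot": (False, False),
    "cosine": (True, True),
    "QNorm": (True, False),
    "DNorm": (False, True),
}.items():
    print(name, score(q, d, *flags))
```

For a fixed query, QNorm ranks documents exactly as the dot product does, since it divides every score by the same query norm. That is consistent with the abstract's claim that query magnitude matters through training gradients rather than through inference-time ranking, whereas DNorm changes the ranking by removing document magnitude.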

2602.06283 2026-05-11 cs.LG

SOCKET: SOft Collision Kernel EsTimator for Sparse Attention

Sahil Joshi, Agniva Chowdhury, Wyatt Bellinger, Amar Kanakamedala, Ekam Singh, Hoang Anh Duy Le, Aditya Desai, Anshumali Shrivastava

Affiliations: Department of Computer Science, Rice University; Department of Electrical Engineering and Computer Sciences, UC Berkeley

AI summary: SOCKET improves sparse attention with a soft collision kernel estimator that replaces traditional binary collision signals with probabilistic similarity aggregation, improving long-context inference efficiency and memory usage.

Comments: 7 figures, 17 tables

Abstract

Exploiting sparsity during long-context inference is key to scaling large language models, as attention dominates the cost of autoregressive decoding. Sparse attention reduces this cost by restricting computation to a subset of tokens, but its effectiveness depends on efficient scoring and selection at inference time. We revisit Locality-Sensitive Hashing (LSH) and introduce SOCKET, a SOft Collision Kernel EsTimator that replaces hard bucket matches with probabilistic, similarity-aware aggregation. Traditional LSH yields binary collision signals that limit ranking quality and require substantial memory to perform well. In contrast, soft LSH accumulates graded collision evidence across hash tables, preserving top-k ordering with significantly less memory. This reframes LSH from a candidate generator into a principled scoring kernel for sparse attention. Leveraging this property, SOCKET enables efficient token selection without ad hoc voting and matches or surpasses prior sparse attention methods across multiple long-context benchmarks. With a custom CUDA scoring kernel and a Flash Decode Triton backend, SOCKET achieves up to 1.5$\times$ higher throughput than FlashAttention.

2602.05359 2026-05-11 cs.CV

Multimodal Latent Reasoning via Hierarchical Visual Cues Injection

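Yiming Zhang, Qiangyu Yan, Borui Jiang, Kai Han

Affiliations: Nanyang Technological University; Huawei Noah's Ark Lab

AI summary: This paper proposes HIVE, a framework that performs multimodal latent reasoning through hierarchical visual cue injection, improving the model's multi-step reasoning in the aligned latent space.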

详情
AI中文摘要

多模态大语言模型(MLLMs)的进步带来了出色的感知能力,但其推理过程往往仍停留在“快思考”范式,依赖端到端生成或显式的、以语言为中心的思维链(CoT),效率低、冗长且易产生幻觉。本文认为稳健的推理应在潜在空间中演进,并无缝融合多模态信号。我们提出基于分层视觉线索注入的多模态潜在推理框架HIVE,在不依赖表层文本推理依据的情况下实现审慎的“慢思考”。该方法递归扩展transformer块,形成内部循环以迭代优化推理。关键在于,它将从全局场景上下文到细粒度区域细节的分层视觉线索直接注入模型的潜在表示,使模型能够完全在对齐的潜在空间中进行有依据的多步推理。大量评估表明,结合视觉知识时测试时扩展是有效的,且整合分层信息能显著提升模型对复杂场景的理解。

英文摘要

The advancement of multimodal large language models (MLLMs) has enabled impressive perception capabilities. However, their reasoning process often remains a "fast thinking" paradigm, reliant on end-to-end generation or explicit, language-centric chains of thought (CoT), which can be inefficient, verbose, and prone to hallucination. This work posits that robust reasoning should evolve within a latent space, integrating multimodal signals seamlessly. We propose multimodal latent reasoning via HIerarchical Visual cuEs injection (HIVE), a novel framework that instills deliberate, "slow thinking" without depending on superficial textual rationales. Our method recursively extends transformer blocks, creating an internal loop for iterative reasoning refinement. Crucially, it grounds this process by injecting hierarchical visual cues, from global scene context to fine-grained regional details, directly into the model's latent representations. This enables the model to perform grounded, multi-step inference entirely in the aligned latent space. Extensive evaluations demonstrate that test-time scaling is effective when incorporating vision knowledge, and that integrating hierarchical information significantly enhances the model's understanding of complex scenes.

2602.04556 2026-05-11 cs.CL cs.LG

Rethinking Weight Tying: Pseudo-Inverse Tying for LM Stable Training and Updates

重新思考权重绑定:用于语言模型稳定训练和更新的伪逆绑定

Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

发表机构 * Monash University(莫纳什大学) Technical University of Munich(慕尼黑工业大学) Chongqing University(重庆大学)

AI总结 本文提出伪逆绑定方法,通过同步嵌入和解嵌入作为共享的潜在令牌记忆的耦合投影,提升语言模型训练稳定性与接口一致性,同时改善可解释性探查。

Comments an early-stage version

详情
AI中文摘要

权重绑定在紧凑型语言模型中被广泛使用,通过在输入嵌入和输出投影之间共享词元表来减少参数。然而,仅靠参数共享并不能保证稳定的词元接口:在训练过程中,将词元编码为隐藏状态与将隐藏状态解码为logits之间的对应关系可能发生漂移,从而加剧优化敏感性,并削弱依赖有意义的词表空间解码器的可解释性探查。我们提出伪逆绑定(PIT),将嵌入和解嵌入同步为共享潜在词元记忆的耦合投影,保证整个训练过程中接口始终满足伪逆一致性。PIT维护一个正交归一(orthonormal)的共享记忆:在持续预训练中由源检查点经极分解(polar)初始化得到,在从头预训练中经随机正交归一初始化得到;同时引入一个通过Cholesky因子参数化的、可学习的对称正定隐藏空间变换。输出头在词表投影之前将该变换作用于隐藏状态,而嵌入则通过稳定的三角求解对词元向量施加逆变换,从而避免显式重新计算伪逆,也无需词表规模的辅助参数。除提高训练稳定性外,PIT通过保持输入和输出词元几何结构同步,为logit-lens风格和词表空间的可解释性探查提供了更干净的基底。我们在参数量为256M到1.3B的端侧模型上评估了PIT。结果表明,PIT提高了持续预训练的稳定性,在各种设置下都实现了近乎精确的词元接口一致性,并使持续预训练之后的轻量级适配更可预测;而从头预训练则揭示了严格接口一致性与无约束优化之间的权衡。

英文摘要

Weight tying is widely used in compact language models to reduce parameters by sharing the token table between the input embedding and the output projection. However, parameter sharing alone does not guarantee a stable token interface: during training, the correspondence between encoding tokens into hidden states and decoding hidden states into logits can drift, worsening optimization sensitivity and weakening explainability probes that rely on a meaningful vocabulary-space decoder. We propose Pseudo-Inverse Tying (PIT), which synchronizes embedding and unembedding as coupled projections of a shared latent token memory, guaranteeing a pseudo-inverse-consistent interface throughout training. PIT maintains an orthonormal shared memory, obtained by polar initialization from a source checkpoint for continued pretraining or by random orthonormal initialization for from-scratch pretraining, and introduces a learned symmetric positive definite hidden-space transform parameterized via a Cholesky factor. The output head applies this transform to hidden states before the vocabulary projection, while the embedding applies the inverse transform to token vectors using stable triangular solves, avoiding explicit pseudo-inverse recomputation and vocabulary-sized auxiliary parameters. Beyond improving training stability, PIT provides a cleaner substrate for logit-lens-style and vocabulary-space explainability probes by keeping the input and output token geometries synchronized. We evaluate PIT on on-device models spanning 256M-1.3B parameters. The results show that PIT improves continued-pretraining stability, enforces near-exact token-interface consistency across settings, and yields more predictable lightweight adaptation after continued pretraining, while from-scratch pretraining reveals a trade-off between strict interface consistency and unconstrained optimization.
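The coupling described in this abstract can be checked on a toy example. The sketch below is only one reading of the abstract, not the authors' implementation: the matrices `M` and `L` hold arbitrary illustrative values, and all function names are hypothetical. With a shared memory M that has orthonormal columns and S = L Lᵀ, the output weights are M S and the embedding rows are S⁻¹ m_t, so (M S)ᵀ (M S⁻¹) should equal the identity.

```python
# Shared token memory M: 3 tokens, hidden size 2, orthonormal columns.
M = [[0.6, 0.0], [0.8, 0.0], [0.0, 1.0]]
# Cholesky factor of the SPD hidden-space transform S = L L^T (toy values).
L = [[2.0, 0.0], [1.0, 3.0]]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def apply_transform(h):
    """Compute S h as L (L^T h) without forming S."""
    return matvec(L, matvec(transpose(L), h))

def solve_lower(b):
    """Forward substitution for L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i])
    return y

def solve_upper(y):
    """Back substitution for L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x

def unembed(h):
    """Output head: transform the hidden state, then project onto the vocabulary."""
    return matvec(M, apply_transform(h))

def embed(token_id):
    """Embedding: apply S^{-1} to the token vector via two triangular solves."""
    return solve_upper(solve_lower(M[token_id]))

def interface_gram():
    """Return W_out^T W_in, which should be the identity if the tie holds."""
    d = len(L)
    w_out = [apply_transform(row) for row in M]   # rows of M S (S is symmetric)
    w_in = [embed(t) for t in range(len(M))]      # rows of M S^{-1}
    return [[sum(w_out[t][i] * w_in[t][j] for t in range(len(M)))
             for j in range(d)] for i in range(d)]
```

Because only triangular solves against `L` are needed, no pseudo-inverse is ever formed explicitly, which matches the cost argument in the abstract.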

2602.04447 2026-05-11 cs.LG cs.AI

Mixture of Masters: Sparse Chess Language Models with Player Routing

大师混合:具有玩家路由的稀疏国际象棋语言模型

Giacomo Frisoni, Lorenzo Molfetta, Davide Freddi, Gianluca Moro

发表机构 * Department of Computer Science and Engineering(计算机科学与工程系)

AI总结 本文提出Mixture-of-Masters模型,通过稀疏专家网络实现国际象棋生成的多样化与可控性,优于传统密集模型和GPT基线。

详情
AI中文摘要

现代国际象棋语言模型是在数千名高等级棋手的数百万对局上训练的稠密Transformer。然而,这类单体网络容易坍缩为模式平均的行为:风格边界变得模糊,罕见但有效的策略被压制。为对抗这种同质化,我们提出Mixture-of-Masters(MoM),这是首个国际象棋混合专家模型,由模仿世界级特级大师的小型GPT专家组成。对于每一步棋,一个事后训练的可学习门控网络根据局面状态选择最合适的棋手风格,使MoM能够动态切换风格,例如塔尔的进攻天性或彼得罗相的防守稳健。在未见过的标准对局中与Stockfish对弈评估时,MoM优于稠密的单个专家网络以及在聚合数据上训练的流行GPT基线,同时保证了生成的多样性、可控性和可解释性。

英文摘要

Modern chess language models are dense transformers trained on millions of games played by thousands of high-rated individuals. However, these monolithic networks tend to collapse into mode-averaged behavior, where stylistic boundaries are blurred, and rare but effective strategies are suppressed. To counteract homogenization, we introduce Mixture-of-Masters (MoM), the first chess mixture-of-experts model with small-sized GPT experts emulating world-class grandmasters. For each move, a post-hoc learnable gating network selects the most appropriate persona to channel depending on the game state, allowing MoM to switch its style dynamically, e.g., Tal's offensive vocation or Petrosian's defensive solidity. When evaluated against Stockfish on unseen standard games, MoM outperforms both dense individual expert networks and popular GPT baselines trained on aggregated data, while ensuring generation variety, control, and interpretability.

2602.03331 2026-05-11 cs.LG

Bayesian Conformal Prediction as a Decision Risk Problem

作为决策风险问题的贝叶斯共形预测

Fanyi Wu, Veronika Lohmanova, Samuel Kaski, Michele Caprio

发表机构 * Department of Computer Science, University of Manchester(曼彻斯特大学计算机科学系) UKRI AI Centre for Doctoral Training in Decision Making for Complex Systems(UKRI复杂系统决策人工智能博士培训中心) Department of Computer Science, Aalto University(阿尔托大学计算机科学系) ELLIS Institute, Finland(芬兰ELLIS研究所)

AI总结 本文提出贝叶斯共形预测(BCP),通过结合贝叶斯后验预测分布与PAC式共形风险控制,生成具有有限样本覆盖保证的预测集。BCP将共形预测表述为决策风险优化问题,把标准的固定分位数阈值预测集扩展为经过优化的最高后验密度(HPD)预测集,在保持覆盖率的同时提高效率。

Comments 22 pages, 8 figures. A previous version was accepted at the EIML Workshop at NeurIPS 2025

详情
AI中文摘要

我们提出贝叶斯共形预测(BCP),一种将贝叶斯后验预测分布与PAC式共形风险控制相结合的框架,用于生成具有有限样本覆盖保证的预测集。标准的分位数阈值共形方法通常使用单一固定阈值构造预测集,得到的预测集一般是连通的。这类集合虽然有效,但当后验预测分布呈多峰时可能效率低下,因为它们会跨越相互分离的峰之间的低密度区域。BCP的主要贡献是将共形预测表述为决策风险优化问题,把标准的固定分位数阈值预测集扩展为经过优化的最高后验密度(HPD)预测集。这些集合可以是不连通的,从而将概率质量集中在相互分离的高密度区域。有效性由PAC式风险约束保证,即使贝叶斯模型被误设也能控制覆盖率。在标准的嵌套阈值设置中,BCP会恢复最小可行阈值,与现有基于PAC的方法一致。在多峰实验中,HPD几何显著提高了效率,将平均预测集大小从$4.82$降至$2.07$,同时满足目标PAC通过率。在回归、分类和分布偏移实验中,BCP在模型误设下保持了可靠的覆盖率,而贝叶斯可信区间可能无法保持名义覆盖率。

英文摘要

We propose Bayesian Conformal Prediction (BCP), a framework that combines Bayesian posterior predictive distributions with PAC-style conformal risk control to produce prediction sets with finite-sample coverage guarantees. Standard quantile-threshold conformal methods often construct prediction sets using a single fixed threshold, which typically yields connected prediction sets. While valid, such sets can be inefficient when the posterior predictive distribution is multimodal, since they may span low-density regions between separated modes. The main contribution of BCP is to formulate conformal prediction as a decision-risk optimisation problem, extending standard fixed quantile-threshold sets to optimised highest posterior density (HPD) prediction sets. These sets can be disjoint, concentrating probability mass on separated high-density regions. Validity is enforced using a PAC-style risk constraint, which provides coverage control even when the Bayesian model is misspecified. In standard nested-threshold settings, BCP recovers the smallest feasible threshold, aligning with existing PAC-based approaches. In the multimodal experiment, HPD geometry substantially improves efficiency, reducing mean prediction set size from $4.82$ to $2.07$ while satisfying the target PAC pass rate. Across regression, classification, and distribution-shift experiments, BCP maintains reliable coverage under model misspecification, whereas Bayesian credible intervals can fail to preserve nominal coverage.
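The efficiency argument for disjoint HPD sets can be illustrated on a toy bimodal density. The sketch below covers only the geometric point, that a highest-density set is smaller than a connected central interval at the same mass. It omits the PAC-style calibration step entirely, and the mixture parameters, grid, and function names are assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bimodal_pdf(x):
    """Toy predictive density: equal mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    return 0.5 * normal_pdf(x, -3.0, 0.5) + 0.5 * normal_pdf(x, 3.0, 0.5)

def grid_masses(pdf, lo, hi, step):
    """Discretise the density into normalised probability masses on a grid."""
    n = int(round((hi - lo) / step)) + 1
    dens = [pdf(lo + i * step) for i in range(n)]
    total = sum(dens)
    return [d / total for d in dens]

def central_interval_size(masses, step, level):
    """Length of the connected equal-tailed interval holding `level` mass."""
    tail = (1.0 - level) / 2.0
    cum, lo_idx, hi_idx = 0.0, None, None
    for i, m in enumerate(masses):
        cum += m
        if lo_idx is None and cum >= tail:
            lo_idx = i
        if hi_idx is None and cum >= 1.0 - tail:
            hi_idx = i
    return (hi_idx - lo_idx) * step

def hpd_set_size(masses, step, level):
    """Total length of the (possibly disjoint) highest-density set."""
    cum, count = 0.0, 0
    for m in sorted(masses, reverse=True):
        cum += m
        count += 1
        if cum >= level:
            break
    return count * step

STEP = 0.01
MASSES = grid_masses(bimodal_pdf, -6.0, 6.0, STEP)
interval_size = central_interval_size(MASSES, STEP, 0.9)
hpd_size = hpd_set_size(MASSES, STEP, 0.9)
```

On this toy density the connected interval has to bridge the empty region between the two modes, while the HPD set keeps only the two high-density bands, so `hpd_size` comes out at less than half of `interval_size`.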

2602.03201 2026-05-11 cs.LG

SLOPE: Optimistic Potential Landscape Shaping for Model-based Reinforcement Learning

SLOPE:面向基于模型的强化学习的乐观势能景观塑造

Yao-Hui Li, Zeyu Wang, Xin Li, Wei Pang, Yingfang Yuan, Zhengkun Chen, Boya Zhang, Riashat Islam, Alex Lamb, Yonggang Zhang

发表机构 * Beijing Institute of Technology(北京理工大学) Heriot-Watt University(赫瑞-瓦特大学) Shenzhen Institutes of Advanced Technology, CAS(中国科学院深圳先进技术研究院) Microsoft Research NY(微软研究院纽约分部) Tsinghua University(清华大学) Jilin University(吉林大学)

AI总结 SLOPE通过乐观势能估计构建信息丰富的势能景观,解决稀疏奖励环境下梯度信息不足的问题,提升模型在稀疏、半稀疏和密集奖励任务中的性能。

Comments Work in progress

详情
AI中文摘要

基于模型的强化学习(MBRL)在样本效率方面表现良好,但在稀疏奖励环境下面临挑战。关键瓶颈在于稀疏设置中缺乏信息性梯度,标准奖励模型常产生平坦景观,难以引导规划。为此,我们提出Shaping Landscapes with Optimistic Potential Estimates(SLOPE),一种新颖框架,将奖励建模从预测稀疏标量转向构建信息丰富的势能景观。SLOPE采用乐观分布回归估计高置信度上界,放大罕见成功信号并确保充分的探索梯度。在30多个任务和五个基准及真实机器人部署上的评估表明,SLOPE在完全稀疏、半稀疏和密集奖励任务中均优于领先基线。

英文摘要

Model-based reinforcement learning (MBRL) is sample-efficient but struggles in sparse reward settings. A critical bottleneck arises from the lack of informative gradients in sparse settings, where standard reward models often yield flat landscapes that offer little guidance for planning. To address this challenge, we propose Shaping Landscapes with Optimistic Potential Estimates (SLOPE), a novel framework that shifts reward modeling from predicting sparse scalars to constructing informative potential landscapes. SLOPE employs optimistic distributional regression to estimate high-confidence upper bounds, which amplifies rare success signals and ensures sufficient exploration gradients. Evaluations on 30+ tasks across 5 benchmarks, together with real-world robotic deployments, demonstrate that SLOPE consistently outperforms leading baselines in fully sparse, semi-sparse, and dense reward settings.
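One way to see why an optimistic estimate amplifies rare successes is to compare a mean reward estimate with an upper-quantile estimate on sparse data. The sketch below uses pinball-loss quantile regression as a stand-in for "optimistic distributional regression". That choice, the level `tau=0.97`, and the toy reward sample are assumptions made for illustration and are not taken from the paper.

```python
def pinball_loss(q, samples, tau):
    """Average quantile (pinball) loss of a candidate value q at level tau."""
    total = 0.0
    for y in samples:
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(samples)

def quantile_estimate(samples, tau, grid_points=101):
    """Grid search over [0, 1] for the value minimising the pinball loss."""
    grid = [i / (grid_points - 1) for i in range(grid_points)]
    return min(grid, key=lambda q: pinball_loss(q, samples, tau))

# Sparse outcomes observed from one state: 5 successes in 100 rollouts.
rewards = [1.0] * 5 + [0.0] * 95

mean_estimate = sum(rewards) / len(rewards)                 # nearly flat signal
optimistic_estimate = quantile_estimate(rewards, tau=0.97)  # upper-quantile signal
```

Here the mean stays near zero, while the 0.97-quantile jumps to the success value as soon as more than 3% of rollouts succeed, giving a planner a much steeper signal around states where success is rare but possible.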

2602.02739 2026-05-11 cs.LG cs.AI

TopoPrune: Robust Data Pruning via Unified Latent Space Topology

TopoPrune:通过统一的潜在空间拓扑实现稳健的数据修剪

Arjun Roy, Prajna G. Malettira, Manish Nagaraj, Kaushik Roy

发表机构 * Purdue University(普渡大学)

AI总结 本文提出TopoPrune框架,通过统一的双尺度拓扑方法,在潜在空间中捕捉数据的稳定内在结构,实现高精度和鲁棒性,尤其在高数据修剪率下表现优异。

Comments Preprint. Under Review

详情
AI中文摘要

几何数据修剪方法虽然便于利用预训练模型,但本质上并不稳定。它们依赖外在几何,因而对潜在空间扰动高度敏感,在跨架构迁移或存在特征噪声时性能下降。我们提出TopoPrune框架,利用拓扑捕捉数据稳定的内在结构来解决这一难题。TopoPrune在两个尺度上运作:(1)利用拓扑感知的流形近似建立数据集的全局低维嵌入;(2)随后通过可微持续同调对流形嵌入进行局部拓扑优化,并按结构复杂度对样本排序。我们证明,这种统一的双尺度拓扑方法能保证较高的准确率和精确率,在较高的数据集修剪率(如90%)下尤为明显。此外,得益于拓扑固有的稳定性,TopoPrune(a)对潜在特征嵌入的噪声扰动非常稳健,(b)在多种网络架构之间表现出更好的迁移性。本研究为面向稳健、数据高效学习的稳定且有原则的拓扑框架指出了一条有前景的路径。

英文摘要

Geometric data pruning methods, while practical for leveraging pretrained models, are fundamentally unstable. Their reliance on extrinsic geometry renders them highly sensitive to latent space perturbations, causing performance to degrade during cross-architecture transfer or in the presence of feature noise. We introduce TopoPrune, a framework that resolves this challenge by leveraging topology to capture the stable, intrinsic structure of data. TopoPrune operates at two scales: (1) it uses a topology-aware manifold approximation to establish a global low-dimensional embedding of the dataset, and (2) it then employs differentiable persistent homology to perform a local topological optimization on the manifold embeddings, ranking samples by their structural complexity. We demonstrate that our unified dual-scale topological approach ensures high accuracy and precision, particularly at significant dataset pruning rates (e.g., 90%). Furthermore, through the inherent stability properties of topology, TopoPrune (a) is exceptionally robust to noise perturbations of latent feature embeddings and (b) demonstrates superior transferability across diverse network architectures. This study demonstrates a promising avenue towards stable and principled topology-based frameworks for robust data-efficient learning.

2602.02320 2026-05-11 cs.CL cs.AI q-bio.BM

A Large-Scale Dataset for Molecular Structure-Language Description via a Rule-Regularized Method

通过规则正则化方法构建大规模分子结构-语言描述数据集

Feiyang Cai, Guijuan He, Yi Hu, Jingjing Wang, Joshua Luo, Tianyu Zhu, Srikanth Pilla, Gang Li, Ling Liu, Feng Luo

发表机构 * Clemson University(克莱姆森大学) Independent Researcher(独立研究者) University of Delaware(特拉华大学) Georgia Institute of Technology(佐治亚理工学院)

AI总结 本文提出一种全自动标注框架,生成精确分子描述,保留完整结构细节,构建了约16.3万分子-描述对数据集,验证精度达98.6%。

详情
AI中文摘要

分子功能在很大程度上由结构决定。因此,准确对齐分子结构与自然语言,对于让大语言模型(LLM)能够推理下游化学任务至关重要。然而,人工标注成本高昂,难以构建大规模、高质量的基于结构的描述数据集。本文提出一个完全自动化的标注框架,用于大规模生成保留完整结构细节的精确分子描述。我们的方法基于并扩展了一个基于规则的化学命名解析器,用以解析IUPAC名称并构建显式编码分子结构的增强型结构化XML元数据,再利用该元数据引导LLM生成准确的自然语言描述。借助此框架,我们整理了一个约163,000个分子-描述对的大规模数据集。在2,000个分子的子集上,结合LLM评估和专家人工评估的严格验证流程表明描述精度高达98.6%。所提出的标注框架可直接惠及依赖结构描述的更广泛化学任务,所得数据集为分子-语言对齐提供了可靠基础。源代码和数据集分别托管在 https://github.com/TheLuoFengLab/MolLangData 和 https://huggingface.co/datasets/ChemFM/MolLangData 。

英文摘要

Molecular function is largely determined by structure. Accurately aligning molecular structure with natural language is therefore essential for enabling large language models (LLMs) to reason about downstream chemical tasks. However, the substantial cost of human annotation makes it infeasible to construct large-scale, high-quality datasets of structure-grounded descriptions. In this work, we propose a fully automated annotation framework for generating precise molecular descriptions that preserve complete structural details at scale. Our approach builds upon and extends a rule-based chemical nomenclature parser to interpret IUPAC names and construct enriched, structural XML metadata that explicitly encodes molecular structure. This metadata is then used to guide LLMs in producing accurate natural-language descriptions. Using this framework, we curate a large-scale dataset of approximately $163$k molecule--description pairs. A rigorous validation protocol combining LLM-based and expert human evaluation on a subset of $2,000$ molecules demonstrates a high description precision of $98.6$%. The proposed annotation framework is readily beneficial to broader chemical tasks that rely on structural descriptions, with the resulting dataset providing a reliable foundation for molecule--language alignment. The source code and dataset are hosted at https://github.com/TheLuoFengLab/MolLangData and https://huggingface.co/datasets/ChemFM/MolLangData, respectively.

2602.01752 2026-05-11 cs.CL cs.CR

WorldCup Sampling for Multi-bit LLM Watermarking

多比特LLM水印的WorldCup采样

Yidan Wang, Yubing Ren, Yanan Cao, Li Guo

发表机构 * Institute of Information Engineering, Chinese Academy of Sciences(中国科学院信息工程研究所) School of Cyber Security, University of Chinese Academy of Sciences(中国科学院大学网络空间安全学院)

AI总结 本文提出WorldCup框架,通过结构化通信通道建模和分层竞争机制实现多比特水印嵌入,提升文本质量和解码鲁棒性,实验表明其在容量、检测性、鲁棒性等方面优于现有方法。

详情
AI中文摘要

随着大语言模型(LLM)生成越来越像人类的文本,水印技术已成为可靠归属认证的有前景解决方案。虽然多比特水印能实现更丰富的溯源编码,但现有方法通常通过引入静态logit扰动和计数解码策略扩展零比特水印方案,这会导致文本质量下降和解码鲁棒性降低。本文提出WorldCup,一种多比特LLM水印框架,将采样过程建模为结构化通信通道,并通过互补信号引导的分层竞争机制嵌入信息位。此外,WorldCup通过熵感知调制保持生成质量,并通过置信度感知解码实现鲁棒的信息恢复。全面实验表明,WorldCup在信息容量、检测性、鲁棒性、文本质量和解码效率之间实现了良好平衡,持续优于现有基线方法。我们相信,这项工作为未来多比特LLM水印研究奠定了可扩展和原则性的基础。

英文摘要

As large language models (LLMs) generate increasingly human-like text, watermarking has emerged as a promising solution for reliable attribution beyond mere detection. While multi-bit watermarking enables richer provenance encoding, existing approaches typically extend zero-bit watermarking schemes by introducing static logit perturbations and counting-based decoding strategies, which can degrade text quality and compromise decoding robustness as the payload increases. In this paper, we propose WorldCup, a multi-bit watermarking framework for LLMs that models the sampling process as a structured communication channel and embeds message bits through a hierarchical competition mechanism guided by complementary signals. Moreover, WorldCup incorporates entropy-aware modulation to preserve generation quality and enables robust message recovery via confidence-aware decoding that accounts for token-level reliability. Comprehensive experiments demonstrate that WorldCup achieves a strong balance across message capacity, detectability, robustness, text quality, and decoding efficiency, consistently outperforming prior baselines. We believe that this work establishes a scalable and principled foundation for future research on multi-bit watermarking in LLMs.

2602.01642 2026-05-11 cs.LG cs.AI math.OC stat.CO stat.ML

The Effect of Mini-Batch Noise on the Implicit Bias of Adam

小批量噪声对Adam中隐式偏差的影响

Matias D. Cattaneo, Boris Shigida

发表机构 * Princeton University(普林斯顿大学)

AI总结 本文研究小批量噪声如何影响Adam优化器中记忆(由β₁、β₂决定)的隐式偏差:大批次时较高的β₂会加剧反正则化、损害泛化,批次变小时这一依赖关系发生反转;默认的(0.9, 0.999)适合小批次,大批次时将β₁调近β₂往往能获得更高的验证精度。

详情
AI中文摘要

在高质量数据有限而算力不断增长的背景下,多轮(multi-epoch)训练正在深度学习的各个子领域重新变得重要。Adam(W)的各种变体是下一个token预测等许多任务的首选优化器,它有两个控制记忆的动量超参数(β₁, β₂),以及一个非常重要的超参数,即批次大小,它(尤其)控制小批量噪声的大小。我们提出一个理论框架,用于理解小批量噪声如何影响Adam中记忆(取决于β₁、β₂)的隐式偏差,使其偏向损失景观中更尖锐或更平坦的区域,而这通常被观察到与多轮训练中的泛化差距相关。我们发现,在大批次情形下,较高的β₂会增强记忆带来的反正则化(损害泛化),但随着批次变小,(反)正则化对β₂的依赖关系会发生反转。β₁上也会出现类似的单调性转变(方向相反)。特别地,常用的“默认”组合(β₁, β₂) = (0.9, 0.999)在批次较小时是不错的选择;而对于较大的批次,在许多设置中让β₁更接近β₂能在多轮训练中显著提高验证精度。此外,我们的理论推导将发生这一转变的批次大小的量级与临界批次大小的量级联系起来。我们在接近过拟合的区间内用小规模数据的实验展示了这一效应。

英文摘要

With limited high-quality data and growing compute, multi-epoch training is regaining importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many tasks such as next token prediction, has two momentum hyperparameters $(β_1, β_2)$ controlling memory and one very important hyperparameter, batch size, controlling (in particular) the amount of mini-batch noise. We introduce a theoretical framework to understand how mini-batch noise influences the implicit bias of memory in Adam (depending on $β_1$, $β_2$) towards sharper or flatter regions of the loss landscape, which is commonly observed to correlate with the generalization gap in multi-epoch training. We find that in the case of large batch sizes, higher $β_2$ increases the magnitude of anti-regularization by memory (hurting generalization), but as the batch size becomes smaller, the dependence of (anti-)regularization on $β_2$ is reversed. A similar monotonicity shift (in the opposite direction) happens in $β_1$. In particular, the common "default" pair $(β_1, β_2) = (0.9, 0.999)$ is a good choice if batches are small; for larger batches, in many settings moving $β_1$ closer to $β_2$ is much better in terms of validation accuracy in multi-epoch training. Moreover, our theoretical derivations connect the scale of the batch size at which the shift happens to the scale of the critical batch size. We illustrate this effect in experiments with small-scale data in the about-to-overfit regime.
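For readers who want the role of the two "memory" hyperparameters made concrete, here is the standard Adam update for a single scalar parameter. It is the textbook rule with bias correction, not the paper's theoretical framework. The helper `effective_memory` is a common rule of thumb (an exponential moving average with decay β averages over roughly 1/(1-β) past steps) and is an illustrative addition.

```python
import math

def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `state` holds m, v and t."""
    state["t"] += 1
    # First-moment memory: exponential moving average of gradients.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    # Second-moment memory: exponential moving average of squared gradients.
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad * grad
    # Bias correction compensates for the zero initialisation of m and v.
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)

def effective_memory(beta):
    """Rough number of past steps an EMA with decay `beta` averages over."""
    return 1.0 / (1.0 - beta)

state = {"m": 0.0, "v": 0.0, "t": 0}
theta_after_one_step = adam_step(1.0, 2.0, state, lr=0.1)
```

With the default pair, the first moment remembers about 10 steps and the second about 1000. Moving β₁ closer to β₂, as the abstract suggests for large batches, narrows that gap.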

2602.01166 2026-05-11 cs.RO

Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models

隐式推理VLA:面向视觉-语言-动作模型的隐式思考与预测

Shuanghao Bai, Jing Lyu, Wanqi Zhou, Zhe Li, Dakai Wang, Lei Xing, Xiaoguang Zhao, Pengwei Wang, Zhongyuan Wang, Cheng Chi, Badong Chen, Shanghang Zhang

发表机构 * Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University(西安交通大学人工智能与机器人研究院) Beijing Academy of Artificial Intelligence(北京人工智能研究院) Institute of Automation, University of Chinese Academy of Sciences(中国科学院自动化研究所) School of Artificial Intelligence, University of Chinese Academy of Sciences(中国科学院大学人工智能学院)

AI总结 本文提出LaRA-VLA框架,通过连续潜在表示实现多模态推理,减少推理开销,提升机器人实时控制效率。

Comments Accepted by ICML 2026

详情
AI中文摘要

视觉-语言-动作(VLA)模型受益于思维链(CoT)推理,但现有方法推理开销大,且依赖与连续感知和控制不匹配的离散推理表示。本文提出隐式推理VLA(LaRA-VLA),这是一个统一的VLA框架,将多模态CoT推理内化为面向具身动作的连续潜在表示。LaRA-VLA在潜在空间中统一执行推理和预测,推理时无需显式生成CoT,从而实现高效、以动作为导向的控制。为实现隐式具身推理,我们引入基于课程的训练范式,逐步从显式的文本和视觉CoT监督过渡到潜在推理,最后调整潜在推理动态以作为动作生成的条件。我们构建了两个结构化的CoT数据集,并在仿真基准和长时程真实机器人操作任务上评估LaRA-VLA。实验结果表明,LaRA-VLA始终优于现有最先进的VLA方法,同时与基于显式CoT的方法相比最多可将推理延迟降低90%,证明潜在推理是实时具身控制的一种有效且高效的范式。项目页面:https://loveju1y.github.io/Latent-Reasoning-VLA/

英文摘要

Vision-Language-Action (VLA) models benefit from chain-of-thought (CoT) reasoning, but existing approaches incur high inference overhead and rely on discrete reasoning representations that mismatch continuous perception and control. We propose Latent Reasoning VLA (LaRA-VLA), a unified VLA framework that internalizes multi-modal CoT reasoning into continuous latent representations for embodied action. LaRA-VLA performs unified reasoning and prediction in latent space, eliminating explicit CoT generation at inference time and enabling efficient, action-oriented control. To realize latent embodied reasoning, we introduce a curriculum-based training paradigm that progressively transitions from explicit textual and visual CoT supervision to latent reasoning, and finally adapts latent reasoning dynamics to condition action generation. We construct two structured CoT datasets and evaluate LaRA-VLA on both simulation benchmarks and long-horizon real-robot manipulation tasks. Experimental results show that LaRA-VLA consistently outperforms state-of-the-art VLA methods while reducing inference latency by up to 90% compared to explicit CoT-based approaches, demonstrating latent reasoning as an effective and efficient paradigm for real-time embodied control. Project Page: https://loveju1y.github.io/Latent-Reasoning-VLA/

2602.01003 2026-05-11 cs.LG cs.AI

ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning

ESSAM:一种用于内存高效的LLM微调的强化学习竞争进化策略方法

Zhishen Sun, Sizhe Dang, Guang Dai, Haishan Ye

发表机构 * Xi’an Jiaotong University(西安交通大学) SGIT AI Lab(SGIT人工智能实验室)

AI总结 本文提出ESSAM,结合进化策略的零阶搜索与SAM提升泛化能力,实现低内存高精度的LLM微调,实验显示其在GSM8K任务中表现优异,内存使用显著降低。

详情
AI中文摘要

强化学习(RL)已成为提升大语言模型(LLM)数学推理能力的关键训练步骤,但其GPU内存占用通常很高,难以在资源受限的环境中使用。为缓解这些问题,我们提出结合尖锐度感知最大化的进化策略(ESSAM),这是一种全参数微调框架,将进化策略(ES)在参数空间中的零阶搜索与尖锐度感知最大化(SAM)紧密结合,以提高泛化能力。我们在主流数学推理任务GSM8K上进行了微调实验。结果表明,ESSAM在所有模型上的平均准确率为78.27%,整体性能与RL方法相当:它超过了经典RL算法PPO(准确率77.72%),与GRPO(准确率78.34%)相当,在部分模型上甚至超过了两者。进一步的泛化实验表明,用ESSAM训练的模型具有更强的泛化能力,其平均性能在6个数据集中的5个上取得最佳结果,说明ESSAM能有效提升微调模型的泛化性能。在GPU内存占用方面,与PPO相比,ESSAM将平均GPU内存占用降低了18倍,与GRPO相比降低了10倍,内存占用极低。此外,我们设计了一种加速版ESSAM,在保持与ESSAM相同GPU内存占用的同时实现了近两倍的加速,并在所有模型上取得78.02%的平均准确率,优于PPO。代码:https://github.com/szs777/ESSAM

英文摘要

Reinforcement learning (RL) has become a key training step for improving mathematical reasoning in large language models (LLMs), but it often has high GPU memory usage, which makes it hard to use in settings with limited resources. To reduce these issues, we propose Evolution Strategies with Sharpness-Aware Maximization (ESSAM), a full-parameter fine-tuning framework that tightly combines the zero-order search in parameter space from Evolution Strategies (ES) with Sharpness-Aware Maximization (SAM) to improve generalization. We conduct fine-tuning experiments on the mainstream mathematical reasoning task GSM8K. The results show that ESSAM achieves an average accuracy of 78.27% across all models and that its overall performance is comparable to RL methods. It surpasses the classic RL algorithm PPO (77.72% accuracy) and is comparable to GRPO (78.34% accuracy), even surpassing both on some models. Further generalization experiments show that the models trained with ESSAM exhibit stronger generalization ability. Their average performance achieves the best results on 5 out of 6 datasets, indicating that ESSAM can effectively improve the generalization performance of fine-tuned models. In terms of GPU memory usage, ESSAM reduces the average GPU memory usage by $18\times$ compared to PPO and by $10\times$ compared to GRPO, achieving an extremely low GPU memory usage. In addition, we design an accelerated variant of ESSAM, which achieves nearly a twofold speedup while maintaining the same GPU memory usage as ESSAM, and attains an average accuracy of 78.02% across all models, outperforming PPO. Code: https://github.com/szs777/ESSAM
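The zero-order search that ESSAM builds on can be sketched with a generic antithetic Evolution Strategies update. This covers only the ES half: the sharpness-aware step is left out because the abstract does not specify how the two are combined. The toy reward and every name and hyperparameter below are assumptions made for illustration.

```python
import random

def es_gradient(reward, theta, sigma, pairs, rng):
    """Estimate the reward gradient from forward evaluations only,
    using antithetic Gaussian perturbations (no backpropagation)."""
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = reward([t + sigma * e for t, e in zip(theta, eps)])
        minus = reward([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2.0 * sigma * pairs)
        for i in range(dim):
            grad[i] += scale * eps[i]
    return grad

def es_ascent(reward, theta, steps=200, lr=0.05, sigma=0.1, pairs=16, seed=0):
    """Plain gradient ascent on the ES estimate."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = es_gradient(reward, theta, sigma, pairs, rng)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

def toy_reward(theta):
    """Stand-in for a task reward; the maximum is at (1, -2)."""
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

start = [0.0, 0.0]
finish = es_ascent(toy_reward, start)
```

Since each step needs only reward evaluations at perturbed parameters, no activations or optimizer moments have to be kept for a backward pass. That is a general property of ES and is consistent with the memory savings the abstract reports.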

2602.00513 2026-05-11 cs.LG

Minerva: Reinforcement Learning with Verifiable Rewards for Cyber Threat Intelligence LLMs

Minerva: 为网络威胁情报LLM采用可验证奖励的强化学习

Md Tanvirul Alam, Aritran Piplai, Ionut Cardei, Nidhi Rastogi, Peter J Worth

发表机构 * Rochester Institute of Technology(罗切斯特理工学院) University of Texas at El Paso(德克萨斯大学埃尔帕索分校) Florida Atlantic University(佛罗里达大西洋大学)

AI总结 本文提出Minerva,一种统一的数据集和训练流程,用于多类网络威胁情报子任务,通过任务特定验证器评分结构化输出。MinervaRL通过轻量自训练机制生成额外验证轨迹并反向蒸馏至模型,提升性能。

详情
AI中文摘要

网络威胁情报(CTI)分析师经常需要将含噪声的非结构化安全工件转换为标准化、可直接用于自动化的表示。尽管大语言模型(LLM)在此任务上显示出潜力,但现有方法在生成结构化CTI输出时仍然脆弱,并且主要依赖监督微调(SFT)。另一方面,CTI标准和社区维护的资源定义了规范的标识符和模式,使模型输出可以被确定性地验证。我们利用这种结构研究面向CTI任务的可验证奖励强化学习(RLVR)。我们提出Minerva,一个涵盖多个CTI子任务的统一数据集和训练流程,每个子任务都配有任务特定的验证器,用于对结构化输出和标识符预测打分。为解决rollout(轨迹采样)过程中的奖励稀疏问题,我们提出MinervaRL,一种轻量级自训练机制,它生成额外的已验证轨迹并将其蒸馏回模型。在四个骨干模型和12个CTI基准上取平均,MinervaRL的平均得分比相应基础模型高15.8个百分点,比GRPO高4.3个百分点。

英文摘要

Cyber threat intelligence (CTI) analysts routinely convert noisy, unstructured security artifacts into standardized, automation-ready representations. Although large language models (LLMs) show promise for this task, existing approaches remain brittle when producing structured CTI outputs and have largely relied on supervised fine-tuning (SFT). In contrast, CTI standards and community-maintained resources define canonical identifiers and schemas that enable deterministic verification of model outputs. We leverage this structure to study reinforcement learning with verifiable rewards (RLVR) for CTI tasks. We introduce Minerva, a unified dataset and training pipeline spanning multiple CTI subtasks, each paired with task-specific verifiers that score structured outputs and identifier predictions. To address reward sparsity during rollout, we propose MinervaRL, a lightweight self-training mechanism that generates additional verified trajectories and distills them back into the model. Averaged across four backbones and 12 CTI benchmarks, MinervaRL improves the mean score by 15.8 percentage points over the corresponding base models and by 4.3 points over GRPO.
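A deterministic verifier of the kind described here might look like the sketch below. It assumes a subtask whose answer is a comma-separated list of MITRE ATT&CK technique IDs (format `T1234` or `T1234.001`) and scores well-formed outputs by F1 against a gold set. The subtask, the output format, and the scoring rule are illustrative assumptions, not Minerva's actual verifiers.

```python
import re

# ATT&CK technique IDs: "T" + 4 digits, optional ".NNN" sub-technique suffix.
ATTACK_ID = re.compile(r"T\d{4}(?:\.\d{3})?")

def parse_ids(text):
    """Return the set of IDs in `text`, or None if any item is malformed."""
    tokens = [tok.strip() for tok in text.split(",") if tok.strip()]
    if not tokens or not all(ATTACK_ID.fullmatch(tok) for tok in tokens):
        return None
    return set(tokens)

def verify(output, gold):
    """Reward in [0, 1]: 0 for malformed output, else F1 against `gold`."""
    predicted = parse_ids(output)
    if predicted is None:
        return 0.0
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2.0 * precision * recall / (precision + recall)
```

For example, `verify("T1059.001, T1566", {"T1059.001", "T1566"})` returns `1.0`, while free-text output such as `"probably phishing"` scores `0.0`. Such a reward is cheap and reproducible, but it is zero for most early rollouts, which is the sparsity problem the abstract says MinervaRL targets.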

2601.22307 2026-05-11 cs.LG cs.NA math.NA

Exact Gaussian Moment Matching for Residual Networks: a Second-Order Method

残差网络的精确高斯矩匹配:一种二阶方法

Simon Kuang, Xinfan Lin

发表机构 * Department of Mechanical and Aerospace Engineering(机械与航空航天工程系) University of California, Davis(加州大学戴维斯分校)

AI总结 本文提出通过逐层矩匹配在残差网络中精确传播高斯分布的均值和协方差,针对多种激活函数实现精确矩匹配,显著提升KL散度误差指标。

Comments new theoretical result on higher-order accuracy

详情
AI中文摘要

我们研究如何通过逐层矩匹配,将一般多元高斯分布的均值和协方差在深度(残差)神经网络中传播。我们针对前馈层和广义残差层,推导了probit、GeLU、ReLU(作为GeLU的极限)、Heaviside(作为probit的极限)和sine激活函数的精确矩匹配,填补了长期存在的空白。在随机网络上,我们发现KL散度误差指标相比流行的替代方法有数量级的改进,最高可达百万倍。在变分贝叶斯神经网络上,以蒙特卡洛结果为基准,我们的方法在KL散度上比最先进的确定性推断方法改进了百倍。我们还给出了一个平滑距离误差界,表明在正则性假设下,矩匹配消除了主导的低方差误差项,并将高阶局部精度逐层传播。

英文摘要

We study the problem of propagating the mean and covariance of a general multivariate Gaussian distribution through a deep (residual) neural network using layer-by-layer moment matching. We close a longstanding gap by deriving exact moment matching for the probit, GeLU, ReLU (as a limit of GeLU), Heaviside (as a limit of probit), and sine activation functions, for both feedforward and generalized residual layers. On random networks, we find orders-of-magnitude improvements in the KL divergence error metric, up to a millionfold, over popular alternatives. On a variational Bayes neural network, we show that our method attains hundredfold improvements in KL divergence from Monte Carlo ground truth over a state-of-the-art deterministic inference method. We also give a smooth-distance error bound showing that, under regularity assumptions, moment matching removes the leading low-variance errors and propagates higher-order local accuracy through the layers of a network.
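As a concrete instance of what "exact moment matching" means for a single unit, the classic closed forms for a scalar Gaussian input X ~ N(μ, σ²) are E[ReLU(X)] = μΦ(μ/σ) + σφ(μ/σ), E[ReLU(X)²] = (μ² + σ²)Φ(μ/σ) + μσφ(μ/σ), and E[Φ(X)] = Φ(μ/√(1 + σ²)). The sketch below checks them against numerical integration. It covers only these well-known univariate identities, not the paper's multivariate covariance or residual-layer results, and the function names are hypothetical.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def relu_moments(mu, sigma):
    """Exact mean and variance of ReLU(X) for X ~ N(mu, sigma^2)."""
    z = mu / sigma
    mean = mu * Phi(z) + sigma * phi(z)
    second = (mu * mu + sigma * sigma) * Phi(z) + mu * sigma * phi(z)
    return mean, second - mean * mean

def probit_mean(mu, sigma):
    """Exact mean of Phi(X) for X ~ N(mu, sigma^2)."""
    return Phi(mu / math.sqrt(1.0 + sigma * sigma))

def gaussian_expectation(f, mu, sigma, n=20001, width=10.0):
    """Trapezoid-rule estimate of E[f(X)], used only as a reference check."""
    lo = mu - width * sigma
    h = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * f(x) * phi((x - mu) / sigma) / sigma
    return total * h
```

A layer-by-layer scheme would apply formulas like these after each affine map, replacing the true post-activation distribution by a Gaussian with the matched mean and covariance.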

2601.21424 2026-05-11 cs.LG cs.CV cs.IT math.IT

Lossy Common Information in a Learnable Gray-Wyner Network

可学习Gray-Wyner网络中的有损公共信息

Anderson de Andrade, Alon Harell, Ivan V. Bajić

发表机构 * School of Engineering Science Simon Fraser University(工程科学学院 西蒙弗雷泽大学)

AI总结 本文提出一种可学习的三通道编解码器,在多个视觉任务中将共享信息与任务特定细节分离,从而减少冗余并提升效率。

详情
AI中文摘要

许多计算机视觉任务共享大量重叠信息,但传统编解码器往往忽视这一点,导致表示冗余且低效。Gray-Wyner网络是信息论中的经典概念,为分离公共信息和任务特定信息提供了有原则的框架。受此启发,我们开发了一种可学习的三通道编解码器,在多个视觉任务中将共享信息与任务特定细节解耦。我们通过有损公共信息的概念刻画了这种方法的极限,并提出一个优化目标,用于平衡学习此类表示时固有的权衡。通过在涵盖六个视觉基准的双任务场景中比较三种编解码器架构,我们证明该方法显著减少了冗余,并始终优于独立编码。这些结果凸显了在现代机器学习背景下重新审视Gray-Wyner理论的实用价值,将经典信息论与任务驱动的表示学习联系起来。

英文摘要

Many computer vision tasks share substantial overlapping information, yet conventional codecs tend to ignore this, leading to redundant and inefficient representations. The Gray-Wyner network, a classical concept from information theory, offers a principled framework for separating common and task-specific information. Inspired by this idea, we develop a learnable three-channel codec that disentangles shared information from task-specific details across multiple vision tasks. We characterize the limits of this approach through the notion of lossy common information, and propose an optimization objective that balances inherent tradeoffs in learning such representations. Through comparisons of three codec architectures on two-task scenarios spanning six vision benchmarks, we demonstrate that our approach substantially reduces redundancy and consistently outperforms independent coding. These results highlight the practical value of revisiting Gray-Wyner theory in modern machine learning contexts, bridging classic information theory with task-driven representation learning.

2601.20599 2026-05-11 cs.LG cs.AI

R-GTD: A Geometric Analysis of Gradient Temporal-Difference Learning in Singular Regimes

R-GTD:梯度时序差分学习在奇异情形下的几何分析

Hyunjun Na, Donghwan Lee

发表机构 * School of Electrical Engineering, KAIST(韩国科学技术院电子工程学院)

AI总结 本文提出R-GTD算法,通过重新表述均方投影Bellman误差最小化问题,解决FIM奇异导致的稳定性问题,提供理论保证和实验证明。

Comments 32 pages, 8 figures

详情
AI中文摘要

梯度时序差分(GTD)学习算法被广泛用于带函数逼近的离策略策略评估。然而,现有的收敛性分析依赖一个限制性假设,即所谓的特征交互矩阵(FIM)是非奇异的。在实践中,FIM可能变为奇异矩阵,导致不稳定或性能下降。尽管已有一些工作通过正则化来放宽非奇异假设,但其理论保证不可避免地依赖其他限制性条件。本文通过重新表述均方投影Bellman误差最小化问题,提出一个正则化的优化目标。该表述自然导出一种正则化GTD算法,称为R-GTD,即使FIM奇异也能保证收敛到唯一解。我们通过几何分析为所提方法建立了理论收敛保证和显式误差界,并通过实验验证了其有效性。

英文摘要

Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. However, existing convergence analyses rely on the restrictive assumption that the so-called feature interaction matrix (FIM) is nonsingular. In practice, the FIM can become singular, leading to instability or degraded performance. While some prior works have applied regularization to relax the nonsingularity assumption, their theoretical guarantees inevitably rely on other restrictive conditions. In this paper, we propose a regularized optimization objective by reformulating the mean-square projected Bellman error minimization. This formulation naturally yields a regularized GTD algorithm, referred to as R-GTD, which guarantees convergence to a unique solution even when the FIM is singular. We conduct a geometric analysis to establish theoretical convergence guarantees and explicit error bounds for the proposed method, and validate its effectiveness through empirical experiments.