arXivDaily arXiv daily academic digest, updated Monday to Friday
2605.19284 2026-05-20 cs.CL cs.LG

Language models struggle with compartmentalization

Thomas Vincent Howe, David Wingate

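Affiliations Department of Computer Science, Brigham Young University

AI summary This study examines compartmentalization in large language models when they process different presentations of a unified concept. It finds that models fail to share statistical information effectively across presentations, which wastes model capacity and lowers sample efficiency.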

Comments 9 pages, 8 figures, plus 9 pages of appendices. Submitted to NeurIPS 2026. Code: https://github.com/vinhowe/compartmentalization. Eval data: https://doi.org/10.5281/zenodo.20171021

Abstract

In the training data used by large language models (LLMs), the same latent concept is often presented in multiple distinct ways: the same facts appear in English and Swahili; many functions can be expressed in both Python and Haskell; we can express propositions in both formal and natural language. We show that LLMs can exhibit compartmentalization, where they fail to identify and share statistical strength between distinct presentations of unified concepts. In the worst case, LLMs simply learn parallel internal representations of each presentation of the concept, saturating model capacity with redundancies and decreasing sample efficiency with the number of such presentations. We also demonstrate that synthetic parallel data can fail to improve this despite being easily learned itself. Under this framework, we find that, for small models, early multilingual learning is nearly entirely compartmentalized. Finally, all interventions that we study exhibit a phase transition in which their effectiveness depends on the number of distinct presentations, suggesting that the language modeling objective may only inconsistently unify representations.

2605.19283 2026-05-20 cs.LG cs.AI stat.ML

EviTrack: Selection over Sampling for Delayed Disambiguation

Omer Haq

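Affiliations Independent Researcher

AI summary This paper proposes EviTrack, a framework that performs selection over latent trajectories rather than marginal states to make sequential inference more effective under delayed disambiguation. Its core method selects among trajectory hypotheses using evidence and likelihood ratios, delaying commitment until the data support it, and it outperforms sampling-based baselines.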

Comments https://github.com/Haq94/EviTrack

Abstract

Sequential prediction is challenging in regimes of delayed disambiguation, where early observations are ambiguous and multiple latent explanations remain plausible until sufficient evidence accumulates. Standard approaches based on marginal inference struggle in this setting, either collapsing uncertainty prematurely or failing to recover once informative evidence arrives. We introduce EviTrack, a test-time inference framework that operates over latent trajectories rather than marginal states. EviTrack maintains a set of competing trajectory hypotheses and applies evidence- and likelihood-ratio-based selection to delay commitment until supported by data, drawing inspiration from hypothesis management in multiple hypothesis tracking and track-before-detect. To evaluate this setting, we construct a controlled synthetic benchmark with known latent ground truth that explicitly exhibits delayed disambiguation. At matched inference budget, EviTrack substantially outperforms sampling-based baselines, achieving faster post-disambiguation recovery. These results show that, in delayed disambiguation regimes, moderate trajectory-level selection is more effective than increasing sampling coverage, highlighting selection over sampling as a key principle for reliable sequential inference.
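The abstract does not spell out EviTrack's algorithm, so the following is only a generic sketch of likelihood-ratio-based selection with delayed commitment, under stated assumptions: Gaussian observation noise, two hand-written trajectory hypotheses, and a fixed log-likelihood-ratio threshold. All function names and values are hypothetical.

```python
import math

def gaussian_loglik(x, mean, std=1.0):
    """Log-density of observation x under a Gaussian N(mean, std^2)."""
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)

def select_with_delayed_commitment(observations, hypotheses, threshold=3.0):
    """Accumulate per-hypothesis log-likelihoods and commit only once the
    best hypothesis leads the runner-up by `threshold` nats.

    `hypotheses` maps a name to a function t -> predicted mean at step t.
    Returns (committed_name, step), or (None, None) if never committed.
    """
    scores = {name: 0.0 for name in hypotheses}
    for t, x in enumerate(observations):
        for name, predict in hypotheses.items():
            scores[name] += gaussian_loglik(x, predict(t))
        ranked = sorted(scores, key=scores.get, reverse=True)
        if len(ranked) == 1 or scores[ranked[0]] - scores[ranked[1]] >= threshold:
            return ranked[0], t
    return None, None

# Two trajectories that agree for the first three steps, then diverge.
hypotheses = {
    "flat": lambda t: 0.0,
    "rising": lambda t: 0.0 if t < 3 else 2.0 * (t - 2),
}
observations = [0.1, -0.2, 0.0, 2.1, 3.9, 6.2]
choice, step = select_with_delayed_commitment(observations, hypotheses)
```

During the ambiguous prefix both hypotheses score identically, so no commitment is made; the selection happens only after the trajectories separate and enough evidence has accumulated.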

2605.19282 2026-05-20 cs.LG

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu

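Affiliations Michigan State University; Cisco; University of Minnesota; IBM Research

AI summary This paper studies the limitations of the Muon optimizer beyond pretraining and proposes Pion, which uses a high-pass NS iteration to improve performance on VLA and RLVR tasks.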

Abstract

Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable. To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost. In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both baselines across ℓ1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and only 32.2% for AdamW. The advantage of Pion further extends to a real Franka Research 3 robot with a π0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training on Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K while Muon collapses to zero.
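The abstract describes Muon's Newton-Schulz step as driving every singular value of the momentum matrix toward 1. A minimal sketch of that uniform whitening follows, assuming the classical cubic Newton-Schulz iteration rather than Muon's own tuned coefficients, which the abstract does not give. The function name and the toy matrix are illustrative only, and the sketch does not implement Pion's high-pass variant.

```python
import numpy as np

def newton_schulz_orthogonalize(m, steps=20):
    """Approximate U @ V.T from the SVD m = U @ diag(s) @ V.T using the
    classical cubic Newton-Schulz iteration (an assumption for illustration,
    not Muon's tuned polynomial)."""
    x = m / np.linalg.norm(m)  # Frobenius normalization puts singular values in (0, 1]
    for _ in range(steps):
        # Each singular value s is mapped to 1.5*s - 0.5*s**3, which has a fixed point at 1.
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

rng = np.random.default_rng(0)
u, _ = np.linalg.qr(rng.standard_normal((4, 3)))  # 4x3 matrix with orthonormal columns
v, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # 3x3 orthogonal matrix
# Toy stand-in for a momentum matrix with a decaying spectrum (3, 1, 0.3).
momentum = u @ np.diag([3.0, 1.0, 0.3]) @ v.T

whitened = newton_schulz_orthogonalize(momentum)
singular_values = np.linalg.svd(whitened, compute_uv=False)
```

In this toy case the weakest direction (singular value 0.3) is raised to 1 alongside the dominant one, which is the kind of tail amplification the abstract attributes to uniform whitening.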

2605.19279 2026-05-20 cs.CV

FPED: A Functional-Network Prior-Guided Mixture-of-Experts Framework for Interpretable Brain Decoding

Yudan Ren, Pengcheng Shi, Zihan Ma, Xiaowei He, Xiao Li

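Affiliations School of Electronic Information (School of Artificial Intelligence), Northwest University

AI summary This paper proposes the FPED framework, which models distinct functional brain networks as experts and uses an adaptive routing mechanism to capture their complementary contributions to visual semantic understanding, enabling interpretable brain decoding.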

Comments 15 pages, 4 figures

Abstract

Visual image reconstruction from functional Magnetic Resonance Imaging (fMRI) is a fundamental task in brain decoding, providing a crucial pathway for understanding human perceptual mechanisms and developing advanced brain-computer interfaces (BCIs). However, most current methods simply flatten fMRI signals from localized visual cortices into one-dimensional (1D) vectors, mapping them directly into latent spaces such as that of Contrastive Language-Image Pre-training (CLIP). This paradigm not only disrupts the inherent network topology of the brain, leading to limited neuroscientific interpretability, but also overlooks the synergistic contributions of other distributed functional networks in processing high-level visual semantics. To address these limitations, we propose FPED, a Functional-Network Prior-Guided Mixture of Experts (MoE) framework for interpretable brain decoding. FPED explicitly models different functional brain networks as specialized experts and employs adaptive routing to capture their complementary contributions to visual semantic understanding. Unlike conventional homogeneous decoding paradigms, our framework incorporates neurobiologically grounded priors to enable structured and interpretable network-level representation learning. Experimental results demonstrate that FPED achieves highly competitive semantic reconstruction performance with only 0.68B parameters. The learned routing dynamics reveal biologically meaningful correspondence between functional brain networks and modality-specific semantic processing, providing transparent neuroscientific interpretability. This suggests that brain network-aware expert modeling is a promising direction for bridging neural decoding and biologically inspired artificial intelligence.

2605.19274 2026-05-20 cs.CL

Lost in Interpretation: The Plausibility-Faithfulness Trade-off in Cross-Lingual Explanations

Somnath Banerjee, Pranav Jha, Rima Hazra, Animesh Mukherjee

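Affiliations Indian Institute of Technology Kharagpur; TCG Crest; National University of Singapore

AI summary This paper studies the trade-off between plausibility and faithfulness in cross-lingual explanations. It finds that English-pivot explanations agree better with human rationales, but their evidence is less causally grounded in the model's predictions. English explanations are fluent, yet their comprehensiveness degrades by up to 5.7x relative to native-language conditions even while task accuracy remains stable, and they fail to preserve pragmatic cues in socially nuanced classification. The authors therefore recommend auditing explanations in the input language, reporting multi-faceted faithfulness metrics, and treating English explanations as communication summaries rather than faithful decision traces.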

Abstract

LLMs deployed multilingually are often audited via English explanations for non-English inputs. We evaluate extractive explanations (where the model identifies input token spans as evidence alongside a generated rationale) and uncover a systematic trade-off: English-pivot explanations can achieve higher span agreement with human rationales while their evidence becomes less causally grounded in the model's prediction, as measured by both comprehensiveness and sufficiency. Across 3 tasks, 5 languages, and 2 multilingual LLM families, we find that English explanations frequently produce fluent but loosely anchored rationales, with comprehensiveness degrading by up to 5.7x relative to native-language conditions, even as task accuracy remains stable across settings. For socially nuanced classification, English pivots also fail to preserve pragmatic cues, reducing both faithfulness and span agreement. We recommend auditing explanations in the input language, reporting multi-faceted faithfulness metrics beyond lexical overlap, and treating English rationales as communication summaries rather than faithful decision traces.
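Comprehensiveness and sufficiency are commonly defined as changes in the model's confidence when the cited evidence is deleted or kept alone. The sketch below uses those common definitions with a hypothetical keyword-based stand-in for the model; the paper's exact implementation may differ, and every name here is an assumption made for illustration.

```python
import math

def toy_positive_prob(tokens):
    """Hypothetical stand-in for a model: sigmoid of a keyword-weight score."""
    weights = {"great": 2.0, "love": 1.5, "boring": -2.0}
    score = sum(weights.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-score))

def comprehensiveness(predict, tokens, rationale_idx):
    """Confidence drop after deleting the rationale tokens (higher = more faithful)."""
    rest = [tok for i, tok in enumerate(tokens) if i not in rationale_idx]
    return predict(tokens) - predict(rest)

def sufficiency(predict, tokens, rationale_idx):
    """Confidence drop when keeping only the rationale tokens (lower = more faithful)."""
    only = [tok for i, tok in enumerate(tokens) if i in rationale_idx]
    return predict(tokens) - predict(only)

tokens = ["i", "love", "this", "great", "film"]
grounded = {1, 3}  # spans the toy model actually relies on
loose = {0, 4}     # plausible-looking spans the toy model ignores

comp_grounded = comprehensiveness(toy_positive_prob, tokens, grounded)
comp_loose = comprehensiveness(toy_positive_prob, tokens, loose)
suff_grounded = sufficiency(toy_positive_prob, tokens, grounded)
suff_loose = sufficiency(toy_positive_prob, tokens, loose)
```

A loosely anchored rationale shows up as near-zero comprehensiveness and a large sufficiency gap, even though the prediction itself is unchanged.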

2605.19270 2026-05-20 cs.CL

DECOR: Auditing LLM Deception via Information Manipulation Theory

Linyue Cai, Samuel Yeh, Jwala Dhamala, Rahul Gupta, Sharon Li

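Affiliations Department of Computer Sciences, University of Wisconsin-Madison; Amazon

AI summary This paper proposes DECOR, a framework grounded in Information Manipulation Theory that detects deceptive behavior in large language models through fine-grained auditing, and shows strong performance on both single-turn and multi-turn deception detection.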

Abstract

Large language models can deceive by subtly manipulating truthful information (omitting key facts, shifting focus, or obscuring meaning), making such behavior difficult to detect. Existing black-box methods rely on coarse-grained judgments, offering limited interpretability and failing to pinpoint which facts were distorted and how. We introduce DECOR, a multi-agent framework grounded in Information Manipulation Theory for fine-grained auditing of strategic deception in LLM responses. DECOR decomposes input contexts into atomic informational units and scores each unit against the response across four dimensions of manipulation, producing interpretable manipulation profiles that are aggregated into a global deception index. We comprehensively evaluate DECOR on both single-turn and multi-turn deception detection benchmarks spanning real-world domains, and show that DECOR achieves state-of-the-art performance on both, outperforming competitive baselines. The framework generalizes across 15 frontier models, and ablation studies confirm the contribution of each key design component. Our findings demonstrate that fine-grained, theory-grounded auditing of information manipulation offers an effective and interpretable path for LLM deception detection.

2605.19264 2026-05-20 cs.AI cs.MA

Swimming with Whales: Analysis of Power Imbalances in Stake-Weighted Governance

Yuzhe Zhang, Manvir Schneider, Qin Wang, Davide Grossi

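Affiliations Independent researcher; University of Groningen; University of Amsterdam

AI summary This paper studies power imbalances in stake-based voting. Using computational social choice, it analyzes the extent of power imbalance in stake-weighted voting and offers both theoretical and empirical contributions.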

Abstract

Voting methods weighted by stakes are the fundamental governance paradigm in Proof-of-Stake (PoS) blockchains. Such a paradigm is known to be prone to power distortions: a few users possessing large stakes may completely control decision making, even without owning the totality of the stakes. We study this phenomenon through the lens of computational social choice, focusing on the extent of power imbalances in stake-weighted voting when power is quantified using the Penrose-Banzhaf power index. Our work presents both analytical and empirical contributions. Analytically, we demonstrate that while a perfect alignment between power and relative stake ownership is generally unattainable, it can be approximated in expectation under specific conditions. Empirically, using data from a real-world on-chain governance system (Project Catalyst), we provide a more fine-grained understanding of the power imbalances that are likely to occur in current stake-weighted governance systems.
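To make the gap between stake and power concrete, here is a small brute-force sketch of the normalized Banzhaf index for a weighted majority vote. The stake values and quota are made up for illustration, and the paper may use a different normalization of the Penrose-Banzhaf measure.

```python
from itertools import combinations

def banzhaf_index(weights, quota):
    """Normalized Banzhaf index by brute-force coalition enumeration.

    A voter is critical in a winning coalition if removing that voter makes
    the coalition lose. Runtime is exponential in len(weights).
    """
    n = len(weights)
    swings = [0] * n
    for size in range(1, n + 1):
        for coalition in combinations(range(n), size):
            total = sum(weights[i] for i in coalition)
            if total < quota:
                continue  # losing coalition: nobody can be critical
            for i in coalition:
                if total - weights[i] < quota:
                    swings[i] += 1
    total_swings = sum(swings)
    if total_swings == 0:  # quota can never be reached
        return [0.0] * n
    return [s / total_swings for s in swings]

# A "whale" with 51% of the stake holds all the power without owning all the stake.
whale = banzhaf_index([51, 30, 19], quota=51)

# Just below the quota, a 49% holder has exactly the same power as a 21% holder.
balanced = banzhaf_index([49, 30, 21], quota=51)
```

The two cases illustrate why power cannot track relative stake exactly: small shifts around the quota move the index between total control and an even split.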

2605.19260 2026-05-20 cs.AI cs.CV cs.MA

AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees

Yuankai Li, Tinghui Zhu, Ha Min Son, Zhe Zhao, Xin Liu, Muhao Chen

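Affiliations UC Davis

AI summary This paper proposes AQuaUI, a training-free, inference-time visual token reduction method for GUI agent models. It exploits the non-uniform information density of screenshots, preserves token positions through an adaptive quadtree structure to keep position encodings consistent, and uses a conditional quadtree algorithm to improve temporal consistency across multi-step GUI interactions. Experiments show improved accuracy-efficiency trade-offs.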

Abstract

Large Multimodal Models (LMMs) have recently emerged as promising backbones for GUI-agent models, where high-resolution GUI screenshots are introduced to the prompts at each iteration step. However, these screenshots exhibit highly non-uniform spatial information density: large regions may carry little information and are visually homogeneous, while key text and icons may require high visual fidelity. Existing approaches to this problem either require additional training or rely on attention-based token compression, ignoring the structured layout and spatial redundancy of GUI screenshots. To fill the gap, this paper proposes AQuaUI, a training-free inference-time token reduction method for GUI agent models that utilizes the non-uniform information density in screenshots. AQuaUI constructs an adaptive quadtree on each screenshot input and keeps one representative merged token per leaf of the quadtree. AQuaUI preserves the spatial positions of retained tokens throughout the pipeline to ensure that all position-encoding stages remain consistent. To further improve temporal consistency across multi-step GUI interactions, we propose a conditional quadtree algorithm that leverages the continuity between consecutive screenshots within a single request. Specifically, it refines the current quadtree using previous quadtrees as references, helping preserve fine-grained regions across static or mildly shifted GUI states. We implement AQuaUI on state-of-the-art GUI agent models and conduct experiments on standard grounding and navigational benchmarks. AQuaUI consistently shows improved accuracy-efficiency trade-offs over prior baselines. Notably, on GUI-Owl-1.5-32B-Instruct, AQuaUI achieves up to 13.22% speedup and 29.52% fewer visual tokens while retaining 99.06% of full-token performance, suggesting that the spatial redundancy of GUI screenshots can be exploited at inference without retraining.
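The idea of keeping one merged token per quadtree leaf can be sketched on a toy image. The split rule below (pixel variance against a threshold), the mean-pooling merge, and all names are assumptions for illustration; the abstract does not state AQuaUI's actual criterion, and the sketch omits the conditional, cross-screenshot refinement.

```python
import numpy as np

def quadtree_leaves(img, x0, y0, size, var_threshold=1e-3, min_size=2):
    """Recursively split a square region while its pixel variance is high.

    Returns a list of (x0, y0, size) leaves. `size` is assumed to be a power
    of two. The variance criterion is an illustrative assumption.
    """
    block = img[y0:y0 + size, x0:x0 + size]
    if size <= min_size or block.var() <= var_threshold:
        return [(x0, y0, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(img, x0 + dx, y0 + dy, half, var_threshold, min_size)
    return leaves

# Synthetic 16x16 "screenshot": flat background plus a small detailed patch.
screen = np.zeros((16, 16))
screen[0:4, 0:4] = np.indices((4, 4)).sum(axis=0) % 2  # checkerboard standing in for text

leaves = quadtree_leaves(screen, 0, 0, 16)
# One merged token per leaf, stored with its position so spatial layout is preserved.
tokens = [(x, y, s, screen[y:y + s, x:x + s].mean()) for x, y, s in leaves]
```

The flat background collapses into a few large leaves while the detailed corner keeps fine-grained ones, so the token count falls well below that of a uniform grid at the finest resolution.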

2605.19258 2026-05-20 cs.LG cs.AI

ExECG: An Explainable AI Framework for ECG models

Jong-Hwan Jang, Yong-yeon Jo

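Affiliations Medical AI Co. Ltd

AI summary This paper proposes the ExECG framework to address the lack of explainability of ECG models in clinical use, providing reusable and reproducible ECG explainability through a three-stage pipeline.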

Abstract

Deep learning has enabled ECG diagnostic models with strong performance in tasks such as arrhythmia classification and abnormality detection. However, accuracy alone is insufficient for clinical deployment because it does not explain why a specific output was produced, limiting justification, error analysis, and trust. Although ECG XAI has been extensively investigated and steadily improved, practical pipelines and reporting conventions vary across studies, hindering reuse and reproducibility. To address these issues, we present ExECG (Explainable AI framework for ECG models), a Python framework that provides a three-stage pipeline: Wrapper standardizes access across heterogeneous ECG formats and intermediate representations, Explainer unifies diverse XAI methods under a shared execution protocol, and Visualizer supports consistent cross-method comparison within a unified interface. We demonstrate end-to-end usage with concise examples and two case studies, highlighting interoperable and reproducible ECG explainability.

2605.19256 2026-05-20 cs.CV

Distribution Matching Distillation without Fake Score Network

Youngjoong Kim, Deokyeong Lee, Jaesik Park

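Affiliations Department of Computer Science and Engineering, Seoul National University; Department of Computer Science and Engineering, Sogang University

AI summary This paper proposes fake-score-network-free distribution matching distillation (FSF-DMD), which replaces the conventional fake-score network with a pseudo-velocity induced by the flow-map generator itself to provide distribution-level correction, and validates it on ImageNet-1K.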

Abstract

Distribution Matching Distillation (DMD) provides an effective distribution-level correction for few-step generation, while relying on an auxiliary fake-score network to track the evolving generative distribution. Recent work combines DMD-style objectives with flow-map generators to exploit both forward-divergence training and reverse-divergence correction. The fake-score estimator remains an additional component with memory and update overhead. In this work, we study whether this explicit tracker can be avoided when the generator itself has a flow-map structure. We propose Fake-Score-network-Free DMD (FSF-DMD), a DMD formulation for flow-map generators that replaces the auxiliary fake-score estimator with a generator-induced pseudo-velocity surrogate. The key observation is that the endpoint pseudo-velocity of a flow-map generator provides a tractable proxy for fake-velocity estimation, allowing the generator itself to supply the reverse-divergence signal. Building on this observation, we derive a practical objective, extend it with flow-map-consistent backward simulation, and introduce a self-teacher variant for training from scratch. In our ImageNet-1K 256×256 experiments, FSF-DMD improves flow-map baselines, reaches lower FID than the listed DMD2 comparisons in the flow-map-initialized setting, and remains effective under flow-matching initialization and training from scratch.

2605.19255 2026-05-20 cs.RO

Bilateral Teleoperation with Compliant 6-DOF Pose-and-Force Sensing

Yue Feng, Weicheng Huang, I-Ming Chen

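Affiliations Robotics Research Centre, Nanyang Technological University, Singapore

AI summary This paper presents a Cartesian bilateral framework built on the hardware-agnostic WinGs Operating Studio (WOS) middleware, with teleoperation enabled by Delta6, a low-cost compliant 6-DOF pose-and-force sensing end-effector. The system tracks stably under delays up to 120 ± 40 ms and 1% packet loss, and matches the prescribed virtual stiffness in contact.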

Comments 8 pages, 16 figures, 2 tables. Preprint

Abstract

Existing bilateral teleoperation platforms still rely on costly rigid six-axis force/torque sensors, tightly coupled leader-follower hardware, and kilohertz control loops. We present a Cartesian bilateral framework built on the hardware-agnostic WinGs Operating Studio (WOS) middleware, in which a low-cost compliant 6-DOF pose-and-force sensing end-effector, Delta6, is mounted on both sides so that each manipulator behaves as an end-effector 6-DOF series elastic actuator (SEA). The leader runs a damping-only admittance loop with a 6-D biquad notch filter; the follower realizes a stiffness-damping impedance through a position-based outer loop with a PID wrench-to-pose mapping. Three time scales (hardware I/O, mid-rate impedance/admittance, low-rate teleoperation messages) are explicitly decoupled, enabling the same application to drive heterogeneous arms. On a Lite6/FR3 testbed at 150 Hz, the system tracks stably under delays up to 120 ± 40 ms and 1% packet loss, matches the prescribed virtual stiffness in contact, and shows a favorable cumulative energy signature in passivity-style tests.
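A biquad notch filter of the kind mentioned for the leader side can be sketched with the widely used Audio EQ Cookbook coefficient formulas. The notch frequency and Q below are invented for illustration, the 150 Hz rate is taken from the testbed figure in the abstract, and reading "6-D" as one independent filter per wrench axis is an assumption.

```python
import cmath
import math

def notch_coefficients(f0, fs, q):
    """Biquad notch coefficients (Audio EQ Cookbook form), normalized so a0 = 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [1 / a0, -2 * math.cos(w0) / a0, 1 / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a

def gain(b, a, f, fs):
    """Magnitude of the filter's frequency response at f Hz."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    return abs((b[0] + b[1] * z1 + b[2] * z1**2) / (a[0] + a[1] * z1 + a[2] * z1**2))

class Biquad:
    """Direct-form-I biquad for a single signal channel."""
    def __init__(self, b, a):
        self.b, self.a = b, a
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def step(self, x):
        y = (self.b[0] * x + self.b[1] * self.x1 + self.b[2] * self.x2
             - self.a[1] * self.y1 - self.a[2] * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y

FS, F0, Q = 150.0, 20.0, 2.0  # FS from the abstract; F0 and Q are hypothetical
b, a = notch_coefficients(F0, FS, Q)
filters = [Biquad(b, a) for _ in range(6)]  # one per wrench axis: Fx, Fy, Fz, Tx, Ty, Tz

def filter_wrench(wrench):
    """Filter one 6-element wrench sample, each axis independently."""
    return [f.step(w) for f, w in zip(filters, wrench)]
```

The filter passes slow operator motion (unit gain at DC) while rejecting a narrow band around the chosen frequency, which is the usual reason to place a notch ahead of an admittance loop.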

2605.19250 2026-05-20 cs.AI

Causal Evidence for Attention Head Imbalance in Modality Conflict Hallucination

Jinrui Jiang, Zhangtai Wu, Zhen Wu, Xinyu Dai

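Affiliations National Key Laboratory for Novel Software Technology; School of Artificial Intelligence

AI summary This paper investigates why multimodal large language models hallucinate under modality conflict. By analyzing the causal roles of attention heads, it finds that hallucination-driving heads are more broadly distributed and carry greater aggregate weight, whereas hallucination-resisting heads are concentrated in a small number of high-importance heads. It proposes MACI, which reduces hallucination while preserving accuracy.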

Abstract

Modality-conflict hallucination occurs when multimodal large language models (MLLMs) prioritize erroneous textual premises over contradictory visual evidence. To understand why visual evidence fails to prevail during generation, we take a mechanistic perspective and examine which internal components drive or resist this failure. We perform head-level causal analysis using path patching across five open-source MLLMs and identify two groups of attention heads with opposing causal roles: hallucination-driving heads and hallucination-resisting heads. We find a consistent asymmetry: driving effects are more broadly distributed and carry greater aggregate weight, whereas resisting effects concentrate in a small number of high-importance heads. Ablation experiments further confirm that these groups exert opposing effects during generation: distributed driving influence and localized resistance together form an imbalanced routing structure that biases generation toward the erroneous premise. Motivated by this finding, we propose MACI (Modality-conflict-Aware Causal Intervention), a conditional intervention that suppresses causally identified hallucination-driving heads only when conflict is detected. Across five MLLMs, MACI achieves the largest hallucination reduction among compared inference-time baselines on the MMMC benchmark with a favorable hallucination-accuracy trade-off, and transfers zero-shot to the SCI-SemanticConflict test.

2605.19249 2026-05-20 cs.LG

Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting

Liu Chong, Yingjie Zhou, Hao Li, Pengyang Wang, Qingsong Wen, Ce Zhu

Affiliations College of Computer Science, Sichuan University; Department of Computer and Information Science, University of Macau; School of Information and Communication Engineering, University of Electronic Science and Technology of China

AI summary This paper proposes KUP-BI, a new time-series forecasting paradigm that distills continuation-style knowledge from a train-only historical library to provide structural knowledge for bidirectional forecasting, improving forecasting performance.

Comments Accepted to ICML 2026. 18 pages, 6 figures

Abstract

Time-series forecasting is critical in various scenarios, such as energy, transportation, and public health. However, most existing forecasters rely primarily on one-way inference, i.e., mapping history to target, and overlook the structural information provided by a revised natural chain ("history (model input) - target (ground-truth output) - post-target continuation"). The post-target continuation records how trajectories evolve after the target, which can help stabilize forecasting, but it is not observable at inference time. In this work, we aim to obtain an approximate proxy of the post-target continuation for the current input, providing structural knowledge for bidirectional forecasting. This idea is instantiated as KUP-BI (Knowledge Utilization Paradigm with Bidirectional Inspiration), a new time-series modeling paradigm that distills continuation-style knowledge (as an approximate post-target continuation proxy) from a train-only historical library and integrates it into standard forecasting backbones. The input stream and the continuation-proxy stream are fused via a lightweight feature-level gating module. This design does not introduce information beyond what is already contained in the training trajectories; instead, it provides a structured inductive bias that helps backbones exploit typical continuation patterns rather than relying solely on parametric extrapolation. Experimental results on six public datasets show that KUP-BI consistently improves the forecasting performance of state-of-the-art models, with small additional overhead.
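The abstract only says that the two streams are fused by a lightweight feature-level gating module. One common form of such a gate is sketched below as an assumption, not as KUP-BI's actual module; the shapes, weights, and names are hypothetical.

```python
import numpy as np

def gated_fusion(h_input, h_proxy, w_gate, b_gate):
    """Feature-level gate: g = sigmoid([h_input; h_proxy] @ W + b),
    output = g * h_input + (1 - g) * h_proxy, applied elementwise."""
    concat = np.concatenate([h_input, h_proxy], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(concat @ w_gate + b_gate)))
    return gate * h_input + (1.0 - gate) * h_proxy, gate

rng = np.random.default_rng(0)
batch, dim = 4, 8
h_input = rng.standard_normal((batch, dim))  # features from the history (input) stream
h_proxy = rng.standard_normal((batch, dim))  # features from the continuation-proxy stream
w_gate = 0.1 * rng.standard_normal((2 * dim, dim))  # untrained weights, for shape checking only
b_gate = np.zeros(dim)

fused, gate = gated_fusion(h_input, h_proxy, w_gate, b_gate)
```

Because the gate is a per-feature convex combination, the fused features always lie between the two streams, and the module adds a single linear layer of parameters, which fits the "small additional overhead" described above.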

2605.19247 2026-05-20 cs.CV

Structuring Open-Ended NAS: Semi-Automated Design Knowledge Structuring with LLMs for Efficient Neural Architecture Search

Yuiko Sakuma, Masakazu Yoshimura, Marcel Gröpl, Zitang Sun, Junji Otsuka, Atsushi Irie, Takeshi Ohashi

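Affiliations Sony Group Corporation; ETH Zurich

AI summary This paper proposes a semi-automatic method that uses LLMs to structure model design knowledge and guide neural architecture search. By defining a high-level structural template and introducing the FairNAD algorithm, it explores open-ended search spaces efficiently and improves performance on several datasets.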

Comments 42 pages

Abstract

Current neural architecture search (NAS) methods are often limited by their predefined, restrictive search spaces. While recent large language model (LLM)-assisted NAS methods enable open-ended search spaces, they often suffer from inefficient exploration due to biased or low-quality design ideas. To address these issues, we propose to semi-automatically structure model design knowledge to guide the search process. Our approach first defines a high-level structural template of architectural attributes. An LLM then populates this template by analyzing papers, creating a rich and diverse search space that embodies this structured design knowledge. To efficiently explore this vast space, we introduce FairNAD, using a multi-type mutation that enables broad exploration through mutation with fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a fine-grained feedback loop. We demonstrate the effectiveness of FairNAD in discovering high-performing architectures that yield 0.84, 2.17, and 2.35 points improvement on CIFAR-10, CIFAR-100, and ImageNet16-120, respectively, compared to current state-of-the-art methods.

2605.19243 2026-05-20 cs.LG cs.AI cs.CG

Euclidean Embedding of Data Using Local Distances

Dimitris Arabadjis

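Affiliations Department of Statistics and Actuarial-Financial Mathematics, University of the Aegean

AI summary This paper studies how to recover a globally consistent Euclidean embedding given only a local distance graph, and proposes a method that optimally represents those distances. The method operates solely on a neighborhood graph weighted by pairwise distances and needs no prior vector representation of the data. It solves a variational problem that matches local, on-graph distances to the Euclidean metric induced by the differentials of the embedding functions. The resulting Euler-Lagrange equations are derived in coordinate-free form, so all operators can be evaluated directly from the distance graph. Although the equations are non-linear and lack an explicit expression for the non-linearity, they are shown to be solvable as an iteratively updated sparse linear problem. The main contributions are (a) the functional equations governing the optimal Euclidean embedding in the continuum, (b) a representation-free formulation that requires only a neighborhood distance graph and no feature vectors, and (c) an estimation procedure based exclusively on local graph operations. Experiments on synthetic manifolds and real datasets show that the non-parametric algorithm preserves local metric structure and neighboring relations while approximating the global isometric embedding.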

Abstract

We study the problem of recovering a globally consistent Euclidean embedding of data, given only a local distance graph and propose a method that optimally represents these distances. The method operates solely on a neighborhood graph weighted by pairwise distances, without requiring any prior vector representation of the data. The embedding is obtained by solving a variational problem that matches local, on-graph distances to the Euclidean metric, induced by the differentials of the embedding functions. The resulting Euler-Lagrange equations are derived in a coordinate-free form, enabling direct evaluation of all operators from the distance graph alone. Though non-linear and missing an explicit expression for their non-linearity, these equations are shown to be resolved as an iteratively updated sparse linear problem. The main contributions of the proposed approach are (a) the derivation of the functional equations governing the optimal Euclidean embedding in the continuum, (b) a representation-free formulation that requires only a neighborhood distance graph and no feature vectors and (c) an estimation procedure based exclusively on local graph operations. We experimentally evaluate the resulting non-parametric algorithm on synthetic manifolds and real datasets, demonstrating consistent preservation of local metric structure and neighboring relations, while approximating the global isometric embedding.

2605.19242 2026-05-20 cs.CV cs.AI cs.ET cs.LG cs.MM

PhyWorld: Physics-Faithful World Model for Video Generation

PhyWorld: 用于视频生成的物理忠实世界模型

Pu Zhao, Juyi Lin, Timothy Rupprecht, Arash Akbari, Chence Yang, Rahul Chowdhury, Elaheh Motamedi, Arman Akbari, Yumei He, Chen Wang, Geng Yuan, Weiwei Chen, Yanzhi Wang

发表机构 * Northeastern University(东北大学) University of Georgia(佐治亚大学) Tulane University(杜兰大学) EmbodyX

AI总结 本文提出PhyWorld,一种通过两阶段训练提升视频生成模型的物理忠实性,以改进世界模拟器的性能,从而更有效地支持物理AI系统。

详情
AI中文摘要

世界模拟器可以在真实世界部署前提供安全且可扩展的环境来训练物理AI系统。大型视频生成模型正成为此类模拟器的有希望的基础,因为它们能够生成多样且逼真的视觉未来。然而,将其用作世界模拟器需要物理忠实的视频延续,即生成的视频应保持由条件输入隐含的物理状态,并以符合基本物理原理的方式演变。我们提出了PhyWorld,一种视频生成世界模型,通过两阶段的后训练来生成时间上一致且物理忠实的场景延续。在第一阶段,我们通过流匹配微调改进视频到视频延续,鼓励稳定的视觉属性和帧间一致的运动动态。在第二阶段,我们在物理偏好对上使用直接偏好优化(DPO),使生成的动态与物理原理对齐,引导模型产生物理合理性更高的输出。为了评估PhyWorld,我们同时使用了标准视频质量基准和一个带有逐定律评分的专门物理忠实性基准。实验表明,PhyWorld提高了视频一致性,其在VBench上的平均得分为0.769,而最先进基线的得分为0.756或更低。PhyWorld还提高了物理合理性,其在我们的物理忠实性基准上的平均得分为3.09,而最强基线为2.99。这些结果表明,利用延续和物理偏好信号对大型视频生成模型进行后训练,可以使其成为更有效的物理AI世界模拟器。

英文摘要

World simulators can provide safe and scalable environments for training Physical AI systems before real-world deployment. Large video generation models are emerging as a promising basis for such simulators because they can generate diverse and realistic visual futures. However, using them as world simulators requires physically faithful video continuations, namely, generated videos that preserve the physical state implied by the conditioning input, and evolve in ways consistent with basic physical principles. We propose PhyWorld, a video generation world model designed to produce temporally coherent and physically faithful scene continuations through two-stage post-training. In the first stage, we improve video-to-video continuation with flow matching fine-tuning, encouraging stable visual attributes and coherent motion dynamics across frames. In the second stage, we align generated dynamics with physical principles using Direct Preference Optimization (DPO) over physics preference pairs, guiding the model toward outputs with higher physical plausibility. To evaluate PhyWorld, we use both standard video-quality benchmarks and a dedicated physical-faithfulness benchmark with per-law scoring. Experiments show that PhyWorld improves video consistency, achieving an average score of 0.769 on VBench compared with 0.756 or below for state-of-the-art baselines. PhyWorld also improves physical plausibility, reaching an average score of 3.09 on our physical-faithfulness benchmark compared with 2.99 for the strongest baseline. These results suggest that post-training large video generation models with continuation and physics-preference signals can make them more effective world simulators for Physical AI.

2605.19235 2026-05-20 cs.LG cs.GT

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning

GAE在不完全信息自博弈强化学习中表现不足

Zhiyuan Fan, Gabriele Farina

发表机构 * MIT(麻省理工学院)

AI总结 本文研究了不完全信息博弈中自博弈强化学习中GAE估计器的方差问题,提出Q-boosting和VRPO算法以减少方差并提升性能。

详情
AI中文摘要

不完全信息博弈中的竞争性多智能体强化学习要求智能体在部分可观测条件下对抗对手,因此需要随机策略。虽然使用近端策略优化(PPO)的自博弈强化学习在经验上取得了显著成功,但其标准优势估计器广义优势估计(GAE)会因对随机的未来动作进行采样而产生额外的方差。在均衡自博弈中,这种方差因均衡策略的随机性而被放大,并且即使critic(价值评估器)完全精确时仍然存在。为解决这一瓶颈,我们引入了Q-boosting,一种基于集中式动作价值critic的方差缩减优势估计器,并提出了纳入该估计器的方差缩减策略优化(VRPO)。该算法用多步期望SARSA(λ)迹替代采样的多步备份,在每一步计算策略期望以平均掉动作采样噪声,同时保留PPO的裁剪目标和同策略(on-policy)的actor更新。在实验中,VRPO在从中等规模到大规模的博弈(包括斗地主和单挑无限注德州扑克)中始终表现出强劲的性能。

英文摘要

Competitive multi-agent reinforcement learning in imperfect-information games requires agents to act under partial observability and against adversarial opponents, necessitating stochastic policies. While self-play reinforcement learning with Proximal Policy Optimization (PPO) has achieved strong empirical success, its standard advantage estimator, generalized advantage estimation, suffers from additional variance due to the sampling of stochastic future actions. This variance is amplified in equilibrium self-play because of the stochastic nature of the equilibrium policy and persists even when the critic is exact. We address this bottleneck by introducing Q-boosting, a variance-reduced advantage estimator based on a centralized action-value critic, and propose Variance-Reduced Policy Optimization (VRPO), incorporating this new estimator. The algorithm replaces sampled multi-step backups with a multi-step Expected SARSA(λ) trace, computing policy expectations at each step to average out action-sampling noise, while retaining PPO's clipped objective and on-policy actor updates. Empirically, VRPO consistently achieves strong performance from mid-sized to large-scale games including Dou Dizhu and Heads-Up No-Limit Texas Hold'em.
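
The variance source described in this abstract can be illustrated with a minimal one-step sketch. This is not the paper's Q-boosting estimator or VRPO; it only contrasts a bootstrapped target that uses a single sampled next action with an Expected SARSA-style target that averages the action values over the policy. The action names, Q-values, policy probabilities, and function names are all illustrative assumptions, and the critic is assumed exact.

```python
import random
import statistics

# Hypothetical exact action values and a stochastic policy at the next state.
Q_NEXT = {"fold": -1.0, "call": 0.5, "raise": 2.0}
POLICY = {"fold": 0.2, "call": 0.5, "raise": 0.3}
REWARD, GAMMA = 0.0, 1.0


def sampled_target(rng):
    """One-step target that bootstraps from a single sampled next action."""
    actions = list(POLICY)
    action = rng.choices(actions, weights=[POLICY[a] for a in actions])[0]
    return REWARD + GAMMA * Q_NEXT[action]


def expected_target():
    """Expected SARSA-style target: average Q over the policy's actions."""
    return REWARD + GAMMA * sum(POLICY[a] * Q_NEXT[a] for a in POLICY)


rng = random.Random(0)
samples = [sampled_target(rng) for _ in range(10_000)]
sample_mean = statistics.fmean(samples)      # close to the expected target
sample_var = statistics.pvariance(samples)   # nonzero despite an exact critic
exact = expected_target()                    # deterministic, zero variance
```

Both targets agree in expectation, but only the sampled one carries variance from the policy's own randomness, which is the effect the abstract says persists even with an exact critic.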

2605.19234 2026-05-20 cs.CL cs.AI

AI Technologies in Language Access: Attitudes Towards AI and the Human Value of Language Access Managers

人工智能技术在语言接入中的应用:对人工智能的态度以及语言接入管理者的人类价值

Miguel A. Jiménez-Crespo, Stephanie Rodriguez, Alejandro Jaume Losa

发表机构 * Rutgers University/ Dept. of Spanish(罗格斯大学西班牙语系) Rutgers University/ Dept. of Spanish and Portuguese(罗格斯大学西班牙语和葡萄牙语系)

AI总结 本文探讨了人工智能在语言接入中的影响,通过分析十位美国语言接入管理者在医疗、法庭、公共服务和地方政府领域的半结构化访谈,揭示了语言接入管理者对人工智能的有条件乐观态度以及对人工智能实施中人类价值和人类监督的高度重视。

Comments 11 pages, 2 tables, Convergence Conference 2026

详情
AI中文摘要

人工智能技术的快速出现正在重塑翻译实践和理论。本文探讨了人工智能在语言接入中的影响。这一领域的特点在于需要服务于广泛且多样化的用户群体,而效率和可及性受到法律要求、伦理和商业矛盾以及安全问题的影响。本文报告了语言接入管理者对人工智能以及人工智能时代的人类价值的态度和看法。方法上,本文呈现了一项关于语言接入和技术的更大研究的子集分析,具体为对十位美国语言接入管理者进行的定性主题分析,这些管理者在医疗、法庭、公共服务和地方政府领域工作。结果表明,语言接入管理者对不可避免的人工智能实施表现出有条件乐观,对风险具有强烈意识,并对人工智能实施和输出中的人类价值和人类监督有深刻承诺。

英文摘要

The rapid emergence of AI technologies is reshaping translation practices and theory across the board. This paper deals with the impact of AI in language access. This area is characterized by the need to serve broad and diverse user populations, within a context where efficiency and access are shaped by legal mandates, ethical and commercial tensions, and safety concerns. This paper reports on the attitudes and perceptions of language access managers towards the AI and the human value in the AI age. Methodologically, this paper presents an analysis of a subset of a broader study on language access and technology, specifically a qualitative thematic analysis of ten semi-structured interviews with language access managers in the USA working in healthcare, court, public service and local government contexts. The results indicate that language access managers show conditional optimism towards the inevitable AI implementations, are strongly risk aware, and deeply committed to the human value and human oversight of AI implementations and output.

2605.19231 2026-05-20 cs.LG stat.ML

DeRegiME: Deep Regime Mixtures for Probabilistic Forecasting under Distribution Shift

DeRegiME:用于分布偏移下概率预测的深度区制混合

Kieran Wood, Stefan Zohren, Stephen J. Roberts

发表机构 * Machine Learning Research Group, University of Oxford(牛津大学机器学习研究组) Oxford-Man Institute, University of Oxford(牛津大学奥克斯曼研究所)

AI总结 DeRegiME通过引入稀疏变分高斯过程,实现了概率预测中的区制混合,解决了神经预测器在处理分布偏移时的不足,提升了预测密度的准确性。

详情
AI中文摘要

我们介绍了DeRegiME(Deep Regime Mixture of Experts,深度区制混合专家),一种直接多步长(multi-horizon)的概率预测器。它将潜在的不确定性区制与底层信号分离,并使用稀疏变分高斯过程(GP)将每个预测位置软分配给学习到的反复出现的区制;该GP的非平稳区制混合核和学生t似然通过共享门控组合各区制的子核和噪声过程。由此得到的是单一的稀疏GP后验,而不是GP专家的混合。DeRegiME解决了神经预测器的一个关键局限:点预测丢弃了残差不确定性,而概率输出头(无论是单一边际分布、未加解释的混合、分位数集合还是扩散样本)很少揭示残差的区制结构。然而,在含噪声的异方差时间序列中,分布偏移可能是突然的、渐进的或依赖于预测步长的,并且往往出现在残差不确定性中而非条件均值中。DeRegiME给出可解释的均值-残差-噪声分解,并采用直和特征空间表示,将区制锚定为残差相似性的聚类,其转换表现为隐式变点。有效区制的数量由折棍(stick-breaking)门控进行剪枝。我们证明了核的有效性和预测密度的恰当性;在十个基准和三个编码器网格上,DeRegiME相对于最强的编码器匹配基线(DeepAR/GluonTS风格的动态学生t输出头)将负对数预测密度(NLPD)改善了20.3%,并在CRPS(3.0%)和MSE(4.7%)上获得了相应的提升。这些改进在所有数据集上保持一致,这些数据集涵盖了突然、渐进和季节性偏移。

英文摘要

We introduce DeRegiME -- Deep Regime Mixture of Experts -- a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal and softly assigns each forecast location to learned recurring regimes using a sparse variational Gaussian process (GP) whose nonstationary regime-mixing kernel and Student-t likelihood combine per-regime sub-kernels and noise processes via a shared gate. This yields a single sparse-GP posterior, not a mixture of GP experts. DeRegiME addresses a key limitation of neural forecasters: point forecasts discard residual uncertainty, and probabilistic heads -- whether single marginals, uninterpreted mixtures, quantile sets, or diffusion samples -- rarely expose the regime structure of the residual. Yet distribution shift in noisy heteroskedastic time series may be abrupt, gradual, or horizon-dependent and often appears in residual uncertainty rather than the conditional mean. DeRegiME yields an interpretable mean-residual-noise decomposition with a direct-sum feature-space representation that anchors regimes as clusters of residual similarity whose transitions surface as implicit changepoints. The effective number of regimes is pruned by the stick-breaking gate. We prove kernel validity and predictive-density propriety, and across ten benchmarks and three encoder grids DeRegiME improves negative log predictive density (NLPD) by 20.3% over the strongest encoder-matched baseline, a DeepAR/GluonTS-style dynamic Student-t head, with parallel gains on CRPS (3.0%) and MSE (4.7%). Improvements are consistent across all datasets, which span abrupt, gradual, and seasonal shifts.
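
The abstract mentions a stick-breaking gate that prunes the effective number of regimes. As a rough illustration of the generic stick-breaking construction only (not DeRegiME's actual gate, whose parameterization is not given here), the sketch below turns a list of break fractions into mixture weights. The function name and the example fractions are assumptions made for illustration.

```python
def stick_breaking_weights(fractions):
    """Turn break fractions v_k in [0, 1] into mixture weights.

    Weight k is v_k times the stick left over after the earlier breaks;
    the final entry is whatever remains, so the weights sum to 1.
    """
    weights, remaining = [], 1.0
    for v in fractions:
        if not 0.0 <= v <= 1.0:
            raise ValueError("break fractions must lie in [0, 1]")
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


# Hypothetical fractions: a large late break leaves little mass for the tail.
weights = stick_breaking_weights([0.5, 0.5, 0.9])
```

Because each weight is carved out of what earlier components left behind, trailing components can receive negligible mass, which is one plausible reading of how such a gate limits the number of active regimes.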

2605.19230 2026-05-20 cs.CV cs.LG

Robust Mitigation of Age-Dependent Confounding Effects via Sample-Difficulty Decorrelation

通过样本难度去相关性实现鲁棒的年龄依赖性混杂效应缓解

Nikhil Cherian Kurian, Victor Caquilpan Parra, Abin Shoby, Luke Whitbread, Lyle J. Palmer

发表机构 * Australian Institute for Machine Learning(澳大利亚机器学习研究所) Adelaide University(阿德莱德大学)

AI总结 本文提出了一种鲁棒框架,通过针对虚假的年龄相关趋势而非强制不变性来缓解年龄依赖性混杂效应,通过样本难度建模和去相关年龄与主导年龄难度趋势,减少年龄相关的真阳性与假阳性差异,同时保持临床有意义的非线性年龄信息。

Comments 10 Pages, 3 Figures

详情
AI中文摘要

医学图像分类中的年龄依赖性性能差异通常是因为年龄作为混杂因素,将成像形态与疾病流行率联系起来。在实践中,差异可能表现为在疾病流行率较高的年龄过诊断,而在流行率较低的年龄下诊断不足,并在训练测试年龄分布变化时恶化。传统缓解方法强制严格年龄不变性可能会抑制在年龄中编码的诊断性信息。因此,我们提出了一种鲁棒框架,通过针对虚假的年龄相关趋势而非强制不变性来缓解年龄依赖性混杂效应。在预热阶段后,我们表征样本难度并以标签条件方式建模其年龄依赖性趋势。通过使用鲁棒的Huber加权亲和权重去相关年龄与主导年龄难度趋势,削弱由混杂驱动的捷径,同时保留临床有意义的非线性年龄信息。我们进一步引入了一个年龄覆盖分数,通过mini-batch年龄方差缩放去相关惩罚,以确保在有限年龄多样性下稳定的优化。在两个放射学数据集中,我们的方法在最小化AUC影响的同时减少了年龄相关的真阳性与假阳性差异,并在增加的训练测试年龄分布变化下保持稳健。

英文摘要

Age dependent performance disparities in medical image classification often arise because age acts as a confounder, linking imaging morphology with disease prevalence. In practice, disparities can manifest as overdiagnosis at ages where disease prevalence is higher and underdiagnosis at ages where prevalence is lower, and can worsen under train test shifts in the age distribution. Conventional mitigation approaches that enforce strict age invariance may suppress diagnostically meaningful information encoded in age. We therefore propose a robust framework that mitigates the effects of age-dependent confounding by targeting spurious age linked trends rather than enforcing invariance. Following a warm-up phase, we characterize sample difficulty and model its age-dependent trends in a label-conditioned manner. We decorrelate age from dominant age difficulty trends using robust, Huber weighted affinity weights, attenuating confounding-driven shortcuts while preserving clinically meaningful, nonlinear age information. We further introduce an Age Coverage Score that scales the decorrelation penalty by minibatch age variance to ensure stable optimization under limited age diversity. Across two radiology datasets, our approach reduces age dependent true and false positive disparities with minimal AUC impact and remains robust to increasing train test age distribution shifts.

2605.19229 2026-05-20 cs.AI

Can Large Language Models Revolutionize Survey Research? Experiments with Disaster Preparedness Responses

大型语言模型能否革新调查研究?基于备灾调查回答的实验

Yan Wang, Ziyi Guo, Christopher McCarty

发表机构 * Dept. of Urban and Regional Planning & Florida Institute for Built Environment Resilience, University of Florida(城市与区域规划系及佛罗里达环境韧性研究所,佛罗里达大学) College of Liberal Arts and Sciences, Bureau of Economic and Business Research, University of Florida(文理学院及经济与商业研究局,佛罗里达大学)

AI总结 本文探讨了大型语言模型在调查研究中的应用,通过实验验证了其在备灾调查回答中的有效性,提出了一个五阶段框架,涵盖问卷设计、样本选择、试点测试、缺失数据填补和收集后分析,并介绍了基于保护动机理论的共现知识图谱和七种LLM配置。

详情
AI中文摘要

调查研究面临日益严峻的结构性挑战:响应率下降、样本偏差、高风险受访者中的块状缺失,以及在线样本库(online panel)中借助AI的欺诈性填答。大型语言模型(LLM)已被提出作为一种补救手段,但覆盖整个调查工作流程的严格评估仍然稀缺,在数据质量最为重要的灾害情境中尤其如此。我们提出并评估了一个用于整合LLM的五阶段框架,涵盖问卷设计、样本选择、试点测试、缺失数据填补和收集后分析,并以2024年飓风米尔顿(Hurricane Milton)备灾调查(佛罗里达居民,n=946)作为共享的实证测试平台。我们引入了一个受保护动机理论(PMT)约束的共现知识图谱,并开发了七种LLM配置,涵盖零样本推理、检索增强基线和新的理论指导变体。我们提出的锚定边际理论指导LLM(A-TLM)在与灾害相关的块状MNAR条件下,RMSE优于全部三个经典填补基线(IPW/MI、MICE+PMM、missForest)(S4 RMSE为1.439,次优方法为1.496),同时实现了接近零的有符号偏差(-0.121),而随机森林填补器产生了最大的绝对偏差(-0.631)。围绕PMT因果结构组织检索,并在单次模型调用中整合所有证据,优于无结构检索和分阶段顺序推理(MAE为0.993,标准RAG为1.097)。我们记录到,接近零的总体偏差可能掩盖方向相反的子组误差,并提议将按子组分层的偏差审计作为报告标准。一个受检索约束的知识图谱聊天机器人表明,通过有依据的拒答(grounded refusal),幻觉在架构层面是可控的。

英文摘要

Survey research faces mounting structural challenges: declining response rates, sample bias, block-wise missingness among at-risk respondents, and AI-assisted fraudulent completions in online panels. Large language models (LLMs) have been proposed as a remedy, yet rigorous evaluations across the full survey workflow remain scarce, particularly in disaster contexts where data quality matters most. We present and evaluate a five-stage framework for LLM integration covering questionnaire design, sample selection, pilot testing, missing-data imputation, and post-collection analysis, using the 2024 Hurricane Milton preparedness survey of Florida residents (n=946) as a shared empirical testbed. We introduce a Protection Motivation Theory (PMT)-constrained co-occurrence knowledge graph and develop seven LLM configurations spanning zero-shot inference, retrieval-augmented baselines, and novel theory-informed variants. Our proposed Anchored Marginal Theory-Informed LLM (A-TLM) outperforms all three classical imputation baselines (IPW/MI, MICE+PMM, missForest) on RMSE under disaster-relevant block-wise MNAR conditions (S4 RMSE 1.439 vs. 1.496 for the next-best), while achieving near-zero signed bias (-0.121) where the random-forest imputer produces the largest absolute bias (-0.631). Organizing retrieval around PMT causal structure and integrating all evidence in a single model call outperforms unstructured retrieval and staged sequential inference (MAE 0.993 vs. 1.097 for standard RAG). We document that near-zero aggregate bias can mask opposing subgroup errors and propose subgroup-stratified bias auditing as a reporting standard. A retrieval-constrained knowledge-graph chatbot demonstrates that hallucination is architecturally manageable through grounded refusal.

2605.19224 2026-05-20 cs.CL

Fine-tuning language encoding models on slow fMRI improves prediction for fast ECoG

在慢速fMRI上微调语言编码模型以提高对快速ECoG的预测

Aditya R. Vaidya, Richard J. Antonello, Alexander G. Huth

发表机构 * Department of Computer Science(计算机科学系) The University of Texas at Austin(德克萨斯大学奥斯汀分校) Zuckerman Mind Brain Behavior Institute(祖克曼心智-脑-行为研究所) Columbia University(哥伦比亚大学) Departments of Neuroscience and Statistics(神经科学与统计学系) University of California, Berkeley(加州大学伯克利分校)

AI总结 该研究通过在慢速fMRI上微调语言编码模型,提高了对快速ECoG数据的预测性能,展示了慢速数据在构建快速脑数据模型中的价值。

详情
AI中文摘要

神经科学家最近开始在人类实验中使用颅内脑记录方法,如皮层脑电图(ECoG),因为它们提供了精细的空间和时间分辨率。然而,基于这类数据训练的模型从根本上受限于能够接受记录所需植入物的患者群体。我们提出利用非侵入性fMRI来弥补训练数据的不足。我们使用在fMRI上微调的口语语言表示来构建ECoG的编码模型。尽管fMRI的时间分辨率比ECoG差两个数量级,这些表示在ECoG上的预测性能仍然得到了提高。预测性能的提升出现在远超fMRI可直接测量范围的频带中。接下来,为了测试该方法的泛化能力,我们在时间上以2倍因子下采样的fMRI响应上微调模型。尽管分辨率有所损失,这些模型预测fMRI和ECoG响应的水平仍与原始fMRI微调模型相当。最后,我们展示了ECoG性能随fMRI微调数据量的增加而稳步提升。我们的结果表明,像fMRI这样的“慢”数据可以成为为ECoG这类“快”脑数据构建更好模型的宝贵资源。未来,整合多种记录方法可能进一步提高解码等其他应用中的性能。

英文摘要

Neuroscientists have recently turned to intracranial brain recording methods, like electrocorticography (ECoG), for human experiments because of the fine spatial and temporal resolution that they afford. Models trained on this data, however, are fundamentally restricted by the patient populations that can receive the implants necessary for recording. We propose using non-invasive fMRI to bridge the gap in training data. Using spoken language representations fine-tuned on fMRI, we build encoding models of ECoG. These representations showed improved prediction performance in ECoG, even though the temporal resolution of fMRI is two orders of magnitude worse. Prediction improved in frequency bands well beyond what is directly measured in fMRI. Next, to test the procedure's generalization ability, we fine-tuned models on fMRI responses that were temporally downsampled by a factor of 2. Despite the loss in resolution, these models were able to predict fMRI and ECoG responses at levels comparable to the original fMRI-tuned models. Finally, we showed that ECoG performance steadily scales with the amount of fMRI-tuning data. Our results show that "slow" data like fMRI can be a valuable resource for building better models of "fast" brain data like ECoG. In the future, integrating across multiple recording methods may further improve performance in other applications, like decoding.

2605.19223 2026-05-20 cs.CV

HAVEN: Hierarchically Aligned Multimodal Benchmark for Unified Video Understanding

HAVEN:用于统一视频理解的层次对齐多模态基准

Mengqi Shi, Haopeng Zhang

发表机构 * Department of Information and Computer Sciences(信息与计算机科学系)

AI总结 本文提出HAVEN,一个用于统一视频理解的层次对齐多模态基准,旨在解决现有多模态大语言模型在复杂叙事总结和推理方面评估不足的问题,通过引入全粒度和全多模态的数据集架构,提供了一个严谨的标准测试平台。

详情
AI中文摘要

尽管多模态大语言模型(MLLMs)在标准视频任务上表现出色,但其在复杂叙事的忠实总结和推理能力仍缺乏充分评估。现有总结基准在监督上分散于孤立的粒度层面,如关键帧、关键镜头或不连贯的文本总结,未能捕捉跨模态对齐的内在层次结构。为了解决这一关键差距,我们引入了HAVEN,一个用于统一视频理解的层次对齐多模态基准。HAVEN开创了一种全粒度(帧、镜头和视频层面)且全多模态(视频和文本)的数据集架构,配备了明确的、连续的模态对齐。基于这一统一的标注范式,我们提出了涵盖总结、时间推理、多模态定位和显著性排序的综合评估套件。对最新MLLMs的广泛基准测试揭示了表面文本流畅性与基于多模态理解之间的持续差距。最终,HAVEN推动了多模态系统的评估超越传统问答格式,提供了一个严谨、标准化的测试平台,以推动未来可解释、层次化的视频理解研究。我们公开发布了数据集、基准套件和评估协议。

英文摘要

While Multimodal Large Language Models (MLLMs) exhibit strong performance on standard video tasks, their ability to faithfully summarize and reason over complex narratives remains poorly evaluated. Existing summarization benchmarks fragment supervision across isolated granularities, such as keyframes, key shots, or disjointed text summaries, failing to capture the inherently hierarchical structure of cross-modal alignment. To address this critical gap, we introduce HAVEN, a hierarchically aligned multimodal benchmark for unified video understanding. HAVEN pioneers a fully granular (frame, shot, and video levels) and fully multimodal (video and text) dataset architecture, complete with explicit, continuous alignment between modalities. Built upon this unified annotation paradigm, we propose a comprehensive evaluation suite spanning summarization, temporal reasoning, multimodal grounding, and saliency ranking. Extensive benchmarking of state-of-the-art MLLMs exposes a persistent gap between surface-level textual fluency and grounded multimodal understanding. Ultimately, HAVEN advances the evaluation of multimodal systems beyond traditional QA formats, offering a rigorous, standardized testbed to drive future research in interpretable, hierarchical video understanding. We publicly release the dataset, benchmark suite, and evaluation protocols.

2605.19220 2026-05-20 cs.CL cs.AI cs.LG

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

立场:LLM中的不确定性量化仅仅是无监督聚类

Tiejin Chen, Longchao Da, Xiaoou Liu, Hua Wei

发表机构 * School of Computing and Augmented Intelligence, Arizona State University(亚利桑那州立大学计算与增强智能学院)

AI总结 本文指出,当前LLM的不确定性量化方法本质上是无监督聚类算法,无法有效评估模型的外部正确性,导致无法检测出自信但错误的回答。文章提出了改进的不确定性量化方法,以确保模型的自信度能可靠地反映现实。

Comments Accepted by ICML 2026 Position Paper Track

详情
AI中文摘要

不确定性量化(UQ)被广泛认为是部署大型语言模型(LLM)于高风险领域的主要保障。然而,我们主张该领域存在类别错误:主流LLM的UQ方法本质上是无监督聚类算法。我们证明大多数当前方法本质上量化的是模型生成的内部一致性,而不是其外部正确性。因此,当前方法从根本上无法识别事实现实,并无法检测出“自信幻觉”,即模型在稳定但错误的答案上表现出高自信。因此,当前UQ方法在部署模型时可能会产生误导的安全感。具体而言,我们识别出由于对内部状态的依赖而产生的三种关键病理:超参数敏感危机,使部署不安全;内部评估循环,将稳定性与事实混淆;以及缺乏事实基础,迫使依赖不稳定代理指标来评估不确定性。为解决这一困境,我们倡导向UQ方法转变,并为研究界制定研究路线图,以采用更好的评估指标和设置,实施原生不确定性机制的变化,并将验证锚定在客观事实上,确保模型自信度能可靠地反映现实。

英文摘要

Uncertainty Quantification (UQ) is widely regarded as the primary safeguard for deploying Large Language Models (LLMs) in high-stakes domains. However, we argue that the field suffers from a category error: mainstream UQ methods for LLMs are just unsupervised clustering algorithms. We demonstrate that most current approaches inherently quantify the internal consistency of the model's generations rather than their external correctness. Consequently, current methods are fundamentally blind to factual reality and fail to detect "confident hallucinations," where models exhibit high confidence in stable but incorrect answers. Therefore, the current UQ methods may create a deceptive sense of safety when deploying the models with uncertainty. In detail, we identify three critical pathologies resulting from this dependence on internal state: a hyperparameter sensitivity crisis that renders deployment unsafe, an internal evaluation cycle that conflates stability with truth, and a fundamental lack of ground truth that forces reliance on unstable proxy metrics to evaluate uncertainty. To resolve this impasse, we advocate for a paradigm shift to UQ and outline a roadmap for the research community to adopt better evaluation metrics and settings, implement mechanism changes for native uncertainty, and anchor verification in objective truth, ensuring that model confidence serves as a reliable proxy for reality.

2605.19219 2026-05-20 cs.AI

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

SimGym:一种用于电子商务A/B测试模拟的框架,使用基于流量的VLM代理

Han Li, Vibhor Malik, Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ailin Fan, Keat Yang Koay, Yuanzheng Zhu, Meysam Feghhi, Ronie Uliana, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Zhong Wu, Lingyun Wang

发表机构 * Shopify

AI总结 本文提出SimGym框架,通过基于流量的VLM代理模拟电子商务A/B测试,解决真实测试周期长、风险高等问题,验证结果显示其能快速准确预测用户行为变化。

详情
AI中文摘要

A/B测试仍然是评估电子商务店铺修改的黄金标准,但其分流流量、需要数周才能达到统计显著性,并有降低用户体验的风险。我们提出了SimGym,一种使用视觉语言模型(VLM)代理在浏览器中模拟A/B测试的框架。该框架包含三个关键组件:(a)基于流量的买家人设生成管道,从生产点击流数据中推导出每个店铺的买家人设和意图;(b)实时浏览器代理架构,结合多模态感知和情景记忆与守卫规则,以在控制和处理店铺中进行连贯的购物会话;(c)评估协议,将模拟的成果变化与实际买家行为的观察变化进行比较。我们验证了SimGym在主要电子商务平台上对视觉驱动的UI主题变化的A/B测试,结果表明SimGym代理在观察到的成果变化上表现良好,与实际买家流量中不同界面变体的add-to-cart变化达成77%的方向一致。它将实验周期从数周减少到不到一小时,使快速实验成为可能,而无需将真实买家暴露于候选变体中。

英文摘要

A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM) agents operating in a live browser. The framework comprises three key components: (a) a traffic-grounded persona generation pipeline that derives per-shop buyer archetypes and intents from production clickstream data; (b) a live-browser agent architecture that combines multimodal perception over visual and browser-structured observations with episodic memory and guardrails to conduct coherent shopping sessions across control and treatment storefronts; and (c) an evaluation protocol that compares simulated outcome shifts with observed shifts in real buyer behavior. We validate SimGym on A/B tests of visually driven UI theme changes from a major e-commerce platform across diverse storefronts and product categories. Empirical results show that SimGym agents achieve strong agreement with observed outcome shifts, attaining 77% directional alignment with add-to-cart shifts observed across interface variants in real-buyer traffic. It reduces experimental cycles from weeks to under an hour, enabling rapid experimentation without exposing real buyers to candidate variants.
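
The 77% directional alignment figure compares the direction of simulated and observed outcome shifts. The listing does not give the paper's exact definition, so the sketch below assumes the simplest reading: the fraction of experiments in which the simulated and observed shifts have the same sign. The function names and the example shift values are hypothetical.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def directional_alignment(simulated_shifts, observed_shifts):
    """Fraction of experiments where simulated and observed shifts share a sign."""
    if len(simulated_shifts) != len(observed_shifts):
        raise ValueError("shift lists must have the same length")
    if not simulated_shifts:
        raise ValueError("at least one experiment is required")
    matches = sum(
        1 for s, o in zip(simulated_shifts, observed_shifts) if sign(s) == sign(o)
    )
    return matches / len(simulated_shifts)


# Hypothetical add-to-cart rate shifts (treatment minus control) per A/B test.
simulated = [0.02, -0.01, 0.03, -0.04]
observed = [0.01, -0.02, -0.01, -0.03]
alignment = directional_alignment(simulated, observed)
```

Under this reading the metric ignores effect size entirely, so a simulator can score well while still misjudging how large a shift is.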

2605.19218 2026-05-20 cs.CV cs.AI

Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference

旋转对齐的关键通道剪枝用于高效的视觉-语言模型推理

Beomseok Kang, Dongwon Jo, Jiwon Song, Donghwee Son, Jae-Joon Kim

发表机构 * Seoul National University(首尔国立大学)

AI总结 本文提出旋转对齐的关键通道剪枝方法,通过压缩通道维度在固定KV缓存预算下保留更多视觉token,解决传统token剪枝在细粒度感知任务中的性能下降问题,同时提升解码效率。

详情
AI中文摘要

视觉-语言模型在推理过程中面临严重的KV缓存压力,因为单张图像通常会被编码成数千个token。现有方法大多通过token剪枝来利用token稀疏性,但永久丢弃视觉内容会导致细粒度感知任务上的显著退化。这促使我们考虑一个互补的维度,即特征稀疏性:在固定的KV缓存预算下,压缩通道维度可以在相同内存成本下保留更多视觉token。然而,已有的Key通道剪枝方法面临结构上的权衡:逐token的通道剪枝表达能力强,但非结构化且速度慢;逐head的方法对硬件友好,但不够稳健。我们通过RotateK解决这一问题,这是一种基于旋转的结构化Key通道剪枝框架。RotateK应用基于PCA的在线旋转,将依赖于token的通道重要性对齐到共享的低维子空间,从而在轻量级的逐head掩码下实现准确剪枝;一个融合的Triton注意力内核直接在稀疏通道的Key上运算,以实现高效解码。在两个有代表性的VLM骨干网络上的实验表明,RotateK在准确率和解码延迟方面均一致优于已有的Key通道剪枝方法,而联合token-通道剪枝在相同的KV缓存预算下优于仅做token剪枝的基线。

英文摘要

Vision-Language Models suffer severe KV cache pressure at inference, as a single image often encodes into thousands of tokens. Most existing methods exploit token sparsity through token pruning, but permanently discarding visual content causes substantial degradation on fine-grained perception tasks. This motivates a complementary axis, feature sparsity: under a fixed KV cache budget, compressing the channel dimension preserves more visual tokens at the same memory cost. Prior Key channel pruning methods, however, face a structural trade-off: token-wise channel pruning is expressive but unstructured and slow, while head-wise approach is hardware-friendly but less robust. We resolve this with RotateK, a rotation-based structured Key channel pruning framework. RotateK applies an online PCA-based rotation that aligns token-dependent channel importance into a shared low-dimensional subspace, enabling accurate pruning under lightweight head-wise masks; a fused Triton attention kernel operates directly on sparse-channel Keys for efficient decoding. Experiments on two representative VLM backbones show that RotateK consistently outperforms prior Key channel pruning in both accuracy and decoding latency, while joint token-channel pruning improves over token-only baselines at matched KV cache budgets.

2605.19215 2026-05-20 cs.AI

Not all uncertainty is alike: volatility, stochasticity, and exploration

并非所有不确定性都相同:波动性、随机性与探索

Payam Piray

发表机构 * Department of Psychology, University of Southern California(南加州大学心理学系)

AI总结 本文研究了在生物和人工智能中适应性决策中波动性和随机性对探索的影响差异,提出了CAUSE方法以提升探索效率。

详情
AI中文摘要

生物智能和人工智能中的适应性决策需要在利用已知结果和探索不确定的替代选项之间取得平衡。尽管先前的研究表明不确定性通常会促进探索,但这些研究往往将环境不确定性的不同来源视为等同。我们考虑这样的环境:潜在奖励状态随时间漂移(波动性),并且只能通过带噪声的结果被观测到(随机性)。两者都会增加后验不确定性,但我们表明它们对最优探索的驱动方向相反:波动性增强探索,随机性抑制探索。我们将Gittins指数框架扩展到具有潜在动态的高斯状态空间老虎机(bandit),从而形式化地建立了这种不对称性。我们进一步推导出Cause-Aware Uncertainty-Sensitive Exploration(CAUSE),这是一种通过“控制即推断”(control-as-inference)得到的闭式探索奖励项,它继承了相同的单调性。CAUSE在具有异质噪声结构的环境中优于标准探索策略,并且也优于逐臂Gittins策略,后者在静息(rested)老虎机中的最优性无法迁移到不安定(restless)设置。学习和探索受同一种噪声推断不对称性的支配,并且该框架预测,病理性的噪声推断会产生方向相反的探索,而不仅仅是受损的探索,这对精神疾病的计算解释具有启示意义。

英文摘要

Adaptive decision-making in biological and artificial intelligence requires balancing the exploitation of known outcomes with the exploration of uncertain alternatives. Although prior work suggests that uncertainty generally promotes exploration, it has typically treated distinct sources of environmental uncertainty as equivalent. We consider environments with latent reward states that drift over time (volatility) and are observed through noisy outcomes (stochasticity). Both increase posterior uncertainty, yet we show they drive optimal exploration in opposite directions: volatility enhances it, stochasticity suppresses it. We establish this asymmetry formally by extending the Gittins index framework to Gaussian state-space bandits with latent dynamics. We further derive Cause-Aware Uncertainty-Sensitive Exploration (CAUSE), a closed-form exploration bonus obtained via control-as-inference that inherits the same monotonicities. CAUSE outperforms standard exploration strategies in environments with heterogeneous noise structure, and also improves on a Gittins-per-arm policy whose rested-bandit optimality does not transfer to restless settings. Learning and exploration are governed by the same noise-inference asymmetry, and the framework predicts that pathological noise inference produces reversed rather than merely impaired exploration, with implications for computational accounts of psychiatric conditions.
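
The learning side of the asymmetry mentioned in this abstract can be seen in a scalar Kalman filter, sketched below under simple assumptions: a random-walk latent reward with process-noise variance `volatility`, observed with noise variance `stochasticity`. This is a standard textbook filter, not the paper's Gittins analysis or the CAUSE bonus; the function name and the numeric settings are illustrative.

```python
def steady_state(volatility, stochasticity, steps=200):
    """Iterate a scalar Kalman filter to its steady-state gain and variance.

    volatility: variance of the latent state's random-walk drift per step.
    stochasticity: variance of the observation noise.
    """
    posterior_var, gain = 1.0, 0.0
    for _ in range(steps):
        prior_var = posterior_var + volatility          # drift widens the prior
        gain = prior_var / (prior_var + stochasticity)  # learning rate
        posterior_var = (1.0 - gain) * prior_var        # update narrows it
    return gain, posterior_var


base_gain, base_var = steady_state(volatility=1.0, stochasticity=1.0)
volatile_gain, volatile_var = steady_state(volatility=4.0, stochasticity=1.0)
noisy_gain, noisy_var = steady_state(volatility=1.0, stochasticity=4.0)
```

In this toy setting both kinds of noise raise the steady-state posterior variance, yet higher volatility raises the learning rate while higher stochasticity lowers it, mirroring the opposite-direction effect the abstract describes for exploration.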

2605.19214 2026-05-20 cs.LG cs.CV

Worst-Group Equalized Odds Regularization for Multi-Attribute Fair Medical Image Classification

多属性公平医疗图像分类中的最差组等化几率正则化

Nikhil Cherian Kurian, Victor Caquilpan Parra, Abin Shoby, Luke Whitbread, Lauren Oakden-Rayner, Robert Vandersluis, Jessica Schrouff, Lyle J. Palmer, Mark Jenkinson

发表机构 * Australian Institute for Machine Learning, Adelaide University(澳大利亚机器学习研究所,阿德莱德大学) GlaxoSmithKline (GSK)(葛兰素史克(GSK))

AI总结 本文提出了一种最差组等化几率正则化方法,用于在多个人口属性上同时评估和缓解医疗图像分类中的系统性差异,通过在推理时优化子组层面的真阳性率和假阳性率偏差,减少等化几率和等化机会的不平等,同时对AUC影响最小。

Comments 11 Pages, 2 Figures

详情
AI中文摘要

医疗人工智能的诊断性能在不同人口群体间系统性地变化,但子组AUC可能掩盖了临床重要的不平等。在固定的推理时间操作点上,某些群体可能表现出过度诊断行为,其特征是真阳性率和假阳性率升高,而另一些群体则表现出不足诊断模式,其真阳性率和假阳性率降低。这些对立的趋势可能在总体AUC中相互抵消,但会产生有意义的临床决策不平等。受在操作点和多个人口属性上评估和缓解此类不平等的需要所驱动,我们提出了一种最差组等化几率边际正则化器。该正则化器明确针对推理时的子组层面真阳性率和假阳性率偏差。在每次更新时,该方法识别出由显式人口属性(如年龄、性别和种族)定义的最极端边际偏差的子组,并应用统一的惩罚,从而在多个人口轴上实现公平优化,而无需显式交集约束。在两个现实中的多标签医学影像数据集中,我们的方法在减少等化几率和等化机会的不平等方面表现一致,对AUC影响极小,从而在保持诊断性能的同时提高公平性。

英文摘要

Diagnostic performance in medical AI varies systematically across demographic groups, yet subgroup AUC can mask clinically important disparities. At a fixed inference-time operating point, some groups may exhibit over-diagnostic behaviour, characterized by elevated true and false positive rates, while others show under-diagnostic patterns with reduced true and false positive rates. These opposing tendencies can cancel in aggregate AUCs while producing meaningful inequities in clinical decision-making. Motivated by the need to assess and mitigate such disparities at the operating point and across multiple demographic attributes simultaneously, we propose a worst-group equalized-odds margin regularizer. The proposed regularizer explicitly targets subgroup-level deviations on both the true positive and false positive sides at inference. At each update, the method identifies subgroups defined by explicit demographic attributes (e.g., age, sex, and race) that exhibit the most extreme margin deviations and applies a unified penalty, enabling fairness optimization across multiple demographic axes without requiring explicit intersectional constraints. Across two medical imaging datasets in realistic multi-label settings, our method consistently reduces disparities in Equalized Odds and Equalized Opportunity with minimal impact on AUC, preserving diagnostic performance while improving fairness.
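
To make the operating-point disparity concrete, the sketch below computes per-group true and false positive rates at a fixed decision threshold and reports the subgroup that deviates most from the pooled rates. It is an evaluation-side illustration only, not the paper's margin regularizer, which acts during training. The group names, labels, and predictions are made up, and measuring deviation against the pooled rates is an assumption.

```python
def rates(labels, preds):
    """Return (true positive rate, false positive rate) for binary arrays."""
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / positives, fp / negatives


def worst_group_gap(labels, preds, groups):
    """Find the subgroup whose TPR or FPR deviates most from the pooled rates."""
    pooled_tpr, pooled_fpr = rates(labels, preds)
    gaps = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        tpr, fpr = rates([labels[i] for i in idx], [preds[i] for i in idx])
        gaps[group] = max(abs(tpr - pooled_tpr), abs(fpr - pooled_fpr))
    worst = max(gaps, key=gaps.get)
    return worst, gaps[worst]


# Hypothetical thresholded predictions for three age bands (4 cases each).
groups = ["under_40"] * 4 + ["40_to_65"] * 4 + ["over_65"] * 4
labels = [1, 1, 0, 0] * 3
preds = [1, 1, 0, 0] + [1, 0, 0, 0] + [1, 1, 1, 1]
worst, gap = worst_group_gap(labels, preds, groups)
```

Here the oldest band is flagged because every case is predicted positive, the over-diagnostic pattern the abstract describes, even though its true positive rate looks perfect in isolation.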

2605.19213 2026-05-20 cs.CV

Smartphone-based Circular Plot Sampling for Forest Inventory

基于智能手机的圆形采样法用于森林调查

Su Sun, Jui-Cheng Chiu, Nabin Khanal, Songlin Fei, Yingjie Victor Chen

发表机构 * School of Applied and Creative Computing, Purdue University(应用与创意计算学院,普渡大学) Department of Forestry and Natural Resources, Purdue University(林业与自然资源系,普渡大学)

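AI summary This paper proposes a lightweight smartphone-based pipeline that performs complete circular-plot tree measurement from a single walkthrough video, with no additional specialized hardware. It combines pretrained monocular depth estimation and tree instance segmentation with a SLAM framework to jointly refine the camera trajectory and depth, yielding accurate and scalable estimates of tree position and diameter at breast height (DBH).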

Details
AI-translated abstract

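Circular sample plots are central to forest inventory, but accurately measuring tree diameter at breast height (DBH) and spatial position within a plot remains challenging. Conventional approaches rely on costly terrestrial LiDAR systems or on labor-intensive manual methods involving calipers and compass bearings, which limits their scalability and accessibility in large-scale environments. This paper presents a lightweight, smartphone-based pipeline that performs complete plot-level tree measurement from a single walkthrough video, requiring only a consumer smartphone mounted on a portable stand. The method integrates pretrained monocular depth estimation and tree instance segmentation with a simultaneous localization and mapping (SLAM) framework to jointly refine the camera trajectory and depth across the video sequence. Tree positions and DBH estimates are obtained by fusing SLAM-derived camera poses with segmented depth maps, using a calibrated reference length. The system was evaluated in managed and natural forest plots, achieving mean absolute errors of 1.51 cm (MARE 3.98%) and 2.30 cm (MARE 5.69%), respectively, with consistent performance across different starting directions and positions. A cross-video consistency analysis further shows that tree localization is stable and reproducible when measurements start from different positions. The approach achieves accuracy comparable to established field methods while substantially reducing equipment cost and operational complexity, making it suitable for both professional researchers and non-expert forest managers in diverse operational settings.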

English abstract

Circular sample plots are a cornerstone of forest inventory, yet accurate measurement of tree diameter at breast height (DBH) and spatial location within such plots remains challenging. Conventional approaches rely either on costly terrestrial LiDAR systems or on labor-intensive manual methods involving calipers and compass bearings, limiting their scalability and accessibility in large-scale environments. We present a lightweight, smartphone-based pipeline that enables complete plot-sampling-based tree measurement from a single walkthrough video, requiring no specialized hardware beyond a consumer smartphone mounted on a portable stand. The proposed method integrates pretrained monocular depth estimation and tree instance segmentation with a simultaneous localization and mapping (SLAM) framework to jointly refine camera trajectories and depth across the video sequence. Tree positions and DBH estimates are recovered by fusing SLAM-derived camera poses with segmented depth maps, with absolute real-world scale anchored via a calibrated reference length. The system was evaluated in both managed forest plots and a natural forest plot, achieving mean absolute errors of 1.51 cm (MARE 3.98%) and 2.30 cm (MARE 5.69%), respectively, with consistent performance across varying starting directions and positions. Cross-video consistency analysis further demonstrated stable and reproducible tree localization across measurements initiated from different starting positions. The proposed approach achieves accuracy comparable to established field methods while substantially reducing equipment cost and operational complexity, making it accessible to both professional researchers and non-expert forest managers in diverse operational settings.
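To make the two reported error figures concrete, here is a minimal sketch of how a mean absolute error in centimetres and a relative error in percent could be computed from paired DBH measurements. It assumes MARE stands for mean absolute relative error, which the abstract does not spell out. The function name and the sample values are hypothetical and are not data from the paper.

```python
import numpy as np

def mae_and_mare(estimated_cm, reference_cm):
    """Return (MAE in cm, MARE in percent) for paired DBH measurements.

    Assumption: MARE is the mean of |estimate - reference| / reference.
    """
    estimated = np.asarray(estimated_cm, dtype=float)
    reference = np.asarray(reference_cm, dtype=float)
    if estimated.shape != reference.shape:
        raise ValueError("estimated and reference must have the same shape")
    abs_err = np.abs(estimated - reference)
    mae = float(abs_err.mean())
    mare = float((abs_err / reference).mean() * 100.0)
    return mae, mare

# Hypothetical example: two trees measured by caliper (reference) and by video.
reference_dbh = [20.0, 40.0]
estimated_dbh = [21.0, 38.0]
mae, mare = mae_and_mare(estimated_dbh, reference_dbh)
print(f"MAE = {mae:.2f} cm, MARE = {mare:.2f}%")
```

Reporting both figures is useful because an absolute error of a given size matters more for a thin stem than for a thick one.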

2605.19210 2026-05-20 cs.CV

D-Convexity: A Unified Differentiable Convex Shape Prior via Quasi-Concavity for Data-driven Image Segmentation

D-Convexity:通过准凹性统一的可微凸形状先验用于数据驱动的图像分割

Shengzhe Chen, Hao Yan

Affiliations * School of Computing and Augmented Intelligence, Arizona State University

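AI summary This paper proposes a unified, threshold-free, differentiable convex shape prior for data-driven image segmentation, based on the quasi-concavity of the network's output mask function u. By requiring all super-level sets of u to be convex, it turns a global shape constraint into local differentiable inequalities, improving shape regularity.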

Comments Accepted by CVPR 2026

Details
AI-translated abstract

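Convexity is a fundamental geometric prior for many natural and man-made structures, yet it remains challenging to impose effectively in end-to-end trainable segmentation networks. We revisit convexity from a functional perspective and propose a unified, threshold-free convexity prior based on the quasi-concavity of the network's output mask function u. Rather than constraining a single binary segmentation, we require all super-level sets of u to be convex, which turns the global shape constraint into local, differentiable inequalities on u and its derivatives. From this principle we derive zeroth-, first-, and second-order characterizations, which yield, respectively, a local midpoint convexification algorithm, a gradient condition based on supporting hyperplanes, and a sufficient second-order inequality expressed as a quadratic form on the tangent plane. The first- and second-order forms produce a compact convolutional loss that can be applied densely over the image without thresholding. Our quasi-concavity losses integrate seamlessly into modern segmentation networks through the proposed convex gradient projection module (CGPM). Across multiple datasets they consistently enforce convexity and improve shape regularity, outperforming networks tailored to retinal segmentation and surpassing previous shape-aware methods. Notably, our analysis unifies a range of earlier convex shape models, from discrete 1-0-1 line constraints and graph-cut convexity formulations to level-set priors based on curvature or the signed-distance Laplacian, within a single continuous and differentiable framework.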

English abstract

Convexity is a fundamental geometric prior that underlies many natural and man-made structures, yet remains challenging to impose effectively in end-to-end trainable segmentation networks. We revisit convexity from a functional perspective and propose a unified, threshold-free convexity prior based on the quasi-concavity of the network's output mask function u. Instead of constraining a single binary segmentation, we require all super-level sets of u to be convex, transforming global shape constraints into local, differentiable inequalities on u and its derivatives. From this principle, we derive zeroth-, first-, and second-order characterizations, yielding respectively a local midpoint convexification algorithm, a gradient-based condition linked to supporting hyperplanes, and a sufficient second-order inequality expressed as a quadratic form on the tangent plane. The first- and second-order formulations produce a compact convolutional loss that can be densely applied across the image without thresholding. Our quasi-concavity losses integrate seamlessly with modern segmentation networks via the proposed convex gradient projection module (CGPM). They consistently enforce convexity and improve shape regularity across multiple datasets, outperforming networks tailored for retinal segmentation and surpassing previous shape-aware methods. Remarkably, our analysis unifies a wide spectrum of previous convex shape models, from discrete 1-0-1 line constraints and graph-cuts convexity formulations to curvature- or signed-distance-Laplacian-based level-set priors, within a single continuous and differentiable framework.
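The link between quasi-concavity and convex super-level sets can be checked numerically with the standard midpoint inequality u((p + q) / 2) >= min(u(p), u(q)). The sketch below is a plain NumPy diagnostic written under that textbook definition. It is not the paper's convolutional loss or its CGPM module. The function name, the sampling scheme, and the test shapes are assumptions chosen for illustration.

```python
import numpy as np

def midpoint_violation(u, num_samples=5000, max_offset=12, seed=0):
    """Mean hinge violation of u(mid) >= min(u(p), u(q)) over sampled pixel pairs.

    Pairs are built as p = mid - off and q = mid + off, so the midpoint is
    always an exact pixel and no rounding is involved.
    """
    u = np.asarray(u, dtype=float)
    h, w = u.shape
    rng = np.random.default_rng(seed)
    mid = np.stack([rng.integers(0, h, num_samples),
                    rng.integers(0, w, num_samples)], axis=1)
    off = rng.integers(-max_offset, max_offset + 1, size=(num_samples, 2))
    p, q = mid - off, mid + off
    shape = np.array([h, w])
    # Keep only pairs whose two endpoints both fall inside the image.
    inside = ((p >= 0) & (p < shape) & (q >= 0) & (q < shape)).all(axis=1)
    if not inside.any():
        return 0.0
    p, q, mid = p[inside], q[inside], mid[inside]
    lower = np.minimum(u[p[:, 0], p[:, 1]], u[q[:, 0], q[:, 1]])
    return float(np.maximum(lower - u[mid[:, 0], mid[:, 1]], 0.0).mean())

# Two toy masks on a 32x32 grid: a filled disk (convex) and a ring (not convex).
yy, xx = np.mgrid[:32, :32]
r2 = (yy - 16) ** 2 + (xx - 16) ** 2
disk = (r2 <= 12 ** 2).astype(float)
ring = ((r2 <= 12 ** 2) & (r2 >= 6 ** 2)).astype(float)

print("disk violation:", midpoint_violation(disk))
print("ring violation:", midpoint_violation(ring))
```

Because the inequality is evaluated on the mask values themselves, the same check applies to soft probability maps without choosing a threshold, which is the property the abstract highlights.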