arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.06891 2026-05-11 cs.CV cs.LG

Towards Fairness under Label Bias in Image Segmentation: Impact, Measurement and Mitigation

Aditya Parikh, Stella Frank, Sneha Das, Aasa Feragen

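Affiliations Department of Applied Mathematics and Computer Science

AI summary This paper proposes a data-centric adaptation of Confident Learning for detecting and mitigating label bias in image segmentation; by comparing training labels with model predictions, it identifies the direction of the bias and improves fairness.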

Abstract

Labeled datasets reflect the biases of their annotation pipelines, which sometimes introduce label bias: group-conditional label errors that cause systematic performance disparities across demographic subgroups. Label bias in image segmentation remains underexplored, as even detecting it typically requires clean, unbiased annotations, which are not readily available. We present a data-centric adaptation of Confident Learning to segmentation, allowing detection of label bias directly in the training data without a clean, unbiased ground truth. By comparing the provided training labels to the model's confident predictions, we isolate directional errors that quantify the presence and nature of bias, where standard overlap metrics like Dice fail. We further show that label bias influences subgroup separability in the encoder's feature space, an artifact we leverage for bias mitigation rather than suppressing it. We evaluate three datasets, spanning from synthetic to real-life bias, showing how our framework reliably detects and mitigates bias without access to clean labels, achieving equitable performance across experimental conditions.
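The comparison between provided labels and confident predictions can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed confidence `threshold`, the function names, and the flat per-pixel lists are all assumptions made for illustration (Confident Learning proper derives per-class thresholds from the data rather than using one fixed value).

```python
# Hypothetical sketch: directional disagreement between given masks and
# confident model predictions, in the spirit of Confident Learning.

def directional_errors(labels, probs, threshold=0.9):
    """Rates of the two disagreement directions for one binary mask.

    labels: flat list of 0/1 training labels, one per pixel.
    probs: flat list of predicted foreground probabilities.
    threshold: assumed confidence cut-off (not taken from the paper).
    """
    over = 0   # labelled foreground, model confidently predicts background
    under = 0  # labelled background, model confidently predicts foreground
    for y, p in zip(labels, probs):
        if y == 1 and p <= 1.0 - threshold:
            over += 1
        elif y == 0 and p >= threshold:
            under += 1
    n = len(labels)
    return {"over_rate": over / n, "under_rate": under / n}


def group_bias(samples, threshold=0.9):
    """Average both rates per subgroup.

    samples: iterable of (group, labels, probs) triples.
    """
    sums = {}
    for group, labels, probs in samples:
        rates = directional_errors(labels, probs, threshold)
        acc = sums.setdefault(group, {"over_rate": 0.0, "under_rate": 0.0, "n": 0})
        acc["over_rate"] += rates["over_rate"]
        acc["under_rate"] += rates["under_rate"]
        acc["n"] += 1
    return {
        g: {"over_rate": a["over_rate"] / a["n"], "under_rate": a["under_rate"] / a["n"]}
        for g, a in sums.items()
    }
```

A subgroup whose `over_rate` is consistently higher than another's would suggest over-annotation for that group. An overlap score such as Dice would register only a lower value and not the direction of the disagreement.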

2605.06889 2026-05-11 cs.CV

TriDE: Triangle-Consistent Translation Directions for Global Camera Pose Estimation

Francisco Chen, Yiran Wang, Yunpeng Shi

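Affiliations Department of Mathematics, University of California, Davis

AI summary TriDE exploits camera-triangle consistency to improve the accuracy of camera location estimation in global structure-from-motion, using a message-passing strategy for efficient and reliable direction refinement.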

Comments 32 pages, 6 figures

Abstract

Pairwise translation directions are a key input to camera location estimation in global structure-from-motion. Existing estimators usually process each image pair independently, producing directions that may be locally plausible but inconsistent with the other relative directions in the viewing graph. To estimate the directions jointly, we propose TriDE, which exploits camera-triangle consistency as an efficient higher-order verification signal. Instead of solving a costly global nonlinear optimization problem that is sensitive to initialization, TriDE refines unreliable pairwise directions through message passing between directions and their incident weighted triangles. This information propagation strategy enables us to establish a strong phase-transition bound for exact recovery under a realistic random corruption model. Experiments on real image graphs show that TriDE improves direction accuracy by a large margin and yields better downstream camera locations, providing a practical link between local pairwise estimation and global camera pose geometry.
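One way to see why a camera triangle carries a verification signal: for cameras i, j, k the displacement vectors around the triangle sum to zero, so the three unit directions must be coplanar. The sketch below scores a triangle by the absolute scalar triple product of its directions. This score and all function names are illustrative assumptions, not TriDE's actual consistency measure or message-passing scheme.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def direction(t_from, t_to):
    """Unit translation direction between two camera centers."""
    return unit(sub(t_to, t_from))

def triangle_residual(d_ij, d_jk, d_ki):
    """Hypothetical inconsistency score: zero when the three unit
    directions are coplanar, as they must be for a true camera triangle."""
    return abs(dot(d_ij, cross(d_jk, d_ki)))
```

Coplanarity is only a necessary condition: a direction corrupted within the triangle's plane still yields a zero residual, so a practical check would also need to test that the directions close up with positive edge lengths.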

2605.06886 2026-05-11 cs.CL

TajPersLexon: A Tajik-Persian Lexical Resource and Hybrid Model for Cross-Script Low-Resource NLP

Mullosharaf K. Arabov

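Affiliations Institute of Computational Mathematics and Information Technologies

AI summary This paper presents TajPersLexon, a Tajik-Persian parallel lexical resource of 40,112 word pairs; a hybrid model performs cross-script word retrieval and alignment in low-resource settings and reaches 96.4% accuracy.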

Comments Published in The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family (SilkRoadNLP 2026), pages 29-37, Rabat, Morocco. Association for Computational Linguistics

Journal ref Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family (SilkRoadNLP 2026), pages 29-37

Abstract

This work introduces TajPersLexon, a curated Tajik-Persian parallel lexical resource of 40,112 word and short-phrase pairs for cross-script lexical retrieval, transliteration, and alignment in low-resource settings. We conduct a comprehensive CPU-only benchmark comparing three methodological families: (i) a lightweight hybrid pipeline, (ii) neural sequence-to-sequence models, and (iii) retrieval methods. Our evaluation establishes that the task is essentially solvable, with neural and retrieval baselines achieving 98-99% top-1 accuracy. Crucially, we demonstrate that while large multilingual sentence transformers fail on this exact lexical matching, our interpretable hybrid model offers a favorable accuracy-efficiency trade-off for practical applications, achieving 96.4% accuracy in an OCR post-correction task. All experiments use fixed random seeds for full reproducibility. The dataset, code, and models will be publicly released.

2605.06885 2026-05-11 cs.LG cs.AI

Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment

Fred Zhangzhi Peng, Alexis Fox, Anru R. Zhang, Alexander Tong

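Affiliations Duke University; AITHYRA

AI summary This paper proposes REPR-ALIGN, which converts autoregressive models into diffusion models through representation alignment, yielding up to 4x training acceleration and proving especially effective in low-data regimes.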

Comments Code available at https://github.com/pengzhangzhi/Open-dLLM

Abstract

Diffusion language models (DLMs) have recently demonstrated capabilities that complement standard autoregressive (AR) models, particularly in non-sequential generation and bidirectional editing. Although recent work has shown that pretrained autoregressive checkpoints can be converted into diffusion language models, existing recipes primarily transfer parameters through continued denoising training with objective- and attention-level modifications. We instead ask whether the internal representation geometry learned by next-token prediction can be explicitly preserved during AR-to-DLM conversion. We hypothesize that much of the semantic structure learned by AR pretraining can transfer across generation orders, and thus DLM training should be viewed as relearning the decoding path rather than relearning language representations. To investigate this, we introduce REPR-ALIGN, a representation alignment objective that adapts a bidirectional masked diffusion model to reuse representations from a pretrained AR model of identical architecture. Concretely, we align the hidden states of the DLM to the frozen AR model at every layer using cosine similarity, while optimizing the standard masked denoising objective. This simple alignment, with no adapters and no architectural changes beyond the attention mask, yields up to 4x training acceleration in our setting and is particularly effective in low-data regimes. Our results suggest that linguistic representations can transfer across generation order, and that representation alignment provides a simple and effective technique for training diffusion language models. Code is available at https://github.com/pengzhangzhi/Open-dLLM.
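The layer-wise cosine alignment described above can be sketched in plain Python. The abstract does not give the exact loss, so the `1 - cosine` form, the averaging over layers and tokens, the `weight` coefficient, and every function name here are assumptions for illustration only.

```python
import math

def cosine(u, v, eps=1e-12):
    """Cosine similarity between two hidden-state vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, eps)

def alignment_loss(dlm_states, ar_states):
    """Mean (1 - cosine) over all layers and token positions.

    Both arguments are nested lists indexed as [layer][token][dim];
    ar_states come from the frozen AR model and receive no gradient.
    """
    total, count = 0.0, 0
    for layer_dlm, layer_ar in zip(dlm_states, ar_states):
        for h_dlm, h_ar in zip(layer_dlm, layer_ar):
            total += 1.0 - cosine(h_dlm, h_ar)
            count += 1
    return total / count

def total_loss(denoise_loss, dlm_states, ar_states, weight=1.0):
    """Masked denoising loss plus weighted alignment (weight is assumed)."""
    return denoise_loss + weight * alignment_loss(dlm_states, ar_states)
```

In an actual training loop the same quantity would be computed on tensors with automatic differentiation; the list-based version only makes the arithmetic explicit.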

2605.06882 2026-05-11 cs.AI

How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

Chun Zheng, Lianlong Wu, Bingqian Li, Lvting Liu, Yi Zhou

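Affiliations University of Science and Technology of China; University of Oxford

AI summary This paper evaluates LLMs on the Equivalence Class Problem: non-reasoning models fail to solve it, reasoning models do better but still cannot solve it completely, and problem difficulty varies with the connectivity probability and the number of variables.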

Comments 9 pages, 5 figures

Abstract

Large Language Models (LLMs) have achieved great improvements in recent years. Nevertheless, it still remains unclear how good LLMs are for reasoning tasks, especially for long-chain ones. In this paper, we evaluate LLMs' performance on the simplest yet long-chain reasoning task, namely the Equivalence Class Problem (ECP), i.e., determining whether two variables are equal given a set of randomly generated equivalence relations. We consider both reasoning and non-reasoning representative LLMs over a large variety of problem instances, ranging over different numbers of variables, connectivity probabilities, prompts, and other factors. The experimental results show that non-reasoning LLMs fail ECP, while reasoning models are significantly better but still struggle to completely solve this problem. Interestingly, considering various connectivity probabilities with a fixed number of variables, we observe that, for non-reasoning models, the hardest problem instances coincide with the phase transition point of ln n/(n-1), suggesting the chaos of the problem; in contrast, for reasoning models, the hardest ones coincide with the biggest diameter, suggesting the reasoning difficulty of the problem.
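For reference, the ground truth of an ECP instance is cheap to compute with a union-find structure. The sketch below assumes each pair of variables is declared equal independently with probability `p`, which is one reading of "connectivity probability"; the instance format and function names are hypothetical, not the paper's benchmark code.

```python
import random

def generate_instance(n, p, seed=0):
    """Random equalities over variables 0..n-1: each unordered pair is
    included independently with probability p (assumed generator)."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def find(parent, x):
    """Root of x, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def are_equal(n, relations, a, b):
    """True if a = b follows from the given equalities by transitivity."""
    parent = list(range(n))
    for i, j in relations:
        root_i, root_j = find(parent, i), find(parent, j)
        if root_i != root_j:
            parent[root_i] = root_j
    return find(parent, a) == find(parent, b)
```

The task is algorithmically trivial, which is the point of the benchmark: the difficulty for a language model comes from chaining many equalities, not from the underlying computation.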

2605.06879 2026-05-11 cs.LG q-bio.QM

Better Protein Function Prediction by Modeling Survivorship Bias

Zhongmou Chao, Poompol Buathong, Ekaterina Selivanovitch, Susan Daniel, Peter I. Frazier

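Affiliations Smith School of Chemical and Biomolecular Engineering, Cornell University, USA

AI summary This paper proposes Evo-PU, a framework that uses knowledge of mutation to model survivorship bias and improves function prediction on single-organism sequence data, outperforming standard PU learning, OCC, and PLMs.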

Comments 29 pages, 12 figures, 3 tables

Abstract

Protein sequence data from nature exhibits survivorship bias: we only observe data from those organisms that survive and reproduce, while non-functional protein mutations are eliminated by natural selection. Thus, predicting whether a protein sequence is functional often requires learning from positive examples alone. While positive-unlabeled (PU) learning frameworks offer a generic solution to this problem, existing PU methods ignore the evolutionary processes that shape sequence observability and cause survivorship bias. Consider a sequence that is one mutation away from a commonly-observed protein variant in a well-surveilled organism. If the sequence were functional, it would likely be observed. If it is not observed, this suggests non-functionality. In contrast, sequences that are unlikely to arise through mutation may be missing simply because they never arose. Thus, these two kinds of missing sequences should be treated differently when training models. In this work, we propose Evo-PU, a PU learning framework that uses a scientific understanding of nucleotide mutation to model survivorship bias for well-surveilled single-organism sequence data. On three prediction tasks using single-organism uniform-coverage surveillance data -- predicting results from held-out influenza and respiratory syncytial virus (RSV) mutagenesis studies, and predicting future SARS-CoV-2 variants -- Evo-PU outperforms standard PU learning, one-class classification (OCC), and protein language models (PLMs). On prediction tasks from multi-organism ProteinGym datasets with more heterogeneous surveillance coverage, we identify opportunities to generalize our approach.

2605.06877 2026-05-11 cs.LG

Temporal Attention for Adaptive Control of Euler-Lagrange Systems with Unobservable Memory

Giansalvo Cirrincione, Adriano Fagiolini

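AI summary This paper uses a temporal attention mechanism to improve adaptive control of Euler-Lagrange systems: a self-attention block processes recent motion history to generate controller gains. Experiments show gains over a Transformer baseline in the short memory regime but degraded performance in the long memory regime, which motivates adjusting the number of attention heads dynamically during reinforcement learning.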

Abstract

Adaptive control of Euler-Lagrange systems is challenging when friction is governed by a finite-horizon internal state that is not directly observable from joint measurements. In this setting, the measured closed-loop state is no longer Markovian, and standard certainty-equivalence adaptive laws may lose their convergence guarantees. The paper proposes a meta-control architecture in which the gains of a computed-torque controller are generated by a self-attention block processing a short window of recent motion history. The number of attention heads is selected before policy training through a surrogate analysis of the autocovariance of the memory-state gradient along the temporal window. This surrogate is based on a temporal adaptation of an incremental rank-tracking framework previously developed by the authors. The selected head count is then fixed and used as an architectural hyperparameter in a reinforcement-learning stage, where the policy is trained under a shielded admissibility constraint. The approach is tested on a 2-DOF manipulator with nonlinear friction and variable payload. In the short and matched memory regimes, the single-layer attention-only meta-controller outperforms a deeper Transformer baseline, with tracking-error reductions of 12 and 19 percentage points, respectively. The reported effect sizes are large, with d approximately -1.1 and -2.1, and Mann-Whitney p < 0.05 in both cases. In the long memory regime, however, the advantage disappears. Four out of ten training runs show either divergence or payload-invariant policy collapse, revealing a weakness in the static Phase-1 head-count prescription. This motivates moving rank-tracking inside the reinforcement-learning loop, allowing attention heads to be pruned or grown at runtime instead of fixed before training.

2605.06876 2026-05-11 cs.CV

AdpSplit: Error-Driven Adaptive Splitting for Faster Geometry Discovery in 3D Gaussian Splatting

Yongjae Lee, Jingxing Li, Abhay Kumar Yadav, Rama Chellappa, Deliang Fan

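Affiliations Arizona State University; Johns Hopkins University

AI summary AdpSplit uses an error-driven adaptive split operator to cut 3DGS training time while preserving rendering quality, reducing training time by 9.2%-22.3% across several datasets.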

Abstract

Adaptive density control in 3D Gaussian Splatting (3DGS) repeatedly grows the Gaussian population through fixed-cardinality random splitting to discover useful scene structure. However, in vanilla 3DGS, its binary split operator requires many densification rounds to expose fine details, making it a bottleneck for efficient training schedules with fewer iterations. We introduce AdpSplit, an error-driven adaptive split operator that determines the number of split children and initializes the child parameters from L1-pixel-error region statistics, enabling fewer densification iterations, thus reduced training time, while preserving the rendering quality of full-schedule training. Across the MipNeRF360, Deep-Blending, and Tanks&Temples datasets, AdpSplit reduces the training time of multiple accelerated 3DGS pipelines by 9.2%-22.3% as a simple drop-in replacement for the standard split operator. With FastGS, AdpSplit matches the full-schedule PSNR on MipNeRF360 while reducing training time by 16.4%, corresponding to a 12.6x acceleration over vanilla 3DGS.

2605.06874 2026-05-11 cs.LG

On the Divergence of Differential Temporal Difference Learning without Local Clocks

David Antrobius, Shangtong Zhang

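Affiliations Department of Computer Science; University of Virginia

AI summary This paper shows that in average-reward reinforcement learning, differential temporal difference learning converges with a local clock but can diverge with a global clock, resolving an open problem posed by Wan et al. [2021] and Blaser et al. [2026].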

Abstract

The learning rate is a critical component of reinforcement learning (RL). This work uses global and local clocks to distinguish two types of learning rates. The former is of the standard form $α_t$ that depends only on the time step $t$ (i.e., a global clock). The latter is of the form $α_{ν(S_t, t)}$, where $ν(s, t)$ counts the number of visits to state $s$ until time $t$ (i.e., a local clock). In discounted RL, an RL algorithm that is convergent with a local clock is always also convergent with a global clock, and vice versa. We are not aware of any counterexample. The key contribution of this work is to show that this nice correspondence breaks down in average-reward RL. Specifically, we construct a counterexample showing that although differential temporal difference learning is convergent with a local clock, it can diverge with a global clock. This counterexample resolves the open problem posed in Wan et al. [2021] and Blaser et al. [2026].
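The difference between the two clocks is easy to make concrete. The sketch below computes the step sizes each clock assigns along one state trajectory. The schedule `alpha(k) = 1/k`, the 1-based time index, and the convention that the visit count includes the current visit are all assumptions for illustration; the paper's counterexample is not reproduced here.

```python
from collections import defaultdict

def alpha(k):
    """Assumed step-size schedule: alpha_k = 1 / k for k >= 1."""
    return 1.0 / k

def global_clock_steps(states):
    """Step size at time t depends only on t (1-based)."""
    return [alpha(t) for t, _ in enumerate(states, start=1)]

def local_clock_steps(states):
    """Step size at time t depends on how often S_t has been visited,
    counting the current visit."""
    visits = defaultdict(int)
    steps = []
    for s in states:
        visits[s] += 1
        steps.append(alpha(visits[s]))
    return steps
```

For the trajectory `["a", "b", "a", "a"]` the global clock gives 1, 1/2, 1/3, 1/4, while the local clock gives 1, 1, 1/2, 1/3: a rarely visited state keeps a large step size under the local clock but inherits a small one under the global clock.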

2605.06868 2026-05-11 cs.LG math.OC

When Descent Is Too Stable: Event-Triggered Hamiltonian Learning to Optimize

Yi Wang, Chandrajit Bajaj

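Affiliations Oden Institute; The University of Texas at Austin; Department of Computer Science

AI summary This paper proposes SHAPE, an optimizer that uses an event-triggered mechanism under local information to balance descent, exploration, and budget allocation, improving nonconvex optimization performance.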

Abstract

Fixed-budget nonconvex optimization can fail not because local descent is unstable, but because it is too stable: after reaching a nearby stationary point, an optimizer may spend the remaining evaluations refining an uninformative local minimum. We formulate this failure mode as a control problem over optimizer dynamics, where the learner must decide when to descend, when to exploit a promising basin, and when stagnation should trigger movement elsewhere. We introduce SHAPE, a structured adaptive port-Hamiltonian task-family optimizer for event-triggered minima hunting under local information. Starting from gradient-descent dynamics, SHAPE lifts optimization to an augmented phase space $(q, p)$, where the primal state $q$ represents the candidate solution, the cotangent variable $p$ carries directional sensitivity, and a controller $u$ provides processed information from the current gradient oracle. Within each stage, a learned Hamiltonian vector field induces structured local descent; across stages, a fixed event clock in the implementation updates ports and memory when local equilibria are detected, with stage-dependent horizons treated in the analysis as a direct generalization. This design preserves a passivity-compatible structure while allowing the same trained policy to use clean, stochastic, or estimated gradient inputs. Experiments on fixed-budget nonconvex optimization tasks show that SHAPE improves best-so-far performance compared with fixed-policy optimizers. These results suggest that adaptive Hamiltonian energy shaping provides a principled mechanism for balancing descent, exploration, and budget allocation in difficult optimization landscapes.

2605.06866 2026-05-11 cs.LG math.OC

A Finite-Iteration Theory for Asynchronous Categorical Distributional Temporal-Difference Learning

Ege C. Kaya, Abolfazl Hashemi

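Affiliations Elmore Family School of Electrical and Computer Engineering

AI summary This paper develops finite-iteration theory for two categorical policy-evaluation methods; isometric embeddings recast both algorithms as asynchronous single-state stochastic-approximation recursions, yielding guarantees for discounted and undiscounted problems.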

Comments 53 pages

Abstract

Recent non-asymptotic analyses have substantially advanced the theory of distributional policy evaluation, but they largely concern synchronous full-state updates under a generative model, model-based estimators, accelerated variants, or different approximation architectures. Standard categorical temporal-difference learning is typically used in a different regime. It asynchronously performs a single-state update at each iteration and, in online settings, is driven by a Markovian trajectory. This leaves an important gap between existing finite-iteration theory and the categorical recursions most closely aligned with practical distributional temporal-difference implementations. We bridge this gap for two categorical policy-evaluation methods: scalar categorical temporal-difference learning in the Cramér geometry and multivariate signed-categorical temporal-difference learning in the maximum mean discrepancy geometry. After suitable isometric embeddings, both algorithms take the form of asynchronous single-state stochastic-approximation recursions that contract in a statewise supremum norm. This permits finite-iteration guarantees in discounted problems under both i.i.d. and Markovian state sampling, and in undiscounted fixed-horizon problems under i.i.d. episodic sampling.
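The asynchronous single-state recursion for the scalar case can be sketched as follows. This is the commonly used mixture-form categorical TD update with the standard projection onto an evenly spaced support; the function names, the list-based data layout, and the constant step size are assumptions, and the sketch makes no claim to match the exact recursion analyzed in the paper.

```python
import math

def project(atoms, support):
    """Project (value, prob) pairs onto an evenly spaced support by
    splitting each atom's mass between its two nearest grid points."""
    vmin, vmax = support[0], support[-1]
    delta = support[1] - support[0]
    last = len(support) - 1
    out = [0.0] * len(support)
    for value, prob in atoms:
        clipped = min(max(value, vmin), vmax)
        b = (clipped - vmin) / delta
        lo = min(math.floor(b), last)
        hi = min(math.ceil(b), last)  # guard against float overshoot
        if lo == hi:
            out[lo] += prob
        else:
            out[lo] += prob * (hi - b)
            out[hi] += prob * (b - lo)
    return out

def categorical_td_update(probs, s, r, s_next, support, gamma, alpha):
    """Update only state s from one transition (s, r, s_next).

    probs: list indexed by state, each entry a probability vector
    over the shared support.
    """
    shifted = [(r + gamma * z, p) for z, p in zip(support, probs[s_next])]
    target = project(shifted, support)
    probs[s] = [(1 - alpha) * p + alpha * t for p, t in zip(probs[s], target)]
```

Only the row for the visited state changes at each iteration, which is the asynchronous regime the abstract contrasts with synchronous full-state updates.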

2605.06864 2026-05-11 cs.LG

Multi-Objective Multi-Agent Bandits: From Learning Efficiency to Fairness Optimization

John Wang, Mengfan Xu

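Affiliations University of Massachusetts Amherst

AI summary This paper studies multi-objective multi-agent bandits under stochastic rewards, proposing an efficient learning algorithm based on Pareto regret and a fair learning method based on social welfare, with proven guarantees for both efficiency and fairness.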

Abstract

We study multi-objective multi-agent multi-armed bandits (MO-MA-MAB) under stochastic rewards, where agents observe heterogeneous reward vectors and communicate over time-varying graphs. We formulate this emerging problem setting to address efficient learning, measured by Pareto regret, and incorporate fair learning as an additional goal, captured via social welfare. To measure efficiency, we formulate Pareto regret and develop Pareto UCB1 Gossip, whose novel exploration radius explicitly separates statistical uncertainty in Pareto-based inference from consensus error. To express the fairness constraint, we formulate a Nash Social Welfare objective over preference-scalarized rewards and propose Simulated NSW UCB Gossip, which integrates preference-based reward simulation, gossip-based utility estimation, and UCB-style exploration. We prove that Pareto UCB1 Gossip achieves $\mathcal{O}(\log T)$ regret and an instance-independent rate of $\mathcal{O}(\sqrt{T})$, while Simulated NSW UCB Gossip achieves an instance-independent regret bound of $\mathcal{O}(T^{3/4})$. This separation reveals the cost of imposing the fairness constraint on our efficiency objective: fairness limits information aggregation and slows convergence. Experiments show that our methods consistently outperform baselines, improving performance by approximately 100% and 50% in the efficiency and fairness settings, respectively.
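The two objectives can be contrasted on toy numbers. The sketch below identifies Pareto-optimal arms from mean reward vectors and evaluates Nash Social Welfare as a product of utilities. It illustrates the textbook definitions only: the function names are hypothetical, and neither the gossip communication nor the UCB exploration of the proposed algorithms is modeled.

```python
def dominates(u, v):
    """u Pareto-dominates v: no worse in every objective, better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(means):
    """Indices of arms whose mean reward vector is not dominated."""
    return [
        i for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]

def nash_social_welfare(utilities):
    """Product of non-negative per-agent utilities."""
    product = 1.0
    for u in utilities:
        product *= u
    return product
```

With arms `(1, 0)`, `(0, 1)`, and `(0.5, 0.5)` all three are Pareto optimal, yet if each coordinate is read as one agent's utility, only the balanced arm has non-zero Nash Social Welfare. This shows how a fairness objective can single out one point on a front that an efficiency objective treats as equally good.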

2605.06863 2026-05-11 cs.RO cs.HC

Bi3: A Biplatform, Bicultural, Biperson Dataset for Social Robot Navigation

Andrew Stratton, Phani Teja Singamaneni, Pranav Goyal, Rachid Alami, Christoforos Mavrogiannis

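Affiliations Department of Robotics, University of Michigan; LAAS-CNRS, University of Toulouse; INRIA, University of Lorraine

AI summary The Bi3 dataset, built around two platforms, two cultures, and two-person interactions, offers a benchmark of diversity and modeling complexity for social robot navigation and supports the study of how humans and robots coordinate their activities in constrained environments.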

Comments ICRA 2026

Abstract

We contribute Bi3, a dataset of social robot navigation among groups of people in a constrained lab space. Compared to prior data collection efforts for social robot navigation, our dataset is unique in that it features: an original experiment design giving rise to close navigation encounters between two humans and a robot; five different navigation algorithms; two different robot platforms; a diverse participant pool of 74 people recruited from two sites in the USA and France; multimodal data streams including 10.5 hours of human and robot ground-truth motion tracks, RGB video, and user impressions over robot performance. Our analysis of the collected dataset through metrics like interaction density and human velocity suggests that Bi3 represents a benchmark of unique diversity and modeling complexity. Bi3 contributes towards understanding how humans and robots can productively mesh their activities in constrained environments, and can be a resource for training models of human motion prediction and robot control policies for navigation in densely crowded spaces.

2605.06861 2026-05-11 cs.LG cs.NA math.NA

Christoffel-DPS: Optimal sensor placement in diffusion posterior sampling for arbitrary distributions

James Rowbottom, Nick Huang, Carola-Bibiane Schönlieb, Ben Adcock

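Affiliations Department of Applied Mathematics and Theoretical Physics; University of Cambridge; Department of Mathematics; Simon Fraser University

AI summary This paper proposes Christoffel-DPS, which uses the Christoffel function to derive non-asymptotic bounds on the number of sensors for arbitrary distributions; it outperforms classical Gaussian approaches and suits low-sensor-budget regimes on non-Gaussian benchmarks.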

Abstract

State estimation is a critical task in scientific, engineering and control applications. Since the reliability of reconstructions depends on the number and position of sensors, optimal sensor placement (OSP) is essential in scenarios where measurements are sparse and expensive. Classical OSP approaches rely on Gaussian assumptions and are consequently unable to account for the complex distributions encountered in many real-world systems. Generative-model-based reconstruction using sensor-guided diffusion posterior sampling (DPS) has emerged as a promising technique for reconstructing states from highly complex distributions. However, existing sensor-selection methods either require unrealistically many sensors or emulate classical OSP, creating a mismatch between modern recovery models and classical OSP tools. This motivates fundamentally new approaches to OSP that match the recent advances in powerful recovery models. We introduce a distribution-free sensor placement framework based on the Christoffel function: a mathematical formulation of optimal sampling and recovery guarantees for posterior sampling with arbitrary sensors and signal distributions, from which we derive a new OSP strategy with non-asymptotic bounds on the number of sensors needed for recovery. We develop Christoffel-DPS, with offline and online variants, instantiating Christoffel sampling for generative models. Christoffel-DPS outperforms Gaussian OSP baselines and existing generative-model placement methods, validating that distribution-free sensing is both theoretically principled and practically superior. The framework is model-agnostic; we demonstrate its application to a range of unconditional DPS and flow-matching models on structurally non-Gaussian benchmarks, showing the efficacy of Christoffel-DPS in low sensor budget regimes.

2605.06859 2026-05-11 cs.CV cs.AI cs.LG

Knowledge Transfer Scaling Laws for 3D Medical Imaging

Ho Hin Lee, Dongna Du, Chu Wang, Yuankai Huo, Shi Gu, James C. Gee, Yifan Wu

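Affiliations Vanderbilt University; Zhejiang University; McGill University; University of Pennsylvania

AI summary This paper studies pretraining scaling behavior across domains in 3D medical imaging and proposes a transferability-based data allocation method that improves cross-domain transfer; experiments show stronger results on disease classification and organ segmentation.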

Comments 20 Pages

Abstract

Vision foundation models are increasingly moving beyond 2D to volumetric domains such as 3D medical imaging, where unified pretraining across different imaging modalities (i.e. CT, MRI, and PET) could provide foundational models for diverse clinical tasks. However, training such models requires mixing heterogeneous imaging domains, and current mixture strategies remain largely heuristic. In this work, we observe that different medical imaging domains scale at variable rates during pretraining, and knowledge transfer between domains is strongly asymmetric: training on one domain can substantially improve another, but the reverse may be much weaker. Interestingly, both MAE reconstruction loss and cross-domain transfer follow predictable power-law trends with domain-specific behaviors. Motivated by these findings, we formulate data allocation as a scaling-law optimization problem. The derived allocations reveal an interpretable hub-and-island structure: highly transferable domains emerge as hubs that benefit many others and deserve strategic allocation, while isolated domains act as islands requiring direct investment. Empirically, transfer-aware allocation outperforms data-proportional sampling by up to 58% and generalizes well to unseen budgets with r=0.989. Downstream validation on disease classification and organ/lesion segmentation further confirms that the derived transfer-aware mixtures provide stronger pretrained representations for clinical 3D medical imaging tasks.
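A power-law trend of the kind mentioned above is usually fitted by linear regression in log-log space. The sketch below assumes the simplest form, loss = a * n^(-b), with no irreducible-loss offset; that functional form, the function names, and the data in the usage note are illustrative assumptions, not the paper's fitting procedure or results.

```python
import math

def fit_power_law(ns, losses):
    """Least-squares fit of loss = a * n ** (-b) in log-log space.

    Returns (a, b). Assumes all inputs are strictly positive.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def predict(a, b, n):
    """Extrapolate the fitted law to a new data budget n."""
    return a * n ** (-b)
```

Fitting one such curve per domain, and one per source-target pair for transfer, would give the ingredients for treating data allocation as an optimization over predicted losses; how the paper combines them is not specified in the abstract.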

2605.06850 2026-05-11 cs.LG cs.AI

How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment

如何在RL后训练中压缩KV缓存?面向内存高效对齐的影子遮罩蒸馏

Rui Zhu, Weiheng Bai, Qiushi Wu, Yang Ren, Haixu Tang, Yuchu Liu

发表机构 * Yale University(耶鲁大学) University of Minnesota Twin Cities(明尼苏达大学双城分校) Indiana University Bloomington(印第安纳大学布卢明顿分校)

AI总结 本文提出影子遮罩蒸馏方法,用于缓解RL后训练rollout阶段的KV缓存内存瓶颈,并针对KV压缩造成的采样器(稀疏上下文)与学习器(稠密上下文)不一致所引入的离策略偏置。

详情
AI中文摘要

强化学习(RL)已成为解锁大语言模型(LLMs)高级推理能力的关键范式,包括RLHF和RLAIF等框架。无论采用哪种优化算法(如PPO、GRPO或在线DPO),在线RL本质上都需要生成探索轨迹(rollout)阶段。然而,对于长上下文推理任务,此阶段因Key-Value(KV)缓存占用过大而面临严重的“内存墙”问题。在rollout过程中应用KV缓存压缩虽可缓解内存开销,但会引入关键的离策略(off-policy)偏置。尽管现代KV压缩在标准推理中通常接近无损,但即使微小的近似误差也会因RL优化固有的不稳定性而被大幅放大。具体而言,采样器在稀疏上下文中生成响应,而学习器则使用完整的稠密上下文更新参数。现有统计解决方案(如重要性重加权)难以纠正这一被放大的偏置,且存在梯度方差高、样本效率严重低下的问题。

英文摘要

Reinforcement Learning (RL) has emerged as a crucial paradigm for unlocking the advanced reasoning capabilities of Large Language Models (LLMs), encompassing frameworks like RLHF and RLAIF. Regardless of the specific optimization algorithm (e.g., PPO, GRPO, or Online DPO), online RL inherently requires an exploratory trajectory generation (rollout) phase. However, for long-context reasoning tasks, this rollout phase imposes a severe "memory wall" due to the exorbitant Key-Value (KV) cache footprint. While applying KV cache compression during rollouts mitigates this memory overhead, it induces a critical off-policy bias. Although modern KV compression is often nearly lossless during standard inference, even minuscule approximation errors are drastically amplified by the inherent instability of RL optimization. Specifically, the sampler generates responses under a sparse context, whereas the learner updates parameters using the full, dense context. Existing statistical solutions, such as importance reweighting, struggle to correct this magnified bias, suffering from high gradient variance and severe sample inefficiency.
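
To make the "memory wall" concrete, the usual back-of-the-envelope estimate of KV cache size for a decoder-only transformer is sketched below. The model configuration (32 layers, 8 KV heads, head dimension 128, 16-bit cache) and the rollout shape are hypothetical numbers chosen for illustration, not values from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size,
                   bytes_per_elem=2):
    """Bytes needed to hold keys and values for every layer, head, and token."""
    # The leading factor 2 accounts for storing both a K and a V tensor.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# Hypothetical rollout: 16 parallel samples, 32,768 tokens each.
dense = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                       seq_len=32768, batch_size=16)
dense_gib = dense / 2**30
```

Under these assumed numbers the dense cache alone occupies 64 GiB, which is why compressing it during rollouts is attractive despite the sampler/learner mismatch described above.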

2605.06841 2026-05-11 cs.AI cs.LG

AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites

AGWM:基于 affordance 的世界模型用于具有组合前提条件的环境

Qinshi Zhang, Weipeng Deng, Zhihan Jiang, Jiaming Qu, Qianren Li, Weitao Xu, Ray LC

发表机构 * University of California, San Diego(加州大学圣地亚哥分校) University of Hong Kong(香港大学) Columbia University(哥伦比亚大学) Amazon(亚马逊) City University of Hong Kong(香港城市大学)

AI总结 本文提出 AGWM 模型,通过学习抽象 affordance 结构来跟踪动作的动态可执行性,以解决传统世界模型在多步预测中的误差累积问题。

Comments 16 pages, 3 figures, 4 tables. Appendix on pages 11-16 (main text is self-contained)

详情
AI中文摘要

在基于模型的学习中,智能体基于世界模型的预测来模拟轨迹,从而学习行为。标准世界模型通常学习一个平稳的转移函数,将状态和动作映射到下一个状态。当动作和结果在训练数据中频繁共现时,模型倾向于将这种相关性内化为一般的因果规则,而忽略动作前提条件。然而,在交互环境中,智能体的动作可以改变未来的 affordance 空间。在每个时间步,一个动作可能只有在其前提条件得到满足后才变得可执行,或在前提条件被破坏时变得不可执行。我们将此类事件称为结构改变事件(SC 事件)。因此,传统世界模型往往无法确定当前状态下给定动作是否可执行,尤其是在多步预测中。每个想象的步骤都基于错误的 affordance 状态,因此预测误差在 rollout 时间范围内不断累积。在本文中,我们提出 AGWM(基于 affordance 的世界模型),它学习一个抽象的 affordance 结构,表示为前提依赖关系的 DAG,以显式跟踪动作的动态可执行性。在基于游戏的模拟环境中进行的实验表明我们的方法是有效的:它实现了更低的多步预测误差、对新配置更好的泛化能力以及更强的可解释性。

英文摘要

In model-based learning, the agent learns behaviors by simulating trajectories based on world model predictions. Standard world models typically learn a stationary transition function that maps states and actions to next states. When an action and an outcome frequently co-occur in training data, the model tends to internalize this correlation as a general causal rule while ignoring action preconditions. In interactive environments, however, agent actions can reshape the future affordance space. At each timestep, an action may become executable only after its prerequisites are met, or non-executable when they are destroyed. We term such events structure-changing events (SC events). As a result, a conventional world model often fails to determine whether a given action is executable in the current state, especially in multi-step predictions. Each imagined step is conditioned on an incorrect affordance state, and therefore the prediction error compounds over the rollout horizon. In this paper, we propose AGWM (Affordance-Grounded World Model), which learns an abstract affordance structure represented as a DAG of prerequisite dependencies to explicitly track the dynamic executability of actions. Experiments on game-based simulated environments demonstrate the effectiveness of our method by achieving lower multi-step prediction error, better generalization to novel configurations, and improved interpretability.
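
The idea of tracking executability through prerequisite dependencies can be sketched with a small hand-written structure. This is only an illustration under stated assumptions: the class `AffordanceTracker`, the action names, and the fact names are hypothetical, and the prerequisites are given by hand here, whereas AGWM is described as learning its affordance structure.

```python
class AffordanceTracker:
    """Tracks which actions are executable given the facts that currently hold."""

    def __init__(self, prerequisites):
        # Maps each action to the set of facts it requires (edges of a DAG).
        self.prerequisites = prerequisites
        self.facts = set()

    def is_executable(self, action):
        # An action is executable when all of its prerequisites currently hold.
        return self.prerequisites[action] <= self.facts

    def apply_event(self, gained=(), lost=()):
        # A structure-changing event adds and/or destroys facts.
        self.facts |= set(gained)
        self.facts -= set(lost)

tracker = AffordanceTracker({
    "collect_wood": set(),
    "craft_pickaxe": {"wood"},
    "mine_stone": {"pickaxe"},
})
before = tracker.is_executable("mine_stone")
tracker.apply_event(gained={"wood"})
tracker.apply_event(gained={"pickaxe"}, lost={"wood"})
after = tracker.is_executable("mine_stone")
tracker.apply_event(lost={"pickaxe"})
broken = tracker.is_executable("mine_stone")
```

A stationary transition model has no such explicit state, so an imagined rollout can keep predicting `mine_stone` outcomes after the pickaxe is gone; that is the compounding error the abstract refers to.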

2605.06835 2026-05-11 cs.LG cs.AI

On Privacy Leakage in Tabular Diffusion Models: Influential Factors, Attacker Knowledge, and Metrics

关于表格扩散模型中的隐私泄露:影响因素、攻击者知识和度量标准

Masoumeh Shafieinejad, D. B. Emerson, Behnoosh Zamanlooy, Elaheh Bassak, Fatemeh Tavakoli, Sara Kodeiri, Marcelo Lotif, Xi He

发表机构 * Vector Institute(向量研究所) McMaster University(麦克马斯特大学) University of Toronto(多伦多大学) University of Waterloo(滑铁卢大学)

AI总结 研究探讨了表格扩散模型中隐私泄露的影响因素,通过黑盒和白盒设置中的最新成员推断攻击,量化了训练设置、合成选择和攻击者知识对隐私泄露的影响,并揭示了启发式隐私度量的缺陷。

Comments 23 pages, 11 Figures, 12 Tables

详情
AI中文摘要

表格数据在许多领域和行业中发挥重要作用,包括那些具有高隐私考虑和风险的领域。因此,生成高质量的合成代理以减少隐私风险和专有数据暴露变得越来越重要。鉴于表格扩散模型(TDMs)在合成此类数据方面表现出领先性能,理解并衡量这些模型相关的隐私风险至关重要。通过利用最新的TDMs成员推断攻击,在黑盒和白盒设置中,本研究量化了训练设置、合成选择和攻击者知识对隐私泄露的影响。此外,结果表明,攻击者不需要拥有完美的训练设置知识、相同的数据分布或大规模计算资源即可构建成功的攻击。最后,应用启发式隐私度量(如最接近记录的距离)的缺陷被揭示。

英文摘要

Tabular data plays an important role in many fields and industries, including those with elevated privacy considerations and risks. As such, there is a rising interest in generating high-quality synthetic proxies for real tabular data as a means of reducing privacy risk and proprietary data exposure. With tabular diffusion models (TDMs) demonstrating leading performance in synthesizing such data, understanding and measuring the privacy risks associated with these models is imperative. Leveraging state-of-the-art membership inference attacks for TDMs in both black- and white-box settings, this work quantifies the impact of training setup, synthesis choices, and attacker knowledge on privacy leakage. Moreover, the results demonstrate that adversaries need not have perfect knowledge of the training setup, identical data distributions, or massive compute resources to construct successful attacks. Finally, the pitfalls associated with applying heuristic privacy metrics, such as distance-to-closest record, are revealed.
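
For reference, the distance-to-closest-record heuristic named in the abstract is commonly computed as the distance from each synthetic row to its nearest training row. The sketch below assumes purely numeric, already-scaled features and Euclidean distance; the function name and toy arrays are illustrative, and the paper's exact variant is not specified in the abstract.

```python
import numpy as np

def distance_to_closest_record(synthetic, train):
    """For each synthetic row, the Euclidean distance to its nearest training row."""
    diffs = synthetic[:, None, :] - train[None, :, :]   # shape (n_syn, n_train, d)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))          # pairwise distances
    return dists.min(axis=1)

# Toy data (hypothetical): the first synthetic row copies a training row exactly.
train = np.array([[0.0, 0.0], [1.0, 1.0]])
synthetic = np.array([[0.0, 0.0], [3.0, 5.0]])
dcr = distance_to_closest_record(synthetic, train)
```

A DCR of zero flags an exact copy, but a large DCR does not by itself rule out membership leakage, which is consistent with the abstract's warning about relying on such heuristics.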

2605.06834 2026-05-11 cs.LG

Attribution-Based Neuron Utility for Plasticity Restoration in Deep Networks

基于归因的神经元效用用于深度网络中可塑性的恢复

Patrick Elisii, Lucas Beauchemin, Dawer Jamshed

发表机构 * The Vanguard Group, Inc.(先锋集团)

AI总结 本文提出“梯度乘以与参考值之差”(GXD)这一效用度量,用于改进深度网络持续学习中的可塑性恢复,通过估计替换单元的一阶功能成本提升重置干预的可靠性。

详情
AI中文摘要

持续学习研究试图保留两项基本能力:新知识的获取和先前知识的保持。尽管知识可通过隐式或显式任务空间上的性能来衡量,模型可塑性通常涉及数据分布变化时的适应性。尽管文献大多关注灾难性遗忘,深度网络也可能出现可塑性丧失,在持续训练下变得越来越难以更新。最近的研究发现,这种现象背后存在多种机制,包括神经元饱和、参数范数增长和有用曲率方向的丧失。基于适应性重置的干预,即选择性地重新初始化低效用网络参数,已成为恢复可训练性的实用解决方案。现有用于指导重置的效用度量,如激活幅度、贡献效用或基于梯度的活动,依赖于代理信号,这些信号可能与它们所要指导的干预不一致。在本文中,我们引入了梯度乘以与参考值之差(gradient times difference from reference,GXD),这是一种有理论依据的效用度量,基于带参考的梯度归因,估计替换单元的一阶功能成本。我们的结果表明,在现有重置标准退化的情况下,与重置功能成本对齐的效用度量可以使干预更加可靠。GXD将适应性重置重新表述为干预成本估计问题,为更稳健的持续学习系统提供了一条实用路径。

英文摘要

Continual learning research attempts to conserve two fundamental capabilities: new knowledge acquisition and the preservation of previously acquired knowledge. While knowledge in this case can be measured through performance over an implicit or explicit task space, model plasticity generally concerns adaptability as data distributions evolve. Though much of the literature has focused on catastrophic forgetting, deep networks can also suffer from loss of plasticity, becoming progressively harder to update under continued training. Recent research has identified multiple mechanisms underlying this phenomenon, including neuron saturation, parameter norm growth, and loss of useful curvature directions. Adaptive reset-based interventions, which selectively reinitialize low-utility network parameters, have emerged as practical solutions to restore trainability. Existing utility measures used to guide resets, such as activation magnitude, contribution utility, or gradient-based activity, rely on proxy signals that can become misaligned with the intervention they are meant to guide. In this paper, we introduce gradient times difference from reference (GXD), a theoretically motivated utility measure based on reference-based gradient attribution that estimates the first-order functional cost of replacing a unit. Our results show that utility measures aligned with the functional cost of the reset can make interventions more reliable in settings where existing reset criteria degrade. GXD reframes adaptive resetting as an intervention cost estimation problem, providing a practical path toward more robust continual learning systems.
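
One plausible reading of "gradient times difference from reference" is the first-order Taylor estimate of how much the output would change if a unit's activation were swapped for a reference value. The sketch below follows that reading only; the function name, the zero reference, and the toy linear readout are assumptions, and the paper's exact definition may differ.

```python
import numpy as np

def gxd_utility(grad, activation, reference):
    """|gradient * (activation - reference)| per unit: a first-order estimate
    of the output change if the unit were replaced by its reference value."""
    return np.abs(grad * (activation - reference))

# Linear readout y = w @ h, so dy/dh = w and the first-order estimate is exact.
w = np.array([0.5, -2.0, 0.0])     # gradient of the output w.r.t. each unit
h = np.array([1.0, 0.5, 3.0])      # current activations
ref = np.zeros(3)                  # assumed reference: a zeroed unit
utility = gxd_utility(w, h, ref)
```

In this toy case the third unit has the largest activation magnitude yet zero utility, because the readout ignores it. That is the kind of mismatch between proxy signals and actual reset cost that the abstract describes.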

2605.06832 2026-05-11 cs.CL cs.AI cs.LG

IntentGrasp: A Comprehensive Benchmark for Intent Understanding

IntentGrasp:意图理解的综合基准

Yuwei Yin, Chuyuan Li, Giuseppe Carenini

发表机构 * Department of Computer Science, University of British Columbia(不列颠哥伦比亚大学计算机科学系)

AI总结 本文提出IntentGrasp基准,用于评估大语言模型的意图理解能力,通过训练集和测试集评估20种模型,发现其表现不佳,提出Intentional Fine-Tuning方法提升性能。

Comments IntentGrasp data is available on [Hugging Face](https://huggingface.co/datasets/yuweiyin/IntentGrasp), and the code is released on [GitHub](https://github.com/YuweiYin/IntentGrasp)

详情
AI中文摘要

准确理解语音、对话和写作背后的意图对于开发有用的大型语言模型(LLM)助手至关重要。本文介绍了IntentGrasp,一个用于评估LLM意图理解能力的综合基准。该基准基于49个高质量、开放许可、涵盖12个不同领域的语料库,通过源数据集筛选、意图标签上下文化和任务格式统一构建。IntentGrasp包含262,759个实例的大规模训练集和两个评估集:包含12,909个测试案例的All集,以及更平衡且更具挑战性、包含470个案例的Gem集。对7个家族的20种LLM(包括GPT-5.4、Gemini-3.1-Pro和Claude-Opus-4.7等前沿模型)的广泛评估显示,其表现不佳,All集得分低于60%,Gem集低于25%。值得注意的是,20种模型中有17种在Gem集上的表现比随机猜测基线(15.2%)更差,而估计的人类表现约为81.1%,显示出显著的改进空间。为增强此能力,本文提出意图微调(IFT),在IntentGrasp训练集上微调模型,在All集上F1提升30分以上,在Gem集上提升20分以上。留一领域(Lodo)实验进一步证明了IFT的强跨领域泛化能力,验证了其作为大幅增强LLM意图理解能力的有前景方法。总体而言,通过基准测试和提升意图理解能力,本研究为更意图导向、有能力且安全的AI助手开辟了有前途的道路,以造福人类和社会。

英文摘要

Accurately understanding the intent behind speech, conversation, and writing is crucial to the development of helpful Large Language Model (LLM) assistants. This paper introduces IntentGrasp, a comprehensive benchmark for evaluating the intent understanding capability of LLMs. Derived from 49 high-quality, open-licensed corpora spanning 12 diverse domains, IntentGrasp is constructed through source dataset curation, intent label contextualization, and task format unification. IntentGrasp contains a large-scale training set of 262,759 instances and two evaluation sets: an All Set of 12,909 test cases and a more balanced and challenging Gem Set of 470 cases. Extensive evaluations on 20 LLMs across 7 families (including frontier models such as GPT-5.4, Gemini-3.1-Pro, and Claude-Opus-4.7) demonstrate unsatisfactory performance, with scores below 60% on All Set and below 25% on Gem Set. Notably, 17 out of 20 tested models perform worse than a random-guess baseline (15.2%) on Gem Set, while the estimated human performance is ~81.1%, showing substantial room for improvement. To enhance such ability, this paper proposes Intentional Fine-Tuning (IFT), which fine-tunes the models on the training set in IntentGrasp, yielding significant gains of 30+ F1 points on All Set and 20+ points on Gem Set. Tellingly, the leave-one-domain-out (Lodo) experiments further demonstrate the strong cross-domain generalizability of IFT, verifying that it is a promising approach to substantially enhancing the intent understanding of LLMs. Overall, by benchmarking and boosting intent understanding ability, this study sheds light on a promising path towards more intentional, capable, and safe AI assistants for human benefit and social good.

2605.06830 2026-05-11 cs.LG cs.CL

ProtSent: Protein Sentence Transformers

ProtSent:蛋白质句子变换器

Dan Ofer, Oriel Perets, Michal Linial, Nadav Rappoport

发表机构 * Department of Biological Chemistry(生物化学系) Department of Computer and Information Science(计算机与信息科学系) The Hebrew University of Jerusalem(耶路撒冷希伯来大学) Ben-Gurion University of the Negev(内盖夫本·古里安大学)

AI总结 ProtSent通过对比学习框架改进蛋白质语言模型,提升蛋白质功能和结构表征能力,在23项下游任务中表现优异,尤其在远程同源检测和结构检索中取得显著提升。

Comments 9 figures, appendix, 2 figures, open code and models

详情
AI中文摘要

蛋白质语言模型(pLMs)生成的逐残基表示能够捕捉进化和结构信息,但其均值池化序列嵌入并未显式训练以反映蛋白质之间的功能、进化或结构相似性。我们提出了蛋白质句子变换器(ProtSent),一种用于将pLMs适配为通用嵌入模型的对比微调框架。ProtSent使用MultipleNegativesRankingLoss在五个蛋白质对数据集上训练:Pfam家族、基于结构得到的难负样本、AlphaFold DB结构对、StringDB蛋白质-蛋白质相互作用以及深度突变扫描数据。我们在23项下游任务上评估,使用冻结嵌入和k近邻探针测量嵌入邻域质量。在ESM-2 150M上,ProtSent在23项任务中的15项上取得提升,远程同源检测提升105%,变体效应预测提升17%,SCOPe-40结构检索Recall@1提升19.9%。35M变体在23项任务中的16项上取得提升,远程同源检测提升40.5%,SCOPe-40结构检索Recall@1提升15.5%。对比微调重构了嵌入空间,以更好地捕捉蛋白质功能和结构,无需任何任务特定监督。我们发布了模型、公共数据、训练配方和代码。

英文摘要

Protein language models (pLMs) produce per-residue representations that capture evolutionary and structural information, yet their mean-pooled sequence embeddings are not explicitly trained to reflect functional, evolutionary or structural similarity between proteins. We present Protein Sentence Transformers (ProtSent), a contrastive fine-tuning framework for adapting pLMs into general-purpose embedding models. ProtSent trains with MultipleNegativesRankingLoss across five protein-pair datasets: Pfam families, structurally derived hard negatives, AlphaFold DB structural pairs, StringDB protein-protein interactions, and Deep Mutational Scanning data. We evaluate on 23 downstream tasks using frozen embeddings with a k-nearest-neighbor probe to measure embedding neighborhood quality. On ESM-2 150M, ProtSent improves 15 of 23 tasks, with gains of +105% on remote homology detection, +17% on variant effect prediction, and +19.9% Recall@1 on SCOPe-40 structural retrieval. The 35M variant improves 16 of 23 tasks with +40.5% on remote homology and +15.5% Recall@1 on SCOPe-40. Contrastive fine-tuning restructures the embedding space to better capture protein function and structure, without any task-specific supervision. We release the models, public data, and training recipe and code.
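
The in-batch contrastive objective behind MultipleNegativesRankingLoss can be written in a few lines: each anchor should score its own positive above every other positive in the batch. The sketch below is a plain NumPy re-implementation for illustration, not the library code; cosine similarity and the scale value of 20 are assumptions chosen to mirror a common configuration, and the toy embeddings are hypothetical.

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities; row i's target is column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                        # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy embeddings: matched pairs give a near-zero loss, mismatched pairs a large one.
anchors = np.eye(3)
loss_matched = multiple_negatives_ranking_loss(anchors, np.eye(3))
loss_mismatched = multiple_negatives_ranking_loss(anchors, np.eye(3)[[1, 2, 0]])
```

Because the other positives in the batch act as negatives, no explicit negative mining is required, although the abstract notes that one of the five datasets does supply structurally derived hard negatives.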

2605.06829 2026-05-11 cs.LG cs.CV cs.ET cs.IT cs.NE math.IT

A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models

扩散、基于分数的和流匹配生成模型的统一测度论观点

Aditya Ranganath, Mukesh Singhal

发表机构 * Center for Applied Scientific Computing(应用科学计算中心) Lawrence Livermore National Laboratory(劳伦斯利弗莫尔国家实验室) University of California Merced(加州默塞德大学)

AI总结 本文从测度论角度统一了扩散、基于分数和流匹配生成模型,探讨了时间依赖向量场诱导的边缘分布,并分析了采样、稳定性和计算的权衡。

Comments 62 pages, 1 figure, jmlr preprint

详情
AI中文摘要

我们综述了一类连续时间生成建模方法,它们通过随机或确定性动力学将简单参考分布运输到数据分布。我们提出了一种统一框架,其中扩散模型、基于分数的生成模型和流匹配都是学习时间依赖向量场的实例,该向量场诱导出一族边缘分布(ρ_t)_{t ∈ [0,1]},由连续性方程和福克-普朗克(Fokker-Planck)方程支配。这样的统一理论恰逢其时,因为这些方法在方法论上趋于一致,但碎片化的符号和相互竞争的推导仍然掩盖了它们的共同结构,以及采样、稳定性和计算方面的实际权衡。在此框架内,我们(i)将扩散和基于分数模型的反向时间采样推导为受控随机动力学,(ii)证明概率流ODE产生相同的边缘分布,并将扩散与基于似然的归一化流联系起来,(iii)将流匹配解释为在选定插值下对速度场的直接回归,澄清了它与基于分数的训练何时一致、何时不同。我们在统一符号下比较了训练目标、采样方案和离散化误差,讨论了与薛定谔桥(Schrödinger bridge)和熵最优传输的联系,并总结了关于近似、稳定性和可扩展性的理论保证与开放问题。

英文摘要

We survey continuous-time generative modeling methods based on transporting a simple reference distribution to a data distribution via stochastic or deterministic dynamics. We present a unified framework in which diffusion models, score-based generative models, and flow matching are instances of learning a time-dependent vector field that induces a family of marginals $(\rho_t)_{t \in [0,1]}$ governed by continuity and Fokker-Planck equations. Such a unified theory is timely because these methods are converging methodologically, yet fragmented notation and competing derivations continue to obscure their shared structure and the practical tradeoffs governing sampling, stability, and computation. Within this framework, we (i) derive reverse-time sampling for diffusion and score-based models as controlled stochastic dynamics, (ii) show that the probability flow ODE yields identical marginals and connects diffusion to likelihood-based normalizing flows, and (iii) interpret flow matching as direct regression of the velocity field under a chosen interpolation, clarifying when it coincides with or differs from score-based training. We compare objectives, sampling schemes, and discretization errors under unified notation, discuss connections to Schrödinger bridges and entropic optimal transport, and summarize theoretical guarantees and open problems on approximation, stability, and scalability.
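
For orientation, the standard textbook forms of the objects named in the abstract are collected below. The notation ($f$, $g$, $v_t$, $v_\theta$) and the linear interpolation are assumptions chosen for this sketch and may differ from the survey's own conventions, including the direction of time.

```latex
% Forward SDE and its Fokker-Planck equation for the marginals \rho_t
dX_t = f(X_t, t)\,dt + g(t)\,dW_t,
\qquad
\partial_t \rho_t = -\nabla \cdot (f \rho_t) + \tfrac{1}{2} g(t)^2 \Delta \rho_t .

% The same marginals satisfy a continuity equation with a deterministic velocity
v_t(x) = f(x, t) - \tfrac{1}{2} g(t)^2 \nabla \log \rho_t(x),
\qquad
\partial_t \rho_t + \nabla \cdot (v_t \rho_t) = 0 .

% Probability flow ODE: deterministic dynamics with identical marginals
\frac{dx}{dt} = v_t(x).

% Flow matching under the linear interpolation X_t = (1 - t) X_0 + t X_1
\mathcal{L}(\theta) = \mathbb{E}_{t, X_0, X_1}
  \bigl\| v_\theta(X_t, t) - (X_1 - X_0) \bigr\|^2 .
```

The second line follows from the first because $\nabla \cdot (\rho_t \nabla \log \rho_t) = \Delta \rho_t$, which is why a learned score and a learned velocity field can describe the same family of marginals.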

2605.06825 2026-05-11 cs.AI cs.RO

Randomness is sometimes necessary for coordination

随机性有时是协调所必需的

Rohan Patil, Jai Malegaonkar, Henrik I. Christensen

发表机构 * Department of Computer Science and Engineering(计算机科学与工程系)

AI总结 本文提出Diamond Attention架构,通过引入随机数实现临时排名,使不同角色在协作中得以区分,从而在不同规模的团队中实现零样本部署,展示了结构化随机在协调任务中的关键作用。

详情
AI中文摘要

在面向同质智能体的合作多智能体强化学习(MARL)中,全参数共享是标准做法。然而,在排列对称的观测下,共享的确定性策略会为每个智能体输出相同的动作分布,使角色区分成为不可能。这种失败在理论上可以通过在匿名的相同处理器之间打破对称性来解决,而这需要随机性。我们提出了Diamond Attention,一种交叉注意力架构,其中每个智能体在每个时间步采样一个标量随机数,诱导出临时的排名顺序,使低排名的同伴在智能体间注意力中被遮蔽,同时保持任务注意力完全未遮蔽。这在单次广播轮次中实现了随机比特协调协议,且基于集合的注意力使零样本部署到不同规模的团队成为可能。我们在三类场景中进行评估,以分离出结构化随机性何时起作用。在完全对称的XOR游戏中,我们的方法达到了1.0的成功率,而所有确定性基线都停滞在0.5附近。在控制协调任务中,在N=4上训练的策略可零样本泛化到N∈[2,8]。在SMACLite跨场景迁移中,我们实现了零样本迁移,而标准基线因结构性限制无法迁移。此外,用标准的基于dropout的随机性替换结构化掩码会导致0%的胜率,证实了起作用的成分是协议空间结构而非随机噪声。

英文摘要

Full parameter sharing is standard in cooperative multi-agent reinforcement learning (MARL) for homogeneous agents. Under permutation-symmetric observations, however, a shared deterministic policy outputs identical action distributions for every agent, making role differentiation impossible. This failure can theoretically be resolved using symmetry breaking among anonymous identical processors, which requires randomness. We propose Diamond Attention, a cross-attention architecture in which each agent samples a scalar random number per timestep, inducing a transient rank ordering that masks lower-ranked peers from agent-to-agent attention while leaving task attention fully unmasked. This realizes a random-bit coordination protocol in a single broadcast round, and the set-based attention enables zero-shot deployment to teams of different sizes. We evaluate across three regimes that isolate when structured randomness matters. On the perfectly symmetric XOR game, our method achieves $1.0$ success while all deterministic baselines plateau near $0.5$. On control coordination tasks, a policy trained on $N=4$ generalizes zero-shot to $N \in [2,8]$. On SMACLite cross-scenario transfer, we achieve zero-shot transfer where standard baselines cannot transfer due to structural limitations. Furthermore, replacing the structured mask with standard dropout-based randomness results in a 0% win rate, confirming that protocol-space structure, not stochastic noise, is the operative ingredient. https://anonymous.4open.science/r/randomness-137A/
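
The rank-based masking step can be illustrated independently of the attention layers. In this sketch each agent draws one scalar per timestep and may attend only to itself and to peers whose draw is at least as large; treating a smaller draw as "lower-ranked" is an assumption, as are the function name and seed.

```python
import numpy as np

def rank_mask(num_agents, rng):
    """Boolean mask M[i, j]: True if agent i may attend to agent j."""
    draws = rng.random(num_agents)      # one scalar per agent, per timestep
    # Peers with a smaller draw than agent i are masked out of row i.
    return draws[None, :] >= draws[:, None]

mask = rank_mask(4, np.random.default_rng(0))
visible_counts = sorted(mask.sum(axis=1).tolist())
```

With distinct draws, every agent sees a different number of peers (here 1 through 4), so otherwise identical agents receive different inputs and a shared policy can assign different roles. The function works for any team size, which fits the zero-shot transfer across team sizes claimed in the abstract.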

2605.06822 2026-05-11 cs.LG

SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents

SHARP:一种用于金融交易智能体的自进化、人类可审计的评分准则策略

Xiwen Chen, Wenhui Zhu, Songzhu Zheng, Kashif Rasul, Yueyue Deng, Huayu Li

发表机构 * Morgan Stanley(摩根士丹利) Arizona State University(亚利桑那州立大学) Columbia University(哥伦比亚大学) University of Arizona(亚利桑那大学)

AI总结 SHARP通过结构化规则优化解决金融交易代理中信用分配问题,提升策略鲁棒性和透明度,使紧凑模型性能提升10-20个百分点。

详情
AI中文摘要

大型语言模型(LLMs)越来越多地用于自主金融交易,该领域需要持续适应嘈杂、非平稳市场。现有自我改进代理通常通过无约束的自由文本优化解决此问题。然而,在低信噪比和延迟标量奖励(P&L)环境中,这种无结构方法加剧了根本的信用分配问题:优化器无法可靠地区分系统性逻辑错误与随机市场波动,导致策略漂移。为克服这一瓶颈,我们引入了自进化可审计规则策略(SHARP),一种神经符号框架,用结构化符号策略优化取代无约束文本变异。SHARP将代理的推理限制在一组明确的条件-动作规则的有界范围内。当次优交易发生时,一个归因代理通过跨样本推理来隔离特定规则失败。这使得可以进行针对性的原子策略编辑,随后通过严格的时间序列验证进行正则化。在三个不同的股票行业和四个LLM基础架构上评估,SHARP一致将通用初始启发式转化为高度稳健的策略,使紧凑模型的实证性能平均提升10到20个百分点(例如,GPT-4o-mini)。最终,SHARP证明LLMs可以实现动态和高效的适应,同时显著提高机构金融所需的结构透明度和可审计性。

英文摘要

Large language models (LLMs) are increasingly deployed for autonomous financial trading, a domain requiring continuous adaptation to noisy, non-stationary markets. Existing self-improving agents typically address this through unbounded free-form prompt optimization. However, in low signal-to-noise environments with delayed scalar rewards (P&L), this unstructured approach exacerbates the fundamental credit assignment problem: optimizers cannot reliably distinguish systematic logic flaws from stochastic market variance, inevitably leading to policy drift. To overcome this bottleneck, we introduce the Self-Evolving Human-Auditable Rubric Policy (SHARP), a neuro-symbolic framework that replaces unconstrained text mutation with structured, symbolic policy optimization. SHARP confines the agent's reasoning to a bounded, human-readable rubric of explicit condition-action rules. When sub-optimal trades occur, an attribution agent employs cross-sample reasoning to isolate specific rule failures. This enables targeted, atomic policy edits that are subsequently regularized through strict walk-forward validation. Evaluated across three diverse equity sectors and four LLM backbones, SHARP consistently transforms generic initial heuristics into highly robust strategies, lifting the empirical performance of compact models by 10 to 20 percentage points on average (e.g., GPT-4o-mini). Ultimately, SHARP demonstrates that LLMs can achieve dynamic and efficient adaptation while significantly enhancing the structural transparency and auditability demanded by institutional finance.
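
Walk-forward validation, mentioned above as the regularizer for policy edits, evaluates each candidate only on data that comes strictly after the data used to produce it. The sketch below shows one common rolling-window variant; the window sizes and the function name are illustrative assumptions, and the abstract does not say which variant SHARP uses.

```python
def walk_forward_splits(num_periods, train_size, test_size):
    """Return (train_indices, test_indices) pairs in chronological order."""
    splits = []
    start = 0
    while start + train_size + test_size <= num_periods:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        splits.append((train, test))
        start += test_size          # roll forward by one test window
    return splits

splits = walk_forward_splits(num_periods=10, train_size=4, test_size=2)
```

Because every test window lies after its training window, an edit that merely fits noise in past trades has no look-ahead advantage when it is scored.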

2605.06821 2026-05-11 cs.LG cs.AI math.OC stat.ML

A Rod Flow Model for Adam at the Edge of Stability

稳定性边缘下Adam的杆流模型

Eric Regis, Sinho Chewi

发表机构 * Yale University(耶鲁大学)

AI总结 本文在参数和一阶矩的联合相空间中将杆流(rod flow)模型扩展到Adam等优化器,从而更精确地跟踪稳定性边缘区间内的离散迭代点。

详情
AI中文摘要

Cohen等人(arXiv:2207.14484)观察到自适应梯度方法如Adam在稳定性边缘运行。尽管已有大量关于梯度下降在稳定性边缘的连续时间建模工作,但将这些模型扩展到动量方法的研究仍不充分。在梯度下降设置中,Regis等人(arXiv:2602.01480)引入了杆流,将相邻迭代点建模为一个延展的一维对象,即“杆”。本文通过在参数和一阶矩(w, m)的联合相空间中工作,并将二阶矩ν视为平滑的辅助变量,将杆流扩展到Adam。我们还为重球动量、Nesterov动量以及RMSProp、Adam和NAdam的标量和逐分量版本开发了杆流。对于所有八个优化器,我们在代表性机器学习架构上对杆流进行了实证评估,结果表明它在跟踪稳定性边缘区间内的离散迭代点时,明显比相应的稳定流更准确。

英文摘要

Cohen et al. (arXiv:2207.14484) observed that adaptive gradient methods such as Adam operate at the edge of stability. While there has been significant work on continuous-time modeling of gradient descent at the edge of stability, extending these models to momentum methods remains underdeveloped. In the gradient descent setting, Regis et al. (arXiv:2602.01480) introduced rod flow, which models consecutive iterates as an extended one-dimensional object -- a "rod." Here we extend rod flow to Adam by working in the joint phase space of parameters and first moment $(w, m)$ and treating the second moment $\nu$ as a smooth auxiliary variable. We also develop rod flows for heavy ball momentum, Nesterov momentum, and scalar and per-component versions of RMSProp, Adam, and NAdam. For all eight optimizers, we empirically evaluate rod flow on representative machine learning architectures, where it tracks the discrete iterates through the edge-of-stability regime significantly more accurately than the corresponding stable flow.
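
As a reminder of the discrete dynamics being modeled, the standard Adam iteration is written out below with $\nu$ for the second moment, matching the abstract's notation. The symbols $\eta$, $\beta_1$, $\beta_2$, $\epsilon$, and the loss $L$ are the usual ones and are assumed here; the rod flow equations themselves are not given in the abstract and are not reproduced.

```latex
g_t = \nabla L(w_{t-1}), \qquad
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
\nu_t = \beta_2 \nu_{t-1} + (1 - \beta_2)\, g_t^{2},

\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}}, \qquad
\hat{\nu}_t = \frac{\nu_t}{1 - \beta_2^{t}}, \qquad
w_t = w_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{\nu}_t} + \epsilon}.
```

The pair $(w_t, m_t)$ is the joint phase-space state referred to above, while $\nu_t$ typically varies slowly when $\beta_2$ is close to 1, which is consistent with treating it as a smooth auxiliary variable.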

2605.06819 2026-05-11 cs.LG

A Theory of Online Learning with Autoregressive Chain-of-Thought Reasoning

自回归思维链推理的在线学习理论

Ilan Doron-Arad, Idan Mehalel, Elchanan Mossel

发表机构 * MIT(麻省理工学院) The Hebrew University(希伯来大学)

AI总结 本文研究自回归生成过程中的在线学习问题,探讨反馈形式对学习误差界的影响,证明在链式推理模型中可消除生成步数依赖,提出最优误差界及新下界。

详情
AI中文摘要

自回归生成是大语言模型的核心机制,可视为重复应用下一个标记生成器:从输入字符串开始,经过M步生成,最后生成的标记作为输出。[Joshi等人,2025]提出了PAC模型研究此类过程的可学习性。本文发展了该框架的在线版本,关注由未知下一个标记生成器诱导的最终输出学习的误差界。我们区分了两种反馈形式:在端到端模型中,学习者仅观察最终生成的标记;在链式推理模型中,学习者还看到完整的M步轨迹。我们的目标是理解最优误差界如何依赖于生成步数M,以及观察中间标记能减少这种依赖程度。主要结果表明,自回归在线学习的理论图景与[Hanneke等人,2026]发现的统计学图景相似,但依赖程度不同。在端到端模型中,我们证明了生成步数M的误差界增长率可能在常数到对数之间任意出现,并证明对数上限是不可避免的。在链式推理模型中,我们证明访问完整生成轨迹可完全消除对M的依赖。我们还分析了自回归线性阈值类,并证明了最优误差界及统计学设置的新下界。同时,我们的结果解决了[Joshi等人,2025]留下的几个问题。

英文摘要

Autoregressive generation lies at the heart of the mechanism of large language models. It can be viewed as the repeated application of a next-token generator: starting from an input string (prompt), the generator is applied for $M$ steps, and the last generated token is taken as the final output. [Joshi et al., 2025] proposed a PAC model for studying the learnability of the input-output maps arising from this process. We develop an online analogue of this framework, focusing on the mistake bound of learning the final output induced by an unknown next-token generator. We distinguish between two forms of feedback. In the End-to-End model, after each round the learner observes only the final token produced after $M$ autoregressive steps. In the Chain-of-Thought model, the learner is additionally shown the entire $M$-step trajectory. Our goal is to understand how the optimal mistake bound depends on the generation horizon $M$, and to what extent observing intermediate tokens can reduce this dependence. Our main results show that the online theory of autoregressive learning exhibits a qualitative picture analogous to the statistical one found by [Hanneke et al., 2026], but with a different scale of dependence on the generation horizon. In the End-to-End model, we prove a taxonomy of possible mistake-bound growth rates in the generation horizon $M$: essentially any rate between constant and logarithmic can arise. We further show that this logarithmic ceiling is unavoidable. In the Chain-of-Thought model, we show that access to the full generated trajectory eliminates the dependence on $M$ altogether. We also analyze autoregressive linear threshold classes, and prove optimal mistake bounds, as well as a new lower bound for the statistical setting. Along the way, our results resolve several questions left open by [Joshi et al., 2025].
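
The two feedback models differ only in how much of the generated trajectory the learner gets to see. The sketch below makes that concrete with a toy next-token generator (XOR of the last two tokens); the generator, prompt, and function names are hypothetical and serve only to show the interface.

```python
def generate(next_token, prompt, num_steps):
    """Apply a next-token generator autoregressively for num_steps steps."""
    tokens = list(prompt)
    for _ in range(num_steps):
        tokens.append(next_token(tokens))
    return tokens

def end_to_end_feedback(next_token, prompt, num_steps):
    # The learner observes only the final generated token.
    return generate(next_token, prompt, num_steps)[-1]

def chain_of_thought_feedback(next_token, prompt, num_steps):
    # The learner observes the entire generated trajectory.
    return generate(next_token, prompt, num_steps)[len(prompt):]

xor_generator = lambda tokens: tokens[-1] ^ tokens[-2]
final = end_to_end_feedback(xor_generator, [1, 0], num_steps=4)
trajectory = chain_of_thought_feedback(xor_generator, [1, 0], num_steps=4)
```

With end-to-end feedback a single observed bit must account for all `num_steps` applications of the unknown generator, whereas the full trajectory exposes each application separately. This gives some intuition for why the dependence on $M$ can vanish in the Chain-of-Thought model.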

2605.06815 2026-05-11 cs.AI cs.CV

Uneven Evolution of Cognition Across Generations of Generative AI Models

不同世代生成式AI模型认知能力的不均衡发展

Isaac Galatzer-Levy, Daniel McDuff, Xin Liu, Jed McGiffin

发表机构 * Google DeepMind(谷歌DeepMind) Google Research(谷歌研究) University of Washington(华盛顿大学)

AI总结 本文通过心理测量框架评估生成式AI的认知特征,发现其在语言抽象推理方面发展迅速,而视觉感知组织能力停滞,表明架构偏倚影响认知平衡发展。

Comments 25 pages, 5 Figures, 3 Tables

详情
AI中文摘要

对通用人工智能的追求需要稳健的方法来评估模型的认知能力,而不局限于狭义的任务表现。本文引入了一个心理测量框架来评估生成式AI的认知特征,将其与人类常模进行比较,并跟踪其在不同世代中的演变。使用改编自韦氏成人智力量表(Wechsler Adult Intelligence Scale)的任务对领先多模态模型进行的初步评估显示,其认知架构存在显著不均衡:在语言理解和工作记忆方面表现接近天花板(>98百分位),而在知觉推理方面表现接近地板(<1百分位)。为跟踪超越人类常模极限的发展轨迹,我们开发了人工智能商数(AIQ)基准,并将其应用于六个世代和两个模型家族,揭示了显著但不对称的表现提升。值得注意的是,我们发现模态之间存在明显的分离:抽象数量推理在以语言形式呈现时,其成熟速度远快于以视觉类比形式呈现时,表明架构偏向于基于语言的符号操作。虽然抽象视觉推理有所提升,但视觉知觉组织能力仍基本停滞。总体而言,这些发现表明生成模型的认知能力正在不均衡地发展,暗示仅靠规模扩展和优化的AGI开发路径可能不足以克服实现平衡、类人通用智能的基本架构限制。

英文摘要

The pursuit of artificial general intelligence necessitates robust methods for evaluating the cognitive capabilities of models beyond narrow task performance. Here, we introduce a psychometric framework to assess the cognitive profiles of generative AI, comparing them to human norms and tracking their evolution across generations. Initial evaluation of leading multimodal models using tasks adapted from the Wechsler Adult Intelligence Scale revealed a profoundly uneven cognitive architecture: near-ceiling performance in verbal comprehension and working memory (>$98^{\text{th}}$ percentile) contrasted with near-floor performance in perceptual reasoning (<$1^{\text{st}}$ percentile). To track developmental trajectories beyond human-normed limits, we developed the Artificial Intelligence Quotient (AIQ) Benchmark and applied it to six generations and two model families, revealing significant but asymmetric performance gains. Notably, we uncovered a sharp dissociation between modalities; abstract quantitative reasoning matured far more rapidly when presented linguistically compared to a visually analogous format, indicating an architectural bias towards language-based symbolic manipulation. While abstract visual reasoning improved, visual-perceptual organization remained largely stagnant. Collectively, these findings demonstrate that the cognitive abilities of generative models are evolving unevenly, suggesting that scaling and optimization approaches to AGI development alone may be insufficient to overcome fundamental architectural limitations in achieving balanced, human-like general intelligence.

2605.06814 2026-05-11 cs.LG

From Model to Data (M2D): Shifting Complexity from GNNs to Graphs for Transparent Graph Learning

从模型到数据(M2D):将复杂性从GNNs转移到图中以实现透明的图学习

Debolina Halder Lina, Arlei Silva

发表机构 * Department of Computer Science, Rice University, Houston, TX 77005, USA(莱斯大学计算机科学系,美国德克萨斯州休斯顿,77005) Department of Computer Science & Ken Kennedy Institute, Rice University, Houston, TX 77005, USA(莱斯大学计算机科学系及 Ken Kennedy 研究所,美国德克萨斯州休斯顿,77005)

AI总结 本文提出M2D框架,通过将模型复杂度转移到数据空间,提升图神经网络的透明度,揭示公平性目标和注意力聚合等机制,增强模型可解释性而不影响性能。

详情
AI中文摘要

图神经网络(GNNs)虽然性能优异,但对人类来说却缺乏透明性,难以理解和比较多种架构。现有可解释性方法仅能将预测归因于节点、边或特征,但无法提供架构透明性或解释简单与复杂模型之间的性能差距。为解决这一限制,我们引入Model-to-Data(M2D)知识蒸馏,一种新的框架,通过将模型复杂性转移到数据空间来增加透明性。M2D将教师模型蒸馏到一个特征丰富且结构增强的增强图中,使简单的学生模型能够匹配教师模型的性能。通过在数据空间中实现模型行为,我们的方法允许人类直接检查架构优势。我们展示M2D以可解释的方式揭示底层机制,如公平性目标和基于注意力的聚合,从而在不牺牲性能的情况下增强GNN的透明性。

英文摘要

Graph Neural Networks (GNNs) achieve high performance but can be opaque to humans, making it difficult to understand and compare the many proposed architectures. While existing explainability methods attribute individual predictions to nodes, edges, or features, they do not provide architectural transparency or explain the fundamental performance gap between simple and more complex models. To address this limitation, we introduce Model-to-Data (M2D) distillation, a new framework that increases transparency by transferring model complexity into the data space. M2D distills the teacher model into an augmented graph with enriched features and structure, enabling a simple student to match the teacher's performance. By materializing model behavior in the data, our approach allows humans to inspect architectural advantages directly. We show that M2D reveals underlying mechanisms such as fairness objectives and attention-based aggregation in an interpretable way, enhancing GNN transparency while preserving performance.

2605.06812 2026-05-11 cs.AI

Towards Security-Auditable LLM Agents: A Unified Graph Representation

迈向安全可审计的LLM代理:一种统一的图表示

Chaofan Li, Lyuye Zhang, Jintao Zhai, Siyue Feng, Xichun Yang, Huahao Wang, Shihan Dou, Yu Ji, Yutao Hu, Yueming Wu, Yang Liu, Deqing Zou

发表机构 * Huazhong University of Science and Technology(华中科技大学) Nanyang Technological University(南洋理工大学) Fudan University(复旦大学) Chongqing University of Posts and Telecommunications(重庆邮电大学) Donghua University(东华大学)

AI总结 本文提出Agent-BOM统一图表示,用于解决LLM代理系统中执行意图与物理事件间的语义鸿沟问题,通过构建分层属性有向图实现安全审计与风险评估。

详情
AI中文摘要

基于LLM的代理系统正迅速发展,通过动态工具调用、状态内存管理和多代理协作执行复杂自主任务。然而,这种语义驱动的执行范式导致低层物理事件与高层执行意图之间存在严重的语义鸿沟,使事后安全审计变得根本困难。现有表示机制,包括静态SBOM和运行时日志,仅提供碎片化证据,无法捕捉认知状态演变、能力绑定、持久内存污染以及交互代理间的级联风险传播。为弥合这一鸿沟,我们提出Agent-BOM,一种统一的结构表示用于代理安全审计。Agent-BOM将代理系统建模为分层属性有向图,将静态能力基础(如模型、工具和长期记忆)与动态运行时语义状态(如目标、推理轨迹和动作)分离。这些层次通过语义边和安全属性连接,将碎片化的执行轨迹转化为可查询的审计路径。基于Agent-BOM,我们开发了基于图查询的路径级风险评估范式,并将其实例化为OWASP Agentic Top 10。我们进一步在OpenClaw环境中实现一个审计插件,从实时执行中构建Agent-BOM。在具有代表性的现实世界代理攻击场景评估中,Agent-BOM能够重建隐蔽的攻击链,包括跨会话内存污染和工具滥用、能力供应链劫持和意外代码执行、多代理生态系统劫持以及特权和信任滥用。这些结果表明,Agent-BOM为复杂代理生态系统中的根本原因分析和安全裁定提供了统一且可审计的基础。

英文摘要

LLM-based agentic systems are rapidly evolving to perform complex autonomous tasks through dynamic tool invocation, stateful memory management, and multi-agent collaboration. However, this semantics-driven execution paradigm creates a severe semantic gap between low-level physical events and high-level execution intent, making post-hoc security auditing fundamentally difficult. Existing representation mechanisms, including static SBOMs and runtime logs, provide only fragmented evidence and fail to capture cognitive-state evolution, capability bindings, persistent memory contamination, and cascading risk propagation across interacting agents. To bridge this gap, we propose Agent-BOM, a unified structural representation for agent security auditing. Agent-BOM models an agentic system as a hierarchical attributed directed graph that separates static capability bases, such as models, tools, and long-term memory, from dynamic runtime semantic states, such as goals, reasoning trajectories, and actions. These layers are connected through semantic edges and security attributes, transforming fragmented execution traces into queryable audit paths. Building on Agent-BOM, we develop a graph-query-based paradigm for path-level risk assessment and instantiate it with the OWASP Agentic Top 10. We further implement an auditing plugin in the OpenClaw environment to construct Agent-BOM from live executions. Evaluation on representative real-world agentic attack scenarios shows that Agent-BOM can reconstruct stealthy attack chains, including cross-session memory poisoning and tool misuse, capability supply-chain hijacking and unexpected code execution, multi-agent ecosystem hijacking, and privilege and trust abuse. These results demonstrate that Agent-BOM provides a unified and auditable foundation for root-cause analysis and security adjudication in complex agentic ecosystems.
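
The notion of a "queryable audit path" can be illustrated with a plain directed graph and a path search from an untrusted source to a sensitive capability. Everything in this sketch is hypothetical: the node names, the edge list, and the `find_paths` helper are illustrative and do not reflect the Agent-BOM schema or its query language.

```python
from collections import deque

def find_paths(edges, source, target):
    """Enumerate simple directed paths from source to target (breadth-first)."""
    paths, queue = [], deque([[source]])
    while queue:
        path = queue.popleft()
        for nxt in edges.get(path[-1], []):
            if nxt in path:
                continue                      # skip cycles
            if nxt == target:
                paths.append(path + [nxt])
            else:
                queue.append(path + [nxt])
    return paths

# Hypothetical influence edges: static capabilities and runtime states as nodes.
edges = {
    "untrusted_web_page": ["long_term_memory"],
    "long_term_memory": ["reasoning_step"],
    "user_goal": ["reasoning_step"],
    "reasoning_step": ["shell_tool"],
}
audit_paths = find_paths(edges, "untrusted_web_page", "shell_tool")
```

A path such as this one, from untrusted content through persistent memory to a tool call, is the kind of chain that flat runtime logs leave implicit and that a graph query can surface directly.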

2605.06809 2026-05-11 cs.CV cs.LG

LookWhen? Fast Video Recognition by Learning When, Where, and What to Compute

Ali Salamatian, Anthony Fuller, Pritam Sarkar, James R. Green, Leonid Sigal, Evan Shelhamer

Affiliations: University of British Columbia; Carleton University; Vector Institute

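AI summary: This paper proposes LookWhen, a framework that makes video recognition efficient by learning when, where, and what to compute, and reports a better accuracy-computation trade-off than existing efficient models on six datasets.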

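Details
AI abstract (translated from Chinese)

Transformers dominate video recognition. They split videos into tokens, and processing those tokens carries an expensive superlinear computational cost. Videos are full of redundancy, however, so the need for this cost is open to question. We introduce LookWhen, a selector-extractor framework that factorizes video recognition into learning when, where, and what to compute. Our shallow selector takes a scaled-down video and quickly scores all space-time tokens, while our deep extractor takes the top-K selected tokens to approximate the full-video representation without actually processing every token. The key challenge is defining effective supervision for selection and extraction. For selection pre-training, we introduce a representation-based score that ranks tokens by uniqueness using a simple nearest-neighbor distance. For extraction pre-training, we distill both a video teacher and an image teacher, normalizing the image teacher's frame-wise representations, to learn what changes within a video. With these strategies, our selector-extractor learns general and efficient representations for feature extraction or for fine-tuning to a specific task. Experiments on Kinetics-400, SSv2, Epic-Kitchens, Diving48, Jester, and Charades show that LookWhen achieves a better accuracy-computation trade-off than efficient models and upgraded baselines of similar size. LookWhen Pareto-dominates in accuracy-FLOPs in 9 of 12 cases (6 tasks x 2 settings) and roughly matches in the other 3. In accuracy-throughput, measured as wall-clock time, LookWhen is 6.7x faster than InternVideo2-B at equal accuracy.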

English abstract

Transformers dominate video recognition. They split videos into tokens, and processing them has expensive superlinear computational cost. Yet videos are filled with redundancy, so we can question the need for this expense. We introduce LookWhen, a selector-extractor framework that factorizes video recognition into learning when, where, and what to compute. Our shallow selector gets a scaled-down video and quickly scores all tokens across space-time, while our deep extractor gets the top-K selected tokens to approximate full-video representations without actually processing all the tokens. A key challenge is defining effective supervision for selection and extraction. For selection pre-training, we introduce a score on representations that ranks tokens by uniqueness using a simple nearest-neighbor distance. For extraction pre-training, we distill both a video teacher and an image teacher, for which we normalize its frame-wise representations to learn what changes within videos. Through these strategies, our selector-extractor learns general and efficient representations for feature extraction or fine-tuning to a task. Through experiments on Kinetics-400, SSv2, Epic-Kitchens, Diving48, Jester, and Charades, we show that LookWhen achieves a better accuracy-computation trade-off than efficient models and upgraded baselines of similar size. LookWhen Pareto-dominates in accuracy-FLOPs on 9 of 12 cases (6 tasks x 2 settings) and roughly matches on 3. In accuracy-throughput, measuring time in practice, LookWhen is more efficient still at 6.7x faster than InternVideo2-B at equal accuracy.
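
The selection score described above can be sketched as follows. This is a minimal illustration, not the LookWhen implementation: the abstract does not specify the distance metric, the representation space, or how the score is used as supervision, so the sketch assumes Euclidean distance between toy 2-D token representations, and the function names `uniqueness_scores` and `select_top_k` are hypothetical. It requires at least two tokens.

```python
import math


def uniqueness_scores(tokens):
    """Score each token by the distance to its nearest *other* token.

    A token with a close neighbor is redundant (low score); an isolated
    token is unique (high score). Euclidean distance is an assumption.
    """
    scores = []
    for i, t in enumerate(tokens):
        nearest = min(math.dist(t, u) for j, u in enumerate(tokens) if j != i)
        scores.append(nearest)
    return scores


def select_top_k(tokens, k):
    """Return the indices of the k most unique tokens, in original order."""
    scores = uniqueness_scores(tokens)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy token representations (assumed values for illustration only).
tokens = [
    (0.0, 0.0),    # static background patch
    (0.0, 0.1),    # near-duplicate of the first token
    (5.0, 5.0),    # distinctive token
    (0.1, 0.0),    # another near-duplicate
    (-4.0, 3.0),   # second distinctive token
]
selected = select_top_k(tokens, k=2)
print(selected)
```

Only the selected indices would be passed to the deeper extractor, so the three near-duplicate background tokens are skipped in this example.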