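arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.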

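2606.06546 2026-06-08 cs.LG New submission

Are You Sure? A Comprehensive and Accessible Survey of Uncertainty Quantification in Symbolic Regression

Julia Reuter, Fabricio Olivetti de Franca

AI summary: Surveys uncertainty quantification methods in symbolic regression across three research directions (frequentist, Bayesian, and model selection) and notes that the field remains underexplored.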

2606.06572 2026-06-08 cs.LG cs.AI cs.CY econ.GN q-fin.EC New submission

Generative Models Erode Human Temporal Learning Through Market Selection

Wenjun Cao

AI summary: Argues that modern generative models, at sub-AGI capability levels, erode human temporal learning through market selection; introduces the value-collapse pathway, formalizes it with a costly-inspection framework, and maps cross-domain evidence onto four stages of verification erosion.

Comments: Accepted at ICML 2026

Journal ref: Forty-third International Conference on Machine Learning Position Paper Track (2026)

Abstract

We argue that modern generative models create structural risks for knowledge and cultural production at current, sub-AGI capability levels. We define Human Temporal Learning (HTL) as path-dependent knowledge accumulation through sustained engagement with problems over time. Generative outputs increasingly resemble HTL-intensive work in surface features, so verifying whether a given output reflects genuine human learning grows costly relative to its expected benefit. Once verification loses economic justification, evaluators reward outputs regardless of production mode, and producers who invested years of learning compete on price against outputs that cost almost nothing to generate. We call this pathway value collapse and formalize it through a costly-inspection framework. Cross-domain evidence from academic publishing, legal practice, content platforms, and software security maps onto four stages of verification erosion. Alignment success is orthogonal. Better-aligned models narrow observable gaps between human and AI outputs, making source verification harder and intensifying competitive pressure against HTL-intensive work even when individual AI outputs improve.

2606.06574 2026-06-08 cs.LG New submission

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Ziyue Li, Yang Li, Tianyi Zhou

AI summary: Proposes PoLar, which uses a lightweight prediction network to learn, per input, which pretrained layers to skip or repeat, improving accuracy on mathematical reasoning tasks while using fewer layers.

Comments: Accepted at ICML 2026. Substantially extends arXiv:2507.07996. Code: https://github.com/tianyi-lab/PoLar

Abstract

Future AI-integrated Radio Access Networks (AI-RAN) will combine open programmability with learning-enabled xApps, rApps, and control functions that act on shared parameters and key performance indicators (KPIs). For conflict monitoring, it is not enough to know which applications are deployed; the system must also know whether the parameter--KPI dependencies assumed by runtime diagnosis remain valid under the current operating regime. This paper studies a lightweight monitoring primitive for that purpose: tracking an interpretable dependency representation from streaming telemetry events. We represent active dependencies by a Boolean matrix and use Boolean matrix multiplication to check whether recent parameter-activity and KPI-response events are consistent with the current estimate. We propose a sliding-window inference procedure that reuses the estimate when it remains consistent and recomputes it when recent observations indicate structural change. The tracker is intended as an explainable signal for conflict diagnosis and slow-loop model refresh, not as an autonomous mitigation mechanism. Experiments on controlled Boolean event streams show efficient and accurate tracking under dependency changes and Boolean observation noise.
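
The consistency check described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the array shapes, the function names `boolean_matmul`, `is_consistent`, and `track`, and the exact-match consistency rule are all assumptions made for the example.

```python
import numpy as np

def boolean_matmul(a, b):
    """Boolean matrix product: out[i, j] = OR_k (a[i, k] AND b[k, j])."""
    return (a.astype(int) @ b.astype(int)) > 0

def is_consistent(param_events, kpi_events, dependency):
    """Check a window of events against a dependency estimate.

    param_events: (T, P) bool, parameter p was active at event t
    kpi_events:   (T, K) bool, KPI k responded at event t
    dependency:   (P, K) bool, estimated parameter -> KPI dependencies
    Assumed rule: predicted responses must equal observed responses.
    """
    predicted = boolean_matmul(param_events, dependency)
    return bool(np.array_equal(predicted, kpi_events))

def track(param_stream, kpi_stream, dependency, window):
    """Slide a window over the stream; True marks windows whose events
    contradict the current estimate, i.e. where it should be recomputed."""
    flags = []
    for end in range(window, len(param_stream) + 1):
        p = param_stream[end - window:end]
        k = kpi_stream[end - window:end]
        flags.append(not is_consistent(p, k, dependency))
    return flags
```

A real tracker would also need a rule for re-estimating `dependency` and some tolerance for observation noise; both are left out here.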

2606.06673 2026-06-08 cs.LG New submission

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

Ujjwal Bhatta, Utsabi Dangol, Sumaly Bajracharya, Rodrigue Rizk, KC Santosh

Affiliations: USD AI Research Lab

AI summary: Proposes the ULPS framework, which combines a calibrated large language model with uncertainty estimation: a BERT model fine-tuned on A* trajectories supplies action suggestions, and an entropy-based mechanism balances LLM guidance against the PPO policy. On the MiniGridUnlockPickup benchmark it improves success rate, reward efficiency, and sample complexity.

Comments: Accepted to the 2026 IEEE Conference on Artificial Intelligence (IEEE CAI). 6 pages, 3 figures. Code available at: https://github.com/USD-AI-ResearchLab/uncertainty-aware-llm-rl

DOI: 10.1109/CAI68641.2026.11536354

Abstract

Sparse rewards and heterogeneous task sequences remain persistent challenges in Reinforcement Learning (RL), often resulting in slow convergence, weak generalization, and inefficient exploration. We propose Uncertainty-Aware LLM-Guided Policy Shaping (ULPS), a novel framework that integrates a calibrated Large Language Model (LLM) into the RL training loop to provide structured, uncertainty-modulated behavioral guidance. ULPS employs an A*-based oracle to synthesize optimal symbolic trajectories, which are used to fine-tune a BERT-based language model. During training, this model supplies action suggestions whose influence is conditioned on epistemic uncertainty estimated via Monte Carlo (MC) dropout. An entropy-based blending mechanism adaptively balances LLM guidance and the learned policy (via Proximal Policy Optimization, PPO), allowing the agent to prioritize reliable priors while preserving adaptability. We evaluate ULPS on the MiniGridUnlockPickup benchmark and observe consistent improvements in success rate, reward efficiency, and sample complexity over unguided, uncalibrated, and standard RL baselines. ULPS achieves more than 9% improvement in execution accuracy after fine-tuning, requires fewer environment interactions, and yields higher reward AUC. Our results demonstrate that integrating symbolic A* trajectories, pretrained language priors, and uncertainty-aware control offers a principled and effective approach to multi-task reinforcement learning in sparse-reward domains, with potential extensibility to partially observable and multi-agent settings.
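
To make the idea of uncertainty-modulated guidance concrete, here is a minimal sketch. The abstract does not give the actual blending rule, so everything below is an assumption: the entropy of the averaged dropout predictions stands in for the uncertainty estimate, and the weight `1 - normalised entropy` is a hypothetical choice.

```python
import numpy as np

def predictive_entropy(mc_probs):
    """mc_probs: (T, A) action probabilities from T stochastic (dropout) passes.
    Returns the mean distribution and its entropy normalised by log(A)."""
    mean = mc_probs.mean(axis=0)
    entropy = -np.sum(mean * np.log(mean + 1e-12))
    return mean, entropy / np.log(mean.shape[0])

def blend(policy_probs, mc_probs):
    """Assumed blending rule: trust the language model more when its
    predictive entropy is low, and fall back to the policy otherwise."""
    llm_mean, h = predictive_entropy(mc_probs)
    weight = float(np.clip(1.0 - h, 0.0, 1.0))  # 1 means fully confident
    mixed = weight * llm_mean + (1.0 - weight) * policy_probs
    return mixed / mixed.sum(), weight
```

With a confident suggestion the mixed distribution follows the language model; with a near-uniform suggestion it reduces to the learned policy.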

2606.06682 2026-06-08 cs.LG New submission

ShallowBench: Benchmarking Generative Drug Design Models on Shallow-Pocket Targets

Saket Reddy, Shiwei Liu

Affiliations: University of Illinois - Urbana-Champaign

AI summary: Introduces ShallowBench, a benchmark of 5,780 shallow-pocket targets for evaluating generative drug design models on low-concavity interfaces; existing models show weaker predicted binding affinity on these targets.

Abstract

While generative AI models have demonstrated remarkable success in structure-based drug design, they predominantly rely on deep binding pockets and struggle to sample effective ligands for challenging low-pocketability targets, such as the historically "undruggable" oncology targets KRAS and MYC. To address this gap, we introduce ShallowBench, a strictly curated benchmark of 5,780 shallow-pocket targets extracted from CrossDocked2020. By computing the difference between an Alpha Shape "lid" volume and the underlying protein atom voxel volume, we successfully isolated targets with low concavity while ensuring sufficient surface area for binding. Evaluating various state-of-the-art generative models reveals weaker predicted binding affinity on these low-concavity interfaces. ShallowBench therefore provides a rigorous benchmark for generative biology models and highlights the necessity of new architectural innovations or loss functions capable of navigating these challenging targets.

2606.06718 2026-06-08 cs.LG cs.AI cs.SY eess.SY New submission

MSAIC-Net: A Multi-Scale Attention and Imbalance-Aware Contrastive Network for ECG-Based Myocardial Substrate Abnormality Detection

Canyu Lei, Fenglin Zhang, Derek Bivona, Cristiane Singulane, Jonathan Pan, Kenneth Bilchick, Amit R. Patel, Jianxin Xie

Affiliations: University of Virginia

AI summary: Proposes MSAIC-Net, a multi-scale attention-enhanced convolutional network that extracts multi-scale features with parallel atrous convolutions, reweights them with channel attention, and adds imbalance-aware contrastive learning and lead-wise permutation importance. It outperforms baseline models for myocardial scar detection on the low-data UVA cohort and myocardial infarction detection on the large-scale PTB-XL dataset, while improving interpretability.

Abstract

Myocardial substrate abnormalities, such as myocardial scar and myocardial infarction (MI), are associated with adverse cardiovascular outcomes. Electrocardiography (ECG) provides a low-cost and widely available tool for detecting these abnormalities, but ECG-based detection remains challenging due to heterogeneous lead-dependent manifestations, high-dimensional multi-lead signals, class imbalance, and the limited interpretability of deep learning models. We propose a multi-scale attention-enhanced convolutional network (MSAIC-Net) for ECG-based myocardial substrate abnormality detection. MSAIC-Net employs parallel atrous convolutional branches to extract ECG features across multiple temporal receptive fields, enabling the model to capture both local and longer-range temporal patterns. Channel attention is then used to adaptively reweight informative lead-wise and feature-channel representations. To address class imbalance and improve feature separability, we introduce a novel imbalance-aware supervised contrastive learning strategy that encourages samples from the same class to form compact representations while increasing separation between abnormal and normal samples. Lead-wise permutation importance is further incorporated to quantify the contribution of each ECG lead and improve model interpretability. The proposed method was evaluated on two complementary datasets: a low-data institutional cohort from the University of Virginia (UVA) Health System for myocardial scar classification and the large-scale public PTB-XL dataset from PhysioNet for MI identification. Experimental results show that MSAIC-Net outperforms baseline models, with particularly pronounced improvements in the low-data UVA cohort. Overall, the proposed framework provides an effective and interpretable approach for ECG-based detection of myocardial substrate abnormalities.
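
Lead-wise permutation importance, as mentioned in this abstract, can be illustrated with a short sketch. It shows the generic technique rather than the authors' code: the `(N, L, T)` array layout, the `predict` callable, and the use of accuracy as the score are assumptions.

```python
import numpy as np

def lead_permutation_importance(predict, ecg, labels, n_repeats=5, seed=0):
    """ecg: (N, L, T) array with N recordings, L leads, T samples.
    predict: callable mapping such an array to (N,) predicted labels.
    Importance of a lead = mean accuracy drop when that lead is shuffled
    across recordings, which breaks its link to the labels."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(ecg) == labels)
    importance = np.zeros(ecg.shape[1])
    for lead in range(ecg.shape[1]):
        drops = []
        for _ in range(n_repeats):
            shuffled = ecg.copy()
            shuffled[:, lead, :] = ecg[rng.permutation(len(ecg)), lead, :]
            drops.append(baseline - np.mean(predict(shuffled) == labels))
        importance[lead] = np.mean(drops)
    return importance
```

A lead the model ignores gets an importance of exactly zero, while a lead that carries the label information shows a clear accuracy drop.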

2606.06722 2026-06-08 cs.LG New submission

Flatland: The Adventures of Gradient Descent with Large Step Sizes

Leonardo Galli, Curtis Fox, Wiebke Bartolomaeus, Mark Schmidt, Holger Rauhut

Affiliations: University of British Columbia; Munich Center for Machine Learning; Canada CIFAR AI Chair (Amii)

AI summary: For neural-network objectives that are not globally L-smooth, gives a definition of large step sizes that requires only local Lipschitz continuity of the gradient, and designs adaptive first-order methods that take large steps and operate at the edge of stability throughout training. Entering globally flat regions too early is found to slow convergence and hurt generalization.

Comments: Accepted for the International Conference on Machine Learning (ICML 2026)

Abstract

The training of neural networks often entails objective functions that are not globally $L$-smooth. For these functions, it is both theoretically and practically difficult to reply to the question: what is the largest possible step size that ensures the convergence of gradient descent (GD)? We address this longstanding open question in deep learning by providing a unifying definition of "large" step sizes that requires only local Lipschitz (or even Hölder) continuity of the gradient. We design first-order adaptive methods that provably yield large step sizes and show that they operate at the edge of stability (EoS) right from the start of the training. In particular, the loss decreases nonmonotonically and the product between the step size and sharpness, i.e., the largest eigenvalue of the Hessian, stays above the EoS threshold of 2 throughout training. Using our method, we are also able to minimize the sharpness all the way down to its global minimum. Contrary to expectation, we find that encountering globally-flat regions too early in the training may both slow down convergence and jeopardize the generalization ability of the network. Exploiting a self-stabilization argument, we allow GD to enter slightly sharper valleys and turn unsuccessful training runs into very successful ones.
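
The threshold of 2 for the product of step size and sharpness comes from the quadratic case, which the sketch below reproduces. It is only an illustration of where the number comes from: on f(x) = 0.5 * sharpness * x^2 each gradient step multiplies x by (1 - step_size * sharpness), so the iterates shrink exactly when the product is below 2. The paper's setting is non-quadratic, and the function name and constants here are chosen for the example.

```python
def gradient_descent(sharpness, step_size, x0=1.0, steps=50):
    """Run GD on f(x) = 0.5 * sharpness * x**2 and return the iterates.
    Each update multiplies x by (1 - step_size * sharpness)."""
    xs = [x0]
    for _ in range(steps):
        grad = sharpness * xs[-1]
        xs.append(xs[-1] - step_size * grad)
    return xs

sharpness = 4.0  # largest (and only) Hessian eigenvalue of f
stable = gradient_descent(sharpness, step_size=0.45)    # product 1.8 < 2
unstable = gradient_descent(sharpness, step_size=0.55)  # product 2.2 > 2
```

On a pure quadratic the second run diverges; the abstract's point is that on neural-network losses training can persist above the threshold because the loss is not quadratic.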

2606.06724 2026-06-08 cs.LG New submission

Synthics: Synthetic Physics-like Datasets for Machine Learning

Jari Vepsäläinen

Affiliations: Aalto University

AI summary: Generates structurally similar synthetic regression datasets from an equation corpus using a Bayesian probabilistic context-free grammar, with non-intrusive probing to determine physically valid domains. Statistical validation shows higher structural fidelity than a purely probabilistic grammar, and downstream hyperparameter tuning on the synthetic data performs close to tuning on real data.

Abstract

Representative data is fundamental in machine learning, as limited data hinders generalisation. Collecting sufficient real-world samples is often infeasible. Synthetic data generation offers a practical solution, but only if the generated data faithfully reflects the structure of real observations. In this paper, a method for generating synthetic regression datasets that structurally resemble physics equations from a given equation corpus is presented. The approach uses a Bayesian Probabilistic Context-Free Grammar to capture the underlying algebraic structure of the corpus, from which novel equations are sampled. To ensure the generated inputs lie within a physically meaningful domain, the applicability domain is characterised for each equation through non-intrusive probing, also recovering inter-variable constraints. Input sampling further mimics realistic experimental conditions by drawing from random sub-ranges of the valid domain with mixed uniform and truncated normal distributions. The generated data is statistically validated against the Feynman equation corpus using Kolmogorov-Smirnov tests. The generated equations match the corpus on all of the eight studied structural features, compared to only two for an unsmoothed purely probabilistic grammar, demonstrating that the Bayesian prior is essential for structural fidelity given the size of the corpus. In a downstream hyperparameter-tuning task, a gradient-boosted regressor tuned on the synthetic data picks, on average, the 6th-best configuration out of 20 on real data, matching the result of tuning on real data itself and substantially outperforming random expression trees (10th) and noise (19th).
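
Sampling equations from a probabilistic context-free grammar can be sketched in a few lines. The grammar below is a hypothetical toy, not the one learned from the Feynman corpus, and the symmetric Dirichlet smoothing in `smoothed_probabilities` is an assumption about what a Bayesian prior over rule probabilities might look like; the abstract does not specify the prior.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (probability, expansion).
GRAMMAR = {
    "E": [(0.3, ["E", "+", "T"]), (0.7, ["T"])],
    "T": [(0.3, ["T", "*", "F"]), (0.7, ["F"])],
    "F": [(0.5, ["x"]), (0.3, ["y"]), (0.2, ["sin(", "E", ")"])],
}

def sample(symbol, rng, depth=0, max_depth=8):
    """Expand `symbol` recursively, drawing productions by their probabilities."""
    if symbol not in GRAMMAR:  # terminal symbol
        return symbol
    rules = GRAMMAR[symbol]
    if depth >= max_depth:     # force termination with the shortest rule
        expansion = min((rhs for _, rhs in rules), key=len)
    else:
        weights = [p for p, _ in rules]
        expansion = rng.choices([rhs for _, rhs in rules], weights=weights)[0]
    return "".join(sample(s, rng, depth + 1, max_depth) for s in expansion)

def smoothed_probabilities(counts, alpha=1.0):
    """Posterior-mean rule probabilities under a symmetric Dirichlet prior."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

rng = random.Random(0)
equations = [sample("E", rng) for _ in range(5)]
```

The smoothing step shows why a prior matters for a small corpus: a rule that was never observed still gets nonzero probability instead of being excluded outright.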

2606.06746 2026-06-08 cs.LG New submission

Performance Variation in Deep Reinforcement Learning

Haruto Tanaka, A. Rupam Mahmood

Affiliations: Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute (Amii); CIFAR AI Chair

AI summary: To address the low run-to-run robustness of deep reinforcement learning algorithms, proposes a percentile-based statistic (min-max IPR) and a visualization method (run-wise percentile highlighting) for assessing performance variation, and demonstrates them in three case studies.

Abstract

Deep reinforcement learning (RL) algorithms often suffer from low run-to-run robustness, manifesting as significant performance variation across independent runs of identically configured agents. Although this issue poses a spectrum of challenges across research and practice, relatively few studies develop methods to evaluate it; RL research instead often reports uncertainty in the estimated mean performance. In this paper, we outline the limitations of conventional uncertainty and variation estimates, particularly their misalignment with purpose and the risk of underreporting. We then propose an alternative percentile-based statistic and visualization method, min-max IPR and run-wise percentile highlighting, respectively. These percentile-based tools are easy to interpret and rely on standard properties of sample percentiles, providing rich information about run-to-run performance variation. We demonstrate this through three case studies. First, we show that LayerNorm and penultimate-layer normalizations narrow performance variation in PPO, whereas the variation is mostly unchanged in SAC. Second, we compare PPO, SAC, TD-MPC, and TD-MPC2, and show TD-MPC exhibits the least variation while being the most data efficient among the four. Finally, in a comparison of DQN and Rainbow on five Atari environments, we show that both algorithms exhibit similar levels of performance variation.
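
The difference between uncertainty in the mean and run-to-run variation can be shown with a small sketch. The abstract does not define min-max IPR, so the code does not attempt it; it uses a plain interpercentile range as a stand-in, and the 10th/90th percentile choice and the synthetic returns are assumptions.

```python
import numpy as np

def interpercentile_range(final_returns, lower=10, upper=90):
    """Spread between two sample percentiles of per-run performance."""
    lo, hi = np.percentile(final_returns, [lower, upper])
    return hi - lo

def standard_error(final_returns):
    """Uncertainty of the estimated mean; shrinks as more runs are added."""
    return np.std(final_returns, ddof=1) / np.sqrt(len(final_returns))

rng = np.random.default_rng(0)
few_runs = rng.normal(loc=100.0, scale=20.0, size=10)
many_runs = rng.normal(loc=100.0, scale=20.0, size=1000)
```

Adding runs shrinks the standard error of the mean but leaves the percentile spread roughly unchanged, which is why the former can understate how much individual runs differ.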

2606.06776 2026-06-08 cs.LG New submission

A Rolling-Window Framework for Churn Prediction and Behavioral Driver Identification

Muhammad Jawad Mufti, Omar Hammad, Haitham Saleh, Muqaddas Gull

Affiliations: Information and Computer Science Department, King Fahd University of Petroleum and Minerals; Interdisciplinary Research Center for Smart Mobility and Logistics (IRC-SML), King Fahd University of Petroleum and Minerals; SDAIA–KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals

AI summary: Proposes a churn prediction framework based on rolling behavioral windows that enables continuous risk assessment in non-contractual service settings; the feature-based model reaches 87.6% accuracy and 0.94 ROC-AUC, and the sequence-based model reaches 96.1% recall.

Abstract

Customer churn prediction is a central task in customer analytics, particularly in non-contractual, pay-per-use service environments where disengagement is not explicitly observed and must be inferred from behavioral inactivity. Existing churn prediction approaches often rely on simplified temporal assumptions or single-point representations of customer behavior, which limit their ability to support continuous risk assessment, interpretability, and realistic deployment over time. This study proposes a temporally explicit churn prediction framework that models customer behavior using rolling behavioral windows, enabling repeated and instance-level churn risk estimation as customer activity evolves. Customer behavior is summarized within a fixed 30-day observation window, followed by a 30-day future churn evaluation window, ensuring a clear temporal separation between behavioral evidence and churn outcomes. The framework integrates feature-based and sequence-based learning approaches within a unified temporal design. The proposed approach is evaluated on a large-scale, real-world dataset from a non-contractual service platform. Empirical results demonstrate strong and stable predictive performance, with accuracy reaching 87.6% and ROC-AUC of 0.94 for the feature-based model, while the sequence-based model achieves recall as high as 96.1% by capturing temporal disengagement patterns. Evaluation on future unseen data confirms meaningful robustness under temporal shift, with accuracy remaining above 83% and ROC-AUC exceeding 0.91 without model retraining. Overall, the findings highlight that carefully designed temporal framing, rather than model complexity alone, is critical for achieving robust, interpretable, and deployment-ready churn prediction. The study provides a practical foundation for churn-oriented decision support in dynamic service environments.
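
The 30-day observation window followed by a 30-day evaluation window can be sketched as below. The function name, the single activity-count feature, the 30-day step, and the rule that churn means no activity in the evaluation window are assumptions made for illustration; the abstract does not state the exact labeling rule.

```python
from datetime import date, timedelta

def rolling_instances(event_dates, start, end, window_days=30, step_days=30):
    """Build (cutoff, activity_count, churned) instances for one customer.

    Observation window: [cutoff - window_days, cutoff)
    Evaluation window:  [cutoff, cutoff + window_days)
    Assumed label: churned = no activity at all in the evaluation window.
    """
    window = timedelta(days=window_days)
    instances = []
    cutoff = start + window
    while cutoff + window <= end:
        observed = [d for d in event_dates if cutoff - window <= d < cutoff]
        future = [d for d in event_dates if cutoff <= d < cutoff + window]
        if observed:  # only score customers who were active in the window
            instances.append((cutoff, len(observed), len(future) == 0))
        cutoff += timedelta(days=step_days)
    return instances
```

Because features come only from before the cutoff and the label only from after it, no information from the evaluation window leaks into the features.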

2606.06786 2026-06-08 cs.LG cs.NI New submission

Federated Foundation Models over Vehicular Networks

Kasra Borazjani, Fardis Nadimi, Payam Abdisarabshali, Owen Palinski, Allan Salihovic, Dinh Nguyen, Minghui Liwang, Seyyedali Hosseinalipour

Affiliations: University of Waterloo

AI summary: Presents a vision for integrating multi-modal multi-task federated foundation models (M3T FedFMs) into vehicular networks, combining the expressive power of foundation models with the privacy-preserving distributed learning of federated learning, and illustrates their promise with a case study on the Waymo Open Dataset.

Comments: 8 pages, 4 figures

Abstract

This paper presents a forward-looking vision for integrating the emerging multi-modal multi-task federated foundation models (M3T FedFMs) into vehicular networks, with the goal of unifying the expressive power of multi-modal multi-task foundation models (M3T FMs) with the privacy-preserving and distributed learning capabilities of federated learning (FL). Given the largely underexplored nature of this research direction, we first introduce the fundamental training/fine-tuning principles of M3T FedFMs. We then discuss a range of their representative use cases in vehicular networks, illustrating the significant potential of M3T FedFMs to enable next-generation vehicular intelligence. Afterwards, we identify key constraints inherent to vehicular environments that challenge the practical deployment of M3T FedFMs, and articulate a set of forward-looking research directions to address these challenges. Furthermore, through a case study conducted on a real-world vehicular dataset (i.e., Waymo Open Dataset), we demonstrate the promise of M3T FedFMs for vehicular networks and release our implementation to facilitate reproducibility and stimulate research in this emerging area (repository: https://github.com/KasraBorazjani/vehicular-fedfm)

2606.06804 2026-06-08 cs.LG stat.AP New submission

Interpreting Learning Under Competing Models: Joint and Stepwise Approaches for Dynamic Cognitive Diagnosis

Yawen Ma, Sahoko Ishida, Kate Cain, Gabriel Wallin

Affiliations: School of Mathematical Sciences, Lancaster University; Department of Computer Science, University of Oxford; Department of Psychology, Lancaster University

AI summary: Studies how, when the item-skill structure is unknown, estimating the Q-matrix jointly with the learning process rather than fixing it first changes conclusions about learner development. Reading-game data are analyzed with dynamic cognitive diagnostic models, and joint analysis is found to be more reliable when the item-skill structure is uncertain and the item pool changes between grades.

Abstract

Digital learning environments record learners' responses to individual items, making it possible to study the development of specific skills rather than overall scores. Drawing conclusions about learning from these data requires a model that links responses to latent skills and tracks how mastery changes over time. When the skills measured by each item are unknown, the analyst must decide whether to estimate this structure, the Q-matrix, jointly with the learning process, or to establish it first and study learning afterwards. We show that this decision can change substantive conclusions about how learners develop. Using dynamic cognitive diagnostic models, we analyse data from two reading games measuring vocabulary and comprehension from Grade 2 to Grade 3, with item-text embeddings providing prior information for the unknown Q-matrix. A joint analysis and a bias-corrected stepwise analysis agree that most learners move toward mastering both skills, but disagree about how many remain only partially proficient at Grade 3, changing how reading progress would be reported. A simulation study identifies when the two analyses diverge and shows that joint analysis is more reliable when the item-skill structure is uncertain and the item pool changes between grades. We provide R code for both analyses.

2606.06820 2026-06-08 cs.LG cs.AI New submission

SCALE: Scalable Cross-Attention Learning with Extrapolation for Agentic Workflow Scheduling

Zhifei Xu, Jierui Lan, Zixuan Liang, Aiji Liang, Jinxi He

Affiliations: Faculty of Arts and Sciences, Beijing Normal University

AI summary: Proposes SCALE, a deep reinforcement learning workflow scheduler that generalizes to clusters of different sizes without fine-tuning, using a cross-attention pointer network and structured representation regularization.

Comments: Submitted to Computer Networks

Abstract

Agentic Large Language Model (LLM) systems decompose complex tasks into workflow Directed Acyclic Graphs (DAGs) whose primitives must be scheduled on heterogeneous clusters. Existing deep reinforcement learning (DRL) schedulers are tied to a fixed cluster size and require retraining whenever the number of servers changes. We propose SCALE (Scalable Cross-Attention Learning with Extrapolation), a DRL scheduler that generalizes to unseen cluster scales without fine-tuning. SCALE employs a cross-attention pointer network where task features query against server features, so the architecture accepts any number of servers by construction. We observe, however, that permutation-invariant architecture alone does not guarantee good performance at new scales - the attention feature undergoes distribution shift as the server count grows. To counter this, we introduce Structured Representation Regularization (SRR): a decorrelation loss combined with a KL penalty toward the standard normal, which keeps feature statistics stable regardless of input size. Trained on 16 nodes and tested directly on 32 and 48 nodes, SCALE reduces average response time by 8.9% at N=48 relative to the same architecture without SRR, confirming that explicit regularization is necessary to close the scale-generalization gap.
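
The two ingredients named in this abstract, a pointer over a variable number of servers and a regularizer on feature statistics, can be sketched as follows. This is not the authors' architecture: the feature sizes, the single-head dot-product scoring, and the exact form of `srr_penalty` (squared off-diagonal covariance plus a per-dimension Gaussian KL term) are assumptions.

```python
import numpy as np

def pointer_scores(task_feat, server_feats, w_q, w_k):
    """Cross-attention pointer: one task query scored against N server keys.
    Works for any N because the weights act on the feature dimension only."""
    q = task_feat @ w_q                    # (d,)
    k = server_feats @ w_k                 # (N, d)
    logits = k @ q / np.sqrt(q.shape[0])   # (N,)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                 # distribution over servers

def srr_penalty(features):
    """Assumed form of the regulariser: off-diagonal covariance (decorrelation)
    plus KL(N(mu, sigma^2) || N(0, 1)) summed over feature dimensions."""
    mu = features.mean(axis=0)
    var = features.var(axis=0) + 1e-8
    cov = np.cov(features, rowvar=False)
    decorrelation = np.sum((cov - np.diag(np.diag(cov))) ** 2)
    kl = 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var))
    return decorrelation + kl
```

The penalty is near zero for features that already look like independent standard normals and grows as their mean, scale, or correlations drift, which is the kind of shift the abstract attributes to larger server counts.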

2606.06823 2026-06-08 cs.LG cs.AI q-fin.ST New submission

PandaAI: A Practical Agent CQ2 for Neuro-symbolic Data Analysis And Integrated Decision-Making in Quantitative Finance

Yuqi Li, Siyuan Liu, Bingjun Liu

Affiliations: Panda AI

AI summary: To handle the low signal-to-noise ratio and non-stationarity of financial data, proposes PandaAI, a closed-loop neuro-symbolic LLM agent that combines market regime modeling with constrained alpha generation and uses domain fine-tuning and a modular architecture for risk-aware decision-making. On CSI 300 data it reports 18.2% higher Rank IC and 25.7% lower maximum drawdown than state-of-the-art time-series models.

Abstract

While deep learning has excelled in various domains, its application to sequential decision-making in finance remains challenging due to the low Signal-to-Noise Ratio (SNR) and non-stationarity of financial data. Leveraging the reasoning capabilities of Large Language Models (LLMs), we propose PandaAI, a closed-loop neuro-symbolic LLM agent with market regime modeling and constrained alpha generation, which bridges general LLM reasoning with financial rigor and suppresses the financial toxicity of LLM-generated outputs. To bridge the gap between general linguistic capability and financial rigor, we fine-tune a domain-specific LLM. Furthermore, we integrate this LLM into a modular architecture and form a closed-loop system. Unlike traditional models that optimize isolated prediction metrics, PandaAI is designed as a neuro-symbolic agent that navigates the complex, real-world financial environment with explicit risk awareness. Extensive experiments on CSI 300 stock data show that PandaAI achieves an 18.2% higher Rank IC and 25.7% lower maximum drawdown than state-of-the-art time-series models. Our constrained LLM generation and dual-channel adaptation method provide a general paradigm for LLM deployment in high-stakes sequential decision-making scenarios.
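
The two evaluation metrics quoted in this abstract have standard definitions, sketched below for reference. The function names are chosen for the example, the rank computation ignores ties, and nothing here reflects how the authors computed their reported numbers.

```python
import numpy as np

def rank_ic(predicted, realised):
    """Rank information coefficient: Spearman correlation between predicted
    scores and realised returns across stocks (ties are not handled here)."""
    pred_rank = np.argsort(np.argsort(predicted))
    real_rank = np.argsort(np.argsort(realised))
    return float(np.corrcoef(pred_rank, real_rank)[0, 1])

def max_drawdown(returns):
    """Largest peak-to-trough loss of the cumulative wealth curve,
    starting from an initial wealth of 1."""
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + np.asarray(returns))])
    running_peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / running_peak))
```

For example, period returns of +10%, -50%, +20% give a maximum drawdown of 0.5, since wealth falls from 1.1 to 0.55 before recovering.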

2606.06827 2026-06-08 cs.LG New submission

Architecture Shapes Transfer Specificity in Implicit Neural Representations

D Yang Eng

AI summary: Uses controlled experiments and PDE benchmarks to study transfer in three implicit neural representation architectures (SIREN, ReLU MLPs, and Fourier-feature MLPs). Transfer magnitude and transfer specificity separate: ReLU is more source-selective, whereas SIREN reuses weights broadly.

Abstract

Transfer in coordinate networks is often measured by warm-start gain, but whether that gain reflects source-specific structure or generic weight reuse is less clear. We study this question across three implicit neural representation (INR) families, SIREN, ReLU MLPs, and Fourier-feature MLPs, using controlled analytic tests, a 2D lid-driven-cavity Navier--Stokes benchmark, and 1D PDE reference-solution suites for heat, viscous Burgers, and focusing cubic NLS. The analytic tests use independent-seed random controls, while the PDE benchmarks use alternate same-family source controls and auxiliary ablations. Across settings, transfer magnitude and transfer specificity separate clearly. In a 10-seed controlled 1D geometric test, Fourier Features show the largest structured transfer ($33.1\times$), followed by SIREN ($23.0\times$) and ReLU ($10.7\times$), but ReLU is far more selective: random-control transfer is $0.41\times$ for ReLU versus $14.24\times$ for SIREN. On a controlled two-parameter 1D family, the ranking changes: ReLU gives the clearest structured-versus-control separation at default settings, whereas Fourier Features improve only after bandwidth retuning. In Navier--Stokes and the broader 1D PDE suite, no single architecture dominates every equation, yet the same pattern remains: SIREN often reuses weights broadly, whereas ReLU and, in some equations, Fourier Features are more source-selective. Static diagnostics remain weak, and the heuristic scaling law $A_{\text{transfer}} \propto 1/\Delta t^2$ is rejected in the implemented 1D audit. These results position transfer specificity as a useful diagnostic for coordinate networks and suggest that architecture selection in scientific machine learning should be evaluated under explicit control conditions, not by transfer magnitude alone.


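2606.06833 2026-06-08 cs.LG cs.AI cs.CR new submission

Hearing the Unspoken: Language Model Priors for Acoustic Adversarial Attacks

Jiani Xie, Andrew C. Cullen, Paul Montague, Benjamin I. P. Rubinstein

Affiliations: University of Melbourne; DST Group

AI summary: Proposes the Semantic Gambit attack, which uses a large language model to supply predictive context in real time and so get around the causality constraint. It raises the word error rate of real-time ASR systems to 35.6%, a threefold increase over the current best method.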

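2606.06854 2026-06-08 cs.LG new submission

The Geometry of Last-Layer Model Stealing

Snigdha Chandan Khilar

Affiliations: Independent Researcher

AI summary: Uses geometry to explain how known methods steal machine learning models, gives the conditions under which the last layer of a Transformer network can be copied exactly, and identifies the limits that apply to hidden layers.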

2606.06861 2026-06-08 cs.LG cs.AI new submission

Modeling Nonlinear Feature Interactions with Product-Unit Residual Networks

Ziyuan Li, Uwe Jaekel, Babette Dellen

Affiliations: University of Applied Sciences Koblenz; Technical University of Munich

AI summary: Proposes product-unit residual networks (PURe), which model feature interactions explicitly to improve robustness and interpretability, and which match or outperform MLP baselines on synthetic and real-world datasets.

Comments Accepted at ICCS 2026

Abstract

Understanding nonlinear feature interactions is crucial in science and engineering, yet standard multilayer perceptrons (MLPs) often capture such interactions only implicitly, leading to entangled representations that can impair robustness and interpretability. We investigate product-unit residual networks (PURe) that integrate multiplicative product units with residual connections to explicitly model cross-feature couplings while stabilizing optimization. We conduct a systematic evaluation on an interaction-driven synthetic benchmark and two real-world datasets, assessing predictive accuracy, robustness to Gaussian feature noise, and performance under limited training data, and we compare real- and complex-valued variants under a matched parameter budget. Beyond accuracy, SHapley Additive exPlanations (SHAP)-based interaction analyses show that PURe learns more concentrated and structurally coherent interaction patterns than MLP baselines. Overall, PURe achieves competitive or improved performance, better robustness and sample efficiency in low-data regimes, and enhanced interaction-level interpretability.


2606.06866 2026-06-08 cs.LG nucl-th new submission

Product units in gated recurrent units improve nuclear-mass prediction

Ziyuan Li, Paulo S. A. Freitas, John W. Clark, Babette Dellen

Affiliations: University of Applied Sciences Koblenz; Technical University of Munich; University of Madeira; Washington University in St. Louis

AI summary: Proposes a complex-valued additive-multiplicative product-unit gated recurrent unit (AM-PU-GRU) that combines product-unit transformations with complex-domain computation. For nuclear-mass prediction it reaches an interpolation RMSE of 0.227 MeV and an extrapolation RMSE of 0.179 MeV, surpassing existing models.

Comments Accepted at ICCS 2026

Abstract

The prediction of masses of atomic nuclei using machine learning can complement theoretical models and advance the exploration of poorly known domains of the nuclear chart. We propose a machine learning technique based on gated recurrent units (GRU), which have demonstrated competitive performance in nuclear-mass prediction by exploiting long-term dependencies. By integrating multiplicative interactions and product-unit transformations within recurrent units, we report significant improvements in nuclear-mass prediction. Computations are performed in the complex domain to jointly capture amplitude and phase dynamics. For interpolation and temporal-extrapolation tasks based on the atomic mass evaluation (AME2016 and AME2020), the complex additive-multiplicative product-unit gated recurrent unit (AM-PU-GRU) model consistently achieves the lowest prediction errors, with an interpolation RMSE of 0.227 $\pm$ 0.004 MeV and an extrapolation RMSE of 0.179 $\pm$ 0.015 MeV. These results surpass other state-of-the-art machine learning models and also outperform the real-valued GRU baseline and product-unit ablation variants, while remaining robust to different theoretical priors, including WS4 and SEMF. Our findings establish complex-valued product-unit recurrent networks as a new benchmark for sequence-based nuclear-mass prediction.


2606.06871 2026-06-08 cs.LG new submission

Evidence-Grounded Ensemble Diagnosis of 802.11 Packet Captures: A Multi-Stage Pipeline with Deterministic Reliability Scoring

Jerome Henry, Swadhin Pradhan, Miroslav Popovic

Affiliations: University of California, Berkeley

AI summary: Proposes PROBE, a multi-stage pipeline that uses a deterministic evidence framework and ensembling to address LLM hallucination, miscalibrated confidence, and evaluation bias in 802.11 diagnosis. On 87 enterprise Wi-Fi captures it reaches a weighted evidence F1 of 0.957 and a 96% auto-accept rate.

Comments 37 pages, 9 figures, 9 tables

Abstract

Diagnosing 802.11 packet captures requires expert protocol knowledge, is slow, inconsistent across engineers, and unscalable. LLM-based approaches sound plausible but fabricate protocol events absent from captures (especially truncated traces), produce uncalibrated confidence scores, and suffer evaluation bias when golden references are co-produced by the model under test. We introduce PROBE (Protocol Reasoning Over evidence-Based Ensembles), a multi-stage pipeline addressing all three failures. It integrates (i) deterministic PCAP-to-text normalization with frame-level verifiability, (ii) multi-run, multi-candidate ensembles with optional cross-model second opinion and progressive obfuscation, (iii) a verdict-aware evidence framework treating absence of failure evidence as contributing evidence, and (iv) a fully deterministic composite reliability score from evidence validity, run-to-run stability, and cross-model agreement without LLM self-assessment. On 87 enterprise Wi-Fi captures (104 capture-reviewer pairs), single-pass LLM analysis raises weighted evidence F1 from 0.871 (expert baseline) to 0.912 but misses critical frames in 35% of cases. Naive ensemble voting drops below baseline (0.842) as majority voting amplifies conservative verdicts: 50% of confirmed failures are misclassified as 'no issue' or 'insufficient evidence.' Adding evidence-grounded reconciliation achieves 0.957 F1, a 96% auto-accept rate, and a worst-case floor above 0.70. LLM self-reported confidence clusters at 0.95 regardless of difficulty (71% report exactly 0.95), confirming it is uninformative. We also introduce a model-agnostic evaluation framework using per-field assertion matching, eliminating circular bias from model-co-produced golden references.


2606.06881 2026-06-08 cs.LG new submission

GlucoFM-Bench: Benchmarking Time-Series Foundation Models for Blood Glucose Forecasting

Baiying Lu, Zhaohui Liang, Ryan Pontius, Shengpu Tang, Temiloluwa Prioleau

Affiliations: Department of Computer Science; Dartmouth College; Emory University; Quantitative Biomedical Sciences

AI summary: Introduces the GlucoFM-Bench benchmark, which evaluates eight architectures, spanning time-series foundation models and supervised deep learning models, on 15 diabetes datasets for blood glucose forecasting. Pretrained models do well in zero-shot and few-shot settings, but a lightweight LSTM remains best under full-shot training.

Abstract

Blood glucose forecasting models are foundational for modern diabetes management systems, as reliable short-term predictions can enable proactive interventions, support automated insulin delivery, and reduce the risk of hypo- and hyperglycemic events. From a modeling perspective, glucose forecasting poses unique challenges due to heterogeneous physiological dynamics across diabetes populations. Traditional machine learning and deep learning models have been extensively evaluated for glucose prediction, yet recent time-series foundation models (TSFMs) remain much less studied in this setting. To bridge this gap, we present GlucoFM-Bench, a comprehensive benchmark evaluating state-of-the-art TSFMs alongside supervised deep learning models for blood glucose forecasting. We assess eight representative architectures, including pre-trained TSFMs, time-series large language models, and task-specific deep learning models, across 15 publicly available diabetes-relevant datasets comprising 1,117 individuals with type 1 diabetes, type 2 diabetes, prediabetes, and no diabetes. Models are evaluated under zero-shot, few-shot, and full-shot protocols, with systematic variation in context length and prediction horizon. Across datasets, pre-trained TSFMs, especially Chronos-2 and TimesFM, show strong zero-shot and few-shot transfer, with the best zero-shot model performing within 5% of the best full-shot supervised model. Yet when task-specific data are abundant, a lightweight LSTM remains strongest, outperforming TSFMs by 4-21% under full-shot training. Stratified analyses reveal persistent challenges in T1D cohorts and hypo-/hyperglycemic ranges, highlighting the need for evaluation beyond aggregate error metrics. Together, GlucoFM-Bench provides a standardized and reproducible foundation for evaluating, comparing, and improving foundation models for blood glucose forecasting.


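2606.06892 2026-06-08 cs.LG new submission

GRASP: Geometry-aware Residual Alignment for Scalable Pretraining Data Attribution

Yue Min, Ruining Chen, Yujun Li

Affiliations: Wizard Quant; University of Science and Technology of China

AI summary: Proposes GRASP, which models subset interactions with a quadratic geometric penalty and combines low-dimensional feature sketches with a finite lower-confidence-bound selection protocol to make pretraining data attribution scalable. It more than doubles task-level rank correlation for counterfactual subset fidelity and cuts upfront artifact construction cost by nearly an order of magnitude.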

Abstract

Scalable data attribution methods typically assign isolated utility scores to individual training examples. This prevalent additive assumption fundamentally fails to capture critical subset dynamics, including data redundancy and complementary coverage. In this work, we reframe attribution as subset-level counterfactual utility prediction and introduce GRASP, an interaction-aware surrogate. Grounded in a theoretical smoothness lower bound, GRASP explicitly models subset interactions through a quadratic geometric penalty. To achieve pretraining-scale efficiency without relying on hidden oracle tuning, we couple low-dimensional feature sketches with a strictly finite lower-confidence bound selection protocol. Extensive subset-retraining evaluations demonstrate that GRASP decisively outperforms existing scalable baselines. It more than doubles the task-level rank correlation for counterfactual subset fidelity while reducing upfront artifact construction costs by nearly an order of magnitude. Downstream diagnostics further show that this scoring mechanism transfers to language model curation and cross-domain vision selection, establishing a robust foundation for optimizing massive pretraining corpora.


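2606.06902 2026-06-08 cs.LG new submission

TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models

Chengkai Zhang, Ziteng Liu, Junpu Wang, Zeyi Tao, Yang Wang, Sagar Chordia, Qin Huang

Affiliations: Meta AI

AI summary: Proposes TALAN, a sequence-conditioned latent side path inserted into the Transformer residual stream and co-trained with a low-rank adapter. On STEM/code benchmarks it improves on LoRA by 1.41 points and on DoRA by 1.85 points on average, while adding under 1% trainable parameters and 1.01-1.02x inference overhead.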

Abstract

Targeted post-training aims to improve reasoning, math, and code without degrading strengths. Low-rank adapters are efficient but task-global; activation interventions are input-aware but often require separate probes, vectors, or inference-time steering. We introduce TALAN (Task-Aligned Latent Adaptation Networks), a sequence-conditioned latent side path inserted into a transformer's residual stream and co-trained with a low-rank adapter in one SFT loop. TALAN compresses the active sequence into latent memory, remixes it into token-level perturbations, and writes them back through a controlled residual update. It is configured along six axes: insertion location, memory size, mixer, writeback rule, trainability scope, and gradient scale. Across four Qwen3-family backbones and four STEM/code benchmarks, TALAN improves matched LoRA and DoRA baselines. With LoRA, it yields a +1.41 point cross-model mean gain, positive on all four backbones and non-negative on all 16 model-benchmark cells. With DoRA, it yields a +1.85 point mean gain, positive on all backbones and on 13 of 16 cells. Paired seed checks support positive average effects but show nontrivial variance, so we treat them as sensitivity checks. Cost is small: <1% trainable parameters relative to the backbone and 1.01-1.02x inference overhead versus matched LoRA. A Llama-3.2-1B transfer probe is also positive under LoRA and rsLoRA across seven paired seeds, supporting a transfer beyond Qwen. Internal-state analyses suggest TALAN is a small complementary activation intervention. The matched adapter update is 80-1,700x larger than the TALAN perturbation, yet their directions have near-zero cosine; per-layer measurements show this small orthogonal perturbation propagates and amplifies through depth. TALAN offers a practical platform for studying steerable activation-level adaptation within standard adapter-based post-training.


2606.06920 2026-06-08 cs.LG cs.AI new submission

The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

Rahul Nair, Chun Tao

Affiliations: GitHub; University of California, Berkeley; Stanford University

AI summary: Evaluates fine-tuning strategies for five sub-1B-parameter models on mathematical reasoning tasks. Full fine-tuning causes negative transfer in models under 300M parameters, which makes parameter-efficient fine-tuning (PEFT) a stability requirement.

Comments 8 pages, 6 figures, 2 tables

Abstract

Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. In this study, we benchmark five sub-1B models (135M-1B) on mathematical reasoning tasks and uncover a critical vulnerability: Full Fine-Tuning (Full FT) actively harms performance in models under 300M parameters, often dropping accuracy below zero-shot baselines. This "negative transfer" makes Parameter-Efficient Fine-Tuning (PEFT) not just an efficiency preference, but a stability requirement. We find that while Low-Rank Adaptation (LoRA) and Weight-Decomposed LoRA (DoRA) perform comparably, their strengths vary by task; DoRA excels in complex reasoning (GSM8K), while LoRA dominates pattern matching (OrcaMath). In particular, Full FT is outperformed by LoRA on aligned models (Qwen2.5-0.5B) and even by simple 5-shot In-Context Learning on the smallest architectures (SmolLM2-135M). Based on these findings, we recommend defaulting to PEFT for all aligned sub-1B models and caution against Full FT for any architecture smaller than 500M parameters to prevent catastrophic forgetting. Reproduction of this work can be found at https://github.com/gulguluu/tiny-slm-finetune-compare.
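To make the efficiency side of the PEFT argument concrete, the following minimal sketch counts trainable parameters for one weight matrix under a full update and under a rank-r LoRA update, where the update is factored as B @ A with B of shape d_out x r and A of shape r x d_in. The layer shape and rank are illustrative assumptions, not values reported in the paper, and the sketch says nothing about the stability findings themselves.

```python
def full_ft_params(d_out: int, d_in: int) -> int:
    """Trainable entries when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable entries in a low-rank update B @ A (B: d_out x rank, A: rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer shape and rank, chosen only for illustration.
d_out, d_in, rank = 576, 576, 8
full = full_ft_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
ratio = lora / full
print(f"full: {full}, LoRA: {lora}, ratio: {ratio:.4f}")
```

For this assumed shape the low-rank update trains 1/36 of the entries that a full update would touch.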


2606.06924 2026-06-08 cs.LG new submission

From Sampled Outcomes to Capability Distributions: Rethinking Supervision for LLM Routing

Guannan Lai, Haoran Hu, Long Chen, Zhenguo Li, Han-Jia Ye

Affiliations: School of Artificial Intelligence, Nanjing University; National Key Laboratory for Novel Software Technology, Nanjing University; Hong Kong University of Science and Technology; Frontier Robotics

AI summary: To address the noise that comes from using a single response as the supervision signal for LLM routing, proposes the DARS framework, which builds routing supervision from a distributional view and accounts for both input-side and output-side uncertainty. Experiments show that distribution-aware supervision gives more stable labels and better routing behavior.

Abstract

Existing LLM routing methods typically treat a model's single response to a query as its capability label for training routers. However, because LLM generation is inherently stochastic, such single-shot supervision provides only a noisy observation of a query-model pair's behavior rather than a reliable capability estimate. We show that this assumption introduces systematic noise into routing supervision, making learned routing policies less reliable. To address this issue, we propose DARS (Distribution-Aware Routing Supervision), a framework that constructs routing supervision from a distributional view of model behavior. Instead of relying on a single generated response, DARS considers uncertainty from both the input side and the output side, capturing how semantically equivalent query formulations and stochastic generations affect model performance. Based on these distribution-aware observations, DARS builds more reliable supervision signals for routing. Experiments across diverse tasks show that single-shot labels can be misleading for model selection, while distribution-aware supervision provides more stable labels and improves learned routing behavior. Our results suggest that reliable LLM routing should move beyond single-response observations and be grounded in query-level model capability distributions.
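The noise argument can be illustrated with a small simulation. This is a sketch under a simplifying assumption, not the DARS procedure: a model is taken to answer a given query correctly with a fixed probability `p_true` (a hypothetical value), so a single sampled outcome is a Bernoulli draw, while the mean of k draws has variance p(1-p)/k.

```python
import random


def sample_outcomes(p_correct: float, k: int, rng: random.Random) -> list[int]:
    """Draw k independent correct/incorrect outcomes (1 = correct)."""
    return [1 if rng.random() < p_correct else 0 for _ in range(k)]


def capability_estimate(outcomes: list[int]) -> float:
    """Empirical success rate over the sampled outcomes."""
    return sum(outcomes) / len(outcomes)


def estimator_variance(p_correct: float, k: int) -> float:
    """Variance of the mean of k Bernoulli(p_correct) draws."""
    return p_correct * (1 - p_correct) / k


rng = random.Random(0)
p_true = 0.6  # hypothetical per-query success probability
single = capability_estimate(sample_outcomes(p_true, 1, rng))   # always 0.0 or 1.0
multi = capability_estimate(sample_outcomes(p_true, 200, rng))  # concentrates near p_true
print(single, multi, estimator_variance(p_true, 1), estimator_variance(p_true, 200))
```

A single-shot label can only be 0 or 1, so it cannot represent a model that succeeds 60% of the time; the multi-sample estimate can.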


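2606.06934 2026-06-08 cs.LG new submission

Uniform Stability and Generalization Error of GD and SGD on Fixed-Point Parameters

Jonghyun Shin, Sejun Park

Affiliations: Department of Artificial Intelligence, Korea University

AI summary: Studies the generalization error and uniform stability of gradient descent (GD) and stochastic gradient descent (SGD) over discrete parameter spaces. Deterministic rounding worsens the GD generalization error rate from O(T/n) to O(T/√n), SGD with deterministic rounding still admits nontrivial stability guarantees, and stochastic rounding can introduce generalization error that grows with the dimension.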

Abstract

We analyze generalization error, uniform stability, and uniform argument stability of gradient descent (GD) and stochastic gradient descent (SGD) over discrete parameter spaces, where each update involves deterministic or stochastic rounding. We show that deterministic rounding degrades the generalization error of GD on convex, Lipschitz, and smooth loss functions, increasing the rate from $O(T/n)$ to $O(T/\sqrt{n})$, and establish matching lower bounds. We further prove that uniform stability of GD becomes $\Omega(T)$, showing that stability-based generalization bounds are vacuous in this setting. In contrast, for the same losses, stochastic gradient descent with deterministic rounding admits nontrivial uniform stability guarantees, which differ qualitatively from the real-valued case and exhibit distinct dependencies on the number of iterations and the dimension: we prove tight bounds $O(T/n)$ for one dimension and $O(T^2/n)$ for higher dimensions. We also show that stochastic rounding can introduce generalization error that increases with the dimension; such a phenomenon is absent in standard real-valued optimization and in the deterministic rounding case. Finally, we provide upper bounds on uniform argument stability for stochastic rounding schemes and show that these bounds are tight when the loss can be represented as a sum of coordinate-wise functions.
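The two rounding modes can be sketched on a one-dimensional fixed-point grid. This is an illustrative toy, assuming a grid spacing `step` and a single gradient step with made-up values; it does not reproduce any of the paper's bounds. It shows only the basic behavior: round-to-nearest can discard a small update entirely, while stochastic rounding keeps the update correct in expectation.

```python
import math
import random


def round_deterministic(x: float, step: float) -> float:
    """Round x to the nearest multiple of `step`."""
    return step * math.floor(x / step + 0.5)


def round_stochastic(x: float, step: float, rng: random.Random) -> float:
    """Round x down or up to a grid point; rounding up has probability equal to
    the fractional position of x between its neighbors, so E[result] = x."""
    lower = math.floor(x / step)
    frac = x / step - lower
    return step * (lower + (1 if rng.random() < frac else 0))


def gd_step(w: float, grad: float, lr: float, rounder) -> float:
    """One gradient step followed by rounding back onto the grid."""
    return rounder(w - lr * grad)


step = 0.25                    # hypothetical grid spacing
w0, grad, lr = 1.0, 0.4, 0.1   # update of size 0.04, well below step / 2

# Deterministic rounding: 0.96 rounds back to 1.0, so the update is lost.
w_det = gd_step(w0, grad, lr, lambda x: round_deterministic(x, step))

# Stochastic rounding: each result is 0.75 or 1.0, with mean close to 0.96.
rng = random.Random(0)
samples = [gd_step(w0, grad, lr, lambda x: round_stochastic(x, step, rng))
           for _ in range(20000)]
mean_sr = sum(samples) / len(samples)
print(w_det, mean_sr)
```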


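2606.06967 2026-06-08 cs.LG new submission

GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios

Ke Hu, Shutong Ding, Panxin Tao, Jingya Wang, Ye Shi

Affiliations: ShanghaiTech University

AI summary: Proposes the GenPO++ framework, which uses the history states of a high-order reversible ODE solver as auxiliary memory to obtain an exactly invertible mapping, so that the likelihood ratio of a generative flow policy can be computed efficiently and without bias. It matches or outperforms existing methods on continuous-control tasks.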

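Abstract

Generative policies offer expressive, multimodal action distributions, which makes them attractive for reinforcement learning (RL) on complex continuous-control tasks. Flow-based policies are especially appealing because they generate actions through a deterministic transport map. Applying such policies to likelihood-based online learning, however, is still limited by the difficulty of evaluating the probability of an executed action. Existing flow-RL methods either replace the true action-density ratio with an approximate surrogate, which can bias the updates, or recover the exact likelihood through dummy-action augmentation, which enlarges the policy space and adds computation. In this work, we propose GenPO++, a reversible generative policy optimization framework that uses the history states of a high-order reversible ODE solver as auxiliary memory, achieving exact inversion without changing the original action dimension. The log-determinant of the resulting policy map depends only on fixed solver coefficients, which gives an exact, Jacobian-free likelihood-ratio computation. The design keeps the expressiveness of generative flow policies while avoiding both action-ratio bias and dummy-action overhead. We evaluate GenPO++ on large-scale simulated control, fine-tuning, and real-robot manipulation tasks, where it achieves competitive or better performance than state-of-the-art online RL methods while improving training stability and computational efficiency.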

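A Hierarchical Reconciliation Approach to Urban Rail Transit Demand Forecasting under Operational Disruptions

Dang Viet Anh Nguyen, Alma Fazlagic, Kristine Pryds Loft, Filipe Rodrigues

Affiliations: Technical University of Denmark (DTU)

AI summary: To address incoherence between station-level and OD-flow forecasts in urban rail transit, proposes the first hierarchical reconciliation framework, in which a neural Fully Connected Reconciler (FCR) learns a non-linear mapping that guarantees structural consistency. Under disruption scenarios it reduces OD forecasting error by up to 17.45%.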

Comments 33 pages, 6 figures, 16 tables

Abstract

Accurate and coherent passenger demand forecasting is essential for Urban Rail Transit (URT) operations. Passenger demand has a hierarchical structure in which origin-destination (OD) flows aggregate to station-level inflows and outflows through conservation constraints. In practice, station-level and OD-level forecasts are often generated independently, producing incoherent predictions that violate these constraints and introduce inconsistencies into operational decision-making. Such issues become more severe during disruptions, when forecasting reliability is most critical. This paper presents the first hierarchical forecast reconciliation framework for joint station-level and OD-level URT demand prediction. A neural Fully Connected Reconciler (FCR) learns a non-linear mapping from incoherent base forecasts to coherent hierarchical predictions while guaranteeing exact structural consistency by construction. The method is benchmarked against OLS, WLS, and Minimum Trace (MinT) variants using Rejsekort smart-card data from the Copenhagen S-train network under one-step, multi-step, and disruption forecasting scenarios. Results show that reconciliation consistently improves OD forecasting accuracy while ensuring hierarchical coherence. Under normal conditions, FCR performs competitively with MinT-based methods. An oracle analysis indicates that perfect station-level forecasts could reduce OD prediction error by up to 34 percent, highlighting the value of improved base forecasts. Under severe disruptions, FCR outperforms classical methods, reducing OD forecasting error by up to 17.45 percent in multi-step destination-side delay scenarios. These findings establish hierarchical reconciliation as an effective mechanism for improving forecast robustness, with the largest benefits occurring under the most challenging operating conditions.
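What reconciliation does can be seen on the smallest possible hierarchy. The sketch below is an assumption-laden toy, not the FCR: it applies OLS reconciliation, one of the classical baselines named above, to a single station total and its OD flows. For this two-level hierarchy the OLS projection has a simple closed form: the gap between the station forecast and the sum of the OD forecasts is spread equally over all n + 1 series. The forecast values are made up.

```python
def ols_reconcile(station_hat: float, od_hat: list[float]) -> tuple[float, list[float]]:
    """OLS reconciliation for the hierarchy station = sum(od flows).

    Projects the base forecasts onto the set of coherent forecasts. With one
    aggregate over n bottom series, each OD forecast moves up by gap / (n + 1)
    and the station forecast is recomputed as their sum.
    """
    n = len(od_hat)
    gap = station_hat - sum(od_hat)          # incoherence of the base forecasts
    od = [f + gap / (n + 1) for f in od_hat]
    station = sum(od)                        # coherent by construction
    return station, od


# Hypothetical base forecasts: the station says 10 passengers, the OD flows sum to 9.
station, od = ols_reconcile(10.0, [4.0, 5.0])
print(station, od)
```

After reconciliation the OD flows sum to the station total exactly, which is the conservation constraint the independent base forecasts violated.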


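2606.07058 2026-06-08 cs.LG cs.CV math.AT stat.ML new submission

Constructing VAE Latent Spaces with Prescribed Topology

Jilles S. van Hulst, Jakub M. Tomczak, W. P. M. H. Heemels, Duarte J. Antunes

Affiliations: Control Systems Technology Section, Department of Mechanical Engineering, Eindhoven University of Technology; Nature Innovation Laboratory (NatInLab)

AI summary: To address the mismatch between a standard Gaussian prior and data manifolds with non-Euclidean topology, proposes a constructive mathematical framework that uses factorized distributions and the reparametrization trick to design topology-matched priors for manifolds admitting a product covering space (cylinders, tori, Möbius strips, and others), improving reconstruction quality and the faithfulness of the representation.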

Comments 16 pages, 7 figures

Abstract

Variational autoencoders (VAEs) learn low-dimensional latent representations of high-dimensional data. When the data lies on a manifold with non-Euclidean topology, the standard Gaussian prior introduces a topological mismatch that degrades reconstruction quality and prevents faithful representation. We present a constructive mathematical framework that resolves this mismatch for all manifolds that admit a product covering space. These are manifolds expressible as products of elementary factors (circles, intervals, or lines) or as quotients of such products by a finite symmetry group. The class includes cylinders, tori, Möbius strips, Klein bottles, and real projective spaces. Factorized distributions over the elementary factors yield product topologies with closed-form, decoupled KL divergences, so that each latent factor can be shaped independently while keeping training tractable. We catalogue reparametrizable encoder-prior pairs for periodic, bounded, and unbounded supports, and provide coordinate transformations that allow standard neural networks to output non-Euclidean parameters with smooth gradients. For quotient manifolds, the decoder receives group-invariant features of the covering-space coordinates, so that identified points produce identical outputs. Anchor constraints fix the coordinate system relative to the data or create soft topological holes. Experiments on synthetic manifolds and real-image datasets (rotated and cyclically shifted MNIST) confirm that a topology-matched prior aligns KL regularization with the data manifold. The resulting topology-aware models outperform the Gaussian baseline at all practically relevant regularization strengths. The code is available at https://github.com/JvHulst/VAE-Topology.
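The idea of feeding the decoder group-invariant features can be sketched for the Möbius strip, viewed as the cylinder with coordinates (theta, t) under the identification (theta, t) ~ (theta + pi, -t). The feature map below is one standard choice and is an assumption for illustration; the paper's actual construction and its repository code may differ. The helper `to_angle` likewise shows only one common way to turn an unconstrained two-dimensional network output into a periodic coordinate.

```python
import math


def to_angle(a: float, b: float) -> float:
    """Map an unconstrained 2-D output (a, b) to an angle in [-pi, pi]."""
    return math.atan2(b, a)


def mobius_features(theta: float, t: float) -> tuple[float, float, float, float]:
    """Features unchanged by (theta, t) -> (theta + pi, -t) and by theta -> theta + 2*pi.

    cos(2*theta) and sin(2*theta) have period pi in theta; t*cos(theta) and
    t*sin(theta) pick up two sign flips under the identification, which cancel.
    """
    return (math.cos(2 * theta), math.sin(2 * theta),
            t * math.cos(theta), t * math.sin(theta))


# Two covering-space points that represent the same point on the strip.
p = mobius_features(0.3, 0.5)
q = mobius_features(0.3 + math.pi, -0.5)
print(p)
print(q)
```

Because the two identified points map to the same features, any decoder applied to those features produces the same output for both.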


2606.07068 2026-06-08 cs.LG new submission

Bias in Filter Feature Selection Evaluation: A Meta-Analysis of Datasets, Baselines, and Experimental Design Choices

Malick Ebiele, Malika Bendechache, Rob Brennan

Affiliations: University College Dublin; University of Galway; ADAPT Centre

AI summary: Analyzes 28 high-profile filter feature selection studies and finds that the numbers of datasets, baselines, and new methods explain 33% of the variance in win rate, exposing potential bias in evaluation. Offers five evidence-based recommendations for evaluation.

Abstract

Background: Since 1990, many feature selection methods have been proposed across heterogeneous applications. To validate the usefulness of a new method, it must be compared against at least one baseline method from the existing literature on a feature selection task using at least one dataset. Recent developments in tabular Deep Learning (DL) and data valuation in Machine Learning (ML) suggest that the evaluation of new methods, algorithms, and models may be consciously or unconsciously biased. We hypothesise that a similar trend exists in feature selection (FS), particularly in filter feature selection (FFS). The aim of this study is therefore to examine FFS studies to identify factors that influence the evaluation and that might constitute entry points for bias, in order to recommend stronger principles for FFS evaluation. Methods: We analyse a sample of 28 high-profile FFS studies published between 1994 and 2025. The analysis provides reflections on how to examine FFS studies, highlights lessons learned throughout the process, and gives five evidence-based recommendations for future FFS evaluation. Results: A multivariate linear regression analysis achieved $R^2=0.33$. This means that 33% of the variance in the performance of new methods against chosen baselines (win rate) is explained by the number of datasets (#Datasets), the number of baselines (#Baselines), and the number of new methods (#NewMethods). Discussion: $R^2=0.33$ is considered a medium level of explanation, which is promising given that this is the first such study. The explanation is only medium because win rate is influenced by additional factors, such as the maturity of the feature selection domain and the type of datasets and baselines, and because the regression model used to explain the relationship is simple.


2606.07074 2026-06-08 cs.LG cs.AI new submission

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Zequn Xie, Junjie Wang, Dan Yang, Jie Feng, Yue Shen, Jian Wang, Jinjie Gu

Affiliations: Zhejiang University; Ant Group

AI summary: Proposes the SlimSearcher framework, which uses Pareto-efficient filtration and Adaptive Reward Gating to cut tool-call rounds by 17%-58% while maintaining or improving accuracy.

Comments 17 pages, 8 figures

Abstract

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependency and performative reasoning: they generate long, redundant trajectories that far exceed what the tasks require, leading to wasteful tool calls and excessive token consumption. To overcome this efficiency trap, we propose SlimSearcher, a principled framework that pushes the Pareto frontier between accuracy and computational cost across both Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). In the SFT stage, SlimSearcher employs Pareto-efficient filtration to distill trajectories that are both successful and economical, guiding the model toward inherently efficiency-aware search behaviors. During RL, we introduce Adaptive Reward Gating, a dynamic reward-shaping mechanism that evaluates relative tool and token efficiency within a sampled cohort. By cascading these adaptive efficiency metrics with a strict correctness gate, our approach effectively avoids the brevity bias associated with absolute penalties and mitigates reward hacking. Extensive experiments on long-horizon benchmarks, including GAIA, BrowseComp, and XBenchDeepSearch, demonstrate that SlimSearcher reduces average tool-call rounds by 17%-58% while maintaining or improving accuracy.
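
The two mechanisms named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Trajectory` fields, the dominance rule over (tool calls, tokens), the cohort-average baseline, and the `bonus` weight are all assumptions chosen for illustration. It shows one plausible reading of "Pareto-efficient filtration" (keep correct trajectories that no other correct trajectory beats on cost) and of a correctness gate cascaded with cohort-relative efficiency.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trajectory:
    correct: bool     # did the rollout reach the right answer?
    tool_calls: int   # number of tool-call rounds used
    tokens: int       # tokens consumed


def pareto_filter(trajs):
    """Keep correct trajectories that no other correct one dominates on cost."""
    ok = [t for t in trajs if t.correct]

    def dominated(t):
        return any(
            o.tool_calls <= t.tool_calls and o.tokens <= t.tokens
            and (o.tool_calls < t.tool_calls or o.tokens < t.tokens)
            for o in ok
        )

    return [t for t in ok if not dominated(t)]


def relative_saving(cost, cohort_avg):
    """Fraction saved versus the cohort average, clipped at zero."""
    if cohort_avg <= 0:
        return 0.0
    return max(0.0, 1.0 - cost / cohort_avg)


def gated_rewards(cohort, bonus=0.5):
    """Correctness gate first, then a bonus for beating the cohort on cost."""
    avg_calls = mean(t.tool_calls for t in cohort)
    avg_tokens = mean(t.tokens for t in cohort)
    rewards = []
    for t in cohort:
        if not t.correct:
            rewards.append(0.0)  # gate closed: brevity alone earns nothing
            continue
        saving = 0.5 * (relative_saving(t.tool_calls, avg_calls)
                        + relative_saving(t.tokens, avg_tokens))
        rewards.append(1.0 + bonus * saving)
    return rewards
```

Because efficiency is scored relative to the sampled cohort and only after the correctness gate, a short but wrong rollout earns nothing, which is the behaviour the abstract contrasts with absolute length penalties.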

2606.07093 2026-06-08 cs.LG 新提交

The discovery of the effects of women employment participation on the fertility of developing countries: A panel data approach

女性就业参与对发展中国家生育率影响的发现：面板数据方法

Thi Kim Ngan Nguyen

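Affiliations: Tokyo International University

AI summary: Using a panel data approach, the paper splits 115 developing countries into four continental groups and finds that the effect of female labor force participation on fertility varies by region, with a significant negative association only in the Americas.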

AI中文摘要

过去几十年，发展中国家的生育率显著下降，同时女性在职场中的作用有所提升。为了更深入地了解女性劳动力市场参与率对发展中国家总生育率的因果关系，本文将1991-2018年间115个发展中国家的数据集分为四个大洲组（非洲、南北美洲、亚太、欧洲），并采用数据驱动的面板数据计量经济学程序来减轻遗漏变量偏差。结果表明，南北美洲大陆女性的生育行为受到其职业选择的影响；而在其他地区的社会中，女性在考虑生育时，其他因素可能更为重要。总之，政策制定者可以借鉴本文制定政策，以在生育决策方面提供更多激励，该领域的进一步研究需要考虑发展中国家的家庭政策和从夫居作为重要数据。

英文摘要

Fertility in developing countries has declined significantly over the last few decades; at the same time, the role of women in the workplace has improved. To gain better insight into the causal effect of women's labor market participation rate on the total fertility rate in the developing world, this paper divides a dataset of 115 developing countries over the period 1991-2018 into four continental groups (Africa, North/South America, Asia/Pacific, Europe) and then applies a data-driven panel data econometric procedure to mitigate omitted-variable bias. The results suggest that the fertility behavior of women in North/South America is influenced by their career choices, whereas in other regions other factors might matter more to women when they consider having children. In conclusion, policymakers can draw on the paper to formulate policies that provide more incentives in reproductive decisions, and further research in the field needs to consider family policies and patrilocality in developing countries as important data.

2606.07116 2026-06-08 cs.LG cs.AI cs.CL 新提交

OffQ: Taming Structured Outliers in LLM Quantization by Offsetting

OffQ：通过偏移驯服LLM量化中的结构化异常值

Haoqi Wang, Lorenz K. Mueller, Jiawei Zhuang, Mathieu Salzmann, Lukas Cavigelli

Affiliations: School of Computer and Communication Sciences, EPFL, Switzerland; Huawei, Switzerland; Swiss Data Science Center, ETHZ & EPFL, Switzerland

AI summary: Proposes OffQ, which identifies the outlier subspace with top-1 PCA, rotates to concentrate the outlier channels, and converts them into a shared offset, enabling low-bit uniform quantization of LLMs and improving accuracy under W4A4KV4.

AI中文摘要

$\alpha$-PFN: Fast Entropy Search via In-Context Learning

Herilalaina Rakotoarison, Steven Adriaensen, Tom Viering, Carl Hvarfner, Samuel Müller, Frank Hutter, Eytan Bakshy

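Affiliations: University of Freiburg; University of Tübingen; University of Amsterdam; Lund University; Meta

AI summary: Proposes a two-stage amortization strategy that uses Prior-data Fitted Networks (PFNs) to approximate entropy-search acquisition functions in a single forward pass, achieving speedups of more than 50x with performance comparable to state-of-the-art methods on synthetic and real-world benchmarks.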

Comments Published at ICML 2026

AI中文摘要

信息论采集函数如熵搜索（ES）为贝叶斯优化（BO）提供了原则性的探索-利用框架。然而，它们的实际实现依赖于复杂且缓慢的近似，即信息增益的蒙特卡洛估计。这种复杂性可能引入数值误差，并需要专门的、手工定制的实现。我们提出了一种两阶段摊销策略，该策略学习使用先验数据拟合网络（PFN）在单次前向传播中近似基于熵搜索的采集函数。第一个PFN被训练为以最优值的信息为条件；第二个$\alpha$-PFN通过训练来预测期望信息增益，该训练基于使用第一个PFN测量的信息增益。$\alpha$-PFN提供了一种灵活的学习近似，用每个候选点的单次前向传播取代了复杂的启发式近似，实现了快速且可扩展的采集评估。实验上，我们的方法在合成和真实世界基准上与最先进的熵搜索实现具有竞争力，同时在我们所有实验中加速了不同的熵搜索变体，加速比超过50倍。源代码：此https URL。

英文摘要

Information-theoretic acquisition functions such as Entropy Search (ES) offer a principled exploration-exploitation framework for Bayesian optimization (BO). However, their practical implementation relies on complicated and slow approximations, i.e., a Monte Carlo estimation of the information gain. This complexity can introduce numerical errors and requires specialized, hand-crafted implementations. We propose a two-stage amortization strategy that learns to approximate entropy search-based acquisition functions using Prior-data Fitted Networks (PFNs) in a single forward pass. A first PFN is trained to be conditioned on information about the optima; second, the $α$-PFN is trained to predict the expected information gain by training on information gains measured with the first PFN. The $α$-PFN offers a flexible learned approximation, which replaces the complex heuristic approximations with a single forward pass per candidate, enabling rapid and extensible acquisition evaluation. Empirically, our approach is competitive with state-of-the-art entropy search implementations on synthetic and real-world benchmarks, while accelerating the different entropy search variants across all our experiments, with speed ups over 50x. Source code: https://github.com/automl/AlphaPFN.

2606.07141 2026-06-08 cs.LG cs.AI 新提交

REMEDI: A Benchmark for Retention and Unlearning Evaluation in Multi-label Clinical Disease Inference

REMEDI：多标签临床疾病推断中的保留与遗忘评估基准

Anurag Sharma, Sai Teja Chunchu, Prasenjit Mitra, Sandipan Sikdar, Koustav Rudra

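Affiliations: IIT Kharagpur; Carnegie Mellon University; L3S Research Center, Leibniz University Hannover

AI summary: Introduces REMEDI, a benchmark for machine unlearning in multi-label clinical disease inference; using the MIMIC-III database, it evaluates the trade-off between utility and unlearning performance in existing methods and finds them unsuited to multi-label tasks.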

Comments Under review

AI中文摘要

用于临床疾病推断的语言模型在患者数据上进行训练，这些数据可能包含敏感和私人信息，数据所有者可能出于隐私或版权原因要求从训练模型中删除其数据。然而，精确遗忘患者特定数据是棘手的，而通过少量数据删除重新训练则资源密集。虽然存在几种可用的机器遗忘方法，但其效用通常局限于非医疗领域。此外，评估此类遗忘方法的现有基准主要使用合成数据集，这些数据集不能真正代表现实系统。因此，这些遗忘方法在医疗领域的有效性在很大程度上尚不清楚。为此，我们引入了REMEDI，一个针对多标签和多类别临床疾病推断的广泛机器遗忘基准，其中标签相关性、纵向结构和安全约束使得遗忘特别具有挑战性。与现有基准不同，REMEDI考虑：(1) 相关的应用领域（医疗），(2) 涉及多样遗忘实例集的全面遗忘设置，(3) 具有挑战性的遗忘场景，包括多标签和多类别分类任务，以及(4) 评估指标，涉及效用和遗忘程度的性能。REMEDI使用MIMIC-III临床数据库开发，该数据库包含患者的全面临床数据。现有遗忘方法的实验表明，效用和遗忘性能之间存在权衡。它们也大多不适合多标签分类任务。为促进可重复性，我们公开了我们的基准。

英文摘要

Language models trained for clinical disease inference are trained on patient data, which may include sensitive and private information, and data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning patient-specific data is intractable, and retraining with minor data removal is resource-intensive. While there exist several machine unlearning methods that can be used, their utility is generally restricted to non-medical domains. Moreover, the existing benchmarks for evaluating such unlearning methods primarily utilize synthetically curated datasets, which are not truly representative of real-world systems. Hence, the effectiveness of these unlearning methods in the medical domain is largely unclear. To this end, we introduce REMEDI, an extensive benchmark for machine unlearning tailored to multi-label and multi-class clinical disease inference, where label correlations, longitudinal structure, and safety constraints make unlearning particularly challenging. Unlike the existing benchmarks, REMEDI considers: (1) a relevant application domain (medical), (2) comprehensive unlearning setups involving diverse sets of forget instances, (3) challenging unlearning scenarios including multi-label and multi-class classification tasks, and (4) evaluation metrics covering both utility and the extent of unlearning achieved. REMEDI is developed using the MIMIC-III clinical database, which contains comprehensive clinical data of patients. Experiments with existing unlearning methods indicate that there is a trade-off between utility and unlearning performance. The methods are also largely unsuited to multi-label classification tasks. To facilitate reproducibility, we make our benchmark publicly available.

2606.07146 2026-06-08 cs.LG cs.CE 新提交

Decision-Aware Evaluation of Physics-Informed Surrogates

决策感知的物理信息替代模型评估

Daniel Cieślak, Andrzej Czyżewski

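Affiliations: Gdańsk University of Technology

AI summary: For evaluating physics-informed machine learning in engineering decisions, proposes the pinn-gym benchmark, whose metrics span curve error, physical admissibility, top-k retrieval, and regret; it shows that low nRMSE is insufficient to identify useful designs and that physics-informed losses alter trade-offs rather than monotonically improving all metrics.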

Comments 12 pages, 5 figures, 9 tables. Code and data available at https://github.com/Dyniel/pinn-gym

AI中文摘要

物理信息机器学习通常通过曲线误差来评估，尽管工程应用取决于下游决策：对候选方案进行排序、避免不可行设计以及限制遗憾值。我们引入了pinn-gym，一个用于材料条件晶格设计的开放基准，它结合了一个透明的降阶碰撞冲击预言机、五种可打印聚合物卡片、无量纲力响应目标以及一个涵盖曲线保真度、物理可行性、top-k检索和质量遗憾值的协议。在逐材料、混合和跨材料设置中，低nRMSE通常不足以识别有用的设计选择。物理信息损失改变了权衡，而不是单调地改进所有指标，并且无量纲条件化提高了可比性，但并未使迁移对称。该基准不是经过认证的材料模型；在发布的预言机、候选生成器和材料卡片中，pinn-gym提供了一个可重复的测试平台，用于评估PIML替代模型作为决策系统，而不仅仅是曲线预测器。

英文摘要

Physics-informed machine learning is often assessed by curve error, although engineering use depends on downstream decisions: ranking candidates, avoiding infeasible designs and limiting regret. We introduce pinn-gym, an open benchmark for material-conditioned lattice design that couples a transparent reduced-order crush-and-impact oracle with five printable polymer cards, dimensionless force-response targets and a protocol spanning curve fidelity, physical admissibility, top-k retrieval and mass regret. Across per-material, pooled and cross-material settings, low nRMSE is frequently insufficient to identify useful design selections. Physics-informed losses alter trade-offs rather than monotonically improving all metrics, and dimensionless conditioning improves comparability without making transfer symmetric. The benchmark is not a certified material model; within the released oracle, candidate generator and material cards, pinn-gym provides a reproducible testbed for evaluating PIML surrogates as decision systems rather than curve predictors alone.
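
To make the distinction between curve error and decision quality concrete, here is a minimal sketch of two decision-level metrics. The definitions are assumptions for illustration, not the pinn-gym protocol: `top_k_recall` treats a lower score as better, and `mass_regret` takes feasibility flags from the oracle and measures the extra true mass of the design the surrogate would pick.

```python
def top_k_recall(true_scores, predicted_scores, k):
    """Share of the truly best k candidates that the surrogate also ranks top k."""
    def best(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        return set(order[:k])

    return len(best(true_scores) & best(predicted_scores)) / k


def mass_regret(true_mass, feasible, predicted_mass):
    """Extra true mass paid for trusting the surrogate's favourite design."""
    candidates = [i for i, ok in enumerate(feasible) if ok]
    best_true = min(true_mass[i] for i in candidates)
    chosen = min(candidates, key=lambda i: predicted_mass[i])
    return true_mass[chosen] - best_true
```

A surrogate whose predictions are off by only a few percent can still swap the order of two near-optimal candidates, so recall and regret can be poor even when an error metric such as nRMSE looks good.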

2606.07151 2026-06-08 cs.LG 新提交

Morph: Flexible-Size Molecular Generation Based on Unbalanced Optimal Transport

Malte Franke, Stefan P. Schmid, Zarko Ivkovic, Kjell Jorner, Andreas Krause

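Affiliations: ETH Zürich; NCCR Catalysis

AI summary: To lift the fixed-atom-count limitation of existing diffusion and flow models, proposes Morph, a flexible-size molecular generative model based on unbalanced optimal transport, for conditional and unconditional 3D molecular design; it maintains performance while offering sampling flexibility and supports out-of-distribution generation.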

AI中文摘要

生成分子设计的成功取决于模型向高奖励样本的可引导性。由于许多分子性质与分子大小内在相关，准确捕捉性质与原子数的联合分布至关重要。然而，当前的扩散和基于流的模型固定了原子数，这最终限制了它们驾驭这种复杂关系的能力。为解决这一问题，我们引入了Morph，一种基于几何图的柔性尺寸生成模型，用于条件和无条件的3D分子设计。通过动态调整尺寸，Morph可以无缝集成现有的结构先验（如骨架），并显著增强性质引导。我们证明Morph在提供无与伦比的采样灵活性的同时，与当前固定尺寸的最先进模型性能相当。我们展示了在先前模型失败的领域中的分布外生成，为分子设计的增强生成建模铺平了道路。

英文摘要

The success of generative molecular design hinges on a model's steerability toward high-reward samples. Because many molecular properties are intrinsically linked to molecular size, accurately capturing the joint distribution of properties and the number of atoms is essential. However, current diffusion and flow-based models fix the number of atoms, which ultimately limits their ability to navigate this complex relationship. To address this, we introduce Morph, a flexible-size generative model for conditional and unconditional 3D molecular design based on geometric graphs. By dynamically adapting size, Morph can seamlessly integrate existing structural priors, like scaffolds, and significantly enhances property steering. We show that Morph matches current fixed-size state-of-the-art models while offering the benefit of unparalleled sampling flexibility. We demonstrate out-of-distribution generation in regimes where previous models fail, paving the way for enhanced generative modeling for molecular design.

2606.07254 2026-06-08 cs.LG cs.FL 新提交

A Held-Out Transition-Pair Falsifier for Long-Horizon Non-Abelian State Tracking

长视野非阿贝尔状态跟踪的保留转移对验证器

Jeonghoon Lee

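Affiliations: Attractor Dynamics

AI summary: Addressing the limits of sequence models in non-commutative state tracking, proposes a held-out transition-pair falsification protocol; a projected recurrent state model achieves perfect predictions over horizons of up to one million steps, pointing to explicit non-commutative state composition as an effective inductive bias.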

Comments Technical preprint, 24 pages. 7 figures

AI中文摘要

状态跟踪揭示了序列模型的一个尖锐限制：相关信号通常不是观测令牌的摘要，而是通过非交换变换演化的有序潜在状态。我们引入了一个用于有限非阿贝尔群跟踪的保留转移对验证器。该协议在训练期间禁止选定的有序生成器对，并在评估期间要求相同的局部模式，从而阻断了一条直接的局部转移记忆路径。在一个受控的 $S_3 \times S_3$ 基准测试中，仅在长度为8的序列上训练的投影循环状态模型，在长达1,048,576个令牌的评估视野中，跨五个种子产生了无错误的最终状态预测（每个视野完美250/250）。匹配的原生读出基线，包括bag、GRU和单配置结构化状态空间模型，在相同协议下保持接近基线水平。投影匹配的GRU、结构化SSM和bag基线配备了类似的有限群原型读出，在相同分割下也保持接近随机水平。机制诊断显示，硬投影与低同态误差、低状态一致性漂移和非平凡交换子分离同时出现，而软投影则导致最终状态精度崩溃。干净分割审计验证了训练和评估分区之间零逐字缩减词重叠和零结构模板重叠。该证据限于这个受控的有限群验证器，而非通用架构排名。在该范围内，显式投影的非交换状态组合作为长视野隐藏状态跟踪的有用归纳偏置。

英文摘要

State tracking exposes a sharp limitation of sequence models: the relevant signal is often not a summary of observed tokens, but an ordered latent state that evolves through non-commutative transformations. We introduce a held-out transition-pair falsifier for finite non-Abelian group tracking. The protocol forbids selected ordered generator pairs during training and requires the same local patterns during evaluation, blocking one direct local-transition memorization pathway. In a controlled $S_3 \times S_3$ benchmark, a projected recurrent state model trained only on length-8 sequences produces error-free final-state predictions (perfect 250/250 per horizon) through evaluation horizons up to 1,048,576 tokens across five seeds. Matched native-readout baselines, including bag, GRU, and a single-configuration structured state-space model, remain near floor under the same protocol. Projection-matched GRU, structured SSM, and bag baselines equipped with analogous finite-group prototype readouts also remain near chance under the same split. Mechanism diagnostics show that hard projection coincides with low homomorphism error, low state-consistency drift, and non-trivial commutator separation, while softened projection collapses final-state accuracy. Clean-split audits verify zero verbatim reduced-word overlap and zero structural-template overlap between training and evaluation partitions. The evidence is scoped to this controlled finite-group falsifier rather than to a general architecture ranking. Within that regime, explicit projected non-commutative state composition acts as a useful inductive bias for long-horizon hidden-state tracking.
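
The task setup described above can be sketched in a few lines. This is an illustrative reconstruction under assumptions, not the paper's benchmark code: the generator names, the choice of generators, and the reading of "held-out ordered generator pairs" as adjacent pairs in the input word are all hypothetical. It shows why order matters in $S_3 \times S_3$ and how a training split can exclude one ordered pair while keeping its reverse.

```python
from itertools import product

# Elements of S_3 as tuples p, where p[i] is the image of i.
IDENT = (0, 1, 2)
SWAP = (1, 0, 2)    # transposition exchanging 0 and 1
CYCLE = (1, 2, 0)   # 3-cycle 0 -> 1 -> 2 -> 0


def compose(p, q):
    """Permutation that applies q first, then p."""
    return tuple(p[q[i]] for i in range(3))


def compose_pair(a, b):
    """Componentwise product in S_3 x S_3."""
    return (compose(a[0], b[0]), compose(a[1], b[1]))


# Hypothetical generator set: each generator acts on one factor only.
GENERATORS = {
    "s1": (SWAP, IDENT), "c1": (CYCLE, IDENT),
    "s2": (IDENT, SWAP), "c2": (IDENT, CYCLE),
}


def final_state(word):
    """Hidden state after applying the generators in `word` in order."""
    state = (IDENT, IDENT)
    for name in word:
        state = compose_pair(GENERATORS[name], state)
    return state


def has_held_out_pair(word, held_out):
    """True if any adjacent ordered pair of generators is held out."""
    return any((a, b) in held_out for a, b in zip(word, word[1:]))


# Ordered pair: ("s1", "c1") is forbidden in training, ("c1", "s1") is not.
HELD_OUT = {("s1", "c1")}
train_words = [w for w in product(GENERATORS, repeat=3)
               if not has_held_out_pair(w, HELD_OUT)]
```

A bag-of-tokens summary cannot distinguish `("s1", "c1")` from `("c1", "s1")`, yet the two words end in different states, which is why the abstract stresses ordered latent state over token summaries.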

2606.07271 2026-06-08 cs.LG cs.AI cs.SD 新提交

Where Rectified Flows Leak: Characterising Membership Signals Along the Interpolation Path

整流流泄漏之处：沿插值路径表征成员信号

Thomas Sesmat, Gabriel Meseguer-Brocal, Geoffroy Peeters

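Affiliations: University of Amsterdam

AI summary: Analyzes training-data membership signals along the interpolation path of Rectified Flows, finds that the reconstruction gap between training and test data follows a bell-shaped curve, derives the peak location under Gaussian assumptions, validates the generality of this structure, and exploits it for a membership inference attack.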

Comments ICML 2026 article, 9 main pages and 25 with annexes, 11 figures

Journal ref: 43rd International Conference on Machine Learning, Seoul, South Korea, 2026

AI中文摘要

理解生成模型从训练数据中保留了什么仍然具有挑战性，这对版权和隐私有影响。除了逐字复制外，模型可以编码训练数据中更微妙的痕迹，这些痕迹从未出现在输出中，但仍可利用。我们针对整流流（Rectified Flows）研究了这一机制，整流流越来越多地用于部署的生成系统。我们分析了定义整流流训练的插值路径 $X_\lambda = (1-\lambda)X_0 + \lambda X_1$。我们展示了训练数据和测试数据的重建之间存在一个差距，该差距在 $\lambda$ 上呈钟形曲线，并在训练过程中累积，而验证指标保持稳定。该信号有一个最大值，我们在高斯假设下推导出其位置的闭式解。我们在音频和图像上验证了这些预测，并表明钟形结构是普遍的，而峰值预测在我们的假设满足时成立。作为概念验证，我们利用这种特定的 $\lambda$ 解析结构进行成员推断攻击，区分训练集的成员和非成员。

英文摘要

Understanding what generative models retain from training data remains challenging, with implications for copyright and privacy. Beyond verbatim reproduction, models can encode subtler traces of their training data that never surface in their outputs yet remain exploitable. We study this regime for Rectified Flows, which are increasingly used in deployed generative systems. We analyse the interpolation path $X_\lambda = (1-\lambda)X_0 + \lambda X_1$ that defines Rectified Flow training. We show that a gap exists between the reconstruction of train and test data that follows a bell-shaped curve over $\lambda$ and accumulates during training, while the validation metrics remain stable. The signal has a maximum whose location we derive in closed form under Gaussian assumptions. We validate these predictions on both audio and images and show that the bell-shaped structure is universal, while the peak prediction holds when our assumptions are satisfied. As a proof of concept, we exploit this specific $\lambda$-resolved structure to perform a Membership Inference Attack, distinguishing members of the training set from non-members.
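
The interpolation path and a $\lambda$-resolved train/test gap can be sketched as follows. The sketch rests on assumptions that the abstract does not state: $X_0$ is taken to be a Gaussian noise sample and $X_1$ a data sample, the error is the squared error on the straight-line velocity target $X_1 - X_0$, and `predict_velocity` is a hypothetical stand-in for a trained model. The paper's actual reconstruction metric may differ.

```python
import random


def interpolate(x0, x1, lam):
    """X_lambda = (1 - lam) * X_0 + lam * X_1, elementwise."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Derivative of the straight path with respect to lambda: X_1 - X_0."""
    return [b - a for a, b in zip(x0, x1)]


def reconstruction_error(predict_velocity, x0, x1, lam):
    """Squared error of a velocity model at one point on the path."""
    pred = predict_velocity(interpolate(x0, x1, lam), lam)
    target = velocity_target(x0, x1)
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def gap_profile(predict_velocity, train, test, lambdas, seed=0):
    """Mean test error minus mean train error, one value per lambda."""
    rng = random.Random(seed)

    def mean_error(data, lam):
        total = 0.0
        for x1 in data:
            x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # assumed noise endpoint
            total += reconstruction_error(predict_velocity, x0, x1, lam)
        return total / len(data)

    return [mean_error(test, lam) - mean_error(train, lam) for lam in lambdas]
```

Plotting `gap_profile` over a grid of $\lambda$ values is the kind of diagnostic in which a bell-shaped membership signal would show up, if the model has memorized part of its training set.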

2606.07289 2026-06-08 cs.LG cs.CV 新提交

Closed-Form Spectral Regularization for Multi-Task Model Merging

多任务模型融合的闭式谱正则化

Yongxian Wei, Runxi Cheng, Xingxuan Zhang, Li Shen, Chun Yuan, Peng Cui, Dacheng Tao

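Affiliations: Shenzhen International Graduate School, Tsinghua University; Department of Computer Science and Technology, Tsinghua University; Sun Yat-sen University; Nanyang Technological University

AI summary: For interference minimization in multi-task model merging, finds that the iterative solver effectively acts as an implicit spectral regularizer, and on that basis proposes SWUDI, a closed-form method based on spectral filtering, and its adaptive variant SWUDI-A, which markedly improve efficiency while matching or outperforming existing methods.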

AI中文摘要

模型融合将多个独立微调专家合并为单个多任务模型，无需任何训练数据，降低了大型基础模型的存储、服务和去中心化开发成本。最先进的融合方法将融合表述为逐层二次干扰最小化问题。尽管该问题存在精确的闭式伪逆解，但该解在实践中性能不如数百次梯度下降迭代。迭代循环主导了流程的成本，但其有效性尚未得到解释。我们重新审视这一机制，并表明迭代求解器主要并非作为优化器；相反，它充当了病态正规方程的隐式谱正则化器，其中每层干扰算子的小特征值方向放大了代理噪声。基于这一发现，我们将多任务模型融合形式化为一个带噪线性逆问题，并提出一种由逐方向滤波器参数化的谱滤波估计器。我们通过SWUDI实例化该估计器，这是一种闭式方法，结合了软指数滤波器（匹配迭代下降的梯度流轨迹）和硬top-K截断（抑制放大噪声的小特征值方向）。此外，我们提出了SWUDI-A，一种自适应变体，用逐层秩规则替换全局秩超参数，进一步提高了跨架构的鲁棒性。两种变体共享每个线性层的单个对称特征分解，且不需要训练数据或优化器状态。在四个通用基准和一个涵盖VQA、几何、图表、OCR、定位和模态融合的多模态融合基准上，我们提出的谱求解器匹配或超越了最先进的融合方法。关键的是，它们将挂钟时间减少了28-72倍，峰值GPU内存减少了高达50%。

英文摘要

Model merging combines several independently fine-tuned experts into a single multi-task model without any training data, reducing the storage, serving, and decentralized-development costs of large foundation models. State-of-the-art merging methods formulate merging as a layer-wise quadratic interference minimization problem. Although this problem admits an exact closed-form pseudoinverse solution, that solution underperforms hundreds of iterations of gradient descent in practice. The iterative loop dominates the cost of the pipeline, yet its effectiveness has remained unexplained. We revisit this regime and show that the iterative solver does not primarily act as an optimizer; rather, it serves as an implicit spectral regularizer for an ill-posed normal equation, where small-eigenvalue directions of the per-layer interference operator amplify proxy noise. Building on this finding, we formalize multi-task model merging as a noisy linear inverse problem and propose a spectral filtering estimator parameterized by a per-direction filter. We instantiate this estimator with SWUDI, a closed-form method that combines a soft exponential filter, which matches the gradient-flow trajectory of iterative descent, with a hard top-K truncation that suppresses noise-amplifying small-eigenvalue directions. Furthermore, we propose SWUDI-A, an adaptive variant that replaces the global rank hyperparameter with per-layer rank rules, further improving robustness across architectures. Both variants share a single symmetric eigendecomposition per linear layer and require no training data or optimizer state. Across four general benchmarks and a multimodal merging benchmark spanning VQA, Geometry, Chart, OCR, Grounding, and modality merging, our proposed spectral solvers match or outperform state-of-the-art merging methods. Crucially, they reduce wall-clock time by 28-72x and peak GPU memory by up to 50%.
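
The spectral-filtering view can be made concrete on a problem that is already expressed in its eigenbasis. The sketch below is an illustration under assumptions, not SWUDI itself: it uses the standard result that gradient flow on a quadratic with eigenvalue $\sigma$, started at zero and run for time $t$, applies the factor $(1 - e^{-t\sigma})/\sigma$ in that direction, and it combines this with a top-K cut in the simplest possible way. The function names and the parameters `t` and `k` are hypothetical.

```python
import math


def pinv_filter(sigma, tol=1e-12):
    """Exact pseudoinverse factor 1/sigma, which blows up as sigma -> 0."""
    return 1.0 / sigma if sigma > tol else 0.0


def soft_exp_filter(sigma, t):
    """Gradient-flow factor (1 - exp(-t * sigma)) / sigma for sigma >= 0."""
    if sigma == 0.0:
        return t  # limit of the expression as sigma -> 0
    return -math.expm1(-t * sigma) / sigma


def filtered_solution(eigvals, coeffs, t, k):
    """Estimate in the eigenbasis: soft filter on the k largest eigenvalues.

    eigvals[i] is the eigenvalue sigma_i; coeffs[i] is the component of the
    right-hand side along eigenvector i. Directions outside the top k are
    zeroed out.
    """
    order = sorted(range(len(eigvals)), key=lambda i: eigvals[i], reverse=True)
    keep = set(order[:k])
    return [
        soft_exp_filter(s, t) * c if i in keep else 0.0
        for i, (s, c) in enumerate(zip(eigvals, coeffs))
    ]
```

For a tiny eigenvalue such as `1e-6`, `pinv_filter` multiplies any noise in that direction by a million, while `soft_exp_filter` never exceeds `t`; this is the sense in which a finite number of descent steps acts as a regularizer rather than as an exact solver.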

2606.07291 2026-06-08 cs.LG 新提交

SleepExplain: Explainable Non-Rapid Eye Movement and Rapid Eye Movement Sleep Stage Classification from EEG Signals

Rafsan Jany, Md. Hamjajul Ashmafee, Iqram Hussain, Md Azam Hossain

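AI summary: Proposes the SleepExplain model, which classifies NREM and REM sleep stages with ensemble learning (Random Forest, XGBoost, Gradient Boosting), reaches an accuracy of 94.30%, and uses SHAP for explainability.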

Comments 6 pages, 7 figures, 2022 25th International Conference on Computer and Information Technology (ICCIT)

DOI: 10.1109/ICCIT57492.2022.10055956
Journal ref: 2022 25th International Conference on Computer and Information Technology (ICCIT), pp. 248-253, 2022

AI中文摘要

睡眠阶段分类是多种睡眠相关疾病最重要的诊断方法之一。脑电图（EEG）被认为是检查神经效应与睡眠阶段之间关联的有力工具，因为它能正确识别与睡眠相关的神经变化。在非快速眼动（NREM）和快速眼动（REM）睡眠阶段，许多神经和身体功能受到影响，因此在其功能中扮演重要角色。本研究旨在从睡眠EEG数据中分类NREM和REM睡眠阶段，并提出一个新颖的SleepExplain模型，一种可解释的NREM和REM睡眠阶段分类，以解释其预测。在这项工作中，使用随机森林、XGBoost和梯度提升集成分类模型对睡眠阶段进行分类。总体而言，我们获得了92.54%（随机森林）、94.25%（梯度提升）和94.30%（XGBoost）的准确率。对于可解释分类模型，我们采用博弈论方法SHAP（SHapley Additive exPlanations）为预测提供令人信服的解释。

英文摘要

Classification of sleep stages is one of the most important diagnostic approaches for a variety of sleep-related disorders. Electroencephalography (EEG) is regarded as a powerful tool for examining the association between neurological effects and sleep phases since it correctly identifies sleep-related neurological alterations. During Non-Rapid Eye Movement (NREM) and Rapid Eye Movement (REM) sleep phases, a number of nerve and bodily functions are affected, so both phases play an important role in those functions. This work aims to classify NREM and REM sleep stages from sleep EEG data and presents a novel SleepExplain model, an explainable NREM and REM sleep stage classifier that explains its predictions. In this work, sleep stages were classified using Random Forest, XGBoost, and Gradient Boosting ensemble classification models. Overall, we obtained an accuracy of 92.54% (Random Forest), 94.25% (Gradient Boosting), and 94.30% (XGBoost). For the explainable classification model, we utilized a game-theoretic approach, SHAP (SHapley Additive exPlanations), to offer a convincing explanation for the prediction.

2606.07365 2026-06-08 cs.LG cs.AI 新提交

Making the Most of Limited Data: Score-Aware Training for Text-to-Music Generation

Yun-Chen Cheng, Tzu-Hung Huang, Chih-Pin Tan

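Affiliations: National Taiwan University

AI summary: Proposes score-aware training, which uses a CLAP-conditioned Beta noise schedule to route low-scoring audio segments to high-noise training and combines it with segment-level filtering and a two-stage caption strategy, enabling efficient text-to-music generation from limited data; the system ranked second in the objective evaluation of the ICME 2026 ATTM challenge.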

AI中文摘要

最先进的文本到音乐生成系统依赖于大规模专有数据集和工业级计算资源，使得无法区分架构贡献与资源优势。我们提出分数感知训练，将音频-字幕对齐分数作为整个流程的直接监督信号。我们不丢弃低分片段，而是通过CLAP条件Beta噪声时间表将其重新用于高噪声训练阶段，作为有效的隐式正则化器。作为补充，段级过滤移除最不匹配的样本，两阶段字幕程序弥合了冗长训练字幕与简洁推理提示之间的分布差距。REPA辅助损失进一步从预训练的CLAP和MuQ编码器中迁移结构化语义知识，无需额外数据。我们基于FluxAudio的450M参数系统提交至ICME 2026 ATTM Grand Challenge效率赛道，在客观评估中两个赛道均排名第二，在最终MOS评估中效率赛道排名第三。

英文摘要

State-of-the-art text-to-music generation systems rely on massive proprietary datasets and industrial-scale compute, making it impossible to disentangle architectural contributions from resource advantages. We propose score-aware training, which treats audio-caption alignment score as a direct supervision signal throughout the pipeline. Rather than discarding low-scoring segments, we repurpose them via a CLAP-conditioned Beta noise timestep schedule that routes them to high-noise training regimes, acting as an effective implicit regularizer. Complementarily, segment-level filtering removes the most misaligned examples, and a two-stage caption procedure bridges the distribution gap between verbose training captions and concise inference prompts. A REPA auxiliary loss further transfers structured semantic knowledge from pretrained CLAP and MuQ encoders without additional data. Our 450M-parameter FluxAudio-based system, submitted to the ICME 2026 ATTM Grand Challenge Efficiency Track, ranked 2nd across both tracks in the objective evaluation and 3rd in the Efficiency Track in the final MOS evaluation.

2606.07400 2026-06-08 cs.LG 新提交

Generative Modeling of Discrete Latent Structures via Dynamic Policy Gradients

通过动态策略梯度对离散潜在结构进行生成建模

Stefan Ivanovic, Ge Liu, Mohammed El-Kebir

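Affiliations: University of California, Berkeley

AI summary: Proposes GReinSS, a framework that uses dynamically rescaled rewards to learn latent state distributions maximizing the observed data likelihood; it outperforms baselines on reconstructing simulated latent sets and graphs and reconstructs isoforms from RNA sequencing data more accurately than RSEM.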

Comments ICML 2026

AI中文摘要

许多科学问题需要从间接观测中推断未观测到的机械潜在状态。虽然经典方法（如期望最大化）无法扩展到组合爆炸的空间，但深度学习方法（如变分自编码器）通常形成人工潜在状态，而非重建机械真实状态。本文提出GReinSS，一个策略学习框架，使用动态缩放奖励来学习最大化观测数据似然的潜在状态分布。我们证明GReinSS能够准确重建模拟的潜在集和潜在图，优于替代的策略学习和生成建模基线。此外，GReinSS从真实短读RNA测序数据中重建的异构体，比标准RSEM算法更匹配通过正交长读测序检测到的异构体。总体而言，GReinSS是一种从间接观测中对组合潜在状态进行生成建模和推断的原则性且实际有效的方法。

英文摘要

Many scientific problems require inferring unobserved mechanistic latent states from indirect observations. While classical approaches, including expectation maximization, do not scale to combinatorially large spaces, deep learning approaches such as variational autoencoders typically form artificial latent states rather than reconstructing the mechanistic ground-truth states. Here, we introduce GReinSS, a policy learning framework that uses dynamically rescaled rewards to learn latent state distributions that maximize the observed data likelihood. We show that GReinSS accurately reconstructs simulated latent sets and latent graphs, outperforming alternative policy learning and generative modeling baselines. Additionally, GReinSS reconstructs isoforms from real short-read RNA sequencing data that better match isoforms detected by orthogonal long-read sequencing than the standard RSEM algorithm. Overall, GReinSS is a principled and practically effective approach for generative modeling and inference of combinatorial latent states from indirect observations.

2606.07404 2026-06-08 cs.LG 新提交

Reversible Foundations: Training a 120B Sparse MoE through State-Preserving Scaling

可逆基础：通过状态保持缩放训练120B稀疏MoE

Rohan Shravan

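Affiliations: The School of AI

AI summary: Reports end-to-end training of a hundred-billion-parameter sparse mixture-of-experts model on a single 8-GPU node, built on three disciplines (a reversible recurrence stack, state-preserving growth, and single-node economics) that take it through four stages from a dense seed to a 120B model.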

Comments 58 pages, 9 figures, 37 tables. Code: https://github.com/The-School-of-AI/LLM. Released models: huggingface.co/theschoolofai/LightningLM-0.1V-{2B, 5B-MoE, 9B-MoE, 120B-MoE}. Companion work: arXiv:2605.29379 (BrahmicTokenizer-131K), arXiv:2605.29459 (Kronecker Embeddings)

AI中文摘要

本文报告在单台八GPU节点上端到端训练千亿参数稀疏混合专家模型。LightningLM 0.1V是一个循环骨干语言模型家族，通过四个阶段从小型密集种子扩展，经过5B和9B混合专家，最终达到120B模型，具有460个路由专家，采用top-12路由。每个更大模型从小模型的训练权重增长而来；活跃参数从密集种子的1.78B单调增加到120B时的5.93B（约占存储的118.67B的5%）。整个谱系在单节点上运行，较大阶段在8K上下文中，达到120B规模时发布的训练损失为1.78。这是一份系统和经验报告，围绕三个原则组织。可逆性：可逆循环栈在反向传播中重建激活而非存储它们，使激活内存随模型增长保持平坦。状态保持增长：每次扩展（密集到MoE、浅到深、少专家到多专家）都作为可重现原则给出，并附有错误导致的失败案例；若干失败是无声的。单节点经济学：120B通过TQP训练，这是一种量化基础专家权重和训练低秩适配器的策略，将优化器状态承载于2.26B适配器参数而非路由专家中的100B+，将专家路径优化器状态减少约45倍。新颖之处在于已知原语的集成，而非任何孤立原语：一个在单节点上端到端运行的成长谱系，以从业者级别记录，并以每个领域的留出损失作为证据，表明目标能力（多语言印度能力、代码）是通过构造学习的。模型家族、分词器和训练代码已发布。

英文摘要

This paper reports on training a hundred-billion-parameter sparse mixture of experts on a single eight-GPU node, end to end. LightningLM 0.1V is a recurrence-backbone language model family grown in four stages from a small dense seed, through a 5B and a 9B mixture of experts, to a 120B model with 460 routed experts under top-12 routing. Each larger model is grown from the trained weights of the smaller one; active parameters rise monotonically from 1.78B at the dense seed to 5.93B at 120B (about 5% of the 118.67B stored). The full lineage runs on single nodes, the larger stages at 8K context, reaching a released training loss of 1.78 at 120B scale. This is a systems and experience report. It is organized around three disciplines. Reversibility: a reversible recurrence stack reconstructs activations in the backward pass instead of storing them, holding activation memory flat as the model grows. State-preserving growth: each expansion (dense to MoE, shallow to deep, few experts to many) is given as a reproducible principle paired with the failure that results from getting it wrong; several failures are silent. Single-node economics: the 120B trains through TQP, a strategy of quantized base expert weights and trained low-rank adapters that carries optimizer state on 2.26B adapter parameters rather than 100B+ resident in routed experts, cutting expert-path optimizer state by a factor of ~45. What is new is the integration of known primitives, not any primitive in isolation: one grown lineage running end to end on a single node, documented at practitioner level, with per-domain held-out loss as evidence that targeted capabilities (multilingual Indic competence, code) were learned by construction. Model family, tokenizer, and training code are released.
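
The reversibility idea, reconstructing activations in the backward pass instead of storing them, can be illustrated with an additive coupling block in the style of reversible residual networks. This is a generic sketch, not the LightningLM recurrence stack, whose exact form the abstract does not give; the sub-layers `f` and `g` are hypothetical stand-ins for learned modules.

```python
import math


def f(x):
    """Stand-in for a learned sub-layer (hypothetical)."""
    return [math.tanh(v) for v in x]


def g(x):
    """Second stand-in sub-layer (hypothetical)."""
    return [0.5 * v * v for v in x]


def forward(x1, x2):
    """Additive coupling: each half is updated using only the other half."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def inverse(y1, y2):
    """Recover the block's inputs from its outputs, undoing the steps in reverse."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2
```

Because `inverse` recovers the inputs from the outputs up to floating-point rounding, a stack of such blocks only needs to keep the final activations in memory during training, at the price of recomputing `f` and `g` once more per block in the backward pass.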

2606.07410 2026-06-08 cs.LG cs.AI 新提交

Graph Neural Networks Leveraging Higher-Order Class Label Connectivity for Heterophilous Graphs

Takuto Takahashi, Itsuki Nakayama, Takahiro Mitani, Ryosuke Kikuchi, Yuya Sasaki, Makoto Onizuka

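Affiliations: The University of Osaka

AI summary: To address limited node classification performance on heterophilous graphs, proposes the Label Context Classifier (LCC), which captures higher-order class label connectivity through label context embeddings generated by four types of walks and can be adaptively integrated with any GNN; experiments show it outperforms existing methods.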

AI中文摘要

图神经网络（GNN）中的节点分类已广泛应用于图分析的各个领域。在同配图中，具有相同类标签的节点倾向于连接，GNN能实现高精度节点分类。然而，在异配图中，不同类标签的节点更可能连接，其性能仍然有限。特别是，当前基于图卷积网络的GNN无法捕获高阶类标签连通性，而这在真实世界的异配图中经常出现。为了解决这个问题，我们提出了一种新颖的分类器——标签上下文分类器（LCC），旨在捕获有向图中的高阶类标签连通性。LCC通过利用四种不同类型的游走生成的标签上下文嵌入来估计目标节点的类标签。此外，我们的方法允许通过自适应学习LCC和任意GNN的重要性来集成它们。实验结果表明，与LCC集成的GNN优于最先进的方法，并且标签上下文嵌入提高了异配有向图中的节点分类性能。

英文摘要

Node classification in graph neural networks (GNNs) has been widely applied in various fields of graph analysis. GNNs achieve high-accuracy node classification in homophilous graphs, where nodes with the same class label tend to be connected. However, their performance remains limited in heterophilous graphs, where nodes with different class labels are more likely to be connected. In particular, current GNNs derived from graph convolutional networks cannot capture higher-order class label connectivity, which is frequently observed in real-world heterophilous graphs. To address this issue, we propose a novel classifier, Label Context Classifier (LCC), designed to capture higher-order class label connectivity in directed graphs. LCC estimates the class label of a target node by leveraging label context embeddings that are generated through four distinct types of walks. In addition, our approach allows the integration of LCC and any GNN by adaptively learning their importance. Experimental results demonstrate that GNNs integrated with LCC outperform SOTA methods and the label context embeddings improve the node classification performance in heterophilous directed graphs.

2606.07481 2026-06-08 cs.LG 新提交

Drifting Models for Surrogate Flow Modeling

用于代理流建模的漂移模型

Chris R. Jung, Markus Dörr, Natalie Jüngling, Jennifer Niessner, Adam T. Müller, Nicolaj C. Stache

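Affiliations: Center for Machine Learning (ZML); Institute for Flow in Additively Manufactured Porous Structures (ISAPS); Heilbronn University of Applied Sciences

AI summary: Proposes a conditional drifting framework that performs drifting in a VAE latent space and uses label-aware masking to align samples with boundary conditions, achieving high-quality single-pass generation two orders of magnitude faster than iterative diffusion.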

Comments Accepted to the 2nd International Symposium AI and Fluid Mechanics 2026

AI中文摘要

虽然计算流体动力学（CFD）可以为优化室内环境提供高保真流场，但其计算成本限制了快速探索。为了解决这个问题，生成式代理比确定性网络提供了更好的分布建模，但迭代采样速度慢。为了实现高质量的单步生成，我们将新颖的生成式漂移框架应用于流体力学。我们引入了一个条件架构，该架构在学习的VAE潜空间中进行漂移，并使用标签感知掩码将生成的样本与其边界条件对齐。我们的标签条件模型在精度和流一致性上匹配迭代扩散，同时运行速度快两个数量级。此外，我们提出了一种空间条件变体，为泛化到未见几何体开辟了有希望的路径。最终，条件漂移作为基于扩散方法的高效替代方案，为推理速度至关重要的实时CFD代理提供了可能。

英文摘要

While Computational Fluid Dynamics (CFD) provides high-fidelity flow fields for optimizing indoor environments, its computational cost limits rapid exploration. To address this, generative surrogates offer better distribution modeling than deterministic networks, but their iterative sampling is slow. To enable high-quality, single-pass generation, we adapt the novel generative drifting framework to fluid mechanics. We introduce a conditional architecture that performs drifting in a learned VAE latent space and uses label-aware masking to align generated samples with their boundary conditions. Our label-conditioned model matches iterative diffusion in accuracy and flow consistency while running two orders of magnitude faster. Additionally, we propose a spatial-conditioning variant that establishes a promising path towards generalization to unseen geometries. Ultimately, conditional drifting serves as a highly efficient alternative to diffusion-based approaches, unlocking real-time CFD surrogates where inference speed is critical.

2606.07483 2026-06-08 cs.LG stat.ML New submission

Network Recovery from Cascade Data: A Debiased Jacobian-Based Machine Learning Approach

Lei Huang

Affiliations: MIT Sloan School of Management

AI summary: Proposes CascadeNet, a framework that estimates the one-step transition function and debiases its Jacobian to recover hidden influence networks without specifying a diffusion model; it outperforms existing methods on simulations and COVID-19 transmission data.

Abstract

Many important outcomes unfold as dynamic cascades, including product adoption, disease spread, financial distress, and information diffusion. A central challenge is to recover the hidden influence network behind these cascades. Existing methods typically assume a specific diffusion model, and their performance degrades substantially when that assumption is misspecified. We propose CascadeNet, a Jacobian-based machine learning framework for network recovery that does not require specifying a diffusion mechanism. The key idea is that the underlying influence structure can be characterized by the Jacobian of the one-step transition function. CascadeNet first constructs a flexible estimator of the transition function, and further applies Neyman-orthogonal debiasing via the Riesz representer, so that the debiased Jacobian is $\sqrt{n}$-consistent and asymptotically normal, enabling formal inference on the network structure. We validate CascadeNet in both a simulation exercise and a real-world empirical application. In simulations, where the data-generating process is known, CascadeNet achieves the highest network recovery accuracy across nine common data-generating processes. In an empirical application to COVID-19 transmission across Spain's 52 provinces, CascadeNet recovers transmission networks that are significantly correlated with the true inter-province mobility network, whereas networks recovered by baseline methods show no significant alignment with the ground truth.
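The key idea in this abstract, that the Jacobian of the one-step transition function encodes who influences whom, can be illustrated on a toy system. The sketch below is not CascadeNet: it assumes a known, hypothetical logistic transition function `step` on three nodes with a hand-picked weight matrix `W`, and it uses central finite differences where the paper uses a learned estimator with Neyman-orthogonal debiasing.

```python
import math

# Hypothetical influence weights: W[i][j] is the effect of node j on node i.
# Node 0 influences node 1, and node 1 influences node 2; no other edges exist.
W = [
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.5, 0.0],
]

def step(x):
    """Toy one-step transition: each node's next state is a logistic
    function of the weighted current states of its in-neighbours."""
    n = len(x)
    return [1.0 / (1.0 + math.exp(-sum(W[i][j] * x[j] for j in range(n))))
            for i in range(n)]

def jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian: J[i][j] = d f_i / d x_j."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        hi = list(x)
        lo = list(x)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = f(hi), f(lo)
        for i in range(n):
            J[i][j] = (f_hi[i] - f_lo[i]) / (2 * eps)
    return J

def recovered_edges(J, tol=1e-4):
    """Edges (source j, target i) whose Jacobian entry is clearly non-zero."""
    return sorted((j, i) for i, row in enumerate(J)
                  for j, v in enumerate(row) if abs(v) > tol)

J = jacobian(step, [0.2, 0.5, 0.1])
edges = recovered_edges(J)
print(edges)
```

In this noiseless setting a fixed tolerance is enough to separate edges from non-edges. With estimated transition functions that is no longer true, which is where the formal inference described in the abstract would come in.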

2606.07488 2026-06-08 cs.LG New submission

CoMetaPNS: Continually Meta-learning Personalized Neural Surrogates for Cardiac Electrophysiology Simulations

Ryan Missel, Xiajun Jiang, Linwei Wang

Affiliations: Golisano College of Computing and Information Sciences, Rochester Institute of Technology; Department of Computer Science, Rowan University; The University of Utah

AI summary: Proposes CoMetaPNS, a continual meta-learning framework that uses a Bayesian Gaussian mixture model over a memory buffer to identify the source of incoming data, enabling continual learning of personalized neural surrogates without catastrophic forgetting and outperforming baselines in cardiac simulation forecasting.

Abstract

Personalized virtual heart simulations face challenges in model personalization and computational cost. While neural surrogates offer state-of-the-art solutions, they typically address either efficient personalization or training generalizable models, but not both. Recent work reframes this by learning the process of personalizing a surrogate using limited subject-specific context data, through few-shot generative modeling with set-conditioned surrogates and meta-learned amortized inference. These methods, however, assume a static and diverse training distribution with known task identifiers. When new data becomes available, they require costly retraining with all prior data to avoid catastrophic forgetting, a phenomenon where the model forgets earlier tasks when trained on new ones. This is a major limitation in clinical settings, where unlabeled data often arrives sequentially and full retraining is infeasible. This paper presents a new continual meta-learning framework to achieve personalized neural surrogates able to not only continually integrate information but also identify whether incoming data stems from a known or unknown dynamics source. By leveraging a continual Bayesian Gaussian Mixture Model over a memory buffer, our framework can infer the identifiers and relationships of data over time, which effective meta-learning requires. Empirical results on synthetic cardiac data demonstrate superior simulation forecasting, computational scalability, and resilience to catastrophic forgetting compared to existing baselines.

2606.07495 2026-06-08 cs.LG New submission

Second-Order Path Kernel Interpolation Formulas in Machine Learning

Jin Guo, Roy Y. He, Jean-Michel Morel

Affiliations: City University of Hong Kong

AI summary: Derives second-order path-kernel interpolation formulas for neural networks, adding a curvature-weighted term and, for stochastic gradient descent, a noise-coupling term; extends the result to momentum and refines the path-kernel interpretation of predictions.

Abstract

Understanding how training data shape neural network predictions is a central problem in modern learning theory. In 2020, Pedro Domingos proposed an interpolation formula valid for every model learned by deterministic gradient descent. It expresses the model's prediction as an integral, along the optimization path, of a data-dependent kernel that aligns the model's gradients at the test and training data. Such a first-order characterization remains valid for models trained with batch-based stochastic optimization. In this paper, we develop second-order forms of these interpolation formulas. We show that the leading path-kernel interpolation is supplemented by a curvature-weighted interpolation term. For stochastic gradient descent, an additional sampling-induced component appears, coupling the curvature of the prediction with the covariance of mini-batch gradient noise. We also extend the representation to stochastic gradient descent with momentum, where the interpolation structure is preserved but with the weights modified by a memory-related factor. Moreover, we establish a concentration estimate for the terminal prediction, identifying the fluctuation scale around the expected second-order representation. Together, these results provide a refinement of the path-kernel interpretation of neural network prediction.
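A small numerical experiment can make the first-order statement concrete. The sketch below is an illustration under stated assumptions, not the paper's formulas: it assumes a hypothetical two-parameter model `y(x; w) = tanh(w . x)`, a squared loss, three made-up training points, and plain full-batch gradient descent. It accumulates the first-order path-kernel sum along the optimization path and compares it with the actual change in the test prediction. The leftover gap comes from terms of second and higher order in the step, which is the kind of contribution a second-order formula is meant to capture.

```python
import math

# Hypothetical toy setup (not from the paper): model y(x; w) = tanh(w . x),
# squared loss, three training points, one test point.
TRAIN = [((1.0, 0.0), 0.8), ((0.0, 1.0), 0.2), ((1.0, 1.0), 0.85)]
X_TEST = (1.0, 0.5)

def predict(w, x):
    return math.tanh(w[0] * x[0] + w[1] * x[1])

def grad_pred(w, x):
    """Gradient of the prediction with respect to the weights."""
    s = 1.0 - predict(w, x) ** 2
    return (s * x[0], s * x[1])

def tangent_kernel(w, x, xi):
    """K_w(x, xi) = grad_w y(x) . grad_w y(xi)."""
    gx, gi = grad_pred(w, x), grad_pred(w, xi)
    return gx[0] * gi[0] + gx[1] * gi[1]

def first_order_gap(lr, total_time=10.0):
    """Run gradient descent; return |actual change - path-kernel sum|."""
    w = [0.5, -0.3]
    y0 = predict(w, X_TEST)
    path_sum = 0.0
    for _ in range(round(total_time / lr)):
        # Loss derivative L'(y_i*, y_i) = y_i - y_i* for 0.5 * (y - y*)^2.
        resid = [predict(w, xi) - yi for xi, yi in TRAIN]
        # First-order (path-kernel) contribution of this step.
        path_sum -= lr * sum(r * tangent_kernel(w, X_TEST, xi)
                             for r, (xi, _) in zip(resid, TRAIN))
        # Simultaneous gradient-descent update of both weights.
        grads = [grad_pred(w, xi) for xi, _ in TRAIN]
        for k in range(2):
            w[k] -= lr * sum(r * g[k] for r, g in zip(resid, grads))
    return abs((predict(w, X_TEST) - y0) - path_sum)

gap_coarse = first_order_gap(lr=0.1)
gap_fine = first_order_gap(lr=0.01)
print(gap_coarse, gap_fine)
```

Both runs cover the same total training time, so the only difference is the step size. The gap shrinking with the learning rate is consistent with the first-order formula being exact in the gradient-flow limit.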

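2606.07496 2026-06-08 cs.LG math.OC New submission

Which Anatomical Structures Matter Under Limited Labels? A Data-Efficient, Anatomy-Aware Benchmark for Cardiac Pathology Prediction

Himanshu Singh

Affiliations: Himanshu Singh

AI summary: Addresses medical imaging under limited labels and compute with an anatomy-aware benchmark that compares anatomical-structure representations and classifiers, finding that representation quality matters more than model complexity.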

Comments ACCEPTED at ICML 2026 Workshop GlobalSouthML (Seoul, South Korea; PMLR 306, 2026)

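2606.06514 2026-06-08 cs.AI cs.LG Cross-list

Detecting and Mitigating Bias by Treating Fairness as a Symmetry Operation

Nishit Singh

Affiliations: Nishit Singh

AI summary: Formalizes bias as a symmetry-breaking operation and restores symmetry through loss regularization, reducing violations by over 90% on synthetic datasets at an accuracy cost of about 5%.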

Comments 8 pages, 7 figures

2606.06515 2026-06-08 cs.AR cs.AI cs.DC cs.ET cs.LG Cross-list

DxPTA: An Architecture Design Space Exploration with Optical Dataflow-guided Strategy for HW/SW Co-Design of Photonic Transformer Accelerators

Rachmad Vidya Wicaksana Putra, Solomon Micheal Serunjogi, Mahmoud Rasras, Muhammad Shafique

Affiliations: eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi; Photonic Research Lab (PRL), Division of Engineering, New York University (NYU) Abu Dhabi; New York University (NYU) Abu Dhabi

AI summary: Proposes DxPTA, which analyzes architecture parameters through the optical dataflow and designs a constraint-aware search algorithm for hardware/software co-design of photonic transformer accelerators, substantially speeding up the search while meeting area, power, and other constraints.

Comments 8 pages, 12 figures

Abstract

Transformer-based networks have emerged as prominent AI models with state-of-the-art performance, potentially paving the way toward artificial general intelligence (AGI). However, their large sizes still hinder efficient implementation, highlighting the need for alternative solutions that enable energy-efficient acceleration. Recently, state-of-the-art works have proposed photonic transformer accelerators (PTAs) with significant speedup and energy-efficiency improvements over conventional electronic accelerators. However, their PTA architectures are developed without considering application constraints (e.g., area, power, energy, and latency). Moreover, their manual design approach requires substantial design time to determine a suitable architecture for the targeted application, which makes it hard to scale. To address these limitations, we propose DxPTA, a novel design space exploration methodology that enables efficient hardware/software co-design of a PTA architecture that meets all constraints. It does so by (1) identifying the PTA architecture parameters based on the coherent optical dataflow; (2) analyzing the impact/significance of the parameters; and (3) leveraging this analysis to devise a constraint-aware architecture search algorithm. Experimental results show that DxPTA can find appropriate PTA architectures for different transformer-based models (i.e., DeiT-T/S/B and BERT-B/L). It achieves up to 26mm^2 area, 4.8W power, 39mJ energy, and 6ms latency under constraints of 50mm^2 area, 5W power, 50mJ energy, and 10ms latency, with a 15.2x faster search time than the exhaustive approach. These results demonstrate the potential of the DxPTA methodology for enabling efficient PTA designs for diverse AGI-based applications.
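The constraint figures quoted in this abstract can be checked with a few lines of code. The sketch below only illustrates what "meets all constraints" means; it is not the DxPTA search algorithm. The dictionary keys and helper names are hypothetical, and treating the four "up to" figures as a single design point is an assumption.

```python
from typing import Dict

# Constraint budget quoted in the abstract (key names are hypothetical).
LIMITS = {"area_mm2": 50.0, "power_w": 5.0, "energy_mj": 50.0, "latency_ms": 10.0}

# Reported results, assumed here to describe one design point.
REPORTED = {"area_mm2": 26.0, "power_w": 4.8, "energy_mj": 39.0, "latency_ms": 6.0}

def is_feasible(design: Dict[str, float],
                limits: Dict[str, float] = LIMITS) -> bool:
    """A design is feasible only if every metric stays within its limit."""
    return all(design[name] <= limit for name, limit in limits.items())

def headroom(design: Dict[str, float],
             limits: Dict[str, float] = LIMITS) -> Dict[str, float]:
    """Fraction of each budget that the design leaves unused."""
    return {name: 1.0 - design[name] / limit for name, limit in limits.items()}

margins = headroom(REPORTED)
tightest = min(margins, key=margins.get)
print(is_feasible(REPORTED), tightest)
```

Under these assumptions power is the binding constraint, with about 4% of the budget left, while area has the most slack.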

2606.06516 2026-06-08 q-bio.QM cs.LG Cross-list

Probabilistic learning to perform pre-onset individualised prediction of disease severity: application to Veno Occlusive Disease

Dalia Chakrabarty, Kane Warrior, Chuqiao Zhang, Akash Bhojgaria, Joydeep Chakrabartty

Affiliations: University of York

AI summary: Proposes a new probabilistic supervised learning method that uses a digital twin and probabilistic inverse learning to automatically predict the Veno Occlusive Disease (VOD) severity score before bone marrow transplant, supporting physicians' treatment decisions.

Abstract

We advance a new probabilistic supervised learning approach that permits reliable, automated, and early individualised prediction of the severity with which a disease will develop in a prospective patient. The prediction capacity is illustrated via the pre-transplant prediction of the severity score of Veno Occlusive Disease (VOD) in the digital twin (DT) of the prospective patient, where this score parametrises the severity with which VOD will develop in the patient after they undergo their Bone Marrow Transplant. The relationship between the pre-transplant variables and the severity score variable is learnt by modelling it as a (random) function that is treated as a sample function of an adequately chosen stochastic process. The parameters of this underlying process are learnt using a training dataset that is generated from the real-time evolution of retrospective patients in a cohort, with this training dataset subsequently augmented in size by a probabilistic inverse learning of the score of prospective patients. The augmented training set then permits the learning of the function that enables, at the pre-transplant stage, automated prediction of the VOD severity score that characterises the DT of a physical patient in their unique pre-transplant state. This score is subsequently fed back to the real prospective patient as the severity with which VOD will develop in them after they undergo their transplant. Such a score then permits the treating Haematologist-Oncologists to decide on the treatment regimen, which in this illustration reduces to deciding whether to treat the patient with Defibrotide. An AI facility is developed to undertake such automated prediction, with the physician inputting the data on the pre-transplant state that characterises the DT of the prospective patient under consideration.

2606.06518 2026-06-08 cs.AI cs.LG Cross-list

DiBS: Diffusion-Informed Branch Selection

Bo Liu, Yuan Xie, Yuan Gao, Xiaolong Luo, Peng Ye, Tao Chen, Fujun Han

Affiliations: Nanyang Technological University; University of Science and Technology of China; Tsinghua University

AI summary: Learned Sudoku solvers lack correctness guarantees and symbolic solvers suffer from long-tail search; DiBS, a diffusion-informed branch selection method, keeps the symbolic solver complete while using a diffusion model to rank candidate values, substantially reducing search cost.

Comments 12 pages, 6 figures, 3 tables

Abstract

Sudoku is a representative constraint satisfaction problem that requires global structural reasoning under strict discrete constraints. Existing work on solving Sudoku mainly follows two dominant approaches: traditional heuristic solvers and deep learning solvers. However, they suffer from complementary limitations: learning-based solvers lack hard correctness guarantees, while complete symbolic solvers are still prone to long-tail search. To address these shortcomings, we propose a novel diffusion model-guided approach, termed DiBS, for the branch selection step of the search process. Specifically, DiBS keeps the symbolic solver complete and uses the diffusion model as a branch-ordering guide. The core of the method is to rank candidate values given the current partial assignment and a lightweight consistency signal. Furthermore, we provide an in-depth theoretical proof that shows how and why it works. Experiments on the challenging Royle 17-clue Sudoku benchmark show that DiBS substantially reduces search cost relative to strong heuristic baselines, especially in nodes, backtracks, and long-tail percentiles. These results also confirm that learned global guidance is effective on hard instances where branch-order mistakes are most expensive. The code is available at https://github.com/shanxierdan/DiBS.
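The division of labour described here, a complete symbolic search with a learned model that only decides which branch to try first, can be sketched on a 4x4 Sudoku. Everything below is a hypothetical illustration rather than the DiBS implementation: `score` stands in for the diffusion model, the puzzle is made up, and the most-constrained-cell rule is a generic heuristic chosen for brevity.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # 4x4 grid; 0 marks an empty cell

def candidates(grid: Grid, r: int, c: int) -> List[int]:
    """Values that violate no row, column, or 2x2 box constraint."""
    used = set(grid[r]) | {grid[i][c] for i in range(4)}
    br, bc = 2 * (r // 2), 2 * (c // 2)
    used |= {grid[i][j] for i in range(br, br + 2) for j in range(bc, bc + 2)}
    return [v for v in range(1, 5) if v not in used]

def solve(grid: Grid, score: Callable[[Grid, int, int, int], float],
          stats: Dict[str, int]) -> Optional[Grid]:
    """Complete backtracking search; `score` only reorders the branches."""
    stats["nodes"] = stats.get("nodes", 0) + 1
    empties = [(r, c) for r in range(4) for c in range(4) if grid[r][c] == 0]
    if not empties:
        return [row[:] for row in grid]
    # Branch on the most constrained cell (fewest remaining candidates).
    r, c = min(empties, key=lambda rc: len(candidates(grid, rc[0], rc[1])))
    # Every candidate is still tried, so completeness is preserved;
    # the guide decides only which candidate is tried first.
    ordered = sorted(candidates(grid, r, c), key=lambda v: -score(grid, r, c, v))
    for v in ordered:
        grid[r][c] = v
        found = solve(grid, score, stats)
        grid[r][c] = 0  # undo the assignment before returning or moving on
        if found is not None:
            return found
    return None

def uniform_score(grid: Grid, r: int, c: int, v: int) -> float:
    """Stand-in guide with no preference; a learned model would go here."""
    return 0.0

PUZZLE = [
    [1, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 4, 0, 0],
    [0, 0, 0, 2],
]

stats: Dict[str, int] = {}
solution = solve([row[:] for row in PUZZLE], uniform_score, stats)
print(solution, stats["nodes"])
```

Because the scorer never removes a candidate, a poor guide can make the search slower but cannot make it wrong. That is the correctness property the abstract attributes to keeping the symbolic solver complete.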

2606.06519 2026-06-08 cs.AI cs.LG Cross-list

Attack Selection in Agentic AI Control Evaluations Substantially Reduces Safety

Catherine Ge-Wang, Tyler Crosse, Benjamin Hadad, Joachim Schaeffer, Ram Potham, Tyler Tracy

Affiliations: GitHub; arXiv

AI summary: Studies how an attacker's strategic choice of when to attack affects AI control safety by decomposing attack decisions into start and stop policies; experiments show both substantially reduce safety, so existing evaluations may overestimate it.

Abstract

An attacker that strategically chooses when to attack is much harder to catch than one that attacks indiscriminately. AI control is a safety framework for deploying capable but untrusted AI agents under the oversight of a weaker, trusted monitor and a limited human audit budget. Control evaluations stress-test these protocols by pitting a red-team attack policy against the blue-team monitor, but current evaluations typically assume attackers that do not strategically select when to attack. We study this capability, attack selection, in agentic settings by decomposing attack decisions into a start policy, which decides when an attacker should attack, and a stop policy, which decides when an attacker should abort an ongoing attack. Across two agentic settings, BashArena and LinuxArena, both policies substantially lower measured empirical safety without changing the underlying attack capability. At a 1% audit budget, our start policy reduces safety by 20pp on both BashArena and LinuxArena, and our stop policy reduces safety by 20pp on BashArena and 28pp on LinuxArena. These reductions should be interpreted as upper bounds on the effect of attack selection. Existing control evaluations may therefore yield overly optimistic safety estimates against selective attackers. We recommend that future evaluations, system cards, and safety cases elicit attack selection to produce more realistic safety estimates.

2606.06535 2026-06-08 cs.SE cs.LG Cross-list

Architecturally Significant MLOps Guidelines for ML Model Integration and Deployment: a Gray Literature Review

Faezeh Amou Najafabad, Markus Haug, Keerthiga Rajenthiram, Justus Bogner, Ilias Gerostathopoulos

Affiliations: Vrije Universiteit Amsterdam; Technical University of Munich

AI summary: A gray literature review that distills 25 architecturally significant MLOps guidelines, in five categories, for integrating and deploying ML models in MLOps systems.

Comments ECSA2026

Abstract

Context. Despite the growing adoption of Machine Learning Operations (MLOps), teams often approach MLOps projects in an ad hoc manner due to the lack of consolidated architectural guidance. The community would benefit from a reference that synthesizes knowledge to inform the architectural design of MLOps systems, especially regarding the integration and deployment of ML models. Objective. In response, our goal is to provide a comprehensive overview of architecturally significant guidelines for the integration and deployment of ML models in MLOps systems. Method. We conduct a gray literature review of 103 web sources to analyze state-of-practice knowledge on MLOps model integration and deployment. We then apply thematic analysis to synthesize these practices into recommended guidelines. Results. We contribute a collection of 25 architecturally significant MLOps guidelines for model integration and deployment, organized into five categories, and describe their impact on the overall system architecture. Conclusion. Our results serve as an overview of state-of-practice MLOps guidelines to support researchers and practitioners with the integration and deployment of ML models in their MLOps systems.

2606.06536 2026-06-08 cs.CV cs.AI cs.LG Cross-list

Attention-Guided Autoencoder Fusion for Insulator Defect Detection Using UAV Transmission-Line Imaging

Malak Allam, Khaled Shaban, Ali Hamdi

Affiliations: University of Science and Technology of China

AI summary: Proposes AE-YOLO, which uses attention-guided autoencoder fusion with variance-maximizing regularization to handle class imbalance and scale variation in insulator defect detection from UAV imagery, reaching 95.10% mAP@0.5, 5 points above the YOLO baseline.

Abstract

Automated defect detection in high-voltage transmission-line insulators remains challenging due to severe class imbalance, large scale variation, and the small spatial extent of defect instances in Unmanned Aerial Vehicle (UAV) imagery. To address these challenges, this paper proposes AE-YOLO, an Attention-Guided AutoEncoder-Enhanced YOLO framework for robust insulator defect detection. The architecture integrates lightweight bottleneck autoencoders within a Feature Pyramid Network-Path Aggregation Network (FPN-PAN) neck, which preserves anomaly-sensitive information during multi-scale feature fusion. Convolutional Block Attention Modules (CBAM) are used throughout the backbone to enhance feature discrimination and suppress background interference. The framework also introduces a variance-maximizing autoencoder regularization strategy, which encourages diverse, defect-discriminative latent representations. The network is trained with a unified objective that combines focal loss, Complete IoU (CIoU) loss, and autoencoder regularization to address foreground-background imbalance and improve localization accuracy. During inference, Weighted Boxes Fusion (WBF) combines predictions from YOLOv8, YOLOv10, and YOLO11, and an autoencoder-guided confidence boosting mechanism improves sensitivity to rare defect categories. Experiments on the Insulator-Defect Detection dataset show that AE-YOLO with an EfficientNetV2 backbone achieves 95.10% mAP@0.5, 96.40% precision, and 93.80% recall. This performance surpasses the strongest YOLO-family baseline by 5.0 points in mAP@0.5 and 6.7 points in recall. These results confirm the effectiveness and adaptability of the framework. The model is a practical and scalable solution for UAV-based transmission-line inspection and defect monitoring.

2606.06539 2026-06-08 cs.CV cs.AI cs.LG cs.NE Cross-list

Synthetic Benchmarks Overstate Forward-Forward Scaling: Real-Data Limits of Layer-Local Training

Yucheng Chen

Affiliations: Amplimit

AI summary: Uses DTG-FF to evaluate how Forward-Forward learning scales on real data, finding that its gap to backpropagation widens with class count, that synthetic tasks overstate its transferability, and that its memory advantage does not hold.

Comments 23 pages, 6 figures

Abstract

Forward-Forward (FF) learning [Hinton, 2022] replaces backpropagation with strictly layer-local goodness updates. Recent FF-CNN work has narrowed the gap to BP on 32x32 benchmarks, raising the question of whether layer-local training is becoming a viable alternative at realistic scale. To probe this rigorously, we develop DTG-FF -- dynamic temperature goodness, decoupled normalization, and multi-layer fusion -- as an instrument that sets FF-family state of the art across nine real-data benchmarks (91.8% CIFAR-10 and the first FF baseline at ImageNet-100 224x224), and use it to audit how far layer-local training actually scales. (1) Real-data scaling. Under identical recipe and backbone, an architecture-matched BP-DeepSup baseline beats DTG-FF by 2.40/5.93 pp on CIFAR-10/CIFAR-100, and the gap widens with class count. At 224x224 the same instrument reaches only 49.4% -- the first FF baseline at this scale, versus typical BP above 75% [Tian et al., 2020] -- exposing a real-data ceiling invisible at 32x32. (2) Synthetic vs. real K-conflict. DTG-FF increasingly outperforms BP as class count K grows on synthetic teacher-student tasks, yet on real images the FF-BP gap reverses sign and widens with K. A within-dataset CIFAR-100 coarse vs. fine probe isolates label-hierarchy from image distribution: synthetic K-sweeps confound output dimensionality with fine-grained discrimination difficulty and thereby overstate FF transferability. (3) Systems audit. FF can be implemented without storing depth-wide activations, but on commodity 8 GB hardware standard BP+gradient-accumulation reaches 4.18 GB / 157 imgs/s versus DTG-FF's 7.90 GB / 138 imgs/s, so a memory-based justification for FF at this scale is not supported under fair baselines.

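2606.06555 2026-06-08 cs.NE cs.LG Cross-list

Depth over Fidelity in Fixed-Budget Noisy Evolution Strategies

Sichen Wang, Zhipeng Lu

Affiliations: University of California, Berkeley

AI summary: For noisy evolution strategies under a fixed evaluation budget, proposes Probabilistic Elite Membership (PEM), which replaces hard rank weights with conditional-expectation rank weights for Rao-Blackwellized noise reduction, yielding consistent gains on the COCO benchmark, RL, and other tasks.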

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026). 28 pages, 16 figures, 7 tables, including appendices

2606.06565 2026-06-08 cs.GR cs.HC cs.LG Cross-list

AI Level of Detail: Distance-Aware ML Model Precision Selection for Real-Time Human Motion Prediction in Games

Mathew Varghese

Affiliations: University of Washington

AI summary: Proposes the AI LOD framework, which adjusts machine learning inference precision by each NPC's distance from the player camera, using quantized models as cheaper approximations to cut compute while preserving perceived quality.

Comments Camera-ready for SIGGRAPH Technical Workshops 2026

Abstract

Modern game engines spend significant compute animating NPCs with learned motion models. This paper proposes AI Level of Detail (AI LOD), a framework in which machine learning inference precision is adapted based on the distance between each NPC and the player camera. The core idea mirrors classical geometry LOD: substitute a cheaper approximation where the difference is imperceptible. Here, the approximation is a lower-precision quantized machine learning model rather than a lower-polygon mesh. The contribution of this work is the AI LOD concept itself: that inference-time quantization can serve as the LOD axis for AI-driven character animation - and more broadly, for any AI-based runtime system where perceptual sensitivity varies with context. The convolutional sequence-to-sequence model of Li et al. is used as a representative example to demonstrate the concept, with its trained checkpoint exported into three ONNX Runtime variants (FP32, FP16, and INT8 per-tensor), intended to be routed by a distance-based selector at runtime. Evaluation on the CMU Mocap dataset provides initial evidence that each precision tier can be served at its assigned distance range with negligible perceptible degradation, supporting the broader premise that distance-aware ML model precision selection is a viable LOD strategy for AI-based character animation.
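The distance-based selector mentioned in this abstract can be pictured as a lookup over distance bands. The sketch below is a minimal illustration under assumptions: the thresholds and file names are hypothetical (the abstract names the three precisions but gives no distances), and loading or running the ONNX models is left out.

```python
from bisect import bisect_right
from typing import List

# Hypothetical tier boundaries in world units, and hypothetical file names
# for the three exported variants (FP32, FP16, INT8) named in the abstract.
TIER_LIMITS: List[float] = [10.0, 30.0]
TIER_MODELS: List[str] = ["motion_fp32.onnx", "motion_fp16.onnx", "motion_int8.onnx"]

def select_model(distance: float) -> str:
    """Pick the model variant for an NPC at `distance` from the camera.

    Distances below the first limit use the highest precision; an NPC
    exactly on a boundary falls into the cheaper tier.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return TIER_MODELS[bisect_right(TIER_LIMITS, distance)]

for d in (2.0, 10.0, 25.0, 80.0):
    print(d, select_model(d))
```

A production selector would probably add hysteresis around each boundary so that an NPC hovering near a threshold does not switch models every frame. That is a design suggestion, not something the abstract describes.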

2606.06573 2026-06-08 physics.flu-dyn cs.CL cs.LG eess.SP Cross-list

Multiscale POD of Transformer Attention Fields: Scale-Selective Analysis via Morlet Scalogram

Athanasios Zeris

Affiliations: Independent Researcher; Athens, Greece

AI summary: Proposes scale-selective POD for analyzing transformer attention fields: Morlet wavelets identify temporal scales, POD extracts the energetically dominant modes at each scale, and the result reveals layer-wise scale organization without architectural changes or linguistic annotations.

Comments 23 pages, 3 figures, 4 tables

Abstract

We introduce scale-selective Proper Orthogonal Decomposition (POD) for transformer attention fields, inspired by the use of POD for extracting energetically dominant modes from turbulent flow ensembles. The Morlet continuous wavelet transform identifies dominant temporal scales in the attention lag structure across a document ensemble; POD then extracts the energetically dominant modes at each scale from the ensemble of attention fields. The resulting modes reveal layer-dependent scale organisation, with early layers emphasising fine scales and later layers shifting toward coarser scales. We define a spectral concentration index from the POD eigenvalue decay rate and show empirically that it differentiates layers by their attention field complexity. By the classical POD optimality theorem, the extracted modes minimise the average L2 reconstruction error over the ensemble (Theorem 1), giving a data-driven effective rank for each layer. The method requires no architectural modification and no linguistic annotations: dominant attention patterns emerge from ensemble statistics alone. The turbulence analogy is structural rather than physical: we borrow ensemble covariance and modal analysis, not fluid dynamics itself.
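For readers unfamiliar with POD, the core computation is an eigen-decomposition of the ensemble covariance: the leading eigenvector is the energetically dominant mode, and its eigenvalue divided by the trace is the share of ensemble energy it captures. The sketch below is a generic, standard-library illustration, not the paper's pipeline. It finds only the leading mode by power iteration, uses a made-up two-dimensional ensemble, and skips both the wavelet step and mean subtraction (an assumption; some POD variants centre the data first). The energy fraction it returns is a simple stand-in, not the paper's spectral concentration index.

```python
import math
from typing import List, Tuple

def leading_pod_mode(snapshots: List[List[float]],
                     iters: int = 200) -> Tuple[List[float], float]:
    """Leading POD mode of an ensemble via power iteration.

    snapshots: equal-length lists, one flattened field per sample.
    Returns (mode, energy_fraction), where energy_fraction is
    lambda_1 / trace(C) and C is the uncentred ensemble covariance.
    Assumes the starting vector is not orthogonal to the leading mode.
    """
    n, d = len(snapshots), len(snapshots[0])

    def cov_times(v: List[float]) -> List[float]:
        # C v = (1/n) * sum_k (s_k . v) s_k, without forming C explicitly.
        out = [0.0] * d
        for s in snapshots:
            coeff = sum(a * b for a, b in zip(s, v)) / n
            for j in range(d):
                out[j] += coeff * s[j]
        return out

    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = cov_times(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, cov_times(v)))     # Rayleigh quotient
    total = sum(x * x for s in snapshots for x in s) / n  # trace of C
    return v, lam / total

# Made-up ensemble: strong variation along the first axis, weak along the second.
ENSEMBLE = [[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
mode, energy = leading_pod_mode(ENSEMBLE)
print(mode, energy)
```

The optimality property cited in the abstract says that projecting onto the leading modes minimises the average squared reconstruction error, so a high energy fraction in a few modes signals a low effective rank.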

2606.06601 2026-06-08 cs.CV cs.AI cs.LG Cross-list

Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

Jingbo Gong, Yikai Wang, Yushi Lan, Yuhao Wan, Ziheng Ouyang, Rui Zhao, Ming-Ming Cheng, Qibin Hou, Chen Change Loy

Affiliations: Google; Black Forest Labs

AI summary: Proposes DIRECT, which decomposes appearance, geometry, and context guidance to enable object insertion with controllable 3D pose, outperforming existing methods in geometric controllability and visual quality.

Comments ICML 2026; Project Page: https://gong1130.github.io/DIRECT/

Abstract

Object insertion aims to seamlessly composite a reference object into a specified region of a background image. Recent diffusion-based methods achieve high visual quality but formulate insertion as a simple 2D inpainting task, providing no explicit control over the object's 3D pose and limiting their practical applicability. We propose DIRECT (Decomposed Injection for Reference Composition and Target-integration), a novel framework that integrates interactive pose manipulation with high-fidelity 2D image synthesis to enable pose-controllable object insertion. Our method decomposes the insertion conditions into three complementary components: appearance guidance capturing visual details from the reference object, geometry guidance derived from the user-adjusted 3D proxy, and context guidance from the target background. By injecting them through separate pathways, DIRECT avoids feature entanglement and simultaneously preserves reference appearance, follows the user-specified pose, and adapts the object to the target scene. We also introduce an automated data construction pipeline to improve the diversity and quality of training data. Experiments show that DIRECT outperforms previous methods in both geometric controllability and visual quality.

2606.06615 2026-06-08 cs.SD cs.AI cs.LG eess.AS cross-list

FIGMA: Towards FIne-Grained Music retrievAl

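Nishit Anand, Ashish Seth, Sreyan Ghosh, Dinesh Manocha, Ramani Duraiswami

Affiliations * University of Maryland, College Park

AI summary: To address the inability of existing music retrieval models to handle queries about fine-grained attributes, proposes FIGMA, a multi-view contrastive architecture that jointly optimizes global audio-text alignment and frame-level, token-wise alignment, capturing high-level semantics and fine-grained musical attributes in a unified representation space, with significant gains on a newly constructed fine-grained music caption dataset.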

Comments Accepted to ACL 2026. Project Website: https://nishitanand.github.io/figma-website/

Abstract

Retrieving music using natural language descriptions has improved with contrastive audio-text models such as CLAP, but current systems remain limited to coarse semantic queries. When descriptions specify fine-grained musical attributes such as tempo, key, chord progression, or rhythmic structure, existing models often fail to retrieve the correct audio. We show that this limitation stems from the contrastive learning objective itself: despite being trained on long captions, CLAP-based models effectively utilize only the first few tokens, discarding much of the information encoded in detailed prompts. Then, we propose FIGMA (FIne-Grained Music RetrievAl), a multi-view contrastive architecture that addresses this limitation by jointly optimizing global audio-text alignment and frame-level, token-wise alignment. This design enables FIGMA to capture both high-level semantic context and fine-grained musical attributes within a unified representation space. Moreover, we formalize the task of Fine-Grained Music Retrieval and construct Fine-Grained Music Caption dataset (FGMCaps), a large-scale dataset of 380K music-caption pairs for training along with a 10K test set, both annotated with tempo, key, chord progression, beat count, as well as genre and mood. Extensive experiments demonstrate that FIGMA consistently outperforms existing CLAP-based music retrieval models across multiple music retrieval benchmarks, including out-of-domain evaluations, with relative improvements of up to 73.3%.

2606.06618 2026-06-08 cs.RO cs.AI cs.LG cross-list

ChronoForest: Closed-Loop Multi-Tree Diffusion Planning for Efficient Bridge Search and Route Composition

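Jungmin Seo, Jaesik Park

Affiliations * Seoul National University

AI summary: For long-horizon route planning from only short-horizon offline trajectories, proposes the ChronoForest system, which couples local bridge search with global route re-solving through an anchor-chaining tree diffusion planner and an online multi-tree orchestrator, substantially improving success rate and efficiency on OGBench and Hamiltonian route-composition benchmarks.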

Comments 40 pages, 4 figures, 7 tables, 3 algorithms

Abstract

How can we plan long-horizon routes that reach designated goals, visit required waypoints, and remain short when only short-horizon offline trajectories are available? This problem matters in offline navigation because collecting sufficiently rich long-horizon data is difficult, yet real agents must still solve long-range tasks with route-level efficiency rather than mere feasibility. The difficulty is twofold: at the microscopic level, composing many short-horizon segments creates a trade-off between search cost and path quality, while at the macroscopic level, waypoint ordering requires comparing pairwise travel costs among start, goal, and waypoint anchors that are unknown before planning and increasingly unreliable when estimated only from long-range temporal distance. In this paper, we propose ChronoForest, a closed-loop planning system that couples local bridge search and online route re-solving through an anchor-chaining tree diffusion planner and an online multi-tree orchestrator. ChronoForest uses temporal distance for short-range guidance and node evaluation, while using search-time bridge evidence to validate long-range anchor connectivity and repeatedly re-solve the route. On OGBench AntMaze-Stitch, ChronoForest achieves 99.8%, 99.3%, and 99.5% success on the medium, large, and giant splits and improves giant-stitch success by up to 34.5 points over prior reported diffusion-based results. On Hamiltonian route-composition benchmarks, online re-solving corrects poor temporal orderings and improves route quality while remaining substantially cheaper than exhaustive planning.
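To make the waypoint-ordering problem concrete, here is a minimal sketch of an exhaustive ordering baseline: enumerate every waypoint order and keep the cheapest route. This is not ChronoForest itself, and it may not match the paper's exact notion of "exhaustive planning"; the function name `best_waypoint_order` and the cost table are hypothetical, and the pairwise costs are assumed to be known in advance, which the abstract says is not the case in practice.

```python
from itertools import permutations

def best_waypoint_order(cost, start, goal, waypoints):
    """Return (total_cost, route) for the cheapest start -> waypoints -> goal route.

    Tries every ordering of the waypoints, so the work grows factorially
    with the number of waypoints.
    """
    best_cost, best_route = float("inf"), None
    for order in permutations(waypoints):
        route = (start, *order, goal)
        total = sum(cost[a][b] for a, b in zip(route, route[1:]))
        if total < best_cost:
            best_cost, best_route = total, route
    return best_cost, best_route

# Hypothetical symmetric pairwise travel costs between four anchors.
cost = {
    "S": {"A": 2, "B": 9, "G": 10},
    "A": {"S": 2, "B": 3, "G": 8},
    "B": {"S": 9, "A": 3, "G": 1},
    "G": {"S": 10, "A": 8, "B": 1},
}
total, route = best_waypoint_order(cost, "S", "G", ["A", "B"])
```

With these made-up costs the route S, A, B, G is the cheaper of the two orderings. The factorial growth of this loop is one reason a method that re-solves the route online from partial evidence can be attractive.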

2606.06627 2026-06-08 cs.RO cs.AI cs.CV cs.LG cross-list

What Matters When Cotraining Robot Manipulation Policies on Everyday Human Videos?

Richard Li, Aditya Prakash, Andrew Wen, Saurabh Gupta, Yilun Du, Pulkit Agrawal

Affiliations * Massachusetts Institute of Technology; University of Illinois Urbana-Champaign; Harvard University

AI summary: Studies how hand-pose quality and the motion gap affect transfer when cotraining robot manipulation policies on everyday internet videos, and proposes a cotraining approach that raises absolute success rate by 29.7% across six manipulation tasks in the low-robot-data regime.

Comments The project website is here: https://richardrl.github.io/what-matters-cotraining-human-videos/index.html

2606.06664 2026-06-08 cs.CV cs.AI cs.LG cross-list

Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

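Tang Li, Yanlin Chen, Mengmeng Ma, Xi Peng

Affiliations * University of Science and Technology of China

AI summary: Proposes the ViSAE toolbox, which interprets the internal mechanisms of Vision Transformers through neuroscience-inspired concept circuits; it comprises an efficient concept set, automatic circuit-tracing algorithms, and concept-editing applications, and improves worst-group accuracy on WaterBirds by 48.2%.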

Comments In Proceedings of the International Conference on Machine Learning, 2026. (acceptance rate 26.6%)

Abstract

Despite high accuracy, Vision Transformer (ViT) predictions can be driven by spurious cues, raising the need to understand their inner workings before safe deployment. Sparse autoencoders (SAEs) provide a promising lens for decomposing model representations into human-interpretable concepts, yet adapting SAE-based interpretation to ViTs remains challenging due to limited control over concept coverage and subjective, non-scalable feature interpretation. To fill the gaps, motivated by neuroscience-inspired principles, we propose ViSAE, a mechanistic interpretability toolbox for understanding ViT inner workings through concept circuits. ViSAE consists of three components: (1) A probing suite with 64K images and a 16K visually grounded concept vocabulary, improving concept coverage efficiency by 20x over ImageNet and interpretation accuracy by 28.7% over existing concept sets. (2) Top-down concept reading and Bottom-up circuit tracing algorithms that automatically recover ViT inner workings via concept circuits. (3) Applications for auditing and steering ViT behavior. Through concept editing, ViSAE improves the worst-group accuracy on WaterBirds by 48.2%, outperforming existing methods by 23.8%. Our data and code: https://github.com/deep-real/ViSAE.
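The headline WaterBirds number is a worst-group accuracy. As a reminder of how that metric is usually computed, the sketch below takes the minimum of the per-group accuracies. The function name and the toy labels, predictions, and groups are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def worst_group_accuracy(labels, predictions, groups):
    """Return (worst accuracy, per-group accuracies) over the given group ids."""
    correct, total = defaultdict(int), defaultdict(int)
    for y, p, g in zip(labels, predictions, groups):
        total[g] += 1
        correct[g] += int(y == p)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group

# Toy data: group "a" is classified perfectly, group "b" mostly wrong.
labels      = [1, 1, 0, 0, 1, 0]
predictions = [1, 1, 0, 1, 0, 0]
groups      = ["a", "a", "a", "b", "b", "b"]
worst, per_group = worst_group_accuracy(labels, predictions, groups)
```

Overall accuracy here is 4/6, while the worst group sits at 1/3, which is why the metric is used to expose reliance on spurious cues.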

2606.06715 2026-06-08 cs.CL cs.AI cs.LG cross-list

Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles

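Upasana Chatterjee

Affiliations * Columbia University

AI summary: Studies the causal effect of topic sentiment on perceived political ideology by comparing human and LLM annotations; finds that fine-tuned GPT-4o-mini produces significant causal effects, which the author attributes to shortcut learning.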

Comments Accepted to ACL SRW 2026

Abstract

We ask whether topic sentiment has a causal effect on perceived political ideology, and whether the answer depends on who assigns the ideology label. Using articles from AllSides, paired with shared sentiment annotations from Llama-3.3-70b-versatile, we compare ideology labels from expert human annotators, GPT-4o-mini (baseline and finetuned), and Llama-3.3-70B. We apply Double Machine Learning (DML) and community-level mediation analysis across all four annotation paradigms. Human annotations yield no significant causal effects at the community level. Fine-tuned GPT-4o-mini achieves the highest classification accuracy (F1=72.48) and is the only annotator paradigm that produces significant community-level treatment effects and significant natural direct effects (NDEs) in mediation. We interpret this as evidence of shortcut learning: fine-tuning on ideology-labeled data causes the model to internalise a spurious sentiment--ideology coupling not operative in human judgment for this task. This coupling is structurally invisible to F1-based evaluation, with implications for the use of LLM annotations as silver labels and as proxies for human judgment in downstream causal analyses.
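For readers unfamiliar with Double Machine Learning, the sketch below shows the standard partialling-out estimator with cross-fitting on synthetic data. It is only an illustration under strong assumptions: a single confounder, linear nuisance models, and simulated data with a known effect. The abstract does not state which nuisance learners or covariates the paper uses, and every name here is hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def dml_partial_out(x, t, y, folds=2):
    """Cross-fitted partialling-out estimate of the effect of t on y.

    Nuisance models for t and y given x are fit on the other folds,
    then residuals on the held-out fold are regressed on each other.
    """
    n = len(x)
    res_t, res_y = [0.0] * n, [0.0] * n
    for k in range(folds):
        held_out = [i for i in range(n) if i % folds == k]
        train = [i for i in range(n) if i % folds != k]
        x_train = [x[i] for i in train]
        a_t, b_t = fit_line(x_train, [t[i] for i in train])
        a_y, b_y = fit_line(x_train, [y[i] for i in train])
        for i in held_out:
            res_t[i] = t[i] - (a_t + b_t * x[i])
            res_y[i] = y[i] - (a_y + b_y * x[i])
    return sum(a * b for a, b in zip(res_t, res_y)) / sum(a * a for a in res_t)

# Synthetic data: x confounds both treatment t and outcome y.
rng = random.Random(0)
true_effect = 1.5
x = [rng.gauss(0, 1) for _ in range(2000)]
t = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [true_effect * ti + 2.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
estimate = dml_partial_out(x, t, y)
```

On this simulated data the estimate should land close to the true effect of 1.5. In a real study the linear fits would typically be replaced by flexible learners, and the estimate would come with a standard error.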

2606.06741 2026-06-08 cs.AI cs.CL cs.LG cross-list

OpenSkill: Open-World Self-Evolution for LLM Agents

Zhiling Yan, Dingjie Song, Hanrong Zhang, Wei Liang, Yuxuan Zhang, Yutong Dai, Lifang He, Philip S. Yu, Ran Xu, Xiang Li, Lichao Sun

Affiliations * Lehigh University; University of Illinois Chicago; University of British Columbia; Vector Institute; Salesforce AI Research; Massachusetts General Hospital and Harvard Medical School

AI summary: Proposes the OpenSkill framework, which lets agents bootstrap skills and verification signals from open-world resources without target-task supervision, enabling self-evolution and achieving the best automated pass rate on multiple benchmarks.

Comments 20 pages, 4 figures and 8 tables. Code is available at https://github.com/OpenLAIR/OpenSkill

Abstract

Self-evolving agents require adaptation after deployment, but existing approaches assume a usable learning loop, such as curated skills, successful trajectories, or verifier signals. Real open-world deployments may provide none of these, offering only a task prompt. In this work, we study open-world self-evolution, where an agent must build both its skills and its own verification signals from scratch, using open-world resources but no target-task supervision. We propose OpenSkill, a framework that bootstraps this loop: it acquires grounded knowledge and verification anchors from documentation, repositories, and the web, synthesizes them into transferable skills, and refines those skills against self-built virtual tasks grounded in the anchors rather than in target answers. The open world thus supplies both the knowledge to be learned and a supervision-independent practice environment, with target-task supervision reserved for final evaluation. Across three benchmarks and two target agents, OpenSkill attains the best automated pass rate while satisfying the no-supervision constraint. Analysis shows its skills transfer across models without model-specific adaptation, and its self-built verifier aligns with ground-truth outcomes despite never accessing them.

2606.06748 2026-06-08 cs.CL cs.AI cs.LG cross-list

Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection

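Jianru Shen

AI summary: Proposes the Evidence Graph Consistency (EGC) framework, which detects hallucinations by building a local evidence graph and computing five structural consistency measures; finds that consistency features point in opposite directions across model families, indicating that embedding-based graph consistency cannot serve as a model-independent detection signal.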

Comments Accepted at the International Conference on Advanced Machine Learning and Data Science; to appear in the IEEE Xplore proceedings

Abstract

Retrieval-Augmented Generation (RAG) reduces but does not eliminate hallucination in large language models. Existing detection methods rely on flat similarity between generated answers and retrieved passages, ignoring structural relationships among evidence pieces and answer claims. We propose Evidence Graph Consistency (EGC), a framework that constructs a local evidence graph per response and computes five structural consistency measures as hallucination indicators. Evaluated on the full question answering split of RAGTruth across six LLMs (5,767 responses), EGC reveals a consistent model-family split: graph consistency features show the expected diagnostic direction for hallucinations in Llama-2 models but exhibit systematic reversal in GPT-4, GPT-3.5, and Mistral-7B. This reversal suggests qualitatively different hallucination patterns across model families and indicates that embedding-based graph consistency cannot serve as a model-independent hallucination detection signal.

2606.06764 2026-06-08 stat.ML cs.AI cs.LG cross-list

Optimal Rates for Generalization of Gradient Descent Methods with Deep Neural Networks

Junyu Zhou, Puyu Wang, Yunwen Lei, Yiming Ying, Ding-Xuan Zhou

Affiliations * Mathematical Institute for Machine Learning and Data Science, KU Eichstätt-Ingolstadt; Department of Computer Science, RPTU Kaiserslautern-Landau; Department of Mathematics, University of Hong Kong; School of Mathematics and Statistics, University of Sydney

AI summary: For deep ReLU networks in the neural tangent kernel (NTK) regime, establishes the first minimax-optimal generalization error rates for gradient descent (GD) and stochastic gradient descent (SGD), showing that with sufficient width they attain the optimal rates of kernel methods.

Comments 39 pages, 1 table

Abstract

Recent progress has been made in understanding the statistical generalization performance of gradient descent methods for overparameterized neural networks within the neural tangent kernel (NTK) regime. However, most of the existing work on regression problems is limited to shallow network architectures, leaving a notable gap in the theory of deep neural networks. This paper addresses this gap by presenting a comprehensive generalization analysis for deep ReLU networks trained using gradient descent (GD) and stochastic gradient descent (SGD). Specifically, we establish the first known minimax-optimal rates of excess population risk for both GD and SGD with deep ReLU networks, under the assumption that the network width scales polynomially with respect to the network depth and training sample size. Our results demonstrate that with sufficient width, gradient descent methods for deep ReLU networks can achieve optimal generalization rates on par with kernel methods.

2606.06765 2026-06-08 cond-mat.mtrl-sci cs.LG cross-list

Reactivity-Informed Machine Learning for Performance Prediction and Design Space Exploration of Alkali-Activated Slag

Qiyao He, Zhanzhao Li, Kai Gong

Affiliations * Department of Civil and Environmental Engineering, Rice University; Rice Advanced Materials Institute, Rice University; Ken Kennedy Institute, Rice University

AI summary: Combines a slag reactivity descriptor (AMODE) with machine learning to predict the compressive strength of alkali-activated slag from the largest literature-derived dataset, revealing physically consistent trends and exploring the low-carbon design space.

Comments 68 pages, 14 figures, 2 tables

Abstract

Establishing quantitative relationships among mix design, raw material properties, curing conditions, and performance remains a long-standing challenge in cementitious materials, particularly for alkali-activated materials with variable precursor and activator chemistry. Here, we curated the largest literature-derived alkali-activated slag (AAS) dataset to date, comprising over 3100 compressive strength records, 155 chemically distinct ground granulated blast-furnace slags (GGBSs), and 24 attributes incorporating precursor chemistry, fineness, and reactivity. Multiple machine learning (ML) algorithms were benchmarked across progressively enriched feature scenarios, demonstrating that integrating GGBS compositions, fineness, curing conditions, and specimen geometry improves predictive performance. The average metal oxide dissociation energy (AMODE), a physically interpretable representation of precursor reactivity, provides a compact alternative descriptor to explicit oxide compositions while enabling comparable predictive performance. Model interpretation revealed physically consistent trends from heterogeneous data, including non-monotonic effects of Na2O dosage and silicate modulus, reduced predicted strength at higher water content and larger specimen size, and coupled oxide-level effects more coherently represented by AMODE than by individual oxide contents. Statistically constrained design space exploration reveals reactivity-dependent trade-offs among strength, embodied CO2 emissions, and cost. The design maps identify high-strength regions with substantially lower CO2 emissions than OPC-based references at similar cost. Overall, this work demonstrates how reactivity-informed ML can extract physically meaningful trends from heterogeneous AAS data and guide source-dependent binder design. The curated dataset is publicly accessible to support advances in cement and concrete research.

2606.06772 2026-06-08 stat.ML cs.AI cs.LG cross-list

Generalization in Deep Neural Networks: Minimax Rates for Gradient Methods

Junyu Zhou, Puyu Wang, Yunwen Lei, Marius Kloft, Yiming Ying

Affiliations * Mathematical Institute for Machine Learning and Data Science, Catholic University of Eichstätt-Ingolstadt; Department of Computer Science, RPTU Kaiserslautern-Landau; Department of Mathematics, The University of Hong Kong; School of Mathematics and Statistics, The University of Sydney

AI summary: Connects the learning dynamics of over-parameterized deep neural networks to those of kernel methods, and proves that gradient descent and stochastic gradient descent attain minimax-optimal generalization error given sufficient width.

Comments 37 pages

Abstract

Understanding the generalization performance of over-parameterized neural networks has become a central topic in deep learning theory. While recent advances, particularly works under the Neural Tangent Kernel (NTK) regime, have shed light on the behavior of shallow architectures, the statistical generalization properties of deep neural networks (DNNs), especially in regression tasks, remain far less understood. In this paper, we make significant progress toward closing this gap by providing a comprehensive generalization analysis of DNNs trained using gradient-based methods. First, we establish, for the first time, a crucial connection between the learning dynamics of a DNN with smooth activation functions trained via gradient-based methods and those of kernel methods, showing that gradient-based methods on over-parameterized DNNs can fully inherit the favorable learning dynamics of their kernel counterparts. Building on this connection and the well-established optimality of kernel methods, we derive the first known minimax-optimal rates for the excess population risk of both gradient descent (GD) and stochastic gradient descent (SGD), under the assumption that network width scales polynomially with the sample size. Our results demonstrate that, with sufficient width, DNNs trained by GD or SGD can achieve generalization performance comparable to kernel-based methods.

2606.06782 2026-06-08 cs.IT cs.LG math.IT math.ST stat.ML stat.TH cross-list

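The Effect of Training Task Diversity on In-Context Learning: A Low-Dimensional Subspace Perspective

Soo Min Kwon, Alec S. Xu, Can Yaras, Dogyoon Song, Laura Balzano, Qing Qu

Affiliations * University of California, Berkeley; University of Washington; University of California, Los Angeles; Stanford University; University of Toronto

AI summary: Using a mixture of low-rank Gaussians, analyzes how training task diversity (defined by the number of non-overlapping columns between subspaces) improves the generalization and optimization of in-context learning with linear attention, explains why training diversity shortens the learning plateau and yields out-of-distribution generalization, and extends the findings to nonlinear settings.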

Abstract

The transformer's emergent ability to perform in-context learning (ICL) has sparked a wide range of studies designed to understand its underlying mechanisms. Existing works often study how training task diversity, defined either as the number of ICL training task vectors or as the number of function classes from which the task vectors are drawn, shapes both the learning dynamics and generalization capabilities of ICL. While both definitions have uncovered many interesting phenomena, many observations under the latter definition remain theoretically unexplained. This paper presents a minimal analytical model under which these phenomena provably emerge from the properties of the training data. By modeling the training task vectors as a mixture of low-rank Gaussians, we show how training task diversity, defined by the number of non-overlapping columns between subspaces that parameterize the covariance matrices, improves both the generalization and optimization trajectory of ICL with linear attention. In particular, we show that our model can explain (i) why training with task diversity shortens the ICL plateau and (ii) why ICL appears to achieve out-of-distribution generalization. We conclude by empirically demonstrating how our results extend to nonlinear transformers and nonlinear function classes. Overall, our work presents a tractable framework to unify existing observations.
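The data model in the abstract, task vectors drawn from a mixture of low-rank Gaussians, can be sketched as follows. Each component k has a basis U_k with r columns, and a task vector is w = U_k z with z ~ N(0, I_r), so its covariance U_k U_k^T has rank r. The uniform mixing weights, the standard-basis subspaces, and the function name are assumptions made for illustration; the paper's exact parameterization may differ.

```python
import random

def sample_task_vector(bases, rng):
    """Draw w = U_k z with k uniform over components and z ~ N(0, I_r).

    Each basis is a list of r column vectors of length d.
    """
    k = rng.randrange(len(bases))
    basis = bases[k]
    z = [rng.gauss(0, 1) for _ in basis]
    d = len(basis[0])
    w = [sum(z[j] * basis[j][i] for j in range(len(basis))) for i in range(d)]
    return k, w

# Standard basis vectors of R^4.
e = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]

# Two hypothetical rank-2 subspaces that share the column e[1];
# e[0] and e[2] are the columns the two subspaces do not share.
bases = [[e[0], e[1]], [e[1], e[2]]]

rng = random.Random(0)
samples = [sample_task_vector(bases, rng) for _ in range(100)]
```

Every sampled vector stays inside its component's two-dimensional subspace. In this toy setup, swapping columns so that the subspaces share fewer of them is a loose stand-in for the notion of task diversity described in the abstract.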

2606.06815 2026-06-08 cs.CR cs.LG cross-list

Learning Fair Demand Models

Adam N. Elmachtoub, Hyemi Kim, Jonathan Y. Tan

Affiliations * Department of Industrial Engineering and Operations Research and Data Science Institute, Columbia University; Department of Industrial Engineering and Operations Research, Columbia University

AI summary: Studies fairness in data-driven pricing by comparing strategies that impose fairness constraints at the demand estimation stage versus the price optimization stage, and analyzes their effects on consumer welfare and social outcomes.

Abstract

Data-driven pricing is increasingly prevalent in sectors such as airlines, lending, insurance, and retail. By learning demand models from customer features and setting prices accordingly, these systems may generate discriminatory outcomes that raise fairness concerns. This leads to fundamental questions - how and where should systems incorporate fairness considerations in the pricing pipeline, and how does it ultimately affect societal outcomes? To answer these, we study a stylized model where a seller has a two-stage decision pipeline comprising linear demand model estimation followed by price optimization. The seller considers fairness notions in training loss, price, and demand, under both parity-wise and Rawlsian perspectives. We show that equalizing training loss across consumer groups leads to multiple solutions, which in turn can result in undesirable outcomes despite being a standard approach in fair machine learning. Focusing instead on fairness applied directly to prices or demand, we compare two strategies that enforce fairness in either the demand estimation stage or the price optimization stage. For parity-wise fairness, we characterize when each strategy yields higher social welfare under small fairness levels. We show that when market sizes and prices in the dataset are similar, imposing price fairness in the estimation stage is more beneficial to consumers, whereas imposing demand fairness in the optimization stage yields better consumer outcomes. For Rawlsian fairness, the two strategies coincide exactly. Lastly, we extend our model to alternate demand functions and conduct a case study using real-world vaccine pricing data.
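A small worked example may help show why unconstrained pricing on a linear demand model can produce different prices per group. The sketch assumes demand d(p) = a - b p, a revenue objective p * d(p), and zero marginal cost, which gives the optimal price p* = a / (2b). None of these specifics are stated in the abstract, the group parameters are made up, and the absolute price difference is used only as an illustrative parity gap.

```python
def revenue_maximizing_price(intercept, slope):
    """Price maximizing p * (intercept - slope * p), assuming zero marginal cost."""
    return intercept / (2.0 * slope)

# Hypothetical estimated demand parameters (intercept a, slope b) per group.
groups = {"group_1": (10.0, 1.0), "group_2": (12.0, 2.0)}

prices = {g: revenue_maximizing_price(a, b) for g, (a, b) in groups.items()}

# Illustrative price-parity gap: absolute difference in group prices.
price_gap = abs(prices["group_1"] - prices["group_2"])
```

Here the two groups are charged 5.0 and 3.0. A price-fairness constraint would bound this gap, and the paper's question is whether to impose such a constraint while estimating (a, b) or while choosing prices.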

2606.06837 2026-06-08 eess.AS cs.LG cross-list

SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails

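Vsevolod Kovalev, Pranay Manocha

Affiliations * Symbal AI; Princeton University

AI summary: Proposes the SEAM framework, which combines unified preprocessing, seam-aware sampling, non-speech augmentation, and a compact DistilHuBERT backbone to reach 0.971 ROC-AUC with 8-second windows, and exposes a shortcut-learning problem.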

Comments Accepted to Interspeech 2026

Abstract

Synthetic histopathology image generation addresses critical challenges in computational pathology, including patient privacy and the growing need for large-scale training data for foundation models. Latent diffusion models have dominated the image generation domain, with recent works emphasizing that the choice of latent space is critical to the quality of generated images. Existing state-of-the-art generative models in histopathology use pretrained Vision Foundation Models (VFMs) as conditioning signals, and we observe that this leads to "conditioning collapse," where the conditioning signal dominates the latent space and lowers the quality and diversity of generated samples. Therefore, we instead use pretrained histopathology VFMs as the latent space itself, leveraging their patch-token features that encode rich semantic information. We empirically show that these features are $\ell_2$-normalized and lie on the unit hypersphere $\mathcal{S}^{d-1}$ with strong angular dominance and intrinsic curvature, making them naturally suited for a Riemannian formulation. We therefore present STREAM, the first framework to apply Riemannian flow matching in the pathology domain. STREAM consists of two stages: 1) a bridge-type stochastic perturbation that establishes per-token rectifiability on $\mathcal{S}^{d-1}$ for training a Diffusion Transformer (DiT) in latent space, and 2) a novel anisotropic decoder that allocates robustness to low-energy directions of the velocity-field Jacobian while preserving fidelity along its high-energy directions. Together, STREAM achieves state-of-the-art reconstruction and generation performance on breast and colorectal cancer datasets. The code will be publicly released upon acceptance.
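Because the abstract places the features on the unit hypersphere, a natural building block for flow matching there is geodesic interpolation between two unit vectors. The sketch below implements spherical linear interpolation (slerp), a common choice of interpolant on the sphere. It does not reproduce STREAM's bridge-type perturbation or its decoder, and the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def slerp(x0, x1, t):
    """Point at fraction t along the great-circle arc from x0 to x1.

    Both inputs must be unit vectors. Nearly identical inputs return x0;
    exactly antipodal inputs are not handled, since the arc is then ambiguous.
    """
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x0, x1))))
    omega = math.acos(dot)          # angle between x0 and x1
    if omega < 1e-8:
        return list(x0)
    s = math.sin(omega)
    w0 = math.sin((1.0 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]

x0 = normalize([1.0, 0.0, 0.0])
x1 = normalize([0.0, 1.0, 0.0])
mid = slerp(x0, x1, 0.5)
```

Unlike straight-line interpolation, the midpoint keeps unit norm, so intermediate states stay on the sphere where the features live.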

2606.07053 2026-06-08 cs.CV cs.LG cross-list

TrioPose: Native Triple-Stream Diffusion Transformers for Pose-Guided Text-to-Image Generation

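Dian Gu, Zhengyi Yang

Affiliations * Institute of Automation, Chinese Academy of Sciences

AI summary: Proposes TrioPose, a native triple-stream pose-aware DiT built on the SD3.5M architecture that preserves pre-trained stability through layer-wise activation and zero-initialized dual-residual injection, and adds a Learnable Relational Bias Mask and Pose-Guided Spatial Loss Weighting, achieving state-of-the-art multi-person pose-guided generation with an AP of 64.33 on Human-Art.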

Comments 15 pages (9 pages main body, 6 pages references and appendix), 3 figures, 5 tables

Abstract

Pose-guided text-to-image generation often suffers from limb distortions and feature crosstalk in complex multi-person scenarios. While existing UNet-based adapters struggle with long-range spatial dependencies, emerging Multimodal Diffusion Transformers (MM-DiTs) offer superior global modeling. However, naive signal concatenation in MM-DiTs severely disrupts pre-trained latent distributions. To address this, we propose TrioPose, a native pose-driven framework built upon the SD3.5M architecture. Specifically, we introduce a Triple-Stream Pose-Aware DiT (TSPA-DiT) that treats pose as an independent modality. It employs layer-wise activation and zero-initialized dual-residual injection to smoothly enforce geometric constraints while preserving pre-trained latent stability. To resolve severe multi-instance occlusions, we design a Learnable Relational Bias Mask that categorizes topological connectivity into fine-grained physical states, mapping them into continuous attention soft constraints to effectively decouple inter-instance interference. Furthermore, a Pose-Guided Spatial Loss Weighting strategy modulates the native diffusion objective using heatmap-derived error maps, focusing anatomical supervision strictly on distortion-prone regions. Extensive experiments demonstrate that TrioPose achieves state-of-the-art performance across challenging benchmarks, including Human-Art, CrowdPose, and OCHuman. Notably, it attains an AP of $64.33$ on Human-Art, representing a $30\%$ improvement over prior arts, while setting new standards for visual fidelity and text-image semantic alignment in complex multi-human generation.
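The idea behind zero-initialized residual injection can be shown in a few lines: if the projection that maps the new pose signal into the backbone starts at zero, the residual branch contributes nothing at initialization, so the pre-trained features pass through unchanged. The sketch below shows only this general principle with a single linear projection. It is not the paper's dual-residual design, and all names and values are hypothetical.

```python
def inject_pose(hidden, pose, weight):
    """Residual injection: hidden + weight @ pose, with weight given as rows."""
    update = [sum(w * p for w, p in zip(row, pose)) for row in weight]
    return [h + u for h, u in zip(hidden, update)]

hidden = [0.5, -1.0, 2.0]                     # stand-in for pre-trained features
pose = [1.0, 3.0]                             # stand-in for pose-stream features
zero_weight = [[0.0, 0.0] for _ in hidden]    # zero initialization

output = inject_pose(hidden, pose, zero_weight)
```

At initialization `output` equals `hidden`, and the pose signal only begins to influence the backbone as training moves the projection away from zero.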

2606.07054 2026-06-08 cs.CL cs.AI cs.CR cs.LG cross-list

TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents

TRACE: 通过自适应跨步骤证据聚合的LLM智能体轨迹推理

Vijitha Mittapalli, Shreyaa Jayant Dani, Satya Srujana Pilli, Snigdha Ansu, Mohammadreza Teymoorianfard, Franck Dernoncourt, Hongjie Chen, Yu Wang, Ryan A. Rossi, Nesreen K. Ahmed

发表机构 * University of Massachusetts at Amherst（马萨诸塞大学阿默斯特分校）； Adobe Research（Adobe研究）； Dolby Labs（杜比实验室）； University of Oregon（俄勒冈大学）； Cisco（思科）

AI总结提出TRACE框架，通过TIJ循环识别高信号区域、累积跨步骤证据并合成轨迹级判决，在SHADE-Arena的十个任务域上F1达0.713，召回率0.844，尤其擅长长距离证据链接。

详情

AI中文摘要

自主LLM智能体可以通过一系列单独良性的行动追求隐藏的恶意目标，这使得使用标准轨迹级监控难以检测破坏行为。现有方法要么一次性评估完整轨迹，要么将其划分为独立评分的窗口，限制了连接时间上相距较远的证据的能力。我们提出TRACE，一个用于长视界LLM智能体轨迹的监控框架。TRACE通过一个TIJ（分类-检查-判决）循环运行，该循环识别高信号区域，执行有针对性的检查，同时在推理步骤中累积累积的证据，并综合出轨迹级判决。我们在SHADE-Arena的十个任务域上评估TRACE，与最先进的基线进行比较。TRACE实现了0.713的总体F1分数和0.844的召回率，在需要长距离证据链接的任务上取得了最大的提升。

英文摘要

Autonomous LLM agents can pursue hidden malicious objectives through sequences of individually benign actions, making sabotage difficult to detect using standard trajectory-level monitoring. Existing approaches either evaluate complete trajectories in a single pass or partition them into independently scored windows, limiting their ability to connect evidence across temporally distant actions. We propose TRACE, a monitoring framework for long-horizon LLM agent trajectories. TRACE operates through a TIJ (Triage-Inspect-Judge) loop that identifies high-signal regions, performs targeted inspection while maintaining accumulated evidence across reasoning steps, and synthesizes a trajectory-level verdict. We evaluate TRACE on ten task domains from SHADE-Arena against state-of-the-art baselines. TRACE achieves an aggregate F1 of 0.713 and recall of 0.844, with the largest gains on tasks requiring long-range evidence linking.


2606.07098 2026-06-08 cs.CL cs.LG cross-list

SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices

Ernests Lavrinovics, Marco Letizia, Roy Janco, Shai Segal, Johannes Bjerva, Maurizio Pierini

Affiliations: Department of Computer Science, Aalborg University Copenhagen; MaLGa-DIBRIS, University of Genoa; INFN, Sezione di Genova; European Organization for Nuclear Research (CERN); Ceva, Inc.

AI summary: Proposes SigmaScale, which learns auxiliary scaling matrices to improve truncated-SVD compression of LLMs and lowers the effective rank of weight matrices, achieving competitive performance on Llama 3.1 8B and Qwen3-8B.

Abstract

We present SigmaScale, a method for learning auxiliary scaling matrices $S$ to aid truncated Singular Value Decomposition (SVD) based Large Language Model (LLM) compression. Instead of deriving scaling matrices analytically, SigmaScale optimizes two sets of vectors that define diagonal row and column scaling transformations under an activation-aware compression loss. We show that learned scaling lowers the effective intrinsic rank of weight matrices, as reflected by reductions in effective-rank entropy, and that this reduction is strongly correlated with compression loss. Experiments on Llama 3.1 8B Instruct and Qwen3-8B show that SigmaScale is competitive with closely related state-of-the-art SVD-based compression methods across perplexity and zero-shot benchmarks. By using learned activation-aware transformations, SigmaScale explores a more flexible route to low-rank LLM compression by adapting to the structure of individual model weights. The advantage observed in specific tasks makes our approach a valid option for applications requiring a reduced LLM-inference computing cost.
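The scaled truncated SVD described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the way the diagonal scalings are applied and undone, and the form of the activation-aware loss are all assumptions, and the scaling vectors here are fixed by hand rather than learned.

```python
import numpy as np

def scaled_truncated_svd(W, row_scale, col_scale, rank):
    """Rank-`rank` approximation of W computed in a diagonally scaled space."""
    # Apply the diagonal row and column scalings: D_r @ W @ D_c.
    scaled = row_scale[:, None] * W * col_scale[None, :]
    U, s, Vt = np.linalg.svd(scaled, full_matrices=False)
    # Keep only the top `rank` singular triplets.
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    # Undo the scalings so the result approximates the original W.
    return low_rank / row_scale[:, None] / col_scale[None, :]

def activation_aware_loss(W, W_hat, X):
    """Frobenius error of the layer output on activations X (assumed loss form)."""
    return np.linalg.norm((W - W_hat) @ X)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 6))
X = rng.normal(size=(6, 32))            # hypothetical calibration activations
ones_r, ones_c = np.ones(8), np.ones(6)
col = np.linalg.norm(X, axis=1)         # hand-picked column scaling, not learned

plain = scaled_truncated_svd(W, ones_r, ones_c, rank=3)
scaled = scaled_truncated_svd(W, ones_r, col, rank=3)
full = scaled_truncated_svd(W, ones_r, col, rank=6)
```

With all six singular values kept, the scaling cancels and `full` reproduces `W`; at lower rank, `activation_aware_loss(W, plain, X)` and `activation_aware_loss(W, scaled, X)` can be compared to see how a given scaling changes the output error.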


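2606.07153 2026-06-08 math.NA cs.LG cs.NA math.OC cross-list

No-Harm Physics-Informed Inverse Learning with Residual-Calibrated Uncertainty

Ronald Katende

Affiliations: Department of Mathematics; Kabale University

AI summary: Proposes a no-harm certification and selection framework that uses residual-calibrated radii to ensure physics-informed inverse learning is no worse than a baseline, combining data, physics, boundary, and other residuals to provide a posteriori error bounds and deterministic uncertainty radii.

Comments 25 pages, 10 Tables, 12 Figures

Abstract (translated from Chinese)

Physics-informed learning is increasingly used for inverse problems governed by partial differential equations, but its reliability remains difficult to certify. This paper develops a no-harm certification and selection framework for physics-informed inverse learning. A learned reconstruction is accepted only when its residual-calibrated radius is no worse than the baseline radius, that is, when $$R_{\mathrm{learn}}\le R_{\mathrm{base}}+\varepsilon_{\mathrm{safe}}$$; otherwise, the method returns the baseline. The certificate combines data, physics, boundary or initial-condition, and optimization residuals. Under conditional stability estimates, these residuals yield a posteriori reconstruction error bounds and deterministic uncertainty radii. For physics residuals estimated from independent random collocation points, a high-probability certificate is also derived. Numerical tests on Poisson source recovery, inverse heat reconstruction, limited-angle tomography, elliptic coefficient identification, and random residual validation show that the selector accepts certified improvements, rejects shifted, hallucinated, or unfinished candidates, and becomes conservative in strongly ill-posed cases. The framework is therefore a certification and selection layer rather than another reconstruction architecture.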
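The acceptance rule $R_{\mathrm{learn}}\le R_{\mathrm{base}}+\varepsilon_{\mathrm{safe}}$ can be sketched as a small selector. This is only an illustration under stated assumptions: the `Candidate` type and `no_harm_select` function are hypothetical names, and the radii are taken as already computed, whereas the paper derives them from several residuals.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    radius: float  # residual-calibrated uncertainty radius (assumed precomputed)

def no_harm_select(learned, baseline, eps_safe=0.0):
    """Return the learned reconstruction only if R_learn <= R_base + eps_safe."""
    if learned.radius <= baseline.radius + eps_safe:
        return learned
    # Otherwise fall back to the baseline, so the output is never worse than it.
    return baseline

baseline = Candidate("baseline", radius=0.30)
accepted = no_harm_select(Candidate("learned_good", 0.25), baseline, eps_safe=0.01)
rejected = no_harm_select(Candidate("learned_bad", 0.40), baseline, eps_safe=0.01)
```

The fallback branch is what makes the rule "no-harm": a candidate with a larger certified radius is never returned.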

Does Appearance Help? A Systematic Study of Image-Based Re-Identification in Online 3D Multi-Pedestrian Tracking (title translated from Chinese)

Eduardo Borges, Luís Garrote, Urbano J. Nunes

Affiliations: Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra

AI summary: Systematically studies the role of image-based re-identification in online 3D multi-object tracking under a lightweight projection-based framework, and proposes a cascaded matching strategy that recovers occluded tracks at low latency and prevents identity switches.

Comments Accepted for publication at the 35th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2026)

Abstract

LiDAR-based 3D Multi-Object Tracking (MOT) typically relies solely on geometric information, which is often insufficient to distinguish between targets during prolonged occlusions or in crowded human-populated environments. While integrating RGB-based Re-Identification (ReID) offers a theoretical solution for preserving identity context, existing approaches often rely on computationally expensive parallel detectors that hinder real-time robot responsiveness. This work presents a systematic study of image-based ReID in online 3D MOT, utilizing a lightweight projection-based framework to decouple geometric and appearance modeling for mobile robots. A comprehensive analysis of feature extraction architectures is conducted, employing lightweight CNNs and Vision Transformers, and evaluating various multi-modal data association strategies to balance computational latency with robust tracking. Experiments on the Pedestrian class of the KITTI dataset reveal that naive linear fusion of appearance and motion costs degrades performance due to visual noise. Conversely, a cascaded matching strategy successfully recovers occluded tracks without compromising overall precision, effectively preventing identity switches to maintain human-robot interaction continuity. We show that lightweight architectures can offer an optimal trade-off between the low latency required for safe navigation and the discriminative power needed for social awareness.


2606.07237 2026-06-08 cs.CL cs.AI cs.LG cross-list

When Large Language Models Fail in Healthcare: Evaluating Sensitivity to Prompt Variations

Mahdi Alkaeed

Affiliations: Department of Computer Science and Engineering, Doha, Qatar

AI summary: Systematically analyzes how sensitive general-purpose and medical-specific LLMs are to prompt perturbations. It finds that even minor wording changes can alter clinical advice and that adversarial prompts can elicit harmful outputs, indicating that these models are unreliable in clinical use.

Comments 12 pages

Abstract

Large Language Models (LLMs) are increasingly used in healthcare for tasks such as clinical question answering, diagnosis support, and report summarization. Despite their promise, these models remain highly sensitive to subtle prompt perturbations, both lexical and syntactic, posing serious risks in safety-critical clinical applications. In this study, we conduct a systematic sensitivity analysis to evaluate the robustness of both general-purpose (e.g., GPT-3.5, Llama3) and medical-specific LLMs (e.g., ClinicalBERT, BioLlama3, BioBERT) using the MedMCQA benchmark. We categorize perturbations into natural and adversarial types and examine their effect on model consistency, accuracy, and reliability in clinical reasoning tasks. Our findings reveal that medical LLMs are not intrinsically safe. Even minor variations in phrasing can alter clinical advice, and targeted adversarial prompts can provoke harmful outputs. In high-stakes settings like healthcare, such unpredictability is unacceptable: models that change diagnoses due to reworded inputs or hallucinate medications when slightly rephrased cannot be reliably trusted by clinicians. While models tend to show resilience to simple lexical substitutions or paraphrasing, they often break down under syntactic reordering or misleading contextual cues. This fragility is evident across both general-purpose and domain-specific LLMs. Notably, adversarial manipulations can lead to clinically dangerous outputs, such as recommending incorrect dosages or omitting critical findings.


2606.07257 2026-06-08 physics.optics cs.LG cross-list

On the conditional equivalence of phase retrieval algorithms

Jakob Schroeder, Andreas Döpp

Affiliations: Fakultät für Physik, Ludwig-Maximilians-Universität München; Munich Center for Machine Learning (MCML); Max Planck Institut für Quantenoptik

AI summary: Proves that the Gerchberg-Saxton algorithm is equivalent to gradient descent on an amplitude least-squares loss and gives global and local probabilistic interpretations, offering guidance for relaxation in iterative phase retrieval.

Abstract

Phase retrieval - recovering a complex-valued field from intensity measurements - is typically solved using variants of the Gerchberg-Saxton (GS) algorithm, understood as alternating projections between measurement planes. Meanwhile, modern computational imaging increasingly relies on gradient-based optimization and automatic differentiation. Here we show that these two approaches are mathematically identical: the GS magnitude replacement step is exactly a unit gradient descent step on an amplitude least-squares loss. This equivalence enables seamless integration of classical phase retrieval with differentiable physics pipelines. We further identify two complementary probabilistic interpretations of this equivalence: globally, the amplitude loss is the negative log-likelihood under Gaussian amplitude noise; locally, each projection step arises as a Bayesian update with the propagated field as prior. The local view provides qualitative guidance for relaxation in iterative phase retrieval.
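The stated equivalence between the GS magnitude replacement and a unit gradient step can be checked numerically in a few lines. This sketch rests on two assumptions that the abstract does not spell out: the loss is normalized as $\tfrac12\sum(|u|-a)^2$, and the gradient is taken with respect to the real and imaginary parts of the field. The function names are illustrative.

```python
import numpy as np

def gs_magnitude_replacement(u, a):
    """Keep the phase of u and replace its magnitude with the measured amplitude a."""
    return a * u / np.abs(u)

def amplitude_loss_gradient(u, a):
    """Gradient of 0.5 * sum((|u| - a)**2) with respect to the real and imaginary
    parts of u, packed as a complex array (d/dRe + 1j * d/dIm)."""
    return (np.abs(u) - a) * u / np.abs(u)

rng = np.random.default_rng(0)
u = rng.normal(size=16) + 1j * rng.normal(size=16)   # propagated field (nonzero)
a = rng.uniform(0.5, 2.0, size=16)                   # measured amplitudes

projected = gs_magnitude_replacement(u, a)
descended = u - 1.0 * amplitude_loss_gradient(u, a)  # gradient step with unit step size
```

Algebraically, `u - (|u| - a) * u / |u|` simplifies to `a * u / |u|`, so the two arrays agree element by element. A step size below one gives a relaxed update that moves only part of the way toward the measured amplitude.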


2606.07277 2026-06-08 cs.IT cs.CR cs.LG math.IT cross-list

The Capacity of Information-Theoretic Secure Aggregation in Federated Learning

Lanxin Yi, Jinbao Zhu, Kai Wan, Xiaohu Tang

Affiliations: Information Coding and Transmission (ICT) Key Laboratory of Sichuan Province, Southwest Jiaotong University; School of Electronic Information and Communications, Huazhong University of Science and Technology

AI summary: For secure aggregation in federated learning, proposes a general key-distribution model that requires no trusted third party or prescribed structure, and fully characterizes the capacity region among security randomness, key-distribution communication, and aggregation communication.

Abstract

Secure aggregation allows a server to aggregate users' local updates while preserving update privacy. Existing information-theoretic problems typically assume that correlated random keys are provided by a trusted third party (TTP) or generated via prescribed groupwise structures, while the communication cost for establishing such correlated keys is often ignored. Consequently, the fundamental limits under general key-distribution mechanisms remain unknown. In this paper, we study the $T$-colluding information-theoretic secure aggregation problem with $N$ users under a general two-phase framework consisting of a key distribution phase and an update aggregation phase. Unlike prior work, we model key distribution through user-to-user communication and allow arbitrary user-generated key-distribution mechanisms, eliminating TTP or prescribed structures. This enables a joint characterization of three resources: randomness for security, key-distribution communication, and aggregation communication. We completely characterize the capacity region among these three resources by constructing a novel secure aggregation scheme together with a matching information-theoretic converse. In particular, we develop an explicit deterministic capacity-achieving construction over any finite field of size at least $N$, whereas most existing schemes either rely on TTP or employ randomized or existential constructions over sufficiently large finite fields. We further show that the optimal performance can be achieved using only pairwise shared keys, enabling implementation via Diffie--Hellman key exchange. Compared with Google's seminal secure aggregation scheme, the proposed scheme requires fewer random masking keys while preserving the same aggregation communication overhead.
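To illustrate why pairwise shared keys suffice for the server to recover the sum, here is a sketch of classic pairwise masking. It is not the capacity-achieving scheme from the paper: it ignores $T$-collusion guarantees and user dropouts, the field size `P` is an arbitrary illustrative prime, and the keys are drawn centrally for brevity instead of being agreed through a key exchange.

```python
import random

P = 2_147_483_647  # illustrative prime modulus (assumption); must exceed the true sum

def pairwise_masks(num_users, rng):
    """Draw one shared key per user pair (i, j) with i < j."""
    return {(i, j): rng.randrange(P)
            for i in range(num_users) for j in range(i + 1, num_users)}

def mask_update(i, update, keys, num_users):
    """User i adds keys shared with higher-indexed users and subtracts the rest."""
    masked = update
    for j in range(num_users):
        if j > i:
            masked += keys[(i, j)]
        elif j < i:
            masked -= keys[(j, i)]
    return masked % P

rng = random.Random(0)
updates = [5, 11, 7, 2]                      # toy scalar updates, one per user
keys = pairwise_masks(len(updates), rng)
masked = [mask_update(i, x, keys, len(updates)) for i, x in enumerate(updates)]
aggregate = sum(masked) % P                  # every key cancels in the sum
```

Each key is added once by the lower-indexed user and subtracted once by the higher-indexed user, so the masks vanish from `aggregate` while each individual `masked` value is offset by random keys.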


2606.07293 2026-06-08 cs.SD cs.LG cross-list

TargetSEC: Plug-and-Play In-the-Wild Speech Emotion Conversion via Arousal-Conditioned Latent Style Diffusion

Constantin Alexander Auga

Affiliations: Hasso Plattner Institute / University of Potsdam

AI summary: Proposes TargetSEC, an embedding-driven latent diffusion framework that generates emotional style embeddings under continuous emotion conditioning and operates in a compact latent space, achieving high conversion accuracy and speech quality.

Comments 5 pages, 2 figures, 2 tables, preprint

Abstract

Speech Emotion Conversion (SEC) aims to transform the emotion of a source utterance into a target emotion while preserving content and speaker identity. SEC on in-the-wild data is challenging due to the non-parallel nature of training data and complex real-world acoustics. Existing fixed-duration approaches either struggle to shift the emotion effectively (high quality, low conversion) or degrade speech naturalness (low quality, high conversion). We propose TargetSEC, an embedding-driven latent diffusion framework that generates emotion-focused style embeddings conditioned on speaker identity and continuous emotion. Unlike methods that diffuse over spectrograms, TargetSEC operates in a compact latent space. Experiments on the MSP-Podcast dataset show that TargetSEC outperforms current non-duration baselines in conversion accuracy while maintaining high speech quality, and achieves performance comparable to duration-prediction systems without explicit temporal modeling.


2606.07366 2026-06-08 cs.CV cs.LG cs.RO cross-list

Dash2Sim: Closed-Loop Driving Simulation from in-the-wild Dashcam Videos

Anurag Ghosh, Francesco Pittaluga, Khiem Vuong, Angela Chen, Juan Alvarez-Padilla, Manmohan Chandraker, Srinivasa Narasimhan

Affiliations: Carnegie Mellon University; NEC Labs America; MIT; UC San Diego

AI summary: Proposes Dash2Sim, a framework that turns monocular dashcam videos into metric, geo-referenced 4D driving logs for closed-loop simulation, builds the ROADWork4D benchmark dataset, and confirms that work-zone scenarios challenge planners.

Abstract

Self-driving simulations typically rely on data collected in a small number of cities or on hand-authored synthetic scenarios. Dashcam videos cover a far broader range of locations and situations, including rare or long-tailed scenarios. They are considered less usable for simulation because it is difficult to recover accurate 4D scenes from monocular in-the-wild videos. Work zones are one such class of long-tailed situations that dashcams capture. We present Dash2Sim, a framework that turns in-the-wild monocular dashcam videos into metric, geo-referenced 4D driving logs compatible with existing simulators, and verifies each one against an independently maintained map without annotations. We apply Dash2Sim to a large video corpus to create the ROADWork4D benchmark dataset, which spans 4,244 scenes with 2.7M 3D objects across 17 cities. On a verified subset ROADWork4D-CL (2,201 scenes), we study privileged closed-loop planners and find that work zone scenarios are difficult: while rule-based and hybrid planners generalize better than learning-based ones, all fall short, failing to make the lane changes that temporary work zone channels require. Beyond planning, dense depth recovered by Dash2Sim improves novel-view synthesis quality by up to 19% on perceptual metrics, suggesting its potential to provide rich conditioning for closed-loop sensor simulation from monocular videos.


2606.07383 2026-06-08 cs.RO cs.LG cross-list

RhinoVLA Technical Report

Huixi Intelligence: Chen Zhang, Chenyang Zhou, Guanglei Ding, Guanghui He, Haibin Gao, Jiajia Chen, Jianyong Zhang, Lianyi Yu, Ningyi Xu, Ping Xu, Qingchen Li, Yingjun Hu, Yijia Zhang, Yuxi Liu

Affiliations: Huixi Intelligence

AI summary: To address the deployment latency of VLA models on edge hardware, proposes RhinoVLA, which achieves real-time closed-loop control through a token-efficient backbone, a continuous action expert, and a unified interface, reaching an inference rate of 11.69 Hz on Huixi R1.

Abstract

Vision-Language-Action (VLA) models have shown strong potential for robotic manipulation, but real-time deployment on edge hardware remains challenging. In this work, we identify VLM visual and context tokens as a major source of deployment latency: for GEMM-dominated projection operators, computation grows linearly with the number of input tokens when model dimensions are fixed. Motivated by this observation, we propose RhinoVLA, a deployment-oriented VLA model co-designed with the Huixi R1 edge SoC. RhinoVLA adopts a token-efficient Qwen3-VL backbone and a continuous Action Expert, reducing the VLM-side token and computation burden while preserving pretrained multimodal capability. To support cross-robot learning, RhinoVLA further introduces a unified interface that combines View Registry, 72D physical state-action slot space, and robot-instance LoRA, allowing heterogeneous robot observations and action schemas to be aligned under a shared policy. On the deployment side, RhinoVLA is optimized through hardware-aware compilation, mixed-precision execution, and parallel visual encoding. Experiments show that RhinoVLA achieves downstream performance comparable to π0.5 at a similar parameter scale, while reaching 11.69 Hz end-to-end inference on Huixi R1, meeting the 10 Hz real-time closed-loop control target. The project will be open-sourced at https://github.com/HuixiAI/RhinoVLA.


2606.07385 2026-06-08 nlin.CD cs.LG physics.data-an cross-list

Unified Geometry-Guided ML-FTLE for Tracking Transient Chaos from Scalar Time Series

S. V. Manivelan, Andrei Velichko, I. Manimehan

Affiliations: Department of Physics, M. R. Government Arts College (Affiliated to Bharathidasan University, Tiruchirappalli); Institute of Physics and Technology, Petrozavodsk State University

AI summary: Proposes a geometry-guided machine learning framework that detects transient chaos from scalar observations, without governing equations, by combining predictive trajectory divergence with macroscopic attractor morphology, and verifies that fusing topological state spaces with predictive divergence systematically improves continuous transition tracking.

Comments Preprint; 9 figures; submitted for peer review

Abstract

Detecting transient chaos from scalar observations without governing equations represents a fundamental challenge in nonlinear dynamics. We propose a geometry-guided machine learning framework that unifies predictive trajectory divergence with macroscopic attractor morphology to track abrupt regime shifts. The methodology extracts a local instability scale via out-of-sample k-nearest neighbor forecast errors to establish the ML-FTLE estimator, subsequently mapping this temporal divergence onto a structural closeness matrix derived from a minimal dictionary of Poincare occupancy grids. By employing partial least squares regression, we extract a latent geometric component calibrated directly to the empirical finite-time Lyapunov spectrum, yielding the Poincare-based geometric-guided FTLE. Validation against analytical QR-FTLE baselines confirms that fusing topological state spaces with predictive divergence systematically improves continuous transition tracking. The Structural Similarity Index optimally resolves gradual damping, while Hausdorff Distance exhibits extreme resilience during abrupt phase-space collapses. Furthermore, macroscopic spatial discretization acts as a robust topological regularizer against additive Gaussian noise, preserving deterministic signatures even at moderate signal thresholds. This equation-free framework provides a highly accurate, noise-resilient diagnostic for monitoring structural transitions in complex non-stationary systems.


2606.07392 2026-06-08 cs.AI cs.LG econ.EM stat.ML cross-list

Online Pandora's Box for Contextual LLM Cascading

Alexandre Belloni, Yan Chen, Yehua Wei

Affiliations: The Fuqua School of Business, Duke University

AI summary: For LLM cascading, proposes an online contextual Pandora's Box model that combines parametric reservation indices and GMM estimation with UCB bounds to achieve dimension-dependent √T cumulative regret.

Abstract

Motivated by Large Language Model (LLM) cascading, we propose an online contextual Pandora's Box model for adaptively querying and selecting LLM APIs. In each period, a decision-maker observes a request context and faces a two-phase decision problem. In the query phase, the decision-maker sequentially queries APIs, where each query reveals a generated output and the decision-maker incurs an (output-dependent) cost. In the selection phase, the decision-maker selects one of the generated outputs to deploy and observes only the downstream reward of the deployed output. This output-mediated feedback structure differs from classical online contextual Pandora's Box models, in which opening a box directly reveals its reward. Rather than estimating the full conditional output and cost distributions of each API, we directly model the reservation index and develop a learning approach for the query phase. Specifically, we impose a parametric structure on the contextual reservation index functions induced by the classical Weitzman's policy. Our policy combines generalized method of moments (GMM) type estimation of these reservation indices with UCB-style confidence bounds for both these indices and the shared output-level reward evaluator. Under regularity conditions, we prove that the resulting policy achieves dimension-dependent $\widetilde O(\sqrt T)$ cumulative regret over a horizon of $T$ periods.
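For readers unfamiliar with reservation indices, the classical Weitzman index of a box is the value $\sigma$ that solves $\mathbb{E}[(X-\sigma)^+] = c$, where $X$ is the box's reward and $c$ its opening cost. The sketch below computes it by bisection for a discrete reward distribution. It covers only the classical fixed-cost setting, not the contextual, output-dependent-cost model of the paper, and the function name is illustrative.

```python
def reservation_index(rewards, probs, cost, tol=1e-9):
    """Solve E[max(X - sigma, 0)] = cost for sigma by bisection (discrete X, cost > 0)."""
    def expected_gain(sigma):
        return sum(p * max(x - sigma, 0.0) for x, p in zip(rewards, probs))

    # expected_gain is non-increasing in sigma: at least `cost` at lo, zero at hi.
    lo, hi = min(rewards) - cost, max(rewards)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_gain(mid) > cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A box paying 0 or 1 with equal probability, at two different opening costs.
sigma_cheap = reservation_index([0.0, 1.0], [0.5, 0.5], cost=0.25)   # about 0.5
sigma_costly = reservation_index([0.0, 1.0], [0.5, 0.5], cost=0.40)  # about 0.2
```

Weitzman's policy opens boxes in decreasing order of this index and stops once the best reward seen so far exceeds every remaining index, which is why a higher opening cost (a lower index) pushes a box later in the order.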


2606.07399 2026-06-08 stat.ML cs.LG cross-list

Automatic, Debiased, and Invariant Counterfactual Generation under General Interventions

Raphael C Kim, Jingsen Zhu, Ramin Zabih, Michele Santacatterina

Affiliations: Cornell Tech; Cornell University; Department of Biostatistics, Department of Population Health; New York University Grossman School of Medicine

AI summary: Proposes the ADIGen framework, which combines Riesz regression, causal invariance, and orthogonal statistical learning to make counterfactual generation under general interventions automatic, debiased, and invariant, and provides excess risk bounds.

2606.07403 2026-06-08 math.OC cs.LG cross-list

Bradley-Terry Ranking of Recommender Systems Based on Dataset Taxonomy (title translated from Chinese)

Ekaterina Grishina, Stepan Kuznetsov, Askar Tsyganov, Ilya Ivanov, Daria Korovaitceva, Margarita Rusanova, Uliana Parkina, Alexander Derevyagin, Evgeny Frolov, Sergey Samsonov, Anton Lysenko

Affiliations: HSE University

AI summary: To address the sensitivity of recommendation-algorithm rankings to dataset characteristics, proposes a data-driven ranking method based on the Bradley-Terry model and introduces a ranking-consistency metric and a method for ranking algorithms on unseen datasets.

Comments KDD'26

DOI: 10.1145/3770855.3817890

Abstract

The ranking of recommendation algorithms is a challenging problem since model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This drives a demand for a proper methodology for fair comparison between algorithms. Naive aggregation of performance metrics (e.g., averaging NDCG over benchmarks) can yield misleading rankings, undermining practical selection. To address this problem, we introduce a novel, data-driven ranking methodology based on the Bradley-Terry (BT) model. We demonstrate that the obtained ranking depends on key dataset statistics. Additionally, we propose a novel metric for evaluating ranking consistency and demonstrate the robustness of our ranking to incomplete data. Finally, we introduce a dataset-specific methodology for ranking algorithms on unseen datasets without running the models, relying on extensions of the Bradley-Terry framework, including BT trees and BT models with covariates.
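As background for the Bradley-Terry approach, the basic model can be fitted from pairwise win counts with the standard minorization-maximization (MM) update. The sketch below shows only the plain BT model. The win matrix is made up for illustration, and how the paper converts per-dataset metrics into pairwise outcomes, or extends the model with trees and covariates, is not reproduced here.

```python
import numpy as np

def fit_bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from a matrix where wins[i, j] counts i beating j."""
    n = wins.shape[0]
    games = wins + wins.T                   # comparisons played between each pair
    strength = np.ones(n)
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = games / (strength[:, None] + strength[None, :])
        np.fill_diagonal(denom, 0.0)
        strength = wins.sum(axis=1) / denom.sum(axis=1)
        strength /= strength.sum()          # fix the scale; only ratios matter
    return strength

# Hypothetical pairwise outcomes for three recommenders across benchmark datasets.
wins = np.array([[0, 7, 9],
                 [3, 0, 6],
                 [1, 4, 0]], dtype=float)
strength = fit_bradley_terry(wins)
ranking = np.argsort(-strength)             # indices from strongest to weakest
```

Under the fitted model, the probability that algorithm `i` beats algorithm `j` is `strength[i] / (strength[i] + strength[j])`, which gives a ranking that accounts for who was compared against whom rather than averaging raw metrics.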


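2206.08598 2026-06-08 cs.LG stat.ML version update

Characterizing Learning Dynamics under Relative Reparameterization of Singular Models

Pascal Mattia Esser, Frank Nielsen

Affiliations: Ludwig-Maximilians-Universität München; Sony Computer Science Laboratories Inc.

AI summary: Singular models converge slowly because the map from parameter space to model space is not one-to-one. The paper proposes a relative reparameterization that extracts regular submodels and analyzes gradient descent convergence rates theoretically for Gaussian mixture models and neural networks.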

2312.07762 2026-06-08 cs.LG cs.NA math.NA stat.AP version update

Interpretable factorization of clinical questionnaires to identify latent factors of psychopathology

Ka Chun Lam, Bridget W Mahony, Armin Raznahan, Francisco Pereira

Affiliations: Machine Learning Core, National Institute of Mental Health, National Institutes of Health; Section on Developmental Neurogenomics, Human Genetics Branch, National Institute of Mental Health, National Institutes of Health; National Institute of Mental Health, National Institutes of Health

AI summary: Proposes interpretability constrained questionnaire factorization (ICQF), a non-negative matrix factorization method whose regularization improves factor interpretability and stability and which detects the latent dimensionality automatically; it outperforms existing methods on real data.

Abstract

Psychiatry research seeks to understand the manifestations of psychopathology in behavior, as measured in questionnaire data, by identifying a small number of latent factors that explain them. While factor analysis is the traditional tool for this purpose, the resulting factors may not be interpretable, and may also be subject to confounding variables. Moreover, missing data are common, and explicit imputation is often required. To overcome these limitations, we introduce interpretability constrained questionnaire factorization (ICQF), a non-negative matrix factorization method with regularization tailored for questionnaire data. Our method aims to promote factor interpretability and solution stability. We provide an optimization procedure with theoretical convergence guarantees, and an automated procedure to detect latent dimensionality accurately. We validate these procedures using realistic synthetic data. We demonstrate the effectiveness of our method in a widely used general-purpose questionnaire, in two independent datasets (the Healthy Brain Network and Adolescent Brain Cognitive Development studies). Specifically, we show that ICQF improves interpretability, as defined by domain experts, while preserving diagnostic information across a range of disorders, and outperforms competing methods for smaller dataset sizes. This suggests that the regularization in our method matches domain characteristics. The python implementation for ICQF is available at https://github.com/jefferykclam/ICQF.


2403.05532 2026-06-08 cs.LG cs.CV version update

Twin: Tuning Learning Rate and Weight Decay of Deep Homogeneous Classifiers without Validation

Lorenzo Brigato, Stavroula Mougiakakou

Affiliations: ARTORG Center, University of Bern

AI summary: Proposes Twin, which uses the margin-maximization dynamics of homogeneous networks and an empirical scaling law between training and test losses to tune learning rate and weight decay without a validation set, reaching a mean absolute error of 1.28% relative to an Oracle baseline across 37 image-classification configurations.

Comments Accepted at TMLR

Abstract

We introduce Tune without Validation (Twin), a simple and effective pipeline for tuning learning rate and weight decay of homogeneous classifiers without validation sets, eliminating the need to hold out data and avoiding the two-step process. Twin leverages the margin-maximization dynamics of homogeneous networks and an empirical scaling law that links training and test losses across hyper-parameter configurations. This mathematical modeling yields a regime-dependent, validation-free selection rule: in the non-separable regime, training loss is monotonic in test loss and therefore predictive of generalization, whereas in the separable regime, the parameters' norm becomes a reliable indicator of generalization due to margin maximization. Across 37 dataset-architecture configurations for image classification, we demonstrate that Twin achieves a mean absolute error of 1.28% compared to an Oracle baseline that selects HPs using test accuracy. We demonstrate Twin's benefits in scenarios where validation data is scarce, such as small-data regimes, or difficult and costly to collect, as in medical imaging. Code available at https://github.com/lorenzobrigato/twin.


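2403.10318 2026-06-08 cs.LG (updated version)

pTNAS: Progressive Neural Architecture Search for Tabular Data

Naili Xing, Shaofeng Cai, Lingze Zeng, Jiaqi Zhu, Peng Lu, Jian Pei, Beng Chin Ooi

Affiliations: National University of Singapore; University of Science and Technology of China

AI summary: Proposes pTNAS, the first progressive neural architecture search method for tabular data. It uses a filter-and-refine strategy that combines a zero-cost proxy with a fixed-budget scheduling algorithm to identify an architecture quickly and keep improving as budget grows, with speedups of up to 82.75× over other NAS methods.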

Abstract

Recent advances have shifted the paradigm of tabular learning toward tabular foundation models, yet their accuracy relies on a heavy inference cost that scales poorly with context size. Deep neural networks remain a highly competitive and more efficient modeling paradigm when equipped with well-designed architectures; however, identifying such architectures in a data-adaptive and budget-aware manner remains challenging. We propose pTNAS, the first progressive neural architecture search (NAS) approach tailored for tabular data, which enables fast identification of a viable architecture and continuously improves its search performance as more budget becomes available. pTNAS adopts a filter-and-refine optimization strategy that combines efficient training-free and effective training-based architecture evaluation. In the filtering phase, we introduce pTProxy, a novel zero-cost proxy specifically designed for tabular networks that jointly captures architectural trainability and expressivity, enabling fast filtering of large architecture search spaces. In the refinement phase, pTNAS employs a fixed-budget scheduling algorithm to accurately identify the best-performing architecture from a small set of promising candidates. We further propose a budget-aware coordinator to optimize budget allocation holistically. Experiments show that pTNAS reduces the time to reach the globally best architecture by up to 82.75 X compared with other NAS approaches, achieves the best average predictive rank, and improves end-to-end efficiency by up to 4.78 X compared with TabPFN.
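A filter-and-refine search of the kind the abstract describes might be organized as in the sketch below. It is only a toy under stated assumptions: `toy_proxy` stands in for pTProxy (whose definition is not given here), `toy_train_eval` stands in for real training, and successive halving is used as one well-known fixed-budget scheduler, which may differ from the scheduler in the paper.

```python
import math


def filter_then_refine(candidates, proxy, train_eval, keep=4, first_epochs=1):
    """Shortlist with a training-free proxy, then refine by successive halving."""
    # Filtering phase: rank every candidate without training it.
    shortlist = sorted(candidates, key=proxy, reverse=True)[:keep]
    # Refinement phase: evaluate survivors, keep the better half, double the budget.
    epochs = first_epochs
    while len(shortlist) > 1:
        ranked = sorted(shortlist, key=lambda a: train_eval(a, epochs), reverse=True)
        shortlist = ranked[: max(1, len(ranked) // 2)]
        epochs *= 2
    return shortlist[0]


# Hypothetical search space: (depth, width) pairs of an MLP.
candidates = [(d, w) for d in (1, 2, 3) for w in (8, 16, 32)]


def toy_proxy(arch):
    # Placeholder score that simply favours larger networks.
    return math.log(arch[0] * arch[1])


def toy_train_eval(arch, epochs):
    # Placeholder for "train for `epochs` epochs and return validation accuracy".
    return -abs(arch[0] * arch[1] - 48)


best = filter_then_refine(candidates, toy_proxy, toy_train_eval)
print(best)
```

The cheap proxy touches all nine candidates, while the expensive evaluator only ever sees the four that survive filtering.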


2406.05670 2026-06-08 cs.LG cs.CR cs.CV (updated version)

Certified Robustness to Data Poisoning in Gradient-Based Training

Philip Sosnin, Mark N. Müller, Maximilian Baader, Calvin Tsay, Matthew Wicker

Affiliations: Department of Computing, Imperial College London, United Kingdom; Department of Computer Science, ETH Zurich, Switzerland; LogicStar.ai, Switzerland; The Alan Turing Institute, United Kingdom

AI summary: Presents the first framework that over-approximates the set of possible parameter updates with convex relaxations, giving provable robustness guarantees against untargeted poisoning, targeted poisoning, and backdoor attacks for models trained with gradient-based methods.

Comments 21 pages, 8 figures

Abstract

Modern machine learning pipelines leverage large amounts of public data, making it infeasible to guarantee data quality and leaving models open to poisoning and backdoor attacks. Provably bounding model behavior under such attacks remains an open problem. In this work, we address this challenge by developing the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data without modifying the model or learning algorithm. In particular, our framework certifies robustness against untargeted and targeted poisoning, as well as backdoor attacks, for bounded and unbounded manipulations of the training inputs and labels. Our method leverages convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning threat model, allowing us to bound the set of all reachable parameters for any gradient-based learning algorithm. Given this set of parameters, we provide bounds on worst-case behavior, including model performance and backdoor success rate. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.
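To make the idea of bounding all reachable parameters concrete, here is a deliberately tiny case: one gradient step of scalar least-squares regression when an attacker may shift at most `k` labels by at most `eps`. This is an illustrative interval bound worked out for this note, not the convex relaxation used in the paper; the function name and the threat model parameters are assumptions.

```python
def one_step_bounds(w, xs, ys, lr, k, eps):
    """Interval containing w after one gradient step under bounded label poisoning.

    Loss: mean of 0.5 * (w * x - y) ** 2. Shifting label y_i by delta changes
    the mean gradient by -delta * x_i / n, so the worst case uses the k
    inputs with the largest |x_i|.
    """
    n = len(xs)
    grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
    radius = eps * sum(sorted((abs(x) for x in xs), reverse=True)[:k]) / n
    return w - lr * (grad + radius), w - lr * (grad - radius)


xs = [1.0, 2.0, -3.0, 0.5]
ys = [2.0, 4.0, -6.0, 1.0]
lo, hi = one_step_bounds(w=0.0, xs=xs, ys=ys, lr=0.1, k=1, eps=0.5)
print(lo, hi)
```

Certifying a full training run means propagating such a set through every step, where the starting parameter is itself an interval; that is the part that calls for the relaxations the abstract mentions.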


2408.15344 2026-06-08 cs.LG math.DS (updated version)

Conformal Disentanglement and Latent-Space Curation: A Neural Framework for Perspective Synthesis, Differentiation and Targeted Generation

George A. Kevrekidis, Eleni D. Koronaki, Dimitris G. Giovanis, Yannis G. Kevrekidis

Affiliations: Department of Applied Mathematics and Statistics, Johns Hopkins University; Los Alamos National Laboratory; Faculty of Science, Technology and Medicine, University of Luxembourg; Department of Civil and Systems Engineering, Johns Hopkins University; Department of Chemical and Biomolecular Engineering, Johns Hopkins University

AI summary: Proposes a neural autoencoder framework that separates shared and sensor-specific latent variables in multi-sensor data through structural constraints and orthogonality-based regularization, then uses the disentangled latent subspaces for targeted generation and cross-sensor inference.

Abstract

Many scientific and engineering problems involve observing a common phenomenon through multiple heterogeneous sensors or measurement modalities. Such observations typically contain both information shared across sensors, reflecting the underlying system, and sensor-specific or extraneous components arising from measurement processes or environmental effects. Disentangling these contributions is essential when sensor-independent observations are unavailable. We propose a neural autoencoder framework that explicitly separates shared and sensor-specific latent variables from multi-sensor data. The architecture enforces geometric independence between latent components through structural constraints and orthogonality-based regularization, yielding interpretable and disentangled representations. Building on this representation, we then introduce a latent-space generative methodology in which generative models are tuned/"restricted" on selected disentangled latent subspaces; we then constructively combine disentangled observed latent variables to conditionally synthesize new samples via trained decoders. This enables consistent data generation with prescribed shared (or sensor-specific) characteristics. It also supports cross-sensor inference by consistently sampling distributions over plausible measurements in unobserved modalities. We demonstrate the approach on several computational examples, showing effective disentanglement, targeted data generation, and modality imputation in heterogeneous sensing settings.


2411.09734 2026-06-08 cs.LG cs.NA math.NA math.OC (updated version)

Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations

Carlos Heredia

Affiliations: IAMM Research, Department of Applied Artificial Intelligence; DAMM

AI summary: Proposes continuous-time integro-differential equation models of AdaGrad, RMSProp, and Adam, and checks their agreement with the discrete algorithms through numerical simulation and stability analysis, offering a new perspective on adaptive optimization methods.

Comments 60 pages, 15 figures; v3 - Section 4 corrected

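2412.09119 2026-06-08 cs.LG cs.CR math.OC (updated version)

Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models

Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen

Affiliations: University of California, Berkeley; Stanford University

AI summary: Noting that RLHF and preference optimization methods assume a single objective, proposes MOPO, a constrained KL-regularized framework that maximizes a primary objective while enforcing lower bounds on secondary objectives. It recovers Pareto-optimal policies on synthetic benchmarks and Pareto-dominates baselines when fine-tuned on human-preference data.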

Abstract

Post-training LLMs with RLHF and preference optimization methods (e.g., DPO, IPO) has greatly improved alignment, yet these approaches assume a single objective. In reality, humans express multiple, often conflicting objectives, such as helpfulness and harmlessness, with no natural scalarization. We study the multi-objective preference alignment problem, where a policy must balance several objectives simultaneously. We propose Multi-Objective Preference Optimization (MOPO), a constrained KL-regularized framework that maximizes a primary objective while enforcing lower bounds on secondary objectives via tunable safety thresholds. MOPO operates directly on pairwise preferences without point-wise rewards, and admits simple closed-form iterative updates. Empirically, MOPO recovers Pareto-optimal policies on synthetic benchmarks and, when fine-tuned on human-preference data, yields multi-billion parameter models that achieve higher rewards and Pareto-dominate baselines, with stable and robust optimization dynamics.


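2505.12239 2026-06-08 cs.LG cs.AI cs.CR (updated version)

DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs

Xinyu Yao, Daniel Bourgeois, Abhinav Jain, Yuxin Tang, Jiawen Yao, Zhimin Ding, Arlei Silva, Chris Jermaine

Affiliations: Rice University; Rice Ken Kennedy Institute

AI summary: Proposes Doppler, a framework that optimizes device assignment in asynchronous dataflow graphs with dual policy networks (SEL selects operations, PLC places them on devices), reducing execution time and improving sampling efficiency.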

Comments 32 pages, 19 figures


Journal ref: Proceedings of the International Conference on Learning Representations (ICLR), 2026

Abstract

We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to three key limitations: (1) reliance on bulk-synchronous systems like TensorFlow, which under-utilize devices due to barrier synchronization; (2) lack of awareness of the scheduling mechanism of underlying systems when designing learning-based methods; and (3) exclusive dependence on reinforcement learning, ignoring the structure of effective heuristics designed by experts. In this paper, we propose Doppler, a three-stage framework for training dual-policy networks consisting of 1) a $\mathsf{SEL}$ policy for selecting operations and 2) a $\mathsf{PLC}$ policy for placing chosen operations on devices. Our experiments show that Doppler outperforms all baseline methods across tasks by reducing system execution time and additionally demonstrates sampling efficiency by reducing per-episode training time.


2505.23437 2026-06-08 cs.LG cs.AI cs.IR (updated version)

Bounded-Abstention Pairwise Learning to Rank

Antonio Ferrara, Andrea Pugnana, Francesco Bonchi, Salvatore Ruggieri

Affiliations: Intesa Sanpaolo AI Research; University of Trento; University of Pisa

AI summary: Proposes an abstention method for pairwise ranking based on thresholding the conditional risk, characterizes the optimal strategy theoretically, designs a model-agnostic plug-in algorithm, and validates it experimentally.

Comments KDD 2026

Abstract

Ranking systems influence decision-making in high-stakes domains like health, education, and employment, where they can have substantial economic and social impacts. This makes the integration of safety mechanisms essential. One such mechanism is abstention, which enables algorithmic decision-making systems to defer uncertain or low-confidence decisions to human experts. While abstention has been predominantly explored in the context of classification tasks, its application to other machine learning paradigms remains underexplored. In this paper, we introduce a novel method for abstention in pairwise learning-to-rank tasks. Our approach is based on thresholding the ranker's conditional risk: the system abstains from making a decision when the estimated risk exceeds a predefined threshold. Our contributions are threefold: a theoretical characterization of the optimal abstention strategy, a model-agnostic, plug-in algorithm for constructing abstaining ranking models, and a comprehensive empirical evaluation across multiple datasets, demonstrating the effectiveness of our approach.
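A plug-in rule of the kind described, abstaining when the estimated conditional risk exceeds a threshold, could look like the sketch below. It assumes a 0-1 pairwise loss, so the risk of the better ordering is `min(p, 1 - p)` where `p` is the estimated probability that item i should rank above item j, and it picks the threshold from calibration pairs so that the abstention rate stays within a budget. The function names and the calibration procedure are illustrative assumptions, not the paper's algorithm.

```python
def conditional_risk(p):
    """Risk of the better ordering when P(i ranked above j) = p (0-1 loss assumed)."""
    return min(p, 1.0 - p)


def threshold_for_budget(calib_probs, max_abstain_rate):
    """Risk threshold whose abstention rate on calibration pairs stays within budget.

    Assumes 0 <= max_abstain_rate < 1.
    """
    risks = sorted(conditional_risk(p) for p in calib_probs)
    n_keep = len(risks) - int(max_abstain_rate * len(risks))
    return risks[n_keep - 1]  # abstain only when risk is strictly above this


def rank_pair(p, threshold):
    if conditional_risk(p) > threshold:
        return "abstain"
    return "i>j" if p >= 0.5 else "j>i"


calib = [0.95, 0.9, 0.55, 0.5, 0.2, 0.48, 0.7, 0.1, 0.85, 0.4]
t = threshold_for_budget(calib, max_abstain_rate=0.2)
decisions = [rank_pair(p, t) for p in calib]
print(decisions.count("abstain"), "of", len(calib), "pairs deferred")
```

Because the rule only consumes probabilities, it can wrap any pairwise ranker that outputs them, which is what makes a plug-in approach model-agnostic.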


2507.12927 2026-06-08 cs.LG cs.IT math.IT (updated version)

Trace Reconstruction with Language Models

Franziska Weindel, Michael Girsch, Reinhard Heckel

Affiliations: School of Computation, Information and Technology, Technical University of Munich; Munich Center for Machine Learning

AI summary: Proposes TReconLM, a decoder-only transformer that treats trace reconstruction as next-token prediction. It is pretrained on synthetic data, fine-tuned on real-world data, and outperforms existing algorithms.

Abstract

The general trace reconstruction problem seeks to recover an original sequence from its noisy copies independently corrupted by insertions, deletions, and substitutions. This problem arises in applications such as DNA data storage, a promising storage medium due to its high information density and longevity. However, errors introduced during DNA synthesis, storage, and sequencing require correction through algorithms and codes, with trace reconstruction often used as part of data retrieval. In this work, we propose TReconLM, a decoder-only transformer that solves trace reconstruction as a next-token prediction task. TReconLM outperforms state-of-the-art trace reconstruction algorithms, including prior deep-learning approaches, recovering a substantially higher fraction of sequences without error. We pretrain on synthetic data generated from a simple error model and fine-tune on real-world data to adapt to technology-specific error patterns. Code is available at https://github.com/MLI-lab/TReconLM.
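Synthetic pretraining data of the sort mentioned in the abstract can be generated by passing a sequence through an insertion/deletion/substitution channel several times. The sketch below is one plausible simple error model, not necessarily the one used for TReconLM; the error rates and the text layout in `to_training_example` (traces joined by `|`, then `:` and the target) are hypothetical choices for illustration.

```python
import random

ALPHABET = "ACGT"


def corrupt(seq, p_ins, p_del, p_sub, rng):
    """Return one noisy copy of seq with independent per-position errors."""
    out = []
    for base in seq:
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))  # insertion before this base
        u = rng.random()
        if u < p_del:
            continue  # deletion: drop the base
        if u < p_del + p_sub:
            # substitution: replace with a different symbol
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def make_traces(seq, n, p_ins=0.02, p_del=0.02, p_sub=0.02, seed=0):
    rng = random.Random(seed)
    return [corrupt(seq, p_ins, p_del, p_sub, rng) for _ in range(n)]


def to_training_example(traces, target):
    # Hypothetical format for next-token prediction: the model reads the
    # traces and is trained to continue with the clean sequence.
    return "|".join(traces) + ":" + target


traces = make_traces("ACGTACGTAC", n=3, p_ins=0.1, p_del=0.1, p_sub=0.1)
print(to_training_example(traces, "ACGTACGTAC"))
```

Fine-tuning on real reads would then replace `make_traces` with measured traces while keeping the same input layout.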


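2508.02039 2026-06-08 cs.LG stat.ML (updated version)

Model Recycling Framework for Multi-Source Data-Free Supervised Transfer Learning

Sijia Wang, Ricardo Henao

Affiliations: Department of Electrical and Computer Engineering; Duke University

AI summary: Proposes a model recycling framework that, without access to source data, identifies subsets of related source models for parameter-efficient transfer learning in both white-box and black-box settings, supporting multi-source data-free supervised transfer learning.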

Abstract

Increasing concerns for data privacy and other difficulties associated with retrieving source data for model training have created the need for source-free transfer learning, in which one only has access to pre-trained models instead of data from the original source domains. This setting introduces many challenges, as many existing transfer learning methods typically rely on access to source data, which limits their direct applicability to scenarios where source data is unavailable. Further, practical concerns make it more difficult, for instance efficiently selecting models for transfer without information on source data, and transferring without full access to the source models. So motivated, we propose a model recycling framework for parameter-efficient training of models that identifies subsets of related source models to reuse in both white-box and black-box settings. Consequently, our framework makes it possible for Model as a Service (MaaS) providers to build libraries of efficient pre-trained models, thus creating an opportunity for multi-source data-free supervised transfer learning.


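2508.17821 2026-06-08 cs.LG cs.AI cs.CL (updated version)

Limitations of Normalization in Attention Mechanism

Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova, Radu State

Affiliations: University of Luxembourg; London Institute for Mathematical Sciences

AI summary: Using a theoretical framework and GPT-2 experiments, shows that softmax normalization drives attention toward uniform as the number of selected tokens grows, and analyzes the training difficulties caused by gradient sensitivity at low temperatures.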

Abstract

This paper investigates the limitations of normalization in attention mechanisms. We begin with a theoretical framework that enables the identification of the model's selective ability and the geometric separation involved in token selection. Our analysis includes explicit bounds on distances and separation criteria for token vectors under softmax scaling. Through experiments with a pre-trained GPT-2 model, we empirically validate our theoretical results and analyze key behaviors of the attention mechanism. Notably, we demonstrate that as the number of selected tokens increases, the model's ability to distinguish informative tokens declines, often converging toward a uniform selection pattern. We also show that gradient sensitivity under softmax normalization presents challenges during training, especially at low temperature settings. These findings advance current understanding of softmax-based attention mechanisms and motivate the need for more robust normalization and selection strategies in future attention architectures.
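One elementary way to see why softmax attention flattens as the context grows: if every logit lies in an interval [-b, b], no single weight can exceed e^{2b} / (e^{2b} + n - 1), which shrinks toward zero as the number of tokens n increases. The bounded-logit assumption and this particular bound are supplied here as a simple illustration; they are not the paper's own results, which concern distances and separation criteria for token vectors.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_weight_bound(n, b):
    """Upper bound on any softmax weight when all n logits lie in [-b, b]."""
    return math.exp(2 * b) / (math.exp(2 * b) + n - 1)


B = 1.0
for n in (4, 64, 1024):
    # Most favourable case for one token: it sits at +B, all others at -B.
    best_case = [B] + [-B] * (n - 1)
    print(n, max(softmax(best_case)), max_weight_bound(n, B))
```

Lowering the temperature is equivalent to enlarging `b`, which restores sharpness but also steepens the gradients, consistent with the trade-off the abstract points to.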


2509.05316 2026-06-08 cs.LG cs.AI (updated version)

Standard vs. Modular Sampling: Best Practices for Reliable LLM Unlearning

Praveen Bushipaka, Lucia Passaro, Tommaso Cucinotta

Affiliations: Scuola Superiore Sant’Anna; University of Pisa

AI summary: To address shortcomings of sampling strategies in LLM unlearning, proposes Modular Entity-Level Unlearning (MELU), which uses diversified neighbor sets and modular sampling to balance forgetting effectiveness against model utility.

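2509.21751 2026-06-08 cs.LG physics.comp-ph physics.flu-dyn (updated version)

On the Effect of Neural Field Reparameterization for 4DVAR

Jaemin Oh

Affiliations: Division of Applied Mathematics, Brown University

AI summary: Reparameterizes 4DVAR with neural fields, using their spectral bias as implicit regularization. The approach needs no background error covariance, enables parallel-in-time optimization, and outperforms classical 4DVAR on chaotic benchmarks.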

Comments 26 pages, 9 figures, 11 tables

Abstract

Four-dimensional variational data assimilation (4DVAR) is a cornerstone of numerical weather prediction, yet it remains computationally intensive and sensitive to initialization due to the non-convexity of its objective function. We propose a neural field-based reformulation of 4DVAR in which the spatiotemporal state is represented as a continuous function parameterized by a neural network. We demonstrate that optimizing in parameter space leverages the spectral bias of neural fields, acting as an implicit regularizer that stabilizes state estimation and suppresses spurious high-frequency oscillations without requiring explicit background error covariance information. Furthermore, by parameterizing the full spatiotemporal trajectory, our framework enables parallel-in-time optimization and incorporates physical constraints directly through physics-informed losses. Evaluations on chaotic benchmarks, including 2D Kolmogorov flow and 3D Taylor-Green vortices, show that neural reparameterization produces more accurate initial conditions than classical 4DVAR. When combined with separable neural architectures (SPINNs), the method achieves substantial speedups. Unlike many machine learning approaches, this framework requires no ground-truth training data, offering a robust and scalable alternative for operational data assimilation.
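For readers unfamiliar with 4DVAR, the contrast drawn in the abstract can be written out as follows. The first expression is the standard textbook strong-constraint cost; the second is only a plausible form of the neural-field variant inferred from the abstract's wording (no background term, a physics-informed penalty), not a formula taken from the paper. All symbols ($x_b$, $B$, $H_k$, $R_k$, $\mathcal{M}$, $u_\theta$, $\mathcal{N}$, $\lambda$) are notation assumed here.

```latex
% Classical strong-constraint 4DVAR: optimize the initial state x_0.
J(x_0) = \tfrac12 (x_0 - x_b)^\top B^{-1} (x_0 - x_b)
       + \tfrac12 \sum_{k=0}^{K} \bigl(H_k(x_k) - y_k\bigr)^\top R_k^{-1} \bigl(H_k(x_k) - y_k\bigr),
\qquad x_k = \mathcal{M}_{0 \to k}(x_0).

% Neural-field variant (sketch): optimize network weights \theta instead,
% with the state given by u_\theta(t, \cdot) and the dynamics
% \partial_t u = \mathcal{N}(u) enforced softly through a residual penalty.
J(\theta) = \tfrac12 \sum_{k=0}^{K} \bigl(H_k(u_\theta(t_k,\cdot)) - y_k\bigr)^\top R_k^{-1} \bigl(H_k(u_\theta(t_k,\cdot)) - y_k\bigr)
          + \lambda \,\bigl\| \partial_t u_\theta - \mathcal{N}(u_\theta) \bigr\|^2 .
```

In the first form every observation term depends on $x_0$ through a sequential model integration; in the second, each time $t_k$ is evaluated directly from the network, which is what makes parallel-in-time optimization possible.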


2510.03381 2026-06-08 cs.LG cs.AI (updated version)

Proxy Reconstruction Pre-training for Ramp Flow Prediction at Highway Interchanges

Yongchao Li, Jun Chen, Zhuoxuan Li, Chao Gao, Yang Li, Chu Zhang, Changyin Dong

Affiliations: Southeast University; Institute of Telecommunications and Information Sciences, China

AI summary: Proposes the Spatio-Temporal Decoupled Autoencoder (STDAE), which is pretrained by cross-modal reconstruction to recover ramp flows from mainline data and is combined with models such as GWNet to improve prediction accuracy, outperforming 13 baselines on real-world datasets.

Comments Accepted at Applied Soft Computing Journal


DOI: 10.1016/j.asoc.2026.115462
Journal ref: Applied Soft Computing Journal 200 (2026) 115462

Abstract

Interchanges are crucial nodes for vehicle transfers between highways, yet the lack of real-time ramp detectors creates blind spots in traffic prediction. To address this, we propose a Spatio-Temporal Decoupled Autoencoder (STDAE), a two-stage framework that leverages cross-modal reconstruction pretraining. In the first stage, STDAE reconstructs historical ramp flows from mainline data, forcing the model to capture intrinsic spatio-temporal relations. Its decoupled architecture with parallel spatial and temporal autoencoders efficiently extracts heterogeneous features. In the prediction stage, the learned representations are integrated with models such as GWNet to enhance accuracy. Experiments on three real-world interchange datasets show that STDAE-GWNET consistently outperforms thirteen state-of-the-art baselines and achieves performance comparable to models using historical ramp data. This demonstrates its effectiveness in overcoming detector scarcity and its plug-and-play potential for diverse forecasting pipelines.


2510.09041 2026-06-08 cs.LG cs.AI (updated version)

Robust Driving Control for Autonomous Vehicles: An Intelligent General-sum Constrained Adversarial Reinforcement Learning Approach

Junchao Fan, Qi Wei, Ruichen Zhang, Yang Lu, Jianhua Wang, Xiaolin Chang, Bo Ai

Affiliations: Beijing Key Laboratory of Security and Privacy in Intelligent Transportation; Beijing Jiaotong University; College of Computing and Data Science; Nanyang Technological University; School of Computer Science and Technology; Taiyuan University of Technology; School of Electronics and Information Engineering

AI summary: To address the vulnerability of deep reinforcement learning to adversarial attacks in autonomous driving, proposes Intelligent General-sum Constrained Adversarial Reinforcement Learning (IGCARL), in which a strategic targeted adversary and a robust driving agent are trained interactively and the agent is optimized under constraints to keep the policy stable. Experiments show a success rate at least 27.9% higher than existing methods.

Abstract

Deep reinforcement learning (DRL) has demonstrated remarkable success in developing autonomous driving policies. However, its vulnerability to adversarial attacks remains a critical barrier to real-world deployment. Although existing robust methods have achieved success, they still suffer from three key issues: (i) these methods are trained against myopic adversarial attacks, limiting their abilities to respond to more strategic threats, (ii) they have trouble causing truly safety-critical events (e.g., collisions), but instead often result in minor consequences, and (iii) these methods can introduce learning instability and policy drift during training due to the lack of robust constraints. To address these issues, we propose Intelligent General-sum Constrained Adversarial Reinforcement Learning (IGCARL), a novel robust autonomous driving approach that consists of a strategic targeted adversary and a robust driving agent. The strategic targeted adversary is designed to leverage the temporal decision-making capabilities of DRL to execute strategically coordinated multi-step attacks. In addition, it explicitly focuses on inducing safety-critical events by adopting a general-sum objective. The robust driving agent learns by interacting with the adversary to develop a robust autonomous driving policy against adversarial attacks. To ensure stable learning in adversarial environments and to mitigate policy drift caused by attacks, the agent is optimized under a constrained formulation. Extensive experiments show that IGCARL improves the success rate by at least 27.9% over state-of-the-art methods, demonstrating superior robustness to adversarial attacks and enhancing the safety and reliability of DRL-based autonomous driving.


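2510.16023 2026-06-08 cs.LG cond-mat.mtrl-sci (updated version)

MIST: Mutual Information Estimation via Supervised Training

German Gritsai, Megan Richards, Maxime Méloux, Kyunghyun Cho, Maxime Peyrard

Affiliations: Université Grenoble Alpes; CNRS; Grenoble INP; LIG

AI summary: Proposes MIST, a fully data-driven neural mutual information estimator trained on a large synthetic meta-dataset. It handles variable sample sizes and dimensions with a two-dimensional attention scheme and quantifies uncertainty via quantile regression. Experiments show it outperforms classical methods and is fast at inference.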

Abstract

We propose a fully data-driven approach to designing mutual information (MI) estimators. Since any MI estimator is a function of the observed sample from two random variables, we parameterize this function with a neural network (MIST) and train it end-to-end to predict MI values. Training is performed on a large meta-dataset of 625,000 synthetic joint distributions with known ground-truth MI. To handle variable sample sizes and dimensions, we employ a two-dimensional attention scheme ensuring permutation invariance across input samples. To quantify uncertainty, we optimize a quantile regression loss, enabling the estimator to approximate the sampling distribution of MI rather than return a single point estimate. This research program departs from prior work by taking a fully empirical route, trading universal theoretical guarantees for flexibility and efficiency. Empirically, the learned estimators largely outperform classical baselines across sample sizes and dimensions, including on joint distributions unseen during training. The resulting quantile-based intervals are well-calibrated and more reliable than bootstrap-based confidence intervals, while inference is orders of magnitude faster than existing neural baselines. Beyond immediate empirical gains, this framework yields trainable, fully differentiable estimators that can be embedded into larger learning pipelines. Moreover, exploiting MI's invariance to invertible transformations, meta-datasets can be adapted to arbitrary data modalities via normalizing flows, enabling flexible training for diverse target meta-distributions.
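The quantile regression loss mentioned in the abstract is commonly implemented as the pinball loss, sketched below. The quantile levels a network like MIST would predict and how the loss is aggregated are not specified here, so the helper names and the toy sample are illustrative only.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for one prediction at quantile level q in (0, 1)."""
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1.0) * diff


def mean_pinball(samples, y_pred, q):
    return sum(pinball_loss(y, y_pred, q) for y in samples) / len(samples)


# Toy check: among candidate constants, the empirical median minimizes the
# average pinball loss at q = 0.5.
mi_values = [0.1, 0.2, 0.3, 0.4, 1.0]  # hypothetical ground-truth MI values
best = min(mi_values, key=lambda c: mean_pinball(mi_values, c, 0.5))
print(best)
```

Training one output per quantile level with this loss yields an interval estimate, for example between the 0.05 and 0.95 outputs, rather than a single point estimate.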


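2511.19359 2026-06-08 cs.LG (updated version)

Enhancing Conformal Prediction via Class Similarity

Ariel Fargion, Lahav Dabah, Tom Tirer

Affiliations: Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel

AI summary: Proposes improving conformal prediction with class similarity, either by penalizing out-of-group errors or by using embedding information, to reduce prediction set size and improve semantic consistency.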

Comments ICML 2026 (camera-ready). Code is available at: https://github.com/ariel361/CP_via_CS

Abstract

Conformal Prediction (CP) has emerged as a powerful statistical framework for high-stakes classification applications. Instead of predicting a single class, CP generates a prediction set, guaranteed to include the true label with a pre-specified probability. The performance of different CP methods is typically assessed by their average prediction set size. In setups where the classes can be partitioned into semantic groups, e.g., diseases that require similar treatment, users can benefit from prediction sets that are not only small on average, but also contain a small number of semantically different groups. This paper begins by addressing this problem and ultimately offers a widely applicable tool for boosting any CP method on any dataset. First, given a class partition, we propose augmenting the CP score function with a term that penalizes predictions with out-of-group errors. We theoretically analyze this strategy and prove its advantages for group-related metrics. Surprisingly, we show mathematically that, for common class partitions, it can also reduce the average set size of any CP score function. Our analysis reveals the class-similarity factors behind this improvement and motivates a variant that can further reduce prediction set size by leveraging the model's embeddings, without requiring any human semantic partition. Finally, we present an extensive empirical study, encompassing prominent CP methods, multiple models, and several datasets, which demonstrates that our class-similarity-based approach consistently enhances CP methods.
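The idea of augmenting a conformal score with an out-of-group penalty can be illustrated with split conformal prediction. In this sketch the base score is `1 - p(label)`, and a fixed `penalty` is added whenever the candidate label falls outside the semantic group of the model's top class. The base score, the form of the penalty, and the toy data are assumptions made for illustration; the paper's exact score may differ.

```python
import math


def score(probs, label, groups, penalty):
    """Nonconformity: 1 - p(label), plus a penalty if label is outside the top class's group."""
    top = max(range(len(probs)), key=probs.__getitem__)
    extra = penalty if groups[label] != groups[top] else 0.0
    return 1.0 - probs[label] + extra


def calibrate(cal_probs, cal_labels, groups, penalty, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = sorted(score(p, y, groups, penalty) for p, y in zip(cal_probs, cal_labels))
    rank = math.ceil((len(scores) + 1) * (1 - alpha))
    return scores[rank - 1] if rank <= len(scores) else math.inf


def predict_set(probs, groups, penalty, qhat):
    return [c for c in range(len(probs)) if score(probs, c, groups, penalty) <= qhat]


groups = [0, 0, 1, 1]  # classes 0,1 form one semantic group; classes 2,3 another
cal_probs = [
    [0.7, 0.2, 0.05, 0.05], [0.4, 0.5, 0.05, 0.05], [0.1, 0.1, 0.6, 0.2],
    [0.1, 0.1, 0.3, 0.5], [0.6, 0.1, 0.2, 0.1], [0.2, 0.6, 0.1, 0.1],
    [0.05, 0.05, 0.8, 0.1], [0.5, 0.3, 0.1, 0.1], [0.1, 0.2, 0.3, 0.4],
]
cal_labels = [0, 0, 2, 2, 0, 1, 2, 1, 3]
test_probs = [0.4, 0.25, 0.35, 0.0]

for penalty in (0.0, 0.5):
    qhat = calibrate(cal_probs, cal_labels, groups, penalty, alpha=0.2)
    print(penalty, predict_set(test_probs, groups, penalty, qhat))
```

Since the same augmented score is used for calibration and prediction, the usual marginal coverage argument for split conformal prediction still applies under exchangeability; the penalty only changes which labels fill the set.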


2511.22581 2026-06-08 cs.LG cs.MA (updated version)

High entropy leads to symmetry-equivariant policies in Dec-POMDPs

Johannes Forkel, Constantin Ruhdorfer, Michael Beukman, Andreas Bulling, Jakob Foerster

Affiliations: FLAIR, Department of Engineering Science, University of Oxford; Collaborative Artificial Intelligence, University of Stuttgart

AI summary: Proves that in Dec-POMDPs sufficiently high entropy regularization makes policy gradient converge to a symmetry-equivariant joint policy, and finds experimentally that high entropy coefficients improve inter-seed cross-play returns.

Abstract

We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy is equivariant w.r.t. all symmetries of the Dec-POMDP. In particular, policies coming from different initializations will be fully compatible, in that their cross-play returns are equal to their self-play returns. Through extensive evaluation of independent PPO, arguably the standard baseline deep multi-agent policy gradient algorithm, in the Hanabi, Overcooked and Yokai environments, we find that the entropy coefficient has a massive influence on the cross-play returns between independently trained policies, and that the decrease in self-play returns coming from increased entropy regularization can often be counteracted by greedifying the learned policies after training. In Hanabi in particular we achieve a new SOTA in inter-seed cross-play this way. While we give examples of Dec-POMDPs in which one cannot learn the optimal symmetry-equivariant policy this way, both our theoretical and empirical results suggest that one should consider far higher entropy coefficients during hyperparameter sweeps in Dec-POMDPs than is typically done. Code for our experiments can be found at https://github.com/jforkel/JAX-OBL


2512.01362 2026-06-08 cs.LG version update

Directed evolution algorithm drives neural prediction

定向进化算法驱动神经预测

Yanlin Wang, Nancy M Young, Patrick C M Wong

Affiliations: Brain and Mind Institute, The Chinese University of Hong Kong; Department of Linguistics and Modern Languages, The Chinese University of Hong Kong; Division of Otolaryngology, Ann & Robert H. Lurie Children's Hospital of Chicago; Department of Otolaryngology Head & Neck Surgery, Feinberg School of Medicine, Northwestern University; Knowles Hearing Center, Department of Communication Sciences and Disorders, Northwestern University

AI summary: Proposes the directed evolution model (DEM), which mimics the trial-and-error process of biological directed evolution and incorporates a replay buffer and continual backpropagation, improving generalization in cross-domain neural prediction and addressing label scarcity.

Comments 43 pages, 5 figures

AI Chinese abstract

神经预测为预测神经认知功能和障碍的个体差异以及为个性化干预提供预后指标提供了一种有前景的方法。然而，由于领域偏移和标签稀缺的限制，将神经预测模型转化为医学人工智能应用具有挑战性。在此，我们提出定向进化模型（DEM），一种新颖的计算模型，模拟生物定向进化的试错过程，以逼近预测建模任务的最优解。我们证明了定向进化算法是一种有效的不确定性探索策略，能够增强强化学习中的泛化能力。此外，通过将回放缓冲和连续反向传播方法整合到DEM中，我们提供了在连续学习环境中实现利用与探索之间更好权衡的证据。我们在四个不同数据集上进行了实验，这些数据集涉及接受人工耳蜗植入的儿童，其口语发展结果在个体儿童水平上差异很大。术前神经MRI数据已被证明可以准确预测这些儿童术后结果，但在数据集之间不适用。我们的结果表明，DEM能够有效提高跨域植入前神经预测的性能，同时解决目标域中标签稀缺的挑战。

English abstract

Neural prediction offers a promising approach to forecasting the individual variability of neurocognitive functions and disorders and providing prognostic indicators for personalized intervention. However, it is challenging to translate neural predictive models into medical artificial intelligence applications due to the limitations of domain shift and label scarcity. Here, we propose the directed evolution model (DEM), a novel computational model that mimics the trial-and-error processes of biological directed evolution to approximate optimal solutions for predictive modeling tasks. We demonstrated that the directed evolution algorithm is an effective strategy for uncertainty exploration, enhancing generalization in reinforcement learning. Furthermore, by incorporating replay buffer and continual backpropagation methods into DEM, we provide evidence of achieving a better trade-off between exploitation and exploration in continuous learning settings. We conducted experiments on four different datasets for children with cochlear implants whose spoken language developmental outcomes vary considerably at the individual-child level. Preoperative neural MRI data has been shown to accurately predict the post-operative outcome of these children within but not across datasets. Our results show that DEM can efficiently improve the performance of cross-domain pre-implantation neural predictions while addressing the challenge of label scarcity in the target domain.


2512.05291 2026-06-08 cs.LG version update

SHAP-Guided Kernel Actor-Critic for Explainable Reinforcement Learning

基于SHAP引导的核化Actor-Critic可解释强化学习

Na Li, Hangguan Shan, Wei Ni, Wenjie Zhang, Xinyu Li

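Affiliations: National University of Singapore; University of Science and Technology of China

AI summary: Proposes the RSA2C algorithm, which uses RKHS-SHAP to compute state attributions and converts them into Mahalanobis-gated weights that modulate Actor gradients and Advantage Critic targets, yielding efficient, stable, and interpretable reinforcement learning.

Journal ref: ICML2026

AI Chinese abstract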

Actor-Critic (AC) 方法是强化学习 (RL) 的基石，但可解释性有限。当前的可解释RL方法很少使用状态属性来辅助训练，而是平等对待所有状态特征，从而忽略了单个状态维度对奖励的异质性影响。我们提出基于RKHS-SHAP的高级Actor-Critic (RSA2C)，一种属性感知的、核化的、双时间尺度AC算法，包括Actor、Value Critic和Advantage Critic。Actor实例化在向量值再生核希尔伯特空间 (RKHS) 中，使用马氏加权算子值核，而Value Critic和Advantage Critic位于标量RKHS中。这些RKHS增强组件使用稀疏化字典：Value Critic维护自己的字典，而Actor和Advantage Critic共享一个字典。通过RKHS-SHAP（用于流形上期望的核均值嵌入和流形外期望的条件均值嵌入）从Value Critic计算的状态属性被转换为马氏门控权重，用于调节Actor梯度和Advantage Critic目标。我们推导了在状态扰动下的全局非渐近收敛界，通过扰动误差项显示稳定性，通过收敛误差项显示效率。在三个连续控制环境上的实验结果表明，RSA2C实现了效率、稳定性和可解释性。我们的代码可在 https://github.com/Na-Li66/RSA2C 获取。

English abstract

Actor-critic (AC) methods are a cornerstone of reinforcement learning (RL) but offer limited interpretability. Current explainable RL methods seldom use state attributions to assist training. Rather, they treat all state features equally, thereby neglecting the heterogeneous impacts of individual state dimensions on the reward. We propose RKHS-SHAP-based Advanced Actor-Critic (RSA2C), an attribution-aware, kernelized, two-timescale AC algorithm, including Actor, Value Critic, and Advantage Critic. The Actor is instantiated in a vector-valued reproducing kernel Hilbert space (RKHS) with a Mahalanobis-weighted operator-valued kernel, while the Value Critic and Advantage Critic reside in scalar RKHSs. These RKHS-enhanced components use sparsified dictionaries: the Value Critic maintains its own dictionary, while the Actor and Advantage Critic share one. State attributions, computed from the Value Critic via RKHS-SHAP (kernel mean embedding for on-manifold and conditional mean embedding for off-manifold expectations), are converted into Mahalanobis-gated weights that modulate Actor gradients and Advantage Critic targets. We derive a global, non-asymptotic convergence bound under state perturbations, showing stability through the perturbation-error term and efficiency through the convergence-error term. Empirical results on three continuous-control environments show that RSA2C achieves efficiency, stability, and interpretability. Our code is available at https://github.com/Na-Li66/RSA2C.


2512.09084 2026-06-08 cs.LG version update

GS-KAN: Parameter-Efficient Kolmogorov-Arnold Networks via Sprecher-Type Shared Basis Functions

GS-KAN: 通过Sprecher型共享基函数的参数高效Kolmogorov-Arnold网络

Oscar Eliasson

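Affiliations: Chalmers University of Technology

AI summary: Proposes GS-KAN, which constructs edge functions by applying linear transformations to a single shared parent function per layer; while remaining parameter-efficient, it outperforms or matches existing KANs and MLPs on function approximation, tabular regression, and image classification tasks.

Comments 6 pages, 2 figures

AI Chinese abstract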

Kolmogorov-Arnold表示定理通过在边上而非节点上放置可学习单变量函数，为多层感知器（MLP）提供了理论替代方案。尽管最近的实现如Kolmogorov-Arnold网络（KAN）展示了高逼近能力，但由于需要为每个网络边维护唯一参数化，它们存在显著的参数低效问题。在这项工作中，我们提出GS-KAN（广义Sprecher-KAN），一种受David Sprecher对叠加定理的改进启发的轻量级架构。GS-KAN通过对每层单个可学习的共享父函数应用可学习线性变换来构造唯一的边函数。我们在合成函数逼近、表格数据回归和图像分类任务上评估了GS-KAN与现有KAN架构和MLP的性能。结果表明，GS-KAN在连续函数逼近任务上优于MLP和标准KAN基线，同时保持优越的参数效率。此外，GS-KAN在表格回归上与现有KAN架构性能相当，在高维分类任务上优于MLP。关键的是，所提出的架构使得在严格参数约束下的高维场景中部署基于KAN的架构成为可能，而标准实现由于参数爆炸通常不可行。源代码可在https://github.com/rambamn48/gs-impl获取。

English abstract

The Kolmogorov-Arnold representation theorem offers a theoretical alternative to Multi-Layer Perceptrons (MLPs) by placing learnable univariate functions on edges rather than nodes. While recent implementations such as Kolmogorov-Arnold Networks (KANs) demonstrate high approximation capabilities, they suffer from significant parameter inefficiency due to the requirement of maintaining unique parameterizations for every network edge. In this work, we propose GS-KAN (Generalized Sprecher-KAN), a lightweight architecture inspired by David Sprecher's refinement of the superposition theorem. GS-KAN constructs unique edge functions by applying learnable linear transformations to a single learnable, shared parent function per layer. We evaluate GS-KAN against existing KAN architectures and MLPs across synthetic function approximation, tabular data regression and image classification tasks. Our results demonstrate that GS-KAN outperforms both MLPs and standard KAN baselines on continuous function approximation tasks while maintaining superior parameter efficiency. Additionally, GS-KAN achieves competitive performance with existing KAN architectures on tabular regression and outperforms MLPs on high-dimensional classification tasks. Crucially, the proposed architecture enables the deployment of KAN-based architectures in high-dimensional regimes under strict parameter constraints, a setting where standard implementations are typically infeasible due to parameter explosion. The source code is available at https://github.com/rambamn48/gs-impl.


2512.14391 2026-06-08 cs.LG cs.AI cs.CL version update

RePo: Language Models with Context Re-Positioning

RePo：具有上下文重定位的语言模型

Huayang Li, Tianyu Zhao, Deng Cai, Richard Sproat

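Affiliations: University of Maryland

AI summary: Proposes RePo, a mechanism that reassigns token positions through a differentiable module to reduce the burden on attention layers, consistently improving performance on tasks with noisy contexts, structured data, and long contexts.

Comments Accepted to ICML 2026

AI Chinese abstract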

上下文学习是现代大型语言模型（LLM）的基础；然而，主流架构通过分配线性或常数的位置索引来施加刚性且固定的上下文结构。刚性的位置信息将组织输入结构的全部负担强加给注意力层，从而减少了可用于更关键信息的注意力量。为了解决这个问题，我们提出了RePo，一种通过上下文重定位来减轻注意力层负担的新机制。与传统方法不同，RePo利用可微分模块$f_ϕ$来分配捕获上下文依赖关系的token位置，而不是依赖预定义的顺序。通过在OLMo-2 1B和7B模型上持续预训练，我们证明RePo在涉及噪声上下文、结构化数据和更长上下文长度的任务上持续提升性能，同时在一般短上下文任务上保持有竞争力的性能。分析表明，RePo成功地将更多注意力分配给遥远但相关的信息，在密集且非线性的空间中分配位置，并捕获输入上下文的内在结构。我们的代码位于https://github.com/SakanaAI/repo。

English abstract

In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid and fixed contextual structure by assigning linear or constant positional indices. The rigid position information places the full burden of organizing the input structure on attention layers, thus reducing the amount of attention that could be allocated to more critical information. To address this, we propose RePo, a novel mechanism that alleviates the burden on attention layers via context re-positioning. Unlike conventional approaches, RePo utilizes a differentiable module, $f_\phi$, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined order. By continually pre-training on the OLMo-2 1B & 7B models, we demonstrate that RePo consistently enhances performance on tasks involving noisy contexts, structured data, and longer context length, while maintaining competitive performance on general short-context tasks. Analysis reveals that RePo successfully allocates more attention mass to distant but relevant information, assigns positions in a dense and non-linear space, and captures the intrinsic structure of the input context. Our code is at https://github.com/SakanaAI/repo.


2512.17058 2026-06-08 cs.LG version update

Universal consistency of the $k$-NN rule in metric spaces and Nagata dimension. III

度量空间和Nagata维数中$k$-NN规则的普适一致性. III

Vladimir G. Pestov

Affiliations: Department of Mathematics and Statistics, University of Ottawa; Departamento de Matemática, Universidade Federal de Santa Catarina

AI summary: Proves that in a complete separable metric space, the $k$-nearest neighbour classifier is universally consistent if and only if the space has the strong Lebesgue--Besicovitch differentiation property, or equivalently is $\sigma$-finite dimensional in the sense of Nagata, supplying the last missing link.

Comments 22 pages, latex with ESAIM P&S macros, a second revision requested by the referee, with more accurate and detailed proofs, in particular, the referee pointed out the correct value of the Nagata dimension of R^2 which is 4

AI Chinese abstract

我们建立了最后缺失的环节，使得能够用维数理论的组合术语和实分析的基本性质来描述那些完备可分度量空间$X$，其中$k$最近邻分类器是普适一致的。以下条件等价：(1) $k$-最近邻分类器在$X$中普适一致，(2) 强Lebesgue--Besicovitch微分性质在$X$中对每个局部有限Borel测度成立，(3) $X$在Jun-Iti Nagata意义下是$\sigma$-有限维的。等价关系(2)$\iff$(3)由Preiss (1983)宣布，而(3)$\Rightarrow$(2)的详细证明仅出现在Assouad和Quentin de Gromard (2006)中。(2)$\Rightarrow$(1)由Cérou和Guyader (2006)建立。我们证明了(1)$\Rightarrow$(3)。我们进一步表明，弱（而非强）Lebesgue--Besicovitch性质对于$k$-NN规则的一致性是不充分的，例如Heisenberg群就是一个反例（这里我们纠正了之前文章(Kumari and Pestov 2024)中的一个错误说法）。有点反直觉的是，存在一个与通常距离一致等价的实数直线上的度量，在该度量下$k$-NN分类器失效。最后，另一个可以添加到上述条件的等价条件是Cover--Hart性质：(4) $1$-最近邻分类器的误差渐近地至多是Bayes误差的两倍。

English abstract

We establish the last missing link allowing one to describe those complete separable metric spaces $X$ in which the $k$ nearest neighbour classifier is universally consistent, both in combinatorial terms of dimension theory and via a fundamental property of real analysis. The following are equivalent: (1) The $k$-nearest neighbour classifier is universally consistent in $X$, (2) The strong Lebesgue--Besicovitch differentiation property holds in $X$ for every locally finite Borel measure, (3) $X$ is $\sigma$-finite dimensional in the sense of Jun-Iti Nagata. The equivalence (2)$\iff$(3) was announced by Preiss (1983), while a detailed proof of the implication (3)$\Rightarrow$(2) has only appeared in Assouad and Quentin de Gromard (2006). The implication (2)$\Rightarrow$(1) was established by Cérou and Guyader (2006). We prove the implication (1)$\Rightarrow$(3). We further show that the weak (instead of strong) Lebesgue--Besicovitch property is insufficient for the consistency of the $k$-NN rule, as witnessed, for example, by the Heisenberg group (here we correct a wrong claim made in the previous article (Kumari and Pestov 2024)). A bit counter-intuitively, there is a metric on the real line uniformly equivalent to the usual distance but under which the $k$-NN classifier fails. Finally, another equivalent condition that can be added to the above is the Cover--Hart property: (4) the error of the $1$-nearest neighbour classifier is asymptotically at most twice as bad as the Bayes error.
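
Condition (4) can be written as an inequality. The following is a hedged sketch only: the symbols $L^{*}$ (Bayes error) and $L_n^{\mathrm{1NN}}$ (error of the $1$-NN rule trained on $n$ samples) are notation assumed here for illustration and do not come from the abstract, and the exact mode of convergence and the class of distributions quantified over are whatever the paper specifies.

```latex
\limsup_{n\to\infty} \mathbb{E}\, L_n^{\mathrm{1NN}} \;\le\; 2\, L^{*}
```

For comparison, the classical Cover and Hart result for two-class problems in Euclidean space gives the sharper asymptotic bound $L^{*} \le \lim_{n\to\infty} \mathbb{E}\, L_n^{\mathrm{1NN}} \le 2L^{*}(1-L^{*})$.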


2512.20963 2026-06-08 cs.LG cs.CV version update

Generalization of Diffusion Models Arises with a Balanced Representation Space

扩散模型的泛化源于平衡表示空间

Zekai Zhang, Xiao Li, Xiang Li, Lianghe Shi, Meng Wu, Molei Tao, Qing Qu

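Affiliations: University of Science and Technology of China

AI summary: By analyzing a two-layer ReLU denoising autoencoder, proves that memorization yields localized spiky representations whereas generalization yields balanced representations, validates this on real-world diffusion models, and proposes representation-based detection and editing methods.

Comments Accepted at ICLR 2026. 40 pages, 19 figures. The first two authors contributed equally

AI Chinese abstract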

扩散模型擅长生成高质量、多样化的样本，但当过度拟合训练目标时，它们有记忆训练数据的风险。我们通过表示学习的视角分析了扩散模型中记忆化和泛化之间的区别。通过研究两层ReLU去噪自编码器（DAE），我们证明了（i）记忆化对应于模型在学习的权重中存储原始训练样本以进行编码和解码，产生局部尖峰表示，而（ii）泛化发生在模型捕获局部数据统计时，产生平衡表示。此外，我们在真实的无条件和文本到图像扩散模型上验证了这些理论发现，表明相同的表示结构出现在深度生成模型中，并具有重要的实际意义。基于这些见解，我们提出了一种基于表示的检测记忆化的方法，以及一种无需训练的编辑技术，通过表示引导实现精确控制。总之，我们的结果强调了学习好的表示对于新颖且有意义的生成建模至关重要。

English abstract

Diffusion models excel at generating high-quality, diverse samples, yet they risk memorizing training data when overfit to the training objective. We analyze the distinctions between memorization and generalization in diffusion models through the lens of representation learning. By investigating a two-layer ReLU denoising autoencoder (DAE), we prove that (i) memorization corresponds to the model storing raw training samples in the learned weights for encoding and decoding, yielding localized spiky representations, whereas (ii) generalization arises when the model captures local data statistics, producing balanced representations. Furthermore, we validate these theoretical findings on real-world unconditional and text-to-image diffusion models, demonstrating that the same representation structures emerge in deep generative models with significant practical implications. Building on these insights, we propose a representation-based method for detecting memorization and a training-free editing technique that allows precise control via representation steering. Together, our results highlight that learning good representations is central to novel and meaningful generative modeling.


2601.16622 2026-06-08 cs.LG cs.AI version update

E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory

E2Former-V2：具有线性激活内存的即时等变注意力

Lin Huang, Chengxiang Huang, Ziang Wang, Yiyue Du, Chu Wang, Haocheng Lu, Yunyang Li, Xiaoli Liu, Arthur Jiang, Jia Zhang

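Affiliations: University of Science and Technology of China

AI summary: Proposes the E2Former-V2 architecture, which combines Equivariant Axis-Aligned Sparsification (EAAS) with on-the-fly equivariant attention, using an SO(3)-to-SO(2) change of basis and a custom Triton kernel to achieve linear activation memory and a 20x TFLOPS improvement, accelerating inference on the SPICE and OMol25 datasets while maintaining predictive performance.

AI Chinese abstract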

等变图神经网络（EGNN）已成为建模3D原子系统的广泛使用的方法。然而，主流架构由于在每条边上显式构造几何特征或密集张量积而面临关键的可扩展性瓶颈。为克服这一问题，我们引入了**E2Former-V2**，一种将代数稀疏性与硬件感知执行相结合的可扩展架构。我们首先提出**等变轴对齐稀疏化（EAAS）**。EAAS基于Wigner-$6j$卷积，利用$\mathrm{SO}(3) \rightarrow \mathrm{SO}(2)$基变换，将计算昂贵的密集张量收缩转化为高效的稀疏奇偶重索引操作。基于这种表示，我们引入了**即时等变注意力**，一种通过自定义融合Triton内核实现的完全节点中心机制。通过消除物化的边张量并最大化SRAM利用率，我们的内核相比标准实现实现了**20倍的TFLOPS提升**。在SPICE和OMol25数据集上的大量实验表明，E2Former-V2在保持相当预测性能的同时显著加速推理。这项工作表明，大型等变Transformer可以使用广泛可用的GPU平台高效训练。代码可在https://github.com/IQuestLab/UBio-MolFM/tree/e2formerv2获取。

English abstract

Equivariant Graph Neural Networks (EGNNs) have become a widely used approach for modeling 3D atomistic systems. However, mainstream architectures face critical scalability bottlenecks due to the explicit construction of geometric features or dense tensor products on every edge. To overcome this, we introduce E2Former-V2, a scalable architecture that integrates algebraic sparsity with hardware-aware execution. We first propose Equivariant Axis-Aligned Sparsification (EAAS). EAAS builds on Wigner-$6j$ convolution by exploiting an $\mathrm{SO}(3) \rightarrow \mathrm{SO}(2)$ change of basis to transform computationally expensive dense tensor contractions into efficient, sparse parity re-indexing operations. Building on this representation, we introduce On-the-Fly Equivariant Attention, a fully node-centric mechanism implemented via a custom fused Triton kernel. By eliminating materialized edge tensors and maximizing SRAM utilization, our kernel achieves a 20$\times$ improvement in TFLOPS compared to standard implementations. Extensive experiments on the SPICE and OMol25 datasets demonstrate that E2Former-V2 maintains comparable predictive performance while notably accelerating inference. This work demonstrates that large equivariant transformers can be trained efficiently using widely accessible GPU platforms. The code is available at https://github.com/IQuestLab/UBio-MolFM/tree/e2formerv2.


2601.23207 2026-06-08 cs.LG cs.AI version update

Learning to Execute Graph Algorithms Exactly with Graph Neural Networks

学习用图神经网络精确执行图算法

Muhammad Fetrat Qharabagh, Artur Back de Luca, George Giapitzakis, Kimon Fountoulakis

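Affiliations: University of Waterloo

AI summary: Proves that, under bounded-degree and finite-precision constraints, an ensemble of multi-layer perceptrons trained to execute local instructions can serve as the update function of a graph neural network so that complete graph algorithms run without error at inference, and shows learnability for the LOCAL model of distributed computation and several classical algorithms.

AI Chinese abstract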

理解图神经网络能学习什么，特别是它们学习执行算法的能力，仍然是一个核心的理论挑战。在这项工作中，我们证明了在有限度和有限精度约束下图算法的精确可学习性结果。我们的方法遵循两步过程。首先，我们训练一个多层感知机（MLP）集成来执行单个节点的局部指令。其次，在推理过程中，我们使用训练好的MLP集成作为图神经网络（GNN）中的更新函数。利用神经正切核（NTK）理论，我们表明局部指令可以从一个小训练集中学习，从而使得完整的图算法在推理过程中能够以高概率无误差地执行。为了说明我们设置的学习能力，我们为分布式计算的LOCAL模型建立了一个严格的可学习性结果。我们进一步展示了广泛研究的算法（如消息洪泛、广度优先搜索、深度优先搜索和贝尔曼-福特算法）的积极可学习性结果。

English abstract

Understanding what graph neural networks can learn, especially their ability to learn to execute algorithms, remains a central theoretical challenge. In this work, we prove exact learnability results for graph algorithms under bounded-degree and finite-precision constraints. Our approach follows a two-step process. First, we train an ensemble of multi-layer perceptrons (MLPs) to execute the local instructions of a single node. Second, during inference, we use the trained MLP ensemble as the update function within a graph neural network (GNN). Leveraging Neural Tangent Kernel (NTK) theory, we show that local instructions can be learned from a small training set, enabling the complete graph algorithm to be executed during inference without error and with high probability. To illustrate the learning power of our setting, we establish a rigorous learnability result for the LOCAL model of distributed computation. We further demonstrate positive learnability results for widely studied algorithms such as message flooding, breadth-first and depth-first search, and Bellman-Ford.


2602.00541 2026-06-08 cs.LG version update

One Loss to Rule Them All: Marked Time-to-Event for Structured EHR Foundation Models

一个损失统治一切：结构化EHR基础模型的标记时间到事件

Zilin Jing, Vincent Jeanselme, Yuta Kobayashi, Simon A. Lee, Chao Pang, Aparajita Kashyap, Yanwei Li, Xinzhuo Jiang, Shalmali Joshi

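Affiliations: Department of Computer Science, Columbia University; Department of Biomedical Informatics, Columbia University; Department of Computational Medicine, UCLA; Formation Bio

AI summary: Proposes ORA, a pretraining objective that jointly models event timing and associated measurements; compared with next-token prediction and losses that ignore continuous measurements, it yields more generalizable representations across multiple datasets and downstream tasks and improves regression and time-to-event prediction.

AI Chinese abstract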

电子健康记录（EHR）中捕获的临床事件是不规则采样的，可能由离散事件和数值测量（如实验室值或治疗剂量）混合组成。EHR的序列性质类似于自然语言，这促使使用下一词预测来训练事件上的EHR基础模型（FM）。然而，这种训练未能捕获EHR的完整结构。必须捕获给定事件发生的时间，但事件值（异常实验室）也会调节其他临床事件的可能性。大多数现有的EHR FM不联合建模这种可能性，无法捕获完整的观察过程，影响下游能力。我们提出ORA，一种标记时间到事件预训练目标，联合建模事件时间和相关测量。在多个数据集、下游任务和模型骨干上，该目标始终比下一词预测和忽略连续测量的预训练损失产生更可泛化的表示。重要的是，所提出的目标在传统分类评估之外带来改进，包括更好的回归和时间到事件预测。除了引入新的FM家族，我们的消融研究提出了更广泛的结论：考虑EHR结构的预训练目标对于扩展下游能力和泛化性至关重要。

English abstract

Clinical events captured in Electronic Health Records (EHR) are irregularly sampled and may consist of a mixture of discrete events and numerical measurements, such as laboratory values or treatment dosages. The sequential nature of EHR, analogous to natural language, has motivated the use of next-token prediction to train prior EHR Foundation Models (FMs) over events. However, this training fails to capture the full structure of EHR. When a given event occurs must be captured, but the event value (abnormal lab) also modulates the likelihood of other clinical events. Most existing EHR FMs do not jointly model this likelihood and are unable to capture the full observation process, impacting downstream capabilities. We propose ORA, a marked time-to-event pretraining objective that jointly models event timing and associated measurements. Across multiple datasets, downstream tasks, and model backbones, this objective consistently yields more generalizable representations than next-token prediction and pretraining losses that ignore continuous measurements. Importantly, the proposed objective yields improvements beyond traditional classification evaluation, including better regression and time-to-event prediction. Beyond introducing a new family of FMs, our ablations suggest a broader takeaway: pretraining objectives that account for EHR structure are critical for expanding downstream capabilities and generalizability.


2602.02600 2026-06-08 cs.LG cs.AI version update

Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models

自回归与扩散语言模型中的逐步拒绝动态

Eliron Rahimi, Elad Hirshel, Rom Himelstein, Amit LeVi, Avi Mendelson, Chaim Baskin

Affiliations: Department of Computer Science, Technion – Israel Institute of Technology; INSIGHT Lab, School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Israel; Computer Science Department, University of Haifa, Haifa, Israel

AI summary: Studies how diffusion language models (DLMs) and autoregressive (AR) models differ in refusing harmful generations, finds that diffusion remasking can promote recovery, proposes the Step-Wise Refusal Internal Dynamics (SRI) signal, and builds on it a jailbreak detector that does not modify inference.

Comments Preprint

AI Chinese abstract

扩散语言模型（DLM）最近已成为自回归（AR）模型的有竞争力的替代方案，提供并行解码、竞争性生成质量以及越狱鲁棒性改善的初步证据。尽管取得了这些进展，但采样机制在塑造拒绝行为中的作用仍知之甚少。为填补这一空白，我们提出了一项关于逐步拒绝动态的全面研究。我们表明，扩散重掩码可以促进从有害中间生成中恢复，提供证据表明这种行为与采样机制相关，并证明从AR采样切换到扩散采样可提高越狱鲁棒性，包括在固定模型权重下。为了捕捉在文本层面不可观察的生成动态，我们提出了逐步拒绝内部动态（SRI）信号。与我们的文本层面发现一致，SRI表明恢复主要在AR采样下失败，这些失败在SRI空间中通常相对于无害生成表现为异常。基于这一观察，我们表明SRI能够实现一个简单的越狱检测器，该检测器无需修改推理，并且仅通过在良性SRI信号上训练即可泛化到未见攻击。我们的评估表明，该检测器匹配或超越现有越狱检测基线，同时增加可忽略的开销。

English abstract

Diffusion language models (DLMs) have recently emerged as a competitive alternative to autoregressive (AR) models, offering parallel decoding, competitive generation quality, and initial evidence of improved jailbreak robustness. Despite this progress, the role of sampling mechanisms in shaping refusal behavior remains poorly understood. To address this gap, we present a comprehensive study of step-wise refusal dynamics. We show that diffusion remasking can promote recovery from harmful intermediate generations, provide evidence that this behavior is tied to the sampling mechanism, and demonstrate that switching from AR to diffusion sampling improves jailbreak robustness, including under fixed model weights. To capture generation dynamics not observable at the text level, we propose the Step-Wise Refusal Internal Dynamics (SRI) signal. Consistent with our text-level findings, SRI shows that recovery fails primarily under AR sampling, with these failures often appearing anomalous relative to harmless generations in the SRI space. Based on this observation, we show that SRI enables a simple jailbreak detector that does not modify inference and generalizes to unseen attacks by training only on benign SRI signals. Our evaluation shows that this detector matches or outperforms existing jailbreak detection baselines while adding negligible overhead.


2602.05833 2026-06-08 cs.LG version update

SecretFan: Synthesizing Realistic Data without Breaking Privacy

SecretFan: 在不破坏隐私的情况下合成真实数据

Laura Plein, Alexi Turcotte, Arina Hallemans, Andreas Zeller

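Affiliations: CISPA Helmholtz Center for Information Security; Saarland University

AI summary: Frames synthetic data generation as a guided test generation problem, combining a GAN-style discriminator with a fuzzer as the generator to produce high-utility synthetic data while preserving privacy.

AI Chinese abstract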

需要合成训练和测试数据集，这些数据集能够复制原始数据集的统计分布，同时不损害其机密性。已有大量研究利用生成对抗网络（GAN）进行合成数据生成，但生成的模型要么不够准确，要么由于原始数据在训练过程中被利用，仍然容易受到成员推断攻击（MIA）或数据集重建攻击。在本文中，我们将合成数据生成视为引导测试生成或基于搜索的测试问题，而不是纯粹的生成建模任务。我们提出了一种基于搜索的、充分性引导的输入生成技术，灵感来自GAN，包括生成步骤和判别步骤；与GAN一样，判别使用在数据上训练的判别器模型，但生成部分我们不使用模型，而是使用模糊测试器。这样，原始（私有）数据仅在生成过程中间接利用，通过演化样本并用判别器确定“好样本”，我们可以生成遵循与原始数据集相同统计分布的隐私保护数据，从而获得与原始数据相似的效用。我们在八个用于评估最先进技术的数据集上评估了我们的方法，发现我们的技术生成的合成数据平均具有良好效用，同时具有较高的相似性得分，突显了结合经典生成和模型驱动判别的混合方法在生成隐私保护且有用的合成数据集方面的潜力。

English abstract

There is a need for synthetic training and test datasets that replicate the statistical distributions of original datasets without compromising their confidentiality. A lot of research has been done on leveraging Generative Adversarial Networks (GANs) for synthetic data generation; however, the resulting models are either not accurate enough or are still vulnerable to membership inference attacks (MIA) or dataset reconstruction attacks, since the original data has been leveraged in the training process. In this paper, we frame synthetic data generation as a guided test generation, or search-based testing, problem rather than a purely generative modeling task. Ours is a search-based, adequacy-guided input generation technique inspired by GANs, with a generation step and a discrimination step; as in a GAN, discrimination uses a discriminator model trained on the data, but instead of using models also for generation, we use a fuzzer. This way, the original (private) data is only indirectly leveraged in the generation process, and by evolving samples and determining "good samples" with the discriminator, we can generate privacy-preserving data that follows the same statistical distributions as the original dataset, leading to a similar utility as the original data. We evaluated our approach on eight datasets that have been used to evaluate the state-of-the-art techniques, finding that synthetic data generated with our technique achieves good utility on average while also having good similarity scores, highlighting the potential of a mixed approach leveraging classical generation and model-driven discrimination for generating privacy-preserving, useful synthetic datasets.


2602.06941 2026-06-08 cs.LG cs.AI cs.CL version update

Endogenous Resistance to Activation Steering in Language Models

语言模型中激活引导的内生抵抗

Alex McKenzie, Keenan Pepper, Stijn Servaes, Martin Leitgab, Murat Cubuktepe, Mike Vaiana, Diogo de Lucena, Judd Rosenblatt, Michael S. A. Graziano

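Affiliations: University of Washington

AI summary: Finds that large language models can endogenously resist task-misaligned activation steering, recovering correct generation through explicit restarts, and identifies the associated sparse autoencoder latents; the resistance can be strengthened or weakened.

AI Chinese abstract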

大型语言模型可以在生成过程中从任务不匹配的激活引导中恢复，产生显式的语言重启（例如，“等等，那不对”），并在引导扰动仍然活跃的情况下继续讨论主题。我们将此称为内生引导抵抗（ESR）。使用稀疏自编码器（SAE）潜在变量来引导模型激活，我们发现Llama-3.3-70B在\llamaseventyEsrRate\\%的情况下表现出显式ESR，而来自Llama-3和Gemma-2系列的较小模型则较少出现显式形式。两个对照实验将ESR分解为检测事件和持续抵抗组件，后者不能仅由最近的on-topic token条件化来完全解释。我们通过对比on-topic/off-topic搜索识别出\numOtdLatents{}个SAE潜在变量；将其零消融使多次尝试率降低\multiAttemptReductionPct\\%，随机潜在变量和保留提示对照支持特异性。ESR还可以通过元提示和基于合成自我纠正示例的微调来有意增强。ESR对安全性具有双重影响：它可能使模型对对抗性激活空间操纵更具抵抗力，但同样可能干扰有益的基于引导的干预，因为模型无法区分两者。代码可在\href{https://github.com/agencyenterprise/endogenous-steering-resistance}{github.com/agencyenterprise/endogenous-steering-resistance}获取。

English abstract

Large language models can recover mid-generation from task-misaligned activation steering, producing explicit verbal restarts (e.g., "wait, that's not right") and continuing on-topic even while the steering perturbation remains active. We term this Endogenous Steering Resistance (ESR). Using sparse autoencoder (SAE) latents to steer model activations, we find that Llama-3.3-70B exhibits explicit ESR at \llamaseventyEsrRate\%, with smaller models from the Llama-3 and Gemma-2 families showing the explicit form less frequently. Two controls dissociate ESR into a detection event and a sustained-resistance component that conditioning on recent on-topic tokens does not fully explain. We identify \numOtdLatents{} SAE latents through contrastive on-topic/off-topic search; zero-ablating them reduces the multi-attempt rate by \multiAttemptReductionPct\%, with random-latent and held-out-prompt controls supporting specificity. ESR can also be deliberately enhanced through both meta-prompting and fine-tuning on synthetic self-correction examples. ESR has dual implications for safety: it could harden models against adversarial activation-space manipulation, but may equally interfere with beneficial steering-based interventions, since the model has no way to distinguish the two. Code is available at https://github.com/agencyenterprise/endogenous-steering-resistance.


2602.08857 2026-06-08 cs.LG cs.AI cs.CL version update

Discovering Interpretable Algorithms by Decompiling Transformers to RASP

通过将Transformer反编译为RASP发现可解释算法

Xinting Huang, Aleksandra Bakalova, Satwik Bhattamishra, William Merrill, Michael Hahn

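Affiliations: Saarland Informatics Campus, Saarland University; University of Oxford; Allen Institute for AI

AI summary: Proposes a method that faithfully re-parameterizes a trained Transformer as a RASP program and applies causal interventions to discover a small sufficient sub-program; experiments show that length-generalizing Transformers internally implement simple, interpretable RASP programs.

Comments 104 pages, 92 figures. Accepted for publication at ICML 2026

AI Chinese abstract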

近期研究表明，Transformer的计算可以在RASP编程语言家族中模拟。这些发现增进了对Transformer表达能力和泛化能力的理解。特别是，Transformer被建议在具有简单RASP程序的问题上精确实现长度泛化。然而，训练模型是否实际实现了简单的可解释程序仍是一个开放问题。在本文中，我们提出了一种从训练好的Transformer中提取此类程序的通用方法。其思想是将Transformer忠实地重参数化为RASP程序，然后应用因果干预来发现一个小的充分子程序。在算法和形式语言任务上训练的小型Transformer实验中，我们表明我们的方法通常能从长度泛化的Transformer中恢复简单且可解释的RASP程序。我们的结果提供了迄今为止最直接的证据，证明Transformer内部实现了简单的RASP程序。

English abstract

Recent work has shown that the computations of Transformers can be simulated in the RASP family of programming languages. These findings have enabled improved understanding of the expressive capacity and generalization abilities of Transformers. In particular, Transformers have been suggested to length-generalize exactly on problems that have simple RASP programs. However, it remains open whether trained models actually implement simple interpretable programs. In this paper, we present a general method to extract such programs from trained Transformers. The idea is to faithfully re-parameterize a Transformer as a RASP program and then apply causal interventions to discover a small sufficient sub-program. In experiments on small Transformers trained on algorithmic and formal language tasks, we show that our method often recovers simple and interpretable RASP programs from length-generalizing transformers. Our results provide the most direct evidence so far that Transformers internally implement simple RASP programs.


2602.14209 2026-06-08 cs.LG cs.CL version update

MAGE: All-[MASK] Block Already Knows Where to Look in Block Diffusion LLM

MAGE：在块扩散LLM中，全[MASK]块已经知道在哪里看

Omin Kwon, Yeonjae Kim, Doyeon Kim, Minseo Kim, Yeonhong Park, Jae W. Lee

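Affiliations: Seoul National University; Meta

AI summary: To address the memory bottleneck caused by the KV cache in long-context inference with block diffusion LLMs, proposes MAGE, a training-free method that exploits an alignment property of the block-diffusion training objective to determine the KV subset for the entire trajectory at the first step, achieving near-lossless accuracy and substantial speedups.

AI Chinese abstract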

块扩散LLM是一种并行语言生成的新兴范式，但其KV缓存使得内存访问成为长上下文推理中的主要瓶颈。稀疏注意力（每个查询仅关注少量KV子集）可以在最小化精度损失的情况下减少延迟。然而，在块扩散中，每个块的B个token必须共享一个KV子集，我们证明这种每块约束会使现有稀疏KV估计器的召回率下降高达25%。为了解决这一挑战，我们利用了块扩散训练目标中出现的一个特性：它将去噪步骤中的块平均查询对齐，因此第一步的全[MASK]块已经揭示了整个轨迹中每块的KV子集。我们在MAGE（[MASK]引导的稀疏注意力）中利用了这一特性，这是一种无训练方法，在第一步执行一次精确注意力，并在块内的所有剩余步骤中重用其top-k索引集。在LongBench上的三个块扩散家族中，MAGE在k=512时匹配精确注意力，精度几乎无损，在128K上下文中实现高达6.82倍的端到端加速，并且比分别为自回归LLM和全双向扩散LLM设计的Quest和SparseD快3.35倍和2.28倍。

English abstract

Block diffusion LLMs are an emerging paradigm for parallel language generation, but their KV caching makes memory access the dominant bottleneck in long-context inference. Sparse attention, which attends only to a small KV subset per query, can reduce this latency with minimal accuracy loss. In block diffusion, however, the B tokens of each block must share a single KV subset, and we show this per-block constraint degrades existing sparse KV estimators by up to 25% in recall. We address this challenge by exploiting a property that emerges from the block-diffusion training objective: it aligns the block-average query across denoising steps, so the All-[MASK] block at the first step already reveals the per-block KV subset for the entire trajectory. We exploit this in MAGE ([MASK]-Guided Sparse Attention), a training-free method that runs one exact attention pass at the first step and reuses its top-k index sets for all remaining steps within the block. Across three block-diffusion families on LongBench, MAGE matches Exact Attention at k=512 with near-lossless accuracy, achieves up to 6.82x end-to-end speedup at 128K context, and runs up to 3.35x and 2.28x faster than Quest and SparseD, designed for AR LLMs and fully bidirectional diffusion LLMs, respectively.
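
The select-once, reuse-later idea described in this abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: every function name below (`attention_scores`, `block_average`, `select_top_k`, `sparse_attention`) is hypothetical, the vectors are made up, and the first-step queries merely stand in for an all-[MASK] block. A real system would run per attention head on a GPU.

```python
import math

def attention_scores(query, keys):
    """Scaled dot-product scores of one query against every given key."""
    scale = 1.0 / math.sqrt(len(query))
    return [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

def block_average(queries):
    """Average the per-token queries of a block into one query vector."""
    n = len(queries)
    return [sum(column) / n for column in zip(*queries)]

def select_top_k(queries, keys, k):
    """Indices of the k keys scoring highest for the block-average query."""
    scores = attention_scores(block_average(queries), keys)
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def sparse_attention(query, keys, values, kept):
    """Softmax attention restricted to the kept key/value indices."""
    scores = attention_scores(query, [keys[i] for i in kept])
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, kept)) / total
            for d in range(dim)]

# Toy KV cache (assumed data, for illustration only).
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]

# Step 1: score all keys once with the block's first-step queries.
first_step_queries = [[1.0, 0.0], [0.8, 0.2]]
kept = select_top_k(first_step_queries, keys, k=2)

# Later denoising steps: reuse the same index set instead of re-scoring all keys.
later_query = [0.7, 0.3]
out = sparse_attention(later_query, keys, values, kept)
print(kept, out)
```

The design point the sketch tries to show is that the full scan over the cache happens once per block; later steps touch only the `k` retained entries, which is where the memory-access savings described in the abstract would come from.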

2602.16864 2026-06-08 cs.LG cs.AI math.DS (updated version)

Position: A Dynamical Systems Perspective is Needed to Advance Time Series Modeling

立场：需要动力系统视角以推进时间序列建模

Daniel Durstewitz, Christoph Jürgen Hemmer, Florian Hess, Charlotte Ricarda Doll, Lukas Eisenmann

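Affiliations: University of Tübingen

AI summary: This position paper argues that time series modeling needs a dynamical systems perspective, in which reconstructing the underlying DS enables better forecasts, and it discusses the theoretical advantages and offers specific suggestions.

AI Chinese abstract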

时间序列（TS）建模从早期的统计方法（主要是线性方法）发展到当前TS基础模型的趋势，已经走过了很长的路。由于该领域存在大量炒作和工业需求，实际进展并不总是清晰。为了将TS预测和分析提升到新水平，本文主张该领域需要动力系统（DS）视角。来自自然或工程系统的观测TS几乎总是源于某个底层DS，并且可以说，访问其控制方程将产生理论上的最优预测。这是DS重构（DSR）的承诺，这是一类旨在从数据中推断底层DS替代模型的ML/AI方法。但基于DS原理的模型还提供了其他深刻优势：除了短期预测，它们还能预测观测系统的长期统计量，这在许多实际场景中可能是更相关的量。此外，DS理论提供了领域无关的理论洞见，理解TS生成的机制，从而告知我们例如任何TS模型性能的上限、向未见过场景（如临界点）的泛化，或潜在的控制策略。在回顾DS理论和DSR中的一些核心概念、方法、度量和模型后，我们将讨论该领域的洞见如何以关键方式推进TS建模，实现更好的预测，同时大幅降低计算和内存占用。最后，我们提出若干具体建议，将DSR的洞见转化为TS建模实践。

English abstract

Time series (TS) modeling has come a long way from early statistical, mainly linear, approaches to the current trend in TS foundation models. With a lot of hype and industrial demand in this field, it is not always clear how much progress there really is. To advance TS forecasting and analysis to the next level, here we argue that the field needs a dynamical systems (DS) perspective. TS of observations from natural or engineered systems almost always originate from some underlying DS, and arguably access to its governing equations would yield theoretically optimal forecasts. This is the promise of DS reconstruction (DSR), a class of ML/AI approaches that aim to infer surrogate models of the underlying DS from data. But models based on DS principles offer other profound advantages: Beyond short-term forecasts, they enable to predict the long-term statistics of an observed system, which in many practical scenarios may be the more relevant quantities. DS theory furthermore provides domain-independent theoretical insight into mechanisms underlying TS generation, and thereby will inform us, e.g., about upper bounds on performance of any TS model, generalization into unseen regimes as in tipping points, or potential control strategies. After reviewing some of the central concepts, methods, measures, and models in DS theory and DSR, we will discuss how insights from this field can advance TS modeling in crucial ways, enabling better forecasting with much lower computational and memory footprints. We conclude with a number of specific suggestions for translating insights from DSR into TS modeling.

2602.18905 2026-06-08 cs.LG cs.AI cs.CL (updated version)

TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning

TRUE：一种用于大语言模型推理的可信统一解释框架

Yujiao Yang

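Affiliations: Dalian University of Technology

AI summary: Proposes the TRUE framework, which combines executable reasoning verification, feasible-region DAG modeling, and causal failure mode analysis to give multi-level, verifiable explanations of LLM reasoning at the instance, local structural, and class levels.

AI Chinese abstract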

大型语言模型（LLM）在复杂推理任务中展现出强大能力，但其决策过程仍难以解释。现有解释方法通常缺乏可信的结构性洞察，且局限于单实例分析，无法揭示推理稳定性和系统性故障机制。为解决这些局限，我们提出可信统一解释框架（TRUE），该框架集成了可执行推理验证、可行域有向无环图（DAG）建模和因果故障模式分析。在实例层面，我们将推理轨迹重新定义为可执行过程规范，并引入盲执行验证来评估操作有效性。在局部结构层面，我们通过结构一致性扰动构建可行域DAG，从而显式刻画局部输入空间中推理稳定性和可执行区域。在类别层面，我们引入因果故障模式分析方法，识别重复出现的结构性故障模式，并使用Shapley值量化其因果影响。在多个推理基准上的广泛实验表明，所提框架提供了多层次、可验证的解释，包括单个实例的可执行推理结构、邻近输入的可行域表示以及类别层面具有量化重要性的可解释故障模式。这些结果建立了一个统一且原则性的范式，用于提高LLM推理系统的可解释性和可靠性。

English abstract

Large language models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, yet their decision-making processes remain difficult to interpret. Existing explanation methods often lack trustworthy structural insight and are limited to single-instance analysis, failing to reveal reasoning stability and systematic failure mechanisms. To address these limitations, we propose the Trustworthy Unified Explanation Framework (TRUE), which integrates executable reasoning verification, feasible-region directed acyclic graph (DAG) modeling, and causal failure mode analysis. At the instance level, we redefine reasoning traces as executable process specifications and introduce blind execution verification to assess operational validity. At the local structural level, we construct feasible-region DAGs via structure-consistent perturbations, enabling explicit characterization of reasoning stability and the executable region in the local input space. At the class level, we introduce a causal failure mode analysis method that identifies recurring structural failure patterns and quantifies their causal influence using Shapley values. Extensive experiments across multiple reasoning benchmarks demonstrate that the proposed framework provides multi-level, verifiable explanations, including executable reasoning structures for individual instances, feasible-region representations for neighboring inputs, and interpretable failure modes with quantified importance at the class level. These results establish a unified and principled paradigm for improving the interpretability and reliability of LLM reasoning systems.

2603.02220 2026-06-08 cs.LG cs.AI cs.CV (updated version)

概率高斯同伦：非凸优化的概率空间延拓框架

Eshed Gal, Samy Wu Fung, Eldad Haber

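Affiliations: University of British Columbia; Colorado School of Mines

AI summary: Proposes the Probabilistic Gaussian Homotopy (PGH) framework, which performs probability-space continuation for nonconvex optimization by deforming the Boltzmann distribution and aggregating gradients with Boltzmann weights, and derives PGHO, a practical algorithm based on Monte Carlo gradient estimation.

AI Chinese abstract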

我们提出了概率高斯同伦（PGH），一种用于非凸优化的概率空间延拓框架。与经典的高斯同伦（平滑目标函数并均匀平均梯度）不同，PGH 变形相关的玻尔兹曼分布，并诱导扰动梯度的玻尔兹曼加权聚合，从而将下降方向指数地偏向低能量区域。我们证明 PGH 对应于一种 log-sum-exp（软最小）同伦，它在尺度 $λ>0$ 下平滑非凸目标函数，并在 $λ\to 0$ 时恢复原始目标函数，从而得到 Moreau 包络的后验均值推广，并且我们推导了沿着退火同伦路径控制极小值演化的动力系统。这建立了高斯延拓、贝叶斯去噪和扩散式平滑之间的原理性联系。我们进一步提出了概率高斯同伦优化（PGHO），一种基于蒙特卡洛梯度估计的实用随机算法，并在高维非凸基准测试和稀疏恢复问题上展示了强大的性能，而经典梯度方法和目标空间平滑在这些问题上经常失败。

English abstract

We introduce Probabilistic Gaussian Homotopy (PGH), a probability-space continuation framework for nonconvex optimization. Unlike classical Gaussian homotopy, which smooths the objective and uniformly averages gradients, PGH deforms the associated Boltzmann distribution and induces Boltzmann-weighted aggregation of perturbed gradients, which exponentially biases descent directions toward low-energy regions. We show that PGH corresponds to a log-sum-exp (soft-min) homotopy that smooths a nonconvex objective at scale $λ>0$ and recovers the original objective as $λ\to 0$, yielding a posterior-mean generalization of the Moreau envelope, and we derive a dynamical system governing minimizer evolution along an annealed homotopy path. This establishes a principled connection between Gaussian continuation, Bayesian denoising, and diffusion-style smoothing. We further propose Probabilistic Gaussian Homotopy Optimization (PGHO), a practical stochastic algorithm based on Monte Carlo gradient estimation, and demonstrate strong performance on high-dimensional nonconvex benchmarks and sparse recovery problems where classical gradient methods and objective-space smoothing frequently fail.
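
To make the contrast with uniform gradient averaging concrete, here is a minimal Monte Carlo sketch in Python. It assumes a one-dimensional objective, Gaussian perturbations of scale `sigma`, and weights proportional to exp(-f/λ); the function name and these choices are illustrative assumptions, since the abstract does not give the exact estimator used in PGHO.

```python
import math
import random


def perturbed_gradients(f, grad_f, x, sigma, lam, n_samples, rng):
    """Return (Boltzmann-weighted, uniformly averaged) gradient estimates at x.

    Assumed setup: samples y_i = x + sigma * z_i with z_i ~ N(0, 1),
    and weights w_i proportional to exp(-f(y_i) / lam).
    """
    ys = [x + sigma * rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    energies = [f(y) for y in ys]
    e_min = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / lam) for e in energies]
    total = sum(weights)
    boltzmann = sum(w * grad_f(y) for w, y in zip(weights, ys)) / total
    uniform = sum(grad_f(y) for y in ys) / n_samples
    return boltzmann, uniform


# A hypothetical nonconvex test objective and its derivative.
def objective(y):
    return y * y + 2.0 * math.sin(5.0 * y)


def objective_grad(y):
    return 2.0 * y + 10.0 * math.cos(5.0 * y)


rng = random.Random(0)
g_boltzmann, g_uniform = perturbed_gradients(
    objective, objective_grad, x=2.0, sigma=0.5, lam=1.0, n_samples=256, rng=rng
)
```

Under these assumptions, samples with low objective value dominate the weighted average, which is the bias toward low-energy regions that the abstract describes. Letting `lam` grow large makes the weights nearly equal, so the estimate approaches the uniform average.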

2603.14014 2026-06-08 cs.LG cs.GT (updated version)

Aumann-SHAP: The Geometry of Counterfactual Interaction Explanations in Machine Learning

Aumann-SHAP: 机器学习中反事实交互解释的几何结构

Adam Belahcen, Stéphane Mussard

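AI summary: Proposes the Aumann-SHAP framework, which decomposes counterfactual transitions on a local hypercube grid and uses micro-game Shapley and LES values for geometry-aware attribution; it removes the bias of equal-split Shapley on synthetic data, and on real data it corrects a sign error and improves editing efficiency.

AI Chinese abstract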

我们引入Aumann-SHAP，一个交互感知框架，通过将模型限制在连接基线和反事实特征的局部超立方体来分解反事实转移。每个超立方体被离散化为一个网格，以构建一个诱导的微玩家合作博弈，其中基本网格步移动成为玩家。该TU-微博弈上的Shapley和LES值产生几何感知的域内归因，在网格细化下收敛到对角Aumann-Shapley / Integrated Gradients极限，并将等分Shapley恢复为退化的$m=1$特例。精确的网格状态闭式公式为固定交互阶数提供了多项式时间计算。在具有已知真实值的合成基准上，等分Shapley带有不可约偏差，而Aumann-SHAP收敛到正确分解。在German Credit上，交互几何在$12.3\%$的实例中改变了特征优先级排序。在UCI Heart Disease上，等分错误地将胆固醇抑制因子归因为正贡献者，这是Aumann-SHAP纠正的符号错误。在MNIST上，博弈论归因达到目标置信度所需的编辑次数比基于幅度的排序少$3.5\times$，其中微博弈Shapley在所有预算下实现了最佳效率。

English abstract

We introduce Aumann-SHAP, an interaction-aware framework that decomposes counterfactual transitions by restricting the model to a local hypercube connecting baseline and counterfactual features. Each hypercube is discretized into a grid to construct an induced micro-player cooperative game in which elementary grid-step moves become players. Shapley and LES values on this TU-micro-game yield geometry-aware within-pot attributions that converge to the diagonal Aumann-Shapley / Integrated Gradients limit under grid refinement, and recover equal-split Shapley as the degenerate $m=1$ special case. An exact grid-state closed form gives polynomial-time computation for fixed interaction order. On a synthetic benchmark with known ground truth, equal-split Shapley carries an irreducible bias while Aumann-SHAP converges to the correct decomposition. On German Credit, interaction geometry changes feature priority rankings in $12.3\%$ of instances. On UCI Heart Disease, equal-split misattributes a cholesterol suppressor as a positive contributor, which is a sign error Aumann-SHAP corrects. On MNIST, game-theoretic attribution reaches target confidence with $3.5\times$ fewer edits than magnitude-based ordering, with micro-game Shapley achieving the best efficiency across all budgets.
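
The gap between the two endpoints named in this abstract (the $m=1$ Shapley case and the diagonal Integrated Gradients limit) can be seen on a two-feature toy model. The sketch below is an illustration only: the model `x1 * x2**2`, the function names, and the finite-difference gradient are assumptions, and it does not implement the paper's micro-game construction.

```python
def model(x1, x2):
    # Hypothetical model with an asymmetric interaction between the two features.
    return x1 * x2 ** 2


def integrated_gradients(f, baseline, target, steps=200, h=1e-5):
    """Midpoint Riemann sum of the path integral along the straight baseline-to-target line."""
    n = len(baseline)
    attributions = [0.0] * n
    for s in range(steps):
        t = (s + 0.5) / steps
        point = [b + t * (x - b) for b, x in zip(baseline, target)]
        for i in range(n):
            up = list(point)
            down = list(point)
            up[i] += h
            down[i] -= h
            partial = (f(*up) - f(*down)) / (2.0 * h)  # central finite difference
            attributions[i] += partial * (target[i] - baseline[i]) / steps
    return attributions


def two_feature_shapley(f, baseline, target):
    """Exact Shapley values when each feature is either at baseline or at target."""
    b1, b2 = baseline
    x1, x2 = target
    v0, v1, v2, v12 = f(b1, b2), f(x1, b2), f(b1, x2), f(x1, x2)
    phi1 = 0.5 * (v1 - v0) + 0.5 * (v12 - v2)
    phi2 = 0.5 * (v2 - v0) + 0.5 * (v12 - v1)
    return [phi1, phi2]


baseline, target = [0.0, 0.0], [1.0, 2.0]
ig = integrated_gradients(model, baseline, target)    # close to [4/3, 8/3]
shap = two_feature_shapley(model, baseline, target)   # [2.0, 2.0]
```

Both attributions sum to `model(1, 2) - model(0, 0) = 4`, but the endpoint-only Shapley value splits the interaction evenly, while the path integral credits the squared feature with two thirds of it.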

2603.16689 2026-06-08 cs.LG (updated version)

Predictive Statistics Shape Emergent World Representations of Grid Walkers

预测统计塑造网格行走者的涌现世界表征

Sasha Brenner, Thomas R. Knösche, Nico Scherf

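Affiliations: Max Planck Institute for Human Cognitive and Brain Sciences; Leipzig University; ScaDS.AI

AI summary: Using constrained random walk experiments, the paper finds that the first attention block of a decoder-only Transformer extracts the sufficient statistic for prediction and later layers transform it into the predictive geometry, forming a readable world model, whereas recurrent networks do not separate out this stage.

Comments: 24 pages, 15 figures

AI Chinese abstract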

下一个词预测器通常似乎会发展出对潜在世界及其规则的内部表征。这些模型的概率性质表明世界结构与概率分布几何之间存在深层联系。为了更精确地理解这种联系，我们使用一个最小随机过程作为受控设置：在二维格点上的约束随机游走，必须在预定步数后到达固定终点。对该过程的最优预测仅取决于由游走者相对于目标的位置和剩余时间范围决定的充分向量；换句话说，概率分布由世界的网格几何参数化。我们在从这些游走的精确分布中采样的前缀上训练解码器仅Transformer和循环网络，并通过跨层测量对齐和线性可读性，将其隐藏激活与预测的充分统计量进行比较。我们发现Transformer的计算分为两个阶段：第一个注意力块从输入中提取充分统计量，后续层将其转化为下一步预测几何。在不同约束变体中，注意力后的表征是通用的：一个共享的晶格世界状态，可以直接作为世界模型读取，追溯到数据的预测几何。后续层随后将其专门化到每个变体的下一步分布。循环网络达到相同的贝叶斯最优损失，但未将这个世界状态隔离为一个单独阶段，表明世界模型几何也依赖于架构。尽管在玩具系统中演示，结果表明预测分布的几何是理解神经网络如何内化其数据结构的有用视角。

English abstract

Next-token predictors often appear to develop internal representations of the latent world and its rules. The probabilistic nature of these models suggests a deep connection between the structure of the world and the geometry of probability distributions. In order to understand this link more precisely, we use a minimal stochastic process as a controlled setting: constrained random walks on a two-dimensional lattice that must reach a fixed endpoint after a predetermined number of steps. Optimal prediction of this process solely depends on a sufficient vector determined by the walker's position relative to the target and the remaining time horizon; in other words, the probability distributions are parametrized by the world's grid geometry. We train decoder-only transformers and recurrent networks on prefixes sampled from the exact distribution of these walks and compare their hidden activations to sufficient statistics of prediction, by measuring alignment and linear readability across layers. We find that the transformer's computation factors into two stages: the first attention block extracts the sufficient statistic from the input, and later layers transform it into the next-step predictive geometry. Across constraint variants the post-attention representation is universal: a shared world-state of the lattice that can be read directly as a world model, traced to the predictive geometry of the data. Later layers then specialize it to each variant's next-step distribution. Recurrent networks reach the same Bayes-optimal loss but do not isolate this world-state as a separate stage, showing that the world-model geometry also depends on architecture. Although demonstrated in a toy system, the results suggest that the geometry of the predictive distribution is a useful lens on how neural networks internalize the structure of their data.
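
The claim that optimal prediction depends only on the displacement to the target and the remaining horizon can be checked by direct counting. The Python sketch below assumes the walk is uniform over all four-neighbor lattice paths that hit the endpoint exactly on time; that uniformity, and the names `count_completions` and `next_step_distribution`, are assumptions for illustration rather than details taken from the paper.

```python
from functools import lru_cache

MOVES = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}


@lru_cache(maxsize=None)
def count_completions(dx, dy, steps):
    """Number of walks that cover displacement (dx, dy) in exactly `steps` unit moves."""
    if abs(dx) + abs(dy) > steps:
        return 0
    if steps == 0:
        return 1  # the check above guarantees dx == dy == 0 here
    return sum(count_completions(dx - mx, dy - my, steps - 1)
               for mx, my in MOVES.values())


def next_step_distribution(dx, dy, steps):
    """Bayes-optimal next-move probabilities given the sufficient statistic (dx, dy, steps).

    Assumes the state is reachable, i.e. count_completions(dx, dy, steps) > 0.
    """
    total = count_completions(dx, dy, steps)
    return {name: count_completions(dx - mx, dy - my, steps - 1) / total
            for name, (mx, my) in MOVES.items()}


# Target is one cell to the right, three steps remain.
dist = next_step_distribution(1, 0, 3)
```

Nothing about the walk's history enters `next_step_distribution`: two prefixes that end at the same relative position with the same number of steps left receive the same prediction, which is what makes the triple a sufficient statistic in this toy setting.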

2603.26846 2026-06-08 cs.LG cs.AI (updated version)

GraphWalker: 患者类比与信息增益结合用于大型语言模型的临床推理

Yue Fang, Weibin Liao, Yuxin Guo, Jiaran Gao, Hongxin Ding, Jinyang Zhang, Xinke Jiang, Zhibang Yang, Junfeng Zhao, Yasha Wang, Liantao Ma

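Affiliations: School of Computer Science, Peking University, Beijing, China; National Engineering Research Center for Software Engineering, Peking University, Beijing, China

AI summary: Proposes the GraphWalker framework, which combines data-driven and model-driven perspectives, discovers patient cohorts, and uses lazy greedy search to retrieve patient cases from electronic health records for analogical reasoning, improving clinical reasoning performance without parameter updates.

AI Chinese abstract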

在电子健康记录（EHR）上进行临床推理是现代医疗中一项基本但具有挑战性的任务。虽然大型语言模型（LLM）通过上下文演示提供了一种有前景的范式，无需特定任务的参数更新，但现有的基于患者类比推理的方法在EHR设置中存在三个核心局限性：（1）视角局限性，数据驱动的相似性与LLM推理需求不一致，而模型驱动的信号受限于有限的临床能力；（2）队列意识，演示独立选择，未建模群体级结构；（3）信息聚合，忽略演示之间的冗余和交互效应。我们提出GraphWalker，一个无需训练的框架，让冻结的LLM通过检索到的患者案例进行类比推理。GraphWalker（i）联合利用数据驱动和模型驱动视角，（ii）发现患者队列以将检索基于群体级结构，（iii）采用带前沿扩展的懒惰贪心搜索来组合具有高边际信息增益的演示。在多个真实EHR基准上的大量实验表明，GraphWalker始终优于最先进的演示选择基线，并且在跨数据集分布偏移下保持更强的鲁棒性，无需特定任务的参数更新。GraphWalker进一步泛化到黑盒LLM，并自然地与智能体推理框架组合，使其成为基于LLM的临床工作流中可插拔的患者类比技能。我们的代码可在https://github.com/PuppyKnightUniversity/GraphWalker获取。

English abstract

Clinical reasoning over electronic health records (EHRs) is a fundamental yet challenging task in modern healthcare. While large language models (LLMs) offer a promising paradigm via in-context demonstrations that requires no task-specific parameter updates, existing methods for reasoning by patient analogy in EHR settings suffer from three core limitations: (1) Perspective Limitation, where data-driven similarity misaligns with LLM reasoning needs while model-driven signals are constrained by limited clinical competence; (2) Cohort Awareness, as demonstrations are selected independently without modeling population-level structure; and (3) Information Aggregation, where redundancy and interaction effects among demonstrations are ignored. We propose GraphWalker, a training-free framework that lets frozen LLMs reason by analogy over retrieved patient cases. GraphWalker (i) jointly leverages data-driven and model-driven perspectives, (ii) discovers patient cohorts to ground retrieval in population-level structure, and (iii) employs a lazy greedy search with frontier expansion to compose demonstrations with high marginal information gain. Extensive experiments on multiple real-world EHR benchmarks show that GraphWalker consistently outperforms state-of-the-art demonstration selection baselines, and remains substantially more robust under cross-dataset distribution shift, without task-specific parameter updates. GraphWalker further generalizes to black-box LLMs and composes naturally with agentic reasoning frameworks, positioning it as a pluggable patient-analogy skill in LLM-based clinical workflows. Our code is available at https://github.com/PuppyKnightUniversity/GraphWalker.
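
Lazy greedy selection by marginal gain, the generic technique named in item (iii), can be sketched as follows. This is not GraphWalker's code: the coverage-style gain function, the toy patient cases, and the omission of frontier expansion are all simplifying assumptions, and the lazy shortcut is only exact when the gain is submodular.

```python
import heapq


def coverage_gain(case, selected):
    """Hypothetical marginal gain: findings in `case` not yet covered by selected cases."""
    covered = set()
    for _, findings in selected:
        covered |= findings
    return len(case[1] - covered)


def lazy_greedy(candidates, gain, budget):
    """Pick up to `budget` candidates, re-evaluating a gain only when it reaches the top."""
    selected = []
    # Heap entries: (-gain, candidate index, number of selections when gain was computed).
    heap = [(-gain(c, selected), i, 0) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        neg_gain, i, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is fresh and still the largest, so it is the greedy choice.
            if -neg_gain <= 0:
                break
            selected.append(candidates[i])
        else:
            # Stale upper bound: recompute against the current selection and reinsert.
            heapq.heappush(heap, (-gain(candidates[i], selected), i, len(selected)))
    return selected


cases = [
    ("A", {"fever", "cough", "hypoxia"}),
    ("B", {"hypoxia", "sepsis"}),
    ("C", {"fever", "cough"}),
    ("D", {"rash"}),
]
demos = lazy_greedy(cases, coverage_gain, budget=2)
names = [name for name, _ in demos]  # ["A", "B"]
```

Case C looks as useful as B before anything is selected, but after A is chosen its marginal gain drops to zero, so the redundant demonstration is skipped.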

2604.07472 2026-06-08 cs.LG cs.NI (updated version)

Scalable Joint Resource Allocation for SLO-Constrained LLM Inference in Heterogeneous GPU Clouds

异构GPU云中SLO约束的LLM推理的可扩展联合资源分配

Jiaming Cheng, Duong Tung Nguyen

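Affiliations: Ira A. Fulton Schools of Engineering, Arizona State University

AI summary: For SLO-constrained LLM inference in heterogeneous GPU clouds, proposes a scalable framework whose constraint-aware heuristics (GH and AGH) perform joint resource allocation, producing feasible, near-optimal solutions within seconds and substantially reducing cost and SLO violations.

AI Chinese abstract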

在云环境中服务大型语言模型（LLM）推理需要在延迟、准确性、内存和预算约束下联合优化模型选择、GPU配置、并行配置和工作负载路由。虽然混合整数线性规划（MILP）可以建模此问题，但其计算成本限制了在需求变化下的频繁重新优化。现有的启发式方法通常分别优化各个组件，并且在强制执行系统范围约束时可能变得不可行。本文提出了一个用于SLO约束的LLM推理的可扩展框架。我们将问题表述为一个MILP，并采用两阶段延迟模型，该模型在张量并行和流水线并行下捕获预填充和自回归解码。为了高效求解，我们开发了两种约束感知启发式算法：贪婪启发式（GH）和自适应贪婪启发式（AGH）。AGH通过多起点构建、局部搜索和GPU整合扩展了GH。两种方法通过并行感知过滤、基于成本的排序和自适应并行缩放来保持可行性。基于Azure LLM推理轨迹的实验表明，GH在一秒内生成可行解，而AGH在三秒内实现接近最优的性能，并可扩展到精确求解器无法收敛的大规模实例。在高达1.5倍延迟和准确性膨胀的样本外压力下，AGH通过预留的余量优雅地退化，产生的成本和SLO违规远低于成本最小的MILP解决方案。在合成和真实Azure工作负载中，AGH以显著低于精确MILP解决方案的成本维持SLO合规性。这些结果表明，高质量分配在实现快速适应工作负载变化的同时，对需求变化提供了显著的鲁棒性。

English abstract

Serving large language model (LLM) inference in cloud environments requires jointly optimizing model selection, GPU provisioning, parallelism configuration, and workload routing under latency, accuracy, memory, and budget constraints. While mixed-integer linear programming (MILP) can model this problem, its computational cost limits frequent re-optimization under demand variability. Existing heuristics often optimize individual components separately and may become infeasible when system-wide constraints are enforced. This paper presents a scalable framework for SLO-constrained LLM inference. We formulate the problem as an MILP with a two-phase delay model capturing both prefill and autoregressive decoding under tensor and pipeline parallelism. To solve it efficiently, we develop two constraint-aware heuristics: a Greedy Heuristic (GH) and an Adaptive Greedy Heuristic (AGH). AGH extends GH through multi-start construction, local search, and GPU consolidation. Both methods maintain feasibility through parallelism-aware filtering, cost-based ranking, and adaptive parallelism scaling. Experiments based on the Azure LLM Inference Trace show that GH generates feasible solutions within one second, while AGH achieves near-optimal performance within three seconds and scales to large instances where exact solvers fail to converge. Under out-of-sample stress with up to 1.5x delay and accuracy inflation, AGH degrades gracefully through provisioned headroom, yielding substantially lower cost and SLO violations than cost-minimal MILP solutions. Across synthetic and real Azure workloads, AGH maintains SLO compliance at significantly lower cost than exact MILP solutions. These results demonstrate that high-quality allocations provide substantial robustness to demand variability while enabling rapid adaptation to workload changes.

2604.10098 2026-06-08 cs.LG (updated version)

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Transformer中的注意力汇聚：利用、解释与缓解综述

Zunhai Su, Hengyuan Zhang, Wei Wu, Yifan Zhang, Yaxiu Liu, He Xiao, Qingyao Yang, Yuxuan Sun, Rui Yang, Chao Zhang, Jing Xiong, Hui Shen, Keyu Fan, Weihao Ye, Chaofan Tao, Taiqiang Wu, Zhongwei Wan, Tiantian Zhang, Bowen Yan, Zhen Li, Yiming Zhang, Congkai Xie, Yulei Qian, Yuchen Xie, Yik-Chung Wu, Hongxia Yang, Ngai Wong

Affiliations: Tsinghua University; Meituan LongCat Team; The University of Hong Kong; University of Michigan; Xiamen University; The Ohio State University; Columbia University; Shanghai Artificial Intelligence Laboratory; The Hong Kong Polytechnic University

AI summary: This paper presents the first systematic survey of the attention sink phenomenon in Transformers, organizing current research along three dimensions (fundamental utilization, mechanistic interpretation, and strategic mitigation) to guide future work.

AI Chinese abstract

作为现代机器学习的基础架构，Transformer已在多个AI领域取得了显著进展。尽管其具有变革性影响，但各种Transformer中普遍存在一个挑战——注意力汇聚（Attention Sink, AS），即大量注意力集中在少数特定但无信息量的标记上。AS使可解释性复杂化，显著影响训练和推理动态，并加剧幻觉等问题。近年来，大量研究致力于理解和利用AS。然而，缺乏系统整合AS相关研究并为未来进展提供指导的全面综述。为填补这一空白，我们提出了首个关于AS的综述，围绕定义当前研究格局的三个关键维度展开：基础利用、机制解释和策略缓解。我们的工作通过突出该领域的关键概念和主要趋势，引导研究人员了解AS相关研究的演变，做出了关键贡献。我们希望本综述能成为有价值的资源，使研究人员能够在当前Transformer范式下有效管理AS，同时为下一代Transformer的创新进展提供灵感。本文的论文列表可在https://github.com/ZunhaiSu/Awesome-Attention-Sink获取。

English abstract

As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across various Transformers is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affecting the training and inference dynamics, and exacerbates issues such as hallucinations. In recent years, substantial research has been dedicated to understanding and harnessing AS. However, a comprehensive survey that systematically consolidates AS-related research and offers guidance for future advancements remains lacking. To address this gap, we present the first survey on AS, structured around three key dimensions that define the current research landscape: Fundamental Utilization, Mechanistic Interpretation, and Strategic Mitigation. Our work makes a pivotal contribution by highlighting the key concepts and main trends in the field, guiding researchers through the evolution of AS-related studies. We envision this survey as a valuable resource, empowering researchers to effectively manage AS within the current Transformer paradigm, while simultaneously inspiring innovative advancements for the next generation of Transformers. The paper list of this work is available at https://github.com/ZunhaiSu/Awesome-Attention-Sink.

2604.21407 2026-06-08 cs.LG stat.CO stat.ML (updated version)

Even More Guarantees for Variational Inference in the Presence of Symmetries

变分推断在对称性存在下的更多保证

Lena Zellinger, Antonio Vergari

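Affiliations: School of Informatics, University of Edinburgh

AI summary: Extends the theory of robust variational inference under target symmetries, giving sufficient conditions under which the forward KL divergence and α-divergences exactly recover the target mean and correlation matrix even under model misspecification, and drops the log-concavity assumption so that the guarantees cover multi-modal distributions.

AI Chinese abstract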

当通过变分推断（VI）近似一个难以处理的密度时，变分族通常被选为一个简单的参数族，很可能不包含目标。这引发了一个问题：在模型误设的情况下，我们能在什么条件下恢复目标的特征？在这项工作中，我们在两个重要方面扩展了先前关于位置-尺度族在目标对称性下鲁棒VI的理论结果：（1）我们通过提供使用前向Kullback-Leibler散度和α-散度时精确恢复目标均值和相关矩阵的充分条件，将它们开放给更广泛的散度。（2）通过这样做，我们发现可以放弃先前工作中做出的对数凹目标的限制性假设，从而允许我们为更广泛的目标（包括多模态目标）提供保证。在我们的实验中，我们展示了我们的保证如何作为选择变分族和α值的指南，并通过一组多样化的例子说明了在缺乏我们的充分条件时优化如何以及为何会失败。

English abstract

When approximating an intractable density via variational inference (VI) the variational family is typically chosen as a simple parametric family that very likely does not contain the target. This raises the question: Under which conditions can we recover characteristics of the target despite misspecification? In this work, we extend previous theoretical results on robust VI with location-scale families under target symmetries in two substantial ways: (1) We open them up to a wider range of divergences by providing sufficient conditions for exact recovery of the target mean and correlation matrix when using the forward Kullback-Leibler divergence and $α$-divergences. (2) By doing so, we find that we can drop the restrictive assumption of a log-concave target made in previous work, allowing us to give guarantees for a wider range of targets, including multi-modal ones. In our experiments, we show how our guarantees can serve as guidelines for the choice of the variational family and $α$-value and we illustrate on a diverse set of examples how and why optimization can fail in the absence of our sufficient conditions.
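
A small numerical check illustrates the kind of recovery discussed here in the simplest case. Minimizing the forward KL divergence over a Gaussian family amounts to matching the first two moments, so a Gaussian fitted to a symmetric bimodal target still lands on the target mean. The sketch below is an assumption-laden toy (one dimension, sample-based moments, hypothetical function names), not the paper's experimental setup.

```python
import random
import statistics


def sample_symmetric_bimodal(center, offset, n, rng):
    """Draw from 0.5 * N(center - offset, 1) + 0.5 * N(center + offset, 1)."""
    return [center + rng.choice((-offset, offset)) + rng.gauss(0.0, 1.0)
            for _ in range(n)]


def fit_gaussian_forward_kl(samples):
    """Moment matching: the sample-based forward-KL fit of a Gaussian (maximum likelihood)."""
    mu = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu)
    return mu, var


rng = random.Random(0)
samples = sample_symmetric_bimodal(center=1.5, offset=3.0, n=20000, rng=rng)
mu_hat, var_hat = fit_gaussian_forward_kl(samples)
# The target mean is 1.5 and its variance is 1 + offset**2 = 10.
```

The fitted Gaussian is a poor description of the two modes, yet its mean agrees with the target's mean up to sampling noise, which is the flavor of guarantee the abstract describes for misspecified families.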

2604.27011 2026-06-08 cs.LG cs.AI (updated version)

Automatic Causal Fairness Analysis with LLM-Generated Reporting

基于LLM生成报告的自动因果公平性分析

Alessia Berarducci, Eric Rossetto, Alessandro Antonucci, Marco Zaffalon

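Affiliations: Istituto Dalle Molle di Studi sull’Intelligenza Artificiale (IDSIA), USI-SUPSI

AI summary: Presents the FairMind prototype, which performs causal fairness analysis under the standard fairness model, computes causal effects from counterfactual queries, and uses an LLM in a zero-shot setup to generate fairness reports, with advantages over direct analysis by the LLM.

Comments: 23 pages, 6 figures, 3 tables, LaTeX; added missing proof for Proposition 3, typos corrected, updated example 1 to have positive values for the Sankey

AI Chinese abstract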

AutoML，旨在自动化机器学习在现实问题中的应用，是AI普及的关键步骤。大多数AutoML框架未考虑训练数据及相应预测中潜在的公平性缺失。我们介绍FairMind，一个旨在自动化数据集层面公平性分析的软件原型。我们通过借助Plečko和Bareinboim最近提出的标准公平模型的假设来实现这一点。这允许基于涉及目标、可能的混杂因素和中介变量以及我们视为受保护的输入特征的不同值的反事实查询，进行合理的因果效应公平性评估。在必要的数据预处理之后，该工具实现了效应的闭式计算。随后利用LLM生成关于训练数据集中检测到的公平性水平的准确报告。我们在零样本设置中实现了这一点，并通过示例展示了相对于LLM直接分析的预期优势。为了促进应用，还讨论了有序受保护变量和连续目标的扩展以及新的分解结果。

English abstract

AutoML, intended as the process of automating the application of machine learning to real-world problems, is a key step for AI popularisation. Most AutoML frameworks are not accounting for the potential lack of fairness in the training data and in the corresponding predictions. We introduce FairMind, a software prototype aiming to automatise fairness analysis at the dataset level. We achieve that by resorting to the assumptions of the standard fairness model, recently proposed by Plečko and Bareinboim. This allows for a sound fairness evaluation in terms of causal effects, based on counterfactual queries involving the target, possibly confounders and mediators, and the different values of an input feature we regard as protected. After the necessary data preprocessing, the tool implements a closed-form computation of the effects. LLMs are consequently exploited to generate accurate reports on the fairness levels detected in the training dataset. We achieve that in a zero-shot setup and show by examples the expected advantages with respect to a direct analysis performed by the LLM. To favour applications, extensions to ordinal protected variable and continuous targets and novel decomposition results are also discussed.

2605.00369 2026-06-08 cs.LG cs.AI (updated version)

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees

InvEvolve：通过具有性能保证的大语言模型进化白盒库存策略

Chenyu Huang, Jianghao Lin, Zhengyang Tang, Bo Jiang, Ruoqing Jiang, Benyou Wang, Lai Wei

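Affiliations: Shanghai University of Finance and Economics; Shanghai Jiao Tong University; The Chinese University of Hong Kong, Shenzhen; Tsinghua University; Boston College

AI summary: Proposes the InvEvolve framework, which uses a large language model trained via reinforcement learning together with confidence-interval-based certification to generate white-box inventory policies with statistical safety guarantees in online settings, outperforming classical and deep-learning methods on synthetic and real retail data.

AI Chinese abstract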

我们研究如何利用大语言模型在非平稳需求在线环境中生成库存策略。我们的工作受到基于LLM的进化搜索（如AlphaEvolve）最新进展的启发，这些方法在数学发现等静态且高度结构化问题上表现出色，但不太适合具有在线更新的动态库存环境。我们提出InvEvolve，一个基于置信区间认证的端到端库存策略进化与推理框架。InvEvolve建立在通过强化学习训练的大语言模型之上，能够处理需求数据以及额外的数值和文本特征，并生成具有统计安全保证的白盒库存策略，用于未来部署。我们进一步引入一个具有理论保证的统一框架，连接训练、推理和部署。这使我们能够推导出InvEvolve进化出统计上安全且改进的策略的概率下界，并刻画相对于最优安全基准的多周期性能差距。在合成数据和真实零售数据上的测试表明，InvEvolve优于经典库存策略和基于深度学习的方法。在典型库存设置中，它生成了超越现有基准的新策略。

English abstract

We study how large language models can be used to generate inventory policies in online settings with non-stationary demand. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance on static and highly structured problems such as mathematical discovery, but is not directly suited to dynamic inventory settings with online updates. We propose InvEvolve, an end-to-end inventory policy evolution and inference framework grounded in confidence-interval-based certification. Built on a large language model trained via reinforcement learning, InvEvolve can process demand data together with additional numerical and textual features, and generates white-box inventory policies with statistical safety guarantees for future deployment. We further introduce a unified framework with theoretical guarantees that connects training, inference, and deployment. This allows us to derive a lower bound on the probability that InvEvolve evolves a statistically safe and improved policy, and to characterize the multi-period performance gap relative to the oracle-safe benchmark. Tested on both synthetic data and real-world retail data, InvEvolve outperforms classical inventory policies and deep-learning-based methods. In canonical inventory settings, it generates new policies that outperform existing benchmarks.

2605.01642 2026-06-08 cs.LG (updated version)

Adaptive Pluralistic Alignment: A pipeline for dynamic artificial democracy

自适应多元对齐：动态人工民主的流水线

Rachel Freedman

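AI summary: Proposes the Adaptive Pluralistic Alignment (APA) pipeline, which uses low-rank reward basis decomposition and a jury voting mechanism to track evolving societal values and avoid value lock-in without repeating pretraining or large-scale data collection.

AI Chinese abstract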

当前的对齐方法针对一组固定的偏好，因此随着社会规范随时间演变，存在强制价值锁定的风险。我们引入了自适应多元对齐（APA），这是一个模块化流水线，用于更新多元对齐的AI系统，以追踪不断变化的价值观并避免价值锁定，而无需重复昂贵的预训练或大规模数据收集。APA包含三个阶段：（1）通过低秩奖励基分解学习紧凑的个性化奖励模型；（2）使用这些模型作为陪审团，通过社会选择理论投票集体选择候选输出；（3）随着价值观变化，通过在固定奖励基上拟合新的注释者权重，高效地随时间调整陪审团。由此产生的系统高效、可解释、可引导且模块化。我们使用PRISM多用户对齐数据集和模拟的历史注释者实现了概念验证实例，并提供了初步分析，表明陪审团组成和投票规则的选择可以显著影响结果，尤其是在陪审团偏好异质的情况下。我们在https://github.com/RachelFreedman/apa提供完整代码和生成的偏好数据集。

English abstract

Prevailing alignment methods target a fixed set of preferences and therefore risk forcing value lock-in as societal norms evolve over time. We introduce Adaptive Pluralistic Alignment (APA), a modular pipeline for updating pluralistically aligned AI systems to track evolving values and avoid value lock-in without repeating costly pretraining or large-scale data collection. APA has three stages: (1) learning compact personalized reward models via low-rank reward basis decomposition, (2) using these models as a jury that collectively selects among candidate outputs through social-choice-theoretic voting, and (3) efficiently adapting the jury over time by fitting new annotator weights over the fixed reward bases as values shift. The resulting system is efficient, explainable, steerable, and modular. We implement a proof-of-concept instantiation using the PRISM multi-user alignment dataset and simulated historical annotators, and provide preliminary analysis showing that jury composition and the choice of voting rule can substantially affect outcomes, particularly when jury preferences are heterogeneous. We provide full code and resulting preference datasets at https://github.com/RachelFreedman/apa.
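
Stages (2) and (3) of the pipeline can be pictured with a toy jury. In the Python sketch below each juror's reward is a weighted sum of fixed reward bases, the jury votes by Borda count, and "adaptation" simply means swapping in new weights over the same bases. The basis scores, weights, choice of Borda count, and function names are all hypothetical; the abstract says only that social-choice-theoretic voting is used and that new annotator weights are fitted.

```python
def juror_rewards(weights, bases):
    """Reward of each candidate for one juror: r(y) = sum_k w_k * b_k(y)."""
    n_candidates = len(bases[0])
    return [sum(w * basis[c] for w, basis in zip(weights, bases))
            for c in range(n_candidates)]


def borda_vote(jury_weights, bases):
    """Return (winning candidate index, Borda points per candidate)."""
    n = len(bases[0])
    points = [0] * n
    for weights in jury_weights:
        rewards = juror_rewards(weights, bases)
        ranking = sorted(range(n), key=lambda c: rewards[c], reverse=True)
        for place, c in enumerate(ranking):
            points[c] += n - 1 - place  # top choice earns n - 1 points
    winner = max(range(n), key=lambda c: points[c])
    return winner, points


# Two fixed reward bases scored on three candidate outputs (hypothetical values).
bases = [
    [0.9, 0.5, 0.1],  # basis 1, e.g. directness
    [0.1, 0.6, 0.9],  # basis 2, e.g. caution
]

jury_then = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9]]
jury_now = [[0.2, 0.8], [0.1, 0.9], [0.9, 0.1]]  # values have shifted; bases unchanged

winner_then, points_then = borda_vote(jury_then, bases)  # candidate 0, points [4, 3, 2]
winner_now, points_now = borda_vote(jury_now, bases)     # candidate 2, points [2, 3, 4]
```

Because only the small weight vectors change between `jury_then` and `jury_now`, the selected output tracks the shift in values without touching the reward bases, which mirrors the efficiency argument in the abstract.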

2605.04130 2026-06-08 cs.LG (updated version)

Constrained Extreme Gradient Boosting for Adapting Reduced-Order Models

约束极端梯度提升用于自适应降阶模型

Melika Baghi, Xiao Liu, Kamran Paynabar

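Affiliations: H. Milton Stewart School of Industrial and Systems Engineering

AI summary: Proposes the Constrained Extreme Gradient Boosting (cXGBoost) framework, which uses a geometric representation on the Grassmann manifold and a norm constraint to predict parameter-dependent POD bases for efficient, adaptive reduced-order modeling.

Comments: Preprint. Under review. 4 numerical examples

AI Chinese abstract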

高保真仿真（如计算流体动力学和有限元分析）对于建模复杂工程系统至关重要，但在参数研究、优化和实时控制等任务中往往成本过高。基于投影的降阶模型（ROM）通过将控制动力学投影到低维子空间来缓解这一成本。然而，其性能在参数变化下可能恶化，因此需要自适应基构造。在这项工作中，我们提出了一种约束集成学习框架，称为约束极端梯度提升（cXGBoost），用于预测作为系统参数函数的本征正交分解（POD）基。该方法利用Grassmann流形上子空间的几何表示，将其映射到欧几里得空间，以便使用梯度提升树进行高效回归。在训练过程中施加范数约束，以确保逆映射的有效性并保持预测子空间的几何结构。所提出的方法在四个数值示例（包括流体动力学和波传播问题）上进行了评估，证明了其能够准确预测参数依赖的基，同时在非线性区域内保持鲁棒性。这些结果凸显了将几何学习与约束集成方法相结合，用于高维参数系统可扩展且可靠的降阶建模的潜力。

English abstract

High-fidelity simulations, such as computational fluid dynamics and finite element analysis, are essential for modeling complex engineering systems but are often prohibitively expensive for tasks including parametric studies, optimization, and real-time control. Projection-based reduced-order models (ROMs) alleviate this cost by projecting the governing dynamics onto low-dimensional subspaces. However, their performance can deteriorate under parameter variation, motivating the need for adaptive basis construction. In this work, we propose a constrained ensemble learning framework, termed Constrained Extreme Gradient Boosting (cXGBoost), for predicting Proper Orthogonal Decomposition (POD) bases as functions of system parameters. The approach leverages a geometric representation of subspaces on the Grassmann manifold, which are mapped to a Euclidean space to enable efficient regression using gradient boosting trees. A norm constraint is imposed during training to ensure the validity of the inverse mapping and preserve the geometric structure of the predicted subspaces. The proposed method is evaluated on four numerical examples, including fluid dynamics and wave propagation problems, demonstrating its ability to accurately predict parameter-dependent bases while maintaining robustness across nonlinear regimes. These results highlight the potential of combining geometric learning with constrained ensemble methods for scalable and reliable reduced-order modeling of high-dimensional parametric systems.
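
One standard way to map subspaces to a Euclidean space and back is the Riemannian logarithm and exponential at a reference point on the Grassmann manifold. The formulas below are that textbook construction, offered as an assumption about what such a mapping can look like; the abstract does not state which mapping or which norm constraint cXGBoost actually uses. Here $Y_0, Y \in \mathbb{R}^{n \times p}$ have orthonormal columns and $Y_0^\top Y$ is assumed invertible.

```latex
\begin{aligned}
M &= (I - Y_0 Y_0^\top)\, Y\, (Y_0^\top Y)^{-1} = U \Sigma V^\top
  && \text{(thin SVD)} \\
\Gamma &= \operatorname{Log}_{Y_0}(Y) = U \Theta V^\top,
  \qquad \Theta = \arctan(\Sigma)
  && \text{(tangent vector, regression target)} \\
\operatorname{Exp}_{Y_0}(\Gamma) &= Y_0 V \cos(\Theta)\, V^\top + U \sin(\Theta)\, V^\top
  && \text{(inverse mapping)}
\end{aligned}
```

In this construction the diagonal of $\Theta$ holds the principal angles between the two subspaces, and $\operatorname{Exp}_{Y_0}(\operatorname{Log}_{Y_0}(Y))$ spans the same subspace as $Y$ only while those angles stay below $\pi/2$. A bound such as $\|\Gamma\|_2 < \pi/2$ on a predicted tangent vector is therefore one plausible reading of a norm constraint that keeps the inverse mapping valid.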

2605.05220 2026-06-08 cs.LG cs.AI (updated version)

MidSteer: Optimal Affine Framework for Steering Generative Models

MidSteer：用于引导生成模型的最优仿射框架

Tatiana Gaintseva, Andrew Stepanov, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi

Affiliations: University of Basel; University of California, Berkeley; ETH Zurich; University of Cambridge; University of Washington

AI summary: Proposes MidSteer, an optimal affine framework for concept steering that switches concepts in generative models with minimal disturbance, and validates it on vision diffusion models and large language models.

AI Chinese abstract

引导中间表示已成为控制生成模型的一种强大策略，特别是在部署后对齐和安全设置中。然而，尽管其经验成功，目前仍缺乏全面的理论框架。在本文中，我们通过形式化概念引导的理论来弥补这一差距。首先，我们在引导和仿射概念擦除之间建立联系，证明去除不期望行为的标准方法是LEACE（一种用于仿射擦除的闭式方法）的特例。接下来，我们为概念切换LEACE-Switch制定了一个原则性的理论框架，并描述了其提供最优仿射解的假设。基于这一分析，我们引入了MidSteer（最小干扰概念引导），一个更通用的用于概念操作的仿射框架，它放宽了这些假设并实现了定向的最小干扰变换。我们证明MidSteer在一系列任务、模态和架构（包括视觉扩散模型和大型语言模型）中表现良好。

English abstract

Steering intermediate representations has emerged as a powerful strategy for controlling generative models, particularly in post-deployment alignment and safety settings. However, despite its empirical success, it currently lacks a comprehensive theoretical framework. In this paper, we bridge this gap by formalizing the theory of concept steering. First, we establish a link between steering and affine concept erasure, proving that the standard approach for removing unwanted behaviors is a special case of LEACE (a closed-form method for affine erasure). Next, we formulate a principled theoretical framework for concept switching, LEACE-Switch, and characterize the assumptions under which it provides an optimal affine solution. Building on this analysis, we then introduce MidSteer (Minimal Disturbance concept Steering), a more general affine framework for concept manipulation that relaxes these assumptions and enables directed, minimal-disturbance transformations. We demonstrate that MidSteer performs favorably across a range of tasks, modalities, and architectures, including vision diffusion models and large language models.


2605.05225 2026-06-08 cs.LG cs.AI Updated

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

Bo Li, Chuan Wu, Shaolin Zhu

Affiliations: School of Software, Tsinghua University, Beijing, China; TJUNLP Lab, School of Computer Science and Technology, Tianjin University, China; School of New Media and Communication, Tianjin University, China

AI summary: To address the efficiency bottleneck that information heterogeneity and modality dynamics cause in expert-parallel inference of multimodal MoE models, the paper proposes MACS, a training-free framework that optimizes resource allocation through entropy-weighted load and dynamic modality-adaptive capacity mechanisms, and reports significant gains on multimodal benchmarks.

Comments Accepted by ACL 2026

Abstract

Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-count-based load balancing methods fail to address two unique challenges: (1) Information Heterogeneity, where numerous redundant visual tokens are treated equally to semantically critical ones, and (2) Modality Dynamics, where varying visual to text ratios across tasks lead to resource misallocation. To address these challenges, we propose MACS (Modality-Aware Capacity Scaling), a training-free inference framework. Specifically, MACS introduces an Entropy-Weighted Load mechanism to quantify the semantic value of visual tokens, addressing information heterogeneity. Additionally, the Dynamic Modality-Adaptive Capacity mechanism allocates expert resources based on the real-time modal composition of the input. Extensive experiments demonstrate that MACS significantly outperforms existing methods on various multimodal benchmarks, providing a novel and robust solution for the efficient deployment of MoE MLLMs in EP inference.


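2605.08692 2026-06-08 cs.LG cs.CL Updated

AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization

Beshr IslamBouli, David Jin

Affiliations: University of Waterloo

AI summary: Proposes AAAC, which replaces the fixed scalar codebook with two small learned codebooks (64 bytes) per layer and selects between them by minimizing activation-weighted reconstruction error, giving 4-bit weight quantization with zero extra storage overhead that completes in 3-30 minutes and is more accurate than existing methods.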

Abstract

Post-training weight-only quantization to 4 bits is widely used to reduce the memory and compute costs of large language model inference. Existing PTQ methods, such as AWQ and GPTQ, improve how weights are mapped onto a fixed 4-bit grid through scaling, clipping, or error compensation. To further improve accuracy, methods such as OmniQuant and QuIP# use gradient-assisted algorithms at the cost of hours of quantization time. In this work, we propose AAAC (Activation-Aware Adaptive Codebooks), a lightweight method for 4-bit LLM weight quantization. AAAC replaces the fixed scalar codebook used in standard quantization with two small learned scalar codebooks (64 bytes) per layer. Each group of weights selects the codebook that minimizes activation-weighted reconstruction error, encoding the choice in the unused sign bit of the group's positive scale and adding zero storage overhead. AAAC completes in 3-30 minutes on a single GPU and adds no memory beyond the model itself. We evaluate against AWQ, GPTQ, IF4, GPTVQ, OmniQuant, SqueezeLLM, and QuIP# across model families. AAAC outperforms baselines with orders of magnitude less quantization time.
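
The per-group codebook choice described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two codebooks here are fixed rather than learned, `act_weight` is an assumed stand-in for activation statistics, the scale is a plain absmax, and all function names are hypothetical.

```python
import numpy as np

def quantize_group(w, act_weight, codebooks):
    """Pick the codebook with the lowest activation-weighted error for one group.

    w          : (G,) weights of one group
    act_weight : (G,) non-negative importance per weight (assumed proxy
                 for activation statistics)
    codebooks  : list of two (16,) arrays of 4-bit levels in [-1, 1]
    Returns (indices, signed_scale, error).
    """
    scale = np.max(np.abs(w)) + 1e-12          # positive absmax scale
    best = None
    for cb_id, cb in enumerate(codebooks):
        # Nearest codebook entry for each normalized weight.
        idx = np.argmin(np.abs(w[:, None] / scale - cb[None, :]), axis=1)
        err = float(np.sum(act_weight * (w - scale * cb[idx]) ** 2))
        if best is None or err < best[2]:
            # The scale is always positive, so its sign bit is free to carry
            # the one-bit codebook choice (negative => codebook 1).
            signed_scale = -scale if cb_id == 1 else scale
            best = (idx, signed_scale, err)
    return best

def dequantize_group(idx, signed_scale, codebooks):
    """Recover the codebook choice from the sign, then rescale."""
    cb = codebooks[1] if signed_scale < 0 else codebooks[0]
    return abs(signed_scale) * cb[idx]

# Two illustrative codebooks: a uniform grid and a grid denser near zero.
uniform = np.linspace(-1.0, 1.0, 16)
dense_near_zero = np.sign(uniform) * np.abs(uniform) ** 2

rng = np.random.default_rng(0)
w = rng.normal(size=128) * 0.02
act_weight = rng.uniform(0.1, 1.0, size=128)
idx, signed_scale, err = quantize_group(w, act_weight, [uniform, dense_near_zero])
w_hat = dequantize_group(idx, signed_scale, [uniform, dense_near_zero])
```

Because the choice rides on the sign of a value that is stored anyway, the sketch needs no extra per-group field, which is the zero-overhead property the abstract points to. If each codebook entry were stored in 16 bits (an assumption), two 16-entry codebooks would occupy 2 × 16 × 2 = 64 bytes, consistent with the figure quoted above.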


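2605.15354 2026-06-08 cs.LG Updated

I-SAFE: Structural Auditing of Scientific AI Models with the Wasserstein Coherence Metric

Barbara Tarantino, Gennaro Auricchio, Paolo Giudici

Affiliations: Department of Economics, University of Pavia; Department of Mathematics, University of Padua

AI summary: This paper proposes the I-SAFE framework, which structurally audits scientific AI models with the Wasserstein Coherence Metric, reveals differences in their distributional responses, and offers a more reliable way to evaluate such models.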

Abstract

Deep learning models are increasingly used in scientific prediction tasks where strong benchmark performance is often interpreted as evidence of scientifically meaningful behavior. This interpretation is fragile, as models may exploit shortcut features, dataset-specific regularities, or distributional biases that are predictive on held-out data but not aligned with domain-relevant structure. To address this limitation, we introduce the I-SAFE (Interventional Secure, Accurate, Fair and Explainable) framework, a post-hoc distributional auditing framework for scientific AI models centered on the Wasserstein Coherence Metric (WCM). Given a trained black-box predictor and an external structural prior encoding domain knowledge about task-relevant input structure, I-SAFE evaluates raw model outputs under structurally guided perturbations of the input. The proposed audit measures output-distribution coherence through three complementary metrics: a Quantile-Based Metric (QBM) for location-level coherence, the WCM for ordinal coherence, and a translation-invariant WCM variant for shape coherence. We instantiate I-SAFE on drug-target interaction (DTI) prediction using the Davis kinase benchmark, KLIFS (Kinase-Ligand Interaction Fingerprints and Structures) binding-pocket annotations, and three sequence-based DTI models: DeepConvDTI, DeepDTA, and TAPB. Although the models operate in a comparable predictive regime, I-SAFE reveals substantially different distributional response profiles, a distinction invisible to accuracy-based evaluation. The framework is model-agnostic and applicable to any domain where inputs admit a structured decomposition and an external prior is available.
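
To make the difference between a plain and a translation-invariant Wasserstein comparison concrete, here is a minimal sketch using the empirical 1-Wasserstein distance between two equal-sized 1-D samples. The abstract does not define the WCM, so this is not that metric; the function names and the synthetic "model outputs" are assumptions for illustration only.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between two equal-sized 1-D samples."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))

def wasserstein_1d_centered(a, b):
    """Same distance after removing each sample's mean (ignores pure shifts)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return wasserstein_1d(a - a.mean(), b - b.mean())

rng = np.random.default_rng(0)
baseline = rng.normal(loc=5.0, scale=1.0, size=1000)   # outputs on original inputs
shifted = baseline - 0.8                                # perturbation shifts every output
reshaped = 5.0 + 2.0 * (baseline - 5.0)                 # perturbation widens the spread

d_shift = wasserstein_1d(baseline, shifted)
d_shift_centered = wasserstein_1d_centered(baseline, shifted)
d_shape_centered = wasserstein_1d_centered(baseline, reshaped)
```

A uniform shift of the outputs registers under the plain distance but vanishes once both samples are centered, whereas a change in spread survives centering. That is the sense in which a translation-invariant variant isolates shape from location.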


2605.25054 2026-06-08 cs.LG cs.AI Updated

Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

Ayush K. Varshney, Konstantinos Vandikas, Šarūnas Girdzijauskas, Adam Orucu, Aneta Vulgarakis Feljan

Affiliations: University of California, Berkeley; DeepMind; University of Cambridge

AI summary: Proposes neuron-level mixed-precision quantization-aware training (NMP-QAT), in which each neuron independently learns its discrete precision via differentiable surrogates and straight-through estimators, expanding bit-width only when needed and achieving better compression-accuracy trade-offs on MLPs and tabular foundation models.

Comments Accepted at ICML - GlobalSouthML workshop, 2026

Abstract

Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing mixed-precision methods typically operate at coarse layer- or channel-level granularity. These methods often rely on heuristic or search-based bit-allocation strategies, which may overlook fine-grained variability at the neuron level. We propose Neuron-Level Mixed-Precision QAT (NMP-QAT), where each neuron independently learns its own discrete precision during training. Starting from low-bit precision, NMP-QAT expands bit-width only when training signals demand it, via differentiable surrogates and straight-through estimators, while preserving a fully discrete inference graph. This adaptability extends to both weights and activations, reducing memory movement. Evaluated on telecom and non-telecom datasets across MLP and tabular foundation model architectures, NMP-QAT achieves superior compression-accuracy trade-offs over mixed-precision QAT baselines, making it well-suited for Green AI deployments at the network edge.


2605.25413 2026-06-08 cs.LG cs.AI cs.NA math.NA Updated

Autoregression-Free Neural Operators for Time-Dependent PDEs

Jiaquan Zhang, Caiyan Qin, Haoyu Bian, Libin Cai, Yi Lu, Chaoning Zhang, Wei Dong, Yuanfang Guo, Yang Yang, Heng Tao Shen

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China; School of Robotics and Advanced Manufacture, Harbin Institute of Technology; School of Mathematical Sciences, Capital Normal University; College of Information and Control Engineering, Xi’an University of Architecture and Technology; Laboratory of Intelligent Recognition and Image Processing, School of Computer Science and Engineering, Beihang University; School of Computer Science and Technology, Tongji University

AI summary: Proposes AFNO, which maps the time evolution of PDEs into a latent space and uses flow matching to learn a continuous-time vector field, avoiding autoregressive rollout and giving stable long-horizon predictions.

Comments 23 pages, 18 figures

Abstract

Neural operators learn mappings from function-dependent inputs to solutions, providing an effective framework for solving partial differential equations (PDEs). For time-dependent PDEs, existing methods typically perform long-horizon prediction through autoregressive rollout directly in high-dimensional physical field spaces, where each predicted state is recursively fed back as the input for the next step. Although effective for short-term prediction, this autoregressive rollout and the lack of continuous-time modeling lead to progressive error accumulation over long-horizon rollouts. In this work, we propose Autoregression-Free Neural Operators (AFNO), which map the time evolution of PDEs into a latent space and model continuous-time vector fields within it. AFNO uses flow matching to learn the latent vector field, thereby enabling continuous evolution over extended horizons, avoiding autoregressive rollout and capturing dynamics under varying parameter configurations through explicit conditioning on physical parameters. Theoretical analysis and extensive experiments on six PDEs demonstrate that AFNO improves long-horizon prediction stability and consistently reduces rollout errors compared with the baselines.


2605.25451 2026-06-08 cs.LG Updated

BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training

Zili Zhang, Chengxu Yang, Shenglong Zhang, Chenyu Wang, Yufan Zhang, Tuo Dai, Zhouyang Li, Yuhong Ge, Chao Jin, Xin Jin, Yuliang Liu

Affiliations: Peking University; Independent Researcher; Xiaohongshu, Inc

AI summary: Proposes BigMac, a training pipeline that nests encoder and generator computation into the LLM pipeline to optimize compute efficiency and memory usage at the same time, breaking the Pareto frontier.

Abstract

Training multimodal large language models (MLLMs) is challenged by both model and data heterogeneity. Existing systems redesign the training pipeline to address these challenges, but remain bound by a Pareto frontier between compute and memory efficiency, improving one only at the expense of the other. We present BigMac, a new training pipeline for multimodal LLMs. The core idea of BigMac is to elegantly nest the encoder and generator computation into the original LLM pipeline, forming a dependency-safe nested pipeline structure. With this design, BigMac reduces the activation memory complexity of the encoder and generator to O(1) while keeping the activation memory complexity of the LLM unchanged. At the same time, it achieves the same computational efficiency as the idealized setting with unlimited memory. As a result, BigMac breaks the Pareto frontier between computational efficiency and memory usage, enabling simultaneous optimization of both computation and memory in MLLM training. We evaluate BigMac on multiple MLLMs and training workloads. Experimental results show that BigMac achieves a 1.08× to 1.9× training speedup over baseline systems while maintaining stable memory usage as batch size increases.


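2605.29223 2026-06-08 cs.LG Updated

Inferring the Size of Large Language Models From Popular Text Memorization

Ivica Nikolic

Affiliations: National University of Singapore

AI summary: Proposes a black-box method that infers a lower bound on an LLM's parameter count from generated text alone, by analyzing how accurately the model has memorized popular texts, and validates it on open-weight and closed-weight models.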

Abstract

The parameter counts of the most widely used large language models (LLMs) are often withheld by their developers, leaving model size, a primary reference point for interpreting capabilities and costs, largely undisclosed. We propose a black-box method to infer conservative lower bounds on LLM size from generated text outputs alone, requiring nothing beyond the ability to submit text fragments and observe next-token predictions. Our approach is grounded in a key observation: popular, widely circulated texts (such as classical literature, religious texts, and foundational documents) are present in virtually every large-scale pretraining corpus, and how accurately a model predicts the next word across text fragments of varying length is a reliable signal of how much it has memorized them, which in turn is fundamentally limited by its total parameter count. We aggregate this memorization signal across a diverse corpus of texts and fragment lengths into a single accuracy profile vector per model, and build two complementary inference methods on top of it: a pairwise statistical test that determines which of two models is larger, and a scaling-law estimator that extracts a one-dimensional latent index from these vectors via Principal Component Analysis (PCA) to map the aggregated signal to a parameter count. Validated on a broad set of open-weight models, both methods produce accurate and reliable lower bounds. When applied to popular closed-weight models, our framework recovers internal product hierarchies and reveals a clear divergence in industry scaling strategies: while some developers yield significantly higher bounds indicative of large generational parameter growth, others operate under strict parameter ceilings, demonstrating that hidden design choices can be systematically probed even under strict API limitations.


2605.30119 2026-06-08 cs.LG cs.AI cs.NE Updated

Evolving Features vs Evolving Entire Trees with GP for Interpretable Survival Analysis

Thalea Schlender, Peter A. N. Bosman, Tanja Alderliesten

Affiliations: Leiden University Medical Center; Centrum Wiskunde & Informatica

AI summary: This study uses genetic programming to evolve inspectable feature sets under multiple objectives, and to jointly optimize survival-tree structure and nonlinear split logic, with the aim of improving the predictive performance and interpretability of shallow survival trees.

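Spectral Scaling Laws of Muon

Gagik Magakyan, Pablo Parrilo, Asuman Ozdaglar

Affiliations: MIT

AI summary: This paper systematically studies how the singular value spectrum of the momentum matrix in the Muon optimizer scales with model size, finds that it follows power laws, and derives a layer-aware way to choose the Newton-Schulz iteration configuration that reduces compute overhead.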

Abstract

Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the orthonormalization with the Newton-Schulz (NS) iteration. Since NS is only approximate, directions with small singular values fail to be orthonormalized. In Muon, NS is applied to the momentum matrix at every step, yet little is known about how the singular value spectrum of these momentum matrices behaves during training, or how that behavior changes with model size. We present the first systematic study of this question. Tracking singular value quantiles of the momentum buffer across layers in models ranging from 77M to 2.8B parameters, we observe a consistent picture: after a short burn-in, the quantiles stabilize at a value determined by the layer type and model size. These stabilization values follow remarkably clean power laws in model size, with layer-dependent exponents. Layers up to mid-late depth scale very mildly with model size $M$ (around $M^{-0.25}$), so the standard 5-step NS configuration used at academic scale will continue to orthonormalize them at much larger scales. Some of the late layers, however, scale much more aggressively (up to $M^{-0.96}$) and will fall into the NS failure regime at frontier scale unless one uses more NS iterations or better-tuned coefficients. NS iterations are computationally expensive at scale; our laws give practitioners a principled, layer-aware recipe for choosing the minimum NS configuration that still orthonormalizes the directions that matter, avoiding unnecessary computation without sacrificing update quality.
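
The failure mode described above, where directions with small singular values are not orthonormalized within a fixed step budget, can be reproduced with a short sketch. It uses the classic cubic Newton-Schulz iteration, not Muon's tuned polynomial (whose coefficients the abstract does not give), and the test matrix is synthetic.

```python
import numpy as np

def newton_schulz(m, steps=5):
    """Classic cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.

    Each singular value s is mapped to 1.5 s - 0.5 s^3 per step, so values
    near 1 converge quickly while very small ones grow only ~1.5x per step.
    """
    x = m / np.linalg.norm(m, ord=2)   # scale so the largest singular value is 1
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

# Build a matrix with known singular values: two healthy, one tiny.
rng = np.random.default_rng(0)
u, _ = np.linalg.qr(rng.normal(size=(3, 3)))
v, _ = np.linalg.qr(rng.normal(size=(3, 3)))
m = u @ np.diag([1.0, 0.5, 1e-3]) @ v.T

# Singular values of the result, sorted in descending order.
after_5 = np.linalg.svd(newton_schulz(m, steps=5), compute_uv=False)
after_25 = np.linalg.svd(newton_schulz(m, steps=25), compute_uv=False)
```

With 5 steps the two larger singular values are driven to about 1 while the 1e-3 direction only reaches roughly 1e-3 × 1.5^5 ≈ 0.0076; it takes many more steps before that direction is orthonormalized. As a rough reading of the reported exponents, and assuming the power laws extrapolate, a 100× increase in model size would shrink a quantile scaling as $M^{-0.96}$ by about 100^0.96 ≈ 83×, versus about 100^0.25 ≈ 3.2× for one scaling as $M^{-0.25}$.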


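2606.04812 2026-06-08 cs.LG cs.AI Updated

Scenario Generation for Risk-Aware Reinforcement Learning with Probably Approximately Safe Guarantees

Mohit Prashant, Arvind Easwaran

Affiliations: Nanyang Technological University

AI summary: To address unsafe behaviour caused by RL policies' sensitivity to transition perturbations, the paper approximates the state-space distribution with a variational autoencoder, constructs upper- and lower-bound barrier certificates, and samples states from the non-robust region to tighten the probabilistic safety guarantees.

Comments 8 pages, preprint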

Abstract

Guaranteeing safety is critical to the deployment of reinforcement learning (RL) agents in the real world, especially as policies learned using deep RL may demonstrate susceptibility to transition perturbations that result in unknown or unsafe behaviour. A method of policy verification is to construct probabilistic barrier-certificates by sampling policy trajectories with respect to safety constraints, thereby demarcating known safe behaviour from unknown behaviour. Obtaining tight upper and lower bounds on the probability of violation of these constraints may be difficult if the policy is susceptible to transition uncertainty or perturbation that places the agent in insufficiently explored states. To address this, we approximate the distribution of the encountered state-space using a variational autoencoder (VAE) and construct upper and lower-bound barrier-certificates using latent characteristics of states to optimize for regions of known, safe behaviour with high confidence. We frame this in our work as a dual optimization problem where the lower-bound barrier-certificate presents a more conservative estimate of the safe region than the upper-bound barrier-certificate. Sampling states that lie within the set difference of the two during training, i.e., the non-robust region, allows us to tighten the upper and lower bounds to provide sharper probabilistic guarantees on safety. Within our study, we describe the guarantees placed and demonstrate the tightness of our bounds experimentally.


2606.05152 2026-06-08 cs.LG cs.AI cs.CL Updated

Reinforcement Learning from Rich Feedback with Distributional DAgger

Rishabh Agrawal, Jacob Fein-Ashley, Paria Rashidinejad

Affiliations: University of Southern California

AI summary: Proposes DistIL, which uses a distributional variant of DAgger to exploit rich feedback (such as execution traces and tool outputs) through a forward cross-entropy objective, giving monotonic policy improvement and better Pass@N.

Abstract

Reasoning models have advanced rapidly, but the dominant reinforcement learning from verifiable rewards (RLVR) recipe remains surprisingly narrow: sample many responses and reward each with a single bit indicating whether the final answer is correct. Yet many settings provide rich feedback, including execution traces, tool outputs, expert corrections, and model self-evaluations. We study how to use such feedback through a distributional variant of the classic imitation learning algorithm DAgger, where the learner has local access to an expert distribution on states visited by the current policy. This yields a simple forward cross-entropy objective that admits a blackbox expert and whose sequence-level gradient conducts rich credit assignment by propagating future expert-student disagreement back to earlier decisions. We show that prior RL with self-distillation objectives based on reverse KL or Jensen-Shannon fail to guarantee monotonic policy improvement: even when the expert has higher reward, their updates may increase probability on worse actions. In contrast, we show that forward cross-entropy admits monotonic policy improvement and enjoys guarantees on regret. We further show that our objective optimizes a lower bound on teacher-weighted likelihood of success, leading to improved Pass@N. Empirically, our approach, DistIL, improves over RLVR and RL with self-distillation baselines across a variety of domains: scientific reasoning, coding, and solving hard mathematical problems.


2606.06397 2026-06-08 cs.LG Updated

The Post-GCN Decade Revisited: Curvature-Stratified Evaluation of Relational Learning

Shuo Wang, Xiangyu Wang, Quanxin Wang, Bailin Wu, Bokui Wang, Shunyang Huang, Boyan Deng, Haonan Liu, Ruiyi Fang, Zhenxiang Xu, Boyu Wang, Zhao Kang

Affiliations: University of Electronic Science and Technology of China; Tsinghua University; Western University; Zhejiang University

AI summary: To address the way uniform benchmarks in relational learning hide geometry-dependent performance, the paper proposes a curvature-stratified evaluation framework that partitions datasets into positive, negative, and near-zero curvature regimes, shows that model performance is fundamentally geometry-dependent, and gives a more reliable evaluation protocol.

Comments Suggestions and comments are welcomed

Abstract

Current evaluation practices in relational learning rely heavily on flat leaderboards that average performance across heterogeneous datasets, implicitly assuming a uniform underlying structure. We show that this assumption introduces systematic bias: it obscures geometry-dependent performance variations and can lead to misleading conclusions about model generalization. In this work, we identify intrinsic geometry as a key latent factor governing model effectiveness. We demonstrate that conventional aggregated metrics mask critical performance trade-offs that only become visible when datasets are stratified by their geometric properties. To address this issue, we introduce a curvature-stratified evaluation framework that partitions datasets into positive, negative, and near-zero curvature regimes. Our benchmark evaluates 18 representative models including Graph Convolutional Networks (GCNs), Graph Foundation Models (GFMs), and tabular learning methods across 14 datasets. We find that model rankings are highly stable within each curvature regime but shift significantly across regimes, indicating that performance is fundamentally geometry-dependent rather than universally transferable. Notably, we identify regimes where GFMs offer diminishing returns compared to geometry-aligned GNNs. Based on these findings, we propose a geometry-aware evaluation protocol that yields more reliable and interpretable comparisons than standard aggregated benchmarks. We release all code, curvature-stratified dataset splits, and evaluation tools to support reproducible and rigorous assessment of future relational learning methods. Code and datasets are provided in our project homepage: https://sirbabbage.github.io/CurvBench_HOME/.
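
The contrast between a flat leaderboard and a stratified one can be sketched in a few lines. The scores, model names, dataset names, and regime labels below are invented purely for illustration and are not results from the benchmark; the helper names are hypothetical.

```python
from collections import defaultdict

def rank_models(scores):
    """Return model names sorted from best to worst mean score."""
    return sorted(scores, key=lambda m: -sum(scores[m]) / len(scores[m]))

def stratified_rankings(results, regime_of):
    """results[model][dataset] -> score; regime_of[dataset] -> regime label."""
    per_regime = defaultdict(lambda: defaultdict(list))
    for model, by_dataset in results.items():
        for dataset, score in by_dataset.items():
            per_regime[regime_of[dataset]][model].append(score)
    return {regime: rank_models(scores) for regime, scores in per_regime.items()}

# Synthetic accuracies, invented for this example only.
results = {
    "model_a": {"d1": 0.90, "d2": 0.88, "d3": 0.60, "d4": 0.62},
    "model_b": {"d1": 0.80, "d2": 0.82, "d3": 0.75, "d4": 0.77},
}
regime_of = {"d1": "positive", "d2": "positive", "d3": "negative", "d4": "negative"}

# Flat leaderboard: average over every dataset regardless of regime.
flat = rank_models({m: list(r.values()) for m, r in results.items()})
# Stratified leaderboard: one ranking per curvature regime.
by_regime = stratified_rankings(results, regime_of)
```

In this toy case the flat average places `model_b` first, yet `model_a` leads on the positive-curvature datasets and `model_b` on the negative-curvature ones, which is the kind of trade-off an aggregate score hides.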


2203.07904 2026-06-08 eess.IV cs.CV cs.LG Updated

Unsupervised Learning Based Focal Stack Camera Depth Estimation

Zhengyu Huang, Weizhi Du, Theodore B. Norris

Affiliations: Center for Ultrafast Optical Science, University of Michigan; University of Michigan

AI summary: Proposes an unsupervised deep learning method that estimates depth from focal stack camera images and is significantly more accurate than single-image methods on the NYU-v2 dataset.

2209.00188 2026-06-08 cs.AR cs.LG Updated

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu

Affiliations: ETH Zürich; Intel Processor Architecture Research Lab; LIRMM, Univ. Montpellier, CNRS

AI summary: Proposes Hermes, which uses a perceptron to predict off-chip load requests and speculatively fetches their data directly from main memory while accessing the cache hierarchy in parallel, removing on-chip cache access latency from the critical path of off-chip loads and significantly improving processor performance.

Comments To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

Abstract

Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can only predict half of the off-chip load requests on average across a wide range of workloads, and 2) due to the increasing size and complexity of on-chip caches, a large fraction of the latency of an off-chip load request is spent accessing the on-chip cache hierarchy. The goal of this work is to accelerate off-chip load requests by removing the on-chip cache access latency from their critical path. To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads. To enable Hermes, we develop a new lightweight, perceptron-based off-chip load prediction technique that learns to identify off-chip load requests using multiple program features (e.g., sequence of program counters). For every load request, the predictor observes a set of program features to predict whether or not the load would go off-chip. If the load is predicted to go off-chip, Hermes issues a speculative request directly to the memory controller once the load's physical address is generated. If the prediction is correct, the load eventually misses the cache hierarchy and waits for the ongoing speculative request to finish, thus hiding the on-chip cache hierarchy access latency from the critical path of the off-chip load. Our evaluation shows that Hermes significantly improves performance of a state-of-the-art baseline. We open-source Hermes.
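
The predictor described above can be pictured with a generic hashed-perceptron sketch: each program feature indexes a small table of saturating weights, the selected weights are summed, and the load is predicted to go off-chip when the sum exceeds a threshold. The feature choice, table size, threshold, weight bound, and train-on-every-outcome policy below are all assumptions for illustration; the abstract does not specify Hermes's actual design.

```python
class OffChipPredictor:
    """Toy hashed-perceptron predictor: one small weight table per feature."""

    def __init__(self, num_features, table_size=256, threshold=0, w_max=15):
        self.tables = [[0] * table_size for _ in range(num_features)]
        self.table_size = table_size
        self.threshold = threshold      # activation level for predicting off-chip
        self.w_max = w_max              # saturating weight bound

    def _slots(self, features):
        # Map each feature value to a slot in its own table.
        return [hash(f) % self.table_size for f in features]

    def predict(self, features):
        total = sum(t[s] for t, s in zip(self.tables, self._slots(features)))
        return total > self.threshold

    def train(self, features, went_off_chip):
        # Nudge each selected weight toward the observed outcome, saturating.
        step = 1 if went_off_chip else -1
        for t, s in zip(self.tables, self._slots(features)):
            t[s] = max(-self.w_max, min(self.w_max, t[s] + step))

predictor = OffChipPredictor(num_features=2)
streaming_load = (0x400A10, 0x1F)   # hypothetical (PC, PC-history hash) features
cached_load = (0x400B20, 0x07)
for _ in range(4):
    predictor.train(streaming_load, went_off_chip=True)
    predictor.train(cached_load, went_off_chip=False)
```

After a few observations the sketch predicts `streaming_load` as off-chip and `cached_load` as on-chip. In the scheme the abstract describes, a positive prediction would trigger a speculative request to the memory controller once the physical address is known, in parallel with the normal cache lookup.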


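2404.02141 2026-06-08 stat.ME cs.LG econ.EM stat.CO stat.ML Updated

A Mechanism-Coupled Split-Window Network for Medium- to High-Resolution Land Surface Temperature Retrieval

Tian Xie, Menghui Jiang, Chao Zeng, Huifang Li, Guanhao Zhang, Chan Li, Huanfeng Shen

Affiliations: School of Resource and Environmental Sciences, Wuhan University; Key Laboratory of Geographic Information System of Ministry of Education, Wuhan; Key Laboratory of Digital Cartography and Land Information Application of the Ministry of Natural Resources, Wuhan

AI summary: Proposes the Parallel Component Decoupled Neural Network (PCD-Net), which recasts split-window retrieval as dynamic learning of physical component coefficients and, through component-level decoupled modeling and a residual branch, targets accurate, robust, and globally generalizable land surface temperature retrieval under complex atmospheric and surface conditions.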

Abstract

Land surface temperature (LST) is a fundamental physical variable in land-atmosphere interactions, surface energy budgets, and climate processes. LST derived from medium- to high-resolution thermal infrared (TIR) observations effectively reveals thermal environmental disparities across distinct landscape units. However, achieving accurate, robust, and globally generalizable LST retrieval remains challenging under complex atmospheric conditions and diverse land cover types. Traditional split window (SW) algorithms heavily rely on empirical parameterizations, whose fixed coefficients fail to adapt to complex scenarios such as high surface temperatures and high atmospheric water vapor content. Concurrently, conventional data-driven models exhibit limited generalizability to out-of-distribution (OOD) samples due to the absence of explicit physical structure constraints. To address these issues, this study proposes a Parallel Component Decoupled Neural Network (PCD-Net) framework, which reformulates SW retrieval as a dynamic learning problem of physical component coefficients. Using the SW equation as the physical backbone, the framework constructs parallel subnetworks to adaptively learn the dynamic coefficients corresponding to the constant, first-order, and second-order brightness temperature difference terms; meanwhile, a residual branch is incorporated to supplement the nonlinear coupling corrections induced by the joint effects of surface emissivity and atmospheric water vapor. Through this component-level decoupled modeling, PCD-Net explicitly characterizes the dynamic response relationships between land surface emissivity, atmospheric water vapor content, and different SW physical components.

2509.11208 2026-06-08 stat.ML cs.LG Updated version

Machine Learning Methods for Modeling Electron-Scale Turbulence in W7-X

Ionut-Gabriel Farcas, Don Lawrence Carl Agapito Fernando, Alejandro Banon Navarro, Gabriele Merlo, Frank Jenko

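Affiliations * Department of Mathematics and Division of Computational Modeling and Data Analytics, Academy of Data Science, Virginia Tech; Max Planck Institute for Plasma Physics

AI summary: For electron temperature gradient turbulence in the Wendelstein 7-X stellarator, builds physics-guided scaling-law reduced models through regression combined with active learning, predicts heat fluxes, and evaluates interpolation and extrapolation performance.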

Comments 15 pages, 7 tables, 14 figures

DOI: 10.1063/5.0311057
Journal ref: Phys. Plasmas 33, 000000 (2026)

Abstract

Constructing reduced models for turbulent transport is essential for accelerating profile predictions and enabling many-query tasks such as parameter exploration and design optimization. This work investigates machine-learning-driven reduced models for Electron Temperature Gradient (ETG) turbulence in the Wendelstein 7-X (W7-X) stellarator. We develop physics-guided scaling laws to predict the ETG heat flux at seven radial locations as functions of three key plasma parameters: the normalized electron temperature gradient ($ω_{T_e}$), the ratio of normalized electron temperature and density gradients ($η_e$), and the electron-to-ion temperature ratio ($τ$). The model coefficients are determined through regression combined with an active learning strategy. The procedure initializes the scaling laws using low-cardinality sparse-grid training data and iteratively enriches the training set by selecting maximally informative samples from an existing simulation database. The predictive performance of the models is assessed using out-of-sample datasets comprising more than $393$ points per radial location. Using the coefficients identified at the seven training radial locations, we further derive regression-based parameterizations for the scaling-law coefficients as functions of radial position. The resulting models are then evaluated at three additional radial locations not used during training, including both interpolation and moderate extrapolation cases. Overall, our reduced models demonstrate good predictive performance and achieve accuracy comparable to the original reference simulations, including in interpolation and moderate extrapolation regimes. An important finding is that a single radius-independent model cannot adequately describe ETG transport across the W7-X core, suggesting the presence of geometry-dependent physics not captured by the present formulation.
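The regression step described in this abstract can be illustrated with a minimal sketch. It assumes a pure power-law form, Q = c · ω^a · η^b · τ^g, which the abstract does not state; the function names `solve_linear` and `fit_power_law` are hypothetical, and the active-learning sample selection is not reproduced here. Taking logarithms turns the fit into ordinary least squares.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_power_law(samples):
    """Fit log Q = log c + a log(omega) + b log(eta) + g log(tau).

    samples: list of (omega, eta, tau, Q) tuples with positive entries.
    The power-law form is an assumption made for illustration only.
    """
    rows = [[1.0, math.log(s[0]), math.log(s[1]), math.log(s[2])] for s in samples]
    ys = [math.log(s[3]) for s in samples]
    k = 4
    # Normal equations (A^T A) x = A^T y for the log-linear model.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    log_c, a, b, g = solve_linear(ata, aty)
    return math.exp(log_c), a, b, g
```

On noiseless synthetic data generated from a known power law, the fit should return the generating prefactor and exponents up to floating-point error. Real heat-flux data would also need a treatment of noise and of any threshold behavior.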

2512.04123 2026-06-08 cs.CY cs.AI cs.LG cs.SE Updated version

Measuring Agents in Production

Melissa Z. Pan, Negar Arabzadeh, Riccardo Cogo, Yuxuan Zhu, Alexander Xiong, Lakshya A Agrawal, Huanzhi Mao, Emma Shen, Sid Pallerla, Liana Patel, Shu Liu, Tianneng Shi, Xiaoyuan Liu, Jared Quincy Davis, Emmanuele Lacavalla, Alessandro Basile, Shuyi Yang, Paul Castro, Daniel Kang, Koushik Sen, Dawn Song, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia, Marquita Ellis

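Affiliations * University of California at Berkeley; IBM Research; University of Illinois at Urbana-Champaign; Stanford University

AI summary: Based on a survey of 86 deployed systems and 20 case studies, finds that production LLM agents are built mainly with simple, controllable methods, that reliability is the top challenge, and that practitioners rely on systems-level design and human evaluation.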

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026) as Oral Presentation

Abstract

LLM-based agents already operate in production across many industries, yet we lack an understanding of what technical methods make deployments successful. We present the first systematic study of Measuring Agents in Production, MAP, using first-hand data from agent developers. We conducted 20 case studies via in-depth interviews and surveyed 86 deployed systems practitioners across 26 domains. We investigate why organizations build agents, how they build them, how they evaluate them, and their top development challenges. Our study finds that production agents are built using simple, controllable approaches: 68% execute at most 10 steps before human intervention, 70% rely on prompting off-the-shelf models instead of weight tuning, and 74% depend primarily on human evaluation. Reliability (consistent correct behavior over time) remains the top development challenge, which practitioners currently address through systems-level design. MAP documents the current state of production agents, providing the research community with visibility into deployment realities and underexplored research avenues.

2512.12997 2026-06-08 cs.CV cs.AI cs.LG Updated version

Calibrating Uncertainty for Zero-Shot Adversarial CLIP

Wenjing Lu, Zerui Tao, Yuning Qiu, Dongping Zhang, Yang Yang, Qibin Zhao

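Affiliations * University of Science and Technology of China

AI summary: To address CLIP's vulnerability to adversarial attacks and its poor uncertainty calibration in zero-shot classification, proposes an adversarial fine-tuning objective based on a Dirichlet reparameterization that jointly aligns semantic structure and confidence, improving calibration and robustness.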

Comments ICML 2026

Abstract

CLIP delivers strong zero-shot classification but remains highly vulnerable to adversarial attacks. Prior adversarial fine-tuning work primarily matches predicted logits between clean and adversarial examples, which overlooks uncertainty calibration and may degrade the zero-shot generalization. A common expectation in reliable uncertainty estimation is that predictive uncertainty should increase as inputs become more difficult or shift away from the training distribution. However, we frequently observe the opposite in the adversarial setting: perturbations not only degrade accuracy but also suppress uncertainty, leading to severe miscalibration and over-confidence. This reveals a critical reliability gap beyond robustness. To bridge this gap, we propose an adversarial fine-tuning objective for CLIP considering both accuracy and uncertainty. By reparameterizing CLIP outputs as the concentration parameters of a Dirichlet distribution, we propose a unified representation that captures relative semantic structure and confidence magnitude. This enables holistic distribution alignment under perturbations, moving beyond single-logit anchoring and restoring calibrated uncertainty. Experiments across multiple zero-shot benchmarks demonstrate that our method significantly improves uncertainty calibration and achieves competitive adversarial robustness while preserving clean accuracy.
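A small sketch may help show why a Dirichlet view separates "relative semantic structure" from "confidence magnitude". It assumes the link alpha_k = exp(logit_k), which is an illustrative choice and not necessarily the paper's; `dirichlet_from_logits` is a hypothetical name.

```python
import math

def dirichlet_from_logits(logits):
    """Map logits to Dirichlet concentration parameters.

    Assumed link (for illustration only): alpha_k = exp(logit_k).
    Returns (alpha, alpha0, mean), where alpha0 is the total concentration
    and mean is the expected class-probability vector alpha / alpha0.
    """
    alpha = [math.exp(z) for z in logits]
    alpha0 = sum(alpha)
    mean = [a / alpha0 for a in alpha]
    return alpha, alpha0, mean

# Two logit vectors that differ only by a constant shift.
_, strength_low, mean_low = dirichlet_from_logits([2.0, 1.0, 0.0])
_, strength_high, mean_high = dirichlet_from_logits([4.0, 3.0, 2.0])
```

Under this link the Dirichlet mean equals the softmax, so both inputs give the same class probabilities, while the total concentration differs by a factor of e². A loss that only matches softmax outputs cannot see that difference, which is the kind of confidence information a distribution-level alignment can use.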

2512.13278 2026-06-08 cs.CL cs.LG Updated version

AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning

Jiaru Zou, Ling Yang, Yunzhe Qi, Sirui Chen, Mengting Ai, Ke Shen, Jingrui He, Mengdi Wang

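Affiliations * Nanyang Technological University

AI summary: Proposes AutoTool, a framework that gives LLMs dynamic tool-selection ability through dual-phase optimization (SFT and RL trajectory stabilization, then KL-regularized Plackett-Luce ranking), with average gains of 6.4% to 7.7% on math and science, code generation, and multimodal tasks.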

Comments ICML2026; Best Paper Award at ICCV 2025 Workshop on Multi-Modal Reasoning for Agentic Intelligence

Abstract

Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume a fixed inventory of tools, which limits the adaptability of LLM agents to new or evolving toolsets. We present AutoTool, a training framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories. AutoTool employs a dual-phase optimization pipeline: (i) SFT and RL-based trajectory stabilization for coherent reasoning, and (ii) KL-regularized Plackett-Luce Ranking to refine consistent multi-step tool selection. We further build a 200k dataset with explicit tool-selection rationales across 1,000+ tools and 100+ tasks spanning mathematics, science, code generation, and multimodal reasoning. Across ten diverse benchmarks, we train two base models, Qwen3-8B and Qwen2.5-VL-7B, with AutoTool. With fewer parameters, AutoTool consistently outperforms advanced LLM agents and tool-integration methods, yielding average gains of 6.4% in math & science reasoning, 4.5% in search-based QA, 7.7% in code generation, and 6.9% in multimodal understanding. In addition, AutoTool exhibits stronger generalization by dynamically leveraging unseen tools from evolving toolsets during inference.
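For readers unfamiliar with the Plackett-Luce model named in this abstract, the following sketch computes the log-probability of a ranking of candidate tools given per-tool scores. It covers only the standard Plackett-Luce likelihood; the KL regularization and the training pipeline are not reproduced, and the function names are hypothetical.

```python
import math
from itertools import permutations

def plackett_luce_log_prob(scores, ranking):
    """log P(ranking | scores) under the Plackett-Luce model.

    ranking lists item indices from most to least preferred. At each
    position the next item is drawn by a softmax over the items not yet placed.
    """
    remaining = list(ranking)
    log_p = 0.0
    for item in ranking:
        log_norm = math.log(sum(math.exp(scores[j]) for j in remaining))
        log_p += scores[item] - log_norm
        remaining.remove(item)
    return log_p

def total_probability(scores):
    """Sum of P(ranking) over all rankings; should be 1 for a valid model."""
    items = range(len(scores))
    return sum(math.exp(plackett_luce_log_prob(scores, list(p)))
               for p in permutations(items))
```

Maximizing this log-probability for preferred tool orderings pushes the scores of better tools up relative to the rest, which is one way a ranking objective can refine multi-step tool selection.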

2512.23292 2026-06-08 cs.AI cs.LG Updated version

Agentic Physical AI toward a Domain-Specific Foundation Model for Energy Systems: A Case Study on Nuclear Reactor Control

Yoon Pyo Lee, Samrendra Roy, Kazuma Kobayashi, Sajedul Talukder, Diab Abueidda, Seid Koric, Souvik Chakraborty, Syed Bahauddin Alam

Affiliations * The Grainger College of Engineering, Nuclear, Plasma & Radiological Engineering, University of Illinois Urbana-Champaign; Department of Nuclear Engineering, Hanyang University; University of Texas - El Paso; National Center for Supercomputing Applications; Department of Applied Mechanics, Indian Institute of Technology Delhi; Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi

AI summary: Proposes using compact language models as Agentic Physical AI, replacing perception-based reasoning with policy optimization validated against a physics simulator, to realize a domain-specific foundation model for nuclear reactor control, and shows that scaling improves reliability and concentrates the policy's behavior.

MACD: Model-Aware Contrastive Decoding Based on Counterfactual Data

Qixin Xiao, Kun Zhou

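Affiliations * University of Michigan, Ann Arbor, MI, USA; University of California San Diego, La Jolla, CA, USA

AI summary: Proposes MACD, which uses a video language model's own feedback to identify the object regions responsible for hallucination, generates object-level counterfactual inputs, and combines them with contrastive decoding to reduce hallucination and improve accuracy across multiple models in complex scenes.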

Abstract

Video language models (Video-LLMs) are prone to hallucinations, generating plausible but ungrounded content when visual evidence is weak, ambiguous, or biased. Existing methods, such as contrastive decoding (CD), rely on random perturbations to construct contrastive data for hallucination mitigation, but often fail to target the visual cues that drive hallucination or align with model weaknesses. We propose Model-Aware Counterfactual Data based Contrastive Decoding (MACD), an inference strategy that combines model-guided counterfactual construction with contrastive decoding. MACD uses the Video-LLM's own feedback to identify object regions most responsible for hallucination, generating targeted object-level counterfactual inputs rather than arbitrary frame or temporal modifications. These counterfactual inputs are integrated into CD to enforce evidence-grounded token selection during decoding. Experiments on EventHallusion, MVBench, Perception-test, and Video-MME show that MACD consistently reduces hallucination while maintaining or improving task accuracy across diverse Video-LLMs, including Qwen and InternVL, with especially strong gains in scenarios involving small, occluded, or co-occurring objects.
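The contrastive decoding step can be sketched as follows. The combination rule (1 + alpha) · original − alpha · counterfactual is a commonly used form and is assumed here; the abstract does not give MACD's exact formula, and the vocabulary and logit values below are made up for illustration.

```python
def contrastive_logits(original, counterfactual, alpha=1.0):
    """Boost tokens supported by the original input and penalize tokens the
    model would predict even from the counterfactual input.

    Assumed rule (not confirmed by the source):
    (1 + alpha) * original - alpha * counterfactual.
    """
    return [(1.0 + alpha) * o - alpha * c for o, c in zip(original, counterfactual)]

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

vocab = ["cat", "dog", "ball"]
original = [2.0, 2.1, 0.5]        # hypothetical logits given the real video
counterfactual = [0.5, 2.0, 0.4]  # hypothetical logits after the key object is edited out
adjusted = contrastive_logits(original, counterfactual, alpha=1.0)
```

In this toy case "dog" scores highly with or without the object, so it is likely driven by priors rather than visual evidence. After contrasting, "cat", whose score depends on the object being present, becomes the top token.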

2602.02014 2026-06-08 cs.CV cs.AI cs.CL cs.LG Updated version

Rethinking Genomic Modeling Through Optical Character Recognition

Hongxin Xiang, Pengsen Ma, Yunkang Cao, Di Yu, Haowen Chen, Xinyu Yang, Xiangxiang Zeng

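Affiliations * National University of Singapore; University of Science and Technology of China

AI summary: Proposes OpticalDNA, a framework that renders DNA into visual layouts and uses a vision-language model for OCR-style genomic understanding, achieving high-fidelity compression and efficient long-sequence processing; on sequences up to 450k bases it outperforms baselines with nearly 20x fewer effective tokens.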

Comments Accepted by ICML 2026

Abstract

Recent genomic foundation models largely adopt large language model architectures that treat DNA as a one-dimensional token sequence. However, exhaustive sequential reading is structurally misaligned with sparse and discontinuous genomic semantics, leading to wasted computation on low-information background and preventing understanding-driven compression for long contexts. Here, we present OpticalDNA, a vision-based framework that reframes genomic modeling as Optical Character Recognition (OCR)-style document understanding. OpticalDNA renders DNA into structured visual layouts and trains an OCR-capable vision-language model with a visual DNA encoder and a document decoder, where the encoder produces compact, reconstructible visual tokens for high-fidelity compression. Building on this representation, OpticalDNA defines prompt-conditioned objectives over core genomic primitives (reading, region grounding, subsequence retrieval, and masked span completion), thereby learning layout-aware DNA representations that retain fine-grained genomic information under a reduced effective token budget. Across diverse genomic benchmarks, OpticalDNA consistently outperforms recent baselines; on sequences up to 450k bases, it achieves the best overall performance with nearly 20$\times$ fewer effective tokens, and surpasses models with up to 985$\times$ more activated parameters while tuning only 256k trainable parameters.

2602.06245 2026-06-08 stat.ML cs.LG Updated version

Inheritance Between Feedforward and Convolutional Networks via Model Projection

Nicolas Ewen, Jairo Diaz-Rodriguez, Kelly Ramsay

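Affiliations * Department of Mathematics and Statistics

AI summary: Introduces the notion of inheritance between model classes, proves that generalized feedforward networks form a strict subset of generalized convolutional networks, and uses model projection to recover a restricted reverse inheritance path for parameter-efficient transfer learning.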

Abstract

Neural-network techniques are often transferred across architecture families by analogy, but such transfer is valid only when the assumptions required by a technique are preserved. We introduce this idea as inheritance between model classes. Using a unified node-level framework with tensor-valued activations, we prove that generalized feedforward networks (GFFNs) form a strict subset of generalized convolutional networks (GCNNs), so GCNN properties transfer directly to GFFNs. The reverse direction is not automatic: standard CNN nodes use spatial kernels, while FFN nodes use one scalar weight per input contribution. We introduce model projection to recover a restricted reverse inheritance path. Projection freezes each convolutional input-channel sub-function and learns one scalar coefficient for each input-output channel contribution, giving projected CNN nodes the GFFN-style trainable structure of scalar-weighted input recombination. This inherited structure leads naturally to parameter-efficient transfer learning. Across multiple ImageNet-pretrained CNN backbones and downstream image-classification datasets, model projection is competitive with standard and PEFT baselines and provides an effective initialization for subsequent full fine-tuning.

2602.09580 2026-06-08 cs.RO cs.LG Updated version

SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows

Chenyu Yang, Denis Tarasov, Davide Liconti, Romain Guntz, Hehui Zheng, Robert K. Katzschmann

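Affiliations * Soft Robotics Lab, D-MAVT; ETH Zurich

AI summary: Proposes SERNF, a framework that combines a normalizing-flow policy with an action-chunked critic for sample-efficient real-world fine-tuning of dexterous manipulation policies, addressing multimodal action distributions and credit assignment.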

Comments https://srl-ethz.github.io/SERNF/

Abstract

Real-world fine-tuning of dexterous manipulation policies remains challenging due to limited real-world interaction budgets and highly multimodal action distributions. Diffusion-based policies, while expressive, do not permit conservative likelihood-based updates during fine-tuning because action probabilities are intractable. In contrast, conventional Gaussian policies collapse under multimodality, particularly when actions are executed in chunks, and standard per-step critics fail to align with chunked execution, leading to poor credit assignment. We present SERNF, a sample-efficient off-policy fine-tuning framework with normalizing flows (NF) to address these challenges. The normalizing flow policy yields exact likelihoods for multimodal action chunks, allowing conservative, stable policy updates through likelihood regularization and thereby improving sample efficiency. An action-chunked critic evaluates entire action sequences, aligning value estimation with the policy's temporal structure and improving long-horizon credit assignment. To our knowledge, this is the first demonstration of a likelihood-based, multimodal generative policy combined with chunk-level value learning on real robotic hardware. We evaluate SERNF on two challenging dexterous manipulation tasks in the real world: cutting tape with scissors retrieved from a case, and in-hand cube rotation with a palm-down grasp, both of which require precise, dexterous control over long horizons. On these tasks, SERNF achieves stable, sample-efficient adaptation where standard methods struggle.

2602.15084 2026-06-08 physics.plasm-ph cs.AI cs.LG Updated version

TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics

Tobia Boschi, Andrea Loreti, Nicola C. Amorisco, Rodrigo H. Ordonez-Hurtado, Cécile Rousseau, George K. Holt, Eszter Székely, Alexander Whittle, Samuel Jackson, Adriano Agnello, Stanislas Pamela, Alessandra Pascale, Robert Akers, Juan Bernabe Moreno, Vassil Alexandrov, Mykhaylo Zayats

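Affiliations * IBM Research; UK Atomic Energy Authority; STFC Hartree Centre

AI summary: Presents TokaMind, described by the authors as, to their knowledge, the first open-source foundation model for tokamak plasma dynamics. It is a Multi-Modal Transformer pretrained on the MAST dataset that supports multiple data modalities and missing-signal handling, and after fine-tuning it outperforms the strongest baseline on all but one of the 14 TokaMark tasks.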

Abstract

We present TokaMind, to our knowledge the first open-source foundation model for tokamak plasma dynamics, based on a Multi-Modal Transformer (MMT) and pretrained on heterogeneous diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selectively loading and freezing four model components. To represent multi-modal signals, we use a lightweight fixed-basis Discrete Cosine Transform embedding (DCT3D) and provide a clean interface for alternative embeddings (e.g., Variational Autoencoders). We evaluate TokaMind on the recently introduced MAST benchmark TokaMark, which comprises 14 tasks with heterogeneous reconstruction and forecasting objectives. Our results show that fine-tuned TokaMind outperforms the strongest benchmark baseline on all but one task. Compared with training the same architecture from scratch under a matched epoch budget, warm-start adaptation is most beneficial on demanding downstream settings, including long-horizon forecasting and high-dimensional equilibrium objectives. These findings highlight the value of multi-modal pretraining for tokamak plasma dynamics and provide a practical, extensible foundation for future fusion modeling tasks. Training code and model weights are publicly available at github.com/UKAEA-IBM-STFC-Fusion-FMs/tokamind and huggingface.co/UKAEA-IBM-STFC, respectively.

2602.16908 2026-06-08 cond-mat.mtrl-sci cs.LG quant-ph Updated version

Multi-objective optimization and quantum hybridization of equivariant deep learning interatomic potentials

G. Laskaris, D. Morozov, D. Tarpanov, A. Seth, J. Procelewska, G. Sai Gautam, A. Sagingalieva, R. Brasher, A. Melnikov

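Affiliations * Terra Quantum AG; LIACS, Leiden University; Nanoscience Center and Department of Chemistry, University of Jyväskylä; Department of Materials Engineering, Indian Institute of Science; Schaeffler Technologies AG & Co. KG

AI summary: To address the trade-off between accuracy and inference time in the Allegro model, applies multi-objective hyperparameter optimization and builds two variants, one with additional classical layers and one with quantum-classical hybrid layers; the hybrid variant achieves the best force-prediction accuracy on the Cu-Li dataset and competitive results on the other datasets.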

Comments 15 pages, 7 figures, 6 tables

DOI: 10.1016/j.commatsci.2026.114742
Journal ref: Comput. Mater. Sci. 270, 114742 (2026)

Abstract

Allegro is a machine learning interatomic potential model designed to predict atomic properties in molecules using E(3) equivariant neural networks. When training this model, there tends to be a trade-off between accuracy and inference time. For this reason, we apply multi-objective hyperparameter optimization to both objectives. Additionally, we experiment with modified architectures by constructing variants of Allegro: one extended with additional classical layers and one incorporating quantum-classical hybrid layers. We evaluate all models on QM9, rMD17-aspirin, rMD17-benzene, and a self-generated dataset of copper-lithium structures. Both variants surpass Allegro in force prediction accuracy across multiple datasets. The classical variant consistently improves over the baseline, while the quantum-classical hybrid variant achieves the best overall force prediction accuracy on the Cu-Li dataset, where it was fully optimized, outperforming the classical variant by approximately 13%. Notably, the hybrid variant also achieves competitive results on the remaining datasets despite using hyperparameters transferred from Cu-Li without dataset-specific optimization, suggesting that quantum-classical hybridization is a promising direction for enhancing MLIP architectures.

2603.06673 2026-06-08 cs.CV cs.LG Updated version

Unmixing ATR-μFTIR spectroscopic images of cross-sections of historical oil paintings

Shivam Pande, Nicolas Nadisic, Francisco Mederos-Henry, Aleksandra Pizurica

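Affiliations * Belgian Federal Science Policy; FED-tWIN project; Prf-2022-050 BALaTAI; Prf-2021-002 MatCoRe

AI summary: Proposes an unsupervised CNN autoencoder with a weighted spectral angle distance loss for unmixing ATR-μFTIR hyperspectral images, estimating endmember spectra and abundance maps and improving interpretability in contamination-prone spectral regions.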

Comments 5 pages, accepted at EUSIPCO 2026

Abstract

Spectroscopic imaging (SI) has become central to heritage science because it enables non-invasive, spatially resolved characterisation of materials in artefacts. In particular, attenuated total reflection Fourier transform infrared microscopy (ATR-$μ$FTIR) is widely used to analyse painting cross-sections, where a spectrum is recorded at each pixel to form a hyperspectral image (HSI). Interpreting these data is difficult: spectra are often mixtures of several species in heterogeneous, multi-layered and degraded samples, and current practice still relies heavily on manual comparison with reference libraries. This workflow is slow, subjective and hard to scale. We propose an unsupervised CNN autoencoder for blind unmixing of ATR-$μ$FTIR HSIs, estimating endmember spectra and their abundance maps while exploiting local spatial structure through patch-based modelling. To reduce sensitivity to atmospheric and acquisition artefacts across more than 1500 bands, we introduce a weighted spectral angle distance (WSAD) loss with automatic band-reliability weights derived from robust measures of spatial flatness, neighbour agreement and spectral roughness. Compared with standard SAD training, WSAD improves interpretability in contamination-prone spectral regions. We demonstrate the method on an ATR-$μ$FTIR cross-section from the Ghent Altarpiece by the Van Eyck brothers.
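One plausible form of a weighted spectral angle distance is sketched below: per-band weights enter the inner product and both norms, so unreliable bands contribute less to the angle. The exact WSAD definition and the automatic weight estimation are not given in the abstract, so this formula and the name `weighted_sad` are assumptions.

```python
import math

def weighted_sad(x, y, w):
    """Angle (radians) between spectra x and y under per-band weights w.

    Assumed form: arccos( sum(w x y) / (sqrt(sum(w x^2)) * sqrt(sum(w y^2))) ).
    With all weights equal this reduces to the standard spectral angle distance.
    """
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    nx = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    ny = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    cos = max(-1.0, min(1.0, num / (nx * ny)))  # clip against rounding error
    return math.acos(cos)

# Two spectra that agree except in the last band, which is treated as contaminated.
clean = [1.0, 2.0, 3.0, 0.0]
contaminated = [1.0, 2.0, 3.0, 10.0]
uniform = [1.0, 1.0, 1.0, 1.0]
downweighted = [1.0, 1.0, 1.0, 0.0]
```

With uniform weights the artefact in the last band dominates the angle. With that band's weight set to zero the two spectra are judged identical, which is the behaviour a band-reliability weighting aims for.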

2603.06915 2026-06-08 cs.CL cs.LG Updated version

A Dynamic Self-Evolving Extraction System

Moin Amin-Naseri, Hannah Kim, Estevam Hruschka

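Affiliations * Megagon Labs

AI summary: Proposes DySECT, a system that builds a knowledge base from LLM-extracted triples, enriches it with probabilistic knowledge and graph-based reasoning, and feeds it back to improve the extractor, forming a closed loop of continual improvement.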

Abstract

The extraction of structured information from raw text is a fundamental component of many NLP applications, including document retrieval, ranking, and relevance estimation. High-quality extractions often require domain-specific accuracy, up-to-date understanding of specialized taxonomies, and the ability to incorporate emerging jargon and rare outliers. In many domains, such as medical, legal, and HR, the extraction model must also adapt to shifting terminology and benefit from explicit reasoning over structured knowledge. We propose DySECT, a Dynamic Self-Evolving Extraction and Curation Toolkit, which continually improves as it is used. The system incrementally populates a versatile, self-expanding knowledge base (KB) with triples extracted by the LLM. The KB further enriches itself through the integration of probabilistic knowledge and graph-based reasoning, gradually accumulating domain concepts and relationships. The enriched KB then feeds back into the LLM extractor via prompt tuning, sampling of relevant few-shot examples, or fine-tuning using KB-derived synthetic data. As a result, the system forms a symbiotic closed-loop cycle in which extraction continuously improves knowledge, and knowledge continuously improves extraction.

2603.08683 2026-06-08 cs.SD cs.AI cs.LG eess.AS Updated version

Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

Phillip Long, Zachary Novack, Chris Donahue

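Affiliations * University of California, San Diego, Computer Science and Engineering Department; Carnegie Mellon University, School of Computer Science

AI summary: Proposes Trilobyte, a byte-level tokenization scheme that reduces vocabulary scaling from exponential in bit depth to constant, enables the first tractable 24-bit LM-based lossless audio compression, and outperforms FLAC at 8-bit and 16-bit.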

Comments Accepted at Interspeech 2026, 7 pages, 5 figures

Abstract

Autoregressive "language" models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leaving open whether such approaches work for practical settings (16/24-bit) and can compete with existing codecs. We benchmark LM-based compression on full-fidelity audio across diverse domains (music, speech, bioacoustics), sampling rates (16kHz-48kHz), and bit depths (8, 16, 24-bit). Standard sample-level tokenization becomes intractable at higher bit depths due to vocabulary size (65K for 16-bit; 16.7M for 24-bit). We propose Trilobyte, a byte-level tokenization schema for full resolution audio, improving vocabulary scaling from $O(2^{b})$ to $O(1)$ and enabling the first tractable 24-bit LM-based lossless compression. While LMs consistently outperform FLAC and yield state-of-the-art compression at 8-bit and 16-bit, we observe that compression gains become more modest as bit depth increases beyond 8-bit.
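The vocabulary argument can be made concrete with a short sketch. Sample-level tokens need 2^b symbols (65,536 at 16-bit, 16,777,216 at 24-bit), while splitting each sample into bytes keeps the vocabulary at 256 regardless of bit depth. The offset-binary mapping and big-endian byte order below are assumptions for illustration; the abstract does not specify Trilobyte's exact layout, and the function names are hypothetical.

```python
def sample_to_bytes(sample, bit_depth):
    """Split one signed PCM sample into bit_depth // 8 byte tokens (0..255).

    Assumptions: offset-binary encoding and most-significant byte first.
    """
    offset = 1 << (bit_depth - 1)
    unsigned = sample + offset
    if not 0 <= unsigned < (1 << bit_depth):
        raise ValueError("sample out of range for this bit depth")
    return list(unsigned.to_bytes(bit_depth // 8, "big"))

def bytes_to_sample(tokens, bit_depth):
    """Inverse of sample_to_bytes; losslessly recovers the signed sample."""
    offset = 1 << (bit_depth - 1)
    return int.from_bytes(bytes(tokens), "big") - offset

def vocab_size_sample_level(bit_depth):
    """Number of distinct tokens if each whole sample is one token."""
    return 1 << bit_depth
```

The trade-off is sequence length: a 24-bit sample becomes three tokens instead of one, so the model sees a longer sequence in exchange for a constant-size vocabulary.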

2603.19146 2026-06-08 cs.AI cs.LG Updated version

D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding

Jonathan Lys, Vincent Gripon, Axel Marmoret, Lukas Mauch, Fabien Cardinaux, Ghouthi Boukli Hacene, Bastien Pasdeloup

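Affiliations * IMT Atlantique, Lab-STICC, UMR CNRS 6285; Sony Europe Ltd. Stuttgart Technology Center, EUREC

AI summary: Proposes D5P4, a beam-style decoding method that uses a partitioned determinantal point process to select intermediate sequences in discrete diffusion models, balancing quality and diversity without external verifiers.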

Abstract

Discrete diffusion models are promising alternatives to autoregressive approaches for text generation, yet their decoding methods remain under-studied. Standard autoregressive search procedures, such as beam search, do not directly apply to iterative denoising, where hypotheses are complete intermediate sequences rather than left-to-right prefixes. Furthermore, existing diffusion decoding procedures only provide limited control over the diversity and coverage of retained hypotheses. In this work, we introduce D5P4, a beam-style decoding method tailored to discrete diffusion models, which casts intermediate beam selection as MAP inference under a partitioned Determinantal Point Process. This yields a model-internal batch objective that balances quality and diversity without external verifiers. Experiments on open-ended generation, question answering, and mathematical reasoning show that D5P4 improves diversity and pass@$k$ coverage while matching or surpassing baseline quality and fidelity.

2603.20967 2026-06-08 stat.ML cs.LG math.ST stat.TH Updated version

Hard labels sampled from sparse targets mislead rotation invariant algorithms

Avrajit Ghosh, Bin Yu, Manfred Warmuth, Peter Bartlett

Affiliations: University of California, Berkeley; University of Wisconsin, Madison

AI summary: For binary classification with sparse targets, proves that rotation-invariant algorithms (e.g., gradient descent on the logistic loss) incur an excess-risk lower bound of Ω((d-1)/n), whereas a non-rotation-invariant algorithm using the reparameterization u_i v_i achieves an upper bound of O(s log d / n).

Journal ref: ICML-2026

Abstract

One of the most common machine learning setups is logistic regression. In many classification models, including neural networks, the final prediction is obtained by applying a logistic link function to a linear score. In binary logistic regression, the feedback can be either soft labels, corresponding to the true conditional probability of the data (as in distillation), or sampled hard labels (taking values $\pm 1$). We point out a fundamental problem that arises even in a particularly favorable setting, where the goal is to learn a noise-free soft target of the form $\sigma(\mathbf{x}^{\top}\mathbf{w}^{\star})$. In the over-constrained case (i.e. the number of samples $n$ exceeds the input dimension $d$) with examples $(\mathbf{x}_i,\sigma(\mathbf{x}_i^{\top}\mathbf{w}^{\star}))$, it is sufficient to recover $\mathbf{w}^{\star}$ and hence achieve the Bayes risk. However, we prove that when the examples are labeled by hard labels $y_i$ sampled from the same conditional distribution $\sigma(\mathbf{x}_i^{\top}\mathbf{w}^{\star})$ and $\mathbf{w}^{\star}$ is $s$-sparse, then rotation-invariant algorithms are provably suboptimal: they incur an excess risk $\Omega\!\left(\frac{d-1}{n}\right)$, while there are simple non-rotation-invariant algorithms with excess risk $O(\frac{s\log d}{n})$. The simplest rotation-invariant algorithm is gradient descent on the logistic loss (with early stopping). A simple non-rotation-invariant algorithm for sparse targets that achieves the above upper bounds uses gradient descent on the weights $u_i,v_i$, where now the linear weight $w_i$ is reparameterized as $u_iv_i$.
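The two update rules contrasted in this abstract can be sketched as follows, assuming the mean logistic loss over hard labels in {-1, +1}. The function names, the step size, and the choice of starting point are illustrative assumptions; the abstract does not specify the paper's initialization or stopping rule.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_grad(w, xs, ys):
    """Gradient of the mean logistic loss log(1 + exp(-y * x.w)) with respect to w."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coeff = -y * sigmoid(-margin) / len(xs)
        for i, xi in enumerate(x):
            grad[i] += coeff * xi
    return grad


def plain_step(w, xs, ys, lr):
    """One gradient-descent step directly on w (the rotation-invariant baseline)."""
    g = logistic_grad(w, xs, ys)
    return [wi - lr * gi for wi, gi in zip(w, g)]


def uv_step(u, v, xs, ys, lr):
    """One gradient-descent step on (u, v) with w_i = u_i * v_i, via the chain rule."""
    w = [ui * vi for ui, vi in zip(u, v)]
    g = logistic_grad(w, xs, ys)
    new_u = [ui - lr * gi * vi for ui, vi, gi in zip(u, v, g)]
    new_v = [vi - lr * gi * ui for ui, vi, gi in zip(u, v, g)]
    return new_u, new_v
```

Because each coordinate's update is scaled by the current size of its own factors, the reparameterized dynamics depend on the coordinate system, which is what breaks rotation invariance.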

2603.22278 2026-06-08 cs.CV cs.LG Updated version

The Dual Mechanisms of Spatial Variable Binding in Vision-Language Models

Kelly Cui, Nikhil Prakash, Shoval Messica, Ayush Raina, David Bau, Antonio Torralba, Tamar Rott Shaham

Affiliations: MIT CSAIL; Northeastern University; Sony Playstation

AI summary: Shows that vision-language models implement spatial variable binding through two mechanisms, content-independent spatial-relation encoding in the language backbone and a global layout representation in the vision encoder, with the vision encoder playing the dominant role.

Comments 37 pages, 53 figures

Abstract

Many multimodal tasks, such as image captioning and visual question answering, require vision-language models (VLMs) to bind objects with their properties and spatial relations. Yet it remains unclear where and how such associations are computed within VLMs. In this work, we show that VLMs rely on two concurrent mechanisms to represent spatial variable binding. In the language model backbone, intermediate layers represent content-independent spatial relations on top of visual tokens corresponding to objects. However, this mechanism plays only a secondary role in shaping model predictions. Instead, the dominant source of spatial information originates in the vision encoder, whose representations encode the layout of objects and are directly exploited by the language model backbone. Notably, this spatial signal is distributed globally across visual tokens, extending beyond object regions into surrounding background areas. We show that enhancing these vision-derived spatial representations globally across all image tokens improves spatial variable binding performance across models of various sizes on complex natural images from the COCO datasets. Together, our results clarify how spatial variable binding is computed within VLMs and highlight the central role of vision encoders in enabling it.

2603.24481 2026-06-08 cs.AI cs.CL cs.LG Updated version

Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA

John Ray B. Martinez

Affiliations: Department of Data Science and Analytics

AI summary: Proposes a multi-agent framework that combines domain-specialist agents with two-phase verification and S-score weighted fusion, substantially reducing calibration error and improving discrimination in medical MCQA.

Comments 20 pages, 6 figures. Preprint under review

Abstract

Miscalibrated confidence scores are a practical obstacle to deploying AI in clinical settings. A model that is always overconfident offers no useful signal for deferral. We present a multi-agent framework that combines domain-specific specialist agents with Two-Phase Verification (Wu et al., 2024) and S-Score Weighted Fusion to improve both calibration and discrimination in medical multiple-choice question answering. Four specialist agents (respiratory, cardiology, neurology, gastroenterology) generate independent diagnoses using Qwen2.5-7B-Instruct. Each diagnosis undergoes a two-phase self-verification process that measures internal consistency and produces a Specialist Confidence Score (S-score). The S-scores drive a weighted fusion strategy that selects the final answer and calibrates the reported confidence. We evaluate on high-disagreement subsets of MedQA-USMLE and MedMCQA (100 and 250 questions). All results are specific to this filtered regime. On MedQA-250, the full system achieves ECE = 0.091 (74.4% reduction over the single-specialist baseline) and AUROC = 0.630 (+0.056) at 59.2% accuracy. Calibration gains of 49-74% hold across all four settings. Ablation analysis reveals that Two-Phase Verification drives ECE reduction while multi-agent reasoning drives AUROC improvement, suggesting that consistency checking and ensemble aggregation address different failure modes of LLM uncertainty. Whether the resulting confidence signal is sufficient to support clinical deferral decisions in practice remains a direction for future investigation.
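For readers unfamiliar with the ECE figures quoted above, the following is a minimal sketch of one common definition of expected calibration error, using equal-width confidence bins. The bin count and the half-open bin boundaries are assumptions; the abstract does not state which binning the paper uses.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between average confidence and accuracy per confidence bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece
```

A model that reports 0.9 confidence on every question but answers only half of them correctly has an ECE of 0.4 under this definition.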

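2603.24963 2026-06-08 cs.AI cs.LG Updated version

Design Once, Deploy at Scale: Template-Driven ML Development for Large Model Ecosystems

Jiang Liu, John Martabano Landy, Yao Xuan, Swamy Muddu, Nhat Le, Munaf Sahaf, Luc Kien Hang, Rupinder Khandpour, Kevin De Angeli, Chang Yang, Shouyuan Chen, Shiblee Sadik, Anirudh Agrawal, Djordje Gligorijevic, Jingzheng Qin, Peggy Yao, Alireza Vahdatpour

Affiliations: Meta AI

AI summary: To address inefficient ML development in large model ecosystems, proposes the Standard Model Template (SMT) framework, which reduces technique propagation complexity from O(n·2^k) to O(n+k); in Meta's ads ranking ecosystem it yields a 0.63% average improvement in cross-entropy, a 92% reduction in per-model iteration engineering time, and a 6.3× increase in technique-model pair adoption throughput.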

Abstract

Modern computational advertising platforms typically rely on recommendation systems to predict user responses, such as click-through rates, conversion rates, and other optimization events. To support a wide variety of product surfaces and advertiser goals, these platforms frequently maintain an extensive ecosystem of machine learning (ML) models. However, operating at this scale creates significant development and efficiency challenges. Substantial engineering effort is required to regularly refresh ML models and propagate new techniques, which results in long latencies when deploying ML innovations across the ecosystem. We present a large-scale empirical study comparing model performance, efficiency, and ML technique propagation between a standardized model-building approach and independent per-model optimization in recommendation systems. To facilitate this standardization, we propose the Standard Model Template (SMT) -- a framework that generates high-performance models adaptable to diverse data distributions and optimization events. By utilizing standardized, composable ML model components, SMT reduces technique propagation complexity from $O(n \cdot 2^k)$ to $O(n + k)$ where $n$ is the number of models and $k$ the number of techniques. Evaluating an extensive suite of models over four global development cycles within Meta's production ads ranking ecosystem, our results demonstrate: (1) a 0.63% average improvement in cross-entropy at neutral serving capacity, (2) a 92% reduction in per-model iteration engineering time, and (3) a $6.3\times$ increase in technique-model pair adoption throughput. These findings challenge the conventional wisdom that diverse optimization goals inherently require diversified ML model design.
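The gap between the two complexity expressions in this abstract is easy to see numerically. The sketch below simply evaluates $n \cdot 2^k$ and $n + k$; the function names and the example values of `n` and `k` are illustrative assumptions, not figures from the paper.

```python
def per_model_effort(n_models: int, k_techniques: int) -> int:
    """O(n * 2^k): each model is tuned against every subset of techniques."""
    return n_models * 2 ** k_techniques


def template_effort(n_models: int, k_techniques: int) -> int:
    """O(n + k): each technique is added to the template once, each model is built once."""
    return n_models + k_techniques


if __name__ == "__main__":
    for n, k in [(10, 3), (100, 5), (100, 10)]:
        print(n, k, per_model_effort(n, k), template_effort(n, k))
```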

2604.03146 2026-06-08 stat.ML cs.LG Updated version

Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

Chiheb Yaakoubi, Cosme Louart, Malik Tiomoko, Zhenyu Liao

Affiliations: School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; Huawei Noah's Ark Lab, Huawei Technologies, Paris, France; School of Electronic Information and Communications, Huazhong University of Science & Technology, China

AI summary: By extending the Convex Gaussian Min-Max Theorem to non-Gaussian data, characterizes the asymptotic distribution of high-dimensional empirical risk minimization estimators and clarifies the scope and limits of Gaussian universality.

Comments 28 pages, 5 figures, 1 table

Journal ref: ICML 2026

Abstract

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean $\mu_{\hat{\theta}}$ and covariance $C_{\hat{\theta}}$ of the ERM estimator $\hat{\theta}$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hat{\theta}^\top x$ approximately follows the convolution of the generally non-Gaussian distribution of $\mu_{\hat{\theta}}^\top x$ with an independent centered Gaussian variable of variance $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at $\mu_{\hat{\theta}}$. Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.

2604.17433 2026-06-08 cs.CL cs.AI cs.LG Updated version

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

Raman Saparkhan, Majd Hawasly, Md Rizwan Parvez, Mohammad Raza

Affiliations: Carnegie Mellon University Qatar; Qatar Computing Research Institute

AI summary: Proposes a hybrid ensembling method that combines chain-of-thought and program-of-thought reasoning, attains self-consistency from only two samples, reduces sampling by 9.3×, and achieves the best result on 78.6% of tasks.

Comments 9 pages, 3 figures; accepted to Findings of ACL 2026

2604.23025 2026-06-08 cs.CR cs.LG Updated version

Self-Supervised Learning for Android Malware Detection on a Time-Stamped Dataset

Annan Fu, Hao Pei, Maryam Tanha

Affiliations: Mastercard Canada

AI summary: Addresses temporal bias in machine-learning detectors by building a time-stamped dataset and applying BYOL self-supervised pretraining, reaching 98% accuracy and an 89% F1 score under time-aware evaluation.

Comments Accepted for publication in IEEE ICC 2026. © 2026 IEEE

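2605.06647 2026-06-08 cs.IR cs.AI cs.LG Updated version

Physics-Guided Generative Optimization of Trotter-Suzuki Decompositions

WenBin Yan

Affiliations: University of Colorado Boulder

AI summary: Proposes P-GONE, which combines a conditional diffusion model, a graph neural network, and REINFORCE fine-tuning to jointly optimize term grouping, order, and time-step allocation in Trotter-Suzuki decompositions, achieving 19.4× circuit-depth compression at fidelity ≥ 0.95.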

Abstract

Trotter-Suzuki product formulas are the standard route to Hamiltonian evolution on noisy intermediate-scale quantum (NISQ) hardware, but their accuracy depends on three coupled choices: term grouping, product-formula order, and time-step allocation. Grouping and order are discrete, which makes direct gradient optimization infeasible and forces existing compilers to rely on static heuristics. We describe P-GONE, a method that combines a conditional diffusion model (D3PM + DDPM), a graph neural network (GNN) encoder, and closed-loop REINFORCE fine-tuning to jointly learn grouping, order, and time-step optimization over a mixed discrete-continuous space. Under fidelity-matched conditions ($F \geq 0.95$), the method achieves circuit depth 86 versus 1673 for Qiskit fourth-order (ungrouped, Suzuki-4), about $19.4\times$ compression, and 141 for Paulihedral (first-order Trotter), about $1.6\times$ compression. At $T=0.90$ the method also beats the Qiskit group-commuting teacher (65 vs 103, $1.6\times$ compression), though at $T=0.95$ the teacher still leads -- a stratified pattern that points toward fidelity-aware fine-tuning. Under a standard depolarizing noise model, the method achieves noisy fidelity roughly $2\times$ the Qiskit fourth-order baseline (0.743 vs 0.380). Ablation shows a clear hierarchy: order learning $>$ time allocation $>$ grouping. Best-of-N sampling ($N=32$ is a practical sweet spot) and CFG guidance give flexible fidelity-depth trade-offs at inference. The method works well on structured Hamiltonians (TFIM, Heisenberg), but random Pauli Hamiltonians fail entirely at $T \geq 0.95$ -- a boundary that defines where the method applies.

2605.21347 2026-06-08 cs.AI cs.LG cs.SE Updated version

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Akshay Manglik, Apaar Shanker, Kaustubh Deshpande, Jason Qin, Yash Maurya, Veronica Chatrath, Vijay S. Kalmath, Levi Lentz, Yuan Xue

Affiliations: Scale AI, Inc.

AI summary: Proposes the Insights Generator, a multi-agent system that proposes and tests hypotheses across a trace corpus to produce evidence-backed insight reports, systematically diagnosing behavioral patterns of LLM agents.

Abstract

Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora where individual traces span tens of thousands of tokens. We formalize the problem of corpus-level trace diagnostics. Given a corpus of execution traces, the goal is to produce grounded natural-language insights that characterize systematic behavioral patterns across trace groups, each linked to supporting evidence. We present the Insights Generator (IG), a multi-agent system that answers diagnostic questions by proposing and testing hypotheses across the trace corpus to produce an evidence-backed insights report. We evaluate IG across qualitative and objective dimensions, spanning rubric-based report assessment and downstream performance improvements achieved by implementing IG insights. Human experts using IG reports improve scaffold performance by 30.4pp over the unmodified baseline scaffold, and coding agents leveraging IG-derived insights show consistent and stable gains. Across benchmarks, IG's scout-investigator architecture produces findings comparable in detection coverage to competing approaches, while domain experts rated IG reports as leading in depth and evidence quality.

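2605.25638 2026-06-08 cs.CL cs.LG Updated version

An Algebraic Perspective on the Expressive Power of Recurrent Language Models

Franz Nowak, Ryan Cotterell, Reda Boumasmoud

Affiliations: GitHub

AI summary: Analyzes the expressive power of recurrent neural networks under different arithmetic models through a unified algebraic framework, reducing formal language recognition to the algebraic question of whether the syntactic monoid divides a particular wreath product.

Comments 28 pages, 2 figures, to be published at ICML 2026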

2606.02475 2026-06-08 math.NA cs.CE cs.LG cs.NA Updated version

Physics-Informed Residuals for Adaptive Mesh Refinement in Finite-Difference PDE Solvers

Henry Kasumba, Ronald Katende

Affiliations: Department of Mathematics, Makerere University; Department of Mathematics, Kabale University

AI summary: Proposes using a physics-informed neural network (PINN) as an off-grid residual probe that supplies adaptive mesh refinement indicators to finite-difference solvers, validated on benchmarks including the viscous Burgers equation.

Comments 20 pages, 5 tables, 5 figures

Abstract

Classical finite-difference solvers remain reliable tools for partial differential equations, but their efficiency depends on where mesh resolution is placed. Uniform refinement can waste degrees of freedom when solution difficulty is localised near sharp gradients, fronts, oscillations, or constraint-sensitive regions. This paper studies a hybrid strategy in which a physics-informed neural network (PINN) is used not as the final solver, but as an off-grid residual probe for adaptive mesh refinement. The PINN residual is sampled over the domain, converted into cellwise indicators, and used to guide refinement before the final approximation is computed by a finite-difference solver. The method is evaluated on three benchmarks. The main full-solver validation uses the one-dimensional viscous Burgers equation with a nonuniform finite-difference solve on the adapted meshes. PINN-threshold refinement attains final relative $L^2$ error $0.021067$ with $60$ degrees of freedom, compared with $0.022617$ for uniform refinement with $192$ degrees of freedom. At matched mesh size, PINN-threshold reduces the error by about $67.5\%$. PINN-Dörfler refinement gives similar performance, with error $0.021264$ using $58$ degrees of freedom. A gradient indicator remains slightly more accurate, so the result supports usefulness rather than universal superiority. Manufactured 2D and 3D proxy tests, based on a nonlinear Schrödinger equation and an incompressible Navier--Stokes system, show that PINN residuals can organise structured refinement and improve over random refinement, although they do not consistently outperform gradient or uniform baselines. The results support PINN-guided AMR as a residual-indicator strategy for transferring physics-informed diagnostic information into finite-difference mesh adaptation while preserving the classical solver as the final approximation engine.
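The two marking strategies named in this abstract (threshold and Dörfler) can be sketched for a 1D mesh as below. The cellwise indicators are assumed to be given already, for example as a norm of sampled residuals per cell; the function names, the default parameters, and the bisection rule are illustrative assumptions rather than the paper's implementation.

```python
def threshold_mark(indicators, fraction_of_max=0.5):
    """Mark every cell whose indicator is at least a fixed fraction of the largest one."""
    cutoff = fraction_of_max * max(indicators)
    return [i for i, eta in enumerate(indicators) if eta >= cutoff]


def dorfler_mark(indicators, theta=0.5):
    """Mark the largest cells until their squared indicators reach theta of the total."""
    order = sorted(range(len(indicators)), key=lambda i: indicators[i] ** 2, reverse=True)
    total = sum(eta * eta for eta in indicators)
    marked, accumulated = [], 0.0
    for i in order:
        if accumulated >= theta * total:
            break
        marked.append(i)
        accumulated += indicators[i] ** 2
    return sorted(marked)


def refine(mesh, marked):
    """Bisect the marked cells of a 1D mesh given as a sorted list of node positions."""
    marked_set = set(marked)
    new_mesh = []
    for i in range(len(mesh) - 1):
        new_mesh.append(mesh[i])
        if i in marked_set:
            new_mesh.append(0.5 * (mesh[i] + mesh[i + 1]))
    new_mesh.append(mesh[-1])
    return new_mesh
```

The refined node list would then be handed to a nonuniform finite-difference solver, which remains responsible for the final approximation.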

2606.04101 2026-06-08 cs.DC cs.LG Updated version

UltraEP: Unleash MoE Training and Inference on Rack-Scale Nodes with Near-Optimal Load Balancing

Xinming Wei, Chao Jin, Tuo Dai, Yinmin Zhong, Shan Yu, Chengxu Yang, Bingyang Wu, Zili Zhang, Jing Mai, Qianchao Zhu, Zhouyang Li, Yuliang Liu, Guojie Luo

AI summary: Proposes UltraEP, the first exact-load, real-time balancer, which co-designs plan solving and expert-replication communication to rebalance every microbatch and layer for MoE training and inference on rack-scale nodes, reaching 94.3% of the force-balanced ideal throughput.

Comments The authors have identified issues related to information disclosure in the current version of the manuscript and therefore request its withdrawal. A revised version may be prepared at a later date

Abstract

Large-scale expert parallelism (EP) is becoming pivotal for training and serving frontier MoE models, but it also amplifies device-level expert load imbalance into compute stragglers, token all-to-all bottlenecks, and activation-memory spikes. Existing balancers redistribute experts periodically based on historical load, which becomes unreliable for production deployments with non-stationary load patterns. We present UltraEP, the first exact-load, real-time balancer for large-EP MoE training and serving prefill on rack-scale nodes (RSNs). Built upon the extended scale-up connectivity of RSNs, UltraEP rebalances every microbatch and layer on critical paths, which requires nontrivial co-design of plan solving and expert replication communication to minimize exposed overhead. To this end, UltraEP eagerly reacts to post-gating load with efficient quota-driven planning, and executes the resulting irregular expert-state transfers with RSN-native persistent tile streaming and relay-based fan-out mitigation. Averaged across MoE models from 106B to 671B parameters in training and prefill, UltraEP achieves 94.3% of the force-balanced ideal throughput, delivering 1.49$\times$ improvement over non-balancing, while reducing the final inter-rank imbalance from 1.30$-$4.01 to 1.01$-$1.04. Additionally, we validate UltraEP's scalability and robustness in production MoE training with 2560 GPUs.

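2606.05654 2026-06-08 cs.SE cs.AI cs.LG Updated version

Fast and Robust Convergence Rates for TD(0) with Linear Function Approximation, Generic Learning Steps, and i.i.d. Samples

Ziad Kobeissi, Éloïse Berthier

Affiliations: L2S, INRIA U2IS, ENSTA Université Paris-Saclay, CentraleSupelec, Institut Polytechnique de Paris

AI summary: For TD(0) with linear function approximation under i.i.d. samples and a constant learning step, establishes a mean-square-error convergence rate that is fast (order 1/k), robust (independent of the smallest eigenvalue), and sharp (multiplicative constant below 11), and introduces the PCTD(0) variant, which converges better under a strong-mixing assumption.

Comments This is an extended version of a paper accepted at AISTATS 2026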

Journal ref: AISTATS 2026, May 2026, Tanger, Morocco

Abstract

In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method. We establish a new convergence rate, for the Mean-Square Error (MSE) on the approximated function, that is (i) fast in the sense that it admits an optimal dependency in the number of iterations k (i.e., of order 1/k), (ii) robust to ill-conditioning: it only depends on an initial error and model-independent constants, and (iii) sharp up to a multiplicative constant lower than 11. In particular, it does not depend on the smallest eigenvalue of the uncentered covariance matrix of the linear parametrization, unlike all pre-existing O(1/k) rates in the TD(0) literature. We also introduce PCTD(0), a variant of TD(0), which benefits from better convergence properties under an additional assumption of strong mixing on the Markov Chain.
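The algorithm analyzed in this abstract, TD(0) with linear features, a constant step, and iterate averaging, can be sketched as follows. The function name, the zero initialization, and the convention of averaging the iterates from the first update onward are assumptions made for illustration; the PCTD(0) variant is not covered.

```python
def td0_averaged(samples, dim, step, gamma):
    """TD(0) with linear function approximation and a running average of the iterates.

    samples: iterable of (phi, reward, phi_next) with feature vectors of length dim.
    Returns (last_iterate, averaged_iterate).
    """
    theta = [0.0] * dim
    average = [0.0] * dim
    for k, (phi, reward, phi_next) in enumerate(samples, start=1):
        value = sum(t * p for t, p in zip(theta, phi))
        value_next = sum(t * p for t, p in zip(theta, phi_next))
        td_error = reward + gamma * value_next - value
        # Semi-gradient update along the current feature vector.
        theta = [t + step * td_error * p for t, p in zip(theta, phi)]
        # Running mean of the iterates theta_1, ..., theta_k.
        average = [a + (t - a) / k for a, t in zip(average, theta)]
    return theta, average
```

The convergence rate discussed in the abstract concerns the averaged iterate, which is why the sketch returns it alongside the last iterate.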

2606.06224 2026-06-08 cs.CV cs.LG Updated version

Symb-xMIL: Symbolic Explanations for Multiple Instance Learning in Digital Pathology

Yanqing Luo, Julius Hense, Niklas Prenißl, Andreas Mock, Klaus-Robert Müller, Thomas Schnake, Mina Jamshidi Idaji

Affiliations: Berlin Institute for the Foundations of Learning and Data; Machine Learning Group, Technische Universität Berlin; Institute of Pathology, Charité Universitätsmedizin; Berlin Institute of Health at Charité – Universitätsmedizin Berlin, BIH Biomedical Innovation Academy, BIH Charité Digital Clinician Scientist Program; Institute of Pathology, Ludwig Maximilian University of Munich; Division of Translational Medical Oncology, DKFZ; German Cancer Consortium (DKTK), partner site Munich, a partnership between DKFZ and Ludwig-Maximilians-Universität München (LMU); Department of Artificial Intelligence, Korea University; Max-Planck Institute for Informatics, Saarbrücken, Germany; Department of Chemistry, Chemical Physics Theory Group, University of Toronto; Vector Institute for Artificial Intelligence, Toronto, Canada; Acceleration Consortium, University of Toronto

AI summary: Proposes the Symb-xMIL framework, which quantifies how well model behavior aligns with human-readable decision rules (logical relations) to provide structured symbolic explanations for multiple instance learning, validated on synthetic and real pathology data.

Comments 23 pages, 18 figures

Abstract

Explanations of multiple instance learning (MIL) models are widely used for validation and discovery in digital histopathology. Existing methods primarily rely on heatmaps that highlight influential regions but do not explain how evidence from different tissue regions is combined to produce a prediction. This limits interpretability, especially when decisions depend on interactions between tissue features. We introduce Symbolic explainable MIL (Symb-xMIL), a post-hoc explanation framework that quantifies how a MIL model's behavior aligns with human-readable decision rules, expressed as logical relationships (e.g., AND, OR, NOT) between input features. These alignment scores reveal semantic patterns underlying the model's predictions. We evaluate Symb-xMIL on synthetic and real-world histopathology datasets. On synthetic MIL data, Symb-xMIL reliably recovers ground-truth logical rules. In a clinical tumor detection task, the best-aligned rules uncover heterogeneous decision patterns and expose hidden model errors. On an HPV-prediction task on TCGA-HNSCC, a cohort of head and neck cancer, our framework refines patient survival stratification beyond HPV status with potential clinical relevance. Overall, Symb-xMIL extends MIL explainability beyond visual attribution toward structured, rule-based reasoning, enabling more transparent and semantically grounded interpretation of model predictions.

2507.12878 2026-06-08 eess.SP cs.LG stat.ML Updated version

Bayesian Modeling and Estimation of Linear Time-Varying Systems using Neural Networks and Gaussian Processes

Yaniv Shulman

Affiliations: Shulman.info

AI summary: Proposes a unified Bayesian framework that models the system impulse response as a stochastic process and uses variational inference and Gaussian processes to estimate linear time-varying systems robustly.

2603.14573 2026-06-08 cond-mat.dis-nn cs.LG math.PR Updated version

Rigorous Asymptotics for First-Order Algorithms Through the Dynamical Cavity Method

Yatin Dandi, David Gamarnik, Francisco Pernice, Lenka Zdeborová

Affiliations: Statistical Physics of Computation Laboratory, École polytechnique fédérale de Lausanne (EPFL); Sloan School of Management, Operations Research Center and Institute of Data, Systems and Society (IDSS), MIT; CSAIL and LIDS, MIT

AI summary: Derives the dynamical master equations of first-order algorithms (such as gradient descent and approximate message passing) through a rigorously formalized dynamical cavity method, giving a mathematical foundation to traditional non-rigorous approaches.

2505.19888 2026-06-08 cs.LG Updated version

Generalized and Personalized Federated Learning with Black-Box Foundation Models via Orthogonal Transformations

Eun Gyung Kong, Je Won Yeom, Yonghoon Jeon, Taesup Kim

Affiliations: Seoul National University; Mobilint, Inc.; Kakao Healthcare Corp.

AI summary: Proposes the FedOT framework, which uses orthogonal transformations to achieve robust generalization and effective personalization in federated learning, improving performance in heterogeneous settings and outperforming baseline methods.

Comments 31 pages, 5 figures

Journal ref: Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 24567-24576, 2026

Abstract

Federated Learning (FL) facilitates decentralized model training while preserving data privacy. However, achieving both robust generalization and effective personalization simultaneously in heterogeneous (non-IID) environments remains a formidable challenge. Furthermore, the widespread adoption of proprietary Foundation Models (FMs) introduces a critical requirement for dual privacy: (a) protecting sensitive client data and (b) securing the server's valuable intellectual property. This mandates strictly black-box access to the FM. To address these multifaceted challenges, we introduce FedOT, a novel FL framework optimized for black-box FMs. FedOT employs a shared global task-dependent classifier while facilitating local adaptation through client-specific orthogonal transformations applied externally to the FM embeddings. This architecture inherently guarantees that the FM's internal parameters remain inaccessible and unmodified. By enforcing orthogonality, FedOT effectively mitigates gradient conflicts across diverse clients, which is theoretically bounded, preserves the semantic integrity of the FM representations, and achieves robust performance under significant data heterogeneity. The synergy of global and local parameters optimally balances generalization and personalization, markedly outperforming baseline FL methods across diverse benchmarks. Extensive empirical analysis, including rigorous multi-seed validation and scalability assessments, substantiates the robustness, efficiency, and superior performance of FedOT.

2502.21123 2026-06-08 cs.LG cs.AI Updated version

Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models

Ruta Binkyte, Ivaxi Sheth, Zhijing Jin, Mohammad Havaei, Bernhard Schölkopf, Mario Fritz

Affiliations: CISPA Helmholtz Center for Information Security; Max Planck Institute for Intelligent Systems, Tübingen; Google Research; ETH Zürich; University of Toronto

AI summary: This paper argues for integrating causal methods into machine learning to balance the trade-offs among trustworthiness principles such as fairness, privacy, robustness, accuracy, and explainability, and discusses their practical application to foundation models.

Abstract

Ensuring trustworthiness in machine learning (ML) systems is crucial as they become increasingly embedded in high-stakes domains. This paper advocates for integrating causal methods into machine learning to navigate the trade-offs among key principles of trustworthy ML, including fairness, privacy, robustness, accuracy, and explainability. While these objectives should ideally be satisfied simultaneously, they are often addressed in isolation, leading to conflicts and suboptimal solutions. Drawing on existing applications of causality in ML that successfully align goals such as fairness and accuracy or privacy and robustness, this paper argues that a causal approach is essential for balancing multiple competing objectives in both trustworthy ML and foundation models. Beyond highlighting these trade-offs, we examine how causality can be practically integrated into ML and foundation models, offering solutions to enhance their reliability and interpretability. Finally, we discuss the challenges, limitations, and opportunities in adopting causal frameworks, paving the way for more accountable and ethically sound AI systems.

2602.10680 2026-06-08 stat.ML cond-mat.dis-nn cs.LG Updated version

A solvable high-dimensional model where nonlinear autoencoders learn structure invisible to PCA while test loss misaligns with generalization

Vicente Conde Mendes, Lorenzo Bardone, Cédric Koller, Jorge Medina Moreira, Vittorio Erba, Emanuele Troiani, Lenka Zdeborová

Affiliations: Statistical Physics of Computation Laboratory, École polytechnique fédérale de Lausanne (EPFL)

AI summary: This paper proposes a high-dimensional model showing that nonlinear autoencoders can learn structure that linear methods such as PCA cannot capture, even though their test loss is misaligned with generalization performance.

Journal ref: ICML 2026

Abstract

Many real-world datasets contain hidden structure that cannot be detected by simple linear correlations between input features. For example, latent factors may influence the data in a coordinated way, even though their effect is invisible to covariance-based methods such as PCA. In practice, nonlinear neural networks often succeed in extracting such hidden structure in unsupervised and self-supervised learning. However, constructing a minimal high-dimensional model where this advantage can be rigorously analyzed has remained an open theoretical challenge. We introduce a tractable high-dimensional spiked model with two latent factors: one visible to covariance, and one statistically dependent yet uncorrelated, appearing only in higher-order moments. PCA and linear autoencoders fail to recover the latter, while a minimal nonlinear autoencoder provably extracts both. We analyze both the population risk and empirical risk minimization. Our model also provides a tractable example where self-supervised test loss is poorly aligned with representation quality: nonlinear autoencoders recover latent structure that linear methods miss, even though their reconstruction loss is higher.
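
The phrase "statistically dependent yet uncorrelated" can be illustrated with a scalar toy example. This is a sketch under simple assumptions, not the paper's spiked model: the variables `u` and `v` and the sample size are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

u = rng.standard_normal(n)  # standard Gaussian factor
v = u**2 - 1.0              # zero mean and fully determined by u

# Second-order statistic: Cov(u, v) = E[u^3] = 0, so covariance sees nothing.
cov_uv = np.mean(u * v)

# Higher-order statistic: E[(u^2 - 1) v] = Var(u^2) = 2, so the dependence
# shows up as soon as a nonlinear feature of u is used.
cov_u2v = np.mean((u**2 - 1.0) * v)

print(f"Cov(u, v)       ~ {cov_uv:.3f}")
print(f"Cov(u^2 - 1, v) ~ {cov_u2v:.3f}")
```

A covariance-based method such as PCA only has access to the first quantity, which is why a nonlinear feature map is needed to detect this kind of dependence.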

2509.24914 2026-06-08 stat.ML cond-mat.dis-nn cs.IT cs.LG math.IT Updated version

Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws

Fabrizio Boncoraglio, Vittorio Erba, Emanuele Troiani, Yizhou Xu, Florent Krzakala, Lenka Zdeborová

Affiliations: Statistical Physics of Computation Laboratory, École polytechnique fédérale de Lausanne (EPFL); Information, Learning and Physics Laboratory, École polytechnique fédérale de Lausanne (EPFL)

AI summary: This paper studies the spectral structure of the weights of attention layers trained on high-dimensional sequence tasks. Using tools such as random matrix theory, it characterizes the training error, the interpolation threshold, and the spectrum of the key and query matrices in high dimensions, and predicts the emergence of power-law scaling laws.

Journal ref: ICML 2026

Abstract

Trained attention layers exhibit striking and reproducible spectral structure of the weights, including low-rank collapse, bulk deformation, and isolated spectral outliers, yet the origin of these phenomena and their implications for generalization remain poorly understood. We study empirical risk minimization in a single-head tied-attention layer trained on synthetic high-dimensional sequence tasks generated from the attention-indexed model. Using tools from random matrix theory, spin-glass theory, and approximate message passing, we obtain an exact high-dimensional characterization of training and test error, interpolation and recovery thresholds, and the spectrum of the key and query matrices. Our theory predicts the full singular-value distribution of the trained query-key map, including low-rank structure and isolated spectral outliers, in qualitative agreement with observations in more realistic transformers. Finally, for targets with power-law spectra, we show that learning proceeds through sequential spectral recovery, leading to the emergence of power-law scaling laws.

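2411.05729 2026-06-08 cs.LG stat.ML Updated version

Graph-Dictionary Signal Model for Sparse Representations of Multivariate Data

William Cappelletti, Pascal Frossard

Affiliations: LTS4, EPFL, Lausanne, Switzerland

AI summary: This paper proposes a graph-dictionary signal model that describes relationships in multivariate data through graph structures and reconstructs signals from sparse combinations of graph atoms, outperforming existing baselines.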

DOI: 10.1109/TSIPN.2026.3653623

Abstract

Representing and exploiting multivariate signals requires capturing relations between variables, which we can represent by graphs. Graph dictionaries make it possible to describe complex relational information as a sparse sum of simpler structures, but no prior model exists to infer such underlying structure elements from data. We define a novel Graph-Dictionary signal model, where a finite set of graphs characterizes relationships in the data distribution as filters on the weighted sum of their Laplacians. We propose a framework to infer the graph dictionary representation from observed node signals, which makes it possible to include a priori knowledge about signal properties, and about underlying graphs and their coefficients. We introduce a bilinear generalization of the primal-dual splitting algorithm to solve the learning problem. We show the capability of our method to reconstruct graphs from signals in multiple synthetic settings, where our model outperforms popular baselines. Then, we exploit graph-dictionary representations in an illustrative motor imagery decoding task on brain activity data, where we classify imagined motion better than standard methods relying on many more features. Our graph-dictionary model bridges a gap between sparse representations of multivariate data and a structured decomposition of sample-varying relationships into a sparse combination of elementary graph atoms.

2403.09110 2026-06-08 cs.LG cs.SY eess.SY math.DS math.OC Updated version

SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning

Nicholas Zolman, Christian Lagemann, Urban Fasel, J. Nathan Kutz, Steven L. Brunton

Affiliations: Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA; Data Science and Artificial Intelligence Department, The Aerospace Corporation, El Segundo, CA 90245; Department of Aeronautics, Imperial College, London SW7 2AZ, United Kingdom; Department of Applied Mathematics, University of Washington, Seattle, WA 98195; Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195

AI summary: This paper proposes the SINDy-RL framework, which combines SINDy and DRL to obtain efficient, interpretable dynamics models and control policies in the low-data regime, and validates it on benchmark environments and flow control experiments.

Comments For code, see https://github.com/nzolman/sindy-rl. v2 Update: Included Pinball and 3D Airfoil examples. Christian Lagemann added as an author for contributions with the 3D Airfoil code. To appear in Nature Communications

DOI: 10.1038/s41467-025-65738-4
Journal ref: Nat. Commun. 16, 10714 (2025)

Abstract

Deep reinforcement learning (DRL) has shown significant promise for uncovering sophisticated control policies that interact in complex environments, such as stabilizing a tokamak fusion reactor or minimizing the drag force on an object in a fluid flow. However, DRL requires an abundance of training examples and may become prohibitively expensive for many applications. In addition, the reliance on deep neural networks often results in an uninterpretable, black-box policy that may be too computationally expensive to use with certain embedded systems. Recent advances in sparse dictionary learning, such as the sparse identification of nonlinear dynamics (SINDy), have shown promise for creating efficient and interpretable data-driven models in the low-data regime. In this work we introduce SINDy-RL, a unifying framework for combining SINDy and DRL to create efficient, interpretable, and trustworthy representations of the dynamics model, reward function, and control policy. We demonstrate the effectiveness of our approaches on benchmark control environments and flow control problems, including gust mitigation on a 3D NACA 0012 airfoil at $Re=1000$. SINDy-RL achieves comparable performance to modern DRL algorithms using significantly fewer interactions in the environment and results in an interpretable control policy orders of magnitude smaller than a DRL policy.
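
For readers unfamiliar with SINDy, the sparse regression at its core can be sketched in a few lines. The example below is a generic sequentially thresholded least-squares fit on a toy system, dx/dt = -2x, with a hand-picked candidate library. It is not the SINDy-RL code from the linked repository, and the function name `stlsq`, the threshold, and the library are assumptions made for illustration.

```python
import numpy as np

# Toy trajectory of dx/dt = -2x with an exact derivative (no noise).
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dx = -2.0 * x

# Candidate library of terms: [1, x, x^2].
theta = np.column_stack([np.ones_like(x), x, x**2])

def stlsq(theta, dx, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (hypothetical helper)."""
    xi = np.linalg.lstsq(theta, dx, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0            # prune negligible coefficients
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving library terms.
        xi[big] = np.linalg.lstsq(theta[:, big], dx, rcond=None)[0]
    return xi

xi = stlsq(theta, dx)
print(xi)  # coefficients for [1, x, x^2]
```

The recovered coefficient vector is sparse and directly readable as an equation, which is the sense in which such models are described as interpretable.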

2507.17799 2026-06-08 eess.AS cs.LG cs.SD Updated version

A Concept-based approach to Voice Disorder Detection

Davide Ghia, Gabriele Ciravegna, Alkis Koudounas, Marco Fantini, Erika Crosetti, Giovanni Succo, Tania Cerquitelli

Affiliations: Politecnico di Torino; CENTAI Institute; San Feliciano Hospital; SCDU Otorinolaringoiatria, Head Neck Cancer Unit, Ospedale San Giovanni Bosco; Dipartimento di Oncologia, Università degli Studi di Torino

AI summary: This paper proposes a concept-based approach to voice disorder detection that uses explainable AI to improve model transparency and, compared with conventional deep learning methods, provides a clearer decision framework.

2506.12454 2026-06-08 stat.ML cond-mat.dis-nn cs.CR cs.LG Updated version

On the existence of consistent adversarial attacks in high-dimensional linear classification

Matteo Vilucchio, Lenka Zdeborová, Bruno Loureiro

Affiliations: Information Learning and Physics Laboratory, École Polytechnique Fédérale de Lausanne (EPFL); Statistical Physics of Computation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL); Département d’Informatique, École Normale Supérieure - PSL & CNRS, France

AI summary: This paper studies how adversarial attacks differ from misclassifications caused by limited model expressivity in high-dimensional binary classification. It proposes new error metrics that expose model vulnerability to perturbations preserving the ground-truth labels, and shows theoretically that the more overparameterized a model is, the more sensitive it is to label-preserving perturbations.

Journal ref: ICML 2026

Abstract

What fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or finite data? In this work, we investigate this question in the setting of high-dimensional binary classification, where statistical effects due to limited data availability play a central role. We introduce new error metrics that precisely capture this distinction, quantifying model vulnerability to consistent adversarial attacks, that is, perturbations that preserve the ground-truth labels. Our main technical contribution is an exact and rigorous asymptotic characterization of these metrics in both well-specified models and latent space models, revealing different vulnerability patterns compared to standard robust error measures. The theoretical results demonstrate that as models become more overparameterized, their vulnerability to label-preserving perturbations grows, offering theoretical insight into the mechanisms underlying model sensitivity to adversarial attacks.

2311.00212 2026-06-08 cs.LG cs.NA math.DG math.NA Updated version

A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning

Samuel E. Otto, Nicholas Zolman, J. Nathan Kutz, Steven L. Brunton

Affiliations: AI Institute in Dynamic Systems, University of Washington; Sibley School of Mechanical and Aerospace Engineering, Cornell University

AI summary: This paper proposes a unified framework that incorporates symmetry into machine learning models in three ways: enforcing known symmetries, discovering unknown symmetries, and promoting symmetry. A mathematical framework based on the Lie derivative unifies existing results.

Journal ref: J. Mach. Learn. Res. 26(248):1-83 (2025)

Abstract

Symmetry is present throughout nature and continues to play an increasingly central role in physics and machine learning. Fundamental symmetries, such as Poincaré invariance, allow physical laws discovered in laboratories on Earth to be extrapolated to the farthest reaches of the universe. Symmetry is essential to achieving this extrapolatory power in machine learning applications. For example, translation invariance in image classification allows models with fewer parameters, such as convolutional neural networks, to be trained on smaller data sets and achieve state-of-the-art performance. In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in three ways: 1. enforcing known symmetry when training a model; 2. discovering unknown symmetries of a given model or data set; and 3. promoting symmetry during training by learning a model that breaks symmetries within a user-specified group of candidates when there is sufficient evidence in the data. We show that these tasks can be cast within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. We extend and unify several existing results by showing that enforcing and discovering symmetry are linear-algebraic tasks that are dual with respect to the bilinear structure of the Lie derivative. We also propose a novel way to promote symmetry by introducing a class of convex regularization functions based on the Lie derivative and nuclear norm relaxation to penalize symmetry breaking during training of machine learning models. We explain how these ideas can be applied to a wide range of machine learning models including basis function regression, dynamical systems discovery, neural networks, and neural operators acting on fields.

2303.11949 2026-06-08 cs.NE cs.LG Updated version

A fuzzy adaptive evolutionary-based feature selection and machine learning framework for single and multi-objective body fat prediction

Farshid Keivanian, Raymond Chiong, Zongwen Fan

AI summary: This paper proposes a feature selection and machine learning framework that combines fuzzy set theory with evolutionary algorithms to improve the accuracy and stability of body fat prediction while handling conflicts in multi-objective optimization.

Comments Due to unforeseen challenges in coordination and supervision, including unavoidable delays, this study requires further review and refinement. To ensure it meets necessary academic and methodological standards, we have decided to withdraw the paper. We appreciate the understanding of the research community

DOI: 10.1016/j.neucom.2026.132974
Journal ref: Neurocomputing, Article 132974, 2026

Abstract

Predicting body fat can provide medical practitioners and users with essential information for preventing and diagnosing heart diseases. Hybrid machine learning models offer better performance than simple regression analysis methods by selecting relevant body measurements and capturing complex nonlinear relationships among selected features in modelling body fat prediction problems. These models nevertheless have some disadvantages. Current machine learning approaches that model body fat prediction as a combinatorial single- and multi-objective optimisation problem often get stuck in local optima. When multiple feature subsets produce similar or close predictions, avoiding local optima becomes more complex. Evolutionary feature selection has been used to solve several machine-learning-based optimisation problems. Fuzzy set theory determines appropriate levels of exploration and exploitation while managing parameterisation and computational costs. A weighted-sum body fat prediction approach was explored using evolutionary feature selection, fuzzy set theory, and machine learning algorithms, integrating contradictory metrics into a single composite goal optimised by fuzzy adaptive evolutionary feature selection. Hybrid fuzzy adaptive global learning local search universal diversity-based feature selection is applied to this single-objective feature selection-machine learning framework (FAGLSUD-based FS-ML). While using fewer features, this model achieved a more accurate and stable estimate of body fat percentage than other hybrid and state-of-the-art machine learning models. A multi-objective FAGLSUD-based FS-MLP is also proposed to analyse accuracy, stability, and dimensionality conflicts simultaneously. To make informed decisions about fat deposits in the most vital body parts and blood lipid levels, medical practitioners and users can use a well-distributed Pareto set of trade-off solutions.

2501.11592 2026-06-08 cs.LG cs.AI cs.CL Updated version

Training-free Ultra Small Model for Universal Sparse Reconstruction in Compressed Sensing

Chaoqing Tang, Huanze Zhuang, Guiyun Tian, Zhenli Zeng, Yi Ding, Wenzhong Liu, Xiang Bai

Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China; China Belt and Road Joint Lab on Measurement and Control Technology, Wuhan, China; School of Electric and Electrical Engineering, Chongqing University of Technology, Chongqing, China; Optics Valley Laboratory, Wuhan, China; School of Water Conservancy and Transportation, Zhengzhou University, Zhengzhou, China; School of Software Engineering, Huazhong University of Science and Technology, Wuhan, China

AI summary: This paper proposes CL, a training-free ultra-small neural model for fast sparse reconstruction that inherits the generality and interpretability of traditional iterative methods while improving efficiency and accuracy.

DOI: 10.1109/TPAMI.2026.3680162

Abstract

Pre-trained large models have attracted widespread attention in recent years, but they face challenges in applications that require high interpretability or have limited resources, such as physical sensing, medical imaging, and bioinformatics. Compressed Sensing (CS) is a well-established theory that drives many recent breakthroughs in these applications. However, as a typical under-determined linear system, CS suffers from excessively long sparse reconstruction times when using traditional iterative methods, particularly with large-scale data. Current AI methods like deep unfolding fail to replace them because pre-trained models exhibit poor generality beyond their training conditions and dataset distributions, or lack interpretability. Instead of following the big model fervor, this paper proposes ultra-small artificial neural models called coefficients learning (CL), enabling training-free and rapid sparse reconstruction while perfectly inheriting the generality and interpretability of traditional iterative methods, and adding the ability to incorporate prior knowledge. In CL, a signal of length $n$ needs a minimum of only $n$ trainable parameters. A case study model called CLOMP is implemented for evaluation. Experiments are conducted on both synthetic and real one-dimensional and two-dimensional signals, demonstrating significant improvements in efficiency and accuracy. Compared to representative iterative methods, CLOMP improves efficiency by a factor of 100 to 1000 for large-scale data. Test results on eight diverse image datasets indicate that CLOMP improves the structural similarity index by 292%, 98%, and 45% for sampling rates of 0.1, 0.3, and 0.5, respectively. We believe this method can truly usher CS reconstruction into the AI era, benefiting countless under-determined linear systems that rely on sparse solutions.
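
As background for the "traditional iterative methods" the abstract refers to, here is a minimal sketch of classic orthogonal matching pursuit (OMP) on an under-determined system. It is not CL or CLOMP. The sensing matrix (identity concatenated with a normalized Hadamard matrix), the sparse test signal, and the helper names `hadamard` and `omp` are all assumptions chosen so that exact recovery is guaranteed by the matrix's low mutual coherence.

```python
import numpy as np

def hadamard(m):
    """Sylvester construction of an m x m Hadamard matrix (m a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < m:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H

def omp(A, y, k):
    """Orthogonal matching pursuit for a k-sparse x with y = A x."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # Re-fit all selected coefficients by least squares.
        coef = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

m = 64
# 64 x 128 sensing matrix with unit-norm columns and coherence 1/sqrt(64).
A = np.hstack([np.eye(m), hadamard(m) / np.sqrt(m)])

x_true = np.zeros(2 * m)
x_true[[3, 70, 100]] = [1.5, -2.0, 0.7]  # 3-sparse signal
y = A @ x_true                           # 64 measurements of 128 unknowns

x_hat = omp(A, y, k=3)
print(np.flatnonzero(x_hat))
```

Each iteration solves a least-squares problem on the growing support, which is the per-iteration cost that makes such methods slow on large-scale data.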

2407.15555 2026-06-08 eess.SP cs.LG Updated version

The Rlign Algorithm for Enhanced Electrocardiogram Analysis through R-Peak Alignment for Explainable Classification and Clustering

Lucas Plagwitz, Lucas Bickmann, Michael Fujarski, Alexander Brenner, Warnes Gobalakrishnan, Lars Eckardt, Antonius Büscher, Julian Varghese

Affiliations: IMI Medical Systems GmbH; University of Freiburg

AI summary: This paper proposes the Rlign algorithm, which restructures ECG signals through R-peak alignment to improve classification, clustering, and explainability, outperforming conventional methods and CNNs.

DOI: 10.1093/ehjdh/ztag067
Journal ref: European Heart Journal - Digital Health, Volume 7, Issue 5, June 2026, ztag067

Abstract

Electrocardiogram (ECG) recordings have long been vital in diagnosing different cardiac conditions. Recently, research in the field of automatic ECG processing using machine learning methods has gained importance, mainly by utilizing deep learning methods on raw ECG signals. A major advantage of models like convolutional neural networks (CNNs) is their ability to effectively process biomedical imaging or signal data. However, this strength is tempered by challenges related to their lack of explainability, the need for a large amount of training data, and the complexities involved in adapting them for unsupervised clustering tasks. In addressing these tasks, we aim to reintroduce shallow learning techniques, including support vector machines and principal components analysis, into ECG signal processing by leveraging their semi-structured, cyclic form. To this end, we developed and evaluated a transformation that effectively restructures ECG signals into a fully structured format, facilitating their subsequent analysis using shallow learning algorithms. In this study, we present this adaptive transformative approach that aligns R-peaks across all signals in a dataset and resamples the segments between R-peaks, both with and without heart rate dependencies. We illustrate the substantial benefit of this transformation for traditional analysis techniques in the areas of classification, clustering, and explainability, outperforming commercial software for median beat transformation and CNN approaches. Our approach demonstrates a significant advantage for shallow machine learning methods over CNNs, especially when dealing with limited training data. Additionally, we release a fully tested and publicly accessible code framework, providing a robust alignment pipeline to support future research, available at https://github.com/imi-ms/rlign.
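
The idea of aligning R-peaks and resampling the segments between them can be sketched with plain NumPy. This is an illustrative simplification, not the API of the `rlign` package: the helper `resample_beats`, the synthetic signal, and the hard-coded R-peak positions are assumptions, and a real pipeline would first detect the R-peaks.

```python
import numpy as np

def resample_beats(signal, r_peaks, n_points=50):
    """Stretch every R-to-R segment onto a common grid of n_points samples."""
    new_grid = np.linspace(0.0, 1.0, num=n_points)
    beats = []
    for start, stop in zip(r_peaks[:-1], r_peaks[1:]):
        segment = signal[start:stop]  # starts at one R-peak, ends before the next
        old_grid = np.linspace(0.0, 1.0, num=len(segment))
        beats.append(np.interp(new_grid, old_grid, segment))
    return np.vstack(beats)

# Synthetic single-lead "ECG": unit spikes at irregularly spaced R-peaks.
signal = np.zeros(500)
r_peaks = np.array([40, 130, 235, 320, 430])
signal[r_peaks] = 1.0

beats = resample_beats(signal, r_peaks)
print(beats.shape)  # one fixed-length row per beat
```

After this step every beat is a fixed-length vector with the R-peak in the same column, so methods that expect tabular input, such as support vector machines or principal component analysis, can be applied directly.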
