arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.06546 2026-06-08 cs.LG New submission

Elmes*: Automated Construction of Fine-Grained Evaluation Rubrics for Large Language Models in Long-Tail Educational Scenarios

Tao Liu, Ye Lu, Ruohua Zhang, Siyu Song, Wentao Liu, Aimin Zhou, Hao Hao

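Affiliations: Shanghai Institute of AI for Education, East China Normal University; School of Computer Science and Technology, East China Normal University; Shanghai Innovation Institute

AI summary: Proposes the Elmes* framework, which automatically constructs fine-grained, scenario-specific rubrics for evaluating the multidimensional teaching abilities of large language models in educational scenarios; builds the Edu-330 benchmark and reveals differences between models.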

Abstract

Evaluating large language models (LLMs) for education requires measuring how models teach, not only what they know. Existing benchmarks emphasize domain-general correctness or depend on manually designed rubrics that scale poorly to long-tail pedagogical scenarios. We introduce Elmes*, an end-to-end framework for constructing, refining, and applying fine-grained scenario-specific rubrics. Elmes* combines a declarative multi-agent engine for teacher-student-judge interactions with SceneGen, a self-evolving module that co-optimizes evaluation criteria and test data from expert-defined pedagogical dimensions. Using Elmes*, we build Edu-330, covering 330 scenarios across 11 subjects, 3 grade bands, and 10 task types, with over 1,000 second-level indicators. Experiments on Edu-330 and four expert-authored gold-standard scenarios show that educational capability is multidimensional: top-tier LLMs differ mainly in creativity and values integration, knowledge-strong models may fail at Socratic scaffolding, and the education-specialized InnoSpark achieves the best human-evaluated average score. LLM judges preserve human-comparable rankings with much lower scoring variance, but exhibit judge-specific biases such as self-preference. Ablations show that expert-scored few-shot anchoring improves human-LLM alignment, while reasoning enforcement and greedy decoding are model-dependent. Elmes* thus provides scalable diagnostic infrastructure for pedagogically grounded LLM evaluation.

2606.06547 2026-06-08 cs.LG cs.AI New submission

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

Haoyu Huang, Linlin Yang, Sheng Xu, Boyu Liu, Guodong Guo, Zhongqian Fu, Hang Zhou, Baochang Zhang

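Affiliations: FAIR

AI summary: Addresses the problem that, in post-training quantization of diffusion large language models, frontier decisions flip easily and are then permanently locked in and amplified. Proposes FAIR-Calib, a two-stage PTQ framework that estimates a position prior from frontier hits and masked-stage reliability, then uses reweighted hidden-state MSE calibration to prioritize protecting fragile frontier states. The objective is theoretically justified as a surrogate for output KL divergence, and experiments show it significantly outperforms baselines.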

Comments Accepted as a poster at the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Diffusion Large Language Models (dLLMs) refine tokens iteratively but commit them irreversibly, leading to a "stability lag" where early decisions remain fragile even after being written. We reveal that Post-Training Quantization (PTQ) error easily flips these borderline decisions at the write frontier, which are then permanently locked in and amplified. To address this, we propose Frontier-Aware Instability-Reweighted Calibration (FAIR-Calib), a two-stage PTQ framework for dLLMs. Stage I probes a full-precision teacher to estimate a position prior that combines frontier hits and masked-stage reliability. Stage II performs off-policy, layer-wise calibration by minimizing a reweighted hidden-state MSE, effectively prioritizing the protection of fragile frontier states without requiring expensive end-to-end diffusion rollouts. We further theoretically justify our weighted objective as a surrogate for output KL divergence. Empirically, FAIR-Calib consistently outperforms state-of-the-art baselines on LLaDA and Dream (W4A4), significantly reducing frontier decision flips and suppressing post-commit mismatches across diverse benchmarks.
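The Stage II objective described above, a reweighted hidden-state MSE, can be illustrated with a minimal sketch. The function name, the array shapes, and the toy weights are assumptions made for illustration; the abstract does not specify how the position prior is computed or normalized, so this is not the authors' implementation.

```python
import numpy as np

def reweighted_hidden_mse(h_teacher, h_quant, position_weights):
    """Position-weighted MSE between teacher and quantized hidden states.

    h_teacher, h_quant: arrays of shape (seq_len, hidden_dim).
    position_weights:   nonnegative array of shape (seq_len,); a hypothetical
                        stand-in for the Stage I position prior.
    """
    w = np.asarray(position_weights, dtype=float)
    w = w / w.sum()                                   # normalize weights to sum to 1
    per_position = ((h_teacher - h_quant) ** 2).mean(axis=1)
    return float((w * per_position).sum())

# Toy data: quantization error appears only at position 2.
rng = np.random.default_rng(0)
h_teacher = rng.normal(size=(4, 8))
h_quant = h_teacher.copy()
h_quant[2] += 0.5

uniform_loss = reweighted_hidden_mse(h_teacher, h_quant, [1, 1, 1, 1])
frontier_loss = reweighted_hidden_mse(h_teacher, h_quant, [1, 1, 5, 1])
print(uniform_loss, frontier_loss)
```

Up-weighting the position that carries the error raises its share of the loss, which is the sense in which a calibration step minimizing this quantity would prioritize fragile positions.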

2606.06560 2026-06-08 cs.LG cs.AI cs.HC New submission

MacArena: Benchmarking Computer Use Agents on an Online macOS Environment

Victor Muryn, Maksym Shamrai, Sofiia Mazepa, Yehor Khodysko

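Affiliations: MacPaw

AI summary: Proposes the MacArena benchmark, comprising 421 tasks across 50 applications and running on Apple Silicon; reveals macOS-specific challenges for GUI agents, with model rankings inverting between ported and native tasks.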

Comments Accepted to the Second Workshop on Agents in the Wild: Safety, Security, and Beyond (AIWILD) at ICML 2026

Abstract

Computer-use agents (CUAs) operate graphical user interfaces (GUIs) through vision and control primitives, and their capabilities have advanced rapidly, driven in part by standardized online evaluation benchmarks such as OSWorld, which serve both as evaluation tools and as training environments for reinforcement learning. However, macOS remains underserved in this landscape: the only existing benchmark, macOSWorld, covers a narrow slice of first-party applications with simpler tasks, and runs on x86 virtual machines incompatible with Apple Silicon. We introduce MacArena, a benchmark of 421 manually verified tasks spanning 50 applications that combines a curated port of OSWorld tasks, content sourced from macOSWorld, and 49 new macOS-native tasks, all running on Apple's native Virtualization framework on Apple Silicon. We argue that macOS presents distinct GUI challenges beyond what Linux-based benchmarks capture, and our evaluation supports this claim: strong model performance on existing benchmarks can reflect familiarity with task distributions rather than genuine cross-platform GUI competence. Notably, model rankings invert between ported and macOS-native tasks, with a leading model trailing by over 26% on the MacArena subset, suggesting that macOS poses a genuinely harder environment for current GUI agents.

2606.06567 2026-06-08 cs.LG New submission

Are you sure? A Comprehensive and Comprehensible Survey of Uncertainty Quantification in Symbolic Regression

Julia Reuter, Fabricio Olivetti de Franca

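AI summary: Surveys uncertainty quantification methods in symbolic regression, covering three research directions (frequentist, Bayesian, and model selection), and notes that the area remains underexplored.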

Abstract

Symbolic regression (SR) is a class of methods that systematically explore the space of mathematical functions to discover models that accurately capture the underlying relationships in a dataset. Despite recent advances in the field, a lack of support for uncertainty quantification (UQ) limits its adoption in real-world decision processes. In regression analysis, UQ provides important information about the model reliability, which can both help to avoid overfitting by accounting for uncertainty in the data, and provide insights for decision-making. This survey is the first to clearly address this issue, with the objective of introducing essential UQ concepts and reviewing the current literature on UQ in SR, which can be broadly organized into three research directions: frequentist, Bayesian, and model selection. Despite its importance, UQ in SR is still underexplored, which motivates further research into reliable UQ methods for SR.

2606.06572 2026-06-08 cs.LG cs.AI cs.CY econ.GN q-fin.EC New submission

Generative Models Erode Human Temporal Learning Through Market Selection

Wenjun Cao

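AI summary: Argues that modern generative models, at sub-AGI capability levels, erode human temporal learning through market selection; proposes a value-collapse pathway formalized with a costly-inspection framework, and presents cross-domain evidence mapping onto four stages of verification erosion.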

Comments Accepted at ICML 2026

Journal ref
Forty-third International Conference on Machine Learning Position Paper Track (2026)
Abstract

We argue that modern generative models create structural risks for knowledge and cultural production at current, sub-AGI capability levels. We define Human Temporal Learning (HTL) as path-dependent knowledge accumulation through sustained engagement with problems over time. Generative outputs increasingly resemble HTL-intensive work in surface features, so verifying whether a given output reflects genuine human learning grows costly relative to its expected benefit. Once verification loses economic justification, evaluators reward outputs regardless of production mode, and producers who invested years of learning compete on price against outputs that cost almost nothing to generate. We call this pathway value collapse and formalize it through a costly-inspection framework. Cross-domain evidence from academic publishing, legal practice, content platforms, and software security maps onto four stages of verification erosion. Alignment success is orthogonal. Better-aligned models narrow observable gaps between human and AI outputs, making source verification harder and intensifying competitive pressure against HTL-intensive work even when individual AI outputs improve.

2606.06574 2026-06-08 cs.LG New submission

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Ziyue Li, Yang Li, Tianyi Zhou

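AI summary: Proposes PoLar, which uses a lightweight prediction network to learn to dynamically skip or repeat pretrained layers for each input, improving accuracy on mathematical reasoning tasks while executing fewer layers.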

Comments Accepted at ICML 2026. Substantially extends arXiv:2507.07996. Code: https://github.com/tianyi-lab/PoLar

Abstract

Large language models (LLMs) perform inference through a fixed-depth, fixed-order, non-recurrent execution of all layers. We reveal the wide existence of training-free, flexible, dynamic program-of-layers (PoLar), where pretrained layers can be packed as modules and then skipped or looped to form a customized program for each input. For most inputs, substantially shorter program executions can achieve the same or better accuracy, while incorrect predictions of the original LLM can be corrected by alternative programs with fewer layers. These observations indicate that inference admits multiple valid latent computations beyond the standard forward pass. To efficiently achieve PoLar in practice, we propose a lightweight PoLar prediction network, which learns to generate execution programs that dynamically skip or repeat pretrained layers for each input. Experiments on mathematical reasoning benchmarks demonstrate that PoLar consistently improves accuracy over standard inference and prior dynamic-depth methods, often while executing fewer layers, and that these gains persist under out-of-distribution evaluation. Our results suggest that fixed-depth execution captures only a narrow subset of an LLM's latent reasoning capacity.
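The idea of a per-input execution program that skips or loops layers can be illustrated with a toy sketch. The scalar functions below stand in for pretrained transformer blocks, and `run_program` is a hypothetical helper; the actual PoLar prediction network and module packing are not described in enough detail here to reproduce.

```python
from typing import Callable, Sequence

Layer = Callable[[float], float]

def run_program(layers: Sequence[Layer], program: Sequence[int], x: float) -> float:
    """Apply layers in the order given by `program`.

    An index that is absent from `program` is a skipped layer;
    an index that appears more than once is a looped layer.
    """
    for idx in program:
        x = layers[idx](x)
    return x

# Three toy "layers" (assumed stand-ins for pretrained blocks).
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

standard = run_program(layers, [0, 1, 2], 1.0)   # fixed-depth forward pass
skip_loop = run_program(layers, [0, 1, 1], 1.0)  # skip layer 2, loop layer 1
print(standard, skip_loop)
```

A learned predictor would emit the `program` list per input; here it is written by hand to show that the same frozen layers admit several distinct computations.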

2606.06576 2026-06-08 cs.LG astro-ph.EP astro-ph.IM stat.ML New submission

Gaussian Process Latent Factor Regression for Low-Data, High-Dimensional Output Problems

Edward T. Stevenson, Eric T. Wolf, Mei Ting Mak, N. J. Mayne, Miles Cranmer

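Affiliations: University of Cambridge; University of Colorado Boulder; University of Oxford; University of Exeter

AI summary: Proposes Gaussian process latent factor regression (GPLFR), which represents outputs as a linear-Gaussian decoding of a low-dimensional latent state and jointly optimizes compression and prediction to tackle low-data, high-dimensional-output regression; builds the first spatially resolved emulator of global climate models for rocky exoplanets.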

Comments 9 pages content + 22 pages appendix/references. Supporting code at https://github.com/edstevenson/GPLFR

Abstract

In the sciences, regression tasks often require predicting high-dimensional outputs from few training examples. Multi-output Gaussian processes excel in low-data regimes but typically struggle with high-dimensional outputs. Compress-then-predict pipelines such as PCA-GP (principal component analysis plus Gaussian process regression) handle high dimensionality, but rely on bases optimized for reconstruction rather than prediction. To address this gap, we propose a model that represents each output as a linear-Gaussian decoding of a low-dimensional latent state drawn from a Gaussian process prior. By analytically marginalizing the decoder weights, we couple compression and prediction in a single objective that scales to high-dimensional outputs. We refer to this model as Gaussian process latent factor regression (GPLFR). We demonstrate GPLFR by building the first spatially resolved emulator of global climate models for rocky exoplanets.

2606.06647 2026-06-08 cs.LG q-bio.NC New submission

The Identity Trap in EEG Foundation Models: A Diagnostic Audit

Jun-You Lin, Ying Choon Wu, Tzyy-Ping Jung

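Affiliations: National Yang Ming Chiao Tung University; University of California, San Diego

AI summary: Proposes the FMScope protocol, which uses five diagnostics, including variance decomposition and subject-axis erasure, to show that EEG foundation models under subject-disjoint cross-validation may rely on subject-identity features rather than clinical biomarkers, and verifies that this trap is universal and removable.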

Comments 28 pages, 6 figures, 8 tables. Code available at https://github.com/Jimmy110101013/fmscope

Abstract

Objective. EEG foundation models (FMs) report strong accuracy on clinical resting-state EEG. However, high accuracy under subject-disjoint cross-validation remains ambiguous: it can reflect a genuine clinical biomarker, or subject-identity features that correlate with the label. We name this the Identity Trap and ask whether it can be diagnosed at the representation level before fine-tuning. Approach. We propose FMScope, a frozen-representation protocol packaging five diagnostics: variance decomposition, subject-axis erasure, aperiodic 1/f ablation, layer-wise label probing, and within-subject direction consistency. We apply it to three pretrained FMs (LaBraM, CBraMod, REVE) across four datasets in a 2x2 layout: subject relation of label x presence of a consensus cross-subject EEG marker. Main results. (i) The Identity Trap is universal: frozen subject-variance is 13-89x a random null in 12/12 pairs, rising in all 12 under fine-tuning (+10 to +63 pp). This dominance is a removable linear axis: erasing it improves label decoding where the label varies within subject (+6 to +12 pp in primary cells; +4 to +27 pp across external cohorts). (ii) Aperiodic 1/f is one subject carrier: removing it drops the subject probe by 9-19 pp on LaBraM and CBraMod. REVE saturates subject identity without measurable aperiodic dependence. (iii) Fine-tuning amplifies label-variance only in cells with a literature-established cross-subject marker. Significance. The Identity Trap is a physically-grounded instance of shortcut learning: the preferred cue has a measurable physiological component, and subject-disjoint splitting alone cannot rule it out. FMScope separates gains reflecting a biological marker from those reflecting subject identity.
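One simple way to read the variance decomposition diagnostic is as the share of embedding variance explained by subject identity. The sketch below computes a between-subject sum of squares over the total sum of squares; this ANOVA-style definition and the function name are assumptions, and FMScope's actual decomposition and random-null comparison may differ.

```python
import numpy as np

def subject_variance_fraction(embeddings, subject_ids):
    """Fraction of total embedding variance explained by subject means."""
    X = np.asarray(embeddings, dtype=float)
    ids = np.asarray(subject_ids)
    grand_mean = X.mean(axis=0)
    total_ss = ((X - grand_mean) ** 2).sum()
    between_ss = 0.0
    for s in np.unique(ids):
        Xs = X[ids == s]
        # Squared distance of the subject mean from the grand mean,
        # weighted by the number of recordings from that subject.
        between_ss += len(Xs) * ((Xs.mean(axis=0) - grand_mean) ** 2).sum()
    return between_ss / total_ss

# Toy frozen embeddings: axis 0 separates subjects, axis 1 varies within subject.
embeddings = [[0, 0], [0, 2], [10, 0], [10, 2]]
subject_ids = [0, 0, 1, 1]
fraction = subject_variance_fraction(embeddings, subject_ids)
print(fraction)
```

A null baseline in the same spirit would recompute this fraction after shuffling `subject_ids`, so that the reported value can be expressed as a multiple of chance.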

2606.06658 2026-06-08 cs.LG cond-mat.stat-mech physics.comp-ph New submission

Capturing non-Markovian dynamics in non-equilibrium stochastic systems using flow matching

Bhargav Sriram Siddani, John B. Bell, Alejandro L. Garcia, Ishan Srivastava

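Affiliations: Lawrence Berkeley National Laboratory; San Jose State University

AI summary: Addresses the inability of coarse-grained stochastic partial differential equations to accurately capture short-time non-Markovian effects and low-density non-Gaussian distributions; proposes a generative flow matching method that directly models the probability distribution of fluxes from particle simulations, accurately capturing short-time behavior in the Kramers first passage time problem and improving predictions of the statistical moments of the number density.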

Comments 5 pages, 1 figure, Accepted to 2026 Conference on Physics and AI (PAI26)

Abstract

Hydrodynamic models of stochastic particle systems represented by coarse-grained stochastic partial differential equations (SPDE), such as the regularized Dean-Kawasaki (DK) equation, do not accurately capture the short-time system dynamics, which is dominated by non-Markovian effects, or low particle density regimes, where the distributions are highly non-Gaussian. We develop a generative flow matching method that directly models the probability distribution of fluxes from particle simulations, explicitly incorporating non-Markovian and non-Gaussian effects. As a demonstration, we use this method to simulate the Kramers first passage time problem for a system of non-interacting Brownian particles. We show that the model accurately captures the short-time behavior and provides better predictions of the statistical moments of the number density than the solution of the Markovian baseline, the regularized DK equation.

2606.06663 2026-06-08 cs.LG New submission

Explainable Runtime Dependency Tracking for AI-RAN Conflict Monitoring

Christie Djidjev, Nicholas Kaminski

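AI summary: Addresses the problem that parameter-KPI dependencies in AI-RAN may become invalid; proposes a sliding-window inference method based on Boolean matrices that checks consistency against event streams to achieve lightweight, explainable dependency tracking.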

Abstract

Future AI-integrated Radio Access Networks (AI-RAN) will combine open programmability with learning-enabled xApps, rApps, and control functions that act on shared parameters and key performance indicators (KPIs). For conflict monitoring, it is not enough to know which applications are deployed; the system must also know whether the parameter-KPI dependencies assumed by runtime diagnosis remain valid under the current operating regime. This paper studies a lightweight monitoring primitive for that purpose: tracking an interpretable dependency representation from streaming telemetry events. We represent active dependencies by a Boolean matrix and use Boolean matrix multiplication to check whether recent parameter-activity and KPI-response events are consistent with the current estimate. We propose a sliding-window inference procedure that reuses the estimate when it remains consistent and recomputes it when recent observations indicate structural change. The tracker is intended as an explainable signal for conflict diagnosis and slow-loop model refresh, not as an autonomous mitigation mechanism. Experiments on controlled Boolean event streams show efficient and accurate tracking under dependency changes and Boolean observation noise.
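The consistency check and the reuse-or-recompute rule can be sketched on a single window of events. The matrix layout (rows are events, `D[i, j]` marks parameter `i` affecting KPI `j`), the tolerance parameter, and in particular the `reestimate` rule are assumptions chosen for illustration; the abstract does not state how the paper recomputes the estimate.

```python
import numpy as np

def bool_matmul(P, D):
    """Boolean matrix product: out[t, j] = OR_i (P[t, i] AND D[i, j])."""
    return (P.astype(int) @ D.astype(int)) > 0

def is_consistent(P, R, D, tol=0.0):
    """True if predicted KPI responses match the observed ones within tol."""
    return (bool_matmul(P, D) != R).mean() <= tol

def reestimate(P, R):
    """Hypothetical re-estimate: keep D[i, j] unless some event in the
    window had parameter i active while KPI j stayed silent."""
    violations = P.astype(int).T @ (~R).astype(int)
    return violations == 0

def track(P, R, D):
    """Reuse the current estimate if the window agrees with it, else recompute."""
    return D if is_consistent(P, R, D) else reestimate(P, R)

# Toy window: 3 events, 2 parameters, 2 KPIs.
D_true = np.array([[1, 0], [0, 1]], dtype=bool)
P = np.array([[1, 0], [0, 1], [1, 1]], dtype=bool)   # parameter-activity events
R = bool_matmul(P, D_true)                           # noise-free KPI responses
D_stale = np.array([[1, 1], [0, 1]], dtype=bool)     # outdated estimate
D_new = track(P, R, D_stale)
print(D_new.astype(int))
```

Note that under this simple rule a parameter that is never active in the window keeps all of its dependencies, so a practical tracker would need a window long enough to exercise every parameter, and a nonzero `tol` to absorb observation noise.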

2606.06673 2026-06-08 cs.LG New submission

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

Ujjwal Bhatta, Utsabi Dangol, Sumaly Bajracharya, Rodrigue Rizk, KC Santosh

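Affiliations: USD AI Research Lab

AI summary: Proposes the ULPS framework, which combines a calibrated large language model with uncertainty estimation: a BERT model fine-tuned on A* trajectories supplies action suggestions, and an entropy-based mechanism balances LLM guidance against the PPO policy, significantly improving success rate, reward efficiency, and sample complexity on the MiniGridUnlockPickup benchmark.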

Comments Accepted to the 2026 IEEE Conference on Artificial Intelligence (IEEE CAI). 6 pages, 3 figures. Code available at: https://github.com/USD-AI-ResearchLab/uncertainty-aware-llm-rl

Abstract

Sparse rewards and heterogeneous task sequences remain persistent challenges in Reinforcement Learning (RL), often resulting in slow convergence, weak generalization, and inefficient exploration. We propose Uncertainty-Aware LLM-Guided Policy Shaping (ULPS), a novel framework that integrates a calibrated Large Language Model (LLM) into the RL training loop to provide structured, uncertainty-modulated behavioral guidance. ULPS employs an A*-based oracle to synthesize optimal symbolic trajectories, which are used to fine-tune a BERT-based language model. During training, this model supplies action suggestions whose influence is conditioned on epistemic uncertainty estimated via Monte Carlo (MC) dropout. An entropy-based blending mechanism adaptively balances LLM guidance and the learned policy (via Proximal Policy Optimization, PPO), allowing the agent to prioritize reliable priors while preserving adaptability. We evaluate ULPS on the MiniGridUnlockPickup benchmark and observe consistent improvements in success rate, reward efficiency, and sample complexity over unguided, uncalibrated, and standard RL baselines. ULPS achieves more than 9% improvement in execution accuracy after fine-tuning, requires fewer environment interactions, and yields higher reward AUC. Our results demonstrate that integrating symbolic A* trajectories, pretrained language priors, and uncertainty-aware control offers a principled and effective approach to multi-task reinforcement learning in sparse-reward domains, with potential extensibility to partially observable and multi-agent settings.
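A minimal sketch of uncertainty-modulated blending follows. It assumes the language model is run several times with dropout active, and it uses the normalized predictive entropy of the averaged suggestion as a simple stand-in for the uncertainty signal. The blending formula, the function names, and the toy probabilities are all assumptions; the abstract does not give the actual ULPS rule.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def blend(mc_samples, policy_probs):
    """Mix the LLM suggestion with the policy according to LLM confidence.

    mc_samples:   (n_passes, n_actions) action distributions from
                  stochastic (dropout-enabled) forward passes.
    policy_probs: (n_actions,) action distribution of the learned policy.
    """
    p_llm = np.asarray(mc_samples, dtype=float).mean(axis=0)
    n_actions = p_llm.shape[0]
    # trust = 1 when the averaged suggestion is one-hot, 0 when it is uniform.
    trust = float(np.clip(1.0 - entropy(p_llm) / np.log(n_actions), 0.0, 1.0))
    mixed = trust * p_llm + (1.0 - trust) * np.asarray(policy_probs, dtype=float)
    return mixed / mixed.sum(), trust

ppo_probs = np.array([0.2, 0.5, 0.3])
confident_llm = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
uncertain_llm = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]

mixed_confident, trust_confident = blend(confident_llm, ppo_probs)
mixed_uncertain, trust_uncertain = blend(uncertain_llm, ppo_probs)
print(trust_confident, trust_uncertain)
```

With a confident suggestion the mixed distribution follows the language model, and with a maximally uncertain one it falls back to the learned policy, which is the qualitative behavior the abstract describes.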

2606.06682 2026-06-08 cs.LG New submission

Spatiotemporal Imputation with Graph-Informed Flow Matching

Zepeng Zhang, Aref Einizade, Jhony H. Giraldo, Olga Fink

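AI summary: Proposes the GiFlow framework, which uses a graph-informed prior and a hybrid vector field model for spatiotemporal imputation and outperforms existing methods.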

Comments Accepted at ICML 2026

Abstract

Missing data is a common challenge in spatiotemporal systems, arising in applications such as air quality monitoring and urban traffic management. Traditional machine learning approaches, like recurrent and graph neural networks, rely on iterative propagation, which tends to accumulate errors over time and space. Recent diffusion-based methods mitigate error propagation but require iterative sampling and often depend on problem-agnostic Gaussian priors, limiting both efficiency and effectiveness. To address these limitations, we propose GiFlow, a Graph-Informed Flow Matching framework for spatiotemporal imputation. GiFlow replaces the typical Gaussian prior with a graph-informed prior constructed via spatiotemporal filtering of observable signals, which better aligns the source distribution to the target and thereby simplifies the generation trajectory. The flow field is parameterized by a hybrid vector field model that integrates spatial attention, temporal attention, and spatiotemporal propagation, enabling joint modeling of spatial and temporal dependencies. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed GiFlow outperforms the state-of-the-art approaches in spatiotemporal imputation. The code is available at https://github.com/zepengzhang/GiFlow.

2606.06687 2026-06-08 cs.LG cs.DC cs.NI cs.SY eess.SY New submission

Towards Serverless Semi-Decentralized Federated Learning with Heterogeneous Optimizers

Su Wang, Mung Chiang, H. Vincent Poor

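Affiliations: Department of Electrical and Computer Engineering, Princeton University; Department of Electrical and Computer Engineering, Purdue University

AI summary: Proposes serverless semi-decentralized federated learning (SSD-FL), which forms clusters through a lightweight D2D initialization phase and uses effective loss functions and the Cheeger inequality to optimize clustering, improving convergence speed and communication efficiency.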

Comments Under review at IEEE/ACM Transactions on Networking

Abstract

We investigate cluster formation, involving the number and composition of clusters, in decentralized federated learning (FL) with heterogeneous machine learning (ML) optimizers. While clustering in centralized FL has enabled scalability and resource savings, its value and development in fully decentralized environments have yet to be explored. Optimizing cluster formation in such environments is challenging, especially due to the complex coupling between network graph structures, local data heterogeneity, and different local ML model optimizers. To address these challenges, we propose serverless semi-decentralized FL (SSD-FL), a methodology requiring no persistent server infrastructure. In SSD-FL, cluster formation occurs via a lightweight, one-time device-to-device (D2D) initialization phase, after which actual ML model training (alongside consensus and convergence processes) is fully serverless. Functionally, SSD-FL segments global rounds into intra-cluster and inter-cluster regimes, ensuring global convergence and consensus through novel "effective loss functions" that integrate device-specific ML optimizers with network graph-based regularization. Next, SSD-FL leverages the consensus gap via the Cheeger inequality to develop an iterative clustering algorithm evaluated against our derived convergence and consensus bounds, which incorporate a unique scoring metric to quantify data and optimizer heterogeneity across devices. Finally, experimental evaluation against three categories of decentralized FL methodologies validates that SSD-FL improves both convergence speed and communication efficiency across various network graphs, datasets, and local optimizer regimes.

2606.06694 2026-06-08 cs.LG cs.AI cs.CY New submission

The Geography of Algorithmic Judgment: LLM Intermediaries, Place Identity, and Racial Steering in Housing Search

Hana Samad, Trung Lam, Christoph Mügge-Durum, Michael Akinwumi

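Affiliations: National Fair Housing Institute

AI summary: Through a behavioral audit of housing recommendations from seven LLMs across four U.S. cities, finds that racial steering is an emergent behavior of the model's interpretive license rather than a static property, and that the city is not a neutral testing unit.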

Comments 13 pages with supplemental tables and figures, AIES '26 Submission

Abstract

Large language models (LLMs) are rapidly assuming an intermediary role in housing search through the integration of listing platforms within conversational interfaces, mediating access to information, search, and recommendations within urban settings. We expand on prior work on racial steering in LLMs by conducting a behavioral audit of seven open-weight and closed-source LLMs across four U.S. cities, testing location recommendations across three iterative prompting conditions that progressively add lifestyle preference context and reflect fair housing paired-testing methodologies. We find that steering is an emergent behavior of the model's interpretive license rather than primarily a static property. Steering results from the interaction of a user's identity, preference articulation, and the spatial logic that a model has internalized about learned representations of place, preference, and opportunity in a given city, and how different types of users relate to it. While steering was present, it was not uniform in direction or magnitude across evaluated conditions. Preference-conditioned testing often increased or reconfigured the number of models that exhibited steering behaviors relative to baseline conditions, suggesting that LLMs may interpret what the same housing preference means differently depending on the racial identity of the user. Our findings also demonstrate that the city is not a neutral testing unit for LLM evaluation in place-based sectors, and results from one local market cannot be assumed to generalize to another. Local and domain expertise will be required in the housing sector to ensure that legal and institutional commitments to fair housing are not undermined while adopting AI tools that mediate spatial access.

2606.06717 2026-06-08 cs.LG cs.AI q-bio.BM q-bio.QM New submission

ShallowBench: Benchmarking Generative Drug Design Models on Shallow-Pocket Targets

Saket Reddy, Shiwei Liu

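Affiliations: University of Illinois Urbana-Champaign

AI summary: Proposes the ShallowBench benchmark of 5,780 shallow-pocket targets for evaluating generative drug design models on low-concavity interfaces, revealing that existing models yield weaker predicted binding affinity.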

Abstract

While generative AI models have demonstrated remarkable success in structure-based drug design, they predominantly rely on deep binding pockets and struggle to sample effective ligands for challenging low-pocketability targets, such as the historically "undruggable" oncology targets KRAS and MYC. To address this gap, we introduce ShallowBench, a strictly curated benchmark of 5,780 shallow-pocket targets extracted from CrossDocked2020. By computing the difference between an Alpha Shape "lid" volume and the underlying protein atom voxel volume, we successfully isolated targets with low concavity while ensuring sufficient surface area for binding. Evaluating various state-of-the-art generative models reveals weaker predicted binding affinity on these low-concavity interfaces. ShallowBench therefore provides a rigorous benchmark for generative biology models and highlights the necessity of new architectural innovations or loss functions capable of navigating these challenging targets.

2606.06718 2026-06-08 cs.LG cs.AI cs.SY eess.SY 新提交

MSAIC-Net: A Multi-Scale Attention and Imbalance-Aware Contrastive Network for ECG-Based Myocardial Substrate Abnormality Detection

MSAIC-Net:用于基于心电图的心肌基质异常检测的多尺度注意力和不平衡感知对比网络

Canyu Lei, Fenglin Zhang, Derek Bivona, Cristiane Singulane, Jonathan Pan, Kenneth Bilchick, Amit R. Patel, Jianxin Xie

发表机构 * University of Virginia(弗吉尼亚大学)

AI总结 提出多尺度注意力增强卷积网络MSAIC-Net,通过并行空洞卷积提取多尺度特征、通道注意力重加权、不平衡感知对比学习及导联置换重要性分析,在低数据量UVA队列和大规模PTB-XL数据集上实现心肌瘢痕和心肌梗死检测的准确性和可解释性提升。

详情
AI中文摘要

心肌基质异常,如心肌瘢痕和心肌梗死(MI),与不良心血管结局相关。心电图(ECG)为检测这些异常提供了一种低成本且广泛可用的工具,但由于异质性导联依赖性表现、高维多导联信号、类别不平衡以及深度学习模型的可解释性有限,基于ECG的检测仍然具有挑战性。我们提出了一种多尺度注意力增强卷积网络(MSAIC-Net)用于基于ECG的心肌基质异常检测。MSAIC-Net采用并行空洞卷积分支,在多个时间感受野上提取ECG特征,使模型能够捕捉局部和更长时间范围的时间模式。然后使用通道注意力自适应地重新加权信息性导联和特征通道表示。为了解决类别不平衡并提高特征可分性,我们引入了一种新颖的不平衡感知监督对比学习策略,鼓励同一类别的样本形成紧凑表示,同时增加异常和正常样本之间的分离。进一步引入导联置换重要性来量化每个ECG导联的贡献并提高模型可解释性。该方法在两个互补数据集上进行了评估:来自弗吉尼亚大学(UVA)健康系统的低数据量机构队列用于心肌瘢痕分类,以及来自PhysioNet的大规模公共PTB-XL数据集用于MI识别。实验结果表明,MSAIC-Net优于基线模型,在低数据量的UVA队列中改进尤为显著。总体而言,所提出的框架为基于ECG的心肌基质异常检测提供了一种有效且可解释的方法。

英文摘要

Myocardial substrate abnormalities, such as myocardial scar and myocardial infarction (MI), are associated with adverse cardiovascular outcomes. Electrocardiography (ECG) provides a low-cost and widely available tool for detecting these abnormalities, but ECG-based detection remains challenging due to heterogeneous lead-dependent manifestations, high-dimensional multi-lead signals, class imbalance, and the limited interpretability of deep learning models. We propose a multi-scale attention-enhanced convolutional network (MSAIC-Net) for ECG-based myocardial substrate abnormality detection. MSAIC-Net employs parallel atrous convolutional branches to extract ECG features across multiple temporal receptive fields, enabling the model to capture both local and longer-range temporal patterns. Channel attention is then used to adaptively reweight informative lead-wise and feature-channel representations. To address class imbalance and improve feature separability, we introduce a novel imbalance-aware supervised contrastive learning strategy that encourages samples from the same class to form compact representations while increasing separation between abnormal and normal samples. Lead-wise permutation importance is further incorporated to quantify the contribution of each ECG lead and improve model interpretability. The proposed method was evaluated on two complementary datasets: a low-data institutional cohort from the University of Virginia (UVA) Health System for myocardial scar classification and the large-scale public PTB-XL dataset from PhysioNet for MI identification. Experimental results show that MSAIC-Net outperforms baseline models, with particularly pronounced improvements in the low-data UVA cohort. Overall, the proposed framework provides an effective and interpretable approach for ECG-based detection of myocardial substrate abnormalities.
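The lead-wise permutation importance mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy `predict` rule, the one-number-per-lead feature layout, and the use of accuracy as the score are hypothetical and are not taken from MSAIC-Net.

```python
import random

def permutation_importance(predict, X, y, n_leads, seed=0):
    """Accuracy drop when one lead's values are shuffled across samples."""
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(x) == t for x, t in zip(data, y)) / len(y)

    base = accuracy(X)
    importances = []
    for lead in range(n_leads):
        column = [x[lead] for x in X]
        rng.shuffle(column)  # break the link between this lead and the labels
        permuted = [x[:lead] + [v] + x[lead + 1:] for x, v in zip(X, column)]
        importances.append(base - accuracy(permuted))
    return importances

# Toy data (hypothetical): each sample holds one summary value per lead.
X = [[1.0, 0.3], [-1.0, 0.7], [2.0, -0.2], [-0.5, -0.9]]
y = [1, 0, 1, 0]
predict = lambda x: int(x[0] > 0)  # toy model that only looks at lead 0

importances = permutation_importance(predict, X, y, n_leads=2)
```

A lead that the model never uses, such as lead 1 in this toy setup, gets an importance of exactly zero, because shuffling it cannot change any prediction.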

2606.06722 2026-06-08 cs.LG 新提交

Flatland: The Adventures of Gradient Descent with Large Step Sizes

平面国:大步长梯度下降的冒险

Leonardo Galli, Curtis Fox, Wiebke Bartolomaeus, Mark Schmidt, Holger Rauhut

发表机构 * University of British Columbia(不列颠哥伦比亚大学) Munich Center for Machine Learning(慕尼黑机器学习中心) Canada CIFAR AI Chair (Amii)(加拿大CIFAR人工智能教席(Amii))

AI总结 针对非全局L-光滑的神经网络目标函数,提出一种只需梯度局部Lipschitz连续的定义,设计自适应一阶方法实现大步长并始终处于稳定性边缘,发现过早进入全局平坦区域会降低收敛速度和泛化能力。

Comments Accepted for the International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

神经网络的训练通常涉及非全局$L$-光滑的目标函数。对于这些函数,从理论和实践上都很难回答这样一个问题:确保梯度下降(GD)收敛的最大可能步长是多少?我们通过提供“大”步长的统一定义来解决深度学习中长期存在的这个开放问题,该定义仅要求梯度的局部Lipschitz(甚至Hölder)连续性。我们设计了一阶自适应方法,这些方法可证明产生大步长,并表明它们从训练开始就处于稳定性边缘(EoS)。特别地,损失非单调下降,并且步长与锐度(即Hessian矩阵的最大特征值)的乘积在整个训练过程中保持在EoS阈值2以上。使用我们的方法,我们还能将锐度最小化到其全局最小值。与预期相反,我们发现训练过早遇到全局平坦区域可能会减慢收敛速度并损害网络的泛化能力。利用自稳定化论证,我们允许GD进入稍尖锐的谷底,并将不成功的训练运行转变为非常成功的运行。

英文摘要

The training of neural networks often entails objective functions that are not globally $L$-smooth. For these functions, it is both theoretically and practically difficult to answer the question: what is the largest possible step size that ensures the convergence of gradient descent (GD)? We address this longstanding open question in deep learning by providing a unifying definition of "large" step sizes that requires only local Lipschitz (or even Hölder) continuity of the gradient. We design first-order adaptive methods that provably yield large step sizes and show that they operate at the edge of stability (EoS) right from the start of the training. In particular, the loss decreases nonmonotonically and the product between the step size and sharpness, i.e., the largest eigenvalue of the Hessian, stays above the EoS threshold of 2 throughout training. Using our method, we are also able to minimize the sharpness all the way down to its global minimum. Contrary to expectation, we find that encountering globally-flat regions too early in the training may both slow down convergence and jeopardize the generalization ability of the network. Exploiting a self-stabilization argument, we allow GD to enter slightly sharper valleys and turn unsuccessful training runs into very successful ones.
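The quantity the abstract tracks, the product of step size and sharpness compared against the threshold 2, can be computed on a toy problem as sketched below. This is only an illustration of that product, not the authors' adaptive method; the 2x2 Hessian, the power-iteration estimator, and the function names are assumptions.

```python
import math

def sharpness(hessian, iters=200):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(hessian)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(hessian[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient of the normalized vector
        lam = sum(v[i] * sum(hessian[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam

def eos_product(step_size, hessian):
    """Step size times sharpness; values above 2 are past the classical stability limit."""
    return step_size * sharpness(hessian)

# Hypothetical quadratic loss with Hessian eigenvalues 4 and 1.
H = [[4.0, 0.0], [0.0, 1.0]]
below = eos_product(0.4, H)  # 0.4 * 4 = 1.6, below the threshold
above = eos_product(0.6, H)  # 0.6 * 4 = 2.4, above the threshold
```

On a fixed quadratic, plain GD diverges along the sharpest direction once this product exceeds 2. The abstract's claim concerns neural-network losses, where the sharpness changes during training.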

2606.06724 2026-06-08 cs.LG 新提交

Synthics: Synthetic Physics-like Datasets for Machine Learning

Synthics: 用于机器学习的类物理合成数据集

Jari Vepsäläinen

发表机构 * Aalto University(阿尔托大学)

AI总结 提出基于贝叶斯概率上下文无关文法从方程语料库生成结构相似的合成回归数据集,通过非侵入式探测确定物理有效域,统计验证表明合成数据在结构保真度上优于纯概率方法,且在下游超参数调优中性能接近真实数据。

详情
AI中文摘要

代表性数据是机器学习的基础,因为有限的数据会阻碍泛化。收集足够的真实样本通常是不可行的。合成数据生成提供了一种实用的解决方案,但前提是生成的数据能忠实反映真实观测的结构。本文提出了一种方法,用于生成结构上类似于给定方程语料库中物理方程的合成回归数据集。该方法使用贝叶斯概率上下文无关文法来捕捉语料库的底层代数结构,并从中采样新方程。为了确保生成的输入位于物理有意义的域内,通过非侵入式探测为每个方程表征适用域,同时恢复变量间约束。输入采样进一步通过从有效域的随机子范围中混合均匀分布和截断正态分布进行抽取,模拟真实的实验条件。使用Kolmogorov-Smirnov检验对Feynman方程语料库进行统计验证。生成的方程在全部八个研究的结构特征上与语料库匹配,而未经平滑的纯概率文法仅匹配两个特征,表明贝叶斯先验对于在给定语料库规模下实现结构保真度至关重要。在下游超参数调优任务中,基于合成数据调优的梯度提升回归器在真实数据上平均选出20个配置中的第6优配置,与基于真实数据调优的结果相当,并显著优于随机表达式树(第10优)和噪声(第19优)。

英文摘要

Representative data is fundamental in machine learning, as limited data hinders generalisation. Collecting sufficient real-world samples is often infeasible. Synthetic data generation offers a practical solution, but only if the generated data faithfully reflects the structure of real observations. In this paper, a method for generating synthetic regression datasets that structurally resemble physics equations from a given equation corpus is presented. The approach uses a Bayesian Probabilistic Context-Free Grammar to capture the underlying algebraic structure of the corpus, from which novel equations are sampled. To ensure the generated inputs lie within a physically meaningful domain, the applicability domain is characterised for each equation through non-intrusive probing, also recovering inter-variable constraints. Input sampling further mimics realistic experimental conditions by drawing from random sub-ranges of the valid domain with mixed uniform and truncated normal distributions. The generated data is statistically validated against the Feynman equation corpus using Kolmogorov-Smirnov tests. The generated equations match the corpus on all of the eight studied structural features, compared to only two for an unsmoothed purely probabilistic grammar, demonstrating that the Bayesian prior is essential for structural fidelity given the size of the corpus. In a downstream hyperparameter-tuning task, a gradient-boosted regressor tuned on the synthetic data picks, on average, the 6th-best configuration out of 20 on real data, matching the result of tuning on real data itself and substantially outperforming random expression trees (10th) and noise (19th).
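The contrast between a smoothed (Bayesian) grammar and an unsmoothed one can be sketched as follows. The symmetric Dirichlet-style pseudo-count `alpha`, the rule names, and the counts are all hypothetical; the abstract does not specify the prior actually used.

```python
def rule_probabilities(counts, alpha=0.0):
    """Production probabilities for one nonterminal.

    alpha = 0 gives plain relative frequencies (the unsmoothed grammar);
    alpha > 0 adds a symmetric pseudo-count to every rule (assumed prior).
    """
    total = sum(counts.values()) + alpha * len(counts)
    return {rule: (c + alpha) / total for rule, c in counts.items()}

# Hypothetical rule counts extracted from an equation corpus.
counts = {"E -> E + E": 6, "E -> E * E": 3, "E -> sin(E)": 0, "E -> x": 1}

unsmoothed = rule_probabilities(counts)          # unseen rule gets probability 0
smoothed = rule_probabilities(counts, alpha=1.0)  # unseen rule stays reachable
```

With a small corpus, the unsmoothed grammar can never sample a rule it has not observed, which is one plausible reason a prior matters for structural fidelity.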

2606.06746 2026-06-08 cs.LG 新提交

Performance Variation in Deep Reinforcement Learning

深度强化学习中的性能变化

Haruto Tanaka, A. Rupam Mahmood

发表机构 * Department of Computing Science, University of Alberta(阿尔伯塔大学计算科学系) Alberta Machine Intelligence Institute (Amii)(阿尔伯塔机器智能研究所(Amii)) CIFAR AI Chair(CIFAR人工智能教席)

AI总结 针对深度强化学习算法运行间鲁棒性低的问题,提出基于百分位的统计量(min-max IPR)和可视化方法(逐运行百分位高亮),用于评估性能变化,并通过三个案例验证其有效性。

详情
AI中文摘要

深度强化学习(RL)算法通常存在运行间鲁棒性低的问题,表现为相同配置的智能体在独立运行中性能显著变化。尽管这一问题在研究和实践中带来一系列挑战,但开发评估方法的研究相对较少;相反,RL研究通常报告估计平均性能的不确定性。在本文中,我们概述了传统不确定性和变化估计的局限性,特别是它们与目的的不匹配以及低估风险。然后,我们提出了一种基于百分位的替代统计量和可视化方法,分别称为min-max IPR和逐运行百分位高亮。这些基于百分位的工具易于解释,并依赖于样本百分位的标准性质,提供了关于运行间性能变化的丰富信息。我们通过三个案例研究证明了这一点。首先,我们表明LayerNorm和倒数第二层归一化缩小了PPO的性能变化,而SAC的变化基本不变。其次,我们比较了PPO、SAC、TD-MPC和TD-MPC2,并表明TD-MPC在四个算法中变化最小,同时数据效率最高。最后,在五个Atari环境上比较DQN和Rainbow时,我们表明两种算法表现出相似水平的性能变化。

英文摘要

Deep reinforcement learning (RL) algorithms often suffer from low run-to-run robustness, manifesting as significant performance variation across independent runs of identically configured agents. Although this issue poses a spectrum of challenges across research and practice, relatively few studies develop methods to evaluate it; RL research instead often reports uncertainty in the estimated mean performance. In this paper, we outline the limitations of conventional uncertainty and variation estimates, particularly their misalignment with purpose and the risk of underreporting. We then propose an alternative percentile-based statistic and visualization method, min-max IPR and run-wise percentile highlighting, respectively. These percentile-based tools are easy to interpret and rely on standard properties of sample percentiles, providing rich information about run-to-run performance variation. We demonstrate this through three case studies. First, we show that LayerNorm and penultimate-layer normalizations narrow performance variation in PPO, whereas the variation is mostly unchanged in SAC. Second, we compare PPO, SAC, TD-MPC, and TD-MPC2, and show TD-MPC exhibits the least variation while being the most data efficient among the four. Finally, in a comparison of DQN and Rainbow on five Atari environments, we show that both algorithms exhibit similar levels of performance variation.

2606.06776 2026-06-08 cs.LG 新提交

A Rolling-Window Framework for Churn Prediction and Behavioral Driver Identification

一个用于流失预测和行为驱动因素识别的滚动窗口框架

Muhammad Jawad Mufti, Omar Hammad, Haitham Saleh, Muqaddas Gull

发表机构 * Information and Computer Science Department, King Fahd University of Petroleum and Minerals(国王法赫德石油和矿物大学信息与计算机科学系) Interdisciplinary Research Center for Smart Mobility and Logistics (IRC-SML), King Fahd University of Petroleum and Minerals(智能交通与物流交叉研究中心(IRC-SML),国王法赫德石油和矿物大学) SDAIA–KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals(SDAIA-KFUPM人工智能联合研究中心,国王法赫德石油和矿物大学)

AI总结 提出基于滚动行为窗口的流失预测框架,在非契约服务环境中实现连续风险评估,特征模型准确率87.6%、ROC-AUC 0.94,序列模型召回率96.1%。

详情
AI中文摘要

客户流失预测是客户分析中的核心任务,特别是在非契约、按使用付费的服务环境中,脱离行为并非明确可观察,而必须从行为不活跃中推断出来。现有的流失预测方法通常依赖于简化的时间假设或客户行为的单点表示,这限制了它们支持持续风险评估、可解释性和随时间推移的现实部署的能力。本研究提出了一个时间显式的流失预测框架,该框架使用滚动行为窗口对客户行为进行建模,从而随着客户活动的演变实现重复和实例级别的流失风险估计。客户行为在固定的30天观察窗口内进行总结,随后是30天的未来流失评估窗口,确保行为证据与流失结果之间存在明确的时间分离。该框架在统一的时间设计内集成了基于特征和基于序列的学习方法。所提出的方法在来自非契约服务平台的大规模真实世界数据集上进行了评估。实证结果表明了强大且稳定的预测性能,基于特征的模型准确率达到87.6%,ROC-AUC为0.94,而基于序列的模型通过捕获时间脱离模式实现了高达96.1%的召回率。对未来未见数据的评估证实了在时间偏移下具有显著的鲁棒性,无需模型重新训练即可保持准确率高于83%和ROC-AUC超过0.91。总体而言,研究结果强调,精心设计的时间框架(而非仅模型复杂性)对于实现稳健、可解释且可部署的流失预测至关重要。本研究为动态服务环境中面向流失的决策支持提供了实用基础。

英文摘要

Customer churn prediction is a central task in customer analytics, particularly in non-contractual, pay-per-use service environments where disengagement is not explicitly observed and must be inferred from behavioral inactivity. Existing churn prediction approaches often rely on simplified temporal assumptions or single-point representations of customer behavior, which limit their ability to support continuous risk assessment, interpretability, and realistic deployment over time. This study proposes a temporally explicit churn prediction framework that models customer behavior using rolling behavioral windows, enabling repeated and instance-level churn risk estimation as customer activity evolves. Customer behavior is summarized within a fixed 30-day observation window, followed by a 30-day future churn evaluation window, ensuring a clear temporal separation between behavioral evidence and churn outcomes. The framework integrates feature-based and sequence-based learning approaches within a unified temporal design. The proposed approach is evaluated on a large-scale, real-world dataset from a non-contractual service platform. Empirical results demonstrate strong and stable predictive performance, with accuracy reaching 87.6% and ROC-AUC of 0.94 for the feature-based model, while the sequence-based model achieves recall as high as 96.1% by capturing temporal disengagement patterns. Evaluation on future unseen data confirms meaningful robustness under temporal shift, with accuracy remaining above 83% and ROC-AUC exceeding 0.91 without model retraining. Overall, the findings highlight that carefully designed temporal framing, rather than model complexity alone, is critical for achieving robust, interpretable, and deployment-ready churn prediction. The study provides a practical foundation for churn-oriented decision support in dynamic service environments.
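The rolling-window design described above (a 30-day observation window followed by a 30-day churn evaluation window) can be sketched as below. The churn rule used here (no activity at all in the evaluation window), the 30-day stride, and the restriction to customers active in the observation window are assumptions for illustration, not details confirmed by the abstract.

```python
def rolling_examples(activity_days, start, end, obs=30, horizon=30, stride=30):
    """Build (cutoff, n_events_observed, churned) examples for one customer.

    activity_days: day indices on which the customer was active.
    Features come only from [cutoff - obs, cutoff); the label only from
    [cutoff, cutoff + horizon), so evidence and outcome never overlap.
    """
    days = sorted(activity_days)
    examples = []
    cutoff = start + obs
    while cutoff + horizon <= end:
        observed = sum(1 for d in days if cutoff - obs <= d < cutoff)
        future = sum(1 for d in days if cutoff <= d < cutoff + horizon)
        if observed > 0:  # assumption: only score customers active in the window
            examples.append((cutoff, observed, future == 0))
        cutoff += stride
    return examples

examples = rolling_examples([1, 5, 20, 35, 40], start=0, end=120)
# -> [(30, 3, False), (60, 2, True)]
```

Because each cutoff yields its own example, the same customer contributes repeated, instance-level risk estimates as activity evolves.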

2606.06786 2026-06-08 cs.LG cs.NI 新提交

Federated Foundation Models over Vehicular Networks

车辆网络中的联邦基础模型

Kasra Borazjani, Fardis Nadimi, Payam Abdisarabshali, Owen Palinski, Allan Salihovic, Dinh Nguyen, Minghui Liwang, Seyyedali Hosseinalipour

发表机构 * University of Waterloo(滑铁卢大学)

AI总结 本文提出将多模态多任务联邦基础模型(M3T FedFMs)集成到车辆网络中的愿景,结合基础模型的表达力与联邦学习的隐私保护分布式学习能力,并通过Waymo数据集案例验证其潜力。

Comments 8 pages, 4 figures

详情
AI中文摘要

本文提出了一种前瞻性的愿景,将新兴的多模态多任务联邦基础模型(M3T FedFMs)集成到车辆网络中,旨在统一多模态多任务基础模型(M3T FMs)的表达能力与联邦学习(FL)的隐私保护和分布式学习能力。鉴于这一研究方向在很大程度上尚未被探索,我们首先介绍了M3T FedFMs的基本训练/微调原理。然后,我们讨论了它们在车辆网络中的一系列代表性用例,展示了M3T FedFMs在实现下一代车辆智能方面的巨大潜力。之后,我们识别了车辆环境固有的关键约束,这些约束挑战了M3T FedFMs的实际部署,并阐述了一系列前瞻性的研究方向以应对这些挑战。此外,通过在真实车辆数据集(即Waymo开放数据集)上进行的案例研究,我们展示了M3T FedFMs在车辆网络中的前景,并发布了我们的实现以促进可重复性和刺激这一新兴领域的研究(仓库:此 https URL)。

英文摘要

This paper presents a forward-looking vision for integrating the emerging multi-modal multi-task federated foundation models (M3T FedFMs) into vehicular networks, with the goal of unifying the expressive power of multi-modal multi-task foundation models (M3T FMs) with the privacy-preserving and distributed learning capabilities of federated learning (FL). Given the largely underexplored nature of this research direction, we first introduce the fundamental training/fine-tuning principles of M3T FedFMs. We then discuss a range of their representative use cases in vehicular networks, illustrating the significant potential of M3T FedFMs to enable next-generation vehicular intelligence. Afterwards, we identify key constraints inherent to vehicular environments that challenge the practical deployment of M3T FedFMs, and articulate a set of forward-looking research directions to address these challenges. Furthermore, through a case study conducted on a real-world vehicular dataset (i.e., Waymo Open Dataset), we demonstrate the promise of M3T FedFMs for vehicular networks and release our implementation to facilitate reproducibility and stimulate research in this emerging area (repository: https://github.com/KasraBorazjani/vehicular-fedfm).

2606.06804 2026-06-08 cs.LG stat.AP 新提交

Interpreting Learning Under Competing Models: Joint and Stepwise Approaches for Dynamic Cognitive Diagnosis

解释竞争模型下的学习:动态认知诊断的联合与逐步方法

Yawen Ma, Sahoko Ishida, Kate Cain, Gabriel Wallin

发表机构 * School of Mathematical Sciences, Lancaster University(兰卡斯特大学数学科学学院) Department of Computer Science, University of Oxford(牛津大学计算机科学系) Department of Psychology, Lancaster University(兰卡斯特大学心理学系)

AI总结 研究在项目-技能结构未知时,联合估计Q矩阵与学习过程相比先确定Q矩阵再研究学习,如何改变对学习者发展的结论,并通过动态认知诊断模型分析阅读游戏数据,发现联合分析更可靠。

详情
AI中文摘要

数字学习环境记录学习者对单个项目的反应,使得研究特定技能的发展而非总体分数成为可能。从这些数据中得出关于学习的结论需要一个将反应与潜在技能联系起来的模型,并追踪掌握程度随时间的变化。当每个项目测量的技能未知时,分析者必须决定是联合估计这种结构(Q矩阵)与学习过程,还是先确定它再研究学习。我们表明,这一决定可以改变关于学习者如何发展的实质性结论。使用动态认知诊断模型,我们分析了两个阅读游戏的数据,这些游戏测量了从二年级到三年级的词汇和理解能力,项目文本嵌入为未知的Q矩阵提供了先验信息。联合分析和偏差校正的逐步分析一致认为,大多数学习者朝着掌握两种技能的方向发展,但在三年级时有多少人仍然只部分熟练的问题上存在分歧,从而改变了阅读进展的报告方式。模拟研究确定了两种分析何时出现分歧,并表明当项目-技能结构不确定且项目池在不同年级之间变化时,联合分析更可靠。我们提供了两种分析的R代码。

英文摘要

Digital learning environments record learners' responses to individual items, making it possible to study the development of specific skills rather than overall scores. Drawing conclusions about learning from these data requires a model that links responses to latent skills and tracks how mastery changes over time. When the skills measured by each item are unknown, the analyst must decide whether to estimate this structure, the Q-matrix, jointly with the learning process, or to establish it first and study learning afterwards. We show that this decision can change substantive conclusions about how learners develop. Using dynamic cognitive diagnostic models, we analyse data from two reading games measuring vocabulary and comprehension from Grade 2 to Grade 3, with item-text embeddings providing prior information for the unknown Q-matrix. A joint analysis and a bias-corrected stepwise analysis agree that most learners move toward mastering both skills, but disagree about how many remain only partially proficient at Grade 3, changing how reading progress would be reported. A simulation study identifies when the two analyses diverge and shows that joint analysis is more reliable when the item-skill structure is uncertain and the item pool changes between grades. We provide R code for both analyses.

2606.06820 2026-06-08 cs.LG cs.AI 新提交

SCALE: Scalable Cross-Attention Learning with Extrapolation for Agentic Workflow Scheduling

SCALE: 可扩展的交叉注意力学习与外推方法用于智能体工作流调度

Zhifei Xu, Jierui Lan, Zixuan Liang, Aiji Liang, Jinxi He

发表机构 * Faculty of Arts and Sciences, Beijing Normal University(北京师范大学文理学院)

AI总结 提出SCALE调度器,通过交叉注意力指针网络和结构化表示正则化,实现无需微调即可泛化到不同规模集群的深度强化学习工作流调度。

Comments Submitted to Computer Networks

详情
AI中文摘要

智能体大型语言模型系统将复杂任务分解为工作流有向无环图,其原语必须在异构集群上调度。现有的深度强化学习调度器与固定集群大小绑定,当服务器数量变化时需要重新训练。我们提出SCALE(可扩展的交叉注意力学习与外推),一种无需微调即可泛化到未见过的集群规模的深度强化学习调度器。SCALE采用交叉注意力指针网络,其中任务特征查询服务器特征,因此架构通过构造接受任意数量的服务器。然而,我们观察到仅排列不变架构并不能保证在新规模下的良好性能——随着服务器数量增长,注意力特征经历分布偏移。为了解决这个问题,我们引入结构化表示正则化:一种去相关损失结合朝向标准正态的KL惩罚,使特征统计量无论输入大小都保持稳定。在16个节点上训练并直接在32和48个节点上测试,SCALE在N=48时相对于没有SRR的相同架构将平均响应时间降低了8.9%,确认了显式正则化对于缩小规模泛化差距是必要的。

英文摘要

Agentic Large Language Model (LLM) systems decompose complex tasks into workflow Directed Acyclic Graphs (DAGs) whose primitives must be scheduled on heterogeneous clusters. Existing deep reinforcement learning (DRL) schedulers are tied to a fixed cluster size and require retraining whenever the number of servers changes. We propose SCALE (Scalable Cross-Attention Learning with Extrapolation), a DRL scheduler that generalizes to unseen cluster scales without fine-tuning. SCALE employs a cross-attention pointer network where task features query against server features, so the architecture accepts any number of servers by construction. We observe, however, that permutation-invariant architecture alone does not guarantee good performance at new scales - the attention feature undergoes distribution shift as the server count grows. To counter this, we introduce Structured Representation Regularization (SRR): a decorrelation loss combined with a KL penalty toward the standard normal, which keeps feature statistics stable regardless of input size. Trained on 16 nodes and tested directly on 32 and 48 nodes, SCALE reduces average response time by 8.9% at N=48 relative to the same architecture without SRR, confirming that explicit regularization is necessary to close the scale-generalization gap.
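One way to read the SRR description (a decorrelation loss plus a KL penalty toward the standard normal) is sketched below. The exact form of both terms is an assumption: the decorrelation term is taken as the sum of squared off-diagonal covariances, and the KL term treats each feature dimension as a Gaussian with the batch mean and variance. The paper's actual losses may differ.

```python
import math

def srr_penalty(features, eps=1e-8):
    """Return (decorrelation, kl) for a batch of feature vectors (rows)."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    # KL( N(mean, var) || N(0, 1) ) summed over feature dimensions
    kl = sum(0.5 * (cov[j][j] + means[j] ** 2 - 1.0 - math.log(cov[j][j] + eps))
             for j in range(d))
    # Penalize correlation between different feature dimensions
    decor = sum(cov[a][b] ** 2 for a in range(d) for b in range(d) if a != b)
    return decor, kl

# Uncorrelated, unit-variance toy features: both penalties are (near) zero.
clean = srr_penalty([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
# Perfectly correlated toy features: the decorrelation term becomes positive.
correlated = srr_penalty([[1.0, 1.0], [-1.0, -1.0]])
```

Both terms depend only on batch statistics of the features, which fits the stated goal of keeping feature statistics stable regardless of how many servers are attended over.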

2606.06823 2026-06-08 cs.LG cs.AI q-fin.ST 新提交

PandaAI: A Practical Agent CQ2 for Neuro-symbolic Data Analysis And Integrated Decision-Making in Quantitative Finance

PandaAI: 一种用于量化金融中神经符号数据分析与集成决策的实用智能体CQ2

Yuqi Li, Siyuan Liu, Bingjun Liu

发表机构 * Panda AI

AI总结 针对金融数据低信噪比和非平稳性,提出PandaAI,一种结合市场状态建模与约束alpha生成的闭环神经符号LLM智能体,通过领域微调和模块化架构实现风险感知决策,在沪深300数据上Rank IC提升18.2%,最大回撤降低25.7%。

详情
AI中文摘要

尽管深度学习在各个领域表现出色,但由于金融数据的低信噪比(SNR)和非平稳性,其在金融序列决策中的应用仍然具有挑战性。利用大型语言模型(LLM)的推理能力,我们提出了PandaAI,一种具有市场状态建模和约束alpha生成的闭环神经符号LLM智能体,它桥接了通用LLM推理与金融严谨性,并抑制了LLM生成输出的金融毒性。为了弥合通用语言能力与金融严谨性之间的差距,我们微调了一个领域特定的LLM。此外,我们将此LLM集成到模块化架构中,形成一个闭环系统。与传统优化孤立预测指标的模型不同,PandaAI被设计为一种神经符号智能体,以明确的风险意识在复杂、真实的金融环境中导航。在沪深300股票数据上的大量实验表明,PandaAI比最先进的时间序列模型实现了18.2%更高的Rank IC和25.7%更低的最大回撤。我们的约束LLM生成和双通道适应方法为LLM在高风险序列决策场景中的部署提供了一种通用范式。

英文摘要

While deep learning has excelled in various domains, its application to sequential decision-making in finance remains challenging due to the low Signal-to-Noise Ratio (SNR) and non-stationarity of financial data. Leveraging the reasoning capabilities of Large Language Models (LLMs), we propose PandaAI, a closed-loop neuro-symbolic LLM agent with market regime modeling and constrained alpha generation, which bridges general LLM reasoning with financial rigor and suppresses the financial toxicity of LLM-generated outputs. To bridge the gap between general linguistic capability and financial rigor, we fine-tune a domain-specific LLM. Furthermore, we integrate this LLM into a modular architecture and form a closed-loop system. Unlike traditional models that optimize isolated prediction metrics, PandaAI is designed as a neuro-symbolic agent that navigates the complex, real-world financial environment with explicit risk awareness. Extensive experiments on CSI 300 stock data show that PandaAI achieves an 18.2% higher Rank IC and 25.7% lower maximum drawdown than state-of-the-art time-series models. Our constrained LLM generation and dual-channel adaptation method provide a general paradigm for LLM deployment in high-stakes sequential decision-making scenarios.

2606.06827 2026-06-08 cs.LG 新提交

Architecture Shapes Transfer Specificity in Implicit Neural Representations

架构影响隐式神经表示中的迁移特异性

D Yang Eng

发表机构 * D Yang Eng

AI总结 通过控制实验和偏微分方程基准测试,研究SIREN、ReLU MLP和傅里叶特征MLP三种隐式神经表示架构的迁移特性,发现迁移幅度与迁移特异性分离,ReLU更具源选择性,而SIREN广泛重用权重。

详情
AI中文摘要

坐标网络中的迁移通常通过热启动增益来衡量,但这种增益反映的是源特定结构还是通用权重重用尚不明确。我们通过控制分析测试、二维顶盖驱动方腔纳维-斯托克斯基准以及一维热方程、粘性伯格斯方程和聚焦三次非线性薛定谔方程参考解套件,研究了三种隐式神经表示(INR)家族:SIREN、ReLU MLP和傅里叶特征MLP。分析测试使用独立种子随机控制,而PDE基准使用同族替代源控制和辅助消融。在各种设置下,迁移幅度和迁移特异性明显分离。在10种子控制的一维几何测试中,傅里叶特征显示出最大的结构化迁移(33.1倍),其次是SIREN(23.0倍)和ReLU(10.7倍),但ReLU的选择性更强:随机控制迁移为0.41倍,而SIREN为14.24倍。在受控的双参数一维族中,排名发生变化:在默认设置下,ReLU给出了最清晰的结构化与控制分离,而傅里叶特征仅在带宽重新调整后才有改进。在纳维-斯托克斯和更广泛的一维PDE套件中,没有单一架构主导所有方程,但相同的模式仍然存在:SIREN通常广泛重用权重,而ReLU以及在某些方程中的傅里叶特征更具源选择性。静态诊断仍然薄弱,启发式缩放律$A_{\text{transfer}} \propto 1/\Delta t^2$在所实施的一维审计中被拒绝。这些结果将迁移特异性定位为坐标网络的有用诊断,并表明科学机器学习中的架构选择应在明确控制条件下进行评估,而不仅仅依据迁移幅度。

英文摘要

Transfer in coordinate networks is often measured by warm-start gain, but whether that gain reflects source-specific structure or generic weight reuse is less clear. We study this question across three implicit neural representation (INR) families, SIREN, ReLU MLPs, and Fourier-feature MLPs, using controlled analytic tests, a 2D lid-driven-cavity Navier--Stokes benchmark, and 1D PDE reference-solution suites for heat, viscous Burgers, and focusing cubic NLS. The analytic tests use independent-seed random controls, while the PDE benchmarks use alternate same-family source controls and auxiliary ablations. Across settings, transfer magnitude and transfer specificity separate clearly. In a 10-seed controlled 1D geometric test, Fourier Features show the largest structured transfer ($33.1\times$), followed by SIREN ($23.0\times$) and ReLU ($10.7\times$), but ReLU is far more selective: random-control transfer is $0.41\times$ for ReLU versus $14.24\times$ for SIREN. On a controlled two-parameter 1D family, the ranking changes: ReLU gives the clearest structured-versus-control separation at default settings, whereas Fourier Features improve only after bandwidth retuning. In Navier--Stokes and the broader 1D PDE suite, no single architecture dominates every equation, yet the same pattern remains: SIREN often reuses weights broadly, whereas ReLU and, in some equations, Fourier Features are more source-selective. Static diagnostics remain weak, and the heuristic scaling law $A_{\text{transfer}} \propto 1/\Delta t^2$ is rejected in the implemented 1D audit. These results position transfer specificity as a useful diagnostic for coordinate networks and suggest that architecture selection in scientific machine learning should be evaluated under explicit control conditions, not by transfer magnitude alone.

2606.06833 2026-06-08 cs.LG cs.AI cs.CR 新提交

Hearing the Unspoken: Language Model Priors for Acoustic Adversarial Attacks

听弦外之音:面向声学对抗攻击的语言模型先验

Jiani Xie, Andrew C. Cullen, Paul Montague, Benjamin I. P. Rubinstein

发表机构 * University of Melbourne(墨尔本大学) DST Group(DST集团)

AI总结 提出Semantic Gambit攻击,利用大语言模型实时提供预测上下文,突破因果限制,使实时ASR系统词错误率提升至35.6%,较当前最优方法提高三倍。

详情
AI中文摘要

在实时环境中运行的自动语音识别(ASR)系统必须在严格的时间约束下处理声学输入,其转录决策本质上基于不完整信息。这种因果约束成为攻击者的信息瓶颈,显著限制了攻击性能。我们的新攻击方法Semantic Gambit通过实时利用大语言模型提供的预测上下文,突破了这一因果限制。实验表明,这种增强方式可将语料级词错误率提升至35.6%——比当前最优方法提高三倍。最终,这项工作揭示了如何利用常见的低延迟LLM工具系统地破坏实时ASR流水线。

英文摘要

Automatic Speech Recognition (ASR) systems operating in real-time settings must process acoustic input under strict temporal constraints, where transcription decisions are inherently made on incomplete information. This causal constraint serves as an information bottleneck on attackers, significantly limiting attack performance. Our new Semantic Gambit attack breaks this causal limitation by augmenting the adversary with predictive context derived from a Large Language Model in real-time. Our experiments show that this form of augmentation can elevate the corpus-level Word Error Rate to 35.6% -- a three-fold increase over the current state-of-the-art. Ultimately, this work reveals how common, low-latency LLM tooling can be exploited to systematically subvert real-time ASR pipelines.
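For readers unfamiliar with the metric, corpus-level Word Error Rate is commonly computed as total word-level edit distance divided by total reference words. The sketch below follows that standard definition; the example sentences are made up, and the paper's exact text normalization is not known from the abstract.

```python
def word_edits(reference, hypothesis):
    """Word-level Levenshtein distance and reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1], len(ref)

def corpus_wer(pairs):
    """Total edits over total reference words across all (ref, hyp) pairs."""
    edits = words = 0
    for ref, hyp in pairs:
        e, n = word_edits(ref, hyp)
        edits += e
        words += n
    return edits / words

# Hypothetical transcripts: one substitution and one insertion over 6 reference words.
pairs = [("turn on the light", "turn off the light"),
         ("call mom", "call mom now")]
wer = corpus_wer(pairs)  # 2 edits / 6 words
```

Pooling edits and words across the corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the figure.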

2606.06854 2026-06-08 cs.LG 新提交

The Geometry of Last-Layer Model Stealing

最后一层模型窃取的几何学

Snigdha Chandan Khilar

发表机构 * Independent Researcher(独立研究者)

AI总结 利用几何学解释如何通过已知方法窃取机器学习模型,展示了完美复制Transformer网络最后一层的条件,并揭示了隐藏层的限制。

详情
AI中文摘要

本文利用几何学解释如何通过已有的知名方法窃取机器学习模型。作者展示了完美复制Transformer网络最后一层所需的确切条件。在深入探究隐藏层时,作者解释了明确的限制。作者还证明,仅通过查看最终结果无法完全逆向工程隐藏网络。该研究清晰地勾勒出模型中哪些部分可以被窃取,哪些不能。

英文摘要

This paper uses geometry to explain how a machine learning model can be stolen using an existing, well-known method. The author shows the exact conditions required to perfectly copy the final layer of a transformer network, and explains the clear limits that apply when looking deeper into the hidden layers. The author also demonstrates that a hidden network cannot be fully reverse-engineered from its final outputs alone. Together, these results map out what can and cannot be stolen from a model.

2606.06861 2026-06-08 cs.LG cs.AI 新提交

Modeling Nonlinear Feature Interactions with Product-Unit Residual Networks

使用乘积单元残差网络建模非线性特征交互

Ziyuan Li, Uwe Jaekel, Babette Dellen

发表机构 * University of Applied Sciences Koblenz(科布伦茨应用科学大学) Technical University of Munich(慕尼黑工业大学)

AI总结 提出乘积单元残差网络(PURe),通过显式建模特征交互提升鲁棒性和可解释性,在合成和真实数据集上优于MLP。

Comments Accepted at ICCS 2026

详情
AI中文摘要

理解非线性特征交互在科学和工程中至关重要,然而标准多层感知器(MLP)通常仅隐式地捕获此类交互,导致表征纠缠,可能损害鲁棒性和可解释性。我们研究了乘积单元残差网络(PURe),它将乘法乘积单元与残差连接相结合,以显式建模跨特征耦合,同时稳定优化。我们在一个基于交互的合成基准和两个真实世界数据集上进行了系统评估,考察了预测准确性、对高斯特征噪声的鲁棒性以及在有限训练数据下的性能,并在匹配参数预算下比较了实值和复值变体。除了准确性,基于SHapley Additive exPlanations(SHAP)的交互分析表明,与MLP基线相比,PURe学习了更集中且结构更连贯的交互模式。总体而言,PURe实现了具有竞争力或更好的性能,在低数据场景下具有更好的鲁棒性和样本效率,并增强了交互级别的可解释性。

英文摘要

Understanding nonlinear feature interactions is crucial in science and engineering, yet standard multilayer perceptrons (MLPs) often capture such interactions only implicitly, leading to entangled representations that can impair robustness and interpretability. We investigate product-unit residual networks (PURe) that integrate multiplicative product units with residual connections to explicitly model cross-feature couplings while stabilizing optimization. We conduct a systematic evaluation on an interaction-driven synthetic benchmark and two real-world datasets, assessing predictive accuracy, robustness to Gaussian feature noise, and performance under limited training data, and we compare real- and complex-valued variants under a matched parameter budget. Beyond accuracy, SHapley Additive exPlanations (SHAP)-based interaction analyses show that PURe learns more concentrated and structurally coherent interaction patterns than MLP baselines. Overall, PURe achieves competitive or improved performance, better robustness and sample efficiency in low-data regimes, and enhanced interaction-level interpretability.
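A product unit computes a weighted product of its inputs, which is usually written as the exponential of a weighted sum of logarithms. The sketch below shows a real-valued version restricted to positive inputs, wrapped in a residual connection. The block layout and names are hypothetical and are not the PURe architecture itself; the abstract also mentions complex-valued variants, which this sketch does not cover.

```python
import math

def product_unit(x, w):
    """prod_i x_i ** w_i, computed as exp(sum_i w_i * log x_i); needs x_i > 0."""
    return math.exp(sum(wi * math.log(xi) for xi, wi in zip(x, w)))

def residual_product_block(x, weights):
    """Add one product unit per input dimension back onto the input.

    Assumption: len(weights) == len(x), so the skip connection needs no projection.
    """
    return [xj + product_unit(x, wj) for xj, wj in zip(x, weights)]

y = product_unit([2.0, 3.0], [1.0, 1.0])     # 2 * 3
z = product_unit([4.0, 2.0], [0.5, -1.0])    # sqrt(4) / 2
out = residual_product_block([2.0, 3.0], [[1.0, 1.0], [0.0, 0.0]])
```

Because the exponents are learnable, a single unit can represent products, ratios, and powers of features directly, which is the explicit cross-feature coupling the abstract contrasts with an MLP's implicit interactions.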

2606.06866 2026-06-08 cs.LG nucl-th 新提交

Product units in gated recurrent units improve nuclear-mass prediction

门控循环单元中的乘积单元改进核质量预测

Ziyuan Li, Paulo S. A. Freitas, John W. Clark, Babette Dellen

发表机构 * University of Applied Sciences Koblenz(科布伦茨应用科学大学) Technical University of Munich(慕尼黑工业大学) University of Madeira(马德拉大学) Washington University in St. Louis(圣路易斯华盛顿大学)

AI总结 提出基于复数域加法-乘法乘积单元门控循环单元(AM-PU-GRU)的机器学习模型,通过整合乘积单元变换和复数计算,在核质量预测中实现插值RMSE 0.227 MeV和外推RMSE 0.179 MeV,超越现有模型。

Comments Accepted at ICCS 2026

详情
AI中文摘要

使用机器学习预测原子核质量可以补充理论模型,并推进对核图表中未知领域的探索。我们提出了一种基于门控循环单元(GRU)的机器学习技术,该技术通过利用长期依赖关系在核质量预测中展现出竞争性能。通过在循环单元内整合乘法交互和乘积单元变换,我们报告了核质量预测的显著改进。计算在复数域中进行,以联合捕捉幅度和相位动态。对于基于原子质量评估(AME2016和AME2020)的插值和时间外推任务,复数加法-乘法乘积单元门控循环单元(AM-PU-GRU)模型始终实现最低的预测误差,插值RMSE为0.227 ± 0.004 MeV,外推RMSE为0.179 ± 0.015 MeV。这些结果超越了其他最先进的机器学习模型,也优于实值GRU基线和乘积单元消融变体,同时对不同的理论先验(包括WS4和SEMF)保持鲁棒性。我们的发现确立了复数乘积单元循环网络作为基于序列的核质量预测的新基准。

英文摘要

The prediction of masses of atomic nuclei using machine learning can complement theoretical models and advance the exploration of poorly known domains of the nuclear chart. We propose a machine learning technique based on gated recurrent units (GRU), which have demonstrated competitive performance in nuclear-mass prediction by exploiting long-term dependencies. By integrating multiplicative interactions and product-unit transformations within recurrent units, we report significant improvements in nuclear-mass prediction. Computations are performed in the complex domain to jointly capture amplitude and phase dynamics. For interpolation and temporal-extrapolation tasks based on the atomic mass evaluation (AME2016 and AME2020), the complex additive-multiplicative product-unit gated recurrent unit (AM-PU-GRU) model consistently achieves the lowest prediction errors, with an interpolation RMSE of 0.227 $\pm$ 0.004 MeV and an extrapolation RMSE of 0.179 $\pm$ 0.015 MeV. These results surpass other state-of-the-art machine learning models and also outperform the real-valued GRU baseline and product-unit ablation variants, while remaining robust to different theoretical priors, including WS4 and SEMF. Our findings establish complex-valued product-unit recurrent networks as a new benchmark for sequence-based nuclear-mass prediction.

2606.06871 2026-06-08 cs.LG 新提交

Evidence-Grounded Ensemble Diagnosis of 802.11 Packet Captures: A Multi-Stage Pipeline with Deterministic Reliability Scoring

基于证据的802.11数据包捕获集成诊断:具有确定性可靠性评分的多阶段流水线

Jerome Henry, Swadhin Pradhan, Miroslav Popovic

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出PROBE多阶段流水线,通过确定性证据框架和集成方法解决LLM在802.11诊断中的幻觉、置信度偏差和评估偏见问题,在87个企业Wi-Fi捕获上实现0.957的加权证据F1分数和96%的自动接受率。

Comments 37 pages, 9 figures, 9 tables

详情
AI中文摘要

诊断802.11数据包捕获需要专家协议知识,速度慢、工程师间不一致且不可扩展。基于LLM的方法听起来合理,但会编造捕获中不存在的协议事件(尤其是截断的跟踪),产生未校准的置信度分数,并且当黄金参考由被测模型共同生成时遭受评估偏差。我们引入PROBE(基于证据的协议推理集成),一个解决所有三个失败的多阶段流水线。它整合了(i)具有帧级可验证性的确定性PCAP到文本归一化,(ii)多运行、多候选集成,带有可选的跨模型第二意见和渐进混淆,(iii)一个判决感知的证据框架,将缺乏失败证据视为贡献证据,以及(iv)一个完全确定性的复合可靠性分数,来自证据有效性、运行间稳定性和跨模型一致性,无需LLM自我评估。在87个企业Wi-Fi捕获(104个捕获-审查者对)上,单次LLM分析将加权证据F1从0.871(专家基线)提升到0.912,但在35%的情况下遗漏了关键帧。朴素集成投票降至基线以下(0.842),因为多数投票放大了保守判决:50%的确认失败被误分类为“无问题”或“证据不足”。添加基于证据的协调达到0.957 F1,96%的自动接受率,以及最坏情况下的下限高于0.70。LLM自我报告的置信度聚集在0.95,无论难度如何(71%报告恰好0.95),证实其无信息量。我们还引入了一个使用逐字段断言匹配的模型无关评估框架,消除了来自模型共同生成的黄金参考的循环偏差。

英文摘要

Diagnosing 802.11 packet captures requires expert protocol knowledge, is slow, inconsistent across engineers, and unscalable. LLM-based approaches sound plausible but fabricate protocol events absent from captures (especially truncated traces), produce uncalibrated confidence scores, and suffer evaluation bias when golden references are co-produced by the model under test. We introduce PROBE (Protocol Reasoning Over evidence-Based Ensembles), a multi-stage pipeline addressing all three failures. It integrates (i) deterministic PCAP-to-text normalization with frame-level verifiability, (ii) multi-run, multi-candidate ensembles with optional cross-model second opinion and progressive obfuscation, (iii) a verdict-aware evidence framework treating absence of failure evidence as contributing evidence, and (iv) a fully deterministic composite reliability score from evidence validity, run-to-run stability, and cross-model agreement without LLM self-assessment. On 87 enterprise Wi-Fi captures (104 capture-reviewer pairs), single-pass LLM analysis raises weighted evidence F1 from 0.871 (expert baseline) to 0.912 but misses critical frames in 35% of cases. Naive ensemble voting drops below baseline (0.842) as majority voting amplifies conservative verdicts: 50% of confirmed failures are misclassified as 'no issue' or 'insufficient evidence.' Adding evidence-grounded reconciliation achieves 0.957 F1, a 96% auto-accept rate, and a worst-case floor above 0.70. LLM self-reported confidence clusters at 0.95 regardless of difficulty (71% report exactly 0.95), confirming it is uninformative. We also introduce a model-agnostic evaluation framework using per-field assertion matching, eliminating circular bias from model-co-produced golden references.

2606.06881 2026-06-08 cs.LG New submission

GlucoFM-Bench: Benchmarking Time-Series Foundation Models for Blood Glucose Forecasting

Baiying Lu, Zhaohui Liang, Ryan Pontius, Shengpu Tang, Temiloluwa Prioleau

Affiliations: Department of Computer Science; Dartmouth College; Emory University; Quantitative Biomedical Sciences

AI summary: Proposes the GlucoFM-Bench benchmark, which evaluates eight architectures (pre-trained time-series foundation models, time-series large language models, and task-specific deep learning models) for blood glucose forecasting on 15 diabetes datasets. Pre-trained models perform well in zero-shot and few-shot settings, but a lightweight LSTM remains best under full-shot training.

Abstract

Blood glucose forecasting models are foundational for modern diabetes management systems, as reliable short-term predictions can enable proactive interventions, support automated insulin delivery, and reduce the risk of hypo- and hyperglycemic events. From a modeling perspective, glucose forecasting poses unique challenges due to heterogeneous physiological dynamics across diabetes populations. Traditional machine learning and deep learning models have been extensively evaluated for glucose prediction, yet recent time-series foundation models (TSFMs) remain much less studied in this setting. To bridge this gap, we present GlucoFM-Bench, a comprehensive benchmark evaluating state-of-the-art TSFMs alongside supervised deep learning models for blood glucose forecasting. We assess eight representative architectures, including pre-trained TSFMs, time-series large language models, and task-specific deep learning models, across 15 publicly available diabetes-relevant datasets comprising 1,117 individuals with type 1 diabetes, type 2 diabetes, prediabetes, and no diabetes. Models are evaluated under zero-shot, few-shot, and full-shot protocols, with systematic variation in context length and prediction horizon. Across datasets, pre-trained TSFMs, especially Chronos-2 and TimesFM, show strong zero-shot and few-shot transfer, with the best zero-shot model performing within 5% of the best full-shot supervised model. Yet, when task-specific data are abundant, a lightweight LSTM remains strongest, outperforming TSFMs by 4-21% under full-shot training. Stratified analyses reveal persistent challenges in T1D cohorts and hypo-/hyperglycemic ranges, highlighting the need for evaluation beyond aggregate error metrics. Together, GlucoFM-Bench provides a standardized and reproducible foundation for evaluating, comparing, and improving foundation models for blood glucose forecasting.

2606.06892 2026-06-08 cs.LG New submission

GRASP: Geometry-aware Residual Alignment for Scalable Pretraining Data Attribution

Yue Min, Ruining Chen, Yujun Li

Affiliations: Wizard Quant; University of Science and Technology of China

AI summary: Proposes GRASP, which models subset interactions through a quadratic geometric penalty and couples low-dimensional feature sketches with a finite lower-confidence-bound selection protocol to make pretraining data attribution scalable. It substantially improves counterfactual subset fidelity while reducing upfront artifact construction cost.

Abstract

Scalable data attribution methods typically assign isolated utility scores to individual training examples. This prevalent additive assumption fundamentally fails to capture critical subset dynamics, including data redundancy and complementary coverage. In this work, we reframe attribution as subset-level counterfactual utility prediction and introduce GRASP, an interaction-aware surrogate. Grounded in a theoretical smoothness lower bound, GRASP explicitly models subset interactions through a quadratic geometric penalty. To achieve pretraining-scale efficiency without relying on hidden oracle tuning, we couple low-dimensional feature sketches with a strictly finite lower-confidence bound selection protocol. Extensive subset-retraining evaluations demonstrate that GRASP decisively outperforms existing scalable baselines. It more than doubles the task-level rank correlation for counterfactual subset fidelity while reducing upfront artifact construction costs by nearly an order of magnitude. Downstream diagnostics further show that this scoring mechanism transfers to language model curation and cross-domain vision selection, establishing a robust foundation for optimizing massive pretraining corpora.

2606.06902 2026-06-08 cs.LG New submission

TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models

Chengkai Zhang, Ziteng Liu, Junpu Wang, Zeyi Tao, Yang Wang, Sagar Chordia, Qin Huang

Affiliations: Meta AI

AI summary: Proposes TALAN, a sequence-conditioned latent side path inserted into the Transformer residual stream and co-trained with a low-rank adapter. On STEM/code benchmarks it improves on LoRA by 1.41 points and on DoRA by 1.85 points on average, adding <1% trainable parameters and 1.01-1.02x inference overhead versus matched LoRA.

Abstract

Targeted post-training aims to improve reasoning, math, and code without degrading strengths. Low-rank adapters are efficient but task-global; activation interventions are input-aware but often require separate probes, vectors, or inference-time steering. We introduce TALAN (Task-Aligned Latent Adaptation Networks), a sequence-conditioned latent side path inserted into a transformer's residual stream and co-trained with a low-rank adapter in one SFT loop. TALAN compresses the active sequence into latent memory, remixes it into token-level perturbations, and writes them back through a controlled residual update. It is configured along six axes: insertion location, memory size, mixer, writeback rule, trainability scope, and gradient scale. Across four Qwen3-family backbones and four STEM/code benchmarks, TALAN improves matched LoRA and DoRA baselines. With LoRA, it yields a +1.41 point cross-model mean gain, positive on all four backbones and non-negative on all 16 model-benchmark cells. With DoRA, it yields a +1.85 point mean gain, positive on all backbones and on 13 of 16 cells. Paired seed checks support positive average effects but show nontrivial variance, so we treat them as sensitivity checks. Cost is small: <1% trainable parameters relative to the backbone and 1.01-1.02x inference overhead versus matched LoRA. A Llama-3.2-1B transfer probe is also positive under LoRA and rsLoRA across seven paired seeds, supporting a transfer beyond Qwen. Internal-state analyses suggest TALAN is a small complementary activation intervention. The matched adapter update is 80-1,700x larger than the TALAN perturbation, yet their directions have near-zero cosine; per-layer measurements show this small orthogonal perturbation propagates and amplifies through depth. TALAN offers a practical platform for studying steerable activation-level adaptation within standard adapter-based post-training.

2606.06920 2026-06-08 cs.LG cs.AI New submission

The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

Rahul Nair, Chun Tao

Affiliations: GitHub; University of California, Berkeley; Stanford University

AI summary: Evaluates fine-tuning strategies for five sub-1B models on mathematical reasoning tasks, finding that full fine-tuning causes negative transfer in models under 300M parameters and that parameter-efficient fine-tuning (PEFT) is a stability requirement.

Comments: 8 pages, 6 figures, 2 tables

Abstract

Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. In this study, we benchmark five sub-1B models (135M-1B) on mathematical reasoning tasks and uncover a critical vulnerability: Full Fine-Tuning (Full FT) actively harms performance in models under 300M parameters, often dropping accuracy below zero-shot baselines. This "negative transfer" makes Parameter-Efficient Fine-Tuning (PEFT) not just an efficiency preference, but a stability requirement. We find that while Low-Rank Adaptation (LoRA) and Weight-Decomposed LoRA (DoRA) perform comparably, their strengths vary by task; DoRA excels in complex reasoning (GSM8K), while LoRA dominates pattern matching (OrcaMath). In particular, Full FT is outperformed by LoRA on aligned models (Qwen2.5-0.5B) and even by simple 5-shot In-Context Learning on the smallest architectures (SmolLM2-135M). Based on these findings, we recommend defaulting to PEFT for all aligned sub-1B models and caution against Full FT for any architecture smaller than 500M parameters to prevent catastrophic forgetting. Reproduction of this work can be found at https://github.com/gulguluu/tiny-slm-finetune-compare.

2606.06924 2026-06-08 cs.LG New submission

From Sampled Outcomes to Capability Distributions: Rethinking Supervision for LLM Routing

Guannan Lai, Haoran Hu, Long Chen, Zhenguo Li, Han-Jia Ye

Affiliations: School of Artificial Intelligence, Nanjing University; National Key Laboratory for Novel Software Technology, Nanjing University; Hong Kong University of Science and Technology; Frontier Robotics

AI summary: To address the noise introduced by using a single response as the supervision signal in LLM routing, proposes the DARS framework, which builds routing supervision from a distributional view and accounts for both input-side and output-side uncertainty. Experiments show that distribution-aware supervision is more stable and effective.

Abstract

Existing LLM routing methods typically treat a model's single response to a query as its capability label for training routers. However, because LLM generation is inherently stochastic, such single-shot supervision provides only a noisy observation of a query-model pair's behavior rather than a reliable capability estimate. We show that this assumption introduces systematic noise into routing supervision, making learned routing policies less reliable. To address this issue, we propose DARS (Distribution-Aware Routing Supervision), a framework that constructs routing supervision from a distributional view of model behavior. Instead of relying on a single generated response, DARS considers uncertainty from both the input side and the output side, capturing how semantically equivalent query formulations and stochastic generations affect model performance. Based on these distribution-aware observations, DARS builds more reliable supervision signals for routing. Experiments across diverse tasks show that single-shot labels can be misleading for model selection, while distribution-aware supervision provides more stable labels and improves learned routing behavior. Our results suggest that reliable LLM routing should move beyond single-response observations and be grounded in query-level model capability distributions.
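The contrast between a single-shot label and a distribution-aware label can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how DARS aggregates observations, so the plain success-rate average below, the function names, and the `toy_model` stand-in for "query an LLM and verify its answer" are all hypothetical.

```python
import random

def single_shot_label(model_fn, query, rng):
    """One sampled outcome used directly as the capability label (0.0 or 1.0)."""
    return 1.0 if model_fn(query, rng) else 0.0

def distribution_label(model_fn, paraphrases, n_samples, rng):
    """Empirical success rate over paraphrases (input side) and
    repeated stochastic generations (output side)."""
    outcomes = [model_fn(q, rng) for q in paraphrases for _ in range(n_samples)]
    return sum(outcomes) / len(outcomes)

# Hypothetical stand-in for an LLM call plus answer verification:
# it succeeds with a probability that depends on the phrasing.
def toy_model(query, rng):
    success_prob = 0.7 if "step by step" in query else 0.5
    return rng.random() < success_prob

rng = random.Random(0)
paraphrases = ["Solve 12 * 13.", "Compute 12 * 13 step by step."]
noisy = single_shot_label(toy_model, paraphrases[0], rng)    # either 0.0 or 1.0
stable = distribution_label(toy_model, paraphrases, 200, rng)  # close to 0.6
```

The single-shot label can only be 0 or 1, so a router trained on it sees a coin flip for this query, while the averaged label approximates the underlying success rate.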

2606.06934 2026-06-08 cs.LG New submission

Uniform Stability and Generalization Error of GD and SGD on Fixed-Point Parameters

Jonghyun Shin, Sejun Park

Affiliations: Department of Artificial Intelligence, Korea University

AI summary: Studies the generalization error and uniform stability of gradient descent (GD) and stochastic gradient descent (SGD) over discrete parameter spaces. Deterministic rounding worsens the GD generalization error rate from O(T/n) to O(T/√n), SGD with deterministic rounding still admits nontrivial stability guarantees, and stochastic rounding can introduce generalization error that grows with the dimension.

Abstract

We analyze generalization error, uniform stability, and uniform argument stability of gradient descent (GD) and stochastic gradient descent (SGD) over discrete parameter spaces, where each update involves deterministic or stochastic rounding. We show that deterministic rounding degrades the generalization error of GD on convex, Lipschitz, and smooth loss functions, increasing the rate from $O(T/n)$ to $O(T/\sqrt{n})$, and establish matching lower bounds. We further prove that uniform stability of GD becomes $\Omega(T)$, showing that stability-based generalization bounds are vacuous in this setting. In contrast, for the same losses, stochastic gradient descent with deterministic rounding admits nontrivial uniform stability guarantees, which differ qualitatively from the real-valued case and exhibit distinct dependencies on the number of iterations and the dimension: we prove tight bounds $O(T/n)$ for one dimension and $O(T^2/n)$ for higher dimensions. We also show that stochastic rounding can introduce generalization error that increases with the dimension; such a phenomenon is absent in standard real-valued optimization and in the deterministic rounding case. Finally, we provide upper bounds on uniform argument stability for stochastic rounding schemes and show that these bounds are tight when the loss can be represented as a sum of coordinate-wise functions.
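To make the setting concrete, the following minimal sketch runs one-dimensional gradient descent on a fixed-point grid with either rounding mode. It illustrates only the update mechanics, not the paper's bounds; the grid spacing, the quadratic loss, and all function names are assumptions chosen for the example.

```python
import math
import random

def round_deterministic(x, step):
    """Round x to the nearest multiple of step."""
    return step * round(x / step)

def round_stochastic(x, step, rng):
    """Round x to a neighbouring grid point, choosing the upper one with
    probability proportional to proximity, so the result equals x in expectation."""
    lower = step * math.floor(x / step)
    p_up = (x - lower) / step
    return lower + step if rng.random() < p_up else lower

def gd_fixed_point(grad, w0, lr, n_steps, step, rounder):
    """Gradient descent where every iterate is rounded back onto the grid."""
    w = w0
    for _ in range(n_steps):
        w = rounder(w - lr * grad(w), step)
    return w

# Assumed toy loss 0.5 * (w - 0.3)^2 on a grid with spacing 1/8.
grad = lambda w: w - 0.3
w_det = gd_fixed_point(grad, 0.0, 0.1, 50, 0.125, round_deterministic)

rng = random.Random(0)
w_sto = gd_fixed_point(grad, 0.0, 0.1, 50, 0.125,
                       lambda x, s: round_stochastic(x, s, rng))
```

With these values the deterministic variant never leaves `w = 0.0`: each proposed step (0.03) is smaller than half the grid spacing and is rounded away. The stochastic variant can still move because small steps are kept with positive probability, at the cost of extra randomness in the iterates.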

2606.06967 2026-06-08 cs.LG New submission

GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios

Ke Hu, Shutong Ding, Panxin Tao, Jingya Wang, Ye Shi

Affiliations: ShanghaiTech University

AI summary: Proposes the GenPO++ framework, which uses history states in a high-order reversible ODE solver as auxiliary memory to obtain an exactly invertible map, allowing exact and efficient likelihood-ratio computation for generative flow policies. It performs competitively with or better than existing on-policy methods on continuous-control tasks.

Abstract

Generative policies provide expressive and multimodal action distributions, making them attractive for reinforcement learning (RL) in complex continuous-control tasks. Among them, flow-based policies are especially appealing because they generate actions through deterministic transport maps. However, applying such generative policies to likelihood-based on-policy learning remains limited by the difficulty of evaluating the probability of executed actions. Existing flow RL methods either replace the true action-density ratio with approximate surrogates, which can introduce biased updates, or recover exact likelihoods through dummy-action augmentation, which enlarges the policy space and increases computation. In this work, we propose GenPO++, a reversible generative policy optimization framework that uses history states as auxiliary memory in a high-order reversible ODE solver, yielding exact inversion without changing the original action dimension. The resulting generative policy map has a log-determinant determined only by fixed solver coefficients, enabling exact and Jacobian-free likelihood-ratio computation. This design preserves the expressiveness of generative flow policies while avoiding both action ratio bias and dummy-action overhead. We evaluate GenPO++ on large-scale simulated control, fine-tuning, and real-world robotic manipulation tasks, where it achieves competitive or superior performance over state-of-the-art on-policy RL methods, while improving training stability and computational efficiency.

2606.06984 2026-06-08 cs.LG New submission

Accelerating Multi-Objective Bayesian Optimisation via Predictive-Gradient Catalysts

Alma Rahat, Tinkle Chugh, Jonathan Fieldsend, Richard Allmendinger

Affiliations: Loughborough University; University of Exeter; The University of Manchester

AI summary: Proposes using Gaussian process predictive gradients as auxiliary signals to augment existing Pareto-compliant acquisition functions, accelerating the convergence of multi-objective Bayesian optimisation toward the global Pareto set.

Comments: Parallel Problem Solving From Nature (PPSN), 2026

Abstract

This paper presents a general acceleration mechanism for multi-objective Bayesian optimisation (MOBO) that leverages Gaussian process predictive gradients as auxiliary signals. Rather than replacing existing Pareto-compliant acquisition functions, the proposed approach augments them with local stationarity information derived from surrogate-derived gradients, enabling faster convergence toward the global Pareto set under limited evaluation budgets. Two catalyst instantiations are investigated: an adaptive Multiple-Gradient Descent Algorithm-Based Catalyst (MGDA) and a predefined-weight variant that enables focused exploration when budgets are tight. Experiments on the DTLZ benchmark suite (using 2 objectives and 10 decision variables) show that predictive gradient catalysis can deliver significant acceleration compared to other acquisition functions (EHVI, AugTch, tMPoI, SAF) when surrogates are accurate, particularly for stationary problems.
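For the two-objective case used in the experiments, the core step of the Multiple-Gradient Descent Algorithm has a closed form: find the minimum-norm convex combination of the two gradients. The sketch below shows that standard step only. How the paper combines it with an acquisition function is not described in the abstract, and in the paper's setting the inputs would be gradients of the Gaussian process predictions rather than the hand-written vectors assumed here.

```python
def mgda_two_gradients(g1, g2):
    """Return (alpha, d) where d = alpha*g1 + (1-alpha)*g2 has minimum norm
    over alpha in [0, 1]. Moving along -d does not increase either objective
    to first order; d = 0 signals a Pareto-stationary point."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical gradients: any alpha gives the same direction.
        return 1.0, list(g1)
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    alpha = min(1.0, max(0.0, alpha))  # clip to the convex hull
    direction = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, direction

# Orthogonal gradients: the common direction weights both equally.
alpha_a, d_a = mgda_two_gradients([1.0, 0.0], [0.0, 1.0])
# Opposing gradients: no common descent direction exists.
alpha_b, d_b = mgda_two_gradients([1.0, 0.0], [-1.0, 0.0])
```

A vanishing combined direction, as in the second call, is the kind of local stationarity information the abstract refers to.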

2606.06986 2026-06-08 cs.LG New submission

Heterogeneous Effects of Green Finance on Urban Decarbonization: Evidence from 285 Cities in China

Xueyang Li, Jinlei Ma

Affiliations: School of Business, Anhui University of Technology

AI summary: Using econometric models and machine learning analysis, finds that green finance significantly lowers city-level carbon intensity, with green bonds and green investment having the strongest effects and evident spatial spillovers. The effects vary with city development level and operate mainly through channels such as energy structure optimization.

Abstract

While green finance has become a key instrument for low-carbon city transitions, its actual decarbonization effects and transmission mechanisms remain unclear. This study employs econometric models and machine learning-based analysis to examine whether and how green finance reduces city-level carbon intensity. Results show that green finance significantly lowers carbon intensity, with green bonds and green investment having the strongest impacts and evident spatial spillovers. The effects vary by development level, being most pronounced in Fourth- and Fifth-tier cities. Mediation analysis reveals that green finance operates mainly through energy structure optimization, followed by industrial upgrading, foreign direct investment, and technological innovation. SHAP analysis confirms substantial differences across financial instruments, with green bonds, funds, and credit contributing most to decarbonization. Moreover, the marginal impact is stronger in cities with low technological capacity, high industrial dependency, and coal-based energy mixes. These findings provide theoretical support and policy guidance for building a multi-level, regionally differentiated green finance system to promote inclusive low-carbon transitions. Keywords: Green Finance; Carbon Intensity; Decarbonization Effect; Machine Learning; City

2606.06990 2026-06-08 cs.LG New submission

Accelerating Reproducible Research in Synthetic EHR Generation

Jalen Jiang, Chufan Gao, Ethan Rasmussen, Stephen Z. Xie, Jimeng Sun

Affiliations: University of Illinois Urbana-Champaign

AI summary: Proposes a lightweight end-to-end benchmarking framework that unifies data loading, standardized training, and architecture-agnostic evaluation. It reimplements several baseline models and adds a GPT-2 baseline, provides a privacy-utility evaluation suite with bootstrapped confidence intervals, analyzes poor long-tail performance, and promotes community-driven reproducibility.

Abstract

The generation of high-fidelity synthetic Electronic Health Records (EHR) is crucial for advancing medical research while preserving patient privacy. However, head-to-head comparison of existing generative models is hindered by disjointed codebases, incompatible data loaders, conflicting library dependencies, and inconsistent evaluation protocols. To address these gaps, we introduce a lightweight, end-to-end benchmarking framework for reproducible synthetic EHR evaluation, organized as a unified pipeline spanning data ingestion, standardized model training, and architecture-agnostic evaluation. Our current implementation targets the generation of longitudinal ICD diagnosis codes (the most commonly studied modality in this literature) and is built on the community-maintained PyHealth library. We reimplement and unify strong baselines (MedGAN, CorGAN, PromptEHR, HALO) under full ICD-9 vocabulary granularity, and add a lightweight GPT-2 baseline from the general-purpose sequence-modeling literature. We contribute a rigorous, architecture-agnostic privacy-utility evaluation suite that applies identically to GAN- and transformer-based generators, and report bootstrapped confidence intervals across all metrics. We further analyze the poor long-tailed performance of existing models and discuss the extensibility of our framework beyond diagnosis codes. By lowering the engineering barrier to running, extending, and evaluating models under a single pipeline, we introduce a starting point for community-driven reproducibility and benchmarking of synthetic EHR models.
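Bootstrapped confidence intervals of the kind mentioned above can be computed with a percentile bootstrap. The sketch below is a generic stdlib version, not the framework's or PyHealth's implementation; the function name, the number of resamples, and the example scores are assumptions.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the statistic,
    and read the interval off the sorted resampled statistics."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi

# Hypothetical per-patient utility scores for one generator.
scores = [0.61, 0.58, 0.70, 0.66, 0.59, 0.73, 0.64, 0.68, 0.62, 0.65]
ci_low, ci_high = bootstrap_ci(scores)
```

Reporting the interval alongside the point estimate makes it easier to tell whether two generators actually differ on a metric or merely differ by resampling noise.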

2606.07006 2026-06-08 cs.LG cs.CL New submission

RASFT: Rollout-Adaptive Supervised Fine-Tuning for Reasoning

Yongliang Miao, Fengyuan Liu, Wei Shi, Yanguang Liu, Fei Sun, Na Zou, Mengnan Du

Affiliations: The Chinese University of Hong Kong, Shenzhen; Shanghai Artificial Intelligence Laboratory; New Jersey Institute of Technology; Institute of Computing Technology, CAS

AI summary: Proposes the RASFT framework, which calibrates expert supervision using problem-level solvability estimated from on-policy rollouts. It strengthens guidance when the model struggles and, when the model is already reliable, relaxes imitation and incorporates self-generated trajectories, while a clipped inverse ratio constrains policy drift. It outperforms SFT and RL methods on multiple reasoning benchmarks.

Abstract

Supervised fine-tuning (SFT) is a prevailing method for adapting large language models to reasoning tasks by imitating offline expert demonstrations, often treating a single expert trajectory as the target behavior. However, reasoning is not simple path imitation: rigidly following one demonstrated solution may overfit to surface forms and suppress the model's own reasoning distribution. We propose Rollout-Adaptive Supervised Fine-Tuning (RASFT), a policy-aware SFT framework that calibrates expert supervision according to problem-level solvability estimated from verified on-policy rollouts. For each problem, RASFT strengthens expert guidance when the current policy struggles, while relaxing rigid imitation and incorporating correct self-generated trajectories when the model already exhibits reliable reasoning behavior. To preserve useful reasoning priors, RASFT further introduces a clipped inverse ratio between the frozen reference model and the current policy to constrain excessive policy drift. Experiments across multiple models on six mathematical reasoning benchmarks and two code reasoning benchmarks show that RASFT achieves better overall performance than SFT, SFT variants, and representative RL methods. The code is available at https://github.com/zjd1sq/RASFT.

2606.07007 2026-06-08 cs.LG cs.AI New submission

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

Chenhao Zhang, Chris Lin, Su-In Lee

Affiliations: University of California, Berkeley

AI summary: Proposes a unified mathematical framework that formalizes concept learning as a set-alignment problem, distinguishes three strengths of learning (detection, separation, and approximation), gives geometric conditions and error bounds, and connects concept learning to neuron interpretation through formal concept analysis.

Abstract

We propose a unified mathematical framework for a geometric understanding of concept learning and neuron interpretation in sparse autoencoders (SAEs). While SAEs improve interpretability of neural networks by learning sparse feature representations, a principled definition of "concept" and "learning" remains unclear. We formalize concepts as sets of data points and cast concept learning as a set-alignment problem between human-defined and model-induced concepts. This formulation distinguishes three increasingly strong notions of learning (detection, separation, and approximation) and yields geometric conditions, error bounds, and capacity constraints for when concepts can be represented by individual neurons or multi-neuron units. It also provides a set-theoretic account for common SAE phenomena, including feature splitting, feature absorption, feature families, and hierarchical concepts. Finally, we connect concept learning and neuron interpretation through formal concept analysis, showing that the two directions need not agree and that their many-to-many structure can be organized by concept lattices. Experiments on synthetic data with ReLU and Top-$K$ SAEs illustrate the theory and reveal the effects of SAE size and sparsity on concept learning.

2606.07031 2026-06-08 cs.LG New submission

CF-JEPA: Mask-free forward prediction with asymmetric encoder utilization for time-series representation learning

Jaehoon Lee, Sunghyun Sim

Affiliations: Graduate School of Artificial Intelligence Convergence Engineering, Changwon National University; Department of Artificial Intelligence Engineering, Changwon National University

AI summary: Proposes CF-JEPA, a mask-free framework that replaces masking with multi-horizon forward prediction and uses the temporal ordering of time series as the learning signal. It also exploits the asymmetry between the online encoder and the EMA target encoder by routing each task to the better-suited encoder, achieving leading performance on several benchmarks.

Abstract

Self-supervised learning (SSL) for time-series representation learning is dominated by two paradigms: contrastive methods, which face challenges in constructing positive or negative pairs, and masking-based methods, which disrupt the temporal continuity of time-series signals. Joint-Embedding Predictive Architecture (JEPA) offers a promising alternative by predicting in representation space rather than reconstructing raw inputs. However, existing time-series JEPA variants still rely on masking and therefore inherit its continuity problem. Crop-based Forward JEPA (CF-JEPA) is proposed as an innovative mask-free framework that replaces masking with multi-horizon forward prediction: random crops serve as context views, and short-, mid-, and long-horizon future representations are predicted in the forward temporal direction, directly leveraging the inherent temporal ordering of time-series data as a learning signal. A strong asymmetry is also identified between the online encoder and the exponential moving average (EMA) target encoder, both produced from a single training run: the online encoder develops higher-rank discriminative features, while the EMA target encoder develops smoother, lower-rank temporal features. Exploiting this asymmetry, classification is routed to the online encoder and forecasting or anomaly detection to the EMA target encoder, achieving a 27% reduction in multivariate forecasting mean squared error (MSE) at no additional training cost. Across 126 University of California, Riverside (UCR) and 26 University of East Anglia (UEA) classification datasets, eight electricity transformer temperature forecasting benchmarks, and Key Performance Indicator/Yahoo anomaly detection, CF-JEPA achieves the highest average accuracy and rank on UCR and UEA among self-supervised baselines and ranks second on univariate forecasting and k-nearest neighbors-scored anomaly detection.

2606.07044 2026-06-08 cs.LG New submission

Hierarchical Forecast Reconciliation for Urban Rail Transit Demand Prediction under Operational Disruptions

Dang Viet Anh Nguyen, Alma Fazlagic, Kristine Pryds Loft, Filipe Rodrigues

Affiliations: Technical University of Denmark (DTU)

AI summary: To address inconsistency between station-level and OD-flow forecasts in urban rail transit, proposes the first hierarchical reconciliation framework, in which a neural Fully Connected Reconciler (FCR) learns a nonlinear mapping that guarantees structural coherence. Under disruption scenarios it reduces OD forecasting error by up to 17.45%.

Comments: 33 pages, 6 figures, 16 tables

Abstract

Accurate and coherent passenger demand forecasting is essential for Urban Rail Transit (URT) operations. Passenger demand has a hierarchical structure in which origin-destination (OD) flows aggregate to station-level inflows and outflows through conservation constraints. In practice, station-level and OD-level forecasts are often generated independently, producing incoherent predictions that violate these constraints and introduce inconsistencies into operational decision-making. Such issues become more severe during disruptions, when forecasting reliability is most critical. This paper presents the first hierarchical forecast reconciliation framework for joint station-level and OD-level URT demand prediction. A neural Fully Connected Reconciler (FCR) learns a non-linear mapping from incoherent base forecasts to coherent hierarchical predictions while guaranteeing exact structural consistency by construction. The method is benchmarked against OLS, WLS, and Minimum Trace (MinT) variants using Rejsekort smart-card data from the Copenhagen S-train network under one-step, multi-step, and disruption forecasting scenarios. Results show that reconciliation consistently improves OD forecasting accuracy while ensuring hierarchical coherence. Under normal conditions, FCR performs competitively with MinT-based methods. An oracle analysis indicates that perfect station-level forecasts could reduce OD prediction error by up to 34 percent, highlighting the value of improved base forecasts. Under severe disruptions, FCR outperforms classical methods, reducing OD forecasting error by up to 17.45 percent in multi-step destination-side delay scenarios. These findings establish hierarchical reconciliation as an effective mechanism for improving forecast robustness, with the largest benefits occurring under the most challenging operating conditions.
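The coherence problem and the simplest classical fix can be shown on a toy hierarchy. The sketch below implements OLS reconciliation (one of the baselines named above, not the proposed FCR) for a single station whose outflow must equal the sum of its OD flows. For this one-parent hierarchy the OLS projection has a closed form: spread the discrepancy equally over the total and the k bottom series. The function name and the numbers are assumptions.

```python
def ols_reconcile(total_hat, bottom_hat):
    """OLS reconciliation for one parent with k children.

    With summing matrix S = [1^T; I_k], the projection S (S^T S)^{-1} S^T
    moves every bottom forecast by (total_hat - sum(bottom_hat)) / (k + 1);
    the reconciled total is then the sum of the adjusted bottom forecasts."""
    k = len(bottom_hat)
    adjustment = (total_hat - sum(bottom_hat)) / (k + 1)
    bottom = [b + adjustment for b in bottom_hat]
    return sum(bottom), bottom

# Incoherent base forecasts: station outflow 100, OD flows summing to 90.
station, od_flows = ols_reconcile(100.0, [30.0, 40.0, 20.0])
# station == 97.5 and od_flows == [32.5, 42.5, 22.5]
```

After reconciliation the conservation constraint holds exactly, which is the property the abstract calls hierarchical coherence. Weighted variants such as WLS and MinT replace the equal split with one based on forecast error variances.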

2606.07058 2026-06-08 cs.LG cs.CV math.AT stat.ML 新提交

Constructing VAE Latent Spaces with Prescribed Topology

构建具有指定拓扑的VAE潜在空间

Jilles S. van Hulst, Jakub M. Tomczak, W. P. M. H. Heemels, Duarte J. Antunes

发表机构 * Control Systems Technology Section, Department of Mechanical Engineering, Eindhoven University of Technology(机械工程系控制系统技术部,埃因霍温理工大学) Nature Innovation Laboratory (NatInLab)(自然创新实验室(NatInLab))

AI总结 针对数据流形非欧几里得拓扑导致标准高斯先验不匹配的问题,提出一种构造性数学框架,通过因子化分布和重参数化技巧,为乘积覆盖空间流形(如圆柱、环面、莫比乌斯带等)设计拓扑匹配的先验,提升重建质量和表示忠实性。

Comments 16 pages, 7 figures

详情
AI中文摘要

变分自编码器(VAE)学习高维数据的低维潜在表示。当数据位于具有非欧几里得拓扑的流形上时,标准高斯先验会引入拓扑不匹配,从而降低重建质量并阻碍忠实表示。我们提出了一个构造性数学框架,解决了所有允许乘积覆盖空间的流形的这种不匹配问题。这些流形可表示为基本因子(圆、区间或直线)的乘积,或此类乘积在有限对称群下的商。该类包括圆柱、环面、莫比乌斯带、克莱因瓶和实射影空间。基本因子上的因子化分布产生具有闭式解耦KL散度的乘积拓扑,使得每个潜在因子可以独立塑造,同时保持训练可处理。我们为周期、有界和无界支撑编目了可重参数化的编码器-先验对,并提供了坐标变换,允许标准神经网络输出具有平滑梯度的非欧几里得参数。对于商流形,解码器接收覆盖空间坐标的群不变特征,使得识别点产生相同输出。锚点约束相对于数据固定坐标系或创建软拓扑孔。在合成流形和真实图像数据集(旋转和循环移位MNIST)上的实验证实,拓扑匹配的先验使KL正则化与数据流形对齐。所得到的拓扑感知模型在所有实际相关的正则化强度下均优于高斯基线。代码可从 https://github.com/JvHulst/VAE-Topology 获取。

英文摘要

Variational autoencoders (VAEs) learn low-dimensional latent representations of high-dimensional data. When the data lies on a manifold with non-Euclidean topology, the standard Gaussian prior introduces a topological mismatch that degrades reconstruction quality and prevents faithful representation. We present a constructive mathematical framework that resolves this mismatch for all manifolds that admit a product covering space. These are manifolds expressible as products of elementary factors (circles, intervals, or lines) or as quotients of such products by a finite symmetry group. The class includes cylinders, tori, Möbius strips, Klein bottles, and real projective spaces. Factorized distributions over the elementary factors yield product topologies with closed-form, decoupled KL divergences, so that each latent factor can be shaped independently while keeping training tractable. We catalogue reparametrizable encoder-prior pairs for periodic, bounded, and unbounded supports, and provide coordinate transformations that allow standard neural networks to output non-Euclidean parameters with smooth gradients. For quotient manifolds, the decoder receives group-invariant features of the covering-space coordinates, so that identified points produce identical outputs. Anchor constraints fix the coordinate system relative to the data or create soft topological holes. Experiments on synthetic manifolds and real-image datasets (rotated and cyclically shifted MNIST) confirm that a topology-matched prior aligns KL regularization with the data manifold. The resulting topology-aware models outperform the Gaussian baseline at all practically relevant regularization strengths. The code is available at https://github.com/JvHulst/VAE-Topology.
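Two ideas in this abstract lend themselves to a small illustration: mapping unconstrained network outputs to a circular coordinate, and feeding the decoder features that are invariant under the identification defining a quotient. The sketch below is an assumption-laden toy, not the authors' code: the function names are hypothetical, and the antipodal identification (theta equivalent to theta + pi) is chosen only as the simplest example of a finite symmetry group acting on a circle.

```python
import math


def to_angle(u: float, v: float) -> float:
    """Map an unconstrained 2-D output (u, v) to an angle in [-pi, pi]."""
    return math.atan2(v, u)


def circle_features(theta: float) -> tuple[float, float]:
    """Periodic embedding of a circular latent: invariant under theta -> theta + 2*pi."""
    return (math.cos(theta), math.sin(theta))


def antipodal_invariant_features(theta: float) -> tuple[float, float]:
    """Features invariant under theta -> theta + pi, so identified points coincide."""
    return (math.cos(2.0 * theta), math.sin(2.0 * theta))
```

A decoder that only sees `antipodal_invariant_features(theta)` cannot distinguish `theta` from `theta + math.pi`, which is one way to make identified points produce identical outputs.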

2606.07068 2026-06-08 cs.LG 新提交

Bias in Filter Feature Selection Evaluation: A Meta-Analysis of Datasets, Baselines, and Experimental Design Choices

过滤特征选择评估中的偏差:数据集、基线和实验设计选择的元分析

Malick Ebiele, Malika Bendechache, Rob Brennan

发表机构 * University College Dublin(都柏林大学学院) University of Galway(戈尔韦大学) ADAPT Centre(ADAPT中心)

AI总结 通过分析28项高影响力过滤特征选择研究,发现数据集数量、基线方法和新方法数量可解释33%的性能变异,揭示了评估中的潜在偏差,并提出了五项基于证据的评估建议。

详情
AI中文摘要

背景:自1990年以来,跨异构应用提出了许多特征选择方法。为了验证新方法的有用性,需要在使用至少一个数据集的特征选择任务中,与现有文献中的至少一种基线方法进行比较。表格深度学习(DL)和机器学习(ML)中数据估值的最新发展表明,新方法、算法和模型的评估可能有意识或无意识地存在偏差。我们假设在特征选择(FS)中,特别是在过滤特征选择(FFS)中,存在类似的趋势。因此,本研究的目的是检查FFS研究,以识别影响评估的因素,这些因素可能构成偏差的入口点,从而为FFS评估推荐更强的原则。方法:我们分析了1994年至2025年间发表的28项高影响力FFS研究样本。该分析提供了如何检查FFS研究的思考,强调了过程中学到的经验教训,并为未来的FFS评估给出了五项基于证据的建议。结果:多元线性回归分析得分为$R^2=0.33$。这意味着新方法相对于所选基线的性能变异(胜率)的33%可由数据集数量(#Datasets)、基线数量(#Baselines)和新方法数量(#NewMethods)解释。讨论:$R^2=0.33$被认为是中等解释力;考虑到这是首次此类研究,这一结果是有希望的。中等解释力的结果是由于胜率还受到其他因素的影响,例如特征选择领域的成熟度、数据集和基线的类型,以及用于解释关系的回归模型的简单性。

英文摘要

Background: Since 1990, many feature selection methods have been proposed across heterogeneous applications. To validate the usefulness of a new method, it needs to be compared against at least one baseline method from the existing literature on a feature selection task using at least one dataset. Recent developments in tabular Deep Learning (DL) and data valuation in Machine Learning (ML) suggest that the evaluation of new methods, algorithms, and models may be consciously or unconsciously biased. We hypothesise that a similar trend exists in feature selection (FS), particularly in filter feature selection (FFS). The aim of this study is therefore to examine FFS studies to identify factors that influence the evaluation and that might constitute entry points for bias, in order to recommend stronger principles for FFS evaluation. Methods: We analyse a sample of 28 high-profile FFS studies published between 1994 and 2025. The analysis provides reflections on how to examine FFS studies, highlights lessons learned throughout the process, and gives five evidence-based recommendations for future FFS evaluation. Results: Multivariate Linear Regression analysis achieved a score of $R^2=0.33$. This means that 33% of the variance in the performance of new methods against chosen baselines (win rate) is explained by the number of datasets (#Datasets), the number of baselines (#Baselines), and the number of new methods (#NewMethods). Discussion: $R^2=0.33$ is considered a medium level of explanatory power, which is promising given that this is the first such study. The explanatory power is only medium because win rate is also influenced by additional factors, such as the maturity of the feature selection domain and the type of datasets and baselines, and because the regression model used to explain the relationship is simple.

2606.07074 2026-06-08 cs.LG cs.AI 新提交

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

SlimSearcher: 通过自适应奖励门控训练效率感知的Web代理

Zequn Xie, Junjie Wang, Dan Yang, Jie Feng, Yue Shen, Jian Wang, Jinjie Gu

发表机构 * Zhejiang University(浙江大学) Ant Group(蚂蚁集团)

AI总结 提出SlimSearcher框架,通过帕累托高效过滤和自适应奖励门控,在保持或提升准确率的同时将工具调用轮次减少17%-58%。

Comments 17 pages, 8 figures

详情
AI中文摘要

深度研究代理在复杂信息寻求任务中展现了卓越能力,但这种能力伴随着高昂的计算成本。受准确率驱动训练范式的影响,当前模型采用蛮力策略,表现为盲目依赖工具和执行性推理——生成长而冗余的轨迹,这些对于解决任务远非必要,导致浪费的工具调用和过多的token消耗。为克服这一效率陷阱,我们提出SlimSearcher,一个原则性框架,在监督微调(SFT)和强化学习(RL)中推动准确率与计算成本之间的帕累托前沿。在SFT阶段,SlimSearcher采用帕累托高效过滤来提炼既成功又经济的轨迹,引导模型走向内在效率感知的搜索行为。在RL阶段,我们引入自适应奖励门控,一种动态奖励塑造机制,在采样队列中评估相对工具和token效率。通过将这些自适应效率指标与严格正确性门控级联,我们的方法有效避免了与绝对惩罚相关的简洁性偏差,并缓解了奖励黑客攻击。在包括GAIA、BrowseComp和XBenchDeepSearch在内的长时域基准上的大量实验表明,SlimSearcher在保持或提升准确率的同时,将平均工具调用轮次减少了17%-58%。

英文摘要

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependency and performative reasoning: they generate long, redundant trajectories that are far from necessary for resolving these tasks, leading to wasteful tool calls and excessive token consumption. To overcome this efficiency trap, we propose SlimSearcher, a principled framework that pushes the Pareto frontier between accuracy and computational cost across both Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). In the SFT stage, SlimSearcher employs Pareto-efficient filtration to distill trajectories that are both successful and economical, guiding the model toward inherently efficiency-aware search behaviors. During RL, we introduce Adaptive Reward Gating, a dynamic reward-shaping mechanism that evaluates relative tool and token efficiency within a sampled cohort. By cascading these adaptive efficiency metrics with a strict correctness gate, our approach effectively avoids the brevity bias associated with absolute penalties and mitigates reward hacking. Extensive experiments on long-horizon benchmarks, including GAIA, BrowseComp, and XBenchDeepSearch, demonstrate that SlimSearcher reduces average tool-call rounds by 17%-58% while maintaining or improving accuracy.
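The idea of cascading a strict correctness gate with cohort-relative efficiency can be sketched in a few lines. This is an illustrative guess at the general shape of such a reward, not the SlimSearcher formula: the function name `gated_rewards`, the dictionary keys, the `weight` parameter, and the way efficiency is normalised against the mean of the correct rollouts are all assumptions.

```python
from statistics import mean


def gated_rewards(cohort, weight=0.5):
    """Reward each rollout in a sampled cohort.

    Incorrect rollouts get 0 (correctness gate). Correct rollouts get 1 plus a
    bounded bonus or penalty for using fewer or more tool calls and tokens than
    the average correct rollout in the same cohort.
    """
    correct = [r for r in cohort if r["correct"]]
    if not correct:
        return [0.0 for _ in cohort]
    mean_tools = mean(r["tool_calls"] for r in correct)
    mean_tokens = mean(r["tokens"] for r in correct)

    rewards = []
    for r in cohort:
        if not r["correct"]:
            rewards.append(0.0)  # efficiency never rewards a wrong answer
            continue
        tool_gain = (mean_tools - r["tool_calls"]) / max(mean_tools, 1)
        token_gain = (mean_tokens - r["tokens"]) / max(mean_tokens, 1)
        bonus = weight * max(-1.0, min(1.0, 0.5 * (tool_gain + token_gain)))
        rewards.append(1.0 + bonus)
    return rewards


# Made-up cohort of three rollouts for the same query.
cohort = [
    {"correct": True, "tool_calls": 4, "tokens": 2000},
    {"correct": True, "tool_calls": 10, "tokens": 6000},
    {"correct": False, "tool_calls": 2, "tokens": 500},
]
rewards = gated_rewards(cohort)
```

Because the bonus is bounded by `weight` and only applied behind the gate, a short but wrong rollout can never outscore a correct one, which is the failure mode that absolute length penalties invite.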

2606.07093 2026-06-08 cs.LG 新提交

The discovery of the effects of women employment participation on the fertility of developing countries: A panel data approach

女性就业参与对发展中国家生育率影响的发现:面板数据方法

Thi Kim Ngan Nguyen

发表机构 * Tokyo International University(东京国际大学)

AI总结 本文使用面板数据方法,将115个发展中国家分为四大洲组,发现女性劳动参与率对生育率的影响因地区而异,仅美洲地区显著负相关。

详情
AI中文摘要

过去几十年,发展中国家的生育率显著下降,同时女性在职场中的作用有所提升。为了更深入地了解女性劳动力市场参与率对发展中国家总生育率的因果关系,本文将1991-2018年间115个发展中国家的数据集分为四个大洲组(非洲、南北美洲、亚太、欧洲),并采用数据驱动的面板数据计量经济学程序来减轻遗漏变量偏差。结果表明,南北美洲大陆女性的生育行为受到其职业选择的影响;而在其他地区的社会中,女性在考虑生育时,其他因素可能更为重要。总之,政策制定者可以借鉴本文制定政策,以在生育决策方面提供更多激励,该领域的进一步研究需要考虑发展中国家的家庭政策和从夫居作为重要数据。

英文摘要

The fertility trend in developing countries has experienced a significant decline in the last few decades; at the same time, the role of women in the workplace has improved. To gain better insight into the causal effect of women's labor-market participation rate on the total fertility rate in the developing world, this paper divides a dataset of 115 developing countries over the period 1991-2018 into four continental groups (Africa, North/South America, Asia/Pacific, Europe) and then applies a data-driven panel-data econometric procedure to mitigate omitted-variable bias. The results suggest that the fertility behavior of women in North/South America is influenced by their career choices, whereas in other regions other factors may matter more to women when they consider having children. In conclusion, policymakers can refer to this paper when formulating policies that provide stronger incentives in reproductive decisions, and further research in the field needs to consider family policies and patrilocality in developing countries as important data.

2606.07116 2026-06-08 cs.LG cs.AI cs.CL 新提交

OffQ: Taming Structured Outliers in LLM Quantization by Offsetting

OffQ:通过偏移驯服LLM量化中的结构化异常值

Haoqi Wang, Lorenz K. Mueller, Jiawei Zhuang, Mathieu Salzmann, Lukas Cavigelli

发表机构 * School of Computer and Communication Sciences, EPFL, Switzerland(瑞士洛桑联邦理工学院计算机与通信科学学院) Huawei, Switzerland(华为瑞士) Swiss Data Science Center, ETHZ & EPFL, Switzerland(瑞士数据科学中心,苏黎世联邦理工学院与洛桑联邦理工学院)

AI总结 提出OffQ方法,通过top-1 PCA识别异常值子空间、旋转集中异常值通道并转换为共享偏移,实现LLM的低比特均匀量化,在W4A4KV4下提升精度。

详情
AI中文摘要

低比特量化已被广泛采用,通过显著降低计算成本和内存使用来加速大型语言模型(LLM)的推理。然而,激活异常值对有效量化构成了重大挑战,常常导致显著的性能下降。在本文中,我们介绍了OffQ,一种通过新颖的偏移机制来缓解低比特量化中激活异常值的方法。具体来说,OffQ首先使用提出的top-1 PCA识别激活中的低维异常值子空间,然后通过旋转将高幅度激活集中到1个通道中。OffQ随后通过将其幅度转换为共享偏移来吸收这个集中的异常值通道,从而降低激活的标准差。这种偏移策略使得使用部署友好的均匀网格和均匀精度量化对LLM进行有效的W4A4KV4量化成为可能。在多种LLM架构和基准上的广泛实验表明,OffQ优于最先进的基线,在保持低比特效率的同时持续提高模型精度。

英文摘要

Low-bit quantization has been widely adopted to accelerate the inference of large language models (LLMs) by significantly reducing computational cost and memory usage. However, activation outliers pose a major challenge to effective quantization, often leading to notable performance degradation. In this paper, we introduce OffQ, a method designed to mitigate activation outliers in low-bit quantization through a novel offsetting mechanism. Specifically, OffQ first identifies a low-dimensional outlier subspace in the activations using a proposed top-1 PCA, and then concentrates high-magnitude activations into 1 channel via rotation. OffQ then absorbs this concentrated outlier channel by converting its magnitude into a shared offset, thereby reducing the standard deviation of the activations. This offsetting strategy enables effective W4A4KV4 quantization of LLMs using deployment-friendly uniform-grid and uniform-precision quantization. Extensive experiments across diverse LLM architectures and benchmarks demonstrate that OffQ outperforms state-of-the-art baselines, consistently improving model accuracy while preserving low-bit efficiency.
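The pipeline in this abstract (find a dominant direction, rotate it into one channel, absorb that channel as a shared offset, then quantize on a uniform grid) can be mimicked on synthetic data. The sketch below is one reading of those steps under stated assumptions, not the OffQ implementation: the leading direction is taken from the uncentered second-moment matrix, a Householder reflection stands in for the rotation, the offset is the channel mean, and the data, function names, and symmetric 4-bit quantizer are all made up for illustration.

```python
import numpy as np


def top1_direction(X: np.ndarray) -> np.ndarray:
    """Leading eigenvector of the uncentered second-moment matrix X^T X."""
    _, vecs = np.linalg.eigh(X.T @ X)
    return vecs[:, -1]


def householder_to_first_axis(v: np.ndarray) -> np.ndarray:
    """Symmetric orthogonal matrix H with H @ v = e_0 for a unit vector v."""
    e0 = np.zeros_like(v)
    e0[0] = 1.0
    w = v - e0
    norm_sq = w @ w
    if norm_sq < 1e-12:
        return np.eye(len(v))
    return np.eye(len(v)) - 2.0 * np.outer(w, w) / norm_sq


def quantize_uniform(X: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-tensor uniform quantization followed by dequantization."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(X).max() / levels
    return np.round(X / scale).clip(-levels, levels) * scale


# Synthetic activations: unit Gaussian noise plus a large shared component
# along one fixed direction (a stand-in for a structured outlier).
rng = np.random.default_rng(0)
d = 16
u = np.arange(1, d + 1, dtype=float)
u /= np.linalg.norm(u)
X = rng.normal(size=(256, d)) + 50.0 * u

v = top1_direction(X)
H = householder_to_first_axis(v)
X_rot = X @ H                      # outlier energy now sits in channel 0
offset = X_rot[:, 0].mean()        # shared offset for the outlier channel
X_off = X_rot.copy()
X_off[:, 0] -= offset

err_plain = np.mean((quantize_uniform(X) - X) ** 2)
err_offset = np.mean((quantize_uniform(X_off) - X_off) ** 2)
```

On this toy data the dynamic range shrinks once the offset is removed, so the same 4-bit grid yields a much smaller quantization error. In a real model the orthogonal transform and the offset would have to be folded into adjacent weights, which this sketch does not attempt.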

2606.07120 2026-06-08 cs.LG 新提交

Beyond Linear and Overcomplete Regimes: A Mean-Field Analysis of Bottleneck Autoencoders

超越线性与过完备机制:瓶颈自编码器的平均场分析

Santanu Das, Ramyak Bilas, Pascal Esser, Satyaki Mukherjee

发表机构 * STCS department, Tata Institute Of Fundamental Research(STCS部门,印度塔塔基础研究所) Department of Mathematics, Indiana University(数学系,印第安纳大学) Department of Mathematics, Ludwig-Maximilians-Universität München(数学系,慕尼黑路德维希-马克西米利安大学) Department of Mathematics, National University of Singapore(数学系,新加坡国立大学)

AI总结 研究非线性瓶颈自编码器在平均场机制下的学习动态,推导编码器和解码器的显式动态方程,证明有限宽度网络的经验风险高概率跟踪平均场风险轨迹,且最优解收敛于平均场最优。

详情
AI中文摘要

自编码器通过将数据映射到潜在空间并最小化重构误差来学习低维表示。尽管经验成功,其理论理解仍然有限,且主要局限于线性模型或无瓶颈设置。本文研究了在平均场机制下具有固定有限维瓶颈的非线性自编码器。我们推导了编码器和解码器的显式平均场学习动态,提供了非线性设置中训练的可处理表征。我们证明,在有限时间范围内,使用随机梯度下降训练的有限宽度网络的经验风险以高概率紧密跟踪平均场风险轨迹。在最优性方面,我们进一步证明有限宽度风险收敛到平均场最优,表明有限网络具有足够的表达能力来逼近无限宽度解。

英文摘要

Autoencoders (AEs) learn low-dimensional representations by mapping data into a latent space while minimizing reconstruction error. Despite their empirical success, theoretical understanding remains limited and largely restricted to linear models or settings without a bottleneck. In this work, we study nonlinear AEs with a fixed finite-dimensional bottleneck in the mean-field (MF) regime. We derive explicit MF learning dynamics for both encoder and decoder, providing a tractable characterization of training in the nonlinear setting. We show that, over finite time horizons, the empirical risk of finite-width networks trained with stochastic gradient descent closely tracks the MF risk trajectory with high probability. At optimality, we further establish that the finite-width risk converges to the MF optimum, demonstrating that finite networks are sufficiently expressive to approximate the infinite-width solution.

2606.07127 2026-06-08 cs.LG 新提交

Learning Explicit Behavioral Models with Adaptive Questions and World-Model Probes

通过自适应问题和世界模型探针学习显式行为模型

Hikaru Shindo, Yu Deng, Teng Cao, Quentin Delfosse, Christopher Tauchmann, Jannis Blüml, Gopika Sudhakaran, Kristian Kersting

发表机构 * Artificial Intelligence and Machine Learning Lab(人工智能与机器学习实验室) Technical University of Darmstadt(达姆施塔特工业大学) Hessian Center for Artificial Intelligence (hessian.AI)(黑森人工智能中心) German Research Center for Artificial Intelligence (DFKI)(德国人工智能研究中心) Department of Computer Science(计算机科学系) Centre for Cognitive Science(认知科学中心)

AI总结 提出显式符号行为模型(ESBM),通过自适应问题和世界模型探针将任务性能与可解释机制结合,在Atari任务中学习高分策略并生成显式答案和机制预测。

详情
AI中文摘要

仅针对任务回报训练的交互式智能体可以获得高分,但无法表示其动作成功的机制。这导致行为脆弱且难以诊断,并在环境动态变化时限制适应性。现有的LLM反思和策略代码修复可以从失败轨迹中修正行为,但问题和世界理解测试通常仅在训练后使用。我们引入了显式符号行为模型(ESBM),一种可训练的行为模型,将任务性能与基于证据的问答和可执行机制预测相结合。ESBM通过类型化谓词、加权规则、有界选项和机制记忆表示行为;机制层在动作干预下预测符号事件、对象变化、奖励和终止后果。每次滚动后,自适应问题和主动世界模型探针将得分失败、问答错误和转换预测错误转化为局部ESBM编辑的约束。候选模型通过多准则规则选择,该规则联合评估任务得分、可回答性和主动世界模型一致性。在测试的Atari风格协议下,ESBM学习高分策略,同时产生显式答案和可执行机制预测,表明自适应问题可以作为训练压力和可重用基准,用于该设置下的机制策略学习。

英文摘要

Interactive agents trained only against task return can achieve high scores while failing to represent the mechanisms that make their actions succeed. This makes brittle behavior difficult to diagnose and limits adaptation when environment dynamics change. Existing LLM reflection and policy-code repair can revise behavior from failed trajectories, but questions and world-understanding tests are usually used only after training. We introduce an Explicit Symbolic Behavioral Model (ESBM), a trainable behavioral model that couples task performance with evidence-grounded question answering and executable mechanism prediction. An ESBM represents behavior through typed predicates, weighted rules, bounded options and mechanism memory; the mechanism layer predicts symbolic events, object changes, rewards and terminal consequences under action interventions. After each rollout, adaptive questions and active world-model probes convert score failures, QA errors and transition-prediction errors into constraints for local ESBM edits. Candidate models are selected by a multi-criterion rule that jointly evaluates task score, answerability and active world-model consistency. Under the tested Atari-style protocols, ESBM learns high-scoring policies while producing explicit answers and executable mechanism predictions, indicating that adaptive questions can serve as both training pressure and reusable benchmarks for mechanistic policy learning in this setting.

2606.07128 2026-06-08 cs.LG 新提交

A machine-learning-assisted progressive digit-randomness screening framework for detecting non-random patterns in raw numerical research data

一种机器学习辅助的渐进式数字随机性筛查框架,用于检测原始数值研究数据中的非随机模式

Zhuphua Cao

发表机构 * Key Laboratory of Natural Medicines of the Changbai Mountain, Ministry of Education, College of Pharmacy, Yanbian University(长白山天然药物重点实验室,教育部,药学院,延边大学)

AI总结 提出FDRS框架,结合统计与机器学习方法检测数值数据中的非随机数字模式,通过酶学吸光度数据集和模拟异常数据验证,能有效分级风险。

详情
AI中文摘要

原始数值数据集在完整性筛查中受到的关注少于图像、抄袭或汇总统计不一致性。我们开发了造假风险数字随机性筛查模型(FDRS),这是一个统计和机器学习框架,用于检测数值研究数据中的非随机数字模式不规则性。FDRS整合了单小数位和联合小数位检验、Cramer's V、熵度量、Kullback-Leibler散度、数字偏好指数、渐进子采样和半监督风险评分。使用仪器衍生的酶促吸光度数据集(RawData,n=253)和盲法手动模拟不规则数据集(ErrData,n=255)进行评估。RawData在单个第三小数位分析中未显示显著偏差,而ErrData显示显著偏差。在联合第三-第四小数位分析中,ErrData显示出更高的Cramer's V、更低的归一化熵、更高的KL散度以及更持久的渐进子采样偏差信号。在内部验证中,弹性网络逻辑回归取得了最高的AUC(0.98395)和最低的Brier分数(0.048439),而随机森林取得了最高的准确率(0.926667)和平衡准确率(0.935)。RawData获得了0.124627的低集成风险评分,被分类为0级;ErrData获得了0.740760的评分,被分类为3级。外部真实世界基准支持分级风险分层:三个未发现公开出版后问题的数据集被分类为0级或1级,而两个来自公开质疑或机构处理文章的数据集被分类为2级或3级。FDRS通过整合可解释的统计和机器学习特征,可以优先考虑对原始数值数据集进行进一步审查。它是一个辅助性的数字结构筛查工具,而非造假或不当行为的独立证据。

英文摘要

Raw numerical datasets remain less systematically examined in integrity screening than images, plagiarism, or summary-statistic inconsistencies. We developed the Fabrication-risk Digit Randomness Screening model (FDRS), a statistical and machine-learning framework for detecting non-random digit-pattern irregularities in numerical research data. FDRS integrates single- and joint-decimal-digit tests, Cramer's V, entropy metrics, Kullback-Leibler divergence, digit-preference indices, progressive subsampling, and semi-supervised risk scoring. It was evaluated using an instrument-derived enzymatic absorbance dataset (RawData, n=253) and a blinded manually simulated irregular dataset (ErrData, n=255). RawData showed no significant deviation in single third-decimal-digit analysis, whereas ErrData showed a significant deviation. In joint third-fourth decimal digit analysis, ErrData showed higher Cramer's V, lower normalized entropy, higher KL divergence, and a more persistent progressive-subsampling deviation signal. In internal validation, Elastic-net Logistic Regression achieved the highest AUC (0.98395) and lowest Brier score (0.048439), while Random Forest achieved the highest accuracy (0.926667) and balanced accuracy (0.935). RawData received a low ensemble risk score of 0.124627 and was classified as Grade 0; ErrData received a score of 0.740760 and was classified as Grade 3. External real-world benchmarks supported graded risk stratification: three datasets without identified public post-publication concerns were classified as Grade 0 or 1, whereas two datasets from publicly questioned or institutionally handled articles were classified as Grade 2 or 3. FDRS can prioritize raw numerical datasets for further review by integrating interpretable statistical and machine-learning features. It is an auxiliary digit-structure screening tool, not standalone evidence of fabrication or misconduct.
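Several of the statistics listed in this abstract can be computed directly from a digit histogram. The following is a minimal sketch of a single-digit uniformity check, assuming a uniform reference distribution over the ten digits and values recorded to four decimals; the function names and the example values are hypothetical, and the paper's progressive subsampling and risk-scoring stages are not reproduced.

```python
import math
from collections import Counter


def third_decimal_digits(values):
    """Extract the third decimal digit of each value, formatted to 4 decimals."""
    return [int(f"{v:.4f}".split(".")[1][2]) for v in values]


def digit_uniformity_stats(digits):
    """Compare an observed digit histogram with the uniform distribution."""
    n = len(digits)
    counts = Counter(digits)
    expected = n / 10
    chi2 = sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))
    probs = [counts.get(d, 0) / n for d in range(10)]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return {
        "chi2": chi2,                                # goodness-of-fit statistic, 9 d.o.f.
        "cramers_v": math.sqrt(chi2 / (n * 9)),      # effect size in [0, 1]
        "normalized_entropy": entropy / math.log(10),
        "kl_to_uniform": math.log(10) - entropy,     # KL(observed || uniform), in nats
    }


# Made-up data: one evenly spread set and one showing 0/5 digit preference.
even_values = [0.1204 + 0.001 * d for d in range(10)] * 20
preferred_values = [0.1200, 0.1250] * 100
even_stats = digit_uniformity_stats(third_decimal_digits(even_values))
preferred_stats = digit_uniformity_stats(third_decimal_digits(preferred_values))
```

A large chi-square value with low normalized entropy flags a histogram for closer inspection. As the abstract stresses, such a signal is a screening aid and not evidence of fabrication on its own.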

2606.07134 2026-06-08 cs.LG 新提交

$α$-PFN: Fast Entropy Search via In-Context Learning

$\alpha$-PFN:通过上下文学习实现快速熵搜索

Herilalaina Rakotoarison, Steven Adriaensen, Tom Viering, Carl Hvarfner, Samuel Müller, Frank Hutter, Eytan Bakshy

发表机构 * University of Freiburg(弗莱堡大学) University of Tübingen(图宾根大学) University of Amsterdam(阿姆斯特丹大学) Lund University(隆德大学) Meta

AI总结 提出一种两阶段摊销策略,利用先验数据拟合网络(PFN)在单次前向传播中近似熵搜索采集函数,实现超过50倍加速,在合成和真实基准上性能与最先进方法相当。

Comments Published at ICML 2026

详情
AI中文摘要

信息论采集函数如熵搜索(ES)为贝叶斯优化(BO)提供了原则性的探索-利用框架。然而,它们的实际实现依赖于复杂且缓慢的近似,即信息增益的蒙特卡洛估计。这种复杂性可能引入数值误差,并需要专门的、手工定制的实现。我们提出了一种两阶段摊销策略,该策略学习使用先验数据拟合网络(PFN)在单次前向传播中近似基于熵搜索的采集函数。第一个PFN被训练为以最优值的信息为条件;第二个$\alpha$-PFN通过训练来预测期望信息增益,该训练基于使用第一个PFN测量的信息增益。$\alpha$-PFN提供了一种灵活的学习近似,用每个候选点的单次前向传播取代了复杂的启发式近似,实现了快速且可扩展的采集评估。实验上,我们的方法在合成和真实世界基准上与最先进的熵搜索实现具有竞争力,同时在我们所有实验中加速了不同的熵搜索变体,加速比超过50倍。源代码:https://github.com/automl/AlphaPFN。

英文摘要

Information-theoretic acquisition functions such as Entropy Search (ES) offer a principled exploration-exploitation framework for Bayesian optimization (BO). However, their practical implementation relies on complicated and slow approximations, i.e., a Monte Carlo estimation of the information gain. This complexity can introduce numerical errors and requires specialized, hand-crafted implementations. We propose a two-stage amortization strategy that learns to approximate entropy search-based acquisition functions using Prior-data Fitted Networks (PFNs) in a single forward pass. A first PFN is trained to be conditioned on information about the optima; second, the $α$-PFN is trained to predict the expected information gain by training on information gains measured with the first PFN. The $α$-PFN offers a flexible learned approximation, which replaces the complex heuristic approximations with a single forward pass per candidate, enabling rapid and extensible acquisition evaluation. Empirically, our approach is competitive with state-of-the-art entropy search implementations on synthetic and real-world benchmarks, while accelerating the different entropy search variants across all our experiments, with speed ups over 50x. Source code: https://github.com/automl/AlphaPFN.

2606.07141 2026-06-08 cs.LG cs.AI 新提交

REMEDI: A Benchmark for Retention and Unlearning Evaluation in Multi-label Clinical Disease Inference

REMEDI:多标签临床疾病推断中的保留与遗忘评估基准

Anurag Sharma, Sai Teja Chunchu, Prasenjit Mitra, Sandipan Sikdar, Koustav Rudra

发表机构 * IIT Kharagpur(印度理工学院克勒格布尔分校) Carnegie Mellon University(卡内基梅隆大学) L3S Research Center, Leibniz University Hannover(汉诺威莱布尼茨大学L3S研究中心)

AI总结 提出REMEDI基准,针对多标签临床疾病推断中的机器遗忘问题,利用MIMIC-III数据库评估现有方法在效用与遗忘性能间的权衡,并发现其不适用于多标签任务。

Comments Under review

详情
AI中文摘要

用于临床疾病推断的语言模型在患者数据上进行训练,这些数据可能包含敏感和私人信息,数据所有者可能出于隐私或版权原因要求从训练模型中删除其数据。然而,精确遗忘患者特定数据是棘手的,而通过少量数据删除重新训练则资源密集。虽然存在几种可用的机器遗忘方法,但其效用通常局限于非医疗领域。此外,评估此类遗忘方法的现有基准主要使用合成数据集,这些数据集不能真正代表现实系统。因此,这些遗忘方法在医疗领域的有效性在很大程度上尚不清楚。为此,我们引入了REMEDI,一个针对多标签和多类别临床疾病推断的广泛机器遗忘基准,其中标签相关性、纵向结构和安全约束使得遗忘特别具有挑战性。与现有基准不同,REMEDI考虑:(1) 相关的应用领域(医疗),(2) 涉及多样遗忘实例集的全面遗忘设置,(3) 具有挑战性的遗忘场景,包括多标签和多类别分类任务,以及(4) 评估指标,涉及效用和遗忘程度的性能。REMEDI使用MIMIC-III临床数据库开发,该数据库包含患者的全面临床数据。现有遗忘方法的实验表明,效用和遗忘性能之间存在权衡。它们也大多不适合多标签分类任务。为促进可重复性,我们公开了我们的基准。

英文摘要

Language models for clinical disease inference are trained on patient data, which may include sensitive and private information, and data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning patient-specific data is intractable, and retraining with minor data removal is resource-intensive. While there exist several machine unlearning methods that can be used, their utility is generally restricted to non-medical domains. Moreover, the existing benchmarks for evaluating such unlearning methods primarily utilize synthetically curated datasets, which are not truly representative of real-world systems. Hence, the effectiveness of these unlearning methods in the medical domain is largely unclear. To this end, we introduce REMEDI, an extensive benchmark for machine unlearning tailored to multi-label and multi-class clinical disease inference, where label correlations, longitudinal structure, and safety constraints make unlearning particularly challenging. Unlike the existing benchmarks, REMEDI considers: (1) a relevant application domain (medical), (2) comprehensive unlearning setups involving diverse sets of forget instances, (3) challenging unlearning scenarios including multi-label and multi-class classification tasks, and (4) evaluation metrics involving performance both in terms of utility and extent of unlearning achieved. REMEDI is developed using the MIMIC-III clinical database, which contains comprehensive clinical data of patients. Experiments with existing unlearning methods indicate that there exists a trade-off between utility and unlearning performance. These methods are also largely unsuited to multi-label classification tasks. To facilitate reproducibility, we make our benchmark publicly available.

2606.07146 2026-06-08 cs.LG cs.CE 新提交

Decision-Aware Evaluation of Physics-Informed Surrogates

决策感知的物理信息替代模型评估

Daniel Cieślak, Andrzej Czyżewski

发表机构 * Gdańsk University of Technology(格但斯克技术大学)

AI总结 针对物理信息机器学习在工程决策中的评估,提出pinn-gym基准,通过曲线误差、物理可行性、top-k检索和遗憾值等多维度指标,揭示低nRMSE不足以识别有用设计,且物理信息损失改变权衡而非单调改进所有指标。

Comments 12 pages, 5 figures, 9 tables. Code and data available at https://github.com/Dyniel/pinn-gym

详情
AI中文摘要

物理信息机器学习通常通过曲线误差来评估,尽管工程应用取决于下游决策:对候选方案进行排序、避免不可行设计以及限制遗憾值。我们引入了pinn-gym,一个用于材料条件晶格设计的开放基准,它结合了一个透明的降阶碰撞冲击预言机、五种可打印聚合物卡片、无量纲力响应目标以及一个涵盖曲线保真度、物理可行性、top-k检索和质量遗憾值的协议。在逐材料、混合和跨材料设置中,低nRMSE通常不足以识别有用的设计选择。物理信息损失改变了权衡,而不是单调地改进所有指标,并且无量纲条件化提高了可比性,但并未使迁移对称。该基准不是经过认证的材料模型;在发布的预言机、候选生成器和材料卡片中,pinn-gym提供了一个可重复的测试平台,用于评估PIML替代模型作为决策系统,而不仅仅是曲线预测器。

英文摘要

Physics-informed machine learning is often assessed by curve error, although engineering use depends on downstream decisions: ranking candidates, avoiding infeasible designs and limiting regret. We introduce pinn-gym, an open benchmark for material-conditioned lattice design that couples a transparent reduced-order crush-and-impact oracle with five printable polymer cards, dimensionless force-response targets and a protocol spanning curve fidelity, physical admissibility, top-k retrieval and mass regret. Across per-material, pooled and cross-material settings, low nRMSE is frequently insufficient to identify useful design selections. Physics-informed losses alter trade-offs rather than monotonically improving all metrics, and dimensionless conditioning improves comparability without making transfer symmetric. The benchmark is not a certified material model; within the released oracle, candidate generator and material cards, pinn-gym provides a reproducible testbed for evaluating PIML surrogates as decision systems rather than curve predictors alone.
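The gap between curve error and decision quality can be made concrete with two small metrics. This is a sketch under assumptions, not the pinn-gym protocol: the objective is taken to be a cost to minimise (such as mass), the function names are hypothetical, and the numbers are invented to show a surrogate with small errors that still picks the wrong design.

```python
def top_k_recall(true_cost, pred_cost, k):
    """Fraction of the truly best k designs that the surrogate also ranks in its top k."""
    def best(costs):
        return set(sorted(range(len(costs)), key=costs.__getitem__)[:k])
    return len(best(true_cost) & best(pred_cost)) / k


def selection_regret(true_cost, pred_cost):
    """Extra true cost paid by choosing the design the surrogate ranks best."""
    chosen = min(range(len(pred_cost)), key=pred_cost.__getitem__)
    return true_cost[chosen] - min(true_cost)


def rmse(true_cost, pred_cost):
    """Root-mean-square error between predicted and true costs."""
    n = len(true_cost)
    return (sum((t - p) ** 2 for t, p in zip(true_cost, pred_cost)) / n) ** 0.5


# Made-up true costs for four candidate designs and two surrogate predictions.
true_mass = [1.00, 1.05, 1.50, 2.00]
surrogate_a = [1.10, 1.02, 1.55, 2.05]   # small errors, but swaps the top two
surrogate_b = [0.50, 0.60, 1.00, 1.50]   # large bias, but preserves the ranking
```

Here `surrogate_a` has the lower RMSE yet nonzero regret and zero top-1 recall, while the biased `surrogate_b` selects the optimal design. This is the kind of disagreement between error and decision metrics that the abstract reports.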

2606.07151 2026-06-08 cs.LG 新提交

Geodesics of Dynamic Graphs for Regime Change Detection

动态图的测地线用于状态转换检测

William Cappelletti, Étienne Voutaz, Pascal Frossard

发表机构 * LTS4, EPFL(洛桑联邦理工学院LTS4实验室) Cyber-Defence Campus, Armasuisse(Armasuisse网络防御园区)

AI总结 提出将动态网络中的状态定义为时间图沿测地线的轨迹,通过图回归方法测量观测图与测地线的累积距离,结合变点检测算法识别状态转换,在合成和真实数据上优于现有方法。

详情
AI中文摘要

传统动态网络中的变点检测假设平稳状态之间的突然转换,忽略了大多数实际应用(如社交网络或物理系统)中出现的连续演化场景。我们通过将状态正式定义为时间图中连贯动态的时期来弥补这一空白,并将其表征为在适当定义的图空间中沿测地线的轨迹。这一原创视角使我们能够将状态转换定义为动态中的显著漂移,要么朝向新轨迹,要么速度变化。我们利用图回归方法测量观测图序列与相关图空间中其端点之间估计测地线的累积距离,并可将其与变点检测算法结合。我们在具有变化轨迹和不同速度的动态网络上进行实验,结果优于最先进的变点检测模型。然后,我们分析了新冠疫情期间的流动性数据,并表明我们对规则网络演化的假设导致变点与外部事件相比基线方法的结果更一致。我们的工作是首次在图空间中建模和检测演化状态之间的变化,为分析复杂时间图数据提供了现实且强大的工具。

英文摘要

Traditional change point detection in dynamic networks assumes abrupt transitions between stationary states, overlooking scenarios of continuous evolution which arise in most real-world applications, such as social networks or physical systems. We address this gap by formally defining regimes as periods of coherent dynamics in temporal graphs, which we characterize as trajectories along geodesics in a suitably defined graph space. This original perspective allows us to define regime changes as significant drifts in dynamics, either toward new trajectories or with pace changes. We leverage graph regression methods to measure the cumulative distance of sequences of observed graphs from the estimated geodesics between their endpoints, in the relevant graph space, which we can combine with change point detection algorithms. We present experiments on dynamic networks, with changing trajectories and varying speeds, in which we outperform state of the art change point detection models. Then, we analyse mobility data during the Covid-19 pandemic, and show that our assumptions on regular network evolution lead to change points that are more aligned to external events compared to the outcomes of baseline methods. Our work is the first to model and detect changes between evolving regimes in graph space, providing a realistic and powerful tool for analyzing complex temporal graph data.
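The core quantity in this abstract, the cumulative distance of an observed graph sequence from the geodesic between its endpoints, is easy to compute in the simplest possible graph space. The sketch below assumes adjacency matrices with the Frobenius metric, where the geodesic is a straight line traversed at constant speed; the paper's graph space and its regression-based geodesic estimate may differ, and all names here are hypothetical.

```python
import math


def frobenius(A, B):
    """Frobenius distance between two equally sized adjacency matrices."""
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))


def interpolate(A, B, t):
    """Point at fraction t along the straight line (Euclidean geodesic) from A to B."""
    return [[(1 - t) * a + t * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def geodesic_deviation(graphs):
    """Cumulative distance of a graph sequence from the geodesic joining its endpoints."""
    first, last = graphs[0], graphs[-1]
    steps = len(graphs) - 1
    return sum(
        frobenius(G, interpolate(first, last, i / steps)) for i, G in enumerate(graphs)
    )


# Toy 2-node graphs: no edge versus one undirected edge of weight 1.
A0 = [[0.0, 0.0], [0.0, 0.0]]
A1 = [[0.0, 1.0], [1.0, 0.0]]
smooth = [interpolate(A0, A1, i / 4) for i in range(5)]   # a single coherent regime
abrupt = [A0, A0, A0, A1, A1]                             # a jump inside the window
```

A window that follows one regime has deviation near zero, while a window straddling a change accumulates distance, so the deviation series can be handed to any standard change point detector.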

2606.07181 2026-06-08 cs.LG cs.AI q-bio.MN 新提交

RETROSPECT: RETROsynthesis via Sequential Prediction, and Chemically Transformed-ranking

RETROSPECT: 通过序列预测和化学变换排序的逆合成

Raja Sekhar Pappala, Shreyas Vinaya Sathyanarayana, Ronit Kumar Choudhary, Arjun Verma, Deepak Warrier

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出RETROSPECT系统,将单步逆合成分解为候选生成和重排序,结合ChemAlign Transformer生成器和LambdaMART重排序器,在USPTO-50K上实现55.00% top-1准确率。

Comments Accepted at the AI for Science workshop (ICML 2026)

详情
AI中文摘要

单步逆合成既需要准确的首位建议,也需要足够丰富的候选列表以供下游选择。我们将其研究为提议-选择分解。我们的系统RETROSPECT结合了一个单一的Transformer提议模型(我们称之为ChemAlign Transformer)和一个基于结构、反应模板、上游分数以及可选的DFT衍生描述符的LambdaMART重排序器。生成器使用混合根对齐和随机SMILES增强、预层归一化、绑定嵌入、指数移动平均权重以及可微的原子平衡辅助损失进行训练。在包含5,007个反应的完整USPTO-50K测试集上,生成器达到55.00%的top-1和86.18%的top-10精确匹配准确率,top-1有效率为99.86%。在用于重排序的合并候选池基准上(包含5,007个测试产物,每个产物约111个候选),基于结构特征集训练的LambdaMART模型达到59.4%的top-1和0.7171的平均倒数排名。特征消融实验表明,上游提议分数和模板频率统计提供了大部分重排序信号,而DFT和反应中心DFT特征提供的增益较小且不一致。这些结果支持逆合成的模块化观点:更强的单模型提议和学习候选选择是互补的,并且提议模型可以作为集成系统(如RetroChimera (Maziarz et al., 2024))的即插即用组件。

英文摘要

Single-step retrosynthesis needs both accurate first-ranked suggestions and candidate lists that are rich enough for downstream selection. We study this as a proposal-selection decomposition. Our system, RETROSPECT, combines a single Transformer proposal model, which we call the ChemAlign Transformer, with a LambdaMART reranker over structural, reaction-template, upstream-score, and optional DFT-derived descriptors. The generator is trained with hybrid root-aligned and random SMILES augmentation, Pre-LayerNorm, tied embeddings, exponential moving average weights, and a differentiable atom-balance auxiliary loss. On the full USPTO-50K test set of 5,007 reactions, the generator reaches 55.00% top-1 and 86.18% top-10 exact-match accuracy with 99.86% top-1 validity. On the merged candidate-pool benchmark used for reranking, which contains 5,007 test products and about 111 candidates per product, a LambdaMART model trained on the structural feature set reaches 59.4% top-1 with 0.7171 mean reciprocal rank. Feature ablations show that upstream proposal score and template-frequency statistics provide most of the reranking signal, while DFT and reaction-center DFT features provide smaller and less consistent gains. These results support a modular view of retrosynthesis: stronger single-model proposal and learned candidate selection are complementary, and the proposal model can serve as a drop-in component for ensemble systems such as RetroChimera (Maziarz et al., 2024).
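The two ranking metrics quoted in this abstract, top-k exact-match accuracy and mean reciprocal rank, follow directly from the position of the ground-truth reactants in each candidate list. The sketch below uses placeholder strings instead of real molecules and assumes that candidates and ground truth are already in a canonical form that can be compared by string equality; the function names are hypothetical.

```python
def rank_of_truth(candidates, truth):
    """1-based rank of the ground truth in a ranked candidate list, or None if absent."""
    for rank, candidate in enumerate(candidates, start=1):
        if candidate == truth:
            return rank
    return None


def top_k_accuracy(ranks, k):
    """Share of test products whose ground truth appears within the first k candidates."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank, counting products with no matching candidate as 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Placeholder candidate lists for four test products (not real SMILES).
ranked_lists = [
    (["A", "B", "C"], "A"),
    (["B", "A", "C"], "A"),
    (["B", "C", "D"], "A"),
    (["B", "C", "D", "A"], "A"),
]
ranks = [rank_of_truth(cands, truth) for cands, truth in ranked_lists]
```

A reranker leaves the candidate pool unchanged and only reorders it, so it can raise top-1 accuracy and MRR but cannot recover a ground truth that the proposal model never generated.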

2606.07196 2026-06-08 cs.LG 新提交

Structure-Preserving Correction Learning for Sparse Bayesian Inference in Brain Source Imaging

脑源成像中稀疏贝叶斯推断的结构保持校正学习

Marco Morik, Xiao Ruiting, Shinichi Nakajima, Stefan Haufe, Ismail Huseynov

发表机构 * Berlin Institute for the Foundations of Learning and Data (BIFOLD)(柏林学习与数据基础研究所(BIFOLD)) Technische Universität Berlin(柏林技术大学) RIKEN Center for Advanced Intelligence Project (AIP)(理化学研究所先进智能项目中心(AIP)) Physikalisch-Technische Bundesanstalt(物理技术联邦机构) Charité – Universitätsmedizin Berlin(柏林夏里特大学医学院)

AI总结 提出一种结构保持的校正学习方法,通过展开经典联合超参数求解器为可训练神经网络,在保留贝叶斯结构的同时学习更新机制,提升M/EEG脑源成像的重建性能和收敛性。

Comments preprint

详情
AI中文摘要

经典的稀疏Type-II贝叶斯方法用于M/EEG脑成像支持源和噪声超参数的联合估计,但依赖于固定的迭代更新规则。尽管这些更新是有原则且可解释的,但其动态无法从数据中适应。我们提出学习更新机制本身,同时通过将经典联合超参数求解器展开为可训练的神经架构(其层镜像原始迭代)来保留底层贝叶斯结构。得到的框架初始化为在训练前精确恢复经典求解器,并通过逐渐更具表达力的校正学习机制(从可学习偏置到自适应MLP和基于注意力的上下文细化)得到丰富。这样,训练不会用黑箱预测器替代贝叶斯推断,而是学习结构化的校正项,同时保留原始更新动态的可解释性和基于模型的特性。因此,结构保持校正学习旨在改善经验重建性能,而不替代原始的基于模型的推断机制。实验结果表明,学习的校正变体在保留算法透明性的同时,改善了基线展开求解器的重建性能和收敛行为。

英文摘要

Classical sparse Type-II Bayesian methods for M/EEG brain imaging support joint estimation of source and noise hyperparameters, but rely on fixed iterative update rules. Although these updates are principled and interpretable, their dynamics cannot be adapted from data. We propose to learn the update mechanism itself while preserving the underlying Bayesian structure by unfolding a classical joint hyperparameter-learning solver into a trainable neural architecture whose layers mirror the original iterations. The resulting framework is initialized to recover the classical solver exactly before training and is enriched through progressively more expressive correction-learning mechanisms, ranging from learnable biases to adaptive MLP and attention-based contextual refinements. In this way, training does not replace Bayesian inference with a black-box predictor, but instead learns structured correction terms while retaining the interpretability and model-based character of the original update dynamics. Structured correction learning therefore aims to improve empirical reconstruction performance without replacing the original model-based inference mechanism. Experimental results show that the learned correction variants improve reconstruction performance and convergence behavior over the baseline unfolded solver while preserving its algorithmic transparency.

2606.07239 2026-06-08 cs.LG 新提交

Generative Molecular Morphing for Flexible-Size Design via Unbalanced Optimal Transport

基于非平衡最优传输的柔性尺寸分子生成变形设计

Malte Franke, Stefan P. Schmid, Zarko Ivkovic, Kjell Jorner, Andreas Krause

发表机构 * ETH Zürich(苏黎世联邦理工学院) NCCR Catalysis(瑞士国家催化研究能力中心)

AI总结 针对现有扩散和流模型固定原子数限制的问题,提出基于非平衡最优传输的柔性尺寸分子生成模型Morph,实现条件与无条件3D分子设计,在保持性能的同时提供采样灵活性,并支持分布外生成。

详情
AI中文摘要

生成分子设计的成功取决于模型向高奖励样本的可引导性。由于许多分子性质与分子大小内在相关,准确捕捉性质与原子数的联合分布至关重要。然而,当前的扩散和基于流的模型固定了原子数,这最终限制了它们驾驭这种复杂关系的能力。为解决这一问题,我们引入了Morph,一种基于几何图的柔性尺寸生成模型,用于条件和无条件的3D分子设计。通过动态调整尺寸,Morph可以无缝集成现有的结构先验(如骨架),并显著增强性质引导。我们证明Morph在提供无与伦比的采样灵活性的同时,与当前固定尺寸的最先进模型性能相当。我们展示了在先前模型失败的领域中的分布外生成,为分子设计的增强生成建模铺平了道路。

英文摘要

The success of generative molecular design hinges on a model's steerability toward high-reward samples. Because many molecular properties are intrinsically linked to molecular size, accurately capturing the joint distribution of properties and the number of atoms is essential. However, current diffusion and flow-based models fix the number of atoms, which ultimately limits their ability to navigate this complex relationship. To address this, we introduce Morph, a flexible-size generative model for conditional and unconditional 3D molecular design based on geometric graphs. By dynamically adapting size, Morph can seamlessly integrate existing structural priors, like scaffolds, and significantly enhances property steering. We show that Morph matches current fixed-size state-of-the-art models while offering the benefit of unparalleled sampling flexibility. We demonstrate out-of-distribution generation in regimes where previous models fail, paving the way for enhanced generative modeling for molecular design.

2606.07254 2026-06-08 cs.LG cs.FL 新提交

A Held-Out Transition-Pair Falsifier for Long-Horizon Non-Abelian State Tracking

长视野非阿贝尔状态跟踪的保留转移对验证器

Jeonghoon Lee

发表机构 * Attractor Dynamics(吸引子动力学)

AI总结 针对序列模型在非交换状态跟踪中的局限,提出保留转移对验证协议,在投影循环状态模型上实现长达百万步的完美预测,揭示显式非交换状态组合作为有效归纳偏置。

Comments Technical preprint, 24 pages. 7 figures

详情
AI中文摘要

状态跟踪揭示了序列模型的一个尖锐限制:相关信号通常不是观测令牌的摘要,而是通过非交换变换演化的有序潜在状态。我们引入了一个用于有限非阿贝尔群跟踪的保留转移对验证器。该协议在训练期间禁止选定的有序生成器对,并在评估期间要求相同的局部模式,从而阻断了一条直接的局部转移记忆路径。在一个受控的 $S_3 \times S_3$ 基准测试中,仅在长度为8的序列上训练的投影循环状态模型,在长达1,048,576个令牌的评估视野中,跨五个种子产生了无错误的最终状态预测(每个视野完美250/250)。匹配的原生读出基线,包括bag、GRU和单配置结构化状态空间模型,在相同协议下保持接近最低水平。投影匹配的GRU、结构化SSM和bag基线配备了类似的有限群原型读出,在相同分割下也保持接近随机水平。机制诊断显示,硬投影与低同态误差、低状态一致性漂移和非平凡交换子分离同时出现,而软投影则导致最终状态精度崩溃。干净分割审计验证了训练和评估分区之间零逐字缩减词重叠和零结构模板重叠。该证据限于这个受控的有限群验证器,而非通用架构排名。在该范围内,显式投影的非交换状态组合作为长视野隐藏状态跟踪的有用归纳偏置。

英文摘要

State tracking exposes a sharp limitation of sequence models: the relevant signal is often not a summary of observed tokens, but an ordered latent state that evolves through non-commutative transformations. We introduce a held-out transition-pair falsifier for finite non-Abelian group tracking. The protocol forbids selected ordered generator pairs during training and requires the same local patterns during evaluation, blocking one direct local-transition memorization pathway. In a controlled $S_3 \times S_3$ benchmark, a projected recurrent state model trained only on length-8 sequences produces error-free final-state predictions (perfect 250/250 per horizon) through evaluation horizons up to 1,048,576 tokens across five seeds. Matched native-readout baselines, including bag, GRU, and a single-configuration structured state-space model, remain near floor under the same protocol. Projection-matched GRU, structured SSM, and bag baselines equipped with analogous finite-group prototype readouts also remain near chance under the same split. Mechanism diagnostics show that hard projection coincides with low homomorphism error, low state-consistency drift, and non-trivial commutator separation, while softened projection collapses final-state accuracy. Clean-split audits verify zero verbatim reduced-word overlap and zero structural-template overlap between training and evaluation partitions. The evidence is scoped to this controlled finite-group falsifier rather than to a general architecture ranking. Within that regime, explicit projected non-commutative state composition acts as a useful inductive bias for long-horizon hidden-state tracking.
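To make the task concrete, here is a minimal sketch of exact state tracking in $S_3 \times S_3$. It only shows the ground-truth group computation that a model would have to reproduce, not the projected recurrent model or the held-out pair protocol. Permutations are stored as tuples, and the names `compose`, `compose_pair`, and `track` are hypothetical.

```python
from itertools import permutations

# Elements of S_3 as tuples p, where p[i] is the image of i.
S3 = list(permutations(range(3)))

def compose(p, q):
    """Return the permutation that applies q first, then p."""
    return tuple(p[q[i]] for i in range(3))

def compose_pair(a, b):
    """Group operation in S_3 x S_3: componentwise composition."""
    return (compose(a[0], b[0]), compose(a[1], b[1]))

IDENTITY = ((0, 1, 2), (0, 1, 2))

def track(tokens):
    """Fold a sequence of group elements into the final latent state."""
    state = IDENTITY
    for g in tokens:
        state = compose_pair(g, state)
    return state

# Two generators acting on the first factor only.
a = ((1, 0, 2), (0, 1, 2))  # a transposition
b = ((1, 2, 0), (0, 1, 2))  # a 3-cycle
```

Because `track([a, b])` and `track([b, a])` differ, the final state depends on token order, which is why an order-insensitive summary such as a bag of tokens cannot solve the task in general.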

2606.07271 2026-06-08 cs.LG cs.AI cs.SD 新提交

Where Rectified Flows Leak: Characterising Membership Signals Along the Interpolation Path

整流流泄漏之处:沿插值路径表征成员信号

Thomas Sesmat, Gabriel Meseguer-Brocal, Geoffroy Peeters

发表机构 * University of Amsterdam(阿姆斯特丹大学)

AI总结 本文分析整流流(Rectified Flows)在插值路径上的训练数据成员信号,发现训练与测试数据的重建差异呈钟形曲线,并在高斯假设下推导出峰值位置,验证了该结构的普适性,并利用其进行成员推断攻击。

Comments ICML 2026 article, 9 main pages and 25 with annexes, 11 figures

详情
Journal ref
43rd International Conference on Machine Learning, Seoul, South Korea, 2026
AI中文摘要

理解生成模型从训练数据中保留了什么仍然具有挑战性,这对版权和隐私有影响。除了逐字复制外,模型可以编码训练数据中更微妙的痕迹,这些痕迹从未出现在输出中,但仍可利用。我们针对整流流(Rectified Flows)研究了这一机制,整流流越来越多地用于部署的生成系统。我们分析了定义整流流训练的插值路径 $X_\lambda = (1-\lambda)X_0 + \lambda X_1$。我们展示了训练数据和测试数据的重建之间存在一个差距,该差距在 $\lambda$ 上呈钟形曲线,并在训练过程中累积,而验证指标保持稳定。该信号有一个最大值,我们在高斯假设下推导出其位置的闭式解。我们在音频和图像上验证了这些预测,并表明钟形结构是普遍的,而峰值预测在我们的假设满足时成立。作为概念验证,我们利用这种特定的 $\lambda$ 解析结构进行成员推断攻击,区分训练集的成员和非成员。

英文摘要

Understanding what generative models retain from training data remains challenging, with implications for copyright and privacy. Beyond verbatim reproduction, models can encode subtler traces of their training data that never surface in their outputs yet remain exploitable. We study this regime for Rectified Flows, which are increasingly used in deployed generative systems. We analyse the interpolation path $X_\lambda = (1-\lambda)X_0 + \lambda X_1$ that defines Rectified Flow training. We show that the gap between the reconstruction of train and test data follows a bell-shaped curve over $\lambda$ and accumulates during training, while the validation metrics remain stable. The signal has a maximum whose location we derive in closed form under Gaussian assumptions. We validate these predictions on both audio and images and show that the bell-shaped structure is universal, while the peak prediction holds when our assumptions are satisfied. As a proof of concept, we exploit this specific $\lambda$-resolved structure to perform a Membership Inference Attack, distinguishing members of the training set from non-members.
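The interpolation path in the abstract can be written down directly. The sketch below assumes the standard rectified-flow regression target, the constant velocity $X_1 - X_0$ along the straight path. The function names and the per-$\lambda$ squared error are illustrative and may differ from the reconstruction metric used in the paper; which endpoint is noise and which is data is a convention the abstract does not fix.

```python
import numpy as np

def interpolate(x0, x1, lam):
    """Point on the straight path X_lam = (1 - lam) * X0 + lam * X1."""
    return (1.0 - lam) * x0 + lam * x1

def velocity_target(x0, x1):
    """Derivative of the path with respect to lam, which is the constant X1 - X0."""
    return x1 - x0

def per_lambda_loss(model, x0, x1, lam):
    """Mean squared error of a velocity model at one value of lam.
    `model` is any callable taking (x_lam, lam); hypothetical interface."""
    x_lam = interpolate(x0, x1, lam)
    return float(np.mean((model(x_lam, lam) - velocity_target(x0, x1)) ** 2))
```

Evaluating `per_lambda_loss` on a grid of `lam` values for training samples and for held-out samples, then taking the difference, would give a $\lambda$-resolved gap curve of the kind the abstract describes.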

2606.07289 2026-06-08 cs.LG cs.CV 新提交

Closed-Form Spectral Regularization for Multi-Task Model Merging

多任务模型融合的闭式谱正则化

Yongxian Wei, Runxi Cheng, Xingxuan Zhang, Li Shen, Chun Yuan, Peng Cui, Dacheng Tao

发表机构 * Shenzhen International Graduate School, Tsinghua University(清华大学深圳国际研究生院) Department of Computer Science and Technology, Tsinghua University(清华大学计算机科学与技术系) Sun Yat-sen University(中山大学) Nanyang Technological University(南洋理工大学)

AI总结 针对多任务模型融合中的干扰最小化问题,发现迭代求解器实际充当隐式谱正则化器,据此提出基于谱滤波的闭式方法SWUDI及其自适应变体SWUDI-A,显著提升效率并匹配或超越现有方法。

详情
AI中文摘要

模型融合将多个独立微调专家合并为单个多任务模型,无需任何训练数据,降低了大型基础模型的存储、服务和去中心化开发成本。最先进的融合方法将融合表述为逐层二次干扰最小化问题。尽管该问题存在精确的闭式伪逆解,但该解在实践中性能不如数百次梯度下降迭代。迭代循环主导了流程的成本,但其有效性尚未得到解释。我们重新审视这一机制,并表明迭代求解器主要并非作为优化器;相反,它充当了病态正规方程的隐式谱正则化器,其中每层干扰算子的小特征值方向放大了代理噪声。基于这一发现,我们将多任务模型融合形式化为一个带噪线性逆问题,并提出一种由逐方向滤波器参数化的谱滤波估计器。我们通过SWUDI实例化该估计器,这是一种闭式方法,结合了软指数滤波器(匹配迭代下降的梯度流轨迹)和硬top-K截断(抑制放大噪声的小特征值方向)。此外,我们提出了SWUDI-A,一种自适应变体,用逐层秩规则替换全局秩超参数,进一步提高了跨架构的鲁棒性。两种变体共享每个线性层的单个对称特征分解,且不需要训练数据或优化器状态。在四个通用基准和一个涵盖VQA、几何、图表、OCR、定位和模态融合的多模态融合基准上,我们提出的谱求解器匹配或超越了最先进的融合方法。关键的是,它们将挂钟时间减少了28-72倍,峰值GPU内存减少了高达50%。

英文摘要

Model merging combines several independently fine-tuned experts into a single multi-task model without any training data, reducing the storage, serving, and decentralized-development costs of large foundation models. State-of-the-art merging methods formulate merging as a layer-wise quadratic interference minimization problem. Although this problem admits an exact closed-form pseudoinverse solution, that solution underperforms hundreds of iterations of gradient descent in practice. The iterative loop dominates the cost of the pipeline, yet its effectiveness has remained unexplained. We revisit this regime and show that the iterative solver does not primarily act as an optimizer; rather, it serves as an implicit spectral regularizer for an ill-posed normal equation, where small-eigenvalue directions of the per-layer interference operator amplify proxy noise. Building on this finding, we formalize multi-task model merging as a noisy linear inverse problem and propose a spectral filtering estimator parameterized by a per-direction filter. We instantiate this estimator with SWUDI, a closed-form method that combines a soft exponential filter, which matches the gradient-flow trajectory of iterative descent, with a hard top-K truncation that suppresses noise-amplifying small-eigenvalue directions. Furthermore, we propose SWUDI-A, an adaptive variant that replaces the global rank hyperparameter with per-layer rank rules, further improving robustness across architectures. Both variants share a single symmetric eigendecomposition per linear layer and require no training data or optimizer state. Across four general benchmarks and a multimodal merging benchmark spanning VQA, Geometry, Chart, OCR, Grounding, and modality merging, our proposed spectral solvers match or outperform state-of-the-art merging methods. Crucially, they reduce wall-clock time by 28-72x and peak GPU memory by up to 50%.
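A minimal sketch of the spectral-filtering idea follows, assuming a generic symmetric positive semidefinite system `gram @ w = rhs` rather than the paper's per-layer interference operator. The soft filter $(1 - e^{-t\lambda})/\lambda$ is the standard gradient-flow filter for a quadratic objective started from zero; the function name and the parameters `t` and `k` are hypothetical and are not SWUDI's actual hyperparameters.

```python
import numpy as np

def spectral_filter_solve(gram, rhs, t, k):
    """Approximately solve gram @ w = rhs using one symmetric eigendecomposition,
    a soft exponential filter, and a hard top-k truncation.
    Assumes the k largest eigenvalues of gram are strictly positive."""
    eigvals, eigvecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    keep = np.argsort(eigvals)[::-1][:k]         # indices of the k largest eigenvalues
    lam = eigvals[keep]
    v = eigvecs[:, keep]
    # Gradient-flow filter from a zero start: (1 - exp(-t * lam)) / lam.
    filt = -np.expm1(-t * lam) / lam
    return v @ (filt * (v.T @ rhs))
```

As `t` grows and `k` covers every direction, the filter tends to $1/\lambda$ and the result approaches the pseudoinverse solution; a finite `t` or a smaller `k` damps or removes the small-eigenvalue directions that amplify noise.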

2606.07291 2026-06-08 cs.LG 新提交

Trio: Learning Time-Series Forecasting with Temporal-Spatial-Sample Attention and Structural Causal Priors

Trio: 基于时间-空间-样本注意力与结构因果先验的时间序列预测学习

Tao Chen, Yexu Zhou, Zhi Gong, Hengwei He, Hongda Li, Zhewei Chen, Dongjing Wang, Xin Zhang, Decheng Liu, Chunlei Peng, Zheng Chen, Wenyue Ding

发表机构 * Nanyang Technological University(南洋理工大学)

AI总结 提出Trio架构,通过时间、空间和样本三种注意力机制分别捕获时序动态、变量间依赖和历史样本对应,并引入时间序列结构因果模型生成合成任务以提供结构先验,提升多变量时间序列预测性能。

详情
AI中文摘要

多变量时间序列预测要求模型对时间动态、跨变量依赖以及历史输入-输出对应关系进行推理。最近的先验数据拟合网络(PFNs)表明,合成任务可用于学习可迁移的推理行为。然而,直接将这一范式迁移到时间序列预测仍然困难,因为时间顺序、动态滞后和重复的历史模式无法被普通的表格先验自然捕获。受此观察启发,我们提出了Trio,一种基于时间-空间-样本注意力的样本感知时间序列预测架构。时间注意力捕获窗口内动态,空间注意力建模变量间依赖,样本注意力检索相关的历史回溯-未来对以指导当前预测。我们的目标并非声称一个完全通用的PFN风格预测器,而是研究如何在预测模型中显式组织和重用历史输入-输出示例。我们进一步引入了一个时间序列结构因果模型(TS-SCM)生成器,以创建具有动态滞后、跨变量交互、噪声、反馈和分布漂移的结构化合成预测任务。在合成、工业和公共基准上的实验表明,所提出的架构提高了预测性能。探索性的零样本实验进一步表明,TS-SCM生成的任务可能提供有用的结构先验,而完全通用的PFN风格时间序列预测仍是一个开放问题。

英文摘要

Multivariate time-series forecasting requires models to reason over temporal dynamics, cross-variable dependencies, and historical input-output correspondences. Recent Prior-Data Fitted Networks (PFNs) suggest that synthetic tasks can be useful for learning transferable inference behavior. However, directly transferring this paradigm to time-series forecasting remains difficult, since temporal order, dynamic lags, and recurring historical patterns are not naturally captured by ordinary tabular priors. Motivated by this observation, we propose Trio, a sample-aware time-series forecasting architecture based on Temporal-Spatial-Sample attention. Temporal attention captures within-window dynamics, spatial attention models inter-variable dependencies, and sample attention retrieves relevant historical lookback-future pairs to guide the current prediction. Rather than claiming a fully general PFN-style forecaster, our goal is to study how historical input-output examples can be explicitly organized and reused within a forecasting model. We further introduce a Time-Series Structural Causal Model (TS-SCM) generator to create structured synthetic forecasting tasks with dynamic lags, cross-variable interactions, noise, feedback, and distributional drift. Experiments on synthetic, industrial, and public benchmarks show that the proposed architecture improves forecasting performance. Exploratory zero-shot experiments further suggest that TS-SCM-generated tasks may provide useful structural priors, while fully general PFN-style time-series forecasting remains an open problem.
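The role of sample attention can be sketched as soft retrieval over stored lookback-future pairs. This is only an illustration under strong assumptions: raw lookback windows serve as queries and keys, there are no learned projections, and the names `sample_attention`, `hist_lookbacks`, and `hist_futures` are hypothetical rather than taken from Trio.

```python
import numpy as np

def sample_attention(query_lookback, hist_lookbacks, hist_futures, temperature=1.0):
    """Weight historical futures by the similarity of their lookbacks to the query.

    query_lookback: shape (L,)
    hist_lookbacks: shape (N, L)
    hist_futures:   shape (N, H)
    Returns a forecast of shape (H,) and the attention weights of shape (N,).
    """
    scores = hist_lookbacks @ query_lookback / (temperature * np.sqrt(query_lookback.size))
    scores = scores - scores.max()               # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ hist_futures, weights
```

When the current lookback closely matches one stored window, almost all weight falls on that sample and the forecast copies its recorded future; otherwise the output blends several historical continuations.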

2606.07303 2026-06-08 cs.LG 新提交

Bootstrap Theory of Representational Emergence: Explanatory Insufficiency as a Driver of Representation Learning and World Models

表征涌现的自举理论:解释不充分性作为表征学习与世界模型的驱动力

Jacques Raynal, Pierre Slangen, Elsa Raynal, Jacques Margerit

发表机构 * Laboratory of Bioengineering and Nanosciences (LBN), University of Montpellier(生物工程与纳米科学实验室(LBN),蒙彼利埃大学) EuroMov Digital Health in Motion, University of Montpellier, IMT Mines Alès(EuroMov数字健康运动,蒙彼利埃大学,IMT阿莱斯矿业学院) Certified Sophrologist, Sensorimotor Practice, Montpellier, France(认证Sophrologist,感觉运动实践,蒙彼利埃,法国) Emeritus Professor, University of Montpellier(荣誉教授,蒙彼利埃大学)

AI总结 提出表征涌现自举理论(TBER),将解释不充分性视为新表征涌现的积极信号,通过五阶段递归过程驱动表征创新,应用于表征学习、世界模型和科学发现。

Comments 24 pages, 25 references. Theoretical framework relating representation learning, representational emergence, and world models

详情
AI中文摘要

表征学习是现代机器学习的核心,实现了从手工特征到学习嵌入、潜在空间、基础模型、世界模型和数字孪生的转变。然而,大多数研究关注在选定表征框架后如何优化表征,而较少关注何时需要新的表征层次。我们引入表征涌现自举理论(TBER),这是一个描述当现有表征变得解释不充分时新表征如何出现的框架。在这种观点下,表征创新不仅由更多数据、更大模型或更强计算能力驱动,还由持续的解释差距驱动:即表征仍能描述观察但无法使其组织或变换变得可理解的情况。TBER将解释不充分性识别为表征转变的积极信号。一个表征变得不充分,并非因为它必然错误,而是因为其解释领域已被超越。自举动态遵循递归序列:观察揭示异常;异常暴露不充分性;不充分性激发新表征;这些新表征产生进一步观察和可能的新不充分性。我们通过五个阶段形式化这一过程:稳定观察、异常检测、解释不充分性识别、表征涌现和临时稳定。我们讨论了在表征学习、潜在空间、基础模型、世界模型、数字孪生、自适应生物系统和科学发现中的应用。TBER表明,未来AI系统可能受益于检测其内部表征解释极限的机制。

英文摘要

Representation learning is central to modern machine learning, enabling transitions from handcrafted features to learned embeddings, latent spaces, foundation models, world models, and digital twins. Yet most research examines how representations are optimized after a representational framework has been selected, while less attention is given to when a new level of representation becomes necessary. We introduce the Bootstrap Theory of Representational Emergence (TBER), a framework describing how new representations arise when existing ones become explanatorily insufficient. In this view, representational innovation is not only driven by more data, larger models, or greater computational power, but also by persistent explanatory gaps: situations in which a representation can still describe observations but can no longer make their organization or transformations intelligible. TBER identifies explanatory insufficiency as a positive signal for representational transition. A representation becomes insufficient not because it is necessarily false, but because its explanatory domain has been exceeded. The bootstrap dynamic follows a recursive sequence: observations reveal anomalies; anomalies expose insufficiencies; insufficiencies motivate new representations; and these new representations generate further observations and possible new insufficiencies. We formalize this process through five stages: stabilized observation, anomaly detection, recognition of explanatory insufficiency, representational emergence, and provisional stabilization. We discuss applications to representation learning, latent spaces, foundation models, world models, digital twins, adaptive biological systems, and scientific discovery. TBER suggests that future AI systems may benefit from mechanisms for detecting the explanatory limits of their own internal representations.

2606.07345 2026-06-08 cs.LG 新提交

TabSwift: An Efficient Tabular Foundation Model with Row-Wise Attention

TabSwift: 一种高效的基于行注意力的表格基础模型

Si-Yang Liu, Han-Jia Ye

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出TabSwift,通过门控注意力稳定和可学习注册令牌增强轻量级行注意力骨干,实现高效表格上下文学习,在保持竞争力的同时降低推理成本。

Comments Accepted to ICML 2026, spotlight

详情
AI中文摘要

以TabPFN为代表的表格基础模型通过上下文学习进行预测,直接从带标签的训练样本推断测试标签。它们已展现出有竞争力的性能,尤其是在中小型数据集上。然而,最近的表格基础模型通常通过日益复杂的架构来提高准确性,导致更高的推理成本并限制了实际部署。在这项工作中,我们重新审视了原始TabPFN设计,并表明一个轻量级的仅行注意力骨干可以通过两个简单的增强保持高度竞争力:门控注意力稳定机制和一组可学习的注册令牌,提供全局上下文并改善预训练质量。由此产生的模型TabSwift支持分类和回归,与更强的表格基础模型(如TabPFN v2和TabICL)竞争,同时推理效率更高。对于延迟敏感的服务,我们进一步引入了一种自适应逐层早期退出机制,动态调整每个样本的推理深度。总体而言,TabSwift为实际部署实现了高效且随时可用的表格上下文学习。

英文摘要

Tabular foundation models, exemplified by TabPFN, perform prediction via in-context learning, inferring test labels directly from labeled training examples. They have demonstrated competitive performance, particularly on small-to-medium datasets. However, recent tabular foundation models often improve accuracy with increasingly complex architectures, incurring higher inference cost and limiting practical deployment. In this work, we revisit the original TabPFN design and show that a lightweight row-wise attention-only backbone can remain highly competitive with two simple enhancements: a gated attention stabilization mechanism and a small set of learnable register tokens that provide global context and improve pretraining quality. The resulting model, TabSwift, supports both classification and regression, and is competitive with stronger tabular foundation models (e.g., TabPFN v2 and TabICL) while being more efficient at inference. For latency-sensitive serving, we further introduce an adaptive layer-wise early-exit mechanism that dynamically adjusts inference depth per sample. Overall, TabSwift enables efficient and anytime tabular in-context learning for practical deployments.

2606.07351 2026-06-08 cs.LG cs.AI 新提交

SleepExplain: Explainable Non-Rapid Eye Movement and Rapid Eye Movement Sleep Stage Classification from EEG Signal

SleepExplain: 基于EEG信号的可解释非快速眼动和快速眼动睡眠阶段分类

Rafsan Jany, Md. Hamjajul Ashmafee, Iqram Hussain, Md Azam Hossain

AI总结 提出SleepExplain模型,使用集成学习(随机森林、XGBoost、梯度提升)对NREM和REM睡眠阶段进行分类,准确率达94.30%,并利用SHAP提供可解释性。

Comments 6 pages, 7 figures, 2022 25th International Conference on Computer and Information Technology (ICCIT)

详情
Journal ref
2022 25th International Conference on Computer and Information Technology (ICCIT), pp. 248-253, 2022
AI中文摘要

睡眠阶段分类是多种睡眠相关疾病最重要的诊断方法之一。脑电图(EEG)被认为是检查神经效应与睡眠阶段之间关联的有力工具,因为它能正确识别与睡眠相关的神经变化。在非快速眼动(NREM)和快速眼动(REM)睡眠阶段,许多神经和身体功能受到影响,因此在其功能中扮演重要角色。本研究旨在从睡眠EEG数据中分类NREM和REM睡眠阶段,并提出一个新颖的SleepExplain模型,一种可解释的NREM和REM睡眠阶段分类,以解释其预测。在这项工作中,使用随机森林、XGBoost和梯度提升集成分类模型对睡眠阶段进行分类。总体而言,我们获得了92.54%(随机森林)、94.25%(梯度提升)和94.30%(XGBoost)的准确率。对于可解释分类模型,我们采用博弈论方法SHAP(SHapley Additive exPlanations)为预测提供令人信服的解释。

英文摘要

Classification of sleep stages is one of the most important diagnostic approaches for a variety of sleep-related disorders. Electroencephalography (EEG) is regarded as a powerful tool for examining the association between neurological effects and sleep phases since it correctly identifies sleep-related neurological alterations. During Non-Rapid Eye Movement (NREM) and Rapid Eye Movement (REM) sleep phases, a number of nerve and bodily functions are affected, so both phases play an important role in these functions. This work aims to classify NREM and REM sleep stages from sleep EEG data and presents a novel SleepExplain model, an explainable NREM and REM sleep stage classifier that explains its predictions. In this work, sleep stages were classified using Random Forest, XGBoost, and Gradient Boosting ensemble classification models. Overall, we obtained an accuracy of 92.54% (Random Forest), 94.25% (Gradient Boosting), and 94.30% (XGBoost). For the explainable classification model, we used a game-theoretic approach, SHAP (SHapley Additive exPlanations), to offer a convincing explanation for the prediction.

2606.07365 2026-06-08 cs.LG cs.AI 新提交

A robust PPG foundation model using multimodal physiological supervision

一种使用多模态生理监督的鲁棒PPG基础模型

Eloy Geenjaar, Vince Calhoun, Scott Daly, Gouthaman KV, Lie Lu, Trisha Mittal, Daniel P. Darcy

发表机构 * Dolby Laboratories(杜比实验室)

AI总结 提出一种PPG基础模型,利用ICU数据集中的心电和呼吸信号选择对比样本,无需高质量或场域数据预训练,在15个下游任务中14个取得性能提升。

详情
AI中文摘要

光电容积描记法(PPG)是一种无创测量血容量变化的方法,广泛应用于可穿戴设备和临床环境。最近的PPG基础模型要么使用开源ICU数据集,采用需要精心整理数据的预训练范式,从而难以泛化到场域数据,要么使用闭源场域PPG数据。相比之下,我们提出了一种PPG基础模型,不需要高质量或场域预训练数据,而是利用ICU数据集中伴随的心电图和呼吸信号在预训练期间选择对比样本。我们的方法允许模型保留并从噪声PPG片段中学习,提高了推理时的鲁棒性。我们的模型在比现有最先进方法少3倍的受试者上预训练,在15个不同的下游任务(包括场域日常活动和心率预测)中的14个上实现了性能提升。我们的结果表明,多模态监督可以整合互补的生理信息,以提高PPG基础模型的鲁棒性,并增强其对消费级数据的泛化能力。

英文摘要

Photoplethysmography (PPG), a non-invasive measure of changes in blood volume, is widely used in both wearable devices and clinical settings. Recent PPG foundation models either use open-source ICU datasets with pretraining paradigms that require curated data and thus complicate generalization to field-like data, or use closed-source field-like PPG data. In contrast, we propose a PPG foundation model that does not require high-quality or field-like pretraining data, and instead leverages accompanying electrocardiogram and respiratory signals in ICU datasets to select contrastive samples during pretraining. Our approach allows the model to retain and learn from noisy PPG segments, improving robustness at inference. Our model, pretrained on 3x fewer subjects than existing state-of-the-art approaches, achieves performance improvements on 14 out of 15 diverse downstream tasks, including field-like daily activity and heart rate prediction. Our results demonstrate that multimodal supervision can integrate complementary physiological information to improve the robustness of PPG foundation models and enhance their generalization to consumer-grade data.

2606.07367 2026-06-08 cs.LG 新提交

Self-evolving LLM agents with in-distribution Optimization

基于分布内优化的自演化LLM智能体

Yudi Zhang, Meng Fang, Zhenfang Chen, Mykola Pechenizkiy

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出Q-Evolve框架,通过分布内强化学习统一过程奖励标注与策略学习,利用加权隐式Q学习稳定贝尔曼更新,实现智能体自演化,在AlfWorld等任务上优于基线。

Comments ICML 2026

详情
AI中文摘要

大型语言模型(LLM)最近已成为复杂环境中交互智能体的强大控制器,但训练它们执行可靠的长期决策仍然是一个基本挑战。一个关键困难在于信用分配:智能体通常仅在回合结束时收到延迟奖励。在本文中,我们提出了Q-Evolve,一个用于LLM智能体的自演化框架,该框架在原则性的分布内强化学习范式中统一了自动过程奖励标注和策略学习。在每个演化迭代中,我们的方法从混合离策略数据集(结合专家演示与智能体生成的轨迹)中学习一个分布内评论家,通过加权隐式Q学习目标在稀疏奖励设置中稳定贝尔曼备份。然后,通过学习到的价值函数通过优势估计推导出逐步过程奖励,无需环境回溯或人工标注即可提供密集且可靠的监督。利用这些信号,我们执行行为近端策略优化,使智能体在用于过程奖励标注的数据上演化,从而在不加剧分布偏移的情况下实现迭代自我改进。我们在AlfWorld、WebShop和ScienceWorld上评估了我们的方法,结果显示Q-Evolve在样本效率、鲁棒性和整体任务性能上优于强基线。我们的结果表明,通过过程级监督和策略的共同演化(两者都基于共享的分布内学习循环),可以实现稳定的智能体自演化。

英文摘要

Large Language Models (LLMs) have recently emerged as powerful controllers for interactive agents in complex environments, yet training them to perform reliable long-horizon decision making remains a fundamental challenge. A key difficulty lies in credit assignment: agents often receive delayed rewards only at the end of episodes. In this paper, we propose Q-Evolve, a self-evolving framework for LLM agents that unifies automatic process-reward labeling and policy learning within a principled in-distribution reinforcement learning paradigm. In each evolving iteration, our method learns an in-distribution critic from a hybrid off-policy dataset that combines expert demonstrations with agent-generated trajectories, stabilizing Bellman backups in sparse-reward settings via a weighted Implicit Q-Learning objective. The learned value function is then used to derive step-wise process rewards through advantage estimation, enabling dense and reliable supervision without environment backtracking or human annotation. Leveraging these signals, we perform behavior-proximal policy optimization that evolves the agent over the data used for process reward labeling, allowing iterative self-improvement without exacerbating distribution shift. We evaluate our method on AlfWorld, WebShop, and ScienceWorld, showing Q-Evolve outperforms strong baselines in sample efficiency, robustness, and overall task performance. Our results demonstrate that stable agent self-evolution is achievable through the co-evolution of process-level supervision and policy, both grounded within a shared in-distribution learning loop.
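Two ingredients named in the abstract, an Implicit Q-Learning style value objective and advantage-based step rewards, can be sketched as follows. The code shows the standard expectile loss from Implicit Q-Learning and the usual advantage $A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$. The additional weighting that Q-Evolve applies is not specified in the abstract and is not reproduced here; the function names and the default `tau` are assumptions.

```python
import numpy as np

def expectile_loss(q_values, v_values, tau=0.7):
    """Asymmetric squared loss that fits V toward an upper expectile of Q.
    Errors where Q exceeds V are weighted by tau, the others by 1 - tau."""
    diff = np.asarray(q_values, dtype=float) - np.asarray(v_values, dtype=float)
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))

def stepwise_process_rewards(q_values, v_values):
    """Per-step advantage Q(s_t, a_t) - V(s_t), used here as a dense process reward."""
    return np.asarray(q_values, dtype=float) - np.asarray(v_values, dtype=float)
```

With `tau` above 0.5 the loss penalises underestimating the value more than overestimating it, so V tracks the better actions already present in the dataset without querying actions outside it. That is the sense in which the critic stays in-distribution.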

2606.07382 2026-06-08 cs.LG stat.ML 新提交

Covariance Shrinkage via Stochastic Interpolation

通过随机插值的协方差收缩

Mathieu Chalvidal, Florentin Coeurdoux, Eric Vanden-Eijnden

发表机构 * Capital Fund Management(资本基金管理公司)

AI总结 提出将高维协方差估计的经典收缩重述为基于源分布与目标分布之间参数化随机插值的经验风险最小化,揭示三种降低统计风险的机制,并设计神经估计器及风险上界。

Comments 18 pages

详情
AI中文摘要

我们将高维协方差估计器的经典收缩重述为基于源分布与目标分布之间参数化随机插值的经验风险最小化。该形式将已知的收缩估计器作为特例,并揭示了降低统计风险的三种不同机制:(i) 调度:插值调度决定了可容许协方差的类别,从而影响可实现的风险。(ii) 流映射和耦合:虽然朴素构造相当于假设分布之间的独立性,但特定的耦合结构(例如最优传输问题的解)可以降低经验风险。此外,实现这种耦合的非线性流映射使插值协方差摆脱经验估计的特征基,从而实现特征向量正则化。(iii) 提前停止:通过积分回归向量场定义的估计器通过近似真实插值分布提供了额外的偏差-方差权衡。然后,我们提出了一种插值器的神经估计器,并给出了其二次风险关于插值近似误差的上界,并在合成实验中进行了验证。最后,我们将该估计器应用于真实的神经影像数据,展示了该方法在实践中提供的额外正则化能力。

英文摘要

We recast classical shrinkage of high-dimensional covariance estimators as empirical risk minimization over a parametric stochastic interpolant between a source and a target distribution. This formalism recovers known shrinkage estimators as special cases and reveals three distinct mechanisms for reducing statistical risk: (i) Scheduling: the interpolant schedule determines the class of admissible covariances, and hence the achievable risk. (ii) Flow maps and couplings: whereas naive constructions amount to assuming independence between the distributions, specific coupling structures (e.g., solutions of optimal transport problems) can lower the empirical risk. Moreover, non-linear flow maps realizing such couplings free the interpolant covariance from the eigenbasis of the empirical estimate, enabling eigenvector regularization. (iii) Early stopping: estimators defined by integrating a regressed vector field afford an additional bias-variance trade-off through approximation of the true interpolant distribution. We then propose a neural estimator of the interpolant, together with an upper bound on its quadratic risk in terms of the interpolant approximation error, and validate both on synthetic experiments. Finally, we apply the estimator to real neuroimaging data, demonstrating the additional regularization power this approach offers in practice.
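For reference, the sketch below shows a classical linear shrinkage estimator toward a scaled identity, $(1-\alpha)S + \alpha\mu I$ with $\mu = \operatorname{tr}(S)/p$. It is given as one familiar example of the estimators the abstract calls special cases; how the stochastic-interpolant formulation recovers it is not stated in the abstract, and the function name and the fixed `alpha` are assumptions.

```python
import numpy as np

def linear_shrinkage(samples, alpha):
    """Return (1 - alpha) * S + alpha * mu * I.

    samples: array of shape (n, p), one observation per row.
    S is the sample covariance and mu = trace(S) / p.
    """
    s = np.cov(samples, rowvar=False)
    p = s.shape[0]
    mu = np.trace(s) / p
    return (1.0 - alpha) * s + alpha * mu * np.eye(p)
```

This estimator rescales the eigenvalues of $S$ but keeps its eigenvectors, which is the limitation that mechanism (ii) in the abstract targets with non-linear flow maps.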

2606.07387 2026-06-08 cs.LG 新提交

Making the Most of Limited Data: Score-Aware Training for Text-to-Music Generation

充分利用有限数据:面向文本到音乐生成的分数感知训练

Yun-Chen Cheng, Tzu-Hung Huang, Chih-Pin Tan

发表机构 * National Taiwan University(国立台湾大学)

AI总结 提出分数感知训练方法,利用CLAP条件Beta噪声时间表将低分音频段用于高噪声训练,结合段级过滤和两阶段字幕策略,在有限数据下实现高效文本到音乐生成,并在ICME 2026 ATTM挑战赛中获得客观评估第二名。

详情
AI中文摘要

最先进的文本到音乐生成系统依赖于大规模专有数据集和工业级计算资源,使得无法区分架构贡献与资源优势。我们提出分数感知训练,将音频-字幕对齐分数作为整个流程的直接监督信号。我们不丢弃低分片段,而是通过CLAP条件Beta噪声时间表将其重新用于高噪声训练阶段,作为有效的隐式正则化器。作为补充,段级过滤移除最不匹配的样本,两阶段字幕程序弥合了冗长训练字幕与简洁推理提示之间的分布差距。REPA辅助损失进一步从预训练的CLAP和MuQ编码器中迁移结构化语义知识,无需额外数据。我们基于FluxAudio的450M参数系统提交至ICME 2026 ATTM Grand Challenge效率赛道,在客观评估中两个赛道均排名第二,在最终MOS评估中效率赛道排名第三。

英文摘要

State-of-the-art text-to-music generation systems rely on massive proprietary datasets and industrial-scale compute, making it impossible to disentangle architectural contributions from resource advantages. We propose score-aware training, which treats audio-caption alignment score as a direct supervision signal throughout the pipeline. Rather than discarding low-scoring segments, we repurpose them via a CLAP-conditioned Beta noise timestep schedule that routes them to high-noise training regimes, acting as an effective implicit regularizer. Complementarily, segment-level filtering removes the most misaligned examples, and a two-stage caption procedure bridges the distribution gap between verbose training captions and concise inference prompts. A REPA auxiliary loss further transfers structured semantic knowledge from pretrained CLAP and MuQ encoders without additional data. Our 450M-parameter FluxAudio-based system, submitted to the ICME 2026 ATTM Grand Challenge Efficiency Track, ranked 2nd across both tracks in the objective evaluation and 3rd in the Efficiency Track in the final MOS evaluation.

2606.07400 2026-06-08 cs.LG 新提交

Generative Modeling of Discrete Latent Structures via Dynamic Policy Gradients

通过动态策略梯度对离散潜在结构进行生成建模

Stefan Ivanovic, Ge Liu, Mohammed El-Kebir

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出GReinSS框架,使用动态缩放奖励学习潜在状态分布以最大化观测数据似然,在模拟潜在集和图重建中优于基线,并在RNA测序数据中比RSEM更准确地重建异构体。

Comments ICML 2026

详情
AI中文摘要

许多科学问题需要从间接观测中推断未观测到的机械潜在状态。虽然经典方法(如期望最大化)无法扩展到组合爆炸的空间,但深度学习方法(如变分自编码器)通常形成人工潜在状态,而非重建机械真实状态。本文提出GReinSS,一个策略学习框架,使用动态缩放奖励来学习最大化观测数据似然的潜在状态分布。我们证明GReinSS能够准确重建模拟的潜在集和潜在图,优于替代的策略学习和生成建模基线。此外,GReinSS从真实短读RNA测序数据中重建的异构体,比标准RSEM算法更匹配通过正交长读测序检测到的异构体。总体而言,GReinSS是一种从间接观测中对组合潜在状态进行生成建模和推断的原则性且实际有效的方法。

英文摘要

Many scientific problems require inferring unobserved mechanistic latent states from indirect observations. While classical approaches, including expectation maximization, do not scale to combinatorially large spaces, deep learning approaches such as variational autoencoders typically form artificial latent states rather than reconstructing the mechanistic ground-truth states. Here, we introduce GReinSS, a policy learning framework that uses dynamically rescaled rewards to learn latent state distributions that maximize the observed data likelihood. We show that GReinSS accurately reconstructs simulated latent sets and latent graphs, outperforming alternative policy learning and generative modeling baselines. Additionally, GReinSS reconstructs isoforms from real short-read RNA sequencing data that better match isoforms detected by orthogonal long-read sequencing than the standard RSEM algorithm. Overall, GReinSS is a principled and practically effective approach for generative modeling and inference of combinatorial latent states from indirect observations.

2606.07404 2026-06-08 cs.LG 新提交

Reversible Foundations: Training a 120B Sparse MoE through State-Preserving Scaling

可逆基础:通过状态保持缩放训练120B稀疏MoE

Rohan Shravan

发表机构 * The School of AI(人工智能学院)

AI总结 本文报告在单台8-GPU节点上端到端训练千亿参数稀疏混合专家模型,通过可逆循环、状态保持增长和单节点经济学三大原则,实现从密集种子到120B模型的四阶段扩展。

Comments 58 pages, 9 figures, 37 tables. Code: https://github.com/The-School-of-AI/LLM. Released models: huggingface.co/theschoolofai/LightningLM-0.1V-{2B, 5B-MoE, 9B-MoE, 120B-MoE}. Companion work: arXiv:2605.29379 (BrahmicTokenizer-131K), arXiv:2605.29459 (Kronecker Embeddings)

详情
AI中文摘要

本文报告在单台八GPU节点上端到端训练千亿参数稀疏混合专家模型。LightningLM 0.1V是一个循环骨干语言模型家族,通过四个阶段从小型密集种子扩展,经过5B和9B混合专家,最终达到120B模型,具有460个路由专家,采用top-12路由。每个更大模型从小模型的训练权重增长而来;活跃参数从密集种子的1.78B单调增加到120B时的5.93B(约占存储的118.67B的5%)。整个谱系在单节点上运行,较大阶段在8K上下文中,达到120B规模时发布的训练损失为1.78。这是一份系统和经验报告,围绕三个原则组织。可逆性:可逆循环栈在反向传播中重建激活而非存储它们,使激活内存随模型增长保持平坦。状态保持增长:每次扩展(密集到MoE、浅到深、少专家到多专家)都作为可重现原则给出,并附有错误导致的失败案例;若干失败是无声的。单节点经济学:120B通过TQP训练,这是一种量化基础专家权重和训练低秩适配器的策略,将优化器状态承载于2.26B适配器参数而非路由专家中的100B+,将专家路径优化器状态减少约45倍。新颖之处在于已知原语的集成,而非任何孤立原语:一个在单节点上端到端运行的成长谱系,以从业者级别记录,并以每个领域的留出损失作为证据,表明目标能力(多语言印度能力、代码)是通过构造学习的。模型家族、分词器和训练代码已发布。

英文摘要

This paper reports on training a hundred-billion-parameter sparse mixture of experts on a single eight-GPU node, end to end. LightningLM 0.1V is a recurrence-backbone language model family grown in four stages from a small dense seed, through a 5B and a 9B mixture of experts, to a 120B model with 460 routed experts under top-12 routing. Each larger model is grown from the trained weights of the smaller one; active parameters rise monotonically from 1.78B at the dense seed to 5.93B at 120B (about 5% of the 118.67B stored). The full lineage runs on single nodes, the larger stages at 8K context, reaching a released training loss of 1.78 at 120B scale. This is a systems and experience report. It is organized around three disciplines. Reversibility: a reversible recurrence stack reconstructs activations in the backward pass instead of storing them, holding activation memory flat as the model grows. State-preserving growth: each expansion (dense to MoE, shallow to deep, few experts to many) is given as a reproducible principle paired with the failure that results from getting it wrong; several failures are silent. Single-node economics: the 120B trains through TQP, a strategy of quantized base expert weights and trained low-rank adapters that carries optimizer state on 2.26B adapter parameters rather than 100B+ resident in routed experts, cutting expert-path optimizer state by a factor of ~45. What is new is the integration of known primitives, not any primitive in isolation: one grown lineage running end to end on a single node, documented at practitioner level, with per-domain held-out loss as evidence that targeted capabilities (multilingual Indic competence, code) were learned by construction. Model family, tokenizer, and training code are released.
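The reversibility idea (reconstructing activations in the backward pass instead of storing them) can be illustrated with an additive coupling block in the style of reversible residual networks. This is an assumed construction for illustration only: the abstract does not describe the internals of the reversible recurrence stack, and `f`, `g`, `rev_forward`, and `rev_inverse` are hypothetical names.

```python
import numpy as np

def rev_forward(x1, x2, f, g):
    """Additive coupling: the two halves of the activation update each other in turn."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2, f, g):
    """Recover the inputs from the outputs by undoing the two updates in reverse order."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2
```

Because the inputs of each block can be recomputed from its outputs, a backward pass needs to keep only the final activations, which is what lets activation memory stay flat as depth grows.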

2606.07410 2026-06-08 cs.LG cs.AI 新提交

A Comprehensive Anatomy of Human and DeepSeek-R1 LLM Mathematical Reasoning

人类与DeepSeek-R1大语言模型数学推理的全面解剖

Yuxiang Chen, Jun Wang

发表机构 * UCL Centre for Artificial Intelligence(伦敦大学学院人工智能中心)

AI总结 通过对AIME 2025全部30道题目的10247个推理步骤进行注释,发现DeepSeek-R1存在拓扑模仿(再现推理的表面形式而不发挥其功能作用),但成功轨迹中分支与回溯的稳定使用,以及反思被置于演绎推理之中,是真正推理的两个信号。

详情
AI中文摘要

大语言模型中“顿悟时刻”的出现,特别是DeepSeek-R1-0120,引发了这些系统是真正推理还是仅仅模仿推理表象的问题。我们对AIME 2025所有30道题目进行了模型与人类推理的全面实证比较,将10247个推理步骤详尽地注释为五个功能类别:分析、推理、分支、回溯和反思。我们发现了一个明显的结构差异。人类解决方案在分析和演绎之间保持紧凑交替,而DeepSeek-R1频繁重访中间结果,进行浅层且往往不必要的验证,并在局部检查中循环,而没有有意义的逻辑进展。我们将其描述为拓扑模仿:再现推理的表面形式而不发挥其功能作用。尽管如此,我们识别出两个真正推理的信号。首先,成功轨迹表现出分支和回溯的稳定使用,而失败轨迹要么过度使用要么使用不足探索性动作。其次,反思仅在置于演绎推理中时才有效;陷入分析循环的反思专注于局部数值细节而忽略全局逻辑错误。这些发现表明,当前的长链思维模型可能更多地因推理的表象而非真正的演绎进展而获得奖励。我们讨论了改进评估和训练的方向,包括测量跨轨迹稳定性、惩罚“空转”轨迹、鼓励更深层的逻辑纠正,以及将推理时间计算重新分配给演绎和回溯。总体而言,推理质量不仅取决于反思发生的多少,还取决于反思是否一致地出现在适当的逻辑尺度上。

英文摘要

The emergence of "Aha moments" in large language models, particularly DeepSeek-R1-0120, has raised the question of whether these systems genuinely reason or merely imitate the appearance of reasoning. We conduct a comprehensive empirical comparison between model and human reasoning across all 30 problems from AIME 2025, exhaustively annotating 10,247 reasoning steps into five functional categories: Analysis, Inference, Branch, Backtrace, and Reflection. We find a clear structural difference. Human solutions maintain a compact alternation between analysis and deduction, whereas DeepSeek-R1 frequently revisits intermediate results, performs shallow and often unnecessary verification, and loops through local checks without meaningful logical progress. We describe this as topological mimicry: reproducing the surface form of reasoning without its functional role. Despite this, we identify two signals of genuine reasoning. First, successful traces exhibit stable use of branching and backtracking, while failed traces either underuse or overuse exploratory actions. Second, reflection is only effective when placed within deductive inference; reflections trapped in analysis loops focus on local numerical details while missing global logical errors. These findings suggest that current long-CoT models may be rewarded more for the appearance of reasoning than for genuine deductive progress. We discuss directions for improving evaluation and training, including measuring cross-trace stability, penalising "spinning-wheel" traces, encouraging deeper logical correction, and reallocating inference-time compute toward deduction and backtracking. Overall, reasoning quality depends not simply on how much reflection occurs, but on whether reflection appears consistently and at the appropriate logical scale.

2606.07414 2026-06-08 cs.LG cs.NE 新提交

Sparsely gated tiny linear experts

稀疏门控的微型线性专家

Simon Schug

发表机构 * Princeton University(普林斯顿大学)

AI总结 提出稀疏门控线性神经元(sgatlin)网络,通过将每个专家缩减为单个神经元并去除非线性,在等计算量下改善(降低)语言模型困惑度,同时增强可解释性。

Comments Code available at https://github.com/smonsays/sparsely-gated-linear

详情
AI中文摘要

稀疏性允许在不按比例增加计算成本的情况下扩展模型参数。虽然混合专家(MoE)模型变得越来越稀疏,但单个专家通常仍然庞大且密集。在这里,我们通过将每个专家缩减为单个神经元并选择许多可用神经元中的极小一部分,进一步增加稀疏性,从而提高计算效率和可解释性。与直觉相反,实现这两者的关键是去除通常应用于专家的非线性,从而得到一个稀疏门控线性神经元(sgatlin)网络。在等计算量比较中,我们发现用sgatlin替换所有Transformer前馈层可以在不同计算预算下改善语言模型的困惑度。同时,由此产生的前馈电路的稀疏性和线性为模型可解释性提供了新的机会。在一个小规模案例研究中,我们证明sgatlin中的前馈电路可以在无需训练额外替代模型的情况下进行解释。我们发现它们形成了语义结构化的聚类,并且在因果上参与了事实回忆。我们的发现为计算高效且可解释的Transformer前馈层指明了一条可能的路径。

英文摘要

Sparsity allows scaling model parameters without proportionally increasing computational cost. While mixture of experts (MoE) models are made increasingly sparse, individual experts typically remain large and dense. Here, we demonstrate that further increasing sparsity by shrinking each expert to consist of a single neuron and selecting a tiny fraction of many available neurons can improve compute efficiency and interpretability. Counterintuitively, the key to achieving both is removing the nonlinearity typically applied to the experts, resulting in a network of sparsely gated linear neurons (sgatlin). In an isoflop comparison, we find that replacing all transformer feedforward layers with sgatlin improves perplexity in language models across different compute budgets. At the same time, the sparsity and linearity of the resulting feedforward circuits present new opportunities for model interpretability. In a small-scale case study, we demonstrate that feedforward circuits in sgatlin can be interpreted without having to train additional replacement models. We find that they form semantically structured clusters and are causally implicated in factual recall. Our findings paint a possible path towards compute-efficient and interpretable transformer feedforward layers.
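
A minimal sketch of the idea described above, under stated assumptions: each expert is a single neuron with one input vector and one output vector, a router picks the top-k neurons, and no nonlinearity is applied to the selected neurons. The softmax over the selected router scores is an assumption borrowed from common MoE practice, not something the abstract specifies, and all names (`router`, `w_in`, `w_out`) are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgatlin_layer(x, router, w_in, w_out, k):
    # Router score for every neuron, then keep only the k highest.
    scores = [dot(r, x) for r in router]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores (assumed gating form).
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exps.values())
    out = [0.0] * len(w_out[0])
    for i in top:
        gate = exps[i] / z
        pre = dot(w_in[i], x)          # linear neuron: no activation function
        for d in range(len(out)):
            out[d] += gate * pre * w_out[i][d]
    return out, top

x = [1.0, 2.0]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, top = sgatlin_layer(x, router, w_in, w_out, k=2)
```

For a fixed selection and fixed gates, each active neuron contributes a rank-one linear map, which is the property the abstract ties to interpretability.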

2606.07416 2026-06-08 cs.LG 新提交

Video-Based Prediction of In-Flight Particle Characteristics in Atmospheric Plasma Spraying

基于视频的大气等离子喷涂中飞行粒子特性预测

Abhijeet Praveen, Sareh Soleimani, Cormac Cureton, Aman Sidhu, Kintak Raymond Yu, Cristian Cojocaru, Narges Armanfard

发表机构 * Department of Electrical and Computer Engineering, McGill University(麦吉尔大学电气与计算机工程系) Mila – Quebec AI Institute(魁北克人工智能研究所) National Research Council Canada(加拿大国家研究委员会)

AI总结 提出利用高速视频观测等离子体羽流,通过TabPFN、CNN等模型预测飞行粒子温度和速度,最高R²达0.90和0.82,实现非侵入式诊断。

Comments Accepted at ECML PKDD 2026 (Applied Data Science Track)

详情
AI中文摘要

大气等离子喷涂(APS)是一种广泛使用的涂层工艺,其中飞行粒子的温度和速度强烈影响涂层质量。然而,这些粒子特性在操作过程中难以连续监测,这促使了非侵入式数据驱动诊断方法的发展。在这项工作中,我们研究了高速视频观测等离子体羽流在估计APS中飞行粒子特性方面的预测潜力。我们引入了三种不同的视频衍生特征表示,并使用Tabular Prior-Data Fitted Networks(TabPFN)、卷积神经网络(CNN)以及包括随机森林、梯度提升、支持向量回归和XGBoost在内的经典回归基线进行评估。实验采用分组留一交叉验证,对来自63次APS喷涂运行的126个标记的喷涂前后视频记录进行。在工程化特征实验中,TabPFN在温度预测方面表现最一致,使用组合特征表示达到R²=0.86。CNN模型在速度预测方面表现更强,达到R²=0.81。此外,我们评估了使用预训练CNN直接对原始视频帧进行操作的模型,发现预训练CNN加回归头实现了最高性能,温度和速度的R²分别为0.90和0.82。结果表明,视频衍生的羽流信息为APS非侵入式诊断和实时过程监测提供了有前景且可扩展的基础。

英文摘要

Atmospheric plasma spraying (APS) is a widely used coating process in which in-flight particle temperature and velocity strongly influence coating quality. However, these particle characteristics are difficult to monitor continuously during operation, motivating the development of non-invasive data-driven diagnostic methods. In this work, we investigate the predictive potential of high-speed video observations of the plasma plume for estimating in-flight particle characteristics in APS. We introduce three different video-derived feature representations and evaluate them using Tabular Prior-Data Fitted Networks (TabPFN), convolutional neural networks (CNN), and classical regression baselines including Random Forest, Gradient Boosting, Support Vector Regression, and XGBoost. Experiments are conducted using grouped leave-one-out cross-validation on 126 labeled pre- and post-spray video recordings from 63 APS spray runs. Across the engineered feature experiments, TabPFN achieves the most consistent performance for temperature prediction, reaching R² = 0.86 using the combined feature representation. CNN models perform more strongly for velocity prediction, achieving an R² of 0.81. In addition, we evaluate models operating directly on raw video frames using pretrained CNNs and find that the highest performance is achieved by a pretrained CNN with a regression head, with R² of 0.90 and 0.82 for temperature and velocity, respectively. The results demonstrate that video-derived plume information provides a promising and scalable foundation for non-invasive APS diagnostics and real-time process monitoring.
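
The evaluation protocol can be sketched as follows. The abstract does not state the grouping key, so this assumes that the pre- and post-spray recordings of one spray run form one group, which keeps both recordings of a run out of training whenever that run is tested. The helper names are hypothetical, and the R² function is the standard coefficient of determination.

```python
def leave_one_group_out(groups):
    # Yield (train_indices, test_indices), holding out one whole group at a time.
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield train, test

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# 126 recordings from 63 runs: two recordings per run (assumed grouping).
groups = [run for run in range(63) for _ in range(2)]
splits = list(leave_one_group_out(groups))
```

Grouping matters here because a plain leave-one-out split would let the model train on one recording of a run and test on the other.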

2606.07426 2026-06-08 cs.LG 新提交

Discovering Multiscale Deep Formulas in Complex Systems via Neural-Guided Lambda Calculus

通过神经引导的Lambda演算发现复杂系统中的多尺度深层公式

Hanqiao Yu, Shusen Yang, Xuebin Ren, Cong Zhao

发表机构 * National Engineering Laboratory for Big Data Analytics(大数据分析国家工程实验室) Xi’an Jiaotong University(西安交通大学) School of Mathematics and Statistics(数学与统计学学院) School of Computer Science and Technology, Faculty of Electronic and Information Engineering(计算机科学与技术学院,电子与信息工程学院)

AI总结 提出Deflex方法,结合可分解深度能量模型和Lambda演算符号回归,自动从复杂系统中提取多尺度公式,效率最高提升7倍。

Comments 35 pages, 5 figures; Supplementary Information available as an ancillary file (79 pages)

详情
AI中文摘要

科学中的一个基本问题是以简洁数学公式的形式识别复杂系统的潜在模式。当前基于人工智能的方法在单尺度系统中表现出色,但在识别多尺度复杂系统中的尺度特定公式方面仍有限。我们提出Deflex,一种端到端的人工智能方法,可从复杂系统中自动提取可能具有不同形式的多尺度公式,包括不变量和分布。Deflex由两个子系统组成,分别称为Deflexformer和Deflexpressor。Deflexpressor是一个用于高阶公式的lambda演算符号回归模型。Deflexformer是一个可分解的深度能量模型,用于学习跨尺度的统一表示。Deflexpressor生成合成数据以预训练Deflexformer,然后通过解耦多尺度潜在关系来指导公式发现。在六个具有不同行为的代表性复杂系统中,Deflex实现了比最先进方法高达7倍的效率提升,同时实现了自动多尺度发现。我们的工作可能成为跨学科科学发现的有用工具。

英文摘要

A fundamental problem in science is identifying underlying patterns of complex systems in the form of concise mathematical formulas. Current Artificial Intelligence (AI)-based methods have shown strong performance in single-scale systems, yet remain limited in identifying scale-specific formulas in multiscale complex systems. We present Deflex, an end-to-end AI method to automatically extract multiscale formulas with potentially different forms, including invariants and distributions, from complex systems. Deflex consists of two subsystems named Deflexformer and Deflexpressor. Deflexpressor is a lambda-calculus symbolic regression model for higher-order formulas. Deflexformer is a decomposable deep energy model for learning unified representations across scales. Deflexpressor generates synthetic data to pre-train Deflexformer, which then guides formula discovery by decoupling multiscale latent relationships. Across six representative complex systems with diverse behaviors, Deflex achieves up to 7-fold higher efficiency than the state-of-the-art methods while enabling automated multiscale discovery. Our work could be a useful tool for scientific discovery across disciplines.

2606.07457 2026-06-08 cs.LG eess.SP stat.ML 新提交

Time series Foundation Models based on Physics-Informed Synthetic Histories for Cold-Start Photovoltaic Forecasting

基于物理信息合成历史的时间序列基础模型用于冷启动光伏预测

Lorenzo Longarini, Alessandro Rongoni, Simone Silenzi, Emanuele Frontoni, Riccardo Rosati

发表机构 * European Commission(欧盟委员会)

AI总结 针对光伏电站冷启动预测问题,提出利用物理信息合成历史数据,结合时间序列基础模型进行零样本预测,在440个站点上实现1.7-2倍性能提升。

Comments To be published in the 2nd ICML Workshop on Foundation Models for Structured Data

详情
AI中文摘要

在并网调试时,光伏运营商必须在目标站点观测数据可用之前预测发电量,这限制了标准监督预测器的直接使用。针对这种冷启动场景,我们提出了一种零样本流程,通过电站元数据和气象协变量生成合成发电历史,使时间序列基础模型(TSFMs)能够通过推理时条件化进行预测。我们在严格的冷启动基线、真实反馈和自预测反馈三种策略下,将五种TSFM与经典基线进行了基准测试。评估涵盖了四个数据集中的440个光伏站点以及多种气候区域。协变量感知的基础模型比基线性能提升约1.7至2倍:TabPFN-TS在真实反馈下实现了最低误差(MAE 0.514,RMSE 0.721 kWh kWp⁻¹ d⁻¹),而Chronos-2在自预测反馈下最为鲁棒。性能对合成历史来源基本不敏感,表明准确性更多取决于合理的时序上下文可用性,而非特定生成器。

英文摘要

At commissioning time, Photovoltaic (PV) operators must forecast production before target-site observations are available, limiting the direct use of standard supervised forecasters. This cold-start setting is addressed with a zero-shot pipeline that generates a synthetic production history from plant metadata and meteorological covariates, enabling time-series foundation models (TSFMs) to forecast through inference-time conditioning. Five TSFMs are benchmarked against classical baselines under strict Cold-Start Baseline, Real Feedback, and Self-Forecast Feedback strategies. The evaluation spans $440$ PV sites across four datasets and diverse climate regimes. Covariate-aware foundation models outperform baselines by a factor of approximately $1.7$ to $2$: TabPFN-TS achieves the lowest error under Real Feedback (MAE $0.514$, RMSE $0.721$ $\mathrm{kWh}\,\mathrm{kWp}^{-1}\,\mathrm{d}^{-1}$), while Chronos-2 is most robust under Self-Forecast Feedback. Performance is largely insensitive to the synthetic-history source, indicating that accuracy is driven more by the availability of plausible temporal context than by the specific generator.

2606.07474 2026-06-08 cs.LG 新提交

Unsupervised Continual Clustering via Forward-Backward Knowledge Distillation

无监督持续聚类:通过前向-后向知识蒸馏

Mohammadreza Sadeghi, Sareh Soleimani, Zihan Wang, Narges Armanfard

发表机构 * Department of Electrical and Computer Engineering, McGill University(麦吉尔大学电气与计算机工程系) Mila – Quebec AI Institute(魁北克人工智能研究所)

AI总结 提出无监督持续聚类(UCC)问题,并设计前向-后向知识蒸馏方法(FBCC),通过持续教师网络和轻量任务学生网络的双阶段蒸馏,在不存储旧数据的情况下保留聚类结构,显著减少灾难性遗忘。

Comments Accepted at ECML PKDD 2026 (Research Track). arXiv admin note: substantial text overlap with arXiv:2405.19234

详情
AI中文摘要

无监督持续学习(UCL)旨在使神经网络能够在没有标签、也无法访问过去数据的情况下学习连续任务。该设置中的一个主要挑战是灾难性遗忘,即模型在学习新任务时会忘记先前学过的任务。由于缺乏指导学习和记忆保留的标签,这一挑战在UCL中被放大。现有的缓解策略,如知识蒸馏和重放缓冲区,常常引发内存和隐私问题。此外,当前的UCL方法大多忽略了聚类特定的目标。为了填补这一空白,我们引入了无监督持续聚类(UCC),并提出了用于持续聚类的前向-后向知识蒸馏(FBCC)。FBCC采用一个带有聚类投影器的持续教师网络和轻量级任务特定学生网络。通过双阶段的前向-后向蒸馏过程,教师在学习新聚类的同时保留先前发现的聚类结构,而无需存储过去的数据。FBCC代表了UCC的开创性方法,展示了在连续任务中改进的聚类性能。在四个基准数据集上的实验表明,FBCC在聚类准确性上始终优于现有的持续学习基线,同时显著减少了灾难性遗忘。

英文摘要

Unsupervised Continual Learning (UCL) aims to enable neural networks to learn sequential tasks without labels or access to past data. A major challenge in this setting is Catastrophic Forgetting, where models forget previously learned tasks upon learning new ones. This challenge is amplified in UCL due to the absence of labels to guide learning and memory retention. Existing mitigation strategies, such as knowledge distillation and replay buffers, often raise memory and privacy concerns. Moreover, current UCL methods largely overlook clustering-specific objectives. To fill this gap, we introduce Unsupervised Continual Clustering (UCC) and propose Forward-Backward Knowledge Distillation for Continual Clustering (FBCC). FBCC employs a continual teacher network with a clustering projector and lightweight task-specific students. Through a dual-phase forward-backward distillation process, the teacher learns new clusters while preserving previously discovered cluster structure without storing past data. FBCC represents a pioneering approach to UCC, demonstrating improved clustering performance across sequential tasks. Experiments on four benchmark datasets demonstrate that FBCC consistently outperforms existing continual learning baselines in clustering accuracy while significantly reducing catastrophic forgetting.

2606.07475 2026-06-08 cs.LG cs.AI 新提交

Graph Neural Network leveraging Higher-order Class Label Connectivity for Heterophilous Graphs

利用高阶类标签连通性的图神经网络用于异配图

Takuto Takahashi, Itsuki Nakayama, Takahiro Mitani, Ryosuke Kikuchi, Yuya Sasaki, Makoto Onizuka

发表机构 * The University of Osaka(大阪大学)

AI总结 针对异配图中节点分类性能受限问题,提出标签上下文分类器(LCC),通过四种游走生成标签上下文嵌入捕获高阶类标签连通性,并可与任意GNN自适应集成,实验表明优于现有方法。

详情
AI中文摘要

图神经网络(GNN)中的节点分类已广泛应用于图分析的各个领域。在同配图中,具有相同类标签的节点倾向于连接,GNN能实现高精度节点分类。然而,在异配图中,不同类标签的节点更可能连接,其性能仍然有限。特别是,当前基于图卷积网络的GNN无法捕获高阶类标签连通性,而这在真实世界的异配图中经常出现。为了解决这个问题,我们提出了一种新颖的分类器——标签上下文分类器(LCC),旨在捕获有向图中的高阶类标签连通性。LCC通过利用四种不同类型的游走生成的标签上下文嵌入来估计目标节点的类标签。此外,我们的方法允许通过自适应学习LCC和任意GNN的重要性来集成它们。实验结果表明,与LCC集成的GNN优于最先进的方法,并且标签上下文嵌入提高了异配有向图中的节点分类性能。

英文摘要

Node classification in graph neural networks (GNNs) has been widely applied in various fields of graph analysis. GNNs achieve high-accuracy node classification in homophilous graphs, where nodes with the same class label tend to be connected. However, their performance remains limited in heterophilous graphs, where nodes with different class labels are more likely to be connected. In particular, current GNNs derived from graph convolutional networks cannot capture higher-order class label connectivity, which is frequently observed in real-world heterophilous graphs. To address this issue, we propose a novel classifier, Label Context Classifier (LCC), designed to capture higher-order class label connectivity in directed graphs. LCC estimates the class label of a target node by leveraging label context embeddings that are generated through four distinct types of walks. In addition, our approach allows the integration of LCC and any GNN by adaptively learning their importance. Experimental results demonstrate that GNNs integrated with LCC outperform SOTA methods and the label context embeddings improve the node classification performance in heterophilous directed graphs.

2606.07481 2026-06-08 cs.LG 新提交

Drifting Models for Surrogate Flow Modeling

用于代理流建模的漂移模型

Chris R. Jung, Markus Dörr, Natalie Jüngling, Jennifer Niessner, Adam T. Müller, Nicolaj C. Stache

发表机构 * Center for Machine Learning (ZML)(机器学习中心(ZML)) Institute for Flow in Additively Manufactured Porous Structures (ISAPS)(增材制造多孔结构流动研究所(ISAPS)) Heilbronn University of Applied Sciences(海尔布隆应用科学大学)

AI总结 提出条件漂移框架,在VAE潜空间中进行漂移并利用标签感知掩码对齐边界条件,实现高质量单步生成,速度比迭代扩散快两个数量级。

Comments Accepted to the 2nd International Symposium AI and Fluid Mechanics 2026

详情
AI中文摘要

虽然计算流体动力学(CFD)可以为优化室内环境提供高保真流场,但其计算成本限制了快速探索。为了解决这个问题,生成式代理比确定性网络提供了更好的分布建模,但迭代采样速度慢。为了实现高质量的单步生成,我们将新颖的生成式漂移框架应用于流体力学。我们引入了一个条件架构,该架构在学习的VAE潜空间中进行漂移,并使用标签感知掩码将生成的样本与其边界条件对齐。我们的标签条件模型在精度和流一致性上匹配迭代扩散,同时运行速度快两个数量级。此外,我们提出了一种空间条件变体,为泛化到未见几何体开辟了有希望的路径。最终,条件漂移作为基于扩散方法的高效替代方案,为推理速度至关重要的实时CFD代理提供了可能。

英文摘要

While Computational Fluid Dynamics (CFD) provides high-fidelity flow fields for optimizing indoor environments, its computational cost limits rapid exploration. Generative surrogates address this problem and offer better distribution modeling than deterministic networks, but their iterative sampling is slow. To enable high-quality, single-pass generation, we adapt the novel generative drifting framework to fluid mechanics. We introduce a conditional architecture that performs drifting in a learned VAE latent space and uses label-aware masking to align generated samples with their boundary conditions. Our label-conditioned model matches iterative diffusion in accuracy and flow consistency while running two orders of magnitude faster. Additionally, we propose a spatial-conditioning variant that establishes a promising path towards generalization to unseen geometries. Ultimately, conditional drifting serves as a highly efficient alternative to diffusion-based approaches, unlocking real-time CFD surrogates where inference speed is critical.

2606.07483 2026-06-08 cs.LG stat.ML 新提交

Network Recovery from Cascade Data: A Debiased Jacobian-Based Machine Learning Approach

从级联数据中恢复网络:一种基于去偏雅可比矩阵的机器学习方法

Lei Huang

发表机构 * MIT Sloan School of Management(麻省理工学院斯隆管理学院)

AI总结 提出CascadeNet框架,通过去偏雅可比矩阵估计一步转移函数,无需指定扩散模型即可恢复隐藏影响网络,在模拟和COVID-19传播数据中优于现有方法。

详情
AI中文摘要

许多重要结果以动态级联的形式展开,包括产品采用、疾病传播、金融困境和信息扩散。一个核心挑战是恢复这些级联背后的隐藏影响网络。现有方法通常假设特定的扩散模型,当该假设错误时,其性能会大幅下降。我们提出了CascadeNet,一种基于雅可比矩阵的机器学习框架,用于网络恢复,无需指定扩散机制。关键思想是,潜在的影响结构可以通过一步转移函数的雅可比矩阵来刻画。CascadeNet首先构建转移函数的灵活估计量,然后通过Riesz表示应用Neyman正交去偏,使得去偏后的雅可比矩阵是$\sqrt{n}$一致且渐近正态的,从而能够对网络结构进行正式推断。我们在模拟实验和真实世界实证应用中验证了CascadeNet。在模拟中,数据生成过程已知,CascadeNet在九种常见数据生成过程中实现了最高的网络恢复准确率。在西班牙52个省份的COVID-19传播实证应用中,CascadeNet恢复的传播网络与真实的省际移动网络显著相关,而基线方法恢复的网络与真实情况无显著一致性。

英文摘要

Many important outcomes unfold as dynamic cascades, including product adoption, disease spread, financial distress, and information diffusion. A central challenge is to recover the hidden influence network behind these cascades. Existing methods typically assume a specific diffusion model, and their performance degrades substantially when that assumption is misspecified. We propose CascadeNet, a Jacobian-based machine learning framework for network recovery that does not require specifying a diffusion mechanism. The key idea is that the underlying influence structure can be characterized by the Jacobian of the one-step transition function. CascadeNet first constructs a flexible estimator of the transition function, and further applies Neyman-orthogonal debiasing via the Riesz representer, so that the debiased Jacobian is $\sqrt{n}$-consistent and asymptotically normal, enabling formal inference on the network structure. We validate CascadeNet in both a simulation exercise and a real-world empirical application. In simulations, where the data-generating process is known, CascadeNet achieves the highest network recovery accuracy across nine common data-generating processes. In an empirical application to COVID-19 transmission across Spain's 52 provinces, CascadeNet recovers transmission networks that are significantly correlated with the true inter-province mobility network, whereas networks recovered by baseline methods show no significant alignment with the ground truth.
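
The key idea above, that the influence structure is characterized by the Jacobian of the one-step transition function, can be sketched on a toy example. Everything here is an assumption for illustration: the transition function is known rather than estimated, the weights in `A` are made up, and the Neyman-orthogonal debiasing step of CascadeNet is not implemented. The block only shows that the nonzero pattern of a finite-difference Jacobian recovers which node influences which.

```python
import math

# Hypothetical influence weights: A[i][j] != 0 means node j influences node i.
A = [[0.0, 0.8, 0.0],
     [0.0, 0.0, 0.5],
     [0.6, 0.0, 0.0]]

def transition(x):
    # Toy one-step transition: next state of node i depends on its influencers.
    n = len(x)
    return [math.tanh(sum(A[i][j] * x[j] for j in range(n))) for i in range(n)]

def jacobian(fn, x, eps=1e-6):
    # Central finite differences: J[i][j] approximates d fn(x)[i] / d x[j].
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = fn(xp), fn(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

J = jacobian(transition, [0.2, -0.1, 0.3])
# Directed edges (source, target) read off the nonzero Jacobian entries.
edges = {(j, i) for i in range(3) for j in range(3) if abs(J[i][j]) > 1e-4}
```

In the setting of the abstract the transition function is learned from cascade data, so the Jacobian entries carry estimation error; that is where the debiasing and the asymptotic normality result become relevant.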

2606.07488 2026-06-08 cs.LG 新提交

CoMetaPNS: Continually Meta-learning Personalized Neural Surrogates for Cardiac Electrophysiology Simulations

CoMetaPNS:心脏电生理模拟的持续元学习个性化神经代理

Ryan Missel, Xiajun Jiang, Linwei Wang

发表机构 * Golisano College of Computing and Information Sciences, Rochester Institute of Technology(罗切斯特理工学院戈利萨诺计算与信息科学学院) Department of Computer Science, Rowan University(罗文大学计算机科学系) The University of Utah(犹他大学)

AI总结 提出持续元学习框架CoMetaPNS,通过贝叶斯高斯混合模型记忆缓冲区分辨数据来源,实现个性化神经代理的持续学习,避免灾难性遗忘,在心脏模拟预测中优于基线。

详情
AI中文摘要

个性化虚拟心脏模拟面临模型个性化和计算成本的挑战。虽然神经代理提供了最先进的解决方案,但它们通常只解决高效个性化或训练可泛化模型中的一个方面。近期工作将这一问题重新表述为学习代理的个性化过程:利用有限的受试者特定上下文数据,通过基于集合条件代理和元学习摊销推理的小样本生成建模来实现。然而,这些方法假设训练分布是静态且多样化的,并且任务标识符已知。当新数据可用时,它们需要与所有先前数据一起进行昂贵的重新训练,以避免灾难性遗忘,即模型在训练新任务时忘记旧任务的现象。这在临床环境中是一个主要限制,因为数据往往未经标注且顺序到达,完全重新训练并不可行。本文提出了一种新的持续元学习框架,以实现个性化的神经代理,该代理不仅能够持续整合信息,还能识别传入数据来自已知还是未知的动力学来源。通过在记忆缓冲区上使用持续贝叶斯高斯混合模型,我们的框架可以推断数据随时间变化的标识符和关系,这是有效元学习所必需的。在合成心脏数据上的实验结果表明,与现有基线相比,我们的方法在模拟预测、计算可扩展性和对灾难性遗忘的鲁棒性方面表现更优。

英文摘要

Personalized virtual heart simulations face challenges in model personalization and computational cost. While neural surrogates offer state-of-the-art solutions, they typically address either efficient personalization or training generalizable models. Recent work reframes this by learning the process of personalizing a surrogate using limited subject-specific context data, through few-shot generative modeling with set-conditioned surrogates and meta-learned amortized inference. These methods, however, assume a static and diverse training distribution with known task identifiers. When new data becomes available, they require costly retraining with all prior data to avoid catastrophic forgetting, a phenomenon where the model forgets earlier tasks when trained on new ones. This is a major limitation in clinical settings, where data often arrives sequentially and unlabeled and full retraining is infeasible. This paper presents a new continual meta-learning framework to achieve personalized neural surrogates that can not only continually integrate information but also identify whether incoming data stems from a known or unknown dynamics source. By leveraging a continual Bayesian Gaussian Mixture Model over a memory buffer, our framework can infer the identifiers and relationships of data over time, which effective meta-learning requires. Empirical results on synthetic cardiac data demonstrate superior simulation forecasting, computational scalability, and resilience to catastrophic forgetting compared to existing baselines.

2606.07495 2026-06-08 cs.LG 新提交

Second-Order Path Kernel Interpolation Formulas in Machine Learning

机器学习中的二阶路径核插值公式

Jin Guo, Roy Y. He, Jean-Michel Morel

发表机构 * City University of Hong Kong(香港城市大学)

AI总结 本文提出神经网络的二阶路径核插值公式,引入曲率加权项和随机梯度下降的噪声耦合项,并扩展到带动量的情况,完善了路径核对预测的解释。

详情
AI中文摘要

理解训练数据如何塑造神经网络预测是现代学习理论的核心问题。2020年,Pedro Domingos提出了一个适用于通过确定性梯度下降学习的每个模型的插值公式。它将模型的预测表示为沿优化路径的积分,被积函数是一个数据相关的核,该核度量模型在测试数据和训练数据上梯度的对齐程度。这种一阶刻画对于使用基于批处理的随机优化训练的模型仍然有效。在本文中,我们发展了这些插值公式的二阶形式。我们表明,主导的路径核插值项之外还需补充一个曲率加权的插值项。对于随机梯度下降,会出现一个额外的采样诱导分量,将预测的曲率与小批量梯度噪声的协方差耦合起来。我们还将该表示扩展到带动量的随机梯度下降,其中插值结构得以保留,但权重由一个与记忆相关的因子修正。此外,我们建立了终端预测的集中性估计,确定了围绕期望二阶表示的波动尺度。这些结果共同细化了神经网络预测的路径核解释。

英文摘要

Understanding how training data shape neural network predictions is a central problem in modern learning theory. In 2020, Pedro Domingos proposed an interpolation formula valid for every model learned by deterministic gradient descent. It expresses the model's prediction as an integral, along the optimization path, of a data-dependent kernel that aligns the model's gradients at the test and training data. Such a first-order characterization remains valid for models trained with batch-based stochastic optimization. In this paper, we develop second-order forms of these interpolation formulas. We show that the leading path-kernel interpolation is supplemented by a curvature-weighted interpolation term. For stochastic gradient descent, an additional sampling-induced component appears, coupling the curvature of the prediction with the covariance of mini-batch gradient noise. We also extend the representation to stochastic gradient descent with momentum, where the interpolation structure is preserved but with the weights modified by a memory-related factor. Moreover, we establish a concentration estimate for the terminal prediction, identifying the fluctuation scale around the expected second-order representation. Together, these results provide a refinement of the path-kernel interpretation of neural network prediction.
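
For orientation, the first-order formula that this abstract builds on can be written out in the continuous-time (gradient flow) idealization. The notation below is an assumption chosen for illustration ($f_w$ for the model, $\ell$ for the per-example loss, $\ell'$ for its derivative in the prediction); the second-order, noise, and momentum terms of the paper are not reproduced here.

```latex
% Gradient flow on L(w) = \sum_i \ell(y_i, f_w(x_i)):
\frac{dw}{dt} = -\sum_{i} \ell'\big(y_i, f_{w(t)}(x_i)\big)\, \nabla_w f_{w(t)}(x_i)

% Chain rule for the prediction at a test point x:
\frac{d}{dt} f_{w(t)}(x)
  = \nabla_w f_{w(t)}(x) \cdot \frac{dw}{dt}
  = -\sum_{i} \ell'\big(y_i, f_{w(t)}(x_i)\big)\, K_{w(t)}(x, x_i),
\qquad
K_w(x, x') = \nabla_w f_w(x) \cdot \nabla_w f_w(x')

% Integrating along the optimization path from t = 0 to t = T:
f_{w(T)}(x) = f_{w(0)}(x)
  - \int_0^T \sum_{i} \ell'\big(y_i, f_{w(t)}(x_i)\big)\, K_{w(t)}(x, x_i)\, dt
```

The kernel $K_w$ is the inner product of the model's gradients at the test and training points, which is the alignment the abstract refers to.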

2606.07496 2026-06-08 cs.LG math.OC 新提交

Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization

加速去中心化随机梯度下降用于强凸优化

Ming Sun, Kun Yuan

发表机构 * Center for Machine Learning Research, Peking University(北京大学机器学习研究中心)

AI总结 提出MG-ADSGD算法,结合Nesterov型原始-对偶外推与多轮快速gossip平均,通过将gossip深度与小批量大小耦合,同时实现加速收敛和通信高效,达到目前已知最优的通信复杂度。

详情
AI中文摘要

去中心化随机优化是网络上大规模学习的基本范式,其中智能体仅与邻居通信,无需中央协调器。对于强凸问题,通信效率主要由条件数 \(\kappa=L/\mu\) 和网络谱间隙 \(1-\beta\) 决定。尽管确定性去中心化方法可以同时实现加速的 \(\sqrt{\kappa}\) 和 \(1/\sqrt{1-\beta}\) 依赖,但现有随机方法未能同时获得这两种改进。本文提出 \emph{Multi-Gossip Accelerated DSGD} (MG-ADSGD),一种结合Nesterov型原始-对偶外推与多轮快速gossip平均的去中心化随机算法。关键思想是将gossip深度与小批量大小耦合,使得额外的通信轮次同时改善共识精度并减少梯度方差。我们证明MG-ADSGD达到通信复杂度 \[ \widetilde{\mathcal O}\!\left( \frac{\sigma^2}{\mu n\epsilon}\log\frac{1}{\epsilon} + \sqrt{\frac{\kappa}{1-\beta}}\log\frac{1}{\epsilon} \right), \] 其中 \(\epsilon\) 表示目标精度,\(n\) 是节点数,\(\sigma^2\) 是梯度方差。据我们所知,该界给出了去中心化随机强凸优化目前最佳的通信复杂度,至多相差与 \(\epsilon\) 无关的对数因子。

英文摘要

Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, where agents communicate only with their neighbors and no central coordinator is required. For strongly convex problems, communication efficiency is mainly determined by the condition number \(\kappa=L/\mu\) and the network spectral gap \(1-\beta\). Although deterministic decentralized methods can simultaneously achieve accelerated \(\sqrt{\kappa}\) and \(1/\sqrt{1-\beta}\) dependences, no existing stochastic method attains both improvements at once. In this paper, we propose \emph{Multi-Gossip Accelerated DSGD} (MG-ADSGD), a decentralized stochastic algorithm that combines Nesterov-type primal--dual extrapolation with multi-round fast gossip averaging. The key idea is to couple the gossip depth with the mini-batch size so that additional communication rounds simultaneously improve consensus accuracy and reduce gradient variance. We show that MG-ADSGD achieves the communication complexity \[ \widetilde{\mathcal O}\!\left( \frac{\sigma^2}{\mu n\epsilon}\log\frac{1}{\epsilon} + \sqrt{\frac{\kappa}{1-\beta}}\log\frac{1}{\epsilon} \right), \] where \(\epsilon\) denotes the target accuracy, \(n\) is the number of nodes, and \(\sigma^2\) is the gradient variance. To the best of our knowledge, this bound yields the best currently available communication complexity for decentralized stochastic strongly convex optimization, up to logarithmic factors that are independent of \(\epsilon\).
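
To make the role of gossip depth concrete, here is a minimal sketch of plain multi-round gossip averaging on a ring of six nodes. It is not MG-ADSGD and not the accelerated ("fast") gossip the abstract mentions; the topology, mixing weights, and starting values are assumptions. Each round, a node replaces its value with a weighted average of itself and its two neighbors, so the network mean is preserved while disagreement shrinks.

```python
def gossip_round(values, self_weight=0.5):
    # Symmetric, doubly stochastic mixing on a ring: each node averages
    # with its left and right neighbor only.
    n = len(values)
    side = (1.0 - self_weight) / 2.0
    return [self_weight * values[i]
            + side * values[(i - 1) % n]
            + side * values[(i + 1) % n]
            for i in range(n)]

def consensus_error(values):
    # Largest deviation of any node from the network average.
    mean = sum(values) / len(values)
    return max(abs(v - mean) for v in values)

values = [0.0, 4.0, 8.0, 2.0, 6.0, 10.0]   # hypothetical local quantities
errors = [consensus_error(values)]
for _ in range(10):                          # gossip depth of 10 rounds
    values = gossip_round(values)
    errors.append(consensus_error(values))
```

For this ring the mixing matrix has eigenvalues 0.5 + 0.5 cos(2πk/6), so the second-largest is 0.75 and the spectral gap is 0.25; smaller gaps need more rounds for the same consensus accuracy, which is the dependence the complexity bound tracks.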

2606.07500 2026-06-08 cs.LG cs.AI 新提交

Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning

稀疏子空间到专家共享的任务无关持续学习

Fatema Siddika, Md Anwar Hossen, Tanwi Mallick, Ali Jannesari

发表机构 * Iowa State University(爱荷华州立大学) Argonne National Laboratory(阿贡国家实验室)

AI总结 提出SETA框架,通过将参数分解为任务特定专家和共享专家的稀疏子空间,结合自适应弹性锚定和路由感知正则化,解决LLM持续学习中的塑性-稳定性困境,在多个基准上优于现有方法。

Comments 19 pages. arXiv admin note: text overlap with arXiv:2601.17616

详情
AI中文摘要

大型语言模型(LLM)中的持续学习受到塑性-稳定性困境的阻碍,获取新能力往往导致先前知识的灾难性遗忘。现有方法通常统一对待参数,未能区分特定任务知识和共享能力。我们提出了用于任务无关持续学习的稀疏专家混合(SETA)框架,该框架通过自适应稀疏子空间分解,将参数划分为任务特定的专家模块,从而解决塑性-稳定性冲突。与标准更新(其中各任务竞争相同参数)不同,SETA将知识分离为独特专家(旨在隔离任务特定模式)和共享专家(负责捕获共同特征)。这种结构通过自适应弹性锚定和路由感知正则化来维护,二者在权重和路由两个层面共同保护共享知识,并使统一的门控网络能够在推理过程中自动检索正确的专家组合。在多种领域特定基准上的大量实验表明,相对于最先进的持续学习基线,SETA实现了具有竞争力或更优的整体性能,在LLaMA-2 7B和Qwen3-4B上对早期任务知识的保持尤为突出,反向迁移也有所改善。

英文摘要

Continual learning in Large Language Models (LLMs) is hindered by the plasticity-stability dilemma, where acquiring new capabilities often leads to catastrophic forgetting of previous knowledge. Existing methods typically treat parameters uniformly, failing to distinguish between specific task knowledge and shared capabilities. We introduce Mixture of Sparse Experts for Task Agnostic Continual Learning (SETA), a framework that resolves the plasticity-stability conflict through adaptive sparse subspace decomposition into task-specific expert modules. Unlike standard updates, where tasks compete for the same parameters, SETA separates knowledge into unique experts, designed to isolate task-specific patterns, and shared experts, responsible for capturing common features. This structure is maintained through adaptive elastic anchoring and a routing-aware regularization that jointly protect shared knowledge at both the weight and routing levels and enable a unified gating network to automatically retrieve the correct expert combination during inference. Extensive experiments across diverse domain-specific benchmarks demonstrate that SETA achieves competitive or superior overall performance relative to state-of-the-art continual learning baselines, with particularly strong retention of early-task knowledge and improved backward transfer on LLaMA-2 7B and Qwen3-4B.

2605.24649 2026-06-08 cs.LG cs.AI cs.LO cs.SY eess.SY 新提交

On the Stability and Realizability of Recurrent Polynomial Surrogate Ternary Logic Gate Networks

关于循环多项式代理三元逻辑门网络的稳定性与可实现性

Sai Sandeep Damera, Ryan Matheu, Aniruddh G. Puranic, John S. Baras, Calin Belta

发表机构 * University of Maryland, College Park, USA(马里兰大学学院公园分校)

AI总结 提出R-DTLGN架构,通过连续多项式代理训练并硬化为离散三元逻辑电路,结合数值单调和信息单调门,实现STL监控中的稳定递归和原则性弃权,并由STL公式确定网络规模。

Comments 9 pages, 3 figures. This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

循环神经网络(RNN)可以从部分轨迹在线学习预测信号时序逻辑(STL)判定,但在安全关键系统中部署为运行时监控器需要的不只是预测准确性。标准RNN架构无法提供结构保证,确保在传感器退化时输出能优雅降级;一个丢失的输入可能悄无声息地将判定从安全翻转为不安全。我们引入了循环可微三元逻辑门网络(R-DTLGN),这是一种在Kleene三值逻辑$\{-1, 0, +1\}$上运行的循环架构,其中$0$明确表示未知。R-DTLGN通过连续多项式代理进行训练,并在推理时硬化成离散的三元逻辑电路。我们通过从三元域上的两种序导出的两个门词汇表来分析硬化电路:数值单调门确保稳定的循环动态,而信息单调门(如果存在)保证原则性弃权(未知输入从不产生错误输出)和输入确定性上的单调性(更多信息只能改善判定)。我们证明,有界STL算子所需的循环连接仅使用AND和OR门,这两个门属于两个词汇表,从而将监控任务与架构的保证联系起来。由STL公式的时序算子导出的可实现性界限直接确定了网络隐藏状态的大小,用公式驱动的规范取代了超参数搜索。我们在D4RL PointMaze导航数据上的STL规范上进行了评估,测试了预测准确性、谓词丢失下的退化以及两个标签构建流程之间的准确性与安全性权衡。据我们所知,R-DTLGN是第一个将学习到的时序预测与基于三值逻辑的形式化退化保证相结合的循环架构。

英文摘要

Recurrent Neural Networks (RNNs) can learn to predict Signal Temporal Logic (STL) verdicts online from partial trajectories, but deploying them as runtime monitors in safety-critical systems demands more than predictive accuracy. Standard RNN architectures offer no structural guarantee that outputs degrade gracefully under sensor degradation; a dropped input can silently flip a verdict from safe to unsafe. We introduce the Recurrent Differentiable Ternary Logic Gate Network (R-DTLGN), a recurrent architecture that operates over Kleene's three-valued logic $\{-1, 0, +1\}$, where $0$ explicitly represents unknown. The R-DTLGN trains through continuous polynomial surrogates and hardens to a discrete ternary logic circuit at inference. We analyze the hardened circuit through two gate vocabularies derived from two orderings on the ternary domain: numerically monotone gates ensure stable recurrent dynamics, while information-monotone gates, when present, guarantee principled abstention (unknown inputs never produce wrong outputs) and monotonicity in input certainty (more information can only improve the verdict). We show that the recurrent connections required by bounded STL operators use exclusively AND and OR, which belong to both vocabularies, linking the monitoring task to the architecture's guarantees. A realizability bound derived from the STL formula's temporal operators directly sizes the network's hidden state, replacing hyperparameter search with a formula-driven specification. We evaluate on STL specifications over D4RL PointMaze navigation data, testing prediction accuracy, degradation under predicate dropout, and the accuracy-versus-safety tradeoff between two label construction pipelines. The R-DTLGN is, to our knowledge, the first recurrent architecture that couples learned temporal prediction with formal degradation guarantees rooted in three-valued logic.
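
The logical core of the abstract can be sketched with strong Kleene connectives over {-1, 0, +1}, where AND is the minimum and OR is the maximum. The sliding-window monitor and the `information_monotone` check below are illustrative assumptions, not the paper's construction, and the polynomial surrogates used for training are not modeled. The check enumerates all inputs and verifies that whenever a gate gives a definite verdict with some inputs unknown, resolving those unknowns never changes that verdict.

```python
from itertools import product

def k_and(a, b):
    return min(a, b)   # strong Kleene conjunction

def k_or(a, b):
    return max(a, b)   # strong Kleene disjunction

def eventually_within(signal, window):
    # Recurrent OR over the last `window` samples (simplified bounded operator).
    verdicts = []
    for t in range(len(signal)):
        v = -1
        for s in signal[max(0, t - window + 1): t + 1]:
            v = k_or(v, s)
        verdicts.append(v)
    return verdicts

def refinements(a):
    # An unknown (0) may turn out to be either definite value.
    return (-1, 1) if a == 0 else (a,)

def information_monotone(gate):
    for a, b in product((-1, 0, 1), repeat=2):
        out = gate(a, b)
        if out != 0 and any(gate(x, y) != out
                            for x in refinements(a) for y in refinements(b)):
            return False
    return True

def unknown_as_false(a, b):
    # Counterexample gate: silently treats unknown as false.
    return -1 if 0 in (a, b) else min(a, b)

verdicts = eventually_within([-1, 0, 1, -1, -1, -1], window=2)
```

The counterexample gate fails the check because an unknown input that later resolves to true would flip its verdict, which is the silent failure mode the abstract warns about.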

2606.06107 2026-06-08 quant-ph cs.IT cs.LG eess.IV math.IT physics.optics 交叉投稿

Deployed trusted-node quantum key distribution over 300 km with a multi-core fiber access link

在逾300公里已部署光纤上实现的可信节点量子密钥分发(含多芯光纤接入链路)

Martin Clason, Joakim Argillander, Didrik Bergström, Daniel Spegel-Lexne, Giulio Foletto, Ashraf El Hassan, Mohamed Bourennane, Onur Günlü, Katia Gallo, Rui Lin, Guilherme B. Xavier

发表机构 * Department of Electrical Engineering, Linköping University(电气工程系,林雪平大学) Department of Physics, KTH Royal Institute of Technology(物理学系,瑞典皇家理工学院) Department of Physics, Stockholm University(物理学系,斯德哥尔摩大学) Lehrstuhl für Nachrichtentechnik, Technische Universität Dortmund(通信工程教席,多特蒙德工业大学) Department of Electrical Engineering, Chalmers University of Technology(电气工程系,查尔姆斯理工大学)

AI总结 通过270公里部署单模光纤和33公里多芯光纤段,总长303公里,使用商用QKD系统与外部超导纳米线单光子探测器,实现了可信节点量子密钥分发,并演示了动态可重构光纤网络中的集成以及密钥速率对图像传输保真度的影响。

Comments 11 pages, 4 figures

详情
AI中文摘要

量子密钥分发(QKD)越来越多地被考虑部署在现实通信网络中,其中长距离、异构光纤基础设施以及与经典通信的共存带来了巨大挑战。在这里,我们展示了在林雪平大学和瑞典国家量子通信基础设施的斯德哥尔摩枢纽之间,通过270公里部署的单模光纤和一段33公里模拟城域接入链路的多芯光纤(MCF)段,总距离303公里的可信节点QKD。两个子链路使用商用QKD系统,其接收器与外部超导纳米线单光子探测器接口,使得能够在超出标准内部门控模式探测器支持的损耗下运行。我们在两个MCF芯之间主动切换QKD信道的同时运行链路,其他芯中有共传播的以太网流量和注入的宽带光学噪声。结果证明了将商用QKD集成到与未来混合量子-经典网络相关的苛刻、动态可重构光纤基础设施中。最后,使用生成的密钥,我们说明了有限且时变的QKD吞吐量如何影响一次性密码本保护的图像传输:图像保真度强烈依赖于可用的QKD生成的密钥预算和压缩算法的选择,突显了现实场景中基于QKD加密的应用级挑战。

英文摘要

Quantum key distribution (QKD) is increasingly considered for deployment in realistic communication networks, where long distances, heterogeneous fiber infrastructure, and coexistence with classical traffic present substantial challenges. Here, we demonstrate trusted-node QKD between Linköping University and the Stockholm hub of the Swedish national quantum communication infrastructure over 270 km of deployed single-mode fiber, extended by a 33 km multi-core fiber (MCF) segment emulating a metropolitan access link, for a total distance of 303 km. The two sub-links use commercial QKD systems whose receivers are interfaced with external superconducting nanowire single-photon detectors, enabling operation at losses beyond those supported by standard internal gated-mode detectors. We operate the link while actively switching the QKD channel between two MCF cores, with co-propagating Ethernet traffic and injected broadband optical noise in the other cores. The results demonstrate the integration of commercial QKD into demanding, dynamically reconfigurable fiber infrastructure relevant to future hybrid quantum-classical networks. Finally, using the generated secret keys, we illustrate how limited and time-varying QKD throughput affects one-time-pad-protected image transmission: image fidelity depends strongly on the available QKD-generated key budget and the choice of compression algorithm, highlighting application-level challenges for QKD-based encryption in realistic scenarios.
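The link between key budget and image fidelity follows from how a one-time pad works: every payload bit consumes one fresh key bit, so the secret-key rate caps how many bytes can be protected per time window. A minimal sketch, assuming hypothetical helper names and an illustrative key rate that is not a figure from the paper:

```python
import secrets


def otp_xor(data: bytes, key: bytes) -> bytes:
    """One-time-pad encryption or decryption: XOR each data byte with a key byte."""
    if len(key) < len(data):
        raise ValueError("one-time pad needs at least as many key bytes as data bytes")
    return bytes(d ^ k for d, k in zip(data, key))


def payload_budget_bytes(key_rate_bps: float, seconds: float) -> int:
    """Largest payload (in bytes) that the key generated in `seconds` can protect."""
    return int(key_rate_bps * seconds) // 8


# Illustrative numbers only: 1 kbit/s of secret key accumulated for one minute.
budget = payload_budget_bytes(key_rate_bps=1000, seconds=60)
image = secrets.token_bytes(budget)   # stand-in for a compressed image
key = secrets.token_bytes(budget)     # stand-in for QKD-generated key material
ciphertext = otp_xor(image, key)
recovered = otp_xor(ciphertext, key)
```

An image that compresses to more bytes than the budget must either wait for more key or be compressed harder, which is why the compression algorithm matters as much as the key rate.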

2606.06509 2026-06-08 eess.IV cs.AI cs.LG q-bio.TO 交叉投稿

Which Anatomy Matters Under Limited Labels? A Data-Efficient Anatomy-Aware Benchmark for Cardiac Pathology Prediction

在有限标签下哪些解剖结构重要?用于心脏病理预测的数据高效解剖感知基准

Himanshu Singh

发表机构 * Himanshu Singh(希曼斯·辛格)

AI总结 针对有限标签和计算资源下的医学影像问题,提出解剖感知基准,通过比较不同解剖结构表示和分类器,发现表示质量比模型复杂度更重要。

Comments ACCEPTED at ICML 2026 Workshop GlobalSouthML (Seoul, South Korea; PMLR 306, 2026)

详情
AI中文摘要

许多医学影像问题必须在有限标签和受限计算条件下解决,然而性能提升主要来自更具表达力的模型还是对临床有意义解剖结构的更好表示,目前尚不清楚。我们通过一个低数据解剖感知基准来研究这个问题,该基准用于在公共ACDC MRI数据集上进行5类心脏病理预测。利用来自右心室、心肌和左心室的分割衍生患者描述符,我们在线性、核和基于树的分类器上比较了特定解剖结构和多结构表示。我们发现,在有限标签设置下,表示主导复杂度。这些结果表明,在资源受限的医疗环境中,识别和表示最具信息量的解剖结构可能比单纯增加模型复杂度更重要。

英文摘要

Numerous medical imaging problems must be solved under limited labels and constrained compute, yet it remains unclear whether performance gains are driven mainly by more expressive models or by better representation of clinically meaningful anatomy. We study this question through a low-data anatomy-aware benchmark for 5-class cardiac pathology prediction on the public ACDC MRI dataset. Using segmentation-derived patient descriptors from the right ventricle, myocardium, and left ventricle, we compare anatomy-specific and multi-structure representations across linear, kernel, and tree-based classifiers. We find that under limited label settings, representation dominates complexity. These results suggest that in resource-constrained healthcare settings, identifying and representing the most informative anatomy may matter more than the increasing complexity of the model alone.

2606.06514 2026-06-08 cs.AI cs.LG 交叉投稿

Detecting and Mitigating Bias by Treating Fairness as a Symmetry Operation

通过将公平性视为对称操作来检测和缓解偏见

Nishit Singh

发表机构 * Nishit Singh

AI总结 将偏见形式化为对称性破缺操作,通过损失正则化恢复对称性,在合成数据集上实现90%以上的违规减少,准确率成本约5%。

Comments 8 pages, 7 figures

详情
AI中文摘要

部署在高风险社会经济环境中的机器学习系统通常表现出偏见。我们将偏见形式化为一种对称性破缺操作:如果分类器在切换敏感属性的反事实操作下(保持能力特征不变)输出不变,则该分类器是公平的。我们实现基于损失的正则化作为对称性恢复机制,并在四个具有不同噪声、相关性和偏见水平的合成数据集上评估该框架。该框架实现了90%以上的违规减少,准确率成本约为5%。该框架不需要因果图知识,计算轻量,且适用于任何可定义为比特翻转的敏感属性,因此适用于局部歧视来源尚未被主流基准涵盖的场景。

英文摘要

Machine learning systems deployed in high stakes socioeconomic settings routinely display bias. We formalize bias as a symmetry breaking operation: a classifier is fair if its outputs remain invariant under the counterfactual operation of switching a sensitive attribute, with merit features held fixed. We implement loss based regularization as a symmetry restoring mechanism and evaluate the framework on four synthetic datasets with varying levels of noise, correlation, and bias. The framework achieves upwards of 90\% violation reduction, with accuracy costs around 5\%. This framework does not require causal graph knowledge, is computationally lightweight, and generalizes to any sensitive attribute definable as a bit-flip, making it suitable for contexts where local sources of discrimination remain absent from mainstream benchmarks.
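The symmetry view can be sketched in a few lines of Python: flip the sensitive bit, hold the other features fixed, and measure how much the output moves. The logistic scorer, the function names, and the squared-difference penalty below are assumptions chosen for illustration; the abstract does not specify the model or the exact form of the regularizer.

```python
import math


def predict_proba(weights, bias, features):
    """Logistic score of one sample (hypothetical stand-in for the classifier)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def flip_bit(features, sensitive_index):
    """Counterfactual operation: toggle the 0/1 sensitive attribute only."""
    flipped = list(features)
    flipped[sensitive_index] = 1 - flipped[sensitive_index]
    return flipped


def violation_rate(weights, bias, rows, sensitive_index, threshold=0.5):
    """Fraction of samples whose predicted label changes under the flip."""
    changed = 0
    for row in rows:
        original = predict_proba(weights, bias, row) >= threshold
        counterfactual = (
            predict_proba(weights, bias, flip_bit(row, sensitive_index)) >= threshold
        )
        changed += original != counterfactual
    return changed / len(rows)


def symmetry_penalty(weights, bias, rows, sensitive_index):
    """Mean squared output change under the flip; added to the task loss as a regularizer."""
    total = 0.0
    for row in rows:
        diff = predict_proba(weights, bias, row) - predict_proba(
            weights, bias, flip_bit(row, sensitive_index)
        )
        total += diff * diff
    return total / len(rows)


# Each row is [merit_feature, sensitive_bit]; values are made up for the example.
rows = [[0.2, 0], [0.2, 1], [-0.2, 0], [-0.2, 1]]
biased = ([1.0, 1.0], -0.5)   # the sensitive bit shifts the decision
fair = ([1.0, 0.0], 0.0)      # the sensitive bit is ignored
```

A classifier that ignores the sensitive bit has zero penalty and zero violations by construction, so driving the penalty toward zero during training pushes the model toward flip invariance without requiring a causal graph.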

2606.06515 2026-06-08 cs.AR cs.AI cs.DC cs.ET cs.LG 交叉投稿

DxPTA: An Architecture Design Space Exploration with Optical Dataflow-guided Strategy for HW/SW Co-Design of Photonic Transformer Accelerators

DxPTA:基于光学数据流引导策略的光子Transformer加速器硬件/软件协同设计的架构设计空间探索

Rachmad Vidya Wicaksana Putra, Solomon Micheal Serunjogi, Mahmoud Rasras, Muhammad Shafique

发表机构 * eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi(eBRAIN实验室,工程学部,纽约大学阿布扎比分校) Photonic Research Lab (PRL), Division of Engineering, New York University (NYU) Abu Dhabi(光子研究实验室(PRL),工程学部,纽约大学阿布扎比分校) New York University (NYU) Abu Dhabi(纽约大学阿布扎比分校)

AI总结 提出DxPTA方法,通过光学数据流分析架构参数并设计约束感知搜索算法,实现光子Transformer加速器的高效硬件/软件协同设计,在满足面积、功耗等约束下显著提升搜索速度。

Comments 8 pages, 12 figures

详情
AI中文摘要

基于Transformer的网络已成为具有最先进性能的突出AI模型,可能为人工通用智能(AGI)铺平道路。然而,它们的大尺寸仍然阻碍了其高效实现,因此需要替代解决方案以实现其节能加速。最近,最先进的工作提出了光子Transformer加速器(PTA),与传统电子加速器相比,具有显著的加速和能效提升。然而,它们的PTA架构是在不考虑应用约束(如面积、功耗、能量和延迟)的情况下开发的。此外,它们的手动设计方法也需要大量设计时间来确定适合目标应用的架构,因此使得这种方法不可扩展。为了解决这些限制,我们提出了DxPTA,一种新颖的设计空间探索方法,用于实现满足所有约束的适当PTA架构的高效硬件/软件协同设计。这是通过(1)基于相干光学数据流识别PTA架构参数;(2)分析参数的影响/重要性;(3)利用此分析设计约束感知架构搜索算法来实现的。实验结果表明,我们的DxPTA可以为不同的基于Transformer的模型(即DeiT-T/S/B和BERT-B/L)找到合适的PTA架构。在约束条件为面积50mm^2、功耗5W、能量50mJ和延迟10ms的情况下,它实现了高达26mm^2面积、4.8W功耗、39mJ能量和6ms延迟;搜索时间比穷举方法快15.2倍。这些结果证明了DxPTA方法在实现针对各种基于AGI的应用的高效PTA设计方面的潜力。

英文摘要

Transformer-based networks have emerged as prominent AI models with state-of-the-art performance, which potentially pave the way toward artificial general intelligence (AGI). However, their large sizes still hinder their efficient implementation, thus highlighting the need for alternate solutions to enable their energy-efficient acceleration. Recently, state-of-the-art works propose photonic transformer accelerators (PTAs) with significant speedup and energy efficiency improvements over the conventional electronic accelerators. However, their PTA architectures are developed without considering the application constraints (e.g., area, power, energy, and latency). Moreover, their manual design approach also requires huge design time to determine a suitable architecture for the targeted application, hence making this approach not scalable. To address these limitations, we propose DxPTA, a novel design space exploration methodology for enabling efficient hardware/software co-design of the appropriate PTA architecture that meets all constraints. It is achieved by (1) identifying the PTA architecture parameters based on the coherent optical dataflow; (2) analyzing the impact/significance of the parameters; and (3) leveraging this analysis for devising a constraint-aware architecture search algorithm. Experimental results show that, our DxPTA can find the appropriate PTA architectures for different transformer-based models (i.e., DeiT-T/S/B and BERT-B/L). It achieves up to 26mm^2 area, 4.8W power, 39mJ energy, and 6ms latency, for constraints of 50mm^2 area, 5W power, 50mJ energy, and 10ms latency; with 15.2x faster searching time than the exhaustive approach. These results demonstrate the potential of DxPTA methodology for enabling efficient PTA designs for diverse AGI-based applications.

2606.06516 2026-06-08 q-bio.QM cs.LG 交叉投稿

Probabilistic learning to perform pre-onset individualised prediction of disease severity: application to Veno Occlusive Disease

概率学习用于疾病严重程度的发病前个体化预测:在静脉闭塞性疾病中的应用

Dalia Chakrabarty, Kane Warrior, Chuqiao Zhang, Akash Bhojgaria, Joydeep Chakrabartty

发表机构 * University of York(约克大学)

AI总结 提出一种新的概率监督学习方法,利用数字孪生和概率逆学习,在骨髓移植前自动预测静脉闭塞性疾病(VOD)的严重程度评分,辅助医生制定治疗方案。

详情
AI中文摘要

我们提出了一种新的概率监督学习方法,能够对预期患者疾病发展的严重程度进行可靠、自动且早期的个体化预测。通过考虑预期患者的数字孪生(DT),在移植前预测静脉闭塞性疾病(VOD)的严重程度评分来展示预测能力,该评分参数化了患者在接受骨髓移植后VOD发展的严重程度。通过将移植前变量与严重程度评分变量之间的关系建模为(随机)函数,该函数被视为适当选择的随机过程的样本函数,从而学习这种关系。该基础过程的参数使用训练数据集学习,该数据集由回顾性患者队列的实时演变生成,随后通过预期患者评分的概率逆学习来扩充该训练数据集的大小。扩充后的训练集允许学习在移植前阶段自动预测VOD严重程度评分的函数,该评分表征了物理患者在其独特移植前状态下的DT。该评分随后反馈给真实预期患者,作为其移植后VOD发展的严重程度。这样的评分允许主治血液肿瘤科医生决定治疗方案,在本例中简化为决定是否使用去纤苷(Defibrotide)治疗患者。开发了一个AI工具来执行这种自动预测,医生输入表征预期患者DT的移植前状态数据。

英文摘要

We advance a new probabilistic supervised learning approach that permits reliable, automated, and early individualised prediction of the severity with which a disease will develop in a prospective patient. The prediction capacity is illustrated via the pre-transplant prediction of the score of severity of Veno Occlusive Disease (or VOD) in the digital twin (DT) of the considered prospective patient, where this score parametrises the severity with which VOD will develop in this patient, after they undergo their Bone Marrow Transplant. The learning of the relationship between the pre-transplant variables, and a severity score variable is undertaken by modelling this relationship as a (random) function that is treated as a sample function of an adequately-chosen stochastic process. The parameters of this underlying process are learnt using a training dataset that is generated using the real-time evolution of retrospective patients in a cohort, with this training dataset subsequently augmented in size by a probabilistic inverse learning of the score of prospective patients. The augmented training set, then permits the learning of the function that capacitates - at the pre-transplant stage - automated prediction of the score of the severity of VOD that characterises the DT of a physical patient in their unique pre-transplant state. This score is subsequently fed back to the real prospective patient as the severity with which VOD will develop in them, after this patient undergoes their transplant. Such a score then permits the treating Haematologist-Oncologists to decide on the treatment regimen, which in this illustration reduces to deciding on treating the patient with Defibrotide. An AI facility is developed to undertake such automated prediction, with the physician inputting the data on the pre-transplant state that characterises the DT of the prospective patient under consideration.

2606.06518 2026-06-08 cs.AI cs.LG 交叉投稿

DiBS: Diffusion-Informed Branch Selection

DiBS: 扩散模型引导的分支选择

Bo Liu, Yuan Xie, Yuan Gao, Xiaolong Luo, Peng Ye, Tao Chen, Fujun Han

发表机构 * Nanyang Technological University(南洋理工大学) University of Science and Technology of China(中国科学技术大学) Tsinghua University(清华大学)

AI总结 针对数独求解中学习型求解器缺乏正确性保证而符号求解器存在长尾搜索的问题,提出扩散模型引导的分支选择方法DiBS,在保持符号求解器完备性的同时,利用扩散模型排序候选值,显著降低搜索成本。

Comments 12 pages, 6 figures, 3 tables

详情
AI中文摘要

数独是一个典型的约束满足问题,需要在严格的离散约束下进行全局结构推理。现有的数独求解方法主要集中于两种主流方法,即传统启发式方法和深度学习求解器。然而,它们存在互补的局限性:基于学习的求解器缺乏硬正确性保证,而完全符号求解器仍容易陷入长尾搜索。为了解决这些不足,我们提出了一种新颖的扩散模型引导方法,称为DiBS,用于分支选择搜索过程。具体来说,DiBS保持符号求解器的完备性,并使用扩散模型作为分支排序指南。核心方法是在当前部分赋值和轻量级一致性信号下对候选值进行排序。此外,我们提供了深入的理论证明来揭示其工作原理和原因。在具有挑战性的Royle 17线索数独基准上的实验表明,与强启发式基线相比,我们的DiBS显著降低了搜索成本,特别是在节点、回溯和长尾百分位数方面。此外,这些结果证实,在分支顺序错误代价最高的困难实例上,学习到的全局指导是有效的。所有代码均可在 https://github.com/shanxierdan/DiBS 获取。

英文摘要

Sudoku is a representative constraint satisfaction problem that requires global structural reasoning under strict discrete constraints. Existing work on solving Sudoku mainly follows two dominant approaches: traditional heuristic solvers and deep learning solvers. However, they suffer from complementary limitations: learning-based solvers lack hard correctness guarantees, while complete symbolic solvers are still prone to long-tail search. To address these shortcomings, we propose a novel diffusion model-guided approach, termed DiBS, for the branch selection search process. Specifically, DiBS keeps the symbolic solver complete and uses the diffusion model as a branch-ordering guide. The core method ranks candidate values under the current partial assignment and a lightweight consistency signal. Furthermore, we provide an in-depth theoretical proof to reveal how and why it works. Experiments on the challenging Royle 17-clue Sudoku benchmark show that DiBS substantially reduces search cost relative to strong heuristic baselines, especially in nodes, backtracks, and long-tail percentiles. These results also confirm that learned global guidance is effective on hard instances where branch-order mistakes are most expensive. All code is available at https://github.com/shanxierdan/DiBS.
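The idea of keeping the solver complete while letting a learned model order the branches can be sketched as a plain backtracking search with a pluggable scoring hook. This is a minimal illustration under stated assumptions: `score` is a hypothetical stand-in for the diffusion model's ranking signal, the cell-selection rule is the standard minimum-remaining-values heuristic, and none of it is taken from the DiBS code base.

```python
def candidates(grid, r, c):
    """Values still consistent with the row, column, and 3x3 box of cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [v for v in range(1, 10) if v not in used]


def solve(grid, score=lambda grid, r, c, v: 0.0):
    """Complete backtracking search that fills `grid` in place.

    `score` only changes the order in which candidate values are tried;
    every candidate is still explored if needed, so completeness is kept.
    """
    empties = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
    if not empties:
        return True
    # Pick the most constrained cell (minimum remaining values).
    r, c = min(empties, key=lambda rc: len(candidates(grid, *rc)))
    ordered = sorted(candidates(grid, r, c), key=lambda v: -score(grid, r, c, v))
    for v in ordered:
        grid[r][c] = v
        if solve(grid, score):
            return True
        grid[r][c] = 0  # undo and backtrack
    return False


# A widely used example puzzle (0 marks an empty cell).
PUZZLE = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]
```

Because the hook only permutes the candidate list, a poor scorer can slow the search but cannot make it return a wrong grid or miss a solution; a good scorer reduces the number of backtracks, which is the cost the abstract reports.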

2606.06519 2026-06-08 cs.AI cs.LG 交叉投稿

SafeGene: Reusable Adapters for Transferable Safety Alignment

SafeGene: 可重用的适配器实现可迁移的安全对齐

Yanghan Wang, Zhiqiang Kou, Fu Feng, Jing Wang, Xin Geng

发表机构 * Southeast University(东南大学) The Hong Kong Polytechnic University(香港理工大学)

AI总结 提出SafeGene,一种可重用的安全适配器模块,通过从对齐-退化模型差异中提取安全表示,并利用数据感知层选择和少样本系数重校准,实现跨任务的安全恢复,在保持下游性能的同时降低有害响应率。

详情
AI中文摘要

开放权重的LLM越来越多地被微调成定制助手,但下游微调可能会削弱安全对齐,使模型更容易受到恶意提示的攻击,即使训练数据并非有意有害。这造成了反复的安全恢复问题,因为目标模型会随着新任务数据或用户交互而不断更新。我们提出SafeGene,一种可重用的安全适配器模块,设计用于每个架构兼容模型家族内的跨任务重用。SafeGene不将安全恢复视为特定于模型的修复步骤,而是将安全能力视为一种独立的、可重用的适配器表示,与任务特定更新解耦。这种表示从对齐-退化模型差异中获得,通过数据感知层选择精炼成任务可迁移的安全向量,并通过少样本逐层系数重校准在每个下游任务适应模型中表达。跨多个模型家族、下游任务和安全评估者的实验表明,SafeGene增强的模型在降低有害响应率的同时保持下游性能,在安全-效用权衡中优于代表性的安全适应方法。

英文摘要

Open-weight LLMs are increasingly fine-tuned into customized assistants, but downstream fine-tuning can weaken safety alignment and make models more vulnerable to malicious prompts, even when the training data is not intentionally harmful. This creates a recurring safety recovery problem as target models are repeatedly updated with new task data or user interactions. We propose SafeGene, a reusable safety-adapter module designed for cross-task reuse within each architecture-compatible model family. Rather than treating safety recovery as a model-specific repair step, SafeGene treats safety capability as an independent, reusable adapter representation decoupled from task-specific updates. This representation is obtained from aligned--degraded model discrepancies, refined into task-transferable safety vectors through data-aware layer selection, and expressed in each downstream task-adapted model via few-shot layer-wise coefficient recalibration. Experiments across multiple model families, downstream tasks, and safety judges show that SafeGene-enhanced models reduce harmful response rates while maintaining downstream performance, outperforming representative safe adaptation methods in safety--utility trade-off.

2606.06521 2026-06-08 cs.AR cs.AI cs.DC cs.LG cs.PF 交叉投稿

P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8

P-Cast:FP8注意力中的精度——Sink引发的坍缩与S=2^8的最优性

Reed Lau

发表机构 * Tencent(腾讯)

AI总结 针对FP8注意力计算中softmax概率矩阵P在乘法前转换为FP8时的精度问题,分析了KV块迭代顺序和静态缩放因子对精度的影响,发现正向迭代导致非sink值下溢为0,反向迭代结合S=256可消除下溢,并证明S=256在比特精确、量化步长和覆盖范围上最优。

Comments 8 pages, 3 figures, 3 tables, 1 algorithm. Technical note on FP8 E4M3 P-cast precision

详情
AI中文摘要

FP8 (E4M3) 加速注意力计算可显著提升吞吐量,但3位尾数在P*V矩阵乘法前将softmax概率矩阵P转换为FP8时带来了精度挑战。我们分析了在注意力Sink现象下影响输出精度的两种实现选择:(1) KV块迭代顺序,(2) 转换前应用于P的静态缩放因子。我们证明正向KV迭代会导致“P坍缩”:在主导阶上,非sink的P值中有比例为Φ(Δ + δ_k - 6.93 - ln S)的部分下溢为零,其中小偏移δ_k ≈ 1(对于k_sink=4)是sink块内期望的分数最大值;而反向迭代可消除该问题,当反向与S=256结合时保证零下溢。我们进一步给出S=256=2^8的构造性刻画,它是同时满足(i) 比特精确的IEEE 754缩放,(ii) E4M3数轴上锯齿函数dp(S)的下包络(dp=2^-4,最小最坏情况量化步长),以及(iii) 在比特精确(2^k)缩放中最大正常范围覆盖(非比特精确缩放如448可实现略高覆盖;见第5节)的静态缩放因子。两种优化已在FlashAttention-3/4中基于工程理由部署;我们的贡献是定量解释这些选择为何良好,并给出一个闭式阈值Δ_c = 6.93 + ln S - δ_k用于预测内核级精度损失。内核忠实实验(Q、K、V为FP32以隔离P-cast效应)在中等sink强度下显示3-10倍的MSE改进,配对测试证实两种修复结合时均饱和到相同的精度下限,这也促使hpc-ops内核将S=1更新为S=256。

英文摘要

FP8 (E4M3) acceleration for attention computation offers significant throughput gains, but the 3-bit mantissa introduces precision challenges when the softmax probability matrix $P$ is cast to FP8 before the $P \cdot V$ matrix multiplication. We analyze two implementation choices that affect output precision under the Attention Sink phenomenon: (1) the KV block iteration order, and (2) the static scaling factor applied to $P$ before casting. We show that forward KV iteration causes P-collapse -- to leading order a fraction $\Phi(\Delta + \delta_k - 6.93 - \ln S)$ of non-sink $P$ values underflow to zero, where the small shift $\delta_k \approx 1$ (for $k_{\text{sink}} = 4$) is the expected within-sink-block score maximum -- and that reverse iteration removes it, with a zero-underflow guarantee when reverse is combined with $S = 256$. We further give a constructive characterization of $S = 256 = 2^8$ as the static scale that simultaneously satisfies (i) bit-exact IEEE 754 scaling, (ii) the lower envelope of a sawtooth function $dp(S)$ over the E4M3 number line ($dp = 2^{-4}$, the minimum worst-case quantization step), and (iii) the maximum normal-range coverage among bit-exact ($2^k$) scales (a non-bit-exact scale such as $448$ attains slightly higher coverage; Sec. 5). Both optimizations are already deployed in FlashAttention-3/4 on engineering grounds; our contribution is a quantitative account of why these choices are good and a closed-form threshold $\Delta_c = 6.93 + \ln S - \delta_k$ for predicting kernel-level precision loss. Kernel-faithful experiments ($Q, K, V$ in FP32 to isolate the P-cast effect) show $3$-$10\times$ MSE improvement at moderate sink strengths, and paired tests confirm both fixes saturate to the same precision floor when combined -- which motivated updating the hpc-ops kernel from $S = 1$ to $S = 256$.
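The closed-form threshold quoted in the abstract is easy to evaluate numerically. The sketch below plugs in the abstract's constants; it assumes that $\Phi$ denotes the standard normal CDF (the abstract does not say so explicitly) and uses hypothetical function names. As a side observation of ours, the constant 6.93 is numerically close to $\ln 2^{10} \approx 6.931$, which would match half of the smallest E4M3 subnormal ($2^{-9}$).

```python
import math

UNDERFLOW_CONST = 6.93  # constant quoted in the abstract


def critical_delta(scale: float, delta_k: float = 1.0) -> float:
    """Threshold Delta_c = 6.93 + ln(S) - delta_k from the abstract."""
    return UNDERFLOW_CONST + math.log(scale) - delta_k


def collapse_fraction(delta: float, scale: float, delta_k: float = 1.0) -> float:
    """Leading-order fraction Phi(Delta + delta_k - 6.93 - ln S) of underflowing values.

    Assumption: Phi is the standard normal CDF, written here via math.erf.
    """
    x = delta - critical_delta(scale, delta_k)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


for s in (1, 256):
    print(f"S={s}: Delta_c = {critical_delta(s):.3f}")
```

Moving from $S = 1$ to $S = 256$ raises the threshold by $\ln 256 = 8 \ln 2 \approx 5.545$, so under this formula a given sink strength $\Delta$ pushes a smaller share of non-sink probabilities below the representable range.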

2606.06524 2026-06-08 eess.IV cs.CV cs.LG 交叉投稿

Advanced Flood Prediction with Physics-Guided Deep Learning: Combining UNet, FNO, and SAR/Optical Imagery

基于物理引导深度学习的先进洪水预测:结合UNet、FNO与SAR/光学影像

Tewodros Syum Gebre, Jagrati Talreja, Leila Hashemi-Beni

发表机构 * National Center for Atmospheric Research (NCAR)(国家大气研究中心)

AI总结 提出物理引导深度学习框架,融合多模态遥感与浅水方程约束,通过UNet-FNO混合架构实现高精度洪水预测,IoU达0.82,F1达0.90。

Comments This paper has been accepted for publication in the Proceedings of the IEEE Radar Conference (RadarConf 2026). The final authenticated version will be available through IEEE Xplore

详情
AI中文摘要

由于地面观测有限、地形条件异质以及数据驱动模型中难以强制执行水动力学一致性,准确且可扩展的洪水测绘仍然具有挑战性。本文介绍了一种物理引导的深度学习框架,该框架集成了多模态遥感(Sentinel-1 SAR、Sentinel-2光学影像和DEM衍生的地形特征)与深度平均浅水方程(SWE)的约束。所提出的混合架构结合了用于捕捉精细尺度空间细节的UNet和用于模拟流域尺度水力相互作用的傅里叶神经算子(FNO),而物理信息残差损失确保了质量和动量一致性。在多种洪泛区环境下评估,混合模型在洪水范围预测中实现了0.82的交并比和0.90的F1分数,优于仅使用UNet和仅使用FNO的基线模型。以水动力学模拟作为参考数据,该模型在水深方面实现了0.21米的均方根误差,在流速方面实现了0.15米/秒的均方根误差。物理一致性得以保持,残差低且质量不平衡低于2.1%。消融研究证实,去除基于物理的正则化会显著降低性能,突显了物理约束对稳定性和泛化能力的价值。这些结果表明,将水动力学原理嵌入深度学习可产生更准确、可靠且物理一致的洪水预测,为业务监测和大规模部署提供了巨大潜力。

英文摘要

Accurate and scalable flood mapping remains challenging due to limited ground observations, heterogeneous terrain conditions, and the difficulty of enforcing hydrodynamic consistency within data-driven models. This work introduces a physics-guided deep learning framework that integrates multi-modal remote sensing (Sentinel-1 SAR, Sentinel-2 optical imagery, and DEM-derived terrain features) with constraints from the depth-averaged shallow water equations (SWE). The proposed hybrid architecture combines a UNet to capture fine-scale spatial details with a Fourier Neural Operator (FNO) to model basin-scale hydraulic interactions, while physics-informed residual losses ensure mass and momentum consistency. Evaluated across diverse floodplain settings, the hybrid model achieves an Intersection over Union of 0.82 and an F1 score of 0.90 for flood extent prediction, outperforming UNet-only and FNO-only baselines. Using hydrodynamic simulations as reference data, the model achieves an RMSE of 0.21 m for water depth and 0.15 m/s for flow velocity. Physics consistency is maintained, with low residuals and mass imbalance below 2.1%. Ablation studies confirm that removing physics-based regularization significantly degrades performance, underscoring the value of physical constraints for stability and generalization. These results demonstrate that embedding hydrodynamic principles into deep learning yields more accurate, reliable, and physically coherent flood predictions, offering strong potential for operational monitoring and large-scale deployment.
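The two flood-extent metrics are tightly linked: when both are computed from the same confusion counts, $F_1 = 2\,\mathrm{IoU} / (1 + \mathrm{IoU})$. The short sketch below, with hypothetical helper names and a made-up toy mask, computes both metrics and checks the identity. How the paper aggregates scores across scenes is not stated, so the identity need not hold exactly for averaged values.

```python
def confusion_counts(pred, truth):
    """True positives, false positives, false negatives for binary flood masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def iou(tp, fp, fn):
    """Intersection over Union (Jaccard index)."""
    return tp / (tp + fp + fn)


def f1(tp, fp, fn):
    """F1 score (Dice coefficient)."""
    return 2 * tp / (2 * tp + fp + fn)


def f1_from_iou(j):
    """F1 implied by an IoU value computed on the same counts."""
    return 2 * j / (1 + j)


# Toy flattened masks: 1 = flooded pixel, 0 = dry pixel.
pred = [1, 1, 1, 0, 0, 1]
truth = [1, 1, 0, 0, 1, 1]
tp, fp, fn = confusion_counts(pred, truth)
print(iou(tp, fp, fn), f1(tp, fp, fn), f1_from_iou(0.82))
```

Plugging in the reported IoU of 0.82 gives an implied F1 of about 0.90, which agrees with the F1 score quoted in the abstract.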

2606.06526 2026-06-08 cs.AI cs.LG 交叉投稿

CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions

CrowdMath: 众包数学研究讨论数据集

Sherin Muckatira, Jesse Geneson, Slava Gerovitch, Pavel Etingof, Mikhail Gronas, Anna Rumshisky

发表机构 * University of Massachusetts Lowell(马萨诸塞大学洛厄尔分校) San Jose State University(圣何塞州立大学) Massachusetts Institute of Technology(麻省理工学院) Dartmouth College(达特茅斯学院) Amazon AGI(亚马逊AGI)

AI总结 提出CrowdMath数据集,包含164条专家标注的进展链,用于评估大语言模型在协作开放问题求解中的能力,发现模型在局部预测上表现良好但在角色分类上存在不足。

Comments 16 pages, 4 figures

详情
AI中文摘要

大型语言模型在数学推理方面取得了实质性进展,但现有基准通常评估具有最终答案、逐步解决方案或完整证明的明确问题。它们没有捕捉到协作开放问题求解:参与者提出部分论证、识别先前步骤中的空白或错误、修复有缺陷的推理,并逐步将增量贡献综合成证明。我们引入了CrowdMath,一个包含164条专家标注的进展链的数据集,来自MIT PRIMES--Art of Problem Solving (AoPS) CrowdMath项目(2016-2025),这是一个协作研究计划,其讨论已导致同行评审的出版物。每条链追踪一个从开放问题陈述到完成证明的多参与者论坛讨论。帖子根据其在不断演变的解决方案过程中的功能角色进行标注,包括部分进展、证明完成、错误推理和错误识别。我们定义了评估任务并对六个前沿模型进行了基准测试。模型在下一帖子预测上达到83-88%的准确率,表明它们能够跟随数学讨论的局部流程。然而,它们难以识别单个贡献的功能重要性,最佳模型在帖子角色分类上仅达到0.42的宏F1分数。CrowdMath揭示了解决明确数学问题与理解协作数学进展之间的差距。

英文摘要

Large language models have made substantial progress on mathematical reasoning, but existing benchmarks typically evaluate well-specified problems with final answers, step-by-step solutions, or complete proofs. They do not capture collaborative open-problem solving: a setting in which participants propose partial arguments, identify gaps or errors in prior steps, repair flawed reasoning, and gradually synthesize incremental contributions into a proof. We introduce CrowdMath, a dataset of 164 expert-annotated progress chains from the MIT PRIMES--Art of Problem Solving (AoPS) CrowdMath program (2016-2025), a collaborative research initiative whose discussions have led to peer-reviewed publications. Each chain traces a multi-participant forum discussion from an open-problem statement to a completed proof. Posts are labeled by their functional roles in the evolving solution process, including partial progress, proof completion, erroneous reasoning, and error identification. We define evaluation tasks and benchmark six frontier models. Models achieve 83-88% accuracy on next-post prediction, suggesting that they can follow the local flow of mathematical discussion. However, they struggle to identify the functional significance of individual contributions with the best model achieving only 0.42 macro-F1 on post-role classification. CrowdMath exposes a gap between solving well-specified mathematical problems and understanding collaborative mathematical progress as it unfolds.

2606.06529 2026-06-08 cs.AI cs.LG 交叉投稿

Attack Selection in Agentic AI Control Evaluations Meaningfully Decreases Safety

智能体AI控制评估中的攻击选择显著降低安全性

Catherine Ge-Wang, Tyler Crosse, Benjamin Hadad, Joachim Schaeffer, Ram Potham, Tyler Tracy


AI总结 本文研究攻击者策略性选择攻击时机对AI控制安全性的影响,通过分解攻击决策为开始和停止策略,实验表明两者均显著降低安全性,现有评估可能高估安全性。

详情
AI中文摘要

一个策略性选择攻击时机的攻击者比无差别攻击者更难被捕获。AI控制是一种安全框架,用于在较弱但可信的监控者和有限的人工审计预算下部署能力强但不可信的AI智能体。控制评估通过让红队攻击策略对抗蓝队监控者来压力测试这些协议,但当前评估通常假设攻击者不会策略性地选择攻击时机。我们通过将攻击决策分解为开始策略(决定攻击者何时应开始攻击)和停止策略(决定攻击者何时应中止正在进行的攻击),在智能体环境中研究了这种能力——攻击选择。在两个智能体环境BashArena和LinuxArena中,两种策略都在不改变底层攻击能力的情况下显著降低了测量的经验安全性。在1%的审计预算下,我们的开始策略在BashArena和LinuxArena上均将安全性降低了20个百分点,停止策略在BashArena上降低了20个百分点,在LinuxArena上降低了28个百分点。这些降低应被解释为攻击选择效果的上限。因此,现有的控制评估可能对选择性攻击者产生过于乐观的安全性估计。我们建议未来的评估、系统卡和安全案例应引出攻击选择,以产生更现实的安全性估计。

英文摘要

An attacker that strategically chooses when to attack is much harder to catch than one that attacks indiscriminately. AI control is a safety framework for deploying capable but untrusted AI agents under the oversight of a weaker, trusted monitor and a limited human audit budget. Control evaluations stress-test these protocols by pitting a red-team attack policy against the blue-team monitor, but current evaluations typically assume attackers that do not strategically select when to attack. We study this capability, attack selection, in agentic settings by decomposing attack decisions into a start policy, which decides when an attacker should attack, and a stop policy, which decides when an attacker should abort an ongoing attack. Across two agentic settings, BashArena and LinuxArena, both policies substantially lower measured empirical safety without changing the underlying attack capability. At a 1% audit budget, our start policy reduces safety by 20pp on both BashArena and LinuxArena, and our stop policy reduces safety by 20pp on BashArena and 28pp on LinuxArena. These reductions should be interpreted as upper bounds on the effect of attack selection. Existing control evaluations may therefore yield overly optimistic safety estimates against selective attackers. We recommend that future evaluations, system cards, and safety cases elicit attack selection to produce more realistic safety estimates.

2606.06535 2026-06-08 cs.SE cs.LG 交叉投稿

Architecturally Significant MLOps Guidelines for ML Model Integration and Deployment: a Gray Literature Review

架构上重要的MLOps指南:ML模型集成与部署的灰色文献综述

Faezeh Amou Najafabad, Markus Haug, Keerthiga Rajenthiram, Justus Bogner, Ilias Gerostathopoulos

发表机构 * Vrije Universiteit Amsterdam(阿姆斯特丹自由大学) Technical University of Munich(慕尼黑工业大学)

AI总结 通过灰色文献综述,总结了25条架构上重要的MLOps指南,分为五类,用于指导ML模型在MLOps系统中的集成与部署。

Comments ECSA2026

详情
AI中文摘要

背景。尽管机器学习运维(MLOps)的采用日益增长,但由于缺乏统一的架构指导,团队往往以临时方式处理MLOps项目。社区将受益于一份综合知识的参考,以指导MLOps系统的架构设计,特别是关于ML模型的集成与部署。目标。为此,我们的目标是提供一份关于MLOps系统中ML模型集成与部署的架构上重要指南的全面概述。方法。我们对103个网络来源进行了灰色文献综述,以分析MLOps模型集成与部署的实践知识现状。然后,我们应用主题分析将这些实践综合为推荐指南。结果。我们贡献了25条架构上重要的MLOps指南,用于模型集成与部署,分为五类,并描述了它们对整体系统架构的影响。结论。我们的结果作为实践现状的MLOps指南概述,以支持研究人员和从业者在其MLOps系统中集成与部署ML模型。

英文摘要

Context. Despite the growing adoption of Machine Learning Operations (MLOps), teams often approach MLOps projects in an ad hoc manner due to the lack of consolidated architectural guidance. The community would benefit from a reference that synthesizes knowledge to inform the architectural design of MLOps systems, especially regarding the integration and deployment of ML models. Objective. In response, our goal is to provide a comprehensive overview of architecturally significant guidelines for the integration and deployment of ML models in MLOps systems. Method. We conduct a gray literature review of 103 web sources to analyze state-of-practice knowledge on MLOps model integration and deployment. We then apply thematic analysis to synthesize these practices into recommended guidelines. Results. We contribute a collection of 25 architecturally significant MLOps guidelines for model integration and deployment, organized into five categories, and describe their impact on the overall system architecture. Conclusion. Our results serve as an overview of state-of-practice MLOps guidelines to support researchers and practitioners with the integration and deployment of ML models in their MLOps systems.

2606.06536 2026-06-08 cs.CV cs.AI cs.LG 交叉投稿

Attention-Guided Autoencoder Fusion for Insulator Defect Detection Using UAV Transmission-Line Imaging

基于注意力引导自编码器融合的无人机输电线路绝缘子缺陷检测

Malak Allam, Khaled Shaban, Ali Hamdi

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出AE-YOLO框架,通过注意力引导自编码器融合与方差最大化正则化,解决无人机图像中绝缘子缺陷检测的类别不平衡和尺度变化问题,在mAP@0.5上达95.10%,优于YOLO基线5个百分点。

详情
AI中文摘要

高压输电线路绝缘子的自动缺陷检测仍然具有挑战性,原因在于无人机(UAV)图像中严重的类别不平衡、尺度变化大以及缺陷实例的空间范围小。为了解决这些问题,本文提出了AE-YOLO,一种注意力引导的自编码器增强型YOLO框架,用于鲁棒的绝缘子缺陷检测。该架构在特征金字塔网络-路径聚合网络(FPN-PAN)颈部集成了轻量级瓶颈自编码器,在多尺度特征融合过程中保留了异常敏感信息。整个骨干网络使用卷积块注意力模块(CBAM),增强了特征辨别能力并抑制了背景干扰。该框架还引入了一种方差最大化的自编码器正则化策略,鼓励生成多样化、缺陷判别性的潜在表示。网络使用统一目标进行训练,该目标结合了焦点损失、完全IoU(CIoU)损失和自编码器正则化,以解决前景-背景不平衡问题并提高定位精度。在推理过程中,加权框融合(WBF)结合了YOLOv8、YOLOv10和YOLO11的预测结果。自编码器引导的置信度提升机制提高了对罕见缺陷类别的敏感性。在绝缘子缺陷检测数据集上的实验表明,采用EfficientNetV2骨干网络的AE-YOLO在mAP@0.5上达到95.10%,精度为96.40%,召回率为93.80%。这一性能在mAP@0.5上比最强的YOLO系列基线高出5.0个百分点,在召回率上高出6.7个百分点。这些结果证实了该框架的有效性和适应性。该模型是基于UAV的输电线路巡检和缺陷监测的实用且可扩展的解决方案。

英文摘要

Automated defect detection in high-voltage transmission-line insulators remains challenging due to severe class imbalance, large scale variation, and the small spatial extent of defect instances in Unmanned Aerial Vehicle (UAV) imagery. To address these challenges, this paper proposes AE-YOLO, an Attention-Guided AutoEncoder-Enhanced YOLO framework for robust insulator defect detection. The architecture integrates lightweight bottleneck autoencoders within a Feature Pyramid Network-Path Aggregation Network (FPN-PAN) neck. This preserves anomaly-sensitive information during multi-scale feature fusion. Convolutional Block Attention Modules (CBAM) are used throughout the backbone, enhancing feature discrimination and suppressing background interference. The framework also introduces a variance-maximizing autoencoder regularization strategy, which encourages diverse, defect-discriminative latent representations. The network trains using a unified objective that combines focal loss, Complete IoU (CIoU) loss, and autoencoder regularization to address foreground-background imbalance and improve localization accuracy. During inference, Weighted Boxes Fusion (WBF) combines predictions from YOLOv8, YOLOv10, and YOLO11. An autoencoder-guided confidence boosting mechanism improves sensitivity to rare defect categories. Experiments on the Insulator-Defect Detection dataset show that AE-YOLO with an EfficientNetV2 backbone achieves 95.10 percent mAP at 0.5, 96.40 percent precision, and 93.80 percent recall. This performance surpasses the strongest YOLO-family baseline by 5.0 points in mAP at 0.5 and 6.7 points in recall. These results confirm the effectiveness and adaptability of the framework. The model is a practical and scalable solution for UAV-based transmission-line inspection and defect monitoring.

2606.06539 2026-06-08 cs.CV cs.AI cs.LG cs.NE 交叉投稿

Synthetic Benchmarks Overstate Forward-Forward Scaling: Real-Data Limits of Layer-Local Training

合成基准高估了前向-前向扩展:真实数据对逐层训练的限制

Yucheng Chen

发表机构 * Amplimit

AI总结 通过DTG-FF方法在真实数据上评估前向-前向学习的扩展性,发现其与反向传播的差距随类别数增加而扩大,合成任务高估了其迁移能力,且内存优势不成立。

Comments 23 pages, 6 figures

详情
AI中文摘要

前向-前向(FF)学习[Hinton, 2022]用严格的逐层良好性更新取代了反向传播。最近的FF-CNN工作在32x32基准上缩小了与BP的差距,引发了逐层训练是否在现实规模下成为可行替代方案的问题。为了严格探究这一点,我们开发了DTG-FF——动态温度良好性、解耦归一化和多层融合——作为在九个真实数据基准上设定FF系列最先进水平的工具(CIFAR-10上91.8%,以及ImageNet-100 224x224上的首个FF基线),并用它来审计逐层训练实际能扩展到何种程度。(1)真实数据扩展。在相同配方和主干下,架构匹配的BP-DeepSup基线在CIFAR-10/CIFAR-100上分别超过DTG-FF 2.40/5.93个百分点,且差距随类别数增加而扩大。在224x224分辨率下,同一工具仅达到49.4%——这是该尺度下的首个FF基线,而典型BP超过75%[Tian et al., 2020]——暴露了在32x32下不可见的真实数据上限。(2)合成与真实K冲突。在合成教师-学生任务中,随着类别数K增长,DTG-FF越来越优于BP;而在真实图像上,FF-BP差距符号反转并随K扩大。数据集内CIFAR-100粗粒度与细粒度探针将标签层次与图像分布分离:合成K扫描将输出维度与细粒度判别难度混淆,从而高估了FF的可迁移性。(3)系统审计。FF可以在不存储深度激活的情况下实现,但在普通8 GB硬件上,标准BP+梯度累积达到4.18 GB / 157 imgs/s,而DTG-FF为7.90 GB / 138 imgs/s,因此在公平基线支持下,基于内存的理由在此规模下不成立。

英文摘要

Forward-Forward (FF) learning [Hinton, 2022] replaces backpropagation with strictly layer-local goodness updates. Recent FF-CNN work has narrowed the gap to BP on 32x32 benchmarks, raising the question of whether layer-local training is becoming a viable alternative at realistic scale. To probe this rigorously, we develop DTG-FF -- dynamic temperature goodness, decoupled normalization, and multi-layer fusion -- as an instrument that sets FF-family state of the art across nine real-data benchmarks (91.8% CIFAR-10 and the first FF baseline at ImageNet-100 224x224), and use it to audit how far layer-local training actually scales. (1) Real-data scaling. Under identical recipe and backbone, an architecture-matched BP-DeepSup baseline beats DTG-FF by 2.40/5.93 pp on CIFAR-10/CIFAR-100, and the gap widens with class count. At 224x224 the same instrument reaches only 49.4% -- the first FF baseline at this scale, versus typical BP above 75% [Tian et al., 2020] -- exposing a real-data ceiling invisible at 32x32. (2) Synthetic vs. real K-conflict. DTG-FF increasingly outperforms BP as class count K grows on synthetic teacher-student tasks, yet on real images the FF-BP gap reverses sign and widens with K. A within-dataset CIFAR-100 coarse vs. fine probe isolates label-hierarchy from image distribution: synthetic K-sweeps confound output dimensionality with fine-grained discrimination difficulty and thereby overstate FF transferability. (3) Systems audit. FF can be implemented without storing depth-wide activations, but on commodity 8 GB hardware standard BP+gradient-accumulation reaches 4.18 GB / 157 imgs/s versus DTG-FF's 7.90 GB / 138 imgs/s, so a memory-based justification for FF at this scale is not supported under fair baselines.
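For readers unfamiliar with layer-local goodness, the sketch below shows the baseline formulation from Hinton's original Forward-Forward proposal: goodness is the sum of squared activations, and each layer is trained to push goodness above a threshold for positive data and below it for negative data. This is only an illustration of the baseline idea; it does not implement DTG-FF's dynamic temperature, decoupled normalization, or multi-layer fusion, and the function names and the `theta` parameter are assumptions made for the example.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def goodness(activations):
    """Baseline FF goodness: sum of squared activations of one layer."""
    return sum(a * a for a in activations)


def p_positive(activations, theta):
    """Probability that the input is 'positive': sigmoid(goodness - theta)."""
    return math.exp(-softplus(theta - goodness(activations)))


def ff_layer_loss(pos_activations, neg_activations, theta):
    """Layer-local loss: -log p(pos) - log(1 - p(neg)); no gradient crosses layers."""
    return (softplus(theta - goodness(pos_activations))
            + softplus(goodness(neg_activations) - theta))
```

Because the loss depends only on one layer's activations, each layer can be updated without a backward pass through the rest of the network, which is the property the abstract calls layer-local training.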

2606.06555 2026-06-08 cs.NE cs.LG 交叉投稿

Depth over Fidelity in Fixed-Budget Noisy Evolution Strategies

固定预算噪声进化策略中深度优先于保真度

Sichen Wang, Zhipeng Lu

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 针对固定评估预算下的噪声进化策略,提出概率精英成员(PEM)方法,通过条件期望秩权重替代硬秩权重,实现Rao-Blackwell化降噪,在COCO基准和RL等任务中取得一致提升。

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026). 28 pages, 16 figures, 7 tables, including appendices

详情
AI中文摘要

在固定评估预算下的噪声进化策略面临深度-保真度权衡:花费评估来去噪代内排序会减少优化器可以执行的分布更新次数。我们主张深度优先于保真度,并提出概率精英成员(PEM),该方法用集成排序不确定性的条件期望秩权重替代进化策略中基于硬秩的权重。PEM在保持条件均值更新的同时减少了条件更新离散度,这是对噪声秩基步骤的Rao-Blackwell化。我们通过带每代开销上限的残差自助法(RB-PEM)实例化PEM,并辅以自适应探测-切换机制以应对低噪声场景。在COCO bbob-noisy套件以及包括RL策略搜索和超参数优化的外部任务中,RB-PEM在高误排序、预算受限的设置下取得了一致增益。

英文摘要

Noisy evolution strategies under fixed evaluation budgets face a depth-fidelity trade-off: spending evaluations to denoise intra-generation rankings reduces the number of distribution updates the optimizer can execute. We argue for depth over fidelity and propose probabilistic elite membership (PEM), which replaces hard rank-based weights in evolution strategies with conditional expected rank weights that integrate over ranking uncertainty. PEM preserves the conditional mean update while reducing conditional update dispersion, a Rao-Blackwellization of the noisy rank-based step. We instantiate PEM via residual bootstrapping (RB-PEM) with capped per-generation overhead, complemented by an adaptive probe-and-switch mechanism for low-noise regimes. Across the COCO bbob-noisy suite and external tasks including RL policy search and hyperparameter optimization, RB-PEM achieves consistent gains in high-misranking, budget-constrained settings.
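The idea of replacing hard rank-based weights with expected rank weights can be sketched with a plain Monte Carlo average. The code below is not the paper's RB-PEM procedure: it assumes minimization, truncation-selection weights, and Gaussian evaluation noise with a known standard deviation, and every name in it (`hard_rank_weights`, `expected_rank_weights`, `noise_std`, `n_resamples`) is hypothetical.

```python
import random


def hard_rank_weights(fitness, mu):
    """Truncation selection: weight 1/mu for the mu smallest (best) values."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    weights = [0.0] * len(fitness)
    for i in order[:mu]:
        weights[i] = 1.0 / mu
    return weights


def expected_rank_weights(means, noise_std, mu, n_resamples=200, seed=0):
    """Average the hard weights over resampled noisy fitness values.

    Each candidate's weight becomes (1/mu) times its estimated probability
    of belonging to the elite set, so near-ties share weight instead of
    flipping between 0 and 1/mu from one noisy evaluation to the next.
    """
    rng = random.Random(seed)
    acc = [0.0] * len(means)
    for _ in range(n_resamples):
        sample = [rng.gauss(m, noise_std) for m in means]
        for i, w in enumerate(hard_rank_weights(sample, mu)):
            acc[i] += w
    return [a / n_resamples for a in acc]
```

With `noise_std=0.0` the result coincides with the hard weights; as the noise grows, weight spreads across candidates whose ranking is uncertain, while the weights still sum to one.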

2606.06565 2026-06-08 cs.GR cs.HC cs.LG 交叉投稿

AI Level of Detail: Distance-Aware ML Model Precision Selection for Real-Time Human Motion Prediction in Games

AI细节层次:面向游戏中实时人体运动预测的距离感知机器学习模型精度选择

Mathew Varghese

发表机构 * University of Washington(华盛顿大学)

AI总结 提出AI LOD框架,根据NPC与玩家摄像头的距离调整机器学习推理精度,利用量化模型作为近似,在保持感知质量的同时降低计算开销。

Comments Camera-ready for SIGGRAPH Technical Workshops 2026

详情
AI中文摘要

现代游戏引擎使用学习到的运动模型来驱动NPC动画,消耗大量计算资源。本文提出AI细节层次(AI LOD)框架,其中机器学习推理精度根据每个NPC与玩家摄像头的距离进行调整。核心思想类似于经典的几何LOD:在差异不可察觉的地方用更便宜的近似替代。这里的近似是低精度的量化机器学习模型,而不是低多边形网格。本文的贡献在于AI LOD概念本身:推理时量化可以作为AI驱动角色动画的LOD轴——更广泛地说,适用于任何基于AI的运行时系统,其中感知敏感性随上下文变化。使用Li等人的卷积序列到序列模型作为代表性示例来演示该概念,将其训练好的检查点导出为三个ONNX Runtime变体(FP32、FP16和INT8逐张量),旨在运行时由基于距离的选择器路由。在CMU Mocap数据集上的评估初步表明,每个精度层级可以在其指定距离范围内提供服务,且可感知的退化可忽略不计,支持了距离感知的机器学习模型精度选择作为基于AI的角色动画的可行LOD策略这一更广泛的假设。

英文摘要

Modern game engines spend significant compute animating NPCs with learned motion models. This paper proposes AI Level of Detail (AI LOD), a framework in which machine learning inference precision is adapted based on the distance between each NPC and the player camera. The core idea mirrors classical geometry LOD: substitute a cheaper approximation where the difference is imperceptible. Here, the approximation is a lower-precision quantized machine learning model rather than a lower-polygon mesh. The contribution of this work is the AI LOD concept itself: that inference-time quantization can serve as the LOD axis for AI-driven character animation - and more broadly, for any AI-based runtime system where perceptual sensitivity varies with context. The convolutional sequence-to-sequence model of Li et al. is used as a representative example to demonstrate the concept, with its trained checkpoint exported into three ONNX Runtime variants (FP32, FP16, and INT8 per-tensor), intended to be routed by a distance-based selector at runtime. Evaluation on the CMU Mocap dataset provides initial evidence that each precision tier can be served at its assigned distance range with negligible perceptible degradation, supporting the broader premise that distance-aware ML model precision selection is a viable LOD strategy for AI-based character animation.
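A distance-based selector of the kind described above can be sketched in a few lines. The thresholds (10 and 30 scene units), the tier names, and the function names are assumptions for illustration only; the abstract does not state the distance ranges assigned to each precision tier.

```python
import math
from bisect import bisect_right
from collections import defaultdict

# Hypothetical distance thresholds separating the three precision tiers.
TIER_BOUNDS = (10.0, 30.0)
TIERS = ("fp32", "fp16", "int8")


def select_tier(distance, bounds=TIER_BOUNDS, tiers=TIERS):
    """Return the precision tier for an NPC at the given camera distance."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return tiers[bisect_right(bounds, distance)]


def route_npcs(camera_pos, npc_positions):
    """Group NPC ids by tier so each model variant can run one batch."""
    batches = defaultdict(list)
    for npc_id, pos in npc_positions.items():
        batches[select_tier(math.dist(camera_pos, pos))].append(npc_id)
    return dict(batches)
```

Grouping by tier before inference keeps each exported model variant to a single batched call per frame, which mirrors how geometry LOD buckets meshes by distance.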

2606.06573 2026-06-08 physics.flu-dyn cs.CL cs.LG eess.SP 交叉投稿

Multiscale POD of Transformer Attention Fields: Scale-Selective Analysis via Morlet Scalogram

Transformer注意力场的多尺度POD:基于Morlet尺度图的尺度选择性分析

Athanasios Zeris

发表机构 * Independent Researcher(独立研究者) Athens, Greece(希腊雅典)

AI总结 提出尺度选择性POD方法分析Transformer注意力场,通过Morlet小波识别时间尺度,提取各尺度能量主导模态,揭示层间尺度组织规律,无需架构修改或语言标注。

Comments 23 pages, 3 figures, 4 tables

详情
AI中文摘要

我们引入尺度选择性本征正交分解(POD)用于Transformer注意力场,受POD从湍流系综中提取能量主导模态的启发。Morlet连续小波变换识别文档系综中注意力滞后结构的主导时间尺度;然后POD从注意力场系综中提取每个尺度上的能量主导模态。得到的模态揭示了层依赖的尺度组织,早期层强调精细尺度,后期层转向较粗尺度。我们根据POD特征值衰减率定义谱集中指数,并经验性地表明该指数通过注意力场复杂度区分不同层。根据经典POD最优性定理,提取的模态最小化系综上的平均L2重构误差(定理1),为每层提供数据驱动的有效秩。该方法无需架构修改和语言标注:主导注意力模式仅从系综统计中涌现。湍流类比是结构性的而非物理性的:我们借用系综协方差和模态分析,而非流体动力学本身。

英文摘要

We introduce scale-selective Proper Orthogonal Decomposition (POD) for transformer attention fields, inspired by the use of POD for extracting energetically dominant modes from turbulent flow ensembles. The Morlet continuous wavelet transform identifies dominant temporal scales in the attention lag structure across a document ensemble; POD then extracts the energetically dominant modes at each scale from the ensemble of attention fields. The resulting modes reveal layer-dependent scale organisation, with early layers emphasising fine scales and later layers shifting toward coarser scales. We define a spectral concentration index from the POD eigenvalue decay rate and show empirically that it differentiates layers by their attention field complexity. By the classical POD optimality theorem, the extracted modes minimise the average L2 reconstruction error over the ensemble (Theorem 1), giving a data-driven effective rank for each layer. The method requires no architectural modification and no linguistic annotations: dominant attention patterns emerge from ensemble statistics alone. The turbulence analogy is structural rather than physical: we borrow ensemble covariance and modal analysis, not fluid dynamics itself.
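The classical POD optimality result invoked above can be stated compactly. The notation here is an assumption made for illustration (the abstract does not fix symbols): $a_1,\dots,a_N \in \mathbb{R}^n$ denote vectorised attention fields from the ensemble, taken without mean subtraction.

```latex
% Ensemble covariance and its eigen-decomposition (assumed notation)
C = \frac{1}{N}\sum_{i=1}^{N} a_i a_i^{\top}, \qquad
C\,\phi_k = \lambda_k \phi_k, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0 .

% Rank-r POD basis minimises the average L2 reconstruction error
\min_{\Phi_r^{\top}\Phi_r = I_r}\;
\frac{1}{N}\sum_{i=1}^{N} \bigl\| a_i - \Phi_r \Phi_r^{\top} a_i \bigr\|_2^2
= \sum_{k=r+1}^{n} \lambda_k ,
\qquad \text{attained at } \Phi_r = [\phi_1, \dots, \phi_r] .
```

The residual $\sum_{k>r}\lambda_k$ is what makes a data-driven effective rank possible: a fast eigenvalue decay means few modes capture most of the ensemble energy.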

2606.06601 2026-06-08 cs.CV cs.AI cs.LG 交叉投稿

Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

通过分解视觉代理实现直接3D感知物体插入

Jingbo Gong, Yikai Wang, Yushi Lan, Yuhao Wan, Ziheng Ouyang, Rui Zhao, Ming-Ming Cheng, Qibin Hou, Chen Change Loy

发表机构 * Google(谷歌) Black Forest Labs(黑森林实验室)

AI总结 提出DIRECT框架,通过分解外观、几何和上下文引导,实现可控制3D姿态的物体插入,在几何可控性和视觉质量上优于现有方法。

Comments ICML 2026; Project Page: https://gong1130.github.io/DIRECT/

详情
AI中文摘要

物体插入旨在将参考对象无缝合成到背景图像的指定区域。最近的基于扩散的方法实现了高视觉质量,但将插入视为简单的2D修复任务,无法显式控制对象的3D姿态,限制了其实用性。我们提出DIRECT(用于参考组合和目标集成的分解注入),一种新颖框架,将交互式姿态操作与高保真2D图像合成相结合,实现姿态可控的物体插入。我们的方法将插入条件分解为三个互补组件:从参考对象捕获视觉细节的外观引导、从用户调整的3D代理派生的几何引导以及来自目标背景的上下文引导。通过将它们注入到不同路径,DIRECT避免了特征纠缠,同时保留了参考外观、遵循用户指定的姿态并使对象适应目标场景。我们还引入了一个自动数据构建流程,以提高训练数据的多样性和质量。实验表明,DIRECT在几何可控性和视觉质量方面均优于先前方法。

英文摘要

Object insertion aims to seamlessly composite a reference object into a specified region of a background image. Recent diffusion-based methods achieve high visual quality but formulate insertion as a simple 2D inpainting task, providing no explicit control over the object's 3D pose and limiting their practical applicability. We propose DIRECT (Decomposed Injection for Reference Composition and Target-integration), a novel framework that integrates interactive pose manipulation with high-fidelity 2D image synthesis to enable pose-controllable object insertion. Our method decomposes the insertion conditions into three complementary components: appearance guidance capturing visual details from the reference object, geometry guidance derived from the user-adjusted 3D proxy, and context guidance from the target background. By injecting them through separate pathways, DIRECT avoids feature entanglement and simultaneously preserves reference appearance, follows the user-specified pose, and adapts the object to the target scene. We also introduce an automated data construction pipeline to improve the diversity and quality of training data. Experiments show that DIRECT outperforms previous methods in both geometric controllability and visual quality.

2606.06615 2026-06-08 cs.SD cs.AI cs.LG eess.AS 交叉投稿

FIGMA: Towards FIne-Grained Music retrievAl

FIGMA:迈向细粒度音乐检索

Nishit Anand, Ashish Seth, Sreyan Ghosh, Dinesh Manocha, Ramani Duraiswami

发表机构 * University of Maryland, College Park(马里兰大学学院公园分校)

AI总结 针对现有音乐检索模型无法处理细粒度属性查询的问题,提出多视角对比架构FIGMA,通过联合优化全局音频-文本对齐和帧级标记对齐,在统一表示空间中捕获高层语义和细粒度音乐属性,并在新构建的细粒度音乐描述数据集上取得显著提升。

Comments Accepted to ACL 2026. Project Website: https://nishitanand.github.io/figma-website/

详情
AI中文摘要

使用自然语言描述检索音乐已通过对比音频-文本模型(如CLAP)得到改进,但当前系统仍局限于粗粒度语义查询。当描述指定细粒度音乐属性(如速度、调性、和弦进行或节奏结构)时,现有模型通常无法检索到正确的音频。我们表明,这一限制源于对比学习目标本身:尽管在长描述上训练,基于CLAP的模型实际上仅利用前几个标记,丢弃了详细提示中编码的大量信息。然后,我们提出FIGMA(细粒度音乐检索),一种多视角对比架构,通过联合优化全局音频-文本对齐和帧级、标记级对齐来解决这一限制。该设计使FIGMA能够在统一表示空间中捕获高层语义上下文和细粒度音乐属性。此外,我们形式化了细粒度音乐检索任务,并构建了细粒度音乐描述数据集(FGMCaps),一个包含38万音乐-描述对的大规模训练数据集以及1万测试集,两者都标注了速度、调性、和弦进行、节拍数以及流派和情绪。大量实验表明,FIGMA在多个音乐检索基准(包括域外评估)上持续优于现有基于CLAP的音乐检索模型,相对改进高达73.3%。

英文摘要

Retrieving music using natural language descriptions has improved with contrastive audio-text models such as CLAP, but current systems remain limited to coarse semantic queries. When descriptions specify fine-grained musical attributes such as tempo, key, chord progression, or rhythmic structure, existing models often fail to retrieve the correct audio. We show that this limitation stems from the contrastive learning objective itself: despite being trained on long captions, CLAP-based models effectively utilize only the first few tokens, discarding much of the information encoded in detailed prompts. Then, we propose FIGMA (FIne-Grained Music RetrievAl), a multi-view contrastive architecture that addresses this limitation by jointly optimizing global audio-text alignment and frame-level, token-wise alignment. This design enables FIGMA to capture both high-level semantic context and fine-grained musical attributes within a unified representation space. Moreover, we formalize the task of Fine-Grained Music Retrieval and construct Fine-Grained Music Caption dataset (FGMCaps), a large-scale dataset of 380K music-caption pairs for training along with a 10K test set, both annotated with tempo, key, chord progression, beat count, as well as genre and mood. Extensive experiments demonstrate that FIGMA consistently outperforms existing CLAP-based music retrieval models across multiple music retrieval benchmarks, including out-of-domain evaluations, with relative improvements of up to 73.3%.

2606.06618 2026-06-08 cs.RO cs.AI cs.LG 交叉投稿

ChronoForest: Closed-Loop Multi-Tree Diffusion Planning for Efficient Bridge Search and Route Composition

ChronoForest: 用于高效桥接搜索和路线组合的闭环多树扩散规划

Jungmin Seo, Jaesik Park

发表机构 * Seoul National University(首尔国立大学)

AI总结 针对仅依赖短程离线轨迹进行长程路线规划的问题,提出ChronoForest系统,通过锚链树扩散规划器和在线多树协调器实现局部桥接搜索与全局路线重解,在OGBench和哈密顿路线组合基准上显著提升成功率和效率。

Comments 40 pages, 4 figures, 7 tables, 3 algorithms

详情
AI中文摘要

当仅有短程离线轨迹可用时,我们如何规划到达指定目标、访问必经航点且保持路径短的长程路线?这一问题在离线导航中至关重要,因为收集足够丰富的长程数据十分困难,但真实智能体仍需以路线级效率(而非仅仅可行性)解决长程任务。难点有两方面:在微观层面,组合多个短程片段会在搜索代价和路径质量之间产生权衡;在宏观层面,航点排序需要比较起点、目标和航点锚点之间的成对旅行代价,而这些锚点在规划前未知,且仅通过长程时间距离估计时可靠性下降。本文提出ChronoForest,一种闭环规划系统,通过锚链树扩散规划器和在线多树协调器,将局部桥接搜索与在线路线重解耦合。ChronoForest利用时间距离进行短程引导和节点评估,同时利用搜索时的桥接证据验证长程锚点连通性,并反复重解路线。在OGBench AntMaze-Stitch上,ChronoForest在中等、大型和巨型分片上分别达到99.8%、99.3%和99.5%的成功率,并在巨型拼接任务上相比先前报道的扩散方法提升高达34.5个百分点。在哈密顿路线组合基准上,在线重解纠正了较差的时间排序,提升了路线质量,同时代价远低于穷举规划。

英文摘要

How can we plan long-horizon routes that reach designated goals, visit required waypoints, and remain short when only short-horizon offline trajectories are available? This problem matters in offline navigation because collecting sufficiently rich long-horizon data is difficult, yet real agents must still solve long-range tasks with route-level efficiency rather than mere feasibility. The difficulty is twofold: at the microscopic level, composing many short-horizon segments creates a trade-off between search cost and path quality, while at the macroscopic level, waypoint ordering requires comparing pairwise travel costs among start, goal, and waypoint anchors that are unknown before planning and increasingly unreliable when estimated only from long-range temporal distance. In this paper, we propose ChronoForest, a closed-loop planning system that couples local bridge search and online route re-solving through an anchor-chaining tree diffusion planner and an online multi-tree orchestrator. ChronoForest uses temporal distance for short-range guidance and node evaluation, while using search-time bridge evidence to validate long-range anchor connectivity and repeatedly re-solve the route. On OGBench AntMaze-Stitch, ChronoForest achieves 99.8%, 99.3%, and 99.5% success on the medium, large, and giant splits and improves giant-stitch success by up to 34.5 points over prior reported diffusion-based results. On Hamiltonian route-composition benchmarks, online re-solving corrects poor temporal orderings and improves route quality while remaining substantially cheaper than exhaustive planning.

2606.06627 2026-06-08 cs.RO cs.AI cs.CV cs.LG 交叉投稿

What Matters When Cotraining Robot Manipulation Policies on Everyday Human Videos?

在日常生活人类视频上协同训练机器人操作策略时什么因素重要?

Richard Li, Aditya Prakash, Andrew Wen, Saurabh Gupta, Yilun Du, Pulkit Agrawal

发表机构 * Massachusetts Institute of Technology(麻省理工学院) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Harvard University(哈佛大学)

AI总结 研究利用日常互联网视频协同训练机器人操作策略时,手部姿态质量和运动差距对迁移的影响,提出一种协同训练方法,在低机器人数据场景下六个操作任务中绝对成功率提升29.7%。

Comments The project website is here: https://richardrl.github.io/what-matters-cotraining-human-videos/index.html

详情
AI中文摘要

用于协同训练机器人操作策略的人类视频数据集主要由精心策划的演示组成,其中动作被编排成类似机器人行为,并且使用专用硬件捕获3D手部姿态。更丰富的数据源是日常互联网视频,但哪些因素能够实现从这些视频到机器人的迁移仍是一个开放问题。我们使用一个新的数据集(包含532个人类视频,共28小时的高质量三角测量手部标签和自然动作)对此进行研究。我们发现手部姿态质量影响迁移,但即使手部姿态准确,固有的运动差距也会阻碍迁移,除非视觉和策略网络针对每种具身形态进行专门化。我们的协同训练方法在低机器人数据场景下,在六个操作任务中绝对成功率提升29.7%,并带来一致的改进。

英文摘要

Human video datasets used for cotraining robot manipulation policies largely consist of curated demonstrations where motions are orchestrated to resemble robot behavior and 3D hand poses are captured with specialized hardware. A more plentiful source of data is everyday Internet video, but it is an open question what factors enable transfer from such videos to robots. We investigate this using a new dataset of 532 human videos with 28 hours of high-quality triangulated hand labels and natural motions. We find that hand pose quality affects transfer, but even with accurate hands, the inherent motion gap hinders transfer unless the vision and policy networks specialize to each embodiment. Our cotraining recipe yields consistent improvements, with an absolute success rate gain of 29.7% in the low-robot-data regime across six manipulation tasks.

2606.06664 2026-06-08 cs.CV cs.AI cs.LG 交叉投稿

Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

内在视觉:神经科学启发的概念电路用于解释和引导视觉变换器

Tang Li, Yanlin Chen, Mengmeng Ma, Xi Peng

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出ViSAE工具箱,通过神经科学启发的概念电路解释视觉变换器内部机制,包含高效概念集、自动电路追踪算法和概念编辑应用,在WaterBirds上最差组准确率提升48.2%。

Comments In Proceedings of the International Conference on Machine Learning, 2026. (acceptance rate 26.6%)

详情
AI中文摘要

尽管视觉变换器(ViT)具有高准确率,但其预测可能受到虚假线索的驱动,因此在安全部署前需要理解其内部工作机制。稀疏自编码器(SAE)为将模型表示分解为人类可解释的概念提供了有前景的视角,但由于对概念覆盖范围的控制有限以及特征解释的主观性和不可扩展性,将基于SAE的解释方法应用于ViT仍然具有挑战性。为填补这些空白,受神经科学启发原理的驱动,我们提出了ViSAE,一个通过概念电路理解ViT内部工作机制的机械可解释性工具箱。ViSAE包含三个组成部分:(1)一个包含64K图像和16K视觉基础概念词汇的探测套件,与ImageNet相比,概念覆盖效率提高了20倍,与现有概念集相比,解释准确率提高了28.7%。(2)自上而下的概念读取和自下而上的电路追踪算法,通过概念电路自动恢复ViT内部工作机制。(3)用于审计和引导ViT行为的应用。通过概念编辑,ViSAE在WaterBirds上将最差组准确率提高了48.2%,比现有方法高出23.8%。我们的数据和代码:https://github.com/deep-real/ViSAE。

英文摘要

Despite high accuracy, Vision Transformer (ViT) predictions can be driven by spurious cues, raising the need to understand their inner workings before safe deployment. Sparse autoencoders (SAEs) provide a promising lens for decomposing model representations into human-interpretable concepts, yet adapting SAE-based interpretation to ViTs remains challenging due to limited control over concept coverage and subjective, non-scalable feature interpretation. To fill the gaps, motivated by neuroscience-inspired principles, we propose ViSAE, a mechanistic interpretability toolbox for understanding ViT inner workings through concept circuits. ViSAE consists of three components: (1) A probing suite with 64K images and a 16K visually grounded concept vocabulary, improving concept coverage efficiency by 20x over ImageNet and interpretation accuracy by 28.7% over existing concept sets. (2) Top-down concept reading and Bottom-up circuit tracing algorithms that automatically recover ViT inner workings via concept circuits. (3) Applications for auditing and steering ViT behavior. Through concept editing, ViSAE improves the worst-group accuracy on WaterBirds by 48.2%, outperforming existing methods by 23.8%. Our data and code: https://github.com/deep-real/ViSAE.

2606.06715 2026-06-08 cs.CL cs.AI cs.LG 交叉投稿

Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles

主题情感是否导致感知意识形态?比较政治新闻文章中人类与LLM的标注

Upasana Chatterjee

发表机构 * Columbia University(哥伦比亚大学)

AI总结 研究主题情感对感知政治意识形态的因果效应,通过比较人类与LLM标注,发现微调GPT-4o-mini产生显著因果效应,归因于捷径学习。

Comments Accepted to ACL SRW 2026

详情
AI中文摘要

我们探究主题情感是否对感知政治意识形态具有因果效应,以及答案是否取决于意识形态标签的分配者。使用来自AllSides的文章,结合Llama-3.3-70b-versatile的共享情感标注,我们比较了来自专家人类标注者、GPT-4o-mini(基线和微调)以及Llama-3.3-70B的意识形态标签。我们应用双重机器学习(DML)和社区级中介分析于所有四种标注范式。人类标注在社区水平未产生显著因果效应。微调后的GPT-4o-mini达到了最高的分类准确率(F1=72.48),并且是唯一在社区水平产生显著处理效应和中介中显著自然直接效应(NDE)的标注范式。我们将此解释为捷径学习的证据:对意识形态标签数据进行微调导致模型内化了一种虚假的情感-意识形态耦合,而这种耦合在人类判断中对此任务并不起作用。这种耦合在基于F1的评估中结构上不可见,对LLM标注作为银标签以及在下游因果分析中作为人类判断的代理的使用具有影响。

英文摘要

We ask whether topic sentiment has a causal effect on perceived political ideology, and whether the answer depends on who assigns the ideology label. Using articles from AllSides, paired with shared sentiment annotations from Llama-3.3-70b-versatile, we compare ideology labels from expert human annotators, GPT-4o-mini (baseline and finetuned), and Llama-3.3-70B. We apply Double Machine Learning (DML) and community-level mediation analysis across all four annotation paradigms. Human annotations yield no significant causal effects at the community level. Fine-tuned GPT-4o-mini achieves the highest classification accuracy (F1=72.48) and is the only annotator paradigm that produces significant community-level treatment effects and significant natural direct effects (NDEs) in mediation. We interpret this as evidence of shortcut learning: fine-tuning on ideology-labeled data causes the model to internalise a spurious sentiment--ideology coupling not operative in human judgment for this task. This coupling is structurally invisible to F1-based evaluation, with implications for the use of LLM annotations as silver labels and as proxies for human judgment in downstream causal analyses.

2606.06741 2026-06-08 cs.AI cs.CL cs.LG 交叉投稿

OpenSkill: Open-World Self-Evolution for LLM Agents

OpenSkill: 面向LLM智能体的开放世界自我进化

Zhiling Yan, Dingjie Song, Hanrong Zhang, Wei Liang, Yuxuan Zhang, Yutong Dai, Lifang He, Philip S. Yu, Ran Xu, Xiang Li, Lichao Sun

发表机构 * Lehigh University(里海大学) University of Illinois Chicago(伊利诺伊大学芝加哥分校) University of British Columbia(不列颠哥伦比亚大学) Vector Institute(向量研究所) Salesforce AI Research(Salesforce人工智能研究院) Massachusetts General Hospital and Harvard Medical School(麻省总医院和哈佛医学院)

AI总结 提出OpenSkill框架,使智能体在无目标任务监督下,利用开放世界资源自举构建技能和验证信号,实现自我进化,在多个基准上取得最佳自动通过率。

Comments 20 pages, 4 figures and 8 tables. Code is available at https://github.com/OpenLAIR/OpenSkill

详情
AI中文摘要

自我进化智能体需要在部署后进行适应,但现有方法假设存在可用的学习循环,例如精心策划的技能、成功的轨迹或验证信号。真实的开放世界部署可能不提供这些,只提供一个任务提示。在这项工作中,我们研究开放世界自我进化,其中智能体必须从零开始构建其技能和自身的验证信号,使用开放世界资源但没有目标任务监督。我们提出OpenSkill,一个启动这个循环的框架:它从文档、代码库和网络中获取基础知识和验证锚点,将它们综合成可迁移的技能,并根据自建的虚拟任务(基于锚点而非目标答案)来优化这些技能。因此,开放世界既提供了要学习的知识,也提供了一个独立于监督的练习环境,目标任务监督保留用于最终评估。在三个基准和两个目标智能体上,OpenSkill在满足无监督约束的同时取得了最佳自动通过率。分析表明,其技能无需特定模型适应即可跨模型迁移,并且其自建验证器与真实结果一致,尽管从未访问过这些结果。

英文摘要

Self-evolving agents require adaptation after deployment, but existing approaches assume a usable learning loop, such as curated skills, successful trajectories, or verifier signals. Real open-world deployments may provide none of these, offering only a task prompt. In this work, we study open-world self-evolution, where an agent must build both its skills and its own verification signals from scratch, using open-world resources but no target-task supervision. We propose OpenSkill, a framework that bootstraps this loop: it acquires grounded knowledge and verification anchors from documentation, repositories, and the web, synthesizes them into transferable skills, and refines those skills against self-built virtual tasks grounded in the anchors rather than in target answers. The open world thus supplies both the knowledge to be learned and a supervision-independent practice environment, with target-task supervision reserved for final evaluation. Across three benchmarks and two target agents, OpenSkill attains the best automated pass rate while satisfying the no-supervision constraint. Analysis shows its skills transfer across models without model-specific adaptation, and its self-built verifier aligns with ground-truth outcomes despite never accessing them.

2606.06748 2026-06-08 cs.CL cs.AI cs.LG 交叉投稿

Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection

检索增强生成中的证据图一致性:基于模型的幻觉检测分析

Jianru Shen

AI总结 提出证据图一致性(EGC)框架,通过构建局部证据图并计算五种结构一致性指标检测幻觉,发现不同模型族间一致性特征方向相反,表明嵌入图一致性不能作为模型无关的检测信号。

Comments Accepted at the International Conference on Advanced Machine Learning and Data Science; to appear in the IEEE Xplore proceedings

详情
AI中文摘要

检索增强生成(RAG)减少了但并未消除大型语言模型中的幻觉。现有检测方法依赖于生成答案与检索段落之间的平面相似性,忽略了证据片段与答案声明之间的结构关系。我们提出了证据图一致性(EGC)框架,该框架为每个响应构建一个局部证据图,并计算五种结构一致性度量作为幻觉指标。在RAGTruth的完整问答拆分上,跨六个LLM(5,767个响应)进行评估,EGC揭示了一个一致的模型族分裂:图一致性特征在Llama-2模型中显示出预期的诊断方向,但在GPT-4、GPT-3.5和Mistral-7B中表现出系统性逆转。这种逆转表明不同模型族之间存在定性的不同幻觉模式,并表明基于嵌入的图一致性不能作为模型无关的幻觉检测信号。

英文摘要

Retrieval-Augmented Generation (RAG) reduces but does not eliminate hallucination in large language models. Existing detection methods rely on flat similarity between generated answers and retrieved passages, ignoring structural relationships among evidence pieces and answer claims. We propose Evidence Graph Consistency (EGC), a framework that constructs a local evidence graph per response and computes five structural consistency measures as hallucination indicators. Evaluated on the full question answering split of RAGTruth across six LLMs (5,767 responses), EGC reveals a consistent model-family split: graph consistency features show the expected diagnostic direction for hallucinations in Llama-2 models but exhibit systematic reversal in GPT-4, GPT-3.5, and Mistral-7B. This reversal suggests qualitatively different hallucination patterns across model families and indicates that embedding-based graph consistency cannot serve as a model-independent hallucination detection signal.

2606.06764 2026-06-08 stat.ML cs.AI cs.LG 交叉投稿

Optimal Rates for Generalization of Gradient Descent Methods with Deep Neural Networks

深度神经网络梯度下降方法的泛化最优速率

Junyu Zhou, Puyu Wang, Yunwen Lei, Yiming Ying, Ding-Xuan Zhou

发表机构 * Mathematical Institute for Machine Learning and Data Science, KU Eichstätt-Ingolstadt(机器学习与数据科学数学研究所,艾希施泰特-英戈尔施塔特天主教大学) Department of Computer Science, RPTU Kaiserslautern-Landau(计算机科学系,凯撒斯劳滕-兰道大学) Department of Mathematics, University of Hong Kong(数学系,香港大学) School of Mathematics and Statistics, University of Sydney(数学与统计学学院,悉尼大学)

AI总结 本文针对深度ReLU网络,在神经正切核(NTK)机制下,首次建立了梯度下降(GD)和随机梯度下降(SGD)的极小极大最优泛化误差速率,证明宽度足够时可达核方法的最优速率。

Comments 39 pages, 1 table

详情
AI中文摘要

近年来,在神经正切核(NTK)机制下,对于过参数化神经网络的梯度下降方法的统计泛化性能的理解取得了进展。然而,现有关于回归问题的工作大多局限于浅层网络架构,在深度神经网络理论中留下了显著的空白。本文通过为使用梯度下降(GD)和随机梯度下降(SGD)训练的深度ReLU网络提供全面的泛化分析来填补这一空白。具体来说,我们首次建立了深度ReLU网络的GD和SGD在超额总体风险上的极小极大最优速率,假设网络宽度与网络深度和训练样本规模呈多项式关系。我们的结果表明,在足够宽度下,深度ReLU网络的梯度下降方法能够达到与核方法相当的最优泛化速率。

英文摘要

Recent progress has been made in understanding the statistical generalization performance of gradient descent methods for overparameterized neural networks within the neural tangent kernel (NTK) regime. However, most of the existing work on regression problems is limited to shallow network architectures, leaving a notable gap in the theory of deep neural networks. This paper addresses this gap by presenting a comprehensive generalization analysis for deep ReLU networks trained using gradient descent (GD) and stochastic gradient descent (SGD). Specifically, we establish the first known minimax-optimal rates of excess population risk for both GD and SGD with deep ReLU networks, under the assumption that the network width scales polynomially with respect to the network depth and training sample size. Our results demonstrate that with sufficient width, gradient descent methods for deep ReLU networks can achieve optimal generalization rates on par with kernel methods.

2606.06765 2026-06-08 cond-mat.mtrl-sci cs.LG 交叉投稿

Reactivity-Informed Machine Learning for Performance Prediction and Design Space Exploration of Alkali-Activated Slag

反应性信息驱动的机器学习用于碱激发矿渣性能预测与设计空间探索

Qiyao He, Zhanzhao Li, Kai Gong

发表机构 * Department of Civil and Environmental Engineering, Rice University(莱斯大学土木与环境工程系) Rice Advanced Materials Institute, Rice University(莱斯大学先进材料研究所) Ken Kennedy Institute, Rice University(莱斯大学肯·肯尼迪研究所)

AI总结 通过整合矿渣反应性描述符(AMODE)与机器学习,从最大规模文献数据集预测碱激发矿渣抗压强度,揭示物理一致趋势并探索低碳设计空间。

Comments 68 pages, 14 figures, 2 tables

详情
AI中文摘要

在胶凝材料中,建立配合比、原材料性能、养护条件和性能之间的定量关系一直是一个长期挑战,特别是对于具有可变前驱体和激发剂化学性质的碱激发材料。在此,我们整理了迄今为止最大的文献来源碱激发矿渣(AAS)数据集,包含超过3100条抗压强度记录、155种化学性质不同的粒化高炉矿渣(GGBS)以及24个属性,涵盖前驱体化学性质、细度和反应性。多种机器学习(ML)算法在逐步丰富的特征场景下进行了基准测试,表明整合GGBS成分、细度、养护条件和试件几何形状可提高预测性能。平均金属氧化物解离能(AMODE)作为前驱体反应性的物理可解释表示,为显式氧化物成分提供了紧凑的替代描述符,同时实现了相当的预测性能。模型解释揭示了来自异构数据的物理一致趋势,包括Na2O用量和硅酸盐模量的非单调效应、较高含水量和较大试件尺寸下预测强度降低,以及AMODE比单个氧化物含量更连贯地表示的耦合氧化物级效应。统计约束的设计空间探索揭示了强度、隐含CO2排放和成本之间依赖于反应性的权衡。设计图识别出高强度区域,其CO2排放量显著低于类似成本的OPC基参考。总体而言,这项工作展示了反应性信息驱动的ML如何从异构AAS数据中提取物理有意义的趋势,并指导源依赖性粘结剂设计。整理后的数据集可公开访问,以支持水泥和混凝土研究的进展。

英文摘要

Establishing quantitative relationships among mix design, raw material properties, curing conditions, and performance remains a long-standing challenge in cementitious materials, particularly for alkali-activated materials with variable precursor and activator chemistry. Here, we curated the largest literature-derived alkali-activated slag (AAS) dataset to date, comprising over 3100 compressive strength records, 155 chemically distinct ground granulated blast-furnace slags (GGBSs), and 24 attributes incorporating precursor chemistry, fineness, and reactivity. Multiple machine learning (ML) algorithms were benchmarked across progressively enriched feature scenarios, demonstrating that integrating GGBS compositions, fineness, curing conditions, and specimen geometry improves predictive performance. The average metal oxide dissociation energy (AMODE), a physically interpretable representation of precursor reactivity, provides a compact alternative descriptor to explicit oxide compositions while enabling comparable predictive performance. Model interpretation revealed physically consistent trends from heterogeneous data, including non-monotonic effects of Na2O dosage and silicate modulus, reduced predicted strength at higher water content and larger specimen size, and coupled oxide-level effects more coherently represented by AMODE than by individual oxide contents. Statistically constrained design space exploration reveals reactivity-dependent trade-offs among strength, embodied CO2 emissions, and cost. The design maps identify high-strength regions with substantially lower CO2 emissions than OPC-based references at similar cost. Overall, this work demonstrates how reactivity-informed ML can extract physically meaningful trends from heterogeneous AAS data and guide source-dependent binder design. The curated dataset is publicly accessible to support advances in cement and concrete research.

2606.06772 2026-06-08 stat.ML cs.AI cs.LG 交叉投稿

Generalization in Deep Neural Networks: Minimax Rates for Gradient Methods

深度神经网络的泛化:梯度方法的极小极大速率

Junyu Zhou, Puyu Wang, Yunwen Lei, Marius Kloft, Yiming Ying

发表机构 * Mathematical Institute for Machine Learning and Data Science, Catholic University of Eichstätt-Ingolstadt(机器学习与数据科学数学研究所,艾希施泰特-英戈尔施塔特天主教大学) Department of Computer Science, RPTU Kaiserslautern-Landau(计算机科学系,凯撒斯劳滕-兰道大学) Department of Mathematics, The University of Hong Kong(数学系,香港大学) School of Mathematics and Statistics, The University of Sydney(数学与统计学学院,悉尼大学)

AI总结 本文建立了过参数化深度神经网络与核方法学习动力学的联系,证明了梯度下降和随机梯度下降在足够宽度下能达到极小极大最优的泛化误差速率。

Comments 37 pages

详情
AI中文摘要

理解过参数化神经网络的泛化性能已成为深度学习理论的核心课题。尽管近期进展,特别是神经正切核(NTK)机制下的工作,揭示了浅层架构的行为,但深度神经网络(DNN)的统计泛化性质,尤其是在回归任务中,仍远未得到充分理解。本文通过提供使用梯度方法训练的DNN的全面泛化分析,在弥合这一差距方面取得了重大进展。首先,我们首次建立了使用梯度方法训练的、具有光滑激活函数的DNN的学习动态与核方法的学习动态之间的关键联系,表明过参数化DNN上的梯度方法可以完全继承其核对应物的有利学习动态。基于这一联系以及核方法已确立的最优性,我们推导出了梯度下降(GD)和随机梯度下降(SGD)的超额总体风险的首个已知的极小极大最优速率,假设网络宽度与样本大小成多项式比例。我们的结果表明,在足够宽度下,由GD或SGD训练的DNN可以实现与基于核的方法相当的泛化性能。

英文摘要

Understanding the generalization performance of over-parameterized neural networks has become a central topic in deep learning theory. While recent advances, particularly works under the Neural Tangent Kernel (NTK) regime, have shed light on the behavior of shallow architectures, the statistical generalization properties of deep neural networks (DNNs), especially in regression tasks, remain far less understood. In this paper, we make significant progress toward closing this gap by providing a comprehensive generalization analysis of DNNs trained using gradient-based methods. First, we establish, for the first time, a crucial connection between the learning dynamics of a DNN with smooth activation functions trained via gradient-based methods and those of kernel methods, showing that gradient-based methods on over-parameterized DNNs can fully inherit the favorable learning dynamics of their kernel counterparts. Building on this connection and the well-established optimality of kernel methods, we derive the first known minimax-optimal rates for the excess population risk of both gradient descent (GD) and stochastic gradient descent (SGD), under the assumption that network width scales polynomially with the sample size. Our results demonstrate that, with sufficient width, DNNs trained by GD or SGD can achieve generalization performance comparable to kernel-based methods.

2606.06782 2026-06-08 cs.IT cs.LG math.IT math.ST stat.ML stat.TH 交叉投稿

The Sharp Phase Transition of Tyler's M-Estimator for Robust Subspace Recovery

Tyler's M-估计器在鲁棒子空间恢复中的尖锐相变

Gilad Lerman, Teng Zhang

发表机构 * School of Mathematics, University of Minnesota(明尼苏达大学数学系) Department of Mathematics, University of Central Florida(中央佛罗里达大学数学系)

AI总结 研究Tyler's M-估计器在临界信噪比DS-SNR=1时的行为,证明其收敛到真实子空间,建立尖锐相变。

详情
AI中文摘要

鲁棒子空间恢复(RSR)旨在从被异常值严重污染的数据集中识别潜在的d维子空间。复杂性理论结果基于维度缩放信噪比(DS-SNR)建立了问题计算难度的阈值:当DS-SNR严格小于1时,问题是SSE难的;当它大于1时,在一般位置假设下可通过实用算法求解。然而,在临界边界DS-SNR=1处实用算法的确切行为一直未知。本文解决了Tyler's M-估计器(TME)在此临界边界的行为,从而建立了尖锐相变。具体地,我们证明在一种新的稳定性条件下,当DS-SNR≥1时,TME精确收敛到真实子空间,该条件比先前文献中使用的一般位置假设更宽松。我们的分析利用了在majorization-minimization框架内对TME迭代的分解。

英文摘要

Robust Subspace Recovery (RSR) aims to identify an underlying d-dimensional subspace from a dataset heavily corrupted by outliers. Complexity-theoretic results establish a threshold for the problem's computational hardness based on the dimension-scaled signal-to-noise ratio (DS-SNR): the problem is SSE-hard when the DS-SNR is strictly less than 1, and solvable via practical algorithms when it is greater than 1 under general position assumptions. However, the exact behavior of practical algorithms at the critical boundary DS-SNR = 1 has remained unknown. This work resolves the behavior of Tyler's M-estimator (TME) at this critical boundary, consequently establishing a sharp phase transition. Specifically, we prove that TME converges exactly to the true subspace for DS-SNR ≥ 1 under a new stability condition, which is less restrictive than the general position assumptions used in prior literature. Our analysis utilizes a decomposition of the TME iterates within a majorization-minimization framework.
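
As a rough illustration of the estimator named in the abstract, the sketch below runs the standard fixed-point iteration for Tyler's M-estimator and reads off the top-d eigenvectors as the subspace estimate. The function names, the small ridge term `eps` (added only for numerical stability once the scatter matrix becomes nearly singular), the iteration count, and the synthetic data are assumptions made for this example; it does not reproduce the paper's stability condition or its analysis at DS-SNR = 1.

```python
import numpy as np

def tyler_m_estimator(X, n_iter=100, eps=1e-6):
    """Fixed-point iteration for Tyler's M-estimator on rows of X (N x D).

    eps is a hypothetical ridge term, not part of the classical estimator.
    """
    _, D = X.shape
    sigma = np.eye(D) / D
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma + eps * np.eye(D))
        # Quadratic forms x_i^T Sigma^{-1} x_i for every sample.
        q = np.einsum("ij,jk,ik->i", X, inv, X)
        weights = 1.0 / q
        # Reweighted scatter; the usual D/N factor is absorbed by the
        # trace normalisation on the next line.
        sigma = (X * weights[:, None]).T @ X
        sigma /= np.trace(sigma)
    return sigma

def top_subspace(sigma, d):
    """Orthonormal basis (D x d) spanned by the d leading eigenvectors."""
    _, vecs = np.linalg.eigh(sigma)  # eigenvalues in ascending order
    return vecs[:, -d:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, d, n_in, n_out = 10, 2, 100, 100
    basis, _ = np.linalg.qr(rng.normal(size=(D, d)))
    inliers = rng.normal(size=(n_in, d)) @ basis.T  # lie on the subspace
    outliers = rng.normal(size=(n_out, D))          # generic directions
    X = np.vstack([inliers, outliers])
    est = top_subspace(tyler_m_estimator(X), d)
    # Distance between the two orthogonal projectors (Frobenius norm).
    print(np.linalg.norm(est @ est.T - basis @ basis.T))
```

In this toy setup the DS-SNR, taken here as (n_in / d) / (n_out / (D - d)), equals 4, so the example sits well inside the regime where recovery was already known to be possible rather than at the critical boundary.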

2606.06785 2026-06-08 stat.ML cs.LG math.DS 交叉投稿

Empirical Transfer Operators and Finite-Sample Change Detection for Noisy Expanding Interval Maps

经验转移算子与含噪扩张区间映射的有限样本变化检测

Aparna Rajput

发表机构 * Department of Mathematics and Statistics, Concordia University(数学与统计学系,康科迪亚大学)

AI总结 针对一维含噪动力系统,提出基于分区经验转移矩阵的有限样本变化检测方法,通过比较滑动窗口与基线段的平稳分布L1距离来检测不变密度变化,并给出有限样本界和误报保证。

Comments 27 pages, 2 tables, 1 figure

详情
AI中文摘要

我们研究了一维含噪动力系统的有限样本变化检测,使用基于分区的经验近似来刻画平稳行为。给定区间值过程的观测,我们对状态空间进行划分,从观测到的分区元素之间的转移中估计一个有限转移矩阵,并应用一个小的Doeblin型正则化以确保唯一的平稳分布。从初始参考段,我们计算基线经验平稳分布\(\widehat{\pi}_{0,\rho}\)。对于每个后续滑动窗口,我们计算\(\widehat{\pi}_{t,\rho}\)并定义得分\[ S_t=\|\widehat{\pi}_{t,\rho}-\widehat{\pi}_{0,\rho}\|_1. \] \(S_t\)的大值表示相对于基线的平稳行为发生变化。该统计量检测不变密度或平稳定律的变化,但不检测转移动态的所有可能变化。在关于经验转移集中性、有限状态平稳分布稳定性、分区近似、正则化偏差和噪声稳定性的明确假设下,我们推导了经验平稳密度的有限样本界。该界将采样误差、正则化偏差、分区近似误差和噪声偏差分开。然后,我们得到了单窗口误报保证,以及当不变密度变化超过估计误差时的充分检测条件。我们在合成含噪beta映射变点实验中展示了该方法。

英文摘要

We study finite-sample change detection for one-dimensional noisy dynamical systems using partition-based empirical approximations of stationary behaviour. Given observations from an interval-valued process, we partition the state space, estimate a finite transition matrix from observed transitions between partition elements, and apply a small Doeblin-type regularisation to ensure a unique stationary distribution. From an initial reference segment, we compute a baseline empirical stationary distribution \(\widehat{\pi}_{0,\rho}\). For each later sliding window, we compute \(\widehat{\pi}_{t,\rho}\) and define the score \[ S_t=\|\widehat{\pi}_{t,\rho}-\widehat{\pi}_{0,\rho}\|_1. \] Large values of \(S_t\) indicate a change in stationary behaviour relative to the baseline. The statistic detects changes in invariant density or stationary law, but not all possible changes in transition dynamics. Under explicit assumptions on empirical transition concentration, finite-state stationary distribution stability, partition approximation, regularisation bias, and noise stability, we derive a finite-sample bound for the empirical stationary density. The bound separates sampling error, regularisation bias, partition approximation error, and noise bias. We then obtain a single-window false-alarm guarantee and a sufficient detection condition when the invariant density changes by more than the estimation error. We illustrate the method on synthetic noisy beta-map change-point experiments.
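
The pipeline described in the abstract (partition, empirical transition matrix, Doeblin-type regularisation, stationary distribution, L1 score) can be sketched as follows. This is a minimal sketch under stated assumptions: the equal-width partition of [0, 1), the uniform mixing form of the regularisation, the values `n_bins=10`, `rho=0.01`, `sigma=0.01`, and the helper names are all choices made for the example, not details taken from the paper.

```python
import numpy as np

def stationary_distribution(x, n_bins=10, rho=0.01):
    """Regularised empirical stationary distribution of a series in [0, 1)."""
    bins = np.minimum((np.asarray(x) * n_bins).astype(int), n_bins - 1)
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (bins[:-1], bins[1:]), 1.0)  # observed transitions
    row_sums = counts.sum(axis=1, keepdims=True)
    # Row-normalise; rows never visited fall back to the uniform law.
    P = np.where(row_sums > 0, counts / np.maximum(row_sums, 1.0), 1.0 / n_bins)
    # Assumed Doeblin-type step: mix with the uniform kernel so the
    # chain has a unique stationary distribution.
    P_rho = (1.0 - rho) * P + rho / n_bins
    # Solve pi P_rho = pi together with sum(pi) = 1 in the least-squares sense.
    A = np.vstack([P_rho.T - np.eye(n_bins), np.ones((1, n_bins))])
    b = np.zeros(n_bins + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    pi = np.clip(pi, 0.0, None)
    return pi / pi.sum()

def l1_score(pi_window, pi_baseline):
    """Score S_t: L1 distance between window and baseline distributions."""
    return float(np.abs(pi_window - pi_baseline).sum())

def simulate_noisy_beta_map(beta, n, sigma=0.01, seed=0):
    """x_{t+1} = (beta * x_t + sigma * noise) mod 1, a toy noisy beta-map."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.random()
    for t in range(1, n):
        x[t] = (beta * x[t - 1] + sigma * rng.normal()) % 1.0
    return x

if __name__ == "__main__":
    baseline = stationary_distribution(simulate_noisy_beta_map(2.0, 20000, seed=0))
    same = stationary_distribution(simulate_noisy_beta_map(2.0, 20000, seed=1))
    golden = (1 + 5 ** 0.5) / 2
    changed = stationary_distribution(simulate_noisy_beta_map(golden, 20000, seed=2))
    print("no change:", l1_score(same, baseline))
    print("beta changed:", l1_score(changed, baseline))
```

A detector would compare the score against a threshold calibrated from the finite-sample bound; the threshold itself is left out here because the abstract does not state it.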

2606.06790 2026-06-08 cs.RO cs.LG cs.SY eess.SY 交叉投稿

Learning All-Terrain Locomotion for a Planetary Rover with Actively Articulated Suspension

学习具有主动铰接悬挂的行星探测车的全地形运动

Arthur Bouton, Tristan D. Hasseler, Michael Paton, Travis Brown, Jacob Levy, William Reid, Joshua Martin, Hari Nayar

发表机构 * Jet Propulsion Laboratory, California Institute of Technology(喷气推进实验室,加州理工学院) Center for Autonomy, University of Texas at Austin(自主性中心,德克萨斯大学奥斯汀分校) Space Systems Laboratory, University of Maryland(空间系统实验室,马里兰大学)

AI总结 提出一种带有主动万向悬挂的四轮行星探测车概念,利用强化学习训练单一神经网络控制器,实现自主障碍协商和全地形运动,通过策略整合和零样本迁移在物理车上验证。

Comments 21 pages, 26 figures

详情
AI中文摘要

本文介绍了ERNEST,一种四轮行星探测车概念,配备了两自由度主动万向悬挂系统,结合偏航和滚转驱动,实现车轮重构、转向和主动负载分配。一个单一的神经网络控制器,经过训练以在挑战性地形上跟踪期望路径,完全释放了这种驱动悬挂系统在自主障碍协商中的能力。利用高保真DARTS仿真引擎开发了强化学习框架,该引擎结合了刚体接触动力学和Bekker-Wong地面力学,使得能够出现适应松散土壤条件的运动策略。为了在异质地形上获得单一统一控制器,一种策略整合策略将地形专业化智能体的经验合并到一个神经网络中,消除了对显式地形分类和控制器切换的需求。得到的控制器结合了本体感觉和外感觉反馈,包括稀疏立体视觉导出的地形高程、底盘姿态、关节状态和力-扭矩测量。通过领域随机化、传感器噪声注入和模型到真实系统的辨识,实现了到物理车的零样本迁移。实验结果表明,该控制器能够自主穿越岩石场、凸起陷阱、轮高台阶、沙波纹和沙坡。在20°沙坡上,尽管增加了驱动,学习到的控制器在干沙上降低了37%的运输成本,并在湿沙上实现了优越的性能,而被动悬挂在湿沙上完全无法移动。

英文摘要

This paper presents ERNEST, a four-wheeled planetary rover concept equipped with a two-degree-of-freedom Active Gimbal Suspension that combines yaw and roll actuation to enable wheel reconfiguration, steering, and active load redistribution. A single neural network controller, trained to track a desired path across challenging terrain, fully unlocks the capabilities of this actuated suspension system for autonomous obstacle negotiation. A reinforcement learning framework is developed using the high-fidelity DARTS simulation engine, which combines rigid-contact dynamics and Bekker-Wong terramechanics, enabling the emergence of locomotion strategies adapted to loose-soil conditions. To obtain a single unified controller across heterogeneous terrains, a policy consolidation strategy merges the experience of terrain-specialized agents into one neural network, eliminating the need for explicit terrain classification and controller switching. The resulting controller operates on a combination of proprioceptive and exteroceptive feedback, including sparse stereo-derived terrain elevation, chassis attitude, joint states, and force-torque measurements. Zero-shot transfer to the physical rover is achieved through domain randomization, sensor noise injection, and model-to-real system identification. Experimental results demonstrate autonomous traversal of rock fields, a bump trap, a wheel-high step, sand ripples, and sandy slopes. On a 20° sandy slope, the learned controller reduces the cost of transport by 37% on dry sand despite the additional actuation, and achieves superior performance on wet sand where the passive suspension becomes completely immobilized.

2606.06814 2026-06-08 stat.ML cs.LG math.ST stat.AP stat.TH 交叉投稿

The Effect of Training Task Diversity on In-Context Learning through the Lens of Low-Dimensional Subspaces

训练任务多样性对上下文学习的影响:基于低维子空间的视角

Soo Min Kwon, Alec S. Xu, Can Yaras, Dogyoon Song, Laura Balzano, Qing Qu

发表机构 * University of California, Berkeley(加州大学伯克利分校) University of Washington(华盛顿大学) University of California, Los Angeles(加州大学洛杉矶分校) Stanford University(斯坦福大学) University of Toronto(多伦多大学)

AI总结 本文通过低秩高斯混合模型分析训练任务多样性(由子空间非重叠列数定义)如何提升线性注意力上下文学习的泛化与优化,解释训练多样性缩短学习平台期及实现分布外泛化的现象,并扩展至非线性场景。

详情
AI中文摘要

Transformer执行上下文学习(ICL)的涌现能力引发了大量旨在理解其底层机制的研究。现有工作通常研究训练任务多样性(定义为ICL训练任务向量的数量,或任务向量所来自的函数类数量)如何塑造ICL的学习动态和泛化能力。尽管这两种定义都揭示了许多有趣的现象,但后一定义下的许多观察结果在理论上仍未得到解释。本文提出了一个最小分析模型,在该模型下,这些现象可被证明源自训练数据的性质。通过将训练任务向量建模为低秩高斯的混合,我们展示了训练任务多样性(由参数化协方差矩阵的子空间之间的非重叠列数定义)如何改善线性注意力ICL的泛化和优化轨迹。特别地,我们表明我们的模型可以解释(i)为什么任务多样性训练缩短了ICL的平台期,以及(ii)为什么ICL似乎实现了分布外泛化。最后,我们通过实验证明了我们的结果如何扩展到非线性Transformer和非线性函数类。总体而言,我们的工作提出了一个可处理的框架来统一现有的观察结果。

英文摘要

The transformer's emergent ability to perform in-context learning (ICL) has sparked a wide range of studies designed to understand its underlying mechanisms. Existing works often study how training task diversity, defined either as the number of ICL training task vectors or as the number of function classes from which the task vectors are drawn, shapes both the learning dynamics and generalization capabilities of ICL. While both definitions have uncovered many interesting phenomena, many observations under the latter definition remain theoretically unexplained. This paper presents a minimal analytical model under which these phenomena provably emerge from the properties of the training data. By modeling the training task vectors as a mixture of low-rank Gaussians, we show how training task diversity, defined by the number of non-overlapping columns between subspaces that parameterize the covariance matrices, improves both the generalization and optimization trajectory of ICL with linear attention. In particular, we show that our model can explain (i) why training with task diversity shortens the ICL plateau and (ii) why ICL appears to achieve out-of-distribution generalization. We conclude by empirically demonstrating how our results extend to nonlinear transformers and nonlinear function classes. Overall, our work presents a tractable framework to unify existing observations.

2606.06815 2026-06-08 cs.CR cs.LG 交叉投稿

AMD-FCG: An Enhanced Function Call Graph Dataset with Integrated Topological Features for Malware Detection and Classification

AMD-FCG:一个集成拓扑特征的增强函数调用图数据集,用于恶意软件检测与分类

Parthajit Borah, Sakshi Singh, D. K. Bhattacharyya, J. K. Kalita

发表机构 * School of Cyber Security and Digital Forensics, National Forensic Sciences University(网络安全与数字取证学院,国家法证科学大学) Department of Computer Science and Engineering, Tezpur University(计算机科学与工程系,提斯普尔大学) College of Engineering and Applied Science, University of Colorado(工程与应用科学学院,科罗拉多大学)

AI总结 本文提出AMD-FCG数据集,通过集成恶意软件的拓扑特征增强函数调用图,以简化检测流程并消除动态分析需求,从而提升恶意软件检测的准确性和鲁棒性。

详情
AI中文摘要

由于恶意软件表现出复杂的结构和行为,其检测一直是网络安全领域及相关日常服务中的重大挑战。因此,拥有一个可靠且自适应的解决方案来解决该问题变得至关重要。在多年来开发的多种检测方法中,最可靠的方法之一是研究和分析恶意软件的结构和行为模式。这些复杂恶意软件的模式可以借助函数调用图(FCG)获得。然而,为了有效覆盖大量恶意软件家族群体,系统需要足够大的数据集来运行。为了确保系统的准确性和鲁棒性,数据集应包含不同恶意软件样本以及良性应用程序,以安全执行检测过程。本文介绍了AMD-FCG,一个集成恶意软件拓扑特征的增强函数调用图数据集。该框架增强了检测过程,简化了网络安全专业人员的工作流程,并消除了动态分析和大量处理的需求。因此,它可用于开发和部署更高效、更具创新性的恶意软件检测系统。

英文摘要

Because malware exhibits complex structure and behavior, detecting it has been a significant challenge in cybersecurity and in related everyday services. A reliable and adaptive solution is therefore crucial. Among the detection methods developed over the years, one of the most reliable is to study and analyze the structural and behavioral patterns of malware. For sophisticated malware, these patterns can be obtained with the help of Function Call Graphs (FCGs). However, covering the many malware families effectively requires a sufficiently large dataset for the system to operate on. To ensure the accuracy and robustness of the system, the dataset should comprise samples of different malware as well as benign applications for secure execution of the detection process. This paper introduces AMD-FCG, an enhanced Function Call Graph dataset integrated with topological features of malware. The framework enhances the detection procedure, streamlines the workflow for cybersecurity professionals, and eliminates the need for dynamic analysis and extensive processing. It can therefore be used to develop and deploy more efficient and innovative malware detection systems.

2606.06818 2026-06-08 cs.DC cs.AR cs.LG 交叉投稿

Terastal: Layer-Variant-based Scheduling for Real-Time Multi-DNN Workloads on Heterogeneous Accelerators

Terastal:基于层变体的异构加速器实时多DNN工作负载调度

Sing-Yao Wu, Fengshuo Song, Eli Bozorgzadeh

发表机构 * 32nd IEEE International Conference on Embedded and Real-Time Computing Systems and Applications(第32届IEEE嵌入式与实时计算系统与应用国际会议)

AI总结 针对异构DNN加速器上多DNN执行中因层延迟差异导致的调度灵活性不足和截止时间错过率增加问题,提出层变体概念和Terastal框架,通过离线虚拟预算分配和层变体设计结合在线调度,优化加速器映射和变体选择,实验表明截止时间错过率降低30%以上且精度损失仅2.24%。

Comments 8 pages, 6 figures. Accepted by RTCSA 2026. Author accepted manuscript

详情
AI中文摘要

异构DNN加速器通过将每一层映射到其首选的加速器来改善软实时多DNN执行,从而减少延迟。然而,在偏斜的工作负载下,跨加速器的较大层延迟差异限制了调度灵活性并增加了截止时间错过率。为了解决这一挑战,我们引入了层变体,即定制的层实现,以减少非首选加速器上的延迟差距。然后,我们提出了Terastal,一个用于异构DNN加速器上层变体设计和调度的软实时框架。Terastal结合了离线异构感知虚拟预算分配和层变体设计,以及在线调度,在时间和精度约束下联合优化加速器映射和变体选择。实验结果表明,与FCFS、EDF和DREAM相比,Terastal将每个模型的截止时间错过率分别降低了40.58%、30.53%和36.27%,同时跨模型变体仅造成平均2.24%的归一化精度损失。

英文摘要

Heterogeneous DNN accelerators improve soft real-time multi-DNN execution by mapping each layer to its preferred accelerator to reduce latency. However, under skewed workloads, large layer-latency differences across accelerators limit scheduling flexibility and increase deadline misses. To address this challenge, we introduce layer variants, customized layer implementations that reduce latency gaps on non-preferred accelerators. We then present Terastal, a soft real-time framework for layer-variant design and scheduling on heterogeneous DNN accelerators. Terastal combines offline heterogeneity-aware virtual budget assignment and layer-variant design, and online scheduling to jointly optimize accelerator mapping and variant selection under timing and accuracy constraints. Experimental results show that Terastal reduces deadline miss rate per model by 40.58%, 30.53%, and 36.27% compared with FCFS, EDF, and DREAM, respectively, while incurring only 2.24% average normalized accuracy loss across models with variants.

2606.06828 2026-06-08 cs.CV cs.LG 交叉投稿

AdaGRPO: A Capability-Aware Adaptive Enhancement for Flow-based GRPO

AdaGRPO: 一种面向基于流的GRPO的能力感知自适应增强方法

Jiazi Bu, Pengyang Ling, Yujie Zhou, Yibin Wang, Yuhang Zang, Tianyi Wei, Xiaohang Zhan, Jiaqi Wang, Tong Wu, Xingang Pan, Dahua Lin

发表机构 * Shanghai Jiao Tong University(上海交通大学) S-Lab, Nanyang Technological University(南洋理工大学S实验室) Shanghai AI Laboratory(上海人工智能实验室) University of Science and Technology of China(中国科学技术大学) Stanford University(斯坦福大学) Shanghai Innovation Institute(上海创新研究院) The Chinese University of Hong Kong(香港中文大学) Fudan University(复旦大学) CPII under InnoHK(InnoHK下的CPII) Adobe Research(Adobe研究)

AI总结 提出AdaGRPO,通过在线课程过滤策略和跨层级优势融合,解决流模型GRPO中提示选择随机和优势估计缺乏全局视角的问题,提升训练稳定性和性能。

Comments Project Website: https://bujiazi.github.io/adagrpo.github.io/

详情
AI中文摘要

组相对策略优化(GRPO)在将文本到图像(T2I)流模型与人类偏好对齐方面取得了显著成功。然而,我们发现当前基于流的GRPO的学习循环与学习者的当前能力基本脱钩,在提示选择和优势估计方面存在关键盲点:(i)现有方法随机采样提示,忽视了数据选择对强化学习(RL)效能的重大影响——这一因素在大型语言模型的GRPO中被证明至关重要;(ii)它们仅依赖组内统计来评估样本质量,缺乏准确衡量真实策略改进的全局视角。为解决这些问题,我们提出了自适应GRPO(AdaGRPO),一种专为流模型设计的新型能力感知RL算法。具体而言,AdaGRPO由两个主要部分组成:(i)在线课程过滤策略:动态跟踪模型的能力,并自适应选择与其当前学习边界最匹配的提示;(ii)跨层级优势融合:协同整合细粒度组内优势与宏观全局优势,提供全面无偏的策略评估。作为轻量级即插即用模块,AdaGRPO可无缝集成到现有框架如Flow-GRPO、DanceGRPO和Flow-CPS中。大量实验表明,AdaGRPO持续推动性能提升,同时显著稳定流模型的GRPO训练。

英文摘要

Group Relative Policy Optimization (GRPO) has demonstrated remarkable success in aligning text-to-image (T2I) flow models with human preferences. However, we have identified that the learning loop of current flow-based GRPO is fundamentally decoupled from the learner's current capability, suffering from critical blind spots in both prompt selection and advantage estimation: (i) existing methods sample prompts randomly, overlooking the substantial impact of data selection on reinforcement learning (RL) efficacy, a factor proven crucial in GRPO for large language models; (ii) they evaluate sample quality relying solely on intra-group statistics, lacking a global perspective to accurately measure true policy improvement. To address these issues, we propose Adaptive GRPO (AdaGRPO), a novel capability-aware RL algorithm tailored for flow models. Specifically, AdaGRPO consists of two principal components: (i) Online Curriculum Filtering Strategy: dynamically tracks the model's proficiency and adaptively selects prompts that best match its current learning boundary; (ii) Cross-Level Advantage Fusion: synergistically integrates fine-grained intra-group advantages with macro-level global advantages, providing a comprehensive and unbiased policy evaluation. As a lightweight, plug-and-play module, AdaGRPO can be seamlessly integrated with existing frameworks such as Flow-GRPO, DanceGRPO, and Flow-CPS. Extensive experiments demonstrate that AdaGRPO consistently drives performance gains while significantly stabilizing GRPO training for flow models.
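
To make the "intra-group only" limitation concrete, the sketch below computes the usual group-normalised GRPO advantage and blends it with a globally normalised one. The abstract does not give AdaGRPO's fusion formula, so the convex blend, the weight `lam`, the externally supplied `global_mean` and `global_std`, and the function name are purely hypothetical stand-ins for illustration.

```python
import numpy as np

def fused_advantages(rewards, global_mean, global_std, lam=0.5, eps=1e-8):
    """Blend intra-group and global advantages (hypothetical fusion rule).

    rewards: array of shape (n_prompts, group_size), one row per prompt.
    """
    rewards = np.asarray(rewards, dtype=float)
    # Standard GRPO-style advantage: z-score within each prompt's group.
    mu = rewards.mean(axis=1, keepdims=True)
    sd = rewards.std(axis=1, keepdims=True)
    intra = (rewards - mu) / (sd + eps)
    # Global view: z-score against statistics tracked across all prompts
    # (assumed here to be provided by the caller).
    glob = (rewards - global_mean) / (global_std + eps)
    return (1.0 - lam) * intra + lam * glob

if __name__ == "__main__":
    rewards = [[1.0, 2.0, 3.0],      # mixed-quality group
               [10.0, 10.0, 10.0]]   # uniformly strong group
    print(fused_advantages(rewards, global_mean=5.0, global_std=4.0, lam=0.0))
    print(fused_advantages(rewards, global_mean=5.0, global_std=4.0, lam=0.5))
```

With `lam=0.0` the uniformly strong group receives zero advantage because every sample equals its group mean; any positive `lam` gives it a positive signal, which is one way to read the abstract's call for a global perspective.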

2606.06830 2026-06-08 cs.CY cs.LG 交叉投稿

Learning Fair Demand Models

学习公平需求模型

Adam N. Elmachtoub, Hyemi Kim, Jonathan Y. Tan

发表机构 * Department of Industrial Engineering and Operations Research and Data Science Institute, Columbia University(工业工程与运筹学系及数据科学研究院,哥伦比亚大学) Department of Industrial Engineering and Operations Research, Columbia University(工业工程与运筹学系,哥伦比亚大学)

AI总结 研究数据驱动定价中的公平性问题,通过比较在需求估计或价格优化阶段施加公平约束的策略,分析其对消费者福利和社会效益的影响。

详情
AI中文摘要

数据驱动定价在航空、贷款、保险和零售等领域日益普遍。通过从客户特征中学习需求模型并据此定价,这些系统可能产生歧视性结果,引发公平性问题。这引出了基本问题——系统应如何在定价流程中纳入公平考量,以及最终如何影响社会结果?为回答这些问题,我们研究了一个简化模型,其中卖方有一个两阶段决策流程:线性需求模型估计,随后是价格优化。卖方在训练损失、价格和需求方面考虑公平概念,包括均等主义和罗尔斯主义视角。我们表明,跨消费者群体均等化训练损失会导致多个解,进而可能产生不良结果,尽管这是公平机器学习中的标准方法。相反,关注直接应用于价格或需求的公平性,我们比较了两种策略:在需求估计阶段或价格优化阶段强制执行公平性。对于均等主义公平,我们刻画了在较小公平水平下每种策略何时产生更高的社会福利。我们表明,当数据集中的市场规模和价格相似时,在估计阶段施加价格公平对消费者更有利,而在优化阶段施加需求公平则带来更好的消费者结果。对于罗尔斯主义公平,两种策略完全一致。最后,我们将模型扩展到其他需求函数,并使用真实世界疫苗定价数据进行案例研究。

英文摘要

Data-driven pricing is increasingly prevalent in sectors such as airlines, lending, insurance, and retail. By learning demand models from customer features and setting prices accordingly, these systems may generate discriminatory outcomes that raise fairness concerns. This leads to fundamental questions - how and where should systems incorporate fairness considerations in the pricing pipeline, and how does it ultimately affect societal outcomes? To answer these, we study a stylized model where a seller has a two-stage decision pipeline comprising linear demand model estimation followed by price optimization. The seller considers fairness notions in training loss, price, and demand, under both parity-wise and Rawlsian perspectives. We show that equalizing training loss across consumer groups leads to multiple solutions, which in turn can result in undesirable outcomes despite being a standard approach in fair machine learning. Focusing instead on fairness applied directly to prices or demand, we compare two strategies that enforce fairness in either the demand estimation stage or the price optimization stage. For parity-wise fairness, we characterize when each strategy yields higher social welfare under small fairness levels. We show that when market sizes and prices in the dataset are similar, imposing price fairness in the estimation stage is more beneficial to consumers, whereas imposing demand fairness in the optimization stage yields better consumer outcomes. For Rawlsian fairness, the two strategies coincide exactly. Lastly, we extend our model to alternate demand functions and conduct a case study using real-world vaccine pricing data.

2606.06837 2026-06-08 eess.AS cs.LG 交叉投稿

SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails

SEAM:面向面试防护栏的脚本化与自发语音的快捷方式感知实时检测

Vsevolod Kovalev, Pranay Manocha

发表机构 * Symbal AI Princeton University(普林斯顿大学)

AI总结 提出SEAM框架,通过统一预处理、接缝感知采样、非语音增强和紧凑DistilHuBERT骨干,在8秒窗口下实现0.971 ROC-AUC,并揭示快捷方式学习问题。

Comments Accepted to Interspeech 2026

详情
AI中文摘要

脚本化与自发语音检测对面试防护栏具有吸引力,但基准性能可能因与语料库身份、信道条件和录音伪影相关的快捷方式(而非说话风格本身)而膨胀。我们提出SEAM,一个用于实时脚本化检测的快捷方式感知框架,结合了统一预处理、接缝感知采样、非语音增强和紧凑的DistilHuBERT骨干。使用8秒窗口,该模型在外部面试领域评估集上达到0.971 ± 0.004的ROC-AUC。移除快捷方式预防组件可改善内部留出指标,但急剧降低外部性能,表明存在快捷方式学习。训练后量化将模型占用减少至41.8MB,且外部性能损失很小。结果表明,鲁棒的实时脚本化检测不仅依赖于骨干网络,还依赖于快捷方式感知的数据设计和评估。我们发布代码和模型检查点。

英文摘要

Scripted vs. spontaneous speech detection is appealing for interview guardrails, but benchmark performance can be inflated by shortcuts tied to corpus identity, channel conditions, and recording artifacts rather than speaking style itself. We present SEAM, a shortcut-aware framework for real-time scriptedness detection that combines uniform preprocessing, seam-aware sampling, non-speech augmentation, and a compact DistilHuBERT backbone. With 8 s windows, the model achieves 0.971 ± 0.004 ROC-AUC on an external interview-domain evaluation set. Removing the shortcut-prevention components improves internal held-out metrics but sharply reduces external performance, indicating shortcut learning. Post-training quantization reduces the model footprint to 41.8 MB with little loss in external performance. The results demonstrate that robust real-time scriptedness detection depends not only on the backbone, but also on shortcut-aware data design and evaluation. We release code and model checkpoints.

2606.06840 2026-06-08 cs.CL cs.AI cs.LG 交叉投稿

Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces

先刻画再蒸馏:大输出空间中的机械推理

Debjyoti Saha Roy, Byron C. Wallace, Javed A. Aslam

发表机构 * Khoury College of Computer Sciences, Northeastern University(东北大学计算机科学学院)

AI总结 研究现代推理模型在百万级标签空间中实现零样本多标签分类的机制,提出“候选列表生成+精细推理”两阶段模型,并基于此开发机制性蒸馏策略,优于标准蒸馏。

详情
AI中文摘要

现代推理模型在具有挑战性的多标签任务上表现出令人惊讶的强大零样本性能,这些任务需要从数十万到数百万个候选标签中选择一小部分相关选项。我们从机制层面研究了它们如何实现这一点。我们将推理描述为一个两阶段过程:首先进行广泛的“候选列表生成”,然后对生成的集合进行精细推理。我们在一系列数据集上提供证据表明,这些步骤可以分离并且是互补的。利用这一刻画,我们开发了一种机制性蒸馏策略,该策略始终优于标准蒸馏。

英文摘要

Modern reasoning models offer surprisingly strong zero-shot performance on challenging multi-label tasks that require selecting a small set of relevant options from hundreds of thousands to millions of candidate labels. We investigate how they achieve this mechanistically. We characterize reasoning as a two-phase process: a broad "shortlisting" of candidates followed by fine-grained reasoning over the resulting set. We provide evidence across a range of datasets that these steps can be isolated and are complementary. Using this characterization, we develop a mechanistic distillation strategy that consistently outperforms standard distillation.

2606.06855 2026-06-08 stat.ML cs.LG math.ST stat.TH 交叉投稿

Stability beyond Bounded Differences: Sharp Generalization Bounds under Finite $L_p$ Moments

超越有界差分的稳定性:有限 $L_p$ 矩下的尖锐泛化界

Qianqian Lei, Soham Bonnerjee, Yuefeng Han, Wei Biao Wu

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 针对重尾或无界损失,提出仅需有限 $L_p$ 矩条件的稳定性框架,导出尖锐高概率泛化界,覆盖经验风险最小化、转导回归和元学习。

详情
AI中文摘要

虽然算法稳定性是理解学习算法泛化能力的核心工具,但现有的高概率保证通常依赖于一致有界或次高斯/次韦布尔尾部假设,这对于现代设置中重尾或无界损失可能过于严格。我们开发了一个仅需有限 $L_p$ 矩条件的稳定性框架。我们的第一个贡献是在 $L_p$ 约束下独立随机变量函数的尖锐集中不等式,将 McDiarmid 的有界差分技术扩展到经典范围之外。利用这些结果,我们在一系列学习范式中推导出尖锐的高概率泛化界,包括经验风险最小化、转导回归和元学习。这些保证表明,即使有界性不成立,$L_p$ 稳定性也足以实现鲁棒泛化,显著削弱了稳定性文献中的标准假设。

英文摘要

While algorithmic stability is a central tool for understanding generalization of learning algorithms, existing high-probability guarantees typically rely on uniform boundedness or sub-Gaussian/sub-Weibull tail assumptions, which can be overly restrictive for modern settings with heavy-tailed or unbounded losses. We develop a stability-based framework that requires only a finite $L_p$ moment condition. Our first contribution is sharp concentration inequalities for functions of independent random variables under $L_p$ constraints, extending McDiarmid's bounded-differences techniques beyond the classical regime. Leveraging these results, we derive sharp high-probability generalization bounds across a range of learning paradigms, including empirical risk minimization, transductive regression, and meta-learning. These guarantees show that $L_p$ stability suffices for robust generalization even when boundedness fails, substantially weakening the standard assumptions in the stability literature.

2606.06864 2026-06-08 cs.CV cs.LG 交叉投稿

LRMIL: Efficient Low-Resolution Multiple Instance Learning via High-Resolution Knowledge Distillation for Whole Slide Image Classification

LRMIL: 通过高分辨率知识蒸馏实现全切片图像分类的高效低分辨率多实例学习

Yonghan Shin, Won-Ki Jeong

发表机构 * Department of Computer Science and Engineering, Korea University, Seoul, Korea(高丽大学计算机科学与工程系,韩国首尔)

AI总结 提出LRMIL框架,通过两阶段知识蒸馏将高分辨率知识迁移到低分辨率表示,在推理时仅使用低分辨率图像块,显著降低计算成本并提升分类性能。

详情
AI中文摘要

多实例学习(MIL)已成为数字病理学中全切片图像(WSI)分析的标准范式,因为它无需密集标注即可实现切片级预测。现有的MIL方法通常依赖于高分辨率图像块的详尽提取和编码。然而,这种做法在真实临床环境中存在两个关键限制:难以在较低放大倍数下捕获全局视觉线索,并且由于每张切片包含大量高分辨率图像块而导致巨大的计算开销。为了解决这些限制,我们提出了一种高效的低分辨率多实例学习(LRMIL)框架,该框架将高分辨率知识迁移到低分辨率表示。LRMIL采用两阶段蒸馏策略。首先,图像块级别的跨分辨率蒸馏将低分辨率图像块嵌入与高分辨率表示对齐。其次,切片级知识蒸馏在切片级监督和教师指导下训练低分辨率学生MIL模型。在推理时,LRMIL仅处理低分辨率图像块,大幅减少了数据预处理和计算成本。在多个WSI基准上的大量实验表明,LRMIL在实现更高效推理的同时,始终优于最先进的MIL方法。这些结果凸显了LRMIL作为临床病理学中WSI分析的实用且可扩展的解决方案。

英文摘要

Multiple instance learning (MIL) has become a standard paradigm for whole slide image (WSI) analysis in digital pathology, as it enables slide-level prediction without dense annotations. Existing MIL methods typically rely on exhaustive extraction and encoding of high-resolution patches. However, this practice suffers from two critical limitations in real-world clinical settings: it struggles to capture global visual cues at lower magnifications, and incurs substantial computational overhead due to the massive number of high-resolution patches per slide. To address these limitations, we propose an efficient low-resolution multiple instance learning (LRMIL) framework that transfers high-resolution knowledge to low-resolution representations. LRMIL adopts a two-stage distillation strategy. First, patch-level cross-resolution distillation aligns low-resolution patch embeddings with high-resolution representations. Second, slide-level knowledge distillation trains a low-resolution student MIL model under both slide-level supervision and teacher guidance. At inference time, LRMIL operates exclusively on low-resolution patches, substantially reducing data preprocessing and computational cost. Extensive experiments on multiple WSI benchmarks demonstrate that LRMIL consistently outperforms state-of-the-art MIL methods while achieving more efficient inference. These results highlight LRMIL as a practical and scalable solution for WSI analysis in clinical pathology.

2606.06890 2026-06-08 cs.CV cs.LG 交叉投稿

Diagnosing Visual Ignorance in Vision-Language Models

诊断视觉语言模型中的视觉忽视

Runyu Zhou, Qi Zhang, Qixun Wang, Yisen Wang

发表机构 * Peking University(北京大学)

AI总结 研究视觉语言模型依赖语言先验的内部机制,通过层替换和探针分析揭示多阶段瓶颈,并引入渐进视觉退化指标发现基准测试可能奖励视觉忽视。

详情
AI中文摘要

视觉语言模型(VLM)经常依赖语言先验,产生自信但缺乏视觉证据支持的答案。虽然这种行为被广泛观察到,但其内部机制及对基准评估的影响仍未被充分理解。在这项工作中,我们从机制和行为两个角度研究语言先验依赖。在内部,我们将反事实层替换与有监督的逐层MLP探针相结合,以追踪真实视觉语义和语言先验语义如何在语言解码器中竞争。我们的分析揭示了一个多阶段瓶颈:中间层通常无法有效检索视觉信息,而后续层可能进一步抑制存活的视觉信号,偏向文本空间偏差。在外部,我们引入了一种基于多步高斯模糊的渐进视觉退化度量,用于识别那些即使视觉内容被逐渐破坏,答案仍保持不变的实例。在十二个视觉问答基准和三个代表性VLM上,我们发现相当一部分示例在严重或完全视觉混淆下仍可回答,表明当前基准可能无意中奖励视觉忽视。这些发现表明,语言先验依赖是一种系统性的路由故障,影响模型内部和基准有效性。最后,我们概述了未来的关键研究方向,强调需要设计基于结构隔离或反事实数据的训练分布和评估协议,以强制执行真正的跨模态基础。

英文摘要

Vision-Language Models (VLMs) frequently rely on language priors, producing confident answers that are weakly grounded in visual evidence. While this behavior is widely observed, its internal mechanisms and its impact on benchmark evaluation remain insufficiently understood. In this work, we study language-prior reliance from both mechanistic and behavioral perspectives. Internally, we combine counterfactual layer replacement with supervised layer-wise MLP probing to trace how ground-truth visual semantics and language-prior semantics compete across the language decoder. Our analysis reveals a multi-stage bottleneck: intermediate layers often fail to effectively retrieve visual information, while later layers can further suppress surviving visual signals in favor of text-space biases. Externally, we introduce a progressive visual decay metric based on multi-step Gaussian blurring, which identifies instances whose answers remain invariant even as visual content is increasingly destroyed. Across twelve visual question-answering benchmarks and three representative VLMs, we find that a substantial fraction of examples remain answerable under severe or total visual obfuscation, indicating that current benchmarks can inadvertently reward visual ignorance. These findings demonstrate that language-prior reliance is a systematic routing failure affecting both model internals and benchmark validity. Finally, we outline critical pathways for future research, highlighting the necessity of designing training distributions and evaluation protocols built on structurally isolated or counterfactual data to enforce genuine cross-modal grounding.
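
The behavioral probe described in the abstract, checking whether an answer survives increasingly strong Gaussian blur, can be illustrated with the toy sketch below. The exact metric in the paper is not specified in the abstract, so the scoring rule (fraction of blur levels at which the answer is unchanged), the blur levels, and the two stand-in "models" are assumptions made for this example.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur for a 2-D array (kernel truncated at 3 sigma).

    Assumes the kernel is shorter than each image side.
    """
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()

    def blur_1d(v):
        return np.convolve(v, kernel, mode="same")

    out = np.apply_along_axis(blur_1d, 0, image)
    return np.apply_along_axis(blur_1d, 1, out)

def visual_decay_score(image, answer_fn, sigmas=(1.0, 2.0, 4.0)):
    """Fraction of blur levels at which the answer matches the clean answer.

    A score near 1.0 suggests the answer does not depend on visual content.
    """
    reference = answer_fn(image)
    unchanged = [answer_fn(gaussian_blur(image, s)) == reference for s in sigmas]
    return sum(unchanged) / len(sigmas)

def prior_only_model(image):
    # Hypothetical stand-in that ignores the image entirely.
    return "dot"

def visual_model(image):
    # Hypothetical stand-in that actually looks at pixel intensities.
    return "dot" if image.max() > 0.5 else "blank"

if __name__ == "__main__":
    image = np.zeros((32, 32))
    image[16, 16] = 1.0  # a single bright pixel
    print(visual_decay_score(image, prior_only_model))
    print(visual_decay_score(image, visual_model))
```

In a real evaluation `answer_fn` would wrap a VLM queried with a fixed question; examples that keep a high score under heavy blur are the ones the abstract flags as answerable without visual evidence.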

2606.06899 2026-06-08 cs.CV cs.LG 交叉投稿

Lighting-Aware Representation Learning under Controllable Lighting Variation

可控光照变化下的光照感知表示学习

Lizhen Zhu, Charantej Reddy Pochimireddy, James Z Wang, Brad Wyble

发表机构 * The Pennsylvania State University(宾夕法尼亚州立大学)

AI总结 提出光照感知表示学习框架,将光照变化作为显式训练信号,通过辅助目标捕获光照依赖变化,在分类和检测任务上优于标准对比学习基线。

详情
AI中文摘要

光照变化仍然是视觉表示学习的主要挑战,因为它们会在环境内部和之间引起显著的外观变化。虽然现有方法通常通过数据增强来鼓励模型对光照变化具有不变性,但这些策略在学习过程中并未显式建模光照信息。受人类视觉理论的启发,我们提出了一种光照感知表示学习框架,该框架将光照变化作为显式训练信号而非需要抑制的干扰因素。我们的方法通过引入一个辅助目标来扩展对比学习,该目标捕获渲染场景中光照依赖的变化,使模型能够联合学习保持语义一致性的表示,同时保持对光照依赖的视觉结构的敏感性。我们在ImageNet、ExDark和PASCAL VOC基准测试上评估了所提模型的图像分类和物体检测任务。结果表明,所提出的光照感知训练在保持相同架构和训练预算的情况下,始终优于标准对比学习基线。此外,我们的方法在监督学习框架和涉及更简单光照变化的设置中表现出有前景的性能,表明其具有超越复杂光照场景的广泛适用性。这些结果显示了它在复杂视觉环境以及更常规的图像处理任务中增强模型鲁棒性和适应性的潜力。

英文摘要

Variations in illumination remain a major challenge for visual representation learning, as they induce substantial appearance changes both across and within environments. While existing approaches typically address this issue through data augmentations that encourage models to become invariant to lighting changes, such strategies do not explicitly model lighting information during learning. Inspired by theories of human vision, we propose a lighting-aware representation learning framework that incorporates illumination variation as an explicit training signal rather than a nuisance factor to be suppressed. Our method extends contrastive learning by introducing an auxiliary objective that captures illumination-dependent variation in rendered scenes, enabling the model to jointly learn representations that preserve semantic consistency while remaining sensitive to lighting-dependent visual structure. We evaluate the proposed model on image classification and object detection tasks across the ImageNet, ExDark, and PASCAL VOC benchmarks. Results demonstrate that the proposed lighting-aware training consistently improves downstream performance over standard contrastive learning baselines, while maintaining the same architecture and training budget. Furthermore, our approach shows promising performance in supervised learning frameworks and under settings involving simpler lighting variation, suggesting broad applicability beyond complex illumination scenarios. These results indicate its potential to enhance model robustness and adaptability in complex visual environments as well as in more conventional image processing tasks.

2606.06957 2026-06-08 stat.ML cs.LG 交叉投稿

Deep Single-Index Fréchet Regression

深度单指标弗雷歇回归

Muqing Cui, Yidong Zhou, Su I Iao, Hans-Georg Müller

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出DeSI框架,通过深度神经网络估计单指标方向,在度量空间中进行弗雷歇回归,缓解维数灾难并保持可解释性,理论保证收敛率,在分布、网络等数据上表现优异。

详情
AI中文摘要

预测位于非欧几里得空间中的输出,如概率分布、网络和对称正定矩阵,在现代数据分析中变得越来越重要,特别是当输入是高维时。我们提出了DeSI(深度单指标弗雷歇回归),一种用于度量空间值输出和多变量输入的半参数回归框架,该框架假设条件弗雷歇均值具有单指标结构。DeSI使用深度神经网络估计可解释的指标方向,该方向量化了输入的相对重要性,并在目标度量空间中沿着得到的一维指标进行弗雷歇回归。这种结构缓解了维数灾难,同时保持了可解释性,这与标准深度神经网络形成对比。我们为DeSI建立了理论保证,包括一致逼近和收敛速度,并通过在分布、网络和对称正定矩阵上的模拟,以及在新泽西州的成分情绪数据上的应用,展示了其强大的预测性能。

英文摘要

Predicting outputs that are located in non-Euclidean spaces, such as probability distributions, networks, and symmetric positive-definite matrices, is becoming increasingly important in modern data analysis, particularly when inputs are high-dimensional. We propose DeSI (Deep Single-Index Fréchet Regression), a semiparametric framework for regression with metric space-valued outputs and multivariate inputs that assumes a single-index structure for the conditional Fréchet mean. DeSI estimates an interpretable index direction, which quantifies the relative importance of inputs, using a deep neural network, and performs Fréchet regression along the resulting one-dimensional index in the target metric space. This structure mitigates the curse of dimensionality while retaining interpretability, which stands in contrast to standard deep neural networks. We establish theoretical guarantees for DeSI, including uniform approximation and convergence rates, and demonstrate its strong predictive performance through simulations on distributions, networks, and symmetric positive-definite matrices, as well as an application to compositional mood data from New Jersey.

2606.07030 2026-06-08 cs.SD cs.AI cs.CL cs.LG 交叉投稿

Phonetic Error Analysis of Raw Waveform Acoustic Models

原始波形声学模型的音素错误分析

Erfan Loweimi, Zhengjun Yue, Andrea Carmantini, Zoran Cvetkovic, Steve Renals, Peter Bell

发表机构 * Centre for Speech Technology Research (CSTR), University of Edinburgh, UK(语音技术研究中心(CSTR),爱丁堡大学,英国) Cisco, UK(思科公司,英国) SLAI & CUHK-SZ, China(SLAI与CUHK-SZ,中国) King's College London, UK(伦敦国王学院,英国)

AI总结 通过分解音素错误率、分析混淆矩阵,发现BLSTM层对过渡依赖类提升最大,WSJ迁移学习对辅音改进约是元音的三倍,且混淆模式反映固有音素相似性。

Comments INTERSPEECH2026

详情
AI中文摘要

我们分析了原始波形声学模型在TIMIT音素识别中的错误模式,超越了整体音素错误率(PER)。将PER按三个广义语音类别(BPC)分解,并从替换错误构建混淆矩阵。我们的模型将参数化(SincNet, Sinc2Net)或非参数化CNN与双向LSTM相结合,在开发/测试集上分别达到13.9%/15.3%的PER,这是原始波形模型在TIMIT上的最佳报告结果。来自WSJ的迁移学习将PER降至11.3%/12.3%,超越了Filterbank基线。每个BPC的分析表明,BLSTM层对过渡依赖类提升最大,而WSJ迁移学习对辅音的改进约是元音的三倍。原始波形和Filterbank系统的混淆模式一致,表明主要混淆反映了固有的音素相似性。

英文摘要

We analyse error patterns of raw waveform acoustic models on TIMIT phone recognition beyond the overall phone error rate (PER). PER is decomposed across three broad phonetic class (BPC) categorisations, and confusion matrices are constructed from substitution errors. Our models combine parametric (SincNet, Sinc2Net) or non-parametric CNNs with Bidirectional LSTMs, achieving 13.9%/15.3% PER on Dev/Test, the best reported results for raw waveform models on TIMIT. Transfer learning from WSJ reduces PER to 11.3%/12.3%, surpassing the Filterbank baseline. Per-BPC analysis reveals that BLSTM layers benefit transition-dependent classes most, while WSJ transfer learning improves consonants roughly three times more than vowels. Confusion patterns are consistent across raw waveform and Filterbank systems, indicating that the dominant confusions reflect inherent phonetic similarities.

2606.07036 2026-06-08 cs.CV cs.AI cs.CE cs.LG 交叉投稿

STREAM: Stochastic Riemannian Flow Matching with Anisotropic Decoder for Digital Histopathology Image Generation

STREAM: 用于数字组织病理学图像生成的随机黎曼流匹配与各向异性解码器

Won June Cho, Daeky Jeong, Hyeongyeol Lim, Hongjun Yoon

发表机构 * DEEPNOID Inc.(DEEPNOID公司)

AI总结 提出STREAM框架,利用组织病理学视觉基础模型的patch-token特征作为潜在空间,通过黎曼流匹配生成高质量组织病理学图像,解决条件崩溃问题,并设计各向异性解码器提升生成质量。

Comments 27 pages, 7 figures

详情
AI中文摘要

合成组织病理学图像生成解决了计算病理学中的关键挑战,包括患者隐私和对基础模型大规模训练数据日益增长的需求。潜在扩散模型主导了图像生成领域,最近的研究强调潜在空间的选择对生成图像的质量至关重要。现有的组织病理学最先进生成模型使用预训练的视觉基础模型(VFM)作为条件信号,我们观察到这会导致“条件崩溃”,即条件信号主导潜在空间,降低生成样本的质量和多样性。因此,我们转而使用预训练的组织病理学VFM作为潜在空间本身,利用其编码丰富语义信息的patch-token特征。我们经验性地表明,这些特征经过$\ell_2$归一化,位于单位超球面$\mathcal{S}^{d-1}$上,具有强烈的角度主导性和内在曲率,使其自然适用于黎曼公式。因此,我们提出了STREAM,这是第一个在病理学领域应用黎曼流匹配的框架。STREAM包括两个阶段:1)一种桥式随机扰动,在$\mathcal{S}^{d-1}$上建立每个token的可整流性,用于在潜在空间中训练扩散变换器(DiT);2)一种新颖的各向异性解码器,对速度场雅可比矩阵的低能量方向分配鲁棒性,同时保持其高能量方向的保真度。STREAM在乳腺癌和结直肠癌数据集上实现了最先进的重建和生成性能。代码将在接收后公开发布。

英文摘要

Synthetic histopathology image generation addresses critical challenges in computational pathology, including patient privacy and the growing need for large-scale training data for foundation models. Latent diffusion models have dominated the image generation domain, with recent works emphasizing that the choice of latent space is critical to the quality of generated images. Existing state-of-the-art generative models in histopathology use pretrained Vision Foundation Models (VFMs) as conditioning signals, and we observe that this leads to "conditioning collapse," where the conditioning signal dominates the latent space and lowers the quality and diversity of generated samples. Therefore, we instead use pretrained histopathology VFMs as the latent space itself, leveraging their patch-token features that encode rich semantic information. We empirically show that these features are $\ell_2$-normalized and lie on the unit hypersphere $\mathcal{S}^{d-1}$ with strong angular dominance and intrinsic curvature, making them naturally suited for a Riemannian formulation. We therefore present STREAM, the first framework to apply Riemannian flow matching in the pathology domain. STREAM consists of two stages: 1) a bridge-type stochastic perturbation that establishes per-token rectifiability on $\mathcal{S}^{d-1}$ for training a Diffusion Transformer (DiT) in latent space, and 2) a novel anisotropic decoder that allocates robustness to low-energy directions of the velocity-field Jacobian while preserving fidelity along its high-energy directions. Together, STREAM achieves state-of-the-art reconstruction and generation performance on breast and colorectal cancer datasets. The code will be publicly released upon acceptance.

2606.07053 2026-06-08 cs.CV cs.LG 交叉投稿

TrioPose: Native Triple-Stream Diffusion Transformers for Pose-Guided Text-to-Image Generation

TrioPose: 用于姿态引导文本到图像生成的原生三流扩散变换器

Dian Gu, Zhengyi Yang

发表机构 * Institute of Automation Chinese Academy of Sciences(中国科学院自动化研究所)

AI总结 提出TrioPose,基于SD3.5M架构的原生三流姿态感知DiT,通过逐层激活和零初始化双残差注入保持预训练稳定性,并设计可学习关系偏置掩码和姿态引导空间损失加权,在多人姿态引导生成中实现SOTA性能,Human-Art上AP达64.33。

Comments 15 pages (9 pages main body, 6 pages references and appendix), 3 figures, 5 tables

详情
AI中文摘要

姿态引导的文本到图像生成在复杂多人场景中常遭受肢体扭曲和特征串扰。虽然现有的基于UNet的适配器难以处理长程空间依赖,新兴的多模态扩散变换器(MM-DiT)提供了优越的全局建模能力。然而,MM-DiT中的简单信号拼接严重破坏了预训练的潜在分布。为了解决这个问题,我们提出了TrioPose,一个基于SD3.5M架构的原生姿态驱动框架。具体来说,我们引入了一个三流姿态感知DiT(TSPA-DiT),将姿态视为独立模态。它采用逐层激活和零初始化双残差注入,在保持预训练潜在稳定性的同时平滑地施加几何约束。为了解决严重的多实例遮挡,我们设计了一个可学习关系偏置掩码,将拓扑连接分类为细粒度的物理状态,将其映射为连续的注意力软约束,以有效解耦实例间干扰。此外,一种姿态引导空间损失加权策略利用热图导出的误差图调制原生扩散目标,将解剖监督严格集中在畸变区域。大量实验表明,TrioPose在具有挑战性的基准测试(包括Human-Art、CrowdPose和OCHuman)上实现了最先进的性能。值得注意的是,它在Human-Art上达到了64.33的AP,比先前方法提高了30%,同时在复杂多人生成中为视觉保真度和文本-图像语义对齐设立了新标准。

英文摘要

Pose-guided text-to-image generation often suffers from limb distortions and feature crosstalk in complex multi-person scenarios. While existing UNet-based adapters struggle with long-range spatial dependencies, emerging Multimodal Diffusion Transformers (MM-DiTs) offer superior global modeling. However, naive signal concatenation in MM-DiTs severely disrupts pre-trained latent distributions. To address this, we propose TrioPose, a native pose-driven framework built upon the SD3.5M architecture. Specifically, we introduce a Triple-Stream Pose-Aware DiT (TSPA-DiT) that treats pose as an independent modality. It employs layer-wise activation and zero-initialized dual-residual injection to smoothly enforce geometric constraints while preserving pre-trained latent stability. To resolve severe multi-instance occlusions, we design a Learnable Relational Bias Mask that categorizes topological connectivity into fine-grained physical states, mapping them into continuous attention soft constraints to effectively decouple inter-instance interference. Furthermore, a Pose-Guided Spatial Loss Weighting strategy modulates the native diffusion objective using heatmap-derived error maps, focusing anatomical supervision strictly on distortion-prone regions. Extensive experiments demonstrate that TrioPose achieves state-of-the-art performance across challenging benchmarks, including Human-Art, CrowdPose, and OCHuman. Notably, it attains an AP of $64.33$ on Human-Art, representing a $30\%$ improvement over prior art, while setting new standards for visual fidelity and text-image semantic alignment in complex multi-human generation.

2606.07054 2026-06-08 cs.CL cs.AI cs.CR cs.LG 交叉投稿

TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents

TRACE: 通过自适应跨步骤证据聚合的LLM智能体轨迹推理

Vijitha Mittapalli, Shreyaa Jayant Dani, Satya Srujana Pilli, Snigdha Ansu, Mohammadreza Teymoorianfard, Franck Dernoncourt, Hongjie Chen, Yu Wang, Ryan A. Rossi, Nesreen K. Ahmed

发表机构 * University of Massachusetts at Amherst(马萨诸塞大学阿默斯特分校) Adobe Research(Adobe研究) Dolby Labs(杜比实验室) University of Oregon(俄勒冈大学) Cisco(思科)

AI总结 提出TRACE框架,通过TIJ循环识别高信号区域、累积跨步骤证据并合成轨迹级判决,在SHADE-Arena的十个任务域上F1达0.713,召回率0.844,尤其擅长长距离证据链接。

详情
AI中文摘要

自主LLM智能体可以通过一系列单独良性的行动追求隐藏的恶意目标,这使得使用标准轨迹级监控难以检测破坏行为。现有方法要么一次性评估完整轨迹,要么将其划分为独立评分的窗口,限制了连接时间上相距较远的证据的能力。我们提出TRACE,一个用于长视界LLM智能体轨迹的监控框架。TRACE通过一个TIJ(分诊-检查-判决)循环运行,该循环识别高信号区域,执行有针对性的检查并在各推理步骤间保持已累积的证据,最终综合出轨迹级判决。我们在SHADE-Arena的十个任务域上评估TRACE,与最先进的基线进行比较。TRACE实现了0.713的总体F1分数和0.844的召回率,在需要长距离证据链接的任务上取得了最大的提升。

英文摘要

Autonomous LLM agents can pursue hidden malicious objectives through sequences of individually benign actions, making sabotage difficult to detect using standard trajectory-level monitoring. Existing approaches either evaluate complete trajectories in a single pass or partition them into independently scored windows, limiting their ability to connect evidence across temporally distant actions. We propose TRACE, a monitoring framework for long-horizon LLM agent trajectories. TRACE operates through a TIJ (Triage-Inspect-Judge) loop that identifies high-signal regions, performs targeted inspection while maintaining accumulated evidence across reasoning steps, and synthesizes a trajectory-level verdict. We evaluate TRACE on ten task domains from SHADE-Arena against state-of-the-art baselines. TRACE achieves an aggregate F1 of 0.713 and recall of 0.844, with the largest gains on tasks requiring long-range evidence linking.

2606.07098 2026-06-08 cs.CL cs.LG 交叉投稿

SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices

SigmaScale: 基于SVD低秩分解和学习缩放矩阵的LLM压缩

Ernests Lavrinovics, Marco Letizia, Roy Janco, Shai Segal, Johannes Bjerva, Maurizio Pierini

发表机构 * Department of Computer Science, Aalborg University Copenhagen(奥尔堡大学哥本哈根分校计算机科学系) MaLGa-DIBRIS, University of Genoa(热那亚大学MaLGa-DIBRIS) INFN, Sezione di Genova(国家核物理研究所热那亚分部) European Organization for Nuclear Research (CERN)(欧洲核子研究中心) Ceva, Inc.(Ceva公司)

AI总结 提出SigmaScale方法,通过学习辅助缩放矩阵优化截断SVD的LLM压缩,降低权重矩阵有效秩,在Llama 3.1 8B和Qwen3-8B上达到竞争性能。

详情
AI中文摘要

我们提出SigmaScale,一种学习辅助缩放矩阵$S$以辅助基于截断奇异值分解(SVD)的大语言模型(LLM)压缩的方法。SigmaScale不是解析地推导缩放矩阵,而是优化两组定义对角行和列缩放变换的向量,并在激活感知的压缩损失下进行。我们表明,学习到的缩放降低了权重矩阵的有效内在秩,这反映在有效秩熵的减少上,并且这种减少与压缩损失强相关。在Llama 3.1 8B Instruct和Qwen3-8B上的实验表明,SigmaScale在困惑度和零样本基准测试上与最相关的基于SVD的压缩方法具有竞争力。通过使用学习到的激活感知变换,SigmaScale通过适应单个模型权重的结构,探索了一条更灵活的低秩LLM压缩路径。在特定任务中观察到的优势使我们的方法成为需要降低LLM推理计算成本的应用的有效选择。

英文摘要

We present SigmaScale, a method for learning auxiliary scaling matrices $S$ to aid truncated Singular Value Decomposition (SVD) based Large Language Model (LLM) compression. Instead of deriving scaling matrices analytically, SigmaScale optimizes two sets of vectors that define diagonal row and column scaling transformations under an activation-aware compression loss. We show that learned scaling lowers the effective intrinsic rank of weight matrices, as reflected by reductions in effective-rank entropy, and that this reduction is strongly correlated with compression loss. Experiments on Llama 3.1 8B Instruct and Qwen3-8B show that SigmaScale is competitive with closely related state-of-the-art SVD-based compression methods across perplexity and zero-shot benchmarks. By using learned activation-aware transformations, SigmaScale explores a more flexible route to low-rank LLM compression by adapting to the structure of individual model weights. The advantage observed in specific tasks makes our approach a valid option for applications requiring a reduced LLM-inference computing cost.
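The "effective-rank entropy" mentioned above can be made concrete with a commonly used definition of effective rank. The block below is a sketch under that assumption; the symbols $\sigma_i$, $p_i$, $H$ and $\operatorname{erank}$ are introduced here for illustration, and the paper's exact metric may differ.

```latex
% Assumed (standard) definition of effective rank via singular-value entropy.
% W is an m x n weight matrix with singular values sigma_1 >= ... >= sigma_q >= 0,
% q = min(m, n).
\[
  p_i = \frac{\sigma_i}{\sum_{j=1}^{q} \sigma_j}, \qquad
  H(W) = -\sum_{i=1}^{q} p_i \log p_i, \qquad
  \operatorname{erank}(W) = \exp\bigl(H(W)\bigr).
\]
% erank(W) ranges from 1 (a single nonzero singular value) to q (all singular
% values equal), with the convention 0 log 0 = 0.
```

Under this definition, a scaling that concentrates the singular-value mass in fewer directions lowers $H$, which is consistent with a smaller truncation error at a fixed rank.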

2606.07153 2026-06-08 math.NA cs.LG cs.NA math.OC 交叉投稿

No-Harm Physics-Informed Inverse Learning with Residual-Calibrated Uncertainty

无伤害物理信息逆学习与残差校准不确定性

Ronald Katende

发表机构 * Department of Mathematics(数学系) Kabale University(卡巴勒大学)

AI总结 提出一种无伤害认证与选择框架,通过残差校准半径确保物理信息逆学习不劣于基线,结合数据、物理、边界等残差提供后验误差界与确定性不确定性半径。

Comments 25 pages, 10 Tables, 12 Figures

详情
AI中文摘要

物理信息学习越来越多地用于偏微分方程控制的逆问题,但其可靠性仍然难以认证。本文开发了一种用于物理信息逆学习的无伤害认证与选择框架。仅当学习重建的残差校准半径不劣于基线半径时,即当 $$R_{\mathrm{learn}}\le R_{\mathrm{base}}+\varepsilon_{\mathrm{safe}}$$ 时,才接受学习重建;否则,该方法返回基线。该认证结合了数据、物理、边界或初始条件以及优化残差。在条件稳定性估计下,这些残差产生后验重建误差界和确定性不确定性半径。对于从独立随机配点估计的物理残差,还推导了高概率认证。在泊松源恢复、逆热重建、有限角度断层扫描、椭圆系数识别和随机残差验证上的数值测试表明,该选择器接受认证的改进,拒绝偏移、幻觉或未完成的候选,并在强不适定情况下变得保守。因此,该框架是一个认证与选择层,而不是另一个重建架构。

英文摘要

Physics-informed learning is increasingly used for partial differential equation (PDE)-governed inverse problems, but its reliability remains difficult to certify. This paper develops a no-harm certification-and-selection framework for physics-informed inverse learning. A learned reconstruction is accepted only when its residual-calibrated radius is no worse than the baseline radius, namely when $$R_{\mathrm{learn}}\le R_{\mathrm{base}}+\varepsilon_{\mathrm{safe}};$$ otherwise, the method returns the baseline. The certificate combines data, physics, boundary or initial-condition, and optimization residuals. Under a conditional stability estimate, these residuals yield an a posteriori reconstruction-error bound and a deterministic uncertainty radius. A high-probability certificate is also derived for physics residuals estimated from independent random collocation points. Numerical tests on Poisson source recovery, inverse heat reconstruction, limited-angle tomography, elliptic coefficient identification, and stochastic residual validation show that the selector accepts certified improvements, rejects shifted, hallucinated, or unfinished candidates, and becomes conservative in strongly ill-posed regimes. The framework is therefore a certification-and-selection layer, not another reconstruction architecture.
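The acceptance rule $R_{\mathrm{learn}}\le R_{\mathrm{base}}+\varepsilon_{\mathrm{safe}}$ can be sketched in a few lines of Python. This is only an illustration of the selection step: the function name `select_reconstruction` and its arguments are hypothetical, and how the radii are computed from the residuals is not shown because the abstract does not specify it.

```python
def select_reconstruction(r_learn: float, r_base: float, eps_safe: float = 0.0) -> str:
    """Return which reconstruction to keep under the no-harm rule.

    r_learn, r_base: residual-calibrated radii (assumed to be computed elsewhere).
    eps_safe: non-negative safety tolerance (hypothetical default of 0.0).
    """
    # Accept the learned reconstruction only if its radius is no worse
    # than the baseline radius plus the tolerance.
    if r_learn <= r_base + eps_safe:
        return "learned"
    # Any failed comparison (including NaN radii, for which <= is False)
    # falls back to the baseline.
    return "baseline"
```

Because a comparison involving NaN evaluates to false, a candidate whose radius could not be computed is rejected in this sketch, which matches the conservative behaviour the abstract describes.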

2606.07172 2026-06-08 cs.CV cs.AI cs.CL cs.LG 交叉投稿

Textual Supervision Enhances Geospatial Representations in Vision-Language Models

文本监督增强视觉-语言模型中的地理空间表示

Marcelo Sartori Locatelli, Fernando Tonucci, Jea Kwon, Luiz Felipe Vecchietti, Bryan Nathanael Wijaya, Cheng Yaw Low, Virgilio Almeida, Meeyoung Cha

发表机构 * University of São Paulo(圣保罗大学) National University of Singapore(新加坡国立大学)

AI总结 研究视觉、视觉-语言及多模态模型的地理空间表示能力,发现文本监督能有效提升空间编码,推动地理空间AI发展。

Comments Accepted at ICML 2026

详情
AI中文摘要

地理空间理解是机器学习系统在图像地理定位和空间推理等任务中一个关键但尚未充分探索的维度。在这项工作中,我们分析了三种模型家族获得的地理空间表示:纯视觉架构(如ViT)、视觉-语言模型(如CLIP)和大规模多模态基础模型(如LLaVA、Qwen和Gemma)。通过评估包括人物、地标和日常物体在内的图像聚类(根据可定位程度分组),我们揭示了空间准确性的系统性差距,并表明文本监督增强了地理空间表示的学习。我们的发现表明语言作为编码空间上下文的有效补充模态,以及多模态学习作为推进地理空间AI的关键方向。

英文摘要

Geospatial understanding is a critical yet underexplored dimension in the development of machine learning systems for tasks such as image geolocation and spatial reasoning. In this work, we analyze the geospatial representations acquired by three model families: vision-only architectures (e.g., ViT), vision-language models (e.g., CLIP), and large-scale multimodal foundation models (e.g., LLaVA, Qwen, and Gemma). By evaluating across image clusters (people, landmarks, and everyday objects) grouped by degree of localizability, we reveal systematic gaps in spatial accuracy and show that textual supervision enhances the learning of geospatial representations. Our findings point to language as an effective complementary modality for encoding spatial context, and to multimodal learning as a key direction for advancing geospatial AI.

2606.07180 2026-06-08 cs.CV cs.LG 交叉投稿

OPTIMUS-Prime: Minimal and Sufficient Concept Explanations for Deep Vision Models

OPTIMUS-Prime:深度视觉模型的最小且充分的概念解释

Arthur Hoarau, Chenrui Zhu, Vu Linh Nguyen

发表机构 * Université de Lorraine(洛林大学) CentraleSupélec Loria(中央理工-高等电力学院 Loria) CNRS(法国国家科学研究中心) Metz, France(法国梅斯) Université de technologie de Compiègne UMR CNRS 7253 Heudiasyc(贡比涅技术大学 UMR CNRS 7253 Heudiasyc) France(法国)

AI总结 提出OPTIMUS框架,基于主蕴含项理论生成视觉热图解释,满足充分性和最小性,提供形式化保证。

详情
AI中文摘要

自动化决策中日益增长的透明度需求已将可解释人工智能(XAI)推向机器学习研究的前沿。然而,在计算机视觉中,现有的解释方法通常优先考虑最终用户的可访问性,而牺牲了形式化保证,在实用性和理论严谨性之间留下了关键差距。在本文中,我们通过引入OPTIMUS(一种用于深度分类模型的基于概念的可视化解释的新框架)来弥补这一差距。OPTIMUS解释采用视觉热图的形式,不仅对最终用户保持可解释性,而且基于成熟的主蕴含项理论,提供了现有基于显著性方法所缺乏的形式化保证。具体来说,OPTIMUS解释满足两个理想性质:充分性,确保被强调的概念可证明地保证分类器的预测;以及最小性,确保这些概念的严格子集不再保留此保证。这两个性质共同产生了逻辑上紧凑且视觉上连贯的解释。我们在视觉分类基准上验证了我们的方法,证明OPTIMUS热图自然且忠实地呈现了模型预测背后的决策相关概念。

英文摘要

The growing demand for transparency in automated decision-making has propelled eXplainable Artificial Intelligence (XAI) to the forefront of machine learning research. In computer vision, however, existing explanation methods often prioritize end-user accessibility at the expense of formal guarantees, leaving a critical gap between practical utility and theoretical rigor. In this paper, we address this gap by introducing OPTIMUS, a novel framework for generating concept-based visual explanations for deep classification models. OPTIMUS explanations take the form of visual heatmaps that not only remain interpretable to end users, but are grounded in the well-established theory of prime implicants, providing formal guarantees that have been largely absent from existing saliency-based methods. Specifically, OPTIMUS explanations satisfy two desirable properties: sufficiency, ensuring that the highlighted concepts provably guarantee the classifier's prediction, and minimality, ensuring that no strict subset of those concepts retains this guarantee. Together, these properties yield explanations that are both logically tight and visually coherent. We validate our approach on a visual classification benchmark, demonstrating that OPTIMUS heatmaps naturally and faithfully surface the decision-relevant concepts underlying model predictions.

2606.07205 2026-06-08 cs.DS cs.LG 交叉投稿

Towards Tight Bounds for Streaming Attention

流式注意力机制的紧界

Justin Y. Chen, Ying Feng, Piotr Indyk, Michael Kapralov, Ekaterina Kochetkova, Boris Prokhorov

发表机构 * MIT(麻省理工学院) EPFL(洛桑联邦理工学院)

AI总结 本文通过核密度估计的三种方法(差异理论、多项式方法和空间划分)的紧密结合,几乎确定了流式注意力近似问题的空间复杂度紧界,并引入带大量辅助信息的INDEX问题新下界技术。

详情
AI中文摘要

注意力机制是现代Transformer架构的基石。然而,其表达能力以二次运行时和线性空间使用为代价。特别是,经典Transformer架构显式存储所有先前看到的输入元素(token)以生成下一个。在有限空间中实现Transformer的问题,称为KV缓存压缩,在过去几年中引起了广泛关注,推动了强大启发式算法的发展。Haris等人(COLT'25)和Kochetkova等人(NeurIPS'25)的最新工作将KV缓存压缩形式化为流式注意力近似问题,并提供了基于差异理论的上界和信息论下界。然而,这些论文在上界和下界之间留下了显著差距。例如,他们算法的空间使用随精度参数增加,但下界并未增强。在这项工作中,我们重新审视流式注意力近似问题,并给出了其空间复杂度的几乎紧界。在算法方面,我们通过核密度估计的三种不同方法(基于差异的coreset构造(如Charikar-Kapralov-Waingarten'24)、多项式方法(如Greengard-Rokhlin'87、Alman-Song'23)和空间划分(如Andoni-Laarhoven-Razenshteyn-Waingarten'17、Charikar-Kapralov-Nouri-Siminelakis'20))之间令人惊讶的紧密相互作用实现了这一结果。在下界方面,我们的主要技术贡献是一种使用大量辅助信息的INDEX问题的新技术,我们希望这将在其他高维几何估计问题中证明有用。

英文摘要

The attention mechanism is a cornerstone of modern transformer architectures. However, its expressive power comes at the cost of quadratic runtime and linear space usage. In particular, the classical transformer architecture explicitly stores all previously seen input elements (tokens) in order to generate the next one. The problem of implementing a transformer in limited space, known as KV cache compression, has received much interest over the past few years, spurring the development of powerful heuristics. Recent works of Haris et al. (COLT'25) and Kochetkova et al. (NeurIPS'25) formalized KV cache compression as the streaming attention approximation problem, providing both upper bounds (based on discrepancy theory) and information-theoretic lower bounds. However, those papers left open a significant gap between the upper and lower bounds. For example, the space usage of their algorithms increases with the precision parameter, but the lower bound does not get stronger. In this work, we revisit the streaming attention approximation problem and provide nearly tight bounds on its space complexity. On the algorithmic side, we achieve the result through a surprisingly tight interplay between three distinct methods for kernel density estimation: discrepancy-based coreset constructions (e.g., Charikar-Kapralov-Waingarten'24), the polynomial method (e.g., Greengard-Rokhlin'87, Alman-Song'23), and space partitioning (e.g., Andoni-Laarhoven-Razenshteyn-Waingarten'17, Charikar-Kapralov-Nouri-Siminelakis'20). On the lower bound side, our main technical contribution is a new technique for using the INDEX problem with a large amount of side information that we hope will prove useful in other high-dimensional geometric estimation problems.

2606.07207 2026-06-08 cs.SD cs.LG eess.AS 交叉投稿

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

熵作为结构先验:DiT信念空间上的对数障碍如何驱动音乐多样性与发展

Zixi Li, Youzhen Li

发表机构 * Sun Yat-sen University(中山大学) Datawhale

AI总结 提出Eisbach对数障碍,利用DiT输出空间能量分布的熵作为权重,在监督扩散训练中通过调节梯度步长促进音乐主题发展、声学区分和纹理多样性,避免模式崩溃。

详情
AI中文摘要

基于置信度的损失加权通常在生成模型中被避免,因为当模型自信地错误时会加速误差,但这种直觉在监督扩散训练中不成立。我们引入了Eisbach对数障碍,一种无参数权重,源自DiT输出空间能量分布的熵:高熵抑制梯度,低熵保留梯度。将其应用于Stable Audio 3 Medium在MusicCaps上的LoRA微调,意外地产生了比未加权训练更强的主题发展、更清晰的声学区分和更高的纹理多样性,这与模式崩溃相反。这是因为在监督扩散中,梯度方向锁定于真实值,因此置信度仅缩放步长,并且因为时间熵对平坦样本降权而保留高对比度样本。结果是一个在线、自引用的数据课程,完全从前向传播中涌现,并分析了噪声级动态和可测试的预测。

英文摘要

Confidence-based loss weighting is usually avoided in generative models because it accelerates errors when the model is confidently wrong, but this intuition breaks down in supervised diffusion training. We introduce the Eisbach log-barrier, a parameter-free weight derived from the entropy of the DiT output's spatial energy distribution: high entropy damps the gradient, while low entropy preserves it. Applied to LoRA fine-tuning of Stable Audio 3 Medium on MusicCaps, it unexpectedly yields stronger thematic development, clearer acoustic differentiation, and higher textural diversity than unweighted training, the opposite of mode collapse. This works because in supervised diffusion the gradient direction is locked to ground truth, so confidence only scales the step size, and because temporal entropy downweights flat samples while preserving high-contrast ones. The result is an online, self-referential data curriculum that emerges purely from the forward pass, with analyzed noise-level dynamics and testable predictions.

2606.07217 2026-06-08 cs.RO cs.CV cs.LG 交叉投稿

Robotic Policy Adaptation via Weight-Space Meta-Learning

通过权重空间元学习实现机器人策略自适应

Christian Bianchi, Siamak Yousefi, Alessio Sampieri, Andrea Roberti, Luca Rigazio, Fabio Galasso, Luca Franco

发表机构 * ItalAI University of Verona(维罗纳大学) Sapienza University of Rome(罗马萨皮恩扎大学)

AI总结 提出WIZARD框架,通过权重空间元学习从语言指令和演示视频生成任务特定LoRA参数,无需微调即可适应新任务,在LIBERO上性能提升高达14倍。

详情
AI中文摘要

视觉-语言-动作(VLA)模型正成为机器人操作的一种有前景的范式,能够从大规模演示和动作标签语料库中训练通用策略。然而,将这些模型适应新任务通常仍需要任务特定的演示、动作注释和额外的微调,使得部署成本高昂且难以扩展。我们提出WIZARD,一种权重空间元学习框架,通过为冻结的VLA策略生成任务特定的LoRA参数来避免任务特定的微调。仅凭语言指令和简短的演示视频,WIZARD即可在单次前向传播中预测相应的自适应权重,无需目标任务动作标签或测试时优化。在元训练期间,WIZARD学习将任务证据直接映射到专家LoRA更新,在权重空间中捕获任务之间的关系。在LIBERO上的实验表明,WIZARD在未见过的数据集集合上性能提升高达约2倍,在未见过的任务上提升高达约14倍。在Franka Emika Panda机器人上,WIZARD持续优于真实域自适应基线,表明生成的适配器提供了超越仿真的任务级特化。

英文摘要

Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained from large corpora of demonstrations and action labels. However, adapting these models to new tasks still typically requires task-specific demonstrations, action annotations, and additional fine-tuning, making deployment costly and difficult to scale. We propose WIZARD, a weight-space meta-learning framework that sidesteps task-specific fine-tuning by generating task-specific LoRA parameters for a frozen VLA policy. Given only a language instruction and a short demonstration video, WIZARD predicts the corresponding adaptation weights in a single forward pass, without target-task action labels or test-time optimization. During meta-training, WIZARD learns to map task evidence directly to expert LoRA updates, capturing relationships between tasks in weight space. Experiments on LIBERO show that WIZARD improves performance by up to ~2x on unseen dataset collections and up to ~14x on unseen tasks. On a Franka Emika Panda, WIZARD consistently improves over a real-domain adapted baseline, showing that generated adapters provide task-level specialization beyond simulation.

2606.07233 2026-06-08 cs.CV cs.LG cs.RO 交叉投稿

Does Appearance Help? A Systematic Study of Image-Based Re-Identification in Online 3D Multi-Pedestrian Tracking

外观有帮助吗?在线3D多行人追踪中基于图像的重识别系统研究

Eduardo Borges, Luís Garrote, Urbano J. Nunes

发表机构 * Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra(系统与机器人研究所,电气与计算机工程系,科英布拉大学)

AI总结 系统研究轻量级投影框架下图像重识别在在线3D多目标追踪中的作用,提出级联匹配策略以在低延迟下恢复遮挡轨迹并防止身份切换。

Comments Accepted for publication at the 35th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2026)

详情
AI中文摘要

基于LiDAR的3D多目标追踪通常仅依赖几何信息,这在长时间遮挡或拥挤人群环境中往往不足以区分目标。虽然集成基于RGB的重识别提供了保持身份上下文的理论解决方案,但现有方法通常依赖计算昂贵的并行检测器,阻碍了机器人的实时响应。本文通过利用轻量级投影框架解耦移动机器人的几何和外观建模,对在线3D多目标追踪中的基于图像的重识别进行了系统研究。对特征提取架构进行了全面分析,采用轻量级CNN和视觉Transformer,并评估了多种多模态数据关联策略以平衡计算延迟和鲁棒追踪。在KITTI数据集的行人类别上的实验表明,外观和运动成本的朴素线性融合由于视觉噪声而降低了性能。相反,级联匹配策略成功恢复了被遮挡的轨迹而不损害整体精度,有效防止了身份切换以维持人机交互的连续性。我们表明,轻量级架构可以在安全导航所需的低延迟和社交意识所需的判别能力之间提供最优权衡。

英文摘要

LiDAR-based 3D Multi-Object Tracking (MOT) typically relies solely on geometric information, which is often insufficient to distinguish between targets during prolonged occlusions or in crowded human-populated environments. While integrating RGB-based Re-Identification (ReID) offers a theoretical solution for preserving identity context, existing approaches often rely on computationally expensive parallel detectors that hinder real-time robot responsiveness. This work presents a systematic study of image-based ReID in online 3D MOT, utilizing a lightweight projection-based framework to decouple geometric and appearance modeling for mobile robots. A comprehensive analysis of feature extraction architectures is conducted, employing lightweight CNNs and Vision Transformers, and evaluating various multi-modal data association strategies to balance computational latency with robust tracking. Experiments on the Pedestrian class of the KITTI dataset reveal that naive linear fusion of appearance and motion costs degrades performance due to visual noise. Conversely, a cascaded matching strategy successfully recovers occluded tracks without compromising overall precision, effectively preventing identity switches to maintain human-robot interaction continuity. We show that lightweight architectures can offer an optimal trade-off between the low latency required for safe navigation and the discriminative power needed for social awareness.

2606.07237 2026-06-08 cs.CL cs.AI cs.LG 交叉投稿

When Large Language Models Fail in Healthcare: Evaluating Sensitivity to Prompt Variations

当大型语言模型在医疗保健中失败:评估对提示变化的敏感性

Mahdi Alkaeed

发表机构 * Department of Computer Science and Engineering, Doha, Qatar(计算机科学与工程系,多哈,卡塔尔)

AI总结 本研究系统分析了通用和医学专用LLM对提示扰动的敏感性,发现即使是微小的措辞变化也可能改变临床建议,对抗性提示可能引发有害输出,表明这些模型在临床应用中不可靠。

Comments 12 pages

详情
AI中文摘要

大型语言模型(LLM)越来越多地用于医疗保健任务,如临床问答、诊断支持和报告总结。尽管前景广阔,但这些模型对微小的提示扰动(包括词汇和句法)仍然高度敏感,在安全关键的临床应用中构成严重风险。在本研究中,我们使用MedMCQA基准进行了系统的敏感性分析,以评估通用(例如GPT-3.5、Llama3)和医学专用LLM(例如ClinicalBERT、BioLlama3、BioBERT)的鲁棒性。我们将扰动分为自然和对抗两种类型,并检查它们对临床推理任务中模型一致性、准确性和可靠性的影响。我们的发现表明,医学LLM并非本质安全。即使是措辞的微小变化也可能改变临床建议,而针对性的对抗性提示可能引发有害输出。在医疗保健等高风险环境中,这种不可预测性是不可接受的——模型因重新措辞的输入而改变诊断,或因轻微改写而幻觉药物,临床医生无法可靠地信任它们。虽然模型通常对简单的词汇替换或释义表现出韧性,但在句法重新排序或误导性上下文线索下往往会崩溃。这种脆弱性在通用和领域专用LLM中都很明显。值得注意的是,对抗性操作可能导致临床危险的输出,例如推荐不正确的剂量或遗漏关键发现。

英文摘要

Large Language Models (LLMs) are increasingly used in healthcare for tasks such as clinical question answering, diagnosis support, and report summarization. Despite their promise, these models remain highly sensitive to subtle prompt perturbations, both lexical and syntactic, posing serious risks in safety-critical clinical applications. In this study, we conduct a systematic sensitivity analysis to evaluate the robustness of both general-purpose (e.g., GPT-3.5, Llama3) and medical-specific LLMs (e.g., ClinicalBERT, BioLlama3, BioBERT) using the MedMCQA benchmark. We categorize perturbations into natural and adversarial types and examine their effect on model consistency, accuracy, and reliability in clinical reasoning tasks. Our findings reveal that medical LLMs are not intrinsically safe. Even minor variations in phrasing can alter clinical advice, and targeted adversarial prompts can provoke harmful outputs. In high-stakes settings like healthcare, such unpredictability is unacceptable: models that change diagnoses due to reworded inputs or hallucinate medications when prompts are slightly rephrased cannot be reliably trusted by clinicians. While models tend to show resilience to simple lexical substitutions or paraphrasing, they often break down under syntactic reordering or misleading contextual cues. This fragility is evident across both general-purpose and domain-specific LLMs. Notably, adversarial manipulations can lead to clinically dangerous outputs, such as recommending incorrect dosages or omitting critical findings.

2606.07257 2026-06-08 physics.optics cs.LG 交叉投稿

On the conditional equivalence of phase retrieval algorithms

关于相位恢复算法的条件等价性

Jakob Schroeder, Andreas Döpp

发表机构 * Fakultät für Physik, Ludwig-Maximilians-Universität München(慕尼黑路德维希-马克西米利安大学物理系) Munich Center for Machine Learning (MCML)(慕尼黑机器学习中心) Max-Planck-Institut für Quantenoptik(马克斯·普朗克量子光学研究所)

AI总结 本文证明了Gerchberg-Saxton算法与梯度下降法在幅度最小二乘损失上等价,并给出了全局和局部的概率解释,为迭代相位恢复中的松弛提供指导。

详情
AI中文摘要

相位恢复——从强度测量中恢复复值场——通常使用Gerchberg-Saxton (GS)算法的变体来解决,该算法被理解为测量平面之间的交替投影。同时,现代计算成像越来越依赖于基于梯度的优化和自动微分。这里我们表明这两种方法在数学上是等价的:GS幅度替换步骤恰好是幅度最小二乘损失上的单位梯度下降步骤。这种等价性使得经典相位恢复与可微物理管道无缝集成。我们进一步确定了这种等价性的两种互补概率解释:全局上,幅度损失是高斯幅度噪声下的负对数似然;局部上,每个投影步骤作为贝叶斯更新出现,以传播场为先验。局部观点为迭代相位恢复中的松弛提供了定性指导。

英文摘要

Phase retrieval, the recovery of a complex-valued field from intensity measurements, is typically solved using variants of the Gerchberg-Saxton (GS) algorithm, understood as alternating projections between measurement planes. Meanwhile, modern computational imaging increasingly relies on gradient-based optimization and automatic differentiation. Here we show that these two approaches are mathematically identical: the GS magnitude replacement step is exactly a unit gradient descent step on an amplitude least-squares loss. This equivalence enables seamless integration of classical phase retrieval with differentiable physics pipelines. We further identify two complementary probabilistic interpretations of this equivalence: globally, the amplitude loss is the negative log-likelihood under Gaussian amplitude noise; locally, each projection step arises as a Bayesian update with the propagated field as prior. The local view provides qualitative guidance for relaxation in iterative phase retrieval.
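The claimed equivalence can be checked in a few lines. The derivation below is a sketch under assumptions not stated in the abstract: the propagator $A$ is unitary (for example a normalized DFT), the loss carries a factor $\tfrac12$, and the complex gradient is taken with respect to the real and imaginary parts of $x$. The symbols $A$, $b$, $y$ and $\eta$ are introduced here for illustration, and the paper's conventions may differ.

```latex
% Setup: x is the field in the object plane, y = A x the propagated field,
% b >= 0 the measured amplitudes, A^H A = I (unitary propagator, assumed).
\begin{align*}
  \mathcal{L}(x) &= \tfrac12 \,\bigl\| \, |Ax| - b \, \bigr\|_2^2 , \\
  \nabla \mathcal{L}(x)
    &= A^{H}\!\left[ \bigl(|y| - b\bigr) \odot \frac{y}{|y|} \right]
     = A^{H} y - A^{H}\!\left( b \odot \frac{y}{|y|} \right)
     = x - A^{H}\!\left( b \odot e^{\,i \arg y} \right), \\
  x - \eta \,\nabla \mathcal{L}(x) \Big|_{\eta = 1}
    &= A^{H}\!\left( b \odot e^{\,i \arg y} \right).
\end{align*}
% The right-hand side is the GS update: keep the phase of y, replace its
% magnitude by b, and propagate back. The identity holds wherever y_k != 0.
```

With a non-unitary $A$ the term $A^{H}A\,x$ no longer reduces to $x$, which is presumably one reason the title speaks of a conditional equivalence.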

2606.07277 2026-06-08 cs.IT cs.CR cs.LG math.IT 交叉投稿

The Capacity of Information-Theoretic Secure Aggregation in Federated Learning

联邦学习中信息论安全聚合的容量

Lanxin Yi, Jinbao Zhu, Kai Wan, Xiaohu Tang

发表机构 * Information Coding and Transmission (ICT) Key Laboratory of Sichuan Province, Southwest Jiaotong University(四川省信息编码与传输(ICT)重点实验室,西南交通大学) School of Electronic Information and Communications, Huazhong University of Science and Technology(华中科技大学电子信息与通信学院)

AI总结 针对联邦学习中的安全聚合问题,提出一种无需可信第三方或预设结构的通用密钥分发模型,并完整刻画了安全性、密钥分发通信和聚合通信三者间的容量区域。

详情
AI中文摘要

安全聚合允许服务器在保护更新隐私的情况下聚合用户的本地更新。现有的信息论问题通常假设相关随机密钥由可信第三方(TTP)提供或通过规定的群组结构生成,而建立此类相关密钥的通信成本往往被忽略。因此,在通用密钥分发机制下的基本极限仍然未知。在本文中,我们研究了在由密钥分发阶段和更新聚合阶段组成的通用两阶段框架下,具有$N$个用户的$T$共谋信息论安全聚合问题。与先前工作不同,我们通过用户间通信对密钥分发进行建模,并允许任意用户生成的密钥分发机制,消除了TTP或规定结构。这使得能够联合表征三种资源:用于安全性的随机性、密钥分发通信和聚合通信。我们通过构建一种新颖的安全聚合方案以及匹配的信息论逆定理,完全刻画了这三种资源之间的容量区域。特别地,我们在任何大小至少为$N$的有限域上开发了一种显式的确定性容量达到构造,而大多数现有方案要么依赖TTP,要么在足够大的有限域上采用随机或存在性构造。我们进一步表明,仅使用成对共享密钥即可实现最优性能,从而可以通过Diffie-Hellman密钥交换实现。与Google开创性的安全聚合方案相比,所提方案在保持相同聚合通信开销的同时,需要更少的随机掩码密钥。

英文摘要

Secure aggregation allows a server to aggregate users' local updates while preserving update privacy. Existing information-theoretic problems typically assume that correlated random keys are provided by a trusted third party (TTP) or generated via prescribed groupwise structures, while the communication cost for establishing such correlated keys is often ignored. Consequently, the fundamental limits under general key-distribution mechanisms remain unknown. In this paper, we study the $T$-colluding information-theoretic secure aggregation problem with $N$ users under a general two-phase framework consisting of a key distribution phase and an update aggregation phase. Unlike prior work, we model key distribution through user-to-user communication and allow arbitrary user-generated key-distribution mechanisms, eliminating TTP or prescribed structures. This enables a joint characterization of three resources: randomness for security, key-distribution communication, and aggregation communication. We completely characterize the capacity region among these three resources by constructing a novel secure aggregation scheme together with a matching information-theoretic converse. In particular, we develop an explicit deterministic capacity-achieving construction over any finite field of size at least $N$, whereas most existing schemes either rely on TTP or employ randomized or existential constructions over sufficiently large finite fields. We further show that the optimal performance can be achieved using only pairwise shared keys, enabling implementation via Diffie--Hellman key exchange. Compared with Google's seminal secure aggregation scheme, the proposed scheme requires fewer random masking keys while preserving the same aggregation communication overhead.
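The remark that pairwise shared keys suffice can be illustrated with the classical pairwise-masking idea. The sketch below is that textbook construction over a prime field, not the capacity-achieving scheme of the paper: it ignores collusion thresholds and user dropouts, and the names `pairwise_masked_uploads`, `aggregate` and the field size `p` are hypothetical choices for illustration.

```python
import random

def pairwise_masked_uploads(updates, p, rng):
    """Mask each user's scalar update with pairwise shared keys modulo p.

    updates: list of integers in [0, p), one per user (toy scalar updates).
    p: prime field size (assumed to be at least the number of users).
    rng: random.Random instance standing in for the key agreement step.
    """
    n = len(updates)
    # One key per unordered pair (i, j) with i < j, known to both users.
    keys = {(i, j): rng.randrange(p) for i in range(n) for j in range(i + 1, n)}
    uploads = []
    for i, x in enumerate(updates):
        mask = 0
        for j in range(n):
            if i < j:
                mask += keys[(i, j)]   # user i adds the key shared with a later user
            elif j < i:
                mask -= keys[(j, i)]   # and subtracts the key shared with an earlier user
        uploads.append((x + mask) % p)
    return uploads

def aggregate(uploads, p):
    """Server-side sum; every pairwise key appears once with + and once with -."""
    return sum(uploads) % p
```

A toy run with `pairwise_masked_uploads([3, 5, 7, 2], 101, random.Random(0))` followed by `aggregate(..., 101)` recovers the sum of the updates modulo 101, while each individual upload is offset by that user's masks. In practice the shared keys would come from a key agreement protocol such as Diffie-Hellman rather than a local random generator.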

2606.07293 2026-06-08 cs.SD cs.LG 交叉投稿

TargetSEC: Plug-and-Play In-the-Wild Speech Emotion Conversion via Arousal-Conditioned Latent Style Diffusion

TargetSEC: 基于唤醒度条件潜在风格扩散的即插即用野外语音情感转换

Constantin Alexander Auga

发表机构 * Hasso Plattner Institute / University of Potsdam(哈索·普拉特纳研究所 / 波茨坦大学)

AI总结 提出TargetSEC,一种基于嵌入驱动的潜在扩散框架,通过连续情感条件生成情感风格嵌入,在紧凑潜在空间操作,实现高转换精度和语音质量。

Comments 5 pages, 2 figures, 2 tables, preprint

详情
AI中文摘要

语音情感转换(SEC)旨在将源话语的情感转换为目标情感,同时保留内容和说话人身份。由于训练数据的非平行性和复杂的真实世界声学环境,野外数据上的SEC具有挑战性。现有的固定时长方法要么难以有效转移情感(高质量、低转换),要么降低语音自然度(低质量、高转换)。我们提出TargetSEC,一种嵌入驱动的潜在扩散框架,根据说话人身份和连续情感生成以情感为中心的风格嵌入。与在频谱图上扩散的方法不同,TargetSEC在紧凑潜在空间中操作。在MSP-Podcast数据集上的实验表明,TargetSEC在转换准确性上优于当前非时长基线,同时保持高语音质量,并且在没有显式时间建模的情况下实现了与时长预测系统相当的性能。

英文摘要

Speech Emotion Conversion (SEC) aims to transform the emotion of a source utterance into a target emotion while preserving content and speaker identity. SEC on in-the-wild data is challenging due to the non-parallel nature of training data and complex real-world acoustics. Existing fixed-duration approaches either struggle to shift the emotion effectively (high quality, low conversion) or degrade speech naturalness (low quality, high conversion). We propose TargetSEC, an embedding-driven latent diffusion framework that generates emotion-focused style embeddings conditioned on speaker identity and continuous emotion. Unlike methods that diffuse over spectrograms, TargetSEC operates in a compact latent space. Experiments on the MSP-Podcast dataset show that TargetSEC outperforms current non-duration baselines in conversion accuracy while maintaining high speech quality, and achieves performance comparable to duration-prediction systems without explicit temporal modeling.

2606.07366 2026-06-08 cs.CV cs.LG cs.RO 交叉投稿

Dash2Sim: Closed-Loop Driving Simulation from in-the-wild Dashcam Videos

Dash2Sim: 来自野外行车记录仪视频的闭环驾驶仿真

Anurag Ghosh, Francesco Pittaluga, Khiem Vuong, Angela Chen, Juan Alvarez-Padilla, Manmohan Chandraker, Srinivasa Narasimhan

发表机构 * Carnegie Mellon University(卡内基梅隆大学) NEC Labs America(NEC美国实验室) MIT(麻省理工学院) UC San Diego(加州大学圣地亚哥分校)

AI总结 提出Dash2Sim框架,将单目行车记录仪视频转化为度量级、地理参考的4D驾驶日志,用于闭环仿真,并构建ROADWork4D基准数据集,验证了施工区场景对规划器的挑战。

详情
AI中文摘要

自动驾驶仿真通常依赖于在少数城市收集的数据或手工编写的合成场景。行车记录仪视频覆盖了更广泛的位置和情况,包括罕见或长尾场景。由于难以从单目野外视频中恢复准确的4D场景,它们被认为不太适用于仿真。施工区是行车记录仪捕捉到的一类长尾情况。我们提出Dash2Sim,一个将野外单目行车记录仪视频转化为度量级、地理参考的4D驾驶日志并与现有仿真器兼容的框架,并针对独立维护的地图验证每个日志,无需标注。我们将Dash2Sim应用于大型视频语料库,创建了ROADWork4D基准数据集,涵盖17个城市的4,244个场景和270万个3D对象。在验证子集ROADWork4D-CL(2,201个场景)上,我们研究了特权闭环规划器,发现施工区场景具有挑战性:尽管基于规则和混合规划器的泛化能力优于基于学习的规划器,但所有规划器均表现不足,无法完成临时施工区通道所需的变道。在规划之外,Dash2Sim恢复的密集深度在新视角合成质量上提高了高达19%(基于感知指标),表明其具有为单目视频的闭环传感器仿真提供丰富条件的潜力。

英文摘要

Self-driving simulations typically rely on data collected in a small number of cities or on hand-authored synthetic scenarios. Dashcam videos cover a far broader range of locations and situations, including rare or long-tailed scenarios. They are considered less usable for simulation because it is difficult to recover accurate 4D scenes from monocular in-the-wild videos. Work zones are one such class of long-tailed situations that dashcams capture. We present Dash2Sim, a framework that turns in-the-wild monocular dashcam videos into metric, geo-referenced 4D driving logs compatible with existing simulators, and verifies each one against an independently maintained map without annotations. We apply Dash2Sim to a large video corpus to create the ROADWork4D benchmark dataset, which spans 4,244 scenes with 2.7M 3D objects across 17 cities. On a verified subset ROADWork4D-CL (2,201 scenes), we study privileged closed-loop planners and find that work zone scenarios are difficult: while rule-based and hybrid planners generalize better than learning-based ones, all fall short, failing to make the lane changes that temporary work zone channels require. Beyond planning, dense depth recovered by Dash2Sim improves novel-view synthesis quality by up to 19% on perceptual metrics, suggesting its potential to provide rich conditioning for closed-loop sensor simulation from monocular videos.

2606.07383 2026-06-08 cs.RO cs.LG 交叉投稿

RhinoVLA Technical Report

RhinoVLA 技术报告

Huixi Intelligence: Chen Zhang, Chenyang Zhou, Guanglei Ding, Guanghui He, Haibin Gao, Jiajia Chen, Jianyong Zhang, Lianyi Yu, Ningyi Xu, Ping Xu, Qingchen Li, Yingjun Hu, Yijia Zhang, Yuxi Liu

发表机构 * Huixi Intelligence(辉羲智能)

AI总结 针对边缘硬件上VLA模型部署延迟问题,提出RhinoVLA,通过令牌高效骨干、连续动作专家和统一接口实现实时闭环控制,在Huixi R1上达到11.69 Hz推理速度。

详情
AI中文摘要

视觉-语言-动作(VLA)模型在机器人操作中展现出强大潜力,但在边缘硬件上的实时部署仍具挑战。本文中,我们识别出VLM视觉和上下文令牌是部署延迟的主要来源之一:对于以GEMM为主的投影算子,当模型维度固定时,计算量随输入令牌数量线性增长。基于此观察,我们提出RhinoVLA,一种与Huixi R1边缘SoC协同设计的面向部署的VLA模型。RhinoVLA采用令牌高效的Qwen3-VL骨干和连续动作专家,在保留预训练多模态能力的同时减少VLM侧的令牌和计算负担。为支持跨机器人学习,RhinoVLA进一步引入统一接口,结合视图注册表、72维物理状态-动作槽空间和机器人实例LoRA,使异构机器人观测和动作模式能在共享策略下对齐。在部署方面,RhinoVLA通过硬件感知编译、混合精度执行和并行视觉编码进行优化。实验表明,RhinoVLA在相似参数量下实现了与π0.5相当的下游性能,同时在Huixi R1上达到11.69 Hz的端到端推理,满足10 Hz实时闭环控制目标。该项目将在 https://github.com/HuixiAI/RhinoVLA 开源。

英文摘要

Vision-Language-Action (VLA) models have shown strong potential for robotic manipulation, but real-time deployment on edge hardware remains challenging. In this work, we identify VLM visual and context tokens as a major source of deployment latency: for GEMM-dominated projection operators, computation grows linearly with the number of input tokens when model dimensions are fixed. Motivated by this observation, we propose RhinoVLA, a deployment-oriented VLA model co-designed with the Huixi R1 edge SoC. RhinoVLA adopts a token-efficient Qwen3-VL backbone and a continuous Action Expert, reducing the VLM-side token and computation burden while preserving pretrained multimodal capability. To support cross-robot learning, RhinoVLA further introduces a unified interface that combines a View Registry, a 72D physical state-action slot space, and robot-instance LoRA, allowing heterogeneous robot observations and action schemas to be aligned under a shared policy. On the deployment side, RhinoVLA is optimized through hardware-aware compilation, mixed-precision execution, and parallel visual encoding. Experiments show that RhinoVLA achieves downstream performance comparable to π0.5 at a similar parameter scale, while reaching 11.69 Hz end-to-end inference on Huixi R1, meeting the 10 Hz real-time closed-loop control target. The project will be open-sourced at https://github.com/HuixiAI/RhinoVLA.
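The linear-in-tokens claim in the abstract above follows from the shape of a dense projection. A rough sketch, with illustrative dimensions that are assumptions rather than RhinoVLA's actual configuration:

```python
def projection_flops(n_tokens, d_in, d_out):
    """FLOPs for a dense projection (n_tokens, d_in) @ (d_in, d_out),
    counting one multiply and one add per inner-product term."""
    return 2 * n_tokens * d_in * d_out


# Hypothetical sizes: with d_in and d_out fixed, halving tokens halves compute.
base = projection_flops(n_tokens=1024, d_in=2048, d_out=2048)
halved = projection_flops(n_tokens=512, d_in=2048, d_out=2048)
print(halved / base)  # 0.5
```

This counts only GEMM work in projection layers; attention score computation scales differently with sequence length and is not modeled here.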

2606.07385 2026-06-08 nlin.CD cs.LG physics.data-an 交叉投稿

Unified Geometry-Guided ML-FTLE for Tracking Transient Chaos from Scalar Time Series

统一几何引导的ML-FTLE用于从标量时间序列追踪瞬态混沌

S. V. Manivelan, Andrei Velichko, I. Manimehan

发表机构 * Department of Physics, M. R. Government Arts College (Affiliated to Bharathidasan University, Tiruchirappalli)(物理系,M.R.政府艺术学院(隶属于蒂鲁吉拉帕利的巴拉蒂达桑大学)) Institute of Physics and Technology, Petrozavodsk State University(物理与技术学院,彼得罗扎沃茨克国立大学)

AI总结 提出几何引导的机器学习框架,通过结合预测轨迹发散和宏观吸引子形态,从标量观测中检测瞬态混沌,无需控制方程,验证了融合拓扑状态空间与预测发散能系统改进连续过渡追踪。

Comments Preprint; 9 figures; submitted for peer review

详情
AI中文摘要

在没有控制方程的情况下,从标量观测中检测瞬态混沌是非线性动力学中的一个基本挑战。我们提出了一个几何引导的机器学习框架,该框架统一了预测轨迹发散与宏观吸引子形态,以追踪突发的动力学状态转变。该方法通过样本外 k-最近邻预测误差提取局部不稳定性尺度,建立 ML-FTLE 估计器,随后将此时间发散映射到由最小庞加莱占用网格字典导出的结构接近度矩阵上。通过偏最小二乘回归,我们提取一个直接校准到经验有限时间李雅普诺夫谱的潜在几何成分,得到基于庞加莱的几何引导 FTLE。对解析 QR-FTLE 基线的验证证实,融合拓扑状态空间与预测发散系统地改进了连续过渡追踪。结构相似性指数最优地解析了逐渐阻尼,而豪斯多夫距离在突然的相空间崩溃期间表现出极强的稳健性。此外,宏观空间离散化作为针对加性高斯噪声的鲁棒拓扑正则化器,即使在中等信号阈值下也能保留确定性特征。这个无方程框架为监测复杂非平稳系统中的结构转变提供了高精度、抗噪声的诊断方法。

英文摘要

Detecting transient chaos from scalar observations without governing equations represents a fundamental challenge in nonlinear dynamics. We propose a geometry-guided machine learning framework that unifies predictive trajectory divergence with macroscopic attractor morphology to track abrupt regime shifts. The methodology extracts a local instability scale via out-of-sample k-nearest neighbor forecast errors to establish the ML-FTLE estimator, subsequently mapping this temporal divergence onto a structural closeness matrix derived from a minimal dictionary of Poincaré occupancy grids. By employing partial least squares regression, we extract a latent geometric component calibrated directly to the empirical finite-time Lyapunov spectrum, yielding the Poincaré-based geometric-guided FTLE. Validation against analytical QR-FTLE baselines confirms that fusing topological state spaces with predictive divergence systematically improves continuous transition tracking. The Structural Similarity Index optimally resolves gradual damping, while Hausdorff Distance exhibits extreme resilience during abrupt phase-space collapses. Furthermore, macroscopic spatial discretization acts as a robust topological regularizer against additive Gaussian noise, preserving deterministic signatures even at moderate signal thresholds. This equation-free framework provides a highly accurate, noise-resilient diagnostic for monitoring structural transitions in complex non-stationary systems.

2606.07392 2026-06-08 cs.AI cs.LG econ.EM stat.ML 交叉投稿

Online Pandora's Box for Contextual LLM Cascading

面向上下文LLM级联的在线潘多拉魔盒

Alexandre Belloni, Yan Chen, Yehua Wei

发表机构 * The Fuqua School of Business, Duke University(杜克大学富卡商学院)

AI总结 针对LLM级联场景,提出在线上下文潘多拉魔盒模型,通过参数化保留索引和GMM估计结合UCB界,实现维度相关的√T累积遗憾。

详情
AI中文摘要

受大型语言模型(LLM)级联的启发,我们提出了一种在线上下文潘多拉魔盒模型,用于自适应地查询和选择LLM API。在每个周期中,决策者观察一个请求上下文,并面临一个两阶段决策问题。在查询阶段,决策者顺序查询API,每次查询揭示一个生成的输出,并且决策者承担(输出相关的)成本。在选择阶段,决策者选择一个生成的输出进行部署,并仅观察部署输出的下游奖励。这种输出介导的反馈结构不同于经典的在线上下文潘多拉魔盒模型,后者打开盒子直接揭示其奖励。我们不估计每个API的完整条件输出和成本分布,而是直接建模保留索引,并为查询阶段开发一种学习方法。具体地,我们对由经典Weitzman策略诱导的上下文保留索引函数施加参数化结构。我们的策略将这些保留索引的广义矩方法(GMM)类型估计与这些索引以及共享输出级奖励评估器的UCB风格置信界相结合。在正则条件下,我们证明所得策略在T个周期的时间范围内实现了维度相关的$\widetilde O(\sqrt T)$累积遗憾。

英文摘要

Motivated by Large Language Model (LLM) cascading, we propose an online contextual Pandora's Box model for adaptively querying and selecting LLM APIs. In each period, a decision-maker observes a request context and faces a two-phase decision problem. In the query phase, the decision-maker sequentially queries APIs, where each query reveals a generated output and the decision-maker incurs an (output-dependent) cost. In the selection phase, the decision-maker selects one of the generated outputs to deploy and observes only the downstream reward of the deployed output. This output-mediated feedback structure differs from classical online contextual Pandora's Box models, in which opening a box directly reveals its reward. Rather than estimating the full conditional output and cost distributions of each API, we directly model the reservation index and develop a learning approach for the query phase. Specifically, we impose a parametric structure on the contextual reservation index functions induced by the classical Weitzman policy. Our policy combines generalized method of moments (GMM) type estimation of these reservation indices with UCB-style confidence bounds for both these indices and the shared output-level reward evaluator. Under regularity conditions, we prove that the resulting policy achieves dimension-dependent $\widetilde O(\sqrt T)$ cumulative regret over a horizon of $T$ periods.
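For readers unfamiliar with the Weitzman policy that the abstract above builds on, the classical known-distribution version can be sketched as follows. This is not the paper's online, output-mediated algorithm: the discrete reward distributions, the fixed per-box cost, and the helper names `reservation_index` and `weitzman_policy` are assumptions for illustration. The reservation index $\sigma$ of a box solves $\mathbb{E}[(R-\sigma)^+] = c$; boxes are opened in decreasing index order until the best reward seen is at least the next index.

```python
def reservation_index(values, probs, cost, tol=1e-9):
    """Solve E[max(R - sigma, 0)] = cost by bisection (requires cost > 0)."""
    def expected_gain(sigma):
        return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))

    # expected_gain is decreasing in sigma; these endpoints bracket the root.
    lo, hi = min(values) - cost, max(values)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_gain(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def weitzman_policy(indices, draw):
    """indices: dict name -> reservation index; draw(name) -> realized reward.
    Opens boxes in decreasing index order and stops once the best observed
    reward is at least the index of the next unopened box."""
    best, opened = float("-inf"), []
    for name in sorted(indices, key=indices.get, reverse=True):
        if best >= indices[name]:
            break
        best = max(best, draw(name))
        opened.append(name)
    return best, opened


idx_a = reservation_index(values=[0.0, 1.0], probs=[0.5, 0.5], cost=0.1)
print(round(idx_a, 6))  # 0.8, since 0.5 * (1 - sigma) = 0.1
```

In the LLM-cascading reading, a "box" is an API call; the sketch returns only the best reward and the opening order, so net payoff would subtract the query costs of the opened boxes.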

2606.07399 2026-06-08 stat.ML cs.LG 交叉投稿

Automatic, Debiased, and Invariant Counterfactual Generation under General Interventions

通用干预下的自动、去偏和不变反事实生成

Raphael C Kim, Jingsen Zhu, Ramin Zabih, Michele Santacatterina

发表机构 * Cornell Tech(康奈尔科技) Cornell University(康奈尔大学) Department of Biostatistics, Department of Population Health(生物统计学系、人口健康系) New York University Grossman School of Medicine(纽约大学格罗斯曼医学院)

AI总结 提出ADIGen框架,结合Riesz回归、因果不变性和正交统计学习,实现通用干预下反事实生成的自动、去偏和不变性,并提供过剩风险界。

详情
AI中文摘要

用于反事实结果的生成模型在复杂干预下支持决策具有巨大潜力,但现有方法受限于不稳定的估计、跨环境的泛化能力差以及来自干扰模型错误设定的偏差。我们引入了ADIGen,一个在通用干预下(包括高维干预和结果)进行自动、去偏和不变反事实生成的框架。ADIGen结合了Riesz回归以避免不稳定的密度比估计,因果不变性以改善分布偏移下的泛化,以及正交统计学习以获得针对干扰模型错误设定的双重稳健保证。我们提供了过剩风险界,表明ADIGen在通用干预下控制了反事实风险,具有乘积偏差干扰余项和跨环境的不变风险界。

英文摘要

Generative models for counterfactual outcomes have great potential to support decision-making under complex interventions, but existing approaches are limited by unstable estimation, poor generalization across environments, and bias from nuisance model misspecification. We introduce ADIGen, a framework for automatic, debiased, and invariant counterfactual generation under general interventions, including high-dimensional interventions and outcomes. ADIGen combines Riesz regression to avoid unstable density-ratio estimation, causal invariance to improve generalization under distribution shift, and orthogonal statistical learning to obtain doubly robust guarantees against nuisance model misspecification. We provide excess-risk bounds showing that ADIGen controls counterfactual risk under general interventions, with a product-bias nuisance remainder and an invariant risk bound across environments.

2606.07403 2026-06-08 math.OC cs.LG 交叉投稿

The Proxy Benders Decomposition

代理Benders分解

Changkun Guan, El Mehdi Er Raqabi, Mathieu Tanneau, Pascal Van Hentenryck

发表机构 * H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA(H. Milton Stewart工业与系统工程学院,佐治亚理工学院,美国亚特兰大) Department of Operations and Decision Systems, Université Laval, Quebec, Canada(运营与决策系统系,拉瓦尔大学,加拿大魁北克)

AI总结 针对Benders分解中子问题重复求解导致收敛慢的问题,提出代理Benders分解(Proxy-BD),用自监督预测-投影-补全机制生成可行对偶解,在保持理论有效性的同时大幅降低计算开销。

详情
AI中文摘要

Benders分解是求解含复杂化变量(complicating variables)的大规模混合整数优化问题的基本框架:这些变量一旦固定,子问题会显著简化。然而,经典Benders分解反复求解高度相似的子问题,且迭代中常出现锯齿形行为,导致大规模设置下收敛缓慢。受Benders子问题的重复结构和参数化特性启发,本文引入代理Benders分解(Proxy-BD),一种新的分解框架,其中子问题优化被经过认证的优化代理替代,而非重复精确求解。所提出的代理遵循自监督的预测-投影-补全机制,生成对偶可行解以产生可证明有效的Benders割。该框架通过投影-补全认证层,独立于预测质量保持分解的理论有效性。建立了代理诱导割的形式化刻画,该框架自然扩展到现代分解方案,包括branch-and-Benders-cut(分支Benders割)算法。在大规模设施选址和网络设计问题上的计算实验表明,Proxy-BD在保持接近最优解质量的同时,显著减少了子问题的计算量。在高达2000x2000的无容量设施选址实例上,Proxy-BD的中位最优性差距低于0.5%,实现高达161倍的中位加速比,并在最大实例上将生成的割数量减少超过240倍。计算增益随追索复杂度持续增加,表明在大规模分解设置中,基于代理的推理比重复精确子问题优化具有更显著的扩展优势。

英文摘要

Benders decomposition is a fundamental framework for solving large-scale mixed-integer optimization problems with complicating variables that, when fixed, yield significantly easier subproblems. However, classical Benders decomposition repeatedly solves highly similar subproblems and often exhibits zigzagging behavior across iterations, leading to slow convergence in large-scale settings. Motivated by the repetitive structure and parametric nature of Benders subproblems, this paper introduces the proxy Benders decomposition (Proxy-BD), a new decomposition framework in which subproblem optimization is replaced by certified optimization proxies rather than repeated exact solves. The proposed proxy follows a self-supervised predict-project-and-complete mechanism that produces dual-feasible solutions for generating provably valid Benders cuts. The framework preserves the theoretical validity of the decomposition independently of prediction quality through a projection-and-completion certification layer. A formal characterization of proxy-induced cuts is established, and the framework naturally extends to modern decomposition schemes, including branch-and-Benders-cut algorithms. Computational experiments on large-scale facility location and network design problems demonstrate that Proxy-BD substantially reduces the computational effort of subproblems while maintaining near-optimal solution quality. On large-scale uncapacitated facility location instances up to 2000x2000, Proxy-BD achieves median optimality gaps below 0.5%, yields up to 161x median speedups, and reduces the number of generated cuts by more than 240x on the largest instances. The computational gains consistently increase with recourse complexity, indicating that proxy-based inference scales substantially more favorably than repeated exact subproblem optimization in large-scale decomposition settings.
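Why a dual-feasible solution suffices for a valid cut, regardless of how it was predicted, can be seen from weak duality. The derivation below uses standard textbook notation for a linear recourse subproblem; the symbols $q$, $W$, $T$, $h$, $\theta$ are assumptions and do not come from the abstract.

```latex
% Assumed notation: x = first-stage (complicating) variables, y = recourse
% variables, \theta = master-problem estimate of the recourse value Q(x).
% The first equality (strong duality) assumes the subproblem is feasible and bounded.
\begin{align*}
Q(x) &= \min_{y \ge 0} \{\, q^\top y : W y \ge h - T x \,\}
      = \max_{\pi \ge 0} \{\, \pi^\top (h - T x) : W^\top \pi \le q \,\}, \\
\hat{\pi} \ge 0,\; W^\top \hat{\pi} \le q
  &\;\Longrightarrow\; Q(x) \ge \hat{\pi}^\top (h - T x) \quad \text{for every } x
  \quad \text{(weak duality)}, \\
\text{hence the cut}\quad \theta &\ge \hat{\pi}^\top (h - T x)
  \quad \text{never excludes a point } (x, \theta) \text{ with } \theta = Q(x).
\end{align*}
```

The dual feasible set does not depend on $x$, so any $\hat{\pi}$ inside it yields a globally valid, though possibly loose, cut. This is consistent with the abstract's statement that validity is preserved independently of prediction quality once a certification step enforces dual feasibility.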

2606.07451 2026-06-08 cs.CV cs.AI cs.CL cs.LG 交叉投稿

TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment

TEVI: 基于稀疏自编码器的文本条件视觉表示编辑以改进视觉-语言对齐

Sweta Mahajan, Sukrut Rao, Jiahao Xie, Alexander Koller, Bernt Schiele

发表机构 * Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany(马克斯·普朗克信息学研究所,萨尔信息学园区,德国萨尔布吕肯) Department of Language Science and Technology, Saarland University, Saarbrücken, Germany(语言科学与技术系,萨尔大学,德国萨尔布吕肯)

AI总结 提出TEVI框架,利用稀疏自编码器解耦图像嵌入,并通过文本条件掩码模块选择性重构嵌入,以改善CLIP等视觉-语言模型的图像-文本对齐,在多个检索基准上取得提升。

Comments 20 pages, 13 figures, 14 tables

详情
AI中文摘要

视觉-语言模型(如CLIP)由于共享图像-文本嵌入空间,对多种任务非常有用。尽管如此,图像和文本嵌入往往对齐不佳,影响下游性能。最近的研究表明,这可以归因于信息不平衡:图像包含的信息比其标题描述的更多。在这项工作中,我们提出了TEVI,一个利用标题作为信号来决定从图像嵌入中保留哪些信息的框架。具体来说,我们使用稀疏自编码器来解耦图像嵌入,并训练一个掩码模块,根据给定的标题选择性重构嵌入。在具有合成标题的受控设置中,我们展示了TEVI在保留标题描述的属性同时丢弃其他属性方面的有效性。通过将TEVI应用于在自然图像上训练的CLIP模型,我们进一步在粗粒度短标题(MS COCO, Flickr)和细粒度长标题(IIW, DOCCI)基准上实现了改进的检索性能,在更丰富的标题上获得更强的增益,并在RoCOCO基准上提高了鲁棒性。

英文摘要

Vision-language models such as CLIP are highly useful for diverse tasks due to their shared image-text embedding space. Despite this, the image and text embeddings are often poorly aligned, affecting downstream performance. Recent work has shown that this can be attributed to an information imbalance: images contain more information than their captions describe. In this work, we propose TEVI, a framework that uses captions as a signal for what to retain from image embeddings. Specifically, we use sparse autoencoders to disentangle image embeddings and train a masking module to selectively reconstruct the embedding based on a given caption. In a controlled setup with synthetic captions, we show that TEVI is effective at preserving caption-described attributes while discarding others. By applying TEVI to CLIP models trained on natural images, we further achieve improved retrieval performance across coarse-grained short-caption (MS COCO, Flickr) and fine-grained long-caption (IIW, DOCCI) benchmarks, with stronger gains on richer captions, and improved robustness on the RoCOCO benchmark.

2606.07463 2026-06-08 eess.SP cs.CE cs.LG 交叉投稿

Amortized Neural Optimization for Pre-Layout Signal Integrity Design Space Exploration using Differentiable Surrogates

基于可微代理的布局前信号完整性设计空间探索的摊销神经优化

Julian Withöft, Werner John, Emre Ecik, Ralf Brüning, Jürgen Götze

发表机构 * Information Processing Lab, Faculty for Electrical Engineering and Information Technology, TU Dortmund(信息处理实验室,电气工程与信息技术学院,多特蒙德工业大学) Pyramide2525, Paderborn, Germany(Pyramide2525,帕德博恩,德国) EMC Technology Center Paderborn, Zuken GmbH, Paderborn, Germany(帕德博恩EMC技术中心,Zuken GmbH,帕德博恩,德国)

AI总结 提出摊销神经优化(ANO)框架,利用可微神经网络代理模型替代迭代黑盒优化,实现单次前向传播获取近最优设计参数,在DDR5 DFE、SerDes均衡等场景中加速三到四个数量级。

Comments 16 pages, 20 figures, 8 tables

详情
AI中文摘要

高速信号完整性(SI)分析的布局前设计空间探索(DSE)通常受限于现代电子设计自动化(EDA)工作流程中仿真和迭代优化算法的计算成本。虽然机器学习代理模型加速了仿真步骤,但优化设计仍需利用迭代黑盒搜索方法。这种迭代性质扩展性差,使得多角点扫描计算成本高昂。作为解决方案,本文提出了用于布局前SI设计的摊销神经优化(ANO)。ANO通过利用完全可微的神经网络代理模型,完全消除了迭代黑盒推理。ANO从代理中提取解析梯度,以训练全局优化策略。推理时不再重复求解优化问题,而是离线学习优化过程,从而实现摊销。一旦ANO策略训练完成,它就能在单个确定性前向传播中直接将不同的通道上下文映射到近最优设计参数。基于三个复杂的SI设计场景展示了ANO框架的效率和准确性,包括DDR5决策反馈均衡(DFE)、9维SerDes Tx/Rx联合均衡以及DDR3 DQS差分对布线(在内部对偏斜约束下优化眼图指标)。与实例特定的黑盒算法相比,在牺牲约10%最优性的代价下,实现了三到四个数量级的加速。对于大规模32万实例多角点SerDes扫描优化,ANO将原本需要数天计算时间的迭代搜索算法压缩为一次批量前向传播,毫秒级完成。这将计算昂贵的SI优化转变为实时、交互式的布局前DSE。

英文摘要

Pre-layout design space exploration (DSE) for high-speed signal integrity (SI) analysis is often limited by the computational cost of simulations and iterative optimization algorithms within modern electronic design automation (EDA) workflows. While machine learning surrogate models accelerate the simulation step, optimizing designs still requires utilizing iterative black-box search methods. This iterative nature scales poorly, making multi-corner sweeps computationally expensive. As a solution, this paper proposes amortized neural optimization (ANO) for pre-layout SI design. ANO entirely eliminates iterative black-box inference by utilizing fully differentiable neural network surrogate models. ANO extracts analytical gradients from the surrogate to train a global optimization policy. Instead of solving the optimization problem repeatedly at inference, the optimization process is learned offline and therefore amortized. Once the ANO policy is trained, it maps different channel contexts directly to near-optimal design parameters in a single deterministic forward pass. The efficiency and accuracy of the ANO framework are demonstrated based on three complex SI design scenarios, including DDR5 decision feedback equalization (DFE), 9-dimensional SerDes Tx/Rx co-equalization, and DDR3 DQS differential pair routing to optimize eye diagram metrics under intra-pair skew constraints. By trading roughly 10% in optimality compared to instance-specific black-box algorithms, it realizes speedups of three to four orders of magnitude. For a large-scale 320,000-instance multi-corner SerDes sweep optimization, ANO collapses what would have taken days of computation using iterative search algorithms into a single batched forward pass that completes in milliseconds. This transforms computationally expensive SI optimization into real-time and interactive pre-layout DSE.

2606.07491 2026-06-08 cs.DC cs.AI cs.LG cs.SE 交叉投稿

Twelve quick tips for designing AI-driven HPC workflows

设计AI驱动的高性能计算工作流的十二条快速技巧

Jamie J. Alnasir

发表机构 * Department of Computer Science(计算机科学系) Royal Holloway University of London(伦敦皇家霍洛威大学)

AI总结 本文针对AI与HPC融合带来的新挑战,提出十二条实用技巧,涵盖容器化、作业数组、反馈循环和I/O优化,帮助设计高效、可扩展、可复现的AI驱动HPC工作流。

Comments 12 pages, 1 figure. Formatted using the bioRxiv LaTeX preprint style

详情
AI中文摘要

高性能计算(HPC)集群仍然是大规模科学计算的支柱,传统上执行确定性、线性流水线,以优化可预测性能。然而,人工智能(AI)和基础模型在科学研究中的普遍集成引入了一种根本性的新计算范式。AI驱动的工作流具有迭代、数据驱动和概率性的特征,带来了数据引力、异构资源管理和复杂工作流编排方面的独特挑战。本指南提供了十二条实用技巧,旨在帮助研究人员设计高效、可扩展和可复现的AI驱动HPC工作流。通过解决关键的系统级瓶颈——例如用于环境可移植性的容器化、作业数组的战略部署、显式反馈循环机制以及小文件的I/O优化——本文提供了一个从刚性执行流水线过渡到自适应、智能计算环境的框架。虽然这些架构原则广泛适用于分布式环境,但它们特别针对现代计算生物学中资源密集型的吞吐量需求。

英文摘要

High-performance computing (HPC) clusters remain the backbone of large-scale scientific computation, traditionally executing deterministic, linear pipelines optimised for predictable performance. However, the pervasive integration of artificial intelligence (AI) and foundation models into scientific research has introduced a fundamentally new computational paradigm. AI-driven workflows are characteristically iterative, data-driven, and probabilistic, introducing unique challenges regarding data gravity, heterogeneous resource management, and complex workflow orchestration. This guide provides twelve practical tips designed to help researchers design efficient, scalable, and reproducible AI-driven HPC workflows. By addressing critical system-level bottlenecks - such as containerisation for environment portability, strategic deployment of job arrays, explicit feedback loop mechanics, and I/O optimisation for small files - this article offers a framework for transitioning from rigid execution pipelines to adaptive, intelligent computational environments. While these architectural principles are broadly applicable across distributed environments, they are particularly tailored to the resource-intensive throughput demands of modern computational biology.

2606.07492 2026-06-08 cs.IR cs.LG stat.ML 交叉投稿

Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies

基于数据集分类学的推荐系统Bradley-Terry排名

Ekaterina Grishina, Stepan Kuznetsov, Askar Tsyganov, Ilya Ivanov, Daria Korovaitceva, Margarita Rusanova, Uliana Parkina, Alexander Derevyagin, Evgeny Frolov, Sergey Samsonov, Anton Lysenko

发表机构 * HSE University(俄罗斯莫斯科国立高等经济学院)

AI总结 针对推荐算法排名对数据集特性敏感的问题,提出基于Bradley-Terry模型的数据驱动排名方法,并引入排名一致性指标和针对未见数据集的算法排名方法。

Comments KDD'26

详情
AI中文摘要

推荐算法的排名是一个具有挑战性的问题,因为模型性能对数据集特征(如稀疏性、序列结构和规模)敏感。这驱动了对适当方法的需求,以公平比较算法。对性能指标(例如,在基准测试上平均NDCG)的简单聚合可能会产生误导性的排名,削弱实际选择。为解决此问题,我们引入了一种基于Bradley-Terry(BT)模型的新型数据驱动排名方法。我们证明所获得的排名取决于关键数据集统计量。此外,我们提出了一种新的排名一致性评估指标,并展示了我们的排名对不完整数据的鲁棒性。最后,我们引入了一种针对未见数据集的算法排名方法,无需运行模型,依赖于Bradley-Terry框架的扩展,包括BT树和带协变量的BT模型。

英文摘要

The ranking of recommendation algorithms is a challenging problem since model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This drives a demand for a proper methodology for fair comparison between algorithms. Naive aggregation of performance metrics (e.g., averaging NDCG over benchmarks) can yield misleading rankings, undermining practical selection. To address this problem, we introduce a novel, data-driven ranking methodology based on the Bradley-Terry (BT) model. We demonstrate that the obtained ranking depends on key dataset statistics. Additionally, we propose a novel metric for evaluating ranking consistency and demonstrate the robustness of our ranking to incomplete data. Finally, we introduce a dataset-specific methodology for ranking algorithms on unseen datasets without running the models, relying on extensions of the Bradley-Terry framework, including BT trees and BT models with covariates.
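As background for the abstract above, a plain Bradley-Terry model can be fitted from pairwise win counts with the standard minorization-maximization (MM) update $p_i \leftarrow W_i / \sum_{j \ne i} n_{ij}/(p_i + p_j)$. The sketch below is a generic illustration, not the authors' pipeline: how "wins" are derived (for example, one algorithm beating another on a dataset's NDCG) and the toy counts are assumptions, and the BT-tree and covariate extensions are not shown.

```python
def fit_bradley_terry(wins, n_iter=200):
    """wins[i][j] = number of times item i beat item j.
    Returns strengths p (normalized to sum to 1) via the MM algorithm."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n) if j != i
            )
            # Keep the old value if item i was never compared with anything.
            new_p.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(new_p)
        p = [x / norm for x in new_p]
    return p


# Hypothetical win counts among three algorithms (each pair compared 4 times).
wins = [
    [0, 3, 4],
    [1, 0, 3],
    [0, 1, 0],
]
strengths = fit_bradley_terry(wins)
ranking = sorted(range(len(strengths)), key=lambda i: strengths[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The fitted strengths give win probabilities $p_i/(p_i+p_j)$, which is what makes the ranking more informative than averaging raw metrics across heterogeneous datasets.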

2206.08598 2026-06-08 cs.LG stat.ML 版本更新

Characterizing Learning Dynamics under Relative Reparameterization of Singular Models

奇异模型相对重参数化下的学习动态表征

Pascal Mattia Esser, Frank Nielsen

发表机构 * Ludwig-Maximilians-Universität München(慕尼黑路德维希-马克西米利安大学) Sony Computer Science Laboratories Inc.(索尼计算机科学实验室)

AI总结 针对奇异模型参数空间与模型空间非一一对应导致收敛慢的问题,提出相对重参数化方法提取正则子模型,并在高斯混合模型和神经网络上理论分析梯度下降收敛率。

详情
AI中文摘要

分析统计模型学习的一种常见方法是考虑模型参数空间中的操作,但当参数空间与底层统计模型空间之间不存在一一映射时,这变得具有挑战性。这种“奇异模型”经常出现,并且由于吸引子行为,学习轨迹的收敛速度会特征性地降低。在这项工作中,我们考虑了参数空间的相对重参数化技术,该技术提供了一种从奇异模型中提取正则子模型的通用方法。以高斯混合模型和神经网络为例,我们从理论和数值上分析了两种参数化下梯度下降的收敛率。通过分析二阶方法和Fisher信息矩阵的显式性质,我们区分了由算法和内在信息几何方面引起的收敛行为差异。

英文摘要

A common way to analyze learning of statistical models is to consider operations in the model's parameter space; however, this becomes challenging when there is no one-to-one mapping between the parameter space and the underlying statistical model space. Such "singular models" occur frequently and exhibit a characteristic decrease in convergence speed of learning trajectories due to attractor behaviors. In this work, we consider a relative reparameterization technique of the parameter space, which yields a general method for extracting regular sub-models from singular models. Using Gaussian mixture models and neural networks as examples, we theoretically and numerically analyze the convergence rate of gradient descent under both parameterizations. By analyzing second-order methods and explicit properties of the Fisher information matrix, we distinguish between differences in convergence behavior arising from algorithmic and intrinsic information-geometric aspects.

2312.07762 2026-06-08 cs.LG cs.NA math.NA stat.AP 版本更新

Interpretable factorization of clinical questionnaires to identify latent factors of psychopathology

临床问卷的可解释分解以识别精神病理学的潜在因素

Ka Chun Lam, Bridget W Mahony, Armin Raznahan, Francisco Pereira

发表机构 * Machine Learning Core, National Institute of Mental Health, National Institutes of Health(机器学习核心,国家心理健康研究所,国立卫生研究院) Section on Developmental Neurogenomics, Human Genetics Branch, National Institute of Mental Health, National Institutes of Health(发育神经基因组学部门,人类遗传学分支,国家心理健康研究所,国立卫生研究院) National Institute of Mental Health, National Institutes of Health(国家心理健康研究所,国立卫生研究院)

AI总结 提出可解释性约束问卷分解(ICQF),一种非负矩阵分解方法,通过正则化提高因子可解释性和稳定性,并自动检测潜在维度,在真实数据中优于现有方法。

详情
AI中文摘要

精神病学研究旨在通过识别少量潜在因素来理解问卷数据中测量的行为精神病理学表现。虽然因子分析是传统工具,但所得因子可能不可解释,且可能受混杂变量影响。此外,缺失数据常见,通常需要显式插补。为克服这些限制,我们引入了可解释性约束问卷分解(ICQF),一种针对问卷数据正则化的非负矩阵分解方法。我们的方法旨在提高因子可解释性和解稳定性。我们提供了具有理论收敛保证的优化过程,以及自动准确检测潜在维度的程序。我们使用逼真的合成数据验证了这些程序。我们在两个独立数据集(健康大脑网络和青少年大脑认知发展研究)中展示了该方法在广泛使用的通用问卷中的有效性。具体而言,我们表明ICQF提高了领域专家定义的可解释性,同时保留了跨一系列障碍的诊断信息,并在较小数据集规模下优于竞争方法。这表明我们方法中的正则化与领域特征相匹配。ICQF的Python实现可在https://github.com/jefferykclam/ICQF获取。

英文摘要

Psychiatry research seeks to understand the manifestations of psychopathology in behavior, as measured in questionnaire data, by identifying a small number of latent factors that explain them. While factor analysis is the traditional tool for this purpose, the resulting factors may not be interpretable, and may also be subject to confounding variables. Moreover, missing data are common, and explicit imputation is often required. To overcome these limitations, we introduce interpretability constrained questionnaire factorization (ICQF), a non-negative matrix factorization method with regularization tailored for questionnaire data. Our method aims to promote factor interpretability and solution stability. We provide an optimization procedure with theoretical convergence guarantees, and an automated procedure to detect latent dimensionality accurately. We validate these procedures using realistic synthetic data. We demonstrate the effectiveness of our method on a widely used general-purpose questionnaire in two independent datasets (the Healthy Brain Network and Adolescent Brain Cognitive Development studies). Specifically, we show that ICQF improves interpretability, as defined by domain experts, while preserving diagnostic information across a range of disorders, and outperforms competing methods for smaller dataset sizes. This suggests that the regularization in our method matches domain characteristics. The Python implementation of ICQF is available at https://github.com/jefferykclam/ICQF.

2403.05532 2026-06-08 cs.LG cs.CV 版本更新

Twin: Tuning Learning Rate and Weight Decay of Deep Homogeneous Classifiers without Validation

Twin: 无需验证的深度同质分类器学习率和权重衰减调优

Lorenzo Brigato, Stavroula Mougiakakou

发表机构 * ARTORG Center, University of Bern(伯尔尼大学ARTORG中心)

AI总结 提出Twin方法,利用同质网络的间隔最大化动态和训练-测试损失间的经验缩放定律,实现无需验证集的学习率和权重衰减调优,在37个图像分类配置上与Oracle基线相比的平均绝对误差为1.28%。

Comments Accepted at TMLR

详情
AI中文摘要

我们介绍了Tune without Validation (Twin),一种简单有效的流程,用于调优同质分类器的学习率和权重衰减,无需验证集,消除了保留数据的需求并避免了两步过程。Twin利用了同质网络的间隔最大化动态以及连接不同超参数配置下训练损失和测试损失的经验缩放定律。这种数学建模产生了一个依赖于所处状态的、无需验证的选择规则:在不可分状态下,训练损失相对于测试损失是单调的,因此可以预测泛化;而在可分状态下,由于间隔最大化,参数范数成为泛化的可靠指标。在37个图像分类的数据集-架构配置中,我们证明Twin与使用测试准确率选择超参数的Oracle基线相比,平均绝对误差为1.28%。我们展示了Twin在验证数据稀缺的场景(如小数据场景)或收集困难且成本高昂的场景(如医学成像)中的优势。代码可在 https://github.com/lorenzobrigato/twin 获取。

英文摘要

We introduce Tune without Validation (Twin), a simple and effective pipeline for tuning learning rate and weight decay of homogeneous classifiers without validation sets, eliminating the need to hold out data and avoiding the two-step process. Twin leverages the margin-maximization dynamics of homogeneous networks and an empirical scaling law that links training and test losses across hyper-parameter configurations. This mathematical modeling yields a regime-dependent, validation-free selection rule: in the non-separable regime, training loss is monotonic in test loss and therefore predictive of generalization, whereas in the separable regime, the parameters' norm becomes a reliable indicator of generalization due to margin maximization. Across 37 dataset-architecture configurations for image classification, we demonstrate that Twin achieves a mean absolute error of 1.28% compared to an Oracle baseline that selects HPs using test accuracy. We demonstrate Twin's benefits in scenarios where validation data is scarce, such as small-data regimes, or difficult and costly to collect, as in medical imaging. Code available at https://github.com/lorenzobrigato/twin.

2403.10318 2026-06-08 cs.LG 版本更新

pTNAS: Progressive Neural Architecture Search for Tabular Data

pTNAS: 面向表格数据的渐进式神经架构搜索

Naili Xing, Shaofeng Cai, Lingze Zeng, Jiaqi Zhu, Peng Lu, Jian Pei, Beng Chin Ooi

发表机构 * National University of Singapore(新加坡国立大学) University of Science and Technology of China(中国科学技术大学)

AI总结 提出首个针对表格数据的渐进式神经架构搜索方法pTNAS,采用过滤-精炼优化策略,结合零成本代理和固定预算调度算法,实现架构快速识别与性能持续提升,相比其他NAS方法加速高达82.75倍。

详情
AI中文摘要

最近的进展已将表格学习的范式转向表格基础模型,但其准确性依赖于高昂的推理成本,且该成本随上下文规模增长的扩展性很差。当配备精心设计的架构时,深度神经网络仍然是一种极具竞争力且更高效的建模范式;然而,以数据自适应和预算感知的方式识别此类架构仍然具有挑战性。我们提出了pTNAS,这是首个针对表格数据定制的渐进式神经架构搜索(NAS)方法,它能够快速识别可行的架构,并在更多预算可用时持续提高其搜索性能。pTNAS采用了一种过滤-精炼优化策略,结合了高效的免训练和有效的基于训练的架构评估。在过滤阶段,我们引入了pTProxy,这是一种专为表格网络设计的新型零成本代理,它联合捕捉架构的可训练性和表达能力,从而能够快速过滤大型架构搜索空间。在精炼阶段,pTNAS采用固定预算调度算法,从一小批有希望的候选架构中准确识别出性能最佳的架构。我们进一步提出了一种预算感知协调器来整体优化预算分配。实验表明,与其他NAS方法相比,pTNAS将达到全局最佳架构的时间缩短了高达82.75倍,实现了最佳的平均预测排名,并且与TabPFN相比,端到端效率提高了高达4.78倍。

英文摘要

Recent advances have shifted the paradigm of tabular learning toward tabular foundation models, yet their accuracy relies on a heavy inference cost that scales poorly with context size. Deep neural networks remain a highly competitive and more efficient modeling paradigm when equipped with well-designed architectures; however, identifying such architectures in a data-adaptive and budget-aware manner remains challenging. We propose pTNAS, the first progressive neural architecture search (NAS) approach tailored for tabular data, which enables fast identification of a viable architecture and continuously improves its search performance as more budget becomes available. pTNAS adopts a filter-and-refine optimization strategy that combines efficient training-free and effective training-based architecture evaluation. In the filtering phase, we introduce pTProxy, a novel zero-cost proxy specifically designed for tabular networks that jointly captures architectural trainability and expressivity, enabling fast filtering of large architecture search spaces. In the refinement phase, pTNAS employs a fixed-budget scheduling algorithm to accurately identify the best-performing architecture from a small set of promising candidates. We further propose a budget-aware coordinator to optimize budget allocation holistically. Experiments show that pTNAS reduces the time to reach the globally best architecture by up to 82.75x compared with other NAS approaches, achieves the best average predictive rank, and improves end-to-end efficiency by up to 4.78x compared with TabPFN.

2406.05670 2026-06-08 cs.LG cs.CR cs.CV Updated version

Certified Robustness to Data Poisoning in Gradient-Based Training

基于梯度的训练中对数据投毒的认证鲁棒性

Philip Sosnin, Mark N. Müller, Maximilian Baader, Calvin Tsay, Matthew Wicker

Affiliations * Department of Computing, Imperial College London, United Kingdom; Department of Computer Science, ETH Zurich, Switzerland; LogicStar.ai, Switzerland; The Alan Turing Institute, United Kingdom

AI summary Proposes the first framework that uses convex relaxations to over-approximate the set of parameter updates, giving provable robustness guarantees against untargeted poisoning, targeted poisoning, and backdoor attacks for models trained with gradient descent.

Comments 21 pages, 8 figures

Details
AI Chinese abstract

现代机器学习流程利用大量公共数据,使得保证数据质量变得不可行,并使模型容易受到投毒和后门攻击。在攻击下可证明地约束模型行为仍然是一个开放问题。在这项工作中,我们通过开发第一个框架来应对这一挑战,该框架在不修改模型或学习算法的情况下,为使用可能被操纵的数据训练的模型的行为提供可证明的保证。特别是,我们的框架针对训练输入和标签的有界和无界操纵,认证了对无目标和有目标投毒以及后门攻击的鲁棒性。我们的方法利用凸松弛来过度近似给定投毒威胁模型下所有可能的参数更新集,从而允许我们为任何基于梯度的学习算法约束所有可达参数的集合。给定这个参数集,我们提供了最坏情况行为的界限,包括模型性能和后门成功率。我们在多个真实世界数据集上展示了我们的方法,这些数据集来自能源消耗、医学成像和自动驾驶等应用。

English abstract

Modern machine learning pipelines leverage large amounts of public data, making it infeasible to guarantee data quality and leaving models open to poisoning and backdoor attacks. Provably bounding model behavior under such attacks remains an open problem. In this work, we address this challenge by developing the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data without modifying the model or learning algorithm. In particular, our framework certifies robustness against untargeted and targeted poisoning, as well as backdoor attacks, for bounded and unbounded manipulations of the training inputs and labels. Our method leverages convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning threat model, allowing us to bound the set of all reachable parameters for any gradient-based learning algorithm. Given this set of parameters, we provide bounds on worst-case behavior, including model performance and backdoor success rate. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.

2408.15344 2026-06-08 cs.LG math.DS Updated version

Conformal Disentanglement and Latent-Space Curation: A Neural Framework for Perspective Synthesis, Differentiation and Targeted Generation

共形解缠与潜在空间策展:面向视角合成、区分和定向生成的神经框架

George A. Kevrekidis, Eleni D. Koronaki, Dimitris G. Giovanis, Yannis G. Kevrekidis

Affiliations * Department of Applied Mathematics and Statistics, Johns Hopkins University; Los Alamos National Laboratory; Faculty of Science, Technology and Medicine, University of Luxembourg; Department of Civil and Systems Engineering, Johns Hopkins University; Department of Chemical and Biomolecular Engineering, Johns Hopkins University

AI summary Proposes a neural autoencoder framework that separates shared and sensor-specific latent variables in multi-sensor data through structural constraints and orthogonality regularization, and uses the disentangled latent subspaces for targeted generation and cross-sensor inference.

Details
AI Chinese abstract

许多科学和工程问题涉及通过多个异构传感器或测量模态观察同一现象。此类观测通常包含跨传感器共享的信息(反映底层系统)以及来自测量过程或环境效应的传感器特定或外部成分。当传感器独立观测不可用时,解缠这些贡献至关重要。我们提出一种神经自编码器框架,从多传感器数据中显式分离共享和传感器特定的潜在变量。该架构通过结构约束和基于正交的正则化强制潜在组件之间的几何独立性,产生可解释且解缠的表示。基于此表示,我们引入一种潜在空间生成方法,其中生成模型在选定的解缠潜在子空间上被调谐/“限制”;然后我们建设性地组合解缠的观测潜在变量,通过训练的解码器条件合成新样本。这使得能够生成具有指定共享(或传感器特定)特征的一致数据。它还通过一致地采样未观测模态中合理测量的分布来支持跨传感器推断。我们在多个计算示例上展示了该方法,显示了在异构传感设置中的有效解缠、定向数据生成和模态插补。

English abstract

Many scientific and engineering problems involve observing a common phenomenon through multiple heterogeneous sensors or measurement modalities. Such observations typically contain both information shared across sensors, reflecting the underlying system, and sensor-specific or extraneous components arising from measurement processes or environmental effects. Disentangling these contributions is essential when sensor-independent observations are unavailable. We propose a neural autoencoder framework that explicitly separates shared and sensor-specific latent variables from multi-sensor data. The architecture enforces geometric independence between latent components through structural constraints and orthogonality-based regularization, yielding interpretable and disentangled representations. Building on this representation, we then introduce a latent-space generative methodology in which generative models are tuned/"restricted" on selected disentangled latent subspaces; we then constructively combine disentangled observed latent variables to conditionally synthesize new samples via trained decoders. This enables consistent data generation with prescribed shared (or sensor-specific) characteristics. It also supports cross-sensor inference by consistently sampling distributions over plausible measurements in unobserved modalities. We demonstrate the approach on several computational examples, showing effective disentanglement, targeted data generation, and modality imputation in heterogeneous sensing settings.

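2411.09734 2026-06-08 cs.LG cs.NA math.NA math.OC Updated version

Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations

用积分微分方程建模 AdaGrad、RMSProp 和 Adam

Carlos Heredia

Affiliations * IAMM Research, Department of Applied Artificial Intelligence; DAMM

AI summary Proposes continuous-time integro-differential equation models of AdaGrad, RMSProp, and Adam, checks their agreement with the discrete algorithms through numerical simulation and stability analysis, and offers a new perspective on adaptive optimization methods.

Comments 60 pages, 15 figures; v3 - Section 4 corrected

Details
AI Chinese abstract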

在本文中,我们通过将 AdaGrad、RMSProp 和 Adam 优化算法建模为一阶积分微分方程,提出了它们的连续时间形式。我们对这些方程进行数值模拟,并进行稳定性和收敛性分析,以证明它们作为原始算法准确近似的有效性。我们的结果表明,连续时间模型与离散实现的行为高度一致,从而为自适应优化方法的理论理解提供了新的视角。

English abstract

In this paper, we propose a continuous-time formulation for the AdaGrad, RMSProp, and Adam optimization algorithms by modeling them as first-order integro-differential equations. We perform numerical simulations of these equations, along with stability and convergence analyses, to demonstrate their validity as accurate approximations of the original algorithms. Our results indicate a strong agreement between the behavior of the continuous-time models and the discrete implementations, thus providing a new perspective on the theoretical understanding of adaptive optimization methods.
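
As a point of reference for the discrete side of this comparison, the following is a minimal sketch of the standard AdaGrad update on a scalar parameter. The running sum of squared gradients is the discrete quantity that a continuous-time model would plausibly replace with an integral over the trajectory, which is where a memory (integral) term would enter. The function name `adagrad`, the test objective, and all hyperparameter values are illustrative assumptions and are not taken from the paper.

```python
import math


def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=200):
    """Standard discrete AdaGrad on a scalar parameter (illustrative sketch)."""
    x, accum = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        accum += g * g  # running sum of squared gradients (the memory term)
        x -= lr * g / (math.sqrt(accum) + eps)  # step shrinks as accum grows
    return x


# Hypothetical objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_final = adagrad(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Because the accumulator only grows, the effective step size `lr / sqrt(accum)` decays with the whole gradient history rather than with the current gradient alone.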

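2412.09119 2026-06-08 cs.LG cs.CR math.OC Updated version

The Utility and Complexity of in- and out-of-Distribution Machine Unlearning

分布内与分布外机器遗忘的效用与复杂性

Youssef Allouah, Joshua Kazdan, Rachid Guerraoui, Sanmi Koyejo

Affiliations * EPFL; Stanford University

AI summary Analyzes the utility, time, and space complexity trade-offs of approximate machine unlearning; shows that empirical risk minimization with output perturbation achieves tight trade-offs for in-distribution unlearning, and proposes a robust noisy gradient descent variant that amortizes time complexity for out-of-distribution unlearning.

Details
AI Chinese abstract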

机器遗忘,即从训练模型中选择性移除数据的过程,对于解决部署后的隐私问题和知识差距日益关键。尽管重要性显著,现有方法通常是启发式的且缺乏形式化保证。在本文中,我们分析了近似遗忘的基本效用、时间和空间复杂度权衡,提供了类似于差分隐私的严格认证。对于分布内遗忘数据——与保留集相似的数据——我们展示了一个出奇简单且通用的过程,即带有输出扰动的经验风险最小化,实现了紧致的遗忘-效用-复杂度权衡,解决了之前关于通过差分隐私实现“免费”遗忘的理论分离问题,差分隐私本质上便于移除此类数据。然而,这些技术在处理分布外遗忘数据——与保留集显著不同的数据——时失效,此时遗忘时间复杂度可能超过重新训练,即使对于单个样本也是如此。为了解决这个问题,我们提出了一种新的鲁棒噪声梯度下降变体,该变体在不损害效用的前提下可证明地摊销了遗忘时间复杂度。

English abstract

Machine unlearning, the process of selectively removing data from trained models, is increasingly crucial for addressing privacy concerns and knowledge gaps post-deployment. Despite this importance, existing approaches are often heuristic and lack formal guarantees. In this paper, we analyze the fundamental utility, time, and space complexity trade-offs of approximate unlearning, providing rigorous certification analogous to differential privacy. For in-distribution forget data -- data similar to the retain set -- we show that a surprisingly simple and general procedure, empirical risk minimization with output perturbation, achieves tight unlearning-utility-complexity trade-offs, addressing a previous theoretical gap on the separation from unlearning "for free" via differential privacy, which inherently facilitates the removal of such data. However, such techniques fail with out-of-distribution forget data -- data significantly different from the retain set -- where unlearning time complexity can exceed that of retraining, even for a single sample. To address this, we propose a new robust and noisy gradient descent variant that provably amortizes unlearning time complexity without compromising utility.

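2502.00225 2026-06-08 cs.LG cs.AI cs.CL Updated version

Should You Use Your Large Language Model to Explore or Exploit?

你应该使用你的大语言模型进行探索还是利用?

Keegan Harris, Aleksandrs Slivkins

Affiliations * UC Berkeley; Microsoft Research

AI summary Studies how well current LLMs make decisions under the exploration-exploitation tradeoff by evaluating exploration and exploitation separately. Reasoning models are the most promising for exploitation but costly; tool use and in-context summarization lift non-reasoning models on medium-difficulty tasks, yet every LLM studied still trails simple linear regression; LLMs do help explore large action spaces with inherent semantics.

Comments Accepted to UAI 2026

Details
AI Chinese abstract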

我们评估了当前一代大语言模型(LLMs)在面对探索-利用权衡时的决策能力。虽然先前的工作主要研究LLMs解决组合探索-利用任务的能力,我们采取了更系统的方法,将LLMs用于在各种(上下文)赌博机任务中分别进行探索和利用。我们发现推理模型在解决利用任务方面最有前景,尽管它们在实际应用中仍然过于昂贵或缓慢。受此启发,我们研究了非推理模型的工具使用和上下文总结。我们发现这些缓解措施可以显著提高中等难度任务的性能,但即便如此,我们研究的所有LLMs在所有任务中(包括非线性设置)的表现都不如简单的线性回归。另一方面,我们发现LLMs在探索具有内在语义的大动作空间时确实有帮助,通过建议合适的候选动作进行探索。

English abstract

We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. While previous work has largely studied the ability of LLMs to solve combined exploration-exploitation tasks, we take a more systematic approach and use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that reasoning models show the most promise for solving exploitation tasks, although they are still too expensive or too slow to be used in many practical settings. Motivated by this, we study tool use and in-context summarization using non-reasoning models. We find that these mitigations may be used to substantially improve performance on medium-difficulty tasks; even then, however, all LLMs we study perform worse than a simple linear regression, even in non-linear settings. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore.

2502.00527 2026-06-08 cs.LG cs.CL Updated version

PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration

PolarQuant: 利用极坐标变换实现高效键缓存量化和解码加速

Songhao Wu, Ang Lv, Xiao Feng, Yufei Zhang, Xun Zhang, Guojun Yin, Wei Lin, Rui Yan

Affiliations * Gaoling School of Artificial Intelligence, Renmin University of China; ShanghaiTech University; Meituan

AI summary Proposes PolarQuant, which groups key vectors into two-dimensional sub-vectors and encodes each as a quantized radius and polar angle to handle outliers in key cache quantization, accelerates decoding through table lookup, and maintains the performance of the full-precision model.

Comments NeurIPS 2025 version with minor revisions to the methodology

Details
AI Chinese abstract

大型语言模型中的KV缓存是内存使用的主要因素,限制了其更广泛的适用性。将缓存量化到更低的位宽是减少计算成本的有效方法;然而,先前的方法由于异常值的存在,难以量化键向量,导致过高的开销。我们提出了一种名为PolarQuant的新型量化方法,有效解决了异常值挑战。我们观察到,异常值通常只出现在两个维度中的一个,当应用旋转位置嵌入时,这两个维度会一起旋转特定角度。当表示为二维向量时,这些维度展现出结构良好的模式,半径和角度在极坐标中平滑分布。这减轻了异常值对逐通道量化的挑战,使其非常适合量化。因此,PolarQuant将键向量分为二维子向量组,将其编码为相应的量化半径和极角,而不是直接量化原始键向量。PolarQuant在KV缓存量化中实现了卓越的效率,并通过将查询-键内积转化为查表操作来加速解码过程,同时保持全精度模型的下游性能。

English abstract

The KV cache in large language models is a dominant factor in memory usage, limiting their broader applicability. Quantizing the cache to lower bit widths is an effective way to reduce computational costs; however, previous methods struggle with quantizing key vectors due to outliers, resulting in excessive overhead. We propose a novel quantization approach called PolarQuant, which efficiently addresses the outlier challenge. We observe that outliers typically appear in only one of two dimensions, which are rotated together by a specific angle when rotary position embeddings are applied. When represented as two-dimensional vectors, these dimensions exhibit well-structured patterns, with radii and angles smoothly distributed in polar coordinates. This alleviates the outlier problem that affects per-channel quantization and makes the polar representation well suited to quantization. Thus, PolarQuant divides key vectors into groups of two-dimensional sub-vectors, encoding them as the corresponding quantized radius and polar angle, rather than quantizing the original key vectors directly. PolarQuant achieves superior efficiency in KV cache quantization and accelerates the decoding process by turning the query-key inner product into a table lookup, all while maintaining the downstream performance of full-precision models.
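
The encoding step described in the abstract can be illustrated with a small sketch: split a key vector into two-dimensional sub-vectors, convert each to polar coordinates, and quantize radius and angle uniformly. This is only a sketch under stated assumptions. Pairing adjacent dimensions, the uniform quantization grids, the bit widths, and the function names `polar_quantize` and `polar_dequantize` are all hypothetical; the paper's actual pairing and quantizer may differ.

```python
import math


def polar_quantize(key, r_bits=4, theta_bits=4):
    """Encode consecutive pairs (x, y) of an even-length key as (radius, angle) codes."""
    pairs = [(key[i], key[i + 1]) for i in range(0, len(key), 2)]
    radii = [math.hypot(x, y) for x, y in pairs]
    angles = [math.atan2(y, x) for x, y in pairs]  # each angle lies in [-pi, pi]
    r_max = max(radii) or 1.0  # avoid dividing by zero for an all-zero key
    r_levels, t_levels = 2 ** r_bits - 1, 2 ** theta_bits
    r_codes = [round(r / r_max * r_levels) for r in radii]
    t_codes = [int((a + math.pi) / (2 * math.pi) * t_levels) % t_levels for a in angles]
    return r_codes, t_codes, r_max


def polar_dequantize(r_codes, t_codes, r_max, r_bits=4, theta_bits=4):
    """Map codes back to Cartesian pairs, using the centre of each angle bin."""
    r_levels, t_levels = 2 ** r_bits - 1, 2 ** theta_bits
    out = []
    for rc, tc in zip(r_codes, t_codes):
        r = rc / r_levels * r_max
        a = (tc + 0.5) / t_levels * 2 * math.pi - math.pi
        out.extend([r * math.cos(a), r * math.sin(a)])
    return out
```

Since each angle code takes one of only `2 ** theta_bits` values, the cosine and sine for every code can be tabulated once, which is one plausible reading of how an inner product becomes a table lookup.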

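2503.03660 2026-06-08 cs.LG Updated version

Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns

分块评论家:基于Transformer的软演员-评论家算法与N步回报

Dong Tian, Onur Celik, Gerhard Neumann

Affiliations * Karlsruhe Institute of Technology (KIT)

AI summary Proposes a sequence-conditioned critic that models trajectory context with a lightweight Transformer and trains on aggregated N-step targets, strengthening the critic's temporal modeling for long-horizon and sparse-reward problems without importance sampling.

Comments 39 pages, 15 figures, ICLR2026 Poster

Details
AI Chinese abstract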

我们为软演员-评论家(SAC)引入了一种序列条件评论家,它使用轻量级Transformer建模轨迹上下文,并在聚合的$N$步目标上训练。与先前的方法不同,先前方法要么(i)孤立地对状态-动作对评分,要么(ii)依赖演员侧的动作分块来处理长时域,我们的方法通过条件化短轨迹段并整合多步回报来增强评论家本身——无需重要性采样(IS)。由此产生的序列感知价值估计捕获了扩展时域和稀疏奖励问题的关键时间结构。在局部运动基准上,我们进一步表明,冻结评论家参数几步使得我们的更新与CrossQ的核心思想兼容,从而在无需目标网络的情况下实现稳定训练。尽管其简单性——一个2层Transformer,128-256个隐藏单元,以及最大更新-数据比(UTD)为$1$——该方法始终优于标准SAC和强离策略基线,在长轨迹控制上尤其获得巨大收益。这些结果突显了评论家侧序列建模和$N$步自举对长时域强化学习的价值。

English abstract

We introduce a sequence-conditioned critic for Soft Actor-Critic (SAC) that models trajectory context with a lightweight Transformer and trains on aggregated $N$-step targets. Unlike prior approaches that (i) score state-action pairs in isolation or (ii) rely on actor-side action chunking to handle long horizons, our method strengthens the critic itself by conditioning on short trajectory segments and integrating multi-step returns -- without importance sampling (IS). The resulting sequence-aware value estimates capture the critical temporal structure for extended-horizon and sparse-reward problems. On local-motion benchmarks, we further show that freezing critic parameters for several steps makes our update compatible with CrossQ's core idea, enabling stable training without a target network. Despite its simplicity -- a 2-layer Transformer with 128-256 hidden units and a maximum update-to-data ratio (UTD) of $1$ -- the approach consistently outperforms standard SAC and strong off-policy baselines, with particularly large gains on long-trajectory control. These results highlight the value of sequence modeling and $N$-step bootstrapping on the critic side for long-horizon reinforcement learning.
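
For readers unfamiliar with $N$-step targets, the sketch below computes the textbook $N$-step return: the discounted sum of $N$ rewards plus a discounted bootstrap value at the final state. It deliberately omits SAC's entropy term and says nothing about how the paper aggregates targets across different $N$; the function name `n_step_target` and the numbers in the example are illustrative assumptions.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Return r_0 + gamma*r_1 + ... + gamma^(N-1)*r_(N-1) + gamma^N * bootstrap_value."""
    target = bootstrap_value
    # Fold from the last reward backwards so each step applies one discount.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


# Three rewards of 1.0, a bootstrap value of 10.0, and gamma = 0.5:
# 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
example = n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

Larger $N$ propagates sparse rewards faster at the cost of relying on more off-policy transitions, which is why the abstract's claim of avoiding importance sampling is notable.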

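2505.10892 2026-06-08 cs.LG Updated version

Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models

多目标偏好优化:提升生成模型的人类对齐

Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen

Affiliations * University of California, Berkeley; Stanford University

AI summary Because RLHF and preference optimization methods assume a single objective, proposes MOPO, a multi-objective preference optimization framework that maximizes a primary objective under a KL constraint while guaranteeing lower bounds on secondary objectives, achieving Pareto-optimal policies on synthetic benchmarks and human-preference data.

Details
AI Chinese abstract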

使用RLHF和偏好优化方法(如DPO、IPO)对LLM进行后训练已大大改善了对齐,但这些方法假设单一目标。实际上,人类表达多个通常相互冲突的目标,例如有用性和无害性,没有自然的标量化。我们研究多目标偏好对齐问题,其中策略必须同时平衡多个目标。我们提出多目标偏好优化(MOPO),一个受约束的KL正则化框架,通过可调安全阈值在强制执行次要目标下限的同时最大化主要目标。MOPO直接操作成对偏好,无需点式奖励,并允许简单的闭式迭代更新。实验上,MOPO在合成基准上恢复帕累托最优策略,并在人类偏好数据上微调时,产生数十亿参数模型,实现更高奖励和帕累托支配基线,具有稳定且鲁棒的优化动态。

English abstract

Post-training LLMs with RLHF and preference optimization methods (e.g., DPO, IPO) has greatly improved alignment, yet these approaches assume a single objective. In reality, humans express multiple, often conflicting objectives, such as helpfulness and harmlessness, with no natural scalarization. We study the multi-objective preference alignment problem, where a policy must balance several objectives simultaneously. We propose Multi-Objective Preference Optimization (MOPO), a constrained KL-regularized framework that maximizes a primary objective while enforcing lower bounds on secondary objectives via tunable safety thresholds. MOPO operates directly on pairwise preferences without point-wise rewards, and admits simple closed-form iterative updates. Empirically, MOPO recovers Pareto-optimal policies on synthetic benchmarks and, when fine-tuned on human-preference data, yields multi-billion parameter models that achieve higher rewards and Pareto-dominate baselines, with stable and robust optimization dynamics.

2505.12239 2026-06-08 cs.LG cs.AI cs.CR Updated version

Towards Efficient and Exact Forgetting Services in Pre-Trained-Model-based Continual Learning

面向基于预训练模型的持续学习中的高效且精确的遗忘服务

Yajiang Huang, Jianheng Tang, Kejia Fan, Huiping Zhuang, Anfeng Liu, Tian Wang, Yunhuai Liu, Mianxiong Dong, Houbing Herbert Song

Affiliations * Department of Information Systems, University of Maryland, Baltimore County (UMBC)

AI summary To handle sequential unlearning requests in continual learning, proposes Analytic Continual Unlearning (ACU), which recursively derives closed-form least-squares solutions for efficient and exact unlearning while protecting historical data privacy.

Details
AI Chinese abstract

在持续学习(CL)中,使用预训练模型(PTM)作为特征提取器已成为一种流行做法。结合解析分类器,基于PTM的方法在CL中实现了最先进的性能,追求非遗忘目标。同时,在大多数服务构建范式(例如移动群智感知(MCS))中,主动遗忘在CL阶段获得的特定知识也至关重要,其中移动边缘节点不断收集传感数据,不仅需要非遗忘适应,还需要特定知识遗忘以保护隐私。因此,当遗忘请求在CL中顺序出现时,产生了一个独特的问题,称为持续遗忘(CU)。然而,现有的遗忘方法专注于单次联合遗忘,在应用于CU时显得非常不足,包括(1)违反CL中的历史数据隐私,以及(2)容易被对抗性频繁请求淹没或降级。为了应对CU的挑战,我们提出了一种无梯度方法,称为解析持续遗忘(ACU),用于在基于PTM的CL中实现高效且精确的遗忘,同时保护历史数据隐私。针对每个遗忘请求,我们的ACU通过最小二乘法以可解释的方式递归推导解析(即闭式)解。通过精心设计,我们的ACU兼容样本级和类别级遗忘请求。理论和实验评估验证了我们的ACU在遗忘有效性、模型保真度和系统效率方面的优越性。

English abstract

In Continual Learning (CL), using a Pre-Trained Model (PTM) as the feature extractor has become a popular practice. Accompanied by analytic classifiers, the PTM-based methods have achieved state-of-the-art performance in CL, in pursuit of the non-forgetting goal. Meanwhile, actively forgetting specific knowledge acquired during the CL phase is also essential in most service construction paradigms, for example, Mobile Crowd Sensing (MCS), where mobile edge nodes continuously collect sensory data and demand not only non-forgetting adaptation but also specific knowledge forgetting for privacy preservation. Thus, a unique problem, called Continual Unlearning (CU), arises when forgetting requests arrive sequentially in CL. However, existing unlearning methods focus on single-shot joint forgetting and prove highly inadequate when applied to CU: they (1) violate historical data privacy in CL and (2) are easily overwhelmed or degraded by adversarially frequent requests. To handle the challenges of CU, we propose a gradient-free approach, called Analytic Continual Unlearning (ACU), for efficient and exact forgetting with historical data privacy preservation in PTM-based CL. In response to each unlearning request, our ACU recursively derives the analytical (i.e., closed-form) solutions via least squares in an interpretable manner. By meticulous design, our ACU is compatible with both sample-level and class-level unlearning requests. The theoretical and experimental evaluations validate our ACU's superiority in unlearning effectiveness, model fidelity, and system efficiency.

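2505.21285 2026-06-08 cs.LG stat.ML Updated version

Learnable Kernel Density Estimation for Graphs and Its Application to Graph-Level Anomaly Detection

可学习图核密度估计及其在图级异常检测中的应用

Xudong Wang, Ziheng Sun, Chris Ding, Jicong Fan

Affiliations * University of Science and Technology of China

AI summary Proposes LGKDE, which represents graphs as distributions with graph neural networks and uses maximum mean discrepancy to learn a multi-scale kernel density estimator; it captures structural patterns and semantic variations with theoretical guarantees and outperforms existing methods on graph anomaly detection.

Comments Accepted in the Forty-Third International Conference on Machine Learning (ICML 2026), Main Track

Details
AI Chinese abstract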

本文提出一个名为LGKDE的框架,用于学习图的核密度估计。图密度估计的关键挑战在于有效捕获结构模式和语义变化,同时保持理论保证。结合图核和核密度估计(KDE)是图密度估计的标准方法,但由于核的手工设计和固定特征,性能不佳。我们的方法LGKDE利用图神经网络将每个图表示为离散分布,并利用最大均值差异学习多尺度KDE的图度量,其中所有参数通过最大化图相对于其精心设计的扰动版本的密度来学习。扰动在节点特征和图谱上进行,有助于更好地刻画正常密度区域的边界。理论上,我们为LGKDE建立了一致性和收敛性保证,包括均方积分误差界、鲁棒性和泛化性。我们通过展示其在恢复合成图分布底层密度方面的有效性,并将其应用于多个基准数据集上的图异常检测来验证LGKDE。广泛的实证评估表明,在大多数基准数据集上,LGKDE相比最先进的基线方法表现出优越的性能。

English abstract

This work proposes a framework LGKDE that learns kernel density estimation for graphs. The key challenge in graph density estimation lies in effectively capturing both structural patterns and semantic variations while maintaining theoretical guarantees. Combining graph kernels and kernel density estimation (KDE) is a standard approach to graph density estimation, but has unsatisfactory performance due to the handcrafted and fixed features of kernels. Our method LGKDE leverages graph neural networks to represent each graph as a discrete distribution and utilizes maximum mean discrepancy to learn the graph metric for multi-scale KDE, where all parameters are learned by maximizing the density of graphs relative to the density of their well-designed perturbed counterparts. The perturbations are conducted on both node features and graph spectra, which helps better characterize the boundary of normal density regions. Theoretically, we establish consistency and convergence guarantees for LGKDE, including bounds on the mean integrated squared error, robustness, and generalization. We validate LGKDE by demonstrating its effectiveness in recovering the underlying density of synthetic graph distributions and applying it to graph anomaly detection across diverse benchmark datasets. Extensive empirical evaluation shows that LGKDE demonstrates superior performance compared to state-of-the-art baselines on most benchmark datasets.

2505.21423 2026-06-08 cs.LG stat.ML Updated version

Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization

稳定性边缘的冲突偏差:范数与锐度正则化

Maria Matveev, Vit Fojtik, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly

Affiliations * Munich Center for Machine Learning (MCML); Prusa Research; Institute for Robotics and Mechatronics, DLR-German Aerospace Center

AI summary Studies the implicit regularization of gradient descent in overparameterized networks, showing that the learning rate interpolates between low norm and low sharpness and that no single bias suffices to explain generalization, so the dynamic trade-off between them must be considered.

Comments Accepted at ICML 2026

Details
AI Chinese abstract

过参数化网络显著的泛化性能通常归因于隐式偏差,例如小学习率下的范数最小化和稳定性边缘(Edge-of-Stability)状态下的低锐度。在这项工作中,我们认为全面理解梯度下降的泛化性能需要分析这些不同形式的隐式正则化之间的相互作用。我们通过实验证明,学习率在训练模型的低参数范数和低锐度之间插值。此外,我们证明对于在简单回归任务上训练的对角线性网络,单独的隐式偏差都不能最小化泛化误差。这些发现表明,仅关注单一隐式偏差不足以解释良好的泛化,并促使我们采用更广阔的隐式正则化视角,捕捉由不可忽略的学习率引起的范数与锐度之间的动态权衡。

English abstract

The remarkable generalization properties of overparameterized networks are often attributed to implicit biases, such as norm minimization at small learning rates and low sharpness in the Edge-of-Stability regime. In this work, we argue that a comprehensive understanding of the generalization performance of gradient descent requires analyzing the interaction between these various forms of implicit regularization. We empirically demonstrate that the learning rate interpolates between low parameter norm and low sharpness of the trained model. We furthermore prove that neither implicit bias alone minimizes the generalization error for diagonal linear networks trained on a simple regression task. These findings demonstrate that focusing on a single implicit bias is insufficient to explain good generalization, and they motivate a broader view of implicit regularization that captures the dynamic trade-off between norm and sharpness induced by non-negligible learning rates.
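
The link between learning rate and sharpness mentioned above rests on a classical fact that a one-dimensional example makes concrete: gradient descent on the quadratic 0.5 * a * x^2 multiplies x by (1 - lr * a) at every step, so it contracts only when lr < 2 / a. The sketch below checks both sides of that threshold. It illustrates the stability condition only, not the paper's analysis of diagonal linear networks; the function name and the numeric values are illustrative assumptions.

```python
def gd_on_quadratic(sharpness, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and return the final x."""
    x = x0
    for _ in range(steps):
        x -= lr * sharpness * x  # gradient of f is sharpness * x
    return x


# With sharpness 10 the stability threshold is lr = 2 / 10 = 0.2.
stable = gd_on_quadratic(sharpness=10.0, lr=0.19)    # factor -0.9 per step: shrinks
unstable = gd_on_quadratic(sharpness=10.0, lr=0.21)  # factor -1.1 per step: grows
```

Read the other way round, a fixed learning rate can only settle in regions whose sharpness is below 2 / lr, which is the sense in which larger learning rates favor flatter solutions.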

2505.23131 2026-06-08 cs.LG cs.DC Updated version

DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs

DOPPLER: 异步数据流图中设备分配的双策略学习

Xinyu Yao, Daniel Bourgeois, Abhinav Jain, Yuxin Tang, Jiawen Yao, Zhimin Ding, Arlei Silva, Chris Jermaine

Affiliations * Rice University; Rice Ken Kennedy Institute

AI summary Proposes the Doppler framework, which optimizes device assignment in asynchronous dataflow graphs with dual policy networks (SEL selects operations, PLC places them on devices), reducing execution time and improving sampling efficiency.

Comments 32 pages, 19 figures

Details
Journal ref
Proceedings of the International Conference on Learning Representations (ICLR), 2026
AI Chinese abstract

我们研究在work-conserving系统中将数据流图中的操作分配给设备以最小化执行时间的问题,重点关注复杂的机器学习工作负载。先前的基于学习的方法常常因三个关键限制而难以奏效:(1) 依赖像TensorFlow这样的批量同步系统,由于屏障同步导致设备利用率不足;(2) 在设计基于学习的方法时缺乏对底层系统调度机制的了解;(3) 完全依赖强化学习,忽略了专家设计的有效启发式结构。在本文中,我们提出Doppler,一个用于训练双策略网络的三阶段框架,包括1) 用于选择操作的$\mathsf{SEL}$策略和2) 用于将所选操作放置到设备上的$\mathsf{PLC}$策略。我们的实验表明,Doppler通过减少系统执行时间在所有任务上优于所有基线方法,并且通过减少每回合训练时间展示了采样效率。

English abstract

We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to three key limitations: (1) reliance on bulk-synchronous systems like TensorFlow, which under-utilize devices due to barrier synchronization; (2) lack of awareness of the scheduling mechanism of underlying systems when designing learning-based methods; and (3) exclusive dependence on reinforcement learning, ignoring the structure of effective heuristics designed by experts. In this paper, we propose Doppler, a three-stage framework for training dual-policy networks consisting of 1) a $\mathsf{SEL}$ policy for selecting operations and 2) a $\mathsf{PLC}$ policy for placing chosen operations on devices. Our experiments show that Doppler outperforms all baseline methods across tasks by reducing system execution time and additionally demonstrates sampling efficiency by reducing per-episode training time.

2505.23437 2026-06-08 cs.LG cs.AI cs.IR Updated version

Bounded-Abstention Pairwise Learning to Rank

有界弃权成对学习排序

Antonio Ferrara, Andrea Pugnana, Francesco Bonchi, Salvatore Ruggieri

Affiliations * Intesa Sanpaolo AI Research; University of Trento; University of Pisa

AI summary Proposes an abstention method for pairwise ranking based on thresholding conditional risk, characterizes the optimal strategy theoretically, designs a model-agnostic plug-in algorithm, and validates it experimentally.

Comments KDD 2026

Details
AI Chinese abstract

排序系统影响健康、教育和就业等高风险领域的决策,可能产生重大经济和社会影响,因此集成安全机制至关重要。弃权是一种安全机制,允许算法决策系统将不确定或低置信度的决策推迟给人类专家。虽然弃权主要在分类任务中研究,但其在其他机器学习范式中的应用尚不充分。本文提出一种用于成对学习排序任务的弃权新方法。该方法基于对排序器条件风险设置阈值:当估计风险超过预定义阈值时,系统弃权不做决策。我们的贡献有三方面:最优弃权策略的理论刻画、一个模型无关的插件式算法用于构建弃权排序模型,以及在多个数据集上的全面实证评估,证明了我们方法的有效性。

English abstract

Ranking systems influence decision-making in high-stakes domains like health, education, and employment, where they can have substantial economic and social impacts. This makes the integration of safety mechanisms essential. One such mechanism is abstention, which enables algorithmic decision-making systems to defer uncertain or low-confidence decisions to human experts. While abstention has been predominantly explored in the context of classification tasks, its application to other machine learning paradigms remains underexplored. In this paper, we introduce a novel method for abstention in pairwise learning-to-rank tasks. Our approach is based on thresholding the ranker's conditional risk: the system abstains from making a decision when the estimated risk exceeds a predefined threshold. Our contributions are threefold: a theoretical characterization of the optimal abstention strategy, a model-agnostic, plug-in algorithm for constructing abstaining ranking models, and a comprehensive empirical evaluation across multiple datasets, demonstrating the effectiveness of our approach.
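
The thresholding rule in the abstract can be sketched for a single pair of items. The sketch assumes the ranker outputs an estimate p of the probability that item i should rank above item j, and it takes the conditional risk of committing to the likelier order to be min(p, 1 - p), which holds under a 0-1 pairwise loss. That risk definition, the helper that picks a threshold for a given abstention budget, and both function names are assumptions made for illustration; the paper's plug-in algorithm may be defined differently.

```python
def decide(p_i_over_j, threshold):
    """Commit to the likelier order, or abstain when its estimated risk is too high."""
    risk = min(p_i_over_j, 1.0 - p_i_over_j)  # assumed risk under 0-1 pairwise loss
    if risk > threshold:
        return "abstain"
    return "i>j" if p_i_over_j >= 0.5 else "j>i"


def threshold_for_budget(probs, max_abstain_rate):
    """Smallest threshold whose abstention rate on `probs` stays within the budget."""
    risks = sorted(min(p, 1.0 - p) for p in probs)
    keep = len(risks) - int(max_abstain_rate * len(risks))  # pairs that must be decided
    return risks[keep - 1] if keep > 0 else -1.0


# Hypothetical calibration scores for six pairs, with at most half deferred.
probs = [0.95, 0.1, 0.55, 0.48, 0.8, 0.3]
t = threshold_for_budget(probs, max_abstain_rate=0.5)
decisions = [decide(p, t) for p in probs]
```

Choosing the threshold from held-out scores is what bounds the abstention rate: the pairs deferred to a human expert are exactly the ones whose estimated risk is highest.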

2507.12927 2026-06-08 cs.LG cs.IT math.IT Updated version

Trace Reconstruction with Language Models

基于语言模型的迹重建

Franziska Weindel, Michael Girsch, Reinhard Heckel

Affiliations * School of Computation, Information and Technology, Technical University of Munich; Munich Center for Machine Learning

AI summary Proposes TReconLM, a decoder-only transformer that treats trace reconstruction as next-token prediction, is pretrained on synthetic data and fine-tuned on real data, and clearly outperforms existing algorithms.

Details
AI Chinese abstract

一般的迹重建问题旨在从被插入、删除和替换独立损坏的噪声副本中恢复原始序列。该问题出现在DNA数据存储等应用中,DNA数据存储因其高信息密度和持久性而成为一种有前景的存储介质。然而,DNA合成、存储和测序过程中引入的错误需要通过算法和编码进行纠正,而迹重建通常作为数据检索的一部分。在这项工作中,我们提出了TReconLM,一种仅解码器的变换器,将迹重建作为下一个标记预测任务来解决。TReconLM优于最先进的迹重建算法,包括先前的深度学习方法,能够以无错误的方式恢复更高比例的序列。我们在基于简单错误模型生成的合成数据上进行预训练,并在真实世界数据上进行微调,以适应特定技术的错误模式。代码可在https://github.com/MLI-lab/TReconLM获取。

English abstract

The general trace reconstruction problem seeks to recover an original sequence from its noisy copies independently corrupted by insertions, deletions, and substitutions. This problem arises in applications such as DNA data storage, a promising storage medium due to its high information density and longevity. However, errors introduced during DNA synthesis, storage, and sequencing require correction through algorithms and codes, with trace reconstruction often used as part of data retrieval. In this work, we propose TReconLM, a decoder-only transformer that solves trace reconstruction as a next-token prediction task. TReconLM outperforms state-of-the-art trace reconstruction algorithms, including prior deep-learning approaches, recovering a substantially higher fraction of sequences without error. We pretrain on synthetic data generated from a simple error model and fine-tune on real-world data to adapt to technology-specific error patterns. Code is available at https://github.com/MLI-lab/TReconLM.
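
The "simple error model" used to generate synthetic traces is not specified in the abstract, so the sketch below shows one common choice: an independent insertion-deletion-substitution channel applied position by position. The probabilities, the DNA alphabet, and the function name `noisy_trace` are illustrative assumptions and are not taken from the TReconLM repository.

```python
import random


def noisy_trace(seq, p_ins=0.05, p_del=0.05, p_sub=0.05, alphabet="ACGT", rng=random):
    """Return one noisy copy of `seq` with independent insertions, deletions, substitutions."""
    out = []
    for base in seq:
        if rng.random() < p_ins:
            out.append(rng.choice(alphabet))  # random base inserted before this one
        u = rng.random()
        if u < p_del:
            continue  # this base is deleted
        if u < p_del + p_sub:
            # substitute with a different symbol from the alphabet
            out.append(rng.choice([b for b in alphabet if b != base]))
        else:
            out.append(base)
    return "".join(out)


# Several traces of the same hypothetical sequence, using a seeded generator.
rng = random.Random(0)
traces = [noisy_trace("ACGTACGTAC", rng=rng) for _ in range(5)]
```

A reconstruction model then receives a set of such traces and must output the original sequence; framing that output as next-token prediction is what lets a decoder-only transformer be trained on it.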

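2508.02039 2026-06-08 cs.LG stat.ML Updated version

Model Recycling Framework for Multi-Source Data-Free Supervised Transfer Learning

多源无数据监督迁移学习的模型回收框架

Sijia Wang, Ricardo Henao

Affiliations * Department of Electrical and Computer Engineering, Duke University

AI summary Proposes a model recycling framework that, without source data, identifies subsets of related source models for parameter-efficient transfer learning in white-box and black-box settings, supporting multi-source data-free supervised transfer learning.

Details
AI Chinese abstract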

对数据隐私的日益关注以及与检索源数据进行模型训练相关的其他困难,催生了无源迁移学习的需求,在这种学习中,只能访问预训练模型,而不能访问原始源域的数据。这种设置带来了许多挑战,因为许多现有的迁移学习方法通常依赖于对源数据的访问,这限制了它们直接应用于源数据不可用的场景。此外,实际问题使其更加困难,例如在没有源数据信息的情况下有效选择迁移模型,以及在没有完全访问源模型的情况下进行迁移。受此启发,我们提出了一个模型回收框架,用于参数高效的模型训练,该框架在白盒和黑盒设置中识别要重用的相关源模型的子集。因此,我们的框架使模型即服务(MaaS)提供商能够构建高效预训练模型的库,从而为多源无数据监督迁移学习创造了机会。

English abstract

Increasing concerns for data privacy and other difficulties associated with retrieving source data for model training have created the need for source-free transfer learning, in which one only has access to pre-trained models instead of data from the original source domains. This setting introduces many challenges, as many existing transfer learning methods typically rely on access to source data, which limits their direct applicability to scenarios where source data is unavailable. Further, practical concerns make it more difficult, for instance efficiently selecting models for transfer without information on source data, and transferring without full access to the source models. So motivated, we propose a model recycling framework for parameter-efficient training of models that identifies subsets of related source models to reuse in both white-box and black-box settings. Consequently, our framework makes it possible for Model as a Service (MaaS) providers to build libraries of efficient pre-trained models, thus creating an opportunity for multi-source data-free supervised transfer learning.

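2508.17821 2026-06-08 cs.LG cs.AI cs.CL Updated version

Limitations of Normalization in Attention Mechanism

注意力机制中归一化的局限性

Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova, Radu State

Affiliations * University of Luxembourg; London Institute for Mathematical Sciences

AI summary Using a theoretical framework and GPT-2 experiments, shows that softmax normalization drives attention toward uniformity as the number of selected tokens grows, and analyzes the training difficulties caused by gradient sensitivity at low temperature.

Details
AI Chinese abstract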

本文研究了注意力机制中归一化的局限性。我们首先建立了一个理论框架,用于识别模型的选择能力以及token选择中涉及的几何分离。我们的分析包括在softmax缩放下token向量距离和分离准则的显式界限。通过使用预训练的GPT-2模型进行实验,我们实证验证了理论结果,并分析了注意力机制的关键行为。值得注意的是,我们证明随着所选token数量的增加,模型区分信息性token的能力下降,通常趋向于均匀选择模式。我们还表明,softmax归一化下的梯度敏感性在训练过程中带来了挑战,尤其是在低温度设置下。这些发现推进了当前对基于softmax的注意力机制的理解,并激发了在未来注意力架构中需要更稳健的归一化和选择策略的需求。

English abstract

This paper investigates the limitations of normalization in attention mechanisms. We begin with a theoretical framework that enables the identification of the model's selective ability and the geometric separation involved in token selection. Our analysis includes explicit bounds on distances and separation criteria for token vectors under softmax scaling. Through experiments with a pre-trained GPT-2 model, we empirically validate our theoretical results and analyze key behaviors of the attention mechanism. Notably, we demonstrate that as the number of selected tokens increases, the model's ability to distinguish informative tokens declines, often converging toward a uniform selection pattern. We also show that gradient sensitivity under softmax normalization presents challenges during training, especially at low temperature settings. These findings advance current understanding of softmax-based attention mechanisms and motivate the need for more robust normalization and selection strategies in future attention architectures.
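
A small numeric sketch shows the dilution effect in its simplest form. If one token's logit exceeds all others by a fixed gap, its softmax weight is e^(gap/T) / (e^(gap/T) + n - 1), which falls toward the uniform value as the number of tokens n grows unless the temperature T is lowered. The setup of one elevated logit among equal ones, the function names, and the chosen values are illustrative assumptions; the paper's bounds are more general.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_weight(n_tokens, gap=2.0, temperature=1.0):
    """Attention weight on one token whose logit exceeds the other n - 1 by `gap`."""
    return softmax([gap] + [0.0] * (n_tokens - 1), temperature)[0]


short_context = top_weight(10)                     # roughly 0.45
long_context = top_weight(1000)                    # below 0.01: nearly uniform
sharpened = top_weight(1000, temperature=0.1)      # low temperature restores the peak
```

Lowering the temperature recovers selectivity, but it also makes the output change abruptly with small logit changes, which is consistent with the gradient sensitivity at low temperature noted in the abstract.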

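2509.05316 2026-06-08 cs.LG cs.AI Updated version

Standard vs. Modular Sampling: Best Practices for Reliable LLM Unlearning

标准采样与模块化采样:可靠的大语言模型遗忘的最佳实践

Praveen Bushipaka, Lucia Passaro, Tommaso Cucinotta

Affiliations * Scuola Superiore Sant’Anna; University of Pisa

AI summary To address shortcomings in sampling strategies for LLM unlearning, proposes Modular Entity-Level Unlearning (MELU), which balances forgetting efficacy and model utility through diverse neighbor sets and modular sampling.

Details
AI Chinese abstract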

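传统的大语言模型遗忘设置包含两个子集——“遗忘集”和“保留集”,目标是从遗忘集中移除不需要的知识,同时保留保留集中的其余知识。在以隐私为中心的遗忘研究中,保留集通常进一步划分为邻居集(包含与遗忘目标直接或间接关联的数据),并辅以通用知识集。现有基准的常见做法是仅使用单一邻居集和通用知识,这无法反映真实世界数据的复杂性和关联。大语言模型遗忘通常采用1:1采样或循环迭代采样,但这些事实标准的有效性和稳定性尚未得到严格检验。在本研究中,我们系统评估了这些常见做法。结果表明,依赖单一邻居集并非最优,标准采样方法会掩盖性能权衡。基于此分析,我们提出并验证了一组初步的最佳实践:(1)引入多样化的邻居集以平衡遗忘效果和模型效用;(2)标准的1:1采样方法效率低且结果较差;(3)我们提出的模块化实体级遗忘(MELU)策略可作为循环采样的替代方案。我们证明,这种模块化方法与鲁棒算法相结合,为实现有效遗忘提供了清晰且稳定的路径。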

English abstract

A conventional LLM unlearning setting consists of two subsets, "forget" and "retain", with the objective of removing the undesired knowledge in the forget set while preserving the remaining knowledge in the retain set. In privacy-focused unlearning research, the retain set is often further divided into neighbor sets, containing data either directly or indirectly connected to the forget targets, and augmented by a general-knowledge set. A common practice in existing benchmarks is to employ only a single neighbor set together with general knowledge, which fails to reflect real-world data complexities and relationships. LLM unlearning typically involves 1:1 sampling or cyclic iteration sampling. However, the efficacy and stability of these de facto standards have not been critically examined. In this study, we systematically evaluate these common practices. Our findings reveal that relying on a single neighbor set is suboptimal and that a standard sampling approach can obscure performance trade-offs. Based on this analysis, we propose and validate an initial set of best practices: (1) incorporate diverse neighbor sets to balance forget efficacy and model utility; (2) avoid standard 1:1 sampling, which is inefficient and yields poor results; and (3) use our proposed Modular Entity-Level Unlearning (MELU) strategy as an alternative to cyclic sampling. We demonstrate that this modular approach, combined with robust algorithms, provides a clear and stable path towards effective unlearning.

2509.21751 2026-06-08 cs.LG physics.comp-ph physics.flu-dyn 版本更新

On the Effect of Neural Field Reparameterization for 4DVAR

神经场重参数化对四维变分同化的影响

Jaemin Oh

Affiliations: Division of Applied Mathematics, Brown University

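AI summary: Reparameterizes 4DVAR with a neural field, using spectral bias as an implicit regularizer so that no background error covariance is needed, enabling parallel-in-time optimization and outperforming classical 4DVAR on chaotic benchmarks.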

Comments 26 pages, 9 figures, 11 tables

详情
AI中文摘要

四维变分资料同化(4DVAR)是数值天气预报的基石,但由于目标函数的非凸性,它仍然计算密集且对初始化敏感。我们提出了一种基于神经场的4DVAR重构,其中时空状态被表示为由神经网络参数化的连续函数。我们证明,在参数空间中优化利用了神经场的谱偏置,作为隐式正则化器,稳定状态估计并抑制虚假的高频振荡,而无需显式的背景误差协方差信息。此外,通过参数化完整的时空轨迹,我们的框架实现了时间并行优化,并通过物理信息损失直接纳入物理约束。在混沌基准测试(包括二维Kolmogorov流和三维Taylor-Green涡旋)上的评估表明,神经重参数化比经典4DVAR产生更准确的初始条件。当与可分离神经架构(SPINNs)结合时,该方法实现了显著的加速。与许多机器学习方法不同,该框架不需要真实训练数据,为业务化资料同化提供了一种稳健且可扩展的替代方案。

英文摘要

Four-dimensional variational data assimilation (4DVAR) is a cornerstone of numerical weather prediction, yet it remains computationally intensive and sensitive to initialization due to the non-convexity of its objective function. We propose a neural field-based reformulation of 4DVAR in which the spatiotemporal state is represented as a continuous function parameterized by a neural network. We demonstrate that optimizing in parameter space leverages the spectral bias of neural fields, acting as an implicit regularizer that stabilizes state estimation and suppresses spurious high-frequency oscillations without requiring explicit background error covariance information. Furthermore, by parameterizing the full spatiotemporal trajectory, our framework enables parallel-in-time optimization and incorporates physical constraints directly through physics-informed losses. Evaluations on chaotic benchmarks, including 2D Kolmogorov flow and 3D Taylor-Green vortices, show that neural reparameterization produces more accurate initial conditions than classical 4DVAR. When combined with separable neural architectures (SPINNs), the method achieves substantial speedups. Unlike many machine learning approaches, this framework requires no ground-truth training data, offering a robust and scalable alternative for operational data assimilation.
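For orientation, the contrast drawn in the abstract above can be written out as two objectives. The first is the textbook strong-constraint 4DVAR cost. The second is only a plausible sketch of a neural-field variant: the symbols $u_\theta$, $\mathcal{N}$ and the weight $\lambda$ are assumptions introduced here for illustration and are not taken from the paper.

```latex
% Classical strong-constraint 4DVAR: optimize the initial state x_0.
% x_b: background state, B: background error covariance,
% M_{0->k}: forecast model from t_0 to t_k, H_k: observation operator,
% y_k: observations, R_k: observation error covariance.
\begin{align}
J(x_0) &= \tfrac12 (x_0 - x_b)^\top B^{-1} (x_0 - x_b)
        + \tfrac12 \sum_{k=0}^{K} d_k(x_0)^\top R_k^{-1} d_k(x_0),
\qquad d_k(x_0) = H_k\bigl(M_{0\to k}(x_0)\bigr) - y_k .
\end{align}

% Assumed neural-field form: optimize network parameters theta of a
% space-time field u_theta(x, t). There is no background term; the dynamics
% enter through a physics-informed residual N[u_theta] with weight lambda.
\begin{align}
J(\theta) &= \tfrac12 \sum_{k=0}^{K} e_k(\theta)^\top R_k^{-1} e_k(\theta)
           + \lambda \,\bigl\| \mathcal{N}[u_\theta] \bigr\|^2,
\qquad e_k(\theta) = H_k\bigl(u_\theta(\cdot, t_k)\bigr) - y_k .
\end{align}
```

In the second form every observation time $t_k$ can be evaluated independently of the others, which is one way to read the abstract's claim of parallel-in-time optimization.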

2510.03381 2026-06-08 cs.LG cs.AI 版本更新

Proxy Reconstruction Pre-training for Ramp Flow Prediction at Highway Interchanges

高速公路立交匝道流量预测的代理重建预训练

Yongchao Li, Jun Chen, Zhuoxuan Li, Chao Gao, Yang Li, Chu Zhang, Changyin Dong

Affiliations: Southeast University; Institute of Telecommunications and Information Sciences, China

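AI summary: Proposes the Spatio-Temporal Decoupled Autoencoder (STDAE), which is pretrained by cross-modal reconstruction to recover ramp flows from mainline data and is combined with models such as GWNet to improve prediction accuracy, outperforming thirteen baselines on real-world datasets.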

Comments Accepted at Applied Soft Computing Journal

详情
Journal ref
Applied Soft Computing Journal 200 (2026) 115462
AI中文摘要

立交桥是高速公路间车辆转换的关键节点,但缺乏实时匝道检测器导致交通预测存在盲区。为解决这一问题,我们提出时空解耦自编码器(STDAE),一种利用跨模态重建预训练的两阶段框架。在第一阶段,STDAE从主线数据重建历史匝道流量,迫使模型捕捉内在的时空关系。其解耦架构通过并行的空间和时间自编码器高效提取异质特征。在预测阶段,学习到的表示与GWNet等模型集成以提高准确性。在三个真实立交数据集上的实验表明,STDAE-GWNET始终优于十三个最先进的基线,并达到与使用历史匝道数据的模型相当的性能。这证明了其在克服检测器稀缺方面的有效性及其在不同预测流程中的即插即用潜力。

英文摘要

Interchanges are crucial nodes for vehicle transfers between highways, yet the lack of real-time ramp detectors creates blind spots in traffic prediction. To address this, we propose a Spatio-Temporal Decoupled Autoencoder (STDAE), a two-stage framework that leverages cross-modal reconstruction pretraining. In the first stage, STDAE reconstructs historical ramp flows from mainline data, forcing the model to capture intrinsic spatio-temporal relations. Its decoupled architecture with parallel spatial and temporal autoencoders efficiently extracts heterogeneous features. In the prediction stage, the learned representations are integrated with models such as GWNet to enhance accuracy. Experiments on three real-world interchange datasets show that STDAE-GWNET consistently outperforms thirteen state-of-the-art baselines and achieves performance comparable to models using historical ramp data. This demonstrates its effectiveness in overcoming detector scarcity and its plug-and-play potential for diverse forecasting pipelines.

2510.09041 2026-06-08 cs.LG cs.AI 版本更新

Robust Driving Control for Autonomous Vehicles: An Intelligent General-sum Constrained Adversarial Reinforcement Learning Approach

自动驾驶鲁棒控制:一种智能一般和约束对抗强化学习方法

Junchao Fan, Qi Wei, Ruichen Zhang, Yang Lu, Jianhua Wang, Xiaolin Chang, Bo Ai

Affiliations: Beijing Key Laboratory of Security and Privacy in Intelligent Transportation; Beijing Jiaotong University; College of Computing and Data Science; Nanyang Technological University; School of Computer Science and Technology; Taiyuan University of Technology; School of Electronics and Information Engineering

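AI summary: To address the vulnerability of deep reinforcement learning to adversarial attacks in autonomous driving, the paper proposes Intelligent General-sum Constrained Adversarial Reinforcement Learning (IGCARL), in which a strategic targeted adversary and a robust driving agent are trained through interaction and the agent is optimized under constraints for policy stability; experiments show a success rate at least 27.9% higher than existing methods.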

详情
AI中文摘要

深度强化学习(DRL)在开发自动驾驶策略方面取得了显著成功。然而,其对对抗攻击的脆弱性仍然是实际部署的关键障碍。尽管现有的鲁棒方法已取得一定成功,但它们仍面临三个关键问题:(i)这些方法针对短视的对抗攻击进行训练,限制了它们应对更具战略性威胁的能力;(ii)它们难以引发真正安全关键的事件(例如碰撞),反而常常导致轻微后果;(iii)由于缺乏鲁棒约束,这些方法在训练过程中可能导致学习不稳定和策略漂移。为了解决这些问题,我们提出了智能一般和约束对抗强化学习(IGCARL),一种新颖的鲁棒自动驾驶方法,包括一个战略性目标对手和一个鲁棒驾驶代理。战略性目标对手被设计为利用DRL的时间决策能力来执行策略协调的多步攻击。此外,它通过采用一般和目标明确地专注于引发安全关键事件。鲁棒驾驶代理通过与对手交互学习,以发展出对抗攻击的鲁棒自动驾驶策略。为了确保对抗环境中的稳定学习并减轻攻击引起的策略漂移,代理在约束公式下进行优化。大量实验表明,IGCARL相比现有最先进方法将成功率提高了至少27.9%,展示了对抗攻击的卓越鲁棒性,并增强了基于DRL的自动驾驶的安全性和可靠性。

英文摘要

Deep reinforcement learning (DRL) has demonstrated remarkable success in developing autonomous driving policies. However, its vulnerability to adversarial attacks remains a critical barrier to real-world deployment. Although existing robust methods have achieved success, they still suffer from three key issues: (i) these methods are trained against myopic adversarial attacks, limiting their abilities to respond to more strategic threats, (ii) they have trouble causing truly safety-critical events (e.g., collisions), but instead often result in minor consequences, and (iii) these methods can introduce learning instability and policy drift during training due to the lack of robust constraints. To address these issues, we propose Intelligent General-sum Constrained Adversarial Reinforcement Learning (IGCARL), a novel robust autonomous driving approach that consists of a strategic targeted adversary and a robust driving agent. The strategic targeted adversary is designed to leverage the temporal decision-making capabilities of DRL to execute strategically coordinated multi-step attacks. In addition, it explicitly focuses on inducing safety-critical events by adopting a general-sum objective. The robust driving agent learns by interacting with the adversary to develop a robust autonomous driving policy against adversarial attacks. To ensure stable learning in adversarial environments and to mitigate policy drift caused by attacks, the agent is optimized under a constrained formulation. Extensive experiments show that IGCARL improves the success rate by at least 27.9% over state-of-the-art methods, demonstrating superior robustness to adversarial attacks and enhancing the safety and reliability of DRL-based autonomous driving.

2510.16023 2026-06-08 cs.LG cond-mat.mtrl-sci 版本更新

A Conformation-Centric Generative Foundation Model for Linear Polymer Modeling and Design

面向线性聚合物建模与设计的构象中心生成式基础模型

Fanmeng Wang, Ruochao Wang, Shan Mei, Wentao Guo, Hongshuai Wang, Qi Ou, Zhifeng Gao, Hongteng Xu

Affiliations: Gaoling School of Artificial Intelligence; Renmin University of China; DP Technology; SINOPEC Research Institute of Petroleum Processing Co., Ltd.

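AI summary: Proposes PolyConFM, a foundation model for linear polymers built on conformation-centric generative pretraining (conditional generation, masked autoregressive modeling, and orientation transformations), which outperforms existing methods across diverse downstream tasks.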

详情
AI中文摘要

线性聚合物是由单体共价键合形成连续链的大分子,支撑着无数技术并是现代生活不可或缺的。虽然深度学习正在推进聚合物科学,但现有方法通常仅通过单体级描述符表示整个线性聚合物,忽视了聚合物构象中固有的全局结构信息,最终限制了其实际性能。此外,这一重要领域仍缺乏能够有效支持多种下游任务的专用基础模型,从而严重制约了进展。为应对这些挑战,我们引入了PolyConFM,一个通过构象中心生成预训练专门用于建模和设计线性聚合物的基础模型。认识到每个线性聚合物本质上是一个连续链,其构象可以自然地分解为一系列局部构象(即其重复单元的构象),我们在条件生成范式下预训练PolyConFM,通过掩码自回归(MAR)建模重建这些局部构象,并进一步生成它们的取向变换以恢复相应的聚合物构象。同时,我们通过分子动力学模拟构建了一个线性聚合物构象数据集以缓解数据稀疏性,从而实现了以构象为中心的预训练。实验表明,PolyConFM在多种下游任务中始终优于代表性的任务特定方法,从而为聚合物科学提供了针对线性聚合物的强大工具。

英文摘要

Linear polymers, macromolecules formed from monomers covalently bonded into continuous chains, underpin countless technologies and are indispensable to modern life. While deep learning is advancing polymer science, existing methods typically represent the whole linear polymer solely through monomer-level descriptors, overlooking the global structural information inherent in polymer conformations, which ultimately limits their practical performance. Moreover, this important field still lacks a dedicated foundation model that can effectively support diverse downstream tasks, thereby severely constraining progress. To address these challenges, we introduce PolyConFM, a foundation model tailored for modeling and designing linear polymers through conformation-centric generative pretraining. Recognizing that each linear polymer is essentially a continuous chain whose conformation can be naturally decomposed into a sequence of local conformations (i.e., those of its repeating units), we pretrain PolyConFM under the conditional generation paradigm, reconstructing these local conformations via masked autoregressive (MAR) modeling and further generating their orientation transformations to recover the corresponding polymer conformation. Meanwhile, we construct a linear polymer conformation dataset via molecular dynamics simulations to mitigate data sparsity, thereby enabling conformation-centric pretraining. Experiments demonstrate that PolyConFM consistently outperforms representative task-specific methods across diverse downstream tasks, thereby equipping polymer science with a powerful tool targeting linear polymers.

2510.24561 2026-06-08 cs.LG cs.AI 版本更新

LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis

LoRA-DA:基于渐近分析的低秩自适应数据感知初始化

Qingyue Zhang, Chang Chu, Tianren Peng, Qi Li, Xiangyang Luo, Zhihao Jiang, Shao-Lun Huang

Affiliations: University of Science and Technology of China

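AI summary: Proposes LoRA-DA, which derives a low-rank adaptation initialization from asymptotic analysis, combining a Fisher-gradient formulation with Fisher information to minimize parameter discrepancy, and improves fine-tuning accuracy and convergence stability.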

Comments Published at ICML 2026

详情
AI中文摘要

LoRA已成为广泛采用的PEFT方法,其初始化方法日益受到关注。然而,现有方法存在显著局限性:许多方法未纳入目标域数据,而基于梯度的方法仅通过依赖一步梯度分解在浅层利用数据。本文建立了数据感知LoRA初始化的理论框架。从最小化微调模型与目标模型之间参数偏差的期望出发,我们推导出一个包含两项的优化问题:偏差项,与微调模型和目标模型之间的参数距离相关,并使用Fisher梯度公式近似以保持各向异性;方差项,通过Fisher信息考虑采样随机性引入的不确定性。求解该问题得到LoRA的最优初始化策略,基于此我们开发了高效算法LoRA-DA。跨多个基准的实验结果表明,LoRA-DA在最终准确率上持续优于现有初始化方法。附加研究显示其收敛更快、更稳定,跨秩鲁棒性强,且初始化开销小。源代码见https://github.com/zqy0126/LoRA-DA。

英文摘要

LoRA has become a widely adopted method for PEFT, and its initialization methods have attracted increasing attention. However, existing methods have notable limitations: many methods do not incorporate target-domain data, while gradient-based methods exploit data only at a shallow level by relying on one-step gradient decomposition. In this paper, we establish a theoretical framework for data-aware LoRA initialization. Starting from minimizing the expectation of the parameter discrepancy between the fine-tuned and target models, we derive an optimization problem with two components: a bias term, which is related to the parameter distance between the fine-tuned and target models, and is approximated using a Fisher-gradient formulation to preserve anisotropy; and a variance term, which accounts for the uncertainty introduced by sampling stochasticity through the Fisher information. Solving this problem yields an optimal initialization strategy for LoRA, based on which we develop an efficient algorithm, LoRA-DA. Empirical results across multiple benchmarks demonstrate that LoRA-DA consistently improves final accuracy over existing initialization methods. Additional studies show faster, more stable convergence, robustness across ranks, and only a small initialization overhead for LoRA-DA. The source code is available at https://github.com/zqy0126/LoRA-DA.

2510.26714 2026-06-08 cs.LG cs.AI 版本更新

On the importance of multiple training seeds for evaluating machine unlearning

关于多个训练种子在评估机器遗忘中的重要性

Jamie Lanyon, Axel Finke, Petros Andreou, Georgina Cosma

Affiliations: Department of Computer Science; School of Mathematics; School of Science; Statistics and Physics; Loughborough University; Newcastle University

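AI summary: Shows that evaluating machine unlearning algorithms with only a single training seed can give non-representative results, confirms that the issue is widespread through experiments on image classification, federated learning-to-rank, and large language models, and gives guidance on choosing the number of training and unlearning seeds.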

Comments mini paper, 5 figures

详情
AI中文摘要

机器遗忘旨在从训练好的模型中移除某些数据点的影响,而无需昂贵的重新训练。大多数实用的遗忘算法只是近似,其性能只能通过经验评估。常见做法是从同一个训练好的模型(即仅使用单个训练种子)开始,多次独立运行遗忘算法(即使用多个遗忘种子)。在图像分类实验中,这种做法可能给出不具代表性的结果,因为遗忘性能可能对训练种子的选择敏感。这对于确定性遗忘方法尤其相关,这些方法从同一个训练好的模型开始时总是产生相同的结果。在联邦学习排序和大语言模型上的进一步实验证实,这个问题不仅限于图像分类。我们还解释了为什么增加遗忘种子的数量通常无法弥补多个训练种子的缺失。最后,我们给出了如何选择训练和遗忘种子数量的指导。

英文摘要

Machine unlearning aims to remove the influence of certain data points from a trained model without costly retraining. Most practical unlearning algorithms are only approximate and their performance can only be assessed empirically. Common practice is to run unlearning algorithms multiple times independently (i.e., using multiple unlearning seeds) starting from the same trained model (i.e., using only a single training seed). In image-classification experiments, this practice can give non-representative results as unlearning performance can be sensitive to the choice of training seed. This is particularly relevant for deterministic unlearning methods, which always produce the same result when started from the same trained model. Further experiments on federated learning-to-rank and large language models confirm that this issue extends beyond image classification. We also explain why increasing the number of unlearning seeds cannot generally compensate for the lack of multiple training seeds. Finally, we give guidance on how to select the number of training and unlearning seeds.
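The point about training versus unlearning seeds can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions: `simulated_metric` is a hypothetical stand-in for "train, unlearn, evaluate", and the additive Gaussian effects and their sizes are invented for illustration rather than taken from the paper.

```python
import random
import statistics


def simulated_metric(train_seed: int, unlearn_seed: int,
                     train_sd: float = 0.05, unlearn_sd: float = 0.01) -> float:
    """Hypothetical stand-in for: train with train_seed, unlearn with unlearn_seed, evaluate.

    Assumption: the metric is a base value plus a training-seed effect plus
    unlearning-seed noise, both Gaussian.
    """
    train_effect = random.Random(train_seed).gauss(0.0, train_sd)
    unlearn_noise = random.Random(f"{train_seed}-{unlearn_seed}").gauss(0.0, unlearn_sd)
    return 0.80 + train_effect + unlearn_noise


def mean_over_unlearn_seeds(train_seed: int, n_unlearn: int) -> float:
    """Average the metric over n_unlearn unlearning seeds for one trained model."""
    return statistics.mean(simulated_metric(train_seed, u) for u in range(n_unlearn))


def spread_across_train_seeds(n_train: int, n_unlearn: int) -> float:
    """Standard deviation, across training seeds, of the per-model averages."""
    means = [mean_over_unlearn_seeds(t, n_unlearn) for t in range(n_train)]
    return statistics.stdev(means)


if __name__ == "__main__":
    # More unlearning seeds shrink only the unlearning noise; the spread
    # caused by the training seed remains.
    for n_unlearn in (1, 10, 100):
        print(n_unlearn, round(spread_across_train_seeds(20, n_unlearn), 4))
```

In this toy model, averaging over more unlearning seeds pins down the result for one trained model ever more precisely, but says nothing about how a different trained model would have behaved.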

2511.18945 2026-06-08 cs.LG cs.IT math.IT 版本更新

MIST: Mutual Information Estimation Via Supervised Training

MIST: 通过监督训练进行互信息估计

German Gritsai, Megan Richards, Maxime Méloux, Kyunghyun Cho, Maxime Peyrard

Affiliations: Université Grenoble Alpes; CNRS; Grenoble INP; LIG

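AI summary: Proposes MIST, a fully data-driven neural mutual information estimator trained on a large synthetic meta-dataset, which uses a two-dimensional attention scheme to handle variable-length samples and quantile regression to quantify uncertainty; experiments show that it outperforms classical methods and that inference is fast.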

详情
AI中文摘要

我们提出了一种完全数据驱动的互信息(MI)估计器设计方法。由于任何MI估计器都是来自两个随机变量的观测样本的函数,我们用一个神经网络(MIST)参数化这个函数,并端到端地训练它以预测MI值。训练是在一个包含625,000个已知真实MI的合成联合分布的大型元数据集上进行的。为了处理可变的样本大小和维度,我们采用了一种二维注意力机制,确保输入样本的置换不变性。为了量化不确定性,我们优化了分位数回归损失,使估计器能够近似MI的采样分布,而不是返回单个点估计。这一研究计划与先前的工作不同,它采取了一条完全经验性的路线,用普适的理论保证换取了灵活性和效率。实验表明,学习到的估计器在样本大小和维度上大大优于经典基线,包括在训练期间未见过的联合分布上。由此产生的基于分位数的区间校准良好,比基于自助法的置信区间更可靠,而推理速度比现有的神经基线快几个数量级。除了直接的实证收益外,这一框架产生了可训练、完全可微的估计器,可以嵌入到更大的学习流程中。此外,利用MI对可逆变换的不变性,元数据集可以通过归一化流适应任意数据模态,从而为多样化的目标元分布实现灵活的训练。

英文摘要

We propose a fully data-driven approach to designing mutual information (MI) estimators. Since any MI estimator is a function of the observed sample from two random variables, we parameterize this function with a neural network (MIST) and train it end-to-end to predict MI values. Training is performed on a large meta-dataset of 625,000 synthetic joint distributions with known ground-truth MI. To handle variable sample sizes and dimensions, we employ a two-dimensional attention scheme ensuring permutation invariance across input samples. To quantify uncertainty, we optimize a quantile regression loss, enabling the estimator to approximate the sampling distribution of MI rather than return a single point estimate. This research program departs from prior work by taking a fully empirical route, trading universal theoretical guarantees for flexibility and efficiency. Empirically, the learned estimators largely outperform classical baselines across sample sizes and dimensions, including on joint distributions unseen during training. The resulting quantile-based intervals are well-calibrated and more reliable than bootstrap-based confidence intervals, while inference is orders of magnitude faster than existing neural baselines. Beyond immediate empirical gains, this framework yields trainable, fully differentiable estimators that can be embedded into larger learning pipelines. Moreover, exploiting MI's invariance to invertible transformations, meta-datasets can be adapted to arbitrary data modalities via normalizing flows, enabling flexible training for diverse target meta-distributions.
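The quantile regression loss mentioned above is commonly the pinball loss. The sketch below shows that standard loss and checks, on a tiny sample, that the constant minimizing it is an empirical quantile. It is an illustration of the general technique, not the MIST training code; the helper names are hypothetical.

```python
def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Pinball (quantile) loss for a single prediction at quantile level q in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * diff, (q - 1.0) * diff)


def mean_pinball(samples: list[float], constant: float, q: float) -> float:
    """Average pinball loss of predicting the same constant for every sample."""
    return sum(pinball_loss(y, constant, q) for y in samples) / len(samples)


def best_constant(samples: list[float], q: float) -> float:
    """Sample value that minimizes the mean pinball loss (an empirical q-quantile)."""
    return min(samples, key=lambda c: mean_pinball(samples, c, q))


if __name__ == "__main__":
    data = [float(v) for v in range(1, 10)]  # 1.0, 2.0, ..., 9.0
    for level in (0.1, 0.5, 0.8):
        print(level, best_constant(data, level))
```

Training one output per quantile level with this loss is what lets an estimator report an interval rather than a single point estimate.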

2511.19359 2026-06-08 cs.LG 版本更新

Enhancing Conformal Prediction via Class Similarity

通过类别相似性增强保形预测

Ariel Fargion, Lahav Dabah, Tom Tirer

Affiliations: Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel

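AI summary: Proposes improving conformal prediction with class similarity, either by penalizing out-of-group errors or by using embedding information, which reduces prediction set size and improves semantic consistency.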

Comments ICML 2026 (camera-ready). Code is available at: https://github.com/ariel361/CP_via_CS

详情
AI中文摘要

保形预测(CP)已成为高风险分类应用中一个强大的统计框架。CP 不是预测单个类别,而是生成一个预测集,保证以预先指定的概率包含真实标签。不同 CP 方法的性能通常通过其平均预测集大小来评估。在类别可以划分为语义组(例如需要类似治疗的疾病)的设置中,用户可以从不仅平均较小而且包含少量语义不同组的预测集中受益。本文首先解决这个问题,并最终提供一种广泛适用的工具,用于在任何数据集上提升任何 CP 方法。首先,给定一个类别划分,我们建议在 CP 评分函数中增加一个惩罚项,用于惩罚包含组外错误的预测。我们从理论上分析了这一策略,并证明了其在组相关指标上的优势。令人惊讶的是,我们从数学上表明,对于常见的类别划分,它还可以减少任何 CP 评分函数的平均集大小。我们的分析揭示了这种改进背后的类别相似性因素,并激发了一种变体,该变体可以通过利用模型的嵌入进一步减少预测集大小,而无需任何人工语义划分。最后,我们提出了一项广泛的实证研究,涵盖了著名的 CP 方法、多个模型和几个数据集,表明我们基于类别相似性的方法一致地增强了 CP 方法。

英文摘要

Conformal Prediction (CP) has emerged as a powerful statistical framework for high-stakes classification applications. Instead of predicting a single class, CP generates a prediction set, guaranteed to include the true label with a pre-specified probability. The performance of different CP methods is typically assessed by their average prediction set size. In setups where the classes can be partitioned into semantic groups, e.g., diseases that require similar treatment, users can benefit from prediction sets that are not only small on average, but also contain a small number of semantically different groups. This paper begins by addressing this problem and ultimately offers a widely applicable tool for boosting any CP method on any dataset. First, given a class partition, we propose augmenting the CP score function with a term that penalizes predictions with out-of-group errors. We theoretically analyze this strategy and prove its advantages for group-related metrics. Surprisingly, we show mathematically that, for common class partitions, it can also reduce the average set size of any CP score function. Our analysis reveals the class-similarity factors behind this improvement and motivates a variant that can further reduce prediction set size by leveraging the model's embeddings, without requiring any human semantic partition. Finally, we present an extensive empirical study, encompassing prominent CP methods, multiple models, and several datasets, which demonstrates that our class-similarity-based approach consistently enhances CP methods.
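The idea of augmenting a CP score with an out-of-group penalty can be sketched with split conformal prediction. The base score `1 - p_y` and the additive penalty applied when a label's group differs from the group of the top-ranked class are assumptions chosen here for illustration; the paper's exact score may differ. The calibration step is the standard split-conformal quantile.

```python
import math


def score(probs: list[float], label: int, groups: list[int], penalty: float) -> float:
    """Nonconformity score: 1 - p_label, plus a penalty for out-of-group labels.

    Assumption: 'out of group' means the label's group differs from the group
    of the model's top-ranked class.
    """
    top = max(range(len(probs)), key=probs.__getitem__)
    out_of_group = groups[label] != groups[top]
    return 1.0 - probs[label] + (penalty if out_of_group else 0.0)


def calibrate(cal_probs, cal_labels, groups, penalty: float, alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = sorted(score(p, y, groups, penalty) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if k > n else scores[k - 1]


def prediction_set(probs, groups, penalty: float, qhat: float) -> list[int]:
    """All labels whose score does not exceed the calibrated threshold."""
    return [y for y in range(len(probs)) if score(probs, y, groups, penalty) <= qhat]


if __name__ == "__main__":
    groups = [0, 0, 1]  # classes 0 and 1 share a semantic group
    cal_probs = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.6, 0.1, 0.3]]
    cal_labels = [0, 1, 2]
    qhat = calibrate(cal_probs, cal_labels, groups, penalty=0.5, alpha=0.5)
    print(prediction_set([0.5, 0.45, 0.05], groups, penalty=0.5, qhat=qhat))
```

Because the marginal coverage guarantee of split conformal prediction holds for any score function under exchangeability, adding the penalty changes which labels enter the set without invalidating coverage.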

2511.22581 2026-06-08 cs.LG cs.MA 版本更新

High entropy leads to symmetry-equivariant policies in Dec-POMDPs

高熵导致 Dec-POMDP 中的对称等变策略

Johannes Forkel, Constantin Ruhdorfer, Michael Beukman, Andreas Bulling, Jakob Foerster

Affiliations: FLAIR, Department of Engineering Science, University of Oxford; Collaborative Artificial Intelligence, University of Stuttgart

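AI summary: Proves that in Dec-POMDPs sufficiently high entropy regularization ensures that policy gradient converges to a symmetry-equivariant joint policy, and finds experimentally that high entropy coefficients improve inter-seed cross-play returns.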

详情
AI中文摘要

我们证明,在任何 Dec-POMDP 中,足够高的熵正则化可确保使用表格 softmax 参数化的策略梯度流对于任何初始化都收敛到相同的联合策略,并且该联合策略关于 Dec-POMDP 的所有对称性是等变的。特别地,来自不同初始化的策略将完全兼容,即它们的交叉对战回报等于自对战回报。通过在 Hanabi、Overcooked 和 Yokai 环境中对独立 PPO(可以说是标准基线深度多智能体策略梯度算法)进行广泛评估,我们发现熵系数对独立训练策略之间的交叉对战回报有巨大影响,并且增加熵正则化导致的自对战回报下降通常可以通过在训练后对学习策略进行贪婪化来抵消。特别是在 Hanabi 中,我们通过这种方式实现了跨种子交叉对战的新 SOTA。虽然我们给出了 Dec-POMDP 的示例,其中无法以这种方式学习最优对称等变策略,但我们的理论和实证结果都表明,在 Dec-POMDP 的超参数扫描中,应该考虑比通常高得多的熵系数。我们实验的代码可以在 https://github.com/jforkel/JAX-OBL 找到。

英文摘要

We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy is equivariant w.r.t. all symmetries of the Dec-POMDP. In particular, policies coming from different initializations will be fully compatible, in that their cross-play returns are equal to their self-play returns. Through extensive evaluation of independent PPO, arguably the standard baseline deep multi-agent policy gradient algorithm, in the Hanabi, Overcooked and Yokai environments, we find that the entropy coefficient has a massive influence on the cross-play returns between independently trained policies, and that the decrease in self-play returns coming from increased entropy regularization can often be counteracted by greedifying the learned policies after training. In Hanabi in particular we achieve a new SOTA in inter-seed cross-play this way. While we give examples of Dec-POMDPs in which one cannot learn the optimal symmetry-equivariant policy this way, both our theoretical and empirical results suggest that one should consider far higher entropy coefficients during hyperparameter sweeps in Dec-POMDPs than is typically done. Code for our experiments can be found at https://github.com/jforkel/JAX-OBL

2512.01362 2026-06-08 cs.LG 版本更新

Directed evolution algorithm drives neural prediction

定向进化算法驱动神经预测

Yanlin Wang, Nancy M Young, Patrick C M Wong

Affiliations: Brain and Mind Institute, The Chinese University of Hong Kong; Department of Linguistics and Modern Languages, The Chinese University of Hong Kong; Division of Otolaryngology, Ann & Robert H. Lurie Children's Hospital of Chicago; Department of Otolaryngology Head & Neck Surgery, Feinberg School of Medicine, Northwestern University; Knowles Hearing Center, Department of Communication Sciences and Disorders, Northwestern University

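AI summary: Proposes the directed evolution model (DEM), which mimics the trial-and-error process of biological directed evolution and incorporates a replay buffer and continual backpropagation, improving generalization in cross-domain neural prediction and addressing label scarcity.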

Comments 43 pages, 5 figures

详情
AI中文摘要

神经预测为预测神经认知功能和障碍的个体差异以及为个性化干预提供预后指标提供了一种有前景的方法。然而,由于领域偏移和标签稀缺的限制,将神经预测模型转化为医学人工智能应用具有挑战性。在此,我们提出定向进化模型(DEM),一种新颖的计算模型,模拟生物定向进化的试错过程,以逼近预测建模任务的最优解。我们证明了定向进化算法是一种有效的不确定性探索策略,能够增强强化学习中的泛化能力。此外,通过将回放缓冲和连续反向传播方法整合到DEM中,我们提供了在连续学习环境中实现利用与探索之间更好权衡的证据。我们在四个不同数据集上进行了实验,这些数据集涉及接受人工耳蜗植入的儿童,其口语发展结果在个体儿童水平上差异很大。术前神经MRI数据已被证明可以准确预测这些儿童术后结果,但在数据集之间不适用。我们的结果表明,DEM能够有效提高跨域植入前神经预测的性能,同时解决目标域中标签稀缺的挑战。

英文摘要

Neural prediction offers a promising approach to forecasting the individual variability of neurocognitive functions and disorders and providing prognostic indicators for personalized intervention. However, it is challenging to translate neural predictive models into medical artificial intelligence applications due to the limitations of domain shift and label scarcity. Here, we propose the directed evolution model (DEM), a novel computational model that mimics the trial-and-error processes of biological directed evolution to approximate optimal solutions for predictive modeling tasks. We demonstrate that the directed evolution algorithm is an effective strategy for uncertainty exploration, enhancing generalization in reinforcement learning. Furthermore, by incorporating replay buffer and continual backpropagation methods into DEM, we provide evidence of achieving a better trade-off between exploitation and exploration in continual learning settings. We conducted experiments on four different datasets for children with cochlear implants, whose spoken language developmental outcomes vary considerably at the individual-child level. Preoperative neural MRI data has been shown to accurately predict the post-operative outcomes of these children within but not across datasets. Our results show that DEM can efficiently improve the performance of cross-domain pre-implantation neural predictions while addressing the challenge of label scarcity in the target domain.

2512.05291 2026-06-08 cs.LG 版本更新

SHAP-Guided Kernel Actor-Critic for Explainable Reinforcement Learning

基于SHAP引导的核化Actor-Critic可解释强化学习

Na Li, Hangguan Shan, Wei Ni, Wenjie Zhang, Xinyu Li

Affiliations: National University of Singapore; University of Science and Technology of China

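AI summary: Proposes the RSA2C algorithm, which computes state attributions with RKHS-SHAP and converts them into Mahalanobis-gated weights that modulate Actor gradients and Advantage Critic targets, yielding efficient, stable, and interpretable reinforcement learning.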

详情
Journal ref
ICML2026
AI中文摘要

Actor-Critic (AC) 方法是强化学习 (RL) 的基石,但可解释性有限。当前的可解释RL方法很少使用状态属性来辅助训练,而是平等对待所有状态特征,从而忽略了单个状态维度对奖励的异质性影响。我们提出基于RKHS-SHAP的高级Actor-Critic (RSA2C),一种属性感知的、核化的、双时间尺度AC算法,包括Actor、Value Critic和Advantage Critic。Actor实例化在向量值再生核希尔伯特空间 (RKHS) 中,使用马氏加权算子值核,而Value Critic和Advantage Critic位于标量RKHS中。这些RKHS增强组件使用稀疏化字典:Value Critic维护自己的字典,而Actor和Advantage Critic共享一个字典。通过RKHS-SHAP(用于流形上期望的核均值嵌入和流形外期望的条件均值嵌入)从Value Critic计算的状态属性被转换为马氏门控权重,用于调节Actor梯度和Advantage Critic目标。我们推导了在状态扰动下的全局非渐近收敛界,通过扰动误差项显示稳定性,通过收敛误差项显示效率。在三个连续控制环境上的实验结果表明,RSA2C实现了效率、稳定性和可解释性。我们的代码可在 https://github.com/Na-Li66/RSA2C 获取。

英文摘要

Actor-critic (AC) methods are a cornerstone of reinforcement learning (RL) but offer limited interpretability. Current explainable RL methods seldom use state attributions to assist training. Rather, they treat all state features equally, thereby neglecting the heterogeneous impacts of individual state dimensions on the reward. We propose RKHS-SHAP-based Advanced Actor-Critic (RSA2C), an attribution-aware, kernelized, two-timescale AC algorithm, including Actor, Value Critic, and Advantage Critic. The Actor is instantiated in a vector-valued reproducing kernel Hilbert space (RKHS) with a Mahalanobis-weighted operator-valued kernel, while the Value Critic and Advantage Critic reside in scalar RKHSs. These RKHS-enhanced components use sparsified dictionaries: the Value Critic maintains its own dictionary, while the Actor and Advantage Critic share one. State attributions, computed from the Value Critic via RKHS-SHAP (kernel mean embedding for on-manifold and conditional mean embedding for off-manifold expectations), are converted into Mahalanobis-gated weights that modulate Actor gradients and Advantage Critic targets. We derive a global, non-asymptotic convergence bound under state perturbations, showing stability through the perturbation-error term and efficiency through the convergence-error term. Empirical results on three continuous-control environments show that RSA2C achieves efficiency, stability, and interpretability. Our code is available at https://github.com/Na-Li66/RSA2C.

2512.09084 2026-06-08 cs.LG 版本更新

GS-KAN: Parameter-Efficient Kolmogorov-Arnold Networks via Sprecher-Type Shared Basis Functions

GS-KAN: 通过Sprecher型共享基函数的参数高效Kolmogorov-Arnold网络

Oscar Eliasson

发表机构 * Chalmers University of Technology(挑战大学)

AI总结 提出GS-KAN,通过每层共享单一父函数的线性变换构造边函数,在保持参数高效的同时,在函数逼近、表格回归和图像分类任务上优于或媲美现有KAN和MLP。

Comments 6 pages, 2 figures

详情
AI中文摘要

Kolmogorov-Arnold表示定理通过在边上而非节点上放置可学习单变量函数,为多层感知器(MLP)提供了理论替代方案。尽管最近的实现如Kolmogorov-Arnold网络(KAN)展示了高逼近能力,但由于需要为每个网络边维护唯一参数化,它们存在显著的参数低效问题。在这项工作中,我们提出GS-KAN(广义Sprecher-KAN),一种受David Sprecher对叠加定理的改进启发的轻量级架构。GS-KAN通过对每层单个可学习的共享父函数应用可学习线性变换来构造唯一的边函数。我们在合成函数逼近、表格数据回归和图像分类任务上评估了GS-KAN与现有KAN架构和MLP的性能。结果表明,GS-KAN在连续函数逼近任务上优于MLP和标准KAN基线,同时保持优越的参数效率。此外,GS-KAN在表格回归上与现有KAN架构性能相当,在高维分类任务上优于MLP。关键的是,所提出的架构使得在严格参数约束下的高维场景中部署基于KAN的架构成为可能,而标准实现由于参数爆炸通常不可行。源代码可在https://github.com/rambamn48/gs-impl获取。

英文摘要

The Kolmogorov-Arnold representation theorem offers a theoretical alternative to Multi-Layer Perceptrons (MLPs) by placing learnable univariate functions on edges rather than nodes. While recent implementations such as Kolmogorov-Arnold Networks (KANs) demonstrate high approximation capabilities, they suffer from significant parameter inefficiency due to the requirement of maintaining unique parameterizations for every network edge. In this work, we propose GS-KAN (Generalized Sprecher-KAN), a lightweight architecture inspired by David Sprecher's refinement of the superposition theorem. GS-KAN constructs unique edge functions by applying learnable linear transformations to a single learnable, shared parent function per layer. We evaluate GS-KAN against existing KAN architectures and MLPs across synthetic function approximation, tabular data regression and image classification tasks. Our results demonstrate that GS-KAN outperforms both MLPs and standard KAN baselines on continuous function approximation tasks while maintaining superior parameter efficiency. Additionally, GS-KAN achieves competitive performance with existing KAN architectures on tabular regression and outperforms MLPs on high-dimensional classification tasks. Crucially, the proposed architecture enables the deployment of KAN-based architectures in high-dimensional regimes under strict parameter constraints, a setting where standard implementations are typically infeasible due to parameter explosion. The source code is available at https://github.com/rambamn48/gs-impl.
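A rough sketch of what "unique edge functions from one shared parent function" could look like in a forward pass. Everything specific here is an assumption for illustration: the edge form `weight * parent(scale * x + shift)`, the parameter names, and the fixed `tanh` parent standing in for what the abstract describes as a learnable function. It is not the GS-KAN implementation.

```python
import math
from typing import Callable, Sequence


def gs_layer(x: Sequence[float],
             scale: Sequence[Sequence[float]],
             shift: Sequence[Sequence[float]],
             weight: Sequence[Sequence[float]],
             parent: Callable[[float], float] = math.tanh) -> list[float]:
    """One layer with a single shared univariate parent function.

    Output j is the sum over inputs i of
        weight[j][i] * parent(scale[j][i] * x[i] + shift[j][i]),
    so each edge (i, j) owns only three scalars (hypothetical parameterization)
    instead of its own full set of basis coefficients.
    """
    return [
        sum(w * parent(a * xi + b) for xi, a, b, w in zip(x, a_row, b_row, w_row))
        for a_row, b_row, w_row in zip(scale, shift, weight)
    ]


if __name__ == "__main__":
    # Two inputs, one output; identity parent makes the arithmetic easy to follow.
    out = gs_layer([1.0, 2.0],
                   scale=[[2.0, 0.5]], shift=[[0.0, 1.0]], weight=[[1.0, 1.0]],
                   parent=lambda t: t)
    print(out)
```

Under this assumed parameterization the per-edge cost is constant, while the cost of representing the shared parent function is paid once per layer, which is the kind of saving the abstract attributes to the design.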

2512.14391 2026-06-08 cs.LG cs.AI cs.CL 版本更新

RePo: Language Models with Context Re-Positioning

RePo:具有上下文重定位的语言模型

Huayang Li, Tianyu Zhao, Deng Cai, Richard Sproat

Affiliations: University of Maryland

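AI summary: Proposes the RePo mechanism, which uses a differentiable module to reassign token positions and thereby reduce the burden on attention layers, consistently improving performance on tasks with noisy contexts, structured data, and long contexts.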

Comments Accepted to ICML 2026

详情
AI中文摘要

上下文学习是现代大型语言模型(LLM)的基础;然而,主流架构通过分配线性或常数的位置索引来施加刚性且固定的上下文结构。刚性的位置信息将组织输入结构的全部负担强加给注意力层,从而减少了可用于更关键信息的注意力量。为了解决这个问题,我们提出了RePo,一种通过上下文重定位来减轻注意力层负担的新机制。与传统方法不同,RePo利用可微分模块$f_ϕ$来分配捕获上下文依赖关系的token位置,而不是依赖预定义的顺序。通过在OLMo-2 1B和7B模型上持续预训练,我们证明RePo在涉及噪声上下文、结构化数据和更长上下文长度的任务上持续提升性能,同时在一般短上下文任务上保持有竞争力的性能。分析表明,RePo成功地将更多注意力分配给遥远但相关的信息,在密集且非线性的空间中分配位置,并捕获输入上下文的内在结构。我们的代码位于https://github.com/SakanaAI/repo。

英文摘要

In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid and fixed contextual structure by assigning linear or constant positional indices. This rigid position information places the full burden of organizing the input structure on the attention layers, thus reducing the amount of attention that could be allocated to more critical information. To address this, we propose RePo, a novel mechanism that alleviates the burden on attention layers via context re-positioning. Unlike conventional approaches, RePo utilizes a differentiable module, $f_\phi$, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined order. By continually pre-training the OLMo-2 1B and 7B models, we demonstrate that RePo consistently enhances performance on tasks involving noisy contexts, structured data, and longer context lengths, while maintaining competitive performance on general short-context tasks. Analysis reveals that RePo successfully allocates more attention mass to distant but relevant information, assigns positions in a dense and non-linear space, and captures the intrinsic structure of the input context. Our code is at https://github.com/SakanaAI/repo.

2512.17058 2026-06-08 cs.LG 版本更新

Universal consistency of the $k$-NN rule in metric spaces and Nagata dimension. III

度量空间和Nagata维数中$k$-NN规则的普适一致性. III

Vladimir G. Pestov

Affiliations: Department of Mathematics and Statistics, University of Ottawa; Departamento de Matemática, Universidade Federal de Santa Catarina

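AI summary: Proves that, in a complete separable metric space, the $k$-nearest neighbour classifier is universally consistent if and only if the space has the strong Lebesgue-Besicovitch differentiation property, or equivalently is $\sigma$-finite dimensional in the sense of Nagata, supplying the last missing link.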

Comments 22 pages, latex with ESAIM P&S macros, a second revision requested by the referee, with more accurate and detailed proofs, in particular, the referee pointed out the correct value of the Nagata dimension of R^2 which is 4

详情
AI中文摘要

我们建立了最后缺失的环节,使得能够用维数理论的组合术语和实分析的基本性质来描述那些完备可分度量空间$X$,其中$k$最近邻分类器是普适一致的。以下条件等价:(1) $k$-最近邻分类器在$X$中普适一致,(2) 强Lebesgue--Besicovitch微分性质在$X$中对每个局部有限Borel测度成立,(3) $X$在Jun-Iti Nagata意义下是$\sigma$-有限维的。等价关系(2)$\iff$(3)由Preiss (1983)宣布,而(3)$\Rightarrow$(2)的详细证明仅出现在Assouad和Quentin de Gromard (2006)中。(2)$\Rightarrow$(1)由Cérou和Guyader (2006)建立。我们证明了(1)$\Rightarrow$(3)。我们进一步表明,弱(而非强)Lebesgue--Besicovitch性质对于$k$-NN规则的一致性是不充分的,例如Heisenberg群就是一个反例(这里我们纠正了之前文章(Kumari and Pestov 2024)中的一个错误说法)。有点反直觉的是,存在一个与通常距离一致等价的实数直线上的度量,在该度量下$k$-NN分类器失效。最后,另一个可以添加到上述条件的等价条件是Cover--Hart性质:(4) $1$-最近邻分类器的误差渐近地至多是Bayes误差的两倍。

英文摘要

We establish the last missing link allowing to describe those complete separable metric spaces $X$ in which the $k$ nearest neighbour classifier is universally consistent, both in combinatorial terms of dimension theory and via a fundamental property of real analysis. The following are equivalent: (1) The $k$-nearest neighbour classifier is universally consistent in $X$, (2) The strong Lebesgue--Besicovitch differentiation property holds in $X$ for every locally finite Borel measure, (3) $X$ is sigma-finite dimensional in the sense of Jun-Iti Nagata. The equivalence (2)$\iff$(3) was announced by Preiss (1983), while a detailed proof of the implication (3)$\Rightarrow$(2) has only appeared in Assouad and Quentin de Gromard (2006). The implication (2)$\Rightarrow$(1) was established by Cérou and Guyader (2006). We prove the implication (1)$\Rightarrow$(3). We further show that the weak (instead of strong) Lebesgue--Besicovitch property is insufficient for the consistency of the $k$-NN rule, as witnessed, for example, by the Heisenberg group (here we correct a wrong claim made in the previous article (Kumari and Pestov 2024)). A bit counter-intuitively, there is a metric on the real line uniformly equivalent to the usual distance but under which the $k$-NN classifier fails. Finally, another equivalent condition that can be added to the above is the Cover--Hart property: (4) the error of the $1$-nearest neighbour classifier is asymptotically at most twice as bad as the Bayes error.
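For readers less familiar with condition (4), the classical Cover-Hart inequality it generalizes can be stated as follows. The notation ($L^*$, $L_{1\text{-NN}}$, $\eta$) is introduced here for illustration; the statement shown is the well-known binary-classification result in Euclidean space, not the paper's general metric-space formulation.

```latex
% eta(x) = P(Y = 1 | X = x), L^* = Bayes error,
% L_{1-NN} = asymptotic error of the 1-nearest neighbour classifier.
\begin{align}
L^* &= \mathbb{E}\bigl[\min\{\eta(X),\, 1 - \eta(X)\}\bigr], \\
L_{1\text{-NN}} &= \mathbb{E}\bigl[2\,\eta(X)\,(1 - \eta(X))\bigr], \\
L^* \;\le\; L_{1\text{-NN}} &\;\le\; 2 L^* (1 - L^*) \;\le\; 2 L^* .
\end{align}
```

The last inequality is the "at most twice as bad as the Bayes error" form quoted in the abstract.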

2512.20963 2026-06-08 cs.LG cs.CV 版本更新

Generalization of Diffusion Models Arises with a Balanced Representation Space

扩散模型的泛化源于平衡表示空间

Zekai Zhang, Xiao Li, Xiang Li, Lianghe Shi, Meng Wu, Molei Tao, Qing Qu

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 通过分析两层ReLU去噪自编码器,证明记忆化导致局部尖峰表示,而泛化产生平衡表示,并在真实扩散模型中验证,提出基于表示的检测和编辑方法。

Comments Accepted at ICLR 2026. 40 pages, 19 figures. The first two authors contributed equally

详情
AI中文摘要

扩散模型擅长生成高质量、多样化的样本,但当过度拟合训练目标时,它们有记忆训练数据的风险。我们通过表示学习的视角分析了扩散模型中记忆化和泛化之间的区别。通过研究两层ReLU去噪自编码器(DAE),我们证明了(i)记忆化对应于模型在学习的权重中存储原始训练样本以进行编码和解码,产生局部尖峰表示,而(ii)泛化发生在模型捕获局部数据统计时,产生平衡表示。此外,我们在真实的无条件和文本到图像扩散模型上验证了这些理论发现,表明相同的表示结构出现在深度生成模型中,并具有重要的实际意义。基于这些见解,我们提出了一种基于表示的检测记忆化的方法,以及一种无需训练的编辑技术,通过表示引导实现精确控制。总之,我们的结果强调了学习好的表示对于新颖且有意义的生成建模至关重要。

英文摘要

Diffusion models excel at generating high-quality, diverse samples, yet they risk memorizing training data when overfit to the training objective. We analyze the distinctions between memorization and generalization in diffusion models through the lens of representation learning. By investigating a two-layer ReLU denoising autoencoder (DAE), we prove that (i) memorization corresponds to the model storing raw training samples in the learned weights for encoding and decoding, yielding localized spiky representations, whereas (ii) generalization arises when the model captures local data statistics, producing balanced representations. Furthermore, we validate these theoretical findings on real-world unconditional and text-to-image diffusion models, demonstrating that the same representation structures emerge in deep generative models with significant practical implications. Building on these insights, we propose a representation-based method for detecting memorization and a training-free editing technique that allows precise control via representation steering. Together, our results highlight that learning good representations is central to novel and meaningful generative modeling.

2601.16622 2026-06-08 cs.LG cs.AI 版本更新

E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory

E2Former-V2:具有线性激活内存的即时等变注意力

Lin Huang, Chengxiang Huang, Ziang Wang, Yiyue Du, Chu Wang, Haocheng Lu, Yunyang Li, Xiaoli Liu, Arthur Jiang, Jia Zhang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出E2Former-V2架构,通过等变轴对齐稀疏化(EAAS)和即时等变注意力机制,利用SO(3)到SO(2)基变换和自定义Triton内核,实现线性激活内存和20倍TFLOPS提升,在SPICE和OMol25数据集上加速推理并保持预测性能。

详情
AI中文摘要

等变图神经网络(EGNN)已成为建模3D原子系统的广泛使用的方法。然而,主流架构由于在每条边上显式构造几何特征或密集张量积而面临关键的可扩展性瓶颈。为克服这一问题,我们引入了**E2Former-V2**,一种将代数稀疏性与硬件感知执行相结合的可扩展架构。我们首先提出**等变轴对齐稀疏化(EAAS)**。EAAS基于Wigner-$6j$卷积,利用$\mathrm{SO}(3) \rightarrow \mathrm{SO}(2)$基变换,将计算昂贵的密集张量收缩转化为高效的稀疏奇偶重索引操作。基于这种表示,我们引入了**即时等变注意力**,一种通过自定义融合Triton内核实现的完全节点中心机制。通过消除物化的边张量并最大化SRAM利用率,我们的内核相比标准实现实现了**20倍的TFLOPS提升**。在SPICE和OMol25数据集上的大量实验表明,E2Former-V2在保持相当预测性能的同时显著加速推理。这项工作表明,大型等变Transformer可以使用广泛可用的GPU平台高效训练。代码可在https://github.com/IQuestLab/UBio-MolFM/tree/e2formerv2获取。

英文摘要

Equivariant Graph Neural Networks (EGNNs) have become a widely used approach for modeling 3D atomistic systems. However, mainstream architectures face critical scalability bottlenecks due to the explicit construction of geometric features or dense tensor products on *every* edge. To overcome this, we introduce **E2Former-V2**, a scalable architecture that integrates algebraic sparsity with hardware-aware execution. We first propose **Equivariant Axis-Aligned Sparsification (EAAS)**. EAAS builds on Wigner-$6j$ convolution by exploiting an $\mathrm{SO}(3) \rightarrow \mathrm{SO}(2)$ change of basis to transform computationally expensive dense tensor contractions into efficient, sparse parity re-indexing operations. Building on this representation, we introduce **On-the-Fly Equivariant Attention**, a fully node-centric mechanism implemented via a custom fused Triton kernel. By eliminating materialized edge tensors and maximizing SRAM utilization, our kernel achieves a **20$\times$ improvement in TFLOPS** compared to standard implementations. Extensive experiments on the SPICE and OMol25 datasets demonstrate that E2Former-V2 maintains comparable predictive performance while notably accelerating inference. This work demonstrates that large equivariant transformers can be trained efficiently using widely accessible GPU platforms. The code is available at https://github.com/IQuestLab/UBio-MolFM/tree/e2formerv2.

2601.23207 2026-06-08 cs.LG cs.AI 版本更新

Learning to Execute Graph Algorithms Exactly with Graph Neural Networks

学习用图神经网络精确执行图算法

Muhammad Fetrat Qharabagh, Artur Back de Luca, George Giapitzakis, Kimon Fountoulakis

发表机构 * University of Waterloo(滑铁卢大学)

AI总结 证明在有限度和有限精度约束下,图神经网络能通过训练多层感知机集成学习局部指令,从而在推理时无误差执行完整图算法,并展示了在分布式计算LOCAL模型及多种经典算法上的可学习性。

详情
AI中文摘要

理解图神经网络能学习什么,特别是它们学习执行算法的能力,仍然是一个核心的理论挑战。在这项工作中,我们证明了在有限度和有限精度约束下图算法的精确可学习性结果。我们的方法遵循两步过程。首先,我们训练一个多层感知机(MLP)集成来执行单个节点的局部指令。其次,在推理过程中,我们使用训练好的MLP集成作为图神经网络(GNN)中的更新函数。利用神经正切核(NTK)理论,我们表明局部指令可以从一个小训练集中学习,从而使得完整的图算法在推理过程中能够以高概率无误差地执行。为了说明我们设置的学习能力,我们为分布式计算的LOCAL模型建立了一个严格的可学习性结果。我们进一步展示了广泛研究的算法(如消息洪泛、广度优先搜索、深度优先搜索和贝尔曼-福特算法)的积极可学习性结果。

英文摘要

Understanding what graph neural networks can learn, especially their ability to learn to execute algorithms, remains a central theoretical challenge. In this work, we prove exact learnability results for graph algorithms under bounded-degree and finite-precision constraints. Our approach follows a two-step process. First, we train an ensemble of multi-layer perceptrons (MLPs) to execute the local instructions of a single node. Second, during inference, we use the trained MLP ensemble as the update function within a graph neural network (GNN). Leveraging Neural Tangent Kernel (NTK) theory, we show that local instructions can be learned from a small training set, enabling the complete graph algorithm to be executed during inference without error and with high probability. To illustrate the learning power of our setting, we establish a rigorous learnability result for the LOCAL model of distributed computation. We further demonstrate positive learnability results for widely studied algorithms such as message flooding, breadth-first and depth-first search, and Bellman-Ford.
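
The idea of running a graph algorithm by applying one local instruction at every node can be sketched with Bellman-Ford written as synchronous message passing. This is a hand-written illustration under stated assumptions, not the paper's method: in the paper the update function is a trained MLP ensemble, whereas `local_update` below is an exact hard-coded rule, and the function names and example graph are hypothetical.

```python
import math


def local_update(own_distance, neighbour_messages):
    """Local instruction of a single node.

    `neighbour_messages` is a list of (neighbour_distance, edge_weight) pairs.
    The node keeps the smallest distance it has seen so far.
    """
    candidates = [d + w for d, w in neighbour_messages]
    return min([own_distance] + candidates)


def run_bellman_ford(num_nodes, edges, source):
    """Apply `local_update` at every node for num_nodes - 1 synchronous rounds.

    `edges` is a list of (u, v, weight) triples, each meaning a directed edge u -> v.
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):
        # Gather the messages arriving at each node from the previous round.
        incoming = [[] for _ in range(num_nodes)]
        for u, v, w in edges:
            incoming[v].append((dist[u], w))
        dist = [local_update(dist[v], incoming[v]) for v in range(num_nodes)]
    return dist


edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
distances = run_bellman_ford(4, edges, source=0)
```

Because the whole algorithm is driven by the per-node rule, an exact learned replacement for `local_update` would suffice to execute it, which is the structure the abstract describes.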

2602.00541 2026-06-08 cs.LG 版本更新

One Loss to Rule Them All: Marked Time-to-Event for Structured EHR Foundation Models

一个损失统治一切:结构化EHR基础模型的标记时间到事件

Zilin Jing, Vincent Jeanselme, Yuta Kobayashi, Simon A. Lee, Chao Pang, Aparajita Kashyap, Yanwei Li, Xinzhuo Jiang, Shalmali Joshi

发表机构 * Department of Computer Science, Columbia University(哥伦比亚大学计算机科学系) Department of Biomedical Informatics, Columbia University(哥伦比亚大学生物医学信息学系) Department of Computational Medicine, UCLA(加州大学洛杉矶分校计算医学系) Formation Bio

AI总结 提出ORA预训练目标,联合建模事件时间和关联测量,相比下一词预测和忽略连续测量的损失,在多个数据集和下游任务上产生更通用的表示,提升回归和时间到事件预测能力。

详情
AI中文摘要

电子健康记录(EHR)中捕获的临床事件是不规则采样的,可能由离散事件和数值测量(如实验室值或治疗剂量)混合组成。EHR的序列性质类似于自然语言,这促使使用下一词预测来训练事件上的EHR基础模型(FM)。然而,这种训练未能捕获EHR的完整结构。必须捕获给定事件发生的时间,但事件值(异常实验室)也会调节其他临床事件的可能性。大多数现有的EHR FM不联合建模这种可能性,无法捕获完整的观察过程,影响下游能力。我们提出ORA,一种标记时间到事件预训练目标,联合建模事件时间和相关测量。在多个数据集、下游任务和模型骨干上,该目标始终比下一词预测和忽略连续测量的预训练损失产生更可泛化的表示。重要的是,所提出的目标在传统分类评估之外带来改进,包括更好的回归和时间到事件预测。除了引入新的FM家族,我们的消融研究提出了更广泛的结论:考虑EHR结构的预训练目标对于扩展下游能力和泛化性至关重要。

英文摘要

Clinical events captured in Electronic Health Records (EHR) are irregularly sampled and may consist of a mixture of discrete events and numerical measurements, such as laboratory values or treatment dosages. The sequential nature of EHR, analogous to natural language, has motivated the use of next-token prediction to train prior EHR Foundation Models (FMs) over events. However, this training fails to capture the full structure of EHR. The time at which a given event occurs must be captured, but the event value (e.g., an abnormal lab result) also modulates the likelihood of other clinical events. Most existing EHR FMs do not jointly model this likelihood and are unable to capture the full observation process, impacting downstream capabilities. We propose ORA, a marked time-to-event pretraining objective that jointly models event timing and associated measurements. Across multiple datasets, downstream tasks, and model backbones, this objective consistently yields more generalizable representations than next-token prediction and pretraining losses that ignore continuous measurements. Importantly, the proposed objective yields improvements beyond traditional classification evaluation, including better regression and time-to-event prediction. Beyond introducing a new family of FMs, our ablations suggest a broader takeaway: pretraining objectives that account for EHR structure are critical for expanding downstream capabilities and generalizability.
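
To make "jointly models event timing and associated measurements" concrete, here is a minimal sketch of a per-event negative log-likelihood with three terms: which event occurs, when it occurs, and what value it carries. The abstract does not specify ORA's parameterization, so everything below is an assumption for illustration: a categorical event type, an exponential waiting time, a Gaussian measurement, and the name `marked_tte_nll`.

```python
import math


def marked_tte_nll(type_probs, event_type, rate, wait_time,
                   value_mean=0.0, value_std=1.0, value=None):
    """Negative log-likelihood of one marked event (illustrative assumptions only).

    type_probs  : predicted probabilities over event types
    event_type  : index of the observed event type
    rate        : predicted rate of an exponential waiting-time distribution
    wait_time   : observed time since the previous event
    value       : observed numerical measurement, or None for events without one
    """
    # Which event: categorical cross-entropy (the next-token-prediction term).
    nll = -math.log(type_probs[event_type])
    # When: log-density of an exponential waiting time, log(rate) - rate * t.
    nll -= math.log(rate) - rate * wait_time
    # What value: Gaussian log-density of the measurement, if the event has one.
    if value is not None:
        z = (value - value_mean) / value_std
        nll += 0.5 * z * z + math.log(value_std) + 0.5 * math.log(2.0 * math.pi)
    return nll


# Hypothetical event: type 0 observed 2.0 time units after the previous one, with a lab value.
loss = marked_tte_nll([0.7, 0.3], 0, rate=0.5, wait_time=2.0,
                      value_mean=1.0, value_std=0.5, value=1.2)
```

Dropping the second and third terms recovers plain next-token prediction, which is the baseline the abstract compares against.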

2602.02600 2026-06-08 cs.LG cs.AI 版本更新

Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models

自回归与扩散语言模型中的逐步拒绝动态

Eliron Rahimi, Elad Hirshel, Rom Himelstein, Amit LeVi, Avi Mendelson, Chaim Baskin

发表机构 * Department of Computer Science, Technion – Israel Institute of Technology(以色列理工学院计算机科学系) INSIGHT Lab, School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Israel(以色列内盖夫本-古里安大学电气与计算机工程学院INSIGHT实验室) Computer Science Department, University of Haifa, Haifa, Israel(以色列海法大学计算机科学系)

AI总结 研究扩散语言模型(DLM)与自回归(AR)模型在拒绝有害生成行为上的差异,发现扩散重掩码机制可促进恢复,提出逐步拒绝内部动态(SRI)信号,并基于此构建无需修改推理的越狱检测器。

Comments Preprint

详情
AI中文摘要

扩散语言模型(DLM)最近已成为自回归(AR)模型的有竞争力的替代方案,提供并行解码、竞争性生成质量以及越狱鲁棒性改善的初步证据。尽管取得了这些进展,但采样机制在塑造拒绝行为中的作用仍知之甚少。为填补这一空白,我们提出了一项关于逐步拒绝动态的全面研究。我们表明,扩散重掩码可以促进从有害中间生成中恢复,提供证据表明这种行为与采样机制相关,并证明从AR采样切换到扩散采样可提高越狱鲁棒性,包括在固定模型权重下。为了捕捉在文本层面不可观察的生成动态,我们提出了逐步拒绝内部动态(SRI)信号。与我们的文本层面发现一致,SRI表明恢复主要在AR采样下失败,这些失败在SRI空间中通常相对于无害生成表现为异常。基于这一观察,我们表明SRI能够实现一个简单的越狱检测器,该检测器无需修改推理,并且仅通过在良性SRI信号上训练即可泛化到未见攻击。我们的评估表明,该检测器匹配或超越现有越狱检测基线,同时增加可忽略的开销。

英文摘要

Diffusion language models (DLMs) have recently emerged as a competitive alternative to autoregressive (AR) models, offering parallel decoding, competitive generation quality, and initial evidence of improved jailbreak robustness. Despite this progress, the role of sampling mechanisms in shaping refusal behavior remains poorly understood. To address this gap, we present a comprehensive study of step-wise refusal dynamics. We show that diffusion remasking can promote recovery from harmful intermediate generations, provide evidence that this behavior is tied to the sampling mechanism, and demonstrate that switching from AR to diffusion sampling improves jailbreak robustness, including under fixed model weights. To capture generation dynamics not observable at the text level, we propose the Step-Wise Refusal Internal Dynamics (SRI) signal. Consistent with our text-level findings, SRI shows that recovery fails primarily under AR sampling, with these failures often appearing anomalous relative to harmless generations in the SRI space. Based on this observation, we show that SRI enables a simple jailbreak detector that does not modify inference and generalizes to unseen attacks by training only on benign SRI signals. Our evaluation shows that this detector matches or outperforms existing jailbreak detection baselines while adding negligible overhead.

2602.05833 2026-06-08 cs.LG 版本更新

SecretFan: Synthesizing Realistic Data without Breaking Privacy

SecretFan: 在不破坏隐私的情况下合成真实数据

Laura Plein, Alexi Turcotte, Arina Hallemans, Andreas Zeller

发表机构 * CISPA Helmholtz Center for Information Security(CISPA亥姆霍兹信息安全中心) Saarland University(萨尔兰大学)

AI总结 提出将合成数据生成视为引导测试生成问题,结合生成对抗网络(GAN)的判别器和模糊测试生成器,在保护隐私的同时生成高可用性合成数据。

详情
AI中文摘要

需要合成训练和测试数据集,这些数据集能够复制原始数据集的统计分布,同时不损害其机密性。已有大量研究利用生成对抗网络(GAN)进行合成数据生成,但生成的模型要么不够准确,要么由于原始数据在训练过程中被利用,仍然容易受到成员推断攻击(MIA)或数据集重建攻击。在本文中,我们将合成数据生成视为引导测试生成或基于搜索的测试问题,而不是纯粹的生成建模任务。我们提出了一种基于搜索的、充分性引导的输入生成技术,灵感来自GAN,包括生成步骤和判别步骤;与GAN一样,判别使用在数据上训练的判别器模型,但生成部分我们不使用模型,而是使用模糊测试器。这样,原始(私有)数据仅在生成过程中间接利用,通过演化样本并用判别器确定“好样本”,我们可以生成遵循与原始数据集相同统计分布的隐私保护数据,从而获得与原始数据相似的效用。我们在八个用于评估最先进技术的数据集上评估了我们的方法,发现我们的技术生成的合成数据平均具有良好效用,同时具有较高的相似性得分,突显了结合经典生成和模型驱动判别的混合方法在生成隐私保护且有用的合成数据集方面的潜力。

英文摘要

There is a need for synthetic training and test datasets that replicate the statistical distributions of original datasets without compromising their confidentiality. Much research has leveraged Generative Adversarial Networks (GANs) for synthetic data generation; however, the resulting models are either not accurate enough or still vulnerable to membership inference attacks (MIA) or dataset reconstruction attacks, since the original data is used in the training process. In this paper, we frame synthetic data generation as a guided test generation, or search-based testing, problem rather than a purely generative modeling task. Ours is a search-based, adequacy-guided input generation technique inspired by GANs, with a generation step and a discrimination step; as in a GAN, discrimination uses a discriminator model trained on the data, but instead of also using a model for generation, we use a fuzzer. This way, the original (private) data is only indirectly leveraged in the generation process, and by evolving samples and determining "good samples" with the discriminator, we can generate privacy-preserving data that follows the same statistical distributions as the original dataset, leading to a similar utility as the original data. We evaluated our approach on eight datasets that have been used to evaluate state-of-the-art techniques, finding that synthetic data generated with our technique achieves good utility on average while also having good similarity scores, highlighting the potential of a mixed approach leveraging classical generation and model-driven discrimination for generating privacy-preserving, useful synthetic datasets.

2602.06941 2026-06-08 cs.LG cs.AI cs.CL 版本更新

Endogenous Resistance to Activation Steering in Language Models

语言模型中激活引导的内生抵抗

Alex McKenzie, Keenan Pepper, Stijn Servaes, Martin Leitgab, Murat Cubuktepe, Mike Vaiana, Diogo de Lucena, Judd Rosenblatt, Michael S. A. Graziano

发表机构 * University of Washington(华盛顿大学)

AI总结 研究发现大型语言模型在任务不匹配的激活引导下能内生抵抗,通过显式重启恢复正确生成,并识别出相关稀疏自编码器潜在变量,可增强或削弱该抵抗。

详情
AI中文摘要

大型语言模型可以在生成过程中从任务不匹配的激活引导中恢复,产生显式的语言重启(例如,“等等,那不对”),并在引导扰动仍然活跃的情况下继续讨论主题。我们将此称为内生引导抵抗(ESR)。使用稀疏自编码器(SAE)潜在变量来引导模型激活,我们发现Llama-3.3-70B在\llamaseventyEsrRate\\%的情况下表现出显式ESR,而来自Llama-3和Gemma-2系列的较小模型则较少出现显式形式。两个对照实验将ESR分解为检测事件和持续抵抗组件,后者不能仅由最近的on-topic token条件化来完全解释。我们通过对比on-topic/off-topic搜索识别出\numOtdLatents{}个SAE潜在变量;将其零消融使多次尝试率降低\multiAttemptReductionPct\\%,随机潜在变量和保留提示对照支持特异性。ESR还可以通过元提示和基于合成自我纠正示例的微调来有意增强。ESR对安全性具有双重影响:它可能使模型对对抗性激活空间操纵更具抵抗力,但同样可能干扰有益的基于引导的干预,因为模型无法区分两者。代码可在https://github.com/agencyenterprise/endogenous-steering-resistance获取。

英文摘要

Large language models can recover mid-generation from task-misaligned activation steering, producing explicit verbal restarts (e.g., "wait, that's not right") and continuing on-topic even while the steering perturbation remains active. We term this Endogenous Steering Resistance (ESR). Using sparse autoencoder (SAE) latents to steer model activations, we find that Llama-3.3-70B exhibits explicit ESR at \llamaseventyEsrRate\%, with smaller models from the Llama-3 and Gemma-2 families showing the explicit form less frequently. Two controls dissociate ESR into a detection event and a sustained-resistance component that conditioning on recent on-topic tokens does not fully explain. We identify \numOtdLatents{} SAE latents through contrastive on-topic/off-topic search; zero-ablating them reduces the multi-attempt rate by \multiAttemptReductionPct\%, with random-latent and held-out-prompt controls supporting specificity. ESR can also be deliberately enhanced through both meta-prompting and fine-tuning on synthetic self-correction examples. ESR has dual implications for safety: it could harden models against adversarial activation-space manipulation, but may equally interfere with beneficial steering-based interventions, since the model has no way to distinguish the two. Code is available at https://github.com/agencyenterprise/endogenous-steering-resistance.

2602.08857 2026-06-08 cs.LG cs.AI cs.CL 版本更新

Discovering Interpretable Algorithms by Decompiling Transformers to RASP

通过将Transformer反编译为RASP发现可解释算法

Xinting Huang, Aleksandra Bakalova, Satwik Bhattamishra, William Merrill, Michael Hahn

发表机构 * Saarland Informatics Campus, Saarland University(萨尔兰大学萨尔兰信息学园区) University of Oxford(牛津大学) Allen Institute for AI(艾伦人工智能研究所)

AI总结 提出一种将训练好的Transformer忠实重参数化为RASP程序,并通过因果干预发现小型充分子程序的方法,实验表明长度泛化的Transformer内部实现了简单可解释的RASP程序。

Comments 104 pages, 92 figures. Accepted for publication at ICML 2026

详情
AI中文摘要

近期研究表明,Transformer的计算可以在RASP编程语言家族中模拟。这些发现增进了对Transformer表达能力和泛化能力的理解。特别是,Transformer被建议在具有简单RASP程序的问题上精确实现长度泛化。然而,训练模型是否实际实现了简单的可解释程序仍是一个开放问题。在本文中,我们提出了一种从训练好的Transformer中提取此类程序的通用方法。其思想是将Transformer忠实地重参数化为RASP程序,然后应用因果干预来发现一个小的充分子程序。在算法和形式语言任务上训练的小型Transformer实验中,我们表明我们的方法通常能从长度泛化的Transformer中恢复简单且可解释的RASP程序。我们的结果提供了迄今为止最直接的证据,证明Transformer内部实现了简单的RASP程序。

英文摘要

Recent work has shown that the computations of Transformers can be simulated in the RASP family of programming languages. These findings have enabled improved understanding of the expressive capacity and generalization abilities of Transformers. In particular, Transformers have been suggested to length-generalize exactly on problems that have simple RASP programs. However, it remains open whether trained models actually implement simple interpretable programs. In this paper, we present a general method to extract such programs from trained Transformers. The idea is to faithfully re-parameterize a Transformer as a RASP program and then apply causal interventions to discover a small sufficient sub-program. In experiments on small Transformers trained on algorithmic and formal language tasks, we show that our method often recovers simple and interpretable RASP programs from length-generalizing Transformers. Our results provide the most direct evidence so far that Transformers internally implement simple RASP programs.

2602.14209 2026-06-08 cs.LG cs.CL 版本更新

MAGE: All-[MASK] Block Already Knows Where to Look in Block Diffusion LLM

MAGE:在块扩散LLM中,全[MASK]块已经知道在哪里看

Omin Kwon, Yeonjae Kim, Doyeon Kim, Minseo Kim, Yeonhong Park, Jae W. Lee

发表机构 * Seoul National University(首尔国立大学) Meta

AI总结 针对块扩散LLM长上下文推理中KV缓存导致的内存瓶颈,提出无训练方法MAGE,利用块扩散训练目标的对齐特性,在第一步确定整个轨迹的KV子集,实现近无损精度和显著加速。

详情
AI中文摘要

块扩散LLM是一种并行语言生成的新兴范式,但其KV缓存使得内存访问成为长上下文推理中的主要瓶颈。稀疏注意力(每个查询仅关注少量KV子集)可以在最小化精度损失的情况下减少延迟。然而,在块扩散中,每个块的B个token必须共享一个KV子集,我们证明这种每块约束会使现有稀疏KV估计器的召回率下降高达25%。为了解决这一挑战,我们利用了块扩散训练目标中出现的一个特性:它将去噪步骤中的块平均查询对齐,因此第一步的全[MASK]块已经揭示了整个轨迹中每块的KV子集。我们在MAGE([MASK]引导的稀疏注意力)中利用了这一特性,这是一种无训练方法,在第一步执行一次精确注意力,并在块内的所有剩余步骤中重用其top-k索引集。在LongBench上的三个块扩散家族中,MAGE在k=512时匹配精确注意力,精度几乎无损,在128K上下文中实现高达6.82倍的端到端加速,并且比分别为自回归LLM和全双向扩散LLM设计的Quest和SparseD快3.35倍和2.28倍。

英文摘要

Block diffusion LLMs are an emerging paradigm for parallel language generation, but their KV caching makes memory access the dominant bottleneck in long-context inference. Sparse attention, which attends only to a small KV subset per query, can reduce this latency with minimal accuracy loss. In block diffusion, however, the B tokens of each block must share a single KV subset, and we show this per-block constraint degrades existing sparse KV estimators by up to 25% in recall. We address this challenge by exploiting a property that emerges from the block-diffusion training objective: it aligns the block-average query across denoising steps, so the All-[MASK] block at the first step already reveals the per-block KV subset for the entire trajectory. We exploit this in MAGE ([MASK]-Guided Sparse Attention), a training-free method that runs one exact attention pass at the first step and reuses its top-k index sets for all remaining steps within the block. Across three block-diffusion families on LongBench, MAGE matches Exact Attention at k=512 with near-lossless accuracy, achieves up to 6.82x end-to-end speedup at 128K context, and runs up to 3.35x and 2.28x faster than Quest and SparseD, designed for AR LLMs and fully bidirectional diffusion LLMs, respectively.
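
The select-once, reuse-afterwards pattern described above can be sketched as follows. This is a simplified single-head illustration, not the authors' implementation: scoring cached keys with the block-average query is an assumption based on the abstract's wording, and `top_k_indices`, `sparse_attention`, and the toy vectors are hypothetical.

```python
import math


def top_k_indices(block_queries, keys, k):
    """Rank cached keys against the block-average query and return the top-k indices."""
    dim = len(keys[0])
    avg_query = [sum(q[d] for q in block_queries) / len(block_queries) for d in range(dim)]
    scores = [sum(a * b for a, b in zip(avg_query, key)) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, indices):
    """Softmax attention restricted to the selected key/value indices."""
    logits = [sum(a * b for a, b in zip(query, keys[i])) for i in indices]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, indices)) / total
            for d in range(dim)]


# Toy KV cache with four entries (hypothetical data).
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.9, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [2.0, 2.0]]

# First denoising step: queries of the all-[MASK] block decide the shared KV subset.
first_step_queries = [[1.0, 0.0], [1.0, 0.2]]
selected = top_k_indices(first_step_queries, keys, k=2)

# Later denoising steps reuse `selected` instead of re-ranking the cache.
later_step_queries = [[0.8, 0.1], [1.1, 0.3]]
outputs = [sparse_attention(q, keys, values, selected) for q in later_step_queries]
```

The saving comes from calling `top_k_indices` once per block rather than once per denoising step, so only the `k` selected cache entries are read after the first step.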

2602.16864 2026-06-08 cs.LG cs.AI math.DS 版本更新

Position: A Dynamical Systems Perspective is Needed to Advance Time Series Modeling

立场:需要动力系统视角以推进时间序列建模

Daniel Durstewitz, Christoph Jürgen Hemmer, Florian Hess, Charlotte Ricarda Doll, Lukas Eisenmann

发表机构 * University of Tübingen(图宾根大学)

AI总结 本文主张时间序列建模需引入动力系统视角,通过重构底层DS实现更优预测,并讨论其理论优势与具体建议。

详情
AI中文摘要

时间序列(TS)建模从早期的统计方法(主要是线性方法)发展到当前TS基础模型的趋势,已经走过了很长的路。由于该领域存在大量炒作和工业需求,实际进展并不总是清晰。为了将TS预测和分析提升到新水平,本文主张该领域需要动力系统(DS)视角。来自自然或工程系统的观测TS几乎总是源于某个底层DS,并且可以说,访问其控制方程将产生理论上的最优预测。这是DS重构(DSR)的承诺,这是一类旨在从数据中推断底层DS替代模型的ML/AI方法。但基于DS原理的模型还提供了其他深刻优势:除了短期预测,它们还能预测观测系统的长期统计量,这在许多实际场景中可能是更相关的量。此外,DS理论提供了领域无关的理论洞见,理解TS生成的机制,从而告知我们例如任何TS模型性能的上限、向未见过场景(如临界点)的泛化,或潜在的控制策略。在回顾DS理论和DSR中的一些核心概念、方法、度量和模型后,我们将讨论该领域的洞见如何以关键方式推进TS建模,实现更好的预测,同时大幅降低计算和内存占用。最后,我们提出若干具体建议,将DSR的洞见转化为TS建模实践。

英文摘要

Time series (TS) modeling has come a long way from early statistical, mainly linear, approaches to the current trend in TS foundation models. With a lot of hype and industrial demand in this field, it is not always clear how much progress there really is. To advance TS forecasting and analysis to the next level, here we argue that the field needs a dynamical systems (DS) perspective. TS of observations from natural or engineered systems almost always originate from some underlying DS, and arguably access to its governing equations would yield theoretically optimal forecasts. This is the promise of DS reconstruction (DSR), a class of ML/AI approaches that aim to infer surrogate models of the underlying DS from data. But models based on DS principles offer other profound advantages: beyond short-term forecasts, they make it possible to predict the long-term statistics of an observed system, which in many practical scenarios may be the more relevant quantities. DS theory furthermore provides domain-independent theoretical insight into the mechanisms underlying TS generation, and thereby will inform us, e.g., about upper bounds on the performance of any TS model, generalization into unseen regimes as in tipping points, or potential control strategies. After reviewing some of the central concepts, methods, measures, and models in DS theory and DSR, we will discuss how insights from this field can advance TS modeling in crucial ways, enabling better forecasting with much lower computational and memory footprints. We conclude with a number of specific suggestions for translating insights from DSR into TS modeling.

2602.18905 2026-06-08 cs.LG cs.AI cs.CL 版本更新

TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning

TRUE:一种用于大语言模型推理的可信统一解释框架

Yujiao Yang

发表机构 * Dalian University of Technology(大连理工大学)

AI总结 提出TRUE框架,通过可执行推理验证、可行域DAG建模和因果故障模式分析,为LLM推理提供实例级、局部结构级和类别级的多层次可验证解释。

详情
AI中文摘要

大型语言模型(LLM)在复杂推理任务中展现出强大能力,但其决策过程仍难以解释。现有解释方法通常缺乏可信的结构性洞察,且局限于单实例分析,无法揭示推理稳定性和系统性故障机制。为解决这些局限,我们提出可信统一解释框架(TRUE),该框架集成了可执行推理验证、可行域有向无环图(DAG)建模和因果故障模式分析。在实例层面,我们将推理轨迹重新定义为可执行过程规范,并引入盲执行验证来评估操作有效性。在局部结构层面,我们通过结构一致性扰动构建可行域DAG,从而显式刻画局部输入空间中推理稳定性和可执行区域。在类别层面,我们引入因果故障模式分析方法,识别重复出现的结构性故障模式,并使用Shapley值量化其因果影响。在多个推理基准上的广泛实验表明,所提框架提供了多层次、可验证的解释,包括单个实例的可执行推理结构、邻近输入的可行域表示以及类别层面具有量化重要性的可解释故障模式。这些结果建立了一个统一且原则性的范式,用于提高LLM推理系统的可解释性和可靠性。

英文摘要

Large language models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, yet their decision-making processes remain difficult to interpret. Existing explanation methods often lack trustworthy structural insight and are limited to single-instance analysis, failing to reveal reasoning stability and systematic failure mechanisms. To address these limitations, we propose the Trustworthy Unified Explanation Framework (TRUE), which integrates executable reasoning verification, feasible-region directed acyclic graph (DAG) modeling, and causal failure mode analysis. At the instance level, we redefine reasoning traces as executable process specifications and introduce blind execution verification to assess operational validity. At the local structural level, we construct feasible-region DAGs via structure-consistent perturbations, enabling explicit characterization of reasoning stability and the executable region in the local input space. At the class level, we introduce a causal failure mode analysis method that identifies recurring structural failure patterns and quantifies their causal influence using Shapley values. Extensive experiments across multiple reasoning benchmarks demonstrate that the proposed framework provides multi-level, verifiable explanations, including executable reasoning structures for individual instances, feasible-region representations for neighboring inputs, and interpretable failure modes with quantified importance at the class level. These results establish a unified and principled paradigm for improving the interpretability and reliability of LLM reasoning systems.
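
The class-level step quantifies the influence of failure patterns with Shapley values. The sketch below shows the exact Shapley computation for a small set of "players", averaging each player's marginal contribution over all orderings. It is a generic illustration, not the paper's pipeline: the failure patterns `A`, `B`, `C` and the `failure_rate` function are made-up examples.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to a number. The cost grows factorially
    with the number of players, so this is only practical for small sets.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def failure_rate(patterns):
    """Hypothetical failure rate when a set of failure patterns is present."""
    rate = 0.0
    if "A" in patterns:
        rate += 0.3
    if "B" in patterns:
        rate += 0.1
    if "A" in patterns and "B" in patterns:
        rate += 0.2  # interaction: A and B together are worse than their sum
    return rate


attribution = shapley_values(["A", "B", "C"], failure_rate)
```

In this toy game the interaction term is split evenly between `A` and `B`, and `C`, which never changes the failure rate, receives zero; the attributions sum to the failure rate with all patterns present.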

2603.02220 2026-06-08 cs.LG cs.AI cs.CV 版本更新

Forecasting as Rendering: A 2D Gaussian Splatting Framework for Time Series Forecasting

预测即渲染:面向时间序列预测的2D高斯泼溅框架

Yixin Wang, Yifan Hu, Peiyuan Liu, Naiqi Li, Tao Dai, Shu-Tao Xia

发表机构 * Tsinghua Shenzhen International Graduate School, Tsinghua University(清华大学深圳国际研究生院) College of Computer Science and Software Engineering, Shenzhen University(深圳大学计算机科学与软件工程学院)

AI总结 提出TimeGS框架,将时间序列预测转化为2D高斯泼溅生成渲染,通过各向异性高斯核和连续光栅化解决周期内与周期间的建模问题,实现SOTA性能。

详情
AI中文摘要

时间序列预测仍然是一个具有挑战性的问题,因为周期内波动和周期间趋势的复杂纠缠。尽管最近的进展试图将一维序列重塑为二维周期-相位表示,但它们存在两个主要局限性。首先,将重塑后的张量视为静态图像会导致拓扑不匹配,因为标准空间算子在网格边界处切断了时间连续性。其次,依赖统一的固定大小表示会低效地分配建模能力,并且无法为可压缩的非平稳时间模式提供所需的自适应分辨率。为了解决这些局限性,我们引入了TimeGS,这是一个新颖的框架,从根本上将预测范式从回归转变为二维生成渲染。通过将未来序列重新概念化为潜在的二维时间表面,TimeGS利用高斯核的固有各向异性,以灵活的几何对齐自适应地建模复杂变化。为了实现这一点,我们引入了多基高斯核生成(MB-GKG)块,该块从固定字典中合成核以稳定优化,以及多周期时间连续光栅化(MP-CCR)块,该块在周期边界上强制执行严格的时间连续性。在标准基准数据集上的全面实验表明,TimeGS达到了最先进或具有竞争力的性能。代码位于https://github.com/yixinwang1/TimeGS。

英文摘要

Time series forecasting remains a challenging problem due to the intricate entanglement of intra-period fluctuations and inter-period trends. While recent advances have attempted to reshape 1D sequences into 2D period-phase representations, they suffer from two principal limitations. Firstly, treating reshaped tensors as static images results in a topological mismatch, as standard spatial operators sever chronological continuity at grid boundaries. Secondly, relying on uniform fixed-size representations allocates modeling capacity inefficiently and fails to provide the adaptive resolution required for compressible, non-stationary temporal patterns. To address these limitations, we introduce TimeGS, a novel framework that fundamentally shifts the forecasting paradigm from regression to 2D generative rendering. By reconceptualizing the future sequence as a latent 2D temporal surface, TimeGS utilizes the inherent anisotropy of Gaussian kernels to adaptively model complex variations with flexible geometric alignment. To realize this, we introduce a Multi-Basis Gaussian Kernel Generation (MB-GKG) block that synthesizes kernels from a fixed dictionary to stabilize optimization, and a Multi-Period Chronologically Continuous Rasterization (MP-CCR) block that enforces strict temporal continuity across periodic boundaries. Comprehensive experiments on standard benchmark datasets demonstrate that TimeGS attains state-of-the-art or competitive performance. The code is at https://github.com/yixinwang1/TimeGS.
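
The topological mismatch mentioned above can be seen in a minimal sketch of the period-phase reshaping used by the prior work the abstract refers to. This illustrates only that reshaping, not TimeGS itself; `fold_series` and `grid_distance` are hypothetical helper names.

```python
def fold_series(series, period):
    """Reshape a 1D series into rows of one period each (row = period index, column = phase)."""
    if period <= 0 or len(series) % period != 0:
        raise ValueError("series length must be a positive multiple of period")
    return [series[start:start + period] for start in range(0, len(series), period)]


def grid_distance(index_a, index_b, period):
    """Manhattan distance between two time steps after folding."""
    row_a, col_a = divmod(index_a, period)
    row_b, col_b = divmod(index_b, period)
    return abs(row_a - row_b) + abs(col_a - col_b)


series = list(range(12))
grid = fold_series(series, period=4)

# Time steps 3 and 4 are consecutive, yet they land at opposite ends of adjacent rows.
boundary_gap = grid_distance(3, 4, period=4)
inside_gap = grid_distance(1, 2, period=4)
```

A spatial operator with a small receptive field treats steps 3 and 4 as distant even though they are adjacent in time, which is the break in continuity at grid boundaries that the abstract describes.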

2603.02970 2026-06-08 cs.LG math.OC 版本更新

LAGO: A Local-Global Optimization Framework Combining Trust Region Methods and Bayesian Optimization

LAGO:一种结合信赖域方法和贝叶斯优化的局部-全局优化框架

Eliott Van Dieren, Tommaso Vanzan, Fabio Nobile

发表机构 * Institute of Mathematics EPFL(洛桑联邦理工学院数学研究所) Dipartimento di Scienze Matematiche Politecnico di Torino(都灵理工大学数学系)

AI总结 提出LAGO框架,通过自适应竞争机制耦合贝叶斯优化和基于梯度的信赖域局部细化,用于光滑且梯度可用的昂贵目标函数优化,在提议层面分离全局探索与局部细化。

Comments 21 pages, 12 figures

详情
AI中文摘要

我们提出LAGO,一种局部-全局优化框架,通过自适应竞争机制耦合贝叶斯优化(BO)和基于梯度的信赖域局部细化,用于光滑且梯度可用的昂贵目标函数优化。在每次迭代中,全局和局部优化策略独立提议候选点,并根据预测改进选择下一个评估点。LAGO在提议层面分离全局探索与局部细化:BO采集函数在活跃信赖域外优化,而局部候选点在信赖域内提议。仅当满足基于长度尺度的最小距离准则时,接受局部步附近的点才被纳入全局GP数据集,从而降低局部利用期间数值不稳定的风险。LAGO在到达有希望区域时通过高效局部细化增强BO,并在局部步不具竞争力时恢复探索行为。

英文摘要

We introduce LAGO, a LocAl-Global Optimization framework coupling Bayesian Optimization (BO) and gradient-based trust region local refinement through an adaptive competition mechanism for smooth expensive-to-evaluate objective functions with available gradients. At each iteration, global and local optimization strategies independently propose candidate points, and the next evaluation is selected based on predicted improvement. LAGO separates global exploration from local refinement at the proposal level: the BO acquisition function is optimized outside the active trust region, while local candidates are proposed within the trust region. Points in the vicinity of the accepted local step are incorporated in the global GP dataset only when satisfying a lengthscale-based minimum-distance criterion, hence reducing the risk of numerical instability during local exploitation. LAGO enhances BO with efficient local refinement when reaching promising regions, and reverts to exploratory behavior when local steps are not competitive.

2603.12507 2026-06-08 cs.LG math.OC stat.CO stat.ML 版本更新

Adaptive Conditional Forest Sampling for Spectral Risk Optimisation under Decision-Dependent Uncertainty

自适应条件森林采样用于决策依赖不确定性下的谱风险优化

Marcell T. Kurbucz

发表机构 * Institute for Global Prosperity, The Bartlett, University College London(全球繁荣研究所,巴特利特学院,伦敦大学学院)

AI总结 提出ACFS框架,结合广义随机森林、CEM全局搜索和重加权聚焦增强,解决决策依赖分布下的谱风险最小化问题,在重尾和偏态基准上优于现有方法。

Comments 18 pages, 3 figures, 10 tables

详情
AI中文摘要

当不确定性分布依赖于决策时,最小化谱风险目标(定义为期望成本与条件风险价值(CVaR)的加权组合)具有挑战性,这使得代理建模和基于模拟的排序对尾部估计误差敏感。我们提出自适应条件森林采样(ACFS),一个四阶段模拟优化框架,集成了用于决策条件分布近似的广义随机森林、CEM引导的全局探索、秩加权聚焦增强以及代理到真实的两阶段重排序,然后进行多起点梯度优化。我们在两个结构不同的数据生成过程上评估ACFS:具有决策依赖学生t边际的高斯copula和具有对数正态边际的高斯copula,在三种惩罚权重配置和每种设置100次重复下,对每种方法可用的真实分布oracle抽取次数设置共同上限。在第二个基准测试中,ACFS在每个配置下均实现了最低的中位数oracle谱风险,中位数差距相对于GP-BO在8.6%到21.8%之间。在第一个基准测试中,ACFS和GP-BO在中位数目标上统计上无显著差异,但在较高惩罚权重下,ACFS相对于GP-BO将跨重复离散度降低了约1.9到2.5倍,在最低权重下接近持平,在第二个基准测试中整体降低了1.7到2.3倍,表明运行间可靠性显著提高。ACFS在几乎所有设置中也优于CEM-SO、SGD-CVaR和KDE-SO,而消融和敏感性分析支持设计的鲁棒性,并表明各组件贡献在偏斜的对数正态基准上最为显著。

英文摘要

Minimising a spectral risk objective, defined as a weighted combination of expected cost and Conditional Value-at-Risk (CVaR), is challenging when the uncertainty distribution is decision-dependent, making both surrogate modelling and simulation-based ranking sensitive to tail estimation error. We propose Adaptive Conditional Forest Sampling (ACFS), a four-phase simulation-optimisation framework that integrates Generalised Random Forests for decision-conditional distribution approximation, CEM-guided global exploration, rank-weighted focused augmentation, and surrogate-to-oracle two-stage reranking before multi-start gradient-based refinement. We evaluate ACFS on two structurally distinct data-generating processes: a Gaussian copula with decision-dependent Student-t marginals and a Gaussian copula with log-normal marginals, across three penalty-weight configurations and 100 replications per setting, under a common cap on the number of true-distribution oracle draws available to each method. ACFS achieves the lowest median oracle spectral risk on the second benchmark in every configuration, with median gaps over GP-BO ranging from 8.6% to 21.8%. On the first benchmark, ACFS and GP-BO are statistically indistinguishable in median objective, but ACFS reduces cross-replication dispersion relative to GP-BO by approximately 1.9 to 2.5 times at the higher penalty weights, with near-parity at the lowest, and by 1.7 to 2.3 times throughout on the second benchmark, indicating materially improved run-to-run reliability. ACFS also outperforms CEM-SO, SGD-CVaR, and KDE-SO in nearly all settings, while ablation and sensitivity analyses support the robustness of the design and indicate that component contributions are most pronounced on the skewed log-normal benchmark.
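
The objective described in the first sentence can be sketched from a sample of simulated costs. The abstract does not give the exact formula, so the form below (expected cost plus a penalty weight times CVaR) and the names `cvar`, `spectral_risk`, `alpha`, and `penalty_weight` are assumptions for illustration.

```python
import math


def cvar(costs, alpha):
    """Empirical CVaR at level alpha: mean of the worst (1 - alpha) fraction of costs."""
    ordered = sorted(costs, reverse=True)
    # The small tolerance guards against floating-point overshoot before ceil.
    tail_size = max(1, math.ceil((1.0 - alpha) * len(ordered) - 1e-9))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


def spectral_risk(costs, alpha, penalty_weight):
    """Assumed objective: expected cost + penalty_weight * CVaR_alpha."""
    expected_cost = sum(costs) / len(costs)
    return expected_cost + penalty_weight * cvar(costs, alpha)


# Hypothetical simulated costs for one candidate decision.
costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tail_risk = cvar(costs, alpha=0.8)
objective = spectral_risk(costs, alpha=0.8, penalty_weight=2.0)
```

Because CVaR depends only on the few largest costs in the sample, it is estimated from far fewer draws than the mean, which is why the abstract calls the ranking of candidate decisions sensitive to tail estimation error.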

2603.13042 2026-06-08 cs.LG cs.AR updated version

OpenACMv2: An Accuracy-Constrained Co-Optimization Framework for Approximate DCiM

OpenACMv2:面向近似数字存内计算的精度约束协同优化框架

Yiqi Zhou, Yue Yuan, Yikai Wang, Bohao Liu, Qinxin Mei, Zhuohua Liu, Shan Shen, Wei Xing, Daying Sun, Li Li, Guozhu Liu

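Affiliations: Nanjing University of Science and Technology, China; Shenzhen University, China; Beihang University, China; University of Sheffield, UK; The 58th Research Institute of China Electronics Technology Group Corporation, China

AI summary: Proposes OpenACMv2, a framework that uses two-level optimization (architecture search and transistor sizing) to trade off power, performance, and area in approximate digital compute-in-memory under an accuracy constraint; experiments show a power-delay product reduction of roughly 50% or more.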

Comments Accepted by DAC2026. Camera-ready version

AI-generated Chinese abstract

数字存内计算通过减少数据移动来加速神经网络。近似数字存内计算可以进一步改善功耗-性能-面积,但需要在耦合的架构和晶体管级选择中进行精度约束的协同优化。基于OpenYield,我们引入了精度约束协同优化,并提出了OpenACMv2,这是一个通过两级优化实现ACCO的开放框架:(1)基于快速GNN代理预测PPA和误差,进行压缩机组合和SRAM宏参数的精度约束架构搜索;(2)使用蒙特卡洛方法对标准单元和SRAM位单元进行考虑工艺偏差和PVT的晶体管尺寸调整。通过将ACCO解耦为架构级探索和电路级尺寸调整,OpenACMv2集成了经典的单目标和多目标优化器,以提供强大的PPA-精度权衡和稳健的收敛性。该工作流兼容FreePDK45和OpenROAD,支持可复现的评估和易于采用。实验表明,所提出的两级ACCO框架在Level-1通过架构探索实现了大部分精度约束的效率提升,功耗延迟积降低约50%以上,而Level-2晶体管级优化在保持精度的同时进一步提供了个位数的PDP改进,从而支持对近似DCiM进行快速的“假设”探索。该框架可在GitHub上获取(https://github.com/ShenShan123/OpenACM)。

English abstract

Digital Compute-in-Memory (DCiM) accelerates neural networks by reducing data movement. Approximate DCiM can further improve power-performance-area (PPA), but demands accuracy-constrained co-optimization across coupled architecture and transistor-level choices. Building on OpenYield, we introduce Accuracy-Constrained Co-Optimization (ACCO) and present OpenACMv2, an open framework that operationalizes ACCO via two-level optimization: (1) accuracy-constrained architecture search of compressor combinations and SRAM macro parameters, driven by a fast GNN-based surrogate for PPA and error; and (2) variation- and PVT-aware transistor sizing for standard cells and SRAM bitcells using Monte Carlo. By decoupling ACCO into architecture-level exploration and circuit-level sizing, OpenACMv2 integrates classic single- and multi-objective optimizers to deliver strong PPA-accuracy tradeoffs and robust convergence. The workflow is compatible with FreePDK45 and OpenROAD, supporting reproducible evaluation and easy adoption. Experiments show that the proposed two-level ACCO framework achieves most of its accuracy-constrained efficiency gain at Level-1 through architecture exploration, delivering roughly 50%+ PDP reduction, while Level-2 transistor-level optimization provides a further single-digit PDP improvement while preserving accuracy, enabling rapid "what-if" exploration for approximate DCiM. The framework is available on GitHub (https://github.com/ShenShan123/OpenACM).

2603.13092 2026-06-08 cs.LG cs.AR updated version

Breaking the Tuning Barrier: Zero-Hyperparameters Yield Multi-Corner Analysis Via Learned Priors

打破调优壁垒:基于学习先验的零超参数良率多角分析

Wei W. Xing, Kaiqi Huang, Jiazhan Liu, Hong Qiu, Shan Shen

Affiliations: School of Mathematical and Physical Science, University of Sheffield; SZU–UoS Joint Centre for Innovation and Entrepreneurship, College of Mechatronics and Control Engineering, Shenzhen University; Nanjing University of Science and Technology

AI summary: To address the high simulation cost of multi-corner circuit analysis and the heavy hyperparameter tuning that existing methods require, proposes in-context learning with a pre-trained foundation model, which matches state-of-the-art accuracy without tuning and cuts validation cost by more than 10x.

Comments Accepted by DAC2026. Camera-ready Version

AI-generated Chinese abstract

良率多角分析在25个以上的工艺-电压-温度角下验证电路,导致组合仿真成本为$O(K \times N)$,其中$K$表示角数,每个角的样本数$N$超过$10^4$。现有方法面临基本权衡:简单模型实现自动化但在非线性电路上失败,而先进AI模型捕获复杂行为但每次设计迭代需要数小时的超参数调优,形成调优壁垒。我们通过用从数百万回归任务预训练的基础模型中学到的先验替代工程先验(即模型规范)来打破这一壁垒。该模型进行上下文学习,无需调优或重新训练即可即时适应每个电路。其注意力机制通过识别工作条件之间共享的电路物理特性,自动跨角传递知识。结合自动特征选择器(1152D到48D),我们的方法以零调优匹配最先进精度(平均MRE低至0.11%),将总验证成本降低10倍以上。

English abstract

Yield Multi-Corner Analysis validates circuits across 25+ Process-Voltage-Temperature corners, resulting in a combinatorial simulation cost of $O(K \times N)$ where $K$ denotes corners and $N$ exceeds $10^4$ samples per corner. Existing methods face a fundamental trade-off: simple models achieve automation but fail on nonlinear circuits, while advanced AI models capture complex behaviors but require hours of hyperparameter tuning per design iteration, forming the Tuning Barrier. We break this barrier by replacing engineered priors (i.e., model specifications) with learned priors from a foundation model pre-trained on millions of regression tasks. This model performs in-context learning, instantly adapting to each circuit without tuning or retraining. Its attention mechanism automatically transfers knowledge across corners by identifying shared circuit physics between operating conditions. Combined with an automated feature selector (1152D to 48D), our method matches state-of-the-art accuracy (mean MREs as low as 0.11%) with zero tuning, reducing total validation cost by over $10\times$.

2603.13546 2026-06-08 cs.LG updated version

Probabilistic Gaussian Homotopy: A Probability-Space Continuation Framework for Nonconvex Optimization

概率高斯同伦:非凸优化的概率空间延拓框架

Eshed Gal, Samy Wu Fung, Eldad Haber

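Affiliations: University of British Columbia; Colorado School of Mines

AI summary: Proposes the Probabilistic Gaussian Homotopy (PGH) framework, which performs probability-space continuation for nonconvex optimization by deforming the Boltzmann distribution and aggregating gradients with Boltzmann weights, and derives PGHO, a practical algorithm based on Monte Carlo gradient estimation.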

AI-generated Chinese abstract

我们提出了概率高斯同伦(PGH),一种用于非凸优化的概率空间延拓框架。与经典的高斯同伦(平滑目标函数并均匀平均梯度)不同,PGH 变形相关的玻尔兹曼分布,并诱导扰动梯度的玻尔兹曼加权聚合,从而将下降方向指数地偏向低能量区域。我们证明 PGH 对应于一种 log-sum-exp(软最小)同伦,它在尺度 $λ>0$ 下平滑非凸目标函数,并在 $λ\to 0$ 时恢复原始目标函数,从而得到 Moreau 包络的后验均值推广,并且我们推导了沿着退火同伦路径控制极小值演化的动力系统。这建立了高斯延拓、贝叶斯去噪和扩散式平滑之间的原理性联系。我们进一步提出了概率高斯同伦优化(PGHO),一种基于蒙特卡洛梯度估计的实用随机算法,并在高维非凸基准测试和稀疏恢复问题上展示了强大的性能,而经典梯度方法和目标空间平滑在这些问题上经常失败。

English abstract

We introduce Probabilistic Gaussian Homotopy (PGH), a probability-space continuation framework for nonconvex optimization. Unlike classical Gaussian homotopy, which smooths the objective and uniformly averages gradients, PGH deforms the associated Boltzmann distribution and induces Boltzmann-weighted aggregation of perturbed gradients, which exponentially biases descent directions toward low-energy regions. We show that PGH corresponds to a log-sum-exp (soft-min) homotopy that smooths a nonconvex objective at scale $λ>0$ and recovers the original objective as $λ\to 0$, yielding a posterior-mean generalization of the Moreau envelope, and we derive a dynamical system governing minimizer evolution along an annealed homotopy path. This establishes a principled connection between Gaussian continuation, Bayesian denoising, and diffusion-style smoothing. We further propose Probabilistic Gaussian Homotopy Optimization (PGHO), a practical stochastic algorithm based on Monte Carlo gradient estimation, and demonstrate strong performance on high-dimensional nonconvex benchmarks and sparse recovery problems where classical gradient methods and objective-space smoothing frequently fail.
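
The contrast drawn in this abstract, between uniformly averaging perturbed gradients and weighting them by a Boltzmann factor, can be illustrated with a small Monte Carlo estimator. This is only a sketch under assumptions: the abstract does not give the PGHO estimator, so the softmax weights `exp(-f / lam)`, the function name, and the parameters `sigma`, `lam`, and `n_samples` are hypothetical choices made for illustration.

```python
import math
import random


def boltzmann_weighted_gradient(f, grad_f, x, sigma, lam, n_samples, rng):
    """Average gradients at Gaussian perturbations of x, weighting each sample
    by exp(-f / lam) so that low-objective samples dominate (assumed form)."""
    points = [[xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
              for _ in range(n_samples)]
    energies = [f(p) for p in points]
    e_min = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / lam) for e in energies]
    total = sum(weights)
    grads = [grad_f(p) for p in points]
    return [sum(w * g[d] for w, g in zip(weights, grads)) / total
            for d in range(len(x))]


# Hypothetical 1-D double-well objective used only to exercise the estimator.
def f(p):
    return (p[0] ** 2 - 1.0) ** 2 + 0.3 * p[0]


def grad_f(p):
    return [4.0 * p[0] * (p[0] ** 2 - 1.0) + 0.3]


rng = random.Random(0)
x = [1.5]
for _ in range(50):
    g = boltzmann_weighted_gradient(f, grad_f, x, 0.5, 0.1, 64, rng)
    x = [xi - 0.02 * gi for xi, gi in zip(x, g)]
```

Replacing the weights with a constant recovers the uniform averaging of classical Gaussian smoothing, so the only difference in this sketch is the weighting.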

2603.14014 2026-06-08 cs.LG cs.GT updated version

Aumann-SHAP: The Geometry of Counterfactual Interaction Explanations in Machine Learning

Aumann-SHAP: 机器学习中反事实交互解释的几何结构

Adam Belahcen, Stéphane Mussard

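AI summary: Proposes the Aumann-SHAP framework, which decomposes counterfactual transitions over a local hypercube grid and uses Shapley and LES values of the micro-game for geometry-aware attribution; it removes the bias of equal-split Shapley on synthetic data, and on real data it corrects a sign error and improves edit efficiency.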

AI-generated Chinese abstract

我们引入Aumann-SHAP,一个交互感知框架,通过将模型限制在连接基线和反事实特征的局部超立方体来分解反事实转移。每个超立方体被离散化为一个网格,以构建一个诱导的微玩家合作博弈,其中基本网格步移动成为玩家。该TU-微博弈上的Shapley和LES值产生几何感知的域内归因,在网格细化下收敛到对角Aumann-Shapley / Integrated Gradients极限,并将等分Shapley恢复为退化的$m=1$特例。精确的网格状态闭式公式为固定交互阶数提供了多项式时间计算。在具有已知真实值的合成基准上,等分Shapley带有不可约偏差,而Aumann-SHAP收敛到正确分解。在German Credit上,交互几何在$12.3\%$的实例中改变了特征优先级排序。在UCI Heart Disease上,等分错误地将胆固醇抑制因子归因为正贡献者,这是Aumann-SHAP纠正的符号错误。在MNIST上,博弈论归因达到目标置信度所需的编辑次数比基于幅度的排序少$3.5\times$,其中微博弈Shapley在所有预算下实现了最佳效率。

English abstract

We introduce Aumann-SHAP, an interaction-aware framework that decomposes counterfactual transitions by restricting the model to a local hypercube connecting baseline and counterfactual features. Each hypercube is discretized into a grid to construct an induced micro-player cooperative game in which elementary grid-step moves become players. Shapley and LES values on this TU-micro-game yield geometry-aware within-pot attributions that converge to the diagonal Aumann--Shapley / Integrated Gradients limit under grid refinement, and recover equal-split Shapley as the degenerate $m=1$ special case. An exact grid-state closed form gives polynomial-time computation for fixed interaction order. On a synthetic benchmark with known ground truth, equal-split Shapley carries an irreducible bias while Aumann-SHAP converges to the correct decomposition. On German Credit, interaction geometry changes feature priority rankings in $12.3\%$ of instances. On UCI Heart Disease, equal-split misattributes a cholesterol suppressor as a positive contributor, which is a sign error Aumann-SHAP corrects. On MNIST, game-theoretic attribution reaches target confidence with $3.5\times$ fewer edits than magnitude-based ordering, with micro-game Shapley achieving the best efficiency across all budgets.
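
The two endpoints this abstract mentions, equal-split Shapley (the $m=1$ case) and the diagonal Integrated Gradients limit, can be compared on a toy function. The sketch below does not implement the paper's micro-game or its closed form. It computes an exact two-point Shapley value by enumerating orderings and a Riemann-sum Integrated Gradients attribution; the function `f`, the baseline, and the counterfactual are hypothetical.

```python
from itertools import permutations


def shapley_two_point(f, base, target):
    """Exact Shapley values of the game that switches each feature from its
    baseline value to its counterfactual value, averaged over all orderings."""
    n = len(base)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        x = list(base)
        prev = f(x)
        for i in order:
            x[i] = target[i]
            cur = f(x)
            phi[i] += cur - prev
            prev = cur
    return [p / len(orders) for p in phi]


def integrated_gradients(f, base, target, steps=1000, h=1e-5):
    """Midpoint Riemann sum of the gradient along the straight (diagonal)
    path from base to target, using central finite differences."""
    n = len(base)
    attr = [0.0] * n
    for k in range(steps):
        t = (k + 0.5) / steps
        x = [b + t * (c - b) for b, c in zip(base, target)]
        for i in range(n):
            up, dn = list(x), list(x)
            up[i] += h
            dn[i] -= h
            grad_i = (f(up) - f(dn)) / (2.0 * h)
            attr[i] += grad_i * (target[i] - base[i]) / steps
    return attr


def f(x):
    # Pure interaction term: neither feature changes the output on its own.
    return x[0] * x[1] ** 2


shap = shapley_two_point(f, [0.0, 0.0], [1.0, 1.0])    # approx. [0.5, 0.5]
ig = integrated_gradients(f, [0.0, 0.0], [1.0, 1.0])   # approx. [1/3, 2/3]
```

Both attributions sum to the total change of 1, but the two-point game splits the interaction equally, while the path-based attribution follows the function's shape between the two points. That difference is the kind of geometry the abstract describes.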

2603.16689 2026-06-08 cs.LG updated version

Predictive Statistics Shape Emergent World Representations of Grid Walkers

预测统计塑造网格行走者的涌现世界表征

Sasha Brenner, Thomas R. Knösche, Nico Scherf

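Affiliations: Max Planck Institute for Human Cognitive and Brain Sciences; Leipzig University; ScaDS.AI

AI summary: Using constrained random-walk experiments, finds that the first attention block of a decoder-only Transformer extracts the sufficient statistic for prediction and later layers transform it into the predictive geometry, yielding a readable world model, whereas recurrent networks do not separate out this stage.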

Comments 24 pages, 15 figures

AI-generated Chinese abstract

下一个词预测器通常似乎会发展出对潜在世界及其规则的内部表征。这些模型的概率性质表明世界结构与概率分布几何之间存在深层联系。为了更精确地理解这种联系,我们使用一个最小随机过程作为受控设置:在二维格点上的约束随机游走,必须在预定步数后到达固定终点。对该过程的最优预测仅取决于由游走者相对于目标的位置和剩余时间范围决定的充分向量;换句话说,概率分布由世界的网格几何参数化。我们在从这些游走的精确分布中采样的前缀上训练解码器仅Transformer和循环网络,并通过跨层测量对齐和线性可读性,将其隐藏激活与预测的充分统计量进行比较。我们发现Transformer的计算分为两个阶段:第一个注意力块从输入中提取充分统计量,后续层将其转化为下一步预测几何。在不同约束变体中,注意力后的表征是通用的:一个共享的晶格世界状态,可以直接作为世界模型读取,追溯到数据的预测几何。后续层随后将其专门化到每个变体的下一步分布。循环网络达到相同的贝叶斯最优损失,但未将这个世界状态隔离为一个单独阶段,表明世界模型几何也依赖于架构。尽管在玩具系统中演示,结果表明预测分布的几何是理解神经网络如何内化其数据结构的有用视角。

English abstract

Next-token predictors often appear to develop internal representations of the latent world and its rules. The probabilistic nature of these models suggests a deep connection between the structure of the world and the geometry of probability distributions. In order to understand this link more precisely, we use a minimal stochastic process as a controlled setting: constrained random walks on a two-dimensional lattice that must reach a fixed endpoint after a predetermined number of steps. Optimal prediction of this process solely depends on a sufficient vector determined by the walker's position relative to the target and the remaining time horizon; in other words, the probability distributions are parametrized by the world's grid geometry. We train decoder-only transformers and recurrent networks on prefixes sampled from the exact distribution of these walks and compare their hidden activations to sufficient statistics of prediction, by measuring alignment and linear readability across layers. We find that the transformer's computation factors into two stages: the first attention block extracts the sufficient statistic from the input, and later layers transform it into the next-step predictive geometry. Across constraint variants the post-attention representation is universal: a shared world-state of the lattice that can be read directly as a world model, traced to the predictive geometry of the data. Later layers then specialize it to each variant's next-step distribution. Recurrent networks reach the same Bayes-optimal loss but do not isolate this world-state as a separate stage, showing that the world-model geometry also depends on architecture. Although demonstrated in a toy system, the results suggest that the geometry of the predictive distribution is a useful lens on how neural networks internalize the structure of their data.
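
The claim that optimal prediction depends only on the walker's position relative to the target and the remaining horizon can be checked by counting paths. The sketch below assumes the simplest setting, unit steps in the four lattice directions with every valid walk equally likely; the paper's constraint variants are not modelled, and the function names are hypothetical.

```python
from math import comb

# Assumed step set: nearest-neighbour moves on the 2-D lattice.
STEPS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def count_walks(n, dx, dy):
    """Number of n-step walks with net displacement (dx, dy).
    In rotated coordinates u = x + y, v = x - y every step changes both
    u and v by +-1, so the count factorises into two binomials."""
    u, v = dx + dy, dx - dy
    if n < 0 or abs(u) > n or abs(v) > n or (n + u) % 2:
        return 0
    return comb(n, (n + u) // 2) * comb(n, (n + v) // 2)


def next_step_distribution(dx, dy, remaining):
    """Bayes-optimal next-step probabilities given only the displacement
    to the target (dx, dy) and the number of remaining steps."""
    total = count_walks(remaining, dx, dy)
    if total == 0:
        raise ValueError("target cannot be reached in the remaining steps")
    return {name: count_walks(remaining - 1, dx - sx, dy - sy) / total
            for name, (sx, sy) in STEPS.items()}


# Target two cells to the east with six steps left.
probs = next_step_distribution(2, 0, 6)
```

The prefix of the walk enters only through `(dx, dy, remaining)`, so any two prefixes that reach the same cell at the same time get the same prediction.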

2603.26846 2026-06-08 cs.LG cs.AI updated version

Stable Reasoning, Unstable Responses: Mitigating LLM Deception via Stability Asymmetry

稳定推理,不稳定响应:通过稳定性不对称缓解大语言模型欺骗

Guoxi Zhang, Jiawei Chen, Tianzhuo Yang, Lang Qin, Juntao Dai, Yaodong Yang, Jingwei Yi

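Affiliations: Institute for Artificial Intelligence, Peking University; Beijing Academy of Artificial Intelligence; School of Chinese as a Second Language, Peking University

AI summary: To counter intrinsic deception in large language models, proposes Stability Asymmetry Regularization (SAR), which suppresses deception by penalizing the asymmetry between the stability of the internal chain of thought and the stability of the external response; experiments show it is effective without harming model capability.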

AI-generated Chinese abstract

随着大语言模型(LLMs)在能力和应用范围上的扩展,其可信度变得至关重要。一个关键风险是内在欺骗,即模型策略性地误导用户以实现自身目标。现有的基于思维链(CoT)监控的对齐方法监督显式的推理轨迹。然而,在优化压力下,模型被激励隐藏欺骗性推理,使得语义监督从根本上不可靠。基于认知心理学,我们假设一个欺骗性LLM在其CoT中保持稳定的内部信念,而其外部响应在扰动下仍然脆弱。我们将此现象称为稳定性不对称,并通过测量扰动下内部CoT稳定性与外部响应稳定性之间的对比来量化它。基于这一结构特征,我们提出了稳定性不对称正则化(SAR),一种在强化学习期间惩罚这种分布不对称性的新型对齐目标。与CoT监控不同,SAR针对模型输出的统计结构,使其对语义隐藏具有鲁棒性。大量实验证实,稳定性不对称可靠地识别欺骗行为,并且SAR有效抑制内在欺骗而不降低一般模型能力。

English abstract

As Large Language Models (LLMs) expand in capability and application scope, their trustworthiness becomes critical. A vital risk is intrinsic deception, wherein models strategically mislead users to achieve their own objectives. Existing alignment approaches based on chain-of-thought (CoT) monitoring supervise explicit reasoning traces. However, under optimization pressure, models are incentivized to conceal deceptive reasoning, rendering semantic supervision fundamentally unreliable. Grounded in cognitive psychology, we hypothesize that a deceptive LLM maintains a stable internal belief in its CoT while its external response remains fragile under perturbation. We term this phenomenon stability asymmetry and quantify it by measuring the contrast between internal CoT stability and external response stability under perturbation. Building on this structural signature, we propose the Stability Asymmetry Regularization (SAR), a novel alignment objective that penalizes this distributional asymmetry during reinforcement learning. Unlike CoT monitoring, SAR targets the statistical structure of model outputs, rendering it robust to semantic concealment. Extensive experiments confirm that stability asymmetry reliably identifies deceptive behavior, and that SAR effectively suppresses intrinsic deception without degrading general model capability.

2604.01313 2026-06-08 cs.LG nucl-ex physics.data-an physics.ins-det updated version

ScatterPrism: convergence for generative simulation and inverse problems in particle and nuclear physics

ScatterPrism:粒子与核物理中生成模拟与逆问题的收敛性

Zeyu Xia, Tyler Kim, Trevor Reed, Judy Fox, Geoffrey Fox, Adam Szczepaniak

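Affiliations: University of Maryland

AI summary: To address the premature plateau of the Conditional Flow Matching (CFM) loss in particle-physics simulation, proposes the ScatterPrism generative surrogate and uses physics-informed metrics to ensure true kinematic fidelity, with potential extension to high-energy physics and other fields.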

Comments 21 pages, 16 figures. Accepted for publication in JINST (AI4EIC 2025 proceedings)

AI-generated Chinese abstract

高保真模拟和复杂逆问题(如探测器建模和解折叠)是亚原子物理中计算密集的瓶颈,但对于准确的物理解释至关重要。虽然条件流匹配(CFM)提供了一种稳健的加速方法,但我们证明其标准训练损失从根本上具有误导性。具体而言,利用杰斐逊实验室核物理(NP)运动学数据集($γp \to ρ^0 p \to π^+π^- p$),我们发现CFM损失过早进入平台期,掩盖了持续的物理改进。为了验证这种脱节是与数据集无关的病理现象,我们引入了ScatterPrism,这是一种高效的生成代理模型,在NP数据和模拟挑战性一维分布拓扑的合成压力测试上进行了评估。结合这些基准测试,我们确定物理信息指标在标准损失收敛后仍持续改进。因此,我们提出了一种多指标诊断协议,以确保在没有数据记忆的情况下实现真正的运动学保真度。受即将到来的电子-离子对撞机(EIC)相关NP挑战的驱动,这一统一机制有潜力扩展到高能物理(HEP)应用,如喷注建模。此外,该框架有望应用于需要严格生成可靠性的更广泛领域,包括医学成像、天体物理学和定量金融。

English abstract

High-fidelity simulations and complex inverse problems, such as detector modeling and unfolding, are computationally intensive bottlenecks across subatomic physics, yet essential for accurate physical interpretation. While Conditional Flow Matching (CFM) offers a robust acceleration approach, we demonstrate its standard training loss is fundamentally misleading. Specifically, utilizing a Jefferson Lab Nuclear Physics (NP) kinematic dataset ($γp \to ρ^0 p \to π^+π^- p$), we expose that CFM loss plateaus prematurely, obscuring ongoing physical refinement. To verify this disconnect is a dataset-agnostic pathology, we introduce ScatterPrism, an efficient generative surrogate evaluated against both the NP data and synthetic stress tests modeling challenging 1D distribution topologies. Coupling these benchmarks, we establish that physics-informed metrics continue improving long after standard loss converges. Consequently, we propose a multi-metric diagnostic protocol to ensure true kinematic fidelity without data memorization. Driven by NP challenges relevant to the forthcoming Electron-Ion Collider (EIC), this unified machinery has strong potential to extend to High-Energy Physics (HEP) applications, such as jet modeling. Furthermore, the framework holds promise for broader domains requiring rigorous generative reliability, including medical imaging, astrophysics, and quantitative finance.

2604.03779 2026-06-08 cs.LG cs.AI updated version

CountsDiff: A Diffusion Model on the Natural Numbers for Generation and Imputation of Count-Based Data

CountsDiff: 一种用于计数数据生成和插补的自然数扩散模型

Renzo G. Soatto, Anders Hoel, Greycen Ren, Shorna Alam, Stephen Bates, Nikolaos P. Daskalakis, Caroline Uhler, Maria Skoularidou

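Affiliations: Princeton University; Stanford University; University of California, Berkeley

AI summary: Proposes CountsDiff, a diffusion framework that simplifies Blackout diffusion through a survival probability schedule and explicit loss weighting and adds continuous-time training, classifier-free guidance, and non-monotone reverse dynamics; it is validated on natural images and matches or surpasses existing methods on single-cell RNA-seq imputation.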

Comments 39 Pages, 11 figures. To appear in the 43rd International Conference on Machine Learning (ICML 2026)

AI-generated Chinese abstract

扩散模型在连续和基于token的领域中的生成任务表现出色,但其在离散序数数据上的应用仍不成熟。我们提出CountsDiff,一个旨在对自然数上的分布进行建模的扩散框架。CountsDiff扩展了Blackout扩散框架,通过直接参数化(基于生存概率调度和显式损失加权)简化其公式。这通过引入与现有扩散建模框架中直接类似的设计参数,带来了灵活性。除了这种重新参数化,CountsDiff还引入了现代扩散模型中的特性,这些特性此前在基于计数的领域中缺失,包括连续时间训练、无分类器引导以及允许非单调逆轨迹的搅动/重掩码逆动态。我们提出了CountsDiff的初始实例化,并在自然图像数据集(CIFAR-10、CelebA)上进行了验证,探索了所引入的设计参数在一个复杂、经过充分研究且可解释的数据领域中的影响。然后,我们强调生物计数分析作为一个自然用例,在胎儿和心脏细胞图谱中评估了CountsDiff在单细胞RNA-seq插补上的表现。值得注意的是,我们发现即使这种简单的实例化也能匹配或超越最先进的离散生成模型和领先的scRNA-seq插补方法的性能,同时通过未来工作中优化的设计选择,仍有很大的提升空间。

English abstract

Diffusion models have excelled at generative tasks for both continuous and token-based domains, but their application to discrete ordinal data remains underdeveloped. We present CountsDiff, a diffusion framework designed to model distributions on the natural numbers. CountsDiff extends the Blackout diffusion framework by simplifying its formulation through a direct parameterization in terms of a survival probability schedule and an explicit loss weighting. This introduces flexibility through design parameters with direct analogues in existing diffusion modeling frameworks. Beyond this reparameterization, CountsDiff introduces features from modern diffusion models, previously absent in counts-based domains, including continuous-time training, classifier-free guidance, and churn/remasking reverse dynamics that allow non-monotone reverse trajectories. We propose an initial instantiation of CountsDiff and validate it on natural image datasets (CIFAR-10, CelebA), exploring the effects of the introduced design parameters in a complex, well-studied, and interpretable data domain. We then highlight biological count assays as a natural use case, evaluating CountsDiff on single-cell RNA-seq imputation in fetal and heart cell atlases. Remarkably, we find that even this simple instantiation matches or surpasses the performance of a state-of-the-art discrete generative model and leading scRNA-seq imputation methods, while leaving substantial headroom for further gains through optimized design choices in future work.
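
A survival probability schedule on counts can be pictured as binomial thinning: each unit of a count independently survives to time t with probability p_t. The sketch below rests on that assumption, which matches a pure-death forward process, and is not taken from the CountsDiff paper; the function names and the example schedule are hypothetical.

```python
import random


def thin(count, keep_prob, rng):
    """Binomial thinning: each of `count` units is kept with prob keep_prob."""
    return sum(1 for _ in range(count) if rng.random() < keep_prob)


def forward_trajectory(x0, schedule, rng):
    """Corrupt a count x0 along a decreasing survival schedule p_1 > p_2 > ...
    Thinning step by step with conditional probability p_t / p_{t-1} gives
    x_t | x_0 ~ Binomial(x0, p_t) at every step. The schedule may reach 0
    only at its final entry, otherwise the ratio is undefined."""
    trajectory, x, prev = [], x0, 1.0
    for p in schedule:
        x = thin(x, p / prev, rng)
        trajectory.append(x)
        prev = p
    return trajectory


rng = random.Random(0)
schedule = [0.9, 0.5, 0.1, 0.0]  # hypothetical survival probabilities
traj = forward_trajectory(40, schedule, rng)
```

Under this forward process counts can only decrease, so a plain reverse process only adds counts back; the churn/remasking dynamics mentioned in the abstract are what permit non-monotone reverse trajectories.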

2604.06684 2026-06-08 cs.LG updated version

GraphWalker: Patient Analogy Meets Information Gain for Clinical Reasoning with Large Language Models

GraphWalker: 患者类比与信息增益结合用于大型语言模型的临床推理

Yue Fang, Weibin Liao, Yuxin Guo, Jiaran Gao, Hongxin Ding, Jinyang Zhang, Xinke Jiang, Zhibang Yang, Junfeng Zhao, Yasha Wang, Liantao Ma

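Affiliations: School of Computer Science, Peking University, Beijing, China; National Engineering Research Center for Software Engineering, Peking University, Beijing, China

AI summary: Proposes GraphWalker, a framework that combines data-driven and model-driven perspectives, discovers patient cohorts, and uses lazy greedy search to retrieve patient cases from electronic health records for analogical reasoning, improving clinical reasoning performance without parameter updates.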

AI-generated Chinese abstract

在电子健康记录(EHR)上进行临床推理是现代医疗中一项基本但具有挑战性的任务。虽然大型语言模型(LLM)通过上下文演示提供了一种有前景的范式,无需特定任务的参数更新,但现有的基于患者类比推理的方法在EHR设置中存在三个核心局限性:(1)视角局限性,数据驱动的相似性与LLM推理需求不一致,而模型驱动的信号受限于有限的临床能力;(2)队列意识,演示独立选择,未建模群体级结构;(3)信息聚合,忽略演示之间的冗余和交互效应。我们提出GraphWalker,一个无需训练的框架,让冻结的LLM通过检索到的患者案例进行类比推理。GraphWalker(i)联合利用数据驱动和模型驱动视角,(ii)发现患者队列以将检索基于群体级结构,(iii)采用带前沿扩展的懒惰贪心搜索来组合具有高边际信息增益的演示。在多个真实EHR基准上的大量实验表明,GraphWalker始终优于最先进的演示选择基线,并且在跨数据集分布偏移下保持更强的鲁棒性,无需特定任务的参数更新。GraphWalker进一步泛化到黑盒LLM,并自然地与智能体推理框架组合,使其成为基于LLM的临床工作流中可插拔的患者类比技能。我们的代码可在https://github.com/PuppyKnightUniversity/GraphWalker获取。

English abstract

Clinical reasoning over electronic health records (EHRs) is a fundamental yet challenging task in modern healthcare. While large language models (LLMs) offer a promising paradigm via in-context demonstrations that requires no task-specific parameter updates, existing methods for reasoning by patient analogy in EHR settings suffer from three core limitations: (1) Perspective Limitation, where data-driven similarity misaligns with LLM reasoning needs while model-driven signals are constrained by limited clinical competence; (2) Cohort Awareness, as demonstrations are selected independently without modeling population-level structure; and (3) Information Aggregation, where redundancy and interaction effects among demonstrations are ignored. We propose GraphWalker, a training-free framework that lets frozen LLMs reason by analogy over retrieved patient cases. GraphWalker (i) jointly leverages data-driven and model-driven perspectives, (ii) discovers patient cohorts to ground retrieval in population-level structure, and (iii) employs a lazy greedy search with frontier expansion to compose demonstrations with high marginal information gain. Extensive experiments on multiple real-world EHR benchmarks show that GraphWalker consistently outperforms state-of-the-art demonstration selection baselines, and remains substantially more robust under cross-dataset distribution shift, without task-specific parameter updates. GraphWalker further generalizes to black-box LLMs and composes naturally with agentic reasoning frameworks, positioning it as a pluggable patient-analogy skill in LLM-based clinical workflows. Our code is available at https://github.com/PuppyKnightUniversity/GraphWalker.
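
The lazy greedy search named in this abstract is a standard way to pick items with high marginal gain without re-scoring every candidate at each step. The sketch below shows the generic technique only. It assumes a submodular gain function, uses feature coverage as a stand-in for information gain, and omits GraphWalker's frontier expansion and cohort discovery; all names and data are hypothetical.

```python
import heapq


def lazy_greedy(candidates, gain, budget):
    """Select up to `budget` candidates by marginal gain. Stale gains are kept
    in a max-heap as upper bounds and refreshed only for the top entry, which
    is valid when gains can only shrink as the selection grows."""
    selected = []
    heap = [(-gain(selected, c), i, c) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        _, i, c = heapq.heappop(heap)
        fresh = gain(selected, c)  # re-evaluate against the current selection
        if not heap or fresh >= -heap[0][0]:
            if fresh <= 0:
                break  # nothing left adds information
            selected.append(c)
        else:
            heapq.heappush(heap, (-fresh, i, c))
    return selected


def coverage_gain(selected, candidate):
    """Number of features the candidate adds beyond those already covered."""
    covered = set()
    for _, features in selected:
        covered |= features
    return len(candidate[1] - covered)


# Hypothetical patient cases, each tagged with the clinical features it shows.
cases = [("a", {1, 2, 3}), ("b", {3, 4}), ("c", {1, 2}), ("d", {5})]
chosen = [name for name, _ in lazy_greedy(cases, coverage_gain, budget=2)]
```

Case "c" is never chosen here because case "a" already covers its features, which is the redundancy among demonstrations that the abstract says independent selection ignores.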

2604.07472 2026-06-08 cs.LG cs.NI updated version

Scalable Joint Resource Allocation for SLO-Constrained LLM Inference in Heterogeneous GPU Clouds

异构GPU云中SLO约束的LLM推理的可扩展联合资源分配

Jiaming Cheng, Duong Tung Nguyen

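Affiliations: Ira A. Fulton Schools of Engineering, Arizona State University

AI summary: For LLM inference under SLO constraints in heterogeneous GPU clouds, proposes a scalable framework whose constraint-aware heuristics (GH and AGH) allocate resources jointly, produce feasible, near-optimal solutions within seconds, and substantially reduce cost and SLO violations.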

AI-generated Chinese abstract

在云环境中服务大型语言模型(LLM)推理需要在延迟、准确性、内存和预算约束下联合优化模型选择、GPU配置、并行配置和工作负载路由。虽然混合整数线性规划(MILP)可以建模此问题,但其计算成本限制了在需求变化下的频繁重新优化。现有的启发式方法通常分别优化各个组件,并且在强制执行系统范围约束时可能变得不可行。本文提出了一个用于SLO约束的LLM推理的可扩展框架。我们将问题表述为一个MILP,并采用两阶段延迟模型,该模型在张量并行和流水线并行下捕获预填充和自回归解码。为了高效求解,我们开发了两种约束感知启发式算法:贪婪启发式(GH)和自适应贪婪启发式(AGH)。AGH通过多起点构建、局部搜索和GPU整合扩展了GH。两种方法通过并行感知过滤、基于成本的排序和自适应并行缩放来保持可行性。基于Azure LLM推理轨迹的实验表明,GH在一秒内生成可行解,而AGH在三秒内实现接近最优的性能,并可扩展到精确求解器无法收敛的大规模实例。在高达1.5倍延迟和准确性膨胀的样本外压力下,AGH通过预留的余量优雅地退化,产生的成本和SLO违规远低于成本最小的MILP解决方案。在合成和真实Azure工作负载中,AGH以显著低于精确MILP解决方案的成本维持SLO合规性。这些结果表明,高质量分配在实现快速适应工作负载变化的同时,对需求变化提供了显著的鲁棒性。

English abstract

Serving large language model (LLM) inference in cloud environments requires jointly optimizing model selection, GPU provisioning, parallelism configuration, and workload routing under latency, accuracy, memory, and budget constraints. While mixed-integer linear programming (MILP) can model this problem, its computational cost limits frequent re-optimization under demand variability. Existing heuristics often optimize individual components separately and may become infeasible when system-wide constraints are enforced. This paper presents a scalable framework for SLO-constrained LLM inference. We formulate the problem as an MILP with a two-phase delay model capturing both prefill and autoregressive decoding under tensor and pipeline parallelism. To solve it efficiently, we develop two constraint-aware heuristics: a Greedy Heuristic (GH) and an Adaptive Greedy Heuristic (AGH). AGH extends GH through multi-start construction, local search, and GPU consolidation. Both methods maintain feasibility through parallelism-aware filtering, cost-based ranking, and adaptive parallelism scaling. Experiments based on the Azure LLM Inference Trace show that GH generates feasible solutions within one second, while AGH achieves near-optimal performance within three seconds and scales to large instances where exact solvers fail to converge. Under out-of-sample stress with up to 1.5x delay and accuracy inflation, AGH degrades gracefully through provisioned headroom, yielding substantially lower cost and SLO violations than cost-minimal MILP solutions. Across synthetic and real Azure workloads, AGH maintains SLO compliance at significantly lower cost than exact MILP solutions. These results demonstrate that high-quality allocations provide substantial robustness to demand variability while enabling rapid adaptation to workload changes.

2604.10098 2026-06-08 cs.LG updated version

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Transformer中的注意力汇聚:利用、解释与缓解综述

Zunhai Su, Hengyuan Zhang, Wei Wu, Yifan Zhang, Yaxiu Liu, He Xiao, Qingyao Yang, Yuxuan Sun, Rui Yang, Chao Zhang, Jing Xiong, Hui Shen, Keyu Fan, Weihao Ye, Chaofan Tao, Taiqiang Wu, Zhongwei Wan, Tiantian Zhang, Bowen Yan, Zhen Li, Yiming Zhang, Congkai Xie, Yulei Qian, Yuchen Xie, Yik-Chung Wu, Hongxia Yang, Ngai Wong

Affiliations: Tsinghua University; Meituan LongCat Team; The University of Hong Kong; University of Michigan; Xiamen University; The Ohio State University; Columbia University; Shanghai Artificial Intelligence Laboratory; The Hong Kong Polytechnic University

AI summary: Presents the first systematic survey of the attention sink phenomenon in Transformers, organizing current research along three dimensions (fundamental utilization, mechanistic interpretation, and strategic mitigation) to guide future work.

AI-generated Chinese abstract

作为现代机器学习的基础架构,Transformer已在多个AI领域取得了显著进展。尽管其具有变革性影响,但各种Transformer中普遍存在一个挑战——注意力汇聚(Attention Sink, AS),即大量注意力集中在少数特定但无信息量的标记上。AS使可解释性复杂化,显著影响训练和推理动态,并加剧幻觉等问题。近年来,大量研究致力于理解和利用AS。然而,缺乏系统整合AS相关研究并为未来进展提供指导的全面综述。为填补这一空白,我们提出了首个关于AS的综述,围绕定义当前研究格局的三个关键维度展开:基础利用、机制解释和策略缓解。我们的工作通过突出该领域的关键概念和主要趋势,引导研究人员了解AS相关研究的演变,做出了关键贡献。我们希望本综述能成为有价值的资源,使研究人员能够在当前Transformer范式下有效管理AS,同时为下一代Transformer的创新进展提供灵感。本文的论文列表可在https://github.com/ZunhaiSu/Awesome-Attention-Sink获取。

English abstract

As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across various Transformers is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affecting the training and inference dynamics, and exacerbates issues such as hallucinations. In recent years, substantial research has been dedicated to understanding and harnessing AS. However, a comprehensive survey that systematically consolidates AS-related research and offers guidance for future advancements remains lacking. To address this gap, we present the first survey on AS, structured around three key dimensions that define the current research landscape: Fundamental Utilization, Mechanistic Interpretation, and Strategic Mitigation. Our work makes a pivotal contribution by highlighting the key concepts and main trends in the field, guiding researchers through the evolution of AS-related studies. We envision this survey as a valuable resource, empowering researchers to effectively manage AS within the current Transformer paradigm, while simultaneously inspiring innovative advancements for the next generation of Transformers. The paper list of this work is available at https://github.com/ZunhaiSu/Awesome-Attention-Sink.

2604.21407 2026-06-08 cs.LG stat.CO stat.ML updated version

Even More Guarantees for Variational Inference in the Presence of Symmetries

变分推断在对称性存在下的更多保证

Lena Zellinger, Antonio Vergari

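Affiliations: School of Informatics, University of Edinburgh

AI summary: Extends the theory of robust variational inference under target symmetries, giving sufficient conditions under which the forward KL divergence and α-divergences recover the target mean and correlation matrix exactly even under model misspecification, and drops the log-concavity assumption so that the guarantees cover multimodal targets.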

AI-generated Chinese abstract

当通过变分推断(VI)近似一个难以处理的密度时,变分族通常被选为一个简单的参数族,很可能不包含目标。这引发了一个问题:在模型误设的情况下,我们能在什么条件下恢复目标的特征?在这项工作中,我们在两个重要方面扩展了先前关于位置-尺度族在目标对称性下鲁棒VI的理论结果:(1)我们通过提供使用前向Kullback-Leibler散度和α-散度时精确恢复目标均值和相关矩阵的充分条件,将它们开放给更广泛的散度。(2)通过这样做,我们发现可以放弃先前工作中做出的对数凹目标的限制性假设,从而允许我们为更广泛的目标(包括多模态目标)提供保证。在我们的实验中,我们展示了我们的保证如何作为选择变分族和α值的指南,并通过一组多样化的例子说明了在缺乏我们的充分条件时优化如何以及为何会失败。

English abstract

When approximating an intractable density via variational inference (VI) the variational family is typically chosen as a simple parametric family that very likely does not contain the target. This raises the question: Under which conditions can we recover characteristics of the target despite misspecification? In this work, we extend previous theoretical results on robust VI with location-scale families under target symmetries in two substantial ways: (1) We open them up to a wider range of divergences by providing sufficient conditions for exact recovery of the target mean and correlation matrix when using the forward Kullback-Leibler divergence and $α$-divergences. (2) By doing so, we find that we can drop the restrictive assumption of a log-concave target made in previous work, allowing us to give guarantees for a wider range of targets, including multi-modal ones. In our experiments, we show how our guarantees can serve as guidelines for the choice of the variational family and $α$-value and we illustrate on a diverse set of examples how and why optimization can fail in the absence of our sufficient conditions.

2604.27011 2026-06-08 cs.LG cs.AI updated version

Automatic Causal Fairness Analysis with LLM-Generated Reporting

基于LLM生成报告的自动因果公平性分析

Alessia Berarducci, Eric Rossetto, Alessandro Antonucci, Marco Zaffalon

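Affiliations: Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), USI-SUPSI

AI summary: Presents the FairMind prototype, which performs causal fairness analysis using the standard fairness model, computes causal effects through counterfactual queries, and uses an LLM in a zero-shot setup to generate the fairness report, with examples illustrating its advantages over direct analysis by the LLM.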

Comments 23 pages, 6 figures, 3 tables, LaTeX; added missing proof for Proposition 3, typos corrected, updated example 1 to have positive values for the Sankey

AI-generated Chinese abstract

AutoML,旨在自动化机器学习在现实问题中的应用,是AI普及的关键步骤。大多数AutoML框架未考虑训练数据及相应预测中潜在的公平性缺失。我们介绍FairMind,一个旨在自动化数据集层面公平性分析的软件原型。我们通过借助Plečko和Bareinboim最近提出的标准公平模型的假设来实现这一点。这允许基于涉及目标、可能的混杂因素和中介变量以及我们视为受保护的输入特征的不同值的反事实查询,进行合理的因果效应公平性评估。在必要的数据预处理之后,该工具实现了效应的闭式计算。随后利用LLM生成关于训练数据集中检测到的公平性水平的准确报告。我们在零样本设置中实现了这一点,并通过示例展示了相对于LLM直接分析的预期优势。为了促进应用,还讨论了有序受保护变量和连续目标的扩展以及新的分解结果。

English abstract

AutoML, intended as the process of automating the application of machine learning to real-world problems, is a key step for AI popularisation. Most AutoML frameworks are not accounting for the potential lack of fairness in the training data and in the corresponding predictions. We introduce FairMind, a software prototype aiming to automatise fairness analysis at the dataset level. We achieve that by resorting to the assumptions of the standard fairness model, recently proposed by Plečko and Bareinboim. This allows for a sound fairness evaluation in terms of causal effects, based on counterfactual queries involving the target, possibly confounders and mediators, and the different values of an input feature we regard as protected. After the necessary data preprocessing, the tool implements a closed-form computation of the effects. LLMs are consequently exploited to generate accurate reports on the fairness levels detected in the training dataset. We achieve that in a zero-shot setup and show by examples the expected advantages with respect to a direct analysis performed by the LLM. To favour applications, extensions to ordinal protected variable and continuous targets and novel decomposition results are also discussed.

2605.00369 2026-06-08 cs.LG cs.AI updated version

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees

InvEvolve:通过具有性能保证的大语言模型进化白盒库存策略

Chenyu Huang, Jianghao Lin, Zhengyang Tang, Bo Jiang, Ruoqing Jiang, Benyou Wang, Lai Wei

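Affiliations: Shanghai University of Finance and Economics; Shanghai Jiao Tong University; The Chinese University of Hong Kong, Shenzhen; Tsinghua University; Boston College

AI summary: Proposes InvEvolve, a framework that uses a large language model trained with reinforcement learning, together with confidence-interval-based certification, to generate white-box inventory policies with statistical safety guarantees in online settings; it outperforms classical and deep-learning methods on synthetic and real retail data.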

详情
AI中文摘要

我们研究如何利用大语言模型在非平稳需求在线环境中生成库存策略。我们的工作受到基于LLM的进化搜索(如AlphaEvolve)最新进展的启发,这些方法在数学发现等静态且高度结构化问题上表现出色,但不太适合具有在线更新的动态库存环境。我们提出InvEvolve,一个基于置信区间认证的端到端库存策略进化与推理框架。InvEvolve建立在通过强化学习训练的大语言模型之上,能够处理需求数据以及额外的数值和文本特征,并生成具有统计安全保证的白盒库存策略,用于未来部署。我们进一步引入一个具有理论保证的统一框架,连接训练、推理和部署。这使我们能够推导出InvEvolve进化出统计上安全且改进的策略的概率下界,并刻画相对于最优安全基准的多周期性能差距。在合成数据和真实零售数据上的测试表明,InvEvolve优于经典库存策略和基于深度学习的方法。在典型库存设置中,它生成了超越现有基准的新策略。

英文摘要

We study how large language models can be used to generate inventory policies in online settings with non-stationary demand. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance on static and highly structured problems such as mathematical discovery, but is not directly suited to dynamic inventory settings with online updates. We propose InvEvolve, an end-to-end inventory policy evolution and inference framework grounded in confidence-interval-based certification. Built on a large language model trained via reinforcement learning, InvEvolve can process demand data together with additional numerical and textual features, and generates white-box inventory policies with statistical safety guarantees for future deployment. We further introduce a unified framework with theoretical guarantees that connects training, inference, and deployment. This allows us to derive a lower bound on the probability that InvEvolve evolves a statistically safe and improved policy, and to characterize the multi-period performance gap relative to the oracle-safe benchmark. Tested on both synthetic data and real-world retail data, InvEvolve outperforms classical inventory policies and deep-learning-based methods. In canonical inventory settings, it generates new policies that outperform existing benchmarks.
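The idea of confidence-interval-based certification can be sketched as follows, under loud assumptions: the abstract does not give the actual rule, so this example simply accepts a candidate policy only when a one-sided normal-approximation lower bound on its mean per-period cost saving over the incumbent is positive. The function name, the paired-cost setup, and the numbers are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def certify_improvement(incumbent_costs, candidate_costs, alpha=0.05):
    """Return (accepted, lower_bound) for the mean cost saving of the candidate.

    Assumes paired per-period costs and a normal approximation; this is an
    illustrative rule, not the certification procedure from the paper.
    """
    if len(incumbent_costs) != len(candidate_costs):
        raise ValueError("cost sequences must be paired")
    diffs = [a - b for a, b in zip(incumbent_costs, candidate_costs)]  # > 0: candidate is cheaper
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two periods to estimate variance")
    z = NormalDist().inv_cdf(1 - alpha)
    lower = mean(diffs) - z * stdev(diffs) / math.sqrt(n)
    return lower > 0, lower


# Made-up validation costs for eight periods.
accepted, lower = certify_improvement(
    [10, 12, 11, 13, 12, 11, 10, 12],
    [9, 10, 10, 11, 11, 9, 9, 10],
)
print(accepted, round(lower, 3))
```

Gating deployment on the lower bound rather than the point estimate is what makes the check conservative: a noisy candidate with the same average saving would be rejected.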

2605.01642 2026-06-08 cs.LG 版本更新

Adaptive Pluralistic Alignment: A pipeline for dynamic artificial democracy

自适应多元对齐:动态人工民主的流水线

Rachel Freedman

发表机构 * GitHub

AI总结 提出自适应多元对齐(APA)流水线,通过低秩奖励基分解和陪审团投票机制,动态追踪社会价值观演变,避免价值锁定,无需重复预训练或大规模数据收集。

详情
AI中文摘要

当前的对齐方法针对一组固定的偏好,因此随着社会规范随时间演变,存在强制价值锁定的风险。我们引入了自适应多元对齐(APA),这是一个模块化流水线,用于更新多元对齐的AI系统,以追踪不断变化的价值观并避免价值锁定,而无需重复昂贵的预训练或大规模数据收集。APA包含三个阶段:(1)通过低秩奖励基分解学习紧凑的个性化奖励模型;(2)使用这些模型作为陪审团,通过社会选择理论投票集体选择候选输出;(3)随着价值观变化,通过在固定奖励基上拟合新的注释者权重,高效地随时间调整陪审团。由此产生的系统高效、可解释、可引导且模块化。我们使用PRISM多用户对齐数据集和模拟的历史注释者实现了概念验证实例,并提供了初步分析,表明陪审团组成和投票规则的选择可以显著影响结果,尤其是在陪审团偏好异质的情况下。我们在https://github.com/RachelFreedman/apa提供完整代码和生成的偏好数据集。

英文摘要

Prevailing alignment methods target a fixed set of preferences and therefore risk forcing value lock-in as societal norms evolve over time. We introduce Adaptive Pluralistic Alignment (APA), a modular pipeline for updating pluralistically aligned AI systems to track evolving values and avoid value lock-in without repeating costly pretraining or large-scale data collection. APA has three stages: (1) learning compact personalized reward models via low-rank reward basis decomposition, (2) using these models as a jury that collectively selects among candidate outputs through social-choice-theoretic voting, and (3) efficiently adapting the jury over time by fitting new annotator weights over the fixed reward bases as values shift. The resulting system is efficient, explainable, steerable, and modular. We implement a proof-of-concept instantiation using the PRISM multi-user alignment dataset and simulated historical annotators, and provide preliminary analysis showing that jury composition and the choice of voting rule can substantially affect outcomes, particularly when jury preferences are heterogeneous. We provide full code and resulting preference datasets at https://github.com/RachelFreedman/apa.
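Stages (2) and (3) can be illustrated with a small sketch, assuming each juror's reward is a weighted sum of a few fixed basis rewards. The least-squares fit and the Borda count below are stand-ins chosen for illustration; the abstract does not say which fitting procedure or voting rules the paper uses, and all names and numbers are hypothetical.

```python
import numpy as np


def fit_annotator_weights(basis_scores, ratings):
    """Least-squares weights over fixed reward bases for one new annotator."""
    weights, *_ = np.linalg.lstsq(basis_scores, ratings, rcond=None)
    return weights


def borda_winner(jury_weights, candidate_basis_scores):
    """Index of the candidate with the highest Borda score across jurors."""
    rewards = jury_weights @ candidate_basis_scores.T   # shape: (jurors, candidates)
    ranks = rewards.argsort(axis=1).argsort(axis=1)      # 0 = least preferred
    return int(ranks.sum(axis=0).argmax())


# Three candidate outputs scored on two (made-up) reward bases.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
# Three jurors, each a weight vector over the same bases.
jury = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.4]])
winner = borda_winner(jury, candidates)

# Adapting to a new annotator only refits a small weight vector.
observed_basis = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
observed_ratings = observed_basis @ np.array([0.2, 0.8])
new_weights = fit_annotator_weights(observed_basis, observed_ratings)
print(winner, new_weights)
```

In this toy jury the compromise candidate wins under Borda even though two jurors rank another option first, which is the kind of sensitivity to voting rule and jury composition the abstract mentions.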

2605.04130 2026-06-08 cs.LG 版本更新

Constrained Extreme Gradient Boosting for Adapting Reduced-Order Models

约束极端梯度提升用于自适应降阶模型

Melika Baghi, Xiao Liu, Kamran Paynabar

发表机构 * H. Milton Stewart School of Industrial and Systems Engineering(H. Milton Stewart工业与系统工程学院)

AI总结 提出约束极端梯度提升(cXGBoost)框架,通过Grassmann流形上的几何表示和范数约束,预测参数依赖的POD基,实现高效自适应的降阶建模。

Comments Preprint. Under review. 4 numerical examples

详情
AI中文摘要

高保真仿真(如计算流体动力学和有限元分析)对于建模复杂工程系统至关重要,但在参数研究、优化和实时控制等任务中往往成本过高。基于投影的降阶模型(ROM)通过将控制动力学投影到低维子空间来缓解这一成本。然而,其性能在参数变化下可能恶化,因此需要自适应基构造。在这项工作中,我们提出了一种约束集成学习框架,称为约束极端梯度提升(cXGBoost),用于预测作为系统参数函数的本征正交分解(POD)基。该方法利用Grassmann流形上子空间的几何表示,将其映射到欧几里得空间,以便使用梯度提升树进行高效回归。在训练过程中施加范数约束,以确保逆映射的有效性并保持预测子空间的几何结构。所提出的方法在四个数值示例(包括流体动力学和波传播问题)上进行了评估,证明了其能够准确预测参数依赖的基,同时在非线性区域内保持鲁棒性。这些结果凸显了将几何学习与约束集成方法相结合,用于高维参数系统可扩展且可靠的降阶建模的潜力。

英文摘要

High-fidelity simulations, such as computational fluid dynamics and finite element analysis, are essential for modeling complex engineering systems but are often prohibitively expensive for tasks including parametric studies, optimization, and real-time control. Projection-based reduced-order models (ROMs) alleviate this cost by projecting the governing dynamics onto low-dimensional subspaces. However, their performance can deteriorate under parameter variation, motivating the need for adaptive basis construction. In this work, we propose a constrained ensemble learning framework, termed Constrained Extreme Gradient Boosting (cXGBoost), for predicting Proper Orthogonal Decomposition (POD) bases as functions of system parameters. The approach leverages a geometric representation of subspaces on the Grassmann manifold, which are mapped to a Euclidean space to enable efficient regression using gradient boosting trees. A norm constraint is imposed during training to ensure the validity of the inverse mapping and preserve the geometric structure of the predicted subspaces. The proposed method is evaluated on four numerical examples, including fluid dynamics and wave propagation problems, demonstrating its ability to accurately predict parameter-dependent bases while maintaining robustness across nonlinear regimes. These results highlight the potential of combining geometric learning with constrained ensemble methods for scalable and reliable reduced-order modeling of high-dimensional parametric systems.
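The mapping between subspaces and a Euclidean space can be sketched with the standard Grassmann logarithm and exponential maps. This is a generic NumPy illustration, not the paper's code: the function names are made up, and the remark about the norm bound is one plausible reading of why a constraint helps, not a statement of the constraint the authors use.

```python
import numpy as np


def grassmann_log(base, target):
    """Tangent vector at span(base) pointing to span(target).

    Both inputs are n-by-k matrices with orthonormal columns; base.T @ target
    must be invertible (all principal angles below pi/2).
    """
    m = base.T @ target
    normal_part = (target - base @ m) @ np.linalg.inv(m)
    u, s, vt = np.linalg.svd(normal_part, full_matrices=False)
    return u @ np.diag(np.arctan(s)) @ vt


def grassmann_exp(base, tangent):
    """Orthonormal basis of the subspace reached from span(base) along tangent."""
    u, s, vt = np.linalg.svd(tangent, full_matrices=False)
    return base @ vt.T @ np.diag(np.cos(s)) @ vt + u @ np.diag(np.sin(s)) @ vt


rng = np.random.default_rng(0)
base, _ = np.linalg.qr(rng.standard_normal((6, 2)))
target, _ = np.linalg.qr(base + 0.3 * rng.standard_normal((6, 2)))

tangent = grassmann_log(base, target)      # Euclidean regression target
recovered = grassmann_exp(base, tangent)   # inverse mapping back to a basis
print(np.linalg.norm(tangent, 2))          # largest principal angle, in radians
```

The singular values of the tangent matrix are the principal angles between the two subspaces, so the round trip is only well defined while they stay below pi/2. Keeping a regressor's predicted tangent vectors within such a bound is one way a norm constraint could protect the inverse mapping.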

2605.05220 2026-06-08 cs.LG cs.AI 版本更新

MidSteer: Optimal Affine Framework for Steering Generative Models

MidSteer:用于引导生成模型的最优仿射框架

Tatiana Gaintseva, Andrew Stepanov, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi

发表机构 * University of Basel(巴塞尔大学) University of California, Berkeley(加州大学伯克利分校) ETH Zurich(苏黎世联邦理工学院) University of Cambridge(剑桥大学) University of Washington(华盛顿大学)

AI总结 本文提出MidSteer,一种基于仿射变换的最优概念引导框架,通过最小干扰实现生成模型中的概念切换,并在视觉扩散模型和大型语言模型上验证其有效性。

详情
AI中文摘要

引导中间表示已成为控制生成模型的一种强大策略,特别是在部署后对齐和安全设置中。然而,尽管其经验成功,目前仍缺乏全面的理论框架。在本文中,我们通过形式化概念引导的理论来弥补这一差距。首先,我们在引导和仿射概念擦除之间建立联系,证明去除不期望行为的标准方法是LEACE(一种用于仿射擦除的闭式方法)的特例。接下来,我们为概念切换LEACE-Switch制定了一个原则性的理论框架,并描述了其提供最优仿射解的假设。基于这一分析,我们引入了MidSteer(最小干扰概念引导),一个更通用的用于概念操作的仿射框架,它放宽了这些假设并实现了定向的最小干扰变换。我们证明MidSteer在一系列任务、模态和架构(包括视觉扩散模型和大型语言模型)中表现良好。

英文摘要

Steering intermediate representations has emerged as a powerful strategy for controlling generative models, particularly in post-deployment alignment and safety settings. However, despite its empirical success, it currently lacks a comprehensive theoretical framework. In this paper, we bridge this gap by formalizing the theory of concept steering. First, we establish a link between steering and affine concept erasure, proving that the standard approach for removing unwanted behaviors is a special case of LEACE (a closed-form method for affine erasure). Next, we formulate a principled theoretical framework for concept switching, LEACE-Switch, and characterize the assumptions under which it provides an optimal affine solution. Building on this analysis, we then introduce MidSteer (Minimal Disturbance concept Steering), a more general affine framework for concept manipulation that relaxes these assumptions and enables directed, minimal-disturbance transformations. We demonstrate that MidSteer performs favorably across a range of tasks, modalities, and architectures, including vision diffusion models and large language models.

2605.05225 2026-06-08 cs.LG cs.AI 版本更新

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

MACS: 模态感知容量缩放用于高效多模态MoE推理

Bo Li, Chuan Wu, Shaolin Zhu

发表机构 * School of Software, Tsinghua University, Beijing, China(清华大学软件学院,北京,中国) TJUNLP Lab, School of Computer Science and Technology, Tianjin University, China(天津大学计算机科学与技术学院,中国) School of New Media and Communication, Tianjin University, China(天津大学新媒体与传播学院,中国)

AI总结 针对多模态MoE大模型在专家并行推理中因信息异质性和模态动态性导致的效率瓶颈,提出无需训练的MACS框架,通过熵加权负载和动态模态自适应容量机制优化资源分配,显著提升多模态基准性能。

Comments Accepted by ACL 2026

详情
AI中文摘要

混合专家多模态大语言模型(MoE MLLMs)在专家并行(EP)推理过程中因落后者效应而遭受显著的效率瓶颈。在多模态背景下,这一问题更加严重,因为现有的基于token计数的负载均衡方法无法解决两个独特挑战:(1)信息异质性,其中大量冗余的视觉token与语义关键的token被同等对待;(2)模态动态性,不同任务中视觉与文本比例的变化导致资源错配。为应对这些挑战,我们提出MACS(模态感知容量缩放),一种无需训练的推理框架。具体而言,MACS引入熵加权负载机制来量化视觉token的语义价值,解决信息异质性。此外,动态模态自适应容量机制根据输入的实时模态组成分配专家资源。大量实验表明,MACS在各种多模态基准上显著优于现有方法,为MoE MLLMs在EP推理中的高效部署提供了新颖且稳健的解决方案。

英文摘要

Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-count-based load balancing methods fail to address two unique challenges: (1) Information Heterogeneity, where numerous redundant visual tokens are treated equally to semantically critical ones, and (2) Modality Dynamics, where varying visual to text ratios across tasks lead to resource misallocation. To address these challenges, we propose MACS (Modality-Aware Capacity Scaling), a training-free inference framework. Specifically, MACS introduces an Entropy-Weighted Load mechanism to quantify the semantic value of visual tokens, addressing information heterogeneity. Additionally, the Dynamic Modality-Adaptive Capacity mechanism allocates expert resources based on the real-time modal composition of the input. Extensive experiments demonstrate that MACS significantly outperforms existing methods on various multimodal benchmarks, providing a novel and robust solution for the efficient deployment of MoE MLLMs in EP inference.

2605.08692 2026-06-08 cs.LG cs.CL 版本更新

AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization

AAAC: 面向4位LLM权重量化的激活感知自适应码本

Beshr IslamBouli, David Jin

发表机构 * University of Waterloo(滑铁卢大学)

AI总结 提出AAAC方法,通过每层两个小型学习码本(64字节)替代固定标量码本,以激活加权重建误差最小化选择码本,实现零额外存储开销的4位权重量化,在3-30分钟内完成量化,精度优于现有方法。

详情
AI中文摘要

训练后仅权重量化至4位被广泛用于减少大语言模型推理的内存和计算成本。现有的PTQ方法,如AWQ和GPTQ,通过缩放、裁剪或误差补偿改进权重映射到固定4位网格的方式。为进一步提高精度,OmniQuant和QuIP#等方法使用梯度辅助算法,但需要数小时的量化时间。在这项工作中,我们提出AAAC(激活感知自适应码本),一种用于4位LLM权重量化的轻量级方法。AAAC用每层两个小型学习标量码本(64字节)替换标准量化中使用的固定标量码本。每组权重选择使激活加权重建误差最小的码本,将选择编码在组正缩放的未使用符号位中,并增加零存储开销。AAAC在单个GPU上3-30分钟内完成,且不增加模型本身之外的额外内存。我们跨模型族与AWQ、GPTQ、IF4、GPTVQ、OmniQuant、SqueezeLLM和QuIP#进行评估。AAAC在量化时间少几个数量级的情况下优于基线方法。

英文摘要

Post-training weight-only quantization to 4 bits is widely used to reduce the memory and compute costs of large language model inference. Existing PTQ methods, such as AWQ and GPTQ, improve how weights are mapped onto a fixed 4-bit grid through scaling, clipping, or error compensation. To further improve accuracy, methods such as OmniQuant and QuIP\# use gradient-assisted algorithms at the cost of hours of quantization time. In this work, we propose AAAC (Activation-Aware Adaptive Codebooks), a lightweight method for 4-bit LLM weight quantization. AAAC replaces the fixed scalar codebook used in standard quantization with two small learned scalar codebooks (64 bytes) per layer. Each group of weights selects the codebook that minimizes activation-weighted reconstruction error, encoding the choice in the unused sign bit of the group's positive scale and adding zero storage overhead. AAAC completes in 3--30 minutes on a single GPU, and adds no memory beyond the model itself. We evaluate against AWQ, GPTQ, IF4, GPTVQ, OmniQuant, SqueezeLLM, and QuIP\# across model families. AAAC outperforms baselines at orders-of-magnitude less quantization time.
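The per-group codebook choice and the sign-bit trick can be sketched as below. Everything specific here is an assumption for illustration: the absmax scale, the two example codebooks (which are fixed rather than learned), the per-weight activation weights, and the function names. Only the overall mechanism, picking the codebook with the lower activation-weighted error and storing that choice in the sign of an otherwise positive scale, comes from the abstract.

```python
import numpy as np


def quantize_group(weights, act_weight, codebooks):
    """Return (4-bit indices, signed scale) for one weight group.

    The sign of the returned scale records which of the two codebooks was used.
    """
    scale = float(np.abs(weights).max()) or 1.0          # hypothetical absmax scale
    best = None
    for choice, codebook in enumerate(codebooks):
        idx = np.abs(weights[:, None] / scale - codebook[None, :]).argmin(axis=1)
        err = float(np.sum(act_weight * (weights - scale * codebook[idx]) ** 2))
        if best is None or err < best[0]:
            best = (err, choice, idx)
    _, choice, idx = best
    stored_scale = -scale if choice == 1 else scale      # sign bit carries the choice
    return idx.astype(np.uint8), stored_scale


def dequantize_group(idx, stored_scale, codebooks):
    choice = 1 if np.signbit(stored_scale) else 0
    return abs(stored_scale) * codebooks[choice][idx]


grid = np.linspace(-1.0, 1.0, 16)
codebooks = [grid, np.sign(grid) * grid ** 2]             # uniform vs. denser near zero
weights = 0.5 * codebooks[1][[0, 3, 7, 15]]               # toy group lying on codebook 1
act_weight = np.array([1.0, 4.0, 0.5, 2.0])               # e.g. mean squared activations
idx, stored_scale = quantize_group(weights, act_weight, codebooks)
restored = dequantize_group(idx, stored_scale, codebooks)
print(idx, stored_scale)
```

Because a quantization scale is positive by construction, its sign bit is free to carry one bit of side information, which is why the selection costs no extra storage.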

2605.15354 2026-06-08 cs.LG 版本更新

Controllable Molecular Generative Foundation Models

可控分子生成基础模型

Yihan Zhu, Yuhan Liu, Weijiang Li, Tengfei Luo, Meng Jiang

发表机构 * University of Notre Dame(圣母大学)

AI总结 提出CoMole,一种基于基团感知图扩散的统一框架,结合强化学习优化条件反向策略,在材料与药物设计的九个目标上均实现最优可控性,MAE最高降低48.2%,且无需规则修正。

详情
AI中文摘要

尽管基础模型在语言和视觉领域取得了成功,分子图生成仍然缺乏一个统一的框架来处理异构设计任务并具有可靠的可控性。虽然强化学习(RL)为任务特定优化提供了一种自然的后训练机制,但将其应用于图生成模型受到巨大的原子级动作空间和化学无效中间状态的阻碍。我们提出了\textbf{Co}ntrollable \textbf{Mole}cular Generative Foundation Models(CoMole),该模型基于统一的基团感知图扩散流程构建。通过学习基团感知图空间,CoMole将预训练的结构先验转化为可控生成,其中RL优化基于化学有意义决策的条件反向策略。我们从理论上刻画了原子级RL的瓶颈,并论证了基团感知策略优化的合理性。在涵盖材料和药物发现的三个异构基准测试中,CoMole在所有九个目标上均排名第一,相对于最强基线,MAE最高降低48.2%,并且在不依赖规则修正或事后过滤的情况下,有效性保持在0.94以上。我们进一步证明,CoMole通过仅优化任务嵌入而冻结生成器,将可控性迁移到未见属性,其性能与强大的任务特定基线相当。

英文摘要

Despite the success of foundation models in language and vision, molecular graph generation still lacks a unified framework for heterogeneous design tasks with reliable controllability. While reinforcement learning (RL) offers a natural post-training mechanism for task-specific optimization, applying it to graph generative models is hindered by the vast atom-wise action spaces and chemically invalid intermediate states. We propose \textbf{Co}ntrollable \textbf{Mole}cular Generative Foundation Models (CoMole), built with a unified motif-aware graph diffusion pipeline. By learning a motif-aware graph space, CoMole transfers pretrained structural priors into controllable generation, where RL optimizes conditional reverse policies over chemically meaningful decisions. We theoretically characterize the bottleneck of atom-level RL and justify motif-aware policy optimization. Across three heterogeneous benchmarks spanning materials and drug discovery, CoMole ranks first in controllability on all nine targets, reduces MAE by up to 48.2% relative to the strongest baselines, and maintains validity above 0.94 without rule-based correction or post-hoc filtering. We further show that CoMole transfers controllability to unseen properties by optimizing only task embeddings with the generator frozen, achieving performance competitive with strong task-specific baselines.

2605.15888 2026-06-08 cs.LG cs.AI 版本更新

CHoE: Cross-Domain Heterogeneous Graph Prompt Learning via Structure-Conditioned Experts

CHoE: 基于结构条件专家的跨域异构图提示学习

Peiyuan Li, Yongqi Huang, Jitao Zhao, Dongxiao He, Di Jin, Weixiong Zhang

发表机构 * School of Computer Science and Technology, Tianjin University(天津大学计算机科学与技术学院) Department of Health Technology and Informatics, and Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University(香港理工大学健康科技与信息学系、数据科学与人工智能系)

AI总结 提出CHoE方法,通过结构条件专家网络和结构感知路由机制,实现跨域异构图提示学习,在少样本跨域任务中优于基线方法。

Comments accepted by IJCAI 2026, 9 pages, 4 figures

详情
AI中文摘要

异构图提示学习(HGPL)已成为弥合预训练基础模型目标与其在下游异构图中应用之间差距的有前景范式。然而,现有HGPL方法主要针对域内场景设计,而实际部署通常跨越多个域,且预训练和下游任务的数据可能来自不同分布。因此,当前HGPL方法的适用性仅限于域内设置,当应用域发生变化时,其性能通常会下降。为解决这一严重限制,我们开发了CHoE,一种基于专家网络的跨域HGPL方法。在预训练期间,我们引入并训练结构条件专家;在提示调优期间,我们采用结构感知的专家路由和负载均衡机制,为每个元路径视图选择结构兼容的专家。此外,我们设计了一个基于提示的语义融合模块,以整合多个视图的表示用于下游预测。大量实验表明,CHoE在少样本跨域应用中持续提升性能,优于所有基线方法。

英文摘要

Heterogeneous Graph Prompt Learning (HGPL) has emerged as a promising paradigm for bridging the gap between the objectives of pre-training foundation models and their downstream applications in heterogeneous graph settings. However, existing HGPL methods are primarily designed for in-domain scenarios, whereas real-world deployments often span multiple domains, and the data used for pre-training and downstream tasks may originate from different distributions. Consequently, the applicability of current HGPL approaches is limited to in-domain settings, and their performance typically degrades when application domains shift. To address this serious limitation, we develop CHoE, a cross-domain HGPL method built upon an expert network. During pre-training, we introduce and train structure-conditioned experts, and during prompt tuning, we adopt a structure-aware expert routing and load balancing mechanism to select structurally compatible experts for each meta-path view. In addition, we design a prompt-based semantic fusion module to integrate representations across multiple views for downstream prediction. Extensive experiments show that CHoE consistently improves performance in few-shot cross-domain applications, outperforming all baseline approaches.

2605.17333 2026-06-08 cs.LG 版本更新

Leveraging Error Diversity in Group Rollouts for Reinforcement Learning

利用群体回滚中的误差多样性进行强化学习

Wenpu Liu, Yuqi Xu, Weichu Xie, Yongfu Zhu, Shuai Dong, Ziyue Wang, Wenqi Shao, Xiaoying Zhang, Tong Yang, Nan Duan, Jiaqi Wang

发表机构 * Peking University(北京大学) JD.COM(京东公司) Shanghai Innovation Institute(上海创新研究院)

AI总结 本文提出EDAS方法,通过利用群体回滚中的误差多样性来提升强化学习的效果,通过调整错误回滚的优势信号,鼓励模型保持多样化的推理路径,从而提高训练成功率。

Comments Code available at https://github.com/EDAS-jd/EDAS

详情
AI中文摘要

基于可验证奖励的强化学习(RLVR)通常为每个提示生成多个响应并根据个体正确性分配二元奖励,但群体输出的整体结构,特别是误差分布,通常被忽视。我们发现这是一个被忽视的机会:实证分析表明,群体内部的误差多样性是训练成功的重要预测因素,那些产生多样化错误回答的问题比产生同质性失败的问题更能从RLVR中获益。受此启发,我们提出了误差多样性优势塑造(EDAS),一种轻量、算法无关的技术,通过群体内部误差多样性调节错误回滚的优势信号。EDAS放大对主导、重复性错误的惩罚,减弱对罕见、探索性错误的惩罚,从而鼓励模型保持多样化推理路径,防止错误固着。关键的是,EDAS作为一种简单的后处理调整,可以无缝集成到任何RLVR算法中。我们在多个主流RLVR方法上验证了EDAS,展示了在一系列模型和七个具有挑战性的数学基准测试中的持续改进。值得注意的是,EDAS在七个基准测试中对Qwen3-8B的DAPO平均改进了6.29分,证实了利用群体回滚中的潜在信息是增强RLVR的有效策略。

英文摘要

Reinforcement Learning from Verifiable Rewards (RLVR) typically samples multiple responses per prompt and assigns binary rewards based on individual correctness, yet the collective structure of the group output, specifically the distribution of errors, is largely discarded. We identify this as a missed opportunity: empirical analysis reveals that error diversity within a group is a strong predictor of training success, with problems eliciting diverse wrong answers benefiting substantially more from RLVR than those producing homogeneous failures. Motivated by this observation, we propose Error Diversity Advantage Shaping (EDAS), a lightweight, algorithm-agnostic technique that modulates the advantage signal for incorrect rollouts based on intra-group error diversity. EDAS amplifies penalties for dominant, repeated errors and attenuates penalties for rare, exploratory ones, thereby encouraging the model to maintain diverse reasoning paths and discouraging error perseveration. Crucially, EDAS operates as a simple post-hoc adjustment that can be seamlessly integrated into any RLVR algorithm. We validate EDAS on top of several mainstream RLVR methods across a series of models and seven challenging math benchmarks, demonstrating consistent improvements. Notably, EDAS yields an average improvement of 6.29 points over DAPO on Qwen3-8B across seven benchmarks, confirming that exploiting the latent information in group rollouts is a broadly effective strategy for strengthening RLVR.
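A minimal sketch of the advantage-shaping idea follows, with the caveat that the abstract does not give the formula. The scaling rule below (penalty multiplied by one plus a strength times the gap between an error's share and the uniform share) is a hypothetical choice that merely has the stated direction: repeated wrong answers are penalized more, rare ones less. The function name and the mean-reward baseline are also assumptions.

```python
from collections import Counter


def shape_advantages(answers, correct, strength=0.5):
    """Rescale the advantages of incorrect rollouts by how common their error is.

    Uses a plain mean-reward baseline; 0 <= strength <= 1 keeps all factors positive.
    """
    rewards = [1.0 if ok else 0.0 for ok in correct]
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]

    wrong = [a for a, ok in zip(answers, correct) if not ok]
    if not wrong:
        return advantages
    counts = Counter(wrong)
    uniform_share = 1.0 / len(counts)        # share if all distinct errors were equally common

    shaped = []
    for answer, ok, adv in zip(answers, correct, advantages):
        if ok:
            shaped.append(adv)               # correct rollouts are left untouched
        else:
            share = counts[answer] / len(wrong)
            shaped.append(adv * (1.0 + strength * (share - uniform_share)))
    return shaped


# One prompt, five rollouts: one correct, a dominant wrong answer, one rare wrong answer.
shaped = shape_advantages(["42", "17", "17", "17", "9"], [True, False, False, False, False])
print(shaped)
```

Because the adjustment only rescales advantages that the base algorithm has already computed, it can sit after any group-based estimator, which matches the post-hoc framing in the abstract.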

2605.21731 2026-06-08 cs.LG 版本更新

I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models

I-SAFE:用于科学AI模型结构审计的Wasserstein一致性度量

Barbara Tarantino, Gennaro Auricchio, Paolo Giudici

发表机构 * Department of Economics, University of Pavia(帕维亚大学经济学系) Department of Mathematics, University of Padua(帕多瓦大学数学系)

AI总结 本文提出I-SAFE框架,通过Wasserstein一致性度量对科学AI模型进行结构审计,揭示模型在分布响应上的差异,为科学AI模型提供更可靠的评估方法。

详情
AI中文摘要

深度学习模型在科学预测任务中被越来越多地使用,其中强大的基准性能常被解释为模型行为具有科学意义的证据。然而,这种解释是脆弱的,因为模型可能利用捷径特征、数据集特定的规律或分布偏见,这些在留出数据上具有预测性,但与领域相关的结构不一致。为了解决这一限制,我们引入了I-SAFE(Interventional Secure, Accurate, Fair and Explainable)框架,这是一个面向科学AI模型的事后分布审计框架,核心是Wasserstein一致性度量(WCM)。给定一个训练好的黑盒预测器和一个编码了任务相关输入结构领域知识的外部结构先验,该框架评估原始模型输出在结构引导的输入扰动下的表现。所提出的审计通过三个互补的指标度量输出分布的一致性:基于分位数的度量(QBM)用于位置级一致性,WCM用于顺序一致性,以及一个平移不变的WCM变体用于形状一致性。我们通过药物-靶点相互作用(DTI)预测在Davis激酶基准、KLIFS(激酶-配体相互作用指纹和结构)结合口袋注释以及三个基于序列的DTI模型:DeepConvDTI、DeepDTA和TAPB上实例化I-SAFE。尽管这些模型在可比的预测范围内运行,I-SAFE揭示了显著不同的分布响应特征,这种差异在基于准确性的评估中是不可见的。该框架是模型无关的,适用于任何输入具有结构分解和外部先验可用的领域。

英文摘要

Deep learning models are increasingly used in scientific prediction tasks where strong benchmark performance is often interpreted as evidence of scientifically meaningful behavior. This interpretation is fragile, as models may exploit shortcut features, dataset-specific regularities, or distributional biases that are predictive on held-out data but not aligned with domain-relevant structure. To address this limitation, we introduce the \textsc{I-SAFE} (Interventional Secure, Accurate, Fair and Explainable) framework, a post-hoc distributional auditing framework for scientific AI models centered on the Wasserstein Coherence Metric (WCM). Given a trained black-box predictor and an external structural prior encoding domain knowledge about task-relevant input structure, \textsc{I-SAFE} evaluates raw model outputs under structurally guided perturbations of the input. The proposed audit measures output-distribution coherence through three complementary metrics: a Quantile-Based Metric (QBM) for location-level coherence, the WCM for ordinal coherence, and a translation-invariant WCM variant for shape coherence. We instantiate \textsc{I-SAFE} on drug--target interaction (DTI) prediction using the Davis kinase benchmark, KLIFS (Kinase--Ligand Interaction Fingerprints and Structures) binding-pocket annotations, and three sequence-based DTI models: DeepConvDTI, DeepDTA, and TAPB. Although the models operate in a comparable predictive regime, \textsc{I-SAFE} reveals substantially different distributional response profiles, a distinction invisible to accuracy-based evaluation. The framework is model-agnostic and applicable to any domain where inputs admit a structured decomposition and an external prior is available.
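To make the distributional comparison concrete, here is a small sketch of a one-dimensional Wasserstein-1 distance between two sets of model outputs, plus a mean-centered variant that ignores pure shifts. The abstract does not define QBM or WCM, so these functions are generic building blocks under that assumption, not the paper's metrics; the names and data are hypothetical.

```python
import numpy as np


def wasserstein_1d(a, b):
    """W1 distance between two equal-size empirical samples (sorted matching)."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))


def centered_wasserstein_1d(a, b):
    """Shift-insensitive variant: compare shapes after removing each sample's mean."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return wasserstein_1d(a - a.mean(), b - b.mean())


# Made-up predictions before and after perturbing a structurally relevant input region.
baseline_outputs = np.array([0.0, 1.0, 2.0, 3.0])
perturbed_outputs = baseline_outputs + 2.0

shift = wasserstein_1d(baseline_outputs, perturbed_outputs)
shape_change = centered_wasserstein_1d(baseline_outputs, perturbed_outputs)
print(shift, shape_change)
```

In this toy case the perturbation moves every output by the same amount, so the plain distance reports the shift while the centered variant reports no change in shape. Two models with similar accuracy could still differ on either quantity.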

2605.25054 2026-06-08 cs.LG cs.AI 版本更新

Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

按需扩展:自适应神经元级混合精度量化感知训练

Ayush K. Varshney, Konstantinos Vandikas, Šarūnas Girdzijauskas, Adam Orucu, Aneta Vulgarakis Feljan

发表机构 * University of California, Berkeley(加州大学伯克利分校) DeepMind(深度思维) University of Cambridge(剑桥大学)

AI总结 提出神经元级混合精度量化感知训练(NMP-QAT),通过可微代理和直通估计器让每个神经元独立学习离散精度,实现按需扩展位宽,在MLP和表格基础模型上取得更优的压缩-精度权衡。

Comments Accepted at ICML - GlobalSouthML workshop, 2026

详情
AI中文摘要

在资源受限的6G边缘设备上部署深度神经网络需要激进压缩且最小化精度损失。量化感知训练(QAT)已成为领先的压缩方法;然而,现有的混合精度方法通常以粗粒度的层或通道级别操作。这些方法通常依赖启发式或基于搜索的位分配策略,可能忽略神经元级别的细粒度变异性。我们提出神经元级混合精度QAT(NMP-QAT),其中每个神经元在训练期间独立学习自己的离散精度。从低位精度开始,NMP-QAT仅在训练信号需要时通过可微代理和直通估计器扩展位宽,同时保持完全离散的推理图。这种适应性扩展到权重和激活,减少内存移动。在电信和非电信数据集上,跨MLP和表格基础模型架构评估,NMP-QAT相比混合精度QAT基线实现了更优的压缩-精度权衡,使其非常适合网络边缘的绿色AI部署。

英文摘要

Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing mixed-precision methods typically operate at coarse layer- or channel-level granularity. These methods often rely on heuristic or search-based bit-allocation strategies, which may overlook fine-grained variability at the neuron level. We propose Neuron-Level Mixed-Precision QAT (NMP-QAT), where each neuron independently learns its own discrete precision during training. Starting from low-bit precision, NMP-QAT expands bit-width only when training signals demand it, via differentiable surrogates and straight-through estimators, while preserving a fully discrete inference graph. This adaptability extends to both weights and activations, reducing memory movement. Evaluated on telecom and non-telecom datasets across MLP and tabular foundation model architectures, NMP-QAT achieves superior compression-accuracy trade-offs over mixed-precision QAT baselines, making it well-suited for Green AI deployments at the network edge.
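What per-neuron precision means in the forward pass can be sketched with a simple fake-quantization routine that gives each output neuron (one row of the weight matrix) its own bit-width. This NumPy example covers only the forward computation with fixed, hand-picked bit-widths. Learning the bit-widths with differentiable surrogates, as the abstract describes, is not reproduced here, and the function name and symmetric quantizer are assumptions.

```python
import numpy as np


def fake_quantize_rows(weights, bits):
    """Symmetric uniform quantization with a separate bit-width (>= 2) per row.

    During training, a straight-through estimator would treat the rounding as
    the identity in the backward pass; only the forward pass is shown here.
    """
    out = np.empty_like(weights, dtype=float)
    for i, (row, b) in enumerate(zip(weights, bits)):
        if b < 2:
            raise ValueError("this sketch needs at least 2 bits per neuron")
        levels = 2 ** (b - 1) - 1                      # e.g. 1 for 2 bits, 127 for 8 bits
        peak = float(np.abs(row).max())
        scale = peak / levels if peak > 0 else 1.0
        out[i] = np.clip(np.round(row / scale), -levels, levels) * scale
    return out


row = np.array([0.1, -0.5, 0.9, 0.3])
weights = np.stack([row, row])                          # two neurons with identical weights
quantized = fake_quantize_rows(weights, bits=[2, 8])
errors = np.abs(quantized - weights).max(axis=1)
print(errors)
```

The two rows start identical, so the gap in reconstruction error comes entirely from the bit-width, which is the quantity a neuron-level scheme would raise only where the training signal calls for it.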

2605.25413 2026-06-08 cs.LG cs.AI cs.NA math.NA 版本更新

Autoregression-Free Neural Operators for Time-Dependent PDEs

无自回归的神经算子用于时间相关偏微分方程

Jiaquan Zhang, Caiyan Qin, Haoyu Bian, Libin Cai, Yi Lu, Chaoning Zhang, Wei Dong, Yuanfang Guo, Yang Yang, Heng Tao Shen

发表机构 * School of Computer Science and Engineering, University of Electronic Science and Technology of China(电子科技大学计算机科学与工程学院) School of Robotics and Advanced Manufacture, Harbin Institute of Technology(哈尔滨工业大学机器人与先进制造学院) School of Mathematical Sciences, Capital Normal University(首都师范大学数学学院) College of Information and Control Engineering, Xi’an University of Architecture and Technology(西安建筑科技大学信息与控制工程学院) Laboratory of Intelligent Recognition and Image Processing, School of Computer Science and Engineering, Beihang University(北京航空航天大学智能识别与图像处理实验室) School of Computer Science and Technology, Tongji University(同济大学计算机科学与技术学院)

AI总结 提出AFNO,通过将PDE时间演化映射到潜空间并利用流匹配学习连续时间向量场,避免自回归展开,实现长期稳定预测。

Comments 23 pages, 18 figures

详情
AI中文摘要

神经算子学习从函数依赖输入到解的映射,为求解偏微分方程(PDE)提供了有效框架。对于时间相关PDE,现有方法通常通过在高维物理场空间中直接进行自回归展开来执行长时域预测,其中每个预测状态被递归地反馈作为下一步的输入。尽管对短期预测有效,但这种自回归展开以及缺乏连续时间建模导致长时域展开中误差逐渐累积。在这项工作中,我们提出无自回归神经算子(AFNO),将PDE的时间演化映射到潜空间并在其中建模连续时间向量场。AFNO使用流匹配来学习潜向量场,从而能够在扩展时域上实现连续演化,避免自回归展开,并通过显式条件化物理参数来捕捉不同参数配置下的动力学。对六个PDE的理论分析和广泛实验表明,与基线相比,AFNO提高了长时域预测稳定性并持续减少了展开误差。

英文摘要

Neural operators learn mappings from function-dependent inputs to solutions, providing an effective framework for solving partial differential equations (PDEs). For time-dependent PDEs, existing methods typically perform long-horizon prediction through autoregressive rollout directly in high-dimensional physical field spaces, where each predicted state is recursively fed back as the input for the next step. Although effective for short-term prediction, this autoregressive rollout and the lack of continuous-time modeling lead to progressive error accumulation over long-horizon rollouts. In this work, we propose Autoregression-Free Neural Operators (AFNO), which map the time evolution of PDEs into a latent space and model continuous-time vector fields within it. AFNO uses flow matching to learn the latent vector field, thereby enabling continuous evolution over extended horizons, avoiding autoregressive rollout and capturing dynamics under varying parameter configurations through explicit conditioning on physical parameters. Theoretical analysis and extensive experiments on six PDEs demonstrate that AFNO improves long-horizon prediction stability and consistently reduces rollout errors compared with the baselines.

2605.25451 2026-06-08 cs.LG 版本更新

BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training

BigMac: 打破多模态大语言模型训练中的计算与内存帕累托前沿

Zili Zhang, Chengxu Yang, Shenglong Zhang, Chenyu Wang, Yufan Zhang, Tuo Dai, Zhouyang Li, Yuhong Ge, Chao Jin, Xin Jin, Yuliang Liu

发表机构 * Peking University(北京大学) Independent Researcher(独立研究员) Xiaohongshu, Inc(小红书公司)

AI总结 提出BigMac训练流水线,通过嵌套编码器和生成器计算到LLM流水线中,同时优化计算效率和内存使用,打破帕累托前沿。

详情
AI中文摘要

训练多模态大语言模型(MLLMs)面临模型和数据的异构性挑战。现有系统重新设计训练流水线以应对这些挑战,但仍受限于计算与内存效率之间的帕累托前沿,只能以牺牲一方为代价改进另一方。我们提出BigMac,一种新的多模态大语言模型训练流水线。BigMac的核心思想是将编码器和生成器的计算优雅地嵌套到原始LLM流水线中,形成依赖安全的嵌套流水线结构。通过这种设计,BigMac将编码器和生成器的激活内存复杂度降低到O(1),同时保持LLM的激活内存复杂度不变。同时,它实现了与具有无限内存的理想设置相同的计算效率。因此,BigMac打破了计算效率与内存使用之间的帕累托前沿,使得在MLLM训练中能够同时优化计算和内存。我们在多个MLLM和训练负载上评估了BigMac。实验结果表明,与基线系统相比,BigMac实现了1.08倍至1.9倍的训练加速,同时随着批次大小的增加保持稳定的内存使用。

英文摘要

Training multimodal large language models (MLLMs) is challenged by both model and data heterogeneity. Existing systems redesign the training pipeline to address these challenges, but remain bound by a Pareto frontier between compute and memory efficiency, improving one only at the expense of the other. We present BigMac, a new training pipeline for multimodal LLMs. The core idea of BigMac is to elegantly nest the encoder and generator computation into the original LLM pipeline, forming a dependency-safe nested pipeline structure. With this design, BigMac reduces the activation memory complexity of the encoder and generator to O(1) while keeping the activation memory complexity of the LLM unchanged. At the same time, it achieves the same computational efficiency as the idealized setting with unlimited memory. As a result, BigMac breaks the Pareto frontier between computational efficiency and memory usage, enabling simultaneous optimization of both computation and memory in MLLM training. We evaluate BigMac on multiple MLLMs and training workloads. Experimental results show that BigMac achieves a 1.08$\times$ to 1.9$\times$ training speedup over baseline systems while maintaining stable memory usage as batch size increases.

2605.29223 2026-06-08 cs.LG 版本更新

Inferring the Size of Large Language Models From Popular Text Memorization

从流行文本记忆推断大型语言模型的规模

Ivica Nikolic

发表机构 * National University of Singapore(新加坡国立大学)

AI总结 提出一种黑盒方法,通过分析模型对流行文本的记忆准确性,仅从生成文本推断LLM参数规模的下界,并验证了开源和闭源模型。

详情
AI中文摘要

最广泛使用的大型语言模型(LLM)的参数数量通常被其开发者隐瞒,使得模型规模——解释能力和成本的主要参考点——在很大程度上未被公开。我们提出了一种黑盒方法,仅从生成的文本输出推断LLM规模的下界,除了提交文本片段和观察下一个词预测的能力外,不需要任何其他条件。我们的方法基于一个关键观察:流行的、广泛传播的文本——如古典文学、宗教文本和基础文档——几乎存在于每个大规模预训练语料库中,而模型在不同长度文本片段上预测下一个词的准确度是其记忆程度的可靠信号,而记忆程度又从根本上受到其总参数数量的限制。我们将来自不同文本和片段长度的记忆信号聚合成每个模型的单一准确率轮廓向量,并在此基础上构建了两种互补的推断方法:一种成对统计检验,用于确定两个模型中哪个更大;以及一种缩放律估计器,通过主成分分析(PCA)从这些向量中提取一维潜在指数,将聚合信号映射到参数数量。在广泛的开源模型上验证,两种方法都产生了准确可靠的下界。当应用于流行的闭源模型时,我们的框架恢复了内部产品层级,并揭示了行业扩展策略的明显分歧:虽然一些开发者产生了显著更高的下界,表明代际参数大幅增长,但其他开发者在严格的参数上限下运行,表明即使在严格的API限制下,隐藏的设计选择也可以被系统地探测。

英文摘要

The parameter counts of the most widely used large language models (LLMs) are often withheld by their developers, leaving model size -- a primary reference point for interpreting capabilities and costs -- largely undisclosed. We propose a black-box method to infer conservative lower bounds on LLM size from generated text outputs alone, requiring nothing beyond the ability to submit text fragments and observe next-token predictions. Our approach is grounded in a key observation: popular, widely-circulated texts -- such as classical literature, religious texts, and foundational documents -- are present in virtually every large-scale pretraining corpus, and how accurately a model predicts the next word across text fragments of varying length is a reliable signal of how much it has memorized them, which in turn is fundamentally limited by its total parameter count. We aggregate this memorization signal across a diverse corpus of texts and fragment lengths into a single accuracy profile vector per model, and build two complementary inference methods on top of it: a pairwise statistical test that determines which of two models is larger, and a scaling-law estimator that extracts a one-dimensional latent index from these vectors via Principal Component Analysis (PCA) to map the aggregated signal to a parameter count. Validated on a broad set of open-weight models, both methods produce accurate and reliable lower bounds. When applied to popular closed-weight models, our framework recovers internal product hierarchies and reveals a clear divergence in industry scaling strategies: while some developers yield significantly higher bounds indicative of large generational parameter growth, others operate under strict parameter ceilings, demonstrating that hidden design choices can be systematically probed even under strict API limitations.
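The scaling-law estimator can be sketched as a PCA projection followed by a line fit, using synthetic data. The profile vectors, the log-linear relationship, and the function names below are assumptions for illustration; the sketch returns a point estimate and does not reproduce the paper's lower-bound construction or its pairwise test.

```python
import numpy as np


def fit_size_estimator(profiles, log10_params):
    """Fit a 1-D PCA index on accuracy profiles, then regress log10(params) on it."""
    mean = profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(profiles - mean, full_matrices=False)
    component = vt[0]
    if component.sum() < 0:                  # orient: higher accuracy gives a larger index
        component = -component
    index = (profiles - mean) @ component
    slope, intercept = np.polyfit(index, log10_params, 1)
    return mean, component, slope, intercept


def estimate_log10_params(profile, model):
    mean, component, slope, intercept = model
    return float(slope * ((profile - mean) @ component) + intercept)


# Synthetic reference models: next-word accuracy at three fragment lengths
# grows linearly with log10(parameter count) in this toy setup.
log10_params = np.array([9.0, 9.5, 10.0, 10.5, 11.0])
loadings = np.array([0.02, 0.05, 0.08])
profiles = 0.1 + np.outer(log10_params - 9.0, loadings)

model = fit_size_estimator(profiles, log10_params)
unknown_profile = 0.1 + (10.25 - 9.0) * loadings       # stand-in for a closed-weight model
estimate = estimate_log10_params(unknown_profile, model)
print(round(estimate, 3))
```

Real accuracy profiles would be noisy and only approximately one-dimensional, so the fit quality on open-weight reference models is what determines how much trust the projected index deserves.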

2605.30119 2026-06-08 cs.LG cs.AI cs.NE 版本更新

Evolving Features vs Evolving Entire Trees with GP for Interpretable Survival Analysis

使用遗传编程进行可解释生存分析:进化特征 vs 进化整个树

Thalea Schlender, Peter A. N. Bosman, Tanja Alderliesten

发表机构 * Leiden University Medical Center(莱顿大学医学中心) Centrum Wiskunde & Informatica(数学与信息学研究中心)

AI总结 本研究使用遗传编程多目标进化可检查的特征集,并联合优化生存树结构与非线性分裂逻辑,以提高浅层生存树的预测性能和可解释性。

详情
AI中文摘要

生存分析涉及预测事件发生时间。常用于医学领域,处理不完整(即删失)数据,例如研究期间未发生事件的患者。实际应用中,准确性和可解释性都很重要。生存树是易于理解的生存模型,将患者队列递归地划分为离散的患者组。虽然生存树可以捕捉复杂关系,但它们通常需要生长得很大,威胁可解释性。此外,生存树通常使用贪婪方法构建,可能忽略全局最优分裂组合,限制预测性能。浅层生存树需要表达性强的高阶特征组合才能达到竞争性准确性。因此,我们使用遗传编程多目标进化固有可检查的特征集,并研究它们与不同树诱导策略的相互作用。我们进一步引入了一种进化方法,联合优化生存树结构和非线性分裂逻辑。我们的发现表明,在两个真实世界数据集和两种不同生存树深度上,进化特征构建提高了不同树诱导策略下的预测性能。完整的联合进化在提出多个性能良好的固有可检查的浅层生存树方面具有最高的潜力。

英文摘要

Survival analysis concerns the task of predicting the time until an event occurs. Often used in the medical field, survival analysis deals with incomplete (i.e., censored) data, for instance, from patients who did not experience the event during the duration of the study. For practical use, both accuracy and interpretability are important. Survival trees are easy-to-follow survival models that split the patient cohort recursively into discrete patient groups. Whilst survival trees can capture complex relationships, they typically need to grow large, threatening interpretability. Moreover, survival trees are often built using greedy approaches that may overlook globally optimal split combinations, limiting predictive performance. Shallow survival trees require expressive, higher-order feature combinations to achieve competitive accuracy. We therefore use genetic programming to multi-objectively evolve inherently inspectable feature sets and study how they interact with different tree induction strategies. We further introduce an evolutionary approach that jointly optimises the survival tree structure and the non-linear split logic. Our findings demonstrate that evolutionary feature construction improves predictive performance across different tree induction strategies on two real-world datasets and two different survival tree depths. Given its speed and flexible presentation, the multi-objective evolution of entire trees likely holds the most future promise.

2606.02609 2026-06-08 cs.LG cs.AI 版本更新

Building Better Activation Oracles

构建更好的激活预言机

Jan Bauer, Celeste De Schamphelaere, Adam Karvonen, Niclas Luick, Neel Nanda

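Affiliations MATS, Gatsby Unit, UCL; MATS, Ghent University; Independent; MATS, University of Hamburg

AI summary To address hallucination and vagueness in Activation Oracles, the authors improve training in four ways (on-policy rollouts, a better conversational dataset, multi-layer inputs, and a revised injection formula) and open-source AObench, the first comprehensive evaluation suite for AO quality.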

Comments Jan Bauer and Celeste De Schamphelaere contributed equally; author order determined randomly

详情
AI中文摘要

激活预言机(Activation Oracles, AOs)是解释残差流激活的有前景的方法。然而,当前的AOs面临重要问题,如幻觉和模糊性。此外,文本反转混淆使其难以评估。为此,我们通过四种方式改进了激活预言机(AO)的训练机制:在策略内 rollout 上训练、改进对话数据集、提供更多层以及改进注入公式。能力提升微乎其微,但生活质量提升相当显著。此外,我们开源了首个用于AO质量的综合评估套件,称为AObench。总体而言,我们希望我们的工作能为可扩展、端到端可解释性范式中的AOs及其他模型的改进奠定基础。

英文摘要

Activation Oracles (AOs) are promising methods for interpreting residual stream activations. However, current AOs face important issues, such as hallucinations and vagueness. Additionally, text-inversion confounds make them hard to evaluate. To this end, we improve the Activation Oracle (AO) training regime in four ways: training on on-policy rollouts, improving the conversational dataset, feeding in more layers, and improving the injection formula. The capability improvements are marginal, but the quality-of-life improvements are quite substantial. In addition, we open-source the first comprehensive evaluation suite for AO quality, which we call AObench. Overall, we hope that our work sets a foundation that helps improve AOs and other models in the paradigm of scalable, end-to-end interpretability.

2606.03002 2026-06-08 cs.LG cs.AI 版本更新

Perplexity Can Miss SAE Feature Damage Under Quantization

困惑度可能遗漏量化下的SAE特征损伤

Evan Duan

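Affiliations University of Michigan

AI summary Using sparse autoencoder analysis, the paper finds that quantization degrades interpretable features in language models gradually, that task metrics do not fully reflect this damage, and that quantization and magnitude pruning share similar damage patterns.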

Comments 12 Pages of Content, Submitted to TMLR

详情
AI中文摘要

量化是部署大型语言模型的标准途径,通常当量化模型的困惑度或下游精度接近全精度原始模型时,即认为其可接受。但模型是否仍以相同方式计算,或全精度模型中识别的可解释特征是否在权重舍入后存活,很少被测试,即使安全审计和引导干预越来越依赖这些特征。我们探究从稠密全精度模型中提取的稀疏自编码器(SAE)特征在模型量化后是否仍然忠实。使用冻结的SAE作为固定测量基础,我们在相同令牌上编码全精度和最近舍入(RTN)量化激活,并通过皮尔逊相关系数量化每个特征的存活率,在Pythia-70M和Gemma-2-2B上扫描从INT8到INT4的位宽。我们发现特征存活是分级的:特征系统地退化而非一次性全部失效,在Pythia-70M上INT6时62.4%的活跃特征存活,在Gemma-2-2B上INT6时51.3%存活,且大多数非存活特征被模糊而非破坏。存活率可仅从全精度统计量预测,交叉验证AUC为0.92至0.97,峰值激活是最强的边际预测因子。关键的是,任务指标可能忽略这种损伤:在Gemma-2-2B上,INT7改善了困惑度却使18.7%的特征退化。最后,量化和匹配困惑度的幅度剪枝损伤高度重叠的特征集,Jaccard重叠为0.79至0.86,损伤分数斯皮尔曼相关性为0.98,表明存在共享的压缩诱导脆弱性模式。这些结果表明,行为等价不足以证明可解释性发现可迁移到量化部署,从而激励对压缩进行特征级审计。

英文摘要

Quantization is a standard path to deploying large language models, and quantized models are typically judged acceptable when perplexity or downstream accuracy remains close to the full-precision original. But behavioral parity need not imply feature fidelity: the sparse-autoencoder (SAE) features used to interpret a full-precision model may change after weight rounding. We test this directly by using a frozen SAE as a fixed measurement basis, encoding full-precision and round-to-nearest (RTN) quantized activations on identical tokens, and measuring per-feature survival by Pearson correlation across bit-widths from INT8 to INT4 on Pythia-70M and Gemma-2-2B. Our central finding is that perplexity can miss feature damage: on Gemma-2-2B, INT7 improves perplexity while degrading 18.7% of active SAE features, and under sliding-window evaluation INT6 also improves perplexity while only 51.3% of active features survive. Feature survival is graded rather than cliff-like, with 62.4% of active Pythia features and 51.3% of active Gemma features surviving at INT6; most non-surviving features are blurred rather than fully damaged. Survival is also predictable from full-precision feature statistics alone, with cross-validated AUC 0.92--0.97 and peak activation as the strongest marginal predictor. Finally, RTN quantization and matched-perplexity magnitude pruning damage strongly overlapping feature sets, with Jaccard overlap 0.79--0.86 and damage-score Spearman correlation 0.98. These results show that behavioral metrics alone are insufficient evidence that full-precision interpretability findings transfer to quantized models, motivating feature-level audits of compression.
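The measurement described in the abstract above can be sketched in a few lines. This is a minimal pure-Python illustration, not the paper's code: it assumes symmetric per-tensor round-to-nearest quantization, a single linear layer standing in for the model, and a ReLU encoder standing in for the frozen SAE. All function names and the toy matrices are hypothetical.

```python
import math

def rtn_quantize(weights, bits):
    """Symmetric per-tensor round-to-nearest quantization, returned dequantized."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for row in weights for w in row) / qmax
    if scale == 0.0:
        return [list(row) for row in weights]  # all-zero tensor: nothing to round
    return [[round(w / scale) * scale for w in row] for row in weights]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sae_encode(enc, bias, act):
    """Frozen SAE encoder (assumed form): ReLU(enc @ act + bias)."""
    return [max(0.0, z + b) for z, b in zip(matvec(enc, act), bias)]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # a feature that is constant on these tokens carries no signal
    return cov / math.sqrt(vx * vy)

def feature_survival(weights, enc, bias, tokens, bits):
    """Per-feature Pearson correlation between full-precision and quantized runs."""
    w_q = rtn_quantize(weights, bits)
    fp = [sae_encode(enc, bias, matvec(weights, t)) for t in tokens]
    qt = [sae_encode(enc, bias, matvec(w_q, t)) for t in tokens]
    return [pearson([f[j] for f in fp], [q[j] for q in qt]) for j in range(len(bias))]

# Toy data (hypothetical): one 2x2 "model" layer, three SAE features, four tokens.
weights = [[0.9, -0.4], [0.2, 0.7]]
enc = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
bias = [0.0, 0.0, 0.0]
tokens = [[1.0, 0.5], [-0.3, 0.8], [0.6, -0.2], [0.1, 0.9]]

for bits in (8, 4):
    print(bits, feature_survival(weights, enc, bias, tokens, bits))
```

The same SAE is used for both runs, so any drop in correlation is attributable to the quantized weights rather than to a change in the measurement basis. A survival threshold on the correlation would be a further assumption; the abstract does not state one.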

2606.03382 2026-06-08 cs.LG cs.AI 版本更新

Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions

局部引导,全局影响:高斯重塑信任区域解锁行为转变

Bingxu Liu, Jiashun Liu, Johan Obando-Ceron, Hao Wang, Runze Liu, Pablo Samuel Castro, Aaron Courville, Ling Pan

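Affiliations Hong Kong University of Science and Technology; Mila - Québec AI Institute; Université de Montréal; Fudan University; City University of Hong Kong

AI summary To address PPO's optimization failures in non-stationary environments, the paper proposes Gaussian Trust Region Policy Optimization (GTR), which reshapes the trust region with a Gaussian kernel to obtain a non-monotonic constraint that keeps local stability while permitting necessary large policy updates, and reports strong performance across diverse tasks.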

Comments 21 pages

详情
AI中文摘要

虽然近端策略优化(PPO)在静态环境中表现出色,但我们表明其标准优化范式在连续和非平稳环境中存在困难。失败并非源于模型容量不足或裁剪过于严格。相反,PPO执行持续、方向低效的局部更新,这表明缺乏几何感知引导来积累有意义的行为变化,最终阻碍向新行为模式的转变。尽管基于散度的正则化引入了部分几何感知,但其单调递增的惩罚隐式地阻止了大的策略偏差,即使这种转变对于有效适应是必要的。为了解决这一局限性,我们提出了高斯信任区域策略优化(GTR),它使用高斯核重塑信任区域。由此产生的约束是有界且非单调的,在提供强局部稳定性的同时,在持续的高优势更新下逐渐放松。为了进一步提高鲁棒性,我们引入了一个混合高斯锚点,它适应最近的策略轨迹,减少了由陈旧参考引起的方差。GTR与架构无关,在游戏、模拟机器人控制、开放世界探索和语言模型后训练中均取得了强性能。这些结果表明,几何感知的信任区域设计可以成为复杂非平稳环境中鲁棒强化学习的一个有前景的方向。我们的代码可在该 https URL 获取。

英文摘要

While Proximal Policy Optimization (PPO) demonstrates strong performance in stationary settings, we show that its standard optimization paradigm struggles in continual and non-stationary environments. The failure does not stem from insufficient model capacity or overly restrictive clipping. Instead, PPO performs persistent, directionally inefficient local updates, which indicates a lack of geometry-aware guidance for accumulating meaningful behavioral change and ultimately hinders transitions toward new behavior patterns. Although divergence-based regularization introduces partial geometric awareness, its monotonically increasing penalties implicitly discourage large policy deviations, even when such shifts are necessary for effective adaptation. To address this limitation, we propose Gaussian Trust Region Policy Optimization (GTR), which reshapes the trust region using a Gaussian kernel. The resulting constraint is bounded and non-monotonic, providing strong local stability while progressively relaxing under sustained high-advantage updates. To further improve robustness, we introduce a Mixture Gaussian Anchor that adapts to recent policy trajectories, reducing variance induced by stale references. GTR is architecture-agnostic and achieves strong performance across games, simulated robotic control, open-world exploration, and language model post-training. These results demonstrate that geometry-aware trust-region design can be a promising direction for robust reinforcement learning in complex non-stationary environments. Our code is available at https://anonymous.4open.science/r/GTR_demo/README.md.

2606.03559 2026-06-08 cs.LG math.OC stat.ML 版本更新

Analytical Evaluation of DCA Convergence Properties for Minimizing Prediction Functions of Gaussian RBF Support Vector Regression

高斯RBF支持向量回归预测函数最小化的DCA收敛性分析评估

Yohei Kakimoto, Yuto Omae, Hirotaka Takahashi

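Affiliations Nihon University; Tokyo City University

AI summary For nonconvex problems whose objective is the prediction function of a trained Gaussian RBF kernel SVR (RBF-SVR), the paper exploits the analytical structure of the RBF kernel to build an explicit DC decomposition, derives closed-form expressions for a lower bound $μ$ on the strong convexity parameter of the DC components and an upper bound $L$ on the gradient Lipschitz constant of the subproblem, and shows numerically that the quantity $C_αρ$ dominates DCA's convergence and initial-point dependence.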

Comments 29 pages, 5 figures, 2 tables

详情
AI中文摘要

对于目标函数为训练好的高斯径向基函数(RBF)核支持向量回归(SVR)模型(RBF-SVR)预测函数的非凸优化问题,我们提出一个框架,通过利用RBF核的解析结构构造显式的凸函数差(DC)分解,应用DC算法(DCA)。具体地,我们闭式推导了DC分量的强凸参数下界μ和子问题梯度Lipschitz常数上界L。μ和L完全由训练后的对偶系数和Cα、RBF核参数γ以及DC分解参数ρ决定,且共享共同主导项Cαρ。通过在六个基准函数上的数值实验,我们表明Cαρ是表征DCA收敛性质和初始点依赖性的主要单一量,并进一步证明它分解为两个独立路径C→Cα和γ→ρ,其主要变化由SVR超参数(C,γ)控制。这些结果使得RBF-SVR上DCA的收敛性质可以通过单一标量Cαρ预先评估:训练前近似从(C,γ)得到,训练后精确闭式得到。

英文摘要

For nonconvex optimization problems whose objective is the prediction function of a trained Support Vector Regression (SVR) model with the Gaussian radial basis function (RBF) kernel (RBF-SVR), we present a framework that applies the difference of convex functions (DC) algorithm (DCA) by exploiting the analytical structure of the RBF kernel to construct an explicit DC decomposition. Specifically, we derive in closed form both the lower bound $μ$ of the strong convexity parameter of the DC components and the upper bound $L$ of the gradient Lipschitz constant of the subproblem. Both $μ$ and $L$ are determined solely by the post-training dual-coefficient sum $C_α$ and the RBF kernel parameter $γ$, together with the DC decomposition parameter $ρ$, and they share a common leading term $C_αρ$. Through numerical experiments on six benchmark functions, we show that $C_αρ$ is the primary single quantity characterizing both the convergence properties and the initial-point dependence of DCA, and further demonstrate that it decomposes into two independent pathways, $C \to C_α$ and $γ\to ρ$, with its primary variation governed by the SVR hyperparameters $(C, γ)$. Together, these results allow the convergence properties of DCA on RBF-SVR to be assessed in advance through the single scalar quantity $C_αρ$: approximately from $(C, γ)$ before training, and exactly in closed form after training.
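To make the DC idea concrete, here is a minimal one-dimensional sketch. It is an illustration under stated assumptions, not the paper's decomposition: it takes $g(x) = \frac{ρ}{2}x^2$ and $h(x) = \frac{ρ}{2}x^2 - f(x)$, defines $C_α$ as the sum of absolute dual coefficients, and picks $ρ = 2γC_α$, which suffices for $h$ to be convex because each Gaussian term has second derivative bounded by $2γ$ in magnitude. With this split the DCA subproblem has a closed-form solution. The paper's own $ρ$ and subproblem may differ.

```python
import math

def make_rbf_svr(alphas, centers, gamma, bias=0.0):
    """1-D RBF-SVR prediction function f and its derivative (toy stand-in)."""
    def f(x):
        return bias + sum(a * math.exp(-gamma * (x - c) ** 2)
                          for a, c in zip(alphas, centers))
    def grad(x):
        return sum(a * (-2.0 * gamma * (x - c)) * math.exp(-gamma * (x - c) ** 2)
                   for a, c in zip(alphas, centers))
    return f, grad

def dca_minimize(alphas, centers, gamma, x0, steps=50):
    """DCA on f = g - h with g = (rho/2) x^2 and h = (rho/2) x^2 - f (assumed split)."""
    f, grad = make_rbf_svr(alphas, centers, gamma)
    c_alpha = sum(abs(a) for a in alphas)   # assumed definition of C_alpha
    rho = 2.0 * gamma * c_alpha             # makes h convex: h'' = rho - f'' >= 0
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        # Subproblem: argmin_y g(y) - h'(x) * y, solved in closed form as h'(x) / rho.
        xs.append(x - grad(x) / rho)
    return xs, [f(x) for x in xs]

xs, vals = dca_minimize([1.0, -2.0, 0.5], [-1.0, 0.5, 2.0], gamma=1.0, x0=-0.2)
print(xs[-1], vals[-1])
```

Under this particular split each DCA step is a gradient step of size $1/ρ$, so a larger $C_α$ or $γ$ means smaller steps. That is one simple way to see why a product of this kind could govern convergence speed, though the abstract's exact statement about $C_αρ$ rests on the paper's own derivation.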

2606.04058 2026-06-08 cs.LG cs.AI 版本更新

Spectral Scaling Laws of Muon

Muon的谱缩放定律

Gagik Magakyan, Pablo Parrilo, Asuman Ozdaglar

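Affiliations MIT

AI summary The paper systematically studies how the singular value spectrum of Muon's momentum matrices scales with model size, finds that it follows power laws, and derives a layer-aware way to choose Newton-Schulz iteration configurations that reduces compute.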

详情
AI中文摘要

正交归一化更新规则已迅速成为训练大型语言模型的主流优化器选择,最近的开源最先进模型采用了Muon。为了保持这些更新的可处理性,Muon使用牛顿-舒尔茨(NS)迭代执行正交归一化。由于NS只是近似,小奇异值的方向无法被正交归一化。在Muon中,NS每一步都应用于动量矩阵,然而关于这些动量矩阵的奇异值谱在训练过程中如何行为,以及该行为如何随模型大小变化,我们知之甚少。我们首次系统研究了这一问题。通过追踪从77M到2.8B参数模型中各层动量缓冲区的奇异值分位数,我们观察到一致的现象:在短暂的预热后,分位数稳定在一个由层类型和模型大小决定的值上。这些稳定值随模型大小呈现出非常清晰的幂律,且指数依赖于层。中后深度的层随模型大小$M$的缩放非常温和(约$M^{-0.25}$),因此学术规模下使用的标准5步NS配置将在更大规模下继续对它们进行正交归一化。然而,某些后期层的缩放更为激进(高达$M^{-0.96}$),在前沿规模下将落入NS失效区域,除非使用更多NS迭代或更好调整的系数。NS迭代在规模上计算成本高昂;我们的定律为从业者提供了一种有原则的、层感知的配方,用于选择最小的NS配置,该配置仍能正交归一化重要的方向——在不牺牲更新质量的情况下避免不必要的计算。

英文摘要

Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the orthonormalization with the Newton--Schulz (NS) iteration. Since NS is only approximate, directions with small singular values fail to be orthonormalized. In Muon, NS is applied to the momentum matrix at every step, yet little is known about how the singular value spectrum of these momentum matrices behaves during training, or how that behavior changes with model size. We present the first systematic study of this question. Tracking singular value quantiles of the momentum buffer across layers in models ranging from 77M to 2.8B parameters, we observe a consistent picture: after a short burn-in, the quantiles stabilize at a value determined by the layer type and model size. These stabilization values follow remarkably clean power laws in model size, with layer-dependent exponents. Layers up to mid-late depth scale very mildly with model size $M$ (around $M^{-0.25}$), so the standard 5-step NS configuration used at academic scale will continue to orthonormalize them at much larger scales. Some of the late layers, however, scale much more aggressively (up to $M^{-0.96}$) and will fall into the NS failure regime at frontier scale unless one uses more NS iterations or better-tuned coefficients. NS iterations are computationally expensive at scale; our laws give practitioners a principled, layer-aware recipe for choosing the minimum NS configuration that still orthonormalizes the directions that matter -- avoiding unnecessary computation without sacrificing update quality.
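The failure mode for small singular values can be seen in a short sketch of the Newton-Schulz iteration. The quintic coefficients below are the ones commonly used in public Muon implementations; the abstract does not state them, so treat them, the Frobenius normalization, and the helper names as assumptions. Pure Python is used only to keep the example dependency-free.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def newton_schulz(g, steps=5, coeffs=(3.4445, -4.7750, 2.0315), eps=1e-7):
    """Approximately orthonormalize g; each singular value s maps to a*s + b*s^3 + c*s^5."""
    a, b, c = coeffs
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / (norm + eps) for v in row] for row in g]  # singular values now <= 1
    for _ in range(steps):
        xxt = matmul(x, transpose(x))                   # A = X X^T
        xxt2 = matmul(xxt, xxt)                         # A^2
        poly = [[b * xxt[i][j] + c * xxt2[i][j] for j in range(len(xxt))]
                for i in range(len(xxt))]               # B = b A + c A^2
        bx = matmul(poly, x)
        x = [[a * x[i][j] + bx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]                    # X <- a X + B X
    return x

# A diagonal matrix makes the singular values visible on the diagonal.
out = newton_schulz([[1.0, 0.0], [0.0, 1e-4]])
print(out[0][0], out[1][1])
```

Near zero the polynomial is roughly linear with slope `a`, so a singular value can grow by at most about `a` per step. A direction that starts several orders of magnitude below the largest one therefore cannot reach the neighbourhood of 1 in five steps, which is why the relative size of the small singular values matters when choosing the number of iterations.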

2606.04812 2026-06-08 cs.LG cs.AI 版本更新

Scenario Generation for Risk-Aware Reinforcement Learning with Probably Approximately Safe Guarantees

面向风险感知强化学习的情景生成与近似安全保证

Mohit Prashant, Arvind Easwaran

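Affiliations Nanyang Technological University

AI summary To address unsafe behaviour caused by RL policies' sensitivity to transition perturbations, the paper approximates the state-space distribution with a variational autoencoder, constructs upper- and lower-bound barrier certificates, and samples states from the non-robust region to tighten probabilistic safety guarantees.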

Comments 8 pages, preprint

详情
AI中文摘要

保证安全性对于强化学习(RL)智能体在现实世界中的部署至关重要,尤其是使用深度RL学习的策略可能表现出对转移扰动的敏感性,从而导致未知或不安全的行为。一种策略验证方法是通过采样相对于安全约束的策略轨迹来构造概率屏障证书,从而将已知的安全行为与未知行为区分开来。如果策略容易受到转移不确定性或扰动的影响,使智能体处于未充分探索的状态,则难以获得这些约束违反概率的严格上下界。为了解决这个问题,我们使用变分自编码器(VAE)近似遇到的状态空间的分布,并利用状态的潜在特征构造上下界屏障证书,以高置信度优化已知安全行为的区域。我们在工作中将其表述为一个对偶优化问题,其中下界屏障证书比上界屏障证书提供更保守的安全区域估计。在训练期间采样位于两者集合差(即非鲁棒区域)内的状态,使我们能够收紧上下界,从而提供更尖锐的概率安全保证。在我们的研究中,我们描述了所放置的保证,并通过实验证明了我们边界的紧致性。

英文摘要

Guaranteeing safety is critical to the deployment of reinforcement learning (RL) agents in the real world, especially as policies learned using deep RL may demonstrate susceptibility to transition perturbations that result in unknown or unsafe behaviour. A method of policy verification is to construct probabilistic barrier-certificates by sampling policy trajectories with respect to safety constraints, thereby demarcating known safe behaviour from unknown behaviour. Obtaining tight upper and lower bounds on the probability of violation of these constraints may be difficult if the policy is susceptible to transition uncertainty or perturbation that places the agent in insufficiently explored states. To address this, we approximate the distribution of the encountered state-space using a variational autoencoder (VAE) and construct upper and lower-bound barrier-certificates using latent characteristics of states to optimize for regions of known, safe behaviour with high confidence. We frame this in our work as a dual optimization problem where the lower-bound barrier-certificate presents a more conservative estimate of the safe region than the upper-bound barrier-certificate. Sampling states that lie within the set difference of the two during training, i.e., the non-robust region, allows us to tighten the upper and lower bounds to provide sharper probabilistic guarantees on safety. Within our study, we describe the guarantees placed and demonstrate the tightness of our bounds experimentally.

2606.05152 2026-06-08 cs.LG cs.AI cs.CL 版本更新

Reinforcement Learning from Rich Feedback with Distributional DAgger

利用丰富反馈的强化学习与分布式DAgger

Rishabh Agrawal, Jacob Fein-Ashley, Paria Rashidinejad

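Affiliations University of Southern California

AI summary Proposes DistIL, which uses a distributional variant of DAgger to exploit rich feedback (execution traces, tool outputs, and so on) through a forward cross-entropy objective, yielding monotonic policy improvement and better Pass@N.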

详情
AI中文摘要

推理模型发展迅速,但主流的基于可验证奖励的强化学习(RLVR)方法仍然非常狭窄:采样多个响应,并用单个比特奖励每个响应,指示最终答案是否正确。然而,许多设置提供了丰富的反馈,包括执行轨迹、工具输出、专家修正和模型自我评估。我们研究如何通过经典模仿学习算法DAgger的分布式变体来使用这种反馈,其中学习器可以局部访问当前策略所访问状态上的专家分布。这产生了一个简单的前向交叉熵目标,该目标接受黑盒专家,并且其序列级梯度通过将未来的专家-学生分歧传播回早期决策来进行丰富的信用分配。我们表明,基于反向KL或Jensen-Shannon的先前具有自蒸馏目标的强化学习无法保证单调策略改进:即使专家具有更高的奖励,它们的更新也可能增加更差动作的概率。相比之下,我们证明前向交叉熵允许单调策略改进并享有遗憾保证。我们进一步表明,我们的目标优化了教师加权的成功可能性的下界,从而改进了Pass@N。实验上,我们的方法DistIL在科学推理、编程和解决困难数学问题等多个领域优于RLVR和基于自蒸馏的强化学习基线。

英文摘要

Reasoning models have advanced rapidly, but the dominant reinforcement learning from verifiable rewards (RLVR) recipe remains surprisingly narrow: sample many responses and reward each with a single bit indicating whether the final answer is correct. Yet many settings provide rich feedback, including execution traces, tool outputs, expert corrections, and model self-evaluations. We study how to use such feedback through a distributional variant of the classic imitation learning algorithm DAgger, where the learner has local access to an expert distribution on states visited by the current policy. This yields a simple forward cross-entropy objective that admits a blackbox expert and whose sequence-level gradient conducts rich credit assignment by propagating future expert-student disagreement back to earlier decisions. We show that prior RL with self-distillation objectives based on reverse KL or Jensen-Shannon fail to guarantee monotonic policy improvement: even when the expert has higher reward, their updates may increase probability on worse actions. In contrast, we show that forward cross-entropy admits monotonic policy improvement and enjoys guarantees on regret. We further show that our objective optimizes a lower bound on teacher-weighted likelihood of success, leading to improved Pass@N. Empirically, our approach, DistIL, improves over RLVR and RL with self-distillation baselines across a variety of domains: scientific reasoning, coding, and solving hard mathematical problems.
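The forward cross-entropy term at a single visited state can be sketched as follows. This is a per-state illustration only, assuming a softmax policy over a small action set and an expert distribution that sums to one. It does not reproduce the sequence-level gradient or the DistIL training loop, and the function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward_cross_entropy(expert_probs, logits):
    """H(expert, policy) = -sum_a expert(a) * log policy(a) at one state."""
    probs = softmax(logits)
    return -sum(p * math.log(q) for p, q in zip(expert_probs, probs) if p > 0.0)

def forward_ce_grad(expert_probs, logits):
    """Gradient with respect to the logits: policy(a) - expert(a)."""
    return [q - p for p, q in zip(expert_probs, softmax(logits))]

def sgd_step(expert_probs, logits, lr=0.5):
    grad = forward_ce_grad(expert_probs, logits)
    return [z - lr * g for z, g in zip(logits, grad)]

expert = [0.7, 0.2, 0.1]   # expert distribution at a state the learner visited
logits = [0.0, 0.0, 0.0]   # learner starts uniform
for _ in range(20):
    logits = sgd_step(expert, logits)
print(softmax(logits), forward_cross_entropy(expert, logits))
```

The gradient `policy - expert` pulls probability toward every action the expert supports, in proportion to how much the expert favours it, and the expert only needs to supply probabilities, which fits the abstract's remark that the objective admits a blackbox expert.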

2606.06397 2026-06-08 cs.LG 版本更新

The Post-GCN Decade Revisited: Curvature-Stratified Evaluation of Relational Learning

后GCN十年回顾:曲率分层的关联学习评估

Shuo Wang, Xiangyu Wang, Quanxin Wang, Bailin Wu, Bokui Wang, Shunyang Huang, Boyan Deng, Haonan Liu, Ruiyi Fang, Zhenxiang Xu, Boyu Wang, Zhao Kang

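Affiliations University of Electronic Science and Technology of China; Tsinghua University; Western University; Zhejiang University

AI summary Because uniform benchmarks in relational learning hide geometry-dependent performance, the paper proposes a curvature-stratified evaluation framework that partitions datasets into positive, negative, and near-zero curvature regimes, shows that model performance is geometry-dependent, and gives a more reliable evaluation protocol.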

Comments Suggestions and comments are welcomed

详情
AI中文摘要

当前关联学习的评估实践严重依赖于在异质数据集上平均性能的扁平排行榜,隐含地假设了统一的底层结构。我们证明这一假设引入了系统性偏差:它掩盖了依赖于几何的性能变化,并可能导致关于模型泛化的误导性结论。在这项工作中,我们将内在几何识别为控制模型有效性的关键潜在因素。我们证明,传统的聚合指标掩盖了关键的性能权衡,这些权衡只有在数据集按其几何属性分层时才变得可见。为了解决这个问题,我们引入了一个曲率分层的评估框架,将数据集划分为正曲率、负曲率和近零曲率区域。我们的基准测试评估了18个代表性模型,包括图卷积网络(GCNs)、图基础模型(GFMs)和表格学习方法,涵盖14个数据集。我们发现,模型排名在每个曲率区域内高度稳定,但在不同区域之间显著变化,表明性能从根本上依赖于几何,而非普遍可迁移。值得注意的是,我们识别出与几何对齐的GNN相比,GFMs提供递减收益的区域。基于这些发现,我们提出了一种几何感知的评估协议,该协议比标准聚合基准产生更可靠和可解释的比较。我们发布所有代码、曲率分层的数据集划分和评估工具,以支持未来关联学习方法的可重复和严格评估。代码和数据集在我们的项目主页上提供:https://sirbabbage.github.io/CurvBench_HOME/。

英文摘要

Current evaluation practices in relational learning rely heavily on flat leaderboards that average performance across heterogeneous datasets, implicitly assuming a uniform underlying structure. We show that this assumption introduces systematic bias: it obscures geometry-dependent performance variations and can lead to misleading conclusions about model generalization. In this work, we identify intrinsic geometry as a key latent factor governing model effectiveness. We demonstrate that conventional aggregated metrics mask critical performance trade-offs that only become visible when datasets are stratified by their geometric properties. To address this issue, we introduce a curvature-stratified evaluation framework that partitions datasets into positive, negative, and near-zero curvature regimes. Our benchmark evaluates 18 representative models, including Graph Convolutional Networks (GCNs), Graph Foundation Models (GFMs), and tabular learning methods, across 14 datasets. We find that model rankings are highly stable within each curvature regime but shift significantly across regimes, indicating that performance is fundamentally geometry-dependent rather than universally transferable. Notably, we identify regimes where GFMs offer diminishing returns compared to geometry-aligned GNNs. Based on these findings, we propose a geometry-aware evaluation protocol that yields more reliable and interpretable comparisons than standard aggregated benchmarks. We release all code, curvature-stratified dataset splits, and evaluation tools to support reproducible and rigorous assessment of future relational learning methods. Code and datasets are provided on our project homepage: https://sirbabbage.github.io/CurvBench_HOME/.

2203.07904 2026-06-08 eess.IV cs.CV cs.LG 版本更新

Unsupervised Learning Based Focal Stack Camera Depth Estimation

基于无监督学习的焦堆相机深度估计

Zhengyu Huang, Weizhi Du, Theodore B. Norris

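Affiliations Center for Ultrafast Optical Science, University of Michigan; University of Michigan

AI summary Proposes an unsupervised deep-learning method for estimating depth from focal stack camera images; on the NYU-v2 dataset it is markedly more accurate than single-image methods.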

详情
Journal ref
in Conference on Lasers and Electro-Optics, Technical Digest Series (Optica Publishing Group, 2022), paper JW3A.5
AI中文摘要

我们提出一种基于无监督深度学习的方法,从焦堆相机图像估计深度。在NYU-v2数据集上,我们的方法相比基于单图像的方法实现了更好的深度估计精度。

英文摘要

We propose an unsupervised deep learning based method to estimate depth from focal stack camera images. On the NYU-v2 dataset, our method achieves much better depth estimation accuracy compared to single-image based methods.

2209.00188 2026-06-08 cs.AR cs.LG 版本更新

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

Hermes: 通过基于感知器的片外负载预测加速长延迟负载请求

Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu

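Affiliations ETH Zürich; Intel Processor Architecture Research Lab; LIRMM, Univ. Montpellier, CNRS

AI summary Proposes Hermes, which uses a perceptron to predict off-chip load requests and speculatively fetches their data directly from main memory while accessing the cache hierarchy in parallel, removing on-chip cache access latency from the critical path of off-chip loads and significantly improving processor performance.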

Comments To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

详情
AI中文摘要

长延迟负载请求持续限制高性能处理器的性能。为增加处理器的延迟容忍度,架构师主要依赖两种关键技术:复杂的数据预取器和大型片上缓存。在这项工作中,我们表明:1) 即使是最先进的复杂预取器,在广泛的工作负载中平均也只能预测一半的片外负载请求;2) 由于片上缓存的规模和复杂性不断增加,片外负载请求的大部分延迟都花费在访问片上缓存层次结构上。本工作的目标是通过从片外负载请求的关键路径中移除片上缓存访问延迟来加速它们。为此,我们提出了一种名为Hermes的新技术,其关键思想是:1) 准确预测哪些负载请求可能走向片外;2) 投机性地直接从主存获取预测的片外负载所需的数据,同时并发访问这些负载的缓存层次结构。为实现Hermes,我们开发了一种新的轻量级、基于感知器的片外负载预测技术,该技术学习使用多个程序特征(例如,程序计数器序列)来识别片外负载请求。对于每个负载请求,预测器观察一组程序特征以预测该负载是否会走向片外。如果预测负载将走向片外,Hermes在负载的物理地址生成后立即向内存控制器发出投机性请求。如果预测正确,负载最终会错过缓存层次结构,并等待正在进行的投机性请求完成,从而从片外负载的关键路径中隐藏片上缓存层次结构访问延迟。我们的评估表明,Hermes显著提升了最先进基线的性能。我们开源了Hermes。

英文摘要

Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can only predict half of the off-chip load requests on average across a wide range of workloads, and 2) due to the increasing size and complexity of on-chip caches, a large fraction of the latency of an off-chip load request is spent accessing the on-chip cache hierarchy. The goal of this work is to accelerate off-chip load requests by removing the on-chip cache access latency from their critical path. To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads. To enable Hermes, we develop a new lightweight, perceptron-based off-chip load prediction technique that learns to identify off-chip load requests using multiple program features (e.g., sequence of program counters). For every load request, the predictor observes a set of program features to predict whether or not the load would go off-chip. If the load is predicted to go off-chip, Hermes issues a speculative request directly to the memory controller once the load's physical address is generated. If the prediction is correct, the load eventually misses the cache hierarchy and waits for the ongoing speculative request to finish, thus hiding the on-chip cache hierarchy access latency from the critical path of the off-chip load. Our evaluation shows that Hermes significantly improves performance of a state-of-the-art baseline. We open-source Hermes.
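A hashed-perceptron predictor of the kind the abstract describes can be sketched in software as follows. This is a toy model under stated assumptions: the two features (load PC, and a hash of the previous PC with the current PC), the table size, the thresholds, and the saturation limit are all hypothetical choices for illustration, not Hermes's actual configuration.

```python
class OffChipPredictor:
    """Toy hashed perceptron: one small weight table per program feature."""

    def __init__(self, table_size=256, act_threshold=0, train_threshold=8, w_max=15):
        self.table_size = table_size
        self.act_threshold = act_threshold      # predict off-chip when sum >= this
        self.train_threshold = train_threshold  # keep training while confidence is low
        self.w_max = w_max                      # saturating weight magnitude
        self.tables = [[0] * table_size for _ in range(2)]
        self.last_pc = 0

    def _indices(self, pc):
        # Feature 1: the load PC. Feature 2: previous PC combined with this PC.
        return [pc % self.table_size, (self.last_pc * 31 + pc) % self.table_size]

    def predict(self, pc):
        idx = self._indices(pc)
        total = sum(table[i] for table, i in zip(self.tables, idx))
        return total >= self.act_threshold, total, idx

    def update(self, pc, went_off_chip):
        """Train on the true outcome once the load resolves; returns the prediction made."""
        predicted, total, idx = self.predict(pc)
        if predicted != went_off_chip or abs(total) < self.train_threshold:
            step = 1 if went_off_chip else -1
            for table, i in zip(self.tables, idx):
                table[i] = max(-self.w_max, min(self.w_max, table[i] + step))
        self.last_pc = pc
        return predicted

predictor = OffChipPredictor()
for _ in range(50):
    predictor.update(0x40, True)    # this load keeps missing the whole cache hierarchy
    predictor.update(0x80, False)   # this load keeps hitting on chip
print(predictor.predict(0x40)[0], predictor.predict(0x80)[0])
```

In the scheme the abstract describes, a positive prediction would trigger a speculative request to the memory controller once the physical address is known. The sketch covers only the prediction and training logic.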

2404.02141 2026-06-08 stat.ME cs.LG econ.EM stat.CO stat.ML 版本更新

Robustly estimating heterogeneity in factorial data using Rashomon Partitions

使用Rashomon分区稳健估计因子数据中的异质性

Aparajithan Venkateswaran, Anirudh Sankar, Arun G. Chandrasekhar, Tyler H. McCormick

Affiliations Department of Statistics, University of Washington, USA; Department of Economics, Stanford University, USA; J-PAL, NBER, USA; Department of Sociology, University of Washington, USA

AI summary Proposes Rashomon Partition Sets (RPS), a Bayesian framework that quantifies model uncertainty by enumerating all models whose posterior density is close to that of the maximum a posteriori model, enabling robust estimation of heterogeneity.

详情
AI中文摘要

在观测数据和随机对照试验中,研究人员选择统计模型来阐述感兴趣的结果如何随可观测协变量的组合而变化。选择过于简单的模型可能会掩盖协变量组之间结果的重要异质性,而过于复杂则可能识别出虚假模式。在本文中,我们提出了一种新颖的贝叶斯模型不确定性框架,称为Rashomon分区集(RPS)。RPS包含所有后验密度接近最大后验(MAP)模型的模型。我们通过枚举而非采样来构建RPS,这确保我们探索数据中具有高证据的所有模型,即使它们提供截然不同的实质性解释。我们使用l0先验,该先验允许我们在不对效应之间的关联施加强假设的情况下捕获复杂的异质性,并从信息论角度证明该先验是极小化最优的。我们刻画了在RPS内相对于整个后验条件计算的参数(的函数)的近似误差。我们提出了一种算法,从可解释且唯一的模型类中枚举RPS,然后给出RPS大小的界限。我们提供了模拟证据以及三个实证例子:价格对慈善捐赠的影响、染色体结构的异质性以及小额信贷的引入。

英文摘要

In both observational data and randomized control trials, researchers select statistical models to articulate how the outcome of interest varies with combinations of observable covariates. Choosing a model that is too simple can obfuscate important heterogeneity in outcomes between covariate groups, while too much complexity risks identifying spurious patterns. In this paper, we propose a novel Bayesian framework for model uncertainty called Rashomon Partition Sets (RPSs). The RPS consists of all models that have posterior density close to the maximum a posteriori (MAP) model. We construct the RPS by enumeration, rather than sampling, which ensures that we explore all models with high evidence in the data, even if they offer dramatically different substantive explanations. We use an $\ell_0$ prior, which allows us to capture complex heterogeneity without imposing strong assumptions about the associations between effects, and we show that this prior is minimax optimal from an information-theoretic perspective. We characterize the approximation error of (functions of) parameters computed conditional on being in the RPS relative to the entire posterior. We propose an algorithm to enumerate the RPS from the class of models that are interpretable and unique, then provide bounds on the size of the RPS. We give simulation evidence along with three empirical examples: price effects on charitable giving, heterogeneity in chromosomal structure, and the introduction of microfinance.

2503.00065 2026-06-08 cs.CR cs.LG 版本更新

ADAGE: Active Defenses Against GNN Extraction

ADAGE: 针对GNN提取的主动防御

Jing Xu, Franziska Boenisch, Adam Dziedzic

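Affiliations CISPA Helmholtz Center for Information Security

AI summary Proposes ADAGE, the first general active defense framework, which monitors query diversity and progressively perturbs outputs to stop a range of GNN model stealing attacks while preserving downstream task performance.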

Comments Accepted at AsiaCCS 2026

详情
AI中文摘要

图神经网络(GNN)在药物发现、交通状态预测和推荐系统等实际应用中取得了高性能。构建强大的GNN需要大量训练数据、强大的计算资源和人类专业知识,这使得模型成为模型窃取攻击的有利目标。先前的研究表明,针对GNN的窃取攻击威胁向量大且多样,攻击者可以利用从节点标签到高维节点嵌入的各种异质信号,以原始训练成本的一小部分创建目标GNN的本地副本。这种威胁向量的多样性使得设计有效且通用的防御具有挑战性,现有的防御通常专注于特定的窃取设置。此外,它们仅提供识别被盗模型副本的方法,而非阻止攻击。为弥补这一差距,我们提出了首个通用的针对GNN提取的主动防御(ADAGE)。ADAGE基于以下观察:窃取模型的全部功能需要高度多样化的查询来泄露其在整个输入空间的行为。我们的防御监控这种查询多样性,并随着累积泄漏的增加逐步扰动输出。与先前工作相比,ADAGE可以在所有常见攻击设置下阻止窃取。我们使用六个基准数据集、四个GNN模型和三种类型的自适应攻击者进行的广泛实验评估表明,ADAGE对攻击者施加惩罚,使其无法窃取,同时保持下游任务的预测性能。因此,ADAGE有助于未来安全地共享有价值的GNN。

英文摘要

Graph Neural Networks (GNNs) achieve high performance in various real-world applications, such as drug discovery, traffic state prediction, and recommendation systems. The fact that building powerful GNNs requires a large amount of training data, powerful computing resources, and human expertise turns the models into lucrative targets for model stealing attacks. Prior work has revealed that the threat vector of stealing attacks against GNNs is large and diverse, as an attacker can leverage various heterogeneous signals ranging from node labels to high-dimensional node embeddings to create a local copy of the target GNN at a fraction of the original training costs. This diversity in the threat vector renders the design of effective and general defenses challenging, and existing defenses usually focus on one particular stealing setup. Additionally, they solely provide means to identify stolen model copies rather than preventing the attack. To close this gap, we propose the first general Active Defense Against GNN Extraction (ADAGE). ADAGE builds on the observation that stealing a model's full functionality requires highly diverse queries to leak its behavior across the input space. Our defense monitors this query diversity and progressively perturbs outputs as the accumulated leakage grows. In contrast to prior work, ADAGE can prevent stealing across all common attack setups. Our extensive experimental evaluation using six benchmark datasets, four GNN models, and three types of adaptive attackers shows that ADAGE penalizes attackers to the degree of rendering stealing impossible, whilst preserving predictive performance on downstream tasks. ADAGE thereby contributes to securely sharing valuable GNNs in the future.

2506.01850 2026-06-08 cs.CV cs.AI cs.LG cs.MM 版本更新

MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs

MoDA: 面向指令型多模态大语言模型的细粒度视觉定位的调制适配器

Wayner Barrios, Andrés Villa, Juan León Alcázar, SouYoung Jin, Bernard Ghanem

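Affiliations University of California, Berkeley

AI summary Proposes MoDA, a modulation adapter that strengthens fine-grained visual grounding through instruction-guided, channel-wise multiplicative modulation, with consistent gains on 12 benchmarks across three MLLM architectures at minimal compute overhead.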

Comments Accepted at ICML 2026. Code is available at https://github.com/waybarrios/MoDA

详情
AI中文摘要

多模态大语言模型(MLLMs)通过将预训练的视觉编码器与大语言模型(LLMs)集成,在指令跟随任务中取得了显著成功。然而,现有方法由于视觉补丁表示中的语义纠缠,常常难以实现细粒度的视觉定位,其中单个补丁混合了多个不同的视觉元素,使得模型难以聚焦于指令相关的细节。为了应对这一挑战,我们提出了MoDA(调制适配器),一种轻量级模块,通过指令引导的通道级调制增强视觉定位。与Q-Former等执行加性特征选择的令牌级方法不同,MoDA通过对已对齐特征进行乘法调制在通道级操作,从而实现对每个指令相关嵌入维度的细粒度控制。遵循标准的LLaVA训练协议,MoDA在语言指令与预对齐的视觉特征之间应用交叉注意力,生成动态调制掩码,无需架构修改或额外监督。我们在涵盖视觉问答、视觉中心推理和幻觉检测的12个基准上评估了MoDA,包括最近的2024年基准(MMVP、CV-Bench、MMStar、RealWorldQA),并在三种不同的MLLM架构上进行了测试:LLaVA-1.5、LLaVA-MoRE(2025)和Qwen3-VL(2025)。MoDA在所有三个系列中均取得了一致的提升,在LLaVA-1.5系列的MMVP上提升了+12.0个百分点,在LLaVA-MoRE系列的ScienceQA上提升了+4.8个百分点,在Qwen3-VL上ScienceQA提升了+4.9、RealWorldQA提升了+4.1、GQA提升了+3.8,证实了这些增益在CLIP编码器之外具有泛化性,且计算开销极小(<1% FLOPs)。代码可在https://github.com/waybarrios/MoDA获取。

英文摘要

Multimodal Large Language Models (MLLMs) have achieved remarkable success in instruction-following tasks by integrating pretrained visual encoders with large language models (LLMs). However, existing approaches often struggle with fine-grained visual grounding due to semantic entanglement in visual patch representations, where individual patches blend multiple distinct visual elements, making it difficult for models to focus on instruction-relevant details. To address this challenge, we propose MoDA (Modulation Adapter), a lightweight module that enhances visual grounding through instruction-guided channel-wise modulation. Unlike token-level methods such as Q-Former that perform additive feature selection, MoDA operates at the channel level through multiplicative modulation on already-aligned features, enabling fine-grained control over which embedding dimensions are relevant for each instruction. Following the standard LLaVA training protocol, MoDA applies cross-attention between language instructions and pre-aligned visual features, generating dynamic modulation masks without architectural modifications or additional supervision. We evaluate MoDA across 12 benchmarks spanning visual question answering, vision-centric reasoning, and hallucination detection, including recent 2024 benchmarks (MMVP, CV-Bench, MMStar, RealWorldQA), on three distinct MLLM architectures: LLaVA-1.5, LLaVA-MoRE (2025), and Qwen3-VL (2025). MoDA delivers consistent gains across all three families, with +12.0 points on MMVP for the LLaVA-1.5 family and +4.8 points on ScienceQA for the LLaVA-MoRE family, and +4.9 ScienceQA, +4.1 RealWorldQA, and +3.8 GQA on Qwen3-VL, confirming that the gains generalize beyond CLIP-based encoders with minimal overhead (<1% FLOPs). Code is available at https://github.com/waybarrios/MoDA.

2509.04991 2026-06-08 physics.ao-ph cs.AI cs.LG 版本更新

A Mechanism-Coupled Split Window Network for Medium- to High-Resolution Land Surface Temperature Retrieval

一种面向中高分辨率地表温度反演的机制耦合分裂窗网络

Tian Xie, Menghui Jiang, Chao Zeng, Huifang Li, Guanhao Zhang, Chan Li, Huanfeng Shen

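Affiliations School of Resource and Environmental Sciences, Wuhan University; Key Laboratory of Geographic Information System of Ministry of Education, Wuhan; Key Laboratory of Digital Cartography and Land Information Application of the Ministry of Natural Resources, Wuhan

AI summary Proposes the Parallel Component Decoupled Neural Network (PCD-Net), which recasts split-window retrieval as dynamic learning of physical component coefficients and, through component-level decoupled modeling plus a residual branch, targets accurate, robust, and globally generalizable land surface temperature retrieval under complex atmospheric and surface conditions.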

详情
AI中文摘要

地表温度(LST)是陆-气相互作用、地表能量收支和气候过程中的基本物理变量。从中高分辨率热红外(TIR)观测中获取的LST能有效揭示不同景观单元间的热环境差异。然而,在复杂大气条件和多样土地覆盖类型下,实现准确、鲁棒且全局可泛化的LST反演仍具挑战。传统分裂窗(SW)算法严重依赖经验参数化,其固定系数无法适应高温地表和高大气水汽含量等复杂场景。同时,传统数据驱动模型因缺乏显式物理结构约束,对分布外(OOD)样本的泛化能力有限。为解决这些问题,本研究提出并行分量解耦神经网络(PCD-Net)框架,将SW反演重构为物理分量系数的动态学习问题。以SW方程作为物理主干,该框架构建并行子网络,自适应学习对应常数项、一阶和二阶亮度温度差项的动态系数;同时引入残差分支,补充由地表发射率和大气水汽联合效应引起的非线性耦合校正。通过这种分量级解耦建模,PCD-Net显式刻画了地表发射率、大气水汽含量与不同SW物理分量之间的动态响应关系。

英文摘要

Land surface temperature (LST) is a fundamental physical variable in land-atmosphere interactions, surface energy budgets, and climate processes. LST derived from medium- to high-resolution thermal infrared (TIR) observations effectively reveals thermal environmental disparities across distinct landscape units. However, achieving accurate, robust, and globally generalizable LST retrieval remains challenging under complex atmospheric conditions and diverse land cover types. Traditional split window (SW) algorithms heavily rely on empirical parameterizations, whose fixed coefficients fail to adapt to complex scenarios such as high surface temperatures and high atmospheric water vapor content. Concurrently, conventional data-driven models exhibit limited generalizability to out-of-distribution (OOD) samples due to the absence of explicit physical structure constraints. To address these issues, this study proposes a Parallel Component Decoupled Neural Network (PCD-Net) framework, which reformulates SW retrieval as a dynamic learning problem of physical component coefficients. Using the SW equation as the physical backbone, the framework constructs parallel subnetworks to adaptively learn the dynamic coefficients corresponding to the constant, first-order, and second-order brightness temperature difference terms; meanwhile, a residual branch is incorporated to supplement the nonlinear coupling corrections induced by the joint effects of surface emissivity and atmospheric water vapor. Through this component-level decoupled modeling, PCD-Net explicitly characterizes the dynamic response relationships between land surface emissivity, atmospheric water vapor content, and different SW physical components.
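
As a rough illustration of the formulation described above, a generic quadratic split-window form with dynamic coefficients could be written as follows. The abstract does not give the exact equation, so the notation here is an assumption: $T_i$ and $T_j$ denote the brightness temperatures of the two TIR channels, $\varepsilon$ the surface emissivity, $w$ the atmospheric water vapor content, $a_0, a_1, a_2$ the coefficients produced by the parallel subnetworks, and $g$ the residual branch.

```latex
T_s = T_i
    + a_0(\varepsilon, w)
    + a_1(\varepsilon, w)\,(T_i - T_j)
    + a_2(\varepsilon, w)\,(T_i - T_j)^2
    + g(\varepsilon, w, T_i, T_j)
```

In a traditional SW algorithm, $a_0$, $a_1$, and $a_2$ would be fixed constants fitted offline; the sketch only replaces them with input-dependent functions and adds a correction term.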

2509.11208 2026-06-08 stat.ML cs.LG 版本更新

Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication

可预测的压缩失败:基于证据的二元裁决的顺序敏感性与信息预算

Leon Chlon, Ahmed Karim, Maggie Chlon, MarcAntonio Awada

发表机构 * GitHub

AI总结 研究证据顺序对基于Transformer的二元裁决模型的影响,提出QMV界和EDFL定律,通过信息充分率门控实现低幻觉率下的答案/弃权决策。

详情
AI中文摘要

用于基于证据的二元裁决(例如,支持/反驳、是/否或验证器支持的通过/失败决策)的Transformer可能对可交换证据呈现的顺序敏感,在验证器相关的伯努利谓词下产生跨排列的分散性和不可靠的尝试答案。我们将证据顺序视为一个干扰变量,并形式化了一个期望-实现差距:下一个词训练可以最小化顺序上的期望条件描述长度,而固定顺序仍保持位置敏感性。我们的量化鞅违反(QMV)界预测了由相邻秩位置敏感性引起的分散性,在调和区具有$O(\log n)$增长;我们的期望级解压定律(EDFL)将KL凸性/数据处理界专门化到伯努利谓词,产生信任比特(B2T)、幻觉风险(RoH)以及用于答案/弃权决策的信息充分率(ISR)门。在来自FEVER、HotpotQA、NQ-Open、PopQA和Controls的3,059个有依据项目上,我们观察到对数分散性和均匀排列混合的正Jensen增益。在一个预先指定的保留审计(528个项目)中,分析固定的ISR$=1$门实现了0.0-0.7%的幻觉率,20.6-27.9%的弃权率(95%置信区间),支持该操作点,但未声称对所有模型系列或不受限生成具有通用校准。

英文摘要

Transformers used for evidence-grounded binary adjudication (e.g., support/refute, yes/no, or verifier-backed pass/fail decisions) can be sensitive to the order in which exchangeable evidence is presented, producing dispersion across permutations and unreliable attempted answers under a verifier-relative Bernoulli predicate. We treat evidence order as a nuisance variable and formalize an expectation-realization gap: next-token training can minimize expected conditional description length over orderings while a fixed ordering remains position-sensitive. Our Quantified Martingale Violation (QMV) bound predicts the dispersion induced by adjacent-rank positional sensitivity, with $O(\log n)$ growth in the harmonic regime; our Expectation-level Decompression Law (EDFL) specializes a KL convexity/data-processing bound to Bernoulli predicates, yielding Bits-to-Trust (B2T), Risk-of-Hallucination (RoH), and an Information Sufficiency Ratio (ISR) gate for answer/abstain decisions. On 3,059 grounded items from FEVER, HotpotQA, NQ-Open, PopQA, and Controls, we observe logarithmic dispersion and positive Jensen gains from uniform permutation mixtures. In one pre-specified held-out audit (528 items), the analytically fixed ISR$=1$ gate attains 0.0-0.7% hallucination with 20.6-27.9% abstention (95% CIs), supporting the operating point without claiming universal calibration across all model families or unrestricted generation.
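
The dispersion across permutations and the Jensen gain from a uniform permutation mixture can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the paper's evaluation code: `toy_prob` is a hypothetical stand-in for a model's probability of the verified answer, and dispersion is measured here simply as the range of that probability over orderings.

```python
import math
from itertools import permutations

def toy_prob(order):
    """Hypothetical position-sensitive model: the probability assigned to the
    correct answer drops the later evidence item 0 appears in the prompt."""
    position = order.index(0)
    return 0.9 - 0.2 * position

def permutation_stats(n_items):
    """Return (dispersion, jensen_gain) over all orderings of n_items.

    dispersion:  max minus min probability across permutations (an assumed,
                 simple dispersion measure).
    jensen_gain: log of the mixture probability minus the average
                 log-probability of individual orderings; non-negative by
                 Jensen's inequality because log is concave.
    """
    probs = [toy_prob(list(p)) for p in permutations(range(n_items))]
    mean_prob = sum(probs) / len(probs)
    mean_log_prob = sum(math.log(p) for p in probs) / len(probs)
    dispersion = max(probs) - min(probs)
    jensen_gain = math.log(mean_prob) - mean_log_prob
    return dispersion, jensen_gain
```

The gain is strictly positive whenever the orderings disagree, which is why averaging over permutations can only help the log score relative to a typical fixed ordering.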

2509.24935 2026-06-08 cs.CV cs.AI cs.LG 版本更新

Scalable GANs with Transformers

可扩展的Transformer生成对抗网络

Sangeek Hyun, MinKyu Lee, Jae-Pil Heo

发表机构 * KAIST(韩国科学技术院)

AI总结 本文通过紧凑变分自编码器潜在空间和纯Transformer架构,研究了生成对抗网络的可扩展性,并提出了轻量级中间监督和宽度自适应学习率调整来解决缩放时的失败模式,在ImageNet-256上以40个epoch达到2.96的FID。

Comments ICML 2026

详情
AI中文摘要

可扩展性推动了生成建模的最新进展,但其原理在对抗学习中仍未充分探索。我们通过两个在其他类型生成模型中被证明有效的设计选择来研究生成对抗网络(GAN)的可扩展性:在紧凑的变分自编码器潜在空间中训练,以及采用纯Transformer的生成器和判别器。在潜在空间中训练能够在保持感知保真度的同时实现高效计算,而这种效率与普通Transformer自然匹配,后者的性能随计算预算扩展。基于这些选择,我们分析了朴素缩放GAN时出现的失败模式。具体来说,我们发现了随着网络规模扩大,生成器早期层利用不足和优化不稳定的问题。因此,我们提供了简单且对缩放友好的解决方案,如轻量级中间监督和宽度自适应学习率调整。我们的实验表明,GAT——一种纯Transformer的潜在空间GAN——能够在从S到XL的广泛容量范围内可靠地训练。此外,GAT-XL/2在ImageNet-256上仅用40个epoch(比强基线少6倍)就达到了最先进的单步类条件生成性能(FID为2.96)。项目页面:https://hse1032.github.io/GAT。

英文摘要

Scalability has driven recent advances in generative modeling, yet its principles remain underexplored for adversarial learning. We investigate the scalability of Generative Adversarial Networks (GANs) through two design choices that have proven to be effective in other types of generative models: training in a compact Variational Autoencoder latent space and adopting purely transformer-based generators and discriminators. Training in latent space enables efficient computation while preserving perceptual fidelity, and this efficiency pairs naturally with plain transformers, whose performance scales with computational budget. Building on these choices, we analyze failure modes that emerge when naively scaling GANs. Specifically, we find issues such as underutilization of early layers in the generator and optimization instability as the network scales. Accordingly, we provide simple and scale-friendly solutions such as lightweight intermediate supervision and width-aware learning-rate adjustment. Our experiments show that GAT, a purely transformer-based, latent-space GAN, can be trained reliably across a wide range of capacities (S through XL). Moreover, GAT-XL/2 achieves state-of-the-art single-step, class-conditional generation performance (FID of 2.96) on ImageNet-256 in just 40 epochs, 6x fewer epochs than strong baselines. Project page: https://hse1032.github.io/GAT.

2510.07315 2026-06-08 cs.CL cs.AI cs.LG cs.SE 版本更新

SWE-IF: Aligning Code Evaluation with Human Preference

SWE-IF: 使代码评估与人类偏好对齐

Ming Zhong, Xiang Zhou, Ting-Yun Chang, Qingze Wang, Nan Xu, Xiance Si, Dan Garrette, Shyam Upadhyay, Jeremiah Liu, Jiawei Han, Benoit Schillings, Jiao Sun

发表机构 * Google DeepMind(谷歌DeepMind)

AI总结 提出SWE-IF基准,通过可验证指令分类法VeriCode评估代码指令遵循能力,发现指令遵循是区分LLM代码质量的关键,与功能正确性结合更能匹配人类偏好。

Comments ICML 2026

详情
AI中文摘要

大型语言模型(LLM)推动了vibe coding,用户通过自然语言交互利用LLM生成并迭代优化代码,直到通过其vibe检查。Vibe检查反映了人类偏好,超越了功能性:解决方案应感觉正确、阅读清晰、保留意图并保持正确。然而,当前的代码评估仍局限于pass@k,仅捕获功能正确性,忽略了用户常规应用的非功能性指令。在本文中,我们假设指令遵循是vibe检查中除功能正确性之外缺失的部分。为了用量化信号衡量模型的代码指令遵循能力,我们提出了VeriCode,一个包含30条可验证代码指令及其确定性验证器的分类法。我们使用该分类法增强现有评估套件,得到SWE-IF,一个评估指令遵循和功能正确性的测试平台。评估31个LLM,我们发现即使最强的模型也难以遵守多条指令,并表现出功能回归。最重要的是,功能正确性和指令遵循的复合得分与人类偏好相关性最强,其中指令遵循成为LLM之间的主要区分因素。我们的代码、数据和分类法可在https://github.com/maszhongming/SWE-IF获取。

英文摘要

Large Language Models (LLMs) have catalyzed vibe coding, where users leverage LLMs to generate and iteratively refine code through natural language interactions until it passes their vibe check. Vibe check reflects human preference and goes beyond functionality: the solution should feel right, read cleanly, preserve intent, and remain correct. However, current code evaluation remains anchored to pass@k and captures only functional correctness, overlooking non-functional instructions that users routinely apply. In this paper, we hypothesize that instruction following is the missing piece underlying vibe check besides functional correctness. To quantify models' code instruction-following capabilities with measurable signals, we present VeriCode, a taxonomy of 30 verifiable code instructions together with deterministic verifiers. We use the taxonomy to augment established evaluation suites, resulting in SWE-IF, a testbed to assess both instruction following and functional correctness. Evaluating 31 LLMs, we show that even the strongest models struggle to comply with multiple instructions and exhibit functional regression. Most importantly, a composite score of functional correctness and instruction following correlates best with human preference, with instruction following emerging as the primary differentiator among LLMs. Our code, data, and taxonomy are available at https://github.com/maszhongming/SWE-IF.
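
To make the idea of a verifiable code instruction with a deterministic verifier concrete, here is a minimal sketch. The two instructions shown (a line-length limit and mandatory docstrings) and all function names are hypothetical examples chosen for illustration; they are not taken from the VeriCode taxonomy.

```python
import ast

def verify_max_line_length(code, limit=79):
    """Instruction: 'keep every line at or under `limit` characters'."""
    return all(len(line) <= limit for line in code.splitlines())

def verify_functions_have_docstrings(code):
    """Instruction: 'every function must have a docstring'."""
    tree = ast.parse(code)
    functions = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return all(ast.get_docstring(f) is not None for f in functions)

# Hypothetical instruction set attached to one coding task.
VERIFIERS = [verify_max_line_length, verify_functions_have_docstrings]

def instruction_following_rate(code, verifiers):
    """Fraction of instructions the generated code satisfies."""
    return sum(1 for check in verifiers if check(code)) / len(verifiers)
```

Each check is a pure function of the source text, so the result is reproducible and can be reported alongside a functional test outcome such as pass@k.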

2511.02748 2026-06-08 cs.NI cs.LG 版本更新

Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning

面向6G的智能体世界建模:近实时生成式状态空间推理

Farhad Rezazadeh, Amir Ashtari Gargari, Hatim Chergui, Sandra Lagen, Merouane Debbah, Houbing Song, Lingjia Liu

发表机构 * BrainOmega and the Technical University of Catalonia (UPC)(BrainOmega 和加泰罗尼亚理工大学(UPC)) Centre Tecnologic de Telecomunicacions de Catalunya (CTTC/CERCA)(加泰罗尼亚电信技术中心(CTTC/CERCA)) i2CAT Foundation(i2CAT 基金会) Khalifa University of Science and Technology(哈利法科技大学) University of Maryland, Baltimore County (UMBC)(马里兰大学巴尔的摩郡分校(UMBC)) Virginia Tech(弗吉尼亚理工大学)

AI总结 提出基于世界建模的智能体框架WM-MS3M,通过生成式状态空间实现6G网络近实时反事实推理与资源分配,在O-RAN数据上降低预测误差并加速推理。

Comments 13 Pages, 3 Figures, 4 Tables

详情
AI中文摘要

我们认为第六代(6G)智能并非流畅的令牌预测,而是想象与选择的能力——模拟未来场景、权衡取舍并以校准的不确定性行动。我们通过反事实动力学和世界建模(WM)范式重新定义开放无线接入网(O-RAN)近实时(Near-RT)控制,该范式学习动作条件的生成式状态空间。这使得超越大语言模型(LLM)作为主要建模基元的定量“假设”预测成为可能。诸如物理资源块(PRB)之类的动作在因果世界模型中被视为一等控制输入,并且对预测和假设分析中的偶然不确定性和认知不确定性进行建模。一个基于智能体模型预测控制(MPC)的交叉熵方法(CEM)规划器在短时域上运行,利用数据驱动的PRB边界内的先验均值展开以最大化确定性奖励。该模型将多尺度结构化状态空间混合(MS3M)与紧凑随机潜变量耦合形成WM-MS3M,总结关键绩效指标(KPI)历史并在假设PRB序列下预测下一步KPI。在真实O-RAN轨迹上,WM-MS3M相比MS3M在参数减少32%且延迟相似的情况下将平均绝对误差(MAE)降低1.69%,相比注意力/混合基线实现35-80%更低的均方根误差(RMSE)和2.3-4.1倍更快的推理速度,从而支持稀有事件模拟和离线策略筛选。

英文摘要

We argue that sixth-generation (6G) intelligence is not fluent token prediction but the capacity to imagine and choose -- to simulate future scenarios, weigh trade-offs, and act with calibrated uncertainty. We reframe open radio access network (O-RAN) near-real-time (Near-RT) control via counterfactual dynamics and a world modeling (WM) paradigm that learns an action-conditioned generative state space. This enables quantitative "what-if" forecasting beyond large language models (LLMs) as the primary modeling primitive. Actions such as physical resource blocks (PRBs) are treated as first-class control inputs in a causal world model, and both aleatoric and epistemic uncertainty are modeled for prediction and what-if analysis. An agentic, model predictive control (MPC)-based cross-entropy method (CEM) planner operates over short horizons, using prior-mean rollouts within data-driven PRB bounds to maximize a deterministic reward. The model couples multi-scale structured state-space mixtures (MS3M) with a compact stochastic latent to form WM-MS3M, summarizing key performance indicators (KPIs) histories and predicting next-step KPIs under hypothetical PRB sequences. On realistic O-RAN traces, WM-MS3M cuts mean absolute error (MAE) by 1.69% versus MS3M with 32% fewer parameters and similar latency, and achieves 35-80% lower root mean squared error (RMSE) than attention/hybrid baselines with 2.3-4.1x faster inference, enabling rare-event simulation and offline policy screening.
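
The planning loop described above, a cross-entropy method (CEM) searching over short PRB sequences inside fixed bounds and scoring each candidate with a deterministic rollout, can be sketched as follows. Everything concrete here is an assumption for illustration: the PRB bounds, the linear `toy_world_model` standing in for the learned world model's prior-mean prediction, and the reward (tracking a KPI target with a small resource cost).

```python
import random

PRB_MIN, PRB_MAX = 10.0, 100.0  # hypothetical data-driven PRB bounds

def toy_world_model(kpi, prb):
    """Hypothetical deterministic next-step KPI prediction given a PRB action."""
    return 0.7 * kpi + 0.3 * (0.5 * prb)

def rollout_reward(kpi0, prbs, target=30.0):
    """Deterministic reward of a PRB sequence: negative tracking error minus cost."""
    kpi, total = kpi0, 0.0
    for prb in prbs:
        kpi = toy_world_model(kpi, prb)
        total -= (kpi - target) ** 2 + 0.01 * prb
    return total

def cem_plan(kpi0, horizon=3, population=64, elites=8, iterations=10, seed=0):
    """Cross-entropy method: sample, keep the elites, refit mean and std, repeat."""
    rng = random.Random(seed)
    mean = [(PRB_MIN + PRB_MAX) / 2] * horizon
    std = [(PRB_MAX - PRB_MIN) / 2] * horizon
    for _ in range(iterations):
        samples = [
            [min(PRB_MAX, max(PRB_MIN, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(population)
        ]
        samples.sort(key=lambda p: rollout_reward(kpi0, p), reverse=True)
        top = samples[:elites]
        mean = [sum(p[t] for p in top) / elites for t in range(horizon)]
        std = [
            max(1e-3, (sum((p[t] - mean[t]) ** 2 for p in top) / elites) ** 0.5)
            for t in range(horizon)
        ]
    return mean  # planned PRB sequence; an MPC loop would apply only the first action
```

In a receding-horizon setting the planner would be re-run at every control step from the newly observed KPI history, which is what keeps the short horizon workable.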

2511.04567 2026-06-08 physics.plasm-ph cs.CE cs.LG physics.comp-ph 版本更新

Machine Learning for Electron-Scale Turbulence Modeling in W7-X

W7-X中电子尺度湍流建模的机器学习方法

Ionut-Gabriel Farcas, Don Lawrence Carl Agapito Fernando, Alejandro Banon Navarro, Gabriele Merlo, Frank Jenko

发表机构 * Department of Mathematics and Division of Computational Modeling and Data Analytics, Academy of Data Science, Virginia Tech(弗吉尼亚理工大学数学系及数据科学学院计算建模与数据分析部) Max Planck Institute for Plasma Physics(马克斯·普朗克等离子体物理研究所)

AI总结 针对Wendelstein 7-X仿星器中电子温度梯度湍流,利用主动学习回归构建物理引导的标度律降阶模型,预测热流并评估插值与外推性能。

Comments 15 pages, 7 tables, 14 figures

详情
Journal ref
Phys. Plasmas 33, 000000 (2026)
AI中文摘要

构建湍流输运的降阶模型对于加速剖面预测和实现参数探索、设计优化等多查询任务至关重要。本文研究了Wendelstein 7-X (W7-X)仿星器中电子温度梯度(ETG)湍流的机器学习驱动降阶模型。我们开发了物理引导的标度律,以预测七个径向位置处的ETG热流作为三个关键等离子体参数的函数:归一化电子温度梯度($ω_{T_e}$)、归一化电子温度与密度梯度之比($η_e$)以及电子与离子温度比($τ$)。模型系数通过回归结合主动学习策略确定。该过程使用低基数稀疏网格训练数据初始化标度律,并通过从现有模拟数据库中选择信息量最大的样本迭代丰富训练集。使用每个径向位置超过393个点的样本外数据集评估模型的预测性能。利用在七个训练径向位置识别的系数,我们进一步推导了标度律系数作为径向位置函数的回归参数化。然后在训练中未使用的三个额外径向位置评估所得模型,包括插值和适度外推情况。总体而言,我们的降阶模型表现出良好的预测性能,并达到与原始参考模拟相当的精度,包括在插值和适度外推范围内。一个重要发现是,单一的径向无关模型无法充分描述W7-X核心区的ETG输运,表明存在当前公式未捕捉的几何依赖物理。

英文摘要

Constructing reduced models for turbulent transport is essential for accelerating profile predictions and enabling many-query tasks such as parameter exploration and design optimization. This work investigates machine-learning-driven reduced models for Electron Temperature Gradient (ETG) turbulence in the Wendelstein 7-X (W7-X) stellarator. We develop physics-guided scaling laws to predict the ETG heat flux at seven radial locations as functions of three key plasma parameters: the normalized electron temperature gradient ($ω_{T_e}$), the ratio of normalized electron temperature and density gradients ($η_e$), and the electron-to-ion temperature ratio ($τ$). The model coefficients are determined through regression combined with an active learning strategy. The procedure initializes the scaling laws using low-cardinality sparse-grid training data and iteratively enriches the training set by selecting maximally informative samples from an existing simulation database. The predictive performance of the models is assessed using out-of-sample datasets comprising more than $393$ points per radial location. Using the coefficients identified at the seven training radial locations, we further derive regression-based parameterizations for the scaling-law coefficients as functions of radial position. The resulting models are then evaluated at three additional radial locations not used during training, including both interpolation and moderate extrapolation cases. Overall, our reduced models demonstrate good predictive performance and achieve accuracy comparable to the original reference simulations, including in interpolation and moderate extrapolation regimes. An important finding is that a single radius-independent model cannot adequately describe ETG transport across the W7-X core, suggesting the presence of geometry-dependent physics not captured by the present formulation.
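
A scaling law in three plasma parameters can be fitted by ordinary least squares in log space. The sketch below assumes a pure power-law form $Q = C\,ω_{T_e}^{a}\,η_e^{b}\,τ^{c}$, which may differ from the physics-guided form used in the paper, and it omits the active learning loop entirely; the function names are hypothetical.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_power_law(samples):
    """Fit Q = C * omega**a * eta**b * tau**c from (omega, eta, tau, Q) tuples.

    Taking logs makes the model linear in (log C, a, b, c), so the
    coefficients follow from the normal equations X^T X beta = X^T y.
    """
    X = [[1.0, math.log(w), math.log(e), math.log(t)] for w, e, t, _ in samples]
    y = [math.log(q) for *_, q in samples]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(4)] for i in range(4)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(4)]
    log_c, a, b, c = solve_linear(XtX, Xty)
    return math.exp(log_c), a, b, c
```

Repeating such a fit at each radial location yields one coefficient set per location, which is the kind of data a radial parameterization of the coefficients would then be regressed on.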

2512.04123 2026-06-08 cs.CY cs.AI cs.LG cs.SE 版本更新

Measuring Agents in Production

生产环境中的智能体测量

Melissa Z. Pan, Negar Arabzadeh, Riccardo Cogo, Yuxuan Zhu, Alexander Xiong, Lakshya A Agrawal, Huanzhi Mao, Emma Shen, Sid Pallerla, Liana Patel, Shu Liu, Tianneng Shi, Xiaoyuan Liu, Jared Quincy Davis, Emmanuele Lacavalla, Alessandro Basile, Shuyi Yang, Paul Castro, Daniel Kang, Koushik Sen, Dawn Song, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia, Marquita Ellis

发表机构 * University of California at Berkeley(加州大学伯克利分校) IBM Research(IBM研究院) University of Illinois at Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Stanford University(斯坦福大学)

AI总结 通过对86个已部署系统的调查和20个案例研究,发现生产环境中的LLM智能体主要采用简单可控的方法,可靠性是首要挑战,并依赖系统级设计和人工评估。

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026) as Oral Presentation

详情
AI中文摘要

基于LLM的智能体已经在许多行业的生产环境中运行,但我们缺乏对哪些技术方法能使部署成功的理解。我们首次系统性地研究了生产环境中的智能体测量(MAP),使用了来自智能体开发者的一手数据。我们通过深度访谈进行了20个案例研究,并调查了来自26个领域的86个已部署系统的从业者。我们调查了组织为何构建智能体、如何构建它们、如何评估它们以及它们面临的主要开发挑战。我们的研究发现,生产环境中的智能体是使用简单、可控的方法构建的:68%的智能体在人类干预前最多执行10步,70%依赖对现成模型进行提示而非权重调整,74%主要依赖人工评估。可靠性(随时间保持一致的正确行为)仍然是首要开发挑战,从业者目前通过系统级设计来解决。MAP记录了生产智能体的当前状态,为研究社区提供了部署现实和未充分探索的研究方向的可见性。

英文摘要

LLM-based agents already operate in production across many industries, yet we lack an understanding of what technical methods make deployments successful. We present the first systematic study of Measuring Agents in Production, MAP, using first-hand data from agent developers. We conducted 20 case studies via in-depth interviews and surveyed 86 deployed systems practitioners across 26 domains. We investigate why organizations build agents, how they build them, how they evaluate them, and their top development challenges. Our study finds that production agents are built using simple, controllable approaches: 68% execute at most 10 steps before human intervention, 70% rely on prompting off-the-shelf models instead of weight tuning, and 74% depend primarily on human evaluation. Reliability (consistent correct behavior over time) remains the top development challenge, which practitioners currently address through systems-level design. MAP documents the current state of production agents, providing the research community with visibility into deployment realities and underexplored research avenues.

2512.12997 2026-06-08 cs.CV cs.AI cs.LG 版本更新

Calibrating Uncertainty for Zero-Shot Adversarial CLIP

校准零样本对抗性CLIP的不确定性

Wenjing Lu, Zerui Tao, Yuning Qiu, Dongping Zhang, Yang Yang, Qibin Zhao

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 针对CLIP在零样本分类中对抗攻击脆弱且不确定性校准差的问题,提出基于狄利克雷分布重参数化的对抗微调目标,统一对齐语义结构与置信度,提升校准性和鲁棒性。

Comments ICML 2026

详情
AI中文摘要

CLIP在零样本分类中表现强劲,但仍易受对抗攻击。先前的对抗微调工作主要匹配干净样本和对抗样本之间的预测logits,忽略了不确定性校准,可能损害零样本泛化能力。在可靠的不确定性估计中,一个常见期望是预测不确定性应随输入难度增加或偏离训练分布而上升。然而,在对抗环境中我们经常观察到相反的情况:扰动不仅降低准确性,还抑制不确定性,导致严重的校准错误和过度自信。这揭示了鲁棒性之外的关键可靠性差距。为弥合这一差距,我们提出了一种考虑准确性和不确定性的CLIP对抗微调目标。通过将CLIP输出重参数化为狄利克雷分布的浓度参数,我们提出了一种统一表示,捕获相对语义结构和置信度大小。这使得在扰动下实现整体分布对齐,超越单一logits锚定,恢复校准的不确定性。在多个零样本基准上的实验表明,我们的方法显著提高了不确定性校准,在保持干净准确性的同时实现了具有竞争力的对抗鲁棒性。

英文摘要

CLIP delivers strong zero-shot classification but remains highly vulnerable to adversarial attacks. Prior adversarial fine-tuning work primarily matches predicted logits between clean and adversarial examples, which overlooks uncertainty calibration and may degrade the zero-shot generalization. A common expectation in reliable uncertainty estimation is that predictive uncertainty should increase as inputs become more difficult or shift away from the training distribution. However, we frequently observe the opposite in the adversarial setting: perturbations not only degrade accuracy but also suppress uncertainty, leading to severe miscalibration and over-confidence. This reveals a critical reliability gap beyond robustness. To bridge this gap, we propose an adversarial fine-tuning objective for CLIP considering both accuracy and uncertainty. By reparameterizing CLIP outputs as the concentration parameters of a Dirichlet distribution, we propose a unified representation that captures relative semantic structure and confidence magnitude. This enables holistic distribution alignment under perturbations, moving beyond single-logit anchoring and restoring calibrated uncertainty. Experiments across multiple zero-shot benchmarks demonstrate that our method significantly improves uncertainty calibration and achieves competitive adversarial robustness while preserving clean accuracy.
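
The distinction between relative semantic structure and confidence magnitude can be shown with a small sketch. It uses the evidential convention (evidence = exp(logit), concentration = evidence + 1, uncertainty = K / sum of concentrations), which is one common choice and an assumption here; the paper's exact reparameterization of CLIP outputs is not specified in the abstract.

```python
import math

def softmax(logits):
    """Standard softmax; invariant to adding a constant to every logit."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dirichlet_from_logits(logits):
    """Assumed mapping: alpha_k = exp(logit_k) + 1 (non-negative evidence plus one)."""
    return [math.exp(z) + 1.0 for z in logits]

def vacuity(alpha):
    """Uncertainty mass K / S, where S is the Dirichlet strength (sum of alpha)."""
    return len(alpha) / sum(alpha)
```

Shifting all logits by a constant leaves the softmax unchanged but changes the Dirichlet strength, so a Dirichlet view can register a loss of confidence that matching softmax outputs alone would not see.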

2512.13278 2026-06-08 cs.CL cs.LG 版本更新

AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning

AutoTool: 面向智能体推理的动态工具选择与集成

Jiaru Zou, Ling Yang, Yunzhe Qi, Sirui Chen, Mengting Ai, Ke Shen, Jingrui He, Mengdi Wang

发表机构 * Nanyang Technological University(南洋理工大学)

AI总结 提出AutoTool框架,通过双阶段优化(SFT+RL轨迹稳定化和KL正则化Plackett-Luce排序)使大语言模型具备动态工具选择能力,在数学、科学、搜索问答、代码和多模态推理等任务上平均提升4.5%-7.7%。

Comments ICML2026; Best Paper Award at ICCV 2025 Workshop on Multi-Modal Reasoning for Agentic Intelligence

详情
AI中文摘要

智能体强化学习推动了大语言模型(LLMs)在长链思维轨迹中进行推理,同时穿插外部工具的使用。现有方法假设工具集固定,限制了LLM智能体对新工具或演化工具集的适应性。我们提出AutoTool,一个训练框架,使LLM智能体在整个推理轨迹中具备动态工具选择能力。AutoTool采用双阶段优化流水线:(i)基于SFT和RL的轨迹稳定化,以实现连贯推理;(ii)KL正则化的Plackett-Luce排序,以优化一致的多步工具选择。我们进一步构建了一个包含20万条数据的数据集,其中包含跨1000多个工具和100多个任务(涵盖数学、科学、代码生成和多模态推理)的显式工具选择理由。在十个多样化基准上,我们使用AutoTool训练了两个基础模型:Qwen3-8B和Qwen2.5-VL-7B。在参数更少的情况下,AutoTool持续优于先进的LLM智能体和工具集成方法,在数学与科学推理上平均提升6.4%,在基于搜索的问答上提升4.5%,在代码生成上提升7.7%,在多模态理解上提升6.9%。此外,AutoTool通过在推理过程中动态利用演化工具集中的未见工具,展现出更强的泛化能力。

英文摘要

Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume a fixed inventory of tools, which limits the adaptability of LLM agents to new or evolving toolsets. We present AutoTool, a training framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories. AutoTool employs a dual-phase optimization pipeline: (i) SFT and RL-based trajectory stabilization for coherent reasoning, and (ii) KL-regularized Plackett-Luce Ranking to refine consistent multi-step tool selection. We further build a 200k dataset with explicit tool-selection rationales across 1,000+ tools and 100+ tasks spanning mathematics, science, code generation, and multimodal reasoning. Across ten diverse benchmarks, we train two base models, Qwen3-8B and Qwen2.5-VL-7B, with AutoTool. With fewer parameters, AutoTool consistently outperforms advanced LLM agents and tool-integration methods, yielding average gains of 6.4% in math & science reasoning, 4.5% in search-based QA, 7.7% in code generation, and 6.9% in multimodal understanding. In addition, AutoTool exhibits stronger generalization by dynamically leveraging unseen tools from evolving toolsets during inference.
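
The Plackett-Luce model referenced above assigns a probability to an entire ranking by repeatedly choosing the next item with a softmax over the items that remain. The sketch below computes that log-likelihood for a hypothetical set of tool scores; the KL regularization term and the way AutoTool obtains the scores are not shown, and the tool names are invented for illustration.

```python
import math

def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` (best first) under Plackett-Luce.

    scores:  dict mapping item -> real-valued score.
    ranking: sequence of all items, ordered from most to least preferred.
    At each position, the chosen item competes against every item not yet placed.
    """
    remaining = list(ranking)
    log_likelihood = 0.0
    for item in ranking:
        log_norm = math.log(sum(math.exp(scores[j]) for j in remaining))
        log_likelihood += scores[item] - log_norm
        remaining.remove(item)
    return log_likelihood
```

Maximizing this quantity for preferred rankings pushes the scores of better tools above worse ones at every position, rather than only rewarding the top choice.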

2512.23292 2026-06-08 cs.AI cs.LG 版本更新

Agentic Physical AI toward a Domain-Specific Foundation Model for Energy Systems: A Case Study on Nuclear Reactor Control

面向能源系统领域特定基础模型的智能体物理人工智能:以核反应堆控制为例

Yoon Pyo Lee, Samrendra Roy, Kazuma Kobayashi, Sajedul Talukder, Diab Abueidda, Seid Koric, Souvik Chakraborty, Syed Bahauddin Alam

发表机构 * The Grainger College of Engineering, Nuclear, Plasma & Radiological Engineering, University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校格兰杰工程学院核、等离子体与放射工程系) Department of Nuclear Engineering, Hanyang University(汉阳大学核工程系) University of Texas - El Paso(德克萨斯大学埃尔帕索分校) National Center for Supercomputing Applications(国家超级计算应用中心) Department of Applied Mechanics, Indian Institute of Technology Delhi(印度理工学院德里分校应用力学系) Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi(印度理工学院德里分校Yardi人工智能学院)

AI总结 本研究提出通过紧凑语言模型作为智能体物理人工智能,利用基于物理模拟器验证的策略优化替代感知推理,在核反应堆控制任务中实现领域特定基础模型,并展示了规模扩展带来的可靠性提升和策略集中化行为。

详情
AI中文摘要

当前物理系统人工智能的主流范式(将通用基础模型扩展至通用多模态推理)在控制接口处面临障碍。前沿视觉-语言模型在基本定量物理任务上仅达到50-53%的准确率,表现为近似猜测者,在保持语义合理性的同时违反物理约束。安全关键控制要求对执行动作的结果空间保证,而非参数空间模仿。本文提出了一条通向领域特定基础模型的路径,通过紧凑语言模型作为智能体物理人工智能运行:基于物理模拟器验证的策略优化,而非感知推理。我们在从10^3到10^5个样本规模扩展的合成核反应堆场景上训练了一个360M参数的模型。规模扩展在标称模拟条件下产生了强烈的、依赖于工况的可靠性提升,方差缩小约500倍,并在采样分布上消除了>10%的终端功率偏移。尽管模型均衡地接触了四种执行机构族,但它在运行时将95%的执行集中在单一棒组策略上,无需强化学习或奖励工程。表征可在不同模拟器间迁移,无需改变架构。我们将该系统定位为验证、监控和纵深防御架构中的候选决策组件,而非独立的安全解决方案:所展示的行为仅涉及模拟中单步任务的闭环可靠性,尚未解决非标称运行、传感器故障或不确定性量化问题。

英文摘要

The prevailing paradigm in AI for physical systems, scaling general-purpose foundation models toward universal multimodal reasoning, confronts a barrier at the control interface. Frontier vision-language models achieve only 50-53% accuracy on basic quantitative physics tasks, behaving as approximate guessers that preserve semantic plausibility while violating physical constraints. Safety-critical control demands outcome-space guarantees over executed actions, not parameter-space imitation. Here we present a pathway toward domain-specific foundation models through compact language models operating as Agentic Physical AI: policy optimization driven by physics-based simulator validation rather than perceptual inference. We train a 360M-parameter model on synthetic nuclear reactor scenarios scaled from 10^3 to 10^5 examples. Scaling produces strong, regime-dependent reliability gains under nominal simulated conditions, with variance collapse of approximately 500x and elimination of >10% terminal-power excursions on the sampled distribution. Despite balanced exposure to four actuation families, the model concentrates 95% of runtime execution on a single-bank strategy, without reinforcement learning or reward engineering. Representations transfer across simulators without architectural change. We position the system as a candidate decision component within a verification, monitoring, and defense-in-depth architecture, not as a stand-alone safety solution: the demonstrated behavior speaks to closed-loop reliability on a single-step task in simulation and does not yet address off-nominal operation, sensor faults, or uncertainty quantification.

2601.04791 2026-06-08 cs.CV cs.LG 版本更新

Measurement-Consistent Langevin Corrector for Stabilizing Latent Diffusion Inverse Problem Solvers

用于稳定潜在扩散逆问题求解器的测量一致朗之万校正器

Lee Hyoseok, Sohwi Lim, Eunju Cha, Tae-Hyun Oh

发表机构 * Sookmyung Women's University(淑明女子大学)

AI总结 针对潜在扩散模型逆问题求解器的不稳定性,提出测量一致朗之万校正器(MCLC),通过测量一致的朗之万更新缩小求解器与稳定反向扩散之间的差距,实现稳定可靠的潜在空间求解。

Comments ICML 2026

详情
AI中文摘要

尽管潜在扩散模型(LDM)已成为逆问题的强大先验,但现有的基于LDM的求解器经常遭受不稳定性。在这项工作中,我们首先将不稳定性识别为求解器动力学与扩散模型学习的稳定反向扩散动力学之间的差异,并表明减少这种差距可以稳定求解器。基于此,我们引入了测量一致朗之万校正器(MCLC),这是一个理论上有依据的即插即用稳定模块,通过测量一致的朗之万更新来修复基于LDM的逆问题求解器。与先前依赖线性流形假设(通常在潜在空间中不成立)的方法相比,MCLC提供了一种原则性的稳定机制,从而在潜在空间中实现更稳定和可靠的行为。

英文摘要

While latent diffusion models (LDMs) have emerged as powerful priors for inverse problems, existing LDM-based solvers frequently suffer from instability. In this work, we first identify the instability as a discrepancy between the solver dynamics and the stable reverse diffusion dynamics learned by the diffusion model, and show that reducing this gap stabilizes the solver. Building on this, we introduce the Measurement-Consistent Langevin Corrector (MCLC), a theoretically grounded plug-and-play stabilization module that remedies LDM-based inverse problem solvers through measurement-consistent Langevin updates. Compared to prior approaches that rely on linear manifold assumptions, which often fail to hold in latent space, MCLC provides a principled stabilization mechanism, leading to more stable and reliable behavior in latent space.

2601.08097 2026-06-08 cs.CL cs.LG 版本更新

AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling

AdaJudge: 自适应多视角评判用于奖励建模

Yongliang Miao, Yangyang Liang, Mengnan Du

发表机构 * Emory University(埃默里大学) The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳))

AI总结 提出AdaJudge框架,通过门控精化块和自适应多视角池化模块,联合优化表示与聚合,解决奖励建模中静态归纳偏差和表示不匹配问题,在RM-Bench和JudgeBench上超越现有模型。

Comments ACL 2026

详情
AI中文摘要

奖励建模对于将大型语言模型与人类偏好对齐至关重要,但主流架构依赖静态池化策略将序列压缩为标量分数。然而,这种范式存在两个关键限制:静态归纳偏差与任务相关的偏好信号不匹配,以及表示不匹配,因为骨干网络针对生成的优化使其表示不适用于细粒度判别。为解决这一问题,我们提出AdaJudge,一个统一框架,联合调整表示和聚合。AdaJudge首先通过门控精化块将骨干网络表示改进到判别导向的空间。然后,它用自适应多视角池化模块替换静态读出,该模块动态路由并组合证据。在RM-Bench和JudgeBench上的大量实验表明,AdaJudge优于强大的现成奖励模型和传统池化基线。

英文摘要

Reward modeling is essential for aligning large language models with human preferences, yet predominant architectures rely on a static pooling strategy to condense sequences into scalar scores. This paradigm, however, suffers from two key limitations: a static inductive bias that misaligns with task-dependent preference signals, and a representational mismatch, as the backbone's optimization for generation leaves its representations ill-suited to fine-grained discrimination. To address this, we propose AdaJudge, a unified framework that jointly adapts representation and aggregation. AdaJudge first improves backbone representations into a discrimination-oriented space via gated refinement blocks. It then replaces the static readout with an adaptive multi-view pooling module, which dynamically routes and combines evidence. Extensive experiments on RM-Bench and JudgeBench show that AdaJudge outperforms strong off-the-shelf reward models and traditional pooling baselines.

2601.12375 2026-06-08 cs.NI cs.LG 版本更新

LiQSS: Post-Transformer Linear Quantum-Inspired State-Space Tensor Networks for Real-Time 6G

LiQSS:后Transformer线性量子启发状态空间张量网络用于实时6G

Farhad Rezazadeh, Hatim Chergui, Amir Ashtari Gargari, Mehdi Bennis, Houbing Song, Lingjia Liu, Merouane Debbah

发表机构 * i2CAT Foundation(i2CAT基金会) University of Oulu(奥卢大学) University of Maryland, Baltimore County (UMBC)(马里兰大学巴尔的摩郡分校) Virginia Tech(弗吉尼亚理工大学) Khalifa University of Science and Technology(哈利法科技大学)

AI总结 提出一种后Transformer的量子启发状态空间张量网络LiQSS,用线性复杂度结构状态空间核替代自注意力,结合张量列车(Tensor Train)分解和轻量门控,在6G O-RAN近实时KPI预测中实现参数减少155倍、推理加速2.74倍且不损失精度。

Comments 13 pages, 4 figures, 5 tables

详情
AI中文摘要

第六代(6G)开放无线接入网络(O-RAN)中的主动和智能控制需要在严格的近实时(Near-RT)延迟和计算约束下进行控制级预测。虽然基于Transformer的模型在序列建模中有效,但其二次复杂度限制了在近实时RAN智能控制器(RIC)分析中的可扩展性。本文研究了一种后Transformer设计范式,用于高效的无线电遥测预测。我们提出了一种量子启发的多体状态空间张量网络,用稳定的结构化状态空间动力学核替代自注意力,实现线性时间序列建模。采用张量训练(TT)/矩阵乘积态(MPS)表示形式的张量网络分解,以减少输入投影和预测头中的参数化和数据移动,同时轻量级通道门控和混合层捕获非平稳的跨关键性能指标(KPI)依赖关系。该模型实例化为一个智能感知-预测xApp,并在一个包含13个KPI的59441个滑动窗口的定制O-RAN KPI时间序列数据集上评估,以参考信号接收功率(RSRP)预测作为代表性用例。我们提出的线性量子启发状态空间(LiQSS)模型比先前的结构化状态空间基线小10.8倍至15.8倍,速度快约1.4倍。相对于基于Transformer的模型,LiQSS在不牺牲预测精度的情况下,参数数量减少高达155倍,推理速度提升高达2.74倍。

英文摘要

Proactive and agentic control in Sixth-Generation (6G) Open Radio Access Networks (O-RAN) requires control-grade prediction under stringent Near-Real-Time (Near-RT) latency and computational constraints. While Transformer-based models are effective for sequence modeling, their quadratic complexity limits scalability in Near-RT RAN Intelligent Controller (RIC) analytics. This paper investigates a post-Transformer design paradigm for efficient radio telemetry forecasting. We propose a quantum-inspired many-body state-space tensor network that replaces self-attention with stable structured state-space dynamics kernels, enabling linear-time sequence modeling. Tensor-network factorizations in the form of Tensor Train (TT) / Matrix Product State (MPS) representations are employed to reduce parameterization and data movement in both input projections and prediction heads, while lightweight channel gating and mixing layers capture non-stationary cross-Key Performance Indicator (KPI) dependencies. The proposed model is instantiated as an agentic perceive-predict xApp and evaluated on a bespoke O-RAN KPI time-series dataset comprising 59,441 sliding windows across 13 KPIs, using Reference Signal Received Power (RSRP) forecasting as a representative use case. Our proposed Linear Quantum-Inspired State-Space (LiQSS) model is 10.8x-15.8x smaller and approximately 1.4x faster than prior structured state-space baselines. Relative to Transformer-based models, LiQSS achieves up to a 155x reduction in parameter count and up to 2.74x faster inference, without sacrificing forecasting accuracy.
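
The parameter savings from a Tensor Train (TT) factorization of a projection matrix can be checked with simple arithmetic. The sketch below counts parameters for a TT-matrix whose k-th core has shape (r_{k-1}, m_k, n_k, r_k); the mode sizes and ranks in the usage note are hypothetical and are not the configuration used by LiQSS.

```python
from math import prod

def tt_param_count(in_modes, out_modes, ranks):
    """Parameters of a TT-matrix mapping prod(in_modes) inputs to prod(out_modes) outputs.

    ranks has one more entry than the number of cores, with boundary ranks of 1.
    Core k stores ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] values.
    """
    assert len(in_modes) == len(out_modes) == len(ranks) - 1
    assert ranks[0] == 1 and ranks[-1] == 1
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )

def dense_param_count(in_modes, out_modes):
    """Parameters of the equivalent dense weight matrix (bias ignored)."""
    return prod(in_modes) * prod(out_modes)
```

For example, a 256-by-256 projection factored with modes (4, 8, 8) on both sides and internal ranks of 4 needs 1,344 parameters instead of 65,536, roughly a 49x reduction, at the cost of restricting the matrix to low TT rank.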

2602.01177 2026-06-08 quant-ph cs.IT cs.LG math.IT 版本更新

Privacy Implies Stability: Information-Theoretic Generalization Bounds for Quantum Learning

隐私蕴含稳定性:量子学习的信息论泛化界

Ayanava Dasgupta, Naqueeb Ahmad Warsi, Masahito Hayashi

发表机构 * Indian Statistical Institute, Kolkata(印度统计研究所,加尔各答) School of Data Science, The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)数据科学学院) International Quantum Academy, Futian District, Shenzhen(深圳国际量子研究院,福田区) Graduate School of Mathematics, Nagoya University(名古屋大学数学研究生院)

AI总结 提出信息论框架连接量子学习算法的稳定性、隐私与泛化,证明量子差分隐私可直接导出泛化保证,并发现量子非正交性使信息论可容许性与隐私兼容。

Comments 36 pages, 3 figures; The introduction has been substantially rewritten to provide better context, and certain proofs have been relocated from the appendices to the main body of the paper; The core mathematical framework and technical results remain unchanged

详情
AI中文摘要

我们开发了一个信息论框架,连接量子学习算法的稳定性、隐私和泛化。学习过程被建模为具有经典-量子输出的量子仪器,损失由可观测量表示。我们证明,在经典-量子次高斯条件下,信息论稳定性度量控制期望泛化误差。此外,我们利用量子Rényi散度处理非交换性下的高阶依赖,建立了一个高概率泛化界。在可信数据处理者设置中,量子差分隐私(QDP)提供了一种稳定性机制。我们证明单邻居QDP严格限制了经典-量子输出泄露的信息。结合我们的稳定性定理,直接得到隐私到泛化的保证。我们还探索了不可信数据处理者设置。在此,仅输出隐私是不够的,因为对抗性处理者可能在应用噪声后处理之前执行高度信息性的过程。为了解决这个问题,我们引入了信息论可容许性(ITA),这是一种认证条件,确保规定程序不仅仅是编码系综上一个严格更具信息性、物理允许操作的退化版本。我们证明了一个基本分离:虽然在经典模型中可容许性和隐私存在强烈张力,但量子非正交性使它们兼容。量子测量可以是ITA——耗尽所有相关的可访问信息——而无需完美恢复经典数据集。我们通过一个具体的量子ITA例子说明了这种分离。

英文摘要

We develop an information-theoretic framework connecting stability, privacy, and generalization for quantum learning algorithms. Learning procedures are modeled as quantum instruments with classical-quantum outputs, and losses are represented by observables. We prove that under a classical-quantum sub-Gaussian condition, an information-theoretic stability measure controls the expected generalization error. Furthermore, we establish a high-probability generalization bound using quantum Rényi divergences to manage higher-order dependencies under non-commutativity. In the trusted Data Processor setting, quantum differential privacy (QDP) provides a mechanism for stability. We show that one-neighbor QDP strictly bounds the information leaked by the classical-quantum output. Combining this with our stability theorem yields a direct privacy-to-generalization guarantee. We also explore an untrusted Data Processor setting. Here, output privacy alone is insufficient since an adversarial processor could perform a highly informative procedure before applying noisy post-processing. To combat this, we introduce Information-Theoretic Admissibility (ITA), a certification condition ensuring the prescribed procedure is not just a degraded version of a strictly more informative, physically allowed operation on the encoded ensemble. We prove a fundamental separation: while admissibility and privacy are in strong tension in classical models, quantum non-orthogonality makes them compatible. A quantum measurement can be ITA - exhausting all relevant accessible information - without perfectly recovering the classical dataset. We illustrate this separation through a concrete quantum ITA example.

2602.01740 2026-06-08 cs.AI cs.CV cs.LG 版本更新

MACD: Model-Aware Contrastive Decoding via Counterfactual Data

MACD:基于反事实数据的模型感知对比解码

Qixin Xiao, Kun Zhou

发表机构 * University of Michigan, Ann Arbor, MI, USA(密歇根大学,安娜堡分校) University of California San Diego, La Jolla, CA, USA(加州大学圣地亚哥分校)

AI总结 提出MACD方法,利用视频语言模型自身反馈识别导致幻觉的目标区域,生成目标级反事实输入,结合对比解码减少幻觉,提升多模型在复杂场景下的准确性。

详情
AI中文摘要

视频语言模型(Video-LLMs)容易产生幻觉,当视觉证据薄弱、模糊或存在偏差时,会生成看似合理但无根据的内容。现有方法如对比解码(CD)依赖随机扰动构建对比数据以缓解幻觉,但往往未能针对驱动幻觉的视觉线索或模型弱点。我们提出基于模型感知反事实数据的对比解码(MACD),这是一种结合模型引导的反事实构建与对比解码的推理策略。MACD利用Video-LLM自身的反馈来识别最可能导致幻觉的目标区域,生成有针对性的目标级反事实输入,而非任意的帧或时间修改。这些反事实输入被整合到CD中,以在解码过程中强制进行基于证据的令牌选择。在EventHallusion、MVBench、Perception-test和Video-MME上的实验表明,MACD在包括Qwen和InternVL在内的多种Video-LLM上持续减少幻觉,同时保持或提高任务准确性,在涉及小目标、遮挡目标或共现目标的场景中尤其表现出显著优势。

英文摘要

Video language models (Video-LLMs) are prone to hallucinations, generating plausible but ungrounded content when visual evidence is weak, ambiguous, or biased. Existing methods, such as contrastive decoding (CD), rely on random perturbations to construct contrastive data for hallucination mitigation, but often fail to target the visual cues that drive hallucination or align with model weaknesses. We propose Model-Aware Counterfactual Data based Contrastive Decoding (MACD), an inference strategy that combines model-guided counterfactual construction with contrastive decoding. MACD uses the Video-LLM's own feedback to identify object regions most responsible for hallucination, generating targeted object-level counterfactual inputs rather than arbitrary frame or temporal modifications. These counterfactual inputs are integrated into CD to enforce evidence-grounded token selection during decoding. Experiments on EventHallusion, MVBench, Perception-test, and Video-MME show that MACD consistently reduces hallucination while maintaining or improving task accuracy across diverse Video-LLMs, including Qwen and InternVL, with especially strong gains in scenarios involving small, occluded, or co-occurring objects.
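
The abstract does not give MACD's decoding rule, so the following is only a minimal sketch of one commonly used contrastive decoding form: the next-token scores from the original input are contrasted with those from a counterfactual input, restricted to tokens that are plausible under the original input. The function names and the parameters `alpha` and `beta` are illustrative assumptions, not the authors' API.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def contrastive_step(logits_orig, logits_cf, alpha=1.0, beta=0.1):
    """Pick the next token by contrasting original and counterfactual logits.

    alpha: contrast strength (assumed hyperparameter).
    beta:  plausibility cutoff relative to the most likely original token.
    """
    lp_o = log_softmax(logits_orig)
    lp_c = log_softmax(logits_cf)
    cutoff = max(lp_o) + math.log(beta)
    scores = [
        (1 + alpha) * o - alpha * c if o >= cutoff else float("-inf")
        for o, c in zip(lp_o, lp_c)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch a token whose probability barely changes when the evidence is removed is penalized, while a token that depends on the evidence is promoted; setting `alpha=0` falls back to greedy decoding on the original input.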

2602.02014 2026-06-08 cs.CV cs.AI cs.CL cs.LG 版本更新

Rethinking Genomic Modeling Through Optical Character Recognition

通过光学字符识别重新思考基因组建模

Hongxin Xiang, Pengsen Ma, Yunkang Cao, Di Yu, Haowen Chen, Xinyu Yang, Xiangxiang Zeng

发表机构 * National University of Singapore(新加坡国立大学) University of Science and Technology of China(中国科学技术大学)

AI总结 提出OpticalDNA框架,将DNA渲染为视觉布局,利用视觉语言模型进行OCR式基因组理解,实现高保真压缩和长序列高效处理,在450k碱基序列上以近20倍更少有效token超越基线模型。

Comments Accepted by ICML 2026

详情
AI中文摘要

最近的基因组基础模型大多采用大型语言模型架构,将DNA视为一维token序列。然而,穷举式顺序阅读在结构上与稀疏且不连续的基因组语义不匹配,导致在低信息背景上的计算浪费,并阻碍了面向长上下文的压缩理解。在此,我们提出OpticalDNA,一个基于视觉的框架,将基因组建模重新定义为光学字符识别(OCR)风格的文档理解。OpticalDNA将DNA渲染为结构化视觉布局,并训练一个具备OCR能力的视觉语言模型,该模型包含视觉DNA编码器和文档解码器,其中编码器生成紧凑、可重建的视觉token以实现高保真压缩。基于这种表示,OpticalDNA定义了基于提示条件的核心基因组原语目标——读取、区域定位、子序列检索和掩码跨度补全——从而学习到布局感知的DNA表示,在减少的有效token预算下保留细粒度的基因组信息。在多种基因组基准测试中,OpticalDNA持续优于最近的基线模型;在长达450k碱基的序列上,它以近20倍更少的有效token实现了最佳整体性能,并且仅调整256k可训练参数就超越了激活参数多达985倍的模型。

英文摘要

Recent genomic foundation models largely adopt large language model architectures that treat DNA as a one-dimensional token sequence. However, exhaustive sequential reading is structurally misaligned with sparse and discontinuous genomic semantics, leading to wasted computation on low-information background and preventing understanding-driven compression for long contexts. Here, we present OpticalDNA, a vision-based framework that reframes genomic modeling as Optical Character Recognition (OCR)-style document understanding. OpticalDNA renders DNA into structured visual layouts and trains an OCR-capable vision-language model with a visual DNA encoder and a document decoder, where the encoder produces compact, reconstructible visual tokens for high-fidelity compression. Building on this representation, OpticalDNA defines prompt-conditioned objectives over core genomic primitives (reading, region grounding, subsequence retrieval, and masked span completion), thereby learning layout-aware DNA representations that retain fine-grained genomic information under a reduced effective token budget. Across diverse genomic benchmarks, OpticalDNA consistently outperforms recent baselines; on sequences up to 450k bases, it achieves the best overall performance with nearly 20$\times$ fewer effective tokens, and surpasses models with up to 985$\times$ more activated parameters while tuning only 256k trainable parameters.

2602.06245 2026-06-08 stat.ML cs.LG 版本更新

Inheritance Between Feedforward and Convolutional Networks via Model Projection

前馈网络与卷积网络之间的继承关系:通过模型投影

Nicolas Ewen, Jairo Diaz-Rodriguez, Kelly Ramsay

发表机构 * Department of Mathematics and Statistics(数学与统计学系)

AI总结 提出模型继承概念,证明广义前馈网络是广义卷积网络的子集,并通过模型投影实现反向继承,用于参数高效的迁移学习。

详情
AI中文摘要

神经网络技术通常通过类比在不同架构家族之间转移,但这种转移仅在技术所需假设被保留时才有效。我们将这一思想引入为模型类之间的继承。使用统一的节点级框架和张量值激活,我们证明广义前馈网络(GFFN)是广义卷积网络(GCNN)的严格子集,因此GCNN的性质直接转移到GFFN。反向方向并非自动:标准CNN节点使用空间核,而FFN节点对每个输入贡献使用一个标量权重。我们引入模型投影来恢复受限的反向继承路径。投影冻结每个卷积输入通道子函数,并为每个输入-输出通道贡献学习一个标量系数,使投影后的CNN节点具有标量加权输入重组的GFFN风格可训练结构。这种继承结构自然导致参数高效的迁移学习。在多个ImageNet预训练CNN骨干网络和下游图像分类数据集上,模型投影与标准和PEFT基线竞争,并为后续全微调提供有效的初始化。

英文摘要

Neural-network techniques are often transferred across architecture families by analogy, but such transfer is valid only when the assumptions required by a technique are preserved. We introduce this idea as inheritance between model classes. Using a unified node-level framework with tensor-valued activations, we prove that generalized feedforward networks (GFFNs) form a strict subset of generalized convolutional networks (GCNNs), so GCNN properties transfer directly to GFFNs. The reverse direction is not automatic: standard CNN nodes use spatial kernels, while FFN nodes use one scalar weight per input contribution. We introduce model projection to recover a restricted reverse inheritance path. Projection freezes each convolutional input-channel sub-function and learns one scalar coefficient for each input-output channel contribution, giving projected CNN nodes the GFFN-style trainable structure of scalar-weighted input recombination. This inherited structure leads naturally to parameter-efficient transfer learning. Across multiple ImageNet-pretrained CNN backbones and downstream image-classification datasets, model projection is competitive with standard and PEFT baselines and provides an effective initialization for subsequent full fine-tuning.
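
As a rough illustration of the projection step described above, the sketch below treats each frozen kernel applied to one input channel as a fixed sub-function and learns only one scalar per input-output channel pair. It uses 1-D signals and plain lists for brevity; the function names and data layout are assumptions made for this example, not the paper's implementation.

```python
def conv1d_valid(signal, kernel):
    # Cross-correlation without padding, as used in deep-learning "convolutions".
    k = len(kernel)
    return [sum(signal[t + j] * kernel[j] for j in range(k))
            for t in range(len(signal) - k + 1)]

def projected_node(x, frozen_kernels, coeffs):
    """Projected convolutional node.

    x:              list of input channels (each a list of floats)
    frozen_kernels: frozen_kernels[o][i] is the fixed kernel for output o, input i
    coeffs:         coeffs[o][i] is the only trainable scalar for that pair
    """
    outputs = []
    for kernels_o, coeffs_o in zip(frozen_kernels, coeffs):
        # Frozen per-input-channel sub-functions.
        parts = [conv1d_valid(x_i, k_oi) for x_i, k_oi in zip(x, kernels_o)]
        length = len(parts[0])
        # Scalar-weighted recombination, the FFN-style trainable structure.
        outputs.append([sum(a * p[t] for a, p in zip(coeffs_o, parts))
                        for t in range(length)])
    return outputs
```

With all coefficients set to 1 the node reproduces the original convolution, and the trainable parameter count per node drops from (outputs x inputs x kernel size) to (outputs x inputs).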

2602.09580 2026-06-08 cs.RO cs.LG 版本更新

SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows

SERNF: 通过动作块评论家和归一化流实现样本高效的真实世界灵巧策略微调

Chenyu Yang, Denis Tarasov, Davide Liconti, Romain Guntz, Hehui Zheng, Robert K. Katzschmann

发表机构 * Soft Robotics Lab, D-MAVT(软机器人实验室,D-MAVT) ETH Zurich(苏黎世联邦理工学院)

AI总结 提出SERNF框架,结合归一化流策略和动作块评论家,实现真实世界灵巧操作策略的样本高效微调,解决多模态动作分布和信用分配问题。

Comments https://srl-ethz.github.io/SERNF/

详情
AI中文摘要

由于有限的真实世界交互预算和高度多模态的动作分布,真实世界中灵巧操作策略的微调仍然具有挑战性。基于扩散的策略虽然表达能力强,但在微调过程中不允许进行保守的基于似然的更新,因为动作概率难以处理。相比之下,传统的高斯策略在多模态下会崩溃,特别是当动作以块形式执行时,而标准的逐步骤评论家无法与块执行对齐,导致信用分配不佳。我们提出了SERNF,一个具有归一化流(NF)的样本高效离策略微调框架,以应对这些挑战。归一化流策略为多模态动作块提供精确的似然,通过似然正则化实现保守、稳定的策略更新,从而提高样本效率。动作块评论家评估整个动作序列,使价值估计与策略的时间结构对齐,并改善长时域信用分配。据我们所知,这是首次在真实机器人硬件上展示基于似然的多模态生成策略与块级价值学习相结合。我们在真实世界的两个具有挑战性的灵巧操作任务上评估了SERNF:从盒子中取出剪刀并剪断胶带,以及手掌朝下抓握时进行手中立方体旋转,两者都需要在长时域内进行精确、灵巧的控制。在这些任务上,SERNF实现了稳定、样本高效的适应,而标准方法则难以应对。

英文摘要

Real-world fine-tuning of dexterous manipulation policies remains challenging due to limited real-world interaction budgets and highly multimodal action distributions. Diffusion-based policies, while expressive, do not permit conservative likelihood-based updates during fine-tuning because action probabilities are intractable. In contrast, conventional Gaussian policies collapse under multimodality, particularly when actions are executed in chunks, and standard per-step critics fail to align with chunked execution, leading to poor credit assignment. We present SERNF, a sample-efficient off-policy fine-tuning framework with normalizing flow (NF) to address these challenges. The normalizing flow policy yields exact likelihoods for multimodal action chunks, allowing conservative, stable policy updates through likelihood regularization and thereby improving sample efficiency. An action-chunked critic evaluates entire action sequences, aligning value estimation with the policy's temporal structure and improving long-horizon credit assignment. To our knowledge, this is the first demonstration of a likelihood-based, multimodal generative policy combined with chunk-level value learning on real robotic hardware. We evaluate SERNF on two challenging dexterous manipulation tasks in the real world: cutting tape with scissors retrieved from a case, and in-hand cube rotation with a palm-down grasp, both of which require precise, dexterous control over long horizons. On these tasks, SERNF achieves stable, sample-efficient adaptation where standard methods struggle.
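
To make the "exact likelihoods" point concrete, here is a deliberately tiny sketch of the change-of-variables rule that normalizing flows rely on, using a single elementwise affine layer over a flattened action chunk. A one-layer affine flow is not multimodal, and the function name and parameterization are assumptions for illustration only; the policy architecture used in the paper is not specified in the abstract.

```python
import math

def affine_flow_log_prob(chunk, shift, log_scale):
    """Exact log-density of an action chunk under a = shift + exp(log_scale) * z,
    with z drawn from a standard normal, via the change-of-variables formula."""
    total = 0.0
    for a, m, s in zip(chunk, shift, log_scale):
        z = (a - m) * math.exp(-s)                       # invert the flow
        base = -0.5 * (z * z + math.log(2.0 * math.pi))  # log N(z; 0, 1)
        total += base - s                                # add log|dz/da| = -log_scale
    return total
```

Because the log-density is available in closed form, a likelihood-based regularizer can be computed directly, which is the property the abstract contrasts with diffusion policies.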

2602.15084 2026-06-08 physics.plasm-ph cs.AI cs.LG 版本更新

TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics

TokaMind: 用于托卡马克等离子体动力学的多模态Transformer基础模型

Tobia Boschi, Andrea Loreti, Nicola C. Amorisco, Rodrigo H. Ordonez-Hurtado, Cécile Rousseau, George K. Holt, Eszter Székely, Alexander Whittle, Samuel Jackson, Adriano Agnello, Stanislas Pamela, Alessandra Pascale, Robert Akers, Juan Bernabe Moreno, Vassil Alexandrov, Mykhaylo Zayats

发表机构 * IBM Research(IBM研究院) UK Atomic Energy Authority(英国原子能局) STFC Hartree Centre(科学与技术设施理事会Hartree中心)

AI总结 提出TokaMind,首个开源托卡马克等离子体动力学基础模型,基于多模态Transformer在MAST数据集上预训练,支持多种数据模态和缺失信号处理,在14个任务上优于基线。

详情
AI中文摘要

我们提出TokaMind,据我们所知,这是首个用于托卡马克等离子体动力学的开源基础模型,基于多模态Transformer(MMT)并在公开可用的MAST数据集上的异构诊断数据上预训练。TokaMind支持多种数据模态(时间序列、2D轮廓和视频),具有不同的采样率、鲁棒的缺失信号处理,并通过选择性加载和冻结四个模型组件实现高效任务适配。为了表示多模态信号,我们使用轻量级固定基离散余弦变换嵌入(DCT3D),并为替代嵌入(例如变分自编码器)提供干净接口。我们在最近引入的MAST基准TokaMark上评估TokaMind,该基准包含14个具有异构重建和预测目标的任务。我们的结果表明,微调后的TokaMind在所有任务上均优于最强的基准基线,仅一个任务除外。与在匹配的epoch预算下从头训练相同架构相比,热启动适配在要求苛刻的下游设置中最为有益,包括长时域预测和高维平衡目标。这些发现突显了多模态预训练对托卡马克等离子体动力学的价值,并为未来的聚变建模任务提供了实用、可扩展的基础。训练代码和模型权重分别公开在github.com/UKAEA-IBM-STFC-Fusion-FMs/tokamind和huggingface.co/UKAEA-IBM-STFC。

英文摘要

We present TokaMind, to our knowledge the first open-source foundation model for tokamak plasma dynamics, based on a Multi-Modal Transformer (MMT) and pretrained on heterogeneous diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selectively loading and freezing four model components. To represent multi-modal signals, we use a lightweight fixed-basis Discrete Cosine Transform embedding (DCT3D) and provide a clean interface for alternative embeddings (e.g., Variational Autoencoders). We evaluate TokaMind on the recently introduced MAST benchmark TokaMark, which comprises 14 tasks with heterogeneous reconstruction and forecasting objectives. Our results show that fine-tuned TokaMind outperforms the strongest benchmark baseline on all but one task. Compared with training the same architecture from scratch under a matched epoch budget, warm-start adaptation is most beneficial on demanding downstream settings, including long-horizon forecasting and high-dimensional equilibrium objectives. These findings highlight the value of multi-modal pretraining for tokamak plasma dynamics and provide a practical, extensible foundation for future fusion modeling tasks. Training code and model weights are publicly available at github.com/UKAEA-IBM-STFC-Fusion-FMs/tokamind and huggingface.co/UKAEA-IBM-STFC, respectively.
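
The fixed-basis idea behind a DCT embedding can be sketched in one dimension: project a signal window onto cosine basis functions and keep the first few coefficients as a compact, training-free representation. The abstract's DCT3D operates on richer inputs; the 1-D unnormalized DCT-II below, along with the names `dct2` and `dct_embed`, is a simplifying assumption for illustration.

```python
import math

def dct2(signal):
    # Unnormalized DCT-II: X_k = sum_t x_t * cos(pi / N * (t + 0.5) * k)
    n = len(signal)
    return [sum(x * math.cos(math.pi / n * (t + 0.5) * k)
                for t, x in enumerate(signal))
            for k in range(n)]

def dct_embed(signal, n_coeffs):
    # Keep only the lowest-frequency coefficients as the embedding.
    return dct2(signal)[:n_coeffs]
```

A constant window, for example, is captured entirely by the first coefficient, so truncation loses nothing for smooth signals and discards mostly high-frequency detail otherwise.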

2602.16908 2026-06-08 cond-mat.mtrl-sci cs.LG quant-ph 版本更新

Multi-objective optimization and quantum hybridization of equivariant deep learning interatomic potentials

等变深度学习原子间势的多目标优化与量子混合

G. Laskaris, D. Morozov, D. Tarpanov, A. Seth, J. Procelewska, G. Sai Gautam, A. Sagingalieva, R. Brasher, A. Melnikov

发表机构 * Terra Quantum AG LIACS, Leiden University(LIACS,莱顿大学) Nanoscience Center and Department of Chemistry, University of Jyväskylä(于韦斯屈莱大学纳米科学中心和化学系) Department of Materials Engineering, Indian Institute of Science(印度科学研究所材料工程系) Schaeffler Technologies AG & Co. KG

AI总结 针对Allegro模型在精度与推理时间之间的权衡,采用多目标超参数优化,并设计经典扩展和量子-经典混合两种变体,在多个数据集上验证了混合变体在力预测精度上的优势。

Comments 15 pages, 7 figures, 6 tables

详情
Journal ref
Comput. Mater. Sci. 270, 114742 (2026)
AI中文摘要

Allegro是一种机器学习原子间势模型,旨在使用E(3)等变神经网络预测分子中的原子性质。在训练该模型时,精度与推理时间之间往往存在权衡。为此,我们对这两个目标应用多目标超参数优化。此外,我们通过构建Allegro的变体来尝试修改架构:一种扩展了额外的经典层,另一种结合了量子-经典混合层。我们在QM9、rMD17-阿司匹林、rMD17-苯以及一个自生成的铜-锂结构数据集上评估所有模型。结果表明,两种变体在多个数据集上的力预测精度均超过Allegro。经典变体持续优于基线,而量子-经典混合变体在完全优化的Cu-Li数据集上取得了最佳的整体力预测精度,比经典变体高出约13%。值得注意的是,尽管混合变体在其他数据集上使用了从Cu-Li转移的超参数而未进行特定数据集的优化,但仍取得了有竞争力的结果,这表明量子-经典混合是增强MLIP架构的一个有前景的方向。

英文摘要

Allegro is a machine learning interatomic potential model designed to predict atomic properties in molecules using E(3) equivariant neural networks. When training this model, there tends to be a trade-off between accuracy and inference time. For this reason, we apply multi-objective hyperparameter optimization to both objectives. Additionally, we experiment with modified architectures by constructing variants of Allegro: one extended with additional classical layers and one incorporating quantum-classical hybrid layers. We evaluate all models on QM9, rMD17-aspirin, rMD17-benzene, and a self-generated dataset of copper-lithium structures. Our results show that both variants surpass Allegro in force prediction accuracy across multiple datasets. The classical variant consistently improves over the baseline, while the quantum-classical hybrid variant achieves the best overall force prediction accuracy on the Cu-Li dataset, where it was fully optimized, outperforming the classical variant by approximately 13%. Notably, the hybrid variant also achieves competitive results on the remaining datasets despite using hyperparameters transferred from Cu-Li without dataset-specific optimization, suggesting that quantum-classical hybridization is a promising direction for enhancing MLIP architectures.
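
Multi-objective hyperparameter optimization over accuracy and inference time ultimately returns a set of non-dominated configurations rather than a single winner. The helper below is a minimal sketch of that selection step, assuming each configuration is summarized as an `(error, latency)` pair where lower is better for both; it is not the optimizer used in the paper.

```python
def pareto_front(points):
    """Return the non-dominated (error, latency) pairs; lower is better in both."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(p)
    return front
```

For instance, with made-up values, a configuration at `(0.11, 6.0)` is dropped when another reaches `(0.10, 5.0)`, while `(0.08, 9.0)` and `(0.12, 4.0)` both survive because each is best on one objective.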

2603.06673 2026-06-08 cs.CV cs.LG 版本更新

Unmixing ATR-μFTIR spectroscopic images of cross-sections of historical oil paintings

历史油画横截面的ATR-μFTIR光谱图像解混

Shivam Pande, Nicolas Nadisic, Francisco Mederos-Henry, Aleksandra Pizurica

发表机构 * Belgian Federal Science Policy(比利时联邦科学政策) FED-tWIN project(FED-tWIN项目) Prf-2022-050 BALaTAI Prf-2021-002 MatCoRe

AI总结 提出一种无监督CNN自编码器,结合加权光谱角距离损失,用于解混ATR-μFTIR高光谱图像,自动估计端元光谱和丰度图,在污染区域提升可解释性。

Comments 5 pages, accepted at EUSIPCO 2026

详情
AI中文摘要

光谱成像已成为遗产科学的核心技术,因为它能够对文物中的材料进行非侵入性、空间分辨的表征。特别是,衰减全反射傅里叶变换红外显微镜(ATR-μFTIR)被广泛用于分析绘画横截面,其中在每个像素处记录光谱以形成高光谱图像(HSI)。解释这些数据是困难的:光谱通常是异质、多层和退化样品中多种物质的混合物,而当前实践仍然严重依赖于与参考库的手动比较。这种工作流程缓慢、主观且难以扩展。我们提出了一种无监督CNN自编码器,用于盲解混ATR-μFTIR HSI,通过基于块建模利用局部空间结构,估计端元光谱及其丰度图。为了减少对超过1500个波段的大气和采集伪影的敏感性,我们引入了一种加权光谱角距离(WSAD)损失,该损失具有从空间平坦度、邻域一致性和光谱粗糙度的稳健度量中自动导出的波段可靠性权重。与标准SAD训练相比,WSAD在易受污染的光谱区域提高了可解释性。我们在凡·艾克兄弟的根特祭坛画的ATR-μFTIR横截面上演示了该方法。

英文摘要

Spectroscopic imaging (SI) has become central to heritage science because it enables non-invasive, spatially resolved characterisation of materials in artefacts. In particular, attenuated total reflection Fourier transform infrared microscopy (ATR-μFTIR) is widely used to analyse painting cross-sections, where a spectrum is recorded at each pixel to form a hyperspectral image (HSI). Interpreting these data is difficult: spectra are often mixtures of several species in heterogeneous, multi-layered and degraded samples, and current practice still relies heavily on manual comparison with reference libraries. This workflow is slow, subjective and hard to scale. We propose an unsupervised CNN autoencoder for blind unmixing of ATR-μFTIR HSIs, estimating endmember spectra and their abundance maps while exploiting local spatial structure through patch-based modelling. To reduce sensitivity to atmospheric and acquisition artefacts across more than 1500 bands, we introduce a weighted spectral angle distance (WSAD) loss with automatic band-reliability weights derived from robust measures of spatial flatness, neighbour agreement and spectral roughness. Compared with standard SAD training, WSAD improves interpretability in contamination-prone spectral regions. We demonstrate the method on an ATR-μFTIR cross-section from the Ghent Altarpiece by the Van Eyck brothers.
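
The abstract does not spell out the WSAD formula, so the following is one plausible reading rather than the authors' definition: the usual spectral angle between two spectra, with each band's contribution scaled by a nonnegative reliability weight. The function name and the way weights enter the inner products are assumptions.

```python
import math

def weighted_spectral_angle(x, y, weights):
    """Angle (radians) between spectra x and y under per-band weights.

    With all weights equal to 1 this reduces to the standard spectral angle
    distance (SAD).
    """
    dot = sum(w * a * b for w, a, b in zip(weights, x, y))
    nx = math.sqrt(sum(w * a * a for w, a in zip(weights, x)))
    ny = math.sqrt(sum(w * b * b for w, b in zip(weights, y)))
    cos = dot / (nx * ny)
    # Clamp to guard against tiny floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, cos)))
```

Setting the weight of an unreliable band to zero removes its influence entirely, which matches the stated goal of reducing sensitivity to atmospheric and acquisition artefacts.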

2603.06915 2026-06-08 cs.CL cs.LG 版本更新

A Dynamic Self-Evolving Extraction System

一种动态自演化抽取系统

Moin Amin-Naseri, Hannah Kim, Estevam Hruschka

发表机构 * Megagon Labs(Megagon实验室)

AI总结 提出DySECT系统,通过LLM抽取三元组构建知识库,结合概率知识和图推理丰富知识,再反馈优化抽取器,形成闭环持续提升。

详情
AI中文摘要

从原始文本中抽取结构化信息是许多NLP应用(包括文档检索、排序和相关性估计)的基本组成部分。高质量的抽取通常需要领域特定的准确性、对专业分类法的最新理解,以及吸收新兴术语和罕见异常值的能力。在许多领域(如医疗、法律和人力资源),抽取模型还必须适应不断变化的术语,并受益于对结构化知识的显式推理。我们提出了DySECT,一个动态自演化抽取与策管工具包,它在使用过程中持续改进。该系统逐步用LLM抽取的三元组填充一个多功能、自扩展的知识库(KB)。KB通过整合概率知识和基于图的推理进一步丰富自身,逐步积累领域概念和关系。然后,丰富的KB通过提示调优、采样相关少样本示例或使用KB衍生的合成数据进行微调,反馈给LLM抽取器。结果,系统形成了一个共生的闭环循环,其中抽取持续改进知识,知识持续改进抽取。

英文摘要

The extraction of structured information from raw text is a fundamental component of many NLP applications, including document retrieval, ranking, and relevance estimation. High-quality extractions often require domain-specific accuracy, up-to-date understanding of specialized taxonomies, and the ability to incorporate emerging jargon and rare outliers. In many domains, such as medical, legal, and HR, the extraction model must also adapt to shifting terminology and benefit from explicit reasoning over structured knowledge. We propose DySECT, a Dynamic Self-Evolving Extraction and Curation Toolkit, which continually improves as it is used. The system incrementally populates a versatile, self-expanding knowledge base (KB) with triples extracted by the LLM. The KB further enriches itself through the integration of probabilistic knowledge and graph-based reasoning, gradually accumulating domain concepts and relationships. The enriched KB then feeds back into the LLM extractor via prompt tuning, sampling of relevant few-shot examples, or fine-tuning using KB-derived synthetic data. As a result, the system forms a symbiotic closed-loop cycle in which extraction continuously improves knowledge, and knowledge continuously improves extraction.

2603.08683 2026-06-08 cs.SD cs.AI cs.LG eess.AS 版本更新

Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

全保真音频无损压缩的语言建模基准测试

Phillip Long, Zachary Novack, Chris Donahue

发表机构 * University of California, San Diego, Computer Science and Engineering Department(加州大学圣地亚哥分校计算机科学与工程系) Carnegie Mellon University, School of Computer Science(卡内基梅隆大学计算机科学学院)

AI总结 提出字节级分词方案Trilobyte,将词汇量从指数级降至常数级,首次实现24位音频的LM无损压缩,并在8位和16位下超越FLAC。

Comments Accepted at Interspeech 2026, 7 pages, 5 figures

详情
AI中文摘要

在原始波形上训练的自回归“语言”模型(LM)可被重新用于无损音频压缩,但先前的工作仅限于8位音频,尚不清楚此类方法是否适用于实际场景(16/24位)以及能否与现有编解码器竞争。我们对基于LM的压缩在全保真音频上进行了基准测试,涵盖不同领域(音乐、语音、生物声学)、采样率(16kHz-48kHz)和位深度(8、16、24位)。标准的样本级分词在更高位深度下因词汇量过大(16位为65K;24位为16.7M)而变得不可行。我们提出了Trilobyte,一种用于全分辨率音频的字节级分词方案,将词汇量从$O(2^{b})$改进为$O(1)$,并首次实现了可行的24位基于LM的无损压缩。虽然LM在8位和16位下持续优于FLAC并达到最先进的压缩效果,但我们观察到,随着位深度超过8位,压缩增益变得更为有限。

英文摘要

Autoregressive "language" models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leaving open whether such approaches work for practical settings (16/24-bit) and can compete with existing codecs. We benchmark LM-based compression on full-fidelity audio across diverse domains (music, speech, bioacoustics), sampling rates (16 kHz to 48 kHz), and bit depths (8-, 16-, and 24-bit). Standard sample-level tokenization becomes intractable at higher bit depths due to vocabulary size (65K for 16-bit; 16.7M for 24-bit). We propose Trilobyte, a byte-level tokenization schema for full-resolution audio, improving vocabulary scaling from $O(2^{b})$ to $O(1)$ and enabling the first tractable 24-bit LM-based lossless compression. While LMs consistently outperform FLAC and yield state-of-the-art compression at 8-bit and 16-bit, we observe that compression gains become more modest as bit depth increases beyond 8-bit.
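
The vocabulary argument can be made concrete with a small sketch: instead of one token per $b$-bit sample (a vocabulary of $2^{b}$), each sample is split into $b/8$ byte tokens drawn from a fixed vocabulary of 256. The byte order and the offset used to map signed samples to unsigned integers are assumptions here; the abstract does not specify how Trilobyte lays out the bytes.

```python
def sample_to_bytes(sample, bit_depth):
    """Split one signed PCM sample into bit_depth // 8 byte tokens (each 0..255)."""
    n_bytes = bit_depth // 8
    unsigned = sample + (1 << (bit_depth - 1))  # shift signed range to [0, 2^b)
    return list(unsigned.to_bytes(n_bytes, "big"))

def bytes_to_sample(tokens, bit_depth):
    """Inverse of sample_to_bytes, so the tokenization is lossless."""
    unsigned = int.from_bytes(bytes(tokens), "big")
    return unsigned - (1 << (bit_depth - 1))

def tokenize(samples, bit_depth):
    # Flatten a waveform into one byte-token stream.
    return [tok for s in samples for tok in sample_to_bytes(s, bit_depth)]
```

The trade-off in this sketch is sequence length: a 24-bit waveform becomes three times as many tokens, in exchange for a vocabulary that stays at 256 regardless of bit depth.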

2603.19146 2026-06-08 cs.AI cs.LG 版本更新

D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding

D5P4:用于并行离散扩散解码中多样性的分区行列式点过程

Jonathan Lys, Vincent Gripon, Axel Marmoret, Lukas Mauch, Fabien Cardinaux, Ghouthi Boukli Hacene, Bastien Pasdeloup

发表机构 * IMT Atlantique, Lab-STICC, UMR CNRS 6285(IMT Atlantique,Lab-STICC,UMR CNRS 6285) Sony Europe Ltd. Stuttgart Technology Center, EUREC(索尼欧洲有限公司,斯图加特技术中心,EUREC)

AI总结 提出D5P4波束解码方法,利用分区行列式点过程在离散扩散模型中选择中间序列,平衡质量与多样性,无需外部验证器。

详情
AI中文摘要

离散扩散模型是自回归方法在文本生成中的有前途的替代方案,但其解码方法仍研究不足。标准的自回归搜索过程(如波束搜索)不直接适用于迭代去噪,其中假设是完整的中间序列而非从左到右的前缀。此外,现有的扩散解码过程对保留假设的多样性和覆盖范围的控制有限。在这项工作中,我们引入了D5P4,一种针对离散扩散模型定制的波束式解码方法,它将中间波束选择视为分区行列式点过程下的MAP推理。这产生了一个模型内部的批次目标,无需外部验证器即可平衡质量和多样性。在开放域生成、问答和数学推理上的实验表明,D5P4提高了多样性和pass@$k$覆盖率,同时匹配或超越了基线质量和保真度。

英文摘要

Discrete diffusion models are promising alternatives to autoregressive approaches for text generation, yet their decoding methods remain under-studied. Standard autoregressive search procedures, such as beam search, do not directly apply to iterative denoising, where hypotheses are complete intermediate sequences rather than left-to-right prefixes. Furthermore, existing diffusion decoding procedures only provide limited control over the diversity and coverage of retained hypotheses. In this work, we introduce D5P4, a beam-style decoding method tailored to discrete diffusion models, which casts intermediate beam selection as MAP inference under a partitioned Determinantal Point Process. This yields a model-internal batch objective that balances quality and diversity without external verifiers. Experiments on open-ended generation, question answering, and mathematical reasoning show that D5P4 improves diversity and pass@$k$ coverage while matching or surpassing baseline quality and fidelity.
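
To show how a determinantal objective trades quality against diversity, the sketch below runs a greedy approximation to DPP MAP selection with the standard quality-times-similarity kernel $L_{ij} = q_i q_j \langle f_i, f_j \rangle$. The partition constraint is modeled as "pick exactly one candidate per group", which is an assumption about what "partitioned" means; the kernel, the greedy procedure, and all names are illustrative and not taken from the paper.

```python
def det(m):
    # Determinant by Gaussian elimination with partial pivoting.
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d

def greedy_partition_dpp(quality, features, groups):
    """Greedily pick one candidate per group to maximize det(L_S)."""
    def kernel(i, j):
        sim = sum(x * y for x, y in zip(features[i], features[j]))
        return quality[i] * quality[j] * sim

    selected = []
    for group in groups:
        def gain(c):
            idx = selected + [c]
            return det([[kernel(i, j) for j in idx] for i in idx])
        selected.append(max(group, key=gain))
    return selected
```

In this toy setup a high-quality candidate that duplicates an already selected one contributes a zero determinant, so a slightly weaker but different candidate is kept instead.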

2603.20967 2026-06-08 stat.ML cs.LG math.ST stat.TH 版本更新

Hard labels sampled from sparse targets mislead rotation invariant algorithms

从稀疏目标采样的硬标签误导旋转不变算法

Avrajit Ghosh, Bin Yu, Manfred Warmuth, Peter Bartlett

发表机构 * University of California, Berkeley(加州大学伯克利分校) University of Wisconsin, Madison(威斯康星大学麦迪逊分校)

AI总结 针对稀疏目标下的二分类问题,证明旋转不变算法(如逻辑损失梯度下降)的过风险下界为Ω((d-1)/n),而通过重参数化u_i v_i的非旋转不变算法可实现O(s log d / n)的上界。

详情
Journal ref
ICML-2026
AI中文摘要

最常见的机器学习设置之一是逻辑回归。在许多分类模型中,包括神经网络,最终预测是通过将逻辑链接函数应用于线性得分获得的。在二元逻辑回归中,反馈可以是软标签(对应于数据的真实条件概率,如在蒸馏中)或采样的硬标签(取值为$\pm 1$)。我们指出即使在特别有利的设置中也会出现一个基本问题,其中目标是学习形式为$\sigma(\mathbf{x}^{\top}\mathbf{w}^{\star})$的无噪声软目标。在过约束情况(即样本数$n$超过输入维度$d$)下,使用样本$(\mathbf{x}_i,\sigma(\mathbf{x}_i^{\top}\mathbf{w}^{\star}))$足以恢复$\mathbf{w}^{\star}$,从而获得贝叶斯风险。然而,我们证明当样本由从相同条件分布$\sigma(\mathbf{x}_i^{\top}\mathbf{w}^{\star})$采样的硬标签$y_i$标记,且$\mathbf{w}^{\star}$是$s$-稀疏时,旋转不变算法被证明是次优的:它们产生过风险$\Omega\!\left(\frac{d-1}{n}\right)$,而存在简单的非旋转不变算法,其过风险为$O(\frac{s\log d}{n})$。最简单的旋转不变算法是逻辑损失上的梯度下降(带早停)。针对稀疏目标实现上述上界的简单非旋转不变算法使用对权重$u_i,v_i$的梯度下降,其中线性权重$w_i$被重参数化为$u_i v_i$。

英文摘要

One of the most common machine learning setups is logistic regression. In many classification models, including neural networks, the final prediction is obtained by applying a logistic link function to a linear score. In binary logistic regression, the feedback can be either soft labels, corresponding to the true conditional probability of the data (as in distillation), or sampled hard labels (taking values $\pm 1$). We point out a fundamental problem that arises even in a particularly favorable setting, where the goal is to learn a noise-free soft target of the form $\sigma(\mathbf{x}^{\top}\mathbf{w}^{\star})$. In the over-constrained case (i.e. the number of samples $n$ exceeds the input dimension $d$) with examples $(\mathbf{x}_i,\sigma(\mathbf{x}_i^{\top}\mathbf{w}^{\star}))$, it is sufficient to recover $\mathbf{w}^{\star}$ and hence achieve the Bayes risk. However, we prove that when the examples are labeled by hard labels $y_i$ sampled from the same conditional distribution $\sigma(\mathbf{x}_i^{\top}\mathbf{w}^{\star})$ and $\mathbf{w}^{\star}$ is $s$-sparse, then rotation-invariant algorithms are provably suboptimal: they incur an excess risk $\Omega\!\left(\frac{d-1}{n}\right)$, while there are simple non-rotation-invariant algorithms with excess risk $O(\frac{s\log d}{n})$. The simplest rotation-invariant algorithm is gradient descent on the logistic loss (with early stopping). A simple non-rotation-invariant algorithm for sparse targets that achieves the above upper bounds uses gradient descent on the weights $u_i,v_i$, where now the linear weight $w_i$ is reparameterized as $u_i v_i$.
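
The reparameterized update can be written down directly from the chain rule: if $w_i = u_i v_i$ and $g_i$ is the gradient of the logistic loss with respect to $w_i$, then the gradients with respect to $u_i$ and $v_i$ are $g_i v_i$ and $g_i u_i$. The sketch below implements one such step; the step size, initialization, and function names are assumptions, and the abstract does not state the authors' exact training setup.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(w, xs, ys):
    # Mean logistic loss for labels in {-1, +1}.
    return sum(math.log1p(math.exp(-y * sum(wi * xi for wi, xi in zip(w, x))))
               for x, y in zip(xs, ys)) / len(xs)

def grad_w(w, xs, ys):
    # Gradient of the mean logistic loss with respect to the linear weights w.
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coef = -y * sigmoid(-margin) / len(xs)
        for i, xi in enumerate(x):
            g[i] += coef * xi
    return g

def uv_step(u, v, xs, ys, lr):
    """One gradient-descent step on (u, v) with w_i = u_i * v_i."""
    w = [ui * vi for ui, vi in zip(u, v)]
    g = grad_w(w, xs, ys)
    new_u = [ui - lr * gi * vi for ui, vi, gi in zip(u, v, g)]
    new_v = [vi - lr * gi * ui for ui, vi, gi in zip(u, v, g)]
    return new_u, new_v
```

Each coordinate's update is scaled by its own current magnitude, so the dynamics depend on the coordinate axes, which is what breaks rotation invariance. Note that starting with `u == v`, as a toy initialization might, keeps every $w_i$ nonnegative.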

2603.22278 2026-06-08 cs.CV cs.LG 版本更新

The Dual Mechanisms of Spatial Variable Binding in Vision-Language Models

视觉-语言模型中空间变量绑定的双重机制

Kelly Cui, Nikhil Prakash, Shoval Messica, Ayush Raina, David Bau, Antonio Torralba, Tamar Rott Shaham

发表机构 * MIT CSAIL(麻省理工学院计算机科学与人工智能实验室) Northeastern University(东北大学) Sony Playstation(索尼PlayStation)

AI总结 本文揭示视觉-语言模型通过语言骨干中的内容无关空间关系编码和视觉编码器中的全局布局表示两种机制实现空间变量绑定,其中视觉编码器起主导作用。

Comments 37 pages, 53 figures

详情
AI中文摘要

许多多模态任务,如图像描述和视觉问答,要求视觉-语言模型(VLM)将对象与其属性和空间关系绑定。然而,这种关联在VLM中如何以及在哪里计算仍不清楚。在这项工作中,我们展示了VLM依赖两种并发机制来表示空间变量绑定。在语言模型骨干中,中间层在对应对象的视觉标记之上表示内容无关的空间关系。然而,这种机制在塑造模型预测中仅起次要作用。相反,空间信息的主要来源是视觉编码器,其表示编码了对象的布局,并被语言模型骨干直接利用。值得注意的是,这种空间信号全局分布在视觉标记中,从对象区域扩展到周围的背景区域。我们表明,增强这些源自视觉的空间表示(跨所有图像标记)可以改善不同规模模型在COCO数据集复杂自然图像上的空间变量绑定性能。总之,我们的结果阐明了VLM中空间变量绑定的计算方式,并强调了视觉编码器在实现这一功能中的核心作用。

英文摘要

Many multimodal tasks, such as image captioning and visual question answering, require vision-language models (VLMs) to bind objects with their properties and spatial relations. Yet it remains unclear where and how such associations are computed within VLMs. In this work, we show that VLMs rely on two concurrent mechanisms to represent spatial variable binding. In the language model backbone, intermediate layers represent content-independent spatial relations on top of visual tokens corresponding to objects. However, this mechanism plays only a secondary role in shaping model predictions. Instead, the dominant source of spatial information originates in the vision encoder, whose representations encode the layout of objects and are directly exploited by the language model backbone. Notably, this spatial signal is distributed globally across visual tokens, extending beyond object regions into surrounding background areas. We show that enhancing these vision-derived spatial representations globally across all image tokens improves spatial variable binding performance across models of various sizes on complex natural images from the COCO dataset. Together, our results clarify how spatial variable binding is computed within VLMs and highlight the central role of vision encoders in enabling it.

2603.24481 2026-06-08 cs.AI cs.CL cs.LG 版本更新

Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA

基于一致性验证的多智能体推理改进医学多项选择题问答中的不确定性校准

John Ray B. Martinez

发表机构 * Department of Data Science and Analytics(数据科学与分析系)

AI总结 提出多智能体框架,结合领域专家智能体与两阶段验证及S分数加权融合,在医学MCQA中显著降低校准误差并提升判别能力。

Comments 20 pages, 6 figures. Preprint under review

详情
AI中文摘要

校准不良的置信度分数是AI在临床环境中部署的实际障碍。总是过度自信的模型无法为延迟决策提供有用信号。我们提出了一个多智能体框架,结合领域特定专家智能体与两阶段验证(Wu等人,2024)和S分数加权融合,以改进医学多项选择题问答中的校准和判别能力。四个专家智能体(呼吸科、心脏病科、神经科、胃肠科)使用Qwen2.5-7B-Instruct生成独立诊断。每个诊断经历两阶段自我验证过程,测量内部一致性并产生专家置信度分数(S分数)。S分数驱动加权融合策略,选择最终答案并校准报告的置信度。我们在MedQA-USMLE和MedMCQA的高分歧子集(100和250个问题)上进行评估。所有结果均针对此过滤后的设置。在MedQA-250上,完整系统实现了ECE=0.091(比单专家基线降低74.4%)和AUROC=0.630(+0.056),准确率为59.2%。在所有四种设置中,校准增益保持在49-74%。消融分析表明,两阶段验证驱动ECE降低,而多智能体推理驱动AUROC提升,表明一致性检查和集成聚合解决了LLM不确定性的不同失败模式。由此产生的置信度信号是否足以在实践中支持临床延迟决策,仍是未来研究的方向。

英文摘要

Miscalibrated confidence scores are a practical obstacle to deploying AI in clinical settings. A model that is always overconfident offers no useful signal for deferral. We present a multi-agent framework that combines domain-specific specialist agents with Two-Phase Verification (Wu et al., 2024) and S-Score Weighted Fusion to improve both calibration and discrimination in medical multiple-choice question answering. Four specialist agents (respiratory, cardiology, neurology, gastroenterology) generate independent diagnoses using Qwen2.5-7B-Instruct. Each diagnosis undergoes a two-phase self-verification process that measures internal consistency and produces a Specialist Confidence Score (S-score). The S-scores drive a weighted fusion strategy that selects the final answer and calibrates the reported confidence. We evaluate on high-disagreement subsets of MedQA-USMLE and MedMCQA (100 and 250 questions). All results are specific to this filtered regime. On MedQA-250, the full system achieves ECE = 0.091 (74.4% reduction over the single-specialist baseline) and AUROC = 0.630 (+0.056) at 59.2% accuracy. Calibration gains of 49-74% hold across all four settings. Ablation analysis reveals that Two-Phase Verification drives ECE reduction while multi-agent reasoning drives AUROC improvement, suggesting that consistency checking and ensemble aggregation address different failure modes of LLM uncertainty. Whether the resulting confidence signal is sufficient to support clinical deferral decisions in practice remains a direction for future investigation.
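
Since the headline numbers above are reported as ECE, a short sketch of the standard binned expected calibration error may help: predictions are grouped into equal-width confidence bins, and the gaps between accuracy and mean confidence are averaged with bin-size weights. The bin count of 10 is a common default and an assumption here; the abstract does not say how many bins the paper uses.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin fraction) * |accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / n * abs(acc - avg_conf)
    return ece
```

A model that reports 0.9 confidence but is right half the time scores 0.4 on this metric, while one whose confidence matches its accuracy scores 0.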

2603.24963 2026-06-08 cs.AI cs.LG 版本更新

Design Once, Deploy at Scale: Template-Driven ML Development for Large Model Ecosystems

一次设计,大规模部署:面向大型模型生态的模板驱动ML开发

Jiang Liu, John Martabano Landy, Yao Xuan, Swamy Muddu, Nhat Le, Munaf Sahaf, Luc Kien Hang, Rupinder Khandpour, Kevin De Angeli, Chang Yang, Shouyuan Chen, Shiblee Sadik, Anirudh Agrawal, Djordje Gligorijevic, Jingzheng Qin, Peggy Yao, Alireza Vahdatpour

发表机构 * Meta AI

AI总结 针对大型模型生态中ML开发效率低的问题,提出标准化模型模板(SMT)框架,将技术传播复杂度从O(n·2^k)降至O(n+k),在Meta广告排名系统中实现交叉熵提升0.63%、迭代时间减少92%、技术-模型对采用吞吐量提升6.3倍。

详情
AI中文摘要

现代计算广告平台通常依赖推荐系统来预测用户响应,如点击率、转化率和其他优化事件。为了支持多样化的产品表面和广告主目标,这些平台经常维护一个广泛的机器学习(ML)模型生态系统。然而,在这种规模下运营带来了显著的发展和效率挑战。需要大量的工程努力来定期刷新ML模型并传播新技术,这导致在生态系统中部署ML创新时出现长延迟。我们提出了一项大规模实证研究,比较了标准化模型构建方法与推荐系统中独立每模型优化之间的模型性能、效率和ML技术传播。为了促进这种标准化,我们提出了标准模型模板(SMT)——一个生成适应不同数据分布和优化事件的高性能模型的框架。通过利用标准化、可组合的ML模型组件,SMT将技术传播复杂度从O(n·2^k)降低到O(n+k),其中n是模型数量,k是技术数量。在Meta的生产广告排名生态系统中,对四个全球开发周期内的广泛模型套件进行评估,我们的结果表明:(1)在中等服务容量下,交叉熵平均提高0.63%;(2)每模型迭代工程时间减少92%;(3)技术-模型对采用吞吐量增加6.3倍。这些发现挑战了多样化优化目标本质上需要多样化ML模型设计的传统观点。

英文摘要

Modern computational advertising platforms typically rely on recommendation systems to predict user responses, such as click-through rates, conversion rates, and other optimization events. To support a wide variety of product surfaces and advertiser goals, these platforms frequently maintain an extensive ecosystem of machine learning (ML) models. However, operating at this scale creates significant development and efficiency challenges. Substantial engineering effort is required to regularly refresh ML models and propagate new techniques, which results in long latencies when deploying ML innovations across the ecosystem. We present a large-scale empirical study comparing model performance, efficiency, and ML technique propagation between a standardized model-building approach and independent per-model optimization in recommendation systems. To facilitate this standardization, we propose the Standard Model Template (SMT) -- a framework that generates high-performance models adaptable to diverse data distributions and optimization events. By utilizing standardized, composable ML model components, SMT reduces technique propagation complexity from $O(n \cdot 2^k)$ to $O(n + k)$ where $n$ is the number of models and $k$ the number of techniques. Evaluating an extensive suite of models over four global development cycles within Meta's production ads ranking ecosystem, our results demonstrate: (1) a 0.63% average improvement in cross-entropy at neutral serving capacity, (2) a 92% reduction in per-model iteration engineering time, and (3) a $6.3\times$ increase in technique-model pair adoption throughput. These findings challenge the conventional wisdom that diverse optimization goals inherently require diversified ML model design.
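
One way to read the stated complexity reduction, offered here as an interpretation rather than the paper's derivation, is that independent per-model optimization may have to consider every subset of the $k$ techniques for each of the $n$ models, whereas a shared template integrates each technique once and instantiates each model once. The two helper functions below simply evaluate those expressions; their names are made up for this example.

```python
def per_model_cost(n_models, k_techniques):
    # n * 2^k: every model considers every subset of techniques.
    return n_models * 2 ** k_techniques

def template_cost(n_models, k_techniques):
    # n + k: each technique is added to the template once,
    # and each model is instantiated from the template once.
    return n_models + k_techniques
```

With illustrative values of 100 models and 10 techniques, the first expression gives 102,400 units of work against 110 for the second, which conveys why the gap grows quickly as techniques accumulate.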

2604.03146 2026-06-08 stat.ML cs.LG 版本更新

Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

高维经验风险最小化中高斯普适性破坏的表征

Chiheb Yaakoubi, Cosme Louart, Malik Tiomoko, Zhenyu Liao

发表机构 * School of Data Science, The Chinese University of Hong Kong, Shenzhen, China Huawei Noah's Ark Lab, Huawei Technologies, Paris, France School of Electronic Information Communications, Huazhong University of Science & Technology, China

AI总结 通过将凸高斯极小极大定理推广到非高斯数据,刻画了高维经验风险最小化估计量的渐近分布,揭示了高斯普适性的适用范围与局限。

Comments 28 pages, 5 figures, 1 table

详情
Journal ref
ICML 2026
AI中文摘要

我们研究了一般非高斯数据设计下的高维凸经验风险最小化(ERM)。通过启发式地将凸高斯极小极大定理(CGMT)扩展到非高斯设置,我们推导出关键统计量的渐近极小极大表征,从而能够近似ERM估计量 $\hat{\theta}$ 的均值 $\mu_{\hat{\theta}}$ 和协方差 $C_{\hat{\theta}}$。具体地,在数据矩阵的集中假设以及损失和正则化子的标准正则性条件下,我们证明:对于独立于训练数据的测试协变量 $x$,投影 $\hat{\theta}^\top x$ 近似遵循 $\mu_{\hat{\theta}}^\top x$ 的一般非高斯分布与一个独立中心高斯变量(方差为 $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$)的卷积。这一结果阐明了ERM高斯普适性的范围和局限。此外,我们证明任何 $\mathcal{C}^2$ 正则化子渐近等价于一个由其零点的Hessian矩阵和 $\mu_{\hat{\theta}}$ 处的梯度唯一确定的二次型。我们提供了跨不同损失和模型的数值模拟,以验证我们的理论预测和定性见解。

英文摘要

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean $\mu_{\hat{\theta}}$ and covariance $C_{\hat{\theta}}$ of the ERM estimator $\hat{\theta}$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hat{\theta}^\top x$ approximately follows the convolution of the generally non-Gaussian distribution of $\mu_{\hat{\theta}}^\top x$ with an independent centered Gaussian variable of variance $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at $\mu_{\hat{\theta}}$. Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.
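Written as a formula, the statement about the test projection reads as follows. The symbols $g$ and $\sigma^2$ are notation introduced here for illustration and do not appear in the abstract; the approximation is meant in distribution.

$$
\hat{\theta}^\top x \;\overset{d}{\approx}\; \mu_{\hat{\theta}}^\top x + \sigma g,
\qquad g \sim \mathcal{N}(0, 1) \ \text{independent of } x,
\qquad \sigma^2 = \mathrm{tr}\!\left(C_{\hat{\theta}}\, \mathbb{E}[x x^\top]\right).
$$

If $x$ is Gaussian, then $\mu_{\hat{\theta}}^\top x$ is Gaussian and so is the sum. For non-Gaussian $x$, the first term generally keeps its non-Gaussian law, which is the sense in which the abstract speaks of limits to Gaussian universality.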

2604.17433 2026-06-08 cs.CL cs.AI cs.LG 版本更新

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

仅需两个样本的自一致性:CoT-PoT集成实现高效LLM推理

Raman Saparkhan, Majd Hawasly, Md Rizwan Parvez, Mohammad Raza

发表机构 * Carnegie Mellon University Qatar(卡内基梅隆大学(卡塔尔)) Qatar Computing Research Institute(卡塔尔计算研究院)

AI总结 提出一种混合集成方法,结合思维链与程序化推理两种模式,通过仅需两个样本即可实现自一致性,将采样量减少9.3倍,并在78.6%的任务上达到最优。

Comments 9 pages, 3 figures; accepted to Findings of ACL 2026

详情
AI中文摘要

自一致性(SC)是一种通过聚合多个采样输出来提高大型语言模型推理准确性的流行技术,但由于大量采样,其计算成本高昂。我们引入了一种混合集成方法,利用两种不同推理模式(思维链(CoT)和程序化推理(PoT))的互补优势。我们描述了一个通用框架,用于在自一致性中结合这两种推理形式,并提出了全采样和早停的特定策略。我们表明,CoT-PoT集成不仅提高了整体准确性,而且将SC所需的样本数量大幅减少了9.3倍。特别是,大多数任务(78.6%)仅需两个样本即可解决,这在之前的任何SC方法中都是不可能的。

英文摘要

Self-consistency (SC) is a popular technique for improving the reasoning accuracy of large language models by aggregating multiple sampled outputs, but it comes at a high computational cost due to extensive sampling. We introduce a hybrid ensembling approach that leverages the complementary strengths of two distinct modes of reasoning: Chain-of-Thought (CoT) and Program-of-Thought (PoT). We describe a general framework for combining these two forms of reasoning in self-consistency, as well as particular strategies for both full sampling and early-stopping. We show that CoT-PoT ensembling not only improves overall accuracy, but also drastically reduces the number of samples required for SC by a factor of 9.3x. In particular, the majority of tasks (78.6%) can be addressed with only two samples, which has not been possible with any prior SC methods.
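The two-sample idea can be sketched in a few lines. The abstract does not spell out the stopping rule, so this sketch assumes the simplest one: draw one CoT answer and one PoT answer, stop if they agree, and otherwise fall back to ordinary majority voting over alternating samples. `sample_cot` and `sample_pot` are hypothetical callables standing in for model calls.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def cot_pot_vote(
    sample_cot: Callable[[], Hashable],
    sample_pot: Callable[[], Hashable],
    max_samples: int = 10,
) -> Tuple[Hashable, int]:
    """Return (answer, samples_used) under the assumed early-stopping rule."""
    first_cot = sample_cot()
    first_pot = sample_pot()
    votes = Counter([first_cot, first_pot])
    used = 2

    # Assumed rule: cross-mode agreement on the first pair ends sampling.
    if first_cot == first_pot:
        return first_cot, used

    # Otherwise keep sampling, alternating modes, and take a majority vote.
    while used < max_samples:
        sampler = sample_cot if used % 2 == 0 else sample_pot
        votes[sampler()] += 1
        used += 1

    answer, _ = votes.most_common(1)[0]
    return answer, used
```

Agreement between two different reasoning modes is treated here as stronger evidence than agreement between two samples of the same mode, which is the intuition the abstract gives for needing fewer samples.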

2604.23025 2026-06-08 cs.CR cs.LG 版本更新

Self-Supervised Learning for Android Malware Detection on a Time-Stamped Dataset

基于时间戳数据集的自监督学习安卓恶意软件检测

Annan Fu, Hao Pei, Maryam Tanha

发表机构 * Mastercard Canada(Mastercard加拿大)

AI总结 针对机器学习检测器的时间偏差问题,构建时间戳数据集并采用BYOL自监督预训练,在时间感知评估下达到98%准确率和89%F1分数。

Comments Accepted for publication in IEEE ICC 2026. © 2026 IEEE

详情
AI中文摘要

基于机器学习的安卓恶意软件检测器常受时间偏差影响:模型在训练和评估时未考虑应用的实际发布时间,导致准确率虚高并削弱实际鲁棒性。我们通过构建一个包含良性及恶意安卓应用的时间戳数据集来解决此问题,并引入时间戳验证程序以确保时间准确性。随后,我们提出一个检测框架,使用自监督预训练方法Bootstrap Your Own Latent (BYOL)学习抗混淆的表示,然后进行监督分类。在时间感知评估下,该方法达到98%的准确率和89%的F1分数。我们进一步通过VirusTotal和MITRE ATT&CK框架分析真正例和假负例来表征恶意软件行为。为支持可复现性和进一步创新,我们公开了数据集和源代码。

英文摘要

Android malware detectors built with machine learning often suffer from temporal bias: models are trained and evaluated without respecting apps' actual release times, inflating accuracy and weakening real-world robustness. We address this by constructing a time-stamped dataset of benign and malicious Android apps and introducing a timestamp-verification procedure to ensure temporal accuracy. We then propose a detection framework that uses Bootstrap Your Own Latent (BYOL) for self-supervised pre-training to learn obfuscation-resilient representations, followed by supervised classification. Under time-aware evaluation, the method attains 98% accuracy and 89% F1. We further characterize malware behavior by analyzing true positives and false negatives using VirusTotal and the MITRE ATT&CK framework. To support reproducibility and further innovation, we release our dataset and source code.

2605.06647 2026-06-08 cs.IR cs.AI cs.LG 版本更新

Superintelligent Retrieval Agent: The Next Frontier of Agentic Retrieval

超级智能检索代理:代理检索的下一个前沿

Zeyu Yang, Qi Ma, Jason Chen, Anshumali Shrivastava

发表机构 * Meta Superintelligence Labs(Meta超级智能实验室) Rice University(莱斯大学)

AI总结 提出SIRA,通过单次语料判别性检索压缩多轮探索,利用LLM丰富文档词汇、预测查询缺失词汇并基于语料统计过滤,在BEIR基准上取得最强平均检索性能,并在下游QA任务中超越RL训练的代理系统。

详情
AI中文摘要

检索增强代理日益成为大型知识库的接口,但大多数代理把检索当作黑箱:它们发出探索性查询,检查片段,并不断改写查询,直到证据出现。这更像新手在搜索一个不熟悉的数据库,而不像专家凭借对术语和可能证据的强先验进行导航,结果是检索轮次增多、延迟增加、召回率低。我们提出超级智能检索代理(SIRA),它将检索中的“超级智能”定义为把多轮探索性搜索压缩为单次具有语料判别性的检索动作。SIRA不仅询问哪些术语相关,还询问哪些术语能把所需证据与语料级混淆项区分开。离线阶段,LLM为每篇文档补充缺失的搜索词汇;查询阶段,它预测查询遗漏的证据词汇;语料统计则作为工具调用,过滤掉语料中不存在、过于常见或不太可能带来检索间隔(margin)的术语。最后一步是单次加权BM25调用,将查询与经过验证的扩展词结合。在十个BEIR基准上,SIRA取得了我们比较范围内最强的平均检索性能,超过密集检索器、学习型稀疏检索器和LLM搜索代理基线,且未使用相关性标签或检索器微调。在下游QA中,其仅靠检索的答案覆盖率在NQ和HotpotQA上超过了近期经RL训练的代理式QA系统。我们还引入了BrowseComp-Wikipedia,这是一个高难度搜索基准,包含232个源自BrowseComp的查询,检索范围是含25,587,229篇文档的维基百科索引。即使不做索引阶段的文档扩充、仅使用有依据的维基百科类别,SIRA在每个预算下都优于多轮Perplexity代理,达到9.70%的Recall@1、15.27%的Recall@10和36.14%的Recall@100。

英文摘要

Retrieval-augmented agents are increasingly the interface to large knowledge bases, yet most treat retrieval as a black box: they issue exploratory queries, inspect snippets, and reformulate until evidence emerges. This resembles how a newcomer searches an unfamiliar database rather than how an expert navigates it with strong priors about terminology and likely evidence, causing extra retrieval rounds, latency, and poor recall. We introduce \textit{Superintelligent Retrieval Agent} (SIRA), which casts \emph{superintelligence} in retrieval as compressing multi-round exploratory search into a single corpus-discriminative retrieval action. SIRA does not merely ask which terms are relevant; it asks which terms separate the desired evidence from corpus-level confusers. Offline, an LLM enriches each document with missing search vocabulary; at query time, it predicts evidence vocabulary the query omits; and corpus statistics serve as tool calls that filter terms that are absent, overly common, or unlikely to create retrieval margin. The final step is a single weighted BM25 call combining the query with the validated expansion. Across ten BEIR benchmarks, SIRA achieves the strongest average retrieval performance in our comparison, beating dense retrievers, learned sparse retrievers, and LLM search-agent baselines while using no relevance labels or retriever fine-tuning. On downstream QA, its retrieval-only answer coverage exceeds recent RL-trained agentic QA systems on NQ and HotpotQA. We also introduce \textbf{BrowseComp-Wikipedia}, a hard-search benchmark of 232 BrowseComp-derived queries over a 25,587,229-document Wikipedia index. Even without index-time enrichment, using only grounded Wikipedia categories, SIRA outperforms multi-round Perplexity agents at every budget, reaching 9.70% Recall@1, 15.27% Recall@10, and 36.14% Recall@100.
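The corpus-statistics filter described above can be illustrated with a toy document-frequency check. This is only a sketch under stated assumptions: whitespace tokenization, a hypothetical `max_df_ratio` cutoff for "overly common", and no modeling of the retrieval-margin criterion. None of these details come from the paper.

```python
from typing import List


def filter_expansion_terms(
    candidates: List[str],
    docs: List[str],
    max_df_ratio: float = 0.5,
) -> List[str]:
    """Keep candidate terms that occur in the corpus but are not too common.

    A term is dropped if its document frequency is zero (absent from the
    corpus) or if it appears in more than `max_df_ratio` of the documents.
    """
    doc_terms = [set(doc.lower().split()) for doc in docs]
    n_docs = len(doc_terms)
    kept = []
    for term in candidates:
        df = sum(term.lower() in terms for terms in doc_terms)
        if df == 0:
            continue  # absent: cannot match anything
        if df / n_docs > max_df_ratio:
            continue  # overly common: little discriminative value
        kept.append(term)
    return kept
```

The surviving terms would then be appended to the query with weights in a single BM25 call; the weighting scheme is not described in the abstract and is left out here.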

2605.08732 2026-06-08 cs.RO cs.LG 版本更新

Latent Geometry Beyond Search: Amortizing Planning in World Models

超越搜索的潜在几何:在世界模型中摊销规划

Hoang Nguyen, Xiaohao Xu, Xiaonan Huang

发表机构 * Department of Robotics, University of Michigan, Ann Arbor(密歇根大学机器人系,安阿伯)

AI总结 提出在正则化潜在几何下,将规划摊销为潜在逆动力学映射,以轻量级GC-IDM替代在线搜索,在八个环境-协议设置中的七个上匹配或超越CEM,每次决策成本降低100-130倍。

Comments 31 pages

详情
AI中文摘要

现代基于视觉的世界模型可以将观测表示为紧凑而富有表现力的潜在流形,但在这些空间中进行快速的目标导向规划仍然具有挑战性。这引发了一个核心问题:学习到的表示何时简化控制,而不仅仅是实现预测?我们在预训练的LeWorldModel中研究这个问题,其潜在几何通过正则化实现平滑性和均匀性。我们的关键见解是,在这种几何下,规划可以摊销为潜在逆动力学映射,而无需在线搜索。因此,我们用一个轻量级的目标条件逆动力学模型(GC-IDM)替代迭代规划,该模型将当前潜在状态、目标潜在状态和剩余时间步直接映射到下一个动作。实验上,在涵盖导航、接触丰富的操作和连续控制的四个基准环境中,我们的控制器在八个环境-协议设置中的七个上匹配或超过了CEM,同时将每次决策成本降低了100-130倍。对测试时规划器(CEM、MPPI、iCEM和基于梯度的方法)的更广泛扫描表明,这一结果并非特定于某个优化器。这些发现表明,测试时规划恢复的大部分结构已经局部编码在潜在表示中。更广泛地说,我们的结果表明,足够结构化的潜在空间可以将部分规划负担从在线优化转移到学习推理。我们的代码公开在 https://github.com/hdnndh/Latent-Geometry-Beyond-Search-Amortizing-Planning-in-World-Models 。

英文摘要

Modern vision-based world models can represent observations as compact yet expressive latent manifolds, but fast goal-oriented planning in these spaces remains challenging. This raises a central question: when does a learned representation simplify control, rather than merely enabling prediction? We study this question in a pretrained LeWorldModel, whose latent geometry is regularized for smoothness and uniformity. Our key insight is that, under such geometry, planning can be amortized into a latent inverse-dynamics mapping instead of requiring online search. We therefore replace iterative planning with a lightweight Goal-Conditioned Inverse Dynamics Model (GC-IDM) that maps the current latent state, goal latent state, and remaining horizon directly to the next action. Empirically, across four benchmark environments spanning navigation, contact-rich manipulation, and continuous control, our controller matches or exceeds CEM in seven of eight environment-protocol settings while reducing per-decision cost by 100-130x. A broader sweep over test-time planners (CEM, MPPI, iCEM, and gradient-based methods) shows that this result is not specific to a particular optimizer. These findings suggest that much of the structure recovered by test-time planning is already locally encoded in the latent representation. More broadly, our results indicate that sufficiently structured latent spaces can shift part of the planning burden from online optimization to learned inference. Our code is publicly available at https://github.com/hdnndh/Latent-Geometry-Beyond-Search-Amortizing-Planning-in-World-Models .

2605.10792 2026-06-08 math.OC cs.LG 版本更新

Implicit Neural Optimal Transport via Fixed-Point Optimization

通过不动点优化的隐式神经最优传输

Yesom Park, Eric Gelphman, Stanley Osher, Samy Wu Fung

发表机构 * Department of Mathematics, University of California, Los Angeles(加州大学洛杉矶分校数学系) Department of Applied Mathematics and Statistics, Colorado School of Mines(科罗拉多矿业学院应用数学与统计系)

AI总结 提出隐式神经最优传输公式,通过单个势函数和近端不动点问题避免对抗训练,实现稳定高效的单网络框架,同时恢复前向和后向传输映射。

Comments 37 pages, submitted to SIAM Journal on Mathematical Data Science (currently under review)

详情
AI中文摘要

我们提出了一种隐式神经最优传输公式,它不再需要现有方法中常用的对抗性最小-最大优化和多网络架构。我们的关键思想是在Kantorovich对偶中参数化单个势函数,并将相关的c-变换重新表述为近端不动点问题。由此得到一个稳定的单网络框架,其中对偶可行性通过近端最优性条件(而非对抗性训练)被精确满足。尽管存在内层不动点计算,梯度的计算无需对不动点迭代求导,因此不需要隐式微分即可高效训练。我们进一步建立了随机梯度下降的收敛性。所得框架高效、可扩展且适用面广:它同时恢复前向和后向传输映射,并可自然扩展到类条件设置。在高维高斯基准、物理数据集和图像翻译任务上的实验表明,该框架传输精度高,同时训练稳定性更好,计算和内存效率良好。

英文摘要

We propose an implicit neural formulation of optimal transport that eliminates adversarial min--max optimization and multi-network architectures commonly used in existing approaches. Our key idea is to parameterize a single potential in the Kantorovich dual and reformulate the associated c-transform as a proximal fixed-point problem. This yields a stable single-network framework in which dual feasibility is enforced exactly through proximal optimality conditions rather than adversarial training. Despite the inner fixed-point computation, gradients can be computed without differentiating through the fixed-point iterations, enabling efficient training without requiring implicit differentiation. We further establish convergence of stochastic gradient descent. The resulting framework is efficient, scalable, and broadly applicable: it simultaneously recovers forward and backward transport maps and naturally extends to class-conditional settings. Experiments on high-dimensional Gaussian benchmarks, physical datasets, and image translation tasks demonstrate strong transport accuracy together with improved training stability and favorable computational and memory efficiency.

2605.13268 2026-06-08 quant-ph cs.LG 版本更新

Physics Guided Generative Optimization for Trotter Suzuki Decomposition

物理引导的Trotter-Suzuki分解生成优化

WenBin Yan

发表机构 * University of Colorado Boulder(科罗拉多大学博尔德分校)

AI总结 提出P-GONE方法,结合条件扩散模型、图神经网络和REINFORCE微调,联合优化Trotter-Suzuki分解中的项分组、阶数和时间步分配,在保真度≥0.95时实现19.4倍电路深度压缩。

详情
AI中文摘要

Trotter-Suzuki乘积公式是在含噪中等规模量子(NISQ)硬件上进行哈密顿演化的标准途径,但其精度取决于三个耦合的选择:项分组、乘积公式阶数和时间步分配。分组和阶数是离散的,这使得直接梯度优化不可行,并迫使现有编译器依赖静态启发式方法。我们描述了P-GONE方法,该方法结合了条件扩散模型(D3PM + DDPM)、图神经网络(GNN)编码器和闭环REINFORCE微调,以在混合离散-连续空间上联合学习分组、阶数和时间步优化。在保真度匹配条件下($F \geq 0.95$),该方法实现的电路深度为86,而Qiskit四阶(未分组,Suzuki-4)为1673,压缩约19.4倍;Paulihedral(一阶Trotter)为141,压缩约1.6倍。在$T=0.90$时,该方法也优于Qiskit分组对易教师(65 vs 103,压缩1.6倍),但在$T=0.95$时教师仍领先——这种分层模式指向保真度感知的微调。在标准退极化噪声模型下,该方法的含噪保真度大约是Qiskit四阶基线的2倍(0.743 vs 0.380)。消融实验显示清晰的层次:阶数学习 > 时间分配 > 分组。最佳N采样($N=32$是实际最佳点)和CFG指导在推理时提供灵活的保真度-深度权衡。该方法在结构化哈密顿量(TFIM,Heisenberg)上表现良好,但随机Pauli哈密顿量在$T \geq 0.95$时完全失败——这定义了该方法的适用边界。

英文摘要

Trotter Suzuki product formulas are the standard route to Hamiltonian evolution on noisy intermediate-scale quantum (NISQ) hardware, but their accuracy depends on three coupled choices: term grouping, product-formula order, and time-step allocation. Grouping and order are discrete, which makes direct gradient optimization infeasible and forces existing compilers to rely on static heuristics. We describe P-GONE, a method that combines a conditional diffusion model (D3PM + DDPM), a graph neural network (GNN) encoder, and closed-loop REINFORCE fine-tuning to jointly learn grouping, order, and time-step optimization over a mixed discrete-continuous space. Under fidelity-matched conditions ($F \geq 0.95$), the method achieves circuit depth 86 versus 1673 for Qiskit fourth-order (ungrouped, Suzuki-4), about $19.4\times$ compression, and 141 for Paulihedral (first-order Trotter), about $1.6\times$ compression. At $T=0.90$ the method also beats the Qiskit group-commuting teacher (65 vs 103, $1.6\times$ compression), though at $T=0.95$ the teacher still leads -- a stratified pattern that points toward fidelity-aware fine-tuning. Under a standard depolarizing noise model, the method achieves noisy fidelity roughly $2\times$ the Qiskit fourth-order baseline (0.743 vs 0.380). Ablation shows a clear hierarchy: order learning $>$ time allocation $>$ grouping. Best-of-N sampling ($N=32$ is a practical sweet spot) and CFG guidance give flexible fidelity-depth trade-offs at inference. The method works well on structured Hamiltonians (TFIM, Heisenberg), but random Pauli Hamiltonians fail entirely at $T \geq 0.95$ -- a boundary that defines where the method applies.

2605.21347 2026-06-08 cs.AI cs.LG cs.SE 版本更新

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Insights Generator:面向LLM代理的系统性语料库级轨迹诊断

Akshay Manglik, Apaar Shanker, Kaustubh Deshpande, Jason Qin, Yash Maurya, Veronica Chatrath, Vijay S. Kalmath, Levi Lentz, Yuan Xue

发表机构 * Scale AI, Inc.

AI总结 本文提出Insights Generator,一种多智能体系统,通过在语料库中提出和测试假设来生成基于证据的洞察报告,从而系统性地诊断LLM代理的行为模式。

详情
AI中文摘要

诊断LLM代理的故障在很大程度上仍依赖人工。从业者检查一小部分执行轨迹,形成临时假设并反复迭代。这一过程会遗漏只有在轨迹群体层面才显现的模式,也无法扩展到单条轨迹长达数万个token的生产语料库。我们形式化了语料库级轨迹诊断问题:给定一个执行轨迹语料库,目标是生成有据可依的自然语言洞察,刻画各轨迹组中的系统性行为模式,且每条洞察都关联到支持证据。我们提出了Insights Generator(IG),这是一个多智能体系统,通过在轨迹语料库中提出并检验假设来回答诊断问题,从而生成有证据支持的洞察报告。我们从定性和客观两个维度评估IG,包括基于评分标准的报告评估,以及落实IG洞察后获得的下游性能提升。使用IG报告的人类专家将支架(scaffold)性能相对于未修改的基线支架提高了30.4个百分点,而利用IG衍生洞察的编码代理也表现出一致且稳定的提升。在多个基准测试中,IG的scout-investigator架构产生的发现,在检测覆盖率上与竞争方法相当,同时领域专家认为IG报告在深度和证据质量方面领先。

英文摘要

Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora where individual traces span tens of thousands of tokens. We formalize the problem of corpus-level trace diagnostics. Given a corpus of execution traces, the goal is to produce grounded natural-language insights that characterize systematic behavioral patterns across trace groups, each linked to supporting evidence. We present the Insights Generator (IG), a multi-agent system that answers diagnostic questions by proposing and testing hypotheses across the trace corpus to produce an evidence-backed insights report. We evaluate IG across qualitative and objective dimensions, spanning rubric-based report assessment and downstream performance improvements achieved by implementing IG insights. Human experts using IG reports improve scaffold performance by 30.4pp over the unmodified baseline scaffold, and coding agents leveraging IG-derived insights show consistent and stable gains. Across benchmarks, IG's scout-investigator architecture produces findings comparable in detection coverage to competing approaches, while domain experts rated IG reports as leading depth and evidence quality.

2605.25638 2026-06-08 cs.CL cs.LG 版本更新

Reinforcement Learning from Denoising Feedback

基于去噪反馈的强化学习

Qi He, Huan Chen, Ya Guo, Huijia Zhu, Yi R. Fung, Baojian Zhou

发表机构 * Fudan University(复旦大学) Ant Group(蚂蚁集团) Hong Kong University of Science and Technology(香港科技大学)

AI总结 提出RLDF方法,利用去噪反馈进行策略损失估计,通过优化中间噪声状态到裁剪干净状态并结合加权时间步采样,在扩散语言模型上提升性能和泛化性。

详情
AI中文摘要

策略损失估计仍然是扩散语言模型(dLLMs)强化学习中的一个基本且长期存在的挑战。我们引入了基于去噪反馈的强化学习(RLDF),这是一种新颖的训练范式,利用从rollout和训练过程中获得的反馈来实现准确且高效的策略损失估计。为了平衡计算效率和估计有效性之间的权衡,RLDF将模型从中间噪声状态$x_t$优化到裁剪干净状态$\hat{x}_0$,并结合了随时间步$t$的加权采样。大量实验表明,RLDF在两种代表性dLLM架构(LLaDA和Dream)上,在多个推理基准测试中实现了性能和泛化性的一致且显著的提升。我们的工作为扩散语言模型中的可扩展强化学习奠定了原则性基础。我们构建了Drift,一个用于dLLMs的训练框架,可在https://github.com/ant-research/Drift获取。

英文摘要

Policy loss estimation remains a fundamental and long-standing challenge in reinforcement learning (RL) for diffusion language models (DLMs). We introduce Reinforcement Learning from Denoising Feedback (RLDF), a novel training paradigm that leverages feedback obtained from rollout and training processes to facilitate accurate and efficient policy loss estimation. To balance the trade-off between computational efficiency and estimation effectiveness, RLDF optimizes the model toward the clipped clean state from intermediate noisy states, combined with weighted timestep sampling over denoising timesteps. Extensive experiments demonstrate that RLDF achieves consistent and substantial improvements in both performance and generalizability across two representative DLM architectures, LLaDA and Dream, on multiple reasoning benchmarks. Our work lays a principled foundation for scalable reinforcement learning in diffusion language models. We build Drift, a training framework for DLMs, available at https://github.com/ant-research/Drift.

2605.30432 2026-06-08 math.DS cs.LG cs.SI nlin.AO physics.soc-ph 版本更新

Learning effective models from network dynamics data with multiple initial conditions using weak form SINDy

使用弱形式SINDy从多初始条件的网络动力学数据中学习有效模型

Moyi Tian, Daniel A. Messenger, Vanja Dukic, Nancy Rodríguez, David M. Bortz

发表机构 * Department of Applied Mathematics, University of Colorado, Boulder, CO 80309 United States(应用数学系,科罗拉多大学,博尔德,CO 80309 美国) Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545 United States(洛斯阿拉莫斯国家实验室理论部,洛斯阿拉莫斯,NM 87545 美国)

AI总结 本文使用弱形式稀疏非线性动力学识别(WSINDy)方法,从多初始条件的网络动力学数据中学习有效模型,并评估了噪声水平与轨迹数量对学习精度的影响。

Comments 24 pages, 14 figures, 1 table. Code available at https://github.com/Moyi-Tian/WSINDy-NetworkDynamics

详情
AI中文摘要

社会系统由通过社交互动相互影响的个体网络组成。研究这些网络上的过程演化有助于我们更好地理解社会行为模式。我们研究了一个耦合线上和线下社交活动的系统,并探讨如何使用弱形式稀疏非线性动力学识别(WSINDy)方法直接从数据中学习有效模型,该方法用于发现控制方程。我们使用网络上的随机交互过程的平均场近似模型生成的数据评估学习性能,并测试在不同噪声水平下系统恢复的准确性。结果表明,当噪声较高时,使用更多轨迹可以提高准确性,但只需少量额外轨迹即可获得大部分收益,之后改进甚微。我们还从网络上的平均随机数据中学习有效的常微分方程模型。当传统的平均场近似失效时,直接从随机过程中识别连续常微分方程能够生成更符合数据的有效模型,并更深入地理解潜在动力学。

英文摘要

Social systems consist of networks of individuals who influence one another through social interactions. Studying how processes evolve on these networks can help us better understand patterns of social behavior. We study a system that couples online and offline social activity and investigate how to learn effective models directly from data using Weak Form Sparse Identification of Nonlinear Dynamics (WSINDy), a method for discovering governing equations. We assess learning performance using data generated by a mean-field approximation model of a stochastic interaction process on networks and test how accurately the system can be recovered under different noise levels. Our results show that using more trajectories improves accuracy when noise is high, but only a small number of additional trajectories is needed to gain most of the benefit, with little improvement beyond that. We also learn effective ODE models from averaged stochastic data on networks. When traditional mean-field approximations fail, identifying continuum ODEs directly from stochastic processes yields efficient models that better match the data and provide deeper insight into the underlying dynamics.

2606.00279 2026-06-08 cs.CR cs.LG 版本更新

Bit-Exact AI Inference Verification Without Performance Tradeoffs

无性能权衡的位精确AI推理验证

Naci Cankaya

发表机构 * Naci Cankaya(纳西·卡纳亚)

AI总结 针对GPU浮点运算非确定性导致AI工作负载验证困难的问题,提出通过软件仿真实现位精确重计算,在不牺牲性能的前提下实现可审计的推理验证。

Comments Best paper award, ICML 2026 TAIGR workshop. Code can be found at https://github.com/NaciCankaya/hardware_rounding_error_predictor

详情
AI中文摘要

验证AI工作负载的声明是对隐蔽对手(仅在检测可能性高时才遵守监控)进行可信AI治理的前提,然而GPU浮点运算明显的非确定性迫使审计人员接受近似输出匹配。隐蔽对手可以利用监控计算中不可验证的自由度。攻击向量包括隐写术、未报告的推理软件修改以及通过未报告的批次元素进行的隐蔽计算。通过实验,我们分析了现代推理引擎(vLLM、HF transformers)如何产生确定性但非不变性的输出,而无需设置影响性能的确定性标志,只要重计算所需的信息可用且后端未调用原子函数。我们证明,这种位精确重计算不需要访问相同的硬件,通过跨多个NVIDIA GPU变体的LLM推理的纯软件仿真实现。因此,累积的舍入误差可以成为用于推理的软件和硬件设置的可审计签名,而不是可验证性的约束。

英文摘要

Verifying claims about AI workloads is a prerequisite for credible AI governance of covert adversaries (who comply with monitoring only when detection likelihood is high), yet the apparent non-determinism of GPU floating-point arithmetic forces auditors to accept approximate output matches. Covert adversaries can exploit unverifiable degrees of freedom in monitored computation. Attack vectors include steganography, unreported modification of inference software, and covert computation via unreported batch elements. Empirically, we analyze how modern inference engines (vLLM, HF transformers) produce deterministic but non-invariant outputs, without needing to set performance-compromising determinism flags, if the right information is available for re-computation and no atomic functions are called in the backend. We demonstrate that such bitwise-precise re-computation does not require access to identical hardware, via a software-only emulation of LLM inference across multiple NVIDIA GPU variants. Thus, accumulated rounding errors can be an auditable signature of the software and hardware setup used for inference, instead of a constraint on verifiability.
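The distinction between "deterministic" and "invariant" can be seen with plain floating-point addition. The sketch below uses CPU float64 arithmetic in Python rather than GPU kernels, so it only illustrates the underlying effect: a fixed summation order always gives the same bits, while a different order can give different bits. The helper names are hypothetical.

```python
import struct
from typing import List, Sequence


def bits(x: float) -> str:
    """Hex string of the IEEE 754 binary64 encoding of x."""
    return struct.pack(">d", x).hex()


def reduce_in_order(values: Sequence[float], order: List[int]) -> float:
    """Sum `values` sequentially in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]
left_to_right = reduce_in_order(values, [0, 1, 2])
right_to_left = reduce_in_order(values, [2, 1, 0])

if __name__ == "__main__":
    print(bits(left_to_right), bits(right_to_left))
```

Because the rounding pattern depends on the order of operations, the exact output bits carry information about how the computation was scheduled. That is the property the abstract proposes to use as an auditable signature.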

2606.01446 2026-06-08 eess.SP cs.LG 版本更新

Spatially Distributed Task-Oriented Compression for Multi-Emitter Localization and Characterization with Spectral Overlap

面向多发射源定位与频谱重叠表征的空间分布式任务导向压缩

H. Nazim Bicer, J. Nicholas Laneman

发表机构 * Dept. of Electrical Engineering(电气工程系) University of Notre Dame(圣母大学) Notre Dame, IN, USA(美国印第安纳州诺特丹)

AI总结 提出一种任务导向的分布式压缩框架,利用空间分布式接收机对频谱重叠的多发射源进行联合定位与表征,通过置换不变训练实现高效压缩与信息保留。

Comments 6 pages, 2 figures

详情
AI中文摘要

射频频谱感知需要在密集和竞争性的无线环境中检测、定位和表征发射源。本文提出了一种任务导向的分布式压缩框架,用于利用空间分布式接收机联合定位和表征多个发射源。每个接收机观测一段短时长的复数IQ样本,将观测转换为时频表示,并编码为紧凑的潜在向量。中央融合解码器结合各接收机的潜在向量,估计一组无序的活跃发射源,包括其位置、中心频率偏移、占用带宽和波形族。采用置换不变的训练目标来处理发射源和预测的任意排序。在具有频谱重叠的合成多发射源场景上的实验表明,即使极紧凑的接收端表示也能保留用于发射源计数和波形族估计的有用信息。然而,精确的定位和频谱参数回归需要更大的潜在维度。将接收机潜在维度从$d_{\mathrm{rx}}=1$增加到$d_{\mathrm{rx}}=16$带来了最大的改进,而进一步增加到$d_{\mathrm{rx}}=64$则增益较小。这些结果证明了学习型任务导向压缩在通信高效的分布式频谱感知中的潜力。

英文摘要

Radio frequency spectrum awareness requires the ability to detect, localize, and characterize emitters in dense and contested wireless environments. In this work, we propose a task-oriented distributed compression framework for joint multi-emitter localization and characterization using spatially distributed receivers. Each receiver observes a short window of complex IQ samples, converts the observation to a time--frequency representation, and encodes it into a compact latent vector. A central fusion decoder combines the receiver latents to estimate an unordered set of active emitters, including their locations, center-frequency offsets, occupied bandwidths, and waveform families. A permutation-invariant training objective is used to handle the arbitrary ordering of emitters and predictions. Experiments on synthetic multi-emitter scenes with spectral overlap show that even extremely compact receiver-side representations can preserve useful information for emitter counting and waveform-family estimation. However, accurate localization and spectral-parameter regression require larger latent dimensions. Increasing the receiver latent dimension from $d_{\mathrm{rx}}=1$ to $d_{\mathrm{rx}}=16$ provides the largest improvement, while further increasing to $d_{\mathrm{rx}}=64$ gives smaller gains. These results demonstrate the potential of learned task-oriented compression for communication-efficient distributed spectrum awareness.
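A permutation-invariant objective of the kind mentioned above can be sketched by taking the minimum loss over all assignments of predictions to targets. This is a brute-force illustration under simplifying assumptions: equal numbers of predictions and targets, squared Euclidean distance as the per-pair cost, and small sets. The paper's actual loss and matching procedure are not given in the abstract.

```python
from itertools import permutations
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def permutation_invariant_loss(
    predictions: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
) -> float:
    """Mean per-emitter cost under the best prediction-to-target assignment."""
    if len(predictions) != len(targets):
        raise ValueError("this sketch assumes equally sized sets")
    n = len(targets)
    if n == 0:
        return 0.0
    best = min(
        sum(squared_distance(predictions[i], targets[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best / n
```

Enumerating permutations costs $O(n!)$, so it is practical only for a handful of emitters; an assignment solver such as the Hungarian algorithm is the usual replacement for larger sets.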

2606.01765 2026-06-08 cs.FL cs.CL cs.LG 版本更新

An Algebraic View of the Expressivity of Recurrent Language Models

循环语言模型表达能力的代数视角

Franz Nowak, Ryan Cotterell, Reda Boumasmoud

发表机构 * GitHub

AI总结 本文通过统一的代数框架分析循环神经网络在不同算术模型下的表达能力,将形式语言识别问题归结为语法幺半群是否整除特定圈积的代数问题。

Comments 28 pages, 2 figures, to be published at ICML 2026

详情
AI中文摘要

循环神经语言模型能识别哪些形式语言?文献中的形式化结果相互冲突:一些作者报告图灵完备性,另一些则证明其等价于正则语言。这种差异的原因在于底层算术模型不同。本文首先对各种算术模型给出形式化描述,进而为循环神经网络的表达能力建立统一的代数刻画。该刻画将表达能力归结为一个代数问题,例如网络的语法幺半群是否整除某个圈积。作为案例研究,本文重新审视了对角状态空间模型:一旦强制使用浮点递归,同一架构无法实现偶数模计数器,但在无符号整数量化下却能实现每一个偶数模计数器。

英文摘要

What formal languages can a recurrent neural language model recognize? Formal results in the literature conflict: some authors report Turing-completeness, while others show equivalence to regular languages. The reason for this discrepancy is that the underlying arithmetic model differs. The paper develops a unified algebraic account of the expressivity of recurrent neural networks, starting with a formal account of various arithmetic models. This account reduces expressivity to an algebraic question, e.g., whether a network's syntactic monoid divides a certain wreath product. As a case study, the paper revisits diagonal state-space models: the same architecture cannot implement an even-modulus counter once floating-point recurrences are enforced, yet realizes every even-modulus counter under unsigned-integer quantization.

2606.02475 2026-06-08 math.NA cs.CE cs.LG cs.NA 版本更新

Physics-Informed Residuals for Adaptive Mesh Refinement in Finite-Difference PDE Solvers

面向有限差分PDE求解器中自适应网格细化的物理信息残差

Henry Kasumba, Ronald Katende

发表机构 * Department of Mathematics, Makerere University(数学系,Makerere大学) Department of Mathematics, Kabale University(数学系,Kabale大学)

AI总结 提出利用物理信息神经网络(PINN)作为离网格残差探针,为有限差分求解器提供自适应网格细化指示,在粘性Burgers方程等基准测试中验证了其有效性。

Comments 20 pages, 5 tables, 5 figures

详情
AI中文摘要

经典有限差分求解器仍是偏微分方程的可靠工具,但其效率取决于网格分辨率的放置位置。当求解困难集中在尖锐梯度、前沿、振荡或约束敏感区域附近时,均匀细化可能浪费自由度。本文研究了一种混合策略,其中物理信息神经网络(PINN)不作为最终求解器,而是作为自适应网格细化的离网格残差探针。PINN残差在域内采样,转换为单元指示器,并在最终近似由有限差分求解器计算之前指导细化。该方法在三个基准测试上进行了评估。主要的全求解器验证使用一维粘性Burgers方程,在自适应网格上进行非均匀有限差分求解。PINN阈值细化在60个自由度下达到最终相对$L^2$误差0.021067,而均匀细化在192个自由度下为0.022617。在匹配网格大小时,PINN阈值将误差降低了约67.5%。PINN-Dörfler细化性能类似,使用58个自由度时误差为0.021264。梯度指示器仍略精确,因此结果支持有用性而非普遍优越性。基于非线性薛定谔方程和不可压缩Navier-Stokes系统的二维和三维代理测试表明,PINN残差可以组织结构化细化并优于随机细化,尽管它们并不始终优于梯度或均匀基线。结果支持PINN引导的AMR作为一种残差指示器策略,将物理信息诊断信息传递到有限差分网格自适应中,同时保留经典求解器作为最终近似引擎。

英文摘要

Classical finite-difference solvers remain reliable tools for partial differential equations, but their efficiency depends on where mesh resolution is placed. Uniform refinement can waste degrees of freedom when solution difficulty is localised near sharp gradients, fronts, oscillations, or constraint-sensitive regions. This paper studies a hybrid strategy in which a physics-informed neural network (PINN) is used not as the final solver, but as an off-grid residual probe for adaptive mesh refinement. The PINN residual is sampled over the domain, converted into cellwise indicators, and used to guide refinement before the final approximation is computed by a finite-difference solver. The method is evaluated on three benchmarks. The main full-solver validation uses the one-dimensional viscous Burgers equation with a nonuniform finite-difference solve on the adapted meshes. PINN-threshold refinement attains final relative $L^2$ error $0.021067$ with $60$ degrees of freedom, compared with $0.022617$ for uniform refinement with $192$ degrees of freedom. At matched mesh size, PINN-threshold reduces the error by about $67.5\%$. PINN-Dörfler refinement gives similar performance, with error $0.021264$ using $58$ degrees of freedom. A gradient indicator remains slightly more accurate, so the result supports usefulness rather than universal superiority. Manufactured 2D and 3D proxy tests, based on a nonlinear Schrödinger equation and an incompressible Navier--Stokes system, show that PINN residuals can organise structured refinement and improve over random refinement, although they do not consistently outperform gradient or uniform baselines. The results support PINN-guided AMR as a residual-indicator strategy for transferring physics-informed diagnostic information into finite-difference mesh adaptation while preserving the classical solver as the final approximation engine.
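The two marking strategies named above can be sketched as follows. Dörfler (bulk) marking is the standard rule of selecting the smallest set of cells whose squared indicators reach a fraction `theta` of the total. The threshold rule shown here, marking cells whose indicator exceeds a fraction of the maximum, is an assumption about what "PINN-threshold" means, since the abstract does not define it. In both cases `indicators` stands for cellwise values already derived from the sampled PINN residual.

```python
from typing import List, Sequence


def doerfler_mark(indicators: Sequence[float], theta: float = 0.5) -> List[int]:
    """Indices of a minimal set of cells holding >= theta of the squared total."""
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    squared = [eta * eta for eta in indicators]
    total = sum(squared)
    # Largest indicators first; Python's sort is stable, so ties keep index order.
    order = sorted(range(len(squared)), key=lambda i: squared[i], reverse=True)
    marked: List[int] = []
    accumulated = 0.0
    for i in order:
        if accumulated >= theta * total:
            break
        marked.append(i)
        accumulated += squared[i]
    return sorted(marked)


def threshold_mark(indicators: Sequence[float], fraction: float = 0.5) -> List[int]:
    """Indices of cells whose indicator exceeds `fraction` of the maximum (assumed rule)."""
    cutoff = fraction * max(indicators, default=0.0)
    return [i for i, eta in enumerate(indicators) if eta > cutoff]
```

Marked cells would then be subdivided before the finite-difference solve; the refinement step itself depends on the mesh data structure and is omitted.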

2606.04101 2026-06-08 cs.DC cs.LG 版本更新

UltraEP: Unleash MoE Training and Inference on Rack-Scale Nodes with Near-Optimal Load Balancing

UltraEP:在机架级节点上以近最优负载均衡释放MoE训练与推理

Xinming Wei, Chao Jin, Tuo Dai, Yinmin Zhong, Shan Yu, Chengxu Yang, Bingyang Wu, Zili Zhang, Jing Mai, Qianchao Zhu, Zhouyang Li, Yuliang Liu, Guojie Luo

AI总结 提出UltraEP,首个基于精确负载的实时均衡器,通过协同设计方案求解与专家复制通信,在机架级节点上对MoE训练和推理进行逐微批次、逐层的重均衡,达到强制均衡理想吞吐量的94.3%。

Comments The authors have identified issues related to information disclosure in the current version of the manuscript and therefore request its withdrawal. A revised version may be prepared at a later date

详情
AI中文摘要

大规模专家并行(EP)正成为训练和服务前沿MoE模型的关键,但它也会把设备级专家负载不均衡放大为计算掉队者、token全对全通信瓶颈和激活内存峰值。现有均衡器根据历史负载周期性地重新分配专家,在负载模式非平稳的生产部署中并不可靠。我们提出UltraEP,这是首个面向机架级节点(RSN)上大规模EP的MoE训练和推理服务预填充阶段的精确负载实时均衡器。UltraEP构建于RSN扩展的纵向扩展(scale-up)互联之上,在关键路径上对每个微批次、每一层进行重均衡,这需要对方案求解和专家复制通信进行非平凡的协同设计,以最小化暴露的开销。为此,UltraEP通过高效的配额驱动规划对门控后的负载即时做出反应,并利用RSN原生的持久化tile流式传输和基于中继的扇出缓解来执行由此产生的不规则专家状态传输。在训练和预填充中,对106B到671B参数的MoE模型取平均,UltraEP达到强制均衡(force-balanced)理想吞吐量的94.3%,相比不做均衡带来1.49倍的提升,同时将最终的跨rank不均衡度从1.30-4.01降低到1.01-1.04。此外,我们在使用2560个GPU的生产MoE训练中验证了UltraEP的可扩展性和鲁棒性。

英文摘要

Large-scale expert parallelism (EP) is becoming pivotal for training and serving frontier MoE models, but it also amplifies device-level expert load imbalance into compute stragglers, token all-to-all bottlenecks, and activation-memory spikes. Existing balancers redistribute experts periodically based on historical load, which becomes unreliable for production deployments with non-stationary load patterns. We present UltraEP, the first exact-load, real-time balancer for large-EP MoE training and serving prefill on rack-scale nodes (RSNs). Built upon the extended scale-up connectivity of RSNs, UltraEP rebalances every microbatch and layer on critical paths, which requires nontrivial co-design of plan solving and expert replication communication to minimize exposed overhead. To this end, UltraEP eagerly reacts to post-gating load with efficient quota-driven planning, and executes the resulting irregular expert-state transfers with RSN-native persistent tile streaming and relay-based fan-out mitigation. Averaged across MoE models from 106B to 671B parameters in training and prefill, UltraEP achieves 94.3% of the force-balanced ideal throughput, delivering 1.49$\times$ improvement over non-balancing, while reducing the final inter-rank imbalance from 1.30$-$4.01 to 1.01$-$1.04. Additionally, we validate UltraEP's scalability and robustness in production MoE training with 2560 GPUs.

2606.05654 2026-06-08 cs.SE cs.AI cs.LG 版本更新

When Surface Form Changes Moderation Decisions: A Paired Study of Code-Mixed Workflow Instability

当表面形式改变审核决策:代码混合工作流不稳定性的配对研究

Suraj Babu Thimma Krishnaram, Yibo Hu, Karthikeyan Saravanan

发表机构 * GitHub

AI总结 通过配对评估设置,研究在清洁英语与泰米尔语-英语代码混合输入下,仇恨审核工作流的变化,发现代码混合导致决策翻转率高达0.265,并增加审核负担和误报。

详情
AI中文摘要

仇恨审核通常被评估为对清洁英语输入的分类,但部署的系统必须将内容路由到诸如ALLOW、FLAG或REVIEW等操作。我们通过配对评估设置研究这种工作流在代码混合输入下的变化,其中相同的基础内容以清洁英语和泰米尔语-英语代码混合形式表达。在基于清洁英语开发数据调整的阈值下,代码混合输入产生显著的动作不稳定性,配对清洁到代码混合决策翻转率为0.265。主要工作流影响是增加的审核负担和增加的非仇恨内容误报:审核率从0.138上升到0.297,非仇恨误报率从0.069上升到0.104。仅泰米尔语输入整体表现出更强的退化,表明存在更广泛的语言覆盖限制,而非相同的代码混合不稳定性模式。一个简单的基于分歧的延迟规则减少了压力输入上的自动错误,但只能通过增加审核负载。这些结果表明,工作流级别的评估揭示了标准分类摘要可能遗漏的审核失败。

英文摘要

Hate moderation is often evaluated as classification on clean English inputs, but deployed systems must route content to actions such as ALLOW, FLAG, or REVIEW. We study how this workflow changes under code-mixed inputs using a paired evaluation setting where the same underlying content is expressed as clean English and Tamil-English code-mix. Under thresholds tuned on clean English development data, code-mixed inputs produce substantial action instability, with a paired clean-to-code-mix decision flip rate of 0.265. The main workflow effects are increased review burden and increased false-flagging of non-hateful content: review rate rises from 0.138 to 0.297 and non-hate false-flag rate rises from 0.069 to 0.104. Tamil-only inputs show stronger degradation overall, suggesting a broader language-coverage limitation rather than the same code-mixed instability pattern. A simple disagreement-based deferral rule reduces automatic errors on stressed inputs, but only by increasing review load. These results show that workflow-level evaluation reveals moderation failures that standard classification summaries can miss.
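The paired flip rate and review rate quoted in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual routing rule, so the two-threshold mapping below, the threshold values, and the names `route`, `flip_rate`, and `review_rate` are all hypothetical.

```python
from typing import Sequence


def route(score: float, flag_threshold: float = 0.8, review_threshold: float = 0.5) -> str:
    """Map a hate score in [0, 1] to a workflow action (assumed two-threshold rule)."""
    if score >= flag_threshold:
        return "FLAG"
    if score >= review_threshold:
        return "REVIEW"
    return "ALLOW"


def flip_rate(clean_scores: Sequence[float], mixed_scores: Sequence[float]) -> float:
    """Fraction of paired items whose action differs between the two surface forms."""
    if len(clean_scores) != len(mixed_scores) or not clean_scores:
        raise ValueError("need two non-empty, equally long score lists")
    flips = sum(route(c) != route(m) for c, m in zip(clean_scores, mixed_scores))
    return flips / len(clean_scores)


def review_rate(scores: Sequence[float]) -> float:
    """Fraction of items routed to human review."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(route(s) == "REVIEW" for s in scores) / len(scores)


# Made-up scores for four paired items (clean English vs. code-mixed rendering).
clean = [0.1, 0.9, 0.3, 0.6]
mixed = [0.2, 0.7, 0.55, 0.6]
print(flip_rate(clean, mixed), review_rate(clean), review_rate(mixed))
```

Because the thresholds are fixed on the clean inputs, any shift in scores under code-mixing shows up directly as changed actions, which is the workflow-level effect the abstract describes.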

2606.05682 2026-06-08 cs.AI cs.LG 版本更新

Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation

超越输出匹配:在NVFP4 LLM蒸馏中保留内部几何结构

Fangbo Tu, Junhua Zhao, Chi Liu, Xin Chen, Haifeng Wu, Jian Wan, Srinivasan Manoharan

发表机构 * Fangbo Tu(图方波) Junhua Zhao(赵俊华) Chi Liu(刘驰) Xin Chen(陈新) Haifeng Wu(吴海峰) Jian Wan(万健) Srinivasan Manoharan(曼纳哈兰)

AI总结 针对NVFP4低精度量化蒸馏中仅输出匹配导致内部表示退化的问题,提出CKA-QAD方法,通过CKA引导的层间Gram矩阵对齐保留内部几何结构,提升推理和编码任务准确率。

Comments 13 pages, 1 figure

详情
AI中文摘要

随着大型语言模型越来越多地部署在延迟和成本受限的生产环境中,对低精度推理(包括基于NVFP4的方法)的需求不断增长。量化感知蒸馏(QAD)通过训练量化学生模型以KL散度损失匹配冻结的高精度教师模型的输出分布,帮助恢复低比特量化下的精度损失。在这项工作中,我们首先对QAD进行表示层面的诊断:仅输出匹配可能掩盖内部退化,因为许多中间激活几何结构可以产生相似的教师对齐logits。使用CKA,我们表明仅使用KL的QAD相对于BF16教师模型会降低层间表示相似性,在RL后训练模型中尤其严重。这种退化与推理和编码任务的下游瓶颈相关,表明低比特恢复需要保留内部几何结构,而不仅仅是匹配输出。受此发现启发,我们提出了CKA-QAD,一种用于NVFP4 QAD和低比特LLM精度恢复的CKA引导表示对齐方法。该方法添加了一个轻量级正则化器,通过在蒸馏过程中通过CKA对齐层间Gram矩阵来保留内部表示几何结构。在Nemotron 3 Nano和Qwen3-4B-Thinking-2507上,CKA-QAD显著改善了表示对齐,并以适度的训练开销提高了下游推理和编码精度。我们的发现将CKA引导的表示对齐定位为量化LLM恢复中输出匹配的实用补充。

英文摘要

Demand for low-precision inference, including NVFP4-based approaches, has grown as large language models are increasingly deployed in latency- and cost-constrained production environments. Quantization-aware distillation (QAD) helps recover accuracy lost under low-bit quantization by training a quantized student to match the output distribution of a frozen higher-precision teacher via a KL-divergence loss. In this work, we first provide a representation-level diagnosis of QAD: output matching alone can mask internal degradation, because many intermediate activation geometries can yield similar teacher-aligned logits. Using CKA, we show that KL-only QAD can reduce layerwise representational similarity relative to the BF16 teacher, with especially severe drift in RL-post-trained models. This drift correlates with downstream bottlenecks on reasoning and coding tasks, suggesting that low-bit recovery requires preserving internal geometry rather than matching outputs alone. Motivated by this finding, we propose CKA-QAD, a CKA-guided representational alignment method for NVFP4 QAD and low-bit LLM accuracy recovery. The method adds a lightweight regularizer that preserves internal representational geometry during distillation by aligning layerwise Gram matrices through CKA. Across Nemotron 3 Nano and Qwen3-4B-Thinking-2507, CKA-QAD substantially improves representational alignment and improves downstream reasoning and coding accuracy with modest training overhead. Our findings position CKA-guided representational alignment as a practical complement to output matching for quantized LLM recovery.
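For readers unfamiliar with CKA, the following is a minimal pure-Python sketch of the standard linear CKA score computed from centered Gram matrices. It is not the paper's regularizer: the abstract does not specify the kernel, the layers used, or the loss weighting, so the function names and the toy activations here are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def center_columns(X: Matrix) -> Matrix:
    """Subtract each feature's mean over the n examples (rows)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def gram(X: Matrix) -> Matrix:
    """n x n Gram matrix of inner products between examples."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def frob_inner(A: Matrix, B: Matrix) -> float:
    """Frobenius inner product <A, B>."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def linear_cka(X: Matrix, Y: Matrix) -> float:
    """Linear CKA between two activation matrices that share the same n examples."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of rows")
    K = gram(center_columns(X))
    L = gram(center_columns(Y))
    denom = math.sqrt(frob_inner(K, K) * frob_inner(L, L))
    if denom == 0.0:
        raise ValueError("activations are constant across examples")
    return frob_inner(K, L) / denom


# Toy "teacher" activations and a rotated, rescaled copy standing in for a student.
teacher = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
student = [[-3.0 * y, 3.0 * x] for x, y in teacher]
print(linear_cka(teacher, student))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so it compares representational geometry rather than individual coordinates; a regularizer of the kind the abstract describes could, for instance, penalize one minus this score per layer.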

2606.05919 2026-06-08 stat.ML cs.LG econ.EM stat.CO 版本更新

Finding Most Influential Sets

寻找最具影响力的集合

Lucas D. Konrad, Nikolas Kuschnig

发表机构 * Vienna University of Economics and Business(维也纳经济与商业大学) Monash University(莫纳什大学)

AI总结 针对具有线性分式留出效应的估计量,提出一种基于Dinkelbach方法的高效算法,将最具影响力集合的选择转化为一个单参数序列的top-k问题,实现全局最优解。

Comments Published as a conference paper at ICML 2026, fixed ref

详情
AI中文摘要

识别最具影响力的集合(MIS)——即移除后能最大程度改变目标估计量的大小为$k$的子集——通常是不可行的,因为需要搜索$\binom{n}{k}$个子集。对于具有线性分式留出效应的估计量,我们证明MIS选择可简化为一个单参数序列的top-$k$问题。Dinkelbach方法产生了一种每轮迭代成本为$\mathcal{O}(n)$且有限终止的算法。对于固定残差化输入,该算法返回单变量比率目标的全局最优集,包括预言机残差化偏线性模型。当存在估计的干扰函数时,均匀分母和生成得分稳定性意味着对一阶预言机正交得分目标的近似;在分离条件下,可精确恢复集合。模拟和应用表明,该方法恢复了以前计算上无法访问的精确MIS。

英文摘要

Identifying most influential sets (MIS) - size-$k$ subsets whose removal maximally changes a target estimand - is typically infeasible because it requires searching over $\binom{n}{k}$ subsets. For estimands with linear-fractional leave-set-out effects, we show that MIS selection reduces to a one-parameter sequence of top-$k$ problems. Dinkelbach's method yields an algorithm with $\mathcal{O}(n)$ cost per iteration and finite termination. For fixed residualized inputs, the algorithm returns a globally optimal set for the univariate ratio objective, including the oracle-residualized partial linear model. With estimated nuisance functions, uniform denominator and generated-score stability imply approximation to the first-order oracle orthogonal-score objective; exact set recovery follows under a separation condition. Simulations and applications show that the method recovers exact MIS that were previously computationally inaccessible.
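The reduction to a one-parameter sequence of top-$k$ problems can be sketched for a generic ratio objective. This is an illustration under stated assumptions, not the authors' implementation: it maximizes $(a_0 + \sum_{i \in S} a_i)/(b_0 + \sum_{i \in S} b_i)$ over sets of size $k$, assumes every $b_i > 0$ and $b_0 \ge 0$ so all denominators are positive, and uses `heapq.nlargest` (cost $O(n \log k)$) where a linear-time selection routine would give the $O(n)$ per-iteration cost stated in the abstract. All names are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def ratio(S: Sequence[int], a: Sequence[float], b: Sequence[float],
          a0: float, b0: float) -> float:
    """Linear-fractional objective evaluated on the index set S."""
    return (a0 + sum(a[i] for i in S)) / (b0 + sum(b[i] for i in S))


def dinkelbach_top_k(a: Sequence[float], b: Sequence[float], k: int,
                     a0: float = 0.0, b0: float = 0.0,
                     tol: float = 1e-12, max_iter: int = 100) -> Tuple[List[int], float]:
    """Maximize ratio(S) over |S| = k by Dinkelbach iterations on lam."""
    n = len(a)
    if len(b) != n or not 0 < k <= n:
        raise ValueError("need len(a) == len(b) and 0 < k <= n")
    if b0 < 0 or any(bi <= 0 for bi in b):
        raise ValueError("this sketch assumes b0 >= 0 and every b[i] > 0")
    S = list(range(k))              # arbitrary starting set
    lam = ratio(S, a, b, a0, b0)
    for _ in range(max_iter):
        # Parametric problem: maximize numerator - lam * denominator, a top-k selection.
        scores = [a[i] - lam * b[i] for i in range(n)]
        cand = heapq.nlargest(k, range(n), key=scores.__getitem__)
        gap = a0 - lam * b0 + sum(scores[i] for i in cand)
        if gap <= tol:              # no set improves on lam, so S is optimal
            return sorted(S), lam
        S = cand
        lam = ratio(S, a, b, a0, b0)
    raise RuntimeError("Dinkelbach iteration did not converge")


best_set, best_value = dinkelbach_top_k(
    a=[3.0, 1.0, 4.0, 1.5, 5.0, 0.5], b=[2.0, 1.0, 5.0, 0.5, 4.0, 1.0],
    k=2, a0=1.0, b0=1.0)
print(best_set, best_value)
```

Each iteration raises `lam` to the ratio of the newly selected set, and since there are finitely many sets the loop terminates; the stopping test certifies that no size-$k$ set has a ratio above `lam` (up to `tol`).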

2606.05967 2026-06-08 stat.ML cs.LG 版本更新

Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples

具有线性函数逼近、通用学习步长和独立同分布样本的TD(0)的快速鲁棒收敛速率

Ziad Kobeissi, Éloïse Berthier

发表机构 * L2S, INRIA U2IS, ENSTA Université Paris-Saclay, CentraleSupelec, Institut Polytechnique de Paris(L2S、INRIA、U2IS、ENSTA、巴黎萨克雷大学、中央理工-高等电力学院(CentraleSupélec)、巴黎理工学院)

AI总结 针对线性函数逼近的TD(0)算法,在独立同分布样本和常数学习步长下,提出一种均方误差的快速(1/k阶)、鲁棒(不依赖最小特征值)且尖锐(乘性常数小于11)的收敛速率,并引入PCTD(0)变体以在强混合假设下获得更好收敛性。

Comments This is an extended version of a paper accepted at AISTATS 2026

详情
Journal ref
AISTATS 2026, May 2026, Tanger, Morocco
AI中文摘要

本文研究了具有线性函数逼近(LFA)的TD(0)时序差分方法的有限时间行为。我们考虑策略内独立同分布(i.i.d.)样本、常数学习步长和Polyak-Juditsky平均方法。我们为近似函数的均方误差(MSE)建立了一个新的收敛速率,该速率(i)快速,即具有迭代次数k的最优依赖性(即1/k阶),(ii)对病态条件鲁棒:仅依赖于初始误差和模型无关常数,以及(iii)尖锐,乘性常数小于11。特别地,与TD(0)文献中所有现有的O(1/k)速率不同,它不依赖于线性参数化的非中心协方差矩阵的最小特征值。我们还引入了PCTD(0),这是TD(0)的一个变体,在马尔可夫链的强混合附加假设下具有更好的收敛性质。

英文摘要

In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method. We establish a new convergence rate, for the Mean-Square Error (MSE) on the approximated function, that is (i) fast in the sense that it admits an optimal dependency in the number of iterations k (i.e., of order 1/k), (ii) robust to ill-conditioning: it only depends on an initial error and model-independent constants, and (iii) sharp up to a multiplicative constant lower than 11. In particular, it does not depend on the smallest eigenvalue of the uncentered covariance matrix of the linear parametrization, unlike all pre-existing O(1/k) rates in the TD(0) literature. We also introduce PCTD(0), a variant of TD(0), which benefits from better convergence properties under an additional assumption of strong mixing on the Markov chain.
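The setting analyzed here (TD(0) with linear features, i.i.d. on-policy samples, a constant step, and iterate averaging) can be sketched on a toy problem. The two-state chain, the one-hot features, the step size 0.1, and the function name `td0_averaged` are all assumptions made for illustration; the paper's "universal" step-size choice and its PCTD(0) variant are not reproduced.

```python
import random
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def td0_averaged(features: Sequence[Sequence[float]], rewards: Sequence[float],
                 next_state: Sequence[int], gamma: float, step: float,
                 num_samples: int, seed: int = 0) -> List[float]:
    """TD(0) with linear function approximation and a running average of the iterates.

    States are drawn i.i.d. and uniformly, which is the stationary distribution of
    the toy chain used below; transitions are deterministic to keep the sketch short.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(features[0])
    theta_bar = [0.0] * len(features[0])
    for t in range(1, num_samples + 1):
        s = rng.randrange(len(features))          # i.i.d. state sample
        phi, phi_next = features[s], features[next_state[s]]
        # Temporal-difference error for the sampled transition.
        delta = rewards[s] + gamma * dot(theta, phi_next) - dot(theta, phi)
        theta = [th + step * delta * p for th, p in zip(theta, phi)]
        # Running mean of theta_1, ..., theta_t (Polyak-style averaging).
        theta_bar = [tb + (th - tb) / t for tb, th in zip(theta_bar, theta)]
    return theta_bar


# Toy chain: state 0 -> state 1 -> state 0, reward 1 when leaving state 0, gamma = 0.5.
# Solving V0 = 1 + 0.5 * V1 and V1 = 0.5 * V0 gives V0 = 4/3 and V1 = 2/3.
estimate = td0_averaged(features=[[1.0, 0.0], [0.0, 1.0]], rewards=[1.0, 0.0],
                        next_state=[1, 0], gamma=0.5, step=0.1, num_samples=20000)
print(estimate)
```

With one-hot features the linear parametrization is exact, so the averaged iterate should approach the true values; the abstract's contribution concerns how fast the mean-square error of such an average decays in general, independently of the feature covariance's smallest eigenvalue.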

2606.06224 2026-06-08 cs.CV cs.LG 版本更新

Symb-xMIL: Symbolic Explanations for Multiple Instance Learning in Digital Pathology

Symb-xMIL: 数字病理学中多实例学习的符号解释

Yanqing Luo, Julius Hense, Niklas Prenißl, Andreas Mock, Klaus-Robert Müller, Thomas Schnake, Mina Jamshidi Idaji

发表机构 * Berlin Institute for the Foundations of Learning and Data(柏林学习与数据基础研究院) Machine Learning Group, Technische Universität Berlin(柏林工业大学机器学习组) Institute of Pathology, Charité Universitätsmedizin(夏里特医学院病理研究所) Berlin Institute of Health at Charité – Universitätsmedizin Berlin, BIH Biomedical Innovation Academy, BIH Charité Digital Clinician Scientist Program(夏里特柏林健康研究院(BIH)、BIH生物医学创新学院、BIH夏里特数字临床科学家项目) Institute of Pathology, Ludwig Maximilian University of Munich(慕尼黑路德维希-马克西米利安大学病理研究所) Division of Translational Medical Oncology, DKFZ(德国癌症研究中心(DKFZ)转化医学肿瘤学部) German Cancer Consortium (DKTK), partner site Munich, a partnership between DKFZ and Ludwig-Maximilians-Universität München (LMU)(德国癌症联盟(DKTK)慕尼黑合作站点,DKFZ与慕尼黑路德维希-马克西米利安大学(LMU)的合作机构) Department of Artificial Intelligence, Korea University(高丽大学人工智能系) Max-Planck Institute for Informatics, Saarbrücken, Germany(马克斯·普朗克信息学研究所,萨尔布吕肯,德国) Department of Chemistry, Chemical Physics Theory Group, University of Toronto(多伦多大学化学系,化学物理理论组) Vector Institute for Artificial Intelligence, Toronto, Canada(多伦多人工智能矢量研究所) Acceleration Consortium, University of Toronto(多伦多大学加速联盟)

AI总结 提出Symb-xMIL框架,通过量化模型行为与可读决策规则(逻辑关系)的对齐程度,为多实例学习提供结构化的符号解释,并在合成和真实病理数据上验证其有效性。

Comments 23 pages, 18 figures

详情
AI中文摘要

多实例学习(MIL)模型的解释被广泛用于数字组织病理学的验证和发现。现有方法主要依赖于突出显示影响区域的热力图,但不解释如何将不同组织区域的证据组合以产生预测。这限制了可解释性,尤其是当决策依赖于组织特征之间的交互时。我们引入了符号可解释MIL(Symb-xMIL),一种事后解释框架,量化MIL模型的行为与人类可读决策规则(表示为输入特征之间的逻辑关系,如AND、OR、NOT)的对齐程度。这些对齐分数揭示了模型预测背后的语义模式。我们在合成和真实世界的组织病理学数据集上评估了Symb-xMIL。在合成MIL数据上,Symb-xMIL可靠地恢复了真实逻辑规则。在临床肿瘤检测任务中,最佳对齐的规则揭示了异质决策模式并暴露了隐藏的模型错误。在TCGA-HNSCC(头颈癌队列)的HPV预测任务中,我们的框架在HPV状态之外细化了患者生存分层,具有潜在的临床相关性。总体而言,Symb-xMIL将MIL的可解释性从视觉归因扩展到结构化的、基于规则的推理,实现了对模型预测更透明和基于语义的解释。

英文摘要

Explanations of multiple instance learning (MIL) models are widely used for validation and discovery in digital histopathology. Existing methods primarily rely on heatmaps that highlight influential regions but do not explain how evidence from different tissue regions is combined to produce a prediction. This limits interpretability, especially when decisions depend on interactions between tissue features. We introduce Symbolic explainable MIL (Symb-xMIL), a post-hoc explanation framework that quantifies how a MIL model's behavior aligns with human-readable decision rules, expressed as logical relationships (e.g., AND, OR, NOT) between input features. These alignment scores reveal semantic patterns underlying the model's predictions. We evaluate Symb-xMIL on synthetic and real-world histopathology datasets. On synthetic MIL data, Symb-xMIL reliably recovers ground-truth logical rules. In a clinical tumor detection task, the best-aligned rules uncover heterogeneous decision patterns and expose hidden model errors. On an HPV-prediction task on TCGA-HNSCC, a cohort of head and neck cancer, our framework refines patient survival stratification beyond HPV status with potential clinical relevance. Overall, Symb-xMIL extends MIL explainability beyond visual attribution toward structured, rule-based reasoning, enabling more transparent and semantically grounded interpretation of model predictions.

2507.12878 2026-06-08 eess.SP cs.LG stat.ML 版本更新

Bayesian Modeling and Estimation of Linear Time-Varying Systems using Neural Networks and Gaussian Processes

基于神经网络和高斯过程的线性时变系统贝叶斯建模与估计

Yaniv Shulman

发表机构 * Shulman.info(Shulman信息)

AI总结 本文提出一种统一的贝叶斯框架,通过将系统脉冲响应建模为随机过程,利用变分推断和高斯过程,实现了对线性时变系统的鲁棒估计。

详情
AI中文摘要

本文提出了一种统一的贝叶斯框架,通过将系统脉冲响应建模为随机过程,利用变分推断和高斯过程,实现了对线性时变系统的鲁棒估计。

英文摘要

The identification of Linear Time-Varying (LTV) systems from input-output data is a fundamental yet challenging ill-posed inverse problem. This work introduces a unified Bayesian framework that models the system's impulse response, $h(t, \tau)$, as a stochastic process. We decompose the response into a posterior mean and a random fluctuation term, a formulation that provides a principled approach for quantifying uncertainty, unifies intrinsic channel variability and epistemic uncertainty through a common posterior representation, and naturally defines a new, useful system class we term Linear Time-Invariant in Expectation (LTIE). To perform inference, we leverage modern machine learning techniques, including Bayesian neural networks and Gaussian Processes, using scalable variational inference. We demonstrate through a series of experiments that our framework can infer the properties of an LTI system from a single noisy input-output pair, including under deliberate additive-noise misspecification, achieve a lower overall error floor than the classical CCF stacking baseline in a simulated ambient noise tomography setting, and track a continuously varying LTV impulse response by using a structured Gaussian Process prior. This work provides a flexible and robust methodology for uncertainty-aware system identification in dynamic environments.

2603.14573 2026-06-08 cond-mat.dis-nn cs.LG math.PR 版本更新

Rigorous Asymptotics for First-Order Algorithms Through the Dynamical Cavity Method

通过动力学空腔方法严格推导一阶算法的渐进行为

Yatin Dandi, David Gamarnik, Francisco Pernice, Lenka Zdeborová

发表机构 * Statistical Physics of Computation Laboratory, École polytechnique fédérale de Lausanne (EPFL)(计算统计物理实验室,洛桑联邦理工学院(EPFL)) Sloan School of Management, Operations Research Center and Institute of Data, Systems and Society (IDSS), MIT(斯隆管理学院,运筹学中心和数据、系统与社会研究所(IDSS),麻省理工学院) CSAIL and LIDS, MIT(计算机科学与人工智能实验室(CSAIL)和信息与决策系统实验室(LIDS),麻省理工学院)

AI总结 本文通过严格形式化的动力学空腔方法,推导出一阶算法(如梯度下降和近似消息传递)的动力学主方程,为非严谨的传统方法提供数学基础。

详情
Journal ref
COLT 2026
AI中文摘要

通过动力学空腔方法严格推导一阶算法的渐进行为,本文建立了动态平均场理论(DMFT)方程的数学基础,为广义一阶方法(包括梯度下降和近似消息传递等算法)的动力学行为提供了严格形式化的描述。

英文摘要

Dynamical Mean Field Theory (DMFT) provides an asymptotic description of the dynamics of macroscopic observables in certain disordered systems. Originally pioneered in the context of spin glasses by Sompolinsky and Zippelius (1982), it has since been used to derive asymptotic dynamical equations for a wide range of models in physics, high-dimensional statistics and machine learning. One of the main tools used by physicists to obtain these equations is the dynamical cavity method, which has remained largely non-rigorous. In contrast, existing mathematical formalizations have relied on alternative approaches, including Gaussian conditioning, large deviations over paths, or Fourier analysis. In this work, we formalize the dynamical cavity method and use it to give a new proof of the DMFT equations for General First Order Methods, a broad class of dynamics encompassing algorithms such as Gradient Descent and Approximate Message Passing.

2505.19888 2026-06-08 cs.LG 版本更新

Generalized and Personalized Federated Learning with Black-Box Foundation Models via Orthogonal Transformations

基于正交变换、利用黑盒基础模型的泛化与个性化联邦学习

Eun Gyung Kong, Je Won Yeom, Yonghoon Jeon, Taesup Kim

发表机构 * Seoul National University(首尔国立大学) Mobilint, Inc.(Mobilint公司) Kakao Healthcare Corp.(Kakao医疗公司)

AI总结 本文提出FedOT框架,通过正交变换实现联邦学习中的鲁棒泛化与有效个性化,在异构环境中提升性能,优于基线方法。

Comments 31 pages, 5 figures

详情
Journal ref
Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 24567-24576, 2026
AI中文摘要

联邦学习(FL)在保护数据隐私的同时促进去中心化模型训练。然而,在异构(非iid)环境中同时实现鲁棒泛化和有效个性化仍是一个严峻挑战。此外,基础模型(FMs)的广泛使用要求双重隐私保护:(a)保护敏感客户端数据和(b)保护服务器的知识产权。这需要严格黑盒访问FMs。为解决这些挑战,我们引入FedOT,一种针对黑盒FMs优化的联邦学习框架。FedOT采用共享的全局任务依赖分类器,同时通过客户端特定的正交变换实现本地适应,该变换应用于FMs嵌入之外。这种架构本质上保证FMs内部参数保持不可访问和未修改。通过强制正交性,FedOT有效缓解了跨不同客户端的梯度冲突,理论上有界,保持FMs表示的语义完整性,并在显著的数据异质性下实现稳健性能。全局和本地参数的协同优化最佳平衡了泛化和个性化,显著优于基线FL方法。广泛的实证分析,包括严格多种子验证和可扩展性评估,证实了FedOT的鲁棒性、效率和优越性能。

英文摘要

Federated Learning (FL) facilitates decentralized model training while preserving data privacy. However, achieving both robust generalization and effective personalization simultaneously in heterogeneous (non-IID) environments remains a formidable challenge. Furthermore, the widespread adoption of proprietary Foundation Models (FMs) introduces a critical requirement for dual privacy: (a) protecting sensitive client data and (b) securing the server's valuable intellectual property. This mandates strictly black-box access to the FM. To address these multifaceted challenges, we introduce FedOT, a novel FL framework optimized for black-box FMs. FedOT employs a shared global task-dependent classifier while facilitating local adaptation through client-specific orthogonal transformations applied externally to the FM embeddings. This architecture inherently guarantees that the FM's internal parameters remain inaccessible and unmodified. By enforcing orthogonality, FedOT effectively mitigates gradient conflicts across diverse clients, which is theoretically bounded, preserves the semantic integrity of the FM representations, and achieves robust performance under significant data heterogeneity. The synergy of global and local parameters optimally balances generalization and personalization, markedly outperforming baseline FL methods across diverse benchmarks. Extensive empirical analysis, including rigorous multi-seed validation and scalability assessments, substantiates the robustness, efficiency, and superior performance of FedOT.

2502.21123 2026-06-08 cs.LG cs.AI 版本更新

Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models

因果关系是理解和平衡可信机器学习与基础模型中多个目标的关键

Ruta Binkyte, Ivaxi Sheth, Zhijing Jin, Mohammad Havaei, Bernhard Schölkopf, Mario Fritz

发表机构 * CISPA Helmholtz Center for Information Security(CISPA亥姆霍兹信息安全中心) Max Planck Institute for Intelligent Systems, Tübingen(马克斯·普朗克智能系统研究所(图宾根)) Google Research(谷歌研究) ETH Zürich(苏黎世联邦理工学院) University of Toronto(多伦多大学)

AI总结 本文主张将因果方法集成到机器学习中,以平衡公平性、隐私、鲁棒性、准确性和可解释性等可信原则之间的权衡,并探讨其在基础模型中的实际应用。

详情
AI中文摘要

确保机器学习系统的可信度至关重要,因为它们日益嵌入高风险领域。本文主张将因果方法集成到机器学习中,以应对可信机器学习关键原则(包括公平性、隐私、鲁棒性、准确性和可解释性)之间的权衡。虽然这些目标理想情况下应同时满足,但它们通常被孤立地处理,导致冲突和次优解决方案。借鉴因果在ML中成功协调目标(如公平性与准确性,或隐私与鲁棒性)的现有应用,本文认为因果方法对于平衡可信ML和基础模型中的多个竞争目标至关重要。除了强调这些权衡,我们考察了如何将因果实际集成到ML和基础模型中,提供增强其可靠性和可解释性的解决方案。最后,我们讨论了采用因果框架的挑战、局限性和机遇,为更负责任和合乎伦理的AI系统铺平道路。

英文摘要

Ensuring trustworthiness in machine learning (ML) systems is crucial as they become increasingly embedded in high-stakes domains. This paper advocates for integrating causal methods into machine learning to navigate the trade-offs among key principles of trustworthy ML, including fairness, privacy, robustness, accuracy, and explainability. While these objectives should ideally be satisfied simultaneously, they are often addressed in isolation, leading to conflicts and suboptimal solutions. Drawing on existing applications of causality in ML that successfully align goals such as fairness and accuracy or privacy and robustness, this paper argues that a causal approach is essential for balancing multiple competing objectives in both trustworthy ML and foundation models. Beyond highlighting these trade-offs, we examine how causality can be practically integrated into ML and foundation models, offering solutions to enhance their reliability and interpretability. Finally, we discuss the challenges, limitations, and opportunities in adopting causal frameworks, paving the way for more accountable and ethically sound AI systems.

2602.10680 2026-06-08 stat.ML cond-mat.dis-nn cs.LG 版本更新

A solvable high-dimensional model where nonlinear autoencoders learn structure invisible to PCA while test loss misaligns with generalization

一个可解的高维模型,其中非线性自编码器学习到结构对PCA不可见,而测试损失与泛化不一致

Vicente Conde Mendes, Lorenzo Bardone, Cédric Koller, Jorge Medina Moreira, Vittorio Erba, Emanuele Troiani, Lenka Zdeborová

发表机构 * Statistical Physics of Computation Laboratory, École polytechnique fédérale de Lausanne (EPFL)(计算统计物理实验室,洛桑联邦理工学院(EPFL))

AI总结 本文提出一个高维模型,展示非线性自编码器能学习线性方法如PCA无法捕捉的结构,尽管其测试损失与泛化性能不一致。

详情
Journal ref
ICML 2026
AI中文摘要

许多现实世界的数据集包含隐藏的结构,这些结构无法通过输入特征间的简单线性相关性检测到。例如,潜在因子可能以协调的方式影响数据,尽管其影响对基于协方差的方法如PCA不可见。在实践中,非线性神经网络常在无监督和自监督学习中成功提取此类隐藏结构。然而,构建一个最小的高维模型,其中这种优势可以严格分析仍是一个开放的理论挑战。我们引入了一个可解的高维 spiked 模型,包含两个潜在因子:一个对协方差可见,另一个统计上相关但不相关,仅出现在高阶矩中。PCA 和线性自编码器无法恢复后者,而最小的非线性自编码器可以证明性地提取两者。我们分析了总体风险和经验风险最小化。我们的模型还提供了一个可解的例子,其中自监督测试损失与表征质量不一致:非线性自编码器恢复了线性方法无法捕捉的结构,尽管其重建损失更高。

英文摘要

Many real-world datasets contain hidden structure that cannot be detected by simple linear correlations between input features. For example, latent factors may influence the data in a coordinated way, even though their effect is invisible to covariance-based methods such as PCA. In practice, nonlinear neural networks often succeed in extracting such hidden structure in unsupervised and self-supervised learning. However, constructing a minimal high-dimensional model where this advantage can be rigorously analyzed has remained an open theoretical challenge. We introduce a tractable high-dimensional spiked model with two latent factors: one visible to covariance, and one statistically dependent yet uncorrelated, appearing only in higher-order moments. PCA and linear autoencoders fail to recover the latter, while a minimal nonlinear autoencoder provably extracts both. We analyze both the population risk and empirical risk minimization. Our model also provides a tractable example where self-supervised test loss is poorly aligned with representation quality: nonlinear autoencoders recover latent structure that linear methods miss, even though their reconstruction loss is higher.

2509.24914 2026-06-08 stat.ML cond-mat.dis-nn cs.IT cs.LG math.IT 版本更新

Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws

高维中的单头注意力:泛化、权重谱和缩放定律的理论

Fabrizio Boncoraglio, Vittorio Erba, Emanuele Troiani, Yizhou Xu, Florent Krzakala, Lenka Zdeborová

发表机构 * Statistical Physics of Computation Laboratory, École polytechnique fédérale de Lausanne (EPFL)(计算统计物理实验室,洛桑联邦理工学院(EPFL)) Information, Learning and Physics Laboratory, École polytechnique fédérale de Lausanne (EPFL)(信息、学习与物理实验室,洛桑联邦理工学院(EPFL))

AI总结 本文研究了高维序列任务中训练的注意力层权重谱结构,通过随机矩阵理论等工具,揭示了训练误差、插值阈值及键查询矩阵谱的高维特性,并预测了幂律缩放定律的出现。

详情
Journal ref
ICML 2026
AI中文摘要

训练的注意力层表现出显著且可重复的权重谱结构,包括低秩坍塌、主体谱(bulk)变形和孤立的谱离群值,但其起源及对泛化的影响尚不明确。本文通过在合成高维序列任务上训练单头绑定注意力层,利用随机矩阵理论、自旋玻璃理论和近似消息传递工具,获得训练和测试误差、插值和恢复阈值及键查询矩阵谱的高维表征。理论预测了训练查询-键映射的完整奇异值分布,包括低秩结构和孤立的谱离群值,与更现实的Transformer观察结果定性一致。最后,对于具有幂律谱的目标,显示学习通过逐次谱恢复进行,导致幂律缩放定律的出现。

英文摘要

Trained attention layers exhibit striking and reproducible spectral structure of the weights, including low-rank collapse, bulk deformation, and isolated spectral outliers, yet the origin of these phenomena and their implications for generalization remain poorly understood. We study empirical risk minimization in a single-head tied-attention layer trained on synthetic high-dimensional sequence tasks generated from the attention-indexed model. Using tools from random matrix theory, spin-glass theory, and approximate message passing, we obtain an exact high-dimensional characterization of training and test error, interpolation and recovery thresholds, and the spectrum of the key and query matrices. Our theory predicts the full singular-value distribution of the trained query-key map, including low-rank structure and isolated spectral outliers, in qualitative agreement with observations in more realistic transformers. Finally, for targets with power-law spectra, we show that learning proceeds through sequential spectral recovery, leading to the emergence of power-law scaling laws.

2411.05729 2026-06-08 cs.LG stat.ML 版本更新

Graph-Dictionary Signal Model for Sparse Representations of Multivariate Data

图词典信号模型用于多变量数据的稀疏表示

William Cappelletti, Pascal Frossard

发表机构 * LTS4, EPFL, Lausanne, Switzerland(瑞士洛桑联邦理工学院LTS4实验室)

AI总结 本文提出图词典信号模型,通过图结构描述多变量数据中的关系,利用稀疏组合的图原子进行信号重构,优于现有基线方法。

详情
AI中文摘要

表示和利用多变量信号需要捕捉变量间的关系,我们通过图来表示这些关系。图词典允许将复杂的关联信息表示为稀疏简单结构之和,但目前尚无先验模型能从数据中推断此类底层结构元素。我们定义了新的图词典信号模型,其中有限的图集合通过其拉普拉斯算子加权和的稀疏组合来描述数据分布中的关系。我们提出了一种从观测节点信号中推断图词典表示的框架,允许包含关于信号属性、底层图及其系数的先验知识。我们引入了原始-对偶分裂算法的双线性推广来解决学习问题。我们展示了该方法在多个合成设置中从信号中重建图的能力,其中我们的模型优于流行的基线方法。然后,我们利用图词典表示在脑活动数据上的示例运动解码任务中,比依赖更多特征的标准方法更好地分类想象运动。我们的图词典模型弥合了多变量数据稀疏表示与样本变化关系的结构分解之间的差距。

英文摘要

Representing and exploiting multivariate signals requires capturing relations between variables, which we can represent by graphs. Graph dictionaries allow describing complex relational information as a sparse sum of simpler structures, but no prior model exists to infer such underlying structure elements from data. We define a novel Graph-Dictionary signal model, where a finite set of graphs characterizes relationships in data distribution as filters on the weighted sum of their Laplacians. We propose a framework to infer the graph dictionary representation from observed node signals, which allows including a priori knowledge about signal properties, and about underlying graphs and their coefficients. We introduce a bilinear generalization of the primal-dual splitting algorithm to solve the learning problem. We show the capability of our method to reconstruct graphs from signals in multiple synthetic settings, where our model outperforms popular baselines. Then, we exploit graph-dictionary representations in an illustrative motor imagery decoding task on brain activity data, where we classify imagined motion better than standard methods relying on many more features. Our graph-dictionary model bridges a gap between sparse representations of multivariate data and a structured decomposition of sample-varying relationships into a sparse combination of elementary graph atoms.

2403.09110 2026-06-08 cs.LG cs.SY eess.SY math.DS math.OC 版本更新

SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning

SINDy-RL:可解释且高效的基于模型的强化学习

Nicholas Zolman, Christian Lagemann, Urban Fasel, J. Nathan Kutz, Steven L. Brunton

发表机构 * Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA(华盛顿大学机械工程系) Data Science and Artificial Intelligence Department, The Aerospace Corporation, El Segundo, CA 90245(航空航天公司数据科学与人工智能部) Department of Aeronautics, Imperial College, London SW7 2AZ, United Kingdom(帝国理工学院航空系) Department of Applied Mathematics, University of Washington, Seattle, WA 98195(华盛顿大学应用数学系) Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195(华盛顿大学电气与计算机工程系)

AI总结 本文提出SINDy-RL框架,结合SINDy和DRL,实现低数据下高效、可解释的动力学模型和控制策略,通过基准环境和流体控制实验验证其有效性。

Comments For code, see https://github.com/nzolman/sindy-rl. v2 Update: Included Pinball and 3D Airfoil examples. Christian Lagemann added as an author for contributions with the 3D Airfoil code. To appear in Nature Communications

详情
Journal ref
Nat. Commun. 16, 10714 (2025)
AI中文摘要

深度强化学习(DRL)在复杂环境中揭示复杂控制策略方面展现出巨大潜力,如稳定托卡马克聚变反应堆或最小化流体中物体的阻力。然而,DRL需要大量训练示例且成本高昂。此外,依赖深度神经网络导致不可解释的黑箱策略,可能在嵌入式系统中计算成本过高。最近的稀疏字典学习进展,如非线性动态的稀疏识别(SINDy),在低数据条件下展示了创建高效且可解释的数据驱动模型的潜力。本文介绍SINDy-RL,一种结合SINDy和DRL的统一框架,以创建高效、可解释且可信的动力学模型、奖励函数和控制策略。我们在基准控制环境和流体控制问题上展示了方法的有效性,包括在Re=1000时的3D NACA 0012翼型气流抑制。SINDy-RL在显著较少的环境交互中实现了与现代DRL算法相当的性能,并产生比DRL策略小多个数量级的可解释控制策略。

英文摘要

Deep reinforcement learning (DRL) has shown significant promise for uncovering sophisticated control policies that interact in complex environments, such as stabilizing a tokamak fusion reactor or minimizing the drag force on an object in a fluid flow. However, DRL requires an abundance of training examples and may become prohibitively expensive for many applications. In addition, the reliance on deep neural networks often results in an uninterpretable, black-box policy that may be too computationally expensive to use with certain embedded systems. Recent advances in sparse dictionary learning, such as the sparse identification of nonlinear dynamics (SINDy), have shown promise for creating efficient and interpretable data-driven models in the low-data regime. In this work we introduce SINDy-RL, a unifying framework for combining SINDy and DRL to create efficient, interpretable, and trustworthy representations of the dynamics model, reward function, and control policy. We demonstrate the effectiveness of our approaches on benchmark control environments and flow control problems, including gust mitigation on a 3D NACA 0012 airfoil at $Re=1000$. SINDy-RL achieves comparable performance to modern DRL algorithms using significantly fewer interactions in the environment and results in an interpretable control policy orders of magnitude smaller than a DRL policy.

2507.17799 2026-06-08 eess.AS cs.LG cs.SD 版本更新

A Concept-based approach to Voice Disorder Detection

基于概念的嗓音障碍检测方法

Davide Ghia, Gabriele Ciravegna, Alkis Koudounas, Marco Fantini, Erika Crosetti, Giovanni Succo, Tania Cerquitelli

发表机构 * Politecnico di Torino(都灵理工大学) CENTAI Institute(CENTAI研究院) San Feliciano Hospital(San Feliciano医院) SCDU Otorinolaringoiatria, Head Neck Cancer Unit, Ospedale San Giovanni Bosco(SCDU耳鼻喉科,头颈癌症单元,San Giovanni Bosco医院) Dipartimento di Oncologia, Università degli Studi di Torino(都灵大学肿瘤学系)

AI总结 本文提出基于概念的嗓音障碍检测方法,利用可解释AI提升模型透明度,与传统深度学习方法相比,实现更清晰的决策框架。

详情
AI中文摘要

嗓音障碍影响了大量人口,使用自动化非侵入性技术进行诊断将显著推动医疗进步,提高患者生活质量。近期研究表明,人工智能模型,特别是深度神经网络(DNNs),能有效解决此任务。然而,由于其复杂性,此类模型的决策过程常不透明,限制了其在临床中的可信度。本文探讨了基于可解释AI(XAI)的替代方法,旨在通过提供不同形式的解释来提高DNNs的可解释性。具体而言,本文聚焦于概念模型,如概念瓶颈模型(CBM)和概念嵌入模型(CEM),探讨它们如何在性能上与传统深度学习方法相媲美,同时提供更透明和可解释的决策框架。

英文摘要

Voice disorders affect a significant portion of the population, and the ability to diagnose them using automated, non-invasive techniques would represent a substantial advancement in healthcare, improving the quality of life of patients. Recent studies have demonstrated that artificial intelligence models, particularly Deep Neural Networks (DNNs), can effectively address this task. However, due to their complexity, the decision-making process of such models often remains opaque, limiting their trustworthiness in clinical contexts. This paper investigates an alternative approach based on Explainable AI (XAI), a field that aims to improve the interpretability of DNNs by providing different forms of explanations. Specifically, this work focuses on concept-based models such as the Concept Bottleneck Model (CBM) and the Concept Embedding Model (CEM) and how they can achieve performance comparable to traditional deep learning methods, while offering a more transparent and interpretable decision framework.

2506.12454 2026-06-08 stat.ML cond-mat.dis-nn cs.CR cs.LG 版本更新

On the existence of consistent adversarial attacks in high-dimensional linear classification

论高维线性分类中一致对抗攻击的存在性

Matteo Vilucchio, Lenka Zdeborová, Bruno Loureiro

发表机构 * Information Learning and Physics Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)(信息学习与物理实验室,洛桑联邦理工学院(EPFL)) Statistical Physics of Computation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)(计算统计物理实验室,洛桑联邦理工学院(EPFL)) Département d’Informatique, École Normale Supérieure - PSL & CNRS, France(巴黎高等师范学院(ENS-PSL)计算机科学系与法国国家科学研究中心(CNRS))

AI总结 本文研究高维二分类中对抗攻击与模型表达能力有限导致的误分类区别,提出新的误差度量标准,揭示模型对保持真实标签扰动的脆弱性,理论分析显示模型越过度参数化,对标签保持扰动的敏感性越高。

详情
Journal ref
ICML 2026
AI中文摘要

本文研究高维二分类中对抗攻击与模型表达能力有限或数据有限导致的误分类的本质区别,提出新的误差度量标准,精确捕捉这一区别,量化模型对保持真实标签扰动的脆弱性。我们的主要技术贡献是精确且严谨地对这些度量在良好指定模型和潜在空间模型中的渐进行为进行刻画,揭示与标准稳健误差度量不同的脆弱性模式。理论结果表明,随着模型变得越来越过度参数化,其对标签保持扰动的脆弱性增加,为理解模型对对抗攻击的敏感机制提供了理论见解。

英文摘要

What fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or finite data? In this work, we investigate this question in the setting of high-dimensional binary classification, where statistical effects due to limited data availability play a central role. We introduce new error metrics that precisely capture this distinction, quantifying model vulnerability to consistent adversarial attacks -- perturbations that preserve the ground-truth labels. Our main technical contribution is an exact and rigorous asymptotic characterization of these metrics in both well-specified models and latent space models, revealing different vulnerability patterns compared to standard robust error measures. The theoretical results demonstrate that as models become more overparameterized, their vulnerability to label-preserving perturbations grows, offering theoretical insight into the mechanisms underlying model sensitivity to adversarial attacks.

2311.00212 2026-06-08 cs.LG cs.NA math.DG math.NA 版本更新

A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning

一种统一的框架用于在机器学习中强制、发现和促进对称性

Samuel E. Otto, Nicholas Zolman, J. Nathan Kutz, Steven L. Brunton

发表机构 * AI Institute in Dynamic Systems, University of Washington(动态系统人工智能研究所,华盛顿大学) Sibley School of Mechanical and Aerospace Engineering, Cornell University(机械与航空航天工程学院,康奈尔大学)

AI总结 本文提出统一框架,通过强制已知对称性、发现未知对称性和促进对称性三种方式,将对称性纳入机器学习模型中,基于李导数的数学框架统一了现有结果。

Journal ref
J. Mach. Learn. Res. 26(248):1-83 (2025)
AI中文摘要

对称性在自然界中普遍存在,并在物理和机器学习中扮演越来越重要的角色。基本对称性,如庞加莱不变性,使在地球实验室发现的物理定律能够扩展到宇宙的最远区域。对称性对于在机器学习应用中实现这种扩展能力至关重要。例如,图像分类中的平移不变性使具有较少参数的模型,如卷积神经网络,能够用较小的数据集进行训练并达到最先进的性能。本文提供了一个统一的理论和方法框架,用于在三种方式中将对称性纳入机器学习模型:1. 在训练模型时强制已知对称性;2. 发现给定模型或数据集的未知对称性;3. 通过学习一个模型来促进对称性,该模型在用户指定的候选群中学习时,当数据中有足够证据时会打破对称性。我们证明这些任务可以被一个共同的数学框架所涵盖,其核心对象是与向量丛上的纤维线性李群作用相关的李导数。我们通过展示强制和发现对称性是线性代数任务,并且在李导数的双线性结构下是互为对偶的,扩展并统一了现有的结果。我们还提出了一种新的促进对称性的方式,通过引入基于李导数和核范数松弛的一类凸正则化函数,以在训练机器学习模型时惩罚对称性破坏。我们解释了这些想法如何应用于广泛范围的机器学习模型,包括基函数回归、动态系统发现、神经网络和作用于场的神经算子。

英文摘要

Symmetry is present throughout nature and continues to play an increasingly central role in physics and machine learning. Fundamental symmetries, such as Poincaré invariance, allow physical laws discovered in laboratories on Earth to be extrapolated to the farthest reaches of the universe. Symmetry is essential to achieving this extrapolatory power in machine learning applications. For example, translation invariance in image classification allows models with fewer parameters, such as convolutional neural networks, to be trained on smaller data sets and achieve state-of-the-art performance. In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in three ways: 1. enforcing known symmetry when training a model; 2. discovering unknown symmetries of a given model or data set; and 3. promoting symmetry during training by learning a model that breaks symmetries within a user-specified group of candidates when there is sufficient evidence in the data. We show that these tasks can be cast within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. We extend and unify several existing results by showing that enforcing and discovering symmetry are linear-algebraic tasks that are dual with respect to the bilinear structure of the Lie derivative. We also propose a novel way to promote symmetry by introducing a class of convex regularization functions based on the Lie derivative and nuclear norm relaxation to penalize symmetry breaking during training of machine learning models. We explain how these ideas can be applied to a wide range of machine learning models including basis function regression, dynamical systems discovery, neural networks, and neural operators acting on fields.
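
The claim that discovering symmetry is a linear-algebraic task can be illustrated on a toy case. The sketch below is not the paper's general framework; it assumes a quadratic model f(x) = xᵀAx and a linear group action, so that the Lie derivative of f along the vector field x ↦ ξx vanishes for all x exactly when ξᵀA + Aξ = 0 (for symmetric A). The function name `symmetry_generators` and the tolerance are illustrative choices, not taken from the source.

```python
import numpy as np

def symmetry_generators(A, tol=1e-10):
    """Return a basis of matrices xi satisfying xi.T @ A + A @ xi = 0."""
    n = A.shape[0]
    columns = []
    for idx in range(n * n):
        # Apply the linear map xi -> xi.T @ A + A @ xi to each basis matrix.
        E = np.zeros(n * n)
        E[idx] = 1.0
        E = E.reshape(n, n)
        columns.append((E.T @ A + A @ E).ravel())
    L = np.stack(columns, axis=1)  # matrix of the linear map on vectorized xi
    # Right singular vectors with (numerically) zero singular value span the nullspace.
    _, s, vt = np.linalg.svd(L)
    return [v.reshape(n, n) for v in vt[s < tol]]

# Toy model: f(x) = |x|^2 in the plane, which is invariant under rotations.
A = np.eye(2)
gens = symmetry_generators(A)
g = gens[0]
```

For this toy model the nullspace is one-dimensional and spanned by a skew-symmetric matrix, i.e. the generator of planar rotations. Enforcing a known symmetry would run the same linear map in the other direction: fix ξ and solve for the admissible A.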

2303.11949 2026-06-08 cs.NE cs.LG 版本更新

A fuzzy adaptive evolutionary-based feature selection and machine learning framework for single and multi-objective body fat prediction

一种基于模糊自适应进化的方法用于单目标和多目标身体脂肪预测的特征选择和机器学习框架

Farshid Keivanian, Raymond Chiong, Zongwen Fan

AI总结 本文提出了一种融合模糊集理论和进化算法的特征选择与机器学习框架,用于提升身体脂肪预测的准确性与稳定性,同时解决多目标优化中的冲突问题。

Comments Due to unforeseen challenges in coordination and supervision, including unavoidable delays, this study requires further review and refinement. To ensure it meets necessary academic and methodological standards, we have decided to withdraw the paper. We appreciate the understanding of the research community

Journal ref
Neurocomputing, Article 132974, 2026
AI中文摘要

预测身体脂肪可以为医疗人员和用户提供预防和诊断心脏病的重要信息。混合机器学习模型通过选择相关身体测量值并捕捉所选特征之间的复杂非线性关系,比简单的回归分析方法表现更好。然而,这些模型也存在一些缺点。将身体脂肪预测建模为组合的单目标和多目标优化问题时,常常陷入局部最优。当多个特征子集产生相似或接近的预测时,避免局部最优变得更加复杂。进化特征选择已被用于解决几种基于机器学习的优化问题。模糊集理论决定了探索和利用的适当水平,同时管理参数化和计算成本。通过进化特征选择、模糊集理论和机器学习算法,探索了一种加权求和身体脂肪预测方法,将矛盾的指标整合到一个复合目标中,由模糊自适应进化特征选择优化。混合模糊自适应全局学习局部搜索通用多样性特征选择应用于这种单目标特征选择-机器学习框架(FAGLSUD-based FS-ML)。在使用较少特征的情况下,该模型比其他混合和最新机器学习模型获得了更准确和稳定的脂肪百分比估计。还提出了多目标FAGLSUD-based FS-MLP,用于同时分析准确性、稳定性和维度冲突。为了做出关于最关键身体部位脂肪沉积和血液脂质水平的明智决策,医疗人员和用户可以使用一个良好的分布的帕累托集的权衡解决方案。

英文摘要

Predicting body fat can provide medical practitioners and users with essential information for preventing and diagnosing heart diseases. Hybrid machine learning models offer better performance than simple regression analysis methods by selecting relevant body measurements and capturing complex nonlinear relationships among selected features in modelling body fat prediction problems. There are, however, some disadvantages to them. Current machine learning approaches that model body fat prediction as a combinatorial single- and multi-objective optimisation problem often get stuck in local optima. When multiple feature subsets produce similar or close predictions, avoiding local optima becomes more complex. Evolutionary feature selection has been used to solve several machine-learning-based optimisation problems. Fuzzy set theory determines appropriate levels of exploration and exploitation while managing parameterisation and computational costs. A weighted-sum body fat prediction approach was explored using evolutionary feature selection, fuzzy set theory, and machine learning algorithms, integrating contradictory metrics into a single composite goal optimised by fuzzy adaptive evolutionary feature selection. Hybrid fuzzy adaptive global learning local search universal diversity-based feature selection is applied to this single-objective feature selection-machine learning framework (FAGLSUD-based FS-ML). While using fewer features, this model achieved a more accurate and stable estimate of body fat percentage than other hybrid and state-of-the-art machine learning models. A multi-objective FAGLSUD-based FS-MLP is also proposed to analyse accuracy, stability, and dimensionality conflicts simultaneously. To make informed decisions about fat deposits in the most vital body parts and blood lipid levels, medical practitioners and users can use a well-distributed Pareto set of trade-off solutions.

2501.11592 2026-06-08 cs.LG cs.AI cs.CL 版本更新

Training-free Ultra Small Model for Universal Sparse Reconstruction in Compressed Sensing

无需训练的超小模型用于压缩感知中的通用稀疏重建

Chaoqing Tang, Huanze Zhuang, Guiyun Tian, Zhenli Zeng, Yi Ding, Wenzhong Liu, Xiang Bai

发表机构 * School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China(华中科技大学人工智能与自动化学院) China Belt and Road Joint Lab on Measurement and Control Technology, Wuhan, China(中国一带一路测量与控制技术联合实验室) School of Electric and Electrical Engineering, Chongqing University of Technology, Chongqing, China(重庆理工大学电气工程学院) Optics Valley Laboratory, Wuhan, China(光谷实验室) School of Water Conservancy and Transportation, Zhengzhou University, Zhengzhou, China(郑州大学水利与交通学院) School of Software Engineering, Huazhong University of Science and Technology, Wuhan, China(华中科技大学软件工程学院)

AI总结 本文提出无需训练的超小神经模型CL,实现快速稀疏重建,继承传统迭代方法的通用性和可解释性,提升效率和精度。

AI中文摘要

预训练大模型近年来受到广泛关注,但在需要高可解释性或资源有限的应用中面临挑战,如物理传感、医学成像和生物信息学。压缩感知(CS)是已证明的理论,推动了这些应用的许多突破。然而,作为典型的欠定线性系统,CS在使用传统迭代方法时,对大规模数据的稀疏重建时间过长。当前的AI方法如深度展开失败于替代它们,因为预训练模型在超出训练条件和数据分布时泛化性差或缺乏可解释性。本文提出名为系数学习(CL)的超小人工神经模型,实现无需训练的快速稀疏重建,同时完美继承传统迭代方法的泛化性和可解释性,带来融合先验知识的新特性。在CL中,长度为n的信号仅需最少n个可训练参数。一个案例研究模型称为CLOMP用于评估。实验在合成和真实的一维和二维信号上进行,显示了显著的效率和精度提升。与代表性的迭代方法相比,CLOMP在大规模数据上提高了100到1000倍的效率。在八个不同的图像数据集上的测试结果表明,CLOMP在采样率为0.1、0.3、0.5时分别提高了结构相似性指数292%、98%、45%。我们相信这种方法可以真正将CS重建带入AI时代,造福无数依赖稀疏解的欠定线性系统。

英文摘要

Pre-trained large models have attracted widespread attention in recent years, but they face challenges in applications that require high interpretability or have limited resources, such as physical sensing, medical imaging, and bioinformatics. Compressed Sensing (CS) is a well-established theory that drives many recent breakthroughs in these applications. However, as a typical under-determined linear system, CS suffers from excessively long sparse reconstruction times when using traditional iterative methods, particularly with large-scale data. Current AI methods like deep unfolding fail to substitute them because pre-trained models exhibit poor generality beyond their training conditions and dataset distributions, or lack interpretability. Instead of following the big model fervor, this paper proposes ultra-small artificial neural models called coefficients learning (CL), enabling training-free and rapid sparse reconstruction while perfectly inheriting the generality and interpretability of traditional iterative methods, and adding the new capability of incorporating prior knowledge. In CL, a signal of length $n$ needs a minimum of only $n$ trainable parameters. A case study model called CLOMP is implemented for evaluation. Experiments are conducted on both synthetic and real one-dimensional and two-dimensional signals, demonstrating significant improvements in efficiency and accuracy. Compared to representative iterative methods, CLOMP improves efficiency by 100- to 1000-fold for large-scale data. Test results on eight diverse image datasets indicate that CLOMP improves the structural similarity index by 292%, 98%, and 45% for sampling rates of 0.1, 0.3, and 0.5, respectively. We believe this method can truly usher CS reconstruction into the AI era, benefiting countless under-determined linear systems that rely on sparse solutions.
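
For context on the "traditional iterative methods" the abstract compares against, the following is a minimal sketch of classic Orthogonal Matching Pursuit (OMP) for the under-determined system y = Ax. It is not the paper's CL or CLOMP model, whose details are not given in the abstract; the problem sizes, the random Gaussian sensing matrix, and the function name `omp` are assumptions made for illustration.

```python
import numpy as np

def omp(A, y, k):
    """Classic Orthogonal Matching Pursuit: recover a k-sparse x from y = A @ x."""
    n = A.shape[1]
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x

# Synthetic test problem (assumed sizes): 128 measurements of a 5-sparse signal of length 256.
rng = np.random.default_rng(0)
n, m, k = 256, 128, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.uniform(1.0, 2.0, size=k)
y = A @ x_true

x_hat = omp(A, y, k)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

Each iteration solves a growing least-squares problem, which is one reason such methods become slow on large-scale data.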

2407.15555 2026-06-08 eess.SP cs.LG 版本更新

The Rlign Algorithm for Enhanced Electrocardiogram Analysis through R-Peak Alignment for Explainable Classification and Clustering

通过R峰对齐提升心电图分析的Rlign算法:用于可解释分类和聚类

Lucas Plagwitz, Lucas Bickmann, Michael Fujarski, Alexander Brenner, Warnes Gobalakrishnan, Lars Eckardt, Antonius Büscher, Julian Varghese

发表机构 * IMI Medical Systems GmbH(IMI医疗系统 GmbH) University of Freiburg(弗赖堡大学)

AI总结 本文提出Rlign算法,通过R峰对齐重构心电图信号,提升分类、聚类和可解释性,优于传统方法和CNN。

Journal ref
European Heart Journal - Digital Health, Volume 7, Issue 5, June 2026, ztag067
AI中文摘要

心电图(ECG)记录长期以来在诊断不同心脏状况中至关重要。最近,使用机器学习方法自动处理ECG的研究变得重要,主要通过在原始ECG信号上使用深度学习方法。像卷积神经网络(CNNs)这样的模型的优势在于能够有效处理生物医学影像或信号数据。然而,这种优势受到缺乏可解释性、需要大量训练数据以及适应于无监督聚类任务的复杂性等挑战的限制。为解决这些问题,我们旨在通过利用其半结构化、周期性形式重新引入浅层学习技术,包括支持向量机和主成分分析,到ECG信号处理中。为此,我们开发并评估了一种转换,能够有效将ECG信号重构为完全结构化的格式,从而后续使用浅层学习算法进行分析。在本研究中,我们提出了这种自适应转换方法,通过在数据集中对所有信号的R峰进行对齐,并在有无心跳率依赖的情况下重新采样R峰之间的段落。我们展示了这种转换在分类、聚类和可解释性方面的显著优势,优于商业软件的中位心拍转换和CNN方法。我们的方法在处理有限训练数据时,显示出浅层机器学习方法相对于CNNs的显著优势。此外,我们发布了一个经过充分测试且公开可访问的代码框架,提供了一个稳健的对齐管道以支持未来研究,网址为https://github.com/imi-ms/rlign。

英文摘要

Electrocardiogram (ECG) recordings have long been vital in diagnosing different cardiac conditions. Recently, research in the field of automatic ECG processing using machine learning methods has gained importance, mainly by utilizing deep learning methods on raw ECG signals. A major advantage of models like convolutional neural networks (CNNs) is their ability to effectively process biomedical imaging or signal data. However, this strength is tempered by challenges related to their lack of explainability, the need for a large amount of training data, and the complexities involved in adapting them for unsupervised clustering tasks. To address these challenges, we aim to reintroduce shallow learning techniques, including support vector machines and principal component analysis, into ECG signal processing by leveraging the semi-structured, cyclic form of ECG signals. To this end, we developed and evaluated a transformation that effectively restructures ECG signals into a fully structured format, facilitating their subsequent analysis using shallow learning algorithms. In this study, we present this adaptive transformative approach that aligns R-peaks across all signals in a dataset and resamples the segments between R-peaks, both with and without heart rate dependencies. We illustrate the substantial benefit of this transformation for traditional analysis techniques in the areas of classification, clustering, and explainability, outperforming commercial software for median beat transformation and CNN approaches. Our approach demonstrates a significant advantage for shallow machine learning methods over CNNs, especially when dealing with limited training data. Additionally, we release a fully tested and publicly accessible code framework, providing a robust alignment pipeline to support future research, available at https://github.com/imi-ms/rlign.
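
One simple reading of "resamples the segments between R-peaks" is sketched below: each R-to-R segment is linearly interpolated onto a fixed number of samples, so that every beat starts at its R-peak and all beats have the same length. This is an illustrative assumption, not the API or the exact procedure of the rlign package; the function name `align_beats`, the synthetic signal, and the R-peak indices (which a real pipeline would obtain from a peak detector) are hypothetical.

```python
import numpy as np

def align_beats(signal, r_peaks, samples_per_beat=100):
    """Resample each R-to-R segment to a fixed length so that R-peaks line up."""
    beats = []
    new_t = np.linspace(0.0, 1.0, num=samples_per_beat, endpoint=False)
    for start, end in zip(r_peaks[:-1], r_peaks[1:]):
        segment = signal[start:end]
        # Normalized time axis of the original segment, starting at the R-peak.
        old_t = np.linspace(0.0, 1.0, num=len(segment), endpoint=False)
        beats.append(np.interp(new_t, old_t, segment))
    return np.vstack(beats)

# Synthetic signal with irregular beat lengths (80, 100, and 80 samples).
r_peaks = [50, 130, 230, 310]
signal = np.zeros(400)
signal[r_peaks] = 1.0

aligned = align_beats(signal, r_peaks)
```

The result is a beats-by-samples matrix in which column 0 always holds the R-peak, a fully structured format that shallow methods such as principal component analysis or support vector machines can consume directly.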