arXivDaily: arXiv daily academic digest, updated Monday to Friday

1. Deep Learning Architectures and Training Methods (84 papers)

2606.14934 2026-06-16 cs.LG cs.AI New submission

Separable Neural Architectures as Physical World Models: from Mathematical Theory to Applications

Reza T Batley, Andrew Kichline, Sourav Saha

Affiliations: Kevin T. Crofton Department of Aerospace and Ocean Engineering, Virginia Polytechnic Institute and State University

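AI summary: Proposes the Separable Neural Architecture (SNA), which combines neural approximation with tensor decomposition and solves partial differential equations within a variational framework, achieving algebraic scaling on high-dimensional problems and substantial speedups in engineering case studies.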

Abstract

This work introduces the Separable Neural Architecture (SNA), a function representation class combining neural approximation with tensor decomposition. The SNA decouples localized coordinate functions (atoms) from global interactions governed by a sparse, low-rank interaction object. This architecture possesses a compact and smooth inductive bias well-suited for solving partial differential equations (PDEs). When viewed as a Galerkin trial space under the variational SNA (VSNA) framework, the formulation satisfies classical variational guarantees under Lax-Milgram: well-posedness, quasi-optimality, convergence, and stability. In high-dimensional spatiotemporal-parametric PDEs, the VSNA mitigates the curse of dimensionality by scaling algebraically rather than exponentially. Exploiting an entirely factorized, tensor-native alternating least squares (ALS) optimization framework reduces this cost to linear in dimension. The VSNA is validated across elliptic, hyperbolic, and parabolic systems, demonstrating close alignment with predicted algebraic and spectral scaling rates. We showcase the SNA as a "solve once, query anywhere" physical world model via two engineering case studies: a 7D parametric manufacturing simulation and an experimental thermal-to-property inversion pipeline for Inconel 718. The VSNA executes a 1,000,000-query Monte Carlo sweep in 102 s on a standard laptop CPU, yielding a 150,000x speedup over a full-grid finite element baseline hosted on an NVIDIA A100 GPU. It further enables real-time generative inverse-mode reconstructions under 100 ms. These results demonstrate that the SNA serves as a compact mathematical substrate for continuous parameter manifolds to enable real-time inversion, optimization loops, and rapid uncertainty propagation.
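A rough sketch of what a separable form buys at query time, assuming a CP-style representation $f(x_1,\dots,x_d)=\sum_r w_r \prod_i \phi_{r,i}(x_i)$. This is not the authors' SNA: the atoms here are hypothetical fixed cosine features with random coefficients instead of learned networks, and the interaction object is reduced to a plain weight vector. A point query costs $O(R \cdot d \cdot K)$ for rank $R$, dimension $d$, and $K$ atoms per coordinate, and never touches a full $d$-dimensional grid.

```python
import numpy as np

def make_separable(rank, dim, n_feat, seed=0):
    """Hypothetical separable model: coefficients over 1D atoms for every
    (rank term, coordinate) pair, plus one weight per rank term."""
    rng = np.random.default_rng(seed)
    coeffs = rng.normal(size=(rank, dim, n_feat))
    weights = rng.normal(size=rank)
    return coeffs, weights

def atoms(x, n_feat):
    """Fixed 1D cosine atoms cos(k * pi * x), k = 0..n_feat-1 (an assumption)."""
    return np.cos(np.arange(n_feat) * np.pi * x)

def evaluate(coeffs, weights, point):
    """f(x) = sum_r w_r * prod_i <coeffs[r, i], atoms(x_i)>, cost O(R * d * K)."""
    rank, dim, n_feat = coeffs.shape
    factors = np.ones(rank)
    for i in range(dim):
        # Multiply in the contribution of coordinate i for every rank term.
        factors *= coeffs[:, i, :] @ atoms(point[i], n_feat)
    return float(weights @ factors)
```

For a 7D query, `evaluate(*make_separable(8, 7, 16), np.full(7, 0.5))` touches only 8 × 7 × 16 coefficients.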

2606.15036 2026-06-16 cs.LG math.NT New submission

Transformers Learn the Mestre-Nagao Heuristic

Pranav Venkata Konda

Affiliations: Pranav Venkata Konda

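AI summary: Trains a two-layer Transformer encoder to classify rational elliptic curves by rank (0 or 1) with >99% accuracy; mechanistic interpretability shows that the model learns the Mestre-Nagao sum heuristic weights and that the CLS embedding encodes the logarithm of L(E,1).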

Comments 15 pages, 10 figures

Abstract

We train a two-layer transformer encoder to classify rational elliptic curves $E/\mathbb{Q}$ of conductor $\leq 10000$ as either rank 0 or rank 1 from the first 128 normalized Frobenius traces. We achieve >99% accuracy on both classes, and accuracy is essentially unchanged on test curves with no isogeny or quadratic-twist relative in the training set. We then apply techniques from mechanistic interpretability, such as attention analysis, linear probing, activation patching, logit attribution, and neuron-level circuit analysis, to reverse-engineer the algorithm learned by the model (the centroid in function space). We find that a sparse circuit of 20 out of 512 layer-1 MLP neurons is sufficient for rank prediction under a linear probe with an AUROC of 0.992 at plateau, implementing a push-pull detector architecture of rank-0 and rank-1 detectors with a one-sided readout. However, the model's readout is sub-optimal, indicating a mismatch in rank order between the readout pathway and the discriminative circuit. Critically, the learned input weights of the top discriminating neuron match the Mestre-Nagao sum heuristic weights $\log(p)/(p\cdot \log{B})$ with a Spearman coefficient $r = 0.997$ and Pearson coefficient $r = 0.952$: the model has learnt a result from analytic number theory from the Frobenius trace data alone. We additionally find that all 50 independently trained models concentrate CLS attention on prime positions at 2-50$\times$ the rate of composite positions. The CLS embedding encodes $\log{L(E,1)}$ with $R^2 = 0.962\pm 0.011$ across the 50 models (after controlling for the conductor). Activation patching analysis reveals that attention weights are dissociated from causal information flow. Finally, the 50 solutions from training are near-identical in function space (with pairwise agreement $>$98.8%) despite large weight-space barriers.
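The weights quoted above are easy to tabulate. A minimal sketch, assuming the input is a map from each prime $p \le B$ to its trace $a_p$ and the score is $\sum_{p \le B} a_p \log(p)/(p \log B)$; the function names are hypothetical, and real traces would come from a computer-algebra system rather than the toy values used in testing. Heuristically, curves with many points modulo $p$ (negative $a_p$, since $\#E(\mathbb{F}_p) = p + 1 - a_p$) give negative sums and tend to have higher rank.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_p in enumerate(sieve) if is_p]

def mestre_nagao_weights(bound):
    """Weight log(p) / (p * log B) for each prime p <= B."""
    log_b = math.log(bound)
    return {p: math.log(p) / (p * log_b) for p in primes_up_to(bound)}

def weighted_trace_sum(traces, bound):
    """Sum of a_p * w_p over primes p <= B; `traces` maps p -> a_p."""
    weights = mestre_nagao_weights(bound)
    return sum(traces[p] * w for p, w in weights.items() if p in traces)
```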

2606.15085 2026-06-16 cs.LG New submission

An Integrable Token Mixing Layer from the Generalized Yang Baxter Equation

Snigdha Chandan Khilar

Affiliations: Independent Researcher

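AI summary: Proposes the YB Mixer, a sequence token-mixing layer based on free fermions and generalized Yang-Baxter structures; it uses the local algebraic constraints of integrable systems to guarantee global computational stability and provides a norm-preserving orthogonal map, commuting transfer matrices, and a spectral circulant generator to support variable-length sequence inference.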

Abstract

The YB Mixer is a sequence token-mixing layer derived from free-fermion and generalized Yang-Baxter structures. It applies a core principle from integrable systems, where a local algebraic constraint guarantees global computational stability. By using the Ising exchange algebra, the mixer creates a free-fermionic structure that acts as an exactly norm-preserving orthogonal map. This algebra also produces commuting transfer matrices, which allow inference to be order-free and adaptable to any variable budget. To ensure the model can generalize to longer sequence lengths, it uses a spectral circulant generator. This generator maintains the crucial orthogonal and commuting properties of the system. The result is a highly stable and mathematically grounded architecture for sequence processing.
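The abstract does not define the spectral circulant generator, so the sketch below only illustrates, under stated assumptions, why a circulant construction can keep both properties it mentions: a real circulant matrix whose eigenvalues (its DFT) all have modulus one is orthogonal, and any two circulant matrices commute. The phase parametrization `free_phases` is hypothetical; phases are mirrored with opposite sign across DFT bins so that the matrix comes out real.

```python
import numpy as np

def orthogonal_circulant(n, free_phases):
    """Real orthogonal n x n circulant matrix with unit-modulus eigenvalues.

    free_phases: angles for DFT bins 1..(n-1)//2. The remaining bins follow
    conjugate symmetry (theta[n-k] = -theta[k]) so the matrix is real; bin 0
    (and bin n/2 for even n) keeps phase 0 so its eigenvalue is real.
    """
    theta = np.zeros(n)
    half = (n - 1) // 2
    theta[1 : half + 1] = free_phases[:half]
    theta[n - half :] = -theta[1 : half + 1][::-1]
    eigvals = np.exp(1j * theta)               # |eigenvalue| = 1 -> orthogonal
    first_col = np.fft.ifft(eigvals).real      # circulant = F^-1 diag(eig) F
    # Column j is the first column cyclically shifted down by j.
    return np.array([np.roll(first_col, j) for j in range(n)]).T
```

Mixing a sequence `X` of shape (n, d) along the token axis would then be `orthogonal_circulant(n, phases) @ X`, which preserves the norm of every feature column.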

2606.15207 2026-06-16 cs.LG cs.AI cs.NE New submission

Controlled Dynamics Attractor Transformer

Cheng Zhang, Minnan Luo, Zesheng Yang, Ming Li, Yong-Jin Liu, Qinghua Zheng

Affiliations: Xi'an Jiaotong University; Tsinghua University

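AI summary: Proposes the Controlled Dynamics Attractor Transformer (CDAT), which couples a mixture von Mises-Fisher attention energy with a Hopfield refinement energy and adds CANN-inspired excitation-inhibition modulation, yielding a topology-constrained dynamical system that achieves state-of-the-art performance on graph anomaly detection and graph classification.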

Comments 20 pages, 3 figures

Journal ref: Forty-Third International Conference on Machine Learning (ICML 2026)
Abstract

Transformer architectures have dramatically advanced representation learning and inference in deep models through self-attention mechanisms. In parallel, associative memory (AM) frameworks map representations onto energy landscapes, offering interpretable retrieval mechanisms. However, their continuous-time inference dynamics lack the biological plausibility of classical Continuous Attractor Neural Networks (CANNs). To bridge this gap, we propose the Controlled Dynamics Attractor Transformer (CDAT), which couples a mixture von Mises-Fisher (Mo-vMF) attention energy with a Hopfield refinement energy, while augmenting energy descent with a CANN-inspired excitation-inhibition modulation. CDAT instantiates a topology-constrained dynamical system whose couplings encode relational structure among tokens, thereby linking attractor-style dynamics to modern energy-based attention. We further provide a constructive dissipation analysis to formally establish its controlled inference dynamics. Benefiting from these robust and structured dynamics, CDAT achieves state-of-the-art performance across multiple benchmarks in graph anomaly detection and graph classification.
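For readers unfamiliar with the Hopfield refinement energy mentioned above, the generic modern Hopfield update $\xi \leftarrow X\,\mathrm{softmax}(\beta X^\top \xi)$ is sketched below. This is the standard update from the modern Hopfield network literature, not CDAT's coupled Mo-vMF/Hopfield system; the inverse temperature `beta`, the step count, and the toy patterns are assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(patterns, query, beta=8.0, steps=3):
    """Modern Hopfield retrieval: xi <- X softmax(beta * X^T xi).

    patterns: (d, N) matrix X whose columns are the stored patterns.
    The update does not increase the energy
    E(xi) = -(1/beta) * logsumexp(beta * X^T xi) + 0.5 * xi^T xi.
    """
    xi = np.asarray(query, dtype=float)
    for _ in range(steps):
        xi = patterns @ softmax(beta * patterns.T @ xi)
    return xi
```

A noisy copy of a stored column is pulled back onto that column within a step or two when `beta` is large enough to separate the patterns.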

2606.15576 2026-06-16 cs.LG cs.AI New submission

Localizing Credit at the Divergence: Path-Conditioned Self-Distillation for LLM Reasoning

Yu Li, Shu Hong, Tian Lan

Affiliations: Department of Electrical and Computer Engineering, George Washington University

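AI summary: Proposes Hindsight Self-Distillation (HSD), which conditions the teacher on a successful peer trajectory from the current training group to provide a dense credit signal where failed and successful trajectories diverge, improving LLM performance on math and code reasoning tasks.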

Abstract

Reinforcement learning from verifiable rewards assigns a single scalar to each rollout, leaving token-level credit assignment underspecified in long reasoning traces. On-policy self-distillation addresses this by letting the same model act as a teacher conditioned on privileged information, producing a dense per-token signal. But the common choice of a ground-truth answer is only an endpoint cue: on terse-answer tasks, the teacher falls silent at the intermediate positions where path-level guidance matters most. We propose Hindsight Self-Distillation (HSD), which conditions the teacher on a successful peer rollout drawn from the current training group. Such a peer is an exact sample from the success-conditioned policy, requiring no additional sampled rollouts. By providing a full successful continuation rather than only the final answer, the resulting credit signal concentrates at the divergence position between a failed rollout and a successful peer. Across Qwen3-8B and Qwen3-32B on math and code benchmarks, HSD obtains the best result against GRPO variants and on-policy distillation baselines, with the largest gains on terse-answer tasks such as AIME.
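The credit signal is described as concentrating where a failed rollout departs from a successful peer. As a rough illustration only (HSD itself works through the teacher's per-token distributions, not exact token matching), the sketch below finds the first position at which two tokenized rollouts differ; the function name and toy token IDs are hypothetical.

```python
from typing import Optional, Sequence

def divergence_index(failed: Sequence[int], peer: Sequence[int]) -> Optional[int]:
    """First token position where the failed rollout and the peer differ.

    Returns None if the sequences are identical; if one is a strict prefix
    of the other, the divergence is at the end of the shorter sequence.
    """
    for i, (a, b) in enumerate(zip(failed, peer)):
        if a != b:
            return i
    if len(failed) == len(peer):
        return None
    return min(len(failed), len(peer))
```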

2606.15669 2026-06-16 cs.LG cs.AI New submission

Z-Plane Neural Networks: Bounded Geometric Activation Replaces ReLU and LayerNorm

Sungwoo Goo, Hwi-yeol Yun, Sangkeun Jung

Affiliations: College of Pharmacy, Chungnam National University; Department of Computer Science & Engineering, Chungnam National University

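AI summary: Proposes Z-Plane Neural Networks, which use a bounded geometric activation, Radial Bounding, to map hidden states onto 2D phasor bundles on a hypersphere, bounding energy magnitude while preserving direction; the paper proves that the activation is 1-Lipschitz and prevents vanishing gradients, and a 100-layer MLP without ReLU or LayerNorm converges stably on MNIST.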

Abstract

Modern deep neural networks rely on Euclidean scalar activations (e.g., ReLU) and global normalization techniques (e.g., LayerNorm) to prevent gradient instability in deep architectures. However, these mechanisms inherently cause dead neurons, discard critical directional information, and destroy the orthogonality of feature representations. Inspired by the frequency-modulation transmission of biological axons, we propose the Z-Plane Neural Network, which maps hidden states into 2D phasor bundles on a hypersphere. We introduce a novel geometric activation function, Radial Bounding, $\mathbf{x} / \max(1, \|\mathbf{x}\|_2)$, which limits the energy magnitude while preserving the phase (direction). We demonstrate mathematically that this isotropic activation maintains 1-Lipschitz continuity and prevents gradient vanishing by preserving tangential gradients. Empirically, a 100-layer Z-Plane Multi-Layer Perceptron (MLP), entirely devoid of ReLU and LayerNorm, successfully converges on the MNIST dataset with 98.34% accuracy and absolute numerical stability, proving that bounded geometric activation alone is sufficient for stable deep learning.
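The Radial Bounding activation $\mathbf{x} / \max(1, \|\mathbf{x}\|_2)$ is simple enough to write out. A minimal NumPy sketch, assuming it is applied vector-wise along the last axis (an assumption about how the paper groups coordinates): vectors inside the unit ball pass through unchanged, and vectors outside are rescaled onto the unit sphere with their direction intact.

```python
import numpy as np

def radial_bounding(x, axis=-1):
    """Radial Bounding: x / max(1, ||x||_2), computed along `axis`.

    Inside the unit ball this is the identity; outside it rescales x onto
    the unit sphere without changing its direction. It coincides with the
    Euclidean projection onto the unit ball, which is 1-Lipschitz.
    """
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(1.0, norm)
```

For a batch `h` of shape (batch, width), `radial_bounding(h)` bounds each row separately.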

2606.15678 2026-06-16 cs.LG cs.AI New submission

The Reservoir Attention Network: Cross-Pass State in Pretrained Transformers via Content-Addressable Reservoir Injection

Emma Leonhart

Affiliations: Emma Leonhart

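AI summary: Proposes the Reservoir Attention Network (RAN), which injects a fixed random reservoir into the mid-layer attention of a pretrained Transformer to carry state across forward passes; experiments indicate that untrained recurrent dynamics suffice to carry usable state.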

Comments 29 pages, 14 figures

Abstract

A feasibility and dynamics study of the Reservoir Attention Network (RAN), an architecture that injects a fixed, randomly-initialized reservoir into the mid-layer attention of a pretrained transformer to carry state across forward passes. Experiments span GPT-2 (124M, 355M) to Qwen2.5 (0.5B, 1.5B) on a single consumer GPU. The tasks are minimal probes chosen to isolate individual mechanisms; the broader always-alive agent vision is treated throughout as compute-limited future work, not a claim of this paper. The reservoir is left untrained (fixed random) by design: this isolates whether untrained recurrent dynamics alone suffice to carry usable cross-pass state, leaving trained recurrence as a complementary, more expensive direction.
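The abstract leaves the reservoir's exact form unspecified; a common choice for fixed random recurrent dynamics is an echo-state-style update $h \leftarrow \tanh(W h + U s)$ with $W$ rescaled to a spectral radius below 1. The sketch below uses that assumption; the sizes, the 0.9 spectral radius, the idea of feeding a pooled per-pass summary `s`, and the function names are all hypothetical. Nothing is trained, mirroring the paper's untrained-reservoir design.

```python
import numpy as np

def make_reservoir(state_dim, input_dim, spectral_radius=0.9, seed=0):
    """Fixed random reservoir weights; they are never updated."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(state_dim, state_dim))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    u = rng.normal(scale=1.0 / np.sqrt(input_dim), size=(state_dim, input_dim))
    return w, u

def reservoir_step(state, summary, w, u):
    """One cross-pass update: fold a summary of the current forward pass
    (e.g. a pooled hidden state) into the persistent reservoir state."""
    return np.tanh(w @ state + u @ summary)
```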

2606.16002 2026-06-16 cs.LG New submission

Decomposing one-class support vector machine into an ensemble of one-data support vector machines

Toshitaka Hayashi, Dalibor Cimr, Hamido Fujita, Richard Cimler

Affiliations: University of Hradec Králové; Universiti Teknologi Malaysia

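AI summary: To address the scalability of one-class SVMs on large datasets, the paper decomposes the dataset into individual samples, trains a one-data SVM on each, and combines the models through ensemble learning, with a data-reduction strategy for further acceleration; experiments show substantial speedups while maintaining classification performance.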

Abstract

One-class classification (OCC) is a classification problem in which the training data contain only one class. The one-class support vector machine (OCSVM) is one of the most competitive OCC algorithms. However, OCSVM has scalability issues on large-scale datasets. This paper proposes an acceleration strategy for OCSVM. The idea is to decompose the dataset into individual samples and train an OCSVM model on each single data point. Ensemble learning is then applied to combine all of these models into an OCSVM model for the whole dataset. In addition, further acceleration is achieved through a data-reduction strategy that uses an OCSVM model trained on the average of the training samples. The experiments compare the proposed strategy with traditional OCSVM using a Python package. The proposed strategy is faster than traditional OCSVM while achieving similar classification results. Moreover, it can create a one-to-one correspondence between samples and models. Source code is available at https://github.com/ToshiHayashi/ODSVM.
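Under the standard one-class SVM dual, a model trained on a single point $x_i$ with an RBF kernel keeps that point as its only support vector, and for $\nu < 1$ its decision value is, up to a positive scale, $K(x, x_i) - 1$. The sketch below computes those per-sample scores directly; the max-combination rule and the `threshold` are hypothetical stand-ins, not the paper's ensemble method, and serve only to show the one-to-one sample-model correspondence.

```python
import numpy as np

def rbf(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) along the last axis."""
    return np.exp(-gamma * np.sum((x - y) ** 2, axis=-1))

def one_data_scores(train, test, gamma=0.5):
    """Decision value K(x, x_i) - 1 of each one-point OCSVM (up to scale).

    Returns an array of shape (n_test, n_train): one column per training
    sample, i.e. one column per single-sample model."""
    return rbf(test[:, None, :], train[None, :, :], gamma) - 1.0

def ensemble_predict(train, test, gamma=0.5, threshold=-0.5):
    """Hypothetical combination: inlier (+1) if the best one-point score
    exceeds `threshold`, otherwise outlier (-1)."""
    best = one_data_scores(train, test, gamma).max(axis=1)
    return np.where(best > threshold, 1, -1)
```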

2606.16034 2026-06-16 cs.LG New submission

Inference-Time Decision Calibration for Temporal Classification

Arthur Chagas, Arthur Buzelin, Yan Aquino, Pedro Bento, Gisele L. Pappa, Wagner Meira, Cristiano Arbex Valle

Affiliations: Department of Computer Science (DCC), Universidade Federal de Minas Gerais (UFMG)

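AI summary: Proposes decomposing temporal classification errors into representation errors and decision errors; by freezing the native classifier and adding a residual multi-scale branch and a post-hoc branch-aware calibrator, the method distinguishes missing temporal evidence from underused decision-level evidence without retraining the backbone.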

Abstract

Temporal classification errors are often treated as representation failures, but they can also arise from how available evidence is converted into decisions. This paper proposes a representation-calibration decomposition for temporal classification. We keep a trained native classifier frozen and separate two inference-time interventions: a conservative residual multi-scale branch that adds auxiliary logits to the native prediction, and a post-hoc branch-aware calibrator that recombines native and residual evidence at decision time. This design distinguishes missing temporal evidence from underused decision-level evidence without retraining the backbone. Across FI-2010, PTB-XL, UCI-HAR, MHEALTH, and HARTH, we find that gains are strongly regime-dependent. Residual multi-scale evidence is most useful in noisy or representation-limited settings, especially short-horizon FI-2010 and weaker recurrent backbones, while branch-aware calibration helps when native and auxiliary logits contain complementary evidence not fully exploited by the raw decision rule. Near-saturated settings show limited gains from either intervention. These results suggest that temporal classification should be understood not only as representation learning, but also as the problem of trusting, combining, and calibrating evidence from multiple views.
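One simple way to picture a post-hoc calibrator that recombines frozen native logits with residual-branch logits is a weighted sum followed by temperature scaling, fitted on held-out data. In the sketch below, the residual weight `alpha`, the temperature, and the grid search are assumptions for illustration, not the paper's branch-aware calibrator; only two scalars are fitted and the backbone is never touched.

```python
import numpy as np
from itertools import product

def log_softmax(z):
    """Row-wise log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def nll(logits, labels):
    """Mean negative log-likelihood of integer labels."""
    return float(-log_softmax(logits)[np.arange(len(labels)), labels].mean())

def fit_calibrator(native, residual, labels,
                   alphas=np.linspace(0.0, 1.0, 11),
                   temps=np.linspace(0.5, 3.0, 11)):
    """Grid-search alpha and T for logits (native + alpha * residual) / T.

    Returns (validation NLL, alpha, temperature) of the best grid point."""
    best = (np.inf, 0.0, 1.0)
    for alpha, temp in product(alphas, temps):
        loss = nll((native + alpha * residual) / temp, labels)
        if loss < best[0]:
            best = (loss, float(alpha), float(temp))
    return best
```

Because `alpha = 0, T = 1` lies on the grid, the fitted calibrator never does worse on the validation set than the raw native logits; a large fitted `alpha` hints at underused residual evidence.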

2606.16076 2026-06-16 cs.LG cs.AI cs.GT New submission

Phys-JEPA: Physics-Informed Latent World Models for Multivariate Time-Series Forecasting

Weizhi Nie, Weichao Liu, Honglin Guo, Yuting Su

Affiliations: Tianjin University

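AI summary: Proposes Phys-JEPA, which imposes physical-consistency constraints on latent states and state transitions and decomposes predictive states into physical and residual components, improving forecasting accuracy on climate, traffic, and electricity datasets.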

Comments Submitted to arXiv as a preliminary manuscript. 10 figures

Abstract

Multivariate forecasting in physical systems requires models that predict coupled temporal variables while preserving meaningful state evolution. Deep forecasters can fit temporal correlations, and physics-informed models can regularize predictions with scientific constraints, but these directions are often connected only at the decoded-output level. As a result, the hidden predictive state that generates future trajectories may remain statistically useful but physically unstructured. We introduce Phys-JEPA, a physics-informed joint-embedding predictive architecture for multivariate time-series forecasting. Phys-JEPA learns a latent world model in which predictive states are decomposed into physical and residual components, and physical consistency is imposed directly on latent states and latent transitions rather than only on decoded forecasts. This formulation uses known physical variables to organize the representation space while retaining residual capacity for unresolved dynamics. On Jena Climate 2009–2016, Phys-JEPA reduces aggregate MSE from 0.12482 to 0.12273 and temperature MSE from 0.01892 to 0.01831 at H=24. On Traffic, full Phys-JEPA improves aggregate MSE over the supervised baseline across all tested horizons, reducing H=192 MSE from 0.800784 to 0.773873. On Electricity, the best variant depends on horizon: static latent consistency is strongest at H=24 and H=48, while full Phys-JEPA gives the best aggregate and target-variable MSE at H=192. These initial results suggest that moving physics-informed learning from output space to latent predictive state space is a promising direction for interpretable temporal world models.
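To make "physical consistency imposed on latent states and transitions" concrete, the sketch below computes two hypothetical loss terms on the physical slice of a predicted latent trajectory: agreement with a projection of the observed physical variables, and agreement with an assumed linear latent transition. The split at `n_phys`, the projection `proj`, and the transition `trans` are illustrative assumptions, since the abstract does not specify the physics model; the residual dimensions are deliberately left unconstrained.

```python
import numpy as np

def phys_latent_losses(z, x_phys, proj, trans, n_phys):
    """Hypothetical latent physical-consistency terms.

    z:      (T, d) predicted latent states; the first n_phys dims are the
            physical part, the rest the residual part (unconstrained here).
    x_phys: (T, p) observed physical variables.
    proj:   (n_phys, p) assumed map from physical variables to latent space.
    trans:  (n_phys, n_phys) assumed linear latent transition.
    """
    z_phys = z[:, :n_phys]
    # State consistency: physical latents should match projected observations.
    static = np.mean((z_phys - x_phys @ proj.T) ** 2)
    # Transition consistency: z_{t+1} should follow trans @ z_t.
    dynamic = np.mean((z_phys[1:] - z_phys[:-1] @ trans.T) ** 2)
    return static, dynamic
```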

2606.16112 2026-06-16 cs.LG cs.AI New submission

Scaling Adaptive Depth with Norm-Agnostic Residual Networks

Tomás Figliolia, Beren Millidge

Affiliations: Zyphra, San Francisco, CA

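AI summary: Because residual-stream norm grows with depth and suppresses the updates of later layers, the paper proposes NAG, a norm-agnostic residual architecture that separates magnitude from direction to preserve each layer's contribution and enables an interpretable adaptive depth-skipping mechanism that matches full-depth performance at equal compute.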

Abstract

Residual architectures are ubiquitous in deep learning, but they suffer from a subtle structural limitation: the norm of the residual stream can grow rapidly with depth. As a result, updates from later layers become small relative to the accumulated residual state. This reduces their impact on the representation and limits the benefits of scaling models in depth. To address this, we introduce NAG, a norm-agnostic residual architecture that separates magnitude from directional information in the residual stream, preserving meaningful layer contributions throughout depth and preventing later updates from being systematically suppressed by residual-norm growth. Importantly, NAG introduces only a negligible number of additional parameters and relies on simple operations that are easily kernel-fusible, preserving training efficiency in practice. We show that this architecture outperforms baseline Transformers, with gains that increase substantially as depth grows, enabling effective training of much deeper models. The norm-agnostic formulation also leads to an interpretable Mixture-of-Depths (MoD) mechanism that adaptively skips both attention and MLP layers. Beyond serving as a post-training accuracy-compute tradeoff, this mechanism can be used as a pretraining-time scaling strategy: under iso-FLOP training, compute saved by reducing per-token forward-pass cost can be reinvested into training on more tokens while keeping the total parameter count and KV-cache budget fixed. In our experiments, moderate Mixture-of-Depths rates of approximately 20%-25% match full-depth baseline performance under equal training compute while substantially reducing the number of executed layer parameters and forward-pass FLOPs. These results identify sparsity in depth as a new scaling axis for fixed-compute training, enabling very deep yet FLOP-efficient models.
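The norm-growth problem described above is easy to reproduce with random updates. The sketch below is a toy illustration rather than NAG itself: it compares the relative size of each layer's update in a plain residual stream against a hypothetical stream that keeps only direction by re-normalizing after every update. Widths, depths, and the random-update model are assumptions.

```python
import numpy as np

def relative_update_sizes(num_layers=64, width=256, normalized=False, seed=0):
    """Ratio ||update_l|| / ||residual_l|| at each layer for random updates.

    normalized=False: plain residual stream h <- h + f_l (norm grows ~ sqrt(l)).
    normalized=True:  hypothetical direction-only stream h <- (h + f_l) / ||h + f_l||,
                      so every layer sees a unit-norm state.
    """
    rng = np.random.default_rng(seed)
    h = rng.normal(size=width) / np.sqrt(width)   # roughly unit norm
    if normalized:
        h /= np.linalg.norm(h)
    ratios = []
    for _ in range(num_layers):
        f = rng.normal(size=width) / np.sqrt(width)  # random update, norm ~ 1
        ratios.append(np.linalg.norm(f) / np.linalg.norm(h))
        h = h + f
        if normalized:
            h /= np.linalg.norm(h)
    return np.array(ratios)
```

In the plain stream the last layer's update is roughly an eighth of the state norm after 64 layers, while the normalized stream keeps the ratio near 1 throughout.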

2606.16231 2026-06-16 cs.LG cs.AI New submission

From Tokens to Regions: CUDA-Sensitive Instruction Tuning for GPU Kernel Generation

Wentao Chen, Jiace Zhu, Xing Zhe Chai, Zeng Qu, Qiaoling Xiao, Liucheng Duan, An Zou

Affiliations: Shanghai Jiao Tong University; Biren Technology

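AI summary: Proposes CuSeT, which combines adaptive token-level masking with region-aware sample reweighting in a simple SFT framework to improve the functional correctness of CUDA kernels generated by LLMs.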

Abstract

High-performance CUDA kernels are essential for scalable AI systems, while Large Language Models (LLMs) still struggle to generate correct kernels due to strict and implicit execution constraints. Existing LLM-based approaches either rely on costly agentic or reinforcement-learning (RL) pipelines, or adopt supervised fine-tuning (SFT) objectives that fail to explicitly model CUDA sensitivity, namely code tokens or regions tightly coupled with execution constraints. In this work, we investigate CUDA sensitivity from the perspective of token confidence patterns, showing that CUDA sensitivity appears at both token and region levels, where most CUDA-sensitive tokens are predicted with high confidence, while a smaller low-confidence subset forms regions corresponding to execution-critical structures. These findings suggest that effective CUDA kernel generation should both leverage high-confidence CUDA-sensitive tokens and preserve low-confidence CUDA-sensitive regions. Building on these insights, we propose CUDA-Sensitive Instruction Tuning (CuSeT), a low-cost post-training method within a simple SFT framework. CuSeT follows the principle of "from tokens to regions" by combining adaptive token-level masking with region-aware sample reweighting. Experiments show that CuSeT consistently improves functional correctness across multiple model families and scales, outperforming standard SFT and advanced SFT variants, while achieving competitive performance against frontier CUDA kernel generation models with substantially lower inference cost.
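The confidence analysis described above can be sketched as follows: compute the probability the model assigns to each target token, then find contiguous runs of low-confidence tokens. The threshold, minimum run length, and the sample-weighting rule are hypothetical choices for illustration, not CuSeT's actual masking or reweighting definitions.

```python
import numpy as np

def token_confidence(logits, targets):
    """Probability assigned to each target token. logits: (T, V), targets: (T,)."""
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs[np.arange(len(targets)), targets]

def low_confidence_regions(conf, threshold=0.5, min_len=2):
    """Contiguous runs (start, end_exclusive) where confidence < threshold."""
    regions, start = [], None
    for i, c in enumerate(conf):
        if c < threshold and start is None:
            start = i
        elif c >= threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(conf) - start >= min_len:
        regions.append((start, len(conf)))
    return regions

def sample_weight(conf, threshold=0.5, min_len=2, scale=1.0):
    """Hypothetical rule: up-weight samples with more tokens in low-confidence regions."""
    covered = sum(e - s for s, e in low_confidence_regions(conf, threshold, min_len))
    return 1.0 + scale * covered / len(conf)
```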

2606.16243 2026-06-16 cs.LG cs.CL New submission

LiFT: Local Search via Linear Programming for Overfitting-Controlled Transformers

Abhishek Shukla, Anikeit Khanna, Ankur Sinha, Faiz Hamid

Affiliations: Department of Management Sciences, Indian Institute of Technology Kanpur; Department of Civil Engineering, Indian Institute of Technology Kanpur; Operations and Decision Sciences, Indian Institute of Management Ahmedabad; Brij Disa Centre for Data Science and AI, Indian Institute of Management Ahmedabad

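AI summary: Proposes a linear-programming-based local search framework that jointly updates model parameters and regularization hyperparameters through bilevel optimization, using validation gradients and Hessian information to construct local descent directions that reduce overfitting while preserving training optimality; experiments fine-tuning GPT-2 Small show consistent improvements in test perplexity.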

Comments 22 pages, 6 figures, published in The 20th Learning and Intelligent Optimization Conference (LION 2026)

Abstract

This paper proposes a Linear Programming (LP)-based local search framework for fine-tuning pretrained transformer models with explicit control against overfitting. The approach formulates transformer fine-tuning as a bilevel optimization-based regularization problem, in which model parameters and regularization hyperparameters are jointly updated. Information collected during initial warm-up iterations, including validation gradients and training Hessian information, is used to construct a local descent direction by solving an LP that minimizes a scaled directional derivative while preserving training optimality. This validation-aware descent direction enables focused local updates of both parameters and regularization hyperparameters, reducing overfitting without requiring repeated full retraining cycles. The resulting method, termed Linear Programming-based Fine-Tuning (LiFT) for transformers, differs from conventional fine-tuning by systematically identifying task-specific updates rather than relying on heuristic or grid-based hyperparameter selection. Experiments on GPT-2 Small fine-tuned on WikiText-2 demonstrate that LiFT enables effective adaptation through selective tuning of transformer blocks and regularization parameters, yielding consistent improvements in test perplexity across multiple layer configurations and regularization settings, with particularly pronounced gains in overfitting-prone scenarios. Beyond empirical performance, LiFT establishes a principled connection between transformer fine-tuning, bilevel optimization, local search, and regularization theory.
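The central step, choosing a descent direction by solving a linear program, can be sketched with SciPy's `linprog`. Every modeling choice here is an assumption rather than the paper's formulation: the vector $d$ stacks parameter and hyperparameter updates, `g_val` is the validation gradient, the rows of `J` linearize the training-optimality conditions (so $Jd = 0$ keeps them satisfied to first order), and a box $|d_i| \le b$ stands in for the paper's scaling of the directional derivative.

```python
import numpy as np
from scipy.optimize import linprog

def lp_descent_direction(g_val, J, step_bound=1.0):
    """Solve  min_d g_val . d  s.t.  J d = 0,  -b <= d_i <= b.

    Returns (direction, optimal value of the directional derivative)."""
    n = len(g_val)
    res = linprog(c=g_val, A_eq=J, b_eq=np.zeros(J.shape[0]),
                  bounds=[(-step_bound, step_bound)] * n, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x, res.fun
```

A negative optimal value signals a direction that lowers the validation loss to first order; a value of zero means no such direction exists inside the box.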

2606.16310 2026-06-16 cs.LG cs.CL New submission

QK-Normed MLA: QK normalization without full key caching

Yizhou Han, Yao Zhao, Jun Zhou, Longfei Li, Ruoyu Sun

Affiliations: The Chinese University of Hong Kong; Ant Group

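AI summary: Proposes a way to make QK normalization compatible with MLA by absorbing the static weights and a dynamic scalar, so that full keys need not be cached; in 400M-parameter training runs it lowers loss and improves downstream accuracy, with less than 2% added decoding latency.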

Comments 13 pages, 5 figures, conference-style manuscript

Abstract

Query-key (QK) normalization stabilizes attention by controlling the scale of queries and keys before the dot product, but is not immediately compatible with Multi-head Latent Attention (MLA). MLA achieves efficient decoding by caching low-dimensional latent states instead of full keys, whereas post-projection QK RMSNorm appears to require the fully projected key for every cached token. We show this apparent incompatibility is an implementation artifact, not an architectural constraint. RMSNorm decomposes into a static affine weight and a dynamic scalar RMS statistic. The static key-side weight can be absorbed into the MLA query-side projection; the dynamic key statistic reduces to one inverse-RMS scalar per token and KV group. The resulting formulation is exactly equivalent to explicit post-projection QK RMSNorm in exact arithmetic and preserves MLA's latent decode path. In our 400M runs trained for up to 100B tokens, QK-Normed MLA achieves lower training loss and better downstream accuracy than QK clipping, while H800 decode benchmarks show less than 2% latency overhead up to 256k context. These results make QK normalization a practical stabilization option for MLA models without requiring full-key caching.
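The absorption argument can be checked numerically. A minimal NumPy sketch, assuming a single head, RMSNorm without the epsilon term (the exact-arithmetic setting in which the abstract claims equivalence), keys formed by a hypothetical up-projection `w_uk` of the cached latents, and no positional encoding. The explicit path materializes every full key; the absorbed path folds $g_k$ and `w_uk` into the query and needs only the latents plus one cached inverse-RMS scalar per token.

```python
import numpy as np

def rms(x):
    return np.sqrt(np.mean(x ** 2))

def rmsnorm(x, gain):
    """RMSNorm without epsilon: gain * x / rms(x)."""
    return gain * x / rms(x)

def explicit_scores(q, latents, w_uk, g_q, g_k):
    """Reference: up-project every cached latent to a full key, then QK-RMSNorm."""
    q_hat = rmsnorm(q, g_q)
    return np.array([q_hat @ rmsnorm(w_uk @ c, g_k) for c in latents])

def absorbed_scores(q, latents, inv_rms_cache, w_uk, g_q, g_k):
    """Absorbed form: fold g_k and w_uk into the query side, then use only the
    cached latents and one cached inverse-RMS scalar per token."""
    q_latent = w_uk.T @ (rmsnorm(q, g_q) * g_k)   # shape (latent_dim,)
    return (latents @ q_latent) * inv_rms_cache
```

The identity behind it is $\hat q \cdot (g_k \odot k)/\mathrm{rms}(k) = \big(W_{uk}^\top(\hat q \odot g_k)\big)\cdot c \,/\, \mathrm{rms}(W_{uk} c)$, so the inverse RMS is computed once when a token enters the cache.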

2606.16371 2026-06-16 cs.LG New submission

CacheMuon: Using Temporal Preconditioning To Approximate Polar Factor

Bishnu Dev, Sushil Bohara, Martin Takáč, Samuel Horváth

Affiliations: Mohamed bin Zayed University of Artificial Intelligence

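AI summary: Proposes CacheMuon, which caches polar factors from previous optimization steps to reduce the Newton-Schulz iteration overhead of the Muon optimizer, lowering orthogonalization cost while maintaining training quality.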

英文摘要

Muon is an optimizer that computes updates using the polar factor of the momentum matrix and has shown strong empirical performance across a range of training settings. A key component of Muon is the Newton-Schulz iteration used to compute this polar factor. Although this avoids the cost of an exact singular value decomposition, it remains expensive in practice because it is applied at every optimization step. At the same time, the momentum matrix changes smoothly over training, suggesting strong temporal correlation in the corresponding polar factors. In this paper, we exploit this structure and propose CacheMuon, a temporal preconditioning method that reuses information from previous optimization steps to approximate the polar factor at the current step. This reduces redundant orthogonalization computation across iterations. We analyze CacheMuon as an inexact Muon update, with error controlled by fresh-solver error and cache staleness. Empirically, CacheMuon provides a controllable quality-efficiency frontier: conservative thresholds closely match fresh Muon on language-model and vision training while reducing orthogonalization FLOPs, whereas more aggressive thresholds yield larger arithmetic savings at the cost of modest validation-quality degradation.
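To make the quantities concrete, here is a sketch of the fresh solver plus a cache. It uses the textbook cubic Newton-Schulz iteration; Muon's reference implementation uses a tuned quintic polynomial instead. The reuse rule is a relative-drift threshold on the momentum since the last fresh solve. That rule is an assumption for illustration, not CacheMuon's actual criterion.

```python
import numpy as np

def newton_schulz_polar(M, steps=60):
    """Approximate the polar factor U V^T of M with the cubic Newton-Schulz iteration."""
    X = M / np.linalg.norm(M)            # Frobenius scaling puts all singular values in (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # maps each singular value s to 1.5 s - 0.5 s^3, toward 1
    return X

class CachedPolar:
    """Hypothetical cache: reuse the last polar factor while the momentum barely moves."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.cached_M = None
        self.cached_polar = None
        self.fresh_calls = 0

    def __call__(self, M):
        if self.cached_M is not None:
            # Staleness is measured against the momentum used for the last fresh solve.
            drift = np.linalg.norm(M - self.cached_M) / np.linalg.norm(self.cached_M)
            if drift < self.threshold:
                return self.cached_polar  # stale but free
        self.cached_M = M.copy()
        self.cached_polar = newton_schulz_polar(M)
        self.fresh_calls += 1
        return self.cached_polar
```

A smaller `threshold` behaves like fresh Muon. A larger one skips more orthogonalizations at the price of a staler update, which mirrors the quality-efficiency trade-off described in the abstract.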

2606.16388 2026-06-16 cs.LG 新提交

Robust Neural Tucker Factorization with Bias Correction and Adaptive Initialization

鲁棒神经Tucker分解:偏差校正与自适应初始化

Yuchao Su, Yixin Ran

发表机构 * School of Computer Science and Engineering, Chongqing University of Science and Technology(重庆科技大学计算机科学与工程学院) College of Computer and Information Science, School of Software, Southwest University(西南大学计算机与信息科学学院 软件学院)

AI总结 提出KaBiN模型,结合Kaiming初始化和偏差校正,解决高维不完全张量补全中初始化不当和偏差缺失导致的优化不稳定问题。

Comments 9 pages, 3 figures, 106 conferences

详情
AI中文摘要

高维不完全(HDI)张量广泛应用于交通和气候领域,但稀疏观测使得准确补全困难。跨不同多模态场的固有非线性动态和非平稳变化严重阻碍了传统线性重构框架的有效性。神经Tucker分解为建模张量模式间的高阶交互提供了有效框架。通过将底层结构特征参数化为连续潜在空间,神经表示规避了经典代数的刚性低秩约束。然而,其性能仍可能受到实现层面选择的影响,尤其是参数初始化和最终输出映射的偏差配置。次优初始化常导致立方扩展交互空间中的方差爆炸,将后续非线性激活边界推入严重梯度饱和区域,而忽略专用平移参数迫使交互权重隐式吸收全局统计偏差。本文提出一种简单有效的神经Tucker分解模型,结合Kaiming初始化和偏差校正(KaBiN),用于HDI张量补全。所提模型对嵌入和Tucker线性参数采用Kaiming均匀初始化,并在输出映射中采用简单偏差校正。通过优雅地将全局均值偏移与局部结构表示解耦,该框架提供了高度稳定且条件良好的优化景观。在三个真实HDI张量数据集上的实验表明,KaBiN在引入最小计算开销的同时,实现了优于原始NeuTucF的性能。

英文摘要

High-dimensional incomplete (HDI) tensors are widely used in traffic and climate applications, but sparse observations make accurate completion difficult. The intrinsic non-linear dynamics and non-stationary variations across distinct multi-modal fields severely hinder the efficacy of conventional linear reconstruction frameworks. Neural Tucker factorization provides an effective framework for modeling high-order interactions among tensor modes. By parameterizing underlying structural characteristics into continuous latent spaces, neural representations circumvent the rigid low-rank constraints of classical algebra. However, its performance can still be affected by implementation-level choices, especially parameter initialization and the bias configuration of the final output mapping. Suboptimal initializations frequently lead to variance explosion across the cubically expanded interaction spaces, driving the subsequent non-linear activation boundaries into severe gradient saturation zones, while the omission of a dedicated translation parameter forces interaction weights to implicitly absorb global statistical deviations. This paper proposes a simple yet effective neural Tucker factorization model with Kaiming initialization and bias correction (KaBiN) for HDI tensor completion. The proposed model utilizes Kaiming uniform initialization for the embedding and Tucker linear parameters, and adopts a simple bias correction in output mapping. By elegantly decoupling global mean shifts from local structural representations, the framework provides a highly stable and well-conditioned optimization landscape. Experiments on three real-world HDI tensor datasets show that KaBiN achieves better performance than the original NeuTucF, while introducing minimal computational overhead.
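The sketch below shows the two ingredients in NumPy: Kaiming-uniform initialization with the ReLU gain, and an output bias initialized to the global mean of the observed entries. It assumes a plain three-way Tucker contraction in place of NeuTucF's neural interaction layers. The class name, the fan-in choices, and the mean-based bias initialization are illustrative assumptions.

```python
import numpy as np

def kaiming_uniform(shape, fan_in, rng):
    # He/Kaiming uniform with ReLU gain sqrt(2): U(-b, b), b = sqrt(6 / fan_in)
    bound = np.sqrt(6.0 / fan_in)
    return rng.uniform(-bound, bound, size=shape)

class TuckerWithBias:
    """Hypothetical 3-way Tucker predictor with an explicit output bias (not the authors' code)."""

    def __init__(self, dims, rank, observed_values, seed=0):
        rng = np.random.default_rng(seed)
        # Mode embeddings and the Tucker core use Kaiming-uniform initialisation;
        # the fan-in values used here are assumptions.
        self.factors = [kaiming_uniform((n, rank), rank, rng) for n in dims]
        self.core = kaiming_uniform((rank, rank, rank), rank ** 2, rng)
        # Bias correction: a dedicated scalar absorbs the global mean of the observations,
        # so the interaction term only has to model deviations from it.
        self.bias = float(np.mean(observed_values))

    def predict(self, i, j, k):
        u, v, w = (f[idx] for f, idx in zip(self.factors, (i, j, k)))
        interaction = np.einsum("abc,a,b,c->", self.core, u, v, w)
        return interaction + self.bias
```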

2606.16429 2026-06-16 cs.LG cs.CL 新提交

Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation

Taylor-Calibrate:混合线性注意力蒸馏的原则性初始化

Zhongzhu Zhou, Qingyang Wu, Junxiong Wang, Mayank Mishra, Shuaiwen Leon Song, Ben Athiwaratkun, Chenfeng Xu

发表机构 * The University of Sydney(悉尼大学) Together AI University of California, Berkeley(加州大学伯克利分校) The University of Texas at Austin(德克萨斯大学奥斯汀分校) Microsoft(微软)

AI总结 提出Taylor-Calibrate方法,利用泰勒引导的教师注意力统计初始化混合线性注意力学生模型,显著减少蒸馏所需训练令牌数。

Comments 24 pages, 9 figures

详情
AI中文摘要

混合线性注意力模型提供了一条更快长上下文推理的诱人路径:它们降低了全softmax注意力的二次成本和KV缓存负担,同时保留了Transformer模型的大部分质量。获得此类模型的一种实用方法是转换预训练的Transformer,而不是从头开始预训练新架构,但这种转换仍然脆弱。简单地将教师注意力投影复制到Gated DeltaNet(GDN)学生中并不能指定新的循环衰减、写入和输出门控动态。因此,转换后的模型通常从较差的动态状态开始,必须花费大量蒸馏令牌来修复初始化,而不是学习剩余的教师行为。我们提出了Taylor-Calibrate,一种用于混合GDN学生的轻量级初始化方法。该方法使用泰勒引导的教师注意力统计来设置值投影、记忆时间尺度、写入门和输出门,然后应用一个简短的逐层对齐步骤,使每个转换后的层与教师输出匹配。在四种教师设置和三种保留层策略下,Taylor-Calibrate提供了显著更强的零样本学生,在代表性消融中改进高达88倍,并且达到匹配恢复目标所需的训练令牌比朴素转换少4.9倍至9.2倍。

英文摘要

Hybrid linear attention models offer an appealing path to faster long-context inference: they reduce the quadratic cost and KV-cache burden of full softmax attention while retaining much of the quality of Transformer models. A practical way to obtain such models is to convert a pretrained Transformer instead of pretraining a new architecture from scratch, but this conversion is still brittle. Simply copying the teacher attention projections into a Gated DeltaNet (GDN) student does not specify the new recurrent decay, write, and output-gating dynamics. As a result, the converted model often starts in a poor dynamical regime and must spend many distillation tokens repairing initialization rather than learning the remaining teacher behavior. We propose Taylor-Calibrate, a lightweight initialization method for hybrid GDN students. The method uses Taylor-guided teacher attention statistics to set the value projection, memory timescale, write gates, and output gate, then applies a short per-layer alignment step to match each converted layer to the teacher output. Across four teacher settings and three retained-layer policies, Taylor-Calibrate gives substantially stronger zero-shot students, with up to an 88x improvement in a representative ablation, and reaches matched recovery targets with 4.9x to 9.2x fewer training tokens than naive conversion.
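The dynamics that Taylor-Calibrate must set are easiest to see in the student's recurrence. The sketch below is the standard single-head gated delta rule used by Gated DeltaNet, with a decay `alpha`, a write strength `beta`, and an output gate. It does not reproduce the paper's Taylor-guided statistics or its per-layer alignment step.

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta, out_gate):
    """One step of the gated delta rule:
    S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T,   o_t = out_gate * (S_t q)."""
    k = k / np.linalg.norm(k)  # keys are L2-normalised in the delta rule
    # S (I - beta k k^T) = S - beta (S k) k^T, written without forming the identity
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)
    return S, out_gate * (S @ q)
```

With `alpha = beta = 1`, writing a new value under the same key overwrites the old one. `alpha < 1` sets the memory timescale, and `beta` controls how strongly each token writes. These are the degrees of freedom a naive weight copy leaves unspecified.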

2606.16454 2026-06-16 cs.LG cs.AI 新提交

SDS-LoRA: Overcoming Anisotropic Gradient Scaling in Low-Rank Adaptation

SDS-LoRA:克服低秩适应中的各向异性梯度缩放

Junghun Oh, Sungyong Baik, Kyoung Mu Lee

发表机构 * Seoul National University(首尔大学) Hanyang University(汉阳大学)

AI总结 提出SDS-LoRA,通过结构解耦奇异值与反向传播,消除LoRA中梯度各向异性缩放导致的秩降低和次优对齐问题,提升收敛速度和适应性能。

详情
AI中文摘要

低秩适应(LoRA)通过使用低秩矩阵参数化权重更新,实现了大型预训练模型对下游任务的高效适应。在本文中,我们从几何角度研究了LoRA参数化的局限性。具体地,我们表明当全微调梯度反向传播到低秩矩阵时,它会经历由奇异值驱动的各向异性缩放。我们认为这种现象是不可取的,因为它通过将梯度偏向主导奇异方向而抑制其他方向,从而扭曲了全微调梯度。我们的分析表明,各向异性梯度缩放降低了低秩矩阵梯度的有效秩,并导致LoRA中全微调梯度与其低秩近似之间的次优对齐,从而加剧了与全微调的差距。为了解决这些局限性,我们提出了一种新的低秩参数化方法SDS-LoRA,该方法在结构上将奇异值与反向传播解耦。我们的方法确保全微调梯度仅通过低秩矩阵子空间的正交基反向传播,独立于其尺度。收敛性分析表明,虽然LoRA的收敛速率随低秩矩阵的条件数而恶化,但SDS-LoRA与之无关。在自然语言和视觉基准上的实验结果表明,SDS-LoRA改善了损失收敛并缩小了与全微调的差距,显著提升了适应性能。

英文摘要

Low-Rank Adaptation (LoRA) enables efficient adaptation of large pre-trained models to downstream tasks by parameterizing weight updates with low-rank matrices. In this paper, we investigate the limitations of the LoRA parameterization from a geometric perspective. Specifically, we show that when a full fine-tuning gradient is backpropagated to the low-rank matrices, it undergoes anisotropic scaling driven by their singular values. We argue that this phenomenon is undesirable because it distorts the full fine-tuning gradient by skewing it toward dominant singular directions while suppressing others. Our analyses demonstrate that anisotropic gradient scaling reduces the effective rank of the low-rank matrices' gradients and results in suboptimal alignment between the full fine-tuning gradient and its low-rank approximation in LoRA, thereby exacerbating the gap to full fine-tuning. To address these limitations, we propose a new low-rank parameterization, SDS-LoRA, which structurally decouples singular values from the backward pass. Our method ensures that the full fine-tuning gradient backpropagates only through the orthonormal bases of the low-rank matrices' subspaces, independent of their scales. Convergence analysis demonstrates that while LoRA's convergence rate degrades with the condition number of the low-rank matrices, SDS-LoRA remains independent of it. Experimental results across natural language and vision benchmarks show that SDS-LoRA improves loss convergence and reduces the gap to full fine-tuning, significantly enhancing adaptation performance.
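The anisotropy argument fits in a few lines. In this NumPy sketch, LoRA's factor `A` is written as `diag(s) Q` with orthonormal rows `Q`. The gradient reaching `B` is then the basis-only gradient with column `j` multiplied by `s_j`. The basis-only gradient illustrates backpropagating independently of scale; it is not the SDS-LoRA parameterization itself. Shapes and singular values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 5, 3
G = rng.standard_normal((d_out, d_in))           # full fine-tuning gradient dL/dW
Q, _ = np.linalg.qr(rng.standard_normal((d_in, r)))
Q = Q.T                                          # r x d_in, orthonormal rows
s = np.array([10.0, 1.0, 0.1])                   # singular values of A
A = s[:, None] * Q                               # A = diag(s) Q

# LoRA: W = W0 + B A, so dL/dB = G A^T = (G Q^T) diag(s).
grad_B_lora = G @ A.T
# Illustrative decoupling: backpropagate through the orthonormal basis Q only.
grad_B_basis = G @ Q.T

# Column j of the LoRA gradient is the basis gradient scaled by s_j.
ratios = np.linalg.norm(grad_B_lora, axis=0) / np.linalg.norm(grad_B_basis, axis=0)
```

Here `ratios` equals `s`. The direction tied to the singular value 10 receives a gradient 100 times larger than the one tied to 0.1, which is the skew toward dominant singular directions described above.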

2606.16456 2026-06-16 cs.LG cs.AI 新提交

SPRI: SVD-Partitioned Residual Initialization for Data-Constrained MoE Upcycling

SPRI: 基于SVD分解残差初始化的数据受限MoE升级方法

Weiqiao Shan, Ruixiang Mao, Yuang Li, Yuhao Zhang, Yingfeng Luo, Tong Zheng, Chen Xu, Yucheng Qiao, Chunxiang Jin, Yi Yuan, Jingdong Chen, Tong Xiao, Jingbo Zhu

发表机构 * Northeastern University, China(东北大学) Huawei TSC, China(华为技术有限公司) CUHK-Shenzhen, China(香港中文大学(深圳)) University of Maryland, USA(马里兰大学) Harbin Engineering University, China(哈尔滨工程大学) Inclusion AI, Ant Group(蚂蚁集团Inclusion AI) NiuTrans Research, China(小牛翻译研究中心)

AI总结 提出SPRI方法,利用预训练FFN权重的SVD分解残差初始化MoE专家,结合两阶段训练策略,在数据受限的多语言语音翻译任务中显著提升性能。

Comments 8 pages, 12 tables, 3 figures

详情
AI中文摘要

混合专家(MoE)模型能够实现高效扩展,但从头训练成本过高。MoE升级通过将预训练的密集模型转换为稀疏MoE模型来降低这一成本。然而,现有的升级方法通常依赖大规模持续训练,并且在数据受限的监督适应中表现不佳,原因在于专家同质化或对预训练参数的过度扰动。在此设置下,有效的升级必须利用预训练权重结构,同时为路由专家引入足够的多样性。为此,我们提出了基于SVD分解残差初始化(SPRI)的方法,该方法将从预训练前馈网络(FFN)权重中提取的SVD分解残差分配到路由专家中,从而在预训练谱结构的基础上引入可控的专家多样性。我们进一步引入两阶段训练策略以提高适应稳定性。我们在多语言语音到文本翻译任务上评估SPRI,该任务中有限的监督数据对MoE升级构成挑战,而多个目标语言提供了天然的路由异质性。在CoVoST2数据集上的15个英语到其他语言方向中,SPRI相比完全微调的密集模型平均BLEU和COMET分别提高了2.58和3.32分,并且比之前最佳的MoE升级基线高出3.39 BLEU和4.34 COMET分。

英文摘要

Mixture-of-Experts (MoE) models enable efficient scaling, but training them from scratch remains prohibitively expensive. MoE upcycling mitigates this cost by converting pretrained dense models into sparse MoE models. However, existing upcycling methods typically rely on large-scale continued training and often perform poorly under data-constrained supervised adaptation, due to either homogeneous experts or overly disruptive perturbations to pretrained parameters. In this setting, effective upcycling must leverage pretrained weight structure while introducing sufficient diversity among routed experts. To this end, we propose SVD-Partitioned Residual Initialization (SPRI), which distributes SVD-partitioned residuals derived from pretrained feed-forward network (FFN) weights across routed experts, introducing controlled expert diversity grounded in pretrained spectral structure. We further introduce a two-stage training strategy to improve adaptation stability. We evaluate SPRI on multilingual speech-to-text translation, where limited supervised data challenges MoE upcycling and multiple target languages provide natural routing heterogeneity. On CoVoST2 across 15 En-to-XX directions, SPRI improves average BLEU and COMET over fully fine-tuned dense models by 2.58 and 3.32 points, respectively, and outperforms the prior best MoE upcycling baseline by 3.39 BLEU and 4.34 COMET points.
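The NumPy sketch below shows one possible reading of SVD-partitioned residual initialization. The leading singular components of a pretrained FFN weight are shared by every expert. The remaining residual components are dealt round-robin across experts. The sharing split, the round-robin assignment, and the function name are hypothetical; the paper's partitioning and scaling may differ.

```python
import numpy as np

def svd_partitioned_experts(W, n_experts, n_shared):
    """Hypothetical SVD partition: the top n_shared components are shared by every expert,
    and the remaining (residual) components are split round-robin across experts."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    shared = (U[:, :n_shared] * S[:n_shared]) @ Vt[:n_shared]
    residual_ids = np.arange(n_shared, len(S))
    experts, slices = [], []
    for e in range(n_experts):
        ids = residual_ids[e::n_experts]        # this expert's slice of the residual spectrum
        R_e = (U[:, ids] * S[ids]) @ Vt[ids]
        slices.append(R_e)
        experts.append(shared + R_e)            # diverse experts grounded in W's own spectrum
    return experts, shared, slices
```

Because the residual slices partition the spectrum, `shared + sum(slices)` reconstructs `W` exactly. Each expert differs from the others only in spectral directions that the pretrained weight already contains.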

2606.16575 2026-06-16 cs.LG math-ph math.MP 新提交

RepNet: Tackling spectral bias in deep neural networks via parameter reparameterization

RepNet:通过参数重参数化解决深度神经网络中的谱偏差

Yong Wang, Tao Zhou, Xuhui Meng

发表机构 * Institute of Interdisciplinary Research for Mathematics and Applied Science, School of Mathematics and Statistics, Huazhong University of Science and Technology(华中科技大学数学与统计学院交叉科学与应用数学研究所) Institute of Computational Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences(中国科学院数学与系统科学研究院计算数学研究所)

AI总结 针对深度神经网络在捕捉振荡和多尺度行为时的谱偏差问题,提出RepNet模型,通过重参数化第一隐藏层的权重和偏置,有效控制初始斜率尺度和分区点分布,实现自适应频率缩放,在函数逼近、PDE求解和算子学习中显著提升精度。

详情
AI中文摘要

深度神经网络(DNN)在科学计算中取得了显著成功,但在捕捉振荡和多尺度行为时常常受到谱偏差的影响。在本研究中,我们通过考察浅层ReLU神经网络在高频函数拟合中的失败来探究这一局限性。这一观察识别出解决快速振荡的两个重要因素:初始斜率尺度和网络诱导的分区点分布。受此分析启发,我们提出了RepNet,一种针对ReLU和tanh网络的重参数化DNN模型,专为高频和多尺度问题设计。关键思想是重参数化第一隐藏层的权重和偏置,从而能够有效控制初始斜率尺度并提供合适的初始分区点分布。此外,将重参数化的权重和偏置视为可训练参数,使得DNN在训练过程中实现自适应频率缩放。我们还推导了重参数化DNN的输出和斜率幅度的定量估计,以指导所提方法的初始化。数值实验,包括多尺度一维和四维函数逼近、结合物理信息神经网络(PINN)的正向和逆向PDE问题以及算子学习,表明RepNet在略微增加计算成本的情况下,提高了普通DNN在捕捉高度振荡特征时的预测精度。这些结果表明,RepNet为克服谱偏差并将DNN应用于多尺度问题提供了一种有效且灵活的方法。

英文摘要

Deep neural networks (DNNs) have achieved remarkable success in scientific computing, yet they often suffer from spectral bias in capturing oscillatory and multiscale behaviors. In this study, we investigate this limitation by examining the failure of shallow ReLU neural networks in fitting high-frequency functions. This observation identifies two important factors in resolving rapid oscillations: the initial slope scale and the distribution of partition points induced by the networks. Motivated by this analysis, we propose RepNet, a reparameterized DNN model for ReLU and tanh networks designed for high-frequency and multiscale problems. The key idea is to reparameterize the weights and biases in the first hidden layer, which enables effective control of the initial slope scale and provides an appropriate distribution of the initial partition points. Furthermore, treating the reparameterized weights and biases as trainable parameters allows the DNN to achieve adaptive frequency scaling during training. In addition, we derive quantitative estimates for the output and slope magnitudes of the reparameterized DNN to guide the initialization of the proposed method. Numerical experiments, including multiscale one- and four-dimensional function approximation, forward and inverse PDE problems in combination with physics-informed neural networks (PINNs), and operator learning, demonstrate that RepNet improves the prediction accuracy of vanilla DNNs in capturing highly oscillatory features at a slightly increased computational cost. These results indicate that RepNet provides an effective and flexible approach for overcoming spectral bias and applying DNNs to multiscale problems.
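The two factors named above, slope scale and partition points, can be made explicit for a shallow ReLU network on a 1-D input. There, neuron i has its kink at -b_i / w_i. The parameterization below (w_i = slope_scale * sign_i, b_i = -w_i * c_i) is a hypothetical illustration of controlling both quantities, not RepNet's published formula.

```python
import numpy as np

def reparam_first_layer(n_hidden, slope_scale, lo, hi, rng):
    """Hypothetical reparameterisation of a first ReLU layer on 1-D input:
    w_i = slope_scale * sign_i and b_i = -w_i * c_i, so each neuron's kink
    (partition point) sits exactly at c_i and slope_scale sets |w_i|."""
    c = np.linspace(lo, hi, n_hidden)          # partition points spread over the domain
    sign = rng.choice([-1.0, 1.0], size=n_hidden)
    w = slope_scale * sign
    b = -w * c
    return w, b, c

def shallow_relu(x, w, b, a):
    # f(x) = sum_i a_i * relu(w_i x + b_i) for scalar inputs x of shape (n,)
    return np.maximum(np.outer(x, w) + b, 0.0) @ a
```

If `c` and `slope_scale` are trained instead of `w` and `b`, the kink locations and steepness can move directly during training. That is one way to picture the adaptive frequency scaling the abstract describes.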

2606.16620 2026-06-16 cs.LG cs.AI 新提交

Entropy-Gated Latent Recursion

熵门控潜在递归

Soham Bhattacharjee, Dushyant Singh Chauhan, Salem Lahlou, Martin Takac, Nils Lukas

发表机构 * Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学)

AI总结 提出熵门控潜在递归(EGLR),通过在高不确定性token处递归应用冻结模型顶层解码器,构建与温度采样正交的确定性采样轴,扩展推理时缩放空间,在数学推理任务中显著提升性能。

详情
AI中文摘要

推理时缩放已成为改进语言模型推理能力的主要手段,但现有方法的展开多样性仅来源于单一来源:随机token级采样。我们认为这种单轴采样空间本质上是受限的,并识别出第二个完全确定且互补的轴:在冻结模型的顶层解码器层在高不确定性token处递归重新应用的层跨度$L$。不同的$L$选择会产生不同的展开,解决不同的问题子集,且无需随机性。我们通过熵门控潜在递归(EGLR)实例化这一轴,这是一种无需训练的解码过程,它重新应用顶层$L$层最多$K_{\max}$次迭代,直到下一个token分布收敛。结合$T$个温度采样,EGLR将单轴随机展开池转变为$L\times T$笛卡尔采样空间,且几乎不增加每次展开的成本。我们在8个指令微调模型和6个数学推理基准上表征了这一空间,并表明$L$轴与温度确实互补:在MATH-500上使用Qwen2.5-3B-Instruct时,联合$L\times T$预言机达到91.6%,比仅温度预言机(83.4%)高出8.2个百分点,比仅层预言机(81.2%)高出10.4个百分点,证实两个轴捕获了真正互补的问题。扩展的展开池为任何下游过程(包括自一致性、带验证器的最佳$N$选择和组相对RL训练(GRPO))提供了更丰富的每个提示候选,开辟了不依赖随机噪声的推理时缩放新方向。

英文摘要

Inference-time scaling has become the dominant lever for improving language-model reasoning, but existing methods derive rollout diversity from a single source: stochastic token-level sampling. We argue that this single-axis sampling space is fundamentally limiting, and identify a second, fully deterministic and complementary axis: the layer span $L$ at which a frozen model's top decoder layers are recursively re-applied at high-uncertainty tokens. Different choices of $L$ produce distinct rollouts that solve different subsets of problems, with no stochasticity. We instantiate this axis through Entropy-Gated Latent Recursion (EGLR), a training-free decoding procedure that re-applies the top-$L$ layers for at most $K_{\max}$ iterations until the next-token distribution converges. Combined with $T$ temperature samples, EGLR turns a single-axis stochastic rollout pool into an $L\times T$ Cartesian sampling space at almost the same per-rollout cost. We characterize this space across $8$ instruction-tuned models and $6$ math reasoning benchmarks, and show that the $L$-axis is genuinely complementary to temperature: on MATH-500 with Qwen2.5-3B-Instruct, the joint $L\times T$ oracle reaches $91.6\%$, $+8.2$ percentage points beyond the temperature-only oracle ($83.4\%$) and $+10.4$ points beyond the layer-only oracle ($81.2\%$), confirming that the two axes capture genuinely complementary problems. The expanded rollout pool provides richer per-prompt candidates for any downstream procedure that consumes rollouts, including self-consistency, best-of-$N$ with verifiers, and group-relative RL training (GRPO), opening a new direction for inference-time scaling that does not rely on stochastic noise.
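The sketch below is a toy version of one EGLR decoding step. It assumes the model is a list of callables plus an output head. The top-L layers are re-applied only when next-token entropy exceeds a gate `tau`, for at most `k_max` iterations, stopping once the distribution changes by less than `eps`. The entropy threshold, the convergence test, and all names are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def eglr_next_token(x, layers, head, L, tau, k_max, eps=1e-3):
    """Hypothetical EGLR decoding step on a toy model: run all layers once; if the
    next-token entropy exceeds tau, re-apply the top-L layers up to k_max times,
    stopping early once the distribution stops changing (max abs change < eps).
    Returns the final distribution and the number of recursions used."""
    h = x
    for layer in layers:
        h = layer(h)
    p = softmax(head(h))
    if entropy(p) <= tau:
        return p, 0                      # confident token: no recursion
    for k in range(1, k_max + 1):
        for layer in layers[-L:]:
            h = layer(h)
        p_new = softmax(head(h))
        converged = np.max(np.abs(p_new - p)) < eps
        p = p_new
        if converged:
            return p, k
    return p, k_max
```

Running this deterministic step for several values of `L`, and pairing each with `T` temperature samples, gives the L×T rollout grid described above.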

2606.16639 2026-06-16 cs.LG 新提交

SPICE: Synergy and Partial Information Based Curriculum Evolution

SPICE: 基于协同与部分信息的课程演化

Ankush Pratap Singh, Houwei Cao, Yong Liu

发表机构 * New York Institute of Technology(纽约理工学院) New York University(纽约大学)

AI总结 提出SPICE框架,利用部分信息分解理论动态量化样本复杂度,设计渐进式课程使模型从学习共享跨模态线索过渡到模态特定模式再到复杂协同交互,在多个多模态基准上取得一致改进。

详情
AI中文摘要

多模态学习利用异构模态间的互补信息。每种模态的信息量在不同样本和训练阶段可能差异很大。现有的多模态课程学习策略通常假设样本的相对复杂度在训练过程中保持不变,因此无法适应模型的演化。我们提出了SPICE(基于协同与部分信息的课程演化),一种新颖的渐进式课程框架,用于多模态交互学习。在部分信息分解(PID)理论的指导下,我们的方法将多模态交互分解为冗余、独特和协同信息成分,从而实现对样本复杂度的可解释且动态的表征。基于这种分解,我们设计了一个在训练过程中不断演化的渐进式课程,使模型能够从学习共享的跨模态线索过渡到模态特定模式,最后到复杂的协同交互。为了适应模型演化,样本排序通过从单模态和多模态预测中得出的PID信息估计进行实时优化。在多个多模态基准上的实验表明,与传统训练和最先进基线相比,该方法取得了持续改进,凸显了PID信息分解和自适应样本排序在多模态课程学习中的有效性。

英文摘要

Multimodal learning exploits complementary information across heterogeneous modalities. The informativeness of each modality can vary widely across samples and training stages. Existing multimodal curriculum learning strategies often assume that the relative complexity of samples remains unchanged throughout training and therefore cannot adapt to model evolution. We propose SPICE (Synergy and Partial Information based Curriculum Evolution), a novel progressive curriculum framework for multimodal interaction learning. Guided by Partial Information Decomposition (PID) theory, our approach decomposes multimodal interactions into redundant, unique, and synergistic information components, enabling an interpretable and dynamic characterization of sample complexity. Building on this decomposition, we design a progressive curriculum that evolves throughout training, allowing the model to transition from learning shared cross-modal cues to modality-specific patterns and, finally, to complex synergistic interactions. To adapt to model evolution, sample ordering is refined in real time using PID information estimates derived from unimodal and multimodal predictions. Experiments across multiple multimodal benchmarks demonstrate consistent improvements over conventional training and state-of-the-art baselines, highlighting the effectiveness of PID information decomposition and adaptive sample ordering for multimodal curriculum learning.
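SPICE's estimator works from model predictions. As a simple stand-in, the sketch below computes the minimum-mutual-information (MMI) PID of a known discrete distribution over two modalities and a label, taking redundancy as min(I(X1;Y), I(X2;Y)). This is a swapped-in textbook PID, not the paper's estimator.

```python
from collections import defaultdict
from math import log2

def mutual_info(joint):
    """I(A;Y) in bits from a dict {(a, y): p}."""
    pa, py = defaultdict(float), defaultdict(float)
    for (a, y), p in joint.items():
        pa[a] += p
        py[y] += p
    return sum(p * log2(p / (pa[a] * py[y])) for (a, y), p in joint.items() if p > 0)

def mmi_pid(joint):
    """Minimum-mutual-information PID of {(x1, x2, y): p} into
    redundant, unique_1, unique_2 and synergistic parts (bits)."""
    j1, j2, j12 = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x1, x2, y), p in joint.items():
        j1[(x1, y)] += p
        j2[(x2, y)] += p
        j12[((x1, x2), y)] += p
    i1, i2, i12 = mutual_info(j1), mutual_info(j2), mutual_info(j12)
    red = min(i1, i2)
    u1, u2 = i1 - red, i2 - red
    syn = i12 - red - u1 - u2  # whatever the pair explains beyond the parts
    return {"redundant": red, "unique_1": u1, "unique_2": u2, "synergy": syn}
```

With XOR labels, neither modality alone is informative and all 1 bit is synergy. If the label copies X1, the 1 bit is unique to X1. A curriculum could order samples from redundancy-dominated to synergy-dominated ones, in the spirit of the progression described above.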

2606.16694 2026-06-16 cs.LG cs.AI physics.app-ph q-bio.NC 新提交

Adaptive inference and function vectors in deep transformers

深度变换器中的自适应推理与函数向量

Ravin Raj, Gautam Reddy

发表机构 * Joseph Henry Laboratories of Physics, Princeton University(普林斯顿大学约瑟夫·亨利物理实验室)

AI总结 提出深度变换器作为平均场交互系统实现分布式推理的理论,利用函数向量逐层推断潜在上下文变量,在上下文回归任务中预测非高斯分层结构与深度的关系,并通过约束线性注意力变换器验证。

详情
AI中文摘要

变换器被广泛用作学习大量耦合变量间复杂相关性的通用基础架构,但其内部机制仍不明确。我们提出了一种深度变换器作为平均场交互系统的理论,该系统在通信、局部性和深度约束下实现分布式推理。我们证明,这样的系统可以利用内部状态表示(“函数向量”)在其层上以越来越精细的尺度推断潜在上下文变量。在上下文回归任务中,该理论预测了潜在上下文变量中的非高斯分层结构与变换器深度之间的非平凡关系。使用约束线性注意力变换器对预测进行了测试,并展示了深度架构中的自适应推理。前馈模块和深度使变换器能够实现比先前描述的更丰富的上下文学习算法类别。

英文摘要

Transformers are widely used as a general-purpose substrate for learning complex correlations between a large collection of coupled variables, but their internal mechanisms have remained mysterious. We introduce a theory of a deep transformer as a mean-field interacting system that implements distributed inference, subject to constraints on communication, locality and depth. We show that such a system can exploit internal state representations ('function vectors') to infer a latent context variable at increasingly finer scales over its layers. In an in-context regression task, the theory predicts a non-trivial relationship between non-Gaussian, hierarchical structure in the latent context variable, and transformer depth. Predictions are tested using constrained linear attention transformers and demonstrate adaptive inference in deep architectures. Feedforward blocks and depth enable transformers to implement a much richer class of in-context learning algorithms than previously described.

2606.16768 2026-06-16 cs.LG 新提交

Taming Curvature: Architecture Warm-Up for Stable Transformer Training

驯服曲率:稳定Transformer训练的架构预热

Sameera Ramasinghe, Ajanthan Thalaiyasingam, Hadi Mohaghegh Dolatabadi, Chamin Hewa Koneputugodage, Gil Avraham, Violetta Shevchenko, Yan Zuo, Karol Pajak, Alexander Long

发表机构 * Pluralis Research

AI总结 提出基于热启动幂迭代的快速在线曲率估计方法,并发现训练不稳定性与预条件曲率激增相关,进而提出渐进增加网络深度的架构预热策略,有效稳定大模型训练。

详情
AI中文摘要

训练数十亿参数的Transformer通常很脆弱,会出现瞬时的损失尖峰和发散,浪费计算资源。尽管最近发展的边缘稳定性(EoS)理论通过(预条件)曲率提供了理解和控制优化方法稳定性的强大工具,但由于曲率估计的复杂性,这些曲率控制方法在大规模Transformer训练中并不流行。为此,我们首先引入一种基于热启动变体的快速在线估计器,用于估计最大的(预条件)Hessian特征值(即曲率),该估计器使用Hessian-向量积进行幂迭代。我们从理论上证明,并通过实验验证,所提出的方法在十亿参数规模下使每次迭代的曲率跟踪变得可行,同时更加准确。利用这一工具,我们发现训练不稳定性与预条件曲率的激增同时发生,并且曲率随深度增加而增长。基于这些观察,我们提出架构预热:逐步增加网络深度,以仔细控制预条件Hessian并稳定训练。在大规模Transformer上的实验验证了我们的方法能够实现高效的曲率跟踪,并在不减慢收敛速度的情况下,与现有最先进的稳定技术相比减少了不稳定性。

英文摘要

Training billion-parameter Transformers is often brittle, with transient loss spikes and divergence that waste compute. Even though the recently developed Edge of Stability (EoS) theory provides a powerful tool to understand and control the stability of optimization methods via the (preconditioned) curvature, these curvature-controlling methods are rarely used in large-scale Transformer training due to the complexity of curvature estimation. To this end, we first introduce a fast online estimator of the largest (preconditioned) Hessian eigenvalue (i.e., curvature) based on a warm-started variant of power iteration with Hessian-vector products. We show theoretically, and verify empirically, that the proposed method makes per-iteration curvature tracking feasible at billion-parameter scale while being more accurate. Using this tool, we find that training instabilities coincide with surges in preconditioned curvature and that curvature grows with depth. Motivated by these observations, we propose architecture warm-up: progressively growing network depth to carefully control the preconditioned Hessian and stabilize training. Experiments on large Transformers validate that our approach enables efficient curvature tracking and reduces instabilities compared to existing state-of-the-art stabilization techniques without slowing down convergence.
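Below is a sketch of warm-started power iteration on a Hessian-vector-product oracle in NumPy. The `hvp` callable stands in for an autodiff HVP; a preconditioned estimate would apply P^{-1/2} H P^{-1/2} instead. Reusing the previous step's vector is what lets a few iterations per step suffice when the Hessian drifts slowly. The linear depth schedule in `active_depth` is a hypothetical example of architecture warm-up, not the paper's schedule.

```python
import numpy as np

def top_eigenvalue(hvp, v0, iters):
    """Power iteration on a Hessian-vector-product oracle, started from v0.
    Returns the Rayleigh-quotient estimate of the dominant (largest-magnitude)
    eigenvalue and the final unit vector, which can warm-start the next call."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(iters):
        hv = hvp(v)
        v = hv / np.linalg.norm(hv)
    return float(v @ hvp(v)), v

def active_depth(step, total_layers, start_layers, grow_every):
    """Hypothetical linear depth schedule for architecture warm-up."""
    return min(total_layers, start_layers + step // grow_every)
```

A cold start needs many iterations. After that, passing the previous vector as `v0` with a handful of iterations tracks a slowly changing top eigenvalue at a few HVPs per training step.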

2606.16899 2026-06-16 cs.LG 新提交

Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization

奇妙预训练优化器及其发现之处 II:超球优化

Kaiyue Wen, Xingyu Dang, Kaifeng Lyu, Tengyu Ma, Percy Liang

发表机构 * Stanford University(斯坦福大学) Princeton University(普林斯顿大学) Tsinghua University(清华大学)

AI总结 针对Muon等优化器在大模型预训练中增益随规模增大而减弱的问题,提出Hyperball包装器,固定权重矩阵及其更新的Frobenius范数,在1.2B参数模型上实现20-30%的token等效加速,并改善学习率迁移。

Comments Corresponding blog post: https://psychedelic-sunstone-851.notion.site/Fantastic-Pretraining-Optimizers-and-Where-to-Find-Them-2-1-Hyperball-Optimization-2e924306e6f280e7a5ffee00eb40a0dd

详情
AI中文摘要

基于矩阵的优化器(如Muon)可以显著加速语言模型预训练,但观察到当使用标准常数解耦权重衰减时,随着模型大小和数据规模的增长,它们相对于AdamW的增益会缩小。我们提出Hyperball,一个简单的优化器包装器来解决这个问题。给定一个基础优化器(如Adam或Muon),Hyperball将权重矩阵的Frobenius范数及其对应的优化器更新设置为固定常数。在高达1.2B参数的Qwen3风格模型上,Muon Hyperball相对于权重衰减基线实现了20-30%的token等效加速。与解耦权重衰减相比,Hyperball还改善了跨宽度和深度的学习率迁移。该方法的动机来自先前的理论,该理论表明使用权重衰减训练会导致一个仅依赖于训练超参数的平衡权重范数。通过这种机制,权重衰减决定了角度学习率,即权重矩阵方向变化的速度。

英文摘要

Matrix-based optimizers such as Muon can substantially speed up language model pretraining, but their gains over AdamW are observed to shrink as model size and data scale grow when standard constant decoupled weight decay is used. We propose Hyperball, a simple optimizer wrapper that addresses this issue. Given a base optimizer such as Adam or Muon, Hyperball sets the Frobenius norms of weight matrices and their corresponding optimizer updates to fixed constants. On Qwen3-style models up to 1.2B parameters, Muon Hyperball achieves a 20-30% token-equivalent speedup over weight decay baselines. Hyperball also improves learning rate transfer across widths and depths compared to decoupled weight decay. This method is motivated by prior theory showing that training with weight decay leads to an equilibrium weight norm that depends only on the training hyperparameters. Through this mechanism, weight decay then determines the angular learning rate, i.e., how fast the direction of the weight matrix changes.
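Here is a minimal sketch of the wrapper idea. It assumes the simplest ordering: rescale the update, apply it, then rescale the weights. The constants `weight_radius` and `update_radius` are hypothetical names. The paper's exact projection and how it chooses the constants may differ.

```python
import numpy as np

def hyperball_step(W, base_update, lr, weight_radius, update_radius):
    """Hypothetical Hyperball-style step: rescale the base optimiser's update to a fixed
    Frobenius norm, apply it, then project the weight back onto ||W||_F = weight_radius."""
    u = base_update * (update_radius / np.linalg.norm(base_update))
    W_new = W - lr * u
    return W_new * (weight_radius / np.linalg.norm(W_new))
```

Because both norms are pinned, the magnitude of the base update drops out. The per-step change in direction is then bounded by roughly `lr * update_radius / weight_radius`, which plays the role of the angular learning rate mentioned above.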

2606.16900 2026-06-16 cs.LG 新提交

Factorized Neural Operators Decompose Dynamic and Persistent Responses

因子化神经算子分解动态与持久响应

Hao Tang, Yuechen Duan, Jiongyu Zhu, Zimeng Feng, Hao Li, Chao Li

发表机构 * School of Medicine, University of Dundee(邓迪大学医学院) School of Data Science, Fudan University(复旦大学数据科学学院) School of Mathematical Sciences, Fudan University(复旦大学数学科学学院) Institute of Science and Technology for Brain-inspired Intelligence, Fudan University(复旦大学类脑智能科学与技术研究院) School of Science and Engineering, University of Dundee(邓迪大学科学与工程学院) Department of Applied Mathematics and Theoretical Physics, University of Cambridge(剑桥大学应用数学与理论物理系)

AI总结 提出因子化神经算子(FaNO),通过分解谱表示为等变动态响应和不变持久响应,提升多尺度物理系统的预测精度、参数效率和跨尺度泛化能力。

详情
AI中文摘要

物理系统通常表现出异质性机制,其中快速演变的动力学与持久结构共存。现有的神经算子通常依赖单一主导归纳偏置,因此将不同的物理响应耦合到共享表示中,难以捕捉这种多尺度物理行为。我们引入了跨域的统一格林函数框架,并提出了因子化神经算子(FaNO),它将谱表示分解为等变动态响应和不变持久响应,从而提高了可解释性和泛化能力。从机制上讲,我们展示了两个算子分支自发地特化为不同的物理角色,这些角色在尺度和域上保持一致:等变分支捕捉快速变化的瞬态动力学,而不变分支提取连贯的持久结构。FaNO的这种因子化机制提高了跨物理系统和域的预测精度、参数效率和跨尺度泛化能力。特别是,它在长时程自回归滚动、跨分辨率外推和物理状态转移下保持一致的预测。这些发现表明,可扩展的物理建模可能受益于从单一归纳偏置公式转向更好地反映物理系统异质性组织的因子化算子表示,从而加速机器学习在科学计算和发现中的可靠部署。

英文摘要

Physical systems often exhibit heterogeneous mechanisms, where rapidly evolving dynamics coexist with persistent structures. Capturing such multiscale physical behavior remains challenging for existing neural operators, which typically rely on a single dominant inductive bias and therefore couple distinct physical responses into a shared representation. We introduce the Unified Green's Function Framework across domains and propose Factorized Neural Operators (FaNO), which decompose spectral representations into equivariant dynamic responses and invariant persistent responses, leading to better interpretability and generalization. Mechanistically, we show that the two operator branches spontaneously specialize into distinct physical roles that remain consistent across scales and domains: the equivariant branch captures rapidly varying transient dynamics, whereas the invariant branch extracts coherent persistent structures. This factorized mechanism of FaNO improves prediction accuracy, parameter efficiency and cross-scale generalization across physical systems and domains. In particular, it maintains consistent predictions under long-horizon autoregressive rollout, cross-resolution extrapolation and physical-regime shifts. These findings suggest that scalable physical modeling may benefit from moving beyond single-inductive-bias formulations toward factorized operator representations that better reflect the heterogeneous organization of physical systems, accelerating the reliable deployment of machine learning for scientific computing and discovery.
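The equivariant/invariant split can be illustrated with two toy spectral maps on a periodic 1-D signal. A pointwise multiplier in Fourier space commutes with circular shifts, while Fourier magnitudes ignore them. These functions only illustrate the two symmetry properties; they are not FaNO's branches or its Green's-function framework.

```python
import numpy as np

def equivariant_branch(x, multiplier):
    # Pointwise spectral multiplication = circular convolution: commutes with shifts.
    return np.fft.irfft(np.fft.rfft(x) * multiplier, n=len(x))

def invariant_branch(x, n_modes):
    # Fourier magnitudes of the lowest modes: unchanged by circular shifts.
    return np.abs(np.fft.rfft(x))[:n_modes]
```

Shifting the input shifts the first output by the same amount and leaves the second unchanged. This is the sense in which one branch can track moving transients while the other summarizes persistent structure.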

2606.16939 2026-06-16 cs.LG cs.AI 新提交

Scalable Circuit Learning for Interpreting Large Language Models

可扩展的电路学习用于解释大型语言模型

Naiyu Yin, Dennis Wei, Tian Gao, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy, Yue Yu

AI总结 提出CircuitLasso方法,基于稀疏线性回归高效学习LLM中的稀疏电路,以SAE特征为单元,在保持结构准确性的同时大幅降低计算成本,并揭示语义特征传播机制。

Comments Accepted to the Mechanistic Interpretability Workshop at ICML 2026

详情
AI中文摘要

机械可解释性中的一个重要研究方向是学习LLM组件上的稀疏电路,以揭示它们如何共同产生模型行为。然而,原始神经元具有多语义性,使得学习到的电路难以解释。稀疏自编码器(SAE)特征缓解了这一问题,但其高维度使得现有的基于干预的电路学习方法在计算上变得不可行。我们提出了CircuitLasso,一种基于稀疏线性回归的可扩展电路学习方法。CircuitLasso恢复的电路在基准数据上的结构准确性与最先进的基于干预的方法相匹配,而计算成本仅为后者的一小部分。为了可解释性,CircuitLasso高效地揭示了SAE特征之间的关系,展示了人类可解释的语义特征如何通过模型传播并影响其预测。最后,我们通过利用所学电路的见解,在领域泛化任务上以显著更低的成本实现了相当的性能,从而验证了所学电路的实用性。

英文摘要

A prominent research direction in mechanistic interpretability is learning sparse circuits over LLM components to reveal how they jointly produce model behavior. However, raw neurons are polysemantic, making learned circuits hard to interpret. Sparse autoencoder (SAE) features alleviate this, but their high dimensionality makes existing intervention-based circuit learning methods computationally prohibitive. We propose CircuitLasso, a scalable circuit-learning approach based on sparse linear regression. CircuitLasso recovers circuits whose structural accuracy matches that of state-of-the-art intervention-based methods on the benchmark data, at a fraction of the computational cost. For interpretability, CircuitLasso efficiently uncovers relationships among SAE features, showing how human-interpretable semantic features propagate through the model and influence its predictions. Finally, we validate the utility of our learned circuits by leveraging their insights to achieve comparable performance at substantially lower cost on a domain-generalization task.
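CircuitLasso's core step is sparse linear regression. The sketch below solves the Lasso with ISTA (proximal gradient), a solver chosen here only to keep the example self-contained. Nonzero coefficients are read as candidate edges from upstream SAE features to one downstream feature. The synthetic data and the 0.1 edge threshold are illustrative assumptions.

```python
import numpy as np

def lasso_ista(X, y, lam, iters=2000):
    """Solve min_w (1/2n)||y - Xw||^2 + lam * ||w||_1 with ISTA (proximal gradient)."""
    n = X.shape[0]
    step = n / np.linalg.eigvalsh(X.T @ X).max()   # 1 / Lipschitz constant of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return w
```

Each downstream feature needs one regression over cached activations. This is far cheaper than running an intervention per candidate edge, which is the cost argument made above.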

2606.16979 2026-06-16 cs.LG 新提交

Scalable Pairwise Kernel Learning with Stochastic Vec Trick

可扩展的成对核学习与随机Vec技巧

Napsu Karmitsa, Tapio Pahikkala, Antti Airola

发表机构 * Department of Computing, University of Turku(图尔库大学计算系)

AI总结 提出SPaiK方法,利用随机广义vec技巧(sGVT)实现成对核学习的大规模扩展,在七个药物-靶标亲和力数据集上优于现有方法。

详情
AI中文摘要

成对学习是一种特殊形式的监督学习,专注于预测对象对的结果。在这项工作中,我们引入了SPaiK,一种针对成对设置的新可扩展核学习方法。我们的方法保留了核方法的表达能力,同时大幅降低了计算和内存需求。关键创新是随机广义vec技巧(sGVT),它是稀疏Kronecker积乘法算法的随机扩展,能够使用成对核进行高效的大规模训练。通过结合sGVT,SPaiK使得将基于核的成对学习应用于以前无法达到的大规模数据集成为可能。我们在七个真实的药物-靶标亲和力数据集上评估了SPaiK的性能,并将结果与成对学习中的最新方法进行了比较。

英文摘要

Pairwise learning is a specialized form of supervised learning that focuses on predicting outcomes for pairs of objects. In this work, we introduce SPaiK, a new scalable kernel learning method tailored for pairwise settings. Our approach preserves the expressive power of kernel methods while substantially reducing computational and memory requirements. The key innovation is the stochastic generalized vec trick (sGVT), a stochastic extension of the sparse Kronecker product multiplication algorithm, which enables efficient large-scale training with pairwise kernels. By incorporating sGVT, SPaiK makes it possible to apply kernel-based pairwise learning to datasets of a size previously out of reach. We evaluate the performance of SPaiK on seven real-world drug-target affinity datasets and compare the results with state-of-the-art methods in pairwise learning.
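The generalized vec trick computes a Kronecker-kernel product over labeled pairs without forming Kd ⊗ Kt. In this NumPy sketch, `gvt_kron_matvec` is an exact GVT-style product over arbitrary row and column pairs. `stochastic_gvt` is a hypothetical unbiased variant that samples training pairs; the paper's sGVT may organize the sampling differently.

```python
import numpy as np

def gvt_kron_matvec(Kd, Kt, row_d, row_t, col_d, col_t, a):
    """u_i = sum_j Kd[row_d[i], col_d[j]] * Kt[row_t[i], col_t[j]] * a[j]
    without materialising the Kronecker product kernel Kd (x) Kt."""
    # Step 1: B[d', t] = sum over j with col_d[j] = d' of Kt[t, col_t[j]] * a[j]
    B = np.zeros((Kd.shape[0], Kt.shape[0]))
    np.add.at(B, col_d, (Kt[:, col_t] * a).T)
    # Step 2: u_i = sum_{d'} Kd[row_d[i], d'] * B[d', row_t[i]]
    return np.sum(Kd[row_d] * B[:, row_t].T, axis=1)

def stochastic_gvt(Kd, Kt, row_d, row_t, col_d, col_t, a, batch, rng):
    """Hypothetical stochastic variant: unbiased estimate from a random subset of columns."""
    idx = rng.choice(len(a), size=batch, replace=False)
    scale = len(a) / batch   # each column is kept with probability batch / len(a)
    return scale * gvt_kron_matvec(Kd, Kt, row_d, row_t, col_d[idx], col_t[idx], a[idx])
```

The exact product costs on the order of (number of pairs) × (number of drugs + number of targets) operations. The explicit Kronecker matrix has (drugs × targets)² entries.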

2606.17028 2026-06-16 cs.LG cs.AI cs.AR 新提交

HAMON: Passive Optical Sequence Mixing for Long-Horizon Forecasting

HAMON: 用于长程预测的无源光学序列混合

Alper Yıldırım

AI总结 提出HAMON无源衍射光学预测核心,通过光学传播替代数字序列混合层,在多个基准上优于或接近最强数字基线,MSE最多降低14%。

详情
AI中文摘要

简单的线性模型和频域模型在长程时间序列预测中仍然出奇地具有竞争力,最近的机制证据表明,标准预测基准可能不需要使Transformer在其他领域强大的密集叠加表示。这引发了一个底层问题:如果核心预测算子通常是低复杂度的且近似线性,它是否需要被实现为学习到的数字时间混合?我们引入了HAMON,一种无源衍射光学预测核心,其中历史值被编码到光学孔径上,未来位置保持暗场,级联的可训练相位掩模与自由空间衍射直接在输出场中形成预测。在推理时,预测由单个无源光学传播过程完成,无需可训练的数字序列混合层。在标准基准上,HAMON在ETTm2的所有预测长度和ETTh2除最长预测长度外的所有长度上优于考虑的最强数字基线,MSE最多降低14%,并且在不同预测长度上一致地优于基线,而非孤立点。它在Weather上具有竞争力,在其余ETT设置以及高通道数的Traffic和Electricity数据集上略逊于最强基线。相位编码、强度兼容读出和相位扰乱消融实验,以及TorchOptics交叉模拟检查表明,预测来自承载数据的光场而非数字预测头。由于无源核心使用标准傅里叶光学,HAMON为光学硬件和无源物理序列混合定义了一个具体目标。

英文摘要

Simple linear and frequency-domain models remain surprisingly competitive in long-horizon time-series forecasting, and recent mechanistic evidence suggests that standard forecasting benchmarks may not require the dense superposed representations that make transformers powerful in other domains. This raises a substrate-level question: if the core forecasting operator is often low-complexity and approximately linear, does it need to be implemented as learned digital temporal mixing? We introduce HAMON, a passive diffractive optical forecasting core in which historical values are encoded onto an optical aperture, future positions are left dark, and cascaded trainable phase masks with free-space diffraction shape the forecast directly in the output field. At inference, prediction is performed by a single passive optical propagation pass with no trainable digital sequence-mixing layer. Across standard benchmarks, HAMON outperforms the strongest digital baselines considered on ETTm2 at all horizons and on ETTh2 at all but the longest horizon, improving MSE by up to 14% and doing so consistently across horizons rather than at isolated points. It is competitive on Weather and trails the strongest baselines on the remaining ETT settings and on the high-channel-count Traffic and Electricity datasets. Phase encoding, intensity-compatible readout, and phase-scrambling ablations, together with a TorchOptics cross-simulator check, indicate that the forecasts arise from the data-bearing optical field rather than from a digital forecasting head. Because the passive core uses standard Fourier optics, HAMON defines a concrete target for optical hardware and for passive physical sequence mixing.
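Below is a toy 1-D version of the passive core. It assumes amplitude encoding of the history and paraxial Fresnel propagation between phase masks; the wavelength, distance, and pixel pitch in the test are arbitrary. The readout that turns the complex output field into real forecast values (for example an intensity-based one) is not modeled here.

```python
import numpy as np

def fresnel_propagate(field, wavelength, distance, pitch):
    """Paraxial free-space propagation of a 1-D field with the Fresnel transfer function
    H(f) = exp(-i * pi * wavelength * distance * f^2), a unit-modulus (lossless) filter."""
    f = np.fft.fftfreq(field.size, d=pitch)
    H = np.exp(-1j * np.pi * wavelength * distance * f ** 2)
    return np.fft.ifft(np.fft.fft(field) * H)

def diffractive_forecast(history, horizon, phase_masks, wavelength, distance, pitch):
    """Hypothetical passive core: amplitude-encode the history, leave future slots dark,
    then alternate trainable phase masks and free-space propagation."""
    field = np.concatenate([history, np.zeros(horizon)]).astype(complex)
    for phi in phase_masks:
        field = fresnel_propagate(field * np.exp(1j * phi), wavelength, distance, pitch)
    return field[-horizon:], field
```

Every stage is a phase-only multiplication or a lossless propagation, so optical power is conserved and nothing digital mixes the sequence. Only the phase values in `phase_masks` would be trained.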

2606.14757 2026-06-16 cs.CV cs.LG cross-list

Spatial Priors via Space Filling Curves for Small and Limited Data Vision Transformers

基于空间填充曲线的小型与有限数据视觉Transformer的空间先验

Leyla Naz Candogan, Arshia Afzal, Pol Puigdemont, Volkan Cevher

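Affiliations: ETH Zürich

AI summary: Proposes VIOLIN, a lightweight masked attention mechanism that encodes spatial structure via space-filling curves. It injects spatial inductive bias into Vision Transformers with minimal parameter and compute overhead and markedly improves performance for small models and in limited-data settings.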

Comments ICML 2026

AI-generated Chinese abstract

尽管视觉Transformer(ViT)已成为许多计算机视觉任务中的主导骨干网络,但由于置换等变性,其注意力机制缺乏显式的空间归纳偏置。这在模型容量小或训练数据有限的情况下尤为重要。受线性Transformer中的注意力掩码策略和视觉状态空间模型(SSM)的扫描模式的启发,我们引入了VIOLIN,一种轻量级掩码注意力机制,通过空间填充曲线(SFC)在注意力中编码空间结构,仅增加不到0.0015%的额外参数和可忽略的计算开销。VIOLIN使用多条SFC扫描图像,构建曲线特定的衰减掩码,然后将其组合并与注意力矩阵相乘。在广泛的评估中,VIOLIN持续提升性能。在有限数据场景下,例如在VTAB-1K上进行微调时,它提升了所有任务组的准确率,在空间信息至关重要的任务上提升高达8.7%。它可以与参数高效微调方法(如LoRA)结合,进一步提高性能。除了微调,VIOLIN在ImageNet-1K上预训练期间改进了各种小型ViT架构(如DeiT、DINO)。此外,在高度依赖位置信息的像素级CIFAR-100训练中,VIOLIN将准确率提升了高达7.2%。总体而言,VIOLIN提供了一种计算高效且有效的方式,将空间归纳偏置注入ViT,特别有利于小模型和有限数据场景。

English abstract

Although Vision Transformers (ViTs) have become the dominant backbone in many computer vision tasks, their attention mechanism is permutation-equivariant and therefore lacks explicit spatial inductive biases. This becomes particularly important in two settings: when model capacity is small or when training data is limited. Inspired by the attention masking strategies in Linear Transformers and the scanning patterns of Vision SSMs, we introduce VIOLIN, a lightweight masked attention mechanism that encodes spatial structure within attention via Space Filling Curves (SFCs) with less than 0.0015% extra parameters and negligible computational overhead. VIOLIN scans the image using multiple SFCs to construct curve-specific decay masks, which are then combined and multiplied with the attention matrix. Across a wide range of evaluations, VIOLIN consistently improves performance. In limited-data regimes such as fine-tuning on VTAB-1K, it boosts accuracy across all task groups, by up to 8.7% on the tasks where spatial information is essential. It can be combined with parameter-efficient fine-tuning methods such as LoRA to further increase performance. Beyond fine-tuning, VIOLIN improves various small-scale ViT architectures (e.g., DeiT, DINO) during pretraining on ImageNet-1K. Additionally, on pixel-level CIFAR-100 training, a task that is highly dependent on location information, VIOLIN increases accuracy by up to 7.2%. Overall, VIOLIN provides a computationally efficient yet effective way to inject spatial inductive bias into ViTs, especially benefiting small models and limited-data settings.
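The mask construction can be sketched in NumPy as follows. This is a sketch under stated assumptions, not VIOLIN's implementation. It assumes two curves (a Hilbert curve and its transpose), a fixed decay rate `gamma` (VIOLIN may learn its decay), averaging as the combination rule, and row renormalization after the mask multiplies the softmax attention. All of these choices are hypothetical.

```python
import numpy as np

def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve on an order x order grid to (x, y).
    Classic iterative algorithm; order must be a power of two."""
    x = y = 0
    s, t = 1, d
    while s < order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                          # rotate/flip the sub-quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def curve_positions(order, transpose=False):
    """pos[i] = position along the curve of patch i (row-major patch index)."""
    pos = np.empty(order * order, dtype=int)
    for d in range(order * order):
        x, y = hilbert_d2xy(order, d)
        if transpose:
            x, y = y, x
        pos[y * order + x] = d
    return pos

def sfc_decay_mask(order, gamma=0.9):
    """Average of curve-specific decay masks gamma ** |pos_i - pos_j|."""
    masks = []
    for transpose in (False, True):
        p = curve_positions(order, transpose)
        masks.append(gamma ** np.abs(p[:, None] - p[None, :]))
    return np.mean(masks, axis=0)

def masked_attention(q, k, v, mask):
    """Softmax attention whose weights are multiplied by the spatial mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    a = a * mask
    a /= a.sum(axis=-1, keepdims=True)       # renormalize rows (assumption)
    return a @ v
```

Patches that are close along several curves keep large mask values, so attention is biased toward spatial neighbours without adding any per-token parameters.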

2606.14770 2026-06-16 cs.CV cs.AI cs.IR cs.LG cross-list

An Empirical Analysis of Optimization Dynamics and Sparsity Boundaries in Large-Scale Pedestrian Attribute Recognition

大规模行人属性识别中的优化动态与稀疏边界实证分析

Houssam El Mir

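Affiliations: College of Computer Science and Technology, Zhejiang University of Technology

AI summary: Addresses extreme class imbalance in pedestrian attribute recognition with a calibrated multi-label focal loss configuration (alpha=0.50, gamma=2.0). This configuration matches the BCE baseline and improves hard-example mining at zero computational overhead. The work also identifies a Sparsity Wall boundary at positive-sample rates below 0.1%.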

AI-generated Chinese abstract

行人属性识别(PAR)对于视频监控至关重要,支持法医搜索和重识别系统。当将PETA和PA-100K合并为一个包含109,000张图像的复合语料库时,极端类别不平衡仍然是一个基本障碍,其中少数属性的正样本比例低于1%。这导致标准BCE优化抑制稀有特征,我们称之为多数负类欺骗陷阱。我们在ResNet-18骨干网络上对多标签焦点损失超参数(alpha和gamma)进行了系统消融。校准配置(alpha=0.50, gamma=2.0)实现了62.32%的宏F1分数,与BCE基线相当,同时保留了优越的难例挖掘和收敛动态。我们的方法使用纯损失函数工程,边缘部署零计算开销。我们识别出稀疏墙,这是一个硬边界,当正样本比例低于0.1%时,全局损失重新加权失效,需要实例级干预。

English abstract

Pedestrian Attribute Recognition (PAR) is critical for video surveillance, enabling forensic search and re-identification systems. Extreme class imbalance remains a fundamental obstacle when merging PETA and PA-100K into a 109,000-image composite corpus, where minority attributes have positive sample fractions below 1%. This causes standard BCE optimization to suppress rare traits, a phenomenon we term the majority negative class cheating trap. We present a systematic ablation of Multi-Label Focal Loss hyperparameters (alpha and gamma) on a ResNet-18 backbone. A calibrated configuration (alpha=0.50, gamma=2.0) achieves a Macro F1-score of 62.32%, matching the BCE baseline while preserving superior hard-example mining and convergence dynamics. Our approach uses pure loss-function engineering with zero computational overhead for edge deployment. We identify the Sparsity Wall, a hard boundary where positive sample fractions below 0.1% make global loss reweighting ineffective, requiring instance-level intervention.
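For reference, a minimal NumPy sketch of a multi-label (sigmoid) focal loss follows, using the calibrated values alpha=0.50 and gamma=2.0 from the abstract. It assumes the standard per-attribute focal-loss form averaged over all batch and attribute entries. The paper's exact reduction and any per-attribute weighting are not specified here, so treat those as assumptions.

```python
import numpy as np

def multilabel_focal_loss(logits, targets, alpha=0.5, gamma=2.0, eps=1e-12):
    """Mean sigmoid focal loss over a (batch, num_attributes) array.

    Each attribute is an independent binary problem: alpha weights positives
    (1 - alpha weights negatives) and (1 - p_t) ** gamma down-weights easy,
    already well-classified examples."""
    p = 1.0 / (1.0 + np.exp(-logits))                 # per-attribute sigmoid
    p_t = np.where(targets == 1, p, 1.0 - p)          # prob. of the true label
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)
    return loss.mean()

def bce(logits, targets, eps=1e-12):
    """Plain binary cross-entropy baseline for comparison."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return np.mean(-(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps)))
```

With gamma=0 and alpha=0.5 the focal loss reduces to exactly half of BCE. This is one way to see why a calibrated focal configuration can track the BCE baseline while still reweighting hard examples.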

2606.14943 2026-06-16 cs.CL cs.LG cross-list

Simplifying the Modeling of Arbitrary Conditionals in Natural Language

简化自然语言中任意条件概率的建模

Yinhan Lu, Eric Elmoznino, Léo Gagnon, Sarthak Mittal, Tejas Kasetty, Guillaume Lajoie

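Affiliations: Mila - Quebec AI Institute; McGill University; Université de Montréal

AI summary: Proposes AC-GPT, a simple modification to standard causal Transformers that evaluates and samples arbitrary conditionals (past, future, and mixed contexts) in a single forward pass. It keeps left-to-right ordering and the next-token prediction objective, without degrading standard performance.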

AI-generated Chinese abstract

因果Transformer通过联合分布的自回归分解对序列进行建模,这使得高效的从左到右解码和条件似然计算成为可能。然而,它们无法高效地从任意条件中采样或评估——例如,以过去和未来标记为条件的文本块。最近的工作旨在通过新颖的架构解决这个问题,但通常导致对此类条件概率的次优建模和退化的生成。我们提出了任意条件GPT(AC-GPT),它引入了一个对标准因果Transformer的简单修改,使得在单次前向传播中能够评估和采样任意条件——包括过去、未来和混合上下文。与先前的方法不同,我们的方法保留了标准的从左到右顺序和下一词预测目标,这对于自然语言上的强性能和高效训练都是必不可少的。关键的是,这种兼容性允许现有的LLM被微调以进行任意条件建模。我们的实证结果表明,我们的方法在建模任意条件概率方面优于基线,且不会降低标准的从左到右性能。

English abstract

Causal Transformers model sequences through an autoregressive factorization of the joint distribution, which enables efficient left-to-right decoding and conditional likelihood computation. However, they cannot tractably sample from or evaluate arbitrary conditionals, e.g., a block of text conditioned on past and future tokens. Recent work aims to solve this problem through novel architectures, but these often lead to sub-optimal modeling of such conditionals and degraded generations. We propose Arbitrary Conditionals GPT (AC-GPT), which introduces a simple modification to standard causal Transformers to enable evaluating and sampling from arbitrary conditionals (including past, future, and mixed contexts) within a single forward pass. Unlike prior approaches, our method preserves the standard left-to-right ordering and next-token prediction objective essential for both strong performance and efficient training on natural language. Crucially, this compatibility allows existing LLMs to be fine-tuned for arbitrary conditioning. Our empirical results indicate that our method outperforms baselines on modeling arbitrary conditionals, without degrading standard left-to-right performance.

2606.14975 2026-06-16 cs.NE cs.AI cs.LG physics.data-an q-bio.NC cross-list

Harnessing cortical geometry, wiring, and function as inductive biases for recurrent neural networks

利用皮层几何、连接和功能作为循环神经网络的归纳偏置

Mo Shakiba, Rana Rokni, Mohammad Mohammadi, Nima Dehghani

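Affiliations: Neuromatch Academy, Neuromatch, Inc., USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology (MIT)

AI summary: Uses MICrONS data to build biologically grounded recurrent neural networks. Recurrent weights are initialized from neuronal spatial coordinates, anatomical connectivity, and functional relationships, and spatial constraints are imposed during learning. The resulting networks outperform baselines on cognitive decision-making tasks and develop low-entropy, modular, small-world organization.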

AI-generated Chinese abstract

皮层的连接和功能组织如何塑造循环计算仍然是神经科学和机器学习中的一个核心问题。在这里,我们利用通过皮层网络机器智能(MICrONS)项目发布的数据——一个涵盖小鼠视觉皮层多个区域的功能连接组学资源,其中密集钙成像与同一动物的高分辨率电子显微镜重建共同配准——来构建生物基础的循环神经网络。使用来自近12,000个共同配准的兴奋性神经元的神经元空间坐标、解剖连接和功能衍生关系,我们初始化循环权重并在学习过程中施加通信感知的空间约束。在三个认知决策任务中,受皮层结构和功能约束的网络始终优于基线和部分约束模型。功能权重初始化提供了最大的增益,而真实空间嵌入在多种条件下产生了稳健的额外改进。这些生物基础网络还发展出低熵、模块化和小世界组织,并且即使当循环被限制为正权重时也能保持强劲性能。总之,我们的结果表明,皮层的机制——其几何、连接和功能结构——可以作为构建循环网络的强大归纳基础,这些网络学习更有效,同时收敛于生物计算的关键组织原则。

English abstract

How the wiring and functional organization of cortex shape recurrent computation remains a central question in both neuroscience and machine learning. Here, we leverage data released through the Machine Intelligence from Cortical Networks (MICrONS) program (a functional connectomics resource spanning multiple areas of mouse visual cortex, in which dense calcium imaging is co-registered with high-resolution electron microscopy reconstruction from the same animal) to build biologically grounded recurrent neural networks. Using neuronal spatial coordinates, anatomical connectivity, and function-derived relationships from nearly 12,000 co-registered excitatory neurons, we initialize recurrent weights and impose communication-aware spatial constraints during learning. Across three cognitive decision-making tasks, networks constrained by cortical structure and function consistently outperform baseline and partially constrained models. Functional weight initialization provides the largest gain, while real spatial embedding yields robust additional improvements across conditions. These biologically grounded networks also develop low-entropy, modular, and small-world organization, and retain strong performance even when recurrence is restricted to positive weights. Together, our results show that the machinery of cortex (its geometry, wiring, and functional structure) can be harnessed as a powerful inductive basis for building recurrent networks that learn more effectively while converging toward key organizational principles of biological computation.
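The two ingredients named above, connectome-based initialization and a communication-aware spatial constraint, can be sketched in NumPy. This sketch assumes a spectral-radius rescaling of a synapse-count matrix for initialization. It also assumes a regularizer in the style of spatially embedded RNNs, in which |W| is weighted by Euclidean distance and by weighted communicability exp(S^{-1/2}|W|S^{-1/2}), with S the node strengths. Both choices are hypothetical stand-ins, not the paper's exact formulation. The matrix exponential is computed with a small scaling-and-squaring routine so that the example only needs NumPy.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=1)
    k = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2 ** k
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for i in range(1, terms):
        term = term @ A / i
        E = E + term
    for _ in range(k):                                # undo the scaling
        E = E @ E
    return E

def init_from_connectome(adjacency, spectral_radius=0.9):
    """Hypothetical: rescale a synapse-count matrix to a target spectral radius."""
    W = adjacency.astype(float)
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (spectral_radius / rho) if rho > 0 else W

def comm_spatial_penalty(W, coords, strength=1e-3, eps=1e-12):
    """Penalize |W| weighted by Euclidean distance and weighted communicability."""
    A = np.abs(W)
    d = 1.0 / np.sqrt(A.sum(axis=1) + eps)            # node strength ** -1/2
    C = expm_taylor(d[:, None] * A * d[None, :])       # communicability
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return strength * np.sum(A * D * C)
```

In training, this penalty would be added to the task loss and differentiated with an autodiff framework. A positive-weights variant could keep W nonnegative, for example by clipping after each update.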

2606.14997 2026-06-16 cs.AI cs.LG cross-list

AI Engram: In Search of Memory Traces in Artificial Intelligence

AI Engram: 在人工智能中寻找记忆痕迹

Jea Kwon, Dong-Kyum Kim, Jiwon Kim, Yonghyun Kim, Woong Kook, Meeyoung Cha

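Affiliations: University of California, Berkeley; KAIST

AI summary: Proposes a geometric framework that formalizes neuroscientific criteria as a constrained inverse problem to identify memory traces (AI engrams) in deep neural networks. This enables linear composition and erasure of memories without iterative optimization.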

Comments Accepted to ICML 2026 (Oral). Code is available at https://github.com/jeakwon/ai-engram/

AI-generated Chinese abstract

记忆形成是智能的基础,但深度神经网络是否保留类似于生物记忆单元的可识别记忆痕迹仍是一个未解问题。本文引入一个几何框架,通过将神经科学标准(特异性、再激活、充分性和必要性)形式化为约束逆问题,来识别此类“AI engram”。我们推导出一个闭式估计器,从全局纠缠参数中分离出单个记忆痕迹,并证明这一生物学启发的解对应于参数流形上的自然梯度更新。AI engram 允许对学习知识进行手术式操作:任何记忆子集可以通过线性算术进行组合或擦除,无需迭代优化。从简单 MLP 到大语言模型的实验证明了 AI engram 的因果有效性和显著可扩展性。总之,这些结果桥接了生物记忆理论与人工表示学习,并提供了关于深度网络如何在分布式存储中同时支持功能特异性的几何洞见。

English abstract

Memory formation is fundamental to intelligence, yet whether deep neural networks preserve identifiable memory traces analogous to biological memory units remains an open question. This work introduces a geometric framework to identify such "AI engrams" by formalizing the neuroscientific criteria of specificity, reactivation, sufficiency, and necessity into a constrained inverse problem. We derive a closed-form estimator that isolates individual memory traces from globally entangled parameters, and show that this biologically-derived solution corresponds to a natural gradient update on the parameter manifold. AI engrams enable surgical manipulation of learned knowledge: any subset of memories can be composed or erased through linear arithmetic, without iterative optimization. Experiments ranging from simple MLPs to LLMs demonstrate the causal validity and substantial scalability of AI engrams. Together, these results bridge theories of biological memory and artificial representation learning and offer geometric insight into how deep networks simultaneously support functional specificity within distributed storage.

2606.15007 2026-06-16 cs.CL cs.AI cs.LG cross-list

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Nemotron 3 Ultra: 开放、高效的混合专家Mamba-Transformer模型用于智能体推理

NVIDIA: Aaron Blakeman, Aaron Thomas, Aastha Jhunjhunwala, Abhibha Gupta, Abhinav Khattar, Adam Rajfer, Adi Renduchintala, Adil Asif, Aditya Vavre, Adriana Flores Miranda, Ahmad Bilal, Aileen Zaman, Ajay Hotchandani, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Alex Gronskiy, Alex Kondratenko, Alex Steiner, Alex Ye, Alexander Bukharin, Alexandre Milesi, Ali Taghibakhshi, Alice Gatti, Alisa Liu, Alok Kumar, Amar Phanishayee, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Anahita Bhiwandiwalla, Ananth Subramaniam, Andrea Santilli, Andrew Fulks, Andrew McHarg, Andrew Tao, Andrii Skliar, Anjulie Agrusa, Ankur Srivastava, Ankur Verma, Anna Shors, Anna Warno, Antoni-Joan Solergibert I Llaquet, Arham Mehta, Arkadiusz Nowaczynski, Arti Jain, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asma Kuriparambil Thekkumpate, Atefeh Sohrabizadeh, Avinash Kaur, Avinash Vem, Ayush Dattagupta, Barath Subramaniam Anandan, Bardiya Sadeghi, Ben Lanir, Benedikt Schifferer, Besmira Nushi, Bilal Kartal, Bill Thiede, Bita Darvish Rouhani, Bo Deng, Bob Schatz, Boris Ginsburg, Boxin Wang, Brad Nemire, Brandon Norick, Brian Dang, Brian Westphal, Brian Yu, Brucek Khailany, Bryan Catanzaro, Carlo del Mundo, Caryln Aarish, Chankyu Lee, Chantal Hwang, Charbel Sakr, Charles Wang, Charlie Truong, Chen Cui, Cheng Cheng, Cheng-Ping Hsieh, Chenghao Zhang, Chenhui Deng, Chintan Patel, Chris Alexiuk, Christian Cosgrove, Christian Munley, Christine Harvey, Christopher Parisien, Chunyang Shen, Coco Li, Collin Neale, Cynthia Gao, Cyril Meurillon, Dan Gil, Dan Su, Dan Zhao, Dane Corneil, Daniel Afrimi, Daniel Egert, Daniel Korzekwa, Daniel Lo, Daniel Machlab, Daniel Serebrenik, Daniil Sorokin, Daria Gitman, Daria Levy, Darko Stosic, David Mosallanezhad, David Yu, Davit Karamyan, Deena Donia, Deep Debroy, Deepak Narayanan, Devin O'Kelly, Dheeraj Peri, Dhruv Nathawani, Di, Wu, Dima Rekesh, Divyanshu Kakwani, Donald Plummer, Dong Anh, Dongfeng Yu, Dongfu Jiang, Donnie Kim, Dorrin Poorkay, Duncan Riach, Dusan Stosic, Dustin VanStee, Eavan Meng, Edgar Minasyan, Edward Lin, Eileen Margaret Peters Long, Elad Sarafin, Elad Segal, Elena Lantz, Ellie Evans, Elliott Ning, Eric Chung, Eric Harper, Eric Pham-Hung, Eric Tramel, Eric Yang, Erick Galinkin, Erik Pounds, Erika Goncalves Goncalves, Evan Briones, Evan Wu, Evelina Bakhturina, Evgeny Tsykunov, Ewa Dobrowolska, Faisal Ladhak, Farzan Memarian, Fay Wang, Fei Jia, Felipe Soares, Felipe Vieira Frujeri, Feng Chen, Fengguang Lin, Ferenc Galko, Frank Sun, Frankie Siino, Frida Hou, Gal Hubara Agam, Gal Kaplun, Gantavya Bhatt, Gargi Prasad, Garvit Kulshreshtha, George Armstrong, Gerald Shen, Giulio Borghesi, Gordana Neskovic, Gorkem Batmaz, Grace Lam, Greg Mason, Greg Pauloski, Grigor Nalbandyan, Grzegorz Chlebus, Grzegorz Karch, Guan-Ting Liu, Guoming Zhang, Guyue Huang, Haggai Maron, Haifeng Qian, Haim Elisha, Haoxing Ren, Haran Kumar Shiv Kumar, Haribhau Hud, Harris Nover, Harrison Saturley Hall, Hayate Iso, Helen Ngo, Herbert Hum, Herman Sahota, Hexin Wang, Himanshu Soni, Hovhannes Tamoyan, Hua Li, Huanhuan Chen, Hui Li, Hui Wang, Huy Nguyen, Ian Chiles, Ido Galil, Ido Shahaf, Igor Gitman, Igor Shovkun, Ilya Loshchilov, Ingo Guehring, Itamar Schen, Itay Levy, Itay Neeman, Ivan Moshkov, Izik Golan, Izzy Putterman, Jaemin Choi, Jakub Slowikowski, Jan Kautz, Jane Polak Scowcroft, Jared Casper, Jatin Mitra, Jeffrey Glick, Jenny Chen, Jesse Oliver, Jiacheng Xu, Jiafan Zhu, Jialin Song, Jian Zhang, Jiantao Jiao, Jiaqi Zeng, Jie Lou, Jim King, Jimmy Zhang, Jingquan Wang, Jinhang Choi, Jinju Chu, Joey Conway, Joey Guman, Johan Jatko, Johannes Rausch, John Kamalu, John Roberts, Johnny Greco, Johnny Mensel, Jonah Alben, Jonas Yang, Jonathan Cohen, Jonathan Raiman, Joseph Jennings, Joshua Mabry, Joshua Pierce, Joyjit Daw, Julien Veron Vialard, Junkeun Yi, Jupinder Parmar, Kajal Jain, Kan Zhu, Kari Briski, Katherine Cheung, Katherine Luna, Keith Willowhawk, Keith Wyss, Keshav Santhanam, Kevin Shih, Kezhi Kong, Khanh Nguyen, Khushi Bhardwaj, Kirthi Shankar Sivamani, Konstantinos Krommydas, Krishna C. Puvvada, Krzysztof Pawelec, Kumar Anik, Kyle Keprios, Kylie Day, Lawrence McAfee, Leo Du, Leon Derczynski, Li Ding, Linda Liu, Lingjie Wu, Lior Kadoch, Lizzie Wei, Luis Vega, Luke Robison, Lun Su, Maarten Van Segbroeck, Maciej Jakub Mikulski, Maer Rodrigues de Melo, Magda Sypula, Mahan Fathi, Makesh Narsimhan Sreedhar, Makesh Tarun Chandran, Manoj Kilaru, Maor Ashkenazi, Marc Cuevas, Marc Romeijn, Marcin Chochowski, Mark Cai, Mark Mozolewski, Markus Kliegl, Marta Stepniewska-Dziubinska, Martyna Patelka, Mattei Machczynski, Matvei Novikov, Mauricio Ferrato, Maximilian Golub, Mehrzad Samadi, Melissa Corpuz, Mengru Wang, Mengxi Wu, Meredith Price, Meriem Boubdir, Micah Schaffer, Michael Andersch, Michael Boone, Michael Gschwind, Michael Lightstone, Michael Loh, Michal Bien, Michal Zawalski, Michelle Gill, Miguel Martinez, Mikail Khona, Mike Chrzanowski, Mike Houston, Mingyuan Ma, Minseok Lee, Mohamed Fawzy, Mohammad Dabbah, Mohammad Shoeybi, Mostofa Patwary, Nabin Mulepati, Najeeb Nabwani, Namit Dhameja, Narimane Hennouni, Natalie Hereth, Nathaniel Pinckney, Nave Algarici, Nave Assaf, Netanel Haber, Nicholas Knight, Nick Reamaroon, Nickson Quak, Nidhi Bhatia, Nikhil Desai, Nikolai Ludwig, Nima Tajbakhsh, Ning Xu, Nir Ailon, Nirmal Juluru, Nitin Nitin, Ofri Masad, Oleg Rybakov, Oleksii Hrinchuk, Oleksii Kuchaiev, Olivia Viessmann, Olivier Delalleau, Oluwatobi Olabiyi, Omer Ullman Argov, Omri Puny, Oren Tropp, Pablo Ribalta, Pallab Bhattacharya, Panos Lampropoulos, Parth Mannan, Pasha Shamis, Patrick Legresley, Paul Gibbons, Pavlo Molchanov, Pawel Morkisz, Peter Dykas, Peter Jin, Pierre-Yves Aquilanti, Pinky Xu, Piotr Januszewski, Piotr Laskiewicz, Pooya Jannaty, Prakash Gurumurthy, Pranav Prashant Thombre, Prasoon Varshney, Pritam Gundecha, Przemek Tredak, Puhui Meng, Qiyu Wan, Rabeeh Karimi Mahabadi, Rachel Oberman, Rachit Garg, Radha Sri-Tharan, Rahul Kandu, Rakshit Sanadhya, Ran El-Yaniv, Ran Zilberstein, Rasoul Shafipour, Ray Macalisang, Rayen Tian, Reka Kovacs, Renjie Pi, Rick Izzo, Rima Shahbazyan, Rishabh Garg, Rishi Puri, Rita Fernandes Neves, Ritchie Zhao, Ritika Borkar, Ritu Gala, Riyad Islam, Robert Clark, Robert Hesse, Robert Kirby, Roger Waleffe, Rohit Watve, Roi Koren, Ron Banner, Ruoxi Zhang, Russell J. Hewett, Ryan Prenger, Ryan Stewart, Ryota Egashira, Sadegh Mahdavi, Saee Paliwal, Sagar Singh, Sahil Modi, Salika Dave, Samantha Shinagawa, Samuel Kriman, Sandip Bhaskar, Sangkug Lym, Sanjay Kariyappa, Sanjeev Satheesh, Saran Vikas Murari, Satish Pasumarthi, Saurabh Mishra, Saurav Muralidharan, Scott Hara, Sean Narentharen, Selvaraj Anandaraj, Seonjin Na, Seonmeyong Bak, Seonmyeong Bak, Sepehr Sameni, Seph Mard, Serge Panev, Seth Henneman, Seth Poulos, Shahar Mor, Shantanu Acharya, Shaona Ghosh, Sharath Turuvekere Sreenivas, Sharon Mendelson, Shaun Kotek, Shawn Wang, Shay Aharon, Shaya Gharghabi, Sheng-Chieh Lin, Shi Chen, Shiqing Fan, Shirish Baskaran, Shreya Gopa, Shrimai Prabhumoye, Shubham Pachori, Shubham Toshniwal, Shuoyang Ding, Shwetha Krishnamurthy, Siddharth Singh, Simeng Sun, Sirshak Das, Sivakumar Arayandi Thottakara, Smita Ithape, Somshubra Majumdar, Soumye Singhal, Sri Harsha Singudasu, Sridhar Bhuvanapalli, Srimukh Veccham, Stas Sergienko, Stefania Alborghetti, Stephen Ge, Su Rong, Sugam Dipak Devare, Sukrit Rao, Sumeet Kumar Barua, Sungsoo Ha, Sunny Gai, Suriya Gunasekar, Suseella Panguluri, Suyog Gupta, Sviataslau Hinzburh, Sweta Priyadarshi, Syeda Nahida Akter, Talor Abramovich, Tan Bui, Tanay Varshney, Tatevik Ter-Hovhannisyan, Teodor-Dumitru Ene, Terry Kong, Thanh Do, Tianhe Zhang, Tiffany Moore, Tijmen Blankevoort, Tim Moon, Tiyasa Mitra, Tom Balough, Tomasz Grzegorzek, Tomasz Hliwiak, Tomer Asida, Tomer Bar Natan, Tomer Keren, Tomer Ronen, Tony Salim, Tony Wang, Traian Rebedea, Tugrul Konuk, Twinkle Vashishth, Udi Karpas, Ushnish De, Vahid Noorozi, Venkat Srinivasan, Venmugil Elango, Vibhor Agrawal, Victor Cui, Vijay Korthikanti, Vikas Mehta, Vinay Rao, Virginia Wu, Vitaly Kurin, Vitaly Lavrukhin, Vladimir Anisimov, Vu Pham, Wanli Jiang, Wasi Uddin Ahmad, Wataru Ishihara, Wei Du, Wei Ping, Weiheng Chai, Wenliang Dai, Wesley Helmholz, Will Jennings, Will Zhu, Wojciech Prazuch, Xiaowei Ren, Xiwen Yu, Yan Breek, Yang Chen, Yang Yu, Yangyi Chen, Yaniv Galron, Yashaswi Karnati, Yejin Choi, Yev Meyer, Yi-Fu Wu, Yian Zhang, Ying Lin, Yonatan Geifman, Yonggan Fu, Youngeun Kwon, Yu Yao, Yugi Guvvla, Yuki Huang, Yunsheng Liu, Zach Moshe, Zachary Newell, Zhilin Wang, Zhiyu Li, Zhongbo Zhu, Zhuolin Yang, Zihan Liu, Zijie Yan, Zsolt-Alon Wertheimer

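Affiliations: NVIDIA

AI summary: Presents Nemotron 3 Ultra, a Mixture-of-Experts hybrid Mamba-Attention language model with 550B total and 55B active parameters. It is pre-trained on 20T tokens, extended to a 1M-token context, and then post-trained. It achieves roughly 6x higher inference throughput while matching the accuracy of leading models.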

AI-generated Chinese abstract

我们介绍了Nemotron 3 Ultra,一个总参数量5500亿、激活参数550亿的混合专家Mamba-Attention语言模型。我们在20万亿文本tokens上预训练了Nemotron 3 Ultra,然后将上下文长度扩展到100万tokens,并使用监督微调(SFT)、强化学习(RL)和多教师在线策略蒸馏(MOPD)进行后训练。Nemotron 3 Ultra是我们迄今为止能力最强的模型,采用了多项关键技术——LatentMoE、多token预测(MTP)、NVFP4预训练、多环境RLVR、MOPD和推理预算控制。与公开可用的最先进LLM相比,Nemotron 3 Ultra的推理吞吐量提高了约6倍,同时达到了相当的精度。最先进的精度、高推理吞吐量和100万tokens的上下文长度使Nemotron 3 Ultra成为长时间运行的自主智能体任务的理想选择。我们在HuggingFace上开源了基础、后训练和量化检查点,以及训练数据和配方。

English abstract

We introduce Nemotron 3 Ultra, a Mixture-of-Experts Hybrid Mamba-Attention language model with 550 billion total and 55 billion active parameters. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context length to 1M tokens, and post-trained it using Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), and Multi-teacher On-Policy Distillation (MOPD). Nemotron 3 Ultra is our most capable model yet, employing multiple key technologies: LatentMoE, Multi Token Prediction (MTP), NVFP4 pre-training, multi-environment RLVR, MOPD, and reasoning budget control. Nemotron 3 Ultra achieves up to ~6x higher inference throughput than state-of-the-art publicly available LLMs while attaining on-par accuracy. The state-of-the-art accuracy, high inference throughput, and 1M-token context length make Nemotron 3 Ultra ideal for long-running autonomous agentic tasks. We open-source the base, post-trained, and quantized checkpoints, along with the training data and recipe, on Hugging Face.
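The gap between total and active parameters (55B of 550B, i.e. 10% per token) comes from sparse expert routing. The following is a generic top-k softmax-gated MoE layer in NumPy. It illustrates that idea and is explicitly not LatentMoE, whose design the abstract does not describe. The gating and renormalization choices are common conventions assumed for illustration.

```python
import numpy as np

def topk_moe(x, gate_w, experts, k=2):
    """Generic top-k MoE layer (not LatentMoE).

    x: (tokens, d); gate_w: (d, num_experts); experts: list of callables
    mapping (m, d) -> (m, d). Each token only runs its k selected experts."""
    logits = x @ gate_w
    topk = np.argsort(-logits, axis=-1)[:, :k]              # chosen experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                      # softmax over top-k
    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        rows, slot = np.nonzero(topk == e)                   # tokens routed to e
        if rows.size:
            out[rows] += w[rows, slot][:, None] * expert(x[rows])
    return out, topk, w

# Usage: 6 tokens, 8 experts, 2 active per token (identity experts for clarity).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
gate = rng.normal(size=(4, 8))
out, topk, w = topk_moe(x, gate, [lambda z: z] * 8, k=2)
```

Only k of the E expert networks run for each token, so per-token compute scales with the active parameters rather than the total. Shared, non-expert layers add to the active count in a real model.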

2606.15151 2026-06-16 cs.CV cs.LG cross-list

HiRo: A Compact Four-Directional Hierarchical Reservoir Token-Mixer for Efficient Image Classification

HiRo:一种用于高效图像分类的紧凑型四方向分层储层令牌混合器

Md Farhadul Islam, Ishan Thakkar, J. Todd Hastings

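Affiliations: University of Kentucky

AI summary: Proposes HiRo, which performs local and cross-window token mixing via four-directional scanning and a two-stage slice-and-mix reservoir module. It reaches high accuracy on MNIST and CIFAR-10/100 with under 1M parameters.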

Comments Accepted at ICONS 2026

AI-generated Chinese abstract

最近的图像分类模型必须在局部特征建模、跨窗口交互和参数效率之间取得平衡。许多高性能架构依赖于完全可训练的令牌混合器,这改善了表示学习但增加了参数数量、优化复杂性和计算成本。我们提出了一种参数高效的图像分类模型HiRo,它将移位窗口分区与多方向分层储层计算相结合。图像被划分为非重叠块(视为令牌),线性投影、归一化,并添加二维正弦位置编码,然后在局部窗口内处理。在每个窗口内,令牌沿四个方向扫描,并通过两级切片混合储层模块。在第一阶段,方向序列被分割成连续的切片,每个切片由具有可训练闭环读出的固定储层处理。得到的切片输出使用开始、结束和均值表示进行汇总,然后由每个方向的第二阶段固定储层混合。混合后的切片表示被扩展回令牌级别并与第一阶段输出融合,之后四个方向的输出重新对齐并平均。连续块在常规窗口和移位窗口之间交替以实现跨窗口交互,随后是层归一化、残差前馈网络和用于分类的全局池化。该设计将常规和移位窗口分区与分层多方向储层相结合,构建了一个高效的局部到跨窗口令牌混合框架用于图像分类。尽管使用的可训练参数少于1M,且内存和时间显著低于基于Transformer的基线,HiRo在MNIST、CIFAR-10和CIFAR-100上分别达到了99.46%、85.57%和59.10%的准确率。

English abstract

Recent image classification models must balance local feature modeling, cross-window interaction, and parameter efficiency. Many high-performing architectures rely on fully trainable token-mixers, which improve representation learning but increase parameter count, optimization complexity, and computational cost. We propose a parameter-efficient image classification model called HiRo that integrates shifted-window partitioning with multi-directional hierarchical reservoir computing. Images are divided into non-overlapping patches (treated as tokens), linearly projected, normalized, and enriched with 2D sinusoidal positional encodings, then processed within local windows. Inside each window, tokens are scanned in four directions and passed through a two-stage slice-and-mix reservoir module. In the first stage, directional sequences are split into contiguous slices, each processed by its own fixed reservoir with a trainable closed-loop readout. The resulting slice outputs are summarized using the start, end, and mean representations, and then mixed by a second-stage fixed reservoir for each direction. The mixed slice representations are expanded back to the token level and fused with the first-stage outputs, after which the four directional outputs are realigned and averaged. Consecutive blocks alternate between regular and shifted windows to enable cross-window interaction, followed by layer normalization, a residual feed-forward network, and global pooling for classification. This design combines regular and shifted window partitioning with hierarchical multi-directional reservoirs to form an efficient local-to-cross-window token-mixing framework for image classification. Despite using under 1M trainable parameters and substantially less memory and time than transformer-style baselines, HiRo achieves 99.46%, 85.57%, and 59.10% accuracy on MNIST, CIFAR-10, and CIFAR-100, respectively.
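The first-stage building blocks described above can be sketched in NumPy. The sketch covers the four-direction scan, a fixed (untrained) echo-state reservoir, start/end/mean slice summaries, and realignment plus averaging of the directional outputs. It assumes the four directions are forward and reversed row-major and column-major scans, and it uses a plain linear readout in place of HiRo's trainable closed-loop readout. The second-stage reservoir and the fusion step are omitted, and all sizes are illustrative.

```python
import numpy as np

def four_direction_orders(h, w):
    """Token orders for four scans of an h x w window (assumed: row-major,
    reversed row-major, column-major, reversed column-major)."""
    idx = np.arange(h * w).reshape(h, w)
    rows, cols = idx.reshape(-1), idx.T.reshape(-1)
    return [rows, rows[::-1], cols, cols[::-1]]

class FixedReservoir:
    """Echo-state reservoir: random weights that are never trained."""
    def __init__(self, d_in, n_res, rho=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, size=(n_res, d_in))
        W = rng.normal(size=(n_res, n_res))
        self.W = W * (rho / np.max(np.abs(np.linalg.eigvals(W))))

    def run(self, seq):
        h = np.zeros(self.W.shape[0])
        states = []
        for u in seq:                                   # one token per step
            h = np.tanh(self.W_in @ u + self.W @ h)
            states.append(h)
        return np.stack(states)

def slice_summary(states, n_slices):
    """Summarize contiguous slices by their start, end and mean states."""
    return np.stack([np.concatenate([s[0], s[-1], s.mean(axis=0)])
                     for s in np.array_split(states, n_slices)])

def directional_mix(tokens, h, w, reservoir, readout):
    """Scan four ways, run the reservoir, apply the readout, realign, average."""
    outs = []
    for order in four_direction_orders(h, w):
        y = reservoir.run(tokens[order]) @ readout
        realigned = np.empty_like(y)
        realigned[order] = y                             # undo the scan order
        outs.append(realigned)
    return np.mean(outs, axis=0)
```

Only `readout` would be learned here, which is where the parameter savings of reservoir-based token mixing come from.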

2606.15378 2026-06-16 cs.CL cs.LG cross-list

Rethinking the Role of Efficient Attention in Hybrid Architectures

重新思考高效注意力在混合架构中的作用

Ziqing Qiao, Yinuo Xu, Chaojun Xiao, Zhou Su, Zihan Zhou, Yingfa Chen, Xiaoyue Xu, Xu Han, Zhiyuan Liu

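Affiliations: Tsinghua University; OpenBMB

AI summary: Systematically analyzes how efficient attention modules (e.g., sliding-window attention and recurrent sequence mixers) affect hybrid architectures. The analysis finds that these modules mainly determine how quickly long-context capability emerges and uncovers a "Large-Window Laziness" phenomenon. It also shows that removing positional encoding from only the full-attention layers improves long-context performance.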

Comments 23 pages, 13 figures

AI-generated Chinese abstract

现代语言模型越来越多地采用混合架构,将全注意力与高效注意力模块(如滑动窗口注意力(SWA)和循环序列混合器)相结合。然而,这些高效模块如何塑造模型能力仍知之甚少。为填补这一空白,我们从三个角度对混合架构进行了系统分析:缩放行为、机制分析和架构设计。首先,从缩放角度来看,我们发现高效注意力设计主要影响长上下文能力涌现的速度,而不同的混合模型在充分训练下最终会收敛到可比较的长上下文性能。其次,从机制上,我们表明长距离检索主要由全注意力承担,而高效注意力则塑造其优化轨迹。这解释了我们称之为“大窗口惰性”的反直觉现象:更大的SWA窗口可能延迟全注意力层中检索头的形成。第三,受此机制指导,我们表明仅对小型窗口SWA混合架构的全注意力层应用NoPE(无位置编码)可以显著提升长上下文性能,而对短上下文性能影响甚微。

English abstract

Modern language models increasingly adopt hybrid architectures that combine full attention with efficient attention modules, such as sliding-window attention (SWA) and recurrent sequence mixers. However, how these efficient modules shape model capabilities remains poorly understood. To address this gap, we conduct a systematic analysis across hybrid architectures from three perspectives: scaling behavior, mechanism analysis, and architecture design. First, from a scaling perspective, we find that efficient-attention design primarily affects how fast long-context capability emerges, while different hybrids eventually converge to comparable long-context performance under sufficient training. Second, mechanistically, we show that long-range retrieval is mainly carried by full attention, whereas efficient attention shapes its optimization trajectory. This explains a counter-intuitive phenomenon we call Large-Window Laziness: larger SWA windows can delay the formation of retrieval heads in full-attention layers. Third, guided by this mechanism, we show that applying NoPE (no positional encoding) to only the full-attention layers of a small-window SWA hybrid substantially improves long-context performance with negligible impact on short-context performance.
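The masks and layer layout involved can be sketched in NumPy. This compares causal full attention with causal sliding-window attention and shows a hybrid layout in which only the full-attention layers skip positional encoding. The interleaving ratio (one full layer every four) and the use of RoPE in the SWA layers are hypothetical choices for illustration, not the configurations studied in the paper.

```python
import numpy as np

def causal_mask(n):
    """Full causal attention: token i attends to all tokens j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n, window):
    """Causal SWA: token i attends only to tokens i - window + 1 .. i."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

def hybrid_layout(num_layers, full_every=4, nope_on_full=True):
    """Hypothetical interleaving: every `full_every`-th layer is full attention.
    With nope_on_full, only those full-attention layers skip positional encoding."""
    layers = []
    for l in range(num_layers):
        full = (l + 1) % full_every == 0
        layers.append({"attn": "full" if full else "swa",
                       "rope": not (full and nope_on_full)})
    return layers
```

A token farther back than `window` positions is invisible to every SWA layer, so long-range retrieval must pass through the full-attention layers. This is consistent with the abstract's finding that full attention carries long-range retrieval.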

2606.15702 2026-06-16 math.OC cs.LG cross-list

Schattor: Schatten-family methods for deep learning optimization

Schattor:用于深度学习优化的Schatten族方法

Bohao Ma, Junyu Zhang, Chuan He

Affiliations: School of Data Science, The Chinese University of Hong Kong (Shenzhen); Department of Industrial Systems Engineering and Management, National University of Singapore; Department of Mathematics, Linköping University

AI summary: Proposes Schattor, a family of Schatten-norm-based adaptive first-order methods that unifies SGD and Muon. It establishes dimension-free stationarity guarantees via a matrix martingale moment bound and develops multi-block extensions that adaptively balance block-wise optimization.

Comments 32 pages

AI-generated Chinese abstract

现代深度学习优化具有异构参数结构、噪声梯度和高度非凸的景观,给算法设计和理论分析带来了重大挑战。受SGD局限性和自适应优化器成功的启发,我们提出了Schattor,一种基于Schatten范数的自适应一阶方法族。Schattor将SGD和最近提出的矩阵变量自适应优化器Muon统一在一个基于Schatten范数的框架内。我们通过一种新的矩阵鞅矩界,为随机矩阵优化问题建立了Schattor族方法的无维数平稳性保证。我们还开发了多块扩展,自适应地平衡块级优化进度,并在这一更一般的设置中证明了无维数平稳性保证。

English abstract

Modern deep learning optimization features heterogeneous parameter structures, noisy gradients, and highly nonconvex landscapes, posing significant challenges for both algorithm design and theoretical analysis. Motivated by the limitations of SGD and the success of adaptive optimizers, we propose Schattor, a family of adaptive first-order methods based on Schatten norms. Schattor unifies SGD and the recently proposed matrix-variate adaptive optimizer Muon within a single Schatten-norm-based framework. We establish dimension-free stationarity guarantees for methods in the Schattor family for stochastic matrix optimization problems via a novel matrix martingale moment bound. We also develop multi-block extensions that adaptively balance block-wise optimization progress and prove dimension-free stationarity guarantees in this more general setting.
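One standard way SGD and Muon fit a single Schatten-norm family is through steepest descent under a Schatten-p norm. For a gradient G = U diag(s) V^T and dual exponent q (1/p + 1/q = 1), the unit-norm direction maximizing <G, D> is U diag(s^(q-1)) V^T / ||s||_q^(q-1). With q = 2 this gives normalized SGD. With q = 1 (spectral norm) it gives U V^T, the orthogonalized update that Muon approximates with Newton-Schulz iterations. The NumPy sketch below shows only this direction. Schattor's adaptive scaling, momentum, and multi-block balancing are not reproduced.

```python
import numpy as np

def schatten_dual_direction(G, q):
    """Unit-Schatten-p direction D maximizing <G, D>, with 1/p + 1/q = 1.

    q = 2 recovers normalized SGD (G / ||G||_F); q = 1 gives U V^T, the
    orthogonalized (Muon-like) direction; here computed with an exact SVD."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    if q == 1:
        d = np.ones_like(s)                  # assumes G has full rank
    else:
        d = s ** (q - 1) / np.linalg.norm(s, ord=q) ** (q - 1)
    return (U * d) @ Vt

# Usage: a hypothetical step W <- W - lr * D for a 4 x 6 weight matrix.
rng = np.random.default_rng(0)
G = rng.normal(size=(4, 6))
D = schatten_dual_direction(G, 1)
```

By Hölder's inequality the achieved inner product <G, D> equals the Schatten-q norm of G. For q = 1 this is the nuclear norm, which is why the spectral-norm step uses every singular direction with equal weight.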

2606.15751 2026-06-16 cs.SD cs.LG cs.MM eess.AS cross-list

Acoustic Prompting via Stage-wise Modulation for Few-Shot Learning in Audio Language Models

通过阶段调制进行声学提示以实现音频语言模型中的少样本学习

Hyebin Cho, Jaehyuk Jang, Changick Kim, Joon Son Chung

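Affiliations: Korea Advanced Institute of Science and Technology

AI summary: Proposes introducing trainable prompts into the audio encoder to capture task-specific acoustic features. Combined with text prompts, this improves few-shot adaptation, as validated on 11 datasets.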

Comments Accepted to INTERSPEECH 2026

AI-generated Chinese abstract

音频-语言模型(ALMs)通过将音频波形与文本对齐,在零样本音频分类中取得了显著成功。最近改进下游性能的努力集中在学习最优文本提示上。然而,先前的方法侧重于文本编码器,忽略了音频编码器中可学习提示的潜力。在本文中,我们提出了一种新颖框架,将可训练提示引入音频编码器以捕获任务特定的声学特征。我们证明,将音频侧提示学习与现有文本侧方法相结合可以增强少样本适应。通过在11个数据集上的广泛实验表明,将我们的方法作为即插即用模块与现有文本提示调优相结合通常能带来性能提升。这些发现表明,显式调制音频表示空间可以有效补充仅文本提示方法。代码可在 https://github.com/hyebin-c/aspl 获取。

English abstract

Audio-Language Models (ALMs) have shown remarkable success in zero-shot audio classification by aligning audio waveforms with text. Recent efforts to improve downstream performance focus on learning optimal text prompts. However, these approaches operate only on the text encoder, leaving the potential of learnable prompts within the audio encoder unexplored. In this paper, we propose a novel framework that introduces trainable prompts into the audio encoder to capture task-specific acoustic features. We demonstrate that integrating audio-side prompt learning with existing text-side approaches enhances few-shot adaptation. Extensive experiments across 11 datasets show that integrating our method as a plug-and-play module alongside existing text prompt tuning generally leads to performance improvements. These findings suggest that explicitly modulating the audio representation space effectively complements text-only prompting approaches. The code is available at https://github.com/hyebin-c/aspl.
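As a rough illustration of audio-side prompt learning, the following is a deep-prompting sketch in NumPy. Before each encoder stage, a stage-specific set of learnable prompt tokens is prepended to the audio tokens. After the stage, the prompt outputs are discarded so that they modulate the audio representation without changing its length. This insertion scheme is an assumption modelled on common deep visual prompt tuning, not necessarily the paper's stage-wise modulation. The stages here are placeholder callables that preserve token count.

```python
import numpy as np

def encode_with_stage_prompts(tokens, stages, prompts):
    """Deep prompting (hypothetical design): before each encoder stage, prepend
    that stage's learnable prompt tokens; drop their outputs after the stage.

    tokens: (n, d) audio tokens; stages: callables (m, d) -> (m, d);
    prompts: list of (p, d) arrays, one per stage (the only trained weights)."""
    x = tokens
    for stage, p in zip(stages, prompts):
        h = stage(np.concatenate([p, x], axis=0))   # prompts interact with audio
        x = h[p.shape[0]:]                           # keep only audio positions
    return x

# Usage with a toy "stage" that mixes information across all tokens.
rng = np.random.default_rng(0)
tok = rng.normal(size=(10, 8))
prompts = [rng.normal(size=(4, 8)) for _ in range(3)]
mix = [lambda z: z + z.mean(axis=0, keepdims=True)] * 3
out = encode_with_stage_prompts(tok, mix, prompts)
```

Because only the prompt arrays would be trained while the encoder stays frozen, such prompts can sit alongside existing text-side prompt tuning as a plug-in.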

2606.15837 2026-06-16 cs.CV cs.LG stat.ME stat.ML cross-list

Learning a Sampling-Free Variational DNN Plugin from Tiny Training Sets to Refine OOD Segmentation With Uncertainty Estimation

学习一种无采样的变分DNN插件,从微小训练集精炼OOD分割并估计不确定性

Jimut B. Pal, Suyash P. Awate

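Affiliations: Centre for Machine Intelligence and Data Science (C-MInDS), Indian Institute of Technology (IIT) Bombay; Computer Science and Engineering (CSE) Department, Indian Institute of Technology (IIT) Bombay

AI summary: Proposes VarDeepPCA, a lightweight variational DNN framework that learns a distribution of valid anatomical geometries from small in-distribution datasets, without target-domain data or pre-training. It achieves sampling-free inference by reinterpreting the softmax mapping and provides uncertainty estimates. Across 4 clinical applications, it substantially improves the anatomical plausibility and accuracy of OOD segmentations.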

Comments Accepted at the Journal of Machine Learning for Biomedical Imaging

AI-generated Chinese abstract

深度神经网络(DNN)由于扫描仪和采集协议的变化,经常无法泛化到分布外(OOD)的医学图像。由于获取和标注新医学数据集的成本高昂,重新训练DNN模型以应对这些分布偏移通常不切实际。为了解决这个问题,我们引入了VarDeepPCA,一种新颖的轻量级变分DNN框架,旨在通过利用内在几何先验来恢复/精炼退化的分割图。与需要目标域数据或大量预训练的现有方法不同,我们的VarDeepPCA仅使用小的分布内(ID)数据集显式学习有效解剖几何的分布。理论上,我们的新颖变分学习框架利用对softmax映射的重新解释来隐式执行精确分布建模,从而实现计算高效、无采样的学习和推理。这也使VarDeepPCA能够为其恢复的分割图提供不确定性估计。我们在4种不同的临床应用上,使用14个公开可用的数据集,涉及心肌、神经视网膜边缘、前列腺和胎儿头部分割,对我们的框架进行了实证验证。与15种现有方法的比较表明,VarDeepPCA一致地恢复了现有方法在OOD数据上产生的分割图,以(i)显著提高几何的解剖合理性和分割的临床实用性,以及(ii)显著减少误差,而不需要比现有方法更多的训练数据。

English abstract

Deep neural networks (DNNs) frequently fail to generalize to out-of-distribution (OOD) medical images because of variations in scanners and acquisition protocols. Retraining DNN models to address these distribution shifts is often impractical due to the high cost of acquiring and annotating new medical datasets. To address this, we introduce VarDeepPCA, a novel lightweight variational DNN framework designed to restore/refine degraded segmentation maps by leveraging intrinsic geometric priors. Unlike existing approaches that require target-domain data or extensive pre-training, our VarDeepPCA explicitly learns a distribution of valid anatomical geometries using only small in-distribution (ID) datasets. Theoretically, our novel variational learning framework leverages a reinterpretation of the softmax mapping to implicitly perform exact distribution modeling, thereby enabling computationally efficient, sampling-free learning and inference. This also enables VarDeepPCA to provide uncertainty estimates associated with its restored segmentation maps. We empirically validate our framework across 4 distinct clinical applications, using 14 publicly available datasets, involving segmentation of the myocardium, neuroretinal rim, prostate, and fetal head. Comparisons against 15 existing methods demonstrate that VarDeepPCA consistently restores segmentation maps produced by the existing methods on OOD data to (i) significantly improve anatomical plausibility of geometries and clinical utility of the segmentations, and (ii) significantly reduce errors, without needing any more training data than that used by existing methods.

2606.16222 2026-06-16 cs.AI cs.LG Cross-list

Latent Thought Flow: Efficient Latent Reasoning in Large Language Models

Xiandong Zou, Jing Huang, Jianshu Li, Pan Zhou

Affiliations: Singapore Management University; Ant Group

AI summary: Proposes Latent Thought Flow (LTF), which models reasoning as variable-length continuous trajectories and uses a continuous GFlowNet to train a sampler that matches a reward-induced posterior, improving accuracy by 9.5% while reducing reasoning length by 27.2% on average.

Abstract

Large Language Models (LLMs) increasingly rely on intermediate reasoning, yet explicit Chain-of-Thought (CoT) suffers from a linguistic space bottleneck: each thought must be decoded into tokens, causing high inference overhead. Latent reasoning moves deliberation into continuous space, but existing methods mostly learn deterministic or reward-maximizing paths, lacking a principled way to allocate probability across trajectories with different correctness and costs. We propose Latent Thought Flow (LTF), which models reasoning as variable-length continuous trajectories and trains a sampler to match a posterior induced by a reward that accounts for answer quality and computation cost. We instantiate this with a continuous GFlowNet using stochastic latent transitions. To handle sparse answer supervision, we introduce an Entropy-Weighted Subtrajectory Balance objective for intermediate rewards and a reference-prior regularizer to anchor exploration. Experiments under finetuning and transfer learning settings show that LTF outperforms explicit CoT and latent reasoning baselines, improving accuracy by 9.5% while reducing reasoning length by 27.2% on average compared with strong latent reasoning baselines.
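LTF's Entropy-Weighted Subtrajectory Balance objective builds on the Subtrajectory Balance (SubTB) loss from the GFlowNet literature. The sketch below shows only the standard SubTB(λ) form for one sampled trajectory, under our assumptions: the geometric weight `lam ** (k - m)` stands in for the paper's entropy weights, and in LTF's continuous setting the log-probabilities would be log-densities of stochastic latent transitions.

```python
from typing import Sequence


def subtrajectory_balance_loss(
    log_flows: Sequence[float],  # log F(s_0), ..., log F(s_n); last entry plays the role of log R(x)
    log_pf: Sequence[float],     # log P_F(s_{i+1} | s_i) for i = 0..n-1
    log_pb: Sequence[float],     # log P_B(s_i | s_{i+1}) for i = 0..n-1
    lam: float = 0.9,            # geometric weight (assumption, stands in for entropy weights)
) -> float:
    n = len(log_pf)
    if n == 0 or len(log_pb) != n or len(log_flows) != n + 1:
        raise ValueError("need n >= 1 transitions and n + 1 state flows")
    total, norm = 0.0, 0.0
    for m in range(n):
        for k in range(m + 1, n + 1):
            # Balance residual of the sub-trajectory s_m -> ... -> s_k
            residual = (log_flows[m] + sum(log_pf[m:k])
                        - log_flows[k] - sum(log_pb[m:k]))
            weight = lam ** (k - m)
            total += weight * residual ** 2
            norm += weight
    return total / norm


# A trajectory whose flows and transition probabilities are consistent has zero loss.
print(subtrajectory_balance_loss([3.0, 2.0, 0.0], [-1.0, -2.0], [0.0, 0.0]))  # prints 0.0
```

Any mismatch on a sub-trajectory contributes a weighted squared residual, which is how intermediate states receive a training signal even when only the final answer is rewarded.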

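2606.16730 2026-06-16 stat.ML cs.AI cs.LG Cross-list

Attention is Just Another Name for Coupling?: A Fast-Slow ODE Perspective on Hierarchical Pretraining

Zhengyuan Gao

AI summary: Proposes a fast-slow ODE perspective that treats causal self-attention as a coupling mechanism and adds a slow sub-system fed back into the fast path through a zero-initialised gate; it proves a link to the stationary distribution of a master equation under a linear-generator assumption and finds the coupling empirically neutral at 500k tokens.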

Abstract

Causal self-attention is a coupling mechanism: each token's hidden state is updated by a learned mixture of preceding tokens at the same timescale. This paper asks whether a second, temporally slower coupling (a slow sub-system operating on a temporally downsampled view of the sequence and fed back into the fast path through a zero-initialised gate) complements it. The question is framed in the language of singularly perturbed ordinary differential equations (ODEs), where the fast variable $x$ evolves at the token rate, the slow variable $y$ evolves at one update per $P$ tokens, and the timescale ratio $\varepsilon = 1/P$ is enforced structurally by causal block-mean pooling. The paper instantiates the fast-slow ODE formalism as a concrete neural network: a fast path of standard causal attention over $T$ tokens, a slow path of full attention over $T/P$ pooled tokens ($P^2 \times$ cheaper per layer), and a zero-initialised additive gate. In addition, under a linear-generator assumption on the fast dynamics, we prove that the equilibrium manifold $x = \phi(y)$ is exactly the master-equation (ME) stationary distribution $p_{\mathrm{st}}(y)$; in that regime a learned MLP $\phi_\theta(y)$ is a variational approximation of it (the trained block is not a generator, so this identity is the structured limit, not a claim about the network as trained). Empirically, at $500$k tokens the coupling is neutral (the gate stays closed, and the coupled and frozen ablations are within run-to-run noise) at a wall-clock cost comparable to a dense baseline. The contribution is the precise, gap-marked mapping itself, not a performance gain.
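The structural timescale separation can be illustrated with a toy sketch of causal block-mean pooling and the zero-initialised additive gate. Assumptions: tokens are plain lists of floats, the slow path's full attention is omitted (pooled block means are fed back directly), the gate is a single scalar, and a token may read only blocks that are complete at or before its own position.

```python
from typing import List

Vector = List[float]


def causal_block_means(x: List[Vector], P: int) -> List[Vector]:
    """Mean of each complete block of P tokens: T tokens -> T // P slow tokens."""
    if not x:
        return []
    d = len(x[0])
    means = []
    for start in range(0, len(x) - P + 1, P):
        block = x[start:start + P]
        means.append([sum(v[i] for v in block) / P for i in range(d)])
    return means


def gated_fast_slow(x: List[Vector], P: int, gate: float = 0.0) -> List[Vector]:
    slow = causal_block_means(x, P)
    out = []
    for t, v in enumerate(x):
        j = (t + 1) // P - 1  # last block whose final token sits at position <= t
        if j < 0:
            out.append(list(v))  # no completed block yet: fast path only
        else:
            # Additive gate; gate = 0.0 reproduces the fast path exactly
            out.append([a + gate * s for a, s in zip(v, slow[j])])
    return out
```

With `gate=0.0`, as at initialisation, the output equals the fast-path input, so training starts from the dense baseline. Full attention over the T/P pooled tokens scores (T/P)^2 pairs, which is P^2 fewer than the T^2 pairs of full attention over T tokens.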

2606.16783 2026-06-16 cs.CV cs.AI cs.LG Cross-list

Gen-VCoT: Generative Visual Chain-of-Thought Reasoning via Diffusion-Based RGB Intermediate Representations

Zhiqiang Zhou, Junliang Dai, Xu ling

Affiliations: Hunan Chemical Industry Vocational and Technical College

AI summary: Proposes Gen-VCoT, which uses expert vision models to generate RGB images as intermediate reasoning steps and an adaptive router to select reasoning depth; it improves spatial and depth questions by 25% and 50% respectively but can hurt simple factual queries, indicating that the optimal representation is task-dependent.

Comments 12 pages, 5 figures

Abstract

Multimodal large language models (MLLMs) excel at visual reasoning but rely on text-based chain-of-thought (CoT), lacking interpretable visual intermediates. Existing methods use opaque tokens or external tools, missing key properties. We propose Gen-VCoT, a framework that uses expert vision models to generate RGB images as reasoning intermediates. It has three stages: visual grounding (SAM segmentation), geometric reasoning (Marigold depth maps), and semantic reasoning (Qwen2-VL integration). An adaptive router selects the reasoning depth. Evaluations show that Gen-VCoT improves performance on spatial questions (by 25%) and depth questions (by 50%) but may hurt simple factual queries. Text CoT outperforms visual intermediates on CLEVR (91.2% vs 62.5%), indicating that the optimal representation is task-dependent. Gen-VCoT establishes a new paradigm for interpretable multimodal reasoning.

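2606.16825 2026-06-16 cs.CL cs.AI cs.LG Cross-list

Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

Martin Jaggi

Affiliations: EPFL

AI summary: Proposes Expert Tying, which shares expert parameters across consecutive Transformer layers while keeping independent routing and attention, cutting the memory footprint of MoE models by almost 2x with virtually no loss in perplexity or downstream performance.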

Comments Code available at https://github.com/epfml/looped-moe

Abstract

Mixture-of-Experts (MoE) architectures efficiently scale Large Language Models (LLMs) by activating only a small fraction of their experts per token, yet the full parameter count, dominated by the expert parameters, must be held in training and inference memory. To address this, we introduce Expert Tying, an architectural modification that shares expert parameters across consecutive transformer layers while preserving independent, layer-wise routing and attention. We evaluate this approach across common, state-of-the-art architectures, including OLMoE, Qwen3, and DeepSeek-style MoEs. Our pretraining experiments demonstrate that tying experts can reduce memory footprint by almost 2x at virtually no degradation in perplexity or downstream quality. By exploiting the parameter redundancy inherent in MoE pathways, our method provides a highly favorable compute-to-memory trade-off, advancing efficient training and scaling of next-generation LLMs.
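To make the "almost 2x" figure concrete, the sketch below counts parameters with and without tying. It is a hypothetical accounting, not the paper's configuration: each expert is assumed to be a two-matrix FFN without biases, attention is assumed to have four d_model by d_model projections, and routers are linear. Because attention and the per-layer routers are never tied, the reduction stays just under 2x.

```python
import math


def moe_param_count(n_layers: int, n_experts: int, d_model: int, d_ff: int,
                    tie_group: int = 1) -> int:
    expert_bank = n_experts * 2 * d_model * d_ff              # shared when layers are tied
    per_layer = 4 * d_model * d_model + n_experts * d_model   # attention + router, never tied
    n_banks = math.ceil(n_layers / tie_group)                 # one bank per group of tied layers
    return n_banks * expert_bank + n_layers * per_layer


def bank_of_layer(n_layers: int, tie_group: int = 2) -> list:
    # Consecutive layers l and l + 1 read the same expert bank but keep their own router.
    return [layer // tie_group for layer in range(n_layers)]


untied = moe_param_count(16, 64, 2048, 1024)
tied = moe_param_count(16, 64, 2048, 1024, tie_group=2)
print(round(untied / tied, 2))
```

For these illustrative sizes (16 layers, 64 experts, d_model = 2048, d_ff = 1024) the ratio is about 1.89; the closer the experts come to dominating the total, the closer the ratio gets to 2.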

2606.16934 2026-06-16 cs.CL cs.LG Cross-list

Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter

Patomporn Payoungkhamdee, Napat Laosaengpha, Jenta Wonglertsakul, Pittawat Taveekitworachai, Pume Tuchinda, Panjapong Poobanchuen, Ekapol Chuangsuwanich, Can Udomcharoenchaikit, Samuel Cahyawijaya, Peerat Limkonchotiwat, Sarana Nutanong

Affiliations: Vidyasirimedhi Institute of Science and Technology; Kasetsart University; SCB 10X; King Mongkut’s University of Technology Thonburi; Department of Computer Engineering, Chulalongkorn University; Cohere; AI Singapore

AI summary: Studies code-interpreter reasoning through extrinsic properties (crucial tokens) and intrinsic properties (code-specific cognitive behaviors), finds that stronger models more often exhibit crucial tokens and behaviors such as verification and backtracking, and leverages these properties to improve performance at inference and training time.

Abstract

Reasoning with a Code Interpreter (CI) has emerged as an effective paradigm for enhancing the reasoning capabilities of large language models (LLMs) through executable computation and iterative verification. Despite its growing adoption, the behavioral properties underlying effective code reasoning remain largely underexplored. In this work, we investigate code reasoning from two distinct perspectives inspired by prior studies of natural language reasoning: extrinsic properties, represented by crucial tokens, and intrinsic properties, represented by code-specific cognitive behaviors. Across multiple LLMs, we find that stronger CI reasoning models consistently exhibit a higher prevalence of crucial tokens and cognitive behaviors, particularly verification, backtracking, and backward chaining. Building on these observations, we examine how these properties can be leveraged during both inference and training. At inference time, appending code-specific crucial tokens improves performance on several reasoning capabilities, including mathematical, ordering, and optimization, while yielding limited benefits elsewhere. At training time, augmenting a state-of-the-art framework with code-specific cognitive behaviors improves supervised fine-tuning and reinforcement learning performance in two of three evaluated models. Further analysis shows that these behaviors reduce overthinking in incorrect responses and improve token efficiency, while also revealing factors that limit gains in one of the evaluated models. Our findings provide the first systematic characterization of effective reasoning with CI and demonstrate both the potential and limitations of leveraging key properties to improve CI-based reasoning.

2606.16996 2026-06-16 cs.CV cs.AI cs.LG Cross-list

ActiveSAM: Image-Conditional Class Pruning for Fast and Accurate Open-Vocabulary Segmentation

Tran Dinh Tien, Zhiqiang Shen

Affiliations: VILA Lab, Mohamed bin Zayed University of Artificial Intelligence

AI summary: Proposes ActiveSAM, a training-free, zero-shot inference framework that turns SAM 3 into an active-vocabulary segmenter through image-conditional class pruning and a low-resolution preview; across 8 benchmarks it improves mIoU by about 1.4 on average over SegEarth-OV3 and runs up to 5.5x faster.

Comments Preprint. Code is available at https://github.com/VILA-Lab/ActiveSAM

Abstract

Segment Anything Model 3 (SAM 3) provides a strong frozen backbone for concept-prompted segmentation, but applying it directly to open-vocabulary semantic segmentation (OVSS) is inefficient: full-resolution decoding is typically run over the entire dataset vocabulary, whereas each image contains only a small active subset of classes. We introduce ActiveSAM, a training-free, zero-shot inference framework that turns SAM 3 into an active-vocabulary segmenter. ActiveSAM first canonicalizes and expands class prompts, then estimates an image-conditioned active set from a low-resolution presence preview. Only the retained classes are decoded at full resolution, using bucketed prompt multiplexing with the frozen SAM 3 decoder. The preview stage uses only class-presence evidence and skips unnecessary segmentation-head computation, while the final stage applies margin-aware background calibration to suppress low-confidence pixels. ActiveSAM requires no target-dataset training, no weight updates, and no oracle class-presence labels. Across eight OVSS benchmarks, ActiveSAM improves the speed-accuracy tradeoff of training-free open-vocabulary semantic segmentation, outperforming the current state-of-the-art SegEarth-OV3 by approximately +1.4 mIoU on average while running up to 5.5x faster on large-vocabulary datasets. ActiveSAM also demonstrates the strongest robustness under image corruption that simulates real-world distribution shift, making it well-suited for deployment in noisy-input domains such as autonomous driving and embodied AI. Code is available at https://github.com/VILA-Lab/ActiveSAM.
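The prune-then-decode idea can be sketched as follows. This is hypothetical: `presence` stands for per-class scores that a low-resolution preview would produce, the threshold, `min_keep`, and bucket size are placeholders rather than ActiveSAM's settings, and no SAM 3 interface is called.

```python
from typing import Dict, List


def active_set(presence: Dict[str, float], threshold: float = 0.5,
               min_keep: int = 1) -> List[str]:
    # Rank classes by preview presence score and keep those above the threshold
    ranked = sorted(presence, key=presence.get, reverse=True)
    kept = [c for c in ranked if presence[c] >= threshold]
    # Never return an empty vocabulary: fall back to the top `min_keep` classes
    return kept if len(kept) >= min_keep else ranked[:min_keep]


def bucket_prompts(classes: List[str], bucket_size: int) -> List[List[str]]:
    # Each bucket would be decoded together at full resolution
    return [classes[i:i + bucket_size] for i in range(0, len(classes), bucket_size)]


scores = {"road": 0.9, "car": 0.7, "whale": 0.01, "sky": 0.6}
print(bucket_prompts(active_set(scores), 2))
```

Only the retained classes reach full-resolution decoding, so the expensive stage scales with the image's active set rather than with the whole dataset vocabulary.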

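2606.17037 2026-06-16 cs.CV cs.AI cs.LG Cross-list

The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers

Alper Yıldırım

AI summary: Using internal phase-magnitude transplant experiments, finds that the predictions of image classifiers such as PRISM2D, GFNet, and ViT-B/16 depend mainly on phase/sign information, with image-specific magnitude contributing little to the readout; ResNet-50 carries a latent sign code before its ReLUs, which offers a mechanistic account of the texture-shape gap between CNNs and attention models.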

Abstract

Oppenheim and Lim (1981) showed that natural images stay recognizable when reconstructed from their Fourier phase alone, while the magnitude carries little of their identity. We ask whether trained image classifiers reproduce this asymmetry inside their hidden layers, and we test it causally: given two images, we transplant the phase of one onto the magnitude of the other at a chosen layer and record which image the prediction follows. In PRISM2D, GFNet, and ViT-B/16 the prediction follows the phase or sign donor, and deleting all image-specific magnitude barely moves accuracy, so identity rides on phase while image-specific magnitude is largely dispensable to the readout. ResNet-50 at first seems to break the pattern, because transplanting sign after its ReLUs does nothing; a fair intervention before the ReLU reveals a strong latent sign code in the late blocks, and a DC-only control shows the readout consumes a channel-wise spatial average. Controls rule out the trivial case in which magnitude simply stops depending on the image. The architectures therefore share a phase/sign identity code but expose it in different bases, set by rectification and readout geometry, which gives a mechanistic account of the texture--shape gap between CNNs and attention models.
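The Oppenheim-Lim swap itself is easy to state in code. The sketch below works on a toy 1D real vector with a stdlib DFT, whereas the paper transplants inside hidden layers of trained networks; the 1D setting and the `sign_transplant` helper (magnitude of one activation, sign of another, the analogue used around ReLUs) are our simplifications.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X: List[complex]) -> List[complex]:
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def phase_transplant(magnitude_donor: List[float], phase_donor: List[float]) -> List[float]:
    A, B = dft(magnitude_donor), dft(phase_donor)
    # Keep |A_k| from one input and the Fourier phase of B_k from the other
    mixed = [abs(a) * cmath.exp(1j * cmath.phase(b)) for a, b in zip(A, B)]
    # Both donors are real, so `mixed` is conjugate-symmetric and the inverse is real up to rounding
    return [v.real for v in idft(mixed)]


def sign_transplant(magnitude_donor: List[float], sign_donor: List[float]) -> List[float]:
    # Elementwise analogue: |a| with the sign of b (zero where b is zero)
    return [abs(a) * (1 if b > 0 else -1 if b < 0 else 0)
            for a, b in zip(magnitude_donor, sign_donor)]
```

The causal test then amounts to feeding the transplanted activation onward and checking whether the prediction follows the magnitude donor or the phase/sign donor.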

2606.17046 2026-06-16 cs.RO cs.CV cs.LG Cross-list

Geometric Action Model for Robot Policy Learning

Jisang Han, Seonghu Jeon, Jaewoo Jung, René Zurbrügg, Honggyu An, Tifanny Portela, Marco Hutter, Marc Pollefeys, Seungryong Kim, Sunghwan Hong

Affiliations: KAIST AI; ETH Zurich; ETH AI Center

AI summary: Proposes the Geometric Action Model (GAM), a language-conditioned manipulation policy that reuses a pretrained geometric foundation model (GFM) as a shared backbone and outperforms existing methods on simulated and real-robot tasks.

Comments Project page: https://cvlab-kaist.github.io/Geometric-Action-Model/

Abstract

Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors from large-scale foundation models, but they still operate primarily on 2D image frames or 2D-derived latent spaces, leaving implicit the 3D geometry required for contact-rich manipulation. We propose the Geometric Action Model (GAM), a language-conditioned manipulation policy that directly repurposes a pretrained geometric foundation model (GFM) as a shared substrate for perception, temporal prediction, and action decoding. GAM splits the GFM at an intermediate layer: the shallow layers serve as an observation encoder, and a causal future predictor inserted at the split layer forecasts future latent tokens conditioned on language, proprioception, and action history. The predicted future tokens are then routed through the remaining GFM blocks for feature propagation and decoding, allowing a single backbone to produce both future geometry and actions. This design equips the GFM with language-conditioned temporal world modeling through minimal architectural modification while preserving its rich geometric priors. Across a broad suite of simulation and real-robot manipulation benchmarks, GAM is more accurate, more robust, faster, and lighter than current foundation-model-scale baselines.
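GAM's split can be read as a simple data flow: shallow blocks encode, a predictor inserted at the split forecasts future latent tokens, and the remaining blocks decode them. The toy sketch below is hypothetical: layers are plain callables on lists of floats, and `predict_future` stands in for GAM's causal predictor, which in the paper also conditions on language, proprioception, and action history.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def run(layers: Sequence[Layer], tokens: List[float]) -> List[float]:
    for layer in layers:
        tokens = layer(tokens)
    return tokens


def split_backbone_forward(backbone: Sequence[Layer], split: int,
                           predict_future: Layer) -> Layer:
    shallow, deep = backbone[:split], backbone[split:]

    def forward(observation: List[float]) -> List[float]:
        latent = run(shallow, observation)   # shallow blocks act as the observation encoder
        future = predict_future(latent)      # forecast future latent tokens at the split layer
        return run(deep, future)             # remaining blocks propagate and decode them
    return forward
```

In GAM the deep blocks would emit both future geometry and actions from one backbone; here they simply return one list.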

2410.11687 2026-06-16 cs.LG cs.AI cs.NE Updated version

Learning in the Recurrent State: Gradient Descent with Linear Recurrent Networks

Yudou Tian, Neeraj Mohan Sushma, Harshvardhan Mestha, Nicolo Colombo, David Kappel, Anand Subramoney

Affiliations: Center for Cognitive Interaction Technology CITEC, Universität Bielefeld, Germany; Department of Electrical and Electronics Engineering, Birla Institute of Technology and Science Pilani, India; Department of Computer Science, Royal Holloway, University of London, United Kingdom

AI summary: Proposes GRIL, a linear recurrent network architecture whose multiplicative readout and sliding-window cross-product self-attention update let it implement minibatch gradient descent on a task-specific linear predictor within a single forward pass; the same bias also yields useful performance on Long Range Arena and language modeling.

Comments 28 pages, 11 figures

Abstract

Linear recurrent networks (LRNNs) offer linear-time sequence modeling, but standard recurrent updates do not directly expose the supervised products needed for in-context gradient descent. We propose a sufficient constructive inductive bias for LRNNs: equip a diagonal recurrent state with multiplicative readout and a short sliding-window cross-product self-attention update. The resulting architecture, Gradient-based Recurrent In-context Learner (GRIL), can implement minibatch gradient descent on a task-specific linear predictor during a single forward pass. The same design extends to multi-step updates and cross-entropy classification, with a limited MLP-based extension to non-linear regression. Empirically, trained GRILs recover the behavior and parameters predicted by the construction on synthetic ICL tasks, and the same architectural bias yields useful performance on Long Range Arena and language modelling. These results present windowed cross-product self-attention as a practical, testable inductive bias for LRNNs that learn in context through gradient-descent-like updates.
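The intuition that a linear recurrence can carry out in-context gradient descent can be checked in a few lines. The sketch shows the well-known single-step case: one gradient step from w = 0 on the squared loss equals accumulating `lr * y_t * x_t` in a recurrent state and reading it out multiplicatively against the query. It is the generic construction, not GRIL's sliding-window cross-product update, and it does not cover the multi-step or cross-entropy extensions.

```python
from typing import List


def gd_one_step(xs: List[List[float]], ys: List[float], lr: float) -> List[float]:
    """One gradient step from w = 0 on L(w) = 0.5 * sum((y - w.x)^2)."""
    d = len(xs[0])
    grad = [-sum(y * x[i] for x, y in zip(xs, ys)) for i in range(d)]  # dL/dw at w = 0
    return [-lr * g for g in grad]


def recurrent_predict(xs: List[List[float]], ys: List[float],
                      x_query: List[float], lr: float) -> float:
    state = [0.0] * len(x_query)
    for x, y in zip(xs, ys):
        # Linear recurrence over the context: state_t = state_{t-1} + lr * y_t * x_t
        state = [s + lr * y * xi for s, xi in zip(state, x)]
    # Multiplicative readout against the query token
    return sum(s * q for s, q in zip(state, x_query))
```

The recurrent model never materialises the weight vector as a separate parameter: the state itself is the updated predictor, which is what lets the update happen during a single forward pass.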

2502.07209 2026-06-16 cs.LG Updated version

Enhancing Physics-Informed Neural Networks Through Feature Engineering

Shaghayegh Fazliani, Zachary Frangella, Madeleine Udell

Affiliations: Department of Mathematics, Stanford University; Department of Management Science & Engineering, Stanford University; ICME, Stanford University

AI summary: Proposes SAFE-NET, a single-layer adaptive feature-engineering network that uses Fourier features and a simplified architecture to converge faster and typically reach lower errors than deeper networks while using fewer parameters.

Comments Published in Transactions on Machine Learning Research (TMLR), November 2025

Abstract

Physics-Informed Neural Networks (PINNs) seek to solve partial differential equations (PDEs) with deep learning. Mainstream approaches that deploy fully-connected multi-layer deep learning architectures require prolonged training to achieve even moderate accuracy, while recent work on feature engineering allows higher accuracy and faster convergence. This paper introduces SAFE-NET, a Single-layered Adaptive Feature Engineering NETwork that achieves orders-of-magnitude lower errors with far fewer parameters than baseline feature engineering methods. SAFE-NET returns to basic ideas in machine learning, using Fourier features, a simplified single hidden layer network architecture, and an effective optimizer that improves the conditioning of the PINN optimization problem. Numerical results show that SAFE-NET converges faster and typically outperforms deeper networks and more complex architectures. It consistently uses fewer parameters -- on average, 65% fewer than the competing feature engineering methods -- while achieving comparable accuracy in less than 30% of the training epochs. Moreover, each SAFE-NET epoch is 95% faster than those of competing feature engineering approaches. These findings challenge the prevailing belief that modern PINNs effectively learn features in these scientific applications and highlight the efficiency gains possible through feature engineering.
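A Fourier feature map followed by a single hidden layer, the two ingredients SAFE-NET combines, looks like this. Assumptions: a 1D input, fixed integer frequencies 1..K (SAFE-NET's features are adaptive), tanh hidden units, and hand-set weights; the PINN residual loss and the paper's optimizer are not shown.

```python
import math
from typing import List


def fourier_features(x: float, K: int) -> List[float]:
    # [sin(pi x), ..., sin(K pi x), cos(pi x), ..., cos(K pi x)]
    return ([math.sin(k * math.pi * x) for k in range(1, K + 1)]
            + [math.cos(k * math.pi * x) for k in range(1, K + 1)])


def single_hidden_layer(features: List[float], W: List[List[float]],
                        b: List[float], v: List[float]) -> float:
    # One tanh hidden layer followed by a linear output unit
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + bi)
              for row, bi in zip(W, b)]
    return sum(vi * h for vi, h in zip(v, hidden))
```

Because oscillatory structure is supplied by the features, the network itself can stay shallow; a PINN would train `W`, `b`, and `v` by minimising the PDE residual at collocation points.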

2505.18227 2026-06-16 cs.LG cs.AI Updated version

Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality

Zhenglun Kong, Yize Li, Fanhu Zeng, Lei Xin, Shvat Messica, Xue Lin, Pu Zhao, Manolis Kellis, Hao Tang, Marinka Zitnik

Affiliations: Harvard University; Northeastern University; CAS (Chinese Academy of Sciences); Wuhan University; MIT; Peking University

AI summary: Argues that token reduction should go beyond its traditional efficiency role and become a fundamental principle of generative modeling, since reducing redundant tokens can aid multimodal integration, mitigate hallucinations, maintain coherence over long inputs, and improve training stability.

Comments Project page: https://github.com/ZLKong/Awesome-Collection-Token-Reduction

Abstract

In Transformer architectures, tokens (discrete units derived from raw data) are formed by segmenting inputs into fixed-length chunks. Each token is then mapped to an embedding, enabling parallel attention computations while preserving the input's essential information. Due to the quadratic computational complexity of transformer self-attention mechanisms, token reduction has primarily been used as an efficiency strategy. This is especially true in single vision and language domains, where it helps balance computational costs, memory usage, and inference latency. Despite these advances, this paper argues that token reduction should transcend its traditional efficiency-oriented role in the era of large generative models. Instead, we position it as a fundamental principle in generative modeling, critically influencing both model architecture and broader applications. Specifically, we contend that across vision, language, and multimodal systems, token reduction can: (i) facilitate deeper multimodal integration and alignment, (ii) mitigate "overthinking" and hallucinations, (iii) maintain coherence over long inputs, and (iv) enhance training stability, etc. We reframe token reduction as more than an efficiency measure. By doing so, we outline promising future directions, including algorithm design, reinforcement learning-guided token reduction, token optimization for in-context learning, agentic framework design, and broader ML and scientific domains.

2505.23878 2026-06-16 cs.LG cs.AI Updated version

AC-ODM: Actor--Critic Online Data Mixing for Sample-Efficient LLM Pretraining

Jing Ma, Chenhao Dang, Mingjie Liao

Affiliations: National University of Singapore; University of Science and Technology of China

AI summary: Proposes AC-ODM, which optimizes pretraining data mixing from a reinforcement-learning perspective with a parameterized policy that maximizes constructive gradient interference; it supports proxy and non-proxy modes and substantially improves convergence speed and downstream accuracy.

Comments ICML 2026 (Poster)

Abstract

Optimizing pretraining data composition is pivotal for LLM generalization. While dynamic mixing outperforms static strategies by capturing evolving training dynamics, current methods fail to reconcile computational efficiency with sample efficiency and structural flexibility for diverse pipelines. We introduce Actor--Critic Online Data Mixing (AC-ODM), which approaches data mixing from a reinforcement learning perspective with a parameterized policy that we theoretically prove to act as a dynamic linear surrogate maximizing the constructive interference of gradients. To enhance practical flexibility, AC-ODM supports two operational modes: (i) a proxy mode for fixed, pre-prepared corpora, where a policy learned on a small model is transferred to a larger target; and (ii) a non-proxy mode for direct end-to-end training from scratch without priors. Empirically, AC-ODM significantly outperforms prior methods in convergence speed and downstream accuracy across various architectures. On Pythia-1B, it reaches optimal validation perplexity using up to 66% fewer training steps than competitive baselines, delivering a 27.5% relative improvement in MMLU accuracy and a 2.23x higher pass@1 on HumanEval, all while incurring a virtually negligible (0.4%) per-step wall-clock increase and only 2% additional memory overhead. Code is available at https://github.com/DANG-ai/AC-ODM.
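As a point of comparison for what a learned mixing policy outputs, the sketch below uses a simpler gradient-alignment heuristic, not AC-ODM's actor-critic: each domain is weighted by a softmax over the dot product between its gradient and a target gradient, so domains whose gradients interfere constructively are sampled more. The gradient vectors and temperature are hypothetical inputs.

```python
import math
from typing import Dict, List


def alignment_mixture(domain_grads: Dict[str, List[float]],
                      target_grad: List[float],
                      temperature: float = 1.0) -> Dict[str, float]:
    # Alignment score: dot product between each domain gradient and the target gradient
    scores = {name: sum(g * t for g, t in zip(grad, target_grad)) / temperature
              for name, grad in domain_grads.items()}
    top = max(scores.values())  # subtract the max before exponentiating for stability
    exp_scores = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exp_scores.values())
    return {name: e / total for name, e in exp_scores.items()}


weights = alignment_mixture({"web": [1.0, 0.0], "code": [0.0, 1.0], "math": [-1.0, 0.0]},
                            [1.0, 0.5])
```

A learned policy such as AC-ODM's replaces this fixed rule with one that is updated online, but the output has the same shape: a probability distribution over data domains for the next training steps.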

2506.20015 2026-06-16 cs.LG cs.IT cs.NE math.IT Updated version

Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons

Dengyu Wu, Jiechen Chen, H. Vincent Poor, Bipin Rajendran, Osvaldo Simeone

Affiliations: Department of Engineering, King’s College London; Department of Electrical and Computer Engineering, Princeton University; Institute for Intelligent Networked Systems, Northeastern University London

AI summary: Proposes a wireless split-computing architecture in which resonate-and-fire neurons process time-domain signals directly, extracting spectral features through oscillatory dynamics to lower spike rates and energy consumption while matching the accuracy of conventional methods on audio and modulation classification.

Abstract

Neuromorphic computing offers an energy-efficient alternative to conventional deep learning accelerators, particularly for real-time processing of time-series data. However, many edge applications, such as wireless sensing and audio recognition, generate streaming signals with rich spectral features that are not effectively captured by conventional leaky integrate-and-fire (LIF) spiking neurons. This paper investigates a wireless split computing architecture that employs resonate-and-fire (RF) neurons with oscillatory dynamics to process time-domain signals directly, eliminating the need for costly spectral pre-processing. By resonating at tunable frequencies, RF neurons extract time-localized spectral features while maintaining low spiking activity. This temporal sparsity translates into significant savings in both computation and transmission energy. Assuming an OFDM-based analog wireless interface for spike transmission, we present a complete system design and evaluate its performance on audio classification and modulation classification tasks. Experimental results show that the proposed RF-SNN architecture achieves comparable accuracy to conventional LIF-SNNs and ANNs, while substantially reducing spike rates and total energy consumption during inference and communication.
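The frequency selectivity that lets RF neurons skip spectral pre-processing can be seen in a discrete-time resonate-and-fire neuron in the style of Izhikevich (2001): a complex state z obeys dz/dt = (b + iω)z + I(t), and a spike is emitted when Im(z) crosses a threshold from below. The decay b, threshold, step size, and the absence of a reset are our assumptions; the paper's neuron model and discretisation may differ.

```python
import cmath
import math


def rf_spike_count(signal_hz: float, resonant_hz: float, b: float = -1.0,
                   threshold: float = 0.25, dt: float = 1e-3,
                   duration: float = 2.0) -> int:
    # Exact propagator of the homogeneous part: decay by b, rotate at the resonant frequency
    decay_rotate = cmath.exp(complex(b, 2 * math.pi * resonant_hz) * dt)
    z, spikes, above = 0j, 0, False
    for step in range(int(duration / dt)):
        current = math.sin(2 * math.pi * signal_hz * step * dt)  # time-domain input, no FFT
        z = z * decay_rotate + current * dt
        now_above = z.imag > threshold
        if now_above and not above:  # upward threshold crossing = one spike
            spikes += 1
        above = now_above
    return spikes


on_resonance = rf_spike_count(5.0, 5.0)
off_resonance = rf_spike_count(20.0, 5.0)
```

A 5 Hz input drives the 5 Hz neuron past threshold, while a 20 Hz input of the same amplitude stays far below it, which is the sparse, frequency-selective spiking the abstract relies on for energy savings.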

2508.05287 2026-06-16 cs.LG cs.AI Updated version

FlowState: Sampling-Rate-Equivariant Time-Series Forecasting

Lars Graf, Thomas Ortner, Stanisław Woźniak, Angeliki Pantazi

AI summary: Proposes FlowState, which pairs a state space model encoder with a functional basis decoder for sampling-rate-equivariant forecasting, adapting to different sampling rates and forecast horizons without retraining and achieving state-of-the-art results on GIFT-Eval.

Comments Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026. Copyright 2026 by the author(s)

Abstract

Existing time series foundation models (TSFMs), often based on transformer variants, lack adaptability to different sampling rates, struggle with generalization across varying context and target lengths, and are computationally inefficient. We introduce FlowState, a novel TSFM architecture that achieves sampling-rate-equivariant forecasting through a unified design that pairs a state space model (SSM) encoder with a functional basis decoder (FBD). This design enables continuous-time modeling and dynamic time-scale adjustment, allowing FlowState to inherently generalize across all possible temporal resolutions, and dynamically adjust the forecasting horizons without retraining. We further propose an efficient pretraining strategy that improves robustness and accelerates training. Despite being one of the smallest TSFMs, FlowState achieves state-of-the-art results on the widely used GIFT-Eval benchmark, while demonstrating superior adaptability to unseen sampling rates. Our detailed analyses confirm the effectiveness of its components, and we demonstrate its unique ability to adapt to varying input sampling rates.
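Sampling-rate equivariance follows naturally if the decoder emits coefficients of continuous basis functions over the horizon instead of one value per step. The sketch assumes a cosine basis on normalised time; FlowState's actual functional basis decoder may use a different basis and normalisation.

```python
import math
from typing import List


def evaluate_basis(coeffs: List[float], horizon: float, step: float) -> List[float]:
    """Evaluate sum_k c_k cos(pi k t) on a grid with spacing `step` over `horizon`."""
    n_points = int(round(horizon / step))
    values = []
    for i in range(n_points):
        t = i * step / horizon  # normalised time in [0, 1)
        values.append(sum(c * math.cos(math.pi * k * t) for k, c in enumerate(coeffs)))
    return values


coeffs = [0.5, -1.0, 0.25]  # stand-in for decoder output
hourly = evaluate_basis(coeffs, 24.0, 1.0)
half_hourly = evaluate_basis(coeffs, 24.0, 0.5)
```

Evaluating the same coefficients hourly and half-hourly gives identical values at the shared time points, so changing the sampling rate or horizon changes only the evaluation grid, not the model.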

2601.16509 2026-06-16 cs.LG cs.AI Updated version

Adaptive $k$NN graph model

Jiaye Li, Hang Xu, Shichao Zhang

Affiliations: The State Key Laboratory of Blockchain and Data Security, Zhejiang University; School of Computer Science and Engineering, Central South University; School of Computer Science and Engineering, Guangxi Normal University

AI summary: Proposes an adaptive graph model that combines a Hierarchical Navigable Small World (HNSW) graph with a pre-computed voting mechanism, shifting the cost of neighbor selection and weighting to the training phase so that inference runs in real time without sacrificing classification accuracy.

Comments 31 pages, 5 figures

Abstract

The $k$-nearest neighbors ($k$NN) algorithm is a cornerstone of non-parametric classification in artificial intelligence, yet its deployment in large-scale applications is persistently constrained by the computational trade-off between inference speed and accuracy. Existing approximate nearest neighbor solutions accelerate retrieval but often degrade classification precision and lack adaptability in selecting the optimal neighborhood size ($k$). Here, we present an adaptive graph model that decouples inference latency from computational complexity. By integrating a Hierarchical Navigable Small World (HNSW) graph with a pre-computed voting mechanism, our framework completely transfers the computational burden of neighbor selection and weighting to the training phase. Within this topological structure, higher graph layers enable rapid navigation, while lower layers encode precise, node-specific decision boundaries with adaptive neighbor counts. Benchmarking against eight state-of-the-art baselines across six diverse datasets, we demonstrate that this architecture significantly accelerates inference speeds, achieving real-time performance, without compromising classification accuracy. These findings offer a scalable, robust solution to the inherent inference bottleneck of $k$NN, laying an adaptive structural foundation for graph-based nonparametric learning.
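The split between training-time and inference-time work described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' model:

- Brute-force distance sorting stands in for the HNSW graph.
- The per-node rule for choosing k is hypothetical: take the widest of 5, 3 or 1 neighbours whose majority label holds at least 60% of the votes.

All neighbour selection and voting happens in `fit_votes`. `predict` only locates the nearest training node and returns the vote cached there.

```python
import math
from collections import Counter


def neighbors_by_distance(X, i):
    """All other training points, nearest first (brute force stands in for HNSW search)."""
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))


def fit_votes(X, y, candidate_ks=(5, 3, 1), min_share=0.6):
    """Training phase: choose a per-node k and cache that neighbourhood's majority label."""
    votes = []
    for i in range(len(X)):
        order = neighbors_by_distance(X, i)
        vote = y[order[0]]  # fallback: the 1-NN label
        for k in candidate_ks:  # try the widest neighbourhood first
            window = order[:k]
            label, count = Counter(y[j] for j in window).most_common(1)[0]
            if count / len(window) >= min_share:  # hypothetical "decisive enough" rule
                vote = label
                break
        votes.append(vote)
    return votes


def predict(X, votes, query):
    """Inference: one nearest-node lookup returns the cached vote; no voting at query time."""
    nearest = min(range(len(X)), key=lambda j: math.dist(X[j], query))
    return votes[nearest]


X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11), (11, 10), (11, 11)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]  # index 4 is a mislabeled point inside cluster 0
votes = fit_votes(X, y)
```

The adaptive neighbourhood overrides the mislabeled point at index 4, so a query next to it is still classified with its cluster.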

2601.22642 2026-06-16 cs.LG 版本更新

Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification

突破自然推理的边界:来自形式逻辑验证的交错奖励

Chuxue Cao, Jinluan Yang, Haoran Li, Kunhao Pan, Zijian Zhao, Zhengyu Chen, Yuchen Tian, Lijun Wu, Conghui He, Sirui Han, Yike Guo

AI总结 提出形式逻辑验证引导框架,通过交错验证与生成过程实时纠正推理错误,结合两阶段训练,在数学、逻辑和通用推理基准上显著提升大模型性能。

Comments Accepted by ICML 26

详情
AI中文摘要

大型语言模型(LLMs)展现出卓越的能力,但其随机性的下一个词预测会导致逻辑不一致和奖励黑客问题,而形式符号系统则避免了这些问题。为弥合这一差距,我们引入了一个形式逻辑验证引导的框架,该框架动态地将形式符号验证与自然语言生成过程交错进行,提供实时反馈以在错误发生时检测并纠正它们。与之前受限于被动事后验证的神经符号方法不同,我们的方法在推理链中主动惩罚中间谬误。我们通过一种新颖的两阶段训练流程来实现该框架,该流程协同了形式逻辑验证引导的监督微调和策略优化。在涵盖数学、逻辑和通用推理的六个基准上的广泛评估表明,我们的7B和14B模型分别以平均10.4%和14.2%的幅度优于最先进的基线。这些结果验证了形式验证可以作为一种可扩展机制,显著推动高级LLM推理的性能边界。

英文摘要

Large Language Models (LLMs) show remarkable capabilities, yet their stochastic next-token prediction creates logical inconsistencies and reward hacking that formal symbolic systems avoid. To bridge this gap, we introduce a formal logic verification-guided framework that dynamically interleaves formal symbolic verification with the natural language generation process, providing real-time feedback to detect and rectify errors as they occur. Distinguished from previous neuro-symbolic methods limited by passive post-hoc validation, our approach actively penalizes intermediate fallacies during the reasoning chain. We operationalize this framework via a novel two-stage training pipeline that synergizes formal logic verification-guided supervised fine-tuning and policy optimization. Extensive evaluation on six benchmarks spanning mathematical, logical, and general reasoning demonstrates that our 7B and 14B models outperform state-of-the-art baselines by average margins of 10.4% and 14.2%, respectively. These results validate that formal verification can serve as a scalable mechanism to significantly push the performance boundaries of advanced LLM reasoning.

2602.05352 2026-06-16 cs.LG math.SG 版本更新

Smoothness Errors in Dynamics Models and How to Avoid Them

动力学模型中的平滑误差及如何避免

Edward Berman, Luisa Li, Jung Yeon Park, Robin Walters

AI总结 本文研究了不同GNN在动力学建模中的平滑效应,证明了酉卷积对这类任务有害,并提出松弛酉卷积以平衡平滑性保持与物理系统的自然平滑需求。

Comments Ecstatic to share relaxed unitary mesh convolutions with the community :D! This version contains the camera ready for ICML 2026. Send me an email with your thoughts! I love getting mail :^)

详情
AI中文摘要

现代神经网络在求解曲面上的偏微分方程方面展现出潜力,通常做法是将曲面离散化为网格,并使用网格感知图神经网络进行学习。然而,图神经网络存在过平滑问题,即节点特征与其邻居的特征变得越来越相似。酉图卷积在数学上被约束为保持平滑度,因而被提出用于解决该问题。尽管如此,在许多物理系统(如扩散过程)中,平滑度会自然增加,酉性约束可能过强。本文系统研究了不同GNN在动力学建模中的平滑效应,并证明酉卷积会损害此类任务的性能。我们提出松弛酉卷积,在保持平滑度与物理系统所需的自然平滑之间取得平衡。我们还将酉卷积和松弛酉卷积从图推广到网格。在复杂网格上的热方程、波动方程等PDE以及天气预报实验中,我们发现我们的方法优于多个强基线,包括网格感知Transformer和等变神经网络。

英文摘要

Modern neural networks have shown promise for solving partial differential equations over surfaces, often by discretizing the surface as a mesh and learning with a mesh-aware graph neural network. However, graph neural networks suffer from oversmoothing, where a node's features become increasingly similar to those of its neighbors. Unitary graph convolutions, which are mathematically constrained to preserve smoothness, have been proposed to address this issue. Despite this, in many physical systems, such as diffusion processes, smoothness naturally increases and unitarity may be overconstraining. In this paper, we systematically study the smoothing effects of different GNNs for dynamics modeling and prove that unitary convolutions hurt performance for such tasks. We propose relaxed unitary convolutions that balance smoothness preservation with the natural smoothing required for physical systems. We also generalize unitary and relaxed unitary convolutions from graphs to meshes. In experiments on PDEs such as the heat and wave equations over complex meshes and on weather forecasting, we find that our method outperforms several strong baselines, including mesh-aware transformers and equivariant neural networks.

2602.05779 2026-06-16 cs.LG cs.IT math.IT 版本更新

How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs

如何控制方差以提高稀疏激活DNN和CNN的训练稳定性

Emily Dent, Jared Tanner

发表机构 * Mathematical Institute University of Oxford(牛津大学数学研究所)

AI总结 针对稀疏激活函数,提出增大高斯过程方差可提升训练稳定性,并设计新初始化策略实现隐藏层高达90%稀疏度的稳定训练。

详情
AI中文摘要

为随机初始化深度网络开发的混沌边缘(EoC)理论通过将中间层表征为高斯过程,既保留网络初始输出中的信息,又最小化梯度爆炸或消失,从而实现更高效的训练。该EoC理论给出了权重和偏置初始化分布方差的选择公式。对于在原点附近近似线性的激活函数,EoC理论通常促使高斯过程方差随深度增加收敛至零。本文考虑较少被研究的高稀疏诱导激活函数设置,即原点附近大范围的值被置为零。在此设置下,我们证明了一个新现象:使高斯过程方差收敛到更大不动点的初始化有利于训练稳定性。该理论指导了一种新的简单初始化策略,使得训练隐藏层稀疏度高达90%的DNN和CNN成为可能。

英文摘要

The Edge-of-Chaos (EoC) theory developed for the random initialization of deep networks allows more efficient training by both preserving information in the initial outputs of the network and minimising exploding or vanishing gradients through characterisation of the intermediate layers as Gaussian processes. This EoC theory provides formulae for the choice of the initialisation distribution variances of the weights and biases. For activations which are approximately linear around the origin, the EoC theory typically encourages the Gaussian process variance to converge towards zero with increasing depth. Here we consider the less studied setting of highly sparsity-inducing activations, where a large region of values near the origin is set to zero. In this setting we prove a new phenomenon whereby initialisations leading to larger fixed-point Gaussian process variances are beneficial to training stability. This theory informs a new, yet simple, initialisation strategy that allows training DNNs and CNNs with as much as 90% sparsity in the hidden layers.
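For readers new to EoC analysis, the sketch below computes the standard mean-field variance recursion q_{l+1} = σ_w² E[φ(√q_l z)²] + σ_b², with z ~ N(0, 1), by numerical quadrature. It is an illustration, not the paper's initialisation strategy. The soft-threshold activation with τ = 1 is an assumed example of a sparsity-inducing activation, and the quadrature settings are arbitrary.

Two behaviours are worth noting:

- For ReLU with σ_w² = 2 and σ_b² = 0, the map leaves q unchanged (the familiar He initialisation).
- For the soft threshold, the fraction of zeroed pre-activations is erf(τ/√(2q)). The variance a layer settles at therefore directly controls how sparse it is.

```python
import math


def gaussian_expectation(f, n=20000, lim=10.0):
    """E[f(z)] for z ~ N(0, 1), by the midpoint rule on [-lim, lim]."""
    h = 2.0 * lim / n
    total = 0.0
    for k in range(n):
        z = -lim + (k + 0.5) * h
        total += f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def variance_map(q, phi, sigma_w2, sigma_b2):
    """Mean-field recursion: q_next = sigma_w^2 * E[phi(sqrt(q) z)^2] + sigma_b^2."""
    s = math.sqrt(q)
    return sigma_w2 * gaussian_expectation(lambda z: phi(s * z) ** 2) + sigma_b2


def iterate(q0, phi, sigma_w2, sigma_b2, depth):
    """Propagate the pre-activation variance through `depth` identical layers."""
    q = q0
    for _ in range(depth):
        q = variance_map(q, phi, sigma_w2, sigma_b2)
    return q


def relu(x):
    return max(x, 0.0)


def soft_threshold(x, tau=1.0):
    """Zero on [-tau, tau]: an illustrative sparsity-inducing activation."""
    return math.copysign(max(abs(x) - tau, 0.0), x)


def sparsity(q, tau=1.0):
    """Fraction of N(0, q) pre-activations that soft_threshold maps to zero."""
    return math.erf(tau / math.sqrt(2.0 * q))
```

With σ_w² = 1 and σ_b² = 0, the soft threshold shrinks every input in magnitude, so q decays layer by layer. This is the kind of collapse a larger initial variance has to counteract.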

2602.08306 2026-06-16 cs.LG 版本更新

TextResNet: Decoupling and Routing Optimization Signals in Compound AI Systems via Deep Residual Tuning

TextResNet:通过深度残差调优解耦和路由复合AI系统中的优化信号

Suizhi Huang, Mei Li, Han Yu, Xiaoxiao Li

AI总结 针对文本梯度优化器在深度链中因语义纠缠导致归因模糊的问题,提出TextResNet框架,通过前向加性语义增量、后向语义梯度分解、因果路由和密度感知调度实现信号解耦与精准路由,在复合AI系统中性能优于TextGrad且更稳定。

Comments Accepted by ICML2026

详情
AI中文摘要

文本梯度式优化器(TextGrad)能够通过复合AI系统传播类似梯度的反馈。然而,它们在深度链中表现不佳。这一局限源于这些扩展工作流中的语义纠缠问题。在标准文本反向传播中,反馈信号将局部批评与上游上下文混合,导致归因模糊。为解决这一挑战,我们提出TextResNet,一个通过四项关键创新将优化过程重构为精确信号路由的框架。首先,在前向传播中,它强制加性语义增量以保留用于梯度流的恒等高速路。其次,在后向传播中,它通过语义投影器引入语义梯度分解,将反馈解耦为因果独立子空间。第三,它实现因果路由,将投影信号路由到其特定组件。最后,它执行密度感知优化调度,利用解耦信号动态分配资源到关键系统瓶颈。我们的结果表明,TextResNet不仅实现了优于TextGrad的性能,而且在基线崩溃的复合AI系统的智能体任务中表现出显著的稳定性。代码可在 https://github.com/JeanDiable/TextResNet 获取。

英文摘要

Textual Gradient-style optimizers (TextGrad) enable gradient-like feedback propagation through compound AI systems. However, they perform poorly on deep chains. This limitation stems from the Semantic Entanglement problem in these extended workflows. In standard textual backpropagation, feedback signals mix local critiques with upstream contexts, leading to Attribution Ambiguity. To address this challenge, we propose TextResNet, a framework that reformulates the optimization process to achieve precise signal routing via four key innovations. Firstly, in the forward pass, it enforces Additive Semantic Deltas to preserve an Identity Highway for gradient flow. Secondly, in the backward pass, it introduces Semantic Gradient Decomposition via a Semantic Projector to disentangle feedback into causally independent subspaces. Thirdly, it implements Causal Routing, which routes projected signals to their specific components. Finally, it performs Density-Aware Optimization Scheduling to leverage the disentangled signals to dynamically allocate resources to key system bottlenecks. Our results show that TextResNet not only achieves superior performance compared to TextGrad, but also exhibits remarkable stability for agentic tasks in compound AI systems where baselines collapse. Code is available at https://github.com/JeanDiable/TextResNet.

2602.11550 2026-06-16 cs.LG cs.AI 版本更新

TS-Memory: Plug-and-Play Memory for Time Series Foundation Models

TS-Memory: 时间序列基础模型的即插即用记忆模块

Sisuo Lyu, Siru Zhong, Tiegang Chen, Weilin Ruan, Qingxiang Liu, Taiqiang Lv, Qingsong Wen, Raymond Chi-Wing Wong, Yuxuan Liang

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Tencent(腾讯) Squirrel Ai Learning(松鼠AI) The Hong Kong University of Science and Technology(香港科技大学)

AI总结 提出参数化记忆蒸馏方法TS-Memory,通过轻量级记忆适配器增强冻结的时间序列基础模型,在分布偏移下实现无检索的高效零样本预测,显著提升点预测和概率预测性能。

详情
AI中文摘要

时间序列基础模型(TSFMs)通过大规模预训练实现了强大的零样本预测,但在分布偏移下将其适应到下游领域仍然具有挑战性。现有解决方案面临权衡:参数化适应可能导致灾难性遗忘,并需要昂贵的多领域维护,而非参数化检索虽然改善了预测,但由于数据存储搜索导致高推理延迟。我们提出了参数化记忆蒸馏,并将其实现为TS-Memory,一种增强冻结TSFMs的轻量级记忆适配器。TS-Memory分两个阶段训练。首先,我们构建一个离线、检索泄漏安全的kNN教师,从检索到的未来中合成置信度感知的分位数目标。其次,我们通过置信度门控监督将该检索诱导的分布校正蒸馏到轻量级记忆适配器中。在推理过程中,TS-Memory以常数时间开销融合记忆和骨干预测,实现无检索部署。在多种TSFMs和基准上的实验表明,与代表性的适应方法相比,在点预测和概率预测上均有一致的改进,效率与冻结骨干相当。代码:https://github.com/sisuolv/TS-Memory。

英文摘要

Time Series Foundation Models (TSFMs) achieve strong zero-shot forecasting through large-scale pre-training, but adapting them to downstream domains under distribution shift remains challenging. Existing solutions face a trade-off: Parametric Adaptation can cause catastrophic forgetting and requires costly multi-domain maintenance, while Non-Parametric Retrieval improves forecasts but incurs high inference latency due to datastore search. We propose Parametric Memory Distillation and implement it as TS-Memory, a lightweight memory adapter that augments frozen TSFMs. TS-Memory is trained in two stages. First, we construct an offline, retrieval-leakage-safe kNN teacher that synthesizes confidence-aware quantile targets from retrieved futures. Second, we distill this retrieval-induced distributional correction into a lightweight memory adapter via confidence-gated supervision. During inference, TS-Memory fuses memory and backbone predictions with constant-time overhead, enabling retrieval-free deployment. Experiments across diverse TSFMs and benchmarks demonstrate consistent improvements in both point and probabilistic forecasting over representative adaptation methods, with efficiency comparable to the frozen backbone. Code: https://github.com/sisuolv/TS-Memory.
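The kNN teacher step (retrieve similar windows, then turn their futures into quantile targets) can be illustrated with a small sketch. It rests on several assumptions, none taken from the paper:

- The datastore holds `(context, future, future_end_time)` triples.
- "Retrieval-leakage-safe" is read as "only windows whose future ends before the query context starts are eligible".
- Quantiles are plain empirical quantiles with linear interpolation.

The confidence-aware weighting and the distillation into the memory adapter are not modelled.

```python
import math


def quantile(values, q):
    """Empirical quantile with linear interpolation (numpy's default 'linear' rule)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def knn_quantile_targets(query_ctx, query_start, store, k=3, levels=(0.1, 0.5, 0.9)):
    """Quantile targets per forecast step from the k nearest leakage-safe windows.

    store holds (context, future, future_end_time) triples. Assumed leakage guard:
    a window is eligible only if its future ends before the query context starts.
    """
    eligible = [(ctx, fut) for ctx, fut, end in store if end < query_start]
    if not eligible:
        raise ValueError("no leakage-safe neighbours available")
    eligible.sort(key=lambda cf: math.dist(cf[0], query_ctx))
    futures = [fut for _, fut in eligible[:k]]
    horizon = len(futures[0])
    return {q: [quantile([f[h] for f in futures], q) for h in range(horizon)] for q in levels}


store = [
    ([0.0, 1.0], [1.0, 10.0], 5),
    ([0.0, 1.1], [2.0, 20.0], 6),
    ([0.1, 1.0], [3.0, 30.0], 7),
    ([0.0, 1.0], [100.0, 100.0], 50),  # identical context, but its future overlaps the query: excluded
    ([9.0, 9.0], [0.0, 0.0], 1),       # leakage-safe but too far away to be in the top k
]
targets = knn_quantile_targets([0.0, 1.0], 20, store)
```

The leakage guard matters here: without it, the window with an identical context but an overlapping future would be the top neighbour and would dominate the targets.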

2602.20427 2026-06-16 cs.LG cs.AR 版本更新

GauS: Differentiable Scheduling Optimization via Gaussian Reparameterization

GauS:通过高斯重参数化的可微分调度优化

Yaohui Cai, Vesal Bakhtazad, Cunxi Yu, Zhiru Zhang

发表机构 * Cornell University(康奈尔大学) University of Maryland, College Park(马里兰大学学院公园分校)

AI总结 提出GauS框架,利用高斯分布对算子调度进行随机松弛,以可微分方式优化调度,捕获时间序数性质并大幅减少参数空间,首次实现流水线调度的可微分化,达到帕累托最优。

详情
AI中文摘要

高效的算子调度是软件编译和硬件合成中的基本挑战。虽然最近的可微分方法试图用基于梯度的搜索替代传统方法(如精确求解器或启发式方法),但它们通常依赖于分类分布,未能捕获时间的序数性质,并且参数空间扩展性差。在本文中,我们提出了一种新颖的可微分框架GauS,该框架使用高斯分布将算子调度建模为随机松弛,充分利用了现代并行计算设备(如GPU)。通过将调度表示为连续高斯变量,我们成功捕获了时间的序数性质,并将优化空间减少了数个数量级。我们的方法非常灵活,可以表示各种目标和约束,为复杂的流水线调度问题提供了第一个可微分公式。我们在多个基准测试上评估了我们的方法,证明GauS实现了帕累托最优结果。

英文摘要

Efficient operator scheduling is a fundamental challenge in software compilation and hardware synthesis. While recent differentiable approaches have sought to replace traditional methods, such as exact solvers or heuristics, with gradient-based search, they typically rely on categorical distributions that fail to capture the ordinal nature of time and suffer from a parameter space that scales poorly. In this paper, we propose GauS, a novel differentiable framework that models operator scheduling as a stochastic relaxation using Gaussian distributions and fully utilizes modern parallel computing devices such as GPUs. By representing schedules as continuous Gaussian variables, we capture the ordinal nature of time and reduce the optimization space by orders of magnitude. Our method is flexible enough to represent various objectives and constraints, and it provides the first differentiable formulation for the complex pipelined scheduling problem. We evaluate our method on a range of benchmarks, demonstrating that GauS achieves Pareto-optimal results.
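To make "scheduling as a Gaussian stochastic relaxation" concrete, here is a minimal sketch under explicit assumptions. It is not GauS's objective or optimizer:

- Each operator's start time is an independent Gaussian N(μ_i, σ_i²).
- A dependency is an `(i, j, latency)` edge requiring t_j − t_i ≥ latency.
- The penalty is the expected number of violated edges, which has a closed form through the normal CDF and is therefore differentiable in the means.

Each operator carries two parameters (a mean and a standard deviation) instead of one logit per candidate time step, which is where a Gaussian relaxation saves parameters. For simplicity, the standard deviations are held fixed during optimization.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_dependency_met(mu, sigma, i, j, latency):
    """P(t_j - t_i >= latency) for independent Gaussian start times t_i and t_j."""
    spread = math.sqrt(sigma[i] ** 2 + sigma[j] ** 2)
    return normal_cdf((mu[j] - mu[i] - latency) / spread)


def expected_violations(mu, sigma, edges):
    """Smooth penalty: expected number of violated (i, j, latency) edges."""
    return sum(1.0 - prob_dependency_met(mu, sigma, i, j, lat) for i, j, lat in edges)


def d_violations_d_mu(mu, sigma, edges):
    """Analytic gradient of expected_violations with respect to each mean start time."""
    grad = [0.0] * len(mu)
    for i, j, lat in edges:
        spread = math.sqrt(sigma[i] ** 2 + sigma[j] ** 2)
        z = (mu[j] - mu[i] - lat) / spread
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        grad[j] -= pdf / spread  # starting j later reduces the violation probability
        grad[i] += pdf / spread  # starting i later increases it
    return grad


def optimize(mu, sigma, edges, steps=100, lr=0.5):
    """Plain gradient descent on the means; sigmas stay fixed in this sketch."""
    mu = list(mu)
    for _ in range(steps):
        g = d_violations_d_mu(mu, sigma, edges)
        mu = [m - lr * gi for m, gi in zip(mu, g)]
    return mu


def round_schedule(mu):
    """Discretize the relaxed schedule by rounding each mean to the nearest cycle."""
    return [round(m) for m in mu]
```

A real scheduler would also add resource and latency objectives. This sketch only shows how a Gaussian parameterization turns precedence constraints into a smooth, GPU-friendly loss.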

2603.06861 2026-06-16 cs.LG cs.CV 版本更新

IGLU: The Integrated Gaussian Linear Unit Activation Function

IGLU:集成高斯线性单元激活函数

Mingi Kang, Zai Yang, Jeova Farias Sales Rocha Neto

发表机构 * Bowdoin College(鲍登学院)

AI总结 提出IGLU激活函数,基于半正态混合分布推导出闭式表达,其门控为柯西CDF,通过单一锐度参数在恒等与ReLU行为间插值,重尾特性保证非零梯度,并给出仅含ReLU操作的有理近似,在视觉和语言任务上达到或超越ReLU/GELU性能。

详情
AI中文摘要

激活函数对深度神经网络至关重要,控制着梯度流、优化稳定性和表示能力。在历史深度架构中,ReLU一直是激活函数的主要选择,而现代基于Transformer的模型越来越多地采用更平滑的替代方案,如GELU和其他自门控替代方案。尽管它们在经验上取得了成功,但这些函数之间的数学关系及其有效性背后的原理仍仅被部分理解。我们引入了IGLU,一个参数化激活函数,作为在半正态混合分布下的GELU门控的尺度混合推导得出。该推导产生了一个闭式表达式,其门控分量恰好是柯西CDF,提供了一个原则性的单参数族,通过单一锐度参数$\sigma$在类恒等和类ReLU行为之间连续插值。与GELU的高斯门控不同,IGLU的重尾柯西门控在负尾处以多项式衰减,保证所有有限输入的非零梯度,并对梯度消失具有更强的鲁棒性。我们进一步引入了IGLU-Approx,一种计算高效的IGLU有理近似,完全用ReLU操作表示,消除了超越函数求值。通过在CIFAR-10、CIFAR-100和WikiText-103上使用ResNet-20、ViT-Tiny和GPT-2 Small进行的评估,IGLU在视觉和语言数据集上相对于ReLU和GELU基线实现了具有竞争力或更优的性能,而IGLU-Approx以大幅降低的计算成本恢复了这一性能。特别地,我们表明在高度不平衡的分类数据集中,使用重尾门控带来了显著的性能提升。

英文摘要

Activation functions are fundamental to deep neural networks, governing gradient flow, optimization stability, and representational capacity. While ReLU has historically been the dominant activation in deep architectures, modern transformer-based models are increasingly adopting smoother alternatives such as GELU and other self-gated functions. Despite their empirical success, the mathematical relationships among these functions and the principles underlying their effectiveness remain only partially understood. We introduce IGLU, a parametric activation function derived as a scale mixture of GELU gates under a half-normal mixing distribution. This derivation yields a closed-form expression whose gating component is exactly the Cauchy CDF, providing a principled one-parameter family that continuously interpolates between identity-like and ReLU-like behavior via a single sharpness parameter $\sigma$. Unlike GELU's Gaussian gate, IGLU's heavy-tailed Cauchy gate decays polynomially in the negative tail, guaranteeing non-zero gradients for all finite inputs and offering greater robustness to vanishing gradients. We further introduce IGLU-Approx, a computationally efficient rational approximation of IGLU expressed entirely in terms of ReLU operations that eliminates transcendental function evaluation. Through evaluations on CIFAR-10, CIFAR-100, and WikiText-103 across ResNet-20, ViT-Tiny, and GPT-2 Small, IGLU achieves competitive or superior performance on both vision and language datasets against ReLU and GELU baselines, with IGLU-Approx recovering this performance at substantially reduced computational cost. In particular, we show that employing a heavy-tailed gate leads to considerable performance gains in heavily imbalanced classification datasets.
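The gating claim can be checked numerically with a short sketch. It assumes the parameterization IGLU(x) = x · F(x), where F(x) = 1/2 + arctan(x/σ)/π is the Cauchy CDF with scale σ. Whether the paper scales by σ or by 1/σ is not stated in the abstract, and IGLU-Approx is not reproduced here.

Under this assumption:

- A small σ approaches ReLU.
- A large σ approaches the linear map x/2.
- The derivative F(x) + x·f(x) stays strictly positive for negative inputs, decaying only polynomially. GELU's derivative underflows to about zero at x = −10.

```python
import math


def cauchy_cdf(x, sigma):
    return 0.5 + math.atan(x / sigma) / math.pi


def iglu(x, sigma=1.0):
    """x times a Cauchy-CDF gate (assumed parameterization with scale sigma)."""
    return x * cauchy_cdf(x, sigma)


def iglu_grad(x, sigma=1.0):
    """d/dx [x F(x)] = F(x) + x f(x), where f is the Cauchy density."""
    pdf = sigma / (math.pi * (x * x + sigma * sigma))
    return cauchy_cdf(x, sigma) + x * pdf


def gelu_grad(x):
    """d/dx [x Phi(x)] = Phi(x) + x phi(x) for the exact (erf-based) GELU."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf
```

For example, `iglu_grad(-10.0)` is about 2.1e-4, while `gelu_grad(-10.0)` is on the order of 1e-21. This is the heavy-tail effect the abstract describes.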

2603.07079 2026-06-16 cs.LG cs.CL 版本更新

Entropy-Aware On-Policy Distillation of Language Models

语言模型的熵感知在线策略蒸馏

Woogyeol Jin, Taywon Min, Yongjin Yang, Dennis Wei, Yi Zhou, Swanand Ravindra Kadhe, Nathalie Baracaldo, Kimin Lee

AI总结 针对在线策略蒸馏中反向KL导致生成多样性下降和教师高熵时学习信号不稳定的问题,提出熵感知的在线策略蒸馏方法,通过在高熵时引入前向KL平衡模式寻求与模式覆盖,提升了生成多样性和学生-教师对齐度。

Comments 18 pages, 11 figures, ICML 2026

详情
AI中文摘要

在线策略蒸馏是一种有前景的语言模型知识迁移方法,学生模型沿着自身轨迹从密集的token级信号中学习。该框架通常使用反向KL散度,鼓励学生匹配教师的高置信度预测。然而,我们表明反向KL的模式寻求特性会降低生成多样性,并在教师分布具有高熵时产生不稳定的学习信号。为解决此问题,我们引入了熵感知的在线策略蒸馏。我们的关键思想是在教师熵高时,用前向KL增强标准的反向KL目标,以捕获全部合理输出范围,同时在其他地方保留精确模仿。它在不牺牲在线策略训练效率的情况下,平衡了模式寻求的精确性与模式覆盖的鲁棒性。实验表明,我们的方法保持了生成多样性(持续的token级熵),并改善了学生-教师对齐(在高熵token上降低前向KL)。在六个数学推理基准上,与基线在线策略蒸馏方法相比,Qwen3-0.6B-Base的Pass@8准确率提升+1.37,Qwen3-1.7B-Base提升+2.39,Qwen3-4B-Base提升+5.05。这些结果表明,考虑教师不确定性对于保持多样性和实现有效知识迁移至关重要。

英文摘要

On-policy distillation is a promising approach for transferring knowledge between language models, where a student learns from dense token-level signals along its own trajectories. This framework typically uses reverse KL divergence, encouraging the student to match the teacher's high-confidence predictions. However, we show that the mode-seeking property of reverse KL reduces generation diversity and yields unstable learning signals when the teacher distribution has high entropy. To address this, we introduce Entropy-Aware On-Policy Distillation. Our key idea is augmenting the standard reverse KL objective with forward KL when teacher entropy is high, capturing the full range of plausible outputs while retaining precise imitation elsewhere. It balances mode-seeking precision with mode-covering robustness without sacrificing on-policy training efficiency. Experiments show that our method maintains generation diversity (sustained token-level entropy) and improves student-teacher alignment (lower forward KL on high-entropy tokens). Across six math reasoning benchmarks, this yields Pass@8 accuracy gains of +1.37 for Qwen3-0.6B-Base, +2.39 for Qwen3-1.7B-Base, and +5.05 for Qwen3-4B-Base compared to baseline on-policy distillation methods. These results demonstrate that accounting for teacher uncertainty is essential for maintaining diversity and achieving effective knowledge transfer.
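The per-token objective described above can be sketched as follows. This is a minimal sketch with assumptions the abstract does not specify:

- A hard switch adds forward KL only when the teacher's entropy exceeds a threshold.
- The threshold (1.0 nat) and the forward-KL weight (1.0) are placeholders.

Distributions are plain probability lists over the same vocabulary, and the loss is averaged over the positions of one student-sampled trajectory.

```python
import math


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl(p, q):
    """KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def token_loss(student, teacher, entropy_threshold=1.0, forward_weight=1.0):
    loss = kl(student, teacher)  # reverse KL: mode-seeking, the usual on-policy objective
    if entropy(teacher) > entropy_threshold:
        # uncertain teacher: add mode-covering forward KL so plausible tokens are not dropped
        loss += forward_weight * kl(teacher, student)
    return loss


def sequence_loss(student_steps, teacher_steps, **kwargs):
    """Mean per-token loss over the positions of one student-sampled trajectory."""
    losses = [token_loss(s, t, **kwargs) for s, t in zip(student_steps, teacher_steps)]
    return sum(losses) / len(losses)
```

On confident teacher tokens this reduces to ordinary reverse-KL on-policy distillation. The extra forward-KL term only activates where the teacher spreads its mass.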

2603.13751 2026-06-16 cs.LG 版本更新

Manifold-Orthogonal Dual-spectrum Extrapolation for Parameterized Physics-Informed Neural Networks

流形正交双谱外推法用于参数化物理信息神经网络

Zhangyong Liang, Huanhuan Gao

发表机构 * National Center for Applied Mathematics, Tianjin University(天津大学国家应用数学中心) School of Mechanical and Aerospace Engineering, Jilin University(吉林大学机械与航空航天工程学院)

AI总结 提出流形正交双谱外推法(MODE),通过主谱密集混合、残谱激活和平移解锁三种机制,在保持SVD参数效率的同时实现参数化PINN的强分布外泛化。

详情
AI中文摘要

物理信息神经网络(PINN)在建模由偏微分方程(PDE)控制的动力系统方面取得了显著成功。为避免在新的物理条件下进行昂贵的重新训练,参数化PINN(P$^2$INN)通常使用奇异值分解(SVD)来适应预训练算子以处理分布外(OOD)区域。然而,基于SVD的微调常常受到刚性子空间锁定和重要高频谱模式截断的限制,从而削弱其捕捉复杂物理转变的能力。虽然参数高效微调(PEFT)方法看起来是有希望的替代方案,但将诸如LoRA之类的传统适配器应用于P$^2$INN会引入严重的帕累托权衡,因为加法更新增加了参数开销并破坏了算子表示中固有的结构化物理流形。为了解决这些限制,我们提出了流形正交双谱外推法(MODE),这是一种用于物理算子适应的轻量级微架构。MODE将物理演化分解为互补机制,包括主谱密集混合(在冻结的正交基内实现跨模态能量转移)、残谱激活(通过单个可训练标量激活高频谱分量)以及仿射伽利略解锁(显式隔离空间平移动力学)。在具有挑战性的PDE基准测试(包括一维对流-扩散-反应方程和二维亥姆霍兹方程)上的实验表明,MODE在保持原生SVD的最小参数复杂性的同时实现了强大的分布外泛化,并优于现有的基于PEFT的基线方法。

英文摘要

Physics-informed neural networks (PINNs) have achieved notable success in modeling dynamical systems governed by partial differential equations (PDEs). To avoid computationally expensive retraining under new physical conditions, parameterized PINNs (P$^2$INNs) commonly adapt pre-trained operators using singular value decomposition (SVD) for out-of-distribution (OOD) regimes. However, SVD-based fine-tuning often suffers from rigid subspace locking and truncation of important high-frequency spectral modes, limiting its ability to capture complex physical transitions. While parameter-efficient fine-tuning (PEFT) methods appear to be promising alternatives, applying conventional adapters such as LoRA to P$^2$INNs introduces a severe Pareto trade-off, as additive updates increase parameter overhead and disrupt the structured physical manifolds inherent in operator representations. To address these limitations, we propose Manifold-Orthogonal Dual-spectrum Extrapolation (MODE), a lightweight micro-architecture designed for physics operator adaptation. MODE decomposes physical evolution into complementary mechanisms including principal-spectrum dense mixing that enables cross-modal energy transfer within frozen orthogonal bases, residual-spectrum awakening that activates high-frequency spectral components through a single trainable scalar, and affine Galilean unlocking that explicitly isolates spatial translation dynamics. Experiments on challenging PDE benchmarks including the 1D Convection--Diffusion--Reaction equation and the 2D Helmholtz equation demonstrate that MODE achieves strong out-of-distribution generalization while preserving the minimal parameter complexity of native SVD and outperforming existing PEFT-based baselines.

2605.06734 2026-06-16 cs.LG cs.AI quant-ph 版本更新

Gated QKAN-FWP: Scalable Quantum-inspired Sequence Learning

门控QKAN-FWP:可扩展的量子启发序列学习

Kuo-Chung Peng, Samuel Yen-Chi Chen, Jiun-Cheng Jiang, Chen-Yu Liu, En-Jui Kuo, Yun-Yuan Wang, Prayag Tiwari, Andrea Ceschini, Chi-Sheng Chen, Yu-Chao Hsu, Chun-Hua Lin, Tai-Yue Li, Antonello Rosato, Massimo Panella, Simon See, Saif Al-Kuwari, Kuan-Cheng Chen, Nan-Yow Chen, Hsi-Sheng Goan

发表机构 * Department of Physics and Center for Theoretical Physics, National Taiwan University(物理系与理论物理中心,国立台湾大学) National Center for High-Performance Computing, National Institutes of Applied Research(国家高速网络与计算中心,应用研究国家机构) Wells Fargo, New York, NY, USA(富国银行,纽约,纽约州,美国) NVIDIA AI Technology Center, NVIDIA Corp., Taipei, Taiwan(NVIDIA AI技术中心,NVIDIA公司,台北,台湾) Center for Quantum Science and Engineering, National Taiwan University(量子科学与工程中心,国立台湾大学) Graduate Institute of Applied Physics, National Taiwan University(应用物理研究所,国立台湾大学) Department of Electrophysics, National Yang Ming Chiao Tung University(电子物理系,国立阳明交通大学) School of Information Technology, Halmstad University(信息科技学院,哈尔姆斯塔德大学) Department of Information Engineering, Electronics and Telecommunications (DIET), University of Rome “La Sapienza”, Rome, Italy(信息工程、电子与电信系(DIET),罗马大学(La Sapienza),罗马,意大利) Beth Israel Deaconess Medical Center & Harvard Medical School(贝斯以色列女执事医疗中心及哈佛医学院) Cross College Elite Program, National Cheng Kung University(跨学院精英计划,国立成功大学)

AI总结 提出门控QKAN-FWP框架,融合快速权重编程与量子启发KAN,使用单量子比特数据重上传电路作为非线性激活,引入标量门控更新规则,在时间序列基准、MiniGrid强化学习和太阳周期预测中优于经典循环模型,并在NISQ设备上验证了可行性。

Comments 46 pages, 13 figures, 10 tables

详情
AI中文摘要

快速权重编程器(FWP)通过动态更新的参数而非循环隐藏状态来编码时间依赖关系。量子FWP(QFWP)使用变分量子电路(VQC)扩展了这一思想,但现有实现依赖于多量子比特架构,在噪声中等规模量子(NISQ)设备上难以扩展,且经典模拟成本高昂。我们提出了门控QKAN-FWP,一种将FWP与量子启发Kolmogorov-Arnold网络(QKAN)相结合的快速权重框架,使用单量子比特数据重上传电路作为可学习非线性激活,称为数据重上传激活(DARUAN)。我们进一步引入了一种标量门控快速权重更新规则以稳定参数演化,并对其自适应记忆核、几何有界性和可并行梯度路径进行了理论分析。我们在时间序列基准和MiniGrid强化学习上评估了该框架,并将真实世界的太阳周期预测作为主要实践成果重点展示。在528个月输入窗口和132个月预测范围的长时域设置中,我们的12.5k参数模型实现了比一系列经典循环基线(参数量最多为其13倍)更低的缩放均方误差(MSE)、峰值幅度误差和峰值时间误差,这些基线包括长短期记忆网络(LSTM)(25.9k-89.1k参数)、WaveNet-LSTM(167k)、普通循环神经网络(11.5k)和改进的回声状态网络(132k)。为了验证NISQ兼容性,我们进一步在IonQ和IBM量子处理器上部署了训练好的快速编程器,在1024次测量下恢复了与无噪声模拟器相对MSE差距在0.1%以内的预测精度。这些结果使门控QKAN-FWP成为一种可扩展、参数高效且NISQ兼容的量子启发序列建模方法。

英文摘要

Fast Weight Programmers (FWPs) encode temporal dependencies through dynamically updated parameters rather than recurrent hidden states. Quantum FWPs (QFWPs) extend this idea with variational quantum circuits (VQCs), but existing implementations rely on multi-qubit architectures that are difficult to scale on noisy intermediate-scale quantum (NISQ) devices and expensive to simulate classically. We propose gated QKAN-FWP, a fast-weight framework that integrates FWP with Quantum-inspired Kolmogorov-Arnold Network (QKAN) using single-qubit data re-uploading circuits as learnable nonlinear activation, known as DatA Re-Uploading ActivatioN (DARUAN). We further introduce a scalar-gated fast-weight update rule that stabilizes parameter evolution, supported by a theoretical analysis of its adaptive memory kernel, geometric boundedness, and parallelizable gradient paths. We evaluate the framework on time-series benchmarks and MiniGrid reinforcement learning, and highlight real-world solar cycle forecasting as our main practical result. In the long-horizon setting with a 528-month input window and a 132-month forecast horizon, our 12.5k-parameter model achieves lower scaled Mean Square Error (MSE), peak amplitude error, and peak timing error than a suite of classical recurrent baselines with up to 13x more parameters, including Long Short-Term Memory (LSTM) networks (25.9k-89.1k parameters), WaveNet-LSTM (167k), Vanilla recurrent neural network (11.5k), and a Modified Echo State Network (132k). To validate NISQ compatibility, we further deploy the trained fast programmer on IonQ and IBM Quantum processors, recovering forecasting accuracy within 0.1% relative MSE of the noiseless simulator at 1024 shots. These results position gated QKAN-FWP as a scalable, parameter-efficient, and NISQ-compatible approach to quantum-inspired sequence modeling.
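The abstract does not give the scalar-gated update rule. The sketch below therefore uses a hypothetical convex-combination form, W_t = g_t·W_{t−1} + (1 − g_t)·(k_t ⊗ v_t) with g_t = sigmoid(a_t). It uses plain outer products in place of DARUAN-generated updates and involves no quantum circuit.

It illustrates the two properties the abstract names, for this assumed rule only:

- **Memory kernel.** Unrolling gives W_T = Σ_s (1 − g_s) Π_{t>s} g_t Δ_s, a data-dependent exponentially decaying memory.
- **Boundedness.** Because each step is a convex combination, ‖W_t‖_F ≤ max(‖W_0‖_F, max_s ‖Δ_s‖_F).

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def outer(k, v):
    return [[ki * vj for vj in v] for ki in k]


def gated_update(W, delta, g):
    """Scalar gate g in (0, 1): keep g of the old fast weights, write (1 - g) of the new update."""
    return [[g * w + (1.0 - g) * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def frobenius(M):
    return math.sqrt(sum(x * x for row in M for x in row))


def run(keys, values, gate_logits):
    """Unroll the hypothetical fast-weight recurrence over a sequence, starting from W_0 = 0."""
    W = [[0.0] * len(values[0]) for _ in range(len(keys[0]))]
    for k, v, a in zip(keys, values, gate_logits):
        W = gated_update(W, outer(k, v), sigmoid(a))
    return W
```

With a constant gate of 0.9, a single write of 1.0 decays to 0.1 · 0.9⁴ after four further empty steps. This is the geometric memory kernel in its simplest form.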

2605.22873 2026-06-16 cs.LG cs.AI cs.CL 版本更新

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

LLM何时推理?基于熵相变的动力系统视角

Wei Xia, Haoqing Wang, Zhi-Hong Deng, Yehui Tang

发表机构 * Samsung Research(三星研究院) State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University(北京大学智能学院,通用人工智能国家重点实验室)

AI总结 本文通过早期解码熵动态检测LLM的推理状态,提出轻量级无训练路由框架EDRM,自适应选择推理策略,在减少token消耗的同时提升准确率。

详情
AI中文摘要

链式思维(CoT)推理已成为增强LLM能力的默认策略,但其应用引发了一个基本问题:显式推理何时真正有益?实证证据揭示了一个显著悖论:CoT在事实性和开放式任务上往往只带来边际甚至负的增益,同时成倍增加token消耗。在这项工作中,我们表明LLM推理不是任务或模型的静态属性,而是在生成过程中涌现的动态解码状态。通过系统分析,我们发现早期熵动态为这一状态提供了可靠信号:受益于CoT的任务表现出一致的熵降低,而其他任务则呈现不稳定或上升的模式。这种行为可以解释为从高熵探索状态到低熵结构化推理状态的类相变转变。基于这些见解,我们提出了EDRM(基于熵动态的推理流形),一个轻量级且无需训练的路由框架,利用早期解码熵自适应选择推理策略。EDRM将熵轨迹嵌入到紧凑且可解释的流形表示中,支持零样本部署和细粒度实例级适应。在15个基准测试和4个不同规模与架构的LLM上,EDRM始终优于静态基线。在数据集层面,EDRM实现了41%至55%的token减少,同时仅需50个校准样本即可提高准确率。在实例层面,它进一步将准确率提升高达4.7%,同时保持27%至45%的token节省。这些结果表明,推理应被选择性地调用而非默认使用,并展示了基于熵的解码控制对于高效自适应LLM推理的有效性。

英文摘要

Chain-of-thought (CoT) reasoning has become the default strategy for enhancing LLM capabilities, yet its application raises a fundamental question: when is explicit reasoning actually beneficial? Empirical evidence reveals a striking paradox: CoT often provides marginal or even negative gains on factual and open-ended tasks while multiplying token consumption. In this work, we show that LLM reasoning is not a static property of tasks or models, but a dynamic decoding state that emerges during generation. Through systematic analysis, we find early-stage entropy dynamics provide a reliable signal of this state: tasks benefiting from CoT exhibit consistent entropy reduction, while others display unstable or increasing patterns. This behavior can be interpreted as a phase-transition-like shift from a high-entropy exploratory regime to a low-entropy structured reasoning regime. Based on these insights, we propose EDRM (Entropy Dynamics-based Reasoning Manifold), a lightweight and training-free routing framework that leverages early decoding entropy to adaptively select inference strategies. EDRM embeds entropy trajectories into a compact and interpretable manifold representation, enabling both zero-shot deployment and fine-grained instance-level adaptation. Across 15 benchmarks and 4 LLMs of varying scales and architectures, EDRM consistently outperforms static baselines. At the dataset level, EDRM achieves 41-55% token reduction while improving accuracy with as few as 50 calibration samples. At the instance level, it further improves accuracy by up to 4.7% while maintaining 27-45% token savings. These results suggest that reasoning should be invoked selectively rather than by default, and demonstrate the effectiveness of entropy-driven decoding control for efficient and adaptive LLM inference.
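The core signal (early-decoding entropy falling versus rising) can be illustrated with a deliberately simple router. This is not EDRM, which embeds entropy trajectories into a manifold representation and calibrates on samples. The least-squares slope rule, the zero tolerance, and the route names are hypothetical, chosen only to show how per-step token entropies become a routing decision.

```python
import math


def token_entropy(probs):
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_slope(values):
    """Least-squares slope of an entropy trajectory against decoding step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two decoding steps")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def route(early_step_probs, tol=0.0):
    """Hypothetical rule: falling early entropy -> chain-of-thought, otherwise answer directly."""
    trajectory = [token_entropy(p) for p in early_step_probs]
    return "cot" if entropy_slope(trajectory) < -tol else "direct"


steps = [
    [0.25, 0.25, 0.25, 0.25],
    [0.4, 0.2, 0.2, 0.2],
    [0.7, 0.1, 0.1, 0.1],
    [0.97, 0.01, 0.01, 0.01],
]
```

In practice the tolerance would be calibrated on a small labelled set, in the spirit of the 50 calibration samples mentioned in the abstract.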

2605.31027 2026-06-16 cs.LG 版本更新

Multi-Scale Separable Fourier Neural Networks for Solving High-Frequency PDEs

多尺度可分离傅里叶神经网络用于求解高频偏微分方程

Qihong Yang, Qiaolin He

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出多尺度可分离傅里叶神经网络(MS-SFNN),通过可分离表示、随机固定权重和傅里叶特征嵌入,高效精确求解线性和非线性高频偏微分方程。

Comments 51 pages, 27 figures

详情
AI中文摘要

我们提出了一种新颖的神经网络架构,称为多尺度可分离傅里叶神经网络(MS-SFNN),用于精确高效地求解线性和非线性高频偏微分方程(PDE)。MS-SFNN利用可分离表示:给定一个$d$维输入,它采用$d$个独立的子网络——每个作用于单个坐标——并通过其输出的逐元素乘法构造基函数。PDE解被近似为这些基函数的线性组合,系数由最小二乘法确定。关键的是,所有网络权重和偏置仅从单位方差的均匀分布随机初始化一次,之后保持不变。为了增强表达能力,在每个子网络中引入可调缩放因子以调节所得基函数的频率内容。通过余弦激活显式嵌入傅里叶特征,赋予该方法强大的谱逼近能力。为了缓解高频或三维问题中密集配置带来的内存瓶颈,我们用解析推导的基函数导数替代自动微分,并开发了一种内存高效的批处理QR分解算法来求解大规模最小二乘系统。数值实验表明,MS-SFNN在一系列具有挑战性的PDE上达到了前所未有的精度,显著优于物理信息神经网络(PINN)和分离变量谱神经网络(SV-SNN)等最先进方法。

英文摘要

We propose a novel neural network architecture, termed Multi-Scale Separable Fourier Neural Networks (MS-SFNN), for the accurate and efficient solution of linear and nonlinear high-frequency partial differential equations (PDEs). MS-SFNN exploits a separable representation: given a $d$-dimensional input, it employs $d$ independent subnetworks -- each acting on a single coordinate -- and constructs basis functions via element-wise multiplication of their outputs. The PDE solution is approximated as a linear combination of these basis functions, with coefficients determined by least squares. Critically, all network weights and biases are randomly initialized once, from a uniform distribution with unit variance, and remain fixed thereafter. To enhance expressivity, a tunable scaling factor is introduced in each subnetwork to modulate the frequency content of the resulting basis functions. Fourier features are explicitly embedded through cosine activations, endowing the method with strong spectral approximation capabilities. To mitigate the memory bottleneck associated with dense collocation in high-frequency or three-dimensional problems, we replace automatic differentiation with analytically derived basis function derivatives and develop a memory-efficient batched QR decomposition algorithm for solving large-scale least-squares systems. Numerical experiments demonstrate that MS-SFNN achieves unprecedented accuracy across a range of challenging PDEs, significantly outperforming state-of-the-art methods such as Physics-Informed Neural Networks (PINN) and Separated-Variable Spectral Neural Networks (SV-SNN).
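The separable, fixed-random-feature construction described above can be sketched in pure Python. This is a simplified sketch under stated assumptions, not MS-SFNN itself:

- Each per-coordinate subnetwork is a single cosine layer, cos(s·w·x + b), with weights drawn once from U(−√3, √3), which has unit variance.
- The basis for each scale s is the element-wise product of the per-coordinate outputs.
- The scales (1, 4), the width m = 12 and the ridge term are illustrative choices.
- Instead of solving a PDE (which would use analytic derivatives of the basis in the least-squares rows), the sketch fits a target function on a grid, and it uses normal equations rather than the paper's batched QR.

```python
import math
import random


def build_basis(m=12, scales=(1.0, 4.0), dims=2, seed=0):
    """Fixed random separable cosine features: one group of m basis functions per scale."""
    rng = random.Random(seed)
    a = math.sqrt(3.0)  # U(-a, a) has unit variance
    groups = []
    for s in scales:
        # per coordinate: m (weight, bias) pairs, drawn once and never trained
        subnets = [[(rng.uniform(-a, a), rng.uniform(-a, a)) for _ in range(m)]
                   for _ in range(dims)]
        groups.append((s, subnets))

    def basis(point):
        out = []
        for s, subnets in groups:
            prod = [1.0] * m
            for x, params in zip(point, subnets):
                # element-wise product of the per-coordinate subnetwork outputs
                prod = [p * math.cos(s * w * x + b) for p, (w, b) in zip(prod, params)]
            out.extend(prod)
        return out
    return basis


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense square system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def least_squares(Phi, y, ridge=1e-6):
    """Solve (Phi^T Phi + ridge * I) c = Phi^T y via the normal equations."""
    m = len(Phi[0])
    G = [[sum(row[i] * row[j] for row in Phi) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * yi for row, yi in zip(Phi, y)) for i in range(m)]
    return solve(G, rhs)


def fit(basis, target, n=15):
    """Fit only the linear coefficients on an n x n grid over [0, 1]^2; return (c, rmse, rms_y)."""
    pts = [(i / (n - 1), j / (n - 1)) for i in range(n) for j in range(n)]
    Phi = [basis(p) for p in pts]
    y = [target(p) for p in pts]
    c = least_squares(Phi, y)
    resid = [sum(ci * fi for ci, fi in zip(c, row)) - yi for row, yi in zip(Phi, y)]
    rmse = math.sqrt(sum(r * r for r in resid) / len(y))
    rms_y = math.sqrt(sum(v * v for v in y) / len(y))
    return c, rmse, rms_y
```

Only the output coefficients are solved for. The hidden weights never change after `build_basis`, which is what turns training into a single linear least-squares problem.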

2606.04678 2026-06-16 cs.LG 版本更新

Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

基于深度条件循环Transformer的ASR测试时计算缩放

Yacouba Kaloga, Shashi Kumar, Shakeel A. Sheikh, Driss Khalil, Petr Motlicek, Ina Kodrasi

发表机构 * Idiap Research Institute(Idiap研究所) EPFL(瑞士洛桑联邦理工学院) BUT(布尔诺理工大学) Novartis Institute of Biomedical Research(诺华生物医学研究所)

AI总结 提出LARM模型,通过深度条件循环Transformer将循环编码器深度变为可控的测试时计算轴,结合稀疏CTC检查点、监督时钟嵌入、FiLM深度条件和延迟软后验反馈;在LibriSpeech上,其WER随推理循环次数增加而降低,表明测试时计算缩放可从自回归语言模型推理扩展到连续非自回归语音识别。

详情
AI中文摘要

端到端ASR系统通常在推理时使用固定深度的声学编码器,这使得在不训练更大模型的情况下,难以用额外的测试时计算换取更好的识别性能。一种自然的方法是循环重用共享的Transformer块,但我们发现简单的循环并不能充分利用额外的循环计算。我们引入了LARM,一种深度条件循环Transformer,将循环编码器深度变为可控的测试时计算轴。LARM结合了稀疏CTC检查点、监督时钟嵌入、FiLM深度条件和延迟软后验反馈。这些组件将循环结构化为由潜在精炼阶段分隔的识别检查点,并允许共享权重在循环步骤间进行特化。在LibriSpeech上,LARM的WER随推理循环次数的增加而降低,并达到了与更深的非共享参数基线相当的性能。我们的结果表明,测试时计算缩放可以超越自回归语言模型推理,扩展到连续非自回归语音识别。

英文摘要

End-to-end ASR systems typically use fixed-depth acoustic encoders at inference, making it difficult to trade additional test-time computation for improved recognition without training a larger model. A natural approach is to reuse a shared Transformer block recurrently, but we find that naive looping does not fully exploit additional recurrent compute. We introduce LARM, a depth-conditioned looped Transformer that turns recurrent encoder depth into a controllable test-time compute axis. LARM combines sparse CTC checkpoints, supervision-clock embeddings, FiLM depth conditioning, and delayed soft-posterior feedback. These components structure the loop into recognition checkpoints separated by latent refinement phases and allow shared weights to specialize across recurrent steps. On LibriSpeech, LARM improves WER as the number of inference loops increases and achieves performance competitive with deeper unshared-parameter baselines. Our results show that test-time compute scaling can extend beyond autoregressive language-model reasoning to continuous non-autoregressive speech recognition.

2606.05878 2026-06-16 cs.LG 版本更新

TS-ICL: A Flexible Time-Indexed Foundation Model for Time Series via In-Context Learning

TS-ICL: 一种基于上下文学习的灵活时间索引时间序列基础模型

Etienne Le Naour, Tahar Nabil, Adrien Petralia

发表机构 * EDF R&D(EDF研究与发展)

AI总结 提出TS-ICL,一种基于上下文学习的概率编码器-回归器Transformer,统一了时间序列预测与插值,并在插值任务上达到新最优,同时在部分观测回溯窗口预测中表现突出。

AI中文摘要

基础模型标志着时间序列建模的深刻范式转变,任务特定模型正被通用零样本模型取代。然而,当前方法主要关注预测,而现实世界的时间序列通常是不规则和部分观测的,需要模型能够联合预测、插补缺失值并处理降采样条件。为应对这些挑战,我们引入了TS-ICL,一种新颖的基于概率上下文学习的编码器-回归器Transformer,统一了预测和插值。TS-ICL将时间序列任务表述为时间戳对齐的回归,并通过训练从新颖的因果数据先验生成的合成依赖结构自然地纳入协变量。实验上,TS-ICL在插值任务上达到了新的最优,同时在单变量和协变量感知基准上与领先的预测基础模型保持竞争力。它在部分观测回溯窗口的预测中表现出特别强的性能。

英文摘要

Foundation models mark a profound paradigm shift in time series modeling, with task-specific models being superseded by general-purpose zero-shot models. Yet, current approaches primarily focus on forecasting, while real-world time series are often irregularly and partially observed, requiring models that can jointly forecast, impute missing values, and handle degraded sampling conditions. To address these challenges, we introduce TS-ICL, a novel probabilistic In-Context Learning encoder--regressor Transformer that unifies forecasting and imputation. TS-ICL formulates time series tasks as timestamp-aligned regression and naturally incorporates covariates by training on synthetic dependency structures generated from a novel causal data prior. Empirically, TS-ICL achieves a new state-of-the-art in imputation, while remaining competitive with leading forecasting foundation models across both univariate and covariate-aware benchmarks. It shows particularly strong performance in forecasting with partially observed look-back windows.
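TS-ICL casts every task as regression at timestamps. A minimal sketch of that interface, assuming a fixed Nadaraya-Watson kernel smoother as a stand-in for the paper's learned encoder-regressor Transformer (the function name and the bandwidth are hypothetical): forecasting and imputation differ only in where the query timestamps fall relative to the observed context.

```python
import math


def kernel_regress(context, query_times, bandwidth=1.0):
    """Nadaraya-Watson regression over observed (timestamp, value) pairs.

    Every task is posed as: given irregular observations, predict values at
    arbitrary query timestamps.
    """
    preds = []
    for tq in query_times:
        weights = [math.exp(-((tq - t) / bandwidth) ** 2) for t, _ in context]
        total = sum(weights)
        preds.append(sum(w * y for w, (_, y) in zip(weights, context)) / total)
    return preds


# Irregular, partially observed series: timestamps 2 and 5 are missing.
observed = [(0, 1.0), (1, 1.2), (3, 1.6), (4, 1.8), (6, 2.2)]
imputed = kernel_regress(observed, query_times=[2, 5])   # gaps inside the window
forecast = kernel_regress(observed, query_times=[7, 8])  # timestamps after the window
```

A kernel smoother cannot extrapolate trends, so the forecasts here are only illustrative; the point is the shared `(context, query_times)` signature for both tasks.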

2606.07082 2026-06-16 cs.LG cs.AI 版本更新

On the Geometry of On-Policy Distillation

论在线策略蒸馏的几何结构

Zhennan Shen, Yanshu Li, Qingyu Yin, Chak Tou Leong, Zhilin Wang, Yanxu Chen, Rongduo Han, Sunbowen Lee, Yi R. Fung

发表机构 * HKUST(香港科技大学) UT Austin(得克萨斯大学奥斯汀分校) Zhejiang University(浙江大学) Hong Kong PolyU(香港理工大学) USTC(中国科学技术大学) BUPT(北京邮电大学) Nankai University(南开大学) BIT(北京理工大学)

AI总结 本文通过参数空间诊断,揭示在线策略蒸馏(OPD)的更新轨迹具有松弛离主成分、子空间锁定等独特几何特性,表明其并非介于SFT和RLVR之间的中间方法。

Comments 17 pages, 8 figures

AI中文摘要

在线策略蒸馏(OPD)越来越多地被用于改进大型语言模型的推理能力,但其训练动态仍鲜为人知。我们刻画了OPD更新在参数空间中的轨迹,并将其与监督微调(SFT)和可验证奖励强化学习(RLVR)进行了比较。一套参数空间诊断一致地将OPD置于松弛的离主成分区域:与SFT相比,其更新影响更少的权重,并更强烈地避开主方向;而与RLVR相比,其约束更宽松。除了这种静态定位外,OPD还表现出子空间锁定:其累积更新迅速进入一个狭窄的低维通道。将训练限制在早期形成的更新子空间内能保持OPD的性能,但会严重降低SFT,表明该锁定子空间对OPD在功能上是充分的。控制实验进一步表明,稀疏化更新令牌和将rollout生成移至离策略能保持秩动态,而将OPD目标与RLVR混合则会改变它们。总体而言,这些结果表明OPD不仅仅是SFT和RLVR之间的中间点,而是在参数空间中诱导出自身独特的更新几何结构。

英文摘要

On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training dynamics remain poorly understood. We characterize the trajectory of OPD updates in parameter space and compare it with supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). A suite of parameter-space diagnostics consistently places OPD in a relaxed off-principal regime: compared with SFT, its updates affect fewer weights and avoid principal directions more strongly, while compared with RLVR, they remain less tightly constrained. Beyond this static localization, OPD exhibits subspace locking: its cumulative updates rapidly enter a narrow low-dimensional channel. Constraining training to the update subspace formed early in training preserves OPD performance but substantially degrades SFT, indicating that the locked subspace is functionally sufficient for OPD. Control experiments further show that sparsifying the update tokens and shifting rollout generation off-policy preserve the rank dynamics, whereas mixing the OPD objective with RLVR changes them. Overall, these results suggest that OPD is not merely an intermediate point between SFT and RLVR, but induces its own update geometry in parameter space.
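To make "constraining training to the update subspace formed early in training" concrete, here is a minimal sketch, assuming flattened parameter updates stored as plain lists and an orthonormal basis built by Gram-Schmidt from a few early cumulative updates. The abstract does not say how the paper extracts the subspace (a per-matrix truncated SVD would be a common alternative), so `orthonormal_basis` and `project` are hypothetical helpers.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of early-training updates.

    Vectors (nearly) linearly dependent on earlier ones are dropped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def project(update, basis):
    """Restrict a (flattened) parameter update to the locked subspace."""
    out = [0.0] * len(update)
    for q in basis:
        c = dot(update, q)
        out = [o + c * qi for o, qi in zip(out, q)]
    return out


early_updates = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
basis = orthonormal_basis(early_updates)
constrained = project([0.5, -2.0, 3.0, 1.0], basis)  # -> [0.5, -2.0, 0.0, 0.0]
```

Applying `project` to every later update is one way to run the paper's control: if performance holds, the early subspace is functionally sufficient for that training method.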

2606.07678 2026-06-16 cs.LG cs.AI 版本更新

DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment

DOG-DPO:几何中的动态优化用于安全对齐

Yi Nian, Tiankai Yang, Yudi Zhang, Qi Pan, Zelong Xu, Shenzhe Zhu, Qingqing Luan, Yue Huang, Xiangliang Zhang, Yue Zhao

发表机构 * University of Southern California(南加州大学) Iowa State University(爱荷华州立大学) University of Wisconsin–Madison(威斯康星大学麦迪逊分校) UT Austin(德克萨斯大学奥斯汀分校) Independent Researcher(独立研究员) University of Notre Dame(圣母大学)

AI总结 提出DOG-DPO框架,将偏好对表示为模型表示空间中的方向,通过几何分解和多样性覆盖选择子集,仅用11%数据即可恢复大部分安全增益。

AI中文摘要

大型语言模型的安全对齐依赖于偏好数据,但当前的流水线通常训练于大规模冗余数据集。现有的数据选择方法通常独立地对每个偏好对评分,将方向性偏好信息压缩为标量质量或多样性分数。这种以样本为中心的视角在多数据集设置中尤其受限,其中共享的安全方向与数据集特定的残余风险共存。我们提出DOG-DPO,一种无需训练的数据选择框架,将偏好对视为结构化几何信号。DOG-DPO首先将每个偏好对表示为模型表示空间中的一个方向。然后,它将多数据集偏好几何分解为全局锚点子空间和数据集特定的残余子空间。最后,它通过最大化基于多样性的覆盖来选择子集,鼓励在DPO训练前广泛、非冗余地覆盖对齐方向。在六个安全基准和两个模型骨干上,DOG-DPO仅使用11%的偏好对就实现了强大的效用-鲁棒性权衡。它恢复了全数据训练的大部分安全增益,同时完全无需教师、无需训练,并且比代表性选择基线快得多。

英文摘要

Safety alignment for large language models relies on preference data, but current pipelines often train on large, redundant datasets. Existing data selection methods typically score each preference pair independently, collapsing directional preference information into scalar quality or diversity scores. This sample-centric view is especially limiting in multi-dataset settings, where shared safety directions coexist with dataset-specific residual risks. We propose DOG-DPO, a training-free data selection framework that treats preference pairs as structured geometric signals. DOG-DPO first represents each preference pair as a direction in model representation space. It then decomposes multi-dataset preference geometry into a global anchor subspace and dataset-specific residual subspaces. Finally, it selects subsets by maximizing diversity-based coverage, encouraging broad, non-redundant coverage of alignment directions before DPO training. Across six safety benchmarks and two model backbones, DOG-DPO achieves a strong utility-robustness trade-off using only 11% of the preference pairs. It recovers most of the safety gains of full-data training while remaining entirely teacher-free, training-free, and substantially faster than representative selection baselines.
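The first step of DOG-DPO (one direction per preference pair) and the idea of selecting for non-redundant coverage can be sketched as follows. The embeddings, the greedy max-min rule, and all function names are assumptions: the abstract does not specify the coverage objective, and the anchor/residual subspace decomposition is omitted here.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("chosen and rejected embeddings coincide")
    return [x / n for x in v]


def preference_direction(emb_chosen, emb_rejected):
    """Represent a preference pair as a unit direction: chosen minus rejected."""
    return normalize([c - r for c, r in zip(emb_chosen, emb_rejected)])


def greedy_diverse_subset(directions, k):
    """Greedy max-min selection: repeatedly add the direction whose highest
    cosine similarity to the already-selected set is smallest.
    Deterministically starts from index 0."""
    selected = [0]
    while len(selected) < min(k, len(directions)):
        best, best_score = None, math.inf
        for i, d in enumerate(directions):
            if i in selected:
                continue
            # Similarity to the closest already-selected direction.
            score = max(sum(a * b for a, b in zip(d, directions[j])) for j in selected)
            if score < best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


pairs = [([1.0, 0.0], [0.0, 0.0]), ([1.0, 0.01], [0.0, 0.0]), ([0.0, 1.0], [0.0, 0.0])]
dirs = [preference_direction(c, r) for c, r in pairs]
print(greedy_diverse_subset(dirs, 2))  # [0, 2]
```

The near-duplicate pair (index 1) is skipped in favor of the orthogonal one, which is the redundancy-avoiding behavior the abstract motivates.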

2606.11123 2026-06-16 cs.LG 版本更新

Overcoming Rank Collapse in Feedback Alignment

克服反馈对齐中的秩坍缩

Gauthier Boeshertz, Razvan Pascanu, Claudia Clopath

发表机构 * Imperial College London(伦敦帝国理工学院) Mila(Mila研究所)

AI总结 研究发现反馈对齐(FA)在深层网络中因误差信号秩低而失效,提出通过Muon优化器和隐藏活动归一化提升信号维度,在CIFAR100上ResNet-18准确率提升9个百分点。

Comments 9 pages and 4 figures, 1 table for main text. Total of 21 pages and 13 figures with appendix

AI中文摘要

反向传播(BP)被广泛认为在生物学上不可行,部分原因在于它要求反馈权重是前向权重的转置以进行误差传播。有趣的是,当使用固定的随机反馈权重训练网络以规避此问题时,学习过程会将前向权重与反馈权重对齐,导致反向传播的误差信号成为BP使用的标准梯度的近似。这一过程称为反馈对齐(FA),在MLP和非常浅的CNN中有效,但难以扩展到更深层的架构。在这项工作中,我们首先研究了在CIFAR10上训练的BP和FA模型之间的差异,特别关注信号的有效秩。我们发现FA误差的秩显著较低,因此被限制在比BP更低维的子空间中,限制了参数空间的探索。受此观察启发,我们评估了两种增加FA有效维度的机制:Muon,一种使权重更新正交化的优化器;以及隐藏活动归一化,促进激活正交性。在更大的架构和基准测试中,我们发现这些方法一致地优于FA基线,例如,在CIFAR100上使用ResNet-18,准确率提高了9个百分点。我们的结果将低维梯度动力学确定为扩展FA的关键障碍,并表明诱导更高维的更新几何是扩展反向传播替代方法的有前途的途径。

英文摘要

Backpropagation (BP) is widely viewed as biologically implausible, in part because it requires feedback weights to be the transpose of forward weights for error propagation. Interestingly, when training a network with fixed random feedback weights to circumvent this issue, learning aligns the forward weights with the feedback weights, leading the backpropagated error signal to become an approximation of the standard gradient used by BP. This process, called Feedback Alignment (FA), occurs in MLPs and very shallow CNNs but does not scale well to deeper architectures. In this work, we first investigated differences between BP and FA models, trained on CIFAR10, specifically focusing on the effective rank of the signal. We found that the FA error has a considerably lower rank and hence is constrained to a lower-dimensional subspace compared to BP, limiting exploration of the parameter space. Motivated by this observation, we evaluated two mechanisms for increasing the effective dimensionality of FA: Muon, an optimiser that orthogonalises weight updates; and hidden activity normalisation, which promotes activation orthogonality. Across larger architectures and benchmarks, we find that these methods consistently improve over FA baselines; for example, on CIFAR100 with a ResNet-18, accuracy increases by 9 percentage points. Our results identify low-dimensional gradient dynamics as a key obstacle to scaling FA and suggest that inducing higher-dimensional update geometry is a promising route toward scaling alternatives to backpropagation.
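The abstract does not define "effective rank". One common choice, assumed here, is the entropy-based effective rank of a matrix's singular values (for example as returned by `numpy.linalg.svd(M, compute_uv=False)`), which grows as energy spreads over more directions; the function name is hypothetical.

```python
import math


def effective_rank(singular_values, eps=1e-12):
    """Entropy-based effective rank: exp(H(p)) with p_i = s_i / sum(s).

    Equals k when k singular values are equal and the rest are zero,
    and is 1 when a single direction carries all the energy.
    """
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("need at least one positive singular value")
    probs = [s / total for s in singular_values if s > eps]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


print(round(effective_rank([1.0, 1.0, 1.0, 1.0]), 6))  # 4.0
print(effective_rank([5.0, 0.0, 0.0]))                 # 1.0
```

Computed on the singular values of a batch of FA error signals and of the matching BP gradients, a lower value for FA would correspond to the lower-dimensional error subspace the authors report.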

2606.14398 2026-06-16 cs.LG 版本更新

A theoretical model for task routing in mixture-of-expert transformers

混合专家Transformer中任务路由的理论模型

Vinoth Nandakumar, Yongli Xiang, Yunzhi Yao, Peike Li, Tongliang Liu

发表机构 * University of Sydney(悉尼大学) Zhejiang University(浙江大学) Google Research(谷歌研究院)

AI总结 通过离散语言模型证明单层MoE Transformer可利用专家实现任务专业化,支持经验发现。

AI中文摘要

混合专家(MoE)层使得在保持推理计算固定的情况下扩展Transformer模型成为可能。尽管在前沿MoE Transformer模型的实证研究中观察到了任务-专家专业化现象,但现有的理论工作使用连续混合模型进行分析,无法有效建模自然语言。一个重要的问题是使用离散语言模型从理论上解释Transformer MoE模型中的任务-专家专业化。为此,我们通过句法模板和有限键值字典表示结构化知识,并正式证明单层MoE Transformer可以通过使用专注于相应任务的专家来编码知识。我们的构造展示了查询如何被路由到唯一的、特定于任务的专家,其大小仅取决于给定任务的内在复杂度(即其句法模板和事实字典的组合大小)。我们的构造为MoE模型中局部化知识回路的实证结果提供了理论支持。我们通过实验评估模型在不同MoE损失函数下的性能来支持我们的理论发现。

英文摘要

Mixture-of-experts (MoE) layers enable the scaling of transformer models while keeping the inference compute fixed. While task-expert specialization has been observed in empirical studies of frontier MoE transformer models, existing theoretical work analyzes this using continuous mixture models that cannot be used to model natural language effectively. An important open question is to theoretically explain task-expert specialization in transformer MoE models using discrete models of language. To address this, we represent structured knowledge via syntactic templates and finite key-value dictionaries, and prove formally that a single-layer MoE transformer can encode knowledge by using experts that specialize in the corresponding tasks. Our construction shows how queries are routed to unique, task-specific experts whose size depends solely on the intrinsic complexity of the given task (i.e., the combined size of its syntactic templates and factual dictionary). Our construction provides theoretical support for empirical results on localized knowledge circuits in MoE models. We support our theoretical findings with experiments evaluating model performance under varying MoE loss functions.
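A toy illustration of the routing pattern, not the paper's transformer construction: each hypothetical task pairs a syntactic template with a small key-value dictionary, the router scores experts by template match, and top-1 gating sends the query to the single expert that stores that task's dictionary. Templates, facts, and function names are invented for illustration.

```python
import math

# Hypothetical tasks: (syntactic template, finite key-value dictionary).
TASKS = [
    ("The capital of {} is", {"France": "Paris", "Japan": "Tokyo"}),
    ("The author of {} is", {"Hamlet": "Shakespeare", "Emma": "Austen"}),
]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def router_logits(query):
    """Score each expert by whether the query starts with its template prefix."""
    return [10.0 if query.startswith(tpl.split("{}")[0]) else 0.0 for tpl, _ in TASKS]


def moe_answer(query):
    """Top-1 routing: the query is handled by exactly one task-specific expert."""
    gates = softmax(router_logits(query))
    expert = max(range(len(gates)), key=gates.__getitem__)
    template, dictionary = TASKS[expert]
    prefix, suffix = template.split("{}")
    key = query[len(prefix):len(query) - len(suffix)]
    return expert, dictionary.get(key)


print(moe_answer("The capital of Japan is"))  # (0, 'Tokyo')
```

Here each expert's "size" is just its template plus dictionary, mirroring the abstract's point that expert size depends only on the task's intrinsic complexity.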

2310.06555 2026-06-16 cs.CL cs.AI cs.LG cs.MA 版本更新

It's About Time: Temporal References in Emergent Communication

关于时间:涌现通信中的时间指代

Olaf Lipinski, Adam J. Sobey, Federico Cerutti, Timothy J. Norman

发表机构 * University of Southampton(南安普顿大学) The Alan Turing Institute(艾伦·图灵研究所) University of Brescia(布雷西亚大学)

AI总结 研究涌现通信中时间指代缺失问题,发现仅改变损失函数不足,需修改架构(分批方法)才能使时间指代涌现,95%以上代理成功,为提升通信效率奠定基础。

Comments 23 pages main body and 31 pages supplementary material, 9 figures in main body. Code available at https://github.com/olipinski/TRG

Journal ref
Journal of Artificial Intelligence Research 86, Article 11 (June 2026)
AI中文摘要

涌现通信使代理能够开发定制语言以提高通信效率。尽管已知时间结构在自然语言中的重要性,但在涌现通信中尚无时间指代的证据。本文通过探索代理如何交流时间关系来填补这一空白。我们分析了时间指代涌现的三个潜在因素:环境因素、外部因素和架构因素。实验表明,仅改变损失函数不足以使时间指代涌现;相反,架构变化是必要的。对代理架构的最小改动(使用不同的分批方法)即可使时间指代涌现。在强调时间关系的时间指代游戏环境中,将此修改后的设计与标准架构进行比较。分析显示,超过95%使用修改后分批方法的代理发展出了时间指代,而无需改变其损失函数。我们认为时间指代对于未来提高代理的通信效率是必要的:与纯组合式语言相比,它使未来的代理能够使用更接近最优的编码。这些见解为将时间指代纳入其他涌现通信设置以及研究语言的其他方面提供了基础。

英文摘要

Emergent communication enables agents to develop bespoke languages that improve communication efficiency. Despite the known importance of temporal structure in natural language, there is no existing evidence of temporal references in emergent communication. This paper addresses this gap by exploring how agents communicate about temporal relationships. We analyse three potential factors for the emergence of temporal references: environmental, external, and architectural. Our experiments demonstrate that altering the loss function is insufficient for temporal references to emerge; rather, architectural changes are necessary. A minimal change in agent architecture, using a different batching method, allows the emergence of temporal references. This modified design is compared with the standard architecture in a temporal referential games environment, which emphasises temporal relationships. The analysis shows that over 95% of the agents with the modified batching method develop temporal references, without changes to their loss function. We consider temporal referencing necessary for future improvements to the agents' communication efficiency, enabling future agents to use a closer-to-optimal coding than purely compositional languages. These insights provide the basis for incorporating temporal references into other emergent communication settings and for investigating other aspects of language.

2402.00094 2026-06-16 cs.NE cs.AI cs.LG 版本更新

Deep Neural Networks: A Formulation Via Non-Archimedean Analysis

深度神经网络:基于非阿基米德分析的公式化

W. A. Zúñiga-Galindo

发表机构 * University of Texas Rio Grande Valley School of Mathematical & Statistical Sciences(德克萨斯大学里奥格兰德谷大学数学与统计科学学院)

AI总结 提出一种基于非阿基米德局部域整数环的多层树状架构深度神经网络,该网络是定义在环上实值函数的鲁棒通用逼近器,并证明其对单位区间上平方可积函数的通用逼近性。

Comments Final version accepted in the Journal of Fourier Analysis and Applications

AI中文摘要

我们引入了一类具有多层树状架构的新型深度神经网络(DNN)。这些架构使用非阿基米德局部域的整数环中的数字进行编码。这些环具有作为无限有根树的自然层次结构。这些环上的自然态射允许我们构建有限的多层架构。新的DNN是定义在所述环上的实值函数的鲁棒通用逼近器。我们还证明了这些DNN是单位区间上定义的实值平方可积函数的鲁棒通用逼近器。

英文摘要

We introduce a new class of deep neural networks (DNNs) with multilayered tree-like architectures. The architectures are codified using numbers from the ring of integers of non-Archimedean local fields. These rings have a natural hierarchical organization as infinite rooted trees. Natural morphisms on these rings allow us to construct finite multilayered architectures. The new DNNs are robust universal approximators of real-valued functions defined on the mentioned rings. We also show that the DNNs are robust universal approximators of real-valued square-integrable functions defined on the unit interval.
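As background for the tree picture, a minimal sketch with hypothetical helper names, using the simplest such ring, the p-adic integers Z_p: the base-p digits of an integer trace a path in the infinite rooted p-ary tree, reduction modulo p^l cuts that tree at level l (the kind of natural morphism that yields finite layers), and the p-adic absolute value measures closeness by how many initial digits two numbers share. The network construction itself is not reproduced.

```python
from fractions import Fraction


def padic_digits(n, p, length):
    """First `length` base-p digits of an integer, least significant first.

    These digits name the path from the root to a node at depth `length`
    in the p-ary tree of Z_p (negative n gives its p-adic expansion,
    e.g. -1 -> p-1, p-1, ...).
    """
    digits = []
    for _ in range(length):
        n, d = divmod(n, p)
        digits.append(d)
    return digits


def truncate(n, p, level):
    """Reduction map Z_p -> Z/p^level Z: keep only the node at depth `level`."""
    return n % p ** level


def padic_abs(n, p):
    """p-adic absolute value |n|_p = p^(-v), v = exponent of p dividing n."""
    if n == 0:
        return Fraction(0)
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return Fraction(1, p ** v)


print(padic_digits(10, 3, 4))  # [1, 0, 1, 0]  since 10 = 1 + 0*3 + 1*9
print(truncate(10, 3, 2))      # 1
print(padic_abs(18, 3))        # 1/9
```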

2405.02369 2026-06-16 cs.NE cs.AI cs.LG 版本更新

No One-Size-Fits-All Neurons: Task-based Neurons for Artificial Neural Networks

没有万能神经元:面向任务的人工神经网络神经元

Feng-Lei Fan, Meng Wang, Hang-Cheng Dong, Jianwei Ma, Tieyong Zeng

发表机构 * Department of Data Science, City University of Hong Kong(香港城市大学数据科学系) School of Mathematics, Harbin Institute of Technology(哈尔滨工业大学数学学院) School of Instrumentation, Harbin Institute of Technology(哈尔滨工业大学仪器科学与工程学院) School of Earth and Space Sciences, Peking University(北京大学地球与空间科学学院) Institute for Advanced Study, Beijing Normal-Hong Kong Baptist University(北师香港浸会大学高等研究院)

AI总结 受大脑神经元任务特异性的启发,提出一种两阶段框架设计任务导向神经元,通过多项式基函数引入归纳偏置,在合成数据、经典基准和实际应用中性能优于现有模型。

Comments 8 pages, 4 figures

AI中文摘要

在过去十年中,许多成功的网络都采用了新颖的架构,这些架构几乎无一例外地使用相同类型的神经元。最近,越来越多的深度学习研究受到NeuroAI理念和人类大脑中观察到的神经元多样性的启发,从而提出了新颖的人工神经元设计。设计性能良好的神经元代表了相对于设计性能良好的神经架构的一个新维度。从生物学角度看,大脑并不依赖一种在所有方面都普遍适用的单一类型神经元。相反,在我们的大脑中,神经元通常是基于任务的。在本研究中,我们探讨以下问题:既然人脑是一个基于任务的神经元使用者,那么人工网络设计能否从基于任务的架构设计转向基于任务的神经元设计?由于方法论上不存在万能神经元,在相同结构下,基于任务的神经元由于对任务具有内在的归纳偏置,相比现有的通用神经元可以增强特征表示能力。具体来说,我们提出了一个用于原型化基于任务神经元的两阶段框架。作为初始步骤,我们使用多项式作为基函数来评估所提出的框架。实验上,在合成数据、经典基准和实际应用上的系统实验结果表明,所提出的基于任务的神经元设计不仅可行,而且相比其他最先进模型具有竞争力的性能。

英文摘要

In the past decade, many successful networks have been built on novel architectures, which almost exclusively use the same type of neuron. Recently, more and more deep learning studies have been inspired by the idea of NeuroAI and the neuronal diversity observed in human brains, leading to the proposal of novel artificial neuron designs. Designing well-performing neurons represents a new dimension relative to designing well-performing neural architectures. Biologically, the brain does not rely on a single type of neuron that universally functions in all aspects. Instead, in our brain, neurons are often task-based. In this study, we address the following question: since the human brain is a task-based neuron user, can the artificial network design go from the task-based architecture design to the task-based neuron design? Since methodologically there are no one-size-fits-all neurons, given the same structure, task-based neurons can enhance the feature representation ability relative to the existing universal neurons due to the intrinsic inductive bias for the task. Specifically, we propose a two-step framework for prototyping task-based neurons. As the initial step, we evaluate the proposed framework using polynomials as base functions. Empirically, systematic experimental results on synthetic data, classic benchmarks, and real-world applications show that the proposed task-based neuron design is not only feasible but also delivers competitive performance relative to other state-of-the-art models.
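A minimal sketch of the end product of such a framework, assuming the polynomial search has returned a quadratic with squared terms only: the neuron keeps learnable coefficients but aggregates its inputs with that polynomial instead of a plain dot product. The quadratic form and the function name are hypothetical, not the paper's prototype.

```python
import math


def polynomial_neuron(x, linear_w, quad_w, bias, activation=math.tanh):
    """Neuron whose aggregation is a degree-2 polynomial instead of a dot product.

    pre-activation = sum_i linear_w[i] * x[i] + sum_i quad_w[i] * x[i]**2 + bias
    Setting quad_w to zeros recovers the usual linear neuron.
    """
    pre = bias
    for xi, a, c in zip(x, linear_w, quad_w):
        pre += a * xi + c * xi * xi
    return activation(pre)


out = polynomial_neuron([1.0, 2.0], [0.5, -0.25], [1.0, 1.0], 0.0)  # tanh(5.0)
```

Because the coefficients stay learnable, a layer of such neurons can still be trained by backpropagation; only the inductive bias of the aggregation changes.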

2505.04397 2026-06-16 cs.CV cs.AI cs.LG eess.IV 版本更新

PURe: A Plug-and-Play Product-Unit Residual Module for Vision Networks

PURe: 一种用于视觉网络的即插即用乘积单元残差模块

Ziyuan Li, Uwe Jaekel, Babette Dellen

发表机构 * Department of Mathematics, Informatics and Technology, University of Applied Sciences Koblenz(科布伦茨应用科学大学数学、信息学与技术系) Technical University of Munich(慕尼黑工业大学)

AI总结 提出PURe模块,通过二维乘积单元的对数域公式实现稳定的局部乘法交互,可替代残差网络中的标准单元,在图像分类和CT分割任务中提升精度-参数权衡。

Comments Revised version

AI中文摘要

现代视觉网络主要由加性局部变换主导,而显式的乘法局部交互仍未得到充分探索。乘积单元提供了一种直接建模此类交互的方法,但其在深度架构中的使用受到优化不稳定性的限制。在这项工作中,我们提出了PURe,一种用于深度视觉网络的乘积单元残差模块。PURe围绕一个具有实值对数域公式的二维乘积单元构建,使得乘法局部聚合在深度残差层次结构中变得实用。由此产生的模块可作为原生残差单元的即插即用替代品。我们将PURe实例化到用于图像分类的残差CNN和用于体积CT数据切片分割的二维残差编码器-解码器网络中。在Galaxy10 DECaLS、ImageNet和CIFAR-10上,PURe一致地改进了残差CNN,并产生了更有利的精度-参数权衡,使得中等深度模型能够以更小的参数预算匹配或超越显著更深的ResNet基线。在AMOS基准测试中,PURe还在3D病例级评估下改进了切片CT分割。这些结果表明,显式的乘法局部交互是深度残差视觉网络的一种实用且有效的设计原语。

英文摘要

Modern vision networks are dominated by additive local transformations, whereas explicit multiplicative local interactions remain underexplored. Product units offer a direct approach to modeling such interactions, but their use in deep architectures has been limited by optimization instability. In this work, we propose PURe, a Product-Unit Residual Module for deep vision networks. PURe is built around a 2D Product Unit with a real-valued log-domain formulation that makes multiplicative local aggregation practical within deep residual hierarchies. The resulting module serves as a drop-in replacement for native residual units. We instantiate PURe in residual CNNs for image classification and in 2D residual encoder-decoder networks for slice-based segmentation on volumetric CT data. Across Galaxy10 DECaLS, ImageNet, and CIFAR-10, PURe consistently improves residual CNNs and yields a more favorable accuracy-parameter trade-off, allowing moderately deep models to match or surpass substantially deeper ResNet baselines with much smaller parameter budgets. On the AMOS benchmark, PURe also improves slice-based CT segmentation under 3D case-level evaluation. These results show that explicit multiplicative local interaction is a practical and effective design primitive for deep residual vision networks.
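The abstract states that PURe's 2D product unit uses a real-valued log-domain formulation but does not give it. The sketch below assumes the classic real-valued product-unit formulation applied to one flattened local window: the magnitude is computed in the log domain, and the sign of negative inputs is recovered with a cosine term. `product_unit` and `eps` are hypothetical.

```python
import math


def product_unit(x, w, eps=1e-6):
    """Real-valued product unit approximating prod_i x_i ** w_i.

    Magnitude: exp(sum_i w_i * log(|x_i| + eps)); eps guards against log(0).
    Sign: cos(pi * sum of w_i over negative inputs), which equals (-1)^k
    for integer exponents when k inputs are negative.
    """
    log_mag = sum(wi * math.log(abs(xi) + eps) for xi, wi in zip(x, w))
    phase = math.pi * sum(wi for xi, wi in zip(x, w) if xi < 0)
    return math.exp(log_mag) * math.cos(phase)


print(round(product_unit([2.0, 3.0], [1.0, 1.0]), 3))   # 6.0
print(round(product_unit([-2.0, 3.0], [1.0, 1.0]), 3))  # -6.0
```

Working in the log domain turns the multiplicative interaction into a weighted sum, which is what makes gradients better behaved than multiplying raw activations.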

2511.08577 2026-06-16 cs.CL cs.AI cs.LG cs.PF 版本更新

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Think-at-Hard: 选择性潜在迭代以改进推理语言模型

Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang

AI总结 针对循环Transformer中的潜在过思考问题,提出Think-at-Hard方法,通过轻量级决策器选择性地在困难令牌上触发潜在迭代,并采用深度感知LoRA和双因果注意力机制,在数学、问答和编码任务上一致提升性能。

Comments Accepted by ICML'26

AI中文摘要

提升大型语言模型(LLMs)的推理能力,特别是在参数约束下,对实际应用至关重要。循环Transformer通过执行多次潜在迭代来细化每个令牌,超越单次前向传播。然而,我们识别出一种潜在过思考现象:大多数令牌预测在第一次前向传播后已经正确,但在后续迭代中有时会被修改为错误。我们询问选择性地跳过潜在迭代是否能提高准确性,并揭示了一个显著的潜力:使用理想(oracle)迭代策略可将性能提升高达7.3%。受此启发,我们提出了Think-at-Hard (TaH),一种针对选择性迭代优化的循环Transformer。TaH采用轻量级神经决策器来触发潜在迭代,仅在标准前向传播后可能不正确的令牌上触发。在潜在迭代期间,深度感知的低秩适应(LoRA)模块将目标从一般的下一个令牌预测转变为聚焦的困难令牌细化。双因果注意力机制将注意力从令牌序列维度扩展到额外的迭代深度维度,实现跨迭代信息流,同时保持完全的序列并行性。在九个基准上的实验显示,在数学、问答和编码任务上一致提升。在相同参数数量下,TaH在93%的令牌上跳过迭代,性能比始终迭代的基线高3.8-4.4%,并超过单次迭代的Qwen3基线3.0-3.8%。当允许LoRA和决策器增加不到3%的参数时,增益分别进一步增加到5.3-6.2%和6.1-6.8%。我们的代码可在以下网址获取:https://github.com/thu-nics/TaH。

英文摘要

Improving the reasoning abilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Looped transformers address this by performing multiple latent iterations to refine each token beyond a single forward pass. However, we identify a latent overthinking phenomenon: most token predictions are already correct after the first pass, but are sometimes revised into errors in later iterations. We ask whether selectively skipping latent iterations can improve accuracy, and reveal significant potential with an oracle iteration policy that boosts performance by up to 7.3%. Motivated by this, we propose Think-at-Hard (TaH), a looped transformer optimized for selective iteration. TaH employs a lightweight neural decider to trigger latent iteration, only at tokens likely to be incorrect after the standard forward pass. During latent iterations, depth-aware Low-Rank Adaptation (LoRA) modules shift the objective from general next-token prediction to focused hard-token refinement. A duo-causal attention mechanism extends attention from the token sequence dimension to an additional iteration depth dimension, enabling cross-iteration information flow with full sequential parallelism. Experiments on nine benchmarks show consistent gains across math, QA, and coding tasks. With identical parameter counts, TaH outperforms always-iterate baselines by 3.8-4.4% while skipping iterations on 93% of tokens, and exceeds single-iteration Qwen3 baselines by 3.0-3.8%. When allowing <3% more parameters from LoRA and decider, the gains further increase to 5.3-6.2% and 6.1-6.8%, respectively. Our code is available at https://github.com/thu-nics/TaH.
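The control flow of selective latent iteration can be sketched with hypothetical callables standing in for the model's first pass, the lightweight decider, and a LoRA-adapted refinement pass. This per-token loop is only an assumption-level illustration: TaH itself processes tokens in parallel and uses duo-causal attention across iteration depths, which is not modeled here.

```python
def think_at_hard(tokens, forward, decider, refine, threshold=0.5, max_depth=3):
    """Selective latent iteration.

    forward(tok) -> (state, prediction)          standard first pass
    decider(state) -> estimated probability that the prediction is wrong
    refine(state, depth) -> (state, prediction)  one extra latent iteration
    Only tokens flagged as hard get extra iterations, up to max_depth.
    """
    predictions, iterations = [], []
    for tok in tokens:
        state, pred = forward(tok)
        depth = 0
        while depth < max_depth and decider(state) > threshold:
            depth += 1
            state, pred = refine(state, depth)
        predictions.append(pred)
        iterations.append(depth)
    return predictions, iterations


# Toy stand-ins (hypothetical): the state is a difficulty score that each
# latent iteration halves, and the decider reads it back directly.
def toy_forward(tok):
    return tok, ("first-pass" if tok < 0.5 else "uncertain")


def toy_decider(state):
    return state


def toy_refine(state, depth):
    return state / 2, f"refined@{depth}"


preds, iters = think_at_hard([0.1, 0.9, 0.3], toy_forward, toy_decider, toy_refine)
# preds == ["first-pass", "refined@1", "first-pass"], iters == [0, 1, 0]
```

Easy tokens exit after the standard pass, which is how the method avoids the latent overthinking described above while spending extra compute only where the decider expects errors.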

2602.12279 2026-06-16 cs.CV cs.AI cs.LG 版本更新

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

UniT:统一多模态思维链测试时扩展

Leon Liangyu Chen, Haoyu Ma, Zhipeng Fan, Ziqi Huang, Animesh Sinha, Xiaoliang Dai, Jialiang Wang, Zecheng He, Jianwei Yang, Chunyuan Li, Junzhe Sun, Chu Wang, Serena Yeung-Levy, Felix Juefei-Xu

发表机构 * Stanford University(斯坦福大学) Meta Superintelligence Labs(Meta超级智能实验室) Nanyang Technological University(南洋理工大学)

AI总结 提出UniT框架,通过多轮推理、验证和细化实现统一多模态模型的测试时扩展,实验表明短推理轨迹可泛化到长链,顺序思维链比并行采样更高效。

Comments CVPR 2026

AI中文摘要

统一模型可以在单一架构内处理多模态理解和生成,但它们通常以单次通过的方式运行,而不迭代地细化输出。许多多模态任务,尤其是那些涉及复杂空间组合、多个交互对象或不断变化的指令的任务,需要分解指令、验证中间结果并进行迭代修正。虽然测试时扩展(TTS)已证明分配额外的推理计算用于迭代推理能显著提升语言模型性能,但将这一范式扩展到统一多模态模型仍然是一个开放挑战。我们引入了UniT,一个用于多模态思维链测试时扩展的框架,使单个统一模型能够在多轮中推理、验证和细化。UniT结合了智能体数据合成、统一模型训练和灵活的测试时推理,以激发包括验证、子目标分解和内容记忆在内的认知行为。我们的关键发现是:(1)在短推理轨迹上训练的统一模型能在测试时泛化到更长的推理链;(2)顺序思维链推理比并行采样提供更可扩展且计算高效的TTS策略;(3)在生成和编辑轨迹上训练能提升分布外视觉推理能力。这些结果确立了多模态测试时扩展作为推进统一模型中生成和理解的有效的范式。

英文摘要

Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs. Many multimodal tasks, especially those involving complex spatial compositions, multiple interacting objects, or evolving instructions, require decomposing instructions, verifying intermediate results, and making iterative corrections. While test-time scaling (TTS) has demonstrated that allocating additional inference compute for iterative reasoning substantially improves language model performance, extending this paradigm to unified multimodal models remains an open challenge. We introduce UniT, a framework for multimodal chain-of-thought test-time scaling that enables a single unified model to reason, verify, and refine across multiple rounds. UniT combines agentic data synthesis, unified model training, and flexible test-time inference to elicit cognitive behaviors including verification, subgoal decomposition, and content memory. Our key findings are: (1) unified models trained on short reasoning trajectories generalize to longer inference chains at test time; (2) sequential chain-of-thought reasoning provides a more scalable and compute-efficient TTS strategy than parallel sampling; (3) training on generation and editing trajectories improves out-of-distribution visual reasoning. These results establish multimodal test-time scaling as an effective paradigm for advancing both generation and understanding in unified models.

2605.02427 2026-06-16 cs.AI cs.LG 版本更新

The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling

模型知晓,解码器发现:未来价值引导的粒子力量采样

Tu Nguyen, Matthieu Zimmer, Rasul Tutunov, Xiaotong Ji, Haitham Bou Ammar

发表机构 * Huawei Heisenberg Research Center(华为海森堡研究中心) Huawei Noah’s Ark Lab(华为诺亚实验室) UCL Centre for Artificial Intelligence(伦敦大学学院人工智能中心)

AI总结 本文提出APPS算法,通过块状粒子方法高效定位LLM已赋予较高概率的多步正确解,改善无需训练的解码在准确率与运行时间之间的权衡,表明推理时的幂采样近似可以获得通常归因于后训练的收益。

AI中文摘要

英文摘要

A recurring pattern in "reasoning without training" is that base LLMs already assign non-trivial probability mass to correct multi-step solutions; the bottleneck is locating these modes efficiently at inference time. Power sampling provides a principled way to bias decoding toward such modes by targeting p_θ(x)^α with α > 1, but practical approximations must account for future-dependent correction factors that determine which prefixes remain promising. We introduce Auxiliary Particle Power Sampling (APPS), a blockwise particle algorithm for approximating the sequence-level power target with a bounded population of partial solutions. APPS propagates hypotheses in parallel using proposal-corrected power reweighting and refines their survival through future-value-guided selection at resampling boundaries. This redistributes finite compute across competing prefixes rather than committing to a single unfolding path, while providing a direct scaling knob in the particle count and predictable peak memory. We instantiate the future-value signal with short-horizon rollouts and also study an amortized variant that replaces rollouts with a lightweight learned selection head. More broadly, APPS improves the accuracy-runtime trade-off of training-free decoding, further supporting the view that inference-time power approximation can recover gains often attributed to post-training.
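The basic reweight-then-resample step behind particle power sampling can be sketched under one simplifying assumption: particles are whole sequences sampled from p_θ itself, so the importance weight toward the target p_θ(x)^α is p_θ(x)^(α - 1). Blockwise propagation, proposal corrections, and APPS's future-value-guided selection are not shown; function names and log-probabilities are hypothetical.

```python
import math
import random


def power_weights(logps, alpha):
    """Normalized importance weights toward p(x)**alpha for samples drawn from p:
    w_i proportional to p(x_i)**(alpha - 1), computed from log-probabilities."""
    scaled = [(alpha - 1.0) * lp for lp in logps]
    m = max(scaled)  # subtract the max for numerical stability
    unnorm = [math.exp(s - m) for s in scaled]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def resample(particles, weights, rng):
    """Multinomial resampling that keeps the population size fixed."""
    return rng.choices(particles, weights=weights, k=len(particles))


prefixes = ["A", "B", "C", "D"]
logps = [-1.0, -2.0, -2.5, -6.0]  # hypothetical sequence log-probs under the base model
w = power_weights(logps, alpha=4.0)
survivors = resample(prefixes, w, random.Random(0))
```

With α = 1 the weights are uniform and sampling reduces to ordinary decoding; larger α concentrates the surviving population on the base model's high-probability modes, which is the bias toward correct multi-step solutions the abstract describes.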

2606.01561 2026-06-16 cs.AI cs.LG 版本更新

S-SPPO: Semantic-Calibrated Self-Play Preference Optimization

S-SPPO:语义校准的自对弈偏好优化

Xiwen Chen, Wenhui Zhu, Jingjing Wang, Peijie Qiu, Zhipeng Wang, Huayu Li, ZhengXiao He, Xuanzhao Dong, Prayag Tiwari, Mingkun Xu, Yujian Xiong, Feng Luo, Abolfazl Razi, Brendan Hogan Rappazzo, Anderson Schneider, Yuriy Nevmyvaka

发表机构 * University of Arizona, USA(亚利桑那大学) Arizona State University, USA(亚利桑那州立大学) Now at Google LLC, work done at Rice University(现就职于谷歌公司,工作完成于莱斯大学) Clemson University, USA(克莱姆森大学) Washington University in St. Louis, USA(圣路易斯华盛顿大学) Halmstad University, Sweden(哈姆斯塔德大学) Guangdong Institute of Intelligence Science and Technology, China(广东智能科学与技术研究院)

AI总结 针对自对弈偏好优化(SPPO)中因偏好预测过度自信导致策略退化的问题,提出双空间语义校准框架S-SPPO,通过语义门控监督校准和潜在排斥表示校准,在保持博弈结构的同时提升对齐性能。

Comments Accepted by ICML2026

AI中文摘要

将大型语言模型(LLM)与人类偏好对齐通常通过直接偏好优化(DPO)来实现。然而,DPO的标准Bradley-Terry实例难以建模人类偏好中常见的非传递性。为解决此问题,近期工作引入了自对弈偏好优化(SPPO),通过训练自生成的胜负对来迭代优化策略。然而,我们的研究发现SPPO存在一个关键的不稳定性:当偏好预测器对语义上无法区分的响应赋予过度自信的胜利时,优化容易导致策略退化。为缓解这一问题,我们提出S-SPPO,一个双空间语义校准框架,包括:i)通过语义门控进行监督校准,随着语义重叠增加将胜率目标退火至最大熵基线;ii)通过潜在排斥进行表示校准,以强制几何多样性,防止流形坍塌并保持所选样本与拒绝样本之间的潜在多样性。理论上,我们证明该校准保持了常和博弈结构,促进收敛至纳什均衡。实验上,S-SPPO避免了先前方法中的性能退化,在AlpacaEval 2.0上使用Llama-3-8B实现了52.19%的胜率和47.46%的长度控制胜率,且在训练过程中未使用额外的人工标注偏好。代码将在https://github.com/xiwenc1/s-sppo提供。

英文摘要

Aligning Large Language Models (LLMs) with human preferences is often formulated via Direct Preference Optimization (DPO). However, the standard Bradley-Terry instantiation of DPO is limited in modeling common departures from transitivity in human preferences. To address this, recent work has introduced Self-Play Preference Optimization (SPPO), which iteratively refines the policy by training on self-generated win-lose pairs. Our investigation, however, reveals a critical instability in SPPO: the optimization is prone to policy degeneration when the preference oracle assigns overly confident wins to semantically indistinguishable responses. To mitigate this, we propose S-SPPO, a dual-space semantic calibration framework comprising: i) Supervision Calibration via semantic gating, which anneals win rate targets toward the maximum-entropy baseline as semantic overlap increases; and ii) Representation Calibration via latent repulsion, which enforces geometric diversity to prevent manifold collapse and maintains latent diversity between chosen and rejected samples. Theoretically, we show that the calibration preserves the constant-sum game structure, facilitating convergence to a Nash Equilibrium. Empirically, S-SPPO avoids the performance degradation seen in prior methods, achieving 52.19% win rate and 47.46% length-controlled win rate on AlpacaEval 2.0 with Llama-3-8B, without using additional human-annotated preferences during training. The code will be available at https://github.com/xiwenc1/s-sppo.
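The abstract says semantic gating anneals win-rate targets toward the maximum-entropy baseline as semantic overlap grows, without giving the schedule. A minimal sketch, assuming a linear interpolation toward 0.5 and an overlap score already scaled to [0, 1] (both hypothetical):

```python
def gated_win_target(p_win, semantic_overlap):
    """Anneal a preference-oracle win probability toward 0.5 (the maximum-entropy
    target) as the two responses become semantically indistinguishable.

    semantic_overlap in [0, 1]: 0 = clearly different, 1 = equivalent.
    """
    s = min(max(semantic_overlap, 0.0), 1.0)
    return (1.0 - s) * p_win + s * 0.5


print(gated_win_target(0.9, 0.0))  # 0.9  (distinct responses keep the oracle target)
print(gated_win_target(0.9, 1.0))  # 0.5  (equivalent responses get no preference)
```

A side effect of the linear form is that the targets for (a, b) and (b, a) still sum to 1, which is consistent with the constant-sum game structure the paper says its calibration preserves.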

2606.02955 2026-06-16 cs.CL cs.AI cs.LG 版本更新

Fast-dLLM++: Fréchet Profile Decoding for Faster Diffusion LLM Inference

Fast-dLLM++: 用于更快扩散LLM推理的Fréchet轮廓解码

Siva Rajesh Kasa, Yasong Dai, Sumit Negi, Hongdong Li

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 针对扩散大语言模型推理中并行令牌生成的瓶颈,提出Fréchet轮廓解码方法,通过利用异构置信度轮廓选择并行提交集,在保持模型和缓存不变的情况下提升吞吐量。

Comments Initial version accepted at Workshop on Structured Probabilistic Inference & Generative Modeling, ICML 2026. Project Page: https://ringo-star.github.io/projectpage_frechet/

AI中文摘要

扩散大语言模型承诺并行令牌生成,但推理仍然受限于决定哪些掩码令牌可以安全地一起提交。Fast-dLLM通过KV缓存和置信度引导的并行解码解决了这个问题,但其解码理论使用同质高置信度假设,实际上将每个候选集简化为其最弱的选择令牌。我们认为这留下了速度提升空间,因为实际解码步骤表现出异构置信度轮廓。我们提出Fast-dLLM++,一种无需训练的扩展,引入了Fréchet轮廓解码:从完整的排序置信度轮廓中选择并行提交集,而不是单个最坏情况置信度。得到的规则是Fast-dLLM因子选择器的异构置信度泛化,在等置信度情况下精确恢复先前规则,并在所选令牌具有不均匀置信度时增加一个可证明的异构性奖励。Fast-dLLM++完全保持模型、扩散过程和缓存实现不变,使其成为现有Fast-dLLM解码的直接替代品。在GSM8K、MATH、HumanEval和MBPP上使用LLaDA-8B模型的实验表明,理论改进直接转化为经验收益:轮廓感知选择通过利用最弱令牌规则忽略的安全并行性改进了准确率-吞吐量前沿,在可比准确率下实现了高达37%的吞吐量提升。我们的代码发布于https://github.com/Ringo-Star/FastdLLM_plusplus。

英文摘要

Diffusion large language models promise parallel token generation, yet inference remains bottlenecked by deciding which masked tokens can be safely committed together. Fast-dLLM addressed this with KV caching and confidence-guided parallel decoding, but its decoding theory uses a homogeneous high-confidence assumption that effectively reduces each candidate set to its weakest selected token. We argue that this leaves speed on the table because real decoding steps exhibit heterogeneous confidence profiles. We propose Fast-dLLM++, a training-free extension that introduces Fréchet profile decoding: selecting parallel commit sets from the full sorted confidence profile rather than a single worst-case confidence. The resulting rule is a heterogeneous-confidence generalization of Fast-dLLM's factor selector; it recovers the previous rule exactly in the equal-confidence case and adds a provable heterogeneity bonus when the selected tokens have uneven confidences. Fast-dLLM++ leaves the model, diffusion process, and cache implementation entirely unchanged, making it a drop-in replacement for existing Fast-dLLM decoding. Experiments on GSM8K, MATH, HumanEval, and MBPP with the LLaDA-8B model show that the theoretical improvement translates directly into empirical gains: profile-aware selection improves the accuracy-throughput frontier by exploiting safe parallelism that weakest-token rules miss, achieving up to 37% higher throughput at comparable accuracy. Our code release is at https://github.com/Ringo-Star/FastdLLM_plusplus.
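The abstract does not state the exact selection rule. One natural reading of "Fréchet profile", assumed here and not confirmed by the source, is the Fréchet (Bonferroni) lower bound on a joint probability, P(all k tokens correct) >= 1 - sum_i (1 - c_i), which uses every confidence in the sorted profile, versus a weakest-token bound 1 - k * (1 - c_min). Both function names are hypothetical.

```python
def frechet_commit_count(confidences, delta):
    """Largest k such that the Fréchet lower bound on P(top-k tokens all correct),
    1 - sum_i (1 - c_i) over the k most confident tokens, stays >= 1 - delta.
    At least one token is always committed so decoding makes progress."""
    ranked = sorted(confidences, reverse=True)
    k, miss = 0, 0.0
    for c in ranked:
        miss += 1.0 - c
        if 1.0 - miss < 1.0 - delta:
            break
        k += 1
    return max(k, 1)


def weakest_token_commit_count(confidences, delta):
    """Homogeneous bound that treats every selected token as the weakest one:
    P(all k correct) >= 1 - k * (1 - c_min)."""
    ranked = sorted(confidences, reverse=True)
    k = 0
    for n in range(1, len(ranked) + 1):
        if 1.0 - n * (1.0 - ranked[n - 1]) >= 1.0 - delta:
            k = n
        else:
            break
    return max(k, 1)


profile = [0.99, 0.98, 0.9, 0.6]
print(frechet_commit_count(profile, 0.15))        # 3
print(weakest_token_commit_count(profile, 0.15))  # 2
```

With equal confidences the two bounds coincide, and with uneven confidences the profile-based bound is never smaller. This matches the "recovers the previous rule" and "heterogeneity bonus" properties described in the abstract, but remains an interpretation.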

2606.10237 2026-06-16 cs.AI cs.LG 版本更新

Minimalist Genetic Programming

极简遗传编程

Leonardo Trujillo

发表机构 * Tecnológico Nacional de México/IT de Tijuana(墨西哥国家技术学院/蒂华纳理工学院) LASIGE, Department of Informatics, Faculty of Sciences, University of Lisbon(里斯本大学科学学院信息系LASIGE)

AI总结 提出极简遗传编程(MGP),借鉴语言学中的极简主义程序,用MERGE操作替代进化搜索,在符号回归任务中有效避免膨胀,稳定找到精确解。

详情
AI中文摘要

遗传编程(GP)基于两个重要见解。首先,任何学习任务从根本上都可以视为程序归纳问题,目标是构建表示为语法树的符号层次模型。其次,将此任务视为搜索问题,并使用进化来定位所需模型。自提出以来,GP在广泛的任务和问题领域中取得了显著成果。本文通过修改GP的第二个核心见解,将问题视为句法推导任务,提出了一种替代观点。具体来说,本文提出了极简遗传编程(MGP),该算法与GP一样受生物启发,但并非源自进化,而是从人类语言的极简主义方案中汲取灵感,其中句法被理解为连接其他两个心智系统的最优解决方案。在极简主义中,核心计算过程是一个称为MERGE的二元集合形成算子,它可以通过简单的马尔可夫过程逐步构建复杂的句法结构。MGP能够发现符号表达式的核心构建块,并使用MERGE逐步组合它们。所提出的系统在已知因膨胀倾向而难以用标准GP系统解决的符号回归任务上进行了基准测试。结果表明,当选择了合适的原子句法对象词典时,MGP能够在一组标准GP难以求得精确解的符号回归任务上,稳定地得到与真实模型完全一致的表达式。极简主义提供的见解被证明与程序归纳问题相关,鉴于MGP在这项工作中展示的潜力,值得进一步探索。

Abstract

Genetic programming (GP) is based on two important insights. First, that any learning task can fundamentally be posed as a program induction problem, where the goal is to construct a symbolic hierarchical model that is expressed as a syntax tree. Second, that this task can be posed as a search problem, using evolution to locate the desired model. Since it was proposed, GP has produced notable results in a wide range of tasks and problem domains. This work presents an alternative view by modifying the second core insight of GP, posing the problem as a syntactic derivation task instead. In particular, this paper presents Minimalist Genetic Programming (MGP), an algorithm that, like GP, is biologically inspired, but instead of evolution it takes inspiration from the Minimalist Program for human language, in which syntax is understood as an optimal solution to the problem of linking two other mental systems. In minimalism, the core computational process is a binary set formation operator called MERGE, which can be used to incrementally construct complex syntactic structures using a simple Markovian process. MGP is able to discover the core building blocks of the symbolic expressions and to incrementally combine them using MERGE. The proposed system is benchmarked on symbolic regression tasks that are known to be difficult to solve with standard GP systems because of their propensity for bloat. Results show that when a proper lexicon of atomic syntactic objects is chosen, MGP consistently produces the exact ground-truth model on a set of symbolic regression tasks where standard GP struggles to do the same. The insights provided by minimalism are shown to be relevant to the problem of program induction and should be explored further, given the potential MGP exhibits in this work.
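MERGE as binary set formation, driven by a Markovian derivation, can be illustrated with a toy Python sketch. It is not MGP's implementation: `merge`, `derive`, `leaves`, and the indexed-token numeration are our own illustrative choices, and the sketch performs no symbolic regression or lexicon discovery. It only shows that each step depends on nothing but the current workspace, and that MERGE(a, b) = {a, b} is unordered.

```python
import random
from typing import FrozenSet, List, Tuple, Union

# A lexical token is (label, index); the index keeps repeated lexical items distinct,
# like tokens in a numeration, so MERGE never collapses {x, x} into {x}.
Token = Tuple[str, int]
SynObj = Union[Token, FrozenSet["SynObj"]]


def merge(a: SynObj, b: SynObj) -> FrozenSet[SynObj]:
    """MERGE as binary set formation: MERGE(a, b) = {a, b} (unordered)."""
    return frozenset((a, b))


def derive(numeration: List[Token], rng: random.Random) -> SynObj:
    """Markovian derivation: each step picks two objects from the current workspace only,
    merges them, and puts the result back, until a single object remains."""
    workspace: List[SynObj] = list(numeration)
    while len(workspace) > 1:
        i, j = rng.sample(range(len(workspace)), 2)
        a, b = workspace[i], workspace[j]
        for k in sorted((i, j), reverse=True):  # pop the higher index first
            workspace.pop(k)
        workspace.append(merge(a, b))
    return workspace[0]


def leaves(obj: SynObj) -> List[Token]:
    """Collect the lexical tokens at the bottom of a syntactic object."""
    if isinstance(obj, tuple):
        return [obj]
    return [tok for child in obj for tok in leaves(child)]


numeration = [("x", 0), ("x", 1), ("*", 0), ("1.5", 0)]
tree = derive(numeration, random.Random(0))
```

Because the sets are unordered, turning a derived object into an arithmetic expression (for example, deciding which operand of `*` comes first) needs an extra labeling convention, which this sketch leaves out.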

2606.13710 2026-06-16 cs.AI cs.LG Replacement

Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher

Hongming Piao, Chi Liu, Mengzhuo Chen, Yan Shu, Xidong Wang, Derek Li, Ying Wei, Bryan Dai

Affiliations: IQuest Research; Zhejiang University

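AI summary: Proposes a hybrid open-ended tri-evolution framework that uses hybrid-mode reinforcement learning to co-evolve a proposer, a solver, and a judge, enabling an 8B model to outperform static open-source 8-32B models and models trained with state-of-the-art methods on deep research tasks.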

Abstract

Deep research and agent evolution serve as de-facto tasks for AI agents in real-world applications toward artificial general intelligence. The former enables autonomous retrieval and integration of information in open-ended environments to tackle open-ended research tasks, yet it is constrained by the static parametric deep research capabilities of agent systems. The latter allows agents to autonomously interact with the environment to gain experiences that evolve model capabilities. However, its effectiveness has been widely validated only on verifiable tasks with standard answers, leaving a gap with open-ended research tasks. To bridge these two critical tasks, we propose the Hybrid Open-Ended Tri-Evolution (HOTE) framework, which leverages hybrid-mode reinforcement learning to facilitate the collaborative evolution of a proposer, solver and judge based on web-scale knowledge, moving toward autonomous evolving agents in open-ended tasks and environments. Extensive experiments on three long-form deep research benchmarks demonstrate that the 8B model trained via HOTE surpasses the strongest static open 8-32B models as well as those trained by state-of-the-art deep research training methods with less time overhead, and further verify that the evolution of all three modules in HOTE is indispensable.

2. Representation Learning, Self-Supervised and Contrastive Learning (21 papers)

2606.15054 2026-06-16 cs.LG New submission

Size Doesn't Matter: Cosine-Scored Sparse Autoencoders

Silen Naihin, Lev Stambler

Affiliations: GitHub arXiv

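AI summary: Because inner-product scoring in sparse autoencoders is confounded by the input norm, proposes cosine scoring so that feature detection depends on directional alignment; experiments show the method learns human-recognizable concepts more often.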

Journal ref ICML 2026, Spotlight at the Mechanistic Interpretability Workshop

Abstract

Sparse autoencoders (SAEs) detect features via inner product, so a feature's activation scales with both its directional alignment and the input's norm. Under BatchTopK, high-norm tokens inflate all pre-activations simultaneously, claiming dictionary slots regardless of content alignment. This matters because sublayer normalization has already discarded the magnitude the score measures, so the encoder detects a quantity the model does not read. We replace the score with a learned blend of cosine similarity and input magnitude, letting the optimizer choose how much norm to use; a per-feature extension lets each feature decide independently. In both regimes, training is free to recover the inner product but never does, with no feature ever choosing more than half-magnitude dependence. At matched reconstruction, the cosine encoder learns features that align with human-recognizable concepts far more often than the standard encoder does, filling dictionary slots that the inner product wastes on norm detectors. Loss reweighting that equalizes gradients barely closes the gap, confirming forward-pass score geometry as the lever. The advantage is not universal across tasks or depths, but we believe cosine scoring should be the default for dictionary learning on normalized representations.
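The abstract does not give the exact form of the learned blend. A minimal NumPy sketch, assuming the hypothetical score cos(w_j, x) · ‖x‖^α (α = 0 is pure cosine; α = 1 with unit-norm encoder rows recovers the inner product), reproduces the BatchTopK failure mode described above: with the inner product, one high-norm token claims every active slot in the batch, while cosine scoring gives slots to the best directional matches. `blended_scores`, `batch_topk`, and the toy data are illustrative.

```python
import numpy as np


def blended_scores(x, W, alpha, eps=1e-8):
    """Hypothetical score: cos(w_j, x) * ||x|| ** alpha (scalar or per-feature alpha).
    alpha = 0 gives pure cosine; alpha = 1 with unit-norm rows of W gives the inner product."""
    x_norm = np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)      # (B, 1)
    W_unit = W / np.maximum(np.linalg.norm(W, axis=1, keepdims=True), eps)  # (F, D)
    cos = (x / x_norm) @ W_unit.T                                           # (B, F)
    return cos * x_norm ** np.asarray(alpha)  # alpha broadcasts as () or (F,)


def batch_topk(scores, k):
    """BatchTopK: keep the k * B largest pre-activations across the batch (ReLU'd), zero the rest."""
    flat = scores.ravel()
    n_keep = k * scores.shape[0]
    keep = np.argpartition(flat, -n_keep)[-n_keep:]
    out = np.zeros_like(flat)
    out[keep] = np.maximum(flat[keep], 0.0)
    return out.reshape(scores.shape)


W = np.eye(3)                        # three unit-norm feature directions
x = np.array([[10.0, 9.0, 8.0],      # high-norm token, weakly specific
              [0.0, 0.0, 1.0]])      # low-norm token, perfectly aligned with feature 2
inner = batch_topk(blended_scores(x, W, alpha=1.0), k=1)
cosine = batch_topk(blended_scores(x, W, alpha=0.0), k=1)
print((inner > 0).sum(axis=1), (cosine > 0).sum(axis=1))  # [2 0] [1 1]
```

In the paper α is learned; here it is fixed only to contrast the two extremes.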

2606.15092 2026-06-16 cs.LG New submission

High-Dimensional Random Projection for Activation Steering in Language Models

Minh-Hieu Pham, Bach Do, Laziz Abdullaev, Tan Minh Nguyen, Khoat Than

Affiliations: Hanoi University of Science and Technology; National University of Singapore

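AI summary: Addressing the limitation that existing activation steering methods capture only mean differences, proposes HiDRA, a training-free method that performs activation addition in a randomly projected high-dimensional space to capture discriminative signals in nonlinear feature subspaces; experiments show it outperforms baseline methods.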

Abstract

Activation steering has emerged as a key methodology for controlling the behavior of large language models (LLMs). Existing difference-in-means-based methods, however, are fundamentally limited: they capture only mean differences between class activations and fail to recover discriminative signals that naturally exist in the nonlinear feature subspace under the superposition hypothesis. Motivated by this, we propose High-Dimensional Random-projection for Activation Steering (HiDRA), a training-free approach that integrates seamlessly with existing activation steering methods. By performing activation addition in the projected high-dimensional space, HiDRA can provably capture a better discriminative structure beyond the reach of linear methods. Experiments across diverse LLM families and benchmarks demonstrate that HiDRA consistently outperforms baseline counterparts, achieving stronger behavioral control without significant computational overhead.
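The limitation of difference-in-means can be illustrated with a toy NumPy example. This is not HiDRA: we assume a fixed Gaussian random projection followed by ReLU as the lift, and because the abstract does not say how the steered vector is mapped back to the model's activation space, that step is omitted. The two classes share a mean but differ in spread, so the mean difference is nearly zero in the original space but clearly non-zero after the lift (compared per coordinate, as a rough scale-matched measure).

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 4, 256, 2000  # input dim, lifted dim (m > d), samples per class

# Two classes with the SAME mean but different spread: a purely nonlinear difference.
pos = rng.normal(0.0, 2.0, size=(n, d))
neg = rng.normal(0.0, 0.5, size=(n, d))


def rms(v):
    """Per-coordinate root-mean-square size of a vector."""
    return float(np.sqrt(np.mean(v ** 2)))


# Difference-in-means steering vector in the original space: close to zero.
dim_original = pos.mean(axis=0) - neg.mean(axis=0)

# Hypothetical lift: fixed Gaussian random projection to m dims, then ReLU.
R = rng.normal(size=(m, d)) / np.sqrt(d)


def lift(h):
    return np.maximum(h @ R.T, 0.0)


dim_lifted = lift(pos).mean(axis=0) - lift(neg).mean(axis=0)

# Activation addition in the lifted space: push one "neg" activation toward "pos".
steered = lift(neg[:1]) + 1.0 * dim_lifted

print(rms(dim_original), rms(dim_lifted))
```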

2606.15278 2026-06-16 cs.LG cs.AI New submission

RECTOR: Masked Region-Channel-Temporal Modeling for Affective and Cognitive Representation Learning

Jinhan Liu, Mahsa Shoaran

Affiliations: Cornell University

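AI summary: Proposes RECTOR, a self-supervised framework that uses adaptive functional partitioning and masked topology learning to jointly model the region-channel-temporal dynamics of EEG/sEEG; it sets a new state of the art in emotion recognition and task-engagement classification and is robust to missing channels and generalizes across montages.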

Abstract

Affective and cognitive disorders manifest as distributed, time-varying brain network dynamics across regions, channels, and time, challenging robust representation learning from EEG/sEEG for clinical diagnosis. We propose RECTOR (Masked Region-Channel-Temporal Modeling), an end-to-end self-supervised framework that unifies joint region-channel-temporal representation learning beyond fixed anatomical priors. At its core, RECTOR-SA is a hierarchical, block-sparse self-attention induced by Adaptive Functional Partitioning that evolves region structures from static anatomical definitions to adaptive functional regions. The self-supervision is driven by Masked Topology and Representation Learning, which jointly optimizes three complementary objectives: Masked Predictive Modeling, Topological Structure Modeling, and Cross-View Consistency. Across diverse benchmarks, RECTOR sets a new state-of-the-art in EEG emotion recognition and sEEG task-engagement classification. Crucially, its strong robustness to missing channels and cross-montage generalization underscores its potential for large-scale pre-training on heterogeneous EEG/sEEG, providing interpretable insights at both region and channel levels.
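The block-sparse idea can be sketched in NumPy, assuming a fixed channel-to-region assignment: attention is allowed only between channels in the same region, so the attention matrix is block-diagonal once channels are sorted by region. RECTOR's Adaptive Functional Partitioning learns this assignment, and its hierarchical, cross-region levels are not shown here; `region_of` and the untrained Q = K = V inputs are hypothetical.

```python
import numpy as np


def region_block_mask(region_of):
    """Boolean (C, C) mask: channel i may attend to channel j iff both lie in the same region."""
    region_of = np.asarray(region_of)
    return region_of[:, None] == region_of[None, :]


def masked_attention(Q, K, V, mask):
    """Single-head scaled dot-product attention restricted to allowed (i, j) pairs."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)       # forbid cross-region pairs
    scores -= scores.max(axis=-1, keepdims=True)   # finite: each row allows at least itself
    weights = np.exp(scores)                       # exp(-inf) = 0 for forbidden pairs
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights


rng = np.random.default_rng(0)
region_of = [0, 0, 1, 1, 1]     # hypothetical partition of 5 channels into 2 regions
X = rng.normal(size=(5, 8))     # one feature vector per channel
out, weights = masked_attention(X, X, X, region_block_mask(region_of))
```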

2606.15743 2026-06-16 cs.LG New submission

Unsupervised Learning for Missing Modalities in Multimodal Learning

Hassan Ismkhan, Hamid Bouchahcia

Affiliations: Bournemouth University

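AI summary: Proposes UL4M4, a framework that handles arbitrarily missing modalities through unsupervised clustering and iterative imputation while preserving cross-modal structure and scale invariance; it consistently achieves F1-Micro > 0.7 even when more than 50% of modalities are missing.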

Abstract

This paper addresses the missing-modality challenge in multi-modal learning by introducing Unsupervised Learning for Missing Modalities in Multi-Modal Learning (UL4M4), a flexible framework that imputes missing feature embeddings in a task-independent manner before supervised prediction. We propose modality-specific normalization and a novel partial-modality distance metric to enable fair clustering of incomplete observations, capturing cross-modal structures while preserving scale-invariance across varying dimensionalities and modality counts. Cluster centers from this unsupervised stage guide an iterative greedy imputation process for any missing modalities during training or inference, supporting arbitrary numbers of modalities and arbitrary missing patterns per sample. The imputation module is lightweight, uses frozen encoders, and decouples from the downstream task, allowing easy integration with any fusion/prediction architecture. Extensive experiments under diverse and highly incomplete regimes demonstrate UL4M4's robustness, achieving, to the best of our knowledge, the first consistent F1-Micro scores above 0.7 on challenging missing configurations even when more than 50% of modality slots are missing. Results are also stable across cluster sizes and significantly outperform state-of-the-art baselines. Code is available here: https://github.com/h-ismkhan/Multimodal-Learning-with-Missing-Modalities-via-Unsupervised-Learning.
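Nearest-center imputation with a partial-modality distance can be sketched in NumPy under stated assumptions: embeddings are already normalized per modality, cluster centers are given, and the distance (per-dimension squared error on each observed modality, averaged over the modalities a sample actually has) is our own stand-in for the paper's metric. UL4M4's imputation is iterative and greedy; this one-shot simplification fills every missing modality from a single nearest center.

```python
from typing import List, Optional

import numpy as np

Sample = List[Optional[np.ndarray]]  # one embedding per modality; None marks a missing modality


def partial_distance(sample: Sample, center: List[np.ndarray]) -> float:
    """Hypothetical partial-modality distance: per-dimension mean squared error on each
    OBSERVED modality, averaged over observed modalities, so neither embedding size nor
    the number of available modalities changes the scale."""
    terms = [np.mean((s - c) ** 2) for s, c in zip(sample, center) if s is not None]
    if not terms:
        raise ValueError("at least one modality must be observed")
    return float(np.mean(terms))


def impute_from_centers(sample: Sample, centers: List[List[np.ndarray]]) -> List[np.ndarray]:
    """Fill every missing modality from the nearest cluster center (one-shot simplification)."""
    nearest = min(centers, key=lambda c: partial_distance(sample, c))
    return [c.copy() if s is None else s for s, c in zip(sample, nearest)]


# Two hypothetical cluster centers over a 2-D and a 3-D modality.
centers = [[np.zeros(2), np.zeros(3)],
           [np.array([5.0, 5.0]), np.array([9.0, 9.0, 9.0])]]
filled = impute_from_centers([np.array([4.8, 5.1]), None], centers)
```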

2606.16044 2026-06-16 cs.LG q-bio.QM New submission

Circuit Tracing in Autoregressive Protein Language Models

Darin Tsui, William Deinzer, Daniel Saeedi, Amirali Aghazadeh

Affiliations: Stanford University

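AI summary: Proposes ProGenMech, which uses cross-layer transcoders to faithfully recover ProGen3's generative computation and discovers, zero-shot, sparse circuits tied to protein generation and fitness prediction, revealing biologically meaningful motifs.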

Comments Accepted into the Mechanistic Interpretability Workshop at ICML 2026. 24 pages, 14 figures

Abstract

Protein language models (pLMs) can generate novel protein sequences with properties beyond those observed in nature, yet the mechanisms underlying protein generation remain poorly understood. Existing mechanistic interpretability methods based on sparse autoencoders and transcoders primarily focus on protein representation learning models and do not capture the computation required for autoregressive generation. Here, we introduce ProGenMech, a mechanistic interpretability framework for generative protein language models that extends cross-layer transcoders (CLTs) to ProGen3, a sparse Mixture-of-Experts model trained for both causal generation and span infilling. Unlike per-layer approaches, CLTs reconstruct each layer using sparse latent variables from all preceding layers, enabling faithful recovery of inter-layer generative computation. We further develop a zero-shot circuit discovery framework to identify sparse latent circuits responsible for protein generation and fitness prediction. In causal generation and zero-shot fitness estimation tasks, ProGenMech outperforms local transcoder baselines in recovering ProGen3's probability distribution and functional scoring behavior, while matching the original model's generative distribution in span infilling tasks. Moreover, the recovered circuits reveal biologically meaningful motifs and functional regions associated with conserved sequence patterns and protein fitness landscapes, establishing a foundation for interpretable and steerable protein generation.
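The cross-layer structure can be sketched in NumPy, assuming ReLU latents and random, untrained weights (a trained CLT learns these and typically adds a sparsity-inducing activation and penalty, not shown). The key property is in `decode`: the reconstruction at layer `dst` sums contributions from latents at every layer `src <= dst`, whereas a per-layer transcoder would use only `src == dst`. The class and its parameters are hypothetical.

```python
import numpy as np


class CrossLayerTranscoder:
    """Toy cross-layer transcoder (CLT) with random, untrained weights."""

    def __init__(self, n_layers: int, d_model: int, n_latents: int, rng: np.random.Generator):
        self.n_layers = n_layers
        # One encoder per layer: residual-stream input at layer l -> latents at layer l.
        self.W_enc = rng.normal(0.0, d_model ** -0.5, size=(n_layers, d_model, n_latents))
        # W_dec[src, dst]: latents read at layer src -> output written at layer dst (used if src <= dst).
        self.W_dec = rng.normal(0.0, n_latents ** -0.5, size=(n_layers, n_layers, n_latents, d_model))

    def encode(self, resid: np.ndarray) -> np.ndarray:
        """resid: (n_layers, d_model) -> non-negative latents (n_layers, n_latents)."""
        return np.maximum(np.einsum("ld,ldk->lk", resid, self.W_enc), 0.0)

    def decode(self, latents: np.ndarray) -> np.ndarray:
        """Reconstruct each layer's output from latents at that layer AND all earlier layers."""
        d_model = self.W_dec.shape[-1]
        out = np.zeros((self.n_layers, d_model))
        for dst in range(self.n_layers):
            for src in range(dst + 1):
                out[dst] += latents[src] @ self.W_dec[src, dst]
        return out


rng = np.random.default_rng(0)
clt = CrossLayerTranscoder(n_layers=3, d_model=8, n_latents=16, rng=rng)
resid = rng.normal(size=(3, 8))
recon = clt.decode(clt.encode(resid))
```

Perturbing latents at the last layer leaves the reconstructions of earlier layers unchanged, which is the causal structure that lets circuits be traced forward across layers.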

2606.16462 2026-06-16 cs.LG cs.AI New submission

Learning aligned EEG representations with subject-specific encoders

Bruna J. Lopes, Gabriel Schwartz, Sylvain Chevallier, Raphael Y. de Camargo, Bruno Aristimunha

Affiliations: University of São Paulo; Université Paris-Saclay, Inria TAU team, LISN-CNRS; Institut de neuromodulation, GHU Paris, psychiatrie et neurosciences, centre hospitalier Sainte-Anne, pôle hospitalo-universitaire 15, Université Paris Cité; Federal University of ABC (UFABC); Yneuro; Swartz Center for Computational Neuroscience (SCCN), Institute for Neural Computation (INC), University of California San Diego

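AI summary: Proposes replacing a shared encoder with subject-specific encoders followed by a common classifier for cross-subject EEG alignment; experiments show the approach internalizes the role of Euclidean alignment and increases class distinctiveness, and identify encoder selection for unseen subjects as the main bottleneck.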

Abstract

Cross-subject EEG decoding promises more training data, but it also exposes neural networks to strong inter-subject distribution shifts. We study whether task supervision and architecture alone can learn subject-aligned representations. We replace a shared EEG encoder with subject-specific encoders followed by a common classifier, and compare this hybrid model with standard EEGNet, AttentionBaseNet, and CTNet baselines with Euclidean Alignment (EA) on four motor-imagery datasets. EA improves shared encoders by recentering subject covariances, but the hybrid encoder largely internalises this role: validation-loss curves and latent-distance analyses change little when EA is removed. Subject-specific heads increase class distinctiveness and place each subject close to its own latent manifold, improving most subjects while leaving a method-sensitive subset. These results support subject-specific encoders as a learned alignment mechanism for EEG decoding and identify head selection for unseen subjects as the remaining bottleneck.
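Euclidean Alignment, the baseline preprocessing compared against above, is a standard per-subject step: each subject's trials are whitened by the inverse square root of that subject's mean trial covariance, so the aligned trials have an average covariance equal to the identity. A minimal NumPy sketch on toy data follows; function names and the mixing matrix are ours, and dividing covariances by the number of samples only rescales the result.

```python
import numpy as np


def trial_covariances(trials):
    """trials: (n_trials, n_channels, n_samples) -> (n_trials, n_channels, n_channels)."""
    return np.einsum("nct,ndt->ncd", trials, trials) / trials.shape[-1]


def euclidean_alignment(trials):
    """Whiten one subject's trials by R^{-1/2}, where R is the subject's mean trial covariance."""
    R = trial_covariances(trials).mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(R)  # R is symmetric positive definite
    R_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return np.einsum("cd,ndt->nct", R_inv_sqrt, trials)


rng = np.random.default_rng(0)
mixing = rng.normal(size=(4, 4))  # hypothetical subject-specific channel mixing
trials = np.einsum("cd,ndt->nct", mixing, rng.normal(size=(20, 4, 200)))
aligned = euclidean_alignment(trials)
```

The hybrid model studied here is reported to make this step largely redundant, since removing EA changes its validation-loss curves and latent distances little.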

2606.14752 2026-06-16 cs.CV cs.AI cs.LG cs.RO Cross-list

X-Tokenizer: A Multimodal Action Tokenizer for Vision-Language-Action Pretraining

Xirui Kang, Yanpei Shi, Lucy Liang, Roy Gan, Dongxiu Liu, Pushi Zhang, Danpeng Chen, Xiaoyi Qin, Yinan Zheng, Jinliang Zheng, Hao Wang, Xianyuan Zhan, Hang Su

Affiliations: Square Robot; City University of Hong Kong; Tsinghua University

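AI summary: Proposes X-Tokenizer, which uses Semantic Residual Quantization (SRQ) and Masked Action Modeling (MAM) to discretize actions into a semantic interface; pretrained on 2.4M trajectories, it improves the multimodal grounding and long-horizon task performance of VLA models.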

Comments Project page: https://x-square-robot.github.io/X-Tokenizer_projectPage/

Abstract

Modern Vision-Language-Action (VLA) models must bridge pretrained vision-language reasoning and precise continuous robot control. Existing action tokenizers discretize actions primarily for reconstruction, producing codes that preserve motion geometry but provide only weak semantic supervision to the backbone. We therefore formulate action tokenization not as mere compression, but as semantic interface learning between multimodal reasoning and executable control. To this end, we introduce X-Tokenizer, a lightweight encoder-Semantic Residual Quantization (SRQ)-decoder architecture that provides a shared action interface across diverse robotic arm embodiments. Its key component, SRQ, imposes an asymmetric structure on residual vector quantization: the first level is trained with Masked Action Modeling (MAM) to form a discrete action language that captures coarse motion intent, while deeper levels remain reconstruction-oriented residuals that preserve fine-grained details. To further align action tokens with multimodal semantics, X-Tokenizer is pretrained with contrastive alignment to the representation space of a pretrained foundation model and with next-frame vision-language feature prediction. Pretrained on 2.4M trajectories (2.0B action frames), a single frozen X-Tokenizer plugs into a mixed discrete-continuous VLA as a representation-shaping supervision signal. X-Tokenizer achieves the top real-world aggregate score and strong RoboTwin 2.0 simulation results. By outperforming FAST in multimodal grounding (+13.5%) and long-horizon tasks (+8.25), it shows that action tokenizers serve as semantic interfaces for VLA pretraining beyond mere action compression.
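SRQ builds on plain residual vector quantization (RVQ). A minimal NumPy sketch of that base mechanism follows: greedy nearest-code quantization in which each level encodes what earlier levels missed. The codebooks are hand-picked for illustration; the MAM objective that makes SRQ's first level semantic, and the encoder/decoder around it, are not shown.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Greedy residual vector quantization: each level quantizes what earlier levels missed."""
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for cb in codebooks:  # cb: (n_codes, dim)
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes, residual


def rvq_decode(codes, codebooks):
    """Sum the selected code vectors across levels."""
    return np.sum([cb[i] for cb, i in zip(codebooks, codes)], axis=0)


# Level 0: coarse codes (the level SRQ trains as a discrete "action language");
# level 1: fine residual codes. Both are hypothetical.
codebooks = [np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]),
             0.5 * np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])]
codes, residual = rvq_encode([10.5, 0.5], codebooks)
```

Here the coarse level picks the nearby "intent" code `[10, 0]` and the fine level absorbs the remaining `[0.5, 0.5]`, mirroring the coarse-intent / fine-detail split described above.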

2606.14765 2026-06-16 cs.CV cs.AI cs.LG cs.MM Cross-list

Momentum-Guided Semantic Forecasting (MoFore) for Self-Supervised Video Representation Learning

Qinwu Xu

Affiliations: Qinwu Xu, PhD

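AI summary: Proposes MoFore, a self-supervised video representation learning framework that forecasts future latent embeddings and uses contrastive regularization to prevent representation collapse; temporal consistency and semantic structure are validated on UCF101.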

Comments 13 pages, 5 Figures, and 2 Tables

Abstract

Self-supervised video representation learning has recently advanced through contrastive learning, masked reconstruction, and predictive representation learning. Reconstruction-based approaches such as MAE and VideoMAE learn representations by recovering masked visual content [he2022mae, tong2022videomae], while contrastive methods such as CLIP learn semantically meaningful embedding spaces through representation alignment [radford2021clip]. In this work, we introduce a Momentum-Guided Semantic Forecasting framework (MoFore) for self-supervised video representation learning. Instead of optimizing for pixel-level reconstruction or task-specific semantic alignment, the proposed method learns temporally predictive video representations by forecasting future latent embeddings from temporally distant context clips. To improve robustness across temporal scales, we further introduce randomized temporal-gap forecasting during training. The framework combines predictive latent forecasting with contrastive regularization to encourage temporal consistency while preventing representation collapse. Experiments on the UCF101 dataset demonstrate that the proposed framework learns temporally consistent and semantically meaningful video representations without using action labels during training. Quantitative analysis shows strong temporal stability and emergent category-level structure in the learned embedding space, while qualitative retrieval experiments reveal motion-aware organization across related activities. Overall, the results suggest that long-range latent forecasting provides an effective and computationally efficient approach for self-supervised video representation learning without relying on reconstruction-based objectives.
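The abstract does not spell out MoFore's losses or sampling scheme. A minimal NumPy sketch of three ingredients follows, under labeled assumptions: randomized temporal-gap sampling of a context clip and a later target clip, a momentum (EMA) update of a target encoder (our reading of "momentum-guided"), and a negative-cosine forecasting loss. Encoders, the predictor, and the contrastive regularizer are omitted; all names and parameters are hypothetical.

```python
import numpy as np


def sample_clip_pair(n_frames, clip_len, min_gap, max_gap, rng):
    """Randomized temporal-gap sampling: a context clip and a target clip that starts
    `gap` frames after the context clip ends. Returns (context, target, gap)."""
    gap = int(rng.integers(min_gap, max_gap + 1))
    latest_start = n_frames - (2 * clip_len + gap)
    if latest_start < 0:
        raise ValueError("video too short for this gap")
    start = int(rng.integers(0, latest_start + 1))
    context = (start, start + clip_len)
    target_start = start + clip_len + gap
    return context, (target_start, target_start + clip_len), gap


def ema_update(target_params, online_params, momentum=0.99):
    """Momentum (EMA) update of the target encoder's parameters from the online encoder."""
    return {k: momentum * target_params[k] + (1 - momentum) * online_params[k]
            for k in target_params}


def forecast_loss(pred, target):
    """Negative cosine similarity between forecast and (stop-gradient) target embeddings."""
    p = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    t = target / np.linalg.norm(target, axis=-1, keepdims=True)
    return float(-np.mean(np.sum(p * t, axis=-1)))
```

Sampling the gap per training pair, rather than fixing it, is what exposes the forecaster to several temporal scales.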

2606.14791 2026-06-16 eess.AS cs.LG cs.SD Cross-list

From Physics to Representation: Audio Learning with Synthetic Pre-training via Procedural Generation

Fengrui Liu, Ruiyang Huang, Qijian Zheng, Yuanfang Wang, Feng Liu

Affiliations: East China Normal University; Southeast University; Fudan University; Shanghai Jiao Tong University

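AI summary: Proposes AudioPG, a framework that pretrains a masked autoencoder on procedurally synthesized waveforms without any real audio data; it achieves high accuracy on several benchmarks, and pre-training takes under 20 minutes on a single GPU.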

Comments Accepted to ACM ICMR 2026

Abstract

Self-supervised learning advances audio representation for multimedia analysis. However, prevailing data-centric approaches rely on massive real-world corpora, increasing training costs, curation burdens, and privacy barriers. To address this, we present AudioPG, a procedural synthesis framework that eliminates real audio recordings during pre-training. AudioPG trains a Transformer-based masked autoencoder on waveforms generated on-the-fly from basic acoustic primitives and composition rules. The encoder transfers effectively to real audio benchmarks, achieving 90.60% accuracy on ESC-50, 0.546 mAP on FSD50K, 88.17% on UrbanSound8K, and 97.03% on Speech Commands V2. Notably, pre-training completes in under 20 minutes on a single GPU. Latent space analysis reveals that physical factors, including fundamental frequency and relative intensity, emerge in orthogonal subspaces, making the representations linearly decodable. These results establish procedural synthesis as an efficient, interpretable pre-training signal when large-scale corpora are unavailable. Our code is available at: https://github.com/Freyliu0516/audioPG.
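The abstract does not list AudioPG's primitives or composition rules. A minimal NumPy sketch of the general idea follows, with our own hypothetical primitives (harmonic tones, white-noise bursts), composition rule (random onsets, gains, and exponential decay), and 16 kHz sample rate. Returning the fundamental frequency and gain of each event mirrors the physical factors the abstract probes in latent space, so they could serve as linear-probe targets.

```python
import numpy as np

SR = 16_000  # sample rate in Hz (assumed)


def harmonic_tone(f0, n_samples, n_harmonics=5, rolloff=0.6):
    """Primitive: harmonic tone with geometric amplitude roll-off over harmonics."""
    t = np.arange(n_samples) / SR
    return sum(rolloff ** (h - 1) * np.sin(2 * np.pi * h * f0 * t)
               for h in range(1, n_harmonics + 1))


def noise_burst(n_samples, rng):
    """Primitive: white-noise burst."""
    return rng.normal(size=n_samples)


def synthesize(rng, length_s=1.0, n_events=3):
    """Composition rule: place decaying primitives at random onsets with random gains.
    Returns the peak-normalized waveform and per-event factors."""
    out = np.zeros(int(SR * length_s))
    events = []
    for _ in range(n_events):
        n = int(SR * rng.uniform(0.05, 0.4))
        if rng.random() < 0.5:
            f0 = float(rng.uniform(80.0, 1500.0))  # 5 * 1500 Hz stays below Nyquist (8 kHz)
            ev = harmonic_tone(f0, n)
        else:
            f0 = None
            ev = noise_burst(n, rng)
        ev = ev * np.exp(-np.linspace(0.0, 5.0, n))  # exponential decay envelope
        gain_db = float(rng.uniform(-20.0, 0.0))     # relative intensity
        start = int(rng.integers(0, out.size - n + 1))
        out[start:start + n] += 10 ** (gain_db / 20) * ev
        events.append({"f0": f0, "gain_db": gain_db, "onset_s": start / SR})
    return out / max(np.max(np.abs(out)), 1e-9), events


wave, events = synthesize(np.random.default_rng(0))
```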

2606.14813 2026-06-16 hep-ph cs.AI cs.LG Cross-list

JetParticle-JEPA: An Efficient Self-Supervised Representation Learning method for Jet Tagging in High-Energy Physics

Guillaume Letellier, Antonin Vacheret, Frédéric Jurie

Affiliations: GREYC, Normandy University, Unicaen, ENSICAEN, UMR CNRS 6072; LPC, Normandy University, Unicaen, ENSICAEN, IN2P3, UMR CNRS 6534

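AI summary: Proposes JetParticle-JEPA, a self-supervised joint-embedding predictive architecture built on a Particle Transformer that learns physically meaningful jet representations directly from continuous particle clouds without tokenizing or reconstructing raw inputs; it matches fully supervised methods on benchmarks such as JetClass and surpasses supervised baselines in low-label settings.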

Abstract

Jet tagging at the Large Hadron Collider increasingly relies on deep learning models trained on massive simulated datasets, leading to high computational costs and limited robustness to detector mismodeling. We introduce JetParticle-JEPA (JP-JEPA), a self-supervised Joint-Embedding Predictive Architecture that learns physically meaningful jet representations directly from continuous particle clouds without tokenization or reconstruction of raw inputs. Built on a Particle Transformer backbone, JP-JEPA predicts latent representations of masked particles while preserving fine-grained kinematic correlations. On the JetClass benchmark, JP-JEPA achieves performance comparable to fully supervised state-of-the-art methods on the full dataset, surpasses supervised baselines in low-label regimes, and significantly outperforms existing SSL approaches. On Top Quark and Quark-Gluon Tagging benchmarks, it remains on par with supervised methods. The learned representations also exhibit strong robustness to missing detector information and improved uncertainty behavior, highlighting JP-JEPA as a promising foundation-model framework for robust and data-efficient jet physics at the LHC.

2606.15134 2026-06-16 cs.CV cs.AI cs.LG Cross-list

Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings

Shubhang Bhatnagar, Dheeraj Baiju, Narendra Ahuja

Affiliations: University of Illinois Urbana-Champaign

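AI summary: Proposes SAGA, a framework that uses a frozen multimodal large language model (MLLM) with GRPO-based rewards to give a vision encoder attribute-level supervision in place of traditional scalar distances, improving zero-shot image retrieval.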

Abstract

Vision encoders for retrieval are typically trained with class-label supervision: each training pair reduces to a scalar that uniformly pushes the embedding apart or pulls it together, as if every visual attribute either differed or matched. A multimodal large language model (MLLM), shown the same pair, can articulate those attributes and use them to predict whether the images share a class. We propose SAGA, a framework that turns this language-grounded, attribute-aware perception into a training signal for the encoder itself. Specifically, we use Group Relative Policy Optimization (GRPO) to reward the MLLM for correct predictions on the vision encoder's tokens. Since correct predictions require those tokens to expose the specific attributes that differ or match between the pair, the gradient pushes the encoder to encode them, replacing the uniform pair-level scalar with attribute-resolved supervision. An auxiliary attention-distillation loss anchors the encoder's embedding to tokens the MLLM attended to, and a standard metric-learning loss shapes the embedding geometry for nearest-neighbour retrieval. The MLLM is frozen throughout and discarded at inference, matching the deployment cost of a metric-learning baseline. SAGA improves Recall@1 by 3 to 6 points over state-of-the-art baselines on CUB-200-2011, Cars-196, FGVC-Aircraft, and iNaturalist Aves on zero-shot image retrieval.
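The group-relative part of GRPO can be sketched in a few lines of NumPy: each sampled response's reward is standardized against the mean and standard deviation of its own group of samples. Here rewards are assumed binary (correct or incorrect same-class verdict), the answer format is hypothetical, and some implementations use the sample rather than the population standard deviation. The policy-gradient step that would carry these advantages back through the MLLM into the vision encoder's tokens is not shown.

```python
import numpy as np


def correctness_rewards(answers, same_class):
    """Binary reward per sampled MLLM verdict: 1 if it matches the ground truth (hypothetical format)."""
    return np.array([1.0 if a == same_class else 0.0 for a in answers])


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward against its own group of samples."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


# Four sampled verdicts for one image pair whose ground truth is "same class".
adv = group_relative_advantages(correctness_rewards([True, False, False, True], same_class=True))
```

A group in which every sample is equally right or wrong yields zero advantage, so only pairs that the MLLM answers inconsistently produce a learning signal.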

2606.15284 2026-06-16 eess.SP cs.AI cs.LG Cross-list

CAP: Towards PPG Universal Representation Learning with Patient-level Supervision

Chenyang He, Xinyi Shao, Shun Huang, Bosong Huang, Daoqiang Zhang, Ming Jing, Cheng Ding

Affiliations: Nanjing University of Aeronautics and Astronautics; Peking University; Independent Researcher; Jinling Clinical Medical College; College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics

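AI summary: Proposes CAP, which builds a large-scale PPG-EHR multimodal dataset and uses cross-modal contrastive alignment to learn PPG representations that carry patient-level clinical semantics; it achieves an average relative improvement of 26.7% across four downstream tasks and up to 87.6% on respiratory rate prediction.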

Comments Accepted as an Oral presentation at KDD 2026

Abstract

Photoplethysmography (PPG) plays a central role in wearable health monitoring and clinical decision support. Yet existing approaches to universal PPG representation learning largely focus on signal-level objectives and often overlook patient-level health context, which limits generalization to complex clinical tasks and heterogeneous cohorts. To address this gap, we construct a large-scale paired PPG-EHR multimodal dataset by distilling fragmented medical histories and clinical records into cohesive, patient-level electronic health records (EHR). Building on this resource, we propose Clinical Anchored Pretraining for PPG (CAP). During pretraining, CAP performs cross-modal contrastive alignment that anchors PPG representations to patient-level clinical semantics, guiding the encoder beyond waveform fitting toward modeling consistency in a patient's overall physiological state. During downstream adaptation, the pretrained PPG encoder provides clinically grounded representations that strengthen inductive bias and improve robustness and transferability. Experiments demonstrate that CAP consistently outperforms strong baselines on four diverse downstream tasks. CAP achieves a particularly large gain on respiratory rate prediction (up to +87.6% relative improvement over the state-of-the-art baseline) and delivers an average relative +26.7% across all tasks. We further enhance the interpretability of our approach through comprehensive analyses, including ablations and multiple complementary visualizations of the learned representations. The code for our experiments is available at: https://github.com/gody123gody/CAP .
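The abstract describes cross-modal contrastive alignment without giving the loss. A minimal NumPy sketch follows, assuming the common symmetric InfoNCE (CLIP-style) formulation with in-batch negatives and a fixed temperature; the encoders are omitted, and rows stand in for the PPG and EHR embeddings of the same patients.

```python
import numpy as np


def _log_softmax(logits):
    """Row-wise log-softmax with max subtraction for numerical stability."""
    logits = logits - logits.max(axis=1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def symmetric_info_nce(ppg_emb, ehr_emb, temperature=0.07):
    """CLIP-style loss: row i of each matrix belongs to the same patient (positive pair);
    every other patient in the batch serves as a negative, in both directions."""
    p = ppg_emb / np.linalg.norm(ppg_emb, axis=1, keepdims=True)
    e = ehr_emb / np.linalg.norm(ehr_emb, axis=1, keepdims=True)
    logits = p @ e.T / temperature
    ppg_to_ehr = -np.mean(np.diag(_log_softmax(logits)))
    ehr_to_ppg = -np.mean(np.diag(_log_softmax(logits.T)))
    return float((ppg_to_ehr + ehr_to_ppg) / 2)


rng = np.random.default_rng(0)
ehr = rng.normal(size=(8, 32))
aligned = symmetric_info_nce(ehr + 0.1 * rng.normal(size=(8, 32)), ehr)  # matched patients
shuffled = symmetric_info_nce(rng.permutation(ehr), ehr)                # mismatched patients
```

Minimizing this loss pulls each patient's PPG embedding toward that patient's own EHR embedding and away from other patients', which is one way to realize the "anchoring" described above.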

2606.15468 2026-06-16 cs.CV cs.LG Cross-list

Analyzing Visual Aircraft Representations with Sparse Autoencoders

Deepshik Sharma

Affiliations: Jain University

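AI summary: Uses sparse autoencoders to decompose the intermediate representations of a ConvNeXt model trained on FGVC-Aircraft, finds interpretable aircraft-structure features, and verifies their class relevance through ablation experiments.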

Comments 18 pages, 4 figures, 7 tables

Abstract

Vision models can achieve strong performance on classification tasks, but the internal representations supporting their predictions are often difficult to interpret. This work investigates whether sparse autoencoders can decompose intermediate representations of a vision model into interpretable features. We train a ConvNeXt classifier on the FGVC-Aircraft dataset, extract spatial activations from its final feature stage, and train a sparse autoencoder on these activations. The learned sparse features are analyzed using top-activating image patches, activation strength, and class selectivity. Qualitative visual inspection reveals that several features correspond to recognizable aircraft structures and visual patterns. We evaluate a subset of selected features using input-space and feature-space ablations, measuring how blurring image patches and suppressing sparse features affect class logits, classification margins, and prediction confidence. The results suggest that sparse autoencoders can reveal partially interpretable, class-relevant visual features associated with aircraft recognition, while also exposing limitations such as polysemanticity and coarse spatial localization.
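The feature-space ablation can be sketched in NumPy, assuming a ReLU sparse autoencoder and a linear classifier head applied to the decoded activations. Spatial pooling over ConvNeXt feature maps and the input-space (blurring) ablation are omitted, and all weights below are random placeholders rather than trained parameters.

```python
import numpy as np


def sae_encode(x, W_enc, b_enc):
    """ReLU sparse-autoencoder encoder: (n, d) activations -> (n, n_features) codes."""
    return np.maximum(x @ W_enc + b_enc, 0.0)


def sae_decode(codes, W_dec, b_dec):
    return codes @ W_dec + b_dec


def margin(logits, label):
    """Classification margin: true-class logit minus the best competing logit."""
    others = np.delete(logits, label)
    return float(logits[label] - others.max())


def feature_ablation_effect(x, label, feature, W_enc, b_enc, W_dec, b_dec, W_cls):
    """Change in margin when one sparse feature is zeroed before decoding."""
    codes = sae_encode(x[None, :], W_enc, b_enc)
    ablated = codes.copy()
    ablated[:, feature] = 0.0
    before = margin((sae_decode(codes, W_dec, b_dec) @ W_cls)[0], label)
    after = margin((sae_decode(ablated, W_dec, b_dec) @ W_cls)[0], label)
    return after - before


rng = np.random.default_rng(0)
W_enc, b_enc = rng.normal(size=(16, 64)), np.zeros(64)
W_dec, b_dec = rng.normal(size=(64, 16)) / 8.0, np.zeros(16)
W_cls = rng.normal(size=(16, 5))  # hypothetical 5-class linear head
x = rng.normal(size=16)
codes = sae_encode(x[None, :], W_enc, b_enc)[0]
dead = int(np.flatnonzero(codes == 0)[0])  # a feature that is inactive on this input
live = int(np.argmax(codes))               # the most active feature on this input
```

Ablating an inactive feature cannot change the margin, so any measured effect comes from features that actually fire on the input.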

2606.15956 2026-06-16 cs.CV cs.AI cs.LG 交叉投稿

You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences

你不需要强假设:通过时间差异进行视觉表示学习

Ninad Daithankar, Alexi Gladstone, Yann LeCun, Heng Ji

发表机构 * UIUC(伊利诺伊大学厄巴纳-香槟分校) New York University(纽约大学)

AI总结 提出TDV方法,基于因果假设(过去导致未来)从视频中自监督学习,避免强归纳偏置,在密集空间任务上达到SOTA。

详情
AI中文摘要

AI的进步很大程度上是由假设更少的方法驱动的。随着计算和数据量的增加,弱归纳偏置的方法通常优于强假设的方法。这在视觉表示学习领域尤为典型,方法从监督学习主导,到弱监督学习,再到如今无需人工标签的自监督学习的广泛成功。然而,即使是现代自监督学习方法仍然依赖于强归纳偏置,如数据增强、掩码或裁剪。如果这一趋势持续,这些剩余的偏置在大规模下将成为瓶颈——我们的实验证实了这一点:随着数据增长,归纳偏置的最优强度降低。这促使我们寻找依赖更少假设的方法。为此,我们提出了视觉时间差异(TDV),一种从视频中进行自监督学习的新范式,它避免了现有的归纳偏置,而是依赖于一个因果假设:过去导致未来。TDV通过联合训练图像编码器和运动编码器,使得当前帧的表示加上编码的运动等于下一帧的表示。尽管没有利用任何强归纳偏置,TDV在密集空间任务上达到了最先进的水平,为无需强假设的表示学习奠定了基础。

英文摘要

Progress in AI has largely been driven by methods that assume less. As compute and data increase, approaches with weaker inductive biases generally outperform those with stronger assumptions. This is particularly characteristic of the field of Visual Representation Learning, where approaches have gone from being dominated by Supervised Learning, to Weakly Supervised Learning, to the now widespread success of Self-Supervised Learning without human labels. Yet, even modern Self-Supervised Learning approaches still depend on strong inductive biases such as augmentations, masking, or cropping. If this trend holds, even these remaining biases should become bottlenecks at scale -- and our experiments confirm this: the optimal strength of inductive biases decreases as data grows. This motivates the search for approaches that rely on fewer assumptions. To this end, we introduce Temporal Difference in Vision (TDV), a new paradigm for self-supervised learning from video that avoids existing inductive biases, relying instead on a causal assumption that the past causes the future. TDV functions by jointly training an image encoder and a motion encoder so that the current frame's representation plus the encoded motion equals the next frame's representation. Despite not leveraging any strong inductive biases, TDV matches state-of-the-art recipes on dense spatial tasks, laying the foundation for representation learning without strong assumptions.
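
The additive constraint at the core of TDV can be written as a residual to be minimised. The sketch below assumes a squared-error form and treats the two encoders as plain callables. The abstract does not say how TDV parameterises the encoders, what the motion encoder sees, or how trivial or collapsed solutions are avoided, so none of these is modelled here.

```python
def tdv_residual(image_encoder, motion_encoder, frame_t, frame_next):
    """MSE between enc(frame_t) + motion(frame_t, frame_next) and enc(frame_next)."""
    z_t = image_encoder(frame_t)
    z_next = image_encoder(frame_next)
    m = motion_encoder(frame_t, frame_next)
    return sum((a + b - c) ** 2 for a, b, c in zip(z_t, m, z_next)) / len(z_t)


# Toy check with an identity "image encoder": a motion code equal to the
# frame difference satisfies the constraint exactly, while a zero code does not.
def identity_encoder(frame):
    return list(frame)


def difference_motion(f0, f1):
    return [b - a for a, b in zip(f0, f1)]


def zero_motion(f0, f1):
    return [0.0] * len(f0)


exact = tdv_residual(identity_encoder, difference_motion, [0.0, 1.0], [0.5, 1.5])
missing = tdv_residual(identity_encoder, zero_motion, [0.0, 1.0], [0.5, 1.5])
```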

2606.16193 2026-06-16 cs.CV cs.AI cs.LG 交叉投稿

Cascaded Sparse Autoencoders Learn Multi-Level Visual Concepts in Multimodal LLMs

级联稀疏自编码器在多模态大语言模型中学习多级视觉概念

Yusong Zhao, Hengyi Wang, Tanuja Ganu, Akshay Nambi, Hao Wang

发表机构 * Rutgers University(罗格斯大学) Microsoft Research(微软研究院)

AI总结 提出级联稀疏自编码器(CSAEs),通过在第一级SAE解码器权重上训练第二级SAE来学习层次化视觉概念,避免嵌套或堆叠SAE的缺点,在多个MLLM和数据集上提升了概念层次一致性和干预效果。

详情
AI中文摘要

多模态大语言模型(MLLMs)在视觉-语言任务上表现出色,但其内部视觉表示仍难以解释。稀疏自编码器(SAEs)提供了一种可扩展的方式,将密集模型激活分解为稀疏、可解释的特征。然而,现有SAE架构主要恢复扁平特征字典,不太适合显式的多级概念组织。在本文中,我们引入级联稀疏自编码器(CSAEs)用于学习MLLMs中的层次化视觉概念。CSAEs并非嵌套或堆叠SAE稀疏激活码,而是直接在第一个SAE的解码器权重上训练第二个SAE,将学习到的低级特征方向作为高级抽象的输入。这种设计使CSAEs能够学习“概念的概念”,同时避免了嵌套、Matryoshka式层次结构中的共享前缀耦合问题以及简单堆叠SAE的瓶颈。在Qwen3-VL、Gemma-3和LLaVA上的多个视觉数据集上的实验表明,与最先进的SAE基线相比,CSAEs在层次概念一致性方面提高了可解释性。概念引导的结果进一步表明,学习到的概念组支持对MLLM输出进行有效的组级干预。

英文摘要

Multimodal Large Language Models (MLLMs) have demonstrated strong performance on vision-language tasks, yet their internal visual representations remain difficult to interpret. Sparse Autoencoders (SAEs) provide a scalable way to decompose dense model activations into sparse, interpretable features. However, existing SAE architectures primarily recover flat feature dictionaries and are less suited for explicit multi-level concept organization. In this paper, we introduce cascaded sparse autoencoders (CSAEs) for learning hierarchical visual concepts in MLLMs. Rather than nesting or stacking SAE sparse activation codes, CSAEs train a second-level SAE directly on the decoder weights of the first-level SAE, treating learned low-level feature directions as inputs for higher-level abstraction. This design enables CSAEs to learn "concepts of concepts" while avoiding drawbacks from the shared-prefix coupling of nesting, Matryoshka-style hierarchies and the bottlenecks of naively stacked SAEs. Experiments across Qwen3-VL, Gemma-3, and LLaVA on multiple visual datasets show that CSAEs improve interpretability in terms of hierarchical concept coherence over state-of-the-art SAE baselines. Results on concept steering further demonstrate that the learned concept groups support effective group-level interventions in MLLM outputs.
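
What distinguishes CSAEs is the second level's training data: the first-level decoder directions rather than model activations. The sketch below shows only that data flow and rests on several assumptions. Decoder directions are unit-normalised, the level-2 encoder is a plain ReLU layer with given (not trained) weights, and each level-1 feature is grouped under the level-2 concept it activates most strongly.

```python
import math


def unit(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def relu_encode(W_enc, b_enc, x):
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_enc, b_enc)]


def concept_groups(level1_decoder_rows, W_enc2, b_enc2):
    """Group level-1 features under the level-2 concept each one activates most.

    level1_decoder_rows[i] is the decoder direction of level-1 feature i; the
    level-2 SAE treats these directions, not model activations, as its inputs."""
    groups = {k: [] for k in range(len(W_enc2))}
    for i, direction in enumerate(level1_decoder_rows):
        code = relu_encode(W_enc2, b_enc2, unit(direction))
        if max(code) > 0:  # features that activate no concept stay ungrouped
            groups[code.index(max(code))].append(i)
    return groups


# Four level-1 directions clustered around two axes; two hand-set level-2 atoms.
level1 = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 1.2]]
groups = concept_groups(level1, W_enc2=[[1.0, 0.0], [0.0, 1.0]], b_enc2=[0.0, 0.0])
```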

2606.16240 2026-06-16 cs.CL cs.LG 交叉投稿

Creative Collision: Directorial Persona Steering and Competition in Large Language Models

创意碰撞:大型语言模型中的导演人格引导与竞争

Subramanyam Sahoo, Justin Shenk

发表机构 * AI Safety Camp(AI安全训练营)

AI总结 研究通过叠加两种语义相反的导演人格向量(斯皮尔伯格与斯科塞斯)来引导语言模型生成,发现斯皮尔伯格向量主导道德倾向,中间点提升连贯性,且两者在特定层共享道德基调基底。

Comments Accepted at ICML 2026 Workshop on Human-AI Co-Creativity

详情
AI中文摘要

激活引导已成为在推理时塑造大型语言模型行为的强大工具,但以往大多数工作向残差流注入单一的语义方向。我们研究了两种语义相反的引导向量叠加的丰富场景——我们称之为“创意碰撞”。具体而言,我们通过在精心策划的剧本语料库上进行均值差异激活对比,构建了史蒂文·斯皮尔伯格(乐观、救赎的道德价值)和马丁·斯科塞斯(黑暗、道德模糊)的导演人格向量,然后通过标量混合参数$α\in[0,1]$和引导系数$λ$在两者之间进行插值。在五个评估轴(道德价值、生成连贯性、表面风格、方向主导性和向量几何)上,出现了三个主要发现:(i)斯皮尔伯格的表征特征表现出稳健的“方向主导性”,在几乎整个插值范围内抑制了斯科塞斯的道德影响;(ii)中间碰撞点在高$λ$下相对于纯单导演引导反而提高了生成连贯性;(iii)两种人格在40层仅解码器Transformer的第28层达到最大定位,揭示了一个共享的“道德基调基底”。这些结果阐明了Transformer残差流中竞争语义方向的几何结构,并对可控创意生成和价值对齐叙事合成具有直接影响。

英文摘要

Activation steering has emerged as a powerful tool for shaping the behaviour of large language models at inference time, yet most prior work injects a single semantic direction into the residual stream. We study the richer setting in which two semantically opposing steering vectors are superimposed -- a regime we call Creative Collision. Concretely, we construct directorial persona vectors for Steven Spielberg (optimistic, redemptive moral valence) and Martin Scorsese (dark, morally ambiguous) via mean-difference activation contrast on curated screenplay-derived corpora, then interpolate between them with a scalar mixing parameter $\alpha \in [0,1]$ and a steering coefficient $\lambda$. Across five evaluation axes -- moral valence, generation coherence, surface style, directional dominance, and vector geometry -- three principal findings emerge: (i) Spielberg's representational signature exhibits robust directional dominance, suppressing Scorsese's moral influence across almost the entire interpolation range; (ii) intermediate collision points paradoxically improve generation coherence relative to pure single-director steering at high $\lambda$; and (iii) both personas localise maximally to layer 28 of a 40-layer decoder-only transformer, revealing a shared moral-tone substrate. These results illuminate the geometry of competing semantic directions in transformer residual streams and have direct implications for controllable creative generation and value-aligned narrative synthesis.
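
The abstract names two ingredients: mean-difference persona vectors, and an $\alpha$/$\lambda$-weighted mix added to the residual stream. Both can be sketched as follows. The sketch assumes that activations are plain lists taken at one layer and that the contrast set is a generic baseline corpus. It also assumes that $\alpha$ weights the first persona, so $\alpha = 1$ means pure first-persona steering; the paper's exact convention may differ.

```python
import math


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def persona_vector(persona_acts, baseline_acts):
    """Mean-difference steering vector: mean(persona) - mean(baseline)."""
    return [p - b for p, b in zip(mean(persona_acts), mean(baseline_acts))]


def collide(hidden, v_first, v_second, alpha, lam):
    """h' = h + lam * (alpha * v_first + (1 - alpha) * v_second)."""
    return [h + lam * (alpha * a + (1.0 - alpha) * b)
            for h, a, b in zip(hidden, v_first, v_second)]


def cosine(u, v):
    """Cosine similarity, for comparing the geometry of two persona vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


v_first = persona_vector([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
half_mix = collide([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], alpha=0.5, lam=2.0)
```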

2508.00956 2026-06-16 cs.LG cs.AI cs.IR 版本更新

FOUNDv2: Learning Unified User Quantized Tokenizers for User Representation

FOUNDv2: 学习统一的用户量化分词器用于用户表示

Chuan He, Yang Chen, Bin Dou, Wuliang Huang, Baokun Wang, Yongchao Liu, Xing Fu, Yu Cheng, Chuntao Hong, Weiqiang Wang, Zhongle Xie, Jiajun Zheng, Xin-Wei Yao

发表机构 * Ant Group(蚂蚁集团) Zhejiang University of Technology(浙江工业大学) Zhejiang University(浙江大学)

AI总结 提出FOUNDv2框架,通过统一用户量化分词器(U2QT)将异构用户数据转化为离散令牌,结合多视图RQ-VAE和多尺度对齐目标,实现高效存储和预测性能,在多个基准上优于任务特定基线。

详情
AI中文摘要

用户表示学习是大规模网络平台上个性化服务的基础支柱。尽管其十分重要,传统的连续嵌入方法面临重大挑战,包括缺乏多源数据融合的统一范式、由于信息密度低导致的过高存储开销以及缺乏多尺度建模粒度。为克服这些限制,我们引入FOUNDv2,一个以统一用户量化分词器(U2QT)框架为核心的综合用户表示方案。FOUNDv2通过一个稳健的两阶段架构将异构用户数据转化为标准化的离散令牌空间。具体来说,该框架首先提取紧凑的特征表示,然后使用多视图RQ-VAE通过共享和源特定的码本将其离散化为存储高效的令牌。为了赋予这些表示预测智能,我们进一步设计多尺度对齐目标以捕捉细粒度的行为依赖和宏观时间周期性。在各种基准上的大量实验表明,FOUNDv2在实现存储和计算成本大幅降低的同时,始终优于任务特定基线。最后,FOUNDv2在支付宝上的大规模部署验证了其在多种工业场景中的实际可扩展性和效率。主要代码可在以下网址获取:https://github.com/chuanhe1999/FOUNDv2。

英文摘要

User representation learning serves as a fundamental pillar for personalized services on large-scale web platforms. Despite its importance, conventional continuous embedding methods face significant challenges, including the lack of a unified paradigm for multi-source data integration, prohibitive storage overhead due to low information density, and the lack of multi-scale modeling granularity. To overcome these limitations, we introduce FOUNDv2, a comprehensive user representation scheme centered on the Unified User Quantized Tokenizer (U2QT) framework. FOUNDv2 transforms heterogeneous user data into a standardized discrete token space through a robust two-stage architecture. Specifically, the framework first extracts compact feature representations and subsequently employs a multi-view RQ-VAE to discretize them into storage-efficient tokens using shared and source-specific codebooks. To empower these representations with predictive intelligence, we further design multi-scale alignment objectives to capture both fine-grained behavioral dependencies and macro-temporal periodicity. Extensive experiments on various benchmarks demonstrate that FOUNDv2 consistently outperforms task-specific baselines while achieving substantial reductions in storage and computational costs. Finally, the large-scale deployment of FOUNDv2 on Alipay validates its practical scalability and efficiency across diverse industrial scenarios. The main code is available at: https://github.com/chuanhe1999/FOUNDv2.
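
FOUNDv2 discretises features with a multi-view RQ-VAE. The residual-quantisation step that gives RQ-VAE its name can be sketched as below: each codebook level encodes what earlier levels failed to capture. This is the generic technique with hand-written codebooks. FOUNDv2's learned shared and source-specific codebooks, its encoder and its training losses are not modelled.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], v)))


def residual_quantize(v, codebooks):
    """Return one token per level and the reconstruction (sum of chosen codewords)."""
    residual = list(v)
    recon = [0.0] * len(v)
    tokens = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        tokens.append(k)
        recon = [r + c for r, c in zip(recon, codebook[k])]
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return tokens, recon


codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],  # level 1: coarse
    [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]],  # level 2: refines the level-1 residual
]
tokens, recon = residual_quantize([1.5, 1.0], codebooks)
```

The token list is the compact, storable representation: here two small integers stand in for a two-dimensional float vector, and adding more levels trades storage for reconstruction accuracy.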

2511.05963 2026-06-16 cs.LG 版本更新

Next-Latent Prediction Transformers Learn Compact World Models

下一潜在预测变换器学习紧凑世界模型

Jayden Teoh, Manan Tomar, Kwangjun Ahn, Edward S. Hu, Tim Pearce, Pratyusha Sharma, Akshay Krishnamurthy, Riashat Islam, Alex Lamb, John Langford

发表机构 * Massachusetts Institute of Technology(麻省理工学院) Microsoft Research(微软研究院)

AI总结 提出NextLat方法,通过潜在空间中的自监督预测训练变换器学习紧凑世界模型,提升泛化能力和推理效率。

Comments Microsoft Research Preprint

详情
AI中文摘要

变换器用随序列长度增长的记忆和能够对过去标记进行即席查找的自注意力取代了循环。因此,它们缺乏将历史压缩成具有一致转换规则的紧凑潜在状态的内在动机。这常常导致学习到的解决方案泛化能力差。我们引入了下一潜在预测(NextLat),该方法在潜在空间中通过自监督预测扩展了标准的下一标记训练。具体来说,NextLat训练变换器学习潜在表示,这些表示能够根据下一个标记预测其下一个潜在状态。理论上,我们证明了这些潜在状态可证明地收敛于信念状态,即预测未来所需的历史的压缩信息。这个简单的辅助目标为变换器注入了循环归纳偏置,同时保持其架构、并行训练效率和推理不变。NextLat有效地鼓励变换器形成具有连贯信念状态和转换动力学的紧凑内部世界模型——这些关键属性是标准下一标记预测单独无法保证的。在经验上,在世界建模、推理、规划和语言建模等多个基准测试中,NextLat在下游准确性、表示压缩和前瞻规划方面表现出比标准下一标记预测和其他基线显著的提升。此外,NextLat实现了可变长度的自推测解码,在语言建模中将推理速度提升高达3.3倍。NextLat提供了一种简单而有效的范式,用于学习变换器中泛化能力更强的紧凑预测表示。我们的代码可在https://github.com/microsoft/NextLat获取。

英文摘要

Transformers replace recurrence with a memory that grows with sequence length and self-attention that enables ad-hoc lookups over past tokens. Consequently, they lack an inherent incentive to compress history into compact latent states with consistent transition rules. This often leads to learning solutions that generalize poorly. We introduce Next-Latent Prediction (NextLat), which extends standard next-token training with self-supervised predictions in the latent space. Specifically, NextLat trains a transformer to learn latent representations that are predictive of its next latent state given the next token. Theoretically, we show that these latents provably converge towards belief states, compressed information about the history necessary to predict the future. This simple auxiliary objective injects a recurrent inductive bias into transformers while leaving their architecture, parallel training efficiency, and inference unchanged. NextLat effectively encourages transformers to form compact internal world models with coherent belief states and transition dynamics -- crucial properties not guaranteed by standard next-token prediction alone. Empirically, across benchmarks in world modeling, reasoning, planning, and language modeling, NextLat demonstrates significant gains over standard next-token prediction and other baselines in downstream accuracy, representation compression, and lookahead planning. Furthermore, NextLat enables variable-length self-speculative decoding, accelerating inference by up to 3.3x in language modeling. NextLat offers a simple yet effective paradigm for learning compact, predictive representations in transformers that generalize better. Our code is available at https://github.com/JaydenTeoh/NextLat.
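
One way to read the auxiliary objective is as follows. A transition function takes the latent $h_t$ and the embedding of the next token, and it must predict $h_{t+1}$. The sketch assumes a mean-squared-error loss. The transition network, any stop-gradient on the target and the weighting against the next-token loss are left unspecified.

```python
def nextlat_aux_loss(latents, token_embs, transition):
    """Average over t of MSE(transition(h_t, e_{t+1}), h_{t+1}).

    latents[t] is the hidden state after token t and token_embs[t] the embedding
    of token t. In training, h_{t+1} would usually be a detached target (assumption)."""
    steps = len(latents) - 1
    total = 0.0
    for t in range(steps):
        pred = transition(latents[t], token_embs[t + 1])
        target = latents[t + 1]
        total += sum((p - y) ** 2 for p, y in zip(pred, target)) / len(pred)
    return total / steps


def additive_transition(h, e):
    """Toy transition: the next latent is the current latent plus the token embedding."""
    return [a + b for a, b in zip(h, e)]


embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
consistent = [[1.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # h_{t+1} = h_t + e_{t+1}
inconsistent = [[1.0, 0.0], [0.0, 0.0], [2.0, 2.0]]
```

Latents that follow a consistent transition rule incur zero auxiliary loss. This is the recurrent-style inductive bias the abstract describes, added without changing the transformer itself.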

2602.24012 2026-06-16 cs.LG eess.SP 版本更新

InfoNCE Induces Gaussian Distribution

InfoNCE 诱导高斯分布

Roy Betser, Eyal Gofer, Meir Yossef Levi, Guy Gilboa

发表机构 * Technion - Israel Institute of Technology(以色列理工学院)

AI总结 本文证明对比学习中的 InfoNCE 损失函数会使表示学习产生高斯分布结构,通过理论分析和实验验证了这种高斯行为。

Comments Accepted to ICLR 2026, Oral

详情
AI中文摘要

对比学习已成为现代表示学习的基石,允许使用大量无标签数据训练任务特定和通用(基础)模型。对比训练中的典型损失是 InfoNCE 及其变体。在这项工作中,我们展示了 InfoNCE 目标在对比训练中诱导出表示的高斯结构。我们在两个互补的机制中建立了这一结果。首先,我们表明在某些对齐和集中假设下,高维表示的投影渐近地接近多元高斯分布。其次,在较不严格的假设下,我们表明添加一个促进低特征范数和高特征熵的小渐近消失正则化项会导致类似的渐近结果。我们通过在合成和 CIFAR-10 数据集上使用多种编码器架构和大小的实验支持我们的分析,展示了一致的高斯行为。这一视角为对比表示中常见的高斯性提供了原则性解释。由此产生的高斯模型使得对学习表示进行原则性分析处理成为可能,并有望支持对比学习中的广泛应用。

英文摘要

Contrastive learning has become a cornerstone of modern representation learning, allowing training with massive unlabeled data for both task-specific and general (foundation) models. A prototypical loss in contrastive training is InfoNCE and its variants. In this work, we show that the InfoNCE objective induces Gaussian structure in representations that emerge from contrastive training. We establish this result in two complementary regimes. First, we show that under certain alignment and concentration assumptions, projections of the high-dimensional representation asymptotically approach a multivariate Gaussian distribution. Next, under less strict assumptions, we show that adding a small asymptotically vanishing regularization term that promotes low feature norm and high feature entropy leads to similar asymptotic results. We support our analysis with experiments on synthetic and CIFAR-10 datasets across multiple encoder architectures and sizes, demonstrating consistent Gaussian behavior. This perspective provides a principled explanation for commonly observed Gaussianity in contrastive representations. The resulting Gaussian model enables principled analytical treatment of learned representations and is expected to support a wide range of applications in contrastive learning.
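
For reference, a common batch form of the InfoNCE objective is given below. Here $z_i$ and $z_i^{+}$ are the embeddings of two views of sample $i$, $\mathrm{sim}$ is cosine similarity and $\tau$ is a temperature. The paper analyses InfoNCE and its variants, so its exact normalisation may differ from this form.

```latex
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\frac{1}{N}\sum_{i=1}^{N}
    \log \frac{\exp\!\left(\mathrm{sim}(z_i, z_i^{+})/\tau\right)}
              {\sum_{j=1}^{N} \exp\!\left(\mathrm{sim}(z_i, z_j^{+})/\tau\right)}
```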

2602.09764 2026-06-16 cs.CV cs.IR cs.LG 版本更新

Self-Supervised Learning as Discrete Communication

自监督学习作为离散通信

Kawtar Zaher, Ilyass Moummad, Olivier Buisson, Alexis Joly

AI总结 将视觉自监督学习视为教师与学生网络间的离散通信过程,通过固定容量二进制信道传输语义信息,使用逐元素二元交叉熵目标强制离散一致性,并引入编码率正则化促进结构化表示,在图像分类、检索和密集预测任务上优于连续对齐基线。

详情
AI中文摘要

大多数自监督学习(SSL)方法通过对齐同一输入的不同视图来学习连续视觉表示,对信息如何在表示维度间进行结构化提供的控制有限。在这项工作中,我们将视觉自监督学习视为教师网络与学生网络之间的离散通信过程,其中语义信息通过固定容量的二进制信道传输。学生网络不是对齐连续特征,而是预测教师网络产生的多标签二进制消息。通过逐元素二元交叉熵目标强制离散一致性,同时编码率正则化项鼓励有效利用受限信道,促进结构化表示。我们进一步表明,周期性地重新初始化投影头通过鼓励嵌入在多个离散编码中保持可预测性来增强这种效果。大量实验表明,在图像分类、检索和密集视觉预测任务中,以及通过自监督适应在领域转移下,该方法持续优于连续对齐基线。除了骨干表示,我们分析了学习到的二进制编码,并表明它们形成了一种紧凑且信息丰富的离散语言,捕获了可跨类别复用的语义因子。

英文摘要

Most self-supervised learning (SSL) methods learn continuous visual representations by aligning different views of the same input, offering limited control over how information is structured across representation dimensions. In this work, we frame visual self-supervised learning as a discrete communication process between a teacher and a student network, where semantic information is transmitted through a fixed-capacity binary channel. Rather than aligning continuous features, the student predicts multi-label binary messages produced by the teacher. Discrete agreement is enforced through an element-wise binary cross-entropy objective, while a coding-rate regularization term encourages effective utilization of the constrained channel, promoting structured representations. We further show that periodically reinitializing the projection head strengthens this effect by encouraging embeddings that remain predictive across multiple discrete encodings. Extensive experiments demonstrate consistent improvements over continuous agreement baselines on image classification, retrieval, and dense visual prediction tasks, as well as under domain shift through self-supervised adaptation. Beyond backbone representations, we analyze the learned binary codes and show that they form a compact and informative discrete language, capturing semantic factors reusable across classes.
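
The discrete-agreement term can be sketched as element-wise binary cross-entropy between the student's logits and a binary message derived from the teacher. Here the message is obtained by thresholding the teacher's logits at zero. That is an assumption, since the abstract does not say how messages are formed. The coding-rate regulariser and the projection-head resets are omitted.

```python
import math


def binary_message(teacher_logits):
    """Teacher's message: one bit per channel, obtained by thresholding at logit 0."""
    return [1.0 if z > 0 else 0.0 for z in teacher_logits]


def bce_with_logits(student_logits, bits):
    """Mean element-wise BCE, using max(x, 0) - x*y + log(1 + exp(-|x|)) for stability."""
    total = sum(max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
                for x, y in zip(student_logits, bits))
    return total / len(bits)


message = binary_message([2.0, -1.0, 0.3])  # -> [1.0, 0.0, 1.0]
agree = bce_with_logits([5.0, -5.0, 5.0], message)
disagree = bce_with_logits([-5.0, 5.0, -5.0], message)
```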

2604.25853 2026-06-16 cs.CL cs.AI cs.LG 版本更新

G-Loss: Graph-Guided Fine-Tuning of Language Models

G-Loss:图引导的语言模型微调

Aditya Sharma, Vinti Agarwal, Rajesh Kumar

发表机构 * BITS Pilani(BITS 派拉尼) Bucknell University(巴克内尔大学)

AI总结 提出G-Loss损失函数,通过构建文档相似度图并利用半监督标签传播捕捉全局语义结构,引导语言模型学习更具判别性和鲁棒性的嵌入,在多个分类任务上提升准确率并加速收敛。

Comments 20 pages, Learning on Graphs (LoG2025)

详情
AI中文摘要

用于微调预训练语言模型(如BERT)的传统损失函数,包括交叉熵、对比损失、三元组损失和监督对比损失,仅在局部邻域内操作,未能考虑全局语义结构。我们提出了G-Loss,一种图引导的损失函数,它结合半监督标签传播来利用嵌入流形中的结构关系。G-Loss构建了一个文档相似度图,捕捉全局语义关系,从而引导模型学习更具判别性和鲁棒性的嵌入。我们在五个涵盖关键下游分类任务的基准数据集上评估了G-Loss:MR(情感分析)、R8和R52(主题分类)、Ohsumed(医学文档分类)和20NG(新闻分类)。在大多数实验设置中,G-Loss收敛更快,并产生语义一致的嵌入空间,从而比使用传统损失函数微调的模型获得更高的分类准确率。

英文摘要

Traditional loss functions, including cross-entropy, contrastive, triplet, and supervised contrastive losses, used for fine-tuning pre-trained language models such as BERT, operate only within local neighborhoods and fail to account for the global semantic structure. We present G-Loss, a graph-guided loss function that incorporates semi-supervised label propagation to use structural relationships within the embedding manifold. G-Loss builds a document-similarity graph that captures global semantic relationships, thereby guiding the model to learn more discriminative and robust embeddings. We evaluate G-Loss on five benchmark datasets covering key downstream classification tasks: MR (sentiment analysis), R8 and R52 (topic categorization), Ohsumed (medical document classification), and 20NG (news categorization). In the majority of experimental setups, G-Loss converges faster and produces semantically coherent embedding spaces, resulting in higher classification accuracy than models fine-tuned with traditional loss functions.
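
The label-propagation step that G-Loss builds on can be illustrated with the classic iterative scheme $F \leftarrow \alpha S F + (1-\alpha) Y$ on a similarity graph. Here $S$ is the symmetrically normalised affinity matrix and $Y$ holds one-hot labels for labelled documents. The sketch assumes cosine similarity with negative edges clipped to zero and $\alpha = 0.9$. It does not show how G-Loss turns the propagated labels into a training loss.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def propagate_labels(embeddings, labels, n_classes, alpha=0.9, iters=50):
    """labels[i] is a class id for labelled documents and None for unlabelled ones."""
    n = len(embeddings)
    # Affinity graph: clipped cosine similarity, no self-loops.
    W = [[max(0.0, cosine(embeddings[i], embeddings[j])) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    deg = [sum(row) or 1.0 for row in W]
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][m] * F[m][c] for m in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [row.index(max(row)) for row in F]


docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
predicted = propagate_labels(docs, [0, None, 1, None], n_classes=2)
```

Unlabelled documents inherit the label of the labelled documents they sit closest to in the graph. This is the global, beyond-the-batch signal that purely local losses lack.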

3. 强化学习与序列决策 62 篇

2606.14801 2026-06-16 cs.LG cs.AI cs.RO 新提交

QPILOTS: Efficient Test-Time Q-Steering for Flow Policies

QPILOTS:面向流策略的高效测试时Q引导

Yifan Ruan, Chenyang Cao, Andreas Burger, Ali Pesaranghader, Kaveh Kamali, Jaehong Kim, Nandita Vijaykumar, Alan Aspuru-Guzik, Igor Gilitschenski, Nicholas Rhinehart

发表机构 * University of Toronto(多伦多大学) Vector Institute(向量研究所) LG Electronics(LG电子)

AI总结 提出QPILOTS方法,在推理时通过投影去噪中间状态到最终动作估计并计算评论家梯度来引导流匹配和扩散策略,无需修改原策略,在离线到在线RL基准上达到90%平均成功率。

Comments 10 pages, 7 figures

详情
AI中文摘要

流匹配和扩散策略是表达力强的动作生成器,但使用时序差分强化学习(RL)优化它们仍然困难。有效的策略提取需要利用评论家的动作梯度,但通过多步去噪过程直接反向传播该信号可能数值不稳定。现有方法要么丢弃梯度信息,将策略蒸馏为更简单的单步动作器,要么随着评论家改进而重复微调去噪策略。我们提出QPILOTS,一种保持原策略不变并在推理时引导去噪过程的方法。在每个去噪步骤中,我们不是评估评论家对噪声中间动作(其中评论家预测不可靠),而是首先将该中间状态投影到最终干净动作的估计,并在那里计算评论家梯度。我们引入两种变体:QPILOTS-U使用快速单点近似,而QPILOTS-M通过学习的辅助网络绘制可微后验样本。在标准的离线到在线RL基准测试中,QPILOTS实现了最佳整体性能,在50个任务中达到平均90%的成功率。我们还应用QPILOTS引导一个大型、冻结的预训练视觉-语言动作(VLA)基础模型,在模拟的六个操作任务中优于或匹配先前的推理时方法。

英文摘要

Flow-matching and diffusion policies are expressive action generators, but optimizing them with temporal-difference reinforcement learning (RL) remains difficult. Effective policy extraction requires exploiting the critic's action gradient, yet directly backpropagating this signal through a multi-step denoising process can be numerically unstable. Existing methods work around this either by discarding gradient information, distilling the policy into a simpler one-step actor, or repeatedly fine-tuning the denoising policy as the critic improves. We propose QPILOTS, a method that leaves the original policy unmodified and steers the denoising process at inference time. At each denoising step, instead of evaluating the critic on the noisy intermediate action where critic predictions are unreliable, we first project that intermediate state to an estimate of the final clean action and compute the critic gradient there. We introduce two variants: QPILOTS-U uses a fast single-point approximation, while QPILOTS-M draws differentiable posterior samples via a learned auxiliary network. On a standard offline-to-online RL benchmark, QPILOTS achieves the best aggregate performance, reaching an average success rate of 90% across 50 tasks. We also apply QPILOTS to steer a large, frozen, pretrained Vision-Language Action (VLA) foundation model, outperforming or matching prior inference-time approaches across six manipulation tasks in simulation.
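
The projection idea can be sketched for a flow policy integrated with Euler steps. None of the following assumptions come from the paper. The first is the interpolation convention $x_t = (1-t)\,x_0 + t\,a$, with noise $x_0$ at $t=0$ and action $a$ at $t=1$; under it, $\hat a = x_t + (1-t)\,v(x_t, t)$ is a one-point estimate of the clean action, in the spirit of QPILOTS-U. The second is an additive guidance term with a hand-set scale. The third is an analytic critic gradient in place of a learned critic.

```python
def steered_flow_sample(x0, velocity, critic_grad, n_steps=10, guidance=0.1):
    """Euler sampling of a frozen flow policy with critic guidance (illustrative)."""
    x = list(x0)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = step * dt
        v = velocity(x, t)
        # Project the noisy intermediate onto an estimate of the final clean action.
        a_hat = [xi + (1.0 - t) * vi for xi, vi in zip(x, v)]
        # Evaluate the critic's action gradient at the clean estimate, not at x.
        g = critic_grad(a_hat)
        x = [xi + dt * (vi + guidance * gi) for xi, vi, gi in zip(x, v, g)]
    return x


A_STAR = [0.0, 0.0]  # action the frozen toy policy always produces
TARGET = [1.0, 1.0]  # toy critic: Q(a) = -||a - TARGET||^2


def toy_velocity(x, t):
    """Exact straight-line velocity towards A_STAR (valid for t < 1)."""
    return [(a - xi) / (1.0 - t) for a, xi in zip(A_STAR, x)]


def toy_critic_grad(a):
    return [-2.0 * (ai - ti) for ai, ti in zip(a, TARGET)]


plain = steered_flow_sample([0.5, -0.5], toy_velocity, toy_critic_grad, guidance=0.0)
steered = steered_flow_sample([0.5, -0.5], toy_velocity, toy_critic_grad, guidance=0.1)
```

With guidance switched off, the sampler reproduces the frozen policy's action. With guidance on, the sample shifts towards the critic's preferred region, while the policy network itself is never updated.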

2606.14929 2026-06-16 cs.LG cs.AI stat.ML 新提交

Policy Regret for Embedding Model Routing: Contextual Bandits with Low-Rank Experts

嵌入模型路由的策略遗憾:具有低秩专家的上下文赌博机

Yan Dai, Negin Golrezaei, Patrick Jaillet

发表机构 * Operations Research Center, MIT(麻省理工学院运筹学研究中心) Sloan School of Management, MIT(麻省理工学院斯隆管理学院) Department of EECS, MIT(麻省理工学院电气工程与计算机科学系)

AI总结 针对推荐系统中嵌入模型路由问题,形式化为具有低秩专家的对抗性上下文线性赌博机,提出Hypentropy策略梯度算法,实现$\tilde{\mathcal O}(s\sqrt{M T})$线性化策略遗憾。

详情
AI中文摘要

现代推荐系统越来越依赖于将多样化的查询动态路由到多个嵌入模型。尽管具有实际意义,但在对抗性查询、赌博机反馈和模型有限可观测性等现实条件下,该问题仍未得到充分理解。我们将嵌入模型路由形式化为具有低秩专家的对抗性上下文线性赌博机,其中上下文是查询,动作是物品,专家是在低秩潜在表示空间上工作的嵌入模型。我们首先证明,标准遗憾概念存在结构错误指定或统计难解性,并确定了一个对数二次策略类,它足够表达以捕获查询相关的模型路由,同时又足够结构化以允许高效的在线学习。其次,我们提出了一种称为Hypentropy策略梯度(HPG)的策略梯度算法。它在不完全信息下可证明地适应未知的低秩结构,并达到$\tilde{\mathcal O}(s\sqrt{M T})$线性化策略遗憾——其中$s$、$M$和$T$分别是专家的内在秩、模型数量和轮数——从而避免了维度灾难。最后,我们还提供了HPG的计算高效且无需参数调整的实现。

英文摘要

Modern recommendation systems increasingly rely on dynamically routing diverse queries to multiple embedding models. Despite its practical significance, this problem remains poorly understood under realistic conditions like adversarial queries, bandit feedback, and limited observability of models. We formalize embedding model routing as an adversarial contextual linear bandit with low-rank experts, where contexts are queries, actions are items, and experts are the embedding models working on low-rank latent representation spaces. We first establish that standard regret notions suffer from structural misspecification or statistical intractability, and we identify a log-quadratic policy class that is expressive enough to capture query-dependent model routing, yet structured enough to allow efficient online learning. Second, we propose a policy gradient algorithm called Hypentropy Policy Gradient (HPG). It provably adapts to the unknown low-rank structure under incomplete information and attains $\tilde{\mathcal O}(s\sqrt{M T})$ linearized policy regret -- where $s$, $M$, and $T$ are the intrinsic rank of the experts, the number of models, and the number of rounds, respectively -- thus avoiding a curse of dimensionality. Finally, we also provide a computationally efficient and parameter-free implementation of HPG.
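
HPG is named after the hypentropy regulariser $\phi_\beta(x) = \sum_i \big(x_i \operatorname{arcsinh}(x_i/\beta) - \sqrt{x_i^2 + \beta^2}\big)$, whose gradient is $\operatorname{arcsinh}(x_i/\beta)$. A generic unconstrained mirror-descent step with this regulariser is sketched below. It is not HPG itself, which also involves the log-quadratic policy class, bandit gradient estimates and the low-rank structure described above.

```python
import math


def hypentropy_step(x, grad, eta, beta):
    """One unconstrained mirror-descent step with the hypentropy mirror map.

    Since grad phi_beta(x)_i = asinh(x_i / beta), the dual-space update
    asinh(x'_i / beta) = asinh(x_i / beta) - eta * g_i inverts to the line below."""
    return [beta * math.sinh(math.asinh(xi / beta) - eta * gi) for xi, gi in zip(x, grad)]


x = [0.5, -0.2]
unchanged = hypentropy_step(x, [0.0, 0.0], eta=0.1, beta=1.0)
moved = hypentropy_step(x, [1.0, -1.0], eta=0.1, beta=1.0)
```

For large $\beta$ the step behaves like plain gradient descent. For small $\beta$ it behaves multiplicatively, like exponentiated gradient. That interpolation is the usual motivation for hypentropy.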

2606.15146 2026-06-16 cs.LG 新提交

Contextual Bandits for Maximizing Stimulated Word-of-Mouth Rewards

最大化激励口碑奖励的上下文赌博机

Ahmed Sayeed Faruk, Elena Zheleva

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出上下文多臂赌博机框架,通过学习个体溢出概率并排序连接用户,以最大化激励口碑奖励,实验证明考虑溢出异质性可提升目标定位精度。

Comments Presented at the AAAI 2025 Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL)

详情
AI中文摘要

激励口碑是一种通过提示或激励促进信息分享的策略。通过社交网络优化激励口碑需要识别并定位最易受溢出效应影响的连接用户,即推荐的影响力超出直接受众并波及与其相连的用户。溢出概率因个体及其连接而异,导致异质性。理解并准确估计社交网络中用户间的溢出概率对于提高激励口碑的效果至关重要。为此,我们提出了一种新颖的上下文多臂赌博机框架,该框架学习个体溢出概率并对连接用户进行排序,以最大化激励口碑的奖励。在真实网络数据集上的实验表明,考虑溢出异质性可提升对 top-$k$ 连接用户的目标定位精度,从而增加奖励,并优于不学习个体溢出效应的基线方法。

英文摘要

Stimulated word-of-mouth is a strategy that promotes information sharing through prompts or incentives. Optimizing stimulated word-of-mouth through social networks requires identifying and targeting connected users who are most susceptible to spillover, a phenomenon where the influence of recommendations extends beyond the immediate audience to impact their connected users. The probability of spillover varies across individuals and their connections, leading to heterogeneity. Understanding and accurately estimating the spillover probabilities among users in social networks is crucial for improving the effectiveness of stimulated word-of-mouth. To address this, we present a novel contextual multi-armed bandit framework that learns individual spillover probabilities and ranks connected users to maximize rewards from stimulated word-of-mouth. Experiments on real-world network datasets demonstrate that accounting for spillover heterogeneity enhances the targeting precision of top-$k$ connected users, boosting rewards and outperforming baseline methods that do not learn individual spillover effects.
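
A minimal contextual-bandit loop for this problem might work as follows. It models each connected user's spillover probability as a logistic function of context features, updates that model online from observed spillover, and recommends the top-$k$ users, with occasional random exploration. Everything in the sketch is an illustrative assumption, not the authors' framework: the logistic model, the learning rate, the epsilon-greedy exploration and the feature vectors.

```python
import math
import random


class SpilloverBandit:
    """Logistic estimate of p(spillover | context), updated by online SGD."""

    def __init__(self, dim, lr=0.5, epsilon=0.1, seed=0):
        self.w = [0.0] * dim
        self.lr = lr
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def prob(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, spilled):
        """Log-loss gradient step; spilled is 1 if the recommendation spread, else 0."""
        err = spilled - self.prob(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]

    def top_k(self, candidates, k, explore=True):
        """Rank connected users by estimated spillover; explore at random with prob. epsilon."""
        users = list(candidates)
        if explore and self.rng.random() < self.epsilon:
            return self.rng.sample(users, k)
        return sorted(users, key=lambda u: self.prob(candidates[u]), reverse=True)[:k]


bandit = SpilloverBandit(dim=2)
for _ in range(200):
    bandit.update([1.0, 0.0], 1)  # users with feature 0 tend to pass recommendations on
    bandit.update([0.0, 1.0], 0)  # users with feature 1 tend not to
friends = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [0.5, 0.5]}
ranked = bandit.top_k(friends, k=2, explore=False)
```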

2606.15197 2026-06-16 cs.LG cs.AI 新提交

StarOR: Synergizing Tree Search and Test-Time Reinforcement Learning for Optimization Modeling

StarOR: 协同树搜索与测试时强化学习用于优化建模

Jiajun Li, Yu Ding, Shisi Guan, Ran Hou, Wanyuan Wang

发表机构 * School of Computer Science and Engineering, Southeast University(东南大学计算机科学与工程学院) Northwest A&F University(西北农林科技大学)

AI总结 提出StarOR框架,结合蒙特卡洛树搜索与测试时强化学习,通过四阶段分解和GRPO更新LoRA适配器,实现无监督细粒度奖励的中间决策优化,在5个基准上以4B模型达到最优性能。

Comments 41 pages, V1, preprint

详情
AI中文摘要

优化建模本质上是层次化的,需要精确的符号承诺序列。传统的基于学习的自动化优化建模方法通过大规模标注或策划的训练数据改进建模策略,但适应新问题分布成本高昂。同时,一次性生成在层次化建模中仍然脆弱,早期符号错误可能传播为无效公式。测试时缩放通过额外的实例级计算实现结构探索,提供了一种有前景的替代方案;然而,现有的基于搜索的方法通常依赖固定策略,导致重复展开继承相似的建模偏差,并为中间决策提供有限的信用分配。为了解决这些限制,我们提出了StarOR,一种协同搜索与适应的框架,将MCTS与测试时强化学习相结合用于优化建模。StarOR将建模过程分解为四个阶段,并通过GRPO在每个非终端节点更新瞬态LoRA适配器。通过使用MCTS生成的兄弟节点作为局部比较集,StarOR将搜索时的探索转化为实例特定的策略细化。此外,无监督的多方面奖励系统为中间公式决策提供细粒度反馈,无需真实标签。在五个优化基准上的实验表明,即使使用4B骨干网络,StarOR也实现了最先进的性能,优于现有方法和前沿LLMs。

英文摘要

Optimization modeling is inherently hierarchical, requiring a precise sequence of symbolic commitments. Traditional learning-based automated optimization modeling methods improve modeling policies through large-scale annotated or curated training data, but are costly to adapt to new problem distributions. Meanwhile, one-shot generation remains brittle in hierarchical modeling, where early symbolic errors can propagate into invalid formulations. Test-time scaling offers a promising alternative by enabling structural exploration with additional instance-level computation; however, existing search-based methods typically rely on a fixed policy, causing repeated rollouts to inherit similar modeling biases and providing limited credit assignment for intermediate decisions. To address these limitations, we propose StarOR, a synergistic search-and-adaptation framework that couples MCTS with Test-Time Reinforcement Learning for optimization modeling. StarOR decomposes the modeling process into four stages and updates a transient LoRA adapter via GRPO at each non-terminal node. By using MCTS-generated siblings as local comparison sets, StarOR transforms search-time exploration into instance-specific policy refinement. Moreover, an unsupervised multi-faceted reward system provides fine-grained feedback for intermediate formulation decisions without ground-truth labels. Experiments across five optimization benchmarks show that StarOR achieves state-of-the-art performance even with a 4B backbone, outperforming existing methods and frontier LLMs.
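
StarOR uses MCTS siblings as a comparison set, which maps naturally onto the group-relative advantage used by GRPO: each sibling's reward is standardised against the group's mean and standard deviation. The sketch shows only that normalisation. It uses the population standard deviation, one of several conventions found in GRPO implementations. The LoRA update, the clipped policy-gradient objective and StarOR's reward system are not reproduced.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standardise rewards within one sibling group: (r_i - mean) / (std + eps)."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


# Rewards for three sibling formulations expanded from the same MCTS node.
adv = group_relative_advantages([1.0, 0.0, 0.5])
```

Siblings that beat their group get positive advantages and the rest get negative ones, so no learned value function is needed to assign credit at that node.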

2606.15247 2026-06-16 cs.LG cs.AI 新提交

Exploring Starts Are Not Enough: Counterexamples and a Fix for Monte Carlo Exploring Starts

探索性初始状态并不足够:蒙特卡洛探索性初始状态的反例与修正

Octave Oliviers, Glenn Vinnicombe

发表机构 * Department of Engineering, University of Cambridge(剑桥大学工程系)

AI总结 本文通过构造反例证明,在表格设置下,蒙特卡洛探索性初始状态(MCES)算法可能收敛到次优解,并提出基于状态级学习率缩放的修正方法以恢复最优性收敛。

详情
AI中文摘要

蒙特卡洛探索性初始状态(MCES)的渐近行为是强化学习中一个长期存在的开放问题,即使在表格设置中也是如此。我们通过构造算法收敛到次优解的例子,研究了表格MCES的收敛性质。本文为初始访问和首次访问MCES提供了新的反例,并给出了初始访问情况下的收敛恢复修正。我们表明,即使贪婪动作平均更新频率高于非贪婪动作,初始访问MCES在样本平均更新下也可能存在稳定的次优解。然而,通过按状态将学习率与更新频率成反比缩放,可以保证收敛到最优性。与之前的均匀化方法不同,此修正适用于需要近似估计值函数的大规模问题。然后,我们扩展该例子以表明样本平均首次访问MCES也可能收敛到次优解。这基本上解决了一个基本的开放问题,并表明仅靠探索性初始状态并不能保证收敛到最优性。更广泛地说,这些结果突显了收敛性关键取决于应用于不同动作的更新的相对大小和频率,使得学习率的选择以及探索与利用的平衡成为MCES分析和可扩展蒙特卡洛控制方法实现的核心。

英文摘要

The asymptotic behaviour of Monte Carlo Exploring Starts (MCES) is a long-standing open question in reinforcement learning, even in the tabular setting. We investigated the convergence properties of tabular MCES by constructing examples in which the algorithm converges to suboptimal solutions. This paper presents new counterexamples for both initial-visit and first-visit MCES and gives a convergence-restoring modification for the initial-visit case. We show that stable suboptimal solutions may exist for initial-visit MCES with sample-average updates even when greedy actions are updated more often than non-greedy actions on average. However, by scaling learning rates inversely to update frequencies on a state-by-state basis, convergence to optimality is guaranteed. Unlike previous uniformisation methods, this modification is applicable to large-scale problems that require approximating the estimated value function. We then extend the example to show that sample-average first-visit MCES may also converge to suboptimal solutions. This largely settles a fundamental open problem and shows that exploring starts alone do not guarantee convergence to optimality. More broadly, these results highlight that convergence depends critically on the relative size and frequency of updates applied to different actions, making the choice of learning rates and the balance between exploration and exploitation central to the analysis of MCES and the implementation of scalable Monte Carlo control methods.
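
For reference, the textbook form of first-visit MCES with sample-average updates, the setting of the paper's second counterexample, is sketched below for one episode. Returns are averaged into $Q(s,a)$ at each pair's first occurrence, and the policy is then made greedy in the visited states. The paper's state-wise learning-rate rescaling is not reproduced, and the list-of-tuples episode format is an assumption.

```python
from collections import defaultdict


def mces_first_visit_update(Q, N, policy, actions, episode, gamma=1.0):
    """One episode of first-visit Monte Carlo ES with sample averages.

    episode: list of (state, action, reward), where the first pair was chosen
    by an exploring start; actions[state] lists the actions available there."""
    # Discounted return following each time step, computed backwards.
    G = 0.0
    returns = [0.0] * len(episode)
    for t in range(len(episode) - 1, -1, -1):
        G = gamma * G + episode[t][2]
        returns[t] = G
    seen = set()
    for (s, a, _), G in zip(episode, returns):
        if (s, a) in seen:
            continue  # first-visit: later occurrences of (s, a) are ignored
        seen.add((s, a))
        N[(s, a)] += 1
        Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]  # running sample average
        # Greedy policy improvement at s.
        policy[s] = max(actions[s], key=lambda b: Q[(s, b)])


Q, N, policy = defaultdict(float), defaultdict(int), {}
actions = {"s0": ["left", "right"], "s1": ["stay"]}
mces_first_visit_update(Q, N, policy, actions, [("s0", "right", 0.0), ("s1", "stay", 1.0)])
```

The 1/N step size here applies equally to every action. The counterexamples show that this uniform averaging, combined with how often each action is updated, can lock the greedy policy onto a suboptimal choice.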

2606.15260 2026-06-16 cs.LG cs.AI 新提交

Trust-Region Diffusion Policies for Massively Parallel On-Policy RL

大规模并行在线强化学习的信任区域扩散策略

Huy Le, Onur Celik, Denis Blessing, Tai Hoang, Claas A Voelcker, Axel Brunnbauer, Felix Richter, Michael Volpp, Gerhard Neumann

发表机构 * University of Freiburg(弗赖堡大学) Max Planck Institute for Intelligent Systems(智能系统马克斯·普朗克研究所)

AI总结 提出TruDi方法,通过信任区域优化约束扩散轨迹的KL散度,实现大规模并行在线强化学习中的稳定训练,在73个任务中优于或持平基线。

详情
AI中文摘要

利用大规模并行模拟的强化学习已成为开发鲁棒、可部署策略的标准框架;然而,大多数现有方法仍依赖简单的高斯策略参数化。扩散模型提供了更具表达力的策略类,并在具有挑战性的控制问题上表现出色,但大多数基于扩散的强化学习方法是为离线或离策略训练设计的。在这项工作中,我们探究扩散策略能否在大规模并行、在线策略机制下有效训练。为此,我们引入了信任区域扩散策略(TruDi),它使得扩散策略能够用于大规模并行模拟的在线强化学习。这种设置特别具有挑战性,因为数据分布在每次更新中快速变化,使得复杂策略的稳定训练变得困难。TruDi通过整合信任区域优化规则来约束整个扩散轨迹上的KL散度,从而解决了这一问题。实验上,我们在包含73个任务的4个不同的大规模并行强化学习基准上评估了TruDi。在这些任务中,TruDi在标准任务上始终优于或与强基线持平,在更具挑战性的人形控制任务上取得了明显收益,为大规模并行在线强化学习建立了新的强基线。

英文摘要

Reinforcement learning with massively parallel simulations has become a standard framework for developing robust, deployable policies; however, most existing approaches still rely on simple Gaussian policy parameterizations. Diffusion models provide a more expressive policy class and have shown strong performance on challenging control problems, yet most diffusion-based RL methods are designed for offline or off-policy training. In this work, we ask whether diffusion policies can be trained effectively in the massively parallel, on-policy regime. To this end, we introduce Trust-region Diffusion Policies (TruDi), which enables diffusion policies for on-policy RL with massively parallel simulations. This setting is particularly challenging because the data distribution changes quickly across updates, making stable training with complex policies difficult. TruDi addresses this by integrating a trust-region optimization rule to enforce a KL-divergence constraint over the entire diffusion trajectory. Empirically, we evaluate TruDi on a diverse set of 4 massively parallel RL benchmarks comprising a total of 73 tasks. Across these tasks, TruDi consistently outperforms or is on-par with strong baselines on standard tasks and achieves clear gains on more challenging humanoid control tasks, establishing a strong new baseline for massively parallel on-policy RL.
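
A KL constraint over an entire diffusion trajectory becomes tractable when each denoising step is Gaussian. Assume the old and new policies share the initial noise distribution and use the same isotropic per-step standard deviation $\sigma_t$; this is an assumption, since the abstract does not specify the parameterisation. The chain rule of KL divergence then gives a sum of per-step terms $\lVert \mu^{\text{old}}_t - \mu^{\text{new}}_t \rVert^2 / (2\sigma_t^2)$. The sketch evaluates this sum along one trajectory sampled from the old policy. It does not model how TruDi enforces the bound (for example by projection, penalty or line search).

```python
def gaussian_step_kl(mu_old, mu_new, sigma):
    """KL(N(mu_old, sigma^2 I) || N(mu_new, sigma^2 I)) = ||mu_old - mu_new||^2 / (2 sigma^2)."""
    return sum((a - b) ** 2 for a, b in zip(mu_old, mu_new)) / (2.0 * sigma ** 2)


def trajectory_kl(old_means, new_means, sigmas):
    """Sum of per-step KLs along one denoising trajectory sampled from the old policy."""
    return sum(gaussian_step_kl(p, q, s) for p, q, s in zip(old_means, new_means, sigmas))


def within_trust_region(old_means, new_means, sigmas, epsilon):
    """Accept a policy update only if the trajectory-level KL stays below epsilon."""
    return trajectory_kl(old_means, new_means, sigmas) <= epsilon


# Two denoising steps on a 1-D action; the means come from the old and new networks.
kl = trajectory_kl([[0.0], [0.0]], [[1.0], [2.0]], sigmas=[1.0, 2.0])
```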

2606.15301 2026-06-16 cs.LG cs.AI 新提交

Discovering Lattice Reduction Strategies via Self-Play

通过自我对弈发现格基约简策略

Mohamed Malhou, Kristin Lauter, Ludovic Perret

发表机构 * FAIR, Meta Superintelligence Labs(Meta超级智能实验室FAIR) Sorbonne Université CNRS, LIP6(索邦大学CNRS/LIP6) EPITA, EPITA Research Lab (LRE)(EPITA研究实验室(LRE))

AI总结 利用深度强化学习和AlphaZero风格自我对弈,在LLL原始动作空间中学习更优的格基约简策略,训练于8维格但可零样本泛化至32维。

详情
AI中文摘要

Lenstra-Lenstra-Lovász (LLL) 算法是计算机科学中用于格基约简的开创性贡献,但其多项式时间输出的基随着维数增长远非最优。我们证明,深度强化学习可以通过与LLL的原始动作空间交互,发现严格更优、可泛化的约简策略。我们将格基约简形式化为单人马尔可夫决策过程 (MDP),并使用AlphaZero风格的自我对弈流水线训练深度残差网络,该流水线结合了自适应视界MCTS(蒙特卡洛树搜索),将多步网络预测与熵门控扩展机制耦合。由此产生的策略DeltaStar仅在小的8维q-ary格上训练,且需要的原始行操作少于LLL。关键的是,它无需重新训练即可零样本泛化到未见过的模数和高达n=32的更高维度。

英文摘要

The Lenstra-Lenstra-Lovász (LLL) algorithm is a seminal contribution to computer science used for lattice basis reduction, yet its polynomial-time outputs produce bases that are far from optimal as the dimension grows. We show that deep reinforcement learning can discover strictly superior, generalizable reduction strategies by interacting with the primitive action space of LLL. We formulate lattice reduction as a single-player Markov Decision Process (MDP) and train a deep residual network using an AlphaZero-style self-play pipeline augmented with adaptive-horizon MCTS (Monte Carlo Tree Search), which couples multi-step network predictions with an entropy-gated expansion mechanism. The resulting policy, DeltaStar, is trained exclusively on small $8$-dimensional $q$-ary lattices and requires fewer primitive row operations than LLL. Crucially, it generalizes zero-shot to unseen moduli and higher dimensions up to $n=32$ without retraining.
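
The primitive actions of LLL are the two row operations the classical algorithm uses. One is size reduction, $b_k \leftarrow b_k - q\,b_j$. The other is swapping adjacent vectors when the Lovász condition fails. Below is a textbook LLL with $\delta = 3/4$ in exact rational arithmetic; it counts those operations, the quantity DeltaStar is reported to reduce. For clarity rather than speed, it recomputes Gram-Schmidt after every operation, and it assumes a linearly independent input basis. It is not DeltaStar's environment code.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return (b_star, mu) with mu[i][j] = <b_i, b*_j> / <b*_j, b*_j> for j < i."""
    n = len(basis)
    b_star, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in basis[i]]
        for j in range(i):
            mu[i][j] = dot(basis[i], b_star[j]) / dot(b_star[j], b_star[j])
            v = [a - mu[i][j] * c for a, c in zip(v, b_star[j])]
        b_star.append(v)
    return b_star, mu


def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL; returns the reduced basis and the number of primitive operations."""
    b = [list(row) for row in basis]
    n, ops, k = len(b), 0, 1
    b_star, mu = gram_schmidt(b)
    while k < n:
        # Size reduction: b_k <- b_k - q * b_j so that |mu[k][j]| <= 1/2.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                ops += 1
                b_star, mu = gram_schmidt(b)
        # Lovasz condition: advance if it holds, otherwise swap b_k and b_{k-1}.
        lhs = dot(b_star[k], b_star[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(b_star[k - 1], b_star[k - 1])
        if lhs >= rhs:
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            ops += 1
            b_star, mu = gram_schmidt(b)
            k = max(k - 1, 1)
    return b, ops


reduced, ops = lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])
```

The order in which LLL applies these two operations is fixed. A learned policy such as DeltaStar instead chooses among them freely, which is what leaves room to reach a better basis with fewer operations.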

2606.15917 2026-06-16 cs.LG New submission

Reinforcement Learning for LLM-based Event Forecasting

Amit Arnold Levy

Affiliations: Advanced Computer Science; DeepSeek R1

AI summary: Fine-tunes LLMs with GRPO, using a Wikipedia revisions tool to retrieve current information for forecasting future events; a 1.5B-parameter model outperforms Claude Sonnet 3.5.

Comments Submitted internally at the University of Oxford in Oct 2025, migrated to arXiv in June 2026

Abstract

We use Group Relative Policy Optimization (GRPO), a recently devised sample- and memory-efficient reinforcement learning method, to fine-tune pretrained LLMs ranging from 1.5B to 14B parameters. The models can retrieve current information through a Wikipedia revisions tool or news summaries, and are trained to forecast real events beyond their knowledge cutoff, as well as on problems constructed to simulate different aspects of the dynamics of that training. We use the results of these experiments to comment on the scaling capability of LLMs for forecasting and to classify how judgmental forecasting fits into the verifiable/unverifiable domain taxonomy, considering the impact of the inherent aleatoric uncertainty when forecasting future events (e.g., the roll of a die). As a result of the GRPO training, we bring a 1.5B-parameter transformer (Qwen 2.5 1.5B) to forecasting performance superior to Claude Sonnet 3.5 on the same dataset, as measured by cross-entropy against the market-agreed probabilities. We also discuss various dead ends on the path to this result.
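GRPO's core step, as commonly described, replaces a learned value baseline with a group-relative one: sample several completions for the same prompt, score them, and standardize each reward by the group's mean and standard deviation. The sketch shows only that step. The reward function is a hypothetical choice for illustration (negative binary cross-entropy against the market-agreed probability, mirroring the evaluation metric named above); the abstract does not specify the training reward.

```python
import math
from statistics import mean, pstdev


def forecast_reward(p_model, p_market, eps=1e-6):
    """Hypothetical reward: negative binary cross-entropy vs. the market probability."""
    p = min(max(p_model, eps), 1 - eps)  # clip to avoid log(0)
    return p_market * math.log(p) + (1 - p_market) * math.log(1 - p)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled forecasts for one question whose market probability is 0.7.
rewards = [forecast_reward(p, 0.7) for p in (0.7, 0.5, 0.9, 0.2)]
adv = group_relative_advantages(rewards)
```

The forecast that matches the market probability gets the largest advantage, and a group in which every completion scores the same produces all-zero advantages, i.e. no learning signal.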

2606.15978 2026-06-16 cs.LG New submission

Scalar-Stepsize Nonuniform Monte Carlo Optimistic Policy Iteration: A Certified Counterexample

Yuanlong Chen

Affiliations: Yuanlong Chen

AI summary: For Monte Carlo optimistic policy iteration with nonuniform update frequencies, a three-state MDP counterexample shows that scalar-stepsize nonuniform asynchronous value iteration can fail to converge, and reveals switched attracting cycles caused by anisotropic distortion.

Abstract

Tsitsiklis proved convergence of Monte Carlo optimistic policy iteration under a uniform update structure and identified nonuniform update frequencies as a delicate obstruction. We give a certified negative answer for the natural scalar-stepsize, unnormalized asynchronous state-value recursion with fixed nonuniform state-selection probabilities. In a three-state, two-action discounted MDP, the nonuniform update frequencies induce a diagonally scaled greedy-policy mean field with a certified nonconstant attracting hybrid periodic orbit. With a bounded unbiased geometric-horizon estimator and Robbins-Monro stepsizes, the original stochastic recursion remains trapped near the cycle with positive probability and therefore fails to converge. The example pinpoints a geometric obstruction: uniform sampling gives radial residual contraction, whereas scalar nonuniform sampling anisotropically distorts the residual dynamics and can generate switched attracting cycles.

2606.16154 2026-06-16 cs.LG New submission

A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization

Prasanth YSS, Zhichen Ren, Rasa Hosseinzadeh, Ilan Gofman, Yuqi Chen, Zhaoyan Liu, Guangwei Yu, Jesse C. Cresswell, Satya Krishna Gorti

Affiliations: Berkeley; Layer 6 AI

AI summary: Analyzes GRPO instability through token-level gradient dynamics and proposes WAPO, which updates only on positive-advantage completions, improving training stability on math reasoning and multi-hop QA while matching or outperforming baselines.

Abstract

Reinforcement learning with verifiable rewards (RLVR) improves language-model reasoning, but GRPO-style optimization remains prone to collapse. We analyse this instability through token-level gradient dynamics, deriving a taxonomy that predicts how updates affect next-token probabilities and entropy. The taxonomy shows that stability depends jointly on the advantage sign and token distribution under the current policy. Motivated by this finding, we propose Winner Advantage Policy Optimization (WAPO), a simple online clipped policy-gradient objective that updates only on positive-advantage completions. Across mathematical reasoning and multi-hop QA benchmarks, WAPO improves training stability and matches or outperforms baselines across multiple model families. Full code can be found at https://github.com/layer6ai-labs/wapo.
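A minimal sketch of the objective described above: a PPO-style clipped surrogate in which completions with non-positive advantage are simply dropped. The clip range of 0.2, per-token ratios that share one sequence-level advantage, and equal-length completions without a padding mask are assumptions for illustration; the authors' actual objective is in the linked repository.

```python
import numpy as np


def wapo_style_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate restricted to positive-advantage ("winner") completions.

    logp_new, logp_old: per-token log-probs, shape (num_completions, num_tokens)
    advantages: one scalar advantage per completion, shape (num_completions,)
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    adv = np.asarray(advantages, dtype=float)
    keep = adv > 0  # drop completions with zero or negative advantage
    if not keep.any():
        return 0.0
    ratio = np.exp(logp_new[keep] - logp_old[keep])  # per-token importance ratios
    a = adv[keep][:, None]  # broadcast each completion's advantage over its tokens
    unclipped = ratio * a
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * a
    # Maximize the pessimistic surrogate, i.e. minimize its negative mean over kept tokens.
    return float(-np.minimum(unclipped, clipped).mean())
```

Because negative-advantage completions are masked out, changing their log-probabilities has no effect on the loss, which is the property the abstract ties to improved stability.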

2606.16236 2026-06-16 cs.LG cs.NE New submission

Evolutionary Bilevel Reward Shaping for Generalization in Reinforcement Learning

Ekasit Usaratniwart, Xilin Gao, Marc Ong, Youhei Akimoto

Affiliations: University of Tsukuba; RIKEN Center for Advanced Intelligence Project

AI summary: Proposes GERS, a bilevel method that tunes the reward function using scalar validation feedback, improving RL generalization to unseen environments under restricted trajectory access.

Comments Accepted at PPSN 2026

Abstract

Reinforcement learning (RL) often suffers from performance degradation when deployed in environments that differ from those encountered during training. Existing techniques such as domain randomization (DR) mitigate this, but require access to diverse training environments and full trajectory observability, assumptions that fail in privacy-preserving or restricted scenarios where only scalar performance metrics are available. We propose Generalization via Evolutionary Reward Shaping (GERS), a bilevel optimization approach to improve generalization on unseen test environments using only scalar feedback from validation environments. At the lower level, an RL agent guided via a reward function shaped by the upper level learns a policy on a limited set of training environments with accessible trajectory data; at the upper level, CMA-ES optimizes the reward shaping parameters to maximize the cumulative unshaped reward on separate validation environments for which trajectory access is unavailable. Results on continuous control tasks indicate that GERS outperforms the standard RL baseline on unseen test environments. GERS performance is comparable to DR, despite DR treating the combined set of training and validation environments of GERS as a single training set that requires trajectory access, whereas GERS cannot access validation trajectories. These results confirm that GERS effectively enhances generalization under restricted data access constraints.

2606.16286 2026-06-16 cs.LG cs.AI cs.RO New submission

FlowMPC: Improving Flow Matching policies with World Models

Chandon Hamel

Affiliations: Stanford University

AI summary: Proposes FlowMPC, which combines a flow-matching imitation policy with a learned world model and uses MPPI planning to improve test-time performance, markedly raising success rates on ManiSkill manipulation tasks.

Abstract

Flow Matching (FM) is a powerful approach for behavior cloning in multimodal action spaces [Jiang et al., 2025], but because it is not trained to directly maximize expected return, there is still room to improve how FM policies act at test time. This work investigates whether a learned world model can improve FM policies by enabling Model Predictive Path Integral (MPPI) planning over candidate action sequences proposed by the policy. Building on TD-MPC2 [Hansen et al., 2024], I introduce FlowMPC, a framework that combines an imitation-learned FM policy with a learned world model for test-time planning in ManiSkill manipulation tasks [Tao et al., 2025]. Across PickCube and PickSingleYCB, adding the world model improved performance over the FM policy alone, with especially clear gains in end-of-episode success. These results suggest that world-model-based planning can effectively complement flow-based imitation policies without modifying the FM training objective.
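MPPI, in its standard form, scores sampled action sequences by a cost and returns their softmax-weighted average, with weights proportional to exp(-cost / λ). The sketch assumes the candidates come from the FM policy and the costs from world-model rollouts; `toy_world_model_cost` is a hypothetical stand-in for a learned model (a 1-D point mass that should end at position 1), not TD-MPC2's.

```python
import numpy as np


def mppi_combine(candidates, costs, temperature=1.0):
    """Return the MPPI-weighted average of candidate action sequences.

    candidates: array (num_samples, horizon, action_dim), e.g. proposed by a policy
    costs: array (num_samples,), e.g. negative predicted return from a world model
    """
    costs = np.asarray(costs, dtype=float)
    # Subtract the minimum cost before exponentiating for numerical stability.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # Contract the sample axis: result has shape (horizon, action_dim).
    return np.tensordot(weights, np.asarray(candidates, dtype=float), axes=1)


def toy_world_model_cost(seq, target=1.0):
    """Hypothetical model: x_{t+1} = x_t + a_t from x_0 = 0, cost = (x_H - target)^2."""
    return float((np.sum(seq) - target) ** 2)


rng = np.random.default_rng(0)
cands = 0.2 + 0.1 * rng.standard_normal((64, 5, 1))  # stand-in for FM policy proposals
costs = [toy_world_model_cost(c) for c in cands]
plan = mppi_combine(cands, costs, temperature=0.05)
```

Because the weights decrease with cost and the toy cost is convex in the action sequence, the combined plan's cost is at most the mean candidate cost, which makes a convenient sanity check.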

2606.16331 2026-06-16 cs.LG New submission

Diffusion Offline Reinforcement Learning for Fair and Energy-Efficient UAV-Assisted Wireless Networks

Eslam Eldeeb, Hirley Alves

Affiliations: Centre for Wireless Communications (CWC), University of Oulu

AI summary: Proposes Diffusion-SAC, which combines conservative Q-learning with diffusion models to optimize UAV trajectory and scheduling via offline RL, reducing energy consumption and improving fairness compared with existing algorithms.

Abstract

The integration of generative artificial intelligence with wireless communication and signal processing systems has opened new avenues for intelligent, data-driven decision-making in future 6G networks. This work proposes a diffusion soft actor-critic (Diffusion-SAC) approach that leverages offline reinforcement learning (RL) enhanced by denoising diffusion probabilistic models (DDPMs) to optimize trajectory and scheduling control in unmanned aerial vehicle (UAV) networks. While offline RL methods, such as conservative Q-learning (CQL), can learn from static datasets, they often struggle to generalize in low-data or dynamic conditions. To address this, we combine the robustness of CQL with the generative power of diffusion models, enabling expressive and signal-aware policy learning that generalizes beyond behavior policies. Applied to a UAV-assisted wireless network, the proposed framework minimizes transmission energy and improves fairness among devices. Simulations show that Diffusion-SAC outperforms standard offline RL baselines, achieving more stable convergence and higher rewards even with limited datasets. The method enhances data efficiency, reduces energy consumption, and increases throughput by more than 35% compared to existing algorithms, demonstrating its potential for robust policy learning in next-generation wireless control systems.
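The conservative part of the method can be illustrated with the CQL(H) regularizer for discrete actions, which penalizes the log-sum-exp of Q-values over all actions relative to the Q-value of the action actually taken in the dataset. This is a sketch of standard CQL only; treating the UAV scheduling actions as discrete, the weight `alpha`, and the omission of the diffusion actor are assumptions, not details from the abstract.

```python
import numpy as np


def cql_penalty(q_values, data_actions):
    """CQL(H)-style regularizer for discrete actions.

    q_values: array (batch, num_actions) of Q(s, .)
    data_actions: int array (batch,) of actions taken in the offline dataset
    Returns the batch mean of logsumexp_a Q(s, a) - Q(s, a_data).
    """
    q = np.asarray(q_values, dtype=float)
    m = q.max(axis=1, keepdims=True)  # stabilize the log-sum-exp
    lse = (m + np.log(np.exp(q - m).sum(axis=1, keepdims=True)))[:, 0]
    q_data = q[np.arange(len(q)), np.asarray(data_actions)]
    return float(np.mean(lse - q_data))


def cql_loss(q_values, data_actions, td_targets, alpha=1.0):
    """Squared TD error on dataset actions plus alpha times the CQL penalty."""
    q = np.asarray(q_values, dtype=float)
    q_data = q[np.arange(len(q)), np.asarray(data_actions)]
    td = np.mean((q_data - np.asarray(td_targets, dtype=float)) ** 2)
    return float(td + alpha * cql_penalty(q, data_actions))
```

The penalty is always non-negative and shrinks as Q-values on dataset actions rise relative to all other actions, which is what keeps the learned critic from overvaluing out-of-distribution actions.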

2606.16489 2026-06-16 cs.LG New submission

BRICKS-WM: Building Reusability via Interface Composition Kinetics for Structured World Models

Shaowei Zhang, Jiahan Cao, Xunlan Zhou, Shenghua Wan, De-Chuan Zhan

Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; School of Intelligence Science and Technology, Nanjing University, China

AI summary: Proposes BRICKS-WM, which decomposes global dynamics into independent modules (e.g., agent and background) that interact through a latent interface, so that a frozen background module can be reused across agents instead of retraining from scratch.

Abstract

Model-based Reinforcement Learning (MBRL) has achieved remarkable success in continuous control by leveraging latent world models. However, prevailing approaches typically rely on monolithic latent dynamics, entangling environment dynamics into a coupled process. This coupling severely limits reusability: altering the agent necessitates retraining the entire world model from scratch, even if the environment remains constant. To address this, we introduce BRICKS-WM (Building Reusability via Interface Composition Kinetics for Structured World Models), a framework for the modular assembly of structured world models. Driven by the insight that the physical world is composed of independent entities, we posit that global dynamics can be modeled as a composition of distinct dynamical modules interacting via latent interfaces. As a minimal instantiation, we factorize the latent state space into an actuated Agent module and an external Background module, bridged by a learned latent interface. Unlike prior object-centric methods that prioritize visual segmentation, BRICKS-WM enforces a functional separation in transition dynamics, ensuring that background dynamics remain agnostic to the agent's dynamics. Empirically, BRICKS-WM achieves control performance comparable to strong monolithic baselines when trained from scratch, and enables the reuse of frozen background dynamics across agents.

2606.16497 2026-06-16 cs.LG cs.AI cs.CL New submission

daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization

Dayuan Fu, Mohan Jiang, Tongyu Wang, Dian Yang, Jiarui Hu, Liming Liu, Jinlong Hou, Pengfei Li

Affiliations: University of Science and Technology of China

AI summary: Proposes daVinci-kernel, which uses RL to jointly train three agents (skill selection, policy generation, and skill summarization) on a shared LLM backbone for GPU kernel optimization, outperforming the previous best model on KernelBench.

Abstract

GPU kernel optimization represents a paradigm where functional correctness is assumed and execution efficiency is the objective. We present daVinci-kernel, a reinforcement learning framework that couples skill discovery with skill exploitation through a dynamically evolving skill library. daVinci-kernel jointly trains three agents sharing one LLM backbone: a Skill Selection Agent that retrieves relevant techniques via BM25 and LLM reranking, a Policy Agent that generates multi-turn CUDA/Triton kernels conditioned on selected skills, and a Skill Summary Agent that distills successful rollouts into reusable skills. Candidate skills are added only after execution-based verification confirms reproducible speedups. All three agents share a single LLM backbone, are initialized via a structured SFT cold start on diversity-filtered data, and are then jointly optimized end-to-end with multi-turn REINFORCE and per-agent advantage estimation. On KernelBench, daVinci-kernel-14B achieves 37.2%, 70.6%, and 32.2% on Level 1, Level 2, and Level 3 under the Fast$_1$ threshold, outperforming the strongest prior RL-trained model, Dr.Kernel-14B.
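The first retrieval stage of the Skill Selection Agent uses BM25. A minimal Okapi BM25 scorer (k1 = 1.5, b = 0.75, whitespace tokenization, a Lucene-style non-negative IDF) over a hypothetical skill library is sketched below; the LLM reranking stage and the paper's actual skill format are not shown.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))  # document frequency
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(s)
    return scores


def top_k_skills(query, skills, k=2):
    """Return the k skills with the highest BM25 score for the query."""
    scores = bm25_scores(query, skills)
    ranked = sorted(range(len(skills)), key=lambda i: scores[i], reverse=True)
    return [skills[i] for i in ranked[:k]]


# Hypothetical skill library entries, for illustration only.
skills = [
    "use shared memory tiling for matmul",
    "fuse elementwise ops into one kernel",
    "vectorize global memory loads with float4",
    "use warp shuffle for reductions",
]
```

A lexical retriever like this is cheap enough to run over the whole library on every task, leaving the more expensive LLM reranker to order only the short list it returns.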

2606.16515 2026-06-16 cs.LG cs.AI cs.RO New submission

Direction-Conditioned Policies via Compositional Subgoal Scoring for Online Goal-Conditioned Reinforcement Learning

Swaminathan S K, Damiya Gondha, Theyanesh Eswaramoorthy Rajahkrishnan, Aritra Hazra

AI summary: Proposes Direction-Conditioned Policies (DCP), which use a shared InfoNCE representation to decompose goal-reaching into subgoal scoring and direction-conditioned action selection; proves direction sufficiency, train/deployment consistency, and a controllable-subspace failure condition; and outperforms Contrastive RL across nine environments.

Comments 17 pages, Accepted to the 2nd Workshop on Compositional Learning at ICML 2026 (Seoul, South Korea)

Abstract

Hamilton-Jacobi-Bellman theory implies that the optimal goal-conditioned action depends on the goal only through the gradient of the goal-reaching distance at the current state, yet standard online GCRL still conditions the actor on the raw goal, a signal that is geometrically uninformative when the goal is far from the data distribution. We propose Direction-Conditioned Policies (DCP), a fully online method that decomposes goal-reaching into two components sharing one InfoNCE representation $ψ$: a subgoal-scoring step that selects a visited state $z_t$ aligned with the final goal $g$ in $ψ$-space, and a direction-conditioned actor that consumes the unit direction $d_t$ and magnitude $r_t$ from $ψ(s_t)$ to $ψ(z_t)$. The two components train jointly, factor cleanly at deployment (subgoal scoring is removed, while direction conditioning remains with $g$ in place of $z_t$), and admit independent modification at the same $(d_t,r_t)$ interface. We prove three results. First, direction sufficiency under HJB: the optimal action under control-affine dynamics depends on the goal only through the value gradient. Second, a quantitative bound showing that, under mild conditions on the learned representation and assuming the scoring rule returns an on-path $z_t$, the actor's conditioning inputs at training and at deployment coincide up to representation error and geodesic slack. Third, a controllable-subspace characterization of when directional conditioning fails. Across nine environments, DCP improves over Contrastive RL on most final metrics, with the largest gains on manipulation and obstacle-interaction tasks; a qualitative analysis of the learned $ψ$-distance landscape shows that the contrastive representation behaves as an online quasimetric encoding environment topology, and the single failure case (AntSoccer) localizes to a learned-gradient pathology that the theory anticipates.
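The $(d_t, r_t)$ interface can be sketched directly from its definition: the unit vector and the distance from $ψ(s_t)$ to $ψ(z_t)$. The subgoal-scoring rule here is a hypothetical choice (cosine similarity between the step to each visited state and the step to the goal, all in $ψ$-space); the abstract does not specify the rule, and $ψ$ is assumed to be given as precomputed embeddings rather than a trained InfoNCE encoder.

```python
import numpy as np


def direction_and_magnitude(psi_s, psi_z, eps=1e-8):
    """Unit direction d_t and magnitude r_t from psi(s_t) to psi(z_t)."""
    delta = np.asarray(psi_z, dtype=float) - np.asarray(psi_s, dtype=float)
    r = float(np.linalg.norm(delta))
    return delta / (r + eps), r


def score_subgoals(psi_s, psi_candidates, psi_g, eps=1e-8):
    """Hypothetical scoring rule: cosine between the step to each candidate and the step to the goal."""
    psi_s = np.asarray(psi_s, dtype=float)
    to_goal = np.asarray(psi_g, dtype=float) - psi_s
    steps = np.asarray(psi_candidates, dtype=float) - psi_s
    num = steps @ to_goal
    den = np.linalg.norm(steps, axis=1) * np.linalg.norm(to_goal) + eps
    return num / den


def select_subgoal(psi_s, psi_candidates, psi_g):
    """Pick the best-aligned visited state and return its (d_t, r_t) conditioning input."""
    idx = int(np.argmax(score_subgoals(psi_s, psi_candidates, psi_g)))
    return idx, direction_and_magnitude(psi_s, psi_candidates[idx])
```

At deployment, per the abstract, the goal itself replaces the subgoal, so the actor would receive `direction_and_magnitude(psi_s, psi_g)` through the same interface.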

2606.16656 2026-06-16 cs.LG New submission

Near-Optimal Stochastic Linear Bandits with Delay

Ofir Schlisselberg, Mengxiao Zhang, Yishay Mansour

Affiliations: Tel Aviv University; University of Iowa; Tel Aviv University and Google Research

AI summary: Studies stochastic linear bandits under several delay models, gives near-optimal regret bounds, and shows how the interaction between delay and linear structure depends on the dimension.

Abstract

We study stochastic linear bandits with delayed feedback under several delay models and establish near-optimal regret guarantees. Our results identify when delayed linear bandits exhibit the same qualitative behavior as multi-armed bandits (MAB), and when the linear structure creates fundamentally new challenges. Specifically: (1) for loss-independent delays, where the delay does not depend on the realized loss (but may depend on the arm), we show that delays incur only an additive regret penalty. Under stochastic delays, this penalty scales with the expected delay, while under adversarial delays it scales with the maximum number of outstanding observations. Notably, both delay penalties are dimension-free, improving upon state-of-the-art results. (2) For loss-dependent delays, we show that linear bandits are substantially harder than MAB: we prove upper and lower bounds that match up to log factors and whose delay penalty, unlike in MAB, depends on the square root of the dimension. (3) For the delay-as-payoff model, a special case of loss-dependent delay, we show that the optimal MAB guarantee, which depends only on the delay of the optimal arm, is also unattainable in linear bandits. Together, these results provide a sharp characterization of how delayed feedback interacts with linear generalization.

2606.16729 2026-06-16 cs.LG math.OC New submission

Learning Policy from a Single Trajectory in Average-Reward Markov Decision Process

Jongmin Lee, Ernest K. Ryu, Vaneet Aggarwal

Affiliations: Seoul National University; UCLA; Purdue University

AI summary: Establishes the first finite-sample complexity guarantees from a single trajectory for weakly communicating average-reward MDPs, with model-free value-based and policy-based methods achieving sample complexities of $\widetilde{O}(1/\varepsilon^2)$ and $\widetilde{O}(1/\varepsilon^4)$, respectively.

Abstract

While there is an extensive body of work characterizing the sample complexity of discounted cumulative-reward MDPs, finite sample analyses for average-reward MDPs have been limited, and most existing works rely on restrictive assumptions such as ergodicity or access to a generative model. In this work, we establish the first finite sample complexity guarantees from a single trajectory for weakly communicating average-reward MDPs. To this end, we study the dynamics of a single trajectory in weakly communicating MDPs and based on this analysis, we develop novel model-free methods. Notably, our value-based and policy-based methods provide finite sample complexity guarantees of $\widetilde{O}(1/\varepsilon^2)$ and $\widetilde{O}(1/\varepsilon^4)$ from a single trajectory in weakly communicating MDPs, respectively. Furthermore, we introduce the first model-free method that requires no prior knowledge of problem-dependent quantities for communicating MDPs.

2606.16759 2026-06-16 cs.LG New submission

Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games with Average Reward

Şevket Kaan Alkır, Naci Saldı, Berkay Anahtarcı, Can Deha Karıksız

Affiliations: Bilkent University; Özyeğin University

AI summary: Proposes a maximum-causal-entropy inverse RL method for discrete-time, infinite-horizon mean-field games under the average-reward criterion, treating finite-dimensional linear rewards and infinite-dimensional RKHS rewards within a unified occupation-measure framework, with a gradient ascent algorithm for policy recovery.

Comments 49 pages, 2 figures, 2 tables

Abstract

We study inverse reinforcement learning for discrete-time, infinite-horizon mean-field games (MFGs) under an average-reward criterion. Expert demonstrations are assumed to arise from a stationary mean-field equilibrium under an unknown reward, and the goal is to recover a policy explaining the observed behaviour via the maximum causal entropy principle. We formulate the inverse problem by enforcing consistency with the expert mean-field term and long-run feature expectations, treating two reward classes within a unified occupation-measure framework. For finite-dimensional linear rewards, we give a convex dual reformulation with an explicit log-partition objective, and prove smoothness and curvature properties justifying constant-step-size gradient descent. For infinite-dimensional RKHS rewards, we develop a Lagrangian relaxation whose inner-maximising policy is characterised by a soft Bellman equation. The main obstacle is the absence of a discount-factor contraction. We resolve this by introducing a minorisation-based sub-stochastic kernel that yields a strict contraction of the soft Bellman operator. We establish Fréchet differentiability and Lipschitz smoothness of the log-likelihood score, leading to a gradient ascent algorithm with convergence guarantees. Two numerical examples, a malware-spread MFG and an RKHS-based consumer-choice model, show that the recovered policies closely match expert behaviour.

2606.16771 2026-06-16 cs.LG New submission

GD$^2$PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

Haotian Liu, Yihao Liu, Jingwei Ni, Siyuan Huang, Xinpeng Liu, Pengyu Cheng, Jiajun Song, Ruijin Ding, Junfeng Li, Zhechao Yu, Mengyu Zhou, Hongteng Xu, Xiaoxi Jiang, Guanjun Jiang

Affiliations: Qwen Large Model Application Team, Alibaba; Renmin University of China; Peking University; ETH Zürich; University of Zurich; The Chinese University of Hong Kong

AI summary: Proposes GD$^2$PO, which uses conflict-aware filtering to mask rollouts with inconsistent rewards and adds query-level reweighting, addressing signal cancellation in multi-reward optimization and improving RL training efficiency.

Comments 24 pages, 9 figures

Abstract

As LLMs advance, post-training reinforcement learning (RL) increasingly relies on multi-dimensional rewards to cultivate comprehensive capabilities. This shift demands new algorithms capable of optimizing diverse and potentially competing objectives simultaneously. To address this, existing methods such as Group reward-Decoupled Policy Optimization (GDPO) decompose the overall score into independent reward groups, then compute the RL loss separately within each group. However, this strategy still encounters multi-reward conflicts: a single rollout can yield positive advantages on certain reward dimensions but negative ones on others, causing opposing signals to cancel each other out during aggregation, further hindering RL training efficiency. Inspired by Dynamic sAmpling Policy Optimization (DAPO), which improves RL training efficiency by filtering out ineffective rollouts with near-zero advantages, we propose Group-Dynamic reward-Decoupled Policy Optimization (GD$^2$PO). Specifically, GD$^2$PO employs a conflict-aware filtering mechanism to mask out rollouts suffering from severe reward-wise disagreement. By preventing conflicting signals from canceling each other out, this masking strategy preserves and enhances the magnitude of effective RL advantages, thereby significantly accelerating learning efficiency. Furthermore, we introduce query-level reweighting to dynamically adjust the update intensity of each query based on its overall reward consensus. Experiments on various multi-reward scenarios, including tool calling and human preference alignment, demonstrate that GD$^2$PO consistently and significantly outperforms existing baselines. The code is available at https://github.com/Qwen-Applications/GD2PO.
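The conflict-aware filtering idea can be sketched on top of GDPO-style per-dimension normalization. The specific conflict measure below (the fraction of reward dimensions whose advantage sign is in the minority), the 0.34 threshold, and summing the per-dimension advantages are hypothetical stand-ins; the paper's exact rule and its query-level reweighting are not reproduced here.

```python
import numpy as np


def decoupled_advantages(rewards, eps=1e-8):
    """GDPO-style: normalize each reward dimension separately within one prompt's group.

    rewards: array (num_rollouts, num_reward_dims)
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean(axis=0)) / (r.std(axis=0) + eps)


def conflict_mask(adv, max_minority=0.34):
    """Hypothetical rule: keep a rollout only if few of its dimensions disagree in sign."""
    pos = (adv > 0).sum(axis=1)
    neg = (adv < 0).sum(axis=1)
    minority = np.minimum(pos, neg) / adv.shape[1]
    return minority <= max_minority


def filtered_advantages(rewards, max_minority=0.34):
    """Aggregate advantages per rollout, zeroing out rollouts flagged as conflicting."""
    adv = decoupled_advantages(rewards)
    keep = conflict_mask(adv, max_minority)
    return adv.sum(axis=1) * keep, keep


# Four rollouts scored on two reward dimensions (e.g. correctness and format).
adv_sum, keep = filtered_advantages([[1.0, 1.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

In this toy group the last two rollouts are good on one dimension and bad on the other, so their summed advantage is about zero anyway; masking them makes that explicit and leaves the update to the two rollouts with a consistent signal.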

2606.16846 2026-06-16 cs.LG cs.AI New submission

Deep Q-Learning on Hölder Spaces

Qian Qi

Affiliations: Peking University

AI summary: Studies the operator-theoretic core of Q-learning in continuous-time stochastic control by analyzing the regularity and approximation complexity of the Bellman optimality target in a diffusion setting, proposes a tensor-product DeepONet architecture adapted to the mixed regularity, and gives explicit approximation and resource bounds.

Abstract

We study the operator-theoretic core of Q-learning in continuous-time stochastic control with continuous states and actions. In value-based reinforcement learning, each Q-learning or DQN update is built from a Bellman optimality target; our analysis isolates this target in a diffusion setting and studies its regularity and approximation complexity. Under uniform ellipticity and Hölder-regular coefficients, we show that a Bellman update maps bounded inputs into an anisotropic regularity class, smoothing the state variable while leaving only Lipschitz dependence on the action variable. This yields a compact family of Bellman iterates and motivates a tensor-product DeepONet architecture adapted to the mixed regularity of the problem. We then derive explicit approximation and resource bounds, together with a stiffness--complexity trade-off as the time step $δ\to 0$. The resulting theory makes a direct contribution to Q-learning theory at the level of Bellman target regularity and approximation in continuous stochastic control. At the same time, we do not claim a full convergence theorem for practical sampled Q-learning with exploration, replay, and stochastic gradient updates.

2606.16933 2026-06-16 cs.LG cs.AI New submission

A Unified Causal-Origin Taxonomy of Distributional Shifts in Reinforcement Learning

Ardianto Wibowo, Paulo E Santos, Amer Baghdadi, Matthew Stephenson, Karl Sammut, Jean-Philippe Diguet

Affiliations: IMT Atlantique; Flinders University; IRL Crossing Priori Analytica CNRS

AI summary: Proposes a unified causal-origin taxonomy that classifies distributional shifts in RL by causal source (internal/external) and time boundary (explicit/implicit/hybrid), unifying the analysis of in-/out-of-distribution generalization and non-stationarity.

Comments The paper is currently under review at the Journal of Artificial Intelligence Research (JAIR)

Abstract

Reinforcement learning (RL) systems often degrade when operating conditions differ from those previously encountered, reflecting distributional shifts in the underlying data-generating process. Such shifts may occur between training and evaluation, as in In-Distribution (ID) and Out-of-Distribution (OOD) generalization, or within non-stationary settings where environment dynamics evolve over time. However, the formal relationship between these views remains unclear, and existing work mainly focuses on mitigation rather than on the causal origin of shift within the agent-environment interaction. This work develops a unified causal-origin taxonomy that characterizes sources of distributional shift in RL and relates ID/OOD generalization to non-stationary settings. We transfer the classical dataset-shift principle from supervised learning to RL by reformulating distributional shift in terms of the generative interaction process. Using a Partially Observable Markov Decision Process (POMDP), we decompose the interaction into structural components, including the state distribution, observation process, policy, reward, and transition dynamics, together with the shifted-time boundary. The proposed taxonomy distinguishes internal (agent-driven) from external (environment-driven) distributional shifts. The shifted-time boundary perspective further characterizes explicit, implicit, and hybrid shifts. This formulation unifies ID/OOD generalization and non-stationarity as structured changes in the underlying process. We also introduce an evaluation framework for measuring shift impact and adaptation through performance degradation and recovery metrics. By grounding distributional shift in the causal-origin structure of RL, this work supports systematic analysis of robustness under distributional shift.

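2606.17024 2026-06-16 cs.LG New submission

ExpRL: Exploratory RL for LLM Mid-Training

Violet Xiang, Amrith Setlur, Chase Blagden, Nick Haber, Aviral Kumar

Affiliations: Stanford University; Carnegie Mellon University; OpenAI; Rogo

AI summary: Proposes ExpRL, which uses human-written question-answer data as reward scaffolds and applies dense rewards to reinforce partial progress and useful behaviors during reasoning. On math reasoning tasks it outperforms SFT, sparse-reward GRPO, and self-distillation, and it provides a better initialization for subsequent sparse-reward RL.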

English abstract

Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on curated reasoning traces that teach useful primitive skills such as decomposition, verification, or self-correction. Although effective, this strategy requires manually specifying what the model should learn, and it remains unclear whether such primitive coverage is enough for much harder problems, which require combining these skills into broader solution strategies. We study a more automated approach: RL-based mid-training using large corpora of human-written question-answer data. Rather than treating reference solutions as targets to imitate, our method, ExpRL, uses them as reward scaffolds: references are hidden from the policy and used only to construct problem-specific grading rubrics for judging on-policy reasoning traces. The policy samples from the original problem prompt, while an LLM judge compares the sampled reasoning trace against the reference solution and assigns outcome-level or process-level dense rewards. This lets ExpRL reinforce partial progress, useful intermediate reductions, and productive reasoning behaviors that sparse final-answer rewards often fail to upweight. On challenging math reasoning tasks, ExpRL yields stronger RL priming than SFT, sparse-reward GRPO, and self-distillation, and provides a better initialization for subsequent sparse-reward RL. Additional mixed-domain experiments further suggest that ExpRL can extend beyond the original math-only setting.
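ExpRL's judge is an LLM that compares each sampled trace against the hidden reference. The sketch below illustrates why a rubric gives denser credit than a final-answer check. It makes three hypothetical assumptions:

- The rubric is a list of checkpoint strings.
- The judge is any boolean callable; a substring test stands in for the LLM.
- Process and outcome scores are mixed linearly with `process_weight`.

```python
from typing import Callable, Sequence

# Hypothetical judge signature: (reasoning_trace, rubric_item) -> satisfied?
Judge = Callable[[str, str], bool]


def rubric_reward(trace: str, rubric: Sequence[str], judge: Judge,
                  final_answer_correct: bool, process_weight: float = 0.5) -> float:
    """Mix a sparse outcome reward with a dense rubric-based process reward (illustrative)."""
    if not 0.0 <= process_weight <= 1.0:
        raise ValueError("process_weight must lie in [0, 1]")
    satisfied = sum(1 for item in rubric if judge(trace, item))
    process = satisfied / len(rubric) if rubric else 0.0
    outcome = 1.0 if final_answer_correct else 0.0
    return (1.0 - process_weight) * outcome + process_weight * process


def substring_judge(trace: str, item: str) -> bool:
    # Stand-in for the LLM judge: a checkpoint counts if it appears verbatim.
    return item in trace


rubric = ["factor the quadratic", "roots are 2 and 3", "sum of roots is 5"]
trace = "First factor the quadratic as (x-2)(x-3), so the roots are 2 and 3. Answer: 6"
# A wrong final answer still earns partial credit for the correct intermediate steps,
# whereas a sparse final-answer reward would give 0 here.
print(rubric_reward(trace, rubric, substring_judge, final_answer_correct=False))
```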

2606.14879 2026-06-16 cs.RO cs.CV cs.LG Cross-list

VANDERER: Map-Free Exploration using Future-Aware and Visual-Curiosity-Guided Diffusion Policy

Venkata Naren Devarakonda, Raktim Gautam Goswami, Prashanth Krishnamurthy, Farshad Khorrami

Affiliations: Control/Robotics Research Laboratory (CRRL), Department of Electrical and Computer Engineering, NYU Tandon School of Engineering; New York University Abu Dhabi (NYUAD) Center for Artificial Intelligence and Robotics (CAIR)

AI summary: Proposes VANDERER, which uses a visual curiosity module to guide a pre-trained diffusion policy and achieves efficient map-free exploration from monocular images alone. Across diverse simulated environments it explores 13.4% more area on average than NoMaD.

English abstract

Mobile agents require efficient exploration strategies to map unseen environments and autonomously plan tasks. Traditional methods rely on generating occupancy maps and optimizing the sequence in which unexplored regions are visited. However, in sensor-constrained settings, such as those limited to monocular cameras, generating accurate occupancy maps is challenging. To address this, we propose VANDERER, an exploration framework that leverages a Visual Curiosity Module (VCM) to guide pre-trained diffusion policies using only monocular image data. This curiosity module predicts the outcomes of proposed actions via a navigation world model and evaluates them through a curiosity cost. The cost then guides the diffusion process toward generating actions that maximize exploration. Evaluated across diverse simulated environments, VANDERER consistently outperforms established baselines, exploring an average of 13.4% more area than NoMaD. Our results reveal a direct correlation between visual and geometric curiosity in outdoor environments, demonstrating that VANDERER can effectively leverage this relationship for efficient exploration using sensor-constrained agents.
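The abstract says the curiosity cost "guides the diffusion process" but gives no update rule. A common way to steer a diffusion policy with a differentiable cost is to subtract a scaled cost gradient after each denoising step. The toy sketch below assumes exactly that and uses three stand-ins:

- a one-dimensional action;
- a hypothetical shrink-toward-zero denoiser in place of the pre-trained policy;
- a quadratic cost in place of the world-model-based curiosity cost.

Noise injection between steps is omitted for clarity.

```python
from typing import Callable, List

Vec = List[float]

# Toy sketch of cost-gradient guidance; all stand-ins are hypothetical.


def guided_denoise(x_T: Vec, denoise_step: Callable[[Vec, int], Vec],
                   cost_grad: Callable[[Vec], Vec],
                   steps: int = 20, guidance: float = 0.1) -> Vec:
    """Run `steps` reverse steps, nudging each iterate down the guidance-cost gradient."""
    x = list(x_T)
    for t in range(steps, 0, -1):
        x = denoise_step(x, t)                      # prior (pre-trained policy) update
        g = cost_grad(x)                            # gradient of the guidance cost
        x = [xi - guidance * gi for xi, gi in zip(x, g)]
    return x


TARGET = [1.0]  # stands in for an action that leads toward unexplored space


def toy_denoiser(x: Vec, t: int) -> Vec:
    # The "pre-trained policy" pulls actions toward 0 regardless of t.
    return [0.8 * xi for xi in x]


def toy_cost_grad(x: Vec) -> Vec:
    # Gradient of the cost 0.5 * ||x - TARGET||^2.
    return [xi - ti for xi, ti in zip(x, TARGET)]


unguided = guided_denoise([0.0], toy_denoiser, toy_cost_grad, guidance=0.0)
guided = guided_denoise([0.0], toy_denoiser, toy_cost_grad, guidance=0.5)
print(unguided, guided)  # guidance moves the final action toward TARGET
```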

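2606.14981 2026-06-16 cs.RO cs.AI cs.LG Cross-list

Inference-time Policy Steering via Vision and Touch

Yilin Wu, Zilin Si, Zeynep Temel, Oliver Kroemer, Andrea Bajcsy

Affiliations: Carnegie Mellon University

AI summary: Proposes ViTaL, which steers robot policies at inference time through a bi-level optimization that combines visual sampling-and-verification with tactile-guided diffusion editing, substantially improving success rates on contact-rich manipulation tasks.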

English abstract

Inference-time steering adapts pre-trained generative robot policies during deployment by verifying candidate actions before execution. While prior methods typically perform this verification only with visual observations, vision alone is often insufficient for contact-rich manipulation, where success depends on both global task progress and subtle local interactions such as contact force. We introduce ViTaL, a visuo-tactile inference-time steering framework that formulates multimodal guidance as a bi-level optimization problem. At the high level, visual sampling-and-verification performs long-horizon mode selection, deciding what behavior the robot should execute. At the low level, tactile-guided diffusion editing refines the selected action sequence over a shorter horizon to satisfy local contact requirements. To support outcome-based steering, ViTaL learns a visuo-tactile latent world model and employs semantically aligned visual and tactile verifiers, including a novel text-conditioned tactile reward that scores predicted tactile futures directly in latent space. Across three real-world contact-rich manipulation tasks, ViTaL improves overall success by 51% over the base policy, outperforms unimodal steering by at least 33%, and exceeds naive multimodal fusion by at least 20%. Website: https://yilin-wu98.github.io/vital_website.
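The high-level half of ViTaL's bi-level scheme is a sampling-and-verification loop. It draws candidate action sequences from the base policy, predicts their outcomes, and keeps the best-scoring one. A minimal best-of-N sketch, assuming three hypothetical callables:

- `sample_policy` for the base policy;
- `predict_outcome` standing in for the learned visuo-tactile world model;
- `score` standing in for a verifier.

The low-level tactile diffusion-editing stage is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple

Action = List[float]

# Best-of-N sketch; callables are hypothetical stand-ins for ViTaL's components.


def sample_and_verify(sample_policy: Callable[[random.Random], Action],
                      predict_outcome: Callable[[Action], Sequence[float]],
                      score: Callable[[Sequence[float]], float],
                      num_candidates: int = 8,
                      seed: int = 0) -> Tuple[Action, float]:
    """Return the candidate whose predicted outcome the verifier scores highest."""
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    rng = random.Random(seed)
    best_action: Action = []
    best_score = float("-inf")
    for _ in range(num_candidates):
        action = sample_policy(rng)          # candidate from the base policy
        s = score(predict_outcome(action))   # verifier on the predicted future
        if s > best_score:
            best_action, best_score = action, s
    return best_action, best_score


def toy_policy(rng: random.Random) -> Action:
    return [rng.uniform(-1.0, 1.0)]


# Toy verifier: prefer predicted outcomes near 0.5 (identity "world model").
action, value = sample_and_verify(toy_policy, lambda a: a,
                                  lambda o: -(o[0] - 0.5) ** 2,
                                  num_candidates=16, seed=1)
print(action, value)
```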

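2606.15099 2026-06-16 cs.CV cs.LG cs.RO Cross-list

Think Less, Act Early: Reinforced Latent Reasoning with Early Exit in Vision-Language-Action Models

Dianqiao Lei, Lianlei Shan

AI summary: Proposes AVA-VLA, which optimizes latent reasoning trajectories via RL-based denoising and an early-exit strategy, achieving a 6x inference speedup over explicit CoT methods and a 98.3% average success rate on LIBERO.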

Comments Accepted at ICML 2026

English abstract

Existing Vision-Language-Action (VLA) models predominantly rely on explicit Chain-of-Thought (CoT) reasoning to bridge perception and action. While effective, this paradigm suffers from high computational costs and error propagation in multi-step tasks. In this paper, we propose Adaptive Variable Alignment VLA (AVA-VLA), a novel Latent Reasoning VLA framework that models reasoning as a sequence of unobservable latent variables, bypassing the need for explicit text generation. However, latent trajectories are inherently susceptible to noise interference and misalignment with downstream objectives. To address this, we introduce a Reinforcement Learning-based Denoising mechanism that treats latent state generation as a sequential decision process, optimizing reasoning trajectories via task-level rewards. Furthermore, we incorporate an Early-Exit Strategy that adaptively terminates reasoning based on state confidence, enabling a dynamic trade-off between depth and efficiency. Extensive experiments on embodied decision benchmarks demonstrate that AVA-VLA achieves a 6x inference speedup over explicit CoT methods while attaining a 98.3% average success rate on LIBERO, improving both efficiency and long-horizon stability over full-reasoning baselines.
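The early-exit idea can be illustrated with a loop that refines a latent state until a confidence score clears a threshold or a step budget runs out. This is a minimal sketch: the confidence function, `threshold` and `max_steps` are hypothetical, and AVA-VLA's actual confidence measure and RL-based denoising are not reproduced.

```python
from typing import Callable, List, Tuple

Latent = List[float]

# Early-exit sketch; the confidence measure and thresholds are assumptions.


def reason_with_early_exit(h0: Latent,
                           step: Callable[[Latent], Latent],
                           confidence: Callable[[Latent], float],
                           threshold: float = 0.9,
                           max_steps: int = 8) -> Tuple[Latent, int]:
    """Refine a latent state until confidence >= threshold or the budget is spent.

    Returns the final latent state and the number of reasoning steps used.
    """
    h = h0
    for t in range(1, max_steps + 1):
        h = step(h)
        if confidence(h) >= threshold:
            return h, t          # early exit: confident enough to act
    return h, max_steps          # budget exhausted: act on the last state anyway


# Toy dynamics: each step halves the distance to a fixed point at 1.0.
def toy_step(h: Latent) -> Latent:
    return [x + 0.5 * (1.0 - x) for x in h]


def toy_confidence(h: Latent) -> float:
    return 1.0 - max(abs(1.0 - x) for x in h)


state, used = reason_with_early_exit([0.0], toy_step, toy_confidence, threshold=0.9)
print(state, used)
```

Raising `threshold` trades speed for depth: the loop simply runs longer before exiting, which mirrors the depth/efficiency trade-off the abstract describes.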

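2606.15160 2026-06-16 cs.CV cs.LG Cross-list

DLWM: Diverse Latent World Models for Efficient Multimodal Reasoning

David Huang, Lianlei Shan

Affiliations: University of Toronto; Tsinghua University

AI summary: Proposes DLWM, which combines latent-space reasoning with reinforcement learning and uses diverse latent hypotheses plus a resource-aware policy to make multimodal reasoning more efficient, improving accuracy by 2-5 points while reducing memory usage by 24%.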

Comments Preprint. 9 pages main text, 15 pages total including appendix, 2 figures

English abstract

Reasoning capabilities of multimodal large language models (MLLMs) have improved considerably in recent years. Existing approaches typically rely on explicit chain-of-thought or continuous latent-space trajectories to enhance multi-step reasoning. However, these methods generally assume that an input admits a single latent interpretation and unfold reasoning along a fixed path or under a uniform computation budget. In real-world multimodal settings, visual observations are often subject to occlusion, blur, viewpoint variation, or semantic ambiguity, giving rise to multiple plausible interpretations. A uniform reasoning strategy not only limits the model's ability to explore multiple hypotheses but also incurs high memory usage and rollout cost. We present DLWM (Diverse Latent World Models), a multimodal reasoning framework that combines latent-space reasoning with reinforcement learning. First, we construct a set of diverse latent world hypotheses in continuous latent space, each capturing a different plausible interpretation of the visual input, and unfold latent reasoning independently on each hypothesis. An orthogonality-based diversity regularizer explicitly prevents hypothesis collapse. Second, we formulate the latent reasoning process as a resource-constrained sequential decision problem and introduce a resource-aware reinforcement learning policy that adaptively allocates computation across hypotheses, dynamically deciding whether to expand, terminate, or merge reasoning paths, thereby substantially reducing memory footprint and improving rollout efficiency. Experiments on multiple multimodal reasoning benchmarks demonstrate that DLWM outperforms existing methods by 2-5 points in accuracy while reducing memory usage by 24%.
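The abstract does not give the formula for its orthogonality-based diversity regularizer. One natural form, assumed here, is the sum of squared cosine similarities between distinct hypothesis embeddings. Equivalently, it is the squared Frobenius norm of the off-diagonal part of the Gram matrix of the unit-normalized hypotheses. The function name below is hypothetical.

```python
import math
from typing import Sequence

# Assumed form of an orthogonality regularizer; not necessarily DLWM's exact loss.


def orthogonality_penalty(hypotheses: Sequence[Sequence[float]]) -> float:
    """Sum of squared cosine similarities over ordered pairs of distinct hypotheses.

    Equals ||H_hat H_hat^T - I||_F^2 when the rows of H_hat are the unit-normalized
    hypotheses, so it is zero exactly when all hypotheses are mutually orthogonal.
    """
    normed = []
    for h in hypotheses:
        norm = math.sqrt(sum(x * x for x in h))
        if norm == 0.0:
            raise ValueError("hypothesis vectors must be non-zero")
        normed.append([x / norm for x in h])
    penalty = 0.0
    for i, hi in enumerate(normed):
        for j, hj in enumerate(normed):
            if i != j:
                penalty += sum(a * b for a, b in zip(hi, hj)) ** 2
    return penalty


print(orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]]))   # orthogonal pair
print(orthogonality_penalty([[1.0, 0.0], [2.0, 0.0]]))   # collapsed pair
```

Normalizing first makes the penalty depend only on direction, so hypotheses cannot escape it by shrinking or growing in norm.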

2606.15333 2026-06-16 cs.CL cs.LG Cross-list

Replay What Matters: Off-Policy Replay for Efficient LLM Reinforcement Unlearning

Zirui Pang, Chenlong Zhang, Haosheng Tan, Zhuoran Jin, Jiaheng Wei, Zixin Zhong

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Institute of Automation, Chinese Academy of Sciences; University of Glasgow

AI summary: Addressing the under-use of hard cases by on-policy optimization in LLM unlearning, proposes ReRULE, which stores low-reward hard-case rollouts in an off-policy replay buffer and reuses them, improving unlearning efficiency while preserving general utility.

English abstract

LLM unlearning has emerged as a cost-effective alternative to full retraining for removing hazardous knowledge from pretrained models while preserving general utility. Recent RL-based methods such as RULE reformulate unlearning as learning a refusal behavior, but their on-policy optimization repeatedly samples from the same forget and retain/boundary prompts throughout training. We identify a critical inefficiency in this process: easy cases quickly converge and provide little useful gradient signal, while hard cases near the forget/retain boundary continue to produce low-reward rollouts that are discarded after a single use. To address this issue, we propose ReRULE, an off-policy replay enhancement for reinforcement unlearning. ReRULE stores low-reward hard-case rollout groups in a replay buffer during early GRPO training and reuses them in later stages through importance-sampled off-policy updates, redirecting computation toward boundary cases that still require learning. Theoretically, we show that ReRULE yields a tighter hard-case convergence bound than pure on-policy RULE. Empirically, ReRULE improves MUSE-Books Retain Quality from 46.3 to 56.2 while adding only 5-11% training time across benchmarks. Its limited improvement on the simpler TOFU setting further supports the intended conditional behavior: replay is most beneficial when the hard/easy disparity is pronounced.
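The replay mechanism can be pictured as a bounded buffer that keeps only low-reward rollout groups, plus an importance weight for reusing them under the current policy. A minimal sketch under assumptions:

- "Hard" is taken to mean a group mean reward below `reward_threshold`.
- Eviction is FIFO.
- The importance ratio is truncated at `max_weight`, a common choice assumed here.

`RolloutGroup`, `HardCaseReplay` and `importance_weight` are hypothetical names, not ReRULE's API.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List

# Sketch of hard-case replay; thresholds, eviction and truncation are assumptions.


@dataclass
class RolloutGroup:
    prompt: str
    rewards: List[float]
    behavior_logprobs: List[float]   # sequence log-probs under the sampling policy


@dataclass
class HardCaseReplay:
    """Keeps low-reward rollout groups from early training for later reuse."""
    reward_threshold: float = 0.3
    capacity: int = 1024
    groups: List[RolloutGroup] = field(default_factory=list)

    def maybe_store(self, group: RolloutGroup) -> bool:
        mean_reward = sum(group.rewards) / len(group.rewards)
        if mean_reward >= self.reward_threshold:
            return False                  # easy case: used once, then discarded
        self.groups.append(group)
        if len(self.groups) > self.capacity:
            self.groups.pop(0)            # evict the oldest stored group
        return True

    def sample(self, k: int, rng: random.Random) -> List[RolloutGroup]:
        return rng.sample(self.groups, min(k, len(self.groups)))


def importance_weight(current_logprob: float, behavior_logprob: float,
                      max_weight: float = 2.0) -> float:
    """Truncated ratio pi_current / pi_behavior for a replayed (off-policy) rollout."""
    return min(math.exp(current_logprob - behavior_logprob), max_weight)


buffer = HardCaseReplay(reward_threshold=0.3, capacity=2)
print(buffer.maybe_store(RolloutGroup("forget-q1", [0.0, 0.1, 0.0], [-12.0, -10.5, -11.2])))
print(buffer.maybe_store(RolloutGroup("retain-q7", [1.0, 0.9, 1.0], [-8.0, -7.5, -9.1])))
print(importance_weight(-10.0, -10.5))   # exp(0.5), below the cap of 2.0
```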

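2606.15514 2026-06-16 cs.RO cs.LG Cross-list

Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities

Hassan Ismkhan, Hamid Bouchahcia

Affiliations: Bournemouth University

AI summary: Proposes RL4IL, which uses an RL policy to retrieve the most relevant expert demonstrations from a training library and fuses them via soft cross-attention to produce actions. It handles missing sensors effectively and outperforms existing methods on the LIBERO benchmarks.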

English abstract

Robotic systems perceive the world through multiple input modalities, including visual camera streams and natural language instructions, and must select appropriate actions based on these signals. However, assuming the permanent availability of all input devices is unrealistic, as sensors may fail, become occluded, or drop out entirely during deployment. Robust handling of such missing-modality scenarios is therefore essential for real-world robot operation. This paper introduces RL4IL, a reinforcement-learning-guided method for imitation learning that selects the most suitable action for a given observation by identifying the most relevant expert demonstrations from a training library. A reinforcement learning policy, trained via Proximal Policy Optimisation over Breadth-First Search candidate sets, ranks candidate demonstrations, and a soft cross-attention fusion head aggregates their action signals to produce the final prediction. When a modality is missing at inference time, a dedicated per-modality RL retrieval policy identifies donor demonstrations from the training library, and a soft imputation head reconstructs the missing embedding via cross-attention over the top-ranked donors, without requiring any retraining of the system. Experiments on three LIBERO benchmark suites demonstrate that RL4IL substantially outperforms state-of-the-art imitation learning methods under sensor dropout conditions, while requiring no policy network training. The code can be found at https://github.com/h-ismkhan/Reinforcement-Learning-via-kNN-for-Robotic-Learning-with-Missing-Camera
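Both the fusion head and the imputation head attend over retrieved donors. A minimal sketch of that step is plain scaled dot-product attention with no learned projections, which is an assumption; the real heads are trained. The query is the current embedding, the keys are donor embeddings, and the values are donor action vectors (or donor modality embeddings when imputing). `soft_fuse` is a hypothetical name.

```python
import math
from typing import List, Sequence

Vec = List[float]

# Projection-free attention sketch; RL4IL's learned heads are not reproduced.


def softmax(scores: Sequence[float]) -> Vec:
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_fuse(query: Sequence[float], donor_keys: Sequence[Sequence[float]],
              donor_values: Sequence[Sequence[float]], temperature: float = 1.0) -> Vec:
    """Attention-weighted average of donor values (actions or missing embeddings)."""
    if not donor_keys or len(donor_keys) != len(donor_values):
        raise ValueError("need one value per donor key")
    scale = math.sqrt(len(query)) * temperature
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in donor_keys]
    weights = softmax(scores)
    dim = len(donor_values[0])
    return [sum(w * v[i] for w, v in zip(weights, donor_values)) for i in range(dim)]


# Two retrieved donors: the query matches the first far better than the second.
fused = soft_fuse([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]])
print(fused)
```

Unlike picking the single nearest donor, the soft weighting degrades gracefully when several donors are similarly relevant.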

2606.15866 2026-06-16 cs.AI cs.LG Cross-list

STRIDE: Strategic Trajectory Reasoning via Discriminative Estimation for Verifiable Reinforcement Learning

Qinjian Zhao, Zhihao Dou, Dinggen Zhang, Xiangyu Li, Chaoda Song, Zhongwei Wan, Xinpeng Li, Yanyan Zhang, Kaijie Chen, Qingtao Pan, Chengcheng Feng, Zhiqiang Gao, Xiaoyu Xia

Affiliations: Kean University; Case Western Reserve University; University of Texas at Austin; The Ohio State University; Tongji University; Duke Kunshan University; Royal Melbourne Institute of Technology

AI summary: Proposes STRIDE, which contrasts successful and failed trajectories to estimate the discriminative preference of n-gram strategic patterns and combines it with reasoning saliency entropy to identify key patterns. This enables fine-grained credit assignment that improves reasoning performance in RL with verifiable rewards.

English abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has become an effective post-training paradigm for improving the reasoning abilities of large language models. However, existing RLVR methods typically rely on final-answer correctness to assign trajectory-level rewards, providing sparse supervision and treating all tokens uniformly regardless of their actual contribution to reasoning. Although recent studies introduce intermediate signals such as process rewards, high-entropy tokens, and semantic uncertainty, these signals are often not inherently verifiable and may fail to distinguish beneficial strategic patterns from harmful ones. To address this limitation, we propose STRIDE (Strategic Trajectory Reasoning with Discriminative Estimation), a fine-grained RLVR framework that derives strategic reasoning supervision from verifiable outcomes. STRIDE contrasts successful and failed trajectories within each response group to estimate the outcome-discriminative preference of each n-gram strategic pattern, and further combines this signal with reasoning saliency entropy to identify decision-relevant strategic patterns. These patterns are assigned differentiated advantage values during RL optimization, enabling more precise credit assignment while preserving the verifiability of RLVR. Extensive experiments demonstrate that STRIDE consistently improves reasoning performance across diverse models, tasks, and extended settings, including VLMs and agent-based systems.
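One simple way to estimate an "outcome-discriminative preference" for each n-gram within a response group is to take the fraction of successful trajectories containing it minus the fraction of failed trajectories containing it. This estimator is assumed here for illustration. STRIDE's actual estimator and its combination with reasoning saliency entropy are not reproduced, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]

# Assumed estimator: P(n-gram appears | success) - P(n-gram appears | failure).


def ngram_set(tokens: Sequence[str], n: int) -> Set[NGram]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def discriminative_preference(group: Sequence[Tuple[List[str], bool]],
                              n: int = 2) -> Dict[NGram, float]:
    """Score each n-gram in one response group by its success/failure frequency gap."""
    succ = [ngram_set(toks, n) for toks, ok in group if ok]
    fail = [ngram_set(toks, n) for toks, ok in group if not ok]
    if not succ or not fail:
        return {}  # all-correct or all-wrong group: nothing to contrast
    succ_freq = Counter(g for s in succ for g in s)
    fail_freq = Counter(g for s in fail for g in s)
    return {g: succ_freq[g] / len(succ) - fail_freq[g] / len(fail)
            for g in set(succ_freq) | set(fail_freq)}


group = [("let x be the unknown".split(), True),
         ("let x be guessed".split(), False),
         ("try x be guessed".split(), False)]
prefs = discriminative_preference(group, n=2)
print(sorted(prefs.items(), key=lambda kv: -kv[1]))
```

Patterns shared by all trajectories, such as ("x", "be") here, score zero, so only tokens that actually separate success from failure would receive extra advantage.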

2606.16215 2026-06-16 cs.CL cs.AI cs.LG Cross-list

PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents

Zhenbang Du, Jun Luo, Zhiwei Zheng, Xiangchi Yuan, Kejing Xia, Dachuan Shi, Qirui Jin, Qijia He, Shaofeng Zou, Yingbin Liang, Wenke Lee

Affiliations: Georgia Institute of Technology; Ohio State University; University of Pennsylvania; Arizona State University

AI summary: Proposes PACT, which uses privileged (expert) traces to provide dense supervision at training time only. It combines a trace-conditioned RL surrogate with a component-aware SFT loss so that inference never depends on traces, and it substantially improves multi-turn tool-use agents.

Comments Project page: https://zhenbangdu.github.io/pact-project-page/

English abstract

Multi-turn tool-use agents must reason, call tools, and adapt to observations across several interaction turns. Post-training such agents is challenging, as reinforcement learning often suffers from sparse rewards and weak credit assignment despite matching the prompt-only inference setting, while supervised fine-tuning on expert traces provides dense process supervision but can over-constrain the model to fixed trajectories. To tackle this, we propose PACT, a Privileged trAce Co-Training framework for multi-turn tool-use agents. The key idea is to use expert traces only as training-time optimization signals rather than rollout-time hints. PACT keeps rollout generation prompt-only, then uses expert traces to guide optimization through two complementary signals: a trace-conditioned RL surrogate that evaluates prompt-only rollouts under expert-trace context, and a component-aware SFT loss that supervises reasoning prefixes and tool-calls with annealed strength. To reduce over-reliance on the training-only trace context, PACT further introduces prompt-only anchoring. We also provide a latent-trace view that connects the two trace-based objectives and explains how expert traces can guide optimization without being used during rollout generation. Experiments on FTRL, BFCL, and ToolHop show that PACT consistently improves over strong SFT- and RL-based baselines, highlighting the value of privileged trace co-training for multi-turn tool-use learning.
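The abstract says the SFT terms are applied "with annealed strength" but gives neither the schedule nor how the objectives are combined. The sketch below assumes three things:

- a linear decay of the SFT weight;
- a simple additive objective;
- a fixed-weight prompt-only anchor term.

Each loss is a plain scalar standing in for a batch loss, and all names are hypothetical.

```python
# Hypothetical loss combination; the schedule and weights are assumptions.


def annealed_weight(step: int, total_steps: int,
                    start: float = 1.0, end: float = 0.0) -> float:
    """Linearly interpolate from `start` to `end` over `total_steps` (clamped)."""
    if total_steps <= 0:
        return end
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def pact_style_loss(rl_surrogate: float, sft_reasoning: float, sft_tool_calls: float,
                    anchor: float, step: int, total_steps: int,
                    anchor_weight: float = 0.1) -> float:
    """RL surrogate + annealed component-wise SFT terms + prompt-only anchor."""
    lam = annealed_weight(step, total_steps)
    return rl_surrogate + lam * (sft_reasoning + sft_tool_calls) + anchor_weight * anchor


# SFT supervision dominates early and fades as training proceeds.
for step in (0, 500, 1000):
    print(step, pact_style_loss(1.0, 2.0, 0.5, 0.3, step, total_steps=1000))
```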

2606.16285 2026-06-16 cs.CL cs.LG Cross-list

HiMPO: Hindsight-Informed Memory Policy Optimization for Less-Entangled Credit in Long-Horizon Agents

Jiangze Yan, Yi Shen, Wenjing Zhang, Jieyun Huang, Zhaoxiang Liu, Ning Wang, Kai Wang, Shiguo Lian

Affiliations: Unicom Data Intelligence, China Unicom; Data Science & Artificial Intelligence Research Institute, China Unicom

AI summary: Proposes HiMPO, which estimates the local utility of a memory update by comparing task-relevant information before and after the update and uses hindsight relevance as a retrospective filter. This reduces credit entanglement for memory-writing actions and improves long-horizon agent performance.

Comments Preprint. 2 figures

English abstract

Long-horizon agents rely on memory mechanisms to compress interaction history, but optimizing memory writing faces a distinct credit assignment challenge: a memory update may be rewarded or penalized due to downstream tool failures, noisy observations, or reasoning errors rather than its own contribution. This causally entangled credit can lead agents to discard useful evidence or preserve irrelevant information. We propose HiMPO, a Hindsight-Informed Memory Policy Optimization framework for assigning less-entangled credit to memory-writing actions in long-horizon agents. HiMPO first estimates the local utility of a memory update by comparing the task-relevant information recoverable from the previous and updated memories under the same pre-write state. It then uses hindsight relevance as a bounded retrospective filter that attenuates memory credit when local utility is not supported by the target outcome. The resulting memory-specific advantage is applied only to memory tokens, while trajectory-level rewards optimize the rest of the agent behavior. Across judge-based open-domain tasks and objective compressive-memory QA, HiMPO improves over strong memory-based and RL-based baselines while preserving compressed-context efficiency. Controlled interventions further show that HiMPO reduces blame leakage from tool-induced errors and improves attribution fidelity of memory updates.
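The two-stage credit computation can be sketched as follows: first a local utility, then a bounded hindsight gate. The sketch makes two assumptions:

- "Task-relevant information recoverable from memory" is proxied by the fraction of a known set of relevant facts present in the memory.
- The gate is a linear map of hindsight relevance into `[floor, 1]`, so it can shrink credit but never flip its sign.

The function names and `floor` are hypothetical.

```python
from typing import Set

# Toy proxy for HiMPO-style memory credit; the utility measure and gate are assumptions.


def recoverable_info(memory: Set[str], relevant: Set[str]) -> float:
    """Fraction of task-relevant facts recoverable from a memory (toy proxy)."""
    return len(memory & relevant) / len(relevant) if relevant else 0.0


def memory_credit(prev_memory: Set[str], new_memory: Set[str], relevant: Set[str],
                  hindsight_relevance: float, floor: float = 0.2) -> float:
    """Local utility of a memory write, attenuated by a bounded hindsight gate in [floor, 1]."""
    local_utility = (recoverable_info(new_memory, relevant)
                     - recoverable_info(prev_memory, relevant))
    r = min(max(hindsight_relevance, 0.0), 1.0)
    gate = floor + (1.0 - floor) * r
    return local_utility * gate


relevant = {"flight_id", "departure_time"}
prev, new = {"flight_id"}, {"flight_id", "departure_time"}
print(memory_credit(prev, new, relevant, hindsight_relevance=1.0))  # outcome supports the write
print(memory_credit(prev, new, relevant, hindsight_relevance=0.0))  # outcome does not
```

Because the credit depends only on the memory contents and the outcome-derived relevance, a later tool failure does not by itself penalize a useful write, which is the "blame leakage" the abstract targets.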

2606.16316 2026-06-16 cs.IR cs.AI cs.LG Cross-list

RL-Index: Reinforcement Learning for Retrieval Index Reasoning

Yongjia Lei, Nedim Lipka, Zhisheng Qi, Utkarsh Sahu, Koustava Goswami, Franck Dernoncourt, Ryan A. Rossi, Yu Wang

Affiliations: University of Oregon; Adobe Research

AI summary: Proposes RL-Index, which casts retrieval index reasoning as an RL problem and augments documents with LLM-generated rationales optimized with GRPO. It improves retrieval and question-answering performance while reducing online latency.

English abstract

Retrieving external knowledge is essential for solving real-world tasks, yet it remains challenging when the relationship between a query and its relevant knowledge involves implicit and complex reasoning beyond surface-level semantic or lexical matching (e.g., mathematical problems relying on the same theorem or coding requiring deep reasoning). Existing approaches primarily rely on query-side reasoning (e.g., query rewriting), which introduces significant online latency and underutilizes the opportunity to perform reasoning over the knowledge corpus itself (i.e., index-side reasoning). In this paper, we propose RL-Index, an agentic indexing framework that formulates retrieval index reasoning as a reinforcement learning problem. Instead of performing reasoning at query time, RL-Index shifts reasoning to the indexing stage by augmenting documents with LLM-generated rationales that explicitly encode the latent query-knowledge relationship. To optimize the quality of these rationales, we employ Group Relative Policy Optimization (GRPO) and use retrieval similarity as a verifiable reward signal, enabling direct optimization of indexing decisions for retrieval effectiveness. Extensive experiments on the BRIGHT benchmark demonstrate that RL-Index consistently improves both retrieval and downstream question-answering performance, while significantly reducing online inference latency. Moreover, the learned rationale augmentation generalizes across diverse retrievers and generators, highlighting its robustness as a plug-and-play indexing strategy across different retrieval systems.
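GRPO computes advantages by standardizing rewards within a group of samples for the same prompt. The sketch below pairs that with a retrieval-similarity reward. It assumes the reward is the cosine similarity between a training query's embedding and the embedding of a rationale-augmented document, and it uses the population standard deviation. The embeddings are toy vectors rather than real encoder outputs, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence

# Sketch: cosine-similarity reward (assumed) + GRPO-style group standardization.


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of sampled rationales."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One training query and a group of rationale-augmented versions of its relevant document.
query_emb = [1.0, 0.0]
augmented_doc_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rewards = [cosine(query_emb, d) for d in augmented_doc_embs]
print(rewards, group_relative_advantages(rewards))
```

Rationales that pull the document closer to the queries it should answer receive positive advantage. Because this happens at indexing time, no extra reasoning is needed when a query arrives.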

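2606.16496 2026-06-16 cs.CL cs.LG Cross-list

REFLEX: Reflective Evolution from LLM Experience

Pan Wang

AI summary: Proposes REFLEX, which decouples visual diagnosis from code generation to make policy evolution auditable and efficient, showing strong sample efficiency on control tasks and antenna array synthesis.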

English abstract

Large multimodal language models (LLMs) have emerged as powerful tools for guiding evolutionary search toward interpretable programmatic policies. However, existing frameworks rely on a monolithic model call to simultaneously interpret visual behavioral evidence and synthesize corrective code. This diagnosis-repair entanglement creates an opaque feedback loop, obscuring the rationale behind mutations and preventing the retention of algorithmic insights across independent runs. To achieve auditable and efficient policy search, we argue that visual diagnosis must be structurally decoupled from code generation. We present REFLEX, a training-free evolutionary framework that operationalizes this decoupling. In REFLEX, a vision-enabled Critic first distills task-specific behavioral evidence into structured, auditable diagnoses. Subsequently, a text-optimized Actor synthesizes child policies using these diagnoses alongside a persistent, self-evolving Skill Memory of reusable code snippets. This architecture not only provides transparent mutation traces but also enables cross-run programmatic knowledge transfer. Extensive evaluations across control benchmarks (Lunar Lander, Acrobot, Pendulum) and a 36-dimensional antenna array synthesis task demonstrate exceptional sample efficiency. Notably, REFLEX solves Acrobot and Pendulum in under 10 LLM calls and reaches a best Normalized Weighted Score of 1.092 on Lunar Lander, achieving highly competitive final performance while significantly accelerating the early-stage discovery of transparent policies.

2606.16978 2026-06-16 cs.RO cs.LG cs.SY eess.SY Cross-list

Task-Error Residual Learning for Real-Robot Five-Ball Juggling

Kai Ploeger, Jan Peters

Affiliations: Technical University of Darmstadt; German Research Center for AI (DFKI); Hessian Center for Artificial Intelligence (hessian.AI)

AI summary: Proposes residual learning with directional task-error supervision and error-model-driven sample selection, achieving stable three-, four-, and five-ball juggling on Barrett WAM arms. After the first attempt fails, task error decreases monotonically with no further failures.

Comments Submitted to the 2026 International Symposium on Robotics Research (ISRR)

English abstract

For residual learning that refines existing behavior, sample efficiency depends on two things: how much information each rollout returns, and how efficiently the learner uses that information. Reinforcement learning's standard scalar reward carries far less information than the directional task error that defines the task. Random exploration further discards whatever information each rollout returns. Through residual learning with directional task-error supervision and a task error model that drives sample selection, we achieve stable three-, four-, and five-ball juggling on anthropomorphic Barrett WAM arms. Despite planning and controlling through a simple, idealized stack, the system converges from the second attempt. The first attempt drops, after which task error decreases monotonically without further failures. In comparison, five-ball juggling typically takes humans years of practice. We compare residual learners across two ternary axes, the directional information in the learning feedback and the commitment of the analytic prior, spanning Newton-style Jacobian updates, Composite Bayesian Optimization, and stochastic search methods. Both axes prove necessary: neither directional feedback nor an informative prior suffices alone, and the simplest method that combines them, a fixed-Jacobian Newton update, is the most reliable. The learned residual tolerates substantial prior misalignment and degraded joint tracking, affecting mainly convergence speed. The bottleneck for residual learning on real robots is therefore the information content of the supervision signal and how the learner uses it, not the accuracy of the surrounding stack. Video documentation of all experiments is available at https://kai-ploeger.com/residual-juggling.
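The most reliable learner in the comparison is a fixed-Jacobian Newton update: the analytic prior's Jacobian is held fixed, and each rollout's directional task error is mapped back to a parameter correction, θ ← θ − J⁻¹e(θ). The toy sketch below uses a hypothetical two-parameter linear "plant" whose true sensitivity differs from the prior J = I, standing in for a misaligned analytic model. It is an illustration of the update rule, not the authors' juggling stack.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
Mat = List[List[float]]

# Toy illustration of a fixed-Jacobian Newton residual update; the plant is hypothetical.


def solve_2x2(J: Mat, e: Vec) -> Vec:
    """Solve J @ d = e for a 2x2 matrix J (Cramer's rule)."""
    (a, b), (c, d) = J
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("singular Jacobian")
    return [(d * e[0] - b * e[1]) / det, (a * e[1] - c * e[0]) / det]


def fixed_jacobian_newton(task_error: Callable[[Vec], Vec], J: Mat, theta: Vec,
                          iters: int = 10, step: float = 1.0) -> Tuple[Vec, List[float]]:
    """Residual update theta <- theta - step * J^{-1} e(theta), with J held fixed.

    Returns the final parameters and the task-error norm measured before each update.
    """
    history = []
    for _ in range(iters):
        e = task_error(theta)                 # one "rollout": measured directional error
        history.append(math.hypot(*e))
        delta = solve_2x2(J, e)
        theta = [t - step * d for t, d in zip(theta, delta)]
    return theta, history


TARGET = [1.0, 2.0]


def task_error(theta: Vec) -> Vec:
    # True plant sensitivity is not the identity the prior assumes.
    outcome = [1.2 * theta[0] + 0.1 * theta[1], 0.05 * theta[0] + 0.9 * theta[1]]
    return [o - t for o, t in zip(outcome, TARGET)]


theta, history = fixed_jacobian_newton(task_error, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], iters=12)
print([round(h, 6) for h in history])
```

In this toy, the prior mismatch only slows convergence; the task error still shrinks monotonically. That is consistent with the abstract's observation that misalignment mainly affects convergence speed.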

2606.16995 2026-06-16 cs.AI cs.LG Cross-list

When in Doubt, Plan It Out: Committed Small Language Model Deliberation for Reactive Reinforcement Learning

Nathan Gavenski, Juarez Monteiro, Francisco Galuppo, Adriano Veloso, Odinaldo Rodrigues

AI summary: Proposes PACT, a hybrid architecture that combines a fast reactive RL policy with a slow small-language-model planner, which asynchronously generates and validates candidate action plans to improve performance in unfamiliar environments.

Comments LM4Plan Workshop at ICML 2026

English Abstract

Reinforcement Learning (RL) policies often degrade in unfamiliar environments because they lack explicit deliberation. We propose Plan, Align, Commit, Think (PACT), a hybrid architecture that combines a fast, reactive RL policy with a slow, deliberative Small Language Model (SLM) planner. PACT invokes the SLM asynchronously to generate and validate candidate action plans. Once a plan is verified through simulation as safe, feasible, and complete, it is executed directly, bypassing the RL policy without retraining or modifying it. Evaluated on three FrozenLake configurations of increasing difficulty, PACT outperforms all baselines while relying on a 2B-parameter SLM backbone, suggesting that deliberative planning and reactive execution are more powerful in concert than either is alone in these settings.
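A minimal sketch of the "verify, then commit" step. It assumes a deterministic (non-slippery) FrozenLake-style map and treats a plan as valid only if it reaches the goal without touching a hole or leaving the grid. The helper names `simulate_plan` and `next_actions` are hypothetical, and the SLM that would propose `candidate_plan` is left out. The point is only that a verified plan bypasses the RL policy without modifying it.

```python
from typing import Callable, List, Optional, Tuple

# Deterministic FrozenLake-style map: S start, F frozen, H hole, G goal.
MOVES = {"L": (0, -1), "D": (1, 0), "R": (0, 1), "U": (-1, 0)}


def simulate_plan(grid: List[str], start: Tuple[int, int], plan: str) -> str:
    """Roll a candidate plan through a simple model and classify it."""
    r, c = start
    for action in plan:
        if action not in MOVES:
            return "infeasible"
        dr, dc = MOVES[action]
        r, c = r + dr, c + dc
        if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
            return "infeasible"  # treat leaving the map as an invalid plan
        if grid[r][c] == "H":
            return "unsafe"
        if grid[r][c] == "G":
            return "complete"
    return "incomplete"


def next_actions(grid: List[str], start: Tuple[int, int],
                 candidate_plan: Optional[str],
                 rl_policy: Callable[[Tuple[int, int]], str]) -> List[str]:
    """Commit to a verified plan; otherwise take one reactive step."""
    if candidate_plan is not None and simulate_plan(grid, start, candidate_plan) == "complete":
        return list(candidate_plan)  # bypass the RL policy for the whole plan
    return [rl_policy(start)]  # unchanged RL policy acts as the fallback
```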

2606.17011 2026-06-16 cs.RO cs.LG Cross-list

ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning

ROVE: 通过强化学习解锁人类干预用于人形机器人操作

Wei Xiao, Weiliang Tang, Yuying Ge, Hui Zhou, Yao Mu, Li Zhang, Yixiao Ge

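Affiliations: XPENG Robotics; Fudan University; The Chinese University of Hong Kong; Shanghai Jiao Tong University

AI Summary: Proposes the ROVE framework, which uses reinforcement learning with optimistic value estimation to learn high-value behaviors from suboptimal human intervention trajectories, improving humanoid manipulation performance.

AI Chinese Abstract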

人类干预为视觉-语言-动作(VLA)模型的后训练提供了关键的纠正信号。然而,由于复杂的全身运动学和灵巧手控制,实现无缝的人形干预是一个严峻的系统挑战。因此,收集到的干预轨迹往往是次优的,依赖人类干预作为专家监督的方法可能会吸收犹豫、低效甚至错误的行为。为了解决系统和算法两方面的挑战,我们提出了ROVE,一个用于人形VLA后训练的强化学习框架,能够处理不完美的人类干预。首先,ROVE引入了一个人在环的流水线,能够收集人形操作中的部署和干预数据。其次,它利用乐观价值估计(OVE)从混合质量的轨迹中优先考虑高价值行为。为了进一步增强价值估计的鲁棒性,我们融入了跨具身的人类经验视频,为长尾失败和恢复模式提供丰富的监督。由此产生的评论家产生信息丰富的优势信号,引导VLA演员专注于高价值行为,而不是不加区分地模仿所有动作。在具有挑战性的真实世界接触密集和精细的人形操作任务中,ROVE优于基于经验学习的基线,并在多次部署-干预迭代中持续改进。

English Abstract

Human interventions provide crucial corrective signals for post-training Vision-Language-Action (VLA) models. However, enabling seamless humanoid interventions is a formidable systems challenge due to complex whole-body kinematics and dexterous-hand control. Consequently, the collected intervention trajectories are often suboptimal, and methods that rely on human interventions as expert supervision can absorb hesitant, inefficient, or even erroneous behaviors. To address both the system and algorithmic challenges, we propose ROVE, a reinforcement learning framework for humanoid VLA post-training with imperfect human interventions. First, ROVE introduces a human-in-the-loop pipeline capable of collecting deployment and intervention data for humanoid manipulation. Second, it utilizes Optimistic Value Estimation (OVE) to prioritize high-value behaviors from mixed-quality trajectories. To further robustify value estimation, we incorporate cross-embodiment human experience videos to provide rich supervision for long-tailed failure and recovery modes. The resulting critic yields informative advantage signals, steering the VLA actor to focus on high-value behaviors rather than indiscriminately imitating all actions. On challenging real-world contact-rich and fine-grained humanoid manipulation tasks, ROVE outperforms experience-learning baselines and consistently improves across multiple rollout-intervention iterations.

2606.17043 2026-06-16 cs.RO cs.LG Cross-list

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

基于层级优势加权、利用稀疏回合结果的VLA在线RL微调

Tongyan Fang, Siyuan Huang, Naiyu Fang, Ganlong Zhao, Zhongjin Luo, Jianbo Liu, Xiaogang Wang, Ying Dong, Hongsheng Li

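Affiliations: ACE Robotics; Shenzhen International Graduate School, Tsinghua University; The Chinese University of Hong Kong

AI Summary: Proposes Hierarchical Advantage-Weighted Behavior Cloning (HABC), which separates the viability and efficiency objectives and balances them adaptively to address credit assignment when fine-tuning VLA policies online from sparse binary outcomes; on three contact-rich bimanual tasks it raises success rates from 12-44% to 38-92%.

Comments Website: https://acerobotics-vla.github.io/HABC-Website

AI Chinese Abstract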

当预训练的VLA策略通过在线RL进行微调时,每次 rollout 回合仅产生单个二元结果(成功或失败),但 actor 更新需要每个时间步的监督。现有方法通常将此稀疏结果简化为单个标量奖励或优势信号,这混淆了不同形式的过渡级反馈,并且在基本任务成功可实现后提供的指导有限。首先,单个标量信号混淆了生存性和效率这两个目标;一旦基本成功实现,二元标签无法提供梯度来区分高效完成与缓慢完成。其次,真实世界的 rollout 混合了自主段和干预段;天真地将回合结果跨这些边界分配会导致不正确的信用分配。为解决这些问题,我们提出层级优势加权行为克隆(HABC),该方法在不同数据子集上为这两个目标训练独立的评论家头,并通过状态自适应平衡组合其输出。状态自适应门 $g_t$ 合并它们的一步优势,在成功不确定时优先考虑生存性,仅在生存性高时转向效率,并将结果转换为 actor 损失上的每时间步权重。干预感知的信用分配进一步将结果标签限制在当前策略执行的段,防止监督跨干预边界泄漏。在三个接触丰富的双臂任务上的真实机器人实验中,HABC 将监督微调(SFT)基线的成功率从 36%、44% 和 12% 提升至 92%、88% 和 38%。

English Abstract

When pretrained VLA policies are fine-tuned through online RL, each rollout episode produces only a single binary outcome (success or failure), yet the actor update requires per-transition supervision. Existing approaches commonly reduce this sparse outcome to a single scalar reward or advantage signal, which conflates distinct forms of transition-level feedback and provides limited guidance once basic task success becomes achievable. First, a single scalar signal conflates the two objectives of viability and efficiency; once basic success is achieved, the binary label provides no gradient to distinguish efficient completions from slow ones. Second, real-world rollouts mix autonomous and intervention segments; naively assigning episode outcomes across these boundaries introduces incorrect credit assignment. To address these issues, we propose Hierarchical Advantage-Weighted Behavior Cloning (HABC), which trains separate critic heads for these two objectives on different data subsets and combines their outputs with a state-adaptive balance. A state-adaptive gate $g_t$ merges their one-step advantages, prioritizing viability when success is uncertain and shifting to efficiency only when viability is high, and converts the result into per-transition weights on the actor loss. Intervention-aware credit assignment further restricts outcome labels to segments executed by the current policy, preventing supervision from leaking across intervention boundaries. In real-robot experiments on three contact-rich bimanual tasks, HABC raises success from supervised fine-tuning (SFT) baselines of 36%, 44%, and 12% to 92%, 88%, and 38%.
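The gating and credit-assignment ideas can be sketched as follows. Everything concrete here is an assumption, since the abstract gives no formulas:

- the gate is a sigmoid of an estimated success probability;
- the two one-step advantages are blended linearly;
- the weights are clipped exponentials, as in generic advantage-weighted behavior cloning;
- "restricting outcome labels" is read as leaving intervention transitions unlabeled (NaN).

```python
import numpy as np


def state_adaptive_gate(p_viable, threshold=0.8, sharpness=20.0):
    """g_t near 0 while success is uncertain, near 1 once viability is high."""
    p = np.asarray(p_viable, dtype=float)
    return 1.0 / (1.0 + np.exp(-sharpness * (p - threshold)))


def per_transition_weights(adv_viab, adv_eff, p_viable, temperature=1.0, w_max=20.0):
    """Blend the two one-step advantages with the gate and map them to BC weights."""
    g = state_adaptive_gate(p_viable)
    adv = (1.0 - g) * np.asarray(adv_viab, dtype=float) + g * np.asarray(adv_eff, dtype=float)
    return np.minimum(np.exp(adv / temperature), w_max)  # clipped exponential weights


def outcome_labels(success, executed_by_policy):
    """Give the episode outcome only to policy-executed transitions; NaN means 'no label'."""
    mask = np.asarray(executed_by_policy, dtype=bool)
    return np.where(mask, float(success), np.nan)
```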

2409.18909 2026-06-16 cs.LG cs.IT math.IT stat.ML Replacement

Best Arm Identification with Minimal Regret

最小化遗憾的最佳臂识别

Junwen Yang, Vincent Y. F. Tan, Tianyuan Jin

Affiliations: Institute of Operations Research and Analytics, National University of Singapore; Department of Mathematics, National University of Singapore; Department of Electrical and Computer Engineering, National University of Singapore

AI Summary: Formulates the problem of identifying the best arm with confidence δ while minimizing cumulative regret, derives a lower bound with information-theoretic techniques, and designs the asymptotically optimal Double KL-UCB algorithm.

AI Chinese Abstract

受需要负责任实验的现实应用启发,我们提出了最小化遗憾的最佳臂识别(BAI)问题。这一多臂老虎机问题的变体优雅地融合了其两个最普遍的目标:遗憾最小化和BAI。更准确地说,智能体的目标是以规定的置信水平δ识别最佳臂,同时最小化直到停止时间的累积遗憾。聚焦于单参数指数族分布,我们利用信息论技术建立了期望累积遗憾的实例相关下界。此外,我们提出了一个不可能结果,强调了固定置信度BAI中累积遗憾与样本复杂度之间的张力。作为补充,我们设计并分析了Double KL-UCB算法,该算法在置信水平趋近于零时达到渐近最优性。值得注意的是,该算法采用两种不同的置信界限以随机方式指导臂选择。我们的发现阐明了遗憾最小化与BAI之间内在联系的新视角。

English Abstract

Motivated by real-world applications that necessitate responsible experimentation, we introduce the problem of best arm identification (BAI) with minimal regret. This variant of the multi-armed bandit problem elegantly amalgamates two of its most ubiquitous objectives: regret minimization and BAI. More precisely, the agent's goal is to identify the best arm with a prescribed confidence level $\delta$, while minimizing the cumulative regret up to the stopping time. Focusing on single-parameter exponential families of distributions, we leverage information-theoretic techniques to establish an instance-dependent lower bound on the expected cumulative regret. Moreover, we present an impossibility result that underscores the tension between cumulative regret and sample complexity in fixed-confidence BAI. Complementarily, we design and analyze the Double KL-UCB algorithm, which achieves asymptotic optimality as the confidence level tends to zero. Notably, this algorithm employs two distinct confidence bounds to guide arm selection in a randomized manner. Our findings offer a fresh perspective on the inherent connections between regret minimization and BAI.
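For Bernoulli rewards, the KL-UCB index that such algorithms build on is the largest mean q ≥ μ̂ with n·kl(μ̂, q) ≤ β. It can be found by bisection because kl(μ̂, q) increases in q for q ≥ μ̂. The sketch below computes only this standard index. The abstract does not specify how Double KL-UCB combines its two confidence bounds and randomizes between arms. The exploration budget β (for example log t) is an assumption.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, n_pulls, budget, tol=1e-9):
    """Largest q in [mean, 1] with n_pulls * kl(mean, q) <= budget (bisection)."""
    if n_pulls == 0:
        return 1.0  # an unplayed arm gets the most optimistic index
    lo, hi = mean, 1.0  # invariant: lo satisfies the constraint
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n_pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```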

2501.19401 2026-06-16 cs.LG stat.ML Replacement

DAL: A Practical Prior-Free Black-Box Framework for Piecewise Stationary Bandits

DAL:一种面向分段平稳赌博机的实用无先验黑盒框架

Argyrios Gerogiannis, Yu-Han Huang, Subhonmesh Bose, Venugopal V. Veeravalli

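Affiliations: Georgia Institute of Technology; University of California, Berkeley

AI Summary: Proposes Detection Augmented Learning (DAL), a framework that needs no prior knowledge of the non-stationarity and combines any order-optimal stationary bandit algorithm with a change detector, outperforming existing methods across diverse non-stationary scenarios.

Comments 28 pages, 12 figures

AI Chinese Abstract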

我们引入了一种实用的黑盒框架,称为检测增强学习(DAL),用于解决无需底层非平稳性先验知识的分段平稳赌博机问题。DAL接受任何具有阶数最优遗憾的静态赌博机算法作为输入,并通过变化检测器对其进行增强,使其适用于所有常见的赌博机变体。大量实验表明,DAL在各种非平稳场景中(包括合成基准和真实世界数据集)始终优于所有最先进的方法,凸显了其通用性和可扩展性。我们提供了对DAL强大经验性能的理论见解,并辅以彻底的经验验证。

English Abstract

We introduce a practical, black-box framework termed Detection Augmented Learning (DAL) for the problem of piecewise stationary bandits without knowledge of the underlying non-stationarity. DAL accepts any stationary bandit algorithm with order-optimal regret as input and augments it with a change detector, enabling applicability to all common bandit variants. Extensive experimentation demonstrates that DAL consistently surpasses all state-of-the-art methods across diverse non-stationary scenarios, including synthetic benchmarks and real-world datasets, underscoring its versatility and scalability. We provide theoretical insights into DAL's strong empirical performance, complemented by thorough empirical validation.
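The detect-and-restart pattern can be sketched with UCB1 as the stationary base learner. The following are hypothetical choices for illustration, not the detector used by DAL:

- a sliding-window mean-shift detector;
- a window of 50 and a threshold of 0.5;
- a piecewise-Bernoulli test scenario.

```python
import math
import random
from collections import deque


class UCB1:
    """Stationary base learner; any order-optimal algorithm could stand here."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.reset()

    def reset(self):
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm  # play every arm once after each (re)start
        return max(range(self.n_arms),
                   key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2.0 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


class WindowMeanShiftDetector:
    """Hypothetical detector: compares the two halves of a per-arm sliding window."""

    def __init__(self, n_arms, window=50, threshold=0.5):
        self.buffers = [deque(maxlen=window) for _ in range(n_arms)]
        self.threshold = threshold

    def update(self, arm, reward):
        buf = self.buffers[arm]
        buf.append(reward)
        if len(buf) < buf.maxlen:
            return False
        half = buf.maxlen // 2
        items = list(buf)
        old_mean = sum(items[:half]) / half
        new_mean = sum(items[half:]) / (len(items) - half)
        return abs(new_mean - old_mean) > self.threshold

    def reset(self):
        for buf in self.buffers:
            buf.clear()


def run_detection_augmented(n_arms, horizon, reward_fn, seed=0):
    """Run the base learner and restart it (and the detector) whenever a change is flagged."""
    rng = random.Random(seed)
    learner, detector = UCB1(n_arms), WindowMeanShiftDetector(n_arms)
    total, restarts = 0.0, []
    for t in range(horizon):
        arm = learner.select()
        reward = reward_fn(t, arm, rng)
        total += reward
        learner.update(arm, reward)
        if detector.update(arm, reward):
            learner.reset()
            detector.reset()
            restarts.append(t)
    return total, restarts


# Usage: two Bernoulli arms whose means swap at t = 1000 (hypothetical scenario).
def piecewise_bernoulli(t, arm, rng):
    means = (0.9, 0.2) if t < 1000 else (0.1, 0.8)
    return float(rng.random() < means[arm])


total_reward, restart_times = run_detection_augmented(2, 3000, piecewise_bernoulli)
```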

2502.19544 2026-06-16 cs.LG cs.RO Replacement

Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data

通过非策划数据引导世界模型的高效强化学习

Yi Zhao, Aidan Scannell, Wenshuai Zhao, Yuxin Hou, Tianyu Cui, Le Chen, Dieter Büchler, Arno Solin, Juho Kannala, Joni Pajarinen

Affiliations: Aalto University; University of Edinburgh; ELLIS Institute Finland; Deep Render; Imperial College London; Max Planck Institute for Intelligent Systems; CIFAR AI Chair; University of Alberta; Alberta Machine Intelligence Institute (Amii); University of Oulu

AI Summary: Uses non-curated offline data that is reward-free, of mixed quality, and collected across multiple embodiments. The method addresses distribution shift with experience rehearsal and execution guidance, substantially improving the sample efficiency of online reinforcement learning.

AI Chinese Abstract

利用离线数据是提高在线强化学习(RL)样本效率的一种有前景的方法。本文通过利用丰富的非策划数据(无奖励、混合质量、跨多个本体收集)来扩展离线到在线RL的可用数据池。尽管学习世界模型似乎有望利用此类数据,但我们发现简单的微调在许多任务上无法加速RL训练。通过仔细研究,我们将这种失败归因于微调期间离线数据和在线数据之间的分布偏移。为了解决这个问题并有效使用离线数据,我们提出了两种技术:\emph{i)} 经验回放和\emph{ii)} 执行引导。通过这些修改,非策划离线数据显著提高了RL的样本效率。在有限的样本预算下,我们的方法在跨越6个本体的72个视觉运动任务上,实现了几乎两倍于从头学习基线的总得分。在诸如移动和机器人操作等具有挑战性的任务上,它显著优于先前利用离线数据的方法。

English Abstract

Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by leveraging abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and effectively use the offline data, we propose two techniques: \emph{i)} experience rehearsal and \emph{ii)} execution guidance. With these modifications, the non-curated offline data substantially improves RL's sample efficiency. Under limited sample budgets, our method achieves nearly twice the aggregate score of learning-from-scratch baselines across 72 visuomotor tasks spanning 6 embodiments. On challenging tasks such as locomotion and robotic manipulation, it outperforms prior methods that utilize offline data by a decent margin.

2510.01721 2026-06-16 cs.LG Replacement

Finite-Time Convergence of Distributionally Robust Q-Learning with Linear Function Approximation

具有线性函数逼近的分布鲁棒Q学习的有限时间收敛性

Saptarshi Mandal, Yashaswini Murthy, R. Srikant

Affiliations: ECE and CSL, University of Illinois Urbana-Champaign; Computing and Mathematical Sciences, California Institute of Technology; ECE, CSL, and NCSA, University of Illinois Urbana-Champaign

AI Summary: For discounted robust reinforcement learning with an unknown nominal model, proposes a model-free robust Q-learning algorithm that combines a target network with dual function approximation, and proves finite-time convergence to the optimal robust Q-function.

Comments Preprint. 54 pages, 2 figures

AI Chinese Abstract

分布鲁棒强化学习(DRRL)旨在寻找当部署转移模型与生成数据的标称模型不同时仍表现良好的策略。大多数DRRL的有限样本保证是基于表格的、基于模型的、依赖于生成访问,或者仅在额外结构(如线性转移模型或限制性折扣因子条件)下获得函数逼近保证。我们研究了在$(s,a)$-矩形卡方不确定集下的折扣无模型鲁棒Q学习,使用鲁棒Q函数的线性逼近,仅依赖于来自未知标称模型的单条马尔可夫轨迹。我们的算法将目标网络外循环与用于卡方鲁棒贝尔曼更新的双函数逼近方案相结合。双过程使用矩跟踪评论家、后缀平均、用于方差类矩的新评估阶段以及可调平滑参数,以实现Lipschitz连续的卡方对偶梯度。我们证明了在无需小折扣因子假设的情况下,到最优鲁棒Q函数的有限时间收敛界(达到逼近误差)。我们的结果有助于缩小鲁棒RL算法的经验使用与其非鲁棒对应物可用的非渐近保证之间的差距。

English Abstract

Distributionally robust reinforcement learning (DRRL) seeks policies that perform well when the deployment transition model differs from the nominal model generating the data. Most finite-sample guarantees for DRRL are tabular, model-based, rely on generative access, or obtain function-approximation guarantees only under additional structure, such as linear-transition models or restrictive discount-factor conditions. We study discounted model-free robust Q-learning under an $(s,a)$-rectangular chi-square uncertainty set, with linear approximation of the robust Q-function, using only a single Markovian trajectory from an unknown nominal model. Our algorithm combines a target-network outer loop with a dual function-approximation scheme for the chi-square robust Bellman update. The dual procedure uses moment-tracking critics, suffix averaging, a fresh-evaluation stage for the variance-like moment, and a tunable smoothing parameter to obtain a Lipschitz-continuous chi-square dual gradient. We prove a finite-time convergence bound to the optimal robust Q-function up to approximation error, without imposing a small-discount-factor assumption. Our results help close a gap between the empirical use of robust RL algorithms and the non-asymptotic guarantees available for their non-robust counterparts.

2601.19612 2026-06-16 cs.LG cs.AI cs.RO Replacement

Safe Exploration via Policy Priors

通过策略先验进行安全探索

Manuel Wendl, Yarden As, Manish Prajapat, Anton Pollak, Stelian Coros, Andreas Krause

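Affiliations: ETH Zurich

AI Summary: Proposes SOOPER, which combines a suboptimal but conservative policy prior with a probabilistic dynamics model to explore optimistically and fall back pessimistically, guaranteeing safety while converging to an optimal policy.

AI Chinese Abstract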

安全探索是强化学习智能体在受控(例如模拟)环境之外在线学习和适应的关键要求。在这项工作中,我们通过利用次优但保守的策略(例如,从离线数据或模拟器中获得)作为先验来应对这一挑战。我们的方法SOOPER使用概率动力学模型进行乐观探索,但在必要时悲观地回退到保守的策略先验。我们证明了SOOPER在整个学习过程中保证安全性,并通过限制其累积遗憾建立了收敛到最优策略的保证。在关键的安全强化学习基准测试和真实硬件上的大量实验表明,SOOPER具有可扩展性,优于现有技术,并在实践中验证了我们的理论保证。

English Abstract

Safe exploration is a key requirement for reinforcement learning (RL) agents to learn and adapt online, beyond controlled (e.g., simulated) environments. In this work, we tackle this challenge by utilizing suboptimal yet conservative policies (e.g., obtained from offline data or simulators) as priors. Our approach, SOOPER, uses probabilistic dynamics models to optimistically explore, yet pessimistically fall back to the conservative policy prior if needed. We prove that SOOPER guarantees safety throughout learning, and establish convergence to an optimal policy by bounding its cumulative regret. Extensive experiments on key safe RL benchmarks and real-world hardware demonstrate that SOOPER is scalable and outperforms the state of the art, and they validate our theoretical guarantees in practice.

2602.00781 2026-06-16 cs.LG stat.ML Replacement

Fast Non-Episodic Finite-Horizon RL with K-Step Lookahead Thresholding

快速非情节有限时域强化学习:K步前瞻阈值法

Jiamin Xu, Kyra Gan

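AI Summary: For non-episodic finite-horizon MDPs, proposes a K-step lookahead Q-function with a thresholding mechanism that achieves fast finite-sample convergence and outperforms existing methods in synthetic environments and standard RL tasks.

AI Chinese Abstract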

非情节、有限时域MDP中的在线强化学习仍未充分探索,且面临估计到固定终止时间的回报的挑战。现有的无限时域方法通常依赖折扣收缩,无法自然适应这种固定时域结构。我们引入一种修改的Q函数:不针对全时域,而是学习一个K步前瞻Q函数,将规划截断到接下来的K步。为了进一步提高样本效率,我们引入阈值机制:仅当动作的估计K步前瞻值超过时变阈值时才选择该动作。我们为这一新目标提供了一种高效的表格学习算法,证明其实现了快速有限样本收敛:对于$K=1$,达到极小极大最优常数遗憾;对于任意$K \geq 2$,达到$\mathcal{O}(\max((K-1),C_{K-1})\sqrt{SAT\log(T)})$遗憾。我们在最大化奖励的目标下数值评估了算法性能。我们的实现自适应地随时间增加K,平衡前瞻深度与估计方差。实验结果表明,在合成MDP和RL环境(JumpRiverswim、FrozenLake和AnyTrading)中,累积奖励优于最先进的表格RL方法。代码见GitHub(https://github.com/jamie01713/K-Step-Lookahead)。

English Abstract

Online reinforcement learning in non-episodic, finite-horizon MDPs remains underexplored and is challenged by the need to estimate returns to a fixed terminal time. Existing infinite-horizon methods, which often rely on discounted contraction, do not naturally account for this fixed-horizon structure. We introduce a modified Q-function: rather than targeting the full horizon, we learn a K-step lookahead Q-function that truncates planning to the next K steps. To further improve sample efficiency, we introduce a thresholding mechanism: actions are selected only when their estimated K-step lookahead value exceeds a time-varying threshold. We provide an efficient tabular learning algorithm for this novel objective and prove that it achieves fast finite-sample convergence: minimax optimal constant regret for $K=1$ and $\mathcal{O}(\max((K-1),C_{K-1})\sqrt{SAT\log(T)})$ regret for any $K \geq 2$. We numerically evaluate the performance of our algorithm under the objective of maximizing reward. Our implementation adaptively increases K over time, balancing lookahead depth against estimation variance. Empirical results demonstrate superior cumulative rewards over state-of-the-art tabular RL methods across synthetic MDPs and RL environments: JumpRiverswim, FrozenLake and AnyTrading. Code is available at https://github.com/jamie01713/K-Step-Lookahead.
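The sketch below computes a tabular K-step lookahead Q-function under an estimated model: Q_1 = R and Q_k = R + P max_a Q_{k-1}, so planning never looks more than K steps ahead. Several parts of the paper's method are left out:

- truncation at the terminal time;
- the paper's threshold schedule;
- its rule for when no action clears the threshold.

The two-state chain is a made-up example in which K = 1 cannot separate the actions but K = 2 can.

```python
import numpy as np


def k_step_lookahead_q(P, R, K):
    """Q_1 = R and Q_k(s, a) = R(s, a) + sum_s' P(s' | s, a) max_a' Q_{k-1}(s', a').

    P has shape (S, A, S) and R has shape (S, A); the result has shape (S, A).
    """
    if K < 1:
        raise ValueError("K must be at least 1")
    Q = np.array(R, dtype=float)
    for _ in range(K - 1):
        V = Q.max(axis=1)  # greedy value of the shorter lookahead, shape (S,)
        Q = R + P @ V      # expectation over next states, shape (S, A)
    return Q


def actions_above_threshold(q_row, threshold):
    """Indices of actions whose lookahead value exceeds the (time-varying) threshold."""
    return np.flatnonzero(np.asarray(q_row) > threshold)


# Usage: a two-state chain where only state 1 pays reward 1 per step.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 1] = 1.0  # action 1 moves 0 -> 1; state 1 absorbs
R = np.array([[0.0, 0.0], [1.0, 1.0]])
Q1 = k_step_lookahead_q(P, R, K=1)  # both actions look equal from state 0
Q2 = k_step_lookahead_q(P, R, K=2)  # one more step of lookahead prefers action 1
```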

2602.05999 2026-06-16 cs.LG Replacement

On the Role of Computation in Reinforcement Learning

论计算在强化学习中的作用

Raj Ghugare, Michał Bortkiewicz, Alicja Ziarko, Benjamin Eysenbach

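Affiliations: University of California, Berkeley

AI Summary: Formalizes compute-bounded policies and proves that policies using more compute can solve more complex problems and generalize to longer-horizon tasks. It also proposes an architecture that can use a variable amount of compute; experiments show that performance improves simply by using more compute.

AI Chinese Abstract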

强化学习策略可用的计算量如何影响其学习?使用固定参数量的策略能否从额外计算中获益?标准强化学习框架没有提供正式回答这些问题的语言。经验上,深度强化学习策略通常被参数化为具有静态架构的神经网络,混淆了计算量和参数量。在本文中,我们形式化了计算受限策略,并证明使用更多计算的策略可以解决那些计算较少的策略无法解决的问题,并泛化到更长视野的任务。基于算法学习和无模型规划方面的先前工作,我们提出了一种可以使用可变计算量的最小架构。我们的实验补充了我们的理论。在涵盖在线和离线强化学习的31个不同任务上,我们表明:(1) 仅通过使用更多计算,该架构就能实现更强的性能;(2) 与使用多达5倍参数的标准前馈网络或深度残差网络相比,该架构在更长视野的测试任务上具有更强的泛化能力。

English Abstract

How does the amount of compute available to a reinforcement learning (RL) policy affect its learning? Can policies using a fixed number of parameters still benefit from additional compute? The standard RL framework does not provide a language to answer these questions formally. Empirically, deep RL policies are often parameterized as neural networks with static architectures, conflating the amount of compute and the number of parameters. In this paper, we formalize compute-bounded policies and prove that policies which use more compute can solve problems and generalize to longer-horizon tasks that are outside the scope of policies with less compute. Building on prior work in algorithmic learning and model-free planning, we propose a minimal architecture that can use a variable amount of compute. Our experiments complement our theory. On a set of 31 different tasks spanning online and offline RL, we show that $(1)$ this architecture achieves stronger performance simply by using more compute, and $(2)$ it generalizes better on longer-horizon test tasks than standard feedforward networks or deep residual networks using up to 5 times more parameters.
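One common way to decouple compute from parameter count is weight tying, where a single residual block is applied a variable number of times. The sketch below is a hypothetical NumPy illustration of that idea, not the paper's architecture; the dimensions and initialization are arbitrary.

```python
import numpy as np


class WeightTiedPolicy:
    """Hypothetical policy: one shared residual block reused `n_steps` times.

    The parameter count is fixed by the three weight matrices; the amount of
    compute grows linearly with `n_steps`.
    """

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, obs_dim ** -0.5, (obs_dim, hidden_dim))
        self.W_block = rng.normal(0.0, hidden_dim ** -0.5, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0.0, hidden_dim ** -0.5, (hidden_dim, n_actions))

    def num_parameters(self):
        return self.W_in.size + self.W_block.size + self.W_out.size

    def logits(self, obs, n_steps):
        h = np.tanh(obs @ self.W_in)
        for _ in range(n_steps):
            h = h + np.tanh(h @ self.W_block)  # same weights at every iteration
        return h @ self.W_out


# Usage: identical parameters, two different compute budgets.
policy = WeightTiedPolicy(obs_dim=4, hidden_dim=16, n_actions=3)
obs = np.ones(4)
shallow = policy.logits(obs, n_steps=1)
deep = policy.logits(obs, n_steps=8)
```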

2602.06404 2026-06-16 cs.LG Replacement

Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach

分布式对抗性赌博机问题的近最优遗憾:一种黑盒方法

Hao Qiu, Mengxiao Zhang, Nicolò Cesa-Bianchi

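Affiliations: University of Iowa

AI Summary: For distributed adversarial bandits, proposes an algorithm based on a black-box reduction to bandits with delayed feedback that substantially improves the regret upper bound. A matching lower bound shows that the problem's difficulty decomposes into a communication cost and a bandit cost.

AI Chinese Abstract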

我们研究分布式对抗性赌博机问题,其中$N$个智能体合作最小化全局平均损失,同时仅观察自己的局部损失。我们证明该问题的极小极大遗憾为$\tilde{\Theta}(\sqrt{(\rho^{-1/2}+K/N)T})$,其中$T$是时间范围,$K$是动作数量,$\rho$是通信矩阵的谱间隙。我们的算法基于一种新颖的黑盒归约,将问题转化为带延迟反馈的赌博机问题,要求智能体仅通过gossip进行通信。该算法获得的上界显著优于Yi和Vojnovic(2023)之前的最佳上界$\tilde{O}(\rho^{-1/3}(KT)^{2/3})$。我们通过匹配的下界补充了这一结果,表明问题的难度分解为通信代价$\rho^{-1/4}\sqrt{T}$和赌博机代价$\sqrt{KT/N}$。我们进一步通过在分布式对抗性设置中推导一阶界和两全其美界,展示了我们方法的多功能性。最后,我们将框架扩展到$\mathbb{R}^d$中的分布式线性赌博机,通过体积生成器实现每个智能体每轮仅$O(d)$通信代价,获得$\tilde{O}(\sqrt{(\rho^{-1/2}+1/N)dT})$的遗憾界。

English Abstract

We study distributed adversarial bandits, where $N$ agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is $\tilde{\Theta}(\sqrt{(\rho^{-1/2}+K/N)T})$, where $T$ is the horizon, $K$ is the number of actions, and $\rho$ is the spectral gap of the communication matrix. Our algorithm, based on a novel black-box reduction to bandits with delayed feedback, requires agents to communicate only through gossip. It achieves an upper bound that significantly improves over the previous best bound $\tilde{O}(\rho^{-1/3}(KT)^{2/3})$ of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost $\rho^{-1/4}\sqrt{T}$ and a bandit cost $\sqrt{KT/N}$. We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in $\mathbb{R}^d$, obtaining a regret bound of $\tilde{O}(\sqrt{(\rho^{-1/2}+1/N)dT})$, achieved with only $O(d)$ communication cost per agent and per round via a volumetric spanner.
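Gossip communication and the spectral gap ρ can be illustrated on a ring of agents. The sketch uses one common convention, ρ = 1 minus the second-largest eigenvalue magnitude of W, which may differ from the paper's exact definition. The ring topology and the 1/3 weights are also assumptions. For a symmetric, doubly stochastic W, the disagreement after t rounds shrinks at least as fast as (1 − ρ)^t.

```python
import numpy as np


def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic gossip matrix: average with both ring neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def spectral_gap(W):
    """1 minus the second-largest eigenvalue magnitude of a symmetric W."""
    magnitudes = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return 1.0 - magnitudes[1]


def gossip(W, x, rounds):
    """Each round every agent replaces its value by a weighted average of its neighbours'."""
    for _ in range(rounds):
        x = W @ x
    return x


# Usage: 8 agents on a ring, 20 gossip rounds.
W = ring_gossip_matrix(8)
rho = spectral_gap(W)
x0 = np.arange(8.0)
x20 = gossip(W, x0, 20)
```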

2602.08026 2026-06-16 cs.LG stat.ML Replacement

Sharp analysis of linear ensemble sampling

线性集成采样的尖锐分析

David Janz, Arya Akhavan, Csaba Szepesvári

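Affiliations: University of Oxford, UK; University of Alberta, Canada

AI Summary: For linear ensemble sampling (ES) in stochastic linear bandits, proves that with ensemble size m = Θ(d log n), ES attains Õ(d^{3/2}√n) high-probability regret, closing the gap to the Thompson sampling benchmark at comparable computational cost.

AI Chinese Abstract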

我们分析了随机线性bandits中具有标准高斯扰动的线性集成采样(ES)。我们证明,对于集成大小$m=\Theta(d \log n)$,ES达到了$\tilde O(d^{3/2}\sqrt n)$的高概率遗憾,缩小了与汤普森采样基准的差距,同时保持计算量相当。证明通过将分析简化为$m$个独立布朗运动的时间一致超越问题,为线性bandits中的随机探索带来了新视角。这种连续时间视角在这里显得特别自然:它给出了相关离散时间过程的精确表示,而我们不知道有其他途径能得到尖锐的ES界。

English Abstract

We analyse linear ensemble sampling (ES) with standard Gaussian perturbations in stochastic linear bandits. We show that for ensemble size $m=\Theta(d\log n)$, ES attains $\tilde O(d^{3/2}\sqrt n)$ high-probability regret, closing the gap to the Thompson sampling benchmark while keeping computation comparable. The proof brings a new perspective on randomized exploration in linear bandits by reducing the analysis to a time-uniform exceedance problem for $m$ independent Brownian motions. This continuous-time lens appears particularly natural here: it yields an exact representation of the relevant discrete-time processes, and we do not know another route to a sharp ES bound.
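A minimal sketch of linear ensemble sampling with Gaussian perturbations. The m perturbed ridge-regression estimates share one Gram matrix, and each round a uniformly chosen member acts greedily. The prior perturbation, λ = 1, σ = 1, and the three-armed toy problem are assumptions for illustration. The ensemble size follows the m = Θ(d log n) scaling with constant 1.

```python
import numpy as np


class LinearEnsembleSampling:
    """Minimal linear ensemble sampling with Gaussian perturbations (illustrative sketch)."""

    def __init__(self, d, m, lam=1.0, sigma=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.m, self.sigma = m, sigma
        self.V = lam * np.eye(d)  # shared regularized Gram matrix
        # Row j holds member j's perturbed response vector; it starts from a
        # Gaussian prior perturbation so that members disagree before any data.
        self.B = np.sqrt(lam) * self.rng.standard_normal((m, d))

    def act(self, arms):
        j = self.rng.integers(self.m)                 # pick one member uniformly
        theta_j = np.linalg.solve(self.V, self.B[j])  # its perturbed ridge estimate
        return int(np.argmax(arms @ theta_j))         # act greedily for that member

    def update(self, x, y):
        self.V += np.outer(x, x)
        noise = self.sigma * self.rng.standard_normal(self.m)  # fresh perturbation per member
        self.B += np.outer(y + noise, x)


# Usage: three orthogonal arms with hypothetical mean rewards 0.9, 0.5, 0.1.
d, n = 3, 2000
m = int(np.ceil(d * np.log(n)))  # ensemble size of order d log n
arms = np.eye(d)
theta_star = np.array([0.9, 0.5, 0.1])
env_rng = np.random.default_rng(1)
agent = LinearEnsembleSampling(d, m)
choices = []
for _ in range(n):
    a = agent.act(arms)
    y = arms[a] @ theta_star + 0.1 * env_rng.standard_normal()
    agent.update(arms[a], y)
    choices.append(a)
```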

2602.08210 2026-06-16 cs.LG stat.ML Replacement

CADO: From Imitation to Cost Minimization for Heatmap-based Solvers in Combinatorial Optimization

CADO:从模仿到成本最小化的组合优化热力图求解器

Hyungseok Song, Deunsol Yoon, Kanghoon Lee, Han-Seul Jeong, Soonyoung Lee, Woohyung Lim

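Affiliations: LG AI Research

AI Summary: Targets the mismatch between imitation loss and cost minimization in the supervised training of heatmap-based solvers with CADO, a framework that fine-tunes with reinforcement learning to directly optimize the cost of decoded solutions, achieving state-of-the-art performance on multiple benchmarks.

Comments 22 pages, 4 figures. Accepted for publication in Transactions on Machine Learning Research (TMLR), 2026. OpenReview: https://openreview.net/forum?id=fvxx5FOED6

AI Chinese Abstract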

基于热力图的求解器已成为组合优化(CO)的一种有前景的范式。然而,我们认为主流的监督学习(SL)训练范式存在根本性的目标不匹配:最小化模仿损失(例如交叉熵)并不能保证解的成本最小化。我们将这种不匹配分解为两个缺陷:解码器盲区(忽视不可微的解码过程)和成本盲区(优先考虑结构模仿而非解的质量)。我们通过实验证明,这些内在缺陷施加了硬性性能上限。为了克服这一限制,我们提出了CADO(成本感知的优化扩散模型),一个简化的强化学习微调框架,将扩散去噪过程建模为MDP,以直接优化解码后的解成本。我们引入了标签中心奖励,将真实标签重新用作无偏基线而非模仿目标,以及混合微调以实现参数高效的适应。CADO在多个基准上取得了最先进的性能,验证了目标对齐对于释放热力图求解器全部潜力的必要性。

English Abstract

Heatmap-based solvers have emerged as a promising paradigm for Combinatorial Optimization (CO). However, we argue that the dominant Supervised Learning (SL) training paradigm suffers from a fundamental objective mismatch: minimizing imitation loss (e.g., cross-entropy) does not guarantee solution cost minimization. We dissect this mismatch into two deficiencies: Decoder-Blindness (being oblivious to the non-differentiable decoding process) and Cost-Blindness (prioritizing structural imitation over solution quality). We empirically demonstrate that these intrinsic flaws impose a hard performance ceiling. To overcome this limitation, we propose CADO (Cost-Aware Diffusion models for Optimization), a streamlined Reinforcement Learning fine-tuning framework that formulates the diffusion denoising process as an MDP to directly optimize the post-decoded solution cost. We introduce Label-Centered Reward, which repurposes ground-truth labels as unbiased baselines rather than imitation targets, and Hybrid Fine-Tuning for parameter-efficient adaptation. CADO achieves state-of-the-art performance across diverse benchmarks, validating that objective alignment is essential for unlocking the full potential of heatmap-based solvers.
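The Label-Centered Reward idea can be illustrated on a TSP instance. The sketch assumes, which the abstract does not state, that the reward of a decoded tour is the ground-truth label's cost minus the tour's cost. The label then acts as a baseline, and beating it yields a positive advantage. Only the advantage computation is shown; in a REINFORCE-style update it would scale the gradient of the log-probability of the sampled denoising trajectory.

```python
import numpy as np


def tour_length(coords, tour):
    """Length of a closed tour that visits `coords` in the given order."""
    pts = np.asarray(coords, dtype=float)[np.asarray(tour)]
    return float(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1).sum())


def label_centered_advantages(coords, sampled_tours, label_tour):
    """Advantage = label cost - sampled cost, so beating the label is positive."""
    baseline = tour_length(coords, label_tour)
    return np.array([baseline - tour_length(coords, t) for t in sampled_tours])


# Usage: unit square; the label is the perimeter tour, the second sample crosses itself.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
advantages = label_centered_advantages(square, [[0, 1, 2, 3], [0, 2, 1, 3]], [0, 1, 2, 3])
```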

2602.20804 2026-06-16 cs.LG cs.MA Replacement

Probing Dec-POMDP Reasoning in Cooperative MARL

探究合作多智能体强化学习中的Dec-POMDP推理

Kale-ab Tessera, Leonard Hinckeldey, Riccardo Zamboni, David Abel, Amos Storkey

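Affiliations: University of Edinburgh

AI Summary: Analyzes baseline policies with statistical and information-theoretic probes and finds that most benchmarks do not require genuine Dec-POMDP reasoning: reactive policies perform on par with memory-based ones, and coordination often relies on brittle synchronous coupling.

Comments To appear at the 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2026), added DOI

Journal ref AAMAS 2026

AI Chinese Abstract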

合作多智能体强化学习通常被建模为分散式部分可观测马尔可夫决策过程(Dec-POMDP),其难度源于两个关键挑战:部分可观测性和分散式协调。真正解决此类任务需要Dec-POMDP推理,即智能体利用历史推断隐藏状态并基于局部信息进行协调。然而,目前尚不清楚流行的基准测试是否真正需要这种推理,还是可以通过更简单的策略取得成功。我们引入了一套诊断工具,结合统计上可靠的性能比较和信息论探针,审计基线策略(IPPO和MAPPO)在涵盖MPE、SMAX、Overcooked、Hanabi和MaBrax的37个场景中的行为复杂度。我们的诊断表明,这些基准测试的成功很少需要真正的Dec-POMDP推理。在超过一半的场景中,反应策略的性能与基于记忆的智能体相当,并且涌现的协调常常依赖于脆弱的同步动作耦合,而非稳健的时间影响。这些发现表明,在当前训练范式下,一些广泛使用的基准测试可能未能充分测试核心的Dec-POMDP假设,可能导致对进展的过于乐观的评估。我们发布了诊断工具,以支持合作MARL中更严格的环境设计和评估。

English Abstract

Cooperative multi-agent reinforcement learning (MARL) is typically framed as a decentralised partially observable Markov decision process (Dec-POMDP), a setting whose hardness stems from two key challenges: partial observability and decentralised coordination. Genuinely solving such tasks requires Dec-POMDP reasoning, where agents use history to infer hidden states and coordinate based on local information. Yet it remains unclear whether popular benchmarks actually demand this reasoning or permit success via simpler strategies. We introduce a diagnostic suite combining statistically grounded performance comparisons and information-theoretic probes to audit the behavioural complexity of baseline policies (IPPO and MAPPO) across 37 scenarios spanning MPE, SMAX, Overcooked, Hanabi, and MaBrax. Our diagnostics reveal that success on these benchmarks rarely requires genuine Dec-POMDP reasoning. Reactive policies match the performance of memory-based agents in over half the scenarios, and emergent coordination frequently relies on brittle, synchronous action coupling rather than robust temporal influence. These findings suggest that some widely used benchmarks may not adequately test core Dec-POMDP assumptions under current training paradigms, potentially leading to over-optimistic assessments of progress. We release our diagnostic tooling to support more rigorous environment design and evaluation in cooperative MARL.
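One simple information-theoretic probe of the kind described is a plug-in estimate of mutual information between two agents' discrete actions. It can be computed at the same step, to measure synchronous coupling, and with a one-step lag, to measure temporal influence. The probe names and the lag-1 choice are hypothetical, and the released diagnostic tooling may use different estimators.

```python
import math
from collections import Counter


def plugin_mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # p(x, y) / (p(x) p(y)) = c * n / (count(x) * count(y))
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def coupling_probes(actions_a, actions_b):
    """Same-step coupling vs. one-step-lagged influence of agent A on agent B."""
    return {
        "synchronous": plugin_mutual_information(actions_a, actions_b),
        "lagged": plugin_mutual_information(actions_a[:-1], actions_b[1:]),
    }
```

A pair of agents that always act identically but carry no information across time scores about log 2 on the synchronous probe and close to zero on the lagged one. This is the "synchronous action coupling" pattern the abstract describes.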

2603.23249 2026-06-16 cs.LG cs.AI math.OC Replacement

A Learning Method with Gap-Aware Generation for Heterogeneous DAG Scheduling

一种具有间隙感知生成的异构DAG调度学习方法

Ruisong Zhou, Haijun Zou, Li Zhou, Chumin Sun, Zaiwen Wen

Affiliations: School of Mathematical Sciences, Peking University; State Key Laboratory of Mathematical Sciences, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences; Theory Lab, Central Research Institute, 2012 Labs, Huawei Technologies Co., Ltd.; Beijing International Center for Mathematical Research, Peking University

AI Summary: Proposes WeCAN, an end-to-end reinforcement learning framework that models task-pool compatibility with a weighted cross-attention encoder. It also introduces a skip-extended generation mechanism that eliminates scheduling gaps, outperforming strong baselines on real-world DAGs such as TPC-H.

Comments 31 pages, 8 figures

AI Chinese Abstract

有向无环图(DAG)的高效调度是大规模数据密集型计算系统的核心问题,其中查询计划、数据处理工作负载和计算图由依赖任务组成,这些任务竞争有限的异构资源池。在实践中,实现高性能执行需要调度器适应具有不同资源池和任务类型的环境,同时在严格运行时预算下生成调度。我们提出WeCAN,一种用于异构DAG调度的端到端强化学习框架,解决了任务-资源池兼容系数和生成诱导的最优性间隙。它采用两阶段单次通过设计:单次前向传播产生任务-资源池分数和全局参数,随后通过生成映射构建调度,无需重复网络调用。其加权交叉注意力编码器通过兼容系数门控建模任务-资源池交互,并且对环境波动具有规模无关性。此外,广泛使用的列表调度映射可能因受限可达性而产生生成诱导的最优性间隙。我们引入一种顺序空间分析,通过可行调度顺序刻画生成映射的可达集,解释生成诱导间隙的机制,并给出间隙消除的充分条件。在这些条件指导下,我们设计了一种跳序扩展实现,具有解析参数化的递减跳序规则,在保持单次通过效率的同时扩大可达顺序集。在真实TPC-H查询DAG、资源密集型工作负载数据集和ML编译器计算图上的实验表明,相比强基线,我们改善了完工时间,推理时间与经典启发式相当,且快于多轮神经调度器。

English Abstract

Efficient scheduling of directed acyclic graphs (DAGs) is a core problem in large-scale data-intensive computing systems, where query plans, data-processing workloads, and computation graphs consist of dependent tasks competing for limited heterogeneous resource pools. In practice, achieving high-performance execution requires schedulers to adapt across environments with varying resource pools and task types, while generating schedules under tight runtime budgets. We propose WeCAN, an end-to-end reinforcement learning framework for heterogeneous DAG scheduling that addresses task-pool compatibility coefficients and generation-induced optimality gaps. It adopts a two-stage single-pass design: a single forward pass produces task-pool scores and global parameters, followed by a generation map that constructs schedules without repeated network calls. Its weighted cross-attention encoder models task-pool interactions gated by compatibility coefficients, and is size-agnostic to environment fluctuations. Moreover, widely used list-scheduling maps can incur generation-induced optimality gaps from restricted reachability. We introduce an order-space analysis that characterizes the reachable set of generation maps via feasible schedule orders, explains the mechanism behind generation-induced gaps, and yields sufficient conditions for gap elimination. Guided by these conditions, we design a skip-extended realization with an analytically parameterized decreasing skip rule, which enlarges the reachable order set while preserving single-pass efficiency. Experiments on real-world TPC-H query DAGs, resource-intensive workload datasets, and ML-compiler computation graphs demonstrate improved makespan over strong baselines, with inference time comparable to classical heuristics and faster than multi-round neural schedulers.
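For context, the sketch below implements a plain list-scheduling map of the kind the abstract refers to. Policy scores fix a priority order, and each ready task goes to the compatible pool with the earliest finish time. A pool's clock only moves forward, so idle gaps are never back-filled. Each pool is modeled as a single machine, which is an assumption, and WeCAN's skip-extended rule is not reproduced here.

```python
import math


def list_schedule(durations, preds, scores):
    """Greedy list scheduling on heterogeneous pools (one machine per pool).

    durations[t][p]: runtime of task t on pool p (math.inf if incompatible)
    preds[t]:        tasks that must finish before t starts
    scores[t]:       priority, e.g. produced by one forward pass of a policy
    """
    n_pools = len(durations[0])
    pool_free = [0.0] * n_pools
    finish, placement = {}, {}
    remaining = set(range(len(durations)))
    while remaining:
        ready = [t for t in remaining if all(p in finish for p in preds[t])]
        if not ready:
            raise ValueError("dependency graph has a cycle")
        task = max(ready, key=lambda t: scores[t])  # highest score first
        ready_time = max((finish[p] for p in preds[task]), default=0.0)
        best = None
        for pool in range(n_pools):
            if math.isinf(durations[task][pool]):
                continue  # incompatible pool
            start = max(pool_free[pool], ready_time)
            end = start + durations[task][pool]
            if best is None or end < best[0]:
                best = (end, pool, start)
        if best is None:
            raise ValueError(f"task {task} has no compatible pool")
        end, pool, start = best
        pool_free[pool] = end  # no insertion into earlier idle gaps
        finish[task] = end
        placement[task] = (pool, start, end)
        remaining.remove(task)
    return placement, max(finish.values())


# Usage: a diamond DAG on a CPU pool (0) and a GPU pool (1); hypothetical runtimes.
inf = math.inf
durations = [[2, inf], [4, 1], [3, inf], [1, 1]]
preds = [[], [0], [0], [1, 2]]
placement, makespan = list_schedule(durations, preds, scores=[4, 3, 2, 1])
```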

2603.27450 2026-06-16 cs.LG Replacement

FlowRL: A Taxonomy and Modular Framework for Reinforcement Learning with Diffusion Policies

FlowRL:基于扩散策略的强化学习分类与模块化框架

Chenxiao Gao, Edward Chen, Tianyi Chen, Bo Dai

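AI Summary: Proposes a unified taxonomy of reinforcement learning algorithms with diffusion/flow policies, builds a modular JAX framework, and provides systematic comparisons on multiple benchmarks to guide algorithm design and application.

Comments accepted by RLC 2026

AI Chinese Abstract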

由于其显著的灵活性,扩散模型和流模型已成为策略表示的有前途的候选者。然而,在这些策略上进行有效的强化学习仍然是一个挑战,因为缺乏用于普通策略梯度估计器的显式对数概率。尽管已经提出了许多尝试来解决这个问题,但该领域缺乏统一的视角来调和这些看似不同的方法,从而阻碍了持续的发展。在本文中,我们通过引入一个针对扩散/流策略的强化学习算法的全面分类法来弥合这一差距。为了支持可重复性和快速原型设计,我们引入了一个基于JAX的模块化开源代码库,该库利用JIT编译进行高吞吐量训练。最后,我们在Gym-Locomotion、DeepMind Control Suite和IsaacLab上提供了系统化和标准化的基准测试,提供了基于扩散的方法的严格并排比较,并为从业者根据应用选择合适算法提供了指导。我们的工作为理解和算法设计建立了清晰的基础,为未来该领域的研究提供了高效工具包,并为生成模型和机器人领域的从业者提供了算法指南。我们的代码可在 https://github.com/typoverflow/flow-rl 获取。

English Abstract

Thanks to their remarkable flexibility, diffusion models and flow models have emerged as promising candidates for policy representation. However, efficient reinforcement learning (RL) upon these policies remains a challenge due to the lack of explicit log-probabilities for vanilla policy gradient estimators. While numerous attempts have been proposed to address this, the field lacks a unified perspective to reconcile these seemingly disparate methods, thus hampering ongoing development. In this paper, we bridge this gap by introducing a comprehensive taxonomy for RL algorithms with diffusion/flow policies. To support reproducibility and agile prototyping, we introduce a modular, JAX-based open-source codebase that leverages JIT-compilation for high-throughput training. Finally, we provide systematic and standardized benchmarks across Gym-Locomotion, DeepMind Control Suite, and IsaacLab, offering a rigorous side-by-side comparison of diffusion-based methods and guidance for practitioners to choose proper algorithms based on the application. Our work establishes a clear foundation for understanding and algorithm design, a high-efficiency toolkit for future research in the field, and an algorithmic guideline for practitioners in generative models and robotics. Our code is available at https://github.com/typoverflow/flow-rl.

2604.13085 2026-06-16 cs.LG cs.AI Updated version

Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments

Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, Satyasaran Changdar

Affiliations: GitHub

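AI summary: Proposes the Adaptive Memory Crystallization (AMC) architecture, conceptually inspired by synaptic tagging and capture theory. It supports continual reinforcement learning through a three-phase memory hierarchy governed by a stochastic differential equation, and it substantially improves forward transfer, reduces catastrophic forgetting, and lowers memory footprint across several benchmarks.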

Abstract

Autonomous AI agents operating in dynamic environments face a persistent challenge: acquiring new capabilities without erasing prior knowledge. We present Adaptive Memory Crystallization (AMC), a memory architecture for progressive experience consolidation in continual reinforcement learning. AMC is conceptually inspired by the qualitative structure of synaptic tagging and capture (STC) theory, the idea that memories transition through discrete stability phases, but makes no claim to model the underlying molecular or synaptic mechanisms. AMC models memory as a continuous crystallization process in which experiences migrate from plastic to stable states according to a multi-objective utility signal. The framework introduces a three-phase memory hierarchy (Liquid–Glass–Crystal) governed by an Itô stochastic differential equation (SDE) whose population-level behavior is captured by an explicit Fokker–Planck equation admitting a closed-form Beta stationary distribution. We provide proofs of: (i) well-posedness and global convergence of the crystallization SDE to a unique Beta stationary distribution; (ii) exponential convergence of individual crystallization states to their fixed points, with explicit rates and variance bounds; and (iii) end-to-end Q-learning error bounds and matching memory-capacity lower bounds that link SDE parameters directly to agent performance. Empirical evaluation on Meta-World MT50, Atari 20-game sequential learning, and MuJoCo continual locomotion consistently shows improvements in forward transfer (+34–43% over the strongest baseline), reductions in catastrophic forgetting (67–80%), and a 62% decrease in memory footprint.
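The abstract states that the crystallization SDE has a closed-form Beta stationary distribution, but it does not give the SDE itself. One standard diffusion with exactly this property is the Jacobi (Wright–Fisher type) process dX = κ(θ − X) dt + σ√(X(1 − X)) dW, whose stationary law is Beta(2κθ/σ², 2κ(1 − θ)/σ²). The Python sketch below assumes this form and simulates it with Euler–Maruyama. It then maps the state onto the Liquid–Glass–Crystal phases using hypothetical thresholds. The sketch illustrates the kind of dynamics described; it does not reproduce AMC's actual equations or utility signal.

```python
import math
import random


def beta_params(kappa, theta, sigma):
    """Stationary Beta(alpha, beta) of the assumed Jacobi SDE
    dX = kappa * (theta - X) dt + sigma * sqrt(X (1 - X)) dW."""
    s2 = sigma * sigma
    return 2 * kappa * theta / s2, 2 * kappa * (1 - theta) / s2


def simulate_crystallization(x0, kappa, theta, sigma, dt, steps, rng):
    """Euler-Maruyama path of the assumed SDE, clipped to stay in [0, 1]."""
    x = x0
    for _ in range(steps):
        diffusion = sigma * math.sqrt(max(x * (1 - x), 0.0))
        x += kappa * (theta - x) * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = min(max(x, 0.0), 1.0)
    return x


def phase(x, glass=0.33, crystal=0.66):
    """Hypothetical thresholds mapping a crystallization state to a phase name."""
    if x < glass:
        return "Liquid"
    if x < crystal:
        return "Glass"
    return "Crystal"
```

With κ = 2, θ = 0.7, and σ = 0.5, the stationary law is Beta(11.2, 4.8), whose mean is θ = 0.7. Long simulated paths started at any point in (0, 1) should therefore average close to 0.7.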

2605.01961 2026-06-16 cs.LG Updated version

Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

Maheed H. Ahmed, Mahsa Ghasemi

Affiliations: Electrical and Computer Engineering, Purdue University

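AI summary: For dueling bandits with heterogeneous multi-user preferences, adopts the Nash Social Welfare objective, which maximizes the product of user utilities. Proposes the Fair-Explore-Then-Commit and Fair-ε-Greedy algorithms and proves regret upper bounds that match the lower bound's dependence on T up to logarithmic factors.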

Abstract

Learning from human preference data is becoming a useful tool, from fine-tuning large language models to training reinforcement learning agents. However, in most scenarios, the model is trained on the average preference of all human evaluators, which, under large variations of preferences, can be unfair to minority groups. In this work, we consider fairness in dueling bandits, a standard framework for online learning from preference data. We assume that each user has a (potentially distinct) Condorcet winner, which is an arm preferred to every other arm. Using these user-specific Condorcet winners as reference points, we evaluate and score arms according to their performance relative to the corresponding winner. To promote fairness across heterogeneous users, we adopt the well-established Nash Social Welfare objective, which maximizes the product of user utilities, thereby inherently penalizing inequality and preventing the marginalization of any single user. Within this framework, we construct a hard instance to establish a regret lower bound of $\Omega(T^{2/3}\min(K,D)^{1/3})$ for a time horizon $T$, $K$ arms, and $D$ users, which, to the best of our knowledge, is the first result quantifying the cost of fairness in dueling bandits with heterogeneous preferences. We then present the Fair-Explore-Then-Commit and Fair-$\epsilon$-Greedy algorithms with a Condorcet winner identification phase. We further derive their regret upper bounds that match the lower-bound dependence on $T$ up to logarithmic factors.
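The Nash Social Welfare objective scores an arm by the product of per-user utilities. As a result, an arm that is excellent for a majority but poor for one user can lose to a balanced arm, even when the first arm has the higher total. The minimal Python sketch below demonstrates that effect. The utility table is invented for illustration (1.0 means "as good as that user's Condorcet winner"). The code does not implement the paper's utility definition or its Fair-Explore-Then-Commit and Fair-ε-Greedy algorithms.

```python
import math


def nash_social_welfare(utilities):
    """Product of per-user utilities, computed in log space for numerical stability."""
    if any(u <= 0 for u in utilities):
        return 0.0
    return math.exp(sum(math.log(u) for u in utilities))


def pick_arm(utility_table, objective):
    """utility_table[arm][user] -> utility; return the arm that maximizes objective."""
    return max(range(len(utility_table)), key=lambda a: objective(utility_table[a]))


# Hypothetical utilities in (0, 1] for 2 arms and 3 users.
table = [
    [1.0, 1.0, 0.2],  # arm 0: ideal for users 0 and 1, poor for user 2
    [0.7, 0.7, 0.7],  # arm 1: balanced across all users
]
utilitarian_choice = pick_arm(table, sum)             # total 2.2 vs 2.1 -> arm 0
nsw_choice = pick_arm(table, nash_social_welfare)     # product 0.2 vs 0.343 -> arm 1
```

Computing the product in log space avoids underflow when there are many users, and any user with zero utility drives the welfare to zero. This is the "no user is marginalized" property that the abstract refers to.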

2502.05163 2026-06-16 cs.CL cs.LG Updated version

Enhancing LLM Safety Through a Theoretical Minimax Game Lens

Yihe Deng, Yu Yang, Junkai Zhang, Wei Wang, Bo Li

Affiliations: University of California, Los Angeles; VirtueAI; University of Illinois Urbana-Champaign

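AI summary: Proposes a minimax reinforcement learning framework in which a data generator and a classifier co-evolve to produce high-quality multilingual safety data. Proves convergence to a Nash equilibrium, and enables a smaller model to surpass the state of the art by nearly 10% on English benchmarks with 4.5x faster inference.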

Comments 24 pages, 9 figures, 5 tables

Abstract

The rapid advancement of large language models (LLMs) necessitates effective mechanisms to ensure their responsible deployment by accurately distinguishing unsafe content from benign content. While substantial safety datasets are available in English, multilingual safety modeling remains underexplored due to limited open-source safety datasets in other languages. Even within English datasets, safe yet sensitive corner-case content is scarce, leading to shortcut learning by models and non-trivial false-positive rates. To mitigate these issues, we introduce a novel minimax reinforcement learning (RL) framework wherein a data generator and a classifier model co-evolve, facilitating the production of high-quality synthetic multilingual safety data. We theoretically formalize this interaction as a minimax game and rigorously demonstrate convergence to a Nash equilibrium. Empirical evaluations confirm that our synthetic data generation method significantly enhances the classifier model performance, enabling a substantially smaller model to surpass the state-of-the-art by nearly 10% on English benchmarks while achieving 4.5x faster inference speed. These results establish a scalable and efficient methodology for synthetic data generation, advancing the development of safer and more robust multilingual LLM deployments.

2505.09655 2026-06-16 cs.CL cs.LG Updated version

DRA-GRPO: Your GRPO Needs to Know Diverse Reasoning Paths for Mathematical Reasoning

Xiwen Chen, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hao Wang, Haiyu Wu, Huayu Li, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

Affiliations: Morgan Stanley; Clemson University; Arizona State University; Washington University in St. Louis; University of Notre Dame; University of Arizona

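AI summary: Addresses policy collapse in GRPO for mathematical reasoning, which arises because correctness rewards are non-injective. Proposes DRA, a diversity-aware reward adjustment framework based on submodular mutual information that de-biases gradient estimates through inverse propensity scoring. Reaches 58.2% average accuracy across five math benchmarks with little training data and low cost.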

Comments ACL2026

Abstract

Post-training LLMs with Reinforcement Learning, specifically Group Relative Policy Optimization (GRPO), has emerged as a paradigm for enhancing mathematical reasoning. However, standard GRPO relies on scalar correctness rewards that are often non-injective with respect to semantic content: distinct reasoning paths receive identical rewards. This leads to a Diversity-Quality Inconsistency, where the policy collapses into a narrow set of dominant modes while ignoring equally valid but structurally novel strategies. To bridge this gap, we propose Diversity-aware Reward Adjustment (DRA), a theoretically grounded framework that calibrates the reward signal using the semantic density of sampled groups. By leveraging Submodular Mutual Information (SMI), DRA implements an Inverse Propensity Scoring (IPS) mechanism that effectively de-biases the gradient estimation. This creates a repulsive force against redundancy, driving the policy to achieve better coverage of the high-reward landscape. Our method is plug-and-play and integrates seamlessly with GRPO variants. Empirical evaluations on five math benchmarks demonstrate that DRA-GRPO consistently outperforms strong baselines, achieving an average accuracy of 58.2% on DeepSeek-R1-Distill-Qwen-1.5B using only 7,000 training samples at a cost of $55, highlighting the critical role of diversity calibration in data-efficient alignment. The code is available at https://github.com/xiwenc1/DRA-GRPO.
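Standard GRPO standardizes rewards within each sampled group. As a result, several correct but near-identical completions receive the same advantage as a correct, structurally different one. The sketch below first computes the group-relative advantage. It then applies a hypothetical diversity adjustment that divides each reward by one plus that completion's summed similarity to the rest of its group, loosely in the spirit of inverse propensity weighting. The similarity matrix and the `diversity_adjusted_rewards` rule are assumptions. DRA itself is built on submodular mutual information, whose exact form the abstract does not give.

```python
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def diversity_adjusted_rewards(rewards, similarity):
    """Hypothetical stand-in for DRA: down-weight each reward by the semantic
    density of its completion, i.e. its summed similarity to the other completions."""
    adjusted = []
    for i, r in enumerate(rewards):
        density = sum(similarity[i][j] for j in range(len(rewards)) if j != i)
        adjusted.append(r / (1.0 + density))
    return adjusted
```

Consider a group with rewards `[1, 1, 1, 0]` in which completions 0 and 1 are near-duplicates (similarity 0.9). Plain GRPO gives all three correct answers the same advantage. After the adjustment, the distinct correct completion 2 receives a larger advantage than the two redundant ones.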

2509.06108 2026-06-16 cs.CG cs.LG Updated version

Using Reinforcement Learning to Optimize the Global and Local Crossing Number

Timo Brand, Henry Förster, Stephen Kobourov, Daniel Kohrt, Robin Schukrafft, Markus Wallinger, Johannes Zink

Affiliations: Technical University of Munich, Heilbronn, Germany; John Cabot University, Rome, Italy; Technical University of Munich, Garching, Germany

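AI summary: Treats graph drawing as a single-player optimization game and uses reinforcement learning to reduce edge crossings by moving vertices. The learned strategies target either the global or the local crossing number and are competitive for minimizing the local crossing number.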

Abstract

Graph drawing concerns the algorithmic visualization of graphs. A good drawing of a graph is easy to read and facilitates solving tasks on the graph. Several properties have been identified to occur in good drawings of graphs. Such properties include a low number of crossings, large angles between edges, short edges, and depicting symmetries. Many of these properties are explicitly measurable metrics. This brings us to the insight that graph drawing can be seen as a game. In this paper, we study a single-player optimization game in which the player iteratively moves vertices of a straight-line graph drawing to reduce edge crossings. This game arose naturally from the automatic track of the Graph Drawing Challenge, where solutions are obtained by repeatedly performing local vertex movements. We formalize this process as a game with full information and investigate whether reinforcement learning can discover effective strategies for playing it. Our reinforcement-learning agent observes the local geometric and structural context of a vertex and selects a movement direction with the goal of reducing either the global or the local crossing number, that is, the total number of crossings or the maximum number of crossings per edge. We compare the resulting strategies to existing methods and established crossing-minimization heuristics on standard benchmark graphs. While our approach does not out-compete state-of-the-art methods for minimizing the global crossing number, it is competitive and often superior for minimizing the local crossing number.
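The two objectives in this game are simple to state in code. The global crossing number counts all pairwise crossings in a straight-line drawing, and the local crossing number is the maximum number of crossings on any single edge. The Python sketch below computes both and then performs one greedy vertex move over eight directions with a fixed step size. The greedy `best_move` is a hypothetical stand-in for a single turn of the game, not the paper's reinforcement-learning agent. Edges that share an endpoint, or that merely touch, are not counted as crossings.

```python
from itertools import combinations


def _orient(p, q, r):
    """Twice the signed area of triangle pqr (> 0 if counter-clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _cross(a, b, c, d):
    """True if segments ab and cd cross at an interior point (touching is ignored)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0 and
            _orient(c, d, a) * _orient(c, d, b) < 0)


def crossings_per_edge(pos, edges):
    counts = {e: 0 for e in edges}
    for e, f in combinations(edges, 2):
        if set(e) & set(f):  # adjacent edges cannot cross in a straight-line drawing
            continue
        if _cross(pos[e[0]], pos[e[1]], pos[f[0]], pos[f[1]]):
            counts[e] += 1
            counts[f] += 1
    return counts


def global_crossings(pos, edges):
    return sum(crossings_per_edge(pos, edges).values()) // 2


def local_crossings(pos, edges):
    return max(crossings_per_edge(pos, edges).values(), default=0)


DIRECTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def best_move(pos, edges, v, step, objective):
    """One greedy turn: try 8 directions for vertex v and return the new position
    if it strictly improves the objective, else None."""
    best, best_score = None, objective(pos, edges)
    for dx, dy in DIRECTIONS:
        trial = dict(pos)
        trial[v] = (pos[v][0] + dx * step, pos[v][1] + dy * step)
        score = objective(trial, edges)
        if score < best_score:
            best, best_score = trial[v], score
    return best
```

For K4 drawn on the corners of a unit square, the two diagonals cross once, so both metrics equal 1. Moving one corner inward by 0.7 along (−1, −1) yields a planar drawing. Passing `local_crossings` instead of `global_crossings` as the objective switches the game to the local variant.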

2510.01444 2026-06-16 cs.AI cs.CL cs.LG Updated version

Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning

Rui Liu, Dian Yu, Tong Zheng, Runpeng Dai, Zongxia Li, Wenhao Yu, Zhenwen Liang, Linfeng Song, Haitao Mi, Pratap Tokekar, Dong Yu

Affiliations: Tencent Hunyuan; University of Maryland; University of North Carolina

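AI summary: Proposes DUPL, which quantifies perceptual uncertainty and output uncertainty to guide policy updates. It substantially improves accuracy on multiple multimodal reasoning benchmarks and outperforms existing methods.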

Abstract

Reinforcement learning with verifiable rewards (RLVR) has advanced reasoning capabilities in multimodal large language models. However, existing methods typically treat visual inputs as deterministic, overlooking the perceptual ambiguity inherent to the visual modality. Consequently, they fail to distinguish whether a model's uncertainty stems from complex reasoning or ambiguous perception, preventing the targeted allocation of exploration or learning signals. To address this gap, we introduce DUPL, a dual-uncertainty guided policy learning approach for multimodal RLVR that quantifies and leverages both perceptual uncertainty (via symmetric KL divergence) and output uncertainty (via policy entropy) to guide policy updates. By establishing an uncertainty-driven feedback loop and employing a dynamic branch prioritization mechanism, DUPL recalibrates the policy advantage to focus learning on states with high perceptual or decisional ambiguity, enabling effective targeted exploration beyond passive data augmentation. Evaluated on diverse multimodal reasoning benchmarks spanning mathematical and general domains, DUPL achieves solid gains. It improves Qwen2.5-VL accuracy by up to 12.3% (3B) and 7.9% (7B), and Qwen3-VL-Instruct by up to 10.7% (4B) and 12.4% (8B), consistently outperforming GRPO, while seamlessly generalizing to alternative algorithms (DAPO, +6.5% avg) and architectures (LLaVA-OneVision-1.5, +4.7% avg). These results demonstrate that DUPL is an effective and generalizable approach for multimodal RLVR.
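The two uncertainty signals named in the abstract are standard quantities: policy entropy over the output distribution, and a symmetric KL divergence. One plausible reading, assumed here, is that the symmetric KL compares the policy's output distribution given the original visual input with the distribution given a perturbed version of that input. The sketch computes both quantities. It then applies a hypothetical multiplicative advantage reweighting (`reweight_advantage`, with an invented coefficient `lam`). DUPL's actual feedback loop and branch prioritization are not reproduced.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p); some works average the two directions instead."""
    return kl(p, q) + kl(q, p)


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def reweight_advantage(advantage, perceptual_u, output_u, lam=0.5):
    """Hypothetical rule: amplify the learning signal on ambiguous states."""
    return advantage * (1.0 + lam * (perceptual_u + output_u))


# Assumed usage: p_clean / p_perturbed are next-token distributions given the
# original and a perturbed image; p_clean also supplies the output entropy.
p_clean, p_perturbed = [0.7, 0.2, 0.1], [0.4, 0.4, 0.2]
adv = reweight_advantage(1.0, symmetric_kl(p_clean, p_perturbed), entropy(p_clean))
```

The symmetric form is used because plain KL is asymmetric, and neither input distribution is a natural reference when measuring perceptual sensitivity.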

2510.06647 2026-06-16 stat.ML cs.LG Updated version

Q-Learning with Fine-Grained Gap-Dependent Regret

Haochen Zhang, Zhong Zheng, Lingzhou Xue

Affiliations: Department of Statistics, The Pennsylvania State University

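AI summary: For episodic tabular Markov decision processes, establishes fine-grained gap-dependent regret bounds for both UCB-based and non-UCB-based model-free algorithms, and fixes flaws in the design and analysis of the AMB algorithm.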

Abstract

We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain coarse and fail to fully capture the structure of suboptimality gaps. We address this limitation by establishing fine-grained gap-dependent regret bounds for both UCB-based and non-UCB-based algorithms. In the UCB-based setting, we develop a novel analytical framework that explicitly separates the analysis of optimal and suboptimal state-action pairs, yielding the first fine-grained regret upper bound for UCB-Hoeffding (Jin et al., 2018). To highlight the generality of this framework, we introduce ULCB-Hoeffding, a new UCB-based algorithm inspired by AMB (Xu et al., 2021) but with a simplified structure, which enjoys fine-grained regret guarantees and empirically outperforms AMB. In the non-UCB-based setting, we revisit AMB, the only known algorithm in this setting, and identify two key issues in its algorithm design and analysis: improper truncation in the $Q$-updates and violation of the martingale difference condition in its concentration argument. We propose a refined version of AMB that addresses these issues, establishing the first rigorous fine-grained gap-dependent regret for a non-UCB-based method, with experiments demonstrating improved performance over AMB.
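For reference, UCB-Hoeffding (Jin et al., 2018) is tabular Q-learning with optimistic initialization Q = H. On the t-th visit to (h, s, a), it uses the learning rate α_t = (H + 1)/(H + t) and adds a Hoeffding bonus b_t = c·√(H³ι/t). The sketch below follows that update rule as recalled from Jin et al. The constant `c` and the log factor `iota` are left as inputs. The sketch implements neither the paper's fine-grained analysis nor ULCB-Hoeffding or AMB.

```python
import math
from collections import defaultdict


class UCBHoeffding:
    """Tabular Q-learning with a Hoeffding-style UCB bonus (steps h = 0..H-1)."""

    def __init__(self, num_actions, horizon, c=1.0, iota=1.0):
        self.A, self.H, self.c, self.iota = num_actions, horizon, c, iota
        self.Q = defaultdict(lambda: float(horizon))  # optimistic initialization Q = H
        self.N = defaultdict(int)                     # visit counts per (h, s, a)

    def V(self, h, s):
        if h >= self.H:  # no reward after the final step
            return 0.0
        return min(float(self.H), max(self.Q[(h, s, a)] for a in range(self.A)))

    def act(self, h, s):
        return max(range(self.A), key=lambda a: self.Q[(h, s, a)])

    def update(self, h, s, a, r, s_next):
        key = (h, s, a)
        self.N[key] += 1
        t = self.N[key]
        alpha = (self.H + 1) / (self.H + t)
        bonus = self.c * math.sqrt(self.H ** 3 * self.iota / t)
        target = r + self.V(h + 1, s_next) + bonus
        self.Q[key] = (1 - alpha) * self.Q[key] + alpha * target
```

On the first visit, α = 1, so the optimistic initial value is fully replaced by reward + next value + bonus. Because α_t decays roughly like H/t, later updates place more weight on recent targets than a plain 1/t average would.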

2510.12560 2026-06-16 cs.CV cs.LG cs.RO Updated version

CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

Xiaoji Zheng, Ziyuan Yang, Yanhao Chen, Yuhang Peng, Yuanrong Tang, Gengyuan Liu, Bokui Chen, Jiangtao Gong

Affiliations: University of Science and Technology of China; Tsinghua University

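AI summary: Proposes CoIRL-AD, which decouples imitation learning from reinforcement learning, uses a latent world model for long-horizon reward estimation, and introduces a competition mechanism. The framework improves the robustness of autonomous driving under offline training, especially for cross-city generalization and long-tail scenarios.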

Comments 19 pages, 22 figures, ICML 2026

Abstract

End-to-end autonomous driving models trained with imitation learning (IL) often generalize poorly, particularly in long-tail scenarios where expert demonstrations are sparse. Reinforcement learning (RL) can provide complementary task-level supervision, but applying RL to real-world autonomous driving is challenging in offline settings without interactive simulators, where datasets are dominated by expert actions and provide limited behavioral diversity. We propose CoIRL-AD, a competitive dual-policy framework that integrates IL and RL under a unified offline training regime. CoIRL-AD decouples imitation and reward optimization into separate actors to alleviate objective conflicts, uses imagined future rollouts for long-horizon reward estimation, and introduces a competition mechanism that selectively transfers beneficial behaviors while keeping RL anchored to expert-like driving. Experiments on the nuScenes benchmark show that CoIRL-AD consistently improves robustness over strong IL-based baselines, with especially large gains in cross-city generalization and long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.

2512.22560 2026-06-16 cs.DC cs.AI cs.LG Updated version

RollArt: Disaggregated Multi-Task Agentic RL Training at Scale

Wei Gao, Yuheng Zhao, Tianyuan Wu, Shaopan Xiong, Weixun Wang, Dakai An, Lunxi Cao, Dilxat Muhtar, Zichen Liu, Haizhou Zhao, Ju Huang, Siran Yang, Yongbin Li, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng, Wei Wang

Affiliations: HKUST; Alibaba Group; Tongyi Lab, Alibaba

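AI summary: Presents RollArt, a system that disaggregates the RL pipeline across heterogeneous hardware for efficient multi-task agentic RL training, reducing training time by 1.31–2.05x compared with existing systems.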

Comments 19 pages, 15 figures

Abstract

Agentic Reinforcement Learning (RL) trains LLMs through multi-turn interactions with environments, producing workloads that mix compute-bound prefill, bandwidth-bound decoding, CPU-heavy environment execution, and bursty reward evaluation. Existing systems either colocate all stages on a single GPU cluster or decouple them only at a coarse granularity, overlooking hardware heterogeneity and incurring substantial synchronization overhead across stages. We present ROLLART, a system for multi-task agentic RL on disaggregated infrastructure. ROLLART maps each pipeline stage to best-fit hardware, routing prefill-heavy tasks to compute-optimized GPUs, decode-heavy tasks to bandwidth-optimized GPUs, and environments to CPU clusters. It decouples rollout at the trajectory level, allowing generation, environment interaction, and reward scoring to proceed independently, so that slow or failed environments never block the others. ROLLART offloads stateless reward computation to serverless infrastructure and overlaps rollout with training via staleness-bounded asynchronous weight synchronization. Our results demonstrate that ROLLART effectively improves training throughput and achieves a 1.31–2.05× reduction in training time compared to various RL systems. We also evaluated ROLLART by training a hundreds-of-billions-parameter MoE model for the Qoder product on an Alibaba cluster with more than 3,000 GPUs, demonstrating its stability and scalability.
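Staleness-bounded asynchronous weight synchronization can be sketched as version bookkeeping. Rollout workers keep generating with the weights they already hold, and they are forced to resync only once they fall more than a fixed number of trainer updates behind. The class below is a hypothetical single-process illustration. `StalenessBoundedSync`, `max_staleness`, and the method names are not RollArt's API, and the real system must also move weights between heterogeneous GPU pools.

```python
class StalenessBoundedSync:
    """Hypothetical sketch: workers may lag the trainer by at most max_staleness versions."""

    def __init__(self, max_staleness):
        self.max_staleness = max_staleness
        self.trainer_version = 0
        self.worker_versions = {}

    def register(self, worker):
        """A new rollout worker starts from the current trainer weights."""
        self.worker_versions[worker] = self.trainer_version

    def publish(self):
        """The trainer finished an optimizer step; a new weight version exists."""
        self.trainer_version += 1

    def must_sync(self, worker):
        return self.trainer_version - self.worker_versions[worker] > self.max_staleness

    def maybe_sync(self, worker):
        """Pull the latest weights only when the staleness bound would be violated."""
        if self.must_sync(worker):
            self.worker_versions[worker] = self.trainer_version
            return True
        return False
```

With `max_staleness = 1`, a worker can keep generating through one trainer update without pausing, but it must resync after the second update. A larger bound trades more off-policy data for less time spent waiting on weight transfers.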

2602.13197 2026-06-16 cs.RO cs.CV cs.LG Updated version

Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos

Albert J. Zhai, Kuo-Hao Zeng, Jiasen Lu, Ali Farhadi, Shenlong Wang, Wei-Chiu Ma

Affiliations: University of Illinois Urbana-Champaign; Allen Institute for AI; University of Washington; Cornell University

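AI summary: Proposes the Perceive-Simulate-Imitate framework, which filters grasp-trajectory pairs derived from human videos in simulation to learn task-oriented grasping and post-grasp motion policies. It achieves robust manipulation without any robot data.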

Comments Transactions on Machine Learning Research (TMLR)

Abstract

The ability to learn manipulation skills by watching videos of humans has the potential to unlock a new source of highly scalable data for robot learning. Here, we tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions. Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors, especially for robots without human-like hands. A promising way forward is to use a modular policy design, leveraging a dedicated grasp generator to produce stable grasps. However, arbitrary stable grasps are often not task-compatible, hindering the robot's ability to perform the desired downstream motion. To address this challenge, we present Perceive-Simulate-Imitate (PSI), a framework for training a modular manipulation policy using human video motion data processed by paired grasp-trajectory filtering in simulation. This simulation step extends the trajectory data with grasp suitability labels, which allows for supervised learning of task-oriented grasping capabilities. We show through real-world experiments that our framework can be used to learn precise manipulation skills efficiently without any robot data, resulting in significantly more robust performance than using a grasp generator naively.

2605.29796 2026-06-16 cs.AI cs.CL cs.LG Updated version

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang, Zerui Chen, Qinggang Zhang, Jinsong Su

Affiliations: School of Informatics, Xiamen University; School of Artificial Intelligence, Jilin University

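AI summary: Proposes SAAS, an RL framework that combines search boundary modeling, a boundary-aware reward, and a stage-wise optimization strategy to give LLM agents dynamic self-awareness. It substantially reduces over-search without sacrificing accuracy.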

Abstract

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite their effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even when adequate evidence has been collected. The lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To this end, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which leverages a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search while maintaining accuracy. Our code and implementation details are released at https://github.com/XMUDeepLIT/SAAS.

4. Generative Models and Probabilistic Modeling: 39 papers

2606.15048 2026-06-16 cs.LG cs.CV New submission

Temporal Difference Learning for Diffusion Models

Qizhen Ying, Yangchen Pan, Victor Adrian Prisacariu, Junfeng Wen

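AI summary: Proposes a temporal difference (TD) objective that treats the diffusion process as a Markov reward process and applies policy evaluation from reinforcement learning to enforce cross-time consistency along the denoising trajectory. The objective substantially improves generation quality with few sampling steps.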

Comments 15 pages, 4 figures. Accepted at ICML 2026

Abstract

Diffusion models are typically trained with objectives that focus on local denoising targets at individual time steps (or adjacent pairs), which do not enforce consistency between predictions along the denoising trajectory. This lack of cross-time consistency can degrade performance, especially for few-step samplers. We introduce a temporal difference (TD) objective that penalizes inconsistency of the model's multi-step progress along the denoising path. By reformulating the diffusion process as a Markov reward process and casting denoising as a policy evaluation problem in reinforcement learning, we derive a unified TD approach that applies to both discrete- and continuous-time diffusion formulations. We further propose a principled sample-based reweighting method that stabilizes training. Empirically, we show that using our TD training can significantly improve sample quality measured by FID, with stronger advantages when the number of sampling steps is small, highlighting its practical utility under low-computation-budget scenarios. We provide ablation studies to justify our design choices, including pairwise loss reweighting, regularization weight, and one-step stride. Overall, our TD approach can be a general drop-in that enforces cross-time consistency and improves generation quality across different diffusion generative models.
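The cross-time consistency idea can be seen in a scalar toy. Take the model's clean-sample (x0) prediction at step t, move one step along the denoising path, predict again, and penalize the squared difference, treating the later prediction as a fixed bootstrapped target as in TD learning. The sketch below moves along the path with a deterministic DDIM step (η = 0) under a cumulative noise schedule `alpha_bar`. Both the DDIM transition and the unweighted squared residual are assumptions. The paper's Markov-reward-process formulation and its sample-based reweighting are not reproduced.

```python
import math


def ddim_step(x_t, x0_hat, ab_t, ab_prev):
    """Deterministic DDIM update (eta = 0) from time t to an earlier time."""
    eps_hat = (x_t - math.sqrt(ab_t) * x0_hat) / math.sqrt(1.0 - ab_t)
    return math.sqrt(ab_prev) * x0_hat + math.sqrt(1.0 - ab_prev) * eps_hat


def td_residual(predict_x0, x_t, t, t_prev, alpha_bar):
    """Squared disagreement between the x0 prediction at t and the prediction
    made after one denoising step to t_prev (the bootstrapped target)."""
    x0_now = predict_x0(x_t, t)
    x_prev = ddim_step(x_t, x0_now, alpha_bar[t], alpha_bar[t_prev])
    # In training, x0_next would be held fixed (stop-gradient), as in TD learning.
    x0_next = predict_x0(x_prev, t_prev)
    return (x0_now - x0_next) ** 2
```

A predictor that always returns the same x0 is trivially consistent and has zero residual. A predictor such as `lambda x, t: x` changes its answer after the step and is penalized, even though neither predictor is scored against the data here.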

2606.15172 2026-06-16 cs.LG New submission

Towards a Unified Generative Model for Scarce Time Series with Domain Experts

Zihao Yao, Qi Zheng, Jiankai Zuo, Yaying Zhang

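AI summary: Proposes TimeMoDE, which combines Diffusion Transformers with Mixture-of-Experts and uses domain prompts and diffusion-timestep signals to generate high-quality time series under data scarcity. It substantially outperforms existing methods.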

Abstract

Synthesizing realistic time series with generative models has wide-ranging applications in real-world scenarios. Despite recent progress, most existing methods are trained under the assumption of abundant training data, which substantially limits their effectiveness in data-scarce settings. In this paper, we propose TimeMoDE, a novel framework that integrates Diffusion Transformers with Mixture-of-Experts to exploit both domain adaptability and diffusion-stage awareness for time series generation under data scarcity. It is pre-trained on a large-scale collection of multi-domain datasets to extract domain-agnostic temporal representations and domain-specific information that benefit generalization during fine-tuning. We propose Domain Prompts to condition expert assignment for indistinguishable noised tokens, mitigating the limitations of capturing inter-dataset relationships. Moreover, we incorporate diffusion timestep signals to equip the experts with awareness of time series degradation variations, facilitating adaptive calibration to stage-dependent denoising requirements. Extensive experiments demonstrate that TimeMoDE outperforms existing methods under diverse low-data settings. It establishes an innovative paradigm for advanced time series few-shot generation.

2606.15327 2026-06-16 cs.LG New submission

Semantic DLM+: Improving Diffusion Language Models through Bias-variance Trade-off in Transition Kernel Design

Keyue Jiang, Yuxiang Wang, Yanan Zhao, Xiang Yu, Qifang Zhao, Bohan Tang, Baojian Zhou, Yanghua Xiao, Lin Qu, Xiaoxiao Xu

Affiliations: Alibaba Group; Fudan University; University College London; Nanyang Technological University; University of Oxford

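AI summary: Identifies three key factors of generalization error in diffusion language models and proposes SemDLM+, which resolves the semantic basin problem with a global transition and a semantic-frequency penalty. It improves training dynamics and generation quality on LM1B and OpenWebText.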

Abstract

Diffusion Language Models (DLMs) have demonstrated strong scaling capacity as alternatives to autoregressive language models. However, their performance is highly sensitive to the choice of transition kernels, and poorly designed kernels can lead to issues like training instability, slow convergence, and biased sampling. In this paper, we study this sensitivity through a principled analysis of generalization error and identify three critical factors: asymptotic bias (difficulty in approximating the posterior distribution), exposure bias (error propagation during sampling), and optimization variance induced by kernel dispersion. We further compare different transition kernels: masking diffusion yields sparse and easier posterior-approximation targets, while uniform diffusion provides stronger sampling-side repair but induces harder approximation. Motivated by this trade-off, we revisit a previously overlooked variant, semantic DLM (SemDLM), where the transition kernel corrupts tokens to neighborhoods that are semantically similar. Our theory suggests that SemDLM can serve as a plausible middle ground by reducing the posterior approximation difficulty of uniform diffusion while retaining repair ability. However, we find that SemDLM suffers from a semantic basin problem, where sampling repeatedly stays within a semantic region and produces low-diversity text. To address this, we propose SemDLM+, which adds a global transition and a semantic-frequency penalty during sampling. Experiments on LM1B and OpenWebText show that SemDLM+ improves training dynamics and achieves competitive language modeling and generation quality with satisfactory diversity.
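The three transition kernels compared in the abstract differ only in where a corrupted token goes. Masking diffusion sends it to an absorbing [MASK] state, uniform diffusion sends it to any vocabulary token, and SemDLM sends it to a semantically similar token. The sketch below implements a single forward-corruption step for each kernel, plus a hypothetical form of the semantic-frequency penalty applied during sampling. The neighbor lists, the `cluster_of` map, and the linear penalty are invented for illustration and are not the paper's definitions. The global transition of SemDLM+ is not shown.

```python
import random
from collections import Counter

MASK = "[MASK]"


def corrupt(token, beta_t, kernel, vocab, neighbors, rng):
    """One forward-corruption step for a single token with probability beta_t.

    kernel: "mask"     -> absorbing [MASK] state (masking diffusion)
            "uniform"  -> any vocabulary token (uniform diffusion)
            "semantic" -> a token from a hypothetical semantic neighborhood (SemDLM)
    """
    if rng.random() >= beta_t:
        return token
    if kernel == "mask":
        return MASK
    if kernel == "uniform":
        return rng.choice(vocab)
    if kernel == "semantic":
        return rng.choice(neighbors.get(token, [token]))
    raise ValueError(f"unknown kernel: {kernel}")


def semantic_frequency_penalty(logits, generated, cluster_of, strength=1.0):
    """Hypothetical sampling-time penalty: lower the logits of tokens whose semantic
    cluster already appears often in the generated text, to escape a 'semantic basin'."""
    counts = Counter(cluster_of[t] for t in generated)
    return {tok: score - strength * counts[cluster_of[tok]] for tok, score in logits.items()}
```

The masking kernel always lands in a single state, which makes the reverse posterior easy to approximate. The uniform and semantic kernels leave a real token in place, which the sampler can later revise; this is the repair-versus-approximation trade-off described in the abstract.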

2606.15332 2026-06-16 cs.LG New submission

Probabilistic Signature Inversion: Learning Conditional Distributions from Truncated Signatures

Junoh Kang, Kiseop Lee, Bohyung Han

Affiliations: ECE & IPAI, Seoul National University; Department of Statistics, Purdue University

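AI summary: Addresses the ill-posedness of truncated-signature inversion with a probabilistic framework. A signature-conditioned flow matching model learns the conditional distribution of paths, and the Bayes-optimal error under linear statistics is derived as a theoretical baseline.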

Abstract

The signature transform is a principled feature map for continuous-time paths, valued for its uniqueness and universality. Recovering a path from its truncated signature is, however, structurally ill-posed because the truncated signature map is not injective. We therefore reframe truncated signature inversion as a probabilistic problem -- learning the conditional distribution of a path given its truncated signature -- and adopt a signature-conditioned flow matching model as a practical estimator. This probabilistic formulation elucidates the fundamental difficulty of inversion: Bayes reconstruction error quantifies the irreducible uncertainty remaining after conditioning on a statistic. We derive the Bayes-optimal error under linear statistics, obtaining a closed form for log-GBM and numerically tractable formulas for log-fBM and OU, yielding a concrete theoretical baseline for model validation. This baseline upper-bounds the Bayes error under truncated-signature conditioning, since truncated signatures provide richer information than linear statistics. Experiments show that empirical reconstruction errors under linear-statistics conditioning faithfully align with the theory-derived baseline, while errors decrease when the statistic is replaced with truncated signatures. Moreover, generated paths faithfully recover the conditioning signature while preserving key distributional and temporal structures, indicating that the estimator is well-calibrated to the target conditional distribution. Together, these results establish a well-posed probabilistic framework for truncated-signature inversion, with applicability demonstrated on real financial data beyond the parametric process families covered by theory.
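The non-injectivity of the signature map can be checked directly. The sketch below computes the depth-2 signature of a piecewise-linear path using Chen's identity: each linear segment with increment $\Delta$ contributes $(\Delta, \Delta\otimes\Delta/2)$. It then shows that a path with an out-and-back detour has exactly the same truncated signature as the path without it. The name `signature_depth2` and the two example paths are illustrative; this is not the paper's flow matching estimator.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path given as an (n, d) array of points."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)       # level 1: total increment
    s2 = np.zeros((d, d))  # level 2: iterated integrals
    for delta in np.diff(path, axis=0):
        # Chen's identity: append a segment whose signature is (delta, outer(delta, delta) / 2)
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2

p = [[0, 0], [1, 0], [1, 1]]                  # right, then up
q = [[0, 0], [1, 0], [2, 0], [1, 0], [1, 1]]  # same, with an out-and-back detour along x
s1p, s2p = signature_depth2(p)
s1q, s2q = signature_depth2(q)
```

The detour is tree-like, so the two paths in fact share their full signature. Truncating at depth 2 loses more: it also identifies any two paths with the same increment and the same Lévy area, the antisymmetric part of level 2 (here 0.5). The symmetric part of level 2 is already fixed by level 1.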

2606.15359 2026-06-16 cs.LG New submission

DiRecT: Safe Diffusion-Based Planning via Receding-Horizon Denoising

Paolo Giaretta, Zeyang Li, Navid Azizan

Affiliations: MIT

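AI summary: Proposes DiRecT, which uses stochastic optimal control to impose constraints only on the final clean trajectory rather than on intermediate denoising steps. This avoids over-constraining the sampler, enabling safe diffusion-based planning with better safety and task performance.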

Abstract

Diffusion models have emerged as powerful tools for planning and control by learning multimodal distributions over actions and trajectories. Yet reliable inference-time safety enforcement remains a key barrier to their deployment in safety-critical tasks. Existing approaches typically project each denoising iterate onto the feasible set, even though constraints are defined only on the final clean trajectory. Enforcing feasibility on noisy intermediate samples can therefore overconstrain the sampling dynamics, substantially degrading sample quality. To address this limitation, we introduce DiRecT (Diffusion-based planning via Receding-horizon denoising with Terminal constraints), a training-free algorithm for constrained sampling from diffusion models via stochastic optimal control (SOC). DiRecT enforces constraints only on the final clean sample, avoiding unnecessary restrictions on the intermediate denoising dynamics. Inspired by model predictive control, we derive a principled receding-horizon surrogate for the otherwise intractable constrained SOC formulation, yielding an efficient algorithm that cleanly separates stochastic denoising from constraint satisfaction, progressively steering samples toward feasible final trajectories without distorting the learned diffusion dynamics. Furthermore, DiRecT is highly flexible: it can leverage off-the-shelf or domain-specific optimizers, incorporate priors over environment dynamics, and optimize additional soft rewards. Extensive experiments on safe planning benchmarks demonstrate that DiRecT substantially improves deployment safety and task performance over existing diffusion-based planning baselines.

2606.15452 2026-06-16 cs.LG math.AT q-fin.RM stat.ML New submission

PHINN: Persistent Homology Inspired Neural Network for Rare-Event Time Series Generation

Emre Yusuf, Ren Takahashi, Jayabrata Bhaduri

Affiliations: Defense.Codes (a DBA of CapaCloud Corp)

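AI summary: Proposes PHINN, which uses dynamic Betti curves as conditioning signals and a persistence landscape loss to maintain homological consistency. On financial, epidemiological, and multimodal benchmarks, it outperforms statistical and diffusion baselines in topological fidelity.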

Comments 15 pages, 4 figures

Abstract

Rare events in time series are critical to model but hard to learn due to data scarcity. Current generative models struggle with extreme values. We observe that rare events leave distinct topological fingerprints (transitions in Betti numbers computed from point-cloud embeddings) that are more stable and discriminative than statistical moments. We introduce PHINN, a flow-matching framework that uses dynamic Betti curves as conditioning signals and a persistence landscape loss for homology consistency. It scales to multivariate data, includes a natural-language interface for setting Betti targets, supports cross-domain meta-learning and few-shot generation, and provides certified adversarial robustness. On financial, epidemiological, and multi-modal benchmarks, PHINN outperforms statistical and diffusion baselines in topological fidelity (beta-RMSE down 41-63%, transition accuracy up 84%). It matches jump-diffusion models in tail coverage while exceeding them in shape fidelity. All results are reported with 95% confidence intervals.
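As a rough illustration of a Betti curve computed from a point-cloud embedding, the sketch below first builds a time-delay embedding of a series. It then uses union-find to count the connected components (the Betti-0 number) of the neighbourhood graph at increasing scales. The helper names, the delay-embedding parameters, and the restriction to Betti-0 are all assumptions. PHINN's dynamic Betti curves and persistence landscape loss are not reproduced here.

```python
import numpy as np

def delay_embed(x, dim=2, tau=1):
    """Time-delay embedding: row i is (x[i], x[i + tau], ..., x[i + (dim - 1) * tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    return np.stack([x[j * tau : j * tau + n] for j in range(dim)], axis=1)

def betti0_curve(points, eps_grid):
    """Number of connected components of the eps-neighbourhood graph for each eps (ascending)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    iu, ju = np.triu_indices(n, k=1)
    lengths = dist[iu, ju]
    order = np.argsort(lengths)  # process edges from shortest to longest
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    curve, components, e = [], n, 0
    for eps in sorted(eps_grid):
        while e < len(order) and lengths[order[e]] <= eps:
            ra, rb = find(iu[order[e]]), find(ju[order[e]])
            if ra != rb:  # the edge merges two components
                parent[ra] = rb
                components -= 1
            e += 1
        curve.append(components)
    return np.array(curve)
```

For two well-separated clusters of two points each, the curve drops from 4 components to 2 at small scales and to 1 once the gap between the clusters is bridged.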

2606.15793 2026-06-16 cs.LG cs.AI stat.ML New submission

Proximal Policy Optimization for Amortized Discrete Sampling

Anna Zykova-Myzina, Timofei Gritsaev, Daniil Tiapkin, Nikita Morozov

Affiliations: HSE University; Constructor University; CMAP, CNRS, École polytechnique, IPP

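AI summary: Derives policy gradient algorithms within the generative flow network (GFlowNet) framework and applies proximal policy optimization to GFlowNets for the first time. This improves convergence speed and data efficiency when sampling from discrete probability distributions.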

Abstract

This paper explores policy gradient algorithms for training stochastic policies to sample from structured discrete probability distributions under the Generative Flow Network (GFlowNet) framework. Building on extensive theoretical connections between GFlowNets and entropy-regularized reinforcement learning, we derive equivalents of standard policy gradient algorithms for training GFlowNets, as well as experimentally explore their various methodological aspects, including baseline training and advantage estimation. Most importantly, our work is the first to derive and successfully apply proximal policy optimization to GFlowNets, showing its improved convergence speed and data efficiency compared to standard GFlowNet training objectives on benchmarks ranging from synthetic energies to molecular graph generation.
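For reference, here is the generic clipped surrogate objective of PPO, written for a batch of per-transition log-probabilities and advantages. The sketch does not show how advantages are formed for GFlowNets. In the entropy-regularized RL view, they would presumably come from rewards built from the terminal log-reward and the backward policy; that is an assumption here, not a detail given in the abstract.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Generic PPO clipped surrogate, negated so that it can be minimised."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # pessimistic bound: take the smaller of the two per-sample objectives
    return -np.mean(np.minimum(unclipped, clipped))
```

Clipping means that a transition whose probability ratio has moved beyond `1 + clip_eps` (for a positive advantage) no longer pushes the policy further in that direction.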

2606.15805 2026-06-16 cs.LG New submission

Mean-Field Parallel Decoding for Discrete Diffusion Language Models

Tamim Zoabi, Ameen Ali, Liran Ringel, Lior Wolf

Affiliations: School of Electrical & Computer Engineering, Tel Aviv University; School of Computer Science and AI, Tel Aviv University; Department of Computer Science, Technion, Israel Institute of Technology

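AI summary: Proposes a training-free decoding framework that coordinates parallel token updates through a mean-field variational relaxation. It suppresses conflicts within a single forward pass and improves the quality-latency trade-off.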

Abstract

Discrete diffusion language models enable parallel token generation, offering a pathway to low-latency decoding. However, selecting tokens independently by marginal confidence limits effective parallelism: tokens that appear reliable in isolation can form incompatible configurations when several positions are updated at once. We introduce a training-free decoding framework that coordinates these parallel updates. At each forward pass, the method assigns a commit score to each masked position and refines these scores using pairwise interactions derived from the model's predictive distributions. A variational relaxation yields a simple fixed-point update that suppresses conflicting simultaneous commitments within a single forward pass. This mechanism allows the decoder to commit more tokens in parallel while maintaining competitive generation quality. The method is lightweight, requires no auxiliary model or retraining, and drops into existing diffusion decoding pipelines without modification. Experiments on reasoning and code-generation benchmarks show consistent improvements in the quality-latency trade-off.
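The abstract does not give the update rule, so the following is only a hypothetical sketch of a damped mean-field fixed-point iteration over commit scores. It makes two assumptions. Each masked position's independent evidence is the logit of its top-token probability. Two positions conflict in proportion to the overlap of their predictive distributions (`J = -lam * probs @ probs.T`). Both the interaction and the parameters `lam`, `damping`, and `threshold` are illustrative choices, not the paper's.

```python
import numpy as np

def mean_field_commit(probs, lam=4.0, n_iter=50, damping=0.5, threshold=0.5):
    """probs: (L, V) predictive distributions at L masked positions -> (scores, commit mask)."""
    probs = np.asarray(probs, dtype=float)
    conf = np.clip(probs.max(axis=1), 1e-6, 1 - 1e-6)
    field = np.log(conf / (1.0 - conf))  # independent evidence: logit of the top-token probability
    # hypothetical pairwise term: positions whose predictive distributions overlap compete
    J = -lam * (probs @ probs.T)
    np.fill_diagonal(J, 0.0)
    s = 1.0 / (1.0 + np.exp(-field))  # start from the independent commit scores
    for _ in range(n_iter):
        target = 1.0 / (1.0 + np.exp(-(field + J @ s)))
        s = damping * s + (1.0 - damping) * target  # damped fixed-point update
    return s, s > threshold
```

Consider two positions that both confidently predict the same token, plus a third that predicts a different one. The first two have their scores pushed down together (to about 0.53), so at `threshold=0.7` only the third position commits in this step.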

2606.15835 2026-06-16 cs.LG cs.AI New submission

Wasserstein Convergence of ODE-Based Samplers in Decentralized Diffusion Model via Velocity Field Decomposition

Chencheng Tang, Xuanyu Xue, Fangyikang Wang, Chao Zhang, Hubery Yin

Affiliations: Peking University; Shanghai Jiao Tong University; MBZUAI; Zhejiang University; Tencent

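AI summary: Studies ODE-based sampling with stochastic expert switching in decentralized diffusion models. Via a velocity field decomposition, it establishes convergence guarantees in Wasserstein-2 distance, showing that the $N$-step discretization converges at rate $\mathcal{O}(N^{-1/2}+\varepsilon)$.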

Comments 50 pages, 9 figures. Preprint under review

Abstract

Diffusion models have achieved impressive empirical success in generative tasks, and their convergence theory is now relatively well understood. Motivated by privacy and scalability, recent decentralized diffusion architectures replace a single global velocity field with multiple local experts and a routing mechanism. This yields sampling dynamics with stochastic expert switching that fall outside standard diffusion convergence analyses. In this work, we study a decentralized diffusion framework with stochastic velocity fields and ODE-based sampling. We establish a convergence guarantee in Wasserstein-2 distance, showing that the distribution of the $N$-step discretization converges to the analytical solution at rate $\mathcal{O}(N^{-1/2}+\varepsilon)$ in $W_2$, where $\varepsilon$ captures the neural approximation errors. To our knowledge, this is the first $W_2$ convergence result for decentralized diffusion models with an ODE-based sampling scheme.
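A toy version of this setting can be simulated in one dimension. The sketch assumes a straight-line interpolant from $\mathcal{N}(0,1)$ to $\mathcal{N}(m, s^2)$ with independent coupling. For this interpolant, the marginal velocity has the closed form $v(x,t) = m + \frac{t s^2 - (1-t)}{\sigma_t^2}(x - t m)$, where $\sigma_t^2 = (1-t)^2 + t^2 s^2$. Each "expert" is modelled as this exact field plus a fixed bias, and a router picks an expert uniformly at random per sample and per Euler step. The biases stand in for the approximation error $\varepsilon$. All names and values are illustrative, not taken from the paper.

```python
import numpy as np

def exact_velocity(x, t, m, s):
    """Marginal velocity of x_t = (1 - t) x0 + t x1, x0 ~ N(0, 1), x1 ~ N(m, s^2) independent."""
    var_t = (1 - t) ** 2 + (t * s) ** 2
    return m + (t * s**2 - (1 - t)) / var_t * (x - t * m)

def switching_euler(n_steps, m=2.0, s=0.5, biases=(-0.1, 0.0, 0.1), n=20000, rng=None):
    """Euler ODE sampler in which every step queries a randomly routed, biased 'expert'."""
    rng = np.random.default_rng() if rng is None else rng
    biases = np.asarray(biases, dtype=float)
    x = rng.normal(size=n)  # samples from the N(0, 1) source
    h = 1.0 / n_steps
    for i in range(n_steps):
        t = i * h
        k = rng.integers(len(biases), size=n)  # router: uniform random expert per sample
        x = x + h * (exact_velocity(x, t, m, s) + biases[k])
    return x
```

With 200 steps, the empirical mean and standard deviation of the output land close to the target values $m = 2$ and $s = 0.5$. The sketch only illustrates the sampling dynamics; it does not measure the $W_2$ rate.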

2606.15897 2026-06-16 cs.LG cs.AI stat.ML New submission

Topological Flow Matching

Kacper Wyrwal, İsmail İlkan Ceylan, Alexander Tong

Affiliations: University of Oxford; TU Wien; AITHYRA

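AI summary: Proposes topological flow matching, which augments the reference process with a Laplacian drift. This captures the topology of the underlying domain while retaining flow matching's stable, simulation-free objective. The method applies to structured data such as brain fMRI and ocean currents.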

Comments Accepted at ICLR 2026. 26 pages, 24 figures. Code: https://github.com/KacperWyrwal/topological-flow-matching

Abstract

Flow matching is a powerful generative modeling framework, valued for its simplicity and strong empirical performance. However, its standard formulation treats signals on structured spaces, such as fMRI data on brain graphs, as points in Euclidean space, overlooking the rich topological features of their domains. To address this, we introduce topological flow matching, a topology-aware generalization of flow matching. We interpret flow matching as a framework for solving a degenerate Schrödinger bridge problem and inject topological information by augmenting the reference process with a Laplacian-derived drift. This principled modification captures the structure of the underlying domain while preserving the desirable properties of flow matching: a stable, simulation-free objective and deterministic sample paths. As a result, our framework serves as a drop-in replacement for standard flow matching. We demonstrate its effectiveness on diverse structured datasets, including brain fMRIs, ocean currents, seismic events, and traffic flows.
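The Laplacian-derived drift can be illustrated on its own. Assume a drift of the form $-cLx$ for a graph Laplacian $L = D - A$; this is an assumption, since the abstract does not give the exact reference process. The deterministic part of the reference dynamics is then the graph heat equation, solved exactly by $x(t) = e^{-ctL}x_0$. The sketch computes this via an eigendecomposition. It does not implement the Schrödinger bridge construction or the flow matching objective.

```python
import numpy as np

def graph_laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def heat_flow(L, x0, t, c=1.0):
    """Exact solution of dx/dt = -c L x, i.e. x(t) = expm(-c t L) x0, for symmetric L."""
    lam, U = np.linalg.eigh(L)
    return U @ (np.exp(-c * t * lam) * (U.T @ np.asarray(x0, dtype=float)))
```

Along this flow, constant signals are unchanged and total mass is preserved. A spike on one node spreads along the edges toward the uniform signal, which is the sense in which the drift injects the graph's structure.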

2606.16073 2026-06-16 cs.LG stat.ML New submission

Stop the Sampler! Classifier-Based Adaptive Stopping for Sampling Kernels

Kirill Korolev, Nikita Morozov, Stepan Pavlenko, Esmeralda S. Whitammer, Sergey Samsonov

Affiliations: Stanford University

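AI summary: Treats MCMC trajectory termination as a learnable component. Non-acyclic generative flow networks are used to train state-dependent classifiers that stop sampling adaptively under detailed balance conditions. This substantially shortens trajectories and improves mode coverage and mixing.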

Comments ICML 2026 SPIGM Workshop

Abstract

Sampling from complex, unnormalized probability densities is a fundamental challenge in Bayesian inference and probabilistic modeling. While Markov chain Monte Carlo (MCMC) methods provide asymptotic guarantees, they often suffer from slow mixing and high computational costs due to fixed or manually tuned trajectory lengths. In this work, we propose a novel framework that treats trajectory termination as a learnable component of the sampling dynamics. By framing MCMC within the theory of non-acyclic generative flow networks (GFlowNets), we train state-dependent neural classifiers to decide when a trajectory has reached a high-density region and should terminate. We theoretically establish the connection between optimal classifiers and the target density via detailed balance conditions and introduce a multilevel training scheme to facilitate exploration in complex geometries. Experimental results across various benchmark densities demonstrate that our approach significantly reduces average trajectory lengths while improving mode coverage and mixing compared to standard MCMC baselines.
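A minimal sketch shows where a learned stopping rule plugs into an MCMC loop: a random-walk Metropolis chain that may terminate after each transition with a state-dependent probability. The function `stop_prob` is a hand-written, hypothetical stand-in for the trained classifier. This sketch therefore does not enforce the detailed balance conditions from the paper, and the distribution of stopped states is not guaranteed to match the target.

```python
import numpy as np

def log_target(x):
    """Unnormalised log-density of an equal mixture of N(-3, 1) and N(3, 1)."""
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

def stop_prob(x):
    """Hypothetical stand-in for a learned classifier: stop more often where density is high."""
    return 1.0 / (1.0 + np.exp(-(log_target(x) + 1.0)))

def run_chain(x0, rng, step=1.0, max_steps=1000):
    """Random-walk Metropolis that may terminate after every transition."""
    x = x0
    for t in range(1, max_steps + 1):
        prop = x + step * rng.normal()
        if np.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop  # Metropolis accept
        if rng.random() < stop_prob(x):
            return x, t  # the stopping rule fires: terminate here
    return x, max_steps
```

Trajectory length is no longer a fixed hyperparameter here. Chains that reach a mode tend to stop within a few steps, while chains still between modes keep moving.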

2606.16408 2026-06-16 cs.LG New submission

MUNI: Multimodal Unified Latent Diffusion for Coherent Any-to-Any Generation

Kyeongmin Yeo, Yunhong Min, Minhyuk Sung

Affiliations: KAIST

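AI summary: Proposes MUNI, which achieves any-to-any generation through end-to-end multimodal latent diffusion and a routed training objective. It matches or exceeds baselines on conditional generation and shows its largest gains in unconditional coherence.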

Comments Project page: https://muni-proj.github.io/

Abstract

We introduce MUNI, an end-to-end multimodal latent diffusion framework for any-to-any generation that unifies subset-conditioned cross-modal generation and unconditional joint sampling through a shared stochastic latent. Existing multimodal generative models are largely LLM-based, which limits leveraging modality-specific generators and requires text-paired data for training. Recent diffusion- and flow-based any-to-any extensions take a different direction but still rely on text-aligned embeddings, fully-paired training, or matched-dimensionality deterministic mappings. MUNI rests on two complementary contributions, one architectural and one in the training objective. First, we extend latent diffusion to multimodal any-to-any generation end-to-end: instead of the standard two-stage recipe that precomputes a frozen latent space and then fits a prior over it, MUNI jointly trains modality-specific encoders, expressive decoders, and a single shared flow-based prior under one objective. Second, we identify that the standard aggregation rules of multimodal variational inference are insufficient once coupled with a learned prior and expressive decoders. A suitable shared latent must simultaneously satisfy coherence across generated modalities, predictive sufficiency of subset latents, and minimality of the latent content. We propose a routed training objective whose structural choices align the latent with these criteria and admit a minimal-sufficiency characterization in the realizable setting. Experiments on PolyMNIST-Quadrant-Labels and a large-scale image-text-audio benchmark show MUNI matching or exceeding the strongest baselines on conditional generation while opening its largest margins on unconditional coherence. Project page: https://muni-proj.github.io/.

2606.16790 2026-06-16 cs.LG cs.AI New submission

Decision-Weighted Flow Matching for Contextual Stochastic Optimization

Jize Xie, Haomiao Wu, Qiang Chen, Xiu Su, Yi Chen

Affiliations: Hong Kong University of Science and Technology; Central South University; Big Data Institute

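AI summary: Proposes Decision-Weighted Flow Matching (DW-FM), which reweights the velocity-regression objective to align training with downstream regret. It outperforms standard methods on CVaR benchmarks.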

Abstract

Conditional generative models are increasingly used as scenario generators for stochastic optimization, but standard training objectives emphasize uniform distributional fit rather than the downstream decisions induced by generated scenarios. This creates an objective mismatch: errors in statistically common regions may have little effect on decision regret, whereas errors in decision-sensitive regions can substantially change the optimal action. We propose Decision-Weighted Flow Matching (DW-FM), a regret-aligned training framework that preserves the simplicity of standard flow matching while reweighting its velocity-regression objective using decision-sensitive endpoint information. Theoretically, we connect downstream regret to pathwise velocity mismatch through a loss-induced decision discrepancy and an adjoint transport argument, yielding an ideal regret-aligned surrogate and practical endpoint-weighted objectives with regret guarantees. Empirically, we demonstrate the effectiveness of DW-FM on three CVaR-based contextual stochastic optimization benchmarks spanning synthetic portfolio, semi-real financial, and traffic-CVaR tasks, where DW-FM improves downstream regret over standard baselines.
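The reweighting idea can be sketched as a weighted version of the usual conditional flow matching loss with a straight-line path. The weighting rule here up-weights training endpoints whose decision loss lies at or above the empirical $\mathrm{VaR}_\alpha$. That rule is a hypothetical choice, since the abstract only says the weights use decision-sensitive endpoint information; the helper names and defaults are likewise illustrative. An empirical CVaR helper is included because the benchmarks are CVaR-based.

```python
import numpy as np

def cvar(losses, alpha=0.9):
    """Empirical CVaR_alpha: mean of the worst ceil((1 - alpha) * n) losses."""
    losses = np.sort(np.asarray(losses, dtype=float))
    k = max(1, int(np.ceil((1.0 - alpha) * len(losses))))
    return losses[-k:].mean()

def tail_weights(decision_losses, alpha=0.9, beta=4.0):
    """Hypothetical weights: up-weight endpoints whose decision loss is at or above VaR_alpha."""
    losses = np.asarray(decision_losses, dtype=float)
    var = np.quantile(losses, alpha)
    return 1.0 + beta * (losses >= var)

def weighted_fm_loss(v, x0, x1, t, weights):
    """Weighted conditional flow matching loss with the path x_t = (1 - t) x0 + t x1."""
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    target = x1 - x0  # conditional velocity of the straight-line path
    sq = np.sum((v(xt, t) - target) ** 2, axis=1)
    return np.sum(weights * sq) / np.sum(weights)
```

With uniform weights this reduces to the ordinary flow matching regression. Shifting weight toward tail endpoints makes velocity errors that lead to decision-relevant scenarios cost more.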

2606.17048 2026-06-16 cs.LG cs.CV stat.ML New submission

Exact Posterior Score Estimation for Solving Linear Inverse Problems

Abbas Mammadov, Ozgur Kara, Kaan Oktay, Iskander Azangulov, Adil Kaan Akan, Hyungjin Chung, James Matthew Rehg, Yee Whye Teh

Affiliations: University of Oxford; UIUC; EverEx

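AI summary: Proposes Exact Posterior Score (EPS), which uses a closed-form posterior score to turn linear inverse problems into denoising problems that require no gradients or projections. It outperforms existing methods on FFHQ and ImageNet.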

Abstract

Diffusion and flow-based models learn powerful data priors by training a denoiser to reverse Gaussian corruption. To use this prior to solve a linear inverse problem, one needs to sample from the posterior, but the score that the prior provides is the unconditional score, not the posterior score. Existing methods either steer a fixed pretrained denoiser with approximate measurement-matching corrections, or train a conditional restoration model that abandons the denoising structure of the prior. We derive the exact posterior score in closed form for linear Gaussian inverse problems under general Gaussian interpolants, and show that posterior sampling reduces to a denoising problem at an operator-dependent shifted pivot under an anisotropic noise covariance. We turn this identity into Exact Posterior Score (EPS), a denoising training objective that preserves the input/output structure of standard pretraining and can therefore be trained from scratch or fine-tuned from a pretrained denoiser. At inference, EPS uses the same sampler as the underlying backbone, with no likelihood gradients or projections. We evaluate EPS on five linear inverse problems across FFHQ and ImageNet, where it outperforms training-free and training-based baselines on fidelity, perceptual, and distributional metrics, while using roughly an order of magnitude fewer denoiser evaluations than gradient-based posterior samplers.

2606.14732 2026-06-16 cs.CV cs.AI cs.LG cs.MM Cross-list

Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion

Matiur Rahman Minar, Seunghun Oh, GangHyeon Jeong, Unsang Park

Affiliations: Department of Computer Science and Engineering, Sogang University; Department of Artificial Intelligence, Sogang University

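AI summary: Proposes Steady-Forcing, which uses a visual anchor, motion memory, distillation, and related techniques to balance background stability and motion continuity in long-horizon fixed-camera nature video generation. It outperforms existing methods.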

Comments Project page: https://minar09.github.io/steadyforcing/

Abstract

Autoregressive video diffusion models enable streaming generation but often degrade over long rollouts. Static scene layouts drift, while mechanisms that improve spatial stability tend to suppress motion, causing natural flows such as water, fire, or smoke to stagnate. We study this stability-motion trade-off in fixed-camera long-horizon nature video generation, where the two failure modes can be separated more clearly than in moving-camera settings. We propose Steady-Forcing, a memory and training framework that combines five components: a persistent visual anchor (V-Sink), an exponential moving-average motion memory (EMA-Sink), block-relative temporal encoding, periodic cache purification, and distillation from a Wan2.1-14B teacher with motion-rewarded priors under task-focused configurations. Together, these components are designed to preserve background identity while sustaining visually plausible fluid dynamics over multi-minute autoregressive rollouts. Evaluations against seven baselines show that Steady-Forcing improves long-horizon background consistency and imaging quality, and a blind user study indicates stronger perceived stability and motion continuity. The benchmark evaluation further suggests that generic VBench aggregate scores under-penalize fixed-camera artifacts and reward drift-induced optical flow as Dynamic Degree. They also do not directly penalize texture hardening or flow stagnation, which motivates future task-specific benchmarks for static-camera nature-flow evaluation. Project page: https://minar09.github.io/steadyforcing/

2606.14756 2026-06-16 cs.CV cs.AI cs.LG Cross-list

Divide-and-Denoise: A Game-Theoretic Method for Fairly Composing Diffusion Models

Abhi Gupta, Polina Barabanshchikova, Vikas Garg, Samuel Kaski, Tommi Jaakkola

Affiliations: Massachusetts Institute of Technology; University of Washington; University of Cambridge

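AI summary: Proposes Divide-and-Denoise, which coordinates multiple pre-trained diffusion models via a fair division game. During sampling it partitions the sample into regions and guides each model to denoise its own region. This prevents one model from dominating and resolves conflicts between models, outperforming baselines on conditional image generation.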

Comments Accepted as spotlight at ICML 2026

Abstract

The abundance of pre-trained diffusion models provides an opportunity for composition. Combining several models, however, runs the risk of one model dominating or models disagreeing with each other. Here, we propose Divide-and-Denoise, a method for coordinating multiple pre-trained diffusion models during sampling. Much like managing a specialized workforce, our method creates a fair but efficient division of labor across models. Central to our method is the notion of an allocation which defines the responsibility of each model to every region of the noisy sample. At every timestep, we then denoise by (i) updating the allocation by solving a fair division game, where we divide the sample into regions that maximize total utility under fairness constraints, and (ii) aligning the models with this allocation, where we guide each model to denoise within its assigned region. This leads to a new composite denoising process that evolves in tandem with a division process. We evaluate Divide-and-Denoise on conditional image generation. Across several quality metrics, including the GenEval benchmark, our method outperforms baselines and resolves common failures including missing objects and mismatched attributes. Experiments show that Divide-and-Denoise utilizes each model's expertise without neglecting any other model.
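The allocate-then-compose step can be sketched on flattened regions. Each model `k` has a utility `utility[k, j]` for owning region `j`; the numbers are hypothetical. Regions first go to their highest-utility model. A greedy repair then moves the cheapest regions until every model owns at least `min_share` of them. This min-share greedy repair is a simple stand-in for the fair division game solved in the paper, not its algorithm. The composed noise prediction takes each region's value from that region's owner.

```python
import numpy as np

def fair_allocation(utility, min_share):
    """Assign each of N regions to one of K models; every model ends up with >= min_share regions."""
    utility = np.asarray(utility, dtype=float)
    K, N = utility.shape
    if K * min_share > N:
        raise ValueError("not enough regions for the requested minimum share")
    owner = utility.argmax(axis=0)  # start from the max-total-utility assignment
    counts = np.bincount(owner, minlength=K)
    while counts.min() < min_share:
        k = int(counts.argmin())  # model that is furthest below its share
        can_give = counts[owner] > min_share  # regions whose owner has a surplus
        cost = utility[owner, np.arange(N)] - utility[k]  # utility lost by handing region to k
        cost[~can_give] = np.inf
        j = int(cost.argmin())
        counts[owner[j]] -= 1
        owner[j] = k
        counts[k] += 1
    return owner

def compose_eps(eps_list, owner):
    """Build one noise prediction by taking each region's value from the model that owns it."""
    eps = np.stack(eps_list)  # shape (K, N)
    return eps[owner, np.arange(eps.shape[1])]
```

Without the repair step, the first model in the test case below would own every region; the minimum share forces it to hand over the two regions it values least.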

2606.14800 2026-06-16 stat.ME cs.LG eess.IV stat.ML Cross-list

Bridging data-driven priors via the score function for posterior sampling -- Comparative review and experimental study

Elhadji Cisse Faye, Mame Diarra Fall, Sylvain Delchini, Nicolas Dobigeon

Affiliations: IDP, Univ Orléans; LITIS, Univ Rouen Normandie; Bureau de Recherches Géologiques et Minières, Orléans, France; IRIT, Univ Toulouse

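AI summary: Reviews how various data-driven priors for Bayesian inverse problems can be unified through their score functions and shows how they integrate effectively into a sampling algorithm. Image inpainting and super-resolution experiments validate the approach's efficiency and generality.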

Abstract

This paper reviews how a diverse set of data-driven priors commonly used in Bayesian inverse problems can be unified through their respective score functions. By framing these priors from this common perspective, we show that they can be straightforwardly and effectively integrated into a recently proposed sampling algorithm. We illustrate the framework with four data-driven priors: regularization-by-denoising, normalizing flow-based priors, score-based generative models, and convex-ridge regularizers. For each of these priors, we evaluate the method on image inpainting and single image super-resolution. These results, together with those obtained when restoring real images acquired in a geological context, demonstrate the efficiency of the method. The unified framework is versatile enough to handle any posterior distribution defined by a broad class of score function-based priors, beyond the specific cases considered in this paper.
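The common thread is that any prior exposing a score $\nabla_x \log p(x)$ can be plugged into the same posterior sampler. This can be illustrated with an unadjusted Langevin sampler for $y = Ax + \sigma n$, whose drift is the sum of the likelihood score and the prior score. Both the sampler and the Gaussian prior score `lambda x: -x` are illustrative stand-ins; a learned denoiser, flow, or score network would replace the latter. This is not necessarily the sampling algorithm used in the paper.

```python
import numpy as np

def langevin_posterior(y, A, sigma, prior_score, n_steps=2000, gamma=1e-2, n_chains=2000, rng=None):
    """Unadjusted Langevin sampling of p(x | y) for y = A x + sigma * noise."""
    rng = np.random.default_rng() if rng is None else rng
    y, A = np.asarray(y, dtype=float), np.asarray(A, dtype=float)
    x = rng.normal(size=(n_chains, A.shape[1]))
    for _ in range(n_steps):
        lik_score = (y - x @ A.T) @ A / sigma**2  # grad_x log N(y; A x, sigma^2 I)
        drift = lik_score + prior_score(x)          # posterior score = likelihood + prior score
        x = x + gamma * drift + np.sqrt(2 * gamma) * rng.normal(size=x.shape)
    return x

A = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, 2.0])
samples = langevin_posterior(y, A, 0.5, prior_score=lambda x: -x, rng=np.random.default_rng(0))
```

In this Gaussian toy problem the posterior mean is $(I + A^\top A/\sigma^2)^{-1}A^\top y/\sigma^2 = (28/29, 24/29)$, and the average of the chains approaches it. Swapping in a different prior only changes the `prior_score` argument.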

2606.15344 2026-06-16 cond-mat.dis-nn cs.LG physics.optics quant-ph Cross-list

Generative modelling powered by room-temperature polariton condensates

Yuan Wang, Marcin Muszynski, Avinash Dash, Rishabh Kaurav, Vinod M. Menon, Oleksandr Kyriienko

Affiliations: School of Mathematical and Physical Sciences, University of Sheffield, Sheffield S10 2TN, United Kingdom; Department of Physics, City College of New York, New York, NY 10031, USA; Physics Doctoral Program, Graduate Center of the City University of New York, New York, NY 10016, USA; Chemistry Doctoral Program, Graduate Center of the City University of New York, New York, NY 10016, USA

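AI summary: Uses room-temperature exciton-polariton condensates in organic dye microcavities as a physical stochastic transformation layer in a generative adversarial network, exploiting their nonlinear many-body dynamics and intrinsic stochasticity. This enables conditional digit-to-image translation and outperforms digitally injected perturbations.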

Comments 9 pages and 4 figures in the main text; 17 pages SM; codes to be released

Abstract

Generative modelling requires efficient stochastic nonlinear transformations and physical platforms that can naturally realise them. We experimentally demonstrate that nonlinear optical systems operating in the strong light-matter coupling regime can serve as physical transformation layers for conditional generative modelling. Specifically, we develop a workflow in which room-temperature exciton-polariton condensates formed in organic dye microcavities act as a physical stochastic transform within a generative adversarial network and enable conditional digit-to-image translation. By using the nonlinear many-body dynamics and intrinsic stochasticity of polariton condensates, the workflow outperforms baseline approaches based on digitally injected perturbations. We find that polariton-enabled sampling via generative adversarial network (Polariton GAN) yields improved inception score, digit preservation accuracy and structural similarity compared with both digital sampling and laser-based systems. We further show that spatially correlated output variations can naturally regularise adversarial training and enhance output diversity. Our results establish polariton condensation as a new computational resource for generative modelling, opening a pathway towards physics-enhanced machine learning systems.

2606.15442 2026-06-16 stat.ML cs.LG Cross-list

The Reverse Telescoping Coordinate System for Positive Definite Matrices: Geometry, Computation, and Generative Modeling

正定矩阵的反向望远镜坐标系:几何、计算与生成建模

Anindya Bhadra

发表机构 * Purdue University(普渡大学)

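AI summary: Proposes a new unconstrained coordinate system that represents symmetric positive definite matrices via a reverse telescoping map, yielding a Jacobian that depends only on the log-determinant and a symbolic representation of both the matrix and its inverse, and designs a split volume-shape flow model for generative modeling.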

AI中文摘要

我们设计了一种新的无约束坐标系,其中 $p\times p$ 对称正定(SPD)矩阵 $\Theta$ 由反向望远镜映射 $\Theta(x)=\rm{RT}(x)$ 表示,其中 $x=(v,d,r)\in\mathbb{R}\times\mathbb{R}^{(p-1)}\times\mathbb{R}^{p(p-1)/2}$ 分别代表对数体积或对数行列式;以及形状,由对数相对对角尺度与节点间的部分协方差编码。这一构造产生了其他坐标图(如矩阵对数)所不具备的重要性质,例如雅可比仅依赖于对数行列式。我们构造的一个有用特性是 $x$ 包含矩阵及其逆的无损符号表示。许多涉及矩阵及其逆的重要计算可以在变换域中以 $O(p^2)$ 完成,而将结果以矩阵形式呈现(按需)才需要 $O(p^3)$ 成本。此外,变换域中两个单位行列式矩阵可以通过一条路径上单位行列式的直线连接。对于生成建模,这允许设计一个分裂体积-形状流模型,通过条件流匹配在单位行列式路径上传输形状,并有一个独立的一维流传输体积或行列式。令人生畏的SPD约束被驯服为强大的引导力,带来令人惊讶的洞察:在某种意义上,为SPD设计体积归一化的形状流比无约束的 $\mathbb{R}^{p\times p}$ 更容易,因为后者没有内在的体积概念来辅助归一化,而SPD矩阵的行列式则提供了这一点。我们将我们的构造应用于高达 $p=200$ 的SPD矩阵生成建模,针对一个困难的合成双峰目标,以及通过fMRI数据训练的模型生成脑连接网络;还应用于SPD流形上的内在扩散。

英文摘要

We design a new unconstrained coordinate system in which a $p\times p$ symmetric positive definite (SPD) matrix $\Theta$ is represented by a reverse telescoping map $\Theta(x)=\mathrm{RT}(x)$, with $x=(v,d,r)\in\mathbb{R}\times\mathbb{R}^{p-1}\times\mathbb{R}^{p(p-1)/2}$, where $v$ is the log volume (log determinant) and $(d, r)$ encode the shape through log relative diagonal scales and partial covariances among the nodes. This construction yields important properties not available in other charts such as the matrix logarithm, for example a Jacobian that depends only on the log-determinant. A useful feature of our construction is that $x$ contains a lossless symbolic representation of both the matrix and its inverse. Many important computations involving a matrix and its inverse can be performed in $O(p^2)$ in the transformed domain; only rendering the results in matrix form (on demand) incurs an $O(p^3)$ cost. Moreover, two unit-determinant matrices in the transformed domain can be joined by a straight line along which the determinant remains one. For generative modeling, this allows a split volume-shape flow model: a flow trained by conditional flow matching transports the shape along the unit-determinant path, and a separate one-dimensional flow transports the volume (determinant). The forbidding SPD constraint, thus tamed into a powerful guiding force, leads to the surprising insight that it is in some sense easier to design a volume-normalized shape flow for SPD matrices than for the unconstrained $\mathbb{R}^{p\times p}$, which, unlike SPD matrices with their determinant, has no intrinsic notion of volume to aid normalization. We apply our construction with $p$ up to 200 to generative modeling of SPD matrices on a difficult synthetic bimodal target and to generating brain connectivity networks with models trained on fMRI data, as well as to intrinsic diffusion on the SPD manifold.
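The coordinates above separate a log volume $v$ (the log-determinant) from a unit-determinant shape. The abstract does not spell out how $d$ and $r$ are computed, so the sketch below only illustrates the volume/shape factorization $\Theta = e^{v/p} S$ with $\det S = 1$; it is not the reverse telescoping map itself, and `volume_shape_split` is a hypothetical helper, not the author's code.

```python
import numpy as np

def volume_shape_split(theta: np.ndarray):
    """Split an SPD matrix into v = log det(theta) and a shape S with det(S) = 1.

    theta = exp(v / p) * S.  This is only the volume/shape factorization,
    NOT the reverse telescoping map RT(x) described in the abstract.
    """
    p = theta.shape[0]
    sign, v = np.linalg.slogdet(theta)   # v = log |det(theta)|
    if sign <= 0:
        # A non-positive determinant rules out positive definiteness.
        raise ValueError("matrix is not positive definite")
    shape = theta / np.exp(v / p)        # divide out the geometric mean of the eigenvalues
    return v, shape

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
theta = A @ A.T + 5 * np.eye(5)          # random SPD test matrix
v, S = volume_shape_split(theta)
print(np.isclose(np.linalg.det(S), 1.0))       # True
print(np.allclose(np.exp(v / 5) * S, theta))   # True
```

Because $\det(e^{v/p} S) = e^{v}\det S$, moving the shape while keeping $\det S = 1$ leaves the volume untouched, which is the separation the split volume-shape flow relies on.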

2606.15457 2026-06-16 cs.CV cs.LG 交叉投稿

Lesion-DDPM: Lesion-Enhanced 3D Diffusion for MS MRI Synthesis

Lesion-DDPM:用于MS MRI合成的病灶增强3D扩散模型

Weidong Zhang, Yongchan Jung, Shafayat Mowla Anik, Furen Xiao, Vasudevan Janarthanan, Enkhzaya Chuluunbaatar, Byeong Kil Lee, Jeeho Ryoo

发表机构 * University of Texas at Arlington(德克萨斯大学阿灵顿分校) University of Texas at San Antonio(德克萨斯大学圣安东尼奥分校) University of Texas at Dallas(德克萨斯大学达拉斯分校) National Taiwan University Hospital(国立台湾大学医院) National University of Mongolia(蒙古国立大学) University of Texas at Austin(德克萨斯大学奥斯汀分校)

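AI summary: Proposes Lesion-DDPM, a 3D conditional diffusion framework that achieves lesion-aware FLAIR synthesis through multi-level anatomical mask injection and a lesion-weighted reconstruction loss, improving the Dice score on downstream MS lesion segmentation (0.616 when trained only on synthetic scans vs. 0.569 for the best competing synthetic dataset; 0.685 when added to real training data).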

AI中文摘要

3D FLAIR MRI被广泛推荐为多发性硬化(MS)脑部成像的标准MRI序列之一,但公开可用的MS数据集仍然相对较小,且在不同扫描仪、采集协议和病灶模式上存在差异。这种稀缺性和异质性阻碍了稳健的神经影像机器学习模型的发展,尤其对于旨在合成图像同时保留小而稀疏病灶的生成模型而言,这是一个挑战。我们提出了Lesion-DDPM,一种用于病灶感知FLAIR合成的3D条件扩散框架,该框架结合了多级解剖掩膜注入以及病灶加权重建损失,以在保持整体大脑结构的同时强调病灶体素。使用MSLesSeg数据集的精选子集,我们将Lesion-DDPM与代表性的最先进GAN和扩散模型进行比较,评估图像生成指标和下游3D U-Net分割性能。在我们的实验中,Lesion-DDPM在所有方法中实现了最低的病灶区域重建误差。在下游3D U-Net病灶分割任务中,仅使用Lesion-DDPM生成的扫描训练并在真实MRI上评估的模型达到了0.616的Dice分数,而最佳竞争合成数据集为0.569。当将Lesion-DDPM图像添加到真实训练集中时,Dice分数进一步增加到0.685。

英文摘要

3D FLAIR MRI is widely recommended as one of the standard MRI sequences for brain imaging in multiple sclerosis (MS), but publicly available MS datasets remain relatively small and vary across scanners, acquisition protocols, and lesion patterns. This scarcity and variability hinder the development of robust neuroimaging machine learning models and are particularly challenging for generative models that aim to synthesize images while preserving small, sparse lesions. We propose Lesion-DDPM, a 3D conditional diffusion framework for lesion-aware FLAIR synthesis that incorporates multi-level anatomical mask injection together with a lesion-weighted reconstruction loss to emphasize lesion voxels while maintaining global brain structure. Using a curated subset of the MSLesSeg dataset, we compare Lesion-DDPM with representative state-of-the-art GAN- and diffusion-based models, assessing both image-generation metrics and downstream 3D U-Net segmentation. In our experiments, Lesion-DDPM achieved the lowest lesion-region reconstruction error among all methods. In a downstream 3D U-Net lesion segmentation task, a model trained only on Lesion-DDPM-generated scans and evaluated on real MRIs reached a Dice score of 0.616 compared with 0.569 for the best competing synthetic dataset. When Lesion-DDPM images were added to the real training set, the Dice score further increased to 0.685.
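The abstract does not give the exact form of the lesion-weighted reconstruction loss. Purely for illustration, the sketch below assumes a voxel-wise weighted MSE with a hypothetical `lesion_weight` hyperparameter, normalized by the total weight. The Dice score used in the downstream evaluation follows its standard definition.

```python
import numpy as np

def lesion_weighted_mse(pred, target, lesion_mask, lesion_weight=5.0):
    """Weighted voxel-wise MSE: lesion voxels count `lesion_weight` times as much as background (assumed form)."""
    weights = 1.0 + (lesion_weight - 1.0) * lesion_mask.astype(float)
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))

def dice(pred_mask, true_mask, eps=1e-8):
    """Dice = 2 |A ∩ B| / (|A| + |B|) for binary segmentation masks."""
    a = pred_mask.astype(bool)
    b = true_mask.astype(bool)
    return float(2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps))

# Toy 3D volume: one wrongly reconstructed voxel inside a small cubic lesion.
target = np.zeros((4, 4, 4))
pred = np.zeros((4, 4, 4))
lesion = np.zeros((4, 4, 4), dtype=bool)
lesion[1:3, 1:3, 1:3] = True
pred[1, 1, 1] = 1.0
print(lesion_weighted_mse(pred, target, lesion, 1.0), lesion_weighted_mse(pred, target, lesion, 5.0))
print(dice(lesion, lesion))
```

With `lesion_weight=1.0` the loss reduces to a plain MSE; raising it makes the same lesion-voxel error cost more, which is the intended effect of emphasizing small, sparse lesions.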

2606.15871 2026-06-16 stat.CO cs.LG stat.ML 交叉投稿

Amortized mean-shift interacting particles

摊销均值漂移交互粒子

Ali Siahkoohi

发表机构 * Department of Computer Science, University of Central Florida(佛罗里达中央大学计算机科学系)

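AI summary: Proposes amortized mean-shift interacting particles, a learned map that outputs weighted nodes directly from an observation and a few posterior samples without evaluating the posterior density or score, yielding more accurate integral estimates than the same number of Monte Carlo samples.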

AI中文摘要

逆问题的贝叶斯推断旨在针对一连串观测计算积分——后验期望、尾部概率和风险。标准估计对后验样本上的被积函数求平均,即蒙特卡洛平均,其误差仅随样本量的平方根衰减,因此高精度需要大量样本——当每个样本都要调用偏微分方程正演模型时,代价高得难以承受。均值漂移交互粒子需要的样本少得多:它们返回一小组带符号权重的节点——一种确定性求积,其加权平均值即为这些积分的估计。然而,寻找节点需要针对每个观测单独求解优化问题,而在其最精确的形式中,每一步都要读取后验得分——又带回了它本想节省的成本。我们引入了摊销均值漂移交互粒子,一种学习映射,在单次前向传递中从观测和少量后验样本输出加权节点。训练仅需要联合参数-观测样本和一个可供抽样的后验——条件归一化流、经验条件分布或用户能够抽样的任何参考分布——映射仅凭样本学会对该后验积分,既不评估其密度也不评估其得分。训练完成后,它可泛化到未见过的观测和被积函数,适用于任意节点预算,并以两种方式改进独立样本:重新加权,可证明不劣于蒙特卡洛的等权重;移动样本,经验上进一步降低误差。在闭式、抽样、学习得到和基于物理的后验上——最高到具有一千个系数的地下水场——它在每个预算下都比相同数量的样本积分更准确,并且后验白化、维度感知的核消除了高维障碍。其结果是对蒙特卡洛积分的帕累托改进,而非抽取更多样本的竞争替代。

英文摘要

Bayesian inference for inverse problems is run to evaluate integrals -- posterior expectations, tail probabilities, and risks -- across a stream of observations. The standard estimate averages the integrand over posterior samples, a Monte-Carlo average whose error decays only as the square root of the sample size, so accuracy demands many samples -- prohibitive when each one calls a partial-differential-equation forward model. Mean-shift interacting particles need far fewer: they return a small set of signed-weight nodes -- a deterministic quadrature whose weighted averages estimate those integrals. Finding the nodes, however, is a per-observation optimization that, in its most accurate form, reads the posterior score at every step -- reintroducing the cost it meant to save. We introduce amortized mean-shift interacting particles, a learned map that emits the weighted nodes from an observation and a few posterior samples in a single forward pass. Training asks only for joint parameter-observation samples and a posterior to draw from -- a conditional normalizing flow, an empirical conditional, or any reference the user can sample -- and the map learns to integrate that posterior from samples alone, evaluating neither its density nor its score. Once trained, it generalizes to unseen observations and integrands at any node budget and improves on independent samples in two ways: by reweighting them, provably no worse than the equal weights of Monte-Carlo; and by moving them, which empirically lowers the error further. Across closed-form, sampled, learned, and physics-based posteriors -- up to a thousand-coefficient groundwater field -- it integrates more accurately than the same number of samples at every budget, and a posterior-whitened, dimension-aware kernel removes the high-dimensional wall. The result is a Pareto improvement on Monte-Carlo integration, not a competitor to drawing more samples.
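The motivating claim, that plain Monte-Carlo error decays only as the square root of the sample size, is easy to check numerically. The sketch below estimates $E[X^2] = 1$ for $X \sim N(0, 1)$ with equal-weight averages. It illustrates the baseline that the method improves on, not the amortized particle method itself; `mc_rmse` is an illustrative name.

```python
import numpy as np

def mc_rmse(n, trials=2000, seed=0):
    """RMSE of the plain Monte-Carlo estimate of E[X^2] = 1 for X ~ N(0, 1) with n samples."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((trials, n))
    estimates = np.mean(samples ** 2, axis=1)      # one equal-weight average per trial
    return float(np.sqrt(np.mean((estimates - 1.0) ** 2)))

for n in (100, 400, 1600):
    print(n, round(mc_rmse(n), 4))
# Quadrupling n roughly halves the RMSE, since RMSE = sqrt(Var[X^2] / n) = sqrt(2 / n).
```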

2606.16138 2026-06-16 stat.ML cs.LG 交叉投稿

Closing the Approximation Gap in Simulation-free Latent SDEs

弥合无模拟潜在随机微分方程中的近似差距

Henry D. Smith, Brian L. Trippe, Scott W. Linderman

发表机构 * Stanford University(斯坦福大学)

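AI summary: Addresses the degraded posterior inference and parameter learning caused by the restrictive parameterizations of existing simulation-free variational inference algorithms; proposes Helmholtz-SDE, which closes the approximation gap by optimizing over path laws compatible with prescribed marginals, recovering more accurate dynamics while remaining efficient.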

AI中文摘要

从含噪声观测中恢复动力系统是神经科学和物理学等科学领域中反复出现的挑战。潜在随机微分方程(SDE)将系统建模为一个未观测状态,该状态按可学习的SDE演化并生成观测。变分推断(VI)为拟合潜在SDE提供了可处理的目标。传统的VI算法通过在时间离散网格上进行数值模拟来评估该目标,需要在保真度和计算成本之间权衡。最近一类算法,即无模拟VI,通过瞬时边际分布而非漂移项来参数化后验,从而避开了这种权衡。在这项工作中,我们表明现有无模拟VI算法的效率是有代价的:它们的参数化将近似后验限制为基于模拟的方法可用SDE的一个子集,从而削弱了后验推断和参数学习的效果。我们提出了Helmholtz-SDE,一种无模拟VI算法,通过在与给定边际分布族兼容的路径律上进行优化来弥合这一差距。Helmholtz-SDE比先前的无模拟方法更忠实地恢复动力学,在后验不确定性高时增益最大。它还能以一小部分运行时间达到基于模拟的VI的性能。

英文摘要

Recovering dynamical systems from noisy observations is a recurring challenge across scientific domains, including neuroscience and physics. Latent stochastic differential equations (SDEs) address this by modeling the system as an unobserved state that evolves according to a learnable SDE and generates the observations. Variational inference (VI) provides a tractable objective for fitting latent SDEs. Traditional VI algorithms evaluate this objective by numerical simulation over a time discretization, trading fidelity for computational cost. A recent class of algorithms, simulation-free VI, sidesteps this tradeoff by parameterizing the posterior through its instantaneous marginals rather than its drift. In this work, we show that the efficiency of existing simulation-free VI algorithms comes at a price: their parameterizations restrict the approximate posterior to a subset of the SDEs available to simulation-based methods, degrading posterior inference and parameter learning. We propose Helmholtz-SDE, a simulation-free VI algorithm that closes this gap by optimizing over path laws compatible with a prescribed collection of marginals. Helmholtz-SDE recovers dynamics more faithfully than prior simulation-free methods, with the largest gains under high posterior uncertainty. It further matches the performance of simulation-based VI at a fraction of the runtime.
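Simulation-based VI evaluates its objective by simulating the SDE on a time grid, and the basic building block of that simulation is an Euler-Maruyama step. The sketch below shows the step on an Ornstein-Uhlenbeck example whose mean is known in closed form. It omits the variational objective itself, and the function name is illustrative.

```python
import numpy as np

def euler_maruyama(drift, sigma, x0, t_end, n_steps, n_paths, seed=0):
    """Simulate dX = drift(X, t) dt + sigma dW on a uniform time grid; returns X at t_end."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        dw = rng.standard_normal(n_paths) * np.sqrt(dt)   # Brownian increments over one step
        x = x + drift(x, t) * dt + sigma * dw
    return x

# Ornstein-Uhlenbeck example: dX = -theta X dt + sigma dW, so E[X_T] = x0 * exp(-theta T).
theta = 1.0
x_T = euler_maruyama(lambda x, t: -theta * x, sigma=0.5, x0=1.0,
                     t_end=1.0, n_steps=100, n_paths=20000)
print(x_T.mean(), np.exp(-theta))   # close, up to O(dt) bias and Monte-Carlo error
```

Refining the grid (larger `n_steps`) shrinks the discretization bias but multiplies the cost, which is the fidelity/cost trade-off the abstract attributes to simulation-based VI.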

2606.16219 2026-06-16 cs.CE cs.LG physics.comp-ph 交叉投稿

Graphical conditional generative modeling for digital twin modeling

面向数字孪生建模的图条件生成建模

Zongren Zou, Théo Bourdais, Ricardo Baptista, Houman Owhadi

发表机构 * Department of Computing and Mathematical Sciences, California Institute of Technology(计算与数学科学系,加州理工学院) Department of Statistical Sciences, University of Toronto(统计科学系,多伦多大学)

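AI summary: Addresses the fidelity problem in digital twin modeling with a framework that combines conditional generative models and Gaussian-process-based analysis of variance (kernel mode decomposition) to discover, from observational data, the variables that influence a target's conditional distribution; it builds parsimonious stochastic surrogates and validates them on control, reinforcement learning, and other tasks.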

AI中文摘要

数字孪生建模,包括模型不确定性下的控制和数据同化,通常面临一个开放式的保真度问题:增加变量、数据流和时间尺度会无限增加模型复杂度,最终产生难以维护、验证、解释以及用于压力或安全测试的系统。作为替代方案,可以寻求仅基于描述相关感兴趣量所需的变量构建的简约随机代理模型。我们引入了一个框架,通过识别哪些候选输入影响目标量的完整条件律(而不仅仅是其条件均值),从观测数据中发现此类变量。这一区别在随机、粗粒化或部分观测系统中至关重要,在这些系统中,依赖关系可能通过变异性、尾部行为、多模态或不确定性的变化而非确定性函数关系表现出来。该框架将条件生成模型(学习给定候选输入下目标的条件分布)与基于高斯过程的方差分析(通过核模式分解)相结合,从而能够迭代剪除非影响输入并发现可解释的结构。在控制设置中,得到的代理模型可以解释为学习到的马尔可夫决策过程:该方法不仅识别出转移模型,还识别出使学习到的动态过程有效马尔可夫所需的状态、动作和记忆变量。在涉及随机动力系统、缺失变量、偏微分方程控制、强化学习和经济数据的多个示例中,发现的结构产生了可解释的随机代理模型,其下游性能与在完整变量集上训练的模型相当。

英文摘要

Digital twin modeling, including control and data assimilation under model uncertainty, often faces an open-ended fidelity problem: adding variables, data streams, and time scales can indefinitely increase model complexity, ultimately producing systems that are difficult to maintain, validate, interpret, and use for stress or safety testing. As an alternative, one can seek parsimonious stochastic surrogate models built only on the variables needed to describe the relevant quantities of interest. We introduce a framework for discovering such variables from observational data by identifying which candidate inputs influence the full conditional law of a target quantity, rather than only its conditional mean. This distinction is essential in stochastic, coarse-grained, or partially observed systems, where dependencies may appear through changes in variability, tail behavior, multimodality, or uncertainty rather than through deterministic functional relationships. The framework couples conditional generative modeling, which learns the conditional distribution of the target given candidate inputs, with Gaussian-process-based analysis of variance (through kernel mode decomposition), which enables iterative pruning of non-influential inputs and interpretable structure discovery. In control settings, the resulting surrogate can be interpreted as a learned Markov decision process: the method identifies not only a transition model, but also the state, action, and memory variables needed to make the learned dynamics effectively Markovian. Across examples involving stochastic dynamical systems, missing variables, PDE control, reinforcement learning, and economic data, the discovered structures yield interpretable stochastic surrogates whose downstream performance is comparable to models trained on the full variable set.
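Why the full conditional law matters: in the toy model below, `x2` changes only the spread of `y`, not its mean, so screening on the conditional mean misses it while screening on the squared residual catches it. This is a hand-made illustration in which the conditional mean is assumed known; it is not the paper's conditional-generative plus GP-ANOVA pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x1 = rng.uniform(-1, 1, n)             # shifts the conditional mean of y
x2 = rng.uniform(-1, 1, n)             # changes only the conditional spread of y
eps = rng.standard_normal(n)
y = x1 + (0.5 + 0.4 * x2) * eps        # noise scale ranges over [0.1, 0.9]

resid = y - x1                          # residual after removing the (assumed known) conditional mean
corr_mean = np.corrcoef(x2, y)[0, 1]             # mean-based screening: near zero
corr_spread = np.corrcoef(x2, resid ** 2)[0, 1]  # spread-based screening: clearly positive
print(round(corr_mean, 3), round(corr_spread, 3))
```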

2606.16273 2026-06-16 stat.ML cs.LG stat.ME 交叉投稿

Generative Modeling on Metric Graphs via Neural Optimal Transport

基于神经最优传输的度量图生成建模

Alessandro Micheli, Yueqi Cao, Anthea Monod, Samir Bhatt

发表机构 * Imperial College London(伦敦帝国理工学院) KTH Royal Institute of Technology(瑞典皇家理工学院) Statens Serum Institut(丹麦国家血清研究所) University of Copenhagen(哥本哈根大学)

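AI summary: Proposes what the authors describe as the first deep generative modeling framework for continuous distributions on metric graphs: it embeds the graph in a smooth space, solves an entropic Kantorovich problem with a neural semidual, and projects samples back onto the graph; it proves weak convergence of the learned generator and matches or improves on discrete graph OT baselines.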

AI中文摘要

我们提出了,据我们所知,首个用于紧度量图上连续支撑概率分布的深度生成建模框架。给定度量图上的源测度和目标测度,我们的方法将图嵌入到光滑环境空间,通过神经半对偶参数化求解熵Kantorovich问题,并将生成的样本投影回原始图。我们研究了两种嵌入几何:外在欧几里得实现和内在热带Abel--Jacobi嵌入到Jacobian环面。在这两种情况下,生成的生成器通过构造支持在图上。我们证明,在增加神经表达能力的联合极限下,学习到的生成器弱收敛到原始图测度之间的有效传输耦合。实验上,在一系列几何不同的图上,我们的方法匹配或改进了基于离散图OT的启发式传输基线,同时具有更好的可扩展性。最后,我们通过在纽约市曼哈顿的一百万Uber上车点数据上训练模型,展示了在真实世界城市移动数据上的可扩展性。

英文摘要

We introduce, to our knowledge, the first deep generative modeling framework for probability distributions continuously supported on compact metric graphs. Given source and target measures on a metric graph, our method embeds the graph into a smooth ambient space, solves an entropic Kantorovich problem via a neural semidual parameterization, and projects generated samples back onto the original graph. We study two embedded geometries: an extrinsic Euclidean realization and the intrinsic tropical Abel--Jacobi embedding into the Jacobian torus. In both cases, the resulting generator is graph-supported by construction. We prove that, in the joint limit of increasing neural expressivity, the learned generator converges weakly to a valid transport coupling between the original graph measures. Empirically, across a range of geometrically distinct graphs, our method matches or improves upon heuristic transport baselines based on discrete graph OT, while scaling more favorably. Finally, we demonstrate scalability on real-world urban mobility data by training our model on one million Uber pickup locations in Manhattan, New York City.
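For reference, the entropic Kantorovich problem between two discrete measures can be solved with Sinkhorn's matrix-scaling iterations. This is not the paper's method, which uses a neural semidual on a smooth embedding of the graph, and the abstract does not say which solver the discrete graph OT baselines use. The toy cost below is squared distance between points on an interval, i.e. a single edge; all names and values are illustrative.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.1, n_iter=1000):
    """Entropic OT between discrete measures a (n,) and b (m,) with an (n, m) cost matrix."""
    K = np.exp(-cost / eps)               # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                   # rescale rows toward marginal a
        v = b / (K.T @ u)                 # rescale columns to marginal b exactly
    return u[:, None] * K * v[None, :]    # coupling P = diag(u) K diag(v)

x = np.linspace(0, 1, 6)
y = np.linspace(0, 1, 4)
a = np.full(6, 1 / 6)
b = np.array([0.1, 0.2, 0.3, 0.4])
P = sinkhorn(a, b, (x[:, None] - y[None, :]) ** 2)
print(P.sum(axis=1), P.sum(axis=0))      # approximately a and b
```

Smaller `eps` brings the coupling closer to unregularized OT but slows convergence; dense couplings like `P` also scale quadratically with the number of support points, one reason a continuous neural solver can scale more favorably.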

2606.16610 2026-06-16 stat.ML cs.LG 交叉投稿

Diffusion Flow Matching: Dimension-Improved KL Bounds and Wasserstein Guarantees

扩散流匹配:维度改进的KL界和Wasserstein保证

Marta Gentiloni Silveri, Giovanni Conforti, Alain Durmus

发表机构 * Ecole Polytechnique, Massy Palaiseau, France(法国高等理工学院,马希-帕莱索)

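AI summary: Derives improved discretization-error convergence bounds for Brownian-motion-based diffusion flow matching under the KL divergence and the 2-Wasserstein distance, achieving what the authors describe as state-of-the-art dimensional scaling under minimal conditions.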

AI中文摘要

扩散流匹配(DFM)最近已成为生成建模的多功能框架,但其理论收敛性质仍仅被部分理解。在这项工作中,我们为基于布朗运动的DFM提供了精炼且新颖的收敛保证,重点关注离散化误差。我们的分析是在Kullback-Leibler(KL)散度和2-Wasserstein距离下进行的。在有限矩条件和温和的分数可积性假设下,我们推导了KL收敛界,与先前工作相比具有改进的维度依赖性,据我们所知,在最小条件下实现了最先进的缩放。我们进一步将分析扩展到2-Wasserstein距离:在额外的一阶分数可积性假设和弱对数凹性条件下,我们获得了与KL情况一致的维度依赖性的收敛保证。

英文摘要

Diffusion Flow Matching (DFM) has recently emerged as a versatile framework for generative modeling, yet its theoretical convergence properties remain only partially understood. In this work, we provide refined and novel convergence guarantees for Brownian-motion-based DFMs, focusing on the discretization error. Our analysis is conducted under the Kullback-Leibler (KL) divergence and the 2-Wasserstein distance. Under finite-moment conditions and a mild score integrability assumption, we derive KL convergence bounds with improved dimensional dependence compared to prior work, achieving, to our knowledge, state-of-the-art scaling under minimal conditions. We further extend the analysis to the 2-Wasserstein distance: under an additional first-order score integrability assumption and a weak log-concavity condition, we obtain convergence guarantees with dimensional dependence consistent with the KL case.
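Why dimensional dependence matters: a small per-coordinate error accumulates across coordinates. For isotropic Gaussians the KL divergence has a closed form, and a fixed per-coordinate variance mismatch yields a KL that grows linearly in $d$. This only illustrates the scaling question; it is not the paper's bound.

```python
import numpy as np

def kl_isotropic_gaussians(mu1, s1, mu2, s2):
    """KL( N(mu1, s1^2 I) || N(mu2, s2^2 I) ) in dimension d = len(mu1)."""
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    d = mu1.size
    r = (s1 / s2) ** 2
    # d/2 * (r - 1 - log r) from the covariances, plus the mean-shift term.
    return 0.5 * d * (r - 1.0 - np.log(r)) + np.sum((mu1 - mu2) ** 2) / (2.0 * s2 ** 2)

for d in (1, 10, 100):
    print(d, kl_isotropic_gaussians(np.zeros(d), 1.0, np.zeros(d), 1.1))
```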

2509.24223 2026-06-16 cs.LG cs.CV stat.ML 版本更新

Semantic Editing with Coupled Stochastic Differential Equations

耦合随机微分方程的语义编辑

Jianxin Zhang, Clayton Scott

发表机构 * University of California, Berkeley(加州大学伯克利分校)

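AI summary: Proposes coupled stochastic differential equations (coupled SDEs) to guide the sampling process of pretrained generative models, achieving semantic edits with high prompt fidelity and near-pixel-level consistency without retraining.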

AI中文摘要

使用预训练的文本到图像模型编辑图像内容仍然具有挑战性。现有方法常常扭曲细节或引入意外伪影。我们提出使用\emph{耦合随机微分方程}(coupled SDEs)来引导任何可以通过求解SDE进行采样的预训练生成模型的采样过程,包括扩散模型和整流流模型。通过用相同的相关噪声驱动源图像和编辑图像,我们的方法将新样本引导至所需语义,同时保持与源图像的视觉相似性。该方法开箱即用,无需重新训练或辅助网络,并实现了高提示保真度和近像素级一致性。这些结果使耦合SDE成为受控生成式AI的简单而强大的工具。项目页面:https://z-jianxin.github.io/syncSDE-release/ 。代码:https://github.com/Z-Jianxin/syncSDE-release 。

英文摘要

Editing the content of an image with a pretrained text-to-image model remains challenging. Existing methods often distort fine details or introduce unintended artifacts. We propose using \emph{coupled stochastic differential equations} (coupled SDEs) to guide the sampling process of any pre-trained generative model that can be sampled by solving an SDE, including diffusion and rectified flow models. By driving both the source image and the edited image with the same correlated noise, our approach steers new samples toward the desired semantics while preserving visual similarity to the source. The method works out-of-the-box, without retraining or auxiliary networks, and achieves high prompt fidelity along with near-pixel-level consistency. These results position coupled SDEs as a simple yet powerful tool for controlled generative AI. Project page: https://z-jianxin.github.io/syncSDE-release/. Code: https://github.com/Z-Jianxin/syncSDE-release.
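The core mechanism is driving two sampling trajectories with the same noise. The 1D toy below uses two hypothetical drifts as stand-ins for the dynamics under a source and an edit prompt, and compares identical Brownian increments with independent ones. It is a sketch of the coupling idea under these assumptions, not the paper's sampler.

```python
import numpy as np

def simulate_pair(drift_a, drift_b, x0, sigma=1.0, n_steps=200, n_paths=5000, coupled=True, seed=0):
    """Euler-Maruyama for two SDEs from the same start; coupled=True feeds both the same increments."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    xa = np.full(n_paths, x0, dtype=float)
    xb = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        dw_a = rng.standard_normal(n_paths) * np.sqrt(dt)
        dw_b = dw_a if coupled else rng.standard_normal(n_paths) * np.sqrt(dt)
        xa = xa + drift_a(xa) * dt + sigma * dw_a
        xb = xb + drift_b(xb) * dt + sigma * dw_b
    return xa, xb

def source_drift(x):
    return -x            # hypothetical stand-in for the source-prompt dynamics

def edit_drift(x):
    return -(x - 0.5)    # hypothetical stand-in for the edit-prompt dynamics

for coupled in (True, False):
    xa, xb = simulate_pair(source_drift, edit_drift, x0=0.0, coupled=coupled)
    print(coupled, np.mean(np.abs(xa - xb)))
```

With shared noise the gap between the two trajectories is driven only by the drift difference (here it settles near 0.5 * (1 - e^-1) ≈ 0.316 at t = 1), whereas independent noise adds a random discrepancy on top; this is the intuition behind preserving the source's details while changing its semantics.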

2603.17353 2026-06-16 cs.LG cs.AI 版本更新

Learning Permutation Distributions via Reflected Diffusion on Ranks

通过秩上的反射扩散学习排列分布

Sizhuang He, Yangtian Zhang, Shiyang Zhang, David van Dijk

发表机构 * University of California, Berkeley(加州大学伯克利分校)

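AI summary: Proposes Soft-Rank Diffusion, which relaxes permutations into soft ranks for smoother diffusion trajectories and introduces contextualized generalized Plackett-Luce (cGPL) denoisers, outperforming prior diffusion methods on sorting and combinatorial optimization tasks.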

Comments 18 pages including the appendix, 7 figures, 9 tables, Accepted at ICML 2026

AI中文摘要

有限对称群 S_n 为排列提供了自然域,但由于其阶乘增长的大小和离散、非欧几里得结构,在 S_n 上学习概率分布具有挑战性。最近的排列扩散方法通过基于洗牌的随机游走(例如,riffle shuffles)定义前向加噪,并使用 Plackett-Luce (PL) 变体学习反向转移,但由此产生的轨迹可能很突兀,并且随着 n 的增长,去噪变得越来越困难。我们提出 Soft-Rank Diffusion,一种离散扩散框架,用结构化的软秩前向过程取代基于洗牌的破坏:通过将离散秩松弛为软秩,将排列提升到连续的潜在表示,从而产生更平滑、更易处理的轨迹。对于反向过程,我们引入了上下文广义 Plackett-Luce (cGPL) 去噪器,它推广了先前的 PL 风格参数化,并提高了序列决策结构的表达能力。在排序和组合优化基准上的实验表明,Soft-Rank Diffusion 始终优于先前的扩散基线,在长序列和内在序列设置中尤其有显著优势。

英文摘要

The finite symmetric group S_n provides a natural domain for permutations, yet learning probability distributions on S_n is challenging due to its factorially growing size and discrete, non-Euclidean structure. Recent permutation diffusion methods define forward noising via shuffle-based random walks (e.g., riffle shuffles) and learn reverse transitions with Plackett-Luce (PL) variants, but the resulting trajectories can be abrupt and increasingly hard to denoise as n grows. We propose Soft-Rank Diffusion, a discrete diffusion framework that replaces shuffle-based corruption with a structured soft-rank forward process: we lift permutations to a continuous latent representation of order by relaxing discrete ranks into soft ranks, yielding smoother and more tractable trajectories. For the reverse process, we introduce contextualized generalized Plackett-Luce (cGPL) denoisers that generalize prior PL-style parameterizations and improve expressivity for sequential decision structures. Experiments on sorting and combinatorial optimization benchmarks show that Soft-Rank Diffusion consistently outperforms prior diffusion baselines, with particularly strong gains in long-sequence and intrinsically sequential settings.
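One common way to relax discrete ranks into soft ranks is a sum of pairwise sigmoid comparisons with a temperature `tau`; as `tau` shrinks, the soft ranks approach the hard ranks. The abstract does not state which relaxation Soft-Rank Diffusion uses, so treat this as an illustrative assumption.

```python
import numpy as np

def soft_rank(x, tau=0.1):
    """Differentiable ranks (1 = smallest) via pairwise sigmoid comparisons."""
    x = np.asarray(x, dtype=float)
    diff = (x[:, None] - x[None, :]) / tau    # diff[i, j] = (x_i - x_j) / tau
    sig = 1.0 / (1.0 + np.exp(-diff))         # sigmoid; diagonal entries equal 0.5
    return 0.5 + sig.sum(axis=1)              # = 1 + sum over j != i of sigmoid(diff[i, j])

x = [0.5, -1.0, 2.0]
print(soft_rank(x, tau=1.0))   # smooth, non-integer ranks
print(soft_rank(x, tau=0.01))  # approximately [2., 1., 3.]
```

Because sigmoid(z) + sigmoid(-z) = 1, the soft ranks always sum to n(n+1)/2, exactly like hard ranks, while remaining differentiable in `x`.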

2604.02751 2026-06-16 cs.LG 版本更新

Understanding Latent Diffusability via Fisher Geometry

通过Fisher几何理解潜在可扩散性

Jing Gu, Morteza Mardani, Wonjun Lee, Dongmian Zou, Gilad Lerman

发表机构 * School of Mathematics, University of Minnesota(明尼苏达大学数学系) NVIDIA(NVIDIA公司) Department of Mathematics, The Ohio State University(俄亥俄州立大学数学系) Zu Chongzhi Center and DIRC, Duke Kunshan University(昆山杜克大学祖冲之中心和DIRC)

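AI summary: Quantifies latent-space diffusability by decomposing the rate of change of the MMSE along the diffusion trajectory into Fisher information and Fisher information rate terms, and derives theoretical conditions that preserve stable diffusability.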

AI中文摘要

扩散模型在潜在空间中常常退化,但其形式原因尚不清楚。我们通过沿扩散轨迹的最小均方误差(MMSE)变化率来量化潜在空间可扩散性。我们的框架将MMSE率分解为Fisher信息(FI)和Fisher信息率(FIR)的贡献。我们证明,虽然全局等距保证了FI对齐,但FIR由编码器和数据几何之间的相互作用控制。我们的分析将扩散退化解耦为四个惩罚项:维度压缩、切向畸变、高频编码器曲率和内在数据曲率。我们推导了保持FIR的理论条件,以确保稳定的可扩散性。跨多种自编码架构的实验证明了我们理论界限的意义。我们将FI和FIR建立为理解潜在可扩散性的全面分析框架。

英文摘要

Diffusion models often degrade in latent spaces, yet the formal causes remain poorly understood. We quantify latent-space diffusability via the rate of change of the Minimum Mean Squared Error (MMSE) along the diffusion trajectory. Our framework decomposes this MMSE rate into contributions from Fisher Information (FI) and Fisher Information Rate (FIR). We demonstrate that while global isometry ensures FI alignment, FIR is governed by the interplay between encoder and data geometries. Our analysis decouples diffusion degradation into four penalties: dimensional compression, tangential distortion, high-frequency encoder curvature, and intrinsic data curvature. We derive theoretical conditions for FIR preservation to ensure stable diffusability. Experiments across diverse autoencoding architectures demonstrate the implications of our theoretical bounds. We establish FI and FIR as a comprehensive analytical framework for understanding latent diffusability.
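The framework ties the MMSE to Fisher information. A standard identity behind such links: for $Y_t = X + \sqrt{t}\,Z$ with $Z \sim N(0, 1)$ independent of $X$, Tweedie's formula and Stein's lemma give $\mathrm{mmse}(t) = t - t^2 J(Y_t)$, where $J$ is the Fisher information of $Y_t$. The sketch checks this by Monte Carlo for a binary toy signal; it does not reproduce the paper's FI/FIR decomposition of the MMSE rate.

```python
import numpy as np

rng = np.random.default_rng(0)
t = 0.5                                    # noise variance
n = 1_000_000
x = rng.choice([-1.0, 1.0], size=n)        # toy data: symmetric binary signal
y = x + np.sqrt(t) * rng.standard_normal(n)

post_mean = np.tanh(y / t)                 # E[X | Y = y] for the binary prior
score = (post_mean - y) / t                # Tweedie: d/dy log p_Y(y) = (E[X | y] - y) / t
mmse = np.mean((x - post_mean) ** 2)
fisher = np.mean(score ** 2)               # Fisher information J(Y_t) = E[score^2]
print(mmse, t - t ** 2 * fisher)           # the two sides agree up to Monte-Carlo error
```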

2606.03212 2026-06-16 cs.LG 版本更新

Bayesian Tensor Decomposition with Diffusion Model Prior

贝叶斯张量分解与扩散模型先验

Zerui Tao, Qibin Zhao

发表机构 * Zerui Tao(泽瑞·陶) Qibin Zhao(赵启斌)

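AI summary: Proposes DiffBCP, which combines a cumulative shrinkage process prior on the CP factors with a pretrained diffusion model as a data prior and performs Bayesian CP decomposition via split Gibbs sampling, outperforming Bayesian, nonlinear, and plug-and-play TD baselines on image inpainting and denoising.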

Comments ICML 2026

AI中文摘要

低秩张量分解(TD)通常对干净、完全观测的数据有效,但在严重缺失或噪声下性能下降。低秩性本身是一种有用但有限的结构先验,额外的手工先验(如稀疏性或平滑性)仍难以捕捉真实世界数据的丰富统计特性。为了在重度污染下补偿这种弱的归纳偏置,我们希望注入一个学习到的、数据驱动的先验;然而,最先进的扩散模型与当前的TD和可处理的后验推断并不兼容。为了解决这些挑战,我们引入了DiffBCP,一种混合先验的贝叶斯CP分解框架,它将CP因子上的累积收缩过程先验(用于自动秩选择)与一个现成的预训练扩散模型(作为重构张量上的隐式数据先验)相结合。尽管似然、低秩约束和扩散先验之间存在耦合,为了使后验推断可处理,我们开发了一个分裂吉布斯采样器:CP因子允许共轭更新,而扩散块通过低秩引导的去噪进行采样。一个噪声自适应的耦合调度进一步减少了对手动调参退火的敏感性。在图像修复和去噪(包括高分辨率分布外图像)上的实验表明,与贝叶斯、非线性和即插即用TD基线相比,该方法具有一致的改进。

英文摘要

Low-rank tensor decomposition (TD) is usually effective on clean, fully observed data, but it often degrades under severe missingness or noise. Low-rankness is itself a useful but limited structural prior, and additional handcrafted priors (e.g., sparsity or smoothness) still fall short of capturing the rich statistics of real-world data. To compensate for this weak inductive bias under heavy corruption, one would like to inject a learned, data-driven prior; however, the state-of-the-art diffusion models are not readily compatible with current TD and tractable posterior inference. To address these challenges, we introduce DiffBCP, a hybrid-prior Bayesian CP decomposition framework that couples a cumulative shrinkage process prior over the CP factors for automatic rank selection with an off-the-shelf pre-trained diffusion model as an implicit data prior on the reconstructed tensor. To make posterior inference tractable despite the coupling among the likelihood, low-rank constraint, and diffusion prior, we develop a split Gibbs sampler: CP factors admit conjugate updates, while the diffusion block is sampled via low-rank-guided denoising. A noise-adaptive coupling schedule further reduces sensitivity to hand-tuned annealing. Experiments on image inpainting and denoising, including high-resolution out-of-distribution images, show consistent gains over Bayesian, nonlinear, and plug-and-play TD baselines.
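The abstract notes that, within the split Gibbs sampler, CP factors admit conjugate updates. Below is a minimal sketch for one row of the mode-1 factor, assuming Gaussian noise with precision `tau` and, as a simplification, an $N(0, I/\lambda)$ prior in place of the paper's cumulative shrinkage process prior. The diffusion block of the sampler is not shown, and the helper names are hypothetical.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rank-R CP tensor: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def sample_mode1_row(i, X, mask, B, C, tau, lam, rng):
    """Gaussian conditional draw of row A[i] given B, C and the observed entries of slice i."""
    jj, kk = np.nonzero(mask[i])              # observed (j, k) positions in slice i
    design = B[jj] * C[kk]                    # each row is the Hadamard product b_j * c_k
    y = X[i, jj, kk]
    R = B.shape[1]
    precision = tau * design.T @ design + lam * np.eye(R)
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)                 # symmetrize against round-off
    mean = cov @ (tau * design.T @ y)
    return rng.multivariate_normal(mean, cov), mean

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((n, 3)) for n in (8, 7, 6))
X = cp_reconstruct(A, B, C) + 0.01 * rng.standard_normal((8, 7, 6))
mask = rng.random((8, 7, 6)) < 0.7            # roughly 70% of entries observed
draw, mean = sample_mode1_row(0, X, mask, B, C, tau=1e4, lam=1.0, rng=rng)
print(np.round(mean - A[0], 3))               # posterior mean is close to the true row
```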

2606.06007 2026-06-16 cs.LG 版本更新

Diffusion Models for Adaptive Sequential Data Generation

自适应序列数据生成的扩散模型

Haoyang Cao, Minshuo Chen, Yinbin Han, Renyuan Xu

发表机构 * Department of Applied Mathematics and Statistics, Data Science and AI Institute, and Mathematical Institute for Data Science, Johns Hopkins University(应用数学与统计学系、数据科学与人工智能研究所、数据科学数学研究所,约翰霍普金斯大学) Department of Industrial Engineering and Management Sciences, Northwestern University(工业工程与管理科学系,西北大学) Department of Management Science and Engineering, Stanford University(管理科学与工程系,斯坦福大学)

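AI summary: Proposes a sequential forward-backward diffusion framework for adapted time series generation that progressively injects and removes noise along the sequence while conditioning on the generated history, introduces a new score-matching objective for efficient parallel training, and validates the method on synthetic data and mean-variance optimal portfolio construction.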

Comments 38 pages

AI中文摘要

生成逼真的合成序列数据在运筹学、金融、医疗、能源系统和科学计算等实际应用中至关重要,这些领域使用时间索引观测进行预测、模拟、风险评估和数据驱动决策。虽然扩散模型在生成静态数据方面取得了显著成功,但其直接扩展到序列设置往往无法捕捉时间依赖性和信息结构。设计能够以自适应方式模拟序列数据且不预知未来信息的扩散模型仍然是一个开放挑战。在这项工作中,我们提出了一种用于自适应时间序列生成的顺序前向后向扩散框架。我们的方法沿序列逐步注入和去除噪声,并基于先前生成的历史进行条件化以确保自适应性。引入了一种新的分数匹配目标以实现高效的并行训练。我们在一个通用框架下推导了严格的统计保证,然后以ReLU网络作为具体实例建立了分数逼近、分数估计和分布估计结果。在实验上,我们在合成数据(包括ARMA模型和高斯过程)上验证了我们的方法,并展示了其在构建均值-方差最优投资组合中的有效性。

英文摘要

Generating realistic synthetic sequential data is critical in real-world applications across operations research, finance, healthcare, energy systems, and scientific computing, where time-indexed observations are used for prediction, simulation, risk assessment, and data-driven decision-making. While diffusion models have achieved remarkable success in generating static data, their direct extensions to sequential settings often fail to capture temporal dependence and information structure. Designing diffusion models that can simulate sequential data in an adapted manner, and hence without anticipation of future information, therefore remains an open challenge. In this work, we propose a sequential forward-backward diffusion framework for adapted time series generation. Our approach progressively injects and removes noise along the sequence, conditioning on the previously generated history to ensure adaptiveness. A novel score-matching objective is introduced for efficient parallel training. We derive rigorous statistical guarantees under a generic framework, then establish score approximation, score estimation, and distribution estimation results with ReLU networks serving as a concrete instance. Empirically, we validate our method on synthetic data, including ARMA models and Gaussian processes, and demonstrate its effectiveness in constructing mean-variance optimal portfolios.
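The portfolio application can be read as: generate return scenarios, estimate their mean and covariance, and plug them into a mean-variance problem. The abstract does not specify the formulation, so the sketch assumes the unconstrained Markowitz objective $w^\top\mu - \tfrac{\gamma}{2} w^\top\Sigma w$, whose maximizer is $w^* = \Sigma^{-1}\mu/\gamma$. The numbers are hypothetical, and the samples are plain Gaussian draws rather than diffusion-generated sequences.

```python
import numpy as np

def mean_variance_weights(mu, sigma, gamma):
    """Unconstrained maximizer of w @ mu - (gamma / 2) * w @ sigma @ w, i.e. w* = sigma^{-1} mu / gamma."""
    return np.linalg.solve(sigma, mu) / gamma

mu = np.array([0.10, 0.20])        # hypothetical expected returns
sigma = np.diag([0.04, 0.09])      # hypothetical return covariance
print(mean_variance_weights(mu, sigma, gamma=2.0))   # approximately [1.25, 1.1111]

# Plug-in version: estimate mu and sigma from return samples (here simple Gaussian draws).
rng = np.random.default_rng(0)
returns = rng.multivariate_normal(mu, sigma, size=50_000)
w_hat = mean_variance_weights(returns.mean(axis=0), np.cov(returns, rowvar=False), gamma=2.0)
print(w_hat)
```

The plug-in weights are only as good as the scenarios feeding them, which is why the realism (and adaptedness) of generated sequences matters for this downstream use.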

2505.04486 2026-06-16 cs.CV cs.AI cs.LG 版本更新

Efficient Flow Matching using Latent Variables

使用潜在变量的高效流匹配

Anirban Samaddar, Yixuan Sun, Viktor Nilsson, Sandeep Madireddy

发表机构 * Argonne National Laboratory(阿贡国家实验室) KTH Royal Institute of Technology(瑞典皇家理工学院)

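AI summary: Proposes Latent-CFM, which conditions flow matching on features extracted by pretrained deep latent variable models, improving training efficiency and generation quality and outperforming existing methods on image and physical-field generation.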

AI中文摘要

流匹配模型在概率生成模型的图像生成任务中显示出巨大潜力。然而,文献中的大多数流匹配模型在从简单源分布(如标准高斯)学习流时,并未显式利用目标数据中的潜在聚类结构。这导致学习效率低下,尤其是对于许多通常位于低维流形中的高维真实世界数据集。为此,我们提出了 $\texttt{Latent-CFM}$,它通过使用预训练的深度潜在变量模型从数据中提取的特征作为条件,提供了高效的训练策略。通过对来自多模态分布的合成数据和广泛使用的图像基准数据集的实验,我们表明,$\texttt{Latent-CFM}$ 通过采用预训练的轻量级潜在变量模型,在显著减少训练和计算量的情况下,展现出比最先进的流匹配模型更好的生成质量。除了自然图像,我们还考虑了源自物理过程的空间场的生成建模。使用二维达西流数据集,我们证明了我们的方法比竞争方法生成更物理准确的样本。此外,通过潜在空间分析,我们证明了我们的方法可用于以潜在特征为条件的条件图像生成,这增加了生成过程的可解释性。

英文摘要

Flow matching models have shown great potential in image generation tasks among probabilistic generative models. However, most flow matching models in the literature do not explicitly utilize the underlying clustering structure in the target data when learning the flow from a simple source distribution like the standard Gaussian. This leads to inefficient learning, especially for many high-dimensional real-world datasets, which often reside on a low-dimensional manifold. To this end, we present $\texttt{Latent-CFM}$, which provides efficient training strategies by conditioning on the features extracted from data using pretrained deep latent variable models. Through experiments on synthetic data from multi-modal distributions and widely used image benchmark datasets, we show that $\texttt{Latent-CFM}$ exhibits improved generation quality with significantly less training and computation than state-of-the-art flow matching models by adopting pretrained lightweight latent variable models. Beyond natural images, we consider generative modeling of spatial fields stemming from physical processes. Using a 2D Darcy flow dataset, we demonstrate that our approach generates more physically accurate samples than competing approaches. In addition, through latent space analysis, we demonstrate that our approach can be used for image generation conditioned on latent features, which adds interpretability to the generation process.
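Latent-CFM conditions the flow on features of the data extracted by a pretrained latent variable model. The abstract does not state which probability path it uses, so the sketch assumes the common straight-line conditional flow matching path; `cond` is a hypothetical placeholder for the encoder features, and `zero_velocity` is a trivial stand-in for a trained network.

```python
import numpy as np

def cfm_pairs(x0, x1, t):
    """Straight-line path x_t = (1 - t) x0 + t x1 and its target velocity x1 - x0."""
    tt = t[:, None]                        # one time per sample, broadcast over features
    xt = (1.0 - tt) * x0 + tt * x1
    return xt, x1 - x0

def cfm_loss(velocity_fn, x0, x1, t, cond):
    """Squared error between a velocity model v(x_t, t, cond) and the CFM regression target."""
    xt, target = cfm_pairs(x0, x1, t)
    pred = velocity_fn(xt, t, cond)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))

def zero_velocity(xt, t, cond):
    return np.zeros_like(xt)               # trivial placeholder model

rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 2)) + 3.0     # toy data batch
x0 = rng.standard_normal((4, 2))           # source samples from N(0, I)
t = rng.uniform(size=4)
cond = x1.mean(axis=1, keepdims=True)      # hypothetical stand-in for latent features of x1
print(cfm_loss(zero_velocity, x0, x1, t, cond))
```

Passing `cond` into the velocity model is the only change relative to unconditional CFM; the regression target itself is unchanged.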

2511.09465 2026-06-16 stat.ML cs.LG 版本更新

Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions

分支流:带有分裂和删除的离散、连续和流形流匹配

Lukas Billera, Hedwig Nora Nordlinder, Jack Collier Ryder, Anton Oresten, Aron Stålmarck, Theodor Mosetti Björk, Ben Murrell

发表机构 * Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet(卡罗林斯卡研究所微生物学、肿瘤和细胞生物学系)

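AI summary: Proposes Branching Flows, which controls the number of sequence elements through stochastic branching and death with learned rates, making it suitable for variable-length data generation; validated on small molecule, antibody sequence, and protein backbone generation.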

Comments 39 pages, 16 figures

AI中文摘要

扩散和流匹配方法在状态空间连续的领域(如图像生成或蛋白质折叠与设计)以及离散领域(如扩散大语言模型)中显示出前景。当状态中的元素数量预先固定时(如图像),它们自然适用,但当大语言模型响应的长度或蛋白质链中的氨基酸数量未知时,则需要临时解决方案。这里我们提出分支流,一种生成建模框架,与扩散和流匹配方法一样,将简单分布传输到数据分布。但在分支流中,状态中的元素在二叉树森林上演化,以模型学习的速率随机分支和死亡。这使得模型在生成过程中能够控制序列中的元素数量。我们还表明,分支流可以与离散集、连续欧几里得空间、光滑流形以及混合这些组件的“多模态”乘积空间上的任何流匹配基础过程组合。我们在三个领域进行了演示:小分子生成(多模态)、抗体序列生成(离散)和蛋白质骨架生成(多模态),并表明分支流是一个学习目标稳定的强大分布学习器,并能带来新的功能。

英文摘要

Diffusion and flow matching approaches to generative modeling have shown promise in domains where the state space is continuous, such as image generation or protein folding & design, and in domains where it is discrete, exemplified by diffusion large language models. They offer a natural fit when the number of elements in a state is fixed in advance (e.g. images), but require ad hoc solutions when, for example, the length of a response from a large language model, or the number of amino acids in a protein chain, is not known a priori. Here we propose Branching Flows, a generative modeling framework that, like diffusion and flow matching approaches, transports a simple distribution to the data distribution. But in Branching Flows, the elements in the state evolve over a forest of binary trees, branching and dying stochastically with rates that are learned by the model. This allows the model to control, during generation, the number of elements in the sequence. We also show that Branching Flows can compose with any flow matching base process on discrete sets, continuous Euclidean spaces, smooth manifolds, and "multimodal" product spaces that mix these components. We demonstrate this in three domains: small molecule generation (multimodal), antibody sequence generation (discrete), and protein backbone generation (multimodal), and show that Branching Flows is a capable distribution learner with a stable learning objective, and that it enables new capabilities.
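In Branching Flows, elements split or die at learned rates. To see how such events change the element count, the sketch simulates a linear birth-death process with constant, hypothetical rates using the Gillespie algorithm. The paper's rates are learned and elements also move under a base flow; both are omitted here. For constant split rate $\lambda$ and death rate $\mu$, the expected count is $n_0 e^{(\lambda-\mu)T}$.

```python
import numpy as np

def simulate_branch_death(n0, split_rate, death_rate, t_end, rng):
    """Gillespie simulation of a linear birth-death process; returns the element count at t_end.

    Each element independently splits into two at `split_rate` or is deleted at `death_rate`.
    """
    n, t = n0, 0.0
    while n > 0:
        total_rate = n * (split_rate + death_rate)
        t += rng.exponential(1.0 / total_rate)     # waiting time until the next event
        if t > t_end:
            break
        if rng.random() < split_rate / (split_rate + death_rate):
            n += 1                                 # one element branches into two
        else:
            n -= 1                                 # one element dies
    return n

rng = np.random.default_rng(0)
counts = [simulate_branch_death(5, 1.0, 0.5, 1.0, rng) for _ in range(4000)]
print(np.mean(counts), 5 * np.exp(0.5))   # E[N(T)] = n0 * exp((split - death) * T)
```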

2512.07212 2026-06-16 cs.AI cs.LG 版本更新

Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation

从所见中采样:基于观测嵌入随机微分方程的扩散桥视觉运动策略学习

Zhaoyang Liu, Mokai Pan, Zhongyi Wang, Kaizhen Zhu, Haotao Lu, Haipeng Zhang, Jingya Wang, Ye Shi

发表机构 * University of Science and Technology of China(中国科学技术大学)

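AI summary: Proposes BridgePolicy, which integrates observations directly into the stochastic dynamics via a diffusion-bridge formulation and uses a semantic aligner to handle heterogeneous observations, outperforming state-of-the-art generative policies on 52 simulated and 5 real-world tasks.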

Comments Accepted by ICML 2026

AI中文摘要

基于扩散模型的模仿学习通过捕获多模态动作分布推动了机器人控制的发展。然而,现有方法通常仅将观测视为去噪网络的高层条件,而非将其整合到扩散过程本身的随机动力学中。因此,采样被迫从随机噪声开始,削弱了感知与控制之间的耦合,往往导致次优性能。我们提出BridgePolicy,一种生成式视觉运动策略,通过扩散桥公式将观测直接集成到随机动力学中。通过构建观测信息轨迹,BridgePolicy使采样能够从信息丰富的先验而非随机噪声开始,显著提高了控制的精度和可靠性。一个关键难点是扩散桥通常连接维度匹配的分布,而机器人观测是异构的,且与动作自然不对齐。为克服这一点,我们引入语义对齐器来统一视觉和状态输入,并将观测与动作表示对齐,使扩散桥适用于异构机器人数据。在三个基准测试的52个模拟任务和5个真实世界任务上的大量实验表明,BridgePolicy持续优于最先进的生成式策略。我们的代码可在 https://jianghcsr.github.io/BridgePolicy_page/ 获取。

英文摘要

Imitation learning with diffusion models has advanced robotic control by capturing multi-modal action distributions. However, existing methods typically treat observations only as high-level conditions to the denoising network, rather than integrating them into the stochastic dynamics of the diffusion process itself. As a result, the sampling is forced to begin from random noise, weakening the coupling between perception and control and often yielding suboptimal performance. We propose BridgePolicy, a generative visuomotor policy that directly integrates observations into the stochastic dynamics via a diffusion-bridge formulation. By constructing an observation-informed trajectory, BridgePolicy enables sampling to start from a rich and informative prior rather than random noise, substantially improving precision and reliability in control. A key difficulty is that a diffusion bridge normally connects distributions of matched dimensionality, while robotic observations are heterogeneous and not naturally aligned with actions. To overcome this, we introduce a semantic aligner to unify the visual and state inputs and align the observations with action representations, making diffusion bridges applicable to heterogeneous robot data. Extensive experiments across 52 simulation tasks on three benchmarks and 5 real-world tasks demonstrate that BridgePolicy consistently outperforms state-of-the-art generative policies. Our code is available at https://jianghcsr.github.io/BridgePolicy_page/.
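BridgePolicy starts sampling from an observation-informed point rather than pure noise. The abstract does not give its exact bridge SDE; as an assumption, the sketch uses the textbook Brownian bridge $dX_t = \frac{b - X_t}{1 - t}\,dt + \sigma\,dW_t$, whose paths start at `start` (standing in for an aligned observation embedding) and are pinned to `end` (standing in for an action) at $t = 1$. A learned policy would replace the known endpoint with a learned drift.

```python
import numpy as np

def brownian_bridge_paths(start, end, sigma=1.0, n_steps=1000, n_paths=2000, seed=0):
    """Euler-Maruyama for dX = (end - X) / (1 - t) dt + sigma dW with X_0 = start; returns X_1."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.full(n_paths, start, dtype=float)
    for k in range(n_steps):
        t = k * dt                     # t <= 1 - dt, so the drift stays finite
        noise = sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        x = x + (end - x) / (1.0 - t) * dt + noise
    return x

x1 = brownian_bridge_paths(start=0.0, end=2.0)
print(np.abs(x1 - 2.0).max())   # small: the last step lands within about sigma * sqrt(dt) of `end`
```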

2512.15313 2026-06-16 cs.SD cs.LG 版本更新

Time-Varying Audio Effect Modeling by End-to-End Adversarial Training

通过端到端对抗训练进行时变音频效果建模

Yann Bourdin, Pierrick Legrand, Fanny Roche

发表机构 * Arturia Inria center at the University of Bordeaux(Inria中心,位于波尔多大学)

AI总结 提出一种生成对抗网络框架,仅用输入输出音频记录建模时变音频效果,无需调制信号提取,通过两阶段训练策略和状态预测网络实现黑箱建模。

Comments (03/2026) Accepted to the Journal of the Audio Engineering Society (JAES). Accompanying website: https://ybourdin.github.io/sptvmod

AI中文摘要

深度学习已成为音频效果建模的标准方法,但严格的黑箱建模对于时变系统仍然存在问题。与时不变效果不同,在具有内部调制的设备上训练模型通常需要记录或提取控制信号,以确保标准损失函数所需的时间对齐。本文介绍了一种生成对抗网络(GAN)框架,仅使用输入输出音频记录来建模此类效果,无需提取调制信号。我们提出了一种卷积循环架构,通过两阶段策略进行训练:初始对抗阶段允许模型在没有严格相位约束的情况下学习调制行为的分布,随后是监督微调阶段,其中状态预测网络(SPN)估计所需的初始内部状态,以使模型与目标同步。此外,开发了一种基于啁啾序列(chirp-train)信号的新指标来量化调制精度。对复古硬件移相器的建模实验证明了该方法在完全黑箱场景下捕获时变动态的能力。

英文摘要

Deep learning has become a standard approach for the modeling of audio effects, yet strictly black-box modeling remains problematic for time-varying systems. Unlike time-invariant effects, training models on devices with internal modulation typically requires the recording or extraction of control signals to ensure the time-alignment required by standard loss functions. This paper introduces a Generative Adversarial Network (GAN) framework to model such effects using only input-output audio recordings, without requiring modulation-signal extraction. We propose a convolutional-recurrent architecture trained via a two-stage strategy: an initial adversarial phase allows the model to learn the distribution of the modulation behavior without strict phase constraints, followed by a supervised fine-tuning phase where a State Prediction Network (SPN) estimates the initial internal states required to synchronize the model with the target. Additionally, a new metric based on chirp-train signals is developed to quantify modulation accuracy. Experiments modeling a vintage hardware phaser demonstrate the method's ability to capture time-varying dynamics in a fully black-box context.
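
The alignment problem described above shows up even in a toy phaser: a cascade of first-order all-pass filters whose coefficient is swept by a low-frequency oscillator (LFO) and mixed with the dry signal. The sketch below is a generic textbook-style phaser, not the vintage unit or the paper's model. The stage count, LFO rate, depth, and 50/50 dry/wet mix are arbitrary assumptions. Two runs that differ only in the LFO's initial phase give clearly different waveforms, so a sample-wise loss penalizes a model that has the right modulation behavior but the wrong internal state.

```python
import math
import random

def phaser(x, fs, lfo_hz=1.0, lfo_phase=0.0, depth=0.7, stages=4):
    """Toy phaser: dry signal mixed with an LFO-modulated all-pass cascade.

    Each stage is H(z) = (a + z^-1) / (1 + a z^-1), i.e.
    y[n] = a*x[n] + x[n-1] - a*y[n-1], with a = depth * sin(LFO phase);
    keeping |depth| < 1 keeps |a| < 1.
    """
    x_prev = [0.0] * stages  # per-stage previous input
    y_prev = [0.0] * stages  # per-stage previous output
    out = []
    for n, sample in enumerate(x):
        a = depth * math.sin(2.0 * math.pi * lfo_hz * n / fs + lfo_phase)
        s = sample
        for k in range(stages):
            y = a * s + x_prev[k] - a * y_prev[k]
            x_prev[k], y_prev[k] = s, y
            s = y
        out.append(0.5 * (sample + s))  # equal dry/wet mix creates moving notches
    return out

def mse(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

if __name__ == "__main__":
    rng = random.Random(0)
    fs = 8000
    noise = [rng.uniform(-1.0, 1.0) for _ in range(fs)]  # 1 s of white noise
    ref = phaser(noise, fs, lfo_phase=0.0)
    print("same LFO phase   :", mse(ref, phaser(noise, fs, lfo_phase=0.0)))
    print("shifted LFO phase:", mse(ref, phaser(noise, fs, lfo_phase=math.pi)))
```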

2602.01394 2026-06-16 eess.AS cs.LG cs.SD 版本更新

SSNAPS: Audio-Visual Separation of Speech and Background Noise with Diffusion Inverse Sampling

SSNAPS: 基于扩散逆采样的语音与背景噪声视听分离

Yochai Yemini, Yoav Ellinson, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya

发表机构 * Bar-Ilan University(巴伊兰大学) OriginAI

AI总结 提出一种无监督的视听语音分离方法,利用扩散先验和逆采样联合建模语音与噪声,在单麦克风场景下优于有监督基线,并支持离屏说话人分离。

AI中文摘要

本文解决了在真实环境噪声下进行视听单麦克风语音分离和增强的挑战。我们的方法基于生成逆采样,其中我们用专用的扩散先验对干净语音和环境噪声进行建模,并联合利用它们来恢复所有潜在源。为此,我们重新制定了一个最近的逆采样器以匹配我们的设置。我们在包含1、2和3个说话人以及噪声的混合信号上进行了评估,结果表明,尽管是完全无监督的,我们的方法在所有条件下的WER上始终优于领先的有监督基线。我们进一步扩展了我们的框架以处理离屏说话人分离。此外,分离出的噪声分量具有高保真度,使其适用于声学场景的下游检测。代码和预训练模型将在论文被接收后公开。演示页面:https://ssnaps2026.github.io/ssnaps2026/

英文摘要

This paper addresses the challenge of audio-visual single-microphone speech separation and enhancement in the presence of real-world environmental noise. Our approach is based on generative inverse sampling, where we model clean speech and ambient noise with dedicated diffusion priors and jointly leverage them to recover all underlying sources. To achieve this, we reformulate a recent inverse sampler to match our setting. We evaluate on mixtures of 1, 2, and 3 speakers with noise and show that, despite being entirely unsupervised, our method consistently outperforms leading supervised baselines in WER across all conditions. We further extend our framework to handle off-screen speaker separation. Moreover, the high fidelity of the separated noise component makes it suitable for downstream detection of the acoustic scene. Code and pretrained models will become available upon acceptance. Demo page: https://ssnaps2026.github.io/ssnaps2026/

2604.23952 2026-06-16 stat.ML cs.LG nlin.CD 版本更新

Conditional Score-Based Modeling of Effective Langevin Dynamics

基于条件分数的有效朗之万动力学建模

Ludovico T. Giorgini

发表机构 * Department of Mathematics, Massachusetts Institute of Technology(数学系,麻省理工学院)

AI总结 提出一种基于有限时间转移密度条件分数的随机降阶模型校准方法,通过最小二乘拟合从数据中推断漂移和扩散系数,避免轨迹微分或状态空间划分。

AI中文摘要

随机降阶模型广泛用于表示复杂系统的有效动力学,但根据数据估计其漂移和扩散系数仍然具有挑战性。标准方法通常依赖于短时间轨迹增量、状态空间划分或候选模型的重复模拟,这些方法对于高维系统、粗时间采样或非均匀采样数据变得不可靠或计算成本高昂。我们引入了一种数据驱动的校准方法,该方法基于随机降阶模型系数与有限时间转移密度的条件分数(定义为转移密度对初始状态的对数梯度)之间的新关系。由此得到的恒等式将滞后相关函数的导数表示为观测到的滞后对上的平稳期望,其中涉及该条件分数和未知模型系数。这种公式允许直接从有限滞后统计量约束漂移和扩散结构,而无需在校准过程中对轨迹进行微分、划分状态空间或重复积分候选降阶模型,从而产生一个关于平稳滞后对的最小二乘拟合问题。我们在三个复杂度递增的系统上验证了该方法:一个解析可解的Cox-Ingersoll-Ross扩散过程、一个具有仿射乘性噪声的二维非平衡扩散过程,以及一个周期性的软自旋随机朗道-利夫希茨链。在这些测试中,推断出的模型在再现有限滞后动力学相关性的同时保持了不变统计量。该框架为从数据中学习再现规定统计和动力学性质的随机降阶模型提供了一种可扩展的途径。

英文摘要

Stochastic reduced-order models are widely used to represent the effective dynamics of complex systems, but estimating their drift and diffusion coefficients from data remains challenging. Standard approaches often rely on short-time trajectory increments, state-space partitioning, or repeated simulation of candidate models, which become unreliable or computationally expensive for high-dimensional systems, coarse temporal sampling, or unevenly sampled data. We introduce a data-driven calibration method based on a novel relationship between the coefficients of a stochastic reduced model and the conditional score of the finite-time transition density, defined as the gradient of the logarithm of the transition density with respect to the initial state. The resulting identity expresses derivatives of lagged correlation functions as stationary expectations over observed lagged pairs involving this conditional score and the unknown model coefficients. This formulation allows the drift and diffusion structure to be constrained directly from finite-lag statistics, without differentiating trajectories, partitioning state space, or repeatedly integrating candidate reduced models during calibration, yielding a least-squares fitting problem over stationary lagged pairs. We validate the approach on three systems of increasing complexity: an analytically tractable Cox-Ingersoll-Ross diffusion, a two-dimensional nonequilibrium diffusion with affine multiplicative noise, and a periodic soft-spin stochastic Landau-Lifshitz chain. Across these tests, the inferred models preserve the invariant statistics while reproducing finite-lag dynamical correlations. The framework provides a scalable route for learning stochastic reduced-order models from data that reproduce prescribed statistical and dynamical properties.
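
As a point of comparison for the finite-lag idea, the sketch below simulates the Cox-Ingersoll-Ross test case with Euler-Maruyama. It then recovers the mean-reversion rate from a single lag-$\tau$ autocorrelation, using the standard fact that for CIR $E[X_{t+\tau} \mid X_t] = \theta + (X_t - \theta)e^{-\kappa\tau}$, so the stationary autocorrelation is $e^{-\kappa\tau}$. This is a classical moment-matching baseline, not the paper's conditional-score identity. The parameter values, step size, and lag are arbitrary choices.

```python
import math
import random

def simulate_cir(kappa, theta, sigma, x0, dt, n_steps, seed=0):
    """Euler-Maruyama for dX = kappa*(theta - X) dt + sigma*sqrt(X) dW.

    The square root uses max(X, 0) (full truncation) so the scheme stays defined
    if a discrete step overshoots below zero.
    """
    rng = random.Random(seed)
    x, path = x0, [x0]
    sqdt = math.sqrt(dt)
    for _ in range(n_steps):
        x = x + kappa * (theta - x) * dt + sigma * math.sqrt(max(x, 0.0)) * sqdt * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def lag_autocorr(path, lag):
    """Sample autocorrelation of a stationary series at an integer lag."""
    n = len(path) - lag
    mean = sum(path) / len(path)
    var = sum((v - mean) ** 2 for v in path) / len(path)
    cov = sum((path[i] - mean) * (path[i + lag] - mean) for i in range(n)) / n
    return cov / var

def estimate_kappa(path, dt, lag):
    """Invert rho(tau) = exp(-kappa * tau) at tau = lag * dt."""
    return -math.log(lag_autocorr(path, lag)) / (lag * dt)

if __name__ == "__main__":
    dt = 0.01
    path = simulate_cir(kappa=1.0, theta=1.0, sigma=0.5, x0=1.0, dt=dt, n_steps=200_000)
    print("kappa_hat =", estimate_kappa(path, dt, lag=50))  # true kappa is 1.0
```

A single-lag estimate only constrains the drift's linear part; the appeal of the score-based identity in the abstract is that it also constrains the diffusion structure without this kind of model-specific closed form.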

2605.03573 2026-06-16 stat.ML cs.LG 版本更新

Stochastic Schrödinger Diffusion Models for Pure-State Ensemble Generation

随机薛定谔扩散模型用于纯态集合生成

Jian Xu, Wei Chen, Shigui Li, Chao Li, Jingyuan Zheng, Delu Zeng, John Paisley, Qibin Zhao

发表机构 * RIKEN iTHEMS RIKEN AIP South China University of Technology(华南理工大学) Stanford University(斯坦福大学) Columbia University(哥伦比亚大学)

AI总结 本文提出随机薛定谔扩散模型(SSDMs),在复射影空间$\mathbb{CP}^{d-1}$上构建基于分数的生成框架,通过局部欧几里得奥恩斯坦-乌伦贝克(OU)近似实现无需解析转移密度的训练,提升量子机器学习的泛化能力。

AI中文摘要

在量子机器学习(QML)中,经典数据通常被编码为量子纯态并直接作为量子表示处理,这推动了在表示层面直接工作的生成模型的发展:从底层纯态集合中采样新量子态,而非扰动经典输入后重新制备。然而,由于复射影空间$\mathbb{CP}^{d-1}$的非欧几里得几何以及转移密度难以处理,将具有明确反向时间采样器的分数扩散模型扩展到量子纯态集合仍具挑战性。我们提出了随机薛定谔扩散模型(SSDMs),一种配备Fubini-Study(FS)度量的内在基于分数的生成框架。SSDMs通过随机薛定谔方程(SSE)实现正向黎曼扩散,并推导出由黎曼分数$\nabla_{\mathrm{FS}} \log p_t$驱动的反向时间动力学。为了在没有解析转移密度的情况下进行训练,我们引入了一个局部时间目标,它基于FS法坐标中的局部欧几里得奥恩斯坦-乌伦贝克(OU)近似,从而得到一个映射回流形的解析教师分数。实验表明,SSDMs能够忠实捕捉目标纯态集合的统计特性,包括可观测量的矩、重叠核MMD和纠缠度量,并且SSDM生成的量子表示通过表示层面的数据增强提升了下游QML的泛化能力。

英文摘要

Quantum machine learning increasingly relies on pure-state representations, motivating generative models that sample directly in quantum representation space rather than perturbing classical inputs and re-encoding them. We introduce Stochastic Schrödinger Diffusion Models (SSDMs), a score-based generative framework that defines diffusion, scores, and reverse-time sampling intrinsically on the complex projective manifold $\mathbb{CP}^{d-1}$ under the Fubini-Study metric. SSDMs combine a Riemannian Ornstein-Uhlenbeck forward diffusion with a stochastic Schrödinger realization, and learn reverse-time dynamics driven by the Riemannian score. Our central technical contribution is a local-time learning objective that exploits the local Euclidean OU limit of intrinsic manifold diffusions in Fubini-Study normal coordinates to obtain an analytic teacher score, bypassing the intractable transition densities that limit existing Riemannian score-based models. Across synthetic, physics-inspired (TFIM, XXZ), and quantum feature-state benchmarks with up to 14 qubits, SSDMs match target pure-state ensembles more closely than both ambient Euclidean and matched Riemannian score-based baselines, improving MMD and observable statistics by orders of magnitude, and they improve representation-level diagnostics for downstream quantum kernel methods.
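
The state space $\mathbb{CP}^{d-1}$ identifies unit vectors that differ only by a global phase. A minimal sketch of the standard Fubini-Study geodesic distance, $d_{\mathrm{FS}}(\psi,\phi) = \arccos|\langle\psi|\phi\rangle|$ for normalized states, makes this quotient structure concrete. It is a textbook formula, not part of the SSDM training objective, and the helper names are illustrative.

```python
import cmath
import math

def normalize(psi):
    """Scale a complex amplitude vector to unit 2-norm."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in psi))
    if norm == 0.0:
        raise ValueError("zero vector is not a quantum state")
    return [c / norm for c in psi]

def fubini_study_distance(psi, phi):
    """Geodesic distance on CP^{d-1}: arccos |<psi|phi>| for normalized states."""
    psi, phi = normalize(psi), normalize(phi)
    overlap = abs(sum(a.conjugate() * b for a, b in zip(psi, phi)))
    return math.acos(min(1.0, overlap))  # clip rounding error above 1

if __name__ == "__main__":
    plus = [1 + 0j, 1 + 0j]
    phased = [cmath.exp(0.7j) * c for c in plus]                 # same ray, new global phase
    print(fubini_study_distance(plus, phased))                   # ~0: same point of CP^1
    print(fubini_study_distance([1 + 0j, 0j], [0j, 1 + 0j]))     # pi/2: orthogonal states
```

Because the distance ignores global phase, a Euclidean model on raw amplitudes would treat physically identical states as different points. This is one motivation for working intrinsically on the manifold.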

2605.18324 2026-06-16 cs.CV cs.AI cs.GR cs.LG stat.ML 版本更新

Improved Baselines with Representation Autoencoders

改进的基于表示自动编码器的基线

Jaskirat Singh, Boyang Zheng, Zongze Wu, Richard Zhang, Eli Shechtman, Saining Xie

发表机构 * Adobe Research(Adobe研究院) ANU(澳大利亚国立大学) New York University(纽约大学)

AI总结 本文研究了基于表示自动编码器(RAE)的设计选择,发现三个见解,简化并改进了RAE。首先,研究了一种通用公式,将表示定义为最后k个编码器层的总和,而不是仅最终层。其次,研究了RAE与表示对齐(REPA)的假设,发现两者具有互补的工作机制。最后,改进了RAE在无分类器指导(CFG)中的表现,通过重新参数化DiT模型输出,实现了无需训练第二个模型的指导效果。RAEv2在ImageNet-256上达到了1.06的gFID,且训练效率显著提高。

英文摘要

Representation Autoencoders (RAE) replace the traditional VAE with pretrained vision encoders. In this paper, we systematically investigate several design choices and find three insights that simplify and improve RAE. First, we study a generalized formulation where the representation is defined as the sum of the last k encoder layers rather than the final layer alone. This simple change greatly improves reconstruction without encoder finetuning or specialized data (e.g., text, faces). Second, we study the prevalent assumption that RAE (using the pretrained representation as the encoder) replaces representation alignment (REPA), which instead distills the same representation into intermediate layers. Through large-scale empirical analysis, we uncover a surprising finding: RAE and REPA exhibit complementary working mechanisms, allowing the same representation to be used both as the encoder and as the target for intermediate diffusion layers. Finally, the original RAE struggles with classifier-free guidance (CFG) and requires training a second, weaker diffusion model for AutoGuidance (AG). We show that REPA itself can be viewed as x-prediction in RAE latent space. By simply re-parameterizing the output of the DiT model, it can provide guidance for "free". Overall, RAEv2 converges more than 10x faster than the original RAE, achieving a state-of-the-art gFID of 1.06 in just 80 epochs on ImageNet-256. On FDr6, RAEv2 achieves a state-of-the-art 2.17 at just 80 epochs, compared to the previous best of 3.26 (800 epochs), without any post-training. This motivates EPFID@k (epochs to reach unguided gFID < k) as a measure of training efficiency. RAEv2 attains an EPFID@2 of 35 epochs, versus 177 for the original RAE. We also validate our approach across diverse settings for text-to-image generation and navigation world models, showing consistent improvements. The code is available at https://raev2.github.io.
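
Two definitions in this abstract are simple enough to pin down in code. Below, `last_k_sum` forms a representation as the element-wise sum of the last k encoder layers' features, and `epfid_at` returns the first epoch whose unguided gFID falls below k, following the abstract's definition of EPFID@k. Both are hypothetical helpers written from the abstract's wording. Whether RAEv2 normalizes or weights the summed layers is not stated, so this sketch does not model either.

```python
from typing import Optional, Sequence, Tuple

def last_k_sum(layer_feats: Sequence[Sequence[float]], k: int) -> list:
    """Element-wise sum of the last k layers' feature vectors (k=1: final layer only)."""
    if not 1 <= k <= len(layer_feats):
        raise ValueError("k must be between 1 and the number of layers")
    tail = layer_feats[-k:]
    return [sum(vals) for vals in zip(*tail)]

def epfid_at(gfid_by_epoch: Sequence[Tuple[int, float]], k: float) -> Optional[int]:
    """EPFID@k: first epoch whose unguided gFID is below k, or None if never reached.

    `gfid_by_epoch` holds (epoch, gfid) pairs in increasing epoch order.
    """
    for epoch, gfid in gfid_by_epoch:
        if gfid < k:
            return epoch
    return None
```

With k=1, `last_k_sum` reduces to the original RAE choice of using only the final layer.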

2606.13769 2026-06-16 cs.RO cs.CV cs.LG 版本更新

$\mu_0$: A Scalable 3D Interaction-Trace World Model

$\mu_0$: 一种可扩展的3D交互轨迹世界模型

Seungjae Lee, Yoonkyo Jung, Jusuk Lee, Jonghun Shin, Amir Hossein Shahidzadeh, Yao-Chih Lee, H. Jin Kim, Jia-Bin Huang, Furong Huang

发表机构 * University of Maryland, College Park(马里兰大学帕克分校) Seoul National University(首尔大学)

AI总结 提出基于3D轨迹的可扩展世界模型$\mu_0$,通过预测交互点轨迹实现跨本体机器人学习,无需动作标签,性能媲美有监督模型。

AI中文摘要

能够捕捉动作如何引起物理变化的世界模型使得可扩展的机器人学习成为可能,而无需依赖特定本体的动作标签。像素空间视频模型提供了广泛的视觉先验,但将模型容量消耗在密集外观重建上,而直接动作模型则需要特定本体的标签,阻碍了可扩展性。我们提出$\mu_0$,一种基于3D轨迹的可扩展世界模型。$\mu_0$不是预测密集像素或直接建模动作,而是预测显著交互点(如物体、工具、手和接触区域)的平滑3D轨迹,从而产生一个紧凑、与本体无关的运动接口。为了能够从多样化的视频源进行训练,我们的TraceExtract系统通过选择关键点、构建全局对齐的轨迹以及将运动片段与层次化语言描述关联,自动提取3D监督。这种TraceExtract监督通过将预训练的视觉-语言骨干网络与模块化轨迹专家相结合来预训练$\mu_0$,其中轨迹专家通过B样条控制点表示每个查询并预测未来轨迹。实验表明,$\mu_0$在2D和3D轨迹预测方面均优于基线方法,包括轨迹预测模型和分词VLM方法。由于$\mu_0$是冻结且可重用的,它可以与动作专家配对用于下游机器人本体。尽管是无动作预训练,由此产生的轨迹条件策略在性能上与使用动作监督预训练的VLA模型(如$\pi_0$)相当。这些结果确立了3D轨迹作为跨本体操作的可扩展和可迁移表示。

英文摘要

World models that capture how actions induce physical change enable scalable robot learning without reliance on embodiment-specific action labels. Pixel-space video models provide broad visual priors but expend model capacity on dense appearance reconstruction, while direct action models require embodiment-specific labels that hinder scalability. We present $\mu_0$, a scalable world model based on 3D traces. Rather than predicting dense pixels or directly modeling actions, $\mu_0$ forecasts smooth 3D trajectories for salient interaction points such as objects, tools, hands, and contact regions, yielding a compact, embodiment-agnostic motion interface. To enable training from diverse video sources, our TraceExtract system automatically extracts 3D supervision by selecting keypoints, constructing globally aligned traces, and associating motion segments with hierarchical language captions. This TraceExtract supervision pretrains $\mu_0$ by combining a pretrained vision-language backbone with a modular trace expert, which represents each query via B-spline control points and predicts future traces. Experiments show that $\mu_0$ outperforms baselines in both 2D and 3D trace prediction, including trace prediction models and tokenized VLM methods. Because $\mu_0$ is frozen and reusable, it can be paired with action experts for downstream robot embodiments. Despite action-free pretraining, the resulting trace-conditioned policies achieve performance competitive with VLA models pretrained with action supervision, such as $\pi_0$. These results establish 3D traces as a scalable and transferable representation for cross-embodiment manipulation.
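
The trace expert is described as representing each query through B-spline control points. To show why that yields smooth, compact traces, here is a minimal uniform cubic B-spline evaluator over 3D control points. The uniform knot spacing, cubic degree, and sampling density are assumptions, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def _basis(u: float) -> Tuple[float, float, float, float]:
    """Uniform cubic B-spline basis weights on one segment, u in [0, 1]; they sum to 1."""
    u2, u3 = u * u, u * u * u
    return ((1 - u) ** 3 / 6.0,
            (3 * u3 - 6 * u2 + 4) / 6.0,
            (-3 * u3 + 3 * u2 + 3 * u + 1) / 6.0,
            u3 / 6.0)

def eval_trace(ctrl: Sequence[Point], samples_per_segment: int = 8) -> List[Point]:
    """Sample a C2-continuous 3D trace from at least 4 control points."""
    if len(ctrl) < 4:
        raise ValueError("a cubic B-spline needs at least 4 control points")
    pts: List[Point] = []
    n_seg = len(ctrl) - 3  # each segment blends 4 consecutive control points
    for seg in range(n_seg):
        last = seg == n_seg - 1
        steps = samples_per_segment + 1 if last else samples_per_segment  # include endpoint once
        for j in range(steps):
            w = _basis(j / samples_per_segment)
            pts.append(tuple(sum(w[m] * ctrl[seg + m][d] for m in range(4)) for d in range(3)))
    return pts
```

A handful of control points thus determines an arbitrarily dense, twice-differentiable trajectory. This fits the abstract's framing of traces as a compact motion interface.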

5. 优化、泛化与理论分析 53 篇

2606.14970 2026-06-16 cs.LG 新提交

Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning

基于LMO方法的零阶无参数优化:高效微调的新方法

Dmitriy Bystrov, Daniil Medyakov, Dmitry Bylinkin, Aleksandr Beznosikov

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 针对大模型微调中反向传播内存开销大、零阶优化对步长和平滑参数敏感的问题,提出统一无梯度训练、自适应调参和非欧几里得更新几何的AdaNAGED方法,并在OPT-1.3B模型上验证有效性。

Comments 29 pages, 1 table

AI中文摘要

微调大型语言模型(LLM)已成为现代优化的核心应用,使预训练模型能够适应多样化的下游任务和特定领域数据。大规模微调的主要障碍是反向传播的内存开销,这需要存储激活值、梯度和优化器状态。零阶(ZO)优化提供了一种内存高效的替代方案,但其性能对步长和平滑参数高度敏感,通常需要昂贵的任务特定调参。无参数(PF)优化通过在没有问题相关常数先验知识的情况下调整算法参数来解决这一问题。此外,大规模微调可以受益于几何感知更新,该更新考虑了参数块的异质结构,这可以通过利用线性最小化预言机(LMO)的方法来建模。在这项工作中,我们研究了基于LMO的ZO优化的PF自适应,并引入了$\texttt{AdaNAGED}$,一种统一无梯度训练、自适应调参和非欧几里得更新几何的方法。我们建立了收敛保证,并在使用$\texttt{OPT-1.3B}$模型的大规模LLM微调任务上验证了该方法。

英文摘要

Fine-tuning large language models (LLMs) has become a central application of modern optimization, enabling pretrained models to adapt to diverse downstream tasks and domain-specific data. A major obstacle in large-scale fine-tuning is the memory overhead of backpropagation, which requires storing activations, gradients, and optimizer states. Zeroth-order (ZO) optimization offers a memory-efficient alternative, but its performance is highly sensitive to the stepsize and smoothing parameter, often requiring costly task-specific tuning. Parameter-free (PF) optimization addresses this issue by adapting algorithmic parameters without prior knowledge of problem-dependent constants. Moreover, large-scale fine-tuning can benefit from geometry-aware updates that account for the heterogeneous structure of parameter blocks, which can be modeled through methods that exploit a linear minimization oracle (LMO). In this work, we study PF adaptation for LMO-based ZO optimization and introduce $\texttt{AdaNAGED}$, a method that unifies gradient-free training, adaptive tuning, and non-Euclidean update geometry. We establish convergence guarantees and validate the method on a large-scale LLM fine-tuning task with the $\texttt{OPT-1.3B}$ model.
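
The two ingredients named here, a zeroth-order (finite-difference) gradient estimate and an LMO-based update, can be combined in a few lines. The sketch below uses coordinate-wise central differences and the LMO of an $\ell_\infty$ ball, $\arg\min_{\|s\|_\infty \le r}\langle g, s\rangle = -r\,\mathrm{sign}(g)$, with a hand-picked decaying radius. It is deliberately not parameter-free and is not AdaNAGED, which according to the abstract adapts these parameters automatically. The smoothing parameter `mu` and the radius schedule $r_t = 1/(t+2)$ are assumptions.

```python
from typing import Callable, List

Vec = List[float]

def zo_grad(f: Callable[[Vec], float], x: Vec, mu: float = 1e-3) -> Vec:
    """Coordinate-wise central-difference estimate of grad f(x): 2 function calls per coordinate."""
    g = []
    for i in range(len(x)):
        xp, xm = x[:], x[:]
        xp[i] += mu
        xm[i] -= mu
        g.append((f(xp) - f(xm)) / (2.0 * mu))
    return g

def lmo_linf(g: Vec, radius: float) -> Vec:
    """Linear minimization oracle of the l_inf ball: a minimizer of <g, s> over ||s||_inf <= r."""
    return [-radius if gi > 0 else (radius if gi < 0 else 0.0) for gi in g]

def zo_lmo_descent(f: Callable[[Vec], float], x0: Vec, steps: int = 200, mu: float = 1e-3) -> Vec:
    """x_{t+1} = x_t + LMO(g_t) with a fixed decaying radius r_t = 1 / (t + 2)."""
    x = x0[:]
    for t in range(steps):
        s = lmo_linf(zo_grad(f, x, mu), 1.0 / (t + 2))
        x = [xi + si for xi, si in zip(x, s)]
    return x

if __name__ == "__main__":
    quad = lambda v: sum(vi * vi for vi in v)
    x = zo_lmo_descent(quad, [1.0] * 5)
    print(quad([1.0] * 5), "->", quad(x))
```

Coordinate-wise differences cost two loss evaluations per parameter, so they only make sense in low dimension. Memory-efficient LLM fine-tuning typically estimates directional derivatives along random perturbations instead.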

2606.15115 2026-06-16 cs.LG 新提交

Diversity-Driven Offline Multi-Objective Optimization via Nested Pareto Set Learning

基于嵌套帕累托集学习的多样性驱动离线多目标优化

Yiyi Zhu, Yaolin Wen, Xiang Xia, Xin An, Hanyi Si, Xiang Shu, Yangde Fu, Liang Dou, Hong Qian

AI总结 针对离线多目标优化中的分布外问题,提出DOMOO方法,通过累积风险控制、嵌套帕累托集学习和多样性驱动选择策略,在合成和真实基准上实现了收敛性和多样性的最佳平均排名。

Comments 32 pages, 7 figures, accepted by ICML 2026. Project: https://github.com/YaolinWen/DOMOO

AI中文摘要

多目标优化(MOO)已成为解决涉及多个目标的复杂优化问题的强大方法。在许多实际场景中,函数评估不可用或成本过高,因此必须仅基于固定的离线数据集进行优化。在这种称为离线MOO的设置中,目标是在无法访问真实目标函数的情况下找到帕累托集。这种设置存在分布外(OOD)问题,即代理模型对于未见过的设计不准确。由于OOD问题,代理误差可能导致优化器选择不在真实帕累托前沿上且偏向其极端的解。为了解决这个问题,本文提出了多样性驱动的离线多目标优化(DOMOO),旨在找到一组多样且高质量的解。首先,DOMOO包含一个累积风险控制模块,用于估计候选解的潜在风险,并缓解训练数据与生成解之间的OOD问题。此外,提出了一种嵌套帕累托集学习(PSL)策略,以联合学习偏好和PSL参数,然后优化它们,从而适应多样化的帕累托前沿几何形状。为了进一步提高解的质量,我们设计了一种多样性驱动的选择策略,用于提取一组具有代表性且分布良好的最终解。为了实现这种多样性驱动的选择策略,我们提出了$\text{IGD}_\text{offline}$,这是一个针对离线设置定制的指标,同时考虑了多样性和收敛性,并避免了超体积指标的偏差。在合成和真实基准上的大量实验表明,在比较的方法中,DOMOO在收敛性和多样性方面均实现了跨任务的最佳平均排名。

英文摘要

Multi-objective optimization (MOO) has emerged as a powerful approach to solving complex optimization problems involving multiple objectives. In many practical scenarios, function evaluations are unavailable or prohibitively expensive, necessitating optimization solely based on a fixed offline dataset. In this setting, known as offline MOO, the goal is to find the Pareto set without access to the true objective functions. This setting suffers from the out-of-distribution (OOD) issue, where the surrogate model is not accurate for unseen designs. Due to the OOD issue, surrogate errors may cause the optimizer to select solutions that do not lie on the true Pareto front and are biased toward its extremes. To address this, this paper proposes Diversity-driven Offline Multi-Objective Optimization (DOMOO), which aims to find a diverse and high-quality set of solutions. First, DOMOO incorporates an accumulative risk control module that estimates the potential risk of candidate solutions and alleviates the OOD issue between the training data and the generated solutions. In addition, a nested Pareto set learning (PSL) strategy is proposed to jointly learn preference and PSL parameters, then optimize them, enabling adaptation to diverse Pareto front geometries. To further enhance solution quality, we design a diversity-driven selection strategy that extracts a representative and well-distributed set of final solutions. To implement this strategy, we propose $\text{IGD}_\text{offline}$, a tailored indicator for the offline setting that considers both diversity and convergence and avoids the bias of the hypervolume indicator. Extensive experiments on synthetic and real-world benchmarks show that DOMOO achieves the best average rank across tasks in both convergence and diversity among the compared methods.
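
For readers unfamiliar with the indicator family that $\text{IGD}_\text{offline}$ builds on, the sketch below computes the standard inverted generational distance together with a Pareto non-dominated filter. For minimization, IGD is the mean, over reference-front points, of the Euclidean distance to the nearest obtained solution. This is the classical definition, not the paper's offline variant, whose details the abstract does not give.

```python
import math
from typing import List, Sequence

Obj = Sequence[float]

def dominates(a: Obj, b: Obj) -> bool:
    """True if a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points: Sequence[Obj]) -> List[Obj]:
    """Keep points not dominated by any other point (O(n^2), fine for small sets)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def igd(reference: Sequence[Obj], solutions: Sequence[Obj]) -> float:
    """Mean distance from each reference-front point to its closest solution.

    Low values need both convergence (solutions near the front) and diversity
    (every part of the reference front has a nearby solution).
    """
    return sum(min(math.dist(r, s) for s in solutions) for r in reference) / len(reference)

if __name__ == "__main__":
    front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]       # reference front samples
    clustered = [(0.0, 1.0), (0.05, 0.95)]            # converged but not diverse
    spread = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
    print(igd(front, clustered), igd(front, spread))  # the spread set scores lower
```

The clustered set lies on the front yet scores worse, which is the sense in which IGD-style indicators reward diversity as well as convergence.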

2606.15219 2026-06-16 cs.LG cs.DS math.ST stat.ML stat.TH 新提交

Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model

神经网络能否实现最优计算-统计权衡?基于单指标模型的分析

Siyu Chen, Beining Wu, Miao Lu, Zhuoran Yang, Tianhao Wang

发表机构 * Department of Statistics and Data Science, Yale University(耶鲁大学统计与数据科学系) Department of Statistics, University of Chicago(芝加哥大学统计系) Department of Management Science and Engineering, Stanford University(斯坦福大学管理科学与工程系) Toyota Technological Institute at Chicago(芝加哥丰田技术研究所)

AI总结 提出统一梯度算法训练两层神经网络,在多项式时间内学习高斯单指标模型,样本复杂度匹配SQ下界,并扩展到稀疏情形。

Comments 96 pages, 4 figures

AI中文摘要

在这项工作中,我们解决以下问题:基于梯度的神经网络训练能否在学习高斯单指标模型时实现最优计算-统计权衡?先前研究表明,统计查询(SQ)框架下的任何多项式时间算法需要$\Omega(d^{s^\star/2}\lor d)$个样本,其中$s^\star$是生成指数,代表学习潜在模型的内在难度。然而,神经网络能否达到这一样本复杂度尚不清楚。受先前学习单指标模型的技术(如标签变换和景观平滑)启发,我们提出了一种统一的梯度算法,用于在多项式时间内训练两层神经网络。我们的方法适用于多种损失函数和激活函数,涵盖了广泛的现有方法。我们证明,该算法学习到的特征表示与未知信号$\theta^\star$高度对齐,样本复杂度为$\widetilde{O}(d^{s^\star/2} \lor d)$,对于所有生成指数$s^\star\geq 1$,与SQ下界仅差多对数因子。此外,我们通过引入一种利用稀疏结构的新型权重扰动技术,将方法扩展到$\theta^\star$为$k$-稀疏($k = o(\sqrt{d})$)的情形。我们推导出相应的SQ下界为$\widetilde{\Omega}(k^{s^\star})$,我们的方法在相差多对数因子的意义下与之匹配。我们的框架,特别是权重扰动技术,具有独立意义,并暗示了其他问题(如稀疏张量PCA)的潜在梯度解法。

英文摘要

In this work, we tackle the following question: Can neural networks trained with gradient-based methods achieve the optimal computational-statistical tradeoff in learning Gaussian single-index models? Prior research has shown that any polynomial-time algorithm under the statistical query (SQ) framework requires $\Omega(d^{s^\star/2}\lor d)$ samples, where $s^\star$ is the generative exponent representing the intrinsic difficulty of learning the underlying model. However, it remains unknown whether neural networks can achieve this sample complexity. Inspired by prior techniques such as label transformation and landscape smoothing for learning single-index models, we propose a unified gradient-based algorithm for training a two-layer neural network in polynomial time. Our method is adaptable to a variety of loss and activation functions, covering a broad class of existing approaches. We show that our algorithm learns a feature representation that strongly aligns with the unknown signal $\theta^\star$, with sample complexity $\widetilde{O}(d^{s^\star/2} \lor d)$, matching the SQ lower bound up to a polylogarithmic factor for all generative exponents $s^\star\geq 1$. Furthermore, we extend our approach to the setting where $\theta^\star$ is $k$-sparse for $k = o(\sqrt{d})$ by introducing a novel weight perturbation technique that leverages the sparsity structure. We derive a corresponding SQ lower bound of order $\widetilde{\Omega}(k^{s^\star})$, matched by our method up to a polylogarithmic factor. Our framework, especially the weight perturbation technique, is of independent interest, and suggests potential gradient-based solutions to other problems such as sparse tensor PCA.

2606.15268 2026-06-16 cs.LG 新提交

When to use what Schatten-$p$ norm in deep learning?

在深度学习中何时使用何种 Schatten-$p$ 范数?

Thomas Pethick

发表机构 * Pethick et al. [2026](Pethick 等人 [2026])

AI总结 本文通过理论分析解决关于 Schatten-∞ 优化器有效性的矛盾观察,发现结论取决于数据维度:在低维场景(包括 Chinchilla 缩放)下,较小的 Schatten-p 几何更优,并基于 SODA 框架为 p>2 提出新的噪声鲁棒加速结果。

AI中文摘要

基于 Schatten-$\infty$ 的优化器(如 Muon)在经验上表现出色,但关于它们是否有益仍存在看似矛盾的观察。我们通过表明结论具有场景依赖性来解决这一矛盾。即使目标在 Schatten-$\infty$ 几何中是光滑的,较小的 Schatten-$p$ 几何也可能是最优的,特别是在低维场景中,我们证明这包括 Chinchilla 缩放。这一结论源于 SODA 框架在 $p>2$ 时的一个新的噪声鲁棒加速结果。同样的分析解释了为什么类似 Muon 的方法不需要预热,为什么它们自然偏好大批量,并得出了任意 $p$ 的批量大小缩放规则。

英文摘要

Schatten-$\infty$ based optimizers such as Muon have shown promising empirical performance, but there remain seemingly conflicting observations about whether they are beneficial. We resolve this conflict by showing that the conclusion is regime dependent. Even when the objective is smooth in the Schatten-$\infty$ geometry, smaller Schatten-$p$ geometries can be optimal, specifically in the low-dimensional regime, which we show includes Chinchilla scaling. This conclusion follows from a new noise-robust acceleration result for the SODA framework for $p>2$. The same analysis explains why Muon-like methods do not require warmup, why they naturally favor large batches, and yields a batch size scaling rule for arbitrary $p$.
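
For reference, the Schatten-$p$ norm of a matrix is the $\ell_p$ norm of its singular values, $\|X\|_{S_p} = (\sum_i \sigma_i^p)^{1/p}$. Thus $p=2$ gives the Frobenius norm and $p=\infty$ the spectral norm. Under the spectral norm, the steepest-descent direction for a gradient $G = U\Sigma V^\top$ is proportional to $-UV^\top$, which Muon-style optimizers approximate with a Newton-Schulz iteration. The pure-Python sketch below uses the classical cubic iteration $X \leftarrow \tfrac{3}{2}X - \tfrac{1}{2}XX^\top X$. Muon itself uses a tuned quintic polynomial, and the iteration count here is an arbitrary choice.

```python
from typing import List

Mat = List[List[float]]

def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Mat) -> Mat:
    return [list(row) for row in zip(*a)]

def newton_schulz_polar(g: Mat, iters: int = 30) -> Mat:
    """Approximate the orthogonal polar factor U V^T of a full-rank matrix g.

    Scaling by the Frobenius norm puts every singular value in (0, 1], inside the
    cubic iteration's convergence region (0, sqrt(3)); each step pushes them toward 1.
    """
    fro = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / fro for v in row] for row in g]
    for _ in range(iters):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return x

if __name__ == "__main__":
    G = [[2.0, 1.0], [0.0, 1.0]]  # hypothetical 2x2 gradient block
    O = newton_schulz_polar(G)
    print(matmul(O, transpose(O)))  # close to the identity matrix
```

Setting every singular value to 1 discards the gradient's spectral scale entirely. The abstract's point is that this extreme choice ($p=\infty$) is not always the best geometry.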

2606.15455 2026-06-16 cs.LG cs.AI 新提交

Understanding Diversity Collapse in RLVR via the Lens of Overtraining

通过过度训练的视角理解RLVR中的多样性崩溃

Suqin Yuan, Jinkun Chen, Jiyang Zheng, Muyang Li, Lei Feng, Dadong Wang, Tao Xiang, Tongliang Liu, Bo An

发表机构 * Sydney AI Centre, The University of Sydney(悉尼大学悉尼人工智能中心) Southeast University(东南大学) Microsoft(微软) Data61, CSIRO(澳大利亚联邦科学与工业研究组织Data61) Chongqing University(重庆大学) Nanyang Technological University(南洋理工大学)

AI总结 本文通过过度训练的视角形式化RLVR中的多样性崩溃,发现标准训练中大部分更新是过度训练,并提出贝叶斯边界门控(BBG)方法,通过估计每个问题对推理边界的边际贡献来优化,提升多个基准上的Pass@k。

AI中文摘要

基于可验证奖励的强化学习(RLVR)已成为增强大型语言模型推理能力的关键方法。然而,RLVR常常遭受多样性崩溃:Pass@$1$提升而高$k$的Pass@$k$下降,这被视为模型推理边界的收窄。我们通过过度训练的视角形式化了这种多样性崩溃:一旦一个问题对参考指标的贡献有效饱和,进一步的更新不再扩展模型能解决的问题,但仍将概率质量集中在on-policy采样偏好的轨迹上。在每个问题仅有少量rollout的标准设置下,即使单次成功也会使问题进入高$k$ Pass@$k$的近乎饱和状态,因此从边界角度来看,标准RLVR中的大多数更新都是过度训练。这一视角也为“RLVR能否将模型推理能力扩展到基础模型之外”提供了一种解读:由于RLVR在结构上不利于高$k$ Pass@$k$,其总体下降本身并不意味着没有产生新的推理增益。在干预实验中,将更新限制在零成功的问题上,可在困难基准上将Pass@$256$提升到基础模型之上;在观察分析中,标准RLVR训练期间,相当一部分最初不可解的问题变得可解。基于这些发现,我们提出贝叶斯边界门控(BBG),通过估计每个问题对推理边界的边际贡献,将优化从过度训练中转移出来。在多个推理基准上,BBG在广泛的$k$范围内提升了平均Pass@$k$。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has become a key approach for enhancing the reasoning abilities of large language models. However, RLVR often suffers from diversity collapse: Pass@$1$ improves while high-$k$ Pass@$k$ degrades, which is viewed as a narrowing of the model's reasoning boundary. We formalize this diversity collapse through the lens of overtraining: once a problem's contribution to the reference metric has effectively saturated, further updates no longer expand what the model can solve but still concentrate probability mass on the trajectories favored by on-policy sampling. Under a standard setup with few rollouts per problem, even a single observed success places a problem in a nearly saturated regime for high-$k$ Pass@$k$, so most updates in standard RLVR are overtraining from the boundary perspective. This perspective also suggests a reading of whether RLVR can expand the model's reasoning abilities beyond the base model: since RLVR is structurally biased against high-$k$ Pass@$k$, its aggregate decline does not by itself mean that no new reasoning gains occurred. Interventionally, restricting updates to problems with zero observed success lifts Pass@$256$ above the base model on difficult benchmarks; observationally, a non-trivial fraction of initially unsolvable problems become solvable during standard RLVR training. Building on these findings, we propose Bayesian Boundary Gating (BBG), which redirects optimization away from overtraining by estimating each problem's marginal contribution to the reasoning boundary. Across multiple reasoning benchmarks, BBG improves average Pass@$k$ across a wide range of $k$.
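
The saturation argument can be checked with a simple plug-in calculation. If a problem is solved independently with per-sample probability $p$, then Pass@$k = 1-(1-p)^k$, and its sensitivity to $p$ is $k(1-p)^{k-1}$. The sketch below plugs in the empirical rate $\hat p = c/n$ from $n$ rollouts with $c$ successes, and also includes the standard unbiased Pass@$k$ estimator for $k \le n$. The plug-in estimate and the rollout counts are illustrative assumptions; this is not the Bayesian Boundary Gating estimator.

```python
import math

def pass_at_k(p: float, k: int) -> float:
    """Pass@k for i.i.d. attempts with per-attempt success probability p."""
    return 1.0 - (1.0 - p) ** k

def pass_at_k_sensitivity(p: float, k: int) -> float:
    """d Pass@k / d p = k (1 - p)^(k - 1): how much a further gain in p still helps."""
    return k * (1.0 - p) ** (k - 1)

def unbiased_pass_at_k(n: int, c: int, k: int) -> float:
    """Standard unbiased estimator 1 - C(n-c, k) / C(n, k), valid for k <= n."""
    if k > n:
        raise ValueError("needs at least k rollouts")
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

if __name__ == "__main__":
    n, k = 8, 256                      # few rollouts, large evaluation k
    for c in (0, 1):
        p_hat = c / n
        print(c, pass_at_k(p_hat, k), pass_at_k_sensitivity(p_hat, k))
```

With one success in eight rollouts, the plug-in Pass@256 is already within about $10^{-14}$ of 1 and its sensitivity is essentially zero. With zero successes, the sensitivity is $k$. This is the asymmetry that motivates restricting updates to unsolved problems.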

2606.15551 2026-06-16 cs.LG 新提交

A Bifurcation Theory Framework for Gradient Descent on the Edge of Stability

梯度下降在稳定性边缘的分岔理论框架

Eric Gan

发表机构 * Eric Gan(埃里克·甘)

AI总结 提出分岔理论框架,通过将训练动力学分解为法向和切向分量,证明稳定性边缘训练源于法向的翻转分岔,并收敛到最小化流形。

AI中文摘要

稳定性边缘(EoS)现象,即梯度下降操作的锐度超过经典收敛阈值但损失在长时间尺度上下降,在现代深度学习中普遍存在,但在现实环境中仍鲜为人知。先前的严格分析主要局限于具有特定结构形式的标量或低维损失。在这项工作中,我们为梯度下降在稳定性边缘上开发了一个分岔理论框架,该框架直接适用于过参数化神经网络。通过将训练动力学分解为法向和切向于最小化流形的分量,我们表明稳定的EoS训练源于法向方向的翻转分岔,由第一个李雅普诺夫系数的符号控制,而切向动力学向锐度递减的区域漂移。在损失景观的温和谱和几何假设下,我们证明了在EoS阈值下训练时收敛到最小化流形。作为推论,我们恢复并统一了先前的结果:我们表明Gan(2026)的乘积稳定性条件是我们框架的一个实例。

英文摘要

The Edge of Stability (EoS) phenomenon, where gradient descent operates with sharpness exceeding the classical convergence threshold yet the loss decreases over long timescales, is ubiquitous in modern deep learning but remains poorly understood in realistic settings. Prior rigorous analyses have been largely confined to scalar or low-dimensional losses with specific structural forms. In this work, we develop a bifurcation theory framework for gradient descent on the edge of stability that applies directly to overparameterized neural networks. By decomposing the training dynamics into components normal and tangent to the manifold of minimizers, we show that stable EoS training arises from a flip bifurcation in the normal direction, governed by the sign of the first Lyapunov coefficient, while the tangent dynamics drift toward regions of decreasing sharpness. Under mild spectral and geometric assumptions on the loss landscape, we prove convergence to the minimizing manifold when training at the EoS threshold. As a corollary, we recover and unify prior results: we show that the product-stability condition of Gan (2026) is an instance of our framework.
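
A one-dimensional toy shows a flip (period-doubling) bifurcation at the classical threshold. For the loss $L(x) = \lambda \log\cosh x$, the sharpness at the minimizer $x=0$ is $L''(0)=\lambda$. Gradient descent $x \leftarrow x - \eta\lambda\tanh x$ converges to 0 when $\eta\lambda < 2$. For the value $\eta\lambda = 2.2$ used below, the minimizer loses stability and the iterates settle on a stable two-cycle $\pm x^\ast$, where the local sharpness $\lambda\,\mathrm{sech}^2 x^\ast$ is back below $2/\eta$. This toy is a hypothetical illustration, not the paper's multi-dimensional framework or its Lyapunov-coefficient condition.

```python
import math

def gd_logcosh(lam: float, eta: float, x0: float = 0.1, steps: int = 2000) -> list:
    """Gradient descent on L(x) = lam * log(cosh(x)), i.e. x <- x - eta * lam * tanh(x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - eta * lam * math.tanh(xs[-1]))
    return xs

def sharpness(lam: float, x: float) -> float:
    """Second derivative L''(x) = lam * sech(x)^2."""
    return lam / math.cosh(x) ** 2

if __name__ == "__main__":
    eta = 1.0
    for lam in (1.8, 2.2):             # classical threshold is 2 / eta = 2
        tail = gd_logcosh(lam, eta)[-4:]
        print(lam, [round(v, 4) for v in tail], sharpness(lam, tail[-1]))
```

Here the loss does not decrease along the two-cycle. The sketch only shows the local mechanism (instability in one direction settling into a bounded oscillation at lower sharpness), not the long-timescale loss decrease the paper analyzes.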

2606.15569 2026-06-16 cs.LG math.ST stat.ML stat.TH 新提交

A Decision-Theoretic View of Test-Time Training: When, How Far, and Which Directions to Adapt

测试时训练的决策论视角:何时、多远以及哪些方向进行自适应

Tomoya Wakayama

发表机构 * N/A

AI总结 通过决策论将测试时训练视为核机制下的隐式贝叶斯推断,揭示了更新步长和子空间选择对性能的影响,并提出了自适应策略、PAC-Bayes保证和最优子空间选择规则。

AI中文摘要

测试时训练(TTT)通过参数更新使预训练模型适应每个提示,提高了在预训练到测试分布偏移下的准确性。然而,其性能常常受到不稳定性和对超参数(如更新步长和子空间)敏感性的影响。我们通过决策论的视角解释这一行为,将TTT视为核机制下的隐式贝叶斯推断。在高斯过程基准下,我们表明当更新与提示的信噪比谱匹配并与查询相关的特征方向对齐时,TTT能降低预测误差。这一视角支撑了以下结果:(1)我们展示了固定更新步长和子空间在分布偏移下失败的情况,从而激励自适应策略;(2)我们证明通过提示证据选择更新步长具有对抗过拟合的PAC-Bayes保证;(3)我们在线性-高斯校正模型下刻画了贝叶斯最优更新子空间,从而为选择Transformer块和头提供了评分规则。我们的理论有助于解释TTT的经验不稳定性,为何时、多远以及哪些方向进行自适应提供了原则性指导。

英文摘要

Test-time training (TTT) adapts a pretrained model to each prompt via parameter updates, improving accuracy under pretraining-to-test distribution shifts. Yet, its performance often suffers from instability and sensitivity to hyperparameters such as update steps and subspace. We explain this behavior through a decision-theoretic lens, treating TTT as implicit Bayesian inference in the kernel regime. Under a Gaussian process benchmark, we show that TTT reduces prediction error when updates are spectrally matched to the prompt's signal-to-noise ratio and aligned with query-relevant eigen-directions. This perspective underpins the following results: (1) we show when fixed update steps and subspaces fail under distribution shifts, motivating adaptive strategies; (2) we prove that selecting update steps via prompt evidence admits a PAC-Bayes guarantee against overfitting; and (3) we characterize the Bayes-optimal update subspace under a linear-Gaussian correction model, yielding a scoring rule for selecting Transformer blocks and heads. Our theory helps explain the empirical instability of TTT, taking a step toward principled guidance for when, how far, and which directions to adapt.
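
One standard way to see why "how far" to adapt should depend on the signal-to-noise spectrum is to look at a linear (kernel-regime) model. There, $t$ steps of gradient descent from zero with step size $\eta$ shrink the least-squares solution along each eigen-direction with eigenvalue $s$ by the factor $1-(1-\eta s)^t$. The Gaussian-prior posterior mean (unit prior variance, noise variance $\sigma^2$) shrinks it by $s/(s+\sigma^2)$ instead. The sketch below checks the first formula by direct iteration and picks the step count whose filter best matches the Bayes filter. The eigenvalues, noise levels, and squared-error matching criterion are illustrative assumptions, not the paper's TTT procedure.

```python
from typing import Sequence

def gd_filter(s: float, eta: float, t: int) -> float:
    """Closed-form shrinkage of t GD steps from zero along an eigen-direction with eigenvalue s."""
    return 1.0 - (1.0 - eta * s) ** t

def gd_by_iteration(s: float, eta: float, t: int, b: float = 1.0) -> float:
    """Run w <- w - eta * s * (w - b) from w = 0; should equal gd_filter(s, eta, t) * b."""
    w = 0.0
    for _ in range(t):
        w -= eta * s * (w - b)
    return w

def bayes_filter(s: float, noise_var: float) -> float:
    """Posterior-mean shrinkage s / (s + sigma^2) under a unit-variance Gaussian prior."""
    return s / (s + noise_var)

def best_step_count(spectrum: Sequence[float], eta: float, noise_var: float, t_max: int = 200) -> int:
    """Step count whose GD filter is closest (squared error) to the Bayes filter."""
    def gap(t: int) -> float:
        return sum((gd_filter(s, eta, t) - bayes_filter(s, noise_var)) ** 2 for s in spectrum)
    return min(range(t_max + 1), key=gap)

if __name__ == "__main__":
    spectrum = [1.0, 0.3, 0.05]     # hypothetical eigenvalues of X^T X
    for noise_var in (0.01, 0.5):   # higher noise should favor fewer steps
        print(noise_var, best_step_count(spectrum, eta=0.5, noise_var=noise_var))
```

A fixed step count that is right for one noise level is far off for the other. This is a small-scale version of the abstract's claim that fixed update steps fail under shift and motivate adaptive choices.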

2606.15690 2026-06-16 cs.LG math.DS 新提交

Multi-Fidelity SINDy: Sparse Discovery of Nonlinear Dynamical Systems with Fidelity-Weighted Measurements

多保真度SINDy:基于保真度加权测量的非线性动力系统稀疏发现

Filippo Zacchei, Ana Larrañaga, Attilio Frangi, Andrea Manzoni, Steven L. Brunton

发表机构 * Politecnico di Milano(米兰理工大学) University of Washington(华盛顿大学)

AI总结 针对异质噪声数据,提出多保真度SINDy方法,通过加权回归融合集成SINDy和弱SINDy,从不同保真度测量中稀疏识别非线性动力系统,理论证明加权策略的统计合理性,在常微分和偏微分方程基准系统及双摆预测中验证了其抑制异方差噪声、利用低成本低质量数据提升模型恢复的效果。

Comments 27 pages, 6 figures, 2 tables

AI中文摘要

来自模拟和实验的数据很少是无噪声的,并且常常表现出异质保真度水平。测量不确定性可能在重复观测、传感设备甚至单个实验中变化。本文解决了从这种非均匀数据中发现非线性动力系统的问题。我们通过将集成SINDy和弱SINDy结合在由广义最小二乘法导出的加权回归公式中,扩展了稀疏识别非线性动力系统(SINDy)框架以考虑可变噪声水平。还提供了加权策略的统计证明。该方法在几个基准系统上得到验证,包括常微分和偏微分方程。此外,我们展示了多保真度集成在预测双摆系统动力学中的优势。结果证实,所提出的方法减轻了异方差噪声的不利影响,并且重复、低成本、低质量的测量可以改善模型恢复,在某些情况下匹配或优于仅使用高保真度数据获得的重建结果。

英文摘要

Data from simulations and experiments are rarely noise-free and often exhibit heterogeneous levels of fidelity. Measurement uncertainty may vary across repeated observations, sensing devices, or even within a single experiment. This work addresses the problem of discovering nonlinear dynamical systems from such inhomogeneous data. We extend the Sparse Identification of Nonlinear Dynamical Systems (SINDy) framework to account for variable noise levels by combining Ensemble SINDy and Weak SINDy within a weighted regression formulation derived from generalized least squares. A statistical justification for the weighting strategy is also provided. The methodology is validated on several benchmark systems, including ordinary and partial differential equations. In addition, we show the benefit of multi-fidelity integration for forecasting the dynamics of a double pendulum system. The results confirm that the proposed approach mitigates the adverse effects of heteroscedastic noise and that repeated, low-cost, low-quality measurements can improve model recovery, in some cases matching or outperforming reconstructions obtained using only high-fidelity data.
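
The core idea of fidelity weighting can be illustrated with inverse-variance weighted sequentially thresholded least squares (STLSQ), the basic SINDy regression. Each measurement's residual is weighted by $1/\sigma_i^2$, as in generalized least squares with a diagonal noise covariance. This is a minimal pure-Python sketch under that diagonal-noise assumption. It omits the ensemble (bagging) and weak-form components the abstract combines, and the library, threshold, and synthetic system are hypothetical.

```python
import random
from typing import List

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def weighted_fit(theta, y, w, active):
    """Solve the weighted normal equations (Theta_A^T W Theta_A) xi = Theta_A^T W y."""
    a = [[sum(wi * row[p] * row[q] for row, wi in zip(theta, w)) for q in active] for p in active]
    b = [sum(wi * row[p] * yi for row, wi, yi in zip(theta, w, y)) for p in active]
    return solve(a, b)

def weighted_stlsq(theta, y, sigma, threshold=0.1, max_iter=10):
    """STLSQ with GLS-style weights w_i = 1 / sigma_i^2 (diagonal noise covariance)."""
    w = [1.0 / s ** 2 for s in sigma]
    n_terms = len(theta[0])
    active = list(range(n_terms))
    for _ in range(max_iter):
        if not active:
            break
        coef = weighted_fit(theta, y, w, active)
        keep = [j for j, c in zip(active, coef) if abs(c) >= threshold]
        if keep == active:
            break
        active = keep
    xi = [0.0] * n_terms
    if active:  # final refit on the surviving library terms
        for j, c in zip(active, weighted_fit(theta, y, w, active)):
            xi[j] = c
    return xi

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data for dx/dt = -2 x + 0.5 x^3 with library [x, x^2, x^3].
    xs = [rng.uniform(-2.0, 2.0) for _ in range(200)]
    sigma = [0.01] * 50 + [0.5] * 150          # few precise, many cheap noisy samples
    y = [-2 * x + 0.5 * x ** 3 + rng.gauss(0.0, s) for x, s in zip(xs, sigma)]
    theta = [[x, x ** 2, x ** 3] for x in xs]
    print(weighted_stlsq(theta, y, sigma))      # expect roughly [-2, 0, 0.5]
```

Inverse-variance weights are the GLS choice when noise levels are known. In practice they must be estimated, which is where the repeated low-fidelity measurements the abstract mentions become useful.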

2606.15812 2026-06-16 cs.LG New submission

Brownian Kernel Ladders

Mahdi Mohammadigohari, Giuseppe Di Fatta, Giuseppe Nicosia, Panos M Pardalos

Affiliations: Faculty of Engineering, Free University of Bozen-Bolzano; Department of Biomedical and Biotechnological Sciences, University of Catania; Center for Applied Optimization, Department of Industrial and Systems Engineering, University of Florida

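AI summary: Proposes Brownian kernel ladders (BKLs), a recursive hierarchy of function spaces constructed through Brownian-kernel integrals, and proves that they are quasi-Banach spaces with depth-dependent Hölder regularity. This gives a tractable framework for studying compositional representations in deep learning.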

Comments Submitted to JMLR

Abstract

Constructing mathematically tractable function spaces that capture hierarchical compositional representations remains a central challenge in statistical learning theory. We introduce Brownian kernel ladders (BKLs), a recursively defined hierarchy of integral reproducing kernel Hilbert spaces generated through Brownian-kernel integral constructions. Starting from linear functionals, each layer is obtained by integrating Brownian kernels over probability measures supported on subsets of the previous layer, yielding a recursive function-space model in which depth is encoded directly through the hierarchy. Based on this framework, we define canonical BKL spaces together with an associated complexity functional. We establish several analytical and statistical properties of these spaces. In particular, we show that BKL spaces form quasi-Banach spaces, satisfy depth-dependent Hölder regularity estimates, and exhibit strict monotonicity with respect to depth. We further prove existence results for regularized empirical risk minimization and derive Gaussian complexity bounds that remain uniformly controlled with respect to both the ambient dimension and the hierarchy depth. A key ingredient of the analysis is a combinatorial proof technique based on recursive subset decompositions and Brownian-kernel threshold representations. These estimates yield excess-risk guarantees of near-parametric order for regularized empirical risk minimization over BKL spaces. Our results provide a mathematically tractable hierarchical function-space framework for studying compositional representations in deep learning.

2606.15832 2026-06-16 cs.LG math.OC New submission

SILAGE: Memory-Efficient, Full-Gradient-Free Nonconvex Optimization for Nested Finite Sums

Igor Sokolov, Laurent Condat, Peter Richtárik

Affiliations: Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST)

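AI summary: For nonconvex optimization over the nested double finite-sum structure of large datasets, proposes SILAGE. By exploiting the double-sum structure, it avoids global full-gradient refreshes and needs only $\mathcal{O}(n)$ memory. Its convergence analysis adapts to across-group and within-group heterogeneity.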

Comments 80 pages, 3 algorithms, 4 theorems, 2 corollaries, 11 lemmas, 2 figures, 12 tables

Abstract

Empirical risk minimization on massive datasets naturally exhibits a nested double finite-sum structure, where $N=nm$ total samples are logically or physically partitioned into $n$ blocks of size $m$ (e.g., in pooled data silos, out-of-core learning, or deliberate stratification). While variance-reduced methods achieve optimal oracle complexities for nonconvex objectives, they suffer from severe scaling bottlenecks in this centralized regime. Recursive estimators, such as PAGE, require periodic global full-gradient refreshes over all $nm$ samples, which are computationally expensive. Conversely, single-loop methods, such as SILVER, avoid such refreshes but require an impractical $\mathcal{O}(nm)$ memory footprint to store a control variate for every sample. In this paper, we propose SILAGE, a variance-reduced algorithm that addresses this trade-off. By actively exploiting the double-sum structure, SILAGE eliminates periodic global full-gradient refreshes over all $nm$ components (evaluating at most one local group gradient per iteration) while requiring only $\mathcal{O}(n)$ memory. Furthermore, we provide a tight convergence analysis that avoids pessimistic worst-case Lipschitz constants. Instead, SILAGE's complexity natively adapts to the underlying data geometry via nested functional similarities: across-group ($δ_1$) and within-group ($δ_2$) heterogeneity. Our results improve existing state-of-the-art bounds in several practically relevant regimes.

2606.16028 2026-06-16 cs.LG cs.IT math.FA math.IT New submission

The Information-Theoretic Benefit of Shared Representations under Orthogonality Constraints

Thomas Dittrich, Oliver Potocki, Philipp Grohs

Affiliations: Johann Radon Institute of Computational and Applied Mathematics, Austrian Academy of Sciences; Faculty of Mathematics, University of Vienna

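AI summary: Within an information-theoretic framework, proves that under orthogonality constraints, joint approximation requires a shorter description length than separate approximation. This reveals the efficiency advantage of shared representations in compositional architectures.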

Abstract

Modern deep learning architectures are increasingly multi-task and multi-modal, using a pretrained foundation model combined with task-specific, fine-tuned models. Empirically, exploiting similarity across different problems, instead of solving them individually, can significantly improve overall performance. While the generalization and sample complexity properties of multitask learning have been widely studied, the parametric complexity of joint approximation in comparison to separate approximation remains less well understood. The question is particularly relevant in modern deep learning, where models are increasingly required to satisfy structural constraints such as equivariance, conservation laws, or orthogonality. We prove lower and upper bounds on the description length for separate and joint approximation classes, respectively, in uniform norm. We build a class of orthogonal functions by composing a shared hard feature, realized by a Rademacher-Haar wavelet series, with Sawtooth-Walsh readouts to enforce orthogonality of output coordinates. The dyadic tree structure of the Rademacher-Haar wavelet concentrates the approximation hardness in the common feature component, while the readouts act as task-specific heads. Using an information-theoretic framework, we obtain a sharp gap between the optimal approximation rates achievable by joint and separate coding. Finally, we realize this separation in a neural network model using Heaviside activations via reduction to triangle-wave approximation. Our results show that, even under an orthogonality constraint, joint approximation requires strictly fewer bits in compositional architectures, provided the tasks share a latent hard feature. This provides theoretical insight into the description-length efficiency of compositional multi-output architectures and clarifies how neural networks can retain expressivity under geometric constraints.

2606.16257 2026-06-16 cs.LG cs.AI New submission

Variance Reduction for Non-Log-Concave Sampling with Applications to Inverse Problems

M. Berk Sahin, Ahmet Ege Tanriverdi, Behzad Sharif, Abolfazl Hashemi

Affiliations: School of Electrical and Computer Engineering, Purdue University; School of Electrical and Computer Engineering, University of Southern California; School of Biomedical Engineering, Purdue University

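AI summary: To address the high variance of stochastic gradients when sampling from non-log-concave distributions, gives a unified analysis of variance-reduction methods (momentum, STORM, and PAGE). It proves improved convergence rates in relative Fisher information and in squared total variation distance, and extends the analysis to inverse problems with score-based generative priors.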

Comments Accepted to Uncertainty in Artificial Intelligence (UAI) 2026

Abstract

Sampling from high-dimensional, non-log-concave distributions with unnormalized densities is a fundamental challenge in machine learning, particularly when the exact gradient of the potential is unavailable and must be approximated via stochastic gradients that exhibit high variance under a fixed budget of gradient computations per iteration. Although variance reduction techniques such as SGD with momentum, STORM, and PAGE have demonstrated improved convergence properties in non-convex optimization, their implications for sampling from non-log-concave distributions remain largely unexplored. In this work, we develop the first unified analysis of these estimators for sampling from non-log-concave distributions. We establish improved non-asymptotic convergence rates in $\varepsilon$-relative Fisher information and, under a Poincaré inequality assumption, in squared total variation distance, and further prove weak convergence to the target distribution. We extend our analysis to solving inverse problems with score-based generative priors. We empirically validate our theory and demonstrate that, under a fixed budget of gradient computations per iteration, variance-reduction techniques consistently improve sample quality in two standard imaging applications.
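
To show why a recursive estimator helps, the sketch below runs the unadjusted Langevin algorithm on the one-dimensional double-well potential $U(x) = (x^2-1)^2$, which is not log-concave. The potential is written as an average of components $u_i(x) = (x^2-1)^2 + c_i x$ with zero-mean offsets $c_i$. The sketch compares a plain single-sample stochastic gradient with a STORM-style recursive estimator $d_t = g_{i_t}(x_t) + (1-a)(d_{t-1} - g_{i_t}(x_{t-1}))$. The potential, the offsets, the step size, and $a$ are illustrative assumptions, not the paper's setup or experiments.

```python
import numpy as np

# Component offsets c_i; centring makes the average potential exactly the double well
C = np.random.default_rng(0).normal(0.0, 3.0, size=1000)
C -= C.mean()


def grad_u(x):
    """Exact gradient of U(x) = (x^2 - 1)^2."""
    return 4.0 * x * (x**2 - 1.0)


def grad_component(x, i):
    """Gradient of the i-th component u_i(x) = (x^2 - 1)^2 + c_i x."""
    return grad_u(x) + C[i]


def langevin(n_steps, step, a=None, seed=1):
    """Unadjusted Langevin; a=None uses plain SGLD, otherwise a STORM-style estimator."""
    rng = np.random.default_rng(seed)
    x, x_prev, d = 0.0, 0.0, None
    samples, sq_err = [], []
    for _ in range(n_steps):
        i = rng.integers(C.size)
        g = grad_component(x, i)
        if a is None or d is None:
            d = g
        else:
            # the same index i is evaluated at the current and the previous iterate
            d = g + (1.0 - a) * (d - grad_component(x_prev, i))
        sq_err.append((d - grad_u(x)) ** 2)
        x_prev = x
        x = x - step * d + np.sqrt(2.0 * step) * rng.normal()
        samples.append(x)
    return np.array(samples), float(np.mean(sq_err))


_, mse_sgld = langevin(20_000, 1e-2)
_, mse_storm = langevin(20_000, 1e-2, a=0.1)
print(mse_sgld, mse_storm)
```

In this toy model the component noise is an additive constant, so the correction term cancels it exactly and the estimator error obeys $e_t = a\,c_{i_t} + (1-a)\,e_{t-1}$, with stationary variance roughly $a/(2-a)$ times that of plain SGLD. With general potentials the cancellation is only approximate.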

2606.16301 2026-06-16 cs.LG stat.ML New submission

One-Step Generalization Ratio Guided Optimization for Domain Generalization

Sumin Cho, Dongwon Kim, Kwangsu Kim

Affiliations: Korea Advanced Institute of Science and Technology (KAIST)

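AI summary: Proposes GENIE, an optimizer that uses the One-Step Generalization Ratio (OSGR) to dynamically equalize parameter updates. This suppresses spurious correlations and promotes learning of domain-invariant features, and GENIE outperforms existing optimizers on domain generalization tasks.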

Comments 29 pages, accepted at the 42nd International Conference on Machine Learning (ICML 2025)

Abstract

Domain Generalization (DG) aims to train models that generalize to unseen target domains but often overfit to domain-specific features, known as undesired correlations. Gradient-based DG methods typically guide gradients in a dominant direction but often inadvertently reinforce spurious correlations. Recent work has employed dropout to regularize overconfident parameters, but has not explicitly adjusted gradient alignment or ensured balanced parameter updates. We propose GENIE (Generalization-ENhancing Iterative Equalizer), a novel optimizer that leverages the One-Step Generalization Ratio (OSGR) to quantify each parameter's contribution to loss reduction and assess gradient alignment. By dynamically equalizing OSGR via a preconditioning factor, GENIE prevents a small subset of parameters from dominating optimization, thereby promoting domain-invariant feature learning. Theoretically, GENIE balances convergence contribution and gradient alignment among parameters, achieving higher OSGR while retaining SGD's convergence rate. Empirically, it outperforms existing optimizers and enhances performance when integrated with various DG and single-DG methods.

2606.16579 2026-06-16 cs.LG math-ph math.DG math.MP New submission

On the Entropy Formula for Real, Complex, and Quaternionic Deep Linear Networks

Luis Contreras, Marco Nahas, Tejas Kotwal

Affiliations: CINVESTAV-IPN; Brown University

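AI summary: Extends Menon and Yu's entropy formula for real deep linear networks to the complex and quaternionic cases, obtaining a unified formula.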

Comments 17 pages

Abstract

We extend the entropy formula of Menon and Yu for the real Deep Linear Network (DLN) to its complex and quaternionic analogues, obtaining a unified formula for DLNs over $\mathbb{R}$, $\mathbb{C}$, and $\mathbb{H}$.

2606.15217 2026-06-16 stat.ML cs.LG Cross-list

Conformal Candidate Certification for Offline Model-Based Optimization

Seungjin Choi

Affiliations: Seungjin Choi

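AI summary: Proposes Conformal Candidate Certification (CCC), which uses weighted conformal prediction to attach a calibrated one-sided lower bound to each candidate design in offline model-based optimization. Only candidates whose bound exceeds the target threshold are certified, which addresses statistical reliability under distribution shift.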

Comments ICML 2026 Workshop on Decision-Making from Offline Datasets to Online Adaptation: Black-Box Optimization to Reinforcement Learning

Abstract

Offline model-based optimization (MBO) proposes candidates by optimizing a surrogate trained on a fixed historical dataset. Because candidates are deliberately out-of-distribution, surrogate rankings are least reliable exactly where the optimizer is most aggressive, yet existing methods provide no per-candidate statistical certificate that a design meets a target threshold. We propose \emph{Conformal Candidate Certification} (CCC), a post-hoc wrapper that attaches a calibrated one-sided lower bound to each candidate and advances only those whose bound exceeds the target. We show that entropy-regularized surrogate maximization induces a Gibbs-tilted proposal, so the same surrogate supplies importance weights for weighted conformal prediction without a separate density-ratio estimation step. In a controlled synthetic study, CCC certifies $16.7\%$ of an aggressive proposal pool with empirical coverage 0.990 at nominal 0.90, while standard conformal prediction ignoring the covariate shift collapses to 0.416 coverage.
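
The certification step can be sketched with standard weighted split conformal prediction. This sketch assumes that entropy-regularized maximization at temperature $\tau$ yields a proposal proportional to $p_{\text{train}}(x)\exp(\hat{\mu}(x)/\tau)$. Under that assumption the density ratio is $w(x) \propto \exp(\hat{\mu}(x)/\tau)$, and normalizing constants cancel. Scores are $\hat{\mu}(x_i) - y_i$ on a held-out calibration set. The function names and arguments are hypothetical.

```python
import numpy as np


def weighted_conformal_lower_bound(mu_cal, y_cal, mu_test, tau, alpha):
    """One-sided (1 - alpha) lower bounds on y at each test candidate.

    Calibration points are reweighted by the assumed Gibbs-tilted density
    ratio w(x) ∝ exp(mu(x) / tau); the test point keeps its own weight,
    placed at +infinity as in weighted conformal prediction.
    """
    mu_cal, y_cal = np.asarray(mu_cal, float), np.asarray(y_cal, float)
    mu_test = np.atleast_1d(np.asarray(mu_test, float))
    scores = mu_cal - y_cal  # how much the surrogate overestimates
    order = np.argsort(scores)
    s_sorted = scores[order]
    shift = max(mu_cal.max(), mu_test.max())  # subtract before exp for stability
    cum_w = np.cumsum(np.exp((mu_cal[order] - shift) / tau))
    bounds = np.empty_like(mu_test)
    for j, m in enumerate(mu_test):
        w_test = np.exp((m - shift) / tau)
        target = (1.0 - alpha) * (cum_w[-1] + w_test)
        k = np.searchsorted(cum_w, target, side="left")  # first cumulative weight >= target
        q = s_sorted[k] if k < s_sorted.size else np.inf
        bounds[j] = m - q
    return bounds


def certify(mu_cal, y_cal, mu_test, threshold, tau=1.0, alpha=0.1):
    """Advance only the candidates whose lower bound clears the threshold."""
    return weighted_conformal_lower_bound(mu_cal, y_cal, mu_test, tau, alpha) >= threshold
```

When the test candidate carries too much of the total weight, as happens for very aggressive proposals, the quantile becomes infinite and the bound is $-\infty$. The candidate is then simply not certified, which is the conservative outcome.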

2606.15271 2026-06-16 math.OC cs.LG Cross-list

Dual-Network PINNs for Optimal Control: A Reproducible Benchmark on the Mass-Spring-Damper System

Abdeladhim Tahimi, Rinaldo Vieira da Silva Junior

Affiliations: Centro de Engenharias e Ciências Agrárias, Universidade Federal de Alagoas, Brazil

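AI summary: Proposes a dual-network physics-informed neural network (PINN) that directly solves the optimal control problem for a mass-spring-damper system. The state network satisfies the boundary conditions exactly, and the control network is unconstrained. The loss combines the physics residual with a trapezoidal approximation of the cost functional. On the benchmark, the method reproduces the classical optimal cost to four significant digits.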

Comments 22 pages, 6 figures. Reproducible benchmark study of dual-network Physics-Informed Neural Networks (PINNs) for optimal control of a mass-spring-damper system. Includes comparison with Pontryagin's Minimum Principle and direct transcription methods and accompanying Google Colab implementation

Abstract

This work presents a transparent and reproducible benchmark study of a direct dual-network Physics-Informed Neural Network (PINN) formulation for the optimal control of a mass-spring-damper system. The classical linear-quadratic optimal control problem is solved by two independent classical methods -- Pontryagin's Minimum Principle with single shooting, and direct transcription through trapezoidal collocation -- and recast as a constrained optimization problem solved by two feedforward neural networks: a state network whose boundary conditions are enforced exactly through a composite cubic-and-mask ansatz, and an unconstrained control network. The composite loss combines the physics residual at the collocation points with a trapezoidal approximation of the cost functional, weighted by a single scalar hyperparameter. On the benchmark considered, the PINN reproduces the classical optimal cost to four significant digits, satisfies the terminal state constraints exactly by construction, and produces pointwise state and control errors that fall within the spread of the two classical references. Training is approximately two orders of magnitude slower than classical shooting on this benchmark, and we report this cost explicitly. The contribution is methodological clarity rather than methodological novelty: the formulation and the accompanying Google Colab implementation are intended to lower the barrier to entry for practitioners exploring PINN-based optimal control without prior exposure to adjoint methods or two-point boundary value problems.
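
A hard-constrained state ansatz of the kind described can be sketched as follows. A cubic Hermite interpolant matches the four boundary values, and an arbitrary network output is added after multiplication by a mask $(t(T-t))^2$, which vanishes together with its derivative at both endpoints. This sketch rests on several assumptions. The paper's exact ansatz may differ. Derivatives here are finite differences rather than automatic differentiation. The dynamics $m\ddot{x} + c\dot{x} + kx = u$, the running cost $\tfrac12(x^2 + \dot{x}^2 + u^2)$, the parameter values, and the weight `lam` are all hypothetical.

```python
import numpy as np


def hermite_cubic(t, T, x0, v0, xT, vT):
    """Cubic matching position and velocity at t = 0 and t = T."""
    s = t / T
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * x0 + h10 * T * v0 + h01 * xT + h11 * T * vT


def state_ansatz(t, T, bc, net):
    """Boundary conditions hold exactly for any network output net(t)."""
    mask = (t * (T - t)) ** 2  # value and slope vanish at both ends
    return hermite_cubic(t, T, *bc) + mask * net(t)


def composite_loss(t, x, u, m=1.0, c=0.5, k=2.0, lam=1.0):
    """Mean squared dynamics residual plus lam times a trapezoidal cost."""
    dt = t[1] - t[0]
    v = np.gradient(x, dt)
    acc = np.gradient(v, dt)
    residual = m * acc + c * v + k * x - u
    running = 0.5 * (x**2 + v**2 + u**2)
    cost = dt * (running.sum() - 0.5 * (running[0] + running[-1]))  # trapezoid rule
    return float(np.mean(residual**2) + lam * cost)


rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=16), rng.normal(size=16), rng.normal(size=16)


def net(t):
    """Tiny random tanh network standing in for the trained state network."""
    return np.tanh(np.outer(t, W1) + b1) @ W2


T, bc = 2.0, (1.0, 0.0, 0.0, 0.0)  # x(0)=1, x'(0)=0, x(T)=0, x'(T)=0
t = np.linspace(0.0, T, 201)
x = state_ansatz(t, T, bc, net)
print(x[0], x[-1])  # equal x0 and xT regardless of the network weights
```

Because the constraints are built into the ansatz, training only has to balance the residual and cost terms, and the boundary conditions never appear as penalty terms in the loss.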

2606.15393 2026-06-16 stat.ML cs.LG stat.ME Cross-list

Finite Resources False Discovery Rate Control in Structured Hypothesis Spaces

Binyamin Perets, Shie Mannor

Affiliations: Technion – Israel Institute of Technology; NVIDIA

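AI summary: For settings with finitely many null-distribution samples and structured hypothesis spaces, proposes a framework based on reproducing kernels. Two decision rules trade off exact FDR control against statistical power, and the allocation of null samples is optimized.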

Abstract

Scientific discovery relies on large-scale hypothesis testing. However, the capacity to identify true discoveries while controlling false discovery faces major challenges: obtaining relevant reference data (the null distribution) is resource-intensive, leaving finite-data uncertainty, and the procedure should account for the inherent structure in the hypothesis space, when such structure exists. Here, we present a framework for controlling the false discovery rate both when each hypothesis is evidenced only by a finite count of null draws, leaving its p-value uncertain, and when the hypothesis space carries arbitrary structure, requiring only that the structure be represented through a suitable reproducing kernel. We present two decision rules that are both robust to structural mis-specification, yet offer a distinct trade-off between exact FDR control and statistical power. The first rule guarantees exact FDR control; the second maximizes power by adapting mirror-statistic control into count space, utilizing an analytical framework to assess FDR control when exact mirror symmetry is relaxed. Furthermore, the tractability gained by the RKHS framework allows us to directly investigate finite-data uncertainties, which we leverage to suggest a policy for the efficient allocation of null distribution samples.
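
The finite-resource problem is easy to see with Monte Carlo p-values. With $B$ null draws, $\hat{p} = (1 + \#\{\text{null draws} \ge \text{statistic}\})/(B+1)$ can never fall below $1/(B+1)$. As a result, a Benjamini-Hochberg threshold $qk/m$ may be unattainable when $B$ is small relative to the number of hypotheses $m$. The sketch below is the standard unstructured baseline (Monte Carlo p-values plus Benjamini-Hochberg), not either of the paper's kernel-based decision rules. The function names are illustrative.

```python
import numpy as np


def mc_pvalue(stat, null_draws):
    """Monte Carlo p-value from a finite set of null draws (never below 1/(B+1))."""
    null_draws = np.asarray(null_draws)
    return (1.0 + np.sum(null_draws >= stat)) / (null_draws.size + 1.0)


def benjamini_hochberg(pvals, q):
    """Boolean rejection mask of the Benjamini-Hochberg step-up procedure at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest k with p_(k) <= q k / m
        reject[order[: k + 1]] = True
    return reject


print(benjamini_hochberg([0.001, 0.01, 0.02, 0.5, 0.9], q=0.05))  # [ True  True  True False False]
print(mc_pvalue(10.0, np.zeros(19)))  # 0.05: the smallest value attainable with 19 draws
```

For example, with 19 null draws per hypothesis and $m = 1000$ hypotheses, the first Benjamini-Hochberg threshold at $q = 0.05$ is $5 \times 10^{-5}$, far below the attainable floor of $0.05$. This is the kind of resource limitation the paper's allocation policy targets.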

2606.15443 2026-06-16 math.OC cs.LG Cross-list

Coercivity and Local Convergence of Physical Learning in Linear Circuits

Joshua A. McGinnis, Xinbo Li, Yoichiro Mori

Affiliations: Department of Mathematics, University of Pennsylvania, Philadelphia

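AI summary: For linear circuits, analyzes the local convergence of three physical learning methods (Equilibrium Propagation, Coupled Learning, and its adjoint variant) in the small-nudging limit. A coercivity condition, expressed as a rank condition built from the network structure, guarantees exponential convergence, and the non-degenerate case is shown to be generic.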

Abstract

Physical learning methods train physical networks to perform computational tasks using only local update rules, exploiting the physics of the system to handle the global transfer of information. We provide the first local convergence analysis of three such methods -- Equilibrium Propagation (EP), Coupled Learning (CL), and a new method we call Adjoint Coupled Learning (AL) -- for linear circuits, in the limit of small-nudging for both discrete and continuous time. EP and AL perform gradient descent on a natural loss function, while CL follows modified dynamics with an additional cubic correction. Assuming the existence of a solution, we identify a coercivity condition, expressed as a rank condition on a matrix built from the network's incidence structure, under which the training loss decays exponentially and the parameters converge to the solution manifold. We show that coercivity can fail by exhibiting a kite circuit in which a symmetry causes the coercivity constant to degenerate on the solution manifold, but prove using Sard's theorem that such degeneracies are non-generic: coercivity holds at every point of the solution manifold for almost every choice of desired output.
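
For intuition, coupled learning on the smallest linear circuit, a two-resistor voltage divider, can be simulated directly. This sketch uses the coupled-learning rule as it is commonly stated in the physical-learning literature, $\Delta k_e = (\alpha/\eta)\,[(\Delta V_e^{F})^2 - (\Delta V_e^{C})^2]$. The free state $F$ has only the inputs fixed, and the clamped state $C$ additionally nudges the output a fraction $\eta$ of the way toward the target. The circuit, target, learning rate, and nudge size are illustrative assumptions. The sketch does not reproduce the paper's adjoint variant or its coercivity analysis.

```python
import numpy as np


def solve_voltages(n_nodes, edges, k, fixed):
    """Node voltages of a linear resistor network with some nodes held fixed.

    Free nodes satisfy Kirchhoff's current law, (L V)_i = 0, where L is the
    conductance-weighted graph Laplacian.
    """
    L = np.zeros((n_nodes, n_nodes))
    for (i, j), kij in zip(edges, k):
        L[i, i] += kij
        L[j, j] += kij
        L[i, j] -= kij
        L[j, i] -= kij
    V = np.zeros(n_nodes)
    held = sorted(fixed)
    V[held] = [fixed[i] for i in held]
    free = [i for i in range(n_nodes) if i not in fixed]
    if free:
        rhs = -L[np.ix_(free, held)] @ V[held]
        V[free] = np.linalg.solve(L[np.ix_(free, free)], rhs)
    return V


def edge_drops(V, edges):
    return np.array([V[i] - V[j] for i, j in edges])


# Voltage divider: node 0 (input, 1 V) -> node 1 (output) -> node 2 (ground)
edges = [(0, 1), (1, 2)]
k = np.array([1.0, 1.0])
inputs = {0: 1.0, 2: 0.0}
out, target, eta, alpha = 1, 0.3, 1e-3, 0.1

for _ in range(2000):
    VF = solve_voltages(3, edges, k, inputs)  # free state
    nudged = VF[out] + eta * (target - VF[out])
    VC = solve_voltages(3, edges, k, {**inputs, out: nudged})  # clamped state
    dk = (alpha / eta) * (edge_drops(VF, edges) ** 2 - edge_drops(VC, edges) ** 2)
    k = np.maximum(k + dk, 1e-6)  # conductances must stay positive

VF = solve_voltages(3, edges, k, inputs)
print(VF[out])  # approaches the target 0.3
```

Each update uses only the voltage drop across its own edge in the two physical states, yet together the updates lower the output voltage ratio $k_{01}/(k_{01}+k_{12})$ toward the target. This is the local-rule, global-effect behavior whose convergence the paper analyzes.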

2606.15444 2026-06-16 math.OC cs.LG Cross-list

A Conservation Law for Equilibrium Propagation and Coupled Learning

Joshua A. McGinnis, Adam G. Kline, Yoichiro Mori

Affiliations: SAS, University of Pennsylvania

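AI summary: Proves that the physical learning methods Coupled Learning and Equilibrium Propagation conserve a mass-like quantity in the continuous-time, small-nudging limit, and analyzes how this affects the training dynamics of linear circuits.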

Abstract

In this paper we show that the physical learning methods known as coupled learning (CL) and equilibrium propagation (EP) conserve a mass-like quantity in the trainable parameters in the continuous-time, small-nudging limit. We prove that this conservation holds in a broad range of physically relevant settings. We then show that the conservation law constrains the training dynamics in a way that makes convergence reliable in important settings for linear circuits. We conclude by discussing some practical implications of this conservation law.

2606.15458 2026-06-16 stat.ML cs.LG Cross-list

Structured Nonparametric Variational Inference for Dependent Latent Modeling

Yuda Shao, Zhiling Gu, Shan Yu

Affiliations: Department of Statistics, University of Virginia; Department of Biostatistics, Yale University

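AI summary: Proposes Structured Nonparametric Variational Inference (SN-VI), which uses multivariate splines to model complex dependencies among latent variables in the posterior without a mean-field assumption. The method has theoretical guarantees and discovers dependencies automatically, and it performs well on computer vision and spatial transcriptomics data.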

Abstract

Variational inference (VI) is a core engine of modern AI, enabling scalable approximate Bayesian learning and uncertainty-aware training of large probabilistic and generative models. In this paper, we propose Structured Nonparametric Variational Inference (SN-VI), a novel framework for modeling complex dependencies among latent variables in posterior approximation, leveraging multivariate spline techniques. Unlike traditional methods that rely on the mean-field assumption, SN-VI preserves intricate latent variable dependencies, providing a flexible and accurate approximation of posteriors with arbitrary shapes. We establish rigorous theoretical guarantees, including the derivation of the lower bound for the variational objective and proof of asymptotic consistency in posterior estimation. To facilitate practical implementation, we develop an algorithm that automatically identifies dependent latent variables and their underlying dependence structure, without requiring manual specification. Simulation studies validate the effectiveness of SN-VI in approximating posterior distributions with bounded support and complex dependencies. The proposed method has been successfully applied to high-dimensional structured data, including computer vision datasets and spatial transcriptomics. In these applications, SN-VI demonstrates improved generative model performance and effectively uncovers coupled biological signals through the learned dependency structure.

2606.15555 2026-06-16 math.OC cs.AI cs.LG stat.ML Cross-list

Service-Induced Congestion in Memory-Constrained LLM Serving

Ruicheng Ao, Jing Dong, Gan Luo, David Simchi-Levi

Affiliations: Institute for Data, Systems, and Society, Massachusetts Institute of Technology; Columbia Business School, Columbia University; School of Mathematical Sciences, Peking University

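AI summary: Uses a discrete-time dynamical model to study service-induced congestion caused by key-value cache growth in memory-constrained LLM serving. Under homogeneous workloads, the eviction-free equilibrium is unstable and the system converges to a worst-case limit cycle. Under heterogeneous workloads, stability is tied to whether the decoding lengths are coprime. The paper derives scheduling design principles from these results.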

Comments 101 pages

Abstract

In large language model (LLM) serving, each request accumulates persistent graphics processing unit (GPU) memory during service as its key-value cache grows with every generated token. Under high concurrency, aggregate memory usage therefore increases endogenously over time: the service process itself creates future capacity pressure. When memory capacity is exceeded, systems evict active requests, discarding cached state and restarting them later, which wastes computation and reduces throughput. We develop a discrete-time dynamical model of memory-constrained LLM inference that captures admission, memory growth, and eviction under continuous batching. In the saturated-input regime, the system admits both eviction-free fixed points and limit cycles with evictions. For homogeneous workloads, we show that the eviction-free equilibrium is unstable and that, except for a Lebesgue-measure-zero exact-capture set, the system converges to a unique worst-case limit cycle that is asymptotically stable outside this exceptional set, with throughput losses as large as 50%. For heterogeneous workloads, we prove a stability criterion in the two-class common-input setting and explain how the survival-polynomial mechanism generalizes to multiple classes and heterogeneous-input lengths. Under an input-dominated scaling regime, coprime decoding lengths stabilize the eviction-free equilibrium, while non-coprime lengths create synchronized modes that drive instability. These results characterize when workload heterogeneity desynchronizes completions and helps stabilize memory-constrained serving. More broadly, we identify service-induced congestion as a structural instability mechanism and derive scheduling design principles for sustaining high throughput.
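
The feedback loop of admission, per-token KV-cache growth, and eviction can be reproduced in a few lines. This is a deliberately simplified toy, not the paper's model. It assumes a saturated queue, identical requests with prompt length `prompt_len` and decode length `decode_len`, one memory unit per cached token, and eviction of the most recently admitted request first.

```python
def simulate(capacity, prompt_len, decode_len, n_steps):
    """Return (completed, evicted) counts for a toy continuous-batching server."""
    active = []  # each entry: [memory_units, tokens_generated]; newest last
    completed = evicted = 0
    for _ in range(n_steps):
        used = sum(r[0] for r in active)
        # saturated queue: admit while the new prompt still fits
        while used + prompt_len <= capacity:
            active.append([prompt_len, 0])
            used += prompt_len
        # every active request decodes one token, growing its KV cache
        for r in active:
            r[0] += 1
            r[1] += 1
        # evict the newest requests until memory fits; their work is discarded
        while sum(r[0] for r in active) > capacity:
            active.pop()
            evicted += 1
        # finished requests release their memory
        completed += sum(1 for r in active if r[1] >= decode_len)
        active = [r for r in active if r[1] < decode_len]
    return completed, evicted


print(simulate(capacity=100, prompt_len=10, decode_len=10, n_steps=10))  # (5, 8)
```

With capacity 100, the first step admits ten 10-token prompts and fills memory immediately. The growing caches then force repeated evictions: five of the ten initially admitted requests are discarded, and only five finish in the first ten steps. The admission rule itself creates the congestion, which is the mechanism the abstract calls service-induced.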

2606.15581 2026-06-16 stat.ML cs.LG math.PR math.SP math.ST stat.TH Cross-list

Phase Transition in Convex Relaxations for Graph Alignment

Laurent Massoulié, Sushil Mahavir Varma, Louis Vassaux, Irène Waldspurger

Affiliations: INRIA, DI/ENS, PSL Research University; Industrial and Operations Engineering, University of Michigan

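AI summary: Studies graph alignment for correlated GOE matrices through convex relaxations. It proves that when the correlation parameter satisfies $\sigma = o(n^{-1/2}/\log^4 n)$, the relaxed solution concentrates around the true permutation, and it characterizes the phase-transition threshold.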

Comments Accepted for presentation at the Conference on Learning Theory (COLT) 2026

Abstract

We study the graph alignment problem for correlated Gaussian Orthogonal Ensemble (GOE) matrices, where the goal is to recover a hidden vertex permutation given two correlated symmetric Gaussian matrices $(A, B)$ with correlation $1/\sqrt{1+\sigma^2}$. While the maximum likelihood estimator is information-theoretically optimal, its computation, which reduces to a quadratic assignment problem, is intractable. Motivated by this, we analyze convex relaxations based on minimizing $\|AX - XB\|_F$ over the set of doubly stochastic matrices and the unit hypercube. We show that when the correlation parameter satisfies $\sigma = o(n^{-1/2}/\log^4 n)$, the solution $X^\star$ of either relaxation concentrates around the ground-truth permutation matrix $\Pi^\star$, i.e., $\|X^\star - \Pi^\star\|_F^2 = o(n)$, implying recovery of all but a vanishing fraction of vertices after simple post-processing. Combined with existing lower bounds, our results precisely characterize that $\|X^\star - \Pi^\star\|_F^2$ transitions from $o(n)$ for $\sigma = \tilde{o}(n^{-1/2})$ to $\Omega(n)$ for $\sigma = \tilde{\Omega}(n^{-1/2})$. In doing so, our analysis significantly tightens prior results and extends them beyond doubly stochastic relaxations.
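
The doubly stochastic relaxation can be solved with a Frank-Wolfe method. Its linear minimization oracle over the Birkhoff polytope is a linear assignment problem, because the polytope's vertices are permutation matrices. The sketch below generates a correlated GOE pair under the assumed model $B = \Pi^{\top}(A + \sigma Z)\Pi$, which has correlation $1/\sqrt{1+\sigma^2}$. It then minimizes $\|AX - XB\|_F^2$ with exact line search and rounds $X^\star$ to the nearest permutation. The sizes, iteration count, and rounding step are illustrative choices, not the paper's procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def goe(n, rng):
    g = rng.normal(size=(n, n))
    return (g + g.T) / np.sqrt(2 * n)


def frank_wolfe_alignment(A, B, n_iter=200):
    """Minimize ||AX - XB||_F^2 over doubly stochastic X by Frank-Wolfe."""
    n = A.shape[0]
    X = np.full((n, n), 1.0 / n)  # barycenter of the Birkhoff polytope
    history = []
    for _ in range(n_iter):
        R = A @ X - X @ B
        history.append(float(np.sum(R**2)))
        grad = 2.0 * (A.T @ R - R @ B.T)
        rows, cols = linear_sum_assignment(grad)  # best vertex (a permutation)
        S = np.zeros_like(X)
        S[rows, cols] = 1.0
        D = S - X
        RD = A @ D - D @ B
        denom = float(np.sum(RD**2))
        # exact line search for the quadratic objective, clipped to [0, 1]
        gamma = 1.0 if denom == 0.0 else min(max(-float(np.sum(R * RD)) / denom, 0.0), 1.0)
        X = X + gamma * D
    return X, history


rng = np.random.default_rng(0)
n, sigma = 20, 0.01
A = goe(n, rng)
P = np.eye(n)[rng.permutation(n)]  # ground-truth permutation matrix
B = P.T @ (A + sigma * goe(n, rng)) @ P
X, history = frank_wolfe_alignment(A, B)

rows, cols = linear_sum_assignment(-X)  # nearest permutation in Frobenius norm
Pi_hat = np.zeros_like(X)
Pi_hat[rows, cols] = 1.0
print("fraction of vertices matched:", np.sum(Pi_hat * P) / n)
```

Every iterate is a convex combination of permutation matrices, so it stays doubly stochastic without any projection step. Rounding by maximizing $\langle X, \Pi \rangle$ is equivalent to minimizing $\|X - \Pi\|_F$ over permutations.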

2606.15665 2026-06-16 stat.ML cs.LG math.ST stat.TH Cross-list

Information Gap and Feasibility-Aware Inference in Binomial Logistic Mixtures

Yuta Hayashida, Shonosuke Sugasawa

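AI summary: Studies the information gap between mixture detection and label recovery in binomial logistic mixtures. It proposes feasibility-aware inference based on a posterior-entropy penalty, which avoids misleading component selection and improves the calibration of posterior label probabilities.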

Comments 33 pages (main) + 30 pages (supplement)

Abstract

This paper studies the information gap between mixture detection and label recovery in binomial logistic mixtures. Standard likelihood-based criteria such as the Bayesian information criterion (BIC) can detect the presence of two components, but this does not guarantee that the corresponding labels are recoverable. We show that this gap is intrinsic to binomial logistic mixtures with a fixed number of trials: observed-data evidence for mixture structure and per-observation information for label recovery have different local orders in the component separation, and only the former accumulates with the sample size. As a result, there exists a detectable-but-unrecoverable regime in which BIC selects two components while the posterior labels remain essentially uninformative. To address this issue, we propose two feasibility-aware inference procedures: a recoverability-aware BIC with a posterior-entropy penalty and an entropy-regularized estimator that mitigates the tendency of the maximum likelihood estimator to produce overly separated components and overly concentrated posterior responsibilities. Numerical experiments confirm the predicted gap and demonstrate that the proposed methods avoid misleading component selections and improve the calibration of posterior label probabilities.
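
The quantity behind a posterior-entropy penalty can be computed directly. For each observation, take the posterior responsibility of component 1 and its binary entropy, which approaches $\log 2$ when the labels carry no information. The sketch assumes a two-component mixture with success probabilities $\mathrm{expit}(\beta_{g,0} + \beta_{g,1}x)$ and a fixed number of trials. How the paper weights this term inside its recoverability-aware BIC is not reproduced here, and the function name is hypothetical.

```python
import numpy as np
from scipy.special import expit, xlogy
from scipy.stats import binom


def mean_posterior_entropy(y, x, n_trials, pi, beta1, beta2):
    """Average binary entropy of the posterior component labels."""
    p1 = expit(beta1[0] + beta1[1] * x)
    p2 = expit(beta2[0] + beta2[1] * x)
    l1 = pi * binom.pmf(y, n_trials, p1)
    l2 = (1.0 - pi) * binom.pmf(y, n_trials, p2)
    r = l1 / (l1 + l2)  # posterior responsibility of component 1
    h = -(xlogy(r, r) + xlogy(1.0 - r, 1.0 - r))  # xlogy handles r = 0 or 1
    return float(np.mean(h))


y = np.array([0, 1, 2, 3])
x = np.array([0.0, 0.5, 1.0, 1.5])
# identical components: the labels carry no information, entropy = log 2
print(mean_posterior_entropy(y, x, 3, 0.5, (0.2, 1.0), (0.2, 1.0)))
# well-separated components on extreme counts: entropy near 0
print(mean_posterior_entropy(np.array([0, 10]), np.zeros(2), 10, 0.5, (-5.0, 0.0), (5.0, 0.0)))
```

A fitted two-component model whose average label entropy stays near $\log 2$ is in the detectable-but-unrecoverable regime that the abstract describes, even if BIC prefers it.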

2606.15679 2026-06-16 stat.ML cs.LG cs.NA math.NA Cross-list

Stochastic trace estimation with tensor train random vectors

Zvonimir Bujanović, Daniel Kressner, Hrvoje Olić

发表机构 * University of Zagreb, Faculty of Science, Department of Mathematics(萨格勒布大学理学院数学系) Institute of Mathematics, EPFL(EPFL数学研究所)

AI总结 研究使用高斯随机张量列向量进行随机迹估计,证明适当秩下可恢复维度无关保证,并应用于Nyström++框架。

AI中文摘要

随机迹估计是一种标准工具,用于近似仅通过矩阵-向量乘积可获得的大规模矩阵的迹。然而,在张量结构设置中,非结构化的高斯或Rademacher测试向量在存储和计算上可能过于昂贵,而更便宜的秩一张量积向量可能需要随张量阶数指数增长的样本复杂度。本文研究高斯随机张量列向量作为随机迹估计的结构化替代方案。我们证明,通过适当选择张量列秩,随机张量列向量可以恢复Girard-Hutchinson估计器的维度无关保证。特别地,基于张量列秩$r \geq d-1$的中位数均值变体在精度$\varepsilon$和失败概率$\delta$上实现了与基于非结构化高斯向量的经典估计器相同的依赖性。我们进一步证明了由独立高斯随机张量列向量形成的草图的一个无意识子空间注入结果:张量列秩$r\geq d-1$和$\mathcal{O}(\varepsilon^{-2}(k+\log(1/δ)))$个样本足以用于$k$维目标子空间。最后,我们研究了此类草图在Nyström++框架中的应用。我们证明,在额外的谱尾条件下,所得估计器可以实现所需的$\mathcal{O}(\varepsilon^{-1})$样本复杂度。这些结果阐明了随机张量列向量在随机迹估计中的潜力和局限性。

英文摘要

Stochastic trace estimation is a standard tool for approximating the trace of a large-scale matrix available only through matrix-vector products. However, in tensor-structured settings, unstructured Gaussian or Rademacher test vectors may be prohibitively expensive to store and compute with, while cheaper rank-one tensor-product vectors can require sample complexities that grow exponentially with the tensor order. This work studies Gaussian random tensor train vectors as a structured alternative for stochastic trace estimation. We show that, with a suitable choice of the tensor train rank, random tensor train vectors recover dimension-independent guarantees for the Girard--Hutchinson estimator. In particular, a median-of-means variant with tensor train rank $r \geq d-1$ achieves the same dependence on the accuracy $\varepsilon$ and failure probability $\delta$ as the classical estimator based on unstructured Gaussian vectors. We further prove an oblivious subspace injection result for sketches formed from independent Gaussian random tensor train vectors: tensor train rank $r\geq d-1$ and $\mathcal{O}(\varepsilon^{-2}(k+\log(1/\delta)))$ samples suffice for a $k$-dimensional target subspace. Finally, we investigate the use of such sketches within the Nyström++ framework. We show that the resulting estimator can achieve the desired $\mathcal{O}(\varepsilon^{-1})$ sample complexity under an additional spectral-tail condition. These results clarify both the potential and the limitations of random tensor train vectors in stochastic trace estimation.
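A minimal NumPy sketch of the Girard-Hutchinson estimator driven by Gaussian tensor train test vectors follows. For clarity it materializes each TT vector densely, which gives up the storage savings that motivate the method. The core scaling (first core $N(0,1)$, remaining cores $N(0,1/r)$, chosen so that $\mathbb{E}[xx^\top]=I$), the C-order index flattening, and the function names are assumptions, not the paper's construction.

```python
import numpy as np

def random_tt_vector(n, d, r, rng):
    """Dense Gaussian tensor-train vector of order d >= 2, mode size n, TT rank r.
    Cores are scaled (first N(0, 1), later N(0, 1/r)) so that E[x x^T] = I,
    which makes x^T A x an unbiased estimate of tr(A)."""
    assert d >= 2
    x = rng.standard_normal((n, r))                   # first core, shape (n, r)
    for k in range(1, d):
        r_next = 1 if k == d - 1 else r               # the last core closes the train
        core = rng.standard_normal((r, n, r_next)) / np.sqrt(r)
        # contract the trailing rank index of x with the leading index of the core
        x = np.einsum("ia,ajb->ijb", x, core).reshape(-1, r_next)
    return x.reshape(-1)

def tt_trace_estimate(matvec, n, d, r, num_samples, num_groups=1, seed=0):
    """Girard-Hutchinson trace estimate with TT test vectors; num_groups > 1
    turns it into a median-of-means variant."""
    rng = np.random.default_rng(seed)
    samples = np.empty(num_samples)
    for s in range(num_samples):
        x = random_tt_vector(n, d, r, rng)
        samples[s] = x @ matvec(x)                    # one quadratic-form sample
    group_means = [g.mean() for g in np.array_split(samples, num_groups)]
    return float(np.median(group_means))
```

For a diagonal test matrix with $n=4$, $d=3$ and trace 2080, `tt_trace_estimate(lambda v: diag * v, 4, 3, 2, 8000, num_groups=5)` should land near 2080. In a genuinely tensor-structured setting one would keep `x` in TT format and apply a TT matrix-vector product instead.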

2606.15923 2026-06-16 cs.NE cs.AI cs.LG 交叉投稿

Runtime Analysis of Cartesian Genetic Programming in Evolving Boolean Functions

笛卡尔遗传规划在演化布尔函数中的运行时分析

Duc-Cuong Dang, Roman Kalkreuth, Andre Opris

发表机构 * University of Passau(帕绍大学) RWTH Aachen University(亚琛工业大学)

AI总结 本文首次对笛卡尔遗传规划在完全训练集上演化布尔函数进行运行时分析,证明构造n输入合取式的期望适应度评估次数为O(n D^5),并发现非严格选择可加速至O(n D^4),而异或函数需要指数时间。

Comments To appear in the Proceedings of PPSN 2026

AI中文摘要

笛卡尔遗传规划(CGP)是遗传规划中实用且流行的形式之一,因为它使用基于图的程序表示。本文首次对CGP在完全训练集上演化布尔函数进行运行时分析。我们证明了CGP使用最多D≥n-1个二元门、最小函数集,甚至采用严格生存选择时,构造n个输入的合取式的期望适应度评估次数的渐近界为O(n D^5)。当使用非严格选择时,该界改进为O(n D^4)。我们的分析揭示了CGP诱导搜索的有趣特征,这些特征此前仅通过经验观察得到。特别是,允许接受同样好的解(包括那些包含不贡献适应度的连接门的解)可以导致加速,从而获得更好的渐近时间界。与合取式相反,我们还证明了一个负面结果,即CGP需要指数时间来演化异或函数。演化合取式的实验补充了我们的理论发现。使用不完全训练集可以进一步减少平均适应度评估次数,同时保持较好的泛化水平。

英文摘要

Cartesian Genetic Programming (CGP) is among the practical and popular forms of Genetic Programming, as it uses a graph-based representation of programs. This paper presents the first runtime analysis of CGP in evolving Boolean functions using complete training sets. We prove an asymptotic bound $O(n D^5)$ on the expected number of fitness evaluations CGP needs to construct a conjunction of $n$ inputs using at most $D \geq n-1$ binary gates, a minimal function set, and even a strict survival selection. When non-strict selection is used, the bound improves to $O(n D^4)$. Our analysis reveals interesting characteristics of CGP-induced search that had previously been observed only empirically. In particular, accepting equally good solutions, including those with connected gates that do not contribute to fitness, can lead to a speedup and consequently a better asymptotic time bound. In contrast to conjunctions, we also prove a negative result showing that CGP requires exponential time to evolve an exclusive disjunction. Experiments evolving conjunctions complement our theoretical findings. Using incomplete training sets further reduces the average number of fitness evaluations while maintaining a good level of generalisation.
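The kind of search analyzed here can be sketched as a (1+1)-style loop over CGP genomes. The sketch assumes an AND-only function set, feed-forward nodes that may read any input or any earlier node, a single uniformly chosen gene resampled per mutation, and fitness equal to the number of correctly reproduced rows of the complete truth table. The paper's exact mutation operator and function set may differ. The `strict` flag switches between accepting only strict improvements and also accepting equally good offspring.

```python
import itertools
import random

def evaluate(genome, inputs):
    """Output bit of an AND-only CGP genome on one input assignment."""
    nodes, out = genome
    values = list(inputs)                      # indices 0..n-1 hold the inputs
    for a, b in nodes:                         # node j may read any index < n + j
        values.append(values[a] & values[b])
    return values[out]

def fitness(genome, n):
    """Truth-table rows on which the genome matches x_0 AND ... AND x_{n-1}."""
    return sum(evaluate(genome, row) == int(all(row))
               for row in itertools.product((0, 1), repeat=n))

def evolve_conjunction(n, D, strict=False, max_evals=200_000, seed=0):
    """(1+1)-style CGP run; returns (final fitness, fitness evaluations used)."""
    rng = random.Random(seed)
    nodes = [(rng.randrange(n + j), rng.randrange(n + j)) for j in range(D)]
    parent = (nodes, rng.randrange(n + D))
    f_parent, evals = fitness(parent, n), 1
    while f_parent < 2 ** n and evals < max_evals:
        nodes, out = list(parent[0]), parent[1]
        gene = rng.randrange(2 * D + 1)        # one gene, chosen uniformly at random
        if gene == 2 * D:                      # the output gene
            out = rng.randrange(n + D)
        else:                                  # a connection gene of node j
            j, k = divmod(gene, 2)
            a, b = nodes[j]
            new = rng.randrange(n + j)
            nodes[j] = (new, b) if k == 0 else (a, new)
        child = (nodes, out)
        f_child = fitness(child, n)
        evals += 1
        if f_child > f_parent or (not strict and f_child == f_parent):
            parent, f_parent = child, f_child
    return f_parent, evals
```

`evolve_conjunction(3, 4)` returns the final fitness (8 means all $2^3$ rows are correct) and the number of fitness evaluations spent.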

2606.15962 2026-06-16 stat.ME cs.LG 交叉投稿

p-PSO: A Penalized Particle Swarm Optimization Technique for Finding D-Optimal Designs with Mixed Factors in Generalized Linear Models

p-PSO: 一种用于广义线性模型中混合因子D-最优设计的惩罚粒子群优化技术

Shrabanti Chowdhury, Abhyuday Mandal

发表机构 * Icahn School of Medicine at Mount Sinai(伊坎医学院) University of Georgia(佐治亚大学)

AI总结 提出一种新的惩罚粒子群优化方法p-PSO,通过通用惩罚公式解决广义线性模型中混合因子D-最优设计问题,高效且可直接使用现成PSO算法。

AI中文摘要

寻找广义线性模型(GLMs)的D-最优设计具有挑战性,因为Fisher信息矩阵依赖于未知参数且缺乏闭式解,尤其当输入因子包含离散和连续变量时。尽管经典算法和最近的元启发式方法提供了部分解决方案,但仍需要稳健且计算高效的方法。本文提出了一种惩罚粒子群优化(PSO)方法,称为$p$-PSO。我们引入了一种新的、通用的约束优化惩罚公式,并展示了其在最优设计问题中的有效性。该公式与算法无关,适用于一大类黑箱优化方法。结果表明,该方法非常高效,其主要贡献在于提出了一种惩罚公式,使得可以直接使用现成的PSO算法,并自然地扩展到更一般的约束优化任务。

英文摘要

Finding D-optimal designs for generalized linear models (GLMs) is challenging due to the dependence of the Fisher information matrix on unknown parameters and the lack of closed-form solutions, particularly when input factors include both discrete and continuous variables. Although classical algorithms and recent metaheuristic approaches have offered partial solutions, there remains a need for robust and computationally efficient methods. In this paper, we propose a penalized Particle Swarm Optimization (PSO) approach, named $p$-PSO. Here we introduce a new, general-purpose penalty formulation for constrained optimization and demonstrate its effectiveness in optimal design problems. The formulation is algorithm-agnostic and applicable to a broad class of black-box optimization methods. Results show that the method is highly efficient, with its primary contribution being a penalty formulation that enables the direct use of an off-the-shelf PSO algorithm and extends naturally to more general constrained optimization tasks.
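The objective such a search optimizes can be sketched for a logistic GLM: the log-determinant of the Fisher information $M=\sum_i w_i\,p_i(1-p_i)\,f(x_i)f(x_i)^\top$ evaluated at a parameter guess `beta` (a locally D-optimal design). The penalty below is a generic quadratic penalty chosen for illustration. It pushes the weights onto the simplex and binary factor columns onto $\{0,1\}$, and it is not the paper's $p$-PSO penalty formulation. Any black-box minimizer, a PSO included, could be run on `penalized_neg_logdet`.

```python
import numpy as np

def logistic_info_logdet(points, weights, beta):
    """log det of M = sum_i w_i p_i (1 - p_i) f(x_i) f(x_i)^T for a logistic GLM
    with regressors f(x) = (1, x_1, ..., x_k); points has shape (m, k)."""
    X = np.column_stack([np.ones(len(points)), np.asarray(points, dtype=float)])
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    M = (X * (weights * p * (1 - p))[:, None]).T @ X
    sign, logdet = np.linalg.slogdet(M)
    return logdet if sign > 0 else -np.inf

def penalized_neg_logdet(points, weights, beta, binary_cols=(), lam=1e3):
    """Objective to minimize. Hypothetical quadratic penalties: weights must sum to 1
    and be nonnegative; columns listed in binary_cols must take values in {0, 1}."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    viol = (weights.sum() - 1.0) ** 2 + np.sum(np.minimum(weights, 0.0) ** 2)
    for c in binary_cols:
        viol += np.sum((points[:, c] * (1.0 - points[:, c])) ** 2)  # zero only at 0 or 1
    value = logistic_info_logdet(points, np.clip(weights, 0.0, None), beta)
    return -value + lam * viol
```

As a check, for the one-factor model with `beta = (0, 1)` and equal weights on the symmetric support $\{-x, x\}$, the criterion reduces to $2\log\big(p(1-p)\,x\big)$ with $p$ the logistic function at $x$, which peaks near $x \approx 1.543$.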

2606.16013 2026-06-16 cond-mat.dis-nn cs.LG physics.data-an stat.ML 交叉投稿

The limits of interpretability in multiple linear regression

多元线性回归中可解释性的极限

Anand Sharma, Chen Liu, Daniele Coslovich, Misaki Ozawa

发表机构 * Indian Institute of Science Education and Research(印度科学教育与研究学院) Innovation and Research Division, Ge-Room Inc.(Ge-Room公司创新与研究部) Dipartimento di Fisica, Università di Trieste(的里雅斯特大学物理系) Univ. Grenoble Alpes, CNRS, LIPhy(格勒诺布尔阿尔卑斯大学,CNRS,LIPhy)

AI总结 本文通过分析特征相关矩阵的本征模,理论解释了多重共线性导致线性回归权重不稳定和振荡模式,从而丧失可解释性的机制,并验证了岭回归的缓解作用。

Comments 23 pages, 8 figures

AI中文摘要

解释机器学习模型已引起越来越多的关注,特别是在物理科学中,人们常常寻求理解潜在机制而不仅仅是进行预测。多元线性回归通常被视为比深度神经网络等更复杂模型更具可解释性的替代方案,因为其预测表示为输入特征的显式加权和。然而,当输入特征强相关时,即存在多重共线性时,学习到的权重可能表现出数据集间的大幅波动和跨物理相似特征的振荡行为,使得其解释变得困难甚至不可能。尽管统计学家熟知多重共线性下权重的不稳定性,但其对物理解释的影响,特别是与跨物理相似特征的振荡权重的联系,尚未得到系统阐明。本文通过分析特征相关矩阵的本征模,从理论上讨论了这种可解释性丧失背后的机制。我们表明,与多重共线性相关的小本征值模式会放大权重的波动,并产生不一定反映有意义贡献的振荡模式。我们在物理数据集上数值验证了这一理论图景,并表明岭回归抑制了这些不稳定模式,尽管得到的权重仍需谨慎解释。通过分析多种公开数据集,我们进一步证实了研究结果的普适性。我们的结果阐明了为何在存在多重共线性的情况下,即使对于线性回归模型,物理解释仍然可能困难。

英文摘要

Interpreting machine-learning models has attracted increasing attention, particularly in the physical sciences, where one often seeks to understand the underlying mechanisms rather than merely make predictions. Multiple linear regression is often regarded as an interpretable alternative to more complex models, such as deep neural networks, because its predictions are expressed as explicit weighted sums of input features. However, when input features are strongly correlated, namely in the presence of multicollinearity, the learned weights can exhibit large dataset-to-dataset fluctuations and oscillatory behavior across physically similar features, making their interpretation difficult or even impossible. Although the instability of the weights under multicollinearity is well known in statistics, its consequences for physical interpretation, in particular its connection to oscillatory weights across physically similar features, have not been systematically clarified. Here, we theoretically discuss the mechanism behind this loss of interpretability by analyzing the eigenmodes of the feature correlation matrix. We show that small-eigenvalue modes associated with multicollinearity amplify fluctuations in the weights and generate oscillatory patterns that do not necessarily reflect meaningful contributions. We test this theoretical picture numerically on physics datasets and show that Ridge regularization suppresses these unstable modes, although the resulting weights must still be interpreted with caution. We further confirm the generality of our findings beyond physics by analyzing a diverse collection of publicly available datasets. Our results clarify why, in the presence of multicollinearity, physical interpretation can remain difficult even for linear regression models.
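The mechanism can be reproduced with two strongly correlated features. The smallest eigenvalue of the feature correlation matrix belongs to the antisymmetric mode $(1,-1)/\sqrt{2}$, the oscillatory direction along which least-squares weights fluctuate most across datasets, and a ridge penalty damps exactly that direction. The correlation $\rho = 0.99$, the sample sizes, and the ridge strength are arbitrary illustrative choices.

```python
import numpy as np

def smallest_mode(C):
    """Smallest eigenvalue of a symmetric correlation matrix and its eigenvector."""
    vals, vecs = np.linalg.eigh(C)             # eigenvalues in ascending order
    return vals[0], vecs[:, 0]

def fit_weights(X, y, alpha=0.0):
    """Least-squares (alpha = 0) or ridge (alpha > 0) weights from the normal equations."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def weight_spread(alpha, rho=0.99, n=100, n_datasets=200, seed=0):
    """Standard deviation of each fitted weight across independently drawn datasets."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    true_w = np.array([1.0, 1.0])
    fits = []
    for _ in range(n_datasets):
        X = rng.standard_normal((n, 2)) @ L.T  # rows have feature correlation rho
        y = X @ true_w + rng.standard_normal(n)
        fits.append(fit_weights(X, y, alpha))
    return np.std(fits, axis=0)
```

`weight_spread(0.0)` is much larger than `weight_spread(50.0)`. The shrinkage that stabilizes the ridge weights also biases them, which is why they still need careful interpretation.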

2606.16077 2026-06-16 cs.CC cs.LG 交叉投稿

Polynomial-Time Mistake-Bounded Language Generation

多项式时间错误有界语言生成

Héctor Jimenez, Alexander Kozachinskiy, Vicente Opazo

发表机构 * University of Chile(智利大学) CENIA

AI总结 本文提出多项式时间错误有界语言生成框架,证明奇偶函数族、文字合取族以及具有多项式多个极大项的单调布尔函数族属于该框架,后者包含所有多项式大小决策树可计算的单调函数。

AI中文摘要

在这篇笔记中,我们引入了Kleinberg、Peale和Reingold(2026)提出的错误有界语言生成(MBLG)框架的多项式时间版本。我们观察到变量奇偶函数族和文字合取族是多项式时间MBLG的。我们的主要结果表明,具有多项式多个极大项的单调布尔函数族是多项式时间MBLG的。该族包含所有可由多项式大小决策树计算的单调布尔函数。我们的技术可以呈现为一种关于在黑板上书写数字的新组合博弈。

英文摘要

In this note, we introduce a polynomial-time version of the mistake-bounded language generation (MBLG) framework due to Kleinberg, Peale, and Reingold (2026). We observe that the family of parities of variables and the family of conjunctions of literals are polynomial-time MBLG. Our main result states that the family of monotone Boolean functions with polynomially many maxterms is polynomial-time MBLG. This family includes all monotone Boolean functions computable by polynomial-size decision trees. Our technique can be presented as a new combinatorial game about writing numbers on a board.

2606.16564 2026-06-16 cs.RO cs.LG 交叉投稿

Elastic ODYN: Differentiable Optimization for Infeasible Control and Learning in Robotics

Elastic ODYN:面向机器人中不可行控制与学习的可微优化

Aristotelis Papatheodorou, Jose Rojas, Ioannis Havoutis, Carlos Mastalli

发表机构 * University of Oxford(牛津大学) Heriot-Watt University(赫瑞瓦特大学)

AI总结 提出Elastic ODYN,一种通过平滑平方ℓ2弹性松弛处理不可行二次规划(QP)的原始-对偶非内点求解器,支持热启动,在无可行点时收敛到最接近可行解,并基于此开发可微QP层和不可行感知SQP方法,在基准QP、奇异接触力学、可微参数辨识及四足/人形机器人轨迹优化中优于现有方法。

Comments 8 pages, 5 figures, 2 tables

AI中文摘要

机器人系统经常遇到冲突的目标、建模误差和退化接触条件,这些条件使得二次规划(QP)不可行。然而,大多数优化求解器和可微QP层假设可行性,当约束无法同时满足时,会导致数值失败、梯度不稳定或求解器崩溃。我们提出Elastic ODYN,一种原始-对偶非内点QP求解器,通过平滑平方ℓ2弹性松弛处理不可行性。所得公式在病态和退化条件下保持良态,支持热启动,并在无可行点时收敛到最接近可行解。一个轻量级细化阶段从弹性解中恢复有物理意义的对偶变量。基于此框架,我们开发了Elastic OdynLayer,一个在不可行性下具有稳定梯度的可微QP层,以及Elastic OdynSQP,一种不可行感知的SQP方法,通过选择性约束松弛解决不一致的子问题和本质不可行的最优控制任务。我们在基准QP、奇异接触力学、可微参数辨识以及四足和人形机器人轨迹优化上评估该框架。在所有设置中,Elastic ODYN在鲁棒性、热启动性能和收敛可靠性方面始终优于最先进的弹性QP求解器,使得优化、仿真、控制和学习能够超越现有方法的可行性假设。

英文摘要

Robotic systems routinely encounter conflicting objectives, modeling errors, and degenerate contact conditions that render quadratic programs (QPs) infeasible. Yet most optimization solvers and differentiable QP layers assume feasibility, leading to numerical failures, unstable gradients, or solver breakdown when constraints cannot be simultaneously satisfied. We present Elastic ODYN, a primal--dual non-interior-point QP solver that handles infeasibility through smooth squared-$\ell_2$ elastic relaxations. The resulting formulation remains well posed under ill-conditioning and degeneracy, supports warm starting, and converges to closest-to-feasible solutions when no feasible point exists. A lightweight refinement stage recovers physically meaningful dual variables from the elastic solution. Building on this framework, we develop Elastic OdynLayer, a differentiable QP layer with stable gradients under infeasibility, and Elastic OdynSQP, an infeasibility-aware SQP method that resolves inconsistent subproblems and intrinsically infeasible optimal control tasks through selective constraint relaxation. We evaluate the framework on benchmark QPs, singular contact mechanics, differentiable parameter identification, and quadrupedal and humanoid trajectory optimization. Across all settings, Elastic ODYN consistently outperforms state-of-the-art elastic QP solvers in robustness, warm-start performance, and convergence reliability, enabling optimization, simulation, control, and learning beyond the feasibility assumptions of existing methods.
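The squared-$\ell_2$ elastic idea can be illustrated on a deliberately infeasible equality-constrained QP. Softening every constraint $Ax=b$ with a slack $s=Ax-b$ penalized by $\tfrac{\rho}{2}\|s\|^2$ leaves an unconstrained problem that is strongly convex whenever $H$ is positive definite, and its minimizer approaches a least-squares closest-to-feasible point as $\rho$ grows. The sketch handles equalities only and solves the relaxed problem in closed form. It is not the primal-dual non-interior-point algorithm of Elastic ODYN, and the value of `rho` is an arbitrary choice.

```python
import numpy as np

def elastic_eq_qp(H, g, A, b, rho=1e6):
    """Minimize 0.5 x'Hx + g'x + (rho/2)||Ax - b||^2: the QP with constraints Ax = b,
    each softened by a squared-l2 penalty on its slack s = Ax - b. The minimizer
    solves the stationarity condition (H + rho A'A) x = -g + rho A'b."""
    lhs = H + rho * A.T @ A
    rhs = -g + rho * A.T @ b
    x = np.linalg.solve(lhs, rhs)
    return x, A @ x - b          # relaxed solution and the remaining constraint slack
```

With the contradictory constraints $x_1=1$ and $x_1=2$, the relaxed solution sits at $x_1 \approx 1.5$, splitting the violation evenly between the two slacks.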

2606.16926 2026-06-16 math.OC cs.LG stat.ML 交叉投稿

Functional Gradient Descent with Adaptive Representations

自适应表示的函数梯度下降

Daniel Csillag, Rodrigo Schuller, Pedro Dall'Antonia, Leonidas Guibas, Luiz Velho, Tiago Novello

AI总结 提出一种自适应表示的函数梯度下降算法,通过将近似误差纳入分析,在平滑损失下收敛到驻点,在PL条件下收敛到全局最小值,在回归、PDE求解和计算机视觉中优于固定近似FGD和神经网络基线。

AI中文摘要

函数优化问题通常通过优化固定表示(如神经网络)的参数来解决,这导致高度非凸的损失,使训练和理论分析复杂化。一个有趣的替代方案是函数梯度下降(FGD),即直接在函数空间中进行梯度下降,它受益于强收敛结果并具有简洁的理论。然而,FGD在实践中难以实现,因为函数梯度是无限维的,因此无法完全计算或存储在内存中。现有的实现因此依赖于固定近似,这引入了近似误差。我们提出了一种新的、有理论基础的FGD算法,该算法在优化过程中自适应地调整函数梯度的表示。通过将这种近似明确地纳入分析,我们证明了无论近似如何,算法都能收敛到驻点(对于平滑损失)和全局最小值(在平滑性和Polyak-Lojasiewicz型条件下)。据我们所知,这是第一个在一般设置下具有此类保证的可实现FGD方法。我们在回归、偏微分方程的数值求解和现代计算机视觉中展示了我们方法的有效性。在各种设置中,我们的方法在效率和准确性上始终优于固定近似的FGD和神经网络基线。

英文摘要

Functional optimization problems are typically solved by optimizing the parameters of a fixed representation, such as a neural network, resulting in highly nonconvex losses that complicate both training and theoretical analysis. An interesting alternative is functional gradient descent (FGD), that is, gradient descent directly in function space, which benefits from strong convergence results and admits a clean theory. However, FGD is difficult to implement in practice because functional gradients are infinite-dimensional and thus can be neither fully computed nor stored in memory. Existing implementations therefore rely on fixed approximations, which introduce approximation error. We propose a new, theoretically grounded FGD algorithm that adapts the representation of the functional gradients over the course of optimization. By explicitly incorporating this approximation into the analysis, we establish convergence to a stationary point (for smooth losses) and to a global minimizer (under smoothness plus a Polyak-Łojasiewicz-type condition), regardless of the approximations used. To the best of our knowledge, this is the first implementable FGD method with such guarantees in a general setting. We demonstrate the effectiveness of our method on regression, numerical solution of PDEs, and modern computer vision. Across settings, our method consistently outperforms both FGD with fixed approximations and neural network baselines in efficiency and accuracy.
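For orientation, here is classical functional gradient descent with a fixed approximation, the kind of baseline the paper improves on. For the squared loss, the functional gradient in a Gaussian-kernel RKHS is $\sum_i (f(x_i)-y_i)\,k(x_i,\cdot)$, so a function $f=\sum_j \alpha_j k(x_j,\cdot)$ is updated by $\alpha \leftarrow \alpha - \eta\,(K\alpha - y)$. The kernel, its width, and the step size $\eta = 1/\lambda_{\max}(K)$ are assumptions, and the paper's adaptive representation is not reproduced here.

```python
import numpy as np

def gaussian_kernel(a, b, width=0.5):
    """Gram matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 width^2)) for 1-D inputs."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * width**2))

def kernel_fgd(x, y, steps=500, width=0.5):
    """FGD on L(f) = 0.5 * sum_i (f(x_i) - y_i)^2 in a fixed Gaussian-kernel RKHS,
    with f = sum_j alpha_j k(x_j, .). Returns alpha and the training MSE before
    each update."""
    K = gaussian_kernel(x, x, width)
    eta = 1.0 / np.linalg.eigvalsh(K).max()   # makes the residual map I - eta*K non-expansive
    alpha = np.zeros_like(y)
    history = []
    for _ in range(steps):
        residual = K @ alpha - y              # f(x_i) - y_i
        history.append(float(np.mean(residual**2)))
        alpha -= eta * residual               # functional step f <- f - eta * grad L(f)
    return alpha, history
```

Because the residual evolves as $r \leftarrow (I-\eta K)\,r$, the training error never increases, but components of the target lying along small eigenvalues of $K$ are fitted slowly. This is the price of committing to one fixed representation.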

2606.16941 2026-06-16 stat.ML cs.LG 交叉投稿

A nonparametric two-sample test using a parametric integral probability metric

使用参数化积分概率度量的非参数双样本检验

Yuha Park, Yongdai Kim

发表机构 * University of Hamburg(汉堡大学) Seoul National University(首尔国立大学)

AI总结 提出基于单节点神经网络的参数化判别器类构造积分概率度量,得到非参数检验统计量PReLU-IPM,并证明其一致性和渐近等价性,实验表明有限样本下检验功效更高或相当。

Comments 45 pages. Accepted for publication in Statistical Analysis and Data Mining

AI中文摘要

检测两个独立样本之间的分布差异是统计学和机器学习中的一个基本问题。非参数双样本检验提供了一个原则性框架,用于确定两个样本是否来自同一潜在分布,而不假设分布的任何特定参数形式。在本研究中,我们基于新引入的积分概率度量(IPM),使用一个特殊设计的、具有神经网络单节点的参数化判别器类,提出了一种新的双样本检验统计量。我们证明了所得到的检验统计量PReLU-IPM是非参数的,并为相关的双样本检验程序PReLU-TST建立了理论保证,包括其一致性以及在正则条件下与非参数基于IPM的检验的渐近等价性。通过分析多个模拟和真实基准数据集,我们证明了PReLU-TST在有限样本下,在一系列备择假设中实现了更高的检验功效,或与竞争对手表现相当。

英文摘要

Detecting distributional differences between two independent samples is a fundamental problem in statistics and machine learning. Nonparametric two-sample testing provides a principled framework for determining whether two samples are drawn from the same underlying distribution, without assuming any specific parametric form for the distribution. In this study, we propose a new two-sample test statistic based on a newly introduced integral probability metric (IPM) whose discriminator class is a specially designed parametric family consisting of a single neural-network node. We show that the resulting test statistic, called PReLU-IPM, is nonparametric, and we establish theoretical guarantees for the associated two-sample testing procedure, PReLU-TST, including its consistency and its asymptotic equivalence to nonparametric IPM-based tests under regularity conditions. By analyzing multiple simulated and real benchmark datasets, we demonstrate that, in finite samples, PReLU-TST achieves higher power than its competitors across a range of alternatives or performs comparably to them.
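Below is a rough sketch of a single-unit IPM permutation test. The supremum over one-neuron discriminators $x \mapsto \mathrm{relu}(w^\top x + b)$ is approximated by a maximum over a fixed random bank of unit-norm directions and offsets, and the null distribution comes from permutations. A plain ReLU unit stands in for the paper's PReLU-based class, and the bank size, offset range, and permutation count are assumptions. Drawing the bank once, before permuting, keeps the permutation test valid.

```python
import numpy as np

def single_unit_ipm(X, Y, W, b):
    """max over candidate units (w_j, b_j) of |mean relu(X w_j + b_j) - mean relu(Y w_j + b_j)|."""
    fx = np.maximum(X @ W + b, 0.0).mean(axis=0)
    fy = np.maximum(Y @ W + b, 0.0).mean(axis=0)
    return float(np.abs(fx - fy).max())

def permutation_test(X, Y, n_units=256, n_perm=200, seed=0):
    """Returns (statistic, permutation p-value) for H0: X and Y share a distribution."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_units))
    W /= np.linalg.norm(W, axis=0)            # unit-norm directions
    b = rng.uniform(-1.0, 1.0, n_units)
    stat = single_unit_ipm(X, Y, W, b)
    Z, n = np.vstack([X, Y]), len(X)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))         # relabel the pooled sample
        if single_unit_ipm(Z[idx[:n]], Z[idx[n:]], W, b) >= stat:
            count += 1
    return stat, (count + 1) / (n_perm + 1)
```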

2606.16975 2026-06-16 stat.ML cs.LG 交叉投稿

Sobolev Approximation by Fixed-Size Neural Networks with Arbitrary Accuracy

固定大小神经网络实现任意精度的Sobolev逼近

Baicheng Li, Haizhao Yang, Shijun Zhang

AI总结 提出新型激活函数(EUAF、DUAF∞等),使固定大小神经网络能以任意精度逼近Sobolev空间中的函数,并给出显式的宽度和深度界。

AI中文摘要

本文研究用于固定大小神经网络实现任意精度Sobolev逼近的新型激活函数。我们首先证明,任何$W^{2,\infty}((a,b)^d)$中的函数都可以通过使用基本通用激活函数($\mathrm{EUAF}$)的固定大小神经网络,以$W^{1,\infty}$范数度量达到任意精度。为了将此结果推广到$s\in\mathbb{N}$时的$W^{s,\infty}((a,b)^d)$,我们引入了来自可微通用激活函数族($\mathrm{DUAF}_n$)的光滑激活函数$\mathrm{DUAF}_{\infty}$。我们证明,任何$W^{s,\infty}((a,b)^d)$中的函数都可以通过固定大小的$\mathrm{DUAF}_{\infty}$激活网络,以$W^{s-1,\infty}$范数度量达到任意精度。我们进一步构造了Sigmoid变体$\widetilde{\mathrm{DUAF}}_n$,并证明对于每个$1\leq s\leq n$,固定大小的$\widetilde{\mathrm{DUAF}}_n$激活网络仍能以$W^{s-1,\infty}$范数度量任意逼近任何$f\in W^{s,\infty}((a,b)^d)$。在所有结果中,宽度和深度界均被显式计算,且所提出的激活函数是初等的。

英文摘要

In this work, we investigate new activation functions for achieving arbitrary-accuracy Sobolev approximation by fixed-size neural networks. We first show that any function in $W^{2,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy, measured in the $W^{1,\infty}$-norm, by a fixed-size neural network using the Elementary Universal Activation Function ($\mathrm{EUAF}$). To extend this result to $W^{s,\infty}((a,b)^d)$ for $s\in\mathbb{N}$, we introduce a smooth activation $\mathrm{DUAF}_{\infty}$ from the family of Differentiable Universal Activation Functions ($\mathrm{DUAF}_n$). We prove that any function in $W^{s,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy in the $W^{s-1,\infty}$-norm by a fixed-size $\mathrm{DUAF}_{\infty}$-activated network. We further construct sigmoidal variants $\widetilde{\mathrm{DUAF}}_n$ and show that, for every $1\leq s\leq n$, fixed-size $\widetilde{\mathrm{DUAF}}_n$-activated networks still approximate any $f\in W^{s,\infty}((a,b)^d)$ with arbitrary accuracy in the $W^{s-1,\infty}$-norm. In all these results, the width and depth bounds are computed explicitly, and the proposed activations are elementary.
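The DUAF families are not spelled out in the abstract, so the sketch only shows the earlier elementary universal activation they build on. The formula is taken, as an assumption, from prior fixed-size-network work: a period-2 triangle wave on $[0,\infty)$ and the softsign $x/(|x|+1)$ on $(-\infty,0)$. Check it against the paper before relying on it.

```python
import numpy as np

def euaf(x):
    """EUAF as assumed here: |x - 2*floor((x + 1)/2)| for x >= 0 (a period-2
    triangle wave with values in [0, 1]) and x / (|x| + 1) for x < 0."""
    x = np.asarray(x, dtype=float)
    pos = np.abs(x - 2.0 * np.floor((x + 1.0) / 2.0))
    neg = x / (np.abs(x) + 1.0)
    return np.where(x >= 0, pos, neg)
```

The periodic branch is what lets a fixed number of neurons reach arbitrary accuracy: scaling the input selects ever finer oscillations without adding units.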

2606.17000 2026-06-16 cs.CC cs.GT cs.LG math.OC 交叉投稿

The Complexity of Min-Max Optimization for Quadratic Polynomials

二次多项式极小极大优化的复杂性

Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alexandros Hollender

AI总结 证明超立方体上极小极大优化的近似稳定点计算对二次多项式是PPAD难的,即使多项式是多线性的且每个变量最多出现在三个单项式中。

AI中文摘要

我们证明,对于二次多项式,计算超立方体上极小极大优化的近似稳定点是PPAD难的。即使多项式是多线性的,每个变量最多出现在三个单项式中,且近似因子是逆多项式,这一结论仍然成立。作为直接推论,我们得到了两队零和多项式矩阵博弈的首个PPAD难结果。

英文摘要

We prove that computing approximate stationary points of min-max optimization over the hypercube is PPAD-hard for quadratic polynomials. This holds even when the polynomials are multilinear, each variable appears in at most three monomials, and the approximation factor is inverse polynomial. As a direct consequence, we obtain the first PPAD-hardness results for two-team zero-sum polymatrix games.
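The hardness result concerns approximate stationary points, which on a box are commonly measured with projected-gradient residuals: one projected descent step for the minimizing player and one projected ascent step for the maximizing player should each move the point by at most $\varepsilon$. The sketch computes that residual for $\min_x \max_y f(x,y)$ over $[0,1]^n \times [0,1]^m$. This is the standard notion assumed here, and the paper's exact definition may differ, for instance in step size or norm.

```python
import numpy as np

def box_project(z):
    """Euclidean projection onto the hypercube [0, 1]^k."""
    return np.clip(z, 0.0, 1.0)

def stationarity_gap(x, y, grad_x, grad_y):
    """Projected-gradient residual for min_x max_y f(x, y) over the hypercube;
    (x, y) is eps-approximately stationary when the returned value is <= eps."""
    gx, gy = grad_x(x, y), grad_y(x, y)
    rx = np.linalg.norm(x - box_project(x - gx))   # descent step for the min player
    ry = np.linalg.norm(y - box_project(y + gy))   # ascent step for the max player
    return float(max(rx, ry))
```

For a bilinear (hence multilinear, quadratic) objective $f(x,y)=x^\top B y$, pass `grad_x = lambda x, y: B @ y` and `grad_y = lambda x, y: B.T @ x`.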

2606.17013 2026-06-16 math.OC cs.LG 交叉投稿

Exploding and vanishing gradients in deep neural networks: the effect of residual connections

深度神经网络中的梯度爆炸和消失:残差连接的影响

Vivek S Borkar

AI总结 利用乘法遍历理论分析深度神经网络中的梯度爆炸与消失现象,并解释残差连接对李雅普诺夫谱的影响。

Comments 10 pages

AI中文摘要

深度神经网络中众所周知的梯度爆炸和消失现象通过乘法遍历理论进行分析。在此背景下,解释了添加残差连接的效果。具体而言,利用Furstenberg和Kifer对李雅普诺夫指数的刻画,对李雅普诺夫谱以及残差连接对其的影响做出了精确陈述。

英文摘要

The well-known phenomenon of exploding and vanishing gradients in deep neural networks is analyzed using multiplicative ergodic theory. The effect of adding a residual connection is explained in this context. Specifically, a characterization of Liapunov exponents due to Furstenberg and Kifer is exploited to make a precise statement about the Liapunov spectrum and the effect of residual connections on it.
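The top Liapunov exponent that governs explosion or vanishing can be estimated numerically by pushing a vector through a long product of random layer Jacobians and renormalizing at every step. The Jacobian samplers below (a scaled random orthogonal matrix, and a residual form $I+W$ with Gaussian $W$) are illustrative assumptions, not the model analyzed in the paper.

```python
import numpy as np

def top_lyapunov(sample_jacobian, n, depth=200, seed=0):
    """Estimate lim (1/L) log ||J_L ... J_1 v|| by repeated renormalization."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    total = 0.0
    for _ in range(depth):
        v = sample_jacobian(rng) @ v
        norm = np.linalg.norm(v)
        total += np.log(norm)                  # per-layer log growth
        v /= norm                              # renormalize to avoid over/underflow
    return total / depth

def scaled_orthogonal(n, scale):
    """Sampler for scale * Q with Q a random orthogonal matrix."""
    def sample(rng):
        q, _ = np.linalg.qr(rng.standard_normal((n, n)))
        return scale * q
    return sample

def residual_gaussian(n, sigma):
    """Sampler for a residual-layer Jacobian I + W with W_ij ~ N(0, sigma^2 / n)."""
    def sample(rng):
        return np.eye(n) + rng.normal(0.0, sigma / np.sqrt(n), size=(n, n))
    return sample
```

A negative estimate signals vanishing gradients and a positive one exploding gradients. For `scaled_orthogonal(n, 0.5)` every layer shrinks norms by exactly one half, so the estimate is $\log 0.5$; `top_lyapunov(residual_gaussian(n, 0.5), n)` can then be used to see how an identity skip path shifts the exponent.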

2409.08066 2026-06-16 cs.LG math.OC 版本更新

Self-Supervised Learning of Iterative Solvers for Constrained Optimization

约束优化的迭代求解器的自监督学习

Lukas Lüken, Sergio Lucia

发表机构 * Chair of Process Automation Systems, TU Dortmund University(多特蒙德工业大学过程自动化系统教席)

AI总结 提出一种基于学习的迭代求解器,通过神经网络预测初始解并用学习型迭代器精炼,利用KKT条件设计损失函数实现自监督训练,在非凸问题上比IPOPT快一个数量级且精度更高。

Comments This work has been published in Results in Control and Optimization. Update to accepted manuscript

Journal ref
Results in Control and Optimization, Volume 23, 2026
AI中文摘要

参数化优化问题的实时求解对于在严格实时约束下要求高精度的应用(如模型预测控制)至关重要。为此,本文提出了一种基于学习的约束优化迭代求解器,包括一个神经网络预测器,用于生成初始原始-对偶解估计,随后是一个学习型迭代求解器,用于精炼这些估计以达到高精度。我们引入了一种基于Karush-Kuhn-Tucker(KKT)最优性条件的新型损失函数,实现了完全自监督训练,无需预先求解得到的最优解。理论保证确保训练损失函数仅在KKT点处达到最小值。一种凸化过程使得该方法能够应用于非凸问题,同时保留这些保证。在两个非凸案例研究上的实验表明,与最先进的求解器(如IPOPT)相比,速度提升高达一个数量级,同时比竞争的学习方法实现了数量级更高的精度。

英文摘要

The real-time solution of parametric optimization problems is critical for applications that demand high accuracy under tight real-time constraints, such as model predictive control. To this end, this work presents a learning-based iterative solver for constrained optimization, comprising a neural network predictor that generates initial primal-dual solution estimates, followed by a learned iterative solver that refines these estimates to reach high accuracy. We introduce a novel loss function based on Karush-Kuhn-Tucker (KKT) optimality conditions, enabling fully self-supervised training without pre-solved optimizer solutions. Theoretical guarantees ensure that the training loss function attains minima exclusively at KKT points. A convexification procedure enables application to nonconvex problems while preserving these guarantees. Experiments on two nonconvex case studies demonstrate speedups of up to one order of magnitude compared to state-of-the-art solvers such as IPOPT, while achieving orders of magnitude higher accuracy than competing learning-based approaches.
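A loss built from KKT conditions can be sketched as the squared norm of all KKT violations for $\min_x f(x)$ subject to $g(x)\le 0$: stationarity of the Lagrangian, primal feasibility, dual feasibility, and complementarity. It vanishes exactly at KKT points, which is what allows training without pre-solved targets. This residual form is an assumption; the paper's loss, its convexification step, and the learned iterative solver are not reproduced.

```python
import numpy as np

def kkt_residual(x, mu, grad_f, g, jac_g):
    """Squared KKT violation for min f(x) s.t. g(x) <= 0 with multipliers mu."""
    gx = g(x)
    stat = grad_f(x) + jac_g(x).T @ mu   # gradient of the Lagrangian
    primal = np.maximum(gx, 0.0)          # g(x) <= 0
    dual = np.minimum(mu, 0.0)            # mu >= 0
    comp = mu * gx                        # mu_i * g_i(x) = 0
    return float(stat @ stat + primal @ primal + dual @ dual + comp @ comp)
```

For $\min x^2$ subject to $x \ge 1$ (written as $g(x) = 1 - x \le 0$), the KKT pair $x=1$, $\mu=2$ gives zero residual, while $x=0.5$, $\mu=0$ gives 1.25.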

2505.20030 2026-06-16 cs.LG cs.AI nlin.CD physics.comp-ph 版本更新

Multiple Descents in Deep Learning as a Sequence of Order-Chaos Transitions in LSTM Networks

深度学习中的多重下降现象:LSTM网络中的有序-混沌转变序列

Wenbo Wei, Fan Xu, Nicholas Chong Jia Le, Choy Heng Lai, Ling Feng

发表机构 * Department of Physics(物理系) National University of Singapore(新加坡国立大学) Institute of High Performance Computing (IHPC)(高性能计算研究所) A*STAR

AI总结 本文在LSTM网络训练中发现多重下降现象,通过渐近稳定性分析表明性能周期与有序-混沌相变相关,最优训练点位于临界转变点,且首次有序-混沌转变处边缘最宽,利于权重探索。

详情
AI中文摘要

我们在长短期记忆(LSTM)网络训练真实世界任务的过程中观察到一种新颖的“多重下降”现象,即模型过训练后性能会经历多次长期的上行和下行循环。通过对模型进行渐近稳定性分析,我们发现性能周期——由测试数据中的损失函数指示——与模型有序和混沌之间的相变过程密切相关,且局部最优训练步骤始终处于两个阶段之间的临界转变点。更重要的是,模型的最优点通常出现在从有序到混沌的第一次转变处,此时“混沌边缘”的“宽度”往往最宽,从而允许对学习权重配置进行最佳探索。

英文摘要

We observe a novel "multiple-descent" phenomenon during the training of long short-term memory (LSTM) networks, a class of recurrent neural networks, on a real-world task: after the model is overtrained, its performance goes through long cycles of up and down trends multiple times. By carrying out an asymptotic stability analysis of the models, we find that the cycles in performance (as indicated by the loss function on test data) are closely associated with the phase transition process between order and chaos in the model, and that the locally optimal training steps consistently lie at the critical transition point between the two phases. More importantly, the optimal point of the model usually occurs at the first transition from order to chaos, where the "width" of the "edge of chaos" is often the widest, allowing the best exploration of weight configurations for learning.

2505.24275 2026-06-16 cs.LG math.OC stat.ML 版本更新

GradPower: Powering Gradients for Faster Language Model Pre-Training

GradPower:对梯度取幂以加速语言模型预训练

Jinbo Wang, Mingze Wang, Jiaqi Zhang, Wei Wang, Peng Pei, Xunliang Cai, Weinan E, Lei Wu

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 本文提出GradPower,一种轻量级的梯度变换技术,用于加速语言模型预训练。通过元素级符号幂变换,将梯度输入基础优化器,无需修改优化器内部逻辑或超参数,从而在多种架构、参数规模、数据集和学习率调度方案中均取得更低的终端损失。

Comments 24 pages, accepted by ICML 2026

AI中文摘要

我们提出GradPower,一种轻量级的梯度变换技术,用于加速语言模型预训练。给定一个梯度向量$g=(g_i)_i$,GradPower首先应用元素级符号幂变换:$φ_p(g)=({\rm sign}(g_i)|g_i|^p)_{i}$,其中$p>0$为固定值,然后将变换后的梯度输入基础优化器。值得注意的是,GradPower只需单行代码更改,无需修改基础优化器的内部逻辑,包括超参数。当应用于Adam(称为AdamPower)时,GradPower在多种架构(LLaMA、Qwen2MoE)、参数规模(66M到2B)、数据集(C4、OpenWebText)和学习率调度方案(余弦、warmup-stable-decay)中均一致取得更低的终端损失。最显著的收益出现在训练现代混合专家模型时使用warmup-stable-decay调度方案。GradPower还无缝集成到其他最先进的优化器中,如Muon,从而进一步提升性能。最后,我们提供了理论分析,揭示了GradPower的内在机制,并突显了梯度噪声的影响。

英文摘要

We propose GradPower, a lightweight gradient-transformation technique for accelerating language model pre-training. Given a gradient vector $g=(g_i)_i$, GradPower first applies the elementwise sign-power transformation: $φ_p(g)=({\rm sign}(g_i)|g_i|^p)_{i}$ for a fixed $p>0$, and then feeds the transformed gradient into a base optimizer. Notably, GradPower requires only a single-line code change and no modifications to the base optimizer's internal logic, including the hyperparameters. When applied to Adam (termed AdamPower), GradPower consistently achieves lower terminal loss across diverse architectures (LLaMA, Qwen2MoE), parameter scales (66M to 2B), datasets (C4, OpenWebText), and learning-rate schedules (cosine, warmup-stable-decay). The most pronounced gains are observed when training modern mixture-of-experts models with warmup-stable-decay schedules. GradPower also integrates seamlessly with other state-of-the-art optimizers, such as Muon, yielding further improvements. Finally, we provide theoretical analyses that reveal the underlying mechanism of GradPower and highlight the influence of gradient noise.
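The transformation is easy to express in code: apply $\varphi_p$ to the raw gradient and hand the result to an unchanged optimizer. Below, a plain bias-corrected Adam written in NumPy plays the base optimizer. The default exponent `p=1.2` is only a placeholder, since the abstract does not give a recommended value.

```python
import numpy as np

def sign_power(g, p):
    """Elementwise sign-power transform phi_p(g) = sign(g) * |g|^p."""
    return np.sign(g) * np.abs(g) ** p

class AdamPower:
    """Plain bias-corrected Adam (no weight decay) fed with phi_p(g) instead of g."""

    def __init__(self, lr=1e-3, p=1.2, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.p, self.b1, self.b2, self.eps = lr, p, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, params, grad):
        g = sign_power(grad, self.p)          # the single added line
        if self.m is None:
            self.m, self.v = np.zeros_like(g), np.zeros_like(g)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g * g
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

On its first step bias-corrected Adam moves each coordinate by about `lr` against the gradient sign, whatever `p` is; the transform starts to matter once the moment estimates average several gradients of different magnitudes.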

2508.11522 2026-06-16 cs.LG hep-th 版本更新

Finite-Width Neural Tangent Kernels from Feynman Diagrams

从费曼图看有限宽神经正切核

Max Guillen, Philipp Misof, Jan E. Gerken

发表机构 * Department of Mathematical Sciences, Chalmers University of Technology(查尔姆斯理工大学数学科学系) the University of Gothenburg(哥德堡大学)

AI总结 引入费曼图计算有限宽神经正切核修正,简化代数运算并推导递归关系,证明ReLU等尺度不变非线性在Gram矩阵对角线上无有限宽修正。

Comments Published at ICML 2026; 12 pages + appendices

AI中文摘要

神经正切核(NTK)是分析深度非线性神经网络的有力工具。在无限宽极限下,大多数常见架构的NTK易于计算,从而完全解析控制训练动力学。然而,在无限宽下,训练的重要特性如NTK演化或特征学习缺失。不过,通过计算无限宽高斯统计的修正,可以包含有限宽效应。我们引入费曼图来计算NTK统计的有限宽修正。这些图极大地简化了必要的代数运算,并使得能够计算涉及预激活、NTK和某些高阶导数张量(dNTK和ddNTK)的任意统计量的逐层递归关系,这些张量对于预测主导阶的训练动力学是必需的。我们通过将深度网络的稳定性结果从预激活扩展到NTK,并证明对于尺度不变非线性(如ReLU),在NTK的Gram矩阵对角线上不存在有限宽修正,展示了我们框架的可行性。我们数值实现了计算任意输入一阶修正所需的完整方程组,并证明结果遵循宽度$n\gtrsim 20$的采样神经网络的统计量。

英文摘要

Neural tangent kernels (NTKs) are a powerful tool for analyzing deep, non-linear neural networks. In the infinite-width limit, NTKs can easily be computed for most common architectures, yielding full analytic control over the training dynamics. However, at infinite width, important properties of training such as NTK evolution or feature learning are absent. Nevertheless, finite width effects can be included by computing corrections to the Gaussian statistics at infinite width. We introduce Feynman diagrams for computing finite-width corrections to NTK statistics. These dramatically simplify the necessary algebraic manipulations and enable the computation of layer-wise recursion relations for arbitrary statistics involving preactivations, NTKs and certain higher-derivative tensors (dNTK and ddNTK) required to predict the training dynamics at leading order. We demonstrate the feasibility of our framework by extending stability results for deep networks from preactivations to NTKs and proving the absence of finite-width corrections for scale-invariant nonlinearities such as ReLU on the diagonal of the Gram matrix of the NTK. We numerically implement the complete set of equations necessary to compute the first-order corrections for arbitrary inputs and demonstrate that the results follow the statistics of sampled neural networks for widths $n\gtrsim 20$.
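For reference, the random object whose statistics the diagrams organize is the empirical NTK. Below is a sketch for a two-layer ReLU network $f(x)=\tfrac{1}{\sqrt{n}}\sum_j v_j\,\mathrm{relu}(w_j^\top x)$ with standard Gaussian initialization of `W` and `v`. This toy parameterization is an assumption, and the Feynman-diagram expansion itself is not reproduced.

```python
import numpy as np

def empirical_ntk(x1, x2, W, v):
    """Empirical NTK Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)> for
    f(x) = v . relu(W x) / sqrt(n), with n = len(v) hidden units."""
    n = len(v)
    h1, h2 = W @ x1, W @ x2
    # d f / d v_j = relu(w_j . x) / sqrt(n)
    term_v = np.maximum(h1, 0.0) @ np.maximum(h2, 0.0)
    # d f / d w_j = v_j * 1[w_j . x > 0] * x / sqrt(n)
    term_w = np.sum(v**2 * (h1 > 0) * (h2 > 0)) * (x1 @ x2)
    return float((term_v + term_w) / n)
```

For this toy network the diagonal entry $\Theta(x,x)$ has expectation exactly $\|x\|^2$ at every width, and its fluctuations shrink like $1/\sqrt{n}$.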

2510.01175 2026-06-16 cs.LG eess.SP math.OC stat.ML 版本更新

On the Benefits of Weight Normalization for Overparameterized Matrix Sensing

关于过参数化矩阵感知中权重归一化的优势

Yudong Wei, Liang Zhang, Bingcong Li, Niao He

发表机构 * ETH Zurich(苏黎世联邦理工学院)

AI总结 本文证明在过参数化矩阵感知中,权重归一化结合黎曼优化可实现线性收敛,相比未使用归一化的方法获得指数级加速,且过参数化程度越高,迭代和样本复杂度多项式级降低。

AI中文摘要

尽管归一化技术在深度学习中广泛应用,但其理论理解仍然相对有限。在这项工作中,我们建立了(广义)权重归一化(WN)应用于过参数化矩阵感知问题的优势。我们证明,使用黎曼优化的WN实现了线性收敛,相比未使用WN的标准方法获得了指数级加速。我们的分析进一步表明,随着过参数化程度的增加,迭代和样本复杂度都多项式级地改善。据我们所知,这项工作首次描述了WN如何利用过参数化在矩阵感知中实现更快的收敛。

英文摘要

While normalization techniques are widely used in deep learning, their theoretical understanding remains relatively limited. In this work, we establish the benefits of (generalized) weight normalization (WN) applied to the overparameterized matrix sensing problem. We prove that WN with Riemannian optimization achieves linear convergence, yielding an exponential speedup over standard methods that do not use WN. Our analysis further demonstrates that both iteration and sample complexity improve polynomially as the level of overparameterization increases. To the best of our knowledge, this work provides the first characterization of how WN leverages overparameterization for faster convergence in matrix sensing.

2512.02494 2026-06-16 cs.LG 版本更新

A Fully First-Order Layer for Differentiable Optimization

用于可微优化的全一阶层

Zihao Zhao, Kai-Chia Mo, Shing-Hei Ho, Brandon Amos, Kai Wang

发表机构 * University of California, Berkeley(加州大学伯克利分校) DeepMind(深度思维)

AI总结 提出一种仅使用一阶信息计算梯度的算法,通过将可微优化重写为双层优化并引入活动集拉格朗日超梯度方法,避免Hessian计算,实现高效近似。

Comments ICML 2026

AI中文摘要

可微优化层使得学习系统能够通过求解嵌入的优化问题来做出决策。然而,通过隐式微分计算梯度需要求解一个包含Hessian项的线性系统,这既计算密集又内存密集。为了解决这一挑战,我们提出了一种仅使用一阶信息计算梯度的新算法。关键洞察是将可微优化重写为双层优化问题,并利用双层方法的最新进展。具体来说,我们引入了一个活动集拉格朗日超梯度方法,避免了Hessian计算,并提供了有限时间、非渐近的近似保证。我们证明,仅使用一阶信息即可在$\tilde{O}(1)$时间内计算出近似超梯度,从而使得约束双层优化的总体复杂度为$\tilde{O}(\delta^{-1}\epsilon^{-3})$,这与非光滑非凸优化的最佳已知速率相匹配。此外,我们发布了一个开源Python库,可以轻松地从现有求解器进行适配。源代码可在 https://github.com/guaguakai/FFOLayer 获取。

英文摘要

Differentiable optimization layers enable learning systems to make decisions by solving embedded optimization problems. However, computing gradients via implicit differentiation requires solving a linear system with Hessian terms, which is both compute- and memory-intensive. To address this challenge, we propose a novel algorithm that computes the gradient using only first-order information. The key insight is to rewrite the differentiable optimization as a bilevel optimization problem and leverage recent advances in bilevel methods. Specifically, we introduce an active-set Lagrangian hypergradient oracle that avoids Hessian evaluations and provides finite-time, non-asymptotic approximation guarantees. We show that an approximate hypergradient can be computed using only first-order information in $\tilde{O}(1)$ time, leading to an overall complexity of $\tilde{O}(δ^{-1}ε^{-3})$ for constrained bilevel optimization, which matches the best known rate for non-smooth non-convex optimization. Furthermore, we release an open-source Python library that can be easily adapted from existing solvers. The source code is available at https://github.com/guaguakai/FFOLayer.
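The first-order idea can be illustrated with the penalty-based hypergradient from the bilevel literature, in the unconstrained case only. With $y^*(x)=\arg\min_y g(x,y)$ and $y_\lambda=\arg\min_y f(x,y)+\lambda g(x,y)$, the quantity $\nabla_x f(x,y_\lambda)+\lambda\big(\nabla_x g(x,y_\lambda)-\nabla_x g(x,y^*)\big)$ approximates $\tfrac{d}{dx}f(x,y^*(x))$ using only first derivatives. The paper's active-set Lagrangian oracle for constrained inner problems is more involved; the solver choice (plain gradient descent) and all names below are assumptions.

```python
import numpy as np

def gradient_descent(grad, y0, step, iters=100):
    """Plain gradient descent used as the (first-order) inner solver."""
    y = np.array(y0, dtype=float)
    for _ in range(iters):
        y = y - step * grad(y)
    return y

def penalty_hypergradient(x, y0, fx, fy, gx, gy, lam, step_g, step_pen, iters=100):
    """First-order estimate of dF/dx for F(x) = f(x, y*(x)), y*(x) = argmin_y g(x, y):
        y_star ~ argmin_y g(x, y)
        y_lam  ~ argmin_y f(x, y) + lam * g(x, y)
        dF/dx  ~ fx(x, y_lam) + lam * (gx(x, y_lam) - gx(x, y_star))"""
    y_star = gradient_descent(lambda y: gy(x, y), y0, step_g, iters)
    y_lam = gradient_descent(lambda y: fy(x, y) + lam * gy(x, y), y0, step_pen, iters)
    return fx(x, y_lam) + lam * (gx(x, y_lam) - gx(x, y_star))
```

With $f=\tfrac12 y^2$ and $g=\tfrac12 (y-x)^2$ we have $F(x)=x^2/2$, so the exact hypergradient at $x=2$ is 2; the estimate equals $2\lambda/(1+\lambda)$, with a bias that shrinks like $1/\lambda$.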

2602.12471 2026-06-16 cs.LG 版本更新

Tight Bounds for Logistic Regression with Large Stepsize Gradient Descent in Low Dimension

低维大步长梯度下降逻辑回归的紧界

Michael Crawshaw, Mingrui Liu

发表机构 * George Mason University(乔治·梅森大学)

AI总结 针对可分离数据二分类的逻辑回归,研究大步长梯度下降的收敛速率,通过精细分析正交子空间振荡动力学,给出过渡时间的紧界,得到改进的损失上界。

Comments COLT 2026 camera ready

AI中文摘要

我们考虑用梯度下降最小化逻辑损失来训练线性模型进行可分离数据的二分类的优化问题。在$T$次迭代的预算下,最近研究表明通过选择大步长$\eta = \Theta(\gamma^2 T)$(其中$\gamma$是数据集的间隔)可以实现加速的$1/T^2$速率,尽管损失会出现非单调性。在本文中,我们针对数据是二维的情况提供了梯度下降在该问题上的更紧分析:我们证明,只要$T \geq \Omega(n/\gamma + 1/\gamma^2)$(其中$n$是数据集大小),具有足够大学习率$\eta$的GD就能找到损失小于$\mathcal{O}(1/(\eta \gamma^2 T))$的点。我们的改进速率来自于对GD从非稳定(非单调损失)过渡到稳定(单调损失)所需时间$\tau$的更紧界,这是通过对最大间隔分类器正交子空间中GD的振荡动力学进行精细分析得到的。我们还给出了$\tau$的下界,与上界匹配至对数因子,表明我们的分析是紧的。

英文摘要

We consider the optimization problem of minimizing the logistic loss with gradient descent to train a linear model for binary classification with separable data. With a budget of $T$ iterations, it was recently shown that an accelerated $1/T^2$ rate is possible by choosing a large stepsize $η= Θ(γ^2 T)$ (where $γ$ is the dataset's margin) despite the resulting non-monotonicity of the loss. In this paper, we provide a tighter analysis of gradient descent for this problem when the data is two-dimensional: we show that GD with a sufficiently large learning rate $η$ finds a point with loss smaller than $\mathcal{O}(1/(ηγ^2 T))$, as long as $T \geq Ω(n/γ+ 1/γ^2)$, where $n$ is the dataset size. Our improved rate comes from a tighter bound on the time $τ$ that it takes for GD to transition from unstable (non-monotonic loss) to stable (monotonic loss), via a fine-grained analysis of the oscillatory dynamics of GD in the subspace orthogonal to the max-margin classifier. We also provide a lower bound of $τ$ matching our upper bound up to logarithmic factors, showing that our analysis is tight.
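A minimal sketch of the setting being analyzed: constant-stepsize, full-batch gradient descent on the average logistic loss for a hypothetical two-dimensional separable dataset, recording the loss so one can check whether it decreases monotonically. The dataset, the stepsize `eta = 5.0`, and the iteration budget are illustrative choices; the code does not reproduce the paper's bounds.

```python
import math


def sigmoid_neg(m):
    """Numerically stable sigmoid(-m) = 1 / (1 + exp(m))."""
    if m >= 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))


def loss_and_grad(w, data):
    """Average logistic loss and its gradient; each z = y * x has the label folded in."""
    n = len(data)
    loss, g0, g1 = 0.0, 0.0, 0.0
    for z0, z1 in data:
        m = w[0] * z0 + w[1] * z1                              # margin of this example
        loss += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))   # stable log(1 + e^{-m})
        s = sigmoid_neg(m)
        g0 -= s * z0
        g1 -= s * z1
    return loss / n, (g0 / n, g1 / n)


def run_gd(data, eta, steps):
    """Constant-stepsize full-batch GD from w = 0; returns the final w and the loss trace."""
    w = (0.0, 0.0)
    losses = []
    for _ in range(steps):
        loss, (g0, g1) = loss_and_grad(w, data)
        losses.append(loss)
        w = (w[0] - eta * g0, w[1] - eta * g1)
    return w, losses


# Hypothetical separable 2-D data (margin 1 along the first axis).
DATA = [(1.0, 1.0), (1.0, -1.0), (1.0, 0.2)]
w, losses = run_gd(DATA, eta=5.0, steps=1000)
increases = sum(later > earlier for earlier, later in zip(losses, losses[1:]))
print(f"final loss {losses[-1]:.2e}, loss increased on {increases} steps")
```

Counting loss increases is a quick way to see whether a run is in the unstable (non-monotone) phase; pushing `eta` higher or using less symmetric data is where such increases can appear.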

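2602.14154 2026-06-16 cs.LG math.OC Updated

A Penalty Approach for Differentiation Through Black-Box Quadratic Programming Solvers

Yuxuan Linghu, Zhiyuan Liu, Qi Deng

AI summary Proposes dXPP, a penalty-based differentiation framework that decouples QP solving from differentiation: any black-box solver handles the forward pass, and the backward pass implicitly differentiates a smooth approximate penalty problem, substantially improving computational efficiency and robustness on large-scale problems.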

Comments 16 pages, 4 figures, 5 tables

Abstract

Differentiating through the solution of a quadratic program (QP) is a central problem in differentiable optimization. Most existing approaches differentiate through the Karush--Kuhn--Tucker (KKT) system, but their computational cost and numerical robustness can degrade at scale. To address these limitations, we propose dXPP, a penalty-based differentiation framework that decouples QP solving from differentiation. In the solving step (forward pass), dXPP is solver-agnostic and can leverage any black-box QP solver. In the differentiation step (backward pass), we map the solution to a smooth approximate penalty problem and implicitly differentiate through it, requiring only the solution of a much smaller linear system in the primal variables. This approach bypasses the difficulties inherent in explicit KKT differentiation and significantly improves computational efficiency and robustness. We evaluate dXPP on various tasks, including randomly generated QPs, large-scale sparse projection problems, and a real-world multi-period portfolio optimization task. Empirical results demonstrate that dXPP is competitive with KKT-based differentiation methods and achieves substantial speedups on large-scale problems. Our implementation is open source and available at https://github.com/mmmmmmlinghu/dXPP.
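The penalty idea can be sketched on a tiny inequality-constrained QP. Below, a dual projected-gradient loop stands in for an arbitrary black-box solver in the forward pass; the backward pass forms a quadratic penalty on the constraints active at the returned solution and implicitly differentiates it, which needs only an n-by-n linear system in the primal variables. The quadratic penalty, the weight `rho`, the active-set tolerance, and the function names are assumptions for illustration, not dXPP's exact construction. A KKT-based derivative is computed alongside as a reference.

```python
import numpy as np


def solve_qp_blackbox(Q, q, A, b, steps=2000):
    """Stand-in for any black-box solver of
        min_x 0.5 x^T Q x + q^T x   s.t.   A x <= b   (Q positive definite),
    implemented here as projected gradient ascent on the dual."""
    Q_inv = np.linalg.inv(Q)
    step = 1.0 / np.linalg.norm(A @ Q_inv @ A.T, 2)   # 1 / Lipschitz constant of the dual gradient
    lam = np.zeros(A.shape[0])
    for _ in range(steps):
        x = -Q_inv @ (q + A.T @ lam)
        lam = np.maximum(0.0, lam + step * (A @ x - b))
    return -Q_inv @ (q + A.T @ lam)


def active_rows(A, b, x, tol=1e-8):
    """Constraint rows that are tight (active) at x."""
    return A[A @ x - b >= -tol]


def penalty_vjp_q(Q, A, b, x_star, grad_x, rho=1e6):
    """Approximate dL/dq from dL/dx via the smooth surrogate
        P(x) = 0.5 x^T Q x + q^T x + 0.5 * rho * sum_i max(0, a_i^T x - b_i)^2.
    Stationarity Q x + q + rho A_S^T (A_S x - b_S) = 0 on the active set S gives
    dx/dq = -(Q + rho A_S^T A_S)^{-1}: one n-by-n system in the primal variables."""
    A_s = active_rows(A, b, x_star)
    H = Q + rho * A_s.T @ A_s
    return -np.linalg.solve(H, grad_x)          # H is symmetric, so no transpose is needed


def kkt_vjp_q(Q, A, b, x_star, grad_x):
    """Reference: differentiate the KKT system, treating active constraints as equalities."""
    A_s = active_rows(A, b, x_star)
    n, k = Q.shape[0], A_s.shape[0]
    K = np.block([[Q, A_s.T], [A_s, np.zeros((k, k))]])
    rhs = np.concatenate([grad_x, np.zeros(k)])
    return -np.linalg.solve(K, rhs)[:n]


Q = np.eye(2)
q = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x_star = solve_qp_blackbox(Q, q, A, b)          # projection of (1, 1) onto x1 + x2 <= 1
g = np.array([1.0, 0.0])                         # upstream gradient dL/dx
print(x_star, penalty_vjp_q(Q, A, b, x_star, g), kkt_vjp_q(Q, A, b, x_star, g))
```

With `rho = 1e6` the two vector-Jacobian products agree to within about 1e-6. The penalty result approaches the KKT one as `rho` grows, provided strict complementarity holds at the solution.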

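2602.19172 2026-06-16 cs.LG Updated

Online Realizable Regression and Applications for ReLU Networks

Ilan Doron-Arad, Idan Mehalel, Elchanan Mossel

Affiliations * Massachusetts Institute of Technology; The Hebrew University of Jerusalem

AI summary Studies realizable online regression in the adversarial model under losses satisfying an approximate triangle inequality; proposes an entropy-potential method that upper-bounds the scaled Littlestone dimension via covering numbers, and applies it to Lipschitz regression and ReLU networks, revealing a separation between regression and classification.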

Abstract

Realizable online regression can behave very differently from online classification. Even without any margin or stochastic assumptions, realizability may enforce horizon-free (finite) cumulative loss under metric-like losses, even when the analogous classification problem has an infinite mistake bound. We study realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics). Recent work of Attias et al. shows that the minimax realizable cumulative loss is characterized by the scaled Littlestone/online dimension $\mathbb{D}_{\mathrm{onl}}$, but this quantity can be difficult to analyze. Our main technical contribution is a generic potential method that upper bounds $\mathbb{D}_{\mathrm{onl}}$ by a concrete Dudley-type entropy integral that depends only on covering numbers of the hypothesis class under the induced sup pseudo-metric. We define an \emph{entropy potential} $Φ(\mathcal{H})=\int_{0}^{diam(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$, where $N(\mathcal{H},\varepsilon)$ is the $\varepsilon$-covering number of $\mathcal{H}$, and show that for every $c$-approximate pseudo-metric loss, $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,Φ(\mathcal{H})$. In particular, polynomial metric entropy implies $Φ(\mathcal{H})<\infty$ and hence a horizon-free realizable cumulative-loss bound with transparent dependence on effective dimension. We illustrate the method on two families. We prove a sharp $q$-vs.-$d$ dichotomy for realizable online learning (finite and efficiently achievable $Θ_{d,q}(L^d)$ total loss for $L$-Lipschitz regression iff $q>d$, otherwise infinite), and for bounded-norm $k$-ReLU networks separate regression (finite loss, even $\widetilde O(k^2)$, and $O(1)$ for one ReLU) from classification (impossible already for $k=2,d=1$).
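As a worked check that polynomial growth of covering numbers keeps the entropy potential finite, assume for illustration that $N(\mathcal{H},\varepsilon)\le (D/\varepsilon)^{d}$ for $0<\varepsilon\le D$, where $D=\mathrm{diam}(\mathcal{H})$, $d$ plays the role of an effective dimension, and $\log$ is the natural logarithm. Substituting $u=\varepsilon/D$:

```latex
\Phi(\mathcal{H})
  = \int_{0}^{D} \log N(\mathcal{H},\varepsilon)\,d\varepsilon
  \le d\int_{0}^{D} \log\frac{D}{\varepsilon}\,d\varepsilon
  = dD\int_{0}^{1} \log\frac{1}{u}\,du
  = dD,
\qquad\text{so}\qquad
\mathbb{D}_{\mathrm{onl}}(\mathcal{H}) \le O(c)\,dD .
```

The bound is linear in $d$. By contrast, if $\log N(\mathcal{H},\varepsilon)$ grew like $\varepsilon^{-p}$, the same integral would be finite only for $p<1$.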

2603.09923 2026-06-16 cs.LG cs.NA math.NA math.OC Updated

OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality

Ganzhao Yuan

Affiliations * Shenzhen University of Advanced Technology (SUAT)

AI summary Proposes OptEMA, an adaptive optimizer whose closed-loop Corrected AdaGrad-Norm coefficient schedule automatically attains the nearly optimal deterministic rate in the noise-free regime without hyperparameter retuning.

Abstract

Exponential moving averages (EMAs) are a central component of widely used adaptive optimizers such as Adam. However, existing analyses of Adam-style methods often yield suboptimal guarantees in the zero-noise regime, rely on open-loop parameter schedules, or require prior knowledge of smoothness constants. Motivated by these limitations, we introduce OptEMA and analyze two complementary variants: OptEMA-M, which applies an adaptive, decreasing EMA coefficient to the first moment with a fixed second-moment decay, and OptEMA-V, which swaps these roles. At the heart of these variants is a Corrected AdaGrad-Norm coefficient schedule. This formulation renders OptEMA algorithmically closed-loop and Lipschitz-free, meaning its effective stepsizes are trajectory-dependent and require no parameterization via the Lipschitz constant. Under lower-boundedness, unbiasedness, bounded variance, average smoothness, and a bounded stochastic-gradient condition used to control the adaptive normalizers, we prove that both variants achieve the unified noise-adaptive rate $\tilde{\mathcal{O}} \left(T^{-1/2}+σ^{1/2}T^{-1/4}\right)$ for the averaged gradient norm. In the zero-noise regime, these bounds automatically reduce to the nearly optimal deterministic rate $\widetilde{\mathcal{O}}(T^{-1/2})$ without manual hyperparameter retuning.
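To illustrate what "closed-loop and Lipschitz-free" means in practice, here is a sketch of a first-moment EMA update whose step size comes from an AdaGrad-Norm accumulator, i.e. it is computed from observed gradient norms rather than from a Lipschitz constant. This is plain AdaGrad-Norm with a fixed `beta`; the paper's Corrected AdaGrad-Norm schedule for the EMA coefficient is not spelled out in the abstract and is not reproduced here. `eta`, `b0`, `beta`, and the quadratic test function are assumptions.

```python
import math


def adagrad_norm_ema(grad_fn, x0, eta=1.0, b0=1.0, beta=0.9, steps=2000):
    """EMA-of-gradients update with an AdaGrad-Norm step size.

    step_t = eta / sqrt(b0**2 + sum_{s <= t} ||g_s||^2) is computed only from
    observed gradients (closed-loop), so no Lipschitz constant is required.
    """
    x = list(x0)
    m = [0.0] * len(x)
    accumulator = b0 ** 2
    step = eta / b0
    for _ in range(steps):
        g = grad_fn(x)
        accumulator += sum(gi * gi for gi in g)
        step = eta / math.sqrt(accumulator)
        m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]   # first-moment EMA
        x = [xi - step * mi for xi, mi in zip(x, m)]
    return x, step


# Hypothetical test problem f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x, last_step = adagrad_norm_ema(lambda v: list(v), [5.0, -3.0])
print(x, last_step)
```

Nothing in the loop needs the smoothness constant: when gradients are large, the accumulator grows and the step shrinks automatically.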

2605.01702 2026-06-16 cs.LG Updated

Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients

Sejun Park, Yeachan Park, Geonho Hwang

Affiliations * Department of Artificial Intelligence, Korea University; Department of Mathematics and Statistics, Sejong University; Department of Mathematical Sciences, Gwangju Institute of Science and Technology

AI summary Proves that, under floating-point arithmetic, floating-point neural networks whose gradients are computed by automatic differentiation can represent arbitrary floating-point functions and their gradients, for common activation functions such as ReLU and ELU.

Abstract

Theoretical studies show that for any differentiable function on a compact domain, there exists a neural network that approximates both the function values and gradients. However, such a result cannot be used in practice since it assumes real parameters and exact internal operations. In contrast, real implementations only use a finite subset of reals and machine operations with round-off errors. In this work, we investigate whether a similar result holds for neural networks under floating-point arithmetic, when the gradient with respect to the input is computed by the automatic differentiation algorithm $D^\mathtt{AD}$. We first show that given a floating-point function $ϕ$ (e.g., a loss function), arbitrary function values and gradients can be represented by a floating-point network $f$ and $D^\mathtt{AD}(ϕ\circ f)$, respectively. We further extend this result: given $ϕ_1,\dots,ϕ_n$, $D^\mathtt{AD}(ϕ_i\circ f)$ can simultaneously represent arbitrary gradients while $f$ represents the target values, under mild conditions. Our results hold for practical activation functions, e.g., $\mathrm{ReLU}$, $\mathrm{ELU}$, $\mathrm{GeLU}$, $\mathrm{Swish}$, $\mathrm{Sigmoid}$, and $\mathrm{tanh}$.

2606.14095 2026-06-16 cs.LG math.OC math.PR stat.ML Updated

Lyapunov-Based Sample Complexity Analysis for Weakly-Coupled MDPs

Tianhao Wu, Matthew Zurek, Weina Wang, Qiaomin Xie

Affiliations * Department of Industrial and Systems Engineering, University of Wisconsin-Madison; Department of Computer Sciences, University of Wisconsin-Madison; Computer Science Department, Carnegie Mellon University

AI summary For average-reward weakly-coupled MDPs and restless bandits, proposes a Lyapunov-based analysis framework that yields sample and computational complexity bounds polynomial in the number of arms N, and gives the first finite-sample PAC guarantee for fully heterogeneous WCMDPs.

Comments Accepted for presentation at the Conference on Learning Theory (COLT) 2026

Abstract

We study the sample complexity of learning in average-reward weakly-coupled Markov decision processes (WCMDPs) and Restless Bandits (RBs) under a generative model. Naive reduction to a tabular MDP leads to high complexity bounds as the state-action space is exponentially large in the number of arms $N$. By exploiting the weakly coupled structure, we show that near-optimal policies can be learned with sample and computational complexities that are polynomial in $N$. Specifically, we analyze the plug-in approach, which applies an efficient planning algorithm to an empirical model estimated from data. For fully heterogeneous WCMDPs, we establish the first finite-sample PAC guarantee with polynomial complexity and an $O(1/\sqrt{N})$ optimality gap. For homogeneous RBs, we further prove that a smaller optimality gap is achievable under mild structural assumptions. A primary technical contribution of our work is a novel Lyapunov-based analysis framework. Unlike classical approaches that rely on the difficult-to-control bias function, our framework uses an explicitly constructed Lyapunov function along with a drift transfer technique between the true and empirical models. A key step of independent interest in our framework is a fine-grained perturbation analysis for the underlying linear programming (LP) relaxation, which provides a general tool for analyzing LP-based policies and weakly-coupled systems.
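The plug-in approach itself is simple to sketch: query a generative model a fixed number of times per state-action pair, build the empirical transition model, and plan on it. The example below does this for a single hypothetical two-state, two-action arm and uses discounted value iteration as a stand-in planner; the paper's setting has N weakly coupled arms, average reward, and LP-based planning, none of which is modeled here. The sampler, rewards, and sample size are assumptions.

```python
import random


def estimate_model(sample_next, n_states, n_actions, n_samples, rng):
    """Empirical transition model P_hat[s][a][s'] from n_samples generative draws per (s, a)."""
    P_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s][a][sample_next(s, a, rng)] += 1.0 / n_samples
    return P_hat


def greedy_policy(P, r, discount=0.9, iters=500):
    """Discounted value iteration (a stand-in planner); returns the greedy action per state."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states

    def q(s, a):
        return r[s][a] + discount * sum(p * v for p, v in zip(P[s][a], V))

    for _ in range(iters):
        V = [max(q(s, a) for a in range(n_actions)) for s in range(n_states)]
    return [max(range(n_actions), key=lambda a: q(s, a)) for s in range(n_states)]


def sample_next(s, a, rng):
    """Hypothetical generative model: action 1 reaches state 1 w.p. 0.8, action 0 w.p. 0.2."""
    p = 0.8 if a == 1 else 0.2
    return 1 if rng.random() < p else 0


P_true = [[[1.0 - p, p] for p in (0.2, 0.8)] for _ in range(2)]   # indexed [s][a][s']
R = [[0.0, 0.0], [1.0, 1.0]]                                       # reward 1 in state 1
P_hat = estimate_model(sample_next, 2, 2, 5000, random.Random(0))
print(greedy_policy(P_hat, R), greedy_policy(P_true, R))
```

With 5,000 samples per pair the estimated transition probabilities are within a few hundredths of the truth, far smaller than the gap between the two actions, so planning on the empirical model recovers the true optimal policy.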

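2508.03867 2026-06-16 math.AG cs.LG stat.ML Updated

Constraining the outputs of ReLU neural networks

Yulia Alexandr, Guido Montúfar

Affiliations * University of California, Los Angeles; Max Planck Institute for Mathematics in the Sciences

AI summary Introduces algebraic varieties associated with ReLU networks, derives polynomial equations from rank constraints within activation regions to characterize the functions a network can represent, and studies when these varieties attain their expected dimension.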

Comments 33 pages, 4 figures

Abstract

We introduce a class of algebraic varieties naturally associated with ReLU neural networks, arising from the piecewise linear structure of their outputs across activation regions in input space, and the piecewise multilinear structure in parameter space. By analyzing the rank constraints on the network outputs within each activation region, we derive polynomial equations that characterize the functions representable by the network. We further investigate conditions under which these varieties attain their expected dimension, providing insight into the expressive and structural properties of ReLU networks.
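One simple instance of such a rank constraint, under assumed sizes: a one-hidden-layer ReLU network from R^4 to R^5 with width 3 is affine on each activation region, with linear part W2 diag(pattern) W1 of rank at most 3. Every 4x4 minor of that 5x4 matrix must therefore vanish, which gives polynomial equations in its entries. The layer sizes, random weights, and helper names are illustrative; the paper's varieties are more general than this Jacobian example.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
d_in, width, d_out = 4, 3, 5                      # hypothetical sizes
W1, c1 = rng.standard_normal((width, d_in)), rng.standard_normal(width)
W2, c2 = rng.standard_normal((d_out, width)), rng.standard_normal(d_out)


def network(x):
    """One-hidden-layer ReLU network R^4 -> R^5."""
    return W2 @ np.maximum(W1 @ x + c1, 0.0) + c2


def local_linear_map(x):
    """Linear part W2 diag(pattern) W1 of the network on the activation region of x."""
    pattern = (W1 @ x + c1 > 0).astype(float)
    return W2 @ (pattern[:, None] * W1), int(pattern.sum())


x = rng.standard_normal(d_in)
J, n_active = local_linear_map(x)
rank = np.linalg.matrix_rank(J)
# rank(J) <= width = 3 < 4, so every 4x4 minor of the 5x4 matrix J is zero:
# these are polynomial equations in the entries of the local linear map.
largest_minor = max(abs(np.linalg.det(J[list(rows), :]))
                    for rows in itertools.combinations(range(d_out), 4))
print(rank, n_active, largest_minor)
```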

2601.07326 2026-06-16 math.OC cs.LG Updated

Convergence Rate Analysis of the AdamW-style Shampoo: Unifying One-Sided and Two-Sided Preconditioning

Huan Li, Yiming Dong, Zhouchen Lin

AI summary Studies the AdamW-style Shampoo optimizer, unifying one-sided and two-sided preconditioning, and establishes a convergence rate measured in the nuclear norm that, in the ideal case, is analogous to the optimal convergence rate of SGD.

Comments V3: ICML camera-ready. V4 vs. V3: extends the analysis to the more general setting where the exponents of the two preconditioners do not sum to 1/2

Abstract

This paper studies AdamW-style Shampoo, an effective variant of the classical Shampoo that won the external tuning track of the AlgoPerf neural network training competition. Our analysis unifies one-sided and two-sided preconditioning. When the exponents of the two preconditioners sum to $1/2$, we establish the convergence rate $\frac{1}{K}\sum_{k=1}^KE\left[||\nabla f(X_k)||_*\right]\leq O(\frac{\sqrt{m+n}C}{K^{1/4}})$, where $K$ represents the number of iterations, $(m,n)$ denotes the dimensions of the matrix-valued parameters, and $C$ matches the constant appearing in the optimal convergence rate of SGD. Theoretically, the nuclear norm and Frobenius norm satisfy $||\nabla f(X)||_F\leq ||\nabla f(X)||_*\leq \sqrt{\min\{m,n\}}||\nabla f(X)||_F$, which suggests that our convergence rate is analogous to the optimal $\frac{1}{K}\sum_{k=1}^KE\left[||\nabla f(X_k)||_F\right]\leq O(\frac{C}{K^{1/4}})$ convergence rate of SGD in the ideal case where $||\nabla f(X)||_*= Θ(\sqrt{\min\{m,n\}})||\nabla f(X)||_F$ and $m$ and $n$ are of comparable magnitude. Then, we extend our analysis to settings where the preconditioning exponents do not sum to 1/2, and establish convergence with an explicit but more involved rate.
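A simplified sketch of the two ingredients in this abstract, under stated assumptions: the Shampoo-style direction L^{-a} G R^{-b} with a + b = 1/2 (so two-sided a = b = 1/4 and one-sided a = 1/2, b = 0 are special cases), and the ratio between the nuclear and Frobenius norms. It uses single-step accumulators L = G G^T and R = G^T G with a small `eps`, and omits the EMA, grafting, and AdamW-style details of the actual optimizer.

```python
import numpy as np


def psd_power(M, p, eps=1e-10):
    """(M + eps I)^p for a symmetric PSD matrix, via an eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.maximum(vals, 0.0) + eps
    return (vecs * vals ** p) @ vecs.T


def shampoo_direction(G, L, R, a, b):
    """Preconditioned direction L^{-a} G R^{-b}; a + b = 1/2 covers the two-sided
    (a = b = 1/4) and one-sided (a = 1/2, b = 0) cases."""
    return psd_power(L, -a) @ G @ psd_power(R, -b)


def nuclear_over_frobenius(X):
    """Ratio ||X||_* / ||X||_F, which always lies in [1, sqrt(min(m, n))]."""
    return np.linalg.norm(X, "nuc") / np.linalg.norm(X, "fro")


rng = np.random.default_rng(0)
G = rng.standard_normal((3, 5))              # one matrix-valued gradient
L, R = G @ G.T, G.T @ G                      # single-step accumulators (no EMA)
two_sided = shampoo_direction(G, L, R, 0.25, 0.25)
one_sided = shampoo_direction(G, L, R, 0.5, 0.0)
U, S, Vt = np.linalg.svd(G, full_matrices=False)
print(np.abs(two_sided - U @ Vt).max(), np.abs(one_sided - U @ Vt).max())
print(nuclear_over_frobenius(G), nuclear_over_frobenius(U @ Vt), np.sqrt(3))
```

With these single-step accumulators both variants reduce to the orthogonal factor U V^T of G, whose singular values are all equal; that is exactly the case where the nuclear-to-Frobenius ratio reaches its upper end, sqrt(min{m, n}).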

2605.28860 2026-06-16 cs.LG cs.AI cs.CL cs.CR Updated

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Jeanmely Rojas Nunez, Viraj Sawant, Nathan Allen, Nomgondalai Amgalanbaatar, Yannis Zongo, Vasu Sharma, Maheep Chaudhary

Affiliations * University of California, Berkeley; University of Washington; University of Toronto

AI summary Introduces a differential circuit vulnerability metric to compare how well reinforcement learning and supervised fine-tuning preserve internal computational circuits when fine-tuning LLMs; finds that RL adapts to the task more slowly but preserves more of the base circuits, which may help explain its reduced catastrophic forgetting.

Abstract

Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinforcement learning (RL) retains prior capabilities more effectively than supervised fine-tuning (SFT), attributing this to policy-gradient updates remaining closer to the base policy (Shenfeld et al., 2025). We extend this behavioral account to the mechanistic level and ask whether RL's advantage is mirrored by stronger preservation of internal computational circuits. We introduce differential circuit vulnerability, a head-level measure of how much a circuit degrades under fine-tuning, and use it to compare RL and SFT on Qwen2.5-3B-Instruct adapted to scientific question-answering. We find a clear mechanistic trade-off: SFT adapts more rapidly to the target task but produces substantially greater circuit disruption and forgetting of prior capabilities, whereas RL preserves a larger fraction of the base circuit at the cost of slower task adaptation. These findings suggest that circuit preservation may help explain why RL is more robust to catastrophic forgetting. Our code is available at https://github.com/rl-sft-circuit-research/differential-circuit-vulnerability.

2602.17587 2026-06-16 math.ST cs.LG stat.ML stat.TH Updated

Asymptotically Optimal Sequential Testing with Markovian Data

Alhad Sethi, Kavali Sofia Sagar, Shubhada Agrawal, Debabrota Basu, P. N. Karthik

Affiliations * Indian Institute of Science, Bangalore; Indian Institute of Technology, Hyderabad; Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 – CRIStAL

AI summary For data generated by an ergodic finite-state Markov chain, proposes an asymptotically optimal sequential hypothesis test whose expected stopping time asymptotically matches an instance-dependent lower bound, with applications to detecting model misspecification in Markov chain Monte Carlo and to testing structural properties of Markov decision processes.

Comments ICML 2026

Abstract

We study one-sided and $α$-correct sequential hypothesis testing for data generated by an ergodic, finite-state Markov chain. The null hypothesis is that the unknown transition matrix belongs to a prescribed set $P$ of stochastic matrices, and the alternative corresponds to a disjoint set $Q$. We establish a non-asymptotic instance-dependent lower bound on the expected stopping time of any valid sequential test under the alternative, which is asymptotically tight. Our novel analysis improves the existing lower bounds, which are either asymptotic or provably sub-optimal in this setting. Our lower bound incorporates both the stationary distribution and the transition structure induced by the unknown Markov chain. We further propose an optimal test whose expected stopping time matches this lower bound asymptotically as $α\to 0$. We illustrate the usefulness of our framework through applications to sequential detection of model misspecification in Markov Chain Monte Carlo and to testing structural properties, such as the linearity of transition dynamics, in Markov decision processes. Our findings yield a sharp and general characterization of optimal sequential testing procedures under Markovian dependence.
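For intuition about how the stationary distribution and the transition structure enter the stopping time, the sketch below runs a simple-vs-simple sequential likelihood-ratio test on a two-state chain: it rejects the null transition matrix P once the likelihood ratio in favor of Q exceeds 1/alpha. This is a special case with singleton hypothesis sets, not the paper's optimal test for composite sets; the matrices, `alpha`, and helper names are assumptions.

```python
import math
import random


def kl_rate(P, Q, pi):
    """sum_s pi(s) KL(Q(s,.) || P(s,.)): expected log-likelihood-ratio gain per step under Q."""
    return sum(pi[s] * sum(qv * math.log(qv / pv) for qv, pv in zip(Q[s], P[s]) if qv > 0)
               for s in range(len(Q)))


def sequential_test(chain, P, Q, alpha):
    """Reject H0 'transitions follow P' once the Q-vs-P likelihood ratio exceeds 1/alpha.

    Under H0 the ratio is a nonnegative martingale with mean 1, so by Ville's
    inequality the probability of ever rejecting is at most alpha."""
    threshold, log_lr = math.log(1.0 / alpha), 0.0
    for t, (s, s_next) in enumerate(zip(chain, chain[1:]), start=1):
        log_lr += math.log(Q[s][s_next] / P[s][s_next])
        if log_lr >= threshold:
            return t          # stopping time, counted in transitions
    return None               # no rejection on the observed data


def simulate(M, s0, n_steps, rng):
    """Sample a trajectory of a two-state chain with transition matrix M."""
    chain = [s0]
    for _ in range(n_steps):
        chain.append(0 if rng.random() < M[chain[-1]][0] else 1)
    return chain


P = [[0.9, 0.1], [0.1, 0.9]]        # null: "sticky" chain
Q = [[0.5, 0.5], [0.5, 0.5]]        # alternative: memoryless coin flips
chain = simulate(Q, 0, 5000, random.Random(0))
tau = sequential_test(chain, P, Q, alpha=1e-3)
print(tau, math.log(1e3) / kl_rate(P, Q, [0.5, 0.5]))   # observed vs. heuristic E[tau]
```

The Wald-style heuristic log(1/alpha) divided by the per-step information rate (about 13.5 transitions here) is the kind of quantity an instance-dependent lower bound formalizes; the rate weights each row's KL divergence by the stationary distribution under Q.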

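2605.03289 2026-06-16 stat.ML cs.LG math.ST stat.TH Updated

Imbalanced Classification under Capacity Constraints

Daniel Fraiman, Ricardo Fraiman

Affiliations * Departamento de Matemática y Ciencias, Universidad de San Andrés; CONICET Argentina; PEDECIBA Matemática Uruguay

AI summary Proposes a formal classification framework for minority-class detection under capacity constraints, shows that the optimal classifier is equivalent to a Bayes classifier with reweighted prior probabilities, and introduces a capacity-adjusted performance metric; experiments show gains over classical methods and SMOTE.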

Abstract

Detecting observations from a minority class under severe class imbalance is a central challenge in applications such as fraud detection, medical screening, and industrial quality control. In these settings, each positive prediction triggers a costly follow-up action (e.g., an MRI scan or a transaction audit) whose execution is subject to real operational constraints. This paper proposes a formal classification framework under capacity constraints: given a user-defined upper bound $b$ on the proportion of observations that can be labeled as belonging to the minority class, the goal is to find the classifier that maximizes sensitivity on that class. We characterize the optimal classifier under this constraint and establish its equivalence with the classical Bayes classifier under a reweighting of the prior probabilities. We also introduce a capacity-adjusted performance metric $M$ that accounts for the effective detection rate when the capacity constraint is binding. The framework is implemented on top of standard learning methods (k-NN, SVM, random forests, and neural networks), and statistical consistency is established for each. We further show that these methods reduce to post-hoc thresholding when no hyperparameters are oriented toward the capacity-constrained objective, and introduce a capacity-aware support vector machine that exploits the constraint during training and achieves the strongest empirical performance. Experiments on the Taiwanese credit card default dataset confirm that capacity-constrained classifiers substantially outperform both classical approaches and SMOTE in highly imbalanced regimes. The framework extends naturally to multiclass settings and online environments.
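A minimal sketch of a capacity-constrained decision rule, assuming each observation already has a score such as an estimated probability of belonging to the minority class: flag the floor(b * n) highest-scoring observations and measure sensitivity. Flagging the top-b fraction by posterior is a threshold rule on the posterior, which is the form a Bayes classifier with reweighted priors takes; the paper's capacity-aware SVM and its metric M are not reproduced. Scores, labels, and function names are hypothetical.

```python
import math


def capacity_constrained_flags(scores, b):
    """Flag the floor(b * n) highest-scoring observations (the capacity limit)."""
    n = len(scores)
    k = math.floor(b * n + 1e-9)            # guard against float round-off, e.g. 0.29 * 100
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    flags = [False] * n
    for i in top:
        flags[i] = True
    return flags


def sensitivity(flags, labels):
    """Fraction of true minority-class cases (label 1) that were flagged."""
    positives = sum(labels)
    hits = sum(1 for f, y in zip(flags, labels) if f and y == 1)
    return hits / positives if positives else 0.0


scores = [0.95, 0.10, 0.80, 0.40, 0.05, 0.70, 0.20, 0.15, 0.30, 0.60]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
flags = capacity_constrained_flags(scores, b=0.2)
print(sum(flags), sensitivity(flags, labels))   # 2 flagged, sensitivity 2/3
```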

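2605.13092 2026-06-16 stat.ML cs.LG stat.ME Updated

Adaptive Kernel Density Estimation with Pre-training

Ruitong Zhang, Ke Deng

Affiliations * Department of Statistics and Data Science, Tsinghua University

AI summary Uses pre-training to make adaptive kernel density estimation efficient in high dimensions: a neural network recommends a suitable location-adaptive kernel for each sample point, and experiments show clear gains when the target distribution is close to the pre-training distribution family.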

Abstract

Density estimation in high-dimensional settings is an important and challenging statistical problem. Traditional methods based on kernel smoothing are inefficient in high dimensions because appropriate location-adaptive kernels are difficult to specify. In this work, we introduce pre-training, a key idea behind many cutting-edge AI technologies, to non-parametric density estimation. By building a pre-trained neural network that recommends an appropriate location-adaptive kernel for each sample point, we achieve efficient density estimation with adaptive kernels in high dimensions. A wide range of numerical experiments show that this strategy is highly effective at improving density-estimation accuracy when the target distribution is close to the distribution family used for pre-training. When the target distribution differs substantially from the pre-training distribution family, the benefit of the proposed pre-training strategy may be diluted, but it can be restored by an additional fine-tuning procedure.
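The location-adaptive kernels being recommended can be illustrated with a classical stand-in: a sample-point adaptive Gaussian KDE whose bandwidth at each sample is its distance to the k-th nearest neighbor. In the paper a pre-trained network supplies the kernel; here the k-NN rule, the 1-D setting, `k = 10`, the bandwidth floor, and the synthetic two-cluster data are all assumptions chosen to keep the sketch short.

```python
import numpy as np


def knn_bandwidths(samples, k=10, floor=1e-3):
    """Location-adaptive bandwidth per sample: distance to its k-th nearest neighbor (1-D)."""
    dists = np.sort(np.abs(samples[:, None] - samples[None, :]), axis=1)
    return np.maximum(dists[:, k], floor)        # column 0 is the zero self-distance


def adaptive_kde(x, samples, bandwidths):
    """Sample-point adaptive Gaussian KDE: mean of N(x; x_i, h_i^2) over the samples."""
    z = (x[:, None] - samples[None, :]) / bandwidths[None, :]
    kernels = np.exp(-0.5 * z ** 2) / (np.sqrt(2.0 * np.pi) * bandwidths[None, :])
    return kernels.mean(axis=1)


rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.3, 200), rng.normal(2.0, 1.5, 200)])
h = knn_bandwidths(samples)
grid = np.linspace(-20.0, 20.0, 8001)
density = adaptive_kde(grid, samples, h)
print(h.min(), h.max(), density.sum() * (grid[1] - grid[0]))   # last value is close to 1
```

Samples in the tight cluster receive small bandwidths and those in the broad cluster receive larger ones, while the estimate still integrates to one because each kernel is a normalized density.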

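2605.18528 2026-06-16 math.OC cs.LG Updated

Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise

Jiayu Zhang, Tianyi Lin

Affiliations * Department of Industrial Engineering and Operations Research

AI summary For nonconvex stochastic optimization under heavy-tailed noise, establishes a dimension-dependent lower bound for gradient-based methods, shows that a Scion method with the spectral norm attains a matching upper bound, and proposes a transported Scion method that exploits higher-order smoothness.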

Comments Polished writing; Fixed typos and references; 45 pages

Abstract

A growing lesson from neural network optimization is that optimizer design should respect how the model is parametrized. The layerwise input-output structure of neural networks motivates scale-invariant optimizers, such as Muon and Scion, whose updates also support hyperparameter transfer. At the same time, stochastic gradient noise in deep learning is often far from sub-Gaussian and may exhibit heavy tails. These observations have shaped recent algorithmic principles for training neural networks, yet their joint theoretical consequences are underexplored. In particular, it remains unclear what dimension dependence is unavoidable for gradient-based methods when the problem class is defined by an input-output norm and the noise is heavy-tailed, and whether higher-order smoothness can accelerate training. We study these questions through nonconvex smooth stochastic optimization over $\mathbb R^{m\times n}$ equipped with general norms and under $p^\mathrm{th}$-moment heavy-tailed noise, where the goal is to achieve an $ε$-stationary point in the dual norm. Our first contribution is a dimension-dependent lower bound: when $\frac{\max\{m,n\}}{(\min\{m,n\})^2}$ is large enough, any gradient-based method requires $Ω(\min\{m, n\}ε^{-\frac{3p-2}{p-1}})$ oracle calls for the problem class defined by the spectral norm, a common input-output norm. We prove that a scale-invariant Scion method with the spectral norm achieves the matching upper bound of $O(\min\{m, n\}ε^{-\frac{3p-2}{p-1}})$. To exploit higher-order smoothness, we propose a transported Scion method and improve the bound to $O(\min\{m, n\}ε^{-\frac{5p-3}{2p-2}})$ when the Hessian is Lipschitz. Finally, we incorporate practical heuristics into our transported method and evaluate it across multiple architectures and model sizes, demonstrating its flexibility and compatibility with neural network training.
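Scale invariance in Scion-type methods comes from replacing the raw gradient with the output of a linear minimization oracle (LMO) over a norm ball. For the spectral norm, the LMO of a matrix M = U S V^T is -U V^T, which ignores the magnitude of M. The step below is a simplified sketch (batch averaging, EMA momentum, spectral LMO); the learning rate, momentum, and shapes are assumptions, and neither of the paper's Scion variants is reproduced exactly.

```python
import numpy as np


def spectral_lmo(M):
    """Linear minimization oracle over the unit spectral-norm ball:
    argmin_{||X||_2 <= 1} <M, X> = -U V^T, where M = U S V^T."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return -U @ Vt


def scion_like_step(W, momentum, grads, lr=0.02, beta=0.9):
    """One simplified step: batch-average, EMA momentum, then a spectral-norm LMO step."""
    g = grads.mean(axis=0)                    # mini-batch gradient estimate
    momentum = beta * momentum + (1.0 - beta) * g
    return W + lr * spectral_lmo(momentum), momentum, g


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
grads = rng.standard_normal((8, 4, 6))        # 8 hypothetical per-sample gradients
W_new, m, g = scion_like_step(W, np.zeros_like(W), grads)
print(np.linalg.norm(W_new - W, 2))           # spectral norm of the update equals lr
```

Every update has spectral norm exactly `lr`, regardless of how large or heavy-tailed the gradient sample is; only the singular directions of the momentum matter.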

6. Efficient Learning, Compression, and Deployment (45 papers)

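2606.14945 2026-06-16 cs.LG New submission

Remember, Don't Re-read: Stateful ReAct Agents for Token-Efficient Autonomous Experimentation

Faramarz Jabbarvaziri

Affiliations * University of California, Berkeley

AI summary Proposes a stateful ReAct agent built on LangGraph that, through persistent state and a fixed-size conversation window, reduces the per-iteration token cost of autonomous experimentation from O(n) to O(1) (the stateless baseline costs O(n²) in total), cutting token consumption by 90% on hyperparameter tuning and 52% on code optimization.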

Abstract

The autoresearch pattern enables autonomous experimentation by having a large language model (LLM) iteratively modify code to optimize a target metric. Its stateless design, however, reconstructs experimental context from scratch at every iteration, incurring $O(n)$ token cost per iteration and $O(n^{2})$ total. This work reformulates the pattern as a stateful ReAct agent using LangGraph, where typed persistent state carries experimental history across iterations via a tool-calling interface. Two benchmarks are evaluated: hyperparameter tuning (15 iterations, small per-iteration observations) and code performance optimization (40 iterations, large per-iteration observations containing full source code and benchmark results). On hyperparameter tuning, the stateful agent consumes 90% fewer tokens (2,492 vs. 24,465). On code optimization, the stateful agent consumes 52% fewer tokens (627K vs. 1,275K) while achieving comparable optimization quality on both tasks. The token reduction is structural: the stateless agent re-reads the full history at $O(n)$ cost per iteration, while the stateful agent operates within a fixed-size conversation window at $O(1)$ cost. This paper describes the architecture in sufficient detail for practitioners to implement a stateful autoresearch agent for their own workflows.
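The O(n) versus O(1) per-iteration argument can be checked with a toy token-cost model: a stateless agent re-reads every past observation, while a stateful agent sees a bounded window plus a fixed-size state summary. The observation size (100 tokens), window (3), and state size (50 tokens) are hypothetical and are not fitted to the reported measurements.

```python
def stateless_tokens(n_iters, obs_tokens):
    """Iteration i re-reads all i observations so far: O(i) per iteration, O(n^2) total."""
    return sum(i * obs_tokens for i in range(1, n_iters + 1))


def stateful_tokens(n_iters, obs_tokens, window=3, state_tokens=50):
    """Iteration i sees at most `window` recent observations plus a fixed-size
    persistent state summary: O(1) per iteration, O(n) total."""
    return sum(min(i, window) * obs_tokens + state_tokens for i in range(1, n_iters + 1))


for n in (40, 80):
    print(n, stateless_tokens(n, 100), stateful_tokens(n, 100))
# 40 82000 13700
# 80 324000 27700
```

Doubling the number of iterations roughly quadruples the stateless total but only doubles the stateful one.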

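2606.15157 2026-06-16 cs.LG cs.AI New submission

PolyKV: Heterogeneous Retention and Allocation for KV Cache Compression

Chao Fei, Panos Kalnis

Affiliations * King Abdullah University of Science and Technology

AI summary Proposes PolyKV for KV cache compression in long-context LLM inference: layer-level signals select a suitable compression policy for each layer and allocate non-uniform cache budgets, and under a fixed total budget it recovers a substantial part of the performance gap to FullKV.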

Abstract

KV cache compression is essential for reducing the memory cost of long-context large language model inference. Existing approaches, however, typically apply a single compression policy and a uniform cache budget across all transformer layers. This uniform design ignores the fact that different layers can play different roles during prefill and decoding, and may therefore require different eviction strategies and cache capacities. We present PolyKV, a layer-wise KV cache optimization framework whose design space covers both method selection and budget allocation. PolyKV routes each layer to a suitable KV compression policy based on layer-level signals, while assigning non-uniform budgets under a fixed total budget. This formulation enables heterogeneous compositions of existing KV cache methods. Experiments on LLaMA-3.1-8B and Qwen3-8B show that, under the same 512-token average KV budget, PolyKV recovers 54.5% and 25.7%, respectively, of the LongBench performance gap between the strongest single-policy baseline and FullKV. Across a 128-1024-token budget sweep, PolyKV consistently improves over the strongest baseline by 1.7%-6.4%, corresponding to recovering 40.0%-54.5% of the FullKV gap.
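PolyKV's exact allocation rule is not given in the abstract. As a hypothetical illustration of non-uniform budgets under a fixed total, the sketch below splits `avg_budget * n_layers` tokens across layers in proportion to a non-negative per-layer signal, with a per-layer floor and largest-remainder rounding so the total is met exactly. The signal values and the floor of 16 tokens are assumptions.

```python
def allocate_budgets(signals, avg_budget, min_budget=16):
    """Split a fixed total KV budget (avg_budget * n_layers) across layers in
    proportion to non-negative per-layer signals, with a per-layer floor."""
    n = len(signals)
    total = avg_budget * n
    spare = total - min_budget * n
    if spare < 0:
        raise ValueError("per-layer floor exceeds the total budget")
    weight_sum = sum(signals)
    if weight_sum > 0:
        shares = [spare * s / weight_sum for s in signals]
    else:
        shares = [spare / n] * n                      # no signal: fall back to uniform
    budgets = [min_budget + int(x) for x in shares]
    # Largest-remainder rounding so that the budgets sum exactly to `total`.
    remainder = total - sum(budgets)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:remainder]:
        budgets[i] += 1
    return budgets


# Hypothetical per-layer signals for a 4-layer model with a 512-token average budget.
budgets = allocate_budgets([0.1, 0.5, 2.0, 1.4], avg_budget=512)
print(budgets, sum(budgets))
```

Any per-layer score could be plugged in as the signal; the largest-remainder step is what guarantees the fixed total budget, which is the constraint under which PolyKV is compared with uniform-budget baselines.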

2606.15244 2026-06-16 cs.LG New submission

M-CTX: Exact and Scalable Spatial Context Retrieval for Trajectory Analytics

Kun Ma, Qilong Han, Chengjing Song, Jingzheng Yao, Xiao Han, Yuee Zhou, Changmao Wu

Affiliations: Harbin Engineering University; Wuhan University of Technology; University of Chinese Academy of Sciences; Alibaba Group

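AI summary: Proposes M-CTX, which recasts spatial context construction as spatial database queries and uses index-backed operators to achieve a 226x speed-up, removing the systems bottleneck of context construction in trajectory prediction.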

Comments 14 pages, 10 figures, 12 tables. Submitted to ICDE 2027

Abstract

Modern trajectory predictors increasingly condition on external spatial context, such as map geometry, signed distance fields (SDFs), and nearby moving agents. While this context improves prediction quality, constructing it for every training anchor has become a hidden systems bottleneck. In a representative maritime AIS pipeline, spatial context construction requires roughly 17 CPU-days for a 5.48M-anchor corpus, dominating the cost of the downstream predictor. We present M-CTX, an exact and scalable spatial context-retrieval framework for trajectory analytics. M-CTX recasts context construction as an ingest-once, query-many spatial database workload and replaces three brute-force stages (OSM range retrieval, SDF computation, and moving-vessel neighbour lookup) with composable, index-backed operators. Its learned range-index backend, BR-LZ, provides recall-complete MBR-overlap range retrieval and reduces candidate amplification by 1.1x-2.7x relative to global-expansion one-curve baselines. Across four maritime regions, eight baseline systems, synthetic workloads with up to 40M spatial features, and 10^7-record AIS streams, M-CTX reproduces the reference context exactly. On the 5.48M-anchor corpus, it reduces context construction from about 17 CPU-days to 1.8 hours, a measured 226x end-to-end speed-up. An optional storage mode further compresses SDF context by 64x with only a 0.04 m ADE change. These results establish exact spatial context retrieval as a first-class database problem in modern trajectory analytics. Code and datasets are publicly available at https://github.com/mark000071/M-CTX-Traj.
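"Recall-complete MBR-overlap range retrieval" means the index may return extra candidates but never misses a true overlap, so an exact refinement step recovers the brute-force answer. The sketch below shows that property with a simple uniform grid index, which is a stand-in chosen for illustration; it is not BR-LZ, whose learned design the abstract does not detail.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def overlaps(a: Box, b: Box) -> bool:
    """Closed-interval MBR overlap test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def brute_force(features: List[Box], q: Box) -> Set[int]:
    return {i for i, f in enumerate(features) if overlaps(f, q)}


class GridIndex:
    """Ingest-once uniform grid: each feature is registered in every cell its MBR touches."""

    def __init__(self, features: List[Box], cell: float):
        self.features, self.cell = features, cell
        self.cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        for i, f in enumerate(features):
            for key in self._keys(f):
                self.cells[key].append(i)

    def _keys(self, b: Box) -> Iterator[Tuple[int, int]]:
        c = self.cell
        for gx in range(int(b[0] // c), int(b[2] // c) + 1):
            for gy in range(int(b[1] // c), int(b[3] // c) + 1):
                yield (gx, gy)

    def query(self, q: Box) -> Set[int]:
        # Any overlapping feature shares at least one cell with q, so candidates
        # are a superset of the answer; exact refinement then removes false hits.
        cand = {i for key in self._keys(q) for i in self.cells.get(key, [])}
        return {i for i in cand if overlaps(self.features[i], q)}
```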

2606.15553 2026-06-16 cs.LG cs.AI New submission

Distilling Drifting Transformers with Representation Autoencoders

Jiawei Zhang, Mengfei Xia, Gen Li, Yuantao Gu

Affiliations: Tsinghua University; Ant Group; CUHK

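AI summary: Proposes Drift-RAE, which distills pretrained flow models in the latent space of representation autoencoders using the drifting paradigm, addressing anisotropy and large curvature, and reaches 1.77 FID on ImageNet 256 with only 10k steps.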

Abstract

Representation Autoencoders (RAEs) have improved diffusion and flow models by providing a semantically richer latent space, owing to the strongly label-wise clustered DINO features of their pretrained encoders. Yet in the distillation stage, the severe anisotropy and large curvature induced by these rich semantic representations hinder convergence and performance, making trajectory-based distillation unstable. In this work, we argue that the RAE latent space is compatible with distillation via the recently proposed Drifting Models. We first quantitatively study curvature and isotropy statistics across different autoencoders, and theoretically show that the Drifting Model itself is highly likely to fail on extremely scattered spaces such as those of reconstruction-based VAEs. These findings motivate us to apply the drifting paradigm directly to representation autoencoders. Our proposed method, Drift-RAE, distills pretrained flow models in RAE latent spaces using Drifting, together with modifications that improve training stability by theoretically aligning drifting fields with other frameworks. Experimentally, we achieve 1.77 FID on ImageNet 256 using only 10k distillation steps, surpassing state-of-the-art RAE distillation methods and performing comparably to the original Drifting Model without requiring an auxiliary MAE feature extractor. The code will be made publicly available.

2606.15615 2026-06-16 cs.LG cs.CV New submission

MoECa: Aligning Feature Reuse with Expert Decomposition in Diffusion Transformers

Maoliang Li, Haojing Chen, Jiayu Chen, Zihao Zheng, Xinhao Sun, Hailong Zou, Xiang Chen

Affiliations: School of Computer Science, Peking University; School of Software Engineering, University of Electronic Science and Technology of China

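AI summary: Targeting redundant cross-timestep computation in DiT-MoE, proposes MoECa, a fine-grained caching framework that reuses features at the expert-branch level and adds expert-aware adaptive control and synchronized cache updates, achieving up to 2.83x speedup across several models with minimal quality loss.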

Comments under review

Abstract

Diffusion Transformers with Mixture-of-Experts (DiT-MoE) improve model capacity under sparse activation, but diffusion inference is still bottlenecked by redundant computation across timesteps. Existing caching methods mainly operate at the token level, which becomes suboptimal in DiT-MoE because each token update is internally decomposed into multiple routed expert branches. Our analysis shows that cross-timestep redundancy in DiT-MoE is better characterized at the expert-branch level than at the whole-token level. Based on this observation, we propose MoECa, a fine-grained caching framework that performs branch-level feature reuse across timesteps. MoECa further introduces expert-aware adaptive control and synchronized cache updates across MoE and attention paths to maintain stable intermediate states. Experiments on multiple DiT-MoE models show that MoECa consistently achieves a better speed-quality trade-off than prior caching methods, with up to 2.83x inference speedup and minimal quality degradation.
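A minimal sketch of branch-level reuse. It assumes, hypothetically, that a cached output for a (token, expert) branch is reused when that branch's input has changed by less than a relative threshold since it was last computed. One consequence of caching per branch rather than per token: when a token's routing shifts between timesteps, only the newly selected expert is recomputed. MoECa's actual expert-aware control and synchronized updates are not reproduced here.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def rel_change(a: Vector, b: Vector) -> float:
    """Relative L2 change of a with respect to b."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    den = math.sqrt(sum(y * y for y in b)) or 1.0
    return num / den


class BranchCache:
    """Caches each (token, expert) branch output and reuses it across timesteps."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.store: Dict[Tuple[int, int], Tuple[Vector, Vector]] = {}
        self.hits = 0
        self.misses = 0

    def run(self, token: int, expert: int, x: Vector, fn: Callable[[Vector], Vector]) -> Vector:
        cached = self.store.get((token, expert))
        if cached is not None and rel_change(x, cached[0]) < self.threshold:
            self.hits += 1
            return cached[1]  # reuse this branch's output from an earlier timestep
        self.misses += 1
        y = fn(x)
        self.store[(token, expert)] = (x, y)
        return y


def moe_token_update(cache: BranchCache, token: int, x: Vector,
                     routes: List[Tuple[int, float]],
                     experts: Dict[int, Callable[[Vector], Vector]]) -> Vector:
    # routes: (expert_id, gate_weight) pairs; output is the gate-weighted sum of branches.
    out = [0.0] * len(x)
    for e, g in routes:
        y = cache.run(token, e, x, experts[e])
        out = [o + g * v for o, v in zip(out, y)]
    return out
```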

2606.15652 2026-06-16 cs.LG cs.CL New submission

MosaicQuant: Inlier-Outlier Disaggregation for Unified 4-Bit LLM Quantization

Yangjia Hu, Haodong Wang, Zicong Hong, Qianli Liu, Quanxin Shou, Jian Lin, Song Guo, Xiaowei Shen, Xiangjun Huang, Dian Wang, Jian Yang

Affiliations: HKUST; EPFL; MetaX Integrated Circuits Co., Ltd

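AI summary: Proposes MosaicQuant, which quantizes weight matrices into a dense 4-bit base component and a sparse 4-bit residual component and uses ZipperEngine to fuse the sparse block computation, enabling unified 4-bit inference that keeps near-FP16 accuracy on LLaMA3 and Qwen3 with up to 1.24x speedup.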

Comments 17 pages

Abstract

4-bit quantization significantly reduces the memory footprint and accelerates the inference of large language models (LLMs). However, its limited bit-width representation struggles to faithfully capture both dense common values (inliers) and rare large-magnitude values (outliers), causing substantial accuracy degradation. Existing mixed-precision methods mitigate this by retaining outliers in high precision, but at the cost of breaking the uniformity of low-bit execution, introducing precision conversion and extra data movement that undermine practical speedup. We propose MosaicQuant, a unified 4-bit LLM quantization paradigm built on a novel principle of inlier-outlier disaggregation. Rather than elevating outlier precision, MosaicQuant quantizes the full weight matrix into a dense 4-bit base component, in which inliers are captured faithfully while outliers inevitably incur quantization error. A sparse 4-bit residual component then compensates for these quantization errors, selectively targeting the most error-critical weight blocks, where output distortion is shown to be concentrated. However, a unified representation alone is insufficient, as naïvely executing the sparse residual as a separate kernel still breaks the unified low-bit inference pipeline. To bridge this gap, we introduce ZipperEngine, which fuses sparse block computation into the dense 4-bit GEMM kernel via an overlapped pipeline, unifying not only the representation but also the execution into a single coherent low-bit inference pipeline. Extensive experiments on LLaMA3 and Qwen3 demonstrate that MosaicQuant preserves near-FP16 accuracy while achieving up to 1.24x speedup over the W16A16 baseline.
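A sketch of the base-plus-residual idea under simplifying assumptions: symmetric per-block INT4 quantization (integers in [-8, 7], absmax scale), with the residual quantized by the same 4-bit scheme and kept only for the k blocks with the largest base error. MosaicQuant selects blocks by output distortion; here weight-space squared error is a stand-in, and the function names are hypothetical.

```python
from typing import List, Tuple


def quant4(block: List[float]) -> Tuple[List[int], float]:
    """Symmetric 4-bit quantization: integers in [-8, 7] with one absmax scale per block."""
    amax = max(abs(v) for v in block)
    scale = amax / 7 if amax > 0 else 1.0
    return [max(-8, min(7, round(v / scale))) for v in block], scale


def dequant(q: List[int], scale: float) -> List[float]:
    return [x * scale for x in q]


def sq_err(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mosaic_like(weights: List[float], block: int, k: int):
    blocks = [weights[i:i + block] for i in range(0, len(weights), block)]
    recon = [dequant(*quant4(b)) for b in blocks]       # dense 4-bit base, every block
    errs = [sq_err(b, r) for b, r in zip(blocks, recon)]
    top = sorted(range(len(blocks)), key=lambda i: errs[i], reverse=True)[:k]
    residual = {}                                        # sparse 4-bit residual, k blocks only
    for i in top:
        residual[i] = quant4([w - r for w, r in zip(blocks[i], recon[i])])
    out: List[float] = []
    for i, r in enumerate(recon):
        if i in residual:
            r = [a + c for a, c in zip(r, dequant(*residual[i]))]
        out.extend(r)
    return out, sorted(residual)
```

In a block holding an outlier such as 3.0, the absmax scale rounds the small inliers to zero; the residual pass then recovers most of them while the outlier stays in the 4-bit base.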

2606.15682 2026-06-16 cs.LG New submission

ReQAT: Achieving Full-Precision Reasoning Accuracy with 4-bit Floating-Point Quantization-Aware Training

Janghwan Lee, Sihwa Lee, Jinseok Kim, Yongjik Kim, Jieun Lim, Jinwook Oh, Jungwook Choi

Affiliations: Hanyang University; Samsung Advanced Institute of Technology

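AI summary: Addressing the severe loss of reasoning accuracy in large reasoning models under low-bit (W4A4KV4) quantization, proposes ReQAT, which combines trace-aligned QAT, selective entropy minimization, and quantization-friendly initialization to recover and surpass BF16 fine-tuning accuracy, with up to 3.9x throughput speedup.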

Comments ICML 2026

Abstract

Large Reasoning Models (LRMs) achieve strong problem-solving through long chain-of-thought, but their deployment is constrained by the high cost of full-precision inference and growing KV cache footprints. Microscaled FP4 formats enable efficient FP4 deployment; however, fully quantizing weights, activations, and KV caches (W4A4KV4) causes severe reasoning degradation that existing PTQ and QAT fail to recover. We identify that FP4 failures concentrate on low-entropy tokens (precise symbolic commitments such as digits and operators), where quantization noise inflates sampling errors that cascade through reasoning traces. Based on this insight, we propose ReQAT, a reasoning-centric FP4 training framework with three components: (i) Trace-Aligned QAT (TAQ), which revisits identical reasoning traces to focus updates on critical low-entropy decisions; (ii) Selective Entropy Minimization (SEM), which reinforces confidence at low-entropy positions; and (iii) Q-FIT, a quantization-friendly initialization that jointly calibrates RoPE-consistent KV cache transformations to stabilize QAT. Under the same training budget, ReQAT not only recovers but surpasses BF16 fine-tuning accuracy, while delivering up to 3.9x throughput speedup on NVIDIA DGX Spark and 3.1x on B200.
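The abstract describes SEM only in words. A minimal sketch of the quantity involved, assuming token-level Shannon entropy of the softmax distribution and a hypothetical fixed threshold `tau` that marks "low-entropy" positions: the auxiliary loss averages entropy over positions already below `tau`, so minimizing it sharpens exactly those decisions. In training this would be a differentiable tensor op; here only the value is computed, and the threshold rule is an assumption.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p: List[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0)


def selective_entropy_loss(seq_logits: List[List[float]], tau: float) -> Tuple[float, List[bool]]:
    """Mean entropy over positions whose entropy is below tau (0.0 if none)."""
    ents = [entropy(softmax(l)) for l in seq_logits]
    mask = [h < tau for h in ents]
    chosen = [h for h, m in zip(ents, mask) if m]
    loss = sum(chosen) / len(chosen) if chosen else 0.0
    return loss, mask
```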

2606.15716 2026-06-16 cs.LG New submission

How to Score Experts for One-Shot MoE Expert Pruning: A Unified Formulation and Selection Principle

Zongfang Liu, Jinghui Zhang, Zijian Ma, Guangyi Chen, Xin Yuan

Affiliations: Zhejiang University; Westlake University; Mohamed bin Zayed University of Artificial Intelligence; Carnegie Mellon University

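AI summary: Proposes a unified formulation of one-shot MoE expert pruning built on three factors (routing frequency, gate weighting, and activation strength), derives that task-agnostic pruning should use activation-based criteria while task-specific pruning can retain routing-frequency and gate information, and introduces two new criteria, MAN and MSAN, that perform best across multiple models and benchmarks.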

Abstract

Mixture-of-Experts (MoE) language models reduce per-token computation through sparse expert activation, yet deployment still requires storing the full expert pool, making one-shot expert pruning a practical approach for reducing memory usage. Although effective, existing criteria are largely heuristic, and no single criterion is universally optimal. Thus, establishing a principle for selecting pruning criteria suited to different deployment objectives remains an important yet largely underexplored problem in one-shot expert pruning. To this end, we introduce a unified formulation for one-shot MoE expert pruning organized around three factors: routing frequency, gate weighting, and activation strength. The formulation yields a criteria selection principle: task-agnostic pruning should favor routed-token-averaged, gate-free activation-based criteria, whereas task-specific pruning can benefit from retaining routing-frequency and gate-weight information. Beyond this principle, the formulation also provides a systematic view of existing heuristic criteria and gives rise to two new task-agnostic criteria, Mean Activation Norm (MAN) and Mean Squared Activation Norm (MSAN). Across four representative MoE models and 16 diverse benchmarks, MAN and MSAN are consistently strong in the task-agnostic setting, obtain the top-two average ranks, and improve average performance by up to 8.8 points over the strongest baseline.
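A minimal sketch of the two criteria as their names suggest: for each expert, take the output activations of the tokens actually routed to it and average their L2 norms (MAN) or squared L2 norms (MSAN), with no gate weights and no routing-frequency term. Exactly which activation is measured is an assumption here. The lowest-scoring experts are the pruning candidates.

```python
import math
from typing import Callable, Dict, List

Acts = List[List[float]]


def l2(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def man(acts: Acts) -> float:
    """Mean Activation Norm over routed tokens (gate-free)."""
    return sum(l2(a) for a in acts) / len(acts) if acts else 0.0


def msan(acts: Acts) -> float:
    """Mean Squared Activation Norm over routed tokens (gate-free)."""
    return sum(l2(a) ** 2 for a in acts) / len(acts) if acts else 0.0


def prune(expert_acts: Dict[int, Acts], n_prune: int,
          criterion: Callable[[Acts], float] = man) -> List[int]:
    # Averaging over routed tokens means a frequently selected expert does not
    # score higher merely because it is selected often.
    scores = {e: criterion(a) for e, a in expert_acts.items()}
    return sorted(scores, key=lambda e: scores[e])[:n_prune]
```

In the example used for testing, expert 1 is routed 50 times but with tiny activations, so it is still the first pruning candidate.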

2606.15912 2026-06-16 cs.LG cs.AI New submission

On-Policy Distillation with Curriculum Turn-level Guidance for Multi-turn Agents

Gengsheng Li, Mao Zheng, Mingyang Song, Ruiqi Liu, Tianyu Yang, Jie Sun, Qiyong Zhong, Haiyun Guo, Junfeng Fang, Dan Zhang, Jinqiao Wang

Affiliations: Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Large Language Model Department, Tencent; University of Science and Technology of China; Zhejiang University; National University of Singapore; Wuhan AI Research

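AI summary: Addressing the breakdown of teacher supervision caused by compounding errors in on-policy distillation of multi-turn agents, proposes Guided-OPD, which mixes teacher- and student-generated turns and decays the teacher's intervention probability along a curriculum, improving Score by 21.1% and Success Rate by 25.5% on average on ALFWorld, ScienceWorld, and WebShop.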

Abstract

Multi-turn agents that plan, invoke tools, and interact with environments offer a promising paradigm for solving complex tasks, yet their capabilities typically rely on very large models whose inference cost is prohibitive in practice. On-Policy Distillation (OPD) is a natural recipe for transferring such capabilities to smaller students, but we find that it suffers a characteristic failure mode in this setting: small student errors compound across turns and push the trajectory out of the teacher's familiar state distribution, so the teacher's supervision becomes least reliable precisely where the student needs it most. We propose Guided On-Policy Distillation (Guided-OPD), a simple yet effective algorithm that mixes teacher- and student-generated turns within each rollout and schedules the teacher's intervention probability along a curriculum that decays to zero. Strong guidance keeps early trajectories close to the teacher distribution and is then gradually withdrawn to recover the purely on-policy regime used at inference. On ALFWorld, ScienceWorld, and WebShop, distilling Qwen3 students from a Qwen3-30B-A3B teacher, Guided-OPD improves Score by 21.1% and Success Rate by 25.5% over vanilla OPD on average, with larger gains on smaller students.
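A sketch of the rollout mixing. It assumes a linear decay of the teacher-intervention probability to zero over a hypothetical number of training steps and an independent coin flip per turn. The abstract does not give the curriculum's shape, and `teacher_turn` / `student_turn` are placeholder callables standing in for model generation.

```python
import random
from typing import Callable, List, Tuple


def teacher_prob(step: int, total_steps: int, p0: float = 1.0) -> float:
    """Linearly decay the intervention probability from p0 to 0 by total_steps."""
    return max(0.0, p0 * (1 - step / total_steps))


def guided_rollout(n_turns: int, p_teacher: float,
                   teacher_turn: Callable[[int], str],
                   student_turn: Callable[[int], str],
                   rng: random.Random) -> List[Tuple[str, str]]:
    # Each turn comes from the teacher with probability p_teacher, else from the student.
    traj = []
    for t in range(n_turns):
        if rng.random() < p_teacher:
            traj.append(("teacher", teacher_turn(t)))
        else:
            traj.append(("student", student_turn(t)))
    return traj
```

Once `teacher_prob` reaches zero, every turn is student-generated, which matches the purely on-policy regime used at inference.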

2606.16059 2026-06-16 cs.LG cs.AI New submission

Mojo: A Promising Tool for Scalable Financial AI Efficiency

Henry Han

Affiliations: Data Science and Artificial Intelligence Innovation Laboratory, School of Engineering and Computer Science, Baylor University

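AI summary: Surveys the Mojo language, which uses MLIR compilation and deterministic kernel design to address the Python-to-C++ performance gap and numerical inconsistencies in quantitative finance, reaching 20-180x speedups over pure Python on financial AI workloads.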

Comments 15, 3 figures

Abstract

For thirty years, quantitative finance has paid a costly two-language tax: models researched in Python are rewritten in C++ for production, often introducing numerical discrepancies. GPU-accelerated deep learning exacerbates this problem, as nondeterministic floating-point reductions can produce drift in long backtests, challenging regulatory reproducibility and auditability expectations. This article surveys Mojo, Modular's 2026 Python-like systems language, as a structural response for capital markets engineering. While closing the Python-to-C++ performance gap, Mojo uniquely combines native interoperability with the low-level systems control required to construct bit-exact deterministic kernels. Its MLIR compilation infrastructure further allows a single codebase to target scalar, SIMD, multicore, and GPU execution, reducing the translation bottleneck between research and production. We benchmark four core financial AI workloads: Monte Carlo option pricing, LLM sentiment inference, multi-asset backtesting, and portfolio Value at Risk. On Apple Silicon, Mojo demonstrates 20x to 180x speedups over pure Python on directly measured kernels; larger-scale GPU workload results are projections calibrated from published benchmarks. Alongside transparent performance data, we introduce mojo-deterministic, an open-source library of reproducible reduction kernels, and provide a candid assessment of the problems Mojo does and does not yet solve.
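Why reduction order matters: floating-point addition is not associative, so a parallel reduction whose combine order depends on thread scheduling can return different bits on different runs. The Python sketch below shows the non-associativity and a fixed-shape pairwise tree reduction that always combines the same pairs. This is one common way to make a reduction reproducible. It illustrates the idea only and is not code from mojo-deterministic.

```python
from typing import List


def pairwise_sum(xs: List[float]) -> float:
    """Fixed-shape binary-tree reduction: same input order gives the same bits every run."""
    if not xs:
        return 0.0
    level = list(xs)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element is carried up unchanged
        level = nxt
    return level[0]


if __name__ == "__main__":
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c, a + (b + c))  # prints: 1.0 0.0
```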

2606.16352 2026-06-16 cs.LG cs.AI New submission

Communication-Efficient Verifiable Attention for LLM Inference

Ziqun Chen, Ming Wu, Michael Heinrich, Jason Zeng, Huiying Lan, Tianwei Zhang, Rui Tan

Affiliations: Nanyang Technological University; Zero Gravity Labs

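AI summary: Proposes VeriAttn, which offloads attention computation to the GPU with verification in the TEE and combines a two-level pipeline with a TEE-GPU partitioning strategy to substantially reduce TEE computation and communication overhead, accelerating verifiable LLM inference.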

Comments 19 pages, 16 figures

Abstract

Computation integrity of remote large language model (LLM) serving can be questionable. For conventional deep neural networks (DNNs), the existing TEE-shielded DNN partitioning (TSDP) approach uses a Trusted Execution Environment (TEE) to compute non-linear components and verify the integrity of linear components offloaded to an untrusted GPU. However, directly applying TSDP to Transformer-based LLMs incurs significant TEE computation and TEE-GPU communication overhead. This paper presents Communication-efficient TEE-GPU Attention (VeriAttn) for accelerating verifiable LLM inference. VeriAttn offloads both linear and non-linear computations of attention to the GPU, while the TEE performs verification. Moreover, for prefill, VeriAttn uses a two-level pipeline to overlap data movement, TEE pre-/post-processing, and GPU computation. For decoding, when the key-value cache exceeds available GPU memory, VeriAttn partitions attention across TEE and GPU to reduce repeated key-value transfers. Evaluation on an Intel TDX platform shows that VeriAttn achieves 2.60-3.38x and 3.86-5.42x acceleration over TSDP for 6k-token prompts and 10k-token outputs during prefill and decoding, respectively.
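The abstract does not say how the TEE checks GPU results. One standard technique for cheaply verifying an offloaded matrix product C = A·B is Freivalds' algorithm. It draws a random 0/1 vector r and tests whether A(Br) = Cr. Each round costs O(n^2) instead of the O(n^3) needed to recompute the product, and a wrong C survives a round with probability at most 1/2. The sketch below is that generic check on integer matrices; it is not VeriAttn's verification protocol.

```python
import random
from typing import List

Matrix = List[List[int]]


def matvec(m: Matrix, v: List[int]) -> List[int]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def freivalds(a: Matrix, b: Matrix, c: Matrix, rounds: int, rng: random.Random) -> bool:
    """Accept C == A @ B; a wrong C passes all rounds with probability <= 2**-rounds."""
    n = len(b[0])
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(n)]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False  # certain rejection: C is definitely wrong
    return True
```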

2606.16384 2026-06-16 cs.LG New submission

Mixtures of Subspaces for Bandwidth Efficient Context Parallel Training

Sameera Ramasinghe, Ajanthan Thalaiyasingam, Hadi Mohaghegh Dolatabadi, Gil Avraham, Violetta Shevchenko, Yan Zuo, Chamin Hewa Koneputugodage, Alexander Long

Affiliations: Pluralis Research

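AI summary: Proposes a compression method based on mixtures of subspaces that achieves over 95% communication compression in low-bandwidth decentralized training, enabling efficient training of billion-parameter models at context lengths above 100K tokens.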

Abstract

Pretraining language models with extended context windows enhances their ability to leverage rich information during generation. Existing methods split input sequences into chunks, broadcast them across multiple devices, and compute attention block by block, which incurs significant communication overhead. While feasible in high-speed clusters, these methods are impractical for decentralized training over low-bandwidth connections. We propose a compression method for communication-efficient context parallelism in decentralized settings, achieving a compression rate of over 95% with negligible overhead and no loss in convergence. Our key insight is to exploit the intrinsic low-rank structure of activation outputs by dynamically constraining them to learned mixtures of subspaces via efficient reparameterizations. We demonstrate scaling billion-parameter decentralized models to context lengths exceeding 100K tokens on networks as slow as 300Mbps, matching the wall-clock convergence speed of centralized models on 100Gbps interconnects.
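A rough sketch of the bandwidth arithmetic and of routing an activation to one of several subspaces. It makes three assumptions. First, each subspace has an orthonormal rank-r basis in d dimensions. Second, the sender picks the subspace that captures the most energy of the activation. Third, only the subspace index and r coefficients are transmitted. The toy bases below are hard-coded stand-ins for learned ones, and the paper's reparameterization is not reproduced.

```python
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def encode(x: Vec, bases: List[List[Vec]]) -> Tuple[int, Vec]:
    """Pick the subspace whose orthonormal basis captures most of x's energy."""
    best, best_coeffs, best_energy = 0, [], -1.0
    for k, basis in enumerate(bases):
        coeffs = [dot(x, u) for u in basis]
        energy = sum(c * c for c in coeffs)
        if energy > best_energy:
            best, best_coeffs, best_energy = k, coeffs, energy
    return best, best_coeffs


def decode(k: int, coeffs: Vec, bases: List[List[Vec]]) -> Vec:
    out = [0.0] * len(bases[k][0])
    for c, u in zip(coeffs, bases[k]):
        out = [o + c * ui for o, ui in zip(out, u)]
    return out


def compression_ratio(d: int, r: int) -> float:
    # Send r coefficients instead of d values (the subspace index overhead is ignored).
    return 1 - r / d
```

With hypothetical sizes d = 4096 and r = 128, `compression_ratio` gives 0.96875, the same order as the over-95% rate reported in the abstract.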

2606.14739 2026-06-16 cs.ET cs.LG cs.SY eess.SY Cross-list

An RRAM-based Hardware Implementation of a Radial Basis Function Neuron for Edge Classifiers

Georgios Papandroulidakis, Shady Agwa, Themis Prodromakis

Affiliations: Centre for Electronics Frontiers, Institute of Micro and Nano Systems

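AI summary: Proposes a hardware design for a metal-oxide RRAM-based analogue content-addressable memory (ACAM) whose configurable receptive-field neurons perform metric-based classification and online adaptation on edge devices, reaching 89.1% accuracy on MNIST at 185 fJ per cell per operation.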

Abstract

The deployment of modern machine learning (ML) solutions on resource-constrained edge devices highlights implementation challenges. This is especially true for extreme edge applications that include safety-critical components, such as autonomous navigation tasks. This paper demonstrates an artificial neural network (ANN) design leveraging Metal-Oxide Resistive RAM (RRAM)-based Analogue Content Addressable Memory (ACAM) as an efficient hardware substrate for performing metric-based classification and online adaptation on the edge. The proposed design is based on a custom Template piXeL (TXL) cell used for building the ACAM module, where each TXL cell acts as a configurable receptive field neuron. These cells employ a Radial Basis activation function to calculate the distance of an input from the programmed receptive field. The TXL cells can be organised into dense arrays for calculating the distance of a high-dimensional input against all stored prototypes, effectively performing fast and energy-efficient similarity search. This hardware engine enables on-the-fly learning, where the receptive field parameters can be tuned to track domain shift. Through simulation of the proposed TXL-RBF classifier, we achieve 89.1% accuracy on the MNIST dataset while consuming 185 fJ per cell per operation when operating at 100 MHz.
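A software analogue of an RBF-neuron classifier, under stated assumptions. Each cell stores a prototype center and a width sigma and outputs a Gaussian radial-basis activation exp(-||x - c||^2 / (2 sigma^2)). The class of the most strongly activated cell wins. The online update, which nudges the best-matching prototype of the true class toward the input, is a generic LVQ-style rule chosen for illustration, not the paper's circuit-level tuning of the TXL cell.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RBFCell:
    center: List[float]
    sigma: float
    label: int

    def activate(self, x: List[float]) -> float:
        d2 = sum((a - b) ** 2 for a, b in zip(x, self.center))
        return math.exp(-d2 / (2 * self.sigma ** 2))


def classify(cells: List[RBFCell], x: List[float]) -> int:
    """Similarity search: every cell scores x (in hardware, in parallel); best cell wins."""
    return max(cells, key=lambda c: c.activate(x)).label


def adapt(cells: List[RBFCell], x: List[float], y: int, lr: float = 0.1) -> None:
    # Move the best-matching cell of the true class toward x to track domain shift.
    # Assumes at least one cell carries label y.
    best = max((c for c in cells if c.label == y), key=lambda c: c.activate(x))
    best.center = [ci + lr * (xi - ci) for ci, xi in zip(best.center, x)]
```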

2606.14992 2026-06-16 cs.AR cs.LG Cross-list

KATANA: A Fast, Low-Power Mapping of Kalman Filters onto Edge NPUs for Real-Time Tracking

Bodhisatwa Kundu, Anish Rooj, Sumit Saha, Abhradeep Sarkar, Arghadip Das, Arnab Raha, Mrinal K. Naskar

Affiliations: Indian Institute of Technology, Kharagpur

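AI summary: Targeting the power and real-time constraints of Kalman filters on edge devices in real-time tracking systems, proposes KATANA, which maps the LKF and EKF onto commercial NPUs through three algebraic graph rewrites, reducing dynamic energy by up to 97.9% on the Intel Core Ultra series.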

Abstract

State estimation is the closed-loop core of every real-time tracking system, from radar surveillance and counter-UAV defense to autonomous driving and robotics. These deployments run on edge platforms, where defense systems mount on vehicles and drones, and civilian pipelines live on cars and handheld devices. Here, every additional watt of compute erodes mission duration or operational range. Two hard constraints follow: each new measurement must be fused before the next control cycle, and the total compute must fit within a strict battery and thermal power envelope. The Linear and Extended Kalman Filters (LKF, EKF) are dominant estimators on these systems, but today they execute almost exclusively on CPUs, which serialize multi-object tracking (MOT) updates, or on custom FPGA/ASIC accelerators that lengthen design cycles. Contemporary AI-PC SoCs, like the Intel Core Ultra Series 1 and 2, integrate a low-power, data-parallel Neural Processing Unit (NPU). We therefore ask whether the Kalman filter can be mapped onto this existing matrix engine to meet real-time and low-power budgets simultaneously, avoiding a dedicated accelerator and keeping the CPU and GPU free for primary workloads. We present KATANA, an NPU-aware optimization framework delivering the first end-to-end mapping of the LKF and EKF onto a commercial NPU, alongside a cross-platform characterization on shipping AI-PC silicon. KATANA applies three algebraic graph rewrites: subtract-to-add reformulation via a precomputed negative-projection matrix H_neg, static-shape tensor fusion, and block-diagonal batched parallelization, ensuring 100% of operations execute on the DPU matrix engine. On the Series 2, the optimized batched EKF reaches 223.35 FPS at 13.43 W active power, and the LKF reaches 408.73 FPS at 14.05 W, delivering up to a 97.9% reduction in dynamic energy versus the CPU implementation.
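The subtract-to-add rewrite is plain algebra. The innovation y = z - Hx equals z + H_neg x when H_neg = -H is precomputed once, so the measurement update needs only matrix multiplies and additions. The same trick turns the covariance update (I - KH)P into (I + K H_neg)P. Below is a minimal pure-Python linear Kalman measurement update in that form, with a 1D position/velocity state and a scalar measurement. All numbers are hypothetical, and the block-diagonal batching and static-shape fusion rewrites are not shown.

```python
from typing import List, Tuple

M = List[List[float]]


def mm(a: M, b: M) -> M:
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]


def add(a: M, b: M) -> M:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def neg(a: M) -> M:
    return [[-x for x in r] for r in a]


def transpose(a: M) -> M:
    return [list(c) for c in zip(*a)]


def eye(n: int) -> M:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kf_update(x: M, P: M, z: M, H: M, R: M, H_neg: M) -> Tuple[M, M]:
    """Measurement update using additions only; H_neg = -H is precomputed offline."""
    y = add(z, mm(H_neg, x))                         # innovation z - Hx, written as z + H_neg x
    S = add(mm(mm(H, P), transpose(H)), R)           # 1x1 here, so the inverse is a reciprocal
    K = [[v / S[0][0] for v in r] for r in mm(P, transpose(H))]
    x_new = add(x, mm(K, y))
    P_new = mm(add(eye(len(x)), mm(K, H_neg)), P)    # (I - K H) P, written as (I + K H_neg) P
    return x_new, P_new
```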

2606.15001 2026-06-16 physics.comp-ph cond-mat.mtrl-sci cs.LG physics.chem-ph Cross-list

Distilling latent electrostatics from foundation machine learning interatomic potentials

Xiaoyu Wang, Bingqing Cheng

Affiliations: Department of Chemistry, UC Berkeley; Bakar Institute of Digital Materials for the Planet, UC Berkeley; Chemical Sciences Division, Lawrence Berkeley National Laboratory

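AI summary: Uses Latent Ewald Summation (LES) to distill the latent electrostatics of foundation machine learning interatomic potentials into lightweight student models that lower computational cost and provide Born effective charge tensors and infrared spectra; benchmarks show that the DFT level used to train the teacher matters more than its architecture.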

Abstract

Foundation machine learning interatomic potentials (MLIPs) have enabled atomistic simulations across broad regions of chemical and materials space, but many remain computationally expensive and lack explicit electrostatics, limiting their use for systems governed by long-range interactions and electrical response. Previously, we introduced Latent Ewald Summation (LES), which learns latent atomic charges and long-range electrostatics from density functional theory (DFT) energy and force labels alone. Here, we use LES to extract electrostatics that are latent in foundation models: energies and forces predicted by a teacher model are used to train a lightweight LES-augmented student MLIP, with optional fine-tuning on additional DFT data. The resulting models reduce computational cost while providing access to Born effective charge tensors and infrared spectra. We benchmark student models distilled from a broad set of foundation MLIPs, including UMA, MACE, Orb, eSEN, GemNet-OC, PET, and EquiformerV2-based models, against experimental infrared spectra for liquid water, concentrated hydrochloric acid, and the anatase TiO2(101)-water interface. Across these systems, electrostatic response can be extracted from most foundation MLIPs. The benchmark further shows that the underlying DFT level and dataset used to train the teacher model play a larger role than architecture in determining electrostatic and spectroscopic accuracy. For the TiO2-water interface, fine-tuning with a modest amount of higher-level DFT data improves structural and infrared predictions. LES-based distillation therefore provides a practical route for converting foundation MLIPs into efficient, electrically responsive models, while also testing the physical fidelity encoded in foundation models.

2606.15004 2026-06-16 eess.SY cs.LG cs.SY Cross-list

CREST: Deployment-Realistic Hardware-in-the-Loop NAS for Embedded Sensing Systems

Joseph Q. Zales, Pragya Sharma, Mani Srivastava

Affiliations: University of California, Los Angeles

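AI summary: Proposes CREST, which uses hardware-in-the-loop measurements to jointly optimize model architecture, target platform, runtime schedule, and deployment policy; evaluated on inertial odometry and audio classification, it reduces median per-inference energy by 41.7% versus FLOPs-based selection on the inertial odometry task.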

Comments 14 pages, 10 figures, 7 tables

English Abstract

Deploying neural networks on low-power microcontrollers (MCUs) requires selecting model architectures under tight memory, latency, and energy constraints. Existing workflows often simplify this process along one or more axes: static proxy costs such as FLOPs or parameters, treating one MCU as representative, and continuous-inference tests instead of deployed sensing schedules. These assumptions can mis-rank Pareto-front candidates, miss infeasible deployments, and obscure schedule-dependent energy. We present CREST (Cross-platform Runtime Evaluation and Search Tool), a deployment-realistic hardware-in-the-loop (HIL) neural architecture search (NAS) framework for MCU sensing systems. CREST keeps the optimizer, HIL measurement boundary, logging, and replay workflow fixed while exposing workload, model family, target backend, schedule, quantization, and scoring policy as configurable axes. This makes deployment effects experimentally separable within one reusable workflow. We evaluate CREST on inertial odometry and audio classification across three Arm Cortex-M targets. For inertial odometry, measured-energy HIL search reduces median per-inference energy by 41.7% versus FLOPs-based selection and 40.8% versus memory-traffic-based selection at similar error. FLOPs-based selection also chooses infeasible deployments on memory-constrained targets. On the STM32 N657 target, continuous-inference and duty-cycled searches produce different Pareto frontiers. For audio classification, the same application-level policy selects different DS-CNN architectures on different boards, and cross-board replay changes deployment cost substantially. Overall, CREST shows that deployment-realistic MCU NAS must jointly optimize model architecture, target platform, runtime schedule, and deployment policy rather than relying only on static proxy costs or continuous-inference measurements.
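The abstract says that ranking candidates by a static proxy can mis-rank candidates and pick deployments that do not fit on the target. The minimal sketch below shows how that happens. The candidate names, error values, energies, FLOPs, and RAM figures are hypothetical. The selection rule (the cheapest candidate under an error cap) is an illustrative assumption, not CREST's scoring policy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    name: str
    error: float        # task error, lower is better
    energy_mj: float    # energy per inference measured on the board (hypothetical units)
    flops: int          # static proxy cost
    peak_ram_kb: int    # peak runtime memory on the target


Cost = Callable[[Candidate], float]


def dominates(a: Candidate, b: Candidate, cost: Cost) -> bool:
    """a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.error <= b.error and cost(a) <= cost(b)
    better = a.error < b.error or cost(a) < cost(b)
    return no_worse and better


def pareto_front(cands: List[Candidate], cost: Cost) -> List[Candidate]:
    return [c for c in cands if not any(dominates(o, c, cost) for o in cands if o is not c)]


def feasible(cands: List[Candidate], ram_budget_kb: int) -> List[Candidate]:
    return [c for c in cands if c.peak_ram_kb <= ram_budget_kb]


def select(cands: List[Candidate], max_error: float, cost: Cost) -> Candidate:
    """Cheapest candidate that meets the error cap under the given cost model."""
    return min((c for c in cands if c.error <= max_error), key=cost)
```

In this toy setup, selecting by FLOPs picks a model that exceeds the RAM budget. Selecting by measured energy over feasible models picks a different one.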

2606.15346 2026-06-16 cs.CV cs.LG cs.MM Cross-list

DYNA-PRUNER: Input-Adaptive Data-Model Co-Pruning for Efficient and Scalable Spatio-Temporal Media Prediction

Fuyan Zhang, Yuqi Li, Yingli Tian, Edmond S. L. Ho

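Affiliations * The City College of New York; The Graduate Center, CUNY; New York University; University of Glasgow

AI Summary Proposes Dyna-Pruner, which uses a shared-importance synchronization mechanism for input-adaptive co-pruning of data and model structure. On CNN, RNN, and Transformer backbones it reduces FLOPs by up to 70% and achieves a 2.5x speedup with less than 1% accuracy loss.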

Comments ICME 2026 Spotlight Paper

English Abstract

Spatio-temporal prediction supports radar/satellite nowcasting and city-scale traffic monitoring, but modern models are often too expensive for real-time deployment. This stems from a mismatch between dense computation and strong input-dependent redundancy (e.g., calm seas or clear skies). To enable automated, resource-aware architecture optimization in scalable media analysis, we propose Dyna-Pruner, an end-to-end framework for input-dependent co-pruning of data and model structure. A shared-importance synchronization mechanism generates coupled masks that prune redundant regions and their corresponding computational units (e.g., convolutional filters), yielding per-sample sparse sub-networks at inference time. Experiments on WeatherBench, SEVIR, and TaxiBJ show seamless integration with CNN, RNN, and Transformer backbones, reducing FLOPs by up to $70\%$ and achieving a $2.5\times$ speedup on NVIDIA Jetson AGX Orin with negligible accuracy loss ($<1\%$).
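Co-pruning saves compute multiplicatively: it skips redundant spatial regions and also the filters tied to them. The sketch below only shows that accounting for one convolution, plus a simple top-k tile mask. The importance scores and keep ratios are hypothetical, and treating the spatial and channel factors as independent is an assumption. This is not Dyna-Pruner's shared-importance synchronization mechanism. The sketch counts multiply-accumulates (MACs); FLOPs are roughly twice as many, so the ratios are unchanged.

```python
from typing import List, Sequence


def tile_mask(importance: Sequence[float], keep_ratio: float) -> List[bool]:
    """Keep the top `keep_ratio` fraction of spatial tiles by importance score."""
    n = len(importance)
    k = max(1, round(n * keep_ratio))
    top = set(sorted(range(n), key=lambda i: importance[i], reverse=True)[:k])
    return [i in top for i in range(n)]


def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulates of a dense k x k convolution (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k


def co_pruned_macs(h: int, w: int, c_in: int, c_out: int, k: int,
                   spatial_keep: float, in_keep: float, out_keep: float) -> float:
    """Cost when only kept tiles and kept input/output channels are computed."""
    return conv_macs(h, w, c_in, c_out, k) * spatial_keep * in_keep * out_keep
```

For example, keeping half of the tiles and 60% of the output filters leaves 30% of the dense cost, which is a 70% reduction for that layer.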

2606.15453 2026-06-16 cs.AR cs.LG Cross-list

A Spatio-Temporal Expert Prefetching Framework for Efficient MoE-based LLM Inference

Yingnan Zhao, Razvan Bunescu, Ahmed Louri, Avinash Karanth, Ke Wang

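Affiliations * George Washington University; University of North Carolina at Charlotte; Ohio University

AI Summary Targets expert-loading latency in MoE LLM inference. After analyzing spatio-temporal correlations in expert selection, it proposes ST-MoE. ST-MoE combines lightweight runtime prediction with a reconfigurable hardware design to prefetch experts, overlapping loading with computation and improving performance and energy efficiency.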

English Abstract

Mixture-of-Experts (MoE) based large language models (LLMs), such as Qwen and DeepSeek, have recently emerged as an effective approach to improving model capacity without proportionally increasing computational cost. By replacing the conventional feed-forward network in dense LLMs with a set of experts and activating only a subset of them for each input token, MoE models significantly increase the total number of parameters while keeping the per-token computation relatively manageable. However, this dynamic and irregular expert activation pattern also introduces substantial expert loading overhead during inference, since the required experts must be fetched on demand according to token-dependent routing results. As a result, expert loading latency becomes a major source of performance and energy inefficiency. To this end, we first perform a comprehensive analysis of expert selection behavior in various MoE-based LLMs and applications, including language understanding and code generation. Our analysis reveals that, within each application domain, expert requests exhibit strong correlation across both adjacent MoE layers and consecutive decoding tokens, making future expert activations predictable. Based on this insight, we propose ST-MoE, a spatio-temporal expert prefetching framework that proactively stages experts ahead of use to overlap expert loading with ongoing computation. ST-MoE combines a lightweight runtime prediction mechanism that preserves the original routing behavior with a reconfigurable hardware design that efficiently supports dynamic expert prefetching. The combined effect of the prediction mechanism with the supporting hardware significantly improves MoE inference performance and energy efficiency while preserving model inference accuracy.
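The abstract reports that expert requests correlate across adjacent layers (spatial) and across consecutive tokens (temporal). The toy predictor below combines both signals to pick experts to prefetch. It is a hypothetical sketch built on co-occurrence counts, not ST-MoE's predictor. It only decides what to load early, so the router's actual choices are never altered.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple


class ExpertPrefetcher:
    """Toy spatio-temporal expert predictor (illustrative, not ST-MoE's design)."""

    def __init__(self, k: int):
        self.k = k
        # (layer, expert) -> counts of experts chosen at layer + 1 by the same token
        self.spatial: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
        # layer -> experts chosen by the previous token
        self.prev_token: Dict[int, Set[int]] = {}

    def observe(self, routing: Sequence[Set[int]]) -> None:
        """routing[l] is the set of experts the router picked at layer l for one token."""
        for l in range(len(routing) - 1):
            for e in routing[l]:
                self.spatial[(l, e)].update(routing[l + 1])
        self.prev_token = {l: set(es) for l, es in enumerate(routing)}

    def predict(self, layer: int, current_experts: Set[int]) -> List[int]:
        """Experts to prefetch for `layer`, given the experts active at `layer - 1`."""
        scores: Counter = Counter()
        for e in current_experts:
            scores.update(self.spatial[(layer - 1, e)])   # spatial (cross-layer) evidence
        for e in self.prev_token.get(layer, ()):
            scores[e] += 1                                # temporal (previous-token) evidence
        return [e for e, _ in scores.most_common(self.k)]
```

A wrong prediction costs only a wasted or late load. Correctness still comes from the router's real decision.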

2606.15523 2026-06-16 cs.NE cs.AI cs.LG Cross-list

AQ4SViT: An Automated Quantization Framework with Search Gating Policy for Compressing Spiking Vision Transformers

Rachmad Vidya Wicaksana Putra, Saad Iftikhar, Muhammad Shafique

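Affiliations * eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi; New York University (NYU) Abu Dhabi, United Arab Emirates (UAE)

AI Summary Proposes AQ4SViT, an automated quantization framework that combines a quantization search strategy with a search gating policy based on membrane potential drift. It quickly finds a good accuracy-memory trade-off, enabling efficient compression of spiking vision transformers.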

Comments 8 pages, 4 figures, 2 tables

English Abstract

Spiking Vision Transformers (SViTs) have emerged as low-power alternatives to ViT models, but their large size hinders deployment on resource-constrained embedded AI systems. To address this, state-of-the-art works have proposed quantization techniques to compress SViT models. However, their manual, human-guided approach requires substantial design time and power/energy to find an appropriate quantization setting for each network, so it does not scale to quantizing multiple networks. Toward this, we propose AQ4SViT, a novel automated quantization framework for SViTs that quickly provides quantization settings with good trade-offs between accuracy and memory. To achieve this, AQ4SViT employs two key ideas. The first is a quantization search strategy that evaluates quantization-setting candidates under an accuracy constraint. The second is a search gating policy that quickly evaluates and selects promising candidates by using membrane potential drift as a performance proxy. Within the search gating policy, AQ4SViT offers two search algorithm variants as trade-off options. Greedy search is fast but may converge to local optima. Beam search is slower but more likely to find the global optimum because it explores a wider search space. Experimental results show that AQ4SViT-Greedy quickly finds appropriate quantization settings, achieving up to 6.6x faster search and up to 82.5% memory savings compared with the state of the art. AQ4SViT-Beam further reduces the memory footprint by up to 90% compared with the state of the art, but with 4.5x longer search time. All these results are obtained while keeping accuracy within 1.5% of the original non-quantized models on the ImageNet dataset. These results highlight that the AQ4SViT framework advances SViT deployment on embedded AI systems.
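The difference between the Greedy and Beam variants can be shown on a tiny per-layer bit-width search. In the sketch below, an additive per-layer penalty table stands in for the membrane-potential-drift proxy and acts as the gate, and the goal is to minimize weight memory under a penalty budget. The layer sizes, bit options, and penalty values are hypothetical, chosen so that greedy gets stuck in a local optimum.

```python
from typing import Dict, List, Sequence


def memory_bits(params: Sequence[int], assign: Sequence[int]) -> int:
    """Weight storage in bits for a (possibly partial) per-layer bit-width assignment."""
    return sum(p * b for p, b in zip(params, assign))


def total_drift(drift: Sequence[Dict[int, float]], assign: Sequence[int]) -> float:
    """Sum of per-layer proxy penalties (a stand-in for membrane-potential drift)."""
    return sum(drift[i][b] for i, b in enumerate(assign))


def greedy_search(params, drift, bit_options, budget) -> List[int]:
    options = sorted(bit_options, reverse=True)          # highest precision first
    assign = [options[0]] * len(params)
    while True:
        best = None
        for i, b in enumerate(assign):
            idx = options.index(b)
            if idx + 1 == len(options):                  # already at the lowest bit-width
                continue
            trial = assign.copy()
            trial[i] = options[idx + 1]
            if total_drift(drift, trial) > budget:       # gate: reject without full evaluation
                continue
            saving = memory_bits(params, assign) - memory_bits(params, trial)
            if best is None or saving > best[0]:
                best = (saving, trial)
        if best is None:
            return assign
        assign = best[1]                                 # commit the single largest saving


def beam_search(params, drift, bit_options, budget, width) -> List[int]:
    beams = [([], 0.0)]                                  # (partial assignment, drift so far)
    for i in range(len(params)):
        extended = []
        for assign, d in beams:
            for b in bit_options:
                nd = d + drift[i][b]
                if nd <= budget:                         # penalties are non-negative: prune early
                    extended.append((assign + [b], nd))
        extended.sort(key=lambda t: memory_bits(params, t[0]))
        beams = extended[:width]                         # keep the `width` cheapest partials
    return min(beams, key=lambda t: memory_bits(params, t[0]))[0]
```

In this example, greedy spends the whole budget on the single largest layer. Beam search with width 2 finds the cheaper solution of quantizing the two smaller layers instead.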

2606.15959 2026-06-16 cs.DC cs.AI cs.LG Cross-list

Quantifying the Impact of Lossy Compression on Neural Generative Surrogate Modeling

Zhimin Li, Harshitha Menon, Charles Jekel, Valerio Pascucci, Peter Lindstrom

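Affiliations * LLNL-CONF-2007282

AI Summary Studies how lossy compression of training data affects the quality of generative surrogate models. It proposes using the uncertainty inherent in neural network training to estimate how much compression error a model can tolerate. This yields up to 39x data storage savings and up to 3x faster training while preserving model quality.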

English Abstract

Neural networks are used for scientific discovery as generative surrogate models, which are trainable approximations of scientific simulations. These models let users replace time-consuming numerical simulations with learned alternatives that provide quick solutions. However, high-fidelity generative surrogate models require massive training datasets, which can create storage and I/O challenges. Lossy compression is a promising way to reduce this burden, but compression errors may affect model quality in subtle ways, making their impact difficult to quantify. In this work, we examine how lossy compression of training data affects the quality of generative surrogate models. We begin by characterizing the uncertainty inherent in training neural networks, showing that identical training configurations can produce different models. By exploiting this variability, we propose a method to estimate how much compression-induced error a surrogate model can tolerate without losing accuracy. Evaluation on two application simulations demonstrates that our approach significantly reduces memory and storage requirements and speeds up training while producing high-quality surrogate models. Lossy compression reduces data storage by up to 23.7x and 39x with negligible impact on the quality of the surrogate model. Reducing the size of the training dataset also speeds up data loading and reduces training time by up to 3x.
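The key idea is to use the model-to-model spread from identical training runs as the noise floor for judging compression damage. A minimal stdlib sketch of that idea, under several assumptions, follows. The tolerance band is mean ± k standard deviations across seeds, and error bounds are accepted in increasing order until one falls outside the band. The scores and error bounds are hypothetical. The paper's actual estimator may differ.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Sequence, Tuple


def tolerance_band(seed_scores: Sequence[float], k: float = 2.0) -> Tuple[float, float]:
    """Quality range explained by training nondeterminism alone (same config, different seeds)."""
    m, s = mean(seed_scores), stdev(seed_scores)
    return m - k * s, m + k * s


def max_tolerable_bound(baseline_seed_scores: Sequence[float],
                        score_by_error_bound: Dict[float, float],
                        k: float = 2.0) -> Optional[float]:
    """Largest compressor error bound whose model stays inside the band (checked in increasing order)."""
    lo, hi = tolerance_band(baseline_seed_scores, k)
    best = None
    for eb in sorted(score_by_error_bound):
        if lo <= score_by_error_bound[eb] <= hi:
            best = eb
        else:
            break          # stop at the first bound that leaves the band
    return best
```

Stopping at the first failure is deliberately conservative. It avoids accepting a loose bound that passes only by chance after a tighter one has failed.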

2606.16131 2026-06-16 cs.CV cs.LG Cross-list

Shift-and-Sum Quantization for Visual Autoregressive Models

Jaehyeon Moon, Bumsub Ham

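Affiliations * Yonsei University; Articron

AI Summary Proposes a post-training quantization framework for visual autoregressive models. Shift-and-sum quantization reduces errors in attention-value products, and a resampling strategy calibrates the data. The framework achieves new state-of-the-art results on image generation and related tasks.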

Comments ICLR 2026

English Abstract

Post-training quantization (PTQ) enables efficient deployment of deep networks using a small set of data. Its application to visual autoregressive models (VAR), however, remains relatively unexplored. We identify two key challenges for applying PTQ to VAR: (i) large reconstruction errors in attention-value products, especially at coarse scales where high attention scores occur more frequently; and (ii) a discrepancy between the sampling frequencies of codebook entries and their predicted probabilities due to limited calibration data. To address these challenges, we propose a PTQ framework tailored for VAR. First, we introduce a shift-and-sum quantization method that reduces reconstruction errors by aggregating quantized results from symmetrically shifted duplicates of value tokens. Second, we present a resampling strategy for calibration data that aligns sampling frequencies of codebook entries with their predicted probabilities. Experiments on class-conditional image generation, inpainting, outpainting, and class-conditional editing show consistent improvements across VAR architectures, establishing a new state of the art in PTQ for VAR.
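Aggregating quantized copies of symmetrically shifted value tokens can reduce error. The stdlib sketch below shows one way this works, under assumptions. It uses two copies shifted by ±Δ/4 and averages their uniform quantizations, where Δ is the quantization step. This makes the effective step Δ/2 and bounds the per-element error by Δ/4 instead of Δ/2. The shift size, the plain averaging, and the toy matrices are assumptions, not the paper's exact configuration.

```python
import math
from typing import List

Matrix = List[List[float]]


def quantize(x: float, step: float) -> float:
    """Uniform round-to-nearest quantizer (ties up), returning the dequantized value."""
    return step * math.floor(x / step + 0.5)


def shift_and_sum(x: float, step: float) -> float:
    """Average the quantizations of x + step/4 and x - step/4 (two shifted duplicates)."""
    s = step / 4
    return 0.5 * (quantize(x + s, step) + quantize(x - s, step))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sq_error(a: Matrix, b: Matrix) -> float:
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Because A·Q(V+s) and A·Q(V−s) can each be computed with the same low-bit kernel, the cost is roughly one extra matrix product per attention-value step.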

2606.16359 2026-06-16 cs.CR cs.LG Cross-list

FEnc$^2$: Unifying Data Packing for Efficient Private Inference via Convolution and Architecture-Aware Fragment Encoding

Ran Ran, Zhaoting Gong, Nuo Xu, Yuanchao Xu, Fan Yao, Wujie Wen

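Affiliations * University of Science and Technology of China; Tsinghua University

AI Summary Proposes FEnc$^2$, a framework that optimizes ciphertext packing for CKKS-based encrypted inference through convolution-aware encoding and architecture-aware ciphertext compression. It reduces rotations and ciphertext counts, achieving end-to-end speedups of up to 228x.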

Comments 15 pages, 9 figures. To appear in ISCA 2026

English Abstract

Fully Homomorphic Encryption (FHE) enables privacy-preserving machine learning but incurs extreme computational and memory overhead. These costs come not only from expensive low-level primitives, including the Number Theoretic Transform (NTT), rotation, and key-switching, but also from inefficient ciphertext packing at the application level. Existing packing strategies typically preserve either neighboring data elements or feature grouping, but not both, leading to wasted ciphertext slots, excessive rotations, and inflated ciphertext counts. We propose FEnc2, a unified and principled fragment-based encoding framework for CKKS-based private convolutional neural network inference. FEnc2 optimizes slot utilization, rotation complexity, and ciphertext density through two components: (1) Conv-aware Encoding, which analytically selects an optimal fragment size to decouple spatial dependencies and jointly minimize inner-outer rotations across layers, and (2) Arch-aware Ct Compression, which restores ciphertext density after feature- or channel-reduction layers. Together, these transformations reshape the structure of the encrypted workload and reduce homomorphic operations by one to two orders of magnitude. With full memory capacity utilized, i.e., at maximum batch size, FEnc2 achieves end-to-end latency speedups over the state-of-the-art Orion of up to 228.83x on GPU and 226.06x on CPU for LeNet on MNIST, and up to 4.55x on GPU and 9.43x on CPU for MobileNet on ImageNet. FEnc2 is hardware-agnostic yet architecturally transformative: by optimizing the encrypted tensor layout before execution, it reduces ciphertext count and workload pressure on hardware, complementing primitive-level optimizations such as NTT and key-switching accelerators. These results show that application-level data layout is a first-order architectural design dimension for encrypted inference and an important enabler for next-generation FHE systems.
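CKKS with ring degree N packs N/2 slots per ciphertext, so the per-channel layout fixes how many ciphertexts a feature map needs. The arithmetic sketch below shows why density drops after a downsampling layer and how compaction restores it. It assumes each channel occupies a fixed slot fragment. It illustrates the slot-count accounting only and is not FEnc2's fragment-selection algorithm. The ring degree and tensor shapes are hypothetical.

```python
import math


def ckks_slots(ring_degree: int) -> int:
    """CKKS packs N/2 complex slots into one ciphertext of ring degree N."""
    return ring_degree // 2


def ciphertext_count(channels: int, fragment: int, slots: int) -> int:
    """Ciphertexts needed when every channel occupies a fixed `fragment` of slots."""
    if fragment >= slots:
        return channels * math.ceil(fragment / slots)   # one channel spans several ciphertexts
    return math.ceil(channels / (slots // fragment))    # several channels share a ciphertext


def utilization(channels: int, used_per_channel: int, n_ct: int, slots: int) -> float:
    """Fraction of all allocated slots that hold real data."""
    return channels * used_per_channel / (n_ct * slots)
```

Suppose a 64×32×32 map is downsampled to 64×16×16 but keeps its 1024-slot fragments. It still occupies 4 ciphertexts at 25% utilization. Repacking each channel into 256-slot fragments fits the same data into a single ciphertext.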

2606.16440 2026-06-16 cs.AR cs.AI cs.LG Cross-list

NeuronFabric: A Software Reference Architecture for On-Chip Transformer Training with Local Adam

Evgeny Ukladchikov

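Affiliations * Independent Researcher

AI Summary Proposes NeuronFabric, a software reference architecture for transformer training with local Adam optimization on FPGAs and ASICs. It uses BF16W weight storage to reduce on-chip memory requirements. Numerical correctness is verified on a 334K-parameter model.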

English Abstract

Publicly documented accelerator architectures generally separate training computation from optimizer-state updates or rely on external memory and host orchestration. This paper presents NeuronFabric, a software reference architecture intended for future FPGA and ASIC implementations of transformer training with local Adam updates. A complete C# prototype implements forward pass, backpropagation, and Adam optimization without external machine-learning frameworks. The goal is to validate numerical correctness and memory requirements before hardware implementation. The evaluated model is a 334K-parameter autoregressive transformer (d=88, H=4, f=264, L=4, vocab=256) trained on the Shakespeare corpus. The BF16W configuration achieves evaluation loss 1.5426 after 80K samples, compared with 1.5224 for an FP32 GPU reference, while producing coherent character-level text. The paper introduces BF16W, which stores weights in BF16 while retaining Adam optimizer moments in FP32. This reduces memory requirements for on-chip training. A 334K-parameter FP32 model with Adam moments requires approximately 4.0 MB, matching the BRAM capacity of a Xilinx ZCU102 device. The BF16W variant requires approximately 3.34 MB, leaving memory available for activation storage. We describe the vocabulary-budget constraint observed during earlier experiments, quantify BF16W memory savings, and outline FPGA training as the next stage of development. No FPGA measurements are included in this paper. This publication serves as a public architectural disclosure and software reference implementation for future FPGA and ASIC exploration of the NeuronFabric architecture.
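The two memory figures in the abstract follow from bytes per parameter. FP32 weights plus Adam's first and second moments take 4 + 4 + 4 = 12 bytes per parameter. BF16W stores the weights in 2 bytes and keeps both FP32 moments, for 10 bytes per parameter. The sketch reproduces that arithmetic using the rounded 334K count, since the exact parameter count is not given here. It also shows one standard way to round FP32 to BF16 (round to nearest, ties to even). That rounding scheme is an assumption, because the abstract does not say which one the prototype uses.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Round an FP32 value to BF16 (nearest, ties to even); NaN handling is omitted."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)   # add half an ulp, plus 1 if the kept LSB is odd
    return (bits >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Widen BF16 back to FP32 by appending 16 zero mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def round_bf16(x: float) -> float:
    return bf16_bits_to_fp32(fp32_to_bf16_bits(x))


def training_state_bytes(n_params: int, weight_bytes: int, moment_bytes: int = 4) -> int:
    """Weights plus Adam's first and second moments (m and v)."""
    return n_params * (weight_bytes + 2 * moment_bytes)
```

`training_state_bytes(334_000, 4)` gives 4,008,000 bytes (about 4.0 MB). `training_state_bytes(334_000, 2)` gives 3,340,000 bytes (3.34 MB), consistent with the figures above. Activations are not included in either figure.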

2606.16599 2026-06-16 cs.AR cs.LG Cross-list

TreeGRNG: Binary Tree Gaussian Random Number Generator for Efficient Probabilistic AI Hardware

Jonas Crols, Guilherme Paim, Shirui Zhao, Marian Verhelst

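AI Summary Proposes TreeGRNG, a binary-tree Gaussian random number generator that replaces arithmetic units with low-cost constant comparators. It maintains distribution accuracy while reducing energy per sample by 3.7x and increasing throughput per unit area by 5.8x, and it allows the shape of the sampled distribution to be adjusted flexibly.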

Comments 6 pages, 5 figures, published in the proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE)

Journal ref
2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6 (2024)
English Abstract

Bayesian Neural Networks (BNNs) offer opportunities to greatly enhance the trustworthiness of conventional neural networks by monitoring the uncertainties in decision-making. A significant drawback for BNN inference at the extreme edge, however, is the need to incorporate Gaussian Random Number Generators (GRNGs) within each neuron. State-of-the-art GRNG algorithms depend heavily on multiple arithmetic operations and extensive look-up tables, which makes them difficult to implement on ultra-low-power hardware. To overcome this, this paper presents an innovative binary tree random number generator (TreeGRNG) that uses ultra-low-cost constant comparators instead of arithmetic units. We further enhance TreeGRNG with a set of hardware-aware optimizations that exploit the properties of the Gaussian distribution. The optimized TreeGRNG surpasses the state of the art (SoTA) in distribution accuracy while reducing energy per sample by 3.7$\times$ and increasing throughput per unit area by 5.8$\times$. Moreover, TreeGRNG is more flexible than the current SoTA: designers can easily adjust the shape of the sampled probability distribution, which goes beyond the capabilities of traditional GRNGs and opens the way toward future probabilistic AI designs. The TreeGRNG design is available as open source at the link provided by the authors.
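One way to sample a distribution with only constant comparisons is to walk a binary tree whose nodes hold fixed CDF thresholds. Each level compares a uniform random value against a constant and sets one bit of the output index. The output values form an equally spaced grid, so the index needs no further lookup. The stdlib sketch below assumes this construction; TreeGRNG's actual tree and its hardware-aware optimizations may differ. The depth and output range are illustrative choices.

```python
import random
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev
from typing import Callable, List, Tuple


def build_tree(depth: int, lo: float = -4.0, hi: float = 4.0,
               cdf: Callable[[float], float] = NormalDist().cdf) -> Tuple[List[float], List[float]]:
    """Comparator constants and leaf values for a depth-`depth` sampling tree."""
    n = 1 << depth
    step = (hi - lo) / n
    values = [lo + (k + 0.5) * step for k in range(n)]          # equally spaced output grid
    thresholds = [cdf(lo + k * step) for k in range(1, n)]     # target CDF at inner bin edges
    return thresholds, values


def tree_sample(u: float, thresholds: List[float], depth: int) -> int:
    """One constant comparison per tree level; each comparison decides one output bit."""
    idx = 0
    for level in range(depth):
        bit = 1 << (depth - 1 - level)
        if u >= thresholds[idx + bit - 1]:   # node constant selected by the bits decided so far
            idx |= bit
    return idx
```

Changing the distribution shape only requires swapping the constants: passing a different `cdf`, such as a Laplace CDF, to `build_tree` reshapes the sampler without changing the tree.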

2606.16981 2026-06-16 cs.DB cs.LG Cross-list

Decoupling Inference from State Updates in Low-Latency Feature Engines via Probabilistic Thinning

Augusto Peres, Iker Perez, Pedro Valdeira, Guilherme Jardim, Ana Sofia Gomes, Hugo Ferreira, Pedro Bizarro

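AI Summary Proposes probabilistic thinning to decouple inference from state persistence in streaming machine learning pipelines. Only informative events trigger state updates, which significantly reduces storage I/O and serialization overhead while preserving downstream utility.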

English Abstract

Streaming data systems increasingly underpin machine learning workflows that maintain large numbers of continuously updated aggregations. In production settings, each incoming event typically triggers read-modify-write operations on persistent storage, making high-frequency state updates a dominant source of latency, contention, and operational cost. In this work, we decouple inference from state persistence in streaming machine learning pipelines via probabilistic thinning: every event is scored, but durable state updates are triggered selectively by informative events. Unlike approaches that shed input or state, we show that persistence-path control is achievable without a high-frequency in-memory control plane or cross-worker coordination, relying exclusively on approximate statistics retrieved from disk-backed key-value stores. We model the resulting stochastic processes, derive bounds on filtering rates, and prove that common time-based aggregations remain unbiased under variance-aware formulations, preventing systematic error accumulation. We evaluate the approach in a controlled setting that isolates per-event costs, demonstrating substantial reductions in storage input/output (I/O) and serialization overhead. Across experiments, up to 90% of events are excluded from the persistence path while downstream utility is preserved and in some cases improved.
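The standard way to keep a thinned aggregate unbiased is inverse-probability (Horvitz-Thompson) weighting. An event persisted with probability p contributes x/p, so its expected contribution is exactly x. The stdlib sketch below assumes a keep probability proportional to |x|, capped at 1. The `scale` constant is a hypothetical tuning knob, and the paper's variance-aware formulation and its statistics from the key-value store are not reproduced here.

```python
import random


class ThinnedSum:
    """Running sum whose persistent updates are probabilistically thinned."""

    def __init__(self, scale: float, rng: random.Random):
        self.scale = scale      # hypothetical knob: larger values mean fewer writes
        self.rng = rng
        self.total = 0.0        # the persisted aggregate
        self.writes = 0         # read-modify-write operations actually issued

    def keep_probability(self, x: float) -> float:
        # Larger-magnitude (more informative) events are persisted more often.
        return min(1.0, abs(x) / self.scale)

    def observe(self, x: float) -> None:
        p = self.keep_probability(x)
        if p > 0 and self.rng.random() < p:
            self.total += x / p     # inverse-probability weight: E[increment] = x
            self.writes += 1
```

With uniform inputs in [0, 1) and `scale=5.0`, about 10% of events reach storage, and the estimated sum stays within about 1% of the true sum.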

2606.17016 2026-06-16 cs.CL cs.AI cs.LG cs.MA Cross-list

TokenPilot: Cache-Efficient Context Management for LLM Agents

Buqiang Xu, Zirui Xue, Dianmou Chen, Chenyang Fu, Chiyu Wu, Caiying Huang, Chen Jiang, Jizhan Fang, Xinle Deng, Yijun Chen, Yunzhi Yao, Xuehai Wang, Jin Shang, Gong Yu, Ningyu Zhang

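Affiliations * Zhejiang University; University of Electronic Science and Technology of China; Xi'an University of Electronic Science and Technology; HomologyAI

AI Summary Addresses the high inference cost caused by context accumulation in long-horizon LLM-agent sessions. It proposes TokenPilot, a dual-granularity context management framework that combines ingestion-aware compaction with lifecycle-aware eviction. TokenPilot reduces cost by 56% to 87% while maintaining performance.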

Comments LightMem Series: Work in Progress

English Abstract

As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alter layouts, introducing prefix mismatches and cache invalidation. This reveals a critical trade-off between text sparsity and prompt cache continuity. To address this, we present TokenPilot, a dual-granularity context management framework. Globally, Ingestion-Aware Compaction acts as a framework harness to stabilize prompt prefixes and eliminate open-world environmental noise at the ingestion gate. Locally, Lifecycle-Aware Eviction monitors the ongoing residual utility of context segments, enforcing a conservative batch-turn schedule to offload content segments only when task relevance expires. Experiments on PinchBench and Claw-Eval under both isolated and continuous modes demonstrate that TokenPilot reduces costs by 61% and 56% in isolated mode, and 61% and 87% in continuous mode, while maintaining competitive performance compared to prior systems. TokenPilot has been integrated into LightMem2 at https://github.com/zjunlp/LightMem2.
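The sketch below shows the trade-off between text sparsity and prompt-cache continuity. Prompt caches reuse only the longest unchanged prefix, so evicting a segment from the middle of the context forces everything after it to be reprocessed. The toy simulation compares evicting old segments on every turn with evicting them in batches. Oldest-first eviction, the segment sizes, and the 0.1 cached-token price are all hypothetical simplifications. This is not TokenPilot's relevance-based policy.

```python
from typing import List, Sequence, Tuple


def reusable_prefix(prev: Sequence, curr: Sequence) -> int:
    """Number of leading tokens shared by consecutive prompts (what a prefix cache can reuse)."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def simulate(turns: int, window: int, slack: int,
             sys_len: int = 4, seg_len: int = 3) -> Tuple[int, int]:
    """Return (cache-missed tokens, total prompt tokens) over `turns` turns.

    Once more than window + slack segments are held, the oldest ones are evicted in
    one batch down to `window`; slack=0 means sliding eviction on every turn.
    """
    system = [("sys", i) for i in range(sys_len)]
    segments: List[int] = []
    prev: list = []
    missed = total = 0
    for t in range(turns):
        segments.append(t)
        if len(segments) > window + slack:
            segments = segments[-window:]
        curr = system + [(s, j) for s in segments for j in range(seg_len)]
        missed += len(curr) - reusable_prefix(prev, curr)
        total += len(curr)
        prev = curr
    return missed, total


def billed_tokens(missed: int, total: int, cached_price: float = 0.1) -> float:
    """Hypothetical pricing: cache hits cost `cached_price` of a full token."""
    return missed + cached_price * (total - missed)
```

Batched eviction keeps a longer prompt (69 vs 57 tokens over six turns) but misses the cache less often (25 vs 34 tokens). Under the assumed pricing, that makes it cheaper overall.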

2606.17034 2026-06-16 cs.CL cs.LG Cross-list

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

Mufei Li, Shikun Liu, Dongqi Fu, Haoyu Wang, Yinglong Xia, Hong Li, Hong Yan, Pan Li

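Affiliations * Georgia Institute of Technology; Meta

AI Summary Proposes KVEraser, which learns to steer the KV cache for localized context erasing without global recomputation. On long-context tasks it nearly matches full recomputation while increasing latency by only 24%.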

Comments Oral at the ICML 2026 Workshop on the Impact of Memorization on Trustworthy Foundation Models

English Abstract

Post-hoc context erasing over the KV cache is challenging because a local edit has a global consequence: once a span has been processed, its influence propagates into the cached states of all subsequent tokens. This issue arises naturally in long-context LLM applications, where stale retrieved facts, incorrect tool observations, retracted user preferences, or harmful prompt injections may be identified only after prefill. Exact erasing must then recompute all tokens after the deleted span, making its computational cost depend on suffix length rather than erased-span length. We introduce KVEraser, a learned KV-cache editing method for efficient localized context erasing. Given a processed context and a span to remove, KVEraser replaces only the KV states of the erased interval with learned steering states while reusing the remaining cache unchanged. To learn a transferable erasing mechanism, we build a two-stage training pipeline: generic span-neighbor pre-training teaches the eraser to suppress the influence of the erased span, while task-specific fine-tuning adapts this capability to downstream scenarios. Experiments show that KVEraser nearly matches full recomputation in post-erasure performance on in-domain tasks across 1K--32K context lengths, while its latency increases by only 24% compared with a 17.6x increase for full recomputation. KVEraser also generalizes to unseen long-document QA tasks with harmful factual distractors, achieving the best performance among approximate baselines with a 3--4x speedup over full recomputation.

2507.20424 2026-06-16 cs.LG cs.DC Updated version

Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning

Tolga Dimlioglu, Anna Choromanska

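Affiliations * Electrical Engineering Dept., New York University

AI Summary Proposes the Distributed Pull-Push Force (DPPF) algorithm, which uses a lightweight regularizer to encourage workers to collaboratively seek wide minima. DPPF improves generalization while remaining communication-efficient, and the theoretical analysis proves that it is self-stabilizing and convergent.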

Comments Accepted to UAI 2026 - 9 pages main body, 33 pages of supplementary material for hyperparameter configurations, full proofs of theorems and additional results

English Abstract

We study centralized distributed data-parallel training of deep neural networks (DNNs), aiming to improve the trade-off between communication efficiency and model performance of local gradient methods. To this end, we revisit the flat-minima hypothesis, which suggests that models with better generalization tend to lie in flatter regions of the loss landscape. We introduce a simple yet effective sharpness measure, Inverse Mean Valley, and demonstrate its strong correlation with the generalization gap of DNNs. We incorporate an efficient relaxation of this measure into the distributed training objective as a lightweight regularizer that encourages workers to collaboratively seek wide minima. The regularizer exerts a pushing force that counteracts the consensus step pulling the workers together, giving rise to the Distributed Pull-Push Force (DPPF) algorithm. Empirically, we show that DPPF outperforms other communication-efficient approaches and achieves better generalization than local gradient methods and synchronous gradient averaging, while maintaining communication efficiency. In addition, our loss landscape visualizations confirm the ability of DPPF to locate flatter minima. On the theoretical side, we show that DPPF guides workers to span flat valleys, with the final valley width governed by the interplay between push and pull strengths, and that its pull-push dynamics are self-stabilizing. We further provide generalization guarantees linked to the valley width and prove convergence in the non-convex setting.
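A simplified push force shows why the interplay of pull and push can set the spread of the workers. Suppose each consensus step pulls a worker toward the mean by a fraction α of its distance and pushes it outward by a fixed length β. The distance to the mean then evolves as d ← (1 − α)d + β, which converges to β/α from either side. The stdlib sketch below assumes this unit-vector push. It is not the paper's Inverse Mean Valley regularizer and omits the local SGD steps between consensus rounds.

```python
import math
from typing import List

Vec = List[float]


def pull_push_step(workers: List[Vec], alpha: float, beta: float) -> List[Vec]:
    """One consensus round: pull each worker toward the mean, push it away by a fixed length."""
    dim = len(workers[0])
    mean = [sum(w[k] for w in workers) / len(workers) for k in range(dim)]
    out = []
    for w in workers:
        diff = [w[k] - mean[k] for k in range(dim)]
        norm = math.sqrt(sum(d * d for d in diff))
        unit = [d / norm for d in diff] if norm > 0 else [0.0] * dim   # no push if at the mean
        out.append([w[k] - alpha * diff[k] + beta * unit[k] for k in range(dim)])
    return out
```

Two workers that start far apart or close together both settle at a distance β/α from their mean, so the pair ends up 2β/α apart. This matches the self-stabilizing behavior described above in spirit.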

2510.04212 2026-06-16 cs.LG cs.AI Updated version

Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

Haiquan Qiu, Quanming Yao

Affiliations * Department of Electronic Engineering, Tsinghua University; Beijing National Research Center for Information Science and Technology; State Key Laboratory of Space Network and Communications

AI Summary Gives the first mechanistic explanation of why flash attention causes training collapse in low precision. It identifies a vicious cycle between similar low-rank representations and biased rounding errors, and it stabilizes training with a minimal modification.

Comments ICLR 2026

Abstract

The pursuit of computational efficiency has driven the adoption of low-precision formats for training transformer models. However, this progress is often hindered by notorious training instabilities. This paper provides the first mechanistic explanation for a long-standing and unresolved failure case where training with flash attention in low-precision settings leads to catastrophic loss explosion. Our in-depth analysis reveals that the failure is not a random artifact but caused by two intertwined phenomena: the emergence of similar low-rank representations within the attention mechanism and the compounding effect of biased rounding errors inherent in low-precision arithmetic. We demonstrate how these factors create a vicious cycle of error accumulation that corrupts weight updates, ultimately derailing the training dynamics. To validate our findings, we introduce a minimal modification to flash attention that mitigates the bias in rounding errors. This simple change stabilizes the training process, confirming our analysis and offering a practical solution to this persistent problem. Code is available at https://github.com/ucker/why-low-precision-training-fails.
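
The danger of biased rounding can be seen without any attention kernel. The sketch below is a generic illustration, not the paper's analysis or fix. It uses NumPy's `float16` as a stand-in for the low-precision formats discussed, since NumPy has no native bfloat16. It repeatedly adds a small increment under round-to-nearest. Every individual rounding error has the same sign, so the errors accumulate instead of cancelling.

```python
import numpy as np

step = np.float16(1e-4)   # increment below half a float16 ulp at 1.0 (ulp = 2**-10)
acc_fp16 = np.float16(1.0)
acc_fp64 = 1.0
errors = []               # signed rounding error introduced by each float16 addition

for _ in range(1000):
    exact = float(acc_fp16) + float(step)       # what the addition should produce
    acc_fp16 = np.float16(acc_fp16 + step)      # round-to-nearest back to float16
    errors.append(float(acc_fp16) - exact)
    acc_fp64 += float(step)                     # high-precision reference

# acc_fp16 never leaves 1.0, while the float64 reference grows to about 1.1:
# every per-step error is negative, so the errors compound rather than average out.
print(acc_fp16, acc_fp64, min(errors), max(errors))
```

Higher-precision accumulation and stochastic rounding are common ways to remove this kind of directional bias. The paper's own minimal modification to Flash Attention is not reproduced here.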

2510.16882 2026-06-16 cs.LG cs.AI cs.CL Version update

Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning

Heming Zou, Yixiu Mao, Yun Qu, Qi Wang, Xiangyang Ji

Affiliations: University of Science and Technology of China

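AI summary: Proposes UDS, a framework for efficient online batch selection that uses the nuclear norm of the logits matrix and a lightweight memory buffer. It accounts for both data utility and diversity without external resources, outperforms existing methods on multiple benchmarks, and reduces training time.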

Comments ICML 2026 accepted paper

Abstract

Supervised fine-tuning (SFT) is a commonly used technique to adapt large language models (LLMs) to downstream tasks. In practice, SFT on a full dataset is computationally expensive and sometimes suffers from overfitting or bias amplification. This has motivated data curation in SFT, which prioritizes the most valuable data for optimization. This work studies the online batch selection family that dynamically scores and filters samples during the training process. However, existing popular methods often (i) rely merely on the utility of data to select a subset while neglecting other crucial factors like diversity, (ii) rely on external resources such as reference models or validation sets, and (iii) incur extra training time over full-dataset training. To address these limitations, this work develops UDS (Utility-Diversity Sampling), a framework for efficient online batch selection in SFT. UDS leverages the nuclear norm of the logits matrix to capture both data utility and intra-sample diversity, while estimating inter-sample diversity through efficient low-dimensional embedding comparisons with a lightweight memory buffer of historical samples. Such a design eliminates the need for external resources and unnecessary backpropagation, securing computational efficiency. Experiments on multiple benchmarks demonstrate that UDS consistently outperforms state-of-the-art online batch selection methods under varying data budgets, and significantly reduces training time compared to full-dataset fine-tuning. Code is available at https://github.com/gfyddha/UDS.
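
A minimal sketch of the two scoring signals the abstract names, under these assumptions:

- Each sample has a logits matrix of shape (sequence length, vocabulary) and an embedding vector.
- The two signals are combined additively, weighted by a hypothetical `lam`.

The nuclear norm (the sum of singular values) grows when a sample's logits are both large and spread across many directions. Inter-sample diversity is taken here as one minus the maximum cosine similarity to a bounded buffer of previously selected embeddings. The abstract does not specify the exact UDS scoring or buffer policy, so the function names and combination rule are illustrative.

```python
import numpy as np
from typing import List


def utility_score(logits: np.ndarray) -> float:
    """Nuclear norm of a (seq_len, vocab) logits matrix: sum of its singular values."""
    return float(np.linalg.norm(logits, ord="nuc"))


def inter_sample_diversity(emb: np.ndarray, memory: List[np.ndarray]) -> float:
    """1 - max cosine similarity to buffered embeddings (1.0 when the buffer is empty)."""
    if not memory:
        return 1.0
    M = np.stack(memory)
    sims = M @ emb / (np.linalg.norm(M, axis=1) * np.linalg.norm(emb) + 1e-12)
    return float(1.0 - sims.max())


def select_batch(logits_list, embs, memory, k, lam=1.0, mem_size=64):
    """Score candidates, keep the top-k, and append their embeddings to a FIFO buffer."""
    scores = [utility_score(L) + lam * inter_sample_diversity(e, memory)
              for L, e in zip(logits_list, embs)]
    chosen = sorted(np.argsort(scores)[::-1][:k].tolist())
    for i in chosen:
        memory.append(embs[i])
    del memory[:-mem_size]          # keep only the most recent mem_size embeddings
    return chosen


memory: List[np.ndarray] = []
logits_list = [np.eye(4), np.zeros((4, 4)), 2 * np.eye(4)]
embs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]
picked = select_batch(logits_list, embs, memory, k=2)
```

The nuclear norm and the cosine-based term live on different scales. In practice `lam`, or a normalization of each term, would need tuning.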

2511.18689 2026-06-16 cs.LG Version update

QuantKAN: A Unified Quantization Framework for Kolmogorov Arnold Networks

Kazi Ahmed Asif Fuad, Lizhong Chen

Affiliations: Department of EECS, Oregon State University

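AI summary: Proposes QuantKAN, the first unified quantization-aware training and post-training quantization framework for KANs. It handles heterogeneous parameters with branch-aware quantizers, establishes benchmarks on multiple datasets, and reveals architecture-specific failure modes.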

Abstract

Kolmogorov–Arnold Networks (KANs) replace linear weights with spline-based functions, offering strong expressivity but posing challenges for low-precision deployment due to heterogeneous parameter distributions. We introduce QuantKAN, the first unified framework for quantization-aware training (QAT) and post-training quantization (PTQ) of KANs. The framework employs branch-aware quantizers for base and spline parameters and extends modern QAT and PTQ methods to spline-based layers across EfficientKAN, FastKAN, PyKAN, and KAGN. Experiments on MNIST, CIFAR-10/100, TinyImageNet, and ImageNet provide the first unified QAT/PTQ KAN benchmarks and show that DSQ is the most robust QAT method at aggressive low-bit settings, while GPTQ is the strongest PTQ method at moderate precision. Sensitivity analyses reveal architecture-specific failure modes: spline/basis parameters dominate in FastKAN, while base or scaling parameters dominate in EfficientKAN, GRAM, and PyKAN. Vivado HLS estimates on a Xilinx UltraScale+ device further suggest up to 3.32× throughput and 7.7× lower estimated dynamic energy per inference under W4A4, exposing a residual "basis-evaluation tax" that motivates basis-aware microarchitecture. QuantKAN is available at https://github.com/OSU-STARLAB/QuantKAN/.
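
A minimal sketch can illustrate the motivation for branch-aware quantizers. It assumes symmetric uniform fake-quantization with one max-abs scale per tensor. The synthetic parameters give a hypothetical base branch a much wider range than the spline coefficients. A single shared scale is then set by the base weights and rounds most spline coefficients to zero; separate per-branch scales avoid that. This is not QuantKAN's quantizer, and the distributions are invented for illustration.

```python
import numpy as np


def fake_quant(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform fake-quantization with a single per-tensor max-abs scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax, qmax) * scale


rng = np.random.default_rng(0)
base_w = rng.normal(0.0, 1.0, size=256)      # hypothetical base-branch weights
spline_c = rng.normal(0.0, 0.01, size=256)   # hypothetical spline coefficients (much smaller range)

# Shared quantizer: one scale covers both branches, so the base weights set it.
shared = fake_quant(np.concatenate([base_w, spline_c]), bits=4)
spline_shared = shared[256:]

# Branch-aware quantizer: the spline branch gets its own scale.
spline_branch = fake_quant(spline_c, bits=4)

mse_shared = float(np.mean((spline_shared - spline_c) ** 2))
mse_branch = float(np.mean((spline_branch - spline_c) ** 2))
```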

2602.00482 2026-06-16 cs.LG Version update

AREAL-DTA: Dynamic Tree Attention for Efficient Reinforcement Learning of Large Language Models

Jiarui Zhang, Yuchen Yang, Ran Yan, Zhiyu Mei, Liyuan Zhang, Daifeng Li, Wei Fu, Jiaxuan Gao, Shusheng Xu, Yi Wu, Binhang Yuan

Affiliations: University of Science and Technology of China

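AI summary: Rollout sequences in RL post-training share prefixes, which causes redundant computation. The paper proposes a dynamic tree attention mechanism based on depth-first search, combined with load-balanced distributed batching, to share prefixes efficiently. Training throughput improves by up to 8.31×.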

Comments Accepted at ICML 2026. Camera-ready version. Code: https://github.com/areal-project/AReaL/tree/feat/dta

Abstract

Reinforcement learning (RL)-based post-training for large language models (LLMs) is computationally expensive, as it generates many rollout sequences that frequently share long token prefixes. Existing RL frameworks usually process these sequences independently during policy training, i.e., repeatedly recomputing identical prefixes in both the forward and backward passes of policy gradient computation, leading to substantial inefficiencies in computation resources and memory usage. Although prefix sharing naturally induces a tree structure over rollouts, packed tree-mask approaches scale poorly in RL settings. In this paper, we introduce AReaL-DTA, which efficiently exploits prefix sharing in RL training. AReaL-DTA employs a depth-first search (DFS)-based execution strategy that dynamically traverses the rollout prefix tree during both forward and backward computation, materializing only a single root-to-leaf path at a time. To further improve scalability, AReaL-DTA incorporates a load-balanced distributed batching mechanism that dynamically constructs and processes prefix trees across multiple GPUs. On τ²-bench, AReaL-DTA improves training throughput by up to 8.31× over dense training and up to 1.70× over sparse training. Our code is available at https://github.com/areal-project/AReaL/tree/feat/dta.
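
A minimal sketch of the prefix-sharing idea, assuming rollouts are plain lists of token IDs. The rollouts are inserted into a prefix tree (trie). A depth-first traversal then keeps only one root-to-leaf path materialized at a time while visiting every shared prefix token once. The sketch only counts tokens. It does not run a model or reproduce AReaL-DTA's forward/backward scheduling or distributed batching, and all names are hypothetical.

```python
from typing import Dict, List, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.end = False            # True if some rollout ends at this node


def build_prefix_tree(rollouts: List[List[int]]) -> TrieNode:
    root = TrieNode()
    for seq in rollouts:
        node = root
        for tok in seq:
            node = node.children.setdefault(tok, TrieNode())
        node.end = True
    return root


def dfs_schedule(root: TrieNode) -> Tuple[List[Tuple[int, ...]], int]:
    """Return root-to-leaf paths in DFS order and the number of token positions computed."""
    paths: List[Tuple[int, ...]] = []
    path: List[int] = []            # the single path currently materialized
    computed = 0

    def visit(node: TrieNode) -> None:
        nonlocal computed
        for tok, child in node.children.items():
            path.append(tok)
            computed += 1           # each tree node is processed once and shared by its descendants
            if child.end:
                paths.append(tuple(path))
            visit(child)
            path.pop()              # release this token before moving to a sibling branch

    visit(root)
    return paths, computed


rollouts = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 6]]
paths, shared_tokens = dfs_schedule(build_prefix_tree(rollouts))
dense_tokens = sum(len(r) for r in rollouts)   # independent processing recomputes every prefix
```

Here 11 token positions shrink to 6. The saving grows with the number of rollouts per prompt and the length of the shared prefix.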

2602.06694 2026-06-16 cs.LG Version update

NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models

Hyochan Chong, Dongkyu Kim, Changdong Kim, Minseop Choi

Affiliations: KAIST

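AI summary: Proposes NanoQuant, a post-training quantization method that compresses large language models to binary and sub-1-bit levels. It casts quantization as a low-rank binary factorization problem, enabling large-scale deployment on consumer hardware.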

Comments Accepted to ICML 2026. Hyochan Chong and Dongkyu Kim contributed equally to this work

Abstract

Weight-only quantization has become a standard approach for efficiently serving large language models (LLMs). However, existing methods fail to efficiently compress models to binary (1-bit) levels, as they either require large amounts of data and compute or incur additional storage. In this work, we propose NanoQuant, the first post-training quantization (PTQ) method to compress LLMs to both binary and sub-1-bit levels. NanoQuant formulates quantization as a low-rank binary factorization problem, and compresses full-precision weights to low-rank binary matrices and scales. Specifically, it utilizes an efficient alternating direction method of multipliers (ADMM) solver to precisely initialize latent binary matrices and scales, and then tunes the initialized parameters through a block and model reconstruction process. Consequently, NanoQuant establishes a new Pareto frontier in low-memory post-training quantization, and enables sub-1-bit compression. NanoQuant makes large-scale deployment feasible on consumer hardware. For example, it compresses Llama2-70B by 25.8× in just 13 hours on a single H100, enabling a 70B model to operate on a consumer 8 GB GPU. Code is available at https://github.com/SamsungLabs/NanoQuant.
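
The sketch below shows the low-rank binary factorization idea and why it can go below one bit per weight. It is a simplified stand-in, not NanoQuant:

- It replaces the ADMM initialization and reconstruction tuning with greedy fitting of rank-1 terms `alpha * outer(u, v)`, where `u` and `v` take values in {−1, +1} and are found by alternating sign updates on the residual.
- It uses one scalar scale per term.
- The storage estimate assumes 1 bit per sign and a hypothetical 16-bit scale per term.

```python
import numpy as np


def binary_rank1(W: np.ndarray, iters: int = 10):
    """Fit W ≈ alpha * outer(u, v) with u, v in {-1, +1} by alternating sign updates."""
    v = np.sign(W.sum(axis=0))
    v[v == 0] = 1                           # break ties so every entry is ±1
    for _ in range(iters):
        u = np.sign(W @ v)
        u[u == 0] = 1
        v = np.sign(W.T @ u)
        v[v == 0] = 1
    alpha = float(u @ W @ v) / W.size       # least-squares scale, since ||outer(u, v)||_F^2 = m * n
    return alpha, u, v


def binary_lowrank(W: np.ndarray, rank: int, iters: int = 10):
    """Greedy residual fitting: each term explains part of what earlier terms missed."""
    R = W.astype(float).copy()
    terms = []
    for _ in range(rank):
        alpha, u, v = binary_rank1(R, iters)
        terms.append((alpha, u, v))
        R -= alpha * np.outer(u, v)
    return terms


def reconstruct(terms, shape):
    W_hat = np.zeros(shape)
    for alpha, u, v in terms:
        W_hat += alpha * np.outer(u, v)
    return W_hat


def bits_per_weight(m: int, n: int, rank: int, scale_bits: int = 16) -> float:
    """Sign bits for u and v plus one scale per rank-1 term, divided by the m * n weights."""
    return (rank * (m + n) + rank * scale_bits) / (m * n)


u0 = np.array([1.0, 1.0, -1.0])
v0 = np.array([1.0, -1.0, 1.0, 1.0])
W = 0.5 * np.outer(u0, v0)                  # exactly representable by one binary rank-1 term
W_hat = reconstruct(binary_lowrank(W, rank=1), W.shape)
```

For a 4096 × 4096 matrix at rank 512 this gives about 0.25 bits per weight. Real weights are not exactly low-rank binary, which is why the method described above adds ADMM initialization and block/model reconstruction.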

2604.02343 2026-06-16 cs.LG cs.AI cs.IT math.IT Version update

Haiku to Opus in Just 10 bits: LLMs Unlock Large Compression Gains

Roy Rinberg, Annabelle Michael Carrell, Simon Henniger, Nicholas Carlini, Keri Warr

Affiliations: Harvard University; University of Cambridge; Anthropic

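AI summary: Studies lossless and lossy compression of LLM-generated text. Proposes Question-Asking (QA) compression, an interactive protocol that uses a handful of binary questions to reach compression ratios more than 100× smaller than prior LLM-based compression, transferring knowledge efficiently.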

Abstract

We study the compression of LLM-generated text across lossless and lossy regimes, characterizing a compression-compute frontier where more compression is possible at the cost of more compute. For lossless compression, domain-adapted LoRA adapters can improve LLM-based arithmetic coding by 2x over compression with the base LLM alone. For lossy compression, prompting a model for a succinct rewrite then applying arithmetic coding can achieve compression ratios of approximately 0.03, a 2x improvement over compressing the original response. We further introduce Question-Asking compression (QA), an interactive lossy protocol inspired by the game 'Twenty Questions'. A small model iteratively refines its response by asking yes/no questions to a stronger model, transferring exactly one bit per answer. On 8 benchmarks spanning math, science, and code, 10 binary questions recover 23% to 72% of the capability gap between a small and large model on standard benchmarks and 7% to 38% on harder benchmarks, achieving compression ratios of 0.0006 to 0.004. This is over 100x smaller than prior LLM-based compression (Deletang et al., 2024), suggesting that interactive protocols can transfer knowledge far more efficiently than transmitting full responses.
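
Two quantities in the abstract can be made concrete with a few lines of arithmetic:

- The standard link between a language model and arithmetic coding. An ideal coder spends −log2 p bits on a token the model predicts with probability p, and practical arithmetic coders come within a small constant of that total.
- The bit budget of the QA protocol, which is one bit per yes/no answer.

The compression-ratio definition used here (compressed bits over the original UTF-8 bits) is an assumption; the paper's exact accounting may differ.

```python
import math
from typing import Iterable


def ideal_code_length_bits(token_probs: Iterable[float]) -> float:
    """Bits an ideal arithmetic coder spends: sum of -log2 p over the emitted tokens."""
    return sum(-math.log2(p) for p in token_probs)


def compression_ratio(compressed_bits: float, original_text: str) -> float:
    """Compressed size divided by original size, both in bits (assumed definition)."""
    return compressed_bits / (8 * len(original_text.encode("utf-8")))


# Model-based coding: a confident model makes each token cheap.
coded_bits = ideal_code_length_bits([0.5, 0.25, 0.125])   # 1 + 2 + 3 bits

# QA compression: ten yes/no answers transmit exactly ten bits,
# regardless of how long the refined response is.
qa_bits = 10 * 1
ratio_for_1kb_response = compression_ratio(qa_bits, "x" * 1000)
```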

2605.27599 2026-06-16 cs.LG cs.AI cs.AR cs.DC cs.PF Version update

The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution

Deepak Panigrahy, Aakash Tyagi

Affiliations: Independent Researcher; Texas A&M University

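AI summary: Audits the energy observability of the ASUS Ascent GX10 (GB10 SoC) platform. The platform lacks key interfaces such as CPU energy counters, so process-level energy attribution comparable to RAPL on x86 is not possible. The paper proposes an interim workaround calibrated with external DC metering and GPU subtraction, and calls for energy observability to be a first-class hardware requirement.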

Abstract

Agentic AI workloads (where a single user goal triggers multi-step orchestration, tool calls, retries, and failure recovery) are being targeted for edge deployment, with NVIDIA, Dell, HP, ASUS, MSI, Acer, and Gigabyte all shipping GB10-based desktop AI systems in 2026. We recently demonstrated that orchestration structure dominates agentic energy cost, with workflows consuming 4.33x more energy per successful goal than linear baselines and OOI reaching 7.63x for multi-step reasoning tasks. Separately, Raj et al. show that CPU-side processing accounts for up to 90.6% of total latency and 44% of total dynamic energy in agentic workloads. We report a systematic energy-observability audit of the ASUS Ascent GX10 (GB10 SoC) and find that the platform exposes no CPU energy counter, no INA power-rail monitor, no IPMI/BMC, and no SCMI powercap protocol through any supported software interface. The only on-device energy telemetry is instantaneous GPU power via NVML. We further discover that the MediaTek firmware already computes per-rail energy internally via an undocumented ACPI interface (SPBM), but NVIDIA states there are "no plans to expose CPU rail information." On-device per-process energy attribution (as performed on x86 via RAPL) is therefore not reproducible on this platform through supported interfaces. We formalize a hardware requirements specification for energy-attributed AI, propose an interim calibration bridge for per-domain energy decomposition (confirmed on the Acer Veriton GN100, where CPU energy accumulators are live), and identify a standards-track path via SCMI powercap. Our findings motivate the low-carbon computing community to demand energy observability as a first-class hardware requirement.
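
The interim calibration bridge described above pairs external DC metering with GPU subtraction. The sketch below shows that subtraction under these assumptions:

- Time-stamped power samples in watts come from a hypothetical external meter and from NVML.
- An idle-power baseline is measured separately.
- Energy is obtained by trapezoidal integration.

The function names are hypothetical, and the paper's actual calibration procedure is not reproduced. NVML's `nvmlDeviceGetPowerUsage` reports milliwatts, so its samples would need dividing by 1000 first.

```python
from typing import Sequence


def integrate_energy_j(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Trapezoidal integration of power samples (W) over time (s), giving joules."""
    return sum((t1 - t0) * (p0 + p1) / 2.0
               for t0, t1, p0, p1 in zip(times_s, times_s[1:], power_w, power_w[1:]))


def non_gpu_energy_j(meter_t, meter_w, gpu_t, gpu_w, idle_w: float = 0.0) -> float:
    """Energy attributed to everything except the GPU: metered - GPU - idle baseline."""
    e_meter = integrate_energy_j(meter_t, meter_w)
    e_gpu = integrate_energy_j(gpu_t, gpu_w)
    duration = meter_t[-1] - meter_t[0]
    return e_meter - e_gpu - idle_w * duration


# Hypothetical 10 s window: 100 W metered, 60 W reported for the GPU, 10 W idle baseline.
t = [0.0, 5.0, 10.0]
residual = non_gpu_energy_j(t, [100.0, 100.0, 100.0], t, [60.0, 60.0, 60.0], idle_w=10.0)
```

The residual still lumps CPU, memory, and other rails together and cannot be split per process. That is the gap the abstract attributes to the missing on-device counters.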

2606.06302 2026-06-16 cs.LG cs.SE Version update

Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving

Hyungmin Kim, Minsoo Kim, Hongseok Kim, Jungwook Choi

Affiliations: Hanyang University; Rebellions (Republic of Korea)

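AI summary: In multi-turn LLM serving, the KV cache grows linearly and strains GPU memory and bandwidth. Tangram makes non-uniform KV caches efficient through three techniques (Budget Reservation, Ragged Paging, and Ahead-of-Time Load Balancing), improving throughput by up to 2.6×.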

Comments 13 pages, 15 figures

Abstract

Multi-turn LLM serving accumulates dialogue history whose Key-Value (KV) cache grows with every turn and every user, quickly exceeding the model weights themselves and making memory, not compute, the binding constraint on throughput. Non-uniform KV compression, which allocates heterogeneous budgets across attention heads, preserves accuracy far better than uniform schemes, yet remains impractical: modern serving stacks assume identical KV lengths across heads, so heterogeneity traps freed memory as page fragmentation, spends up to 25% of prefill time reclaiming scattered pages, and skews GPU workloads that inflate decode latency by up to 1.7× or burn 15–20% of each decode step on re-planning. We observe that this heterogeneity need not be discovered at runtime: head-wise retention follows a two-level structural regularity (an input-invariant head ranking with narrowly bounded per-head ratios) that can be calibrated offline from as few as 50 samples. Building on this insight, we present Tangram, a serving framework that statically resolves what prior systems handle dynamically: Budget Reservation fixes each head's post-compression footprint at scheduling time, eliminating page reclamation; Ragged Paging clusters similar-budget heads into independent page tables, turning fragmentation into reclaimable memory; and Ahead-of-Time Load Balancing precomputes balanced GPU partitions with zero runtime planning. Implemented on vLLM, Tangram serves as a drop-in substrate for existing non-uniform compression methods, matching their accuracy while improving end-to-end throughput by up to 2.6× over the full-KV baseline. Our implementation is publicly available at https://github.com/aiha-lab/TANGRAM.
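
A small page-count calculation shows why grouping heads with similar budgets into separate page tables reclaims memory. The sketch assumes:

- Fixed per-head token budgets known at scheduling time, as with Budget Reservation.
- A hypothetical page size.
- A simple rule that sorts heads by budget and splits them into equal-sized groups, each paged to its group maximum.

It illustrates the accounting only; it is not Tangram's Ragged Paging implementation.

```python
import math
from typing import List


def pages_needed(budget_tokens: int, page_size: int) -> int:
    return math.ceil(budget_tokens / page_size)


def uniform_pages(budgets: List[int], page_size: int) -> int:
    """One shared page table: every head is padded to the largest budget."""
    return len(budgets) * pages_needed(max(budgets), page_size)


def grouped_pages(budgets: List[int], page_size: int, num_groups: int) -> int:
    """Heads sorted by budget and split into groups, each with its own page table."""
    order = sorted(budgets)
    group_size = math.ceil(len(order) / num_groups)
    total = 0
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        total += len(group) * pages_needed(max(group), page_size)   # pad only to the group max
    return total


budgets = [16, 16, 16, 16, 128, 128, 256, 256]   # hypothetical per-head token budgets
uniform = uniform_pages(budgets, page_size=16)
grouped = grouped_pages(budgets, page_size=16, num_groups=4)
```

With one shared table every head is padded to the largest budget (128 pages here). Per-group tables pad only to each group's maximum (52 pages).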

2606.12688 2026-06-16 cs.LG cs.AI cs.DC Version update

M*: A Modular, Extensible, Serving System for Multimodal Models

Atindra Jha, Naomi Sagan, Keisuke Kamahori, Irmak Sivgin, Rohan Sanda, Steven Gao, Mark Horowitz, Luke Zettlemoyer, Olivia Hsu, Jure Leskovec, Baris Kasikci, Stephanie Wang

Affiliations: Stanford University; University of Washington; Carnegie Mellon University

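AI summary: Proposes M*, which represents models as dataflow graphs and introduces the Walk Graph abstraction to serve composite multimodal models efficiently. It reduces latency and improves throughput across multiple tasks.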

Comments The codebase is available at https://github.com/mstar-project/mstar

Abstract

We are entering a new era of composite model architectures that integrate diverse components such as vision encoders, language backbones, diffusion and flow heads, audio codecs, action generators, and world-model predictors. Such architectures underpin a broad class of multimodal models, including unified multimodal models, omni models, speech-language models, vision-language-action policies, and world models. However, existing model serving frameworks were built on narrow assumptions about model structure, making them ill-suited to accommodate this new architectural diversity. Here we present M*, a universal serving system for efficiently serving composite AI models. M* represents models as dataflow graphs, processing requests spanning diverse modalities and tasks as traversals over these graphs. The core insight is a modular abstraction that supports arbitrary composition of model components, flexible placement onto a physical cluster, and model-agnostic optimizations within a distributed runtime. We call this abstraction the Walk Graph and show how it can concisely capture composite models from a broad range of families. We instantiate M* on representative models and find that it achieves, on average, 20% lower end-to-end latency than vLLM-Omni for text-to-image workloads on BAGEL, while delivering up to 2.9x lower real-time factor and 2.7x higher throughput for text-to-speech workloads on Qwen3-Omni. M* also outperforms the V-JEPA 2-AC rollout baseline for robotic planning by up to 12.5x. Thus, our work paves the road towards more efficient serving of complex models with minimal developer effort.
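
The idea of treating requests as walks over a dataflow graph of components can be sketched in a few lines. The component names, the string-valued "activations", and the `ToyWalkGraph` class are all hypothetical. M*'s Walk Graph abstraction also covers placement on a cluster and distributed-runtime optimizations, none of which appear here.

```python
from typing import Callable, Dict, List, Set, Tuple


class ToyWalkGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[str], str]] = {}
        self.edges: Set[Tuple[str, str]] = set()

    def add_node(self, name: str, fn: Callable[[str], str]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.add((src, dst))

    def run(self, walk: List[str], x: str) -> str:
        # Validate that every consecutive hop in the walk is a declared dataflow edge.
        for src, dst in zip(walk, walk[1:]):
            if (src, dst) not in self.edges:
                raise ValueError(f"no edge {src} -> {dst}")
        # Execute components in walk order, threading the intermediate value through.
        for name in walk:
            x = self.nodes[name](x)
        return x


g = ToyWalkGraph()
g.add_node("tokenizer", lambda s: f"tok({s})")
g.add_node("llm_backbone", lambda s: f"llm({s})")
g.add_node("diffusion_head", lambda s: f"image({s})")
g.add_node("audio_codec", lambda s: f"audio({s})")
g.add_edge("tokenizer", "llm_backbone")
g.add_edge("llm_backbone", "diffusion_head")
g.add_edge("llm_backbone", "audio_codec")

# Two request types share the same components but take different walks.
text_to_image = g.run(["tokenizer", "llm_backbone", "diffusion_head"], "a cat")
text_to_speech = g.run(["tokenizer", "llm_backbone", "audio_codec"], "hello")
```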

2606.13300 2026-06-16 cs.LG Version update

Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score

Mariya Pavlova, Harrison Bo Hua Zhu, Lidia Vitanova, Elizaveta Semenova, Yingzhen Li

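AI summary: Proposes the Trajectory-based Quantization Sensitivity Score (TQS), which analyzes how quantization error propagates from the standpoint of dynamical-system stability, enabling mixed-precision quantization without calibration data.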

Comments ICML 2026, Workshop on Forecasting as a New Frontier of Intelligence

Abstract

We introduce the Trajectory-based Quantization Sensitivity Score (TQS), a metric that reframes post-training quantization (PTQ) through the lens of dynamical-systems stability. By modeling the network's rollout as a discrete-time dynamical system, TQS characterizes how quantization-induced errors propagate and amplify over the rollout horizon. Unlike conventional PTQ methods, where sensitivity analysis is often coupled to the quantization procedure, TQS enables a priori sensitivity estimation decoupled from quantizer selection and bit-width assignment. This separation allows for quantization budget planning even for black-box or compiled networks with fused operators. Building on this, we present TQS-PTQ, a flexible mixed-precision framework that requires no calibration data or costly second-order approximations. Our experiments show that a dynamical-systems perspective provides a robust, high-performing pathway for low-precision deployment in resource-constrained settings.
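
A minimal sketch of the dynamical-systems view, assuming a linearized rollout x_{t+1} = A x_t in which quantization adds an error δ_t with ‖δ_t‖ ≤ ε at every step. Starting from zero error, the error after H steps is e_H = Σ_{t<H} A^{H−1−t} δ_t. It follows that ‖e_H‖ ≤ ε Σ_{k<H} ‖A^k‖₂. The function below computes that sum as a hypothetical sensitivity score. It needs only the linearized dynamics, not calibration data, but it is not the paper's TQS definition.

```python
import numpy as np


def trajectory_sensitivity(A: np.ndarray, horizon: int) -> float:
    """Sum of spectral norms ||A^k||_2 for k = 0 .. horizon-1 (error amplification bound)."""
    total = 0.0
    P = np.eye(A.shape[0])              # A^0
    for _ in range(horizon):
        total += np.linalg.norm(P, 2)   # largest singular value of A^k
        P = A @ P
    return total


contracting = 0.5 * np.eye(2)           # errors decay: bounded amplification
marginal = np.eye(2)                    # errors persist: amplification grows linearly with horizon

score_contracting = trajectory_sensitivity(contracting, horizon=3)   # 1 + 0.5 + 0.25
score_marginal = trajectory_sensitivity(marginal, horizon=3)         # 1 + 1 + 1
```

In a mixed-precision plan, components whose linearized dynamics give larger scores would be natural candidates for more bits.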

2407.02362 2026-06-16 cs.AR cs.AI cs.LG Version update

Mitigating scalability challenges in LUT-based neural networks via pruning optimisations

Xuqi Zhu, Huaizhi Zhang, JunKyu Lee, Jiacheng Zhu, Chandrajit Pal, Sangeet Saha, Klaus D. McDonald-Maier, Xiaojun Zhai

Affiliations: School of Computer Science and Electronic Engineering, University of Essex

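AI summary: To address the poor scalability of LUT-based matrix multiplication, proposes the LUT-MU architecture with an integrated pruning strategy, achieving up to 1.6× higher throughput and 4.2× better energy efficiency on FPGAs.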

Abstract

Modern deep neural networks heavily rely on a large number of multiply-accumulate operations, which constitute the predominant computational cost. To address this, Look-Up Table (LUT)-based matrix multiplications have emerged as a promising alternative for reducing the computational cost and time of the multiply-accumulate operations in a neural network. However, LUT-based neural networks still face scalability challenges due to the inherent limitations of LUT-based matrix multiplication. To mitigate these limitations, this paper proposes a scalable and energy-efficient LUT-based approximate matrix multiplication unit (LUT-MU), the basic building block of the neural networks, by integrating a pruning strategy into MADDNESS, a LUT-based matrix multiplication algorithm. As problem size and precision demands in matrix multiplication increase, the proposed LUT-MU architecture effectively constrains resource growth. The case study shows that deploying LUT-MU in neural network architectures, including fully connected layers (MNIST) and ResNets (CIFAR-10, ImageNet), on XCZU7EV and XCZU19EG FPGAs produces up to 1.6× throughput improvement and 4.2× energy-efficiency gains over mainstream CUDA-based network implementations, and 1.8× energy efficiency compared to leading quantised neural network implementations, with moderate impact on accuracy. Compared to original MADDNESS-based neural networks, LUT-MU shows 1.3 to 2.6× resource savings across various MADDNESS resolution configuration settings.
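
The LUT-based matrix multiplication that MADDNESS builds on can be sketched in product-quantization form:

1. Split the columns of A into subspaces.
2. Encode each row's sub-vector as the index of its nearest prototype.
3. Precompute every prototype's dot products with the matching rows of B into lookup tables.
4. Replace multiply-accumulates with table lookups and additions.

This sketch uses exhaustive nearest-prototype search, whereas MADDNESS learns hash trees for encoding. It also omits the paper's pruning strategy, and the prototypes are supplied directly rather than learned.

```python
import numpy as np
from typing import List


def encode(A: np.ndarray, protos: List[np.ndarray]) -> np.ndarray:
    """Map each row's sub-vector in every subspace to its nearest prototype index."""
    codes = []
    for A_c, P_c in zip(np.array_split(A, len(protos), axis=1), protos):
        d2 = ((A_c[:, None, :] - P_c[None, :, :]) ** 2).sum(axis=-1)  # (rows, K) squared distances
        codes.append(d2.argmin(axis=1))
    return np.stack(codes, axis=1)                                   # (rows, n_subspaces)


def build_luts(protos: List[np.ndarray], B: np.ndarray) -> List[np.ndarray]:
    """LUT c holds prototype_k · B_c[:, j] for every prototype k and output column j."""
    B_parts = np.array_split(B, len(protos), axis=0)
    return [P_c @ B_c for P_c, B_c in zip(protos, B_parts)]


def lut_matmul(codes: np.ndarray, luts: List[np.ndarray]) -> np.ndarray:
    """Approximate A @ B using only table lookups and additions."""
    out = np.zeros((codes.shape[0], luts[0].shape[1]))
    for c, lut in enumerate(luts):
        out += lut[codes[:, c]]
    return out


rng = np.random.default_rng(0)
base = rng.normal(size=(3, 4))                 # three distinct rows
A = base[[0, 1, 2, 0, 1]]                      # rows of A drawn from those three
B = rng.normal(size=(4, 5))
protos = [base[:, :2], base[:, 2:]]            # two subspaces, three prototypes each
approx = lut_matmul(encode(A, protos), build_luts(protos, B))
```

Here every sub-vector matches a prototype exactly, so the result equals `A @ B`. With real activations the nearest prototype is only an approximation, which is the source of the accuracy impact the abstract mentions.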

2505.23666 2026-06-16 cs.CL cs.LG Version update

LoLA: Low-Rank Linear Attention With Sparse Caching

Luke McDermott, Robert W. Heath, Rahul Parhi

Affiliations: University of California, San Diego

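AI summary: Proposes LoLA, a training-free augmentation to linear attention that improves associative recall with three memory systems: a local sliding window, a sparse global cache, and the recurrent hidden state. It raises pass-key retrieval accuracy from 0.6% to 97.4% with a cache 4.6× smaller than Llama-3.1 8B's.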

Abstract

The per-token cost of transformer inference scales with context length, preventing its application to lifelong in-context learning. Linear attention is an efficient alternative that maintains a constant memory footprint, even on infinite context lengths. While this is a potential candidate for lifelong learning, it falls short in memory capacity. In this paper, we propose LoLA, a training-free augmentation to linear attention that boosts associative recall. LoLA distributes past key-value pairs from context into three memory systems: (i) recent pairs in a local sliding window cache; (ii) difficult-to-memorize pairs in a sparse, global cache; and (iii) generic pairs in the recurrent hidden state of linear attention. We show through ablations that our self-recall error metric is crucial to efficiently manage long-term associative memories. On pass-key retrieval tasks, LoLA improves the base model's performance from 0.6% to 97.4% accuracy. This is achieved with a 4.6x smaller cache than Llama-3.1 8B on 4K context length. LoLA also outperforms other 1B and 8B parameter subquadratic models on zero-shot commonsense reasoning tasks.
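
A minimal sketch of the three-way memory routing, under these assumptions:

- Linear attention uses an elu+1 feature map.
- The self-recall error is the distance between a value and what the current linear-attention state returns for its key.
- When the sparse cache overflows, its lowest-error pair (the one the state already recalls best) is folded into the recurrent state.

`ToyLoLA` and all its parameters are hypothetical. Errors are computed once, when a pair leaves the window, and are not refreshed; this is a simplification that LoLA itself may not make. Queries over the three memories are not shown.

```python
from collections import deque

import numpy as np


def phi(x: np.ndarray) -> np.ndarray:
    """elu(x) + 1 feature map, a common positive feature map for linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(x))


class ToyLoLA:
    def __init__(self, d_k: int, d_v: int, window: int, cache_size: int) -> None:
        self.window = deque()            # (i) recent pairs, kept exactly
        self.window_size = window
        self.cache = []                  # (ii) hard-to-recall pairs: (error, k, v)
        self.cache_size = cache_size
        self.S = np.zeros((d_k, d_v))    # (iii) linear-attention state: sum of phi(k) v^T
        self.z = np.zeros(d_k)           # normalizer: sum of phi(k)
        self.n_in_state = 0

    def recall(self, k: np.ndarray) -> np.ndarray:
        f = phi(k)
        denom = float(f @ self.z)
        return (f @ self.S) / denom if denom > 0 else np.zeros(self.S.shape[1])

    def self_recall_error(self, k: np.ndarray, v: np.ndarray) -> float:
        return float(np.linalg.norm(self.recall(k) - v))

    def _fold_into_state(self, k: np.ndarray, v: np.ndarray) -> None:
        f = phi(k)
        self.S += np.outer(f, v)
        self.z += f
        self.n_in_state += 1

    def push(self, k: np.ndarray, v: np.ndarray) -> None:
        self.window.append((k, v))
        if len(self.window) > self.window_size:
            k_old, v_old = self.window.popleft()
            self.cache.append((self.self_recall_error(k_old, v_old), k_old, v_old))
            if len(self.cache) > self.cache_size:
                # Evict the pair the state already recalls best into the recurrent state.
                i = min(range(len(self.cache)), key=lambda j: self.cache[j][0])
                _, k_e, v_e = self.cache.pop(i)
                self._fold_into_state(k_e, v_e)


rng = np.random.default_rng(0)
mem = ToyLoLA(d_k=4, d_v=3, window=4, cache_size=3)
for _ in range(20):
    mem.push(rng.normal(size=4), rng.normal(size=3))
```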

2506.20686 2026-06-16 q-bio.BM cs.DC cs.LG cs.PF Version update

MegaFold: Efficient Training of Next-Generation 3D Attention Protein Models on Cross-Platform GPUs

Hoa La, Ahan Gupta, Alex Morehead, Jianlin Cheng, Minjia Zhang

Affiliations: UIUC SSAIL Lab

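AI summary: AlphaFold3-style models train inefficiently because of 3D attention. MegaFold addresses this with efficient kernels, a sharding strategy, operator fusion, and pipeline optimization, enabling training on longer sequences and speedups on both NVIDIA and AMD GPUs.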

Comments 13 pages, 12 figures

Abstract

Recent advances in biomolecular modeling have been catalyzed by models such as AlphaFold3 (AF3), which introduce science-informed changes to the transformer architecture. Unlike transformers, a defining characteristic of AF3-style models is their 3D attention over 2D pairwise representations which produces tensors whose computation and memory costs scale cubically with sequence length. As a result, despite moderate parameter counts, AF3-style models are far more expensive to train than size-equivalent transformers, and are severely constrained by GPU memory capacity. Our characterization shows 3D attention fundamentally changes the training workload, causing massive 3D attention maps, complex inter-operator dependencies, kernel fragmentation, and heavy host-side data pipelines which differ substantially from LLM training, leading to poor utilization on modern GPU systems. Moreover, existing GPU optimizations do not adequately address these challenges due to complex cross-layer inter-operator dependencies introduced by 3D attention. Motivated by these challenges, we introduce MegaFold, a novel cross-platform system for efficient training of next-generation 3D-attention protein models. MegaFold combines a memory-efficient 3D-attention kernel, a communication-efficient sharding strategy for quadratic representations, fused operator implementations for critical execution paths, and a determinism-aware host-device pipeline that eliminates preprocessing stalls. Evaluation on both NVIDIA H200 and AMD MI250 GPUs shows that MegaFold enables training with up to 3.36× longer sequence lengths on 32 GPUs while reducing end-to-end execution time by up to 1.73× (NVIDIA) and 1.62× (AMD).
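
A back-of-the-envelope calculation makes the cubic cost concrete. Triangle-style attention over an N × N pair representation (as in the AlphaFold family's pair stack) computes an N × N attention map for each of the N rows. One layer's logits therefore hold N³ entries per head. The sketch assumes a hypothetical head count and 2-byte (bf16) elements. It ignores batch size, activations kept for the backward pass, and kernel-level tiling that avoids materializing the full tensor.

```python
def triangle_attention_logits_bytes(n_tokens: int, n_heads: int = 4, bytes_per_elem: int = 2) -> int:
    """Bytes for one layer's (n_heads, N, N, N) attention logits if fully materialized."""
    return n_heads * n_tokens ** 3 * bytes_per_elem


gb_at_768 = triangle_attention_logits_bytes(768) / 1e9          # about 3.6 GB for a single layer
growth = (triangle_attention_logits_bytes(1536)
          / triangle_attention_logits_bytes(768))               # doubling N multiplies memory by 8
```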

2512.19011 2026-06-16 cs.CR cs.AI cs.CL cs.LG Version update

Do You Really Need a GPU to Guard Your LLM? CPU-Class Classifiers and Multi-Stage Pipelines for Safety Enforcement at Scale

Vasudev Majhi, Dhruv Gupta, Advait Singh, Matthew Barker, Dhruv Kumar

Affiliations: BITS Pilani; Trustwise

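AI summary: Studies how CPU-class classifiers (e.g., SVMs and gradient-boosted trees) perform at safety screening of LLM inputs and finds that they complement GPU models. Designs GuardChain, a three-stage pipeline that reaches near-peak accuracy on 80% of in-distribution queries while lowering deployment cost.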

Comments Under Review. 25 pages, 5 figures, 38 tables

Abstract

Safety classifiers that screen LLM inputs for jailbreak attempts have become standard deployment components, yet almost all production systems rely on GPU-based models: fine-tuned transformers and LLM-as-a-judge pipelines. These approaches impose significant per-query latency and infrastructure cost. Very little research has asked whether CPU-based classifiers, such as support vector machines and gradient-boosted trees trained on TF-IDF features, can match their accuracy across the conditions that production deployments encounter. We evaluate five CPU classifier families, Mamba-130M as an SSM-based GPU classifier, and transformer-based GPU models (DeBERTa-v3 and Gemma-2B with LoRA) across nine jailbreak sources and three regimes: in-distribution (D1), out-of-distribution (D2), and adversarially obfuscated (D3). On D1, the best CPU classifier matches the best transformer GPU model at roughly one-fifth the deployment cost. On D2, CPU classifiers fail via confident miscalibration, producing high-confidence false negatives that bypass escalation entirely. On D3, CPU classifiers outperform transformer GPU models by more than 26 percentage points in F1. Based on these complementary failure modes, we design GuardChain, a three-stage safety pipeline (Regex -> CPU -> GPU) that routes each prompt to the cheapest stage capable of a confident decision. The CPU stage alone resolves 80% of in-distribution prompts at near-peak accuracy, and the GPU stage recovers the out-of-distribution failures. For practitioners deploying LLM safety at scale, this work provides evidence that GPU-class infrastructure is unnecessary for the majority of traffic.
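
The routing logic of a Regex -> CPU -> GPU cascade can be sketched as follows. The sketch assumes each classifier returns a probability that the prompt is unsafe, and that a symmetric confidence threshold `tau` decides whether the CPU stage may answer on its own. The patterns, threshold, and stub classifiers are hypothetical stand-ins, not GuardChain's configuration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    unsafe: bool
    stage: str          # which stage made the call: "regex", "cpu", or "gpu"


def guard_chain(prompt: str,
                blocklist: List[re.Pattern],
                cpu_clf: Callable[[str], float],
                gpu_clf: Callable[[str], float],
                tau: float = 0.9) -> Decision:
    # Stage 1: cheap pattern matching for known attack phrasings.
    if any(p.search(prompt) for p in blocklist):
        return Decision(True, "regex")
    # Stage 2: the CPU classifier answers only when it is confident either way.
    p_unsafe = cpu_clf(prompt)
    if p_unsafe >= tau:
        return Decision(True, "cpu")
    if p_unsafe <= 1.0 - tau:
        return Decision(False, "cpu")
    # Stage 3: escalate the uncertain remainder to the GPU model.
    return Decision(gpu_clf(prompt) >= 0.5, "gpu")


blocklist = [re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)]
cpu_stub = lambda p: 0.99 if "bomb" in p else (0.02 if "weather" in p else 0.5)
gpu_stub = lambda p: 0.8
```

A confidence gate escalates only the prompts the CPU stage is unsure about. As the abstract notes, out-of-distribution inputs can be misclassified with high confidence, so a threshold alone would let those bypass the GPU stage.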

2602.00887 2026-06-16 cs.CL cs.AI cs.LG updated version

EffGen: Enabling Small Language Models as Capable Autonomous Agents

Gaurav Srivastava, Aafiya Hussain, Chi Wang, Yingyan Celine Lin, Xuan Wang

Affiliations: Department of Computer Science, Virginia Tech, Blacksburg, VA, USA; Georgia Institute of Technology, Atlanta, GA, USA; Google DeepMind, USA

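AI summary: EffGen is an open-source agent framework optimized for small language models. Through prompt compression, task decomposition, complexity-based routing, and a unified memory system, it enables efficient and secure local deployment, outperforming frameworks such as LangChain on 13 benchmarks.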

Comments Accepted to ICML 2026 Conference

Abstract

Most existing language model agentic systems today are built and optimized for large language models (e.g., GPT, Claude, Gemini) via API calls; while powerful, this approach faces several limitations including high token costs and privacy concerns for sensitive applications. We introduce EffGen, an open-source agentic framework optimized for small language models (SLMs) that enables effective, efficient, and secure local deployment. EffGen makes four major contributions: (1) Enhanced tool-calling with prompt optimization that compresses input prompts by up to 70-80% (and 57% on average across our benchmarks) while preserving task semantics, (2) Intelligent task decomposition that breaks complex queries into parallel or sequential subtasks based on dependencies, (3) Complexity-based routing using five factors to make smart pre-execution decisions, and (4) Unified memory system combining short-term, long-term, and vector-based storage. Additionally, EffGen unifies multiple agent protocols (MCP, A2A, ACP) for cross-protocol communication. Results on 13 benchmarks show EffGen outperforms LangChain, AutoGen, and Smolagents with higher success rates, faster execution, and lower memory. Our results reveal that prompt optimization and complexity routing have complementary scaling behavior: optimization benefits SLMs more (11.2% gain at 1.5B vs 2.4% at 32B), while routing benefits large models more (3.6% at 1.5B vs 7.9% at 32B), providing consistent gains across all scales when combined. EffGen is released under the Apache 2.0 License, ensuring broad accessibility for research and commercial use, with the code available at https://github.com/ctrl-gaurav/effGen, the Python package at https://pypi.org/project/effgen/ (pip install effgen), and the project website and documentation at https://effgen.org/ and https://docs.effgen.org/.

2605.17106 2026-06-16 cs.CL cs.LG updated version

HyDRA: Hybrid Dynamic Routing Architecture for Heterogeneous LLM Pools

Aashna Garg, Siddharth Singha Roy, Jinu Jang, Federico Brancasi, Shengyu Fu

Affiliations: Microsoft

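AI summary: The paper proposes HyDRA, a hybrid dynamic routing architecture that predicts fine-grained, multi-dimensional capability requirements for each query and matches them against configuration-defined model profiles, enabling efficient model selection across heterogeneous LLM pools without retraining.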

Comments preprint v2

Abstract

Production LLM deployments increasingly maintain heterogeneous model pools spanning order-of-magnitude cost differences. Existing routers make binary strong-vs-weak decisions and couple learned parameters to specific model identities, requiring retraining whenever the catalog changes. We present HyDRA (Hybrid Dynamic Routing Architecture), a framework that predicts fine-grained, multi-dimensional capability requirements per query and matches them against configuration-defined model profiles via shortfall matching. A ModernBERT encoder with K=4 independent sigmoid heads scores each query along reasoning, code generation, debugging, and tool use; a shortfall-matching algorithm then selects the cheapest model whose capabilities meet the predicted requirements. The deployed predictor runs at 86 ms median CPU inference latency in production and is fully decoupled from the model catalog: adding or removing models requires only a configuration change, with zero retraining. On SWE-Bench Verified (5-model pool: GPT-5.4-mini, Claude Haiku 4.5, GPT-5.3 Codex, Claude Sonnet 4.6, GPT-5.4), HyDRA's tunable shortfall threshold spans three regimes: peak-quality exceeds the always-strong Claude Sonnet 4.6 baseline (75.4% vs. 74.2% resolution) at 12.9% cost savings; iso-quality matches Sonnet at 54.1% cost savings, a 6x improvement over our prior in-house binary router at 9.1%; aggressive pushes savings to 72.5% for a 3.2-point quality trade-off. Results generalize across LiveCodeBench, BigCodeBench, and tau-bench. HyDRA is deployed to all users in GitHub Copilot's VS Code Chat auto-mode and, to our knowledge for the first time in the LLM routing literature, demonstrates language-invariant routing across CJK, European, and other script families.
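Shortfall matching as described above could look roughly like the following. This is a sketch under stated assumptions. The four capability dimensions follow the abstract. The shortfall formula (sum of per-dimension deficits), the fallback rule, and the `ModelProfile` catalog with its costs and scores are hypothetical.

```python
from dataclasses import dataclass

# Capability dimensions named in the abstract.
DIMS = ("reasoning", "code_generation", "debugging", "tool_use")


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost: float                      # relative cost per request (assumed unit)
    capabilities: tuple[float, ...]  # one score in [0, 1] per entry of DIMS


def shortfall(required, capabilities):
    """Total amount by which a model falls short of the predicted needs.

    Assumption: shortfall = sum over dimensions of max(0, r_k - c_k);
    the paper's exact definition may differ.
    """
    return sum(max(0.0, r - c) for r, c in zip(required, capabilities))


def select_model(required, catalog, threshold=0.0):
    """Pick the cheapest model whose shortfall is within `threshold`.

    Raising `threshold` trades quality for savings. If no model qualifies,
    fall back to the model with the smallest shortfall (assumed rule).
    """
    eligible = [m for m in catalog
                if shortfall(required, m.capabilities) <= threshold]
    if eligible:
        return min(eligible, key=lambda m: m.cost)
    return min(catalog, key=lambda m: shortfall(required, m.capabilities))


# Hypothetical catalog; real profiles would live in a configuration file.
catalog = [
    ModelProfile("small", 1.0, (0.4, 0.5, 0.4, 0.3)),
    ModelProfile("medium", 4.0, (0.7, 0.8, 0.7, 0.6)),
    ModelProfile("large", 15.0, (0.9, 0.9, 0.9, 0.9)),
]
needs = (0.6, 0.7, 0.5, 0.5)  # would come from the K=4 sigmoid heads
print(select_model(needs, catalog).name)                 # medium
print(select_model(needs, catalog, threshold=1.0).name)  # small
```

Adding or removing a model only edits `catalog`, which mirrors the configuration-only updates the abstract describes. `threshold` plays the role of the tunable shortfall threshold.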

2605.21312 2026-06-16 cs.DC cs.AI cs.LG updated version

Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

Yicheng Feng, Xin Tan, Yangtao Deng, Yimin Jiang, Yibo Zhu, Hong Xu

Affiliations: The Chinese University of Hong Kong; Anuttacon; StepFun

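AI summary: The paper proposes Frontier, a discrete-event simulator for modern LLM inference serving. By combining a disaggregated abstraction with models of key runtime optimizations, it accurately predicts complex workloads and gives more precise estimates of computation, communication, and memory costs across serving scenarios.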

Abstract

Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simulation is attractive for exploring this growing design space, yet existing simulators lack the architectural completeness and decision-grade fidelity it demands. Their monolithic-replica abstractions are ill-suited to disaggregated serving, while average-case analytical proxies can distort SLA predictions and even reverse optimization conclusions. We present Frontier, a discrete-event simulator for modern LLM inference serving. Frontier features a disaggregated abstraction. It captures the structure and dynamics of modern serving systems by modeling co-location, Prefill-Decode Disaggregation (PDD), and Attention-FFN Disaggregation (AFD) with role-specific cluster workers, incorporating key runtime optimizations (e.g., CUDA Graphs, speculative decoding) within the scheduler-batch-engine loop, and supporting stateful requests for emerging workloads. It further provides accurate and generalizable predictions of computation, communication, and memory costs across diverse serving scenarios with complex workload compositions. On a 16-GPU H800 testbed, Frontier achieves an average throughput error below 4%. Compared with state-of-the-art simulators, it reduces end-to-end latency error from 44.9% to 6.4% under co-location and from 51.7% to 2.6% under disaggregation. It scales to simulating over 1K GPUs on commodity CPUs and enables new use cases such as SLA-dependent Pareto frontier exploration, heterogeneous disaggregated allocation, agentic reasoning scheduling validation, and RL post-training reconfiguration. We release Frontier at https://github.com/NetX-lab/Frontier.
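To make "discrete-event simulation" concrete, here is a toy event loop for a single FIFO worker. It is an illustrative assumption and far simpler than Frontier: it models no batching, disaggregation, or runtime optimizations. `simulate` and the request tuples are hypothetical.

```python
import heapq
from collections import deque


def simulate(requests):
    """Simulate one FIFO worker and return {request_id: latency}.

    `requests` is a list of (request_id, arrival_time, service_time).
    Time jumps from event to event instead of advancing in fixed steps.
    """
    events = []  # min-heap of (time, seq, kind, request_id)
    seq = 0      # tie-breaker so the heap never compares kinds or ids
    for rid, arrival, _ in requests:
        heapq.heappush(events, (arrival, seq, "arrive", rid))
        seq += 1
    service = {rid: s for rid, _, s in requests}
    arrival_at = {rid: a for rid, a, _ in requests}
    queue = deque()
    busy = False
    latency = {}
    while events:
        now, _, kind, rid = heapq.heappop(events)
        if kind == "arrive":
            queue.append(rid)
        else:  # "finish": record end-to-end latency and free the worker
            latency[rid] = now - arrival_at[rid]
            busy = False
        # Start the next queued request whenever the worker is idle.
        if not busy and queue:
            nxt = queue.popleft()
            heapq.heappush(events, (now + service[nxt], seq, "finish", nxt))
            seq += 1
            busy = True
    return latency


reqs = [("a", 0.0, 2.0), ("b", 1.0, 1.0), ("c", 5.0, 1.0)]
print(simulate(reqs))  # {'a': 2.0, 'b': 2.0, 'c': 1.0}
```

Request `b` waits one time unit behind `a`. An event-level simulation tracks per-request queueing effects like this; a single average-case estimate does not.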

7. Federated Learning, Privacy and Security (29 papers)

2606.15625 2026-06-16 cs.LG cs.NI new submission

Conflict-Aware Federated Fine-Tuning of Large Language Models with Mixture-of-Experts

Yijun Lu, Zihan Fang, Pengpeng Qiao, Zheng Lin, Jing Yang, Yuxin Zhang, Por Lip Yee, Zhe Chen, Jun Luo

Affiliations: Nanyang Technological University; University of Malaya

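AI summary: Data heterogeneity can cause conflicting expert optimization when Mixture-of-Experts models are fine-tuned with federated learning. To address this, the paper proposes FC-MoE, which combines importance weighting, gradient consensus projection, and a local knowledge retention mechanism to stabilize optimization and improve model performance in non-IID settings.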

Comments 6 pages, 4 figures

Abstract

The continuous scaling of large language models (LLMs) incurs prohibitive computational costs, making Mixture-of-Experts (MoE) a scalable alternative for efficient fine-tuning via sparse activation. While federated learning (FL) emerges as the paradigm for privacy-preserving collaborative optimization, integrating MoE into FL under data heterogeneity may trigger conflicting expert optimizations. Client-specific data distributions force same-indexed experts to optimize under inconsistent or even conflicting feature-label correlations. This mismatch induces destructive interference during aggregation, thus destabilizing the optimization trajectory and degrading model performance. To address this issue, we propose FC-MoE, a federated conflict-aware framework for MoE fine-tuning. It employs an importance-aware weighting scheme to prioritize reliable local updates and utilizes gradient consensus projection to suppress conflicting updates, ensuring a stable global optimization path. Moreover, a local knowledge retention mechanism further preserves specialized client expertise by re-anchoring domain-specific residuals. Extensive experiments demonstrate that FC-MoE accelerates convergence and enhances both global and local model performance in non-IID federated environments.
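One way to read "importance-aware weighting" plus "gradient consensus projection" for a single expert is sketched below. The sketch rests on three assumptions:

- importance weights are given,
- the consensus is their weighted mean,
- a client update that points against the consensus has its conflicting component projected out (a PCGrad-style rule).

FC-MoE's actual formulas and its local knowledge retention mechanism are not reproduced, and `aggregate_expert` is a hypothetical name.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def aggregate_expert(updates, importance):
    """Aggregate one expert's client updates with conflict projection.

    updates: equal-length lists, one flattened update per client
    importance: non-negative weights, one per client (assumed given)
    """
    total = sum(importance)
    weights = [w / total for w in importance]
    dim = len(updates[0])
    # Consensus direction: importance-weighted mean of the raw updates.
    consensus = [sum(w * u[i] for w, u in zip(weights, updates))
                 for i in range(dim)]
    norm_sq = dot(consensus, consensus)
    projected = []
    for u in updates:
        d = dot(u, consensus)
        if d < 0 and norm_sq > 0:
            # Remove the component that points against the consensus.
            u = [x - d / norm_sq * c for x, c in zip(u, consensus)]
        projected.append(u)
    # Re-aggregate the (possibly projected) updates with the same weights.
    return [sum(w * u[i] for w, u in zip(weights, projected))
            for i in range(dim)]


print(aggregate_expert([[1.0, 0.0], [-1.0, 0.0]], importance=[3.0, 1.0]))
# [0.75, 0.0]
```

With weights 3:1 the plain weighted mean would be `[0.5, 0.0]`. Projecting out the opposing update keeps the majority direction at `[0.75, 0.0]`.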

2606.15695 2026-06-16 cs.LG cs.AI new submission

When Generator Replay Degrades: Projected Rehearsal Orchestration for Heterogeneous Federated Class-Incremental Learning

Thinh T. H. Nguyen, Khoa D. Doan, Binh T. Nguyen, Danh Le-Phuoc, Kok-Seng Wong

Affiliations: VinUniversity; VNU-HCM, University of Science; Technische Universität Berlin

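AI summary: The paper targets forgetting of old knowledge in heterogeneous federated class-incremental learning, where clients hold different label subsets and progress through tasks at different stages. It proposes the projected rehearsal orchestration framework PRO and its enhanced variant PRO-MAX. The server maintains compact class-level projected memories that support balanced pseudo multi-task training, improving retention and final utility under heterogeneous streams on image, text, and graph benchmarks.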

Comments 46 pages

Abstract

Federated class-incremental learning (FCIL) becomes substantially harder when clients observe different label subsets, progress through tasks at different stages, and provide uneven supervision for the same semantic concepts. Existing FCIL methods often preserve old knowledge through input-space synthesis, but they can be fragile under heterogeneous task streams and difficult to transfer across modalities. To alleviate such issues, we propose PRO, a framework that replaces synthetic input replay with projected rehearsal orchestration. To remove the influence of external pretraining, we evaluate all methods under the same warmup. After this warmup, PRO maintains compact class-level projected memories on the server and allows clients to perform balanced pseudo multi-task training over current examples and old projected memories. To handle stronger representation drift, we further introduce PRO-MAX, which augments PRO with neighborhood-weighted memory alignment while preserving the same server-light principle: the server only aggregates model updates and memory statistics. Across image, text, and graph benchmarks, PRO and PRO-MAX improve retention and final utility under heterogeneous streams while remaining competitive in homogeneous FCIL. Even when baselines are given expanded replay budgets, they degrade under supervision imbalance and stage misalignment, indicating that replay quantity alone does not resolve replay-quality failures. Additional weak-task diagnostics further show that larger replay mismatch is associated with larger downstream degradation, while our method keeps projected memories better aligned with the evolving representation.

2606.15940 2026-06-16 cs.LG new submission

Causal-Privacy Audit Workflow for Synthetic and Distilled Data in Dropout Support

Hanghang Zheng, Xiwei Zhuang, Zhong Wang, Hong Liu, Xiao Chen, Jingwen He, Xia Li

Affiliations: Central University of Finance and Economics; China Development Bank; University of Cambridge

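AI summary: The paper proposes the CaP-Eval workflow, which audits the predictive utility, causal fidelity, and privacy risk of synthetic student data under a fixed estimand. It finds that DPGNet and distilled data preserve treatment-effect structure better than the baseline methods.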

Abstract

Synthetic and distilled student data are increasingly used to enable privacy-conscious learning analytics, yet their suitability for decision-facing institutional support remains uncertain. In dropout support, generated data must preserve not only predictive utility or distributional resemblance, but also the financial-status evidence used to guide advising, payment-plan assistance, and scholarship-related decisions. Method: This study introduces CaP-Eval, a decision-facing causal-privacy audit workflow for evaluating generated student data under a fixed estimand, timing-aware adjustment design, estimator set, and empirical privacy-governance screen. The workflow compares original, distilled, adversarial synthetic, statistical synthetic, and DPGNet privacy-oriented generated data on predictive utility, treatment-effect fidelity, robustness to alternative estimators, and local training-record proximity. Results: DPGNet and distilled data preserved the original financial-status treatment-effect structure more reliably than the adversarial and Gaussian Copula baselines. DPGNet preserved full direction and rank agreement across epsilon levels; epsilon = 10 produced the smallest non-original IPW and DML deviations, while epsilon = 1 and epsilon = 5 amplified several financial-status contrasts. Distilled data remained highly faithful but retained the strongest local training-record proximity signal. TabularGNet preserved qualitative directions with moderate attenuation, and Gaussian Copula compressed effect magnitudes. Conclusions: Predictive utility, privacy orientation, empirical disclosure signals, and causal fidelity diverged; generated student data require joint audits of direction, magnitude, overlap, and release-governance risk before decision use.
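The abstract compares inverse-propensity-weighting (IPW) estimates computed on original and generated data. The sketch below shows a minimal Horvitz-Thompson IPW estimator plus a direction/magnitude check, assuming propensity scores are already estimated. The function names are hypothetical, and CaP-Eval's estimand, adjustment design, and DML estimator are not modeled.

```python
def ipw_ate(treatment, outcome, propensity):
    """Horvitz-Thompson IPW estimate of the average treatment effect.

    treatment: 0/1 indicators; outcome: numeric outcomes;
    propensity: estimated P(T=1 | covariates), strictly in (0, 1).
    """
    n = len(treatment)
    total = 0.0
    for t, y, e in zip(treatment, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Treated units are up-weighted by 1/e, controls by 1/(1 - e).
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / n


def audit_contrast(ate_original, ate_generated):
    """Compare one contrast: does the sign match, and how far off is it?"""
    return {
        "direction_agrees": (ate_original > 0) == (ate_generated > 0),
        "abs_deviation": abs(ate_generated - ate_original),
    }


print(ipw_ate([1, 1, 0, 0], [1.0, 0.0, 0.0, 0.0], [0.5] * 4))  # 0.5
print(audit_contrast(0.5, -0.2)["direction_agrees"])           # False
```

In an audit of this kind, the same estimator would be run on the original data and on each generated dataset. Sign and deviation would then be checked per contrast.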

2606.16110 2026-06-16 cs.LG new submission

Auditing Machine Unlearning: A Systematic Research on Whether Models Truly Forget

Dayong Ye, Tianqing Zhu, Ruiding Huang, Xinbo Fu, Jiayang Li, Bo Liu, Huan Huo, Wanlei Zhou

Affiliations: University of Technology Sydney; Deakin University; Macquarie University

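AI summary: Motivated by privacy regulations, the paper proposes the first practical, general-purpose auditing framework. It uses the concept of proof of ignorance to verify whether existing unlearning algorithms truly erase the influence of designated data. Experiments show that retraining- and fine-tuning-based methods succeed, while de-optimization-based and Fisher/Hessian-based methods fail.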

Abstract

Machine unlearning has been extensively studied in response to growing privacy concerns and regulatory requirements. However, auditing whether unlearning algorithms have truly erased the influence of specific data remains an open challenge. The lack of reliable and practical auditing mechanisms can lead to critical privacy risks, such as residual information leakage. This paper initiates a systematic investigation into whether existing unlearning algorithms can truly forget the designated data. We propose the first practical and general-purpose auditing framework for machine unlearning, inspired by the concept of proof of ignorance. Our framework addresses the key practicality limitations of existing methods by eliminating the need for retraining-from-scratch baselines, avoiding the training of large numbers of shadow models, and requiring no intrusive intervention in the original training process. To evaluate the effectiveness of our framework, we first conduct validation experiments to verify its soundness and completeness. We then perform comprehensive experiments across six datasets and ten representative unlearning methods. The results demonstrate that our framework reliably distinguishes between successful and failed unlearning. In particular, we observe that retraining-based and fine-tuning-based methods can achieve effective unlearning, even when the target data remain in the original dataset. In contrast, de-optimization-based methods fail to achieve true unlearning and instead degrade the model's performance. Fisher/Hessian-based methods also fail to unlearn requested data, even when formal certification is provided. Moreover, we show that our framework is robust against fake unlearning attempts and generalizes well to large language models.

2606.16242 2026-06-16 cs.LG cs.CL new submission

Rapid Poison: Practical Poisoning Attacks Against the Rapid Response Framework

David Huang, Jaewon Chang, Avidan Shah, Prateek Mittal, Chawin Sitawarin

Affiliations: Princeton University

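AI summary: The paper reveals poisoning attacks on the Rapid Response framework that use prompt injection to plant malicious samples in the training set. These enable targeted poisoning and concept-based backdoor attacks; a poisoning rate of only 1% yields false positive rates of up to 100% and false negative rates of up to 96%.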

Comments Spotlight at ICML 2026

Abstract

The Rapid Response (RR) framework, deployed in production systems, including Anthropic's ASL-3 safeguards, continuously improves jailbreak-detection classifiers. When new jailbreaks emerge that bypass these classifiers, Rapid Response generates synthetic variants for training, helping the model generalize from the new attacks and quickly adapt. We reveal that prompt injection can infiltrate this pipeline to deliver poisoned samples into the classifier's training set, enabling two attack objectives: (I) targeted poisoning attacks that create false positives on harmless samples by categorizing them as a jailbreak, with a specific desired feature (e.g., certain formatting, subject, or keyword), (II) concept-based backdoor attacks that induce false negatives on jailbreak inputs, generalizing even to jailbreaks from attack strategies the defender explicitly trained against, when the backdoor trigger is present. Importantly, our threat model restricts adversaries to modifying only jailbreak samples (not benign data or labels), a constraint unexplored by prior work that makes the second objective particularly challenging. We address this with Omission Attack, which exploits a new phenomenon: when training on concept-absent unsafe samples, the classifier misassociates that concept's presence with the safe label. Both attacks cause substantial and in some cases near-complete label flipping at only a 1% poisoning rate, achieving up to 100% false positive rates and up to 96% false negative rates.

2606.16304 2026-06-16 cs.LG new submission

pFedUL: Layer-Aware Federated Unlearning for Personalized Federated Learning

Zhuodong Liu, Xiangyu Li, Zhihao Zhang

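AI summary: In personalized federated learning (pFL), the separation of shared and personalized layers makes unlearning harder. The paper proposes pFedUL, which combines gradient-based contribution attribution, adaptive selective unlearning, and a lightweight recalibration protocol. It achieves effective unlearning while maintaining an average of 97.3% personalized accuracy for remaining clients.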

Comments This paper has been accepted for publication in CMC-Computers, Materials & Continua

Abstract

Federated unlearning (FU) enables the removal of specific data contributions from federated learning (FL) models to comply with regulations such as the General Data Protection Regulation (GDPR). However, most existing FU methods are designed for the FedAvg paradigm, where all clients share a single global model. In practice, personalized federated learning (pFL) methods such as FedPer, FedRep, Ditto, and FedBN have become widely adopted due to their superior handling of non-IID data. These methods decompose the model into shared global layers and client-specific personalized layers, fundamentally altering the semantics of unlearning, yet this setting has received little attention. We formalize FU under the pFL paradigm, identifying a tension between unlearning completeness on shared layers and personalization preservation for remaining clients. We then propose pFedUL, a layer-aware selective unlearning framework comprising three components: (1) gradient-based layer-wise contribution attribution that separately quantifies the target client's influence on shared and personalized parameters, (2) adaptive selective unlearning that applies differentiated forgetting strategies across layer types, and (3) a lightweight recalibration protocol enabling remaining clients to restore personalization with minimal overhead. We further introduce two new metrics, Personalization Preservation Score (PPS) and Cross-client Fairness Index (CFI), to evaluate pFL-specific unlearning quality. Experiments on CIFAR-10, CIFAR-100, and FEMNIST under varying non-IID settings indicate that pFedUL achieves unlearning effectiveness comparable to full retraining while maintaining an average of 97.3% personalized accuracy for remaining clients. Compared with six state-of-the-art FU methods adapted to the pFL setting, pFedUL consistently achieves superior personalization preservation.

2606.16655 2026-06-16 cs.LG new submission

Distribution Alignment for One-Shot Federated Learning via Optimal Transport

Daniele Berardini, Vito Paolo Pastore, Vittorio Murino

Affiliations: AI for Good (AIGO), Italian Institute of Technology; MaLGa-DIBRIS, University of Genoa; Department of Computer Science, University of Verona

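AI summary: In one-shot federated learning, heterogeneous client data causes feature misalignment. The paper proposes SLOT-Align, which uses a shared frozen encoder, Bures-Wasserstein barycenters, and geodesic optimal transport maps for training-free feature alignment, improving accuracy and robustness.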

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

One-Shot Federated Learning (OSFL) addresses extreme communication regimes in which clients interact with the server only once, amplifying the impact of heterogeneous client data distributions. In particular, the interaction of domain shift and label shift across clients induces misaligned feature representations that cannot be corrected through iterative optimization. Existing OSFL methods rely on distillation, server-side generation or ensemble-based aggregation, but assume aligned representations or address domain and label shift separately. We introduce SLOT-Align (Single-round, Learning-free Optimal Transport Alignment), a geometry-aware feature harmonization framework for OSFL. SLOT-Align uses a shared frozen encoder to extract compact feature statistics, constructs a global reference via Bures-Wasserstein barycenters, and aligns local representations using closed-form geodesic optimal transport maps. The method is computationally efficient and can be combined with existing OSFL pipelines relying on frozen encoders without modifying their training procedures. Extensive experiments across multiple benchmarks, pretrained backbones, and OSFL methods show that SLOT-Align consistently improves accuracy and robustness under joint domain and label shift.
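The Bures-Wasserstein steps can be illustrated in the simplified case of diagonal covariances, where both the barycenter and the OT map have elementary closed forms. The diagonal restriction is an assumption made for this sketch. With full covariances the Gaussian OT map is still closed-form, but the barycenter is typically computed by fixed-point iteration. The function names are hypothetical.

```python
import math


def bw_barycenter_diag(means, variances, weights):
    """Bures-Wasserstein barycenter of Gaussians with diagonal covariances.

    For commuting (here diagonal) covariances the barycenter is closed-form:
    its mean is the weighted mean of the means, and its per-dimension
    standard deviation is the weighted mean of the standard deviations.
    """
    total = sum(weights)
    w = [x / total for x in weights]
    dim = len(means[0])
    mean = [sum(wk * m[i] for wk, m in zip(w, means)) for i in range(dim)]
    std = [sum(wk * math.sqrt(v[i]) for wk, v in zip(w, variances))
           for i in range(dim)]
    return mean, [s * s for s in std]


def ot_map_diag(x, src_mean, src_var, tgt_mean, tgt_var, t=1.0):
    """Move feature x along the W2 geodesic from source toward target.

    The OT map between N(m1, diag v1) and N(m2, diag v2) is
    T(x) = m2 + sqrt(v2 / v1) * (x - m1); the point at time t on the
    geodesic is (1 - t) * x + t * T(x), so t = 1 applies the full map.
    """
    out = []
    for xi, m1, v1, m2, v2 in zip(x, src_mean, src_var, tgt_mean, tgt_var):
        mapped = m2 + math.sqrt(v2 / v1) * (xi - m1)
        out.append((1.0 - t) * xi + t * mapped)
    return out


mean, var = bw_barycenter_diag([[0.0], [2.0]], [[1.0], [9.0]], [1.0, 1.0])
print(mean, var)                                           # [1.0] [4.0]
print(ot_map_diag([1.0], [0.0], [1.0], mean, var))         # [3.0]
print(ot_map_diag([1.0], [0.0], [1.0], mean, var, t=0.5))  # [2.0]
```

In a one-shot round under this reading, each client would send its per-dimension feature means and variances once. The server would form the barycenter, and each client would map its frozen-encoder features toward it.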

2606.16891 2026-06-16 cs.LG cs.AI new submission

Beyond Weights and Gradients: A Taxonomy of Federated Learning Messages

Alvaro Javier Vargas Guerrero, Xinguang Wang, Quang Manh Doan, Guy Nagels

Affiliations: AIMS lab, Center for Neurosciences, UZ Brussel, Vrije Universiteit Brussel, Brussels, Belgium; Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium

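AI summary: The paper proposes a formal mathematical definition of a federated message and a three-category taxonomy (model structures, statistical summaries, and data-conditioned representations). It analyzes computation, communication, and privacy trade-offs, and a survey of 202 publications reveals a diversification of messaging paradigms since 2021.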

Comments 4 figures, 9 pages, with 7 pages of content

Abstract

Federated Learning is rapidly evolving beyond the exchange of traditional model weights and gradients, yet existing definitions fail to capture the full scope of modern payloads like synthetic data and federated analytics. This paper addresses the gap by proposing a formal mathematical definition of a federated message that accounts for both utility and privacy. We introduce a taxonomy that organizes these exchanges into three categories: model structures, statistical summaries, and data-conditioned representations. By evaluating these groups based on computational demands, communication costs, and privacy risks, we provide a clearer understanding of the trade-offs involved in decentralized training. Our review of 202 recent publications highlights a significant shift since 2021 toward diverse messaging paradigms, signaling a move away from standard deep learning updates toward more specialized information sharing. This framework provides a structured path for future research to optimize federated systems for varying hardware and security requirements.

2606.16952 2026-06-16 cs.LG cs.AI stat.AP stat.ME stat.ML new submission

Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data

Kareem Amin, Rudrajit Das, Alessandro Epasto, Adel Javanmard, Dennis Kraft, Mónica Ribero, Sergei Vassilvitskii

Affiliations: Google; University of Southern California

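AI summary: The paper proposes a customizable empirical auditing framework that distinguishes true disclosures from phantom disclosures and uses statistical hypothesis testing to detect privacy leakage in synthetic data. It requires neither model access nor a reference model and gives tighter lower bounds on privacy leakage than prior methods.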

Comments 35 pages, 10 tables, 5 figures

Abstract

The rapid adoption of generative AI and Large Language Models (LLMs) has spurred interest in synthetic data as a privacy-preserving alternative to sensitive real-world datasets. However, generating high-utility synthetic data often carries the risk of memorizing and regurgitating private information from the training corpus. In this work, we present a customizable empirical auditing framework designed to detect and explain such data disclosures. Our framework introduces a mechanism to distinguish between "true disclosures", where the system directly reproduces a user's information, and "phantom disclosures", where the system incidentally generates a user's data. By partitioning input data into training and holdout sets and applying rigorous statistical hypothesis testing, we determine if observed disclosures are consistent with strict privacy baselines, such as zero-learning or specific Differential Privacy (DP) bounds. Crucially, this approach requires no model access, no canary insertion, and no reference model training, only the synthetic output and a held-out control set. We demonstrate that this framework effectively functions as a membership inference attack, providing empirical lower bounds on privacy leakage that are tighter than prior data-based auditing methods. Our approach is model-agnostic, applies to any synthetic data generation mechanism, and requires orders of magnitude fewer computational resources than shadow-model or canary-based alternatives.
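The train/holdout comparison can be illustrated with an exact one-sided test of the zero-learning null. This is an assumed stand-in and not the paper's procedure:

- it treats "disclosed" as a given binary outcome per record,
- it uses a hypergeometric (Fisher-style) tail,
- it does not cover the DP-bound nulls.

`disclosure_p_value` is a hypothetical name.

```python
from math import comb


def disclosure_p_value(k_train, n_train, k_hold, n_hold):
    """One-sided exact test of the zero-learning null.

    Under the null, a record is equally likely to be disclosed whether it
    sat in the training set or the holdout set. Given K = k_train + k_hold
    total disclosures, the training count is then hypergeometric. A small
    p-value means training records are disclosed more often than
    incidental (phantom) matches alone would explain.
    """
    n = n_train + n_hold
    k = k_train + k_hold
    denom = comb(n, k)
    upper = min(k, n_train)
    # P(X >= k_train) for X ~ Hypergeometric(n, n_train, k).
    tail = sum(comb(n_train, i) * comb(n_hold, k - i)
               for i in range(k_train, upper + 1))
    return tail / denom


print(round(disclosure_p_value(5, 5, 0, 5), 4))  # 0.004
```

With five disclosed training records and none in an equal-sized holdout, the null is rejected at the 1% level. Equal disclosure counts in both sets give a large p-value.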

2606.17035 2026-06-16 cs.LG cs.CR new submission

Your Privacy My Cloak: Backdoor Attacks on Differentially Private Federated Learning

Xiaolin Li, Ning Wang, Ninghui Li, Wenhai Sun

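AI summary: The paper proposes RING, a backdoor attack on differentially private federated learning that exploits the masking effect of differential privacy to evade defenses. RING reaches an average attack success rate of 90.3% under a moderate privacy budget.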

Abstract

Prior research suggests that differential privacy (DP) inherently enhances the robustness of federated learning (FL) against backdoor attacks. In this paper, we challenge this assumption. Through an empirical analysis of two baseline attack strategies, we uncover a fundamental tension in DP-FL: while bypassing DP allows state-of-the-art defenses to detect and filter malicious updates, complying with DP inadvertently masks their distinguishing statistical characteristics. Consequently, existing defenses become ineffective as DP reduces the raw backdoor signal. Building on this masking effect, we propose RING, a novel attack that explicitly exploits DP to conceal malicious contributions while maximizing attack impact. By collaboratively crafting adversarial perturbations, compromised clients reconstruct a strong backdoor signal during aggregation without triggering anomaly detection. RING operates as a perturbation layer that is agnostic to the underlying backdoor technique, making it broadly applicable and composable with existing attacks; this property significantly amplifies the threat it poses to DP-FL. Extensive evaluations across four image and text datasets under non-IID distributions show that RING achieves an average attack success rate of 90.3% against six state-of-the-art defenses under a moderate privacy budget, an improvement of up to 26.08x over baseline strategies. Finally, we evaluate potential countermeasures and find that mitigating this threat incurs significant utility trade-offs, exposing a fundamental security gap in the deployment of differentially private FL.

2606.14987 2026-06-16 cs.CR cs.LG 交叉投稿

Continual Backdoor Training in IoT/CPS

物联网/信息物理系统中的持续后门训练

Oxana Salish, Kuniyilh S

AI总结 本文提出一种针对物联网/信息物理系统中持续学习的后门攻击方法,通过形式化威胁模型、分析持续学习放大后门持久性的原因,并评估不同条件下的攻击效果,揭示了保障终身学习安全的关键挑战。

AI中文摘要

物联网(IoT)和信息物理系统(CPS)越来越依赖持续学习(CL)来适应不断变化的环境、设备异构性和概念漂移,从而提高整体效用。虽然持续适应对于数据模式演变的长期IoT部署至关重要,但它也引入了新的安全漏洞。特别是,后门攻击可以利用增量更新、重放缓冲区和表示重用来植入持久的恶意行为,这些行为在正常操作期间保持休眠,但在特定触发器激活时被触发。在本文中,我们提出了一种针对IoT/CPS系统中持续学习的后门攻击。为此,我们形式化了IoT/CPS特定的威胁模型,分析了为什么持续学习会放大IoT流水线中的后门持久性,并在不同条件下评估了我们的技术。我们的分析强调了在IoT/CPS和工业物联网(IIoT)环境中保障终身学习的关键开放挑战,以及加强安全控制的必要性。

英文摘要

Internet of Things (IoT) and cyber-physical systems (CPS) increasingly rely on continual learning (CL) to adapt to evolving environments, device heterogeneity, and concept drift, thereby improving overall utility. While continual adaptation is essential for long-lived IoT deployments where data patterns evolve, it also introduces new security vulnerabilities. In particular, backdoor attacks can exploit incremental updates, replay buffers, and representation reuse to implant persistent malicious behaviors that remain dormant during normal operation but activate upon specific triggers. In this paper, we present a backdoor attack against continual learning in IoT/CPS. To this end, we formalize an IoT/CPS-specific threat model, analyze why continual learning amplifies backdoor persistence in IoT pipelines, and evaluate our technique under varying conditions. Our analysis highlights critical open challenges in securing lifelong learning in IoT/CPS and industrial IoT (IIoT) environments, as well as the need for heightened security controls.

2606.15277 2026-06-16 cs.IR cs.AI cs.DB cs.ET cs.LG 交叉投稿

Guiding Federated Graph Recommendation with LLM-encoded knowledge

利用LLM编码知识指导联邦图推荐

Thi Minh Chau Nguyen, Hien Trang Nguyen, Duc Anh Nguyen, Van Ho-Long, Thanh Trung Huynh, Zhao Ren

AI总结 针对联邦图推荐中跨客户端图表示对齐难的问题,提出利用大语言模型编码的语义信号指导结构表示的选择性聚合,提升推荐准确性。

Comments Technical Report

AI中文摘要

基于图的推荐系统在从用户-物品交互中提取协同信号方面非常有效,联邦学习(FL)则可以在保护用户隐私的同时训练这些模型。然而,跨分布式、非独立同分布(non-IID)客户端聚合图表示仍然是一个挑战;局部学习的结构嵌入常常不对齐,简单的平均无法捕捉有意义的跨客户端关系。大多数现有的联邦图方法仅依赖结构聚合,忽略了大型语言模型(LLM)中丰富的全局语义上下文。在本文中,我们提出了一种新颖的框架,利用LLM编码的知识来指导联邦图推荐。具体来说,客户端从局部图中学习结构表示,同时通过冻结的LLM将其典型交互模式总结为紧凑的语义向量。中央服务器随后利用这些LLM编码的语义信号发现跨客户端的相关偏好模式,指导其结构表示的选择性聚合。这实现了语义感知的跨客户端协作,而无需暴露原始数据。在标准基准上的大量实验表明,利用LLM编码知识指导结构对齐一致地提高了现有联邦图基线的推荐准确性。

英文摘要

Graph-based recommender systems are highly effective at extracting collaborative signals from user-item interactions, and federated learning (FL) allows these models to be trained while preserving user privacy. However, aggregating graph representations across distributed, non-IID clients remains a challenge; structural embeddings learned locally often misalign, and naive averaging fails to capture meaningful cross-client relationships. Most existing federated graph methods rely exclusively on structural aggregation, neglecting the rich, global semantic context available in large language models (LLMs). In this paper, we propose a novel framework that uses LLM-encoded knowledge to guide federated graph recommendation. Specifically, clients learn structural representations from local graphs while simultaneously summarizing their typical interaction patterns into compact semantic vectors via a frozen LLM. The central server then uses these LLM-encoded semantic signals to discover related preference patterns across clients, guiding the selective aggregation of their structural representations. This enables semantically informed cross-client collaboration without exposing raw data. Extensive experiments on standard benchmarks show that guiding structural alignment with LLM-encoded knowledge consistently improves recommendation accuracy over existing federated graph baselines.
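
The server-side step can be pictured with a minimal sketch built on two assumptions that do not come from the paper. First, semantic relatedness is the cosine similarity between clients' LLM-derived summary vectors. Second, "selective aggregation" means each client receives the mean of the structural vectors of clients whose similarity to it passes a fixed threshold. `selective_aggregate` and `threshold` are hypothetical names, and plain lists stand in for embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def selective_aggregate(semantic, structural, threshold):
    """For each client i, average the structural vectors of every client j
    (always including i) whose semantic vector is at least `threshold`-similar to i's."""
    n = len(semantic)
    aggregated = []
    for i in range(n):
        peers = [j for j in range(n)
                 if j == i or cosine(semantic[i], semantic[j]) >= threshold]
        dim = len(structural[i])
        aggregated.append([sum(structural[j][d] for j in peers) / len(peers)
                           for d in range(dim)])
    return aggregated


# Usage: clients 0 and 1 have similar preference summaries, client 2 does not.
semantic = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
structural = [[1.0], [3.0], [10.0]]
print(selective_aggregate(semantic, structural, threshold=0.9))  # [[2.0], [2.0], [10.0]]
```

Client 2 keeps its own structural vector instead of being averaged with unrelated clients. That is the failure mode of naive averaging that the abstract points to.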

2606.15963 2026-06-16 cs.DC cs.AI cs.CL cs.LG 交叉投稿

PreLort: Prefix-Nested LoRA for Federated Fine-Tuning under Rank Heterogeneity

PreLort: 面向秩异构联邦微调的前缀嵌套LoRA

Muhammad Waseem, Nurbek Tastan, Andrej Jovanovic, Nicholas D. Lane, Nils Lukas, Karthik Nandakumar, Samuel Horvath

发表机构 * MBZUAI, UAE(MBZUAI,阿联酋) University of Cambridge, UK(剑桥大学,英国) Flower Labs, UK(Flower Labs,英国) Michigan State University, USA(密歇根州立大学,美国)

AI总结 针对联邦LoRA中异构秩导致的信息分布不均问题,提出PreLort方法,通过前缀层次化嵌套低秩结构、分段聚合规则和前缀嵌套训练策略,使低秩客户端受益于高秩客户端的丰富信息,在准确率和ROUGE-L上优于现有方法。

AI中文摘要

使用LoRA等参数高效方法对大型语言模型进行联邦微调,能够实现基础模型的隐私保护适配。异构硬件资源带来了挑战,因为具有不同适配器秩的客户端无法直接聚合。现有方法虽能实现异构秩下的聚合,但未能控制信息在秩维度上的分布,导致共享低秩表示利用不充分。为此,我们提出PreLort:一种用于联邦LoRA的嵌套低秩公式,将适配器维度组织成前缀层次结构。我们的方法确保较低秩维度编码任务相关信息,而较高秩维度捕获额外容量。基于此,我们引入(i)分段聚合规则,仅对贡献于每个秩分段的客户端进行平均,避免来自零填充低秩客户端的稀释;以及(ii)前缀嵌套训练策略,在多个秩截断下优化每个适配器,鼓励有用信号集中在低秩前缀维度。这些组件共同鼓励一个一致的低秩前缀捕获最主要的任务相关信息,而较高秩维度学习额外容量。由于前缀维度得到一致的学习和聚合,低秩客户端能够受益于高秩客户端贡献的更丰富信息。实验表明,我们的方法在准确率和ROUGE-L上持续优于先前的异构联邦LoRA方法,并在多个基础模型上实现了更低或相当的困惑度。

英文摘要

Federated fine-tuning of large language models using parameter-efficient methods such as LoRA enables privacy-preserving adaptation of foundation models. Heterogeneous hardware resources introduce challenges, as clients with different adapter ranks cannot be directly aggregated. While existing methods enable aggregation under heterogeneous ranks, they fail to control how information is distributed across rank dimensions, leading to suboptimal use of shared low-rank representations. To address this, we propose PreLort: a nested low-rank formulation for federated LoRA that organizes adapter dimensions into a prefix hierarchy. Our approach ensures that lower-rank dimensions encode task-relevant information, while higher-rank dimensions capture additional capacity. Building on this, we introduce (i) a segment-wise aggregation rule that averages only over clients contributing to each rank segment, avoiding dilution from zero-padded lower-rank clients, and (ii) a prefix-nested training strategy that optimizes each adapter under multiple rank truncations, encouraging useful signal to concentrate in low-rank prefix dimensions. Together, these components encourage a consistent low-rank prefix capturing the most task-relevant information, while higher-rank dimensions learn additional capacity. This allows low-rank clients to benefit from richer information contributed by higher-rank clients, as prefix dimensions are consistently learned and aggregated. Experiments demonstrate that our method consistently outperforms prior heterogeneous federated LoRA methods in accuracy and ROUGE-L, while achieving lower or comparable perplexity across multiple base models.
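
The segment-wise rule can be contrasted with zero-padded averaging in a minimal sketch. It assumes each client's adapter is stored as a list of rank rows (for example, the rows of a LoRA A factor) and that client ranks are nested prefixes of the largest rank. Averaging each rank row only over the clients that have it matches segment-wise averaging when segment boundaries coincide with client ranks. Function names are hypothetical, and the prefix-nested training step is not shown.

```python
def zero_padded_average(adapters, max_rank):
    """Naive baseline: missing rank rows count as zeros and every client is in the denominator."""
    dim = len(adapters[0][0])
    out = []
    for j in range(max_rank):
        row = [0.0] * dim
        for adapter in adapters:
            if j < len(adapter):
                for d in range(dim):
                    row[d] += adapter[j][d]
        out.append([v / len(adapters) for v in row])
    return out


def segment_wise_average(adapters, max_rank):
    """Average rank row j only over the clients whose rank is larger than j."""
    dim = len(adapters[0][0])
    out = []
    for j in range(max_rank):
        contributors = [adapter[j] for adapter in adapters if j < len(adapter)]
        if not contributors:
            out.append([0.0] * dim)
            continue
        out.append([sum(row[d] for row in contributors) / len(contributors)
                    for d in range(dim)])
    return out


# Usage: a rank-2 client and a rank-1 client, one feature per rank row.
adapters = [[[1.0], [1.0]], [[1.0]]]
print(zero_padded_average(adapters, 2))   # [[1.0], [0.5]]  (row 1 diluted by padding)
print(segment_wise_average(adapters, 2))  # [[1.0], [1.0]]
```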

2606.16100 2026-06-16 cs.CR cs.CL cs.LG 交叉投稿

Your "Pro" LLM Subscription May Actually Be "Free": Exposing Fingerprint Spoofing Risks in LLM Inference Services

你的“专业”LLM订阅可能实际上是“免费”的:揭示LLM推理服务中的指纹欺骗风险

Jiahao Zhang, Xiuyu Li, Suhang Wang

发表机构 * The Pennsylvania State University(宾夕法尼亚州立大学)

AI总结 提出指纹欺骗攻击,恶意服务商通过参数高效微调弱模型模仿强模型,绕过用户指纹验证;理论证明用户资源限制导致指纹易被欺骗,并设计GhostPrint攻击框架,实验表明其能以低成本持续绕过主流指纹方法。

AI中文摘要

随着大型语言模型(LLM)API变得无处不在,用户越来越依赖黑盒指纹识别来验证提供商是否提供广告中宣传的高级模型。然而,这些方法可能忽视那些操纵模型权重以欺骗指纹识别过程的对抗性提供商。我们引入了一种称为指纹欺骗的新威胁,其中恶意提供商隐秘地提供一个通过参数高效微调以模仿更强模型的较弱模型,从而规避用户端的指纹识别。我们首先正式证明用户端资源限制(即有限的查询预算和弱指纹分类器)使得当前的指纹识别容易受到指纹欺骗。在此理论分析指导下,我们提出了GhostPrint,一个利用代理建模、奖励排名微调和知识蒸馏的成本效益攻击框架。在静态和持续指纹识别设置中的广泛评估表明,GhostPrint允许弱模型以低微调成本持续绕过代表性指纹方法,同时保持实用性,暴露了当前LLM指纹识别流程中的一个关键漏洞。

英文摘要

As Large Language Model (LLM) APIs become ubiquitous, users increasingly rely on black-box fingerprinting to verify that providers are serving the advertised premium models. However, these methods may overlook adversarial providers who manipulate model weights to cheat the fingerprinting process. We introduce a novel threat termed fingerprint spoofing, where a malicious provider stealthily serves a weaker model that has been parameter-efficiently fine-tuned to mimic a stronger model, thereby evading user-side fingerprinting. We first formally prove that user-side resource constraints (i.e., finite query budgets and weak fingerprinting classifiers) make current fingerprinting vulnerable to fingerprint spoofing. Guided by this theoretical analysis, we propose GhostPrint, a cost-effective attack framework leveraging surrogate modeling, reward-ranked fine-tuning, and knowledge distillation. Extensive evaluations in both static and continual fingerprinting settings demonstrate that GhostPrint allows weak models to consistently bypass representative fingerprint methods while maintaining utility at a low fine-tuning cost, exposing a critical vulnerability in current LLM fingerprinting pipelines.

2606.16180 2026-06-16 cs.CV cs.LG 交叉投稿

To forget is to preserve: Machine Unlearning for 3D medical image segmentation

遗忘即保留:面向3D医学图像分割的机器遗忘

Nitesh Kumar Singh, Akhilesh Singh, Arjun Arora

发表机构 * University of California, San Diego(加州大学圣地亚哥分校)

AI总结 针对数据隐私法规,研究基于四种机制的近似遗忘策略在3D医学图像分割中的应用,通过Dice系数和MAE评估,发现噪声标签策略在遗忘集和保留集间取得最佳平衡。

Comments 9 pages, 5 figures

AI中文摘要

随着新的数据隐私法规(如GDPR [1])允许个人要求从训练好的机器学习模型中删除其任何个人信息,人们开始推动研究从模型中遗忘数据以遵守这些法律。在这方面,基于四种机制,我们考虑了几种应用于MRBrainS18数据集 [2] 的近似遗忘策略。我们使用3D ResNet-50 [3] 作为分割的骨干架构,该架构已通过Med3D框架 [4] 进行预训练。以预训练模型为基线,我们评估了在两类主体(即保留和遗忘)上的相应保留准确率。我们通过Dice相似系数和平均绝对误差(MAE)值评估这些方法,使用两个独立的训练周期(20和50个epoch)。结果表明,噪声标签策略具有最佳的整体权衡,在50个epoch后,遗忘集准确率下降93%,同时保留集准确率保持84%。所有其他策略在更高的epoch数下表现出极端的遗忘水平,同时其保留集性能也出现灾难性退化。本研究结果为在主体特定水平上的遗忘提供了严格的性能指标基线,并为从业者选择适当策略提供了明确标准。

英文摘要

With new data privacy laws such as the General Data Protection Regulation (GDPR) [1] that allow individuals to ask that any of their personal information be erased from trained machine learning models, there has been a push to investigate the unlearning of data from models as a way to comply with these laws. In this regard, we consider several approximate unlearning strategies, based on four mechanisms, applied to the MRBrainS18 dataset [2]. We use a 3D ResNet-50 [3], pre-trained with the Med3D framework [4], as the backbone architecture for segmentation. Taking the pre-trained model as a baseline, we evaluate the respective retention accuracy on two types of subjects, i.e., retain and forget. We assess these approaches through their Dice similarity coefficient and mean absolute error (MAE) values over two separate training horizons (20 and 50 epochs). The results show that the Noisy Label strategy had the best overall trade-off, with a 93% decrease on the forget set while maintaining 84% accuracy on the retained set after 50 epochs. All other strategies showed extreme levels of forgetting at higher epoch numbers while also suffering catastrophic degradation of their retain-set performance. The results of this study provide a strict baseline of performance metrics for unlearning at a subject-specific level and give practitioners clear criteria for selecting appropriate strategies.
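
The two evaluation metrics named above can be computed as follows. This is a minimal sketch under three assumptions:

* segmentations are flattened into voxel sequences;
* multi-class label maps are scored one label at a time (one-vs-rest Dice);
* MAE is taken voxel-wise.

The abstract does not give the exact evaluation protocol, and the function names are hypothetical.

```python
def dice_coefficient(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as flat sequences of 0/1 or bools."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention (assumption): two empty masks count as a perfect match.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


def per_label_dice(pred_labels, target_labels, label):
    """One-vs-rest Dice for a single tissue label in a multi-class segmentation."""
    return dice_coefficient([p == label for p in pred_labels],
                            [t == label for t in target_labels])


def mean_absolute_error(pred, target):
    """Voxel-wise MAE between two equally sized sequences (e.g. probability maps)."""
    if len(pred) != len(target):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Usage on a tiny flattened "volume".
print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))     # 0.5
print(mean_absolute_error([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```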

2606.16763 2026-06-16 cs.CR cs.IT cs.LG math.IT 交叉投稿

Cross-Silo De-Anonymization Under Local Differential Privacy: Threat Model, Phase Transition, and Coordination Necessity

本地差分隐私下的跨数据源去匿名化:威胁模型、相变与协调必要性

Ziniu Liu, Aiping Li

AI总结 本文提出跨数据源人员级DP(XSP-DP)框架,证明去匿名化在k*=Θ(log n/ε²)处发生相变,并表明跨数据源协调对防御攻击是必要的。

Comments 23 pages, 4 figures

详情
AI中文摘要

当一个人的记录出现在k个独立数据源中,每个数据源受(ε, δ)-差分隐私保护时,标准组合机制为联合输出提供有效的(kε, kδ)-DP保证。然而,这个最坏情况边界并未回答具体的推断问题:在多大的k下,攻击者实际上能识别出目标人物?本文开发了回答该问题所需的信息论框架。我们引入了跨数据源人员级DP(XSP-DP),这是一种Pufferfish风格的隐私概念,其邻接关系同时捕获单个人员在所有数据源中的所有记录,并验证了标准基本组合边界适用于该邻接模型。在此框架内,我们证明去匿名化在k* = Θ(log n / ε²)(总体规模n,每个数据源RR参数ε)处经历相变:Fano下界表明当k ≪ k*时任何估计器都失败,而匹配的最大似然上界表明当k ≫ k*时攻击成功。一个显式的XOR+随机响应构造展示了信息协同:每个数据源的输出单独对目标无信息,但联合互信息严格为正。对于非协调的二元随机响应机制,我们证明一旦k超过阈值,去匿名化不可避免,从而确立了跨数据源协调的必要性。这些结果为本地DP下的跨数据源推断攻击提供了基线威胁模型和Θ级阈值。

英文摘要

When a person's records appear in k independent data silos, each protected by (ε, δ)-differential privacy, standard composition yields a valid (kε, kδ)-DP guarantee for the joint output. This worst-case bound, however, does not answer the concrete inference question: at what k can an adversary actually identify a target person? This paper develops the information-theoretic framework needed to answer that question. We introduce cross-silo person-level DP (XSP-DP), a Pufferfish-style privacy notion whose adjacency relation captures all records of a single person across all silos simultaneously, and verify that the standard basic composition bound carries over to this adjacency model. Within this framework we prove that de-anonymization undergoes a phase transition at k* = Θ(log n / ε²) (population size n, per-silo randomized-response (RR) parameter ε): a Fano lower bound shows any estimator fails for k ≪ k*, while a matching maximum-likelihood upper bound shows the attack succeeds for k ≫ k*. An explicit XOR + randomized-response construction demonstrates information synergy: each silo's output is individually uninformative about the target, yet the joint mutual information is strictly positive. For non-coordinated binary randomized-response mechanisms, we prove that de-anonymization is inevitable once k exceeds the threshold, establishing that cross-silo coordination is necessary. These results provide a baseline threat model and a Θ-level threshold for cross-silo inference attacks under local DP.
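
To make the "information synergy" point concrete, here is an exact, enumeration-based check on a toy two-silo construction in the spirit of the abstract. Silo 1 holds a uniform random mask r, and silo 2 holds b XOR r for a secret bit b. Each silo releases its bit through binary randomized response, which keeps the true bit with probability e^ε / (1 + e^ε) and is therefore ε-LDP. The construction and function names are illustrative assumptions; the paper's explicit construction may differ, and the k* threshold is not simulated here.

```python
import itertools
import math


def keep_prob(epsilon):
    """Binary randomized response keeps the true bit w.p. e^eps / (1 + e^eps) (eps-LDP)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def rr_prob(reported, true_bit, epsilon):
    """P(report = reported | true bit = true_bit) under binary randomized response."""
    p = keep_prob(epsilon)
    return p if reported == true_bit else 1.0 - p


def mutual_information(joint):
    """I(B; Y) in bits from a dict {(b, y): probability}."""
    pb, py = {}, {}
    for (b, y), pr in joint.items():
        pb[b] = pb.get(b, 0.0) + pr
        py[y] = py.get(y, 0.0) + pr
    return sum(pr * math.log2(pr / (pb[b] * py[y]))
               for (b, y), pr in joint.items() if pr > 0)


def xor_rr_joints(epsilon):
    """Exact joints of the secret bit b with the silo outputs, by enumeration.

    Silo 1 holds a uniform mask r, silo 2 holds b XOR r; each releases its bit via RR.
    Returns the joints of (b, y1), (b, y2) and (b, (y1, y2)).
    """
    j1, j2, j12 = {}, {}, {}
    for b, r, y1, y2 in itertools.product((0, 1), repeat=4):
        pr = 0.25 * rr_prob(y1, r, epsilon) * rr_prob(y2, b ^ r, epsilon)
        j1[(b, y1)] = j1.get((b, y1), 0.0) + pr
        j2[(b, y2)] = j2.get((b, y2), 0.0) + pr
        j12[(b, (y1, y2))] = j12.get((b, (y1, y2)), 0.0) + pr
    return j1, j2, j12


j1, j2, j12 = xor_rr_joints(epsilon=1.0)
for name, joint in (("silo 1", j1), ("silo 2", j2), ("joint", j12)):
    print(name, round(mutual_information(joint), 4))
```

Each silo's output alone carries (numerically) zero information about b, while the pair carries 1 - H₂(p² + (1 - p)²) bits, with p = e^ε / (1 + e^ε). That quantity is strictly positive for any ε > 0.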

2407.04884 2026-06-16 cs.LG cs.CR 版本更新

Convex Approximation of Two-Layer ReLU Networks for Hidden State Differential Privacy

两层ReLU网络的凸近似用于隐藏状态差分隐私

Rob Romijnders, Antti Koskela

发表机构 * University of Amsterdam(阿姆斯特丹大学) Nokia Bell Labs(诺基亚贝尔实验室)

AI总结 提出通过ReLU最小化问题的对偶形式的随机近似,将两层ReLU网络转化为强凸问题,从而应用隐藏状态差分隐私分析,实现与DP-SGD相当的隐私-效用权衡。

Comments Errata: correction of Lemma 3.1. Added experiments in Appendix D.1

AI中文摘要

差分隐私的隐藏状态威胁模型假设攻击者只能访问最终训练好的机器学习模型,而无法看到训练过程中的中间状态。然而,当前该模型下的隐私分析仅限于凸优化问题,降低了其对多层神经网络的适用性,而多层神经网络在现代深度学习应用中至关重要。值得注意的是,隐藏状态隐私分析在分类任务中最成功的应用仅限于逻辑回归模型。我们证明,通过ReLU最小化问题的对偶形式的随机近似,可以得到一个强凸问题,从而能够私下训练凸问题,其隐私-效用权衡与使用差分隐私随机梯度下降(DP-SGD)训练的两层ReLU网络相当。这使得现有的隐藏状态隐私分析得以应用,并为使用固定不相交小批量的噪声循环小批量梯度下降(NoisyCGD)方法提供准确的隐私界限。在基准分类任务上的实证结果表明,NoisyCGD可以实现与应用于两层ReLU网络的DP-SGD相当的隐私-效用权衡。

英文摘要

The hidden state threat model of differential privacy (DP) assumes that the adversary has access only to the final trained machine learning (ML) model, without seeing intermediate states during training. However, the current privacy analyses under this model are restricted to convex optimization problems, reducing their applicability to multi-layer neural networks, which are essential in modern deep learning applications. Notably, the most successful applications of the hidden state privacy analyses in classification tasks have only been for logistic regression models. We demonstrate that it is possible to privately train convex problems with privacy-utility trade-offs comparable to those of 2-layer ReLU networks trained with DP stochastic gradient descent (DP-SGD). This is achieved through a stochastic approximation of a dual formulation of the ReLU minimization problem, resulting in a strongly convex problem. This enables the use of existing hidden state privacy analyses and provides accurate privacy bounds also for the noisy cyclic mini-batch gradient descent (NoisyCGD) method with fixed disjoint mini-batches. Empirical results on benchmark classification tasks demonstrate that NoisyCGD can achieve privacy-utility trade-offs on par with DP-SGD applied to 2-layer ReLU networks.
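
The paper's convex reformulation of the two-layer ReLU problem is not reproduced here. As a rough illustration of the NoisyCGD loop it enables, the sketch below runs noisy cyclic mini-batch gradient descent with fixed, disjoint mini-batches. The objective is a 1-D ridge regression, a stand-in strongly convex problem assumed for brevity. Per-example gradients are clipped and Gaussian noise is added at every step. `sigma` and `clip` are hypothetical parameters, and no privacy accounting is performed.

```python
import random


def make_fixed_batches(n, batch_size):
    """Partition indices 0..n-1 once into disjoint mini-batches that never change."""
    idx = list(range(n))
    return [idx[i:i + batch_size] for i in range(0, n, batch_size)]


def clip_scalar(g, bound):
    return max(-bound, min(bound, g))


def noisy_cgd(xs, ys, lam, lr, epochs, batch_size, clip, sigma, seed=0):
    """Noisy cyclic mini-batch GD on the strongly convex 1-D ridge objective
    f(w) = mean_i 0.5 * (w * x_i - y_i)^2 + 0.5 * lam * w^2.
    Only the final iterate is returned (the hidden-state setting)."""
    rng = random.Random(seed)
    batches = make_fixed_batches(len(xs), batch_size)
    w = 0.0
    for _ in range(epochs):
        for batch in batches:  # same batches, same order, every epoch
            grad = sum(clip_scalar((w * xs[i] - ys[i]) * xs[i], clip) for i in batch)
            grad = grad / len(batch) + lam * w
            noise = rng.gauss(0.0, sigma * clip / len(batch))
            w -= lr * (grad + noise)
    return w


# Usage: with sigma=0 the iterate settles close to the ridge optimum 15 / 7.6 ≈ 1.974.
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
w_plain = noisy_cgd(xs, ys, lam=0.1, lr=0.01, epochs=2000, batch_size=2, clip=50.0, sigma=0.0)
w_noisy = noisy_cgd(xs, ys, lam=0.1, lr=0.01, epochs=2000, batch_size=2, clip=50.0, sigma=0.05)
print(w_plain, w_noisy)
```

With a constant step size, cyclic GD settles at a point within O(lr) of the optimum rather than exactly on it. Strong convexity is what keeps this loop contractive, which is the property hidden-state privacy analyses rely on.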

2411.02908 2026-06-16 cs.LG cs.DC 版本更新

Photon: Federated LLM Pre-Training

Photon: 联邦大语言模型预训练

Lorenzo Sani, Alex Iacob, Zeyu Cao, Royson Lee, Bill Marino, Yan Gao, Dongqi Cai, Zexi Li, Wanru Zhao, Xinchi Qiu, Nicholas D. Lane

发表机构 * University of Cambridge(剑桥大学) Flower Labs(Flower实验室) Zhejiang University(浙江大学)

AI总结 提出Photon系统,首次实现联邦端到端LLM预训练,通过跨孤岛联邦学习在弱连接GPU上训练高达7B参数模型,通信量减少64-512倍,收敛速度比DiLoCo快2倍。

Comments 18 pages, 9 appendix pages, 12 figures, 3 algorithms, 12 tables

Journal ref
Proceedings of Machine Learning and Systems 7 (MLSys 2025), 2025
AI中文摘要

扩展大型语言模型(LLM)需要大量的数据和计算资源,传统上由于分布式训练的高带宽需求,这些资源被限制在数据中心内。低带宽方法如联邦学习(FL)如果能够有效地用于预训练,则可以实现在弱连接GPU上协作训练更大的模型。为此,我们介绍了Photon,这是第一个用于联邦端到端LLM训练的系统,利用跨孤岛FL进行全球规模训练,同时最小化通信开销。使用Photon,我们从零开始训练了第一个联邦式仅解码器LLM系列。我们证明:(1)Photon可以以联邦方式训练高达7B的模型大小,同时达到比集中式预训练更好的困惑度;(2)Photon模型训练时间随可用计算资源减少,实现与集中式类似的计算-时间权衡;(3)Photon通过通信量减少64-512倍,比基线分布式训练方法的实际时间提升35%。我们的方法对数据异质性具有鲁棒性,并且收敛速度是DiLoCo等先前方法的两倍。这种惊人的数据效率源于一种独特的方法,即结合小客户端批量大小和极高的学习率,这得益于联邦平均对超参数的鲁棒性。因此,Photon代表了第一个经济可行的全球互联网范围LLM预训练系统。

英文摘要

Scaling large language models (LLMs) demands extensive data and computing resources, which are traditionally constrained to data centers by the high-bandwidth requirements of distributed training. Low-bandwidth methods like federated learning (FL) could enable collaborative training of larger models across weakly-connected GPUs if they can effectively be used for pre-training. To achieve this, we introduce Photon, the first complete system for federated end-to-end LLM training, leveraging cross-silo FL for global-scale training with minimal communication overheads. Using Photon, we train the first federated family of decoder-only LLMs from scratch. We show that: (1) Photon can train model sizes up to 7B in a federated fashion while reaching an even better perplexity than centralized pre-training; (2) Photon model training time decreases with available compute, achieving a similar compute-time trade-off to centralized; and (3) Photon outperforms the wall-time of baseline distributed training methods by 35% via communicating 64x-512x less. Our proposal is robust to data heterogeneity and converges twice as fast as previous methods like DiLoCo. This surprising data efficiency stems from a unique approach combining small client batch sizes with extremely high learning rates, enabled by federated averaging's robustness to hyperparameters. Photon thus represents the first economical system for global internet-wide LLM pre-training.
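
Photon's system details are beyond this digest. The communication saving of federated averaging can still be seen in a toy sketch, which is an assumption and not Photon's implementation. Each client runs several local gradient steps on its own quadratic loss before the server averages the models, so the model is exchanged once per round rather than once per step. `local_steps` and the function names are hypothetical; Photon's optimizer, batch sizes, and learning-rate schedule are not modeled.

```python
def local_steps_on_quadratic(w, center, lr, steps):
    """Client: `steps` gradient steps on its local loss 0.5 * (w - center)^2."""
    for _ in range(steps):
        w -= lr * (w - center)
    return w


def fedavg(centers, rounds, local_steps, lr, w0=0.0):
    """Server: broadcast w, let every client train locally, average the returned models.
    Returns the final model and the total number of client-to-server uploads."""
    w, uploads = w0, 0
    for _ in range(rounds):
        local_models = [local_steps_on_quadratic(w, c, lr, local_steps) for c in centers]
        uploads += len(centers)
        w = sum(local_models) / len(local_models)
    return w, uploads


w, uploads = fedavg(centers=[1.0, 3.0], rounds=50, local_steps=10, lr=0.1)
print(round(w, 6), uploads)  # 2.0 100
```

Synchronizing after every step would take 50 × 10 × 2 = 1000 uploads for the same 500 local steps per client, so `local_steps` divides communication by that factor. The toy clients share the same curvature, so the average lands exactly on the global optimum. With heterogeneous losses, local steps introduce client drift.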

2505.19699 2026-06-16 cs.LG cs.AI cs.DC 版本更新

Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments

Mosaic: 面向异构分布式环境的无数据知识蒸馏与混合专家模型

Junming Liu, Yanting Gao, Yuqi Li, Siyuan Meng, Yifei Sun, Aoqi Wu, Yirong Chen, Ding Wang, Shiping Wen

发表机构 * School of Computer Science and Technology, Tongji University(同济大学计算机科学与技术学院) Shanghai Artificial Intelligence Laboratory(上海人工智能实验室) The City University of New York(纽约城市大学) Shenzhen University of Advanced Technology(深圳理工大学)

AI总结 针对联邦学习中模型与数据异构性问题,提出Mosaic框架,通过本地生成模型合成隐私保护数据,并利用混合专家模型蒸馏全局模型,在图像和多模态基准上超越现有方法。

Comments 23 pages, 5 figures, 24 tables; Accepted by Knowledge-Based Systems, 2026

AI中文摘要

联邦学习(FL)是一种去中心化的机器学习范式,使客户端能够在保护数据隐私的同时协作训练模型。然而,模型和数据异构性的共存导致客户端间表示不一致和优化动态发散,最终阻碍了鲁棒的全局性能。为克服这些挑战,我们提出了Mosaic,一种面向异构分布式环境的新型无数据知识蒸馏框架。Mosaic首先训练本地生成模型以近似每个客户端的个性化分布,从而能够生成合成数据,并通过与真实数据严格分离来保护隐私。随后,Mosaic根据客户端模型的专业知识形成混合专家模型(MoE),并使用生成的数据将其蒸馏到全局模型中。为进一步增强MoE架构,Mosaic通过一个在少量代表性原型上训练的轻量级元模型来集成专家预测。在标准图像和多模态基准上的大量实验表明,Mosaic在模型和数据异构性下均持续优于最先进的方法。源代码已发布在https://github.com/Wings-Of-Disaster/Mosaic。

英文摘要

Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy. However, the coexistence of model and data heterogeneity gives rise to inconsistent representations and divergent optimization dynamics across clients, ultimately hindering robust global performance. To overcome these challenges, we propose Mosaic, a novel data-free knowledge distillation framework tailored for heterogeneous distributed environments. Mosaic first trains local generative models to approximate each client's personalized distribution, enabling synthetic data generation that safeguards privacy through strict separation from real data. Subsequently, Mosaic forms a Mixture-of-Experts (MoE) from client models based on their specialized knowledge, and distills it into a global model using the generated data. To further enhance the MoE architecture, Mosaic integrates expert predictions via a lightweight meta model trained on a few representative prototypes. Extensive experiments on standard image and multimodal benchmarks demonstrate that Mosaic consistently outperforms state-of-the-art approaches under both model and data heterogeneity. The source code has been published at https://github.com/Wings-Of-Disaster/Mosaic.

2505.23593 2026-06-16 cs.LG 版本更新

Federated Foundation Language Model Post-Training Should Focus on Open-Source Models

联邦基础语言模型后训练应聚焦开源模型

Nikita Agrawal, Ruben Mayer

发表机构 * University of Bayreuth(拜罗伊特大学)

AI总结 本文批判性分析联邦后训练中使用黑盒模型的问题,主张采用开源模型以符合联邦学习的数据隐私和自治原则。

Comments Accepted at International Workshop on Federated Learning in the Age of Foundation Models In Conjunction with IJCAI 2026

AI中文摘要

基础语言模型的后训练已成为联邦学习(FL)中一个充满希望的研究领域,其目标是实现隐私保护的模型改进和适应用户的下游任务。该领域的最新进展采用了集中式后训练方法,这些方法建立在无法访问模型权重和架构细节的黑盒基础语言模型之上。尽管黑盒模型在集中式后训练中取得了成功,但它们在FL中的盲目复制引发了一些问题。我们的观点是,在FL中使用黑盒模型与联邦的核心原则(如数据隐私和自治)相矛盾。在本文中,我们批判性地分析了黑盒模型在联邦后训练中的使用,并详细阐述了开放性的各个方面及其对FL的影响。

英文摘要

Post-training of foundation language models has emerged as a promising research domain in federated learning (FL) with the goal of enabling privacy-preserving model improvements and adaptation to users' downstream tasks. Recent advances in this area adopt centralized post-training approaches that build upon black-box foundation language models where there is no access to model weights and architecture details. Although the use of black-box models has been successful in centralized post-training, their blind replication in FL raises several concerns. Our opinion is that using black-box models in FL contradicts the core principles of federation such as data privacy and autonomy. In this paper, we critically analyze the usage of black-box models in federated post-training, and provide a detailed account of various aspects of openness and their implications for FL.

2506.22427 2026-06-16 cs.LG cs.AI 版本更新

CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings

CLoVE: 通过损失向量嵌入聚类的个性化联邦学习

Randeep Bhatia, Nikos Papadis, Murali Kodialam, TV Lakshman, Sayak Chakrabarty

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出CLoVE算法,利用客户端损失向量嵌入进行聚类,实现个性化联邦学习,具有简单、适用监督和无监督任务、无需最优模型初始化等优点,理论证明可高概率准确恢复聚类并指数收敛。

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026); 35 pages, 7 figures

AI中文摘要

我们提出了CLoVE(损失向量嵌入聚类),一种用于聚类联邦学习(CFL)的新算法。在CFL中,客户端根据其数据分布自然分组为聚类。然而,识别这些聚类具有挑战性,因为客户端分配是未知的。CLoVE利用从客户端数据上的模型损失导出的客户端嵌入,并利用以下洞察:同一聚类中的客户端共享相似的损失值,而不同聚类中的客户端表现出不同的损失模式。基于这些嵌入,CLoVE能够迭代地识别和分离来自不同聚类的客户端,并通过联邦聚合优化特定聚类的模型。与现有CFL算法相比,CLoVE的主要优点是:(1)简单性,(2)适用于监督和无监督设置,(3)消除了对接近最优模型初始化的需求,使其更稳健,更适合实际应用。我们建立了理论收敛界,表明CLoVE可以在单轮中以高概率准确恢复聚类,并在线性设置中以指数速度收敛到最优模型。我们与多种CFL和通用个性化联邦学习(PFL)算法在不同类型数据集和广泛非IID设置下的全面实验表明,CLoVE在仅几轮训练中就能实现高度准确的聚类恢复,并在各种监督和无监督PFL任务中达到最先进的模型精度。

英文摘要

We propose CLoVE (Clustering of Loss Vector Embeddings), a novel algorithm for Clustered Federated Learning (CFL). In CFL, clients are naturally grouped into clusters based on their data distribution. However, identifying these clusters is challenging, as client assignments are unknown. CLoVE utilizes client embeddings derived from model losses on client data, and leverages the insight that clients in the same cluster share similar loss values, while those in different clusters exhibit distinct loss patterns. Based on these embeddings, CLoVE is able to iteratively identify and separate clients from different clusters and optimize cluster-specific models through federated aggregation. Key advantages of CLoVE over existing CFL algorithms are (1) its simplicity, (2) its applicability to both supervised and unsupervised settings, and (3) the fact that it eliminates the need for near-optimal model initialization, which makes it more robust and better suited for real-world applications. We establish theoretical convergence bounds, showing that CLoVE can recover clusters accurately with high probability in a single round and converges exponentially fast to optimal models in a linear setting. Our comprehensive experiments comparing with a variety of both CFL and generic Personalized Federated Learning (PFL) algorithms on different types of datasets and an extensive array of non-IID settings demonstrate that CLoVE achieves highly accurate cluster recovery in just a few rounds of training, along with state-of-the-art model accuracy, across a variety of both supervised and unsupervised PFL tasks.
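
The loss-vector-embedding idea can be sketched on 1-D linear regression under the following assumptions:

* each cluster model is a single slope;
* a client's embedding is the vector of mean squared errors of all current models on its data;
* clients are grouped by a simple greedy radius rule on Euclidean distance between embeddings (the paper's clustering procedure may differ);
* each cluster model is refit from its members' summed sufficient statistics, as a stand-in for federated aggregation.

All names are hypothetical.

```python
import math


def mse(slope, data):
    """Mean squared error of the 1-D model y = slope * x on a client's (x, y) pairs."""
    return sum((slope * x - y) ** 2 for x, y in data) / len(data)


def loss_embedding(models, data):
    """Embed a client as the vector of losses of every current cluster model on its data."""
    return [mse(slope, data) for slope in models]


def cluster_embeddings(embeddings, radius):
    """Greedy grouping (assumption): join the first cluster whose seed embedding is
    within `radius` in Euclidean distance, otherwise start a new cluster."""
    seeds, labels = [], []
    for e in embeddings:
        for k, seed in enumerate(seeds):
            if math.dist(e, seed) <= radius:
                labels.append(k)
                break
        else:
            seeds.append(e)
            labels.append(len(seeds) - 1)
    return labels


def fit_cluster_models(clients, labels):
    """Refit each cluster's slope from its members' summed sum(x*y) and sum(x*x)."""
    k = max(labels) + 1
    num, den = [0.0] * k, [0.0] * k
    for data, lab in zip(clients, labels):
        for x, y in data:
            num[lab] += x * y
            den[lab] += x * x
    return [n / d for n, d in zip(num, den)]


# Usage: two clients follow y ≈ 2x, two follow y ≈ -x; initial slopes are arbitrary.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.1), (2.0, 3.9)],
    [(1.0, -1.0), (2.0, -2.0)],
    [(1.0, -1.1), (2.0, -1.9)],
]
models = [0.0, 1.0]
embeddings = [loss_embedding(models, data) for data in clients]
labels = cluster_embeddings(embeddings, radius=1.0)
print(labels)                               # [0, 0, 1, 1]
print(fit_cluster_models(clients, labels))  # approximately [1.99, -0.99]
```

Even though neither initial slope fits either group, clients from the same group produce nearly identical loss vectors, so they are separated in a single pass.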

2510.04902 2026-06-16 cs.LG 版本更新

DP-Hype: Federated Differentially Private Hyperparameter Search

DP-Hype: 联邦差分隐私超参数搜索

Johannes Liebenow, Thorsten Peinemann, Esfandiar Mohammadi

发表机构 * Institute of IT Security, University of Luebeck(吕贝克大学IT安全研究所)

AI总结 提出DP-Hype算法,通过联邦投票基于客户端本地评估进行隐私保护超参数搜索,实现客户端级差分隐私且隐私预算与超参数数量无关,在多种数据集上展示高效用。

Comments PoPETs 26

AI中文摘要

在联邦机器学习中调整超参数可以显著影响模型性能。当超参数在敏感数据上调整时,隐私成为一个重要挑战,为此差分隐私已成为可证明隐私的事实标准。联邦学习中的一个标准设置是客户端就共享设置达成一致,即从一组超参数(如模型的学习率)中找到折衷方案。然而,先前关于隐私保护超参数调整的工作针对特定学习任务定制,未考虑聚合结果的隐私泄露,或提供了次优的隐私-效用权衡。在这项工作中,我们提出了我们的算法DP-Hype,它通过基于客户端的本地超参数评估进行联邦投票来执行联邦且隐私保护的超参数搜索。通过这种方式,DP-Hype选择能够获得大多数客户端支持的折衷超参数,同时保持可扩展性和与特定学习任务的独立性。我们证明了DP-Hype保留了称为客户端级差分隐私的强差分隐私概念,并且重要的是,表明其隐私保证不依赖于超参数的数量。我们还提供了其效用保证的界限,即找到良好超参数的概率,并将DP-Hype作为子模块集成到流行的联邦机器学习框架Flower中。此外,我们在多个基准数据集上的独立同分布以及多个非独立同分布设置下评估了性能,并展示了即使在较小的隐私预算下,DP-Hype也具有高效用。

英文摘要

Tuning hyperparameters in federated machine learning can substantially impact model performance. When hyperparameters are tuned on sensitive data, privacy becomes an important challenge and to this end, differential privacy has emerged as the de facto standard for provable privacy. A standard setting in federated learning is that clients agree on a shared setup, i.e., find a compromise from a set of hyperparameters, like a model's learning rate. Yet, prior work on privacy-preserving hyperparameter tuning is tailored to specific learning tasks, does not account for the privacy leakage of aggregated results, or offers a sub-optimal privacy-utility trade-off. In this work, we present our algorithm DP-Hype, which performs a federated and privacy-preserving hyperparameter search by conducting a federated voting based on local hyperparameter evaluations of clients. In this way, DP-Hype selects hyperparameters that lead to a compromise supported by a majority of clients, while maintaining scalability and independence from specific learning tasks. We prove that DP-Hype preserves the strong notion of differential privacy called client-level differential privacy and, importantly, show that its privacy guarantees do not depend on the number of hyperparameters. We also provide bounds on its utility guarantees, that is, the probability of finding good hyperparameters, and implement DP-Hype as a submodule in the popular Flower framework for federated machine learning. In addition, we evaluate performance on multiple benchmark data sets in iid as well as multiple non-iid settings and demonstrate high utility of DP-Hype even under small privacy budgets.
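
A generic noisy-voting sketch, which is not DP-Hype's exact mechanism, illustrates the voting idea. Every client votes once for the candidate with its best local validation score, so one client changes the vote histogram by at most one vote. The server adds Laplace noise to each count and releases only the noisy argmax. Only the vote leaves the client, not its raw scores. The noise scale needed for a given ε depends on the adjacency notion and is not derived here; `noise_scale` and the function names are hypothetical.

```python
import random


def local_vote(local_scores):
    """Client side: index of the candidate hyperparameter with the best local validation score."""
    return max(range(len(local_scores)), key=lambda i: local_scores[i])


def laplace(rng, scale):
    """Laplace(0, scale) sample as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_argmax(votes, num_candidates, noise_scale, rng):
    """Server side: histogram the votes, add Laplace noise per count, release only the argmax."""
    counts = [0] * num_candidates
    for v in votes:
        counts[v] += 1
    noisy = [c + (laplace(rng, noise_scale) if noise_scale > 0 else 0.0) for c in counts]
    return max(range(num_candidates), key=lambda i: noisy[i])


# Usage: three clients score three candidate learning rates on local validation data.
scores = [[0.70, 0.90, 0.80], [0.60, 0.95, 0.50], [0.90, 0.10, 0.20]]
votes = [local_vote(s) for s in scores]  # [1, 1, 0]
print(noisy_argmax(votes, 3, noise_scale=0.5, rng=random.Random(0)))
```

Releasing only an argmax over noisy counts is one way to keep the released output a single index regardless of how many candidates are voted on. Whether this matches DP-Hype's proof that the guarantee is independent of the number of hyperparameters cannot be checked from the abstract.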

2512.12737 2026-06-16 cs.LG cs.DC 版本更新

Communication-Efficient Neural Tangent Kernels for Heterogeneous Decentralized Federated Learning

面向异构去中心化联邦学习的通信高效神经正切核

Li Xia

发表机构 * Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China(中央民族大学民族语言智能分析与安全治理教育部重点实验室)

AI总结 提出SPARK方法,通过阶段退火软标签正则化器稳定动量加速的神经正切核更新,在异构去中心化联邦学习中实现约3倍收敛加速和高达70%的通信节省。

Comments 33 pages, 13 figures

AI中文摘要

去中心化联邦学习(DFL)无需中央服务器即可实现协作模型训练,但在统计异构性下收敛缓慢。最近的研究表明,神经正切核(NTK)方法在DFL中比基于梯度的更新收敛更快,而动量已被证明能有效加速基于梯度的联邦学习。然而,将动量应用于NTK更新会在异构数据下导致训练不稳定。我们提出SPARK,通过一种在邻域聚合数据上评估的阶段退火软标签正则化器来解决这种不稳定性,从而使动量能够稳定地加速NTK更新。在高异构性下,SPARK的收敛速度比基线快约3倍,并将达到目标精度的总通信量降低高达约70%,且在多种异构性水平下均获得更高精度。我们进一步研究了随机投影作为带宽受限场景下的可选雅可比压缩策略。我们在多个数据集、网络拓扑和异构性水平上验证了该方法。

英文摘要

Decentralized federated learning (DFL) enables collaborative model training without a central server, but converges slowly under statistical heterogeneity. Recent work has shown that neural tangent kernel (NTK) methods achieve faster convergence than gradient-based updates in DFL, while momentum has proven effective for accelerating gradient-based FL. However, applying momentum to NTK updates can destabilize training under heterogeneous data. We propose SPARK, which addresses this instability with a stage-wise annealed soft-label regularizer evaluated on neighborhood-aggregated data, so that momentum can accelerate NTK updates stably. Under high heterogeneity, SPARK converges about 3x faster than baselines and lowers the total communication needed to reach a target accuracy by up to about 70%, and it attains higher accuracy across heterogeneity levels. We further study random projection as an optional Jacobian-compression strategy for bandwidth-constrained settings. We validate the approach across multiple datasets, network topologies, and heterogeneity levels.

2512.18295 2026-06-16 cs.LG cs.AI 版本更新

AL-GNN: Privacy-Preserving and Replay-Free Continual Graph Learning via Analytic Learning

AL-GNN: 基于分析学习的隐私保护且无需重放的持续图学习

Xuling Zhang, Jindong Li, Yifei Zhang, Mingqi Yang, Menglin Yang

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Northwestern Polytechnical University(西北工业大学) South China University of Technology(华南理工大学)

AI总结 提出AL-GNN框架,利用分析学习理论将持续图学习转化为递归最小二乘优化,通过闭式分类器更新和正则化特征自相关矩阵实现无需反向传播和重放缓冲的高效训练,在保护隐私的同时提升性能并减少遗忘。

AI中文摘要

持续图学习(CGL)旨在使图神经网络能够从图结构数据流中增量学习,而不会遗忘先前获得的知识。现有方法,特别是基于经验重放的方法,通常存储并重新访问过去的图数据以缓解灾难性遗忘。然而,这些方法存在显著局限性,包括隐私问题和低效性。在这项工作中,我们提出了AL-GNN,一种新颖的持续图学习框架,消除了对反向传播和重放缓冲区的需求。相反,AL-GNN利用分析学习理论的原理,将学习形式化为递归最小二乘优化过程。它通过闭式分类器更新和正则化特征自相关矩阵来分析和更新模型知识。这种设计使得每个任务能够进行高效的单次训练,并通过避免存储历史样本固有地保护数据隐私。在多个动态图分类基准上的大量实验表明,AL-GNN取得了与现有方法相比具有竞争力或更优的性能。例如,它在CoraFull上平均性能提高了10%,在Reddit上遗忘减少了30%以上,同时由于其无反向传播的设计,训练时间减少了近50%。

英文摘要

Continual graph learning (CGL) aims to enable graph neural networks to incrementally learn from a stream of graph-structured data without forgetting previously acquired knowledge. Existing methods, particularly those based on experience replay, typically store and revisit past graph data to mitigate catastrophic forgetting. However, these approaches pose significant limitations, including privacy concerns and inefficiency. In this work, we propose AL-GNN, a novel framework for continual graph learning that eliminates the need for backpropagation and replay buffers. Instead, AL-GNN leverages principles from analytic learning theory to formulate learning as a recursive least squares optimization process. It maintains and updates model knowledge analytically through closed-form classifier updates and a regularized feature autocorrelation matrix. This design enables efficient one-pass training for each task and inherently preserves data privacy by avoiding historical sample storage. Extensive experiments on multiple dynamic graph classification benchmarks demonstrate that AL-GNN achieves competitive or superior performance compared to existing methods. For instance, it improves average performance by 10% on CoraFull and reduces forgetting by over 30% on Reddit, while also reducing training time by nearly 50% due to its backpropagation-free design.
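
AL-GNN's exact recursive formulation and its GNN feature extractor are not reproduced here. The sketch below illustrates the underlying analytic-learning idea under three assumptions:

* features come from some frozen encoder;
* the classifier is ridge regression onto one-hot labels;
* only R = γI + Σ xxᵀ and Q = Σ xyᵀ are kept across tasks.

Solving W = R⁻¹Q after each task gives (mathematically) the same weights as joint training on all tasks seen so far, with no stored samples and no backpropagation. A true recursive-least-squares version would update R⁻¹ directly, for example via the Woodbury identity, instead of re-solving. Class and function names are hypothetical.

```python
def solve(a, b):
    """Solve A X = B for square A by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + list(rhs) for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [v - factor * w for v, w in zip(m[r], m[c])]
    return [row[n:] for row in m]


class AnalyticClassifier:
    """Replay-free closed-form classifier: keeps only R = gamma*I + sum x x^T and
    Q = sum x y^T; past samples are never stored."""

    def __init__(self, dim, num_classes, gamma):
        self.R = [[gamma if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.Q = [[0.0] * num_classes for _ in range(dim)]

    def update(self, features, one_hot_labels):
        """Absorb one task's data in a single pass (no gradients, no replay buffer)."""
        for x, y in zip(features, one_hot_labels):
            for i in range(len(x)):
                for j in range(len(x)):
                    self.R[i][j] += x[i] * x[j]
                for k in range(len(y)):
                    self.Q[i][k] += x[i] * y[k]

    def weights(self):
        """W = R^{-1} Q: ridge regression on all data seen so far."""
        return solve(self.R, self.Q)

    def predict(self, x):
        w = self.weights()
        scores = [sum(x[i] * w[i][k] for i in range(len(x))) for k in range(len(w[0]))]
        return max(range(len(scores)), key=lambda k: scores[k])


# Usage: task 1 only contains class 0, task 2 only class 1; class 0 is not forgotten.
clf = AnalyticClassifier(dim=2, num_classes=2, gamma=0.1)
clf.update([[1.0, 0.0], [0.9, 0.1]], [[1.0, 0.0], [1.0, 0.0]])
clf.update([[0.0, 1.0], [0.1, 0.9]], [[0.0, 1.0], [0.0, 1.0]])
print(clf.predict([1.0, 0.0]), clf.predict([0.0, 1.0]))  # 0 1
```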

2601.09304 2026-06-16 cs.LG 版本更新

Single-Round Clustered Federated Learning via Data Collaboration Analysis for Non-IID Data

基于数据协作分析的单轮聚类联邦学习用于非独立同分布数据

Sota Sugawara, Yuji Kawamata, Akihiro Toyoda, Tomoru Nakayama, Yukihiko Okada

发表机构 * Graduate School of Science and Technology, University of Tsukuba(筑波大学理学技术研究生院) Center for Artificial Intelligence Research, Tsukuba Institute for Advanced Research, University of Tsukuba(筑波大学先进研究所人工智能研究中心) Institute of Systems and Information Engineering, University of Tsukuba(筑波大学系统与信息工程研究所)

AI总结 提出单轮框架DC-CFL,通过数据协作分析中的总变差距离量化客户端相似性,使用层次聚类和协作学习实现聚类与模型训练,在非IID数据上达到与多轮方法相当的精度。

Comments 9 pages, 3 figures

AI中文摘要

联邦学习(FL)允许多个客户端在不共享原始数据的情况下进行分布式学习。当客户端之间的统计异质性严重时,聚类联邦学习(CFL)可以通过对相似客户端进行分组并训练聚类模型来提高性能。然而,大多数CFL方法依赖多轮通信进行聚类估计和模型更新,这在通信轮数严格受限的情况下限制了其实用性。我们提出了基于数据协作的聚类联邦学习(DC-CFL),这是一个单轮框架,仅使用DC分析中共享的信息即可完成客户端聚类和聚类学习。DC-CFL通过标签分布之间的总变差距离量化客户端间相似性,使用层次聚类估计聚类,并通过DC分析进行聚类学习。在代表性非IID条件下的多个开放数据集上的实验表明,DC-CFL在仅需一轮通信的情况下达到了与多轮基线相当的精度。这些结果表明,当多轮通信不可行时,DC-CFL是协作AI模型开发的一种实用替代方案。我们的源代码已在https://github.com/souta-suga/DC-CFL公开。

英文摘要

Federated Learning (FL) enables distributed learning across multiple clients without sharing raw data. When statistical heterogeneity across clients is severe, Clustered Federated Learning (CFL) can improve performance by grouping similar clients and training cluster-wise models. However, most CFL approaches rely on multiple communication rounds for cluster estimation and model updates, which limits their practicality under tight constraints on communication rounds. We propose Data Collaboration-based Clustered Federated Learning (DC-CFL), a single-round framework that completes both client clustering and cluster-wise learning, using only the information shared in DC analysis. DC-CFL quantifies inter-client similarity via total variation distance between label distributions, estimates clusters using hierarchical clustering, and performs cluster-wise learning via DC analysis. Experiments on multiple open datasets under representative non-IID conditions show that DC-CFL achieves accuracy comparable to multi-round baselines while requiring only one communication round. These results indicate that DC-CFL is a practical alternative for collaborative AI model development when multiple communication rounds are impractical. Our source code is publicly available at https://github.com/souta-suga/DC-CFL.
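
The clustering step can be sketched under three assumptions. First, each client's label distribution is available; DC-CFL obtains what it needs from the DC analysis, which is not modeled here. Second, pairwise dissimilarity is the total variation distance TV(p, q) = ½ Σ_c |p_c − q_c|. Third, clusters come from average-linkage agglomerative clustering, with a hypothetical distance `threshold` as the stopping rule. The cluster-wise DC learning step is not shown.

```python
def label_distribution(labels, num_classes):
    """Empirical label distribution of one client."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    return [c / len(labels) for c in counts]


def tv_distance(p, q):
    """Total variation distance: 0.5 * sum_c |p_c - q_c|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def average_linkage_clusters(dists, threshold):
    """Agglomerative clustering with average linkage on a precomputed distance matrix;
    merging stops once the closest pair of clusters is farther apart than threshold."""
    clusters = [[i] for i in range(len(dists))]

    def linkage(a, b):
        return sum(dists[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)


# Usage: clients 0-1 are mostly class 0, clients 2-3 mostly class 1.
client_labels = [[0, 0, 0, 1], [0, 0, 1, 0], [1, 1, 1, 0], [1, 1, 0, 1]]
dists = [[tv_distance(label_distribution(a, 2), label_distribution(b, 2))
          for b in client_labels] for a in client_labels]
print(average_linkage_clusters(dists, threshold=0.2))  # [[0, 1], [2, 3]]
```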

2601.11219 2026-06-16 cs.LG cs.AI 版本更新

SDFLoRA: Selective Decoupled Federated LoRA for Privacy-preserving Fine-tuning with Heterogeneous Clients

SDFLoRA: 面向异构客户端隐私保护微调的选择性解耦联邦LoRA

Zhikang Shen, Jianrong Lu, Haiyuan Wan, Jianhai Chen

发表机构 * Zhejiang University(浙江大学) Tsinghua University(清华大学)

AI总结 提出SDFLoRA,通过将LoRA更新解耦为共享和私有组件,仅聚合共享部分并注入差分隐私噪声,解决联邦微调中的秩异构和数据异构问题,提升隐私-效用权衡。

AI中文摘要

面向大型语言模型(LLM)的联邦学习(FL)作为一种在分布式数据上适配模型的隐私保护方法,日益受到关注,其中低秩适应(LoRA)等参数高效方法被广泛采用以降低通信和内存成本。然而,实际部署通常表现出秩和数据异构性:客户端在不同的低秩预算和数据分布下运行,使得LoRA更新的直接聚合存在偏差且不稳定。现有方法要么强制统一秩,要么将异构更新对齐到单个共享子空间,这往往会混合可迁移和客户端特定的方向,从而损害个性化。此外,在差分隐私(DP)下,扰动这种结构混合的更新会向本应保持纯局部的方向注入噪声,导致不必要的效用下降。为了解决这些问题,我们提出了选择性解耦联邦LoRA(SDFLoRA),一种结构感知的LoRA框架,将每个客户端更新解耦为用于聚合的共享组件和保留客户端特定语义的私有组件。只有共享组件参与子空间对齐,而私有组件保持本地且不通信,使得训练与DP兼容并在秩异构下稳定聚合。通过仅向聚合的可共享更新注入噪声,该方法避免了对局部方向的扰动,并改善了效用-隐私权衡。在多个基准上的实验表明,SDFLoRA优于联邦LoRA基线,并实现了良好的效用-隐私权衡。

英文摘要

Federated learning (FL) for large language models (LLMs) has attracted increasing attention as a privacy-preserving approach for adapting models over distributed data, where parameter-efficient methods such as Low-Rank Adaptation (LoRA) are widely adopted to reduce communication and memory costs. However, practical deployments often exhibit rank and data heterogeneity: clients operate under different low-rank budgets and data distributions, making direct aggregation of LoRA updates biased and unstable. Existing approaches either enforce a unified rank or align heterogeneous updates into a single shared subspace, which tends to mix transferable and client-specific directions and consequently undermines personalization. Moreover, under differential privacy (DP), perturbing such structurally mixed updates injects noise into directions that should remain purely local, leading to unnecessary utility degradation. To address these issues, we propose Selective Decoupled Federated LoRA (SDFLoRA), a structure-aware LoRA framework that decouples each client update into a shared component for aggregation and a private component that preserves client-specific semantics. Only the shared component participates in subspace alignment, while the private component remains local and uncommunicated, making the training DP-compatible and stabilizing aggregation under rank heterogeneity. By injecting noise only into the aggregated shareable update, this approach avoids perturbations to local directions and improves the utility-privacy trade-off. Experiments on multiple benchmarks demonstrate that SDFLoRA outperforms federated LoRA baselines and achieves a strong utility-privacy trade-off.
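The server side of the idea can be sketched simply: aggregate and noise only the shared component, and never touch the private one. The sketch makes several assumptions. Each client has already split its update into a `(shared, private)` pair of flat vectors with a common shared dimension. The noise follows a standard clip-then-Gaussian mechanism. The paper's rank-heterogeneous subspace alignment and its exact DP accounting are not reproduced. `clip` and `aggregate_shared` are hypothetical names.

```python
import math
import random

def clip(vec, max_norm):
    """Scale `vec` so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]

def aggregate_shared(client_updates, max_norm, noise_mult, rng):
    """client_updates: list of (shared, private) pairs (an assumed format).
    Only the clipped shared parts are averaged and noised; the private
    parts are never read, i.e. they stay on the clients."""
    shared = [clip(s, max_norm) for s, _private in client_updates]
    n, dim = len(shared), len(shared[0])
    mean = [sum(s[k] for s in shared) / n for k in range(dim)]
    # Gaussian noise calibrated to the clipped sum (sensitivity max_norm),
    # rescaled to the mean; noise_mult is a hypothetical noise multiplier.
    std = noise_mult * max_norm / n
    return [m + rng.gauss(0.0, std) for m in mean]
```

Because the private vectors never enter the aggregate, no noise is injected into client-specific directions. This is the utility argument made in the abstract.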

2603.12977 2026-06-16 cs.LG 版本更新

Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models

冻结基础模型上岭回归头的精确联邦持续遗忘

Yijun Quan, Wentai Wu, Giovanni Montana

发表机构 * WMG, University of Warwick, Coventry CV4 7AL, UK(沃里克大学WMG,英国考文垂 CV4 7AL) Department of Computer Science, Jinan University, Guangzhou 510632, China(暨南大学计算机科学系,中国广州 510632)

AI总结 针对冻结基础模型+岭回归头的联邦学习场景,提出基于充分统计量的通信协议,实现精确且高效的连续遗忘,支持任意添加和删除请求,保证与集中式重训练等价的确定性。

Comments Accepted to ECML-PKDD 2026

AI中文摘要

基础模型通常被部署为冻结的特征提取器,并附带一个小的可训练头,以适应联邦设置中私有的、用户生成的数据。“被遗忘权”要求按需从训练模型中移除特定样本或用户的影响。现有的联邦遗忘方法针对通用深度模型,依赖于近似重构或选择性重训练,使得精确性代价高昂或难以实现。我们在一个实际相关但未充分探索的机制中研究这个问题:一个带有岭回归头的冻结基础模型。精确最优解仅通过两个加性充分统计量依赖于数据,我们将其转化为一种通信协议,通过固定大小的消息支持任意添加和删除请求流。服务器维护一个在精确算术意义上与每次请求后的集中式重训练逐点相同的头。我们提供了确定性的重训练等价保证、顺序和划分不变性、两种服务器端变体,以及零KL散度的贝叶斯证书。在四个基准上的实验证实了这些保证:两种变体在相对Frobenius误差$10^{-9}$内匹配集中式岭重训练,并且每个请求的完成成本比联邦重训练基线低几个数量级。

英文摘要

Foundation models are commonly deployed as frozen feature extractors with a small trainable head to adapt to private, user-generated data in federated settings. The "right to be forgotten" requires removing the influence of specific samples or users from the trained model on demand. Existing federated unlearning methods target general deep models and rely on approximate reconstruction or selective retraining, making exactness costly or elusive. We study this problem in a practically relevant but under-explored regime: a frozen foundation model with a ridge-regression head. The exact optimum depends on the data only through two additive sufficient statistics, which we turn into a communication protocol supporting an arbitrary stream of add and delete requests via fixed-size messages. The server maintains a head that is, in exact arithmetic, pointwise identical to centralized retraining after every request. We provide deterministic retrain-equivalence guarantees, order and partition invariance, two server-side variants, and a Bayesian certificate of zero KL divergence. Experiments on four benchmarks confirm the guarantees: both variants match centralized ridge retraining to within $10^{-9}$ relative Frobenius error and complete each request at orders-of-magnitude lower cost than federated retraining baselines.
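The abstract's key observation is that the ridge optimum $w = (A + \lambda I)^{-1} b$ depends on the data only through two additive statistics: $A = \sum x x^\top$ and $b = \sum y\,x$. Here $x$ is the frozen-backbone embedding. The sketch below is a minimal pure-Python version of the resulting add/delete protocol, under these assumptions:

- The head has a single output.
- Each request carries the statistics of the affected samples.
- `client_message` and `RidgeServer` are hypothetical names.
- The paper's two server-side variants are not reproduced.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def client_message(X, y):
    """Fixed-size message (d x d matrix, d vector): (sum x x^T, sum y x)."""
    d = len(X[0])
    dA = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    db = [sum(yi * x[i] for x, yi in zip(X, y)) for i in range(d)]
    return dA, db

class RidgeServer:
    """Holds the two additive sufficient statistics of the ridge problem."""

    def __init__(self, dim, lam):
        self.lam = lam
        self.A = [[0.0] * dim for _ in range(dim)]
        self.b = [0.0] * dim

    def apply(self, msg, sign):
        """sign = +1 for an add request, -1 for a delete request."""
        dA, db = msg
        for i, row in enumerate(dA):
            for j, v in enumerate(row):
                self.A[i][j] += sign * v
            self.b[i] += sign * db[i]

    def head(self):
        """Current ridge head: solve (A + lam I) w = b."""
        d = len(self.b)
        reg = [[self.A[i][j] + (self.lam if i == j else 0.0) for j in range(d)]
               for i in range(d)]
        return solve(reg, self.b)
```

A delete is the exact inverse of the matching add. In exact arithmetic the server's head therefore always equals a head retrained from scratch on the surviving data. In floating point the match is only up to rounding, which fits the $10^{-9}$ relative error reported above.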

2504.11775 2026-06-16 stat.ML cs.CY cs.LG q-fin.RM 版本更新

Discrimination-free Insurance Pricing with Privatized Sensitive Attributes

基于隐私化敏感属性的无歧视保险定价

Tianhe Zhang, Suhan Liu, Peng Shi

发表机构 * Department of Risk and Insurance, University of Wisconsin-Madison(风险与保险系,威斯康星大学麦迪逊分校) Department of Statistics and Operations Research, University of North Carolina-Chapel Hill(统计与运筹系,北卡罗来纳大学教堂山分校)

AI总结 针对保险公司无法直接获取敏感属性(如性别、种族)的公平定价问题,提出利用隐私化(加噪)敏感属性估计无歧视保费的方法,并建立理论保证与实证验证。

AI中文摘要

公平性已成为保险定价中的重要关注点,因为保险公司越来越依赖机器学习模型来预测预期损失。同时,监管和隐私约束通常限制保险公司访问或使用敏感属性(如性别或种族)。最近的精算研究通过无歧视保费的概念来解决这一背景下的公平性问题,该概念消除了敏感属性的直接和间接影响,同时保持精算一致性。然而,实施这种方法通常需要访问敏感属性本身,而在实践中可能无法获得。本文研究了当敏感属性仅以隐私化或噪声扰动形式被观测时,无歧视保险保费的估计问题。我们考虑一个多方数据设置,其中保险公司观测非敏感属性和结果,而一个可信第三方持有通过隐私机制生成的隐私化敏感属性。在此框架内,我们开发了仅使用隐私化属性估计无歧视保费的统计方法。我们研究了两种具有实际意义的情形:隐私机制已知,以及隐私机制的噪声水平未知。对于这两种情况,我们为所提出的估计量建立了理论保证。数值实验和实证应用表明,所提出的方法能够在尊重隐私和监管约束的同时实现公平的保险定价。

英文摘要

Fairness has become an important concern in insurance pricing as insurers increasingly rely on machine learning models to predict expected losses. At the same time, regulatory and privacy constraints often restrict insurers' ability to access or use sensitive attributes such as gender or race. Recent actuarial research addresses fairness in this context through the concept of the discrimination-free premium, which removes both the direct and indirect effects of sensitive attributes while preserving actuarial consistency. However, implementing this approach typically requires access to the sensitive attributes themselves, which may not be available in practice. This paper studies the estimation of discrimination-free insurance premiums when sensitive attributes are observed only in privatized or noise-perturbed form. We consider a multi-party data setting in which insurers observe non-sensitive attributes and outcomes, while a trusted third party holds privatized sensitive attributes generated through a privacy mechanism. Within this framework, we develop statistical methods for estimating discrimination-free premiums using only the privatized attributes. We study two settings of practical relevance: when the privacy mechanism is known and when its noise level is unknown. For both cases, we establish theoretical guarantees for the proposed estimators. Numerical experiments and empirical applications demonstrate that the proposed approach enables fair insurance pricing while respecting privacy and regulatory constraints.
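A discrimination-free premium averages the best-estimate price $\mu(x, d)$ over the marginal distribution of the sensitive attribute: $h^*(x) = \sum_d \mu(x, d)\,P(D = d)$. The sketch below shows that formula for a binary $D$. It also shows how the marginal can be recovered when $D$ is privatized by randomized response with a known keep probability, which corresponds to the known-mechanism case. Randomized response is an assumed mechanism for illustration. The paper's estimator for $\mu$ under privatization is not reproduced, so `mu` is taken as given.

```python
def randomized_response_marginal(reported, keep_prob):
    """Unbiased estimate of P(D = 1) when each binary attribute is reported
    truthfully with probability keep_prob (> 0.5) and flipped otherwise.
    The observed rate satisfies r = q p + (1 - q)(1 - p); solve for p.
    The raw estimate may fall slightly outside [0, 1] on small samples."""
    r = sum(reported) / len(reported)
    return (r - (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)

def discrimination_free_premium(mu, x, p1):
    """h*(x) = sum_d mu(x, d) P(D = d) for a binary sensitive attribute D."""
    return mu(x, 0) * (1.0 - p1) + mu(x, 1) * p1
```

The premium does not condition on the policyholder's own $D$. Every policyholder with the same $x$ is priced against the same population-level mix of $d$ values. This is what removes the indirect (proxy) effect.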

2605.26595 2026-06-16 cs.CR cs.AI cs.LG 版本更新

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Cordyceps: 通过数据投毒对LLM的隐蔽控制攻击

Zedian Shao, Charles Fleming, Teodora Baluta

发表机构 * Georgia Institute of Technology(佐治亚理工学院) Cisco Systems(思科系统)

AI总结 提出一种数据投毒方法,通过语义关联教LLM隐藏任意恶意指令,实现隐蔽控制攻击,绕过多种防御。

Comments USENIX Security '26

AI中文摘要

大型语言模型(LLM)通常在没有经过精心筛选的文本数据集上进行微调,而对手可以对这些数据集进行投毒。现有的投毒攻击主要依赖于固定的触发短语,而异常检测、干净数据正则化或在线监控等防御措施可以中和这些触发短语。在本文中,我们提出了一种数据投毒方法,通过共享知识(如事实或概念)与攻击者选择的短语之间的语义关联,可靠且隐蔽地教LLM一种信息隐藏方案。诱导的隐藏方案可以编码和解码任意恶意指令,从而揭示了一种新的、微妙的投毒诱导漏洞:隐蔽控制攻击。我们精确描述了隐蔽控制攻击,并在5个LLM、3个后门防御和4个提示注入防御上进行了评估。在少量投毒样本的情况下,隐蔽控制攻击在平均攻击成功率上比基于启发式的提示注入攻击高出约40%(相对于干净微调模型)。它们还绕过了基于检测和微调的防御,在后门防御后保持高达93%的攻击成功率,在提示注入防御后保持高达98%的攻击成功率。

英文摘要

Large language models (LLMs) are often fine-tuned on uncurated text datasets that adversaries can poison. Existing poisoning attacks primarily rely on fixed trigger phrases that defenses such as outlier detection, clean-data regularization, or online monitoring can neutralize. In this paper, we propose a data poisoning method that teaches an LLM an information hiding scheme reliably and stealthily through semantic associations between shared knowledge such as facts or concepts and attacker-chosen phrases. The induced hiding scheme can encode and decode arbitrary malicious instructions, thus revealing a new and subtle poisoning-induced vulnerability: covert control attacks. We precisely characterize covert control attacks and evaluate them across 5 LLMs, 3 backdoor defenses, and 4 prompt injection defenses. With a small poisoned fraction, covert control attacks outperform heuristic-based prompt injection attacks in average attack success rate by about 40% relative to clean fine-tuned models. They also circumvent defenses based on detection and fine-tuning, maintaining up to 93% attack success rate after backdoor defenses and up to 98% after prompt injection defenses.

8. 鲁棒性、不确定性与可信学习 33 篇

2606.14865 2026-06-16 cs.LG cs.AI 新提交

GRAPE: Guided Parameter-Space Evolution for Compact Adversarial Robustness

GRAPE: 面向紧凑对抗鲁棒性的引导式参数空间演化

Zhiyuan Ye, Xiangyu Zhou, Ji Qi, Hao Zhang, Yi Zhou

发表机构 * University of Science and Technology of China(中国科学技术大学) China Mobile (Suzhou) Software Technology Co., Ltd.(中移(苏州)软件技术有限公司)

AI总结 提出GRAPE框架,通过逐步暴露参数空间并利用对抗谱利用分数引导容量分配,在固定计算预算下提升紧凑模型的对抗鲁棒性,在CIFAR-10上以1.009倍FLOPs将PGD-20鲁棒准确率从51.70%提升至56.94%,参数减少21.4%。

AI中文摘要

对抗训练(AT)提高了神经网络的鲁棒性,但大多数方法从一开始就训练固定的参数空间。本文探讨了参数变得可优化的顺序是否会影响最终的鲁棒解,即使最终架构或计算预算被控制。我们提出了GRAPE(引导式参数空间演化),一种面向紧凑对抗鲁棒性的训练框架。GRAPE结合了参数空间稳定化与渐进式隐藏扩展:它在当前暴露空间中稳定鲁棒优化,逐步释放新的可优化维度,并使用对抗谱利用分数引导新释放的容量流向高压模块。与固定结构的AT相比,GRAPE将鲁棒模型学习视为一个渐进式参数空间暴露和演化的过程。在CIFAR-10上的标准$\ell_\infty$威胁模型下,以固定结构ResNet-18 AT作为对照参考,GRAPE在几乎匹配的计算预算下(FLOPs比率为1.009倍)将PGD-20鲁棒准确率从51.70%提升至56.94%,同时参数数量减少约21.4%。一个具有相同最终ResNet-18架构的序列增长变体达到了56.52%的PGD-20鲁棒准确率,表明增益不仅来自最终架构差异,还来自参数空间暴露路径。这些结果表明,引导式参数空间演化可以在匹配计算条件下产生紧凑且鲁棒的参数配置。

英文摘要

Adversarial Training (AT) improves neural network robustness, but most methods train a fixed parameter space from the start. This paper asks whether the order in which parameters become optimizable can affect the final robust solution, even when the final architecture or computation budget is controlled. We propose GRAPE, Guided Parameter-Space Evolution, a training framework for compact adversarial robustness. GRAPE combines parameter-space stabilization with progressive hidden expansion: it stabilizes robust optimization in the currently exposed space, gradually releases new optimizable dimensions, and uses an adversarial spectral utilization score to guide newly released capacity toward high-pressure modules. In contrast to fixed-structure AT, GRAPE treats robust model learning as a process of progressive parameter-space exposure and evolution. Under the standard $\ell_\infty$ threat model on CIFAR-10, with fixed-structure ResNet-18 AT as a controlled reference, GRAPE improves PGD-20 robust accuracy from 51.70% to 56.94% at a nearly matched computation budget with a FLOPs ratio of 1.009x, while reducing parameter count by about 21.4%. A sequential grow variant with the same final ResNet-18 architecture reaches 56.52% PGD-20 robust accuracy, indicating that the gain is not only due to final architecture differences but also to the parameter-space exposure path. These results suggest that guided parameter-space evolution can yield compact and robust parameter configurations under matched computation.

2606.15127 2026-06-16 cs.LG 新提交

Beyond Accuracy: Measuring Bias Acknowledgment in Chain-of-Thought Reasoning for Responsible AI Evaluation

超越准确率:在负责任AI评估中衡量思维链推理中的偏见承认

Xian Sun, Wei Gao, Yingshuo Wang, Lingdong Kong, Yanhang Li, Zhichao Fan, Zexin Zhuang, Wenlong Dong, Zhiyuan Zheng, Hrishikesh Paranjape, Abhishek Mandal, Johnny R. Zhang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 针对仅用准确率评估忽略推理链中偏见承认的问题,提出包含易感性(susceptibility)和承认(acknowledgment)两个维度的诊断方法,实验发现不同模型在准确率相近时承认率差异显著。

Comments ICML 2026 Workshop on Trustworthy AI for Good

AI中文摘要

推理模型越来越多地用于最终答案并非唯一审查对象的场景:教育工具可能向学生展示中间步骤,决策支持系统可能需要人工监督,审计工作流可能检查推理痕迹中是否存在误导性或带偏见的输入。在这些场景中,两个响应可能获得相同的最终答案分数,但在痕迹是否明确标记注入的偏见内容方面存在差异。仅用准确率评估会把这些情况混为一谈。我们将这一差距视为负责任评估中的测量盲点,并引入一个最小的痕迹级诊断,包含两个维度:易感性(偏见是否破坏了先前正确的答案)和承认(痕迹是否包含由评分规则定义的、对注入内容的表面引用)。在数千个有偏见的GSM8K试验中,GPT-4o和Claude Sonnet 4的易感性率相似(1.3% vs. 1.2%),但在相同规则下承认率差异显著(13.0% vs. 75.0%)。

英文摘要

Reasoning models are increasingly used in settings where the final answer is not the only object of review: educational tools may show students intermediate steps, decision-support systems may require human oversight, and audit workflows may inspect traces for misleading or biased input. In such settings, two responses can receive the same final-answer score while differing in whether the trace explicitly flags injected biasing content. Accuracy-only evaluation collapses these cases. We study this gap as a measurement blind spot for responsible evaluation and introduce a minimal trace-level diagnostic with two axes: susceptibility (whether the bias breaks a previously correct answer) and acknowledgment (whether the trace contains a rubric-defined surface reference to the injected content). Across thousands of biased GSM8K trials, GPT-4o and Claude Sonnet 4 have similar susceptibility rates (1.3% vs. 1.2%) but substantially different acknowledgment rates (13.0% vs. 75.0%) under the same rubric.
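The two axes can be computed as simple rates over paired trials. This sketch makes three assumptions not taken from the paper. The rubric is a case-insensitive keyword match, whereas the paper only says it checks for a rubric-defined surface reference. Acknowledgment is averaged over all biased trials. The trial record format and `trace_metrics` are hypothetical.

```python
def trace_metrics(trials, rubric_terms):
    """trials: dicts with keys 'clean_correct', 'biased_correct', 'trace'.
    Susceptibility: share of clean-correct items whose answer the bias breaks.
    Acknowledgment: share of biased traces mentioning any rubric term."""
    base = [t for t in trials if t["clean_correct"]]
    broken = sum(1 for t in base if not t["biased_correct"])
    susceptibility = broken / len(base) if base else 0.0
    terms = [s.lower() for s in rubric_terms]
    acked = sum(1 for t in trials if any(s in t["trace"].lower() for s in terms))
    acknowledgment = acked / len(trials) if trials else 0.0
    return susceptibility, acknowledgment
```

The two rates are independent by construction. A model can keep its answer correct (low susceptibility) while rarely flagging the injected content (low acknowledgment). This is the gap the abstract reports between GPT-4o and Claude Sonnet 4.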

2606.15153 2026-06-16 cs.LG 新提交

False Sense of Safety in Selective Signal Classification: Auditing Bound Tightness and Exchangeability for Risk Control

选择性信号分类中的虚假安全感:风险控制的边界紧致性与可交换性审计

Jingwen Zhou, Mingzhe Wang

AI总结 审计分布自由风险控制下选择性预测的边界紧致性与可交换性假设,发现经验阈值法常超预算,而认证方法在可交换时有效,但组部署下因可交换性失效导致违规。

AI中文摘要

具有分布自由风险控制的选择性预测承诺:以1-δ的置信度(针对标定样本的抽取),被接受输入的误差率保持在用户预算α以下。我们在信号域检测器(机器异常声音检测(ASD)和AI生成图像取证)上审计了这一承诺,针对四种标定规则:未经认证的经验阈值法(NAIVE)以及认证的Hoeffding、Clopper-Pearson(CP)和赌博(WSR)上置信界。我们报告三个发现。(i)实践中常见的NAIVE阈值法在49-73%的合成试验(n=200个标定点)和高达68%的真实数据划分中超出其声明预算:这是一种虚假的安全感,而非定理被破坏,因为该规则从未有过证书。(ii)紧致性很重要:CP和WSR在Hoeffding无法认证的地方认证了可观的覆盖率,且在可交换划分下未观察到任何预算超限。(iii)在分组部署(未见过的机器类型或生成器)下,认证规则在9-30%的试验中超限,远高于δ,表明失败在于可交换性前提被破坏,而非边界本身;保守的逐组阈值以严重的覆盖率代价恢复了有效性。

英文摘要

Selective prediction with distribution-free risk control promises that, with confidence 1-δ over the calibration draw, the error rate of accepted inputs stays below a user budget α. We audit this promise on signal-domain detectors (machine anomalous-sound detection (ASD) and AI-generated-image forensics) for four calibration rules: uncertified empirical thresholding (NAIVE) and certified Hoeffding, Clopper-Pearson (CP), and betting (WSR) upper confidence bounds. We report three findings. (i) NAIVE thresholding, common in practice, exceeds its declared budget in 49-73% of synthetic trials (n=200 calibration points) and in up to 68% of real-data splits: a false sense of safety rather than a broken theorem, since the rule never had a certificate. (ii) Tightness matters: CP and WSR certify substantial coverage where Hoeffding certifies none, with zero observed budget overruns under exchangeable splits. (iii) Under grouped deployment (unseen machine types or generators), certified rules overrun in 9-30% of trials, far above δ, showing that the failure lies in the broken exchangeability premise, not in the bounds; a conservative per-group threshold restores validity at a severe coverage cost.
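The difference between the NAIVE rule and a certified CP rule can be made concrete. The sketch computes a one-sided Clopper-Pearson upper bound on the selective error by bisection on the binomial CDF. It then scans a threshold grid from strictest to loosest and stops at the first threshold whose bound exceeds α. The fixed-sequence scan is one common way to make multi-threshold selection valid; it is our choice, and the paper's exact procedure is not shown. The Hoeffding and WSR rules are omitted. The certificate also relies on calibration and test points being exchangeable, which is the premise the abstract finds broken under grouped deployment.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def cp_upper(k, n, delta):
    """One-sided Clopper-Pearson upper bound from k errors in n trials:
    the largest p with P(X <= k) >= delta (CDF decreases in p)."""
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi

def accepted_errors(scores, errors, t):
    """Error indicators (1 = wrong) of calibration points with score >= t."""
    return [e for s, e in zip(scores, errors) if s >= t]

def naive_threshold(scores, errors, thresholds, alpha):
    """Uncertified rule: loosest threshold whose empirical selective error <= alpha."""
    ok = []
    for t in thresholds:
        acc = accepted_errors(scores, errors, t)
        if acc and sum(acc) / len(acc) <= alpha:
            ok.append(t)
    return min(ok) if ok else None

def certified_threshold(scores, errors, thresholds, alpha, delta):
    """Fixed-sequence scan from strictest to loosest threshold; stop at the
    first one that accepts nothing or whose CP upper bound exceeds alpha."""
    chosen = None
    for t in sorted(thresholds, reverse=True):
        acc = accepted_errors(scores, errors, t)
        if not acc or cp_upper(sum(acc), len(acc), delta) > alpha:
            break
        chosen = t
    return chosen
```

Consider a calibration set where the loosest threshold has an empirical error of 25% on 40 points. With α = 0.3, the NAIVE rule accepts that threshold. Its CP upper bound at δ = 0.1 is above 0.3, so the certified rule stops one step earlier.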

2606.15479 2026-06-16 cs.LG cs.AI math.PR 新提交

Bayesian 3D Steerable CNNs: Enabling Equivariance and Uncertainty Quantification Simultaneously

贝叶斯3D可转向CNN:同时实现等变性和不确定性量化

Abhishek Keripale, Ponkrshnan Thiagarajan, Susanta Ghosh

发表机构 * Michigan Technological University(密歇根理工大学) Johns Hopkins University(约翰霍普金斯大学) The Center for Artificial Intelligence at the Institute of Computing and Cybersystems, Michigan Technological University(密歇根理工大学计算与网络系统研究所人工智能中心)

AI总结 提出贝叶斯可转向CNN,通过后验分布赋予核随机性同时保持SE(3)-等变性,实现不确定性分解,在分类精度和分布偏移下鲁棒性优于确定性模型。

AI中文摘要

可转向卷积神经网络(Steerable-CNNs)通过将核参数化为可转向基函数的线性组合来保证SE(3)-等变性,但其确定性本质阻碍了不确定性量化——限制了其在需要置信度估计的场景中的应用。我们提出一种贝叶斯可转向CNN,将后验分布置于基系数上,从而在精确保持等变性的同时产生随机核。模型的损失函数通过变分推断获得,并通过贝叶斯反向传播最小化。该框架将预测不确定性分解为认知不确定性和偶然不确定性。实验上,该模型在取得竞争性分类精度的同时,预期校准误差为0.0263,并且在加性高斯噪声引起的分布偏移下,其性能比确定性对应模型高出最多6.17%。此外,我们利用模型的不确定性估计显著提升其性能,在测试数据集的84%上实现了约4%的准确率提升。认知不确定性与预测误差之间统计显著的负相关性表明,学习到的后验方差具有语义意义。该框架将贝叶斯不确定性量化与等变CNN的归纳偏置统一起来。

英文摘要

Steerable convolutional neural networks (Steerable-CNNs) guarantee SE(3)-equivariance by parameterizing kernels as linear combinations of steerable basis functions, but their deterministic nature precludes uncertainty quantification, limiting their use in settings where confidence estimates are essential. We propose a Bayesian Steerable-CNN that places posterior distributions over the basis coefficients, yielding stochastic kernels while preserving equivariance exactly. The loss function of the model is obtained via variational inference and minimized by Bayes-by-Backpropagation. The framework admits a decomposition of predictive uncertainty into epistemic and aleatoric components. Empirically, the model attains competitive classification accuracy alongside an expected calibration error of 0.0263 and outperforms its deterministic counterpart by up to 6.17% under distributional shift induced by additive Gaussian noise. Furthermore, we leverage the model's uncertainty estimates to enhance its performance significantly, achieving a gain of approximately 4% in accuracy across 84% of the test dataset. A statistically significant negative correlation between epistemic uncertainty and prediction error confirms that the learned posterior variance is semantically meaningful. The framework unifies Bayesian uncertainty quantification with the inductive bias of equivariant CNNs.
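The abstract does not state which epistemic/aleatoric decomposition is used. The sketch shows one common entropy-based version computed from Monte Carlo samples of the stochastic kernels. Total predictive entropy splits into the expected per-sample entropy (aleatoric) plus the mutual information between label and weights (epistemic). The sample format is an assumption: one class-probability vector per posterior draw of the basis coefficients.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def decompose_uncertainty(mc_probs):
    """mc_probs: one class-probability vector per sampled set of kernel coefficients.
    total = H(mean prediction); aleatoric = mean of per-sample entropies;
    epistemic = total - aleatoric (mutual information between label and weights)."""
    s, k = len(mc_probs), len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / s for c in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in mc_probs) / s
    return total, aleatoric, total - aleatoric
```

The two components behave differently. Confident but disagreeing samples give purely epistemic uncertainty. Identical but uniform samples give purely aleatoric uncertainty.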

2606.15493 2026-06-16 cs.LG cs.CR 新提交

Model Stealing Through the Lens of Model Multiplicity

从模型多重性视角看模型窃取

Eliott Baltz, Satoshi Hara, Ulrich Aïvodji

发表机构 * ÉTS, Mila(蒙特利尔高等技术学院,Mila) The University of Electro-Communications(电气通信大学)

AI总结 本文通过计算替代模型的Rashomon集并评估其多样性,发现高保真替代模型在关键性能指标上可能与目标模型存在显著差异,挑战了传统观点。

Comments 14 pages, 15 figures

AI中文摘要

模型窃取攻击中,对手创建高保真替代模型,对机器学习服务的知识产权构成重大威胁。传统观点认为这些替代模型能为对手提供与原始服务提供商相当的经济杠杆。本文不再仅以对目标模型的保真度来评估模型窃取攻击,以此挑战这一假设。由于基于查询的提取仅提供目标输入输出行为的部分监督,替代模型并非唯一确定:许多接近最优的替代模型可以在实现相当保真度的同时,在部署相关属性上存在差异。我们不执行经典的基于学习的模型窃取攻击,而是计算替代模型的Rashomon集(即几乎同等准确的模型集合),并使用多重性指标(歧义性、差异性和Rashomon容量)和群体公平性指标评估其多样性。在表格、医学影像和NLP任务中,我们在真实数据集上的实验表明,尽管替代模型与目标模型表现出相似的保真度,但它们在其他关键性能指标上可能存在显著差异。这些发现对高保真替代模型与实际部署场景中目标模型之间的假定等价性提出了质疑。

英文摘要

Model stealing attacks, where adversaries create high-fidelity surrogate models, are a significant threat to the intellectual property of machine learning services. Conventional wisdom suggests these surrogates could provide adversaries with economic leverage comparable to the original service providers. This paper challenges this assumption by evaluating model stealing attacks beyond mere fidelity to the target model. Because query-based extraction provides only partial supervision of the target's input-output behavior, the surrogate is not uniquely identified: many near-optimal surrogates can achieve comparable fidelity while differing in deployment-relevant properties. Instead of performing a classic learning-based model stealing attack, we compute the Rashomon Set (i.e., the set of almost-equally-accurate models) of surrogate models, and evaluate its diversity using multiplicity metrics (ambiguity, discrepancy, and Rashomon Capacity) and group fairness metrics. Across tabular, medical imaging, and NLP tasks, our experiments on real-world datasets reveal that despite exhibiting similar fidelity to the target model, surrogate models can display significant variances in other critical performance metrics. These findings cast doubt on the presumed equivalence between high-fidelity surrogates and the target model in practical deployment scenarios.
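Ambiguity and discrepancy are the standard multiplicity metrics over a Rashomon set, both measured relative to a reference model. Ambiguity is the share of inputs on which some model in the set disagrees with the reference. Discrepancy is the largest disagreement share of any single model. The sketch forms the set by keeping the surrogates in a finite candidate pool whose fidelity to the target is within ε of the best. That filter, the pool, and the function names are assumptions. The paper's Rashomon Capacity and fairness metrics are not shown.

```python
def agreement(a, b):
    """Fraction of inputs on which two label vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def rashomon_set(candidates, target_preds, eps):
    """Hypothetical filter: keep surrogates whose fidelity to the target
    is within eps of the best candidate's fidelity."""
    fid = [agreement(c, target_preds) for c in candidates]
    best = max(fid)
    return [c for c, f in zip(candidates, fid) if f >= best - eps]

def ambiguity(reference, rset):
    """Share of inputs where at least one model in the set disagrees with reference."""
    n = len(reference)
    return sum(any(m[i] != reference[i] for m in rset) for i in range(n)) / n

def discrepancy(reference, rset):
    """Largest share of inputs on which any single model disagrees with reference."""
    return max(1 - agreement(m, reference) for m in rset)
```

Ambiguity can exceed discrepancy. Different near-equally-faithful surrogates may flip different inputs, so the set as a whole is less determined than any one member suggests.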

2606.15730 2026-06-16 cs.LG cs.AI 新提交

InstantForget: Update-Free Backdoor Unlearning with Inference-Time Feature Reset

InstantForget: 无需更新的后门遗忘与推理时特征重置

Zhenyu Yu

发表机构 * College of Computer Science and Artificial Intelligence, Fudan University(复旦大学计算机科学与人工智能学院)

AI总结 提出InstantForget方法,通过推理时特征重置实现无需参数更新的后门遗忘,利用马氏距离检测异常特征并重置为中性表示,在CIFAR-10上平均ASR降至0.071。

AI中文摘要

后门遗忘旨在从部署模型中移除恶意触发行为,同时保持清洁效用。我们研究了无需更新的推理时设置,其中模型参数保持冻结。首先,我们在oracle配对的清洁和触发特征下审计了一个常见的投影假设。投影主要对BadNets成功,而在CIFAR-10 ResNet-18上对WaNet、Blended和SIG的ASR分别为0.683、0.888和0.941。这种失败不能由谱紧凑性、空间局部性或子空间错位解释,而是由涉及目标边际、目标logit下降和非目标logit上升的logit三元组差距预测。然后我们引入了InstantForget,一种清洁校准的门控重置,通过马氏距离标记异常特征,并仅将标记的特征移向中性的非目标表示。在保留的触发验证集上选择一个固定操作点后,InstantForget在部署时无需触发样本或参数更新,将CIFAR-10上四种非自适应触发的平均ASR降至0.071。它还达到了0.981的检测AUROC,并迁移到八个测试骨干中的六个。报告的在WaNet、ModelNet10点混合、两种骨干几何和自适应特征紧凑性攻击下的失败定义了该方法的适用范围。

英文摘要

Backdoor unlearning aims to remove a malicious trigger behavior from a deployed model while preserving clean utility. We study the update-free inference-time setting, where model parameters remain frozen. First, we audit a common projection assumption under oracle paired clean and triggered features. Projection succeeds mainly on BadNets and leaves WaNet, Blended, and SIG at 0.683, 0.888, and 0.941 ASR on CIFAR-10 ResNet-18. This failure is not explained by spectral compactness, spatial locality, or subspace misalignment. It is predicted by a logit-triplet gap involving the target margin, target-logit drop, and non-target logit rise. We then introduce InstantForget, a clean-calibrated gated reset that flags anomalous features with a Mahalanobis score and moves only flagged features toward a neutral non-target representation. With one fixed operating point selected on held-out triggered validation, InstantForget reduces average ASR to 0.071 across four non-adaptive CIFAR-10 triggers without triggered samples or parameter updates at deployment. It also reaches 0.981 detection AUROC and transfers to six of eight tested backbones. Reported failures under WaNet, ModelNet10 point blend, two backbone geometries, and adaptive feature-compactness attacks define the method's scope.
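A gated reset can be outlined as a three-step pipeline. First, fit clean statistics. Second, pick a flagging threshold from clean Mahalanobis scores. Third, at inference, move only the flagged features toward a neutral vector. The sketch uses three simplifications of its own. It uses a diagonal covariance rather than a full one to stay dependency-free. It takes the clean mean as the "neutral" representation. It uses a fixed blending `strength`. The paper's neutral non-target representation and its operating point are not specified here, and all names are hypothetical.

```python
import math

def fit_clean_stats(clean_feats, eps=1e-6):
    """Per-dimension mean and variance of clean calibration features
    (diagonal covariance: a simplification of full Mahalanobis)."""
    n, d = len(clean_feats), len(clean_feats[0])
    mean = [sum(f[j] for f in clean_feats) / n for j in range(d)]
    var = [sum((f[j] - mean[j]) ** 2 for f in clean_feats) / n + eps for j in range(d)]
    return mean, var

def mahalanobis(feat, mean, var):
    """Mahalanobis distance under the diagonal covariance `var`."""
    return math.sqrt(sum((x - m) ** 2 / v for x, m, v in zip(feat, mean, var)))

def calibrate_tau(clean_feats, mean, var, quantile=0.95):
    """Flagging threshold: an upper quantile of the clean scores."""
    s = sorted(mahalanobis(f, mean, var) for f in clean_feats)
    return s[min(len(s) - 1, int(quantile * len(s)))]

def gated_reset(feat, mean, var, tau, neutral, strength=1.0):
    """Leave unflagged features untouched; move flagged ones toward `neutral`."""
    if mahalanobis(feat, mean, var) <= tau:
        return list(feat)
    return [(1 - strength) * x + strength * z for x, z in zip(feat, neutral)]
```

The gate needs only clean data and frozen features, matching the update-free setting. The abstract also notes that adaptive attacks that keep triggered features compact, and therefore low-scoring, are among the reported failures.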

2606.15767 2026-06-16 cs.LG cs.AI 新提交

Visualizing Uncertainty: Spatial Maps of Missing and Conflicting Evidence in Deep Learning

可视化不确定性:深度学习中缺失与冲突证据的空间图

Dong Hyun Jeong, Feng Chen, Jin-Hee Cho, Lance M. Kaplan, Audun Jøsang, Soo-Yeon Ji

发表机构 * University of the District of Columbia(哥伦比亚特区大学) University of Texas at Dallas(德克萨斯大学达拉斯分校) Virginia Tech(弗吉尼亚理工大学) U.S. Army DEVCOM Army Research Laboratory(美国陆军DEVCOM陆军研究实验室) University of Oslo(奥斯陆大学) Bowie State University(鲍伊州立大学)

AI总结 提出不确定性激活图(UAM)框架,结合证据深度学习与全梯度类激活映射,生成空间不确定性激活图,区分缺乏证据的空虚和假设冲突的不和谐,填补不确定性量化与可解释性之间的空白。

AI中文摘要

理解深度神经网络何时以及为何不确定对于在安全关键领域部署可靠的机器学习系统至关重要。虽然现有的不确定性量化方法提供了模型置信度的标量度量,但它们对输入的哪些空间区域导致不同类型的不确定性提供的洞察有限。我们提出了一种新颖的可视化框架——不确定性激活图(UAM),它将证据深度学习(EDL)与全梯度类激活映射(FullGrad)相结合,生成可解释的空间不确定性激活图。我们的方法区分了两种基本的不确定性类型:空虚(代表缺乏证据)和不和谐(捕捉竞争假设之间的冲突证据)。通过利用FullGrad的完整梯度分解特性和主观逻辑的原则性不确定性量化,我们的方法产生了理论上合理的可视化,突出显示了导致模型不确定性的特定图像区域。利用该框架,通过计算信念加权属性生成空虚和不和谐激活图,从而能够识别模型缺乏知识的区域与遇到模糊证据的区域。在多个基准数据集上的广泛评估表明,所提出的框架有效地解决了不确定性量化与可解释性之间的关键差距,为评估复杂视觉识别任务中的模型可靠性提供了直观的视觉反馈。

英文摘要

Understanding when and why deep neural networks are uncertain is crucial for deploying reliable machine learning systems in safety-critical domains. While existing uncertainty quantification methods provide scalar measures of model confidence, they offer limited insight into which spatial regions of an input contribute to different types of uncertainty. We propose a novel visualization framework, Uncertainty Activation Map (UAM), that combines Evidential Deep Learning (EDL) with Full-Gradient Class Activation Mapping (FullGrad) to generate interpretable spatial uncertainty activation maps. Our approach distinguishes between two fundamental types of uncertainty: vacuity, representing lack of evidence, and dissonance, capturing conflicting evidence between competing hypotheses. By leveraging the complete gradient decomposition property of FullGrad and the principled uncertainty quantification of Subjective Logic, our method produces theoretically grounded visualizations that highlight specific image regions responsible for model uncertainty. With this framework, vacuity and dissonance activation maps are generated by computing belief-weighted attributions, enabling identification of where models lack knowledge versus where they encounter ambiguous evidence. Extensive evaluations across multiple benchmark datasets demonstrate that the proposed framework effectively addresses the critical gap between uncertainty quantification and explainability, providing intuitive visual feedback to assess model reliability in complex visual recognition tasks.
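Vacuity and dissonance come from the subjective-logic opinion that evidential deep learning attaches to non-negative class evidence $e_k$. The quantities are defined as follows:

- Dirichlet parameters are $\alpha_k = e_k + 1$.
- The Dirichlet strength is $S = \sum_k \alpha_k$.
- Beliefs are $b_k = e_k / S$.
- Vacuity is $u = K / S$.
- Dissonance sums the pairwise balance between beliefs.

The sketch computes these two scalars for one prediction. UAM's belief-weighted FullGrad attribution step, which turns them into spatial maps, is not reproduced.

```python
def opinion(evidence):
    """Subjective-logic opinion from class evidence (Dirichlet alpha = e + 1)."""
    k = len(evidence)
    s = sum(evidence) + k            # Dirichlet strength S
    belief = [e / s for e in evidence]
    vacuity = k / s                  # u = K / S: high when evidence is scarce
    return belief, vacuity

def balance(bj, bk):
    """Relative balance of two beliefs (1 = equal, 0 = one of them is zero)."""
    return 1.0 - abs(bj - bk) / (bj + bk) if bj > 0 and bk > 0 else 0.0

def dissonance(belief):
    """High when belief mass is split evenly across competing classes."""
    total = 0.0
    for k, bk in enumerate(belief):
        others = [bj for j, bj in enumerate(belief) if j != k]
        denom = sum(others)
        if denom > 0:
            total += bk * sum(bj * balance(bj, bk) for bj in others) / denom
    return total
```

With no evidence, vacuity is 1 and dissonance is 0. With strong evidence split evenly between two classes, vacuity is small but dissonance is high. This is the "lack of knowledge versus ambiguous evidence" distinction drawn above.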

2606.15980 2026-06-16 cs.LG cs.AI cs.CL 新提交

Do Safety Monitors Stay Reliable After an Update? Benchmarking and Predicting Activation-Monitor Staleness

安全监控器在更新后是否仍可靠?激活监控器陈旧性的基准测试与预测

Evan Duan

发表机构 * University of Michigan(密歇根大学)

AI总结 研究语言模型更新后激活监控器是否仍可靠,发现量化更新影响小,微调更新常导致监控器失效,且可通过预部署特征预测退化。

AI中文摘要

激活监控器——在语言模型内部表示上训练的轻量级探针——在部署安全栈中越来越常见。然而,部署的模型很少是静态的:它们被量化、微调、用LoRA适配,或与合并适配器一起服务,而监控器保持冻结。我们首次系统测试了这一隐含契约是否成立:在基础模型上训练的激活监控器在这些常规模型更新后是否仍可靠。跨多个安全相关监控器、模型深度、更新系列和开放权重模型,我们发现一个明显的分裂:量化风格的更新大多保持冻结探针性能,而微调风格的更新经常使探针变得陈旧。脆弱性高度依赖于监控器,隐私/PII探针受影响最大,而拒绝合规探针相对稳定,表明重新训练行为不一定使其对应的监控器变得陈旧。QLoRA尤其有害,尽管单独的NF4量化相对良性,这表明量化在与适配结合时风险更大。我们进一步表明,退化可以从部署前的特征预测,从而能够将重新验证预算优先分配给最可能失败的监控器。这些结果表明,微调应默认触发激活监控器重新验证,而预测可以帮助优先检查哪些监控器。

英文摘要

Activation monitors (lightweight probes trained on a language model's internal representations) are an increasingly common layer in deployment safety stacks. Deployed models, however, are rarely static: they are quantized, fine-tuned, adapted with LoRA, or served with merged adapters while the monitor remains frozen. We present the first systematic test of whether this implicit contract holds: whether activation monitors trained on a base model remain reliable after these routine model updates. Across multiple safety-relevant monitors, model depths, update families, and open-weight models, we find a sharp split: quantization-style updates largely preserve frozen probe performance, while fine-tuning-style updates frequently make probes stale. Fragility is highly monitor-dependent, with privacy/PII probes most affected and refusal-compliance probes comparatively stable, showing that retraining a behavior need not make its corresponding monitor stale. QLoRA is especially damaging despite NF4 quantization alone being relatively benign, suggesting that quantization becomes riskier when combined with adaptation. We further show that degradation is predictable from pre-deployment features, enabling revalidation budgets to be triaged toward the monitors most likely to fail. These results suggest that fine-tuning should trigger activation-monitor revalidation by default, while prediction can help prioritize which monitors to check first.

2606.16050 2026-06-16 cs.LG cs.AI 新提交

ALCL: An Adaptive Log-Correntropy Loss for Robust Learning under Non-Gaussian Noise

ALCL:一种用于非高斯噪声下鲁棒学习的自适应对数相关熵损失

Mainak Kundu, Ria Kanjilal, Ismail Uysal

发表机构 * University of South Florida(南佛罗里达大学) California Polytechnic State University(加州州立理工大学)

AI总结 提出自适应对数相关熵损失(ALCL),通过可微重参数化联合学习形状和尺度参数,使损失几何动态适应残差统计,抑制极端异常值,在混合重尾和脉冲噪声下优于MSE和固定核相关熵损失。

AI中文摘要

在重尾和脉冲噪声下的鲁棒深度学习仍然具有挑战性,因为均方误差(MSE)等传统损失对异常值表现出无界敏感性。尽管基于相关熵的目标函数提高了鲁棒性,但现有公式依赖于固定的核参数,这些参数必须凭经验调整且在训练期间保持不变。为了解决这些局限性,我们提出了一种自适应对数相关熵损失(ALCL),这是一种重尾损失公式,能够在优化过程中自适应地学习其鲁棒性几何结构。ALCL引入了一个对数残差模型,其形状和尺度参数通过可微重参数化与网络权重联合学习。这产生了一个原理性的最大似然公式,其影响函数形式上是有界且再下降的,使得损失几何能够动态适应不断变化的残差统计,同时抑制极端异常值。在四个广泛使用的基准数据集(涵盖灰度图像和红绿蓝(RGB)图像数据)上,在混合重尾和脉冲噪声下进行的比较实验表明,ALCL在重建保真度和下游分类准确性方面始终优于MSE和最优调整的广义相关熵损失。虽然在低噪声条件下性能差异仍然很小,但在高噪声条件下,ALCL在灰度基准上中位数准确率提高了高达4.75%,在RGB数据集上提高了4.51%,并且运行间方差减小。这些结果表明,通过联合学习损失参数实现的自适应鲁棒性为非高斯环境下深度学习中基于静态相关熵的损失提供了一种计算高效的替代方案。

英文摘要

Robust deep learning under heavy-tailed and impulsive noise remains challenging because conventional losses such as mean squared error (MSE) exhibit unbounded sensitivity to outliers. Although correntropy-based objectives improve robustness, existing formulations rely on fixed kernel parameters that must be empirically tuned and remain static during training. To address these limitations, we propose an Adaptive Log-Correntropy Loss (ALCL), a heavy-tailed loss formulation that adaptively learns its robustness geometry during optimization. ALCL introduces a logarithmic residual model whose shape and scale parameters are learned jointly with network weights through differentiable reparameterization. This yields a principled maximum likelihood formulation whose influence function is formally bounded and redescending, allowing the loss geometry to adapt dynamically to evolving residual statistics while suppressing extreme outliers. Comparative experiments on four widely used benchmark datasets spanning grayscale and red-green-blue (RGB) image data under mixed heavy-tailed and impulsive noise demonstrate that ALCL consistently outperforms MSE and optimally tuned generalized correntropy losses in both reconstruction fidelity and downstream classification accuracy. While performance differences remain small under low-noise conditions, under high-noise regimes ALCL improves median accuracy by up to 4.75% on grayscale benchmarks and 4.51% on RGB datasets, with reduced variance across runs. These results demonstrate that adaptive robustness through joint learning of loss parameters provides a computationally efficient alternative to static correntropy-based losses for deep learning in non-Gaussian environments.
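The contrast between MSE and correntropy losses comes down to bounded versus unbounded outlier influence. The sketch compares MSE with a fixed-kernel correntropy-induced (Welsch) loss, $1 - \exp(-r^2 / 2\sigma^2)$, of the kind ALCL is compared against. This is not ALCL itself. The abstract does not give ALCL's log-residual model or its learned shape and scale parametrization, so those are not reproduced. The kernel width `sigma` here is a fixed, hand-picked value, which is exactly the limitation ALCL targets.

```python
import math

def mse(residuals):
    """Mean squared error: each term grows without bound in |r|."""
    return sum(r * r for r in residuals) / len(residuals)

def correntropy_loss(residuals, sigma):
    """Fixed-kernel correntropy-induced loss: mean of 1 - exp(-r^2 / (2 sigma^2)).
    Each term is bounded by 1, and its gradient vanishes for large |r|."""
    return sum(1.0 - math.exp(-r * r / (2 * sigma * sigma)) for r in residuals) / len(residuals)
```

A single residual of 1000 pushes the MSE of four points above 250,000. The same residual adds at most 1/4 to the correntropy loss, which is the bounded-influence property ALCL keeps while learning the kernel shape instead of fixing it.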

2606.16196 2026-06-16 cs.LG cs.CV 新提交

When Confidence Lacks Concepts: Interpretable OOD Detection via Representation Perturbations

当置信度缺乏概念:通过表示扰动实现可解释的OOD检测

Anju Chhetri, Pratik Shrestha, Ramesh Rana, Prashnna Gyawali, Binod Bhattarai

发表机构 * NepAl Applied Mathematics and Informatics Institute for research(尼泊尔应用数学与信息学研究所) West Virginia University(西弗吉尼亚大学) Kathmandu University(加德满都大学) University College London(伦敦大学学院) University of Aberdeen(阿伯丁大学)

AI总结 提出一种基于类条件语义扰动和稀疏自编码器的可解释OOD检测框架,通过分析表示稳定性实现检测与内部机制解释。

AI中文摘要

深度神经网络在医学影像任务中取得了显著性能,但其在分布偏移下过度泛化的倾向对安全临床部署构成了主要障碍。分布外(OOD)检测方法旨在缓解这一风险,但现有方法大多依赖语义含义理解不足的不透明内部信号,限制了在安全关键场景中的信任。本文提出一种可解释的OOD检测框架,该框架通过类条件语义扰动探测模型预测的稳定性。利用稀疏自编码器(SAE),我们从分布内数据中学习类特定概念向量,将密集的中间表示解耦为稀疏、语义有意义的组件。在推理时,我们使用与模型预测类别相关的概念向量扰动深层表示,并测量类别logits的稳定性。我们假设分布内样本对此类扰动表现出低敏感性,因为其表示与类特定语义方向对齐,而OOD样本由于表示错位而显示出放大的偏差。通过将OOD检测框架为概念条件稳定性分析,我们的方法既提供了判别性OOD信号,又提供了驱动模型不确定性的内部机制的可解释视角,使其特别适用于高风险医学应用。

英文摘要

Deep neural networks have achieved remarkable performance across medical imaging tasks, yet their tendency to overgeneralize under distributional shifts poses a major obstacle to safe clinical deployment. Out-of-Distribution (OOD) detection methods aim to mitigate this risk, but most existing approaches rely on opaque internal signals with poorly understood semantic meaning, limiting trust in safety-critical settings. In this work, we propose an interpretable OOD detection framework that probes the stability of model predictions under class-conditioned semantic perturbations. Leveraging sparse autoencoders (SAEs), we learn class-specific concept vectors from in-distribution data that disentangle dense intermediate representations into sparse, semantically meaningful components. At inference, we perturb deeper-layer representations using the concept vectors associated with the model's predicted class and measure the stability of the class logits. We hypothesize that in-distribution samples exhibit low sensitivity to such perturbations, as their representations align with class-specific semantic directions, whereas OOD samples show amplified deviations due to representational misalignment. By framing OOD detection as a concept-conditioned stability analysis, our approach provides both a discriminative OOD signal and an interpretable lens into the internal mechanisms driving model uncertainty, making it particularly suitable for high-stakes medical applications.

2606.16524 2026-06-16 cs.LG astro-ph.CO stat.ML 新提交

Neural Bayesian Anomaly Mitigation: A Robust Loss that Doubles as an Unsupervised Contamination Classifier

神经贝叶斯异常缓解:一种兼具无监督污染分类器功能的鲁棒损失函数

S. A. K. Leeney, W. J. Handley, H. T. J. Bevins, E. de Lera Acedo

发表机构 * Astrophysics Group, Cavendish Laboratory, University of Cambridge(剑桥大学卡文迪许实验室天体物理组) Institute of Astronomy, University of Cambridge(剑桥大学天文研究所)

AI总结 提出神经贝叶斯异常缓解(NBAM)损失,基于贝叶斯潜变量混合模型,既提供鲁棒监督损失又输出无监督污染后验,在CIFAR-10上优于Huber等基线。

Comments 13 pages, 4 figures

详情
AI中文摘要

工程化的鲁棒损失函数(如Huber、Student-$t$和广义交叉熵)使监督模型能够容忍污染,但无法回答哪些观测被破坏。我们引入神经贝叶斯异常缓解(NBAM),一种通用的即插即用损失函数,源自贝叶斯潜在开关混合模型:边际似然定义了一个鲁棒的监督损失,相关的后验定义了一个无监督的污染分类器。与Huber或Student-$t$类似,NBAM可以替换任何监督流程中的标准训练损失;与它们不同,NBAM还学习了一个结构化的污染模型,并返回每个样本的校准污染后验。学习到的输入相关先验$π_ϕ(x)$捕获污染的空间局部性,使得靠近已知损坏的样本更可能被标记,同时自动出现奥卡姆惩罚并正则化以防止过度标记。在具有非对称标签污染的CIFAR-10上,NBAM无需监督即可恢复污染过程的结构:污染后验将干净样本与污染样本分开,学习到的异常头识别每个标签翻转对的方向。除了这些能力之外,在0.2-0.6的污染率下,NBAM的性能优于本文考虑的四种鲁棒损失基线。

英文摘要

Engineered robust losses such as Huber, Student-$t$, and generalised cross-entropy make supervised models tolerant of contamination but cannot answer which observations are corrupted. We introduce Neural Bayesian Anomaly Mitigation (NBAM), a general-purpose drop-in loss derived from a Bayesian latent-switch mixture model: the marginal likelihood defines a robust supervised loss, and the associated posterior defines an unsupervised contamination classifier. Like Huber or Student-$t$, NBAM can replace the standard training loss in any supervised pipeline; unlike them, it additionally learns a structured contamination model and returns a calibrated per-sample contamination posterior. A learned input-dependent prior $π_ϕ(x)$ captures the spatial locality of contamination, so that samples near known corruptions are more likely to be flagged, while an Occam penalty emerges automatically and regularises against over-flagging. On CIFAR-10 with asymmetric label contamination, NBAM recovers the structure of the corruption process without supervision: the contamination posterior separates clean from corrupted samples, and the learned anomaly head identifies the direction of every label-flip pair. Alongside these capabilities, NBAM outperforms the four robust-loss baselines considered here at contamination rates 0.2-0.6.
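
A latent-switch mixture of the kind described yields both quantities from one marginal likelihood: with contamination probability $\pi(x)$, $p(y \mid x) = (1-\pi(x))\,p_{\text{clean}}(y \mid x) + \pi(x)\,p_{\text{anom}}(y \mid x)$, the loss is $-\log p(y \mid x)$, and the contamination posterior is the second term divided by $p(y \mid x)$. The sketch below computes both in log space for classification. It is an illustration under assumptions, not NBAM itself: the anomaly component is a second set of logits (uniform if all zeros), and the prior is supplied as a per-sample logit instead of coming from a learned network.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def mixture_loss_and_posterior(clean_logits, anom_logits, prior_logit, y):
    """clean_logits, anom_logits: (n, K); prior_logit: (n,) logit of pi(x); y: (n,) labels.
    Returns the mean negative log marginal likelihood and P(contaminated | x, y)."""
    idx = np.arange(len(y))
    log_p_clean = log_softmax(clean_logits)[idx, y]
    log_p_anom = log_softmax(anom_logits)[idx, y]
    log_pi = -np.logaddexp(0.0, -prior_logit)       # log sigmoid(prior_logit)
    log_1m_pi = -np.logaddexp(0.0, prior_logit)     # log(1 - sigmoid(prior_logit))
    a = log_1m_pi + log_p_clean                     # clean branch
    b = log_pi + log_p_anom                         # contaminated branch
    log_marginal = np.logaddexp(a, b)
    loss = float(-log_marginal.mean())
    posterior = np.exp(b - log_marginal)
    return loss, posterior
```

With a confident clean model and a uniform anomaly component, a label that the clean model rejects receives a posterior close to 1, while an agreeing label keeps a small posterior.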

2606.16535 2026-06-16 cs.LG cs.CV cs.SC 新提交

Assessing Reliability of Symbol Detection in Concept Bottleneck Models

评估概念瓶颈模型中符号检测的可靠性

Javier Fumanal-Idocin, Javier Andreu-Perez

发表机构 * University of Essex(埃塞克斯大学)

AI总结 本文研究概念瓶颈模型(CBM)中符号检测的可靠性问题,通过交换独立训练的概念检测器和分类头来识别易受虚假激活影响的概念,并提出一种可靠性感知训练策略,在CUB-200-2011和合成任务上验证了其有效性。

详情
AI中文摘要

概念瓶颈模型(CBM)是可解释人工智能的相关工具,因为它们通过人类可解释的符号进行预测。然而,高任务准确率并不能保证这些符号被忠实地检测到:联合训练的CBM可能在瓶颈中编码任务特定的捷径,使其解释不可靠。在本文中,我们通过交换共享相同符号词汇的独立训练的概念检测器和分类头来研究概念检测的可靠性。我们利用由此产生的性能下降、概念级指标和符号级不确定性估计来识别特别容易发生虚假激活的概念。最后,我们提出了一种可靠性感知训练策略,其中共享的概念检测器通过多个分类头进行优化,并因依赖全局或实例级不可靠符号而受到惩罚。在具有完整概念监督的CUB-200-2011上,检测器和头几乎可以自由互换(交换下降低于一个准确率点,相对保留率高于99%,且没有概念检测低于随机水平),而在受控的合成任务上,我们表明,随着概念监督权重的减少,模型保持近乎完美的任务准确率,而交换准确率和与真实概念的一致性下降到随机水平。我们的可靠性感知训练显著缓解了这种泄漏,在泄漏情况下大致使交换准确率翻倍。

英文摘要

Concept Bottleneck Models (CBMs) are a relevant tool for explainable Artificial Intelligence because they make their predictions through human-interpretable symbols. However, high task accuracy does not guarantee that these symbols are detected faithfully: jointly trained CBMs may encode task-specific shortcuts in the bottleneck, making their explanations unreliable. In this paper, we study concept-detection reliability by swapping independently trained concept detectors and classification heads that share the same symbolic vocabulary. We use the resulting performance degradation, concept-level metrics, and symbol-wise uncertainty estimates to identify concepts that are especially prone to spurious firing. Finally, we propose a reliability-aware training strategy in which a shared concept detector is optimized with multiple classification heads and penalized for relying on globally or instance-wise unreliable symbols. On CUB-200-2011 with full concept supervision, detectors and heads are almost freely interchangeable (swap drop below one accuracy point, relative retention above $99\%$, and no concept detected below chance), whereas on a controlled synthetic task we show that, as the concept-supervision weight is reduced, models keep near-perfect task accuracy while swapped accuracy and agreement with the ground-truth concepts collapse to chance. Our reliability-aware training substantially mitigates this leakage, roughly doubling swap accuracy in the leaky regime.
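
The swap test can be expressed as a small evaluation loop. The sketch below is a hypothetical harness, not the paper's code: detectors and heads are plain callables over a shared concept vocabulary, and the swap drop and retention are measured against the native accuracy of the head being reused (one reasonable reference point among several).

```python
import numpy as np

def swap_report(detectors, heads, X, y):
    """detectors[i](X) -> (n, m) concept activations over a shared vocabulary of m symbols.
    heads[j](C) -> (n,) predicted labels. detectors[i] and heads[i] were trained together."""
    def accuracy(det, head):
        return float(np.mean(head(det(X)) == y))

    native = [accuracy(d, h) for d, h in zip(detectors, heads)]
    report = {}
    for i, det in enumerate(detectors):
        for j, head in enumerate(heads):
            if i == j:
                continue
            swapped = accuracy(det, head)
            report[(i, j)] = {
                "swapped_acc": swapped,
                "swap_drop": native[j] - swapped,   # relative to head j's own pipeline
                "retention": swapped / native[j] if native[j] > 0 else float("nan"),
            }
    return native, report
```

A toy leaky pair makes the point: if one detector encodes a concept inverted and its jointly trained head compensates, both native pipelines are perfect, yet swapping components collapses accuracy, exactly the symptom of a task shortcut hidden in the bottleneck.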

2606.16602 2026-06-16 cs.LG cs.NA math.NA physics.comp-ph 新提交

PhysGuard: Fisher-Guided Gradient Projection for Sim-to-Real Neural PDE Surrogates

PhysGuard: 面向仿真到现实神经PDE代理的Fisher引导梯度投影

Changjian Zhou, Junfeng Fang, Negin Yousefpour, Peng Wu, Bin Yan, Guillermo A Narsilio

发表机构 * Faculty of Engineering and IT, University of Melbourne(墨尔本大学工程与信息技术学院) School of Computing, National University of Singapore(新加坡国立大学计算机学院) Artificial Intelligence Research Institute, IFLYTEK Co., Ltd.(科大讯飞股份有限公司人工智能研究院)

AI总结 针对神经算子模型从仿真到现实迁移时的精度下降问题,提出PhysGuard框架,利用仿真数据的Fisher信息矩阵保护物理关键参数,限制微调更新方向,在严重域偏移下将低频误差降低32%。

详情
AI中文摘要

在仿真数据上训练的神经算子模型由于仿真到现实的差距,应用于实验测量时往往失去精度。使用有限真实数据的标准微调可以缩小这一差距,但可能损害预训练期间学到的核心物理相关表示。尽管知识保留自适应在视觉或语言任务中已被广泛研究,但对于架构和受保护知识根本不同的神经算子,这些方法是否适用仍不清楚。神经算子需要保留核心尺度的物理结构,而非语义或视觉特征。我们提出PhysGuard,一个用于神经算子精确仿真到现实自适应的物理保留框架。具体来说,PhysGuard利用在仿真数据上计算的经验Fisher信息矩阵来识别物理关键参数方向,然后将微调更新限制在不干扰这些方向的方向上。逐层的Gram矩阵公式使其对具有数百万参数的模型高效,而自适应阈值自动确定受保护子空间大小。频谱探测实验表明,主导Fisher方向与低频输出结构强相关。在涵盖四种神经算子架构和不同物理系统的基准上进行的实验表明,与基线相比,PhysGuard在大多数评估指标上表现强劲。在严重域偏移下优势最为明显,与标准微调相比,低频误差降低高达32%,同时保持适应性。我们的代码可在https://github.com/ZhouChaunge/PhysGuard获取。

英文摘要

Neural operator models trained on simulation data often lose accuracy when applied to experimental measurements due to the sim-to-real gap. Standard fine-tuning with limited real data can reduce this gap, but it may also damage the core physics-relevant representations learned during pretraining. Although knowledge-preserving adaptation has been widely investigated in vision or language tasks, it remains unclear whether these methods are suitable for neural operators whose architectures and protected knowledge are fundamentally different. Neural operators need to preserve core-scale physical structures rather than semantic or visual features. We propose PhysGuard, a physics-preserving framework for accurate sim-to-real adaptation of neural operators. Specifically, PhysGuard uses the empirical Fisher Information Matrix computed on simulation data to identify physics-critical parameter directions, then restricts fine-tuning updates to directions that do not interfere with them. A layer-wise Gram-matrix formulation makes this efficient for models with millions of parameters, while an adaptive threshold automatically determines the protected subspace size. A spectral probe experiment shows that the dominant Fisher directions are strongly associated with low-frequency output structures. Experiments on benchmarks spanning four neural operator architectures and different physical systems show that PhysGuard performs strongly on most evaluation metrics compared to baselines. The benefits are most evident under severe domain shift, where it reduces low-frequency error by up to 32% compared to standard fine-tuning while maintaining adaptability. Our code is available at https://github.com/ZhouChaunge/PhysGuard.
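
The Gram-matrix trick works because the empirical Fisher $F = G^\top G / n$ (with $G$ the $n \times p$ matrix of per-sample gradients) shares its nonzero spectrum with the $n \times n$ Gram matrix $K = G G^\top / n$: if $K u = \lambda u$, then $v = G^\top u / \sqrt{n\lambda}$ is a unit eigenvector of $F$ with the same eigenvalue. The sketch below applies this to one layer's flattened parameters. The cumulative-energy cutoff is a hypothetical stand-in for PhysGuard's adaptive threshold, and the projection simply removes the protected components from a fine-tuning gradient.

```python
import numpy as np

def protected_basis(per_sample_grads, energy=0.9):
    """per_sample_grads: (n, p) simulation-loss gradients, one row per sample.
    Returns V (p, k): orthonormal top eigenvectors of F = G^T G / n,
    computed from the n x n Gram matrix instead of the p x p Fisher."""
    G = np.asarray(per_sample_grads, dtype=float)
    n = G.shape[0]
    K = G @ G.T / n                         # same nonzero eigenvalues as F
    lam, U = np.linalg.eigh(K)              # ascending order
    lam, U = np.clip(lam[::-1], 0.0, None), U[:, ::-1]
    # Hypothetical adaptive threshold: fewest directions covering `energy` of the spectrum.
    cum = np.cumsum(lam) / lam.sum()
    k = min(int(np.searchsorted(cum, energy)) + 1, len(lam))
    return G.T @ U[:, :k] / np.sqrt(n * lam[:k])   # map to parameter space

def project_update(grad, V):
    """Drop the part of a fine-tuning gradient that lies in the protected subspace."""
    return grad - V @ (V.T @ grad)
```

Working in the $n \times n$ space costs an eigendecomposition whose size depends on the number of samples rather than the number of parameters, which is what makes the approach practical for layers with millions of weights.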

2606.16682 2026-06-16 cs.LG cs.CL 新提交

Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents

多模态评估者偏好坍缩:自进化智能体中的跨模态传染

Zewen Liu

发表机构 * Qilu Institute of Technology, School of Software Engineering(齐鲁理工学院软件工程学院)

AI总结 研究多模态自评估中偏好坍缩的加剧现象,发现跨模态传染导致策略选择扭曲,并引入传染矩阵量化风险。

Comments 19 pages, 0 figures

详情
AI中文摘要

当AI智能体使用语言模型在反馈循环中评估自身输出时,会出现系统性偏差。我们表明,评估者偏好坍缩(EPC)在多模态设置中被显著放大。使用GPT-4o评估DeepSeek-chat在文本和视觉任务上的表现,我们发现单一策略(step_by_step)吸收了48.4%的权重——是纯文本自评估中坍缩的3.2倍——而三个视觉域策略合计仅获得9.1%的权重。然后,我们展示了一种称为跨模态传染的新现象:在一个模态上获得的评估者偏好会迁移到另一个模态并破坏其策略选择。通过一个四阶段隔离训练范式,我们测量了传染系数并记录了策略反转——一个模态的最优策略在跨模态暴露后发生逆转。跨四种评估者配置(总计53次独立重复,15,592次API调用)的第3阶段统计验证揭示了一个清晰的层次结构:跨模型评估(GPT-4o,N=8)产生强但对称的双向传染(平均gamma_{T->V}=1.176,gamma_{V->T}=1.089,Delta=-0.088,p=0.575,Cohen's d=0.29);高轮次(DashScope,50轮)导致坍缩为单一策略主导(70%零传染);而自评估提供近乎完全的免疫——97%的运行(N=30,DeepSeek-chat)产生恰好为零的传染(平均gamma=0.033,95% CI [-0.031, 0.010],p=0.642,d=0.07)。没有评估者条件显示出统计显著的方向不对称性。我们引入了由评估者身份索引的传染矩阵,发布了MM-EPC实验框架,并将跨模型评估者架构确定为偏好传染的主要风险因素。

英文摘要

When AI agents use language models to evaluate their own outputs in a feedback loop, systematic biases emerge. We show that Evaluator Preference Collapse (EPC) is dramatically amplified in multimodal settings. Using GPT-4o to evaluate DeepSeek-chat across text and visual tasks, we find that a single strategy (step_by_step) absorbs 48.4% of all weight -- 3.2x the collapse observed in text-only self-evaluation -- while three visual-domain strategies receive only 9.1% combined weight. We then demonstrate a novel phenomenon we term cross-modal contagion: evaluator preferences acquired on one modality transfer to and corrupt strategy selection on another. Through a four-phase isolation training paradigm, we measure contagion coefficients and document strategy inversion -- the optimal strategy for a modality reverses after cross-modal exposure. A Phase 3 statistical validation across four evaluator configurations (N=53 total independent repetitions, 15,592 API calls) reveals a clear hierarchy: cross-model evaluation (GPT-4o, N=8) produces strong but symmetric bidirectional contagion (mean gamma_{T->V}=1.176, gamma_{V->T}=1.089, Delta=-0.088, p=0.575, Cohen's d=0.29); high round counts (DashScope, 50 rounds) cause collapse to single-strategy dominance (70% zero contagion); and self-evaluation provides near-complete immunity -- 97% of runs (N=30, DeepSeek-chat) yield exactly zero contagion (mean gamma=0.033, 95% CI [-0.031, 0.010], p=0.642, d=0.07). No evaluator condition shows statistically significant directional asymmetry. We introduce the contagion matrix indexed by evaluator identity, release the MM-EPC experimental framework, and identify cross-model evaluator architecture as the primary risk factor for preference contagion.

2606.16786 2026-06-16 cs.LG 新提交

We Need Explanation Cards to Connect Explanation Algorithms to the Real World

我们需要解释卡来连接解释算法与现实世界

Eric Günther, Balázs Szabados, Kristof Meding, Gunnar König, Sebastian Bordt, Ulrike von Luxburg

发表机构 * University of Tübingen(蒂宾根大学) Tübingen AI Center(蒂宾根人工智能中心) HUN-REN Institute for Computer Science and Control (SZTAKI), Budapest, Hungary(匈牙利科学院计算机科学与控制研究所(SZTAKI))

AI总结 针对算法解释在实践中含义模糊且信息不足的问题,提出解释卡,通过补充鲁棒性和有效性信息及解释说明,帮助用户正确解读,并满足欧盟AI法案的可解释性要求。

详情
AI中文摘要

算法解释旨在帮助利益相关者理解不透明的算法决策,但在实践中往往达不到预期。首先,算法解释的含义通常不是人们直观期望的那样,因此需要专业知识才能正确解释。其次,最近的研究表明,流行的解释算法对于复杂决策函数的行为信息不足。这些共同导致了解释表面传达的内容与实际提供的内容之间的差距。在这项工作中,我们提出了解释算法的解释卡,它用关于鲁棒性和有效性的补充信息以及清晰的解释说明来增强标准解释。补充信息可以使原本无信息的解释变得实际有用,同时也有助于检测它们不适用的情况。重要的是,解释卡中的解释说明将责任从用户转移到提供者:提供者必须事先明确说明从解释中可以得出什么和不能得出什么,而不是期望用户自己识别。使用反事实解释和SHAP作为示例,我们展示了提供者如何构建解释卡,以及这些卡为用户提供了正确解释所需的指导。我们进一步论证了解释卡是实践欧盟AI法案可解释性规定的实用手段。总体而言,解释卡是使解释算法适应现实世界用例的重要一步。

英文摘要

Algorithmic explanations are intended to help stakeholders understand opaque algorithmic decisions, but in practice, they often fall short. First, the meaning of algorithmic explanations is often not what one might intuitively expect, so expert knowledge is required to interpret them correctly. Second, recent work has shown that popular explanation algorithms are uninformative about the behavior of complex decision functions. Together, these issues create a gap between what explanations appear to convey and what they actually provide. In this work, we propose Explanation Cards for Explanation Algorithms, which augment standard explanations with complementary information about robustness and validity, as well as clear instructions for interpretation. The complementary information can render otherwise uninformative explanations practically useful, while also helping to detect cases where they are not. Importantly, the interpretation instructions in explanation cards shift responsibility from users to providers: Rather than expecting users to recognize what can and cannot be concluded from an explanation, providers must make this explicit upfront. Using counterfactual explanations and SHAP as examples, we demonstrate how providers can construct explanation cards and that these cards provide users with the guidance needed for sound interpretation. We further argue that explanation cards offer a practical means of operationalising the explainability provisions of the EU AI Act. Overall, explanation cards are a significant step toward making explanation algorithms fit for real-world use cases.
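
To make the idea concrete, an explanation card can be thought of as a record that travels with each explanation. The schema below is entirely hypothetical (field names and rendering are illustrative, not taken from the paper); it only shows the structure described above: the raw explanation, complementary robustness and validity information, and provider-stated instructions on what may and may not be concluded.

```python
from dataclasses import dataclass, field

@dataclass
class ExplanationCard:
    """Hypothetical explanation-card record; all field names are illustrative."""
    method: str                                          # e.g. "SHAP" or "counterfactual"
    explanation: dict                                    # the raw explanation output
    robustness: dict = field(default_factory=dict)       # complementary robustness checks
    validity: dict = field(default_factory=dict)         # complementary validity checks
    can_conclude: list = field(default_factory=list)     # provider-stated valid readings
    cannot_conclude: list = field(default_factory=list)  # provider-stated invalid readings

    def render(self) -> str:
        lines = [f"Explanation card: {self.method}"]
        lines += [f"  {k}: {v}" for k, v in self.explanation.items()]
        for title, info in (("Robustness", self.robustness), ("Validity", self.validity)):
            lines.append(f"{title}:")
            lines += [f"  {k}: {v}" for k, v in info.items()]
        lines.append("You may conclude:")
        lines += [f"  - {s}" for s in self.can_conclude]
        lines.append("You may NOT conclude:")
        lines += [f"  - {s}" for s in self.cannot_conclude]
        return "\n".join(lines)
```

The `cannot_conclude` field is the piece that shifts responsibility to the provider: the limits of interpretation are written down before the user sees the explanation.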

2606.16883 2026-06-16 cs.LG cs.AI 新提交

Upper Bounds on the Generalization Error of Deep Learning Models via Local Robustness and Stability

深度学习模型泛化误差的上界:基于局部鲁棒性和稳定性

Abdul-Rauf Nuhu, Parham M. Kebria, Vahid Hemmati, Mahmoud N. Mahmoud, Edward Tunstel, Abdollah Homaifar

发表机构 * North Carolina Agricultural and Technical State University(北卡罗来纳农业技术州立大学) University of Alabama(阿拉巴马大学) Southwest Research Institute(西南研究院)

AI总结 提出一种通过局部区域稳定样本数缩放鲁棒性项的泛化上界,在ImageNet上实现非空洞且最紧的误差估计。

详情
AI中文摘要

泛化是数据驱动模型的关键属性,尤其是在安全关键应用中部署的深度学习模型。基于鲁棒性的泛化界作为一种将鲁棒性与泛化性能联系起来的原则性方法而受到关注,通常以数据依赖的方式。然而,大多数现有界在实际设置中存在空洞问题,产生远超过实际错误率的松散上界,限制了其在真实世界评估中的实用性。虽然这个问题通常归因于不确定性项,但问题的很大一部分源于鲁棒性项本身,特别是对于0-1损失。现有方法通常将鲁棒性项视为全局度量,忽略了其在输入空间不同子区域间的变化。在这项工作中,我们提出了一种泛化界,通过根据每个子区域内稳定和不稳定样本的数量来缩放鲁棒性项,从而解决了这一局限性。我们的界同时包含数据和模型依赖因素,同时保持实际相关性(产生更紧的真实误差上界)。在ImageNet数据集上训练的模型上的实验表明,我们的界始终非空洞,并在现有方法中实现了最紧的估计,与一系列鲁棒深度神经网络的实证性能紧密对齐。

英文摘要

Generalization is a critical property of data-driven models, particularly deep learning models deployed in safety-critical applications. Robustness-based generalization bounds have gained attention as a principled way to link robustness properties to generalization performance, often in a data-dependent manner. However, most existing bounds suffer from vacuousness in practical settings, yielding loose upper bounds that greatly exceed the actual error rates and limiting their usefulness for real-world evaluation. While this issue is often attributed to the uncertainty term, a substantial part of the problem originates from the robustness term itself, particularly for the 0-1 loss. Existing approaches typically treat the robustness term as a global measure, ignoring its variation across different sub-regions of the input space. In this work, we propose a generalization bound that addresses this limitation by scaling the robustness term according to the number of stable and unstable samples within each sub-region. Our bounds incorporate both data- and model-dependent factors while maintaining practical relevance (yielding tighter upper bounds on true error). Experiments on models trained on the ImageNet dataset show that our bounds remain consistently non-vacuous and achieve the tightest estimates among existing methods, closely aligning with empirical performance across a range of robust deep neural networks.
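
Why the robustness term is a problem for the 0-1 loss can be seen in a toy calculation. In the classic robustness-based bounds, the robustness term is a worst case over sub-regions of the input space, and with the 0-1 loss a single sample whose region contains a point with a different loss already pushes that worst case to 1. The sketch below contrasts that global term with a version that weights each region by its fraction of unstable samples. It is a simplified illustration of the idea of region-wise scaling, not the bound proposed in the paper.

```python
import numpy as np

def robustness_terms(region_ids, unstable):
    """Toy contrast for the 0-1 loss.
    region_ids: (n,) sub-region index of each training sample
    unstable:   (n,) True if some input in the sample's region changes its 0-1 loss
    Returns (global worst-case term, region-scaled term)."""
    region_ids = np.asarray(region_ids)
    unstable = np.asarray(unstable, dtype=bool)
    # Global worst case: any unstable sample makes the maximal loss change equal to 1.
    global_term = 1.0 if unstable.any() else 0.0
    # Region-scaled: weight each region's worst case by its share of unstable samples.
    n = len(region_ids)
    scaled = 0.0
    for r in np.unique(region_ids):
        mask = region_ids == r
        scaled += (mask.sum() / n) * unstable[mask].mean()
    return global_term, float(scaled)
```

With one unstable sample out of four, the global term is already the maximal value 1, while the scaled term is 0.25, which is the kind of gap that separates a vacuous bound from a useful one.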

2606.14867 2026-06-16 cs.CL cs.AI cs.LG 交叉投稿

Evaluating the Robustness of Proof Autoformalization in Lean 4

评估 Lean 4 中证明自动形式化的鲁棒性

Zhengtao Gui, Sheng Yang, Zhouxing Shi

发表机构 * University of California, Irvine(加州大学欧文分校) University of California, Riverside(加州大学河滨分校)

AI总结 研究证明自动形式化模型在全局和局部扰动下的鲁棒性,发现现有模型对全局扰动敏感且多数无法忠实反映局部扰动。

Comments Preprint

详情
AI中文摘要

证明自动形式化旨在将用自然语言编写的数学非正式证明翻译成形式语言(如 Lean 4)中的形式证明。已有几项工作开发了基于 LLM 的证明自动形式化模型。然而,现有评估通常侧重于翻译来自精选数据集的规范非正式证明。我们认为,一个鲁棒的证明自动形式化器必须即使对于偏离这些理想化形式的非正式证明也能保持忠实,并提出了首个关于证明自动形式化模型鲁棒性的研究。我们制定了两类扰动并评估每种扰动下的鲁棒性:全局扰动以不同风格改写非正式证明,在此情况下形式化应保持一致;局部扰动改变一个值、符号或证明步骤,可能是反事实的方式,鲁棒的形式化应忠实地反映扰动,而不是自行恢复为原始形式或推断出不同的形式。我们在 miniF2F 和 MATH-500 上构建了包含两种扰动的基准,并自动衡量证明自动形式化在全局扰动下正确性的稳定程度,以及其输出在局部扰动下的忠实程度。我们评估了七个最新模型,所有模型都对全局扰动敏感,且大多数在局部扰动下无法保持忠实。代码和数据可通过 https://github.com/ucr-rai/robust-proof-autoformalization 获取。

英文摘要

Proof autoformalization aims to translate a mathematical informal proof written in natural language into a formal proof in a formal language such as Lean 4. Several works have developed LLM-based models for proof autoformalization. However, existing evaluations have typically focused on translating well-formed informal proofs from curated datasets. We argue that a robust proof autoformalizer must remain faithful even for informal proofs that diverge from these idealized ones, and we present the first study on the robustness of proof autoformalization models. We formulate two categories of perturbations and evaluate robustness under each: a global perturbation paraphrases the informal proof in a different style, under which the formalization should remain consistent; a local perturbation alters a value, symbol, or proof step, possibly in a counterfactual way, and a robust formalization should faithfully reflect the perturbation rather than reverting to the original one or inferring a different one on its own. We build a benchmark with both perturbations on miniF2F and MATH-500, and automatically measure how stable a proof autoformalization's correctness is under global perturbations and how faithfully its output reflects local perturbations. We evaluate seven recent models, all of which are sensitive to global perturbations and mostly fail to remain faithful under local perturbations. Code and data are available via https://github.com/ucr-rai/robust-proof-autoformalization.
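
The difference between the two perturbation types can be illustrated with a deliberately tiny Lean 4 example (the arithmetic is a toy stand-in for the benchmark problems, not taken from them). A faithful formalizer carries a changed value into the formal statement, and when the change makes the claim false, the faithful output is a proof that Lean rejects rather than a silently repaired one.

```lean
-- Informal claim: "2 + 3 = 5". A faithful formalization:
example : (2 : Nat) + 3 = 5 := by decide

-- Local perturbation "2 + 4 = 6": the formal statement must change with it,
-- not revert to the original `2 + 3 = 5`.
example : (2 : Nat) + 4 = 6 := by decide

-- Counterfactual perturbation "2 + 3 = 6": a faithful formalization states the
-- false claim, so checking fails instead of the formalizer quietly fixing it.
-- example : (2 : Nat) + 3 = 6 := by decide   -- rejected by Lean
```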

2606.14909 2026-06-16 stat.ML cs.LG 交叉投稿

Audited Conformal Prediction for Classification under Unknown Distribution Shift

未知分布漂移下分类问题的审计共形预测

Yanfei Zhou, Rizal Fathony, Nam H. Nguyen, Matteo Sesia

发表机构 * Department of Data Sciences and Operations, University of Southern California(数据科学与运营系,南加州大学) AI Foundations, Capital One(Capital One人工智能基础) Department of Data Sciences and Operations, Thomas Lord Department of Computer Science, University of Southern California(数据科学与运营系,托马斯·劳德计算机科学系,南加州大学)

AI总结 提出审计共形预测方法,利用目标群体小标注数据训练审计模型识别旧模型可能失败的输入,结合共形预测框架在保证边际覆盖的同时提高条件覆盖,并提供理论保证。

详情
AI中文摘要

我们考虑在未知分布漂移下部署的预训练分类模型的不确定性量化问题。我们提出了审计共形预测(ACP),该方法利用来自目标群体的小标注数据集训练一个辅助审计模型,以识别旧模型可能失败的输入。通过将审计模型的输出整合到共形预测框架中,ACP 产生的预测集在保证边际覆盖的同时,在实践中比现有方法实现了更高的条件覆盖。我们开发并分析了两种互补的整合策略——一种针对边际覆盖并改善条件性能,另一种提供明确的组条件覆盖保证——并为两者建立了理论保证。在合成和真实世界数据集上的实验验证了该方法,并说明了预测集大小与条件覆盖之间的权衡。

英文摘要

We consider the problem of uncertainty quantification for a pretrained classification model deployed under unknown distribution shift. We propose Audited Conformal Prediction (ACP), a method that leverages a small labeled dataset from the target population to train an auxiliary audit model identifying inputs where the legacy model is likely to fail. By integrating the audit model's outputs into the conformal prediction framework, ACP produces prediction sets that guarantee marginal coverage while achieving substantially higher conditional coverage in practice than existing approaches. We develop and analyze two complementary integration strategies -- one targeting marginal coverage with improved conditional performance, the other providing explicit group-conditional coverage guarantees -- and establish theoretical guarantees for both. Experiments on synthetic and real-world datasets validate the method and illustrate trade-offs between prediction set size and conditional coverage.
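
One way to fold an audit model into conformal prediction is to calibrate a separate threshold for each audit group (a Mondrian, or group-conditional, split-conformal scheme), which gives coverage within each group under exchangeability. The sketch below shows only that flavor. It assumes a binary audit flag, a `1 - p(true label)` nonconformity score, and that the audit model was trained on a different split from the calibration data; it is not claimed to match either of the paper's two integration strategies exactly.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Split-conformal threshold with the finite-sample (n + 1) correction."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]

def audited_prediction_sets(cal_probs, cal_y, cal_flag, test_probs, test_flag, alpha=0.1):
    """cal_probs, test_probs: (n, K) legacy-model class probabilities.
    cal_flag, test_flag: (n,) bool from a hypothetical audit model (True = likely failure).
    One threshold per audit group, so coverage is targeted within each group."""
    cal_flag = np.asarray(cal_flag, dtype=bool)
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_y)), cal_y]
    q = {g: conformal_quantile(cal_scores[cal_flag == g], alpha) for g in (False, True)}
    return [np.flatnonzero(1.0 - p <= q[bool(g)]) for p, g in zip(test_probs, test_flag)]
```

Inputs the audit flags as risky are calibrated only against other risky inputs, so their sets grow (here to the full label set when the group is too small to certify) instead of inheriting the optimistic threshold of the easy cases.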

2606.15779 2026-06-16 cs.CV cs.LG 交叉投稿

Faithful Action-unit Causal Reasoning for Counterfactually Faithful Emotion Explanations

面向反事实忠实情感解释的忠实动作单元因果推理

Van Thong Huynh, Hong Hai Nguyen, Thuy Pham, Trong Nghia Nguyen, Soo-Hyung Kim

发表机构 * Faculty of CSE, Ho Chi Minh City University of Technology (HCMUT), VNUHCM(胡志明市理工大学计算机科学与工程学院,越南国家大学胡志明市分校) Dept. of AI, FPT University(FPT大学人工智能系) Faculty of DSAI, College of Technology, National Economic University(国民经济大学技术学院数据科学与人工智能系) Dept. of AI Convergence, Chonnam National University(全南大学人工智能融合系)

AI总结 提出FACR方法,通过反事实一致性目标和极性感知因果图,训练模型在动作单元与情感之间实现可测量的因果忠实性,在UNBC-PAIN数据集上将忠实度从0.08提升至0.57。

详情
AI中文摘要

多模态模型可以命名面部情感背后的动作单元(AU),但其AU->情感的解释通常是合理的而非忠实的:没有任何机制强制模型调用的AU就是实际驱动其预测的AU。我们将AU->情感推理视为解释、标签和结构化AU->情感因果图G之间的反事实一致性问题,并提出FACR,该方法将推理器建立在独立诱导的、极性感知的G上,并训练一个反事实忠实性目标:对G标记为某类因果的AU进行do干预必须改变预测,而对标记为无关的AU进行do干预必须保持预测不变。因此,忠实性既可训练,又可通过匹配的干预指标进行测量;由于现有的情感推理基准均无法支持这种评估,我们以已知的因果结构,即PSPI疼痛-AU组成,来评估该指标。我们明确指出,该指标测试的是对给定结构的忠实性而非对其的重新发现:它询问训练后的推理器是否调用结构标记为因果的AU,并在留出受试者和第二个数据集上进行评估。在UNBC-PAIN上的受试者独立评估中,该目标将调用AU与PSPI组成的一致性从无目标基线的0.08提高到0.57,检测性能仅略有下降;一个不忠实控制实验将增益归因于该目标。在跨数据集情感迁移中,该目标同样提高了七类任务上对G的忠实性(0.50到0.84)。最后,我们附加一个语言verbalizer并将审计扩展到生成的文本:用潜在激活偏置每个动作单元的发射,使解释在结构上忠实,因此消融一个AU会将其从解释中移除,该属性可迁移到第二个语言模型骨干,而自由生成的解释则不忠实。

英文摘要

Multimodal models can name the action units (AUs) behind a facial emotion, but their AU->emotion rationales are typically plausible rather than faithful: nothing forces the AUs a model invokes to be the AUs that actually drive its prediction. We cast AU->emotion reasoning as a counterfactual-consistency problem between the rationale, the label, and a structural AU->emotion causal graph G, and propose FACR, which grounds the reasoner in an independently induced, polarity-aware G and trains a counterfactual-faithfulness objective: a do-intervention on an AU that G marks causal for a class must move the prediction, while one it marks irrelevant must leave it unchanged. Faithfulness is thereby both trainable and measurable through a matching interventional metric, which we evaluate against a known causal structure, the PSPI pain-AU composition, as no existing affective-reasoning benchmark allows. We are explicit that this metric tests fidelity to the supplied structure rather than its rediscovery: it asks whether the trained reasoner invokes the AUs the structure marks causal, on held-out subjects and a second dataset. Under subject-independent evaluation on UNBC-PAIN, the objective raises the agreement between the invoked AUs and the PSPI composition from a no-objective baseline of 0.08 to 0.57, at a small detection cost; an unfaithfulness control attributes the gain to the objective. On a cross-dataset emotion transfer, the objective likewise raises fidelity to G on a seven-class task (0.50 to 0.84). Finally, we attach a language verbalizer and extend the audit to the generated text: biasing each action unit's emission by its latent activation makes the rationale faithful by construction, so that ablating an AU removes it from the explanation, a property that transfers to a second language-model backbone, whereas a freely generated rationale is unfaithful.
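
The PSPI structure used as ground truth is the Prkachin-Solomon Pain Intensity score, $\text{PSPI} = \text{AU4} + \max(\text{AU6}, \text{AU7}) + \max(\text{AU9}, \text{AU10}) + \text{AU43}$. The sketch below pairs it with a hypothetical penalty in the spirit of the counterfactual-faithfulness objective: setting an AU marked causal to zero, $do(\text{AU}=0)$, should move the prediction by at least a margin, and doing so for an AU marked irrelevant should not move it. The predictor interface, margin, and hinge form are assumptions, and the `max` calls make this a non-differentiable illustration rather than a training loss.

```python
def pspi(au):
    """Prkachin-Solomon Pain Intensity from AU intensities (dict AU id -> value)."""
    return au[4] + max(au[6], au[7]) + max(au[9], au[10]) + au[43]

def counterfactual_faithfulness_penalty(predict, au, causal, margin=0.5):
    """predict: callable dict -> float; au: dict AU id -> intensity; causal: set of AU ids.
    Penalizes causal AUs whose do(AU = 0) barely moves the prediction and
    irrelevant AUs whose do(AU = 0) moves it at all."""
    base = predict(au)
    penalty = 0.0
    for a in au:
        shift = abs(predict({**au, a: 0.0}) - base)   # do-intervention on AU a
        if a in causal:
            penalty += max(0.0, margin - shift)
        else:
            penalty += shift
    return penalty
```

A predictor that relies on an AU outside the causal set is penalized both for ignoring the causal AUs and for reacting to the irrelevant one. Note that zeroing an AU that is already inactive, or dominated by its partner inside a `max` term of PSPI, leaves the score unchanged, so a per-AU check like this one would need special handling for such cases.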

2606.15821 2026-06-16 cs.CL cs.AI cs.LG 交叉投稿

The Truth Stays in the Family: Enhancing Contextual Grounding via Inherited Truthful Heads in Model Lineages

真相留在家族中:通过模型谱系中继承的真相头增强上下文基础

Miso Choi, Seonga Choi, Mincheol Kwon, Woosung Joung, Jinkyu Kim, Jungbeom Lee

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 研究发现基础LLM与下游变体间存在上下文真相分数的强继承性,提出TruthProbe软门控策略放大真相头以提升上下文真实性并减少多模态幻觉。

Comments Accepted at ICML 2026

详情
AI中文摘要

大型语言模型(LLM)的最新进展产生了许多共享基础LLM的专业多模态LLM(MLLM),形成了不同的模型谱系。基础LLM与下游变体之间是否存在基本的行为联系尚不清楚。我们通过量化头部级别的上下文真相分数来研究这个问题。在包括基于Vicuna、Qwen2.5、LLaMA2和Mistral的模型在内的多种LLM和MLLM谱系中,我们发现真相分数在模型家族内被强烈保留,即使在指令调优或多模态适应后也是如此。我们进一步表明,这种继承与注意力头权重保留一致,并且上下文真相头关注查询相关的证据。基于这一发现,我们提出了TruthProbe,一种软门控策略,在保留其他头部贡献的同时放大上下文真相头。TruthProbe在HaluEval上提高了上下文真实性,并在POPE和CHAIR上减少了多模态幻觉,基础LLM的真相分数有效转移到其微调的LLM和MLLM后代。代码可在https://github.com/miso-choi/TruthProbe获取。

英文摘要

Recent advances in large language models (LLMs) have produced many specialized multimodal LLMs (MLLMs) that share common foundational LLMs, forming distinct model lineages. It remains unclear whether a fundamental behavioral link exists between the foundational LLMs and downstream variants. We investigate this question by quantifying head-level context-truthfulness scores. Across diverse LLM and MLLM lineages, including Vicuna-, Qwen2.5-, LLaMA2-, and Mistral-based models, we find that Truth Scores are strongly preserved within model families, even after instruction tuning or multimodal adaptation. We further show that this inheritance is consistent with attention-head weight preservation, and that context-truthful heads attend to query-relevant evidence. Building on this finding, we propose TruthProbe, a soft-gating strategy that amplifies context-truthful heads while preserving other head contributions. TruthProbe improves contextual truthfulness on HaluEval and reduces multimodal hallucination on POPE and CHAIR, with base-LLM Truth Scores transferring effectively to their fine-tuned LLM and MLLM descendants. Code is available at https://github.com/miso-choi/TruthProbe.
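
A soft gate that amplifies high-scoring heads without silencing the rest can be written as a per-head multiplier in $[1, 1+\alpha]$. The sigmoid form, threshold `tau`, and `temperature` below are hypothetical choices for illustration, not TruthProbe's published gating function; the sketch only shows how head-level truth scores from a base model could rescale per-head attention outputs before they are merged.

```python
import numpy as np

def soft_gate(head_outputs, truth_scores, alpha=1.0, tau=None, temperature=0.1):
    """head_outputs: (H, T, d_head) per-head attention outputs for one layer.
    truth_scores: (H,) head-level context-truthfulness scores (e.g. from the base LLM).
    Gates lie in [1, 1 + alpha]: heads above tau are amplified smoothly, and
    no head is scaled below 1, so other heads keep their contribution."""
    s = np.asarray(truth_scores, dtype=float)
    if tau is None:
        tau = float(np.median(s))
    gate = 1.0 + alpha / (1.0 + np.exp(-(s - tau) / temperature))
    return head_outputs * gate[:, None, None]
```

Because the scores are computed once per head, the same gate vector could be reused across the fine-tuned descendants of a base model, which is the practical payoff of the inheritance finding.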

2606.15964 2026-06-16 stat.ML cs.LG 交叉投稿

PromptShift-CRC: Drift-Aware Conformal Risk Control for Foundation Models Under Prompt and Domain Shift

PromptShift-CRC: 面向提示和领域漂移的基础模型的漂移感知保形风险控制

Jeffery Opoku, David Banahene

发表机构 * The University of Texas Rio Grande Valley(德克萨斯大学里奥格兰德河谷分校) Florida International University(佛罗里达国际大学)

AI总结 提出PromptShift-CRC方法,通过嵌入提示和响应、测量漂移、加权校准样本并在线更新风险水平,在提示和领域漂移下控制基础模型输出的风险。

详情
AI中文摘要

基础模型现在被用于其接收的提示可能快速变化的场景。用户变化、主题变化、策略变化,模型可能突然面临在校准数据中罕见的请求类型。这使得固定校准变得有风险。保形预测和保形风险控制提供了与模型无关的控制错误的方法,但当校准数据与未来数据相似时效果最佳。本文开发了PromptShift-CRC,一种面向提示和领域漂移的基础模型输出的漂移感知保形风险控制方法。该方法嵌入提示和响应,测量当前提示流与校准池的偏离程度,对相关或最近的校准示例赋予更大权重,并在观察到违规后在线更新风险水平。它报告三个实用诊断指标:实现风险误差、提示漂移和有效校准大小。我们给出了该方法在分布不匹配和加权分位数不确定性项下控制风险的条件。在一个合成提示漂移基准中,静态保形风险控制在漂移后急剧失效,而PromptShift-CRC在所考虑的适应性基线中提供了最佳覆盖。然后,我们在由公开基准派生的数据流上评估相同的校准层,包括问答、毒性、摘要事实性和长上下文幻觉风险。

英文摘要

Foundation models are now used in settings where the prompts they receive can change quickly. Users change, topics change, policies change, and the model may suddenly face a kind of request that was rare in the calibration data. This makes fixed calibration risky. Conformal prediction and conformal risk control give model-agnostic ways to control error, but they work best when the calibration data still look like the future data. This paper develops PromptShift-CRC, a drift-aware conformal risk control method for foundation-model outputs under prompt and domain shift. The method embeds prompts and responses, measures how far the current prompt stream has moved from the calibration pool, gives more weight to relevant or recent calibration examples, and updates the risk level online after observed violations. It reports three practical diagnostics: realized risk error, prompt drift, and effective calibration size. We give conditions under which the method controls risk up to terms for distribution mismatch and weighted quantile uncertainty. In a synthetic prompt-shift benchmark, static conformal risk control fails sharply after drift, while PromptShift-CRC gives the best coverage among the adaptive baselines considered. We then evaluate the same calibration layer on streams derived from public benchmarks for question answering, toxicity, summarization factuality, and long-context hallucination risk.
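
The ingredients listed above can be sketched in a few functions. With unit weights, standard conformal risk control picks the smallest $\lambda$ such that $(\sum_i L_i(\lambda) + B)/(n+1) \le \alpha$ for losses bounded by $B$ and nonincreasing in $\lambda$; the sketch generalizes the sum and count to weights. The weighting (Gaussian similarity in embedding space times exponential recency decay), the Kish effective sample size as "effective calibration size", and the online update in the style of adaptive conformal inference are all assumptions chosen for illustration, not PromptShift-CRC's exact rules.

```python
import numpy as np

def calibration_weights(cal_emb, cal_age, query_emb, bandwidth=1.0, half_life=100.0):
    """Hypothetical weights: embedding similarity to the current prompt times recency decay."""
    d2 = np.sum((cal_emb - query_emb) ** 2, axis=1)
    return np.exp(-d2 / (2 * bandwidth ** 2)) * 0.5 ** (np.asarray(cal_age) / half_life)

def kish_ess(w):
    """Effective calibration size (Kish): (sum w)^2 / sum w^2."""
    w = np.asarray(w, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))

def weighted_crc_lambda(loss_fn, lambdas, weights, alpha, B=1.0, w_test=1.0):
    """loss_fn(lam) -> (n,) calibration losses, nonincreasing in lam and bounded by B.
    Returns the smallest lam whose weighted risk bound is <= alpha."""
    w = np.asarray(weights, dtype=float)
    denom = w.sum() + w_test
    for lam in sorted(lambdas):
        if (np.dot(w, loss_fn(lam)) + w_test * B) / denom <= alpha:
            return lam
    return max(lambdas)   # most conservative value if nothing passes

def update_alpha(alpha_t, target, violated, lr=0.01):
    """Online step: a violation lowers alpha_t (more conservative), a success raises it."""
    return alpha_t + lr * (target - float(violated))
```

Under the Gaussian kernel a query has similarity 1 to itself, so `w_test=1.0` is consistent with the weighting above; shrinking `kish_ess` is a direct signal that few calibration items still resemble the current prompt stream.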

2606.16567 2026-06-16 cs.AI cs.LG cs.SY eess.SY math.DS 交叉投稿

TNODEV: Toolbox for Neural ODE Verification

TNODEV: 神经ODE验证工具箱

Abdelrahman Sayed Sayed, Pierre-Jean Meyer, Mohamed Ghazel

发表机构 * Univ Gustave Eiffel, COSYS-ESTAS(古斯塔夫·埃菲尔大学,COSYS-ESTAS实验室)

AI总结 提出TNODEV,首个集成伪造检查、区间可达性、验证循环和并行调度的神经ODE形式验证器,支持安全集包含和分类鲁棒性验证。

Comments 29 pages, 7 figures, Under review in TMLR

详情
AI中文摘要

神经常微分方程(神经ODE)已开始出现在安全关键场景中,例如网络物理系统的连续时间控制器和集成到自动化决策流水线中的分类器,这引发了对其行为能否被形式化验证的问题。现有的专门用于神经ODE的工具仅提供单次可达性调用,没有迭代输入集细化,将其判定的精度限制在单次可达性调用所能提供的范围内。我们提出了TNODEV,这是首个用于神经ODE的可靠形式验证器,它集成了伪造检查器、基于连续时间混合单调性的快速区间可达性后端、具有三种输入集分裂启发式的验证与细化循环以及并行调度器,构成一个端到端流水线。TNODEV支持纯神经ODE、与神经网络控制器闭环的神经ODE以及通用神经ODE(GNODE)上的安全集包含验证,安全集可指定为区间或由目标分类标签诱导的半空间交集。我们在安全集包含和分类鲁棒性属性的一系列基准上评估了TNODEV,包括与NNV 2.0和CORA的直接可达性比较,以及在MNIST通用神经ODE分类器上与NNV 2.0的验证比较。

英文摘要

Neural ordinary differential equations (neural ODEs) have started to appear in safety-critical settings such as continuous-time controllers for cyber-physical systems and classifiers integrated into automated decision pipelines, raising the question of whether their behavior can be formally verified. Existing tools dedicated to neural ODEs provide only a single reachability call without iterative input-set refinement, limiting the precision of their verdicts to whatever one reachability call can deliver. We present TNODEV, the first sound formal verifier for neural ODEs that integrates a falsification checker, a fast interval-based reachability backend based on continuous-time mixed monotonicity, a verification and refinement loop with three input-set splitting heuristics, and a parallel scheduler in a single end-to-end pipeline. TNODEV supports safe-set inclusion verification on pure neural ODEs, neural ODEs in closed loop with a neural network controller, and general neural ODEs (GNODEs), with the safe set specified either as an interval or as the half-space intersection induced by a target classification label. We evaluate TNODEV on a range of benchmarks across safe-set inclusion and classification-robustness properties, including a direct reachability comparison against NNV 2.0 and CORA and a verification comparison against NNV 2.0 on MNIST general neural ODE classifiers.
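
The falsify, reach, and refine structure can be shown independently of any particular reachability method. In the sketch below, `simulate` and `reach` are supplied by the caller (a real tool would use a sound ODE reachability backend such as mixed monotonicity, which is not implemented here), and refinement uses a single hypothetical heuristic, bisecting the widest input dimension, rather than TNODEV's three heuristics.

```python
import numpy as np

def verify(lo, hi, simulate, reach, safe_lo, safe_hi, max_depth=8):
    """Generic falsify / reach / split loop over the input box [lo, hi].
    simulate(x) -> concrete final state; reach(lo, hi) -> sound (out_lo, out_hi) box.
    Returns "falsified", "verified", or "unknown"."""
    stack = [(np.asarray(lo, float), np.asarray(hi, float), 0)]
    unknown = False
    while stack:
        lo, hi, depth = stack.pop()
        # 1. Falsification: a concrete trajectory leaving the safe set is a counterexample.
        for x in (lo, hi, (lo + hi) / 2):
            y = simulate(x)
            if np.any(y < safe_lo) or np.any(y > safe_hi):
                return "falsified"
        # 2. Reachability: this box is done if its over-approximation is inside the safe set.
        out_lo, out_hi = reach(lo, hi)
        if np.all(out_lo >= safe_lo) and np.all(out_hi <= safe_hi):
            continue
        # 3. Refinement: bisect the widest input dimension, up to a depth budget.
        if depth >= max_depth:
            unknown = True
            continue
        k = int(np.argmax(hi - lo))
        mid = (lo[k] + hi[k]) / 2
        left_hi, right_lo = hi.copy(), lo.copy()
        left_hi[k], right_lo[k] = mid, mid
        stack.append((lo, left_hi, depth + 1))
        stack.append((right_lo, hi, depth + 1))
    return "unknown" if unknown else "verified"
```

A deliberately loose toy shows why refinement matters: for the map $x \mapsto 0.5x$ with a naive interval rule $[l - 0.5h,\; h - 0.5l]$ on $[-1, 1]$, a single reachability call gives $[-1.5, 1.5]$ and cannot certify the safe set $[-0.6, 0.6]$, but after a few bisections every sub-box can.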

2409.01062 2026-06-16 cs.LG cs.CR cs.CV 版本更新

Random Erasing vs. Model Inversion: A Promising Defense or a False Hope?

随机擦除 vs. 模型反演:有希望的防御还是虚假的希望?

Viet-Hung Tran, Ngoc-Bao Nguyen, Son T. Mai, Hans Vandierendonck, Ira Assent, Alex Kot, Ngai-Man Cheung

发表机构 * Temasek Laboratories, Singapore University of Technology and Design(新加坡科技设计大学淡马锡实验室) The Queen’s University Belfast(贝尔法斯特女王大学) Aarhus University(奥胡斯大学) Nanyang Technological University (NTU)(南洋理工大学) VinUniversity, Hanoi, Vietnam(越南河内Vin大学)

AI总结 本文探索随机擦除(RE)作为防御模型反演攻击的方法,通过特征空间分析揭示其有效性,并在37种设置下实现隐私-效用权衡的最优性能。

Comments Accepted in Transactions on Machine Learning Research (TMLR). First two authors contributed equally

详情
AI中文摘要

模型反演(MI)攻击通过从机器学习模型重建私有训练数据,构成重大的隐私威胁。虽然现有防御主要集中于以模型为中心的方法,但数据对MI鲁棒性的影响在很大程度上仍未被探索。在这项工作中,我们探索了随机擦除(RE)这一传统上用于提高遮挡下模型泛化能力的技术,并揭示了其作为MI攻击防御手段的惊人有效性。具体来说,我们新颖的特征空间分析表明,使用RE图像训练的模型在MI重建图像的特征与私有数据的特征之间引入了显著差异。同时,私有图像的特征与其他类别保持可区分,并与不同分类区域良好分离。这些效应共同降低了MI重建质量和攻击准确率,同时保持了合理的自然准确率。此外,我们探索了RE的两个关键属性:部分擦除和随机位置。部分擦除使模型在训练期间无法观察到完整对象。我们发现这对旨在重建完整对象的MI有显著影响。擦除的随机位置在实现强隐私-效用权衡中起着关键作用。我们的发现凸显了RE作为一种简单而有效的防御机制,可以轻松与现有隐私保护技术集成。在37种设置上的广泛实验表明,我们的方法在隐私-效用权衡中达到了最先进的性能。结果一致证明了我们的防御在不同MI攻击、网络架构和攻击配置下优于现有方法。我们首次在某些配置下实现了攻击准确率的显著下降而不降低效用。

英文摘要

Model Inversion (MI) attacks pose a significant privacy threat by reconstructing private training data from machine learning models. While existing defenses primarily concentrate on model-centric approaches, the impact of data on MI robustness remains largely unexplored. In this work, we explore Random Erasing (RE), a technique traditionally used for improving model generalization under occlusion, and uncover its surprising effectiveness as a defense against MI attacks. Specifically, our novel feature space analysis shows that models trained with RE images introduce a significant discrepancy between the features of MI-reconstructed images and those of the private data. At the same time, features of private images remain distinct from other classes and well-separated from different classification regions. These effects collectively degrade MI reconstruction quality and attack accuracy while maintaining reasonable natural accuracy. Furthermore, we explore two critical properties of RE: Partial Erasure and Random Location. Partial Erasure prevents the model from observing entire objects during training. We find this has a significant impact on MI, which aims to reconstruct the entire objects. Random Location of erasure plays a crucial role in achieving a strong privacy-utility trade-off. Our findings highlight RE as a simple yet effective defense mechanism that can be easily integrated with existing privacy-preserving techniques. Extensive experiments across 37 setups demonstrate that our method achieves state-of-the-art (SOTA) performance in the privacy-utility trade-off. The results consistently demonstrate the superiority of our defense over existing methods across different MI attacks, network architectures, and attack configurations. For the first time, we achieve a significant degradation in attack accuracy without a decrease in utility for some configurations.
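
Both properties discussed above are visible in a plain implementation of Random Erasing: the rectangle covers only a sampled fraction of the image (partial erasure), and its corner is drawn uniformly wherever it fits (random location). The sketch below is a generic NumPy version; its default ranges mirror those of torchvision's `RandomErasing` (`scale=(0.02, 0.33)`, `ratio=(0.3, 3.3)`), which may differ from the settings used in the paper, and it assumes a float image in $[0, 1]$ filled with random values.

```python
import numpy as np

def random_erase(img, rng, p=0.5, area=(0.02, 0.33), aspect=(0.3, 3.3), max_tries=10):
    """Erase one random rectangle of a float image (H, W[, C]) with random values.
    Partial erasure: the rectangle covers a fraction `area` of the image.
    Random location: its top-left corner is uniform over positions where it fits."""
    if rng.random() >= p:
        return img
    h, w = img.shape[:2]
    out = img.copy()
    for _ in range(max_tries):
        target = rng.uniform(*area) * h * w
        ratio = np.exp(rng.uniform(np.log(aspect[0]), np.log(aspect[1])))  # log-uniform aspect
        eh = int(round(np.sqrt(target * ratio)))
        ew = int(round(np.sqrt(target / ratio)))
        if 0 < eh < h and 0 < ew < w:
            top = rng.integers(0, h - eh + 1)
            left = rng.integers(0, w - ew + 1)
            out[top:top + eh, left:left + ew] = rng.random((eh, ew) + img.shape[2:])
            return out
    return out   # no valid rectangle found: leave the image unchanged
```

Because the erased patch never spans the whole image and moves from sample to sample, the model rarely sees an object in full during training, which is the mechanism the abstract links to degraded full-object reconstruction.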

2510.24043 2026-06-16 cs.LG stat.ML 版本更新

Localized Kernel Projection Outlyingness: A Two-Stage Approach for Multi-Modal Outlier Detection

局部核投影离群度:一种用于多模态离群检测的两阶段方法

Akira Tamamori

发表机构 * Department of Computer Science, Aichi Institute of Technology(爱知工业大学计算机科学系)

AI总结 提出两阶段LKPLO框架,结合自适应损失函数、全局核PCA和局部聚类,解决多模态离群检测问题,在10个基准数据集上取得最优性能。

Comments 12 pages, 5 figures; accepted by The IEICE Transactions on Information and Systems

AI中文摘要

本文提出两阶段LKPLO,一种新颖的多阶段离群检测框架,克服了传统基于投影的方法同时存在的局限性:它们依赖于固定的统计度量并假设单一数据结构。我们的框架独特地综合了三个关键概念:(1) 一种基于广义损失的离群度度量(PLO),用灵活的自适应损失函数(如我们提出的SVM类损失)替代固定度量;(2) 一个全局核PCA阶段,用于线性化非线性数据结构;(3) 一个后续的局部聚类阶段,用于处理多模态分布。在10个基准数据集上进行的全面5折交叉验证实验,结合自动超参数优化,表明两阶段LKPLO达到了最先进的性能。在现有方法失败且具有挑战性结构的数据集上,尤其是在多簇数据(Optdigits)和复杂高维数据(Arrhythmia)上,它显著优于强基线。此外,消融研究实证证实,核化和局部化阶段的协同组合对其优越性能不可或缺。这项工作为重要类别的离群检测问题贡献了一个强大的新工具,并强调了混合多阶段架构的重要性。

英文摘要

This paper presents Two-Stage LKPLO, a novel multi-stage outlier detection framework that overcomes the coexisting limitations of conventional projection-based methods: their reliance on a fixed statistical metric and their assumption of a single data structure. Our framework uniquely synthesizes three key concepts: (1) a generalized loss-based outlyingness measure (PLO) that replaces the fixed metric with flexible, adaptive loss functions like our proposed SVM-like loss; (2) a global kernel PCA stage to linearize non-linear data structures; and (3) a subsequent local clustering stage to handle multi-modal distributions. Comprehensive 5-fold cross-validation experiments on 10 benchmark datasets, with automated hyperparameter optimization, demonstrate that Two-Stage LKPLO achieves state-of-the-art performance. It significantly outperforms strong baselines on datasets with challenging structures where existing methods fail, most notably on multi-cluster data (Optdigits) and complex, high-dimensional data (Arrhythmia). Furthermore, an ablation study empirically confirms that the synergistic combination of both the kernelization and localization stages is indispensable for its superior performance. This work contributes a powerful new tool for a significant class of outlier detection problems and underscores the importance of hybrid, multi-stage architectures.
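The two-stage idea (global kernel PCA, then local clustering, then per-cluster projection outlyingness) can be sketched in NumPy as follows. This is a hypothetical illustration, not the paper's implementation. It uses the classical median/MAD projection outlyingness over random directions, which is exactly the kind of fixed statistical metric that the paper's loss-based PLO (with its SVM-like loss) is designed to replace. The RBF kernel, the plain k-means step, and the rule that treats clusters with two or fewer points as anomalous are all assumptions.

```python
import numpy as np


def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Project training points onto the top kernel principal components."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one  # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = np.clip(vals[idx], 1e-12, None), vecs[:, idx]
    return vecs * np.sqrt(vals)


def kmeans(Z, k, n_iter=50, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    C = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([Z[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels


def projection_outlyingness(Z, n_dirs=200, rng=None):
    """Stahel-Donoho style score: max over directions of |proj - median| / MAD."""
    rng = np.random.default_rng(1) if rng is None else rng
    U = rng.normal(size=(n_dirs, Z.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    P = Z @ U.T  # (n_points, n_dirs) projected values
    med = np.median(P, axis=0)
    mad = np.median(np.abs(P - med), axis=0) + 1e-12
    return np.max(np.abs(P - med) / mad, axis=1)


def two_stage_scores(X, k=2, n_components=2, gamma=1.0):
    Z = rbf_kernel_pca(X, n_components, gamma)  # stage 1: global kernel PCA
    labels = kmeans(Z, k)                         # stage 2: local clusters
    scores = np.zeros(len(X))
    for j in range(k):
        m = labels == j
        # Heuristic for this sketch: tiny clusters are flagged as anomalous.
        scores[m] = projection_outlyingness(Z[m]) if m.sum() > 2 else np.inf
    return scores
```

Scoring each point only against its own cluster is what lets a multi-modal dataset avoid having every point in a minority mode look like an outlier relative to the global median.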

2604.17805 2026-06-16 cs.LG cs.AI cs.GT 版本更新

Ranking Abuse via Strategic Pairwise Data Perturbations

通过策略性成对数据扰动进行排名滥用

Junyi Yao, Zihao Zheng, Jiayu Long

发表机构 * Computational Decision Systems(计算决策系统)

AI总结 研究基于最大似然估计的成对排名系统对策略性数据扰动的脆弱性,提出自适应子集选择攻击(ASSA)方法,实验表明少量扰动即可显著改变全局排名。

AI中文摘要

基于最大似然估计(MLE)的成对排名系统,如Bradley-Terry模型,被广泛用于从成对比较中聚合偏好。然而,它们在策略性数据操纵下的鲁棒性仍未被充分理解。在本文中,我们研究了基于MLE的排名系统对对抗性扰动的脆弱性。我们将操纵任务形式化为一个受约束的组合优化问题,并提出了一种自适应子集选择攻击(ASSA)来高效识别高影响力的扰动。在合成数据和真实世界选举数据集上的实验结果表明,基于MLE的排名表现出尖锐的相变行为:在超过一个小的扰动预算后,有限数量的策略性投票者可以显著改变全局排名。特别是,我们的方法在受约束的预算下始终优于随机和贪婪基线。这些发现揭示了基于MLE的排名机制对结构化扰动的基本敏感性,并强调了在集体决策系统中需要更鲁棒的聚合方法。

英文摘要

Pairwise ranking systems based on Maximum Likelihood Estimation (MLE), such as the Bradley-Terry model, are widely used to aggregate preferences from pairwise comparisons. However, their robustness under strategic data manipulation remains insufficiently understood. In this paper, we study the vulnerability of MLE-based ranking systems to adversarial perturbations. We formulate the manipulation task as a constrained combinatorial optimization problem and propose an Adaptive Subset Selection Attack (ASSA) to efficiently identify high-impact perturbations. Experimental results on both synthetic data and real-world election datasets show that MLE-based rankings exhibit a sharp phase-transition behavior: beyond a small perturbation budget, a limited number of strategic voters can significantly alter the global ranking. In particular, our method consistently outperforms random and greedy baselines under constrained budgets. These findings reveal a fundamental sensitivity of MLE-based ranking mechanisms to structured perturbations and highlight the need for more robust aggregation methods in collective decision-making systems.
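To make the sensitivity concrete, here is a small pure-Python sketch of Bradley-Terry maximum-likelihood fitting using the classical minorization-maximization (MM) update. It does not implement ASSA. The toy example is an assumption chosen so the effect is easy to verify: with a balanced design (every pair compared equally often), the MLE ranking follows total wins, so flipping just 3 of 30 comparisons reverses the full ranking. The MLE is assumed to exist, which requires every item to have at least one win and one loss in a connected comparison graph.

```python
def bradley_terry(wins, n_iter=500):
    """MM iterations for Bradley-Terry strengths.

    wins[i][j] is the number of times item i beat item j.
    Strengths are rescaled to sum to n (the model is scale-invariant).
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new = []
        for i in range(n):
            w_i = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new.append(w_i / denom if denom > 0 else p[i])
        s = sum(new)
        p = [x * n / s for x in new]
    return p


def ranking(wins):
    """Item indices ordered from strongest to weakest."""
    p = bradley_terry(wins)
    return sorted(range(len(p)), key=lambda i: -p[i])


base = [[0, 6, 6], [4, 0, 6], [4, 4, 0]]     # 10 comparisons per pair
flipped = [[0, 6, 3], [4, 0, 6], [7, 4, 0]]  # 3 votes flipped from item 0 to item 2
print(ranking(base), ranking(flipped))
```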

2605.00924 2026-06-16 cs.LG cs.AI 版本更新

StyleShield: Exposing the Fragility of AIGC Detectors through Continuous Controllable Style Transfer

StyleShield: 通过连续可控风格迁移揭示AIGC检测器的脆弱性

Guantian Zheng

发表机构 * National University of Singapore(新加坡国立大学)

AI总结 提出StyleShield,一种基于流匹配的条件文本风格迁移框架,通过连续控制风格迁移强度,在保持语义相似度的同时实现高规避率,并引入RateAudit算法质疑基于分数的检测可靠性。

Comments 12 pages, 5 figures. Code and model weights will be released upon acceptance

AI中文摘要

AI生成内容(AIGC)检测器越来越多地部署在学术诚信筛查等高风险场景中,然而其可靠性依赖于一个基本悖论:随着语言模型在人类编写的语料库上训练,AI与人类写作之间的统计边界将不可避免地消失。商业激励进一步扭曲了这一格局——检测服务和“去AI化”工具通常在同一供应链中运作,用对内容来源的判断取代了对内容质量的评估。我们提出了StyleShield,这是第一个用于条件文本风格迁移的流匹配框架,通过DiT骨干网络和零初始化交叉注意力适配器,直接在连续token嵌入空间中操作,并以冻结的Qwen-7B表示为条件。在推理时,我们将图像合成中的SDEdit范式适配到文本嵌入,通过单个参数gamma提供对规避-保留权衡的平滑连续控制。在一个多领域中文基准测试中,StyleShield对训练检测器实现了94.6%的规避率,对三个未见检测器实现了≥99%的规避率,同时保持了0.928的语义相似度。我们进一步引入了RateAudit,一种文档级调度算法,证明检测率判定可以设置为任意值,直接质疑了基于分数评估的可靠性。

英文摘要

AI-generated content (AIGC) detectors are increasingly deployed in high-stakes settings such as academic integrity screening, yet their reliability rests on a fundamental paradox: because language models are trained on human-written corpora, the statistical boundary between AI and human writing will inevitably dissolve as models improve. Commercial incentives have further distorted this landscape: detection services and "de-AIification" tools often operate within the same supply chain, replacing evaluation of content quality with judgment of content origin. We present StyleShield, the first flow matching framework for conditional text style transfer, operating directly in continuous token embedding space via a DiT backbone with zero-initialized cross-attention adapters conditioned on frozen Qwen-7B representations. At inference, we adapt the SDEdit paradigm from image synthesis to text embeddings, with a single parameter gamma providing smooth continuous control over the evasion-preservation trade-off. On a multi-domain Chinese benchmark, StyleShield achieves 94.6% evasion against the training detector and >=99% against three unseen detectors, while maintaining 0.928 semantic similarity. We further introduce RateAudit, a document-level scheduling algorithm demonstrating that detection-rate verdicts can be set to arbitrary values, directly questioning the reliability of score-based evaluation.

2606.11474 2026-06-16 cs.LG cs.SY eess.SY physics.acc-ph 版本更新

Mahalanobis-Guided Latent OOD Detection for Hybrid ES-DRL Control in Time-Varying Systems

基于马氏距离的潜在分布外检测用于时变系统中混合ES-DRL控制

Shaifalee Saxena, Alexander Scheinker

AI总结 针对时变系统中强化学习控制器性能下降问题,提出基于变分自编码器潜在空间马氏距离的分布外检测方法,实现与极值搜索控制器的自适应切换,并在粒子加速器控制中验证有效性。

AI中文摘要

本文研究了非线性时变系统中基于马氏距离的潜在分布外(OOD)检测,用于测试时RL控制器切换。RL控制器可以在训练分布内快速控制高维系统,但当时间变化动力学产生未见过的观测时,其性能可能下降。我们考虑一个组合的ES-DRL控制器,其中RL提供快速的分布内动作,而有界极值搜索(ES)在OOD操作下提供鲁棒的模型无关控制。关键挑战在于决定何时切换。我们在分布内束流剖面观测上训练变分自编码器(VAE),并使用VAE潜在空间中的马氏距离在测试时检测OOD束流剖面。此OOD决策设置一个二元开关,选择RL控制器或ES控制器。我们在安全关键的粒子加速器控制中评估该方法。在此设置中,空间磁体运动产生RL训练期间未见过的OOD束流剖面。VAE潜在空间的可视化表明,所提方法识别出此OOD场景,并为组合控制器中RL和ES之间的切换提供可解释信号。

英文摘要

In this paper, we study Mahalanobis-guided latent out-of-distribution (OOD) detection for test-time RL controller switching in nonlinear time-varying systems. RL controllers can quickly control high-dimensional systems within the training distribution, but their performance can degrade when time-varying dynamics produce unseen observations. We consider a combined ES-DRL controller, where RL provides fast in-distribution actions and bounded extremum seeking (ES) provides robust model-independent control under OOD operation. The key challenge is deciding when to switch. We train a variational autoencoder (VAE) on in-distribution beam-profile observations and use Mahalanobis distance in the VAE latent space to detect OOD beam profiles at test time. This OOD decision sets a binary switch that selects either the RL controller or the ES controller. We evaluate the approach in safety-critical particle accelerator control. In this setting, spatial magnet motion creates OOD beam profiles that were not seen during RL training. Visualization of the VAE latent space shows that the proposed method identifies this OOD scenario and provides an interpretable signal for switching between RL and ES in the combined controller.
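The switching rule can be sketched as follows, assuming the in-distribution latent codes (for example, VAE encoder means) are already available as a NumPy array. The class name, the covariance regularizer `eps`, and the choice of the empirical 99th percentile of training distances as the threshold are hypothetical; the abstract does not specify how the threshold is set.

```python
import numpy as np


class MahalanobisSwitch:
    """Fit on in-distribution latent codes; route each new code to RL or ES."""

    def __init__(self, latents, quantile=0.99, eps=1e-6):
        self.mu = latents.mean(axis=0)
        cov = np.cov(latents, rowvar=False) + eps * np.eye(latents.shape[1])
        self.prec = np.linalg.inv(cov)  # precision matrix
        # Threshold: empirical quantile of in-distribution distances (assumption).
        self.threshold = np.quantile(self.distance(latents), quantile)

    def distance(self, z):
        """Mahalanobis distance of each row of z to the training mean."""
        diff = np.atleast_2d(z) - self.mu
        return np.sqrt(np.einsum("ij,jk,ik->i", diff, self.prec, diff))

    def select(self, z):
        """Binary switch: ES for OOD latent codes, RL otherwise."""
        return "ES" if self.distance(z)[0] > self.threshold else "RL"
```

Because the decision reduces to a single scalar distance compared with one threshold, the same quantity can be plotted over time as the interpretable switching signal the abstract describes.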

2501.01908 2026-06-16 cs.CV cs.LG eess.IV physics.med-ph 版本更新

Training-Free Adversarial Robustness in Computational MRI

计算MRI中无需训练的抗对抗鲁棒性

Mahdi Saberi, Chi Zhang, Mehmet Akçakaya

发表机构 * arXiv

AI总结 提出一种无需重训练即可缓解MRI重建模型对抗攻击的方法,基于循环测量一致性在攻击输入的小邻域内最小化目标函数,显著降低对抗扰动影响。

Comments International Conference on Machine Learning (ICML), 2026

AI中文摘要

深度学习方法已成为重建欠采样磁共振成像数据的最先进技术。然而,研究表明这些方法易受小的对抗输入扰动影响,导致输出图像出现严重失真。已有多种策略被提出以减少这些攻击的影响,但它们需要重新训练。在这项工作中,我们提出了一种新颖的方法,无需任何重训练即可缓解MRI重建模型上的对抗攻击。基于循环测量一致性的思想,我们设计了一个新颖的缓解目标,在攻击输入周围的小球内最小化该目标。结果表明,我们的方法在不同数据集、攻击类型/强度以及物理驱动深度学习(PD-DL)网络上显著降低了对抗扰动的影响,并在定性和定量上优于传统的缓解方法。我们还引入了一个实际相关的小对抗扰动场景,该场景模拟原始数据中的脉冲噪声(与人字形伪影相关),并展示了我们的方法在此设置中的适用性。最后,我们展示了我们的缓解方法在两种现实扩展场景中仍然有效:盲设置(用户不知道攻击强度或算法)和自适应攻击设置(攻击者完全了解防御策略)。

英文摘要

Deep learning (DL) methods have become the state-of-the-art for reconstructing sub-sampled magnetic resonance imaging (MRI) data. However, studies have shown that these methods are susceptible to small adversarial input perturbations, resulting in major distortions in the output images. Various strategies have been proposed to reduce the effects of these attacks, but they require retraining. In this work, we propose a novel approach for mitigating adversarial attacks on MRI reconstruction models without any retraining. Based on the idea of cyclic measurement consistency, we devise a novel mitigation objective that is minimized in a small ball around the attack input. Results show that our method substantially reduces the impact of adversarial perturbations across different datasets, attack types/strengths and physics-driven DL (PD-DL) networks, and qualitatively and quantitatively outperforms conventional mitigation methods. We also introduce a practically relevant scenario for small adversarial perturbations that models impulse noise in raw data, which relates to herringbone artifacts, and show the applicability of our approach in this setting. Finally, we show our mitigation approach remains effective in two realistic extension scenarios: a blind setup, where the attack strength or algorithm is not known to the user; and an adaptive attack setup, where the attacker has full knowledge of the defense strategy.

2502.12445 2026-06-16 cs.AI cs.LG stat.ML 版本更新

Computational Safety for Generative AI: A Hypothesis Testing Perspective

生成式AI的计算安全性:假设检验视角

Pin-Yu Chen

发表机构 * IBM Research(IBM研究院)

AI总结 本文从假设检验角度形式化生成式AI的计算安全性,提出基于信号处理的方法检测恶意输入和AI生成内容。

Comments Extended version of the paper presented at the ICML 2026 Workshop on Hypothesis Testing

AI中文摘要

AI安全是一个快速发展的研究领域,旨在防止前沿AI技术的危害和滥用,特别是针对能够通过文本提示创建逼真高质量内容的生成式AI(GenAI)工具。此类工具的例子包括大型语言模型(LLM)和文本到图像(T2I)扩散模型。由于相似的训练数据源和神经网络架构设计,各种领先GenAI模型的性能趋于饱和,因此开发可靠的安全护栏已成为责任和可持续性的关键差异化因素。本文提出了计算安全性概念的形式化,这是一个数学框架,通过信号处理理论和方法的视角,能够对GenAI中的安全挑战进行定量评估、表述和研究。特别是,我们探讨了GenAI中两类可表述为假设检验问题的计算安全挑战。对于模型输入的安全性,我们展示了如何使用敏感性分析和损失景观分析来检测带有越狱尝试的恶意提示。对于模型输出的安全性,我们阐明了如何使用统计信号处理来检测AI生成的内容。最后,我们讨论了关键的开放研究挑战、机遇以及信号处理在计算AI安全中的重要作用。

英文摘要

AI safety is a rapidly growing area of research that seeks to prevent the harm and misuse of frontier AI technology, particularly with respect to generative AI (GenAI) tools that are capable of creating realistic and high-quality content through text prompts. Examples of such tools include large language models (LLMs) and text-to-image (T2I) diffusion models. As the performance of various leading GenAI models approaches saturation due to similar training data sources and neural network architecture designs, the development of reliable safety guardrails has become a key differentiator for responsibility and sustainability. This paper presents a formalization of the concept of computational safety, which is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI through the lens of signal processing theory and methods. In particular, we explore two exemplary categories of computational safety challenges in GenAI that can be formulated as hypothesis testing problems. For the safety of model input, we show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts. For the safety of model output, we elucidate how statistical signal processing can be used to detect AI-generated content. Finally, we discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.

2505.03201 2026-06-16 stat.ML cs.LG 版本更新

Enhancing Visual Feature Attribution via Weighted Integrated Gradients

通过加权积分梯度增强视觉特征归因

Kien Tran Duc Tuan, Tam Nguyen Trong, Son Nguyen Hoang, Khoat Than, Anh Nguyen Duc

发表机构 * Institute of Information and Communication Technology, Vietnam Academy of Science and Technology(越南科学技术院信息与通信技术研究所)

AI总结 针对积分梯度方法对基线选择敏感的问题,提出加权积分梯度,通过无监督准则自适应选择和加权基线,在保持公理性质的同时提升归因可靠性,实验显示在卷积和Transformer架构上最高提升36%。

AI中文摘要

积分梯度(IG)是可解释AI中广泛使用的归因方法,尤其在需要可靠特征归因的计算机视觉应用中。IG的一个关键限制是其对基线(参考)图像选择的敏感性。多基线扩展如期望梯度(EG)假设基线均匀加权,隐含地认为所有基线图像信息量相等。在高维视觉模型中,这一假设常导致噪声或不稳定的解释。本文提出加权积分梯度(WG),一种通过评估和加权基线来增强归因可靠性的原则性方法。WG引入了一个无监督的基线适用性标准,实现了基于每个输入的自适应基线选择和加权。该方法在广义加权基线形式下保留了IG的核心公理性质。在预期的、基于代理的适应度-相关性单调性假设下,WG为更信息丰富的基线分配更大权重提供了概率依据。在常用图像数据集和模型上的实验表明,在我们的协议下,WG优于EG,在评估的卷积和Transformer架构上最高提升36%。这些提升伴随着额外的适应度评估成本,因此WG应被视为归因保真度的权衡,而非EG的更快替代方案。通过超越所有基线贡献相等的假设,加权积分梯度为解释计算机视觉模型提供了更清晰、更可靠的方法,提高了可解释AI的理解和实际可用性。

英文摘要

Integrated Gradients (IG) is a widely used attribution method in explainable AI, particularly in computer vision applications where reliable feature attribution is essential. A key limitation of IG is its sensitivity to the choice of baseline (reference) images. Multi-baseline extensions such as Expected Gradients (EG) assume uniform weighting over baselines, implicitly treating all baseline images as equally informative. In high-dimensional vision models, this assumption often leads to noisy or unstable explanations. This paper proposes Weighted Integrated Gradients (WG), a principled approach that evaluates and weights baselines to enhance attribution reliability. WG introduces an unsupervised criterion for baseline suitability, enabling adaptive selection and weighting of baselines on a per-input basis. The method preserves the core axiomatic properties of IG in a generalized weighted-baseline form. Under an expected, proxy-based fitness-relevance monotonicity assumption, WG provides a probabilistic justification for assigning larger weights to more informative baselines. Experiments on commonly used image datasets and models show that WG improves over EG under our protocol, with up to 36% gains across evaluated convolutional and Transformer architectures. These gains come with additional fitness-evaluation cost, so WG should be viewed as an attribution-fidelity trade-off rather than a faster alternative to EG. By moving beyond the assumption that all baselines contribute equally, Weighted Integrated Gradients offers a clearer and more reliable approach to explaining computer-vision models, improving both understanding and practical usability in explainable AI.
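A minimal NumPy sketch of the weighted multi-baseline form is shown below. The per-baseline weights are passed in directly, because WG's unsupervised fitness criterion is not specified in the abstract; uniform weights reduce the sketch to a plain average over baselines, in the spirit of EG. The sketch does illustrate the axiomatic property the abstract mentions. Each IG term satisfies completeness, sum(attr) = f(x) - f(b), so a convex combination of baselines gives sum(attr) = f(x) - Σ_k w_k f(b_k). The quadratic toy model is an assumption chosen so that the midpoint Riemann sum is exact.

```python
import numpy as np


def integrated_gradients(grad_f, x, baseline, steps=64):
    """IG with a midpoint Riemann sum along the straight path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


def weighted_ig(grad_f, x, baselines, weights, steps=64):
    """Convex combination of per-baseline IG attributions."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights form a convex combination
    return sum(wk * integrated_gradients(grad_f, x, b, steps)
               for wk, b in zip(w, baselines))


# Toy model f(v) = sum(v**2) with gradient 2v (illustrative only).
f = lambda v: float(np.sum(v ** 2))
grad = lambda v: 2 * v
x = np.array([1.0, -2.0, 3.0])
baselines = [np.zeros(3), np.ones(3), np.array([0.5, 0.0, -1.0])]
attr = weighted_ig(grad, x, baselines, weights=[0.2, 0.5, 0.3])
```

In an actual vision model, `grad_f` would be the gradient of the target-class logit with respect to the input pixels, and each extra baseline multiplies the cost by `steps` gradient evaluations, which is the overhead the abstract acknowledges.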

2603.10562 2026-06-16 math.OC cs.LG cs.SY eess.SY 版本更新

Quantization Robustness of Monotone Operator Equilibrium Networks

单调算子均衡网络的量化鲁棒性

James Li, Philip H. W. Leong, Thomas Chaffey

发表机构 * School of Electrical and Computer Engineering, The University of Sydney(悉尼大学电气与计算机工程学院)

AI总结 分析单调算子均衡网络在低精度硬件部署时权重量化对收敛性和均衡解的影响,提出基于谱扰动和单调性边界的理论保证,并通过MNIST实验验证了量化精度与收敛性的相变关系。

Comments 6 pages, 4 figures. Accepted for publication in IEEE Control Systems Letters (L-CSS)

AI中文摘要

单调算子均衡网络是隐式层模型,其输出是单调算子的唯一均衡点,保证了存在性、唯一性和收敛性。当部署在低精度硬件上时,权重被量化,可能破坏这些保证。我们将权重量化分析为底层单调包含的谱扰动。当谱范数权重扰动小于单调性边界时,量化求解器的收敛性得到保证;量化与全精度均衡之间的位移由扰动大小和边界界定;一个条件数(算子范数与边界的比值)将量化精度与前向误差联系起来。MNIST实验在预测阈值处确认了相变:三位和四位后训练量化发散,而五位及以上收敛。反向传播保证使得量化感知训练成为可能,在四位时恢复了可证明的收敛性。

英文摘要

Monotone operator equilibrium networks are implicit-layer models whose output is the unique equilibrium of a monotone operator, guaranteeing existence, uniqueness, and convergence. When deployed on low-precision hardware, weights are quantized, potentially destroying these guarantees. We analyze weight quantization as a spectral perturbation of the underlying monotone inclusion. Convergence of the quantized solver is guaranteed whenever the spectral-norm weight perturbation is smaller than the monotonicity margin; the displacement between quantized and full-precision equilibria is bounded in terms of the perturbation size and margin; and a condition number characterizing the ratio of the operator norm to the margin links quantization precision to forward error. MNIST experiments confirm a phase transition at the predicted threshold: three- and four-bit post-training quantization diverge, while five-bit and above converge. The backward-pass guarantee enables quantization-aware training, which recovers provable convergence at four bits.
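The margin argument can be checked numerically with the sketch below. It assumes the standard monDEQ parametrization W = (1 - m) I - AᵀA + B - Bᵀ, for which the symmetric part of I - W is m I + AᵀA ⪰ m I, and a simple round-to-nearest quantizer with step size `step`, which is a stand-in for whatever bit-width scheme the paper uses. Writing E = W_q - W, the symmetric part of I - W_q is bounded below by (margin(W) - ‖E‖₂) I, so the quantized operator stays strongly monotone whenever ‖E‖₂ < m. With L = ‖I - W_q‖₂, forward-backward iteration with step size α = margin/L² is then a contraction.

```python
import numpy as np


def monotone_W(A, B, m):
    """W = (1 - m) I - A^T A + B - B^T, so sym(I - W) = m I + A^T A >= m I."""
    n = A.shape[1]
    return (1 - m) * np.eye(n) - A.T @ A + B - B.T


def margin(W):
    """Smallest eigenvalue of the symmetric part of I - W."""
    S = np.eye(len(W)) - 0.5 * (W + W.T)
    return float(np.linalg.eigvalsh(S).min())


def quantize(W, step):
    """Uniform round-to-nearest quantization on a grid of spacing `step`."""
    return np.round(W / step) * step


def solve(W, U, b, x, alpha, tol=1e-8, max_iter=10_000):
    """Forward-backward iteration z <- ReLU((1 - alpha) z + alpha (W z + U x + b))."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_new = np.maximum(0.0, (1 - alpha) * z + alpha * (W @ z + U @ x + b))
        if np.linalg.norm(z_new - z) < tol:
            return z_new, True
        z = z_new
    return z, False
```

With a much coarser `step`, ‖E‖₂ can exceed m. The guarantee then no longer applies, although the solver may or may not still converge in practice, which is consistent with the sharp threshold behavior the abstract reports.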

2605.03297 2026-06-16 cs.SD cs.LG 版本更新

Contrastive Regularization for Accent-Robust ASR

对比正则化用于口音鲁棒的ASR

Van-Phat Thai, Aradhya Dhruv, Duc-Thinh Pham, Sameer Alam

发表机构 * Air Traffic Management Research Institute, Nanyang Technological University, Singapore(新加坡南洋理工大学航空交通管理研究所) Center of AI Research, VinUniversity, Vietnam(越南Vin大学人工智能研究中心)

AI总结 提出使用监督对比学习作为轻量级口音不变辅助目标,在CTC微调中正则化编码器表示,无需架构修改或显式口音监督,在L2-ARCTIC基准上实现高达25-29%的未见口音词错误率降低。

Comments Accepted by Interspeech 2026

AI中文摘要

基于自监督声学预训练和CTC微调的ASR系统在母语语音上表现强劲,但对口音变化仍然敏感。我们研究监督对比学习(SupCon)作为CTC微调的轻量级、口音不变辅助目标。一个话语级对比损失正则化编码器表示,无需架构修改或显式口音监督。在L2-ARCTIC基准上的实验表明,多个预训练编码器均实现一致的WER降低,在未见口音评估下相对降低高达25-29%。使用转录内余弦离散度分析表明,SupCon在口音变化下促进更紧凑和稳定的表示几何结构。总体而言,SupCon提供了一种有效且模型无关的正则化策略,用于提高口音鲁棒性。

英文摘要

ASR systems based on self-supervised acoustic pretraining and CTC fine-tuning achieve strong performance on native speech but remain sensitive to accent variability. We investigate supervised contrastive learning (SupCon) as a lightweight, accent-invariant auxiliary objective for CTC fine-tuning. An utterance-level contrastive loss regularizes encoder representations without architectural modification or explicit accent supervision. Experiments on the L2-ARCTIC benchmark show consistent WER reductions across multiple pretrained encoders, with relative reductions of up to 25-29% under unseen-accent evaluation. Analysis using within-transcript cosine dispersion indicates that SupCon promotes more compact and stable representation geometry under accent variability. Overall, SupCon provides an effective and model-agnostic regularization strategy for improving accent robustness.
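A minimal NumPy version of the supervised contrastive loss on pooled utterance embeddings is sketched below. It follows the standard SupCon formulation: for each anchor, the mean log-probability of its positives under a softmax over all other samples in the batch. What counts as a positive in this paper is an assumption here. Using transcript IDs as labels (the same sentence read by different speakers) would be consistent with the within-transcript dispersion analysis, but the abstract does not confirm it. In training, the loss would be added to CTC as L_CTC + λ·L_SupCon with a hypothetical weight λ.

```python
import numpy as np


def supcon_loss(emb, labels, tau=0.1):
    """Supervised contrastive loss over a batch of utterance embeddings."""
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize
    sim = z @ z.T / tau
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Numerically stable log-sum-exp over all other samples (self excluded).
    sim_masked = np.where(not_self, sim, -np.inf)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim_masked - row_max).sum(axis=1))
    log_prob = sim - log_denom[:, None]
    labels = np.asarray(labels)
    pos = (labels[:, None] == labels[None, :]) & not_self
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0  # anchors without any positive are skipped
    mean_log_prob_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())
```

The loss is small when same-label utterances point in nearly the same direction, which is the "compact" within-transcript geometry the analysis measures.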

2605.30837 2026-06-16 cs.CR cs.LG 版本更新

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

先派侦察兵:提示注入防御中自适应检测器分配的预推理方法

Shuhao Zhang, Jiarui Li, Qi Cao, Ruiyi Zhang, Pengtao Xie

发表机构 * UC San Diego(加州大学圣迭戈分校) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校)

AI总结 针对提示注入检测器异构且不可靠的问题,提出SCOUT框架,通过预测每个检测器对每个样本的可靠性和延迟,动态分配检测器,实现安全性与效率的权衡。

Comments We propose SCOUT, a detector allocation framework that predicts each detector's accuracy and latency on a given input before running it, letting operators control the safety-utility trade-off with a single threshold and route to an LLM judge only when needed

AI中文摘要

提示注入检测器是异构的:每个检测器在不同攻击切片上表现强劲,但没有一个始终可靠。然而现有系统仍将检测视为固定的单检测器流水线,将每个请求提交给一个检测器的盲点。我们将防御重新定义为检测器分配:给定一个异构池,决定每个请求运行哪些检测器以及是否升级到LLM法官。我们的框架SCOUT(可扩展且可控的结果预测用于不确定性感知分诊)通过预测每个检测器在类似历史输入上的样本级可靠性和延迟,使这一决策动态化,并向操作员暴露一个单一的安全-效用阈值(其中效用包含良性通过率和挂钟时间)。为了评估这一设置,我们构建了SCOUT-450基准,该基准捕捉了旧提示注入集未充分代表的、结构复杂的面向代理的注入。在SCOUT-450上,相对于始终开启的GPT-4o法官,安全导向的工作点将攻击成功率降低46%,总挂钟时间降低40%,同时良性效用下降5.1个百分点。SCOUT还迁移到三个外部基准(BIPIA、IPI和IHEval),改善了安全-效用前沿。

英文摘要

Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector's blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector's per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.
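The allocation idea (predict each detector's reliability from how it behaved on similar past inputs, then compare against a single operator threshold) can be illustrated with a hypothetical k-nearest-neighbor allocator. This is not SCOUT's actual predictor. The function name, the kNN estimate, the policy of choosing the cheapest detector whose predicted reliability clears `tau`, and the fallback to an LLM judge are all assumptions made for the sketch.

```python
import numpy as np


def allocate(query_emb, hist_emb, hist_correct, latency, tau, k=5):
    """Hypothetical kNN-based detector allocation.

    hist_emb:     (N, d) embeddings of past requests
    hist_correct: (N, D) 1 if detector j was correct on past request i
    latency:      (D,) mean latency of each detector
    Returns the index of the chosen detector, or "judge" to escalate.
    """
    d2 = ((hist_emb - query_emb) ** 2).sum(axis=1)
    nn = np.argsort(d2)[:k]
    reliability = hist_correct[nn].mean(axis=0)  # per-detector estimate
    ok = np.flatnonzero(reliability >= tau)
    if ok.size == 0:
        return "judge"  # no detector is trusted here: escalate
    return int(ok[np.argmin(latency[ok])])  # cheapest sufficiently reliable detector
```

Raising `tau` moves the operating point toward safety (more escalations, more wall-clock time), while lowering it favors cheaper detectors. This mirrors the single-threshold control described in the abstract.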

9. 图学习与结构化数据 10 篇

2606.14956 2026-06-16 cs.LG 新提交

A Comparative Study of Graph Neural Network Layer Selection for Interaction Modelling in Driving Trajectory Prediction

图神经网络层选择用于驾驶轨迹预测中交互建模的比较研究

George Daoud, Mohamed El-Darieby

发表机构 * Ontario Tech University(安大略理工大学) Assiut University(艾斯尤特大学)

AI总结 本文比较了19种图神经网络层在轨迹预测中的空间和时间处理能力,发现ARMA、Chebyshev和拓扑感知层表现最佳,并总结了基于和聚合、多头注意力和不同跳距权重等设计原则。

Comments 6 pages, 1 figure

Journal ref
The IEEE Intelligent Vehicles Symposium (IEEE IV 2026)
AI中文摘要

自动驾驶系统依赖精确的轨迹预测来规划安全高效的移动。图神经网络(GNN)已成为对道路智能体间时空交互建模的一种有前景的方法。然而,为轨迹预测设计GNN架构仍缺乏标准化,关于哪些图层能有效捕捉空间交互和时间动态的指导很少。本文对19种图层类型进行了详细的比较研究,重点关注它们的空间和时间处理能力,以发现最有效的轨迹预测架构。在所探索的超参数设置中,我们突出了五种突出的图层组合,其中ARMA、Chebyshev和拓扑感知层始终优于其他层。除了性能指标外,我们的发现还产生了实用的设计原则:基于和的聚合比基于均值的方法更有效,多头注意力机制能够实现更丰富的交互,为不同跳距分配不同权重显著提高了预测精度。这些发现为设计更可解释和有效的轨迹预测模型提供了有用的指导。

英文摘要

Autonomous driving systems rely on precise trajectory prediction to plan safe and efficient movement. Graph Neural Networks (GNNs) have become a promising approach for modelling spatiotemporal interactions among road agents. However, designing GNN architectures for trajectory prediction remains non-standardized, with little guidance on which graph layers effectively capture spatial interactions and temporal dynamics. This paper offers a detailed comparative study of 19 graph layer types, focusing on their spatial and temporal processing capabilities to discover the most effective architectures for trajectory prediction. Within the explored hyperparameter setting, we highlight five standout layer combinations, with ARMA, Chebyshev, and topology-aware layers consistently performing better than others. Beyond performance metrics, our findings yield practical design principles: sum-based aggregation is more effective than mean-based methods, multi-head attention mechanisms enable richer interactions, and assigning different weights to different hop distances significantly improves prediction accuracy. These findings offer useful guidance for designing more interpretable and effective trajectory prediction models.
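Two of the design principles above can be illustrated in a few lines of NumPy. This is an illustrative sketch, not any of the 19 layers studied. First, mean aggregation cannot tell an agent with one neighbor from an agent with three identical neighbors, whereas sum aggregation preserves that count (for example, traffic density). Second, `multi_hop` applies scalar per-hop weights, Σ_k w_k Aᵏ h. This is a simplification of layers that learn separate parameters for each hop distance; real layers typically use a weight matrix per hop.

```python
import numpy as np


def aggregate(h, neighbors, mode):
    """h: (N, d) node features; neighbors: list of neighbor-index lists."""
    out = np.zeros_like(h)
    for v, nbrs in enumerate(neighbors):
        if nbrs:
            msgs = h[nbrs]
            out[v] = msgs.sum(axis=0) if mode == "sum" else msgs.mean(axis=0)
    return out


def multi_hop(A, h, hop_weights):
    """Compute w_0 h + w_1 A h + w_2 A^2 h + ... with scalar per-hop weights."""
    out = np.zeros_like(h, dtype=float)
    x = h.astype(float)
    for w in hop_weights:
        out += w * x
        x = A @ x  # move one more hop outward
    return out
```

For agent 0 with one neighbor and agent 1 with three neighbors, all carrying the feature 1.0, mean aggregation returns 1.0 for both agents, while sum aggregation returns 1.0 and 3.0.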

2606.16611 2026-06-16 cs.LG 新提交

TCHG: Tri-Trust Conditioned Heterogeneous Graph Learning for Reliable Dynamic Trust Prediction

TCHG:基于三重信任条件异构图学习的可靠动态信任预测

Bohao Liao, Boyu Deng, Qipeng Song, Jieling Wang, Jingchao Wang

发表机构 * Xidian University(西安电子科技大学) Tsinghua University(清华大学)

AI总结 提出TCHG框架,将信任证据分解为三个通道(实体可靠性、交互行为可靠性、上下文信任),分别控制图传播中的消息准入、传播强度和模式选择,并采用非均匀衰减的时间状态处理多尺度演化,实现可靠动态信任预测。

Comments 18 pages, 10 figures, 13 tables

AI中文摘要

信任预测推断潜在的用户-用户信任关系,为社会推荐、虚假评论与操纵检测以及风险识别提供重要支持。图神经网络因其学习网络结构和复杂信任依赖的能力,已成为信任预测的主流方法。然而,现有方法通常依赖信任信号的统一表示,未将异质信任证据分解为独立的证据通道,未能利用不同证据通道在信任建模中应发挥的不同作用。为弥补这一不足,本文认为信任证据不应被视为无差别的输入,而应分解并用作图传播的功能控制因子。我们提出TCHG,一种三重信任条件异构图学习框架,将信任证据分解为三个通道,并赋予它们在传播中不同的功能角色:实体可靠性控制消息准入,交互行为可靠性调节传播强度,上下文信任通过上下文条件算子选择调整传播模式。由于三个证据通道以不同时间尺度演化,TCHG维护具有非均匀衰减率的独立时间状态,以防止快速变化的上下文信号覆盖缓慢积累的实体可靠性。它进一步预测信任概率并校准输出概率,提高稀疏或冲突证据下的预测置信度。在多个公开信任数据集上的大量实验表明,与代表性信任预测和异构图基线方法相比,TCHG实现了有效且可靠的信任预测。

英文摘要

Trust prediction infers latent user-user trust relations and provides important support for social recommendation, fake-review and manipulation detection, and risk identification. Graph neural networks have become a prominent approach to trust prediction because of their ability to learn network structures and complex trust dependencies. However, existing methods often rely on a unified representation of trust signals and do not disentangle heterogeneous trust evidence into separate evidence channels, failing to exploit the distinct roles that different evidence channels should play during trust modeling. To address this gap, this paper argues that trust evidence should not be treated as an undifferentiated input, but should be decomposed and used as functional control factors over graph propagation. We propose TCHG, a tri-trust conditioned heterogeneous graph learning framework that decomposes trust evidence into three channels and assigns them distinct functional roles in propagation: entity reliability governs message admission, interaction-behavior reliability modulates propagation strength, and contextual trust adjusts the propagation mode through context-conditioned operator selection. Since the three evidence channels evolve at different temporal scales, TCHG maintains independent temporal states with non-uniform decay rates to prevent rapidly changing contextual signals from overwriting slowly accumulated entity reliability. It further predicts trust probability and calibrates the output probability, improving predictive confidence under sparse or conflicting evidence. Extensive experiments on multiple public trust datasets show that TCHG achieves effective and reliable trust prediction compared with representative trust prediction and heterogeneous graph baselines.
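The abstract does not give TCHG's gating or decay functions, so the following is only a hypothetical illustration of the two mechanisms it names. Three trust channels are tracked as exponential moving averages with non-uniform rates (slow for entity reliability, fast for context). A message is admitted only if entity reliability clears a threshold, and its strength is scaled by interaction-behavior reliability. All rates and thresholds are invented for the sketch. The context channel, which in TCHG selects the propagation operator, is only tracked here.

```python
from dataclasses import dataclass


@dataclass
class TrustState:
    entity: float = 0.5
    behavior: float = 0.5
    context: float = 0.5
    rates: tuple = (0.02, 0.2, 0.8)  # hypothetical: slow, medium, fast

    def update(self, entity_obs, behavior_obs, context_obs):
        """Exponential moving averages with a separate rate per channel."""
        r_e, r_b, r_c = self.rates
        self.entity += r_e * (entity_obs - self.entity)
        self.behavior += r_b * (behavior_obs - self.behavior)
        self.context += r_c * (context_obs - self.context)


def gated_message(msg, state, admit_threshold=0.3):
    """Entity reliability admits the message; behavior reliability scales it."""
    if state.entity < admit_threshold:
        return 0.0
    return state.behavior * msg
```

After a long run of consistent evidence, a single hostile context observation drives `context` down almost immediately, while the slowly accumulated `entity` value barely moves. This is the overwriting problem that separate decay rates are meant to prevent.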

2606.16990 2026-06-16 cs.LG math.AT 新提交

Analytic Torsion and Spectral Gap Capture Persistent-Laplacian Performance

解析挠率和谱间隙捕捉持久拉普拉斯算子的性能

Jernej Grlj, Aaron D. Lauda

发表机构 * University of Southern California(南加州大学)

AI总结 提出用贝蒂数、谱间隙和解析挠率三个不变量的紧凑谱表示替代全谱,在多个数据集上实现同等或更优性能,显著降低计算开销并避免高频噪声。

Comments 13 pages

AI中文摘要

虽然持久拉普拉斯算子(PL)比持久同调提供更丰富的数据几何表示,但利用其全特征谱进行学习任务常因高维性和不同过滤尺度下的“变长”问题而受阻。我们提出一种紧凑谱表示,将持久拉普拉斯算子提炼为三个数学基础不变量:贝蒂数、谱间隙和解析挠率。在包括MNIST、QM-3D和SKEMPI WT的基准数据集上,我们证明该降维特征空间捕捉了全谱的基本预测信号,在某些情况下甚至优于全谱,同时显著降低计算开销并防止高频特征值引入的噪声。我们的结果表明,这些不变量提供了谱几何与拓扑学习之间原则性的固定长度接口。

英文摘要

While persistent Laplacians (PL) offer a richer geometric representation of data than persistent homology, utilizing their full eigenspectrum for learning tasks is often hampered by high dimensionality and the "varying length" problem across different filtration scales. We propose a compact spectral representation that distills the persistent Laplacian into three mathematically grounded invariants: Betti numbers, the spectral gap, and analytic torsion. Across benchmark datasets including MNIST, QM-3D, and SKEMPI WT, we demonstrate that this reduced feature space captures the essential predictive signal of the full spectrum, and in some cases outperforms it, while significantly reducing computational overhead and preventing the noise introduced by higher-frequency eigenvalues. Our results suggest that these invariants provide a principled, fixed-length interface between spectral geometry and topological learning.
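As a simplified illustration, the three invariants can be read off the spectrum of ordinary (non-persistent) combinatorial Hodge Laplacians at a single filtration step. The sketch below is not the paper's persistent-Laplacian pipeline. For a graph with signed incidence matrix B, it uses Δ₀ = BBᵀ and Δ₁ = BᵀB. It takes the Betti number as the number of zero eigenvalues, the spectral gap as the smallest nonzero eigenvalue of Δ₀, and the log analytic torsion from the convention log T = ½ Σ_q (-1)^(q+1) q log det′ Δ_q, where det′ is the product of nonzero eigenvalues. Sign conventions for torsion differ across references. Whatever the size of the complex, the output has a fixed length.

```python
import numpy as np


def spectral_invariants(L, tol=1e-9):
    """Betti number, spectral gap and log pseudo-determinant of a Laplacian."""
    ev = np.linalg.eigvalsh(L)
    nz = ev[ev > tol]
    betti = int(np.sum(ev <= tol))  # dimension of the kernel
    gap = float(nz.min()) if nz.size else 0.0
    logdet = float(np.log(nz).sum())  # log det' (nonzero eigenvalues only)
    return betti, gap, logdet


def graph_torsion_features(B):
    """B: (n_vertices, n_edges) signed incidence matrix of a graph."""
    b0, gap0, ld0 = spectral_invariants(B @ B.T)  # vertex Laplacian
    b1, _, ld1 = spectral_invariants(B.T @ B)     # edge Laplacian (no 2-simplices)
    # log T = 1/2 * sum_q (-1)^(q+1) * q * log det' L_q  (here q = 0, 1)
    log_torsion = 0.5 * (-0 * ld0 + 1 * ld1)
    return {"betti": (b0, b1), "gap": gap0, "log_torsion": log_torsion}


# Path graph 0-1-2 with edges (0, 1) and (1, 2).
B = np.array([[-1.0, 0.0], [1.0, -1.0], [0.0, 1.0]])
features = graph_torsion_features(B)
```

For this path graph, Δ₀ has eigenvalues 0, 1, 3 and Δ₁ has eigenvalues 1, 3. This gives Betti numbers (1, 0), a spectral gap of 1, and log T = ½ log 3.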

2606.14892 2026-06-16 cs.AI cs.LG cs.SI stat.ML 交叉投稿

Relational Structural Causal Models

关系结构因果模型

Adiba Ejaz, Elias Bareinboim

发表机构 * Causal Artificial Intelligence Lab, Columbia University(哥伦比亚大学因果人工智能实验室)

AI总结 提出关系结构因果模型,将结构因果模型扩展到对象和关系可变的场景,通过关系因果图和符号识别准则实现未见组合的因果和观测查询识别,并设计关系神经因果模型在交通场景中优于非关系基线。

Comments Proceedings of the Forty-Third International Conference on Machine Learning

AI中文摘要

人工智能必须拥有一个因果的环境模型,支持关于干预和反事实的推理,同时具有组合性,支持对未见过的对象组合进行泛化。在这项工作中,我们正式研究了何时以及如何学习这样的模型。我们开发了关系结构因果模型,将结构因果模型(Pearl 2009)扩展到对象及其关系变化的场景。首先,我们展示了在没有进一步假设的情况下,不仅因果查询,而且关于未见对象组合的观测查询的答案也无法被识别。为了实现这种识别——包括在存在未观测混杂的情况下——我们定义了关系因果图并推导了符号识别准则。最后,我们提出了关系神经因果模型,这是一种可证明正确的方法,在具有不同汽车、信号和行人的模拟交通场景中优于非关系基线。

英文摘要

An artificial intelligence must have a model of its environment that is causal, supporting reasoning about interventions and counterfactuals, and also combinatorial, supporting generalization to unseen combinations of objects. In this work, we formally study when and how such a model can be learned. We develop relational structural causal models, extending structural causal models (Pearl 2009) to settings where objects and their relations vary. First, we show that answers not only to causal but also to observational queries about unseen combinations of objects cannot be identified without further assumptions. To enable such identification, including in the presence of unobserved confounding, we define relational causal graphs and derive symbolic identification criteria. Finally, we propose relational neural causal models, a provably correct approach that outperforms non-relational baselines on simulated traffic scenes with varying cars, signals, and pedestrians.

2407.07357 2026-06-16 cs.LG q-bio.MN 版本更新

A polarity-aware multi-relational model for the signed interaction prediction in biological networks

面向生物网络中符号交互预测的极性感知多关系模型

Ziye Zhou, Meijie Wang, Lun Yu

发表机构 * Metanovas Biotech, Inc.(MetaNovas生物技术公司)

AI总结 提出极性感知多关系模型PAMR,结合图卷积网络与张量分解及冲突感知采样策略,预测化学-基因的极性(激活/抑制)与非极性交互,在分类精度和极性区分上超越基线模型。

AI中文摘要

预测生物网络中的符号交互对于理解药物机制和促进药物再利用至关重要。尽管深度图模型在建模复杂生物系统方面已展现出成功,但现有方法往往无法区分正负交互,限制了其在精确药理学预测中的实用性。在本研究中,我们提出了一种新颖的深度图模型PAMR(极性感知多关系模型),旨在预测极性(如激活、抑制)和非极性(如结合、影响)的化学-基因交互。我们的模型将图卷积网络与张量分解相结合以增强特征表示,并引入了一种冲突感知采样策略来解决极性歧义。我们引入了新的评估指标——极性区分得分(PDS)和CP@100,以评估模型区分交互类型的能力。实验结果表明,PAMR优于基线模型,实现了更高的分类精度和改进的极性边区分能力。具体而言,PAMR-CL达到了0.9072的宏AUROC和0.974的CP@100,超越了RGCN、GraphSAGE、TransE和BioNet基线。一项关于尼古丁的案例研究进一步识别了两个新的化学-基因抑制关联,即S100A6和SPP1,这些关联得到了独立实验文献的证实。此外,我们分析了子图成分对预测性能的影响,揭示了额外的网络结构并不总能提高准确性。这些发现强调了极性感知建模在药物发现和网络药理学中的重要性,为极性感知的化学-基因交互预测和网络药理学分析提供了一个可扩展的计算框架。

英文摘要

Predicting signed interactions in biological networks is crucial for understanding drug mechanisms and facilitating drug repurposing. While deep graph models have demonstrated success in modeling complex biological systems, existing approaches often fail to distinguish between positive and negative interactions, limiting their utility for precise pharmacological predictions. In this study, we propose a novel deep graph model, PAMR (polarity-aware multi-relational model), designed to predict both polar (e.g., activation, inhibition) and non-polar (e.g., binding, affect) chemical-gene interactions. Our model integrates graph convolutional networks with tensor decomposition to enhance feature representation and incorporates a conflict-aware sampling strategy to resolve polarity ambiguities. We introduce new evaluation metrics, polarity discrimination score (PDS) and CP@100, to assess the model's ability to differentiate interaction types. Experimental results demonstrate that PAMR outperforms baseline models, achieving superior classification accuracy and improved discrimination of polar edges. Specifically, PAMR-CL attains a Macro AUROC of 0.9072 and CP@100 of 0.974, surpassing RGCN, GraphSAGE, TransE, and BioNet baselines. A case study on nicotine further identifies two novel chemical-gene suppression links, S100A6 and SPP1, that are corroborated by independent experimental literature. Furthermore, we analyze the impact of subgraph components on predictive performance, revealing that additional network structures do not always enhance accuracy. These findings highlight the importance of polarity-aware modeling in drug discovery and network pharmacology, providing a scalable computational framework for polarity-aware chemical-gene interaction prediction and network pharmacology analysis.

2502.17614 2026-06-16 cs.LG cs.SI 版本更新

Scalable Graph Condensation with Evolving Capabilities

具有演化能力的可扩展图压缩

Shengbo Gong, Mohammad Hashemi, Juntong Ni, Carl Yang, Wei Jin

发表机构 * Emory University(埃默里大学)

AI总结 提出GECC框架,通过类级聚类和继承先前压缩结果,实现大规模动态图数据的可扩展压缩,性能优于现有方法且加速约1000倍。

详情
Journal ref
Page 314-323, Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, 2026
AI中文摘要

图数据的快速增长带来了显著的可扩展性挑战,因为大多数图算法的复杂度随图规模呈二次方增长。为了缓解这些问题,图压缩(GC)方法被提出,用于从大图中学习一个小图,从而加速下游任务。然而,现有方法关键地假设训练集是静态的,这与现实世界图数据固有的动态和演化特性相冲突。这一限制导致在压缩不断增长的训练集时效率低下。在本文中,我们提出了GECC(图演化聚类压缩),一种可扩展的图压缩方法,旨在处理大规模和演化的图数据;作为一个连续图压缩框架,它能够高效更新蒸馏图,处理数据流而无需昂贵的重新训练。GECC通过对聚合特征执行类级聚类,采用了一种可追踪且高效的方法。此外,当压缩图扩展时,它可以继承先前的压缩结果作为聚类中心,从而获得演化能力。该方法具有坚实的理论基础,并展示了优越的经验性能。包括真实场景在内的综合实验表明,GECC在实现比大多数最先进图压缩方法更好性能的同时,在大数据集上实现了约1000倍的加速。

英文摘要

The rapid growth of graph data creates significant scalability challenges, as most graph algorithms scale quadratically with graph size. To mitigate these issues, Graph Condensation (GC) methods have been proposed to learn a small graph from a larger one, accelerating downstream tasks. However, existing approaches critically assume a static training set, which conflicts with the inherently dynamic and evolving nature of real-world graph data. This limitation leads to inefficiencies when condensing growing training sets. In this paper, we introduce GECC (Graph Evolving Clustering Condensation), a scalable graph condensation method designed to handle large-scale and evolving graph data; as a framework for continual graph condensation, it efficiently updates the distilled graph as data streams in, without requiring costly retraining. GECC employs a traceable and efficient approach by performing class-wise clustering on aggregated features. Furthermore, it can inherit previous condensation results as clustering centroids when the condensed graph expands, thereby attaining an evolving capability. This methodology is supported by robust theoretical foundations and demonstrates superior empirical performance. Comprehensive experiments, including a real-world scenario, show that GECC achieves better performance than most state-of-the-art graph condensation methods while delivering a speedup of around 1000× on large datasets.
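A minimal sketch of the two ingredients the abstract names, class-wise clustering on aggregated features and reuse of earlier centroids, assuming SGC-style propagation with symmetric normalization and self-loops, followed by plain Lloyd's k-means per class. The hop count, iteration count, and the names `aggregate`, `kmeans`, and `condense` are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def aggregate(adj, x, hops=2):
    """SGC-style propagation: (D^-1/2 (A+I) D^-1/2)^hops @ X."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(hops):
        x = a_norm @ x
    return x


def kmeans(points, init_centroids, iters=20):
    """Lloyd's k-means started from the given centroids (copied, not mutated)."""
    c = init_centroids.copy()
    for _ in range(iters):
        dists = ((points[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(len(c)):
            members = points[assign == j]
            if len(members) > 0:  # leave empty clusters where they are
                c[j] = members.mean(axis=0)
    return c


def condense(adj, x, labels, per_class, prev=None, seed=0):
    """Cluster aggregated features class by class.

    prev: optional {class: centroids} from an earlier condensation, reused as
    initial centroids and topped up with random points if the budget grew.
    """
    rng = np.random.default_rng(seed)
    h = aggregate(adj, x)
    out = {}
    for cls in np.unique(labels):
        pts = h[labels == cls]
        init = prev[cls] if prev is not None and cls in prev else np.empty((0, h.shape[1]))
        extra = per_class - len(init)
        if extra > 0:
            idx = rng.choice(len(pts), size=extra, replace=False)
            init = np.vstack([init, pts[idx]])
        out[cls] = kmeans(pts, init[:per_class])
    return out


# Toy graph: two 3-node paths, one per class.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
x = np.random.default_rng(1).normal(size=(6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])
first = condense(adj, x, labels, per_class=1)
second = condense(adj, x, labels, per_class=2, prev=first)  # warm start
```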

2602.10031 2026-06-16 cs.LG 版本更新

Graph Learning Should Move Beyond Restrictive Views of Spectral and Message-Passing GNNs

图学习应超越对谱图神经网络和消息传递图神经网络的狭隘观点

Antonis Vasileiou, Juan Cervino, Pascal Frossard, Charilaos I. Kanatsoulis, Christopher Morris, Michael T. Schaub, Pierre Vandergheynst, Zhiyang Wang, Guy Wolf, Ron Levie

发表机构 * RWTH Aachen University(亚琛工业大学) Massachusetts Institute of Technology(麻省理工学院) École Polytechnique Fédérale de Lausanne(洛桑联邦理工学院) Stanford University(斯坦福大学) University of California San Diego(加州大学圣地亚哥分校) Univ. de Montréal(蒙特利尔大学) Mila(Mila人工智能研究所) Technion – Israel Institute of Technology(以色列理工学院)

AI总结 本文澄清了谱图神经网络与消息传递图神经网络的异同,提出基于特征基对称性的谱GNN精确定义,并倡导统一理论框架以推动图学习发展。

Comments 44 pages, 1 figure

详情
AI中文摘要

图神经网络(GNN)通常分为消息传递神经网络(MPNN)和谱图神经网络,反映了机器学习和信号处理中两个相对独立的研究传统。虽然MPNN有精确的定义,但对于什么构成谱GNN的映射,没有广泛接受的标准。大多数现有工作将谱GNN限制为基于线性谱滤波器的分层架构。在此限制下,我们表明谱GNN和空间GNN具有大致相当的表达能力。为了促进该领域的进步,我们基于特征基对称性提出了谱GNN的精确定义,与通过邻域置换对称性定义的MPNN形成对比。我们进一步论证这两种视角提供了互补的优势。MPNN通过逻辑和图同构的工具为离散结构和表达能力分析提供了自然语言,而谱视角为理解平滑、瓶颈、稳定性和社区结构提供了原则性工具。总体而言,我们认为通过澄清这些视角之间的异同并迈向统一的理论框架,将加速图学习的进展。

英文摘要

Graph neural networks (GNNs) are commonly divided into message-passing neural networks (MPNNs) and spectral GNNs, reflecting two largely separate research traditions in machine learning and signal processing. While MPNNs have a precise definition, there is no widely accepted criterion for what makes a mapping a spectral GNN. Most existing work restricts spectral GNNs to layered architectures based on linear spectral filters. Under this restriction, we show that spectral and spatial GNNs have largely equivalent expressive power. To promote progress in the field, we propose a precise definition of spectral GNNs based on eigenbasis symmetries, in contrast to the definition of MPNNs via neighborhood permutation symmetries. We further argue that the two perspectives offer complementary strengths. MPNNs provide a natural language for discrete structure and expressivity analysis through tools from logic and graph isomorphism, while the spectral perspective offers principled tools for understanding smoothing, bottlenecks, stability, and community structure. Overall, we argue that progress in graph learning will be accelerated by clarifying the similarities and differences between these perspectives and by moving toward a unified theoretical framework.
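The restricted notion of spectral GNNs mentioned above, layers built from linear spectral filters, can be checked on a toy graph: a polynomial filter g(λ) = Σ_k θ_k λ^k applied in the Laplacian eigenbasis, U g(Λ) Uᵀ x, equals Σ_k θ_k L^k x, i.e. K rounds of local neighborhood aggregation. This is a standard identity; the graph, signal, and coefficients below are arbitrary, and the check does not cover the broader eigenbasis-symmetry definition the authors propose.

```python
import numpy as np

# A 4-cycle 0-1-2-3-0 with a chord 1-3 (arbitrary example graph).
adj = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
], dtype=float)
lap = np.diag(adj.sum(axis=1)) - adj  # combinatorial Laplacian L = D - A
x = np.array([1.0, -2.0, 0.5, 3.0])   # one scalar signal per node
theta = [0.5, -0.3, 0.1]              # g(lam) = 0.5 - 0.3*lam + 0.1*lam^2

# Spectral view: filter in the eigenbasis of L.
lam, u = np.linalg.eigh(lap)
g = sum(t * lam**k for k, t in enumerate(theta))
spectral = u @ (g * (u.T @ x))

# Spatial (message-passing) view: each power of L is one more hop.
spatial = np.zeros_like(x)
hop = x.copy()
for t in theta:
    spatial += t * hop
    hop = lap @ hop

print(np.allclose(spectral, spatial))
```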

2605.26290 2026-06-16 cs.LG 版本更新

Dynamic Link Prediction with Temporally Enhanced Signed Graph Neural Networks

基于时间增强符号图神经网络的动态链接预测

Derek Regier, Andrew Polyak, Aresh Dadlani, Khosro Salmani

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出一种模块化时间增强框架,通过历史上下文集成模块(HCIM)结合可学习的近因感知时间加权、LSTM嵌入轨迹建模和多头时间注意力,在符号图神经网络中捕获短期和长期符号交互动态,并在SE-SGformer上实例化,实验证明在多个真实和合成时间符号网络上性能显著提升。

Comments This manuscript has been withdrawn by the authors due to errors discovered in the implementation and experimental evaluation. These errors materially affect the reported results and conclusions. The authors therefore do not recommend using or citing this work.

详情
AI中文摘要

时间符号网络(TSNs)模拟了社交媒体分析、信任与声誉系统以及金融交易网络等应用中出现的合作与对抗关系的时间演化。尽管图神经网络(GNNs)在静态或无符号链接预测中表现良好,但由于符号关系、演化结构和平衡理论约束的相互作用,在时间符号图中的有效学习仍然具有挑战性。为了解决这一差距,我们提出了一种用于符号GNN的模块化时间增强框架,该框架将历史上下文集成到原本静态的架构中。该框架引入了一个历史上下文集成模块(HCIM),该模块结合了可学习的近因感知时间加权、基于LSTM的嵌入轨迹建模和多头时间注意力,以捕获短期和长期的符号交互动态。历史信息通过全局或节点自适应权重与当前节点表示融合,使得与架构无关的框架能够适应异质的时间行为。我们在自解释符号图变换器(SE-SGformer)上实例化了该方法,在保持可解释性的同时扩展了其时间感知能力。在真实和合成TSN(包括Bitcoin OTC、Bitcoin Alpha、Reddit和小世界网络模型)上的实验表明,与静态基线相比,该方法取得了一致且统计显著的改进。

英文摘要

Temporal signed networks (TSNs) model the time evolution of cooperative and adversarial relationships that arise in applications such as social media analysis, trust and reputation systems, and financial transaction networks. While graph neural networks (GNNs) perform well for static or unsigned link prediction, effective learning in temporal signed graphs remains challenging due to the interaction of signed relations, evolving structure, and balance-theoretic constraints. To address this gap, we propose a modular temporal enhancement framework for signed GNNs that integrates historical context into otherwise static architectures. The framework introduces a Historical Context Integration Module (HCIM) that combines learnable recency-aware temporal weighting, LSTM-based embedding trajectory modeling, and multi-head temporal attention to capture both short- and long-term signed interaction dynamics. Historical information is fused with current node representations using either global or node-adaptive weighting, allowing the architecture-agnostic framework to accommodate heterogeneous temporal behaviors. We instantiate the approach on the Self-Explainable Signed Graph Transformer (SE-SGformer), preserving interpretability while extending it with temporal awareness. Experiments on real-world and synthetic TSNs, including Bitcoin OTC, Bitcoin Alpha, Reddit, and small-world network models, demonstrate consistent and statistically significant improvements over the static baseline.
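This manuscript has been withdrawn by its authors, so the following is only a generic sketch of two ideas the abstract names: recency-aware weighting of historical embeddings and global-weight fusion with the current representation. The exponential decay form, the fixed `decay` and `alpha` values (learnable in the described module), and the function names are assumptions; the LSTM and attention components are omitted.

```python
import math


def recency_weights(timestamps, now, decay):
    """Softmax-normalized weights exp(-decay * (now - t)); recent steps weigh more."""
    logits = [-decay * (now - t) for t in timestamps]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def fuse(current, history, timestamps, now, decay=0.5, alpha=0.7):
    """Blend the current embedding with a recency-weighted history summary.

    alpha is one global mixing weight; a node-adaptive variant would use
    a separate alpha per node.
    """
    w = recency_weights(timestamps, now, decay)
    summary = [sum(wi * h[d] for wi, h in zip(w, history)) for d in range(len(current))]
    return [alpha * c + (1 - alpha) * s for c, s in zip(current, summary)]


history = [[1.0, 0.0], [0.0, 1.0]]  # a node's embeddings at t=1 and t=3
fused = fuse([0.5, 0.5], history, [1, 3], now=4)
print(fused)
```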

2505.13986 2026-06-16 math.OC cs.AI cs.LG 版本更新

RIDGECUT: Learning Graph Partitioning with Rings and Wedges

RIDGECUT:基于环与楔形结构的图分割学习

Qize Jiang, Angelo Zangari, Linsey Pang, Alice Gatti, Mahima Aggarwal, Giovanna Vantini, Xiaosong Ma, Weiwei Sun, Sourav Medya, Sanjay Chawla

发表机构 * College of Computer Science and Artificial Intelligence, Shanghai Key Laboratory of Data Science(计算机科学与人工智能学院,上海数据科学重点实验室) University of Illinois Chicago(伊利诺伊大学芝加哥分校) PayPal Inc.(PayPal公司) Center for AI Safety(人工智能安全中心) Qatar Computing Research Institute(卡塔尔计算研究所) Hamad Bin Khalifa University(哈马德·本·哈利法大学) Computing and Mathematical Sciences (CMS) Division(计算与数学科学(CMS)部门) Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)(穆罕默德·本·扎耶德人工智能大学(MBZUAI)) Fudan University(复旦大学)

AI总结 提出RidgeCut框架,通过将动作空间约束为环与楔形结构,利用强化学习解决归一化割问题,在交通网络上实现结构感知分割,降低归一化割值并展现强泛化能力。

Comments Extended version of the paper accepted at KDD 2026

详情
AI中文摘要

强化学习通过学习跨实例泛化的启发式方法,在图的组合优化问题上展现出潜力。然而,如何有效地将领域知识融入强化学习框架进行图分割仍然具有挑战性,因为现有方法通常依赖于无约束的节点级动作,导致动作空间大且探索效率低。在本文中,我们提出RidgeCut,一种强化学习框架,通过约束动作空间来在归一化割问题中实现结构感知分割。以交通网络为动机示例,我们引入了一个利用城市道路拓扑领域知识的新概念——其中自然分割通常呈现为同心环和径向楔形。通过将图转换为线性或圆形表示,我们的方法能够使用基于变换器的策略并通过近端策略优化进行高效学习。RidgeCut产生的分割不仅与预期的空间布局一致,而且与现有方法相比实现了更低的归一化割值。在合成和真实交通图上的实验结果表明,RidgeCut在跨图大小的归纳泛化方面始终优于现有方法。尽管以道路网络为动机,RidgeCut为将结构先验嵌入到图分割的强化学习框架中提供了一种通用机制。

英文摘要

Reinforcement learning (RL) has shown promise for combinatorial optimization problems on graphs by learning heuristics that generalize across instances. However, effectively incorporating domain knowledge into RL frameworks for graph partitioning remains challenging, as existing approaches typically rely on unconstrained node-level actions that lead to large action spaces and inefficient exploration. In this paper, we propose RidgeCut, an RL framework that constrains the action space to enforce structure-aware partitioning in the Normalized Cut problem. Using transportation networks as a motivating example, we introduce a novel concept that leverages domain knowledge about urban road topology -- where natural partitions often take the form of concentric rings and radial wedges. By transforming the graph into linear or circular representations, our method enables the use of transformer-based policies and efficient learning via Proximal Policy Optimization. The resulting partitions from RidgeCut are not only aligned with expected spatial layouts but also achieve lower normalized cuts compared to existing methods. Experimental results on synthetic and real-world traffic graphs demonstrate that RidgeCut consistently outperforms existing methods while exhibiting strong inductive generalization across graph sizes. Although motivated by road networks, RidgeCut provides a general mechanism for embedding structural priors into RL frameworks for graph partitioning.
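For reference, the Normalized Cut objective is Ncut(A_1, …, A_k) = Σ_i cut(A_i, V \ A_i) / vol(A_i), where vol sums node degrees. The sketch computes it for a weighted edge list and, separately, shows what a ring/wedge partition means geometrically by labelling nodes with polar coordinates around a center. The hand-picked ring radii and wedge count are hypothetical stand-ins for the choices RidgeCut's learned policy would make.

```python
import math


def normalized_cut(edges, parts):
    """Ncut = sum_i cut(A_i, V \\ A_i) / vol(A_i) on an undirected weighted graph.

    edges: list of (u, v, weight); parts: dict node -> partition id.
    """
    cut, vol = {}, {}
    for u, v, w in edges:
        for node in (u, v):  # each endpoint contributes w to its part's volume
            vol[parts[node]] = vol.get(parts[node], 0.0) + w
        if parts[u] != parts[v]:
            for p in (parts[u], parts[v]):
                cut[p] = cut.get(p, 0.0) + w
    return sum(cut.get(p, 0.0) / vol[p] for p in vol)


def ring_wedge_labels(coords, center, radii, n_wedges):
    """Assign each node to (ring index, wedge index) from its polar coordinates."""
    labels = {}
    width = 2 * math.pi / n_wedges
    for node, (x, y) in coords.items():
        dx, dy = x - center[0], y - center[1]
        ring = sum(math.hypot(dx, dy) > b for b in radii)  # boundaries passed
        angle = math.atan2(dy, dx) % (2 * math.pi)
        labels[node] = (ring, min(int(angle // width), n_wedges - 1))
    return labels


# A path 0-1-2-3 split into {0, 1} and {2, 3}: Ncut = 1/3 + 1/3.
ncut = normalized_cut([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)],
                      {0: "a", 1: "a", 2: "b", 3: "b"})
coords = {0: (0.3, 0.2), 1: (2.0, 1.0), 2: (-1.0, 2.0), 3: (-2.0, -1.0), 4: (1.0, -2.0)}
labels = ring_wedge_labels(coords, center=(0.0, 0.0), radii=[1.0], n_wedges=4)
```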

2604.03496 2026-06-16 cs.AI cs.IR cs.LG 版本更新

Beyond Predefined Schemas: TRACE-KG for Context-Enriched Knowledge Graph Generation

超越预定义模式:TRACE-KG 用于上下文增强的知识图谱生成

Mohammad Sadeq Abolhasani, Yang Ba, Yixuan He, Rong Pan

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出 TRACE-KG 框架,通过数据驱动模式联合构建上下文增强的知识图谱和归纳模式,无需预定义本体,解决长技术文档中图谱碎片化问题。

Comments Accepted at Graph Foundation Models at ICML 2026

详情
AI中文摘要

知识图谱生成通常依赖于预定义本体或免模式提取。本体驱动的流水线强制执行一致的类型,但需要昂贵的模式设计和维护,而免模式方法通常产生碎片化的图谱,全局组织薄弱,尤其是在信息密集、依赖上下文的冗长技术文档中。我们提出 TRACE-KG(Text-dRiven schemA for Context-Enriched Knowledge Graphs),一个无需假设预定义本体即可联合构建上下文增强的知识图谱和归纳模式的框架。TRACE-KG 通过结构化限定符捕获条件关系,并使用数据驱动模式组织实体和关系,该模式作为可重用的语义支架,同时保持对源证据的完全可追溯性。实验表明,TRACE-KG 生成结构连贯、可追溯的知识图谱,并为本体驱动和免模式构建流水线提供了实用的替代方案。

英文摘要

Knowledge graph generation typically relies either on predefined ontologies or on schema-free extraction. Ontology-driven pipelines enforce consistent typing but require costly schema design and maintenance, whereas schema-free methods often produce fragmented graphs with weak global organization, especially in long technical documents with dense, context-dependent information. We propose TRACE-KG (Text-dRiven schemA for Context-Enriched Knowledge Graphs), a framework that jointly constructs a context-enriched knowledge graph and an induced schema without assuming a predefined ontology. TRACE-KG captures conditional relations through structured qualifiers and organizes entities and relations using a data-driven schema that serves as a reusable semantic scaffold while preserving full traceability to the source evidence. Experiments show that TRACE-KG produces structurally coherent, traceable knowledge graphs and offers a practical alternative to both ontology-driven and schema-free construction pipelines.
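To make "structured qualifiers" and "traceability to the source evidence" concrete, here is a hypothetical data model: each triple carries qualifier key-value pairs and a pointer to the supporting text span, and a naive schema is induced by counting observed (head type, relation, tail type) signatures. The class names, fields, counting rule, and example facts are illustrative assumptions, not TRACE-KG's actual representation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    doc_id: str
    start: int  # character offsets of the supporting span
    end: int


@dataclass
class QualifiedTriple:
    head: str
    relation: str
    tail: str
    head_type: str
    tail_type: str
    qualifiers: dict = field(default_factory=dict)  # e.g. {"condition": ...}
    source: Optional[Provenance] = None


def induce_schema(triples):
    """Count (head_type, relation, tail_type) signatures observed in the data."""
    return Counter((t.head_type, t.relation, t.tail_type) for t in triples)


# Made-up example facts from a hypothetical technical document.
triples = [
    QualifiedTriple("Alloy X", "has_property", "yield strength 900 MPa",
                    "Material", "Property",
                    qualifiers={"condition": "after heat treatment"},
                    source=Provenance("doc1", 120, 188)),
    QualifiedTriple("Alloy Y", "has_property", "yield strength 700 MPa",
                    "Material", "Property",
                    source=Provenance("doc1", 300, 351)),
]
schema = induce_schema(triples)
```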

10. 迁移、元学习与持续学习 15 篇

2606.14900 2026-06-16 cs.LG 新提交

GRASP: Gradient-Aligned Sequential Parameter Transfer for Memory-Efficient Multi-Source Learning

GRASP: 梯度对齐的序列参数迁移用于内存高效的多源学习

Mary Isabelle Wisell, Nicholas Jacobs, Aayush Manandhar, Salimeh Yasaei Sekeh

发表机构 * San Diego State University(圣地亚哥州立大学) University of Utah(犹他大学) University of Maine(缅因大学)

AI总结 提出GRASP方法,通过序列处理、参数梯度对齐和迭代微调,在O(1)内存下实现多源知识融合,在三个持续学习基准上平均准确率93.5%,优于集成方法的71.7%。

详情
AI中文摘要

多源迁移学习面临一个根本的可扩展性瓶颈:现有方法在参数融合时需要将所有K个源模型同时加载到内存中,需要O(K)内存,或者在推理时部署所有模型,使得生产部署不可行。我们提出GRASP(梯度对齐的序列参数迁移),通过三个关键创新实现了优越的知识集成,同时保持O(1)内存消耗:(1)序列处理,一次将一个源合并到正在演化的目标模型中;(2)参数级梯度对齐,仅选择性迁移其优化方向与目标域对齐的参数,避免负迁移;(3)迭代微调,在集成下一个源之前适应迁移的知识。在三个持续学习基准(Yearbook、CLEAR-10、CLEAR-100)上进行的广泛实验,涵盖了10到108年的时间分布偏移和四种架构(1.3M到25.6M参数),表明GRASP在所有数据集和架构上实现了93.5%的平均准确率,而集成方法为71.7%,同时仅需要恒定内存,而标准多源融合需要K个模型。关键的是,由于GRASP按顺序处理各个源,它无需保留先前已合并的源模型,并且可以扩展到任意多的源而不增加内存,使其特别适合资源受限的部署和不断演化的源域。

英文摘要

Multi-source transfer learning faces a fundamental scalability bottleneck: existing approaches require either loading all K source models into memory simultaneously during parameter fusion, requiring O(K) memory, or deploying all models at inference time, making production deployment infeasible. We propose GRASP (Gradient-Aligned Sequential Parameter Transfer), which achieves superior knowledge integration while maintaining O(1) memory consumption through three key innovations: (1) sequential processing that merges one source at a time into an evolving target model, (2) parameter-wise gradient alignment that selectively transfers only parameters whose optimization directions align with the target domain, avoiding negative transfer, and (3) iterative fine-tuning that adapts transferred knowledge before integrating the next source. Extensive experiments across three continual learning benchmarks (Yearbook, CLEAR-10, CLEAR-100) spanning 10- to 108-year temporal distribution shifts and four architectures (1.3M to 25.6M parameters) demonstrate that GRASP achieves 93.5% mean accuracy across all datasets and architectures, compared with 71.7% for an ensemble baseline, while requiring only constant memory rather than the K models held by standard multi-source fusion. Critically, because GRASP processes sources sequentially, it does not need to retain previously merged source models and scales to arbitrarily many sources without memory growth, making it uniquely suitable for resource-constrained deployment and continually evolving source domains.
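The three steps above can be sketched on flat parameter lists. The alignment test used here, moving a parameter toward the source only when (source − target) agrees in sign with the target's descent direction −∇L, is an assumed instantiation of "parameter-wise gradient alignment"; the mixing weight, learning rate, step count, and quadratic toy loss are likewise hypothetical.

```python
def aligned_merge(target, source, grad, mix=0.5):
    """Move each target parameter toward the source only where that move
    points along the target's descent direction (-grad)."""
    merged = []
    for t, s, g in zip(target, source, grad):
        delta = s - t
        merged.append(t + mix * delta if delta * -g > 0 else t)
    return merged


def grasp_like(target, source_loaders, grad_fn, lr=0.1, ft_steps=5):
    """Process sources one at a time, so memory holds the target plus one
    source regardless of how many sources there are."""
    for load_source in source_loaders:  # each loader returns one source model
        source = load_source()
        target = aligned_merge(target, source, grad_fn(target))
        for _ in range(ft_steps):       # brief fine-tuning before the next source
            target = [p - lr * g for p, g in zip(target, grad_fn(target))]
        del source                      # release the source before loading the next
    return target


# Toy target task: quadratic loss with optimum at `optimum` (made up).
optimum = [1.0, -1.0, 2.0]


def loss(theta):
    return 0.5 * sum((p - o) ** 2 for p, o in zip(theta, optimum))


def grad_fn(theta):
    return [p - o for p, o in zip(theta, optimum)]


start = [0.0, 0.0, 0.0]
sources = [lambda: [1.0, 1.0, 0.0], lambda: [0.0, -2.0, 3.0]]
final = grasp_like(start, sources, grad_fn)
```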

2606.15512 2026-06-16 cs.LG physics.plasm-ph 新提交

Towards Data-Efficient Cross-Device Generalization of Grad-Shafranov Equilibria via Transfer Learning Neural Operator

通过迁移学习神经算子实现Grad-Shafranov平衡的数据高效跨设备泛化

Jay Phil Yoo, William Howes, Yashika Ghai, Kazuma Kobayashi, Souvik Chakraborty, Syed Bahauddin Alam

发表机构 * Grainger College of Engineering, Nuclear, Plasma & Radiological Engineering Department, University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校格兰杰工程学院核、等离子体与放射工程系) Fusion Energy Division, Oak Ridge National Lab(橡树岭国家实验室聚变能源部) National Center for Supercomputing Applications(国家超级计算应用中心) Department of Applied Mechanics, Indian Institute of Technology Delhi(印度理工学院德里分校应用力学系) Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi(印度理工学院德里分校亚迪人工智能学院)

AI总结 提出跨设备神经算子框架,将平衡重建转化为算子学习问题,通过多几何预训练实现数据高效迁移,Wavelet Neural Operator在100个目标样本下达到低于4%的L2误差。

详情
AI中文摘要

磁流体动力学平衡的实时重建对于磁约束聚变中的等离子体成形、稳定性评估和反馈控制至关重要。然而,Grad-Shafranov平衡计算在很大程度上仍然是设备特定的和迭代的,限制了它们在延迟受限的控制环境中的应用。现有的神经方法可以加速单个平衡预测,但它们通常无法提供跨变化的等离子体边界或托卡马克几何形状的可重用模型。在这里,我们展示了平衡重建可以重新表述为跨设备算子学习问题。我们开发了一个特定领域的神经算子框架,将几何和剖面参数直接映射到极向磁通场,用摊销的算子推理取代重复的按需求解计算。使用可解析处理的Solov'ev族作为受控的Grad-Shafranov测试平台,我们在八种几何上不同的类托卡马克配置中生成平衡,并在四种迁移学习策略下对五种神经算子架构进行基准测试。单几何预训练对未见设备迁移效果差,而多几何预训练能够实现数据高效的适应。Wavelet Neural Operator在跨几何性能上最强,在100个标记目标平衡下达到低于4%的平均相对L2误差,在全微调下低于2%。预测的磁场满足无散约束至数值精度,四种架构实现毫秒或亚毫秒级推理。这些结果确定了神经算子预训练是实现跨聚变设备配置的可重用实时平衡推理的途径。

英文摘要

Real-time reconstruction of magnetohydrodynamic equilibria is essential for plasma shaping, stability assessment and feedback control in magnetic confinement fusion. However, Grad-Shafranov equilibrium calculations remain largely device-specific and iterative, limiting their use in latency-constrained control settings. Existing neural approaches can accelerate individual equilibrium predictions, but they do not generally provide reusable models across changing plasma boundaries or tokamak geometries. Here we show that equilibrium reconstruction can be recast as a cross-device operator learning problem. We develop a domain-specific neural operator framework that maps geometry and profile parameters directly to the poloidal flux field, replacing repeated solve-on-demand computation with amortized operator inference. Using the analytically tractable Solov'ev family as a controlled Grad-Shafranov testbed, we generate equilibria across eight geometrically distinct tokamak-like configurations and benchmark five neural operator architectures under four transfer-learning strategies. Single-geometry pretraining gives poor transfer to unseen devices, whereas multi-geometry pretraining enables data-efficient adaptation. The Wavelet Neural Operator gives the strongest cross-geometry performance, reaching mean relative L2 errors below 4% with 100 labelled target equilibria and below 2% with full fine-tuning. The predicted magnetic fields satisfy the divergence-free constraint to numerical precision, and four architectures achieve millisecond or sub-millisecond inference. These results identify neural operator pretraining as a route towards reusable, real-time equilibrium inference across fusion device configurations.
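The divergence-free result has a simple origin: in axisymmetric geometry the poloidal field is derived from the flux as B_R = −(1/R) ∂ψ/∂Z and B_Z = (1/R) ∂ψ/∂R, so ∇·B = (1/R) ∂(R B_R)/∂R + ∂B_Z/∂Z vanishes identically for any ψ, predicted or exact. The sketch checks this with finite differences and also defines a relative L2 error of the usual form (assumed here to match the paper's metric). The flux function is an arbitrary smooth test field, not a Solov'ev equilibrium.

```python
import numpy as np

R = np.linspace(0.5, 1.5, 101)
Z = np.linspace(-0.6, 0.6, 121)
RR, ZZ = np.meshgrid(R, Z, indexing="ij")  # axis 0 = R, axis 1 = Z

# Arbitrary smooth test flux (NOT a Solov'ev equilibrium); any psi works here.
psi = (RR**2 - 1.0) ** 2 / 4 + ZZ**2 * RR**2

dpsi_dR, dpsi_dZ = np.gradient(psi, R, Z)
B_R = -dpsi_dZ / RR
B_Z = dpsi_dR / RR

# Cylindrical divergence of the poloidal field: (1/R) d(R B_R)/dR + dB_Z/dZ.
# The discrete R- and Z-derivatives commute, so this cancels to round-off.
div_B = np.gradient(RR * B_R, R, axis=0) / RR + np.gradient(B_Z, Z, axis=1)


def relative_l2(pred, true):
    """||pred - true||_2 / ||true||_2 over all grid points."""
    return np.linalg.norm(pred - true) / np.linalg.norm(true)


print(np.abs(div_B).max())
```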

2606.16517 2026-06-16 cs.LG q-bio.QM 新提交

How Post-Training Shapes Biological Reasoning Models

后训练如何塑造生物学推理模型

Lukas Fesser, Hanlin Zhang, Michelle M. Li, Eric Wang, Bryan Perozzi, Shekoofeh Azizi, Sham M. Kakade, Marinka Zitnik

发表机构 * Harvard University(哈佛大学) Google DeepMind(谷歌DeepMind) Google Research(谷歌研究院)

AI总结 研究后训练各阶段(CPT、SFT、RL)对生物学推理模型领域内和领域外性能的影响,发现SFT提升领域内性能但损害泛化,RL可部分恢复泛化,最佳策略是短SFT加长RL。

详情
AI中文摘要

生物学科学推理模型将语言模型与在多模态生物数据(包括DNA、RNA和蛋白质)上训练的基础模型相结合。这些模型通过后训练构建,然而每个阶段如何塑造推理和泛化能力仍知之甚少。我们研究后训练何时提升性能以及何时导致过度专门化。在基因组学、转录组学和蛋白质领域,我们训练并评估了超过100个生物学推理模型,在骨干网络、持续预训练(CPT)、监督微调(SFT)和强化学习(RL)方面进行受控变化,并测量领域内(ID)和领域外(OOD)性能。我们发现每个后训练阶段以不同方式重塑泛化,而非贡献均匀增益。CPT通过使模型与生物语言对齐来提升下游性能。SFT持续提高ID性能,但导致OOD性能早期达到峰值并随着模型拟合训练分布而下降。RL在应用于具有对齐奖励的强SFT检查点时,改善OOD性能并部分恢复泛化。这些结果表明,生物学推理并非随着额外监督或计算而单调提升。相反,性能取决于训练阶段的组合方式。在固定后训练预算下,最强的ID-OOD权衡来自短暂的SFT、更大的RL分配以及各阶段间不对称的适应能力。

英文摘要

Scientific reasoning models for biology combine language models with foundation models trained on multimodal biological data, including DNA, RNA, and proteins. These models are built through post-training, yet how each stage shapes reasoning and generalization remains poorly understood. We study when post-training improves performance and when it induces over-specialization. Across genomics, transcriptomics, and proteins, we train and evaluate more than 100 biological reasoning models under controlled variation in backbone, continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), measuring both in-domain (ID) and out-of-domain (OOD) performance. We find that each post-training stage reshapes generalization in a distinct way rather than contributing uniform gains. CPT improves downstream performance by aligning models with biological language. SFT consistently increases ID performance but causes OOD performance to peak early and decline as models fit the training distribution. RL, when applied to strong SFT checkpoints with aligned rewards, improves OOD performance and partially recovers generalization. These results show that biological reasoning does not improve monotonically with additional supervision or compute. Instead, performance depends on how training stages are composed. Under fixed post-training budgets, the strongest ID-OOD trade-off comes from brief SFT, larger RL allocations, and asymmetric adaptation capacity across stages.

2606.14883 2026-06-16 cs.CV cs.LG 交叉投稿

Understanding Cross-Modal Contributions in Continual Vision-Language Models: A Theoretical Perspective

理解持续视觉-语言模型中的跨模态贡献:一个理论视角

Salimeh Sekeh, Mary Wisell

发表机构 * San Diego State University(圣地亚哥州立大学)

AI总结 本文从理论角度分析连续视觉-语言模型中跨模态(视觉-语言)贡献,提出新视角并通过实验验证其有效性,揭示任务顺序和相似性对贡献鲁棒性的影响,提升泛化性能。

详情
AI中文摘要

持续视觉-语言模型通常通过顺序微调来训练;然而,尽管这种范式能够适应新环境(任务),但它本质上侧重于新学习环境(任务)的贡献,以牺牲保持先前所获知识所需的稳定性为代价。虽然现有方法已经充分研究了视觉-语言模型(VLM)中的连续学习和灾难性遗忘,但跨一系列环境的模态特定贡献的理论理解仍然很大程度上未被探索。在本文中,我们提出了一个新的理论视角来理解跨模态(视觉-语言)对连续环境的贡献。我们在大型VLM上实证评估了我们的理论发现,并展示了它们在捕捉环境级跨模态贡献方面的有效性。我们的分析为持续VLM提供了更深入的见解,突出了它们对不同任务顺序和任务间相似性的贡献鲁棒性,以及它们改进的泛化性能。

英文摘要

Continual vision-language models are commonly trained through sequential fine-tuning; however, although this paradigm enables adaptation to new environments (tasks), it inherently emphasizes the contribution of newly learned environments (tasks) at the expense of the stability required to preserve previously acquired knowledge. While existing approaches have adequately studied continual learning and catastrophic forgetting in vision-language models (VLMs), the theoretical understanding of modality-specific contributions across a sequence of environments remains largely unexplored. In this paper, we present a new theoretical perspective to understand the cross-modal (vision-language) contributions to consecutive environments. We empirically evaluate our theoretical findings on large VLMs and demonstrate their effectiveness in capturing environment-level cross-modal contributions. Our analysis provides deeper insights into continual VLMs, highlighting their contribution robustness to varying task orders and inter-task similarities, and their improved generalization performance.

2606.15117 2026-06-16 cs.MM cs.AI cs.CV cs.LG cs.SD 交叉投稿

Teacher-Student Structure for Domain Adaptation in Ensemble Audio-Visual Video Deepfake Detection

用于集成视听视频深度伪造检测中领域适应的师生结构

Elham Abolhasani, Maryam Ramezani, Hamid R. Rabiee

发表机构 * Department of Computer Engineering, Sharif University of Technology(谢里夫理工大学计算机工程系)

AI总结 提出EAV-DFD方法,结合师生框架的领域适应机制,提升模型在未见领域上的泛化能力,在三个数据集上AUC分别提升4.09%、17.94%和0.5%。

详情
AI中文摘要

生成式AI模型的快速发展导致了更逼真的深度伪造媒体,包括对音频、视频或两者的操纵。这引发了严重的隐私和社会问题。该领域的许多研究已经取得了有前景的域内结果;然而,这些模型在面对来自不同领域的数据时,其有效性常常下降。因此,最近的深度伪造检测方法侧重于通过多种技术增强泛化能力,这些技术融合了所有输入模态,包括音频、图像及其交互。为此,我们提出了EAV-DFD,一种广义的深度集成视听模型,结合了利用师生框架的领域适应机制,以增强模型在未见领域上的表现和泛化能力。为了评估模型性能,我们使用FakeAVCeleb数据集作为主领域,DFDC、Deepfake_TIMIT和PolyGlotFake数据集作为未见领域。我们的实验结果表明,所提出的框架在领域适应方面是有效的,仅使用各未见数据集的一小部分训练学生模型,就在三个未见数据集上分别将模型的AUC提升了4.09%、17.94%和0.5%。这产生了一种新颖的深度伪造检测模型,能够适应新领域并解释哪个模态被操纵,突显了我们的方法在现实世界应用中的潜力。

英文摘要

The rapid advancement of generative AI models is leading to more realistic deepfake media, encompassing the manipulation of audio, video, or both. This raises severe privacy and societal concerns. Numerous studies in this area have yielded promising intra-domain results; however, these models frequently exhibit decreased efficacy when faced with data from dissimilar domains. Consequently, recent deepfake detection approaches focus on enhancing generalization through multiple techniques that incorporate all input modalities, including audio, images, and their interactions. In this regard, we propose EAV-DFD, a generalized deep ensemble audio-visual model combined with a domain adaptation mechanism that uses a teacher-student framework to enhance the model's ability to perform and generalize effectively across unseen domains. To evaluate the model's performance, we used the FakeAVCeleb dataset as the primary domain and the DFDC, Deepfake_TIMIT, and PolyGlotFake datasets as unseen domains. Our experimental results demonstrate that the proposed framework is effective for domain adaptation, improving the model's AUC by 4.09%, 17.94%, and 0.5% on the three unseen datasets, respectively, while using only a small portion of each to train the student model. This leads to a novel deepfake detection model capable of adapting to new domains and interpreting which modality has been manipulated, highlighting the potential of our approach for real-world applications.
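The abstract does not state which loss trains the student on the small labelled portion of each unseen domain. One common teacher-student choice, shown here as an assumption rather than the authors' method, is Hinton-style distillation: cross-entropy on the hard label plus T²-scaled KL divergence from temperature-softened teacher probabilities. The class order (0 = real, 1 = fake), temperature, and weight `alpha` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    ce = -math.log(softmax(student_logits)[label])
    pt = softmax(teacher_logits, temperature)
    ps = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(pt, ps) if t > 0)
    return alpha * ce + (1 - alpha) * temperature**2 * kl


# Hypothetical binary logits for one clip: index 0 = real, 1 = fake.
same = distillation_loss([2.0, -1.0], [2.0, -1.0], label=0)
diff = distillation_loss([2.0, -1.0], [-1.0, 2.0], label=0)
```

When the student already matches the teacher, the KL term is zero and only the supervised term remains.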

2606.15734 2026-06-16 cs.CL cs.AI cs.IR cs.LG 交叉投稿

Retrievable Gradients: Continual Post-Training Without Cumulative Weight Drift

可检索梯度:无累积权重漂移的持续后训练

Weihang Su, Jiacheng Kang, Jingyan Xu, Qingyao Ai, Jianming Long, Hanwen Zhang, Bangde Du, Xinyuan Cao, Min Zhang, Yiqun Liu

发表机构 * Department of Computer Science and Technology, Tsinghua University(清华大学计算机科学与技术系)

AI总结 提出ReGrad范式,将梯度作为可检索知识单元,通过元学习重塑文档梯度为通用适应信号,实现无权重漂移的可扩展参数知识注入。

详情
AI中文摘要

持续后训练使模型在部署后能够吸收新知识,但重复更新共享参数会累积权重漂移,可能导致灾难性遗忘并降低通用能力。检索增强生成避免了这种参数漂移,但往往缺乏参数化知识整合的深度。在本文中,我们提出ReGrad(可检索梯度),一种将梯度视为可检索知识单元的新范式。ReGrad离线预计算文档特定梯度,存储在索引化的梯度库中,并在推理时仅检索与查询相关的梯度以进行临时权重调整。然而,原始语言建模梯度针对词级文档重建而非查询驱动的知识使用进行优化。因此,我们引入双层元学习目标,将文档派生梯度重塑为下游任务的通用适应信号。在通用和特定领域设置上的实验表明,ReGrad优于CPT和RAG基线,实现了可扩展且可逆的参数知识注入,且不累积权重漂移。

英文摘要

Continual post-training enables models to absorb emerging knowledge after deployment, but repeatedly updating shared parameters can accumulate weight drift, potentially causing catastrophic forgetting and degrading general capabilities. Retrieval-augmented generation avoids such parameter drift, yet often lacks the depth of parametric knowledge integration. In this paper, we propose ReGrad (Retrievable Gradients), a new paradigm that treats gradients as retrievable units of knowledge. ReGrad pre-computes document-specific gradients offline, stores them in an indexed Gradient Bank, and retrieves only query-relevant gradients at inference time for temporary weight adaptation. However, raw language-modeling gradients are optimized for token-level document reconstruction rather than for query-driven knowledge use. We therefore introduce a bi-level meta-learning objective that reshapes document-derived gradients into generalizable adaptation signals for downstream tasks. Experiments across general and domain-specific settings show that ReGrad outperforms CPT and RAG baselines, enabling scalable and reversible parametric knowledge injection without accumulating weight drift.
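The retrieve-then-adapt loop can be sketched with flat vectors: the bank stores (document key, precomputed gradient) pairs, retrieval ranks keys by cosine similarity to a query embedding, and adaptation returns a temporary copy θ − η Σ g, leaving the base weights untouched (the reversibility the abstract mentions). The key embeddings, learning rate, `GradientBank` class, and toy values are hypothetical, and the meta-learned reshaping of gradients is not shown.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


class GradientBank:
    """Offline store of (document key embedding, precomputed gradient) pairs."""

    def __init__(self):
        self.keys, self.grads = [], []

    def add(self, key, grad):
        self.keys.append(key)
        self.grads.append(grad)

    def retrieve(self, query, k=1):
        ranked = sorted(range(len(self.keys)),
                        key=lambda i: cosine(query, self.keys[i]), reverse=True)
        return [self.grads[i] for i in ranked[:k]]


def adapted_weights(weights, bank, query, lr=1.0, k=1):
    """Return a temporary copy theta - lr * sum(retrieved grads); base weights stay as-is."""
    update = [0.0] * len(weights)
    for g in bank.retrieve(query, k):
        update = [u + gi for u, gi in zip(update, g)]
    return [w - lr * u for w, u in zip(weights, update)]


base = [0.0, 0.0, 0.0]
bank = GradientBank()
bank.add([1.0, 0.0], [0.1, 0.0, 0.0])   # made-up document 1
bank.add([0.0, 1.0], [0.0, -0.2, 0.0])  # made-up document 2
temp = adapted_weights(base, bank, query=[0.9, 0.1])
```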

2606.15778 2026-06-16 cs.CL cs.AI cs.LG cs.SI 交叉投稿

DYNA : Dynamic Episodic Memory Networks for Augmenting Large Language Models with Temporal Knowledge Graphs in Continuous Learning

DYNA:用于在持续学习中通过时间知识图谱增强大语言模型的动态情景记忆网络

Ali Sarabadani, Mahtab Tajvidiyan

发表机构 * Department of Computer Engineering and Information Technology, University of Qom(库姆大学计算机工程与信息科技系)

AI总结 提出DYNA框架,通过时间知识图谱作为外部可更新记忆,增强冻结的大语言模型,在三个时间召回任务上减少约7%的灾难性遗忘并提升约5%的时间排序能力。

详情
AI中文摘要

大语言模型(LLMs)难以在不遗忘或昂贵重训练的情况下融入新知识。我们提出DYNA,一个轻量级框架,通过时间知识图谱增强冻结的LLM,其中事件作为节点,时间关系作为有向、带时间戳的边。该图谱作为外部可更新记忆。在查询时,DYNA通过随机游走和中心性度量检索相关节点,然后增强LLM的响应。在三个时间召回任务上评估,DYNA相比微调减少了约7%的灾难性遗忘,相比标准RAG提升了约5%的时间排序能力。更高的图谱聚类系数与更好的检索相关,表明图谱结构的重要性。贡献:(1)将情景记忆作为时间知识图谱,(2)无需重训练的LLM增强,(3)图谱属性作为检索性能的预测因子。

英文摘要

Large Language Models (LLMs) struggle to incorporate new knowledge without forgetting or costly retraining. We propose DYNA, a lightweight framework that augments a frozen LLM with a temporal knowledge graph where events are nodes and temporal relations are directed, timestamped edges. The graph serves as an external, updatable memory. At query time, DYNA retrieves relevant nodes via random walks and centrality measures, then augments the LLM's response. Evaluated on three temporal recall tasks, DYNA reduces catastrophic forgetting by ~7% compared to fine-tuning and improves temporal ordering by ~5% over standard RAG. Higher graph clustering coefficients correlate with better retrieval, showing that graph structure matters. Contributions: (1) episodic memory as temporal KG, (2) retraining-free LLM augmentation, (3) graph properties as predictors of retrieval performance.
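A hypothetical sketch of retrieval over a temporal KG of events, combining random-walk visit frequency from seed events with degree centrality. Restricting walks to time-respecting steps (never following an edge earlier than the last one taken) is an added assumption, as are the scoring formula, weight `beta`, and walk counts; the abstract only says random walks and centrality measures are used.

```python
import random
from collections import Counter


def build_graph(edges):
    """edges: (src_event, dst_event, timestamp) -> directed adjacency lists."""
    graph = {}
    for src, dst, ts in edges:
        graph.setdefault(src, []).append((dst, ts))
        graph.setdefault(dst, [])
    return graph


def retrieve(graph, seeds, walks=200, length=3, beta=0.5, top_k=3, seed=0):
    """Rank events by walk-visit frequency from seeds plus beta * degree centrality."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(walks):
        node, now = rng.choice(seeds), float("-inf")
        for _ in range(length):
            visits[node] += 1
            # time-respecting step: only edges not earlier than the last one taken
            nbrs = [(dst, ts) for dst, ts in graph[node] if ts >= now]
            if not nbrs:
                break
            node, now = rng.choice(nbrs)
    in_deg = Counter(dst for nbrs in graph.values() for dst, _ in nbrs)
    n = len(graph)

    def score(v):
        degree_centrality = (len(graph[v]) + in_deg[v]) / max(n - 1, 1)
        return visits[v] / (walks * length) + beta * degree_centrality

    return sorted(graph, key=score, reverse=True)[:top_k]


# Made-up event chain A -> B -> C -> D plus an unrelated pair X -> Y.
graph = build_graph([("A", "B", 1), ("B", "C", 2), ("C", "D", 3), ("X", "Y", 5)])
top = retrieve(graph, seeds=["A"])
```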

2606.16256 2026-06-16 cs.CV cs.LG 交叉投稿

KeepLoRA++: Continual Learning with Layer-Scaled Residual Gradient Adaptation

KeepLoRA++: 基于层级缩放残差梯度适应的持续学习

Mao-Lin Luo, Yi-Lin Zhang, Zi-Hao Zhou, Yankun Hong, Xialiang Tong, Mingxuan Yuan, Tong Wei, Min-Ling Zhang

发表机构 * School of Computer Science and Engineering, Southeast University(东南大学计算机科学与工程学院) Key Laboratory of Computer Network and Information Integration, Southeast University, Ministry of Education(东南大学计算机网络和信息集成教育部重点实验室) Huawei Noah’s Ark Lab(华为诺亚方舟实验室)

AI总结 针对预训练视觉语言模型持续学习中保留预训练知识、旧任务知识和学习新知识的冲突,提出KeepLoRA++,通过层级缩放残差梯度适应方法,限制LoRA参数更新到残差子空间并采用浅到深层缩放,平衡三者,在图像分类、视觉问答和视频理解任务上优于基线。

详情
AI中文摘要

预训练视觉语言模型的持续学习需要平衡三个相互竞争的目标:保留预训练知识、保留一系列已学习任务的知识以及保持获取新知识的可塑性。本文提出KeepLoRA++,通过统一的二维知识保留机制来平衡这些目标。我们从层间和层内两个角度分析Transformer架构的知识分布。层间视角考察知识保留如何跨层分布,而层内视角关注每层内的参数空间。我们的分析揭示了一个结构特性:通用可迁移知识主要编码在浅层和参数的主子空间中,而任务特定适应则定位于深层和残差子空间。受此启发,KeepLoRA++引入了一种层级缩放残差梯度适应方法。新任务的学习通过将LoRA参数更新限制在残差子空间,并结合从浅到深的层级缩放来实现,以防止干扰先前获得的能力。具体而言,新任务的梯度被投影到与预训练模型主子空间以及先前任务特征主导方向正交的子空间上,同时为浅层分配较小的更新幅度,为深层分配较大的更新幅度。我们的理论分析和实证评估证实,KeepLoRA++成功平衡了这三个相互竞争的目标,在图像分类、视觉问答和视频理解任务上持续优于代表性基线。

英文摘要

Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents KeepLoRA++, which balances these objectives through a unified dual-dimensional knowledge retention mechanism. We analyze the knowledge distribution of Transformer architectures from both inter-layer and intra-layer perspectives. The inter-layer perspective examines how retention is distributed across layers, while the intra-layer perspective focuses on the parameter space within each layer. Our analysis reveals a structural property: general transferable knowledge is mainly encoded in the shallow layers and the principal subspace of the parameters, while task-specific adaptations are localized in the deep layers and the residual subspace. Motivated by this insight, KeepLoRA++ introduces a layer-scaled residual gradient adaptation method. New tasks are learned by restricting LoRA parameter updates to the residual subspace, combined with a shallow-to-deep layer scaling, to prevent interference with previously acquired capabilities. Specifically, the gradient of a new task is projected onto a subspace orthogonal to both the principal subspace of the pre-trained model and the dominant directions of previous task features, while simultaneously assigning smaller update magnitudes to shallow layers and larger ones to deeper layers. Our theoretical analysis and empirical evaluations confirm that KeepLoRA++ successfully balances these three competing objectives, consistently outperforming representative baselines across image classification, visual question answering, and video understanding tasks.
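The projection step described above can be sketched for one weight gradient G (out × in): build an orthonormal basis U spanning the principal input directions of the pre-trained weight and the dominant directions of earlier-task features, take G(I − UUᵀ), and scale by a depth-dependent factor. The ranks, the linear shallow-to-deep schedule, the random stand-in matrices, and projecting the full weight gradient (rather than the LoRA factors, whose handling the abstract does not detail) are all assumptions.

```python
import numpy as np


def top_directions(mat, rank):
    """Top-`rank` left singular vectors of `mat` (columns are directions)."""
    u, _, _ = np.linalg.svd(mat, full_matrices=False)
    return u[:, :rank]


def residual_project(grad, basis):
    """Remove input-space components in span(basis): G (I - U U^T)."""
    return grad - (grad @ basis) @ basis.T


def layer_scale(layer_idx, num_layers, s_min=0.1, s_max=1.0):
    """Linear shallow-to-deep scaling: small updates early, larger updates late."""
    if num_layers == 1:
        return s_max
    return s_min + (s_max - s_min) * layer_idx / (num_layers - 1)


rng = np.random.default_rng(0)
d_in, d_out = 8, 6
w_pre = rng.normal(size=(d_out, d_in))     # stand-in pre-trained weight
feats_prev = rng.normal(size=(d_in, 20))   # stand-in features from earlier tasks
basis = np.linalg.qr(np.hstack([
    top_directions(w_pre.T, 2),            # principal input directions of W
    top_directions(feats_prev, 2),         # dominant previous-task directions
]))[0]                                     # re-orthonormalize the union
grad = rng.normal(size=(d_out, d_in))
update = layer_scale(0, 12) * residual_project(grad, basis)  # shallowest layer
```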

2312.06173 2026-06-16 cs.LG 版本更新

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

基于具体子空间学习的多任务模型融合干扰消除

Anke Tang, Xianglin Luo, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, Dacheng Tao

发表机构 * School of Computer Science, Wuhan University(武汉大学计算机学院) Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)) Sun Yat-sen University(中山大学) Alibaba Group(阿里巴巴集团) School of Information and Electronics, Beijing Institute of Technology(北京理工大学信息与电子学院) Nanyang Technological University(南洋理工大学)

AI总结 提出连续松弛离散子空间学习方法,通过元学习框架识别低维共享子空间,解决多任务模型融合中的参数冲突问题,在视觉和语言任务上验证有效性。

详情
AI中文摘要

从共同的大规模预训练模型微调得到但专用于不同任务的模型合并,已被证明是一种廉价且可扩展的策略,用于构建在各种任务上表现良好的多任务模型。最近的研究,以任务算术为例,强调这种多任务模型可以通过任务向量上的算术运算得到。然而,当前的融合技术通常通过评估单个属性(如参数的大小或符号)来解决任务特定模型参数之间的潜在冲突,忽略了它们对模型整体功能的集体影响。在这项工作中,我们提出了连续松弛离散(Concrete)子空间学习方法,以识别一个共同的低维子空间,并利用其共享信息来应对干扰问题,而不会牺牲太多性能。具体来说,我们将问题建模为双层优化问题,并引入一个元学习框架,通过基于梯度的技术找到具体的子空间掩码。在上层,我们专注于学习一个共享的具体掩码来识别子空间,而在内层,执行模型合并以最大化合并模型的性能。我们在视觉领域和语言领域进行了广泛的实验,结果证明了我们方法的有效性。代码可在以下网址获取:https://github.com/tanganke/subspace_fusion

英文摘要

Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, highlights that this multi-task model can be derived through arithmetic operations on task vectors. Nevertheless, current merging techniques frequently resolve potential conflicts among parameters from task-specific models by evaluating individual attributes, such as the parameters' magnitude or sign, overlooking their collective impact on the overall functionality of the model. In this work, we propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing much performance. Specifically, we model the problem as a bi-level optimization problem and introduce a meta-learning framework to find the Concrete subspace mask through gradient-based techniques. At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model. We conduct extensive experiments in both the vision and language domains, and the results demonstrate the effectiveness of our method. The code is available at https://github.com/tanganke/subspace_fusion
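Two building blocks named above can be sketched in plain Python: a binary Concrete (relaxed Bernoulli) sample, m = σ((logit + log u − log(1 − u)) / τ), which is differentiable in the mask logits, and task-arithmetic merging restricted by that shared mask, θ = θ₀ + λ Σᵢ m ⊙ τᵢ. The scale λ, temperature, toy vectors, and function names are hypothetical, and the bi-level optimization of the logits is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def concrete_mask(logits, temperature, rng):
    """One binary Concrete (relaxed Bernoulli) sample per parameter, between 0 and 1."""
    mask = []
    for logit in logits:
        u = rng.uniform(1e-6, 1 - 1e-6)  # keep the log terms finite
        mask.append(sigmoid((logit + math.log(u) - math.log(1 - u)) / temperature))
    return mask


def merge(pretrained, task_vectors, mask, scale=0.3):
    """Task arithmetic restricted by a shared mask: theta_0 + scale * sum_i m * tau_i."""
    merged = list(pretrained)
    for tau in task_vectors:
        merged = [p + scale * m * t for p, m, t in zip(merged, mask, tau)]
    return merged


rng = random.Random(0)
mask = concrete_mask([4.0, -4.0, 0.0], temperature=0.5, rng=rng)
pretrained = [1.0, 1.0, 1.0]
task_vectors = [[0.5, -0.5, 0.2], [0.3, 0.4, -0.1]]  # made-up task vectors
merged = merge(pretrained, task_vectors, mask)
```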

2601.18699 2026-06-16 cs.LG cs.CL Updated

Mechanistic Analysis of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning

大型语言模型在持续微调过程中灾难性遗忘的机制分析

Gustav Olaf Yunus Laitinen-Fredriksson Lundstrom-Imanov

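Affiliations: Division of Statistics and Machine Learning (STIMA), Department of Computer and Information Science (IDA), Linköping University (统计与机器学习系(STIMA)、计算机与信息科学系(IDA)、林雪平大学)

AI summary: Systematically compares catastrophic forgetting during continual fine-tuning across 20 top LLMs, localizes the neural circuits most susceptible to parameter overwriting through behavioral analysis and mechanistic interpretation, and proposes Low-Rank Circuit Projection (LRCP), which recovers up to 94.2% of ancestral capabilities in open-weight models.

Comments 12 pages, 8 figures, 5 tables. Preprint submitted to Elsevier

AI Chinese Abstract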

大型语言模型(LLMs)在适应目标任务时的顺序微调常常引发灾难性遗忘,即获取新目标技能会削弱原有能力。本文对代表2026年中期的二十个顶级模型进行了灾难性遗忘的系统比较研究。我们将研究分为两条主线:(i)对十个领先闭源模型(包括Claude Fable 5、GPT-5.5 High和Gemini 3.5 Flash)的行为和语义输出漂移分析;(ii)对十个著名开放权重架构(如DeepSeek-V4-Pro、Llama 4 Maverick和Qwen 3.6-27B)的深度机制解释。通过权重空间轨迹追踪、中心核对齐(CKA)以及混合专家(MoE)层中的路由门漂移计算,我们定位了高度易受参数覆盖的神经回路。我们的发现表明,早期层的注意力头表现出系统性熵扩散,而中深层的前馈网络(或稀疏专家块)则遭受局部表示崩溃。基于这些见解,我们引入了低秩电路投影(LRCP),一种子空间正则化的训练干预。实证评估显示,LRCP在开放权重配置中成功恢复了高达94.2%的祖先能力,并匹配了标准PEFT基线的适应速度。

English Abstract

Sequential fine-tuning of Large Language Models (LLMs) for adaptation to target tasks often triggers catastrophic forgetting, where the acquisition of novel target skills degrades ancestral capabilities. This paper presents a systematic comparative study of catastrophic forgetting across twenty premier models representing the state of the art in mid-2026. We organize our investigation into two primary research lines: (i) a behavioral and semantic output drift analysis of ten leading closed-source models (including Claude Fable 5, GPT-5.5 High, and Gemini 3.5 Flash), and (ii) a deep mechanistic interpretation of ten prominent open-weight architectures (such as DeepSeek-V4-Pro, Llama 4 Maverick, and Qwen 3.6-27B). Through weight-space trajectory tracking, Centered Kernel Alignment (CKA), and routing gate drift calculations in Mixture-of-Experts (MoE) layers, we localize the neural circuits highly susceptible to parameter overwriting. Our findings indicate that early-layer attention heads exhibit systemic entropic dispersion, while mid-to-deep feed-forward networks (or sparse expert blocks) suffer localized representation collapse. Informed by these insights, we introduce Low-Rank Circuit Projection (LRCP), a subspace-regularized training intervention. Empirical evaluations show that LRCP mitigates up to 94.2% of the loss of ancestral capabilities in open-weight configurations while matching the adaptation velocity of standard PEFT baselines.
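
Centered Kernel Alignment is one of the tools this abstract uses to compare representations before and after fine-tuning. As a point of reference, here is a minimal pure-Python sketch of linear CKA (Kornblith et al., 2019) between two activation matrices recorded on the same inputs. The abstract does not say which kernel the study uses, so the linear variant and the toy activations are assumptions.

```python
import math


def center_columns(m):
    """Subtract each column's mean (rows = inputs, columns = features)."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in m]


def cross_frobenius_sq(a, b):
    """Squared Frobenius norm ||A^T B||_F^2 for A (n x p) and B (n x q)."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(a[r][i] * b[r][j] for r in range(len(a)))
            total += s * s
    return total


def linear_cka(x, y):
    """Linear CKA between activations x (n x p) and y (n x q) on the same n inputs.

    CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) after column-centering.
    1.0 means identical geometry up to rotation and isotropic scaling.
    """
    xc, yc = center_columns(x), center_columns(y)
    return cross_frobenius_sq(xc, yc) / math.sqrt(
        cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc)
    )


# Usage: hypothetical activations of one layer before and after fine-tuning.
acts_before = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [4.0, 4.0]]
acts_rotated = [[-b, a] for a, b in acts_before]  # same features, rotated 90 degrees
acts_collapsed = [[1.0], [0.0], [0.0], [0.0]]      # a very different representation
print(round(linear_cka(acts_before, acts_rotated), 6))  # 1.0
print(linear_cka(acts_before, acts_collapsed))           # strictly between 0 and 1
```

A rotation leaves CKA at 1.0, which is why a drop in CKA between checkpoints signals a genuine change in representational geometry rather than a harmless re-parameterization.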

2606.00558 2026-06-16 cs.LG Updated

Semi-Supervised Noise Adaptation: Transferring Knowledge from Noise Domain

半监督噪声适应:从噪声域迁移知识

Yuan Yao, Jin Song, Huixia Li, Tongtong Yuan, Jiaqi Wu, Yu Zhang

Affiliations: Guangdong Laboratory of Artificial Intelligence and Digital Economy (广东人工智能与数字经济实验室); Nanjing University of Posts and Telecommunications (南京邮电大学); Beijing Jiaotong University (北京交通大学); Beijing University of Technology (北京工业大学); Tsinghua University (清华大学); Southern University of Science and Technology (南方科技大学)

AI summary: Formulates the Semi-Supervised Noise Adaptation (SSNA) problem, which uses a synthetic noise domain as the source domain, and improves target-domain generalization with a Noise Adaptation Framework (NAF).

Comments Accepted by ICML 2026

AI Chinese Abstract

迁移学习旨在通过从源域迁移知识来促进目标域的学习。源域通常包含语义上有意义的样本(例如图像),以促进有效的知识迁移。然而,最近的一项研究观察到,由简单分布(例如高斯分布)构建的噪声域可以在半监督设置中作为替代源域,其中只有一小部分目标样本被标记,而大多数样本未标记。基于这一令人惊讶的观察,我们提出了一种称为半监督噪声适应(SSNA)的新问题,旨在利用合成噪声域来提高目标域的泛化能力。为了解决这个问题,我们首先建立了一个泛化界,描述了噪声域对泛化的影响,基于此我们提出了噪声适应框架(NAF)。大量实验表明,NAF有效地利用噪声域来收紧目标域的泛化界,从而提高了性能。代码可在 https://github.com/AIResearch-Group/SSNA 获取。

English Abstract

Transfer learning aims to facilitate the learning of a target domain by transferring knowledge from a source domain. The source domain typically contains semantically meaningful samples (*e.g.*, images) to facilitate effective knowledge transfer. However, a recent study observes that the noise domain constructed from simple distributions (*e.g.*, Gaussian distributions) can serve as a surrogate source domain in the semi-supervised setting, where only a small proportion of target samples are labeled while most remain unlabeled. Based on this surprising observation, we formulate a novel problem termed *Semi-Supervised Noise Adaptation* (SSNA), which aims to leverage a synthetic noise domain to improve the generalization of the target domain. To address this problem, we first establish a generalization bound characterizing the effect of the noise domain on generalization, based on which we propose a Noise Adaptation Framework (NAF). Extensive experiments demonstrate that NAF effectively leverages the noise domain to tighten the generalization bound of the target domain, leading to improved performance. The code is available at https://github.com/AIResearch-Group/SSNA.

2507.02288 2026-06-16 cs.CV cs.LG Updated

Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

基于语言引导与表示对齐的提示解缠用于域泛化

De Cheng, Zhipeng Xu, Xinyang Jiang, Dongsheng Li, Nannan Wang, Xinbo Gao

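Affiliations: School of Telecommunications Engineering, the State Key Laboratory of Integrated Services Networks (ISN), Xidian University, Xi’an, China (电信工程学院、集成服务网络国家重点实验室(ISN)、西安电子科技大学); Microsoft Research Asia, Shanghai, China (微软亚洲研究院,上海,中国)

AI summary: Uses a large language model to automatically disentangle text prompts and introduces Worst Explicit Representation Alignment, which adds abstract prompts to increase source-domain diversity, in order to learn domain-invariant visual representations. The method outperforms existing approaches on multiple benchmarks.

Journal ref
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 6, pp. 6799-6816, June 2026
AI Chinese Abstract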

域泛化(DG)旨在开发一个能够在未见过的目标域上有效执行的通用模型。值得注意的是,预训练视觉基础模型(VFM)如CLIP的最新进展,已显示出增强深度学习模型泛化能力的巨大潜力。尽管基于VFM的域提示调整在DG中受到越来越多的关注,但设计能够解缠跨域不变特征的提示仍然是一个关键挑战。在本文中,我们提出通过利用VFM的可控且灵活的语言提示来解决这一挑战。注意到VFM的文本模态自然更容易解缠,我们引入了一个新颖的文本特征引导的视觉提示调整框架。该框架首先使用大语言模型(LLM)自动解缠文本提示,然后学习由解缠文本特征引导的域不变视觉表示。然而,仅依赖语言来引导视觉特征解缠存在局限性,因为视觉特征有时可能过于复杂或微妙,难以被描述性文本完全捕捉。为解决这一问题,我们引入了最差显式表示对齐(WERA),它通过添加一组额外的抽象提示来扩展文本引导的视觉提示。这些提示通过风格化图像增强来增强源域多样性,而对齐约束确保视觉表示在原始分布和增强分布上保持一致。在包括PACS、VLCS、OfficeHome、DomainNet和TerraInc在内的主要DG数据集上进行的实验表明,我们提出的方法优于最先进的DG方法。

English Abstract

Domain Generalization (DG) seeks to develop a versatile model capable of performing effectively on unseen target domains. Notably, recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have demonstrated considerable potential in enhancing the generalization capabilities of deep learning models. Despite the increasing attention toward VFM-based domain prompt tuning within DG, the effective design of prompts capable of disentangling invariant features across diverse domains remains a critical challenge. In this paper, we propose addressing this challenge by leveraging the controllable and flexible language prompt of the VFM. Noting that the text modality of VFMs is naturally easier to disentangle, we introduce a novel framework for text feature-guided visual prompt tuning. This framework first automatically disentangles the text prompt using a large language model (LLM) and then learns domain-invariant visual representation guided by the disentangled text feature. However, relying solely on language to guide visual feature disentanglement has limitations, as visual features can sometimes be too complex or nuanced to be fully captured by descriptive text. To address this, we introduce Worst Explicit Representation Alignment (WERA), which extends text-guided visual prompts by incorporating an additional set of abstract prompts. These prompts enhance source domain diversity through stylized image augmentations, while alignment constraints ensure that visual representations remain consistent across both the original and augmented distributions. Experiments conducted on major DG datasets, including PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that our proposed method outperforms state-of-the-art DG methods.

2510.10981 2026-06-16 stat.ML cs.LG Updated

In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning

上下文学习可证明是贝叶斯推断:元学习的泛化理论

Tomoya Wakayama, Taiji Suzuki

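Affiliations: University of Tokyo (东京大学); National Institute of Information and Communications Technology (信息通信技术国家研究所)

AI summary: Within a meta-learning framework, decomposes the total in-context learning risk into a Bayes gap and a posterior variance, and shows that a Transformer selects the optimal meta-algorithm during pretraining and, at test time, rapidly converges to the optimal algorithm for the true task.

AI Chinese Abstract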

本文在元学习框架下,为上下文学习(ICL)发展了一个有限样本统计理论,该框架能够容纳多种任务类型的混合。我们引入了一个原则性的风险分解,将总ICL风险分解为两个正交分量:贝叶斯差距和后验方差。贝叶斯差距量化了训练模型逼近贝叶斯最优上下文预测器的程度。对于均匀注意力Transformer,我们推导出该差距的非渐近上界,明确阐明了其对预训练提示数量及其上下文长度的依赖关系。后验方差是一个与模型无关的风险,代表内在的任务不确定性。我们的关键发现是,该项仅由真实底层任务的难度决定,而任务混合带来的不确定性随着少量上下文示例呈指数级消失。这些结果共同提供了ICL的统一视角:Transformer在预训练期间选择最优元算法,并在测试时快速收敛到真实任务的最优算法。

English Abstract

This paper develops a finite-sample statistical theory for in-context learning (ICL), analyzed within a meta-learning framework that accommodates mixtures of diverse task types. We introduce a principled risk decomposition that separates the total ICL risk into two orthogonal components: Bayes Gap and Posterior Variance. The Bayes Gap quantifies how well the trained model approximates the Bayes-optimal in-context predictor. For a uniform-attention Transformer, we derive a non-asymptotic upper bound on this gap, which explicitly clarifies the dependence on the number of pretraining prompts and their context length. The Posterior Variance is a model-independent risk representing the intrinsic task uncertainty. Our key finding is that this term is determined solely by the difficulty of the true underlying task, while the uncertainty arising from the task mixture vanishes exponentially fast with only a few in-context examples. Together, these results provide a unified view of ICL: the Transformer selects the optimal meta-algorithm during pretraining and rapidly converges to the optimal algorithm for the true task at test time.
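
The claim that uncertainty from the task mixture "vanishes exponentially fast" can be seen in a much simpler toy than the paper's setting: an exact Bayesian posterior over a finite set of candidate tasks. The sketch below assumes two Gaussian-mean tasks with known noise. It illustrates the exponential decay only and is not the paper's Transformer analysis or its bound.

```python
import math


def task_posterior(examples, task_means, prior=None, noise_std=1.0):
    """Posterior over candidate tasks after observing in-context examples.

    Toy assumption: under task k each example is x ~ N(task_means[k], noise_std^2).
    The Gaussian normalizing constant is shared by all tasks, so it cancels.
    """
    k = len(task_means)
    prior = prior or [1.0 / k] * k
    log_post = []
    for p, mu in zip(prior, task_means):
        log_lik = sum(-0.5 * ((x - mu) / noise_std) ** 2 for x in examples)
        log_post.append(math.log(p) + log_lik)
    top = max(log_post)  # subtract the max before exp() for numerical stability
    unnorm = [math.exp(v - top) for v in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Usage: two candidate tasks (means -1 and +1); the true task has mean +1.
for n in (0, 1, 3, 5):
    wrong = task_posterior([1.0] * n, task_means=[-1.0, 1.0])[0]
    print(n, wrong)  # equals 1 / (1 + exp(2n)): exponential decay in n
```

Each example at x = 1 adds a log-likelihood ratio of 2 in favor of the true task, so the posterior mass on the wrong task falls as 1 / (1 + e^(2n)).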

2511.17228 2026-06-16 quant-ph cs.LG Updated

Intrinsic preservation of plasticity in continual quantum learning

连续量子学习中可塑性的内在保持

Yu-Qin Chen, Shi-Xin Zhang

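Affiliations: Graduate School of China Academy of Engineering Physics (中国工程物理研究院研究生部); Institute of Physics, Chinese Academy of Sciences (中国科学院物理研究所)

AI summary: Quantum learning models, through intrinsic physical constraints such as unitary transformations, naturally overcome the loss of plasticity seen in classical deep learning and retain long-term learning ability across supervised learning, reinforcement learning, and other tasks.

Comments 17 pages, 13 figures

AI Chinese Abstract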

动态现实环境中的人工智能需要具备持续学习的能力。然而,标准深度学习面临一个基本问题:可塑性丧失,即网络逐渐失去从新数据中学习的能力。在这里,我们展示量子学习模型自然地克服了这一限制,在长时间尺度上保持可塑性。我们在来自多个学习范式(包括监督学习和强化学习)以及多种数据模态(从经典高维图像到量子原生数据集)的广泛任务中系统地证明了这一优势。尽管经典模型表现出与无界权重和梯度增长相关的性能退化,但量子神经网络无论数据或任务如何都保持一致的学习能力。我们将这一优势的根源归因于量子模型的内在物理约束。与经典网络中无界权重增长导致景观崎岖或饱和不同,酉约束将优化限制在一个紧致流形上。我们的结果表明,量子计算在机器学习中的效用不仅限于潜在的加速,还为构建自适应人工智能和终身学习者提供了一条稳健的途径。

English Abstract

Artificial intelligence in dynamic, real-world environments requires the capacity for continual learning. However, standard deep learning suffers from a fundamental issue: loss of plasticity, in which networks gradually lose their ability to learn from new data. Here we show that quantum learning models naturally overcome this limitation, preserving plasticity over long timescales. We demonstrate this advantage systematically across a broad spectrum of tasks from multiple learning paradigms, including supervised learning and reinforcement learning, and diverse data modalities, from classical high-dimensional images to quantum-native datasets. Although classical models exhibit performance degradation correlated with unbounded weight and gradient growth, quantum neural networks maintain consistent learning capabilities regardless of the data or task. We identify the origin of the advantage as the intrinsic physical constraints of quantum models. Unlike classical networks where unbounded weight growth leads to landscape ruggedness or saturation, the unitary constraints confine the optimization to a compact manifold. Our results suggest that the utility of quantum computing in machine learning extends beyond potential speedups, offering a robust pathway for building adaptive artificial intelligence and lifelong learners.

2603.24350 2026-06-16 cs.RO cs.AI cs.LG Updated

Evidence of an Emergent "Self" in Continual Robot Learning

持续机器人学习中涌现的“自我”证据

Adidev Jhunjhunwala, Judah Goldfeder, Hod Lipson

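Affiliations: Creative Machines Lab, Department of Mechanical Engineering, Columbia University (创意机器实验室,机械工程系,哥伦比亚大学); Creative Machines Lab, Department of Computer Science, Columbia University (创意机器实验室,计算机科学系,哥伦比亚大学)

AI summary: By comparing the cognitive structure of a robot trained on a constant task with that of a robot under continual learning, finds that the continually learning robot develops a significantly more stable invariant subnetwork that is critical for adaptation, offering a principled way to quantify an intelligent system's concept of self.

Comments 44 pages, 24 figures, includes supplementary materials

AI Chinese Abstract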

理解自我意识的一个关键挑战是,如何以原则性的方式量化一个智能系统是否具有“自我”概念,以及如果存在,如何将“自我”与其他认知结构区分开来。我们提出,可以通过寻找认知过程中相对于快速获得的认知技能变化较小的不变部分来隔离“自我”——因为我们的自我是我们经验中最持久的方面。我们利用这一原则分析了两种条件下机器人的认知结构:一个机器人学习恒定任务,而另一个在可变任务下进行持续学习。我们发现,经历持续学习的机器人形成了一个不变子网络,该子网络比对照组显著更稳定(p < 0.001),并且该子网络在功能上也很重要:保留它有助于适应,而破坏它会损害性能。我们在跨越运动控制和操作的三种不同机器人上验证了这一模式。

English Abstract

A key challenge to understanding self-awareness has been finding a principled way of quantifying whether an intelligent system has a concept of a "self", and if so, how to differentiate the "self" from other cognitive structures. We propose that the "self" can be isolated by seeking the invariant portion of the cognitive process that changes relatively little compared to more rapidly acquired cognitive skills, because our self is the most persistent aspect of our experiences. We used this principle to analyze the cognitive structure of robots under two conditions: one robot learns a constant task, while a second undergoes continual learning under variable tasks. We find that robots subjected to continual learning develop an invariant subnetwork that is significantly more stable (p < 0.001) compared to the control, and that this subnetwork is also functionally important: preserving it aids adaptation while damaging it impairs performance. We validate this pattern across three different robots spanning locomotion and manipulation.
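
The selection criterion described above, isolating the part of the network that changes least while skills are acquired, can be illustrated with a per-parameter drift score across checkpoints. This sketch is hypothetical: the abstract does not give the paper's stability measure, its granularity (weights, neurons, or modules), or any cut-off, so the summed absolute change and the `fraction` parameter are assumptions.

```python
def invariant_subnetwork(checkpoints, fraction=0.2):
    """Indices of the `fraction` of parameters with the smallest total absolute change
    across consecutive checkpoints (a hypothetical stability criterion)."""
    n = len(checkpoints[0])
    drift = [0.0] * n
    for before, after in zip(checkpoints, checkpoints[1:]):
        for i in range(n):
            drift[i] += abs(after[i] - before[i])
    k = max(1, int(fraction * n))
    most_stable = sorted(range(n), key=lambda i: drift[i])[:k]
    return sorted(most_stable)


# Usage: hypothetical flattened weights saved after each of three tasks.
checkpoints = [
    [0.0, 0.0, 0.0, 0.0],   # after task 1
    [1.0, 0.1, 2.0, 0.0],   # after task 2
    [0.0, 0.1, 4.0, 0.05],  # after task 3
]
print(invariant_subnetwork(checkpoints, fraction=0.5))  # [1, 3]
```

The ablation test in the abstract would then compare adaptation when these indices are preserved versus perturbed.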

11. Datasets, Benchmarks, and Evaluation (78 papers)

2606.14965 2026-06-16 cs.LG cs.DB New submission

Benchmarking Instance-Dependent Label Noise with Controlled Corruptions

具有受控扰动的实例相关标签噪声基准测试

Shadman Islam, Agustinus Kristiadi, Mostafa Milani

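Affiliations: University of California, Berkeley (加州大学伯克利分校)

AI summary: Proposes CILN, a framework that generates instance-dependent label noise through controlled input corruptions, builds 90 benchmark settings, and shows that noise structure strongly affects algorithm behavior.

Comments 12-page conference submission

AI Chinese Abstract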

合成实例相关标签噪声(IDN)基准被广泛用于评估噪声标签学习方法,但现有方法通常通过不完美的标注器或分类器评分器生成噪声,模糊性的来源隐含。我们引入CILN,一种通过受控输入扰动创建IDN的基准生成框架。一个多样化的投票池对受扰动的实例进行标注,产生基准数据集,其中模糊性的来源和严重程度都是明确且可控的。使用CIFAR-10、MNIST和Adult,我们构建了跨越多个扰动族和严重级别的90个基准设置。我们的实验表明,生成的基准展现出真正的实例相关噪声,提供多样化的混淆结构,并且在CIFAR-10上,可以产生比现有合成IDN基准更接近人类不确定性的标签分布。我们进一步证明,扰动介导的IDN可以暴露流行噪声标签学习方法(包括Co-Teaching和DivideMix)的失败模式,这些模式在可比水平的评分者错误噪声下未被观察到。这些发现表明,噪声结构(而不仅仅是噪声率)在基准难度和算法行为中起着重要作用。通过使模糊性生成明确且可控,CILN为研究在不同实例难度来源下的噪声标签学习提供了一个补充性的基准测试框架。

English Abstract

Synthetic instance-dependent label noise (IDN) benchmarks are widely used to evaluate noisy-label learning methods, yet existing approaches typically generate noise through imperfect annotators or classifier raters, leaving the source of ambiguity implicit. We introduce CILN, a benchmark generation framework that creates IDN through controlled input corruptions. A diverse voter pool labels corrupted instances, producing benchmark datasets in which both the source and severity of ambiguity are explicit and controllable. Using CIFAR-10, MNIST, and Adult, we construct 90 benchmark settings spanning multiple corruption families and severity levels. Our experiments show that the resulting benchmarks exhibit genuine instance-dependent noise, provide diverse confusion structures, and, on CIFAR-10, can produce label distributions that are closer to human uncertainty than an existing synthetic IDN benchmark. We further demonstrate that corruption-mediated IDN can expose failure modes of popular noisy-label learning methods, including Co-Teaching and DivideMix, that are not observed under comparable levels of rater-fallibility noise. These findings suggest that noise structure, not only noise rate, plays an important role in benchmark difficulty and algorithm behavior. By making ambiguity generation explicit and controllable, CILN provides a complementary benchmarking framework for studying noisy-label learning under diverse sources of instance difficulty.
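
The corrupt-then-vote idea can be illustrated on a one-dimensional toy: votes from a pool of labelers on a corrupted instance become a soft label distribution, and an observed noisy label is drawn from it. The threshold voters, the scalar "instance", and the sampling rule are hypothetical placeholders; CILN's actual corruption families, voter pool, and aggregation rule are not specified in this abstract.

```python
import random
from collections import Counter


def vote_distribution(instance, voters, num_classes):
    """Fraction of voters that assign each class to one (corrupted) instance."""
    counts = Counter(voter(instance) for voter in voters)
    return [counts[c] / len(voters) for c in range(num_classes)]


def sample_noisy_label(distribution, rng=random):
    """Draw one observed label from the vote distribution.

    The distribution depends on the instance, so the resulting noise is instance-dependent."""
    return rng.choices(range(len(distribution)), weights=distribution, k=1)[0]


# Hypothetical voter pool: binary classifiers with different decision thresholds.
voters = [lambda x, t=t: int(x > t) for t in (0.3, 0.4, 0.5, 0.6, 0.7)]
clean, corrupted = 0.9, 0.55  # a corruption that moves the instance towards the boundary
print(vote_distribution(clean, voters, 2))      # [0.0, 1.0]
print(vote_distribution(corrupted, voters, 2))  # [0.4, 0.6]
print(sample_noisy_label(vote_distribution(corrupted, voters, 2), random.Random(0)))
```

Because the corruption severity controls how far the instance moves towards the decision boundary, the ambiguity source and its strength stay explicit, which is the property the abstract emphasizes.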

2606.14971 2026-06-16 cs.LG cs.AI New submission

FastMix: Fast Data Mixture Optimization via Gradient Descent

FastMix: 通过梯度下降实现快速数据混合优化

Haoru Tan, Sitong Wu, Yanfeng Chen, Jun Xia, Ruobing Xie, Bin Xia, Xingwu Sun, Xiaojuan Qi

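Affiliations: University of Hong Kong (香港大学); Tencent (腾讯); Chinese University of Hong Kong (香港中文大学)

AI summary: Proposes FastMix, which reformulates data-mixture selection as a bilevel optimization problem and jointly optimizes mixture coefficients and model parameters for efficient, scalable mixture discovery. It outperforms baselines in both pre-training and post-training while sharply reducing search cost.

Journal ref
ICLR-2026
AI Chinese Abstract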

虽然大规模和多样化的数据集推动了大型模型的最新进展,但确定预训练和后训练的最佳数据混合仍然是一个重要的开放问题。我们通过FASTMIX应对这一挑战,这是一个新颖的框架,在仅训练单个代理模型的同时自动发现数据混合。FASTMIX不依赖预定义的启发式方法或资源密集型模拟,而是联合优化混合系数和模型参数,显著提高了相对于先前方法的效率和可扩展性。FASTMIX的核心是将混合选择重新表述为一个双层优化问题。在这种重新表述下,我们证明优化混合比例在数学上等价于在均匀源采样下分配每个源的损失权重。这将混合系数直接嵌入到可微分的迭代优化目标中,从而能够对混合和模型进行高效的基于梯度的优化。为了解决优化问题,FASTMIX实现了一个近似迭代优化过程,交替进行(i)根据当前混合比例对采样的数据更新模型参数(内循环)和(ii)基于验证反馈更新混合比例(外循环)。在预训练和后训练中,FASTMIX均优于基线方法,同时大幅降低了搜索成本。代码见 https://github.com/hrtan/fastmix

English Abstract

While large and diverse datasets have driven recent advances in large models, identifying the optimal data mixture for pre-training and post-training remains a significant open problem. We address this challenge with FASTMIX, a novel framework that automates data mixture discovery while training only a single proxy model. Instead of relying on predefined heuristics or resource-intensive simulations, FASTMIX jointly optimizes mixture coefficients and model parameters, substantially improving efficiency and scalability over prior approaches. At the core of FASTMIX is a reformulation of mixture selection as a bilevel optimization problem. Under this reformulation, we show that optimizing mixture ratios is mathematically equivalent to assigning per-source loss weights under uniform source sampling. This embeds the mixture coefficients directly into the differentiable iterative optimization objective, enabling efficient, gradient-based optimization of both mixture and model. To solve the optimization problem, FASTMIX implements an approximate iterative optimization procedure, alternating between (i) updating model parameters on data sampled according to current mixture ratios (inner loop) and (ii) updating mixture ratios based on validation feedback (outer loop). Across pre- and post-training, FASTMIX outperforms baselines while drastically reducing search cost. Code is available at https://github.com/hrtan/fastmix.
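
The reformulation above (sampling sources with ratios w gives, in expectation, the same loss as weighting each per-source loss by w) makes the mixture differentiable. The toy below shows the alternating inner/outer structure on scalar quadratic "source losses" L_k(theta) = (theta - mu_k)^2 and a quadratic validation loss. The one-step-lookahead hypergradient used for the outer update, the learning rates, and all names are our simplifying assumptions, not FastMix's actual approximation, which the abstract does not detail.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(a - top) for a in logits]
    z = sum(exps)
    return [e / z for e in exps]


def fastmix_toy(source_means, val_mean, steps=2000, inner_lr=0.1, outer_lr=1.0):
    """Alternate (i) a model step on the mixture-weighted loss and (ii) a mixture step
    driven by the validation loss. Toy losses: L_k(theta) = (theta - mu_k)^2."""
    alpha = [0.0] * len(source_means)  # mixture logits; ratios w = softmax(alpha)
    theta = 0.0
    for _ in range(steps):
        w = softmax(alpha)
        grads = [2.0 * (theta - mu) for mu in source_means]  # dL_k / dtheta
        # (i) Inner step: E_{k ~ w}[L_k] = sum_k w_k L_k, so weight the per-source losses.
        theta_next = theta - inner_lr * sum(wk * g for wk, g in zip(w, grads))
        # (ii) Outer step: differentiate the validation loss at theta_next w.r.t. w.
        dval = 2.0 * (theta_next - val_mean)            # dL_val / dtheta_next
        dw = [dval * (-inner_lr) * g for g in grads]    # dL_val / dw_k
        dot = sum(wk * d for wk, d in zip(w, dw))
        dalpha = [wk * (d - dot) for wk, d in zip(w, dw)]  # chain rule through softmax
        alpha = [a - outer_lr * da for a, da in zip(alpha, dalpha)]
        theta = theta_next
    return softmax(alpha), theta


# Usage: two sources centred at 0 and 2; the validation target is 1.5,
# which is matched by the mixture w = [0.25, 0.75].
weights, theta = fastmix_toy([0.0, 2.0], val_mean=1.5)
print(weights, theta)
```

Only one model is trained throughout, with the mixture updated alongside it; this is the efficiency argument the abstract makes, shown here at toy scale.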

2606.15032 2026-06-16 cs.LG New submission

How Should World Models Be Evaluated? A Decision-Making-Centric Position

世界模型应如何评估?一个以决策为中心的立场

Yang Yu, Shiyuan Zhang, Yifei Sheng, Haoxiang Ren, Haoxin Lin

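Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University (南京大学计算机软件新技术国家重点实验室); School of Artificial Intelligence, Nanjing University (南京大学人工智能学院); Cirquar Technologies

AI summary: Identifies a mismatch between claims and evidence in world-model evaluation, proposes a decision-making-centric evaluation framework that emphasizes capabilities such as counterfactual reasoning and policy optimization, and defines an L0-L7 evaluation ladder.

AI Chinese Abstract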

世界模型迅速成为现代AI的核心抽象之一。然而,该术语现在指代多种不同对象:动作条件环境模型、潜在想象模型、未来视频预测器、交互式神经模拟器、潜在预测表示和合成数据引擎。评估也随术语扩展。近期论文衡量视频真实性、感知相似性、指令遵循、物理合理性、策略排序、可执行性、规划成功率和下游策略改进。结果不仅指标多样,而且存在声明/证据不匹配的反复问题:论文经常对其模型的用途做出比评估实际能证明的更强的声明。本文调查近期文献,认为核心问题取决于用途。当模型被呈现为用于具身决策的世界模型时,更关键的问题不是它是否生成视觉上令人信服的视频,而是它是否支持在干预、策略引起的分布偏移和长程展开下的可靠反事实推理、策略评估、规划和策略优化。我们使用L0-L7阶梯组织文献,范围从视觉合理性到策略优化效用。在我们的解释中,L0-L3最自然地被视为生成工件的诊断,L4通常是第一个真正的干预测试,L5-L7提供决策有用性的最直接证据。基于这一诊断,我们提出一个以决策为中心的评估框架和基准协议,强调反事实动作保真度、闭环展开有效性、奖励/价值预测、策略排序一致性、优化提升、模型可利用性和不确定性校准。

English Abstract

World models have rapidly become one of the central abstractions in modern AI. Yet the term now refers to several different objects: action-conditioned environment models, latent imagination models, future-video predictors, interactive neural simulators, latent predictive representations, and synthetic-data engines. Evaluation has broadened with the term. Recent papers measure video realism, perceptual similarity, instruction following, physical plausibility, policy ranking, executability, planning success, and downstream policy improvement. The result is not only metric diversity but also a recurring problem of claim/evidence mismatch: papers frequently make a stronger claim about what their model is useful for than their evaluation can actually establish. This paper surveys the recent literature and argues that the central question is use-dependent. When a model is presented as a world model for embodied decision-making, a more decisive issue is not whether it generates visually compelling videos, but whether it supports reliable counterfactual reasoning, policy evaluation, planning, and policy optimization under intervention, policy-induced distribution shift, and long-horizon rollout. We organize the literature using an L0--L7 ladder that ranges from visual plausibility to policy optimization utility. In our interpretation, L0--L3 are most naturally read as diagnostics of generated artifacts, L4 is often the first genuinely interventional test, and L5--L7 provide the most direct evidence of decision usefulness. Based on this diagnosis, we propose a decision-making-centric evaluation framework and a benchmark protocol that foreground counterfactual action fidelity, closed-loop rollout validity, reward/value prediction, policy-ranking agreement, optimization lift, model exploitability, and uncertainty calibration.
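
Policy-ranking agreement, one of the proposed metrics, can be made concrete with a rank correlation between the returns a world model predicts for a set of candidate policies and their true returns. The sketch below assumes Kendall's tau (tau-a) plus a top-1 check. Both choices and the return values are illustrative; the paper's exact metric definitions are not given in the abstract.

```python
from itertools import combinations


def kendall_tau(true_returns, model_returns):
    """Kendall rank correlation (tau-a); tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(true_returns)), 2):
        s = (true_returns[i] - true_returns[j]) * (model_returns[i] - model_returns[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(true_returns)
    return (concordant - discordant) / (n * (n - 1) // 2)


def top1_agreement(true_returns, model_returns):
    """Does the world model pick the same best policy as the real environment?"""
    best_true = max(range(len(true_returns)), key=true_returns.__getitem__)
    best_model = max(range(len(model_returns)), key=model_returns.__getitem__)
    return best_true == best_model


# Usage: hypothetical true vs. model-predicted returns for four candidate policies.
true_returns = [10.0, 20.0, 30.0, 40.0]
model_returns = [12.0, 18.0, 35.0, 33.0]
print(kendall_tau(true_returns, model_returns))     # 5 of 6 pairs agree -> 2/3
print(top1_agreement(true_returns, model_returns))  # False
```

In this made-up case the ranking score looks good, yet optimizing against the model would deploy the wrong policy. This illustrates why a single ranking number may not establish decision usefulness on its own.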

2606.15240 2026-06-16 cs.LG New submission

EnvShip-Bench: An Environment-Enhanced Benchmark for Short-Term Vessel Trajectory Prediction

EnvShip-Bench:一种环境增强的短期船舶轨迹预测基准

Kun Ma, Qilong Han, Chengjing Song, Jingzheng Yao, Hao Wang, Changmao Wu

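Affiliations: Harbin Engineering University (哈尔滨工程大学); Politecnico di Torino (都灵理工大学); Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所)

AI summary: Addresses the lack of a unified protocol and environmental context in existing vessel trajectory prediction benchmarks by proposing EnvShip-Bench, built from raw AIS data from the Danish Maritime Authority and NOAA. It adopts a standardized forecasting protocol, provides environmental and nearby-vessel context extensions, and supports unified evaluation of trajectory-only, environment-aware, and interaction-aware forecasting.

Comments Submitted to ACM MM 2026

AI Chinese Abstract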

船舶轨迹预测对于智能航运、海上监视和航行安全至关重要。然而,现有的公共海事AIS资源通常受到预测协议不一致、数据质量不均匀以及缺乏基准就绪的上下文注释的限制,这阻碍了公平比较和上下文感知建模。为解决这一问题,我们提出了EnvShip-Bench,这是一个用于短期船舶轨迹预测的统一基准,通过通用处理流程从丹麦海事局(DMA)和美国国家海洋和大气管理局(NOAA)的大规模原始AIS数据构建而成。EnvShip-Bench采用标准化的预测协议,包括10分钟观测、10分钟预测和20秒采样,使用以船舶为中心的局部公制坐标。除了大规模核心基准外,它还提供了一个质量优先的紧凑子集,用于高效且可重复的实验,以及同步的环境和邻近船舶上下文扩展。因此,EnvShip-Bench在统一评估框架下支持仅轨迹、环境感知和交互感知预测。广泛的基准统计和分析表明,EnvShip-Bench为海上轨迹预测研究提供了标准化、可扩展且上下文感知的基础。

English Abstract

Vessel trajectory prediction is important for intelligent shipping, maritime surveillance, and navigation safety. However, existing public maritime AIS resources are often limited by inconsistent forecasting protocols, uneven data quality, and the lack of benchmark-ready contextual annotations, which hinder fair comparison and context-aware modeling. To address this gap, we present EnvShip-Bench, a unified benchmark for short-term vessel trajectory prediction built from large-scale raw AIS data from the Danish Maritime Authority (DMA) and NOAA through a common processing pipeline. EnvShip-Bench adopts a standardized forecasting protocol with 10 minutes of observation, 10 minutes of prediction, and 20-second sampling in vessel-centric local metric coordinates. Beyond the large-scale core benchmark, it provides a quality-first compact subset for efficient and reproducible experimentation, together with synchronized environmental and nearby-vessel context extensions. As a result, EnvShip-Bench supports trajectory-only, environment-aware, and interaction-aware forecasting under a unified evaluation framework. Extensive benchmark statistics and analysis demonstrate that EnvShip-Bench offers a standardized, extensible, and context-aware foundation for maritime trajectory forecasting research.
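
The forecasting protocol implies simple arithmetic: at 20-second sampling, 10 minutes of observation and 10 minutes of prediction are 600 / 20 = 30 points each. The sketch below cuts a track into such windows and expresses each window in local metric coordinates centred on the last observed position. The equirectangular projection, the choice of origin, and the sliding stride are our assumptions; the benchmark's own processing pipeline may differ. The sketch also assumes the track has already been resampled to 20 s.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
STEP_S = 20                     # 20-second sampling
OBS_STEPS = 10 * 60 // STEP_S   # 10 min of observation -> 30 points
PRED_STEPS = 10 * 60 // STEP_S  # 10 min of prediction  -> 30 points


def to_local_metric(track, origin):
    """Equirectangular projection of (lat, lon) in degrees to (east, north) metres
    around `origin`. Adequate over a few kilometres; does not handle the antimeridian."""
    lat0, lon0 = map(math.radians, origin)
    out = []
    for lat, lon in track:
        east = (math.radians(lon) - lon0) * math.cos(lat0)
        north = math.radians(lat) - lat0
        out.append((EARTH_RADIUS_M * east, EARTH_RADIUS_M * north))
    return out


def make_windows(track, stride=1):
    """Yield (observed, future) pairs from a track already resampled to 20 s,
    in coordinates centred on the last observed position."""
    total = OBS_STEPS + PRED_STEPS
    for start in range(0, len(track) - total + 1, stride):
        window = track[start:start + total]
        local = to_local_metric(window, origin=window[OBS_STEPS - 1])
        yield local[:OBS_STEPS], local[OBS_STEPS:]


# Usage: a hypothetical 20-minute track (60 fixes) heading due north.
track = [(55.0 + 0.001 * i, 10.0) for i in range(60)]
obs, fut = next(make_windows(track))
print(len(obs), len(fut), obs[-1], fut[0])  # 30 30 (0.0, 0.0) then about 111 m north
```

Centring on the last observed fix makes every sample start at the origin, so a model sees relative motion rather than absolute position.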

2606.15306 2026-06-16 cs.LG cs.AI New submission

LatentGym: A Testbed For Cross-Task Experiential Learning With Controllable Latent Structure

LatentGym: 具有可控潜在结构的跨任务经验学习测试平台

Daksh Mittal, Tommaso Castellani, Thomson Yen, Naimeng Ye, Fangyu Wu, Minghui Chen, Tiffany Cai, Emmanouil Koukoumidis, William Zeng, Hongseok Namkoong

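Affiliations: Columbia University (哥伦比亚大学); Oumi

AI summary: Proposes LatentGym, a testbed that uses controllable latent variables to separate exploration from exploitation and to study how LLM agents adapt across sequences of related tasks.

Comments 61 pages

AI Chinese Abstract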

我们设想持续学习的代理系统会随时间变得更加有用:当它们遇到一系列相关任务时,应该推断这些任务之间共享的隐藏结构,并利用它来改进未来的决策。这种跨任务经验学习能力在个性化和交互式辅助等领域至关重要,但现有的训练/评估框架不提供共享的、可控的潜在结构,也无法衡量代理是否改进或改进的原因。我们引入了LatentGym:一个可控的套件,其中每个环境都围绕一个控制任务间结构的地面真实潜在变量组织。我们的构建产生了将探索(代理的行为是否收集关于潜在变量的信息)与利用(代理是否使用收集到的信息)分离的指标。我们在实证研究中展示了我们的套件,解决了三个问题:前沿模型如何以及为什么无法适应相关任务;对相关任务序列进行后训练是否能提高一般的跨任务适应性,以及这些收益来自何处;以及诸如任务间反馈等设计选择如何塑造训练动态和泛化。总之,这些结果为研究LLM代理如何从跨任务经验中学习,以及设计在顺序、个性化和交互式设置中更可靠适应的代理建立了受控基础。

English Abstract

We envision continually learning agentic systems that become more useful over time: as they encounter sequences of related tasks, they should infer the hidden structure shared across those tasks and use it to improve future decisions. This cross-task experiential learning capability is pivotal in domains such as personalization and interactive assistance, but existing training/evaluation frameworks do not provide shared, controllable latent structures and cannot measure whether or why agents improve. We introduce LatentGym: a controllable suite in which each environment is organized around a ground-truth latent variable governing the structure across tasks. Our construction yields metrics that separate exploration (whether the agent's actions gather information about the latent) from exploitation (whether the agent uses what it has gathered). We demonstrate our suite on empirical studies addressing three questions: how and why frontier models fail to adapt across related tasks; whether post-training on related task sequences improves general cross-task adaptation, and where those gains come from; and how design choices such as inter-task feedback shape training dynamics and generalization. Together, these results establish a controlled foundation for studying how LLM agents learn from experience across tasks, and for designing agents that adapt more reliably in sequential, personalized, and interactive settings.

2606.15436 2026-06-16 cs.LG cs.AI eess.AS New submission

Beyond Classification: A Cough Regression Benchmark for Respiratory Acoustic Foundation Models

超越分类:呼吸声学基础模型的咳嗽回归基准

Mayur Sanap, Prasanna Desikan, Edgar Lobaton

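Affiliations: University of California, Berkeley (加州大学伯克利分校); Stanford University (斯坦福大学)

AI summary: Proposes a multi-model, multi-target cough regression benchmark that evaluates five foundation models on six targets. It finds that an MLP-small head outperforms linear probing, reveals a trade-off between dataset size and head capacity, and shows that cross-dataset transfer is asymmetric.

Comments Accepted at the ICML 2026 Workshop on Structured Data for Health

AI Chinese Abstract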

呼吸声学基础模型(FMs)在咳嗽分类方面表现出色,但其从咳嗽音频中预测连续健康量的能力在很大程度上尚未被探索,尽管在无法进行物理测量的环境中,被动年龄、BMI和疾病概率估计具有临床价值。我们引入了多模型、多目标的咳嗽回归基准,在三个数据集上评估了五个FMs(OPERA-CT、OPERA-CE、OPERA-GT、HeAR、M2D+Resp)在六个目标上的表现,采用受试者不重叠协议,并比较了线性、MLP-small和全MLP回归头。MLP-small在所有任务上击败了均值预测基线,并在30个模型×任务组合中的23个中优于线性探测,而全MLP在小规模临床数据上过拟合,但在更大数据集上恢复,揭示了数据集大小与头部容量之间的权衡。HeAR在Coswara数据集上的年龄回归中领先(9.12年MAE);其CIDRZ结果因可能存在HeAR-CIDRZ预训练重叠而被排除在主要声明之外。OPERA-GT在所有三个数据集的年龄回归中优于OPERA-CT,其中CIDRZ的差异在种子方差范围内,将生成预训练的优势从呼吸扩展到咳嗽。HeAR和M2D+Resp在N=50个样本时达到接近完整性能,而OPERA模型需要N=400个样本。跨数据集迁移强烈不对称,大规模多样化数据可泛化到小规模临床人群(CoughVID到CIDRZ:-0.17年),但反之则不然(CIDRZ到Coswara:+2.43年,+26.6%)。

English Abstract

Respiratory acoustic foundation models (FMs) excel at cough classification, yet their ability to predict continuous health quantities from cough audio remains largely unexplored, despite the clinical value of passive age, BMI, and disease probability estimation in settings where physical measurements are unavailable. We introduce the multi-model, multi-target cough regression benchmark evaluating five FMs (OPERA-CT, OPERA-CE, OPERA-GT, HeAR, M2D+Resp) across six targets on three datasets under subject-disjoint protocols, comparing linear, MLP-small, and full MLP regression heads. MLP-small beats the mean-predictor baseline on all tasks and linear probing in 23 of 30 model × task cases, with full MLP overfitting on small clinical data but recovering on larger sets, revealing a dataset size × head-capacity trade-off. HeAR leads within-dataset age regression on Coswara (9.12 yr MAE); its CIDRZ result is excluded from headline claims owing to possible HeAR-CIDRZ pretraining overlap. OPERA-GT is favored over OPERA-CT on age in all three datasets, with the CIDRZ margin within seed variance, extending a generative-pretraining advantage from breath to cough. HeAR and M2D+Resp reach near-full performance at N = 50 samples while OPERA models require N = 400. Cross-dataset transfer is strongly asymmetric: large diverse data generalises to small clinical populations (CoughVID to CIDRZ: -0.17 yr) but not vice versa (CIDRZ to Coswara: +2.43 yr, +26.6%).
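
Two parts of this protocol are easy to get wrong: splitting by subject rather than by recording, and the mean-predictor baseline that every regression head must beat. Keeping all clips from one person on the same side of the split prevents a regressor from scoring well simply by recognising the speaker. The sketch below is a generic illustration with a hypothetical `(subject_id, features, target)` record layout; the paper's actual splits, seeds, and fractions are not specified in the abstract.

```python
import random


def subject_disjoint_split(records, test_fraction=0.2, seed=0):
    """Split (subject_id, features, target) records so that no subject is in both sets."""
    subjects = sorted({r[0] for r in records})
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(test_fraction * len(subjects)))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r[0] not in test_subjects]
    test = [r for r in records if r[0] in test_subjects]
    return train, test


def mae(y_true, y_pred):
    """Mean absolute error, e.g. in years for age regression."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_predictor_mae(train, test):
    """Baseline: predict the training-set mean target for every test clip."""
    mean_target = sum(r[2] for r in train) / len(train)
    return mae([r[2] for r in test], [mean_target] * len(test))


# Usage: 10 hypothetical subjects with 3 clips each; features omitted (None).
records = [(f"s{i // 3}", None, 20.0 + i) for i in range(30)]
train, test = subject_disjoint_split(records, test_fraction=0.2, seed=0)
print(len(train), len(test), mean_predictor_mae(train, test))
```

A head's MAE is only informative relative to `mean_predictor_mae` on the same split, which is the comparison the abstract reports for MLP-small.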

2606.15589 2026-06-16 cs.LG cs.AI New submission

Is Code Better Than Language for Algorithmic Reasoning

算法推理中代码是否优于语言

Terry Tong, Yu Feng, Surbhi Goel, Dan Roth

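Affiliations: University of Pennsylvania (宾夕法尼亚大学)

AI summary: Separates the intermediate representation from the execution mechanism to compare code execution with natural-language reasoning on 40 tasks, finding that the advantage of code comes from external execution rather than from the change in representation.

Comments ICML 2026

AI Chinese Abstract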

对于工具增强的语言模型,比较自然语言推理与代码执行管道是困难的,因为比较同时改变了中间表示和执行机制。我们通过一个中间干预来分离这些因素:模型将其推理表达为可执行代码,语言模型在上下文中模拟该代码以产生答案。在40个任务的可验证算法基准上,确定性代码执行比自然语言推理高出+31.6个百分点。我们观察到中间干预与自然语言推理没有显著差异(+0.15个百分点)。这些结果表明,在我们评估的设置中,仅改变中间表示并不能解释工具使用的优势,为性能提升需要可靠的外部执行提供了证据。我们用一个简单的统计决策理论模型形式化了这一直觉,该模型刻画了在我们的解耦轨迹生成/执行机制中,执行何时主导端到端风险。我们通过一个重建干预验证了我们的理论,该干预利用代理语言模型从代码表示中推断自然语言推理轨迹,恢复了与原始自然语言推理管道相当的性能。所有实验见https://github.com/TerryTong-Git/ToolProj。

English Abstract

For tool-augmented language models, comparing natural-language reasoning with code-execution pipelines is difficult because the comparison changes both the intermediate representation and the execution mechanism. We separate these factors with an intermediate intervention: the model expresses its reasoning as executable code, and the language model simulates that code in context to produce an answer. On a 40-task verifiable algorithmic benchmark, deterministic code execution outperforms natural-language reasoning by +31.6pp. We observe that the intermediate intervention is not meaningfully different from natural-language reasoning (+0.15pp). These results suggest that, in our evaluated setting, changing the intermediate representation alone does not explain the tool-use advantage, providing evidence that the performance gains require reliable external execution. We formalize this intuition with a simple statistical decision-theoretic model that characterizes when execution dominates end-to-end risk in our disentangled trace-generation/execution regime. We validate our theory using a reconstruction intervention that leverages a proxy language model to infer natural-language reasoning traces from code representations, recovering performance comparable to the original natural-language reasoning pipeline. All experiments are at https://github.com/TerryTong-Git/ToolProj.

2606.15621 2026-06-16 cs.LG cs.CL New submission

Re-feeding Is Not Replaying: Measuring Replay Noise in Counterfactual Token-Credit Estimation

重新喂食并非重放:在反事实令牌信用估计中测量重放噪声

Nils Matteson

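Affiliations: Northeastern University (东北大学)

AI summary: Uses a three-pass experimental design to measure the noise introduced by re-feeding the prefix in counterfactual token-credit estimation. It finds that re-feeding changes credit estimates at rates above the replica noise floor, and recommends resuming decoder state or using batch-invariant kernels.

Comments 10 pages, 3 figures. Code, per-pivot data, logs, and registration: https://github.com/thaw-ai/thaw (benchmarks/, paper/refeed-drift/)

AI Chinese Abstract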

逐令牌反事实信用估计询问语言模型生成结果中哪个令牌导致最终答案正确或错误:在某个枢轴处截断转录,替换一个替代令牌,重放后续内容,并比较结果。已发表的方法将转录前缀作为新提示重新喂食,假设这能重现模型在生成过程中经过的状态。我们在一个标准推理引擎上测量了这一假设的代价,采用三遍设计:从验证的解码时KV状态恢复的继续生成,一个完全相同的第二遍精确传递(副本噪声基底),以及一个重新喂食传递。在六种配置和三个模型(包括一个GRPO训练的检查点)中,在低边际决策令牌处,重新喂食改变信用估计的比率比副本基底高14-28个百分点(在治疗无关条件下为7-21个百分点;问题聚类t=2.9-6.4)。大多数变化是量化估计器的零边界交叉而非极性反转,且扰动均值为零,因此平均量基本安全;但选择并非如此:通过阈值化$|\hat{A}_t|$在重新喂食下选择的临界令牌集与精确恢复选择的Jaccard重叠为0.34-0.90,而副本上限为0.63-0.96。一个因果确认闭环:在vLLM的批不变内核下,所有三遍在每一个测量通道上完全相同,分歧率均为零。副本传递本身在9-23%的合格估计上存在分歧:决策令牌处的单样本信用测量在任何重放下都不可靠。设置事先固定;第二遍活动中的精确传递缓存命中被仪器化(100%命中率,3434个枢轴);总计算成本低于10美元。我们建议反事实信用研究恢复解码器状态或使用批不变内核,并报告副本基底。

English Abstract

Per-token counterfactual credit estimation asks which token in a language-model rollout caused the final answer to be right or wrong: cut the transcript at a pivot, substitute an alternative token, replay continuations, and compare outcomes. Published methods re-feed the transcript prefix as a fresh prompt, assuming this reproduces the state the model passed through during generation. We measure what that assumption costs on a stock inference engine, with a three-pass design: continuations resumed from the verified decode-time KV state, an identical second exact pass (a replica noise floor), and a re-feed pass. Across six configurations and three models (including a GRPO-trained checkpoint), at low-margin decision tokens, re-feeding changes the credit estimate at rates 14-28 percentage points above the replica floor (7-21pp under a treatment-independent conditioning; problem-clustered t = 2.9-6.4). Most changes are zero-boundary crossings of the quantized estimator rather than polarity reversals, and the perturbation is consistent with mean-zero, so averaged quantities are largely safe; but selection is not: a critical-token set chosen by thresholding $|\hat{A}_t|$ under re-feed overlaps the exact-resume selection at Jaccard 0.34-0.90, versus a 0.63-0.96 replica ceiling. A causal confirmation closes the loop: under vLLM's batch-invariant kernels all three passes are identical on every measured channel, with both disagreement rates exactly zero. Replica passes themselves disagree on 9-23% of eligible estimates: single-sample credit measurements at decision tokens are unreliable under any replay. Settings were fixed in advance; exact-pass cache hits in the second campaign are instrumented (100% hit rate, 3,434 pivots); total compute was under 10 USD. We recommend that counterfactual credit studies resume decoder state or use batch-invariant kernels, and report a replica floor.
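
The distinction between "averaged quantities are largely safe" and "selection is not" can be reproduced on made-up numbers. Below, two passes produce quantized credit estimates with the same mean. In the second pass one estimate crosses zero and another moves past the threshold, yet the critical-token sets obtained by thresholding |A_t| overlap at a Jaccard index of only 1/3. The values, the quantization step of 0.25, and the use of `>=` for thresholding are hypothetical.

```python
def critical_tokens(advantages, threshold):
    """Token positions t whose estimated credit |A_t| meets the threshold."""
    return {t for t, a in enumerate(advantages) if abs(a) >= threshold}


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty selections count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical quantized estimates at five pivots under exact resume and re-feed.
exact = [0.0, 0.5, -0.5, 0.25, 0.0]
refeed = [0.0, 0.5, 0.0, 0.25, -0.5]   # pivot 2 crosses to zero, pivot 4 moves away from it
print(sum(exact) == sum(refeed))       # True: the averaged credit is unchanged
print(jaccard(critical_tokens(exact, 0.5), critical_tokens(refeed, 0.5)))  # 0.3333333333333333
```

Comparing such an overlap against the overlap between two exact passes, rather than against 1.0, corresponds to the replica-ceiling comparison the abstract reports.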

2606.15760 2026-06-16 cs.LG stat.ML New submission

The Data Manifold under the Microscope

Marios Koulakis, Constantin Seibold

Affiliations: University of California, Berkeley

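AI summary: Addressing the gap between deep learning theory and practice, proposes a benchmarking framework that extends the dSprites and COIL-20 datasets and pairs them with finite-difference estimators to obtain near-ground-truth estimates of curvature, reach, and volume, for calibrating geometric estimators and validating theoretical assumptions.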

Comments Accepted at ICML 2026. Camera-ready version

Abstract

A significant gap exists between theory and practice in deep learning. Generalization and approximation error bounds are often derived for simplified models or are too loose to be informative. Many rely on the manifold hypothesis and on geometric regularity such as intrinsic dimension, curvature, and reach. Progress requires insight into data-manifold geometry and suitable benchmarks, yet existing options are polarized: analytic manifolds with known geometry but limited applicability, or real-world datasets where geometry is only coarsely estimable. We introduce a benchmarking framework for studying data geometry. We repurpose and extend dSprites and COIL-20 with additional transformation dimensions and dense, axis-aligned sampling, and pair them with finite-difference estimators that recover curvature, reach, and volume at near-ground-truth accuracy in a regime where general-purpose estimators are unreliable or difficult to deploy. The framework is intended as a controlled testbed, useful as a calibration environment for geometric estimators and a sandbox for probing theoretical assumptions. To illustrate its use, we present two application studies, namely assessing the scaling behavior of the bounds of Genovese et al. and Fefferman et al., and tracking the layer-wise geometry of a $β$-VAE, highlighting the behavior of current bounds and the value of controlled benchmarks for guiding and validating future theory. A reference implementation is available at https://github.com/koulakis/manifold-microscope.
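
As a rough illustration of the finite-difference idea (not the paper's estimator, which also recovers reach and volume), the sketch below estimates the curvature of a curve sampled densely along a single transformation axis. It uses central differences and the formula $\kappa = \sqrt{\|x'\|^2\|x''\|^2 - (x' \cdot x'')^2} / \|x'\|^3$, which holds for curves in any ambient dimension. The function name, the uniform step `h`, and the test curve are assumptions.

```python
import math


def curvature_central(points, h):
    """Estimate curvature at the interior samples of a curve x(t) sampled at uniform step h.

    points: list of equal-length coordinate tuples x(t_0), x(t_1), ...
    Returns curvature estimates for indices 1 .. len(points) - 2.
    """
    kappas = []
    for i in range(1, len(points) - 1):
        prev, cur, nxt = points[i - 1], points[i], points[i + 1]
        # Central differences for the first and second derivatives.
        d1 = [(b - a) / (2 * h) for a, b in zip(prev, nxt)]
        d2 = [(a - 2 * c + b) / (h * h) for a, c, b in zip(prev, cur, nxt)]
        n1 = sum(v * v for v in d1)
        n2 = sum(v * v for v in d2)
        dot = sum(u * v for u, v in zip(d1, d2))
        # max(..., 0.0) guards against tiny negative values caused by rounding.
        kappas.append(math.sqrt(max(n1 * n2 - dot * dot, 0.0)) / n1 ** 1.5)
    return kappas


# A circle of radius 2 embedded in R^3 has constant curvature 1/2.
h = 1e-3
circle = [(2 * math.cos(j * h), 2 * math.sin(j * h), 0.0) for j in range(1000)]
est = curvature_central(circle, h)
print(min(est), max(est))
```

Dense, axis-aligned sampling is what makes this simple stencil usable: with a known, small step along each generative axis, derivatives can be differenced directly instead of being inferred from an unstructured point cloud.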

2606.15868 2026-06-16 cs.LG New submission

David vs. Goliath in Next Activity Prediction: Argmax vs. LSTM, Transformer, and LLM

Hans Weytjens, Ingo Weber

Affiliations: Technical University of Munich; Fraunhofer Gesellschaft

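AI summary: Through a systematic benchmark, this paper compares a simple counting-based argmax baseline, LSTMs, Transformers, and LLMs on next activity prediction, finding that on most datasets the argmax baseline matches or approaches billion-parameter LLMs.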

Comments Accepted for 24th International Conference on Business Process Management (2026) Forum

Abstract

Next activity prediction (NAP) is a cornerstone of predictive process monitoring (PPM), enabling organizations to move from retrospective analysis to proactive process steering. The PPM field has progressed from classical machine learning through deep learning architectures such as LSTMs and Transformers to large language models (LLMs). Despite growing model complexity, no benchmark jointly compares LLMs, Transformers, LSTMs, and simple baselines in a direct sequence modeling setting for NAP. In this paper, we fill this gap with a systematic benchmark. We compare vocabulary-adapted LLMs, Transformers trained from scratch, LLM-distilled Transformers, and LSTMs against a simple counting-based argmax baseline across seven real-life event logs. Our results tell a David vs. Goliath story: pretraining confers no consistent improvement over training from scratch, model size shows little effect on performance, and on most datasets the argmax baseline matches or approaches the performance of billion-parameter LLMs.
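
The abstract does not say what the counting-based argmax baseline conditions on. A minimal sketch, assuming it conditions only on the most recent activity (a first-order successor count) and falls back to the globally most frequent activity for unseen contexts, might look like this. The function names, the `"<start>"` token, and the toy event log are hypothetical.

```python
from collections import Counter, defaultdict


def fit_argmax_baseline(traces):
    """Count how often each activity follows each other activity.

    traces: list of activity sequences, one per case.
    Assumption: the baseline conditions only on the last observed activity.
    """
    successors = defaultdict(Counter)
    overall = Counter()
    for trace in traces:
        padded = ["<start>"] + list(trace)
        for prev, nxt in zip(padded, padded[1:]):
            successors[prev][nxt] += 1
            overall[nxt] += 1
    return successors, overall


def predict_next(model, prefix):
    """Return the most frequent successor of the prefix's last activity."""
    successors, overall = model
    last = prefix[-1] if prefix else "<start>"
    counts = successors.get(last) or overall  # fall back for unseen activities
    return counts.most_common(1)[0][0]


log = [["register", "check", "approve"],
       ["register", "check", "reject"],
       ["register", "check", "approve"]]
model = fit_argmax_baseline(log)
print(predict_next(model, ["register", "check"]))  # approve
```

Because the model is only a table of counts, it is fit in a single pass over the log, which is what makes it a cheap floor against which the heavier models can be compared.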

2606.15887 2026-06-16 cs.LG cs.AI New submission

Intelligence Is Not the Bottleneck: Validating an LLM First-Pass Manuscript Score Against Peer-Review Outcomes

Costa Georgantas

Affiliations: aipr.pub

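AI summary: This study validates AIPR, an LLM system that scores papers through prompting alone without fine-tuning. Its overall score separates accepted from rejected ICLR submissions (AUROC 0.82), and its scores are stable and reproducible, providing a reliable basis for assisting peer review.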

Comments 34 pages, 14 figures

Abstract

Large language model (LLM) systems are increasingly proposed to assist peer review, yet most evaluations judge the prose of machine-generated review text, not the validity of the numeric score a system assigns. We validate AIPR, which reads a submitted manuscript and emits five 0-100 quality dimensions and a weighted overall score, against the public decision outcomes of a major machine learning venue. AIPR grades by prompting alone, with no fine-tuning on reviews or decisions. Across 300 ICLR submissions with public decision tiers and reviewer ratings, graded under a frozen pipeline with hypotheses pre-registered before any score met any outcome, the overall score separates rejected from accepted submissions (AUROC 0.82, 95% CI 0.78-0.87), rises monotonically across tiers, and tracks the mean reviewer rating. The signal is strongest where we claim it: the lowest-scoring fifth is rejected far above the base rate, with oral papers absent. The validity comes mostly from the model: a one-paragraph prompt on the same model discriminates almost as well as the full pipeline (the small gap favours the pipeline but does not meet the pre-declared criterion, p = 0.09). What the engineering adds is reliability and a grounded review: AIPR's score barely moves across repeated runs (0.7 vs. 2.8 points within-paper SD) where the bare prompt swings, and the same pass returns a rubric-structured, evidence-grounded review rather than a bare number, with the human keeping the decision.
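
The AUROC reported above is the probability that a randomly chosen accepted submission receives a higher score than a randomly chosen rejected one, with ties counted as one half. A small pure-Python sketch of that definition follows; the function name and the toy scores are hypothetical, not AIPR outputs.

```python
def auroc(scores, labels):
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg), ties counted as 0.5.

    scores: numeric scores; labels: 1 for the positive class (e.g. accepted), 0 otherwise.
    Runs in O(n_pos * n_neg), which is fine for a few hundred papers.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical overall scores (0-100) and accept (1) / reject (0) decisions.
scores = [82, 75, 60, 58, 40, 75]
labels = [1, 1, 0, 1, 0, 0]
print(auroc(scores, labels))
```

Here 6.5 of the 9 accepted/rejected pairs are ordered correctly (one tie), so the value is 6.5 / 9, about 0.72.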

2606.16045 2026-06-16 cs.LG cs.DS New submission

Active Learning with Low-Rank Structure for Data Selection

Vincent Cohen-Addad, Sasidhar Kunapuli, Vahab Mirrokni, Mahdi Nikdan, David P. Woodruff, Samson Zhou

Affiliations: Google Research; University of California, Berkeley; Institute of Science and Technology Austria (ISTA); Carnegie Mellon University; Texas A&M University

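AI summary: Proposes a data selection framework based on low-rank approximation and residual sampling. Under mild regularity conditions it selects a weighted subset whose average loss approximates the full-dataset average loss within a $(1+\varepsilon)$ relative error plus an additive $\varepsilon \Phi_k$ term; experiments show it outperforms uniform sampling and clustering-based sensitivity sampling.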

Comments ICML 2026

Abstract

In the data selection problem, the objective is to choose a small, representative subset of data that can be used to efficiently train a machine learning model. Sener and Savarese [ICLR 2018] showed that, given an embedding representation of the data and suitable geometric assumptions, heuristics based on $k$-center clustering can be used to perform data selection. This perspective was further explored by Axiotis et al. [ICML 2024], who proposed a data selection approach based on $k$-means clustering and sensitivity sampling. However, these methods rely on the assumption that the dataset exhibits intrinsic geometric structure that can be effectively captured by clustering, whereas many modern datasets instead possess global algebraic structure that is better exploited by low-rank approximation or principal component analysis. In this paper, we introduce a new data selection framework based on low-rank approximation and residual-based sampling, formulated through the lens of row subset selection and loss-preserving coreset construction. Given an embedding representation of the data satisfying mild regularity conditions, which can be interpreted as algebraic or angular notions of Lipschitz continuity, we show that it is possible to select a weighted subset of $\tilde{O}\left(k + \frac{1}{\varepsilon^2}\right)$ data points whose average loss approximates the average loss over the full dataset within a $(1+\varepsilon)$ relative error, up to an additive $\varepsilon \Phi_k$ term, where $\Phi_k$ denotes the optimal rank-$k$ approximation cost of the embedding matrix. We complement these theoretical guarantees with empirical evaluations, demonstrating that on a range of real-world datasets, our data selection approach achieves improved performance over prior strategies based on uniform sampling or clustering-based sensitivity sampling.
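
The following is a minimal sketch of the general recipe (low-rank approximation, then importance sampling by residual), not the authors' algorithm. It takes a rank-$k$ SVD of the embedding matrix, samples rows with probabilities that mix rank-$k$ leverage scores, squared residual norms, and a uniform floor, and reweights each sampled row by $1/(m\,p_i)$ so that the weighted average loss is an unbiased estimate of the full average. The 0.4/0.4/0.2 mixing weights, the function name, and the synthetic data are assumptions; the sketch does not reproduce the paper's $(1+\varepsilon)$ guarantee.

```python
import numpy as np


def residual_sampling_coreset(X, k, m, rng):
    """Sample m rows of X (n x d) using rank-k leverage scores plus residual mass.

    Returns (indices, weights, probs) such that sum(weights * loss[indices]) / n
    is an unbiased estimate of loss.mean() for any per-row loss vector.
    """
    n = X.shape[0]
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    Uk = U[:, :k]
    leverage = np.sum(Uk ** 2, axis=1)               # rank-k leverage scores (sum to k)
    X_k = (Uk * S[:k]) @ Vt[:k]                      # best rank-k approximation of X
    res_sq = np.sum((X - X_k) ** 2, axis=1)          # squared residual norm per row
    lev_p = leverage / leverage.sum()
    # The uniform floor keeps every probability positive, bounding the weights.
    if res_sq.sum() > 0:
        probs = 0.4 * lev_p + 0.4 * res_sq / res_sq.sum() + 0.2 / n
    else:
        probs = 0.8 * lev_p + 0.2 / n
    idx = rng.choice(n, size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])                 # inverse-probability weights
    return idx, weights, probs


rng = np.random.default_rng(0)
n, d, k = 5000, 20, 3
# Synthetic embeddings: a rank-3 signal plus small isotropic noise.
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.05 * rng.normal(size=(n, d))
loss = np.abs(X[:, 0]) + 0.1                          # stand-in per-example loss
idx, w, probs = residual_sampling_coreset(X, k, m=1500, rng=rng)
estimate = np.sum(w * loss[idx]) / n                  # weighted coreset average
print(loss.mean(), estimate)
```

Sampling with replacement and inverse-probability weighting is what makes the estimate unbiased. The residual term sends extra samples to rows that the low-rank structure explains poorly, which is the intuition behind the additive $\varepsilon \Phi_k$ term.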

2606.16246 2026-06-16 cs.LG cs.AI cs.CL New submission

Data Augmentations for Data-Constrained Language Model Pretraining

Michael K. Chen, Xikun Zhang, Zhen Wang

Affiliations: UC San Diego; RMIT University

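AI summary: To address severe overfitting of standard autoregressive pretraining under data constraints, proposes three categories of data augmentation (token-level noise, sequence permutations, and target offset prediction) that lower validation loss and support training for hundreds of epochs.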

Abstract

As AI labs approach a data ceiling where compute capacity outpaces the rate of new high-quality text generation, language model pretraining is shifting toward a data-constrained, compute-abundant regime that demands productive multi-epoch training on fixed corpora. Standard autoregressive (AR) pretraining overfits severely in this setting, reaching its optimum early and then continuously deteriorating. We investigate data augmentation as a regularizer to mitigate this overfitting and enable productive training for hundreds of epochs on the same data. We introduce three orthogonal categories of augmentation for AR pretraining: token-level noise (masking, random replacement), sequence permutations (right-to-left prediction, Fill-in-the-Middle), and target offset prediction ($x_{t+i}$ for $i > 1$). Through systematic ablations, we find that individual augmentations delay overfitting and lower validation loss relative to the baseline, with random token replacement achieving the best minimum loss among individual methods. Combining augmentation categories further lowers the minimum validation loss. Our experiments demonstrate that data augmentations mitigate AR pretraining's data inefficiency and offer a promising solution to the data-constrained regime. All code and data are available at https://github.com/michaelchen-lab/data-augmentations-for-pretraining
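
Two of the augmentation families above are easy to sketch on lists of token IDs. The helpers below are hypothetical illustrations: random token replacement with a per-position probability `p`, and input/target pairs for predicting $x_{t+i}$ with offset $i > 1$. The abstract does not give the noise rates or the replacement distribution; uniform sampling over the vocabulary, and corrupting inputs while keeping targets clean, are assumptions.

```python
import random


def random_replace(tokens, p, vocab_size, rng):
    """Replace each token with a uniformly random ID with probability p."""
    return [rng.randrange(vocab_size) if rng.random() < p else t for t in tokens]


def offset_pairs(tokens, i):
    """Build (inputs, targets) so that position t predicts x_{t+i} instead of x_{t+1}."""
    if i < 1:
        raise ValueError("offset must be at least 1")
    return tokens[:-i], tokens[i:]


def noisy_training_example(tokens, p, vocab_size, rng, offset=1):
    """Corrupt the inputs only; the targets stay clean."""
    inputs, targets = offset_pairs(tokens, offset)
    return random_replace(inputs, p, vocab_size, rng), targets


rng = random.Random(0)
seq = [5, 17, 3, 42, 8, 11]
print(offset_pairs(seq, 2))  # ([5, 17, 3, 42], [3, 42, 8, 11])
print(noisy_training_example(seq, 0.15, vocab_size=50, rng=rng))
```

Because both transforms are applied on the fly, every epoch sees a different corrupted view of the same corpus. That is the property that lets augmentation act as a regularizer in multi-epoch training.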

2606.16341 2026-06-16 cs.LG cs.DB New submission

Filtered ANN as a Phase Transition: When Selectivity-Estimation Error Causes Plan Regret

Madhulatha Mandarapu, Sandeep Kunkunuru

Affiliations: VaidhyaMegha Private Limited, India

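AI summary: Studies how selectivity-estimation error causes plan regret in filtered approximate nearest neighbor search. Regret arises only near phase boundaries and takes the form of a wedge of logarithmic width, a finding validated through finite-size scaling.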

Comments 8 pages, 4 figures. Code, benchmarks, and full pre-registration: https://github.com/samyama-ai/filtered-ann-regret

Abstract

A filtered approximate-nearest-neighbor (ANN) query returns the $k$ nearest vectors among those satisfying an attribute predicate $P$ of selectivity $s$. The best execution strategy (pre-filter, post-filter, or in-filter) changes with $s$, so a system must estimate $s$ and choose. We model this as an argmax over a landscape with phases (regions where each strategy wins) separated by boundaries, and show that selectivity-estimation error produces plan regret (recall lost versus the oracle strategy) only in the critical regions around those boundaries. The regret is a wedge of log-width equal to the multiplicative estimation error $\epsilon$ and height equal to the local cliff $|V'(s^*)|\,\epsilon$; the flip-margin $1/|V'(s^*)|$ is the condition number of a sibling cardinality-estimation study reappearing as the local boundary theory. The two phase boundaries follow from independent mathematics: order statistics place the post-filter cliff at $s \sim k/K$, and site percolation places the in-filter cliff at $s_c \sim 0.83/M$ for graph degree $M$ (corpus-size independent). Criticality exists only under a constrained budget $B < \sqrt{kn}$. Under pre-registered decision rules we confirm, on synthetic sweeps and real SIFT1M, that regret concentrates roughly 290-fold at the boundary and that the regret curves obey a finite-size scaling collapse onto one universal wedge across two decades of corpus size. A real approximate index does not mis-locate the boundary, but a biased cost model opens a persistent miscalibration band that estimation-error robustness cannot fix. The contribution is a characterization, not a new index. Code and the full pre-registration are public.
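
The phase picture can be sketched as a plan chooser keyed on estimated selectivity, plus a check for whether a multiplicative estimation error of up to a factor $e^{\epsilon}$ can push the estimate across a boundary. The ordering of the regions is an assumption read off the abstract: pre-filter at the lowest selectivity, in-filter between the percolation threshold $0.83/M$ and $k/K$, and post-filter above $k/K$. The sketch also assumes $0.83/M < k/K$. All names and parameter values are hypothetical.

```python
import math


def choose_plan(s_hat, k, K, M):
    """Pick a filtered-ANN strategy from an estimated selectivity s_hat.

    Hypothetical phase layout: post-filter needs about k matches among K
    candidates (s >= k/K); in-filter needs the filtered graph to stay above
    the site-percolation threshold (s >= 0.83/M); otherwise use pre-filter.
    """
    if s_hat >= k / K:
        return "post-filter"
    if s_hat >= 0.83 / M:
        return "in-filter"
    return "pre-filter"


def in_critical_region(s, eps, k, K, M):
    """True if s lies within log-distance eps of a phase boundary.

    Only there can an estimate that is off by a factor of up to exp(eps)
    land on a different plan than the true selectivity would.
    """
    boundaries = (k / K, 0.83 / M)
    return any(abs(math.log(s) - math.log(b)) <= eps for b in boundaries)


k, K, M = 10, 100, 32  # boundaries at s = 0.1 and s = 0.83 / 32 (about 0.026)
for s in (0.005, 0.027, 0.05, 0.11, 0.5):
    print(s, choose_plan(s, k, K, M), in_critical_region(s, 0.2, k, K, M))
```

Away from both boundaries, any error up to the stated factor still selects the oracle plan, so regret is zero there. This mirrors the abstract's claim that regret is confined to log-width wedges around the boundaries.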

2606.16356 2026-06-16 cs.LG New submission

Simulation-Augmented Multi-Step Split Conformal Prediction for Aggregated Forecasts

Andro Sabashvili

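AI summary: Proposes SA-MSCP, which generates future paths from cross-validated residuals via a block bootstrap and builds prediction intervals from empirical quantiles, improving empirical coverage for aggregated and growth-rate targets.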

Comments Accepted at ICML 2026 workshop: Forecasting as a New Frontier of Intelligence

Abstract

We study uncertainty quantification for aggregated forecasting tasks such as annual totals and year-over-year growth rates. We propose SA-MSCP, a simulation-augmented multi-step split conformal method that generates future paths from cross-validated residuals using a block bootstrap and constructs prediction intervals from empirical quantiles. Experiments show that SA-MSCP improves empirical coverage over a simulated-path baseline for aggregated and growth-rate targets. Our results demonstrate that simulation-enhanced conformal calibration is an effective and general framework for uncertainty quantification in aggregated time-series forecasting.
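
A minimal sketch of the simulation step follows. It assumes a moving-block bootstrap of one-step residuals added to a point-forecast path, with an annual total as the aggregate. The block length, the number of paths, the function names, and the residuals are hypothetical. The split-conformal calibration that SA-MSCP applies on top is not reproduced here.

```python
import math
import random


def block_bootstrap(residuals, horizon, block_len, rng):
    """Draw a residual path of length `horizon` by concatenating random contiguous blocks."""
    n = len(residuals)
    path = []
    while len(path) < horizon:
        start = rng.randrange(n - block_len + 1)
        path.extend(residuals[start:start + block_len])
    return path[:horizon]


def empirical_quantile(values, q):
    """Lower empirical quantile: the ceil(q * n)-th smallest value."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


def aggregate_interval(point_path, residuals, alpha=0.1, block_len=3, n_paths=2000, seed=0):
    """Equal-tailed prediction interval for the sum of a future path (e.g. an annual total)."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_paths):
        noise = block_bootstrap(residuals, len(point_path), block_len, rng)
        totals.append(sum(p + e for p, e in zip(point_path, noise)))
    return empirical_quantile(totals, alpha / 2), empirical_quantile(totals, 1 - alpha / 2)


forecast = [100.0] * 12  # hypothetical monthly point forecasts
resid = [2.0, -1.5, 0.5, 3.0, -2.5, 1.0, -0.5, 1.5, -3.0, 0.0, 2.5, -1.0]
print(aggregate_interval(forecast, resid))
```

Summing whole simulated paths, rather than adding per-step interval widths, keeps the serial correlation that the blocks preserve. This matters for aggregates such as totals and growth rates.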

2606.16411 2026-06-16 cs.LG New submission

Not all Jensen-Shannon Divergence Estimators are Equal

Alba Garrido, Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, Juan Parras

Affiliations: Information Processing and Telecommunications Center, ETSI Telecomunicación, Universidad Politécnica de Madrid

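AI summary: Addresses underspecified Jensen-Shannon divergence estimation protocols in evaluating the fidelity of synthetic tabular data. Systematically studies how factors such as estimator family and sampling protocol affect the estimates, reveals the dependence blindness of marginal estimators and the sensitivity of classifier-based estimators, and proposes a posterior correction.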

Abstract

The Jensen-Shannon divergence is widely reported as a scalar measure of fidelity for synthetic tabular data. Yet, in practice, it is estimated from finite samples using protocols that are often underspecified. This creates a measurement problem. Although the population divergence is well defined, the empirical value depends on the estimator family, sampling protocol, calibration, dimensionality, and class balance. We show that different protocols can yield non-comparable values: marginal-based estimators ignore dependencies in the joint distribution and can severely underestimate divergence, while classifier-based estimators capture joint structure but exhibit strong estimator dependence. We systematically study this behavior across controlled settings with reference divergences and real-world synthetic tabular benchmarks. Our analysis reveals dependence blindness in marginal estimators, prior-shift bias under class imbalance, and estimator sensitivity in high dimensions. To address prior shift, we derive a closed-form posterior correction for classifier-based Jensen-Shannon estimation. Our results show that empirical Jensen-Shannon divergence values are inherently protocol-dependent, making explicit specification of the estimation procedure necessary for meaningful comparison. We provide practical guidelines and an open-source tool for estimator-aware Jensen-Shannon evaluation.
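
A classifier-based estimate turns classifier probabilities into a density ratio $r(x) = p(x)/q(x)$ and plugs it into $\mathrm{JS}(P, Q) = \tfrac12 \mathbb{E}_P[\log \tfrac{2r}{1+r}] + \tfrac12 \mathbb{E}_Q[\log \tfrac{2}{1+r}]$. If the classifier was trained on data in which a fraction $\pi$ of samples were real, Bayes' rule gives $r = \tfrac{D}{1-D} \cdot \tfrac{1-\pi}{\pi}$, where $D = P(\text{real} \mid x)$. The sketch below uses this standard prior-shift adjustment, which may differ in detail from the paper's closed-form correction. The function names and inputs are hypothetical.

```python
import math


def density_ratio(d, pi, eps=1e-12):
    """Convert a classifier probability d = P(real | x) into r = p(x) / q(x).

    pi is the fraction of real samples in the classifier's training data; the
    factor (1 - pi) / pi removes the prior shift caused by class imbalance.
    """
    d = min(max(d, eps), 1 - eps)  # clip to avoid division by zero and log(0)
    return d / (1 - d) * (1 - pi) / pi


def js_from_classifier(d_real, d_synth, pi=0.5):
    """Jensen-Shannon divergence in nats from classifier outputs on held-out samples.

    d_real: P(real | x) evaluated on held-out real samples x ~ P.
    d_synth: P(real | x) evaluated on held-out synthetic samples x ~ Q.
    """
    r_p = [density_ratio(d, pi) for d in d_real]
    r_q = [density_ratio(d, pi) for d in d_synth]
    kl_p_m = sum(math.log(2 * r / (1 + r)) for r in r_p) / len(r_p)
    kl_q_m = sum(math.log(2 / (1 + r)) for r in r_q) / len(r_q)
    return 0.5 * kl_p_m + 0.5 * kl_q_m


# Indistinguishable data seen by a balanced classifier: outputs of 0.5 give JS = 0.
print(js_from_classifier([0.5] * 4, [0.5] * 4))
# With 80% real training data, outputs of 0.8 only reflect the prior: JS is (numerically) 0.
print(js_from_classifier([0.8] * 4, [0.8] * 4, pi=0.8))
```

Running the second example with the default `pi=0.5` (no correction) gives about -0.22, a negative and meaningless divergence. This is a small instance of the prior-shift bias the abstract describes.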

2606.16511 2026-06-16 cs.LG New submission

Tail-Shape Estimation in LLM Evaluation Is Fragile: A Protocol for Diagnosing False Positives

Luca Zhou

Affiliations: Sapienza University of Rome

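AI summary: Proposes a protocol for detecting false positives in tail-shape estimation for LLM evaluation. It uses the extreme-value-theory tail index to separate how heavy a tail is from how large its mass is, and identifies three false-positive modes in toxicity evaluation.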

Comments 9 pages of main paper, 4 figures and 4 tables in the main paper, more in the appendix

Abstract

Recent work motivates moving large language model (LLM) evaluation from mean-based to tail-aware metrics, including conditional value-at-risk and tail-index estimates of reward-model error. We ask whether the canonical extreme-value-theory tail-index parameter, which isolates how heavy a tail is from how large the tail mass is, adds discriminative information beyond the mean and a standard tail-magnitude statistic in LLM evaluation. We pre-register a protocol covering admissibility, goodness-of-fit, threshold-stability, and effect-size requirements for any positive tail-shape claim. The protocol is the contribution of this paper; the empirical study below is a demonstration of what its gates catch. Applied to a standard LLM toxicity-evaluation setup under two structurally different scorer families, the protocol catches three distinct modes of false positives that a naive analysis would have published, and rejects the headline tail-shape claim on both scorers. We conclude that tail-shape estimation in the LLM toxicity-evaluation setups we examined is more fragile than the recent literature suggests, and recommend the protocol as a starting point for tail-index claims in similar setups.
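
The Hill estimator over the top $k$ order statistics is one common tail-index estimator. It is used here purely as an illustration; the abstract does not name the estimator or the thresholds. Re-running it across several $k$ gives a crude version of the threshold-stability check the protocol requires: if the estimate drifts as $k$ changes, a tail-shape claim is on weak ground. The function names and the synthetic Pareto sample are hypothetical.

```python
import math


def hill_tail_index(values, k):
    """Hill estimate of the tail index alpha from the k largest values.

    gamma_hat = mean of log X_(n-i+1) for i = 1..k, minus log X_(n-k);
    alpha_hat = 1 / gamma_hat.
    """
    xs = sorted(values)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < n")
    threshold = xs[-k - 1]
    if threshold <= 0:
        raise ValueError("the Hill estimator needs a positive threshold")
    gamma = sum(math.log(x) for x in xs[-k:]) / k - math.log(threshold)
    return 1.0 / gamma


def threshold_stability(values, ks):
    """Hill estimates across several tail sizes k (a basic Hill plot)."""
    return {k: hill_tail_index(values, k) for k in ks}


# Deterministic Pareto(alpha = 2) quantiles: x = (1 - u) ** (-1 / alpha).
n, alpha = 10_000, 2.0
sample = [(1 - i / (n + 1)) ** (-1 / alpha) for i in range(1, n + 1)]
print(threshold_stability(sample, [100, 500, 1000]))
```

On an exact Pareto sample the estimates stay near 2 for every $k$. On real evaluation scores, a strong dependence on $k$ is the kind of warning sign that such a stability gate is meant to catch.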

2606.16562 2026-06-16 cs.LG New submission

MIRAGE: Auditing Anti-Muslim Bias in Frontier LLMs Across Reasoning, Agentic, and Time-Coupled Conditions

Noor Islam S. Mohammad, Tamim Sheikh

Affiliations: University of California, Berkeley

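AI summary: Proposes the MIRAGE benchmark of 1,200 prompts covering three deployment scenarios (direct completion, chain-of-thought reasoning, and simulated agentic decision-making). Finds that chain-of-thought amplifies bias, agentic decisions are asymmetric, and bias is time-coupled to retrieved news, while existing mitigations have limited effect.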

Abstract

Five years after the discovery of persistent anti-Muslim bias in large language models, most evaluations remain confined to single-turn prompt completion, a setting that no longer reflects how frontier LLMs are deployed. We introduce MIRAGE (Muslim-Identity Reasoning and Agentic Generation Evaluation), a benchmark of 1,200 prompts spanning three deployment-realistic conditions: direct completion, chain-of-thought reasoning, and simulated agentic decision-making across content moderation, lending triage, refugee claim summarization, and hiring screens. Across six frontier models, we find that (i) chain-of-thought reasoning amplifies rather than suppresses Muslim-violence associations by 12-34% relative to direct completion, (ii) agentic decisions exhibit a 9-22 percentage-point asymmetry between Muslim and matched non-Muslim cases on identical evidence, and (iii) bias is sharply time-coupled to retrieved news context, increasing 18-27% under recent-conflict retrieval. Existing prompt-based mitigations transfer poorly across our three conditions, suppressing direct-completion bias while leaving agentic asymmetry largely intact. We release MIRAGE and an open evaluation harness to support targeted mitigation research.

2606.16748 2026-06-16 cs.LG cs.CL New submission

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

Lawrence Keunho Jang, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov

Affiliations: Carnegie Mellon University

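AI summary: Proposes MyPCBench, which tests personal computer-use agents in a simulated real-world desktop environment containing 17 web applications. The best model, Claude Opus 4.6, solves only 55.4% of tasks, and failures concentrate on multi-application tasks and long trajectories.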

Abstract

Current benchmarks for computer-use agents evaluate models in impersonal environments. This leaves a gap between evaluation and deployment where personal assistants are expected to work across a user's whole digital life, including their context, historical data, and logged-in accounts. This gap is widest on web tasks, where live web evaluations cannot exercise sites that require logging in or personal information, the kind of site a real personal assistant has to drive. We introduce MyPCBench, which tests computer-use agents as personal assistants on a Linux desktop populated with 17 simulated real-world web applications and a full desktop stack, all seeded for one canonical persona, Michael Scott from The Office. We define 184 tasks in this environment, each inspired by a real request drawn from the OpenClaw community, and benchmark six closed and open-weight models with a uniform computer+bash tool surface. We find that the best model, Claude Opus 4.6, fully solves 55.4% of the tasks, the only model above 50%. Model failures cluster on tasks that span many applications and on long trajectories, where personalization stresses an assistant the most. We release the environment, task set, and agent harness at https://mypcbench.com.

2606.16765 2026-06-16 cs.LG physics.flu-dyn New submission

A Validated LBM Dataset and Pipeline for Surrogate Modeling of Turbulent 3D Obstructed Channel Flows

Lukas Schröder, Shubham Kavane, Harald Köstler

Affiliations: Chair of Computer Science 10 (System Simulation); Friedrich-Alexander-Universität Erlangen-Nürnberg

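AI summary: Presents a reproducible pipeline that generates training data for 3D channel flows at Reynolds numbers 1,000-10,000 with a lattice Boltzmann solver using cumulant collision operators. The solver is validated against experimental measurements and grid convergence studies, providing a basis for standardized comparison of neural operators.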

Comments 4 pages + appendix, 9 figures, Accepted at the 1st Workshop on Differentiable Systems and Scientific Machine Learning (SysDiff) @ EurIPS 2025, OpenReview: https://openreview.net/forum?id=rdmHT72NQH

Abstract

Evaluating neural operators for 3D turbulent flow requires validated datasets with physical benchmarks. We present a reproducible pipeline generating training data for 3D channel flows around generated geometries at Re=1,000-10,000. Our lattice Boltzmann solver with cumulant collision operators is rigorously verified against experimental measurements (Strouhal number, drag coefficients, turbulent fluctuations) with comprehensive grid convergence studies at resolution 1024x512x512. Building upon an established framework, this validated pipeline enables standardized surrogate model comparison. We outline planned systematic evaluation of Fourier Neural Operator and U-Net variants on forecasting, super-resolution, and error correction tasks, using physics-informed metrics to assess turbulent energy cascade representation. Future work will compare computational efficiency between numerical solvers and neural surrogates, exploring practical application. We seek community feedback on our validation approach, planned benchmark methodology, and evaluation priorities for neural operators in turbulent flows.
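
The Strouhal number used for validation is $St = f D / U$, where $f$ is the vortex-shedding frequency, $D$ a characteristic obstacle size, and $U$ the inflow velocity. A minimal sketch follows. It assumes $f$ is read from upward zero crossings of the mean-removed lift-coefficient history; a spectral peak would be an equally common choice. The function names and the synthetic signal are hypothetical.

```python
import math


def shedding_frequency(signal, dt):
    """Estimate the dominant frequency from upward zero crossings of a mean-removed signal.

    Crossing times are refined by linear interpolation between neighbouring samples.
    """
    mean = sum(signal) / len(signal)
    x = [v - mean for v in signal]
    crossings = []
    for i in range(1, len(x)):
        if x[i - 1] < 0 <= x[i]:
            frac = -x[i - 1] / (x[i] - x[i - 1])
            crossings.append((i - 1 + frac) * dt)
    if len(crossings) < 2:
        raise ValueError("signal too short: need at least two upward crossings")
    # Number of full periods divided by the time they span.
    return (len(crossings) - 1) / (crossings[-1] - crossings[0])


def strouhal(signal, dt, diameter, velocity):
    """St = f * D / U."""
    return shedding_frequency(signal, dt) * diameter / velocity


# Synthetic lift history: shedding at f = 0.21 (in the time units of dt) plus an offset.
dt = 0.01
lift = [0.3 + 0.8 * math.sin(2 * math.pi * 0.21 * j * dt) for j in range(20_000)]
print(strouhal(lift, dt, diameter=1.0, velocity=1.0))
```

In a real run, the initial transient should be discarded before measuring. Lattice-unit values of $D$, $U$, and the time step must be used consistently so that $St$ comes out dimensionless.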

2606.16863 2026-06-16 cs.LG New submission

HawkesNest: A Multi-Axis Synthetic Benchmark for Spatiotemporal Pattern Complexity

Yahya Aalaila, Sumantrak Mukherjee, Gerrit Großmann, Sebastian Vollmer

Affiliations: German Research Center for Artificial Intelligence (DFKI), Data Science and its Applications Research Group, Kaiserslautern, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau (RPTU), Kaiserslautern, Germany

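AI summary: Proposes the HawkesNest benchmark, which defines four complexity axes on a multivariate Hawkes process to test spatiotemporal point process models under known, controlled structural difficulty.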

Abstract

Evaluation of spatiotemporal point process (STPP) models relies heavily on opaque real-world datasets, where latent generative structure is unknown and model failures are difficult to attribute. We introduce HawkesNest, a generator-aligned benchmark for controlled spatiotemporal pattern complexity built on a multivariate Hawkes backbone. HawkesNest defines four complexity axes: space-time entanglement, background heterogeneity, cross-type interaction, and domain topology. Each axis is associated with a deterministic index computed from the latent data-generating mechanism. By varying these axes while holding global rate, stability, and simulation budget fixed, HawkesNest enables diagnostic stress tests of STPP models under known structural difficulty. We verify that the indices are monotone and nearly orthogonal under controlled sweeps. We illustrate its use by showing that Hawkes-family baselines degrade under joint heterogeneity-entanglement complexity, even though they are structurally aligned with the Hawkes data-generating backbone. We further show that HawkesNest exposes neural-model sensitivity: AutoSTPP remains vulnerable under isolated increases in space-time entanglement. Code is available at https://github.com/YahyaAalaila/HawkesNest
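
HawkesNest builds on a multivariate spatiotemporal Hawkes backbone. A much simpler model illustrates the self-exciting mechanism: a purely temporal, single-type process with an exponential kernel. This is not the benchmark's generator, and all names and parameter values are assumptions. Ogata's thinning algorithm simulates a Hawkes process with intensity $\lambda(t) = \mu + \sum_{t_i < t} \eta\,\beta e^{-\beta (t - t_i)}$, where the branching ratio $\eta < 1$ keeps the process stable.

```python
import math
import random


def simulate_hawkes(mu, branching, beta, horizon, seed=0):
    """Ogata thinning for a univariate Hawkes process with an exponential kernel.

    Intensity: mu + sum over past events of branching * beta * exp(-beta * (t - t_i)).
    Between events the intensity only decays, so its current value is a valid
    upper bound until the next candidate point.
    """
    if mu <= 0 or not 0 <= branching < 1:
        raise ValueError("need mu > 0 and 0 <= branching < 1")
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-excited part of the intensity at time t
    while True:
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait > horizon:
            return events
        t += wait
        excitation *= math.exp(-beta * wait)
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += branching * beta


poisson = simulate_hawkes(mu=1.0, branching=0.0, beta=1.0, horizon=1000.0)
hawkes = simulate_hawkes(mu=1.0, branching=0.5, beta=2.0, horizon=1000.0)
# Long-run means: mu * T = 1000 without excitation, about mu * T / (1 - branching) = 2000 with it.
print(len(poisson), len(hawkes))
```

Holding `mu / (1 - branching)` fixed while changing `branching` keeps the overall event rate constant while changing how clustered the events are. This is a one-dimensional analogue of varying one complexity axis with the global rate held fixed.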

2606.17014 2026-06-16 cs.LG math.ST stat.ML stat.TH New submission

Filtered Conformal Ellipsoids for Graph-Native Time Series

Yannick Limmer

Affiliations: DRW London

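AI summary: Proposes filtered conformal ellipsoids, which combine state-space filtering with conformal calibration to produce joint prediction sets for multivariate time series that control a single event and adapt to cross-coordinate dependence. Coverage bounds are derived through an analysis in an observable predictive-law quotient.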

Abstract

Joint prediction sets for multivariate time series should control a single event while adapting to cross-coordinate dependence. We study filtered conformal ellipsoids: a frozen state-space filter emits a one-step predictive mean and covariance, and split-conformal calibration is applied to the resulting Mahalanobis scores. The filter is used to choose the ellipsoid shape; conformal calibration chooses the scalar radius, so the construction benefits from a learned predictive covariance without relying on Gaussian tail probabilities for coverage. The main difficulty is that filtered scores are dependent and learned recurrent filters need not contract in their raw hidden state; we therefore analyse contraction in an observable predictive-law quotient that identifies hidden states producing the same future sequence of emitted Gaussian laws. Under a stable Bayes Gaussian-projection filter, covariance bounds, and a finite-horizon observability Fisher condition, small excess Gaussian negative log-likelihood implies contraction of the learned emitted laws. Combined with a threshold-autocovariance envelope this yields a Chebyshev-type approximate coverage bound for filtered split-conformal prediction under dependence; a sharper Bernstein-type bound requires an additional geometric-mixing concentration assumption. Under Gaussian oracle realisability we also obtain a near-oracle log-volume comparison within the class of conditionally valid Gaussian ellipsoid rules. We instantiate the framework with a GCN-GRU filter with diagonal-plus-low-rank covariance. On moderate-size graph-native traffic benchmarks (METRLA-$20$ and PEMSBAY-$50$), the learned filter gives sharper at-target ellipsoids than static-covariance and non-filter baselines; at full-graph scale and on non-graph-native datasets, factor and copula baselines can be stronger.
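
The split described above can be sketched with the filter supplying the shape and conformal calibration supplying the radius. The sketch computes squared Mahalanobis scores on a calibration split, takes the $\lceil (n+1)(1-\alpha) \rceil$-th smallest score as the squared radius, and tests whether new observations fall inside the resulting ellipsoid. A single fixed mean and covariance stand in for the per-step outputs of the frozen filter. The GCN-GRU filter is not reproduced, and the sketch uses i.i.d. Gaussian toy data, sidestepping the dependence that the paper's analysis handles. Function names and data are hypothetical.

```python
import math

import numpy as np


def mahalanobis_sq(y, mean, cov):
    """Squared Mahalanobis distance (y - mean)^T cov^{-1} (y - mean), via a linear solve."""
    diff = y - mean
    return float(diff @ np.linalg.solve(cov, diff))


def conformal_radius_sq(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def in_ellipsoid(y, mean, cov, radius_sq):
    return mahalanobis_sq(y, mean, cov) <= radius_sq


rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])  # stand-in for a filter's predictive covariance
mean = np.zeros(2)
calib = rng.multivariate_normal(mean, cov, size=500)
scores = [mahalanobis_sq(y, mean, cov) for y in calib]
r2 = conformal_radius_sq(scores, alpha=0.1)

holdout = rng.multivariate_normal(mean, cov, size=2000)
coverage = np.mean([in_ellipsoid(y, mean, cov, r2) for y in holdout])
print(r2, coverage)
```

Coverage here comes from the rank of the calibration scores, not from a Gaussian tail probability. A misspecified covariance would change the ellipsoid's shape and volume, but, under exchangeability, not its coverage.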

2606.14780 2026-06-16 cs.CV cs.LG Cross-list

YTClickbait21K: Human-Annotated Multimodal Dataset for YouTube Clickbait Detection Across Diverse Channels and Content Categories

Md. Minhazul Islam, Md. Tanbeer Jubaer, Amith Khandakar, Shovon Sarker, Sumaiya Rahman, Md. Masum Mia, Mohamed Arselene Ayari, Hamed Noori

Affiliations: Department of Computer Science and Engineering, Rajshahi University of Engineering & Technology; Department of Electrical Engineering, Qatar University; Department of Civil and Environmental Engineering, Qatar University; SenseNet Inc.

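AI summary: Addresses the lack of large-scale, high-quality multimodal data for clickbait detection on video platforms by building YTClickbait21K, a human-annotated dataset of 21,238 videos from 40 channels in 29 countries. The dataset covers categories such as news, entertainment, education, and gaming, with quality ensured by independent labeling from three annotators and majority voting. It provides a benchmark for multimodal semantic understanding and automated content moderation.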

AI中文摘要

视频分享平台上的点击诱饵内容对信息可靠性构成重大挑战,然而自动检测的进展一直受限于缺乏大规模、高质量的多模态数据集。我们提出了YTClickbait21K,一个人工标注的YouTube点击诱饵数据集,包含来自29个国家40个频道的21,238个视频,覆盖新闻、娱乐、教育和游戏等多种内容类别。每个样本包括结构化元数据(标题、描述、互动统计)以及相关的缩略图图像,支持全面的多模态分析。为确保标注质量,每个视频由三名标注员使用标准化的决策框架独立标注,该框架融合了文本、视觉和跨模态一致性线索,最终标签通过多数投票确定。该数据集的标注者间一致性较高(κ=0.65),表明尽管点击诱饵检测具有固有的主观性,标注仍然可靠。通过结合规模、标注严谨性和多模态丰富性,该数据集为开发和评估机器学习模型提供了稳健的基准,促进了跨模态语义理解的研究,并推动了自动内容审核系统的发展。

英文摘要

Clickbait content on video-sharing platforms poses a significant challenge to information reliability, yet progress in automated detection has been constrained by the lack of large-scale, high-quality multimodal datasets. We present YTClickbait21K, a human-annotated YouTube clickbait dataset comprising 21,238 videos collected from 40 channels across 29 countries, covering diverse content categories such as news, entertainment, education, and gaming. Each sample includes structured metadata (title, description, engagement statistics) along with associated thumbnail images, enabling comprehensive multimodal analysis. To ensure annotation quality, every video was independently labeled by three annotators using a standardized decision framework that incorporates textual, visual, and cross-modal consistency cues, with final labels determined through majority voting. The dataset exhibits substantial inter-annotator agreement (κ = 0.65), confirming reliable labeling despite the inherent subjectivity of clickbait detection. By combining scale, annotation rigor, and multimodal richness, this dataset provides a robust benchmark for developing and evaluating machine learning models, facilitating research in cross-modal semantic understanding, and advancing automated content moderation systems.
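The abstract reports κ = 0.65 for three annotators with majority-vote labels but does not say which kappa variant was used. The sketch below assumes Fleiss' kappa, the usual choice for more than two raters, over binary clickbait labels (1 = clickbait). It shows both the agreement statistic and the vote.

```python
from collections import Counter


def majority_vote(labels):
    """Most common label among an odd number of annotators (three per video here)."""
    return Counter(labels).most_common(1)[0][0]


def fleiss_kappa(ratings, categories=(0, 1)):
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    ratings: list of per-item label lists, e.g. [[1, 1, 0], [0, 0, 0], ...]
    """
    N = len(ratings)
    n = len(ratings[0])
    # counts[i][j]: how many raters put item i in category j
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Observed agreement per item, averaged over items
    P_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # Chance agreement from the marginal category proportions
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(len(categories))]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)
```

For example, `fleiss_kappa([[1, 1, 1], [0, 0, 0]])` is 1.0 (perfect agreement), and `majority_vote([1, 0, 1])` yields the final label 1.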

2606.14784 2026-06-16 cs.SD cs.LG eess.AS 交叉投稿

LLM-Based Synthetic Ground Truth Generation for Audio-Based Emotion Classification via In-Context Learning

基于上下文学习的LLM合成真实标签生成:面向音频情感分类

Qing Huang, Pooja Pol, Jianing Zhang

发表机构 * School of Business, Technical University of Applied Sciences Augsburg(奥格斯堡应用技术大学商学院) Data Science und Autonome Systeme Technologietransferzentrum (TTZ)(数据科学与自主系统技术转移中心(TTZ))

AI总结 提出利用大语言模型(LLM)和上下文学习(ICL)从多用户VR环境的流式语音数据中自动生成情感相关合成真实标签,解决团队协作状态标注难题。

Comments Proceedings of the International Conference on Applied Innovations in IT (ICAIIT), April 2026

AI中文摘要

理解人类状态和交互动态是人机交互(HCI)的核心目标。随着交互范式变得更加沉浸,虚拟现实(VR)已成为研究协作工作的强大平台。在此类环境中,评估团队协作状态(包括团队表现和团队韧性)需要从多模态传感器数据(如语音信号)中连续可靠地推断潜在的团队级认知和情感状态。然而,由于传感器噪声、上下文变异性和稀疏的专家标注,为这些潜在状态生成真实标签仍然具有挑战性。传统的自我报告方法仅提供静态和延迟的测量,因此不足以捕捉连续语音数据中反映的动态团队过程。在这项工作中,我们提出了一种由大语言模型(LLM)驱动的、基于代理的推理工作流,用于从多用户VR环境中的流式语音数据自动生成情感相关的合成真实标签。利用LLM的泛化能力,我们使用上下文学习(ICL)和少量配对的音频样本及其对应转录的演示。ICL倾向于实现与模型微调相当的任务适应,同时避免了参数更新的计算开销。为了构建信息丰富且鲁棒的上下文提示,我们采用基于检索的选择策略,根据声学特征空间中的相似性动态识别相关的音频演示。

英文摘要

Understanding human states and interaction dynamics is a core goal of human-computer interaction (HCI). As interaction paradigms become more immersive, virtual reality (VR) has emerged as a powerful platform for studying collaborative work. In such settings, evaluating team collaboration states, including team performance and team resilience, requires continuous and reliable inference of latent team-level cognitive and affective states from multi-modal sensor data, such as speech signals. However, generating ground truth labels for these latent states remains challenging due to sensor-induced noise, contextual variability, and sparse expert annotations. Traditional self-reporting approaches provide only static and delayed measurements and are therefore insufficient for capturing dynamic team processes reflected in continuous speech data. In this work, we propose a large language model (LLM)-driven, agentic inference workflow for automated emotion-related synthetic ground truth generation from streaming speech data in multi-user VR environments. Leveraging the generalization capabilities of LLMs, we use In-Context Learning (ICL) with few-shot demonstrations of paired audio-based samples and their corresponding transcriptions. ICL tends to achieve task adaptation comparable to model fine-tuning while circumventing the computational overhead of parameter updates. To construct informative and robust in-context prompts, we adopt a retrieval-based selection strategy that dynamically identifies relevant audio demonstrations based on similarity in the acoustic feature space.
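Retrieval-based demonstration selection can be sketched as a cosine-similarity nearest-neighbour search in an acoustic feature space, followed by prompt assembly. Several details are assumptions not specified in the abstract: the feature vectors (for example, pooled MFCCs or a pretrained audio encoder's embeddings), the prompt wording, and `k`. `build_prompt` is a hypothetical helper.

```python
import numpy as np


def select_demonstrations(query_feat, pool_feats, k=3):
    """Indices of the k pool clips whose acoustic features are most cosine-similar to the query."""
    q = query_feat / np.linalg.norm(query_feat)
    P = pool_feats / np.linalg.norm(pool_feats, axis=1, keepdims=True)
    sims = P @ q
    return np.argsort(-sims)[:k]  # highest similarity first


def build_prompt(demos, query_transcript):
    """Hypothetical few-shot prompt: (transcript, emotion label) pairs, then the unlabeled query."""
    parts = ["Label the speaker's emotion."]
    for transcript, label in demos:
        parts.append(f"Transcript: {transcript}\nEmotion: {label}")
    parts.append(f"Transcript: {query_transcript}\nEmotion:")
    return "\n\n".join(parts)
```

Because demonstrations are chosen per query, the prompt changes as the speech stream changes, while the LLM's weights stay fixed. That is the sense in which ICL avoids parameter updates.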

2606.14870 2026-06-16 hep-ph cs.LG 交叉投稿

Pre-Training for Simulation-Based Science: A Study on Jet Foundation Model Training Objectives

基于模拟的科学预训练:喷注基础模型训练目标研究

Ibrahim Elsharkawy, Joschka Birk, Vinicius Mikuni, Wahid Bhimji, Gregor Kasieczka, Benjamin Nachman

发表机构 * Department of Physics, University of Toronto and Vector Institute(物理系,多伦多大学和向量研究所) NERSC, Lawrence Berkeley National Laboratory(NERSC,劳伦斯伯克利国家实验室) Institut für Experimentalphysik, Universität Hamburg(实验物理研究所,汉堡大学) Nagoya University, Kobayashi-Maskawa Institute(名古屋大学小林-益川研究所) Department of Particle Physics and Astrophysics, Stanford University(粒子物理与天体物理系,斯坦福大学) Fundamental Physics Directorate, SLAC National Accelerator Laboratory(基础物理局,SLAC国家加速器实验室)

AI总结 本文系统比较了高能物理中基础模型的预训练方法,发现纯分类预训练在标签充足时最优,结合自监督掩码粒子建模在低标签场景下表现突出,而流匹配生成预训练对下游分类无益,但必须包含在预训练目标中才能提升生成任务。

AI中文摘要

基于大规模数据集预训练并在下游任务上微调的基础模型已成为人工智能促进科学领域的强大范式。工业基础模型由于缺乏标签,通常采用基于掩码的自监督训练。在许多科学领域,精确的模拟资源丰富,可以提供大规模带标签数据集,这为预训练开辟了新的可能性。我们利用OmniLearned高能物理基础模型框架,系统比较了多种预训练方法:监督分类、流匹配生成和自监督掩码粒子建模(MPM)。所有模型均在JetClass数据集上预训练,并在两个代表性下游任务(顶夸克喷注分类和JetNet条件生成)上微调。在诸多发现中,对于分类任务,我们发现当下游标签和模型容量充足时,纯分类器预训练是最优的;而在微调标签稀缺的情形下,将其与自监督掩码粒子建模结合具有独特优势。基于流匹配的生成预训练似乎对下游分类几乎没有益处;有趣的是,对于下游生成,我们发现流匹配必须出现在预训练目标中才能带来显著的微调优势,这暗示了分类与生成任务的正交性。也就是说,要使模型能够同时迁移到生成和分类下游任务,它必须在两者上都进行预训练。本研究为基于模拟的科学中基础模型预训练目标的受控缩放分析提供了模板。

英文摘要

Foundation models (FMs) trained on large datasets and fine-tuned on downstream tasks have emerged as a powerful paradigm in AI for science. Industrial FMs are typically trained using self-supervision with masking due to the lack of labels. In many scientific domains, accurate simulations are plentiful and facilitate large, labeled datasets. This opens up new possibilities for pre-training. We present a systematic comparison of pre-training methods using the OmniLearned High Energy Physics FM framework. We test supervised classification, flow-matching generation, and self-supervised masked particle modeling. All models are pre-trained on the JetClass dataset and fine-tuned on two representative downstream tasks, top jet classification and JetNet conditional generation. Among other observations, for classification tasks, we find that pure classifier pre-training is optimal when downstream labels and model capacity are plentiful, but combining it with self-supervised masked particle modeling (MPM) is uniquely powerful in the low-label fine-tuning regime. Flow matching-based generative pre-training seems to provide little benefit for downstream classification, and interestingly, for downstream generation, we find that flow matching must be in the pre-training objective to see a significant fine-tuning advantage, hinting at the orthogonality of classification and generation tasks. That is, for a model to transfer to both generative and classification downstream tasks, it must be pre-trained on both. This study provides a template for controlled scaling analysis of pre-training objectives for foundation models in simulation-based sciences.
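The sketch below shows the regression target used by one common flow-matching variant, the linear (rectified-flow) probability path. It is meant for readers unfamiliar with the objective compared above. Whether OmniLearned uses exactly this path is an assumption. Here `x1` stands for a batch of jet feature vectors, `x0` is Gaussian noise, and the velocity network itself is omitted.

```python
import numpy as np


def flow_matching_pair(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0."""
    t = t[:, None]  # one time per sample, broadcast over features
    xt = (1 - t) * x0 + t * x1
    return xt, x1 - x0


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return np.mean(np.sum((v_pred - v_target) ** 2, axis=1))


def make_batch(x1, rng):
    """Draw noise x0 ~ N(0, I) and times t ~ U(0, 1) for a batch of data x1."""
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform(size=x1.shape[0])
    return flow_matching_pair(x0, x1, t)
```

During pre-training, a network $v_\theta(x_t, t)$ would be fit to `x1 - x0` with `flow_matching_loss`. This differs from the classification and masked-particle objectives, which predict labels or hidden particles instead of a velocity field.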

2606.14958 2026-06-16 cs.CV cs.IR cs.LG 交叉投稿

MVEB: Massive Video Embedding Benchmark

MVEB:大规模视频嵌入基准

Adnan El Assadi, Roman Solomatin, Isaac Chung, Chenghao Xiao, Deep Shah, Manan Dey, Shriya Sudhakar, Zacharie Bugaud, Wissam Siblini, Ayush Sunil Munot, Yashwanth Devavarapu, Rakshitha Ireddi, Michelle Yang, Márton Kardos, Niklas Muennighoff, Kenneth Enevoldsen

AI总结 提出MVEB基准,包含23个任务评估33种视频嵌入模型,发现无单一模型占优,音频贡献取决于标注来源,并集成到MTEB生态。

AI中文摘要

我们介绍了大规模视频嵌入基准(MVEB),这是一个包含23个任务的视频嵌入基准,涵盖分类、零样本分类、聚类、配对分类、检索和以视频为中心的问答。我们评估了33个模型,发现没有单一模型占优:基于MLLM的嵌入在分类、聚类、配对分类和问答上领先;多模态绑定在检索和零样本分类上领先;没有对比适应训练的生成式MLLM在跨模态任务上崩溃。成对的仅视频与音频+视频评估表明,音频的贡献取决于数据集标注来源:当标签来自两种模态时音频有帮助,当仅来自视觉时则有害,这一差距在模型族中一致为6个百分点。MVEB源自MVEB+(一个包含184个任务的任务池),旨在保持任务多样性的同时降低评估成本。它集成到MTEB生态系统中,以实现跨文本、图像、音频和视频的统一评估。我们在https://github.com/embeddings-benchmark/mteb上发布MVEB和所有184个任务,以及代码和排行榜。

英文摘要

We introduce the Massive Video Embedding Benchmark (MVEB), a 23-task benchmark for video embeddings spanning classification, zero-shot classification, clustering, pair classification, retrieval, and video-centric question answering. We evaluate 33 models and find that no single model dominates: MLLM-based embeddings lead on classification, clustering, pair classification, and QA; multimodal binding leads on retrieval and zero-shot classification; generative MLLMs without contrastive adaptation collapse on cross-modal tasks. Paired video-only vs. audio+video evaluations show that audio's contribution depends on dataset annotation provenance: audio helps when labels were produced from both modalities and hurts when they were produced from visuals alone, a six-point gap consistent across model families. MVEB is derived from MVEB+, a 184-task pool, and is designed to maintain task diversity while reducing evaluation cost. It integrates into the MTEB ecosystem for unified evaluation across text, image, audio, and video. We release MVEB and all 184 tasks along with code and a leaderboard at https://github.com/embeddings-benchmark/mteb.

2606.15123 2026-06-16 cs.CR cs.LG 交叉投稿

Data-Centric Benchmarking of Exploit Generation in LLMs: Understanding the Impact of Fine-Tuning

数据为中心的LLM漏洞利用生成基准测试:理解微调的影响

Yiwei Chen, Lichi Li, Kai Cheung, Vinny Parla, Ganesh Sundaram

发表机构 * Cisco Systems, Inc.(思科系统公司) Michigan State University(密歇根州立大学)

AI总结 采用数据驱动方法,构建高质量数据集并设计评估框架,对17个大语言模型的零样本漏洞利用生成能力进行基准测试,发现8B开放权重模型经精选数据微调后漏洞利用质量提升超过42.5%,结合简单的测试时拒绝策略后可媲美部分专有模型。

Comments Technical Report

AI中文摘要

我们研究了CVE条件漏洞利用生成任务,即模型根据软件漏洞上下文生成概念验证(PoC)漏洞利用。我们采用数据驱动的方法,通过多阶段预处理构建高质量数据集,并引入可扩展的评估框架,使用LLM作为评判者和细粒度评分标准。在此统一设置下,我们根据8个评估标准对17个大语言模型进行了基准测试,系统性地洞察了它们的零样本能力。我们进一步证明,一个紧凑的8B开源模型在精选数据上微调后,漏洞利用质量提升了超过42.5%,并且当与简单的测试时拒绝策略结合时,可与一些专有模型相媲美。我们的结果强调了数据质量、结构化监督和评估设计对于可靠漏洞利用生成的重要性,表明这些因素在将LLM适应网络安全任务时可能与模型规模同等关键。

英文摘要

We study the task of CVE-conditioned exploit generation, where a model drafts proof-of-concept (PoC) exploits given software vulnerability context. We adopt a data-centric approach, constructing a high-quality dataset via multi-stage preprocessing and introducing a scalable evaluation framework with LLM-as-judge and fine-grained rubrics. Under this unified setup, we benchmark 17 large language models across 8 evaluation criteria, providing systematic insights into their zero-shot capabilities. We further show that a compact 8B open-weight model, when fine-tuned on curated data, achieves over 42.5% improvement in exploit quality and rivals some proprietary models when combined with simple test-time rejection strategies. Our results highlight the importance of data quality, structured supervision, and evaluation design for reliable exploit generation, suggesting that these factors can be as critical as model scale in adapting LLMs to cybersecurity tasks.

2606.15367 2026-06-16 cs.AI cs.CL cs.IR cs.LG 交叉投稿

S1-DeepResearch: Beyond Search, Toward Real-World Long-Horizon Research Agents

S1-DeepResearch:超越搜索,迈向真实世界的长周期研究智能体

Yao Dong, Xinglin Xiao, Liwei Dong, Xinlong Jin, Zhengbo Li, Heng Zhang, Duyun Wang, Nan Xu

发表机构 * XScience Lab(XScience实验室) Wenge AI(闻歌人工智能)

AI总结 提出统一轨迹构建范式,结合封闭式问答与开放式探索,通过基于图的任务构建、智能体轨迹展开和多维轨迹验证,合成高质量长链推理轨迹,训练出的32B模型在20个基准上达到同等规模开源模型中的最优性能。

AI中文摘要

深度研究智能体旨在通过长周期规划、证据收集、推理和报告生成来解决复杂的知识密集型任务。尽管搜索智能体近期在信息检索和答案验证方面展现出强大能力,但现有训练数据集大多以搜索为中心,主要关注封闭式问答和信息定位。因此,它们主要训练信息寻求行为,而对关键深度研究能力(包括证据整合、知识综合、规划、文件理解和结构化报告生成)的覆盖有限。在这项工作中,我们提出了一种用于深度研究智能体的统一轨迹构建范式,该范式结合了封闭式问答和开放式探索。所提出的框架包括基于图的任务构建、智能体轨迹展开和多维轨迹验证,能够可扩展地合成涵盖长链复杂推理、深度研究指令遵循、报告撰写、文件理解与生成以及技能使用的高质量智能体轨迹。与现有的面向搜索的数据集相比,我们合成的轨迹更强调知识综合、复杂推理和规划。S1-DeepResearch-32B在跨越五个能力维度(包括复杂推理、指令遵循、报告生成、文件理解和技能使用)的20个基准测试中,达到了同等规模开源模型的最先进性能。在几个具有挑战性的深度研究基准上,它接近领先的专有前沿模型的性能。这些结果强调了联合建模信息获取、知识综合和面向规划的智能体行为对于构建有效深度研究智能体的重要性。

英文摘要

Deep research agents aim to solve complex knowledge-intensive tasks through long-horizon planning, evidence gathering, reasoning, and report generation. While recent progress in search agents has demonstrated strong capabilities in information retrieval and answer verification, most existing training datasets remain search-centric, focusing primarily on closed-ended question answering and information localization. As a result, they mainly train information-seeking behavior while providing limited coverage of key deep research capabilities, including evidence integration, knowledge synthesis, planning, file understanding, and structured report generation. In this work, we propose a unified trajectory construction paradigm for deep research agents that combines closed-ended QA and open-ended exploration. The proposed framework consists of graph-grounded task formulation, agentic trajectory rollout, and multi-dimensional trajectory verification, enabling scalable synthesis of high-quality agentic trajectories spanning long-chain complex reasoning, deep research instruction following, report writing, file understanding and generation, and skills usage. Compared with existing search-oriented datasets, our synthesized trajectories place greater emphasis on knowledge synthesis, complex reasoning, and planning. S1-DeepResearch-32B achieves state-of-the-art performance among open-source models of comparable scale across 20 benchmarks spanning five capability dimensions, including complex reasoning, instruction following, report generation, file understanding, and skills usage. On several challenging deep research benchmarks, it approaches the performance of leading proprietary frontier models. These results highlight the importance of jointly modeling information acquisition, knowledge synthesis, and planning-oriented agent behaviors for building effective deep research agents.

2606.15532 2026-06-16 cs.CL cs.LG 交叉投稿

EIBench: A Simulator-Based Benchmark and Turn-Credit RL for Emotion Management

EIBench: 基于模拟器的基准测试和用于情绪管理的回合信用强化学习

Rongzhi Zhu, Xiang Huang, Yuchuan Wu, Rui Wang, Zequn Sun, Tao Ren, Weiyao Luo, Bingxue Qiu, Jieping Ye, Yongbin Li, Wei Hu

发表机构 * State Key Laboratory for Novel Software Technology, Nanjing University(南京大学计算机软件新技术国家重点实验室) Qwen-Character Team, Alibaba Group(阿里巴巴集团Qwen-Character团队)

AI总结 提出EIBench模拟器基准,包含2222个场景,通过2x2分类(支持、防御、修复、魅力)评估多轮情绪管理;并设计CTC-GRPO方法利用逐轮状态更新作为密集反馈,提升模型情绪智能。

AI中文摘要

大型语言模型(LLM)的情绪智能(EI)通常通过静态理解任务或单轮对话生成来评估。然而,情绪管理是交互式的:一个好的模型不仅应识别用户的情绪,还应在多轮对话中改善用户的情绪和关系状态。我们引入了EIBench,一个基于模拟器的交互式情绪管理基准。EIBench包含2222个场景,其中2009个用于训练,213个用于保留测试。场景按2x2分类法组织,涵盖支持、防御、修复和魅力,分别对应不同形式的支持、边界维护、信任修复和融洽关系建立。在每个场景中,LLM模拟器扮演用户,每轮后更新情绪-关系状态,并将最终状态映射到基于锚点的分数。这一设计使EIBench既是一个评估基准,也是一个训练环境:最终状态提供结果奖励,而逐轮状态更新为强化学习提供密集反馈。我们评估了15个开源和闭源LLM。当前模型在支持和融洽关系建立场景中表现良好,但在用户压力下的边界维护方面存在困难。为了提升LLM的情绪智能能力,我们提出了中心化回合信用GRPO(CTC-GRPO),这是GRPO的一个扩展,它重用模拟器的逐轮状态更新作为密集的回合级反馈,同时保留最终结果奖励。CTC-GRPO将Qwen3-8B在EIBench上的得分从-22.4提升至+22.4,并在分布外评估(包括SAGE +12.4和EQBench3 +20.9%)中也有所提升。我们的结果表明,模拟器追踪的用户状态可以支持多轮情绪管理的评估和训练。

英文摘要

Emotional intelligence (EI) in Large Language Models (LLMs) is often evaluated through static understanding tasks or single-response dialogue generation. However, emotion management is interactive: a good model should not only recognize a user's emotion, but also improve the user's emotional and relational state over several turns. We introduce EIBench, a simulator-based benchmark for interactive emotion management. EIBench contains 2,222 scenarios, with 2,009 for training and 213 for held-out testing. The scenarios are organized by a 2x2 taxonomy covering Support, Defense, Repair, and Charm, which together capture different forms of support, boundary maintenance, trust repair, and rapport building. In each scenario, an LLM simulator plays the user, updates an emotion-relation state after each turn, and maps the final state to an anchor-based score. This design makes EIBench both an evaluation benchmark and a training environment: the final state gives the outcome reward, while the per-turn state updates provide dense feedback for RL. We evaluate 15 open- and closed-source LLMs. Current models perform well on support and rapport-building scenes, but struggle with boundary maintenance under user pressure. To improve the EI ability of LLMs, we propose Centered Turn-Credit GRPO (CTC-GRPO), a GRPO extension that reuses the simulator's per-turn state updates as dense turn-level feedback while preserving the final outcome reward. CTC-GRPO improves Qwen3-8B from -22.4 to +22.4 on EIBench and also improves on out-of-distribution evaluations including SAGE (+12.4) and EQBench3 (+20.9%). Our results show that simulator-tracked user states can support both evaluation and training for multi-turn emotion management.
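The abstract does not give CTC-GRPO's formula. The sketch below is one hypothetical reading of "dense turn-level feedback while preserving the final outcome reward", not the authors' method. Standard GRPO normalizes each rollout's final reward within its sampling group. Here each turn additionally receives a centered credit derived from the simulator's per-turn state change. The centering scheme and the mixing weight `beta` are assumptions.

```python
import numpy as np


def group_relative_advantages(outcome_rewards, eps=1e-8):
    """GRPO-style advantage: each rollout's final reward, normalized within its group."""
    r = np.asarray(outcome_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def centered_turn_credits(turn_deltas):
    """Hypothetical turn credit: each turn's simulator state change minus the rollout's mean change."""
    d = np.asarray(turn_deltas, dtype=float)
    return d - d.mean()


def per_turn_advantages(outcome_rewards, turn_deltas_per_rollout, beta=0.5):
    """Outcome advantage broadcast to every turn, plus beta times the centered turn credit."""
    outcome_adv = group_relative_advantages(outcome_rewards)
    return [a + beta * centered_turn_credits(d)
            for a, d in zip(outcome_adv, turn_deltas_per_rollout)]
```

Because the turn credits sum to zero within a rollout, this particular choice redistributes credit across turns without changing each rollout's average advantage. That is one way to keep the outcome reward as the dominant signal.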

2606.15600 2026-06-16 cs.DB cs.LG 交叉投稿

When Does q-error Predict Plan Regret? Three Regimes of Cardinality-Estimation Error

q-误差何时能预测计划遗憾?基数估计误差的三种机制

Madhulatha Mandarapu, Sandeep Kunkunuru

发表机构 * VaidhyaMegha Private Limited, India(印度VaidhyaMegha私有有限公司)

AI总结 研究基数估计中q-误差与查询计划质量的关系,发现计划遗憾由三种机制决定:小误差时条件数κ预测更优,大误差时ACS-infinity预测遗憾,而q-误差在查询级别几乎无信息。

Comments 8 pages, 5 figures. Code, benchmarks, and full pre-registration: https://github.com/samyama-ai/ce-metric-eval

AI中文摘要

基数估计(CE)研究通过q-误差对估计器进行排名,但众所周知q-误差是查询计划质量的不完美代理。我们以测量驱动的方式阐明了它何时是好的代理、何时不是,以及原因。将计划选择建模为分段线性代价景观上的argmin,我们发现计划遗憾(在真实基数下所选计划相对于最优计划的代价)由计划-代价几何以依赖于机制的方式决定。(i)对于小误差,真实点条件数$\kappa$能够预测遗憾,且优于q-误差;其预测能力随误差增大衰减至零,这正是局部线性化的必然结果。(ii)对于大误差(即已部署的学习型估计器所处的区间),一个与估计器无关的平均情况次优性度量$\mathrm{ACS}_\infty$能够预测哪些查询容易产生遗憾(在STATS-CEB上Spearman $\rho \approx 0.54$),而q-误差在查询级别几乎不含信息($\rho \approx 0.05$)。(iii)最坏情况对应Haritsa的最大次优性(MSO)。这三者是同一代价比谱在三种加权方式下的表现。我们证明了极限律$\mathrm{ACS}_\infty = \sum_k r_k \pi_k$,其中组合权重与基数无关;在STATS-CEB和JOB-light上使用四个已发布的估计器,按预注册的决策规则验证了每一项主张,并在真实PostgreSQL运行时间上确认:在q-误差失效之处,$\mathrm{ACS}_\infty$仍能预测遗憾。本文的贡献在于概念与实证层面(作为最坏情况鲁棒查询优化的平均情况补充,并刻画准确性度量何时能够跟踪计划质量),而非提出新的估计器。代码和完整预注册已公开。

英文摘要

Cardinality-estimation (CE) research ranks estimators by q-error, yet it is well known that q-error is an imperfect proxy for query-plan quality. We give a measurement-driven account of when it is a good proxy, when it is not, and why. Modeling plan selection as an argmin over a piecewise-linear cost landscape, we find that plan regret (the cost of the chosen plan relative to the optimal plan, under true cardinalities) is governed by plan-cost geometry in a regime-dependent way. (i) For small errors, a true-point condition number $\kappa$ predicts regret and out-predicts q-error; its predictive power decays to zero as error grows, as it must for a local linearization. (ii) For large errors, the regime in which deployed learned estimators operate, an estimator-independent average-case sub-optimality measure $\mathrm{ACS}_\infty$ predicts which queries are regret-prone (Spearman $\rho \approx 0.54$ on STATS-CEB), while q-error is nearly uninformative at the query level ($\rho \approx 0.05$). (iii) The worst case is Haritsa's maximum sub-optimality (MSO). The three are one cost-ratio spectrum under three weightings. We prove a limit law $\mathrm{ACS}_\infty = \sum_k r_k \pi_k$ with cardinality-independent combinatorial weights, validate every claim on STATS-CEB and JOB-light with four released estimators under pre-registered decision rules, and confirm on real PostgreSQL runtimes that $\mathrm{ACS}_\infty$ predicts regret where q-error does not. The contribution is conceptual and empirical rather than a new estimator: an average-case companion to worst-case robust query optimization, and a characterization of when an accuracy metric tracks plan quality. Code and the full pre-registration are public.
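The toy example below makes the distinction between q-error and plan regret concrete. An optimizer picks the cheaper of two plans under an estimated cardinality, and regret is then measured under the true cardinality. The two linear cost functions in `TOY_PLANS` are invented for illustration. Their pointwise minimum is a tiny piecewise-linear cost landscape with a plan boundary at cardinality 60.

```python
def q_error(estimate, truth):
    """q-error = max(est / true, true / est); 1.0 is a perfect estimate. Both must be positive."""
    return max(estimate / truth, truth / estimate)


def plan_regret(plan_costs, estimated_card, true_card):
    """Cost of the plan chosen under the estimate, divided by the optimal cost,
    both evaluated at the true cardinality (1.0 means no regret)."""
    chosen = min(plan_costs, key=lambda p: plan_costs[p](estimated_card))
    best = min(cost(true_card) for cost in plan_costs.values())
    return plan_costs[chosen](true_card) / best


# Hypothetical linear plan costs as a function of one join's cardinality c.
TOY_PLANS = {
    "index_nested_loop": lambda c: 10 + 2 * c,
    "hash_join": lambda c: 100 + 0.5 * c,
}
```

With a true cardinality of 1,000, an estimate of 100 (q-error 10) still selects the optimal hash join and has regret 1.0. An estimate of 50 (q-error 20) crosses the boundary and incurs regret 2010/600 ≈ 3.35. Regret depends on where the error lands relative to plan boundaries, not on the size of the error alone.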

2606.15610 2026-06-16 cs.CL astro-ph.IM cs.AI cs.LG 交叉投稿

LLM Judges Have Dark Current: A Psychometric Datasheet for LLM-as-a-Judge Evaluation

LLM 裁判具有暗电流:LLM 作为裁判评估的心理测量数据表

Hiroyasu Usami, Keisuke Hara, Ayato Tsuboi, Naohiko Matsuda

发表机构 * Chubu University(中部大学) Mitsubishi Heavy Industries, Ltd., Research & Innovation Center(三菱重工业株式会社研究创新中心)

AI总结 提出裁判数据表协议,通过真空输入、表面变异、位置偏好等指标测量 LLM 裁判的暗电流和偏差,揭示其测量特性。

Comments 22 pages, 4 figures

详情
AI中文摘要

LLM 作为裁判的系统现在常规用于开放式模型评估,其中人类偏好标注成本高、速度慢且难以复现。然而,这些裁判通常被报告为标量准确率、胜率或一致性指标。我们认为,裁判应被报告为测量仪器。我们引入了一个裁判数据表协议,该协议测量在真实真空输入下的暗电流、对相同质量表面变化的稳定交叉敏感性、位置虚假偏好、在受控质量阶梯上的目标敏感性,以及由平局指令引发的标准或操作点。方向-稳定性分解揭示,明显的 Delta0 偏好可能是稳定的表面响应或伪装的位置偏差。在一个三裁判开放权重案例研究中,Llama-3.1-8B 显示出高暗电流和呈现冲突的 Delta0 行为,Qwen2.5-14B 是真空清洁且对目标敏感,但混合了稳定和位置过度判别,而 Qwen2.5-32B 是真空清洁,具有低稳定交叉敏感性和低位置虚假偏好。严格的平局标准消除了 Qwen32B 的 Delta0 虚假偏好,但将边缘 Delta1 目标信号吸收为平局,同时保留了 Delta5 敏感性。结果表明,提示移动的是标准,而不是分辨率。我们并不声称激发这项工作的下游机制假设已得到确认;贡献是在做出下游声明之前测量测量仪器的计量协议。

英文摘要

LLM-as-a-judge systems are now routinely used for open-ended model evaluation, where human preference annotation is costly, slow, and difficult to reproduce. Yet these judges are often reported as scalar accuracy, win-rate, or agreement devices. We argue that a judge should instead be reported as a measurement instrument. We introduce a Judge Datasheet protocol that measures dark current under true-vacuum inputs, stable cross-sensitivity to same-quality surface variation, positional false preference, target sensitivity on a controlled quality ladder, and the criterion or operating point induced by tie instructions. The direction-stability decomposition reveals that apparent Delta0 preference can be stable surface response or disguised position bias. In a three-judge open-weight case study, Llama-3.1-8B shows high dark current and presentation-conflicted Delta0 behavior, Qwen2.5-14B is vacuum-clean and target-sensitive but mixes stable and positional over-discrimination, and Qwen2.5-32B is vacuum-clean with low stable cross-sensitivity and low positional false preference. A strict tie criterion eliminates Qwen32B Delta0 false preference but absorbs marginal Delta1 target signals into ties while preserving Delta5 sensitivity. The results show that prompting moves the criterion, not the resolution. We do not claim that the downstream mechanism hypothesis that motivated this work is confirmed; the contribution is a metrological protocol for measuring the measuring device before downstream claims are made.

2606.15673 2026-06-16 cs.AI cs.LG 交叉投稿

Where Did It Go Wrong? Process-Level Evaluation of Web Agents with Semantic State Tracking

哪里出错了?基于语义状态追踪的Web智能体过程级评估

Jiwan Chung, JiHyuk Byun, Vibhav Vineet, Seon Joo Kim

发表机构 * Yonsei University(延世大学) Microsoft Research(微软研究院)

AI总结 提出WebStep基准,通过语义MDP追踪过程状态,揭示隐藏于终端成功率下的智能体差异,并定位具体改进方向。

AI中文摘要

Web智能体通过长交互序列执行任务,然而现有基准仅评估终端成功,丢弃所有过程信息,对改进提供的指导有限。在这项工作中,我们对Web智能体进行了过程级分析。我们引入了WebStep,一个包含1800个任务实例的基准,具有可控难度和自动语义状态追踪。每个网站除了GUI外还暴露一个确定性的语义MDP:智能体在界面上操作,而环境在后台记录高级状态和转换,从而实现无需人工标注的细粒度分析。基于语义轨迹,我们首先表明过程度量揭示了结果评估无法察觉的差异:三个成功率集中在31-33%的智能体在探索范围与执行准确性上存在分歧。然后,按技能分解刻画了这些差异的本质,揭示了同一网站内隐藏的相反技能排名:例如,在Housing上,OpenAI CUA在提交动作上优于Qwen3.5 23.7%,但在过滤上却落后15.6%,精确指出了即使在单个领域内也需要改进的具体技能。分叉分析进一步定位了导致任务失败的决定性错误,并表明该错误是智能体特定的而非共享的。最后,随着任务难度增加,这些差异扩大:在简单任务上成功率相似,但随着探索要求提高而急剧分化。我们的过程级分析为Web智能体评估开辟了新途径,提供了关于每个智能体应在何处以及如何改进的细粒度且可操作的见解。

英文摘要

Web agents act through long interaction sequences, yet existing benchmarks evaluate only terminal success, discarding all process information and offering little guidance on improvement. In this work, we conduct a process-level analysis of web agents. We introduce WebStep, a benchmark of 1,800 task instances with controlled difficulty and automatic semantic state tracking. Each website exposes a deterministic semantic MDP alongside the GUI: the agent operates on the interface, while the environment records high-level states and transitions in the background, enabling fine-grained analysis without manual annotation. Based on the semantic trajectory, we first show that process metrics reveal differences invisible to outcome evaluation: three agents whose success rates cluster within 31-33% diverge in exploration reach versus execution accuracy. Then, decomposing by skill characterizes the nature of these differences, exposing opposite per-skill rankings hidden within the same website: e.g., on Housing, OpenAI CUA outperforms Qwen3.5 by 23.7% on commit actions yet underperforms it by 15.6% on filtering, pinpointing a concrete skill to improve even within a domain. Bifurcation analysis further localizes the decisive error that loses the task and shows that this error is agent-specific rather than shared. Finally, these differences widen as tasks grow harder: success rate is similar on easy tasks but separates sharply as exploration becomes more demanding. Our process-level analysis opens a new avenue in web agent evaluation, providing fine-grained and actionable insight into where and how each agent should be improved.
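The abstract does not define its process metrics or bifurcation analysis precisely. As a hypothetical illustration, semantic-state trajectories recorded by the environment might be compared as follows. `first_divergence` locates the first step at which a failing run leaves a reference path. `exploration_reach` is an assumed stand-in for "exploration reach": the fraction of required semantic states the agent ever visited. State names and both metric definitions are assumptions.

```python
def first_divergence(agent_states, reference_states):
    """Index of the first step where the agent's semantic state differs from the reference,
    or None if the two trajectories are identical."""
    for i, (a, r) in enumerate(zip(agent_states, reference_states)):
        if a != r:
            return i
    if len(agent_states) != len(reference_states):
        # One trajectory is a strict prefix of the other.
        return min(len(agent_states), len(reference_states))
    return None


def exploration_reach(agent_states, required_states):
    """Hypothetical reach metric: fraction of required semantic states visited at least once."""
    required = set(required_states)
    return len(required & set(agent_states)) / len(required)
```

Working on semantic states rather than raw GUI actions means that two different click sequences reaching the same high-level state are treated as equivalent. This is what makes automatic, annotation-free comparison possible.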

2606.15686 2026-06-16 cs.AI cs.LG 交叉投稿

Recurrent Reasoning on Symbolic Puzzles with Sequence Models

基于序列模型的符号谜题循环推理

Gowrav Mannem, Chowdhury Marzia Mahjabin, Jason Chen, Shivank Garg, Kevin Zhu

发表机构 * Algoverse AI Research(Algoverse AI研究) Cornell University(康奈尔大学)

AI总结 提出 RecurrReason 基准,包含四个递归逻辑谜题,通过控制难度参数 N 评估序列模型,发现架构比规模更重要,预训练仅对局部结构转移函数的谜题有效。

AI中文摘要

大型语言模型在符号和算法任务上通常表现强劲,但当问题变长、变难或略微超出分布时,这种表面优势可能隐藏脆弱行为。当前推理基准的一个主要限制是,许多主要测试模型是否能产生有效答案,而较少关注解决方案在可控难度缩放下是否最小、稳健和稳定。我们引入了 RecurrReason,一个难度可控的基准,包含四个递归逻辑谜题(汉诺塔、过河问题、积木世界和跳棋),具有 BFS 最优轨迹和单一可解释难度参数 $N \in \{1,\dots,10\}$,总计 10,817 个独特谜题和 285,933 步动作。我们在一致的数据划分和评估标准下,对两个 Transformer 家族(编码器-解码器模型(T5 风格)和仅解码器模型(GPT-2 风格))进行基准测试,在 $N=1$ 到 $7$ 上训练,并在 $N=8$ 到 $10$ 的保留分布内实例和更难的分布外实例上评估。微调后的预训练 T5 在积木世界上达到 97.27% 的验证准确率和 81.00% 的 OOD 准确率;所有模型在过河问题上的所有条件下得分为 0.00%。失败模式分析表明,架构比规模更能决定成功。预训练仅能迁移到具有局部结构转移函数的谜题。我们的代码和数据集将在接收后开源。

英文摘要

Large language models often appear strong on symbolic and algorithmic tasks, yet this apparent strength can hide brittle behaviour when problems become longer, harder, or slightly out of distribution. A major limitation of current reasoning benchmarks is that many primarily test whether a model can produce a valid answer, while paying less attention to whether the solution is minimal, robust, and stable under controlled difficulty scaling. We introduce RecurrReason, a difficulty-controlled benchmark of four recurrent logic puzzles (Tower of Hanoi, River Crossing, Block World, and Checkers Jumping) with BFS-optimal trajectories and a single interpretable difficulty parameter $N \in \{1,\dots,10\}$, totalling 10,817 unique puzzles and 285,933 moves. We benchmark two Transformer families, an encoder-decoder model (T5-style) and a decoder-only model (GPT-2-style), under consistent data splits and evaluation criteria, training on $N{=}1$ to $7$ and evaluating on both held-out in-distribution instances and harder out-of-distribution instances at $N{=}8$ to $10$. Fine-tuned pre-trained T5 achieves 97.27% validation and 81.00% OOD accuracy on Block World; all models score 0.00% on River Crossing under all conditions. Failure mode analysis reveals that architecture is a stronger determinant of success than scale. Pre-training transfers only to puzzles with locally structured transition functions. Our code and dataset will be open-sourced upon acceptance.
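Breadth-first search over puzzle states can generate BFS-optimal trajectories like those RecurrReason provides. Below is a minimal sketch for Tower of Hanoi. It assumes N is the number of disks, since the abstract does not state how N maps to each puzzle. Because BFS explores states in order of distance from the start, the first time it reaches the goal gives a shortest move sequence, of the known optimal length $2^N - 1$.

```python
from collections import deque


def hanoi_bfs(n):
    """Optimal Tower of Hanoi move list [(disk, src, dst), ...] found by BFS.

    A state is a tuple giving the peg (0, 1, 2) of each disk, smallest disk first.
    """
    start, goal = (0,) * n, (2,) * n
    parent = {start: None}  # state -> (previous state, move) on a shortest path
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for src in range(3):
            on_src = [d for d in range(n) if state[d] == src]
            if not on_src:
                continue
            disk = on_src[0]  # the top disk is the smallest one on that peg
            for dst in range(3):
                # Legal only if no smaller disk already sits on the destination peg.
                if dst == src or any(state[d] == dst for d in range(disk)):
                    continue
                nxt = state[:disk] + (dst,) + state[disk + 1:]
                if nxt not in parent:
                    parent[nxt] = (state, (disk, src, dst))
                    queue.append(nxt)
    # Walk back from the goal to recover the move sequence.
    moves, state = [], goal
    while parent[state] is not None:
        state, move = parent[state]
        moves.append(move)
    return moves[::-1]
```

Trajectory length grows as $2^N - 1$ (7 moves at N = 3, 1,023 at N = 10). This is one reason the held-out range N = 8 to 10 stresses length generalisation.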

2606.15899 2026-06-16 cs.CR cs.AI cs.HC cs.LG cs.MA 交叉投稿

SkillVetBench: LLM-as-Judge for Multi-Dimensional Security Risk Evaluation in Open-Source LLM Agent Skills

SkillVetBench: 基于LLM评判的多维安全风险评估开源LLM智能体技能

Ismail Hossain, Sai Puppala, Md Jahangir Alam, Tanzim Ahad, Sajedul Talukder

发表机构 * SUPREME Lab, University of Texas at El Paso, Texas, USA(SUPREME实验室,德克萨斯大学埃尔帕索分校,德克萨斯州,美国)

AI总结 提出SkillVetBench,利用LLM作为评判器对开源LLM智能体技能进行多维安全风险评估,引入五维技能智能体风险评分(SARS)和CVSS v4.0向量分解,在78个恶意技能上实现零假阴性,22个良性技能上零假阳性。

Comments The main research paper has been submitted to NeurIPS 2027 and is under review

AI中文摘要

开源LLM智能体生态系统正在快速增长,然而社区贡献的技能——扩展智能体能力的模块化工具定义——的安全性在很大程度上仍未经过审查。我们填补的空白:现有的扫描器在代码层操作,在结构上对指令层和多智能体风险——劫持智能体的自然语言指令、通过编码侧信道窃取数据或跨流水线链式传播危害——视而不见,因此需要的是一个语义化的、多维度的审查系统,而不是另一个签名匹配器。我们提出了SKILLVETBENCH,一个在Hugging Face上的实时公共排行榜,它使用LLM作为评判器来审查智能体技能。新贡献:SARS(技能智能体风险评分),一个五维智能体风险度量,带有针对指令跟随系统的原则性加权公式。集成内容:完整的CVSS v4.0向量分解和一个ClawHub双视图,将我们的LLM生成的审查与官方市场判决并列。实验证明:基于我们的配套基准论文[1],LLM评判阶段在78个已确认的恶意技能上实现了零假阴性,在22个良性控制上实现了零假阳性,而最佳静态基线(SKILLSIEVE)仍然遗漏了15%;对于指令层类别如提示注入和记忆中毒,传统工具遗漏了89%到100%的威胁(例如,CODEBERT未检测到九个记忆中毒技能中的任何一个)。四个LLM评估器的检测率从35%到95%不等,这促使在生产部署中使用集成评分。

英文摘要

Open-source LLM agent ecosystems are growing rapidly, yet the security of community-contributed skills (modular tool definitions that extend agent capabilities) remains largely unvetted. The gap we fill: existing scanners operate at the code layer and are structurally blind to instruction-layer and multi-agent risk (natural-language directives that hijack an agent, exfiltrate data through encoded side channels, or chain harm across pipelines), so what is needed is a semantic, multi-dimensional vetting system rather than another signature matcher. We present SKILLVETBENCH, a live public leaderboard on Hugging Face that uses an LLM-as-Judge to vet agent skills. What is new: SARS (Skill Agentic Risk Score), a five-dimensional agentic-risk metric with a principled weighted formula for instruction-following systems. What is integrated: full CVSS v4.0 vector decomposition and a ClawHub dual-view that places our LLM-generated review beside the official marketplace verdict. What is demonstrated: drawing on our companion benchmark paper [1], the LLM-as-Judge stage achieves zero false negatives across 78 confirmed-malicious skills and zero false positives across 22 benign controls, while the best static baseline (SKILLSIEVE) still misses 15%; for instruction-layer categories such as Prompt Injection and Memory Poisoning, conventional tools miss between 89% and 100% of threats (e.g., CODEBERT detects none of nine memory-poisoning skills). Detection rates vary from 35% to 95% across four LLM evaluators, motivating ensemble scoring in production deployments.

2606.15950 2026-06-16 stat.ML cs.LG 交叉投稿

Spectral Adaptive Conformal Prediction for Structured Non-Exchangeable Data

面向结构化非可交换数据的谱自适应共形预测

Jeffery Opoku, David Banahene

发表机构 * University of Texas Rio Grande Valley(德克萨斯大学里奥格兰德河谷分校) Florida International University(佛罗里达国际大学)

AI总结 针对非可交换时间序列数据,提出谱自适应共形预测方法,通过局部谱相似性加权共形分位数并在线调整目标误覆盖率,在循环模式和频率变化场景下提升区间覆盖的长期校准性。

Comments 35 pages, includes figures and references

AI中文摘要

当数据可交换时,共形预测提供具有有限样本覆盖保证的预测区间。然而许多按时间索引的数据集是不可交换的:它们具有季节性、反复出现的状态、变化的频率或其他形式的结构化依赖。本文研究了一种利用这种结构的简单方法。我们提出了谱自适应共形预测,该方法使用局部谱相似性形成加权共形分位数,然后在线更新目标误覆盖率。谱权重用于选取与当前测试点相关的校准残差。当不确定性随时间变化时,自适应更新会纠正长期误覆盖率。我们给出了固定谱加权分位数的近似覆盖结果,以及自适应更新的确定性长期校准结果。涉及反复出现的状态和缓慢变化频率的模拟,以及三个美国真实数据示例表明,混合方法可以改进固定谱加权,同时也表明谱加权必须通过有效样本量诊断进行监控。

英文摘要

Conformal prediction gives prediction intervals with finite-sample coverage when the data are exchangeable. Many time-indexed datasets are not exchangeable. They have seasons, recurring regimes, changing frequencies, or other forms of structured dependence. This paper studies a simple way to use that structure. We propose spectral adaptive conformal prediction, a method that forms weighted conformal quantiles using local spectral similarity and then updates the target miscoverage level online. The spectral weights choose calibration residuals that look relevant to the current test point. The adaptive update corrects the long-run miss rate when uncertainty changes over time. We give an approximate coverage result for the fixed spectral weighted quantile and a deterministic long-run calibration result for the adaptive update. Simulations with recurring regimes and slowly changing frequencies, together with three U.S. real-data examples, show that the hybrid method can improve on fixed spectral weighting, while also showing that spectral weighting must be monitored through effective sample size diagnostics.
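The two ingredients described above can be sketched as follows. The first is a weighted conformal quantile, in which similarity weights decide which calibration residuals count. The second is an online miscoverage update. The update shown is the adaptive conformal inference (ACI) rule $\alpha_{t+1} = \alpha_t + \gamma(\alpha - \mathrm{err}_t)$, which is an assumed stand-in for the paper's adaptive step. The spectral-similarity weights themselves are taken as given. Giving the test point weight 1 is also an assumption. `effective_sample_size` is Kish's $(\sum w)^2 / \sum w^2$, one common version of the diagnostic the abstract recommends monitoring.

```python
import numpy as np


def weighted_conformal_quantile(residuals, weights, alpha, test_weight=1.0):
    """Weighted (1 - alpha) quantile of calibration residuals, with the test
    point's own weight placed at +infinity (as in weighted conformal prediction)."""
    r = np.append(np.asarray(residuals, dtype=float), np.inf)
    w = np.append(np.asarray(weights, dtype=float), test_weight)
    w = w / w.sum()
    order = np.argsort(r, kind="stable")
    cum = np.cumsum(w[order])
    # First sorted residual whose cumulative weight reaches 1 - alpha.
    idx = min(int(np.searchsorted(cum, 1 - alpha)), len(r) - 1)
    return r[order][idx]


def effective_sample_size(weights):
    """Kish effective sample size; small values signal that few residuals dominate."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


def aci_update(alpha_t, covered, target_alpha=0.1, gamma=0.01):
    """ACI-style step: lower alpha after a miss (wider intervals), raise it slightly after a hit."""
    err = 0.0 if covered else 1.0
    return alpha_t + gamma * (target_alpha - err)
```

A residual with weight 0 is ignored entirely, which is how dissimilar regimes are excluded. The cost is a smaller effective sample size, so the ESS should be checked before trusting the quantile.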

2606.15998 2026-06-16 cs.IR cs.AI cs.CL cs.LG 交叉投稿

Entity Labels Are Not Entity Signals: A Framework for Observable Relevance in Document Re-Ranking

实体标签并非实体信号:文档重排序中可观测相关性的框架

Utshab Kumar Ghosh, Shubham Chatterjee

发表机构 * Department of Computer Science, Missouri University of Science and Technology(计算机科学系,密苏里科技大学)

AI总结 提出实体可观测相关性(OER)与概念相关性(CER)的区分,证明CER监督效果差,而OER对齐可显著提升重排序性能。

Comments ICTIR '26

Journal ref
Proceedings of the 2026 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR)
Abstract

Entity-aware document retrieval uses query-associated entities as ranking signals, assuming that semantically relevant entities are also useful retrieval signals. We show this assumption is insufficient and explain why. Unlike terms, which are ground-truth observations, entity links are hypotheses produced by an imperfect linker: an entity can be topically central yet provide no discriminative signal if the linker fires indiscriminately across relevant and non-relevant documents. We formalize this as a distinction between Conceptual Entity Relevance (CER), whether an entity is topically related to a query, and Observable Entity Relevance (OER), whether its observed presence in a collection discriminates relevant from non-relevant documents. Across four collections and annotation sources including human entity judgments, CER and OER exhibit near-chance agreement ($\kappa \approx 0$), while OER operationalizations agree substantially ($\kappa \approx 0.5$), confirming CER as the systematic outlier. CER-based supervision selects topically plausible but weakly discriminative entities, pruning fewer than 4% of non-relevant documents on some collections. Aligning supervision with OER improves non-relevant pruning by up to 10x and open-world MAP by 0.051 over BM25. Our findings motivate a shift from conceptual to observable notions of entity relevance in entity-aware retrieval.
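The sketch below makes the CER/OER distinction concrete under our own assumptions. An entity is treated as observably relevant for a query when the linker marks it noticeably more often in relevant than in non-relevant judged documents. This uses a simple presence-rate gap with a hypothetical threshold and is not one of the paper's operationalizations. Agreement between two binary labelings is measured with Cohen's kappa.

```python
from typing import Sequence, Set


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary labelings of the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both labelings constant and identical
    return (observed - expected) / (1 - expected)


def presence_gap(entity: str,
                 relevant_docs: Sequence[Set[str]],
                 nonrelevant_docs: Sequence[Set[str]]) -> float:
    """Linked-presence rate in relevant docs minus the rate in non-relevant docs."""
    rel = sum(entity in d for d in relevant_docs) / len(relevant_docs)
    non = sum(entity in d for d in nonrelevant_docs) / len(nonrelevant_docs)
    return rel - non


def oer_label(entity, relevant_docs, nonrelevant_docs, threshold=0.2) -> int:
    # Hypothetical rule: observably relevant = a clear presence gap.
    return int(presence_gap(entity, relevant_docs, nonrelevant_docs) >= threshold)


# Each document is the set of entities the linker found in it (toy data).
rel = [{"Paris", "Seine"}, {"Paris"}]
non = [{"Paris", "Louvre"}, {"Paris"}]
```

"Paris" may be topically central, but it is linked in every document, so its gap is zero. "Seine" separates the two sets.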

2606.16062 2026-06-16 cs.AI cs.LG Cross-list

Auditing Reward Hackability in Code RL Training Environments

Shreshth Rajan

Affiliations: GitHub

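AI summary: Measures how often code RL environments accept incorrect solutions and finds that 28.5% of the sampled SWE-bench Verified tasks have weak test suites. Proposes hardening the broken tasks with an LLM judge combined with a Docker gold-standard gate.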

Abstract

We measure the rate at which code RL environments accept incorrect solutions as correct. On a 49-task sample of SWE-bench Verified, 28.5% of tasks have test suites weak enough that a Docker-verified incorrect patch passes them. On 20 R2E-Gym tasks across 6 repositories, the same pipeline at single-shot exploit generation yields 25.0%. A random-effects meta-analysis over 134 frontier model submissions to SWE-bench Verified finds, within the same human-rated difficulty stratum, model Pass@1 is +14.14 percentage points higher on flagged-hackable tasks than on robust ones (95% CI [+11.80, +16.48]; one-sided p < 10^-6; I^2 = 0%; 123 of 134 models positive). We then describe a procedure for hardening the broken tasks. An inline LLM judge with a Docker gold-sanity gate runs each generated test against the gold solution before the judge is consulted. On the 11 broken tasks in the audit, the gate flags 65 of 105 decisive LLM-generated tests as failing on the gold patch itself, a 61.9% per-augmentation defect rate the LLM judge alone misses. With diversity-biased retry, the loop converges 9 of 11 tasks to a gated upgrade.
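The abstract reports a random-effects meta-analysis with I^2 = 0%. The DerSimonian-Laird estimator is one standard way to pool per-model effects. The sketch below assumes each model contributes an effect estimate with its sampling variance, for example its Pass@1 difference between flagged-hackable and robust tasks within a difficulty stratum. It is a generic estimator, not necessarily the exact procedure the author used.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(effects: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float, float]:
    """Random-effects pooled estimate.

    Returns (pooled effect, standard error, tau^2, I^2 as a fraction).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0     # share of heterogeneity
    return pooled, se, tau2, i2
```

A 95% confidence interval is then `pooled ± 1.96 * se`. When the effects are homogeneous, tau^2 and I^2 both collapse to zero and the result equals the fixed-effect estimate.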

2606.16113 2026-06-16 cs.AI cs.LG Cross-list

RecourseBench: A Modular Framework for Reproducible Algorithmic Recourse Evaluation

Zahra Khotanlou, Hashir Ahmed, Chenghao Tan, Ahmed Abdelaal, Amir-Hossein Karimi

Affiliations: University of Waterloo

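AI summary: Proposes the RecourseBench framework for unified evaluation of recourse methods, built on three commitments: modularity, reproducibility, and interactivity. It integrates 28 methods and is the first to enforce method-level reproducibility through automated quantitative tests.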

Abstract

Algorithmic recourse methods provide counterfactual explanations that inform individuals of the actions required to overturn an unfavorable model decision. Despite rapid methodological progress, principled comparison remains elusive; existing frameworks are often difficult to extend and lack both interoperability and systematic verification that integrated methods faithfully reproduce their originally reported results. We introduce RecourseBench, a unified evaluation framework built around three commitments, namely modularity, reproducibility, and interactivity. The framework decomposes the pipeline into five fully decoupled layers (Data, Preprocessing, Model, Recourse Method, and Evaluation), governed by abstract interfaces and a dynamic registry. To address the reproducibility gap in prior benchmarks, we introduce a four-tier classification system in which every integrated method is validated by an automated test suite against its originally reported results. We further provide an interactive web interface for flexible, configuration-driven comparison across methods, datasets, and model architectures. Our framework currently integrates 28 state-of-the-art recourse methods and, to our knowledge, constitutes the first recourse benchmark to explicitly enforce method-level reproducibility through automated, quantitative testing.

2606.16127 2026-06-16 cs.CL cs.AI cs.LG Cross-list

AuAu: A Benchmark for Auditing Authoritarian Alignment in Large Language Models

Andreas Einwiller, Max Klabunde, Florian Lemmerich

Affiliations: University of Zurich

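AI summary: Proposes the AuAu benchmark, which combines psychometric questions, contextual behavior vignettes, and realistic user prompts to assess authoritarian tendencies in LLMs. All 17 models show substantial authoritarian response rates, and a system prompt can manipulate most of them.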

Comments v1, 50 pages

Abstract

The worldwide surge of authoritarianism, combined with the increasingly central role of LLMs in users' everyday lives, raises the question of the extent to which specific models exhibit or promote authoritarian attitudes and characteristics. We introduce AuAu, a comprehensive benchmark that aims to assess the risk of LLMs generating responses with authoritarian tendencies. This benchmark combines three evaluation approaches: (i) psychometric questions from an extensive pool of 15 human-validated instruments; (ii) contextual behavior vignettes probing intended actions in concrete situations; and (iii) responses to realistic user prompts. Unlike prior work, AuAu evaluates not only a general closeness towards authoritarianism but also the established sub-concepts Authoritarian Aggression, Authoritarian Submission, and Conventionalism. Evaluating 17 models from China, the EU, Russia, and the USA, we find that all tested models exhibit substantial authoritarian response rates under the psychometric evaluation, though these rates drop significantly in increasingly realistic downstream tasks. We further find that an authoritarian system prompt easily manipulates 15 out of 17 models to promote increased authoritarianism. Our results underscore the need for continued, systematic auditing of LLM-based AI systems to detect and ultimately mitigate undesired authoritarian tendencies in generated output. Our code and data are available at: https://github.com/andreaseinwiller/AuAu

2606.16344 2026-06-16 cs.AI cs.CL cs.CY cs.LG Cross-list

Whose hotel does the AI recommend? An algorithm audit of reputation signals in LLM-assisted hotel selection

Mirza Samad Ahmed Baig, Syeda Anshrah Gillani, Asher Ali

Affiliations: Fandaqah, Al Khobar, Saudi Arabia; Hamdard University, Karachi, Pakistan

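AI summary: Audits 12 LLMs with a randomized choice-based conjoint experiment and finds that guest rating and price dominate their recommendations. The models over-weight eco-certification, ignore management responses, and are causally influenced by list position, a content-free feature.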

Comments 32 pages

Abstract

Travelers increasingly ask large language model (LLM) assistants which hotel to book, making these systems gatekeepers of property visibility, yet what moves their recommendations is undocumented. We conduct a pre-specified algorithm audit using a randomized choice-based conjoint: across personas, prompt templates, and twelve open-weight and proprietary models, assistants choose among five hotels whose guest rating, review volume and recency, management response, chain affiliation, price, eco-certification, and list position are independently randomized. We estimate the average marginal component effect of each signal on the probability of recommendation. Guest rating and price dominate (a top rating raises selection by 31.6 percentage points; a high price lowers it by 30.0), reproducing human valence-and-price primacy but over-weighting eco-certification and ignoring management response. List position, a content-free artifact, shifts recommendations causally and is worth about $12 per night. Stated reasons track revealed weights imperfectly. The findings ground generative engine optimization and the accountability of AI infomediaries in causal evidence.
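Every attribute is randomized independently. Because of that, the average marginal component effect (AMCE) of a binary attribute can be estimated as a difference in mean selection rates between profiles that have the attribute and profiles that do not. The minimal sketch below assumes one row per shown hotel profile with a 0/1 "chosen" outcome. The toy rows are invented for illustration. A real analysis would add standard errors clustered by choice task, and the paper's exact estimator may differ.

```python
from typing import Dict, List


def amce(rows: List[Dict[str, int]], attribute: str, outcome: str = "chosen") -> float:
    """Difference in mean outcome between attribute level 1 and level 0."""
    treated = [r[outcome] for r in rows if r[attribute] == 1]
    control = [r[outcome] for r in rows if r[attribute] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Toy profiles (not study data): 1 = top rating / listed first / chosen.
rows = [
    {"top_rating": 1, "first_listed": 1, "chosen": 1},
    {"top_rating": 1, "first_listed": 0, "chosen": 1},
    {"top_rating": 0, "first_listed": 1, "chosen": 1},
    {"top_rating": 0, "first_listed": 0, "chosen": 0},
]
print(amce(rows, "top_rating"))    # 0.5
print(amce(rows, "first_listed"))  # 0.5
```

A position effect can be expressed in dollars by dividing it by the effect of one extra dollar of nightly price. This is presumably how a figure like "about $12 per night" is obtained, but we have not checked the paper's exact conversion.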

2606.16368 2026-06-16 cs.CL cs.LG Cross-list

Evaluating LLM Personalization via Semantic Constraint Verification

Xuran Li, Guanqin Zhang, Imran Razzak, Hakim Hacid, Eleanna Kafeza, Hao Xue, Flora D. Salim

Affiliations: University of New South Wales; Mohamed bin Zayed University of Artificial Intelligence; The Technology Innovation Institute; The Hong Kong University of Science and Technology

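AI summary: Proposes the NLICV framework, which uses a natural language inference model to map sentences to truth-condition sets and verify personalization constraints. It sorts LLM behavior into four categories, agrees closely with human annotations, and greatly reduces latency and cost.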

Abstract

Current evaluation paradigms for Large Language Model (LLM) personalization rely heavily on brittle surface-matching metrics or computationally expensive LLM-as-a-judge protocols, both of which lack interpretability. To address these limitations, we introduce Natural Language Inference Constraint Verification (NLICV), a scalable, semantically invariant framework that maps sentence meanings to truth-condition sets to verify personalization constraints via a Natural Language Inference (NLI) model. Moving beyond binary scoring, NLICV categorizes LLM behaviors into four distinct modes: personalization, generalization, sycophancy, and failure. Extensive experiments demonstrate that NLICV aligns closely with human annotations while drastically reducing the latency and token costs associated with LLM judges (up to a 2100× inference speedup). Finally, through an ablation-based procedure, NLICV pinpoints the exact sentences driving the constraint verification, yielding faithful, understandable evidence for its evaluations.
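The ablation-based localization step can be illustrated with a leave-one-out loop. Drop each sentence in turn, re-score the constraint, and report the sentence whose removal lowers the score most. In the sketch below, `score` stands in for an NLI entailment probability. The keyword-overlap `toy_score` is a placeholder only, not NLICV's model or its exact procedure.

```python
from typing import Callable, List, Tuple


def most_supporting_sentence(sentences: List[str],
                             constraint: str,
                             score: Callable[[str, str], float]) -> Tuple[int, float]:
    """Return (index, score drop) of the sentence whose removal hurts most."""
    full = score(" ".join(sentences), constraint)
    best_idx, best_drop = -1, float("-inf")
    for i in range(len(sentences)):
        ablated = " ".join(s for j, s in enumerate(sentences) if j != i)
        drop = full - score(ablated, constraint)
        if drop > best_drop:
            best_idx, best_drop = i, drop
    return best_idx, best_drop


def toy_score(premise: str, hypothesis: str) -> float:
    """Placeholder for P(entailment): share of hypothesis words found in the premise."""
    p = {w.strip(".,;:!?") for w in premise.lower().split()}
    h = hypothesis.lower().split()
    return sum(w in p for w in h) / len(h)


response = ["Here is a plan.", "It uses only vegan recipes.", "Enjoy your week."]
print(most_supporting_sentence(response, "vegan recipes", toy_score))  # (1, 1.0)
```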

2606.16540 2026-06-16 q-bio.QM cs.LG q-bio.BM q-bio.GN Cross-list

MultiMolecule: a modular ecosystem for biomolecular sequence-model workflows

Zhiyuan Chen

Affiliations: DanLing Team

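AI summary: Presents MultiMolecule, an open-source ecosystem that integrates RNA, DNA, and protein sequence models through standardized interfaces. It provides 53 model-family implementations, 112 checkpoints, and 16 dataset resources to support model reuse, evaluation, and biological prediction.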

Abstract

Biomolecular sequence models are increasingly reused outside the studies in which they were introduced, but public checkpoints rarely preserve the execution context needed to inspect source-defined behavior, adapt models to new assays, compare models under shared task definitions or deploy biological predictions. MultiMolecule is an open-source Python ecosystem that turns heterogeneous RNA, DNA and protein sequence-model releases into complete, source-checked model-family implementations with shared loading, workflow and prediction interfaces. The Resource state reported here includes 53 complete model-family implementations with 112 standardized model checkpoints, together with 16 curated dataset resources released through 39 public dataset repositories and 10 user-facing prediction pipelines. Standardized components are linked to source provenance, conversion or preparation code, source-reference checks, Extended Data summaries and public documentation, allowing users to inspect what was standardized, what behavior was checked and how each component enters training, evaluation, inference or deployment. By shifting reuse from repository-specific checkpoints to executable implementations connected to standardized checkpoints, curated datasets, Runner workflows and biological prediction pipelines, MultiMolecule provides common infrastructure for preserving source-defined model behavior, adapting models to new assays, enabling controlled evaluation and deploying biomolecular predictions.

2606.16541 2026-06-16 cs.AI cs.LG Cross-list

The Faithfulness Gap: Certifying Semantic Equivalence Between Natural-Language and Formal Mathematical Statements

Noor Islam S. Mohammad, Tamim Sheikh

Affiliations: Department of Computer Science, Informatics Institute, Istanbul Technical University, İstanbul, Türkiye; Department of Computer Science and Engineering, Jashore University of Science and Technology

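AI summary: Proposes a bidirectional provability fingerprinting framework that certifies the faithfulness of autoformalized translations by matching forward and backward consequence neighborhoods against probes derived from the natural-language statement. It introduces four new components: counterfactual probe generation, an equivalence spectrum, adaptive probe budget allocation, and faithfulness-guided decoding. The framework achieves a high detection rate on its benchmark and reduces drift.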

Abstract

Autoformalization, translating natural-language mathematics into formal proof assistants, is bottlenecked not by translation fluency but by faithfulness: a formal statement can typecheck and be provable, yet still encode a different theorem than the source intended. We introduce Bidirectional Provability Fingerprinting (BPF), a framework that certifies faithfulness by characterizing each candidate through its forward and backward consequence neighborhoods in the ambient theory and matching these against probes derived from the natural-language statement. We further introduce four novel components: (i) Counterfactual Probe Generation (CPG), a contrastive procedure that synthesizes probes targeting specific drift directions; (ii) the Equivalence Spectrum, a continuous faithfulness score that replaces brittle binary verdicts; (iii) Adaptive Probe Budget Allocation (APBA), an information-theoretic budget router; and (iv) Faithfulness-Guided Decoding (FGD), which uses BPF signals as a reward during autoformalization. We prove a drift detection theorem and a PAC-faithfulness result establishing that the equivalence class of a natural-language statement is learnable from $\mathcal{O}(\log(1/\delta)/\varepsilon)$ probes under mild assumptions. We release DriftBench, a benchmark of 2,183 NL/Lean 4 pairs with controlled drift labels across six subfields of mathlib4. BPF + CPG detects 89.6% of drifted formalizations at a 3.0% false-positive rate, against 41.2% for typecheck and 63.3% for LLM-judge baselines, and FGD reduces the rate at which a state-of-the-art autoformalizer emits drifted statements by 47%. https://pmlrbd.github.io/BPF/
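For intuition about why a probe count of order $\log(1/\delta)/\varepsilon$ suffices, consider a simplified setting. This setting is our assumption, not the paper's theorem: probes are drawn independently, and any single probe exposes a drifted candidate with probability at least $\varepsilon$. The chance that every probe misses the drift then decays geometrically, using $1 - x \le e^{-x}$:

```latex
\Pr[\text{all $n$ probes miss the drift}] \le (1-\varepsilon)^n \le e^{-\varepsilon n} \le \delta
\qquad \text{whenever} \qquad n \ge \frac{\ln(1/\delta)}{\varepsilon}.
```

For example, $\varepsilon = 0.1$ and $\delta = 0.01$ require $n \ge 10 \ln 100 \approx 46.05$, so 47 probes. The paper's PAC-faithfulness result concerns learning the equivalence class and rests on its own assumptions. This calculation only shows where the $\log(1/\delta)/\varepsilon$ shape comes from.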

2606.16555 2026-06-16 cs.DC cs.LG Cross-list

Incentives and Evidence in Learned Service Orchestration

Syed Izhan Khilji, Alireza Furutanpey, Schahram Dustdar

Affiliations: EPFL; University of Waterloo

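AI summary: Tests three influential reinforcement-learning-based service orchestration systems with pre-registered experiments and finds that most predicted performance reversals do not occur. Argues that publication and review incentives favor benchmark gains over deployment performance, and calls for production-grade comparators and reproducible operational evidence.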

Comments To be presented at the IEEE 2026 International Congress on Intelligent and Service Oriented Systems Engineering (CISOSE 2026)

Abstract

Reinforcement learning for service orchestration has been the subject of sustained research for over a decade, yet it is not used in production at scale. The usual explanation is that learned controllers degrade under delayed and noisy telemetry, workload shifts, and uncontrolled tenants. We test whether existing evidence supports that explanation. We evaluate three highly influential RL-based orchestration systems spanning resource allocation, DAG scheduling, and autoscaling, using pre-registered predictions about comparative degradation under production-relevant perturbations and paired inference with family-wise error correction. Across the tests, most predicted performance reversals do not occur. Diagnostic analyses show that these outcomes often reflect comparator collapse, artefact limitations, or evaluation choices rather than evidence that learned controllers tolerate the perturbations. One apparent advantage under observation lag is roughly fortyfold compared to a Kubernetes HPA-equivalent controller. Another widely cited result cannot be reconstructed from its released artefact, and the strongest reproducible margin is far smaller than the published results. Conclusions also reverse under changes in perturbation magnitude and evaluation mode. Based on these results and broader patterns in the literature, we identify an institutional problem. Publication and review incentives favour benchmark gains against convenient comparators, even when those gains provide little evidence of deployment performance. We argue that the problem is not solely technical. Rather, it is institutional, so learned orchestration needs production-grade comparators, registered perturbation models, separate operational metrics, and publication criteria that reward reproducible operational evidence. Without these changes, the literature can grow without establishing whether learning improves orchestration.
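The abstract mentions paired inference with family-wise error correction. The Holm-Bonferroni step-down procedure is one common correction of this kind. The sketch below is a generic implementation; which correction the authors actually applied is not stated here.

```python
from typing import List, Sequence


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm-Bonferroni step-down: controls the family-wise error rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; larger p-values are kept
    return reject


print(holm_reject([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Holm is uniformly at least as powerful as plain Bonferroni while giving the same family-wise guarantee. This makes it a reasonable default when several perturbation tests are run on the same systems.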

2606.16612 2026-06-16 cs.SD cs.LG cs.MM Cross-list

Beyond Artifacts: Towards Generalizable Synthetic Song Detection via Music-Intrinsic Features

Yan Han, Zhibin Wen, Yuan Wang, Shuangrun Shao, Xiaobing Li, Yang Xu, Wei Li

Affiliations: Central Conservatory of Music; Southern University of Science and Technology; Fudan University

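AI summary: Proposes Sofia, which detects synthetic songs from music-intrinsic features (vocals, audio effects, global structure) using feature-specific experts and an adaptive mixture-of-experts model. It improves F1 by 18.5 points on the MUSIC8K benchmark and is highly robust.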

Abstract

The rapid advancement of AI music generators highlights the urgent need for reliable Synthetic Song Detection (SSD). Existing SSD methods often rely on low-level artifacts or fixed feature assumptions, struggling to capture generator-agnostic cues. To address this, we propose Sofia (Synthetic-song detection framework via music features), a flexible framework that models music-intrinsic attributes via feature-specific experts and an adaptive Mixture-of-Experts (MoE) module. By configuring Sofia with representative Vocal, Audio-effect, and Global-structure features and their combinations, we show their individual and complementary contributions. To comprehensively evaluate our framework, we further construct MUSIC8K, a challenging benchmark featuring the latest generators and realistic audio perturbations. Experiments show that Sofia learns generator-agnostic representations from music-intrinsic features, improving the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O while maintaining strong robustness.
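Here is a minimal illustration of how an adaptive mixture-of-experts module can combine feature-specific experts. Each expert (for example a vocal, an audio-effect, and a global-structure branch) emits a "synthetic" logit, and a gate assigns input-dependent weights through a softmax. The gate logits are taken as given. Nothing here reflects Sofia's actual architecture beyond this general idea.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)                        # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def moe_synthetic_prob(expert_logits: Sequence[float],
                       gate_logits: Sequence[float]) -> float:
    """Gate-weighted mixture of per-expert logits, squashed to P(synthetic)."""
    g = softmax(gate_logits)
    mixed = sum(gi * li for gi, li in zip(g, expert_logits))
    return 1.0 / (1.0 + math.exp(-mixed))


# Hypothetical outputs of vocal, audio-effect and global-structure experts.
p = moe_synthetic_prob(expert_logits=[2.0, -1.0, 0.5], gate_logits=[1.0, 0.0, 0.0])
```

Mixing logits rather than per-expert probabilities is one design choice among several. With a learned gate, the module can lean on whichever feature family is most informative for a given clip.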

2606.16753 2026-06-16 cs.CL cs.AI cs.LG Cross-list

P3B3: A Multi-Turn Conversational Benchmark for Measuring European and Brazilian Portuguese Variety Bias in LLMs

Rafael Ferreira, Inês Vieira, Inês Calvo, James Furtado, Iago Paulo, Diogo Tavares, Diogo Glória-Silva, David Semedo, João Magalhães

Affiliations: NOVA University of Lisbon; NOVA LINCS

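AI summary: Proposes the P3B3 benchmark, which uses expert-curated conversational prompts and an evaluation framework to measure LLM bias and controllability across Portuguese varieties (European vs. Brazilian). Most models favor Brazilian Portuguese.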

Comments Accepted at MeLLM Workshop at ACL 2026

Abstract

As Large Language Models (LLMs) become embedded in everyday communication, capturing regional linguistic variation is essential for reliable and equitable language use. In Portuguese, the European (pt-PT) and Brazilian (pt-BR) varieties remain unevenly represented, with pt-BR dominating in data quantity, while LLM preference between Portuguese varieties remains underexplored. To address this gap, we introduce P3B3, an expert-curated, language-variety-agnostic benchmark of conversational prompts, along with an evaluation framework for measuring variety bias and controllability. Experiments on several models show that most LLMs exhibit a strong bias toward pt-BR, with variation in controllability across models. These results highlight the need for more balanced multilingual representation across language varieties.
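As a rough illustration of how variety bias and controllability might be scored, the sketch below labels each response pt-PT or pt-BR. It does this by counting a few well-known lexical pairs, such as "comboio"/"trem" for "train" and "telemóvel"/"celular" for "mobile phone". Bias is the share of pt-BR among classifiable responses when no variety is requested. Controllability is the share of responses matching a requested variety. The marker lists and both scores are our assumptions; P3B3's evaluation framework is not reproduced here.

```python
from typing import List, Optional

# Tiny, illustrative marker lists (real variety identification needs far more).
PT_PT = {"comboio", "autocarro", "telemóvel", "frigorífico"}
PT_BR = {"trem", "ônibus", "celular", "geladeira"}


def classify_variety(text: str) -> Optional[str]:
    """Toy classifier: whichever marker set matches more words; None if tied."""
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    pt = sum(w in PT_PT for w in words)
    br = sum(w in PT_BR for w in words)
    if pt == br:
        return None
    return "pt-PT" if pt > br else "pt-BR"


def pt_br_share(responses: List[str]) -> float:
    """Share of classifiable responses labelled pt-BR (bias when unprompted)."""
    decided = [v for v in map(classify_variety, responses) if v is not None]
    return sum(v == "pt-BR" for v in decided) / len(decided) if decided else float("nan")


def controllability(responses: List[str], requested: str) -> float:
    """Share of responses whose detected variety matches the requested one."""
    return sum(classify_variety(r) == requested for r in responses) / len(responses)
```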

2606.16991 2026-06-16 cs.CV cs.LG Cross-list

A Multi-Center Benchmark for Abdominal Disease Diagnosis and Report Generation from Non-Contrast CT

Mariam Elbakry, Aliaa Sayed Sheha, Salma Hassan Tantawy, Aya Yassin, Concetto Spampinato, Karim Lekadir, Xiaomeng Li, Marawan Elbatel

Affiliations: Ain Shams University; The Hong Kong University of Science and Technology; University of Catania; Universitat de Barcelona

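AI summary: Proposes a multi-center benchmark that uses non-contrast CT to synthesize contrast-enhanced CT findings for multi-organ abdominal disease diagnosis and automated report generation. Experiments show that non-contrast CT retains diagnostic signal, with average AUCs of 69.1% (internal) and 63.1% (external).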

Comments Early Accept (top ~9%), MICCAI 2026

Abstract

Multiphasic contrast-enhanced CT (CECT) is widely used for abdominal lesion characterization, yet it carries inherent risks of contrast-induced nephropathy, escalates acquisition burden, and heavily contributes to radiologist workload. To address these challenges, we introduce a novel multi-center benchmark for multi-organ abdominal disease diagnosis and automated radiology report generation, which learns to synthesize contrast-enhanced findings from single-phase non-contrast CT (NCCT). To support this, we curated a large-scale dataset of paired NCCT-CECT studies and their corresponding contrast-enhanced radiology reports from two centers, partitioned into internal sets and an external validation cohort. Under a unified evaluation protocol, we benchmarked five contemporary deep learning architectures spanning chest-specific, abdomen-specific, and general-purpose multimodal domains. Extensive experiments demonstrate that NCCT retains diagnostic signals, achieving an average multi-organ AUC of 69.1% on the internal cohort and 63.1% on the external cohort. By releasing this dataset and standardized benchmark publicly, this study aims to catalyze future research into safer, resource-efficient, and globally accessible contrast-free abdominal imaging workflows. Code is available at: https://github.com/xmed-lab/TriALS-Report.
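The headline number is an average of per-organ AUCs. Here is a dependency-free sketch, assuming one binary label and one model score per study for each organ. Each organ's ROC AUC is computed from the pairwise (Mann-Whitney) definition, counting ties as one half, and the per-organ values are then averaged. Organ names and numbers are invented for illustration.

```python
from typing import Dict, List, Tuple


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def mean_organ_auc(per_organ: Dict[str, Tuple[List[int], List[float]]]) -> float:
    """Unweighted mean of per-organ AUCs (each organ counts equally)."""
    aucs = [roc_auc(y, s) for y, s in per_organ.values()]
    return sum(aucs) / len(aucs)


example = {
    "liver":  ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    "kidney": ([0, 1, 0, 1], [0.1, 0.8, 0.3, 0.4]),
}
print(mean_organ_auc(example))  # 0.875
```

The pairwise loop is quadratic. For large cohorts a rank-based computation, or a library routine such as scikit-learn's `roc_auc_score`, gives the same value faster.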

2606.17006 2026-06-16 cs.SD cs.AI cs.LG cs.MM eess.AS Cross-list

TuneJury: An Open Metric for Improving Music Generation Preference Alignment

Yonghyun Kim, Junwon Lee, Haiwen Xia, Yinghao Ma, Junghyun Koo, Koichi Saito, Yuki Mitsufuji, Chris Donahue

Affiliations: Carnegie Mellon University; Sony AI; Georgia Tech; KAIST; Peking University; QMUL

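AI summary: Proposes TuneJury, an open, instance-level pairwise reward model for text-to-music generation. Its predicted preference scores support data filtering and post-hoc calibration, and improve alignment at inference time, in optimization, and in training.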

Comments 32 pages, 9 figures

Abstract

We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels covering arena-style (A vs. B) votes, metric-alignment preference pairs, crowdsourced pairwise comparisons, and expert aesthetic ratings. The predicted score margin between two clips is well calibrated on our held-out test split, supporting data filtering via a simple score threshold. TuneJury generalizes to both held-out test pairs and out-of-distribution benchmarks, remaining competitive with prior baselines on the latter. For generators released after training, we introduce anchor calibration, a post-hoc, per-system Bradley-Terry calibration that recovers agreement at substantially better data efficiency than from-scratch retraining. The same frozen reward drives consistent reward-axis gains across three downstream applications: inference-time best-of-N selection, DITTO-style latent optimization, and expert-iteration post-training. TuneJury is available at https://github.com/yonghyunk1m/TuneJury.
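Under a Bradley-Terry model, the probability that clip A is preferred to clip B is sigmoid(s_A - s_B). One simple reading of per-system anchor calibration is to fit a single score offset `b` for a new generator from a handful of human comparisons against anchor clips. The offset maximizes the Bradley-Terry log-likelihood and is found here with Newton's method. This reading is our assumption, not TuneJury's documented procedure, and `fit_offset` is a hypothetical helper.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_offset(pairs: List[Tuple[float, float, int]], iters: int = 50) -> float:
    """Fit b in P(new clip preferred) = sigmoid(s_new + b - s_anchor).

    Each pair is (reward score of the new system's clip, score of the anchor
    clip, 1 if humans preferred the new clip else 0). A finite estimate needs
    at least one win and one loss; otherwise b drifts by about 1 per iteration.
    """
    b = 0.0
    for _ in range(iters):
        grad, hess = 0.0, 0.0
        for s_new, s_anchor, y in pairs:
            p = sigmoid(s_new + b - s_anchor)
            grad += y - p              # derivative of the log-likelihood
            hess += p * (1.0 - p)      # minus its second derivative
        if hess == 0.0:
            break
        step = grad / hess
        b += step
        if abs(step) < 1e-12:
            break
    return b


# Made-up comparisons: equal reward scores, humans prefer the new clip 3 of 4 times.
pairs = [(0.0, 0.0, 1), (0.0, 0.0, 1), (0.0, 0.0, 1), (0.0, 0.0, 0)]
b = fit_offset(pairs)  # converges to log(3), about 1.0986
```

Because only one parameter is estimated per system, a few dozen comparisons can already pin it down. This is consistent in spirit with the data-efficiency claim, though the paper's calibration may fit more than an offset.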

2407.05370 2026-06-16 cs.LG Updated version

Imbalanced Semi-Supervised Learning via Label Refinement and Threshold Adjustment

Zeju Li, Ying-Qiu Zheng, Chen Chen, Saad Jbabdi

Affiliations: College of Biomedical Engineering, Fudan University, Shanghai, China; FMRIB Centre, Oxford Centre for Integrative Neuroimaging (OxCIN), University of Oxford, Oxford, UK; School of Computer Science, University of Sheffield, Sheffield, UK; Department of Engineering Science, University of Oxford, Oxford, UK

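AI summary: Targets the performance drop of semi-supervised learning on class-imbalanced data. Proposes SEVAL, a framework that learns label-refinement and threshold-adjustment parameters from a class-balanced subset and optimizes them jointly to produce more accurate pseudo-labels. SEVAL outperforms existing methods across a variety of imbalanced settings.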

Comments Accepted by Transactions on Machine Learning Research

Abstract

Semi-supervised learning (SSL) algorithms often struggle to perform well when trained on imbalanced data. In such scenarios, the generated pseudo-labels tend to exhibit a bias toward the majority class, and models relying on these pseudo-labels can further amplify this bias. Existing imbalanced SSL algorithms explore pseudo-labeling strategies based on either pseudo-label refinement (PLR) or threshold adjustment (THA), aiming to mitigate the bias through heuristic-driven designs. However, through a careful statistical analysis, we find that existing strategies are suboptimal: most PLR algorithms are either overly empirical or rely on the unrealistic assumption that models remain well-calibrated throughout training, while most THA algorithms depend on flawed metrics for pseudo-label selection. To address these shortcomings, we first derive the theoretically optimal form of pseudo-labels under class imbalance. This foundation leads to our key contribution: SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL), a unified framework that learns both PLR and THA parameters from a class-balanced subset of training data. By jointly optimizing these components, SEVAL adapts to specific task requirements while ensuring per-class pseudo-label reliability. Our experiments demonstrate that SEVAL outperforms state-of-the-art SSL methods, producing more accurate and effective pseudo-labels across various imbalanced SSL scenarios while remaining compatible with diverse SSL algorithms. The code is publicly available (https://github.com/ZerojumpLine/SEVAL).
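The two knobs work as follows: pseudo-label refinement (PLR) adjusts the model's class scores, and threshold adjustment (THA) decides which pseudo-labels to keep. The sketch below uses logit adjustment by the log class prior for PLR and fixed per-class confidence thresholds for THA. In SEVAL these parameters are learned from a class-balanced validation subset, which is not shown here. The specific functional forms are illustrative assumptions.

```python
import math
from typing import Optional, Sequence


def pseudo_label(logits: Sequence[float],
                 class_prior: Sequence[float],
                 tau: float,
                 thresholds: Sequence[float]) -> Optional[int]:
    """Return a pseudo-label, or None if the refined confidence is too low."""
    # PLR (assumed form): subtract tau * log(prior) so majority classes lose
    # their head start and minority classes are promoted.
    refined = [z - tau * math.log(p) for z, p in zip(logits, class_prior)]
    m = max(refined)
    exps = [math.exp(z - m) for z in refined]
    total = sum(exps)
    probs = [e / total for e in exps]
    k = max(range(len(probs)), key=probs.__getitem__)
    # THA (assumed form): each class has its own acceptance threshold.
    return k if probs[k] >= thresholds[k] else None


# Class 0 is the majority (90% prior); thresholds are hypothetical values.
label = pseudo_label([2.0, 1.5], class_prior=[0.9, 0.1], tau=1.0, thresholds=[0.95, 0.8])
```

Without refinement (`tau=0`), the same sample would lean toward the majority class with low confidence and be discarded. With refinement, it becomes a confident minority-class pseudo-label.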

2509.07605 2026-06-16 cs.LG cs.AI cs.IT math.IT Updated version

Beyond Rebalancing: Benchmarking Binary Classifiers Under Class Imbalance Without Rebalancing Techniques

Ali Nawaz, Amir Ahmad, Shehroz S. Khan

Affiliations: Department of Information Systems and Security, College of Information Technology and Center for Artificial Intelligence and Digital Innovation, United Arab Emirates University; College of Engineering and Technology, American University of the Middle East

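AI summary: Systematically evaluates how robust a range of binary classifiers are to class imbalance without explicit rebalancing techniques. Finds that TabPFN and boosting-based ensembles retain relatively high performance under extreme imbalance.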

Abstract

Class imbalance poses a significant challenge to supervised classification, particularly in critical domains like medical diagnostics and anomaly detection where minority class instances are rare. While numerous studies have explored rebalancing techniques to address this issue, less attention has been given to evaluating the performance of binary classifiers under imbalance when no such techniques are applied. Therefore, the goal of this study is to assess the performance of binary classifiers "as-is", without performing any explicit rebalancing. Specifically, we systematically evaluate the robustness of a diverse set of binary classifiers across both real-world and synthetic datasets, under progressively reduced minority class sizes, using one-shot and few-shot scenarios as baselines. Our approach also explores varying data complexities through synthetic decision boundary generation to simulate real-world conditions. In addition to standard classifiers, we include experiments using undersampling, oversampling strategies, and one-class classification (OCC) methods to examine their behavior under severe imbalance. The results confirm that classification becomes more difficult as data complexity increases and the minority class size decreases. While traditional classifiers deteriorate under extreme imbalance, advanced models like TabPFN and boosting-based ensembles retain relatively higher performance and better generalization compared to traditional classifiers. Visual interpretability and evaluation metrics further validate these findings. Our work offers valuable guidance on model selection for imbalanced learning, providing insights into classifier robustness without dependence on explicit rebalancing techniques.
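When no rebalancing is applied, plain accuracy can look excellent even though the minority class is ignored. For that reason, evaluations under imbalance usually also report class-balanced metrics. Below is a short sketch of two common ones: balanced accuracy, and the geometric mean (G-mean) of sensitivity and specificity. The abstract does not list the exact metrics used, so this choice is ours, for illustration.

```python
import math
from typing import Dict, Sequence


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Treat label 1 as the minority (positive) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# 95 majority and 5 minority samples; a classifier that always predicts the majority.
y_true = [0] * 95 + [1] * 5
print(imbalance_metrics(y_true, [0] * 100))
# {'accuracy': 0.95, 'balanced_accuracy': 0.5, 'g_mean': 0.0}
```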

2510.14217 2026-06-16 cs.LG physics.chem-ph 版本更新

Spectral Analysis of Molecular Features: When Richer Features Do Not Guarantee Better Generalization

分子特征的谱分析:更丰富的特征并不保证更好的泛化

Asma Jamali, Tin Sum Cheng, Rodrigo A. Vargas-Hernández

发表机构 * School of Computational Science and Engineering, McMaster University, Canada(麦克马斯特大学计算科学与工程学院) Department of Chemistry and Chemical Biology, McMaster University, Canada(麦克马斯特大学化学与化学生物学系) Department of Mathematics and Computer Science, University of Basel, Switzerland(巴塞尔大学数学与计算机科学系) Brockhouse Institute for Materials Research, McMaster University, Canada(麦克马斯特大学布罗克豪斯材料研究所)

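AI summary: A spectral analysis of kernel ridge regression over diverse molecular representations finds that richer spectral features do not consistently improve generalization, challenging the "richer representations are better" heuristic common in self-supervised learning.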

Comments 11 pages, 7 figures, 3 tables, SI: 13 pages, 9 figures, 4 tables

AI中文摘要

特征嵌入的谱特性为模型泛化和表示质量提供了关键见解。虽然深度学习模型广泛用于分子性质预测,但核方法在低数据场景下仍具有竞争力,然而其谱行为尚未被充分探索。我们首次对核岭回归在多种表示(包括分子指纹ECFP、预训练变换器、图神经网络和3D描述符)上的谱特性进行了全面分析,并在QM9和3个MoleculeNet基准上进行了评估。令人惊讶的是,更丰富的谱特征并不总能带来更好的泛化性能,这与自监督学习中常用的表示启发式方法相矛盾。在4个谱指标中,只有基于ECFP的核与性能呈严格正相关。变换器和全局3D表示表现出混合行为,而局部3D表示则始终呈负相关。截断分析进一步凸显了这种差异:对于热力学目标上的局部3D表示,仅需不到2%的特征值(有时低至0.02%)即可恢复95%的性能,而ECFP和变换器核则需要显著更多的特征值。通过证明对任务和表示的强烈依赖性,我们的结果挑战了"更丰富的谱本身就能改善泛化"这一启发式方法,为自监督学习和标签有限的科学任务中的表示评估提供了新指导。

英文摘要

The spectral properties of feature embeddings offer critical insights into model generalization and representation quality. While deep learning models are widely used for molecular property prediction, kernel methods remain competitive in low-data regimes, yet their spectral behavior is largely unexplored. We present the first comprehensive spectral analysis of kernel ridge regression across diverse representations, including molecular fingerprints (ECFP), pretrained transformers, graph neural networks, and 3D descriptors, evaluated on QM9 and 3 MoleculeNet benchmarks. Surprisingly, richer spectral features do not consistently yield better generalization performance, contradicting common representation heuristics used in self-supervised learning (SSL). Across 4 spectral metrics, only ECFP-based kernels show a strictly positive correlation with performance. Transformer and global 3D representations exhibit mixed behavior, whereas local 3D representations show consistently negative correlations. Truncation analysis further emphasizes this disparity: for local 3D representations on thermodynamic targets, fewer than 2% of eigenvalues (and occasionally as few as 0.02%) are needed to recover 95% of performance, whereas ECFP and transformer kernels require significantly more. By demonstrating a strong dependence on both task and representation, our results challenge the heuristic that richer spectra inherently improve generalization, providing new guidance for evaluating representations in SSL and in label-limited scientific tasks.
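The abstract does not name its four spectral metrics. The sketch below therefore assumes two common ones, both computed from the eigenvalues of a kernel Gram matrix:

- the entropy-based effective rank;
- the number of leading eigenvalues needed to reach a given fraction of the total spectral mass.

The paper's truncation analysis measures recovered predictive performance, not spectral mass, so the second helper is only an analogy. Both function names are hypothetical.

```python
import math

def effective_rank(eigenvalues):
    """exp(Shannon entropy) of the normalized spectrum (an entropy-based effective rank)."""
    positive = [v for v in eigenvalues if v > 0]
    total = sum(positive)
    probs = [v / total for v in positive]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)

def eigenvalues_for_mass(eigenvalues, fraction):
    """Smallest k such that the top-k eigenvalues hold at least `fraction` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= fraction:
            return k
    return len(ordered)

flat = [1.0, 1.0, 1.0, 1.0]   # maximally spread spectrum
steep = [6.0, 3.0, 1.0]       # fast-decaying spectrum
print(round(effective_rank(flat), 6))     # 4.0
print(eigenvalues_for_mass(steep, 0.85))  # 2
```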

2602.03293 2026-06-16 cs.LG 版本更新

Anomaly Detection via Mean Shift Density Enhancement

基于均值漂移密度增强的异常检测

Pritam Kar, Rahul Bordoloi, Olaf Wolkenhauer, Saptarshi Bej

发表机构 * School of Data Science, Indian Institute of Science Education and Research(数据科学学院,印度科学教育与研究学院) Institute of Computer Science, University of Rostock(计算机科学研究所,罗斯托克大学) Leibniz-Institute for Food Systems Biology, Technical University of Munich(莱布尼茨食品系统生物学研究所,慕尼黑工业大学) Stellenbosch Institute of Advanced Studies (STIAS)(斯泰伦博斯高级研究院(STIAS))

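AI summary: Proposes MSDE, a framework that detects anomalies from the geometric displacement of samples under density-driven manifold evolution, and reports outperforming 13 baseline methods on 46 tabular datasets.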

AI中文摘要

无监督异常检测是机器学习中的一个重要问题。现有的无监督异常检测算法很少能在不同异常类型上表现良好,通常仅在特定结构假设下表现出色。这种缺乏鲁棒性在噪声设置下尤为明显。我们提出均值漂移密度增强(MSDE),一个完全无监督的框架,通过异常样本对密度驱动流形演化的几何响应来检测异常。MSDE被设计为一个通用异常检测框架,其原理是:正常样本由于得到局部密度的良好支持,在迭代密度增强下保持稳定,而异常样本在向附近密度模式吸引时会产生大的累积位移。为了实现这一思想,MSDE采用加权均值漂移过程,其中自适应、样本特定的密度权重来源于基于流形学习的模糊邻域图。我们在一个包含46个真实世界表格数据集、四种现实异常生成机制和六种噪声水平的异常检测基准上评估了MSDE。与13个已建立的无监督基线相比,MSDE在几种标准分类指标上、在多个噪声水平下以及平均多种异常类型上,均实现了持续强大、平衡且鲁棒的性能。这些结果表明,基于位移的评分方法为现有的无监督异常检测最先进技术提供了一种鲁棒的替代方案。

英文摘要

Unsupervised anomaly detection stands as an important problem in machine learning. Existing unsupervised anomaly detection algorithms rarely perform well across different anomaly types, often excelling only under specific structural assumptions. This lack of robustness also becomes particularly evident under noisy settings. We propose Mean Shift Density Enhancement (MSDE), a fully unsupervised framework that detects anomalies through their geometric response to density-driven manifold evolution. MSDE is designed as a general purpose anomaly detection framework, based on the principle that normal samples, being well supported by local density, remain stable under iterative density enhancement, whereas anomalous samples undergo large cumulative displacements as they are attracted toward nearby density modes. To operationalize this idea, MSDE employs a weighted mean-shift procedure with adaptive, sample-specific density weights derived from a manifold learning-based fuzzy neighborhood graph. We evaluate MSDE on an anomaly detection benchmark comprising 46 real-world tabular datasets, four realistic anomaly generation mechanisms, and six noise levels. Compared to 13 established unsupervised baselines, MSDE achieves consistently strong, balanced and robust performance for several standard classification metrics, at several noise levels and on average over several types of anomalies. These results demonstrate that displacement-based scoring provides a robust alternative to the existing state-of-the-art for unsupervised anomaly detection.
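A minimal sketch of displacement-based scoring. It rests on three assumptions:

- the points are 2-D;
- the kernel is Gaussian;
- the data points stay fixed and attract moving copies of each sample (non-blurring mean shift).

The paper derives its sample-specific weights from a manifold-learning fuzzy neighborhood graph. Here a simple k-nearest-neighbor density is swapped in as a stand-in, and the bandwidth, k and iteration count are hypothetical.

```python
import math

def knn_density_weights(points, k=2):
    """Stand-in density weight: inverse mean distance to the k nearest neighbors."""
    weights = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        weights.append(1.0 / (sum(dists[:k]) / k + 1e-12))
    return weights

def msde_scores(points, bandwidth=2.0, iterations=5, k=2):
    """Score each sample by its cumulative displacement under weighted mean shift."""
    weights = knn_density_weights(points, k)
    positions = [tuple(p) for p in points]
    scores = [0.0] * len(points)
    for _ in range(iterations):
        new_positions = []
        for i, x in enumerate(positions):
            num = [0.0, 0.0]
            den = 0.0
            # Weighted Gaussian-kernel average over the fixed data points.
            for w, q in zip(weights, points):
                kern = w * math.exp(-math.dist(x, q) ** 2 / (2 * bandwidth ** 2))
                num[0] += kern * q[0]
                num[1] += kern * q[1]
                den += kern
            moved = (num[0] / den, num[1] / den)
            scores[i] += math.dist(x, moved)  # accumulate displacement
            new_positions.append(moved)
        positions = new_positions
    return scores

data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1), (3.0, 3.0)]
scores = msde_scores(data)
print(max(range(len(data)), key=scores.__getitem__))  # 5, the isolated point
```

Well-supported points barely move. The low-density point at (3, 3) is pulled toward the cluster and accumulates by far the largest displacement.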

2602.09329 2026-06-16 cs.LG 版本更新

MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection

MacrOData:用于表格异常检测的数千个数据集的新基准

Xueying Ding, Simon Klüttermann, Haomin Wen, Yilong Chen, Leman Akoglu

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Technical University of Dortmund(多特蒙德工业大学)

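AI summary: Introduces MacrOData, a large-scale tabular outlier detection benchmark of 2,446 datasets covering real and synthetic anomalies, supporting comprehensive and robust evaluation.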

Comments 29 pages, KDD 2026

AI中文摘要

质量基准对于公平准确地跟踪科学进展以及使从业者能够做出明智的方法选择至关重要。表格数据上的异常检测(OD)支撑着众多现实世界应用,然而现有的OD基准仍然有限。突出的OD基准AdBench是文献中的事实标准,但仅包含57个数据集。除了本文讨论的其他缺点外,其小规模严重限制了多样性和统计功效。我们引入了MacrOData,一个用于表格OD的大规模基准套件,包含三个精心策划的组成部分:OddBench,包含790个具有真实世界语义异常的数据集;OvrBench,包含856个具有真实世界统计异常的数据集;以及SynBench,包含800个合成生成的数据集,涵盖多样化的数据先验和异常类型。由于其规模和多样性,MacrOData能够对表格OD方法进行全面且统计稳健的评估。我们的基准进一步满足几个关键需求:我们为所有数据集提供标准化的训练/测试划分,公共/私有基准划分,其中私有划分的测试标签保留用于在线排行榜,并为我们的数据集注释语义元数据。我们在所有基准上进行了广泛的实验,评估了广泛的OD方法,包括经典模型、深度模型和基础模型,并覆盖多样化的超参数配置。我们报告了详细的实证发现、实用指南以及个体性能,作为未来研究的参考。全部包含2446个数据集的基准均已开源,公开可访问的排行榜托管于 https://huggingface.co/MacrOData-CMU。

英文摘要

Quality benchmarks are essential for fairly and accurately tracking scientific progress and enabling practitioners to make informed methodological choices. Outlier detection (OD) on tabular data underpins numerous real-world applications, yet existing OD benchmarks remain limited. The prominent OD benchmark AdBench is the de facto standard in the literature, yet comprises only 57 datasets. In addition to other shortcomings discussed in this work, its small scale severely restricts diversity and statistical power. We introduce MacrOData, a large-scale benchmark suite for tabular OD comprising three carefully curated components: OddBench, with 790 datasets containing real-world semantic anomalies; OvrBench, with 856 datasets featuring real-world statistical outliers; and SynBench, with 800 synthetically generated datasets spanning diverse data priors and outlier archetypes. Owing to its scale and diversity, MacrOData enables comprehensive and statistically robust evaluation of tabular OD methods. Our benchmarks further satisfy several key desiderata: We provide standardized train/test splits for all datasets, public/private benchmark partitions with held-out test labels for the latter reserved toward an online leaderboard, and annotate our datasets with semantic metadata. We conduct extensive experiments across all benchmarks, evaluating a broad range of OD methods comprising classical, deep, and foundation models, over diverse hyperparameter configurations. We report detailed empirical findings, practical guidelines, as well as individual performances as references for future research. All benchmarks containing 2,446 datasets combined are open-sourced, along with a publicly accessible leaderboard hosted at https://huggingface.co/MacrOData-CMU.
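With thousands of datasets, results are usually summarized per method rather than per dataset. The sketch below shows one common summary: mean rank across datasets, with tied methods sharing their average rank. It assumes higher scores are better and that every dataset reports the same methods. The method names and scores are hypothetical placeholders, and this is not MacrOData's leaderboard code.

```python
def ranks_with_ties(scores):
    """Rank scores in descending order (1 = best); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos..end
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def average_ranks(results):
    """results: {dataset: {method: score}} -> {method: mean rank over datasets}."""
    methods = sorted(next(iter(results.values())))  # assumes identical method sets
    totals = {m: 0.0 for m in methods}
    for per_method in results.values():
        ranks = ranks_with_ties([per_method[m] for m in methods])
        for m, r in zip(methods, ranks):
            totals[m] += r
    return {m: totals[m] / len(results) for m in methods}

results = {
    "ds1": {"method_a": 0.91, "method_b": 0.88, "method_c": 0.85},
    "ds2": {"method_a": 0.70, "method_b": 0.75, "method_c": 0.75},
}
print(average_ranks(results))  # {'method_a': 2.0, 'method_b': 1.75, 'method_c': 2.25}
```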

2602.22422 2026-06-16 cs.LG cs.AI 版本更新

Revisiting Chebyshev Polynomial and Anisotropic RBF Models for Tabular Regression

重新审视切比雪夫多项式和各向异性RBF模型在表格回归中的应用

Luciano Gerber, Huw Lloyd

发表机构 * Department of Computing and Mathematics, Manchester Metropolitan University(计算与数学系,曼彻斯特城市大学)

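AI summary: Benchmarks a Chebyshev polynomial regressor, an anisotropic RBF network and a smooth-tree hybrid on 55 datasets. Among CPU-viable models, the smooth models match tree ensembles on accuracy with smaller generalization gaps, so the authors recommend including them in the candidate pool.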

Comments 46 pages, 6 figures, 21 tables. Under review at Knowledge-Based Systems

AI中文摘要

平滑基模型如切比雪夫多项式回归器和径向基函数(RBF)网络在数值分析中已得到充分确立。它们的连续可微预测表面适用于代理优化、敏感性分析以及其他响应随输入逐渐变化的环境。尽管具有这些特性,平滑模型在树集成主导的表格回归中很少出现。我们探究它们是否能够竞争,跨55个按应用领域组织的回归数据集对模型进行基准测试。我们开发了一种各向异性RBF网络,具有数据驱动的中心放置和基于梯度的宽度优化,一个岭正则化的切比雪夫多项式回归器,以及一个平滑树混合模型(切比雪夫模型树);这三个模型均作为scikit-learn兼容包发布。我们将这些模型与树集成、预训练transformer和标准基线进行基准测试,评估准确性和泛化行为。transformer在大多数数据集上准确率排名第一,但其GPU依赖性、推理延迟和数据集大小限制制约了其在应用科学和工业中常见的基于CPU环境中的部署。在CPU可行的模型中,平滑模型和树集成在准确率上统计上持平,但前者倾向于表现出更紧的泛化差距。我们建议常规地将平滑基模型纳入候选池,特别是当下游使用受益于更紧的泛化和逐渐变化的预测时。

英文摘要

Smooth-basis models such as Chebyshev polynomial regressors and radial basis function (RBF) networks are well established in numerical analysis. Their continuously differentiable prediction surfaces suit surrogate optimisation, sensitivity analysis, and other settings where the response varies gradually with inputs. Despite these properties, smooth models seldom appear in tabular regression, where tree ensembles dominate. We ask whether they can compete, benchmarking models across 55 regression datasets organised by application domain. We develop an anisotropic RBF network with data-driven centre placement and gradient-based width optimisation, a ridge-regularised Chebyshev polynomial regressor, and a smooth-tree hybrid (Chebyshev model tree); all three are released as scikit-learn-compatible packages. We benchmark these against tree ensembles, a pre-trained transformer, and standard baselines, evaluating accuracy alongside generalisation behaviour. The transformer ranks first on accuracy across a majority of datasets, but its GPU dependence, inference latency, and dataset-size limits constrain deployment in the CPU-based settings common across applied science and industry. Among CPU-viable models, smooth models and tree ensembles are statistically tied on accuracy, but the former tend to exhibit tighter generalisation gaps. We recommend routinely including smooth-basis models in the candidate pool, particularly when downstream use benefits from tighter generalisation and gradually varying predictions.
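A minimal one-dimensional sketch of a ridge-regularized Chebyshev regressor. It assumes a fixed degree and a plain normal-equation solve, with inputs rescaled to [-1, 1] from a known range. It is not the authors' released scikit-learn-compatible package, and names such as `fit_chebyshev_ridge` are hypothetical.

```python
def chebyshev_features(x, degree):
    """T_0..T_degree at x in [-1, 1] via the recurrence T_{n+1} = 2x T_n - T_{n-1}."""
    feats = [1.0, x]
    for _ in range(2, degree + 1):
        feats.append(2 * x * feats[-1] - feats[-2])
    return feats[: degree + 1]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (M[r][n] - sum(M[r][c] * coef[c] for c in range(r + 1, n))) / M[r][r]
    return coef

def fit_chebyshev_ridge(xs, ys, degree, lam, lo, hi):
    """Ridge solution of (Phi^T Phi + lam I) w = Phi^T y on inputs rescaled to [-1, 1]."""
    def scale(x):
        return 2.0 * (x - lo) / (hi - lo) - 1.0

    Phi = [chebyshev_features(scale(x), degree) for x in xs]
    p = degree + 1
    A = [[sum(row[i] * row[j] for row in Phi) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(p)]
    w = solve(A, b)

    def predict(x):
        return sum(wi * fi for wi, fi in zip(w, chebyshev_features(scale(x), degree)))

    return predict

# Usage: recover y = x^2 on [0, 2] with a degree-2 basis.
xs = [i / 10 for i in range(21)]
model = fit_chebyshev_ridge(xs, [x * x for x in xs], degree=2, lam=1e-8, lo=0.0, hi=2.0)
print(round(model(1.5), 4))  # 2.25
```

The resulting predictor is a polynomial, so it is smooth and continuously differentiable everywhere in the input range. That is the property the abstract highlights for surrogate optimisation and sensitivity analysis.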

2605.09169 2026-06-16 cs.LG cs.AI 版本更新

Prediction Bottlenecks Don't Discover Causal Structure (But Here's What They Actually Do)

预测瓶颈不会发现因果结构(而这是它们真正做的事)

Ankit Hemant Lade, Sai Krishna Jasti, Indar Kumar, Aman Chadha

发表机构 * Ankit Hemant Lade Sai Krishna Jasti Indar Kumar Aman Chadha

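AI summary: Shows experimentally that bottlenecks in prediction models do not discover causal structure, although a limited intervention effect survives under specific conditions. The main contribution is a reusable falsification benchmark.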

Comments 6 pages, 3 tables. Code: https://github.com/ankitlade12/ssm-causal

AI中文摘要

一个仅以下一步预测为训练目标的Mamba状态空间模型,似乎能通过简单的读出$S = |W_{out} W_{in}|$恢复格兰杰因果结构;早期实验还表明,该现象可推广到不同架构,并在$p < 10^{-5}$的显著性水平下受益于干预数据。我们将用于检验这一主张的协议打包为可复用的证伪基准,包括:标准化合成数据生成器(VAR/洛伦兹/CauseMe式)、三种干预语义($do(X=c)$、软噪声、随机强迫)、三个真实数据集上的边来源卡片,以及样本量匹配的对照组;并分五个阶段用该基准检验这一主张。方法层面的主张未能成立:(i)简单的线性瓶颈表现相当或更好;(ii)在合成CauseMe式基准上,调优的Lasso优于瓶颈,而在洛伦兹96(唯一具有明确真值的真实基准)上,经典PCMCI和格兰杰方法领先于一个表现接近的方法集群,瓶颈在其中居后;(iii)标题性的干预优势约60%来自样本量混杂,剩余部分在标准$do(X=c)$干预下消失,仅在非标准的随机强迫方案下保留;(iv)即使是这一残余效应,也能在经典二元格兰杰检验中以更大的效应复现,即该效应与方法无关。最终保留下来的只是一个范围狭窄的刻画性结果;基准才是持久的成果,上述每个阶段都是它的一个对照组。

英文摘要

A Mamba state-space model trained only for next-step prediction appears to recover Granger-causal structure through a simple readout $S = |W_{out} W_{in}|$, with early experiments suggesting the phenomenon generalized across architectures and benefited from interventional data at $p < 10^{-5}$. We package the protocol used to test that claim as a reusable falsification benchmark, comprising standardized synthetic generators (VAR/Lorenz/CauseMe-style), three intervention semantics ($do(X=c)$, soft-noise, random-forcing), edge-provenance cards on three real datasets, and size-matched control arms, and walk the claim through it in five stages. The method-level claim does not survive: (i) a plain linear bottleneck does as well or better; (ii) tuned Lasso beats the bottleneck on synthetic CauseMe-style benchmarks, and on Lorenz-96 (the only real benchmark with unambiguous ground truth) classical PCMCI and Granger lead a tight cluster in which the bottleneck trails; (iii) the headline intervention advantage is roughly 60% a sample-size confound, and the residual disappears under standard $do(X=c)$ interventions, surviving only under a non-standard random-forcing scheme; (iv) even that residual reproduces, with a larger effect, in classical bivariate Granger, so the effect is method-agnostic. What survives is a narrow characterization result; the benchmark is the lasting artifact, and each stage above is one of its control arms.
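The readout $S = |W_{out} W_{in}|$ can be computed directly from the weights. This minimal sketch assumes the following shapes:

- $W_{in}$ maps the $n$ observed variables to $h$ hidden units, shape $h \times n$;
- $W_{out}$ maps the hidden units back, shape $n \times h$.

Under these assumptions, $S_{ij}$ scores the influence of variable $j$ on variable $i$. The thresholding step that turns scores into directed edges is an illustrative assumption, not part of the described method.

```python
def readout_scores(W_out, W_in):
    """Elementwise absolute value of the product W_out @ W_in (an n x n matrix)."""
    n, h = len(W_out), len(W_in)
    assert all(len(row) == h for row in W_out), "W_out must be n x h"
    m = len(W_in[0])
    return [[abs(sum(W_out[i][k] * W_in[k][j] for k in range(h))) for j in range(m)]
            for i in range(n)]

def edges_above(S, threshold):
    """Directed edges j -> i whose score exceeds a (hypothetical) threshold; self-loops skipped."""
    return [(j, i) for i, row in enumerate(S) for j, s in enumerate(row)
            if i != j and s > threshold]

# Two variables, one hidden unit: variable 0 drives the hidden unit, which feeds variable 1.
W_in = [[1.0, 0.0]]        # h x n = 1 x 2
W_out = [[0.0], [-0.8]]    # n x h = 2 x 1
S = readout_scores(W_out, W_in)
print(S)                   # [[0.0, 0.0], [0.8, 0.0]]
print(edges_above(S, 0.5)) # [(0, 1)]
```

The absolute value discards the sign of the path. The paper's falsification stages show that such scores need not reflect causal structure.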

2605.26418 2026-06-16 cs.LG cs.AI cs.DC 版本更新

When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control

深度强化学习何时超越校准基线?自适应资源控制的基准研究

Guilin Zhang, Chuanyi Sun, Kai Zhao, Xu Chu, Shahryar Sarkani, John Fossaceca

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学) University of Toronto(多伦多大学)

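AI summary: Using the RLScale-Bench benchmark, finds that a calibrated rule-based autoscaler achieves lower cost than six mainstream deep RL algorithms on every workload. It shows that the bottleneck lies in baseline calibration, reward engineering and evaluation protocols rather than in algorithm selection.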

AI中文摘要

一个经过适当校准的基于规则的自动扩缩容器,可以在我们测试的每个工作负载上,在成本方面击败六种主流深度强化学习(DRL)算法。那么,DRL究竟在什么情况下(如果存在的话)才真正有用?我们在RLScale-Bench中研究这个问题,这是一个用于自适应资源控制的DRL可复现基准和评估协议,其中智能体在成本和服务级别约束下将计算资源分配给动态工作负载。我们在匹配的架构、训练预算和奖励函数下,将PPO、DQN、A2C、SAC、TD3和DDPG与校准的基于规则基线在六种工作负载模式和五个随机种子(240次运行)上进行对比,在Kubernetes水平Pod自动扩缩上实例化该基准,并探测分布偏移下的泛化能力。三个发现挑战了常见假设:(i)校准控制器在所有六个工作负载上实现了最低成本,尽管在突发(bursty)和瞬时尖峰(flash)流量上落后于最佳RL智能体;(ii)由于动作空间不匹配,离散动作算法在约束违反方面比连续动作算法好一到两个数量级;(iii)没有单一算法在所有工作负载上占主导地位,排名变化高达四个位置。基于RL的资源控制的瓶颈不在于算法选择,而在于基线校准、奖励工程和现实的评估协议。

英文摘要

A properly calibrated rule-based autoscaler can beat every one of six mainstream deep reinforcement learning (DRL) algorithms on cost across every workload we test. So when, if ever, does DRL actually help? We study this in RLScale-Bench, a reproducible benchmark and evaluation protocol for DRL on adaptive resource control, where an agent allocates compute to a dynamic workload under cost and service-level constraints. We evaluate PPO, DQN, A2C, SAC, TD3, and DDPG under matched architectures, training budgets, and reward functions against a calibrated rule-based baseline across six workload patterns and five seeds (240 runs), instantiate the benchmark on Kubernetes Horizontal Pod Autoscaling, and probe distribution-shift generalization. Three findings challenge common assumptions: (i) the calibrated controller achieves the lowest cost on all six workloads, though it trails the best RL agents on bursty and flash traffic; (ii) discrete-action algorithms outperform continuous-action ones by one to two orders of magnitude in constraint violations due to action-space mismatch; and (iii) no single algorithm dominates across workloads, with rankings shifting by up to four positions. The bottleneck in RL-based resource control is not algorithm selection but baseline calibration, reward engineering, and realistic evaluation protocols.
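For reference, a rule-based baseline of this kind can be built on the scaling rule documented for Kubernetes Horizontal Pod Autoscaling:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

The rule skips scaling when the metric-to-target ratio is within a tolerance (0.1 by default). The sketch below assumes CPU utilization as the metric. The target utilization and replica bounds stand in for whatever the paper's calibration selects and are hypothetical.

```python
import math

def desired_replicas(current_replicas, current_util, target_util,
                     min_replicas=1, max_replicas=20, tolerance=0.1):
    """HPA-style rule: scale proportionally to utilization, with a no-op tolerance band."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: keep the current size
    proposal = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposal))

print(desired_replicas(4, current_util=90, target_util=60))  # 6
print(desired_replicas(4, current_util=62, target_util=60))  # 4 (within tolerance)
print(desired_replicas(4, current_util=10, target_util=60))  # 1
```

Under this reading, calibrating the baseline means tuning `target_util`, the replica bounds and the tolerance per workload. That tuning is exactly the step the abstract identifies as the real bottleneck.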

2605.27618 2026-06-16 cs.LG 版本更新

Evaluating Local Explainability Metrics for Machine Learning Models on Tabular Data

评估表格数据机器学习模型的局部可解释性指标

Tomás Pereira, João Vitorino, Eva Maia, Isabel Praça

发表机构 * GECAD, ISEP, Polytechnic of Porto(GECAD、ISEP、波尔图理工大学)

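AI summary: Studies the trustworthiness of local explainability techniques on complex tabular classification tasks by benchmarking LIME, Kernel SHAP and Feature Ablation. It finds that explanation quality is driven mainly by dataset complexity and feature distributions rather than by model predictive performance.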

Comments 9 pages, 12 tables, 1 figure, DATA 2026 Conference

AI中文摘要

尽管广泛使用可解释性技术来尝试理解人工智能(AI)的行为,但生成的解释可能并不总是可靠的。一个解释对人类来说可能看似合理,但未能捕捉模型的内部推理,特别是在处理复杂的表格数据时。本文研究了局部可解释性技术在应用于复杂表格分类任务时的可信度,考虑了三个主要属性的评估指标:对模型预测的忠实度、对输入数据变化的鲁棒性以及解释本身的复杂性。对局部可解释模型无关解释(LIME)、Kernel SHAP(Shapley Additive exPlanations)和特征消融技术进行了基准测试,涉及32个数据集和不同类型的机器学习模型。分析了模型性能范围,以识别两组:共识正确(所有模型正确预测的样本)和共识错误(所有模型错误预测的样本)。获得的结果表明,解释并不总是与模型的预测性能相关。相反,数据集复杂性和特征分布似乎是影响解释质量和可靠性的主要因素。

英文摘要

Despite the wide use of explainability techniques to understand the behavior of Artificial Intelligence (AI), the generated explanations may not always be reliable. An explanation can appear plausible to humans but fail to capture the internal reasoning of a model, particularly when dealing with complex tabular data. This paper studies the trustworthiness of local explainability techniques when applied to complex tabular classification tasks, considering evaluation metrics for three main properties: faithfulness to the model's predictions, robustness to input data variations, and complexity of the explanation itself. A benchmark was performed for Local Interpretable Model-Agnostic Explanations (LIME), Kernel SHapley Additive exPlanations (SHAP), and Feature Ablation techniques, across 32 datasets and different types of machine learning models. Model performance ranges were analyzed to identify two groups: consensus-correct, which are samples that all models predicted correctly, and consensus-wrong, samples that all models predicted incorrectly. The obtained results demonstrate that the explanations are not always correlated with a model's predictive performance. Instead, dataset complexity and feature distributions seem to be the main factors affecting explanation quality and reliability.
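The consensus grouping and the Feature Ablation explainer can both be sketched in a few lines. The sketch makes two assumptions:

- each model's predictions are available as a list aligned with the labels;
- ablation replaces one feature at a time with a baseline value (0.0 here, a hypothetical choice).

`consensus_groups` and `ablation_attributions` are illustrative names, not the paper's code.

```python
def consensus_groups(y_true, predictions_by_model):
    """Split sample indices into consensus-correct and consensus-wrong groups."""
    correct, wrong = [], []
    for i, label in enumerate(y_true):
        hits = [preds[i] == label for preds in predictions_by_model]
        if all(hits):
            correct.append(i)
        elif not any(hits):
            wrong.append(i)
    return correct, wrong

def ablation_attributions(model, x, baseline=0.0):
    """Feature Ablation: drop in model output when each feature is set to `baseline`."""
    reference = model(x)
    attributions = []
    for j in range(len(x)):
        ablated = list(x)
        ablated[j] = baseline
        attributions.append(reference - model(ablated))
    return attributions

def linear(v):
    return 2.0 * v[0] + 3.0 * v[1]

y = [1, 0, 1, 0]
preds = [[1, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 1]]
print(consensus_groups(y, preds))                 # ([0], [2])
print(ablation_attributions(linear, [1.0, 1.0]))  # [2.0, 3.0]
```

Samples on which the models disagree fall into neither group, so only the two consensus subsets enter the comparison.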

2606.01602 2026-06-16 cs.LG cs.AI cs.IT math.IT 版本更新

Estimating Mutual Information between Time Series and Temporal Event Sequences Across Diverse Analysis Tasks

估计时间序列与时间事件序列在不同分析任务中的互信息

Haoji Hu, Huaqing Mao, Yijun Lin, Xiaowei Jia, Jinwei Zhou, Minoh Jeong, Yao-Yi Chiang

发表机构 * University of Minnesota - Twin Cities(明尼苏达大学-双城分校) University of Pittsburgh(匹兹堡大学) Inha University(仁荷大学)

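AI summary: Proposes a nonparametric mutual information estimator that directly measures dependence between continuous time series and discrete event sequences without data transformation or discretization. It handles quantization artifacts and event redundancy within a robust, unified framework.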

AI中文摘要

成对依赖度量(如相关性和因果性)是时间数据挖掘的基础,但目前仍缺乏一种原则性且稳健的方法来量化异构数据类型之间的依赖关系,特别是连续时间序列与离散时间事件序列之间。现有方法依赖于对量化、重复值和事件冗余高度敏感的临时变换或互信息估计器,导致实践中结果有偏或不稳定。我们提出一种非参数互信息估计器,无需数据转换、学习或临时离散化,直接度量时间序列与事件序列之间的依赖关系。我们的方法对真实世界时间序列的连续-离散二元性进行建模,以处理量化和重复值伪影,并引入潜在事件聚类策略以减轻事件共现和冗余带来的偏差。这些共同构成了一个鲁棒且统一的框架,桥接了离散和连续互信息。我们在四个代表性任务上评估了所提出的估计器:用于因果分析的离散-连续时延互信息、全局和局部时间重复发现、用于时间序列预测的离散协变量选择以及用于分类的连续特征选择。在合成和真实世界数据集上的实验表明,在准确性、鲁棒性和可解释性方面,该方法一致优于现有方法,使其成为异构时间数据的通用依赖算子,类似于同质时间序列的皮尔逊相关。代码见:https://github.com/HaojiHu/Multimodal-Temporal-Data-Quantification

英文摘要

Pairwise dependence measures such as correlation and causality are fundamental to temporal data mining, yet there is still no principled and robust way to quantify dependence between heterogeneous data types, especially between continuous time series and discrete temporal event sequences. Existing approaches rely on ad hoc transformations or mutual-information estimators that are highly sensitive to quantization, repeated values, and event redundancy, leading to biased or unstable results in practice. We propose a nonparametric mutual information estimator that directly measures the dependence between time series and event sequences without data transformation, learning, or ad hoc discretization. Our method models the continuous-discrete duality of real-world time series to handle quantization and repeated-value artifacts and introduces a latent event clustering strategy to mitigate bias from event co-occurrence and redundancy. Together, these yield a robust and unified framework that bridges discrete and continuous mutual information. We evaluate the proposed estimator on four representative tasks: discrete-continuous time-delayed mutual information for causality analysis, global and local temporal repetition discovery, discrete covariate selection for time series forecasting, and continuous feature selection for classification. Experiments on synthetic and real-world datasets show consistent improvements over existing methods in accuracy, robustness, and interpretability, positioning our approach as a general-purpose dependence operator for heterogeneous temporal data, similar to Pearson correlation for homogeneous time series. Code available at: https://github.com/HaojiHu/Multimodal-Temporal-Data-Quantification
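For the time-delayed mutual information task, it helps to see the naive alternative the abstract argues against: a plug-in estimate on already-discretized data. This sketch assumes a binary event indicator and a time series already binned into discrete symbols. It is the kind of ad hoc discretization baseline the proposed estimator is designed to replace, not the estimator itself.

```python
import math
from collections import Counter

def plugin_mi_bits(xs, ys):
    """Plug-in mutual information (bits) between two aligned discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def delayed_mi_bits(events, series, lag):
    """I(event_t ; series_{t+lag}) for a non-negative lag."""
    return plugin_mi_bits(events[: len(events) - lag], series[lag:])

events = [1, 0, 0, 1] * 10
series = [0, 0] + events[:-2]  # the series copies the events two steps later
print(round(delayed_mi_bits(events, series, 2), 6))  # 1.0
```

On real continuous series this estimate depends heavily on the binning and on repeated values. That sensitivity is what the abstract's estimator is meant to remove.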

2606.02670 2026-06-16 cs.LG cs.AI 版本更新

Anomalies in Multivariate Time Series Benchmarks Are Mostly Univariate

多变量时间序列基准中的异常主要是单变量的

Marc Pinet, Julien Cumin, Samuel Berlemont, Dominique Vaufreydaz

发表机构 * Orange Research(Orange研究院) Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG(格勒诺布尔阿尔卑斯大学、CNRS、格勒诺布尔INP、LIG)

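AI summary: Using a diagnostic framework and experiments, shows that anomalies in current multivariate time series anomaly detection benchmarks stem mainly from univariate deviations and rarely involve cross-channel structural changes. These benchmarks are therefore unsuitable for validating cross-channel modeling.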

Comments Accepted at the 12th International Workshop on Mining and Learning from Time Series (MiLeTS), co-located with KDD 2026

AI中文摘要

许多最新的多变量时间序列异常检测(MTSAD)模型引入了跨通道建模,其隐含假设是异常的结构可能分布在多个通道上。我们在八个广泛使用的公共基准上评估了这一假设,引入了一个逐段诊断框架,该框架针对每个标记的异常,标记是否至少有一个通道单独偏离其正常历史,是否跨通道相关结构发生变化,或两者兼有。该框架表明,在一系列合理阈值下,所有跨通道破裂都伴随着单变量偏离。一个补充指标还显示,在八个基准中的六个上,至少一半的标记异常段在89%到100%的时间步上发生单变量偏离,在其中的三个数据集上达到100%。为了验证我们的框架在存在跨通道结构时能够捕获它,我们构建了具有共享噪声的相移正弦通道的合成数据。每个异常段通过两种通道级损坏之一进行改变,这些损坏保留了每个通道的边缘分布,同时破坏了跨通道结构,我们的框架正确地将这些段表征为仅跨通道异常。在这些数据上,通道依赖(CD)模型成功利用了跨通道信号,而通道独立(CI)模型则失败。在真实基准上对最近SOTA检测器的CI/CD比较进一步证实了CD建模没有带来可衡量的收益。我们得出结论,当前的MTSAD基准不适合验证跨通道建模能力,并呼吁开发更多结构多样的评估集。本研究的代码已公开。

英文摘要

Many recent multivariate time series anomaly detection (MTSAD) models incorporate cross-channel modeling, under the implicit assumption that the structure of anomalies may be spread across multiple channels. We evaluate this assumption on eight widely used public benchmarks by introducing a per-segment diagnostic framework that flags, for each labeled anomaly, whether at least one channel deviates individually from its normal history, whether the cross-channel correlation structure changes, or both. The framework shows that no cross-channel rupture occurs without an accompanying univariate deviation across a range of reasonable thresholds. A complementary metric also reveals that on six of the eight benchmarks, at least half of the labeled anomaly segments deviate univariately on 89% to 100% of their timesteps, reaching 100% on three of these datasets. To verify that our framework captures cross-channel structure when present, we construct synthetic data of phase-shifted sinusoidal channels with shared noise. Each anomalous segment is altered through one of two channel-wise corruptions that preserve the per-channel marginal distribution while breaking cross-channel structure, and our framework correctly characterizes these segments as cross-channel-only. On these data, channel-dependent (CD) models successfully exploit the cross-channel signal whereas channel-independent (CI) ones fail. The CI/CD comparison of a recent SOTA detector on real benchmarks further confirms that CD modeling brings no measurable gain. We conclude that current MTSAD benchmarks are unsuitable for validating cross-channel modeling capabilities, and we call for the development of more structurally diverse evaluation sets. The code for this study is publicly available.
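A minimal sketch of a per-segment diagnostic in this spirit, for two-channel data:

- the univariate flag uses z-scores against each channel's normal history;
- the cross-channel flag uses the change in Pearson correlation between the two channels.

The thresholds (3.0 and 0.5) and the function names are hypothetical. The paper sweeps a range of thresholds, and its exact statistics may differ.

```python
import math
import statistics

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def diagnose_segment(normal, segment, z_thresh=3.0, corr_thresh=0.5):
    """Flag univariate deviation and cross-channel rupture for one labeled segment.

    `normal` and `segment` are two-channel lists: [channel_0_values, channel_1_values].
    """
    univariate = False
    for hist, seg in zip(normal, segment):
        mu, sd = statistics.fmean(hist), statistics.pstdev(hist)
        if any(abs(v - mu) / sd > z_thresh for v in seg):
            univariate = True
    corr_change = abs(pearson(*segment) - pearson(*normal))
    return {"univariate": univariate, "cross_channel": corr_change > corr_thresh}

base = [math.sin(t / 5) for t in range(100)]
normal = [base, base[:]]  # two identical channels: correlation 1
seg_t = range(100, 120)
# Channel 1 is sign-flipped: marginals stay in range, but the correlation becomes -1.
swap = [[math.sin(t / 5) for t in seg_t], [-math.sin(t / 5) for t in seg_t]]
# Both channels shifted far from their history: correlation stays 1.
spike = [[math.sin(t / 5) + 10 for t in seg_t], [math.sin(t / 5) + 10 for t in seg_t]]
print(diagnose_segment(normal, swap))   # {'univariate': False, 'cross_channel': True}
print(diagnose_segment(normal, spike))  # {'univariate': True, 'cross_channel': False}
```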

2606.05692 2026-06-16 cs.LG cs.AI 版本更新

Benchmarking Counterfactual Prediction in Epidemic Time Series with Time-Varying Interventions

具有时变干预的流行病时间序列中的反事实预测基准测试

Wenhao Mu, Facundo Yan, Anik Mumssen, Marisa Eisenberg, Alexander Rodríguez

发表机构 * University of Michigan Computer Science and Engineering(密歇根大学计算机科学与工程系) University of Michigan Epidemiology & Complex Systems(密歇根大学流行病学与复杂系统)

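AI summary: To address the lack of realistic benchmarks with observable counterfactual outcomes, builds a large-scale counterfactual prediction benchmark for epidemic time series from a calibrated agent-based model. The benchmark supports static and time-varying treatments as well as single- and multi-policy interventions, and is used to evaluate a range of causal inference methods.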

Comments To appear in Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

AI中文摘要

深度学习在时间序列因果推断方面取得了显著进展,但由于缺乏具有可观测反事实结果的现实基准,进展仍然受到限制。现有数据集要么依赖没有真实反事实的真实世界观测,要么依赖无法捕捉复杂因果动态的简化模拟。为了解决这一差距,我们开发了一个大规模基准,用于动态干预下流行病时间序列的反事实预测。与现有基准不同,它支持静态和时变治疗,以及单策略和多策略干预设置,从而能够在广泛的因果推断场景中评估因果推断方法。利用基于真实世界人口、流动性、流行病学和政策数据校准的基于智能体的模型,我们生成了跨越美国150多个县的真实反事实轨迹。使用该基准,我们评估了广泛使用和最先进的因果推断方法,揭示了显著的性能差异,并突出了现实时间序列因果推理的挑战。

英文摘要

Deep learning has enabled significant advances in time-series causal inference, yet progress remains constrained by the lack of realistic benchmarks with observable counterfactual outcomes. Existing datasets either rely on real-world observations without ground-truth counterfactuals or on simplified simulations that fail to capture complex causal dynamics. To address this gap, we develop a large-scale benchmark for counterfactual prediction in epidemic time series under dynamic interventions. Unlike existing benchmarks, it supports static and time-varying treatments, as well as both single-policy and multi-policy intervention settings, enabling evaluation of causal inference methods across a broad range of causal inference scenarios. Leveraging a calibrated agent-based model grounded in real-world demographic, mobility, epidemiological, and policy data, we generate realistic counterfactual trajectories across more than 150 U.S. counties. Using this benchmark, we evaluate widely used and state-of-the-art causal inference methods, revealing substantial performance differences and highlighting the challenges of realistic time-series causal reasoning.
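To make the idea of a counterfactual trajectory under a time-varying intervention concrete, here is a deterministic discrete-time SIR sketch. It is a deliberately simple compartmental stand-in, not the paper's calibrated agent-based model. The transmission rate, recovery rate and shape of the contact-reduction policy are hypothetical.

```python
def simulate_sir(days, beta, gamma, reduction, i0=0.001):
    """Daily SIR updates with a time-varying intervention.

    `reduction(t)` in [0, 1] scales down transmission on day t
    (e.g., a distancing policy); returns the infected fraction per day.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    infected = [i]
    for t in range(days):
        new_inf = beta * (1.0 - reduction(t)) * s * i
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        infected.append(i)
    return infected

def policy(t):
    """Hypothetical policy: 50% contact reduction between days 20 and 60."""
    return 0.5 if 20 <= t < 60 else 0.0

def no_policy(t):
    return 0.0

factual = simulate_sir(150, beta=0.3, gamma=0.1, reduction=policy)
counterfactual = simulate_sir(150, beta=0.3, gamma=0.1, reduction=no_policy)
print(max(factual) < max(counterfactual))  # True: the policy lowers the peak
```

The two trajectories are identical until the policy starts and diverge afterwards. In a simulator, both the factual and the counterfactual trajectory are observable, whereas real-world data only ever reveals the factual one.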

2606.08583 2026-06-16 cs.LG eess.SP 版本更新

A spectral audit framework reveals task-dependent aperiodic reliance across EEG and ECG deep learning

频谱审计框架揭示EEG和ECG深度学习中任务依赖的非周期性依赖

Jasmeet Singh Bindra, Siddharth Panwar

发表机构 * Indian Knowledge Systems and Mental Health Applications (IKSMHA) Center, Indian Institute of Technology Mandi(印度理工学院曼迪分校印度知识体系与心理健康应用中心) School of Computing and Electrical Engineering, Indian Institute of Technology Mandi(印度理工学院曼迪分校计算与电气工程学院)

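AI summary: Proposes a spectral audit framework that combines aperiodic/periodic decomposition, phase-preserving Fourier interventions and related controls. It finds that deep models' reliance on the aperiodic component is task-dependent and architecture-general: large for sleep-wake classification, moderate for clinical abnormality detection and minimal for motor imagery. The finding extends to ECG.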

Comments 25 pages, being prepared for submission to a peer-reviewed journal

AI中文摘要

生理时间序列的深度学习通常借助领域特定特征来解释(如EEG中的振荡节律、ECG中的形态复合波),但这些信号位于一个宽带非周期1/f样包络之上,该包络与觉醒、年龄和病理共变。我们引入了一个频谱审计框架,结合非周期/周期分解、相位保持傅里叶干预、假对照和模拟验证。非周期依赖是任务依赖且架构通用的:在六种神经架构中,平坦化导致的下降在睡眠-觉醒分类上超过0.42个平衡准确率点,在临床异常检测上达到0.07-0.13,在运动想象上则保持最小。七个EEG基础模型中有六个在临床EEG上显示出FDR显著的非周期依赖;年龄/性别和记录时代控制减少了但未消除该效应。将审计应用于PTB-XL ECG,发现神经模型的下降幅度为0.32-0.36,且在人口统计学匹配后依然存在,证实这类混淆因素不限于EEG。非周期控制应成为可解释生理时间序列深度学习的标准做法。

英文摘要

Deep learning on physiological time series is interpreted through domain-specific features (oscillatory rhythms in EEG, morphological complexes in ECG), yet these signals sit atop a broadband aperiodic 1/f-like envelope that covaries with arousal, age, and pathology. We introduce a spectral audit framework combining aperiodic/periodic decomposition, phase-preserving Fourier interventions, sham controls, and simulation validation. Aperiodic reliance was task-dependent and architecture-general: across six neural architectures, flattening drops exceeded 0.42 balanced-accuracy points for sleep-wake classification, reached 0.07-0.13 for clinical abnormality detection, and remained minimal for motor imagery. Six of seven EEG foundation models showed FDR-significant aperiodic reliance on clinical EEG; age/sex and recording-era controls reduced but did not eliminate the effect. Applying the audit to PTB-XL ECG revealed neural drops of 0.32-0.36 persisting after demographic matching, confirming this confound class extends beyond EEG. Aperiodic controls should become standard for interpretable physiological time-series deep learning.
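A minimal sketch of the aperiodic-flattening idea. It assumes complex Fourier coefficients that have already been computed, and a specparam-style fixed-mode aperiodic fit obtained by least squares in log-log space:

$\log_{10} P(f) = b - \chi \log_{10} f$

Dividing each coefficient by the fitted aperiodic amplitude, which is a positive real number, whitens the 1/f-like envelope while leaving every phase unchanged. The paper's actual decomposition and intervention details may differ, and the function names are hypothetical.

```python
import cmath
import math

def fit_aperiodic(freqs, power):
    """Least-squares fit of log10 P = b - chi * log10 f; returns (b, chi)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, -slope

def flatten_aperiodic(freqs, coeffs):
    """Divide each Fourier coefficient by the fitted aperiodic amplitude (phase-preserving).

    freqs must be positive, i.e. the DC bin is excluded.
    """
    power = [abs(c) ** 2 for c in coeffs]
    b, chi = fit_aperiodic(freqs, power)
    out = []
    for f, c in zip(freqs, coeffs):
        amplitude = math.sqrt(10 ** b * f ** (-chi))  # sqrt of the fitted aperiodic power
        out.append(c / amplitude)
    return out, chi

# Usage: a pure 1/f power spectrum (chi = 1) with arbitrary phases.
freqs = [1.0, 2.0, 4.0, 8.0, 16.0]
phases = [0.3, -1.2, 2.0, 0.7, -2.5]
coeffs = [cmath.rect(math.sqrt(1.0 / f), ph) for f, ph in zip(freqs, phases)]
flat, chi = flatten_aperiodic(freqs, coeffs)
print(round(chi, 6))                             # 1.0
print([round(abs(c), 6) for c in flat])          # [1.0, 1.0, 1.0, 1.0, 1.0]
print([round(cmath.phase(c), 6) for c in flat])  # the original phases
```

Because only magnitudes change, an inverse transform of `flat` keeps the signal's timing structure. The same design makes sham versions of the intervention easy to build.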

2606.08594 2026-06-16 cs.LG eess.SP 版本更新

How Much Capacity Does EEG Denoising Need? Ultra-Compact Networks reveal Benchmark Saturation and Metric-Utility Gap

脑电图去噪需要多少容量?超紧凑网络揭示基准饱和与度量-效用差距

Jasmeet Singh Bindra, Siddharth Panwar

发表机构 * Indian Knowledge Systems and Mental Health Applications (IKSMHA) Center, Indian Institute of Technology Mandi(印度理工学院曼迪分校印度知识体系与心理健康应用中心) School of Computing and Electrical Engineering, Indian Institute of Technology Mandi(印度理工学院曼迪分校计算与电气工程学院)

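AI summary: Fixing the architecture and varying only channel width (1.05K-40.26K parameters), finds three things:

- EEG denoising reconstruction performance saturates at 3-6.5K parameters;
- reconstruction metrics do not predict downstream BCI utility;
- ultra-compact models (33-46 KB) suit edge deployment.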

Comments 17 pages, will be submitted to a peer-reviewed journal

AI中文摘要

深度学习脑电图去噪架构已从数万参数扩展到数千万参数,然而尚无先前研究将模型容量作为实验变量隔离,或测试重建度量是否预测下游神经信号效用。我们通过固定架构、损失、数据划分和训练配方,仅在最小深度可分离卷积U-Net中从1.05K到40.26K参数扫描通道宽度,解决了这两个空白。模型在EEGDenoiseNet基准、跨数据集BCI迁移测试、受控基线重训练以及所有九个BCI竞赛IV-2a受试者的五个解码器家族的下游运动想象分类上进行了评估。重建性能在3-6.5K参数时饱和,拐点之后每log10参数单位的增益最多为0.015相关系数。在相同流程下重训练的8.46M参数基线在EOG上与40.26K紧凑变体表现相当(200倍的参数差距并未带来优势),而Patch-Transformer对照重现了相同的收益递减形态。下游评估揭示了分类器依赖的度量-效用差距:重建优化的去噪显著降低了所有九个受试者和三种伪影类型的CSP+LDA分类(最佳去噪准确率0.547 vs. 噪声基线0.612;Bonferroni p=0.0488),在自然记录试验中持续存在(Delta=-0.047;BH-FDR q=0.0049)。端到端神经解码器显示可变或中性效果。标准EEG去噪基准在远低于当前模型容量时已饱和,重建度量不预测BCI效用。33-46 KB和1.27-2.61M FLOPs/段的超紧凑模型适用于边缘部署。这些发现主张容量控制评估、更困难的任务感知基准以及强制性的下游验证。

英文摘要

Deep learning EEG denoising architectures have scaled from tens of thousands to tens of millions of parameters, yet no prior study has isolated model capacity as the experimental variable or tested whether reconstruction metrics predict downstream neural-signal utility. We address both gaps by fixing architecture, loss, data split, and training recipe while sweeping only channel width from 1.05K to 40.26K parameters in a minimal depthwise-separable convolutional U-Net. Models were evaluated on the EEGDenoiseNet benchmark, cross-dataset BCI transfer tests, controlled baseline retraining, and downstream motor-imagery classification with five decoder families across all nine BCI Competition IV-2a subjects. Reconstruction performance saturated by 3-6.5K parameters, with post-elbow gains of at most 0.015 correlation coefficient per log10-parameter unit. An 8.46M-parameter baseline retrained under the same pipeline matched the 40.26K compact variant on EOG (a 200x parameter gap yielding no advantage), while a Patch-Transformer control reproduced the same diminishing-return shape. Downstream evaluation exposed a classifier-dependent metric-utility gap: reconstruction-optimized denoising significantly degraded CSP+LDA classification across all nine subjects and three artifact types (best denoised accuracy 0.547 vs. 0.612 noisy baseline; Bonferroni p=0.0488), persisting on naturally recorded trials (Delta=-0.047; BH-FDR q=0.0049). End-to-end neural decoders showed variable or neutral effects. Standard EEG denoising benchmarks are saturated far below current model capacity, and reconstruction metrics do not predict BCI utility. Ultra-compact models at 33-46 KB and 1.27-2.61M FLOPs/segment are practical for edge deployment. These findings argue for capacity-controlled evaluation, harder task-aware benchmarks, and mandatory downstream validation.
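Because the study varies only channel width, parameter count follows directly from layer shapes. This minimal sketch counts the parameters of one depthwise-separable 1-D convolution: a depthwise kernel per input channel followed by a 1x1 pointwise mix, both with biases. It compares that count with a standard convolution and converts it to float32 storage size. The channel widths and kernel size are hypothetical and do not reproduce the paper's U-Net.

```python
def depthwise_separable_params(c_in, c_out, kernel, bias=True):
    """Depthwise (c_in * kernel) plus pointwise (c_in * c_out) weights, with optional biases."""
    depthwise = c_in * kernel + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

def standard_conv_params(c_in, c_out, kernel, bias=True):
    """Full convolution: one kernel per (input, output) channel pair."""
    return c_in * c_out * kernel + (c_out if bias else 0)

def float32_kib(n_params):
    """Raw weight storage in KiB at 4 bytes per parameter (file overhead excluded)."""
    return n_params * 4 / 1024

# Width sweep: the separable layer grows far more slowly than a standard convolution.
for width in (8, 16, 32):
    sep = depthwise_separable_params(width, 2 * width, kernel=7)
    std = standard_conv_params(width, 2 * width, kernel=7)
    print(width, sep, std, float32_kib(sep))
```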

2306.11252 2026-06-16 cs.CL cs.LG 版本更新

HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation

HK-LegiCoST: 利用非逐字转录进行语音翻译

Cihan Xiao, Henry Li Xinyuan, Jinyi Yang, Dongji Gao, Matthew Wiesner, Kevin Duh, Sanjeev Khudanpur

发表机构 * Center for Language and Speech Processing(语言与语音处理中心) Human Language Technology Center of Excellence(人类语言技术卓越中心) Johns Hopkins University(约翰霍普金斯大学)

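AI summary: Introduces HK-LegiCoST, a corpus of 600+ hours of three-way parallel Cantonese-English data. It addresses the challenge of sentence-level alignment with non-verbatim transcripts and reports competitive Cantonese speech translation baselines with cross-corpus validation.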

AI中文摘要

我们介绍了HK-LegiCoST,一个新的粤语-英语三路平行语料库,包含600+小时的粤语音频、其标准繁体中文转录和英文翻译,并在句子级别进行切分和对齐。我们描述了语料库准备中的显著挑战:切分、长音频记录的对齐,以及与非逐字转录的句子级对齐。当源语言的口语和书面形式存在显著差异时,此类转录使语料库适用于语音翻译研究。由于其大规模,我们能够在HK-LegiCoST上展示具有竞争力的语音翻译基线,并将其扩展到FLEURS粤语子集上具有前景的跨语料库结果。这些结果为语音识别和翻译研究提供了见解,特别是对于因各种因素(包括方言和口语)而常见非逐字或“噪声”转录的语言。

英文摘要

We introduce HK-LegiCoST, a new three-way parallel corpus of Cantonese-English translations, containing 600+ hours of Cantonese audio, its standard traditional Chinese transcript, and English translation, segmented and aligned at the sentence level. We describe the notable challenges in corpus preparation: segmentation, alignment of long audio recordings, and sentence-level alignment with non-verbatim transcripts. Such transcripts make the corpus suitable for speech translation research when there are significant differences between the spoken and written forms of the source language. Due to its large size, we are able to demonstrate competitive speech translation baselines on HK-LegiCoST and extend them to promising cross-corpus results on the FLEURS Cantonese subset. These results deliver insights into speech recognition and translation research in languages for which non-verbatim or "noisy" transcription is common due to various factors, including vernacular and dialectal speech.
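Aligning audio with non-verbatim transcripts typically needs a tolerant similarity score between an ASR hypothesis and candidate transcript segments. This minimal sketch assumes character error rate (Levenshtein distance over characters divided by reference length) as that score. The abstract does not describe the corpus's actual alignment procedure, so this is illustrative only. The usage contrasts spoken Cantonese 我哋 with written Standard Chinese 我們 ("we"), a typical verbatim/non-verbatim gap.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]

def cer(hypothesis, reference):
    """Character error rate of a hypothesis against a reference transcript."""
    return edit_distance(hypothesis, reference) / max(len(reference), 1)

def best_segment(hypothesis, candidates):
    """Index of the candidate transcript segment with the lowest CER."""
    return min(range(len(candidates)), key=lambda k: cer(hypothesis, candidates[k]))

print(cer("我哋", "我們"))  # 0.5: spoken and written forms differ in one character
print(best_segment("我哋今日開會", ["主席宣布休會", "我們今日開會"]))  # 1
```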

2401.14283 2026-06-16 stat.ML cs.LG 版本更新

Information Leakage Detection through Approximate Bayes-optimal Prediction

通过近似贝叶斯最优预测的信息泄露检测

Pritha Gupta, Marcel Wever, Eyke Hüllermeier

发表机构 * University of Potsdam(波茨坦大学) University of Hanover(汉诺威大学) Ludwig-Maximilians-University Munich(慕尼黑大学)

AI总结 提出基于统计学习与信息论的理论框架,通过自动机器学习近似贝叶斯预测器的对数损失和准确率来估计互信息,从而检测信息泄露,在合成和真实OpenSSL TLS服务器数据集上优于现有方法。

Comments Accepted at Information Sciences

AI中文摘要

在当今数据驱动的世界中,公开可用信息的激增因信息泄露(IL)问题而引发安全担忧。IL涉及通过可观察的系统信息无意中将敏感信息暴露给未经授权的方。传统的统计方法依赖于估计可观察信息与秘密信息之间的互信息(MI)来检测IL,面临维度灾难、收敛性、计算复杂性和MI误估计的挑战。尽管有效,新兴的基于监督机器学习的方法检测IL仅限于二元系统敏感信息,并且缺乏全面的框架。为了解决这些局限性,我们利用统计学习理论和信息论建立了一个理论框架,以准确量化和检测IL。使用自动机器学习,我们证明通过近似通常未知的贝叶斯预测器的对数损失和准确率,可以准确估计MI。基于此,我们展示了如何有效估计MI以检测IL。在考虑合成和真实OpenSSL TLS服务器数据集的实证研究中,我们的方法优于最先进的基线方法。

英文摘要

In today's data-driven world, the proliferation of publicly available information raises security concerns due to the information leakage (IL) problem. IL involves unintentionally exposing sensitive information to unauthorized parties via observable system information. Conventional statistical approaches detect ILs by estimating the mutual information (MI) between observable and secret information, and they face challenges of the curse of dimensionality, convergence, computational complexity, and MI misestimation. Though effective, emerging approaches based on supervised machine learning are limited to binary system-sensitive information and lack a comprehensive framework. To address these limitations, we establish a theoretical framework using statistical learning theory and information theory to quantify and detect IL accurately. Using automated machine learning, we demonstrate that MI can be accurately estimated by approximating the typically unknown Bayes predictor's log-loss and accuracy. Based on this, we show how MI can effectively be estimated to detect ILs. Our method outperforms state-of-the-art baselines in an empirical study considering synthetic and real-world OpenSSL TLS server datasets.
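The estimator rests on the identity I(X;Y) = H(Y) - H(Y|X). It also uses the fact that the expected log-loss of the Bayes-optimal predictor equals H(Y|X). The minimal sketch below estimates MI in bits as the plug-in label entropy minus the empirical log-loss. It assumes `probs` come from some approximately calibrated classifier (the paper uses AutoML to find it) and that labels are small integers indexing each probability list.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Plug-in Shannon entropy H(Y) of the empirical label distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def log_loss_bits(labels, probs):
    """Mean cross-entropy -log2 p(y_i | x_i); probs[i][y] is the predicted probability of class y."""
    eps = 1e-12  # guard against log(0) for overconfident wrong predictions
    return sum(-math.log2(max(p[y], eps)) for y, p in zip(labels, probs)) / len(labels)


def mi_estimate_bits(labels, probs):
    """MI estimate H(Y) - log-loss, clipped at 0 because MI is non-negative."""
    return max(0.0, entropy_bits(labels) - log_loss_bits(labels, probs))
```

A detector would flag leakage when this estimate is reliably above zero, for example across cross-validation folds. This sketch does not reproduce the paper's statistical test, and it omits the accuracy-based estimator the paper also describes.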

2502.08266 2026-06-16 cs.CL cs.AI cs.LG 版本更新

Dealing with Annotator Disagreement in Hate Speech Classification

处理仇恨言论分类中的标注者分歧

Somaiyeh Dehghan, Mehmet Umut Sen, Berrin Yanikoglu

发表机构 * Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey(工程与自然科学学院,Sabanci大学,伊斯坦布尔,土耳其) Center of Excellence in Data Analytics (VERIM), Sabanci University, Istanbul, Turkey(数据分析卓越中心(VERIM),Sabanci大学,伊斯坦布尔,土耳其)

AI总结 研究标注者分歧对仇恨言论分类的影响,评估多数投票等聚合方法,并利用感知强度增强分类性能,在土耳其语推文中取得新最优结果。

Comments 19 pages, 4 Tables

AI中文摘要

仇恨言论检测是一项关键任务,尤其是在有害内容可能迅速传播的社交媒体上。收集社交媒体内容(如推文)来训练机器学习模型很容易,但由于其固有的主观性,检测和分类仇恨言论可能很困难。这种主观性导致标注者之间频繁出现分歧,尤其是对于微妙或边缘内容。传统方法要么丢弃非共识样本,要么通过专家裁决强制设定“黄金标准”,忽略了关于不确定性和多样化人类视角的宝贵信息。我们研究了仇恨言论分类中标注者分歧这一很大程度上被忽视的问题,并评估了一系列聚合方法,包括多数投票、序数策略(最小值、最大值和均值),并分析了它们在二分类、四分类和六分类任务中的影响。此外,我们利用标注者感知的仇恨言论强度分数来探索基于回归和混合建模的方法。我们证明,过滤非共识样本会导致过于乐观的结果,而感知强度提供了增强分类性能的补充信号。最后,我们在土耳其语推文的仇恨言论检测中建立了新的最优结果,并表明标注者分歧在适当建模后,是构建更稳健可靠系统的宝贵资源。

英文摘要

Hate speech detection is a crucial task, especially on social media where harmful content can spread quickly. Collecting social media content (tweets etc.) to train machine learning models is easy, but detecting and categorizing hate speech can be difficult due to its inherently subjective nature. This subjectivity leads to frequent disagreement among annotators, particularly for subtle or borderline content. Traditional approaches either discard non-consensus samples or force a "gold standard" through expert adjudication, ignoring valuable information about uncertainty and diverse human perspectives. We examine the largely overlooked problem of annotator disagreement in hate speech classification, evaluate a range of aggregation methods, including majority voting and ordinal strategies (minimum, maximum, and mean), and analyze their impact across binary, 4-class, and 6-class classification tasks. In addition, we leverage annotators' perceived hate speech strength scores to explore regression-based and hybrid modeling approaches. Among other findings, we show that filtering non-consensus samples leads to over-optimistic results and that the perceived strength provides a complementary signal that enhances classification performance. Finally, we establish new state-of-the-art results for hate speech detection in Turkish tweets, and demonstrate that annotator disagreement, when properly modeled, is a valuable resource for building more robust and reliable systems.
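The aggregation strategies named above can be sketched per item as follows. The sketch assumes each annotator gives an integer score on an ordinal scale where larger means more severe. Two conventions in it are hypothetical and not necessarily the paper's: majority ties break toward the lower class, and the mean uses round-half-up.

```python
import math
from collections import Counter
from statistics import mean


def aggregate(scores, method="majority"):
    """Collapse one item's ordinal annotator scores into a single label."""
    if method == "majority":
        counts = Counter(scores)
        top = max(counts.values())
        # Hypothetical tie-break: prefer the lower (less severe) class
        return min(label for label, c in counts.items() if c == top)
    if method == "min":
        return min(scores)
    if method == "max":
        return max(scores)
    if method == "mean":
        return math.floor(mean(scores) + 0.5)  # round half up to the nearest class
    raise ValueError(f"unknown method: {method}")


def is_consensus(scores):
    """True when every annotator gave the same score."""
    return len(set(scores)) == 1
```

Dropping every item where `is_consensus` is False corresponds to the filtering practice that the abstract reports as over-optimistic.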

2505.13553 2026-06-16 cs.SE cs.LG 版本更新

Towards Functional Correctness of Large Code Models with Selective Generation

面向大型代码模型的功能正确性:选择性生成方法

Jaewoo Jeong, Taesoo Kim, Sangdon Park

发表机构 * KAIST(韩国科学技术院)

AI总结 针对代码生成模型的幻觉问题,提出利用动态代码分析自动生成单元测试,基于功能正确性评估进行选择性生成,以控制非弃权答案的错误发现率,并引入FuzzEval范式用于精确评估。

Comments ICML 2026

AI中文摘要

代码生成模型的幻觉阻碍了其在需要更高安全标准的系统中的应用。解决代码幻觉的一个关键瓶颈是难以识别生成代码的功能正确性,因为其形式不自然。我们通过利用代码的可执行性质,使用动态代码分析工具自动生成单元测试来解决这一核心瓶颈。据此,我们提出了一种选择性代码生成器,它基于生成的单元测试评估的功能正确性,放弃不确定的生成,从而在理论上控制非弃权答案的正确性,即错误发现率。最后,我们建议在评估以及学习中使用生成的单元测试进行精确代码评估,称此范式为FuzzEval。我们展示了我们方法的有效性,以及代码幻觉的可控性和合理的选择效率。

英文摘要

The hallucination of code generation models hinders their applicability to systems requiring higher safety standards. One critical bottleneck in addressing code hallucination is the difficulty of identifying the functional correctness of generated code, due to its unnatural form. We address this core bottleneck by automatically generating unit tests using dynamic code analysis tools, leveraging the executable nature of code. Accordingly, we propose a selective code generator that abstains from uncertain generations, based on the functional correctness evaluated by generated unit tests, to theoretically control the correctness among non-abstained answers, i.e., the false discovery rate. Finally, we propose to use generated unit tests in evaluation as well as in learning for precise code evaluation, calling this paradigm FuzzEval. We demonstrate the efficacy of our method along with the controllability of code hallucination and reasonable selection efficiency.
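The selective generator abstains when generated unit tests give weak evidence of correctness. The sketch below is a simplified empirical version that rests on three assumptions:

- Each candidate's `score` is the fraction of generated unit tests it passes.
- A labelled calibration set is available.
- The threshold is the smallest one whose empirical false discovery rate is at most `alpha`.

The paper controls the FDR with a theoretical guarantee, which this plain empirical rule does not provide.

```python
def empirical_fdr(scores, correct, threshold):
    """Fraction of accepted generations (score >= threshold) that are incorrect."""
    accepted = [ok for s, ok in zip(scores, correct) if s >= threshold]
    if not accepted:
        return 0.0
    return sum(1 for ok in accepted if not ok) / len(accepted)


def calibrate_threshold(scores, correct, alpha):
    """Smallest observed score whose empirical FDR on calibration data is <= alpha."""
    for t in sorted(set(scores)):
        if empirical_fdr(scores, correct, t) <= alpha:
            return t
    return float("inf")  # no threshold qualifies: abstain on everything


def select(candidate, score, threshold):
    """Return the candidate code, or None to abstain."""
    return candidate if score >= threshold else None
```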

2510.04127 2026-06-16 cs.IR cs.AI cs.CV cs.LG 版本更新

Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era

投影与量化:学习哈希的统一视角,从随机投影到RAG时代

Sean Moran

发表机构 * Independent Researcher(独立研究者) London, United Kingdom(英国伦敦)

AI总结 提出投影-量化-组织(PQO)框架,统一理解从局部敏感哈希到深度哈希、乘积量化、图索引及向量数据库二进制嵌入的方法,并通过可复现实验揭示量化轴上的内存-质量权衡。

Comments 80 pages, 19 figures, 22 tables. Survey. Accompanying open benchmark (BitBudget): https://github.com/sjmoran/bitbudget ; live leaderboard: https://sjmoran.github.io/bitbudget/

AI中文摘要

近似最近邻(ANN)搜索支撑着大规模检索,尤其是在增强大型语言模型的检索增强生成管道中,但解决该问题的方法已在不同社区中激增,以至于很少被视为一个统一领域。我们认为它们构成一个具有三个设计选择的领域,并开发了投影-量化-组织(PQO)视角,在该视角下,局部敏感哈希、学习二进制哈希、深度端到端哈希、乘积量化、基于图的索引以及现代向量数据库的二进制嵌入都是三个耦合问题的设置:投影放置在哪里,量化阈值放置在哪里,以及如何组织生成的编码。投影然后量化的解读是已有的;我们的贡献是第三个同等重要的组织阶段,证明这三个阶段从该领域的起源到深度、乘积量化、图和检索增强时代一脉相承,以及一个可复现的测量,将视角从分类方法转向预测方法。该测量得出三个发现。首先,内存节省在量化轴上:一位编码的大小是浮点数的三十二分之一,而在短候选列表上单次全精度重排序即可完全恢复未压缩的质量。其次,视角预期的权衡顺序在嵌入增长时保持不变。第三,在有监督的情况下,八字节编码的质量比其替换的两千字节浮点数提高一倍以上。我们将这些测量结果发布为BitBudget,一个带有实时排行榜的可扩展基准,将生成式检索的“语义标识符”重新解释为量化编码,并指出随着紧凑编码重回大规模检索中心,随之而来的开放问题。

英文摘要

Approximate nearest-neighbour search underpins large-scale retrieval and retrieval-augmented generation, yet its methods are studied in communities that seldom read one another. We argue that they form one field with three design choices. We develop the projection-quantisation-organisation lens: every method places its projections, places its quantisation thresholds, and organises the resulting codes for search. We test the lens with a reproducible measurement, released as the open BitBudget benchmark, and report three findings. First, the quantisation axis delivers the largest memory savings: a one-bit code with full-precision re-ranking matches uncompressed quality for six of seven embedders, with the scanned code one thirty-second the size of the float. Second, the orderings the lens anticipates, including a learned-embedding regime where binary codes overtake an inverted-file product quantiser at a matched byte budget, recur as the embedding is enlarged. Third, given class labels, an eight-byte supervised code more than doubles the retrieval quality of the two-kilobyte task-agnostic float it replaces. We also recast the semantic identifiers of generative retrieval as quantisation codes. The main contribution is a single, tested account of compact-code search, from random projections to the retrieval-augmented era.
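The three stages of the lens can be illustrated with random-projection hashing, one of the method families the survey covers:

- Projection: Gaussian hyperplanes.
- Quantisation: the sign of each projection.
- Organisation: a flat Hamming scan followed by full-precision re-ranking of a short list.

The function names and parameters are hypothetical. With as many bits as embedding dimensions, each code takes 1 bit per dimension against 32 bits for a float32 value. That is the one-thirty-second ratio quoted above.

```python
import math
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Projection stage: n_bits Gaussian hyperplanes, as in random-projection LSH."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vec, planes):
    """Quantisation stage: keep only the sign of each projection, packed into an int."""
    code = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query, vectors, codes, planes, shortlist=10, k=1):
    """Organisation stage: flat Hamming scan, then full-precision re-ranking of a short list."""
    q_code = encode(query, planes)
    candidates = sorted(range(len(codes)), key=lambda i: hamming(q_code, codes[i]))[:shortlist]
    return sorted(candidates, key=lambda i: -cosine(query, vectors[i]))[:k]
```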

2512.01095 2026-06-16 cs.CV cs.AI cs.LG 版本更新

CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions

CycliST:用于循环状态转换推理的视频语言模型基准

Simon Kohaut, Daniel Ochs, Shun Zhang, Benedict Flade, Julian Eggert, Kristian Kersting, Devendra Singh Dhami

发表机构 * Artificial Intelligence and Machine Learning Lab, TU Darmstadt(人工智能与机器学习实验室,达姆施塔特工业大学) Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA)(康拉德·楚泽学习与智能系统卓越学院(ELIZA)) Honda Research Institute Europe GmbH, Offenbach, Germany(本田欧洲研究院,奥芬巴赫,德国) Uncertainty in Artificial Intelligence Group, TU Eindhoven(人工智能不确定性小组,埃因霍温理工大学) Hessian Center for AI (hessian.AI)(黑森人工智能中心(hessian.AI)) Center for Cognitive Science(认知科学中心) German Center for Artificial Intelligence (DFKI)(德国人工智能中心(DFKI))

AI总结 提出CycliST基准,通过合成视频评估视频语言模型对循环状态转换的文本推理能力,揭示现有模型在检测循环模式、时间理解和定量分析方面的局限。

Comments Published in the Journal of Data-centric Machine Learning Research (DMLR); https://openreview.net/forum?id=l03g53HUL2

Journal ref
Journal of Data-centric Machine Learning Research, 2026
AI中文摘要

我们提出了CycliST,这是一个新颖的基准数据集,旨在评估视频语言模型(VLM)在循环状态转换上的文本推理能力。CycliST通过生成合成的、结构丰富的视频序列来捕捉现实世界过程的基本方面,这些视频序列具有物体运动和视觉属性的周期性模式。CycliST采用分层评估系统,通过改变循环物体的数量、场景杂乱程度和光照条件逐步增加难度,挑战最先进模型的时空认知能力。我们使用当前最先进的VLM(包括开源和专有模型)进行了大量实验,揭示了它们在泛化到循环动力学(如线性和轨道运动)以及视觉属性(如颜色和尺度)随时间变化方面的局限性。我们的结果表明,当前的VLM难以可靠地检测和利用循环模式,缺乏时间理解的概念,并且无法从场景中提取定量信息(如运动物体的数量),突显了需要解决的重要技术差距。更具体地说,我们发现没有单一模型在性能上始终领先:大小和架构与结果的相关性不强,且没有模型在所有任务上同样成功。通过提供有针对性的挑战和全面的评估框架,CycliST为超越当前最先进水平的视觉推理模型在理解周期性模式方面铺平了道路。

英文摘要

We present CycliST, a novel benchmark dataset designed to evaluate Video Language Models (VLMs) on their ability to reason textually over cyclical state transitions. CycliST captures fundamental aspects of real-world processes by generating synthetic, richly structured video sequences featuring periodic patterns in object motion and visual attributes. CycliST employs a tiered evaluation system that progressively increases difficulty through variations in the number of cyclic objects, scene clutter, and lighting conditions, challenging state-of-the-art models on their spatio-temporal cognition. We conduct extensive experiments with current state-of-the-art VLMs, both open-source and proprietary, and reveal their limitations in generalizing to cyclical dynamics such as linear and orbital motion, as well as time-dependent changes in visual attributes like color and scale. Our results demonstrate that present-day VLMs struggle to reliably detect and exploit cyclic patterns, lack a notion of temporal understanding, and are unable to extract quantitative insights from scenes, such as the number of objects in motion, highlighting a significant technical gap that needs to be addressed. More specifically, we find no single model consistently leads in performance: neither size nor architecture correlates strongly with outcomes, and no model succeeds equally well across all tasks. By providing a targeted challenge and a comprehensive evaluation framework, CycliST paves the way for visual reasoning models that surpass the state-of-the-art in understanding periodic patterns.
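CycliST renders synthetic videos whose object states repeat with a fixed period. The hypothetical sketch below does not reproduce the benchmark generator. It only shows two things: how per-frame states for orbital motion and a cyclic color change can be produced from a period, and how a quantitative ground-truth answer, such as the number of completed cycles, follows directly from the generating parameters. It assumes each frame represents one time unit.

```python
import math


def orbit_positions(n_frames, period, radius=1.0, center=(0.0, 0.0)):
    """Per-frame (x, y) of an object on a circular orbit that repeats every `period` frames."""
    cx, cy = center
    return [
        (cx + radius * math.cos(2 * math.pi * t / period),
         cy + radius * math.sin(2 * math.pi * t / period))
        for t in range(n_frames)
    ]


def color_cycle(n_frames, colors, frames_per_color):
    """Per-frame color label cycling through `colors`, holding each for frames_per_color frames."""
    return [colors[(t // frames_per_color) % len(colors)] for t in range(n_frames)]


def completed_cycles(n_frames, period):
    """Ground-truth answer to 'how many full cycles does the clip show?'"""
    return n_frames // period
```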

2512.11682 2026-06-16 cs.AI cs.LG 版本更新

MedAI: Evaluating TxAgent's Therapeutic Agentic Reasoning in the NeurIPS CURE-Bench Competition

MedAI: 评估 TxAgent 在 NeurIPS CURE-Bench 竞赛中的治疗性智能推理

Tim Cofala, Christian Kalfar, Jingge Xiao, Johanna Schrader, Michelle Tang, Wolfgang Nejdl

发表机构 * L3S Research Center(L3S研究中心)

AI总结 本文介绍 TxAgent,一种通过迭代检索增强生成和统一生物医学工具集进行治疗决策的智能AI方法,并在CURE-Bench竞赛中评估其推理质量,通过改进工具检索策略提升性能,荣获开放科学卓越奖。

Comments 7 pages, 3 figures

AI中文摘要

临床医学中的治疗决策构成了一个高风险领域,其中AI指导与患者特征、疾病过程和药物制剂之间的复杂相互作用相互交织。药物推荐、治疗计划和不良反应预测等任务需要基于可靠生物医学知识的稳健、多步骤推理。以TxAgent为代表的智能AI方法通过迭代检索增强生成(RAG)应对这些挑战。TxAgent采用微调的Llama-3.1-8B模型,动态生成并执行对统一生物医学工具集(ToolUniverse)的函数调用,整合FDA药物API、OpenTargets和Monarch资源,确保获取最新的治疗信息。与通用RAG系统相比,医疗应用施加了严格的安全约束,使得推理轨迹和工具调用序列的准确性至关重要。这些考虑促使评估协议将令牌级推理和工具使用行为视为明确的监督信号。本文展示了我们参与CURE-Bench NeurIPS 2025挑战赛的见解,该挑战赛使用评估正确性、工具利用和推理质量的指标来基准测试治疗推理系统。我们分析了函数(工具)调用的检索质量如何影响整体模型性能,并展示了通过改进工具检索策略实现的性能提升。我们的工作获得了开放科学卓越奖。完整信息请访问 https://curebench.ai/。

英文摘要

Therapeutic decision-making in clinical medicine constitutes a high-stakes domain in which AI guidance must account for complex interactions among patient characteristics, disease processes, and pharmacological agents. Tasks such as drug recommendation, treatment planning, and adverse-effect prediction demand robust, multi-step reasoning grounded in reliable biomedical knowledge. Agentic AI methods, exemplified by TxAgent, address these challenges through iterative retrieval-augmented generation (RAG). TxAgent employs a fine-tuned Llama-3.1-8B model that dynamically generates and executes function calls to a unified biomedical tool suite (ToolUniverse), integrating FDA Drug API, OpenTargets, and Monarch resources to ensure access to current therapeutic information. In contrast to general-purpose RAG systems, medical applications impose stringent safety constraints, rendering the accuracy of both the reasoning trace and the sequence of tool invocations critical. These considerations motivate evaluation protocols treating token-level reasoning and tool-usage behaviors as explicit supervision signals. This work presents insights derived from our participation in the CURE-Bench NeurIPS 2025 Challenge, which benchmarks therapeutic-reasoning systems using metrics that assess correctness, tool utilization, and reasoning quality. We analyze how retrieval quality for function (tool) calls influences overall model performance and demonstrate performance gains achieved through improved tool-retrieval strategies. Our work was awarded the Excellence Award in Open Science. Complete information can be found at https://curebench.ai/.
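The abstract attributes gains to better tool retrieval but does not specify the retriever. As a minimal stand-in, the sketch below ranks tool descriptions by bag-of-words cosine similarity to the query. The tool names and descriptions in `TOOLS` are hypothetical and are not actual ToolUniverse entries.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalogue: name -> natural-language description
TOOLS = {
    "get_drug_warnings": "Return boxed warnings and adverse reactions from a drug label",
    "get_drug_targets": "List protein targets associated with a drug",
    "get_disease_phenotypes": "List phenotypes associated with a disease",
}


def bag_of_words(text):
    """Lower-cased token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing tokens
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_tools(query, tools, k=2):
    """Return the names of the k tools whose descriptions best match the query."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(desc)), name) for name, desc in tools.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]
```

In an iterative RAG loop, the retrieved names would be offered to the model as callable functions at each step. A learned dense retriever would typically replace the bag-of-words scorer.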

2602.16902 2026-06-16 cs.AI cs.LG 版本更新

LLM-WikiRace Benchmark: How Far Can LLMs Plan over Real-World Knowledge Graphs?

LLM-WikiRace 基准测试:大语言模型在真实知识图谱上的规划能力有多强?

Juliusz Ziomek, William Bankes, Lorenz Wolf, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic

发表机构 * University of Oxford, UK(牛津大学,英国) University College London (Centre for AI), UK(伦敦大学学院(人工智能中心),英国) University of Basel, Switzerland(巴塞尔大学,瑞士)

AI总结 提出 LLM-Wikirace 基准,通过维基百科超链接导航任务评估大语言模型的规划、推理与世界知识,发现模型在简单任务上超人类,但困难任务成功率仅 23%,且规划与长程推理是主要瓶颈。

AI中文摘要

我们引入了 LLM-Wikirace,一个用于评估大语言模型(LLM)规划、推理和世界知识的基准。在 LLM-Wikirace 中,模型必须逐步高效地导航维基百科超链接,从给定源页面到达目标页面,这需要前瞻性规划和推理概念如何在现实世界中连接的能力。我们评估了广泛的开源和闭源模型,包括 Gemini-3、GPT-5 和 Claude Opus 4.5,它们在任务的简单级别上取得了最强结果,并展现了超人类性能。尽管如此,在困难难度下性能急剧下降:表现最好的模型 Gemini-3 仅在 23% 的困难游戏中成功,凸显了前沿模型面临的重大挑战。我们的分析表明,世界知识是成功的必要因素,但仅在一定程度内;超过这个阈值,规划和长程推理能力成为主导因素。轨迹级分析进一步揭示,即使是最强的模型在失败后也难以重新规划,经常陷入循环而非恢复。LLM-Wikirace 是一个简单的基准,揭示了当前推理系统的明显局限性,提供了一个开放的竞技场,其中具备规划能力的 LLM 仍有待证明。我们的代码和排行榜可在 https://llmwikirace.github.io 获取。

英文摘要

We introduce LLM-Wikirace, a benchmark for evaluating planning, reasoning, and world knowledge in large language models (LLMs). In LLM-Wikirace, models must efficiently navigate Wikipedia hyperlinks step by step to reach a target page from a given source, requiring look-ahead planning and the ability to reason about how concepts are connected in the real world. We evaluate a broad set of open- and closed-source models, including Gemini-3, GPT-5, and Claude Opus 4.5, which achieve the strongest results on the easy level of the task and demonstrate superhuman performance. Despite this, performance drops sharply on hard difficulty: the best-performing model, Gemini-3, succeeds in only 23% of hard games, highlighting substantial remaining challenges for frontier models. Our analysis shows that world knowledge is a necessary ingredient for success, but only up to a point; beyond this threshold, planning and long-horizon reasoning capabilities become the dominant factors. Trajectory-level analysis further reveals that even the strongest models struggle to replan after failure, frequently entering loops rather than recovering. LLM-Wikirace is a simple benchmark that reveals clear limitations in current reasoning systems, offering an open arena where planning-capable LLMs still have much to prove. Our code and leaderboard are available at https://llmwikirace.github.io.
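Each game is a path-finding problem on the directed hyperlink graph. The optimal number of clicks can therefore be computed with breadth-first search, and the looping failure mode shows up as a repeated page in a trajectory. The tiny link graph in the usage below is hypothetical and only illustrates the two checks.

```python
from collections import deque


def shortest_path_length(links, source, target):
    """Minimum clicks from source to target in a directed link graph (BFS); None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def has_loop(trajectory):
    """True if an agent revisits a page during a game."""
    return len(set(trajectory)) < len(trajectory)
```

For example, with links = {"Cat": ["Mammal", "Pet"], "Mammal": ["Animal"], "Pet": ["Dog"], "Dog": ["Animal"]}, the optimal path from "Cat" to "Animal" takes 2 clicks. An agent's path length can be compared against that optimum.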

2603.02668 2026-06-16 cs.AI cs.LG 版本更新

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

SorryDB: AI证明者能完成现实世界的Lean定理吗?

Austin Letson, Leopoldo Sarra, Auguste Poiroux, Oliver Dressler, Paul Lezeau, Dhyan Aranha, Frederick Pu, Aaron Hill, Miguel Corredera Hidalgo, Julian Berman, George Tsoukalas, Lenny Taelman

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出动态更新的基准SorryDB,包含78个GitHub上的现实形式化项目,评估AI证明者在复杂依赖下的能力,发现当前方法互补,基于Gemini Flash的智能体方法表现最佳。

AI中文摘要

我们提出了SorryDB,一个动态更新的基准,包含从GitHub上78个现实世界形式化项目中提取的开放Lean任务。与现有的静态基准(通常由竞赛问题组成)不同,攀登SorryDB基准将产生与社区需求对齐、对数学家更易用、更能理解复杂依赖的工具。此外,通过提供持续更新的任务流,SorryDB减轻了测试集污染,并为智能体对新颖形式数学项目的贡献能力提供了稳健的度量。我们评估了一系列方法,包括通用大型语言模型、智能体方法和专用符号证明器,在SorryDB中选取的1000个任务快照上。我们表明当前方法是互补的:尽管基于Gemini Flash的智能体方法性能最佳,但它并不严格优于其他现成的大型语言模型、专用证明器,甚至精心策划的Lean策略列表。

英文摘要

We present SorryDB, a dynamically updating benchmark of open Lean tasks drawn from 78 real-world formalization projects on GitHub. Unlike existing static benchmarks, which are often composed of competition problems, hill-climbing the SorryDB benchmark will yield tools that are aligned with the community's needs, more usable by mathematicians, and more capable of understanding complex dependencies. Moreover, by providing a continuously updated stream of tasks, SorryDB mitigates test-set contamination and offers a robust metric for an agent's ability to contribute to novel formal mathematics projects. We evaluate a collection of approaches, including generalist large language models, agentic approaches, and specialized symbolic provers, over a selected snapshot of 1000 tasks from SorryDB. We show that current approaches are complementary: even though an agentic approach based on Gemini Flash is the most performant, it is not strictly better than other off-the-shelf large language models, specialized provers, or even a curated list of Lean tactics.
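In Lean, `sorry` marks an unfinished proof, so each open task corresponds to a `sorry` in a project's source. The sketch below is a hypothetical, simplified extractor, not SorryDB's pipeline. It works in two steps:

- It blanks out `--` line comments and non-nested `/- ... -/` block comments, keeping newlines so line numbers survive.
- It reports the line of every remaining `sorry` token.

String literals and nested comments are not handled.

```python
import re

BLOCK_COMMENT = re.compile(r"/-.*?-/", re.DOTALL)  # non-nested block comments only
LINE_COMMENT = re.compile(r"--[^\n]*")
SORRY = re.compile(r"\bsorry\b")  # whole-word match, so identifiers like sorryAx are skipped


def _blank(match):
    """Replace a comment with spaces, preserving newlines so line numbers stay aligned."""
    return "".join(ch if ch == "\n" else " " for ch in match.group(0))


def find_sorries(source):
    """1-based line numbers of every `sorry` outside comments."""
    code = BLOCK_COMMENT.sub(_blank, source)
    code = LINE_COMMENT.sub(_blank, code)
    return [code.count("\n", 0, m.start()) + 1 for m in SORRY.finditer(code)]
```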

2605.09697 2026-06-16 cs.CV cs.LG 版本更新

Discriminative Span as a Predictor of Synthetic Data Utility via Classifier Reconstruction

判别跨度作为通过分类器重构预测合成数据效用的指标

Radhika Amar Desai, Modigari Narendra

发表机构 * School of Computer Science(计算机科学学院) Vellore Institute of Technology(韦洛尔理工学院)

AI总结 本文提出一种几何驱动的指标,通过预训练模型的嵌入空间评估合成数据效用,无需模型训练,通过测量线性分类器权重向量在变化子空间中的投影误差,判断合成数据对下游分类性能的影响。

AI中文摘要

在许多现实世界计算机视觉应用中,如医学影像和工业检测,二分类任务常面临正样本严重缺乏的问题。广泛采用的解决方案是对负样本应用图像到图像转换来生成合成正样本。然而,一个根本性挑战是:如何可靠地评估此类合成数据是否能提升下游模型性能?本文提出一种几何驱动的指标,该指标可预测合成数据的效用,而无需模型训练。我们的方法在预训练基础模型的嵌入空间中操作,并通过样本之间的差异向量表示数据集。我们通过测量线性分类器权重向量在这些变化所张成子空间上的相对投影误差,评估其是否可被表示在该子空间内。直观上,如果合成数据诱导的变化捕捉了任务相关方向,其张成的子空间可近似分类器,导致投影误差低。反之,质量差的合成数据无法张成这些方向,导致误差高。在多个数据集和架构上,我们证明该指标与混合真实负样本和合成正样本训练的CNN下游分类性能有强相关性。这些发现表明,所提指标是评估数据稀缺设置中合成数据质量的实用且信息丰富的工具。

英文摘要

In many real-world computer vision applications, including medical imaging and industrial inspection, binary classification tasks are characterized by a severe scarcity of positive samples. A widely adopted solution is to generate synthetic positive data using image-to-image transformations applied to negative samples. However, a fundamental challenge remains: how can we reliably assess whether such synthetic data will improve downstream model performance? In this work, we propose a geometry-driven metric that predicts the utility of synthetic data without requiring model training. Our approach operates in the embedding space of a pre-trained foundation model and represents the dataset through difference vectors between samples. We evaluate whether the weight vector of a linear classifier can be expressed within the subspace spanned by these variations by measuring the relative projection error. Intuitively, if the variations induced by synthetic data capture task-relevant directions, their span can approximate the classifier, resulting in low projection error. Conversely, poor synthetic data fails to span these directions, leading to higher error. Across multiple datasets and architectures, we show that this metric exhibits strong correlation with downstream classification performance of CNNs trained on mixtures of real negative and synthetic positive data. These findings suggest that the proposed metric serves as a practical and informative tool for evaluating synthetic data quality in data-scarce settings.
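The metric asks how much of a linear classifier's weight vector lies outside the span of the difference vectors. The pure-Python sketch below makes two assumptions: embeddings are plain lists of floats, and each synthetic positive is paired with the negative it was generated from (the pairing rule is an assumption). It orthonormalises the difference vectors with Gram-Schmidt and returns ||w - Pw|| / ||w||.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def difference_vectors(positives, negatives):
    """Embedding differences between each synthetic positive and the negative it was derived from."""
    return [[p - n for p, n in zip(pos, neg)] for pos, neg in zip(positives, negatives)]


def relative_projection_error(w, vectors, tol=1e-10):
    """||w - Pw|| / ||w||, where P projects onto span(vectors); 0 means w lies in the span."""
    basis = []
    for v in vectors:
        r = list(v)
        for q in basis:  # Gram-Schmidt: remove components along existing basis vectors
            c = _dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = math.sqrt(_dot(r, r))
        if norm > tol:  # skip (near-)linearly dependent directions
            basis.append([ri / norm for ri in r])
    projection = [0.0] * len(w)
    for q in basis:
        c = _dot(w, q)
        projection = [pi + c * qi for pi, qi in zip(projection, q)]
    residual = [wi - pi for wi, pi in zip(w, projection)]
    return math.sqrt(_dot(residual, residual)) / math.sqrt(_dot(w, w))
```

Under the paper's hypothesis, lower values predict more useful synthetic data. Classical Gram-Schmidt keeps the sketch short. A QR or least-squares solver would be more numerically robust at embedding scale.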

2605.18421 2026-06-16 cs.CL cs.AI cs.LG 版本更新

EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

EvoMemBench: 从自演化视角评估智能体记忆

Yuyao Wang, Zhongjian Zhang, Mo Chi, Kaichi Yu, Yuhan Li, Miao Peng, Bing Tong, Chen Zhang, Yan Zhou, Jia Li

发表机构 * Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Createlink Technology(创-link科技) Beijing University of Posts and Telecommunications(北京邮电大学) Beijing Institute of Technology(北京理工大学)

AI总结 本文提出EvoMemBench,从自演化视角评估智能体记忆,通过记忆范围和记忆内容两个维度构建统一基准,比较15种记忆方法并发现当前记忆系统尚未达到通用解决方案,长上下文基线仍具竞争力,记忆在上下文不足或任务困难时效果显著,检索方法在知识密集型任务中表现优异,而程序性记忆和长期记忆方法在任务结构匹配时更有效。

AI中文摘要

近期针对大语言模型(LLM)智能体的基准测试主要评估推理、规划和执行能力。然而,记忆对于智能体同样至关重要,因为它使智能体能够随时间存储、更新和检索信息。这种能力仍被低估,主要是因为现有基准测试未能提供系统评估记忆机制的方法。本文从自演化视角研究智能体记忆,引入EvoMemBench,一个沿记忆范围(回合内 vs. 跨回合)和记忆内容(知识导向 vs. 执行导向)两个轴线组织的统一基准。我们在标准化协议下比较了15种代表性记忆方法与强大的长上下文基线。结果表明,当前记忆系统仍远未达到通用解决方案:长上下文基线仍具有高度竞争力,记忆在当前上下文不足或任务困难时效果最显著,且没有单一的记忆形式能一致适用于所有设置。基于检索的方法在知识密集型任务中仍表现强劲,而程序性记忆和长期记忆方法在存储的经验与任务结构匹配时,对执行导向任务更有效。我们希望EvoMemBench能促进未来更有效的LLM智能体记忆系统研究。我们的代码可在https://github.com/DSAIL-Memory/EvoMemBench获取。

英文摘要

Recent benchmarks for Large Language Model (LLM) agents mainly evaluate reasoning, planning, and execution. However, memory is also essential for agents, as it enables them to store, update, and retrieve information over time. This ability remains under-evaluated, largely because existing benchmarks do not provide a systematic way to assess memory mechanisms. In this paper, we study agent memory from a self-evolving perspective and introduce EvoMemBench, a unified benchmark organized along two axes: memory scope (in-episode vs. cross-episode) and memory content (knowledge-oriented vs. execution-oriented). We compare 15 representative memory methods with strong long-context baselines under a standardized protocol. Results show that current memory systems are still far from a general solution: long-context baselines remain highly competitive, memory helps most when the current context is insufficient or tasks are difficult, and no single memory form works consistently across all settings. Retrieval-based methods remain strong for knowledge-intensive settings, whereas procedural and long-term memory methods are more effective for execution-oriented tasks when their stored experience matches the task structure. We hope EvoMemBench facilitates future research on more effective memory systems for LLM-based agents. Our code is available at https://github.com/DSAIL-Memory/EvoMemBench.
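The memory-scope axis can be made concrete with a toy store. It keeps an in-episode buffer, cleared at each new episode, separately from a cross-episode store that persists. This class is hypothetical and is not one of the 15 benchmarked methods. Retrieval here is plain word overlap, and the knowledge-versus-execution content axis is not modelled.

```python
class EpisodicMemory:
    """Toy agent memory split along the scope axis (hypothetical design)."""

    def __init__(self):
        self.in_episode = []     # cleared at the start of every episode
        self.cross_episode = []  # persists across episodes

    def start_episode(self):
        self.in_episode = []

    def write(self, note, persist=False):
        """Record a note for this episode; optionally keep it for future episodes."""
        self.in_episode.append(note)
        if persist:
            self.cross_episode.append(note)

    def retrieve(self, query, k=1):
        """Up to k cross-episode notes sharing at least one word with the query, best overlap first."""
        q = set(query.lower().split())
        ranked = sorted(self.cross_episode, key=lambda n: -len(q & set(n.lower().split())))
        return [n for n in ranked[:k] if q & set(n.lower().split())]
```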

2605.28734 2026-06-16 cs.CR cs.CL cs.LG 版本更新

Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests

代码即武器:用于衡量编码模型对恶意代码请求遵从性的共识标记提示库

Richard J. Young, Gregory D. Moody

发表机构 * University of Nevada Las Vegas(内华达大学拉斯维加斯分校) Department of Information Systems(信息系统系)

AI总结 本文通过构建一个经五名评审共识标记的提示库(包含4,748个可执行恶意代码请求和1,923个有害安全知识请求),为编码模型对恶意代码请求的拒绝行为提供了可靠且可跨语料库比较的测量基准。

Comments 23 pages, 9 figures, 6 tables. Consensus-labeled prompt bank consolidating eight malicious-code corpora (ASTRA, CySecBench, AdvBench/harmful_behaviors, JailbreakBench, MalwareBench, RedCode, RMCBench, Scam2Prompt) spanning diverse elicitation paradigms; 6,675 prompts, 33,375 classification calls

AI中文摘要

一个回答有害问题的通用语言模型返回文本;而一个遵从恶意请求的编码模型可以返回一个可运行的武器——键盘记录器、勒索软件存根、按原样运行的漏洞利用。这种单一遵从行为严重性的不对称意味着,编码专用模型应比通用聊天模型设置更高的拒绝标准,而非更低,然而目前该领域无法判断它们是否做到了这一点。针对恶意代码的拒绝基准是零散的:它们混合了可执行软件(即用型武器)的请求与有害安全知识(仍需人类操作的信息)的请求,并在不可比较的语料库上报告拒绝率,因此没有单一统计量衡量真正重要的属性。本文引入了一个扩展的共识标记提示库,区分了这两种请求类型,并为跨语料库的编码模型遵从性测量提供了结构稳定的基础。八个语料库(ASTRA、CySecBench、AdvBench/harmful_behaviors、JailbreakBench、MalwareBench、RedCode、RMCBench、Scam2Prompt)在五名评审共识协议下被整合和分类(6,675个提示 × 5名评审 = 33,375次调用)。评审小组达到Fleiss' kappa = 0.767 [95% CI 0.755, 0.777](“显著”);95.0%的提示获得至少四名评审一致,76.9%完全一致,并且小组在3,133个共享提示上以Cohen's kappa = 0.952复现了先前四个语料库的发布。发布的库包含4,748个共识-CODE提示(可执行恶意代码请求)和1,923个共识-KNOWLEDGE提示(有害安全知识请求)。该库是该领域一直缺乏的经过验证的工具:一个经过可靠性量化的基础,用于测试编码模型是否满足其可执行输出所要求的更严格拒绝标准。

英文摘要

A general-purpose language model that answers a harmful question returns text; a coding model that complies with a malicious request can return a working weapon: a keylogger, ransomware, an exploit that runs as written. This asymmetry in the severity of a single act of compliance implies coding-specialized models should clear a higher refusal bar than general-purpose chat models, not a lower one, yet the field cannot tell whether they do. Refusal benchmarks for malicious code are fragmented: they mix requests for executable software with requests for harmful security knowledge and report refusal rates over non-comparable corpora. This paper's central result is that the CODE-versus-KNOWLEDGE classification axis established in a prior four-corpus release remains stable under a substantially expanded corpus pool and an independently refreshed judge panel, evidence that it measures a real construct rather than an artifact of the prompts or judges. Eight corpora spanning diverse elicitation paradigms (direct, jailbreak-decorated, indirect, and agent/interpreter: ASTRA, CySecBench, AdvBench/harmful_behaviors, JailbreakBench, MalwareBench, RedCode, RMCBench, Scam2Prompt) are classified under a five-judge consensus protocol (6,675 prompts × 5 judges = 33,375 calls), reaching Fleiss' kappa = 0.767 [95% CI 0.755, 0.777] ("substantial"). Critically, the panel shares no judge with the prior release (five paid commercial APIs replaced by five open-weight models from five vendors), yet the two panels agree on 94.45% of the 3,133 shared prompts and reach Cohen's kappa = 0.952 [0.942, 0.963] on the 3,031-prompt binary overlap: the axis survives near-total panel replacement. The released bank comprises 4,748 consensus-CODE and 1,923 consensus-KNOWLEDGE prompts, a reliability-quantified benchmark whose central classification axis is shown stable across corpus expansion and judge-panel replacement.
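The agreement statistic reported above is Fleiss' kappa. The sketch below computes it from an items-by-categories matrix of vote counts, assuming every item is rated by the same number of judges. It adds a simple majority-based consensus rule; the `min_agree` threshold is an illustrative parameter, not the paper's protocol.

```python
from collections import Counter


def fleiss_kappa(counts):
    """Fleiss' kappa for an items x categories matrix of vote counts (equal raters per item)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Observed agreement: per-item pairwise agreement, averaged over items
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts
    ) / n_items
    # Chance agreement from the marginal category proportions
    totals = [sum(row[j] for row in counts) for j in range(n_cats)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


def consensus(votes, min_agree=3):
    """Most common label among judge votes, or None if fewer than min_agree judges chose it."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None
```

With five judges and two labels, `min_agree=3` is a plain majority. Raising it to 4 would keep only items on which at least four of five judges agree.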

2605.29208 2026-06-16 cs.MS cs.LG 版本更新

libhmm: A Modern C++20 Library for Hidden Markov Models with Correct MLE Emission M-Steps

libhmm:一个用于隐马尔可夫模型的现代C++20库,具有正确的MLE发射M步

Gary Wolfman

发表机构 * Independent Researcher(独立研究者)

AI总结 本文介绍libhmm,一个C++20库,用于隐马尔可夫模型参数估计、序列解码和模型选择,解决了现有软件中缺乏零依赖C++ HMM库以及Baum-Welch算法发射分布M步中广泛使用矩估计近似的问题,实现了十六种连续和离散发射分布的正确最大似然估计。

Comments 17 pages, 3 figures, 8 tables

AI中文摘要

我们描述了libhmm,一个用于隐马尔可夫模型参数估计、序列解码和模型选择的C++20库。libhmm解决了现有软件中的两个空白:缺乏一个维护良好、零依赖的C++ HMM库,适合嵌入到生产系统中;以及在Baum-Welch算法的发射分布M步中广泛使用矩估计近似。该库实现了十六种连续和离散发射分布的正确最大似然估计,包括用于位置-尺度Student-t分布的ECME算法、用于Gamma、Beta、Weibull和负二项分布的Newton-Raphson最大化,以及用于圆形数据的von Mises分布。所有前向-后向和Viterbi计算都在全对数空间中运行。通过编译时分派和标量回退,为AVX-512、AVX2、SSE2和ARM NEON提供了SIMD加速。通过配套包pylibhmm提供Python绑定。我们将libhmm与现有的C和C++ HMM库以及已发布的R参考包在五个真实数据基准上进行比较,并讨论了设计中做出的架构权衡。

英文摘要

We describe libhmm, a C++20 library for Hidden Markov Model parameter estimation, sequence decoding, and model selection. libhmm addresses two gaps in existing software: the absence of a well-maintained, zero-dependency C++ HMM library suitable for embedding in production systems, and the widespread use of method-of-moments (MOM) approximations in the emission distribution M-step of the Baum-Welch algorithm. The library implements correct maximum likelihood estimators for sixteen scalar emission distributions, including an ECME algorithm for the location-scale Student-t distribution, Newton-Raphson maximization for Gamma, Beta, Weibull, and Negative Binomial distributions, and the von Mises distribution for circular data. All forward-backward and Viterbi calculations operate in full log-space. SIMD acceleration is provided for AVX-512, AVX2, SSE2, and ARM NEON via compile-time dispatch with scalar fallback. Version 4 adds multivariate observation support via the BasicHmm<Obs> template, with three multivariate emission families (diagonal Gaussian, full-covariance Gaussian, and independent components) each with correct weighted MLE M-steps. Python bindings are available via the companion package pylibhmm. We compare libhmm against established C and C++ HMM libraries and against published R reference packages on seven real-data benchmarks, and discuss the architectural tradeoffs made in the design.
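libhmm itself is written in C++20. The Python sketch below only illustrates what running the forward recursion "in full log-space" means for a discrete HMM: a numerically stable log-sum-exp keeps long sequences from underflowing. It is not libhmm's API, and the argument names are hypothetical.

```python
import math


def logsumexp(values):
    """log(sum(exp(v))) computed stably by factoring out the maximum."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(log_pi, log_A, log_B, obs):
    """Log-likelihood log p(obs) of a discrete HMM, computed entirely in log space.

    log_pi[i]   : log initial probability of state i
    log_A[i][j] : log transition probability i -> j
    log_B[i][o] : log probability of emitting symbol o from state i
    """
    n = len(log_pi)
    alpha = [log_pi[i] + log_B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)]) + log_B[j][o]
            for j in range(n)
        ]
    return logsumexp(alpha)
```

Working with log-probabilities throughout means a product of many small factors becomes a sum. Without this, the product would underflow to zero for sequences of a few thousand observations.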

2606.07086 2026-06-16 cs.CV cs.LG 版本更新

An Adaptive Data cleaning Framework for Noisy Label Detection

自适应数据清洗框架用于噪声标签检测

Chen-Hsuan Fang, Wei-Hsinag Chen, Pin-Hsuan Yu, Jung-Hua Wang, Tsung-Wei Pan

发表机构 * Department of Electrical Eng(电子工程系) AI Research Center(人工智能研究中心)

AI总结 提出一种无需手动阈值的自适应数据清洗框架,融合局部、全局和学习动态等多重度量,通过特征空间的多度量聚类实现噪声标签检测,在CIFAR-10、MNIST和ImageNet-100上显著提升召回率和模型精度。

AI中文摘要

深度神经网络(DNN)在给定大型标注数据集的计算机视觉任务中表现出色。然而,在实际应用中,标签常常因歧义、人为错误或动态环境而受到污染。过参数化的DNN在训练过程中容易记忆这些噪声标签,从而降低模型的准确性和泛化能力。现有的数据清洗和样本选择策略通常依赖于手动指定的阈值、噪声比率的先验知识或单一度量(学习动态或几何结构),这使得它们在复杂数据场景下不稳定。本文提出了一种自适应数据清洗框架,该框架整合了局部、全局和学习动态线索,用于鲁棒的噪声标签检测。通过模块化特征拼接范式,样本被映射到统一的低维特征空间。我们提供了两种实例化:一种二维度量,结合了基于类自适应KNN的局部不一致性和基于k-means的全局质心距离;另一种三维多度量,额外引入了z归一化分数。与传统的将一维高斯混合模型应用于单一标量度量的方法不同,我们的框架在特征空间上执行多度量聚类,以自适应地将样本划分为干净主导和噪声主导成分,无需手动阈值或噪声先验。在CIFAR-10、MNIST和ImageNet-100上,针对5%至40%的对称标签噪声进行的实验表明,该框架在所有设置下均实现了高召回率,包括在ImageNet-100上40%噪声时接近完美的召回率(≥98%)。后续训练在所有评估设置下均获得了精度提升,尤其是在ImageNet-100的严重污染情况下。这些发现表明,多度量整合为噪声标签检测提供了一种无阈值、实用且低调整的策略。

英文摘要

Deep neural networks (DNNs) excel in computer vision tasks given large annotated datasets. In real-world applications, however, labels are often corrupted by ambiguity, human error, or dynamic environments. Over-parameterized DNNs easily memorize these noisy labels during training, degrading model accuracy and generalization. Existing data-cleaning and sample-selection strategies often rely on manually specified thresholds, prior knowledge of the noise ratio, or a single metric (either learning dynamics or geometric structure), making them unstable in complex data regimes. This paper proposes a self-adaptive data-cleaning framework that integrates local, global, and learning dynamics cues for robust noisy-label detection. Samples are mapped into a unified low-dimensional feature space through a modular feature concatenation paradigm. We provide two instantiations: a 2D metric integrating class-adaptive KNN-based local disagreement with k-means-based global centroid distance, and a 3D multi-metric that additionally incorporates a z-normalized score. Unlike conventional 1D Gaussian Mixture Models applied to a single scalar metric, our framework performs multi-metric clustering on the feature space to adaptively partition samples into clean-dominant and noise-dominant components without requiring manual thresholds or noise priors. Experiments on CIFAR-10, MNIST, and ImageNet-100 with 5% to 40% symmetric label noise show high recall across settings, including near-perfect recall (>=98%) on ImageNet-100 at 40% noise. Subsequent training yields accuracy gains across evaluated settings, especially under severe corruption on ImageNet-100. These findings suggest that multi-metric integration provides a threshold-free, practical, and low-tuning strategy for noisy label detection.
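The 2D instantiation combines two cues and then clusters the resulting feature vectors. The local cue is KNN label disagreement, and the global cue is distance to the labelled class's centroid. The sketch below simplifies several parts, each an assumption:

- It uses a fixed k rather than a class-adaptive one.
- It scales centroid distances by the class's mean distance.
- It uses plain 2-means as a stand-in clustering step, flagging the cluster with the larger feature sum as noise-dominant.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_disagreement(points, labels, k):
    """Local cue: fraction of a sample's k nearest neighbours whose label differs from its own."""
    scores = []
    for i, p in enumerate(points):
        order = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours = [j for _, j in order[:k]]
        scores.append(sum(labels[j] != labels[i] for j in neighbours) / k)
    return scores


def centroid_distance(points, labels):
    """Global cue: distance to the own-class centroid, divided by that class's mean distance."""
    by_class = {}
    for p, y in zip(points, labels):
        by_class.setdefault(y, []).append(p)
    centroids = {y: [sum(c) / len(ps) for c in zip(*ps)] for y, ps in by_class.items()}
    raw = [dist(p, centroids[y]) for p, y in zip(points, labels)]
    scale = {}
    for y in by_class:
        ds = [r for r, l in zip(raw, labels) if l == y]
        scale[y] = (sum(ds) / len(ds)) or 1.0  # avoid dividing by zero for degenerate classes
    return [r / scale[y] for r, y in zip(raw, labels)]


def two_means(features, iters=20):
    """Stand-in clustering: 2-means; returns True for members of the higher-scoring cluster."""
    centres = [list(min(features, key=sum)), list(max(features, key=sum))]
    assign = [0] * len(features)
    for _ in range(iters):
        assign = [0 if dist(f, centres[0]) <= dist(f, centres[1]) else 1 for f in features]
        for c in (0, 1):
            members = [f for f, a in zip(features, assign) if a == c]
            if members:
                centres[c] = [sum(v) / len(members) for v in zip(*members)]
    noisy = 1 if sum(centres[1]) > sum(centres[0]) else 0
    return [a == noisy for a in assign]


def detect_noisy(points, labels, k=3):
    """Flag samples in the noise-dominant cluster of the 2D (local, global) feature space."""
    features = list(zip(knn_disagreement(points, labels, k), centroid_distance(points, labels)))
    return two_means(features)
```

No noise-ratio prior or manual score threshold appears in this sketch. The split comes from the clustering step, which mirrors the threshold-free goal described above.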

2606.11520 2026-06-16 cs.CL cs.AI cs.LG 版本更新

ISE: An Execution-Grounded Recipe for Multi-Turn OS-Agent Trajectories

ISE:一种基于执行的多轮操作系统代理轨迹合成方法

Siyuan Luo, Nairong Zheng, Lin Zhou, Tiankuo Yao, Shengyou Yuan, Haojia Yu, Cong Pang, Jiapeng Luo, Lewei Lu

发表机构 * University of Electronic Science and Technology of China(电子科技大学) SenseTime Research(商汤研究院)

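AI summary: Proposes ISE, a three-stage paradigm that generates multi-turn agent trajectories through structured intent construction, role-locked user simulation, and a real execution environment; fine-tuning on the resulting data significantly improves agent tool-use performance.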

Comments 13 pages, 6 figures. Dataset and code: https://github.com/Valiere01/ISE-Trace

Chinese abstract (AI-translated)

训练有能力的操作系统代理需要同时捕获结构化用户意图、多轮任务委派和基于工具执行的数据——这些属性在现有数据集中缺失。我们提出ISE(意图->模拟->执行),一种三阶段合成范式,联合解决这些差距。阶段1通过4D框架(人物角色x领域x任务x复杂度)构建约50000个结构化意图;去重后池中包含43956个唯一意图,并在mpnet-base-v2嵌入(余弦核,q=1)上获得61.57的Vendi分数。阶段2通过角色锁定的用户模拟器驱动多轮用户-代理交互,将每轮用户交互基于实际执行结果,生成23132条完整轨迹,平均8.12轮用户交互和68.24轮总对话。阶段3在实时、隔离的操作系统工作空间中执行每个工具调用,生成真实的故障恢复动态而非模拟响应。在ISETrace上微调后,使用Qwen3-8B在标准协议下的代理工具使用任务中,ClawEval pass@1从19.3提升至37.7。该结果优于零样本GPT-4o和四倍大的Qwen3-32B基础模型。对阶段2的消融实验证明多轮模拟带来了大部分性能提升。我们在该https URL发布所有源代码和数据集。

English abstract

Training capable OS agents requires data that simultaneously captures structured user intents, multi-turn task delegation, and grounded tool execution, properties absent from existing datasets. We propose ISE (Intent -> Simulate -> Execute), a three-stage synthesis paradigm that addresses these gaps jointly. Stage 1 constructs roughly 50000 structured intents via a 4D framework (Persona x Domain x Task x Complexity); after deduplication the pool contains 43956 unique intents and attains a Vendi Score of 61.57 over the entire pool on mpnet-base-v2 embeddings (cosine kernel, q=1). Stage 2 drives multi-turn user-agent interaction through a role-locked user simulator that grounds each user turn in actual execution outcomes, producing 23132 complete trajectories averaging 8.12 user turns and 68.24 total dialogue turns. Stage 3 runs every tool call inside a live, isolated OS workspace, generating authentic failure-recovery dynamics instead of simulated responses. Fine-tuning Qwen3-8B on ISETrace improves ClawEval pass@1 on agent tool-use tasks from 19.3 to 37.7 under a standard protocol. This result outperforms zero-shot GPT-4o and the Qwen3-32B base model, which is four times larger. An ablation on Stage 2 shows that multi-turn simulation accounts for a large portion of the performance gain. We release all source code and the dataset at https://github.com/Valiere01/ISE-Trace.
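The Vendi Score quoted for Stage 1 is the exponential of the Shannon entropy (order q=1) of the eigenvalues of K/n, where K is the item-similarity kernel (cosine here) and n the number of items. A minimal NumPy sketch follows. It assumes the sentence embeddings are already computed (for example by an mpnet-base-v2 encoder, not shown). It also uses the fact that the nonzero eigenvalues of XXᵀ/n equal those of XᵀX/n, so the cost scales with the embedding dimension rather than with the 43956-item pool.

```python
import numpy as np


def vendi_score(embeddings):
    """Vendi Score of order q=1 with a cosine-similarity kernel.

    VS = exp(-sum_i lam_i * log(lam_i)), where lam_i are the eigenvalues of K / n.
    With unit-normalized rows X, K = X X^T, and the nonzero eigenvalues of X X^T / n
    equal those of X^T X / n, so we work with the small d x d matrix instead.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # cosine kernel => K_ii = 1
    lam = np.linalg.eigvalsh(X.T @ X / len(X))
    lam = lam[lam > 1e-12]                            # 0 * log(0) is taken as 0
    return float(np.exp(-np.sum(lam * np.log(lam))))


# Identical items -> 1 effective item; mutually orthogonal items -> n effective items.
print(vendi_score(np.ones((5, 3))))  # ~1.0
print(vendi_score(np.eye(4)))        # ~4.0
```

The score reads as an "effective number of distinct items", so 61.57 would correspond to roughly 62 fully dissimilar intents under this kernel.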

2606.13608 2026-06-16 cs.AI cs.LG Updated version

AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility

AgentBeats:面向开放性、标准化和可复现性的智能体评估代理化

Xiaoyuan Liu, Jianhong Tu, Yuqi Chen, Siyuan Xie, Sihan Ren, Tianneng Shi, Gal Gantar, Evan Sandoval, Donghyun Lee, Daniel Miao, Peter J. Gilbert, Nick Hynes, Mauro Staver, Warren He, David Marn, Andrew Low, Xi Zhang, Elron Bandel, Michal Shmueli-Scheuer, Siva Reddy, Alexandre Drouin, Alexandre Lacoste, Ramayya Krishnan, Elham Tabassi, Yu Su, Victor Barres, Chenguang Wang, Wenbo Guo, Dawn Song

Affiliations * University of California, Berkeley(加州大学伯克利分校) Purdue University(普渡大学) University of Ljubljana(卢布尔雅那大学) University of Washington(华盛顿大学) Oasis Labs University of Maryland(马里兰大学) IBM Research(IBM研究院) Mila McGill University(麦吉尔大学) ServiceNow Research(ServiceNow研究院) Carnegie Mellon University(卡内基梅隆大学) National Institute of Standards and Technology(美国国家标准与技术研究院) The Ohio State University(俄亥俄州立大学) University of Cambridge(剑桥大学) University of California, Santa Barbara(加州大学圣塔芭芭拉分校)

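AI summary: Proposes the Agentified Agent Assessment (AAA) framework, which unifies the evaluation interface through standardized protocols (A2A and MCP) to enable open, reproducible multi-agent evaluation, and validates its coverage, practicality, and fidelity with the AgentBeats system through a large-scale competition and a case study.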

Chinese abstract (AI-translated)

智能体系统在各领域快速进步,但其评估仍然碎片化。大多数基准测试依赖于固定的、以LLM为中心的测试框架,需要大量集成,造成测试与生产环境不匹配,并限制了不同智能体设计之间的公平比较。根本问题在于缺乏开放的、与智能体无关的评估接口。我们倡导代理化智能体评估(AAA),其中评估由裁判智能体执行,所有参与者通过标准化协议交互:A2A用于任务管理,MCP用于工具访问。传统基准测试定义了两个独立的接口(一个用于基准测试,一个用于智能体),而AAA只需要一个;这产生了一个通用的统一框架,将评估逻辑与智能体实现分离,并支持可复现、可互操作和多智能体评估。我们进一步引入AgentBeats作为AAA的具体实现:我们确定了五种实际操作模式,使标准化评估与开放性、隐私性和可复现性的现实约束兼容。为了大规模评估我们的设计,我们进行了两项研究:一项为期五个月的开放竞赛,吸引了来自独立参与者的12个类别的298个裁判智能体和467个主题智能体,表明AAA适用于异构基准测试范围;以及一项关于编码智能体的案例研究,证实代理化评估在保留与公开记录一致性的同时,揭示了先前缺失的直接比较结果,产生了关于智能体设计的研究见解。结合社区规模实地研究和受控编码案例研究,我们验证了AAA在异构场景下大规模提供覆盖性、实用性和保真度。AAA和AgentBeats共同为开放、标准化和可复现的智能体评估提供了清晰路径。

English abstract

Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison across diverse agent designs. The root problem is the lack of an open, agent-agnostic assessment interface. We advocate Agentified Agent Assessment (AAA), where evaluation is performed by judge agents and all participants interact through standardized protocols: A2A for task management and MCP for tool access. Conventional benchmarking defines two separate interfaces, one for the benchmark and one for the agent, while AAA only needs one; this yields a generic, unified framework that separates assessment logic from agent implementation and enables reproducible, interoperable, and multi-agent evaluation. We further introduce AgentBeats as a concrete realization of AAA: we identify five practical operation modes that make standardized assessment compatible with real-world constraints on openness, privacy, and reproducibility. To evaluate our design at scale, we conduct two studies: a five-month open competition that drew 298 judge agents across 12 categories together with 467 subject agents from independent participants, showing that AAA applies across a heterogeneous range of benchmarks; and a case study on coding agents that confirms agentified evaluation preserves fidelity with the public record while surfacing previously missing head-to-head results, yielding research insights about agent design. Combining a community-scale field study and a controlled coding case study, we verify that AAA delivers coverage, practicality, and fidelity across heterogeneous scenarios at scale. Together, AAA and AgentBeats offer a clear path toward open, standardized, and reproducible agent assessment.

12. Machine Learning Applications (130 papers)

2606.14898 2026-06-16 cs.LG New submission

α-Fair Insurance Pricing: A Fairness Continuum

α-公平保险定价:一个公平性连续谱

Tianhe Zhang, Xiguang Liu, Peng Shi

Affiliations * Department of Risk and Insurance, Wisconsin School of Business, University of Wisconsin–Madison(威斯康星大学麦迪逊分校威斯康星商学院风险与保险系) Department of Information Systems and Operations Management, Warrington College of Business, University of Florida(佛罗里达大学沃灵顿商学院信息系统与运营管理系)

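AI summary: Proposes the α-FISP framework, which balances actuarial and solidarity fairness via constrained optimization; the parameter α traces a continuous pricing spectrum from purely actuarial to purely solidarity-based pricing, with theoretical guarantees and computational tractability.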

Chinese abstract (AI-translated)

保险定价中的公平性仍然是一个长期存在且争论不休的难题。一方面,保险公司出于盈利考虑,设定区分个体风险的保费以实现精算公平。另一方面,保险通过跨人群的风险汇集发挥关键社会功能,激励群体间的交叉补贴以促进团结公平。这两种竞争性公平观念之间的张力使得保险定价本质上复杂,尤其是在现代环境中,精细数据允许越来越细的风险区分,而监管机构面临保护弱势群体的压力日益增大。为解决这一挑战,我们提出了一个α-公平个体偿付能力保费(α-FISP)框架,该框架在保证偿付能力(保险运营的基本要求)的同时,明确捕捉精算公平与团结公平之间的权衡。我们将定价问题表述为一个约束优化任务,其中精算公平保费在每一风险类别内的交叉补贴预算约束下进行调整。这一表述自然产生一族由α参数化的解,追踪从纯精算定价到纯团结定价的连续谱,使决策者能够在此公平性谱上选择操作点。我们为所提出的框架推导了理论保证。数值实验表明,α-FISP计算上可行,并且与具有异质性州级公平性要求的美国监管体制高度一致。

English abstract

Fairness in insurance pricing remains a long-standing and deeply debated puzzle. On one hand, insurers, driven by profitability considerations, set premiums that differentiate across individual risks to achieve actuarial fairness. On the other hand, insurance serves a critical societal function by pooling risks across a population, motivating cross-subsidization among groups to promote solidarity fairness. The tension between these two competing notions of fairness makes insurance pricing inherently complex, particularly in modern settings where granular data allow for increasingly fine risk differentiation and regulators face growing pressure to protect vulnerable groups. To address this challenge, we propose an α-Fair Individual Solvent Premium (α-FISP) framework for insurance pricing that explicitly captures the trade-off between actuarial and solidarity fairness while guaranteeing solvency, a fundamental requirement in insurance operations. We formulate the pricing problem as a constrained optimization task, where actuarially fair premiums are adjusted subject to budget constraints on cross-subsidization within each risk class. This formulation naturally yields a family of solutions parameterized by α, tracing a continuum between purely actuarial and purely solidarity-based pricing and enabling decision-makers to select an operating point along this fairness spectrum. We derive theoretical guarantees for the proposed framework. Numerical experiments show that α-FISP is computationally tractable and aligns well with the U.S. regulatory regimes featuring heterogeneous state-level fairness requirements.
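The abstract does not spell out the optimization, so the following is only a toy illustration of what an α-continuum within one risk class can look like; it is not the α-FISP formulation. It assumes a hypothetical linear interpolation between each insured's expected loss (α = 0, actuarial) and the class average (α = 1, solidarity). Because the interpolation preserves the class total, aggregate premium still covers aggregate expected loss, a crude stand-in for the solvency requirement.

```python
def alpha_premiums(expected_losses, alpha, loading=0.0):
    """Toy alpha-continuum inside one risk class (hypothetical; not the alpha-FISP optimizer).

    alpha = 0: actuarial, each insured pays their own loaded expected loss.
    alpha = 1: solidarity, everyone pays the loaded class average.
    The class total is the same for every alpha, so aggregate premium always
    equals aggregate loaded expected loss.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mean_loss = sum(expected_losses) / len(expected_losses)
    return [(1.0 + loading) * ((1.0 - alpha) * e + alpha * mean_loss) for e in expected_losses]


def cross_subsidies(expected_losses, premiums, loading=0.0):
    """Premium minus own loaded expected loss: positive = subsidizer, negative = subsidized."""
    return [p - (1.0 + loading) * e for e, p in zip(expected_losses, premiums)]


losses = [100.0, 200.0, 600.0]
for a in (0.0, 0.5, 1.0):
    prem = alpha_premiums(losses, a)
    print(a, prem, cross_subsidies(losses, prem))
```

In this toy the total cross-subsidy paid by low-risk insureds grows linearly with α, which is one simple way to read "a budget on cross-subsidization"; the paper's constrained formulation may trade these quantities off differently.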

2606.12486 2026-06-16 cs.LG New submission

An Empirical Study on Predictive Maintenance for Component X in Heavy-Duty Scania Trucks

重型斯堪尼亚卡车中组件X的预测性维护实证研究

Valeriu Dimidov, Sasan Jafarnejad, Raphaël Frank

Affiliations * SnT, University of Luxembourg(卢森堡大学SnT) Scania CV AB(斯堪尼亚商用车公司)

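AI summary: For truck fleets, proposes a condition-based predictive maintenance method that models the wear state as a monotonically non-decreasing time series, selects the most recent observations and converts them into tabular data, and uses AutoML to simplify modeling, reducing costs on the Scania Component X dataset.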

Chinese abstract (AI-translated)

近年来,基于状态的预测性维护(PdM)在卡车车队中得到了广泛应用。这种维护策略旨在通过监测车辆的健康状况并根据其状态采取主动措施,最大限度地减少计划外停机并降低成本。然而,由于卡车产生的大量数据、通过传感器数据检测故障的内在复杂性以及在解决方案实施中寻找成本效益权衡的困难,基于状态的PdM系统的实施具有挑战性。在本文中,我们定义并验证了一种基于状态的PdM方法,该方法基于一个假设:被监测组件的磨损状态可以表示为单调非递减的时间序列。它涉及仅从时间序列中选择最近的观测值,并将其转换为表格格式,以便使用为表格数据设计的机器学习(ML)模型进行分类。我们的结果表明,与当前最先进(SOTA)方法相比,所提出的方法在Scania组件X数据集上降低了成本,同时通过AutoML简化了建模过程。

English abstract

Condition-based Predictive Maintenance (PdM) for truck fleets has gained momentum in recent years. This maintenance strategy aims to minimize unplanned downtimes and reduce costs by monitoring the health status of vehicles and taking proactive action based on their condition. However, the implementation of condition-based PdM systems is challenging due to the large volume of data generated by the trucks, the inherent complexity of detecting failures through sensor data and the difficulties in finding cost-effective trade-offs in the solution's implementation. In this paper, we define and validate a condition-based PdM methodology built on the assumption that the wear-and-tear state of the monitored component can be represented as a monotonically non-decreasing time series. It involves selecting only the most recent observations from the time series and transforming them into a tabular format for classification using machine learning (ML) models designed for tabular data. Our results indicate that the proposed methodology reduces costs on the Scania Component X dataset compared to current state-of-the-art (SOTA) approaches, while also simplifying the modeling process through AutoML.
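The transformation step, which keeps only the most recent readouts and flattens them into one tabular row per vehicle, can be sketched as follows. The readout layout, the feature names, and the lag-suffix naming are assumptions for illustration; the Scania Component X dataset's actual schema is not reproduced here. Because the monitored counters are assumed to be monotonically non-decreasing, the newest readings carry the cumulative wear information, which is why older history can be dropped.

```python
def to_tabular(readouts, n_last=2):
    """Flatten one vehicle's most recent readouts into a single feature row.

    readouts: list of (time_step, {feature_name: value}) in any order.
    Returns {f"{name}_lag{k}": value}, where lag0 is the newest readout; missing
    lags (vehicles with fewer than n_last readouts) are filled with None.
    """
    latest = sorted(readouts, key=lambda r: r[0])[-n_last:][::-1]  # newest first
    names = sorted(latest[0][1]) if latest else []
    row = {}
    for lag in range(n_last):
        feats = latest[lag][1] if lag < len(latest) else {}
        for name in names:
            row[f"{name}_lag{lag}"] = feats.get(name)
    return row


# Hypothetical counters for one truck; "odometer" and "counter_a" are made-up names.
vehicle = [
    (10, {"odometer": 100, "counter_a": 3}),
    (30, {"odometer": 250, "counter_a": 5}),
    (20, {"odometer": 180, "counter_a": 4}),
]
print(to_tabular(vehicle))
```

Stacking one such row per vehicle yields a fixed-width table that any tabular classifier or AutoML tool can consume.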

2606.14960 2026-06-16 cs.LG cs.CY New submission

Leveraging Physiological Signals to Predict Exam Outcomes with Machine Learning

利用生理信号通过机器学习预测考试结果

Lala Yamazaki, Ramchandra Rimal

Affiliations * Middle Tennessee State University(中田纳西州立大学)

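AI summary: Uses machine learning models to predict exam grades from physiological data recorded during exams (electrodermal activity, heart rate, skin temperature), comparing logistic regression, random forests, SVMs, LSTMs, GRUs, and Transformers; random forests stand out for efficiency and interpretability, and Transformers perform on par with LSTM/GRU.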

Comments 9 figures, 5 tables

Chinese abstract (AI-translated)

本研究探讨了利用机器学习模型预测考试结果的可行性,数据来自考试期间收集的生理信号。分析了包括皮肤电活动、心率和皮肤温度在内的生理压力指标,以揭示其与学业表现的关系。采用了多种机器学习方法,从逻辑回归、随机森林和支持向量机等标准模型,到更先进的架构,包括Transformer、长短期记忆(LSTM)和门控循环单元(GRU)模型。这种多样性旨在有效捕捉数据中的复杂交互。一个关键焦点是评估Transformer在处理数值数据方面的适应性,并评估其在此新情境下的性能。使用准确率、精确率、召回率和F1分数等标准性能指标来比较模型效果。实验结果表明,虽然深度学习模型通常擅长捕捉生理数据中的复杂关系,但像随机森林这样的简单模型有时能实现更优性能,同时提供计算效率和可解释性。此外,Transformer表现出显著的多功能性,展现出与LSTM和GRU模型相当的性能。本研究强调了尝试与问题目标相符的广泛模型类别的重要性,平衡了精度、效率和可解释性。通过阐明生理信号与学业表现之间的关系,本研究有助于理解影响学生心理健康的压力因素,并进一步促进利用生理数据提升学生福祉和学业成果。

English abstract

This study investigates the application of machine learning models to predict exam outcomes using physiological data collected during examination sessions. Physiological stress indicators, including electrodermal activity, heart rate, and skin temperature, were analyzed to uncover their association with academic performance. A variety of machine learning approaches were employed, ranging from standard models like logistic regression, random forest, and support vector machines to more advanced architectures, including transformers, long short-term memory (LSTM), and gated recurrent unit (GRU) models. This diversity aimed to capture the complex interactions within the data effectively. A key focus was assessing the adaptability of transformers in processing numerical data and evaluating their performance in this novel context. Standard performance metrics, such as accuracy, precision, recall, and F1-score, were used to compare model efficacy. The experimental results demonstrate that while deep learning models generally excel at capturing complex relationships in physiological data, simpler models like random forests can sometimes achieve superior performance while offering computational efficiency and interpretability. Furthermore, transformers demonstrated notable versatility, showing performance comparable to that of the LSTM and GRU models. This research underscores the importance of experimenting with a broad class of models that align with the objectives of the problem at hand, balancing precision, efficiency, and interpretability. By elucidating the relationships between physiological signals and academic performance, this study contributes to understanding stressors affecting students' mental health. It further promotes leveraging physiological data to enhance student well-being and academic outcomes.
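The four comparison metrics follow directly from the confusion-matrix counts. A plain-Python sketch for a binary outcome; the pass/fail encoding with `positive=1` is an assumption, since the abstract does not state how outcomes were labelled.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = passed the exam, 0 = did not.
print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Reporting all four matters when classes are imbalanced: a model that always predicts the majority outcome can score high accuracy while its recall on the minority class is zero.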

2606.14999 2026-06-16 cs.LG New submission

Unlocking Latent Dimensions: Exploring Representations of Large-Scale X-ray Scattering Data using Variational Autoencoders

解锁潜在维度:使用变分自编码器探索大规模X射线散射数据的表示

Monika Choudhary, Xiaoya Chong, Runbo Jiang, Wiebke Koepp, Petrus H. Zwart, Damon English, Gregory M. Su, Eric Schaible, Chenhui Zhu, Mostafa Nassr, Noah P. Wamble, Kelvin Kam-Yun Li, Jonathan M. Chan, Jose Carlos Diaz, Cameron McKay, Lynn Katz, Benny Freeman, Guillaume Freychet, Yevgen Matviychuk, Eliot Gann, Daniel B. Allan, Benedikt Sochor, Frank Schluenzen, Stephan V. Roth, Ethan Crumlin, Dylan McReynolds, Tanny Chavez, Alexander Hexemer

Affiliations * Advanced Light Source, Lawrence Berkeley National Laboratory(劳伦斯伯克利国家实验室先进光源) Center for Advanced Mathematics for Energy Research Applications, Lawrence Berkeley National Laboratory(劳伦斯伯克利国家实验室能源研究应用高级数学中心) Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory(劳伦斯伯克利国家实验室分子生物物理学与综合生物成像部) Berkeley Synchrotron Infrared Structural Biology program, Lawrence Berkeley National Laboratory(劳伦斯伯克利国家实验室伯克利同步辐射红外结构生物学项目) Materials Sciences Division, Lawrence Berkeley National Laboratory(劳伦斯伯克利国家实验室材料科学部) McKetta Department of Chemical Engineering, University of Texas(德克萨斯大学麦凯塔化学工程系)

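AI summary: Addressing two challenges, offline exploration and real-time analysis of X-ray scattering data, trains a domain-specific attention-based convolutional variational autoencoder (C-VAE) to learn low-dimensional representations that capture structural variation, and integrates it into the Latent Space Explorer of the MLExchange platform for interactive structural exploration.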

Chinese abstract (AI-translated)

科学用户设施产生的X射线散射数据速度超过传统工作流的处理能力。我们针对两种场景解决这一挑战:离线数据集探索和实时在线分析。我们在150万张X射线散射图像上训练了一个领域特定的基于注意力的卷积变分自编码器(C-VAE),以学习捕捉跨不同实验条件的结构变化的低维表示。学习到的潜在空间揭示了反映实验进展的组织良好的聚类和平滑轨迹。它还支持跨不同结构状态的受控合成散射图像生成。当未经重新训练部署时,该模型将两个同步加速器设施的时间分辨薄膜形成实验组织成可解释的潜在结构。与通用视觉基础模型DINOv3(ViT-7B)的基准测试表明,领域特定训练为散射数据产生了更可解释的潜在组织。两个工作流都集成在MLExchange平台的Latent Space Explorer中,支持跨存档数据集和实时实验的交互式结构探索。

English abstract

Scientific user facilities generate X-ray scattering data faster than traditional workflows can process them. We address this challenge across two settings, offline dataset exploration and live on-the-fly analysis. We train a domain-specific attention-based Convolutional Variational Autoencoder (C-VAE) on 1.5 million X-ray scattering images to learn low-dimensional representations capturing structural variation across diverse experimental conditions. The learned latent space reveals well-organized clusters and smooth trajectories reflecting experimental progression. It further supports controlled synthetic scattering image generation across diverse structural states. When deployed without retraining, the model organizes time-resolved film formation experiments at two synchrotron facilities into interpretable latent structures. Benchmarking against DINOv3 (ViT-7B), a general-purpose vision foundation model, demonstrates that domain-specific training yields more interpretable latent organization for scattering data. Both workflows are integrated within Latent Space Explorer, a component of the MLExchange platform, supporting interactive structural exploration across archived datasets and live experiments.
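The abstract does not give the C-VAE's loss. A standard VAE, however, is trained by minimizing a reconstruction term plus the KL divergence between the encoder's diagonal Gaussian and a standard normal prior, sampling the latent code with the reparameterization trick. A NumPy sketch of those pieces follows. It assumes a squared-error reconstruction term and an optional β weight, and it omits the attention-based convolutional encoder and decoder.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, I); in an autodiff framework this form
    lets gradients flow to mu and logvar."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(np.shape(mu))


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar, axis=-1)


def negative_elbo(x, x_recon, mu, logvar, beta=1.0):
    """Batch mean of squared-error reconstruction plus beta-weighted KL (the training loss)."""
    recon = np.sum((x - x_recon) ** 2, axis=tuple(range(1, x.ndim)))
    return float(np.mean(recon + beta * gaussian_kl(mu, logvar)))
```

The KL term is what pulls encodings toward a smooth, roughly Gaussian latent space, which is the property that makes clusters and trajectories in the learned space meaningful and lets decoded samples from it serve as synthetic images.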

2606.15053 2026-06-16 cs.LG cs.NA math.NA New submission

Physics-conforming Latent Twins

物理一致潜在孪生

Matthias Chung, Yutong Bu, Deepanshu Verma

Affiliations * Emory University(埃默里大学) Clemson University(克莱姆森大学)

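AI summary: Proposes the Physics-conforming Latent Twins framework, which jointly learns an encoder, a decoder, and a latent flow map so that the latent dynamics satisfy conservation laws, invariants, and dissipative structure, improving physical-constraint satisfaction and long-time behavior while keeping surrogate predictions accurate.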

Comments 32 pages, 11 figures

Chinese abstract (AI-translated)

代理模型是科学机器学习的核心,能够对复杂物理系统进行快速预测、模拟、推断和控制。然而,对于时间相关问题,仅准确插值训练轨迹是不够的:可靠的代理还应尊重赋予这些轨迹物理意义的守恒律、不变量、可接受条件和耗散结构。我们提出了物理一致潜在孪生,这是一个学习潜在代理解算子的框架,其动力学通过设计满足选定的物理原理。该方法基于潜在孪生公式,通过联合学习编码器、解码器和任意时间索引状态之间的潜在流映射,同时约束潜在动力学以保持或耗散指定的结构量。我们发展了一种约束转移观点,将原始状态空间中的物理结构与潜在空间中的兼容约束联系起来,并证明了结构保持界,表明潜在强制执行如何改善解码后物理缺陷的控制。我们还推导了保持线性和二次不变量或强制执行耗散不等式的潜在流映射的代数条件。在代表性ODE和PDE基准上的数值实验表明,该方法在保持准确代理预测的同时,改善了约束满足、结构保真度和定性长期行为。

English abstract

Surrogate models are central to scientific machine learning, where they enable fast prediction, simulation, inference, and control for complex physical systems. For time-dependent problems, however, accurate interpolation of training trajectories is not sufficient: reliable surrogates should also respect the conservation laws, invariants, admissibility conditions, and dissipative structures that give those trajectories physical meaning. We introduce Physics-conforming Latent Twins, a framework for learning latent surrogate solution operators whose dynamics satisfy selected physical principles by design. The method builds on the Latent Twin formulation by jointly learning an encoder, a decoder, and a latent flow map between arbitrary time-indexed states, while constraining the latent dynamics to preserve or dissipate prescribed structural quantities. We develop a constraint-transfer viewpoint that connects physical structure in the original state space with compatible constraints in latent space, and prove structure-preservation bounds showing how latent enforcement improves control of physical defects after decoding. We also derive algebraic conditions for latent flow maps that preserve linear and quadratic invariants or enforce dissipative inequalities. Numerical experiments on representative ODE and PDE benchmarks demonstrate improved constraint satisfaction, structural fidelity, and qualitative long-time behavior while maintaining accurate surrogate prediction.
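For intuition about the algebraic conditions mentioned above, consider the special case of a linear latent flow map z ↦ Az; this is an assumption, since the paper's maps need not be linear. Then a linear invariant cᵀz is preserved exactly when Aᵀc = c. A quadratic invariant zᵀQz (Q symmetric) is preserved exactly when AᵀQA = Q. The quantity zᵀQz is non-increasing (dissipative) exactly when Q - AᵀQA is positive semidefinite. A NumPy check of these three conditions:

```python
import numpy as np


def preserves_linear(A, c, tol=1e-10):
    """c^T (A z) == c^T z for every z  <=>  A^T c == c."""
    return np.allclose(A.T @ c, c, atol=tol)


def preserves_quadratic(A, Q, tol=1e-10):
    """(A z)^T Q (A z) == z^T Q z for every z  <=>  A^T Q A == Q  (Q symmetric)."""
    return np.allclose(A.T @ Q @ A, Q, atol=tol)


def is_dissipative(A, Q, tol=1e-10):
    """z^T Q z never increases under z -> A z  <=>  Q - A^T Q A is positive semidefinite."""
    M = Q - A.T @ Q @ A
    return bool(np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol)


theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
print(preserves_quadratic(R, np.eye(2)))           # True: rotations conserve |z|^2
print(is_dissipative(0.9 * np.eye(2), np.eye(2)))  # True: a contraction dissipates |z|^2
```

Conditions of this form can be imposed by construction (for example, parameterizing A as an orthogonal or contractive matrix) rather than only penalized in the loss.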

2606.15058 2026-06-16 cs.LG stat.AP New submission

Machine Learning and the Random Walk Puzzle: Forecasting the CAD/USD Exchange Rate with Expanding Window Evaluation and SHAP Interpretability

机器学习与随机游走难题:基于扩展窗口评估和SHAP可解释性的CAD/USD汇率预测

Louis Agyekum, Edmund Fosu Agyemang, Obu-Amoah Ampomah, Kofi Acheampong, Emmanuel Boadi, Priscilla Yaa Amakye, Fafa Shalom Tchorly, Enock Adu Bonsu, Eric Nyarko

Affiliations * Department of Economics, University of Ottawa(渥太华大学经济学系) Department of Biostatistics and Data Science, Celia Scott Weatherhead School of Public Health and Tropical Medicine at Tulane University(杜兰大学生物统计学与数据科学系) Department of Statistics, Western Michigan University(西密歇根大学统计学系) Department of Economics, Western Michigan University(西密歇根大学经济学系) School of Mathematical and Statistical Sciences, University of Texas Rio Grande Valley(德克萨斯里奥格兰德谷大学数学与统计学系) Robinson College of Business, Georgia State University(佐治亚州立大学罗宾逊商学院) Department of Mathematics & Statistics, University of North Florida(北佛罗里达大学数学与统计学系) Department of Epidemiology and Biostatistics, University of Arizona(亚利桑那大学流行病学与生物统计学系) Department of Statistics and Actuarial Science, University of Ghana(加纳大学统计学与精算科学系)

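AI summary: Examines whether machine learning models can beat a naive random walk benchmark in forecasting the monthly USD/CAD exchange rate, using expanding-window evaluation and SHAP explanations; linear regression significantly outperforms the random walk, while the ensemble models perform close to it.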

Comments 10 pages, 14 figures, 8 tables

Chinese abstract (AI-translated)

本研究考察机器学习(ML)模型能否在预测月度美元/加元汇率时超越朴素随机游走基准。使用加拿大银行2017年1月至2026年5月的日度数据,重采样为113个月度观测值,评估了五种ML模型:线性回归、随机森林、梯度提升、XGBoost和AdaBoost。这些模型以朴素随机游走模型和带有Holt-Winters季节性的指数平滑(ETS)为基准。所有模型均采用扩展窗口框架评估以保持严格的样本外完整性,并使用Diebold-Mariano(DM)检验评估预测精度差异。结构断点检测识别出序列中的四个显著断点,分别对应2018年中美贸易战升级、2020年COVID-19经济复苏、2022年加拿大银行加息周期峰值以及2024年加拿大银行降息周期开始。应用SHAP(Shapley Additive Explanations)分析解释表现最佳ML模型的驱动因素。结果表明,朴素随机游走模型仍然是一个强大的基准。线性回归是唯一在统计上优于朴素随机游走模型的模型,DM统计量为3.0585,p值为0.0071,而ML集成模型仅显示出微小差异。采用扩展窗口框架的随机森林在所有模型(除随机游走外)中实现了最低的MAPE,为1.17%。SHAP分析证实,短期滞后(尤其是滞后1和滞后2)以及近期滚动均值主导预测,这与汇率的近随机游走行为一致。

English abstract

This study examines whether machine learning (ML) models can outperform the naive random walk benchmark in forecasting the monthly USD/CAD exchange rate. Using daily data from the Bank of Canada spanning January 2017 to May 2026, resampled into 113 monthly observations, five ML models are evaluated: linear regression, random forest, gradient boosting, XGBoost, and AdaBoost. These models are benchmarked against the naive random walk model and exponential smoothing with Holt-Winters seasonality (ETS). All models are evaluated using an expanding-window framework to maintain strict out-of-sample integrity, and forecast-accuracy differences are assessed using the Diebold-Mariano (DM) test. Structural break detection identifies four significant breakpoints in the series, corresponding to the escalation of the US-China trade war in 2018, the COVID-19 economic recovery in 2020, the peak of the Bank of Canada rate-hiking cycle in 2022, and the start of the Bank of Canada rate-cutting cycle in 2024. SHAP (SHapley Additive exPlanations) analysis is applied to interpret the drivers of the best-performing ML model. The results show that the naive random walk model remains a formidable benchmark. Linear regression is the only model that statistically outperforms the naive random walk model, with a DM statistic of 3.0585 and a p-value of 0.0071, whereas the ML ensemble models show only marginal differences. Excluding the random walk, random forest with the expanding-window framework achieves the lowest MAPE of all models, at 1.17 percent. SHAP analysis confirms that short-term lags, particularly lag1 and lag2, and recent rolling means dominate predictions, consistent with the near-random-walk behavior of exchange rates.
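The evaluation protocol (expanding-window, one-step-ahead forecasts compared against a naive random walk with a Diebold-Mariano test) can be sketched in NumPy as below. Several choices here are assumptions. An AR(1) regression with intercept stands in for the paper's linear regression on lag features. The loss is squared error at horizon 1. The p-value uses the plain normal approximation. Small-sample variants such as the Harvey-Leybourne-Newbold correction give different p-values, so this sketch is not expected to reproduce the reported 0.0071. The sign convention makes a positive statistic mean the model beats the random walk.

```python
import math
import numpy as np


def expanding_window_forecasts(y, start):
    """One-step-ahead forecasts for y[start:], refitting on all data before t each time."""
    rw, ar = [], []
    for t in range(start, len(y)):
        hist = y[:t]
        rw.append(hist[-1])                                # naive random walk
        X = np.column_stack([np.ones(t - 1), hist[:-1]])  # AR(1) with intercept
        beta, *_ = np.linalg.lstsq(X, hist[1:], rcond=None)
        ar.append(beta[0] + beta[1] * hist[-1])
    return np.array(rw), np.array(ar)


def diebold_mariano(actual, f_bench, f_model):
    """DM test, squared-error loss, horizon 1; positive stat means the model beats the benchmark."""
    d = (actual - f_bench) ** 2 - (actual - f_model) ** 2
    stat = d.mean() / math.sqrt(d.var(ddof=1) / len(d))
    p = math.erfc(abs(stat) / math.sqrt(2))  # two-sided p-value, normal approximation
    return float(stat), p


# Simulated mean-reverting series (not exchange-rate data).
rng = np.random.default_rng(1)
y = np.zeros(600)
for t in range(1, 600):
    y[t] = 0.2 * y[t - 1] + rng.normal()
rw, ar = expanding_window_forecasts(y, start=50)
stat, p = diebold_mariano(y[50:], rw, ar)
```

Refitting only on data strictly before each forecast date is what keeps the evaluation out-of-sample; a single train/test split would instead fix the model before any of the later structural breaks.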

2606.15074 2026-06-16 cs.LG New submission

TriAdReview: Triangular Adversarial Review Architecture for Multi-Model Technical Document Generation

TriAdReview: 用于多模型技术文档生成的三角对抗审查架构

Zhiqiang Zhou, Junliang Dai, Xu Ling

Affiliations * Hunan Chemical Industry Vocational and Technical College(湖南化工职业技术学院)

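AI summary: Proposes TriAdReview, a triangular adversarial review architecture that iteratively improves a generator's output using two independent reviewer models and a triangular judging mechanism, improving on the single-model baseline by 10.1% across five benchmark tasks, but finds that adversarial review has a structural bias on completeness-oriented tasks.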

Comments 12 pages, 7 figures, 5 tables

Chinese abstract (AI-translated)

大型语言模型(LLMs)越来越多地用于技术文档生成,但单模型输出常常存在过度工程化、安全盲点和覆盖不完整的问题。我们提出TriAdReview,一种三角对抗审查架构,采用两个独立的审查模型(工程视角和边界视角)以及一个三角判断机制,迭代改进生成器模型的输出。我们在五个基准任务——架构设计、代码生成、提案审查、安全审计和需求分析——上评估了TriAdReview,使用了三种配置:单模型(基线)、双模型(单次审查)和三模型(完整系统)。在75次实验(每个单元n=5)中,结果显示三模型配置相比单模型基线实现了10.1%的总体改进(50分制中26.2 vs. 23.8;p<0.05,配对t检验),在安全审计(+27.6%)、代码生成(+20.8%)和架构设计(+15.6%)上尤为显著。第二个评分者(mimo-v2.5-pro)以较小的效应(+2.7%)确认了方向,表明评分者间一致性中等。然而,系统在需求分析上出现了-7.5%的退化,揭示出对抗审查架构存在对简化的结构性偏见,这对面向完整性的任务适得其反。我们通过任务类型框架分析了这一边界条件,并证明审查提示适应可以部分缓解该问题。我们的发现首次实证描述了多模型对抗审查何时有益或有害,对协作AI系统的设计具有启示意义。

English abstract

Large language models (LLMs) are increasingly used for technical document generation, yet single-model outputs often suffer from over-engineering, security blind spots, and incomplete coverage. We propose TriAdReview, a triangular adversarial review architecture that employs two independent reviewer models (engineering and boundary perspectives) and a triangular judging mechanism to iteratively improve a generator model's output. We evaluate TriAdReview across five benchmark tasks - architecture design, code generation, proposal review, security audit, and requirements analysis - using three configurations: single model (baseline), dual model (single review), and triple model (full system). Results across 75 experiments (n=5 per cell) show that the triple model configuration achieves a 10.1% overall improvement over the single model baseline (26.2 vs. 23.8 out of 50; p<0.05, paired t-test), with particularly strong gains on security audit (+27.6%), code generation (+20.8%), and architecture design (+15.6%). A second scorer (mimo-v2.5-pro) confirms the direction with a smaller effect (+2.7%), suggesting moderate inter-rater agreement. However, the system shows a -7.5% degradation on requirements analysis, revealing that adversarial review architectures have a structural bias toward simplification that is counterproductive for completeness-oriented tasks. We analyze this boundary condition through a task-type framework and demonstrate that reviewer prompt adaptation partially mitigates the issue. Our findings provide the first empirical characterization of when multi-model adversarial review helps versus harms, with implications for the design of collaborative AI systems.
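The headline figure is simple arithmetic on the mean scores: (26.2 - 23.8) / 23.8 ≈ 0.101, i.e. 10.1%. The sketch below reproduces that and computes a paired t statistic over matched runs. The per-run scores in the second usage line are hypothetical, and converting t to a p-value, which needs the t distribution, is left out.

```python
import math
import statistics


def relative_improvement(new, base):
    return (new - base) / base


def paired_t(scores_a, scores_b):
    """Paired t statistic for matched runs (same task, same seed) of two configurations."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


print(f"{relative_improvement(26.2, 23.8):.1%}")  # 10.1%
print(paired_t([3, 5, 4], [1, 2, 2]))            # hypothetical scores
```

A paired test fits this design because each single-model and triple-model run shares the same task, so per-task difficulty cancels out of the differences.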

2606.15155 2026-06-16 cs.LG New submission

Semantic Reasoning in Medicine: The Role of Knowledge Graphs Across Five Key Domains

医学中的语义推理:知识图谱在五个关键领域的作用

Haniye Sherafatmandjoo, Mohammad Akbari, Zahed Rahmati

Affiliations * Amirkabir University of Technology(阿米尔卡比尔理工大学)

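AI summary: Surveys applications of knowledge graphs in medicine, covering clinical decision support, disease prediction, health recommendation, precision medicine, and medical question answering, and discusses construction methods, challenges, and future directions.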

Chinese abstract (AI-translated)

知识图谱(KGs)已成为整合和推理复杂生物医学与临床数据的有前景解决方案。通过表示疾病、药物、症状和患者记录等实体之间的结构化关系,KGs为决策、预测、推荐和个性化护理提供了语义基础。最近的进展已证明它们在多种医学应用中的实用性——包括临床决策支持系统、疾病和治疗结果预测、健康推荐系统、精准医疗和医学问答——其中KGs通常增强可解释性、语义一致性和患者特定推理。与此同时,越来越多的研究专注于医学KG生成本身,提出了利用本体、语义网技术、基于深度学习的信息提取和混合神经符号流水线,从电子健康记录、临床叙述、生物医学文献和网络资源构建图谱的框架。尽管取得了这些进展,仍然存在重大挑战,包括知识覆盖有限且分散、异构数据源对齐困难、当前推理和表示学习方法在密集多关系图上的脆弱性,以及与隐私、偏见和问责相关的未解决问题。本综述从应用导向和方法导向两个维度回顾和分类了当前医学KG的研究,讨论了其优势和技术基础,并概述了关键局限性和开放研究方向。通过分析趋势、架构和评估实践,本文旨在指导KG驱动的医学AI系统的未来发展,并支持其安全有效地融入医疗环境。

English abstract

Knowledge graphs (KGs) have emerged as a promising solution for integrating and reasoning over complex biomedical and clinical data in healthcare. By representing structured relationships among entities such as diseases, drugs, symptoms, and patient records, KGs provide a semantic backbone for decision-making, prediction, recommendation, and personalized care. Recent advances have demonstrated their utility across diverse medical applications--including clinical decision support systems, disease and treatment outcome prediction, health recommender systems, precision medicine, and medical question answering--where KGs often enhance interpretability, semantic coherence, and patient-specific reasoning. In parallel, a growing body of work focuses on medical KG generation itself, proposing frameworks that construct graphs from EHRs, clinical narratives, biomedical literature, and web resources using ontologies, semantic web technologies, deep-learning-based information extraction, and hybrid neuro-symbolic pipelines. Despite this progress, significant challenges remain, including limited and fragmented knowledge coverage, difficulties in aligning heterogeneous data sources, the fragility of current reasoning and representation-learning methods on dense multi-relational graphs, and unresolved issues related to privacy, bias, and accountability. This survey reviews and categorizes current research on KGs in medicine along both application-oriented and methodology-oriented dimensions, discusses their benefits and technical foundations, and outlines key limitations and open research directions. By analyzing trends, architectures, and evaluation practices, this work aims to guide future developments in KG-driven medical AI systems and support their safe and effective integration into healthcare environments.
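A toy illustration of the multi-hop reasoning a medical KG supports: store (head, relation, tail) triples and answer "which drugs treat a disease that presents this symptom?" by chaining two relations. The entity and relation names are placeholders, not clinical facts, and the in-memory index is a hypothetical stand-in for a real graph store.

```python
from collections import defaultdict


class TripleStore:
    """Minimal in-memory index of (head, relation, tail) triples."""

    def __init__(self, triples):
        self._heads = defaultdict(set)  # (relation, tail) -> set of heads
        for head, relation, tail in triples:
            self._heads[(relation, tail)].add(head)

    def heads(self, relation, tail):
        return set(self._heads.get((relation, tail), ()))


def drugs_for_symptom(kg, symptom):
    """Two-hop query: symptom <-has_symptom- disease <-treats- drug."""
    diseases = kg.heads("has_symptom", symptom)
    return {drug for disease in diseases for drug in kg.heads("treats", disease)}


kg = TripleStore([
    ("DrugA", "treats", "DiseaseX"),
    ("DrugB", "treats", "DiseaseY"),
    ("DiseaseX", "has_symptom", "SymptomS"),
    ("DiseaseY", "has_symptom", "SymptomS"),
    ("DiseaseY", "has_symptom", "SymptomT"),
])
print(sorted(drugs_for_symptom(kg, "SymptomS")))  # ['DrugA', 'DrugB']
```

Because every answer comes from explicit edges, the path that produced it (symptom, disease, drug) can be returned alongside it, which is the interpretability benefit the survey attributes to KGs.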

2606.15225 2026-06-16 cs.LG cs.AI cs.IR New submission

Edu-Theater: A Data-Efficient Agent Framework for Scalable Learner Behavior Simulation through Staging Roll-Call

Edu-Theater: 一种通过点名排演实现可扩展学习者行为模拟的数据高效智能体框架

Weibo Gao, Qi Liu, Linan Yue, Zheng Zhang, Yichao Du, Fangzhou Yao, Ao Yu, Zhenya Huang, Shijin Wang

Affiliations * University of Science and Technology of China(中国科学技术大学) State Key Laboratory of Cognitive Intelligence(认知智能国家重点实验室) Southeast University(东南大学) Alibaba Group(阿里巴巴集团) iFLYTEK Co., Ltd.(科大讯飞股份有限公司)

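AI summary: Proposes Edu-Theater, which builds cohort-level proficiency priors and refines them with a small number of diagnostic queries so that LLM agents can simulate learner behavior, improving simulation accuracy with less data and benefiting downstream applications such as adaptive testing.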

Comments LLM Agent, Educational Data Mining, Data Synthesis, Human Simulation

Chinese abstract (AI-translated)

大规模学习者-任务交互数据对智能教育系统至关重要,但收集成本高且受隐私和学习者参与度限制。学习模拟器在无需真实学习者持续参与的情况下,对模拟可扩展的学习者行为起着关键作用。然而,现有方法主要是**以个体为中心**,为每个学习者配对模拟器,从密集的交互历史中迭代推断潜在知识状态,这既数据密集又计算密集,且在冷启动场景中脆弱。我们提出一种**群体感知的点名模拟范式**,首先构建群体水平的能力先验,然后通过少量有针对性的诊断查询细化个体学习者状态。基于该范式,我们引入**Edu-Theater**,一个由LLM驱动的智能体系统,通过教师智能体和基于学习者日志的回顾性点名探测执行群体感知的学习者模拟。Edu-Theater无需每个学习者的密集历史即可实现可扩展的未来行为模拟。在两个真实世界数据集上的实验表明,Edu-Theater以显著更少的LLM调用实现了更高的模拟精度,生成的合成数据增强了自适应测试等下游应用。

English abstract

Large-scale learner-task interaction data are crucial for intelligent educational systems but are costly to collect and constrained by privacy and learner engagement. Learner simulators play a critical role in simulating scalable learner behavior without the need for continuous involvement of real learners. However, existing methods are predominantly individual-centric, pairing a simulator with each learner to iteratively infer latent knowledge states from dense interaction histories, which is both data- and computation-intensive, and fragile in cold-start scenarios. We propose a cohort-aware roll-call simulation paradigm that first constructs cohort-level proficiency priors and refines individual learner states through a small number of targeted diagnostic queries. Based on this paradigm, we introduce Edu-Theater, an LLM-powered agent system that performs cohort-aware learner simulation via a teacher agent and retrospective roll-call probing over learner logs. Edu-Theater enables scalable future behavior simulation without the need for dense per-learner histories. Experiments on two real-world datasets demonstrate that Edu-Theater achieves higher simulation accuracy with significantly fewer LLM calls, producing synthetic data that enhances downstream applications such as adaptive testing.
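Edu-Theater itself is an LLM-agent system, but the statistical intuition behind "cohort prior first, then a few diagnostic probes" can be illustrated with a Beta-Bernoulli update. This is a hypothetical analogue, not the paper's method. The cohort's accuracy on a skill sets a Beta prior whose strength (pseudo-count) is an assumed parameter, and each diagnostic answer updates it.

```python
def cohort_prior(cohort_correct, cohort_attempts, strength=10.0):
    """Beta(a, b) prior centred on the cohort's accuracy; strength = a + b (pseudo-counts)."""
    p = cohort_correct / cohort_attempts
    return strength * p, strength * (1.0 - p)


def refine(prior, answers):
    """Conjugate update with a few diagnostic answers (1 = correct, 0 = wrong)."""
    a, b = prior
    k = sum(answers)
    return a + k, b + len(answers) - k


def posterior_mean(ab):
    a, b = ab
    return a / (a + b)


prior = cohort_prior(60, 100)             # cohort answered 60% correctly: Beta(6, 4)
learner = refine(prior, [1, 1, 1, 0, 1])  # five diagnostic probes
print(round(posterior_mean(learner), 3))  # 0.667
```

The prior lets a cold-start learner get a sensible estimate from very few probes, which mirrors the data-efficiency argument in the abstract.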

2606.15257 2026-06-16 cs.LG New submission

AI for Social Good: An Investigation of the Causal Relationship Between Environmental Regulations and Their Effects on Air Pollution in London, UK

AI 促进社会公益:英国伦敦环境法规与其对空气污染影响的因果关系研究

Yang Han, Jacqueline CK Lam, Victor OK Li, Yiu-Wai Man

Affiliations * Department of Electrical and Electronic Engineering, The University of Hong Kong(香港大学电子与电气工程系)

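AI summary: Proposes an uncertainty-aware Bayesian deep learning framework to estimate the causal effect of London's 2010-2020 air pollution regulations on PM2.5, finding that the regulations lowered PM2.5 by 1.88 μg/m³ (12.35%) on average.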

Chinese abstract (AI-translated)

空气污染法规是城市公共卫生治理的核心,但估计其效果具有挑战性,因为政策实施非随机,且污染轨迹受气象、社会经济变化、时间趋势和重叠干预措施的影响。本研究开发了一个不确定性感知的贝叶斯深度学习框架,用于估计2010年至2020年伦敦空气污染法规对PM2.5浓度的总体影响。该框架整合了来自内伦敦监测站的每日PM2.5观测数据、气象协变量、年度社会经济指标、月份和星期指示变量,以及32项政策措施的每日法规状态数据。贝叶斯LSTM捕获环境和社会经济协变量的时间依赖性,贝叶斯嵌入层表示时间和法规状态输入,法规状态预测分支支持基于倾向性得分的非随机政策实施调整。通过将观测到的PM2.5浓度与假设无法规情景下的反事实预测进行比较,估计法规效果,并在重复贝叶斯训练和bootstrap重采样中总结不确定性。结果显示,伦敦的法规与平均PM2.5减少1.88 μg/m³(相对减少12.35%)相关,95%置信区间为1.64-2.12 μg/m³。2013年之前效果有限,2013年至2017年效果逐渐明显,2018年和2019年效果最强。研究结果表明,持续累积的监管干预措施对伦敦空气质量改善产生了可衡量的影响。本研究展示了不确定性感知的因果AI如何支持环境问责、公共卫生保护和基于证据的环境决策治理。

English abstract

Air pollution regulation is central to urban public health governance, but estimating its effects is difficult because policies are implemented non-randomly and pollution trajectories are shaped by meteorology, socioeconomic change, temporal trends, and overlapping interventions. This study develops an uncertainty-aware Bayesian deep learning framework to estimate the aggregate effect of air pollution regulations on PM2.5 concentrations in London from 2010 to 2020. The framework integrates daily PM2.5 observations from Inner London monitoring stations, meteorological covariates, annual socioeconomic indicators, month-of-year and day-of-week indicators, and daily regulation status data for 32 policy measures. A Bayesian LSTM captures temporal dependencies in environmental and socioeconomic covariates, Bayesian embedding layers represent temporal and regulation status inputs, and a regulation status prediction branch supports propensity score-based adjustment for non-random policy implementation. Regulatory effects are estimated by comparing observed PM2.5 concentrations with counterfactual predictions under a hypothetical no-regulation scenario, with uncertainty summarized across repeated Bayesian training runs and bootstrap resampling. Results show that London's regulations were associated with an average PM2.5 reduction of 1.88 μg/m³, a relative reduction of 12.35%, with a 95% confidence interval of 1.64-2.12 μg/m³. Estimated effects were limited before 2013, became clearer from 2013 to 2017, and were strongest in 2018 and 2019. The findings suggest that sustained and cumulative regulatory interventions contributed to measurable improvements in London's air quality. This study demonstrates how uncertainty-aware causal AI can support environmental accountability, public health protection, and evidence-based governance for environmental decision-making.
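The effect estimate described above compares observed concentrations with counterfactual no-regulation predictions. Below is a plain-Python sketch of that comparison with a percentile bootstrap interval. It assumes the counterfactual series has already been produced by some model; the Bayesian LSTM is not reproduced here. The sketch resamples days independently, which ignores autocorrelation, so a block bootstrap would be more appropriate for daily pollution data. The paper also aggregates over repeated Bayesian training runs, which is not shown.

```python
import random
import statistics


def regulation_effect(observed, counterfactual, n_boot=2000, seed=0):
    """Mean reduction (counterfactual - observed), relative reduction, and a 95% percentile CI."""
    diffs = [c - o for o, c in zip(observed, counterfactual)]
    effect = statistics.mean(diffs)
    relative = effect / statistics.mean(counterfactual)
    rng = random.Random(seed)
    boots = sorted(statistics.mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot))
    ci = (boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot) - 1])
    return effect, relative, ci


# Hypothetical daily values in μg/m³, not the London data.
obs = [10.0, 12.0, 11.0, 9.0]
cf = [12.0, 14.0, 12.0, 11.0]
print(regulation_effect(obs, cf))
```

A positive effect here means concentrations were lower than the model predicts they would have been without the regulations.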

2606.15288 2026-06-16 cs.LG cs.AI physics.ao-ph 新提交

Hybrid NARX-LLM for Greenland Iceberg Discharge: Prompt-Driven Residual Correction

混合NARX-LLM用于格陵兰冰山排放:提示驱动的残差校正

Yiquan Gao, Duohui Xu

发表机构 * Heriot-Watt University(赫瑞瓦特大学) StudioYG

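AI summary: Proposes a hybrid NARX-LLM framework that combines a nonlinear autoregressive model with a large language model for residual correction. It also introduces a physics-informed prompting method to model the complex nonlinear dynamics of Greenland iceberg discharge and improve prediction accuracy.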

AI中文摘要

格陵兰冰山排放表现出复杂的非线性动态,且可观测性有限,对传统预测模型构成挑战。我们提出一个混合NARX-LLM框架,该框架结合了具有外源输入的非线性自回归模型(NARX)和用于残差校正的大型语言模型(LLM)。我们进一步提出了一种物理信息提示(PIP)方法,将非结构化物理知识转化为结构化提示,用于零样本上下文推理。主要目标是探索该框架在建模格陵兰冰山排放方面的校正潜力,而不仅仅是优化预测精度。NARX组件捕获内在的时间依赖性,而由PIP引导的LLM编码冰川动力学和环境驱动因素,并感知关键趋势模式以校正系统预测误差。这种集成允许模型推理未建模因素并产生可解释的残差,从而提升整体预测精度。应用于格陵兰冰山排放时间序列,我们的方法处理了由于罕见变化和非平稳趋势而难以预测的极端事件,这是传统方法经常忽视的局限性。通过融合结构化时间序列建模与知识驱动的Foundation AI,该框架提供了一条可扩展且可解释的路径,将数据受限的气候预测与物理信息LLM推理相结合。代码已公开。

英文摘要

Greenland iceberg discharge exhibits complex nonlinear dynamics with limited observability, challenging traditional predictive models. We present a Hybrid NARX-LLM framework that combines a nonlinear autoregressive model with exogenous inputs (NARX) and a large language model (LLM) for residual correction. We further propose a Physics-Informed Prompt (PIP) method that transforms unstructured physical knowledge into structured prompts for zero-shot in-context reasoning. The primary objective is to explore the corrective potential of this framework for modeling Greenland iceberg discharge, rather than merely optimizing predictive accuracy. The NARX component captures intrinsic temporal dependencies, while the LLM, guided by PIP, encodes glacier dynamics and environmental drivers and perceives key trend patterns to correct systematic prediction errors. This integration allows the model to reason about unmodeled factors and produce interpretable residuals, enhancing overall predictive accuracy. Applied to Greenland iceberg discharge time series, our approach addresses extreme events that are difficult to predict due to rare variations and nonstationary trends, a limitation often overlooked by traditional methods. By fusing structured time-series modeling with knowledge-driven foundation AI, the framework offers a scalable and interpretable pathway to bridge data-limited climate forecasting with physics-informed LLM reasoning. The code is available.
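The NARX-plus-residual structure can be illustrated with a minimal sketch.

- For brevity, a linear-in-parameters ARX model fitted by least squares stands in for the nonlinear NARX component. This is an assumption, not the paper's model.
- `correct_residual` is a hypothetical hook where a residual corrector (the LLM in the paper) would plug in.
- No prompting or LLM call is shown.

```python
import numpy as np


def lag_matrix(y, u, ny=2, nu=2):
    """Build regressors [y_{t-1..t-ny}, u_{t-1..t-nu}, 1] and targets y_t."""
    y, u = np.asarray(y, dtype=float), np.asarray(u, dtype=float)
    p = max(ny, nu)
    rows = [np.concatenate([y[t - ny:t][::-1], u[t - nu:t][::-1], [1.0]])
            for t in range(p, len(y))]
    return np.array(rows), y[p:]


def fit_arx(y, u, ny=2, nu=2):
    """Least-squares fit of a linear ARX stand-in for the NARX model."""
    X, target = lag_matrix(y, u, ny, nu)
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return theta


def hybrid_predict(X, theta, correct_residual=None):
    """Base one-step forecast plus an optional additive residual correction (hypothetical hook)."""
    base = X @ theta
    if correct_residual is None:
        return base
    return base + correct_residual(X, base)
```

The hook receives both the regressors and the base forecast. Any corrector that returns an additive residual array of the same shape can therefore be swapped in without refitting the base model.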

2606.15314 2026-06-16 cs.LG cs.AI stat.ML 新提交

LLMs on Tabular Data with Limited Semantics: Evidence from Industrial Car Retrofit Prediction

有限语义表格数据上的LLM:来自工业汽车改造预测的证据

Aina Vila Pons, Ioannis Tzachristas, Constantinos Antoniou

发表机构 * Technical University of Munich(慕尼黑工业大学) BMW Group(宝马集团)

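AI summary: Compares LLM-based strategies (embeddings, direct classification, hybrid stacking) with classical tree ensembles on industrial tabular data. It finds that LLMs help little when semantic content is restricted, although the embedding and hybrid approaches remain valuable.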

AI中文摘要

工业改造规划依赖于结构化操作数据而非自由文本:规划者必须估计新注册的原型是否需要改造、需要哪种改造包以及工作将花费多长时间。我们研究了一个工业数据集,该数据集将原型注册系统(284,271辆车)与改造管理系统(48,716次清洗后的访问)相连接,并在行序列化输入上比较了强大的表格机器学习基线与三种基于LLM的策略:嵌入特征(Amazon Titan)、直接提示分类(Claude Sonnet 4)和ML+LLM堆叠方法。在二分类发生预测、15类改造类型分类、每次访问持续时间回归以及聚合的月度基准测试中,经典树集成仍然是最强的独立模型。然而,LLM结果揭示了一致的模式:嵌入在表格上仍然有用(二分类AUC = 0.982),直接提示在通过哈希去除语义信号后崩溃(二分类AUC = 0.500;多类加权F1 = 0.018),而混合堆叠产生了最佳的手动构建多类模型(加权F1 = 0.626)。在月度基准测试中,基于滞后的机器学习优于时间序列基础模型,尽管Chronos-small在零样本预测中仍具有竞争力。结果表明,在隐私受限的工业表格上,LLM作为补充组件比替代强大的表格基线更有效。

英文摘要

Industrial retrofit planning depends on structured operational data rather than free text: planners must estimate whether a newly registered prototype will require a retrofit, which retrofit package it will need, and how long the work will take. We study an industrial dataset linking a prototype-registration system (284,271 vehicles) with a retrofit-management system (48,716 cleaned visits), and compare strong tabular machine learning baselines with three LLM-based strategies on row-serialized inputs: embedding features (Amazon Titan), direct prompted classification (Claude Sonnet 4), and an ML+LLM stacking approach. Across binary occurrence prediction, 15-way retrofit-type classification, per-visit duration regression, and an aggregated monthly benchmark, classical tree ensembles remain the strongest standalone models. However, the LLM results reveal a consistent pattern: embeddings remain useful on tables (binary AUC = 0.982), direct prompting collapses once semantic signal is stripped by hashing (binary AUC = 0.500; multiclass weighted F1 = 0.018), and hybrid stacking yields the best manually built multiclass model (weighted F1 = 0.626). On the monthly benchmark, lag-based machine learning outperforms time-series foundation models, though Chronos-small remains competitive in zero-shot forecasting. The results suggest that on privacy-constrained industrial tables, LLMs are more effective as complementary components than as replacements for strong tabular baselines.
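The sketch below makes the "semantic signal stripped by hashing" condition concrete by serializing a row with and without hashing.

- The column names, salt, and digest length are illustrative assumptions.
- The abstract does not describe the paper's actual serialization or hashing scheme.

```python
import hashlib


def serialize_row(row, hash_values=False, salt="demo-salt"):
    """Serialize a table row as 'column: value; ...' text for a prompt or an embedding model.

    With hash_values=True, column names and values are replaced by short salted
    SHA-256 digests (hypothetical scheme): identical values still map to identical
    digests, but the readable tokens disappear.
    """
    parts = []
    for col, val in row.items():
        text = str(val)
        if hash_values:
            text = hashlib.sha256((salt + col + "=" + text).encode()).hexdigest()[:8]
            col = hashlib.sha256((salt + col).encode()).hexdigest()[:8]
        parts.append(f"{col}: {text}")
    return "; ".join(parts)
```

Hashing keeps equality structure across rows. It removes the readable tokens that a prompted classifier would need to reason about.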

2606.15377 2026-06-16 cs.LG cs.AI physics.geo-ph 新提交

Learning Earthquake Wave Arrival Time Picking from Labels with Inaccuracies

从不准确标签中学习地震波到时拾取

Sen Li, Xu Yang, S. Mostafa Mousavi, Anye Cao, Keting Fan, Yaoqi Liu, Changbin Wang, Qiang Niu

发表机构 * Department of Earth and Planetary Sciences, Harvard University(哈佛大学地球与行星科学系) School of Computer Science and Technology, China University of Mining and Technology(中国矿业大学(北京)计算机科学与技术学院) School of Mines, China University of Mining and Technology(中国矿业大学(北京)矿院) State Key Laboratory of Coal Exploration and Intelligent Mining, China University of Mining and Technology(中国矿业大学(北京)煤炭勘探与智能开采国家重点实验室)

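AI summary: Proposes Label Noise-Contrastive Robust Learning (LaNCoR), which corrects mislabels by aligning the distributions of waveform features and label representations. It improves performance by up to 28.8% on microseismic P-wave arrival-time picking.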

Comments 28 pages, 10 figures

AI中文摘要

不准确标记的训练数据,或称“标签噪声”,对监督机器学习模型的完整性构成重大威胁。这种污染通过教导模型特征与标签之间的错误映射直接降低性能,导致泛化能力差,并在正确标记的验证和测试数据上准确性降低。当前地震学应用主要依赖大规模训练集或数据增强来减少标签噪声影响,这可能是劳动密集且成本高昂的。在这里,我们介绍一种标签噪声对比鲁棒学习(LaNCoR)方法,该方法可以有效处理地震信号处理任务中的噪声标签,而无需大规模训练数据集。在该方法中,输入波形特征和标签表示分布在特征空间中对齐,以纠正错误标记并减少其对训练过程的影响。我们使用两个基线模型和训练方法展示了LaNCoR在真实微地震数据P波到时拾取任务上的性能。我们的结果表明,LaNCoR在性能指标上可提升高达28.8%。该方法在地震学和地球科学中的模型训练方面具有巨大潜力。

英文摘要

Inaccurately labeled training data, or "label noise", poses a significant threat to the integrity of supervised machine learning models. This corruption directly degrades performance by teaching the model erroneous mappings between features and labels, which leads to poor generalization and reduced accuracy on properly labeled validation and test data. Current seismological applications mainly rely on large-scale training sets or data augmentation to reduce the label-noise impact, which can be labor-intensive and costly. Here, we introduce a Label Noise-Contrastive Robust Learning (LaNCoR) approach that can effectively handle noisy labels in seismic signal processing tasks, without requiring large-scale training datasets. In this approach, the input waveform feature and label representation distributions are aligned in the feature space to correct mislabeling and reduce its impact on the training process. We present LaNCoR's performance on the task of P-phase arrival-time picking of real microseismic data using two baseline models and training approaches. Our results indicate that LaNCoR can improve performance by up to 28.8% across performance metrics. This approach holds great promise for model training in seismology and geosciences.
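The abstract does not give LaNCoR's exact objective. As a stand-in for "aligning waveform-feature and label-representation distributions", the sketch below uses a generic InfoNCE contrastive loss in NumPy.

- Each waveform embedding should be closest to its own label embedding.
- The other rows in the batch act as negatives.
- The encoders that would produce these embeddings are assumed, not shown.

```python
import numpy as np


def info_nce(feat, label_emb, temperature=0.1):
    """InfoNCE loss where waveform embedding i is paired with label embedding i.

    feat, label_emb: (batch, dim) arrays; other rows in the batch act as negatives.
    """
    f = feat / np.linalg.norm(feat, axis=1, keepdims=True)
    g = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    logits = f @ g.T / temperature                         # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))              # matched pairs lie on the diagonal
```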

2606.15427 2026-06-16 cs.LG cs.AI cs.CV 新提交

Post-Launch Capability Expansion of Vision-Language Models via Prompting for On-Orbit Spacecraft Inspection

通过提示实现视觉语言模型发射后能力扩展用于在轨航天器检测

Nicholas A. Welsh, Lennon J. Shikhman, Monty Nehru Attazs, Seemanthini K. Putane, Van Minh Nguyen, Ryan T. White

发表机构 * Florida Institute of Technology(佛罗里达理工学院) University of Florida(佛罗里达大学)

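AI summary: Studies whether prompt-driven vision-language models can extend semantic capabilities in orbit, detecting new spacecraft components via natural-language prompts without modifying weights. Zero-shot instance segmentation reaches 0.385 mAP@0.5 on 129 images.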

Comments 5 pages, 1 figure, 2 tables. Equal contribution by Nicholas A. Welsh and Lennon Shikhman. Published in the CVPR2026 Workshop on AI4Space

AI中文摘要

星载检测系统通常在发射前部署感知模型,之后更新模型权重或扩展固定标签集在操作上变得不可行。虽然监督模型可以在飞行前集成,但在轨道上添加新的语义能力需要重新训练和重新上传参数。我们研究提示驱动的视觉语言模型是否能够实现发射后语义扩展,允许通过自然语言提示指定新的航天器部件,而无需修改星载权重。我们在一个包含129张先前未见卫星图像的测试集上,采用严格冻结的单次推理协议,评估了航天器部件的零样本实例分割。在固定全局阈值且无后处理的情况下,SAM3达到0.385 mAP@0.5和0.267 mAP@0.5:0.95。性能强烈依赖于尺度:大型结构元素如航天器主体(0.639 AP@0.50)和太阳翼(0.598 AP@0.5)定位可靠,而相对较小的附件如天线(0.221 AP@0.5)和推进器(0.081 AP@0.5)仍然困难。提示形式影响性能,包含空间和几何描述符的结构化提示相比短类别名称提示提升高达82%。该模型在当代嵌入式GPU的内存和计算范围内运行,表明提示驱动的定位可以为主要航天器结构提供发射后语义扩展的实用机制,同时突显了在轨道域偏移下细粒度部件零样本定位的局限性。

英文摘要

Spaceborne inspection systems often deploy perception models prior to launch, after which updating model weights or expanding fixed label sets becomes operationally impractical. While supervised models can be integrated pre-flight, adding new semantic capabilities in orbit requires retraining and re-uploading parameters. We investigate whether prompt-driven vision--language models can enable post-launch semantic expansion, allowing new spacecraft components to be specified via natural-language prompts without modifying onboard weights. We evaluate zero-shot instance segmentation of spacecraft components under a strictly frozen, single-pass inference protocol on a test set of $129$ images of previously unseen satellites. Under fixed global thresholds and no post-processing, SAM3 achieves $0.385$ mAP@$0.5$ and $0.267$ mAP@$0.5{:}0.95$. Performance is strongly scale-dependent: large structural elements like spacecraft bodies ($0.639$ AP@$0.5$) and solar arrays ($0.598$ AP@$0.5$) localize reliably, while relatively small appendages like antennas ($0.221$ AP@$0.5$) and thrusters ($0.081$ AP@$0.5$) remain difficult. Prompt formulation influences performance, with structured prompts incorporating spatial and geometric descriptors yielding up to $82\%$ improvement over short category-name prompts. The model operates within the memory and compute envelope of contemporary embedded GPUs, suggesting prompt-driven grounding can provide a practical mechanism for post-launch semantic extension of dominant spacecraft structures while highlighting limitations of zero-shot localization for fine-scale components under orbital domain shift.
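The reported AP@0.5 figures rest on mask IoU and precision-recall matching. The following is a minimal single-class, single-image sketch.

- It assumes boolean masks and greedy matching in descending score order.
- It uses all-point interpolation rather than the exact 101-point COCO protocol, so values can differ slightly from COCO-style evaluation tools.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def average_precision(pred_masks, scores, gt_masks, iou_thr=0.5):
    """AP at one IoU threshold with greedy matching and all-point interpolation."""
    order = np.argsort(scores)[::-1]          # highest-confidence predictions first
    matched, tp = set(), []
    for i in order:
        ious = [mask_iou(pred_masks[i], g) if j not in matched else -1.0
                for j, g in enumerate(gt_masks)]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= iou_thr:
            matched.add(j)
            tp.append(1.0)
        else:
            tp.append(0.0)                    # false positive
    if not gt_masks:
        return 0.0
    cum_tp = np.cumsum(tp)
    recall = cum_tp / len(gt_masks)
    precision = cum_tp / np.arange(1, len(tp) + 1)
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    for k in range(len(mpre) - 2, -1, -1):    # make precision non-increasing in recall
        mpre[k] = max(mpre[k], mpre[k + 1])
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))
```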

2606.15623 2026-06-16 cs.LG cs.AI 新提交

Surprise-Guided MergeSort: Budget-Efficient Human-in-the-Loop Ranking via Adaptive Comparison Scheduling

惊喜引导的归并排序:通过自适应比较调度实现预算高效的人机协同排名

Yujin Park, Haejun Chung, Ikbeom Jang

发表机构 * Hanyang University(汉阳大学) Hankuk University of Foreign Studies(韩国外国语大学)

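AI summary: Proposes the Surprise-Guided MergeSort (SGS) framework, which uses a vision-language model (VLM) as a question prioritizer. Adaptive budget allocation routes highly ambiguous comparisons to humans, improving Kendall's τ×100 by 6 to 12 points at equal budget across six benchmarks.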

Comments 16 pages

AI中文摘要

成对比较是主观排名任务的金标准;然而,穷举标注需要大量人工比较($O(n^2)$)。虽然基于排序的方法已将此负担减少到$O(n\log n)$,但每次比较仍需昂贵的人工判断。为了进一步提高标注效率,我们提出利用视觉语言模型(VLM)不是作为标注替代,而是作为\emph{问题优先级排序器},以识别哪些比较真正需要人工判断。所提出的\textbf{惊喜引导的归并排序(SGS)}框架通过三个集成组件实现这一点:(1)自底向上的归并排序调度器,结构化比较并利用传递性;(2)复合惊喜评分器——结合位置偏差消除的VLM置信度、Elo差距和投票熵——量化比较模糊性;(3)自适应预算分配器,将高惊喜对路由给人类,同时通过传递性推理自动化低惊喜对。在六个不同基准上进行了验证,涵盖文本相似度(STS-B、BIOSSES、SICKR-STS)和图像质量评估(KonIQ-10k、TID2013、LIVE Challenge)。SGS有效地识别并跳过了每次会话多达535个非信息性比较。因此,在相同总预算下,它相对于Active Elo实现了Kendall's $τ{\times}100$提升+6到+12。这些结果表明,将VLM引导的惊喜度量与算法排序相结合,在不同领域提供了普遍一致的准确性-效率权衡。

英文摘要

Pairwise comparison is the gold standard for subjective ranking tasks; however, exhaustive annotation requires a massive number of human comparisons ($O(n^2)$). While sorting-based methods have reduced this burden to $O(n\log n)$, they still require expensive human judgment for every single comparison. To further improve annotation efficiency, we propose leveraging a Vision-Language Model (VLM) not as an annotator replacement, but as a \emph{question prioritizer} to identify which comparisons genuinely require human judgment. The proposed \textbf{Surprise-Guided MergeSort (SGS)} framework achieves this through three integrated components: (1) a bottom-up MergeSort scheduler that structures comparisons and exploits transitivity, (2) a composite Surprise Scorer -- combining position-bias-cancelled VLM confidence, Elo gap, and vote entropy -- to quantify comparison ambiguity, and (3) an adaptive budget allocator that routes high-surprise pairs to humans while automating low-surprise pairs via transitivity inference. Validation was conducted on six diverse benchmarks spanning text similarity (STS-B, BIOSSES, SICKR-STS) and image quality assessment (KonIQ-10k, TID2013, LIVE Challenge). SGS effectively identified and skipped up to 535 non-informative comparisons per session. Consequently, it achieved Kendall's $τ{\times}100$ improvements of $+6$ to $+12$ over Active Elo under the same total budget. These results demonstrate that combining VLM-guided surprise metrics with algorithmic sorting provides a generally consistent accuracy-efficiency trade-off across diverse domains.
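The scheduling idea, a bottom-up merge sort whose comparator sends only ambiguous pairs to a human, can be sketched as follows. This is a simplified illustration.

- The surprise score here is just `1 - |2p - 1|` from a single model probability. The paper combines position-bias-cancelled VLM confidence, Elo gap, and vote entropy.
- `model_prob`, `human_cmp`, and `threshold` are hypothetical names.
- Transitivity is exploited only implicitly, through the merge structure.

```python
def sgs_sort(items, model_prob, human_cmp, budget, threshold=0.5):
    """Bottom-up merge sort (best first); ambiguous pairs go to a human while budget lasts.

    model_prob(a, b): model's probability that a ranks above b.
    human_cmp(a, b): True if a human judges a above b.
    """
    stats = {"human": 0, "auto": 0}

    def a_first(a, b):
        nonlocal budget
        p = model_prob(a, b)
        surprise = 1.0 - abs(2.0 * p - 1.0)   # 1 = coin flip, 0 = fully confident
        if surprise > threshold and budget > 0:
            budget -= 1
            stats["human"] += 1
            return human_cmp(a, b)
        stats["auto"] += 1
        return p >= 0.5

    runs = [[x] for x in items]                # bottom-up: start from singleton runs
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 == len(runs):             # odd run out passes through unchanged
                merged.append(runs[i])
                continue
            left, right, out = runs[i], runs[i + 1], []
            while left and right:
                out.append(left.pop(0) if a_first(left[0], right[0]) else right.pop(0))
            merged.append(out + left + right)
        runs = merged
    return (runs[0] if runs else []), stats
```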

2606.15637 2026-06-16 cs.LG 新提交

HAPI-EP: Towards Hybrid, Adaptive, and Predictive Digital Twins of Cardiac Electrophysiology

HAPI-EP:迈向混合、自适应和预测性的心脏电生理数字孪生

Sumeet Vadhavkar, Xiajun Jiang, Yubo Ye, Maryam Toloubidokhti, Linwei Wang

发表机构 * Rochester Institute of Technology(罗切斯特理工学院)

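AI summary: Proposes the HAPI framework, which combines a physics-integrated gray-box model, meta-learned rapid adaptation, and a conditional generative model. Together these build identifiable, strongly predictive digital twins of cardiac electrophysiology.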

AI中文摘要

患者特异性心脏的数字孪生(DT)在个性化医疗中具有巨大潜力。然而,其快速动态适应个体实时数据以及适应后的预测能力仍是核心挑战。我们从两个组成部分审视这一挑战:DT公式化中,机械模型和数据驱动模型展现出竞争性的优点和局限性;DT优化策略主要由重建目标驱动,导致模型不可识别。我们通过HAPI——一个用于构建混合、自适应和预测性DT的AI框架——解决这两个瓶颈,该框架包含三个关键使能器。首先,HAPI构建了一个物理集成的灰盒模型,其中可解释的机械骨干网络由神经组件增强,以建模其与观测数据的残差。其次,HAPI不试图在静态混合模型中预编码所有可能的变异,而是通过前馈元学习器实现混合模型对少样本实时数据的快速即时自适应,这些元学习器通过预测目标训练实现机械和神经参数的摊销推理。最后,我们证明这种自适应性对应于构建一个条件生成模型(即混合DT),赋予其理论可识别性,从而在预测场景中表现出色。我们在心脏电生理学中展示了HAPI的概念验证,使用具有机械反应动力学和神经图扩散的混合单域模型。通过合成和真实数据研究,我们表明HAPI的机械-神经混合和预测自适应对于获得具有强预测和分布外能力的可识别DT至关重要。

英文摘要

A digital twin (DT) of a patient-specific heart offers significant potential in personalized medicine. However, its rapid and dynamic adaptation to an individual's live data and its predictive capability after adaptation remain central challenges. We examine this challenge from its two building blocks: DT formulation where mechanistic and data-driven models show competing merits and limitations, and DT optimization strategies that are largely driven by a reconstruction objective leading to un-identifiable models. We address both bottlenecks via HAPI -- an AI framework for building hybrid, adaptive, and predictive DTs with three key enablers. First, HAPI constructs a physics-integrated gray-box model in which an interpretable mechanistic backbone is augmented by a neural component that models its residual to the observed data. Second, rather than attempting to pre-encode all possible variations in a static hybrid model, HAPI enables rapid on-the-fly adaptation of the hybrid model to few-shot live data, achieved by feedforward meta-learners realizing amortized inference of both mechanistic and neural parameters of the hybrid model trained with predictive objectives. Finally, we show that this adaptivity corresponds to the construction of a conditional generative model (i.e., the hybrid DT) that endows it with theoretical identifiability and thus strong performance in predictive scenarios. We demonstrate the proof-of-concept of HAPI in cardiac electrophysiology using a hybrid monodomain model with mechanistic reaction kinetics and neural graph diffusion. Across synthetic and real-data studies, we show that HAPI's mechanistic-neural hybridization and predictive adaptation are critical for obtaining identifiable DTs with strong predictive and out-of-distribution capabilities.
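A gray-box model in the sense described above adds a learned residual to a mechanistic right-hand side. The sketch below takes one explicit Euler step of graph-discretized reaction-diffusion dynamics.

- The abstract only says "mechanistic reaction kinetics and neural graph diffusion". The Aliev-Panfilov reaction kinetics and their parameter values are therefore assumptions for illustration.
- The `residual` and `diffusion` hooks are hypothetical.
- The meta-learned amortized inference is not shown.

```python
import numpy as np


def aliev_panfilov(u, v, k=8.0, a=0.15, eps0=0.002, mu1=0.2, mu2=0.3):
    """Aliev-Panfilov-style reaction terms (assumed model and typical parameter values)."""
    du = k * u * (1.0 - u) * (u - a) - u * v
    dv = (eps0 + mu1 * v / (u + mu2)) * (-v - k * u * (u - a - 1.0))
    return du, dv


def hybrid_step(u, v, laplacian, dt, diffusion=None, residual=None):
    """One explicit Euler step: mechanistic reaction + graph diffusion + optional neural residual.

    diffusion(u): hypothetical learned replacement for the -L u graph-diffusion term.
    residual(u, v): hypothetical neural correction added to du/dt.
    """
    du, dv = aliev_panfilov(u, v)
    diff = -laplacian @ u if diffusion is None else diffusion(u)
    res = np.zeros_like(u) if residual is None else residual(u, v)
    return u + dt * (du + diff + res), v + dt * dv
```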

2606.15640 2026-06-16 cs.LG 新提交

Multi-Agent Framework for Audit Risk Assessment with Explicit Uncertainty and Evidence Conflict Modeling

具有显式不确定性和证据冲突建模的审计风险评估多智能体框架

Yuhan Wang, Manqing Wang, Yixuan Lu, Zhaoyue Peng, Shengda Lin

发表机构 * Columbia University(哥伦比亚大学) Trine University(特林大学) University of Sofia(索菲亚大学) University of Illinois at Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Westcliff University(韦斯特克莱夫大学)

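AI summary: Proposes UMAR, in which three specialized agents assess risk independently with calibrated uncertainty, and Dempster-Shafer theory fuses their scores while measuring conflict. It outperforms baselines on an SEC 10-K dataset and provides interpretable risk signals.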

AI中文摘要

审计风险评估日益受益于结合异质证据源,但现有方法通常产生点预测,而不量化不同证据流的一致程度。我们提出UMAR(不确定性感知多智能体风险评估),一个采用三个专业智能体的框架:MD&A文本智能体、财务比率智能体和CAM智能体,每个智能体产生具有校准不确定性估计的独立风险评分。基于Dempster-Shafer证据理论的不确定性聚合器融合这些分数,同时显式测量智能体间冲突。我们在来自SEC 10-K文件(2019-2023)的3200个公司年观测值的美国数据集上评估UMAR,以财务重述为目标标签。实验结果表明,UMAR的AUROC为0.782,PR-AUC为0.341,优于逻辑回归、XGBoost、FinBERT以及单智能体和双智能体LLM基线。UMAR在所有方法中达到最低的期望校准误差(ECE = 0.052),并识别出与实际重述风险相关的证据冲突模式,为审计师提供潜在可操作且可解释的风险信号。

英文摘要

Audit risk assessment increasingly benefits from combining heterogeneous evidence sources, yet existing approaches typically produce point predictions without quantifying how well different evidence streams agree. We propose UMAR (Uncertainty-Aware Multi-Agent Risk Assessment), a framework that employs three specialized agents: an MD&A Text Agent, a Financial Ratio Agent, and a CAM Agent, each producing independent risk scores with calibrated uncertainty estimates. An Uncertainty Aggregator based on Dempster-Shafer evidence theory fuses these scores while explicitly measuring inter-agent conflict. We evaluate UMAR on a U.S. dataset of 3,200 firm-year observations from SEC 10-K filings (2019-2023), with financial restatement as the target label. Experimental results show that UMAR achieves an AUROC of 0.782 and a PR-AUC of 0.341, outperforming logistic regression, XGBoost, FinBERT, and single-agent and dual-agent LLM baselines. UMAR attains the lowest expected calibration error (ECE = 0.052) among all methods and identifies evidence-conflict patterns that correlate with actual restatement risk, offering auditors potentially actionable and interpretable risk signals.
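Dempster's rule of combination and its conflict mass K can be written in a few lines.

- The frame in this sketch is {risk, no_risk}.
- `to_mass` is a hypothetical mapping from an agent's risk score and uncertainty to masses. The abstract does not specify how UMAR performs this mapping.

```python
from itertools import product

R, N = "risk", "no_risk"
THETA = frozenset({R, N})  # whole frame: mass here expresses "don't know"


def to_mass(score, uncertainty):
    """Hypothetical mapping of (risk score, uncertainty) to a mass function."""
    return {frozenset({R}): (1.0 - uncertainty) * score,
            frozenset({N}): (1.0 - uncertainty) * (1.0 - score),
            THETA: uncertainty}


def combine(m1, m2):
    """Dempster's rule; returns the normalized fused masses and the conflict K."""
    joint, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb               # mass on contradictory focal-element pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in joint.items()}, conflict


def fuse(masses):
    """Fold several sources; overall conflict = 1 - prod(1 - K_step)."""
    fused, kept = masses[0], 1.0
    for m in masses[1:]:
        fused, k = combine(fused, m)
        kept *= 1.0 - k
    return fused, 1.0 - kept
```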

2606.15642 2026-06-16 cs.LG cs.AI 新提交

CIWI-CKT: Chaos-Informed Wave Interference Feature Fusion and Cross-City Knowledge Transfer for Traffic Flow Forecasting

CIWI-CKT:混沌信息波干涉特征融合与跨城市知识迁移用于交通流预测

Abdul Joseph Fofanah, Lian Wen, David Chen, Shaoyang Zhang

发表机构 * Griffith University(格里菲斯大学) School of Information and Communication Technology, Griffith University(格里菲斯大学信息与通信技术学院) School of Information Engineering, Chang’an University(长安大学信息工程学院)

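AI summary: Targets data-scarce cross-city scenarios with CIWI-CKT, which combines chaos-informed wave generation, meta-interference processing, and chaos-aware meta-learning. The framework substantially improves prediction accuracy while reducing data requirements.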

AI中文摘要

在跨城市、数据稀缺的场景下,准确预测交通流仍然具有挑战性,因为有限的历史数据阻碍了模型的泛化能力。交通动态的混沌性质、复杂的时空依赖关系以及异质的城市网络使得跨城市的小样本学习变得复杂。现有的深度学习方法要么将交通视为完全确定性的,要么缺乏对跨体制交通动态至关重要的波状干涉模式进行建模的机制。为了解决这些局限性,本文提出了CIWI-CKT,一种新颖的混沌信息波干涉特征融合框架,结合跨城市知识迁移。我们的框架引入了三个核心创新:混沌信息波生成,提取可测量的混沌不变量并将交通建模为自适应波分量;元干涉处理,捕获支持域和查询域之间的波相互作用,同时生成可预测性分数用于置信度估计;以及混沌感知元学习,在保留混沌特性的同时实现高效的跨城市知识迁移。我们建立了理论保证,包括混沌到波的稳定性、波诱导的降维以及元学习泛化界限。在四个真实世界交通数据集上的大量实验表明,CIWI-CKT显著优于最先进的时空图学习、迁移学习、基于提示和小样本方法,在提高预测精度的同时大幅减少了所需的训练数据。

英文摘要

Accurate traffic flow prediction remains challenging in cross-city, data-scarce scenarios where limited historical data hinders model generalisation. The chaotic nature of traffic dynamics, complex spatio-temporal dependencies, and heterogeneous urban networks complicate few-shot learning across cities. Existing deep learning approaches either treat traffic as purely deterministic or lack mechanisms to model wave-like interference patterns essential for cross-regime traffic dynamics. To address these limitations, this paper proposes CIWI-CKT, a novel Chaos-Informed Wave Interference Feature Fusion framework with Cross-City Knowledge Transfer. Our framework introduces three core innovations: chaos-informed wave generation that extracts measurable chaos invariants and models traffic as adaptive wave components; meta-interference processing that captures wave interactions between support and query regimes while producing a predictability score for confidence estimation; and chaos-aware meta-learning that enables efficient cross-city knowledge transfer while preserving chaotic characteristics. We establish theoretical guarantees including chaos-to-wave stability, wave-induced dimension reduction, and meta-learning generalisation bounds. Extensive experiments on four real-world traffic datasets demonstrate that CIWI-CKT significantly outperforms state-of-the-art spatio-temporal graph learning, transfer learning, prompt-based, and few-shot methods, improving prediction accuracy while substantially reducing required training data.
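The abstract does not list which chaos invariants are extracted. A standard example is the largest Lyapunov exponent, the average logarithmic rate at which nearby trajectories separate.

The sketch computes it for the logistic map rather than for traffic data, because the answer is known:

- For r = 4 the exact value is ln 2 ≈ 0.693 (chaotic).
- For r = 2.5 the orbit settles on the stable fixed point x* = 0.6, where |f'(x*)| = 0.5, giving ln 0.5 < 0.

```python
import math


def logistic_lyapunov(r, x0=0.2, n=20_000, burn_in=1_000):
    """Largest Lyapunov exponent of x -> r x (1 - x), averaged along one orbit."""
    x = x0
    for _ in range(burn_in):                  # discard the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        x = r * x * (1.0 - x)
        # log |f'(x)| with f'(x) = r (1 - 2x); guarded against the measure-zero case f'(x) = 0
        total += math.log(max(abs(r * (1.0 - 2.0 * x)), 1e-300))
    return total / n
```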

2606.15701 2026-06-16 cs.LG q-fin.ST 新提交

Robust Transformer-Based One-Step Stock Index Forecasting via Shifted Data Augmentation

基于移位数据增强的鲁棒Transformer一步股票指数预测

Tien Thanh Thach

发表机构 * Faculty of Mathematics and Statistics, Ton Duc Thang University(孙德胜大学数学与统计学院)

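AI summary: Proposes a modified Transformer combined with cosine-annealing learning-rate scheduling and Shifted Data Augmentation (SDA). On the VN30 and S&P 500 indices it reduces forecasting error and run-to-run variability, outperforming approaches that simply increase model complexity.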

AI中文摘要

Transformer在序列建模中取得了显著成功,但由于噪声信号、短记忆动态和分布偏移,其直接应用于金融时间序列仍具有挑战性。本文提出了一种改进的Transformer架构用于一步股票指数预测,结合了先进的学习率调度和一种新颖的移位数据增强(SDA)技术。我们在两个基准股票指数数据集VN30和S&P 500上评估了所提出的框架。实验结果表明,带预热的余弦退火相比广义逆幂调度器持续提高了预测精度。此外,SDA显著降低了预测误差和运行间变异性,同时提高了对超参数选择的鲁棒性。余弦退火调度与SDA的组合在两个数据集上均取得了最佳性能,表明在基于Transformer的金融预测中,数据增强比增加模型复杂度可以发挥更重要的作用。这些发现为在噪声金融环境中进行鲁棒的股票指数预测提供了一种实用且计算高效的方法。

英文摘要

Transformers have shown remarkable success in sequence modeling, yet their direct application to financial time series remains challenging due to noisy signals, short-memory dynamics, and distributional shifts. This paper proposes a modified Transformer architecture for one-step stock index forecasting, combined with advanced learning-rate scheduling and a novel Shifted Data Augmentation (SDA) technique. We evaluate the proposed framework on two benchmark stock index datasets, VN30 and S&P 500. Experimental results demonstrate that cosine annealing with warmup consistently improves forecasting accuracy over the generalized inverse-power scheduler. Furthermore, SDA substantially reduces forecasting errors and run-to-run variability while improving robustness to hyperparameter selection. The combination of cosine annealing scheduling and SDA achieved the best performance on both datasets, indicating that data augmentation can play a more important role than increasing model complexity in Transformer-based financial forecasting. These findings provide a practical and computationally efficient approach for robust stock index forecasting in noisy financial environments.
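Cosine annealing with linear warmup, the schedule credited above, is commonly defined as in the sketch below.

- The warmup length, peak rate, and floor `min_lr` are hypothetical parameters.
- The paper's exact variant may differ.
- Shifted Data Augmentation is not sketched, because the abstract does not describe its construction.

```python
import math


def cosine_with_warmup(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)             # hold at min_lr after total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```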

2606.15756 2026-06-16 cs.LG cs.AI 新提交

From Correlation to Causation in Lane Change Prediction for Automated Driving: A Causal Explanation Framework

从相关性到因果性:自动驾驶换道预测的因果解释框架

Mohamed Manzour, Aditya Kumar, Augusto Luis Ballardini, Miguel Ángel Sotelo

发表机构 * University of Alcalá(阿尔卡拉大学)

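AI summary: Proposes a causal-inference-based lane-change prediction framework that combines deep structural causal modeling with intervention-based effect analysis. Alongside average F1-scores above 95%, it identifies directly contributing variables and their causal chains, enabling interpretable causal reasoning.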

AI中文摘要

换道预测是智能车辆的核心任务,提前预测操作有助于更安全的决策。然而,现有方法主要学习观测驾驶变量与未来操作之间的统计关联,而忽略了输入变量之间的因果依赖关系。这限制了可解释性,尤其是当纵向间隙、相对纵向速度和碰撞时间(TTC)等物理相关变量被视为独立平坦输入时。本文提出一个基于因果推断的换道预测与解释框架。该方法结合语言特征构建、专家约束的因果发现、基于深度端到端因果推断(DECI)的深度结构因果建模、基于干预的效果分析、反驳测试和递归因果链解释。目标不仅是预测未来操作,还要识别直接贡献于预测的候选变量、影响这些变量的上游因素以及这些效应传播的因果链。该框架在车道标记交叉事件前的前三秒内平均F1分数超过95%。除了预测精度,该框架使用基于干预的效果分析,在学到的因果结构下区分有影响力的变量和弱影响力变量。它进一步区分候选直接贡献者和中介效应,并生成对比性因果链解释,阐明为什么预测的操作更受青睐,而替代操作支持较少。因此,主要贡献是一个机制感知的换道预测流程,从基于相关性的分类转向更可解释的因果推理用于操作预测。

英文摘要

Lane-change prediction is a central task in intelligent vehicles, where early maneuver anticipation can support safer decision-making. However, many existing approaches mainly learn statistical associations between observed driving variables and future maneuvers, while overlooking the causal dependencies among the input variables themselves. This limits interpretability, especially when physically related variables such as longitudinal gap, relative longitudinal velocity, and Time-To-Collision (TTC) are treated as independent flat inputs. This article presents a causal-inference-based framework for lane-change prediction and explanation. The proposed approach combines linguistic feature construction, expert-constrained causal discovery, deep structural causal modeling with Deep End-to-end Causal Inference (DECI), intervention-based effect analysis, refutation testing, and recursive causal-chain explanation. The objective is not only to predict the future maneuver, but also to identify candidate variables that directly contribute to the prediction, the upstream factors influencing them, and the causal chains through which these effects propagate. The framework achieves average F1-scores above 95% during the first three seconds before the lane-marking crossing event. Beyond prediction accuracy, the framework uses intervention-based effect analysis to distinguish influential from weakly influential variables under the learned causal structure. It further distinguishes candidate direct contributors from mediated effects and generates contrastive causal-chain explanations that clarify why the predicted maneuver is favored and why the alternative maneuvers are less supported. The main contribution is therefore a mechanism-aware lane-change prediction pipeline that moves beyond correlation-based classification toward more interpretable causal reasoning for maneuver prediction.
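TTC, one of the physically coupled inputs named above, is a deterministic function of the gap and the closing speed. This illustrates why treating the three as independent flat inputs hides structure.

The toy sketch below computes TTC and an interventional contrast on closing speed whose effect is mediated entirely through TTC.

- "Closing speed" here means follower speed minus leader speed, positive when approaching. This is an assumption.
- The logistic propensity link and its 4 s threshold are hypothetical.
- No causal discovery or DECI model is involved.

```python
import math


def time_to_collision(gap_m, closing_speed_mps):
    """TTC = longitudinal gap / closing speed; infinite when the gap is not shrinking."""
    return gap_m / closing_speed_mps if closing_speed_mps > 0 else math.inf


def lane_change_propensity(ttc_s, threshold_s=4.0):
    """Hypothetical logistic link: propensity rises as TTC drops below threshold_s."""
    z = threshold_s - ttc_s
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))  # overflow-safe form for large TTC (incl. inf)


def do_closing_speed_effect(gap_m, v_low, v_high):
    """Contrast P(LC | do(v = v_high)) - P(LC | do(v = v_low)), mediated via TTC."""
    p_high = lane_change_propensity(time_to_collision(gap_m, v_high))
    p_low = lane_change_propensity(time_to_collision(gap_m, v_low))
    return p_high - p_low
```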

2606.15784 2026-06-16 cs.LG cs.CE 新提交

Bayesian Networks with Latent Time Embedding for Stage-Aware Causal Modeling of Alzheimer's Disease Progression

具有潜在时间嵌入的贝叶斯网络用于阿尔茨海默病进展的阶段感知因果建模

Nguyen Linh Dan Le

发表机构 * Alzheimer's Disease Neuroimaging Initiative(阿尔茨海默病神经影像学倡议) Open Access Series of Imaging Studies(开放获取影像学研究系列)

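AI summary: Proposes BN-LTE, which combines Bayesian networks with latent time embedding and uses AT(N)-cascade constraints to model AD progression. It outperforms baselines on ADNI data and identifies a mid-pseudotime window of amyloid sensitivity.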

Comments 7 pages, 5 figures

AI中文摘要

阿尔茨海默病(AD)的进展通常通过淀粉样蛋白-tau-神经退行性变(AT(N))级联来描述。然而,大多数纵向模型要么将这种级联表示为固定的生物标志物序列,要么表示为黑箱预测任务。这使得难以确定生物学引导的生物标志物关系何时影响未来的区域病理。在本研究中,我们引入了具有潜在时间嵌入的贝叶斯网络(BN-LTE),这是一个用于AD进展阶段感知建模的贝叶斯结构框架。BN-LTE从基线生物标志物谱估计疾病伪时间,并根据生物学上合理的AT(N)排序约束有向依赖关系。然后使用后验样条变结构方程将初始多模态测量与未来的年度区域tau-PET变化联系起来。在使用ADNI数据的重复受试者分离评估中,与包含的预测基线相比,BN-LTE显示出tau进展的强空间重建。除了空间重建,BN-LTE恢复了后验阶段变化的AT(N)约束效应,并识别出淀粉样蛋白敏感性的中期伪时间窗口。该窗口得到模型隐含的g公式对比、根调整AIPW、机制敏感消融以及跨样条和先验规范的鲁棒性分析的支持。总体而言,这些发现将BN-LTE定位为一种贝叶斯结构框架,用于预测tau进展,同时检查观察性纵向神经影像数据中阶段依赖的AT(N)级联机制。我们的代码可在https://github.com/danleneurocom/BN-LTE获取。

英文摘要

Alzheimer's disease (AD) progression is often described through the amyloid-tau-neurodegeneration, or AT(N), cascade. However, most longitudinal models represent this cascade either as a fixed sequence of biomarkers or as a black-box forecasting task. This makes it difficult to determine when biologically guided biomarker relationships influence future regional pathology. In this study, we introduce Bayesian Networks with Latent Time Embedding (BN-LTE), a Bayesian structural framework for stage-aware modeling of AD progression. BN-LTE estimates disease pseudotime from baseline biomarker profiles and constrains directed dependencies according to biologically plausible AT(N) ordering. Posterior spline-varying structural equations are then used to link initial multimodal measurements with future annualized regional tau-PET change. Across repeated subject-disjoint evaluations using ADNI data, BN-LTE shows strong spatial reconstruction of tau progression compared with the included forecasting baselines. Beyond spatial reconstruction, BN-LTE recovers posterior stage-varying AT(N)-constrained effects and identifies a mid-pseudotime window of amyloid sensitivity. This window is supported by model-implied g-formula contrasts, root-adjusted AIPW, mechanism-sensitive ablations, and robustness analyses across spline and prior specifications. Overall, these findings position BN-LTE as a Bayesian structural framework for forecasting tau progression while examining stage-dependent AT(N)-cascade mechanisms in observational longitudinal neuroimaging data. Our code is available at https://github.com/danleneurocom/BN-LTE.
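The abstract cites root-adjusted AIPW without giving details. For reference, the standard augmented inverse-probability-weighted (AIPW) estimator for a binary exposure is sketched below.

- Binarizing the exposure (for example, amyloid-positive versus amyloid-negative) is an assumption.
- The argument names are hypothetical.
- The outcome and propensity models that would supply `mu1`, `mu0`, and `e` are not shown.

```python
import numpy as np


def aipw_ate(y, t, mu1, mu0, e):
    """Augmented inverse-probability-weighted estimate of E[Y(1)] - E[Y(0)].

    y: outcomes; t: binary exposure (0/1); mu1, mu0: outcome-model predictions
    under t=1 and t=0; e: propensity P(t=1 | covariates). Returns (estimate, std. error).
    """
    y, t, mu1, mu0, e = (np.asarray(a, dtype=float) for a in (y, t, mu1, mu0, e))
    psi = (mu1 - mu0
           + t * (y - mu1) / e                 # correction from exposed units
           - (1.0 - t) * (y - mu0) / (1.0 - e))  # correction from unexposed units
    return psi.mean(), psi.std(ddof=1) / np.sqrt(psi.size)
```

The estimator is doubly robust. For example, take a balanced design with a correct propensity of 0.5 and both outcome models set to zero: the inverse-weighting terms alone still recover the true difference in means.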

2606.15807 2026-06-16 cs.LG cs.AI 新提交

Continuous Cross-Domain Traffic State Prediction via Memory-Augmented Graph Liquid Time-Constant Networks

基于记忆增强图液态时间常数网络的连续跨域交通状态预测

Jinrong Xiang, Ming Xu

发表机构 * Software College, Liaoning Technical University(辽宁工程技术大学软件学院)

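AI summary: Proposes the Memory-Augmented Graph Liquid Time-Constant Network (MA-GLTC) for continuous-time cross-domain traffic state prediction. It combines spatio-temporal unit decomposition, graph liquid time-constant dynamics, and a memory-based transfer storage mechanism, and outperforms existing methods on five datasets.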

AI中文摘要

交通状态预测是智能交通系统中的一项基本任务。在实际应用中,一些区域由于感知基础设施不足而面临有限的交通观测,使得跨域知识迁移成为数据稀缺交通预测的重要解决方案。然而,现有的跨域交通预测方法仍面临若干局限,包括粗粒度的源-目标域适应、处理未见目标域模式的能力有限,以及在非规则或异质时间条件下对连续交通动态建模不足。为解决这些问题,本文提出了一种连续跨域交通预测框架,称为记忆增强图液态时间常数网络(MA-GLTC)。具体地,我们首先构建时空单元(STU)将交通网络分解为可迁移的局部单元,实现跨域的细粒度知识对齐。然后,开发了图液态时间常数网络(GLTC)来建模连续时间下图耦合的交通演化。与通用的基于图神经ODE的模型不同,GLTC将图耦合的循环电导引入液态时间常数动态,允许节点状态随泄漏、自适应时间常数和邻域感知反馈而演化。此外,设计了基于记忆的迁移存储(MTS)机制,以保留源域知识、检索匹配的交通模式,并在出现未见状态时更新可靠的目标域模式。在五个公开交通数据集上的实验表明,MA-GLTC在短期和长期预测任务中均持续优于代表性的域内和跨域基线。与次优方法相比,MA-GLTC分别将平均预测误差降低了3.02%、0.33%、8.92%、10.09%和2.11%。

英文摘要

Traffic state prediction is a fundamental task in intelligent transportation systems. In practical applications, some regions suffer from limited traffic observations due to insufficient sensing infrastructure, making cross-domain knowledge transfer an important solution for data-scarce traffic prediction. However, existing cross-domain traffic prediction methods still face several limitations, including coarse-grained source-target adaptation, limited capability in handling unseen target-domain patterns, and insufficient modeling of continuous traffic dynamics under irregular or heterogeneous temporal conditions. To address these issues, this paper proposes a continuous cross-domain traffic prediction framework, termed Memory-Augmented Graph Liquid Time-Constant Network (MA-GLTC). Specifically, we first construct spatio-temporal units (STUs) to decompose traffic networks into transferable local units, enabling fine-grained knowledge alignment across domains. Then, a graph liquid time-constant network (GLTC) is developed to model graph-coupled traffic evolution in continuous time. Different from generic graph neural ODE-based models, GLTC introduces graph-coupled recurrent conductance into liquid time-constant dynamics, allowing node states to evolve with leakage, adaptive time constants, and neighborhood-aware feedback. Furthermore, a Memory-based Transfer Storage (MTS) mechanism is designed to preserve source-domain knowledge, retrieve matched traffic patterns, and update reliable target-domain patterns when unseen states emerge. Experiments on five public traffic datasets demonstrate that MA-GLTC consistently outperforms representative in-domain and cross-domain baselines in both short-term and long-term prediction tasks. Compared with the second-best method, MA-GLTC reduces the average prediction errors on the five datasets by 3.02%, 0.33%, 8.92%, 10.09%, and 2.11%, respectively.
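Liquid time-constant (LTC) cells update each hidden state with an input-dependent conductance that sets both a leak and an effective time constant. The sketch below is an illustration under stated assumptions, not MA-GLTC itself.

- It follows the fused semi-implicit update of the original LTC formulation.
- It adds neighborhood coupling through `adj @ x`, which is an assumption.
- The weight names and shapes are hypothetical.
- The STU and memory components are omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def graph_ltc_step(x, inputs, adj, W_in, W_rec, bias, tau, A, dt):
    """One fused semi-implicit step of a graph-coupled LTC cell.

    x: (N, H) node states; inputs: (N, D); adj: (N, N) row-normalized adjacency;
    W_in: (D, H); W_rec: (H, H); bias, A: (H,); tau: scalar or (H,) base time constant.
    """
    neigh = adj @ x                                     # neighborhood-aware feedback
    f = sigmoid(inputs @ W_in + neigh @ W_rec + bias)   # state-dependent conductance in (0, 1)
    # x_{t+dt} = (x + dt * f * A) / (1 + dt * (1/tau + f)): leak plus adaptive time constant
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))
```

With A > 0 and states starting in [0, A], this update keeps states inside [0, A], because the numerator never exceeds A times the denominator.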

2606.15892 2026-06-16 cs.LG 新提交

Scalar-pathway fidelity improves physical accuracy in short-range equivariant interatomic potentials

标量路径保真度提高短程等变原子间势的物理准确性

Jia Bi, Alin Marin Elena, Samuel Pinilla

发表机构 * Science and Technology Facilities Council(科学技术设施委员会) Diamond Light Source(钻石光源)

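AI summary: Proposes scalar-pathway corrections (PAN pooling and PGS mixers) that refine the scalar channels while leaving the equivariant backbone unchanged. For potentials such as MACE, force errors drop by 22-27% and energy errors by 19-22%, at roughly 5% additional inference FLOPs.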

AI中文摘要

精确的原子间势能实现超越密度泛函理论长度和时间尺度的材料、分子和界面的分子动力学。等变神经网络势能改进了局部几何的表示。然而,其可部署的能量表面最终通过不变的标量通道体现,这些通道的聚合和光谱分辨率相对未充分研究。这里我们使用物理感知邻域(PAN)池化和物理引导光谱(PGS)混合器作为受控的标量路径探针:轻量级、对称性保持的修改,仅作用于\(\ell=0\)通道,同时保持等变张量主干不变。使用MACE作为高体阶机制支架,PAN添加协调敏感幅度调制,而PGS用径向和锥形光谱基增强边和读出标量特征。在金属Ag、共价Si、短程离子LiF/Li--F子集和MD17/rMD17分子上,这种标量路径修正将MACE力误差降低22-27%,能量误差降低19-22%;在带有应力标签的系统上,应力误差降低27-28%,推理FLOPs成本增加约5%。在Allegro和NequIP中方向一致的增益进一步表明该修正可跨不同短程等变主干移植,尽管效果大小仍依赖于架构。这些结果将标量路径保真度确定为短程等变原子间势的一个实用设计维度。

英文摘要

Accurate interatomic potentials enable molecular dynamics of materials, molecules, and interfaces beyond density-functional-theory length and time scales. Equivariant neural network potentials have improved the representation of local geometry. However, their deployable energy surfaces ultimately manifest through invariant scalar channels, whose aggregation and spectral resolution remain comparatively underexamined. Here we use Physics-Aware Neighborhood (PAN) pooling and Physics-Guided Spectral (PGS) mixers as controlled scalar-pathway probes: lightweight, symmetry-preserving modifications that act only on \(\ell=0\) channels while leaving the equivariant tensor backbone unchanged. Using MACE as a high-body-order mechanistic scaffold, PAN adds coordination-sensitive amplitude modulation, whereas PGS augments edge and readout scalar features with radial and tapered spectral bases. Across metallic Ag, covalent Si, a short-range ionic LiF/Li--F subset, and MD17/rMD17 molecules, this scalar-pathway correction reduces MACE force errors by 22--27\% and energy errors by 19--22\%; on systems with stress labels, stress errors decrease by 27--28\%, at approximately 5\% additional inference-FLOPs cost. Directionally consistent gains in Allegro and NequIP further indicate that the correction is portable across distinct short-range equivariant backbones, although effect sizes remain architecture-dependent. These results identify scalar-pathway fidelity as a practical design dimension for short-range equivariant interatomic potentials.
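The PGS mixers are described as adding radial and tapered spectral bases to the scalar features, but the abstract does not give the exact bases. A common choice in equivariant potentials is a sine radial basis, proportional to the order-zero spherical Bessel function, multiplied by a smooth cutoff. The sketch below uses a cosine taper as an assumption.

```python
import numpy as np


def bessel_basis(r, r_cut, n_basis=8):
    """Radial basis sqrt(2/rc) * sin(n*pi*r/rc) / r, tapered smoothly to zero at r_cut.

    r: interatomic distances (> 0); returns an array of shape r.shape + (n_basis,).
    """
    r = np.asarray(r, dtype=float)[..., None]
    n = np.arange(1, n_basis + 1)
    basis = np.sqrt(2.0 / r_cut) * np.sin(n * np.pi * r / r_cut) / r
    taper = 0.5 * (np.cos(np.pi * r / r_cut) + 1.0) * (r < r_cut)  # cosine cutoff
    return basis * taper
```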

2606.15927 2026-06-16 cs.LG New submission

An Exploratory Study of Blood Glucose Estimation from Photoplethysmography Signals using Machine Learning

Ruhani Bhatia, Vijval Ekbote

Affiliations: Indraprastha Institute of Information Technology, Delhi

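AI Summary: Uses smartwatch PPG signals and CGM glucose data to build machine learning models and explore the feasibility of non-invasive glucose estimation; preliminary results indicate some predictive signal, but more data are needed for validation.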

Comments: 7 pages, 3 figures

English Abstract

Diabetes and extreme blood sugar levels are among the major health problems facing people worldwide today. Continuous Glucose Monitoring (CGM) has emerged as an effective technology for managing diabetes and monitoring blood sugar levels. However, it has traditionally been invasive (that is, it requires piercing the skin) and carries risks such as irritation and induration. This highlights the need for accurate, non-invasive CGM methods that can be deployed at scale. With the emergence of various sensing technologies and their integration into wearables such as smartwatches, we can now continuously and non-invasively monitor body signals like the photoplethysmogram (PPG). Continuously monitoring blood glucose with CGMs and PPG signals with a smartwatch makes it possible to collect dense data on both. This opens the possibility of building machine learning and deep learning models that estimate blood glucose levels from PPG signals. In this work, we first present a paired dataset comprising continuous PPG signals from a smartwatch along with glucose values recorded using a CGM device. We also present the results of preliminary experiments performed on this dataset. These preliminary results suggest that some predictive signal may exist, though further exploration with more data from a larger number of individuals is needed. The dataset can be accessed at https://zenodo.org/records/20577959
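A paired dataset like this requires aligning two streams sampled at very different rates. The sketch below shows one plausible approach. It is an assumption, not the authors' documented procedure. Each PPG window is matched to the nearest CGM reading in time, and windows with no reading within a tolerance are dropped. The tolerance `max_gap` is a hypothetical parameter, and timestamps are in seconds.

```python
import bisect

def pair_windows_with_glucose(window_times, cgm_times, cgm_values, max_gap):
    """Match each PPG window timestamp to the nearest CGM reading within max_gap seconds.
    cgm_times must be sorted in ascending order; unmatched windows are dropped."""
    pairs = []
    for t in window_times:
        k = bisect.bisect_left(cgm_times, t)
        best = None  # (gap, index) of the closest reading found so far
        for idx in (k - 1, k):  # the nearest reading is just before or just after t
            if 0 <= idx < len(cgm_times):
                gap = abs(cgm_times[idx] - t)
                if gap <= max_gap and (best is None or gap < best[0]):
                    best = (gap, idx)
        if best is not None:
            pairs.append((t, cgm_values[best[1]]))
    return pairs

# Hypothetical CGM readings every 5 minutes and four PPG window timestamps.
cgm_times = [0, 300, 600]
cgm_values = [100, 110, 120]  # mg/dL
print(pair_windows_with_glucose([10, 290, 450, 1000], cgm_times, cgm_values, max_gap=120))
# [(10, 100), (290, 110)]
```

The window at 450 s is dropped because both neighboring readings are 150 s away, beyond the assumed 120 s tolerance.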

2606.16023 2026-06-16 cs.LG New submission

IBAD: Interpretable Behavioral Anomaly Detection on Human Mobility Data

Bita Azarijoo, John Krumm, Cyrus Shahabi

Affiliations: University of Southern California

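AI Summary: Proposes the IBAD framework, which uses LDA to learn interpretable daily mobility templates and a hierarchical self-supervised model to detect individual behavioral anomalies; experiments on real and synthetic datasets confirm the templates' transferability and robustness.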

English Abstract

Human mobility appears highly diverse, yet much of a person's daily mobility can be explained by a small set of recurring behavioral templates, such as commuting, school-centered activities, caregiving, nightlife, or errand patterns. We present IBAD (Interpretable Behavioral Anomaly Detection), a framework that learns interpretable daily mobility templates and represents each individual as a distribution over mixtures of these templates. Rather than focusing on specific locations, IBAD characterizes the activities that individuals perform across locations. The approach first discovers global behavioral templates using Latent Dirichlet Allocation (LDA). It then employs a hierarchical self-supervised model to learn each individual's normal behavior from their soft behavioral templates. We also introduce a splicing benchmark that creates controlled behavioral mismatches between an individual's historical profile and injected mobility patterns. Experiments on real-world and synthetic datasets show that daily behavior can be effectively decomposed into a small number of interpretable templates. Crucially, we show that the learned behavioral archetypes transfer across distinct geographic and demographic contexts. Furthermore, IBAD maintains robust, competitive performance across all settings. For reproducibility, the code is available at https://github.com/USC-InfoLab/IBAD.
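The sketch below is a simplified illustration of representing people by template mixtures. It scores a day by the Jensen-Shannon divergence between that day's template mixture and the person's average historical mixture. It assumes the per-day mixtures have already been inferred (for example, by LDA). This is a hypothetical baseline, not IBAD's hierarchical self-supervised model. The "spliced" day mimics the benchmark idea of injecting another person's pattern.

```python
import math

def mean_profile(day_mixtures):
    """Average the per-day template mixtures into one individual profile."""
    k = len(day_mixtures[0])
    return [sum(day[i] for day in day_mixtures) / len(day_mixtures) for i in range(k)]

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Symmetric, bounded (by ln 2) divergence between two mixtures."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def anomaly_score(history, day):
    return js_divergence(mean_profile(history), day)

# Hypothetical 3-template mixtures (e.g., commuting, errands, nightlife).
history = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]
typical_day = [0.75, 0.15, 0.10]
spliced_day = [0.05, 0.05, 0.90]  # pattern injected from a different person
print(anomaly_score(history, typical_day) < anomaly_score(history, spliced_day))  # True
```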

2606.16056 2026-06-16 cs.LG cs.HC New submission

Beyond the Blood Draw: Explainable Machine Learning for Non-Invasive Dysglycemia Risk Screening

Black Sun, Chenyi Zhang, Kaiyi Ji, Xi Lu

Affiliations: Department of Computer Science, Aarhus University; University at Buffalo, SUNY

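AI Summary: Trains six machine learning models, including LightGBM, on NHANES data for laboratory-free dysglycemia risk screening; the best model reaches an AUC of 0.820, outperforming conventional risk scores, and identifies age, race/ethnicity, and waist-to-height ratio as key predictors.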

English Abstract

Dysglycemia, encompassing both prediabetes and diabetes, affects a large number of adults worldwide, yet many of them remain undiagnosed. We developed and validated machine-learning (ML) models for non-invasive screening of dysglycemia risk that require no laboratory tests. Pooling data from the National Health and Nutrition Examination Survey (NHANES) 2017-2023 (n=14,352), we trained six ML models with stratified 5-fold cross-validation and compared them with two established clinical risk scores. LightGBM achieved the highest area under the receiver operating characteristic curve (AUC=0.820, 95% CI: 0.806-0.835), outperforming the Finnish Diabetes Risk Score (0.745) and the American Diabetes Association Risk Test (0.783). SHAP analysis identified age, race/ethnicity, and waist-to-height ratio as the most influential predictors. Subgroup analyses confirmed consistent performance across demographic strata (AUC: 0.735-0.832). These results demonstrate the feasibility of explainable, laboratory-free dysglycemia screening for deployment in community settings and self-tracking health applications.
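The evaluation protocol named here (stratified 5-fold cross-validation scored by ROC AUC) can be sketched in plain Python. AUC is computed as the Mann-Whitney probability that a random positive outranks a random negative, with ties counted as half. The waist-to-height helper and the round-robin fold assignment are illustrative assumptions, not the authors' code. In practice, library routines such as scikit-learn's `StratifiedKFold` and `roc_auc_score` would typically be used.

```python
import random

def waist_to_height_ratio(waist_cm, height_cm):
    return waist_cm / height_cm

def roc_auc(labels, scores):
    """Probability that a random positive scores above a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, seed=0):
    """Assign each sample a fold id so that every fold keeps roughly the class balance."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[i] = rank % k  # deal each class's samples round-robin across folds
    return folds

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```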

2606.16160 2026-06-16 cs.LG cs.AI cs.HC New submission

A comparative and critical study of EEGNet for fNIRS-driven cognitive load classification

Mehshan Ahmed Khan, Houshyar Asadi, Li Zhang, Mohammad Reza Chalak Qazani, Ghazal Bargshady, Stefanos Gkikas, Christian Arzate, Sam Oladazimi, Zoran Najdovsk, Lei Wei, Chee Peng Lim

Affiliations: Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University; Department of Computer Science, Royal Holloway, University of London; College of Science and Engineering, James Cook University; Faculty of Science and Technology, University of Canberra; Honda Research Institute (HRI), Japan; Swinburne University of Technology, Hawthorn, Victoria

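AI Summary: Systematically evaluates EEGNet for fNIRS cognitive load classification. Overlapping segmentation with small fixed learning rates performs best under random splits, but accuracy drops sharply under subject-independent (SI) evaluation. Under SI evaluation, non-overlapping segmentation with PCA features achieves the best accuracy of 56.11%, suggesting that removing temporal redundancy helps learn more robust cross-subject representations.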

English Abstract

Accurately classifying cognitive load from functional near-infrared spectroscopy (fNIRS) signals remains a significant challenge due to temporal variability, inter-subject differences, and sensitivity to preprocessing choices. This study provides a comprehensive evaluation of EEGNet for fNIRS-based cognitive load classification. It systematically examines the effects of temporal segmentation strategies (overlapping vs. non-overlapping), window lengths (10s, 20s, 30s), feature extraction methods (Analysis of Variance (ANOVA), Principal Component Analysis (PCA), Fast Independent Component Analysis (FastICA)), learning rate configurations (fixed and adaptive), and evaluation protocols (random split vs. subject-independent (SI)). Results from random-split experiments show that overlapping segmentation combined with smaller fixed learning rates (0.01-0.001) yields the highest accuracies, owing to temporal redundancy and dense sampling of hemodynamic transitions. However, SI evaluation reveals a substantial drop in accuracy, demonstrating limited generalization to unseen participants. Under SI evaluation, non-overlapping segmentation outperformed overlapping windows, with the best accuracy of 56.11% achieved using PCA features with a 20-second window and a 0.1 learning rate. These findings indicate that eliminating temporal redundancy helps the model learn more robust and generalizable representations of cognitive load across individuals. Although the adaptive learning-rate strategy improved training stability, it did not surpass the performance of optimally selected fixed learning rates. The study highlights the critical role of segmentation strategy and learning rate selection in improving model generalization. It also identifies methodological considerations essential for developing reliable, real-time, subject-independent fNIRS-based cognitive load classification systems.
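The contrast between the two evaluation protocols hinges on how windows are cut and split. The sketch below shows fixed-length segmentation with an optional overlap fraction, plus a subject-independent split that holds out whole participants. With 50% overlap, neighboring windows share half their samples. A random split can therefore place near-duplicate windows in both the training and test sets, and holding out subjects avoids that leakage. The function names and the sampling-rate argument `fs` are illustrative assumptions.

```python
def segment(signal, fs, window_s, overlap=0.0):
    """Cut a 1-D signal into windows of window_s seconds at sampling rate fs (Hz).
    overlap is the fraction of each window shared with the next (0 = non-overlapping)."""
    win = int(round(window_s * fs))
    step = max(1, int(round(win * (1.0 - overlap))))
    return [signal[start:start + win] for start in range(0, len(signal) - win + 1, step)]

def subject_independent_split(subject_ids, test_subjects):
    """Return train/test index lists so that no subject appears in both."""
    train = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train, test

samples = list(range(100))  # 100 s of a hypothetical channel sampled at 1 Hz
print(len(segment(samples, fs=1, window_s=20)))               # 5
print(len(segment(samples, fs=1, window_s=20, overlap=0.5)))  # 9
```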

2606.16183 2026-06-16 cs.LG cs.AI cs.CL New submission

LLM-Powered Virtual Population for Demand Simulation and Pricing

Chengpiao Huang, Kaizheng Wang

Affiliations: Columbia University

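AI Summary: Proposes an LLM-powered virtual population model that elicits purchase probabilities from an LLM for a mixture of customer personas, producing a demand distribution that supports risk-aware pricing; it performs best on an H&M dataset.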

Comments: 18 pages, 7 figures

English Abstract

We develop an LLM-powered virtual population model that simulates demand for pricing decisions, in settings where products are described by rich unstructured information, such as text descriptions and images, and where decision makers need not only mean-demand predictions but also uncertainty estimates for counterfactual prices. Our model represents exposed customers as draws from a finite mixture of customer personas. For each persona, product, and candidate price, an LLM elicits a persona-level purchase probability using both structured persona information and unstructured product information. These probabilities are aggregated through calibrated mixture weights to form a predictive distribution of aggregate demand. The resulting simulator can evaluate counterfactual prices under various pricing objectives, including expected revenue and risk-aware criteria such as conditional value at risk. We test the framework on an online H&M fashion dataset with product descriptions and images. The calibrated LLM-based simulator achieves the best overall predictive performance among the models considered, and supports sample-efficient pricing decisions. Our framework provides a practical way to use LLMs as demand simulators for products with limited historical demand data but rich product information. By producing a full predictive demand distribution rather than only a point forecast, it enables managers to compare candidate prices, quantify demand uncertainty, and choose prices that target either average-case revenue or risk-aware objectives.
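The aggregation step can be sketched under simplifying assumptions. Customers are assumed independent: each draws a persona with its calibrated mixture weight and then buys with that persona's probability. Each purchase is therefore Bernoulli with the mixture-averaged probability. The persona probabilities below are hypothetical placeholders for LLM-elicited values. CVaR is taken as the mean of the worst `alpha` fraction of simulated revenues.

```python
import random

def mixture_purchase_prob(weights, persona_probs):
    """Marginal purchase probability of one customer drawn from the persona mixture."""
    return sum(w * p for w, p in zip(weights, persona_probs))

def simulate_demand(n_customers, weights, persona_probs, n_sims, seed=0):
    """Monte Carlo draws of total demand among n_customers exposed customers."""
    rng = random.Random(seed)
    q = mixture_purchase_prob(weights, persona_probs)
    return [sum(rng.random() < q for _ in range(n_customers)) for _ in range(n_sims)]

def lower_cvar(values, alpha):
    """Mean of the worst alpha-fraction of outcomes (lower tail)."""
    ordered = sorted(values)
    k = max(1, int(alpha * len(ordered)))
    return sum(ordered[:k]) / k

weights = [0.5, 0.5]  # hypothetical calibrated persona weights
# Hypothetical LLM-elicited purchase probabilities per candidate price.
probs_by_price = {20.0: [0.30, 0.70], 30.0: [0.15, 0.55]}
for price, probs in probs_by_price.items():
    revenue = [price * d for d in simulate_demand(100, weights, probs, n_sims=2000)]
    print(price, sum(revenue) / len(revenue), lower_cvar(revenue, alpha=0.05))
```

Comparing the mean and the lower-tail CVaR across prices is one way to choose between an average-case target and a risk-aware one.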

2606.16226 2026-06-16 cs.LG New submission

Prediction of Runtime Parameters of Parallel Chemistry Applications via Active and Generative Learning

Tanzila Tabassum, Omer Subasi, Ajay Panyala, Epiya Ebiapia, Gerald Baumgartner, Erdal Mutlu, P Sadayappan, Karol Kowalski

Affiliations: Louisiana State University; Pacific Northwest National Laboratory; University of Utah

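AI Summary: Proposes machine learning approaches based on active and generative learning, combined with gradient-boosted regression tree models, to predict runtime parameters of parallel chemistry computations; on CCSD computations, MAPE is as low as 0.023 and R² as high as 99.9%.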

English Abstract

In this work, we develop two main machine-learning-based approaches to predict the runtime parameters of highly scalable parallel chemistry computations. These approaches combine active and generative learning with gradient-boosted regression tree models, which were empirically selected from a rich suite of machine learning models. When evaluated on Coupled-Cluster with Singles and Doubles (CCSD) computations, our models achieve a mean absolute percentage error (MAPE) as low as 0.023 and a coefficient of determination as high as 99.9%. Furthermore, when combined with active learning to mitigate the lack of large amounts of training data, our models achieve a MAPE of about 0.2 using only 20-25% of the original dataset.
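For reference, the two reported metrics can be computed as below. MAPE is expressed here as a fraction, so 0.023 corresponds to 2.3%; this is an assumption about how the abstract reports it. R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sample runtimes are made up.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction; assumes no zero targets."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

runtimes = [100.0, 200.0, 400.0]   # hypothetical measured runtimes (s)
predicted = [110.0, 180.0, 400.0]  # hypothetical model predictions (s)
print(round(mape(runtimes, predicted), 4))  # 0.0667
print(r_squared(runtimes, predicted))
```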

2606.16434 2026-06-16 cs.LG cs.AI New submission

Autonomous End-to-End SOH Prediction Services for Battery Systems via Temporal-Contrastive Representation Learning

Junting Wen, Dan Li, Qihao Quan, Xiwen Wang, Hang Yang, Zhaohong Meng, Zigui Jiang, Changlin Yang, Tianle Liu, Diego Muñoz-Carpintero, Jian Lou

Affiliations: School of Software Engineering, Sun Yat-sen University; Tianneng Battery Group Co., Ltd; School of Communication Engineering, Hangzhou Dianzi University; Institute of Engineering Science, Universidad de O’Higgins

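AI Summary: Proposes TC-SOH, a modular service architecture that extracts degradation-relevant representations from raw data through a temporal-contrastive mechanism and a cross-window prediction task, enabling autonomous end-to-end SOH prediction; on four datasets it reduces MAPE and RMSE by factors of 1.91 and 2.13, respectively.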

English Abstract

Accurate state of health (SOH) estimation is a critical diagnostic service for lithium-ion battery management. However, reliance on labor-intensive manual feature engineering and opaque black-box models hinders scalable industrial deployment. To address this, we introduce TC-SOH: a modular, plug-and-play service architecture for autonomous, end-to-end SOH prediction. TC-SOH employs a temporal-contrastive mechanism and a cross-window prediction pretext task to extract degradation-relevant representations directly from raw operational data. To improve transparency, we connect model efficacy with representation diagnostics: visualization, sensitivity analysis, redundancy analysis, bidirectional probing, future-SOH probing, and temporal shuffling. These diagnostics show that the learned features overlap with selected expert descriptors while retaining additional SOH-relevant variation, and that ordered temporal context improves subsequent-SOH prediction. Across four public datasets, TC-SOH outperforms the considered physics-informed and data-driven baselines, reducing MAPE by a factor of 1.91 and RMSE by a factor of 2.13.
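A temporal-contrastive objective of this kind is commonly implemented as an InfoNCE loss. Embeddings of two adjacent windows from the same battery form a positive pair, and the other pairs in the batch act as negatives. The sketch below is a generic InfoNCE with cosine similarity. It is an assumption about the mechanism rather than TC-SOH's exact loss, and the temperature of 0.1 is also an assumed value.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchors, positives, temperature=0.1):
    """anchors[i] and positives[i] embed adjacent windows of the same cell;
    positives[j] for j != i serve as negatives for anchors[i]."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the matching pair as target
    return total / len(anchors)

aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)  # True
```

The loss is low when each window's embedding is closest to its temporal neighbor. Shuffling the pairing, as in the temporal-shuffling diagnostic, raises it.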

2606.16580 2026-06-16 cs.LG cs.CV New submission

Multi-Modal Spatio-Temporal Graph Neural Network with Mixture of Experts for Soil Organic Carbon Prediction

Daniele Mos, Felipe Drummond, Anton Bossenbroek, Soufiane el Khinifri

Affiliations: Spatialise B.V.

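AI Summary: Proposes SpTGNN, a multi-modal spatio-temporal graph neural network combining heterogeneous graph attention, fine-tuned foundation-model feature extraction, and sparse mixture-of-experts fusion, with uncertainty quantification from heteroscedastic regression and deep ensembles; it outperforms an XGBoost baseline on three regional datasets.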

Comments: 27 pages, 14 figures, 12 tables

English Abstract

Top-soil organic carbon (SOC) prediction is fundamental to agricultural sustainability, land use policy and fertilization planning. Existing approaches face two limitations: they pair hand-crafted covariates with classical ML or single-modal deep models that miss rich spectral and temporal information, and grid-based architectures ignore the irregular spatial structure of field measurements. We introduce SpTGNN, a multi-modal spatio-temporal graph neural network addressing both. SpTGNN represents soil measurements as nodes in a heterogeneous graph with three edge types (spatial proximity, spectral similarity, elevation), and applies relational graph attention to learn separate patterns per relation. A fine-tuned TerraMind encoder extracts node features from Sentinel-2, Sentinel-1 and DEM signals, combined with per-sample environmental covariates and learned positional and temporal embeddings. A sparse Mixture-of-Experts module fuses the four streams via top-$k$ routing. Uncertainty is captured by pairing heteroscedastic regression (aleatoric) with deep ensembles (epistemic), and a Moran's $I$ penalty regularizes spatial autocorrelation. We evaluate on a global SOC corpus split into three regional instances ($\sim$49k samples globally, Africa $\sim$26k, Europe $\sim$14k). Our 5-member deep ensemble reports $R^2=0.762$, RMSE $=3.51\pm0.48$ g/kg and MAPE $=22.9\%$ on the Africa test split, improving over a tabular XGBoost baseline; the best single checkpoint reaches validation $R^2=0.864$. Ablations confirm the heterogeneous graph, MoE fusion and fine-tuned backbone each contribute substantively, and the ensemble UQ stack achieves post-calibration ECE of $0.031$ (hybrid) and $0.026$ ($\beta$-NLL). To our knowledge, this is the first framework to unify foundation-model feature extraction, heterogeneous graph attention and decomposed uncertainty quantification for SOC estimation.
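Two of the ingredients named here have standard closed forms. They are sketched below in plain Python as background, not as SpTGNN's implementation.

- **Global Moran's I.** It measures spatial autocorrelation of values given a spatial weight matrix. Positive values mean nearby samples have similar residuals. The abstract does not say how it is turned into a training penalty, so no penalty is shown.
- **Deep-ensemble decomposition.** Each ensemble member is assumed to predict a Gaussian mean and variance. The predictive variance then splits into an aleatoric part (the mean of the member variances) and an epistemic part (the variance of the member means).

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

def decompose_ensemble(member_outputs):
    """member_outputs: list of (mean, variance) pairs, one per ensemble member.
    Returns (predictive mean, aleatoric variance, epistemic variance)."""
    means = [m for m, _ in member_outputs]
    mu = sum(means) / len(means)
    aleatoric = sum(v for _, v in member_outputs) / len(member_outputs)
    epistemic = sum((m - mu) ** 2 for m in means) / len(means)
    return mu, aleatoric, epistemic

# Four samples on a line, each linked to its immediate neighbors.
line = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(morans_i([1, 1, -1, -1], line))  # clustered residuals -> positive
print(morans_i([1, -1, 1, -1], line))  # alternating residuals -> negative
print(decompose_ensemble([(1.0, 0.1), (3.0, 0.3)]))
```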

2606.16663 2026-06-16 cs.LG New submission

Beyond Defensive Reporting: Machine Learning for Active Anti-Money Laundering Control in Insurance

Dara Goldar, Geir Kjetil Ferkingstad Sandve, Martin Jullum

Affiliations: Fremtind Insurance; University of Oslo; Norwegian Computing Center

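AI Summary: Uses production data from a Norwegian insurer to train gradient-boosted decision tree models that detect money-laundering claims, with fraud labels as an auxiliary training signal; under the Budget-Weighted Capture Rate metric, the best model captures nearly two-thirds of laundering cases within the top 2-6% of reviewed claims.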

English Abstract

Money laundering through insurance claims threatens insurers through both fraudulent payouts and reputational and regulatory risk. Despite this, little research has examined how such laundering can be prevented. This paper examines whether machine learning can help insurers flag suspicious claims before payout, shifting the focus from passive reporting to active prevention. Using production data from a major Norwegian insurer, we train gradient-boosted decision tree models to detect claims later reported to authorities for suspected money laundering. Because fraud and laundering may share behavioural patterns, we also examine whether insurance fraud labels can serve as an auxiliary training signal. We compare different learning setups using the Budget-Weighted Capture Rate, a metric introduced in this paper to measure how many laundering cases are captured when only a small share of claims can be manually reviewed. The results show that incorporating fraud-related investigation labels substantially improves laundering detection. The best-performing model captures nearly two-thirds of laundering cases within the top-ranked 2 to 6 percent of claims selected for investigation. To our knowledge, this is the first empirical study of machine learning for money laundering detection in insurance claims.
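The abstract does not give the formula for the Budget-Weighted Capture Rate. The sketch below therefore shows only the underlying building block: the share of laundering cases captured when the top-ranked fraction of claims is reviewed. It also shows a plain average over several budgets as a hypothetical stand-in for the weighting. Treat `average_capture` as an assumption, not the paper's metric.

```python
def capture_at_budget(scores, labels, budget_frac):
    """Fraction of positive cases found in the top budget_frac of claims ranked by score."""
    k = max(1, round(budget_frac * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = sum(labels[i] for i in ranked[:k])
    return captured / sum(labels)

def average_capture(scores, labels, budgets=(0.02, 0.04, 0.06)):
    """Hypothetical uniform average over review budgets (not the paper's weighting)."""
    return sum(capture_at_budget(scores, labels, b) for b in budgets) / len(budgets)

# Hypothetical risk scores for ten claims, two of which were later reported.
scores = [0.95, 0.10, 0.80, 0.30, 0.20, 0.70, 0.05, 0.40, 0.15, 0.60]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(capture_at_budget(scores, labels, 0.2))  # 0.5
```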

2606.16961 2026-06-16 cs.LG q-fin.CP New submission

Beyond the Smile: A Hybrid Convolutional VAE for Crypto Volatility Surfaces

Sadanand Singh, Allam Reddy, Manan Chopra

Affiliations: Jasper Research, USA

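AI Summary: Proposes a hybrid predictor combining a convolutional VAE with a quadratic smile re-fit, achieving low RMSE on BTC and ETH options data, substantially outperforming the purely parametric approach, and eliminating calendar and butterfly arbitrage.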

English Abstract

We present a convolutional variational autoencoder for cryptocurrency implied-volatility surfaces, together with a deployable predictor that combines it with a quadratic smile re-fit through a deterministic per-tenor routing rule. The model is trained on 6,034 fully-filled hourly Binance Options surfaces of BTC and ETH spanning May-October 2023 and parameterised on a common $6 \times 7$ tenor-delta grid. It attains a hidden-cell surface-completion RMSE in the 0.94-1.56 vol-point range across both markets at mask rates of 10-50%. The hybrid predictor attains 0.83 vol points at 50% masking against 7.00 for the smile re-fit alone, an eightfold reduction obtained at no additional inference cost. Under structurally-correlated hole patterns that emulate the withdrawal of an entire tenor of strikes, the smile re-fit incurs 9.6-13.1 vol points of error while the learned model remains at 1.5-1.9. This isolates a regime in which the generative model is the only viable predictor. Joint training on BTC and ETH improves the in-distribution model on both markets by 9-27% relative to the better-performing single-symbol counterpart. This indicates a substantially shared vol-surface manifold across the two largest cryptocurrencies over the observation window. The hybrid is calendar- and butterfly-arbitrage-free at the listed strikes, a property that the parametric smile re-fit alone fails to preserve at high mask rates. Without supervision, the per-snapshot reconstruction error of the trained model flags the late-October ETF-anticipation rally and the August 17, 2023 flash crash as elevated-error periods. All training and evaluation infrastructure is released to support reproducible follow-on work.
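The hybrid predictor's two ingredients can be sketched under stated assumptions.

- **Smile re-fit.** `fit_quadratic` is an ordinary least-squares fit of implied volatility against a grid coordinate (here a delta value) within one tenor.
- **Routing.** `hybrid_tenor_row` re-fits the smile when a tenor has at least `min_obs` observed cells and otherwise falls back to the generative model's row. This rule is a hypothetical example; the paper's actual deterministic rule is not given in the abstract.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients (a, b, c) of y = a + b*x + c*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal-equation matrix [X^T X | X^T y].
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def hybrid_tenor_row(grid, observed, model_row, min_obs=3):
    """Fill one tenor row. observed maps grid point -> observed vol;
    model_row holds the generative model's prediction at each grid point."""
    if len(observed) >= min_obs:
        a, b, c = fit_quadratic(list(observed), list(observed.values()))
        return [observed.get(x, a + b * x + c * x * x) for x in grid]
    return [observed.get(x, pred) for x, pred in zip(grid, model_row)]

grid = [0.1, 0.3, 0.5, 0.7, 0.9]                # hypothetical delta coordinates of one tenor
observed = {0.1: 0.65, 0.5: 0.53, 0.9: 0.64}     # hypothetical observed vols
model_row = [0.66, 0.58, 0.54, 0.57, 0.63]       # hypothetical VAE output for this tenor
print(hybrid_tenor_row(grid, observed, model_row))
```

When a whole tenor is withdrawn, `observed` is empty for that row and the fallback branch returns the model's row unchanged.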

2606.17010 2026-06-16 cs.LG New submission

From Tokens to Policy: Causal and Interpretable Heterogeneous Treatment Effects Identification

Riccardo Cadei, Frank Otchere, Nyasha Tirivayi, Gustavo Angeles Tagliaferro, Falco J. Bargagli-Stoffi, Francesco Locatello

Affiliations: ISTA; UNICEF; UCLA

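AI Summary: Proposes NEXIS, which uses multi-modal pre-treatment representations to recast HTE identification as a Markov-blanket discovery problem, enabling causal, interpretable identification of heterogeneous treatment effects; validated on anti-poverty programs in Africa.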

English Abstract

Heterogeneous Treatment Effect (HTE) identification is crucial to explain the impact of an intervention and optimize our policies accordingly. Existing approaches trade expressivity for interpretability, but, if some active heterogeneity drivers are unmeasured, methods at both ends of this spectrum allow for spurious HTE characterization with no causal reading. In this work, we focus on controlled experiments and argue that an oracle HTE causal characterization via the latent interactors is now within reach, thanks to (i) more extensive pre-treatment measurements, i.e., multi-modal and multi-view, and (ii) scalable representations with minimal human supervision. We then re-frame HTE identification as a Markov-blanket discovery problem on a sufficient and aligned pre-treatment representation, and introduce Neural EXposure Interaction Search (NEXIS), an iterative procedure with provable and empirically validated consistent selection. We deploy NEXIS on two anti-poverty programs in Africa, augmenting each with satellite imagery capturing previously unmeasured environmental effect modifiers, leading to novel, interpretable and prescriptive guidelines to optimize the programs' next iterations.
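As background on what a heterogeneous treatment effect is in a randomized experiment, the sketch below uses a textbook estimator, not NEXIS. It estimates the conditional average treatment effect (CATE) within each level of a candidate effect modifier by a difference in means. The gap between levels serves as a simple interaction contrast. The data and the modifier (for example, a satellite-derived indicator) are hypothetical.

```python
def difference_in_means(outcomes, treated):
    """Treated mean minus control mean (unbiased for the ATE under randomization)."""
    t = [y for y, z in zip(outcomes, treated) if z]
    c = [y for y, z in zip(outcomes, treated) if not z]
    return sum(t) / len(t) - sum(c) / len(c)

def subgroup_cates(outcomes, treated, modifier):
    """CATE estimate for each level of a discrete modifier."""
    result = {}
    for level in sorted(set(modifier)):
        idx = [i for i, m in enumerate(modifier) if m == level]
        result[level] = difference_in_means([outcomes[i] for i in idx],
                                            [treated[i] for i in idx])
    return result

# Hypothetical household outcomes; modifier = 1 marks, say, drought-exposed villages.
outcomes = [5.0, 3.0, 6.0, 2.0, 9.0, 3.0, 8.0, 4.0]
treated = [1, 0, 1, 0, 1, 0, 1, 0]
modifier = [0, 0, 0, 0, 1, 1, 1, 1]
cates = subgroup_cates(outcomes, treated, modifier)
print(cates)                # {0: 3.0, 1: 5.0}
print(cates[1] - cates[0])  # 2.0 (interaction contrast)
```

If the true modifier is unmeasured, no subgroup split like this can reveal it. The abstract argues that richer pre-treatment data such as satellite imagery can address exactly this gap.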

2606.14729 2026-06-16 physics.comp-ph cs.LG physics.flu-dyn Cross-list

Machine Learning-Driven Chemical Reactor Network Modeling of the Sandia-D Flame

Nicolas J. Tricard, Benjamin C. Koenig, Sili Deng

Affiliations: Massachusetts Institute of Technology

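AI Summary: To address the high cost of turbulent combustion simulation, proposes a machine-learning-assisted method for automatically constructing equivalent reactor networks (ERNs) that combines principal component analysis, k-means clustering, and gradient-descent optimization; on the Sandia-D flame it achieves a ~6000x speedup and a maximum-temperature R² of 0.7945.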

Comments: 12 pages, 11 figures

English Abstract

Turbulent combustion simulations are crucial for many scientific and engineering systems. However, the high cost to fully resolve the complex multiscale and multiphysics behavior makes direct simulation typically infeasible. The equivalent reactor network (ERN) approach attempts to improve computational efficiency by replacing a multidimensional turbulent simulation with a series of much cheaper 0-D and 1-D chemical reactors. This provides a surrogate model that retains detailed chemistry at the cost of simplified flow physics. However, constructing ERNs remains challenging, often requiring either expert analysis or automated approaches that sacrifice accuracy. In this work, we develop an automated machine-learning-assisted framework for constructing ERNs of the Sandia-D turbulent methane/air flame. Principal component analysis is first used to reduce high-dimensional thermochemical computational fluid dynamics (CFD) data to a low-dimensional latent space. In this space, k-means clustering identifies physically interpretable flame regions used to initialize a reactor-network graph. This initialization is then refined using finite-difference gradient descent wrapped around non-differentiable Cantera reactor simulations. Across 30 RANS simulations spanning a range of pilot temperatures and inlet methane compositions, the optimized 7-reactor ERN achieves a maximum-temperature $R^2$ score of 0.7945 while preserving a $\sim6000\times$ speedup over the CFD solver. Outlet CO prediction remains more challenging, with a final $R^2$ score of $-0.4183$, but improves substantially from the unoptimized clustering initialization. These results show that unsupervised thermochemical feature extraction can provide effective physics-informed initializations for ERN construction. They also show that gradient-based refinement can significantly improve predictive accuracy without manual reactor-network design.
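Because the Cantera reactor simulations are not differentiable, gradients must be approximated by finite differences. The sketch below uses central differences around a black-box objective. `toy_objective` is a hypothetical stand-in for running the reactor network and comparing it with CFD targets. The step size, learning rate, and iteration count are assumptions. Each gradient costs two objective evaluations per parameter, so the cost grows linearly with the number of tuned parameters.

```python
def fd_gradient(f, params, h=1e-4):
    """Central-difference gradient of a black-box scalar function f at params."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += h
        down = list(params)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def fd_gradient_descent(f, params, lr=0.1, steps=200, h=1e-4):
    """Plain gradient descent driven by finite-difference gradients."""
    params = list(params)
    for _ in range(steps):
        g = fd_gradient(f, params, h)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params

def toy_objective(p):
    # Hypothetical stand-in for "simulate the reactor network, return error vs. CFD".
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2

print([round(v, 3) for v in fd_gradient_descent(toy_objective, [0.0, 0.0])])  # [3.0, -1.0]
```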

2606.14734 2026-06-16 q-bio.MN cs.AI cs.LG Cross-list

BRIDGE: Biological Evidence Refinement and Heterogeneous Dynamic Gating for Gene Regulatory Networks

Ziyang Dong, Shanwen Tan, Hengchuang Yin, Wei Liu, Yifan Wang, Siyu Yi, Jiancheng Lv, Wei Ju

Affiliations: College of Computer Science; Sichuan University; Xinjiang Technical Institute of Physics and Chemistry; Chinese Academy of Sciences; School of Mathematics; University of International Business and Economics; School of Artificial Intelligence and Data Science

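AI Summary: Proposes the BRIDGE framework, which robustly infers gene regulatory networks from scRNA-seq data using a co-expression-refined view and heterogeneous gated encoding, achieving state-of-the-art performance on multiple benchmark datasets.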

Comments: 19 pages, 10 figures, 7 tables

English Abstract

Motivation: Gene regulatory network inference from single-cell RNA sequencing (scRNA-seq) data is important for uncovering cell-state-specific transcriptional programs. However, scRNA-seq measurements are sparse and noisy, and experimentally validated TF-target interactions remain limited, making reliable inference challenging. Although graph neural networks have advanced GRN prediction, existing methods often rely on biologically unconstrained graph augmentation, such as random edge perturbation, and insufficiently control information transfer between genes and cells. These limitations may distort regulatory structures and weaken robustness under noisy and weakly supervised settings. Results: To address these issues, we propose an innovative framework named Biological Evidence Refinement and Heterogeneous Dynamic Gating for Gene Regulatory Networks (BRIDGE). BRIDGE extracts gene and cell representations from the expression matrix and its matrix dual, and performs contrastive learning in the gene space and cell space between self and neighbors across the co-expression-refined regulatory view and the original graph. It then applies heterogeneous gated encoding to adaptively regulate information transfer between genes and cells, enabling robust transcription factor-to-target gene prediction. Experiments on benchmark datasets spanning three network types and seven cell types show that BRIDGE achieves state-of-the-art AUROC and AUPRC in most settings. In particular, on Specific networks, BRIDGE improves average AUPRC by 5% over the second-best baseline, GCLink. In cross-cell-type few-shot transfer, BRIDGE consistently outperforms GCLink and GENELink across all six target cell types. A case study on hESC further supports the biological relevance of the predictions, with 9 of the top 10 and 46 of the top 100 novel TF-target interactions validated by ChIPBase.
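The sketch below contrasts a biologically constrained view with random edge perturbation. It keeps only those candidate regulatory edges whose endpoint genes are co-expressed, meaning their absolute Pearson correlation across cells is at least `tau`. It also computes precision at k, the quantity behind "9 of the top 10 validated". The threshold rule is an assumed simplification of BRIDGE's co-expression refinement, and the expression values are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_refined_edges(edges, expression, tau=0.5):
    """Keep (tf, target) edges whose expression profiles across cells are correlated."""
    return [(u, v) for u, v in edges if abs(pearson(expression[u], expression[v])) >= tau]

def precision_at_k(ranked_pairs, validated, k):
    """Share of the top-k predicted pairs that appear in the validated set."""
    return sum(pair in validated for pair in ranked_pairs[:k]) / k

# Hypothetical expression of three genes across five cells.
expression = {
    "TF1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G1": [2.1, 3.9, 6.2, 8.0, 9.8],  # tracks TF1
    "G2": [3.0, 1.0, 4.0, 1.0, 5.0],  # weakly related to TF1
}
print(coexpression_refined_edges([("TF1", "G1"), ("TF1", "G2")], expression))
# [('TF1', 'G1')]
```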

2606.14737 2026-06-16 q-bio.BM cs.LG stat.ML Cross-list

Learning Topological Representations for Molecular Dynamics

Dominik Geng, Florian Graf, Martin Uray, Roland Kwitt

Affiliations: University of Salzburg; Centre for Intelligent and Secure Industrial Automation; University of Applied Sciences

AI总结 提出掩蔽Flood复形用于持久同源性分析,在共享表示空间中实现蛋白质构象的几何感知表征,并在分类、回归和马尔可夫状态模型估计中取得竞争性能。

Comments 20 pages, 4 figures

详情
AI中文摘要

分子动力学(MD)模拟生成高维构型空间中的轨迹,其分析关键依赖于分子描述符,通常是手工设计的可观测量或学习的动力学嵌入。然而,设计既具表达力又广泛适用的描述符仍然具有挑战性。我们研究持久同源性(PH)作为MD的通用表示,并引入掩蔽Flood复形,这是一种针对蛋白质定制的最近提出的单纯复形构造的改进,以低计算成本强调残基间结构。向量化的持久图随后提供信息丰富、几何感知的蛋白质构象摘要,我们在单个共享表示空间中评估其在蛋白质类别预测、帧级可观测回归以及从学习的低维坐标估计马尔可夫状态模型(MSM)上的性能。在mdCATH数据集上的结果表明,基于PH的描述符在各项任务中具有竞争力,其中掩蔽Flood PH产生最一致的整体性能。此外,在最近的MarS-FM框架中,当使用拓扑信息MSM作为蛋白质构象生成建模的直接替代时,我们获得了比基于物理可观测量的MSM更一致的系综统计。最后,我们探索了生成模型向性质不同的快速折叠蛋白质的可迁移性。

English abstract

Molecular dynamics (MD) simulations generate trajectories in a high-dimensional configuration space whose analysis critically depends on molecular descriptors, typically handcrafted observables or learned kinetic embeddings. Designing descriptors that are both expressive and broadly applicable, however, remains challenging. We study persistent homology (PH) as a general-purpose representation for MD and introduce the masked Flood complex, a protein-tailored modification of a recently introduced simplicial complex construction that emphasizes inter-residue structure at low computational cost. Vectorized persistence diagrams then provide information-rich, geometry-aware summaries of protein conformations, which we evaluate on protein class prediction, frame-level observable regression, and Markov state model (MSM) estimation from learned low-dimensional coordinates in a single shared representation space. Results on the mdCATH dataset show that PH-based descriptors are competitive across tasks, with masked Flood PH yielding the most consistent overall performance. Further, when using topologically-informed MSMs as a drop-in replacement within the recent MarS-FM framework for generative modeling of protein conformations, we obtain consistently better ensemble statistics than MSMs based on physical observables. Finally, we explore the transferability of the generative model to qualitatively different, fast-folding proteins.
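Persistent homology summarizes a conformation by recording when topological features appear and disappear as a distance scale grows. The masked Flood complex itself is not reproduced here; as a simpler stand-in, the sketch below computes 0-dimensional (connected-component) persistence of a Vietoris-Rips filtration on a small point cloud, for example residue coordinates (an assumption for illustration).

```python
import itertools
import math

def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) of a Vietoris-Rips filtration.

    Every point is born at scale 0; a component dies at the edge length that
    merges it into another one (Kruskal's algorithm with union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))  # the component that never dies
    return pairs
```

The sorted finite death times, padded to a fixed length, are one simple way to vectorize such a diagram into a feature vector.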

2606.14741 2026-06-16 cs.CV cs.LG Cross-list

HorusEye: Language as Dynamic Attention for Emergency Visual Analysis

HorusEye:语言作为动态注意力用于应急视觉分析

Armel Yara

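Affiliations: Armel Yara

AI summary: Proposes the HorusEye framework, which uses language feedback to dynamically guide visual analysis; evaluates several VLMs in emergency scenarios, finds that the benefit of language feedback is model-dependent, and reveals a cropping paradox in thermal imagery.

Comments 18 pages, 9 figures, 11 tables

AI-generated Chinese abstract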

我们介绍了HorusEye,即语言作为动态注意力用于应急视觉分析。我们的研究分为五个阶段。第一阶段是构建RefCOCO-Degraded基准数据集,包含15,244张图像(3,811张基础图像×4种条件:清晰、雾、烟和热成像),具有系统性的视觉退化。通过四个研究问题,我们评估了多种VLM(Gemini、Qwen2-VL、BLIP-2、LLaVA、Kosmos-2)在视觉定位(第二阶段)、语言反馈恢复(第三阶段)、健康VQA任务(第四阶段)以及幻觉分析(最终阶段)上的表现。我们的关键发现是语言反馈的有效性依赖于模型:Gemini通过迭代语言反馈在热成像条件下提升了47.3%,而Qwen2-VL在相同协议下性能下降了5.1%。我们还发现了“热成像悖论”,即提升RGB性能的裁剪策略在热成像中灾难性地失败。此外,BLIP-2在退化条件下独特地产生更多幻觉,使其不适合应急部署。

English abstract

We introduce HorusEye, Language as Dynamic Attention for Emergency Visual Analysis. Our investigation followed five stages. The first stage constructs RefCOCO-Degraded, a benchmark of 15,244 images (3,811 base images x 4 conditions: Clean, Fog, Smoke, and Thermal) with systematic visual degradation. Through four research questions, we evaluate multiple VLMs (Gemini, Qwen2-VL, BLIP-2, LLaVA, Kosmos-2) on visual grounding (stage two), language-feedback recovery (stage three), health VQA tasks (stage four), and hallucination analysis (the final stage). Our key finding is that the effectiveness of language feedback is model-dependent: Gemini achieves a +47.3% improvement in thermal conditions through iterative language feedback, while Qwen2-VL shows a 5.1% degradation under the same protocol. We also identify a 'Thermal Paradox': cropping strategies that improve RGB performance fail catastrophically on thermal imagery. Furthermore, BLIP-2 uniquely hallucinates more under degradation, making it unsuitable for emergency deployment.

2606.14763 2026-06-16 cs.RO cs.LG math.OC Cross-list

Bayesian Optimization for Learning Nonlinear MPC in Autonomous Agent Navigation

自主智能体导航中学习非线性模型预测控制的贝叶斯优化

Lorenzo Ortolani, Gabriel Voss, Gabriele Beltrami, Francesco Dorati, Tommaso Felice Banfi

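Affiliations: Talos Robotics AI

AI summary: Proposes a map-free framework that combines rolling-horizon planning with nonlinear MPC and uses Bayesian optimization for automatic parameter tuning, achieving efficient navigation on a quadruped robot both in simulation and on hardware.

Comments Published at the IEEE ICRA 2026 Xplore Workshop (Oral), Cross-Disciplinary aspects of Exploration in Robotics, Reinforcement Learning, and Search

AI-generated Chinese abstract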

在动态未知环境中的实时自主导航仍然是移动机器人领域的一个基本挑战。我们提出了一种无地图框架,该框架紧密集成了反应式滚动时域规划与非线性模型预测控制(MPC)。在每个控制周期,构建基于激光雷达的高斯占据表示,并通过A*搜索生成无碰撞轨迹,随后由采用平滑sigmoid障碍屏障的CasADi/IPOPT MPC公式进行跟踪。为了提高对参数敏感性的鲁棒性,我们采用基于树结构Parzen估计器(TPE)的离线贝叶斯优化方案,该方案针对复合导航目标识别出接近最优的控制器参数。此外,使用高斯过程代理分析参数敏感性,并深入了解优化景观。所提出的框架与机器人无关,在仿真中使用Gazebo在Unitree Go2四足机器人上进行评估,随后部署到实体机器人上。实验结果表明,在仿真中调优的参数能有效迁移到硬件上,无需额外调优即可保持相当的性能。完整系统在部署时实现了高达90.0%的导航成功率,并且在仿真环境中评估指标平均提升38.9%。

English abstract

Real-time autonomous navigation in dynamic, unknown environments remains a fundamental challenge for mobile robotics. We propose a map-free framework that tightly integrates reactive rolling-horizon planning with nonlinear Model Predictive Control (MPC). At each control cycle, a LiDAR-based Gaussian occupancy representation is constructed and used to generate collision-free trajectories via A* search, which are then tracked by a CasADi/IPOPT MPC formulation incorporating a smooth sigmoid obstacle barrier. To improve robustness to parameter sensitivity, we adopt an offline Bayesian optimization scheme based on Tree-structured Parzen Estimators (TPE), which identifies near-optimal controller parameters with respect to a composite navigation objective. In addition, a Gaussian Process surrogate is used to analyze parameter sensitivity and provide insight into the optimization landscape. The proposed framework is robot-agnostic and is evaluated on the Unitree Go2 quadruped in simulation using Gazebo, followed by deployment on the physical robot. Experimental results show that parameters tuned in simulation transfer effectively to hardware, maintaining comparable performance without additional tuning. The full system achieves up to a 90.0% navigation success rate when deployed, along with a 38.9% average improvement in the evaluation metrics across simulated environments.
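A smooth sigmoid obstacle barrier keeps the MPC cost differentiable, which gradient-based solvers such as IPOPT require. The exact form used in the paper is not given in the abstract; the following is a minimal sketch, assuming a logistic function of the distance to a circular obstacle with hypothetical `sharpness` and `weight` parameters (the kind of quantities a TPE search could tune).

```python
import math

def sigmoid_barrier(position, obstacle, radius, sharpness=10.0, weight=1.0):
    """Smooth obstacle cost: close to `weight` inside the safety radius, close to 0 far outside.

    Equals weight / (1 + exp(sharpness * (distance - radius))), so it is exactly
    weight / 2 on the radius itself.
    """
    distance = math.dist(position, obstacle)
    z = sharpness * (distance - radius)
    # Numerically stable evaluation of 1 / (1 + exp(z)) for either sign of z
    if z >= 0:
        e = math.exp(-z)
        return weight * e / (1.0 + e)
    return weight / (1.0 + math.exp(z))
```

In a CasADi formulation the same expression would be written with symbolic variables and summed over obstacles and horizon steps in the stage cost.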

2606.14776 2026-06-16 cs.RO cs.LG Cross-list

Deep Learning-Based Lunar Crater Terrain Relative Navigation

基于深度学习的月球陨石坑地形相对导航

Batu Candan, Simone Servadio

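Affiliations: NASA; University of Texas at Austin

AI summary: Proposes a terrain relative navigation algorithm that combines a deep-learning crater detector with an extended Kalman filter; even with an initial position offset of up to 5 km, it reduces the navigation error to a few hundred meters.

AI-generated Chinese abstract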

准确的位置估计对于未来使用自主飞行器实现月球着陆至关重要,尤其是在地形特征稀疏的危险环境中。本文提出了一种地形相对导航(TRN)算法,该算法结合了我们专门为NASA陨石坑检测挑战问题设计的深度学习陨石坑检测器和扩展卡尔曼滤波(EKF)。我们的检测器分析从轨道获取的单目图像中的陨石坑特征,并通过匈牙利分配方法及基于共识的离群点去除方法,识别它们与全球数据库中陨石坑的匹配。然后,估计的测量值用于优化EKF,其中航天器在月心月固(LCLF)参考系中的姿态估计,结合高度辅助信息,约束径向漂移。仿真结果表明,即使航天器偏离实际位置达5公里,TRN也能从这种情况中恢复,将导航误差降低到几百米。需要注意的是,为了保持陨石坑特征的对应关系,必须将图像分辨率和场景中的尺度与检测器训练集分布相匹配。

English abstract

Accurate position estimation is crucial for future lunar landings with autonomous vehicles, especially in hazardous environments with sparse terrain features. In this paper, we propose a terrain relative navigation (TRN) algorithm that combines our deep-learning crater detector, designed specifically for the NASA Crater Detection Challenge problem, with an Extended Kalman Filter (EKF). The detector extracts crater features from monocular images acquired from orbit, and matches with craters in a global database are identified via Hungarian assignment followed by a consensus-based outlier-removal step. The resulting measurements are then used to update an EKF that estimates the spacecraft pose in the Lunar-Centered Lunar-Fixed (LCLF) frame of reference, augmented with altitude-aiding information to constrain radial drift. The simulation results indicate that even when the spacecraft's position estimate is off by up to 5 km, the TRN recovers and reduces the navigation error to a few hundred meters. Note that maintaining crater feature correspondences requires matching the image resolution and in-scene scales to the distribution of the detector's training set.
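The matching step pairs each detected crater with one catalog crater so that the total mismatch is minimal. Below is a minimal sketch, assuming detections have already been projected into the catalog's 2D coordinates and using a fixed distance `gate` as a simple stand-in for the paper's consensus-based outlier removal; `match_craters` is a hypothetical name.

```python
import itertools
import math

def match_craters(detections, catalog, gate):
    """Minimum-total-distance assignment of detected craters to catalog craters.

    Brute force over permutations, which is only practical for a handful of
    craters; a Hungarian solver finds the same optimum in polynomial time.
    Matched pairs farther apart than `gate` are then dropped as outliers.
    """
    if len(detections) > len(catalog):
        raise ValueError("need at least as many catalog craters as detections")
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(catalog)), len(detections)):
        cost = sum(math.dist(detections[i], catalog[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [
        (i, j)
        for i, j in enumerate(best_perm)
        if math.dist(detections[i], catalog[j]) <= gate
    ]
```

The surviving (detection, catalog) pairs are what would then feed the EKF measurement update.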

2606.14788 2026-06-16 cs.SD cs.AI cs.LG eess.AS Cross-list

Unifying Acoustic Features and Text with Multimodal LLMs for Neurodegenerative Screening

统一声学特征与文本的多模态大语言模型用于神经退行性疾病筛查

Qingfeng Zhang, Yuanxiong Guo, Yanmin Gong

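Affiliations: University of Science and Technology of China

AI summary: Proposes NeurMLLM, which fuses spectrograms, MFCCs, and text with a multimodal large language model for fine-grained staging of Alzheimer's and Parkinson's disease, outperforming classical methods and existing LLM-based approaches.

Comments IEEE International Conference on Healthcare Informatics, 2026

AI-generated Chinese abstract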

基于语音的筛查为评估阿尔茨海默病(AD)和帕金森病(PD)等神经退行性疾病提供了一种可扩展且非侵入性的方式,但由于整合异质数据的困难,其分期仍然具有挑战性。本文提出了NeurMLLM,一种用于神经退行性疾病分期的高效多模态生成框架。NeurMLLM首先使用视觉变换器对音频数据的声谱图和梅尔频率倒谱系数进行编码,并将其表示投影到大语言模型(LLM)的嵌入空间中,在那里它们与转录文本和人口统计指令标记连接成一个统一的序列。然后,通过低秩适应使用任务提示对LLM进行指令微调,以自回归方式预测受限的标签标记,从而实现生成式分类。通过在Bridge2AI-Voice数据集上对AD和PD进行细粒度分期评估,我们观察到NeurMLLM取得了强劲的性能,持续优于经典机器学习方法和现有的基于LLM的方法。结果表明,多模态LLM在神经退行性疾病分期中具有巨大潜力,提高了分期准确性并支持可访问的部署。

English abstract

Voice-based screening offers a scalable and non-invasive way to assess neurodegenerative diseases such as Alzheimer's disease (AD) and Parkinson's disease (PD), but their staging remains challenging due to the difficulty of integrating heterogeneous data. This paper presents NeurMLLM, an efficient multimodal generative framework for neurodegenerative disease staging. NeurMLLM first encodes the spectrograms and Mel-frequency cepstral coefficients of audio data with vision transformers and projects their representations into the embedding space of a large language model (LLM), where they are concatenated with transcript and demographic instruction tokens as a single unified sequence. The LLM is then instruction-tuned via Low-Rank Adaptation using task prompts to autoregressively predict a constrained label token, enabling generative classification. By evaluating on the Bridge2AI-Voice dataset for fine-grained staging of AD and PD, we observe that NeurMLLM achieves strong performance, consistently outperforming classical machine learning methods and existing LLM-based approaches. The results show the high potential of multimodal LLMs in neurodegenerative disease staging, improving staging accuracy and supporting accessible deployment.
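Low-Rank Adaptation freezes a pretrained weight matrix W and learns a low-rank update B A, scaled by alpha / r. Here is a minimal pure-Python sketch of that forward pass; the helper names are hypothetical, and NeurMLLM's adapter placement and hyperparameters are not specified in the abstract.

```python
def matvec(matrix, vector):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B are trained.

    Shapes: W is d_out x d_in, A is rank x d_in, B is d_out x rank.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]
```

Because B is typically initialized to zero, fine-tuning starts exactly from the frozen model's behaviour.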

2606.14874 2026-06-16 physics.data-an cs.LG nucl-ex Cross-list

Peak-Based Nuclide Identification in HPGe γ-Spectrometry with Machine Learning and SHAP

基于峰值的HPGe γ能谱机器学习与SHAP核素识别

Samuel Emmons, Kelly Truax, Maurice Lonsway, Bruce Pierson, Brian Archambault

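Affiliations: University of California, Berkeley; Lawrence Berkeley National Laboratory

AI summary: Proposes machine learning models that map analyst-fitted photopeaks to nuclide identification results; on experimental spectra with isotope combinations drawn from 65 isotopes they reach an F1 score of 0.97 versus 0.84 for traditional software, and SHAP explanations show that the models rely on physically relevant peaks.

Comments 25 pages, 11 figures (plus an additional 6 figures in the appendix), and 3 tables. To be published in Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment

AI-generated Chinese abstract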

高纯锗伽马能谱通常需要领域专家进行耗时分析。谱中的光电峰被仔细拟合,并采用数值方法辅助核素识别(NID)和定量。修改分析软件识别的核素列表可能很复杂。因此,当需要分析大量样品时,及时做出正确决策具有挑战性。基于监督机器学习的NID可以作为专家知识驱动的自动化工具,改进向分析人员建议的初始放射性核素集合,并更有效地推动后续定量。为此,我们实现了机器学习模型,将分析人员仔细拟合的光电峰映射到NID结果,用于包含从65种同位素集合中抽取的各种同位素组合的实验谱。最佳模型达到了0.97的F1分数,显著超过了使用包含模型评估的相同65种同位素的核素库进行比较时传统软件达到的0.84的F1分数。最后,我们使用Shapley加法解释说明了模型预测的最重要输入特征。这些解释表明,模型在对核素库中的同位素进行预测时使用了物理相关的光电峰。

English abstract

High-purity germanium gamma spectra often require time-consuming analyses from subject matter experts. Photopeaks within these spectra are carefully fitted and numerical methods are employed to assist with nuclide identification (NID) and quantification. Amending the list of nuclides identified by analysis software can be nontrivial. When many samples need to be analyzed, it is therefore challenging to make timely and correct decisions. Supervised machine-learning-based NID can serve as an expert-informed, automated tool to improve the initial set of radionuclides suggested to an analyst and more effectively drive subsequent quantification. To that end, we implemented machine learning models that map photopeaks carefully fitted by analysts to NID results for experimental spectra containing various isotopic combinations drawn from a set of 65 isotopes. The best model achieved an F1 score of 0.97, markedly surpassing the F1 score of 0.84 achieved by traditional software when compared using a nuclide library comprising the same 65 isotopes assessed by the models. Finally, we illustrated the most important input features for model predictions using Shapley Additive Explanations. These explanations revealed that the models use physically relevant photopeaks when making predictions for the isotopes in our nuclide library.
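Nuclide identification is a multi-label task, since each spectrum may contain several isotopes. The abstract does not say how F1 is averaged; the following minimal sketch assumes micro-averaging over all (spectrum, nuclide) decisions, with illustrative nuclide labels.

```python
def micro_f1(predicted, actual):
    """Micro-averaged F1 over spectra; each item is a set of nuclide labels."""
    tp = fp = fn = 0
    for pred, true in zip(predicted, actual):
        tp += len(pred & true)   # nuclides correctly identified
        fp += len(pred - true)   # nuclides reported but absent
        fn += len(true - pred)   # nuclides present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting {Cs-137, Co-60} and {Am-241} against truths {Cs-137} and {Am-241, Eu-152} gives 2 true positives, 1 false positive, and 1 false negative, so F1 = 2/3.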

2606.15023 2026-06-16 physics.flu-dyn cs.LG Cross-list

Multiscale Hypersonic Boundary Layer Reconstruction via Spectral Binning and Subdomain-wise Conditional Diffusion

基于频谱分箱和子域条件扩散的高超声速边界层多尺度重构

Hojin Kim, Dibyajyoti Chakraborty, Takahiko Toki, Carlo Scalo, Romit Maulik

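Affiliations: School of Mechanical Engineering, Purdue University; College of Information Sciences and Technology, Pennsylvania State University; Mathematics and Computer Science Division, Argonne National Laboratory

AI summary: Proposes a multiscale probabilistic reconstruction framework that infers near-wall states from limited top-wall observations with a conditional diffusion model, using a soft overlap inpainting strategy and a bounded spectral loss to reconstruct the full hypersonic Couette flow field.

Comments 33 pages, 28 figures

AI-generated Chinese abstract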

我们提出了一个用于高超声速库埃特流的多尺度概率重构框架,其中通过条件扩散模型从有限的顶部壁面观测推断近壁状态。边界层被划分为重叠的壁法向子域,并联合训练一个高度和马赫数条件的阐明扩散模型(EDM),用于M=6,7,8,以采样以顶部壁面边界切片为条件的速度、密度、压力和温度场。一种软重叠修复策略将子域预测组装成全体积重构,同时保持子域间的连续性和小尺度变异性。为了提高生成场的频谱保真度,我们引入了一种新颖的有界分箱频谱功率(BSP)损失,该损失保留高波数内容,同时在扩散噪声调度中保持数值稳定。与直接数值模拟数据的验证表明,该模型在所有训练马赫数下恢复了瞬时结构、频谱、统计剖面、相关性和壁面量,同时提供了空间结构化的不确定性估计。重构的马赫数条件剖面也在Trettel-Larsson变换下坍缩,表明与可压缩性缩放的一致性。这些结果确立了具有有界分箱频谱损失的域分解条件扩散模型作为高超声速壁面湍流中近壁重构的有效概率代理。

English abstract

We propose a multiscale probabilistic reconstruction framework for hypersonic Couette flow, where near-wall states are inferred from limited top-wall observations using a conditional diffusion model. The boundary layer is divided into overlapping wall-normal subdomains, and a single height- and Mach-conditioned Elucidating Diffusion Model (EDM) is trained jointly for M=6,7,8 to sample velocity, density, pressure, and temperature fields conditioned on a top-wall boundary slice. A soft overlap inpainting strategy assembles subdomain predictions into full-volume reconstructions while maintaining inter-subdomain continuity and small-scale variability. To improve the spectral fidelity of the generated fields, we introduce a novel bounded binned spectral power (BSP) loss that preserves high-wavenumber content while remaining numerically stable across the diffusion noise schedule. Validation against direct numerical simulation data shows that the model recovers instantaneous structures, spectra, statistical profiles, correlations, and wall quantities across all training Mach numbers, while providing spatially structured uncertainty estimates. The reconstructed Mach-conditioned profiles also collapse under the Trettel-Larsson transformation, indicating consistency with compressibility scaling. These results establish the domain-decomposed conditional diffusion model with a bounded binned spectral loss as an effective probabilistic surrogate for near-wall reconstruction in hypersonic wall-bounded turbulence.
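The bounded binned spectral power (BSP) loss compares how energy is distributed across wavenumbers rather than point by point. Its exact definition is not given in the abstract; the sketch below is a hypothetical 1D analogue that sums DFT power within wavenumber bins and penalizes squared log-differences, with `eps` keeping the logarithm finite for empty bins. It is not the paper's loss and it ignores the noise-schedule weighting.

```python
import cmath
import math

def power_spectrum(signal):
    """One-sided power |X_k|^2 of a real 1D signal via a direct O(n^2) DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))) ** 2
        for k in range(n // 2 + 1)
    ]

def binned_spectral_loss(generated, reference, num_bins=4, eps=1e-8):
    """Mean squared log-difference of power summed within wavenumber bins.

    Comparing logarithms keeps weak high-wavenumber bins from being swamped
    by the energetic low-wavenumber ones.
    """
    pg, pr = power_spectrum(generated), power_spectrum(reference)
    edges = [round(i * len(pg) / num_bins) for i in range(num_bins + 1)]
    loss = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        g = sum(pg[lo:hi])
        r = sum(pr[lo:hi])
        loss += (math.log(g + eps) - math.log(r + eps)) ** 2
    return loss / num_bins
```

Adding high-wavenumber content that the reference lacks raises the loss only through the bin that contains it, which is the kind of spectral sensitivity the abstract describes.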

2606.15213 2026-06-16 quant-ph cs.LG Cross-list

Quantum-classical hybrid models based on error correction for time series forecasting

基于纠错机制的量子-经典混合模型用于时间序列预测

Jonathan H. A. de Carvalho, Filipe C. de L. Duarte, Fernando M. de Paula Neto, Paulo S. G. de Mattos Neto

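AI summary: Proposes the first error-correction-based quantum-classical hybrid forecasting system, in which quantum models extract patterns and classical models capture the remaining patterns from the quantum models' errors, achieving the best results on most problems.

Comments Submitted to Nature Computational Science. 24 pages, 10 figures

AI-generated Chinese abstract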

时间序列预测很大程度上受益于结合不同模型的优势,特别是使用一种方案,其中一个模型通过从预测误差中捕获补充模式来纠正另一个模型。同时,量子模型通过在混合架构中与经典模型一起作用,为增强经典能力提供了手段,包括在时间序列预测中。在这项工作中,我们提出了第一个基于纠错的预测系统,该系统联合使用量子模型和经典模型。在这里,量子模型首先通过探索量子现象提取模式,然后经典模型从量子误差中捕获剩余模式。与经典单一模型和基于纠错的经典-经典混合模型相比,这种量子-经典系统产生的互补能力在大多数处理的问题中提供了最佳结果。因此,这项工作为在时间序列预测的既定混合方案中引入量子模型铺平了道路。

English abstract

Time series forecasting largely benefits from combining the strengths of different models, especially using a scheme where a model corrects another model by capturing supplementary patterns from forecasting errors. Concurrently, quantum models are providing a means to augment the classical capacity, including in time series forecasting, by acting alongside classical models in hybrid architectures. In this work, we propose the first forecasting system based on error correction that jointly uses quantum and classical models. Here, quantum models first extract patterns by exploring quantum phenomena, and classical models capture the remaining patterns from the quantum errors. Compared to classical single models and classical-classical hybrid models based on error correction, the complementary capacity that emerges from this quantum-classical system provided the best results in most of the addressed problems. Therefore, this work paves the way to introduce quantum models in established hybridization schemes for time series forecasting.
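The error-correction scheme can be read as three steps: forecast with a first model, model that forecast's residuals with a second model, and add the two predictions. Here is a minimal sketch under stated assumptions: purely classical stand-ins are used for both stages (a least-squares trend in place of the quantum model, an AR(1) fit on its residuals as the corrector), so nothing here reflects the quantum circuits in the paper.

```python
def fit_linear_trend(series):
    """Least-squares intercept a and slope b of y = a + b * t."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    b = sxy / sxx
    return y_mean - b * t_mean, b

def hybrid_forecast(series):
    """One-step forecast = first-stage forecast + forecast of its error."""
    n = len(series)
    a, b = fit_linear_trend(series)  # first stage (stand-in model)
    residuals = [y - (a + b * t) for t, y in enumerate(series)]
    # Second stage: AR(1) coefficient minimizing sum (e_t - phi * e_{t-1})^2
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, n))
    den = sum(e * e for e in residuals[:-1])
    phi = num / den if den > 1e-12 else 0.0
    return (a + b * n) + phi * residuals[-1]
```

On a series like t + (-1)^t, the trend alone misses the alternating component, and the residual model recovers much of it.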

2606.15234 2026-06-16 eess.SP cs.CE cs.LG Cross-list

Surrogate-Assisted Framework for SI-Compliant Interconnect Design Optimization Using the Earth Mover's Distance

基于推土机距离的SI合规互连设计优化代理辅助框架

Emre Ecik, Werner John, Julian Withöft, Ralf Brüning, Jürgen Götze

Affiliations: Information Processing Lab, TU Dortmund University; Pyramide2525/TU Dortmund University; EMC Technology Center Paderborn, Zuken GmbH

AI summary: Proposes a deterministic, machine-learning-assisted framework based on the Earth Mover's Distance: surrogate models predict waveform features, a decision tree screens for SI-compliant designs, and the EMD ranks them, enabling interpretable and efficient PCB interconnect optimization.

Comments 16 pages, 15 figures. This manuscript has been submitted to Advances in Radio Science for review (2026)

AI-generated Chinese abstract

本文提出一种基于推土机距离(EMD)的确定性机器学习辅助框架,用于SI合规的PCB设计。与依赖迭代黑盒搜索过程的传统代理优化方法不同,本方法采用可解释的顺序评估策略。首先使用神经代理模型根据拓扑相关设计参数高效预测波形描述特征。然后,决策树作为物理驱动的质量门,根据预定义的SI标准识别SI合规波形。在得到的有效解空间中,采用推土机距离作为相似性度量,根据候选设计与理想参考信号的接近程度对其进行排序。这不仅能够确定性地识别可接受的参数区域,而且无需逆建模或随机搜索过程即可透明地优先选择物理上更优的解。通过大规模仿真DDR3飞越波形数据集验证了该方法。通过结合代理预测、可解释分类和基于EMD的波形评估,该框架为基于AI方法的PCB开发提供了可解释且计算高效的替代传统优化策略的方案。

English abstract

This work presents a deterministic, machine-assisted framework for SI-compliant PCB design based on the Earth Mover's Distance (EMD). In contrast to conventional surrogate-based optimization methods that rely on iterative black-box search procedures, the proposed approach follows an interpretable, sequential evaluation strategy. Neural surrogate models are first used to efficiently predict waveform-describing features from topology-dependent design parameters. A decision tree then acts as a physically motivated quality gate that identifies SI-compliant waveforms according to predefined SI criteria. Within the resulting valid solution space, the Earth Mover's Distance is employed as a similarity metric to rank candidate designs according to their proximity to an ideal reference signal. This enables not only the deterministic identification of admissible parameter regions but also a transparent prioritization of physically superior solutions without inverse modeling or stochastic search procedures. The methodology is demonstrated using a large-scale set of simulated DDR3 fly-by waveforms. By combining surrogate prediction, interpretable classification, and EMD-based waveform evaluation, the framework provides an explainable and computationally efficient alternative to conventional optimization strategies for supporting PCB development with AI-based methods.
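In one dimension the Earth Mover's Distance reduces to the L1 distance between cumulative distributions, which makes ranking many candidates cheap. Below is a minimal sketch, assuming each waveform has already been mapped to a nonnegative, sampled histogram on a common time grid (how the paper turns voltages into distributions is not stated in the abstract); candidate names are illustrative.

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two nonnegative 1D histograms on a unit grid.

    Both inputs are normalized to unit mass; in 1D the EMD equals the L1
    distance between the two cumulative distributions.
    """
    sp, sq = sum(p), sum(q)
    cumulative_diff, distance = 0.0, 0.0
    for a, b in zip(p, q):
        cumulative_diff += a / sp - b / sq
        distance += abs(cumulative_diff)
    return distance

def rank_candidates(candidates, reference):
    """Candidate names sorted from closest to farthest from the reference signal."""
    return sorted(candidates, key=lambda name: emd_1d(candidates[name], reference))
```

Unlike a pointwise error, the EMD grows with how far mass must be shifted in time, so a slightly delayed edge is penalized less than a distant one.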

2606.15251 2026-06-16 cs.RO cs.AI cs.LG Cross-list

Driving, Fast or Slow? Neuro-Symbolic Guidance for Motion Prediction in Multi-Modal Ground Mobility

驾驶,快或慢?多模态地面移动中运动预测的神经符号引导

Simon Kohaut, Felix Divo, Julius Hahnewald, Benedict Flade, Julian Eggert, Kristian Kersting, Devendra Singh Dhami

Affiliations: Artificial Intelligence and Machine Learning Lab, TU Darmstadt; Honda Research Institute; Hessian Center for AI (hessian.AI); Centre for Cognitive Science; German Center for AI (DFKI); Uncertainty in Artificial Intelligence Lab, TU Eindhoven

AI summary: Proposes TraCS, a neuro-symbolic framework that encodes traffic rules as probabilistic first-order logic to make black-box motion predictors more interpretable and rule-compliant, consistently improving state-of-the-art backbones on Argoverse 2.

AI-generated Chinese abstract

准确且可解释的异构交通空间(包括行人、自行车、汽车和卡车)运动预测对于安全的自主导航至关重要。然而,最先进的方法仍然是黑盒,缺乏对现实世界移动的监管和行为约束的显式编码。我们提出Trajectory Compliance-Shaping (TraCS),一种神经符号框架,通过可解释的概率一阶逻辑增强现有的黑盒运动预测骨干网络。为此,TraCS采用智能体代码生成流水线,弥合交通规则的自然语言描述与概率运动预测之间的差距。此外,TraCS采用反应式数据流推理引擎,随着场景演变维护并高效更新合规性景观。为防止TraCS过度自信地将骨干网络的预测引导到错误方向,我们提出一种神经置信度评分,作为上下文感知的合规性信号衰减。我们在Argoverse 2基准上展示了TraCS如何持续改进最先进的预测骨干网络,表明概率和符号合规性推理是纯神经运动预测的广泛适用且计算高效的补充。

English abstract

Accurate and interpretable motion prediction for heterogeneous traffic spaces, including pedestrians, bicycles, cars, and trucks, is essential for safe autonomous navigation. Nevertheless, state-of-the-art approaches remain predominantly black-box, lacking explicit encoding of the regulatory and behavioral constraints of real-world mobility. We propose Trajectory Compliance-Shaping (TraCS), a neuro-symbolic framework that augments existing black-box motion prediction backbones with interpretable and probabilistic first-order logic. To do so, TraCS employs an agentic code-generation pipeline to bridge the gap between natural-language descriptions of traffic regulations and probabilistic motion prediction. Furthermore, TraCS employs a reactive data-streaming inference engine that maintains and efficiently updates compliance landscapes as scenes evolve. To prevent TraCS from overconfidently steering the backbone's predictions in the wrong direction, we propose a neural confidence rating learned as a context-aware attenuation of the compliance signal. We demonstrate on the Argoverse 2 benchmark how TraCS consistently improves state-of-the-art prediction backbones, showing that probabilistic and symbolic compliance reasoning is a broadly applicable and computationally efficient complement to purely neural motion predictors.
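The confidence rating decides how strongly the symbolic compliance signal may reshape the backbone's predictions. Below is a hypothetical sketch of one such attenuation; the exponent form `compliance ** confidence` is an illustrative assumption, not TraCS's learned rating, and `reweight_modes` is a made-up name.

```python
def reweight_modes(mode_probs, compliance, confidence):
    """Rescale a backbone's trajectory-mode probabilities by rule compliance.

    compliance[i] in (0, 1] is the probability that mode i satisfies the
    traffic rules; confidence in [0, 1] attenuates that signal
    (0 = ignore the rules, 1 = apply them fully).
    """
    weights = [p * c ** confidence for p, c in zip(mode_probs, compliance)]
    total = sum(weights)
    return [w / total for w in weights]
```

With confidence 0 the backbone's distribution is returned unchanged, which is the safe fallback when the compliance signal may point in the wrong direction.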

2606.15356 2026-06-16 physics.flu-dyn cs.LG Cross-list

ShipNet: A Geometric Deep Learning Surrogate for Real-Time Ship Hydrodynamics

ShipNet:一种用于实时船舶水动力学的几何深度学习代理模型

Kirsten Odendaal, George Drakoulas

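Affiliations: Maritime Research Institute, Wageningen, Netherlands; Damen Research, Gorinchem, Netherlands

AI summary: Proposes ShipNet, a geometric deep-learning surrogate that predicts pressure distributions and wave fields directly from hull geometry and speed, reaching R² of 0.98 and 0.91 on a held-out test set with inference more than 550x faster than a potential-flow solver.

AI-generated Chinese abstract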

准确预测水动力性能是船舶设计的核心,然而高保真计算流体动力学在大规模参数探索中仍然过于昂贵。这促使开发数据驱动的代理模型,以显著降低的成本提供对水动力预测的快速近似。我们提出ShipNet,一种几何深度学习代理模型,直接从船体几何和速度预测船体表面压力分布和远场自由表面波模式。该网络在船体点云上采用正则化动态图卷积主干,并使用多头解码器同时输出近体压力和自由表面高程。训练数据包括使用势流面板法对两种母型游艇船体生成的420次无粘自由表面模拟,每种船体参数化为70种变体并在三种速度下评估。ShipNet使用结合逐点回归和图像结构项的复合损失预测每点压力系数和二维波浪高程图。在几何保留测试集上,ShipNet对船体压力达到R²=0.98,对波浪场达到R²=0.91。每个案例推理约需0.15秒,在传统硬件上相比势流求解器实现超过550倍的加速。局限性包括受限的几何和速度范围以及无粘训练数据,未来工作将通过物理信息正则化将模型扩展到高保真粘性模拟。

English abstract

Accurate prediction of hydrodynamic performance is central to ship design, yet high-fidelity computational fluid dynamics remains prohibitively expensive for large-scale parametric exploration. This motivates the development of data-driven surrogate models that provide rapid approximations to hydrodynamic predictions at substantially reduced cost. We present ShipNet, a geometric deep-learning surrogate that predicts both hull-surface pressure distributions and far-field free-surface wave patterns directly from hull geometry and speed. The network employs a regularized dynamic graph convolutional backbone on hull point clouds, with a multi-head decoder for simultaneous near-body pressure and free-surface elevation outputs. Training data consist of 420 inviscid free-surface simulations generated using a potential-flow panel method for two parent yacht hulls, each parameterized into 70 variants and evaluated at three speeds. ShipNet predicts per-point pressure coefficients and a two-dimensional wave-elevation map using a composite loss that combines point-wise regression and image-structure terms. On a geometry-held-out test set, ShipNet achieves R^2=0.98 for hull pressure and R^2=0.91 for wave fields. Inference requires approximately 0.15s per case, yielding over a 550x speedup relative to the potential-flow solver on conventional hardware. Limitations include the restricted geometry and speed ranges and the inviscid training data, while future work will extend the model to high-fidelity viscous simulations with physics-informed regularization.
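A dynamic graph convolution builds a k-nearest-neighbour graph over the point cloud (recomputed from the current features at each layer) and aggregates edge features with a symmetric function such as max. Below is a minimal sketch of one simplified layer, using the identity in place of the learned edge MLP and omitting ShipNet's regularization; both function names are hypothetical.

```python
import math

def knn_graph(points, k):
    """Indices of the k nearest neighbours of each point, excluding the point itself."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in others[:k]])
    return neighbours

def edge_conv_max(features, neighbours):
    """Simplified EdgeConv: per-channel max of (x_j - x_i) over the neighbours of i.

    A real EdgeConv layer passes [x_i, x_j - x_i] through a learned MLP before
    taking the max; the identity is used here so the example needs no weights.
    """
    out = []
    for i, nbrs in enumerate(neighbours):
        xi = features[i]
        out.append([max(features[j][c] - xi[c] for j in nbrs) for c in range(len(xi))])
    return out
```

Rebuilding the graph from the layer's output features, rather than from the input coordinates, is what makes the convolution "dynamic".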

2606.15370 2026-06-16 cs.CV cs.LG Cross-list

MNet++: Extended 2D/3D Networks for Anisotropic Medical Image Segmentation

MNet++: 用于各向异性医学图像分割的扩展2D/3D网络

Kirsten Odendaal, Rade Bajic

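Affiliations: School of Computing, Georgia Institute of Technology

AI summary: Reproduces and extends MNet, a hybrid 2D/3D convolutional network, adding adaptive fusion gating and a VMamba state-space module that improve segmentation performance while preserving robustness to anisotropy.

AI-generated Chinese abstract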

本工作展示了MNet的完整复现与扩展,MNet是一种专为各向异性医学图像分割设计的混合2D/3D卷积网络。在nnU-Net框架内重新实现了原始架构,以验证其报告的性能和对可变体素间距(即各向异性)的鲁棒性。在匹配的预处理和计算约束下,在PROMISE前列腺MRI和LiTS肝脏CT的受控子集上进行了实验。复现的MNet在PROMISE上达到了89.0 +/- 0.9%的Dice相似系数(DSC),与已发表结果相差0.8%,在LiTS上肝脏和肿瘤分割分别达到94.3 +/- 1.9%和54.6 +/- 3.1%。进一步引入了两种轻量级扩展:(1) 一种学习的融合门控机制,实现自适应2D-3D特征融合;(2) 一个VMamba状态空间模块,用于高效的长程深度建模。空间门控变体以不到3%的推理开销将DSC提高了+0.8%,而VMamba提高了性能一致性,将PROMISE Dice变异降低至+/- 0.7%,并在LiTS肝脏上达到最强性能,Dice为95.8%。两种扩展均保持了MNet对各向异性的鲁棒性,在1-4 mm体素间距下Dice变化为1.5%。总体而言,该研究证实了MNet的可复现性,并表明自适应融合和状态空间建模有潜力进一步增强各向异性条件下的分割可靠性。然而,需要进一步测试才能得出明确结论。

English abstract

This work demonstrates a full reproduction and extension of MNet, a hybrid 2D/3D convolutional network designed for anisotropic medical image segmentation. The original architecture was re-implemented within the nnU-Net framework to verify its reported performance and robustness to variable voxel spacing, known as anisotropy. Experiments were conducted on PROMISE prostate MRI and a controlled subset of LiTS liver CT under matched preprocessing and compute constraints. The reproduced MNet achieved a Dice similarity coefficient (DSC) of 89.0 +/- 0.9% on PROMISE, within 0.8% of the published result, and 94.3 +/- 1.9% / 54.6 +/- 3.1% for liver and tumor segmentation on LiTS, respectively. Two lightweight extensions were further introduced: (1) a learned Fusion Gating mechanism enabling adaptive 2D-3D feature blending, and (2) a VMamba state-space module for efficient long-range depth modelling. The Spatial Gating variant improved DSC by +0.8% with less than 3% inference overhead, while VMamba improved performance consistency, reducing PROMISE Dice variation to +/- 0.7% and achieving the strongest LiTS liver performance at 95.8% Dice. Both extensions preserved MNet robustness to anisotropy, with ΔDice = 1.5% across 1-4 mm voxel spacing. Overall, the study confirms MNet reproducibility and demonstrates that adaptive fusion and state-space modelling have the potential to further strengthen segmentation reliability under anisotropic conditions. However, further tests are required to provide definitive conclusions.
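A learned fusion gate lets the network decide how much to trust 2D in-plane features versus 3D volumetric features. Below is a minimal sketch with a single scalar gate logit; in practice a gate would be predicted per location from the features themselves, and that predictor is omitted here as an assumption.

```python
import math

def gated_fusion(feat_2d, feat_3d, gate_logit):
    """Blend 2D and 3D features with a sigmoid gate g: g * f2d + (1 - g) * f3d."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * a + (1.0 - g) * b for a, b in zip(feat_2d, feat_3d)]
```

A zero logit averages the two branches, while large positive or negative logits select one branch almost exclusively; the sigmoid keeps this choice differentiable.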

2606.15449 2026-06-16 cs.CL cs.IR cs.LG Cross-list

Transfer Learning for FHIR Questionnaire Terminology Binding

面向 FHIR 问卷术语绑定的迁移学习

Maxim Gorshkov

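Affiliations: Department of Computer Science, Stanford University

AI summary: Treats binding FHIR Questionnaire items to LOINC codes as a retrieval problem and compares six methods: BioLORD has the best top-1 accuracy, while contrastive fine-tuning is better at top-5 and top-10; the paper also analyzes distribution shift and error types.

AI-generated Chinese abstract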

电子预授权工作流要求 FHIR 问卷项携带 LOINC 代码,但 HL7 Da Vinci CDS-Library 中的大多数项缺乏这些绑定。我们将其视为一个检索问题:给定问卷项的文本,从 97,314 个活跃代码池中找到正确的 LOINC 代码。我们在一个包含 54 个项的评估集上比较了六种方法(TF-IDF、冻结 MiniLM、BioBERT、BioLORD、对比微调 MiniLM 以及 TF-IDF+GPT 重排序器),该评估集涵盖三种查询风格(自然问题、中等和简洁)。没有单一方法在所有指标上获胜。BioLORD 是一个在生物医学本体定义上预训练的冻结编码器,尽管没有见过任务特定数据,但其 top-1 准确率最高(R@1 = 0.185,MRR = 0.246),而在原始 LHC-Forms 对上的对比微调则在 R@5(0.389)和 R@10(0.426)上表现最佳。分布偏移消融实验表明,为什么我们主表中的微调不是最强的:在原始对中添加 GPT 生成的释义后,R@5 从 0.389 降至 0.296,因此增强联合在除 R@1 外的所有指标上均不如仅使用原始训练。性能在 5k 训练对时达到峰值。对 BioLORD 的 R@1 失败案例的错误分析表明,错误特异性和歧义文本案例共占错误的 59%。

English abstract

Electronic prior authorization workflows require FHIR Questionnaire items to carry LOINC codes, yet most items in the HL7 Da Vinci CDS-Library lack these bindings. We treat this as a retrieval problem: given a Questionnaire item's text, find the correct LOINC code in a pool of 97,314 active codes. We compare six methods (TF-IDF, frozen MiniLM, BioBERT, BioLORD, contrastively fine-tuned MiniLM, and a TF-IDF+GPT reranker) on a 54-item evaluation set spanning three query styles (natural question, medium, and terse). No single method wins on every metric. BioLORD, a frozen encoder pre-trained on biomedical ontology definitions, has the best top-rank accuracy (R@1 = 0.185, MRR = 0.246) despite seeing no task-specific data, while a contrastive fine-tune on raw LHC-Forms pairs takes R@5 (0.389) and R@10 (0.426). A distribution-shift ablation shows why the fine-tune in our main table is not the strongest one: adding GPT-generated paraphrases to the raw pairs drops R@5 from 0.389 to 0.296, so the augmented union underperforms raw-only training on every metric except R@1. Performance peaks at 5k training pairs. Error analysis on BioLORD's R@1 failures shows that wrong-specificity and ambiguous-text cases together account for 59% of errors.
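The reported R@k and MRR follow the usual retrieval definitions. The short sketch below uses placeholder code strings rather than real LOINC codes.

```python
def recall_at_k(rankings, gold, k):
    """Fraction of queries whose gold code appears among the top-k retrieved codes."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)

def mean_reciprocal_rank(rankings, gold):
    """Average of 1 / rank of the gold code, counting 0 when it is not retrieved."""
    total = 0.0
    for ranked, g in zip(rankings, gold):
        if g in ranked:
            total += 1.0 / (ranked.index(g) + 1)
    return total / len(gold)
```

MRR rewards placing the correct code first, which is why a method can lead on R@1 and MRR while another leads on R@5 and R@10.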

2606.15559 2026-06-16 cs.SE cs.DC cs.LG Cross-list

SDVDiag: Multimodal Causal Discovery for Online Diagnosis in Software-defined Vehicles

SDVDiag:软件定义车辆中用于在线诊断的多模态因果发现

Matthias Weiß, Athreya Hosahalli Prakash, Falk Dettinger, Nasser Jazdi, Michael Weyrich

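Affiliations: University of Erlangen-Nuremberg; Fraunhofer Institute for Software and Virtual Systems

AI summary: Proposes SDVDiag, a multimodal causal-discovery pipeline that fuses log and metric representations to build causal graphs and uses an anomaly trigger for continuous online diagnosis; on an automated valet parking testbed it yields sparser causal graphs and locates root causes accurately.

Comments 8 pages, 4 figures, 2 tables

AI-generated Chinese abstract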

向软件定义车辆的转变将越来越多的车辆功能集中到分布式软件服务中,故障通过服务依赖关系传播,表面症状通常与潜在缺陷相隔多个因果跳。现有方法仅部分解决此类系统中的因果根因分析:它们通常基于单一可观测性模态进行推理,并以离线、操作员驱动的方式运行,无法满足连续车辆运行的需求。本文提出SDVDiag,一种多模态因果发现管道,在图构建之前将基于日志和基于指标的服务表示融合到共享嵌入空间中,并结合异常驱动触发器,将诊断平台从手动操作的批处理工具转变为持续运行的在线系统。在自动代客泊车测试平台上的评估表明,多模态管道生成的因果图比仅基于指标的基线更稀疏(平均134条边 vs. 182条边),并且在人工反馈优化的每个阶段,基于专家知识图的边加权奖励始终优于基线,在60次反馈查询后比基线提高了2.4倍。端到端故障注入场景进一步证明,集成触发器正确恢复了位于可观察症状上游两个因果跳的真实根因。

English abstract

The transition toward software-defined vehicles concentrates an increasing share of vehicle functionality into distributed software services, where failures propagate through service dependencies and the surface symptom is often several causal hops away from the underlying defect. Existing approaches to causal root-cause analysis in such systems address this only partially: they typically reason over a single observability modality and operate in an offline, operator-driven mode that does not match the demands of continuous vehicle operation. This paper presents SDVDiag, a multimodal causal-discovery pipeline that fuses log-based and metric-based service representations into a shared embedding space before graph construction, coupled with an anomaly-driven trigger that converts the diagnostic platform from a manually operated batch tool into a continuously running online system. Evaluation on an Autonomous Valet Parking testbed shows that the multimodal pipeline produces sparser causal graphs than a metrics-only baseline (134 vs. 182 edges on average) and consistently outperforms it in edge-weighted reward against an expert knowledge graph at every stage of human-feedback refinement, showing a 2.4-fold improvement over the baseline after 60 feedback queries. An end-to-end fault-injection scenario further demonstrates that the integrated trigger correctly recovers a true root cause located two causal hops upstream of the observable symptom.
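The anomaly-driven trigger is what turns the batch diagnostic into an online one: causal discovery runs only when something looks wrong. The abstract does not describe the detector; the following minimal sketch assumes a simple z-score rule on a single service metric, with a hypothetical threshold.

```python
import statistics

def anomaly_trigger(history, value, threshold=3.0):
    """True when `value` deviates more than `threshold` standard deviations
    from the recent history of a service metric."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0:
        return value != mean  # flat history: any change counts as anomalous
    return abs(value - mean) / std > threshold
```

In a pipeline like the one described, a True result would start causal discovery over the recent window of logs and metrics.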

2606.15565 2026-06-16 cs.HC cs.LG Cross-list

If These Walls Could Talk: Critical Play with Large Language Models in Museums

如果这些墙会说话:博物馆中大语言模型的批判性游戏

Anders Sundnes Løvlie

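Affiliations: The Dalí Museum

AI summary: Addresses the tension that LLM chatbots in museums are unreliable yet engaging, and proposes designing for critical play in which the bots appear as fictional characters representing historical narratives, styles of discourse, and diverse perspectives.

AI-generated Chinese abstract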

大语言模型(LLM)越来越多地被用于博物馆中,作为角色扮演聊天机器人,让参观者与模拟的历史人物和文物对话。虽然这样的装置可以有趣且引人入胜,但它们也存在问题,因为LLM无法被信任说出真相。我指出了在博物馆聊天机器人中使用LLM的一个基本困境:LLM无法被信任说出真相,而使其更可靠的努力可能会破坏这些机器人最初吸引人的地方——它们进行逼真对话的能力。对此,我提出设计基于LLM的机器人的批判性游戏:设计与之进行游戏性互动,这些机器人虽然不可靠,但仍能以适当且引人入胜的方式呈现过去——作为代表历史叙事、话语风格、多元视角、幽默和讽刺的虚构角色。

English abstract

Large Language Models (LLMs) are increasingly being used in museums as role-playing chatbots that let visitors talk to simulated versions of people and artefacts from the past. While such installations can be playful and engaging, they are also problematic because LLMs cannot be trusted to speak truthfully. I identify a fundamental dilemma for the use of LLMs in museum chatbots: LLMs cannot be trusted to tell the truth, and efforts to make them more reliable may ruin what makes the bots attractive in the first place, namely their ability to engage in life-like conversation. In response, I propose designing for critical play with LLM-based bots: designing for playful interactions with bots that are unreliable but still able to represent the past in an adequate and engaging manner, as fictional characters representing historical narratives, styles of discourse, diverse perspectives, humor, and satire.

2606.15594 2026-06-16 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 交叉投稿

Pixels to Proofs: Probabilistically-Safe Latent World Model Control via Parallel Conformal Robust MPC

从像素到证明:通过并行保形鲁棒MPC实现概率安全的潜在世界模型控制

Devesh Nath, Anutam Srinivasan, Haoran Yin, Ruitong Jiang, Jeffrey Fang, Glen Chou

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 提出SLS^2框架,结合保形预测与鲁棒模型预测控制,在学习的潜在世界模型中实现基于视觉的安全运动规划,提升目标到达性能与安全性。

AI中文摘要

我们提出了SLS^2,一个使用鲁棒模型预测控制(MPC)在学习的潜在世界模型中进行安全反馈运动规划的框架。我们的方法训练了一个动作条件的联合嵌入世界模型,具有紧凑的马尔可夫潜在状态,通过学习的潜在动力学实现高效的基于梯度的轨迹优化。为了在潜在预测不完美的情况下确保真实系统的安全性,我们采用保形预测来通知GPU加速的系统级综合(SLS)鲁棒MPC方案,以获得校准的潜在误差界限和鲁棒的潜在空间约束集。我们还学习并保形化了一个潜在约束检查器,使SLS规划器能够在闭环执行期间施加概率安全约束。我们在基于视觉的控制任务上评估了我们的方法,与潜在世界模型和安全规划基线相比,它提高了目标到达性能和安全性。

英文摘要

We present SLS^2, a framework for safe feedback motion planning from pixels using robust model predictive control (MPC) in learned latent world models. Our approach trains an action-conditioned joint-embedding world model with compact Markovian latent states, enabling efficient gradient-based trajectory optimization through learned latent dynamics. To enforce safety for the true system despite imperfect latent predictions, we inform a GPU-accelerated system level synthesis (SLS) robust MPC scheme with conformal prediction to obtain calibrated latent error bounds and robust latent-space constraint sets. We further learn and conformalize a latent constraint checker, allowing the SLS planner to impose probabilistic safety constraints during closed-loop execution. We evaluate our method on vision-based control tasks, where it improves both goal-reaching performance and safety over latent world-model and safe-planning baselines.
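
Conformal prediction is used above to turn imperfect latent predictions into calibrated error bounds. The sketch below shows the generic split-conformal quantile applied to latent residual norms, plus a tightened constraint check. The disk-shaped latent obstacle and the function names are hypothetical; this does not reproduce the paper's GPU-accelerated SLS robust MPC formulation.

```python
import math


def conformal_radius(residual_norms, alpha=0.1):
    """Split-conformal bound on latent prediction error.

    residual_norms: ||z_pred - z_true|| on a held-out calibration set.
    Returns the ceil((n + 1)(1 - alpha))-th smallest residual; under exchangeability,
    a fresh residual falls at or below it with probability >= 1 - alpha.
    """
    n = len(residual_norms)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for the requested coverage
    return sorted(residual_norms)[k - 1]


def tightened_safe(z_pred, obstacle_center, obstacle_radius, r):
    """Hypothetical disk constraint in latent space, tightened by the conformal radius r."""
    return math.dist(z_pred, obstacle_center) > obstacle_radius + r


# Usage: 10 calibration residuals, 90% target coverage.
r = conformal_radius([0.02, 0.05, 0.01, 0.04, 0.03, 0.06, 0.02, 0.07, 0.05, 0.03], alpha=0.1)
print(r)  # 0.07  (k = ceil(11 * 0.9) = 10, i.e. the largest residual)
print(tightened_safe((1.0, 0.0), (0.0, 0.0), 0.5, r))  # True
```

Tightening each constraint by the calibrated radius is what lets a planner working only on predicted latents still reason about the true system with a stated probability.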

2606.15694 2026-06-16 cs.MM cs.AI cs.CV cs.LG 交叉投稿

MAF: Multimodal Adaptive Few-shot Prompting for Sentiment Analysis with MLLMs

MAF: 面向情感分析的多模态自适应少样本提示方法

Hangling Xie

发表机构 * Nanjing University of Posts and Telecommunications(南京邮电大学)

AI总结 提出MAF框架,通过动态检索与查询相关的多模态示例,利用轻量级系数生成网络实时融合多模态相似度,结合多数投票提升MLLM在情感分析中的性能。

AI中文摘要

多模态大语言模型(MLLMs)在理解复杂多模态内容方面展现了卓越的能力。然而,它们在情感分析中的性能对提示设计高度敏感,导致静态、统一应用的提示本质上无法捕捉不同输入中变化的细微多模态线索。为了解决这一局限性,我们提出了一种多模态自适应少样本提示(MAF)框架,该框架动态检索并整合与查询相关的示例,以上下文敏感的方式激发MLLM的情感推理能力。MAF构建了一个示例检索模块,整体编码面部表情、场景上下文和文本语义,并引入唇部运动幅度检测机制以在多人物场景中准确识别说话者。与传统的固定权重融合不同,我们训练了一个轻量级系数生成网络,实时输出查询条件的融合权重,从而实现多模态相似度分数的加权聚合,以检索最具信息量的前K个示例。通过MLLM生成的多个候选输出进行多数投票,进一步增强了预测稳定性。在公开基准数据集上的大量实验表明,MAF相比相应的骨干变体取得了显著且一致的性能提升,并与强大的多模态情感分析基线保持竞争力。

英文摘要

Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in understanding complex multimodal content. However, their performance in sentiment analysis exhibits acute sensitivity to prompt design, rendering static, uniformly applied prompts inherently suboptimal for capturing the nuanced multimodal cues that vary across inputs. To address this limitation, we propose a Multimodal Adaptive Few-Shot Prompting (MAF) framework, which dynamically retrieves and integrates query-relevant demonstrations to elicit the sentiment reasoning capabilities of MLLMs in a context-sensitive manner. MAF constructs a demonstration retrieval module that holistically encodes facial expressions, scene context, and textual semantics, with a lip movement amplitude detection mechanism introduced for accurate speaker identification in multi-person scenarios. Departing from conventional fixed-weight fusion, a lightweight coefficient generation network is trained to output query-conditioned fusion weights in real time, enabling weighted aggregation of multimodal similarity scores to retrieve the top-K most informative demonstrations. Prediction stability is further enhanced through majority voting over multiple candidate outputs generated by the MLLM. Extensive experiments on public benchmark datasets demonstrate that MAF achieves substantial and consistent performance improvements over the corresponding backbone variants and remains competitive with strong multimodal sentiment-analysis baselines.
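
The retrieval step above fuses per-modality similarities with query-conditioned weights, keeps the top-K demonstrations, and then majority-votes over several MLLM outputs. A minimal sketch follows. It assumes the coefficient network's output can be represented as one logit per modality, normalized with a softmax. The modality order (face, scene, text) and the toy similarity values are hypothetical.

```python
import math
from collections import Counter


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_top_k(weight_logits, sims_per_demo, k):
    """Rank demonstrations by a query-conditioned weighted sum of per-modality similarities.

    weight_logits: one logit per modality for this query (a stand-in for the output of
                   the coefficient generation network).
    sims_per_demo: {demo_id: [face_sim, scene_sim, text_sim]} similarities to the query.
    """
    w = softmax(weight_logits)
    scores = {d: sum(wi * si for wi, si in zip(w, sims)) for d, sims in sims_per_demo.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def majority_vote(labels):
    """Most frequent label among several sampled MLLM outputs (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]


sims = {"a": [1.0, 1.0, 0.0], "b": [0.0, 0.0, 0.9], "c": [0.5, 0.5, 0.5]}
print(retrieve_top_k([0.0, 0.0, 10.0], sims, k=2))  # ['b', 'c']  (text-dominated query)
print(retrieve_top_k([10.0, 0.0, 0.0], sims, k=2))  # ['a', 'c']  (face-dominated query)
print(majority_vote(["positive", "negative", "positive"]))  # positive
```

The same demonstration pool yields different top-K sets depending on which modality the query-specific weights emphasize, which is the point of replacing fixed-weight fusion.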

2606.15696 2026-06-16 cs.AI cs.CL cs.LG 交叉投稿

Do LLMs Reliably Identify Correct Information Units in Aphasic Discourse?

LLMs 能否可靠识别失语症语篇中的正确信息单元?

Jason M Pittman, Yesenia Medina-Santos, Anton Phillips, Brielle C. Stark

发表机构 * Indiana University Bloomington(印第安纳大学布卢明顿分校)

AI总结 研究评估指令微调大语言模型在零样本和少样本提示下对失语症语篇进行词级正确信息单元分类的性能,发现少样本提示可提升效果但一致性仍不足。

Comments 5 tables, 4 figures

AI中文摘要

正确信息单元(CIUs)是失语症语篇评估的核心,因为它们量化了交际信息性而非仅语言形式。然而,CIU评分耗时且需要训练有素的评分者。本研究考察了指令微调的大语言模型(LLMs)是否能够可靠地从失语症语篇转录中进行词级CIU分类。使用Cat Rescue刺激引发的16个图片描述转录根据Nicholas和Brookshire(1993)的标准进行CIU状态标注。样本涵盖四个严重程度层:对照组、轻度、中度和重度失语症。在零样本和两种少样本提示条件下,对四个公开可用的指令微调LLMs进行了基准测试,使用五个分层随机种子。通过准确率、精确率、召回率、F1和Cohen's kappa与人类共识标签进行性能评估。零样本提示在所有模型中均不足。相比之下,少样本提示带来了显著提升,并为三个可行模型产生了有竞争力的性能。Llama-3.1-8B、Qwen2.5-7B和Mistral-7B的平均少样本F1分数范围为0.776至0.817,固定全局和逐块局部示例选择之间无显著差异。Phi-3-mini不稳定且未产生可靠性能。可行模型显示出高召回率但较低的精确率,表明系统性地过度将词元分类为CIU。性能也随语篇严重程度变化,在更严重的失语症中结果最弱。少样本LLM提示可以在无需基于梯度的任务训练的情况下支持自动CIU识别,但与人类标注的一致性仍不足以完全自主使用。这些发现支持基于LLM的CIU评分作为语篇评估系统中一个有前景的人机协同组件。

英文摘要

Correct Information Units (CIUs) are central to discourse assessment in aphasia because they quantify communicative informativeness rather than linguistic form alone. However, CIU scoring is time intensive and requires trained raters. This study examined whether instruction-tuned large language models (LLMs) can reliably perform token-level CIU classification from aphasic discourse transcripts. Sixteen picture-description transcripts elicited with the Cat Rescue stimulus were annotated for CIU status according to Nicholas and Brookshire (1993). The sample spanned four severity strata: control, mild, moderate, and severe aphasia. Four publicly available instruction-tuned LLMs were benchmarked under zero-shot and two few-shot prompting conditions across five stratified random seeds. Performance was evaluated against consensus human labels using accuracy, precision, recall, F1, and Cohen's kappa. Zero-shot prompting was insufficient across models. In contrast, few-shot prompting yielded substantial gains and produced competitive performance for three viable models. Mean few-shot F1 scores ranged from 0.776 to 0.817 across Llama-3.1-8B, Qwen2.5-7B, and Mistral-7B, with no significant differences between fixed global and per-chunk local example selection. Phi-3-mini was unstable and did not yield reliable performance. Viable models showed high recall but lower precision, suggesting systematic over-classification of tokens as CIUs. Performance also varied by discourse severity, with the weakest results in more severe aphasia. Few-shot LLM prompting can support automated CIU identification without gradient-based task training, but agreement with human annotation remains insufficient for fully autonomous use. These findings support LLM-based CIU scoring as a promising human-in-the-loop component of discourse assessment systems.
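
The evaluation above scores token-level CIU labels against human consensus with precision, recall, F1 and Cohen's kappa. Below is a small self-contained sketch of those metrics for binary labels (1 = CIU). The toy labels are invented to mimic the reported pattern of high recall with lower precision. It is not the study's evaluation code.

```python
def token_agreement(gold, pred):
    """Precision, recall, F1 and Cohen's kappa for binary CIU labels (1 = CIU, 0 = not CIU)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    n = len(gold)
    tn = n - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0

    p_observed = (tp + tn) / n
    # Chance agreement from each rater's marginal rate of labelling a token as a CIU.
    gold_pos, pred_pos = (tp + fn) / n, (tp + fp) / n
    p_chance = gold_pos * pred_pos + (1 - gold_pos) * (1 - pred_pos)
    # Kappa is undefined when both raters use a single class throughout; report 1.0 then.
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance < 1 else 1.0
    return precision, recall, f1, kappa


# Over-classification: every true CIU is found, but two non-CIU tokens are also flagged.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 1, 1, 0, 0, 1, 1]
print(token_agreement(gold, pred))  # precision ~0.667, recall 1.0, F1 0.8, kappa 0.5
```

The example shows why kappa is reported alongside F1: raw agreement is 75%, but once chance agreement is removed, kappa is only 0.5.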

2606.15831 2026-06-16 cs.AI cs.LG cs.NE cs.SY eess.SY 交叉投稿

An Integrated System for Real-Time Student Assessment and Career Guidance Using Neural Networks in Computing Disciplines

基于神经网络的计算学科实时学生评估与职业指导集成系统

Sakir Hossain Faruque, Md. Jubair Hossain, Sharun Akter Khushbu

发表机构 * Daffodil International University(达福尔国际大学) Barishal Engineering College(巴里什尔工程学院)

AI总结 针对计算机专业学生职业路径选择困难,提出集成职业指导专家系统与网络评估平台的AI驱动系统,采用多层感知器模型实现94.71%的职业路径预测准确率。

Comments 25 pages, 24 figures

AI中文摘要

许多计算机科学(CS)和软件工程(SWE)专业的本科生在确定合适的职业道路时面临困难,尤其是当他们的学业表现、能力和兴趣不完全匹配时。为了解决这一问题,本研究提出了一种AI驱动的学生评估与职业预测系统,该系统集成了职业指导专家(CGE)系统和基于网络的学生评估(WBSA)平台。在集成框架内,CGE利用AI增强个性化职业推荐,同时帮助毕业生根据其技能和兴趣确定合适的工作、研究领域和深造机会。WBSA平台通过评估、个性化任务、导师活动和安全的实时聊天应用程序进一步加强了学生与教师之间的互动。CGE系统采用多层感知器(MLP)模型,该模型使用滚雪球抽样法从大学学生中收集的真实学术和课外数据进行训练,在预测个性化职业路径方面达到了94.71%的验证准确率。在部署前,跨大学进行了预调查以评估所提出的模型。WBSA系统作为现代Web应用程序开发,使用了Node.js、Next.js和PostgreSQL等技术,以确保可扩展性、响应性和安全的数据管理。整个系统由安全的云基础设施支持,该平台提供可靠的性能,同时帮助毕业生在IT领域选择合适的职业道路。此外,还进行了一项涉及学生和教师的后期调查,以收集反馈并进一步提高系统的整体有效性和可用性。

英文摘要

Many undergraduate students in Computer Science (CS) and Software Engineering (SWE) struggle to identify suitable career paths, particularly when their academic performance, abilities, and interests do not fully align. To address this issue, this study proposes an AI-driven Student Assessment and Career Prediction System that integrates a Career Guidance Expert (CGE) system with a Web-Based Student Assessment (WBSA) platform. Within the integrated framework, CGE uses AI to personalize career recommendations and also assists students after graduation in identifying suitable jobs, research domains, and higher-study opportunities aligned with their skills and interests. The WBSA platform further strengthens interaction between students and faculty through assessments, personalized tasks, mentorship activities, and a secure real-time chat application. The CGE system employs a Multilayer Perceptron (MLP) model trained on real-world academic and extracurricular data collected from university students via snowball sampling, achieving a validation accuracy of 94.71% in predicting personalized career paths. A pre-survey was conducted across universities to evaluate the proposed model before deployment. The WBSA system was developed as a modern web application using technologies such as Node.js, Next.js, and PostgreSQL to ensure scalability, responsiveness, and secure data management. The overall system runs on a secure cloud-based infrastructure, which provides reliable performance while helping graduates select a suitable career path in the IT sector. In addition, a post-survey involving both students and faculty was conducted to gather feedback and further improve the overall effectiveness and usability of the system.

2606.15856 2026-06-16 eess.SP cs.LG math.SP 交叉投稿

Early Anomaly-Onset Detection based on Wigner--Ville Distribution Slice Spectra: A Transmission-Grid Test Case

基于Wigner-Ville分布切片谱的早期异常起始检测:一个输电网测试案例

Eduardo Jr Piedad, Eduardo Prieto-Araujo, Oriol Gomis-Bellmunt

发表机构 * DOST–Advanced Science and Technology Institute(菲律宾科技先进研究院) Universitat Politècnica de Catalunya(加泰罗尼亚理工大学)

AI总结 提出基于Wigner-Ville分布切片谱的全向量方法,用于高压电网电压波形的序列异常起始检测,通过基线归一化偏差评分实现低虚警率。

Comments 7 pages, 3 figures, 4 tables

AI中文摘要

电力网络中的运行扰动监测需要根据到达的波形窗口做出决策,而不是在事件发生后根据完整记录进行决策。本研究评估了全向量Wigner-Ville分布切片(WVDS)谱用于高压电网电压波形的序列异常起始检测。该方法保留了Wigner-Ville分布的双线性中点交互结构,并将每个128样本电压窗口表示为128维切片谱,避免了手动选择故障频率标记。WVDS与基线归一化偏差(BND)评分结合使用,并与快速傅里叶变换的BND(FFT-BND)、原始窗口自编码器、FFT自编码器和WVDS自编码器在相同阈值和三窗口持久性规则下进行比较。使用合成自编码器-聚类教师模型选择RTE故障记录,这些记录从初始正常区域开始,然后过渡到异常行为。在过滤后的测试集上,FFT-BND实现了最高灵敏度,而WVDS-BND提供了最低的虚警工作点,将记录级起始前虚警率降低至0.69%。自编码器比较遵循相同的选择性模式:WVDS重建相对于FFT重建减少了虚警,但漏检了更多样本。结果表明,当虚警代价较高时,保留的WVD交叉项信息可以形成用于在线电网波形异常监测的选择性表示。

英文摘要

Operational disturbance monitoring in power networks requires decisions to be made from waveform windows as they arrive, rather than from completed records after the event. This study evaluates full-vector Wigner--Ville Distribution Slice (WVDS) spectra for sequential anomaly-onset detection in high-voltage grid-voltage waveforms. The approach keeps the bilinear midpoint interaction structure of the Wigner--Ville distribution and represents each 128-sample voltage window by a 128-dimensional slice spectrum, avoiding manually selected fault-frequency markers. WVDS is used with a baseline-normalized deviation (BND) score and is compared against a BND score computed on Fast Fourier Transform spectra (FFT-BND), raw-window autoencoders, FFT autoencoders, and WVDS autoencoders under the same thresholding and three-window persistence rule. A synthetic autoencoder--clustering teacher is used to select RTE fault records that start from an initially normal region and then transition to anomalous behavior. On the filtered test set, FFT-BND achieves the highest sensitivity, whereas WVDS-BND provides the lowest false-alarm operating point, reducing record-level pre-onset false alarms to 0.69%. The autoencoder comparison follows the same selectivity pattern: WVDS reconstruction decreases false alarms relative to FFT reconstruction but misses more examples. The results indicate that preserved WVD cross-term information can form a selective representation for online grid-waveform anomaly monitoring when false alarms are costly.
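
The detection pipeline above scores each incoming window against a baseline and raises an alarm only after three consecutive exceedances. The sketch below illustrates that logic. The abstract does not give the exact BND formula, so `bnd_score` assumes a mean absolute z-score against per-feature baseline statistics. It also uses 2-D toy features in place of 128-D WVDS slice spectra, and it assumes the alarm is reported at the third consecutive window.

```python
from statistics import mean, pstdev


def fit_baseline(windows):
    """Per-feature mean and std over feature vectors from a known-normal baseline period."""
    columns = list(zip(*windows))
    mu = [mean(c) for c in columns]
    sigma = [pstdev(c) or 1e-12 for c in columns]  # avoid dividing by a zero std
    return mu, sigma


def bnd_score(x, mu, sigma):
    """Hypothetical baseline-normalized deviation: mean absolute z-score across features."""
    return mean(abs(xi - m) / s for xi, m, s in zip(x, mu, sigma))


def first_alarm(scores, threshold, persistence=3):
    """Index of the window at which `persistence` consecutive scores exceed `threshold`."""
    run = 0
    for i, s in enumerate(scores):
        run = run + 1 if s > threshold else 0
        if run >= persistence:
            return i
    return None  # no persistent exceedance: no alarm


# Toy 2-D features standing in for 128-D WVDS slice spectra.
mu, sigma = fit_baseline([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]])
stream = [[1.0, 1.1], [2.0, 2.0], [1.0, 1.0], [2.0, 2.0], [2.1, 1.9], [2.2, 2.0]]
scores = [bnd_score(x, mu, sigma) for x in stream]
print(first_alarm(scores, threshold=3.0))  # 5
```

The isolated spike at index 1 is suppressed by the persistence rule, which is the mechanism that trades some sensitivity for fewer false alarms.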

2606.15881 2026-06-16 stat.ME cs.LG stat.AP 交叉投稿

Biarchetype analysis for univariate functional data. An application to macroeconomic financial time series

单变量函数数据的双原型分析及其在宏观经济金融时间序列中的应用

Aleix Alcacer, Rafael Benitez, Vicente J. Bolos, Irene Epifanio

发表机构 * Jaume I University(海梅一世大学) University of València(瓦伦西亚大学)

AI总结 提出双原型分析方法,同时识别案例和时间维度的原型结构,应用于欧洲国家10年期国债收益率数据,揭示三个时间区间和三个国家原型。

Comments 6 pages, 2 figures. To be published in the proceedings of SIS-FENStatS 2026, Sapienza University of Rome, Italy, June 22-25, 2026

AI中文摘要

我们首次在单变量函数数据背景下引入双原型分析。这种无监督方法通过同时识别案例(在我们的应用中为国家)和时间参数上的原型结构,扩展了原型分析。案例和时间点都被表示为双原型的混合,从而得到复杂函数观测的简洁且高度可解释的表示。尽管双原型分析并非旨在作为一种聚类技术,但与双聚类方法相比,它提供了更优的可解释性,因为它基于极端的、有代表性的模式而非平均质心,从而增强了人类的理解。我们将所提出的方法应用于2001-2025年期间欧洲国家的10年期政府债券收益率。结果识别出三个不同的时间区间(危机前时期、欧元区主权债务危机时期和危机后时期),并揭示了德国、希腊和匈牙利作为国家原型。

英文摘要

We introduce biarchetype analysis for the first time in the context of univariate functional data. This unsupervised methodology extends archetype analysis by simultaneously identifying archetypal structures across both the cases (countries, in our application) and the temporal argument. Both cases and time points are expressed as mixtures of biarchetypes, yielding a concise and highly interpretable representation of complex functional observations. Although biarchetype analysis is not intended as a clustering technique, it offers superior interpretability compared with biclustering approaches, as it is based on extreme, representative patterns rather than average centroids, thereby enhancing human comprehension. We apply the proposed method to 10-year government bond yields of European countries over the period 2001-2025. The results identify three distinct time regimes (the pre-crisis period, the euro-area sovereign debt crisis, and the post-crisis period), and reveal Germany, Greece, and Hungary as country archetypes.
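
The key idea above is that both cases (countries) and time points are expressed as convex mixtures of biarchetypes. The sketch below shows the reconstruction step X ≈ A Z B. Rows of A hold case mixing weights, columns of B hold time-point mixing weights, and Z is a small matrix of biarchetypes. In biarchetype-style factorizations, Z is itself typically constrained to be a mixture of the observed data; that fitting step is omitted. All numbers are hypothetical, and the paper's functional-data formulation may differ in detail.

```python
def matmul(P, Q):
    """Plain-Python matrix product of nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def rows_are_mixtures(M, tol=1e-9):
    """True if every row is a convex weight vector (non-negative, sums to 1)."""
    return all(min(r) >= 0 and abs(sum(r) - 1) < tol for r in M)


# Hypothetical 2 x 2 biarchetype matrix: case archetypes x temporal archetypes.
Z = [[1.0, 4.0],
     [8.0, 2.0]]
A = [[1.0, 0.0],             # case 1 is pure case-archetype 1
     [0.5, 0.5]]             # case 2 is an even mix
B = [[1.0, 0.0, 0.5],        # time point 3 mixes both temporal archetypes
     [0.0, 1.0, 0.5]]

assert rows_are_mixtures(A) and rows_are_mixtures(list(zip(*B)))  # B is column-stochastic
X_hat = matmul(matmul(A, Z), B)
print(X_hat)  # [[1.0, 4.0, 2.5], [4.5, 3.0, 3.75]]
```

Because every weight vector is convex, each reconstructed value lies between the extreme patterns it mixes. This is the source of the interpretability advantage over centroid-based biclustering.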

2606.15896 2026-06-16 cs.RO cs.LG 交叉投稿

LoComposition: Terrain-Adaptive Energy-Efficient Quadruped Locomotion without Gait Priors

LoComposition:无需步态先验的地形自适应高效四足运动

Loukas Kordos, Leonard T. Franz, Simon Rappenecker, Oliver Hausdoerfer, Angela P. Schoellig, Pavel Kolev, Georg Martius

发表机构 * Max Planck Institute for Intelligent Systems(马克斯·普朗克智能系统研究所) University of Tübingen(图宾根大学) Technical University of Munich(慕尼黑工业大学) University of Stuttgart(斯图加特大学)

AI总结 提出一种将任务奖励、操作约束、能量最小化和地形感知分离的框架,无需显式步态先验,在四足机器人上实现高效地形自适应运动,运输成本降低56%,违规减少96%。

Comments 17 pages, 5 figures, 10 tables

AI中文摘要

基于学习的四足运动通常依赖于复杂的奖励函数,将任务规范、操作限制、步态偏好和地形适应纠缠在单个优化目标中。我们通过不同的机制处理这些功能:任务规范用奖励,操作限制用约束,步态偏好用能量最小化,以及用外部感知来根据地形难度调整能量使用。我们表明,这些组件共同实现了高效、地形自适应的运动,并且移除每个组件会暴露出不同的失败模式。我们的公式移除了显式的步态先验(包括腾空时间、接触次数和足部间隙目标),转而支持涌现行为。与传统的复杂奖励基线相比,我们的公式在实现相当的地形穿越的同时,将运输成本降低了56%,操作限制违规减少了96%。得到的策略零样本迁移到使用基于LiDAR高程地图的物理Unitree Go2上。项目网站含视频:https://tinyurl.com/locomposition。

英文摘要

Learning-based quadrupedal locomotion typically relies on complex reward formulations that entangle task specification, operational limits, gait preference, and terrain adaptation within a single optimization objective. We instead treat these functions through distinct mechanisms: rewards for task specification, constraints for operational limits, energy minimization for gait preference, and exteroceptive perception for adapting energy use to terrain difficulty. We show that these components jointly enable efficient, terrain-adaptive locomotion, and that removing each component exposes a distinct failure mode. Our formulation removes explicit gait priors (including air-time, contact-count, and foot-clearance targets) in favor of emergent behavior. Compared to a conventional complex-reward baseline, our formulation achieves comparable terrain traversal while reducing cost of transport by 56% and operational-limit violations by 96%. The resulting policies transfer zero-shot to a physical Unitree Go2 using LiDAR-based elevation mapping. Project website with videos: https://tinyurl.com/locomposition.
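
The headline efficiency figure above is a 56% lower cost of transport (CoT). The sketch below computes the dimensionless CoT = P / (m g v). It assumes P is the summed absolute mechanical joint power, which is one common convention. The paper may use a different power definition, and the torques, speeds and 15 kg mass are hypothetical numbers.

```python
GRAVITY = 9.81  # m/s^2


def cost_of_transport(joint_torques, joint_velocities, mass_kg, speed_mps):
    """Dimensionless CoT = P / (m * g * v).

    P is taken here as the summed absolute mechanical joint power |tau * omega|, which is
    one common choice; other papers use electrical power or clip negative work.
    """
    power_w = sum(abs(tau * omega) for tau, omega in zip(joint_torques, joint_velocities))
    return power_w / (mass_kg * GRAVITY * speed_mps)


def relative_reduction(baseline, new):
    """Fractional reduction of `new` relative to `baseline` (0.56 means 56% lower)."""
    return (baseline - new) / baseline


# Hypothetical numbers: two joints, a 15 kg robot walking at 1 m/s.
cot = cost_of_transport([10.0, 10.0], [2.0, -3.0], mass_kg=15.0, speed_mps=1.0)
print(round(cot, 3))  # 0.34
print(round(relative_reduction(1.0, 0.44), 2))  # 0.56
```

Normalizing by weight and speed lets CoT compare gaits and robots directly. It also explains why an energy-minimization term can stand in for hand-specified gait priors.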

2606.15954 2026-06-16 cs.SE cs.AI cs.DC cs.LG 交叉投稿

Green SARC: Predictive Cost and Carbon Governance for Agentic AI Systems

Green SARC:面向代理型AI系统的预测性成本与碳治理

Gaston Besanson

发表机构 * Universidad Torcuato Di Tella(托库托迪泰拉大学)

AI总结 提出Green SARC框架,通过架构级治理在代理循环中强制执行成本与碳预算,理论贡献包括预测性执行点,实验证明门控机制实现0%超支,端到端节省47-55%。

Comments 19 figures. Code: https://github.com/besanson/Greensarc -- Software DOI: https://doi.org/10.5281/zenodo.20692196

AI中文摘要

代理型AI系统通过工具和子代理运作,但旨在约束其财务和环境成本的控制措施仍停留在仪表盘上,在执行过程中或执行后进行评估。Green SARC将SARC架构治理框架——代理循环中的四个执行点——应用于FinOps和GreenOps,贡献了关于执行什么以及如何预测的理论。我们报告了四个与策略无关的结果。(i) 无约束的“状态雪球”在循环深度上为$Θ(n^2)$;在3000个真实多步计划(SWE-rebench)上,100%成立,中位曲率$\hat{c}_2=216$超过线性累积预测$p/2=134$——真实计划累积速度快于模型。(ii) 在真实残差上,正态-$σ$门覆盖不足(标称95%时实际92%);分裂共形校准成立(95.2%)。(iii) 根据预期预算调整的软拉格朗日惩罚在91.5%的种子上违反预算;架构门违反率为0%。(iv) 在绑定预算下,门在合成和真实(BurstGPT)到达上的超预算发生率为0%。端到端的token/美元/碳节省(47-55%)是真实的,但幅度依赖于策略——由范围-容量旋钮设定,而非门拒绝。该库是开源的,无依赖,并为每个引用的数字提供了再生脚本。

英文摘要

Agentic AI systems act through tools and sub-agents, yet the controls meant to bound their financial and environmental cost still sit on dashboards evaluated beside or after execution. Green SARC applies the SARC governance-by-architecture framework -- four enforcement sites in the agent loop -- to FinOps and GreenOps, contributing the theory of what to enforce and how to predict it. We report four policy-independent results. (i) The unconstrained "State Snowball" is $\Theta(n^2)$ in loop depth; on 3,000 real multi-step plans (SWE-rebench) it holds on 100%, with median curvature $\hat{c}_2=216$ exceeding the linear-accretion prediction $p/2=134$ -- real plans accrete faster than the model. (ii) On real residuals the Normal-$\sigma$ gate under-covers (92% at nominal 95%); split-conformal calibration holds (95.2%). (iii) A soft Lagrangian penalty tuned to the budget in expectation breaches it on 91.5% of seeds; the architectural gate breaches 0%. (iv) Under binding budgets the gate's over-budget incidence is 0% on synthetic and real (BurstGPT) arrivals. End-to-end token/USD/carbon savings (47--55%) are real but policy-dependent in magnitude -- set by a scope-cap knob, not by gate rejections. The library is open-source, dependency-free, and ships a regeneration script for every cited number.
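
Two ideas in the abstract lend themselves to a small sketch: why re-sending a growing context makes total tokens quadratic in loop depth, and how a hard architectural gate differs from a soft penalty. The sketch assumes the simplest linear-accretion model, where each step adds a fixed `per_step_tokens` to the context. Under that model the quadratic coefficient is per_step_tokens / 2, which appears to correspond to the abstract's $p/2$ prediction. `gated_run` is hypothetical and uses known step costs. A real gate would act on a predicted (for example, conformally calibrated) upper bound instead.

```python
def snowball_tokens(n_steps, base_tokens, per_step_tokens):
    """Total tokens processed when every loop step re-sends the full, growing context.

    Step i sends base_tokens + i * per_step_tokens, so the total is
    n * base_tokens + per_step_tokens * n * (n + 1) / 2: quadratic in n with
    leading coefficient per_step_tokens / 2.
    """
    total, context = 0, base_tokens
    for _ in range(n_steps):
        context += per_step_tokens  # each step appends its output to the context
        total += context            # ...and the whole context is processed again
    return total


def gated_run(step_costs, budget):
    """Hypothetical hard gate: refuse any step whose cost would push spend past the budget."""
    spent, executed = 0, []
    for i, cost in enumerate(step_costs):
        if spent + cost > budget:
            break  # enforce at the architecture level: stop rather than overspend
        spent += cost
        executed.append(i)
    return executed, spent


print(snowball_tokens(4, base_tokens=100, per_step_tokens=10))  # 500
print(gated_run([30, 30, 30], budget=70))  # ([0, 1], 60)
```

A soft penalty only discourages overspending on average. The gate makes an over-budget step structurally impossible, which is why the gate's breach rate can be 0%.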

2606.15972 2026-06-16 cs.CL cs.AI cs.LG 交叉投稿

Formalize Once, Edit the Rest: Efficient Lean-Based Answer Selection for Math Reasoning

一次形式化,其余编辑:基于Lean的高效数学推理答案选择

Ji Feng, Zhouxing Shi

发表机构 * University of California, Riverside(加州大学河滨分校)

AI总结 提出BASE流水线,通过形式化一个候选答案并编辑其余答案,减少自动形式化调用约5倍,同时提升选择准确性。

Comments 15 pages, 1 figure. Code available at https://github.com/ucr-rai/base-and-edit

AI中文摘要

随着大型语言模型(LLMs)越来越多地应用于数学推理,形式化证明助手(如Lean)可用于以机器可检查的严谨性验证推理输出,从而支持在测试时扩展中从K个采样候选答案中进行答案选择等用例。然而,使用Lean要求LLM的输出(最初为自然语言)首先被形式化。现有的基于Lean的答案选择工作使用自动形式化模型为每个候选答案独立生成一个Lean形式化语句,这带来了显著的计算成本。我们提出BASE,一个基础-编辑流水线,它为每个问题形式化一个基础候选答案,并通过就地编辑答案表达式来推导出其余K-1个语句。为此,我们训练了一个重写器模型LEANSCRIBE,用于定位基础形式化中的答案,并为其他K-1个候选答案生成可重用的编辑函数。BASE同时提高了选择准确性并降低了形式化成本——这是一个帕累托改进,在四个基准测试和三个求解器上的所有12个(数据集,求解器)配置中均成立,在K=8时自动形式化器调用减少约5倍,且随着K增长,减少幅度预计会更大。代码可在https://github.com/ucr-rai/base-and-edit获取。

英文摘要

With large language models (LLMs) increasingly applied to mathematical reasoning, formal proof assistants such as Lean can be leveraged to verify reasoning outputs with machine-checkable rigor, enabling use cases such as answer selection in test-time scaling with K sampled candidate answers. However, employing Lean requires that LLM outputs, originally in natural language, first be formalized. Existing Lean-based answer-selection work uses an autoformalization model to generate a formal statement in Lean for each candidate answer independently, incurring a significant computational cost. We propose BASE, a base-and-edit pipeline that formalizes a single base candidate per problem and derives the remaining K-1 statements by editing the answer expression in place. To facilitate this, we train a rewriter model LEANSCRIBE to localize the answer in the base formalization and generate a reusable edit function for the other K-1 candidates. BASE simultaneously improves selection accuracy and reduces formalization cost - a Pareto improvement that holds on all 12 (dataset, solver) configurations across four benchmarks and three solvers, cutting autoformalizer calls by about 5x at K=8, with the reduction expected to become larger as K grows. Code is available at https://github.com/ucr-rai/base-and-edit.
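
The base-and-edit idea above is to formalize one candidate and then substitute the other answers into the same Lean statement. The sketch below is a naive string-level stand-in for the learned LEANSCRIBE localizer. It assumes the answer is the last occurrence of the base answer's text, which is typically inside the theorem's conclusion. The Lean statement and the `make_edit_fn` helper are hypothetical.

```python
def make_edit_fn(base_statement: str, base_answer: str):
    """Locate the answer expression once, then return a reusable substitution function.

    Naive stand-in for a learned localizer: takes the LAST occurrence of the answer text,
    on the assumption that the answer sits in the theorem's conclusion.
    """
    start = base_statement.rfind(base_answer)
    if start < 0:
        raise ValueError("answer expression not found in the base formalization")
    prefix = base_statement[:start]
    suffix = base_statement[start + len(base_answer):]
    return lambda answer: prefix + answer + suffix


# Hypothetical Lean statement produced once by the autoformalizer for the base candidate "4".
base = "theorem cand (x : ℝ) (h : 2 * x + 3 = 11) : x = 4 := by sorry"
edit = make_edit_fn(base, "4")
for cand in ["5", "7 / 2"]:
    print(edit(cand))
# theorem cand (x : ℝ) (h : 2 * x + 3 = 11) : x = 5 := by sorry
# theorem cand (x : ℝ) (h : 2 * x + 3 = 11) : x = 7 / 2 := by sorry
```

In this idealized sketch, only the base candidate needs an autoformalizer call. A plain text search like `rfind` breaks as soon as the answer text also appears elsewhere in the statement, which is why a learned localizer is needed in practice.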

2606.15983 2026-06-16 quant-ph cond-mat.mtrl-sci cs.LG 交叉投稿

Learning ground state observables from quantum computing experiments

从量子计算实验中学习基态可观测量

Ben Jaderberg, Freya Shah, Minjun Jeon, M. Emre Sahin, Christa Zoufal, Kunal Sharma

发表机构 * IBM Quantum, IBM Research Europe, Hursley, Winchester, SO21 2JN, United Kingdom(IBM量子、IBM欧洲研究院,赫尔斯利,温切斯特,SO21 2JN,英国) Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, United Kingdom(工程科学系,牛津大学,帕克斯路,牛津 OX1 3PJ,英国) IBM Quantum, T. J. Watson Research Center, Yorktown Heights, NY 10598, USA(IBM量子、T.J. Watson研究中心,约克敦高地,纽约 10598,美国) Department of Materials, University of Oxford, Parks Road, Oxford OX1 3PH, United Kingdom(材料系,牛津大学,帕克斯路,牛津 OX1 3PH,英国) The Hartree Centre, STFC, Sci-Tech Daresbury, Warrington WA4 4AD, UK(哈特里中心,STFC,科技达尔斯伯里,沃林顿 WA4 4AD,英国) IBM Quantum, IBM Research Europe — Zurich, Rüschlikon 8803, Switzerland(IBM量子、IBM欧洲研究院——苏黎世,卢斯利康 8803,瑞士) IBM Research, Chicago, IL 60606, USA(IBM研究院,芝加哥,伊利诺伊 60606,美国)

AI总结 本文在115量子比特的二维海森堡XXZ模型中,利用近似基态的实验数据训练神经网络,成功预测了未见哈密顿量参数下的空间分辨可观测量,展示了从量子数据学习的实际可行性。

Comments 20 pages, 14 figures

AI中文摘要

最近的理论进展确立了机器学习模型在基于量子生成数据训练时,能够有效预测带隙局部哈密顿量基态性质的条件。然而,由于在量子处理器上制备多体基态的困难,此范式先前的实验演示大多局限于小系统或高度结构化状态。在本工作中,我们展示了从二维海森堡XXZ模型的近似基态生成的实验量子数据中学习,系统规模达115量子比特。我们构建了一个数据集,包含反铁磁相中的单点期望值、两点关联和12体环关联。然后,我们在该数据上训练神经网络,并证明它们能够准确预测未见过的哈密顿量参数下的空间分辨可观测量,无论是在训练分布内还是在接近相界的分布外区域。我们的结果展示了从相互作用二维多体系统的大规模量子数据中学习的实际实现,为量子处理器能够提供超越经典近似方法范围的训练数据的路径提供了动力。

英文摘要

Recent theoretical progress has established conditions under which machine learning models can efficiently predict ground-state properties of gapped local Hamiltonians when trained on quantum-generated data. Previous experimental demonstrations in this paradigm, however, have largely been limited to small systems or highly structured states, due to the difficulty of preparing many-body ground states on quantum processors. In this work, we demonstrate learning from experimental quantum data generated from approximate ground states of the two-dimensional Heisenberg XXZ model with system sizes up to 115 qubits. We construct a dataset of single-site expectation values, two-point correlations, and 12-body loop correlations across the antiferromagnetic phase. We then train neural networks on this data and show that they can accurately predict spatially resolved observables for previously unseen Hamiltonian parameters, both within the training distribution and in an out-of-distribution regime approaching the phase boundary. Our results demonstrate the practical realization of learning from quantum data for an interacting two-dimensional many-body system at scale, motivating a path toward regimes where quantum processors could provide training data beyond the reach of classical approximation methods.
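
The training targets above are single-site expectation values, two-point correlations and 12-body loop correlations, all of which are products of Pauli-Z eigenvalues averaged over measurement shots. The sketch below shows raw estimation from computational-basis bitstrings, with no error mitigation. It assumes that string index k corresponds to qubit k; some SDKs order bits the other way. The shot data are a toy example.

```python
from math import prod


def z_product_expectation(shots, sites):
    """Estimate <Z_{s1} Z_{s2} ...> from computational-basis bitstrings.

    Each bit maps to a Z eigenvalue: '0' -> +1, '1' -> -1. Here string index k is
    taken to be qubit k (an assumption; some SDKs order bits the other way).
    """
    return sum(prod(1 - 2 * int(s[k]) for k in sites) for s in shots) / len(shots)


def connected_zz(shots, i, j):
    """Connected two-point correlation <Z_i Z_j> - <Z_i><Z_j>."""
    return (z_product_expectation(shots, (i, j))
            - z_product_expectation(shots, (i,)) * z_product_expectation(shots, (j,)))


shots = ["01", "10", "01", "10"]  # perfectly anti-aligned neighbours
print(z_product_expectation(shots, (0,)))     # 0.0
print(z_product_expectation(shots, (0, 1)))   # -1.0
print(connected_zz(shots, 0, 1))              # -1.0
```

The same function covers a 12-body loop observable by passing the 12 site indices of the loop.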

2606.15986 2026-06-16 hep-lat cs.LG 交叉投稿

Learning the generating functional for variance reduction in lattice QCD

学习格点QCD中方差约化的生成泛函

Ryan Abbott, Yang Fu, Daniel C. Hackett, Gurtej Kanwar, Fernando Romero-López, Phiala E. Shanahan

发表机构 * Physics Department, Columbia University(哥伦比亚大学物理系) Center for Theoretical Physics, Massachusetts Institute of Technology(麻省理工学院理论物理中心) Fermi National Accelerator Laboratory(费米国家加速器实验室) Higgs Centre for Theoretical Physics, School of Physics and Astronomy, University of Edinburgh(爱丁堡大学物理与天文学学院希格斯理论物理中心) Albert Einstein Center, Institute for Theoretical Physics, University of Bern(伯尔尼大学爱因斯坦中心理论物理研究所) NSF AI Institute for Artificial Intelligence and Fundamental Interactions(国家科学基金会人工智能与基本相互作用AI研究所)

AI总结 利用机器学习归一化流编码生成泛函表示,系统降低格点规范场论中任意N点关联函数的方差,在QCD和杨-米尔斯理论中实现高达三个数量级的方差约化。

Comments 8 pages, 3 figures

AI中文摘要

量子场论中的生成泛函为构造作为源算符导数的关联函数提供了自然框架。我们提出了一种方法,利用机器学习归一化流编码生成泛函的表示,以降低格点规范场论计算中玻色子算符任意$N$点关联函数的方差。我们展示了在此框架下系统逼近无噪声关联函数估计量的可能性。我们通过量子色动力学和杨-米尔斯理论中胶球关联函数和威尔逊环的计算演示了该方法。结果显示方差降低了多达三个数量级。

英文摘要

The generating functional in quantum field theory provides the natural framework for constructing correlation functions as derivatives with respect to source operators. We present a methodology that leverages machine-learned normalizing flows to reduce the variance of arbitrary $N$-point correlation functions of bosonic operators in lattice gauge field theory calculations by encoding a representation of the generating functional. We show that it is possible to systematically approach noiseless estimators of correlation functions in this framework. We demonstrate this methodology with applications to calculations of glueball correlation functions and Wilson loops in Quantum Chromodynamics and Yang-Mills theory. The results show up to three orders of magnitude variance reduction.
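
The framing above builds correlation functions as derivatives of the generating functional with respect to sources. The toy sketch below illustrates only that relation, not the normalizing-flow method. It uses a single discrete variable x with a scalar source J. Derivatives of log Z(J) at J = 0 give the mean and the connected two-point function (the variance), estimated here by central finite differences. The distribution and step size are hypothetical.

```python
import math


def log_z(source, values, probs):
    """log Z(J) = log sum_x p(x) exp(J x) for a toy discrete 'field' x."""
    return math.log(sum(p * math.exp(source * x) for x, p in zip(values, probs)))


def cumulants_from_log_z(values, probs, h=1e-4):
    """First two derivatives of log Z at J = 0 by central finite differences.

    d/dJ log Z |_0 = <x>, and d^2/dJ^2 log Z |_0 = <x^2> - <x>^2 (the connected 2-point function).
    """
    f_minus, f0, f_plus = (log_z(j, values, probs) for j in (-h, 0.0, h))
    first = (f_plus - f_minus) / (2 * h)
    second = (f_plus - 2 * f0 + f_minus) / h**2
    return first, second


m1, c2 = cumulants_from_log_z([-1.0, 1.0], [0.25, 0.75])
print(round(m1, 6), round(c2, 4))  # 0.5 0.75
```

For this distribution the exact values are <x> = 0.5 and <x^2> - <x>^2 = 1 - 0.25 = 0.75, which the finite differences recover.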

2606.15994 2026-06-16 cs.AI cs.LG 交叉投稿

Agentic Framework for Deep Learning workload migration via In-Context Learning

基于上下文学习的深度学习工作负载迁移智能体框架

Qiyue Liang, Steven Ingram, George Vanica, Andi Gavrilescu, Newfel Harrat, Hassan Sipra, Sethuraman Sankaran

发表机构 * Google(谷歌)

AI总结 提出结合上下文学习与Oracle驱动的自调试的自主系统,实现从PyTorch到JAX的深度学习模型自动迁移,在神经模块上达到91%数值等价性。

AI中文摘要

将深度学习模型从PyTorch灵活的面向对象设计迁移到JAX的函数式无状态设置通常是一项手动且易出错的任务。自动迁移具有挑战性,因为大型语言模型(LLM)难以处理严格且动态的API对齐,并且容易在精确操作上出错。我们提出了一个完全自主的系统,结合了上下文学习(ICL)与Oracle驱动的自调试。首先,我们整理了一个ICL上下文,作为惯用JAX样式和测试用例生成的严格参考。其次,不依赖LLM推导数学输出,而是运行源PyTorch模块以获取其实际的动态张量状态,从而创建一个不可变的执行Oracle。然后,我们使用自主智能体循环基于Oracle数据合成测试。测试用例被重复执行,并将回溯发送回LLM进行自我修正。消融实验表明,将ICL参考与Oracle基础及自调试相结合,大大优于纯指令和基本智能体基线。这种改进没有增加过多的计算开销。我们的轻量级流水线在神经模块上实现了91%的数值等价性(相比之下,基线为9%,指令+自调试为27%),为跨框架迁移提供了高度可靠、可扩展的蓝图。该方案已在多个最先进模型上得到验证,包括SAM(Segment Anything)、T5、Code Whisper等,显示出高数值等价性。代码:https://github.com/AI-Hypercomputer/accelerator-agents/tree/main/MaxCode

英文摘要

Translating deep learning models from PyTorch's flexible, object-oriented design to JAX's functional, stateless setup is usually a manual and error-prone task. Automated migration is challenging because Large Language Models (LLMs) struggle with strict and dynamic API alignment and are prone to mistakes on operations that demand exact numerics. We propose a fully autonomous system that combines In-Context Learning (ICL) with oracle-driven self-debugging. First, we curated an ICL context that serves as a strict reference for idiomatic JAX styling and test case generation. Second, instead of relying on the LLM to deduce mathematical outputs, we run the source PyTorch modules to capture their actual dynamic tensor states. This creates an immutable execution oracle. We then use an autonomous agentic loop to synthesize tests based on the oracle data. The test cases are executed repeatedly, and the traceback is sent back to the LLM for self-correction. Ablations show that combining ICL references with oracle grounding and self-debugging greatly outperforms pure instructional and basic agentic baselines, without adding excessive computational overhead. Our lightweight pipeline achieves 91% numerical equivalence on neural modules (compared to 9% for the baseline and 27% for instruction + self-debugging), providing a highly reliable, scalable blueprint for cross-framework migration. The approach has been validated across several state-of-the-art models, including SAM (Segment Anything), T5, and Code Whisper, among others, with high numerical equivalence. Code: https://github.com/AI-Hypercomputer/accelerator-agents/tree/main/MaxCode
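
The oracle-driven loop described above has three parts: record outputs from the trusted source implementation, check a candidate translation against them within a numerical tolerance, and feed any mismatch or traceback back for revision. The framework-free sketch below illustrates that loop. A pure-Python softmax plays the role of the PyTorch module, and `propose` is a hypothetical stand-in for asking the LLM to revise its translation. The tolerances are assumptions.

```python
import math
import traceback


def record_oracle(reference_fn, inputs):
    """Run the trusted source implementation once and freeze its outputs."""
    return [(x, reference_fn(x)) for x in inputs]


def check_against_oracle(candidate_fn, oracle, rel_tol=1e-6, abs_tol=1e-9):
    """Return None if the candidate matches every oracle output, else an error report."""
    for x, expected in oracle:
        try:
            got = candidate_fn(x)
        except Exception:
            return traceback.format_exc()  # the traceback is what gets fed back for repair
        if len(got) != len(expected) or not all(
            math.isclose(g, e, rel_tol=rel_tol, abs_tol=abs_tol) for g, e in zip(got, expected)
        ):
            return f"mismatch on input {x}: expected {expected}, got {got}"
    return None


def repair_loop(propose, oracle, max_rounds=3):
    """propose(report) -> next candidate; a stand-in for asking the LLM to revise its translation."""
    report = None
    for round_no in range(1, max_rounds + 1):
        report = check_against_oracle(propose(report), oracle)
        if report is None:
            return round_no  # rounds needed to reach numerical equivalence
    return None


def reference_softmax(xs):  # plays the role of the source PyTorch module
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    return [e / sum(exps) for e in exps]


def buggy(xs):  # first "translation": forgets to normalize
    return [math.exp(v) for v in xs]


oracle = record_oracle(reference_softmax, [[0.0, 1.0], [2.0, 2.0, -1.0]])
drafts = iter([buggy, reference_softmax])
print(repair_loop(lambda report: next(drafts), oracle))  # 2
```

Because the expected values come from executing the source code, the LLM never has to derive numerical outputs itself; it only has to make the candidate agree with them.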

2606.16019 2026-06-16 cs.CL cs.LG cs.SD 交叉投稿

Scaling Human and G2P Supervision for Robust Phonetic Transcription

扩展人类与G2P监督以实现鲁棒语音转录

Alexander Metzger, Aruna Srivastava, Ruslan Mukhamedvaleev

发表机构 * Koel Labs LLC

AI总结 研究自动语音转录中人类标注与G2P监督的扩展规律,发现当人类标注少于20-30小时时G2P有效,超过后无益甚至降低鲁棒性,而ASR预训练可显著提升性能。

Comments Accepted to Interspeech 2026

AI中文摘要

专家语音标注成本高昂,尤其对于非标准方言和非典型语音。一种常见替代方法是使用字素到音素(G2P)模型从文本转录中自动生成语音标签。我们研究了自动语音转录性能如何随英语中人类和G2P监督的扩展而变化。使用一个涵盖母语、非母语和卒中后语音的精心策划的80小时基准测试,我们确定了一个监督质量阈值:只有当人类标注少于20-30小时时,G2P监督才有帮助。超过此阈值,它不提供显著益处,并可能降低跨方言鲁棒性。在此阈值之后有效的是ASR预训练,我们使用它实现了比先前系统加权音素特征错误率降低2.3倍,在非母语和失语症语音上取得了强劲提升。这些结果表明,数量驱动的G2P扩展可能对鲁棒泛化产生递减收益。

英文摘要

Expert phonetic annotation is costly, especially for non-standard dialects and atypical speech. A common alternative is using Grapheme-to-Phoneme (G2P) models to auto-generate phonetic labels from text transcripts at scale. We study how automatic phonetic transcription performance scales with human and G2P supervision in English. Using a curated 80-hour benchmark spanning native, non-native and post-stroke speech, we identify a supervision quality threshold: G2P supervision helps only when fewer than 20-30 hours of human annotation are available. Beyond this threshold, it provides no significant benefit and can reduce cross-dialect robustness. What remains effective past the threshold is ASR pretraining, which we use to achieve a 2.3x reduction in weighted phone feature error rate over prior systems, with strong gains on non-native and aphasic speech. These results suggest that quantity-driven G2P scaling may yield diminishing returns for robust generalization.
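
The headline metric above is a weighted phone feature error rate. This sketch shows the general idea behind feature-based phone error rates. It uses an edit distance in which substituting one phone for another costs the fraction of articulatory features on which they differ, while insertions and deletions cost 1. The three-feature table is hypothetical, and the features are weighted uniformly. The paper's weighting scheme and feature inventory are not given in the abstract and likely differ.

```python
# Hypothetical 3-feature table (voiced, labial, nasal); real systems use many more features.
FEATURES = {
    "p": (0, 1, 0),
    "b": (1, 1, 0),
    "m": (1, 1, 1),
    "t": (0, 0, 0),
}


def sub_cost(a, b):
    """Fraction of articulatory features on which two phones differ (0 for identical phones)."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def feature_edit_distance(ref, hyp):
    """Levenshtein distance with feature-based substitution costs and unit insert/delete costs."""
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)
    for j in range(1, m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),  # substitution / match
            )
    return d[n][m]


def phone_feature_error_rate(ref, hyp):
    return feature_edit_distance(ref, hyp) / len(ref)


print(phone_feature_error_rate(["p", "t"], ["b", "t"]))  # ~0.1667 (one voicing error over two phones)
```

Unlike a plain phone error rate, this penalizes a near miss such as /p/ to /b/ less than an unrelated substitution. That property matters when grading non-native and aphasic speech.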

2606.16032 2026-06-16 cond-mat.other cs.LG 交叉投稿

Machine learning enables roughness-driven inverse design of milling processes

机器学习驱动基于粗糙度的铣削过程逆向设计

Hadi Bakhshan, Sima Farshbaf, Fernando Rastellini, Josep Maria Carbonell

发表机构 * Centre Internacional de Mètodes Numèrics a l’Enginyeria (CIMNE), Campus Norte UPC, 08034 Barcelona, Spain(国际数值工程方法中心(CIMNE),UPC北校区,巴塞罗那,西班牙) Universitat Politècnica de Catalunya (UPC), Campus Norte UPC, 08034 Barcelona, Spain(加泰罗尼亚理工大学(UPC),UPC北校区,巴塞罗那,西班牙) Mechatronics and Modelling Applied on Technology of Materials (MECAMAT) group. Universitat de Vic-Universitat Central de Catalunya (UVic-UCC), C. de la Laura 13, 08500 Vic, Spain(机械与材料技术应用机械建模组(MECAMAT),维克-加泰罗尼亚中央大学(UVic-UCC),Laura街13号,维克,西班牙)

AI总结 提出基于机器学习的铣削过程逆向设计框架,以表面粗糙度为设计目标,通过深度神经网络和随机森林前向训练结合贝叶斯优化解决多对一映射问题,平均相对误差低于5%。

AI中文摘要

在制造业中应用数据驱动方法的兴趣显著增长,特别是在映射复杂高维关系方面。铣削过程是预测模型可以在原位操作之前将影响参数与表面粗糙度指标联系起来的领域之一。虽然这种方法具有明显优势,但由于数据集有限和逆向设计范式的鲁棒性问题,它面临挑战。为了解决这些挑战,本文提出了一种基于机器学习(ML)的框架,用于表面铣削过程的逆向设计,以表面粗糙度为设计目标。该框架采用两个ML模型的前向训练:深度神经网络(DNN)和随机森林(RF)集成,两者均使用从计算模拟框架生成的高保真合成数据集开发。这些训练好的模型被集成到贝叶斯优化(BO)过程中,以克服数据集固有的多对一映射产生的多重性问题。该方法识别出性能最佳的铣削工艺配置,同时考虑工艺和刀具参数,并从完整解空间中呈现它们。与参考结果相比,模型的平均相对误差低于5%,从而证明了所提方法的鲁棒性和可靠性。

英文摘要

Interest in applying data-driven approaches in manufacturing has grown significantly, particularly for mapping complex, high-dimensional relationships. The milling process is one area where predictive models can link influential parameters to surface roughness metrics prior to in situ operations. While this approach offers clear advantages, it faces challenges due to limited datasets and robustness issues in inverse design paradigms. To address these challenges, this paper proposes a machine learning (ML)-based framework for the inverse design of the surface milling process, with a focus on surface roughness as the design objective. The framework employs forward training of two ML models, a deep neural network (DNN) and a random forest (RF) ensemble, both developed using a high-fidelity synthetic dataset generated from a computational simulation framework. These trained models are integrated into a Bayesian optimization (BO) procedure to overcome the multiplicity problem arising from the many-to-one mapping inherent in the dataset. The approach identifies top-performing milling process configurations, considering both process and tool parameters, and presents them from the full solution space. The models achieve average relative errors below 5% when compared to reference results, thereby demonstrating the robustness and reliability of the proposed methodology.
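
The inverse-design problem above is many-to-one: several process configurations can produce the same surface roughness. The sketch below illustrates that multiplicity with an exhaustive grid search, used here in place of the paper's Bayesian optimization. The forward model `surrogate_ra_um` is a hypothetical stand-in for the trained DNN and RF models. It is shaped like the idealized kinematic roughness relation Ra ≈ f²/(32 r), and the feed and radius grids are invented.

```python
from itertools import product


def surrogate_ra_um(feed_mm, radius_mm):
    """Hypothetical stand-in for the trained forward model (DNN / RF ensemble).

    Shaped like the idealized kinematic roughness relation Ra ~ f^2 / (32 r), returned in micrometres.
    """
    return 1000.0 * feed_mm ** 2 / (32.0 * radius_mm)


def inverse_design(target_ra_um, feeds, radii, k=3):
    """Exhaustive search (swapped in for Bayesian optimization) returning the k configurations
    whose predicted roughness is closest to the target."""
    return sorted(product(feeds, radii),
                  key=lambda cfg: abs(surrogate_ra_um(*cfg) - target_ra_um))[:k]


# Many-to-one: two different (feed, radius) pairs hit the same target roughness.
print(inverse_design(0.78125, feeds=[0.1, 0.2, 0.4], radii=[0.4, 1.6], k=2))
```

Returning a ranked set of configurations rather than a single answer is what lets an engineer choose among equally good options, for example by productivity or tool availability. Bayesian optimization serves the same purpose when the search space is too large to enumerate.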

2606.16035 2026-06-16 physics.ins-det cs.LG hep-ex nucl-ex physics.data-an 交叉投稿

GPT-Based Fast Simulation of CLAS12 Detector Hits via Conditional Autoregressive Generation

基于GPT的条件自回归生成实现CLAS12探测器击中快速模拟

Cole Granger, James Giroux, Richard Tyson, Maurizio Ungaro, Cristiano Fanelli

发表机构 * William & Mary, Department of Data Science(威廉玛丽学院数据科学系) William & Mary, Department of Physics(威廉玛丽学院物理系) University of Glasgow, School of Physics and Astronomy(格拉斯哥大学物理与天文学学院) Thomas Jefferson National Accelerator Facility(托马斯·杰斐逊国家加速器设施)

AI总结 提出GPT风格自回归Transformer作为CLAS12电磁量能器的快速替代模型,以入射动量条件生成探测器击中序列,在保持物理保真度下实现每秒700事件以上的推理速度。

Comments 19 pages, 9 figures, 3 tables

详情
AI中文摘要

现代粒子物理实验表明,随着探测器组件的改进和后续计算需求接近可用资源极限,对快速、高保真探测器模拟的需求日益增长。最近,深度生成模型已成为传统蒙特卡洛方法的有前景的替代方案,近期工作从大型语言模型(LLM)和自监督下一标记预测方法中汲取灵感。在这项工作中,我们提出了一种GPT风格的自回归Transformer作为托马斯·杰斐逊国家加速器设施CLAS12实验内部量能器的快速替代模型。该模型以入射动量为条件,自回归地生成所有九个量能器层中的真实探测器击中,作为条、ADC和TDC标记序列。我们证明该模型忠实地再现了击中多重性、空间分布、能量沉积以及电磁量能器的能量-动量响应。该生成器在单个GPU上实现了超过每秒700事件的推理速率,相比传统的基于Geant4的模拟提供了显著加速,同时保持了高亮度实验项目所需的物理保真度。

英文摘要

Modern particle physics experiments have demonstrated an increasing need for fast, high-fidelity detector simulation as detector components have improved and the resulting computational requirements approach the limits of available resources. Recently, deep generative models have emerged as a promising alternative to traditional Monte Carlo methods, with recent works drawing inspiration from large language models (LLMs) and self-supervised next-token prediction. In this work, we present an application of a GPT-style autoregressive transformer as a fast surrogate model for the calorimeter inside the CLAS12 experiment at the Thomas Jefferson National Accelerator Facility. The model is conditioned on incident momentum and generates realistic detector hits autoregressively across all nine calorimeter layers as sequences of strip, ADC, and TDC tokens. We demonstrate that the model faithfully reproduces hit multiplicity, spatial distributions, energy deposits, and the energy-momentum response of the electromagnetic calorimeter. The generator achieves inference rates exceeding 700 events per second on a single GPU, providing a substantial speedup over traditional Geant4-based simulations while maintaining the physics fidelity essential for high-luminosity experimental programs.
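The abstract describes events serialized as strip, ADC and TDC tokens across nine layers. As a rough illustration of what such a serialization could look like, the sketch below flattens hits into one token sequence with a per-field id offset. The vocabulary sizes, bin counts, full-scale values and the extra layer token are all assumptions, not the CLAS12 or the authors' scheme; incident-momentum conditioning would enter the model separately (for example as a prefix embedding) and is not shown.

```python
from dataclasses import dataclass

# Hypothetical vocabulary sizes, chosen for illustration only.
N_LAYERS = 9          # the abstract's nine calorimeter layers
N_STRIPS = 68         # assumed strips per layer
N_ADC_BINS = 64       # assumed ADC quantization
N_TDC_BINS = 64       # assumed TDC quantization
ADC_MAX, TDC_MAX = 4096.0, 512.0  # assumed full-scale readings

# Special tokens first, then one contiguous id block per field.
BOS, EOS = 0, 1
LAYER_OFF = 2
STRIP_OFF = LAYER_OFF + N_LAYERS
ADC_OFF = STRIP_OFF + N_STRIPS
TDC_OFF = ADC_OFF + N_ADC_BINS
VOCAB_SIZE = TDC_OFF + N_TDC_BINS


@dataclass
class Hit:
    layer: int
    strip: int
    adc: float
    tdc: float


def quantize(value: float, vmax: float, n_bins: int) -> int:
    """Uniformly bin a reading into [0, n_bins), clipping out-of-range values."""
    return min(max(int(value / vmax * n_bins), 0), n_bins - 1)


def encode_event(hits: list) -> list:
    """BOS, then (layer, strip, ADC, TDC) tokens per hit in layer/strip order, then EOS."""
    tokens = [BOS]
    for h in sorted(hits, key=lambda h: (h.layer, h.strip)):
        tokens += [
            LAYER_OFF + h.layer,
            STRIP_OFF + h.strip,
            ADC_OFF + quantize(h.adc, ADC_MAX, N_ADC_BINS),
            TDC_OFF + quantize(h.tdc, TDC_MAX, N_TDC_BINS),
        ]
    tokens.append(EOS)
    return tokens


def decode_event(tokens: list) -> list:
    """Invert encode_event; ADC/TDC come back as bin centres (quantization is lossy)."""
    body = tokens[1:-1]
    hits = []
    for i in range(0, len(body), 4):
        layer, strip, adc, tdc = body[i:i + 4]
        hits.append(Hit(
            layer - LAYER_OFF,
            strip - STRIP_OFF,
            (adc - ADC_OFF + 0.5) * ADC_MAX / N_ADC_BINS,
            (tdc - TDC_OFF + 0.5) * TDC_MAX / N_TDC_BINS,
        ))
    return hits
```

A next-token model trained on such sequences samples one field at a time; sorting hits by layer and strip gives the sequence a canonical order, which is one simple design choice among several.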

2606.16051 2026-06-16 cs.NI cs.LG 交叉投稿

Hidden Degradation Costs in Energy-Cost-Only HEMS Optimisation: Study on Battery and PV Sensitivity

仅考虑能源成本的HEMS优化中隐藏的退化成本:电池和光伏敏感性研究

Dawood Butt, Nandor Verba

发表机构 * WMG, The University of Warwick(沃里克大学沃里克制造集团)

AI总结 研究住宅HEMS中仅优化能源成本时电池退化成本的隐藏影响,通过3×3敏感性分析发现退化成本可超过能源节省达1060%,需引入退化感知控制。

Comments FSEM 2026, 5 Pages

详情
AI中文摘要

住宅电池储能系统(BESS)越来越多地与光伏(PV)发电一起部署,以在波动的分时电价(TOU)下降低家庭能源成本。模型预测控制(MPC)是家庭能源管理系统(HEMS)广泛采用的优化策略,通常以最小化净能源成本为目标,并受物理和运行约束。然而,电池退化很少被纳入优化目标,这意味着其成本未被量化,而激进的高循环次数策略一旦部署到物理系统中可能会造成显著损失。本文提出了一个基于英国住宅HEMS的滚动时域混合整数线性规划(MILP)基线,使用REFIT数据集的用电数据。对三种电池容量和三种光伏阵列尺寸进行了3×3敏感性研究,并使用Naumann应力模型和雨流循环计数法事后估计退化成本。结果表明,对于每种电池容量,退化成本保持不变,且可能超过能源成本节省高达1060%。这些结果表明,仅考虑能源成本的优化系统性地低估了真实系统成本,从而激励了退化感知控制公式的提出。

英文摘要

Residential battery energy storage systems (BESS) are increasingly deployed alongside photovoltaic (PV) generation to reduce household energy costs under volatile time-of-use (TOU) tariffs. Model predictive control (MPC) is a widely adopted optimisation strategy for home energy management systems (HEMS), typically formulated to minimise net energy cost, subject to physical and operational constraints. However, battery degradation is rarely embedded in the optimisation objective, so its cost goes unquantified, and aggressive, high-cycle-count strategies could incur significant losses once deployed to physical systems. This paper presents a receding-horizon mixed-integer linear programming (MILP) baseline for a UK residential HEMS, using demand data from the REFIT dataset. A 3×3 sensitivity study is conducted across three battery sizes and three PV array sizes, with post-hoc degradation cost estimated using the Naumann stress model and rainflow cycle counting. Results show that degradation remains constant for each battery size and can exceed energy cost savings by up to 1,060%. These results demonstrate that energy-cost-only optimisation systematically underestimates the true system cost, motivating a degradation-aware control formulation.
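Rainflow cycle counting, which the abstract pairs with the Naumann stress model, turns a state-of-charge trace into (cycle depth, count) pairs. Below is a minimal sketch of the standard three-point rainflow procedure (as in ASTM E1049) in plain Python. The Naumann stress model that would convert these pairs into capacity loss and cost is not reproduced here.

```python
def reversals(series):
    """Reduce a signal to its turning points (local peaks and valleys)."""
    pts = []
    for x in series:
        if pts and x == pts[-1]:
            continue                      # drop flat repeats
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (x - pts[-1]) > 0:
            pts[-1] = x                   # same direction: extend the current run
        else:
            pts.append(x)
    return pts


def rainflow(series):
    """Three-point rainflow counting; returns (range, count) with count 1.0 or 0.5."""
    stack, cycles = [], []
    for point in reversals(series):
        stack.append(point)
        while len(stack) >= 3:
            x_rng = abs(stack[-1] - stack[-2])   # most recent range
            y_rng = abs(stack[-2] - stack[-3])   # range before it
            if x_rng < y_rng:
                break
            if len(stack) == 3:
                # Y contains the starting point: half cycle, discard the start.
                cycles.append((y_rng, 0.5))
                stack.pop(0)
            else:
                # Y is enclosed by X: full cycle, remove Y's two points.
                cycles.append((y_rng, 1.0))
                last = stack.pop()
                del stack[-2:]
                stack.append(last)
    # Whatever remains is residue, counted as half cycles.
    cycles += [(abs(b - a), 0.5) for a, b in zip(stack, stack[1:])]
    return cycles
```

Applied to a state-of-charge trace, each returned range is a cycle depth in SoC units, which is the quantity a depth-dependent stress model would weight.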

2606.16090 2026-06-16 quant-ph cs.LG 交叉投稿

Enhancing Quantum Machine Learning with Anyons

利用任意子增强量子机器学习

Da Zhang, Wen-Qiang Liu, Zhaohui Wei, Zhang-Qi Yin

发表机构 * Center for Quantum Technology Research and Key Laboratory of Advanced Optoelectronic Quantum Architecture and Measurements (MOE), School of Physics, Beijing Institute of Technology(北京理工大学物理学院量子技术研究中心、先进光电量子架构与测量教育部重点实验室) Department of Mathematics and Physics, Shijiazhuang Tiedao University(石家庄铁道大学数理系) Yau Mathematical Sciences Center, Tsinghua University(清华大学丘成桐数学科学中心) Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing(北京雁栖湖应用数学研究院)

AI总结 提出统一玻色子、费米子和任意子交换统计的量子核框架,通过表示、核几何和学习基准分析,证明任意子核在特征空间结构和学习性能上优于传统统计。

Comments 19 pages, 3 figures

详情
AI中文摘要

量子计算和量子机器学习的力量依赖于利用独特的量子现象作为计算资源。虽然叠加、相干和纠缠一直是这一努力的核心,但粒子交换统计的作用在很大程度上仍未探索。在这里,我们引入了一个量子核框架,将玻色子、费米子和任意子(分数)交换统计统一在单一学习范式内。我们从三个角度研究这一族核。在表示层面,Haar平均有效维数分析表明,分数交换相位能够访问纯对称或反对称极限无法访问的特征空间方向。在核几何层面,相应的Gram矩阵显示出与可区分粒子基线的更大分离,以及更低的标签相关模型复杂度。最后,在学习基准测试中,任意子核始终优于其玻色子和费米子对应物,具有更强的目标对齐和更有利的类别几何。总之,这些发现表明,交换统计重塑了量子特征空间的结构和几何,从而增强了学习性能。我们的工作将粒子交换统计确定为量子机器学习中被忽视的计算成分,并首次对跨交换相位的量子学习模型进行了系统比较。

英文摘要

The power of quantum computing and quantum machine learning relies on harnessing uniquely quantum phenomena as computational resources. While superposition, coherence and entanglement have been central to this effort, the role of particle exchange statistics remains largely unexplored. Here, we introduce a quantum kernel framework that unifies bosonic, fermionic, and anyonic (fractional) exchange statistics within a single learning paradigm. We study this family of kernels from three perspectives. At the representation level, Haar-averaged effective-dimension analysis shows that fractional exchange phases access feature-space directions inaccessible to the purely symmetric or antisymmetric limits. At the level of kernel geometry, the corresponding Gram matrices show greater separation from the distinguishable-particle baseline and reduced label-dependent model complexity. Finally, on learning benchmarks, anyonic kernels consistently outperform their bosonic and fermionic counterparts, with stronger target alignment and more favorable class geometry. Together, these findings show that exchange statistics reshape the structure and geometry of quantum feature space, leading to enhanced learning performance. Our work identifies particle exchange statistics as an overlooked computational ingredient for quantum machine learning and provides the first systematic comparison of quantum learning models across exchange phases.
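To make "exchange phase" concrete, here is a toy fidelity kernel. It assumes a two-particle state of the form |a⟩|b⟩ + exp(iθ)|b⟩|a⟩, where θ = 0 gives the bosonic (symmetric) case, θ = π the fermionic (antisymmetric) case, and intermediate θ an anyon-like phase. The single-particle feature map `single_particle_states` is hypothetical, and this plain tensor-product construction is our simplification, not the authors' anyonic model.

```python
import numpy as np


def single_particle_states(x):
    """Hypothetical feature map: two unit vectors in C^3 built from a 2-D input."""
    x0, x1 = x
    a = np.array([np.cos(x0), np.sin(x0) * np.cos(x1), np.sin(x0) * np.sin(x1)], dtype=complex)
    b = np.array([np.cos(x1), np.sin(x1) * np.cos(x0), np.sin(x1) * np.sin(x0)], dtype=complex)
    return a, b


def exchange_state(x, theta):
    """Normalized |a>|b> + exp(i*theta)|b>|a>."""
    a, b = single_particle_states(x)
    psi = np.kron(a, b) + np.exp(1j * theta) * np.kron(b, a)
    norm = np.linalg.norm(psi)
    if norm < 1e-12:
        raise ValueError("state vanishes for this input and phase (e.g. fermions with a == b)")
    return psi / norm


def exchange_kernel(x, y, theta):
    """Fidelity kernel |<psi(x)|psi(y)>|^2 at exchange phase theta."""
    return float(abs(np.vdot(exchange_state(x, theta), exchange_state(y, theta))) ** 2)


def gram(X, theta):
    """Gram matrix over the rows of X."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = exchange_kernel(X[i], X[j], theta)
    return K
```

Sweeping θ from 0 to π and feeding each Gram matrix to a kernel method (for example an SVM with a precomputed kernel) is one way to compare exchange statistics on equal footing.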

2606.16171 2026-06-16 eess.SY cs.LG cs.SY 交叉投稿

Data-driven Control with Real-time Uncertainty Compensation for Multi-Fuel Engines

多燃料发动机的实时不确定性补偿数据驱动控制

Rajasree Sarkar, Arunava Banerjee, Sathya Aswath Govind Raju, Ishan Berk Altiner, Zongxuan Sun, Kenneth Kim, Chol-Bum Mike Keown

发表机构 * Department of Mechanical Engineering, University of Minnesota Twin Cities(明尼苏达大学双城分校机械工程系) DEVCOM Army Research Laboratory, Aberdeen Proving Ground, MD, USA(美国陆军作战能力发展司令部陆军研究实验室,马里兰州阿伯丁试验场)

AI总结 针对多燃料压燃发动机燃烧相位控制中建模不确定性的挑战,提出一种基于高斯过程回归模型和不确定性补偿器的数据驱动实时控制框架,实现有限循环内收敛。

详情
AI中文摘要

多燃料压燃(CI)发动机具有出色的功率密度和燃料灵活性。然而,在广泛运行条件下实现一致且最优的燃烧相位仍然是一个重大挑战,尤其是在存在建模不确定性的情况下。本文提出了一种新颖的、数据驱动的实时不确定性补偿框架,用于多燃料CI发动机的燃烧控制。所提出的方法引入了一个伪发动机转速,使得控制输入能够动态适应影响发动机的不确定性。为了对底层燃烧过程进行建模,首先在可用的输入-输出数据上训练高斯过程回归(GPR)模型,捕捉不同运行条件下的非线性和燃料依赖行为。然后通过学习的GPR代理的模型逆合成控制输入,并增加一个不确定性补偿器,旨在减轻由运行条件动态变化和模型不准确性引起的偏差。这种集成控制策略允许在有限数量的燃烧循环内进行实时输入修正。理论分析为所提出的控制器建立了有限时间收敛保证。仿真结果表明,所提出的方法能够实时将燃烧相位引导至期望值,为多燃料CI发动机运行提供了一种可扩展且自适应的控制解决方案。

英文摘要

Multi-fuel compression ignition (CI) engines offer superior power density and fuel flexibility. However, achieving consistent and optimal combustion phasing across a wide range of operating conditions remains a major challenge, particularly in the presence of modeling uncertainties. This paper presents a novel, data-driven real-time uncertainty compensation framework for combustion control in multi-fuel CI engines. The proposed approach introduces a pseudo-engine speed that enables dynamic adaptation of control inputs in response to uncertainty affecting the engine. To model the underlying combustion process, a Gaussian Process Regression (GPR) model is first trained on available input-output data, capturing the nonlinear and fuel-dependent behavior across varying operating conditions. Control inputs are then synthesized through model inversion of the learned GPR surrogate and augmented with an uncertainty compensator designed to mitigate deviations caused by dynamic variations in operating conditions and model inaccuracies. This integrated control strategy allows for real-time input corrections within a finite number of combustion cycles. Theoretical analysis establishes finite-time convergence guarantees for the proposed controller. Simulation results demonstrate that the proposed method steers the combustion phasing to the desired value in real-time, providing a scalable and adaptive control solution for multi-fuel CI engine operation.
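The control idea (fit a GPR surrogate, invert it for the input that yields the desired combustion phasing, then correct for model error online) can be sketched in NumPy. This is a toy under stated assumptions: a two-input surrogate (control input, operating condition), grid-search inversion, and a simple integral correction on the setpoint standing in for the paper's pseudo-engine-speed compensator, which is not reproduced. The linear "plant" and its bias in the usage check are made up.

```python
import numpy as np


def rbf(A, B, length=1.0):
    """Squared-exponential kernel between row sets A [n, d] and B [m, d]."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length ** 2)


class GPR:
    """Minimal GP regression (posterior mean only) with a centred target."""

    def __init__(self, X, y, length=1.0, noise=1e-4):
        self.X, self.length, self.y_mean = X, length, y.mean()
        K = rbf(X, X, length) + noise * np.eye(len(X))
        L = np.linalg.cholesky(K)
        # Solve K alpha = y - mean via the Cholesky factor K = L L^T.
        self.alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - self.y_mean))

    def mean(self, Xs):
        return rbf(Xs, self.X, self.length) @ self.alpha + self.y_mean


def invert(gp, cond, target, u_grid):
    """Grid-search model inversion: input whose predicted output is closest to target."""
    Xs = np.column_stack([u_grid, np.full_like(u_grid, cond)])
    return u_grid[np.argmin(np.abs(gp.mean(Xs) - target))]


def run_loop(gp, plant, cond, target, u_grid, gain=0.8, cycles=6):
    """Per-cycle loop: invert the surrogate, measure, shift the setpoint (integral correction)."""
    correction, outputs = 0.0, []
    for _ in range(cycles):
        u = invert(gp, cond, target + correction, u_grid)
        y = plant(u, cond)
        correction += gain * (target - y)
        outputs.append(y)
    return outputs
```

With a plant offset the surrogate never saw, the first cycle misses the target and the accumulated correction drives the error toward zero over the following cycles, provided the surrogate's slope has the right sign.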

2606.16271 2026-06-16 cs.CV cs.LG 交叉投稿

Contrastive Learning for Seismic Horizon Tracking with Domain-Specific Priors

基于领域先验的对比学习用于地震层位追踪

Alexandre Thouvenot, Lionel Boillot, Vincent Gripon

发表机构 * IMT Atlantique, LAB-STICC, UMR CNRS 6285(IMT Atlantique, LAB-STICC, CNRS 6285联合实验室) TotalEnergies, OneTech(道达尔能源公司, OneTech)

AI总结 提出自监督融合信号与纹理的方法,利用信号导出的局部层位对应作为领域先验训练纹理深度学习模型,通过对比学习保持层位身份,实现跨不连续面的层位追踪。

Comments 5 pages, 5 figures. Submitted to the IEEE GRSL for possible publication

详情
AI中文摘要

无监督3D地震层位追踪面临一个关键限制:基于信号的传播器提供精确的迹级对齐,但在断层附近常失败,而纹理驱动的深度模型对不连续性更鲁棒,但通常以标记数据需求和降低迹级精度为代价。我们提出了一种自监督融合两种范式的方法,其中信号导出的局部层位对应作为领域先验来训练基于纹理的深度学习模型。具体来说,我们从反射体斜率估计可靠的迹间流,并将其用于形成对比目标中的正对,同时将训练限制在高置信度邻域,可选地使用断层掩码增强。目标不是推断不连续性附近的模糊对应,而是跨不连续性保持层位身份。结果,网络学习到体素级嵌入,保持局部信号连续性,同时通过相似性搜索实现跨不连续性的层位传播。在公共F3数据集和含断层合成数据集上的实验实现了比无监督基线更低的平均绝对误差(MAE),并且与使用单个标记切片的半监督方法性能相当。

英文摘要

Unsupervised 3D seismic horizon tracking faces a key limitation: signal-based propagators provide accurate trace-level alignment but often fail near faults, whereas texture-driven deep models are more robust to discontinuities but typically require labeled data and sacrifice trace-level precision. We propose a self-supervised fusion of both paradigms in which signal-derived local horizon correspondences act as domain-specific priors to train a texture-based deep learning model. Specifically, we estimate reliable trace-to-trace flows from reflector slopes and use them to form positive pairs in a contrastive objective, while restricting training to high-confidence neighborhoods, optionally augmented with a fault mask. The objective is not to infer ambiguous correspondences close to discontinuities, but to preserve horizon identity across them. As a result, the network learns voxel-wise embeddings that preserve local signal continuity while enabling horizon propagation beyond discontinuities through similarity search. Experiments on the public F3 dataset and a faulted synthetic dataset achieve lower mean absolute error (MAE) than unsupervised baselines and competitive performance against a semi-supervised method using a single labeled slice.
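The training signal described above is a contrastive loss over positive pairs formed from signal-derived correspondences. A minimal NumPy sketch follows. It assumes per-trace voxel embeddings, an integer depth shift per voxel (standing in for the slope-based trace-to-trace flow) and a confidence mask. The InfoNCE form and the temperature are assumptions, since the abstract does not name the exact objective.

```python
import numpy as np


def flow_pairs(trace_a, trace_b, shift, confidence, min_conf=0.8):
    """Positive pairs (voxel t on trace A, voxel t + shift[t] on trace B), kept where confident.

    trace_a, trace_b: [T, D] voxel embeddings along two neighbouring traces.
    shift: integer depth offsets, e.g. from local reflector slopes (assumed given).
    confidence: per-voxel reliability in [0, 1] (e.g. zeroed inside a fault mask).
    """
    anchors, positives = [], []
    for t in range(len(trace_a)):
        s = t + int(shift[t])
        if confidence[t] >= min_conf and 0 <= s < len(trace_b):
            anchors.append(trace_a[t])
            positives.append(trace_b[s])
    return np.array(anchors), np.array(positives)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: row i of positives is anchor i's positive; all other rows act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

Filtering by confidence before forming pairs mirrors the idea of training only on high-confidence neighbourhoods rather than asking the network to resolve ambiguous matches near faults.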

2606.16333 2026-06-16 cs.CV cs.GR cs.LG 交叉投稿

Differentiable Packing of Irregular 3D Objects with Adaptive Container Estimation

不规则3D物体的可微分装箱与自适应容器估计

Palak Gupta, Shanmuganathan Raman

发表机构 * Indian Institute of Technology Gandhinagar(印度理工学院甘地讷格尔分校)

AI总结 提出一种可微分装箱框架,通过梯度优化联合调整物体姿态和容器尺寸,利用自适应挤压机制和基于张量广播的快速计算,在单个GPU上数分钟内实现比基线方法小11-32%的容器。

Comments 20 pages, 8 figures, 5 tables. Under review at Computers & Graphics (Elsevier)

详情
AI中文摘要

大多数现有方法要么预先固定容器,要么通过外部搜索循环仅优化单个容器维度,其余维度则作为手动调整问题。我们提出了一种可微分装箱框架,在单个基于梯度的循环内联合优化所有6N个物体姿态参数和所有三个容器边长。该公式结合了六个受物理启发的可微分损失项,这些损失项通过轴对齐包围盒代理直接在三角形网格上计算。自适应挤压机制在重叠损失低于随物体对数量缩放的阈值时周期性收紧容器,导致容器体积先大幅下降,然后进行小幅细化。所有成对计算均以张量广播形式编写,与基于循环的参考实现相比,速度提升了3.4到54倍。该流程使用Python和PyTorch实现,无需物理引擎、FFT库或凸分解。在多个物体类别上,该方法在N=100时产生的容器比时间匹配的DBLF和模拟退火基线小11%至32%,同时在单个消费级GPU上每个实例的运行时间不到4分钟。

英文摘要

Most existing approaches either fix the container in advance or optimize only a single container dimension through an outer search loop, leaving the remaining dimensions as a manual tuning problem. We present a differentiable packing framework that jointly optimizes all 6N object pose parameters and all three container side lengths inside a single gradient-based loop. The formulation combines six physics-inspired, differentiable loss terms computed directly on triangle meshes through axis-aligned bounding-box proxies. An adaptive squeezing mechanism periodically tightens the container whenever the overlap loss falls below a pair-count-scaled threshold, producing a large initial drop in container volume, followed by small refinements. All pairwise computations are written in tensor-broadcasting form, giving a 3.4 to 54 times speedup over a reference loop-based implementation. The pipeline is implemented in Python and PyTorch, with no physics engine, FFT library, or convex decomposition. On multiple object categories, the method produces containers that are 11 to 32 percent smaller than time-matched DBLF and simulated-annealing baselines at N = 100, while running in under 4 minutes per instance on a single consumer GPU.
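The speedup quoted above comes from writing pairwise terms in broadcast form. The sketch below shows the pattern for one such term, the total axis-aligned bounding-box overlap volume, in NumPy rather than the paper's PyTorch, next to the loop it replaces, plus a squeeze rule with a pair-count-scaled threshold. The per-pair tolerance and the 0.98 shrink factor are hypothetical.

```python
import numpy as np


def pairwise_overlap_loop(lo, hi):
    """Reference: total AABB intersection volume over all unordered pairs, with Python loops."""
    total = 0.0
    for i in range(len(lo)):
        for j in range(i + 1, len(lo)):
            ext = np.minimum(hi[i], hi[j]) - np.maximum(lo[i], lo[j])
            total += float(np.prod(np.clip(ext, 0.0, None)))
    return total


def pairwise_overlap(lo, hi):
    """Same quantity via broadcasting: [N,1,3] against [1,N,3] covers all pairs at once."""
    ext = np.minimum(hi[:, None, :], hi[None, :, :]) - np.maximum(lo[:, None, :], lo[None, :, :])
    vol = np.clip(ext, 0.0, None).prod(axis=-1)       # [N, N] intersection volumes
    i, j = np.triu_indices(len(lo), k=1)              # each unordered pair once, no self-pairs
    return float(vol[i, j].sum())


def maybe_squeeze(side_lengths, overlap, n_objects, tol_per_pair=1e-3, factor=0.98):
    """Shrink the container once the overlap falls below a pair-count-scaled threshold."""
    n_pairs = n_objects * (n_objects - 1) / 2
    if overlap < tol_per_pair * n_pairs:
        return side_lengths * factor
    return side_lengths
```

The same expressions carry over to PyTorch tensors with `torch.minimum`, `torch.maximum` and `torch.clamp(min=0)`, which keeps the overlap term differentiable almost everywhere.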

2606.16505 2026-06-16 cs.SD cs.LG 交叉投稿

Semi-Supervised Speech Confidence Detection using Pseudo-Labelling and Whisper Embeddings

半监督语音自信度检测:使用伪标签和Whisper嵌入

Adam Wynn, Jingyun Wang, Xiangyu Tan

发表机构 * Durham University(杜伦大学) Shanghai Open University(上海开放大学)

AI总结 提出一种结合人工特征与Whisper嵌入的框架,通过伪标签技术扩充数据,利用共注意力机制融合特征,实现75%的语音自信度检测准确率。

Comments 8 pages, 3 figures. Published in the Proceedings of the 26th International Conference on Artificial Intelligence in Education (AIED 2025). Shorter, preliminary version of arXiv:2605.12387

详情
Journal ref
AIED 2025. LNCS vol 15882. Springer, Cham (2025)
AI中文摘要

理解说话者的自信度在教育环境中至关重要,因为它可以增强个性化反馈并改善学习成果。本研究引入了一种新颖的框架,通过将人工设计的特征与Whisper编码器的嵌入相结合来检测说话者的自信度。为了解决数据限制问题,采用伪标签技术来扩展标记数据集,使模型能够从人工标注和模型生成的标签中学习。该框架将传统语音特征(包括音高、音量、语速以及不流畅和重音的存在)与Whisper嵌入相结合,并使用共注意力机制融合这些表示,实现了75%的整体准确率。本研究有助于推进语音分析,支持个性化学习和口语技能发展的应用。

英文摘要

Understanding speaker confidence is crucial in educational settings, as it can enhance personalised feedback and improve learning outcomes. This study introduces a novel framework for detecting speaker confidence by integrating human-engineered features with embeddings from the Whisper encoder. To address data limitations, a pseudo-labelling technique is employed to expand the labelled dataset, allowing the model to learn from both human-annotated and model-generated labels. The framework combines traditional speech features, including pitch, volume, rate of speech, and the presence of disfluencies and stress, with Whisper embeddings, and uses a co-attention mechanism to fuse these representations, achieving an overall accuracy of 75%. This study contributes to advancing speech analysis, enabling applications that support personalised learning and speaking skill development.

2606.16510 2026-06-16 math.NA cs.LG cs.NA 交叉投稿

Petrov-Galerkin Variational Physics-Informed Neural Network Framework for Two-Dimensional Singularly Perturbed Problems

Petrov-Galerkin变分物理信息神经网络框架用于二维奇异摄动问题

Vijay Kumar, Gautam Singh

发表机构 * Department of Mathematics, National Institute of Technology Tiruchirappalli(蒂鲁吉拉帕利国家理工学院数学系)

AI总结 提出Petrov-Galerkin变分物理信息神经网络(VPINN)方法,采用神经网络构建试验空间和张量积帽函数作为测试函数,高效求解二维奇异摄动问题,在最大范数和L2范数上实现高精度。

详情
AI中文摘要

本研究提出了一种基于Petrov-Galerkin的变分物理信息神经网络(VPINN),用于高效求解具有一个和两个小摄动参数的二维奇异摄动问题(SPPs)。该方法采用神经网络构建试验解空间,同时采用张量积帽函数作为测试函数来强制执行变分形式。为了精确解析尖锐边界层,使用Petrov-Galerkin公式实现变分形式。Dirichlet边界条件直接施加,而源项通过自动微分计算。在标准二维问题上的计算实验表明,所提方法在最大范数和L2范数上均实现了高精度。这些结果证实了Petrov-Galerkin VPINN方法在准确捕捉二维SPPs多尺度特征方面的效率和鲁棒性。

英文摘要

This study proposes a Petrov-Galerkin based Variational Physics-Informed Neural Network (VPINN) for efficiently solving two-dimensional singularly perturbed problems (SPPs) with one and two small perturbation parameters. The approach employs neural networks to construct the trial solution space, while tensor-product hat functions are adopted as test functions to enforce the variational form. To accurately resolve sharp boundary layers, the variational form is implemented using a Petrov-Galerkin formulation. Dirichlet boundary conditions are imposed directly, while the source terms are computed using automatic differentiation. Computational experiments on standard two-dimensional problems demonstrate that the proposed method achieves high accuracy in both the maximum and $L_2$ norms. These results confirm the efficiency and robustness of the Petrov-Galerkin VPINN approach in accurately capturing the multiscale features of two-dimensional SPPs.
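The Petrov-Galerkin idea (a trial function from the network, piecewise-linear hat functions as test functions) can be shown in a 1D analogue. The sketch assumes the model problem -εu'' + u = f on (0, 1) with homogeneous Dirichlet data and computes one weak residual per interior hat by Gauss-Legendre quadrature. The 2D method would use tensor-product hats v_i(x)v_j(y), and the trial function `u` here is any callable rather than a trained network.

```python
import numpy as np


def hat(x, xl, xc, xr):
    """Piecewise-linear hat: 1 at xc, 0 at xl and xr."""
    return np.where(x <= xc, (x - xl) / (xc - xl), (xr - x) / (xr - xc))


def dhat(x, xl, xc, xr):
    """Derivative of the hat on each side of its peak."""
    return np.where(x <= xc, 1.0 / (xc - xl), -1.0 / (xr - xc))


def weak_residuals(u, du, f, eps, n_elems=16, n_quad=8):
    """R_k = integral of eps*u'*v_k' + u*v_k - f*v_k over the support of hat k (interior nodes)."""
    nodes = np.linspace(0.0, 1.0, n_elems + 1)
    gx, gw = np.polynomial.legendre.leggauss(n_quad)   # nodes/weights on [-1, 1]
    res = []
    for k in range(1, n_elems):
        xl, xc, xr = nodes[k - 1], nodes[k], nodes[k + 1]
        r = 0.0
        for a, b in ((xl, xc), (xc, xr)):              # integrate each element separately
            x = 0.5 * (b - a) * gx + 0.5 * (a + b)
            w = 0.5 * (b - a) * gw
            v, dv = hat(x, xl, xc, xr), dhat(x, xl, xc, xr)
            r += np.sum(w * (eps * du(x) * dv + u(x) * v - f(x) * v))
        res.append(r)
    return np.array(res)
```

A VPINN-style loss would then be the sum of squared residuals, `np.sum(weak_residuals(...) ** 2)`, minimized over the network parameters; for an exact solution the residuals vanish up to quadrature error.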

2606.16587 2026-06-16 physics.flu-dyn cs.AI cs.LG physics.comp-ph 交叉投稿

Learning Interface Breakup: A Geometry-Conditioned Latent Surrogate for Spray Formation

学习界面破碎:一种用于喷雾形成的几何条件潜在代理模型

Julius H Ramlau, Friedrich Hastedt, Tolga Birdal, Ehecatl-Antonio del Río Chanona, Nausheen S Basha, Omar K Matar

发表机构 * University of California, Berkeley(加州大学伯克利分校) Technical University of Munich(慕尼黑工业大学) Istanbul Technology University(伊斯坦布尔技术大学) University of Texas at Austin(德克萨斯大学奥斯汀分校) University of Cambridge(剑桥大学) University of Oxford(牛津大学)

AI总结 提出一种几何条件潜在代理模型,通过编码自适应网格细化(AMR)的单元密度场,在797个两相喷嘴模拟上训练,实现瞬态破碎动力学的高效预测,推理速度比Basilisk CFD快6×10^4倍。

Comments 11 pages, 5 figures, accepted to ICML AI4Physics 2026

详情
AI中文摘要

设计喷雾喷嘴需要预测几何形状如何影响瞬态两相破碎,但采用自适应网格细化(AMR)的高保真流体体积(VOF)模拟对于迭代设计探索来说成本过高。标准代理模型也面临挑战,因为液-气界面和底层的自适应离散化都随时间及几何形状变化。我们引入了一种几何条件潜在代理模型,该模型在797个两相喷嘴模拟上训练,通过编码AMR单元密度场(而非完整的多通道流状态)作为求解器集中分辨率的紧凑代理。从该表示出发,模型重建瞬态密度演化和喷嘴几何形状,而一个轻量级的第二阶段则恢复剩余的流动变量。在保留的模拟上,该方法准确捕捉了关键的界面动力学,同时将每条轨迹的推理时间减少到0.045秒,相对于Basilisk CFD加速超过6×10^4倍。这些结果表明,AMR细化结构可以作为瞬态两相流几何条件代理建模的紧凑且可学习的表示。

英文摘要

Designing spray nozzles requires predicting how geometry shapes transient two-phase breakup, but high-fidelity volume-of-fluid (VOF) simulations with adaptive mesh refinement (AMR) are too expensive for iterative design exploration. Standard surrogate models are also challenged by this setting because both the liquid--gas interface and the underlying adaptive discretization evolve across time and geometries. We introduce a geometry-conditioned latent surrogate trained on 797 two-phase nozzle simulations that addresses this by encoding the AMR cell-density field, rather than the full multi-channel flow state, as a compact proxy for where the solver concentrates resolution. From this representation, the model reconstructs transient density evolution and nozzle geometry, and a lightweight second stage recovers the remaining flow variables. On held-out simulations, the method accurately captures key interface dynamics while reducing inference time to 0.045 seconds per trajectory, corresponding to a speed-up of more than $6\times10^4$ relative to Basilisk CFD. These results suggest that AMR refinement structure can serve as a compact and learnable representation for geometry-conditioned surrogate modeling of transient two-phase flows.

2606.16607 2026-06-16 eess.SP cs.IT cs.LG math.IT 交叉投稿

Context-Aware Markov VAE for CSI Compression in Wireless Systems

面向无线系统中CSI压缩的上下文感知马尔可夫VAE

Efstathios Chatziloizos, Konstantinos Vandikas, Aneta Vulgarakis Feljan, Zheng Chen, Nikolaos Pappas

AI总结 提出基于k-记忆马尔可夫变分自编码器的上下文感知压缩框架,利用有限时间窗口捕捉CSI在潜在空间中的演化,在低中压缩率下显著提升重构性能。

Comments 5 pages, 3 figures, 2 tables

详情
AI中文摘要

本文研究了在频分双工(FDD)系统中,针对时变大规模多输入多输出(MIMO)信道,在有限反馈资源下的神经信道状态信息(CSI)压缩问题。主要挑战在于,由于CSI在连续快照间表现出强时间相关性,需要获得紧凑且高效的CSI表示。现有的无记忆压缩模型未利用这一特性,而简单的时间扩展方法通常合并多个观测值,但未显式建模潜在动态。我们提出了一种基于k-记忆马尔可夫变分自编码器(k-MMVAE)的上下文感知压缩框架,该框架使用有限时间窗口在潜在空间中捕捉CSI的演化。该模型引入了具有有限记忆的马尔可夫结构潜在动态,从而能够有效利用时间依赖性进行压缩。仿真结果表明,与无记忆和弱顺序基线相比,所提方法改善了目标CSI重构性能,尤其是在低和中压缩率下。这些结果表明,显式的潜在时间建模可以在有限反馈约束下为CSI压缩提供有效机制。

英文摘要

This paper considers neural channel state information (CSI) compression for time-varying massive multiple-input multiple-output (MIMO) channels in frequency division duplex (FDD) systems with limited feedback resources. The main challenge lies in obtaining a compact and efficient representation of the CSI given that it exhibits strong temporal correlation across successive snapshots. Existing memoryless compression models do not exploit this property, while simple temporal extensions often incorporate multiple observations without explicitly modeling the latent dynamics. We propose a context-aware compression framework based on a k-memory Markov variational autoencoder (k-MMVAE), which uses a finite temporal window to capture the evolution of CSI in the latent space. The model introduces Markov-structured latent dynamics with finite memory, enabling efficient use of temporal dependencies for compression. Simulation results show that the proposed approach improves target CSI reconstruction performance compared to memoryless and weakly sequential baselines, particularly at low and moderate compression rates. These results suggest that explicit latent temporal modeling can provide an effective mechanism for CSI compression under limited feedback constraints.
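A finite-memory Markov prior enters a VAE objective through the KL term between each step's posterior and a prior conditioned on the previous k latents. The sketch below assumes a linear-Gaussian prior with diagonal covariance and, for brevity, uses posterior means as the history. Both the transition form and that shortcut are assumptions, not details of k-MMVAE.

```python
import numpy as np


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag var_q) || N(mu_p, diag var_p) ), summed over latent dimensions."""
    return float(0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0))


def markov_prior(history, A, var):
    """Hypothetical k-memory linear-Gaussian prior p(z_t | z_{t-k}, ..., z_{t-1}).

    history: [k, d] latents ordered oldest -> newest; A: [k, d, d], A[0] acts on z_{t-1}.
    """
    mean = sum(A[i] @ history[-1 - i] for i in range(len(A)))
    return mean, np.full(history.shape[1], var)


def sequence_kl(mu_q, var_q, A, prior_var):
    """Sum of per-step KL terms for t >= k, using posterior means as the conditioning history."""
    k = len(A)
    total = 0.0
    for t in range(k, len(mu_q)):
        mu_p, var_p = markov_prior(mu_q[t - k:t], A, prior_var)
        total += gaussian_kl(mu_q[t], var_q[t], mu_p, var_p)
    return total
```

A latent sequence that evolves the way the prior expects costs almost nothing in the KL term, while a memoryless prior pays for every step; this is one way temporal correlation can be turned into fewer bits of feedback.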

2606.16693 2026-06-16 q-bio.NC cs.LG 交叉投稿

Learning Hybrid Biophysical Neuron Models with Neural ODEs

利用神经常微分方程学习混合生物物理神经元模型

Jonas Beck, Michael Deistler, Dóra Viktória Molnár, Jakob H. Macke, Philipp Berens

AI总结 提出混合建模框架,将神经常微分方程嵌入电导基生物物理模型,以捕捉未知电流或错误指定的通道动力学,从电压记录中恢复可解释的门控动力学,并降低计算成本。

详情
AI中文摘要

生物物理神经元模型将神经活动的测量与潜在的细胞机制联系起来。然而,一个核心挑战是许多离子通道的动力学特征不明确,而实际简化——省略通道或减少形态细节——会在模型与生物学之间引入系统性差距。弥合这些差距需要能够灵活发现未建模动力学同时保持机制可解释性的方法。在这里,我们引入了一个混合建模框架,将神经常微分方程嵌入到基于电导的生物物理模型中,以捕捉未知电流或错误指定的通道动力学。通过根据电压依赖的稳态和时间常数函数参数化神经ODE,我们直接从电压记录中恢复可解释的门控动力学,而无需假设函数形式。我们展示了混合模型能够拟合2400个离子通道模型的门控动力学,并从单电流钳记录中恢复未知的门控动力学,在现实输入和参数错误指定下泛化到分布外刺激模式。我们还使用我们的方法将皮层神经元的多室模型简化为具有学习轴向电流的单室混合模型,计算成本降低了一个数量级。总之,我们的结果建立了一个即插即用的框架,用于选择性地用电导基模型中的未知组件替换为神经常微分方程,同时保留其机制结构。

英文摘要

Biophysical neuron models link measurements of neural activity to underlying cellular mechanisms. Yet, a central challenge is that the kinetics of many ion channels are poorly characterized, and practical simplifications -- omitting channels or reducing morphological detail -- introduce systematic gaps between model and biology. Bridging these gaps requires approaches that can flexibly discover unmodeled dynamics while preserving mechanistic interpretability. Here, we introduce a hybrid modeling framework that embeds neural ordinary differential equations into conductance-based biophysical models to capture unknown currents or mis-specified channel kinetics. By parameterizing the neural ODE in terms of voltage-dependent steady-state and time-constant functions, we recover interpretable gating dynamics directly from voltage recordings without assuming a functional form. We show that the hybrid model fits the gating kinetics of 2400 ion channel models and recovers unknown gating dynamics from single current-clamp recordings, generalizing to out-of-distribution stimulus regimes under realistic inputs and parameter misspecification. We also use our method to reduce a multicompartment model of a cortical neuron into a single-compartment hybrid model with a learned axial current, yielding up to an order of magnitude lower computational cost. Together, our results establish a plug-and-play framework for selectively replacing unknown components of conductance-based models with neural ODEs while preserving their mechanistic structure.
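Parameterizing a gate by voltage-dependent steady-state and time-constant functions keeps the learned part inside the familiar Hodgkin-Huxley form dx/dt = (x_inf(V) - x)/tau(V). The sketch below uses a tiny randomly initialized NumPy network for those two functions, with a sigmoid to keep x_inf in (0, 1) and a softplus to keep tau positive. The architecture, input scaling, gating exponent and reversal potential are illustrative, and training is not shown.

```python
import numpy as np


def init_gate_net(hidden=16, seed=0):
    """Random weights for a tiny voltage -> (x_inf, tau) network (hypothetical architecture)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(hidden, 1)), "b1": rng.normal(size=hidden),
        "W2": rng.normal(size=(2, hidden)) / np.sqrt(hidden), "b2": np.zeros(2),
    }


def gate_params(V, p):
    """Voltage (mV) -> steady state x_inf in (0, 1) and time constant tau > 0 (ms)."""
    h = np.tanh(p["W1"][:, 0] * (V / 50.0) + p["b1"])   # crude input scaling
    o = p["W2"] @ h + p["b2"]
    x_inf = 1.0 / (1.0 + np.exp(-o[0]))                 # sigmoid: valid open fraction
    tau = 0.1 + np.logaddexp(0.0, o[1])                 # softplus plus a floor: tau > 0
    return float(x_inf), float(tau)


def integrate_gate(V_trace, dt, p, x0=0.0):
    """Step dx/dt = (x_inf(V) - x) / tau(V) with exponential Euler (exact for constant V)."""
    x, xs = x0, []
    for V in V_trace:
        x_inf, tau = gate_params(V, p)
        x = x_inf + (x - x_inf) * np.exp(-dt / tau)
        xs.append(x)
    return np.array(xs)


def gated_current(V, x, g_max=1.0, E_rev=-77.0, power=4):
    """Conductance-based current g_max * x**power * (V - E_rev); constants are illustrative."""
    return g_max * x ** power * (V - E_rev)
```

Because the network only outputs x_inf and tau, the learned gate can be inspected with the same steady-state and time-constant curves used for hand-built channel models.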

2606.16737 2026-06-16 math-ph cs.LG math.MP 交叉投稿

The Algebra of Units: From Buckingham's Pi-grec Theorem to Latent-Variable Learning

单位的代数:从白金汉π定理到潜变量学习

Mauro Valorani

AI总结 提出一种从数据中自动发现无量纲群的方法,利用对数变换后的低维流形和奇异值分解,无需物理先验知识,在合成压缩机数据集上精确恢复经典工程无量纲数。

Comments 31 pages, 2 figures

详情
AI中文摘要

工程师经常测量许多量——速度、压力、温度、长度——这些量用不同的物理单位表示。白金汉π定理指出,这些变量总是可以组合成一组较小的无量纲数,其值完全决定系统的行为。传统上,识别合适的无量纲群需要专家知识和物理洞察。本文表明,它们可以从数据中自动发现,无需事先了解控制物理。关键观察是,在对数变换后,同一系统在不同缩放下的测量值位于一个低维流形上,其几何形状由潜在的无量纲群决定。奇异值分解(SVD)直接从数据中识别该流形。随后对整数指数组合的搜索恢复候选无量纲量,而重复变量过滤器仅保留由机器特征尺度构造的那些。该过程恢复了熟悉的工程群,包括流量系数、扬程系数和马赫数,同时排除了等价但可解释性较差的替代方案。该方法在包含16,000个测量值的合成压缩机数据集上进行了演示。从原始有量纲变量开始,无物理输入,它以数值精度恢复正确的无量纲群,并以低于0.01%的误差重现压缩机性能图。更广泛地说,这项工作揭示了经典量纲分析与现代数据驱动学习之间的密切联系。两者依赖于相同的基本代数结构,为构建同时可解释、可扩展和数据高效的物理模型提供了新途径。

英文摘要

Engineers often measure many quantities (speed, pressure, temperature, length) expressed in different physical units. The Buckingham Pi-grec theorem states that these variables can always be combined into a smaller set of dimensionless numbers whose values fully determine the system's behaviour. Identifying the appropriate dimensionless groups has traditionally required expert knowledge and physical insight. This paper shows that they can instead be discovered automatically from data, without prior knowledge of the governing physics. The key observation is that, after logarithmic transformation, measurements collected under different scalings of the same system lie on a low-dimensional manifold whose geometry is determined by the underlying dimensionless groups. Singular value decomposition (SVD) identifies this manifold directly from data. A subsequent search over integer-exponent combinations recovers candidate dimensionless quantities, while a repeating-variable filter retains only those constructed from the machine's characteristic scales. This procedure recovers familiar engineering groups, including the flow coefficient, head coefficient, and Mach number, while excluding equivalent but less interpretable alternatives. The method is demonstrated on a synthetic compressor dataset containing 16,000 measurements. Starting from raw dimensional variables and no physics input, it recovers the correct dimensionless groups to numerical precision and reproduces the compressor performance map with an error below 0.01%. More broadly, the work reveals a close connection between classical dimensional analysis and modern data-driven learning. Both rely on the same underlying algebraic structure, suggesting new approaches for building physical models that are simultaneously interpretable, scalable, and data-efficient.
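The algebra behind the Pi theorem is a null-space computation: dimensionless groups are exponent vectors that the dimension matrix maps to zero, and SVD exposes that null space. The sketch shows this classical version and a data-driven variant that looks for near-zero-variance directions in centred log-measurements. It omits the paper's integer-exponent search and repeating-variable filter, and the pipe-flow example (density, velocity, length and viscosity, yielding the Reynolds number) is ours, not the compressor dataset.

```python
import numpy as np


def null_space_groups(D, tol=1e-10):
    """Exponent vectors e with D @ e = 0 (as columns), from the SVD of the dimension matrix D."""
    _, s, Vt = np.linalg.svd(D)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T


def groups_from_data(X, n_groups):
    """Data-driven variant: directions of (near-)zero variance in centred log-measurements."""
    Z = np.log(X)
    Z = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[-n_groups:].T            # right singular vectors with the smallest singular values


def normalise(e):
    """Scale an exponent vector so its first non-zero entry is 1, then round away float noise."""
    first = e[np.flatnonzero(np.abs(e) > 1e-8)[0]]
    return np.round(e / first, 6)
```

With columns (ρ, v, L, μ) and rows for mass, length and time, both routes return the exponents (1, 1, 1, -1), that is ρvL/μ; in the data-driven case this works because every sample shares the same value of that group.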

2606.16747 2026-06-16 cs.GR cs.LG 交叉投稿

STAR-NT: Spatiotemporal Acceleration of Real-Time Neural Transparency Rendering

STAR-NT: 实时神经透明渲染的时空加速

Grigoris Tsopouridis, Christos Georgiou-Mousses, Aris Panagiotidis, Andreas Vasilakis, David Corrigan, Tobias A. Franke, Aleksei Gorbonosov, Andrei Astapov, Ioannis Fudos

AI总结 提出时空加速框架,利用空间自适应四叉树细分和时间深度重投影,降低神经顺序无关透明渲染的几何开销,保持视觉质量。

Comments Supplemental material at https://github.com/gtsopus/STAR-NT

详情
AI中文摘要

神经顺序无关透明渲染能够高质量地渲染重叠透明表面,但其几何通道和网络输入生成仍然代价高昂,尤其是在移动和旧硬件上。我们提出了一种时空加速框架,利用空间和时间相干性来减少这一开销,同时保持视觉质量。在空间上,我们使用自适应四叉树屏幕空间细分,根据局部颜色方差缩放几何通道分辨率。在时间上,选定帧通过基于深度的重投影重用前一透明度结果,而不是完全渲染。这些优化共同降低了渲染成本,并高效集成到现有的实时渲染管线中。

英文摘要

Neural order-independent transparency delivers high-quality rendering of overlapping transparent surfaces, but its geometry passes and network input generation remain costly, particularly on mobile and legacy hardware. We present a spatiotemporal acceleration framework that exploits spatial and temporal coherence to reduce this overhead while preserving visual quality. Spatially, we use adaptive quadtree-based screen-space subdivision to scale geometry pass resolution according to local color variance. Temporally, selected frames reuse the previous transparency result through depth-based reprojection instead of full rendering. Together, these optimizations reduce rendering cost and integrate efficiently into existing real-time rendering pipelines.
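A variance-driven quadtree of the kind described above can be sketched in a few lines. This NumPy version assumes a square, power-of-two image, averages per-channel variance for colour input, and uses a hypothetical threshold and minimum tile size. How leaf size then maps to geometry-pass resolution is left out.

```python
import numpy as np


def block_variance(block):
    """Tile variance: per-channel variance averaged over channels, or plain variance for grey."""
    return float(block.var(axis=(0, 1)).mean()) if block.ndim == 3 else float(block.var())


def quadtree_tiles(img, var_thresh=1e-3, min_size=4):
    """Split a square power-of-two image into tiles, recursing only where variance is high.

    Returns (x, y, size) leaves; large low-variance leaves are candidates for a coarser pass.
    """
    leaves = []

    def split(x0, y0, size):
        block = img[y0:y0 + size, x0:x0 + size]
        if size <= min_size or block_variance(block) <= var_thresh:
            leaves.append((x0, y0, size))
            return
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                split(x0 + dx, y0 + dy, half)

    split(0, 0, img.shape[0])
    return leaves
```

Large leaves mark flat regions where a reduced-resolution geometry pass is least likely to be visible; small leaves concentrate full-resolution work where colour changes quickly.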

2606.16815 2026-06-16 eess.SP cs.AI cs.LG 交叉投稿

A Perception vs. Distortion Perspective on Score-Based Generative Channel Estimation

基于分数的生成式信道估计中的感知与失真权衡视角

Marco Skocaj, Lukas Eller, Mate Boban

AI总结 本文通过感知-失真权衡理论,分析了基于分数的生成模型在信道估计中的优势与局限,指出在高预测不确定性下可接近贝叶斯最优性能,低不确定性下判别式方法更优。

Comments 13 pages

详情
AI中文摘要

受其在计算机视觉和逆问题求解中的显著成功驱动,基于分数的模型越来越多地应用于无线通信,并在一系列物理层任务中展现出潜力。然而,尽管兴趣日益增长,当前文献往往缺乏对分数匹配何时比传统判别学习具有实际优势的严格分析。本文旨在通过信道估计这一无线系统中的基本逆问题用例来填补这一空白。我们通过感知-失真权衡的视角,提出了基于分数的信道估计的理论解释,识别了分数匹配表现优异的条件及其关键局限性。特别是,通过将下游无线任务(如容量最大化)建模为信道估计过程的泛函,我们量化了标准失真最小化方法所导致的超额风险。大量数值结果表明,在高预测不确定性下,大的超额风险差距可以通过基于分数的估计来弥补,从而通过学习的后验实现接近贝叶斯最优的预编码,而在低预测不确定性下,由于复杂度更低且模型容量利用更高效,判别式失真最小化方法更可取。

英文摘要

Driven by their remarkable success in computer vision and inverse problem solving, score-based models are increasingly applied to wireless communications, where they show promise across a range of physical-layer tasks. However, despite this growing interest, the current literature often lacks a rigorous analysis of when score-matching offers a tangible advantage over traditional discriminative learning. This paper aims to address this gap through the use-case of channel estimation, a fundamental inverse problem in wireless systems. We present a theoretically grounded interpretation of score-based channel estimation through the lens of the perception-distortion tradeoff, identifying the conditions where score matching excels as well as its key limitations. In particular, by modeling downstream wireless tasks (e.g., capacity maximization) as functionals of the channel estimation process, we quantify the excess risk incurred by standard distortion-minimization approaches. Extensive numerical results show that under high predictive uncertainty, the large excess risk gap can be offset by score-based estimation, enabling near Bayesian-optimal precoding via the learned posterior, whereas in the low predictive uncertainty regime, discriminative distortion-minimization approaches are preferable due to lower complexity and more efficient use of model capacity.
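The perception-distortion tension is already visible in a scalar Gaussian model (a textbook case, not the paper's channel model). The MMSE estimate minimizes squared error but is shrunk toward the mean, while a draw from the posterior has the prior's spread at exactly twice the MMSE. The sketch gives the closed forms and a Monte Carlo check.

```python
import numpy as np


def gaussian_tradeoff(s_h2, s_n2):
    """Closed-form errors for y = h + n with h ~ N(0, s_h2), n ~ N(0, s_n2)."""
    gain = s_h2 / (s_h2 + s_n2)                  # MMSE (Wiener) gain
    post_var = s_h2 * s_n2 / (s_h2 + s_n2)       # posterior variance Var(h | y)
    return {
        "mmse_mse": post_var,                    # minimum achievable distortion
        "mmse_output_var": gain * s_h2,          # shrunk: estimates are less spread than h
        "sampling_mse": 2.0 * post_var,          # posterior sampling doubles the MSE
        "sampling_output_var": s_h2,             # but samples match the prior's spread
    }


def simulate(s_h2, s_n2, n=200_000, seed=0):
    """Monte Carlo estimates of the same four quantities."""
    rng = np.random.default_rng(seed)
    h = rng.normal(0.0, np.sqrt(s_h2), n)
    y = h + rng.normal(0.0, np.sqrt(s_n2), n)
    gain = s_h2 / (s_h2 + s_n2)
    post_var = s_h2 * s_n2 / (s_h2 + s_n2)
    h_mmse = gain * y
    h_sample = h_mmse + rng.normal(0.0, np.sqrt(post_var), n)   # exact draw from p(h | y)
    return {
        "mmse_mse": np.mean((h - h_mmse) ** 2),
        "mmse_output_var": np.var(h_mmse),
        "sampling_mse": np.mean((h - h_sample) ** 2),
        "sampling_output_var": np.var(h_sample),
    }
```

This only shows the distortion cost of posterior sampling; whether a downstream functional such as capacity recovers that cost is the regime-dependent question the paper analyses and this sketch does not.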

2606.16935 2026-06-16 cs.RO cs.AI cs.LG 交叉投稿

CrossMaps: Confidence-Aware Open-Vocabulary Semantic Mapping for Rover Navigation

CrossMaps: 用于漫游车导航的置信度感知开放词汇语义地图

Jan-Niklas Klein, Sona Ghahremani, Christian Medeiros Adriano, Holger Giese

发表机构 * Hasso Plattner Institute for Digital Engineering, Potsdam, Germany(哈索·普拉特纳数字工程研究所(德国波茨坦))

AI总结 提出CrossMaps,一种实时置信度感知开放词汇语义地图构建流水线,通过多尺度CLIP嵌入、置信度融合和双记忆架构生成可查询语义地图,用于漫游车导航。

Comments IEEE International Conference on Robotics and Automation (ICRA) 2026: ROSE International Workshop on Robotics Software Engineering, June 01, 2026, Vienna, Austria

详情
AI中文摘要

漫游车依赖感知来维护空间地图,该地图编码物体和传感器质量(例如,距离可靠性、光照伪影、数据密度),指导数据融合、嵌入更新以及在部分可观测性下的导航。为了研究这些耦合的感知-导航过程,我们提出了CrossMaps,一种实时的置信度感知开放词汇语义地图构建流水线,该流水线从RGB-D数据构建可语言查询的地图。基于VLMaps风格的方法,CrossMaps集成了多尺度CLIP嵌入、置信度感知融合以及由短期记忆(STM)和长期记忆(LTM)组成的双记忆架构。STM使用几何、语义和时间置信度线索聚合噪声视觉观测,而置信且一致的单元被提升到LTM作为持久语义地标。CrossMaps设计用于与Jetson Orin驱动的UGV以及SLAM一起部署,实时运行并生成语义热力图,可通过自然语言查询来引导漫游车导航。

英文摘要

Rovers rely on perception to maintain spatial maps that encode both objects and sensor quality (e.g., range reliability, lighting artifacts, data density), guiding data fusion, embedding updates, and navigation under partial observability. To study these coupled perception-navigation processes, we present CrossMaps, a real-time confidence-aware open-vocabulary semantic mapping pipeline that constructs language-queryable maps from RGB-D data. Building on VLMaps-style approaches, CrossMaps integrates multi-scale CLIP embeddings with confidence-aware fusion and a dual-memory architecture consisting of Short-Term Memory (STM) and Long-Term Memory (LTM). The STM aggregates noisy visual observations using geometric, semantic, and temporal confidence cues, while confident and coherent cells are promoted to the LTM as persistent semantic landmarks. Designed for deployment with a Jetson Orin-powered UGV alongside SLAM, CrossMaps runs in real time and produces semantic heatmaps that can be queried with natural language to guide rover navigation.
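One way to picture confidence-aware fusion with short- and long-term memory is a per-cell, confidence-weighted average of unit embeddings that is promoted once it has enough weight and its observations agree. Everything here (the class name, the promotion weight, agreement measured as mean resultant length, the cosine query) is a hypothetical sketch rather than CrossMaps' implementation. In practice the confidence would combine geometric, semantic and temporal cues, and the embeddings would come from CLIP.

```python
import numpy as np


class DualMemoryMap:
    """Hypothetical per-cell fusion: confidence-weighted STM accumulation, promotion to LTM."""

    def __init__(self, dim, promote_weight=3.0, min_agreement=0.9):
        self.dim = dim
        self.promote_weight = promote_weight
        self.min_agreement = min_agreement
        self.stm = {}  # cell -> (confidence-weighted sum of unit embeddings, total confidence)
        self.ltm = {}  # cell -> persistent unit embedding (semantic landmark)

    def observe(self, cell, embedding, confidence):
        if confidence <= 0.0:
            return  # unreliable observation: ignore it
        e = embedding / np.linalg.norm(embedding)
        s, w = self.stm.get(cell, (np.zeros(self.dim), 0.0))
        s, w = s + confidence * e, w + confidence
        self.stm[cell] = (s, w)
        # Mean resultant length: 1.0 only if all weighted observations point the same way.
        agreement = np.linalg.norm(s) / w
        if w >= self.promote_weight and agreement >= self.min_agreement:
            self.ltm[cell] = s / np.linalg.norm(s)
            del self.stm[cell]

    def query(self, text_embedding):
        """Cosine similarity of a (e.g. CLIP text) query against every LTM landmark."""
        q = text_embedding / np.linalg.norm(text_embedding)
        return {cell: float(q @ v) for cell, v in self.ltm.items()}
```

Cells whose observations keep disagreeing never reach the long-term map, which is the intended filtering effect of separating noisy short-term evidence from persistent landmarks.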

2606.16950 2026-06-16 physics.ins-det cs.LG physics.bio-ph physics.chem-ph physics.data-an q-bio.BM 交叉投稿

Latent space mapping of interpretable structural coordinates from stochastic single-molecule signals

从随机单分子信号中可解释结构坐标的潜空间映射

Matteo Cartiglia, Sandro Kuppel, Wouter Botermans, Wannes Peeters, Natan Biesmans, Liam Vandekerckhove, Eric Beamish, Koen Ongena, Wouter Renckens, Pol Van Dorpe, Sanjin Marion

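AI summary: Proposes mapping stochastic nanopore signals, via a contrastive encoder, into a latent space of interpretable molecular structural coordinates, enabling efficient identification and data fusion.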

Comments 32 pages, 6 figures

Abstract

Nanopores are versatile single-molecule sensors, but their utility is fundamentally constrained by stochastic translocation dynamics that warp any encoded information. We resolve this by shifting from time-domain analysis to a learned latent-space mapping via a contrastive encoder trained exclusively on simulated signals from a physics-informed model. This encoder maps solid-state nanopore signals of engineered DNA barcodes into an interpretable molecular coordinate system. The learned representation is responsive to structural barcode parameters while remaining invariant to acquisition conditions and translocation conformation, allowing data pooling across devices. Molecule identification requires a single pass through the encoder, reducing computational cost by three orders of magnitude relative to alignment-based methods. We validate the approach experimentally through mixture quantification, rare-variant detection, consensus barcode reconstruction, and real-time signal acquisition. This shift from temporal analysis to mapping structural coordinates into a latent space changes the paradigm for analyzing stochastic sensor signals by linking classification to interpretable encoded molecular information.
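
Contrastive encoders are commonly trained with an InfoNCE-style objective that pulls paired embeddings together and pushes the other items in the batch apart. The abstract does not state the exact loss, so the NumPy sketch below is an assumption. It treats two simulated translocations of the same barcode (hypothetically, under different acquisition conditions) as a positive pair. This is one way such a loss could encourage invariance to acquisition conditions while staying sensitive to barcode structure.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss where row i of z_a and row i of z_b form a positive pair
    and every other row of z_b serves as a negative for row i of z_a."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature             # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # cross-entropy on the positives
```

A low loss means each embedding is closest to its own pair. With embeddings constrained this way, molecule identification reduces to a single encoder pass followed by a nearest-neighbor lookup.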

2606.16985 2026-06-16 stat.ML cs.LG eess.SP nlin.CD stat.ME Cross-list

Dynestyx: A Probabilistic Programming Library for Dynamical Systems

Daniel Waxman, Dmitry Batenkov, John Feser, Andy Zane, Eli Bingham, Youssef Marzouk, Matthew E. Levine

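AI summary: Proposes the dynestyx library, which supports prior specification, mixed-effects inference, and state and parameter estimation for state-space models through a unified interface, enabling Bayesian analysis of dynamical systems.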

Comments 7 pages

Abstract

State-space models (SSMs) are the standard formalism for Bayesian treatment of dynamical systems, with natural applications in statistics, signal processing, and machine learning. Despite their importance in both theory and application, dynamical systems have proven difficult to incorporate in modern probabilistic programming languages (PPLs), making state-of-the-art methods less accessible to practitioners and introducing friction in following the "Bayesian workflow." We introduce dynestyx, a probabilistic programming library with first-class support for SSMs, including state-of-the-art methods in the estimation of both states and parameters. Through a single, unified interface, users may specify arbitrary priors for discrete-time or continuous-time dynamical systems, perform inference over mixed-effect data, and make state and parameter estimates with principled uncertainty quantification.
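
For readers new to SSMs, the simplest case is a linear-Gaussian model. There, exact state filtering and the marginal likelihood are given by the Kalman filter. The sketch below is a plain-Python scalar Kalman filter. It does not use or reproduce the dynestyx API, and the model and its parameter values are illustrative assumptions.

```python
import math


def kalman_filter(ys, a=1.0, q=0.1, h=1.0, r=0.5, m0=0.0, p0=1.0):
    """Scalar Kalman filter for the linear-Gaussian SSM
        x_t = a * x_{t-1} + N(0, q),   y_t = h * x_t + N(0, r),   x_0 ~ N(m0, p0).

    Returns filtered means, filtered variances, and the log marginal likelihood
    log p(y_1:T), the quantity a Bayesian workflow would use to infer a, q, r.
    """
    m, p = m0, p0
    means, variances, loglik = [], [], 0.0
    for y in ys:
        # Predict: propagate the state distribution through the dynamics
        m_pred = a * m
        p_pred = a * a * p + q
        # Update: condition on the new observation
        s = h * h * p_pred + r       # innovation (predictive) variance
        k = p_pred * h / s           # Kalman gain
        resid = y - h * m_pred       # innovation
        m = m_pred + k * resid
        p = (1.0 - k * h) * p_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + resid * resid / s)
        means.append(m)
        variances.append(p)
    return means, variances, loglik
```

Parameter estimation is the other half of the problem that libraries of this kind automate. It amounts to maximizing or sampling over `a`, `q`, and `r` using the returned log-likelihood. Nonlinear or continuous-time models need approximate filters instead.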

2602.10385 2026-06-16 cs.LG cs.AI Replacement

Capture Timing-Attention of Events in Clinical Time Series

Jia Li, Yu Hou, Rui Zhang

Affiliations: Department of Surgery and Department of Computer Science, University of Minnesota, Minneapolis, MN, USA

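AI summary: Proposes the LITT architecture, which aligns event sequences on a virtual relative timeline to realize an event timing-attention mechanism for personalized clinical trajectory analysis. It outperforms existing methods in predicting cardiotoxicity in breast cancer patients.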

Comments 8 pages of body text

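AI Chinese abstract (English translation)

Automatically discovering personalized trajectories (i.e., sequential event patterns) from longitudinal EHR data is essential for precision medicine in clinical research, yet it remains a formidable challenge even for contemporary AI models. For example, while the Transformer's attention mechanism can capture rich associations, it is largely indifferent to the timing and order of events, thereby bypassing potential causal reasoning. Intuitively, we need a method that can assess the "degree of alignment" between patient-specific trajectories and identify their shared patterns (i.e., salient events in a consistent sequence). This requires treating time as a truly computable dimension, allowing the model to assign candidate events "relative timestamps" beyond their observed physical times. In this work, we introduce LITT (Level-of-Individual Time Transformation), a novel architecture that temporally aligns sequential events on a virtual "relative timeline," enabling event-timing-focused attention and personalized interpretation of clinical trajectories. Its interpretability and effectiveness are validated on real-world longitudinal EHR data from 3,276 breast cancer patients for predicting the onset time of cardiotoxicity-induced heart disease. Furthermore, LITT outperforms benchmark and state-of-the-art survival analysis methods on public datasets, making it an important step toward precision medicine with clinical AI.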

Abstract

The contemporary paradigm of trajectory learning operates fundamentally at the level of group dynamics, systematically reducing individual-level complexity to fit group-level models, thus rendering effective patient subtyping difficult and individual-level modeling largely out of reach. We propose a data-driven paradigm that introduces a dedicated individual-level temporal variable to capture "Timing Attention" (i.e., the degree of concentration of an event's timing distribution across the patient cohort), thereby rendering timing a computable dimension that enables individualized temporal features in trajectory learning. Instantiated as the Level-of-Individual Time Transformation (LITT) and applied to longitudinal EHR data from 3,276 breast cancer patients, the proposed paradigm demonstrates, for the first time to our knowledge: (1) automatic discovery of clinically significant patient trajectories, and (2) counterfactual timing deduction, that is, a "What-If Machine." Both results are purely data-driven, requiring no prior domain knowledge. LITT further achieves strong performance on timing prediction and survival analysis tasks.

2412.00107 2026-06-16 cs.LG cs.AI eess.SP Replacement

Virtual Sensing to Enable Real-Time Monitoring of Inaccessible Locations & Unmeasurable Parameters

Kazuma Kobayashi, Farid Ahmed, Jaewan Park, Subhankar Sarkar, Souvik Chakraborty, Syed Bahauddin Alam

Affiliations: Nuclear, Plasma & Radiological Engineering Department, Grainger College of Engineering, University of Illinois Urbana-Champaign; Mechanical Science and Engineering Department, Grainger College of Engineering, University of Illinois Urbana-Champaign; National Center for Supercomputing Applications, Urbana, IL, USA; Department of Applied Mechanics, Indian Institute of Technology Delhi, New Delhi, India; Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi

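AI summary: Targets real-time monitoring in energy systems where physical sensors cannot be deployed. Proposes MIMONet, a neural-operator-based virtual sensing framework that maps sparse boundary measurements to interior fields, achieving accurate sub-millisecond inference across several thermal-fluid systems.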

Comments New analysis and results are added

Abstract

Real-time monitoring of safety-critical interior states remains an open problem in energy systems where physical instrumentation is infeasible. Existing approaches rely on explicit governing equations, finite-dimensional state vectors, or per-instance retraining, which prevents mesh-independent, field-level inference at arbitrary interior coordinates under real-time constraints. We introduce operator-based virtual sensing for nuclear-grade thermal-fluid systems: we use the neural-operator framework to learn solution operators that map sparse boundary measurements to coupled internal fields in physically inaccessible regions, framing the problem class explicitly to distinguish it from classical state estimation and pointwise soft sensing. We instantiate this framework with MIMONet, a branch-trunk operator extended with three practical choices: multi-modal branch encoders for heterogeneous (scalar and function-valued) inputs; multiplicative branch fusion to preserve the bilinear PDE coupling structure; and shared-latent multi-field decoding with per-channel basis projections at the trunk's final layer. Evaluated across escalating complexity, from canonical lid-driven cavity flow to pressurized water reactor subchannels to fully coupled heat exchangers, MIMONet achieves below 5% relative errors and sub-millisecond inference on data-center accelerators (0.35 ms / 46 mJ per heat-exchanger inference on an NVIDIA H200, and sub-millisecond across the A40-H200-GH200 range), while remaining stable under 50% sensor noise. By staying accurate as geometric confinement and physics coupling intensify, MIMONet shows that operator-based virtual sensing can restore observability where physical instrumentation fails, establishing simulation-based feasibility within the evaluated operating envelopes as a step toward future experimental and cross-solver validation for safety-critical energy systems.
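
The three design choices can be pictured with a toy, untrained NumPy forward pass. This is a sketch under stated assumptions, not MIMONet's architecture:

- each branch and the trunk is a single random tanh layer;
- one branch takes two scalar inputs and the other takes a 16-value sensor array;
- the sizes `P` and `C` are arbitrary.

It shows the multiplicative fusion of branch latents and a trunk whose final layer yields a separate basis per output field, with all fields sharing one fused latent.

```python
import numpy as np

rng = np.random.default_rng(0)
P, C = 32, 3  # latent width and number of output fields (both assumed)


def dense(in_dim, out_dim):
    """Hypothetical single tanh layer with random, untrained weights."""
    w = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
    return lambda x: np.tanh(x @ w)


scalar_branch = dense(2, P)    # e.g. two scalar operating conditions
sensor_branch = dense(16, P)   # e.g. 16 sparse boundary sensor readings
trunk = dense(2, P * C)        # query coordinates (x, y) -> one basis per field


def predict(scalars, sensors, coords):
    # Multiplicative fusion: elementwise product of the two branch latents
    b = scalar_branch(scalars) * sensor_branch(sensors)      # (P,)
    # Shared latent, per-channel bases: reshape trunk output to (n, C, P)
    basis = trunk(coords).reshape(len(coords), C, P)
    return np.einsum("ncp,p->nc", basis, b)                  # (n_query, C)


coords = rng.uniform(size=(100, 2))   # arbitrary query points, no fixed mesh
fields = predict(np.array([0.3, 1.2]), rng.normal(size=16), coords)
```

Because the trunk takes raw coordinates, `predict` can be queried at any set of interior points. This is the mesh-independent behavior the abstract emphasizes.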

2504.11320 2026-06-16 cs.LG cs.AI cs.DC math.OC stat.ML Replacement

Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

Ruicheng Ao, Gan Luo, David Simchi-Levi, Xinshang Wang

Affiliations: Institute for Data, Systems, and Society, Massachusetts Institute of Technology; School of Mathematical Sciences, Peking University; Alibaba Group

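AI summary: Proposes a fluid-guided online scheduling approach that uses the WAIT threshold algorithm and the Nested WAIT algorithm to optimize LLM inference latency and capacity under memory constraints, reducing latency under overload.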

Comments 79 pages, 20 figures

Abstract

Large language models now serve millions of users daily, with providers incurring costs exceeding $700,000 per day. Each request requires token-by-token inference, making GPU scheduling central to latency, capacity, and cost. The difficulty is endogenous memory growth: generated tokens expand the Key-Value (KV) cache, and overflow can evict in-progress requests and waste prior computation. We formulate inference as a multi-stage online scheduling problem with endogenous memory growth, linear iteration times, and GPU-resident KV-cache constraints. We introduce a fluid model that characterizes equilibrium batch composition, memory requirement, and stability region. Guided by the fluid model, we design WAIT (Waiting for Accumulated Inference Threshold), a threshold-based admission rule for known output lengths, and Nested WAIT, which extends the rule to unknown output lengths by regulating how requests advance across decode-stage segments. Both algorithms approximate the fluid benchmark asymptotically under the stated memory conditions. Nested WAIT uses an additional safety buffer of moderate scale to hedge against memory-overflow-induced evictions under unknown output lengths. In Vidur simulations configured for Llama-2-7B on an A100 GPU, with supplemental real-GPU validation reported in the appendix, the policies enlarge the empirically observed stable operating range relative to widely used baseline algorithms and reduce latency especially in near-overloaded and overloaded regimes.
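
A threshold-based admission rule in the spirit of WAIT can be sketched as follows. This is a toy illustration, not the paper's algorithm. It assumes known output lengths and measures the KV cache in tokens. It pairs a hypothetical waiting-count threshold with a conservative check that every admitted request can run to completion within the memory budget, so no in-progress request would ever need to be evicted.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_len: int   # tokens already in the KV cache at admission
    output_len: int   # assumed known in advance (the setting WAIT targets)


def peak_kv(batch):
    """Worst-case KV-cache footprint (in tokens) if every request runs to completion."""
    return sum(r.prompt_len + r.output_len for r in batch)


def admit(waiting, running, threshold, memory_limit):
    """Toy threshold admission: hold requests until `threshold` have accumulated,
    then admit those whose completion cannot overflow the KV-cache budget."""
    if len(waiting) < threshold:
        return [], list(waiting)              # keep accumulating
    admitted, remaining = [], []
    for r in waiting:
        if peak_kv(running + admitted + [r]) <= memory_limit:
            admitted.append(r)
        else:
            remaining.append(r)
    return admitted, remaining
```

Batching admissions this way trades a little queueing delay for larger, more memory-efficient batches. In the paper, the fluid model is what guides how such thresholds should actually be set.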

2508.04243 2026-06-16 cs.LG cs.AI Replacement

Automated ultrasound doppler angle estimation using deep learning

Nilesh Patil, Ajay Anand

Affiliations: Goergen Institute for Data Science; University of Rochester Medical Center; University of Rochester

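AI summary: Proposes a deep learning-based method for automated Doppler angle estimation using 2,100 carotid ultrasound images and pretrained models. Mean absolute error ranges from 3.9° to 9.4°, and the best model's error is below the clinically acceptable threshold, which avoids misclassifying normal velocities as stenosis.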

Journal ref Annu Int Conf IEEE Eng Med Biol Soc. 2019 Jul;2019:28-31

Abstract

Angle estimation is an important step in the Doppler ultrasound clinical workflow to measure blood velocity. It is widely recognized that incorrect angle estimation is a leading cause of error in Doppler-based blood velocity measurements. In this paper, we propose a deep learning-based approach for automated Doppler angle estimation. The approach was developed using 2100 human carotid ultrasound images including image augmentation. Five pre-trained models were used to extract image features, and these features were passed to a custom shallow network for Doppler angle estimation. Independently, measurements were obtained by a human observer reviewing the images for comparison. The mean absolute error (MAE) between the automated and manual angle estimates ranged from 3.9° to 9.4° for the models evaluated. Furthermore, the MAE for the best-performing model was less than the acceptable clinical Doppler angle error threshold, thus avoiding misclassification of normal velocity values as a stenosis. The results demonstrate potential for applying a deep learning-based technique for automated ultrasound Doppler angle estimation. Such a technique could potentially be implemented within the imaging software on commercial ultrasound scanners.
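
Why angle errors matter follows from the standard Doppler equation, v = c·Δf / (2·f0·cos θ). The estimated velocity scales with 1/cos θ, so the same angle error causes a larger velocity error at steeper insonation angles. The sketch computes that relative error and the MAE metric used above. The angles in the example are illustrative, not data from the paper.

```python
import math


def doppler_velocity_error(true_angle_deg, angle_error_deg):
    """Relative velocity error caused by misestimating the Doppler angle.

    From v = c * df / (2 * f0 * cos(theta)), v is proportional to 1 / cos(theta),
    so using theta + d instead of theta scales the estimate by cos(theta) / cos(theta + d).
    """
    t = math.radians(true_angle_deg)
    d = math.radians(angle_error_deg)
    return math.cos(t) / math.cos(t + d) - 1.0


def mean_absolute_error(auto_deg, manual_deg):
    """MAE between automated and manual angle estimates, in degrees."""
    return sum(abs(a - m) for a, m in zip(auto_deg, manual_deg)) / len(auto_deg)


# The same 5 degree error costs far more at a steep insonation angle
print(f"{doppler_velocity_error(30, 5):.1%}")  # 5.7%
print(f"{doppler_velocity_error(60, 5):.1%}")  # 18.3%
```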

2508.10967 2026-06-16 cs.LG cs.AI Replacement

Retro-Expert: Collaborative Reasoning for Interpretable Retrosynthesis

Xinyi Li, Sai Wang, Yutian Lin, Yu Wu

Affiliations: University of Science and Technology of China

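AI summary: Proposes the Retro-Expert framework, which combines LLMs with specialized models via reinforcement learning for interpretable retrosynthesis prediction and generates natural-language explanations grounded in chemical logic.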

Abstract

Retrosynthesis prediction aims to infer the reactant molecules based on a given product molecule, which is a fundamental task in chemical synthesis. However, existing methods rely on a static pattern-matching paradigm, which limits their ability to perform effective logical decision-making from chemical data, leading to a black-box process. We propose Retro-Expert, an interpretable retrosynthesis framework that performs collaborative reasoning by combining the complementary strengths of Large Language Models and specialized models via pure reinforcement learning. It outputs natural language explanations grounded in chemical logic through three components: (1) specialized models provide chemical knowledge that is distilled into a high-quality chemical decision space, (2) LLM-driven critical reasoning generates predictions with an interpretable reasoning path, and (3) knowledge-grounded policy optimization refines the interpretable decision policy. Experiments show that Retro-Expert surpasses both LLM-based and specialized models across different metrics, while generating chemically grounded explanations that enhance chemists' trust in practice. The source code for this paper is available at https://github.com/MagixRab-ll/Retro-Expert.

2510.02605 2026-06-16 cs.LG Replacement

Towards CONUS-Wide ML-Augmented Conceptually-Interpretable Modeling of Catchment-Scale Precipitation-Storage-Runoff Dynamics

Yuan-Heng Wang, Yang Yang, Fabio Ciulla, Hoshin V. Gupta, Charuleka Varadharajan

Affiliations: Earth and Environmental Science Area, Lawrence Berkeley National Lab; Department of Hydrology and Atmospheric Science, University of Arizona; School for the Environment, University of Massachusetts Boston; Department of Civil Engineering, The University of Hong Kong

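AI summary: Builds ML-augmented, physically interpretable catchment models using the Mass-Conserving Perceptron (MCP) and evaluates them across diverse hydroclimatic conditions in the contiguous United States (CONUS). MCP-based models perform comparably to LSTMs, underscoring the importance of matching model complexity to how hydrological process dominance varies.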

Comments Main text: 99 pages, 15 figures, 5 tables; Appendix: Sections A-E, 2 figures; Supplementary Materials: 22 figures, 9 tables

Abstract

While many modern studies are dedicated to ML-based large-sample hydrologic modeling, these efforts have not necessarily translated into predictive improvements that are grounded in enhanced physical-conceptual understanding. Here, we report on a CONUS-wide large-sample study (spanning diverse hydro-geo-climatic conditions) using ML-augmented physically-interpretable catchment-scale models of varying complexity based on the Mass-Conserving Perceptron (MCP). Results were evaluated using attribute masks such as snow regime, forest cover, and climate zone. Our results indicate the importance of selecting model architectures of appropriate complexity based on how process dominance varies with hydrological regime. Benchmark comparisons show that physically-interpretable mass-conserving MCP-based models can achieve performance comparable to data-based models built on the Long Short-Term Memory network (LSTM) architecture. Overall, this study highlights the potential of a theory-informed, physically grounded approach to large-sample hydrology, with emphasis on mechanistic understanding and the development of parsimonious and interpretable model architectures, thereby laying the foundation for future models of everywhere that architecturally encode information about spatially- and temporally-varying process dominance.
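
The abstract does not spell out the MCP architecture. Its mass-conserving idea can still be illustrated with a single storage bucket whose outflows are gated fractions of the current storage, so water is neither created nor destroyed. In this simplified sketch, the fixed gate values are assumptions. They stand in for the learned, input-dependent gates a Mass-Conserving Perceptron would use.

```python
def bucket_step(storage, precip, runoff_gate=0.2, et_gate=0.05):
    """One step of a mass-conserving storage-runoff cell (illustrative).

    Gates are fractions of current storage; they must satisfy
    runoff_gate + et_gate <= 1 so that storage never goes negative.
    """
    if runoff_gate + et_gate > 1.0:
        raise ValueError("gates may not remove more than the available storage")
    storage = storage + precip          # all input mass enters storage
    runoff = runoff_gate * storage      # gated fraction leaves as runoff
    et = et_gate * storage              # gated fraction leaves as evapotranspiration
    storage = storage - runoff - et     # the rest is carried forward
    return storage, runoff, et


def simulate(precip_series, s0=0.0, **gates):
    """Run the bucket over a precipitation series; returns final storage,
    total outflow (runoff + ET), and the runoff series."""
    s, total_out, flows = s0, 0.0, []
    for p in precip_series:
        s, q, e = bucket_step(s, p, **gates)
        total_out += q + e
        flows.append(q)
    return s, total_out, flows
```

The mass-balance identity holds by construction: initial storage plus total precipitation equals final storage plus total outflow. That built-in accounting is what keeps models of this kind physically interpretable.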

2510.22266 2026-06-16 cs.LG cs.AI cs.CY Replacement

A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata

Rodrigo Tertulino, Laércio Alencar

Affiliations: Federal Institute of Education, Science, and Technology of Rio Grande do Norte

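AI summary: Applies a multi-level machine learning approach to four categories of features from SAEB microdata. A Random Forest classifies student proficiency with 90.2% accuracy, and SHAP explanations identify the school's average socioeconomic level as the strongest predictor, indicating that academic performance is a systemic phenomenon.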

Comments This article has been published in Discover Education (Springer Nature). The final authenticated version is available at: https://doi.org/10.1007/s44217-026-01699-0

Journal ref Discover Education, 2026

Abstract

Identifying the factors that influence student performance in basic education is a central challenge for formulating effective public policies in Brazil. This study introduces a multi-level machine learning approach to classify the proficiency of 9th-grade and high school students using microdata from the System of Assessment of Basic Education (SAEB). Our model uniquely integrates four data sources: student socioeconomic characteristics, teacher professional profiles, school indicators, and principal management profiles. A comparative analysis of four ensemble algorithms confirmed the superiority of a Random Forest model, which achieved 90.2% accuracy and an Area Under the Curve (AUC) of 96.7%. To move beyond prediction, we applied Explainable AI (XAI) using SHAP, which revealed that the school's average socioeconomic level is the most dominant predictor, demonstrating that systemic factors have a greater impact than individual characteristics in isolation. The primary conclusion is that academic performance is a systemic phenomenon deeply tied to the school's ecosystem. This study provides a data-driven, interpretable tool to inform policies aimed at promoting educational equity by addressing disparities between schools.

2511.18960 2026-06-16 cs.LG cs.CV cs.RO Replacement

AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention

Lei Xiao, Jifeng Li, Juntao Gao, Feiyang Ye, Yan Jin, Jingjing Qian, Jing Zhang, Yong Wu, Xiaoyuan Yu

Affiliations: LiAuto Inc.; Beijing University of Technology; The Chinese University of Hong Kong, Shenzhen

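AI summary: Addresses VLA models' neglect of historical information. Proposes AVA-VLA, which approximates the agent's belief with a recurrent state and introduces Active Visual Attention to dynamically reweight visual tokens, achieving state-of-the-art performance on benchmarks including LIBERO and CALVIN.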

Comments Accepted at CVPR 2026 (Highlight)

Abstract

Vision-Language-Action (VLA) models have shown remarkable progress in embodied tasks recently, but most methods process visual observations independently at each timestep. This history-agnostic design treats robot manipulation as a Markov Decision Process, even though real-world robotic control is inherently partially observable and requires reasoning over past interactions. To address this mismatch, we reformulate VLA policy learning from a Partially Observable Markov Decision Process perspective and propose AVA-VLA, a framework that conditions action generation on a recurrent state that serves as a neural approximation to the agent's belief over task history. Built on this recurrent state, we introduce Active Visual Attention (AVA), which dynamically reweights visual tokens in the current observation to focus on regions most relevant given both the instruction and execution history. Extensive experiments show that AVA-VLA achieves state-of-the-art performance on standard robotic benchmarks, including LIBERO and CALVIN, and transfers effectively to real-world dual-arm manipulation tasks. These results demonstrate the effectiveness of temporally grounded active visual processing for improving VLA performance in robotic sequential decision-making. The project page is available at https://liauto-dsr.github.io/AVA-VLA-Page.

2512.16184 2026-06-16 cs.LG Replacement

A Multimodal Approach to Alzheimer's Diagnosis: Geometric Insights from Cube Copying and Cognitive Assessments

Jaeho Yang, Kijung Yoon

Affiliations: Department of Electronic Engineering, Hanyang University; Department of Artificial Intelligence, Hanyang University

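AI summary: Proposes a multimodal framework that converts hand-drawn cube sketches into graphs and combines them with demographic data and neuropsychological test scores for Alzheimer's disease classification. Graph representations outperform pixel-based models, and multimodal fusion further improves performance.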

Abstract

Early and accessible detection of Alzheimer's disease (AD) remains a critical clinical challenge, and cube-copying tasks offer a simple yet informative assessment of visuospatial function. This work proposes a multimodal framework that converts hand-drawn cube sketches into graph-structured representations capturing geometric and topological properties, and integrates these features with demographic information and neuropsychological test (NPT) scores for AD classification. Cube drawings are modeled as graphs with node features encoding spatial coordinates, local graphlet-based topology, and angular geometry, which are processed using graph neural networks and fused with age, education, and NPT features in a late-fusion model. Experimental results show that graph-based representations provide a strong unimodal baseline and substantially outperform pixel-based convolutional models, while multimodal integration further improves balanced classification performance and discriminative ability. SHAP-based interpretability analysis identifies specific graphlet motifs associated with corner integrity and edge continuity as key predictors, closely aligning with clinical observations of distorted cube drawings in AD. Together, these findings establish graph-based analysis of cube-copying behavior as an interpretable, non-invasive, and scalable framework for Alzheimer's disease screening.

2512.18725 2026-06-16 cs.LG Replacement

ML Inference Scheduling with Predictable Latency

Haidong Zhao, Nikolaos Georgantas

Affiliations: Inria; Sorbonne University, Paris

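AI summary: Addresses the unpredictability that interference among concurrent tasks introduces into ML inference scheduling. Argues for fine-grained, dynamic interference prediction to improve GPU utilization while meeting SLOs.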

Comments Accepted at MAIoT@Middleware 2025

Journal ref Proceedings of the Middleware for Autonomous AIoT Systems in the Computing Continuum (MAIoT 2025)

Abstract

Machine learning (ML) inference serving systems can schedule requests to improve GPU utilization and to meet service level objectives (SLOs) or deadlines. However, improving GPU utilization may compromise latency-sensitive scheduling, as concurrent tasks contend for GPU resources and thereby introduce interference. Given that interference effects introduce unpredictability in scheduling, neglecting them may compromise SLO or deadline satisfaction. Nevertheless, existing interference prediction approaches remain limited in several respects, which may restrict their usefulness for scheduling. First, they are often coarse-grained, which ignores runtime co-location dynamics and thus restricts their accuracy in interference prediction. Second, they tend to use a static prediction model, which may not effectively cope with different workload characteristics. In this paper, we evaluate the potential limitations of existing interference prediction approaches, finding that coarse-grained methods can lead to noticeable deviations in prediction accuracy and that static models degrade considerably under changing workloads.

2512.19643 2026-06-16 cs.LG cs.CE Replacement

ANCHOR: Error-Controlled Adaptive Numerical Correction for Neural Operator Time Marching

Rajyasri Roy, Dibyajyoti Nayak, Somdatta Goswami

Affiliations: Department of Civil and Systems Engineering, Johns Hopkins University

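AI summary: Proposes the ANCHOR framework, which adaptively couples a pretrained neural operator with a classical numerical solver through a physics-informed residual estimator. This enables stable long-horizon prediction of nonlinear time-dependent PDEs and effectively controls error accumulation.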

Comments 32 pages, 18 figures

Abstract

Numerical simulation of time-dependent partial differential equations (PDEs) is central to scientific and engineering applications, but high-fidelity solvers are often prohibitively expensive for long-horizon or time-critical settings. Neural operator (NO) surrogates offer fast inference across parametric and functional inputs; however, most autoregressive NO frameworks remain vulnerable to compounding errors, and ensemble-averaged metrics provide limited guarantees for individual inference trajectories. In practice, error accumulation can become unacceptable beyond the training horizon, and existing methods lack mechanisms for online monitoring or correction. To address this gap, we propose ANCHOR (Adaptive Numerical Correction for High-fidelity Operator Rollouts), an online, instance-aware hybrid inference framework for stable long-horizon prediction of nonlinear, time-dependent PDEs. ANCHOR treats a pretrained NO as the primary inference engine and adaptively couples it with a classical numerical solver using a physics-informed, residual-based error estimator. Inspired by adaptive time-stepping in numerical analysis, ANCHOR monitors an exponential moving average (EMA) of the normalized PDE residual to detect accumulating error and trigger corrective solver interventions without requiring access to ground-truth solutions. We show that the EMA-based estimator correlates strongly with the true relative L2 error, enabling data-free, instance-aware error control during inference. Evaluations on six canonical PDEs: 1D and 2D Burgers', 2D Allen-Cahn, 2D Cahn-Hilliard, 2D Navier-Stokes, and 3D heat conduction, demonstrate that ANCHOR reliably bounds long-horizon error growth, stabilizes extrapolative rollouts, and significantly improves robustness over standalone neural operators, while remaining substantially more efficient than high-fidelity numerical solvers.
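
The monitoring loop can be sketched as follows. This is a minimal illustration under stated assumptions, not ANCHOR's implementation:

- `no_step`, `solver_step`, and `residual` are hypothetical callables for the neural-operator update, the classical solver update, and the normalized PDE residual;
- `alpha` and `tol` are assumed hyperparameters;
- resetting the EMA after each correction is an assumption.

```python
def anchored_rollout(u0, n_steps, no_step, solver_step, residual, alpha=0.3, tol=1e-2):
    """Advance with the neural operator; fall back to the solver when the
    exponential moving average (EMA) of the residual exceeds `tol`."""
    u, ema = u0, 0.0
    trajectory, corrections = [u0], []
    for n in range(n_steps):
        u_next = no_step(u)                     # fast surrogate proposal
        r = residual(u, u_next)                 # normalized residual of the proposed step
        ema = alpha * r + (1.0 - alpha) * ema   # smooth out single-step noise
        if ema > tol:
            u_next = solver_step(u)             # corrective high-fidelity step
            corrections.append(n)
            ema = 0.0                           # assumed reset after correction
        u = u_next
        trajectory.append(u)
    return trajectory, corrections
```

Smoothing with an EMA means a single noisy residual does not trigger the expensive solver, while a persistent drift does. The solver is therefore invoked only intermittently rather than at every step.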

2602.03957 2026-06-16 cs.LG cs.CY Replacement

Temporal Validation Changes the Apparent Public-Health Utility of Under-Five Mortality Prediction in Bangladesh: A Four-Round DHS Machine-Learning Study

Md Muhtasim Munif Fahim, M. Monimul Huq, M. Sabiruzzaman, Md Rezaul Karim

Affiliations: Data Science Research Lab, Department of Statistics, University of Rajshahi; Department of Statistics, University of Rajshahi

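AI summary: Using four rounds of Bangladesh Demographic and Health Survey (BDHS) data, compares how machine learning models for under-five mortality prediction perform under different validation designs. Finds that the validation design, particularly temporal validation, shapes the assessed public-health utility more than model architecture does.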

Comments 26 pages, 6 figures. Submitted to BMC Medical Informatics

Abstract

Background: Under-five mortality in Bangladesh remains uneven despite national progress. DHS-based prediction models may guide targeted follow-up, but only if validation reflects future use. We examined how validation design changes apparent prediction performance. Methods: Four BDHS rounds (2011-2022; 33,962 children; 1,290 deaths) were analysed with a 26-feature pipeline and three model classes under four validation regimes, including cross-survey temporal validation (train 2011+2014, calibrate 2017, test 2022). A 32-unit ELU multilayer perceptron was selected via genetic-algorithm neural architecture search. AUROC used 2,000 bootstrap resamples; screening utility used sensitivity, PPV, and number needed to screen (NNS) at fixed capacity. Results: Validation regime altered public-health interpretation more than model class. NAS MLP AUROC ranged from 0.669 (2022-only random) to 0.775 (pooled random), with temporal AUROC 0.730. At the top-10% temporal threshold, NAS identified 152/355 deaths in 2022 (sensitivity 42.8%, PPV 13.2%, NNS 7.6). NNS across designs ranged from 5.6 to 11.0. Conclusions: Validation-regime choice changed screening workload and apparent policy value more than architecture. Temporal validation supports defensible estimates of follow-up and referral demand; DHS child-mortality studies should report sensitivity, PPV, and NNS before programmatic use.
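
The screening metrics reported above are linked by simple arithmetic:

- sensitivity is identified deaths over all deaths;
- PPV is identified deaths over all flagged children;
- NNS is 1/PPV.

The sketch below recomputes sensitivity and NNS from the reported 2022 figures (152 of 355 deaths identified, PPV 13.2%). The implied number of flagged children is a derived approximation, not a figure reported in the abstract.

```python
def screening_metrics(true_positives, total_positives, ppv):
    """Sensitivity, number needed to screen, and implied flagged-group size."""
    sensitivity = true_positives / total_positives
    nns = 1.0 / ppv                          # flagged children per death identified
    flagged = true_positives / ppv           # implied size of the flagged group
    return sensitivity, nns, flagged


sens, nns, flagged = screening_metrics(152, 355, 0.132)
print(f"sensitivity {sens:.1%}, NNS {nns:.1f}, ~{flagged:.0f} flagged")
# sensitivity 42.8%, NNS 7.6, ~1152 flagged
```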

2602.05060 2026-06-16 cs.LG cs.CL Replacement

StagePilot: Stage-Level Planning for Long-Horizon Dialogue Simulation in Cybergrooming

StagePilot: 网络诱骗中长程对话模拟的阶段级规划

Heajun An, Qi Zhang, Minqian Liu, Xinyi Zhang, Sang Won Lee, Lifu Huang, Pamela J. Wisniewski, Jin-Hee Cho

发表机构 * Virginia Tech(弗吉尼亚理工大学) University of California, Davis(加州大学戴维斯分校) International Computer Science Institute(国际计算机科学研究所)

AI总结 提出StagePilot框架,通过分离阶段级规划与响应生成,结合强化学习学习阶段策略,实现网络诱骗对话的结构化、连贯模拟,相比基线减少对话停滞,IQL+AWAC变体最终阶段到达率提升43%。

Comments Accepted at the 27th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2026)

AI中文摘要

网络诱骗是对青少年的一种不断演变的威胁,需要主动的教育干预。我们通过将对话进展建模为阶段式交互上的结构化规划问题来解决这一问题。我们提出StagePilot,一个将阶段级规划与响应生成分离的对话框架,其中模型在受约束的转换下选择下一阶段,并基于该阶段生成响应,从而实现连贯且逼真的进展。使用强化学习从离线数据中学习阶段级策略,优化情感对齐和目标一致进展。我们的实证实验表明,与基线相比,StagePilot生成更结构化、更连贯的对话轨迹,并减少对话停滞;值得注意的是,IQL+AWAC变体更频繁地到达最终阶段,同时保持超过70%的正面或中性响应,实现了43%的相对改进。

英文摘要

Cybergrooming is an evolving threat to youth, requiring proactive educational interventions. We address this by modeling dialogue progression as a structured planning problem over stage-wise interactions. We propose StagePilot, a dialogue framework that separates stage-level planning from response generation, in which the model selects the next stage under constrained transitions and generates responses conditioned on it, enabling coherent and realistic progression. Reinforcement learning is used to learn stage-level policies from offline data, optimizing for both emotional alignment and goal-consistent progression. Our empirical experiments show that StagePilot generates more structured, coherent dialogue trajectories and reduces conversational stagnation compared to baselines; notably, the IQL+AWAC variant reaches the final stage more often while maintaining over 70% positive or neutral responses, yielding a 43% relative improvement.
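
A minimal sketch of stage-level planning under constrained transitions follows. It assumes four generic, hypothetical stage labels and a hypothetical "stay or advance by one" transition rule; the Q-values stand in for whatever the offline-RL policy (for example the IQL+AWAC variant) would output. It illustrates constrained stage selection followed by stage-conditioned prompting, not StagePilot's actual implementation.

```python
STAGES = ["stage_0", "stage_1", "stage_2", "stage_3"]  # hypothetical stage labels

# Hypothetical transition constraint: remain in the current stage or advance by one.
ALLOWED = {
    s: list(dict.fromkeys([s, STAGES[min(i + 1, len(STAGES) - 1)]]))
    for i, s in enumerate(STAGES)
}


def select_next_stage(current, q_values):
    """Greedy stage choice restricted to the transitions allowed from `current`."""
    return max(ALLOWED[current], key=lambda s: q_values[s])


def build_prompt(stage, history):
    """Condition the response generator on the chosen stage (illustrative template)."""
    return f"[stage={stage}]\n" + "\n".join(history)


q = {"stage_0": 0.1, "stage_1": 0.5, "stage_2": 0.9, "stage_3": 0.2}  # toy Q-values
print(select_next_stage("stage_0", q))  # stage_1 (stage_2 is not reachable yet)
```

Masking before the argmax keeps the dialogue from skipping stages even when a later stage has a higher estimated value.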

2602.16793 2026-06-16 cs.LG 版本更新

Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models

逃离认知陷阱:使用现成模型高效解决竞赛数学问题

Xingyu Dang, Rohit Agarwal, Rodrigo Porto, Anirudh Goyal, Liam H Fowl, Sanjeev Arora

发表机构 * Princeton University(普林斯顿大学) Princeton Language and Intelligence(普林斯顿语言与智能)

AI总结 提出一种推理流水线,利用现成模型以极低成本在IMO风格数学问题上达到最佳性能,通过猜想提取和上下文分离解决求解器-评分器流水线中的认知陷阱问题。

AI中文摘要

在过去一年中,定制且未公开的数学推理模型在国际数学奥林匹克竞赛(IMO)中达到了金牌水平。随后,也有研究报告在公开可用的模型上通过大规模推理取得了类似性能,但成本高昂(例如每道题3000美元)。在这项工作中,我们提出了一种推理流水线,仅使用通用现成模型,就在IMO风格的数学问题上取得了同类最佳性能,且平均推理成本比竞争方法低几个数量级。我们的方法基于对求解器-评分器流水线中评分器失效的洞察,我们称之为认知陷阱(迭代优化收敛到一个错误解,而求解器和流水线的内部评分器都认为该解基本正确)。我们的流水线通过猜想提取来应对这些失效模式:将候选引理从生成的解中分离出来,并在全新环境中(上下文分离)与其否定形式一起独立验证。在IMO-ProofBench Advanced(PB-Adv)上,我们的流水线使用Gemini 3.0 Pro取得了67.1%的得分,每道题的平均成本约为31美元。在评估时,这是PB-Adv上公开与未公开模型中的最先进水平,成功率是次优公开可访问流水线的两倍以上,而成本仅为其一小部分。

英文摘要

In the past year, custom and unreleased math reasoning models reached gold medal performance on the International Mathematical Olympiad (IMO). Similar performance was then reported using large-scale inference on publicly available models but at prohibitive costs (e.g., 3000 USD per problem). In this work, we present an inference pipeline that attains best-in-class performance on IMO-style math problems at an average inference cost orders of magnitude below competing methods while using only general-purpose off-the-shelf models. Our method relies on insights about grader failure in solver-grader pipelines, which we call the Cognitive Well (iterative refinement converging to a wrong solution that the solver as well as the pipeline's internal grader consider to be basically correct). Our pipeline addresses these failure modes through conjecture extraction, wherein candidate lemmas are isolated from generated solutions and independently verified alongside their negations in a fresh environment (context detachment). On IMO-ProofBench Advanced (PB-Adv), our pipeline achieves 67.1 percent performance using Gemini 3.0 Pro with an average cost per question of approximately 31 USD. At the time of evaluation, this represented the state-of-the-art on PB-Adv among both public and unreleased models, and more than doubles the success rate of the next best publicly accessible pipeline, all at a fraction of the cost.
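
Conjecture extraction with negation checks can be sketched as a filter over candidate lemmas. Here `verify` is a hypothetical callable, assumed to query a grader in a fresh context that never sees the original solution (the context-detachment idea); the negation template and the acceptance rule (lemma accepted, negation rejected) are assumptions, not the paper's exact protocol.

```python
from typing import Callable, List

NEGATION_PREFIX = "It is not the case that: "  # hypothetical negation template


def filter_lemmas(lemmas: List[str], verify: Callable[[str], bool]) -> List[str]:
    """Keep lemmas whose statement is accepted and whose negation is rejected.

    `verify` is a hypothetical callable, assumed to grade a single statement in a
    fresh context that never sees the original solution.
    """
    kept = []
    for lemma in lemmas:
        holds = verify(lemma)
        negation_holds = verify(NEGATION_PREFIX + lemma)
        # A grader that accepts both a claim and its negation gives no evidence.
        if holds and not negation_holds:
            kept.append(lemma)
    return kept


# Toy stand-in for a fresh-context grader, backed by a truth table.
TRUTH = {"lemma A": True, "lemma B": False}


def toy_verify(statement: str) -> bool:
    if statement.startswith(NEGATION_PREFIX):
        return not TRUTH[statement[len(NEGATION_PREFIX):]]
    return TRUTH[statement]


print(filter_lemmas(["lemma A", "lemma B"], toy_verify))  # ['lemma A']
```

A grader that simply agrees with everything (the Cognitive Well failure in miniature) passes no lemma through this filter, because it also accepts every negation.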

2602.17997 2026-06-16 cs.LG cs.RO 版本更新

Whole-Brain Connectomic Graph Model Enables Whole-Body Locomotion Control in Fruit Fly

全脑连接组图模型实现果蝇全身运动控制

Zehao Jin, Yaoye Zhu, Chen Zhang, Yanan Sui

发表机构 * Tsinghua University(清华大学)

AI总结 提出Fly-connectomic Graph Model,将果蝇全脑连接组作为图结构控制器,通过深度强化学习驱动仿真果蝇运动,在多种任务中表现稳定且样本效率优于基线。

AI中文摘要

动物在由全脑连接塑造的神经系统控制下执行协调的全身运动。全脑神经连接(即连接组)的映射为建模感觉运动信息流提供了天然的图结构,但其作为具身智能体神经控制器的潜力尚未被充分探索。本文介绍了Fly-connectomic Graph Model,该模型直接将成年果蝇的全脑连接组实例化为图结构神经控制器,通过深度强化学习驱动仿真生物力学果蝇的运动。我们在多种运动任务中实现了稳定的性能,并且与图和非图基线相比,样本效率更高。我们的结果展示了一种通过将全脑布线原理转化为可操作的架构先验来设计有效控制策略的生物启发式方法,同时通过动态信息流提高了可解释性。这项工作还通过提供一个计算平台来研究动物行为背后的感觉运动转换,以及一种推动更贴近自然的智能系统发展的范式,强调了连接神经力学与具身智能的潜力。

英文摘要

Animals perform coordinated whole-body movements under the control of neural systems shaped by brain-wide connectivity. The mapping of the whole-brain neural connections, or the connectomes, provides a natural graph for modeling sensorimotor information flow, yet its potential as a neural controller for embodied agents remains largely unexplored. Here, we introduce the Fly-connectomic Graph Model, which directly instantiates the whole-brain connectome of an adult Drosophila as a graph-structured neural controller for movements of a simulated biomechanical fruit fly via deep reinforcement learning. We achieve stable performance across diverse locomotion tasks, as well as better sample efficiency compared to both graph and non-graph baselines. Our results demonstrate a biologically informed way towards effective control policy design by translating whole-brain wiring principles into actionable architectural priors, while also improving the interpretability through dynamic information flow. This work also highlights the potential to bridge neuromechanics with embodied intelligence by providing a computational platform for investigating the sensorimotor transformation underlying animal behavior and a paradigm to advance the development of more nature-aligned intelligent systems.
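
To illustrate what instantiating a connectome as a graph-structured controller can mean, here is a minimal sketch in which a fixed weight matrix `W` plays the role of the wiring diagram, observations are injected at assumed sensory neurons, and actions are read from assumed motor neurons. The tanh recurrence, the index sets, and the omission of the deep-RL training loop are all assumptions; the Fly-connectomic Graph Model itself is not reproduced here.

```python
import numpy as np


def connectome_controller(W, sensory_idx, motor_idx, obs, steps):
    """Run a few recurrent updates over a fixed connectome and read out motor units.

    W[i, j] is the (assumed signed) synaptic weight from neuron j to neuron i.
    """
    h = np.zeros(W.shape[0])
    drive = np.zeros(W.shape[0])
    drive[sensory_idx] = obs  # observations enter only at sensory neurons
    for _ in range(steps):
        h = np.tanh(W @ h + drive)  # activity flows only along existing synapses
    return h[motor_idx]  # motor readout, e.g. actuator targets


W = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # toy chain: 0 -> 1 -> 2
print(connectome_controller(W, [0], [2], [1.0], steps=2),
      connectome_controller(W, [0], [2], [1.0], steps=3))
```

In the toy chain the motor neuron stays silent until the signal has had time to traverse both synapses, which is the kind of dynamic information flow that makes such controllers inspectable.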

2602.22179 2026-06-16 cs.LG 版本更新

Discovering Subgroups with Exceptional Survival Characteristics

发现具有异常生存特征的子群

Mhd Jawad Al Rahwanji, Sascha Xu, Nils Philipp Walter, Jilles Vreeken

发表机构 * CISPA Helmholtz Center for Information Security(CISPA亥姆霍兹信息安全中心)

AI总结 提出非参数可微方法Sysurv,通过可读规则发现生存时间异常的子群,在癌症数据等案例中优于现有方法。

AI中文摘要

在许多应用中,识别比总体中其余个体生存时间更长或更短的子群体非常重要。例如,在医学中,它可以确定哪些患者能从治疗中获益;在预测性维护中,可以确定哪些组件更可能失效。现有发现具有异常生存特征子群的方法依赖于对生存模型的限制性假设(如比例风险),需要预先离散化的特征,并且由于比较的是平均统计量,往往忽略个体异质性。在本文中,我们提出Sysurv,一种非参数、完全可微的方法,能够发现用于筛选具有异常生存特征子群的人类可读规则。在广泛的数据集和设置上的实证评估(包括一项癌症数据案例研究)表明,Sysurv能揭示有洞察力且可操作的生存子群,优于现有最先进方法。

英文摘要

In many applications, it is important to identify subpopulations that survive longer or shorter than the rest of the population. In medicine, for example, it allows determining which patients benefit from treatment, and in predictive maintenance, which components are more likely to fail. Existing methods for discovering subgroups with exceptional survival characteristics rely on restrictive assumptions about the survival model (e.g. proportional hazards), require pre-discretized features, and, as they compare average statistics, tend to overlook individual heterogeneity. In this paper, we propose Sysurv, a non-parametric, fully differentiable method that discovers human-readable rules selecting subgroups with exceptional survival characteristics. Empirical evaluation on a wide range of datasets and settings, including a case study on cancer data, shows that Sysurv reveals insightful and actionable survival subgroups, outperforming the state of the art.
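
The notion of exceptional survival can be illustrated with the standard Kaplan-Meier estimator: fit it inside a rule-defined subgroup and in the rest of the cohort, then compare, for example, median survival. This is only a sketch of that comparison, not Sysurv's differentiable rule-learning objective; the toy times, events and subgroup mask are hypothetical.

```python
import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)  # still under observation just before t
        deaths = np.sum((times == t) & events)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


def median_survival(times, events):
    """First event time at which the survival estimate drops to 0.5 or below."""
    t, s = kaplan_meier(times, events)
    below = np.nonzero(s <= 0.5)[0]
    return float(t[below[0]]) if below.size else float("inf")


# Hypothetical rule-defined subgroup (first five rows) versus the rest of the cohort.
times = np.array([2, 3, 9, 11, 12, 1, 2, 3, 4, 5])
events = np.array([1, 1, 0, 1, 1, 1, 1, 1, 1, 1], dtype=bool)  # 0 = censored
in_group = np.array([True] * 5 + [False] * 5)
print(median_survival(times[in_group], events[in_group]),
      median_survival(times[~in_group], events[~in_group]))  # 11.0 3.0
```

Censored individuals (event 0) leave the risk set without lowering the curve, which is why the estimator is preferred over raw averages of observed times.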

2602.22673 2026-06-16 cs.LG q-bio.QM 版本更新

Forecasting Bacterial Antimicrobial Resistance Trends Using Machine Learning on WHO GLASS Surveillance Data: A Retrieval-Augmented Generation Approach for Policy Decision Support

基于机器学习对WHO GLASS监测数据的细菌抗菌药物耐药趋势预测:一种用于政策决策支持的检索增强生成方法

Md Tanvir Hasan Turja

发表机构 * Independent Researcher(独立研究者) London, United Kingdom(伦敦,英国)

AI总结 利用XGBoost模型预测全球抗菌药物耐药趋势,结合检索增强生成系统提供可溯源的政策建议,误差较基线降低85.3%。

Comments 20 pages, 8 figures, code and data available at https://github.com/TanvirTurja/amr-forecasting-rag

AI中文摘要

背景:抗菌药物耐药性(AMR)是全球健康威胁。尽管WHO全球抗菌药物耐药性与使用监测系统(GLASS)提供了标准化数据,但基于人群的机器学习耐药趋势预测仍然有限。将计算预测转化为政策需要透明的解释机制。方法:处理了2021-2023年的监测数据,包含44个国家和五个WHO区域的5909个观测值。采用严格的时间划分防止数据泄露。使用包括前一年耐药性和抗生素消耗在内的特征,对六种模型(Naive、Linear、Ridge、XGBoost、LightGBM、LSTM)进行基准测试,以预测一年后的耐药率。计算评估指标(MAE、RMSE、sMAPE),并给出MAE的95%自助法置信区间。实现了一个利用Gemma 4的本地检索增强生成(RAG)系统,将预测结果转化为基于检索到的WHO文件的政策指导。结果:XGBoost取得了最佳性能(测试MAE = 6.13% [95% CI: 5.83-6.44]),相比Naive基线(MAE = 41.79%)误差降低了85.3%。SHAP分析确定前一年耐药性为最主要的预测因子(贡献50.5%),证实了强自回归行为。区域预测误差与监测覆盖率密切相关,范围从欧洲区域的3.65%到东南亚区域的8.61%。RAG管道生成了准确、可溯源的政策响应,没有虚构引用。结论:短期AMR耐药率表现出强时间自相关性,可通过梯度提升准确预测。将这些预测与抗幻觉的RAG系统相结合,为AMR治理提供了一个可扩展、基于证据的决策支持框架。

英文摘要

Background: Antimicrobial resistance (AMR) is a global health threat. While the WHO Global Antimicrobial Resistance and Use Surveillance System (GLASS) provides standardized data, population-level machine learning forecasting of resistance trends remains limited. Translating computational forecasts into policy requires transparent interpretation mechanisms. Methods: Surveillance data (2021-2023) comprising 5,909 observations across 44 countries and five WHO regions were processed. A rigorous temporal split prevented data leakage. Six models (Naive, Linear, Ridge, XGBoost, LightGBM, LSTM) were benchmarked to forecast one-year-ahead resistance rates using features including prior-year resistance and antibiotic consumption. Evaluation metrics (MAE, RMSE, sMAPE) were computed, with 95% bootstrap confidence intervals for MAE. A local Retrieval-Augmented Generation (RAG) system utilizing Gemma 4 was implemented to translate forecast findings into policy guidance grounded in retrieved WHO documents. Results: XGBoost achieved the best performance (test MAE = 6.13% [95% CI: 5.83-6.44]), an 85.3% error reduction versus the naive baseline (MAE = 41.79%). SHAP analysis identified prior-year resistance as the dominant predictor (50.5% gain), confirming strong autoregressive behavior. Regional forecast error tracked closely with surveillance coverage, ranging from 3.65% in the European Region to 8.61% in South-East Asia. The RAG pipeline generated accurate, source-attributed policy responses without fabricated citations. Conclusion: Short-term AMR resistance rates exhibit strong temporal autocorrelation that can be accurately forecasted using gradient boosting. Coupling these forecasts with a hallucination-resistant RAG system provides a scalable, evidence-based decision-support framework for AMR governance.
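
The reported 85.3% error reduction follows directly from the two MAE values, and a percentile bootstrap is one common way to obtain a 95% interval for MAE. The sketch below assumes row-level resampling with 2,000 replicates and a fixed seed; the abstract does not state these details, so treat them as assumptions.

```python
import numpy as np


def mae_with_bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """MAE with a percentile bootstrap interval (rows resampled with replacement; assumed scheme)."""
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(err), size=(n_boot, len(err)))
    boot = err[idx].mean(axis=1)  # MAE of each bootstrap resample
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return float(err.mean()), (float(lo), float(hi))


def relative_reduction(mae_model, mae_baseline):
    """Fractional error reduction of a model relative to a baseline."""
    return 1.0 - mae_model / mae_baseline


print(round(100 * relative_reduction(6.13, 41.79), 1))  # 85.3
```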

2603.05299 2026-06-16 cs.LG cs.AI cs.CL cs.SD 版本更新

WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation

WavSLM: 通过WavLM蒸馏的单流语音语言建模

Luca Della Libera, Cem Subakan, Mirco Ravanelli

发表机构 * Concordia University(康科迪亚大学) Mila-Quebec AI Institute(蒙特利尔AI研究所) Université Laval(拉瓦尔大学)

AI总结 提出WavSLM,通过量化蒸馏WavLM自监督表示到单一码本并优化自回归下一块预测,实现无文本监督的单流语音语言建模,在一致性和生成任务上表现竞争。

Comments Accepted to Interspeech 2026

AI中文摘要

大型语言模型表明,简单的自回归训练可以产生可扩展且连贯的生成,但由于语义和声学信息的纠缠,将这一范式扩展到语音仍然具有挑战性。大多数现有的语音语言模型依赖于文本监督、分层令牌流或复杂的混合架构,偏离了在文本中已被证明有效的单流生成预训练范式。在这项工作中,我们引入了WavSLM,一种通过将自监督WavLM表示量化和蒸馏到单一码本中,并优化自回归下一块预测目标来训练的语音语言模型。WavSLM在单个令牌流中联合建模语义和声学信息,无需文本监督或文本预训练。尽管其简单性,它在一致性基准和语音生成方面取得了有竞争力的性能,同时使用更少的参数、更少的训练数据,并支持流式推理。

英文摘要

Large language models show that simple autoregressive training can yield scalable and coherent generation, but extending this paradigm to speech remains challenging due to the entanglement of semantic and acoustic information. Most existing speech language models rely on text supervision, hierarchical token streams, or complex hybrid architectures, departing from the single-stream generative pretraining paradigm that has proven effective in text. In this work, we introduce WavSLM, a speech language model trained by quantizing and distilling self-supervised WavLM representations into a single codebook and optimizing an autoregressive next-chunk prediction objective. WavSLM jointly models semantic and acoustic information within a single token stream without text supervision or text pretraining. Despite its simplicity, it achieves competitive performance on consistency benchmarks and speech generation while using fewer parameters, less training data, and supporting streaming inference.
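
The single-stream idea can be sketched in two steps: assign each continuous frame to its nearest codeword in one codebook, then form (context, next chunk) pairs from the resulting token stream. Plain nearest-neighbour vector quantization, the toy codebook and the chunk size are assumptions for illustration; WavSLM's distillation from WavLM is not shown.

```python
import numpy as np


def quantize(features, codebook):
    """Map each frame in features (T, D) to the index of its nearest codeword in codebook (K, D)."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K) squared distances
    return d2.argmin(axis=1)


def next_chunk_pairs(tokens, chunk):
    """Split one token stream into (context, next chunk) pairs for autoregressive training."""
    return [(tokens[:start], tokens[start:start + chunk])
            for start in range(chunk, len(tokens) - chunk + 1, chunk)]


codebook = np.array([[0.0, 0.0], [10.0, 10.0]])  # toy single codebook (K=2, D=2)
frames = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 1.0]])  # toy continuous features
print(quantize(frames, codebook).tolist())  # [0, 1, 0]
```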

2603.14709 2026-06-16 cs.LG 版本更新

Not All Retrievals are Useful: Cross-Attention for Input-Aware RAG in Time Series Forecasting

并非所有检索都有用:时间序列预测中输入感知RAG的交叉注意力

Seunghan Lee, Jaehoon Lee, Jun Seo, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

发表机构 * LG AI Research(LG人工智能研究所)

AI总结 提出Cross-RAG框架,通过查询-检索交叉注意力选择性关注相关检索样本,联合利用查询、检索样本及其关系,提升零样本时间序列预测性能,避免无关检索的干扰。

Comments KDD Workshop on Mining and Learning from Time Series 2026

AI中文摘要

检索增强生成(RAG)通过利用外部知识库增强了零样本时间序列(TS)预测,然而现有方法在将检索样本与查询融合时忽略了输入级别的相关性。我们认为并非所有检索都同样有用,无关的检索会降低性能。为此,我们提出Cross-RAG,一种基于RAG的零样本预测框架,通过查询-检索交叉注意力选择性地关注与查询相关的检索样本。通过建模查询与检索样本之间的输入级别相关性,Cross-RAG联合整合了三种信息源:1)查询本身,2)检索样本,以及3)它们的关系交互。特别地,这种输入感知设计使得Cross-RAG在检索样本数量k增长时保持稳定,而先前没有交叉注意力的方法需要仔细调整k以避免因无关检索导致的性能下降。大量实验表明,Cross-RAG在多个TSFM骨干网络和各种RAG方法上持续提升了零样本预测性能,额外分析证实了其在各种检索场景下的有效性。代码可在 https://github.com/seunghan96/cross-rag/ 获取。

英文摘要

Retrieval-augmented generation (RAG) enhances zero-shot time series (TS) forecasting by leveraging external knowledge bases, yet existing approaches overlook input-level relevance when fusing retrieved samples with the query. We argue that not all retrievals are equally useful, and irrelevant ones can degrade performance. To this end, we propose Cross-RAG, a zero-shot RAG-based forecasting framework that selectively attends to query-relevant retrieved samples via query--retrieval cross-attention. By modeling input-level relevance between the query and retrieved samples, Cross-RAG jointly incorporates three sources of information: 1) the query itself, 2) the retrieved samples, and 3) their relational interactions. In particular, this input-aware design enables Cross-RAG to remain stable as the number of retrieved samples $k$ grows, whereas prior methods without cross-attention require careful $k$ tuning to avoid degradation from irrelevant retrievals. Extensive experiments demonstrate that Cross-RAG consistently improves zero-shot forecasting performance across multiple TSFM backbones and various RAG methods, with additional analyses confirming its effectiveness across various retrieval scenarios. Code is available at https://github.com/seunghan96/cross-rag/.
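
A minimal sketch of query-retrieval cross-attention: the query embedding attends over k retrieved-sample embeddings with scaled dot-product attention, so retrievals that do not resemble the query receive small weights. Learned projection matrices, multiple heads and the TSFM backbone are omitted; the function name and shapes are assumptions, not Cross-RAG's code.

```python
import numpy as np


def cross_attend(query, retrieved, d_k=None):
    """Scaled dot-product attention of one query embedding over k retrieved embeddings.

    query: (d,), retrieved: (k, d). Learned projections are omitted (assumption).
    Returns the attention weights (k,) and the fused context vector (d,).
    """
    q = np.asarray(query, dtype=float)
    R = np.asarray(retrieved, dtype=float)
    d_k = d_k or q.shape[-1]
    scores = R @ q / np.sqrt(d_k)  # relevance of each retrieved sample to the query
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w /= w.sum()
    return w, w @ R


weights, context = cross_attend([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0], [-5.0, 0.0]])
fused = np.concatenate([[1.0, 0.0], context])  # query plus attended context for the forecaster
print(np.round(weights, 3))
```

Because the weights always sum to one, adding more irrelevant retrievals mostly dilutes their own share rather than the relevant sample's, which is the intuition behind stability as k grows.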

2604.00163 2026-06-16 cs.LG cs.AI cs.NE 版本更新

Epileptic Seizure Detection in Separate Frequency Bands Using Feature Analysis and Graph Convolutional Neural Network (GCN) from Electroencephalogram (EEG) Signals

基于特征分析和图卷积神经网络(GCN)的脑电图(EEG)信号癫痫发作检测在不同频段的研究

Ferdaus Anam Jibon, Fazlul Hasan Siddiqui, F. Deeba, Gahangir Hossain

AI总结 提出一种频率感知框架,将EEG分解为五个频段并提取判别特征,利用图卷积神经网络建模电极空间依赖,在CHB-MIT数据集上实现99.01%的宽带准确率,提高了可解释性和诊断精度。

Comments One author disagrees with the archiving

AI中文摘要

癫痫发作是一种神经系统疾病,其特征是大脑中异常和过度的电活动,导致反复发作事件。脑电图(EEG)信号因其能够捕捉时间和空间的神经动力学而被广泛用于癫痫诊断。虽然最近的深度学习方法取得了高检测准确率,但它们往往缺乏可解释性和神经生理学相关性。本研究提出了一种基于发作期EEG分析的频率感知框架用于癫痫发作检测。原始EEG信号被分解为五个频段(delta、theta、alpha、低beta和高beta),并从每个频段提取十一个判别特征。然后采用图卷积神经网络(GCN)对EEG电极之间的空间依赖性进行建模,电极表示为图节点。在CHB-MIT头皮EEG数据集上的实验表明,该方法在相应频段上分别达到了97.1%、97.13%、99.5%、99.7%和51.4%的准确率,总体宽带准确率为99.01%。结果突出了中频段的强判别能力,并揭示了特定频率的发作模式。与传统的宽带EEG方法相比,所提出的方法提高了可解释性和诊断精度。

英文摘要

Epileptic seizures are neurological disorders characterized by abnormal and excessive electrical activity in the brain, resulting in recurrent seizure events. Electroencephalogram (EEG) signals are widely used for seizure diagnosis due to their ability to capture temporal and spatial neural dynamics. While recent deep learning methods have achieved high detection accuracy, they often lack interpretability and neurophysiological relevance. This study presents a frequency-aware framework for epileptic seizure detection based on ictal-phase EEG analysis. The raw EEG signals are decomposed into five frequency bands (delta, theta, alpha, lower beta, and higher beta), and eleven discriminative features are extracted from each band. A graph convolutional neural network (GCN) is then employed to model spatial dependencies among EEG electrodes, represented as graph nodes. Experiments on the CHB-MIT scalp EEG dataset demonstrate high detection performance, achieving accuracies of 97.1%, 97.13%, 99.5%, 99.7%, and 51.4% across the respective frequency bands, with an overall broadband accuracy of 99.01%. The results highlight the strong discriminative capability of mid-frequency bands and reveal frequency-specific seizure patterns. The proposed approach improves interpretability and diagnostic precision compared to conventional broadband EEG-based methods.
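
The band decomposition step can be sketched with an ideal FFT mask per band. The band edges below and the 256 Hz sampling rate in the usage are assumptions, since the abstract names the bands but not their limits, and the authors may use a conventional IIR or FIR filter instead.

```python
import numpy as np

# Assumed band edges in Hz; the abstract names the bands but not their limits.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0),
         "low_beta": (13.0, 20.0), "high_beta": (20.0, 30.0)}


def split_bands(x, fs):
    """Band-limited copies of a 1-D EEG channel using an ideal FFT mask per band."""
    x = np.asarray(x, dtype=float)
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return {name: np.fft.irfft(spec * ((freqs >= lo) & (freqs < hi)), n=len(x))
            for name, (lo, hi) in BANDS.items()}


def band_power(bands):
    """Mean squared amplitude of each band-limited signal."""
    return {name: float(np.mean(sig ** 2)) for name, sig in bands.items()}


fs = 256  # assumed sampling rate in Hz
t = np.arange(4 * fs) / fs
powers = band_power(split_bands(np.sin(2 * np.pi * 10 * t), fs))  # 10 Hz test tone
print(max(powers, key=powers.get))  # alpha
```

Per-band features (the abstract mentions eleven per band) would then be computed on each band-limited signal before building the electrode graph.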

2604.09361 2026-06-16 cs.LG 版本更新

Stochastic-Dimension Frozen Sampled Neural Network for High-Dimensional Gross-Pitaevskii Equations on Unbounded Domains

用于无界域上高维格罗斯-皮塔耶夫斯基方程的随机维度冻结采样神经网络

Zhangyong Liang

发表机构 * National Center for Applied Mathematics, Tianjin University(天津大学应用数学中心) School of Mathematics and Statistics, Wuhan University(武汉大学数学与统计学院) School of Mathematics and Statistics & Computational Sciences Hubei Key Laboratory, Wuhan University(武汉大学数学与统计学院及计算科学湖北省重点实验室)

AI总结 提出SD-FSNN框架,结合编码远场衰减的高斯包络、带数据驱动采样特征的冻结单隐层网络和随机维度采样器,以无梯度方式求解无界域上高达1000维的格罗斯-皮塔耶夫斯基方程,精度和效率优于PINNs、随机特征方法和张量网络方法。

AI中文摘要

本文介绍了一种名为随机维度冻结采样神经网络(SD-FSNN)的新计算框架,用于求解无界域上的高维格罗斯-皮塔耶夫斯基方程(GPE)。所提出的方法通过多种技术的协同作用,规避了困扰传统离散化方法的维数灾难以及基于梯度的神经网络求解器的计算瓶颈。首先,预设的高斯包络编码了波函数的远场衰减,从而实现空间-时间分离,其中空间近似由一个带有数据驱动采样特征的冻结单隐层神经网络处理。这得到一种无梯度的形式体系:空间导数被解析地预先计算,时间依赖性则通过降阶常微分方程演化。其次,随机维度采样器在每个时间步只评估一小部分空间维度,从而给出空间算子的条件无偏估计,并降低计算和内存成本。此外还强制施加了离散守恒律,确保长期稳定性。在高达1000维的GPE上的大量数值实验表明,与PINNs、随机特征方法和张量网络方法等最先进方法相比,SD-FSNN实现了显著更高的精度和效率。结果证实SD-FSNN有效缓解了冻结基模型在结构化解流形上的Kolmogorov n-宽度障碍。

英文摘要

This paper introduces the Stochastic-Dimension Frozen Sampled Neural Network (SD-FSNN), a novel computational framework for solving high-dimensional Gross-Pitaevskii equations (GPEs) on unbounded domains. The proposed method circumvents both the curse of dimensionality that plagues traditional discretizations and the computational bottlenecks of gradient-based neural network solvers through a synergistic combination of techniques. First, a prescribed Gaussian envelope encodes the far-field decay of the wavefunction, enabling a space-time separation in which the spatial approximation is handled by a frozen, single-hidden-layer neural network with data-driven sampled features. This yields a gradient-free formalism in which spatial derivatives are analytically precomputed and the time dependence is evolved via reduced ODEs. Second, a stochastic-dimension sampler provides a conditionally unbiased estimate of the spatial operator by evaluating only a small subset of spatial dimensions at each time step, thereby reducing computational and memory costs. Discrete conservation laws are also enforced, ensuring long-term stability. Extensive numerical experiments on GPEs in up to 1000 dimensions demonstrate that SD-FSNN achieves significantly higher accuracy and efficiency than state-of-the-art methods, including PINNs, randomized feature methods, and tensor-network approaches. The results confirm that SD-FSNN effectively mitigates the Kolmogorov $n$-width barrier for frozen-basis models on structured solution manifolds.
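
The stochastic-dimension idea can be checked on the Gaussian envelope alone, whose Laplacian has a closed form: for $u(x)=e^{-\lVert x\rVert^2/2}$, $\partial^2 u/\partial x_i^2 = (x_i^2-1)\,u$, so $\Delta u = (\lVert x\rVert^2 - d)\,u$. Sampling $m$ of the $d$ dimensions without replacement and rescaling by $d/m$ gives an unbiased estimate, since each dimension is included with probability $m/d$. This sketch covers only the envelope term under these assumptions, not the full GPE operator or the frozen network features used by SD-FSNN.

```python
import numpy as np


def envelope(x):
    """Assumed isotropic Gaussian envelope u(x) = exp(-|x|^2 / 2)."""
    return np.exp(-0.5 * np.dot(x, x))


def exact_laplacian(x):
    """Closed form: sum_i (x_i^2 - 1) u(x) = (|x|^2 - d) u(x)."""
    return (np.dot(x, x) - x.shape[0]) * envelope(x)


def sampled_laplacian(x, m, rng):
    """Unbiased estimate from m dimensions drawn without replacement, rescaled by d/m."""
    d = x.shape[0]
    dims = rng.choice(d, size=m, replace=False)
    u = envelope(x)  # shared factor, computed once
    return d / m * np.sum((x[dims] ** 2 - 1.0) * u)  # only m second derivatives evaluated


rng = np.random.default_rng(0)
x = 0.03 * rng.standard_normal(1000)  # a point in 1000 dimensions
estimates = [sampled_laplacian(x, 50, rng) for _ in range(2000)]
print(exact_laplacian(x), np.mean(estimates))
```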

2604.09673 2026-06-16 cs.LG cs.AI 版本更新

Active Inference with a Self-Prior in the Mirror-Mark Task

镜像标记任务中带有自我先验的主动推理

Dongmin Kim, Hoshinori Kanazawa, Yasuo Kuniyoshi

发表机构 * The University of Tokyo(东京大学) Laboratory for Intelligent Systems and Informatics(智能系统与信息学实验室)

AI总结 提出一种基于自我先验的计算模型,通过主动推理驱动标记导向行为,无需外部奖励即可模拟镜像自我识别。

Comments 8 pages, 5 figures, Accepted to IEEE ICDL 2026

AI中文摘要

镜像自我识别测试评估受试者是否触摸仅在镜子中可见的自身标记,被广泛用作自我意识的指标。在本研究中,我们提出一个计算模型,其中这种行为通过单一机制(即自我先验)自发产生,无需任何外部奖励。自我先验通过Transformer实现,学习熟悉多感官经验的密度;当出现新标记时,与学习分布的差异通过主动推理驱动标记导向行为。一个仅依赖视觉和本体感觉而无触觉输入的模拟婴儿,发现镜中自己脸上的贴纸并在约70%的情况下将其移除,无需任何明确指令。贴纸移除后预期自由能显著下降,证实自我先验充当区分自我与非自我的内部标准。跨模态采样进一步表明,自我先验捕获视觉-本体感觉关联,充当概率身体图式。这些结果为镜像测试中观察到的关键行为提供了简洁的计算解释,并表明自由能原理可作为研究自我意识发展起源的统一假设。代码见:https://github.com/kim135797531/self-prior-mirror

英文摘要

The mirror self-recognition test evaluates whether a subject touches a mark on its own body that is visible only in a mirror, and is widely used as an indicator of self-awareness. In this study, we present a computational model in which this behavior emerges spontaneously through a single mechanism, the self-prior, without any external reward. The self-prior, implemented with a Transformer, learns the density of familiar multisensory experiences; when a novel mark appears, the discrepancy from this learned distribution drives mark-directed behavior through active inference. A simulated infant, relying solely on vision and proprioception without tactile input, discovered a sticker placed on its own face in the mirror and removed it in approximately 70% of cases without any explicit instruction. Expected free energy decreased significantly after sticker removal, confirming that the self-prior operates as an internal criterion for distinguishing self from non-self. Cross-modal sampling further demonstrated that the self-prior captures visual--proprioceptive associations, functioning as a probabilistic body schema. These results provide a concise computational account of the key behavior observed in the mirror test and suggest that the free energy principle can serve as a unifying hypothesis for investigating the developmental origins of self-awareness. Code is available at: https://github.com/kim135797531/self-prior-mirror
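
A heavily simplified sketch of the self-prior mechanism: a diagonal Gaussian fitted to familiar observations stands in for the Transformer density, and the agent picks the action whose predicted observation is least surprising under it. Only this prior-preference (pragmatic) part of expected free energy is kept; the epistemic term, the forward model that would produce `predicted`, and the action names are assumptions.

```python
import numpy as np


class GaussianSelfPrior:
    """Stand-in for the learned self-prior: a diagonal Gaussian over familiar observations."""

    def __init__(self, observations):
        obs = np.asarray(observations, dtype=float)
        self.mu = obs.mean(axis=0)
        self.var = obs.var(axis=0) + 1e-6  # small floor keeps the density proper

    def surprise(self, o):
        """Negative log-density of observation o under the self-prior."""
        o = np.asarray(o, dtype=float)
        return float(0.5 * np.sum((o - self.mu) ** 2 / self.var + np.log(2 * np.pi * self.var)))


def choose_action(prior, predicted_obs):
    """Pick the action whose predicted observation the self-prior finds least surprising."""
    return min(predicted_obs, key=lambda a: prior.surprise(predicted_obs[a]))


# Toy visual-proprioceptive features of the familiar (unmarked) self.
prior = GaussianSelfPrior([[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.0, 0.05]])
# Hypothetical forward-model predictions for two candidate actions.
predicted = {"leave_mark": [0.8, 0.9], "remove_mark": [0.0, 0.02]}
print(choose_action(prior, predicted))  # remove_mark
```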

2605.04813 2026-06-16 cs.LG 版本更新

A Biased Nonnegative Block Term Tensor Decomposition Model for Dynamic QoS Prediction

一种用于动态QoS预测的有偏非负块项张量分解模型

Wenjing Liu, Yujia Lei, Qu Wang


AI总结 提出BNBT框架,采用有偏非负块项张量分解增强表示能力,引入线性偏置项并设计SLF-NMUT算法,在动态QoS预测中显著提升精度。

AI中文摘要

随着云计算和Web服务的快速发展,服务质量(QoS)已成为服务选择与推荐的关键标准。张量潜在特征分析为建模多维QoS数据提供了有效途径,现有大多数QoS预测方法主要基于规范多元分解(CP分解)或Tucker分解。然而,受限于其固有结构特性,这些方法无法准确捕捉用户-服务交互中复杂且动态的依赖关系,从而限制了预测性能。为解决此问题,本文提出一种基于有偏非负块项张量分解模型的动态QoS预测框架,称为BNBT。具体而言,该框架从三个方面进行构建:(1)采用块项张量分解增强潜在特征学习的表示能力;(2)引入线性偏置项以进一步提高预测精度;(3)设计一种面向张量的单元素依赖非负乘性更新算法SLF-NMUT,用于高效参数估计。在真实QoS数据集上的大量实验表明,所提出的BNBT框架在预测精度上持续优于多种先进的QoS预测方法。

英文摘要

With the rapid development of cloud computing and Web services, Quality of Service (QoS) has become a key criterion for service selection and recommendation. Tensor latent feature analysis provides an effective way to model multidimensional QoS data, and most existing QoS prediction methods are mainly based on Canonical Polyadic (CP) decomposition or Tucker decomposition. However, constrained by their inherent structural properties, these methods cannot accurately capture the complex and dynamic dependencies in user-service interactions, which limits their prediction performance. To address this issue, this paper proposes a dynamic QoS prediction framework based on the Biased Nonnegative Block Term Tensor Decomposition Model, termed BNBT. Specifically, the proposed framework is developed from three aspects: (1) block term tensor decomposition is employed to enhance the representation capability of latent feature learning; (2) linear bias terms are incorporated to further improve prediction accuracy; and (3) a tensor-oriented single-element-dependent nonnegative multiplicative update algorithm, called SLF-NMUT, is designed for efficient parameter estimation. Extensive experiments on real-world QoS datasets demonstrate that the proposed BNBT framework consistently outperforms several state-of-the-art QoS prediction methods in terms of prediction accuracy.
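
For reference, a biased block term model can predict one QoS entry (user $i$, service $j$, time $k$) as a sum of Tucker-style blocks plus per-mode biases: $\hat{x}_{ijk} = \sum_{r=1}^{R}\sum_{p,q,s} g^{(r)}_{pqs}\, a^{(r)}_{ip}\, b^{(r)}_{jq}\, c^{(r)}_{ks} + \alpha_i + \beta_j + \gamma_k$. The sketch evaluates that formula; the one-bias-per-mode form, the block sizes and all names are assumptions, and the SLF-NMUT update rule is not shown.

```python
import numpy as np


def btd_predict(cores, A, B, C, bias_u, bias_s, bias_t, i, j, k):
    """Predict the QoS entry for (user i, service j, time k) from a biased block term model.

    cores[r] is the (L, L, L) core of block r; A[r], B[r], C[r] are its factor matrices.
    One linear bias per mode is an assumption about the model's bias terms.
    """
    value = 0.0
    for G, Ar, Br, Cr in zip(cores, A, B, C):
        value += np.einsum("pqs,p,q,s->", G, Ar[i], Br[j], Cr[k])  # one Tucker-style block
    return float(value + bias_u[i] + bias_s[j] + bias_t[k])


rng = np.random.default_rng(0)
R, L = 2, 2  # hypothetical number of blocks and per-block rank
cores = [rng.random((L, L, L)) for _ in range(R)]
A = [rng.random((3, L)) for _ in range(R)]  # 3 users
B = [rng.random((4, L)) for _ in range(R)]  # 4 services
C = [rng.random((5, L)) for _ in range(R)]  # 5 time slots
bu, bs, bt = rng.random(3), rng.random(4), rng.random(5)
print(btd_predict(cores, A, B, C, bu, bs, bt, i=0, j=1, k=2))
```

With all cores, factors and biases nonnegative, every prediction is nonnegative, which matches the nonnegativity of QoS values such as response time and throughput.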

2605.23234 2026-06-16 cs.LG cs.CY 版本更新

Assessing Predictive Models for Fairness Based on Movement Patterns

基于移动模式评估预测模型的公平性

Francesco Lettich, Mario A. Nascimento, Chiara Pugliese, Chiara Renso

发表机构 * University of Padua(帕多瓦大学)

AI总结 针对预测模型的空间公平性,提出将公平性概念从单一地理位置扩展到移动模式,并采用空间扫描统计方法检测基于移动模式的不公平性。

Comments 33 pages, 10 figures, 7 tables

AI中文摘要

评估预测模型的空间公平性涉及确定模型是否在统计上惩罚(或偏袒)与某些地理位置相关的个体。关于这一主题的文献基本假设每个个体被分配到一个单一地理位置(例如居住地)。然而,当考虑公平性时,个体所到过的位置集合(即他们在不同区域的移动模式)也很重要。因此,我们认为有必要将空间公平性的概念推广到包括移动模式,从而引出评估预测模型相对于个体移动的公平性的新问题。为了解决这个问题,我们提出了一种方法,首先将个体的移动与特定地理区域关联,考虑具有不同分辨率和对齐方式的多个空间划分,然后采用合适的空间扫描统计量来评估预测模型是否基于移动模式是公平的。在实验评估中,我们研究了该方法在数千个合成的不公平数据集上的性能,结果表明它能够有效检测这种新型的不公平性并检索受到不公平对待的对象集合,而定位性能表现出一致的多分辨率权衡。

英文摘要

Assessing the spatial fairness of predictive models involves establishing whether they are statistically penalizing (favoring) individuals associated with certain geographical locations. Literature on this topic makes the fundamental assumption that each individual is assigned to a single geographical location (e.g., place of residence). However, fairness with respect to the set of locations where one has been, i.e., their movement patterns over different regions, also matters when fairness is considered. Consequently, we argue that it is necessary to generalize the notion of spatial fairness to also include movement patterns, leading to the novel problem of assessing predictive models for fairness relative to the movements of individuals. To deal with this problem, we propose an approach that first associates the movements of individuals to certain geographic regions, considering multiple spatial partitions with different resolutions and alignments, and then employs a suitable spatial scan statistic to assess whether a predictive model is fair based on movement patterns. In the experimental evaluation, we study the performance of our approach over thousands of synthetic unfair datasets, showing that it is effective at detecting this new type of unfairness and at retrieving the set of objects treated unfairly, while localization performance exhibits a consistent multi-resolution trade-off.
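
A widely used scan statistic for binary outcomes is Kulldorff's Bernoulli log-likelihood ratio, which compares the positive-prediction rate inside a candidate region with the rate outside. The sketch assumes that choice (in its two-sided form) and a hypothetical membership rule in which an individual counts as inside a region if their trajectory visits it; the paper's exact statistic and significance test (commonly Monte Carlo permutations) may differ.

```python
import math


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _bernoulli_loglik(c, n):
    """Maximized Bernoulli log-likelihood of c positives among n individuals."""
    p = c / n
    return _xlogy(c, p) + _xlogy(n - c, 1.0 - p)


def bernoulli_llr(c_in, n_in, c_tot, n_tot):
    """Two-sided log-likelihood ratio: separate rates inside/outside vs one shared rate.

    Assumes 0 < n_in < n_tot. c = positive predictions, n = individuals.
    """
    c_out, n_out = c_tot - c_in, n_tot - n_in
    return (_bernoulli_loglik(c_in, n_in) + _bernoulli_loglik(c_out, n_out)
            - _bernoulli_loglik(c_tot, n_tot))


# Hypothetical region: 100 individuals whose trajectories visit it, 40 predicted positive,
# out of 500 individuals with 50 positives overall.
print(bernoulli_llr(40, 100, 50, 500))
```

Scanning then means evaluating this ratio for every region in every spatial partition and taking the maximum as the test statistic.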

2606.04145 2026-06-16 cs.LG cs.AI cs.DC 版本更新

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

EvalStop:利用世界反馈检测和纠正多租户RLHF平台中的奖励过度优化

Guilin Zhang, Chuanyi Sun, Kai Zhao, Xu Chu, Shahryar Sarkani, John M. Fossaceca

发表机构 * DeepMind, London, UK(DeepMind,英国伦敦) University of Cambridge, UK(英国剑桥大学) University of Washington, USA(美国华盛顿大学)

AI总结 提出EvalStop调度原语,通过检测评估分数连续下降来终止作业、释放GPU并保留最佳检查点,以纠正奖励过度优化,在RLHF负载上实现高精度检测并提升JCT。

AI中文摘要

云LLM微调平台越来越多地承载RLHF工作负载,其中学习到的奖励模型作为人类质量的代理被优化。正如Gao等人(2023)所示,在持续优化压力下,该代理会与世界反馈(下游评估指标)发生偏离,这种现象称为奖励过度优化。现有的平台调度器忽略了这种偏离:非预见性调度器优化JCT而不考虑任何质量信号,SLAQ式质量感知调度器使用训练损失(一个较弱的代理,在奖励投机(reward hacking)期间仍单调下降),而经典的逐作业早停需要人工监控,且不会释放共享GPU。我们提出EvalStop,一个可组合的调度原语:它在评估分数连续k次下降时终止作业,释放GPU,保留最佳检查点,并将其余调度委托给任意基础调度器。我们将调度器级别的早停建模为检测问题,并在离散事件模拟器中进行评估;该模拟器的RLHF工作负载混合了奖励投机运行与结构上健康的运行,真实标签对调度器不可见。在RLHF密集型负载(80% RLHF,64块GPU)上,EvalStop的精确率为98%、召回率为99%、假阳性率为1.5%,同时相比SRTF-Est将JCT改善9%,并将浪费的计算减少22%(p<0.05)。简单的固定进度和损失平台期基线要么在健康RLHF运行上产生65%的假阳性率,要么漏检超过一半的真实奖励投机案例。该增益在所有测试的基础调度器上均成立(JCT改善9-25%),且检测质量在评估噪声(噪声标准差≤0.05时精确率至少91%)和奖励投机基础率(奖励投机比例为20-80%时精确率至少89%)下保持稳定。

英文摘要

Cloud LLM fine-tuning platforms increasingly serve RLHF workloads, where a learned reward model is optimized as a proxy for human quality. As Gao et al. (2023) showed, this proxy diverges from world feedback (downstream eval metrics) under sustained optimization pressure, a phenomenon known as reward overoptimization. Existing platform schedulers ignore this divergence: non-clairvoyant schedulers optimize JCT without any quality signal, SLAQ-style quality-aware schedulers use training loss (a weaker proxy that drops monotonically through hacking), and classical per-job early stopping requires human monitoring and does not free shared GPUs. We propose EvalStop, a composable scheduling primitive that terminates jobs on k consecutive eval-score declines, releases GPUs, preserves the best checkpoint, and delegates to any base scheduler. We frame scheduler-level early stopping as a detection problem and evaluate it in a discrete-event simulator whose RLHF workload mixes reward-hacking and structurally healthy runs, with ground-truth labels hidden from schedulers. On RLHF-heavy workloads (80% RLHF, 64 GPUs), EvalStop achieves precision 98% / recall 99% / FPR 1.5% while improving JCT by 9% and cutting wasted compute by 22% over SRTF-Est (p<0.05). Trivial fixed-progress and loss-plateau competitors either incur 65% FPR on healthy RLHF or miss over half of true hacking cases. Gains compose across every base scheduler tested (9-25% JCT) and detection quality stays stable under eval noise (precision at least 91% at noise std <= 0.05) and hacking base rate (precision at least 89% across 20-80% hacking fractions).
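
The EvalStop trigger described above can be sketched as a small monitor over one job's evaluation history. Here a "decline" is assumed to mean a score lower than the previous evaluation, and the returned best index stands for the checkpoint to preserve; GPU release and delegation to the base scheduler are left as comments, since the paper's interfaces are not given.

```python
def evalstop_monitor(eval_scores, k):
    """Return (stop_index, best_index) for one job's evaluation history.

    Assumption: a "decline" is a score lower than the immediately preceding eval.
    stop_index is None if the job never hits k consecutive declines.
    """
    best, declines = 0, 0
    for t in range(1, len(eval_scores)):
        if eval_scores[t] > eval_scores[best]:
            best = t  # checkpoint to preserve
        declines = declines + 1 if eval_scores[t] < eval_scores[t - 1] else 0
        if declines >= k:
            # Here the platform would terminate the job, release its GPUs,
            # keep checkpoint `best`, and hand the freed GPUs to the base scheduler.
            return t, best
    return None, best


print(evalstop_monitor([0.1, 0.3, 0.5, 0.4, 0.35, 0.3], k=3))  # (5, 2)
```

A single noisy dip resets the counter on the next improvement, which is what keeps the false-positive rate low on healthy runs.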

2606.05693 2026-06-16 cs.LG cs.IR 版本更新

MolE-RAG: Molecular Structure-Enhanced Retrieval-Augmented Generation for Chemistry

MolE-RAG:面向化学的分子结构增强检索增强生成

Joey Chan, Wonbin Kweon, Ashley Shin, Niharika Bhattacharjee, Pengcheng Jiang, Yue Guo, Jiawei Han

发表机构 * University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) University of California, San Diego(加州大学圣地亚哥分校)

AI总结 提出无需训练的分子中心检索增强生成框架MolE-RAG,通过整合检索文献、分子特定信息和结构相似分子三种上下文,显著提升LLM在分子性质预测任务中的性能。

AI中文摘要

大型语言模型(LLM)在分子性质预测方面展现出潜力,但其对化学结构的推理能力仍然有限,因为分子表示(如SMILES)与LLM主要训练的自然语言存在显著差异。为弥合这一语义和化学知识鸿沟,我们提出MolE-RAG,一种无需训练的、以分子为中心的检索增强生成框架,用于基于LLM的分子性质预测。MolE-RAG通过三种互补的推理时上下文来源增强每次预测:检索的化学文献、分子特定信息(包括化合物同义词、标识符、官能团注释和物理化学描述符),以及从训练集中检索的结构相似分子。我们使用专有、化学专用和开源LLM在九个分子性质预测任务上评估MolE-RAG。在通用LLM上,相比仅使用SMILES的基线,MolE-RAG在分类任务上将ROC-AUC提升最多28个百分点,并将回归RMSE降低最多67%。我们进一步发现,每种上下文来源的效用因模型和任务而异,不同模型分别从文本检索、分子上下文或结构检索中获益最多。这些结果表明,以分子为中心的检索可以在无需模型微调的情况下改进基于LLM的分子性质预测,同时为在推理时整合异构化学知识提供灵活框架。

英文摘要

Large language models (LLMs) have shown promise for molecular property prediction, but their ability to reason over chemical structures remains limited, as molecular representations such as SMILES differ substantially from the natural language on which LLMs are primarily trained. To bridge this semantic and chemical knowledge gap, we propose MolE-RAG, a training-free, molecule-centric retrieval-augmented generation framework for LLM-based molecular property prediction. MolE-RAG augments each prediction with three complementary sources of inference-time context: retrieved chemistry literature, molecule-specific information including compound synonyms, identifiers, functional group annotations, and physicochemical descriptors, and structurally similar molecules retrieved from the training set. We evaluate MolE-RAG across nine molecular property prediction tasks using proprietary, chemistry-specialized, and open-source LLMs. Across general-purpose LLMs, MolE-RAG improves ROC-AUC by up to 28 percentage points on classification tasks and reduces regression RMSE by up to 67% relative to a SMILES-only baseline. We further find that the utility of each context source varies across models and tasks, with different models benefiting most from textual retrieval, molecular context, or structural retrieval. These results suggest that molecule-centric retrieval can improve LLM-based molecular property prediction without model fine-tuning while providing a flexible framework for integrating heterogeneous chemical knowledge at inference time.
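
The structural-retrieval context source can be sketched with Tanimoto similarity over fingerprint bit sets. The fingerprints below are toy sets of on-bit indices (in practice a cheminformatics toolkit would compute, for example, Morgan fingerprints), and the function names and prompt wording are hypothetical, not MolE-RAG's code.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_similar(query_fp, train, top_k=2):
    """Return the top_k (smiles, label) training pairs most similar to the query."""
    ranked = sorted(train, key=lambda item: tanimoto(query_fp, item[1]), reverse=True)
    return [(smiles, label) for smiles, _, label in ranked[:top_k]]


def format_neighbours(neighbours):
    """Hypothetical prompt wording for the structural-retrieval context."""
    return "\n".join(f"Similar molecule {s} has label {y}." for s, y in neighbours)


# Toy training set: (SMILES, hypothetical fingerprint on-bits, property label).
TRAIN = [("CCO", {1, 2, 3}, 1), ("c1ccccc1", {7, 8}, 0), ("CCN", {1, 2, 4}, 1)]
print(format_neighbours(retrieve_similar({1, 2, 3}, TRAIN)))
```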

2606.07226 2026-06-16 cs.LG cs.AI cs.CL 版本更新

DEFINED: A Data-Efficient Computational Framework for Fine-Grained Creativity Assessment in Debate Scenarios

DEFINED: 辩论场景中细粒度创造力评估的数据高效计算框架

Tongzhou Yu, Mingjia Li, Hong Qian, Wenkai Wang, Zongbao Zhang, Yaoyu Jiang, Xiangfeng Wang, Aimin Zhou, Jiajun Guo

发表机构 * Nanjing University(南京大学) Shanghai Innovation Institute(上海创新研究院) East China Normal University(华东师范大学)

AI总结 提出DEFINED框架,通过层次化八维指标体系、预训练语言模型和混合粒度训练策略,在辩论场景中实现数据高效的细粒度创造力自动评估,优于现有方法。

Comments Accepted by KDD 2026

AI中文摘要

人类创造力已成为大语言模型时代的关键能力。在复杂、开放环境中评估创造力是数据挖掘领域的一大挑战,目前受限于对标准化简单任务的依赖以及细粒度专家数据的稀缺。作为生态有效的评估场景,辩论反映了创造力的多个维度,涵盖发散思维和收敛思维。此外,辩论是一个数据丰富的领域,拥有大量公开可获取的材料。当前主流的自动评分方法难以适应辩论等复杂场景,因此仍然依赖昂贵的人工评估。为此,本文提出DEFINED,一种数据高效的计算框架,用于辩论场景中的细粒度创造力评估。DEFINED通过层次化的八维指标体系操作化辩论创造力,采用预训练自回归语言模型,并配备支持细粒度和粗粒度评估的层次化评分头。从真实辩论比赛中获取陈述及其相关专家评分,并采用约束数据增强策略以解决原始数据中的精英偏差。DEFINED采用混合粒度训练策略,能够从训练有素的研究生专家提供的有限细粒度监督中实现鲁棒学习。为严格验证超越合成基准的生态效度,我们纳入了一项针对辩论新手参与者的实证研究,利用这些真实数据作为中低水平人群的定性案例研究。在我们的评估协议中,评分模型实现了准确且稳定的评分,优于基于提示的大语言模型评估器和现有的辩论评分方法。

Abstract

Human creativity has emerged as a critical competency in the era of large language models. Assessing creativity in complex, open-ended environments is a grand challenge in data mining, currently hindered by a reliance on standardized simple tasks and the scarcity of fine-grained expert data. As an ecologically valid assessment context, debate reflects multiple dimensions of creativity, encompassing both divergent thinking and convergent thinking. Moreover, debate is a data-rich domain, with a large volume of publicly accessible materials. Current mainstream automated scoring methods are poorly suited to complex settings such as debate, and therefore still rely on costly human evaluation. To this end, this paper proposes DEFINED, a data-efficient computational framework for fine-grained creativity assessment in debate scenarios. DEFINED operationalizes debate creativity through a hierarchical eight-dimensional metric system, implemented via a pre-trained autoregressive language model with a hierarchical scoring head that supports both fine-grained and coarse-grained evaluation. Statements and their associated expert scores were obtained from authentic debate competitions, and a constrained data augmentation strategy was employed to address the elite bias inherent in the original data. DEFINED adopts a mixed-granularity training strategy enabling robust learning from limited fine-grained supervision annotated by trained graduate experts. To rigorously validate ecological validity beyond synthetic benchmarks, we incorporate an empirical study with debate-naive participants, utilizing these authentic data to serve as a qualitative case study for mid-to-low proficiency populations. Across our evaluation protocol, our scoring model achieves accurate and stable scoring, outperforming prompt-based large language model evaluators and existing debate scoring methods.

2606.08592 2026-06-16 cs.LG quant-ph Updated version

Quantum Global Variational Learning for Quantum Error Correction

Shun Ryuzaki, Hideo Mukai

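Affiliations: Meiji University

AI summary: Proposes a quantum neural network with a global structure that reduces the number of unitary matrices in the quantum circuit, cutting training time by 97%, raising the training completion rate by up to 25%, reaching a 100% training success rate, and surpassing the error-correction performance of previous studies.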

Comments 24 pages, 22 figures

Abstract

Efficient quantum error correction is essential for the advancement of quantum computing. We propose a quantum neural network with a global structure that reduces the number of unitary matrices required in quantum circuits. This approach resulted in a 97% reduction in training time and up to a 25% improvement in the training completion rate, ultimately achieving a 100% success rate in training while surpassing the error correction performance reported in previous studies. In addition, we demonstrated the enhanced robustness of quantum error correction against internal network noise. Moreover, the fidelity of quantum error correction under internal network noise increased by up to 15% due to the reduced computational load.

2405.15768 2026-06-16 stat.ML cs.AI cs.LG Updated version

Canonical Variates in Wasserstein Metric Space

Jia Li, Lin Lin

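Affiliations: Department of Statistics, The Pennsylvania State University; Department of Biostatistics and Bioinformatics, Duke University

AI summary: For classifying distributional data, proposes a dimension-reduction method that maximizes a Fisher ratio built on Wasserstein distances, implemented with an iterative optimization algorithm; experiments show that it substantially improves classification performance.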

Comments Single-spaced, 39 pages, 10 figures

Abstract

In this paper, we address the classification of instances represented by distributions on a vector space rather than single points. We consider classification algorithms based on pairwise distances, specifically, the Wasserstein metric between distributions. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the ratio of between-class variation to within-class variation. The directions in which this ratio is maximized are termed discriminant coordinates or canonical variates axes. In practice, the between-class and within-class variations are defined as the average squared Wasserstein distances over pairs of distributions drawn from different classes and from the same class, respectively. This ratio optimization is achieved through an iterative algorithm, which alternates between optimal transport and maximization steps within the vector space. Empirical studies are conducted to assess the algorithm's convergence, and experimental results demonstrate that the dimension reduction technique substantially enhances classification performance. Moreover, the new method outperforms well-established algorithms that operate on vector representations derived from distributional data. It also exhibits robustness to variations in how instances are summarized by distributions, such as the number of components in a Gaussian mixture model (GMM) representation.
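
To make the objective concrete, the sketch below evaluates a Fisher-type ratio for one candidate projection direction. It is a simplification and not the authors' iterative algorithm: each instance is assumed to be an equal-size point cloud in R^2 (the paper also allows summaries such as GMMs), points are projected onto a unit direction, and the squared 2-Wasserstein distance between two projected clouds is computed by sorting, because in 1-D the optimal coupling matches order statistics. Between- and within-class variation are averages over cross-class and same-class pairs, as described above. The brute-force angle search is a hypothetical stand-in for the paper's alternating optimal-transport and maximization steps.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Cloud = List[Tuple[float, float]]  # one instance: an empirical distribution in R^2


def project(cloud: Cloud, theta: float) -> List[float]:
    """Project 2-D points onto the unit direction (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [x * c + y * s for x, y in cloud]


def w2_squared_1d(u: List[float], v: List[float]) -> float:
    """Squared 2-Wasserstein distance between equal-size 1-D empirical distributions.
    In 1-D the optimal coupling matches sorted samples (order statistics)."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum((a - b) ** 2 for a, b in zip(sorted(u), sorted(v))) / len(u)


def fisher_ratio(clouds: Sequence[Cloud], labels: Sequence[int], theta: float) -> float:
    """Mean squared W2 over cross-class pairs divided by mean over same-class pairs."""
    proj = [project(c, theta) for c in clouds]
    between, within = [], []
    for i, j in combinations(range(len(proj)), 2):
        d = w2_squared_1d(proj[i], proj[j])
        (within if labels[i] == labels[j] else between).append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within))


def best_direction(clouds: Sequence[Cloud], labels: Sequence[int], n_angles: int = 180) -> float:
    """Brute-force search over directions in [0, pi); a stand-in for the paper's optimizer."""
    angles = [math.pi * k / n_angles for k in range(n_angles)]
    return max(angles, key=lambda t: fisher_ratio(clouds, labels, t))
```

In a toy setting where the two classes are separated horizontally while same-class clouds differ mostly vertically, the search picks the horizontal direction (theta = 0), which is the behaviour a canonical variate should show.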

2406.06855 2026-06-16 math.OC cs.LG Updated version

Design and Scheduling of an AI-based Queueing System

Jiung Lee, Hongseok Namkoong, Yibo Zeng

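Affiliations: Columbia University

AI summary: For service systems in which prediction models interact with human servers, studies how prediction errors affect congestion cost and proposes an index-based policy that uses predicted class information near-optimally in heavy traffic, with guidance for selecting prediction models.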

Abstract

To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how prediction errors affect congestion through their externalities on the delays of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising many single-server queues, where the class of a job is estimated using a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure with downstream queueing performance as a central concern, and offer novel insights on how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by finetuning large language models.
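
The abstract does not state the form of the paper's index, so the sketch below uses a generic stand-in purely for illustration: a cμ-style plug-in index that divides a job's expected holding-cost rate by its expected service time, both computed from the predicted class probability, and serves the job with the largest index first. The two classes ("toxic", "benign"), the cost rates, and the mean service times are hypothetical. Taking a ratio of expectations is a heuristic; the paper's heavy-traffic analysis may weight mispredictions differently.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical parameters per true class: holding-cost rate and mean service time.
COST = {"toxic": 5.0, "benign": 1.0}
MEAN_SERVICE = {"toxic": 2.0, "benign": 1.0}


@dataclass
class Job:
    job_id: str
    p_toxic: float  # classifier's predicted probability that the job is "toxic"


def plug_in_index(job: Job) -> float:
    """Expected cost rate / expected service time under the predicted class mix.
    A generic c-mu style heuristic, not the paper's index."""
    p = job.p_toxic
    exp_cost = p * COST["toxic"] + (1 - p) * COST["benign"]
    exp_service = p * MEAN_SERVICE["toxic"] + (1 - p) * MEAN_SERVICE["benign"]
    return exp_cost / exp_service


def next_job(queue: List[Job]) -> Job:
    """Non-preemptive priority: pick the waiting job with the largest index."""
    return max(queue, key=plug_in_index)
```

With these numbers a job predicted toxic with certainty has index 5 / 2 = 2.5, a 50/50 job has 3 / 1.5 = 2.0, and a surely benign job has 1.0, so the confident toxic prediction is served first.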

2411.05824 2026-06-16 eess.IV cs.CV cs.LG Updated version

Navigating Distribution Shifts in Medical Image Analysis: A Survey

Zixian Su, Jingwei Guo, Xi Yang, Qiufeng Wang, Frans Coenen, Amir Hussain, Kaizhu Huang

Affiliations: Life Simulation Research Center, Beijing Academy of Artificial Intelligence; Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology; Department of Intelligent Science, School of Advanced Technology, Xi'an Jiaotong-Liverpool University; Computer Science, School of Computer Science and Informatics, University of Liverpool; SDAIA-KFUPM Joint Research Centre for Artificial Intelligence, King Fahd University of Petroleum and Minerals; Nuffield Department of Primary Care Health Sciences, University of Oxford

AI summary: Systematically reviews deep learning methods for handling distribution shifts in medical image analysis, grouping them by clinical constraint into joint training, federated learning, fine-tuning, and domain generalization, and identifies a methodological shift from explicit alignment toward uncertainty modeling.

Abstract

Medical Image Analysis (MedIA) has become indispensable in modern healthcare, enhancing clinical diagnostics and personalized treatment. Despite the remarkable advancements supported by deep learning (DL) technologies, their practical deployment faces challenges posed by distribution shifts, where models trained on specific datasets underperform on data from other hospitals or patient populations. To address this issue, researchers have been actively developing strategies to increase the adaptability of DL models, enabling their effective use in unfamiliar environments. This paper systematically reviews approaches that apply DL techniques to MedIA systems affected by distribution shifts. Rather than organizing existing methods by technical characteristics, we explicitly bridge real-world clinical constraints, such as limited data accessibility, strict privacy requirements, and heterogeneous collaboration protocols, with the technical paradigms able to address them. By establishing this connection between operational constraints and methodological evolution, we categorize existing works into Joint Training, Federated Learning, Fine-tuning, and Domain Generalization, each aligned with specific healthcare scenarios. Beyond this taxonomy, our empirical analysis suggests that, as domain information becomes progressively less accessible across these paradigms, performance improvements become increasingly constrained, and further uncovers a gradual shift in methodological focus from explicit distribution alignment toward uncertainty-aware modeling, ultimately pointing to the need for more deployability-aware design in real-world MedIA.

2411.18714 2026-06-16 cs.RO cs.AI cs.LG Updated version

Explainable deep learning improves human mental models of self-driving cars

Eoin M. Kenny, Akshay Dharmavaram, Sang Uk Lee, Tung Phan-Minh, Shreyas Rajesh, Yunqing Hu, Laura Major, Momchil S. Tomov, Julie A. Shah

Affiliations: Computer Science & Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology; Motional AD Inc.; Department of Psychology and Center for Brain Science, Harvard University; Department of Aeronautics and Astronautics, Massachusetts Institute of Technology

AI summary: Proposes the Concept-Wrapper Network (CW-Net), which enables interpretable planning on a real self-driving car; its causally grounded concept explanations improve drivers' ability to predict the vehicle's behavior, especially in surprising situations.

Comments MST & JAS contributed equally to this work

Abstract

Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. The opacity of such black-box planners makes it challenging to accurately anticipate when they will fail, with potentially catastrophic consequences. While research into interpreting these systems has surged, most of it is confined to simulations or toy setups due to the difficulty of real-world deployment, leaving the practical utility of such techniques unknown. Here, we introduce the Concept-Wrapper Network (CW-Net), a method for faithfully explaining the behavior of machine-learning-based planners that causally grounds their reasoning in human-interpretable concepts without sacrificing performance. We deploy CW-Net on a real self-driving car and show that the resulting explanations improve the human driver's mental model of the vehicle, allowing them to better predict its behavior, particularly in surprising situations. This demonstrates that explainable deep learning integrated into self-driving cars can be both understandable and useful in a realistic deployment setting. We anticipate our method could be applied to other safety-critical systems, such as autonomous drones and robotic surgeons, as well as to other architectures, such as end-to-end learning systems and vision-language-action models. Overall, our study establishes a deployment-validated pathway to interpretability for autonomous agents, which could help make them more transparent and safe.

2505.08774 2026-06-16 q-bio.BM cs.LG Updated version

Generative Molecular Design with Steerable and Granular Synthesizability Control

Jeff Guo, Víctor Sabanza-Gil, Olha Semenenko, Oleksii Hrabovskyi, Mykola Protopopov, Anna Kapeliukha, Oleksandr Mosia, Sofiia Hatych, Diana Alieksieieva, Tom Nelis, Patrick Molliet, Helena Solé-Àvila, Valentas Olikauskas, Nina Aregger, Irina Morozova, Joseph Schmidt, Zlatko Jončev, Olga Tarkhanova, Petro Borysko, Jerome Waser, Bruno Correia, Jeremy Luterbacher, Philippe Schwaller

Affiliations: Laboratory of Artificial Chemical Intelligence (LIAC); NCCR Catalysis; Laboratory of Sustainable and Catalytic Processing (LPDC); CHEMSPACE LLC; Enamine Ltd.; Taras Shevchenko National University of Kyiv; V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry; Palladin Institute of Biochemistry; Laboratory of Catalysis and Organic Synthesis (LCSO); Laboratory of Protein Design and Immunoengineering (LPDI)

AI summary: Proposes a generative framework that unifies synthesis-constrained molecular design and ultra-large-scale virtual screening; through steerable, granular synthesizability control it generates molecules that meet multi-parameter optimization objectives and come with predicted synthesis routes, with effectiveness validated on the BRD4 and Wee1 targets.

Abstract

Designing molecules that are both property-optimal and readily synthesizable is a central challenge in drug discovery. Existing works that do consider synthesizability can jointly output predicted synthesis routes for generated molecules. However, little attention has been paid to controlling the ease of synthesis or to flexibly incorporating desired reaction constraints. On the other hand, virtual screening searches for commercially available compounds but becomes challenging to scale to ultra-large (billion-size and beyond) chemical spaces. Here, we propose a generative design framework that unifies synthesis-constrained molecular design and ultra-large-scale virtual screening through steerable and granular synthesizability control. Generated molecules satisfy arbitrary multi-parameter optimization objectives with predicted synthesis routes satisfying mix-and-match constraints: including or avoiding certain reactions, incorporating specific building blocks, and minimizing synthesis route length. In an end-to-end in-house campaign targeting BRD4, we designed molecules synthesizable with specific selected reactions and building blocks, synthesized all six selected compounds, and identified two micromolar binders. We further demonstrate that reaction control enables efficient navigation of ultra-large make-on-demand chemical spaces to identify property-optimal candidates. By applying our framework to Chemspace's Freedom 4.0 make-on-demand space (142 billion molecules), we generated ~320k molecules (0.00023% of the library) on a single consumer-grade GPU (with only 8 GB GPU memory) and identified a micromolar Wee1 binder amongst 60 synthesized candidates. The single unified framework thus enables generating novel synthesizable molecules and retrieving catalogue-ready candidates, offering a flexible solution to mitigating the synthesizability bottleneck.
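
The mix-and-match route constraints listed above can be pictured as a filter plus a tie-breaker over candidate synthesis routes. This sketch is illustrative only and is not the framework's implementation, which steers generation rather than filtering afterwards: a route is modelled as a list of (reaction name, building blocks) steps, and all reaction and building-block names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# One synthesis step: (reaction name, building blocks it consumes). Names are hypothetical.
Step = Tuple[str, FrozenSet[str]]


@dataclass(frozen=True)
class RouteConstraints:
    required_reactions: FrozenSet[str] = frozenset()
    forbidden_reactions: FrozenSet[str] = frozenset()
    required_blocks: FrozenSet[str] = frozenset()


def satisfies(route: List[Step], c: RouteConstraints) -> bool:
    """Check include/avoid-reaction and required-building-block constraints."""
    reactions = {name for name, _ in route}
    blocks = set().union(*(bbs for _, bbs in route))
    return (c.required_reactions <= reactions
            and not (c.forbidden_reactions & reactions)
            and c.required_blocks <= blocks)


def shortest_valid_route(routes: List[List[Step]], c: RouteConstraints) -> Optional[List[Step]]:
    """Among routes meeting the constraints, prefer the fewest steps."""
    valid = [r for r in routes if satisfies(r, c)]
    return min(valid, key=len) if valid else None
```

Treating route length as a secondary criterion after the hard constraints mirrors the "minimizing synthesis route length" objective, while the include/avoid sets encode the reaction constraints.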

2506.20668 2026-06-16 cs.RO cs.LG Updated version

DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy

Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani

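Affiliations: Carnegie Mellon University

AI summary: Proposes DemoDiffusion, which uses a single human demonstration and a pre-trained diffusion policy to let robots perform manipulation tasks without task-specific training, reaching an 83.8% average success rate across 8 tasks.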

Comments 11 pages. Published at ICRA 2026

Abstract

We propose DemoDiffusion, a simple method for enabling robots to perform manipulation tasks by imitating a single human demonstration, without requiring task-specific training or paired human-robot data. Our approach is based on two insights. First, the hand motion in a human demonstration provides a useful prior for the robot's end-effector trajectory, which we can convert into a rough open-loop robot motion trajectory via kinematic retargeting. Second, while this retargeted motion captures the overall structure of the task, it may not align well with plausible robot actions in context. To address this, we leverage a pre-trained generalist diffusion policy to modify the trajectory, ensuring it both follows the human motion and remains within the distribution of plausible robot actions. Unlike approaches based on online reinforcement learning or paired human-robot data, our method enables robust adaptation to new tasks and scenes with minimal effort. In real-world experiments across 8 diverse manipulation tasks, DemoDiffusion achieves an 83.8% average success rate, compared to 13.8% for the pre-trained policy and 52.5% for kinematic retargeting, succeeding even on tasks where the pre-trained generalist policy fails entirely. Project page: https://demodiffusion.github.io/

2507.17804 2026-06-16 astro-ph.HE astro-ph.CO astro-ph.IM cs.LG hep-ph Updated version

On the Energy Distribution of the Galactic Center Excess' Sources

Florian List, Yujin Park, Nicholas L. Rodd, Eve Schoen, Florian Wolf

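Affiliations: Department of Astrophysics, University of Vienna; Theory Group, Lawrence Berkeley National Laboratory; Berkeley Center for Theoretical Physics, University of California; University of California, Berkeley; Lawrence Berkeley National Laboratory

AI summary: Uses neural-network simulation-based inference to jointly analyze spatial and spectral data, finding that if the Galactic Center Excess is produced by point sources, the required number of sources is two orders of magnitude higher than earlier estimates, supporting the possibility that it is diffuse emission from annihilating dark matter.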

Comments 7+22 pages, 2+22 figures; v2: journal version

Abstract

The Galactic Center Excess (GCE) may yet herald the discovery of annihilating dark matter. Weighing against that conclusion are analyses showing evidence for dim point sources within the spatial structure of the emission. Due to technical limitations, these analyses are purely spatial, discarding all spectral information that could disentangle the excess from astrophysical backgrounds. Here, we demonstrate that a neural network simulation-based inference approach can jointly analyze the spatial and spectral data. The addition is profound: energy information drives the putative point sources to be significantly dimmer, indicating that the GCE is either truly diffuse in nature or made of an exceptionally large number of sources. Quantitatively, for our best-fit background model, the excess is essentially consistent with Poisson emission as predicted by dark matter. If due to point sources, our median prediction is $\mathcal{O}(10^5)$ sources, or more than 35,000 at 90% confidence, both orders of magnitude larger than the hundreds preferred by earlier point-source analyses of the GCE, although variations allowed by background systematics could reduce the required number of sources by roughly an order of magnitude.

2510.14092 2026-06-16 stat.ML cs.LG Updated version

deFOREST: Fusing Optical and Radar satellite data for Enhanced Sensing of Tree-loss

Julio Enrique Castrillon-Candas, Hanfeng Gu, Caleb Meredith, Yulin Li, Xiaojing Tang, Pontus Olofsson, Mark Kon

Affiliations: Department of Mathematics and Statistics, Boston University; Department of Earth and Environment, Boston University; College of Integrated Science & Engineering, James Madison University; NASA Marshall Space Flight Center

AI summary: Proposes a deforestation detection pipeline that fuses optical and SAR data, building anomaly maps from the residual space of a discrete KL expansion and classifying forest state with an HMM; tests in the Amazon show that the hybrid method outperforms existing techniques and is more robust to sparse optical data.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 64, 2026, Art no. 4409213

Abstract

In this paper we develop a deforestation detection pipeline that incorporates optical and Synthetic Aperture Radar (SAR) data. A crucial component of the pipeline is the construction of anomaly maps of the optical data, which is done using the residual space of a discrete Karhunen-Loève (KL) expansion. Anomalies are quantified using a concentration bound on the distribution of the residual components for the nominal state of the forest. This bound does not require prior knowledge of the distribution of the data, in contrast to parametric statistical methods that assume the data distribution is known, an impractical assumption that is especially infeasible for high-dimensional data such as ours. Once the optical anomaly maps are computed, they are combined with SAR data, and the state of the forest is classified using a Hidden Markov Model (HMM). We test our approach with Sentinel-1 (SAR) and Sentinel-2 (optical) data on a 92 km × 92 km region in the Amazon forest. The results show that both the hybrid optical-radar and optical-only methods achieve high accuracy, outperforming a recent state-of-the-art hybrid method. Moreover, the hybrid method is significantly more robust when optical data are sparse, as is common in highly cloudy regions.
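
The final classification step can be illustrated with a two-state hidden Markov model decoded by the Viterbi algorithm. This is a generic sketch under stated assumptions, not the authors' model: the states ("forest", "deforested"), the near-absorbing transition probabilities, and the Gaussian emission model on a single fused anomaly score per acquisition date are all hypothetical choices made for the example.

```python
import math
from typing import Dict, List

STATES = ["forest", "deforested"]
# Hypothetical parameters: clearing is rare, regrowth within the series rarer still.
LOG_START = {"forest": math.log(0.9), "deforested": math.log(0.1)}
LOG_TRANS = {
    "forest": {"forest": math.log(0.95), "deforested": math.log(0.05)},
    "deforested": {"forest": math.log(0.01), "deforested": math.log(0.99)},
}
# Hypothetical Gaussian emission (mean, std) on one fused anomaly score per date.
EMIT = {"forest": (0.0, 1.0), "deforested": (3.0, 1.0)}


def log_emit(state: str, x: float) -> float:
    mu, sd = EMIT[state]
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi(scores: List[float]) -> List[str]:
    """Most likely forest-state sequence for a non-empty series of anomaly scores."""
    best: List[Dict[str, float]] = [
        {s: LOG_START[s] + log_emit(s, scores[0]) for s in STATES}
    ]
    back: List[Dict[str, str]] = []
    for x in scores[1:]:
        prev = best[-1]
        cur: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in STATES:
            p = max(STATES, key=lambda q: prev[q] + LOG_TRANS[q][s])
            cur[s] = prev[p] + LOG_TRANS[p][s] + log_emit(s, x)
            ptr[s] = p
        best.append(cur)
        back.append(ptr)
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last date
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

The low transition probability out of "deforested" is what lets an HMM suppress isolated false alarms: a single noisy score rarely justifies paying two unlikely transitions.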

2511.22486 2026-06-16 physics.plasm-ph cs.LG Updated version

The Machine Learning Approach to Moment Closure Relations for Plasma: A Review

Samuel Burles, Enrico Camporeale

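Affiliations: School of Physical and Chemical Sciences, Queen Mary University of London; Space Weather TREC, University of Colorado

AI summary: Reviews research that uses machine learning to develop improved closure models for plasma fluid models, covering two families of methods (neural-network surrogates and equation discovery), and discusses the challenges of offline testing and online simulation along with future directions.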

Comments 58 pages, 6 figures

Abstract

The requirement for large-scale global simulations of plasma is an ongoing challenge in both space and laboratory plasma physics. Any simulation based on a fluid model inherently requires a closure relation for the higher-order plasma moments. This review compiles and analyses the recent surge of machine learning approaches developing improved plasma closure models capable of capturing kinetic phenomena within plasma fluid models. We survey two methodological families: neural-network surrogates (from multilayer perceptrons to Fourier neural operators, the latter recently reproducing both linear and non-linear Landau damping online within a fluid solver) and equation-discovery methods such as sparse regression; and organise the studies by whether they are tested offline against reference data or online within a time-evolving solver. We outline the challenges associated with machine-learning closures, including off-diagonal pressure-tensor accuracy, generalisation beyond the training distribution, and stable integration into large-scale simulations, as well as the directions future research might take to address them.
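
Of the equation-discovery methods mentioned, sparse regression is the easiest to show in a few lines. The sketch below implements sequentially thresholded least squares (the SINDy-style variant), chosen here for illustration rather than drawn from any specific reviewed study. It fits a target, standing in for an unknown closure term, as a sparse combination of candidate library functions of a resolved variable. The library [1, x, x^2, x^3], the threshold, and the synthetic data are assumptions; a small Gaussian-elimination solver keeps the example dependency-free.

```python
from typing import Callable, List


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def least_squares(cols: List[List[float]], y: List[float]) -> List[float]:
    """Least squares via the normal equations (Theta^T Theta) xi = Theta^T y."""
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(gram, rhs)


def stlsq(xs: List[float], y: List[float], library: List[Callable[[float], float]],
          threshold: float = 0.1, iters: int = 10) -> List[float]:
    """Sequentially thresholded least squares: refit, zero out small terms, repeat."""
    cols = [[f(x) for x in xs] for f in library]
    active = list(range(len(library)))
    coef = [0.0] * len(library)
    for _ in range(iters):
        coef = [0.0] * len(library)
        for i, c in zip(active, least_squares([cols[i] for i in active], y)):
            coef[i] = c
        kept = [i for i in active if abs(coef[i]) >= threshold]
        if kept == active:  # support has converged
            break
        active = kept
    return coef
```

On noise-free data generated from y = 2x - 0.5x^3, the first fit gives near-zero weights for the constant and x^2 terms; these fall below the threshold and are pruned, and the refit recovers the coefficients [0, 2, 0, -0.5].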

2601.20875 2026-06-16 stat.AP cs.LG econ.EM stat.ME stat.ML Updated version

Drivers, Receivers, and Dynamic Linkages: The Directed Structure of SDG Interdependence, 2000-2024

Md Muhtasim Munif Fahim, Md Jahid Hasan Imran, Md. Naim Molla, Luknath Debnath, Tonmoy Shil, Ehsanul Bashar Pranto, Md Mostafizur Rahman Likhon, Md Shafin Sanyan Saad, Md. Rezaul Karim

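Affiliations: Data Science Research Lab, Department of Statistics, University of Rajshahi

AI summary: Uses panel Granger causality tests and local projections to analyze the directed interdependence network of the 17 Sustainable Development Goals across 114 countries from 2000 to 2024, finding 84 significant linkages (40 synergies, 44 trade-offs) and fragile driver-receiver rankings, with peace and strong institutions as a net receiver and poverty reduction as an effect-weighted driver.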

Comments 27 pages, 5 figures. Panel Granger non-causality and local projections on 114 countries (2000-2024). Submitted to Sustainability Science

Abstract

Governments with limited fiscal and administrative capacity need to know which Sustainable Development Goals (SDGs) propagate progress through the goal system and how quickly. We map the directed interdependence structure of all seventeen goals using a balanced panel of 114 countries observed annually from 2000 to 2024. The goal series are persistent, trending, and cross-sectionally dependent, so we apply two estimators matched to this regime: a Dumitrescu-Hurlin panel Granger non-causality test, run on first-differenced series, to recover the directed interaction network, and panel local projections with Driscoll-Kraay standard errors to measure the dynamic magnitude of 31 theory-derived indicator linkages. Of 272 directed goal pairs, 84 linkages survive false-discovery control (40 synergies, 44 trade-offs; network density 0.31). Synergies and trade-offs occur at comparable strength, so no single goal behaves as a universal accelerator, and the goal-level hierarchy itself is fragile. Driver-receiver rankings correlate weakly across lag orders and centrality metrics, and under a country bootstrap only two roles are distinguishable from zero: peace and strong institutions as the clearest net receiver, and poverty reduction as the most probable effect-size-weighted driver. The supported linkages are dynamic, accruing over four to five years: sanitation and poverty improvements are the strongest predictors of lower child mortality, and the education-child-health association is corroborated in independent World Development Indicators data across 183 countries. These results caution against rankings-based accelerator policy and support adaptive portfolios built on supported, time-lagged linkages monitored through constituent indicators.
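
Two of the numbers above can be checked directly. With 17 goals there are 17 × 16 = 272 ordered (directed) pairs, and 84 surviving linkages give a density of 84 / 272 ≈ 0.309, which rounds to the reported 0.31. The abstract does not name its false-discovery procedure; the sketch below assumes the standard Benjamini-Hochberg step-up rule purely to show what "surviving false-discovery control" means mechanically. The p-values used in the check are made up.

```python
from typing import List


def benjamini_hochberg(pvals: List[float], q: float = 0.05) -> List[bool]:
    """Reject/keep flag per hypothesis under the Benjamini-Hochberg step-up rule at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank  # step-up: keep the largest rank that passes
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]


def directed_density(n_nodes: int, n_edges: int) -> float:
    """Edge density of a directed graph without self-loops."""
    return n_edges / (n_nodes * (n_nodes - 1))
```

Because the rule is step-up, every hypothesis ranked below the largest passing rank is rejected, even if its own p-value would not pass its individual threshold.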

2602.07343 2026-06-16 cs.CV cs.AI cs.LG cs.RO Updated version

Seeing Roads Through Words: A Language-Guided Framework for RGB-T Driving Scene Segmentation

Ruturaj Reddy, Hrishav Bakul Barua, Junn Yong Loo, Thanh Thi Nguyen, Ganesh Krishnasamy

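Affiliations: National University of Singapore; University of Technology Sydney

AI summary: Proposes CLARITY, which uses vision-language model priors to dynamically adapt its RGB-T fusion strategy and adds dark-object semantic preservation and a hierarchical decoder, setting a new state of the art on MFNet with 62.3% mIoU and 77.5% mAcc.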

Abstract

Robust semantic segmentation of road scenes under adverse illumination, lighting, and shadow conditions remains a core challenge for autonomous driving applications. RGB-Thermal fusion is a standard approach, yet existing methods apply static fusion strategies uniformly across all conditions, allowing modality-specific noise to propagate throughout the network. We therefore propose CLARITY, which dynamically adapts its fusion strategy to the detected scene condition. Guided by vision-language model (VLM) priors, the network learns to modulate each modality's contribution based on the illumination state while leveraging object embeddings for segmentation, rather than applying a fixed fusion policy. We further introduce two mechanisms: one that preserves valid dark-object semantics that prior noise-suppression methods incorrectly discard, and a hierarchical decoder that enforces structural consistency across scales to sharpen the boundaries of thin objects. Experiments on the MFNet dataset demonstrate that CLARITY establishes a new state of the art (SOTA), achieving 62.3% mIoU and 77.5% mAcc.
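
The two reported metrics are typically computed from a pixel-level confusion matrix: per-class IoU = TP / (TP + FP + FN) and per-class accuracy = TP / (TP + FN), each averaged over classes. The sketch below follows these standard definitions. How the MFNet evaluation treats classes absent from the ground truth (here they are skipped) is an assumption that should be checked against the benchmark's own evaluation code.

```python
from typing import List, Tuple


def miou_macc(conf: List[List[int]]) -> Tuple[float, float]:
    """conf[i][j] = number of pixels with ground-truth class i predicted as class j."""
    n = len(conf)
    ious, accs = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # class-c pixels predicted as something else
        fp = sum(conf[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        if tp + fn == 0:  # class absent from ground truth: skipped (assumption)
            continue
        ious.append(tp / (tp + fp + fn))
        accs.append(tp / (tp + fn))
    return sum(ious) / len(ious), sum(accs) / len(accs)
```

For the two-class matrix [[8, 2], [1, 9]], the per-class IoUs are 8/11 and 9/12 and the per-class accuracies are 0.8 and 0.9, giving an mAcc of 0.85.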

2603.12514 2026-06-16 cs.CV cs.LG Updated version

CT-VDETR: Semi-supervised 3D Trauma Detection in Computed Tomography (CT) scans using Dense Vertex Relative Position Encoding

Shivam Chaudhary, Sheethal Bhat, Andreas Maier

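Affiliations: University of Freiburg

AI summary: Proposes CT-VDETR, which combines self-supervised pretraining with semi-supervised transformer-based detection to reach 31.33% mAP@0.50 with only 78 labeled volumes, a 1.53x improvement over purely supervised training.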

Comments v2: Updated results with corrected dataset split. Revised Table 1 (mAP@0.50: 31.33% SSL vs 20.45% baseline, 1.53x improvement; mAP@0.75: 30.95% vs 10.45%, 2.96x improvement). Updated validation curves showing stable convergence. No methodology changes. 7 pages, 4 figures, 2 tables. Code: https://github.com/shivasmic/3d-trauma-detection-ssl

Abstract

Accurate detection and localization of traumatic injuries in abdominal CT remain challenging because voxel-level annotations are limited and expensive to obtain. We present a label-efficient framework for 3D abdominal trauma detection that combines self-supervised pretraining with semi-supervised transformer-based detection. First, we use Masked Image Modeling (MIM) on 1098 CT volumes to pretrain a 3D U-Net encoder for anatomical representation learning. Next, we adapt V-DETR to dense volumetric CT through a feature adapter that converts the encoder feature grid into a compact token sequence for transformer decoding. The pretrained encoder is then integrated with V-DETR and 3D Vertex Relative Position Encoding (3D V-RPE) to improve the localization of irregularly shaped injuries. Finally, semi-supervised teacher-student consistency regularization leverages 2,000 additional unlabeled volumes during detector training. To the best of our knowledge, this is the first application of a 3D DETR-style detector to the RSNA abdominal trauma detection task. On this benchmark, the proposed method achieves 31.33% test mAP@0.50 using only 78 labeled training volumes, corresponding to a 1.53x improvement over supervised-only training. These results show that combining medical-domain pretraining with semi-supervised learning is an effective strategy for label-scarce 3D medical detection.
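
The mAP@0.50 figure counts a predicted injury box as a true positive only when its intersection-over-union (IoU) with a ground-truth box is at least 0.5; the reported 1.53x gain is the ratio 31.33 / 20.45 ≈ 1.53 from the revised results. A minimal sketch of 3D IoU for axis-aligned boxes is below. Representing boxes as (min corner, max corner) in voxel coordinates is an assumption, and the paper's detector may parameterize boxes differently.

```python
from typing import Tuple

Box3D = Tuple[float, float, float, float, float, float]  # (x0, y0, z0, x1, y1, z1)


def volume(b: Box3D) -> float:
    return max(0.0, b[3] - b[0]) * max(0.0, b[4] - b[1]) * max(0.0, b[5] - b[2])


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = (max(0.0, min(a[3], b[3]) - max(a[0], b[0]))
             * max(0.0, min(a[4], b[4]) - max(a[1], b[1]))
             * max(0.0, min(a[5], b[5]) - max(a[2], b[2])))
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box3D, gt: Box3D, thresh: float = 0.5) -> bool:
    """Matching rule behind mAP@0.50 (the 0.75 threshold gives mAP@0.75)."""
    return iou_3d(pred, gt) >= thresh
```

Two 2 × 2 × 2 boxes offset by one voxel along x share half their volume, yet their IoU is only 4 / 12 ≈ 0.33, so such a prediction would not count at the 0.50 threshold.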

2603.27998 2026-06-16 eess.AS cs.LG Updated version

HRIR-Former: Grid-Free Time-Domain Reconstruction of Head-Related Impulse Responses with a Spatially Encoded Transformer

Shaoheng Xu, Chunyi Sun, Jihui Zhang, Amy Bastine, Prasanga N. Samarasinghe, Thushara D. Abhayapala, Hongdong Li

Affiliations: The Australian National University; The University of Queensland

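AI summary: Proposes HRIR-Former, a time-domain, grid-free binaural Transformer that predicts HRIRs at arbitrary directions from sparse measurements using sinusoidal spatial features, a Conv1D refinement module, and auxiliary ITD/ILD heads; it outperforms existing methods on the SONICOM dataset.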

Comments Accepted at Interspeech 2026, Sydney, Australia

Abstract

Individualized head-related impulse responses (HRIRs) enable binaural rendering, but dense per-listener measurements are costly. We address HRIR spatial up-sampling from sparse per-listener measurements: given a few measured HRIRs for a listener, predict HRIRs at unmeasured target directions. Prior learning methods often work in the frequency domain, rely on minimum-phase assumptions or separate timing models, and use a fixed direction grid, which can degrade temporal fidelity and spatial continuity. We propose HRIR-Former, a time-domain, grid-free binaural Transformer for reconstructing HRIRs at arbitrary directions from sparse inputs. It uses sinusoidal spatial features, a Conv1D refinement module, and auxiliary interaural time difference (ITD) and interaural level difference (ILD) heads. On SONICOM, it improves normalized mean squared error (NMSE), cosine distance, and ITD/ILD errors over prior methods; ablations validate modules and show minimum-phase preprocessing is unnecessary.
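The ITD and ILD quantities supervised by the auxiliary heads can be computed from a left/right HRIR pair. The following is a minimal NumPy sketch. It assumes the ITD is the lag of the interaural cross-correlation peak and the ILD is the broadband energy ratio in dB. The paper may define both differently, for example with band-limiting or onset detection.

```python
import numpy as np

def itd_ild(h_left, h_right, fs):
    """Return (ITD in seconds, ILD in dB) for one HRIR pair.

    Positive ITD means the right-ear response lags the left-ear response;
    positive ILD means the left ear receives more energy.
    """
    h_left = np.asarray(h_left, dtype=float)
    h_right = np.asarray(h_right, dtype=float)
    # Full cross-correlation; index len(h_left) - 1 corresponds to zero lag.
    xcorr = np.correlate(h_right, h_left, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(h_left) - 1)
    itd = lag / fs
    # Broadband energy ratio (left relative to right), in dB.
    eps = 1e-12
    ild = 10.0 * np.log10((np.sum(h_left**2) + eps) / (np.sum(h_right**2) + eps))
    return itd, ild
```

For example, a left impulse at sample 0 and a right impulse of amplitude 0.5 at sample 3 (fs = 48 kHz) gives an ITD of 3/48000 s and an ILD of about 6.02 dB.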

2604.17301 2026-06-16 cs.CL cs.AI cs.HC cs.IR cs.LG Updated version

RoTRAG: Rule of Thumb Reasoning for Conversation Harm Detection with Retrieval-Augmented Generation

Juhyeon Lee, Wonduk Seo, Junseo Koh, Seunghyun Lee, Haihua Chen, Yi Bu

Affiliations: Peking University; Enhans; University of North Texas

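AI summary: Proposes RoTRAG, a framework that strengthens LLM-based harmful-content detection in multi-turn dialogue by retrieving external moral norms (Rules of Thumb, RoTs), enabling norm-grounded reasoning and classification; it improves F1 by about 40% on average and reduces distributional error by 8.4%.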

Comments Accepted by SIGIR-ICTIR 2026, Oral Presentation

Journal ref
Proceedings of the 2026 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR '26), July 25, 2026, Melbourne, VIC, Australia. ACM, New York, NY, USA, 12 pages
Abstract

Detecting harmful content in multi-turn dialogue requires reasoning over the full conversational context rather than isolated utterances. However, most existing methods rely mainly on models' internal parametric knowledge, without explicit grounding in external normative principles. This often leads to inconsistent judgments in socially nuanced contexts, limited interpretability, and redundant reasoning across turns. To address this, we propose RoTRAG, a retrieval-augmented framework that incorporates concise human-written moral norms, called Rules of Thumb (RoTs), into LLM-based harm assessment. For each turn, RoTRAG retrieves relevant RoTs from an external corpus and uses them as explicit normative evidence for turn-level reasoning and final severity classification. To improve efficiency, we further introduce a lightweight binary routing classifier that decides whether a new turn requires retrieval-grounded reasoning or can reuse existing context. Experiments on ProsocialDialog and Safety Reasoning Multi Turn Dialogue show that RoTRAG consistently improves both harm classification and severity estimation over competitive baselines, with an average relative gain of around 40% in F1 across benchmark datasets and an average relative reduction of 8.4% in distributional error, while reducing redundant computation without sacrificing performance.
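The per-turn flow (route, then retrieve RoTs or reuse context) can be sketched as follows. This is a minimal sketch under several assumptions. RoTs and turns are assumed to be already embedded as vectors; the embedding model is not specified in the abstract. Retrieval is assumed to use cosine similarity. The router here is a hypothetical similarity threshold standing in for the paper's trained lightweight binary classifier.

```python
import numpy as np

def cosine_top_k(query, corpus, k):
    """Indices of the k rows of `corpus` most cosine-similar to `query`."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q
    return [int(i) for i in np.argsort(-sims)[:k]]

def process_turn(turn_vec, context_vec, rot_vecs, cached_rots, k=2, reuse_threshold=0.9):
    """Return (rot_indices, retrieved_flag) for one dialogue turn.

    Hypothetical router: reuse the cached RoTs when the new turn is very
    similar to the running context; otherwise retrieve afresh.
    """
    sim = float(turn_vec @ context_vec
                / (np.linalg.norm(turn_vec) * np.linalg.norm(context_vec)))
    if cached_rots is not None and sim >= reuse_threshold:
        return cached_rots, False
    return cosine_top_k(turn_vec, rot_vecs, k), True
```

Skipping retrieval on turns that the router marks as reusable is where the claimed reduction in redundant computation would come from.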

2604.26963 2026-06-16 cs.OS cs.DC cs.LG cs.MA Updated version

MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

Yifei Wang, Hancheng Ye, Yechen Xu, Cong Guo, Chiyue Wei, Qinsi Wang, Dongting Li, Tingjun Chen, Hai "Helen" Li, Danyang Zhuo, Yiran Chen

Affiliations: Duke University

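AI summary: Proposes MARS, a co-scheduling system that globally coordinates GPU inference and CPU tool execution through a unified information stream, decouples admission from execution to prevent resource oversubscription, and uses an agent-centric scheduler to minimize end-to-end latency; experiments show up to 5.94x lower latency.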

Comments 14 pages, 13 figures. Preprint

Abstract

Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops, and a spatial shift from chat-scale, GPU-only execution to repository-scale, GPU-CPU co-located execution. Consequently, coordinating heterogeneous resource demands of agentic execution has emerged as a critical system challenge. We design and implement MARS, an efficient and adaptive co-scheduling system that globally coordinates heterogeneous agentic workloads under coupled GPU-CPU resource pressure. By establishing holistic visibility across GPU inference and CPU tool execution via a unified information stream, an external control plane in MARS decouples admission from execution to prevent heterogeneous resource oversubscription. An internal agent-centric scheduler further minimizes the end-to-end critical path by prioritizing latency-sensitive continuations and adaptively retaining KV cache state only when warm resumption yields a latency benefit. Our evaluations show that MARS reduces end-to-end latency by up to 5.94x while maintaining nearly maximal system throughput. We further integrate MARS as the serving backend for the OpenHands coding agent framework, demonstrating its real-world effectiveness by accelerating end-to-end task completion time by up to 1.87x. Our source code is publicly available at https://github.com/Afterglow231/MARS_preview .
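The two scheduler rules in the abstract can be illustrated together. The first rule prioritizes latency-sensitive continuations. The second retains KV cache state only when warm resumption pays off. This is a minimal sketch with a hypothetical linear cost model. Cold resumption recomputes KV in time proportional to context length. Holding the cache costs a fixed reload plus a penalty proportional to how long the tool call keeps memory occupied. MARS's actual estimator and parameters are not described in the abstract.

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class CostModel:
    # All parameters are hypothetical placeholders, not values from MARS.
    prefill_s_per_token: float = 2e-4   # time to recompute KV per context token
    reload_s: float = 0.005             # fixed cost to resume from retained KV
    hold_penalty_s_per_s: float = 0.01  # delay imposed on others per second of holding

def should_retain_kv(context_tokens, expected_tool_s, cost):
    """Retain KV only if warm resumption is cheaper than cold recompute."""
    cold = context_tokens * cost.prefill_s_per_token
    warm = cost.reload_s + expected_tool_s * cost.hold_penalty_s_per_s
    return warm < cold

@dataclass(order=True)
class Job:
    priority: int                       # 0 = latency-sensitive continuation, 1 = new request
    arrival: int                        # tie-breaker: FIFO within a priority class
    name: str = field(compare=False)

def schedule(jobs):
    """Return job names in dispatch order: continuations first, FIFO within class."""
    heap = list(jobs)
    heapq.heapify(heap)
    return [heapq.heappop(heap).name for _ in range(len(heap))]
```

Under these placeholder numbers, a long-context agent making a short tool call keeps its cache. A short-context agent waiting on a slow tool releases its cache.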

2605.04998 2026-06-16 cs.SD cs.IR cs.LG Updated version

Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation

Jinju Lee

Affiliations: PearlLeeStudio

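AI summary: This study runs chord-generation rehearsal experiments at different pop/jazz mix ratios and finds that modest pop rehearsal improves jazz prediction while preserving pop accuracy; this version also corrects a checkpoint-selection error from the previous version.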

Comments Erratum: the released F1 checkpoint equals the Phase-0 pop baseline (full SHA-256 verified); min mixed validation loss selection kept the unadapted warmup epoch. Tables 4 and 5 are best epoch metrics; mix ratio conclusions hold. A corrected retrain (jazz only validation), ft-pop80-v2, reproduces across 3 seeds. v1 F2 row fixed. 3 figs, 5 tables. https://huggingface.co/PearlLeeStudio

Abstract

This revision updates a pop-to-jazz chord-generation rehearsal study. Best-epoch metrics still show that modest pop rehearsal preserves pop accuracy while improving jazz prediction, but v2 corrects released-checkpoint selection: the released F1 equals Phase 0, F2 had a transcription error, and ft-pop80-v2 restores a hash-distinct jazz-adapted F1 across 3 seeds.
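The erratum depends on whether two released checkpoint files are byte-identical, and a full SHA-256 digest settles that. The following is a minimal sketch using Python's standard `hashlib`. It reads files in chunks so large checkpoints need not fit in memory. The file names in the demo are hypothetical stand-ins.

```python
import hashlib
import os
import tempfile

def sha256_file(path, chunk_size=1 << 20):
    """Full SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_checkpoint(path_a, path_b):
    """True when both files have the same SHA-256 digest (i.e. identical bytes)."""
    return sha256_file(path_a) == sha256_file(path_b)

if __name__ == "__main__":
    # Hypothetical stand-in files for a released checkpoint and two candidates.
    with tempfile.TemporaryDirectory() as d:
        payloads = {"released_f1.bin": b"weights-A",
                    "phase0_pop.bin": b"weights-A",
                    "ft_pop80_v2.bin": b"weights-B"}
        paths = {name: os.path.join(d, name) for name in payloads}
        for name, data in payloads.items():
            with open(paths[name], "wb") as f:
                f.write(data)
        print(same_checkpoint(paths["released_f1.bin"], paths["phase0_pop.bin"]))  # True
        print(same_checkpoint(paths["ft_pop80_v2.bin"], paths["phase0_pop.bin"]))  # False
```

Comparing full digests rather than file sizes or truncated hashes is what makes a claim like "hash-distinct" checkable.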

2606.01110 2026-06-16 physics.geo-ph cs.LG quant-ph Updated version

Accelerating physics-informed neural networks for full waveform inversion using a hybrid quantum-classical finite-basis architecture

Hoang Anh Nguyen, Divakar Vashisth, Ali Tura

Affiliations: Department of Geophysics, Colorado School of Mines; Department of Energy Science and Engineering, Stanford University; Department of Petroleum Engineering, Colorado School of Mines

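AI summary: Proposes a hybrid quantum-classical FBPINN for acoustic full waveform inversion in which the wavefield and velocity networks are implemented with parameterized quantum circuits; it reaches a lower L1 velocity error than the classical baseline with about 8x fewer training iterations and generalizes to other wave-based inversion problems.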

Abstract

Full waveform inversion (FWI) reconstructs heterogeneous material properties from receiver data but remains computationally demanding. Physics-informed neural networks (PINNs) and their domain-decomposed variants (FBPINNs) offer a mesh-free alternative but face convergence challenges when representing complex velocity fields. We present a hybrid quantum-classical FBPINN for acoustic FWI, bringing together quantum computing and classical machine learning, in which the decomposed wavefield network and the global velocity network are implemented as classical-to-quantum pipelines terminating in parameterized quantum circuits (PQCs). The PQCs are realized as differentiable JAX statevector simulators, enabling end-to-end automatic differentiation through the classical PINN, the quantum circuit, and the physics-informed loss. On a geophysical anomaly benchmark, the quantum hybrid reaches a lower L1 velocity error than the primary classical FBPINN baseline in approximately 8x fewer training iterations, despite using approximately 33% fewer trainable parameters, and it outperforms all 15 classical hyperparameter variants tested. A second benchmark (checkerboard) demonstrates the generality of the inversion pipeline, confirming that the quantum hybrid architecture can recover structured spatial variations beyond the localized anomaly benchmark. Our framework is broadly applicable to wave-based inverse problems beyond geophysics, including medical ultrasound tomography and non-destructive evaluation.
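For orientation, the following is the standard textbook form of a physics-informed FWI objective for the constant-density acoustic wave equation, written with an FBPINN-style decomposition. The abstract does not give the exact equations, so every symbol is an assumption. $\hat{u}_j$ are the subdomain wavefield networks (here, the PQC-terminated pipelines). $w_j$ are window functions forming a partition of unity. $c$ is the global velocity network and $s$ is the source term. $d_k$ are receiver samples and $c^{\star}$ is the true velocity. The L1 velocity error is assumed to be a mean absolute error over $N_m$ grid points.

```latex
\begin{aligned}
u(\mathbf{x},t) &= \sum_{j} w_j(\mathbf{x},t)\,\hat{u}_j(\mathbf{x},t), \qquad \sum_{j} w_j(\mathbf{x},t) = 1,\\
r(\mathbf{x},t) &= \frac{1}{c(\mathbf{x})^{2}}\,\frac{\partial^{2} u}{\partial t^{2}}(\mathbf{x},t) - \nabla^{2} u(\mathbf{x},t) - s(\mathbf{x},t),\\
\mathcal{L} &= \frac{\lambda_{\mathrm{pde}}}{N_c}\sum_{i=1}^{N_c} r(\mathbf{x}_i,t_i)^{2}
 + \frac{\lambda_{\mathrm{data}}}{N_d}\sum_{k=1}^{N_d}\bigl(u(\mathbf{x}^{\mathrm{rec}}_k,t_k) - d_k\bigr)^{2},\\
E_{L_1} &= \frac{1}{N_m}\sum_{m=1}^{N_m}\bigl|c(\mathbf{x}_m) - c^{\star}(\mathbf{x}_m)\bigr|.
\end{aligned}
```

End-to-end automatic differentiation through the quantum circuit is what makes the second time derivative and the Laplacian in $r$ available to the optimizer.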

2606.07334 2026-06-16 cs.SD cs.LG Updated version

How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling

Jinju Lee

Affiliations: PearlLeeStudio

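AI summary: Evaluates how well five lightweight adaptation methods (LoRA, IA3, BitFit, prefix tuning, and full fine-tuning) extend a pretrained pop-jazz chord model to 11 target genres, finding that all methods improve chord prediction but that chord symbols alone are not enough to fully convey genre identity.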

Comments v3: ft-pop80-v2, a selection-corrected, hash-distinct jazz base, exists, reproducing over 3 seeds (top-1 75.76 +/- 0.03), so the Sec. 8 base robustness ablation is now gated by effort, not checkpoint availability. Added a v3 changelog; corrected Sec. 5.2/6.3/6.9 stats for CSV fidelity (no qualitative changes). https://github.com/PearlLeeStudio/TheArtist | https://huggingface.co/PearlLeeStudio

AI abstract (translated from Chinese)

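Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical convention meet. This report treats chord-symbol sequences not as a complete representation of music but as an interpretable, controllable time series for genre-local harmonic modeling. Starting from a frozen pop-jazz music Transformer checkpoint, I evaluate how far small adaptation interfaces can extend the model to 11 target genres: blues, bossa nova, Bach chorales, country, electronic, folk, funk, gospel, hip-hop, R&B/soul, and rock. The main comparison covers LoRA, IA3, BitFit, prefix tuning, and full fine-tuning across 11 genres and 3 seeds, forming a complete 165-cell grid. All five methods outperform the frozen base on held-out chord prediction, with macro gains from +2.89 to +3.61 points; LoRA and IA3 score highest, but Wilcoxon tests with Holm and Benjamini-Hochberg correction do not support a decisive winner. A data-matched control sharpens this: when genres are subsampled to a common corpus size, IA3 stays ahead, but LoRA's full-data advantage disappears and it drops to last place, indicating that the small gaps are partly data-driven. A control-token baseline is also strong, and wrong-genre adapters often beat the frozen base, suggesting that most of the effect comes from lightweight conditioning on a reusable harmonic substrate rather than from any specific adapter family. Additional diagnostics (rank sweeps, wrong-genre rotation, base-checkpoint ablation, chord-only genre classification, generated-output statistics, real-song evaluation, and repetition analysis) support a bounded conclusion: chord-symbol adaptation reliably improves genre-local harmonic prediction, but chord symbols alone do not carry full genre identity. The report therefore avoids claims about perceived genre authenticity or overall musical quality, which would require controlled listener or musician evaluation.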

Abstract

This revision updates an 11-genre chord-symbol adaptation report. The main 165-cell result is unchanged: all methods improve over the frozen pure-pop base, with no decisive method winner. v3 adds the ft-pop80-v2 multi-seed base-restoration note and corrects a few summary statistics for exact CSV faithfulness without changing conclusions.
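The 165-cell grid is 5 methods × 11 genres × 3 seeds. The full abstract reports Wilcoxon tests with Holm and Benjamini-Hochberg correction for pairwise method comparisons. The following is a minimal sketch of both adjustments applied to a list of raw p-values. The p-values in the usage note are invented for illustration and do not come from the report.

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)   # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k                            # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With raw p-values [0.01, 0.04, 0.03, 0.20], Holm gives about [0.04, 0.09, 0.09, 0.20] and BH gives about [0.04, 0.053, 0.053, 0.20]. Holm is the stricter of the two, so if neither yields a significant pairwise gap, a "no decisive winner" conclusion is robust to the choice of correction.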

2606.08898 2026-06-16 eess.AS cs.AI cs.LG Updated version

Few-shot Class-variable Incremental Audio Classification via Prototype Adaptation and Pseudo Class-variable Training

Yanxiong Li, Guoqing Chen, Qianqian Li, Sen Huang

Affiliations: School of Electronic and Information Engineering, South China University of Technology

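AI summary: Addresses few-shot class-variable incremental audio classification, where in practice the number of classes can either increase or decrease, with a method that combines a prototype adaptation network and a pseudo class-variable training strategy; it exceeds existing methods in average accuracy on three public datasets.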

Comments This paper has been accepted for publication in Interspeech 2026. 4 Tables and 4 Figures

Abstract

In the task of few-shot class-incremental audio classification, the number of classes is assumed to always increase, without considering the possibility of a decrease. In practice, however, the number of classes may either increase or decrease. In this paper, we investigate the problem of Few-shot Class-variable Incremental Audio Classification (FCIAC), in which the number of classes can increase or decrease. We propose an FCIAC method using prototype adaptation and pseudo class-variable training. The model in our method consists of an encoder and a classifier. The classifier is initialized by a class-variable prototype adaptation network, whose structure changes dynamically as the classes change. In addition, we design a pseudo class-variable training strategy to enhance the model's adaptability to changing classes. Experiments on three public datasets show that our method exceeds previous methods in average accuracy. The code is at: https://github.com/cgq2971-afk/FCIAC.
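A prototype-based classifier makes a variable class set natural. Each class is represented by the mean embedding of its few support examples, so adding or removing a class amounts to adding or deleting one prototype. The following is a minimal NumPy sketch. It assumes a fixed encoder output and nearest-prototype classification by cosine similarity. The paper's class-variable prototype adaptation network additionally adapts the prototypes, which this sketch does not model.

```python
import numpy as np

class PrototypeClassifier:
    """Nearest-prototype classifier whose class set can grow or shrink."""

    def __init__(self):
        self.prototypes = {}   # class label -> unit-norm prototype vector

    def add_class(self, label, support_embeddings):
        # Prototype = mean of the few-shot support embeddings, then L2-normalized.
        proto = np.mean(np.asarray(support_embeddings, dtype=float), axis=0)
        self.prototypes[label] = proto / np.linalg.norm(proto)

    def remove_class(self, label):
        self.prototypes.pop(label, None)

    def predict(self, embedding):
        x = np.asarray(embedding, dtype=float)
        x = x / np.linalg.norm(x)
        labels = list(self.prototypes)
        scores = [float(self.prototypes[c] @ x) for c in labels]
        return labels[int(np.argmax(scores))]
```

Usage: call `add_class("siren", support)` when a new class arrives and `remove_class("dog")` when one is retired. No output layer needs to be resized, which is one reason prototype classifiers suit class-variable settings.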

2606.13578 2026-06-16 cs.CL cs.AI cs.LG cs.MM cs.RO Updated version

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

Baochang Ren, Xinjie Liu, Xi Chen, Yanshuo Liu, Chenxi Li, Daqi Gao, Zeqin Su, Jintao Xing, Zirui Xue, Rui Li, Xiangyu Zhao, Shuofei Qiao, Minting Pan, Wangmeng Zuo, Lei Bai, Dongzhan Zhou, Ningyu Zhang, Huajun Chen

Affiliations: Zhejiang University; Shanghai AI Laboratory; Harbin Institute of Technology

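AI summary: Addresses the data and embodiment bottlenecks facing robotic protocol execution in scientific laboratories by proposing RoboGenesis, a simulation-based data engine, and LabVLA, a policy trained with a two-stage recipe that achieves the highest average success rate on the LabUtopia benchmark.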

Comments Work in progress. Project website at https://zjunlp.github.io/LabVLA/

Abstract

Scientific laboratories increasingly rely on AI systems to reason about experiments, but the physical act of doing science remains largely outside their reach. AI can help read literature, generate hypotheses, and plan protocols, yet the execution of those protocols at the bench still requires a human operator. Vision-Language-Action (VLA) models provide one possible interface between written protocols and robot execution, but existing policies are trained mostly on household and tabletop demonstrations and rarely encounter the instruments, transparent liquids, or fixed protocol workflows found in scientific laboratories. Closing this gap requires both laboratory-specific supervision and a unified learning framework that can accommodate the diverse robot embodiments used to execute experimental protocols. We therefore identify data and embodiment as central bottlenecks alongside model design. To address the data side, we build RoboGenesis, a simulation-based workflow and data engine that composes configured laboratory workflows from atomic skills, validates and filters rollouts, and exports structured demonstrations across supported robot profiles. On the policy side, we present LabVLA, trained with a two-stage recipe: FAST action-token pretraining first makes the Qwen3-VL-4B-Instruct backbone action-aware before any continuous control is learned, and flow-matching post-training then attaches a DiT action expert under knowledge insulation. On the LabUtopia benchmark, LabVLA achieves the highest average success rate among all evaluated baselines under both in-distribution and out-of-distribution settings.
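Flow-matching post-training teaches the action expert to predict a velocity field that transports Gaussian noise to action chunks. The following is a minimal NumPy sketch of the common linear-path (rectified-flow style) training target and Euler sampling. `velocity_fn` is a hypothetical stand-in for the DiT action expert. LabVLA's exact probability path, time sampling, and conditioning on backbone features are not specified in the abstract.

```python
import numpy as np

def flow_matching_loss(velocity_fn, actions, rng):
    """MSE between predicted and target velocity on a straight noise-to-data path.

    actions: array (batch, horizon, action_dim) of ground-truth action chunks.
    """
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform(size=(actions.shape[0], 1, 1))
    x_t = (1.0 - t) * noise + t * actions      # point on the straight path at time t
    target = actions - noise                   # constant velocity along that path
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target) ** 2))

def sample_actions(velocity_fn, shape, rng, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (actions) with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((shape[0], 1, 1), k * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

As a sanity check, if every target chunk equals a fixed `a`, the ideal field is `(a - x) / (1 - t)`. That field gives zero loss, and Euler sampling lands exactly on `a`.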

13. Other / General Machine Learning: 55 papers

2606.15064 2026-06-16 cs.LG cs.RO New submission

Phase-Localized Curation Does Not Help: A Negative Result on Per-Phase Metric Selection for Demonstration Filtering

Aarav Bedi

Affiliations: Department of Mechanical Engineering, University of California, Berkeley

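AI summary: Experiments on LIBERO tasks show that applying curation metrics locally per phase is worse than using a global or uniformly applied metric, because the defect signal is diluted and per-phase metric choices do not transfer across tasks.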

Comments 5 pages, 3 tables. Code: https://github.com/aaravbedi/phase-gated-curation

Abstract

Manipulation demonstrations have temporal phase structure, and a natural hypothesis is that demonstration-curation metrics should be applied within phases rather than globally. The idea is to segment each trajectory into phases, score each phase with the metric that is locally most informative, and then aggregate. This follows directly from prior work showing that a single global metric can be the best detector of a defect and yet the worst curator of the resulting policy. We test the per-phase hypothesis on three contact-rich LIBERO pick-and-place tasks with a controlled early-release structural defect, comparing phase-gated curation against the same metrics applied uniformly and against a strong single global metric. Across all three tasks and five random seeds per condition, phase-gated curation is never the best curation strategy, and it is the worst of the three on two of the three tasks (Task 1: 86.0 vs. 92.0 for global; Task 3: 22.7 vs. 48.0 for uniform). We trace the failure to a concrete mechanism. When the defect signal is concentrated in a single phase, rank-aggregating across phases dilutes that signal with uninformative scores from defect-free phases, selecting a worse demonstration subset than simply applying the defect-informative metric everywhere. We further show that the per-phase metric selection does not transfer across tasks, since no phase shares a winning metric between any two tasks, so the selection cannot be reused and must be re-derived per task from a noisy sweep. These results bound a plausible and previously untested method, and they argue that practitioners should prefer identifying a single defect-informative metric over decomposing curation by phase. We release the full pipeline, all metric implementations, and per-seed results.
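The dilution mechanism can be reproduced with a toy simulation. Suppose a defect is visible only in one phase's metric, and the other phases contribute uninformative (random) scores. Averaging per-phase ranks then mixes the informative ranks with noise. Keeping the top-ranked demonstrations therefore admits more defective ones than ranking by the defect-informative metric alone. The numbers are synthetic and illustrate the mechanism only; they are not the paper's results.

```python
import numpy as np

def rank_scores(x):
    """Rank 0 = best (highest score)."""
    order = np.argsort(-x)
    ranks = np.empty(len(x), dtype=int)
    ranks[order] = np.arange(len(x))
    return ranks

def simulate(n_demos=200, n_defective=60, n_phases=3, keep=100, seed=0):
    """Return (#defective kept by global metric, #defective kept by phase-averaged ranks)."""
    rng = np.random.default_rng(seed)
    defective = np.zeros(n_demos, dtype=bool)
    defective[:n_defective] = True
    # Phase 0 carries the defect signal; the remaining phases are pure noise.
    informative = rng.normal(size=n_demos) - 3.0 * defective
    phases = [informative] + [rng.normal(size=n_demos) for _ in range(n_phases - 1)]
    global_keep = rank_scores(informative) < keep
    mean_rank = np.mean([rank_scores(p) for p in phases], axis=0)
    phase_keep = rank_scores(-mean_rank) < keep   # lower mean rank = better
    return int(defective[global_keep].sum()), int(defective[phase_keep].sum())
```

With these settings the global metric typically admits almost no defective demonstrations. The phase-averaged ranking admits noticeably more, because the two noise phases dominate the aggregated rank.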

2606.15280 2026-06-16 cs.LG New submission

Rethinking Structural Anomaly Detection: From Decision Boundaries to Projection Operators

Alexander Bauer

Affiliations: Machine Learning Group, TU Berlin; BIFOLD, Berlin, Germany

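AI summary: Addresses the limitations of existing anomaly detection methods on manifold-supported data by proposing a geometric view based on projection operators, which defines anomalies as projection residuals, unifies reconstruction-based methods, and improves performance.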

Abstract

Most existing anomaly detection methods rely on estimating a probability density or learning an enclosing decision boundary, implicitly assuming that normal data occupies a region of non-zero volume in the ambient space. In contrast, structural anomaly detection considers data that lies near a low-dimensional manifold, creating a mismatch between the inductive bias of existing methods and the structure of the data, often resulting in degraded performance. To address this mismatch, we introduce a geometric perspective. Specifically, we learn a projection operator onto the manifold of normal samples and define a sample as anomalous if it is altered by this projection. This formulation naturally integrates the inductive bias of manifold-supported data and reframes anomaly detection in terms of a projection residual, thereby resolving issues arising from modeling degenerate distributions. Notably, it provides a unifying interpretation of reconstruction-based methods by explaining their success and failure in terms of projection quality. In particular, it explains the strong generalization ability of projection-aligned models as a consequence of contraction behavior toward the manifold. Moreover, by decoupling anomaly detection from probabilistic modeling, it reduces the tendency to misclassify rare but normal samples, a widely recognized limitation of existing approaches. Empirically, we demonstrate that projection-aligned methods achieve strong performance, outperforming boundary-based methods while improving upon existing reconstruction-based approaches.
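The projection-residual view can be illustrated with the simplest possible manifold, an affine subspace. The sketch fits an orthogonal projector $P$ onto the top principal directions of normal data and scores a sample by how much the projection changes it, $s(x) = \lVert x - P(x) \rVert$. This minimal NumPy sketch relies on that linear assumption. The paper learns a generally nonlinear projection operator, for which a PCA projector is only a stand-in.

```python
import numpy as np

def fit_linear_projector(X_normal, rank):
    """Return (mean, basis) of a rank-`rank` affine subspace fitted by PCA."""
    mean = X_normal.mean(axis=0)
    _, _, vt = np.linalg.svd(X_normal - mean, full_matrices=False)
    return mean, vt[:rank].T            # columns form an orthonormal basis

def project(x, mean, basis):
    # Orthogonal projection onto the affine subspace mean + span(basis).
    return mean + (x - mean) @ basis @ basis.T

def anomaly_score(x, mean, basis):
    """Projection residual: zero on the subspace, growing with distance from it."""
    return np.linalg.norm(x - project(x, mean, basis), axis=-1)
```

A projector is idempotent ($P(P(x)) = P(x)$). This is the "contraction toward the manifold" property in its simplest form: on-manifold points are left unchanged, and off-manifold points are moved onto it.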

2606.15369 2026-06-16 cs.LG New submission

Repeated Bilateral Trade: The Quest for Fairness

François Bachoc, Roberto Colomboni, Emilie Kaufmann

Affiliations: University of Lille; Institut Universitaire de France (IUF); School of Mathematics, University of Bristol; Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189-CRIStAL

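AI summary: Studies fairness in repeated bilateral trade, proposes the Rawls-to-Nash family of fair-gain objectives, and characterizes its optimal learning rates.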

Abstract

We study repeated bilateral trade from a fairness perspective. At each round, a fresh seller-buyer pair arrives, and the platform posts a price before observing the traders' valuations. Trade occurs only if both agents accept the price. Rather than maximizing only the gain from trade, we consider platforms that seek balanced divisions of the generated surplus. We show that natural fairness desiderata lead to a one-parameter Rawls-to-Nash family of fair-gain objectives, obtained by aggregating the seller's and buyer's net gains through nonpositive Hölder means. Unlike the standard gain-from-trade objective and the Rawlsian fair-gain objective studied in prior work, our proposed objectives induce a new statistical structure in which expected rewards are recovered from threshold feedback through a two-dimensional singular-kernel integral identity. This leads to a nonstandard pure-exploration problem whose natural estimators are rectangular double sums with row-column dependence and singular weights. Assuming independent i.i.d. seller and buyer valuation sequences with arbitrary unknown marginals, we characterize the optimal learning rates for the whole Rawls-to-Nash family of fair-gain objectives, giving matching fixed-confidence sample-complexity and regret bounds up to polylogarithmic factors.
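The Rawls-to-Nash family aggregates the seller's gain $p - s$ and the buyer's gain $b - p$ through a power (Hölder) mean with exponent $q \le 0$. Here $s$ and $b$ are the valuations and $p$ is the posted price, and the gains count only when both agents accept. The limit $q \to -\infty$ gives the minimum (Rawlsian objective), and $q = 0$ gives the geometric mean (Nash). The following is a minimal sketch of that per-round reward. It assumes equal weights on the two agents and zero reward when trade does not occur; the paper's exact normalization may differ.

```python
import math

def holder_mean(a, b, q):
    """Equal-weight power mean of two nonnegative numbers, for q <= 0."""
    if q > 0:
        raise ValueError("the Rawls-to-Nash family uses q <= 0")
    if a <= 0.0 or b <= 0.0:
        return 0.0                       # limit value of the mean for q <= 0
    if q == 0:
        return math.sqrt(a * b)          # Nash: geometric mean
    if math.isinf(q):
        return min(a, b)                 # Rawls: minimum (q = -inf)
    return ((a ** q + b ** q) / 2.0) ** (1.0 / q)

def fair_gain(price, seller_value, buyer_value, q):
    """Per-round fair-gain reward; zero unless both agents accept the price."""
    seller_gain = price - seller_value
    buyer_gain = buyer_value - price
    if seller_gain < 0 or buyer_gain < 0:
        return 0.0                       # someone rejects: no trade
    return holder_mean(seller_gain, buyer_gain, q)
```

For fixed gains the mean increases with $q$. For example, with gains 0.3 and 0.4, the minimum is 0.3, the harmonic mean ($q = -1$) is about 0.343, and the geometric mean is about 0.346. Every member of the family rewards balanced splits, and the family interpolates between the most and least egalitarian choices.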

2606.15420 2026-06-16 cs.LG cs.AI New submission

Constitutional Value Potentials: reading and steering internal priority margins in language models

Tong Che, Rui Wu

Affiliations: NVIDIA Research; Rutgers University

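AI summary: Proposes Constitutional Value Potentials (CVP), which learn scalar potentials from hidden states to read out a model's internal value-priority margins in order to predict and intervene on value conflicts, with AUROC up to 0.95.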

Abstract

A constitution tells a language model what to value, but little tells us whether it does. Adherence is judged from outputs, and output evidence is most fragile on value conflicts, where what matters is not which value a model mentions but which one it is willing to sacrifice. We provide evidence that this arbitration can be read from activations in a structured margin readout. We introduce Constitutional Value Potentials (CVP). For each value we learn a scalar potential from the hidden state: an internal pressure to preserve that value, supervised not by the prompt but by an independent judge's verdict on which value the model's own response actually preserved. The signed difference of two potentials is a priority margin. A constitutional clause becomes the claim that a margin stays positive, and a single monitor score flags when it does not. The monitor predicts conflict violations with AUROC up to 0.95, beats a strong hidden-state probe, and generalizes to held-out synthetic conflicts across three Qwen2.5 scales. The signal appears as the answer begins, from the prompt tail and first response token. Read this early, the same signal reveals whether an adversarial priority hack has actually pushed the model toward a violation, rather than only whether the prompt looks adversarial. The same directions also support intervention tests: under selected steering settings, moving along a value direction shifts judged trade-offs in the intended direction. Together, these results suggest that some constitution-relevant priorities are accessible as activation-space margins, rather than only as output behavior.
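The readout structure can be sketched with linear potentials. Each value $v$ gets a scalar potential $V_v(h) = w_v^\top h + b_v$ on a hidden state $h$. The priority margin between values $a$ and $b$ is $V_a(h) - V_b(h)$, and the monitor flags a violation when that margin is not positive. The sketch assumes linear heads whose weights are already fitted. It scores the monitor with a rank-based AUROC. The probe form and training used in the paper may differ.

```python
import numpy as np

def priority_margin(h, w_a, b_a, w_b, b_b):
    """Signed difference of two scalar value potentials on hidden states h."""
    return (h @ w_a + b_a) - (h @ w_b + b_b)

def monitor_flags(margins):
    # The clause "value a outranks value b" is flagged when the margin is not positive.
    return margins <= 0

def auroc(scores, labels):
    """Probability a random positive outscores a random negative (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

To evaluate violation prediction, `-margin` serves as the violation score and the judge's verdicts serve as labels. Under this convention, an AUROC of 0.95 would mean a random violating response has a lower margin than a random compliant one 95% of the time.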

2606.16075 2026-06-16 cs.LG cs.CV New submission

AME: A Multi-Type Contributor Attribution Framework in Generative AI Markets

Yang Shi, Songwen Pei, Yang Gao, Bingxue Zhang

Affiliations: University of Shanghai for Science and Technology; Fudan University

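AI summary: Addresses value allocation in multi-stage generative AI collaboration by proposing the AME framework, which integrates heterogeneous data-contribution valuation, data-rights mapping, and trustworthy execution, achieving low-cost value allocation consistent with human judgments.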

Abstract

Generative AI enables value creation through multi-stage collaboration among heterogeneous contributors, including training data, base models, fine-tuning behaviors, and prompts. However, how to fairly allocate the data value remains largely unexplored. This paper formulates multi-stage generative AI value allocation as a new research problem and identifies three core challenges: heterogeneous data contribution valuation, data rights mapping, and trustworthy execution. We propose the AME (Attribution-Mapping-Execution) framework, which integrates data contribution valuation, data rights mapping, and trustworthy execution into a single workflow. Experimental results demonstrate that the AME framework achieves data value allocation outcomes more consistent with human reference judgments while maintaining low-cost trustworthy execution. Our work provides an initial foundation for value assessment and revenue allocation in generative AI data markets.

2606.16461 2026-06-16 cs.LG New submission

Privacy from Symmetry: Orthogonally Equivariant Transformers for LLM Inference

Alexander Yukhimchuk, Andrey Shulga, Mladen Kolar, Martin Takáč

Affiliations: MBZUAI; University of Southern California

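AI summary: Addresses the risk that hidden representations in split inference can be recovered by nearest-neighbor search by proposing orthogonal obfuscation and the ConjFormer architecture, which is O(d)-equivariant; without noise injection or heavy cryptography, it reduces token recovery from 35% to 1.3% while increasing perplexity by only 0.4%.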

Abstract

Running large language models locally is often impractical, pushing inference on sensitive text to third-party providers. Split inference partially mitigates this by keeping tokens on the client and sending only hidden representations, but these representations can still be recovered via nearest-neighbor search against the public embedding table. We propose an orthogonal obfuscation procedure in which the client multiplies embeddings by a secret orthogonal matrix before transmission. To enable correct inference under arbitrary rotations, we introduce ConjFormer, a transformer variant that is exactly $\mathrm{O}(d)$-equivariant via a lightweight normalization change (scalar RMSNorm) together with blockwise orthogonal conjugation of all linear weights. As a result, the server performs the full forward pass entirely in the rotated basis and never observes unrotated hidden states. Experiments on GPT-2 and Llama 3.2 1B models fine-tuned on PubMed show that orthogonal obfuscation eliminates direct cosine nearest-neighbor inversion and reduces token recovery from over 35% top-10 to at most 1.3%, while increasing perplexity by only 0.4% after fine-tuning. These results indicate that enforcing symmetry at the architectural level can provide a practical defense for privacy-preserving LLM inference without noise injection or heavy cryptographic machinery.
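The two ingredients named in the abstract can be checked numerically. With a secret orthogonal $Q$ and hidden state $x$, the client sends $Qx$. A server-side linear layer with conjugated weights $W' = QWQ^\top$ returns $W'Qx = Q(Wx)$. An RMSNorm with a single scalar gain commutes with $Q$ because $\lVert Qx \rVert = \lVert x \rVert$. A per-channel gain vector breaks this property, which is why the normalization must change. The sketch verifies these identities only for one square linear layer and the norm. It is not ConjFormer's full block structure (attention, MLP, or its blockwise conjugation scheme).

```python
import numpy as np

def random_orthogonal(d, rng):
    # QR of a Gaussian matrix; the sign fix makes the result Haar-distributed.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def scalar_rmsnorm(x, g=1.0, eps=1e-6):
    """RMSNorm with one scalar gain: depends on x only through its norm."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return g * x / rms

def channel_rmsnorm(x, gamma, eps=1e-6):
    """Standard RMSNorm with a per-channel gain vector (not O(d)-equivariant)."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return gamma * x / rms

rng = np.random.default_rng(0)
d = 8
Q = random_orthogonal(d, rng)
W = rng.standard_normal((d, d))
x = rng.standard_normal(d)
# Conjugated layer applied to the rotated input equals the rotated original output.
print(np.allclose((Q @ W @ Q.T) @ (Q @ x), Q @ (W @ x)))
# Scalar RMSNorm commutes with the rotation.
print(np.allclose(scalar_rmsnorm(Q @ x), Q @ scalar_rmsnorm(x)))
```

Because every operation the server applies commutes with $Q$, the server can run entirely in the rotated basis. The rotation is also what defeats cosine nearest-neighbor lookup against the public embedding table, since that table is not rotated.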

2606.16920 2026-06-16 cs.LG cs.AI New submission

Demystifying Variance in Circuit Discovery of LLMs

Frank Zhengqing Wu, Francesco Tonin, Volkan Cevher

Affiliations: Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

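AI summary: Studies resampling, rephrasing, and sample-wise variance in LLM circuit discovery; proposes CEAP to reduce resampling variance, attributes rephrasing variance to different templates activating different circuits, and finds that sample-wise variance stems mainly from how unfaithfulness is defined.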

Abstract

Circuit discovery is a key technique in mechanistic interpretability to pinpoint the model components that are crucial for performing a given task. Although the current state-of-the-art method (EAP-IG) performs well on the metric of (un)faithfulness, it suffers from substantial variability. This includes resampling variance, where the circuit changes when we probe with a new batch of data from the same distribution; rephrasing variance, where the discovered circuit shifts when the prompts are rephrased; and sample-wise variance, where a circuit with low population unfaithfulness exhibits large fluctuations in unfaithfulness across individual samples. This paper studies the roots of these variances. We demonstrate that CEAP, our new circuit discovery method that improves upon EAP-IG with a theoretical guarantee, can substantially lessen resampling variance. We further show that rephrasing variance arises because prompts with different templates tend to activate different circuits in the model. This leads us to argue that it may be challenging to find a comprehensive circuit that explains and controls the model's behavior on a task, which can be expressed in countless templates, suggesting that LLMs may be inherently hard to steer. We show that sparsity, which has been claimed to form more compact and interpretable task circuits, fails to solve this problem. Regarding sample-wise variance, we argue that it is largely benign: extremely poor unfaithfulness scores often stem from how unfaithfulness is defined, rather than from defects in the measured circuits. We show that the magnitude of unfaithfulness is affected by selective contribution scaling, a neural mechanism that accounts for the extremely poor scores sometimes observed.
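One simple way to quantify resampling variance is to discover a circuit on two batches from the same distribution and measure how much the selected edge sets overlap. This is a minimal sketch under two assumptions. Each method is assumed to return a per-edge attribution score, and the circuit is taken to be the top-k edges by absolute score. The Jaccard overlap and the edge names are illustrative choices, not necessarily what the paper uses.

```python
def top_k_circuit(edge_scores, k):
    """Edges with the k largest absolute attribution scores."""
    ranked = sorted(edge_scores, key=lambda e: abs(edge_scores[e]), reverse=True)
    return set(ranked[:k])

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; 1.0 means identical circuits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Usage: compute edge scores on batch 1 and batch 2 and take `top_k_circuit` of each. A Jaccard overlap well below 1 indicates high resampling variance at that circuit size. Repeating this with rephrased prompt templates instead of fresh batches gives an analogous measure of rephrasing variance.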

2606.14977 2026-06-16 econ.EM cs.LG Cross-list

Identification and Inference for Algorithmic Frontiers with Selective Labels

Yiqi Liu, Francesca Molinari, Amilcar Velez

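Affiliations: Department of Economics, Cornell University

AI summary: For settings where outcomes are observed only for some individuals, develops identification results and statistical inference tools for the fairness-accuracy frontier. These include a sharp identification region under unrestricted selection and, under unconfoundedness, point identification with a debiased machine learning estimator.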

Comments 68 pages, 2 figures

Abstract

This paper provides identification results to characterize a fairness-accuracy (FA) frontier, and statistical inference tools to test hypotheses and build a confidence set for the FA-frontier, when outcomes are observed only for selected individuals. When the selection process is unrestricted but loss is measured in specific ways, we provide a characterization of the sharp identification region of the FA-frontier. Under an assumption of unconfoundedness conditional on observables (and unrestricted loss functions), we obtain point identification and propose a debiased machine learning estimator, derive its asymptotic distribution, and show how this can be used to carry out inference for the FA-frontier. In work in progress, we extend the partial identification results to a broader class of loss functions.
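
The debiased estimator mentioned above builds on the doubly robust (AIPW) construction for outcomes that are missing at random given observables. The sketch below implements that textbook estimator for a single mean loss. It is not the paper's FA-frontier estimator. The nuisance predictions `m_hat` (outcome regression) and `p_hat` (selection propensity) are assumed to come from separately fitted, ideally cross-fitted, models. All names are hypothetical.

```python
import numpy as np

def aipw_mean(observed, loss, m_hat, p_hat):
    """Doubly robust (AIPW) estimate of E[loss] when loss is seen only for selected units.

    observed: 0/1 selection indicators D_i
    loss:     losses; only entries with observed == 1 are used (others may be NaN)
    m_hat:    predicted E[loss | X_i] from an outcome model
    p_hat:    predicted P(D_i = 1 | X_i) from a selection model, bounded away from 0
    """
    d = np.asarray(observed, dtype=float)
    y = np.where(d == 1, np.asarray(loss, dtype=float), 0.0)  # drop unobserved losses
    m = np.asarray(m_hat, dtype=float)
    p = np.asarray(p_hat, dtype=float)
    pseudo = m + d / p * (y - m)  # per-unit pseudo-outcomes
    estimate = pseudo.mean()
    std_error = pseudo.std(ddof=1) / np.sqrt(len(pseudo))
    return estimate, std_error

# Hypothetical data: half the units are selected, so their loss is observed.
observed = [1, 0, 1, 0]
loss = [1.0, float("nan"), 3.0, float("nan")]
m_hat = [1.0, 2.0, 3.0, 4.0]
p_hat = [0.5, 0.5, 0.5, 0.5]
est, se = aipw_mean(observed, loss, m_hat, p_hat)
print(est, se)
```

The estimate is consistent for the mean loss if either `m_hat` or `p_hat` is correctly specified (double robustness). Its standard error can be read off the sample variance of the per-unit pseudo-outcomes.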

2606.15390 2026-06-16 cs.CL cs.AI cs.LG Cross-list

Not All Skills Help: Measuring and Repairing Agent Knowledge

Yixuan Wang, Yiyang Zhou, Yiming Liang, Congyu Zhang, Fuxiao Liu, Jiawei Zhou, Huaxiu Yao

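Affiliations: UNC Chapel Hill; Purdue; NVIDIA

AI summary: Proposes ASSAY, a framework that measures each skill's causal contribution via randomized masking, separates skill generation from curation, and suppresses harmful skills at inference time. This substantially improves the task completion rates of LLM agents.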

Comments 18 pages, 5 figures

Abstract

LLM agents can improve without weight updates by accumulating natural-language skills from experience, but current systems entrust every decision about which skills to keep and how to apply them to LLM judgment alone. We argue that this conflates two distinct roles: generating a skill from experience is a creative act that judgment handles well, while deciding whether that skill actually helps requires empirical evidence across many tasks. Measuring per-skill causal contributions via randomized masking, we find that skill libraries exhibit pervasive causal heterogeneity: individual skills routinely help on some task types while hurting on others, yet their opposing effects cancel in aggregate, making them invisible to global curation methods. We propose ASSAY, a framework that separates generation from curation: it computes a per-skill causal attribution on a small development set, restructures the library offline, and suppresses skills with negative predicted effect for each test task. Across seven base models spanning four providers and two benchmarks (AppWorld and tau-bench), ASSAY consistently improves over prior skill-curation approaches. On AppWorld's hardest split, DeepSeek-V3 achieves 69.3% task-goal completion (47.4% relative improvement), a new state of the art among all published methods including weight-tuned approaches. On tau-bench retail, GPT-4.1 improves by 8.7% relative, advancing past o4-mini, o1, and GPT-4.5 on the public leaderboard without any weight modification. Ablation traces the dominant gain to per-task masking, confirming that the bottleneck is matching skills to tasks at inference time, not removing bad skills globally. Code is available at https://github.com/aiming-lab/assay.
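
The randomized-masking measurement described above can be sketched as a simple randomized experiment. In each episode, every skill is independently included with probability 1/2. A skill's causal contribution within a task type is then estimated as the difference in mean success between episodes that included it and episodes that masked it. This is a minimal sketch under that assumption. The episode record format, the inclusion probability, the skill names, and the function names are all hypothetical, not ASSAY's actual interface.

```python
import random
from collections import defaultdict

def sample_mask(skills, rng, p_include=0.5):
    """Include each skill independently with probability p_include (the randomized treatment)."""
    return {s for s in skills if rng.random() < p_include}

def per_skill_effects(episodes):
    """Difference-in-means effect of each skill on success, per task type.

    episodes: iterable of (task_type, included_skills, all_skills, success) with success in {0, 1}.
    Returns {(task_type, skill): effect}; pairs seen in only one arm are skipped.
    """
    acc = defaultdict(lambda: [0, 0, 0, 0])  # successes_in, n_in, successes_out, n_out
    for task_type, included, all_skills, success in episodes:
        for s in all_skills:
            a = acc[(task_type, s)]
            if s in included:
                a[0] += success
                a[1] += 1
            else:
                a[2] += success
                a[3] += 1
    return {key: si / ni - so / no
            for key, (si, ni, so, no) in acc.items() if ni and no}

def skills_to_suppress(effects, task_type):
    """Skills with a negative estimated effect on this task type."""
    return {s for (t, s), e in effects.items() if t == task_type and e < 0}

# Hypothetical logged episodes; the skill masks were drawn at random before each run.
skills = ["retry_on_error", "verify_output"]
episodes = [
    ("api", {"retry_on_error"}, skills, 1),
    ("api", set(), skills, 0),
    ("api", {"verify_output"}, skills, 0),
    ("api", {"retry_on_error", "verify_output"}, skills, 1),
    ("ui", {"retry_on_error"}, skills, 0),
    ("ui", set(), skills, 1),
]
effects = per_skill_effects(episodes)
print(skills_to_suppress(effects, "ui"))  # -> {'retry_on_error'}
```

In this toy log, the same skill helps on one task type and hurts on the other. This mirrors the heterogeneity the abstract describes: a single pooled effect would hide it.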

2606.15482 2026-06-16 stat.ML cs.LG Cross-list

Ricci-Filtration: Boosting Retrieval-Augmented Generation Reranker to Query-Answer Tasks by Discrete Ricci Flow

Tian Qin, Wei-Min Huang

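AI summary: Proposes Ricci-Filtration, a geometry-based reranker enhancement built on discrete curvature and Ricci flow. It models the query and retrieved chunks as a network and uses curvature to filter out noisy chunks, substantially improving RAG generation.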

Abstract

Ricci flow is a curvature-guided diffusion process that deforms space by shrinking regions of high positive curvature and expanding those with negative curvature. Similarly, discrete Ricci flow on weighted graphs modifies edge weights by shrinking edges with positive Ricci curvature and stretching those with negative Ricci curvature, effectively increasing the separation between clusters. Inspired by these two cornerstone works, we propose a geometry-based RAG reranker enhancement procedure called Ricci-Filtration. By modeling the input query and initial retrieved chunks as a network, where the input query and chunks serve as nodes and embedding-based pairwise relations define an initial graph, Ricci-Filtration leverages discrete curvature and Ricci flow to evaluate the structural importance of each chunk with respect to the user query. The system first filters the initial chunks based on their geometric curvature relative to the query; then, a reranker processes the remaining chunks to enhance generative performance. We theoretically prove that normalized discrete Ricci flow can detect community structures by identifying distinct asymptotic behaviors in edge weights. This supports the removal of "noisy" document chunks characterized by large weights and negative Ricci curvature relative to the query node. Extensive experiments confirm that Ricci-Filtration outperforms several baseline reranking methods in accuracy, precision, recall, and F1 scores. Furthermore, ablation studies demonstrate that Ricci-Filtration generally outperforms the baseline under various settings, highlighting the framework's robustness across different architectures.
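
The curvature-then-flow procedure described above can be sketched on a small graph in which node 0 is the query and the other nodes are retrieved chunks. The sketch makes three choices for illustration:

- It uses augmented Forman-Ricci curvature, F(u, v) = 4 - deg(u) - deg(v) + 3·t(u, v), where t(u, v) counts the triangles on the edge.
- It applies one normalized flow step, w ← w·(1 - ε·F), rescaled so the total edge weight stays fixed.
- It drops chunks whose query edge is negatively curved and heavier than average.

Forman curvature is a cheap stand-in. The abstract does not say which discrete curvature Ricci-Filtration uses; Ollivier-Ricci curvature is a common alternative. The similarity threshold, the step size ε, and the filtering rule are all hypothetical.

```python
import numpy as np

def similarity_graph(emb, threshold):
    """Adjacency and distance-like edge weights from cosine similarity (row 0 = query)."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = x @ x.T
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)
    weights = np.where(adj > 0, 1.0 - sim, 0.0)  # less similar -> longer edge
    return adj, weights

def forman_curvature(adj):
    """Augmented Forman-Ricci curvature 4 - deg(u) - deg(v) + 3 * #triangles(u, v)."""
    deg = adj.sum(axis=1)
    triangles = adj @ adj  # entry (u, v) = number of common neighbours
    curv = 4.0 - deg[:, None] - deg[None, :] + 3.0 * triangles
    return np.where(adj > 0, curv, 0.0)

def ricci_flow_step(weights, curvature, eps=0.05):
    """One normalized flow step: shrink positively curved edges, stretch negative ones."""
    factor = np.clip(1.0 - eps * curvature, 0.0, None)  # keep weights non-negative
    new = weights * factor
    s = new.sum()
    return new * (weights.sum() / s) if s > 0 else new  # preserve total edge weight

def filter_chunks(adj, weights, curvature):
    """Drop chunks whose query edge is negatively curved and heavier than the mean query edge."""
    n = adj.shape[0]
    q_edges = [j for j in range(1, n) if adj[0, j] > 0]
    mean_w = np.mean([weights[0, j] for j in q_edges]) if q_edges else 0.0
    return [j for j in range(1, n)
            if not (adj[0, j] > 0 and curvature[0, j] < 0 and weights[0, j] > mean_w)]

# Toy graph: query 0, relevant chunks 1-2 (a triangle with the query),
# chunk 3 bridging to an off-topic cluster {4, 5}.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
curv = forman_curvature(adj)
w = ricci_flow_step(adj.copy(), curv)
print(filter_chunks(adj, w, curv))  # -> [1, 2, 4, 5]
```

Chunk 3 bridges the query to an off-topic cluster, so it gets negative curvature and is stretched by the flow; it is therefore the chunk that is dropped. Chunks 4 and 5 have no edge to the query, so under this simple rule they pass through to the reranker unchanged.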

2606.15485 2026-06-16 cs.CY cs.AI cs.HC cs.LG cs.SE Cross-list

The Perils of Agency: How Developers Perceive, Prioritize, and Address Risks in Agentic AI Products

Hao-Ping Lee, Jessica He, David Piorkowski, Thomas Serban von Davier, Jodi Forlizzi, Sauvik Das

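Affiliations: University of California, Berkeley

AI summary: A study of 35 industry developers finds three things. Their perception of agentic AI risk is closely tied to agentic qualities such as autonomy and tool use; they prioritize product and business risks; and they lack mature controls. Together these reveal a tension between agentic capability and risk control.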

Abstract

Agentic AI systems act autonomously, use tools, adapt to context, and operate in complex real-world environments. However, these same characteristics can create or exacerbate product risks. We studied how industry developers (n=35) perceive, prioritize, and address the risks in their agentic AI products. We found that developers' perceptions of risk were closely tied to the qualities that made the product agentic, such as autonomy, tool use, and usage in a real-world context. Developers prioritized product and business risks before considering downstream societal risks like job displacement and end-user privacy. This prioritization also impacted developers' ability and motivation to mitigate agentic risks. Finally, developers lacked mature controls for containing agentic risks, often relying on constraining the same characteristics that make agents useful: e.g., autonomy and goal complexity. These findings reveal a capability vs. risk control tension in agentic AI development: developers need to address risks that emerge from agentic capabilities, yet they currently have limited support for doing so without constraining agentic functionality.

2606.15521 2026-06-16 cs.CL cs.LG Cross-list

Emergent retokenization symmetry in large language models: phenomenology and applications

Kanishk Jain, Matthew Day, Tankut Can

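Affiliations: Department of Physics, Emory University

AI summary: Finds that a retokenization symmetry partially emerges in large language models during training. Uses retokenization experiments to probe models' sensitivity and robustness to semantically equivalent input representations, and proposes a new inference-time sampling strategy.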

Abstract

Tokenization introduces representational redundancy: under a fixed token vocabulary, every byte string admits many valid token encodings, or segmentations, that decode to the same surface string. However, given a prompt, most language model tokenizers break this representational symmetry by returning a canonical segmentation. Training only on canonical segmentations should influence inference behavior, and there is little reason to expect models to respect segmentation symmetry on downstream tasks. We find that this symmetry partially emerges during training. Here, we probe this emergent symmetry through experiments testing token compositional understanding, representation diversity, and task-focused benchmark performance. We primarily use retokenization: replacing a prompt's canonical tokenization with an alternative segmentation while preserving its bytes exactly. Relative to other prompt perturbations, retokenization is unusually clean because it isolates segmentation effects without changing syntax, semantics or surface form. We use retokenization to study sensitivity and robustness to semantically identical input representations across pretraining and post-training. Moreover, this partial retokenization symmetry suggests a distinct inference-time sampling axis. While temperature sampling generates diverse outputs from the model using its next-token probability distribution, retokenization generates diversity from the model's internal computations through semantically equivalent input representations. We find that while this retokenization sampling strategy can hurt performance on easy problems, it can also recover solutions that conventional sampling does not find. Overall, our work presents retokenization as a simple yet powerful probe of large language models, shedding light on compositional understanding and prompt sensitivity, and offering a novel sampling strategy.
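
Retokenization, as defined above, swaps the canonical segmentation for another one that decodes to the same text. The sketch below uses dynamic programming to enumerate all segmentations of a string over a toy vocabulary and to sample one of them uniformly. It makes two simplifying assumptions. First, it operates on characters rather than bytes. Second, it uses greedy longest-match as a stand-in for the canonical segmentation, whereas real BPE tokenizers derive theirs from learned merge rules.

```python
import random
from functools import lru_cache

def all_segmentations(text, vocab):
    """Every way to split `text` into vocabulary pieces (exponential; toy inputs only)."""
    @lru_cache(maxsize=None)
    def from_position(i):
        if i == len(text):
            return ((),)
        out = []
        for j in range(i + 1, len(text) + 1):
            if text[i:j] in vocab:
                out.extend((text[i:j],) + rest for rest in from_position(j))
        return tuple(out)
    return [list(seg) for seg in from_position(0)]

def count_segmentations(text, vocab):
    """counts[i] = number of ways to segment the suffix text[i:]."""
    n = len(text)
    counts = [0] * n + [1]
    for i in range(n - 1, -1, -1):
        counts[i] = sum(counts[j] for j in range(i + 1, n + 1) if text[i:j] in vocab)
    return counts

def sample_retokenization(text, vocab, rng):
    """Draw one segmentation uniformly at random; it decodes to exactly the same text."""
    counts = count_segmentations(text, vocab)
    if counts[0] == 0:
        raise ValueError("text cannot be segmented with this vocabulary")
    tokens, i = [], 0
    while i < len(text):
        nxt = [j for j in range(i + 1, len(text) + 1) if text[i:j] in vocab and counts[j] > 0]
        j = rng.choices(nxt, weights=[counts[j] for j in nxt])[0]  # weight by completions
        tokens.append(text[i:j])
        i = j
    return tokens

def greedy_longest_match(text, vocab):
    """Stand-in for a 'canonical' segmentation: always take the longest matching piece."""
    tokens, i = [], 0
    while i < len(text):
        j = next((j for j in range(len(text), i, -1) if text[i:j] in vocab), None)
        if j is None:
            raise ValueError(f"no vocabulary piece matches at position {i}")
        tokens.append(text[i:j])
        i = j
    return tokens

vocab = {"a", "b", "ab"}
print(all_segmentations("abab", vocab))
print(greedy_longest_match("abab", vocab), sample_retokenization("abab", vocab, random.Random(0)))
```

Weighting each next piece by the number of ways to finish the string makes every complete segmentation equally likely. A retokenization sampler in this spirit would feed several such segmentations of one prompt to the model instead of varying the temperature.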

2606.15579 2026-06-16 cs.AI cs.LG cs.MA cs.SE Cross-list

Your Agent Has a Genome: Sequence-Level Behavioral Analysis and Runtime Governance of LLM-Powered Autonomous Agents

Sidi Deng

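Affiliations: Independent Researcher

AI summary: Proposes the XEPV sequence-encoding framework, which models LLM agent behavior as genome-like sequences. N-gram mining identifies P-X-P as a high-risk pattern. A three-layer intervention system, Governor, raises task success rate by 6.2% (absolute) while cutting token consumption by 44%.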

Comments 16 pages, 15 figures, 12 tables

Abstract

We propose Base Sequence Analysis, a framework that encodes the runtime behavior of LLM-powered autonomous agents into compact symbolic sequences using a four-letter alphabet: X (Explore), E (Execute), P (Plan), and V (Verify). Drawing an analogy to genomic sequence analysis, we apply n-gram pattern mining, Markov transition matrices, and point-biserial correlation to 347 real-world execution traces collected from a production ReAct agent system over 8 days. Our analysis reveals that (1) the trigram P-X-P is the only statistically significant high-risk pattern, lowering success rate by 10.4%; (2) P-ratio is the strongest negative predictor of success (r=-0.256, p<0.0001); and (3) the E->V transition probability is only 2.1%, indicating a systemic verification deficit. Based on these findings, we design Governor, a three-layer runtime intervention system comprising a rule engine, a statistical accumulator, and a chi-square-based threshold adaptor. In a natural before/after deployment evaluation (N=101 vs. N=246), Governor achieves a +6.2% absolute increase in task success rate while simultaneously reducing average token consumption by 44%. To validate cross-system generality, we apply the XEPV encoding to 2,000 public SWE-agent trajectories on SWE-bench, confirming that exploration spirals and the E->V verification deficit replicate in an independent system. We outline six research directions including base sequence language models, cross-agent behavioral fingerprinting, and reward shaping, and release an open-source toolkit for reproducibility.
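
The transition and n-gram statistics described above can be sketched over XEPV strings as follows. This is a minimal illustration under assumed data formats: each trace is a string over X, E, P, and V, and each outcome is 0 or 1. It is not the paper's released toolkit. The significance testing and the point-biserial correlation reported in the abstract are omitted, and the example traces are hypothetical.

```python
from collections import Counter, defaultdict

ALPHABET = "XEPV"  # Explore, Execute, Plan, Verify

def transition_matrix(traces):
    """Row-normalized first-order Markov transition probabilities between XEPV symbols."""
    counts = defaultdict(Counter)
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[a][b] += 1
    matrix = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        matrix[a] = {b: counts[a][b] / total if total else 0.0 for b in ALPHABET}
    return matrix

def traces_containing(traces, n=3):
    """For each n-gram, the number of traces that contain it at least once."""
    found = Counter()
    for trace in traces:
        found.update({trace[i:i + n] for i in range(len(trace) - n + 1)})
    return found

def p_ratio(trace):
    """Fraction of steps labeled P (Plan) in one trace."""
    return trace.count("P") / len(trace) if trace else 0.0

def success_split(traces, outcomes, pattern):
    """Success rate of traces with vs. without a pattern such as 'PXP'."""
    def rate(xs):
        return sum(xs) / len(xs) if xs else float("nan")
    hit = [o for t, o in zip(traces, outcomes) if pattern in t]
    miss = [o for t, o in zip(traces, outcomes) if pattern not in t]
    return rate(hit), rate(miss)

# Hypothetical encoded traces and task outcomes (1 = success).
traces = ["PXPE", "XEV", "EEV"]
outcomes = [0, 1, 1]
print(transition_matrix(traces)["E"]["V"])
print(traces_containing(traces)["PXP"], success_split(traces, outcomes, "PXP"))
```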

2606.16407 2026-06-16 cs.CL cs.LG Cross-list

A Mechanistic Understanding of Pronoun Fidelity in LLMs

Katharina Trinley, Jesujoba O. Alabi, Dietrich Klakow, Vagrant Gautam

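Affiliations: Saarland University; Heidelberg Institute for Theoretical Studies

AI summary: Causal analysis shows that pronoun fidelity arises from the joint action of three causal subspaces (group entity binding, recency bias, and stereotypical bias), which together explain 91-99.5% of model behavior.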

Abstract

Faithful and robust pronoun use is important for fair and coherent generations, yet large language models largely fail when multiple referents use different pronouns. To study the interplay of reasoning, repetition, and bias in this task, prior work relies exclusively on behavioural approaches, which may not reflect a model's internal workings. Therefore, we provide a mechanistic, model-internal perspective on pronoun fidelity, testing whether three mechanisms -- group entity binding (G), recency bias (R), and stereotypical bias (S) -- are causally implemented across several SOTA language models. Using Boundless Distributed Alignment Search, we find all three coexist as causal subspaces distributed across network depth. No single mechanism fully explains model behaviour, but a combination of the three consistently accounts for 91-99.5%. An attention head analysis further reveals two competing copying routes; group binding and stereotype share a localized concept-level route that retrieves a bound occupation-pronoun unit, while recency uses a distributed token-level route that repeats surface forms. In sum, pronoun fidelity arises from competition between simultaneously active causal subspaces.

2606.16988 2026-06-16 cs.SE cs.LG Cross-list

Agent trajectories as programs: fingerprinting and programming coding-agent behavior

Hamidah Oderinwale

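AI summary: Proposes procedural representations for analyzing agent behavior patterns, enabling trajectory fingerprinting (85.7% accuracy). Also develops ProcGrep, a library for auditing and evaluating how agents approach tasks.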

Abstract

Benchmark scores tell you what an agent got right; they do not tell you how it got there. In this work, we introduce methods for comparing agents procedurally in different contexts, where the model, tasks, and approaches vary. We compare ten agents and find that they are identifiable by their behavioral habits, which we define as fingerprints: a probe over these procedural signatures attributes an unseen trajectory to the correct agent at 85.7% accuracy, controlling for leakage across tasks. We develop procedural representations for agent problem-solving procedures with an emergent vocabulary induction technique that is meant to be maximally compressive to avoid surface-level variation while being expressive enough to unveil the quirks of the models' patterns. We apply our framework to the software engineering evaluation dataset SWE-Bench to study the structural distinctness of agent trajectories and find that behavior is most similar between models from similar release periods and those that are distilled from one another (e.g., a distilled student model and its teacher have a Jensen-Shannon divergence of 0.25, about half the distance between other model pairs). As more models saturate evaluations, we believe that it will be important to probe model behavior along more holistic dimensions than success rates alone. We introduce ProcGrep, a library for auditing and evaluating agents for how they approach tasks at a procedural level given their traces in a top-down fashion. We believe this work has a range of applications to help developers work with and program coding agents, such as task-aware model routing, agent monitoring, and finer-grained cost analysis.
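
The Jensen-Shannon divergence quoted above compares two agents' behavior distributions. The sketch computes it between empirical distributions of action labels. Two assumptions apply. The action vocabulary here is hypothetical, since the paper induces its own. Base-2 logarithms are used, which bound the divergence in [0, 1]; the abstract does not state the base behind the 0.25 figure.

```python
import math
from collections import Counter

def action_distribution(trajectory):
    """Empirical distribution over procedural action labels in one trajectory."""
    counts = Counter(trajectory)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def jensen_shannon(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions given as dicts."""
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}
    def kl_to_m(d):
        # Terms with zero probability contribute nothing to the KL divergence.
        return sum(d[k] * math.log(d[k] / m[k], base) for k in support if d.get(k, 0.0) > 0)
    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

# Hypothetical action sequences from two agents on the same task.
agent_a = ["open", "edit", "test", "edit", "test"]
agent_b = ["open", "search", "search", "edit", "test"]
print(jensen_shannon(action_distribution(agent_a), action_distribution(agent_b)))
```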

2606.16999 2026-06-16 cs.SE cs.CL cs.LG Cross-list

Selection Without Signal, Recovery Through Expression: A Measurement Study of Post-Hoc Falsification Operators for Frozen Small Code Models

Mehmet Iscan

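AI summary: Measures how well post-hoc semantic operators (e.g., selection, verification, and repair) work for frozen small code models and finds that none outperforms Best-of-N. Identifies the mechanistic causes: the coverage wall, the capability scissors, and the consensus trap. Expression-layer recovery (M1), in contrast, improves accuracy through robust extraction and signature alignment.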

Comments 33 pages, 4 figures, 8 tables

Abstract

Frozen small code models (<=1.5B parameters, run locally without fine-tuning) suit offline and privacy-constrained use, but often emit plausible-but-wrong programs. A natural remedy is a post-hoc operator that selects, verifies, repairs, or re-processes the model's samples without retraining; in principled form it is Popperian: attack each candidate with a severe test, keep what survives. We measure whether such operators help. Under one deterministic execution oracle and a leakage-free, matched-compute protocol, 26 semantic post-hoc operators (selection, verification, repair, elimination, portfolios, sound vetoes, generation conditioning) are evaluated against Best-of-N (BoN); on the cells and benchmarks tested, none improves held-out accuracy over BoN. The negative is mechanistic: a coverage wall (systematic hard-task failures deeper sampling does not rescue), a capability scissors (a competent generator leaves almost no discriminable error among visible-test passers), and a near-empty consensus trap (the visible-pass-but-hidden-wrong majority a leakage-free selector needs rarely co-occurs with a correct alternative). A distribution-free do-no-harm bound cannot certify a harm rate <=alpha at zero observed harm unless n>=45. Two operators help on a different axis, outside the semantic output space. An expression-layer recovery (M1), the only accuracy gain here, recovers correct programs the standard extractor discards (robust extraction and public-test signature alignment); it does no harm (b10=0), is leakage-free, and lifts DeepSeek-Coder-1.3B by +12 tasks on HumanEval+ (p=2.4e-4). An adaptive consensus early-stop (ACE) is a calibrated compute-saving control (~19% saving, zero harm). M1 and the selection negative replicate on HumanEval+ and MBPP+ across three model cells. The lesson: fix the harness and measure coverage before blaming semantic post-hoc reasoning.
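
The abstract does not say which bound produces n >= 45. One distribution-free calculation that reproduces the number is the exact binomial argument for zero observed harms. If the true harm rate were alpha, all n trials would be harm-free with probability (1 - alpha)^n. Certifying a harm rate <= alpha at confidence 1 - delta therefore requires (1 - alpha)^n <= delta. With alpha = 0.05 and delta = 0.10, this gives n = 45. Treat this as a plausible reconstruction, not the paper's derivation.

```python
import math

def min_trials_zero_harm(alpha, delta):
    """Smallest n such that n harm-free trials rule out a harm rate above alpha at confidence 1 - delta.

    If the true harm rate were alpha, all n trials would be harm-free with probability
    (1 - alpha) ** n; requiring this to be at most delta gives n >= log(delta) / log(1 - alpha).
    """
    return math.ceil(math.log(delta) / math.log(1.0 - alpha))

print(min_trials_zero_harm(alpha=0.05, delta=0.10))  # -> 45
```

Tightening the confidence to 95% (delta = 0.05) with the same alpha raises the requirement to 59 harm-free trials.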

2606.17022 2026-06-16 math.ST cs.LG stat.ML stat.TH Cross-list

Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis

Gary P. T. Choi, Khanh Dao Duc, Shira Faigenbaum-Golovin, Karen Habermann, Emmanuel Hartman, Christoph von Tycowicz, Chi Zhang, Wenjun Zhao, Felix Zhou

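AI summary: Surveys shape space analysis, drawing on differential geometry, statistics, and machine learning. The survey organizes the field as an analytical pipeline, from shape representation to geometry-aware learning, for characterizing nonlinear structure in geometric data.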

Comments 79 pages, 10 figures, 8 tables

Abstract

A central objective of machine learning is to identify structure and patterns in data. Advances in data acquisition have increasingly produced datasets whose observations possess rich geometric form, giving rise to shape spaces that encode variability in object geometry. Such datasets arise across a wide range of disciplines, including biology, medicine, anthropology, and computer vision, where subtle geometric differences often carry important scientific information. Traditional machine learning methods, however, are frequently ill-equipped to account for the nonlinear geometric structure underlying these data. This survey synthesizes a rapidly growing body of work on shape space analysis, which provides a mathematical and computational framework for the study of geometric data. Drawing on ideas from differential geometry, statistics, and machine learning, we organize the literature around a common analytical pipeline: shape representation and parameterization, the rigorous construction of robust geodesic metrics, statistical analysis on shape spaces, and geometry-aware learning methods. We discuss how these tools enable the characterization of shape variability, the comparison of geometric objects, and the analysis of structural trajectories across populations and time. To illustrate the breadth of the field, we highlight applications spanning multiple scales of biological organization, including studies of subcellular morphology and primate tooth evolution. Across these and many other domains, researchers face common challenges arising from complex, nonlinear, and often unaligned geometric variation. The review concludes by identifying key theoretical and computational challenges, as well as emerging opportunities driven by increasingly large and diverse geometric datasets.
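
Among the pipeline stages listed above, the most basic is comparing shapes after removing position, scale, and orientation. The sketch below computes the classical (partial) Procrustes distance between two landmark configurations with corresponding points. It is a standard baseline chosen for illustration, not a summary of the more general geodesic metrics the survey covers.

```python
import numpy as np

def preshape(points):
    """Remove translation and scale: center the landmarks and scale to unit Frobenius norm."""
    x = np.asarray(points, dtype=float)
    x = x - x.mean(axis=0)
    return x / np.linalg.norm(x)

def procrustes_distance(a, b):
    """Distance between two landmark configurations after optimal rotation (no reflection)."""
    x, y = preshape(a), preshape(b)
    u, _, vt = np.linalg.svd(x.T @ y)
    d = np.sign(np.linalg.det(u @ vt))  # flip the last axis if the best fit is a reflection
    correction = np.eye(x.shape[1])
    correction[-1, -1] = d
    rotation = u @ correction @ vt
    return np.linalg.norm(x @ rotation - y)

theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
moved = 2.5 * triangle @ rot.T + np.array([3.0, -1.0])  # rotate, scale, translate
mirrored = triangle * np.array([-1.0, 1.0])              # a reflection is not removed
print(procrustes_distance(triangle, moved), procrustes_distance(triangle, mirrored))
```

The rotated, scaled, and shifted copy is at distance (numerically) zero, while the mirror image is not. Whether reflections should count as the same shape is a modeling choice.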

2602.02819 2026-06-16 cs.LG stat.ML Replacement

Causal Evaluation of Membership Inference Attacks

Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, Aurélien Bellet

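Affiliations: Inria; PreMeDICaL, Inserm, Montpellier, France; School of Computer and Communication Science (EPFL); School of Life Sciences (EPFL), Lausanne, Switzerland

AI summary: Frames the evaluation of membership inference attacks as a causal inference problem and defines memorization as the causal effect of including a data point. Proposes practical estimators for the multi-run, one-run, and zero-run settings and validates their effectiveness experimentally.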

Comments Fixed ref label problems

Abstract

Membership Inference Attacks (MIAs) aim to distinguish training points (members) from unseen data (non-members), and are widely used to quantify memorization and assess privacy risks. Standard MIA evaluation requires repeated retraining, which is computationally costly for large models. One-run (single training with randomized data inclusion) and zero-run (post hoc evaluation) methods are often used instead, but their statistical validity remains unclear. We address this gap by framing MIA evaluation as a causal inference problem, defining memorization as the causal effect of including a data point in the training set. This novel formulation reveals and formalizes key sources of bias in existing protocols: one-run methods suffer from interference between jointly included points, while zero-run evaluations are additionally confounded by distribution shift between member and non-member evaluation data. We derive causal analogues of standard MIA metrics and propose practical estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees. We validate our approach in several settings, including pretrained and fine-tuned LLMs, showing that it enables reliable measurement of MIA performance without retraining and under distribution shift. Overall, our framework provides a principled foundation for privacy evaluation in modern AI systems.

2602.09326 2026-06-16 cs.LG Replacement

Priority-Aware Shapley Value

Kiljae Lee, Ziqi Liu, Weijing Tang, Yuan Zhang

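AI summary: Proposes the Priority-Aware Shapley Value (PASV), which extends the Shapley value with hard precedence constraints and soft priority weights for settings with dependent contributors. Also develops an efficient sampling algorithm.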

Abstract

Shapley values are widely used for model-agnostic data valuation and feature attribution, yet they implicitly assume contributors are interchangeable. This can be problematic when contributors are dependent (e.g., reused/augmented data or causal feature orderings) or when contributions should be adjusted by factors such as trust or risk. We propose Priority-Aware Shapley Value (PASV), which incorporates both hard precedence constraints and soft, contributor-specific priority weights. PASV is applicable to general precedence structures, recovers precedence-only and weight-only Shapley variants as special cases, and is uniquely characterized by natural axioms. We develop an efficient adjacent-swap Metropolis-Hastings sampler for scalable Monte Carlo estimation and analyze limiting regimes induced by extreme priority weights. Experiments on data valuation (MNIST/CIFAR10) and feature attribution (Census Income) demonstrate more structure-faithful allocations and a practical sensitivity analysis via our proposed "priority sweeping".
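
The adjacent-swap sampler mentioned above can be illustrated in its simplest, precedence-only form. The chain is a Metropolis-Hastings sampler over orderings that respect hard precedence constraints. Each step proposes swapping two neighbours and rejects any swap that would violate a constraint. Because the proposal is symmetric and the target here is uniform over valid orderings, every valid proposal is accepted. This sketch omits PASV's soft priority weights, whose exact form the abstract does not give. The function names are hypothetical.

```python
import random
from collections import defaultdict, deque

def topological_order(n, precedence):
    """One ordering of players 0..n-1 satisfying every (a, b) = 'a before b' constraint."""
    indegree = [0] * n
    successors = defaultdict(list)
    for a, b in precedence:
        successors[a].append(b)
        indegree[b] += 1
    queue = deque(i for i in range(n) if indegree[i] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != n:
        raise ValueError("precedence constraints contain a cycle")
    return order

def precedence_shapley(n, value, precedence, num_samples=2000, burn_in=200, seed=0):
    """Monte Carlo Shapley values averaged over orderings that respect `precedence`.

    value: function mapping a frozenset of players to a real number.
    Uses a lazy adjacent-swap Metropolis-Hastings chain whose stationary
    distribution is uniform over the valid orderings.
    """
    rng = random.Random(seed)
    forbidden = set(precedence)  # swapping a directly constrained pair (a, b) is invalid
    order = topological_order(n, precedence)
    totals = [0.0] * n
    for step in range(burn_in + num_samples):
        if n > 1 and rng.random() < 0.5:  # lazy step keeps the chain aperiodic
            i = rng.randrange(n - 1)
            a, b = order[i], order[i + 1]
            if (a, b) not in forbidden:  # symmetric proposal + uniform target: accept
                order[i], order[i + 1] = b, a
        if step >= burn_in:
            coalition, previous = set(), value(frozenset())
            for player in order:
                coalition.add(player)
                current = value(frozenset(coalition))
                totals[player] += current - previous
                previous = current
    return [t / num_samples for t in totals]

# Unanimity game on players {0, 1} with the constraint "0 before 1":
# player 1 always completes the coalition, so it receives all the credit.
unanimity = lambda s: 1.0 if {0, 1} <= s else 0.0
print(precedence_shapley(2, unanimity, [(0, 1)]))  # -> [0.0, 1.0]
```

Without constraints, the unanimity game would split the credit equally. The precedence constraint shifts all of it to the player that must come last, which is the kind of structure-aware allocation the abstract describes.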

2605.18909 2026-06-16 cs.LG cs.SY eess.SY Replacement

Descriptive versus Regulatory Uncertainty in Bounded Predictive Systems

Ahmed Gamal Eldin

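Affiliations: Nova University Lisbon – Cairo Branch (NOVA IMS)

AI summary: Examines the distinction between descriptive and regulatory uncertainty in bounded predictive systems and argues that current Transformer architectures are limited to descriptive uncertainty at inference. Supports the distinction with thermodynamic principles and experiments, finding that entropy and accuracy are orthogonal and that entropy flatness does not change with model scale.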

Abstract

Any system that models the world under finite representational capacity must compress; any compression entails a prior; and the prior is the system's bias. What has not been established is whether uncertainty participates in the dynamics governing future behavior, or merely describes the output distribution without consequence. We introduce a structural distinction between descriptive uncertainty, which does not recursively modulate the system's policy, and regulatory uncertainty, which directly enters the optimization landscape and drives persistent adaptive restructuring. We prove formally that current transformer architectures are confined to descriptive uncertainty at inference. We ground this in thermodynamics via Landauer's principle: for uncertainty to be regulatory, epistemic error must cost real energy; in a decoupled system, hallucinations and correct derivations dissipate identical energy. We test this empirically across three locally-deployed language models (3B, 8B, 70B parameters). Token-level Shannon entropy is statistically invariant across tasks spanning pattern retrieval, causal operator application, and out-of-distribution causal generalization in all three models (all pairwise p >= 0.568; within-model ranges 0.011-0.028 nats), while task accuracy varies substantially across the same conditions (0%-100%). Entropy and accuracy are orthogonal. The decoupling is scale-invariant: larger models achieve higher accuracy but identical entropy flatness. This structural incapacity is not resolvable by additional parameters or training data. Genuine epistemic grounding requires physical coupling between thermodynamic substrate state and information processing cost.
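
Token-level Shannon entropy, the quantity compared across tasks above, is the entropy of each position's next-token distribution. It is measured in nats when the natural logarithm is used. The sketch computes it from a matrix of logits with a numerically stable log-softmax. The abstract does not state how the paper aggregates over positions and prompts, so any averaging applied to the output is an assumption.

```python
import numpy as np

def token_entropies(logits):
    """Shannon entropy (nats) of the next-token distribution at each position.

    logits: array of shape (seq_len, vocab_size).
    """
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize the softmax
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(np.exp(log_p) * log_p).sum(axis=-1)

# Hypothetical logits: a uniform position and a confident position over a 4-token vocabulary.
logits = np.array([[0.0, 0.0, 0.0, 0.0],
                   [5.0, 0.0, 0.0, 0.0]])
per_token = token_entropies(logits)
print(per_token, per_token.mean())
```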

2507.22951 2026-06-16 cs.AI cs.LG Replacement

Unifying Post-hoc Explanations of Knowledge Graph Completions

Alessandro Lonardi, Samy Badreddine, Tarek R. Besold, Pablo Sanchez Martin

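Affiliations: Sony AI, Barcelona, Spain

AI summary: Addresses the lack of a unified framework for post-hoc explanations of knowledge graph completion. Proposes a taxonomy based on multi-objective optimization that unifies existing algorithms, improves evaluation protocols, and stresses the importance of interpretability for end-user queries.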

Comments 22 pages, 8 figures, 4 tables

Abstract

Knowledge Graphs organize information as entity-relation-entity triples, enabling machine learning models to predict plausible missing triples in a task known as Knowledge Graph Completion (KGC). Post-hoc explainability for KGC addresses the problem of identifying which triples most influence the predictions of machine learning models. Currently, the field lacks formalization and consistent evaluations, hindering reproducibility and cross-study comparisons. This paper argues for a unified taxonomy for post-hoc explainability in KGC. First, we propose a characterization of post-hoc explanations via multi-objective optimization that unifies existing post-hoc explainability algorithms in KGC and the explanations they produce, balancing explanation effectiveness and conciseness. Next, we examine improved evaluation protocols based on popular metrics, such as Mean Reciprocal Rank and Hits@k, through illustrative experiments. Finally, we stress the importance of interpretability as the ability of explanations to address queries meaningful to end users. By unifying methods and discussing evaluation standards, this work puts forward a case for more reproducible and impactful research in KGC explainability.
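
Mean Reciprocal Rank and Hits@k, the metrics named above, are computed from the rank a model assigns to each held-out true triple among corrupted candidates. The sketch follows the common "filtered" convention, in which other known-true triples are removed from the candidate list before ranking. The abstract does not describe the paper's exact protocol, including tie handling; this sketch resolves ties optimistically.

```python
import numpy as np

def rank_of_true(scores, true_index, known_true=None):
    """1-based rank of the true candidate among `scores` (higher score = more plausible).

    known_true: optional indices of other triples known to be true; in the "filtered"
    setting they are removed before ranking. Ties are resolved optimistically.
    """
    scores = np.asarray(scores, dtype=float)
    keep = np.ones(len(scores), dtype=bool)
    if known_true is not None:
        keep[list(known_true)] = False
    keep[true_index] = True
    return 1 + int(np.sum(scores[keep] > scores[true_index]))

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean Reciprocal Rank and Hits@k over a list of 1-based ranks."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))
    hits = {k: float(np.mean(ranks <= k)) for k in ks}
    return mrr, hits

# Hypothetical candidate scores for one query; index 2 is the held-out true triple.
scores = [0.1, 0.9, 0.5, 0.7]
print(rank_of_true(scores, 2), rank_of_true(scores, 2, known_true={1}))
print(mrr_and_hits([1, 2, 4]))
```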

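2512.09831 2026-06-16 cs.AI cs.LG cs.MA cs.SI Updated version

Interpretation as Linear Transformation: A Cognitive-Geometric Model of Concepts and Meaning

Chainarong Amornbunchornvej

Affiliation: National Electronics and Computer Technology Center

AI summary: Proposes a geometric framework that uses linear maps and vector spaces to model concept transmission, motivation, and influence among heterogeneous agents. It identifies the structural conditions for miscommunication and concept death and gives a reachability-based account of leadership.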

Comments The revised draft w.r.t. reviewer comments. The code is at https://github.com/DarkEyes/Cognitive-Geometry

Abstract

This paper develops a geometric framework for modeling concepts, motivation, and influence across cognitively heterogeneous agents. Each agent is represented by a personalized value space, a vector space encoding the internal dimensions through which the agent interprets and evaluates meaning. Evaluative concepts are formalized as structured vectors, abstract beings, whose transmission is mediated by linear interpretation maps. An abstract being survives communication only if it avoids the null spaces of these maps, yielding a structural criterion for intelligibility, miscommunication, and concept death. Within this framework, I show how conceptual distortion, motivational drift, and the limits of mutual understanding arise from purely algebraic constraints. A central result, the No-Null-Space Leadership Condition, characterizes leadership as a property of representational reachability rather than persuasion or authority. More broadly, the model explains how abstract beings can propagate, mutate, or disappear as they traverse diverse cognitive geometries. The account unifies insights from conceptual spaces, social epistemology, and AI value alignment by grounding meaning preservation in structural compatibility rather than shared information or rationality. I argue that this cognitive-geometric perspective clarifies the epistemic boundaries of influence in both human and artificial systems, and offers a general foundation for analyzing conceptual dynamics across heterogeneous agents.
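
The survival criterion can be checked directly. A concept vector v passes through an interpretation map A only if Av ≠ 0, that is, only if v is not in the null space of A. The sketch below is written in plain Python. It assumes maps are small dense matrices and uses a numerical tolerance. The function names are hypothetical and do not come from the paper's repository.

```python
def apply_map(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("dimension mismatch")
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def norm(vector):
    return sum(x * x for x in vector) ** 0.5

def survives(matrix, vector, tol=1e-9):
    """True if the vector is not (numerically) in the matrix's null space."""
    return norm(apply_map(matrix, vector)) > tol

def survives_chain(maps, vector, tol=1e-9):
    """Pass a concept through successive agents' maps; stop once it vanishes."""
    current = vector
    for m in maps:
        current = apply_map(m, current)
        if norm(current) <= tol:
            return False, current
    return True, current

# Agent B's map ignores the second value dimension entirely.
A_to_B = [[1.0, 0.0],
          [0.0, 0.0]]
print(survives(A_to_B, [0.0, 2.0]))  # False: lies in the null space
print(survives(A_to_B, [1.0, 2.0]))  # True: the first component gets through
```

A chain of maps can annihilate a vector that every single map would pass. This is why `survives_chain` checks the image after each step.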

2512.21577 2026-06-16 cs.CL cs.AI cs.LG stat.ML Updated version

A Unified Definition of Hallucination: It's The World Model, Stupid!

Emmy Liu, Varun Gangal, Chelsea Zou, Michael Yu, Xiaoqi Huang, Alex Chang, Zhuofu Tao, Karan Singh, Sachin Kumar, Steven Y. Feng

Affiliation: University of California, Berkeley

AI summary: Proposes a unified definition of hallucination as inaccurate internal world modeling that is observable to the user. It connects this definition to the HalluWorld benchmark to distinguish true hallucinations from planning or reward errors.

Comments ICML 2026. HalluWorld benchmark at https://github.com/DegenAI-Labs/HalluWorld

Abstract

Despite numerous attempts at mitigation since the inception of language models, hallucinations remain a persistent problem even in today's frontier LLMs. Why is this? We review existing definitions of hallucination and fold them into a single, unified definition wherein prior definitions are subsumed. We argue that hallucination can be unified by defining it as simply inaccurate (internal) world modeling, in a form where it is observable to the user. For example, stating a fact which contradicts a knowledge base OR producing a summary which contradicts the source. By varying the reference world model and conflict policy, our framework unifies prior definitions. We argue that this unified view is useful because it forces evaluations to clarify their assumed reference "world", distinguishes true hallucinations from planning or reward errors, and provides a common language for comparison across benchmarks and discussion of mitigation strategies. Building on this definition, we also connect our framework to HalluWorld, a complementary benchmark that instantiates fully specified reference world models for stress-testing model hallucinations.

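2606.10740 2026-06-16 cs.AI cs.CL cs.LG Updated version

When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models

Sai Kartheek Reddy Kasu, Nils Lukas, Samuele Poppi

Affiliation: GitHub

AI summary: Proposes the CoT-Output 2x2 safety matrix to diagnose hidden temporal failure dynamics in multi-turn reasoning models. It uncovers two reproducible vulnerabilities: an oversight paradox and context-injection failure.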

Comments Accepted at the ICML 2026 Workshop on Failure Modes in Agentic AI (FAGEN)

Abstract

Failures in multi-turn reasoning models are largely invisible to terminal-score evaluation. A model can lock onto an unsafe stance early in a long dialogue, yet its final-turn refusal rate may appear indistinguishable from a robustly aligned baseline. To expose these hidden temporal dynamics, we propose a trace-level diagnostic - the CoT-Output 2x2 safety matrix. This framework labels every turn along two independent axes (internal reasoning and visible output), yielding four operationally defined failure cells: robust alignment, alignment faking, overt jailbreak, and a distinct failure mode we term context-injection failure (where the CoT maintains safe reasoning, but the visible output produces harm, highlighting a multi-turn manifestation of reasoning unfaithfulness). We evaluate three distilled reasoning targets against a fixed attacker across five oversight conditions, collecting 6750 turn-level observations on the Information-Hazard scenario. Our analysis reveals two reproducible vulnerabilities: an oversight paradox where explicit monitoring cues paradoxically increase alignment-faking rates rather than suppress them, and a context-injection failure where models lock onto unsafe external outputs despite safe internal states. We release the full dataset of multi-turn dialogues and CoT traces to support follow-up trace-diagnostic research.
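
The labeling scheme amounts to a lookup from two per-turn judgments to one of four cells. The sketch below assumes each turn already carries binary safe/unsafe labels for the CoT and for the visible output. How those labels are produced, for example by a judge model, is outside the sketch. The abstract states the context-injection cell explicitly (safe CoT, harmful output). Placing "alignment faking" at unsafe-CoT/safe-output and "overt jailbreak" at unsafe/unsafe is an assumed reading of the cell names.

```python
from collections import Counter

# (cot_safe, output_safe) -> cell name
CELLS = {
    (True, True): "robust alignment",
    (False, True): "alignment faking",           # assumed mapping
    (False, False): "overt jailbreak",           # assumed mapping
    (True, False): "context-injection failure",  # stated in the abstract
}

def label_turn(cot_safe, output_safe):
    """Map one turn's two binary safety judgments to its 2x2 cell."""
    return CELLS[(bool(cot_safe), bool(output_safe))]

def tally(turns):
    """Count cells over a sequence of (cot_safe, output_safe) pairs."""
    return Counter(label_turn(c, o) for c, o in turns)

def first_unsafe_output(turns):
    """Index of the first turn whose visible output is unsafe, or None.
    Tracking this per dialogue exposes early lock-in that a final-turn score hides."""
    for i, (_, output_safe) in enumerate(turns):
        if not output_safe:
            return i
    return None

dialogue = [(True, True), (True, True), (False, True), (True, False)]
print(tally(dialogue))
print(first_unsafe_output(dialogue))  # 3
```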

2606.13295 2026-06-16 stat.ML cs.LG stat.ME Updated version

Simultaneous Latent Budget Trees for Stratified Classification

Cristian Buoncompagni, Stefano Pellegrino, Giulia Vannucci, Raffaele Dubbioso, Roberta Siciliano

AI summary: Proposes the Simultaneous Latent Budget Trees framework, which handles a stratification factor through a model-based split rule to produce interpretable classification trees. It applies the framework to gender differences in amyotrophic lateral sclerosis.

Abstract

In the era of Explainable Artificial Intelligence, there is a renewed focus on single trees for their ease of interpretation. This paper introduces Simultaneous Latent Budget Trees, a probabilistic machine learning framework for classification trees in the presence of a stratification factor, such as a temporal, spatial, or demographic variable, acting as a control variable or potential confounder. Standard tree growth procedures are not designed to optimize a conditional split rule. A model-based split rule is proposed in which child nodes are interpreted as latent components of a simultaneous mixture model, such as the Simultaneous Latent Budget Model and its constrained versions, fitted to the parent node. Mixing parameters route the observations to the child nodes, differently for each group, whereas latent budget parameters update the response-class profile of each level of the control variable. Parameters are estimated by least squares, taking a neural-network view of the model. An informative tree structure can be visualized interactively with interpretation aids on the nodes and paths, including visual pruning and a decision-tree selection procedure. Suitable measures are proposed to handle an unbalanced response-class distribution. The proposed methodology is applied to investigate gender-related differences in disease progression of Amyotrophic Lateral Sclerosis. The SLBT library with the various tree-based algorithms is available in the linked GitHub repository.
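
In the generic latent budget model, each group's response-class profile is a mixture of K latent budgets: P(class j | group i) = Σ_k π_ik β_jk. Here the mixing parameters π_ik sum to 1 over k, and each budget sums to 1 over classes. The sketch below computes only this forward decomposition, using hypothetical parameter values. It also includes an illustrative posterior rule for routing an observation to a child node. It does not implement the paper's least-squares estimation or its split criterion.

```python
def fitted_profiles(mixing, budgets):
    """P(class j | group i) = sum_k mixing[i][k] * budgets[k][j]."""
    n_classes = len(budgets[0])
    return [
        [sum(m_k * budgets[k][j] for k, m_k in enumerate(row)) for j in range(n_classes)]
        for row in mixing
    ]

def child_posterior(mixing_row, budgets, cls):
    """Posterior weight of each latent component (child node) for one observation
    of class `cls` in a given group; an illustrative routing rule."""
    weights = [m_k * budgets[k][cls] for k, m_k in enumerate(mixing_row)]
    total = sum(weights)
    return [w / total for w in weights]

# Two strata (e.g. female/male), two latent budgets, three response classes.
mixing = [[0.8, 0.2],
          [0.3, 0.7]]
budgets = [[0.6, 0.3, 0.1],   # latent budget 1
           [0.1, 0.3, 0.6]]   # latent budget 2
print(fitted_profiles(mixing, budgets))           # approx [[0.5, 0.3, 0.2], [0.25, 0.3, 0.45]]
print(child_posterior(mixing[0], budgets, cls=2))  # approx [0.4, 0.6]
```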

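2605.02593 2026-06-16 cs.LG cs.MS Updated version

Gradient Boosted Risk Scores

Costa Georgantas, Jonas Richiardi

Affiliation: Department of Radiology, Lausanne University Hospital and University of Lausanne

AI summary: Proposes a gradient-boosting algorithm for building compact, predictive risk scores that can model nonlinear effects. Across 12 tabular datasets, it uses on average 60% fewer rules for classification tasks and 16% fewer rules for time-to-event tasks than regression-based methods.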

Abstract

Risk scores are an interpretable and actionable class of machine learning models with applications in medicine, insurance, and risk management. Unlike most computational methods, risk scores are designed to be computed by a human by attributing points to a data sample based on a limited set of criteria. The most common approaches for generating risk scores use linear regressions to estimate the effect of selected variables. We propose a simple and effective approach towards building compact and predictive risk scores. We provide an algorithm based on gradient boosting that is capable of modeling nonlinear effects, along with a C++ implementation with Python and R bindings. Through extensive empirical evaluation on twelve tabular datasets spanning regression, classification, and time-to-event tasks, we show that our method achieves competitive predictive performance while producing substantially more compact scores than regression-based alternatives, with 60% fewer rules for classification tasks and 16% fewer rules for time-to-event tasks on average, compared to AutoScore.
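
A risk score is a short table of criteria and integer points that a person can add up by hand. The sketch below uses hypothetical criteria, points, and a hypothetical points-to-risk table, none of which come from the paper. Two bands on the same variable show how a nonlinear effect can still be expressed as additive points. This is one plausible reading of what a boosting-based construction produces, not the paper's algorithm.

```python
# Hypothetical rules: (feature, lower bound inclusive, upper bound exclusive, points)
RULES = [
    ("age", 50, 65, 1),
    ("age", 65, float("inf"), 3),      # two age bands: a nonlinear effect
    ("systolic_bp", 160, float("inf"), 2),
    ("smoker", 1, 2, 2),
]

# Hypothetical mapping from total points to estimated risk.
RISK_TABLE = {0: 0.02, 1: 0.04, 2: 0.07, 3: 0.12, 4: 0.19, 5: 0.28}

def score(patient):
    """Sum the points of every rule whose interval contains the feature value."""
    return sum(points for feat, lo, hi, points in RULES if lo <= patient[feat] < hi)

def risk(patient):
    """Look up the estimated risk, capping totals at the table's maximum."""
    return RISK_TABLE[min(score(patient), max(RISK_TABLE))]

p = {"age": 70, "systolic_bp": 150, "smoker": 1}
print(score(p), risk(p))  # 5 0.28
```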

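2605.25006 2026-06-16 cs.RO cs.LG cs.NE Updated version

Convex-Neural RRT*: Fast and Reliable Learning-Guided Sampling for High-Quality Robot Path Planning

Hichem Cheriet, Badra Khellat Kihel, Samira Chouraqui, Bara J. Emran

AI summary: Proposes Convex-Neural RRT*, which guides sampling with convex candidate regions that a neural network predicts near high-quality paths. Across multiple environments, it cuts computation time by 30-75% relative to neural-guided variants and shortens paths by about 5% on average relative to classical RRT*, with a success rate above 99%.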

Abstract

Sampling-based algorithms for robot path planning offer probabilistic completeness and strong empirical convergence properties across environments with diverse obstacle configurations. However, in practice, these methods often require many iterations to obtain high-quality solutions. This paper proposes Convex-Neural RRT*, an enhanced RRT* variant that incorporates neural guidance to predict informative waypoint regions near high-quality paths. Convex candidate regions are extracted from these predictions, enabling the planner to concentrate exploration on geometrically relevant areas while preserving global exploration. The proposed algorithm is evaluated against Neural RRT*, Neural Informed RRT*, classical RRT*, and LTA* across three environment types and 18 benchmark maps. Experimental results show that Convex-Neural RRT* reduces computation time by 30-75% compared to neural-guided variants and up to 88-98% relative to LTA*, while achieving an average path length reduction of approximately 5% compared to classical RRT*, with larger improvements observed in complex environments. The method also maintains an overall success rate above 99% across varying obstacle densities. These findings indicate that convex-guided neural sampling provides an effective balance between computational efficiency and solution quality, supporting its applicability to time-sensitive robotic navigation tasks.
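
The core sampling idea can be sketched as follows. Take the waypoints a network predicts and form their convex hull. Draw each RRT* sample from inside that hull with probability p_bias, and otherwise sample uniformly over the map, so that global exploration is preserved. The sketch assumes 2-D points, a single hull, rejection sampling inside the hull, and hypothetical predicted points. The paper's own region-extraction procedure may differ.

```python
import random

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, q):
    """True if q lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))

def biased_sample(hull, bounds, p_bias, rng):
    """Sample inside the hull with probability p_bias, else uniformly in bounds."""
    (xmin, xmax), (ymin, ymax) = bounds
    if len(hull) >= 3 and rng.random() < p_bias:
        hx = [p[0] for p in hull]
        hy = [p[1] for p in hull]
        while True:  # rejection sampling from the hull's bounding box
            q = (rng.uniform(min(hx), max(hx)), rng.uniform(min(hy), max(hy)))
            if inside_hull(hull, q):
                return q
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))

predicted = [(2, 2), (4, 3), (3, 5), (2.5, 3), (5, 5)]  # hypothetical network output
hull = convex_hull(predicted)                           # (2.5, 3) is interior
rng = random.Random(0)
samples = [biased_sample(hull, ((0, 10), (0, 10)), 0.7, rng) for _ in range(5)]
```

In an RRT* loop, `biased_sample` would replace the uniform sampling call. Keeping p_bias below 1 leaves nonzero probability on every free region, which is what preserves probabilistic completeness.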

2502.06178 2026-06-16 math.OC cs.LG stat.ML Updated version

Bayesian Optimization by Kernel Regression and Density-based Exploration

Tansheng Zhu, Hongyu Zhou, Ke Jin, Xusheng Xu, Qiufan Yuan, Lijie Ji

Affiliations: Zhiyuan College, Shanghai Jiao Tong University, Shanghai 200240, P. R. China; School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, P. R. China; Shanghai Institute of Aerospace Systems Engineering, Shanghai 201109, P. R. China; Department of Mathematics, Shanghai University, Shanghai 200444, P. R. China; Newtouch Center for Mathematics of Shanghai University, Shanghai University, Shanghai 200444, P. R. China

AI summary: Proposes BOKE, a Bayesian optimization algorithm that combines kernel regression with density-based exploration to reduce computational cost to quadratic complexity. It establishes convergence theoretically and demonstrates effectiveness experimentally.

Abstract

Bayesian optimization is highly effective for optimizing expensive-to-evaluate black-box functions, but it faces significant computational challenges due to the cubic per-iteration cost of Gaussian processes, which results in a total time complexity that is quartic with respect to the number of iterations. To address this limitation, we propose a novel algorithm, Bayesian optimization by kernel regression and density-based exploration (BOKE). BOKE uses kernel regression for efficient function approximation, kernel density for exploration, and integrates them into the confidence bound criteria to guide the optimization process, thus reducing computational costs to quadratic. Our theoretical analysis rigorously establishes the global convergence of BOKE under noisy evaluations. Through extensive numerical experiments on both synthetic and real-world optimization tasks, we demonstrate that BOKE not only performs competitively compared to Gaussian process-based methods and several other baseline methods but also exhibits superior computational efficiency. These results highlight BOKE's effectiveness in resource-constrained environments, providing a practical approach for optimization problems in engineering applications.
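
The abstract names the ingredients (kernel regression for the mean, kernel density for exploration, a confidence-bound criterion) but not the exact formula. The sketch below makes several assumptions. It uses Nadaraya-Watson regression with a Gaussian kernel. It uses a kernel-weighted visit count n(x) = Σ K(x, x_i) as the density term. It scores candidates with a UCB-style rule, m(x) + β / sqrt(n(x)). The bandwidth h, β, the candidate grid, and the toy objective are all hypothetical. Scoring one candidate at step t costs O(t), so for a fixed candidate set the whole run is quadratic in the number of iterations.

```python
import math

def gauss_kernel(x, y, h):
    """Unnormalized Gaussian kernel with bandwidth h."""
    return math.exp(-((x - y) ** 2) / (2 * h * h))

def nw_mean(x, xs, ys, h):
    """Nadaraya-Watson kernel regression estimate of f at x."""
    w = [gauss_kernel(x, xi, h) for xi in xs]
    s = sum(w)
    if s < 1e-300:  # numerically far from every sample: fall back to the sample mean
        return sum(ys) / len(ys)
    return sum(wi * yi for wi, yi in zip(w, ys)) / s

def kernel_count(x, xs, h):
    """Kernel-weighted number of evaluations near x (an unnormalized density)."""
    return sum(gauss_kernel(x, xi, h) for xi in xs)

def ucb_score(x, xs, ys, h, beta):
    """Mean estimate plus an exploration bonus that shrinks where samples are dense."""
    return nw_mean(x, xs, ys, h) + beta / math.sqrt(kernel_count(x, xs, h) + 1e-12)

def boke_like_maximize(f, candidates, x0, iters, h=0.1, beta=1.0):
    """Greedy loop: evaluate the highest-scoring candidate, then refit the estimates."""
    xs, ys = [x0], [f(x0)]
    for _ in range(iters):
        x_next = max(candidates, key=lambda c: ucb_score(c, xs, ys, h, beta))
        xs.append(x_next)
        ys.append(f(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

def f(x):
    """Toy objective with its maximum at x = 0.3 (stands in for an expensive black box)."""
    return -(x - 0.3) ** 2

grid = [i / 100 for i in range(101)]
print(boke_like_maximize(f, grid, x0=0.9, iters=20))
```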

2605.06184 2026-06-16 cs.SE cs.LG cs.LO cs.PL Updated version

Teaching LLMs Program Semantics via Symbolic Execution Traces

Jonas Bayer, Stefan Zetzsche, Olivier Bouissou, Remi Delmas, Michael Tautschnig, Soonho Kong

Affiliations: University of Cambridge; Amazon Web Services

AI summary: Improves LLMs' understanding of program semantics by training on symbolic execution traces. Trace training combined with reasoning substantially improves violation (bug) detection, and the gains hold across all property types.

Abstract

We introduce an evaluation framework of 500 C verification tasks across five property types (memory safety, overflow, termination, reachability, data races) built on SV-COMP 2025, and evaluate 14 models across six families. We find that high overall accuracy masks a critical weakness: while most models reliably confirm properties hold, violation detection varies widely and degrades sharply with program length. To close this gap, we train on formal verification artifacts: running the Soteria symbolic execution engine on generic open-source C code and using the resulting traces for continued pretraining of Qwen3-8B. Just ~3,000 bug traces combined with chain-of-thought reasoning at inference time improve violation detection by over 17 percentage points, producing one of the most balanced accuracy profiles among evaluated models. On violation detection, the trained 8B model outperforms the 4× larger Qwen3-32B without thinking and approaches it in overall accuracy. The interaction between trace training and chain-of-thought is superadditive: neither alone provides meaningful gains, but their combination does. Improvements transfer across all five property types, including ones the training traces do not target. Our 28 configurations confirm the gains stem from trace semantics, not code volume, and that trace curation and format matter.

2604.22795 2026-06-16 eess.SY cs.LG cs.SY Updated version

Load constrained wind farm flow control through multi-objective multi-agent reinforcement learning

Teodor Åstrand, Marcus Binder Nilsen, Iasonas Tsaklis, Tuhfe Göçmen, Pierre-Elouan Réthoré, Nikolay Dimitrov

Affiliation: Department of Wind and Energy Systems, Technical University of Denmark

AI summary: Proposes a multi-agent reinforcement learning framework for wind farm flow control that combines an Independent Soft Actor-Critic architecture with a data-driven surrogate model. A shaped reward function constrains increases in damage equivalent loads, so the agents jointly optimize power gain and load control.

Comments Submitted to Journal of Physics: Conference Series (Torque 2026). This is the Accepted Manuscript version of an article accepted for publication in Journal of Physics: Conference Series. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. This Accepted Manuscript is published under a CC BY licence

Journal ref
J. Phys.: Conf. Ser. 3224 032065 (2026)

Abstract

This study presents a multi-agent reinforcement learning (MARL) framework for load-constrained wind farm flow control (WFFC). While wake steering can enhance total wind farm power, it often introduces increased structural loads on downstream turbines. To address this, we integrate an Independent Soft Actor-Critic (I-SAC) architecture with a data-driven, local inflow sector-averaged surrogate model to provide real-time estimates of Damage Equivalent Loads (DELs). By incorporating these estimates into a shaped reward function, turbine-specific agents are trained to maximize power production while adhering to specific load-increase thresholds (Δ_max) of 10%, 20%, and 30% relative to a baseline controller. The framework is implemented within the WindGym environment using the DYNAMIKS flow solver with Dynamic Wake Meandering (DWM) model to capture non-stationary wake physics. Results indicate that the MARL agents successfully learn collaborative policies that prioritise power gain while actively retreating from high-DEL control strategies.
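
A damage equivalent load collapses a rainflow-counted load history into one constant-amplitude range: DEL = (Σ_i n_i S_i^m / N_eq)^(1/m), where m is the S-N (Woehler) exponent. The sketch below computes that standard formula from already-counted cycles; rainflow counting itself is not implemented. It pairs the formula with one possible shaped reward: relative power gain, minus a linear penalty once relative DEL growth exceeds Δ_max. The reward's form, the penalty weight, and all numbers are hypothetical, not the paper's.

```python
def damage_equivalent_load(ranges, counts, m, n_eq):
    """DEL = (sum_i n_i * S_i**m / N_eq) ** (1/m) from rainflow-counted cycles.

    ranges: load ranges S_i; counts: cycle counts n_i (half cycles count 0.5);
    m: Woehler (S-N curve) exponent; n_eq: number of equivalent reference cycles.
    """
    if len(ranges) != len(counts):
        raise ValueError("ranges and counts must have equal length")
    total = sum(n * s ** m for s, n in zip(ranges, counts))
    return (total / n_eq) ** (1.0 / m)

def shaped_reward(power, power_base, del_now, del_base, delta_max, penalty=10.0):
    """Relative power gain minus a linear penalty on DEL growth beyond delta_max.
    The functional form and penalty weight are illustrative assumptions."""
    gain = power / power_base - 1.0
    excess = max(0.0, del_now / del_base - 1.0 - delta_max)
    return gain - penalty * excess

# Hypothetical cycle counts over a 600 s window, 1 Hz equivalent cycles, m = 4.
print(damage_equivalent_load([120.0, 80.0, 40.0], [2, 10, 50], m=4, n_eq=600))
# A 5% power gain with DEL up 25% against a 20% budget is penalized.
print(shaped_reward(1.05, 1.0, 1.25, 1.0, delta_max=0.20))
```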

2511.12635 2026-06-16 cs.SE cs.AI cs.LG Updated version

LLM4SCREENLIT: Recommendations on Assessing the Performance of Large Language Models for Screening Literature in Systematic Reviews

Lech Madeyski, Barbara Kitchenham, Martin Shepperd

Affiliations: University of Kent; University of Leicester; University of Birmingham

AI summary: Presents the LLM4SCREENLIT recommendations for evaluating large language models that screen literature for systematic reviews. It proposes an improved evaluation based on a Weighted Matthews Correlation Coefficient and stresses using the cost-sensitive WMCC under imbalanced, cost-asymmetric conditions.

Comments 34 pages, 6 figures

Journal ref
Information and Software Technology 198 (2026) 108204

Abstract

Context: Large language models (LLMs) are increasingly used to screen literature for systematic reviews (SRs), but the standard confusion-matrix metrics used to evaluate them can mislead under the imbalanced, cost-asymmetric conditions of screening. Objective: We develop and justify LLM4SCREENLIT, a set of practical recommendations for researchers conducting LLM-screening evaluations and for editors and reviewers assessing such studies, differentiated by study type (retrospective benchmarking vs deployment for a specific SR). Method: Using Delgado-Chaves et al. (2025), an 18-LLM benchmark across three biomedical SRs, as a motivating example, we reviewed 28 additional papers and extracted their reported metrics. We propose a Weighted Matthews Correlation Coefficient (WMCC) that integrates MCC's chance-correction with asymmetric misclassification costs, and validated it on three software-engineering (SE) reanalyses, the largest covering 9 LLMs × 24 SE secondary studies (34,528 articles). Results: Across the 29 papers, only 10% reported MCC, only 24% reported full confusion matrices, and none of the five papers claiming workload savings priced false-negative cost. In the largest SE reanalysis, MCC and WMCC disagree on the best LLM in 55% of evaluable studies; in the most striking case, a 9,695-article SE study, the Accuracy-best LLM loses 63.3% of relevant evidence (Lost Evidence), the MCC-best 43.9%, but the WMCC-best only 5.8%. Sensitivity analysis (median crossover at w ≈ 2.7, all crossovers below 7) supports w = 10 as a conservative default. Conclusions: SR-screening evaluations should prioritize Lost Evidence and use cost-sensitive WMCC alongside MCC for ranking. Reporting must include the full confusion matrix and treat unclassifiable outputs as positives requiring human review. Designs should be leakage-aware, with non-LLM baselines when the study aims to inform SR practice and labels are available.
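
The recommendations above can be partly illustrated with standard quantities. The sketch below tallies a confusion matrix, counting unclassifiable outputs (None) as includes, as the abstract recommends. It also computes MCC, a workload-saving share, and Lost Evidence. Lost Evidence is taken here as FN / (TP + FN), i.e. 1 - recall, which is an assumed reading of the term. The paper's WMCC is not reproduced, because the abstract does not give its formula. The screening counts are hypothetical.

```python
import math

def confusion(labels, preds):
    """Tally (TP, FP, FN, TN). Unclassifiable outputs (None) are counted as
    includes, i.e. sent to human review."""
    tp = fp = fn = tn = 0
    for y, p in zip(labels, preds):
        p = True if p is None else bool(p)
        if y and p:
            tp += 1
        elif p:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def lost_evidence(tp, fn):
    """Share of relevant articles the screener excluded: FN / (TP + FN)."""
    return fn / (tp + fn) if tp + fn else 0.0

def workload_saving(tp, fp, fn, tn):
    """Share of articles screened out, i.e. not passed on for human reading."""
    return (fn + tn) / (tp + fp + fn + tn)

# Hypothetical screen of 100 articles, 10 of them relevant.
tp, fp, fn, tn = 6, 10, 4, 80
print(mcc(tp, fp, fn, tn), lost_evidence(tp, fn), workload_saving(tp, fp, fn, tn))
```

A high workload saving says nothing about the 4 relevant articles lost in this example. This is why the abstract asks for Lost Evidence to be reported alongside it.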

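2601.03612 2026-06-16 cs.LG cs.SD eess.AS Updated version

Mathematical Foundations of Polyphonic Music Generation via Structural Inductive Bias

Joonwon Seo

Affiliation: GitHub

AI summary: Proposes a mathematical framework for polyphonic music generation grounded in structural inductive bias. Using Beethoven's piano sonatas as a case study, it introduces the Smart Embedding architecture, which reduces parameters and improves model stability.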

Comments 86 pages. A comprehensive monograph on the Smart Embedding architecture for polyphonic music generation. Includes rigorous theoretical proofs using Information Theory, Rademacher Complexity, and the Rank-Preserving Transversality Property (RPTP), along with empirical validation and a human listening study (N=53)

Abstract

This monograph addresses the "Missing Middle" problem in AI music generation - the challenge of producing coherent, phrase-level musical structure. Using Beethoven's piano sonatas as a case study, I introduce the Smart Embedding architecture, a factorized representation grounded in the empirically verified independence of pitch and hand attributes (NMI=0.167). The architecture achieves a 48.3% reduction in embedding parameters while improving validation loss by 9.47%. Theoretically, I establish formal guarantees through information theory, Rademacher complexity analysis (yielding a 28.09% tighter generalization bound), and category-theoretic interpretation. These results are further supported by Singular Value Decomposition analysis and a blind expert listening study (N=53). Collectively, this work presents a dual contribution that combines architectural innovation with mathematical rigor, offering a principled framework for building more efficient, stable, and interpretable generative models for complex sequential data.
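
The independence claim above rests on a normalized mutual information (NMI) of 0.167 between pitch and hand attributes. The sketch below computes NMI from paired categorical labels. The abstract does not say which normalization was used, so this version divides I(X;Y) by the arithmetic mean of H(X) and H(Y), which is scikit-learn's default. The pitch/hand data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (nats) of a list of categorical labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    """I(X;Y) = sum p(x,y) log(p(x,y) / (p(x) p(y))) from paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

def nmi(xs, ys):
    """I(X;Y) normalized by the arithmetic mean of H(X) and H(Y)."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 and hy == 0:
        return 1.0  # both labelings constant: treated as identical
    return mutual_information(xs, ys) / ((hx + hy) / 2)

# Hypothetical notes: here hand is fully determined by pitch register.
pitches = [60, 62, 64, 48, 50, 52]
hands = ["R", "R", "R", "L", "L", "L"]
print(nmi(pitches, hands))
```

A value near 0 means knowing one attribute says little about the other. That is the property a factorized embedding, with separate pitch and hand tables, relies on.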

2603.22530 2026-06-16 cs.LG Updated version

Multimodal Training to Unimodal Deployment: Leveraging Unstructured Data During Training to Optimize Structured Data Only Deployment

Zigui Wang, Minghui Sun, Jiang Shu, Matthew M. Engelhard, Lauren Franz, Benjamin A. Goldstein

Affiliations: Department of Biostatistics and Bioinformatics, Duke University School of Medicine; Department of Pediatrics, Duke University School of Medicine; Duke Center for Autism and Brain Development, Duke University School of Medicine; Department of Psychiatry and Behavioral Sciences, Durham, NC, USA

AI summary: Proposes a multimodal learning framework that uses unstructured EHR data during training to improve how well a model extracts task-relevant information from structured data. Joint training with contrastive learning and a knowledge distillation loss yields an effective classifier deployable on structured data alone.

Comments 10 pages, 3 figures

Journal ref
Proceedings of the AMIA 2026 Annual Symposium, American Medical Informatics Association (AMIA), 2026

Abstract

Unstructured Electronic Health Record (EHR) data, such as clinical notes, contain clinical contextual observations that are not directly reflected in structured data fields. This additional information can substantially improve model learning. However, due to their unstructured nature, these data are often unavailable or impractical to use when deploying a model. We introduce a multimodal learning framework that leverages unstructured EHR data during training while producing a model that can be deployed using only structured EHR data. Using a cohort of 3,466 children evaluated for late talking, we generated note embeddings with BioClinicalBERT and encoded structured embeddings from demographics and medical codes. A note-based teacher model and a structured-only student model were jointly trained using contrastive learning and contrastive knowledge distillation loss, producing a strong classifier (AUROC = 0.985). Our proposed model reached an AUROC of 0.705, outperforming the structured-only baseline of 0.656. These results demonstrate that incorporating unstructured data during training enhances the model's capacity to identify task-relevant information within structured EHR data, enabling a deployable structured-only phenotype model.
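
The abstract does not give the exact distillation loss. A common form of contrastive distillation is an InfoNCE objective. In it, each structured-only student embedding must pick out its own patient's note-based teacher embedding from among all teacher embeddings in the batch. The sketch below uses that assumed form with cosine similarity and a hypothetical temperature τ. It is written in plain Python to show the computation; actual training would use an autodiff framework.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(student, teacher, tau=0.1):
    """Mean cross-entropy of matching student[i] to teacher[i] among all teachers."""
    if len(student) != len(teacher) or not student:
        raise ValueError("need equally sized, non-empty batches")
    loss = 0.0
    for i, s in enumerate(student):
        logits = [cosine(s, t) / tau for t in teacher]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]
    return loss / len(student)

teacher = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical note-based embeddings
aligned = [[1.0, 0.1], [0.1, 1.0]]   # student embeddings close to their own teacher
swapped = [[0.1, 1.0], [1.0, 0.1]]   # student embeddings matched to the wrong patient
print(info_nce(aligned, teacher), info_nce(swapped, teacher))
```

At deployment time, only the student, which reads structured data, is kept. The teacher exists solely to shape the student's embedding space during training.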

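2208.00335 2026-06-16 cs.LG Updated version

Rule Extraction in Machine Learning: Chat Incremental Pattern Constructor

Caleb Princewill Nwokocha

Affiliation: Caleb Princewill Nwokocha

AI summary: Proposes ChatIPC, a system that incrementally learns ordered token-transition rules from text and builds responses through definition-based expansion and similarity-guided candidate selection, yielding interpretable rule extraction.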

Comments 11 pages

Abstract

Rule extraction is a central problem in interpretable machine learning because it seeks to convert opaque predictive behavior into human-readable symbolic structure. This paper presents Chat Incremental Pattern Constructor (ChatIPC), a lightweight incremental symbolic learning system that extracts ordered token-transition rules from text, enriches them with definition-based expansion, and constructs responses by similarity-guided candidate selection. The system may be viewed as a rule extractor operating over a token graph rather than a conventional classifier. I formalize the knowledge base, definition expansion, candidate scoring, repetition control, English-rule heuristics, and response construction mechanisms used by ChatIPC. I further situate the method within the literature on rule extraction, decision tree induction, association rules, interpretable machine learning, and sequence construction. The updated C++ code implementation of ChatIPC is also reviewed in detail: it parses an embedded dictionary, normalizes lexical keys, caches definition tokens and part-of-speech tags, computes Jaccard scores on bitsets, applies heuristic linguistic bonuses, and persists the knowledge base with a versioned binary format. The paper emphasizes mathematical formulation and algorithmic clarity, and it provides pseudocode for the learning, scoring, and construction algorithms.
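
The implementation reviewed above computes Jaccard scores on bitsets. The sketch below shows the same idea in Python rather than the paper's C++. It uses Python integers as arbitrary-width bitsets and a hypothetical vocabulary and definition-token sets. It is not the ChatIPC code.

```python
def to_bitset(tokens, vocab):
    """Encode a set of tokens as an int bitmask; vocab maps token -> bit index.
    Tokens missing from the vocabulary are ignored."""
    mask = 0
    for t in tokens:
        if t in vocab:
            mask |= 1 << vocab[t]
    return mask

def popcount(x):
    """Number of set bits (portable alternative to int.bit_count, Python 3.10+)."""
    return bin(x).count("1")

def jaccard(a, b):
    """|A & B| / |A | B| on bitmasks; 0.0 when both sets are empty."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0

vocab = {w: i for i, w in enumerate(["animal", "pet", "small", "loyal", "vehicle"])}
cat = to_bitset({"animal", "pet", "small"}, vocab)
dog = to_bitset({"animal", "pet", "loyal"}, vocab)
car = to_bitset({"vehicle"}, vocab)
print(jaccard(cat, dog), jaccard(cat, car))  # 0.5 0.0
```

Once definition tokens are cached as masks, each comparison is a couple of bitwise operations plus bit counts. This is the usual reason to prefer bitsets over hash-set intersection for repeated scoring.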

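2310.00336 2026-06-16 cs.LG Updated version

DURENDAL: Graph deep learning framework for temporal heterogeneous networks

Manuel Dileo, Matteo Zignani, Sabrina Gaito

Affiliation: Department of Computer Science, University of Milan

AI summary: Proposes DURENDAL, a framework for temporal heterogeneous networks that adapts heterogeneous graph learning models by combining design principles from snapshot-based and multirelational message-passing models. It also introduces new high-resolution temporal heterogeneous network datasets for experimental validation.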

Abstract

Temporal heterogeneous networks (THNs) are evolving networks that characterize many real-world applications such as citation and event networks, recommender systems, and knowledge graphs. Although different Graph Neural Networks (GNNs) have been successfully applied to dynamic graphs, most of them only support homogeneous graphs or suffer from model design heavily influenced by specific THNs prediction tasks. Furthermore, there is a lack of temporal heterogeneous networked data in current standard graph benchmark datasets. Hence, in this work, we propose DURENDAL, a graph deep learning framework for THNs. DURENDAL can help to easily repurpose any heterogeneous graph learning model to evolving networks by combining design principles from snapshot-based and multirelational message-passing graph learning models. We introduce two different schemes to update embedding representations for THNs, discussing the strengths and weaknesses of both strategies. We also extend the set of benchmarks for THNs by introducing two novel high-resolution temporal heterogeneous graph datasets derived from an emerging Web3 platform and a well-established e-commerce website. Overall, we conducted the experimental evaluation of the framework over four temporal heterogeneous network datasets on future link prediction tasks in an evaluation setting that takes into account the evolving nature of the data. Experiments show the prediction power of DURENDAL compared to current solutions for evolving and dynamic graphs, and the effectiveness of its model design.

2511.00369 2026-06-16 cs.LG cs.AI cs.NE Updated version

Balancing Interpretability and Performance in Motor Imagery EEG Classification: A Comparative Study of ANFIS-FBCSP-PSO and EEGNet

Farjana Aktar, Mohd Ruhul Ameen, Akif Islam, Md Ekramul Hamid

Affiliation: University of Rajshahi

AI summary: Compares ANFIS-FBCSP-PSO and EEGNet on the BCI Competition IV-2a dataset. The fuzzy-neural model performs better in within-subject experiments, while the deep model generalizes better in cross-subject tests; the study offers guidance for selecting MI-BCI systems.

Comments Accepted at the 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence and Networking (QPAIN 2026)

详情
Journal ref
2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN)
AI中文摘要

实现准确且可解释的运动想象EEG分类仍是脑机接口(BCI)研究中的关键挑战。本文比较了透明的模糊推理方法(ANFIS-FBCSP-PSO)与知名的深度学习基准(EEGNet),使用公开的BCI竞赛IV-2a数据集。ANFIS流程结合滤波器银行共同空间模式特征提取与通过粒子群优化优化的模糊IF-THEN规则,而EEGNet直接从原始EEG数据学习层次化的空间-时间表示。在内子试验中,模糊神经模型表现更好(68.58%±13.76%准确率,kappa=58.04%±18.43),而在跨受试者(LOSO)测试中,深度模型表现出更强的泛化能力(68.20%±12.13%准确率,kappa=57.33%±16.22)。因此,该研究为根据设计目标选择MI-BCI系统提供了实用指导:可解释性或用户间鲁棒性。未来对基于Transformer和混合神经符号框架的研究有望进一步推动透明的EEG解码。

Abstract

Achieving both accurate and interpretable classification of motor-imagery EEG remains a key challenge in brain-computer interface (BCI) research. In this paper, we compare a transparent fuzzy-reasoning approach (ANFIS-FBCSP-PSO) with a well-known deep-learning benchmark (EEGNet) using the publicly available BCI Competition IV-2a dataset. The ANFIS pipeline combines filter-bank common spatial pattern feature extraction with fuzzy IF-THEN rules optimized via particle-swarm optimization, while EEGNet learns hierarchical spatial-temporal representations directly from raw EEG data. In within-subject experiments, the fuzzy-neural model performed better (68.58% +/- 13.76% accuracy, kappa = 58.04% +/- 18.43), while in cross-subject (LOSO) tests, the deep model exhibited stronger generalization (68.20% +/- 12.13% accuracy, kappa = 57.33% +/- 16.22). The study therefore provides practical guidance for selecting MI-BCI systems according to the design goal: interpretability or robustness across users. Future investigations into transformer-based and hybrid neuro-symbolic frameworks are expected to further advance transparent EEG decoding.
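
Both models are reported with accuracy and Cohen's kappa. A short sketch of how the two relate, using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is observed accuracy and p_e is chance agreement from the class marginals. If the true classes are balanced across four classes, p_e is exactly 0.25 regardless of the predictions. The function name is illustrative.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability that truth and prediction coincide by chance.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_o, (p_o - p_e) / (1 - p_e)


# Four balanced motor-imagery classes, one error out of eight trials.
acc, kappa = accuracy_and_kappa([0, 0, 1, 1, 2, 2, 3, 3], [0, 0, 1, 1, 2, 2, 3, 0])
```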

2505.04382 2026-06-16 eess.AS cs.LG cs.SD Updated

Discrete Optimal Transport and Voice Conversion

Anton Selitskiy, Maitreya Kocharekar

Affiliations: The University of Texas at Austin; University of California, Berkeley

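AI summary: Proposes kDOT, a voice conversion framework that operates in a pretrained speech embedding space and improves distribution alignment through the barycentric projection of a discrete optimal transport plan, improving WER, MOS, and FAD.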

Comments 5 pages, 1 figure, 7 tables. 11th International Conference on Machine Learning Technologies (ICMLT), Berlin, Germany, May 2026

Abstract

We propose kDOT, a discrete optimal transport (OT) framework for voice conversion (VC) operating in a pretrained speech embedding space. In contrast to the averaging strategies used in kNN-VC and SinkVC, and the independence assumption adopted in MKL, our method employs the barycentric projection of the discrete OT plan to construct a transport map between source and target speaker embedding distributions. We conduct a comprehensive ablation study over the number of transported embeddings and systematically analyze the impact of source and target utterance duration. Experiments on LibriSpeech demonstrate that OT with barycentric projection consistently improves distribution alignment and often outperforms averaging-based approaches in terms of WER, MOS, and FAD. Furthermore, we show that applying discrete OT as a post-processing step can transform spoofed speech into samples that are misclassified as bona fide by a state-of-the-art spoofing detector. This demonstrates the strong domain adaptation capability of OT in embedding space, while also revealing important security implications for spoof detection systems.
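
The key step is the barycentric projection of a discrete OT plan: each source embedding is mapped to the plan-weighted average of the target embeddings, T(x_i) = sum_j P_ij y_j / sum_j P_ij. A minimal sketch, assuming uniform weights and squared Euclidean cost. To keep it dependency-free, the plan is computed with entropy-regularized Sinkhorn iterations, which is a swapped-in solver and not necessarily the one kDOT uses. `eps` and `iters` are illustrative values.

```python
import math


def squared_euclidean(xs, ys):
    """Pairwise squared Euclidean costs between two lists of vectors."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]


def sinkhorn_plan(a, b, cost, eps=0.2, iters=500):
    """Entropy-regularized OT plan between weights a and b (Sinkhorn scaling)."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * len(a), [1.0] * len(b)
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]


def barycentric_projection(plan, targets):
    """Map each source point to the plan-weighted mean of the target points."""
    dim = len(targets[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append([sum(w * t[d] for w, t in zip(row, targets)) / mass for d in range(dim)])
    return mapped


# Toy 1-D "embeddings": source speaker at {0, 1}, target speaker at {2, 3}.
src, tgt = [[0.0], [1.0]], [[2.0], [3.0]]
plan = sinkhorn_plan([0.5, 0.5], [0.5, 0.5], squared_euclidean(src, tgt))
converted = barycentric_projection(plan, tgt)  # approximately [[2.0], [3.0]]
```

Unlike averaging over k nearest neighbours, the projection weights each target by how much mass the plan sends to it, so the global marginal constraints shape every mapped point.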

2509.22935 2026-06-16 cs.LG cs.AI Updated

Compute-Optimal Quantization-Aware Training

Aleksandr Dremov, David Grangier, Angelos Katharopoulos, Awni Hannun

Affiliations: Apple

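AI summary: Studies how to allocate compute between quantization-aware training and full-precision training, finds experimentally that the optimal QAT-to-FP ratio increases with total compute, and proposes a new cooldown and QAT fusion approach to improve efficiency.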

Comments ICLR 2026

Journal ref
International Conference on Learning Representations (ICLR), 2026

Abstract

Quantization-aware training (QAT) is a leading technique for improving the accuracy of quantized neural networks. Previous work has shown that decomposing training into a full-precision (FP) phase followed by a QAT phase yields superior accuracy compared to QAT alone. However, the optimal allocation of compute between the FP and QAT phases remains unclear. We conduct extensive experiments with various compute budgets, QAT bit widths, and model sizes from 86.0M to 2.2B to investigate how different QAT durations impact final performance. We demonstrate that, contrary to previous findings, the loss-optimal ratio of QAT to FP training increases with the total amount of compute. Moreover, the optimal fraction can be accurately predicted for a wide range of model sizes and quantization widths using the tokens-per-parameter-byte statistic. From experimental data, we derive a loss scaling law that predicts both optimal QAT ratios and final model performance across different QAT/FP compute allocation strategies and QAT bit widths. We use the scaling law to make further predictions, which we verify experimentally, including which QAT bit width is optimal under a given memory constraint and how QAT accuracy with different bit widths compares to full-precision model accuracy. Additionally, we propose a novel cooldown and QAT fusion approach that performs learning rate decay jointly with quantization-aware training, eliminating redundant full-precision model updates and achieving significant compute savings. These findings provide practical insights into efficient QAT planning and enable the training of higher-quality quantized models with the same compute budget.
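
A minimal sketch of the quantities involved, under stated assumptions: symmetric per-tensor fake quantization at a given bit width (the forward pass QAT simulates), a tokens-per-parameter-byte statistic read as training tokens divided by model size in bytes at that bit width, and a split of a step budget into an FP phase followed by a QAT phase. The statistic's exact definition in the paper may differ, and all function names are hypothetical.

```python
def fake_quantize(weights, bits):
    """Symmetric per-tensor quantize-dequantize, as simulated during QAT.
    (Training would pass gradients straight through the rounding.)"""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def tokens_per_parameter_byte(tokens, params, bits):
    """Assumed reading: training tokens per byte of quantized weights."""
    return tokens / (params * bits / 8)


def phase_schedule(total_steps, qat_fraction):
    """Split a step budget into (full-precision steps, QAT steps)."""
    qat_steps = round(total_steps * qat_fraction)
    return total_steps - qat_steps, qat_steps


# Example: 1B tokens, 100M parameters at 4 bits gives 20 tokens per parameter byte.
ratio = tokens_per_parameter_byte(1e9, 1e8, 4)
```

Per the abstract, the loss-optimal `qat_fraction` is not a constant but grows with total compute, which is why a predictive statistic is useful.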

2602.21381 2026-06-16 cs.LG cs.AI cs.CE Updated

VCDF: A Validated Consensus-Driven Framework for Time Series Causal Discovery

Gene Yu, Ce Guo, Wayne Luk

Affiliations: Department of Computing, Imperial College London

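AI summary: Proposes VCDF, which makes causal discovery more robust by evaluating the stability of causal relations across blocked temporal subsets of the series; experiments show significant F1 gains for methods such as VAR-LiNGAM, especially on longer sequences.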

Comments This paper has been accepted to PAKDD 2026. Please cite the proceedings version when available

Journal ref
LNCS vol. 16599, pp. 29-41, Springer, 2026

Abstract

Time series causal discovery is essential for understanding dynamic systems, yet many existing methods remain sensitive to noise, non-stationarity, and sampling variability. We propose the Validated Consensus-Driven Framework (VCDF), a simple and method-agnostic layer that improves robustness by evaluating the stability of causal relations across blocked temporal subsets. VCDF requires no modification to base algorithms and can be applied to methods such as VAR-LiNGAM and PCMCI. Experiments on synthetic datasets show that VCDF improves VAR-LiNGAM by approximately 0.08-0.12 in both window and summary F1 scores across diverse data characteristics, with gains most pronounced for moderate-to-long sequences. The framework also benefits from longer sequences, yielding up to 0.18 absolute improvement on time series of length 1000 and above. Evaluations on simulated fMRI data and IT-monitoring scenarios further demonstrate enhanced stability and structural accuracy under realistic noise conditions. VCDF provides an effective reliability layer for time series causal discovery without altering underlying modeling assumptions.
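
The consensus idea can be sketched as a wrapper around any base discovery function: split the series into contiguous (blocked) segments, run the base method on each, and keep only edges that recur in enough segments. The base method here, `lagged_correlation_discovery`, is a hypothetical toy standing in for VAR-LiNGAM or PCMCI, and `n_blocks` and `min_fraction` are illustrative. VCDF's actual validation rule may differ.

```python
import random


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def lagged_correlation_discovery(block, threshold=0.5):
    """Hypothetical toy base method: edge X -> Y if corr(X[t-1], Y[t]) > threshold."""
    edges = set()
    for x, xs in block.items():
        for y, ys in block.items():
            if x != y and pearson(xs[:-1], ys[1:]) > threshold:
                edges.add((x, y))
    return edges


def consensus_edges(data, discover, n_blocks=4, min_fraction=0.75):
    """Run `discover` on contiguous time blocks; keep edges found in at least
    `min_fraction` of the blocks. The base method itself is left unchanged."""
    n = len(next(iter(data.values())))
    size = n // n_blocks
    counts = {}
    for k in range(n_blocks):
        block = {name: s[k * size:(k + 1) * size] for name, s in data.items()}
        for edge in discover(block):
            counts[edge] = counts.get(edge, 0) + 1
    return {e for e, c in counts.items() if c / n_blocks >= min_fraction}


def make_lagged_pair(n=400, seed=0):
    """Synthetic example: y is x delayed by one step, so the true edge is x -> y."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {"x": x, "y": [0.0] + x[:-1]}
```

Blocking (rather than random resampling) preserves the temporal order inside each segment, which lag-based methods rely on.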

2602.19253 2026-06-16 cs.LG cs.NE Updated

Alternating Bi-Objective Optimization for Explainable Neuro-Fuzzy Systems

Qusai Khaled, Uzay Kaymak, Laura Genga

Affiliations: University of Birmingham; Bilkent University; University of Turku

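AI summary: Proposes X-ANFIS, which improves the explainability of neuro-fuzzy systems through alternating bi-objective gradient optimization; experiments on UCI regression datasets show that it reaches target distinguishability while maintaining predictive accuracy.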

Comments Accepted at IEEE Conference on Artificial Intelligence 2026 (IEEE CAI 2026)

Journal ref
Proc. 2026 IEEE Conference on Artificial Intelligence (CAI), 1166-1173 (2026)

Abstract

Fuzzy systems show strong potential in explainable AI due to their rule-based architecture and linguistic variables. Existing approaches navigate the accuracy-explainability trade-off either through evolutionary multi-objective optimization (MOO), which is computationally expensive, or gradient-based scalarization, which cannot recover non-convex Pareto regions. We propose X-ANFIS, an alternating bi-objective gradient-based optimization scheme for explainable adaptive neuro-fuzzy inference systems. Cauchy membership functions are used for stable training under semantically controlled initializations, and a differentiable explainability objective is introduced and decoupled from the performance objective through alternating gradient passes. Validated in approximately 5,000 experiments on nine UCI regression datasets, X-ANFIS consistently achieves target distinguishability while maintaining competitive predictive accuracy, recovering solutions beyond the convex hull of the MOO Pareto front.
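
Two ingredients named in the abstract can be sketched in isolation: a Cauchy membership function, mu(x) = 1 / (1 + ((x - c) / w)^2), and alternating gradient passes that step on one objective and then the other instead of descending a weighted sum. The quadratic objectives below are hypothetical placeholders for the performance and explainability losses, and `lr` and `rounds` are illustrative.

```python
def cauchy_mf(x, center, width):
    """Cauchy membership: 1 at the center, 0.5 at one width away, heavy tails."""
    return 1.0 / (1.0 + ((x - center) / width) ** 2)


def alternating_descent(grad_perf, grad_expl, theta, lr=0.1, rounds=100):
    """Alternate one gradient step on each objective (no scalarization weight)."""
    for _ in range(rounds):
        theta = theta - lr * grad_perf(theta)  # performance pass
        theta = theta - lr * grad_expl(theta)  # explainability pass
    return theta


# Placeholder objectives: (theta - 3)^2 for performance, (theta - 1)^2 for explainability.
theta = alternating_descent(lambda t: 2 * (t - 3), lambda t: 2 * (t - 1), theta=0.0)
```

For this step size the iteration settles at 17/9 (about 1.89), not at 2.0, the minimizer of the equal-weight sum. Alternating passes are therefore not a reparameterized weighted sum, which is consistent with the abstract's point that they can reach solutions that scalarization cannot.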

2505.17786 2026-06-16 cs.LG Updated

Supervised Graph Contrastive Learning for Gene Regulatory Networks

Sho Oshima, Yuji Okamoto, Taisei Tosaki, Ryosuke Kojima

Affiliations: University of Tokyo; National Institute of Genetics

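AI summary: Proposes SupGCL, which improves representation learning for gene regulatory networks by using biological perturbations from gene knockdown experiments as a supervision signal, improving disease-subtype identification and downstream task performance.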

Comments ICML 2026

Abstract

Graph Contrastive Learning (GCL) is a powerful self-supervised learning framework that performs data augmentation through graph perturbations, with growing applications in the analysis of biological networks such as Gene Regulatory Networks (GRNs). The artificial perturbations commonly used in GCL, such as node dropping, induce structural changes that can diverge from biological reality. This concern has contributed to a broader trend in graph representation learning toward augmentation-free methods, which treat such structural changes as a problem to be avoided. However, this trend overlooks the fundamental insight that structural changes from biologically meaningful perturbations are not a problem to be avoided but a rich source of information, and it thereby ignores the valuable opportunity to leverage data from real biological experiments. Motivated by this insight, we propose SupGCL (Supervised Graph Contrastive Learning), a new GCL method for GRNs that directly incorporates biological perturbations from gene knockdown experiments as supervision. SupGCL is a probabilistic formulation that continuously generalizes conventional GCL, linking artificial augmentations with real perturbations measured in knockdown experiments, and using the latter as explicit supervision. On patient-derived GRNs from three cancer types, we train GRN representations with SupGCL and evaluate them in two regimes: (i) embedding space analysis, where SupGCL yields clearer disease-subtype structure and improves clustering, and (ii) task-specific fine-tuning, where it consistently outperforms strong graph representation learning baselines on 13 downstream tasks spanning gene-level functional annotation and patient-level prediction.
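
For orientation, this is the conventional GCL objective that SupGCL is said to generalize: an InfoNCE loss that pulls an anchor embedding toward its positive view and away from negatives. In standard GCL the positive comes from an artificial augmentation such as node dropping. Per the abstract, SupGCL instead ties views to measured knockdown perturbations, and its probabilistic formulation is not reproduced here. The temperature `tau` is an illustrative value.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor, positive, negatives, tau=0.5):
    """Standard InfoNCE: -log( exp(s+/tau) / (exp(s+/tau) + sum exp(s-/tau)) )."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


# An aligned positive and an opposite negative give a small loss, log(1 + e^-4).
loss = info_nce([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
```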

2602.00240 2026-06-16 cs.LG Updated

Green-NAS: A Global-Scale Multi-Objective Neural Architecture Search for Robust and Efficient Edge-Native Weather Forecasting

Md Muhtasim Munif Fahim, Soyda Humyra Yesmin, Saiful Islam, Md. Palash Bin Faruque, Md. A. Salam, Md. Mahfuz Uddin, Samiul Islam, Tofayel Ahmed, Md. Binyamin, Md. Rezaul Karim

Affiliations: University of California, Berkeley; University of Washington; University of Arizona

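AI summary: Green-NAS uses multi-objective optimization to find lightweight, high-accuracy models, reducing computational energy use and carbon footprint while improving the robustness and efficiency of edge weather forecasting.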

Comments Accepted at the 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking

Journal ref
2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN)

Abstract

We introduce Green-NAS, a multi-objective NAS (neural architecture search) framework designed for low-resource environments, using weather forecasting as a case study. By adhering to 'Green AI' principles, the framework explicitly minimizes computational energy costs and carbon footprints, prioritizing sustainable deployment over raw computational scale. The Green-NAS search jointly optimizes model accuracy and efficiency, yielding lightweight models that combine high accuracy with very few parameters. Our best-performing model, Green-NAS-A, achieved an RMSE of 0.0988 (i.e., within 1.4% of our manually tuned baseline) using only 153k model parameters, which is 239 times fewer than other globally applied weather forecasting models such as GraphCast. In addition, we describe how transfer learning improves weather forecasting accuracy by approximately 5.2%, compared with a naive approach of training a new model for each city, when only limited historical weather data are available for that city.
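
Multi-objective NAS keeps the candidates that no other candidate beats on every objective at once, forming a Pareto front. A minimal sketch with two minimized objectives, RMSE and parameter count. The candidate names and numbers are made up for illustration and are not results from the paper.

```python
def pareto_front(candidates):
    """candidates: list of (name, rmse, params); both objectives are minimized.
    Returns the names of non-dominated candidates, in input order."""
    front = []
    for name, rmse, params in candidates:
        dominated = any(
            r <= rmse and p <= params and (r < rmse or p < params)
            for _, r, p in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidates: C is worse than A on both axes, so it is dropped.
front = pareto_front([("A", 0.10, 150_000), ("B", 0.09, 2_000_000), ("C", 0.12, 200_000)])
```

A final model such as Green-NAS-A would then be chosen from the front according to the deployment budget.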

2602.08088 2026-06-16 cs.LG cs.AI Updated

Online Domain-aware LLM Decoding for Continual Domain Evolution

Mohammad Abu-Shaira, Weishi Shi

Affiliations: University of North Texas

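AI summary: Proposes ODD, an online domain-aware decoding framework that uses probability-level fusion and adaptive confidence modulation to help LLMs adapt to continual domain change; experiments show strong results on syntactic and semantic generation metrics.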

Journal ref
Advances in Knowledge Discovery and Data Mining, PAKDD 2026, LNAI 16600, pp. 565-577, Springer, 2026

Abstract

LLMs are typically fine-tuned offline on domain-specific data, assuming a static domain. In practice, domain knowledge evolves continuously through new regulations, products, services, and interaction patterns. Retraining or fine-tuning LLMs for every new instance is computationally infeasible. Additionally, real-world environments exhibit temporal dynamics with shifting data distributions. Disregarding this phenomenon, commonly referred to as concept drift, can significantly diminish a model's predictive accuracy. This mismatch between evolving domains and static adaptation pipelines highlights the need for efficient, real-time adaptation without costly retraining. In response, we introduce the Online Domain-aware Decoding (ODD) framework. ODD performs probability-level fusion between a base LLM and a prefix-tree prior, guided by adaptive confidence modulation using disagreement and continuity signals. Empirical evaluation under diverse drift scenarios demonstrates that ODD consistently surpasses LLM-Greedy and LLM-Temp Scaled across all syntactic and semantic NLG metrics. It yields an absolute ROUGE-L gain of 0.065 and a 13.6% relative improvement in Cosine Similarity over the best baseline. These results demonstrate ODD's robustness to evolving lexical and contextual patterns, making it suitable for dynamic LLM applications.
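
A minimal sketch of probability-level fusion with a prefix-tree prior, under loud assumptions. The prior counts next tokens after each prefix seen in recent domain text. The fused distribution is a convex mix of the LLM's next-token distribution and that prior. The mixing weight here, `alpha` times the prior's peak probability, is a hypothetical confidence signal. ODD's actual modulation uses disagreement and continuity signals that are not reproduced. Tokens are plain strings for readability.

```python
from collections import defaultdict


class PrefixTreePrior:
    """Next-token frequencies for every prefix seen in domain text."""

    def __init__(self):
        self.children = defaultdict(lambda: defaultdict(int))

    def add(self, tokens):
        for i, token in enumerate(tokens):
            self.children[tuple(tokens[:i])][token] += 1

    def next_distribution(self, prefix):
        counts = self.children.get(tuple(prefix))  # .get does not create entries
        if not counts:
            return {}
        total = sum(counts.values())
        return {tok: c / total for tok, c in counts.items()}


def fuse(p_llm, p_prior, alpha=0.5):
    """Convex mix of two next-token distributions (result still sums to 1).
    Hypothetical weight: alpha * the prior's peak probability."""
    if not p_prior:
        return dict(p_llm)  # unseen prefix: fall back to the base LLM
    lam = alpha * max(p_prior.values())
    vocab = set(p_llm) | set(p_prior)
    return {t: (1 - lam) * p_llm.get(t, 0.0) + lam * p_prior.get(t, 0.0) for t in vocab}


prior = PrefixTreePrior()
prior.add(["the", "new", "rule"])
prior.add(["the", "new", "law"])
fused = fuse({"rule": 0.2, "old": 0.8}, prior.next_distribution(["the", "new"]))
```

Because the prior can be updated with `add` as new domain text arrives, adaptation happens at decoding time without touching the LLM's weights.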

2601.18897 2026-06-16 cs.AI cs.LG Updated

Explainable Uncertainty Quantification for Wastewater Treatment Energy Prediction via Interval Type-2 Neuro-Fuzzy System

Qusai Khaled, Bahjat Mallak, Uzay Kaymak, Laura Genga

Affiliations: Jheronimus Academy of Data Science, Eindhoven University of Technology, Eindhoven, The Netherlands; Haskoning, Amersfoort, The Netherlands; School of Industrial Engineering, Eindhoven University of Technology

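AI summary: Proposes an interval type-2 neuro-fuzzy system for wastewater treatment energy prediction that produces interpretable prediction intervals through its fuzzy rule structure and decomposes uncertainty into levels, making decisions more reliable.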

Comments Submitted to 21st International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU2026)

Journal ref
IPMU 2026, Commun. Comput. Inf. Sci. 3020, 392-406 (2026)

Abstract

Wastewater treatment plants consume 1-3% of global electricity, making accurate energy forecasting critical for operational optimization and sustainability. While machine learning models provide point predictions, they lack the explainable uncertainty quantification essential for risk-aware decision-making in safety-critical infrastructure. This study develops an Interval Type-2 Adaptive Neuro-Fuzzy Inference System (IT2-ANFIS) that generates interpretable prediction intervals through fuzzy rule structures. Unlike black-box probabilistic methods, the proposed framework decomposes uncertainty across three levels: at the feature level, footprints of uncertainty identify which variables introduce ambiguity; at the rule level, analysis reveals confidence in local models; and at the instance level, intervals quantify overall prediction uncertainty. Validated on Melbourne Water's Eastern Treatment Plant dataset, IT2-ANFIS achieves predictive performance comparable to first-order ANFIS with substantially reduced variance across training runs, while providing explainable uncertainty estimates that link prediction confidence directly to operational conditions and input variables.
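
The feature-level and rule-level pieces can be illustrated with a common interval type-2 construction: a Gaussian membership function whose width is uncertain within [sigma_lower, sigma_upper], giving a lower and an upper membership grade. The gap between them is the footprint of uncertainty (FOU). A rule's firing strength then becomes an interval, formed here with the product t-norm. The paper's exact membership shape and type-reduction step are not stated in the abstract, so this is a sketch under those assumptions.

```python
import math


def it2_gaussian(x, center, sigma_lower, sigma_upper):
    """Interval type-2 Gaussian MF with uncertain width: returns (lower, upper)."""
    if not 0 < sigma_lower <= sigma_upper:
        raise ValueError("need 0 < sigma_lower <= sigma_upper")
    lower = math.exp(-0.5 * ((x - center) / sigma_lower) ** 2)
    upper = math.exp(-0.5 * ((x - center) / sigma_upper) ** 2)
    return lower, upper


def footprint_width(x, center, sigma_lower, sigma_upper):
    """Width of the FOU at x: larger means more ambiguity from this feature."""
    lower, upper = it2_gaussian(x, center, sigma_lower, sigma_upper)
    return upper - lower


def firing_interval(memberships):
    """Rule firing strength as an interval, using the product t-norm."""
    lo = hi = 1.0
    for lower, upper in memberships:
        lo *= lower
        hi *= upper
    return lo, hi
```

A wide firing interval for a rule flags a local model whose output should be trusted less, which is the rule-level signal the abstract describes.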

2510.24987 2026-06-16 q-bio.QM cs.LG q-bio.GN Updated

scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration

Jianle Sun, Chaoqi Liang, Ran Wei, Peng Zheng, Lei Bai, Wanli Ouyang, Hongliang Yan, Peng Ye

Affiliations: Shanghai Artificial Intelligence Laboratory; Carnegie Mellon University; The Chinese University of Hong Kong; Guangzhou Laboratory

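AI summary: scMRDR disentangles each cell's latent representation with a β-VAE architecture and combines isometric regularization, an adversarial objective, and a masked reconstruction loss to integrate unpaired multi-omics data, scaling effectively to large datasets.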

Comments Accepted at NeurIPS 2025 (Spotlight)

Journal ref
Advances in Neural Information Processing Systems 38 (2025): 154538-154565

Abstract

Advances in single-cell sequencing have enabled high-resolution profiling of diverse molecular modalities, while integrating unpaired multi-omics single-cell data remains challenging. Existing approaches either rely on pair information or prior correspondences, or require computing a global pairwise coupling matrix, limiting their scalability and flexibility. In this paper, we introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration. Specifically, we disentangle each cell's latent representations into modality-shared and modality-specific components using a well-designed $β$-VAE architecture, which are augmented with isometric regularization to preserve intra-omics biological heterogeneity, adversarial objective to encourage cross-modal alignment, and masked reconstruction loss strategy to address the issue of missing features across modalities. Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation. Crucially, it scales effectively to large-scale datasets and supports integration of more than two omics, offering a powerful and flexible solution for large-scale multi-omics data integration and downstream biological discovery.
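
Two of the loss terms can be written down directly: the β-weighted KL divergence of a diagonal Gaussian posterior from a standard normal prior, KL = -1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2), and a reconstruction error computed only over features observed for a given modality. Mean squared error is an assumed stand-in for the likelihood term. The isometric and adversarial terms, and the split into shared and specific latents, are omitted. `beta` is illustrative.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def masked_mse(x, x_hat, mask):
    """Reconstruction error over observed features only (mask entry 1 = observed)."""
    errs = [(a - b) ** 2 for a, b, m in zip(x, x_hat, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def beta_vae_loss(x, x_hat, mask, mu, logvar, beta=4.0):
    """Masked reconstruction plus beta-weighted KL (extra scMRDR terms omitted)."""
    return masked_mse(x, x_hat, mask) + beta * gaussian_kl(mu, logvar)
```

Masking lets cells from modalities that do not measure a feature contribute without being penalized for reconstructing values they never had.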

2512.14892 2026-06-16 cs.LG cs.AI Updated

OLR-WA: Online Weighted Average Linear Regression in Multivariate Data Streams

Mohammad Abu-Shaira, Alejandro Rodriguez, Greg Speegle, Victor Sheng, Ishfaq Ahmad

Affiliations: University of California, San Diego

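AI summary: Proposes OLR-WA, an online linear regression model for multivariate data streams that handles drift and confidence-based scenarios, performing comparably to batch regression and comparably to or better than other online models.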

Journal ref
2023 IEEE International Conference on Big Data (BigData), 1039-1046

Abstract

Online learning updates models incrementally with new data, avoiding large storage requirements and costly model recalculations. In this paper, we introduce "OLR-WA: OnLine Regression with Weighted Average", a novel and versatile multivariate online linear regression model. We also investigate scenarios involving drift, where the underlying patterns in the data evolve over time, conduct convergence analysis, and compare our approach with existing online regression models. The results show that OLR-WA achieves performance comparable to batch regression, and comparable or superior performance relative to other state-of-the-art online models, establishing its effectiveness. Moreover, OLR-WA converges rapidly, surpassing other online models by consistently achieving high R² values from the first iteration to the last, even when initialized with as little as 1% to 10% of the total data points. In addition to handling time-based (temporal drift) scenarios, OLR-WA stands out as the only model capable of effectively managing challenging confidence-based scenarios. It achieves this by adopting a conservative approach in its updates, giving priority to older data points with higher confidence levels. In summary, OLR-WA's performance demonstrates its versatility and utility across different contexts, making it a valuable solution for online linear regression tasks.
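
A simplified sketch of the weighted-average idea for a single feature: fit a line on the initial batch, fit another on each incoming batch, and combine their coefficients with user-chosen weights. A larger `w_base` gives the conservative behaviour the abstract describes for confidence-based scenarios, and a larger `w_inc` tracks temporal drift faster. This plain coefficient averaging is an assumption for illustration. OLR-WA's own averaging of multivariate hyperplanes is not reproduced.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def weighted_update(base, incremental, w_base, w_inc):
    """Weighted average of (slope, intercept) pairs from the old and new models."""
    total = w_base + w_inc
    return tuple((w_base * b + w_inc * i) / total for b, i in zip(base, incremental))


# Base model from an initial batch, then one incoming batch, weighted 3:1 (conservative).
model = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])        # (2.0, 1.0)
batch_model = fit_line([0.0, 1.0], [2.0, 6.0])            # (4.0, 2.0)
model = weighted_update(model, batch_model, 3.0, 1.0)     # (2.5, 1.25)
```

Only the current coefficients are stored between batches, which is the storage saving that motivates online regression.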

2512.08879 2026-06-16 cs.LG cs.AI Updated

DAO-GP Drift Aware Online Non-Linear Regression Gaussian-Process

Mohammad Abu-Shaira, Ajita Rattani, Weishi Shi

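AI summary: Proposes DAO-GP, which addresses concept drift and fixed hyperparameters in online Gaussian process regression through built-in drift detection and adaptation, a hyperparameter-free design, sparsification, and a decay strategy; it is robust across several drift types and matches or outperforms existing methods.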

Journal ref
2025 IEEE International Conference on Big Data (BigData), pp. 776-785, 2025

Abstract

Real-world datasets often exhibit temporal dynamics characterized by evolving data distributions. Disregarding this phenomenon, commonly referred to as concept drift, can significantly diminish a model's predictive accuracy. Furthermore, the presence of hyperparameters in online models exacerbates this issue. These parameters are typically fixed and cannot be dynamically adjusted by the user in response to the evolving data distribution. Gaussian Process (GP) models offer powerful non-parametric regression capabilities with uncertainty quantification, making them ideal for modeling complex data relationships in an online setting. However, conventional online GP methods face several critical limitations, including a lack of drift-awareness, reliance on fixed hyperparameters, vulnerability to data snooping, absence of a principled decay mechanism, and memory inefficiencies. In response, we propose DAO-GP (Drift-Aware Online Gaussian Process), a novel, fully adaptive, hyperparameter-free, decayed, and sparse non-linear regression model. DAO-GP features a built-in drift detection and adaptation mechanism that dynamically adjusts model behavior based on the severity of drift. Extensive empirical evaluations confirm DAO-GP's robustness across stationary conditions, diverse drift types (abrupt, incremental, gradual), and varied data characteristics. Analyses demonstrate its dynamic adaptation, efficient in-memory and decay-based management, and evolving inducing points. Compared with state-of-the-art parametric and non-parametric models, DAO-GP consistently achieves superior or competitive performance, establishing it as a drift-resilient solution for online non-linear regression.
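
A minimal sketch of drift-aware online GP regression, assuming a zero-mean GP with an RBF kernel over a sliding window of recent points. Each new observation is scored by its standardized residual under the current posterior, and a large residual is treated as drift that resets the window. This is not DAO-GP: its hyperparameter-free kernel handling, decay mechanism, and evolving inducing points are not reproduced. `window`, `noise`, `length`, and `drift_z` are hypothetical settings.

```python
import math


def rbf(a, b, length=1.0):
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class WindowedDriftGP:
    """Zero-mean GP over a sliding window with a residual-based drift check."""

    def __init__(self, window=20, noise=1e-2, length=1.0, drift_z=3.0):
        self.window, self.noise, self.length, self.drift_z = window, noise, length, drift_z
        self.xs, self.ys = [], []

    def predict(self, x):
        """Posterior predictive mean and variance (including observation noise)."""
        if not self.xs:
            return 0.0, 1.0 + self.noise
        K = [[rbf(a, b, self.length) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.xs)] for i, a in enumerate(self.xs)]
        k = [rbf(x, a, self.length) for a in self.xs]
        mean = sum(ki * ai for ki, ai in zip(k, solve(K, self.ys)))
        var = 1.0 + self.noise - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
        return mean, max(var, 1e-12)

    def update(self, x, y):
        """Score the new point, reset on drift, then add it. Returns the drift flag."""
        mean, var = self.predict(x)
        drift = abs(y - mean) / math.sqrt(var) > self.drift_z
        if drift:
            self.xs, self.ys = [], []  # hypothetical reaction: forget the old regime
        self.xs.append(x)
        self.ys.append(y)
        self.xs, self.ys = self.xs[-self.window:], self.ys[-self.window:]
        return drift
```

Scoring each point before adding it avoids testing the model on data it has already seen, which addresses one form of the data-snooping concern the abstract raises.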

2510.19728 2026-06-16 cs.LG cs.AI Updated

Enabling Granular Subgroup Level Model Evaluations by Generating Synthetic Medical Time Series

Mahmoud Ibrahim, Bart Elen, Chang Sun, Gökhan Ertaylan, Michel Dumontier

Affiliations: Institute of Data Science, Faculty of Science and Engineering, Maastricht University; Department of Advanced Computing Sciences, Faculty of Science and Engineering, Maastricht University; VITO

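AI summary: Proposes a framework that uses synthetic ICU time-series data to train and evaluate predictive models, particularly within fine-grained demographic subgroups. It introduces Enhanced TimeAutoDiff, which adds distribution-alignment penalties to the latent diffusion objective, narrowing the gap between real-on-synthetic and real-on-real evaluation and making subgroup-level evaluation more robust and reliable.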

Abstract

We present a novel framework for leveraging synthetic ICU time-series data not only to train but also to rigorously and trustworthily evaluate predictive models, both at the population level and within fine-grained demographic subgroups. Building on prior diffusion and VAE-based generators (TimeDiff, HealthGen, TimeAutoDiff), we introduce Enhanced TimeAutoDiff, which augments the latent diffusion objective with distribution-alignment penalties. We extensively benchmark all models on MIMIC-III and eICU, on 24-hour mortality and binary length-of-stay tasks. Our results show that Enhanced TimeAutoDiff reduces the gap between real-on-synthetic and real-on-real evaluation (the "TRTS gap") by over 70%, achieving $\Delta_{\mathrm{TRTS}} \leq 0.014$ AUROC, while preserving training utility ($\Delta_{\mathrm{TSTR}} \approx 0.01$). Crucially, for 32 intersectional subgroups, large synthetic cohorts cut subgroup-level AUROC estimation error by up to 50% relative to small real test sets, and outperform them in 72-84% of subgroups. This work provides a practical, privacy-preserving roadmap for trustworthy, granular model evaluation in critical care, enabling robust and reliable performance analysis across diverse patient populations without exposing sensitive EHR data, contributing to the overall trustworthiness of Medical AI.
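
The TRTS gap compares one model trained on real data when it is scored on a real test set and when it is scored on a synthetic one. A small sketch, assuming AUROC is computed as the probability that a random positive outscores a random negative (the Mann-Whitney form, with ties counting one half) and the gap is the absolute AUROC difference. The scores below are made up.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def trts_gap(auc_real_test, auc_synthetic_test):
    """Absolute AUROC difference for one real-trained model across test sets."""
    return abs(auc_real_test - auc_synthetic_test)


# The same model's scores on a real and a synthetic test set (illustrative values).
real_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])       # 0.75
synth_auc = auroc([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7])       # 1.0
gap = trts_gap(real_auc, synth_auc)                          # 0.25
```

The same computation applied per subgroup shows why large synthetic cohorts help: with only a few real positives in a subgroup, the pairwise AUROC estimate rests on very few pairs.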

2410.13439 2026-06-16 cs.LG cs.CL cs.CV Updated

Similarity-Dissimilarity Loss for Multi-label Supervised Contrastive Learning

Guangming Huang, Yunfei Long, Cunjin Luo

Affiliations: University of Essex; Queen Mary University of London

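AI summary: Proposes a Similarity-Dissimilarity Loss that dynamically re-weights samples to resolve the ambiguity of choosing positive samples in multi-label settings, provides theoretical proofs, and unifies single-label and multi-label contrastive learning; experiments show it outperforms baselines on image, text, and medical data.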

Comments Accepted by Transactions on Machine Learning Research (TMLR)

Abstract

Supervised contrastive learning has achieved remarkable success by leveraging label information; however, determining positive samples in multi-label scenarios remains a critical challenge. In multi-label supervised contrastive learning (MSCL), multi-label relations are not yet fully defined, leading to ambiguity in identifying positive samples and formulating contrastive loss functions to construct the representation space. To address these challenges, we: (i) systematically formulate multi-label relations in MSCL, (ii) propose a novel Similarity-Dissimilarity Loss, which dynamically re-weights samples based on similarity and dissimilarity factors, (iii) further provide theoretically grounded proofs for our method through rigorous mathematical analysis that supports the formulation and effectiveness, and (iv) offer a unified form and paradigm for both single-label and multi-label supervised contrastive loss. We conduct experiments on both image and text modalities and further extend the evaluation to the medical domain. The results show that our method consistently outperforms baselines in comprehensive evaluations, demonstrating its effectiveness and robustness.
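
To show what re-weighting samples by label overlap means in a supervised contrastive loss, here is a sketch in which each other sample counts as a positive in proportion to the Jaccard overlap of its label set with the anchor's. The Jaccard weight is a hypothetical choice for illustration. The paper's similarity and dissimilarity factors are defined differently and are not reproduced. Embeddings are assumed L2-normalized, and `tau` is illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_supcon(anchor_idx, embeddings, label_sets, tau=0.5):
    """Supervised contrastive loss for one anchor, with each other sample weighted
    as a positive by label-set overlap (hypothetical Jaccard weighting)."""
    a = embeddings[anchor_idx]
    others = [j for j in range(len(embeddings)) if j != anchor_idx]
    logits = {j: dot(a, embeddings[j]) / tau for j in others}
    log_denom = math.log(sum(math.exp(v) for v in logits.values()))
    weights = {j: jaccard(label_sets[anchor_idx], label_sets[j]) for j in others}
    total = sum(weights.values())
    if total == 0:
        return 0.0  # anchor has no label overlap with any other sample
    return -sum(w * (logits[j] - log_denom) for j, w in weights.items()) / total
```

With single-label sets the weight is either 1 (same class) or 0 (different class), so this weighting reduces to the usual supervised contrastive positives, in the spirit of the unification the abstract describes.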

2509.19197 2026-06-16 cs.LG Updated

A Validation Strategy for Deep Learning Models: Evaluating and Enhancing Robustness

Abdul-Rauf Nuhu, Parham Kebria, Vahid Hemmati, Benjamin Lartey, Mahmoud Nabil Mahmoud, Abdollah Homaifar, Edward Tunstel

Affiliations: National Center for Atmospheric Research (NCAR)

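AI summary: Proposes extracting "weak robust" samples from the training data via local robustness analysis and using them to evaluate and improve model robustness; the approach is validated on CIFAR-10, CIFAR-100, and ImageNet.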

Abstract

Data-driven models, especially deep learning classifiers, often demonstrate great success on clean datasets. Yet they remain vulnerable to common data distortions such as adversarial and common corruption perturbations. These perturbations can significantly degrade performance, thereby challenging the overall reliability of the models. Traditional robustness validation typically relies on perturbed test datasets to assess and improve model performance. In our framework, however, we propose a validation approach that extracts "weak robust" samples directly from the training dataset via local robustness analysis. These samples, being the most susceptible to perturbations, serve as an early and sensitive indicator of the model's vulnerabilities. By evaluating models on these challenging training instances, we gain a more nuanced understanding of their robustness, which informs targeted performance enhancement. We demonstrate the effectiveness of our approach on models trained with CIFAR-10, CIFAR-100, and ImageNet, highlighting how robustness validation guided by weak robust samples can drive meaningful improvements in model reliability under adversarial and common corruption scenarios.
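
One simple notion of local robustness is how far a training sample sits from the nearest decision boundary. Samples closest to it are the weakest. For a linear multiclass classifier this distance has a closed form: the minimum over other classes k of (s_y - s_k) / ||w_y - w_k||. The sketch below ranks training samples by that distance. It is a stand-in under that linear assumption. For deep networks the margin is usually estimated with adversarial search, and the paper's exact local robustness analysis is not reproduced.

```python
import math


def boundary_distance(x, W, b, label):
    """Distance from x to the nearest decision boundary of a linear classifier
    (negative if x is already misclassified)."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + bi for w, bi in zip(W, b)]
    dists = []
    for k in range(len(W)):
        if k == label:
            continue
        diff = [wy - wk for wy, wk in zip(W[label], W[k])]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0:
            dists.append((scores[label] - scores[k]) / norm)
    return min(dists)


def weak_robust_samples(X, y, W, b, k):
    """Indices of the k training samples closest to a decision boundary."""
    ranked = sorted(range(len(X)), key=lambda i: boundary_distance(X[i], W, b, y[i]))
    return ranked[:k]
```

Evaluating the model on these indices before and after a robustness intervention gives a cheap, training-set-only signal of whether the weakest regions improved.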

2506.22530 2026-06-16 cs.LG cs.DB Updated

Task-Agnostic Contrastive Pretraining for Relational Deep Learning

Jakub Peleška, Gustav Šír

Affiliations: Czech Technical University in Prague

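AI summary: Proposes a task-agnostic contrastive pretraining method that improves representation learning on relational data with three levels of contrastive objectives; preliminary experiments show that fine-tuning the pretrained models outperforms training from scratch.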

Comments arXiv admin note: text overlap with arXiv:2506.22199

Abstract

Relational Deep Learning (RDL) is an emerging paradigm that leverages Graph Neural Network principles to learn directly from relational databases by representing them as heterogeneous graphs. However, existing RDL models typically rely on task-specific supervised learning, requiring a separate model to be trained for each predictive task, which may hamper scalability and reuse. In this work, we propose a novel task-agnostic contrastive pretraining approach for RDL that enables database-wide representation learning. To that end, we introduce three levels of contrastive objectives (row-level, link-level, and context-level) designed to capture the structural and semantic heterogeneity inherent to relational data. We implement the respective pretraining approach through a modular RDL architecture and an efficient sampling strategy tailored to the heterogeneous database setting. Our preliminary results on standard RDL benchmarks demonstrate that fine-tuning the pretrained models measurably outperforms training from scratch, validating the promise of the proposed methodology in learning transferable representations for relational data.
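The abstract names three levels of contrastive objectives but does not give their loss form. A common choice for such objectives is InfoNCE with in-batch negatives. The sketch below is a row-level version of that loss. It assumes each anchor row embedding is paired with an embedding of another view of the same row, and that all other rows in the batch serve as negatives. The pairing scheme, the cosine similarity, and the temperature of 0.1 are assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: List[Sequence[float]], positives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss with in-batch negatives.

    anchors[i] and positives[i] are assumed to be two views of the same row;
    every other positives[j] in the batch acts as a negative for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the matching pair
    return total / n


rows = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(rows, rows))        # close to 0: each anchor matches its own positive
print(info_nce(rows, rows[::-1]))  # close to 10: positives are mismatched
```

Link-level and context-level variants would presumably differ in how positive pairs are formed, for example rows linked by a foreign key or rows sharing a sampled subgraph. The loss itself could stay the same.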

2504.18179 2026-06-16 cs.CV cs.LG Updated version

Label-independent hyperparameter-free self-supervised single-view deep subspace clustering

Lovro Sindicic, Ivica Kopriva

Affiliations: Division of Computing and Data Science, Ruđer Bošković Institute

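AI summary: Proposes a single-view deep subspace clustering method that requires no hyperparameter tuning. It improves clustering performance through a layer-wise self-expression loss, subspace-structured norm optimization, a multi-stage learning framework, and a relative-error-based stopping mechanism.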

Comments 35 pages; 1 figure; 10 Tables

Abstract

Deep subspace clustering (DSC) algorithms face several challenges that hinder their widespread adoption across various application domains. First, clustering quality is typically assessed using only the encoder's output layer, disregarding valuable information present in the intermediate layers. Second, most DSC approaches treat representation learning and subspace clustering as independent tasks, limiting their effectiveness. Third, they assume the availability of a held-out dataset for hyperparameter tuning, which is often impractical in real-world scenarios. Fourth, learning termination is commonly based on clustering error monitoring, requiring external labels. Finally, their performance often depends on post-processing techniques that rely on labeled data. To address these limitations, we introduce a novel single-view DSC approach that: (i) minimizes a layer-wise self-expression loss using a joint representation matrix; (ii) optimizes a subspace-structured norm to enhance clustering quality; (iii) employs a multi-stage sequential learning framework, consisting of pre-training and fine-tuning, enabling the use of multiple regularization terms without hyperparameter tuning; (iv) incorporates a relative error-based self-stopping mechanism to terminate training without labels; and (v) retains a fixed number of leading coefficients in the learned representation matrix based on prior knowledge. We evaluate the proposed method on six datasets representing faces, digits, and objects. The results show that our method outperforms most linear subspace clustering (SC) algorithms with carefully tuned hyperparameters while maintaining competitive performance with the best-performing linear approaches.
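The sketch below covers three of the ingredients listed as (i), (iv), and (v):

- a layer-wise self-expression loss with one shared representation matrix,
- retention of the k leading coefficients per column,
- a label-free relative-error stopping rule.

It assumes the standard self-expression model X ≈ XC, with samples as the columns of X and a zero-diagonal C shared across layers. The subspace-structured norm and the multi-stage schedule are omitted. All names and the tolerance value are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def self_expression_loss(X: Matrix, C: Matrix) -> float:
    """Squared Frobenius norm ||X - X C||_F^2; columns of X are samples, diag(C) assumed 0."""
    XC = matmul(X, C)
    return sum((X[i][j] - XC[i][j]) ** 2 for i in range(len(X)) for j in range(len(X[0])))


def layerwise_loss(layer_outputs: List[Matrix], C: Matrix) -> float:
    """Sum of self-expression losses over layers, all sharing one representation matrix C."""
    return sum(self_expression_loss(Z, C) for Z in layer_outputs)


def keep_leading(C: Matrix, k: int) -> Matrix:
    """Keep the k largest-magnitude coefficients in each column of C; zero out the rest."""
    n_rows, n_cols = len(C), len(C[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        top = sorted(range(n_rows), key=lambda i: abs(C[i][j]), reverse=True)[:k]
        for i in top:
            out[i][j] = C[i][j]
    return out


def should_stop(prev_loss: float, loss: float, tol: float = 1e-4) -> bool:
    """Label-free stopping rule: stop when the relative change in loss drops below tol."""
    return abs(prev_loss - loss) / max(abs(prev_loss), 1e-12) < tol


# Two identical hypothetical samples (columns) express each other exactly.
X = [[1.0, 1.0], [2.0, 2.0]]
C = [[0.0, 1.0], [1.0, 0.0]]
print(self_expression_loss(X, C))  # 0.0
```

The stopping rule needs only the training loss, so it does not require the clustering labels that error-based stopping depends on.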

2403.19444 2026-06-16 cs.LG cs.CV Updated version

Leveraging Expert Input for Robust and Explainable AI-Assisted Lung Cancer Detection in Chest X-rays

Amy Rafferty, Rishi Ramaesh, Ajitha Rajan

Affiliations: School of Informatics, University of Edinburgh; NHS Lothian

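AI summary: Studies the interpretability and robustness of an InceptionV3-based lung cancer detection model. Proposes ClinicXAI, an expert-driven concept bottleneck approach that produces clinically relevant explanations and shows greater robustness under adversarial attacks.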

Abstract

Deep learning models show significant potential for advancing AI-assisted medical diagnostics, particularly in detecting lung cancer through medical image modalities such as chest X-rays. However, the black-box nature of these models poses challenges to their interpretability and trustworthiness, limiting their adoption in clinical practice. This study examines both the interpretability and robustness of a high-performing lung cancer detection model based on InceptionV3, utilizing a public dataset of chest X-rays and radiological reports. We evaluate the clinical utility of multiple explainable AI (XAI) techniques, including both post-hoc and ante-hoc approaches, and find that existing methods often fail to provide clinically relevant explanations, displaying inconsistencies and divergence from expert radiologist assessments. To address these limitations, we collaborated with a radiologist to define diagnosis-specific clinical concepts and developed ClinicXAI, an expert-driven approach leveraging the concept bottleneck methodology. ClinicXAI generated clinically meaningful explanations which closely aligned with the practical requirements of clinicians while maintaining high diagnostic accuracy. We also assess the robustness of ClinicXAI in comparison to the original InceptionV3 model by subjecting both to a series of widely utilized adversarial attacks. Our analysis demonstrates that ClinicXAI exhibits significantly greater resilience to adversarial perturbations. These findings underscore the importance of incorporating domain expertise into the design of interpretable and robust AI systems for medical diagnostics, paving the way for more trustworthy and effective AI solutions in healthcare.
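ClinicXAI builds on the concept bottleneck methodology. The sketch below illustrates that general idea, not ClinicXAI itself, whose radiologist-defined concepts and architecture are specified in the paper. A concept bottleneck model first predicts human-defined concepts from the input. It then predicts the label from those concepts alone. Because of this structure, each prediction can be explained in terms of the concepts, and a clinician can override a concept and re-run only the label head. The tiny logistic layers stand in for an image backbone, and the weights and concept names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class ConceptBottleneck:
    """Input -> concept probabilities -> label probability.

    The label head sees only the concept vector, so each prediction can be
    read (and corrected) in terms of the named concepts.
    """

    def __init__(self, concept_w: List[List[float]], concept_b: List[float],
                 label_w: List[float], label_b: float) -> None:
        self.concept_w, self.concept_b = concept_w, concept_b
        self.label_w, self.label_b = label_w, label_b

    def concepts(self, x: Sequence[float]) -> List[float]:
        return [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.concept_w, self.concept_b)]

    def predict_from_concepts(self, c: Sequence[float]) -> float:
        return sigmoid(sum(w * ci for w, ci in zip(self.label_w, c)) + self.label_b)

    def predict(self, x: Sequence[float]) -> Tuple[float, List[float]]:
        c = self.concepts(x)
        return self.predict_from_concepts(c), c


# Hypothetical two-concept model (e.g. "nodule present", "mass present").
model = ConceptBottleneck([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, 2.0], -2.0)
p, c = model.predict([0.0, 0.0])
print(p, c)  # 0.5 [0.5, 0.5]
# A clinician can override a concept value and re-run only the label head:
print(model.predict_from_concepts([1.0, c[1]]))
```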

2408.06350 2026-06-16 cs.HC cs.LG Updated version

Predicting cognitive load in immersive driving scenarios with a hybrid CNN-RNN model

Mehshan Ahmed Khan, Houshyar Asadi, Mohammad Reza Chalak Qazani, Adetokunbo Arogbonlo, Saeid Nahavandi, Chee Peng Lim

Affiliations: Institute for Intelligent Systems Research and Innovation; Faculty of Computing and Information Technology (FoCIT); Swinburne University of Technology

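AI summary: Proposes a hybrid CNN-RNN model that predicts three levels of cognitive load from fNIRS, eye-tracking, and driving behavior data, improving prediction accuracy.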

Comments 17 pages

Abstract

One debatable issue in traffic safety research is that cognitive load from secondary tasks reduces primary task performance, such as driving. Although physiological signals have been extensively used in driving-related research to assess cognitive load, only a few studies have specifically focused on high cognitive load scenarios. Most existing studies tend to examine moderate or low levels of cognitive load. In this study, we adopted a three-level auditory n-back task as a cognitively loading secondary task while participants drove in a driving simulator. During the simultaneous execution of driving and the n-back task, we recorded fNIRS, eye-tracking, and driving behavior data to predict cognitive load at three different levels. To the best of our knowledge, this combination of data sources has never been used before. Unlike most previous studies, which use binary classification of cognitive load and driving in conditions without traffic, our study involved three levels of cognitive load, with drivers operating in normal traffic conditions under low visibility, specifically during nighttime and rainy weather. We proposed a hybrid neural network combining a 1D Convolutional Neural Network and a Recurrent Neural Network to predict cognitive load. Our experimental results demonstrate that the proposed model, with fewer parameters, increases accuracy from 99.82% to 99.99% using physiological data, and from 87.26% to 92.02% using driving behavior data alone. This significant improvement highlights the effectiveness of our hybrid neural network in accurately predicting cognitive load during driving under challenging conditions.
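The abstract describes a hybrid of a 1D CNN and an RNN but does not give its layers or sizes. The sketch below is a dependency-free illustration of the general pattern, assuming a single input channel:

- a 1D convolution extracts local temporal features,
- a recurrent layer summarizes them over time,
- a softmax head scores three load levels.

All weights, window values, and function names are hypothetical. A real model would likely use a deep learning framework with learned multi-channel filters, and possibly gated recurrent units.

```python
import math
from typing import List, Sequence, Tuple


def conv1d_relu(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1D convolution (cross-correlation, as in deep learning libraries) plus ReLU."""
    k = len(kernel)
    return [max(0.0, sum(signal[t + i] * kernel[i] for i in range(k)) + bias)
            for t in range(len(signal) - k + 1)]


def rnn_final_state(seq: Sequence[float], w_in: float, w_rec: float, b: float = 0.0) -> float:
    """Scalar Elman RNN, h_t = tanh(w_in * x_t + w_rec * h_{t-1} + b); returns the last h_t."""
    h = 0.0
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h + b)
    return h


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_load(signal: Sequence[float], kernel: Sequence[float], w_in: float, w_rec: float,
                 head: Sequence[Tuple[float, float]]) -> List[float]:
    """Probabilities over three load levels: conv features -> RNN summary -> linear head."""
    features = conv1d_relu(signal, kernel)
    h = rnn_final_state(features, w_in, w_rec)
    return softmax([a * h + c for a, c in head])


# Hypothetical single-channel window and hand-picked weights.
window = [0.1, 0.4, 0.3, 0.8, 0.6, 0.9]
probs = predict_load(window, kernel=[0.5, 0.5], w_in=1.0, w_rec=0.5,
                     head=[(-2.0, 1.0), (0.0, 0.0), (2.0, -1.0)])
print([round(p, 3) for p in probs])
```

The convolution produces a shorter, denoised sequence before the recurrence, which is one common reason for pairing the two layer types on physiological time series.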

2401.06644 2026-06-16 cs.LG eess.SP Updated version

SeizNet: An AI-enabled Implantable Sensor Network System for Seizure Prediction

Ali Saeizadeh, Douglas Schonholtz, Daniel Uvaydov, Raffaele Guida, Emrecan Demirors, Pedram Johari, Jorge M. Jimenez, Joseph S. Neimat, Tommaso Melodia

Affiliations: Institute for the Wireless Internet of Things, Northeastern University, Boston, MA, U.S.A.; University of Louisville, Louisville, KY, U.S.A.

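AI summary: SeizNet applies deep learning to multi-sensor (iEEG and ECG) data to improve the specificity of seizure prediction while preserving high sensitivity. It reaches up to 99% prediction accuracy and offers a new avenue for treating refractory epilepsy.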

Comments 4 pages, 4 figures, 1 table

Journal ref 2024 19th Wireless On-Demand Network Systems and Services Conference (WONS)

Abstract

In this paper, we introduce SeizNet, a closed-loop system for predicting epileptic seizures through the use of Deep Learning (DL) methods and implantable sensor networks. While pharmacological treatment is effective for some epilepsy patients (with ~65M people affected worldwide), one in three patients suffers from drug-resistant epilepsy. To alleviate the impact of seizures, predictive systems have been developed that can notify such patients of an impending seizure, allowing them to take precautionary measures. SeizNet leverages DL techniques and combines data from multiple sensors, specifically intracranial electroencephalogram (iEEG) and electrocardiogram (ECG), which can significantly improve the specificity of seizure prediction while preserving very high levels of sensitivity. SeizNet DL algorithms are designed for efficient real-time execution at the edge, minimizing data privacy concerns, data transmission overhead, and power inefficiencies associated with cloud-based solutions. Our results indicate that SeizNet outperforms traditional single-modality and non-personalized prediction systems in all metrics, achieving up to 99% accuracy in predicting seizures and offering a promising new avenue in refractory epilepsy treatment.
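The abstract reports gains in specificity and sensitivity alongside an accuracy figure. The sketch below shows how these standard metrics follow from a binary confusion matrix. It assumes window-level counting, and the paper may count events or time windows differently. All counts are hypothetical. When seizure windows are rare, accuracy is dominated by the majority class: a trivial "never predict a seizure" rule can score high accuracy with zero sensitivity. This is why the two class-conditional rates are usually reported separately.

```python
from typing import Dict


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def prediction_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy from binary confusion-matrix counts.

    tp/fn: seizure (positive) windows predicted correctly/missed;
    tn/fp: non-seizure windows predicted correctly/flagged as false alarms.
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fn + tn + fp),
    }


# Hypothetical, imbalanced counts: few seizure windows, many normal ones.
m = prediction_metrics(tp=9, fn=1, tn=950, fp=40)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.9, 'specificity': 0.96, 'accuracy': 0.959}

trivial = prediction_metrics(tp=0, fn=10, tn=990, fp=0)  # always predicts "no seizure"
print(trivial["accuracy"], trivial["sensitivity"])  # 0.99 0.0
```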