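arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.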

2606.14934 2026-06-16 cs.LG cs.AI new submission

Separable Neural Architectures as Physical World Models: from Mathematical Theory to Applications

Reza T Batley, Andrew Kichline, Sourav Saha

Affiliation: Kevin T. Crofton Department of Aerospace and Ocean Engineering, Virginia Polytechnic Institute and State University

AI summary: Proposes the Separable Neural Architecture (SNA), which combines neural approximation with tensor decomposition and solves partial differential equations within a variational framework, achieving algebraic scaling on high-dimensional problems and substantial speedups in engineering case studies.

Abstract

This work introduces the Separable Neural Architecture (SNA), a function representational class combining neural approximation with tensor decomposition. The SNA decouples localized coordinate functions (atoms) from global interactions governed by a sparse, low-rank interaction object. This architecture possesses a compact and smooth inductive bias well-suited for solving partial differential equations (PDEs). When viewed as a Galerkin trial space under the variational SNA (VSNA) framework, the formulation satisfies classical variational guarantees under Lax-Milgram: well-posedness, quasi-optimality, convergence, and stability. In high-dimensional spatiotemporal-parametric PDEs, the VSNA mitigates the curse of dimensionality by scaling algebraically rather than exponentially. Exploiting an entirely factorized, tensor-native alternating least squares (ALS) optimization framework reduces this cost to linear in dimension. The VSNA is validated across elliptic, hyperbolic, and parabolic systems, demonstrating close alignment with predicted algebraic and spectral scaling rates. We showcase the SNA as a "solve once, query anywhere" physical world model via two engineering case studies: a 7D parametric manufacturing simulation and an experimental thermal-to-property inversion pipeline for Inconel 718. The VSNA executes a 1,000,000-query Monte Carlo sweep in 102 s on a standard laptop CPU, yielding a 150,000x speedup over a full-grid finite element baseline hosted on an NVIDIA A100 GPU. It further enables real-time generative inverse-mode reconstructions under 100 ms. These results demonstrate that the SNA serves as a compact mathematical substrate for continuous parameter manifolds to enable real-time inversion, optimization loops, and rapid uncertainty propagation.

2606.15036 2026-06-16 cs.LG math.NT new submission

Transformers Learn the Mestre-Nagao Heuristic

Pranav Venkata Konda

Affiliation: Pranav Venkata Konda

AI summary: Trains a two-layer Transformer encoder to classify rational elliptic curves as rank 0 or rank 1 with >99% accuracy. Mechanistic interpretability shows that the model learned the Mestre-Nagao sum heuristic weights and that the CLS embedding encodes the logarithm of L(E,1).

Comments 15 pages, 10 figures

Abstract

We train a two-layer transformer encoder to classify rational elliptic curves $E/\mathbb{Q}$ of conductor $\leq 10000$ as either rank 0 or rank 1 from the first 128 normalized Frobenius traces. We achieve >99% accuracy on both classes, and accuracy is essentially unchanged on test curves with no isogeny or quadratic-twist relative in the training set. We then apply techniques from mechanistic interpretability such as attention analysis, linear probing, activation patching, logit attribution, and neuron-level circuit analysis to reverse-engineer the algorithm the (centroid in function space) model learned. We find that a sparse circuit of 20 out of 512 layer-1 MLP neurons is sufficient for rank prediction under a linear probe with an AUROC of 0.992 at plateau, implementing a push-pull detector architecture of rank-0 and rank-1 detectors with a one-sided readout. However, we notice that the model has sub-optimal readout problems indicating a mismatch in rank-order between the readout pathway and the discriminative circuit. Critically, the learned input weights of the top discriminating neuron match the Mestre-Nagao sum heuristic weights $\log(p)/(p\cdot \log{B})$ with a Spearman coefficient $r = 0.997$ and Pearson coefficient $r = 0.952$: the model has learnt a result from analytic number theory from the Frobenius trace data alone. We additionally find that all 50 independently trained models concentrate CLS attention on prime positions at 2-50$\times$ the rate of composite positions. The CLS embedding encodes $\log{L(E,1)}$ with $R^2 = 0.962\pm 0.011$ across the 50 models (after controlling for the conductor). Activation patching analysis reveals that attention weights are dissociated from causal information flow. Additionally, the 50 solutions from training are near-identical in function space (with pairwise agreement $>$98.8%) despite large weight space barriers.
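The weight profile quoted in the abstract, $\log(p)/(p\cdot \log{B})$, is easy to tabulate. The sketch below is only an illustration of that formula: the function names, the sieve helper, and the choice of indexing weights by primes up to a bound `B` are assumptions made here, not details taken from the paper.

```python
import math


def primes_up_to(n):
    """Return all primes <= n (n >= 2) with a simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def mestre_nagao_weights(bound):
    """Map each prime p <= bound to the weight log(p) / (p * log(bound))."""
    log_b = math.log(bound)
    return {p: math.log(p) / (p * log_b) for p in primes_up_to(bound)}


if __name__ == "__main__":
    for p, w in mestre_nagao_weights(30).items():
        print(p, round(w, 4))
```

The weights peak at $p = 3$ and decay afterwards, so under this formula the small primes carry most of the signal.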

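2606.15085 2026-06-16 cs.LG new submission

An Integrable Token Mixing Layer from the Generalized Yang Baxter Equation

Snigdha Chandan Khilar

Affiliation: Independent Researcher

AI summary: Proposes YB Mixer, a sequence token-mixing layer built on free fermions and a generalized Yang-Baxter structure. It uses the local algebraic constraints of integrable systems to guarantee global computational stability, and it realizes norm-preserving orthogonal maps, commuting transfer matrices, and a spectral cyclic generator to support inference on variable-length sequences.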

2606.15207 2026-06-16 cs.LG cs.AI cs.NE new submission

Controlled Dynamics Attractor Transformer

Cheng Zhang, Minnan Luo, Zesheng Yang, Ming Li, Yong-Jin Liu, Qinghua Zheng

Affiliations: Xi'an Jiaotong University; Tsinghua University

AI summary: Proposes the Controlled Dynamics Attractor Transformer (CDAT), which couples a mixture von Mises-Fisher attention energy with a Hopfield refinement energy and adds CANN-inspired excitation-inhibition modulation to form a topology-constrained dynamical system, reaching state-of-the-art performance on graph anomaly detection and graph classification.

Comments 20 pages, 3 figures

Journal ref: Forty-Third International Conference on Machine Learning (ICML 2026)

Abstract

Transformer architectures have dramatically advanced representation learning and inference in deep models through self-attention mechanisms. In parallel, associative memory (AM) frameworks map representations onto energy landscapes, offering interpretable retrieval mechanisms. However, their continuous-time inference dynamics lack the biological plausibility of classical Continuous Attractor Neural Networks (CANNs). To bridge this gap, we propose the Controlled Dynamics Attractor Transformer (CDAT), which couples a mixture von Mises-Fisher (Mo-vMF) attention energy with a Hopfield refinement energy, while augmenting energy descent with a CANN-inspired excitation-inhibition modulation. CDAT instantiates a topology-constrained dynamical system whose couplings encode relational structure among tokens, thereby linking attractor-style dynamics to modern energy-based attention. We further provide a constructive dissipation analysis to formally establish its controlled inference dynamics. Benefiting from these robust and structured dynamics, CDAT achieves state-of-the-art performance across multiple benchmarks in graph anomaly detection and graph classification.

2606.15576 2026-06-16 cs.LG cs.AI new submission

Localizing Credit at the Divergence: Path-Conditioned Self-Distillation for LLM Reasoning

Yu Li, Shu Hong, Tian Lan

Affiliation: Department of Electrical and Computer Engineering, George Washington University

AI summary: Proposes Hindsight Self-Distillation (HSD), which conditions the teacher on a successful peer rollout from the current training group, providing a dense credit signal where failed and successful rollouts diverge and improving LLM performance on math and code reasoning tasks.

Abstract

Reinforcement learning from verifiable rewards assigns a single scalar to each rollout, leaving token-level credit assignment underspecified in long reasoning traces. On-policy self-distillation addresses this by letting the same model act as a teacher conditioned on privileged information, producing a dense per-token signal. But the common choice of a ground-truth answer is only an endpoint cue: on terse-answer tasks, the teacher falls silent at the intermediate positions where path-level guidance matters most. We propose Hindsight Self-Distillation (HSD), which conditions the teacher on a successful peer rollout drawn from the current training group. Such a peer is an exact sample from the success-conditioned policy, requiring no additional sampled rollouts. By providing a full successful continuation rather than only the final answer, the resulting credit signal concentrates at the divergence position between a failed rollout and a successful peer. Across Qwen3-8B and Qwen3-32B on math and code benchmarks, HSD obtains the best result against GRPO variants and on-policy distillation baselines, with the largest gains on terse-answer tasks such as AIME.
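To make the phrase "divergence position" concrete, the toy sketch below finds the first index at which a failed rollout and a successful peer differ. It is an illustration only: the abstract describes a credit signal produced by a teacher conditioned on the peer rollout, not by token matching, and the function name and token-ID representation are assumptions made here.

```python
def first_divergence(failed_tokens, peer_tokens):
    """Return the first index where the two token sequences differ.

    If one sequence is a prefix of the other, the length of the shorter
    sequence is returned (the point where the shared prefix ends).
    """
    for i, (a, b) in enumerate(zip(failed_tokens, peer_tokens)):
        if a != b:
            return i
    return min(len(failed_tokens), len(peer_tokens))


if __name__ == "__main__":
    failed = [11, 42, 7, 99, 3]   # hypothetical token IDs of a failed rollout
    peer = [11, 42, 8, 15, 3, 2]  # hypothetical token IDs of a successful peer
    print(first_divergence(failed, peer))
```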

2606.15669 2026-06-16 cs.LG cs.AI new submission

Z-Plane Neural Networks: Bounded Geometric Activation Replaces ReLU and LayerNorm

Sungwoo Goo, Hwi-yeol Yun, Sangkeun Jung

Affiliations: College of Pharmacy, Chungnam National University; Department of Computer Science & Engineering, Chungnam National University

AI summary: Proposes Z-Plane neural networks, which use the bounded geometric activation Radial Bounding to map hidden states to bundles of 2D phasors on a hypersphere, bounding the energy magnitude while preserving directional information. The authors prove that the activation is 1-Lipschitz and prevents vanishing gradients, and show experimentally that a 100-layer MLP without ReLU or LayerNorm converges stably on MNIST.

Abstract

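Modern deep neural networks rely on Euclidean scalar activations (such as ReLU) and global normalization techniques (such as LayerNorm) to prevent gradient instability in deep architectures. However, these mechanisms inherently cause dying neurons, discard critical directional information, and destroy the orthogonality of feature representations. Inspired by frequency-modulated transmission in biological axons, we propose Z-Plane neural networks, which map hidden states to bundles of 2D phasors on a hypersphere. We introduce a novel geometric activation function, Radial Bounding ($\mathbf{x} / \max(1, \|\mathbf{x}\|_2)$), which bounds the energy magnitude while preserving phase (direction). We prove mathematically that this isotropic activation is 1-Lipschitz and prevents vanishing gradients by preserving tangential gradients. Empirically, a 100-layer Z-Plane multilayer perceptron (MLP) containing no ReLU or LayerNorm converges on the MNIST dataset, reaching 98.34% accuracy with absolute numerical stability, which shows that bounded geometric activation alone suffices for stable deep learning.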
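The Radial Bounding formula in the abstract can be written down directly. The sketch below applies it to a single vector given as a list of floats; how the paper groups coordinates into 2D phasors before applying the activation is not stated in the abstract, so treating the whole input as one vector is an assumption made here.

```python
import math


def radial_bounding(x):
    """Compute x / max(1, ||x||_2) for a vector given as a list of floats.

    Vectors inside the unit ball pass through unchanged; longer vectors are
    rescaled onto the unit sphere, so the direction is always preserved.
    """
    norm = math.sqrt(sum(v * v for v in x))
    scale = max(1.0, norm)
    return [v / scale for v in x]


if __name__ == "__main__":
    print(radial_bounding([3.0, 4.0]))  # norm 5, rescaled to unit length
    print(radial_bounding([0.3, 0.4]))  # norm 0.5, left unchanged
```

Because the map is the identity inside the unit ball and a projection onto the sphere outside it, the output norm never exceeds 1.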

Phys-JEPA: A Physics-Informed Latent World Model for Multivariate Time-Series Forecasting

Weizhi Nie, Weichao Liu, Honglin Guo, Yuting Su

Affiliation: Tianjin University

AI summary: Proposes the Phys-JEPA architecture, which imposes physical-consistency constraints on latent states and state transitions and decomposes predictive states into physical and residual components, improving forecasting accuracy on climate, traffic, and electricity datasets.

Comments Submitted to arXiv as a preliminary manuscript. 10 figures

Abstract

Multivariate forecasting in physical systems requires models that predict coupled temporal variables while preserving meaningful state evolution. Deep forecasters can fit temporal correlations, and physics-informed models can regularize predictions with scientific constraints, but these directions are often connected only at the decoded-output level. As a result, the hidden predictive state that generates future trajectories may remain statistically useful but physically unstructured. We introduce Phys-JEPA, a physics-informed joint-embedding predictive architecture for multivariate time-series forecasting. Phys-JEPA learns a latent world model in which predictive states are decomposed into physical and residual components, and physical consistency is imposed directly on latent states and latent transitions rather than only on decoded forecasts. This formulation uses known physical variables to organize the representation space while retaining residual capacity for unresolved dynamics. On Jena Climate 2009-2016, Phys-JEPA reduces aggregate MSE from 0.12482 to 0.12273 and temperature MSE from 0.01892 to 0.01831 at H=24. On Traffic, full Phys-JEPA improves aggregate MSE over the supervised baseline across all tested horizons, reducing H=192 MSE from 0.800784 to 0.773873. On Electricity, the best variant depends on horizon: static latent consistency is strongest at H=24 and H=48, while full Phys-JEPA gives the best aggregate and target-variable MSE at H=192. These initial results suggest that moving physics-informed learning from output space to latent predictive state space is a promising direction for interpretable temporal world models.

2606.16112 2026-06-16 cs.LG cs.AI new submission

Scaling Adaptive Depth with Norm-Agnostic Residual Networks

Tomás Figliolia, Beren Millidge

Affiliation: Zyphra, San Francisco, CA

AI summary: Addresses the growth of the residual-stream norm with depth, which suppresses updates from deeper layers, by proposing NAG, a norm-agnostic residual architecture that separates magnitude from directional information to preserve each layer's contribution. NAG also yields an interpretable adaptive depth-skipping mechanism that matches full-depth performance at equal compute.

Abstract

Residual architectures are ubiquitous in deep learning, but they suffer from a subtle structural limitation: the norm of the residual stream can grow rapidly with depth. As a result, updates from later layers become small relative to the accumulated residual state. This reduces their impact on the representation and limits the benefits of scaling models in depth. To address this, we introduce NAG, a norm-agnostic residual architecture that separates magnitude from directional information in the residual stream, preserving meaningful layer contributions throughout depth and preventing later updates from being systematically suppressed by residual-norm growth. Importantly, NAG introduces only a negligible number of additional parameters and relies on simple operations that are easily kernel-fusible, preserving training efficiency in practice. We show that this architecture outperforms baseline Transformers, with gains that increase substantially as depth grows, enabling effective training of much deeper models. The norm-agnostic formulation also leads to an interpretable Mixture-of-Depths (MoD) mechanism that adaptively skips both attention and MLP layers. Beyond serving as a post-training accuracy-compute tradeoff, this mechanism can be used as a pretraining-time scaling strategy: under iso-FLOP training, compute saved by reducing per-token forward-pass cost can be reinvested into training on more tokens while keeping the total parameter count and KV-cache budget fixed. In our experiments, moderate Mixture-of-Depths rates of approximately 20%-25% match full-depth baseline performance under equal training compute while substantially reducing the number of executed layer parameters and forward-pass FLOPs. These results identify sparsity in depth as a new scaling axis for fixed-compute training, enabling very deep yet FLOP-efficient models.

2606.16231 2026-06-16 cs.LG cs.AI new submission

CacheMuon: Approximating Polar Factors with Temporal Preconditioning

Bishnu Dev, Sushil Bohara, Martin Takáč, Samuel Horváth

Affiliation: Mohamed bin Zayed University of Artificial Intelligence

AI summary: Proposes CacheMuon, which caches polar factors from earlier optimization steps to reduce the cost of Newton-Schulz iterations in the Muon optimizer, lowering orthogonalization compute while maintaining training quality.

Abstract

Muon is an optimizer that computes updates using the polar factor of the momentum matrix and has shown strong empirical performance across a range of training settings. A key component of Muon is the Newton-Schulz iteration used to compute this polar factor. Although this avoids the cost of an exact singular value decomposition, it remains expensive in practice because it is applied at every optimization step. At the same time, the momentum matrix changes smoothly over training, suggesting strong temporal correlation in the corresponding polar factors. In this paper, we exploit this structure and propose CacheMuon, a temporal preconditioning method that reuses information from previous optimization steps to approximate the polar factor at the current step. This reduces redundant orthogonalization computation across iterations. We analyze CacheMuon as an inexact Muon update, with error controlled by fresh-solver error and cache staleness. Empirically, CacheMuon provides a controllable quality-efficiency frontier: conservative thresholds closely match fresh Muon on language-model and vision training while reducing orthogonalization FLOPs, whereas more aggressive thresholds yield larger arithmetic savings at the cost of modest validation-quality degradation.
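For readers unfamiliar with the polar factor, the sketch below computes it with the classical cubic Newton-Schulz iteration in NumPy. This is a textbook variant chosen here for illustration; it is not necessarily the iteration or the coefficients that Muon or CacheMuon use, and the caching scheme itself is not shown because the abstract does not specify it. The function name and iteration count are assumptions.

```python
import numpy as np


def newton_schulz_polar(m, num_iters=30):
    """Approximate the polar factor U V^T of a full-rank matrix m.

    Uses the classical cubic iteration X <- 1.5 X - 0.5 X X^T X, started
    from m scaled by its Frobenius norm so all singular values are <= 1.
    """
    x = m / np.linalg.norm(m)  # Frobenius norm for 2-D input
    for _ in range(num_iters):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x


if __name__ == "__main__":
    m = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    q = newton_schulz_polar(m)
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    print(np.max(np.abs(q - u @ vt)))  # gap to the SVD-based polar factor
```

Each iteration pushes every singular value toward 1, which is why the cost grows with the number of iterations and why reusing work across optimizer steps is attractive.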

2606.16388 2026-06-16 cs.LG new submission

Robust Neural Tucker Factorization with Bias Correction and Adaptive Initialization

Yuchao Su, Yixin Ran

Affiliations: School of Computer Science and Engineering, Chongqing University of Science and Technology; College of Computer and Information Science, School of Software, Southwest University

AI summary: Proposes the KaBiN model, which combines Kaiming initialization with bias correction to address the optimization instability caused by poor initialization and a missing bias term in high-dimensional incomplete tensor completion.

Comments 9 pages, 3 figures, 106 conferences

Abstract

High-dimensional incomplete (HDI) tensors are widely used in traffic and climate applications, but sparse observations make accurate completion difficult. The intrinsic non-linear dynamics and non-stationary variations across distinct multi-modal fields severely hinder the efficacy of conventional linear reconstruction frameworks. Neural Tucker factorization provides an effective framework for modeling high-order interactions among tensor modes. By parameterizing underlying structural characteristics into continuous latent spaces, neural representations circumvent the rigid low-rank constraints of classical algebra. However, its performance can still be affected by implementation-level choices, especially parameter initialization and the bias configuration of the final output mapping. Suboptimal initializations frequently lead to variance explosion across the cubically expanded interaction spaces, driving the subsequent non-linear activation boundaries into severe gradient saturation zones, while the omission of a dedicated translation parameter forces interaction weights to implicitly absorb global statistical deviations. This paper proposes a simple yet effective neural Tucker factorization model with Kaiming initialization and bias correction (KaBiN) for HDI tensor completion. The proposed model utilizes Kaiming uniform initialization for the embedding and Tucker linear parameters, and adopts a simple bias correction in output mapping. By elegantly decoupling global mean shifts from local structural representations, the framework provides a highly stable and well-conditioned optimization landscape. Experiments on three real-world HDI tensor datasets show that KaBiN achieves better performance than the original NeuTucF, while introducing minimal computational overhead.
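Kaiming uniform initialization, which the abstract names, draws weights from $U(-b, b)$ with $b = \text{gain}\cdot\sqrt{3/\text{fan\_in}}$. The sketch below shows that rule in plain Python. The function name, the default gain of $\sqrt{2}$ (the usual ReLU choice), and the seed are assumptions made here; the abstract does not say which gain or fan mode KaBiN uses, and its bias correction is not reproduced.

```python
import math
import random


def kaiming_uniform(fan_out, fan_in, gain=math.sqrt(2.0), seed=0):
    """Sample a fan_out x fan_in weight matrix from U(-bound, bound).

    bound = gain * sqrt(3 / fan_in), which gives each weight a variance
    of gain**2 / fan_in.
    """
    rng = random.Random(seed)
    bound = gain * math.sqrt(3.0 / fan_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
               for _ in range(fan_out)]
    return weights, bound


if __name__ == "__main__":
    weights, bound = kaiming_uniform(fan_out=4, fan_in=6)
    print(bound)
    print(weights[0])
```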

2606.16429 2026-06-16 cs.LG cs.CL new submission

Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation

Zhongzhu Zhou, Qingyang Wu, Junxiong Wang, Mayank Mishra, Shuaiwen Leon Song, Ben Athiwaratkun, Chenfeng Xu

Affiliations: The University of Sydney; Together AI; University of California, Berkeley; The University of Texas at Austin; Microsoft

AI summary: Proposes Taylor-Calibrate, which initializes hybrid linear attention student models from Taylor-guided teacher attention statistics, substantially reducing the number of training tokens needed for distillation.

Comments 24 pages, 9 figures

Abstract

Hybrid linear attention models offer an appealing path to faster long-context inference: they reduce the quadratic cost and KV-cache burden of full softmax attention while retaining much of the quality of Transformer models. A practical way to obtain such models is to convert a pretrained Transformer instead of pretraining a new architecture from scratch, but this conversion is still brittle. Simply copying the teacher attention projections into a Gated DeltaNet (GDN) student does not specify the new recurrent decay, write, and output-gating dynamics. As a result, the converted model often starts in a poor dynamical regime and must spend many distillation tokens repairing initialization rather than learning the remaining teacher behavior. We propose Taylor-Calibrate, a lightweight initialization method for hybrid GDN students. The method uses Taylor-guided teacher attention statistics to set the value projection, memory timescale, write gates, and output gate, then applies a short per-layer alignment step to match each converted layer to the teacher output. Across four teacher settings and three retained-layer policies, Taylor-Calibrate gives substantially stronger zero-shot students, with up to an 88x improvement in a representative ablation, and reaches matched recovery targets with 4.9x to 9.2x fewer training tokens than naive conversion.

2606.16454 2026-06-16 cs.LG cs.AI new submission

SDS-LoRA: Overcoming Anisotropic Gradient Scaling in Low-Rank Adaptation

Junghun Oh, Sungyong Baik, Kyoung Mu Lee

Affiliations: Seoul National University; Hanyang University

AI summary: Proposes SDS-LoRA, which structurally decouples singular values from the backward pass to eliminate the rank reduction and suboptimal alignment caused by anisotropic gradient scaling in LoRA, improving convergence speed and adaptation performance.

Abstract

Low-Rank Adaptation (LoRA) enables efficient adaptation of large pre-trained models to downstream tasks by parameterizing weight updates with low-rank matrices. In this paper, we investigate the limitations of the LoRA parameterization from a geometric perspective. Specifically, we show that when a full fine-tuning gradient is backpropagated to the low-rank matrices, it undergoes anisotropic scaling driven by their singular values. We argue that this phenomenon is undesirable because it distorts the full fine-tuning gradient by skewing it toward dominant singular directions while suppressing others. Our analyses demonstrate that anisotropic gradient scaling reduces the effective rank of the low-rank matrices' gradients and results in suboptimal alignment between the full fine-tuning gradient and its low-rank approximation in LoRA, thereby exacerbating the gap to full fine-tuning. To address these limitations, we propose a new low-rank parameterization, SDS-LoRA, which structurally decouples singular values from the backward pass. Our method ensures that the full fine-tuning gradient backpropagates only through the orthonormal bases of the low-rank matrices' subspaces, independent of their scales. Convergence analysis demonstrates that while LoRA's convergence rate degrades with the condition number of the low-rank matrices, SDS-LoRA remains independent of it. Experimental results across natural language and vision benchmarks show that SDS-LoRA improves loss convergence and reduces the gap to full fine-tuning, significantly enhancing adaptation performance.
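The anisotropic scaling described above follows from the chain rule for the standard LoRA parameterization. The short derivation below is a sketch under that standard setup; the symbols $W_0$, $B$, $A$, $G$ and the thin SVD of $A$ are notation chosen here, not taken from the paper, and the SDS-LoRA parameterization itself is not reproduced.

```latex
\begin{align}
  W &= W_0 + BA, \qquad B \in \mathbb{R}^{m \times r},\quad A \in \mathbb{R}^{r \times n},
      \qquad G = \frac{\partial \mathcal{L}}{\partial W} \\
  \frac{\partial \mathcal{L}}{\partial B} &= G A^{\top}, \qquad
  \frac{\partial \mathcal{L}}{\partial A} = B^{\top} G \\
  A = U_A \Sigma_A V_A^{\top}
  \;&\Longrightarrow\;
  \frac{\partial \mathcal{L}}{\partial B} = (G V_A)\, \Sigma_A\, U_A^{\top}
\end{align}
```

The $i$-th column of $G V_A$ is multiplied by the singular value $\sigma_i(A)$, so gradient components along dominant singular directions are amplified and those along weak directions are damped. The same argument applies to $\partial \mathcal{L} / \partial A$ with the singular values of $B$.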

2606.16456 2026-06-16 cs.LG cs.AI new submission

SPRI: SVD-Partitioned Residual Initialization for Data-Constrained MoE Upcycling

Weiqiao Shan, Ruixiang Mao, Yuang Li, Yuhao Zhang, Yingfeng Luo, Tong Zheng, Chen Xu, Yucheng Qiao, Chunxiang Jin, Yi Yuan, Jingdong Chen, Tong Xiao, Jingbo Zhu

Affiliations: Northeastern University, China; Huawei TSC, China; CUHK-Shenzhen, China; University of Maryland, USA; Harbin Engineering University, China; Inclusion AI, Ant Group; NiuTrans Research, China

AI summary: Proposes SPRI, which initializes MoE experts with SVD-partitioned residuals of pretrained FFN weights and pairs this with a two-stage training strategy, significantly improving performance on data-constrained multilingual speech translation.

Comments 8 pages, 12 tables, 3 figures

Abstract

Mixture-of-Experts (MoE) models enable efficient scaling, but training them from scratch remains prohibitively expensive. MoE upcycling mitigates this cost by converting pretrained dense models into sparse MoE models. However, existing upcycling methods typically rely on large-scale continued training and often perform poorly under data-constrained supervised adaptation, due to either homogeneous experts or overly disruptive perturbations to pretrained parameters. In this setting, effective upcycling must leverage pretrained weight structure while introducing sufficient diversity among routed experts. To this end, we propose SVD-Partitioned Residual Initialization (SPRI), which distributes SVD-partitioned residuals derived from pretrained feed-forward network (FFN) weights across routed experts, introducing controlled expert diversity grounded in pretrained spectral structure. We further introduce a two-stage training strategy to improve adaptation stability. We evaluate SPRI on multilingual speech-to-text translation, where limited supervised data challenges MoE upcycling and multiple target languages provide natural routing heterogeneity. On CoVoST2 across 15 En-to-XX directions, SPRI improves average BLEU and COMET over fully fine-tuned dense models by 2.58 and 3.32 points, respectively, and outperforms the prior best MoE upcycling baseline by 3.39 BLEU and 4.34 COMET points.

2606.16575 2026-06-16 cs.LG math-ph math.MP new submission

RepNet: Tackling spectral bias in deep neural networks via parameter reparameterization

Yong Wang, Tao Zhou, Xuhui Meng

Affiliations: Institute of Interdisciplinary Research for Mathematics and Applied Science, School of Mathematics and Statistics, Huazhong University of Science and Technology; Institute of Computational Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

AI summary: Addresses spectral bias in deep neural networks when capturing oscillatory and multiscale behavior by proposing RepNet, which reparameterizes the weights and biases of the first hidden layer to control the initial slope scale and the distribution of partition points and to enable adaptive frequency scaling, markedly improving accuracy in function approximation, PDE solving, and operator learning.

Abstract

Deep neural networks (DNNs) have achieved remarkable success in scientific computing, yet they often suffer from spectral bias in capturing oscillatory and multiscale behaviors. In this study, we investigate this limitation by examining the failure of shallow ReLU neural networks in fitting high-frequency functions. This observation identifies two important factors in resolving rapid oscillations: the initial slope scale and the distribution of partition points induced by the networks. Motivated by this analysis, we propose RepNet, a reparameterized DNN model for ReLU and tanh networks designed for high-frequency and multiscale problems. The key idea is to reparameterize the weights and biases in the first hidden layer, which enables effective control of the initial slope scale and provides an appropriate distribution of the initial partition points. Furthermore, treating the reparameterized weights and biases as trainable parameters allows the DNN to achieve adaptive frequency scaling during training. In addition, we derive quantitative estimates for the output and slope magnitudes of the reparameterized DNN to guide the initialization of the proposed method. Numerical experiments, including multiscale one- and four-dimensional function approximation, forward and inverse PDE problems in combination with physics-informed neural networks (PINNs), and operator learning, demonstrate that RepNet improves the predicted accuracy of vanilla DNNs in capturing highly oscillatory features with slightly additional computational cost. These results indicate that RepNet provides an effective and flexible approach for overcoming spectral bias and applying DNNs to multiscale problems.

2606.16620 2026-06-16 cs.LG cs.AI 新提交

Entropy-Gated Latent Recursion

熵门控潜在递归

Soham Bhattacharjee, Dushyant Singh Chauhan, Salem Lahlou, Martin Takac, Nils Lukas

发表机构 * Mohamed bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）

AI总结提出熵门控潜在递归（EGLR），通过在高不确定性token处递归应用冻结模型顶层解码器，构建与温度采样正交的确定性采样轴，扩展推理时缩放空间，在数学推理任务中显著提升性能。

Abstract

Inference-time scaling has become the dominant lever for improving language-model reasoning, but existing methods derive rollout diversity from a single source: stochastic token-level sampling. We argue that this single-axis sampling space is fundamentally limiting, and identify a second, fully deterministic and complementary axis: the layer span $L$ at which a frozen model's top decoder layers are recursively re-applied at high-uncertainty tokens. Different choices of $L$ produce distinct rollouts that solve different subsets of problems, with no stochasticity. We instantiate this axis through Entropy-Gated Latent Recursion (EGLR), a training-free decoding procedure that re-applies the top-$L$ layers for at most $K_{\max}$ iterations until the next-token distribution converges. Combined with $T$ temperature samples, EGLR turns a single-axis stochastic rollout pool into an $L\times T$ Cartesian sampling space at almost the same per-rollout cost. We characterize this space across $8$ instruction-tuned models and $6$ math reasoning benchmarks, and show that the $L$-axis is genuinely complementary to temperature: on MATH-500 with Qwen2.5-3B-Instruct, the joint $L\times T$ oracle reaches $91.6\%$, $+8.2$ percentage points beyond the temperature-only oracle ($83.4\%$) and $+10.4$ points beyond the layer-only oracle ($81.2\%$), confirming that the two axes capture genuinely complementary problems. The expanded rollout pool provides richer per-prompt candidates for any downstream procedure that consumes rollouts, including self-consistency, best-of-$N$ with verifiers, and group-relative RL training (GRPO), opening a new direction for inference-time scaling that does not rely on stochastic noise.
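As a rough illustration of the entropy gate described in this abstract, the sketch below computes the Shannon entropy of a next-token distribution and, only when it exceeds a threshold, re-applies a stand-in for the top-$L$ layers until the distribution stops changing or $K_{\max}$ iterations are reached. The threshold `tau`, the tolerance `tol`, the total-variation convergence test, and the toy `sharpen` function are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def total_variation(p, q):
    """Total-variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def entropy_gated_recursion(p0, refine, tau=1.0, k_max=4, tol=1e-3):
    """Return (distribution, number of re-applications).

    p0     : initial next-token distribution
    refine : hypothetical stand-in for re-applying the top-L layers
    tau    : assumed entropy threshold; at or below it the token is left alone
    """
    if entropy(p0) <= tau:                      # confident token: no recursion
        return p0, 0
    p = p0
    for k in range(1, k_max + 1):
        p_next = refine(p)
        if total_variation(p, p_next) < tol:    # distribution has converged
            return p_next, k
        p = p_next
    return p, k_max                             # iteration budget exhausted

def sharpen(p):
    """Toy refinement: square and renormalise, which lowers entropy."""
    s = sum(q * q for q in p)
    return [q * q / s for q in p]

uncertain = [0.3, 0.3, 0.2, 0.2]
confident = [0.97, 0.01, 0.01, 0.01]
dist_u, steps_u = entropy_gated_recursion(uncertain, sharpen)
dist_c, steps_c = entropy_gated_recursion(confident, sharpen)
```

Because the gate depends only on the distribution and not on random draws, running the same loop with a different layer span would give a different but still deterministic rollout, which is the property the abstract emphasises.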

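2606.16639 2026-06-16 cs.LG New submission

SPICE: Synergy and Partial Information Based Curriculum Evolution

Ankush Pratap Singh, Houwei Cao, Yong Liu

Affiliations: New York Institute of Technology; New York University

AI summary: Proposes the SPICE framework, which uses Partial Information Decomposition theory to quantify sample complexity dynamically. A progressive curriculum moves the model from shared cross-modal cues to modality-specific patterns and then to complex synergistic interactions, yielding consistent improvements on multiple multimodal benchmarks.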

Abstract

Multimodal learning exploits complementary information across heterogeneous modalities. The informativeness of each modality can vary widely across samples and training stages. Existing multimodal curriculum learning strategies often assume that the relative complexity of samples remains unchanged throughout training and therefore cannot adapt to model evolution. We propose SPICE (Synergy and Partial Information based Curriculum Evolution), a novel progressive curriculum framework for multimodal interaction learning. Guided by Partial Information Decomposition (PID) theory, our approach decomposes multimodal interactions into redundant, unique, and synergistic information components, enabling an interpretable and dynamic characterization of sample complexity. Building on this decomposition, we design a progressive curriculum that evolves throughout training, allowing the model to transition from learning shared cross-modal cues to modality-specific patterns and, finally, to complex synergistic interactions. Adapting to model evolution, sample ordering is refined in real-time using PID information estimates derived from unimodal and multimodal predictions. Experiments across multiple multimodal benchmarks demonstrate consistent improvements over conventional training and state-of-the-art baselines, highlighting the effectiveness of PID information decomposition and adaptive sample ordering for multimodal curriculum learning.

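2606.16694 2026-06-16 cs.LG cs.AI physics.app-ph q-bio.NC New submission

Adaptive inference and function vectors in deep transformers

Ravin Raj, Gautam Reddy

Affiliations: Joseph Henry Laboratories of Physics, Princeton University

AI summary: Proposes a theory of deep transformers as mean-field interacting systems that implement distributed inference, using function vectors to infer a latent context variable layer by layer. For an in-context regression task, the theory predicts a relationship between non-Gaussian hierarchical structure and depth, which is tested with constrained linear attention transformers.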

Abstract

Transformers are widely used as a general-purpose substrate for learning complex correlations between a large collection of coupled variables, but their internal mechanisms have remained mysterious. We introduce a theory of a deep transformer as a mean-field interacting system that implements distributed inference, subject to constraints on communication, locality and depth. We show that such a system can exploit internal state representations ('function vectors') to infer a latent context variable at increasingly finer scales over its layers. In an in-context regression task, the theory predicts a non-trivial relationship between non-Gaussian, hierarchical structure in the latent context variable, and transformer depth. Predictions are tested using constrained linear attention transformers and demonstrate adaptive inference in deep architectures. Feedforward blocks and depth enable transformers to implement a much richer class of in-context learning algorithms than previously described.

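2606.16768 2026-06-16 cs.LG New submission

Taming Curvature: Architecture Warm-Up for Stable Transformer Training

Sameera Ramasinghe, Ajanthan Thalaiyasingam, Hadi Mohaghegh Dolatabadi, Chamin Hewa Koneputugodage, Gil Avraham, Violetta Shevchenko, Yan Zuo, Karol Pajak, Alexander Long

Affiliations: Pluralis Research

AI summary: Proposes a fast online curvature estimator based on warm-started power iteration and finds that training instabilities coincide with surges in preconditioned curvature. Building on this, it proposes architecture warm-up, which progressively grows network depth, to stabilize large-model training.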

Abstract

Training billion-parameter Transformers is often brittle, with transient loss spikes and divergence that waste compute. Even though the recently developed Edge of Stability (EoS) theory provides a powerful tool to understand and control the stability of optimization methods via the (preconditioned) curvature, these curvature-controlling methods are not popular in large-scale Transformer training due to the complexity of curvature estimation. To this end, we first introduce a fast online estimator of the largest (preconditioned) Hessian eigenvalue (i.e., curvature) based on a warm-started variant of power iteration with Hessian-vector products. We show theoretically, and verify empirically, that the proposed method makes per-iteration curvature tracking feasible at billion-parameter scale while being more accurate. Using this tool, we find that training instabilities coincide with surges in preconditioned curvature and that curvature grows with depth. Motivated by these observations, we propose architecture warm-up: progressively growing network depth to carefully control the preconditioned Hessian and stabilize training. Experiments on large Transformers validate that our approach enables efficient curvature tracking and reduces instabilities compared to existing state-of-the-art stabilization techniques without slowing down convergence.
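The warm-starting idea can be sketched on a toy problem. In the snippet below, small explicit matrices stand in for the Hessian-vector products that an autodiff framework would supply, and a single power-iteration step is taken per training iteration, seeded with the vector from the previous iteration. The drifting 2x2 Hessian, the one-step-per-iteration budget, and all function names are assumptions for illustration, not the authors' estimator; preconditioning is omitted.

```python
import math

def matvec(H, v):
    """Dense matrix-vector product, used here as a stand-in for an HVP."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def power_step(hvp, v):
    """One power-iteration step; returns (Rayleigh quotient, new unit vector)."""
    hv = hvp(v)
    lam = sum(a * b for a, b in zip(v, hv))   # v^T H v, valid because ||v|| = 1
    return lam, normalize(hv)

def track_curvature(hessians, v0, steps_per_iter=1):
    """Warm start: the vector from the previous iteration seeds the next one."""
    v = normalize(v0)
    estimates = []
    for H in hessians:
        hvp = lambda u, H=H: matvec(H, u)     # hypothetical HVP oracle
        for _ in range(steps_per_iter):
            lam, v = power_step(hvp, v)
        estimates.append(lam)
    return estimates

# Toy "training run": the top-left Hessian entry drifts from 4.0 to 5.0
hessians = [[[4.0 + 0.01 * t, 0.5], [0.5, 1.0]] for t in range(101)]
estimates = track_curvature(hessians, v0=[1.0, 1.0])
```

Because the Hessian changes only slightly between iterations, the previous eigenvector estimate is already close to the new one, so one product per iteration is enough to keep the estimate accurate here. Note that plain power iteration converges to the eigenvalue of largest magnitude, which matches the largest eigenvalue only when that eigenvalue is positive and dominant, as in this toy case.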

2606.16899 2026-06-16 cs.LG New submission

Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization

Kaiyue Wen, Xingyu Dang, Kaifeng Lyu, Tengyu Ma, Percy Liang

Affiliations: Stanford University; Princeton University; Tsinghua University

AI summary: Addresses the shrinking gains of optimizers such as Muon as scale grows in large-model pretraining by proposing the Hyperball wrapper, which fixes the Frobenius norms of weight matrices and their updates. It achieves a 20-30% token-equivalent speedup on models up to 1.2B parameters and improves learning rate transfer.

Comments: Corresponding blog post: https://psychedelic-sunstone-851.notion.site/Fantastic-Pretraining-Optimizers-and-Where-to-Find-Them-2-1-Hyperball-Optimization-2e924306e6f280e7a5ffee00eb40a0dd

Abstract

Matrix-based optimizers such as Muon can substantially speed up language model pretraining, but their gains over AdamW are observed to shrink as model size and data scale grow when using standard constant decoupled weight decay. We propose Hyperball, a simple optimizer wrapper that addresses this issue. Given a base optimizer such as Adam or Muon, Hyperball sets the Frobenius norms of weight matrices and their corresponding optimizer updates to fixed constants. On Qwen3-style models up to 1.2B parameters, Muon Hyperball achieves a 20--30% token-equivalent speedup over weight decay baselines. Hyperball also improves learning rate transfer across widths and depths compared to decoupled weight decay. This method is motivated by prior theory showing that training with weight decay leads to an equilibrium weight norm that only depends on the training hyperparameters. Through this mechanism, the weight decay then decides the angular learning rate, i.e., how fast the direction of the weight matrix changes.
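A minimal sketch of the wrapper idea, assuming the simplest possible reading of the abstract: rescale the base optimizer's update to a fixed Frobenius norm, take the step, then rescale the weights back to a fixed Frobenius norm. The ordering of these operations, the constants, and the function names are assumptions; the paper may differ in its details.

```python
import math

def frob_norm(M):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in M for x in row))

def rescale(M, target):
    """Scale M so that its Frobenius norm equals `target` (M must be nonzero)."""
    n = frob_norm(M)
    return [[x * target / n for x in row] for row in M]

def hyperball_step(W, base_update, lr, weight_norm, update_norm):
    """One wrapped step (a sketch; ordering and constants are assumptions).

    W           : weight matrix as a list of rows
    base_update : update proposed by a base optimizer such as Adam or Muon
    """
    U = rescale(base_update, update_norm)            # fix the update norm
    W_new = [[w - lr * u for w, u in zip(rw, ru)]    # take the step
             for rw, ru in zip(W, U)]
    return rescale(W_new, weight_norm)               # return to the sphere

W = rescale([[1.0, 2.0], [3.0, 4.0]], 2.0)
grad = [[0.5, -0.1], [0.0, 0.3]]   # plain gradient used as the base update
W_next = hyperball_step(W, grad, lr=0.1, weight_norm=2.0, update_norm=1.0)
```

With both norms fixed, the ratio `lr * update_norm / weight_norm` bounds how far the direction of `W` can rotate in one step, which is one way to read the abstract's "angular learning rate".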

2606.16900 2026-06-16 cs.LG New submission

Factorized Neural Operators Decompose Dynamic and Persistent Responses

Hao Tang, Yuechen Duan, Jiongyu Zhu, Zimeng Feng, Hao Li, Chao Li

Affiliations: School of Medicine, University of Dundee; School of Data Science, Fudan University; School of Mathematical Sciences, Fudan University; Institute of Science and Technology for Brain-inspired Intelligence, Fudan University; School of Science and Engineering, University of Dundee; Department of Applied Mathematics and Theoretical Physics, University of Cambridge

AI summary: Proposes Factorized Neural Operators (FaNO), which decompose spectral representations into equivariant dynamic responses and invariant persistent responses, improving prediction accuracy, parameter efficiency, and cross-scale generalization for multiscale physical systems.

Abstract

Physical systems often exhibit heterogeneous mechanisms, where rapidly evolving dynamics coexist with persistent structures. Capturing such multiscale physical behavior remains challenging for existing neural operators, which typically rely on a single dominant inductive bias and therefore couple distinct physical responses into a shared representation. We introduce the Unified Green's Function Framework across domains and propose the Factorized Neural Operators (FaNO), which decompose spectral representations into equivariant dynamic responses and invariant persistent responses, leading to better interpretability and generalization. Mechanistically, we show that the two operator branches spontaneously specialize into distinct physical roles that remain consistent across scales and domains: the equivariant branch captures rapidly varying transient dynamics, whereas the invariant branch extracts coherent persistent structures. This factorized mechanism of FaNO improves prediction accuracy, parameter efficiency and cross-scale generalization across physical systems and domains. In particular, it maintains consistent predictions under long-horizon autoregressive rollout, cross-resolution extrapolation and physical-regime shifts. These findings suggest that scalable physical modeling may benefit from moving beyond single-inductive-bias formulations toward factorized operator representations that better reflect the heterogeneous organization of physical systems, accelerating the reliable deployment of machine learning for scientific computing and discovery.

2606.16939 2026-06-16 cs.LG cs.AI New submission

Scalable Circuit Learning for Interpreting Large Language Models

Naiyu Yin, Dennis Wei, Tian Gao, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy, Yue Yu

AI summary: Proposes CircuitLasso, which efficiently learns sparse circuits in LLMs via sparse linear regression over SAE features, greatly reducing computational cost while preserving structural accuracy and revealing how semantic features propagate.

Comments: Accepted to the Mechanistic Interpretability Workshop at ICML 2026

Abstract

A prominent research direction in mechanistic interpretability is learning sparse circuits over LLM components to reveal how they jointly produce model behavior. However, raw neurons are polysemantic, making learned circuits hard to interpret. Sparse autoencoder (SAE) features alleviate this, but their high dimensionality makes existing intervention-based circuit learning methods computationally prohibitive. We propose CircuitLasso, a scalable circuit-learning approach based on sparse linear regression. CircuitLasso recovers circuits whose structural accuracy matches that of state-of-the-art intervention-based methods on the benchmark data, at a fraction of the computational cost. For interpretability, CircuitLasso efficiently uncovers relationships among SAE features, showing how human-interpretable semantic features propagate through the model and influence its predictions. Finally, we validate the utility of our learned circuits by leveraging their insights to achieve comparable performance at substantially lower cost on a domain-generalization task.
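To make "sparse linear regression" concrete, the sketch below fits a Lasso model by cyclic coordinate descent and reads the nonzero coefficients as candidate edges into one downstream feature. This is a generic textbook Lasso, not the CircuitLasso algorithm itself; the tiny design matrix, the penalty `lam`, and the interpretation of columns as upstream SAE features are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that excludes feature j
            rho = 0.0
            for i in range(n):
                pred_wo_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
            rho /= n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

# Hypothetical data: a downstream feature driven by upstream features 0 and 2 only
X = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 1.0],
     [1.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 1.0, 0.0]]
y = [2.0 * r[0] - 1.5 * r[2] for r in X]
w = lasso_cd(X, y, lam=0.05)
edges = [j for j, wj in enumerate(w) if abs(wj) > 1e-6]
```

The L1 penalty shrinks the two true coefficients slightly toward zero and sets the others exactly to zero, so the recovered support is the sparse set of candidate edges. Each such fit needs only recorded activations, with no per-edge intervention, which is presumably where the cost saving over intervention-based methods comes from.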

2606.16979 2026-06-16 cs.LG New submission

Scalable Pairwise Kernel Learning with Stochastic Vec Trick

Napsu Karmitsa, Tapio Pahikkala, Antti Airola

Affiliations: Department of Computing, University of Turku

AI summary: Proposes SPaiK, which uses a stochastic generalized vec trick (sGVT) to scale pairwise kernel learning to large problems, and outperforms existing methods on seven drug-target affinity datasets.

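2606.17028 2026-06-16 cs.LG cs.AI cs.AR New submission

HAMON: Passive Optical Sequence Mixing for Long-Horizon Forecasting

Alper Yıldırım

AI summary: Proposes HAMON, a passive diffractive optical forecasting core that replaces digital sequence-mixing layers with optical propagation. It outperforms or approaches the strongest digital baselines on several benchmarks, reducing MSE by up to 14%.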

Abstract

Simple linear and frequency-domain models remain surprisingly competitive in long-horizon time-series forecasting, and recent mechanistic evidence suggests that standard forecasting benchmarks may not require the dense superposed representations that make transformers powerful in other domains. This raises a substrate-level question: if the core forecasting operator is often low-complexity and approximately linear, does it need to be implemented as learned digital temporal mixing? We introduce HAMON, a passive diffractive optical forecasting core in which historical values are encoded onto an optical aperture, future positions are left dark, and cascaded trainable phase masks with free-space diffraction shape the forecast directly in the output field. At inference, prediction is performed by a single passive optical propagation pass with no trainable digital sequence-mixing layer. Across standard benchmarks, HAMON outperforms the strongest digital baselines considered on ETTm2 at all horizons and on ETTh2 at all but the longest horizon, improving MSE by up to 14% and doing so consistently across horizons rather than at isolated points. It is competitive on Weather and trails the strongest baselines on the remaining ETT settings and on the high-channel-count Traffic and Electricity datasets. Phase encoding, intensity-compatible readout, and phase-scrambling ablations, together with a TorchOptics cross-simulator check, indicate that the forecasts arise from the data-bearing optical field rather than from a digital forecasting head. Because the passive core uses standard Fourier optics, HAMON defines a concrete target for optical hardware and for passive physical sequence mixing.

2606.14757 2026-06-16 cs.CV cs.LG Cross-list

Spatial Priors via Space Filling Curves for Small and Limited Data Vision Transformers

Leyla Naz Candogan, Arshia Afzal, Pol Puigdemont, Volkan Cevher

Affiliations: ETH Zürich

AI summary: Proposes VIOLIN, a lightweight masked attention mechanism that encodes spatial structure via space-filling curves. It injects a spatial inductive bias into Vision Transformers with minimal parameter and compute overhead and markedly improves performance for small models and limited data.

Comments: ICML 2026

Abstract

Though Vision Transformers (ViTs) have become the dominant backbone in many computer vision tasks, due to permutation equivariance, their attention mechanism lacks explicit spatial inductive biases. This becomes particularly important in two settings: when model capacity is small or training data is limited. Inspired by the attention masking strategies in Linear Transformers and the scanning patterns of Vision SSMs, we introduce VIOLIN, a lightweight masked attention mechanism that encodes spatial structure within attention via Space Filling Curves (SFCs) with less than 0.0015% extra parameters and negligible computational overhead. VIOLIN scans the image using multiple SFCs to construct curve-specific decay masks, which are then combined and multiplied with the attention matrix. Across a wide range of evaluations, VIOLIN consistently improves performance. In limited data regimes such as fine-tuning on VTAB-1K, it boosts accuracy across all task groups and by up to 8.7% on the tasks where spatial information is essential. It can be combined with parameter-efficient fine-tuning methods such as LoRA to further increase performance. Beyond fine-tuning, VIOLIN improves various small-scale ViT architectures (e.g., DeiT, DINO) during pretraining on ImageNet-1K. Additionally, on pixel-level CIFAR-100 training, a task that is highly dependent on location information, VIOLIN increases accuracy by up to 7.2%. Overall, VIOLIN provides a computationally efficient yet effective way to inject spatial inductive bias into ViTs, especially benefiting small models and limited data settings.
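The mask construction in this abstract can be sketched as follows: order the patches along a curve, let the mask entry for two patches decay with their distance along that curve, combine the masks from several curves, and multiply the result elementwise into the attention matrix. The choice of curves (a snake scan and a Z-order scan), the exponential decay form, the fixed decay rate, and averaging as the combination rule are all assumptions for illustration; the paper's masks may be built differently.

```python
def snake_order(h, w):
    """Boustrophedon scan: left-to-right on even rows, right-to-left on odd rows."""
    order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend(r * w + c for c in cols)
    return order

def z_order(h, w):
    """Morton (Z-order) scan, obtained by sorting patches on interleaved bits."""
    def key(r, c):
        k = 0
        for b in range(16):
            k |= (((c >> b) & 1) << (2 * b)) | (((r >> b) & 1) << (2 * b + 1))
        return k
    return sorted(range(h * w), key=lambda p: key(p // w, p % w))

def decay_mask(order, decay):
    """M[i][j] = decay ** |pos(i) - pos(j)|, with pos the index along the curve."""
    pos = {patch: t for t, patch in enumerate(order)}
    n = len(order)
    return [[decay ** abs(pos[i] - pos[j]) for j in range(n)] for i in range(n)]

def combine(masks):
    """Average several curve-specific masks (one assumed way to combine them)."""
    n = len(masks[0])
    return [[sum(m[i][j] for m in masks) / len(masks) for j in range(n)]
            for i in range(n)]

h, w = 4, 4
mask = combine([decay_mask(snake_order(h, w), 0.8),
                decay_mask(z_order(h, w), 0.8)])
uniform_attn = [[1.0 / (h * w)] * (h * w) for _ in range(h * w)]
masked_attn = [[a * m for a, m in zip(ra, rm)]
               for ra, rm in zip(uniform_attn, mask)]
```

Using more than one curve matters because any single one-dimensional ordering places some spatially adjacent patches far apart; averaging masks from different curves softens that artifact.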

2606.14770 2026-06-16 cs.CV cs.AI cs.IR cs.LG Cross-list

An Empirical Analysis of Optimization Dynamics and Sparsity Boundaries in Large-Scale Pedestrian Attribute Recognition

Houssam El Mir

Affiliations: College of Computer Science and Technology, Zhejiang University of Technology

AI summary: For extreme class imbalance in pedestrian attribute recognition, proposes a calibrated multi-label focal loss configuration (alpha=0.50, gamma=2.0) that matches the BCE baseline with zero computational overhead while improving hard-example mining, and identifies a Sparsity Wall boundary at a 0.1% positive sample rate.

Abstract

Pedestrian Attribute Recognition (PAR) is critical for video surveillance, enabling forensic search and re-identification systems. Extreme class imbalance remains a fundamental obstacle when merging PETA and PA-100K into a 109,000-image composite corpus, where minority attributes have positive sample fractions below 1%. This causes standard BCE optimization to suppress rare traits, a phenomenon we term the majority negative class cheating trap. We present a systematic ablation of Multi-Label Focal Loss hyperparameters (alpha and gamma) on a ResNet-18 backbone. A calibrated configuration (alpha=0.50, gamma=2.0) achieves a Macro F1-score of 62.32%, matching the BCE baseline while preserving superior hard-example mining and convergence dynamics. Our approach uses pure loss-function engineering with zero computational overhead for edge deployment. We identify the Sparsity Wall, a hard boundary where positive sample fractions below 0.1% make global loss reweighting ineffective, requiring instance-level intervention.
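For reference, the sketch below implements the standard alpha-balanced focal loss applied independently to each binary attribute, using the calibrated values alpha=0.50 and gamma=2.0 from the abstract as defaults. Whether the paper uses exactly this form, and how it reduces over attributes, is an assumption; the example probabilities and labels are made up.

```python
import math

def focal_loss(probs, labels, alpha=0.50, gamma=2.0, eps=1e-12):
    """Mean multi-label focal loss over independent binary attributes.

    probs  : predicted probabilities, one per attribute
    labels : 0/1 ground-truth labels
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
        if y == 1:
            total += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            total += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return total / len(probs)

def bce(probs, labels):
    """Binary cross-entropy: focal loss with alpha=0.5, gamma=0 equals half of it."""
    return 2.0 * focal_loss(probs, labels, alpha=0.5, gamma=0.0)

probs = [0.9, 0.2, 0.6, 0.05]    # hypothetical predictions
labels = [1, 0, 1, 0]
fl = focal_loss(probs, labels)
ce = bce(probs, labels)
```

The modulating factors `(1 - p) ** gamma` and `p ** gamma` shrink the contribution of well-classified attributes, so the uncertain positive at `p = 0.6` dominates the focal loss here. The loss only changes the training objective, which is consistent with the abstract's claim of zero overhead at deployment.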

2606.14943 2026-06-16 cs.CL cs.LG Cross-list

Simplifying the Modeling of Arbitrary Conditionals in Natural Language

Yinhan Lu, Eric Elmoznino, Léo Gagnon, Sarthak Mittal, Tejas Kasetty, Guillaume Lajoie

Affiliations: Mila - Quebec AI Institute; McGill University; Université de Montréal

AI summary: Proposes AC-GPT, a simple modification of standard causal Transformers that evaluates and samples arbitrary conditionals (past, future, and mixed contexts) in a single forward pass. It preserves left-to-right ordering and the next-token prediction objective without degrading standard performance.

Abstract

Causal Transformers model sequences through an autoregressive factorization of the joint distribution, which enables efficient left-to-right decoding and conditional likelihood computation. However, they cannot tractably sample from or evaluate arbitrary conditionals -- e.g., a block of text conditioned on past and future tokens. Recent work aims to solve this problem through novel architectures, but they often lead to sub-optimal modeling of such conditionals and degraded generations. We propose Arbitrary Conditionals GPT (AC-GPT) which introduces a simple modification to standard causal Transformers to enable evaluating and sampling from arbitrary conditionals -- including past, future, and mixed contexts -- within a single forward pass. Unlike prior approaches, our method preserves the standard left-to-right ordering and next-token prediction objective essential for both strong performance and efficient training on natural language. Crucially, this compatibility allows existing LLMs to be fine-tuned for arbitrary conditioning. Our empirical results indicate that our method outperforms baselines on modeling arbitrary conditionals, without degrading standard left-to-right performance.

2606.14975 2026-06-16 cs.NE cs.AI cs.LG physics.data-an q-bio.NC Cross-list

Harnessing cortical geometry, wiring, and function as inductive biases for recurrent neural networks

Mo Shakiba, Rana Rokni, Mohammad Mohammadi, Nima Dehghani

Affiliations: Neuromatch Academy, Neuromatch, Inc., USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology (MIT)

AI summary: Using MICrONS program data, this study initializes recurrent weights from neuronal spatial coordinates, anatomical connectivity, and functional relationships, and imposes spatial constraints to build biologically grounded recurrent neural networks. These networks outperform baseline models on cognitive decision-making tasks and develop low-entropy, modular, small-world organization.

Abstract

How the wiring and functional organization of cortex shape recurrent computation remains a central question in both neuroscience and machine learning. Here, we leverage data released through the Machine Intelligence from Cortical Networks (MICrONS) program--a functional connectomics resource spanning multiple areas of mouse visual cortex, in which dense calcium imaging is co-registered with high-resolution electron microscopy reconstruction from the same animal--to build biologically grounded recurrent neural networks. Using neuronal spatial coordinates, anatomical connectivity, and function-derived relationships from nearly 12,000 coregistered excitatory neurons, we initialize recurrent weights and impose communication-aware spatial constraints during learning. Across three cognitive decision-making tasks, networks constrained by cortical structure and function consistently outperform baseline and partially constrained models. Functional weight initialization provides the largest gain, while real spatial embedding yields robust additional improvements across conditions. These biologically grounded networks also develop low-entropy, modular, and small-world organization, and retain strong performance even when recurrence is restricted to positive weights. Together, our results show that the machinery of cortex--its geometry, wiring, and functional structure--can be harnessed as a powerful inductive basis for building recurrent networks that learn more effectively while converging toward key organizational principles of biological computation.

2606.14997 2026-06-16 cs.AI cs.LG Cross-list

AI Engram: In Search of Memory Traces in Artificial Intelligence

Jea Kwon, Dong-Kyum Kim, Jiwon Kim, Yonghyun Kim, Woong Kook, Meeyoung Cha

Affiliations: University of California, Berkeley; KAIST

AI summary: Proposes a geometric framework that formalizes neuroscientific criteria as a constrained inverse problem to identify memory traces (AI engrams) in deep neural networks, enabling linear composition and erasure of memories without iterative optimization.

Comments: Accepted to ICML 2026 (Oral). Code is available at https://github.com/jeakwon/ai-engram/

Abstract

Memory formation is fundamental to intelligence, yet whether deep neural networks preserve identifiable memory traces analogous to biological memory units remains an open question. This work introduces a geometric framework to identify such "AI engrams" by formalizing the neuroscientific criteria of specificity, reactivation, sufficiency, and necessity into a constrained inverse problem. We derive a closed-form estimator that isolates individual memory traces from globally entangled parameters, and show that this biologically-derived solution corresponds to a natural gradient update on the parameter manifold. AI engrams enable surgical manipulation of learned knowledge: any subset of memories can be composed or erased through linear arithmetic, without iterative optimization. Experiments ranging from simple MLPs to LLMs demonstrate the causal validity and substantial scalability of AI engrams. Together, these results bridge theories of biological memory and artificial representation learning and offer geometric insight into how deep networks simultaneously support functional specificity within distributed storage.

2606.15007 2026-06-16 cs.CL cs.AI cs.LG Cross-list

Schattor: Schatten-Family Methods for Deep Learning Optimization

Bohao Ma, Junyu Zhang, Chuan He

Affiliations: School of Data Science, The Chinese University of Hong Kong (Shenzhen); Department of Industrial Systems Engineering and Management, National University of Singapore; Department of Mathematics, Linköping University

AI summary: Proposes Schattor, a family of adaptive first-order methods based on Schatten norms that unifies SGD and Muon, obtains dimension-free stationarity guarantees via matrix martingale moment bounds, and develops a multi-block extension that adaptively balances per-block optimization.

Comments: 32 pages

2606.15751 2026-06-16 cs.SD cs.LG cs.MM eess.AS Cross-list

Acoustic Prompting via Stage-wise Modulation for Few-Shot Learning in Audio Language Models

Hyebin Cho, Jaehyuk Jang, Changick Kim, Joon Son Chung

Affiliations: Korea Advanced Institute of Science and Technology

AI summary: Proposes introducing trainable prompts into the audio encoder to capture task-specific acoustic features, combined with text prompts to improve few-shot adaptation. Effectiveness is validated on 11 datasets.

Comments: Accepted to INTERSPEECH 2026

Abstract

Audio-Language Models (ALMs) have shown remarkable success in zero-shot audio classification by aligning audio waveforms with text. Recent efforts to improve downstream performance focus on learning optimal text prompts. However, previous approaches focus on the text encoder, leaving the potential of learnable prompts within the audio encoder unexplored. In this paper, we propose a novel framework that introduces trainable prompts into the audio encoder to capture task-specific acoustic features. We demonstrate that integrating audio-side prompt learning with existing text-side approaches enhances few-shot adaptation. Extensive experiments across 11 datasets show that integrating our method as a plug-and-play module alongside existing text prompt tuning generally leads to performance improvements. These findings suggest that explicitly modulating the audio representation space effectively complements text-only prompting approaches. The code is available at https://github.com/hyebin-c/aspl.

2606.15837 2026-06-16 cs.CV cs.LG stat.ME stat.ML Cross-list

Learning a Sampling-Free Variational DNN Plugin from Tiny Training Sets to Refine OOD Segmentation With Uncertainty Estimation

Jimut B. Pal, Suyash P. Awate

Affiliations: Centre for Machine Intelligence and Data Science (C-MInDS), Indian Institute of Technology (IIT) Bombay; Computer Science and Engineering (CSE) Department, Indian Institute of Technology (IIT) Bombay

AI summary: Proposes VarDeepPCA, a lightweight variational DNN framework that learns a distribution of valid anatomical geometries from small in-distribution datasets, without target-domain data or pre-training. It achieves sampling-free inference by reinterpreting the softmax mapping, provides uncertainty estimates, and markedly improves the anatomical plausibility and accuracy of OOD segmentation across 4 clinical applications.

Comments: Accepted at the Journal of Machine Learning for Biomedical Imaging

Abstract

Deep neural networks (DNNs) frequently fail to generalize to out-of-distribution (OOD) medical images because of variations in scanners and acquisition protocols. Retraining DNN models to address these distribution shifts is often impractical due to the high cost of acquiring and annotating new medical datasets. To address this, we introduce VarDeepPCA, a novel lightweight variational DNN framework designed to restore/refine degraded segmentation maps by leveraging intrinsic geometric priors. Unlike existing approaches that require target-domain data or extensive pre-training, our VarDeepPCA explicitly learns a distribution of valid anatomical geometries using only small in-distribution (ID) datasets. Theoretically, our novel variational learning framework leverages a reinterpretation of the softmax mapping to implicitly perform exact distribution modeling, thereby enabling computationally efficient, sampling-free learning and inference. This also enables VarDeepPCA to provide uncertainty estimates associated with its restored segmentation maps. We empirically validate our framework across 4 distinct clinical applications, using 14 publicly available datasets, involving segmentation of the myocardium, neuroretinal rim, prostate, and fetal head. Comparisons against 15 existing methods demonstrate that VarDeepPCA consistently restores segmentation maps produced by the existing methods on OOD data to (i) significantly improve anatomical plausibility of geometries and clinical utility of the segmentations, and (ii) significantly reduce errors, without needing any more training data than that used by existing methods.

2606.16222 2026-06-16 cs.AI cs.LG cross-listed

Latent Thought Flow: Efficient Latent Reasoning in Large Language Models

潜在思维流：大型语言模型中的高效潜在推理

Xiandong Zou, Jing Huang, Jianshu Li, Pan Zhou

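Affiliations * Singapore Management University; Ant Group

AI summary: Proposes Latent Thought Flow (LTF), which models reasoning as variable-length continuous trajectories and trains a sampler with a continuous GFlowNet to match a reward-induced posterior, improving accuracy by 9.5% while reducing reasoning length by 27.2% on average.

AI Chinese abstract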

大型语言模型（LLMs）越来越依赖中间推理，然而显式的思维链（CoT）存在语言空间瓶颈：每个思维必须解码为token，导致高推理开销。潜在推理将思考过程转移到连续空间，但现有方法大多学习确定性或奖励最大化路径，缺乏在具有不同正确性和成本的轨迹间分配概率的原则性方法。我们提出潜在思维流（LTF），将推理建模为可变长度连续轨迹，并训练采样器以匹配由答案质量和计算成本定义的奖励诱导后验。我们使用具有随机潜在转移的连续GFlowNet实例化该方法。为处理稀疏答案监督，我们引入熵加权子轨迹平衡目标以获取中间奖励，以及参考先验正则化器以锚定探索。在微调和迁移学习设置下的实验表明，与强潜在推理基线相比，LTF在平均减少推理长度27.2%的同时，准确率提升9.5%，优于显式CoT和潜在推理基线。

English abstract

Large Language Models (LLMs) increasingly rely on intermediate reasoning, yet explicit Chain-of-Thought (CoT) suffers from a linguistic space bottleneck: each thought must be decoded into tokens, causing high inference overhead. Latent reasoning moves deliberation into continuous space, but existing methods mostly learn deterministic or reward-maximizing paths, lacking a principled way to allocate probability across trajectories with different correctness and costs. We propose Latent Thought Flow (LTF), which models reasoning as variable-length continuous trajectories and trains a sampler to match a reward-induced posterior over answer quality and computation cost. We instantiate this with a continuous GFlowNet using stochastic latent transitions. To handle sparse answer supervision, we introduce an Entropy-Weighted Subtrajectory Balance objective for intermediate rewards and a reference-prior regularizer to anchor exploration. Experiments under finetuning and transfer learning settings show that LTF outperforms explicit CoT and latent reasoning baselines, improving accuracy by 9.5% while reducing reasoning length by 27.2% on average compared with strong latent reasoning baselines.

2606.16730 2026-06-16 stat.ML cs.AI cs.LG cross-listed

Attention is Just Another Name for Coupling?: A Fast-Slow ODE Perspective on Hierarchical Pretraining

注意力只是耦合的另一个名字？：关于层级预训练的快速-慢速ODE视角

Zhengyuan Gao

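AI summary: Proposes a fast-slow ODE perspective that treats causal self-attention as a coupling mechanism and adds a slow subsystem fed back into the fast path through a zero-initialised gate. Under a linear-generator assumption, the equilibrium manifold is shown to coincide with the master-equation stationary distribution, while experiments find the coupling to be performance-neutral.

AI Chinese abstract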

因果自注意力是一种耦合机制：每个token的隐藏状态通过同一时间尺度上前置token的学习混合来更新。本文提出一个疑问：是否存在第二个时间上更慢的耦合（一个在序列的时间下采样视图上运行并通过零初始化门控反馈到快路径的慢子系统）来补充它？该问题以奇异摄动常微分方程（ODE）的语言提出，其中快变量$x$以token速率演化，慢变量$y$每$P$个token更新一次，时间尺度比$\varepsilon = 1/P$通过因果块均值池化在结构上强制执行。本文将快慢ODE形式具体化为一个神经网络：一个在$T$个token上的标准因果注意力快路径，一个在$T/P$个池化token上的全注意力慢路径（每层便宜$P^2$倍），以及一个零初始化的加法门控。此外，在快动力学的线性生成器假设下，我们证明了平衡流形$x = \phi(y)$恰好是主方程（ME）的平稳分布$p_{\mathrm{st}}(y)$；在该机制下，学习的MLP $\phi_\theta(y)$是其变分近似（训练块不是生成器，因此该恒等式是结构极限，而非对训练网络的断言）。实验上，在50万token时，耦合是中性的（门控保持关闭，耦合和冻结消融在运行间噪声范围内），其墙钟成本与密集基线相当。贡献在于精确的、带有间隙标记的映射本身，而非性能提升。

English abstract

Causal self-attention is a coupling mechanism: each token's hidden state is updated by a learned mixture of preceding tokens at the same timescale. This paper asks whether a second, temporally slower coupling (a slow sub-system operating on a temporally-downsampled view of the sequence and fed back into the fast path through a zero-initialised gate) complements it. The question is framed in the language of singularly perturbed ordinary differential equations (ODEs), where the fast variable $x$ evolves at the token rate, the slow variable $y$ evolves at one update per $P$ tokens, and the timescale ratio $\varepsilon = 1/P$ is enforced structurally by causal block-mean pooling. The paper instantiates the fast-slow ODE formalism as a concrete neural network: a fast path of standard causal attention over $T$ tokens, a slow path of full attention over $T/P$ pooled tokens ($P^2 \times$ cheaper per layer), and a zero-initialised additive gate. In addition, under a linear-generator assumption on the fast dynamics, we prove that the equilibrium manifold $x = \phi(y)$ is exactly the master-equation (ME) stationary distribution $p_{\mathrm{st}}(y)$; in that regime a learned MLP $\phi_\theta(y)$ is a variational approximation of it (the trained block is not a generator, so this identity is the structured limit, not a claim about the network as trained). Empirically, at $500$k tokens the coupling is neutral (the gate stays closed and the coupled and frozen ablations are within run-to-run noise), at a wall-clock cost comparable to a dense baseline. The contribution is the precise, gap-marked mapping itself, not a performance gain.
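
The structural pieces named in this abstract (causal block-mean pooling with period $P$ and a zero-initialised additive gate) can be illustrated with a toy pure-Python sketch. This is not the paper's implementation: the function names, the scalar "hidden states", and the rule that a token reads the most recently completed pooled block are all assumptions made for illustration.

```python
def block_mean_pool(x, P):
    """Average non-overlapping blocks of P tokens (length must be a multiple of P)."""
    if len(x) % P != 0:
        raise ValueError("sequence length must be a multiple of P")
    return [sum(x[i:i + P]) / P for i in range(0, len(x), P)]


def slow_context(pooled, t, P):
    """Pooled value of the latest block that ended at or before token t (causal)."""
    j = (t + 1) // P - 1
    return pooled[j] if j >= 0 else 0.0


def coupled_output(fast, pooled, P, gate=0.0):
    """Add the gated slow signal to the fast path; gate=0.0 mimics zero initialisation."""
    return [f + gate * slow_context(pooled, t, P) for t, f in enumerate(fast)]


tokens = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]  # T = 8 toy scalar hidden states
P = 4
pooled = block_mean_pool(tokens, P)                 # T / P = 2 slow tokens
at_init = coupled_output(tokens, pooled, P)         # gate closed: fast path unchanged
opened = coupled_output(tokens, pooled, P, gate=0.5)

# Full attention scales with the square of the sequence length, so attending
# over T / P pooled tokens needs P**2 times fewer pairwise interactions.
cost_ratio = (len(tokens) ** 2) / (len(pooled) ** 2)
```

With the gate at zero the coupled output equals the fast path, which is why a gate that stays closed during training leaves the model equivalent to its uncoupled baseline.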

2606.16783 2026-06-16 cs.CV cs.AI cs.LG cross-listed

Gen-VCoT: Generative Visual Chain-of-Thought Reasoning via Diffusion-Based RGB Intermediate Representations

Gen-VCoT: 基于扩散的RGB中间表示的生成式视觉思维链推理

Zhiqiang Zhou, Junliang Dai, Xu ling

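Affiliations * Hunan Chemical Industry Vocational and Technical College

AI summary: Proposes the Gen-VCoT framework, which uses expert vision models to generate RGB images as intermediate reasoning steps and an adaptive router to select the reasoning depth. It improves spatial and depth questions by 25% and 50% respectively but degrades on simple factual queries, indicating that the optimal representation is task-dependent.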

Comments 12 pages, 5 figures

2606.16825 2026-06-16 cs.CL cs.AI cs.LG cross-listed

Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

循环绑定——混合专家语言模型中的专家层绑定

Martin Jaggi

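Affiliations * EPFL

AI summary: Proposes Expert Tying, which shares expert parameters across consecutive Transformer layers while keeping routing and attention independent per layer, cutting the memory footprint of MoE models by almost 2x with virtually no loss in perplexity or downstream quality.

Comments Code available at https://github.com/epfml/looped-moe

AI Chinese abstract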

混合专家（MoE）架构通过每个令牌仅激活一小部分专家来高效扩展大型语言模型（LLM），但全部参数计数——主要由专家参数主导——必须保留在训练和推理内存中。为了解决这个问题，我们引入了专家绑定（Expert Tying），这是一种架构修改，它在连续Transformer层之间共享专家参数，同时保留独立的逐层路由和注意力。我们在常见的先进架构上评估了这种方法，包括OLMoE、Qwen3和DeepSeek风格的MoE。我们的预训练实验表明，绑定专家可以将内存占用减少近2倍，而几乎不降低困惑度或下游质量。通过利用MoE路径中固有的参数冗余，我们的方法提供了高度有利的计算-内存权衡，推动了下一代LLM的高效训练和扩展。

English abstract

Mixture-of-Experts (MoE) architectures efficiently scale Large Language Models (LLMs) by activating only a small fraction of their experts per token, yet the full parameter count (dominated by the expert parameters) must be held in training and inference memory. To address this, we introduce Expert Tying, an architectural modification that shares expert parameters across consecutive transformer layers while preserving independent, layer-wise routing and attention. We evaluate this approach across common, state-of-the-art architectures, including OLMoE, Qwen3, and DeepSeek-style MoEs. Our pretraining experiments demonstrate that tying experts can reduce memory footprint by almost 2x at virtually no degradation in perplexity or downstream quality. By exploiting the parameter redundancy inherent in MoE pathways, our method provides a highly favorable compute-to-memory trade-off, advancing efficient training and scaling of next-generation LLMs.
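
The "almost 2x" figure follows from simple parameter counting when expert weights dominate. The sketch below is a back-of-the-envelope illustration only: the layer count, expert sizes, and the choice to tie pairs of consecutive layers (`tie_group=2`) are hypothetical values, not configurations taken from the paper.

```python
import math


def moe_param_count(num_layers, experts_per_layer, params_per_expert,
                    other_params_per_layer, tie_group=1):
    """Total parameters when every `tie_group` consecutive layers share one expert set."""
    expert_sets = math.ceil(num_layers / tie_group)
    expert_params = expert_sets * experts_per_layer * params_per_expert
    # Attention and router weights stay separate for every layer.
    return expert_params + num_layers * other_params_per_layer


# Hypothetical model shape, chosen only to make the arithmetic easy to follow.
cfg = dict(num_layers=16, experts_per_layer=64,
           params_per_expert=1_000_000, other_params_per_layer=2_000_000)

untied = moe_param_count(**cfg)             # every layer owns its experts
tied = moe_param_count(**cfg, tie_group=2)  # consecutive layer pairs share experts
reduction = untied / tied                   # approaches 2 as experts dominate
```

Because the per-layer routing and attention parameters are not shared, the reduction stays slightly below 2 in this toy setting.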

2606.16934 2026-06-16 cs.CL cs.LG cross-listed

Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter

探索代码解释器有效推理的外在属性与内在属性

Patomporn Payoungkhamdee, Napat Laosaengpha, Jenta Wonglertsakul, Pittawat Taveekitworachai, Pume Tuchinda, Panjapong Poobanchuen, Ekapol Chuangsuwanich, Can Udomcharoenchaikit, Samuel Cahyawijaya, Peerat Limkonchotiwat, Sarana Nutanong

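Affiliations * Vidyasirimedhi Institute of Science and Technology; Kasetsart University; SCB 10X; King Mongkut’s University of Technology Thonburi; Department of Computer Engineering, Chulalongkorn University; Cohere; AI Singapore

AI summary: Studies code-interpreter reasoning from two angles, extrinsic properties (crucial tokens) and intrinsic properties (code-specific cognitive behaviors). Stronger models show crucial tokens and behaviors such as verification and backtracking more often, and these properties are used to improve performance at inference and training time.

AI Chinese abstract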

使用代码解释器（CI）进行推理已成为一种有效范式，通过可执行计算和迭代验证增强大型语言模型（LLM）的推理能力。尽管其应用日益广泛，但有效代码推理的行为属性仍未被充分探索。在本工作中，我们受自然语言推理研究的启发，从两个不同视角研究代码推理：外在属性（由关键token表示）和内在属性（由代码特定的认知行为表示）。在多个LLM上，我们发现更强的CI推理模型一致地表现出更高比例的关键token和认知行为，特别是验证、回溯和反向链。基于这些观察，我们研究了如何在推理和训练期间利用这些属性。在推理时，附加代码特定的关键token在数学、排序和优化等若干推理能力上提升了性能，但在其他方面收益有限。在训练时，用代码特定的认知行为增强最先进的框架，在三个评估模型中的两个上提升了监督微调和强化学习性能。进一步分析表明，这些行为减少了错误回答中的过度思考，提高了token效率，同时也揭示了限制某个模型收益的因素。我们的发现首次系统性地描述了有效CI推理的特征，并展示了利用关键属性改进CI推理的潜力和局限性。

English abstract

Reasoning with a Code Interpreter (CI) has emerged as an effective paradigm for enhancing the reasoning capabilities of large language models (LLMs) through executable computation and iterative verification. Despite its growing adoption, the behavioral properties underlying effective code reasoning remain largely underexplored. In this work, we investigate code reasoning from two distinct perspectives inspired by prior studies of natural language reasoning: extrinsic properties, represented by crucial tokens, and intrinsic properties, represented by code-specific cognitive behaviors. Across multiple LLMs, we find that stronger CI reasoning models consistently exhibit a higher prevalence of crucial tokens and cognitive behaviors, particularly verification, backtracking, and backward chaining. Building on these observations, we examine how these properties can be leveraged during both inference and training. At inference time, appending code-specific crucial tokens improves performance on several reasoning capabilities, including mathematical, ordering, and optimization, while yielding limited benefits elsewhere. At training time, augmenting a state-of-the-art framework with code-specific cognitive behaviors improves supervised fine-tuning and reinforcement learning performance in two of three evaluated models. Further analysis shows that these behaviors reduce overthinking in incorrect responses and improve token efficiency, while also revealing factors that limit gains in a certain model. Our findings provide the first systematic characterization of effective reasoning with CI and demonstrate both the potential and limitations of leveraging key properties to improve CI-based reasoning.

2606.16996 2026-06-16 cs.CV cs.AI cs.LG cross-listed

ActiveSAM: Image-Conditional Class Pruning for Fast and Accurate Open-Vocabulary Segmentation

ActiveSAM: 图像条件类别剪枝实现快速准确的开放词汇分割

Tran Dinh Tien, Zhiqiang Shen

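Affiliations * VILA Lab, Mohamed bin Zayed University of Artificial Intelligence

AI summary: Proposes ActiveSAM, a training-free, zero-shot inference framework that turns SAM 3 into an active-vocabulary segmenter through image-conditional class pruning and a low-resolution preview, outperforming SegEarth-OV3 by about 1.4 mIoU on average across 8 benchmarks while running up to 5.5x faster.

Comments Preprint. Code is available at https://github.com/VILA-Lab/ActiveSAM

AI Chinese abstract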

Segment Anything Model 3 (SAM 3) 为概念提示分割提供了强大的冻结骨干网络，但直接应用于开放词汇语义分割 (OVSS) 效率低下：全分辨率解码通常在整个数据集词汇表上运行，而每个图像只包含一小部分活跃类别。我们引入ActiveSAM，一种无需训练、零样本的推理框架，将SAM 3转化为主动词汇分割器。ActiveSAM首先规范化并扩展类别提示，然后从低分辨率存在预览中估计图像条件的活跃集。只有保留的类别使用冻结的SAM 3解码器进行桶式提示复用全分辨率解码。预览阶段仅使用类别存在证据，跳过不必要的分割头计算，而最终阶段应用边缘感知背景校准以抑制低置信度像素。ActiveSAM不需要目标数据集训练、权重更新或oracle类别存在标签。在八个OVSS基准上，ActiveSAM改善了无需训练的开放词汇语义分割的速度-准确率权衡，平均比当前最先进的SegEarth-OV3高出约+1.4 mIoU，同时在大型词汇数据集上运行速度最高提升5.5倍。ActiveSAM在模拟真实世界分布偏移的图像损坏下也表现出最强的鲁棒性，使其非常适合部署在噪声输入领域，如自动驾驶和具身AI。代码可在https://github.com/VILA-Lab/ActiveSAM获取。

English abstract

Segment Anything Model 3 (SAM 3) provides a strong frozen backbone for concept-prompted segmentation, but applying it directly to open-vocabulary semantic segmentation (OVSS) is inefficient: full-resolution decoding is typically run over the entire dataset vocabulary, whereas each image contains only a small active subset of classes. We introduce ActiveSAM, a training-free, zero-shot inference framework that turns SAM 3 into an active-vocabulary segmenter. ActiveSAM first canonicalizes and expands class prompts, then estimates an image-conditioned active set from a low-resolution presence preview. Only the retained classes are decoded at full resolution, using bucketed prompt multiplexing with the frozen SAM 3 decoder. The preview stage uses only class-presence evidence and skips unnecessary segmentation-head computation, while the final stage applies margin-aware background calibration to suppress low-confidence pixels. ActiveSAM requires no target-dataset training, no weight updates, and no oracle class-presence labels. Across eight OVSS benchmarks, ActiveSAM improves the speed-accuracy tradeoff of training-free open-vocabulary semantic segmentation, outperforming the current state-of-the-art SegEarth-OV3 by approximately +1.4 mIoU on average while running up to 5.5x faster on large-vocabulary datasets. ActiveSAM also demonstrates the strongest robustness under image corruption that simulates real-world distribution shift, making it well-suited for deployment in noisy-input domains such as autonomous driving and embodied AI. Code is available at https://github.com/VILA-Lab/ActiveSAM.

Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons

Dengyu Wu, Jiechen Chen, H. Vincent Poor, Bipin Rajendran, Osvaldo Simeone

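Affiliations * Department of Engineering, King’s College London; Department of Electrical and Computer Engineering, Princeton University; Institute for Intelligent Networked Systems, Northeastern University London

AI summary: Proposes a wireless split computing architecture in which resonate-and-fire neurons process time-domain signals directly, extracting spectral features through oscillatory dynamics, lowering spike rates and energy consumption, and reaching accuracy comparable to conventional approaches on audio and modulation classification.

AI Chinese abstract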

神经形态计算为传统深度学习加速器提供了一种节能替代方案，尤其适用于时间序列数据的实时处理。然而，许多边缘应用（如无线感知和音频识别）生成的流信号具有丰富的频谱特征，而传统的漏积分-放电（LIF）脉冲神经元无法有效捕获这些特征。本文研究了一种无线分割计算架构，该架构采用具有振荡动力学的共振-放电（RF）神经元直接处理时域信号，从而消除了昂贵的频谱预处理需求。通过在可调频率上共振，RF神经元在保持低脉冲活动的同时提取时间局部化的频谱特征。这种时间稀疏性转化为计算和传输能量的显著节省。假设采用基于OFDM的模拟无线接口进行脉冲传输，我们提出了一个完整的系统设计，并在音频分类和调制分类任务上评估其性能。实验结果表明，所提出的RF-SNN架构在推理和通信期间实现了与传统LIF-SNN和ANN相当的精度，同时显著降低了脉冲率和总能耗。

English abstract

Neuromorphic computing offers an energy-efficient alternative to conventional deep learning accelerators, particularly for real-time processing of time-series data. However, many edge applications, such as wireless sensing and audio recognition, generate streaming signals with rich spectral features that are not effectively captured by conventional leaky integrate-and-fire (LIF) spiking neurons. This paper investigates a wireless split computing architecture that employs resonate-and-fire (RF) neurons with oscillatory dynamics to process time-domain signals directly, eliminating the need for costly spectral pre-processing. By resonating at tunable frequencies, RF neurons extract time-localized spectral features while maintaining low spiking activity. This temporal sparsity translates into significant savings in both computation and transmission energy. Assuming an OFDM-based analog wireless interface for spike transmission, we present a complete system design and evaluate its performance on audio classification and modulation classification tasks. Experimental results show that the proposed RF-SNN architecture achieves comparable accuracy to conventional LIF-SNNs and ANNs, while substantially reducing spike rates and total energy consumption during inference and communication.
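
The frequency selectivity that the abstract attributes to resonate-and-fire neurons can be seen in a toy simulation. The sketch below assumes a damped complex oscillator in the style of the classic resonate-and-fire model, with no reset after a spike; the parameter values, the spike rule (upward threshold crossings of the imaginary part), and the function name are illustrative assumptions, and the neuron model used in the paper may differ.

```python
import cmath
import math


def count_spikes(input_hz, resonant_hz=10.0, damping=-1.0, threshold=0.2,
                 dt=1e-3, duration=3.0):
    """Drive one resonate-and-fire neuron with a cosine and count threshold crossings."""
    omega = 2.0 * math.pi * resonant_hz
    decay = cmath.exp(complex(damping, omega) * dt)  # exact one-step free evolution
    z = 0j
    spikes = 0
    above = False
    for n in range(int(round(duration / dt))):
        drive = math.cos(2.0 * math.pi * input_hz * n * dt)
        z = decay * z + drive * dt                   # Euler step for the input term
        now_above = z.imag > threshold
        if now_above and not above:                  # upward crossing counts as one spike
            spikes += 1
        above = now_above
    return spikes


on_resonance = count_spikes(input_hz=10.0)   # input matches the tuned frequency
off_resonance = count_spikes(input_hz=3.0)   # input far from the tuned frequency
```

Only the input near the tuned frequency builds up enough oscillation amplitude to cross the threshold, which is the sense in which such a neuron acts as a sparse, time-localized spectral feature detector.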

2508.05287 2026-06-16 cs.LG cs.AI updated version

FlowState: Sampling-Rate-Equivariant Time-Series Forecasting

FlowState: 采样率等变的时间序列预测

Lars Graf, Thomas Ortner, Stanisław Woźniak, Angeliki Pantazi

发表机构 * GitHub

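AI summary: Proposes the FlowState architecture, which achieves sampling-rate-equivariant forecasting with a state space model encoder and a functional basis decoder, adapts to different sampling rates and forecast horizons without retraining, and achieves state-of-the-art results on the GIFT-Eval benchmark.

AI Chinese abstract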

现有的时间序列基础模型（TSFMs）通常基于Transformer变体，缺乏对不同采样率的适应性，难以在不同上下文和目标长度上泛化，且计算效率低下。我们提出FlowState，一种新颖的TSFM架构，通过将状态空间模型（SSM）编码器与函数基解码器（FBD）配对，实现采样率等变预测。这种设计支持连续时间建模和动态时间尺度调整，使FlowState能够天然地泛化到所有可能的时间分辨率，并动态调整预测范围而无需重新训练。我们进一步提出一种高效的预训练策略，提高了鲁棒性并加速了训练。尽管FlowState是最小的TSFMs之一，它在广泛使用的GIFT-Eval基准上取得了最先进的结果，同时展现出对未见采样率的卓越适应性。我们的详细分析证实了其组件的有效性，并展示了其适应不同输入采样率的独特能力。

English abstract

Existing time series foundation models (TSFMs), often based on transformer variants, lack adaptability to different sampling rates, struggle with generalization across varying context and target lengths, and are computationally inefficient. We introduce FlowState, a novel TSFM architecture that achieves sampling-rate-equivariant forecasting through a unified design that pairs a state space model (SSM) encoder with a functional basis decoder (FBD). This design enables continuous-time modeling and dynamic time-scale adjustment, allowing FlowState to inherently generalize across all possible temporal resolutions, and dynamically adjust the forecasting horizons without retraining. We further propose an efficient pretraining strategy that improves robustness and accelerates training. Despite being one of the smallest TSFMs, FlowState achieves state-of-the-art results on the widely used GIFT-Eval benchmark, while demonstrating superior adaptability to unseen sampling rates. Our detailed analyses confirm the effectiveness of its components, and we demonstrate its unique ability to adapt to varying input sampling rates.

2601.16509 2026-06-16 cs.LG cs.AI updated version

Adaptive $k$NN graph model

自适应 $k$NN 图模型

Jiaye Li, Hang Xu, Shichao Zhang

Affiliations * The State Key Laboratory of Blockchain and Data Security; Zhejiang University; The School of Computer Science and Engineering; Central South University; School of Computer Science and Engineering; Guangxi Normal University

AI summary: Proposes an adaptive graph model that combines a Hierarchical Navigable Small World graph with a pre-computed voting mechanism, shifting the cost of neighbor selection and weighting to the training phase and reaching real-time inference speed while preserving classification accuracy.

Comments 31 pages, 5 figures

DOI: 10.1038/s41467-026-74296-2

AI Chinese abstract

$k$ 近邻 ($k$NN) 算法是人工智能中非参数分类的基石，但其在大规模应用中的部署始终受到推理速度与准确性之间计算权衡的限制。现有的近似最近邻解决方案加速了检索，但往往降低了分类精度，并且缺乏选择最优邻域大小 ($k$) 的自适应性。本文提出了一种自适应图模型，将推理延迟与计算复杂度解耦。通过将分层可导航小世界 (HNSW) 图与预计算投票机制相结合，我们的框架将邻居选择和加权的计算负担完全转移到训练阶段。在这种拓扑结构中，较高的图层次实现快速导航，而较低的层次则通过自适应邻居数量编码精确的、节点特定的决策边界。在六个不同数据集上与八种最先进基线进行基准测试，我们证明了该架构显著加速了推理速度，实现了实时性能，且不牺牲分类精度。这些发现为 $k$NN 固有的推理瓶颈提供了可扩展、鲁棒的解决方案，为基于图的非参数学习奠定了自适应的结构基础。

English abstract

The $k$-nearest neighbors ($k$NN) algorithm is a cornerstone of non-parametric classification in artificial intelligence, yet its deployment in large-scale applications is persistently constrained by the computational trade-off between inference speed and accuracy. Existing approximate nearest neighbor solutions accelerate retrieval but often degrade classification precision and lack adaptability in selecting the optimal neighborhood size ($k$). Here, we present an adaptive graph model that decouples inference latency from computational complexity. By integrating a Hierarchical Navigable Small World (HNSW) graph with a pre-computed voting mechanism, our framework completely transfers the computational burden of neighbor selection and weighting to the training phase. Within this topological structure, higher graph layers enable rapid navigation, while lower layers encode precise, node-specific decision boundaries with adaptive neighbor counts. Benchmarking against eight state-of-the-art baselines across six diverse datasets, we demonstrate that this architecture significantly accelerates inference speeds, achieving real-time performance, without compromising classification accuracy. These findings offer a scalable, robust solution to the inherent inference bottleneck of $k$NN, laying an adaptive structural foundation for graph-based nonparametric learning.
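
The idea of moving neighbor selection and voting into the training phase can be sketched in a few lines. The code below is a toy under stated assumptions, not the authors' method: a brute-force nearest-node search stands in for the HNSW graph, and the rule for picking a node-specific $k$ (the smallest candidate $k$ whose neighbor vote agrees with the node's own label, otherwise the largest) is hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def precompute_votes(points, labels, candidate_ks=(1, 3)):
    """Training phase: store one voted label and one node-specific k per node."""
    stored = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        order = sorted(others, key=lambda j: sq_dist(p, points[j]))
        chosen = None
        for k in candidate_ks:
            label = Counter(labels[j] for j in order[:k]).most_common(1)[0][0]
            chosen = (k, label)
            if label == labels[i]:   # toy rule: stop at the first k that agrees
                break
        stored.append(chosen)
    return stored


def predict(query, points, stored):
    """Inference phase: a single nearest-node lookup, then read the stored label."""
    nearest = min(range(len(points)), key=lambda j: sq_dist(query, points[j]))
    return stored[nearest][1]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.5, 0.5),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b", "b"]   # the point (0.5, 0.5) is a noisy label
stored = precompute_votes(points, labels)
near_noise = predict((0.45, 0.5), points, stored)
near_cluster = predict((5.2, 5.1), points, stored)
```

At query time no vote is computed: the query lands on the mislabelled node near the origin, yet the label returned is the one its neighbors voted for during training.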

2601.22642 2026-06-16 cs.LG updated version

Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification

突破自然推理的边界：来自形式逻辑验证的交错奖励

Chuxue Cao, Jinluan Yang, Haoran Li, Kunhao Pan, Zijian Zhao, Zhengyu Chen, Yuchen Tian, Lijun Wu, Conghui He, Sirui Han, Yike Guo

发表机构 * GitHub

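AI summary: Proposes a formal logic verification-guided framework that interleaves verification with generation to correct reasoning errors in real time and, combined with two-stage training, significantly improves LLM performance on mathematical, logical, and general reasoning benchmarks.

Comments Accepted by ICML 26

AI Chinese abstract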

大型语言模型（LLMs）展现出卓越的能力，但其随机性的下一个词预测会导致逻辑不一致和奖励黑客问题，而形式符号系统则避免了这些问题。为弥合这一差距，我们引入了一个形式逻辑验证引导的框架，该框架动态地将形式符号验证与自然语言生成过程交错进行，提供实时反馈以在错误发生时检测并纠正它们。与之前受限于被动事后验证的神经符号方法不同，我们的方法在推理链中主动惩罚中间谬误。我们通过一种新颖的两阶段训练流程来实现该框架，该流程协同了形式逻辑验证引导的监督微调和策略优化。在涵盖数学、逻辑和通用推理的六个基准上的广泛评估表明，我们的7B和14B模型分别以平均10.4%和14.2%的幅度优于最先进的基线。这些结果验证了形式验证可以作为一种可扩展机制，显著推动高级LLM推理的性能边界。

English abstract

Large Language Models (LLMs) show remarkable capabilities, yet their stochastic next-token prediction creates logical inconsistencies and reward hacking that formal symbolic systems avoid. To bridge this gap, we introduce a formal logic verification-guided framework that dynamically interleaves formal symbolic verification with the natural language generation process, providing real-time feedback to detect and rectify errors as they occur. Distinguished from previous neuro-symbolic methods limited by passive post-hoc validation, our approach actively penalizes intermediate fallacies during the reasoning chain. We operationalize this framework via a novel two-stage training pipeline that synergizes formal logic verification-guided supervised fine-tuning and policy optimization. Extensive evaluation on six benchmarks spanning mathematical, logical, and general reasoning demonstrates that our 7B and 14B models outperform state-of-the-art baselines by average margins of 10.4% and 14.2%, respectively. These results validate that formal verification can serve as a scalable mechanism to significantly push the performance boundaries of advanced LLM reasoning.

2602.05352 2026-06-16 cs.LG math.SG updated version

Smoothness Errors in Dynamics Models and How to Avoid Them

动力学模型中的平滑误差及如何避免

Edward Berman, Luisa Li, Jung Yeon Park, Robin Walters

发表机构 * GitHub ； arXiv

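AI summary: Studies the smoothing effects of different GNNs in dynamics modeling, proves that unitary convolutions hurt performance on such tasks, and proposes relaxed unitary convolutions that balance smoothness preservation against the natural smoothing that physical systems require.

Comments Ecstatic to share relaxed unitary mesh convolutions with the community :D! This version contains the camera ready for ICML 2026. Send me an email with your thoughts! I love getting mail :^)

AI Chinese abstract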

现代神经网络在求解表面偏微分方程中表现出潜力，通常通过将表面离散化为网格并使用网格感知图神经网络进行学习。然而，图神经网络存在过平滑问题，即节点特征逐渐趋同。酉（unitary）图卷积通过数学约束保持平滑性，被提出以解决此问题。尽管如此，在许多物理系统如扩散过程中，平滑性会自然增加，酉性可能过于约束。本文系统研究了不同GNN的平滑效应，并证明酉卷积对这类任务有害。我们提出放松的酉卷积以平衡平滑性保留与物理系统需求。我们还将酉卷积和放松的酉卷积从图扩展到网格。在热方程和波方程等复杂网格上的PDE以及天气预测实验中，我们发现我们的方法优于包括网格感知Transformer和等变神经网络在内的多个强基线。

English abstract

Modern neural networks have shown promise for solving partial differential equations over surfaces, often by discretizing the surface as a mesh and learning with a mesh-aware graph neural network. However, graph neural networks suffer from oversmoothing, where a node's features become increasingly similar to those of its neighbors. Unitary graph convolutions, which are mathematically constrained to preserve smoothness, have been proposed to address this issue. Despite this, in many physical systems, such as diffusion processes, smoothness naturally increases and unitarity may be overconstraining. In this paper, we systematically study the smoothing effects of different GNNs for dynamics modeling and prove that unitary convolutions hurt performance for such tasks. We propose relaxed unitary convolutions that balance smoothness preservation with the natural smoothing required for physical systems. We also generalize unitary and relaxed unitary convolutions from graphs to meshes. In experiments on PDEs such as the heat and wave equations over complex meshes and on weather forecasting, we find that our method outperforms several strong baselines, including mesh-aware transformers and equivariant neural networks.

2602.05779 2026-06-16 cs.LG cs.IT math.IT updated version

How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs

如何控制方差以提高稀疏激活DNN和CNN的训练稳定性

Emily Dent, Jared Tanner

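Affiliations * Mathematical Institute, University of Oxford

AI summary: For sparsity-inducing activation functions, shows that a larger Gaussian process variance improves training stability, and designs a new initialisation strategy that enables stable training with up to 90% sparsity in the hidden layers.

AI Chinese abstract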

为随机初始化深度网络开发的混沌边缘（EoC）理论通过将中间层表征为高斯过程，既保留网络初始输出中的信息，又最小化梯度爆炸或消失，从而实现更高效的训练。该EoC理论提供了权重和偏置初始化分布方差的选择公式。对于在原点附近近似线性的激活函数，EoC理论通常鼓励高斯过程方差随深度增加收敛至零。本文考虑较少研究的高度稀疏诱导激活函数设置，其中原点附近大范围值被置为零。在此设置下，我们证明了一个新现象：导致更大固定高斯过程的初始化有利于训练稳定性。该理论指导了一种新的简单初始化策略，使得训练隐藏层稀疏度高达90%的DNN和CNN成为可能。

English abstract

The Edge-of-Chaos (EoC) theory developed for the random initialization of deep networks allows more efficient training by both preserving information in the initial outputs of the network and minimising exploding or vanishing gradients through characterisation of the intermediate layers as Gaussian processes. This EoC theory provides formulae for the choice of the initialisation distribution variances of the weights and biases. For activations which are approximately linear around the origin, the EoC theory typically encourages the Gaussian process variance to converge towards zero with increasing depth. Here we consider the less studied setting of highly sparsity-inducing activations, where a large region of values near the origin is set to zero. In this setting we prove a new phenomenon whereby initialisations leading to larger fixed Gaussian processes are beneficial to training stability. This theory informs a new, yet simple, initialisation strategy that allows training DNNs and CNNs with up to 90% sparsity in the hidden layers.

2602.08306 2026-06-16 cs.LG updated version

TextResNet: Decoupling and Routing Optimization Signals in Compound AI Systems via Deep Residual Tuning

TextResNet：通过深度残差调优解耦和路由复合AI系统中的优化信号

Suizhi Huang, Mei Li, Han Yu, Xiaoxiao Li

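AI summary: Addresses the attribution ambiguity that semantic entanglement causes for textual-gradient optimizers in deep chains by proposing TextResNet, which decouples and precisely routes signals through additive semantic deltas in the forward pass, semantic gradient decomposition in the backward pass, causal routing, and density-aware scheduling. It outperforms TextGrad in compound AI systems and is more stable.

Comments Accepted by ICML2026

AI Chinese abstract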

文本梯度式优化器（TextGrad）能够通过复合AI系统传播类似梯度的反馈。然而，它们在深度链中表现不佳。这一局限的根本原因源于这些扩展工作流中的语义纠缠问题。在标准文本反向传播中，反馈信号将局部批评与上游上下文混合，导致归因模糊。为解决这一挑战，我们提出TextResNet，一个通过四项关键创新将优化过程重构为精确信号路由的框架。首先，在前向传播中，它强制加性语义增量以保留用于梯度流的恒等高速路。其次，在后向传播中，它通过语义投影器引入语义梯度分解，将反馈解耦为因果独立子空间。第三，它实现因果路由，将投影信号路由到其特定组件。最后，它执行密度感知优化调度，利用解耦信号动态分配资源到关键系统瓶颈。我们的结果表明，TextResNet不仅实现了优于TextGrad的性能，而且在基线崩溃的复合AI系统的智能体任务中表现出显著的稳定性。代码可在该 https URL 获取。

English abstract

Textual Gradient-style optimizers (TextGrad) enable gradient-like feedback propagation through compound AI systems. However, they do not work well for deep chains. The root cause of this limitation stems from the Semantic Entanglement problem in these extended workflows. In standard textual backpropagation, feedback signals mix local critiques with upstream contexts, leading to Attribution Ambiguity. To address this challenge, we propose TextResNet, a framework that reformulates the optimization process to achieve precise signal routing via four key innovations. Firstly, in the forward pass, it enforces Additive Semantic Deltas to preserve an Identity Highway for gradient flow. Secondly, in the backward pass, it introduces Semantic Gradient Decomposition via a Semantic Projector to disentangle feedback into causally independent subspaces. Thirdly, it implements Causal Routing, which routes projected signals to their specific components. Finally, it performs Density-Aware Optimization Scheduling to leverage the disentangled signals to dynamically allocate resources to key system bottlenecks. Our results show that TextResNet not only achieves superior performance compared to TextGrad, but also exhibits remarkable stability for agentic tasks in compound AI systems where baselines collapse. Code is available at https://github.com/JeanDiable/TextResNet.

2602.11550 2026-06-16 cs.LG cs.AI updated version

TS-Memory: Plug-and-Play Memory for Time Series Foundation Models

TS-Memory: 时间序列基础模型的即插即用记忆模块

Sisuo Lyu, Siru Zhong, Tiegang Chen, Weilin Ruan, Qingxiang Liu, Taiqiang Lv, Qingsong Wen, Raymond Chi-Wing Wong, Yuxuan Liang

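Affiliations * The Hong Kong University of Science and Technology (Guangzhou); Tencent; Squirrel Ai Learning; The Hong Kong University of Science and Technology

AI summary: Proposes Parametric Memory Distillation, implemented as TS-Memory: a lightweight memory adapter augments frozen time series foundation models, enabling efficient retrieval-free zero-shot forecasting under distribution shift and consistently improving point and probabilistic forecasts.

AI Chinese abstract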

时间序列基础模型（TSFMs）通过大规模预训练实现了强大的零样本预测，但在分布偏移下将其适应到下游领域仍然具有挑战性。现有解决方案面临权衡：参数化适应可能导致灾难性遗忘，并需要昂贵的多领域维护，而非参数化检索虽然改善了预测，但由于数据存储搜索导致高推理延迟。我们提出了参数化记忆蒸馏，并将其实现为TS-Memory，一种增强冻结TSFMs的轻量级记忆适配器。TS-Memory分两个阶段训练。首先，我们构建一个离线、检索泄漏安全的kNN教师，从检索到的未来中合成置信度感知的分位数目标。其次，我们通过置信度门控监督将该检索诱导的分布校正蒸馏到轻量级记忆适配器中。在推理过程中，TS-Memory以常数时间开销融合记忆和骨干预测，实现无检索部署。在多种TSFMs和基准上的实验表明，与代表性的适应方法相比，在点预测和概率预测上均有一致的改进，效率与冻结骨干相当。代码：此 https URL。

English abstract

Time Series Foundation Models (TSFMs) achieve strong zero-shot forecasting through large-scale pre-training, but adapting them to downstream domains under distribution shift remains challenging. Existing solutions face a trade-off: Parametric Adaptation can cause catastrophic forgetting and requires costly multi-domain maintenance, while Non-Parametric Retrieval improves forecasts but incurs high inference latency due to datastore search. We propose Parametric Memory Distillation and implement it as TS-Memory, a lightweight memory adapter that augments frozen TSFMs. TS-Memory is trained in two stages. First, we construct an offline, retrieval-leakage-safe kNN teacher that synthesizes confidence-aware quantile targets from retrieved futures. Second, we distill this retrieval-induced distributional correction into a lightweight memory adapter via confidence-gated supervision. During inference, TS-Memory fuses memory and backbone predictions with constant-time overhead, enabling retrieval-free deployment. Experiments across diverse TSFMs and benchmarks demonstrate consistent improvements in both point and probabilistic forecasting over representative adaptation methods, with efficiency comparable to the frozen backbone. Code: https://github.com/sisuolv/TS-Memory.

2602.20427 2026-06-16 cs.LG cs.AR updated version

GauS: Differentiable Scheduling Optimization via Gaussian Reparameterization

GauS：通过高斯重参数化的可微分调度优化

Yaohui Cai, Vesal Bakhtazad, Cunxi Yu, Zhiru Zhang

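Affiliations * Cornell University; University of Maryland, College Park

AI summary: Proposes the GauS framework, which uses Gaussian distributions as a stochastic relaxation of operator scheduling so that schedules can be optimized differentiably. It captures the ordinal nature of time, greatly shrinks the parameter space, gives the first differentiable formulation of pipelined scheduling, and achieves Pareto-optimal results.

AI Chinese abstract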

高效的算子调度是软件编译和硬件合成中的基本挑战。虽然最近的可微分方法试图用基于梯度的搜索替代传统方法（如精确求解器或启发式方法），但它们通常依赖于分类分布，未能捕获时间的序数性质，并且参数空间扩展性差。在本文中，我们提出了一种新颖的可微分框架GauS，该框架使用高斯分布将算子调度建模为随机松弛，充分利用了现代并行计算设备（如GPU）。通过将调度表示为连续高斯变量，我们成功捕获了时间的序数性质，并将优化空间减少了数个数量级。我们的方法非常灵活，可以表示各种目标和约束，为复杂的流水线调度问题提供了第一个可微分公式。我们在多个基准测试上评估了我们的方法，证明GauS实现了帕累托最优结果。

English abstract

Efficient operator scheduling is a fundamental challenge in software compilation and hardware synthesis. While recent differentiable approaches have sought to replace traditional methods such as exact solvers or heuristics with gradient-based search, they typically rely on categorical distributions that fail to capture the ordinal nature of time and suffer from a parameter space that scales poorly. In this paper, we propose a novel differentiable framework, GauS, that models operator scheduling as a stochastic relaxation using Gaussian distributions and fully utilizes modern parallel computing devices such as GPUs. By representing schedules as continuous Gaussian variables, we successfully capture the ordinal nature of time and reduce the optimization space by orders of magnitude. Our method is flexible enough to represent various objectives and constraints, and it provides the first differentiable formulation for the complex pipelined scheduling problem. We evaluate our method on a range of benchmarks, demonstrating that GauS achieves Pareto-optimal results.
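
To make the contrast with categorical relaxations concrete, the sketch below treats each operator's start time as a Gaussian with a mean and a standard deviation. It is only an illustration under assumptions: the rounding-based step probability, the dependency-violation penalty, the independence of start times, and all names and numbers are hypothetical and are not taken from the GauS paper.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def step_probability(t, mu, sigma):
    """Probability mass that a N(mu, sigma^2) start time rounds to integer step t."""
    return normal_cdf((t + 0.5 - mu) / sigma) - normal_cdf((t - 0.5 - mu) / sigma)


def violation_probability(mu_u, sigma_u, mu_v, sigma_v, latency=1.0):
    """P(start_v - start_u < latency) for independent Gaussian start times."""
    gap_mean = mu_v - mu_u
    gap_std = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    return normal_cdf((latency - gap_mean) / gap_std)


horizon = 100
gaussian_params_per_op = 2            # (mu, sigma), independent of the horizon
categorical_params_per_op = horizon   # one logit per candidate time step

p_same = step_probability(3, mu=3.0, sigma=0.5)   # mass on the step nearest the mean
p_far = step_probability(6, mu=3.0, sigma=0.5)    # mass three steps away
risky = violation_probability(3.0, 0.5, 3.5, 0.5)  # consumer scheduled too close
safe = violation_probability(3.0, 0.5, 6.0, 0.5)   # consumer scheduled well after
```

Because nearby time steps receive similar probability mass, moving the mean shifts the schedule smoothly, and a penalty such as `violation_probability` is differentiable in the means and standard deviations.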

2603.06861 2026-06-16 cs.LG cs.CV updated version

IGLU: The Integrated Gaussian Linear Unit Activation Function

IGLU：集成高斯线性单元激活函数

Mingi Kang, Zai Yang, Jeova Farias Sales Rocha Neto

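Affiliations * Bowdoin College

AI summary: Proposes the IGLU activation function, derived in closed form from a half-normal mixing distribution. Its gate is the Cauchy CDF, a single sharpness parameter interpolates between identity-like and ReLU-like behavior, the heavy tail guarantees non-zero gradients, and a rational approximation using only ReLU operations is given. It matches or exceeds ReLU and GELU on vision and language tasks.

AI Chinese abstract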

激活函数对深度神经网络至关重要，控制着梯度流、优化稳定性和表示能力。在历史深度架构中，ReLU一直是激活函数的主要选择，而现代基于Transformer的模型越来越多地采用更平滑的替代方案，如GELU和其他自门控替代方案。尽管它们在经验上取得了成功，但这些函数之间的数学关系及其有效性背后的原理仍仅被部分理解。我们引入了IGLU，一个参数化激活函数，作为在半正态混合分布下的GELU门控的尺度混合推导得出。该推导产生了一个闭式表达式，其门控分量恰好是柯西CDF，提供了一个原则性的单参数族，通过单一锐度参数$\sigma$在类恒等和类ReLU行为之间连续插值。与GELU的高斯门控不同，IGLU的重尾柯西门控在负尾处以多项式衰减，保证所有有限输入的非零梯度，并对梯度消失具有更强的鲁棒性。我们进一步引入了IGLU-Approx，一种计算高效的IGLU有理近似，完全用ReLU操作表示，消除了超越函数求值。通过在CIFAR-10、CIFAR-100和WikiText-103上使用ResNet-20、ViT-Tiny和GPT-2 Small进行的评估，IGLU在视觉和语言数据集上相对于ReLU和GELU基线实现了具有竞争力或更优的性能，而IGLU-Approx以大幅降低的计算成本恢复了这一性能。特别地，我们表明在高度不平衡的分类数据集中，使用重尾门控带来了显著的性能提升。

English abstract

Activation functions are fundamental to deep neural networks, governing gradient flow, optimization stability, and representational capacity. While ReLU has been the dominant activation function in historic deep architectures, modern transformer-based models are increasingly adopting smoother alternatives such as GELU and other self-gated functions. Despite their empirical success, the mathematical relationships among these functions and the principles underlying their effectiveness remain only partially understood. We introduce IGLU, a parametric activation function derived as a scale mixture of GELU gates under a half-normal mixing distribution. This derivation yields a closed-form expression whose gating component is exactly the Cauchy CDF, providing a principled one-parameter family that continuously interpolates between identity-like and ReLU-like behavior via a single sharpness parameter $\sigma$. Unlike GELU's Gaussian gate, IGLU's heavy-tailed Cauchy gate decays polynomially in the negative tail, guaranteeing non-zero gradients for all finite inputs and offering greater robustness to vanishing gradients. We further introduce IGLU-Approx, a computationally efficient rational approximation of IGLU expressed entirely in terms of ReLU operations that eliminates transcendental function evaluation. Through evaluations on CIFAR-10, CIFAR-100, and WikiText-103 across ResNet-20, ViT-Tiny, and GPT-2 Small, IGLU achieves competitive or superior performance on both vision and language datasets against ReLU and GELU baselines, with IGLU-Approx recovering this performance at substantially reduced computational cost. In particular, we show that employing a heavy-tailed gate leads to considerable performance gains in heavily imbalanced classification datasets.
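
The difference between a Gaussian gate and a heavy-tailed Cauchy gate can be checked numerically. The sketch below covers only the gate and is not the paper's full IGLU expression: the self-gated form `x * gate(x)`, the convention that the sharpness multiplies the input, and the function names are assumptions made for illustration.

```python
import math


def cauchy_gate(x, sharpness=1.0):
    """Cauchy CDF evaluated at x; the sharpness convention is an assumption."""
    return 0.5 + math.atan(sharpness * x) / math.pi


def gaussian_gate(x):
    """Standard normal CDF, the gate used by GELU."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cauchy_gated_unit(x, sharpness=1.0):
    """Self-gated form x * gate(x), used here only for illustration."""
    return x * cauchy_gate(x, sharpness)


x = -50.0
heavy_tail = cauchy_gate(x)     # decays like 1 / (pi * |x|), still clearly non-zero
light_tail = gaussian_gate(x)   # underflows to exactly 0.0 in double precision

nearly_linear = cauchy_gated_unit(2.0, sharpness=1e-6)  # gate near 1/2, so about x / 2
nearly_relu = cauchy_gated_unit(2.0, sharpness=1e6)     # gate near 1, so about x
nearly_off = cauchy_gated_unit(-2.0, sharpness=1e6)     # gate near 0, so about 0
```

In this toy form a small sharpness gives a nearly linear unit and a large sharpness gives a nearly ReLU-shaped one, while the polynomial tail keeps the gate away from zero far into the negative range.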

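2603.07079 2026-06-16 cs.LG cs.CL updated version

Entropy-Aware On-Policy Distillation of Language Models

Woogyeol Jin, Taywon Min, Yongjin Yang, Dennis Wei, Yi Zhou, Swanand Ravindra Kadhe, Nathalie Baracaldo, Kimin Lee

AI summary: To address the loss of generation diversity caused by reverse KL in on-policy distillation and the unstable learning signal when the teacher has high entropy, the paper proposes entropy-aware on-policy distillation, which adds forward KL when teacher entropy is high to balance mode seeking with mode covering, improving generation diversity and student-teacher alignment.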

Comments 18 pages, 11 figures, ICML 2026

Abstract

On-policy distillation is a promising approach for transferring knowledge between language models, where a student learns from dense token-level signals along its own trajectories. This framework typically uses reverse KL divergence, encouraging the student to match the teacher's high-confidence predictions. However, we show that the mode-seeking property of reverse KL reduces generation diversity and yields unstable learning signals when the teacher distribution has high entropy. To address this, we introduce Entropy-Aware On-Policy Distillation. Our key idea is augmenting the standard reverse KL objective with forward KL when teacher entropy is high, capturing the full range of plausible outputs while retaining precise imitation elsewhere. It balances mode-seeking precision with mode-covering robustness without sacrificing on-policy training efficiency. Experiments show that our method maintains generation diversity (sustained token-level entropy) and improves student-teacher alignment (lower forward KL on high-entropy tokens). Across six math reasoning benchmarks, this yields Pass@8 accuracy gains of +1.37 for Qwen3-0.6B-Base, +2.39 for Qwen3-1.7B-Base, and +5.05 for Qwen3-4B-Base compared to baseline on-policy distillation methods. These results demonstrate that accounting for teacher uncertainty is essential for maintaining diversity and achieving effective knowledge transfer.
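Illustrative sketch (not taken from the paper): the abstract describes augmenting the reverse KL objective with forward KL when teacher entropy is high. The snippet below shows one way such a per-token loss could look for a single position. The hard entropy threshold, the `forward_weight` factor, and all function names are assumptions made for illustration; the paper may use a different gating rule.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def kl(p, q, eps=1e-12):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def entropy_aware_token_loss(teacher, student, entropy_threshold=1.0, forward_weight=1.0):
    """Reverse KL everywhere, plus forward KL where the teacher is uncertain."""
    reverse_kl = kl(student, teacher)  # mode-seeking term, KL(student || teacher)
    if entropy(teacher) > entropy_threshold:
        # Teacher is uncertain: add the mode-covering term, KL(teacher || student).
        return reverse_kl + forward_weight * kl(teacher, student)
    return reverse_kl
```

In an on-policy setup this loss would be evaluated at each position of a sequence sampled from the student, with `teacher` and `student` being the two next-token distributions at that position.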

2603.13751 2026-06-16 cs.LG updated version

Manifold-Orthogonal Dual-spectrum Extrapolation for Parameterized Physics-Informed Neural Networks

Zhangyong Liang, Huanhuan Gao

Affiliations: National Center for Applied Mathematics, Tianjin University; School of Mechanical and Aerospace Engineering, Jilin University

AI summary: Proposes Manifold-Orthogonal Dual-spectrum Extrapolation (MODE), which uses three mechanisms (principal-spectrum dense mixing, residual-spectrum awakening, and translation unlocking) to achieve strong out-of-distribution generalization for parameterized PINNs while retaining the parameter efficiency of SVD.

Abstract

Physics-informed neural networks (PINNs) have achieved notable success in modeling dynamical systems governed by partial differential equations (PDEs). To avoid computationally expensive retraining under new physical conditions, parameterized PINNs (P$^2$INNs) commonly adapt pre-trained operators using singular value decomposition (SVD) for out-of-distribution (OOD) regimes. However, SVD-based fine-tuning often suffers from rigid subspace locking and truncation of important high-frequency spectral modes, limiting its ability to capture complex physical transitions. While parameter-efficient fine-tuning (PEFT) methods appear to be promising alternatives, applying conventional adapters such as LoRA to P$^2$INNs introduces a severe Pareto trade-off, as additive updates increase parameter overhead and disrupt the structured physical manifolds inherent in operator representations. To address these limitations, we propose Manifold-Orthogonal Dual-spectrum Extrapolation (MODE), a lightweight micro-architecture designed for physics operator adaptation. MODE decomposes physical evolution into complementary mechanisms including principal-spectrum dense mixing that enables cross-modal energy transfer within frozen orthogonal bases, residual-spectrum awakening that activates high-frequency spectral components through a single trainable scalar, and affine Galilean unlocking that explicitly isolates spatial translation dynamics. Experiments on challenging PDE benchmarks including the 1D Convection--Diffusion--Reaction equation and the 2D Helmholtz equation demonstrate that MODE achieves strong out-of-distribution generalization while preserving the minimal parameter complexity of native SVD and outperforming existing PEFT-based baselines.

2605.06734 2026-06-16 cs.LG cs.AI quant-ph updated version

Gated QKAN-FWP: Scalable Quantum-inspired Sequence Learning

Kuo-Chung Peng, Samuel Yen-Chi Chen, Jiun-Cheng Jiang, Chen-Yu Liu, En-Jui Kuo, Yun-Yuan Wang, Prayag Tiwari, Andrea Ceschini, Chi-Sheng Chen, Yu-Chao Hsu, Chun-Hua Lin, Tai-Yue Li, Antonello Rosato, Massimo Panella, Simon See, Saif Al-Kuwari, Kuan-Cheng Chen, Nan-Yow Chen, Hsi-Sheng Goan

Affiliations: Department of Physics and Center for Theoretical Physics, National Taiwan University; National Center for High-Performance Computing, National Institutes of Applied Research; Wells Fargo, New York, NY, USA; NVIDIA AI Technology Center, NVIDIA Corp., Taipei, Taiwan; Center for Quantum Science and Engineering, National Taiwan University; Graduate Institute of Applied Physics, National Taiwan University; Department of Electrophysics, National Yang Ming Chiao Tung University; School of Information Technology, Halmstad University; Department of Information Engineering, Electronics and Telecommunications (DIET), University of Rome “La Sapienza”, Rome, Italy; Beth Israel Deaconess Medical Center & Harvard Medical School; Cross College Elite Program, National Cheng Kung University

AI summary: Proposes the gated QKAN-FWP framework, which combines fast weight programming with quantum-inspired KANs, uses single-qubit data re-uploading circuits as nonlinear activations, and introduces a scalar-gated update rule. It outperforms classical recurrent models on time-series benchmarks, MiniGrid reinforcement learning, and solar cycle forecasting, and its feasibility is verified on NISQ devices.

Comments 46 pages, 13 figures, 10 tables

Abstract

Fast Weight Programmers (FWPs) encode temporal dependencies through dynamically updated parameters rather than recurrent hidden states. Quantum FWPs (QFWPs) extend this idea with variational quantum circuits (VQCs), but existing implementations rely on multi-qubit architectures that are difficult to scale on noisy intermediate-scale quantum (NISQ) devices and expensive to simulate classically. We propose gated QKAN-FWP, a fast-weight framework that integrates FWP with Quantum-inspired Kolmogorov-Arnold Network (QKAN) using single-qubit data re-uploading circuits as learnable nonlinear activation, known as DatA Re-Uploading ActivatioN (DARUAN). We further introduce a scalar-gated fast-weight update rule that stabilizes parameter evolution, supported by a theoretical analysis of its adaptive memory kernel, geometric boundedness, and parallelizable gradient paths. We evaluate the framework across time-series benchmarks, MiniGrid reinforcement learning, and highlight real-world solar cycle forecasting as our main practical result. In the long-horizon setting with 528-month input window and 132-month forecast horizon, our 12.5k-parameter model achieves lower scaled Mean Square Error (MSE), peak amplitude error, and peak timing error than a suite of classical recurrent baselines with up to 13x more parameters, including Long Short-Term Memory (LSTM) networks (25.9k-89.1k parameters), WaveNet-LSTM (167k), Vanilla recurrent neural network (11.5k), and a Modified Echo State Network (132k). To validate NISQ compatibility, we further deploy the trained fast programmer on IonQ and IBM Quantum processors, recovering forecasting accuracy within 0.1% relative MSE of the noiseless simulator at 1024 shots. These results position gated QKAN-FWP as a scalable, parameter-efficient, and NISQ-compatible approach to quantum-inspired sequence modeling.

2605.22873 2026-06-16 cs.LG cs.AI cs.CL updated version

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

Wei Xia, Haoqing Wang, Zhi-Hong Deng, Yehui Tang

Affiliations: Samsung Research; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University

AI summary: Detects the reasoning state of LLMs from early-stage decoding entropy dynamics and proposes EDRM, a lightweight, training-free routing framework that adaptively selects inference strategies, reducing token consumption while improving accuracy.

Abstract

Chain-of-thought (CoT) reasoning has become the default strategy for enhancing LLM capabilities, yet its application raises a fundamental question: when is explicit reasoning actually beneficial? Empirical evidence reveals a striking paradox: CoT often provides marginal or even negative gains on factual and open-ended tasks while multiplying token consumption. In this work, we show that LLM reasoning is not a static property of tasks or models, but a dynamic decoding state that emerges during generation. Through systematic analysis, we find early-stage entropy dynamics provide a reliable signal of this state: tasks benefiting from CoT exhibit consistent entropy reduction, while others display unstable or increasing patterns. This behavior can be interpreted as a phase-transition-like shift from a high-entropy exploratory regime to a low-entropy structured reasoning regime. Based on these insights, we propose EDRM (Entropy Dynamics-based Reasoning Manifold), a lightweight and training-free routing framework that leverages early decoding entropy to adaptively select inference strategies. EDRM embeds entropy trajectories into a compact and interpretable manifold representation, enabling both zero-shot deployment and fine-grained instance-level adaptation. Across 15 benchmarks and 4 LLMs of varying scales and architectures, EDRM consistently outperforms static baselines. At the dataset level, EDRM achieves 41--55% token reduction while improving accuracy with as few as 50 calibration samples. At the instance level, it further improves accuracy by up to 4.7% while maintaining 27--45% token savings. These results suggest that reasoning should be invoked selectively rather than by default, and demonstrate the effectiveness of entropy-driven decoding control for efficient and adaptive LLM inference.
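Illustrative sketch (a toy stand-in, not EDRM itself): the abstract reports that tasks benefiting from CoT show consistent entropy reduction in early decoding steps. The snippet below turns that observation into a simple slope test on the early entropy trajectory. The slope-based rule, the threshold, and the names `entropy_slope` and `route` are assumptions for illustration; the manifold embedding and calibration procedure described in the abstract are not reproduced here.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_slope(entropies):
    """Least-squares slope of entropy against decoding step index."""
    n = len(entropies)
    if n < 2:
        raise ValueError("need at least two decoding steps")
    mean_t = (n - 1) / 2.0
    mean_h = sum(entropies) / n
    cov = sum((t - mean_t) * (h - mean_h) for t, h in enumerate(entropies))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


def route(early_step_distributions, slope_threshold=0.0):
    """Return 'cot' when early entropy trends downward, otherwise 'direct'."""
    entropies = [token_entropy(p) for p in early_step_distributions]
    return "cot" if entropy_slope(entropies) < slope_threshold else "direct"
```

A caller would pass the next-token distributions from the first few decoding steps and use the returned label to decide whether to continue with explicit reasoning or answer directly.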

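2605.31027 2026-06-16 cs.LG updated version

Multi-Scale Separable Fourier Neural Networks for Solving High-Frequency PDEs

Qihong Yang, Qiaolin He

Affiliations: University of Science and Technology of China

AI summary: Proposes Multi-Scale Separable Fourier Neural Networks (MS-SFNN), which use a separable representation, randomly initialized fixed weights, and Fourier feature embedding to solve linear and nonlinear high-frequency PDEs accurately and efficiently.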

Comments 51 pages, 27 figures

Abstract

We propose a novel neural network architecture, termed Multi-Scale Separable Fourier Neural Networks (MS-SFNN), for the accurate and efficient solution of linear and nonlinear high-frequency partial differential equations (PDEs). MS-SFNN exploits a separable representation: given a $d$-dimensional input, it employs $d$ independent subnetworks -- each acting on a single coordinate -- and constructs basis functions via element-wise multiplication of their outputs. The PDE solution is approximated as a linear combination of these basis functions, with coefficients determined by least squares. Critically, all network weights and biases are randomly initialized once, from a uniform distribution with unit variance, and remain fixed thereafter. To enhance expressivity, a tunable scaling factor is introduced in each subnetwork to modulate the frequency content of the resulting basis functions. Fourier features are explicitly embedded through cosine activations, endowing the method with strong spectral approximation capabilities. To mitigate the memory bottleneck associated with dense collocation in high-frequency or three-dimensional problems, we replace automatic differentiation with analytically derived basis function derivatives and develop a memory-efficient batched QR decomposition algorithm for solving large-scale least-squares systems. Numerical experiments demonstrate that MS-SFNN achieves unprecedented accuracy across a range of challenging PDEs, significantly outperforming state-of-the-art methods such as Physics-Informed Neural Networks (PINN) and Separated-Variable Spectral Neural Networks (SV-SNN).
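Illustrative sketch (not taken from the paper): the abstract describes one subnetwork per coordinate, basis functions formed by multiplying subnetwork outputs, cosine activations, fixed random weights drawn from a unit-variance uniform distribution, and analytic derivatives in place of automatic differentiation. The snippet below assumes each subnetwork is a single cosine layer, that the uniform distribution is zero-mean (so its range is plus or minus the square root of 3), and that one shared `scale` multiplies the weights; all names are hypothetical. The least-squares fit of the coefficients is not shown.

```python
import math
import random


def make_subnetworks(dim, n_basis, seed=0):
    """Fixed random (weight, bias) pairs per coordinate from U(-sqrt(3), sqrt(3))."""
    rng = random.Random(seed)
    a = math.sqrt(3.0)  # a zero-mean uniform on [-a, a] has variance a^2 / 3 = 1
    return [[(rng.uniform(-a, a), rng.uniform(-a, a)) for _ in range(n_basis)]
            for _ in range(dim)]


def basis(params, scale, x):
    """phi_k(x) = prod_j cos(scale * w_jk * x_j + b_jk)."""
    out = [1.0] * len(params[0])
    for j, xj in enumerate(x):
        for k, (w, b) in enumerate(params[j]):
            out[k] *= math.cos(scale * w * xj + b)
    return out


def basis_partial(params, scale, x, axis):
    """Analytic d phi_k / d x_axis: only the factor for `axis` is differentiated."""
    out = [1.0] * len(params[0])
    for j, xj in enumerate(x):
        for k, (w, b) in enumerate(params[j]):
            z = scale * w * xj + b
            out[k] *= -scale * w * math.sin(z) if j == axis else math.cos(z)
    return out
```

Because each basis function is a product of one-dimensional factors, a partial derivative only changes the factor for the differentiated coordinate, which is what makes closed-form derivatives cheap in a separable representation. Raising `scale` increases the frequency content of every basis function.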

2606.04678 2026-06-16 cs.LG updated version

Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

Yacouba Kaloga, Shashi Kumar, Shakeel A. Sheikh, Driss Khalil, Petr Motlicek, Ina Kodrasi

Affiliations: Idiap Research Institute; EPFL; BUT; Novartis Institute of Biomedical Research

AI summary: Proposes LARM, a depth-conditioned looped Transformer that turns recurrent encoder depth into a controllable test-time compute axis by combining sparse CTC checkpoints, supervision-clock embeddings, FiLM depth conditioning, and delayed soft-posterior feedback. On LibriSpeech, WER improves as the number of inference loops increases, extending test-time compute scaling from autoregressive language-model reasoning to continuous non-autoregressive speech recognition.

Abstract

End-to-end ASR systems typically use fixed-depth acoustic encoders at inference, making it difficult to trade additional test-time computation for improved recognition without training a larger model. A natural approach is to reuse a shared Transformer block recurrently, but we find that naive looping does not fully exploit additional recurrent compute. We introduce LARM, a depth-conditioned looped Transformer that turns recurrent encoder depth into a controllable test-time compute axis. LARM combines sparse CTC checkpoints, supervision-clock embeddings, FiLM depth conditioning, and delayed soft-posterior feedback. These components structure the loop into recognition checkpoints separated by latent refinement phases and allow shared weights to specialize across recurrent steps. On LibriSpeech, LARM improves WER as the number of inference loops increases and achieves performance competitive with deeper unshared-parameter baselines. Our results show that test-time compute scaling can extend beyond autoregressive language-model reasoning to continuous non-autoregressive speech recognition.

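2606.05878 2026-06-16 cs.LG updated version

TS-ICL: A Flexible Time-Indexed Foundation Model for Time Series via In-Context Learning

Etienne Le Naour, Tahar Nabil, Adrien Petralia

Affiliations: EDF R&D

AI summary: Proposes TS-ICL, a probabilistic in-context learning encoder-regressor Transformer that unifies time series forecasting and imputation, sets a new state of the art in imputation, and performs especially well when forecasting from partially observed look-back windows.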

Abstract

Foundation models mark a profound paradigm shift in time series modeling, with task-specific models being superseded by general-purpose zero-shot models. Yet, current approaches primarily focus on forecasting, while real-world time series are often irregularly and partially observed, requiring models that can jointly forecast, impute missing values, and handle degraded sampling conditions. To address these challenges, we introduce TS-ICL, a novel probabilistic In-Context Learning encoder--regressor Transformer that unifies forecasting and imputation. TS-ICL formulates time series tasks as timestamp-aligned regression and naturally incorporates covariates by training on synthetic dependency structures generated from a novel causal data prior. Empirically, TS-ICL achieves a new state-of-the-art in imputation, while remaining competitive with leading forecasting foundation models across both univariate and covariate-aware benchmarks. It shows particularly strong performance in forecasting with partially observed look-back windows.

2606.07082 2026-06-16 cs.LG cs.AI updated version

On the Geometry of On-Policy Distillation

Zhennan Shen, Yanshu Li, Qingyu Yin, Chak Tou Leong, Zhilin Wang, Yanxu Chen, Rongduo Han, Sunbowen Lee, Yi R. Fung

Affiliations: HKUST; UT Austin; Zhejiang University; Hong Kong PolyU; USTC; BUPT; Nankai University; BIT

AI summary: Using parameter-space diagnostics, shows that the update trajectory of on-policy distillation (OPD) has distinctive geometric properties, including a relaxed off-principal regime and subspace locking, indicating that OPD is not merely an intermediate point between SFT and RLVR.

Comments 17 pages, 8 figures

Abstract

On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training dynamics remain poorly understood. We characterize the trajectory of OPD updates in parameter space and compare it with supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). A suite of parameter-space diagnostics consistently places OPD in a relaxed off-principal regime: compared with SFT, its updates affect fewer weights and avoid principal directions more strongly, while compared with RLVR, they remain less tightly constrained. Beyond this static localization, OPD exhibits subspace locking: its cumulative updates rapidly enter a narrow low-dimensional channel. Constraining training to the update subspace formed early in training preserves OPD performance but substantially degrades SFT, indicating that the locked subspace is functionally sufficient for OPD. Control experiments further show that sparsifying the update tokens and shifting rollout generation off-policy preserve the rank dynamics, whereas mixing the OPD objective with RLVR changes them. Overall, these results suggest that OPD is not merely an intermediate point between SFT and RLVR, but induces its own update geometry in parameter space.

2606.07678 2026-06-16 cs.LG cs.AI updated version

DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment

Yi Nian, Tiankai Yang, Yudi Zhang, Qi Pan, Zelong Xu, Shenzhe Zhu, Qingqing Luan, Yue Huang, Xiangliang Zhang, Yue Zhao

Affiliations: University of Southern California; Iowa State University; University of Wisconsin–Madison; UT Austin; Independent Researcher; University of Notre Dame

AI summary: Proposes DOG-DPO, a framework that represents preference pairs as directions in model representation space and selects subsets through geometric decomposition and diversity-based coverage, recovering most of the safety gains with only 11% of the data.

Abstract

Safety alignment for large language models relies on preference data, but current pipelines often train on large, redundant datasets. Existing data selection methods typically score each preference pair independently, collapsing directional preference information into scalar quality or diversity scores. This sample-centric view is especially limiting in multi-dataset settings, where shared safety directions coexist with dataset-specific residual risks. We propose DOG-DPO, a training-free data selection framework that treats preference pairs as structured geometric signals. DOG-DPO first represents each preference pair as a direction in model representation space. It then decomposes multi-dataset preference geometry into a global anchor subspace and dataset-specific residual subspaces. Finally, it selects subsets by maximizing diversity-based coverage, encouraging broad, non-redundant coverage of alignment directions before DPO training. Across six safety benchmarks and two model backbones, DOG-DPO achieves a strong utility-robustness trade-off using only 11% of the preference pairs. It recovers most of the safety gains of full-data training while remaining entirely teacher-free, training-free, and substantially faster than representative selection baselines.

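2606.11123 2026-06-16 cs.LG updated version

Overcoming Rank Collapse in Feedback Alignment

Gauthier Boeshertz, Razvan Pascanu, Claudia Clopath

Affiliations: Imperial College London; Mila

AI summary: Finds that feedback alignment (FA) fails in deep networks because its error signal has low rank, and proposes raising the signal dimensionality with the Muon optimizer and hidden activity normalization, improving ResNet-18 accuracy on CIFAR100 by 9 percentage points.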

Comments 9 pages and 4 figures, 1 table for main text. Total of 21 pages and 13 figures with appendix

Abstract

Backpropagation (BP) is widely viewed as biologically implausible, in part because it requires feedback weights to be the transpose of forward weights for error propagation. Interestingly, when training a network with fixed random feedback weights to circumvent this issue, learning aligns the forward weights with the feedback weights, leading the backpropagated error signal to become an approximation of the standard gradient used by BP. This process, called Feedback Alignment (FA), occurs in MLPs and very shallow CNNs but does not scale well to deeper architectures. In this work, we first investigated differences between BP and FA models, trained on CIFAR10, specifically focusing on the effective rank of the signal. We found that the FA error has a considerably lower rank and hence is constrained to a lower-dimensional subspace compared to BP, limiting exploration of the parameter space. Motivated by this observation, we evaluated two mechanisms for increasing the effective dimensionality of FA: Muon, an optimiser that orthogonalises weight updates; and hidden activity normalisation, which promotes activation orthogonality. Across larger architectures and benchmarks, we find that these methods consistently improve over FA baselines; for example, on CIFAR100 with a ResNet-18, accuracy increases by 9 percentage points. Our results identify low-dimensional gradient dynamics as a key obstacle to scaling FA and suggest that inducing higher-dimensional update geometry is a promising route toward scaling alternatives to backpropagation.
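Illustrative sketch (not taken from the paper): the abstract compares the effective rank of BP and FA error signals but does not say which definition is used. The snippet below assumes the common entropy-based definition, the exponential of the Shannon entropy of the normalized singular values, and takes the singular values as given (they could come from a routine such as `numpy.linalg.svd`). The function name is hypothetical.

```python
import math


def effective_rank(singular_values):
    """exp(Shannon entropy) of the normalized singular-value distribution."""
    positive = [s for s in singular_values if s > 0.0]
    total = sum(positive)
    if total == 0.0:
        return 0.0  # an all-zero matrix has no informative directions
    probs = [s / total for s in positive]
    return math.exp(-sum(p * math.log(p) for p in probs))
```

Under this definition a matrix whose singular values are all equal has effective rank equal to their count, while a matrix dominated by one direction has effective rank close to 1, which matches the intuition of a signal confined to a low-dimensional subspace.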

2606.14398 2026-06-16 cs.LG updated version

A theoretical model for task routing in mixture-of-expert transformers

Vinoth Nandakumar, Yongli Xiang, Yunzhi Yao, Peike Li, Tongliang Liu

Affiliations: University of Sydney; Zhejiang University; Google Research

AI summary: Using discrete models of language, proves that a single-layer MoE Transformer can achieve task specialization through its experts, supporting empirical findings.

Abstract

Mixture-of-experts (MoE) layers enable the scaling of transformer models while keeping the inference compute fixed. While task-expert specialization has been observed in empirical studies of frontier MoE transformer models, existing theoretical work analyzes this using continuous mixture models that cannot be used to model natural language effectively. An important open question is to theoretically explain task-expert specialization in transformer MoE models using discrete models of language. To address this, we represent structured knowledge via syntactic templates and finite key-value dictionaries, and prove formally that a single-layer MoE transformer can encode knowledge by using experts that specialize in the corresponding tasks. Our construction shows how queries are routed to unique, task-specific experts whose size depends solely on the intrinsic complexity of the given task (i.e., the combined size of its syntactic templates and factual dictionary). Our construction provides theoretical support for empirical results on localized knowledge circuits in MoE models. We support our theoretical findings with experiments evaluating model performance under varying MoE loss functions.

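2310.06555 2026-06-16 cs.CL cs.AI cs.LG cs.MA updated version

It's About Time: Temporal References in Emergent Communication

Olaf Lipinski, Adam J. Sobey, Federico Cerutti, Timothy J. Norman

Affiliations: University of Southampton; The Alan Turing Institute; University of Brescia

AI summary: Studies the absence of temporal references in emergent communication and finds that changing the loss function alone is insufficient: an architectural change (a batching approach) is needed for temporal references to emerge, with over 95% of agents succeeding, laying the groundwork for more efficient communication.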

Comments 23 pages main body and 31 pages supplementary material, 9 figures in main body. Code available at https://github.com/olipinski/TRG

DOI: 10.1613/jair.1.19795
Journal ref: Journal of Artificial Intelligence Research 86, Article 11 (June 2026)

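Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang

AI summary: To address latent overthinking in looped transformers, proposes Think-at-Hard, which uses a lightweight decider to selectively trigger latent iterations at hard tokens, together with depth-aware LoRA and a duo-causal attention mechanism, yielding consistent gains on math, QA, and coding tasks.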

Comments Accepted by ICML'26

Abstract

Improving the reasoning abilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Looped transformers address this by performing multiple latent iterations to refine each token beyond a single forward pass. However, we identify a latent overthinking phenomenon: most token predictions are already correct after the first pass, but are sometimes revised into errors in later iterations. We ask whether selectively skipping latent iterations can improve accuracy, and reveal significant potential with an oracle iteration policy that boosts performance by up to 7.3%. Motivated by this, we propose Think-at-Hard (TaH), a looped transformer optimized for selective iteration. TaH employs a lightweight neural decider to trigger latent iteration, only at tokens likely to be incorrect after the standard forward pass. During latent iterations, depth-aware Low-Rank Adaptation (LoRA) modules shift the objective from general next-token prediction to focused hard-token refinement. A duo-causal attention mechanism extends attention from the token sequence dimension to an additional iteration depth dimension, enabling cross-iteration information flow with full sequential parallelism. Experiments on nine benchmarks show consistent gains across math, QA, and coding tasks. With identical parameter counts, TaH outperforms always-iterate baselines by 3.8-4.4% while skipping iterations on 93% of tokens, and exceeds single-iteration Qwen3 baselines by 3.0-3.8%. When allowing <3% more parameters from LoRA and decider, the gains further increase to 5.3-6.2% and 6.1-6.8%, respectively. Our code is available at https://github.com/thu-nics/TaH.

2602.12279 2026-06-16 cs.CV cs.AI cs.LG updated version

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Leon Liangyu Chen, Haoyu Ma, Zhipeng Fan, Ziqi Huang, Animesh Sinha, Xiaoliang Dai, Jialiang Wang, Zecheng He, Jianwei Yang, Chunyuan Li, Junzhe Sun, Chu Wang, Serena Yeung-Levy, Felix Juefei-Xu

Affiliations: Stanford University; Meta Superintelligence Labs; Nanyang Technological University

AI summary: Proposes the UniT framework, which enables test-time scaling for unified multimodal models through multi-round reasoning, verification, and refinement. Experiments show that short reasoning trajectories generalize to long chains and that sequential chain-of-thought is more efficient than parallel sampling.

Comments CVPR 2026

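Minimalist Genetic Programming

Leonardo Trujillo

Affiliations: Tecnológico Nacional de México/IT de Tijuana; LASIGE, Department of Informatics, Faculty of Sciences, University of Lisbon

AI summary: Proposes Minimalist Genetic Programming (MGP), which draws on the Minimalist Program from linguistics and replaces evolutionary search with the MERGE operation, effectively avoiding bloat on symbolic regression tasks and consistently finding exact solutions.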

Abstract

Genetic programming (GP) is based on two important insights. First, any learning task can fundamentally be posed as a program induction problem, where the goal is to construct a symbolic hierarchical model that is expressed as a syntax tree. Second, this task can be posed as a search problem, with evolution used to locate the desired model. Since it was proposed, GP has produced notable results in a wide range of tasks and problem domains. This work presents an alternative view by modifying the second core insight of GP, posing the problem as a syntactic derivation task instead. In particular, this paper presents Minimalist Genetic Programming (MGP), an algorithm that, like GP, is biologically inspired, but that draws its inspiration not from evolution but from the Minimalist Program for human language, in which syntax is understood as an optimal solution to the problem of linking two other mental systems. In minimalism, the core computational process is a binary set formation operator called $MERGE$, that can be used to incrementally construct complex syntactic structures using a simple Markovian process. MGP is able to discover the core building blocks of the symbolic expressions, and to incrementally combine them using $MERGE$. The proposed system is benchmarked on symbolic regression tasks that are known to be difficult to solve with standard GP systems because of the propensity for bloat. Results show that when a proper lexicon of atomic syntactic objects is chosen, MGP is able to consistently produce the exact ground truth model on a set of symbolic regression tasks where standard GP struggles to do the same. The insights provided by minimalism are shown to be relevant to the problem of program induction, and should be explored further based on the potential exhibited by MGP in this work.

2606.13710 2026-06-16 cs.AI cs.LG 版本更新

Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher

混合开放式三重进化打造更优深度研究者

Hongming Piao, Chi Liu, Mengzhuo Chen, Yan Shu, Xidong Wang, Derek Li, Ying Wei, Bryan Dai

发表机构 * IQuest Research ； Zhejiang University（浙江大学）

AI总结提出混合开放式三重进化框架，通过混合模式强化学习协同进化提议者、求解者和评判者，使8B模型在深度研究任务上超越静态开源8-32B模型及先进训练方法。

详情

AI中文摘要

深度研究和智能体进化是AI智能体在现实应用中迈向通用人工智能的实际任务。前者使智能体能够在开放环境中自主检索和整合信息以处理开放式研究任务，但受限于智能体系统的静态参数化深度研究能力。后者允许智能体自主与环境交互以获得经验，从而进化模型能力。然而，其有效性仅在具有标准答案的可验证任务上得到广泛验证，与开放式研究任务存在差距。为桥接这两个关键任务，我们提出混合开放式三重进化框架，该框架利用混合模式强化学习，基于网络规模知识促进提议者、求解者和评判者的协同进化，朝着开放式任务和环境中自主进化的智能体迈进。在三个长格式深度研究基准上的大量实验表明，通过HOTE训练的8B模型超越了最强的静态开源8-32B模型以及通过最先进深度研究训练方法训练的模型，且时间开销更少，并进一步验证了HOTE中三个模块的进化不可或缺。

英文摘要

Deep research and agent evolution serve as de-facto tasks for AI agents in real-world applications toward artificial general intelligence. The former enables autonomous retrieval and integration of information in open-ended environments to tackle open-ended research tasks, yet it is constrained by the static parametric deep research capabilities of agent systems. The latter allows agents to autonomously interact with the environment to gain experiences that evolve model capabilities. However, its effectiveness has been widely validated only on verifiable tasks with standard answers, leaving a gap with open-ended research tasks. To bridge these two critical tasks, we propose the Hybrid Open-Ended Tri-Evolution (HOTE) framework, which leverages hybrid-mode reinforcement learning to facilitate the collaborative evolution of a proposer, solver and judge based on web-scale knowledge, moving toward autonomous evolving agents in open-ended tasks and environments. Extensive experiments on three long-form deep research benchmarks demonstrate that the 8B model trained via HOTE surpasses the strongest static open 8-32B models as well as those trained by state-of-the-art deep research training methods with less time overhead, and further verify that the evolution of all three modules in HOTE is indispensable.

2606.15054 2026-06-16 cs.LG 新提交

Size Doesn't Matter: Cosine-Scored Sparse Autoencoders

大小无关：余弦评分稀疏自编码器

Silen Naihin, Lev Stambler

发表机构 * GitHub ； arXiv

AI总结针对稀疏自编码器中内积评分受输入范数干扰的问题，提出余弦评分方法，使特征检测更关注方向对齐，实验表明该方法能更频繁地学习到人类可识别的概念。

详情

Journal ref: ICML 2026, Spotlight at the Mechanistic Interpretability Workshop

AI中文摘要

自回归蛋白质语言模型中的电路追踪

Darin Tsui, William Deinzer, Daniel Saeedi, Amirali Aghazadeh

发表机构 * Stanford University（斯坦福大学）

AI总结提出ProGenMech框架，通过跨层转码器（CLTs）忠实恢复ProGen3的生成计算，并零样本发现与蛋白质生成和适应性预测相关的稀疏电路，揭示生物意义基序。

Comments Accepted into the Mechanistic Interpretability Workshop at ICML 2026. 24 pages, 14 figures

详情

AI中文摘要

蛋白质语言模型（pLMs）可以生成具有超越自然界观察到的特性的新型蛋白质序列，然而蛋白质生成背后的机制仍然知之甚少。现有的基于稀疏自编码器和转码器（transcoder）的机械可解释性方法主要关注蛋白质表示学习模型，并未捕捉自回归生成所需的计算。在这里，我们引入了ProGenMech，一个用于生成式蛋白质语言模型的机械可解释性框架，它将跨层转码器（CLTs）扩展到ProGen3，一个为因果生成和跨度填充训练的稀疏专家混合模型。与逐层方法不同，CLTs使用来自所有前层的稀疏潜变量重建每一层，从而能够忠实地恢复层间生成计算。我们进一步开发了一个零样本电路发现框架，以识别负责蛋白质生成和适应性预测的稀疏潜电路。在因果生成和零样本适应性估计任务中，ProGenMech在恢复ProGen3的概率分布和功能评分行为方面优于局部转码器基线，同时在跨度填充任务中匹配原始模型的生成分布。此外，恢复的电路揭示了与保守序列模式和蛋白质适应性景观相关的生物学上有意义的基序和功能区域，为可解释和可引导的蛋白质生成奠定了基础。

英文摘要

Protein language models (pLMs) can generate novel protein sequences with properties beyond those observed in nature, yet the mechanisms underlying protein generation remain poorly understood. Existing mechanistic interpretability methods based on sparse autoencoders and transcoders primarily focus on protein representation learning models and do not capture the computation required for autoregressive generation. Here, we introduce ProGenMech, a mechanistic interpretability framework for generative protein language models that extends cross-layer transcoders (CLTs) to ProGen3, a sparse Mixture-of-Experts model trained for both causal generation and span infilling. Unlike per-layer approaches, CLTs reconstruct each layer using sparse latent variables from all preceding layers, enabling faithful recovery of inter-layer generative computation. We further develop a zero-shot circuit discovery framework to identify sparse latent circuits responsible for protein generation and fitness prediction. In causal generation and zero-shot fitness estimation tasks, ProGenMech outperforms local transcoder baselines in recovering ProGen3's probability distribution and functional scoring behavior, while matching the original model's generative distribution in span infilling tasks. Moreover, the recovered circuits reveal biologically meaningful motifs and functional regions associated with conserved sequence patterns and protein fitness landscapes, establishing a foundation for interpretable and steerable protein generation.

2606.16462 2026-06-16 cs.LG cs.AI 新提交

从物理到表示：通过程序化生成进行合成预训练的音频学习

Fengrui Liu, Ruiyang Huang, Qijian Zheng, Yuanfang Wang, Feng Liu

发表机构 * East China Normal University（华东师范大学）； Southeast University（东南大学）； Fudan University（复旦大学）； Shanghai Jiao Tong University（上海交通大学）

AI总结提出AudioPG框架，利用程序化合成生成波形进行掩码自编码器预训练，无需真实音频数据，在多个基准上取得高精度，且单GPU训练不到20分钟。

Comments Accepted to ACM ICMR 2026

详情

DOI: 10.1145/3805622.3810789

AI中文摘要

自监督学习推动了多媒体分析中音频表示的发展。然而，主流的数据驱动方法依赖大规模真实世界语料库，增加了训练成本、整理负担和隐私障碍。为解决这一问题，我们提出了AudioPG，一个程序化合成框架，在预训练过程中完全消除了真实音频录音。AudioPG在由基本声学基元和组合规则实时生成的波形上训练基于Transformer的掩码自编码器。该编码器有效迁移到真实音频基准，在ESC-50上达到90.60%的准确率，在FSD50K上达到0.546 mAP，在UrbanSound8K上达到88.17%，在Speech Commands V2上达到97.03%。值得注意的是，预训练在单个GPU上不到20分钟即可完成。潜在空间分析揭示了物理因素（包括基频和相对强度）在正交子空间中出现，使得表示可线性解码。这些结果表明，当大规模语料库不可用时，程序化合成是一种高效、可解释的预训练信号。我们的代码可在https://github.com/Freyliu0516/audioPG获取。

英文摘要

Self-supervised learning advances audio representation for multimedia analysis. However, prevailing data-centric approaches rely on massive real-world corpora, increasing training costs, curation burdens, and privacy barriers. To address this, we present AudioPG, a procedural synthesis framework eliminating real audio recordings during pre-training. AudioPG trains a Transformer-based masked autoencoder on waveforms generated on-the-fly from basic acoustic primitives and composition rules. The encoder transfers effectively to real audio benchmarks, achieving 90.60% accuracy on ESC-50, 0.546 mAP on FSD50K, 88.17% on UrbanSound8K, and 97.03% on Speech Commands V2. Notably, pre-training completes in under 20 minutes on a single GPU. Latent space analysis reveals physical factors, including fundamental frequency and relative intensity, emerge in orthogonal subspaces, making representations linearly decodable. These results establish procedural synthesis as an efficient, interpretable pre-training signal when large-scale corpora are unavailable. Our code is available at: https://github.com/Freyliu0516/audioPG.

2606.14813 2026-06-16 hep-ph cs.AI cs.LG 交叉投稿

JetParticle-JEPA: An Efficient Self-Supervised Representation Learning method for Jet Tagging in High-Energy Physics

JetParticle-JEPA：一种用于高能物理喷注标记的高效自监督表示学习方法

Guillaume Letellier, Antonin Vacheret, Frédéric Jurie

发表机构 * GREYC, Normandy University, Unicaen, ENSICAEN, UMR CNRS 6072（GREYC，诺曼底大学，Unicaen，ENSICAEN，CNRS UMR 6072）； LPC, Normandy University, Unicaen, ENSICAEN, IN2P3, UMR CNRS 6534（LPC，诺曼底大学，Unicaen，ENSICAEN，IN2P3，CNRS UMR 6534）

AI总结提出JetParticle-JEPA，一种基于粒子Transformer的自监督联合嵌入预测架构，无需标记或重建原始输入，直接从连续粒子云学习物理有意义的喷注表示，在JetClass等基准上达到与全监督方法相当的性能，并在低标签场景下超越监督基线。

详情

AI中文摘要

大型强子对撞机上的喷注标记越来越依赖于在大量模拟数据集上训练的深度学习模型，导致计算成本高且对探测器建模误差的鲁棒性有限。我们引入了JetParticle-JEPA (JP-JEPA)，一种自监督联合嵌入预测架构，它直接从连续粒子云中学习物理有意义的喷注表示，无需对原始输入进行标记化或重建。基于粒子Transformer主干，JP-JEPA在保留细粒度运动学相关性的同时预测被掩码粒子的潜在表示。在JetClass基准上，JP-JEPA在完整数据集上实现了与全监督最先进方法相当的性能，在低标签场景下超越了监督基线，并显著优于现有的自监督学习方法。在顶夸克和夸克-胶子喷注标记基准上，它与监督方法保持同等水平。学习到的表示还对缺失探测器信息表现出强鲁棒性，并改善了不确定性行为，凸显了JP-JEPA作为LHC上鲁棒且数据高效的喷注物理基础模型框架的潜力。

英文摘要

Jet tagging at the Large Hadron Collider increasingly relies on deep learning models trained on massive simulated datasets, leading to high computational costs and limited robustness to detector mismodeling. We introduce JetParticle-JEPA (JP-JEPA), a self-supervised Joint-Embedding Predictive Architecture that learns physically meaningful jet representations directly from continuous particle clouds without tokenization or reconstruction of raw inputs. Built on a Particle Transformer backbone, JP-JEPA predicts latent representations of masked particles while preserving fine-grained kinematic correlations. On the JetClass benchmark, JP-JEPA achieves performance comparable to fully supervised state-of-the-art methods on the full dataset, surpasses supervised baselines in low-label regimes, and significantly outperforms existing SSL approaches. On Top Quark and Quark-Gluon Tagging benchmarks, it remains on par with supervised methods. The learned representations also exhibit strong robustness to missing detector information and improved uncertainty behavior, highlighting JP-JEPA as a promising foundation-model framework for robust and data-efficient jet physics at the LHC.

2606.15134 2026-06-16 cs.CV cs.AI cs.LG 交叉投稿

Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings

超越标量距离：来自冻结MLLM的语义属性梯度用于视觉嵌入

Shubhang Bhatnagar, Dheeraj Baiju, Narendra Ahuja

发表机构 * University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结提出SAGA框架，利用冻结的多模态大语言模型（MLLM）通过GRPO奖励机制为视觉编码器提供属性级监督，替代传统标量距离，提升零样本图像检索性能。

详情

AI中文摘要

用于检索的视觉编码器通常通过类标签监督进行训练：每个训练对简化为一个标量，均匀地将嵌入推远或拉近，就好像每个视觉属性要么不同要么匹配。一个多模态大语言模型（MLLM），在展示相同的一对图像时，能够阐述这些属性并利用它们预测图像是否共享一个类别。我们提出SAGA，一个框架，将这种基于语言、属性感知的感知转化为编码器本身的训练信号。具体来说，我们使用组相对策略优化（GRPO）来奖励MLLM对视觉编码器令牌的正确预测。由于正确的预测要求这些令牌暴露该对之间不同或匹配的具体属性，梯度推动编码器编码这些属性，用属性解析的监督取代统一的成对标量。一个辅助的注意力蒸馏损失将编码器的嵌入锚定到MLLM关注的令牌上，一个标准的度量学习损失塑造嵌入几何结构以进行最近邻检索。MLLM在整个过程中被冻结，在推理时被丢弃，与度量学习基线的部署成本相匹配。在CUB-200-2011、Cars-196、FGVC-Aircraft和iNaturalist Aves上的零样本图像检索中，SAGA在Recall@1上比最先进的基线提高了3到6个百分点。

英文摘要

Vision encoders for retrieval are typically trained with class-label supervision: each training pair reduces to a scalar that uniformly pushes the embedding apart or pulls it together, as if every visual attribute either differed or matched. A multimodal large language model (MLLM), shown the same pair, can articulate those attributes and use them to predict whether the images share a class. We propose SAGA, a framework that turns this language-grounded, attribute-aware perception into a training signal for the encoder itself. Specifically, we use Group Relative Policy Optimization (GRPO) to reward the MLLM for correct predictions on the vision encoder's tokens. Since correct predictions require those tokens to expose the specific attributes that differ or match between the pair, the gradient pushes the encoder to encode them, replacing the uniform pair-level scalar with attribute-resolved supervision. An auxiliary attention-distillation loss anchors the encoder's embedding to tokens the MLLM attended to, and a standard metric-learning loss shapes the embedding geometry for nearest-neighbour retrieval. The MLLM is frozen throughout and discarded at inference, matching the deployment cost of a metric-learning baseline. SAGA improves Recall@1 by 3 to 6 points over state-of-the-art baselines on CUB-200-2011, Cars-196, FGVC-Aircraft, and iNaturalist Aves on zero-shot image retrieval.

2606.15284 2026-06-16 eess.SP cs.AI cs.LG 交叉投稿

CAP: Towards PPG Universal Representation Learning with Patient-level Supervision

CAP：面向患者级监督的PPG通用表示学习

Chenyang He, Xinyi Shao, Shun Huang, Bosong Huang, Daoqiang Zhang, Ming Jing, Cheng Ding

发表机构 * Nanjing University of Aeronautics and Astronautics（南京航空航天大学）； Peking University（北京大学）； Independent Researcher（独立研究者）； Jinling Clinical Medical College College of Artificial Intelligence Nanjing University of Aeronautics and Astronautics（金陵临床医学院人工智能学院南京航空航天大学）

AI总结提出CAP方法，通过构建大规模PPG-EHR多模态数据集和跨模态对比对齐，学习患者级临床语义的PPG表示，在四项下游任务中平均提升26.7%，呼吸率预测提升87.6%。

Comments Accepted as an Oral presentation at KDD 2026

详情

DOI: 10.1145/3770855.3818881

AI中文摘要

光电容积描记法（PPG）在可穿戴健康监测和临床决策支持中发挥着核心作用。然而，现有的通用PPG表示学习方法主要关注信号级目标，往往忽略患者级健康背景，这限制了对复杂临床任务和异质性队列的泛化能力。为解决这一问题，我们通过将碎片化的病史和临床记录整合为连贯的患者级电子健康记录（EHR），构建了一个大规模配对PPG-EHR多模态数据集。基于此资源，我们提出了临床锚定预训练方法（CAP）。在预训练期间，CAP执行跨模态对比对齐，将PPG表示锚定到患者级临床语义，引导编码器超越波形拟合，建模患者整体生理状态的一致性。在下游适应期间，预训练的PPG编码器提供临床基础的表示，增强归纳偏置，提高鲁棒性和可迁移性。实验表明，CAP在四个不同的下游任务上持续优于强基线。CAP在呼吸率预测上取得了特别大的提升（相比最先进基线相对提升高达87.6%），并在所有任务上平均相对提升26.7%。我们通过全面分析（包括消融实验和多个互补的可视化学习表示）进一步增强了方法的可解释性。实验代码可在 https://github.com/gody123gody/CAP 获取。

英文摘要

Photoplethysmography (PPG) plays a central role in wearable health monitoring and clinical decision support. Yet existing approaches to universal PPG representation learning largely focus on signal-level objectives and often overlook patient-level health context, which limits generalization to complex clinical tasks and heterogeneous cohorts. To address this gap, we construct a large-scale paired PPG-EHR multimodal dataset by distilling fragmented medical histories and clinical records into cohesive, patient-level electronic health records (EHR). Building on this resource, we propose Clinical Anchored Pretraining for PPG (CAP). During pretraining, CAP performs cross-modal contrastive alignment that anchors PPG representations to patient-level clinical semantics, guiding the encoder beyond waveform fitting toward modeling consistency in a patient's overall physiological state. During downstream adaptation, the pretrained PPG encoder provides clinically grounded representations that strengthen inductive bias and improve robustness and transferability. Experiments demonstrate that CAP consistently outperforms strong baselines on four diverse downstream tasks. CAP achieves a particularly large gain on respiratory rate prediction (up to +87.6% relative improvement over the state-of-the-art baseline) and delivers an average relative +26.7% across all tasks. We further enhance the interpretability of our approach through comprehensive analyses, including ablations and multiple complementary visualizations of the learned representations. The code for our experiments is available at: https://github.com/gody123gody/CAP .

2606.15468 2026-06-16 cs.CV cs.LG 交叉投稿

Analyzing Visual Aircraft Representations with Sparse Autoencoders

使用稀疏自编码器分析飞机视觉表示

Deepshik Sharma

发表机构 * Jain University（耆那大学）

AI总结本文通过稀疏自编码器分解ConvNeXt模型在FGVC-Aircraft数据集上的中间表示，发现可解释的飞机结构特征，并通过消融实验验证其类别相关性。

Comments 18 pages, 4 figures, 7 tables

详情

AI中文摘要

视觉模型可以在分类任务上取得强性能，但支持其预测的内部表示通常难以解释。本文研究稀疏自编码器是否可以将视觉模型的中间表示分解为可解释的特征。我们在FGVC-Aircraft数据集上训练ConvNeXt分类器，从其最终特征阶段提取空间激活，并在这些激活上训练稀疏自编码器。使用最高激活图像块、激活强度和类别选择性分析学习到的稀疏特征。定性视觉检查显示，几个特征对应于可识别的飞机结构和视觉模式。我们使用输入空间和特征空间消融评估选定的特征子集，测量模糊图像块和抑制稀疏特征对类别logits、分类边界和预测置信度的影响。结果表明，稀疏自编码器可以揭示与飞机识别相关的部分可解释、类别相关的视觉特征，同时也暴露出多义性和粗糙空间定位等局限性。

英文摘要

Vision models can achieve strong performance on classification tasks, but the internal representations supporting their predictions are often difficult to interpret. This work investigates whether sparse autoencoders can decompose intermediate representations of a vision model into interpretable features. We train a ConvNeXt classifier on the FGVC-Aircraft dataset, extract spatial activations from its final feature stage, and train a sparse autoencoder on these activations. The learned sparse features are analyzed using top-activating image patches, activation strength, and class selectivity. Qualitative visual inspection reveals that several features correspond to recognizable aircraft structures and visual patterns. We evaluate a subset of selected features using input-space and feature-space ablations, measuring how blurring image patches and suppressing sparse features affect class logits, classification margins, and prediction confidence. The results suggest that sparse autoencoders can reveal partially interpretable, class-relevant visual features associated with aircraft recognition, while also exposing limitations such as polysemanticity and coarse spatial localization.

2606.15956 2026-06-16 cs.CV cs.AI cs.LG 交叉投稿

You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences

你不需要强假设：通过时间差异进行视觉表示学习

Ninad Daithankar, Alexi Gladstone, Yann LeCun, Heng Ji

发表机构 * UIUC（伊利诺伊大学厄巴纳-香槟分校）； New York University（纽约大学）

AI总结提出TDV方法，基于因果假设（过去导致未来）从视频中自监督学习，避免强归纳偏置，在密集空间任务上达到SOTA。

详情

AI中文摘要

AI的进步很大程度上是由假设更少的方法驱动的。随着计算和数据量的增加，弱归纳偏置的方法通常优于强假设的方法。这在视觉表示学习领域尤为典型，方法从监督学习主导，到弱监督学习，再到如今无需人工标签的自监督学习的广泛成功。然而，即使是现代自监督学习方法仍然依赖于强归纳偏置，如数据增强、掩码或裁剪。如果这一趋势持续，这些剩余的偏置在大规模下将成为瓶颈——我们的实验证实了这一点：随着数据增长，归纳偏置的最优强度降低。这促使我们寻找依赖更少假设的方法。为此，我们提出了视觉时间差异（TDV），一种从视频中进行自监督学习的新范式，它避免了现有的归纳偏置，而是依赖于一个因果假设：过去导致未来。TDV通过联合训练图像编码器和运动编码器，使得当前帧的表示加上编码的运动等于下一帧的表示。尽管没有利用任何强归纳偏置，TDV在密集空间任务上达到了最先进的水平，为无需强假设的表示学习奠定了基础。

英文摘要

Progress in AI has largely been driven by methods that assume less. As compute and data increase, approaches with weaker inductive biases generally outperform those with stronger assumptions. This is particularly characteristic of the field of Visual Representation Learning, where approaches have gone from being dominated by Supervised Learning, to Weakly Supervised Learning, to the now widespread success of Self-Supervised Learning without human labels. Yet, even modern Self-Supervised Learning approaches still depend on strong inductive biases such as augmentations, masking, or cropping. If this trend holds, even these remaining biases should become bottlenecks at scale -- and our experiments confirm this: the optimal strength of inductive biases decreases as data grows. This motivates the search for approaches that rely on fewer assumptions. To this end, we introduce Temporal Difference in Vision (TDV), a new paradigm for self-supervised learning from video that avoids existing inductive biases, relying instead on a causal assumption that the past causes the future. TDV functions by jointly training an image encoder and a motion encoder so that the current frame's representation plus the encoded motion equals the next frame's representation. Despite not leveraging any strong inductive biases, TDV matches state-of-the-art recipes on dense spatial tasks, laying the foundation for representation learning without strong assumptions.
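
The additive relation described in the abstract (current-frame representation plus encoded motion equals next-frame representation) can be sketched as a residual. This is a minimal illustration and not the authors' training code: the squared-error form, the function name `tdv_residual`, and the toy vectors standing in for encoder outputs are all assumptions.

```python
def tdv_residual(z_t, motion, z_next):
    """Squared error between (z_t + motion) and z_next.

    z_t, z_next: stand-ins for image-encoder outputs on consecutive frames.
    motion:      stand-in for the motion-encoder output for that frame pair.
    The squared-error form is an assumption made for illustration.
    """
    return sum((a + m - b) ** 2 for a, m, b in zip(z_t, motion, z_next))


# Toy vectors in place of real encoder outputs (hypothetical values).
z_t = [0.5, -1.0, 2.0]
z_next = [1.0, -0.5, 2.0]

# A motion code that exactly explains the change between the two frames.
exact_motion = [b - a for a, b in zip(z_t, z_next)]

perfect = tdv_residual(z_t, exact_motion, z_next)       # relation holds exactly
no_motion = tdv_residual(z_t, [0.0, 0.0, 0.0], z_next)  # relation is violated
```

In a real setup both encoders would be trained jointly to drive this residual down; how the method avoids trivial solutions is not stated in the abstract and is not modelled here.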

2606.16193 2026-06-16 cs.CV cs.AI cs.LG 交叉投稿

Cascaded Sparse Autoencoders Learn Multi-Level Visual Concepts in Multimodal LLMs

级联稀疏自编码器在多模态大语言模型中学习多级视觉概念

Yusong Zhao, Hengyi Wang, Tanuja Ganu, Akshay Nambi, Hao Wang

发表机构 * Rutgers University（罗格斯大学）； Microsoft Research（微软研究院）

AI总结提出级联稀疏自编码器（CSAEs），通过在第一级SAE解码器权重上训练第二级SAE来学习层次化视觉概念，避免嵌套或堆叠SAE的缺点，在多个MLLM和数据集上提升了概念层次一致性和干预效果。

详情

AI中文摘要

多模态大语言模型（MLLMs）在视觉-语言任务上表现出色，但其内部视觉表示仍难以解释。稀疏自编码器（SAEs）提供了一种可扩展的方式，将密集模型激活分解为稀疏、可解释的特征。然而，现有SAE架构主要恢复扁平特征字典，不太适合显式的多级概念组织。在本文中，我们引入级联稀疏自编码器（CSAEs）用于学习MLLMs中的层次化视觉概念。CSAEs并非嵌套或堆叠SAE稀疏激活码，而是直接在第一个SAE的解码器权重上训练第二个SAE，将学习到的低级特征方向作为高级抽象的输入。这种设计使CSAEs能够学习“概念的概念”，同时避免了嵌套、Matryoshka式层次结构中的共享前缀耦合问题以及简单堆叠SAE的瓶颈。在Qwen3-VL、Gemma-3和LLaVA上的多个视觉数据集上的实验表明，与最先进的SAE基线相比，CSAEs在层次概念一致性方面提高了可解释性。概念引导的结果进一步表明，学习到的概念组支持对MLLM输出进行有效的组级干预。

英文摘要

Multimodal Large Language Models (MLLMs) have demonstrated strong performance on vision-language tasks, yet their internal visual representations remain difficult to interpret. Sparse Autoencoders (SAEs) provide a scalable way to decompose dense model activations into sparse, interpretable features. However, existing SAE architectures primarily recover flat feature dictionaries and are less suited for explicit multi-level concept organization. In this paper, we introduce cascaded sparse autoencoders (CSAEs) for learning hierarchical visual concepts in MLLMs. Rather than nesting or stacking SAE sparse activation codes, CSAEs train a second-level SAE directly on the decoder weights of the first-level SAE, treating learned low-level feature directions as inputs for higher-level abstraction. This design enables CSAEs to learn "concepts of concepts" while avoiding drawbacks from the shared-prefix coupling of nesting, Matryoshka-style hierarchies and the bottlenecks of naively stacked SAEs. Experiments across Qwen3-VL, Gemma-3, and LLaVA on multiple visual datasets show that CSAEs improve interpretability in terms of hierarchical concept coherence over state-of-the-art SAE baselines. Results on concept steering further demonstrate that the learned concept groups support effective group-level interventions in MLLM outputs.

2606.16240 2026-06-16 cs.CL cs.LG 交叉投稿

Creative Collision: Directorial Persona Steering and Competition in Large Language Models

创意碰撞：大型语言模型中的导演人格引导与竞争

Subramanyam Sahoo, Justin Shenk

发表机构 * AI Safety Camp（AI安全训练营）

AI总结研究通过叠加两种语义相反的导演人格向量（斯皮尔伯格与斯科塞斯）来引导语言模型生成，发现斯皮尔伯格向量主导道德倾向，中间点提升连贯性，且两者在特定层共享道德基调基底。

Comments Accepted at ICML 2026 Workshop on Human-AI Co-Creativity

详情

AI中文摘要

激活引导已成为在推理时塑造大型语言模型行为的强大工具，但以往大多数工作向残差流注入单一的语义方向。我们研究了两种语义相反的引导向量叠加的丰富场景——我们称之为“创意碰撞”。具体而言，我们通过在精心策划的剧本语料库上进行均值差异激活对比，构建了史蒂文·斯皮尔伯格（乐观、救赎的道德价值）和马丁·斯科塞斯（黑暗、道德模糊）的导演人格向量，然后通过标量混合参数$α\in[0,1]$和引导系数$λ$在两者之间进行插值。在五个评估轴（道德价值、生成连贯性、表面风格、方向主导性和向量几何）上，出现了三个主要发现：（i）斯皮尔伯格的表征特征表现出稳健的“方向主导性”，在几乎整个插值范围内抑制了斯科塞斯的道德影响；（ii）中间碰撞点在高$λ$下相对于纯单导演引导反而提高了生成连贯性；（iii）两种人格在40层仅解码器Transformer的第28层达到最大定位，揭示了一个共享的“道德基调基底”。这些结果阐明了Transformer残差流中竞争语义方向的几何结构，并对可控创意生成和价值对齐叙事合成具有直接影响。

英文摘要

Activation steering has emerged as a powerful tool for shaping the behaviour of large language models at inference time, yet most prior work injects a single semantic direction into the residual stream. We study the richer setting in which two semantically opposing steering vectors are superimposed -- a regime we call Creative Collision. Concretely, we construct directorial persona vectors for Steven Spielberg (optimistic, redemptive moral valence) and Martin Scorsese (dark, morally ambiguous) via mean-difference activation contrast on curated screenplay-derived corpora, then interpolate between them with a scalar mixing parameter $α \in [0,1]$ and a steering coefficient $λ$. Across five evaluation axes -- moral valence, generation coherence, surface style, directional dominance, and vector geometry -- three principal findings emerge: (i) Spielberg's representational signature exhibits robust directional dominance, suppressing Scorsese's moral influence across almost the entire interpolation range; (ii) intermediate collision points paradoxically improve generation coherence relative to pure single-director steering at high $λ$; and (iii) both personas localise maximally to layer 28 of a 40-layer decoder-only transformer, revealing a shared moral-tone substrate. These results illuminate the geometry of competing semantic directions in transformer residual streams and have direct implications for controllable creative generation and value-aligned narrative synthesis.
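
The two ingredients named in the abstract, mean-difference persona vectors and interpolation with a mixing parameter α and steering coefficient λ, can be sketched in a few lines. This is a hedged illustration only: the function names, the toy two-dimensional activations, and the convention that α = 1 selects the first vector are assumptions, not details confirmed by the abstract.

```python
def mean_difference(pos, neg):
    """Mean-difference contrast: mean of `pos` activations minus mean of `neg`."""
    dim = len(pos[0])
    mean_pos = [sum(row[i] for row in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(row[i] for row in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def collide(hidden, v_a, v_b, alpha, lam):
    """Add lam * (alpha * v_a + (1 - alpha) * v_b) to one residual-stream vector.

    Assumed convention: alpha = 1 applies only v_a, alpha = 0 applies only v_b.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [h + lam * (alpha * a + (1.0 - alpha) * b)
            for h, a, b in zip(hidden, v_a, v_b)]


# Hypothetical 2-D activations standing in for real model activations.
baseline = [[0.0, 0.0], [0.0, 0.0]]
v_first = mean_difference([[1.0, 0.0], [3.0, 0.0]], baseline)
v_second = mean_difference([[0.0, 2.0], [0.0, 2.0]], baseline)

midpoint = collide([1.0, 1.0], v_first, v_second, alpha=0.5, lam=2.0)
first_only = collide([1.0, 1.0], v_first, v_second, alpha=1.0, lam=1.0)
```

Sweeping `alpha` from 0 to 1 at a fixed `lam` traces the interpolation range that the abstract evaluates.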

2508.00956 2026-06-16 cs.LG cs.AI cs.IR 版本更新

自监督学习作为离散通信

Kawtar Zaher, Ilyass Moummad, Olivier Buisson, Alexis Joly

AI总结将视觉自监督学习视为教师与学生网络间的离散通信过程，通过固定容量二进制信道传输语义信息，使用逐元素二元交叉熵目标强制离散一致性，并引入编码率正则化促进结构化表示，在图像分类、检索和密集预测任务上优于连续对齐基线。

详情

AI中文摘要

大多数自监督学习（SSL）方法通过对齐同一输入的不同视图来学习连续视觉表示，对信息如何在表示维度间进行结构化提供的控制有限。在这项工作中，我们将视觉自监督学习视为教师网络与学生网络之间的离散通信过程，其中语义信息通过固定容量的二进制信道传输。学生网络不是对齐连续特征，而是预测教师网络产生的多标签二进制消息。通过逐元素二元交叉熵目标强制离散一致性，同时编码率正则化项鼓励有效利用受限信道，促进结构化表示。我们进一步表明，周期性地重新初始化投影头通过鼓励嵌入在多个离散编码中保持可预测性来增强这种效果。大量实验表明，在图像分类、检索和密集视觉预测任务中，以及通过自监督适应在领域转移下，该方法持续优于连续对齐基线。除了骨干表示，我们分析了学习到的二进制编码，并表明它们形成了一种紧凑且信息丰富的离散语言，捕获了可跨类别复用的语义因子。

英文摘要

Most self-supervised learning (SSL) methods learn continuous visual representations by aligning different views of the same input, offering limited control over how information is structured across representation dimensions. In this work, we frame visual self-supervised learning as a discrete communication process between a teacher and a student network, where semantic information is transmitted through a fixed-capacity binary channel. Rather than aligning continuous features, the student predicts multi-label binary messages produced by the teacher. Discrete agreement is enforced through an element-wise binary cross-entropy objective, while a coding-rate regularization term encourages effective utilization of the constrained channel, promoting structured representations. We further show that periodically reinitializing the projection head strengthens this effect by encouraging embeddings that remain predictive across multiple discrete encodings. Extensive experiments demonstrate consistent improvements over continuous agreement baselines on image classification, retrieval, and dense visual prediction tasks, as well as under domain shift through self-supervised adaptation. Beyond backbone representations, we analyze the learned binary codes and show that they form a compact and informative discrete language, capturing semantic factors reusable across classes.
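
The element-wise binary cross-entropy objective mentioned in the abstract can be illustrated as follows. This is a sketch under stated assumptions: thresholding teacher logits at zero to obtain the binary message, and the function names `binarize` and `elementwise_bce`, are hypothetical choices; the coding-rate regulariser and projection-head reinitialisation are not modelled.

```python
import math


def binarize(teacher_logits):
    """Teacher message: one bit per channel (assumed rule: 1 if logit > 0)."""
    return [1 if x > 0 else 0 for x in teacher_logits]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def elementwise_bce(student_logits, bits):
    """Mean binary cross-entropy between student predictions and teacher bits."""
    total = 0.0
    for logit, bit in zip(student_logits, bits):
        p = sigmoid(logit)
        total -= bit * math.log(p) + (1 - bit) * math.log(1.0 - p)
    return total / len(bits)


bits = binarize([2.0, -1.0, 0.5])                            # teacher's message
loss_agree = elementwise_bce([4.0, -4.0, 4.0], bits)         # student matches bits
loss_disagree = elementwise_bce([-4.0, 4.0, -4.0], bits)     # student flips every bit
loss_uninformed = elementwise_bce([0.0, 0.0, 0.0], bits)     # student outputs p = 0.5
```

Each channel is scored independently, so a student that predicts 0.5 everywhere pays exactly log 2 per bit, while agreement with the teacher's message drives the loss toward zero.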

2604.25853 2026-06-16 cs.CL cs.AI cs.LG 版本更新

G-Loss: Graph-Guided Fine-Tuning of Language Models

G-Loss：图引导的语言模型微调

Aditya Sharma, Vinti Agarwal, Rajesh Kumar

发表机构 * BITS Pilani（BITS 派拉尼）； Bucknell University（巴克内尔大学）

AI总结提出G-Loss损失函数，通过构建文档相似度图并利用半监督标签传播捕捉全局语义结构，引导语言模型学习更具判别性和鲁棒性的嵌入，在多个分类任务上提升准确率并加速收敛。

Comments 20 pages, Learning on Graphs (LoG2025)

详情

AI中文摘要

用于微调预训练语言模型（如BERT）的传统损失函数，包括交叉熵、对比损失、三元组损失和监督对比损失，仅在局部邻域内操作，未能考虑全局语义结构。我们提出了G-Loss，一种图引导的损失函数，它结合半监督标签传播来利用嵌入流形中的结构关系。G-Loss构建了一个文档相似度图，捕捉全局语义关系，从而引导模型学习更具判别性和鲁棒性的嵌入。我们在五个涵盖关键下游分类任务的基准数据集上评估了G-Loss：MR（情感分析）、R8和R52（主题分类）、Ohsumed（医学文档分类）和20NG（新闻分类）。在大多数实验设置中，G-Loss收敛更快，并产生语义一致的嵌入空间，从而比使用传统损失函数微调的模型获得更高的分类准确率。

英文摘要

Traditional loss functions, including cross-entropy, contrastive, triplet, and supervised contrastive losses, used for fine-tuning pre-trained language models such as BERT, operate only within local neighborhoods and fail to account for the global semantic structure. We present G-Loss, a graph-guided loss function that incorporates semi-supervised label propagation to use structural relationships within the embedding manifold. G-Loss builds a document-similarity graph that captures global semantic relationships, thereby guiding the model to learn more discriminative and robust embeddings. We evaluate G-Loss on five benchmark datasets covering key downstream classification tasks: MR (sentiment analysis), R8 and R52 (topic categorization), Ohsumed (medical document classification), and 20NG (news categorization). In the majority of experimental setups, G-Loss converges faster and produces semantically coherent embedding spaces, resulting in higher classification accuracy than models fine-tuned with traditional loss functions.
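
Semi-supervised label propagation over a document-similarity graph, the building block the abstract names, can be sketched as below. This shows a generic, textbook-style propagation step and not G-Loss itself: the row normalisation, the damping value `alpha`, the function name, and the toy similarity matrix are assumptions, and how the propagated labels enter the fine-tuning loss is not specified in the abstract.

```python
def propagate_labels(sim, labels, num_classes, alpha=0.8, iters=50):
    """Iterate F <- alpha * S_norm @ F + (1 - alpha) * Y on a similarity graph.

    sim:    symmetric similarity matrix (list of lists, zero diagonal).
    labels: class index per node, or None for unlabeled nodes.
    """
    n = len(sim)
    # Row-normalise so each node averages over its neighbours (an assumption).
    norm = []
    for row in sim:
        s = sum(row)
        norm.append([w / s if s > 0 else 0.0 for w in row])
    # One-hot seeds; unlabeled nodes start with all-zero scores.
    seed = [[1.0 if labels[i] == c else 0.0 for c in range(num_classes)]
            for i in range(n)]
    scores = [row[:] for row in seed]
    for _ in range(iters):
        scores = [[alpha * sum(norm[i][j] * scores[j][c] for j in range(n))
                   + (1.0 - alpha) * seed[i][c]
                   for c in range(num_classes)]
                  for i in range(n)]
    return [max(range(num_classes), key=lambda c: row[c]) for row in scores]


# Hypothetical graph: documents 0-1 and 2-3 form two tight clusters.
sim = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
predicted = propagate_labels(sim, [0, None, None, 1], num_classes=2)
```

Only documents 0 and 3 carry labels here; the two unlabeled documents inherit the label of the cluster they are most strongly connected to, which is the global structure a purely pairwise loss does not see.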

2606.14801 2026-06-16 cs.LG cs.AI cs.RO 新提交

QPILOTS: Efficient Test-Time Q-Steering for Flow Policies

QPILOTS：面向流策略的高效测试时Q引导

Yifan Ruan, Chenyang Cao, Andreas Burger, Ali Pesaranghader, Kaveh Kamali, Jaehong Kim, Nandita Vijaykumar, Alan Aspuru-Guzik, Igor Gilitschenski, Nicholas Rhinehart

发表机构 * University of Toronto（多伦多大学）； Vector Institute（向量研究所）； LG Electronics（LG电子）

AI总结提出QPILOTS方法，在推理时通过投影去噪中间状态到最终动作估计并计算评论家梯度来引导流匹配和扩散策略，无需修改原策略，在离线到在线RL基准上达到90%平均成功率。

Comments 10 pages, 7 figures

详情

AI中文摘要

流匹配和扩散策略是表达力强的动作生成器，但使用时序差分强化学习（RL）优化它们仍然困难。有效的策略提取需要利用评论家的动作梯度，但通过多步去噪过程直接反向传播该信号可能数值不稳定。现有方法要么丢弃梯度信息，将策略蒸馏为更简单的单步动作器，要么随着评论家改进而重复微调去噪策略。我们提出QPILOTS，一种保持原策略不变并在推理时引导去噪过程的方法。在每个去噪步骤中，我们不是评估评论家对噪声中间动作（其中评论家预测不可靠），而是首先将该中间状态投影到最终干净动作的估计，并在那里计算评论家梯度。我们引入两种变体：QPILOTS-U使用快速单点近似，而QPILOTS-M通过学习的辅助网络绘制可微后验样本。在标准的离线到在线RL基准测试中，QPILOTS实现了最佳整体性能，在50个任务中达到平均90%的成功率。我们还应用QPILOTS引导一个大型、冻结的预训练视觉-语言动作（VLA）基础模型，在模拟的六个操作任务中优于或匹配先前的推理时方法。

英文摘要

Flow-matching and diffusion policies are expressive action generators, but optimizing them with temporal-difference reinforcement learning (RL) remains difficult. Effective policy extraction requires exploiting the critic's action gradient, yet directly backpropagating this signal through a multi-step denoising process can be numerically unstable. Existing methods work around this either by discarding gradient information, distilling the policy into a simpler one-step actor, or repeatedly fine-tuning the denoising policy as the critic improves. We propose QPILOTS, a method that leaves the original policy unmodified and steers the denoising process at inference time. At each denoising step, instead of evaluating the critic on the noisy intermediate action where critic predictions are unreliable, we first project that intermediate state to an estimate of the final clean action and compute the critic gradient there. We introduce two variants: QPILOTS-U uses a fast single-point approximation, while QPILOTS-M draws differentiable posterior samples via a learned auxiliary network. On a standard offline-to-online RL benchmark, QPILOTS achieves the best aggregate performance, reaching an average success rate of 90% across 50 tasks. We also apply QPILOTS to steer a large, frozen, pretrained Vision-Language Action (VLA) foundation model, outperforming or matching prior inference-time approaches across six manipulation tasks in simulation.
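
The core idea in the abstract, evaluating the critic gradient at an estimate of the clean action rather than at the noisy intermediate, can be sketched on a one-dimensional toy problem. Everything specific here is an assumption made for illustration: the straight-line path convention (t = 0 is noise, t = 1 is the action), the rule of adding the scaled gradient to the velocity, the names `steer_flow`, `velocity`, and `critic_grad`, and the quadratic toy critic. It is not the QPILOTS-U or QPILOTS-M update.

```python
def steer_flow(x0, velocity, critic_grad, eta, steps=10):
    """Euler-integrate a flow policy from t=0 to t=1 with critic steering."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        # Project the noisy intermediate onto an estimate of the final clean
        # action, assuming straight-line paths with velocity v.
        clean_estimate = x + (1.0 - t) * v
        # Query the critic gradient at the clean estimate, not at x itself.
        x = x + dt * (v + eta * critic_grad(clean_estimate))
    return x


TARGET = 1.0  # hypothetical action the toy critic prefers


def velocity(x, t):
    """Frozen toy policy: constant velocity, so noise x0 maps to x0 + 0.5."""
    return 0.5


def critic_grad(action):
    """Gradient of the toy critic Q(a) = -(a - TARGET)**2."""
    return -2.0 * (action - TARGET)


unsteered = steer_flow(0.0, velocity, critic_grad, eta=0.0)
steered = steer_flow(0.0, velocity, critic_grad, eta=1.0)
```

The policy function is never modified; only the sampling loop changes, so setting `eta` to zero recovers the original policy's action.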

2606.14929 2026-06-16 cs.LG cs.AI stat.ML 新提交

探索性初始状态并不足够：蒙特卡洛探索性初始状态的反例与修正

Octave Oliviers, Glenn Vinnicombe

发表机构 * Department of Engineering, University of Cambridge（剑桥大学工程系）

AI总结本文通过构造反例证明，在表格设置下，蒙特卡洛探索性初始状态（MCES）算法可能收敛到次优解，并提出基于状态级学习率缩放的修正方法以恢复最优性收敛。

详情

AI中文摘要

蒙特卡洛探索性初始状态（MCES）的渐近行为是强化学习中一个长期存在的开放问题，即使在表格设置中也是如此。我们通过构造算法收敛到次优解的例子，研究了表格MCES的收敛性质。本文为初始访问和首次访问MCES提供了新的反例，并给出了初始访问情况下的收敛恢复修正。我们表明，即使贪婪动作平均更新频率高于非贪婪动作，初始访问MCES在样本平均更新下也可能存在稳定的次优解。然而，通过按状态将学习率与更新频率成反比缩放，可以保证收敛到最优性。与之前的均匀化方法不同，此修正适用于需要近似估计值函数的大规模问题。然后，我们扩展该例子以表明样本平均首次访问MCES也可能收敛到次优解。这基本上解决了一个基本的开放问题，并表明仅靠探索性初始状态并不能保证收敛到最优性。更广泛地说，这些结果突显了收敛性关键取决于应用于不同动作的更新的相对大小和频率，使得学习率的选择以及探索与利用的平衡成为MCES分析和可扩展蒙特卡洛控制方法实现的核心。

英文摘要

The asymptotic behaviour of Monte Carlo Exploring Starts (MCES) is a long-standing open question in reinforcement learning, even in the tabular setting. We investigated the convergence properties of tabular MCES by constructing examples in which the algorithm converges to suboptimal solutions. This paper presents new counterexamples for both initial-visit and first-visit MCES and gives a convergence-restoring modification for the initial-visit case. We show that stable suboptimal solutions may exist for initial-visit MCES with sample-average updates even when greedy actions are updated more often than non-greedy actions on average. However, by scaling learning rates inversely to update frequencies on a state-by-state basis, convergence to optimality is guaranteed. Unlike previous uniformisation methods, this modification is applicable to large-scale problems that require approximating the estimated value function. We then extend the example to show that sample-average first-visit MCES may also converge to suboptimal solutions. This largely settles a fundamental open problem and shows that exploring starts alone do not guarantee convergence to optimality. More broadly, these results highlight that convergence depends critically on the relative size and frequency of updates applied to different actions, making the choice of learning rates and the balance between exploration and exploitation central to the analysis of MCES and the implementation of scalable Monte Carlo control methods.

2606.15260 2026-06-16 cs.LG cs.AI 新提交

Trust-Region Diffusion Policies for Massively Parallel On-Policy RL

大规模并行同策略（on-policy）强化学习的信任区域扩散策略

Huy Le, Onur Celik, Denis Blessing, Tai Hoang, Claas A Voelcker, Axel Brunnbauer, Felix Richter, Michael Volpp, Gerhard Neumann

发表机构 * University of Freiburg（弗赖堡大学）； Max Planck Institute for Intelligent Systems（智能系统马克斯·普朗克研究所）

AI总结提出TruDi方法，通过信任区域优化约束扩散轨迹的KL散度，实现大规模并行同策略（on-policy）强化学习中的稳定训练，在73个任务中优于或持平基线。

详情

AI中文摘要

利用大规模并行模拟的强化学习已成为开发鲁棒、可部署策略的标准框架；然而，大多数现有方法仍依赖简单的高斯策略参数化。扩散模型提供了更具表达力的策略类，并在具有挑战性的控制问题上表现出色，但大多数基于扩散的强化学习方法是为离线或离策略训练设计的。在这项工作中，我们探究扩散策略能否在大规模并行、同策略（on-policy）设定下有效训练。为此，我们引入了信任区域扩散策略（TruDi），它使得扩散策略能够用于大规模并行模拟的同策略强化学习。这种设置特别具有挑战性，因为数据分布在每次更新中快速变化，使得复杂策略的稳定训练变得困难。TruDi通过整合信任区域优化规则来约束整个扩散轨迹上的KL散度，从而解决了这一问题。实验上，我们在包含73个任务的4个不同的大规模并行强化学习基准上评估了TruDi。在这些任务中，TruDi在标准任务上始终优于或与强基线持平，在更具挑战性的人形控制任务上取得了明显收益，为大规模并行同策略强化学习建立了新的强基线。

英文摘要

Reinforcement learning with massively parallel simulations has become a standard framework for developing robust, deployable policies; however, most existing approaches still rely on simple Gaussian policy parameterizations. Diffusion models provide a more expressive policy class and have shown strong performance on challenging control problems, yet most diffusion-based RL methods are designed for offline or off-policy training. In this work, we ask whether diffusion policies can be trained effectively in the massively parallel, on-policy regime. To this end, we introduce Trust-region Diffusion Policies (TruDi), which enables diffusion policies for on-policy RL with massively parallel simulations. This setting is particularly challenging because the data distribution changes quickly across updates, making stable training with complex policies difficult. TruDi addresses this by integrating a trust-region optimization rule to enforce a KL-divergence constraint over the entire diffusion trajectory. Empirically, we evaluate TruDi on a diverse set of 4 massively parallel RL benchmarks comprising a total of 73 tasks. Across these tasks, TruDi consistently outperforms or is on-par with strong baselines on standard tasks and achieves clear gains on more challenging humanoid control tasks, establishing a strong new baseline for massively parallel on-policy RL.

2606.15301 2026-06-16 cs.LG cs.AI 新提交

Discovering Lattice Reduction Strategies via Self-Play

通过自我对弈发现格基约简策略

Mohamed Malhou, Kristin Lauter, Ludovic Perret

发表机构 * FAIR, Meta Superintelligence Labs（Meta超级智能实验室FAIR）； Sorbonne Université CNRS, LIP6（索邦大学CNRS/LIP6）； EPITA, EPITA Research Lab (LRE)（EPITA研究实验室(LRE)）

AI总结利用深度强化学习和AlphaZero风格自我对弈，在LLL原始动作空间中学习更优的格基约简策略，训练于8维格但可零样本泛化至32维。

2606.15917 2026-06-16 cs.LG 新提交

Reinforcement Learning for LLM-based Event Forecasting

基于强化学习的LLM事件预测

Amit Arnold Levy

发表机构 * Advanced Computer Science（高级计算机科学）； DeepSeek R1

AI总结使用GRPO微调LLM，结合Wikipedia修订工具获取实时信息，预测未来事件，使1.5B参数模型性能超越Claude Sonnet 3.5。

Comments Submitted internally at the University of Oxford in Oct 2025, migrated to arXiv on Jun 2026

AI中文摘要

我们使用Group Relative Policy Optimization (GRPO)，一种最近提出的样本和内存高效的强化学习方法，来微调预训练的LLM（参数范围1.5B到14B），使其能够通过Wikipedia修订工具或新闻摘要获取当前信息，从而预测超出LLM知识截止日期的真实事件，以及模拟训练动态不同方面的问题。我们利用这些实验结果来评论LLM在预测方面的扩展能力，并分类判断性预测如何适应可验证/不可验证的领域分类法，考虑预测未来事件时固有的偶然不确定性（例如掷骰子）的影响。通过GRPO训练，我们成功使一个1.5B参数的Transformer（Qwen 2.5 1.5B）在预测性能上超越了Claude Sonnet 3.5，以市场同意概率的交叉熵衡量。我们还讨论了达到这一结果过程中的各种死胡同。

英文摘要

We use Group Relative Policy Optimization (GRPO), a recently devised sample and memory efficient reinforcement learning method, to finetune pretrained LLMs in the range of 1.5B to 14B parameters equipped with the ability to get current information through the use of a Wikipedia revisions tool, or news summaries, to forecast real events beyond the knowledge cutoff of the LLM, as well as problems made to simulate different aspects of the dynamics of that training. We use the results of these experiments to comment on the scaling capability of LLMs for forecasting, as well as classify how judgmental forecasting fits into the verifiable/unverifiable domain taxonomy, considering the impact of the inherent aleatoric uncertainty when forecasting future events (e.g. the roll of a die). As a result of the GRPO training, we manage to bring a 1.5B parameter transformer (Qwen 2.5 1.5B) to forecasting performance superior to Claude Sonnet 3.5 over the same dataset as measured by cross entropy from the market agreed probabilities. We also discuss various dead ends on the path to this result.
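As a rough illustration of the two quantities this abstract names, the sketch below computes group-relative advantages in the usual GRPO style (each sampled completion's reward normalized by the mean and standard deviation of its group) and a binary cross entropy of a forecast against a market-agreed probability. The function names, the `eps` constants, and the restriction to binary events are assumptions made for this example; the abstract does not specify how the paper defines its reward.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    # eps (an assumed constant) avoids division by zero when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]

def cross_entropy_vs_market(market_p, forecast_q, eps=1e-12):
    """Binary cross entropy of forecast q against market-agreed probability p."""
    q = min(max(forecast_q, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(market_p * math.log(q) + (1.0 - market_p) * math.log(1.0 - q))
```

For a fixed market probability, the cross entropy is smallest when the forecast equals that probability, which is why it can serve as a score for comparing forecasters on the same dataset.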

2606.15978 2026-06-16 cs.LG 新提交

Scalar-Stepsize Nonuniform Monte Carlo Optimistic Policy Iteration: A Certified Counterexample

标量步长非均匀蒙特卡洛乐观策略迭代：一个经过认证的反例

Yuanlong Chen

发表机构 * Yuanlong Chen（陈元龙）

AI总结针对非均匀更新频率下的蒙特卡洛乐观策略迭代，本文通过一个三状态MDP反例证明标量步长非均匀异步值迭代可能不收敛，并揭示了各向异性畸变导致的切换吸引环。

AI中文摘要

Tsitsiklis证明了在均匀更新结构下蒙特卡洛乐观策略迭代的收敛性，并指出非均匀更新频率是一个微妙的障碍。对于自然的标量步长、非归一化异步状态值递归，在固定非均匀状态选择概率下，我们给出了一个经过认证的否定答案。在一个三状态、两动作的折扣MDP中，非均匀更新频率诱导出一个对角缩放贪婪策略平均场，该平均场具有一个经过认证的非恒定混合周期轨道。使用有界无偏几何视界估计器和Robbins-Monro步长，原始随机递归以正概率被困在周期附近，因此无法收敛。该例子揭示了一个几何障碍：均匀采样产生径向残差收缩，而标量非均匀采样各向异性地扭曲残差动态，并可能产生切换吸引环。

英文摘要

Tsitsiklis proved convergence of Monte Carlo optimistic policy iteration under a uniform update structure and identified nonuniform update frequencies as a delicate obstruction. We give a certified negative answer for the natural scalar-stepsize, unnormalized asynchronous state-value recursion with fixed nonuniform state-selection probabilities. In a three-state, two-action discounted MDP, the nonuniform update frequencies induce a diagonally scaled greedy-policy mean field with a certified nonconstant attracting hybrid periodic orbit. With a bounded unbiased geometric-horizon estimator and Robbins--Monro stepsizes, the original stochastic recursion remains trapped near the cycle with positive probability and therefore fails to converge. The example pinpoints a geometric obstruction: uniform sampling gives radial residual contraction, whereas scalar nonuniform sampling anisotropically distorts the residual dynamics and can generate switched attracting cycles.

2606.16154 2026-06-16 cs.LG 新提交

A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization

RLVR稳定性与胜者优势策略优化的梯度视角

Prasanth YSS, Zhichen Ren, Rasa Hosseinzadeh, Ilan Gofman, Yuqi Chen, Zhaoyan Liu, Guangwei Yu, Jesse C. Cresswell, Satya Krishna Gorti

发表机构 * Berkeley（伯克利）； Layer 6 AI

AI总结通过令牌级梯度动力学分析GRPO的不稳定性，提出仅更新正优势完成的WAPO算法，在数学推理和多跳QA任务中提升训练稳定性并匹配或超越基线。

2606.16236 2026-06-16 cs.LG cs.NE 新提交

Evolutionary Bilevel Reward Shaping for Generalization in Reinforcement Learning

进化双层奖励塑形以增强强化学习的泛化能力

Ekasit Usaratniwart, Xilin Gao, Marc Ong, Youhei Akimoto

发表机构 * University of Tsukuba（筑波大学）； RIKEN Center for Advanced Intelligence Project（理化学研究所革新智能综合研究中心）

AI总结提出GERS方法，通过双层优化利用标量验证反馈调整奖励函数，在限制轨迹访问下提升强化学习在未见环境中的泛化性能。

Comments Accepted at PPSN 2026

AI中文摘要

强化学习（RL）在部署于与训练环境不同的环境时，通常会出现性能下降。现有技术如域随机化（DR）可以缓解这一问题，但需要访问多样化的训练环境和完整的轨迹可观测性，这些假设在隐私保护或受限场景中无法满足，此时仅能获得标量性能指标。我们提出通过进化奖励塑形实现泛化（GERS），一种双层优化方法，仅使用来自验证环境的标量反馈来改善在未见测试环境上的泛化能力。在下层，由上层塑形的奖励函数引导的RL智能体在具有可访问轨迹数据的有限训练环境集上学习策略；在上层，CMA-ES优化奖励塑形参数，以最大化在无法访问轨迹的单独验证环境上的累积未塑形奖励。在连续控制任务上的结果表明，GERS在未见测试环境上优于标准RL基线。尽管DR将GERS的训练和验证环境组合集视为需要轨迹访问的单一训练集，而GERS无法访问验证轨迹，但GERS的性能与DR相当。这些结果证实，GERS在受限数据访问约束下有效增强了泛化能力。

英文摘要

Reinforcement learning (RL) often suffers from performance degradation when deployed in environments that differ from those encountered during training. Existing techniques such as domain randomization (DR) mitigate this, but require access to diverse training environments and full trajectory observability, assumptions that fail in privacy-preserving or restricted scenarios where only scalar performance metrics are available. We propose Generalization via Evolutionary Reward Shaping (GERS), a bilevel optimization approach to improve generalization on unseen test environments using only scalar feedback from validation environments. At the lower level, an RL agent guided via a reward function shaped by the upper level learns a policy on a limited set of training environments with accessible trajectory data; at the upper level, CMA-ES optimizes the reward shaping parameters to maximize the cumulative unshaped reward on separate validation environments for which trajectory access is unavailable. Results on continuous control tasks indicate that GERS outperforms the standard RL baseline on unseen test environments. GERS performance is comparable to DR, despite DR treating the combined set of training and validation environments of GERS as a single training set that requires trajectory access, whereas GERS cannot access validation trajectories. These results confirm that GERS effectively enhances generalization under restricted data access constraints.

2606.16286 2026-06-16 cs.LG cs.AI cs.RO 新提交

FlowMPC: Improving Flow Matching policies with World Models

FlowMPC：利用世界模型改进流匹配策略

Chandon Hamel

发表机构 * Stanford University（斯坦福大学）

AI总结提出FlowMPC框架，结合流匹配模仿策略与学习的世界模型，通过MPPI规划提升测试时性能，在ManiSkill操作任务中显著提高成功率。

AI中文摘要

流匹配（FM）是一种在多模态动作空间中进行行为克隆的强大方法[Jiang et al., 2025]，但由于它没有直接训练以最大化期望回报，FM策略在测试时的表现仍有改进空间。本文研究学习的世界模型是否可以通过对策略提出的候选动作序列进行模型预测路径积分（MPPI）规划来改进FM策略。基于TD-MPC2 [Hansen et al., 2024]，我引入了FlowMPC，这是一个将模仿学习的FM策略与学习的世界模型相结合的框架，用于ManiSkill操作任务[Tao et al., 2025]中的测试时规划。在PickCube和PickSingleYCB上，添加世界模型比单独使用FM策略提高了性能，尤其是在回合结束时的成功率方面有显著提升。这些结果表明，基于世界模型的规划可以有效地补充基于流的模仿策略，而无需修改FM训练目标。

英文摘要

Flow Matching (FM) is a powerful approach for behavior cloning in multimodal action spaces [Jiang et al., 2025], but because it is not trained to directly maximize expected return, there is still room to improve how FM policies act at test time. This work investigates whether a learned world model can improve FM policies by enabling Model Predictive Path Integral (MPPI) planning over candidate action sequences proposed by the policy. Building on TD-MPC2 [Hansen et al., 2024], I introduce FlowMPC, a framework that combines an imitation-learned FM policy with a learned world model for test-time planning in ManiSkill manipulation tasks [Tao et al., 2025]. Across PickCube and PickSingleYCB, adding the world model improved performance over the FM policy alone, with especially clear gains in end-of-episode success. These results suggest that world-model-based planning can effectively complement flow-based imitation policies without modifying the FM training objective.
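The MPPI step mentioned in this abstract can be sketched generically: candidate action sequences are scored, the scores are turned into softmax weights over negative cost, and the weighted average becomes the plan. This is a minimal sketch of standard MPPI weighting, not the paper's implementation. The inputs `candidates` (which would come from the FM policy) and `costs` (which would come from world-model rollouts) are hypothetical, as are the function names and the simplification to scalar actions.

```python
import math

def mppi_weights(costs, temperature=1.0):
    """Softmax over negative costs: lower-cost candidates get larger weights."""
    c_min = min(costs)  # subtract the minimum for numerical stability
    unnorm = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def mppi_combine(candidates, costs, temperature=1.0):
    """Cost-weighted average of candidate action sequences (scalar actions)."""
    weights = mppi_weights(costs, temperature)
    horizon = len(candidates[0])
    return [
        sum(w * seq[t] for w, seq in zip(weights, candidates))
        for t in range(horizon)
    ]
```

A small temperature makes the plan collapse onto the lowest-cost candidate, while a large one approaches a plain average of the proposals.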

2606.16331 2026-06-16 cs.LG 新提交

Diffusion Offline Reinforcement Learning for Fair and Energy-Efficient UAV-Assisted Wireless Networks

面向公平与节能的无人机辅助无线网络的扩散离线强化学习

Eslam Eldeeb, Hirley Alves

发表机构 * Centre for Wireless Communications (CWC), University of Oulu（奥卢大学无线通信中心（CWC））

AI总结提出扩散软演员-评论家方法，结合保守Q学习与扩散模型，在离线强化学习中优化无人机轨迹与调度，降低能耗并提升公平性，性能优于现有算法。

AI中文摘要

生成式人工智能与无线通信及信号处理系统的融合为未来6G网络中的智能数据驱动决策开辟了新途径。本文提出一种扩散软演员-评论家方法，利用去噪扩散概率模型增强的离线强化学习，优化无人机网络中的轨迹与调度控制。虽然离线强化学习方法（如保守Q学习）可以从静态数据集中学习，但在低数据或动态条件下往往难以泛化。为此，我们将保守Q学习的鲁棒性与扩散模型的生成能力相结合，实现超越行为策略的、具有信号感知能力的策略学习。将该框架应用于无人机辅助无线网络，可最小化传输能量并提高设备间的公平性。仿真表明，扩散软演员-评论家方法优于标准离线强化学习基线，即使在有限数据集下也能实现更稳定的收敛和更高的奖励。该方法提升了数据效率，降低了能耗，与现有算法相比吞吐量提高了35%以上，展示了其在下一代无线控制系统中进行鲁棒策略学习的潜力。

英文摘要

The integration of generative artificial intelligence with wireless communication and signal processing systems has opened new avenues for intelligent, data-driven decision-making in future 6G networks. This work proposes a diffusion soft actor-critic (Diffusion-SAC) approach that leverages offline reinforcement learning (RL) enhanced by denoising diffusion probabilistic models (DDPMs) to optimize trajectory and scheduling control in unmanned aerial vehicle (UAV) networks. While offline RL methods, such as conservative Q-learning (CQL), can learn from static datasets, they often struggle to generalize in low-data or dynamic conditions. To address this, we combine the robustness of CQL with the generative power of diffusion models, enabling expressive and signal-aware policy learning that generalizes beyond behavior policies. Applied to a UAV-assisted wireless network, the proposed framework minimizes transmission energy and improves fairness among devices. Simulations show that Diffusion-SAC outperforms standard offline RL baselines, achieving more stable convergence and higher rewards even with limited datasets. The method enhances data efficiency, reduces energy consumption, and increases throughput by more than 35% compared to existing algorithms, demonstrating its potential for robust policy learning in next-generation wireless control systems.

2606.16489 2026-06-16 cs.LG 新提交

BRICKS-WM: Building Reusability via Interface Composition Kinetics for Structured World Models

BRICKS-WM：通过接口组合动力学构建结构化世界模型的可重用性

Shaowei Zhang, Jiahan Cao, Xunlan Zhou, Shenghua Wan, De-Chuan Zhan

发表机构 * National Key Laboratory for Novel Software Technology, Nanjing University, China（南京大学计算机软件新技术国家重点实验室）； School of Artificial Intelligence, Nanjing University, China（南京大学人工智能学院）； School of Intelligence Science and Technology, Nanjing University, China（南京大学智能科学与技术学院）

AI总结提出BRICKS-WM框架，将全局动力学分解为通过潜在接口交互的独立模块（如智能体和背景），实现冻结背景模块跨智能体重用，避免从头训练。

AI中文摘要

基于模型强化学习（MBRL）通过利用潜在世界模型在连续控制中取得了显著成功。然而，现有方法通常依赖单一潜在动力学，将环境动力学纠缠为耦合过程。这种耦合严重限制了可重用性：即使环境保持不变，改变智能体也需要从头重新训练整个世界模型。为了解决这个问题，我们引入了BRICKS-WM（通过接口组合动力学构建结构化世界模型的可重用性），一个用于模块化组装结构化世界模型的框架。基于物理世界由独立实体组成的洞察，我们假设全局动力学可以建模为通过潜在接口交互的不同动力学模块的组合。作为一个最小实例，我们将潜在状态空间分解为一个被驱动的智能体模块和一个外部背景模块，通过学习的潜在接口连接。与先前优先考虑视觉分割的以对象为中心的方法不同，BRICKS-WM在转移动力学中强制执行功能分离，确保背景动力学对智能体动力学保持不可知。实验表明，BRICKS-WM在从头训练时实现了与强单一基线相当的控制性能，并能够跨智能体重用冻结的背景动力学。

英文摘要

Model-based Reinforcement Learning (MBRL) has achieved remarkable success in continuous control by leveraging latent world models. However, prevailing approaches typically rely on monolithic latent dynamics, entangling environment dynamics into a coupled process. This coupling severely limits reusability: altering the agent necessitates retraining the entire world from scratch, even if the environment remains constant. To address this, we introduce BRICKS-WM (Building Reusability via Interface Composition Kinetics for Structured World Models), a framework for the modular assembly of structured world models. Driven by the insight that the physical world is composed of independent entities, we posit that global dynamics can be modeled as a composition of distinct dynamical modules interacting via latent interfaces. As a minimal instantiation, we factorize the latent state space into an actuated Agent module and an external Background module, bridged by a learned latent interface. Unlike prior object-centric methods that prioritize visual segmentation, BRICKS-WM enforces a functional separation in transition dynamics, ensuring that background dynamics remains agnostic to the agent's dynamics. Empirically, BRICKS-WM achieves control performance comparable to strong monolithic baselines when trained from scratch, and enables the reuse of frozen background dynamics across agents.

2606.16497 2026-06-16 cs.LG cs.AI cs.CL 新提交

daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization

daVinci-kernel：通过强化学习协同进化技能选择、总结与利用的GPU内核优化

Dayuan Fu, Mohan Jiang, Tongyu Wang, Dian Yang, Jiarui Hu, Liming Liu, Jinlong Hou, Pengfei Li

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出daVinci-kernel框架，通过强化学习联合训练技能选择、策略生成和技能总结三个智能体，共享LLM骨干，实现GPU内核优化，在KernelBench上超越先前最优模型。

AI中文摘要

GPU内核优化代表了一种范式，其中功能正确性被假定，执行效率是目标。我们提出daVinci-kernel，一个强化学习框架，通过动态演化的技能库将技能发现与技能利用相结合。daVinci-kernel联合训练三个共享一个LLM骨干的智能体：技能选择智能体通过BM25和LLM重排序检索相关技术，策略智能体基于所选技能生成多轮CUDA/Triton内核，技能总结智能体将成功轨迹提炼为可复用技能。候选技能仅在基于执行的验证确认可复现加速后才被添加。所有三个智能体共享单个LLM骨干，通过多样性过滤数据上的结构化SFT冷启动初始化，然后通过多轮REINFORCE和每个智能体的优势估计进行端到端联合优化。在KernelBench上，daVinci-kernel-14B在Fast$_1$阈值下，Level 1、Level 2和Level 3分别达到37.2%、70.6%和32.2%，优于先前最强的RL训练模型Dr.Kernel-14B。

英文摘要

GPU kernel optimization represents a paradigm where functional correctness is assumed and execution efficiency is the objective. We present daVinci-kernel, a reinforcement learning framework that couples skill discovery with skill exploitation through a dynamically evolving skill library. daVinci-kernel jointly trains three agents sharing one LLM backbone: a Skill Selection Agent that retrieves relevant techniques via BM25 and LLM reranking, a Policy Agent that generates multi-turn CUDA/Triton kernels conditioned on selected skills, and a Skill Summary Agent that distills successful rollouts into reusable skills. Candidate skills are added only after execution-based verification confirms reproducible speedups. All three agents share a single LLM backbone, are initialized via a structured SFT cold start on diversity-filtered data, and are then jointly optimized end-to-end with multi-turn REINFORCE and per-agent advantage estimation. On KernelBench, daVinci-kernel-14B achieves 37.2%, 70.6%, and 32.2% on Level 1, Level 2, and Level 3 under the Fast$_1$ threshold, outperforming the strongest prior RL-trained model, Dr.Kernel-14B.

2606.16515 2026-06-16 cs.LG cs.AI cs.RO 新提交

Direction-Conditioned Policies via Compositional Subgoal Scoring for Online Goal-Conditioned Reinforcement Learning

基于组合子目标评分的方向条件策略用于在线目标条件强化学习

Swaminathan S K, Damiya Gondha, Theyanesh Eswaramoorthy Rajahkrishnan, Aritra Hazra

AI总结提出方向条件策略（DCP），通过共享InfoNCE表示将目标达成分解为子目标评分和方向条件动作，理论证明方向充分性、训练与部署一致性及可控子空间失效条件，在九个环境中优于对比RL。

Comments 17 pages, Accepted to the 2nd Workshop on Compositional Learning at ICML 2026 (Seoul, South Korea)

AI中文摘要

Hamilton-Jacobi-Bellman理论表明，最优目标条件动作仅通过当前状态下目标距离的梯度依赖于目标，然而标准的在线GCRL仍然将演员网络条件于原始目标——当目标远离数据分布时，这是一个几何上无信息的信号。我们提出方向条件策略（DCP），一种完全在线的方法，将目标达成分解为两个共享一个InfoNCE表示ψ的组件：一个子目标评分步骤，选择与最终目标g在ψ空间中对齐的已访问状态z_t；以及一个方向条件演员，它消耗从ψ(s_t)到ψ(z_t)的单位方向d_t和幅度r_t。这两个组件联合训练，在部署时干净地分解（子目标评分被移除，而方向条件保留，用g代替z_t），并允许在相同的(d_t, r_t)接口上进行独立修改。我们证明了三个结果。首先，HJB下的方向充分性：在控制仿射动力学下，最优动作仅通过价值梯度依赖于目标。其次，一个定量界表明，在学习表示的温和条件下，并假设评分规则返回一个路径上的z_t，演员在训练和部署时的条件输入在表示误差和测地线松弛下是一致的。第三，一个可控子空间刻画了方向条件失效的情况。在九个环境中，DCP在大多数最终指标上优于对比RL，在操作和障碍物交互任务上提升最大；对学习到的ψ-距离景观的定性分析表明，对比表示表现为一种在线拟度量，编码环境拓扑，而唯一的失败案例（AntSoccer）定位到理论预期的学习梯度病理。

英文摘要

Hamilton-Jacobi-Bellman theory implies that the optimal goal-conditioned action depends on the goal only through the gradient of the goal-reaching distance at the current state, yet standard online GCRL still conditions the actor on the raw goal -- a signal that is geometrically uninformative when the goal is far from the data distribution. We propose Direction-Conditioned Policies (DCP), a fully online method that decomposes goal-reaching into two components sharing one InfoNCE representation $ψ$: a subgoal-scoring step that selects a visited state $z_t$ aligned with the final goal $g$ in $ψ_g$, and a direction-conditioned actor that consumes the unit direction $d_t$ and magnitude $r_t$ from $ψ(s_t)$ to $ψ(z_t)$. The two components train jointly, factor cleanly at deployment (subgoal scoring is removed, while direction conditioning remains with $g$ in place of $z_t$), and admit independent modification at the same $(d_t,r_t)$ interface. We prove three results. First, direction sufficiency under HJB: the optimal action under control-affine dynamics depends on the goal only through the value gradient. Second, a quantitative bound showing that, under mild conditions on the learned representation and assuming the scoring rule returns an on-path $z_t$, the actor's conditioning input at training and at deployment coincide up to representation error and geodesic slack. Third, a controllable-subspace characterization of when directional conditioning fails. Across nine environments, DCP improves over Contrastive RL on most final metrics, with the largest gains on manipulation and obstacle-interaction tasks; a qualitative analysis of the learned $ψ$-distance landscape shows the contrastive representation behaves as an online quasimetric encoding environment topology, and the single failure case (AntSoccer) localizes to a learned-gradient pathology that the theory anticipates.
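The $(d_t, r_t)$ interface described in this abstract amounts to splitting the displacement between two embeddings into a unit direction and a magnitude. The sketch below shows that arithmetic only, with embeddings as plain lists of floats. The function name and the `eps` guard are assumptions for illustration, and the representation $ψ$ itself is not modeled.

```python
import math

def direction_and_magnitude(psi_s, psi_z, eps=1e-8):
    """Return the unit direction d and magnitude r from psi(s_t) to psi(z_t)."""
    diff = [b - a for a, b in zip(psi_s, psi_z)]
    r = math.sqrt(sum(x * x for x in diff))
    # eps (an assumed constant) keeps the division defined when both embeddings coincide
    d = [x / (r + eps) for x in diff]
    return d, r
```

At deployment, the abstract states that the final goal $g$ takes the place of $z_t$, so the same function would be called with the goal's embedding as the second argument.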

2606.16656 2026-06-16 cs.LG 新提交

Near-Optimal Stochastic Linear Bandits with Delay

带延迟的近最优随机线性赌博机

Ofir Schlisselberg, Mengxiao Zhang, Yishay Mansour

发表机构 * Tel Aviv University（特拉维夫大学）； University of Iowa（爱荷华大学）； Tel Aviv University and Google Research（特拉维夫大学和谷歌研究）

AI总结研究多种延迟模型下的随机线性赌博机，给出近最优遗憾界，揭示延迟与线性结构交互的维度影响。

Hölder空间上的深度Q学习

Qian Qi

发表机构 * Peking University（北京大学）

AI总结研究连续时间随机控制中Q学习的算子核心，通过分析扩散设置下Bellman最优性目标的正则性和逼近复杂度，提出适应混合正则性的张量积DeepONet架构，并给出显式逼近和资源界限。

AI中文摘要

我们研究了具有连续状态和动作的连续时间随机控制中Q学习的算子理论核心。在基于价值的强化学习中，每次Q学习或DQN更新都基于Bellman最优性目标；我们的分析在扩散设置中分离出该目标，并研究其正则性和逼近复杂度。在均匀椭圆性和Hölder正则系数下，我们证明Bellman更新将有界输入映射到各向异性正则类，平滑状态变量而仅保留对动作变量的Lipschitz依赖性。这产生了Bellman迭代的紧族，并激发了适应问题混合正则性的张量积DeepONet架构。然后我们推导出显式的逼近和资源界限，以及时间步长$δ\to 0$时的刚度-复杂度权衡。所得理论在连续随机控制中Bellman目标正则性和逼近层面直接贡献于Q学习理论。同时，我们并未声称对包含探索、经验回放和随机梯度更新的实际采样Q学习有完整的收敛定理。

英文摘要

We study the operator-theoretic core of Q-learning in continuous-time stochastic control with continuous states and actions. In value-based reinforcement learning, each Q-learning or DQN update is built from a Bellman optimality target; our analysis isolates this target in a diffusion setting and studies its regularity and approximation complexity. Under uniform ellipticity and Hölder-regular coefficients, we show that a Bellman update maps bounded inputs into an anisotropic regularity class, smoothing the state variable while leaving only Lipschitz dependence on the action variable. This yields a compact family of Bellman iterates and motivates a tensor-product DeepONet architecture adapted to the mixed regularity of the problem. We then derive explicit approximation and resource bounds, together with a stiffness--complexity trade-off as the time step $δ\to 0$. The resulting theory makes a direct contribution to Q-learning theory at the level of Bellman target regularity and approximation in continuous stochastic control. At the same time, we do not claim a full convergence theorem for practical sampled Q-learning with exploration, replay, and stochastic gradient updates.

2606.16933 2026-06-16 cs.LG cs.AI 新提交

A Unified Causal-Origin Taxonomy of Distributional Shifts in Reinforcement Learning

强化学习中分布偏移的统一因果起源分类法

Ardianto Wibowo, Paulo E Santos, Amer Baghdadi, Matthew Stephenson, Karl Sammut, Jean-Philippe Diguet

发表机构 * IMT Atlantique（IMT大西洋）； Flinders University（弗林德斯大学）； IRL Crossing ； Priori Analytica ； CNRS（法国国家科学研究中心）

AI总结提出一种统一因果起源分类法，将强化学习中的分布偏移按因果来源（内部/外部）和时间边界（显式/隐式/混合）分类，统一了分布内/外泛化与非平稳性分析。

Comments The paper is currently under review at the Journal of Artificial Intelligence Research (JAIR)

AI中文摘要

强化学习系统在运行条件与先前遇到的条件不同时通常会退化，这反映了底层数据生成过程中的分布偏移。这种偏移可能发生在训练和评估之间，如分布内（ID）和分布外（OOD）泛化，或者发生在环境动态随时间演变的非平稳设置中。然而，这些观点之间的形式关系尚不清楚，现有工作主要关注缓解措施而非智能体-环境交互中偏移的因果起源。本文开发了一个统一的因果起源分类法，描述了强化学习中分布偏移的来源，并将ID/OOD泛化与非平稳设置联系起来。我们将监督学习中的经典数据集偏移原则迁移到强化学习，通过将分布偏移重新表述为生成交互过程。使用部分可观测马尔可夫决策过程（POMDP），我们将交互分解为结构组件，包括状态分布、观测过程、策略、奖励和转移动态，以及偏移时间边界。所提出的分类法区分了内部（智能体驱动）和外部（环境驱动）的分布偏移。偏移时间边界视角进一步刻画了显式、隐式和混合偏移。这种表述将ID/OOD泛化和非平稳性统一为底层过程中的结构化变化。我们还引入了一个评估框架，通过性能退化和恢复指标来衡量偏移影响和适应能力。通过将分布偏移扎根于强化学习的因果起源结构，本文支持在分布偏移下进行系统性的鲁棒性分析。

英文摘要

Reinforcement learning (RL) systems often degrade when operating conditions differ from those previously encountered, reflecting distributional shifts in the underlying data-generating process. Such shifts may occur between training and evaluation, as in In-Distribution (ID) and Out-of-Distribution (OOD) generalization, or within non-stationary settings where environment dynamics evolve over time. However, the formal relationship between these views remains unclear, and existing work mainly focuses on mitigation rather than the causal origin of shift within the agent-environment interaction. This work develops a unified causal-origin taxonomy that characterizes sources of distributional shift in RL and relates ID/OOD generalization to non-stationary settings. We transfer the classical dataset-shift principle from supervised learning to RL by reformulating distributional shift in terms of the generative interaction process. Using a Partially Observable Markov Decision Process (POMDP), we decompose the interaction into structural components, including the state distribution, observation process, policy, reward, and transition dynamics, together with the shifted-time boundary. The proposed taxonomy distinguishes internal, agent-driven, and external, environment-driven, distributional shifts. The shifted-time boundary perspective further characterizes explicit, implicit, and hybrid shifts. This formulation unifies ID/OOD generalization and non-stationarity as structured changes in the underlying process. We also introduce an evaluation framework for measuring shift impact and adaptation through performance degradation and recovery metrics. By grounding distributional shift in the causal-origin structure of RL, this work supports systematic analysis of robustness under distributional shift.

2606.17024 2026-06-16 cs.LG 新提交

ExpRL: Exploratory RL for LLM Mid-Training

ExpRL: 用于LLM中期训练的探索性强化学习

Violet Xiang, Amrith Setlur, Chase Blagden, Nick Haber, Aviral Kumar

发表机构 * Stanford University（斯坦福大学）； Carnegie Mellon University（卡内基梅隆大学）； OpenAI ； Rogo

AI总结提出ExpRL方法，利用人类编写的问答数据作为奖励支架，通过密集奖励强化推理过程中的部分进展和有用行为，在数学推理任务上优于SFT、稀疏奖励GRPO和自蒸馏，并为后续稀疏奖励RL提供更好的初始化。

AI中文摘要

稀疏奖励强化学习（RL）已成为提升LLM推理能力的标准工具，但其成功关键取决于基础模型中的覆盖范围。实践中，模型通常通过在精心策划的推理轨迹上进行中期训练来为RL做准备，这些轨迹教授有用的基本技能，如分解、验证或自我纠正。尽管有效，但这种策略需要手动指定模型应学习的内容，并且尚不清楚这种基本覆盖是否足以解决更难的问题，这些问题需要将这些技能组合成更广泛的解决方案策略。我们研究了一种更自动化的方法：使用大规模人工编写的问答数据进行基于RL的中期训练。我们的方法ExpRL不是将参考解决方案作为模仿目标，而是将其用作奖励支架：参考对策略隐藏，仅用于构建问题特定的评分标准，以评判在策略推理轨迹。策略从原始问题提示中采样，而LLM评判器将采样的推理轨迹与参考解决方案进行比较，并分配结果级或过程级的密集奖励。这使得ExpRL能够强化部分进展、有用的中间归约以及稀疏最终答案奖励通常无法提升的生产性推理行为。在具有挑战性的数学推理任务上，ExpRL比SFT、稀疏奖励GRPO和自蒸馏产生更强的RL启动，并为后续稀疏奖励RL提供更好的初始化。额外的混合领域实验进一步表明，ExpRL可以扩展到最初的纯数学设置之外。

英文摘要

Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on curated reasoning traces that teach useful primitive skills such as decomposition, verification, or self-correction. Although effective, this strategy requires manually specifying what the model should learn, and it remains unclear whether such primitive coverage is enough for much harder problems, which require combining these skills into broader solution strategies. We study a more automated approach: RL-based mid-training using large corpora of human-written question-answer data. Rather than treating reference solutions as targets to imitate, our method, ExpRL, uses them as reward scaffolds: references are hidden from the policy and used only to construct problem-specific grading rubrics for judging on-policy reasoning traces. The policy samples from the original problem prompt, while an LLM judge compares the sampled reasoning trace against the reference solution and assigns outcome-level or process-level dense rewards. This lets ExpRL reinforce partial progress, useful intermediate reductions, and productive reasoning behaviors that sparse final-answer rewards often fail to upweight. On challenging math reasoning tasks, ExpRL yields stronger RL priming than SFT, sparse-reward GRPO, and self-distillation, and provides a better initialization for subsequent sparse-reward RL. Additional mixed-domain experiments further suggest that ExpRL can extend beyond the original math-only setting.

2606.14879 2026-06-16 cs.RO cs.CV cs.LG 交叉投稿

VANDERER: Map-Free Exploration using Future-Aware and Visual-Curiosity-Guided Diffusion Policy

VANDERER: 基于未来感知与视觉好奇心引导扩散策略的无地图探索

Venkata Naren Devarakonda, Raktim Gautam Goswami, Prashanth Krishnamurthy, Farshad Khorrami

发表机构 * Control/Robotics Research Laboratory (CRRL), Department of Electrical and Computer Engineering, NYU Tandon School of Engineering（纽约大学坦登工程学院电气与计算机工程系控制/机器人研究实验室（CRRL））； New York University Abu Dhabi (NYUAD) Center for Artificial Intelligence and Robotics (CAIR)（纽约大学阿布扎比分校人工智能与机器人中心（CAIR））

AI总结提出VANDERER框架，利用视觉好奇心模块引导预训练扩散策略，仅依赖单目图像实现高效无地图探索，在多种模拟环境中平均探索面积比NoMaD多13.4%。

AI中文摘要

移动智能体需要高效的探索策略来绘制未知环境并自主规划任务。传统方法依赖于生成占据地图并优化未探索区域的访问顺序。然而，在传感器受限的设置中，例如仅使用单目相机，生成准确的占据地图具有挑战性。为了解决这一问题，我们提出了VANDERER，一个探索框架，它利用视觉好奇心模块（VCM）仅使用单目图像数据来引导预训练的扩散策略。该好奇心模块通过导航世界模型预测所提议动作的结果，并通过好奇心成本对其进行评估。然后，该成本引导扩散过程生成最大化探索的动作。在多种模拟环境中进行评估，VANDERER始终优于现有基线，平均探索面积比NoMaD多13.4%。我们的结果揭示了室外环境中视觉好奇心与几何好奇心之间的直接相关性，表明VANDERER能够有效利用这种关系，在传感器受限的智能体上实现高效探索。

英文摘要

Mobile agents require efficient exploration strategies to map unseen environments and autonomously plan tasks. Traditional methods rely on generating occupancy maps and optimizing the sequence in which unexplored regions are visited. However, in sensor-constrained settings, such as those limited to monocular cameras, generating accurate occupancy maps is challenging. To address this, we propose VANDERER, an exploration framework that leverages a Visual Curiosity Module (VCM) to guide pre-trained diffusion policies using only monocular image data. This curiosity module predicts the outcomes of proposed actions via a navigation world model and evaluates them through a curiosity cost. The cost then guides the diffusion process toward generating actions that maximize exploration. Evaluated across diverse simulated environments, VANDERER consistently outperforms established baselines, exploring an average of 13.4% more area than NoMaD. Our results reveal a direct correlation between visual and geometric curiosity in outdoor environments, demonstrating that VANDERER can effectively leverage this relationship for efficient exploration using sensor-constrained agents.

2606.14981 2026-06-16 cs.RO cs.AI cs.LG 交叉投稿

Inference-time Policy Steering via Vision and Touch

通过视觉和触觉进行推理时策略引导

Yilin Wu, Zilin Si, Zeynep Temel, Oliver Kroemer, Andrea Bajcsy

发表机构 * Carnegie Mellon University（卡内基梅隆大学）

AI总结提出ViTaL框架，通过视觉采样验证和触觉引导扩散编辑的双层优化，在推理时引导机器人策略，显著提升接触丰富操作任务的成功率。

AI中文摘要

推理时引导通过在部署前验证候选动作来适应预训练的生成式机器人策略。虽然先前的方法通常仅使用视觉观察进行验证，但对于接触丰富的操作任务，仅靠视觉往往不足，因为成功取决于全局任务进展和微妙的局部交互（如接触力）。我们提出了ViTaL，一个视觉-触觉推理时引导框架，将多模态引导形式化为双层优化问题。在高层，视觉采样与验证执行长时域模式选择，决定机器人应执行何种行为。在低层，触觉引导的扩散编辑在较短时域内细化所选动作序列，以满足局部接触要求。为了支持基于结果的引导，ViTaL学习了一个视觉-触觉潜在世界模型，并采用了语义对齐的视觉和触觉验证器，包括一个新颖的文本条件触觉奖励，直接在潜在空间中对预测的触觉未来进行评分。在三个真实世界的接触丰富操作任务中，ViTaL相对于基础策略将整体成功率提高了51%，比单模态引导至少高出33%，并且比朴素多模态融合至少高出20%。网站：https://yilin-wu98.github.io/vital_website。

英文摘要

Inference-time steering adapts pre-trained generative robot policies during deployment by verifying candidate actions before execution. While prior methods typically perform this verification only with visual observations, vision alone is often insufficient for contact-rich manipulation, where success depends on both global task progress and subtle local interactions such as contact force. We introduce ViTaL, a visuo-tactile inference-time steering framework that formulates multimodal guidance as a bi-level optimization problem. At the high level, visual sampling-and-verification performs long-horizon mode selection, deciding what behavior the robot should execute. At the low level, tactile-guided diffusion editing refines the selected action sequence over a shorter horizon to satisfy local contact requirements. To support outcome-based steering, ViTaL learns a visuo-tactile latent world model and employs semantically aligned visual and tactile verifiers, including a novel text-conditioned tactile reward that scores predicted tactile futures directly in latent space. Across three real-world contact-rich manipulation tasks, ViTaL improves overall success by 51% over the base policy, outperforms unimodal steering by at least 33%, and exceeds naive multimodal fusion by at least 20%. Website: https://yilin-wu98.github.io/vital_website.

2606.15099 2026-06-16 cs.CV cs.LG cs.RO 交叉投稿

Think Less, Act Early: Reinforced Latent Reasoning with Early Exit in Vision-Language-Action Models

少思考，早行动：视觉-语言-动作模型中带早退的强化潜在推理

Dianqiao Lei, Lianlei Shan

AI总结提出AVA-VLA框架，通过强化学习去噪和早退策略优化潜在推理轨迹，在LIBERO上实现6倍推理加速和98.3%平均成功率。

Comments Accepted at ICML 2026

AI中文摘要

现有的视觉-语言-动作（VLA）模型主要依赖显式的思维链（CoT）推理来桥接感知和动作。虽然有效，但这种范式在多步骤任务中面临高计算成本和错误传播的问题。在本文中，我们提出了自适应变量对齐VLA（AVA-VLA），一种新颖的潜在推理VLA框架，将推理建模为一系列不可观测的潜在变量，绕过了显式文本生成的需求。然而，潜在轨迹本质上容易受到噪声干扰和与下游目标不对齐的影响。为了解决这个问题，我们引入了一种基于强化学习的去噪机制，将潜在状态生成视为一个顺序决策过程，通过任务级奖励优化推理轨迹。此外，我们结合了一种早退策略，根据状态置信度自适应地终止推理，实现了深度和效率之间的动态权衡。在具身决策基准上的大量实验表明，AVA-VLA在LIBERO上实现了比显式CoT方法6倍的推理加速，同时达到了98.3%的平均成功率，在效率和长期稳定性上均优于全推理基线。

英文摘要

Existing Vision-Language-Action (VLA) models predominantly rely on explicit Chain-of-Thought (CoT) reasoning to bridge perception and action. While effective, this paradigm suffers from high computational costs and error propagation in multi-step tasks. In this paper, we propose Adaptive Variable Alignment VLA (AVA-VLA), a novel Latent Reasoning VLA framework that models reasoning as a sequence of unobservable latent variables, bypassing the need for explicit text generation. However, latent trajectories are inherently susceptible to noise interference and misalignment with downstream objectives. To address this, we introduce a Reinforcement Learning-based Denoising mechanism that treats latent state generation as a sequential decision process, optimizing reasoning trajectories via task-level rewards. Furthermore, we incorporate an Early-Exit Strategy that adaptively terminates reasoning based on state confidence, enabling a dynamic trade-off between depth and efficiency. Extensive experiments on embodied decision benchmarks demonstrate that AVA-VLA achieves a 6x inference speedup over explicit CoT methods while attaining a 98.3% average success rate on LIBERO, improving both efficiency and long-horizon stability over full-reasoning baselines.

2606.15160 2026-06-16 cs.CV cs.LG cross-list

DLWM: Diverse Latent World Models for Efficient Multimodal Reasoning

David Huang, Lianlei Shan

Affiliations: University of Toronto; Tsinghua University

AI summary: Proposes the DLWM framework, which combines latent-space reasoning with reinforcement learning and uses diverse latent hypotheses and a resource-aware policy to make multimodal reasoning more efficient, improving accuracy by 2-5 points and reducing memory usage by 24%.

Comments: Preprint. 9 pages main text, 15 pages total including appendix, 2 figures

Abstract

Reasoning capabilities of multimodal large language models (MLLMs) have improved considerably in recent years. Existing approaches typically rely on explicit chain-of-thought or continuous latent-space trajectories to enhance multi-step reasoning. However, these methods generally assume that an input admits a single latent interpretation and unfold reasoning along a fixed path or under a uniform computation budget. In real-world multimodal settings, visual observations are often subject to occlusion, blur, viewpoint variation, or semantic ambiguity, giving rise to multiple plausible interpretations. A uniform reasoning strategy not only limits the model's ability to explore multiple hypotheses but also incurs high memory usage and rollout cost. We present DLWM (Diverse Latent World Models), a multimodal reasoning framework that combines latent-space reasoning with reinforcement learning. First, we construct a set of diverse latent world hypotheses in continuous latent space, each capturing a different plausible interpretation of the visual input, and unfold latent reasoning independently on each hypothesis. An orthogonality-based diversity regularizer explicitly prevents hypothesis collapse. Second, we formulate the latent reasoning process as a resource-constrained sequential decision problem and introduce a resource-aware reinforcement learning policy that adaptively allocates computation across hypotheses, dynamically deciding whether to expand, terminate, or merge reasoning paths, thereby substantially reducing memory footprint and improving rollout efficiency. Experiments on multiple multimodal reasoning benchmarks demonstrate that DLWM outperforms existing methods by 2-5 points in accuracy while reducing memory usage by 24%.
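
One common way to write an orthogonality-based diversity penalty is the mean squared cosine similarity between hypothesis vectors. The sketch below assumes that form; the abstract does not give DLWM's actual regularizer, so the function name and formula are illustrative only.

```python
import math
from typing import List


def orthogonality_penalty(hypotheses: List[List[float]]) -> float:
    """Mean squared cosine similarity over all distinct pairs of hypothesis vectors.

    Assumes at least two hypotheses and no all-zero vector.
    Returns 0.0 when all pairs are orthogonal and 1.0 when all are collinear.
    """
    def cosine(u: List[float], v: List[float]) -> float:
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    k = len(hypotheses)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return sum(cosine(hypotheses[i], hypotheses[j]) ** 2 for i, j in pairs) / len(pairs)


collapsed = [[1.0, 0.0], [2.0, 0.0]]  # same direction: hypotheses have collapsed
diverse = [[1.0, 0.0], [0.0, 3.0]]    # orthogonal: maximally distinct directions
```

Adding such a term to the training loss pushes hypotheses apart in direction; it does nothing to ensure that each hypothesis is individually plausible, which would have to come from the task objective.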

2606.15333 2026-06-16 cs.CL cs.LG cross-list

Replay What Matters: Off-Policy Replay for Efficient LLM Reinforcement Unlearning

Zirui Pang, Chenlong Zhang, Haosheng Tan, Zhuoran Jin, Jiaheng Wei, Zixin Zhong

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Institute of Automation, Chinese Academy of Sciences; University of Glasgow

AI summary: To address the underuse of hard samples by on-policy optimization in LLM unlearning, proposes ReRULE, which stores low-reward hard-case rollouts in an off-policy replay buffer and reuses them, improving unlearning efficiency while preserving general utility.

Abstract

LLM unlearning has emerged as a cost-effective alternative to full retraining for removing hazardous knowledge from pretrained models while preserving general utility. Recent RL-based methods such as RULE reformulate unlearning as learning a refusal behavior, but their on-policy optimization repeatedly samples from the same forget and retain/boundary prompts throughout training. We identify a critical inefficiency in this process: easy cases quickly converge and provide little useful gradient signal, while hard cases near the forget/retain boundary continue to produce low-reward rollouts that are discarded after a single use. To address this issue, we propose ReRULE, an off-policy replay enhancement for reinforcement unlearning. ReRULE stores low-reward hard-case rollout groups in a replay buffer during early GRPO training and reuses them in later stages through importance-sampled off-policy updates, redirecting computation toward boundary cases that still require learning. Theoretically, we show that ReRULE yields a tighter hard-case convergence bound than pure on-policy RULE. Empirically, ReRULE improves MUSE-Books Retain Quality from 46.3 to 56.2 while adding only 5--11% training time across benchmarks. Its limited improvement on the simpler TOFU setting further supports the intended conditional behavior: replay is most beneficial when the hard/easy disparity is pronounced.
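
The replay mechanism can be sketched as a buffer that admits only low-reward rollout groups, plus a clipped importance ratio for reusing them later. This is an illustrative sketch, not ReRULE's code: the class name, the mean-reward admission rule, the FIFO eviction, and the clipping constant are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    prompt: str
    reward: float
    behavior_logprob: float  # log-probability under the policy that sampled it


class HardCaseReplayBuffer:
    """Keeps rollout groups whose mean reward falls below a threshold (assumed rule)."""

    def __init__(self, reward_threshold: float, capacity: int):
        self.reward_threshold = reward_threshold
        self.capacity = capacity
        self.groups: List[List[Rollout]] = []

    def maybe_add(self, group: List[Rollout]) -> bool:
        mean_reward = sum(r.reward for r in group) / len(group)
        if mean_reward >= self.reward_threshold:
            return False  # easy case: not stored
        self.groups.append(group)
        if len(self.groups) > self.capacity:
            self.groups.pop(0)  # evict the oldest stored group
        return True


def importance_weight(current_logprob: float, behavior_logprob: float, clip: float = 2.0) -> float:
    """Clipped ratio pi_current / pi_behavior used to reweight a replayed rollout."""
    return min(math.exp(current_logprob - behavior_logprob), clip)


buffer = HardCaseReplayBuffer(reward_threshold=0.5, capacity=2)
easy = [Rollout("retain prompt", 1.0, -1.0), Rollout("retain prompt", 0.9, -1.2)]
hard = [Rollout("boundary prompt", 0.1, -2.0), Rollout("boundary prompt", 0.3, -2.5)]
stored_easy = buffer.maybe_add(easy)
stored_hard = buffer.maybe_add(hard)
```

The importance weight corrects for the fact that replayed rollouts were sampled by an earlier policy; clipping the ratio is a standard way to limit the variance of that correction.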

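2606.15514 2026-06-16 cs.RO cs.LG cross-list

Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities

Hassan Ismkhan, Hamid Bouchahcia

Affiliations: Bournemouth University

AI summary: Proposes RL4IL, which uses a reinforcement learning policy to retrieve the most relevant expert demonstrations from a training library and fuses them through soft cross-attention to generate actions, handling missing sensors and outperforming existing methods on the LIBERO benchmark.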

Abstract

Robotic systems perceive the world through multiple input modalities -- including visual camera streams and natural language instructions -- and must select appropriate actions based on these signals. However, assuming the permanent availability of all input devices is unrealistic, as sensors may fail, become occluded, or drop out entirely during deployment. Robust handling of such missing-modality scenarios is therefore essential for real-world robot operation. This paper introduces RL4IL, a reinforcement learning guided method for imitation learning that selects the most suitable action for a given observation by identifying the most relevant expert demonstrations from a training library. A reinforcement learning policy, trained via Proximal Policy Optimisation over Breadth-First Search candidate sets, ranks candidate demonstrations and a soft cross-attention fusion head aggregates their action signals to produce the final prediction. When a modality is missing at inference time, a dedicated per-modality RL retrieval policy identifies donor demonstrations from the training library, and a soft imputation head reconstructs the missing embedding via cross-attention over the top-ranked donors -- without requiring any retraining of the system. Experiments on three LIBERO benchmark suites demonstrate that RL4IL substantially outperforms state-of-the-art imitation learning methods under sensor dropout conditions, while requiring no policy network training. The code can be found at https://github.com/h-ismkhan/Reinforcement-Learning-via-kNN-for-Robotic-Learning-with-Missing-Camera
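
The soft fusion step can be pictured as scaled dot-product attention over the retrieved demonstrations. The sketch below is a generic single-query attention average with no learned projections; it is an assumption about the general shape of such a head, not the RL4IL implementation, and `soft_fusion` is a hypothetical name.

```python
import math
from typing import List


def soft_fusion(query: List[float], keys: List[List[float]], actions: List[List[float]]) -> List[float]:
    """Softmax-attention average of the retrieved demonstrations' actions."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(actions[0])
    return [sum(w * a[i] for w, a in zip(weights, actions)) for i in range(dim)]


query = [1.0, 0.0]                    # embedding of the current observation
keys = [[1.0, 0.0], [1.0, 0.0]]       # two equally relevant demonstrations
actions = [[0.0, 2.0], [4.0, 2.0]]    # their recorded actions
fused = soft_fusion(query, keys, actions)
```

With equal relevance the fused action is the plain average of the two demonstrations; as one key aligns better with the query, its action dominates the prediction. The same pattern could serve for imputing a missing embedding by attending over donor embeddings instead of actions.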

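2606.15866 2026-06-16 cs.AI cs.LG cross-list

RL-Index: Reinforcement Learning for Retrieval Index Reasoning

Yongjia Lei, Nedim Lipka, Zhisheng Qi, Utkarsh Sahu, Koustava Goswami, Franck Dernoncourt, Ryan A. Rossi, Yu Wang

Affiliations: University of Oregon; Adobe Research

AI summary: Proposes the RL-Index framework, which casts retrieval index reasoning as a reinforcement learning problem, augments documents with LLM-generated rationales optimized with GRPO, and improves retrieval and question-answering performance while reducing online latency.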

Abstract

Retrieving external knowledge is essential for solving real-world tasks, yet it remains challenging when the relationship between a query and its relevant knowledge involves implicit and complex reasoning beyond surface-level semantic or lexical matching (e.g., mathematical problems relying on the same theorem or coding requiring deep reasoning). Existing approaches primarily rely on query-side reasoning (e.g., query rewriting), which introduces significant online latency and underutilizes the opportunity to perform reasoning over the knowledge corpus itself (i.e., index-side reasoning). In this paper, we propose RL-Index, an agentic indexing framework that formulates retrieval index reasoning as a reinforcement learning problem. Instead of performing reasoning at query time, RL-Index shifts reasoning to the indexing stage by augmenting documents with LLM-generated rationales that explicitly encode the latent query-knowledge relationship. To optimize the quality of these rationales, we employ Group Relative Policy Optimization (GRPO) and use retrieval similarity as a verifiable reward signal, enabling direct optimization of indexing decisions for retrieval effectiveness. Extensive experiments on the BRIGHT benchmark demonstrate that RL-Index consistently improves both retrieval and downstream question-answering performance, while significantly reducing online inference latency. Moreover, the learned rationale augmentation generalizes across diverse retrievers and generators, highlighting its robustness as a plug-and-play indexing strategy across different retrieval systems.
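
GRPO scores each sampled output relative to the other outputs in its group. The sketch below shows the usual group-normalized advantage, assuming one scalar reward (here, a retrieval similarity) per candidate rationale for the same document; the use of the population standard deviation and the `eps` constant are implementation assumptions.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical retrieval-similarity rewards for four rationales of one document.
rewards = [0.2, 0.4, 0.6, 0.8]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean, no separate value network is needed: rationales that retrieve better than their siblings get positive advantages and the rest get negative ones.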

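2606.16496 2026-06-16 cs.CL cs.LG cross-list

REFLEX: Reflective Evolution from LLM Experience

Pan Wang

AI summary: Proposes the REFLEX framework, which decouples visual diagnosis from code generation to enable auditable and efficient policy evolution, showing strong sample efficiency on control tasks and antenna array synthesis.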

Abstract

Large multimodal language models (LLMs) have emerged as powerful tools for guiding evolutionary search toward interpretable programmatic policies. However, existing frameworks rely on a monolithic model call to simultaneously interpret visual behavioral evidence and synthesize corrective code. This diagnosis-repair entanglement creates an opaque feedback loop, obscuring the rationale behind mutations and preventing the retention of algorithmic insights across independent runs. To achieve auditable and efficient policy search, we argue that visual diagnosis must be structurally decoupled from code generation. We present REFLEX, a train-free evolutionary framework that operationalizes this decoupling. In REFLEX, a vision-enabled Critic first distills task-specific behavioral evidence into structured, auditable diagnoses. Subsequently, a text-optimized Actor synthesizes child policies using these diagnoses alongside a persistent, self-evolving Skill Memory of reusable code snippets. This architecture not only provides transparent mutation traces but also enables cross-run programmatic knowledge transfer. Extensive evaluations across control benchmarks (Lunar Lander, Acrobot, Pendulum) and a 36-dimensional antenna array synthesis task demonstrate exceptional sample efficiency. Notably, REFLEX solves Acrobot and Pendulum in under 10 LLM calls and reaches a best Normalized Weighted Score of 1.092 on Lunar Lander, achieving highly competitive final performance while significantly accelerating the early-stage discovery of transparent policies.

2606.16978 2026-06-16 cs.RO cs.LG cs.SY eess.SY cross-list

Task-Error Residual Learning for Real-Robot Five-Ball Juggling

Kai Ploeger, Jan Peters

Affiliations: Technical University of Darmstadt; German Research Center for AI (DFKI); Hessian Center for Artificial Intelligence (hessian.AI)

AI summary: Proposes a residual learning method built on directional task-error supervision and task-error-model-driven sample selection, achieving stable three-, four-, and five-ball juggling on Barrett WAM arms; after the first attempt fails, task error decreases monotonically with no further failures.

Comments: Submitted to the 2026 International Symposium on Robotics Research (ISRR)

Abstract

For residual learning that refines existing behavior, sample efficiency depends on two things: how much information each rollout returns, and how efficiently the learner uses that information. Reinforcement learning's standard scalar reward carries far less information than the directional task error that defines the task. Random exploration further discards whatever information each rollout returns. Through residual learning with directional task-error supervision and a task error model that drives sample selection, we achieve stable three-, four-, and five-ball juggling on anthropomorphic Barrett WAM arms. Despite planning and controlling through a simple, idealized stack, the system converges from the second attempt. The first attempt drops, after which task error decreases monotonically without further failures. In comparison, five-ball juggling typically takes humans years of practice. We compare residual learners across two ternary axes, the directional information in the learning feedback and the commitment of the analytic prior, spanning Newton-style Jacobian updates, Composite Bayesian Optimization, and stochastic search methods. Both axes prove necessary: neither directional feedback nor an informative prior suffices alone, and the simplest method that combines them, a fixed-Jacobian Newton update, is the most reliable. The learned residual tolerates substantial prior misalignment and degraded joint tracking, affecting mainly convergence speed. The bottleneck for residual learning on real robots is therefore the information content of the supervision signal and how the learner uses it, not the accuracy of the surrounding stack. Video documentation of all experiments is available at https://kai-ploeger.com/residual-juggling.
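
A fixed-Jacobian Newton update reuses one prior Jacobian for every correction, so each rollout's directional error is turned directly into a parameter step. The toy below is a sketch under assumptions: a two-parameter linear "system" whose true Jacobian differs from the prior, with `toy_task_error` and `J_prior` invented for illustration and unrelated to the paper's juggling setup.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve_2x2(J: Matrix, e: Vector) -> Vector:
    """Solve J x = e for a 2x2 matrix by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [
        (e[0] * J[1][1] - J[0][1] * e[1]) / det,
        (J[0][0] * e[1] - e[0] * J[1][0]) / det,
    ]


def fixed_jacobian_newton(
    task_error: Callable[[Vector], Vector],
    theta: Vector,
    J_prior: Matrix,
    iterations: int,
) -> Tuple[Vector, List[float]]:
    """Iterate theta <- theta - J_prior^{-1} e(theta), recording the max-abs error."""
    history = []
    for _ in range(iterations):
        e = task_error(theta)  # one "rollout" returns a directional error vector
        history.append(max(abs(x) for x in e))
        step = solve_2x2(J_prior, e)
        theta = [t - s for t, s in zip(theta, step)]
    return theta, history


def toy_task_error(theta: Vector) -> Vector:
    # Hypothetical real system: its true Jacobian is not the identity.
    return [1.2 * theta[0] + 0.1 * theta[1] - 1.0,
            -0.1 * theta[0] + 0.9 * theta[1] + 0.5]


J_prior = [[1.0, 0.0], [0.0, 1.0]]  # idealized, misaligned prior Jacobian
theta, errors = fixed_jacobian_newton(toy_task_error, [0.0, 0.0], J_prior, iterations=10)
```

In this toy case the error shrinks on every iteration even though the prior is wrong, because the mismatch between the prior and true Jacobians is small enough for the iteration to contract. A larger mismatch would slow convergence and, past a point, break it.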

2606.16995 2026-06-16 cs.AI cs.LG cross-list

When in Doubt, Plan It Out: Committed Small Language Model Deliberation for Reactive Reinforcement Learning

Nathan Gavenski, Juarez Monteiro, Francisco Galuppo, Adriano Veloso, Odinaldo Rodrigues

AI summary: Proposes PACT, a hybrid architecture that pairs a fast reactive reinforcement learning policy with a slow small-language-model planner, which asynchronously generates and verifies candidate action plans to improve the policy's performance in unfamiliar environments.

Comments: LM4Plan Workshop at ICML 2026

2606.17011 2026-06-16 cs.RO cs.LG cross-list

ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning

Wei Xiao, Weiliang Tang, Yuying Ge, Hui Zhou, Yao Mu, Li Zhang, Yixiao Ge

Affiliations: XPENG Robotics; Fudan University; The Chinese University of Hong Kong; Shanghai Jiao Tong University

AI summary: Proposes the ROVE framework, which uses reinforcement learning and optimistic value estimation to learn high-value behaviors from suboptimal human intervention trajectories, improving humanoid manipulation performance.

Abstract

Human interventions provide crucial corrective signals for post-training Vision-Language-Action (VLA) models. However, enabling seamless humanoid interventions is a formidable systems challenge due to complex whole-body kinematics and dexterous-hand control. Consequently, the collected intervention trajectories are often suboptimal, and methods that rely on human interventions as expert supervision can absorb hesitant, inefficient, or even erroneous behaviors. To address both the system and algorithmic challenges, we propose ROVE, a reinforcement learning framework for humanoid VLA post-training with imperfect human interventions. First, ROVE introduces a human-in-the-loop pipeline capable of collecting deployment and intervention data for humanoid manipulation. Second, it utilizes Optimistic Value Estimation (OVE) to prioritize high-value behaviors from mixed-quality trajectories. To further robustify value estimation, we incorporate cross-embodiment human experience videos to provide rich supervision for long-tailed failure and recovery modes. The resulting critic yields informative advantage signals, steering the VLA actor to focus on high-value behaviors rather than indiscriminately imitating all actions. On challenging real-world contact-rich and fine-grained humanoid manipulation tasks, ROVE outperforms experience-learning baselines and consistently improves across multiple rollout-intervention iterations.

2606.17043 2026-06-16 cs.RO cs.LG cross-list

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Tongyan Fang, Siyuan Huang, Naiyu Fang, Ganlong Zhao, Zhongjin Luo, Jianbo Liu, Xiaogang Wang, Ying Dong, Hongsheng Li

Affiliations: ACE Robotics; Shenzhen International Graduate School, Tsinghua University; The Chinese University of Hong Kong

AI summary: Proposes Hierarchical Advantage-Weighted Behavior Cloning (HABC), which separates the viability and efficiency objectives and balances them adaptively to address credit assignment when fine-tuning VLA policies online from sparse binary outcomes, raising success rates on three contact-rich bimanual tasks from 12-44% to 38-92%.

Comments: Website: https://acerobotics-vla.github.io/HABC-Website

Abstract

When pretrained VLA policies are fine-tuned through online RL, each rollout episode produces only a single binary outcome (success or failure), yet the actor update requires per-transition supervision. Existing approaches commonly reduce this sparse outcome to a single scalar reward or advantage signal, which conflates distinct forms of transition-level feedback and provides limited guidance once basic task success becomes achievable. First, a single scalar signal conflates the two objectives of viability and efficiency; once basic success is achieved, the binary label provides no gradient to distinguish efficient completions from slow ones. Second, real-world rollouts mix autonomous and intervention segments; naively assigning episode outcomes across these boundaries introduces incorrect credit assignment. To address these issues, we propose Hierarchical Advantage-Weighted Behavior Cloning (HABC), which trains separate critic heads for these two objectives on different data subsets and combines their outputs with a state-adaptive balance. A state-adaptive gate $g_t$ merges their one-step advantages, prioritizing viability when success is uncertain and shifting to efficiency only when viability is high, and converts the result into per-transition weights on the actor loss. Intervention-aware credit assignment further restricts outcome labels to segments executed by the current policy, preventing supervision from leaking across intervention boundaries. In real-robot experiments on three contact-rich bimanual tasks, HABC raises success from supervised fine-tuning (SFT) baselines of 36%, 44%, and 12% to 92%, 88%, and 38%.
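
The gating idea can be sketched as a convex blend of two one-step advantages, followed by an exponential weight on the behavior-cloning loss. Everything specific here is an assumption made for illustration: the sigmoid gate, its centre at 0.8, the temperature, and the clipping value do not come from the abstract, which only states that $g_t$ favors viability when success is uncertain and efficiency when viability is high.

```python
import math


def gated_advantage(adv_viability: float, adv_efficiency: float, viability: float) -> float:
    """Blend two one-step advantages; the gate g_t opens as predicted viability rises."""
    # Hypothetical gate: a steep sigmoid centred at viability 0.8.
    g_t = 1.0 / (1.0 + math.exp(-20.0 * (viability - 0.8)))
    return (1.0 - g_t) * adv_viability + g_t * adv_efficiency


def transition_weight(advantage: float, temperature: float = 1.0, max_weight: float = 10.0) -> float:
    """Advantage-weighted behavior cloning weight, exp(A / temperature), clipped."""
    return min(math.exp(advantage / temperature), max_weight)


# Same pair of advantages, evaluated in an uncertain state and in a safe one.
uncertain = gated_advantage(adv_viability=1.0, adv_efficiency=-1.0, viability=0.3)
safe = gated_advantage(adv_viability=1.0, adv_efficiency=-1.0, viability=0.99)
```

With identical critic outputs, the transition is up-weighted in the uncertain state (it helps viability) and down-weighted in the safe state (it is slow), which is the behavior the abstract attributes to the state-adaptive balance.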

2409.18909 2026-06-16 cs.LG cs.IT math.IT stat.ML updated version

Best Arm Identification with Minimal Regret

Junwen Yang, Vincent Y. F. Tan, Tianyuan Jin

Affiliations: Institute of Operations Research and Analytics, National University of Singapore; Department of Mathematics, Department of Electrical and Computer Engineering, and Institute of Operations Research and Analytics, National University of Singapore; Department of Mathematics, National University of Singapore

AI summary: Introduces the problem of identifying the best arm with confidence level δ while minimizing cumulative regret, derives an information-theoretic lower bound, and designs the asymptotically optimal Double KL-UCB algorithm.

Abstract

Motivated by real-world applications that necessitate responsible experimentation, we introduce the problem of best arm identification (BAI) with minimal regret. This variant of the multi-armed bandit problem elegantly amalgamates two of its most ubiquitous objectives: regret minimization and BAI. More precisely, the agent's goal is to identify the best arm with a prescribed confidence level $δ$, while minimizing the cumulative regret up to the stopping time. Focusing on single-parameter exponential families of distributions, we leverage information-theoretic techniques to establish an instance-dependent lower bound on the expected cumulative regret. Moreover, we present an impossibility result that underscores the tension between cumulative regret and sample complexity in fixed-confidence BAI. Complementarily, we design and analyze the Double KL-UCB algorithm, which achieves asymptotic optimality as the confidence level tends to zero. Notably, this algorithm employs two distinct confidence bounds to guide arm selection in a randomized manner. Our findings elucidate a fresh perspective on the inherent connections between regret minimization and BAI.
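
For background, the building block behind KL-UCB-style algorithms is an upper confidence bound defined through a KL divergence constraint. The sketch below computes the standard single KL-UCB index for Bernoulli rewards by bisection; it is not the Double KL-UCB algorithm, whose two confidence bounds and randomization are not specified in the abstract. The exploration `budget` (for example, a logarithmic term) is left as a parameter.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean: float, pulls: int, budget: float, iterations: int = 50) -> float:
    """Largest q >= mean with pulls * KL(mean, q) <= budget, found by bisection."""
    low, high = mean, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if pulls * bernoulli_kl(mean, mid) <= budget:
            low = mid   # mid is still plausible: move the lower end up
        else:
            high = mid  # mid is too optimistic: move the upper end down
    return low


# Same empirical mean, different numbers of pulls.
few = kl_ucb_index(mean=0.5, pulls=10, budget=math.log(100))
many = kl_ucb_index(mean=0.5, pulls=1000, budget=math.log(100))
```

The index tightens toward the empirical mean as an arm is pulled more often, so rarely pulled arms keep a wide optimistic bound and continue to be explored.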

2501.19401 2026-06-16 cs.LG stat.ML updated version

DAL: A Practical Prior-Free Black-Box Framework for Piecewise Stationary Bandits

Argyrios Gerogiannis, Yu-Han Huang, Subhonmesh Bose, Venugopal V. Veeravalli

Affiliations: Georgia Institute of Technology; University of California, Berkeley

AI summary: Proposes the detection-augmented learning (DAL) framework, which requires no prior knowledge of the non-stationarity, combines any optimal stationary bandit algorithm with a change detector, and outperforms existing methods across a range of non-stationary scenarios.

Comments: 28 pages, 12 figures

2502.19544 2026-06-16 cs.LG cs.RO updated version

Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data

Yi Zhao, Aidan Scannell, Wenshuai Zhao, Yuxin Hou, Tianyu Cui, Le Chen, Dieter Büchler, Arno Solin, Juho Kannala, Joni Pajarinen

Affiliations: Aalto University; University of Edinburgh; ELLIS Institute Finland; Deep Render; Imperial College London; Max Planck Institute for Intelligent Systems; CIFAR AI Chair; University of Alberta; Alberta Machine Intelligence Institute (Amii); University of Oulu

AI summary: Uses reward-free, mixed-quality, multi-embodiment non-curated offline data and addresses distributional shift through experience rehearsal and execution guidance, substantially improving the sample efficiency of online reinforcement learning.

Abstract

Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by leveraging abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and effectively use the offline data, we propose two techniques: (i) experience rehearsal and (ii) execution guidance. With these modifications, the non-curated offline data substantially improves RL's sample efficiency. Under limited sample budgets, our method achieves nearly twice the aggregate score of learning-from-scratch baselines across 72 visuomotor tasks spanning 6 embodiments. On challenging tasks such as locomotion and robotic manipulation, it outperforms prior methods that utilize offline data by a decent margin.

2510.01721 2026-06-16 cs.LG updated version

Finite-Time Convergence of Distributionally Robust Q-Learning with Linear Function Approximation

Saptarshi Mandal, Yashaswini Murthy, R. Srikant

Affiliations: ECE and CSL, University of Illinois Urbana-Champaign; Computing and Mathematical Sciences, California Institute of Technology; ECE, CSL, and NCSA, University of Illinois Urbana-Champaign

AI summary: For discounted robust reinforcement learning with an unknown nominal model, proposes a model-free robust Q-learning algorithm that combines target networks with dual function approximation, and proves finite-time convergence to the optimal robust Q-function.

Comments: Preprint. 54 pages, 2 figures

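A Sharp Analysis of Linear Ensemble Sampling

David Janz, Arya Akhavan, Csaba Szepesvári

Affiliations: University of Oxford, UK; University of Alberta, Canada

AI summary: For linear ensemble sampling (ES) in stochastic linear bandits, proves that with ensemble size $m = \Theta(d \log n)$, ES attains a high-probability regret of $\tilde{O}(d^{3/2}\sqrt{n})$, narrowing the gap to the Thompson sampling benchmark while keeping computation comparable.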

2602.08210 2026-06-16 cs.LG stat.ML updated version

CADO: From Imitation to Cost Minimization for Heatmap-based Solvers in Combinatorial Optimization

Hyungseok Song, Deunsol Yoon, Kanghoon Lee, Han-Seul Jeong, Soonyoung Lee, Woohyung Lim

Affiliations: LG AI Research

AI summary: To address the objective mismatch between imitation loss and cost minimization in supervised training of heatmap-based solvers, proposes the CADO framework, which directly optimizes the cost of decoded solutions through reinforcement learning fine-tuning and achieves state-of-the-art performance on multiple benchmarks.

Comments: 22 pages, 4 figures. Accepted for publication in Transactions on Machine Learning Research (TMLR), 2026. OpenReview: https://openreview.net/forum?id=fvxx5FOED6

Abstract

Heatmap-based solvers have emerged as a promising paradigm for Combinatorial Optimization (CO). However, we argue that the dominant Supervised Learning (SL) training paradigm suffers from a fundamental objective mismatch: minimizing imitation loss (e.g., cross-entropy) does not guarantee solution cost minimization. We dissect this mismatch into two deficiencies: Decoder-Blindness (being oblivious to the non-differentiable decoding process) and Cost-Blindness (prioritizing structural imitation over solution quality). We empirically demonstrate that these intrinsic flaws impose a hard performance ceiling. To overcome this limitation, we propose CADO (Cost-Aware Diffusion models for Optimization), a streamlined Reinforcement Learning fine-tuning framework that formulates the diffusion denoising process as an MDP to directly optimize the post-decoded solution cost. We introduce Label-Centered Reward, which repurposes ground-truth labels as unbiased baselines rather than imitation targets, and Hybrid Fine-Tuning for parameter-efficient adaptation. CADO achieves state-of-the-art performance across diverse benchmarks, validating that objective alignment is essential for unlocking the full potential of heatmap-based solvers.
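
One plausible reading of a label-centered reward is to score a decoded solution by how its cost compares with the cost of the ground-truth label, so the label acts as a baseline rather than an imitation target. The sketch below applies that reading to a tiny TSP instance; the exact reward used by CADO is not given in the abstract, and `label_centered_reward` is a hypothetical name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tour_length(points: List[Point], tour: List[int]) -> float:
    """Length of the closed tour that visits the points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def label_centered_reward(points: List[Point], decoded_tour: List[int], label_tour: List[int]) -> float:
    """Assumed form: cost of the label minus cost of the decoded solution."""
    return tour_length(points, label_tour) - tour_length(points, decoded_tour)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
label = [0, 1, 2, 3]      # perimeter tour, length 4
crossing = [0, 2, 1, 3]   # uses both diagonals, so it is longer
reward_same = label_centered_reward(square, label, label)
reward_worse = label_centered_reward(square, crossing, label)
```

Under this form the reward depends only on the decoded solution's cost, so a decoded tour that differs from the label but is equally short is not penalized, unlike under a cross-entropy imitation loss.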

2602.20804 2026-06-16 cs.LG cs.MA updated version

Probing Dec-POMDP Reasoning in Cooperative MARL

Kale-ab Tessera, Leonard Hinckeldey, Riccardo Zamboni, David Abel, Amos Storkey

Affiliations: University of Edinburgh

AI summary: Analyzes baseline policies with statistical and information-theoretic probes and finds that most benchmarks do not require genuine Dec-POMDP reasoning: reactive policies match memory-based ones, and coordination often relies on brittle synchronous coupling.

Comments: To appear at the 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2026), added DOI

DOI: 10.65109/ECCJ1033
Journal ref: AAMAS 2026

Abstract

Cooperative multi-agent reinforcement learning (MARL) is typically framed as a decentralised partially observable Markov decision process (Dec-POMDP), a setting whose hardness stems from two key challenges: partial observability and decentralised coordination. Genuinely solving such tasks requires Dec-POMDP reasoning, where agents use history to infer hidden states and coordinate based on local information. Yet it remains unclear whether popular benchmarks actually demand this reasoning or permit success via simpler strategies. We introduce a diagnostic suite combining statistically grounded performance comparisons and information-theoretic probes to audit the behavioural complexity of baseline policies (IPPO and MAPPO) across 37 scenarios spanning MPE, SMAX, Overcooked, Hanabi, and MaBrax. Our diagnostics reveal that success on these benchmarks rarely requires genuine Dec-POMDP reasoning. Reactive policies match the performance of memory-based agents in over half the scenarios, and emergent coordination frequently relies on brittle, synchronous action coupling rather than robust temporal influence. These findings suggest that some widely used benchmarks may not adequately test core Dec-POMDP assumptions under current training paradigms, potentially leading to over-optimistic assessments of progress. We release our diagnostic tooling to support more rigorous environment design and evaluation in cooperative MARL.

2603.23249 2026-06-16 cs.LG cs.AI math.OC updated version

A Learning Method with Gap-Aware Generation for Heterogeneous DAG Scheduling

Ruisong Zhou, Haijun Zou, Li Zhou, Chumin Sun, Zaiwen Wen

Affiliations: School of Mathematical Science, Peking University; State Key Laboratory of Mathematical Sciences, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences; Theory Lab, Central Research Institute, 2012 Labs, Huawei Technologies Co., Ltd; Beijing International Center for Mathematical Research, Peking University

AI summary: Proposes WeCAN, an end-to-end reinforcement learning framework that models task-pool compatibility with a weighted cross-attention encoder and introduces a skip-extended generation mechanism to eliminate generation-induced gaps, outperforming strong baselines on real-world DAGs such as TPC-H.

Comments: 31 pages, 8 figures

Abstract

Efficient scheduling of directed acyclic graphs (DAGs) is a core problem in large-scale data-intensive computing systems, where query plans, data-processing workloads, and computation graphs consist of dependent tasks competing for limited heterogeneous resource pools. In practice, achieving high-performance execution requires schedulers to adapt across environments with varying resource pools and task types, while generating schedules under tight runtime budgets. We propose WeCAN, an end-to-end reinforcement learning framework for heterogeneous DAG scheduling that addresses task-pool compatibility coefficients and generation-induced optimality gaps. It adopts a two-stage single-pass design: a single forward pass produces task-pool scores and global parameters, followed by a generation map that constructs schedules without repeated network calls. Its weighted cross-attention encoder models task-pool interactions gated by compatibility coefficients, and is size-agnostic to environment fluctuations. Moreover, widely used list-scheduling maps can incur generation-induced optimality gaps from restricted reachability. We introduce an order-space analysis that characterizes the reachable set of generation maps via feasible schedule orders, explains the mechanism behind generation-induced gaps, and yields sufficient conditions for gap elimination. Guided by these conditions, we design a skip-extended realization with an analytically parameterized decreasing skip rule, which enlarges the reachable order set while preserving single-pass efficiency. Experiments on real-world TPC-H query DAGs, resource-intensive workload datasets, and ML-compiler computation graphs demonstrate improved makespan over strong baselines, with inference time comparable to classical heuristics and faster than multi-round neural schedulers.
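The abstract measures WeCAN against classical list scheduling. As a rough illustration of that baseline only (this is not WeCAN, and not the paper's generation map), the sketch below runs a greedy list scheduler on a small DAG. The task names, per-pool durations, priority order, and the assumption that each pool runs one task at a time are all hypothetical.

```python
from typing import Dict, List, Tuple

def list_schedule(
    order: List[str],
    preds: Dict[str, List[str]],
    duration: Dict[str, Dict[str, float]],
    pools: List[str],
) -> Tuple[Dict[str, Tuple[str, float, float]], float]:
    """Greedy list scheduling: place each task on the pool that finishes it earliest.

    `order` must be a topological order of the DAG (a hypothetical input here).
    `duration[task][pool]` is the run time of `task` on `pool`; a pool that is
    missing from `duration[task]` is treated as incompatible with the task.
    Assumption: every pool executes one task at a time.
    """
    pool_free = {p: 0.0 for p in pools}   # time at which each pool becomes idle
    placed: Dict[str, Tuple[str, float, float]] = {}
    for task in order:
        # A task may start only after all of its predecessors have finished.
        ready = max((placed[q][2] for q in preds.get(task, [])), default=0.0)
        best = None
        for pool, run_time in duration[task].items():
            start = max(ready, pool_free[pool])
            finish = start + run_time
            if best is None or finish < best[2]:
                best = (pool, start, finish)
        placed[task] = best
        pool_free[best[0]] = best[2]
    makespan = max(finish for _, _, finish in placed.values())
    return placed, makespan

# Hypothetical three-task DAG: "c" depends on "a" and "b".
preds = {"c": ["a", "b"]}
duration = {
    "a": {"cpu": 2.0, "gpu": 1.0},
    "b": {"cpu": 2.0},
    "c": {"gpu": 1.0},
}
placed, makespan = list_schedule(["a", "b", "c"], preds, duration, ["cpu", "gpu"])
```

Because the schedule is fully determined by the order in which tasks are listed, a map of this kind can only reach the schedules that some order produces, which is the reachability question the abstract raises.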


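2603.27450 2026-06-16 cs.LG Updated version

FlowRL: A Taxonomy and Modular Framework for Reinforcement Learning with Diffusion Policies

Chenxiao Gao, Edward Chen, Tianyi Chen, Bo Dai

AI summary: Proposes a unified taxonomy of reinforcement learning algorithms for diffusion/flow policies, builds a modular JAX framework, and provides systematic comparisons across multiple benchmarks to guide algorithm design and application.

Comments: Accepted by RLC 2026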

Abstract

Thanks to their remarkable flexibility, diffusion models and flow models have emerged as promising candidates for policy representation. However, efficient reinforcement learning (RL) upon these policies remains a challenge due to the lack of explicit log-probabilities for vanilla policy gradient estimators. While numerous attempts have been proposed to address this, the field lacks a unified perspective to reconcile these seemingly disparate methods, thus hampering ongoing development. In this paper, we bridge this gap by introducing a comprehensive taxonomy for RL algorithms with diffusion/flow policies. To support reproducibility and agile prototyping, we introduce a modular, JAX-based open-source codebase that leverages JIT-compilation for high-throughput training. Finally, we provide systematic and standardized benchmarks across Gym-Locomotion, DeepMind Control Suite, and IsaacLab, offering a rigorous side-by-side comparison of diffusion-based methods and guidance for practitioners to choose proper algorithms based on the application. Our work establishes a clear foundation for understanding and algorithm design, a high-efficiency toolkit for future research in the field, and an algorithmic guideline for practitioners in generative models and robotics. Our code is available at https://github.com/typoverflow/flow-rl.


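2604.13085 2026-06-16 cs.LG cs.AI Updated version

Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments

Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, Satyasaran Changdar

Affiliations: GitHub

AI summary: Proposes Adaptive Memory Crystallization (AMC), an architecture inspired by synaptic tagging and capture theory that uses a three-phase memory hierarchy governed by a stochastic differential equation for continual reinforcement learning, improving forward transfer, reducing catastrophic forgetting, and lowering memory footprint on multiple benchmarks.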

Abstract

Autonomous AI agents operating in dynamic environments face a persistent challenge: acquiring new capabilities without erasing prior knowledge. We present Adaptive Memory Crystallization (AMC), a memory architecture for progressive experience consolidation in continual reinforcement learning. AMC is conceptually inspired by the qualitative structure of synaptic tagging and capture (STC) theory, the idea that memories transition through discrete stability phases, but makes no claim to model the underlying molecular or synaptic mechanisms. AMC models memory as a continuous crystallization process in which experiences migrate from plastic to stable states according to a multi-objective utility signal. The framework introduces a three-phase memory hierarchy (Liquid-Glass-Crystal) governed by an Itô stochastic differential equation (SDE) whose population-level behavior is captured by an explicit Fokker-Planck equation admitting a closed-form Beta stationary distribution. We provide proofs of: (i) well-posedness and global convergence of the crystallization SDE to a unique Beta stationary distribution; (ii) exponential convergence of individual crystallization states to their fixed points, with explicit rates and variance bounds; and (iii) end-to-end Q-learning error bounds and matching memory-capacity lower bounds that link SDE parameters directly to agent performance. Empirical evaluation on Meta-World MT50, Atari 20-game sequential learning, and MuJoCo continual locomotion consistently shows improvements in forward transfer (+34-43% over the strongest baseline), reductions in catastrophic forgetting (67-80%), and a 62% decrease in memory footprint.


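2605.01961 2026-06-16 cs.LG Updated version

Optimizing Global and Local Crossing Numbers Using Reinforcement Learning

Timo Brand, Henry Förster, Stephen Kobourov, Daniel Kohrt, Robin Schukrafft, Markus Wallinger, Johannes Zink

Affiliations: Technical University of Munich, Heilbronn, Germany; John Cabot University, Rome, Italy; Technical University of Munich, Garching, Germany

AI summary: Treats graph drawing as a single-player optimization game and uses reinforcement learning to reduce edge crossings by moving vertices; the learned strategy targets either the global or the local crossing number and is competitive at minimizing the local crossing number.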

Abstract

Graph drawing concerns the algorithmic visualization of graphs. A good drawing of a graph is easy to read and facilitates solving tasks on the graph. Several properties have been identified to occur in good drawings of graphs. Such properties include a low number of crossings, large angles between edges, short edges, and depicting symmetries. Many of these properties are explicitly measurable metrics. This brings us to the insight that graph drawing can be seen as a game. In this paper, we study a single-player optimization game in which the player iteratively moves vertices of a straight-line graph drawing to reduce edge crossings. This game arose naturally from the automatic track of the Graph Drawing Challenge, where solutions are obtained by repeatedly performing local vertex movements. We formalize this process as a game with full information and investigate whether reinforcement learning can discover effective strategies for playing it. Our reinforcement-learning agent observes the local geometric and structural context of a vertex and selects a movement direction with the goal of reducing either the global or the local crossing number, that is, the total number of crossings or the maximum number of crossings per edge. We compare the resulting strategies to existing methods and established crossing-minimization heuristics on standard benchmark graphs. While our approach does not out-compete state-of-the-art methods for minimizing the global crossing number, it is competitive and often superior for minimizing the local crossing number.
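The two objectives named in the abstract, the total number of crossings and the maximum number of crossings on any single edge, can be computed directly from vertex coordinates. The sketch below is a minimal illustration, not the authors' code; it assumes a straight-line drawing in general position (no three vertices collinear), and the function and variable names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int]

def _orient(p: Point, q: Point, r: Point) -> float:
    """Signed area test: > 0 if p, q, r turn counter-clockwise, < 0 if clockwise."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def _cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segments ab and cd cross at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def crossing_numbers(pos: Dict[int, Point], edges: List[Edge]) -> Tuple[int, int]:
    """Return (global, local) crossing counts of a straight-line drawing."""
    per_edge = [0] * len(edges)
    total = 0
    for (i, e), (j, f) in combinations(enumerate(edges), 2):
        if set(e) & set(f):
            continue  # edges sharing a vertex are not counted as crossing
        if _cross(pos[e[0]], pos[e[1]], pos[f[0]], pos[f[1]]):
            total += 1
            per_edge[i] += 1
            per_edge[j] += 1
    return total, max(per_edge, default=0)

# K4 drawn on the corners of a square: the two diagonals cross once.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
square = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
before = crossing_numbers(square, k4)

# One vertex move (vertex 2 into the triangle 0-1-3) removes the crossing.
moved = dict(square)
moved[2] = (0.25, 0.25)
after = crossing_numbers(moved, k4)
```

The second call shows the kind of single-vertex move the game is built on: relocating one vertex changes both counts, and an agent would be rewarded for moves that lower whichever count is being optimized.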


2510.01444 2026-06-16 cs.AI cs.CL cs.LG Updated version

Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning

Rui Liu, Dian Yu, Tong Zheng, Runpeng Dai, Zongxia Li, Wenhao Yu, Zhenwen Liang, Linfeng Song, Haitao Mi, Pratap Tokekar, Dong Yu

Affiliations: Tencent Hunyuan; University of Maryland; University of North Carolina

AI summary: Proposes DUPL, which quantifies perceptual uncertainty and output uncertainty to guide policy updates, improving accuracy on multiple multimodal reasoning benchmarks and outperforming existing methods.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has advanced reasoning capabilities in multimodal large language models. However, existing methods typically treat visual inputs as deterministic, overlooking the perceptual ambiguity inherent to the visual modality. Consequently, they fail to distinguish whether a model's uncertainty stems from complex reasoning or ambiguous perception, preventing the targeted allocation of exploration or learning signals. To address this gap, we introduce DUPL, a dual-uncertainty guided policy learning approach for multimodal RLVR that quantifies and leverages both perceptual uncertainty (via symmetric KL divergence) and output uncertainty (via policy entropy) to guide policy updates. By establishing an uncertainty-driven feedback loop and employing a dynamic branch prioritization mechanism, DUPL recalibrates the policy advantage to focus learning on states with high perceptual or decisional ambiguity, enabling effective targeted exploration beyond passive data augmentation. Evaluated on diverse multimodal reasoning benchmarks spanning mathematical and general domains, DUPL achieves solid gains. It improves Qwen2.5-VL accuracy by up to 12.3% (3B) and 7.9% (7B), and Qwen3-VL-Instruct by up to 10.7% (4B) and 12.4% (8B), consistently outperforming GRPO, while seamlessly generalizing to alternative algorithms (DAPO, +6.5% avg) and architectures (LLaVA-OneVision-1.5, +4.7% avg). These results demonstrate that DUPL is an effective and generalizable approach for multimodal RLVR.
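The abstract names two quantities, symmetric KL divergence and policy entropy, without saying which distributions they are computed on. The sketch below shows only the two formulas for discrete distributions. Treating `p` and `q` as, say, next-token distributions under two views of the same image is an assumption made for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence

def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def symmetric_kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) + KL(q || p), clipping entries at `eps` to avoid log(0).

    The two directions combine into sum_i (p_i - q_i) * log(p_i / q_i).
    Clipped inputs are not renormalized, which is a simplification.
    """
    total = 0.0
    for a, b in zip(p, q):
        a, b = max(a, eps), max(b, eps)
        total += (a - b) * math.log(a / b)
    return total

# Hypothetical distributions over two outcomes.
sharp = [0.9, 0.1]
flat = [0.5, 0.5]
gap = symmetric_kl(flat, sharp)
```

A large symmetric KL between two predictions of the same model signals that the inputs are being read differently, while high entropy signals that the output itself is undecided; these are the two kinds of ambiguity the abstract distinguishes.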


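2510.06647 2026-06-16 stat.ML cs.LG Updated version

SAAS: Self-Aware Reinforcement Learning for Mitigating Over-Search in Agentic Search

Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang, Zerui Chen, Qinggang Zhang, Jinsong Su

Affiliations: School of Informatics, Xiamen University; School of Artificial Intelligence, Jilin University

AI summary: Proposes SAAS, a reinforcement learning framework that gives LLM agents dynamic self-awareness through search boundary modeling, boundary-aware rewards, and a stage-wise optimization strategy, substantially reducing over-search without lowering accuracy.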

Abstract

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite their effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even when adequate evidence has been collected. This lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To this end, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which leverages a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search while maintaining accuracy. Our code and implementation details are released at https://github.com/XMUDeepLIT/SAAS.


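2606.15048 2026-06-16 cs.LG cs.CV New submission

Temporal Difference Learning for Diffusion Models

Qizhen Ying, Yangchen Pan, Victor Adrian Prisacariu, Junfeng Wen

AI summary: Proposes a temporal difference (TD) objective that treats the diffusion process as a Markov reward process and uses policy evaluation from reinforcement learning to enforce cross-time consistency along the denoising trajectory, significantly improving generation quality under few-step sampling.

Comments: 15 pages, 4 figures. Accepted at ICML 2026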

Abstract

Diffusion models are typically trained with objectives that focus on local denoising targets at individual time steps (or adjacent pairs), which do not enforce consistency between predictions along the denoising trajectory. This lack of cross-time consistency can degrade performance, especially for few-step samplers. We introduce a temporal difference (TD) objective that penalizes inconsistency of the model's multi-step progress along the denoising path. By reformulating the diffusion process as a Markov reward process and casting denoising as a policy evaluation problem in reinforcement learning, we derive a unified TD approach that applies to both discrete- and continuous-time diffusion formulations. We further propose a principled sample-based reweighting method that stabilizes training. Empirically, we show that using our TD training can significantly improve sample quality measured by FID, with stronger advantages when the number of sampling steps is small, highlighting its practical utility under low-computation-budget scenarios. We provide ablation studies to justify our design choices, including pairwise loss reweighting, regularization weight, and one-step stride. Overall, our TD approach can be a general drop-in that enforces cross-time consistency and improves generation quality across different diffusion generative models.
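For readers unfamiliar with the reinforcement-learning side of this framing, the sketch below shows textbook tabular TD(0) policy evaluation on a Markov reward process. It is not the paper's diffusion objective. The deterministic chain stands in, by loose analogy only, for successive denoising steps, and all names and parameter values are hypothetical.

```python
from typing import List

def td0_chain(rewards: List[float], gamma: float = 1.0,
              alpha: float = 0.5, sweeps: int = 200) -> List[float]:
    """Tabular TD(0) on a deterministic chain s_0 -> s_1 -> ... -> terminal.

    rewards[k] is received when leaving state k; the terminal value is 0.
    """
    n = len(rewards)
    values = [0.0] * (n + 1)          # values[n] is the terminal state
    for _ in range(sweeps):
        for k in range(n):
            # Bootstrapped target: immediate reward plus the current
            # estimate of the next state's value.
            target = rewards[k] + gamma * values[k + 1]
            # Move the estimate a fraction alpha toward the target.
            values[k] += alpha * (target - values[k])
    return values[:n]

# Three-step chain with unit reward per step: true values are 3, 2, 1.
estimates = td0_chain([1.0, 1.0, 1.0])
```

The update ties each state's estimate to its successor's estimate, which is the sense in which a TD objective enforces consistency across neighbouring time steps instead of fitting each step in isolation.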


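2606.15172 2026-06-16 cs.LG New submission

PHINN: A Persistent-Homology-Based Neural Network for Rare-Event Time Series Generation

Emre Yusuf, Ren Takahashi, Jayabrata Bhaduri

Affiliations: Defense.Codes (a DBA of CapaCloud Corp)

AI summary: Proposes the PHINN framework, which uses dynamic Betti curves as conditioning signals and a persistence landscape loss to maintain homology consistency, outperforming statistical and diffusion baselines in topological fidelity on financial, epidemiological, and multi-modal benchmarks.

Comments: 15 pages, 4 figures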

Abstract

Rare events in time series are critical to model but hard to learn due to data scarcity. Current generative models struggle with extreme values. We observe that rare events leave distinct topological fingerprints - transitions in Betti numbers from point-cloud embeddings - that are more stable and discriminative than statistical moments. We introduce PHINN, a flow-matching framework using dynamic Betti curves as conditioning signals and a persistence landscape loss for homology consistency. It scales to multivariate data, includes a natural-language interface to set Betti targets, supports cross-domain meta-learning and few-shot generation, and provides certified adversarial robustness. On financial, epidemiological, and multi-modal benchmarks, PHINN outperforms statistical and diffusion baselines in topological fidelity (beta-RMSE down 41-63%, transition accuracy up 84%) and matches jump-diffusion models in tail coverage while exceeding them in shape fidelity. All results have 95% confidence intervals.


2606.15793 2026-06-16 cs.LG cs.AI stat.ML New submission

Proximal Policy Optimization for Amortized Discrete Sampling

Anna Zykova-Myzina, Timofei Gritsaev, Daniil Tiapkin, Nikita Morozov

Affiliations: HSE University; Constructor University; CMAP, CNRS, École polytechnique, IPP

AI summary: Within the generative flow network framework, derives policy gradient algorithms and applies proximal policy optimization for the first time, improving convergence speed and data efficiency when sampling from discrete probability distributions.

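2606.15805 2026-06-16 cs.LG New submission

Decision-Weighted Flow Matching for Contextual Stochastic Optimization

Jize Xie, Haomiao Wu, Qiang Chen, Xiu Su, Yi Chen

Affiliations: Hong Kong University of Science and Technology; Central South University; Big Data Institute

AI summary: Proposes the Decision-Weighted Flow Matching (DW-FM) framework, which reweights the velocity-regression objective to align with downstream regret and outperforms standard methods on CVaR benchmarks.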

Abstract

Conditional generative models are increasingly used as scenario generators for stochastic optimization, but standard training objectives emphasize uniform distributional fit rather than the downstream decisions induced by generated scenarios. This creates an objective mismatch: errors in statistically common regions may have little effect on decision regret, whereas errors in decision-sensitive regions can substantially change the optimal action. We propose Decision-Weighted Flow Matching (DW-FM), a regret-aligned training framework that preserves the simplicity of standard flow matching while reweighting its velocity-regression objective using decision-sensitive endpoint information. Theoretically, we connect downstream regret to pathwise velocity mismatch through a loss-induced decision discrepancy and an adjoint transport argument, yielding an ideal regret-aligned surrogate and practical endpoint-weighted objectives with regret guarantees. Empirically, we demonstrate the effectiveness of DW-FM on three CVaR-based contextual stochastic optimization benchmarks spanning synthetic portfolio, semi-real financial, and traffic-CVaR tasks, where DW-FM improves downstream regret over standard baselines.
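To make "reweighting the velocity-regression objective" concrete, the sketch below computes a per-sample weighted flow-matching loss on straight-line interpolation paths. It is an illustration under stated assumptions, not DW-FM itself. The weights are plain placeholders, whereas the paper derives them from decision-sensitive endpoint information. The straight-line path and the normalization by the weight sum are choices made here, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]

def weighted_fm_loss(
    velocity: Callable[[Vector, float], Vector],
    x0s: Sequence[Vector],
    x1s: Sequence[Vector],
    ts: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted flow-matching loss on straight-line paths x_t = (1 - t) x0 + t x1."""
    total, weight_sum = 0.0, 0.0
    for x0, x1, t, w in zip(x0s, x1s, ts, weights):
        xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
        target = [b - a for a, b in zip(x0, x1)]   # velocity of the straight path
        pred = velocity(xt, t)
        sq_err = sum((p - u) ** 2 for p, u in zip(pred, target))
        total += w * sq_err
        weight_sum += w
    return total / weight_sum

def zero_velocity(x: Vector, t: float) -> Vector:
    """A deliberately poor model that always predicts zero velocity."""
    return [0.0 for _ in x]

x0s = [[0.0], [0.0]]
x1s = [[1.0], [3.0]]   # squared errors under the zero model: 1 and 9
ts = [0.5, 0.5]
uniform = weighted_fm_loss(zero_velocity, x0s, x1s, ts, [1.0, 1.0])
reweighted = weighted_fm_loss(zero_velocity, x0s, x1s, ts, [3.0, 1.0])
```

With uniform weights the loss is the ordinary mean squared velocity error; shifting weight toward the first sample makes its error count for more, which is how a reweighted objective can prioritize regions that matter for the downstream decision.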


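2606.17048 2026-06-16 cs.LG cs.CV stat.ML New submission

Bridging Data-Driven Priors through Score Functions for Posterior Sampling: A Comparative Review and Experimental Study

Elhadji Cisse Faye, Mame Diarra Fall, Sylvain Delchini, Nicolas Dobigeon

Affiliations: IDP, Univ Orléans; LITIS, Univ Rouen Normandie; Bureau de Recherches Géologiques et Minières Orléans, France; IRIT, Univ Toulouse

AI summary: Reviews how several data-driven priors used in Bayesian inverse problems can be unified through their score functions and integrated effectively into a sampling algorithm, with image inpainting and super-resolution experiments demonstrating the efficiency and generality of the approach.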

Abstract

This paper reviews how a diverse set of popular data-driven priors commonly used in Bayesian inverse problems can be unified through their respective score functions. By framing these priors under this common perspective, we show that they can be integrated straightforwardly and effectively into a recently proposed sampling algorithm. The applicability of this common framework is illustrated by considering several data-driven priors, namely regularization-by-denoising, normalizing flow-based priors, score-based generative models, and convex-ridge regularizers. For these four particular priors, the performance of the method is evaluated when conducting image inpainting and single image super-resolution. These results, as well as those obtained when restoring real images acquired in a geological context, demonstrate the efficiency of the method. This unified framework proves versatile enough to handle any posterior distribution defined by a broad class of score function-based priors, beyond the specific cases considered in this paper.


2606.15344 2026-06-16 cond-mat.dis-nn cs.LG physics.optics quant-ph Cross-list

Generative modelling powered by room-temperature polariton condensates

Yuan Wang, Marcin Muszynski, Avinash Dash, Rishabh Kaurav, Vinod M. Menon, Oleksandr Kyriienko

Affiliations: School of Mathematical and Physical Sciences, University of Sheffield, Sheffield S10 2TN, United Kingdom; Department of Physics, City College of New York, New York, NY 10031, USA; Physics Doctoral Program, Graduate Center of the City University of New York, New York, NY 10016, USA; Chemistry Doctoral Program, Graduate Center of the City University of New York, New York, NY 10016, USA

AI summary: Uses the nonlinear many-body dynamics and intrinsic stochasticity of room-temperature exciton-polariton condensates in organic dye microcavities as a physical stochastic transformation layer in a generative adversarial network, achieving conditional digit-to-image translation that outperforms digitally injected perturbations.

Comments: 9 pages and 4 figures in the main text; 17 pages SM; codes to be released

Abstract

Generative modelling requires efficient stochastic nonlinear transformations and physical platforms that can naturally realise them. We experimentally demonstrate that nonlinear optical systems operating in the strong light-matter coupling regime can serve as physical transformation layers for conditional generative modelling. Specifically, we develop a workflow in which room-temperature exciton-polariton condensates formed in organic dye microcavities act as a physical stochastic transform within a generative adversarial network and enable conditional digit-to-image translation. By using the nonlinear many-body dynamics and intrinsic stochasticity of polariton condensates, the workflow outperforms baseline approaches based on digitally injected perturbations. We find that polariton-enabled sampling via generative adversarial network (Polariton GAN) yields improved inception score, digit preservation accuracy and structural similarity compared with both digital sampling and laser-based systems. We further show that spatially correlated output variations can naturally regularise adversarial training and enhance output diversity. Our results establish polariton condensation as a new computational resource for generative modelling, opening a pathway towards physics-enhanced machine learning systems.


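2606.15442 2026-06-16 stat.ML cs.LG Cross-list

The Reverse Telescoping Coordinate System for Positive Definite Matrices: Geometry, Computation, and Generative Modeling

Anindya Bhadra

Affiliations: Purdue University

AI summary: Proposes a new unconstrained coordinate system that represents symmetric positive definite matrices through a reverse telescoping map, yielding a Jacobian that depends only on the log-determinant and a symbolic representation of both the matrix and its inverse, and designs a split volume-shape flow model for generative modeling.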

Abstract

We design a new unconstrained coordinate system where a $p\times p$ symmetric positive definite (SPD) matrix $Θ$ is represented by a reverse telescoping map $Θ(x)=\rm{RT}(x)$, with $x=(v,d,r)\in\mathbb{R}\times\mathbb{R}^{(p-1)}\times\mathbb{R}^{p(p-1)/2}$, representing respectively the log volume or log determinant; and the shape, as encoded by log relative diagonal scales and partial covariances among the nodes. This construction results in important properties not available in other charts, e.g., matrix logarithm, such as Jacobian depending on only the log-determinant. A useful feature of our construction is $x$ contains a lossless symbolic representation of both the matrix and its inverse. Many important computations involving a matrix and its inverse can be performed in $O(p^2)$ in the transformed domain, while it is the rendering of results in matrix forms (on demand) that must incur an $O(p^3)$ cost. Moreover, two unit-determinant matrices in the transformed domain can be joined by a straight line with pathwise unit determinant. For generative modeling, this allows designing a split volume-shape flow model trained by conditional flow matching for transporting the shape over the unit-determinant path, with a separate one-dimensional flow for transporting the volume or the determinant. The forbidding SPD constraint, tamed thus into a powerful guiding force, leads to the surprising insight that it is in some sense easier to design a volume-normalized shape flow for SPD compared to the unconstrained $\mathbb{R}^{p\times p}$, with no intrinsic notion of volume to aid normalization, unlike the determinant of SPD matrices. We apply our construction for up to $p=200$ in generative modeling of SPD matrices on a difficult synthetic bimodal target, and in generating brain connectivity networks by models trained on fMRI data; as well as in intrinsic diffusion on the SPD manifold.


2606.15457 2026-06-16 cs.CV cs.LG Cross-list

Lesion-DDPM: Lesion-Enhanced 3D Diffusion for MS MRI Synthesis

Weidong Zhang, Yongchan Jung, Shafayat Mowla Anik, Furen Xiao, Vasudevan Janarthanan, Enkhzaya Chuluunbaatar, Byeong Kil Lee, Jeeho Ryoo

Affiliations: University of Texas at Arlington; University of Texas at San Antonio; University of Texas at Dallas; National Taiwan University Hospital; National University of Mongolia; University of Texas at Austin

AI summary: Proposes Lesion-DDPM, a 3D conditional diffusion framework that achieves lesion-aware FLAIR synthesis through multi-level anatomical mask injection and a lesion-weighted reconstruction loss, raising Dice scores on the downstream MS lesion segmentation task.

Abstract

3D FLAIR MRI is widely recommended as one of the standard MRI sequences for brain imaging in multiple sclerosis (MS), but publicly available MS datasets remain relatively small and vary across scanners, acquisition protocols, and lesion patterns. This scarcity and variability hinder the development of robust neuroimaging machine learning models and are particularly challenging for generative models that aim to synthesize images while preserving small, sparse lesions. We propose Lesion-DDPM, a 3D conditional diffusion framework for lesion-aware FLAIR synthesis that incorporates multi-level anatomical mask injection together with a lesion-weighted reconstruction loss to emphasize lesion voxels while maintaining global brain structure. Using a curated subset of the MSLesSeg dataset, we compare Lesion-DDPM with representative state-of-the-art GAN- and diffusion-based models, assessing both image-generation metrics and downstream 3D U-Net segmentation. In our experiments, Lesion-DDPM achieved the lowest lesion-region reconstruction error among all methods. In a downstream 3D U-Net lesion segmentation task, a model trained only on Lesion-DDPM-generated scans and evaluated on real MRIs reached a Dice score of 0.616 compared with 0.569 for the best competing synthetic dataset. When Lesion-DDPM images were added to the real training set, the Dice score further increased to 0.685.
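The Dice scores quoted above (0.616, 0.569, 0.685) measure overlap between a predicted and a reference lesion mask. The sketch below gives the standard definition on flattened binary masks. It is not the authors' evaluation code, and returning 1.0 when both masks are empty is one common convention chosen here as an assumption.

```python
from typing import Sequence

def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient 2 * |A and B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (assumption): two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2.0 * overlap / size

# Hypothetical four-voxel masks: one shared lesion voxel, one false positive.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

Because the denominator counts only foreground voxels, a handful of missed or spurious voxels moves the score substantially when lesions are small and sparse, which is why the metric is demanding for MS lesion segmentation.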


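2606.15871 2026-06-16 stat.CO cs.LG stat.ML Cross-list

Amortized mean-shift interacting particles

Ali Siahkoohi

Affiliations: Department of Computer Science, University of Central Florida

AI summary: Proposes amortized mean-shift interacting particles, a learned map that outputs weighted nodes directly from an observation and a few posterior samples without evaluating densities or scores, giving more accurate integral estimates than the same number of Monte Carlo samples.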

Abstract

Bayesian inference for inverse problems is run to evaluate integrals -- posterior expectations, tail probabilities, and risks -- across a stream of observations. The standard estimate averages the integrand over posterior samples, a Monte-Carlo average whose error decays only as the square root of the sample size, so accuracy demands many samples -- prohibitive when each one calls a partial-differential-equation forward model. Mean-shift interacting particles need far fewer: they return a small set of signed-weight nodes -- a deterministic quadrature whose weighted averages estimate those integrals. Finding the nodes, however, is a per-observation optimization that, in its most accurate form, reads the posterior score at every step -- returning the cost it meant to save. We introduce amortized mean-shift interacting particles, a learned map that emits the weighted nodes from an observation and a few posterior samples in a single forward pass. Training asks only for joint parameter-observation samples and a posterior to draw from -- a conditional normalizing flow, an empirical conditional, or any reference the user can sample -- and the map learns to integrate that posterior from samples alone, evaluating neither its density nor its score. Once trained, it generalizes to unseen observations and integrands at any node budget and improves on independent samples in two ways: by reweighting them, provably no worse than the equal weights of Monte-Carlo; and by moving them, which empirically lowers it further. Across closed-form, sampled, learned, and physics-based posteriors -- up to a thousand-coefficient groundwater field -- it integrates more accurately than the same number of samples at every budget, and a posterior-whitened, dimension-aware kernel removes the high-dimensional wall. The result is a Pareto improvement on Monte-Carlo integration, not a competitor to drawing more samples.
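Two ideas in this abstract are easy to check numerically: the square-root decay of Monte-Carlo error, and the way a few weighted nodes can integrate well. The sketch below uses a classical three-point Gauss-Hermite rule for a standard normal as a stand-in for weighted nodes. It is not the learned map from the paper, its weights happen to be positive whereas the paper's may be signed, and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence

def mc_standard_error(sigma: float, n: int) -> float:
    """Standard error of an n-sample Monte-Carlo mean for an integrand with std `sigma`."""
    return sigma / math.sqrt(n)

def weighted_estimate(f: Callable[[float], float],
                      nodes: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Quadrature-style estimate sum_i w_i * f(x_i); weights may be negative."""
    return sum(w * f(x) for x, w in zip(nodes, weights))

# Three-point Gauss-Hermite rule for a standard normal variable:
# exact for polynomial integrands up to degree five.
nodes = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
weights = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]

second_moment = weighted_estimate(lambda x: x ** 2, nodes, weights)   # E[x^2] = 1
fourth_moment = weighted_estimate(lambda x: x ** 4, nodes, weights)   # E[x^4] = 3

# Quadrupling the sample count only halves the Monte-Carlo error.
ratio = mc_standard_error(1.0, 100) / mc_standard_error(1.0, 400)
```

Three well-placed nodes recover both moments to rounding error, while an equal-weight average of random draws would need many samples for comparable accuracy; the paper's contribution, per the abstract, is learning to produce such nodes for posteriors where no classical rule is available.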

2606.16138 2026-06-16 stat.ML cs.LG 交叉投稿

Closing the Approximation Gap in Simulation-free Latent SDEs

弥合无模拟潜在随机微分方程中的近似差距

Henry D. Smith, Brian L. Trippe, Scott W. Linderman

发表机构 * Stanford University（斯坦福大学）

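AI summary: Existing simulation-free variational inference algorithms suffer degraded posterior inference and parameter learning because of restrictions in their parameterization. Helmholtz-SDE closes this approximation gap by optimizing over path laws compatible with a prescribed set of marginal distributions, recovering dynamics more accurately while remaining efficient.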

AI中文摘要

从含噪声观测中恢复动力系统是包括神经科学和物理学在内的科学领域中的反复挑战。潜在随机微分方程通过将系统建模为根据可学习SDE演化并生成观测的未观测状态来解决这一问题。变分推断为拟合潜在SDE提供了可处理的目标。传统的VI算法通过在时间离散化上进行数值模拟来评估该目标，在保真度和计算成本之间进行权衡。最近一类算法，即无模拟VI，通过其瞬时边际而不是漂移来参数化后验，从而避开了这种权衡。在这项工作中，我们表明现有无模拟VI算法的效率是有代价的：它们的参数化将近似后验限制为基于模拟的方法可用的SDE的子集，降低了后验推断和参数学习。我们提出了Helmholtz-SDE，一种无模拟VI算法，通过优化与指定边际分布集合兼容的路径律来弥合这一差距。Helmholtz-SDE比先前的无模拟方法更忠实地恢复动力学，在高后验不确定性下增益最大。它进一步以一小部分运行时间匹配基于模拟的VI的性能。

英文摘要

Recovering dynamical systems from noisy observations is a recurring challenge across scientific domains, including neuroscience and physics. Latent stochastic differential equations (SDEs) address this by modeling the system as an unobserved state that evolves according to a learnable SDE and generates the observations. Variational inference (VI) provides a tractable objective for fitting latent SDEs. Traditional VI algorithms evaluate this objective by numerical simulation over a time discretization, trading fidelity for computational cost. A recent class of algorithms, simulation-free VI, sidesteps this tradeoff by parameterizing the posterior through its instantaneous marginals rather than its drift. In this work, we show that the efficiency of existing simulation-free VI algorithms comes at a price: their parameterizations restrict the approximate posterior to a subset of the SDEs available to simulation-based methods, degrading posterior inference and parameter learning. We propose Helmholtz-SDE, a simulation-free VI algorithm that closes this gap by optimizing over path laws compatible with a prescribed collection of marginals. Helmholtz-SDE recovers dynamics more faithfully than prior simulation-free methods, with the largest gains under high posterior uncertainty. It further matches the performance of simulation-based VI at a fraction of the runtime.
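The "numerical simulation over a time discretization" that the abstract attributes to traditional VI is typically some form of Euler-Maruyama stepping. The sketch below is a minimal scalar version under assumed settings: an Ornstein-Uhlenbeck drift stands in for the learnable SDE, and the function and parameter names are hypothetical. It does not implement Helmholtz-SDE or any simulation-free objective.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, t_end, n_steps, rng):
    """Simulate dX = drift(X) dt + diffusion(X) dW on a uniform time grid."""
    dt = t_end / n_steps
    path = np.empty(n_steps + 1)
    path[0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment over one step
        path[k + 1] = path[k] + drift(path[k]) * dt + diffusion(path[k]) * dw
    return path

# Assumed example: Ornstein-Uhlenbeck process dX = -theta * X dt + sigma dW.
theta, sigma = 1.5, 0.3
path = euler_maruyama(lambda x: -theta * x, lambda x: sigma,
                      x0=2.0, t_end=1.0, n_steps=100,
                      rng=np.random.default_rng(0))
```

A finer grid (larger `n_steps`) lowers discretization error at proportionally higher cost, which is the fidelity-versus-cost tradeoff the abstract refers to.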

2606.16219 2026-06-16 cs.CE cs.LG physics.comp-ph 交叉投稿

Graphical conditional generative modeling for digital twin modeling

面向数字孪生建模的图条件生成建模

Zongren Zou, Théo Bourdais, Ricardo Baptista, Houman Owhadi

发表机构 * Department of Computing and Mathematical Sciences, California Institute of Technology（计算与数学科学系，加州理工学院）； Department of Statistical Sciences, University of Toronto（统计科学系，多伦多大学）

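AI summary: To address the fidelity problem in digital twin modeling, proposes a framework based on conditional generative models and Gaussian-process analysis of variance (kernel mode decomposition) that discovers, from observational data, the key variables influencing the target's conditional distribution, builds parsimonious stochastic surrogate models, and validates their performance on tasks such as control and reinforcement learning.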

AI中文摘要

数字孪生建模，包括模型不确定性下的控制和数据同化，通常面临一个开放式的保真度问题：增加变量、数据流和时间尺度会无限增加模型复杂度，最终产生难以维护、验证、解释以及用于压力或安全测试的系统。作为替代方案，可以寻求仅基于描述相关感兴趣量所需的变量构建的简约随机代理模型。我们引入了一个框架，通过识别哪些候选输入影响目标量的完整条件律（而不仅仅是其条件均值），从观测数据中发现此类变量。这一区别在随机、粗粒化或部分观测系统中至关重要，在这些系统中，依赖关系可能通过变异性、尾部行为、多模态或不确定性的变化而非确定性函数关系表现出来。该框架将条件生成模型（学习给定候选输入下目标的条件分布）与基于高斯过程的方差分析（通过核模式分解）相结合，从而能够迭代剪除非影响输入并发现可解释的结构。在控制设置中，得到的代理模型可以解释为学习到的马尔可夫决策过程：该方法不仅识别出转移模型，还识别出使学习到的动态过程有效马尔可夫所需的状态、动作和记忆变量。在涉及随机动力系统、缺失变量、偏微分方程控制、强化学习和经济数据的多个示例中，发现的结构产生了可解释的随机代理模型，其下游性能与在完整变量集上训练的模型相当。

英文摘要

Digital twin modeling, including control and data assimilation under model uncertainty, often faces an open-ended fidelity problem: adding variables, data streams, and time scales can indefinitely increase model complexity, ultimately producing systems that are difficult to maintain, validate, interpret, and use for stress or safety testing. As an alternative, one can seek parsimonious stochastic surrogate models built only on the variables needed to describe the relevant quantities of interest. We introduce a framework for discovering such variables from observational data by identifying which candidate inputs influence the full conditional law of a target quantity, rather than only its conditional mean. This distinction is essential in stochastic, coarse-grained, or partially observed systems, where dependencies may appear through changes in variability, tail behavior, multimodality, or uncertainty rather than through deterministic functional relationships. The framework couples conditional generative modeling, which learns the conditional distribution of the target given candidate inputs, with Gaussian-process-based analysis of variance (through kernel mode decomposition), which enables iterative pruning of non-influential inputs and interpretable structure discovery. In control settings, the resulting surrogate can be interpreted as a learned Markov decision process: the method identifies not only a transition model, but also the state, action, and memory variables needed to make the learned dynamics effectively Markovian. Across examples involving stochastic dynamical systems, missing variables, PDE control, reinforcement learning, and economic data, the discovered structures yield interpretable stochastic surrogates whose downstream performance is comparable to models trained on the full variable set.

2606.16273 2026-06-16 stat.ML cs.LG stat.ME 交叉投稿

Generative Modeling on Metric Graphs via Neural Optimal Transport

基于神经最优传输的度量图生成建模

Alessandro Micheli, Yueqi Cao, Anthea Monod, Samir Bhatt

发表机构 * Imperial College London（帝国理工学院伦敦分校）； KTH Royal Institute of Technology（皇家理工学院）； Statens Serum Institut（丹麦国家血清研究所）； University of Copenhagen（哥本哈根大学）

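AI summary: Proposes the first deep generative modeling framework for continuous distributions on metric graphs: it embeds the graph, solves an entropic Kantorovich problem with a neural semidual, and projects samples back onto the original graph. Convergence is proved theoretically, and in experiments the method matches or improves upon discrete graph OT baselines.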

AI中文摘要

据我们所知，我们提出了首个面向紧度量图上连续支撑概率分布的深度生成建模框架。给定度量图上的源测度和目标测度，我们的方法将图嵌入光滑的环境空间，通过神经半对偶参数化求解熵正则化Kantorovich问题，并将生成的样本投影回原始图。我们研究了两种嵌入几何：外在的欧几里得实现，以及到Jacobian环面的内蕴热带Abel--Jacobi嵌入。在这两种情况下，所得生成器按构造即支撑在图上。我们证明，在神经网络表达能力不断增强的联合极限下，学习到的生成器弱收敛到原始图测度之间的一个有效传输耦合。实验上，在一系列几何形态各异的图上，我们的方法达到或优于基于离散图OT的启发式传输基线，同时具有更好的可扩展性。最后，我们在纽约市曼哈顿的一百万个Uber上车点数据上训练模型，展示了该方法在真实世界城市出行数据上的可扩展性。

英文摘要

We introduce, to our knowledge, the first deep generative modeling framework for probability distributions continuously supported on compact metric graphs. Given source and target measures on a metric graph, our method embeds the graph into a smooth ambient space, solves an entropic Kantorovich problem via a neural semidual parameterization, and projects generated samples back onto the original graph. We study two embedded geometries: an extrinsic Euclidean realization and the intrinsic tropical Abel--Jacobi embedding into the Jacobian torus. In both cases, the resulting generator is graph-supported by construction. We prove that, in the joint limit of increasing neural expressivity, the learned generator converges weakly to a valid transport coupling between the original graph measures. Empirically, across a range of geometrically distinct graphs, our method matches or improves upon heuristic transport baselines based on discrete graph OT, while scaling more favorably. Finally, we demonstrate scalability on real-world urban mobility data by training our model on one million Uber pickup locations in Manhattan, New York City.

2606.16610 2026-06-16 stat.ML cs.LG 交叉投稿

Diffusion Flow Matching: Dimension-Improved KL Bounds and Wasserstein Guarantees

扩散流匹配：维度改进的KL界和Wasserstein保证

Marta Gentiloni Silveri, Giovanni Conforti, Alain Durmus

发表机构 * Ecole Polytechnique, Massy Palaiseau, France（法国高等理工学院，马希-帕莱索）

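AI summary: For diffusion flow matching based on Brownian motion, derives improved convergence bounds on the discretization error in KL divergence and 2-Wasserstein distance, achieving optimal scaling in the dimension dependence.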

2509.24223 2026-06-16 cs.LG cs.CV stat.ML 版本更新

Semantic Editing with Coupled Stochastic Differential Equations

耦合随机微分方程的语义编辑

Jianxin Zhang, Clayton Scott

发表机构 * University of California, Berkeley（加州大学伯克利分校）

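AI summary: Proposes coupled stochastic differential equations (coupled SDEs) to guide the sampling process of pretrained generative models, enabling semantic editing with high prompt fidelity and near pixel-level consistency without retraining.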

2603.17353 2026-06-16 cs.LG cs.AI 版本更新

Learning Permutation Distributions via Reflected Diffusion on Ranks

通过秩上的反射扩散学习排列分布

Sizhuang He, Yangtian Zhang, Shiyang Zhang, David van Dijk

发表机构 * University of California, Berkeley（加州大学伯克利分校）

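AI summary: Proposes the Soft-Rank Diffusion framework, which relaxes permutations into soft ranks to enable smooth diffusion and introduces a contextualized generalized Plackett-Luce denoiser; it outperforms existing diffusion methods on sorting and combinatorial optimization tasks.

Comments 18 pages including the appendix, 7 figures, 9 tables, Accepted at ICML 2026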

使用潜在变量的高效流匹配

Anirban Samaddar, Yixuan Sun, Viktor Nilsson, Sandeep Madireddy

发表机构 * Argonne National Laboratory（阿贡国家实验室）； KTH Royal Institute of Technology（皇家理工学院）

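AI summary: Proposes Latent-CFM, which conditions on data features extracted by a pretrained deep latent variable model to improve the training efficiency and generation quality of flow matching models, outperforming existing methods on image and physical field generation tasks.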

AI中文摘要

流匹配模型在概率生成模型的图像生成任务中显示出巨大潜力。然而，文献中的大多数流匹配模型在从简单源分布（如标准高斯）学习流时，并未显式利用目标数据中的潜在聚类结构。这导致学习效率低下，尤其是对于许多通常位于低维流形中的高维真实世界数据集。为此，我们提出了 $\texttt{Latent-CFM}$，它通过使用预训练的深度潜在变量模型从数据中提取的特征作为条件，提供了高效的训练策略。通过对来自多模态分布的合成数据和广泛使用的图像基准数据集的实验，我们表明，$\texttt{Latent-CFM}$ 通过采用预训练的轻量级潜在变量模型，在显著减少训练和计算量的情况下，展现出比最先进的流匹配模型更好的生成质量。除了自然图像，我们还考虑了源自物理过程的空间场的生成建模。使用二维达西流数据集，我们证明了我们的方法比竞争方法生成更物理准确的样本。此外，通过潜在空间分析，我们证明了我们的方法可用于以潜在特征为条件的条件图像生成，这增加了生成过程的可解释性。

英文摘要

Flow matching models have shown great potential in image generation tasks among probabilistic generative models. However, most flow matching models in the literature do not explicitly utilize the underlying clustering structure in the target data when learning the flow from a simple source distribution like the standard Gaussian. This leads to inefficient learning, especially for many high-dimensional real-world datasets, which often reside in a low-dimensional manifold. To this end, we present $\texttt{Latent-CFM}$, which provides efficient training strategies by conditioning on the features extracted from data using pretrained deep latent variable models. Through experiments on synthetic data from multi-modal distributions and widely used image benchmark datasets, we show that $\texttt{Latent-CFM}$ exhibits improved generation quality with significantly less training and computation than state-of-the-art flow matching models by adopting pretrained lightweight latent variable models. Beyond natural images, we consider generative modeling of spatial fields stemming from physical processes. Using a 2d Darcy flow dataset, we demonstrate that our approach generates more physically accurate samples than competing approaches. In addition, through latent space analysis, we demonstrate that our approach can be used for conditional image generation conditioned on latent features, which adds interpretability to the generation process.

2511.09465 2026-06-16 stat.ML cs.LG 版本更新

Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions

分支流：带有分裂和删除的离散、连续和流形流匹配

Lukas Billera, Hedwig Nora Nordlinder, Jack Collier Ryder, Anton Oresten, Aron Stålmarck, Theodor Mosetti Björk, Ben Murrell

发表机构 * Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet（卡罗林斯卡研究所微生物学、肿瘤和细胞生物学系）

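AI summary: Proposes the Branching Flows framework, which controls the number of sequence elements through stochastic branching and death processes, is suited to variable-length data generation, and is validated on small molecule, antibody sequence, and protein backbone generation.

Comments 39 pages, 16 figures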

AI中文摘要

扩散和流匹配方法在状态空间连续的领域（如图像生成或蛋白质折叠与设计）以及离散领域（如扩散大语言模型）中都显示出前景。当状态中的元素数量预先固定时（如图像），它们自然适用；但当元素数量事先未知时，例如大语言模型响应的长度或蛋白质链中的氨基酸数量，则需要临时性的解决方案。这里我们提出分支流，一种生成建模框架，与扩散和流匹配方法一样，将简单分布传输到数据分布。但在分支流中，状态中的元素在二叉树森林上演化，以模型学习到的速率随机分支和死亡。这使得模型在生成过程中能够控制序列中的元素数量。我们还表明，分支流可以与离散集、连续欧几里得空间、光滑流形以及混合这些组件的“多模态”乘积空间上的任何流匹配基础过程组合。我们在三个领域进行了演示：小分子生成（多模态）、抗体序列生成（离散）和蛋白质骨架生成（多模态），并表明分支流是一个学习目标稳定、性能出色的分布学习器，并且它带来了新的能力。

英文摘要

Diffusion and flow matching approaches to generative modeling have shown promise in domains where the state space is continuous, such as image generation or protein folding & design, and discrete, exemplified by diffusion large language models. They offer a natural fit when the number of elements in a state is fixed in advance (e.g. images), but require ad hoc solutions when, for example, the length of a response from a large language model, or the number of amino acids in a protein chain is not known a priori. Here we propose Branching Flows, a generative modeling framework that, like diffusion and flow matching approaches, transports a simple distribution to the data distribution. But in Branching Flows, the elements in the state evolve over a forest of binary trees, branching and dying stochastically with rates that are learned by the model. This allows the model to control, during generation, the number of elements in the sequence. We also show that Branching Flows can compose with any flow matching base process on discrete sets, continuous Euclidean spaces, smooth manifolds, and 'multimodal' product spaces that mix these components. We demonstrate this in three domains: small molecule generation (multimodal), antibody sequence generation (discrete), and protein backbone generation (multimodal), and show that Branching Flows is a capable distribution learner with a stable learning objective, and that it enables new capabilities.

2512.07212 2026-06-16 cs.AI cs.LG 版本更新

Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation

从所见中采样：基于观测嵌入随机微分方程的扩散桥视觉运动策略学习

Zhaoyang Liu, Mokai Pan, Zhongyi Wang, Kaizhen Zhu, Haotao Lu, Haipeng Zhang, Jingya Wang, Ye Shi

发表机构 * University of Science and Technology of China（中国科学技术大学）

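AI summary: Proposes BridgePolicy, which integrates observations directly into the stochastic dynamics through a diffusion-bridge formulation and uses a semantic aligner to handle heterogeneous observations, surpassing existing generative policies on simulated and real-world tasks.

Comments Accepted by ICML 2026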

AI中文摘要

基于扩散模型的模仿学习通过捕获多模态动作分布推动了机器人控制的发展。然而，现有方法通常仅将观测视为去噪网络的高层条件，而非将其整合到扩散过程本身的随机动力学中。因此，采样被迫从随机噪声开始，削弱了感知与控制之间的耦合，往往导致次优性能。我们提出BridgePolicy，一种生成式视觉运动策略，通过扩散桥公式将观测直接集成到随机动力学中。通过构建由观测提供信息的轨迹，BridgePolicy使采样能够从丰富且信息充分的先验而非随机噪声开始，显著提高了控制的精度和可靠性。一个关键难点是扩散桥通常连接维度匹配的分布，而机器人观测是异构的，且与动作天然不对齐。为克服这一点，我们引入语义对齐器来统一视觉和状态输入，并将观测与动作表示对齐，使扩散桥适用于异构机器人数据。在三个基准测试的52个模拟任务和5个真实世界任务上的大量实验表明，BridgePolicy持续优于最先进的生成式策略。我们的代码可在 https://jianghcsr.github.io/BridgePolicy_page/ 获取。

英文摘要

Imitation learning with diffusion models has advanced robotic control by capturing the multi-modal action distributions. However, existing methods typically treat observations only as high-level conditions to the denoising network, rather than integrating them into the stochastic dynamics of the diffusion process itself. As a result, the sampling is forced to begin from random noise, weakening the coupling between perception and control and often yielding suboptimal performance. We propose BridgePolicy, a generative visuomotor policy that directly integrates observations into the stochastic dynamics via a diffusion-bridge formulation. By constructing an observation-informed trajectory, BridgePolicy enables sampling to start from a rich and informative prior rather than random noise, substantially improving precision and reliability in control. A key difficulty is that diffusion bridge normally connects distributions of matched dimensionality, while robotic observations are heterogeneous and not naturally aligned with actions. To overcome this, we introduce a semantic aligner to unify the visual and state inputs and align the observations with action representations, making diffusion bridge applicable to heterogeneous robot data. Extensive experiments across 52 simulation tasks on three benchmarks and 5 real-world tasks demonstrate that BridgePolicy consistently outperforms state-of-the-art generative policies. Our code is available at https://jianghcsr.github.io/BridgePolicy_page/.

2512.15313 2026-06-16 cs.SD cs.LG 版本更新

Time-Varying Audio Effect Modeling by End-to-End Adversarial Training

通过端到端对抗训练进行时变音频效果建模

Yann Bourdin, Pierrick Legrand, Fanny Roche

发表机构 * Arturia ； Inria center at the University of Bordeaux（Inria中心，位于波尔多大学）

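AI summary: Proposes a generative adversarial network framework that models time-varying audio effects from input-output audio recordings alone, with no modulation signal extraction, achieving black-box modeling through a two-stage training strategy and a state prediction network.

Comments (03/2026) Accepted to the Journal of the Audio Engineering Society (JAES). Accompanying website: https://ybourdin.github.io/sptvmod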

AI中文摘要

深度学习已成为音频效果建模的标准方法，但严格的黑箱建模对于时变系统仍然存在问题。与时不变效果不同，在具有内部调制的设备上训练模型通常需要记录或提取控制信号，以确保标准损失函数所需的时间对齐。本文介绍了一种生成对抗网络（GAN）框架，仅使用输入输出音频记录来建模此类效果，无需提取调制信号。我们提出了一种卷积循环架构，通过两阶段策略进行训练：初始对抗阶段允许模型在没有严格相位约束的情况下学习调制行为的分布，随后是监督微调阶段，其中状态预测网络（SPN）估计所需的初始内部状态，以使模型与目标同步。此外，我们开发了一种基于啁啾序列信号的新指标来量化调制精度。对一款复古硬件移相器的建模实验证明了该方法在完全黑箱情形下捕获时变动态的能力。

英文摘要

Deep learning has become a standard approach for the modeling of audio effects, yet strictly black-box modeling remains problematic for time-varying systems. Unlike time-invariant effects, training models on devices with internal modulation typically requires the recording or extraction of control signals to ensure the time-alignment required by standard loss functions. This paper introduces a Generative Adversarial Network (GAN) framework to model such effects using only input-output audio recordings, without requiring a modulation signal extraction. We propose a convolutional-recurrent architecture trained via a two-stage strategy: an initial adversarial phase allows the model to learn the distribution of the modulation behavior without strict phase constraints, followed by a supervised fine-tuning phase where a State Prediction Network (SPN) estimates the initial internal states required to synchronize the model with the target. Additionally, a new metric based on chirp-train signals is developed to quantify modulation accuracy. Experiments modeling a vintage hardware phaser demonstrate the method's ability to capture time-varying dynamics in a fully black-box context.

2602.01394 2026-06-16 eess.AS cs.LG cs.SD 版本更新

SSNAPS: Audio-Visual Separation of Speech and Background Noise with Diffusion Inverse Sampling

SSNAPS: 基于扩散逆采样的语音与背景噪声视听分离

Yochai Yemini, Yoav Ellinson, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya

发表机构 * Bar-Ilan University（巴伊兰大学）； OriginAI

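AI summary: Proposes an unsupervised audio-visual speech separation method that jointly models speech and noise using diffusion priors and inverse sampling; it outperforms supervised baselines in the single-microphone setting and supports off-screen speaker separation.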

AI中文摘要

本文解决了在真实环境噪声下进行视听单麦克风语音分离和增强的挑战。我们的方法基于生成式逆采样：用专门的扩散先验分别对干净语音和环境噪声建模，并联合利用它们来恢复所有潜在声源。为此，我们对一种最近提出的逆采样器进行了重新表述，以匹配我们的设置。我们在包含1、2和3个说话人并带有噪声的混合信号上进行了评估，结果表明，尽管完全无监督，我们的方法在所有条件下的WER上始终优于领先的有监督基线。我们进一步扩展了该框架以处理离屏说话人分离。此外，分离出的噪声分量具有高保真度，适用于声学场景的下游检测。代码和预训练模型将在论文录用后提供。演示页面：https://ssnaps2026.github.io/ssnaps2026/

英文摘要

This paper addresses the challenge of audio-visual single-microphone speech separation and enhancement in the presence of real-world environmental noise. Our approach is based on generative inverse sampling, where we model clean speech and ambient noise with dedicated diffusion priors and jointly leverage them to recover all underlying sources. To achieve this, we reformulate a recent inverse sampler to match our setting. We evaluate on mixtures of 1, 2, and 3 speakers with noise and show that, despite being entirely unsupervised, our method consistently outperforms leading supervised baselines in WER across all conditions. We further extend our framework to handle off-screen speaker separation. Moreover, the high fidelity of the separated noise component makes it suitable for downstream detection of the acoustic scene. Code and pretrained models will become available upon acceptance. Demo page: https://ssnaps2026.github.io/ssnaps2026/

2604.23952 2026-06-16 stat.ML cs.LG nlin.CD 版本更新

Conditional Score-Based Modeling of Effective Langevin Dynamics

基于条件分数的有效朗之万动力学建模

Ludovico T. Giorgini

发表机构 * Department of Mathematics, Massachusetts Institute of Technology（数学系，麻省理工学院）

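AI summary: Proposes a calibration method for stochastic reduced-order models based on the conditional score of the finite-time transition density, inferring drift and diffusion coefficients from data via least-squares fitting while avoiding trajectory differentiation and state-space partitioning.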

AI中文摘要

随机降阶模型广泛用于表示复杂系统的有效动力学，但根据数据估计其漂移和扩散系数仍然具有挑战性。标准方法通常依赖于短时间轨迹增量、状态空间划分或候选模型的重复模拟，这些方法对于高维系统、粗时间采样或非均匀采样数据变得不可靠或计算成本高昂。我们引入了一种数据驱动的校准方法，该方法基于随机降阶模型系数与有限时间转移密度的条件分数（定义为转移密度对初始状态的对数梯度）之间的新关系。由此得到的恒等式将滞后相关函数的导数表示为观测到的滞后对上的平稳期望，其中涉及该条件分数和未知模型系数。这种公式允许直接从有限滞后统计量约束漂移和扩散结构，而无需在校准过程中对轨迹进行微分、划分状态空间或重复积分候选降阶模型，从而产生一个关于平稳滞后对的最小二乘拟合问题。我们在三个复杂度递增的系统上验证了该方法：一个解析可解的Cox-Ingersoll-Ross扩散过程、一个具有仿射乘性噪声的二维非平衡扩散过程，以及一个周期性的软自旋随机朗道-利夫希茨链。在这些测试中，推断出的模型在再现有限滞后动力学相关性的同时保持了不变统计量。该框架为从数据中学习再现规定统计和动力学性质的随机降阶模型提供了一种可扩展的途径。

英文摘要

Stochastic reduced-order models are widely used to represent the effective dynamics of complex systems, but estimating their drift and diffusion coefficients from data remains challenging. Standard approaches often rely on short-time trajectory increments, state-space partitioning, or repeated simulation of candidate models, which become unreliable or computationally expensive for high-dimensional systems, coarse temporal sampling, or unevenly sampled data. We introduce a data-driven calibration method based on a novel relationship between the coefficients of a stochastic reduced model and the conditional score of the finite-time transition density, defined as the gradient of the logarithm of the transition density with respect to the initial state. The resulting identity expresses derivatives of lagged correlation functions as stationary expectations over observed lagged pairs involving this conditional score and the unknown model coefficients. This formulation allows the drift and diffusion structure to be constrained directly from finite-lag statistics, without differentiating trajectories, partitioning state space, or repeatedly integrating candidate reduced models during calibration, yielding a least-squares fitting problem over stationary lagged pairs. We validate the approach on three systems of increasing complexity: an analytically tractable Cox--Ingersoll--Ross diffusion, a two-dimensional nonequilibrium diffusion with affine multiplicative noise, and a periodic soft-spin stochastic Landau--Lifshitz chain. Across these tests, the inferred models preserve the invariant statistics while reproducing finite-lag dynamical correlations. The framework provides a scalable route for learning stochastic reduced-order models from data that reproduce prescribed statistical and dynamical properties.

2605.03573 2026-06-16 stat.ML cs.LG 版本更新

Stochastic Schrödinger Diffusion Models for Pure-State Ensemble Generation

随机薛定谔扩散模型用于纯态集合生成

Jian Xu, Wei Chen, Shigui Li, Chao Li, Jingyuan Zheng, Delu Zeng, John Paisley, Qibin Zhao

发表机构 * RIKEN iTHEMS ； RIKEN AIP ； South China University of Technology（华南理工大学）； Stanford University（斯坦福大学）； Columbia University（哥伦比亚大学）

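AI summary: Proposes Stochastic Schrödinger Diffusion Models (SSDMs), a score-based generative framework built on the complex projective space CP^{d-1}; a local Euclidean Ornstein-Uhlenbeck approximation enables training without analytic transition densities, improving generalization in quantum machine learning.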

AI中文摘要

在量子机器学习（QML）中，经典数据通常被编码为量子纯态并直接作为量子表示进行处理，这推动了表示层面生成模型的发展：这类模型从底层纯态集合中采样新的量子态，而不是由扰动后的经典输入重新制备。然而，由于复射影空间CP^{d-1}的非欧几里得几何以及转移密度难以处理，将具有显式反向时间采样器的基于分数的扩散模型扩展到量子纯态集合仍具挑战性。我们提出了随机薛定谔扩散模型（SSDMs），这是一种定义在配备Fubini-Study（FS）度量的CP^{d-1}上的内蕴的基于分数的生成框架。SSDMs通过随机薛定谔方程（SSE）实现正向黎曼扩散，并推导出由黎曼分数∇_{FS} log p_t驱动的反向时间动力学。为了在没有解析转移密度的情况下进行训练，我们引入了一个局部时间目标，它基于FS法坐标中的局部欧几里得Ornstein-Uhlenbeck（OU）近似，从而得到一个可映射回流形的解析教师分数。实验表明，SSDMs能够忠实捕捉目标纯态集合的统计特性，包括可观测量的矩、重叠核MMD和纠缠度量，并且SSDM生成的量子表示通过表示层面的数据增强提升了下游QML的泛化能力。

英文摘要

Quantum machine learning increasingly relies on pure-state representations, motivating generative models that sample directly in quantum representation space rather than perturbing classical inputs and re-encoding. We introduce Stochastic Schrödinger Diffusion Models (SSDMs), a score-based generative framework that defines diffusion, scores, and reverse-time sampling intrinsically on the complex projective manifold $\mathbb{CP}^{d-1}$ under the Fubini--Study metric. SSDMs combine a Riemannian Ornstein--Uhlenbeck forward diffusion with a stochastic Schrödinger realization, and learn reverse-time dynamics driven by the Riemannian score. Our central technical contribution is a local-time learning objective that exploits the local Euclidean OU limit of intrinsic manifold diffusions in Fubini-Study normal coordinates to obtain an analytic teacher score, bypassing the intractable transition densities that limit existing Riemannian score-based models. Across synthetic, physics-inspired (TFIM, XXZ), and quantum feature-state benchmarks up to $14$ qubits, SSDMs match target pure-state ensembles by orders of magnitude on MMD and observable statistics over both ambient Euclidean and matched Riemannian score-based baselines, and improve representation-level diagnostics for downstream quantum kernel methods.

2605.18324 2026-06-16 cs.CV cs.AI cs.GR cs.LG stat.ML 版本更新

Improved Baselines with Representation Autoencoders

改进的基于表示自动编码器的基线

Jaskirat Singh, Boyang Zheng, Zongze Wu, Richard Zhang, Eli Shechtman, Saining Xie

发表机构 * Adobe Research（Adobe研究院）； ANU（澳大利亚国立大学）； New York University（纽约大学）

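AI summary: Studies design choices for Representation Autoencoders (RAE) and finds three insights that simplify and improve them. First, it studies a generalized formulation that defines the representation as the sum of the last k encoder layers rather than only the final layer. Second, it examines the assumption that RAE replaces representation alignment (REPA) and finds that the two have complementary working mechanisms. Finally, it improves RAE under classifier-free guidance (CFG): re-parameterizing the DiT model output provides guidance without training a second model. RAEv2 reaches a gFID of 1.06 on ImageNet-256 with markedly higher training efficiency.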

英文摘要

Representation Autoencoders (RAE) replace traditional VAE with pretrained vision encoders. In this paper, we systematically investigate several design choices and find three insights which simplify and improve RAE. First, we study a generalized formulation where the representation is defined as sum of the last k encoder layers rather than solely the final layer. This simple change greatly improves reconstruction without encoder finetuning or specialized data (e.g., text, faces). Second, we study the prevalent assumption that RAE (using pretrained representation as encoder) replaces representation alignment (REPA), which distills the same representation to intermediate layers instead. Through large-scale empirical analysis, we uncover a surprising finding: RAE and REPA exhibit complementary working mechanisms, allowing the same representation to be used as both encoder and target for intermediate diffusion layers. Finally, the original RAE struggles with classifier-free guidance (CFG) and requires training a second, weaker diffusion model for AutoGuidance (AG). We show that REPA itself can be viewed as x-prediction in RAE latent space. By simply re-parameterizing the output of the DiT model, it can provide guidance for "free". Overall, RAEv2 leads to more than 10x faster convergence over the original RAE, achieving a state-of-the-art gFID of 1.06 in just 80 epochs on ImageNet-256. On FDr6, RAEv2 achieves a state-of-the-art 2.17 at just 80 epochs compared to the previous best 3.26 (800 epochs) without any post-training. This motivates EPFID@k (epochs to reach unguided gFID < k) as a measure of training efficiency. RAEv2 attains an EPFID@2 of 35 epochs, versus 177 for the original RAE. We also validate our approach across diverse settings for text-to-image generation and navigation world models, showing consistent improvements. The code is available at https://raev2.github.io.
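Two of the quantities described in the abstract above are simple enough to sketch directly: the representation formed as the sum of the last k encoder layers, and EPFID@k, the number of epochs needed to reach an unguided gFID below k. The code below is a minimal sketch under assumptions: hidden states are plain NumPy arrays, the function names are hypothetical, and the toy learning curve is synthetic rather than taken from the paper.

```python
import numpy as np
from typing import Optional, Sequence, Tuple

def sum_last_k_layers(hidden_states: Sequence[np.ndarray], k: int) -> np.ndarray:
    """Sum the outputs of the last k encoder layers (k = 1 gives the final layer)."""
    if not 1 <= k <= len(hidden_states):
        raise ValueError("k must be between 1 and the number of layers")
    return np.sum(np.stack(hidden_states[-k:]), axis=0)

def epfid_at_k(curve: Sequence[Tuple[int, float]], k: float) -> Optional[int]:
    """Return the first epoch whose gFID is below k, or None if never reached."""
    for epoch, gfid in sorted(curve):
        if gfid < k:
            return epoch
    return None

# Synthetic stand-ins: four "layers" of shape (2, 3) and a made-up gFID curve.
layers = [np.full((2, 3), float(i)) for i in range(4)]
representation = sum_last_k_layers(layers, k=2)   # layer 2 + layer 3

toy_curve = [(10, 9.0), (20, 4.0), (30, 1.5), (40, 1.2)]
epochs_to_2 = epfid_at_k(toy_curve, k=2.0)
epochs_to_1 = epfid_at_k(toy_curve, k=1.0)
```

Returning `None` when the threshold is never reached keeps the metric from silently reporting the last epoch as if the target had been met.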

2606.13769 2026-06-16 cs.RO cs.CV cs.LG 版本更新

$\mu_0$: A Scalable 3D Interaction-Trace World Model

$\mu_0$: 一种可扩展的3D交互轨迹世界模型

Seungjae Lee, Yoonkyo Jung, Jusuk Lee, Jonghun Shin, Amir Hossein Shahidzadeh, Yao-Chih Lee, H. Jin Kim, Jia-Bin Huang, Furong Huang

发表机构 * University of Maryland, College Park（马里兰大学帕克分校）； Seoul National University（首尔大学）

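AI summary: Proposes $\mu_0$, a scalable world model based on 3D traces that predicts trajectories of interaction points to enable cross-embodiment robot learning without action labels, with performance comparable to models pretrained with action supervision.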

AI中文摘要

能够捕捉动作如何引起物理变化的世界模型使得可扩展的机器人学习成为可能，而无需依赖特定本体的动作标签。像素空间视频模型提供了广泛的视觉先验，但将模型容量消耗在密集外观重建上，而直接动作模型则需要特定本体的标签，阻碍了可扩展性。我们提出$\mu_0$，一种基于3D轨迹的可扩展世界模型。$\mu_0$不是预测密集像素或直接建模动作，而是预测显著交互点（如物体、工具、手和接触区域）的平滑3D轨迹，从而产生一个紧凑、与本体无关的运动接口。为了能够从多样化的视频源进行训练，我们的TraceExtract系统通过选择关键点、构建全局对齐的轨迹以及将运动片段与层次化语言描述关联，自动提取3D监督。这种TraceExtract监督通过将预训练的视觉-语言骨干网络与模块化轨迹专家相结合来预训练$\mu_0$，其中轨迹专家通过B样条控制点表示每个查询并预测未来轨迹。实验表明，$\mu_0$在2D和3D轨迹预测方面均优于基线方法，包括轨迹预测模型和分词VLM方法。由于$\mu_0$是冻结且可重用的，它可以与动作专家配对用于下游机器人本体。尽管是无动作预训练，由此产生的轨迹条件策略在性能上与使用动作监督预训练的VLA模型（如$\pi_0$）相当。这些结果确立了3D轨迹作为跨本体操作的可扩展和可迁移表示。

英文摘要

World models that capture how actions induce physical change enable scalable robot learning without reliance on embodiment-specific action labels. Pixel-space video models provide broad visual priors but expend model capacity on dense appearance reconstruction, while direct action models require embodiment-specific labels that hinder scalability. We present $\mu_0$, a scalable world model based on 3D traces. Rather than predicting dense pixels or directly modeling actions, $\mu_0$ forecasts smooth 3D trajectories for salient interaction points such as objects, tools, hands, and contact regions, yielding a compact, embodiment-agnostic motion interface. To enable training from diverse video sources, our TraceExtract system automatically extracts 3D supervision by selecting keypoints, constructing globally aligned traces, and associating motion segments with hierarchical language captions. This TraceExtract supervision pretrains $\mu_0$ by combining a pretrained vision-language backbone with a modular trace expert, which represents each query via B-spline control points and predicts future traces. Experiments show that $\mu_0$ outperforms baselines in both 2D and 3D trace prediction, including trace prediction models and tokenized VLM methods. Because $\mu_0$ is frozen and reusable, it can be paired with action experts for downstream robot embodiments. Despite action-free pretraining, the resulting trace-conditioned policies achieve performance competitive with VLA models pretrained with action supervision, such as $\pi_0$. These results establish 3D traces as a scalable and transferable representation for cross-embodiment manipulation.
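To make the idea of representing a trace by B-spline control points concrete, the sketch below evaluates a uniform cubic B-spline through a handful of 3D control points. The abstract does not state the spline degree, knot vector, or number of control points used, so the uniform cubic form, the function name, and the example control points are all assumptions for illustration.

```python
import numpy as np

def cubic_bspline_trajectory(control_points, samples_per_segment=10):
    """Evaluate a uniform cubic B-spline defined by (n, 3) control points."""
    cp = np.asarray(control_points, dtype=float)
    if cp.shape[0] < 4:
        raise ValueError("a cubic B-spline needs at least 4 control points")
    u = np.linspace(0.0, 1.0, samples_per_segment, endpoint=False)
    # Uniform cubic B-spline basis functions; each row sums to 1.
    basis = np.stack([
        (1.0 - u) ** 3,
        3.0 * u ** 3 - 6.0 * u ** 2 + 4.0,
        -3.0 * u ** 3 + 3.0 * u ** 2 + 3.0 * u + 1.0,
        u ** 3,
    ], axis=1) / 6.0
    # Each window of 4 consecutive control points defines one curve segment.
    segments = [basis @ cp[i:i + 4] for i in range(cp.shape[0] - 3)]
    # Close the curve with the end point of the last segment (u = 1).
    end_point = (cp[-3] + 4.0 * cp[-2] + cp[-1]) / 6.0
    return np.vstack(segments + [end_point[None, :]])

# Assumed example: six evenly spaced, collinear 3D control points.
control_points = np.outer(np.arange(6), [1.0, 2.0, 3.0])
trajectory = cubic_bspline_trajectory(control_points)
```

A few control points yield a smooth, densely sampled curve, which is one reason such a parameterization is compact compared with predicting every trajectory sample directly.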

2606.14970 2026-06-16 cs.LG 新提交

Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning

基于LMO方法的零阶无参数优化：高效微调的新方法

Dmitriy Bystrov, Daniil Medyakov, Dmitry Bylinkin, Aleksandr Beznosikov

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

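AI summary: Backpropagation incurs a large memory overhead in large-model fine-tuning, and zeroth-order optimization is sensitive to the stepsize and smoothing parameter. To address both, proposes AdaNAGED, which unifies gradient-free training, adaptive tuning, and non-Euclidean update geometry, and validates it on the OPT-1.3B model.

Comments 29 pages, 1 table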

AI中文摘要

微调大型语言模型（LLM）已成为现代优化的核心应用，使预训练模型能够适应多样化的下游任务和特定领域数据。大规模微调的主要障碍是反向传播的内存开销，这需要存储激活值、梯度和优化器状态。零阶（ZO）优化提供了一种内存高效的替代方案，但其性能对步长和平滑参数高度敏感，通常需要昂贵的任务特定调参。无参数（PF）优化通过在没有问题相关常数先验知识的情况下调整算法参数来解决这一问题。此外，大规模微调可以受益于几何感知更新，该更新考虑了参数块的异质结构，这可以通过利用线性最小化预言（LMO）的方法来建模。在这项工作中，我们研究了基于LMO的ZO优化的PF自适应，并引入了$\texttt{AdaNAGED}$，一种统一无梯度训练、自适应调参和非欧几里得更新几何的方法。我们建立了收敛保证，并在使用$\texttt{OPT}-1.3\mathrm{B}$模型的大规模LLM微调任务上验证了该方法。

英文摘要

Fine-tuning large language models (LLMs) has become a central application of modern optimization, enabling pretrained models to adapt to diverse downstream tasks and domain-specific data. A major obstacle in large-scale fine-tuning is the memory overhead of backpropagation, which requires storing activations, gradients, and optimizer states. Zeroth-order (ZO) optimization offers a memory-efficient alternative, but its performance is highly sensitive to the stepsize and smoothing parameter, often requiring costly task-specific tuning. Parameter-free (PF) optimization addresses this issue by adapting algorithmic parameters without prior knowledge of problem-dependent constants. Moreover, large-scale fine-tuning can benefit from geometry-aware updates that account for the heterogeneous structure of parameter blocks, which can be modeled through methods that exploit a linear minimization oracle (LMO). In this work, we study PF adaptation for LMO-based ZO optimization and introduce $\texttt{AdaNAGED}$, a method that unifies gradient-free training, adaptive tuning, and non-Euclidean update geometry. We establish convergence guarantees and validate the method on a large-scale LLM fine-tuning task with the $\texttt{OPT}-1.3\mathrm{B}$ model.
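For readers unfamiliar with zeroth-order optimization, the sketch below shows a generic two-point gradient estimator and a plain ZO-SGD loop, which makes visible the two hand-set quantities the abstract calls sensitive: the stepsize and the smoothing parameter. It is not AdaNAGED: there is no parameter-free adaptation and no LMO-based update, and the quadratic objective and all names are assumptions for illustration.

```python
import numpy as np

def zo_gradient(f, x, mu, rng):
    """Two-point zeroth-order gradient estimate along a random Gaussian direction."""
    u = rng.standard_normal(x.shape)
    # Only function values are used; no backpropagation is required.
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

def zo_sgd(f, x0, stepsize, mu, n_steps, seed=0):
    """Plain ZO-SGD with a fixed, hand-tuned stepsize and smoothing parameter mu."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - stepsize * zo_gradient(f, x, mu, rng)
    return x

def quadratic(x):
    # Assumed toy objective with its minimum at the origin.
    return float(np.dot(x, x))

x_start = np.ones(5)
x_final = zo_sgd(quadratic, x_start, stepsize=0.05, mu=1e-3, n_steps=500)
```

Each step costs two forward evaluations and stores no activations, which is the memory advantage the abstract describes; the price is a noisy estimate whose quality depends on `stepsize` and `mu`.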


2606.15115 2026-06-16 cs.LG 新提交

Diversity-Driven Offline Multi-Objective Optimization via Nested Pareto Set Learning

基于嵌套帕累托集学习的多样性驱动离线多目标优化

Yiyi Zhu, Yaolin Wen, Xiang Xia, Xin An, Hanyi Si, Xiang Shu, Yangde Fu, Liang Dou, Hong Qian

AI总结针对离线多目标优化中的分布外问题，提出DOMOO方法，通过累积风险控制、嵌套帕累托集学习和多样性驱动选择策略，在合成和真实基准上实现了收敛性和多样性的最佳平均排名。

Comments 32 pages, 7 figures, accepted by ICML 2026. Project: https://github.com/YaolinWen/DOMOO

详情

AI中文摘要

多目标优化（MOO）已成为解决涉及多个目标的复杂优化问题的强大方法。在许多实际场景中，函数评估不可用或成本过高，因此必须仅基于固定的离线数据集进行优化。在这种称为离线MOO的设置中，目标是在无法访问真实目标函数的情况下找到帕累托集。这种设置存在分布外（OOD）问题，即代理模型对于未见过的设计不准确。由于OOD问题，代理误差可能导致优化器选择不在真实帕累托前沿上且偏向其极端的解。为了解决这个问题，本文提出了多样性驱动的离线多目标优化（DOMOO），旨在找到一组多样且高质量的解。首先，DOMOO包含一个累积风险控制模块，用于估计候选解的潜在风险，并缓解训练数据与生成解之间的OOD问题。此外，提出了一种嵌套帕累托集学习（PSL）策略，以联合学习偏好和PSL参数，然后优化它们，从而适应多样化的帕累托前沿几何形状。为了进一步提高解的质量，我们设计了一种多样性驱动的选择策略，用于提取一组具有代表性且分布良好的最终解。为了实现这种多样性驱动的选择策略，我们提出了$\text{IGD}_\text{offline}$，这是一个针对离线设置定制的指标，同时考虑了多样性和收敛性，并避免了超体积指标的偏差。在合成和真实基准上的大量实验表明，在比较的方法中，DOMOO在收敛性和多样性方面均实现了跨任务的最佳平均排名。

英文摘要

Multi-objective optimization (MOO) has emerged as a powerful approach to solving complex optimization problems involving multiple objectives. In many practical scenarios, function evaluations are unavailable or prohibitively expensive, necessitating optimization solely based on a fixed offline dataset. In this setting, known as offline MOO, the goal is to find the Pareto set without access to the true objective functions. This setting suffers from the out-of-distribution (OOD) issue, where the surrogate model is not accurate for unseen designs. Due to the OOD issue, surrogate errors may cause the optimizer to select solutions that do not lie on the true Pareto front and are biased toward its extremes. To address this, this paper proposes Diversity-driven Offline Multi-Objective Optimization (DOMOO), which aims to find a diverse and high-quality set of solutions. First, DOMOO incorporates an accumulative risk control module that estimates the potential risk of candidate solutions and alleviates the OOD issue between the training data and the generated solutions. In addition, a nested Pareto set learning (PSL) strategy is proposed to jointly learn preference and PSL parameters, then optimize them, enabling adaptation to diverse Pareto front geometries. To further enhance solution quality, we design a diversity-driven selection strategy that extracts a representative and well-distributed set of final solutions. To achieve this diversity-driven selection strategy, we propose $\text{IGD}_\text{offline}$, a tailored indicator for the offline setting that considers both diversity and convergence, and avoids the bias of the hypervolume indicator. Extensive experiments on synthetic and real-world benchmarks show that DOMOO achieves the best average rank across tasks in both convergence and diversity among the compared methods.


2606.15219 2026-06-16 cs.LG cs.DS math.ST stat.ML stat.TH 新提交

Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model

神经网络能否实现最优计算-统计权衡？基于单指标模型的分析

Siyu Chen, Beining Wu, Miao Lu, Zhuoran Yang, Tianhao Wang

发表机构 * Department of Statistics and Data Science, Yale University（耶鲁大学统计与数据科学系）； Department of Statistics, University of Chicago（芝加哥大学统计系）； Department of Management Science and Engineering, Stanford University（斯坦福大学管理科学与工程系）； Toyota Technological Institute at Chicago（芝加哥丰田技术研究所）

AI总结提出统一梯度算法训练两层神经网络，在多项式时间内学习高斯单指标模型，样本复杂度匹配SQ下界，并扩展到稀疏情形。

Comments 96 pages, 4 figures


多保真度SINDy：基于保真度加权测量的非线性动力系统稀疏发现

Filippo Zacchei, Ana Larrañaga, Attilio Frangi, Andrea Manzoni, Steven L. Brunton

发表机构 * Politecnico di Milano（米兰理工大学）； University of Washington（华盛顿大学）

AI总结针对异质噪声数据，提出多保真度SINDy方法，通过加权回归融合集成SINDy和弱SINDy，从不同保真度测量中稀疏识别非线性动力系统，理论证明加权策略的统计合理性，在常微分和偏微分方程基准系统及双摆预测中验证了其抑制异方差噪声、利用低成本低质量数据提升模型恢复的效果。

Comments 27 pages, 6 figures, 2 tables

详情

AI中文摘要

来自模拟和实验的数据很少是无噪声的，并且常常表现出异质保真度水平。测量不确定性可能在重复观测、传感设备甚至单个实验中变化。本文解决了从这种非均匀数据中发现非线性动力系统的问题。我们通过将集成SINDy和弱SINDy结合在由广义最小二乘法导出的加权回归公式中，扩展了稀疏识别非线性动力系统（SINDy）框架以考虑可变噪声水平。还提供了加权策略的统计证明。该方法在几个基准系统上得到验证，包括常微分和偏微分方程。此外，我们展示了多保真度集成在预测双摆系统动力学中的优势。结果证实，所提出的方法减轻了异方差噪声的不利影响，并且重复、低成本、低质量的测量可以改善模型恢复，在某些情况下匹配或优于仅使用高保真度数据获得的重建结果。

英文摘要

Data from simulations and experiments are rarely noise-free and often exhibit heterogeneous levels of fidelity. Measurement uncertainty may vary across repeated observations, sensing devices, or even within a single experiment. This work addresses the problem of discovering nonlinear dynamical systems from such inhomogeneous data. We extend the Sparse Identification of Nonlinear Dynamical Systems (SINDy) framework to account for variable noise levels by combining Ensemble SINDy and Weak SINDy within a weighted regression formulation derived from generalized least squares. A statistical justification for the weighting strategy is also provided. The methodology is validated on several benchmark systems, including ordinary and partial differential equations. In addition, we show the benefit of multi-fidelity integration for forecasting the dynamics of a double pendulum system. The results confirm that the proposed approach mitigates the adverse effects of heteroscedastic noise and that repeated, low-cost, low-quality measurements can improve model recovery, in some cases matching or outperforming reconstructions obtained using only high-fidelity data.
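The weighted-regression idea in this abstract can be sketched as sequentially thresholded least squares with inverse-variance row weights, which is generalized least squares under a diagonal noise covariance. This is a simplified illustration only: it omits the ensemble and weak-form components, and the function name, `threshold`, and the assumption that each sample's noise standard deviation `sigma` is known are not taken from the paper.

```python
import numpy as np

def weighted_stlsq(theta, dxdt, sigma, threshold=0.1, iters=10):
    """Sparse regression dxdt ~ theta @ xi with inverse-variance weights.

    theta : (n_samples, n_features) candidate function library
    dxdt  : (n_samples,) derivative measurements
    sigma : (n_samples,) assumed noise standard deviation per sample
    """
    w = 1.0 / np.asarray(sigma, dtype=float)
    A = theta * w[:, None]               # scale rows: low-fidelity samples count less
    b = dxdt * w
    xi = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold   # prune weak terms
        xi[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        xi[keep] = np.linalg.lstsq(A[:, keep], b, rcond=None)[0]  # refit survivors
    return xi
```

With equal `sigma` this reduces to ordinary thresholded least squares; unequal values let many cheap, noisy samples contribute without dominating the fit.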


2606.15812 2026-06-16 cs.LG 新提交

Brownian Kernel Ladders

布朗核梯子

Mahdi Mohammadigohari, Giuseppe Di Fatta, Giuseppe Nicosia, Panos M Pardalos

发表机构 * Faculty of Engineering, Free University of Bozen-Bolzano（博尔扎诺自由大学工程学院）； Department of Biomedical and Biotechnological Sciences, University of Catania（卡塔尼亚大学生物医学与生物技术科学系）； Center for Applied Optimization, Department of Industrial and Systems Engineering, University of Florida（佛罗里达大学应用优化中心、工业与系统工程系）

AI总结提出布朗核梯子（BKL）递归层次函数空间，通过布朗核积分构造，证明其为准Banach空间且具有深度相关Hölder正则性，为深度学习的组合表示提供可解析框架。

Comments Submitted to JMLR

详情

AI中文摘要

在统计学习理论中，构建能够捕捉层次组合表示的可解析函数空间仍然是一个核心挑战。我们引入了布朗核梯子（BKL），这是一个通过布朗核积分构造递归定义的积分再生核希尔伯特空间层次结构。从线性泛函开始，每一层通过对前一层子集上的概率测度积分布朗核得到，产生一个递归函数空间模型，其中深度直接通过层次结构编码。基于此框架，我们定义了规范BKL空间及其相关的复杂度泛函。我们建立了这些空间的若干分析和统计性质。特别地，我们证明BKL空间构成准Banach空间，满足依赖于深度的Hölder正则性估计，并表现出关于深度的严格单调性。我们进一步证明了正则化经验风险最小化的存在性结果，并推导了关于环境维度和层次深度一致控制的高斯复杂度界。分析的一个关键成分是基于递归子集分解和布朗核阈值表示的组合证明技术。这些估计为BKL空间上的正则化经验风险最小化提供了接近参数阶的过剩风险保证。我们的结果为研究深度学习中的组合表示提供了一个数学上可解析的层次函数空间框架。

英文摘要

Constructing mathematically tractable function spaces that capture hierarchical compositional representations remains a central challenge in statistical learning theory. We introduce Brownian kernel ladders (BKLs), a recursively defined hierarchy of integral reproducing kernel Hilbert spaces generated through Brownian-kernel integral constructions. Starting from linear functionals, each layer is obtained by integrating Brownian kernels over probability measures supported on subsets of the previous layer, yielding a recursive function-space model in which depth is encoded directly through the hierarchy. Based on this framework, we define canonical BKL spaces together with an associated complexity functional. We establish several analytical and statistical properties of these spaces. In particular, we show that BKL spaces form quasi-Banach spaces, satisfy depth-dependent Hölder regularity estimates, and exhibit strict monotonicity with respect to depth. We further prove existence results for regularized empirical risk minimization and derive Gaussian complexity bounds that remain uniformly controlled with respect to both the ambient dimension and the hierarchy depth. A key ingredient of the analysis is a combinatorial proof technique based on recursive subset decompositions and Brownian-kernel threshold representations. These estimates yield excess-risk guarantees of near-parametric order for regularized empirical risk minimization over BKL spaces. Our results provide a mathematically tractable hierarchical function-space framework for studying compositional representations in deep learning.


2606.15832 2026-06-16 cs.LG math.OC 新提交

SILAGE: Memory-Efficient, Full-Gradient-Free Nonconvex Optimization for Nested Finite Sums

SILAGE: 针对嵌套有限和的内存高效、完全无全梯度的非凸优化

Igor Sokolov, Laurent Condat, Peter Richtárik

发表机构 * Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST)（阿卜杜拉国王科技大学生成式人工智能卓越中心）

AI总结针对大规模数据中嵌套双有限和结构的非凸优化，提出SILAGE算法，通过利用双和结构避免全局全梯度刷新，仅需O(n)内存，并基于组间和组内异质性实现自适应收敛分析。

Comments 80 pages, 3 algorithms, 4 theorems, 2 corollaries, 11 lemmas, 2 figures, 12 tables

详情

AI中文摘要

大规模数据集上的经验风险最小化自然呈现出嵌套的双有限和结构，其中 $N=nm$ 个总样本被逻辑或物理地划分为 $n$ 个大小为 $m$ 的块（例如，在池化数据孤岛、核外学习或有意分层中）。虽然方差缩减方法对非凸目标实现了最优的 oracle 复杂度，但在此集中式场景中它们遭受严重的扩展瓶颈。递归估计器（如 PAGE）需要定期对所有 $nm$ 个样本进行全局全梯度刷新，这在计算上代价高昂。相反，单循环方法（如 SILVER）避免了此类刷新，但需要不切实际的 $\mathcal{O}(nm)$ 内存来存储每个样本的控制变量。在本文中，我们提出了 SILAGE，一种解决此权衡的方差缩减算法。通过主动利用双和结构，SILAGE 消除了对所有 $nm$ 组件的周期性全局全梯度刷新（每次迭代最多评估一个局部组梯度），同时仅需 $\mathcal{O}(n)$ 内存。此外，我们提供了严格的收敛分析，避免了悲观的 worst-case Lipschitz 常数。相反，SILAGE 的复杂度通过嵌套的函数相似性（组间异质性 $δ_1$ 和组内异质性 $δ_2$）自然地适应底层数据几何。我们的结果在几个实际相关场景中改进了现有的最先进界限。

英文摘要

Empirical risk minimization on massive datasets naturally exhibits a nested double finite-sum structure, where $N=nm$ total samples are logically or physically partitioned into $n$ blocks of size $m$ (e.g., in pooled data silos, out-of-core learning, or deliberate stratification). While variance-reduced methods achieve optimal oracle complexities for nonconvex objectives, they suffer from severe scaling bottlenecks in this centralized regime. Recursive estimators, such as PAGE, require periodic global full-gradient refreshes over all $nm$ samples, which are computationally expensive. Conversely, single-loop methods, such as SILVER, avoid such refreshes but require an impractical $\mathcal{O}(nm)$ memory footprint to store a control variate for every sample. In this paper, we propose SILAGE, a variance-reduced algorithm that addresses this trade-off. By actively exploiting the double-sum structure, SILAGE eliminates periodic global full-gradient refreshes over all $nm$ components (evaluating at most one local group gradient per iteration) while requiring only $\mathcal{O}(n)$ memory. Furthermore, we provide a tight convergence analysis that avoids pessimistic worst-case Lipschitz constants. Instead, SILAGE's complexity natively adapts to the underlying data geometry via nested functional similarities: across-group ($δ_1$) and within-group ($δ_2$) heterogeneity. Our results improve existing state-of-the-art bounds in several practically relevant regimes.


2606.16028 2026-06-16 cs.LG cs.IT math.FA math.IT 新提交

关于实、复和四元数深度线性网络的熵公式

Luis Contreras, Marco Nahas, Tejas Kotwal

发表机构 * CINVESTAV-IPN（墨西哥国立理工学院高级研究中心）； Brown University（布朗大学）

AI总结将Menon和Yu的实深度线性网络熵公式推广到复和四元数情形，得到统一公式。

Comments 17 pages

2606.15217 2026-06-16 stat.ML cs.LG 交叉投稿

Conformal Candidate Certification for Offline Model-Based Optimization

离线模型优化的共形候选认证

Seungjin Choi


AI总结提出共形候选认证（CCC）方法，通过加权共形预测为离线模型优化中的候选设计提供校准的单侧下界，确保超过目标阈值的候选被认证，解决了分布偏移下的统计可靠性问题。

Comments ICML 2026 Workshop on Decision-Making from Offline Datasets to Online Adaptation: Black-Box Optimization to Reinforcement Learning

详情

AI中文摘要

离线模型优化（MBO）通过优化在固定历史数据集上训练的代理模型来提出候选方案。由于候选方案故意处于分布外，代理模型的排名在最优化器最激进的地方最不可靠，然而现有方法没有为每个候选提供统计证书，证明其设计满足目标阈值。我们提出\emph{共形候选认证}（CCC），一种事后包装器，为每个候选附加一个校准的单侧下界，并仅推进那些下界超过目标阈值的候选。我们证明，熵正则化的代理最大化诱导出吉布斯倾斜提议，因此同一代理模型为加权共形预测提供重要性权重，无需单独的密度比估计步骤。在受控的合成研究中，CCC在名义水平0.90下认证了激进提议池中的16.7%的候选，经验覆盖率为0.990，而忽略协变量偏移的标准共形预测覆盖率降至0.416。

英文摘要

Offline model-based optimization (MBO) proposes candidates by optimizing a surrogate trained on a fixed historical dataset. Because candidates are deliberately out-of-distribution, surrogate rankings are least reliable exactly where the optimizer is most aggressive, yet existing methods provide no per-candidate statistical certificate that a design meets a target threshold. We propose \emph{Conformal Candidate Certification} (CCC), a post-hoc wrapper that attaches a calibrated one-sided lower bound to each candidate and advances only those whose bound exceeds the target. We show that entropy-regularized surrogate maximization induces a Gibbs-tilted proposal, so the same surrogate supplies importance weights for weighted conformal prediction without a separate density-ratio estimation step. In a controlled synthetic study, CCC certifies $16.7\%$ of an aggressive proposal pool with empirical coverage 0.990 at nominal 0.90, while standard conformal prediction ignoring the covariate shift collapses to 0.416 coverage.
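The one-sided certificate described in this abstract can be illustrated with a generic weighted split-conformal lower bound. The sketch below assumes the calibration scores are over-predictions $s_i = \hat f(x_i) - y_i$ and takes the importance weights as given inputs; how the paper derives them from the Gibbs-tilted proposal is not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def weighted_conformal_quantile(scores, cal_weights, test_weight, alpha):
    """(1 - alpha) quantile of the weighted calibration scores,
    with the test point's own mass placed at +infinity."""
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(cal_weights, dtype=float)
    total = w.sum() + test_weight
    order = np.argsort(scores)
    cum = np.cumsum(w[order]) / total
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    if idx >= scores.size:
        return np.inf                     # not enough calibration mass: no certificate
    return scores[order][idx]

def certified_lower_bound(pred, scores, cal_weights, test_weight, alpha=0.1):
    """Lower bound on the true value of a candidate with surrogate prediction pred."""
    return pred - weighted_conformal_quantile(scores, cal_weights, test_weight, alpha)

def certify(pred, threshold, scores, cal_weights, test_weight, alpha=0.1):
    """Advance the candidate only if its lower bound clears the target threshold."""
    return certified_lower_bound(pred, scores, cal_weights, test_weight, alpha) >= threshold
```

An aggressive candidate far from the data receives a large `test_weight`, which pushes the quantile toward infinity and makes certification fail, matching the intuition that the surrogate is least trustworthy there.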


2606.15271 2026-06-16 math.OC cs.LG 交叉投稿

Dual-Network PINNs for Optimal Control: A Reproducible Benchmark on the Mass-Spring-Damper System

双网络PINNs用于最优控制：质量-弹簧-阻尼器系统的可复现基准

Abdeladhim Tahimi, Rinaldo Vieira da Silva Junior

发表机构 * Centro de Engenharias e Ciências Agrárias, Universidade Federal de Alagoas, Brazil（巴西阿拉戈斯联邦大学工程与农业科学中心）

AI总结提出双网络物理信息神经网络（PINN）直接求解质量-弹簧-阻尼器系统最优控制问题，通过状态网络精确满足边界条件、控制网络无约束，损失函数结合物理残差和成本泛函梯形近似，在基准上复现经典最优成本至四位有效数字。

Comments 22 pages, 6 figures. Reproducible benchmark study of dual-network Physics-Informed Neural Networks (PINNs) for optimal control of a mass-spring-damper system. Includes comparison with Pontryagin's Minimum Principle and direct transcription methods and accompanying Google Colab implementation

详情

AI中文摘要

本文提出了一个透明且可复现的基准研究，针对质量-弹簧-阻尼器系统的最优控制，采用直接双网络物理信息神经网络（PINN）公式。经典的线性二次最优控制问题通过两种独立的经典方法求解——Pontryagin最小值原理结合单次打靶法，以及通过梯形配点法的直接转录——并重新表述为一个受约束的优化问题，由两个前馈神经网络求解：一个状态网络，其边界条件通过复合三次和掩码假设精确强制执行；以及一个无约束的控制网络。复合损失结合了配点处的物理残差和成本泛函的梯形近似，并由单个标量超参数加权。在所考虑的基准上，PINN将经典最优成本复现至四位有效数字，精确满足终端状态约束，并产生点态状态和控制误差，这些误差落在两个经典参考的范围内。在此基准上，训练速度比经典打靶法慢大约两个数量级，这是如实报告的。贡献在于方法的清晰性而非方法的新颖性：该公式及附带的Google Colab实现旨在降低实践者探索基于PINN的最优控制的入门门槛，无需预先了解伴随方法或两点边值问题。

英文摘要

This work presents a transparent and reproducible benchmark study of a direct dual-network Physics-Informed Neural Network (PINN) formulation for the optimal control of a mass-spring-damper system. The classical linear-quadratic optimal control problem is solved by two independent classical methods -- Pontryagin's Minimum Principle with single shooting, and direct transcription through trapezoidal collocation -- and recast as a constrained optimization problem solved by two feedforward neural networks: a state network whose boundary conditions are enforced exactly through a composite cubic-and-mask ansatz, and an unconstrained control network. The composite loss combines the physics residual at the collocation points with a trapezoidal approximation of the cost functional, weighted by a single scalar hyperparameter. On the benchmark considered, the PINN reproduces the classical optimal cost to four significant digits, satisfies the terminal state constraints exactly by construction, and produces pointwise state and control errors that fall within the spread of the two classical references. Training is approximately two orders of magnitude slower than classical shooting on this benchmark, which is honestly reported. The contribution is methodological clarity rather than methodological novelty: the formulation and the accompanying Google Colab implementation are intended to lower the barrier to entry for practitioners exploring PINN-based optimal control without prior exposure to adjoint methods or two-point boundary value problems.


2606.15393 2026-06-16 stat.ML cs.LG stat.ME 交叉投稿

Finite Resources False Discovery Rate Control in Structured Hypothesis Spaces

结构化假设空间中的有限资源错误发现率控制

Binyamin Perets, Shie Mannor

发表机构 * Technion – Israel Institute of Technology（以色列理工学院）； NVIDIA

AI总结针对有限空分布样本和结构化假设空间，提出基于再生核的框架，通过两种决策规则在精确FDR控制与统计功效间权衡，并优化资源分配。

详情

AI中文摘要

科学发现依赖于大规模假设检验。然而，在控制错误发现的同时识别真正发现的能力面临重大挑战：获取相关参考数据（零分布）是资源密集型的，留下有限数据的不确定性，并且当假设空间存在固有结构时，程序应考虑该结构。在这里，我们提出了一个框架，用于在以下两种情况下控制错误发现率：当每个假设仅由有限数量的空分布样本支持，导致其p值不确定时；以及当假设空间具有任意结构时，仅要求通过合适的再生核表示该结构。我们提出了两种决策规则，它们对结构错误指定都具有鲁棒性，但在精确FDR控制和统计功效之间提供了不同的权衡。第一个规则保证精确的FDR控制；第二个规则通过将镜像统计控制适应到计数空间来最大化功效，利用分析框架在精确镜像对称放松时评估FDR控制。此外，RKHS框架带来的可处理性使我们能够直接研究有限数据的不确定性，我们利用这一点提出了一种有效分配零分布样本的策略。

英文摘要

Scientific discovery relies on large-scale hypothesis testing. However, the capacity to identify true discoveries while controlling false discovery faces major challenges: obtaining relevant reference data (the null distribution) is resource-intensive, leaving finite-data uncertainty, and the procedure should account for the inherent structure in the hypothesis space, when such structure exists. Here, we present a framework for controlling the false discovery rate both when each hypothesis is evidenced only by a finite count of null draws, leaving its p-value uncertain, and when the hypothesis space carries arbitrary structure, requiring only that the structure be represented through a suitable reproducing kernel. We present two decision rules that are both robust to structural mis-specification, yet offer a distinct trade-off between exact FDR control and statistical power. The first rule guarantees exact FDR control; the second maximizes power by adapting mirror-statistic control into count space, utilizing an analytical framework to assess FDR control when exact mirror symmetry is relaxed. Furthermore, the tractability gained by the RKHS framework allows us to directly investigate finite-data uncertainties, which we leverage to suggest a policy for the efficient allocation of null distribution samples.
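The effect of a finite number of null draws on a p-value can be seen with a standard Monte Carlo p-value combined with the classical Benjamini-Hochberg procedure. This is textbook background rather than either of the paper's decision rules: it ignores the kernel structure of the hypothesis space entirely, and the function names are illustrative.

```python
import numpy as np

def empirical_p_value(observed, null_draws):
    """Conservative Monte Carlo p-value from a finite set of null draws."""
    null_draws = np.asarray(null_draws)
    return (1 + np.sum(null_draws >= observed)) / (1 + null_draws.size)

def benjamini_hochberg(p_values, q=0.1):
    """Boolean rejection mask of the Benjamini-Hochberg step-up procedure at level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])   # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject
```

With $m$ null draws the smallest attainable p-value is $1/(m+1)$, so a hypothesis backed by too few draws can never be rejected no matter how extreme its statistic; this is one reason the allocation of null samples matters.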


2606.15443 2026-06-16 math.OC cs.LG 交叉投稿

Coercivity and Local Convergence of Physical Learning in Linear Circuits

线性电路中物理学习的强制性与局部收敛性

Joshua A. McGinnis, Xinbo Li, Yoichiro Mori

发表机构 * Department of Mathematics, University of Pennsylvania, Philadelphia（宾夕法尼亚大学数学系）

AI总结针对线性电路，分析三种物理学习方法（平衡传播、耦合学习及其伴随变体）在小扰动极限下的局部收敛性，发现强制条件（基于网络结构的秩条件）保证指数收敛，且非退化情况是普遍的。

详情

AI中文摘要

物理学习方法利用系统的物理特性处理全局信息传递，仅通过局部更新规则训练物理网络执行计算任务。我们首次对三种此类方法——平衡传播（EP）、耦合学习（CL）以及我们提出的新方法伴随耦合学习（AL）——在离散和连续时间的小扰动极限下，针对线性电路进行了局部收敛性分析。EP和AL在自然损失函数上执行梯度下降，而CL遵循带有额外三次修正的修正动力学。假设解存在，我们识别出一个强制条件，表示为基于网络关联结构构建的矩阵的秩条件，在该条件下训练损失指数衰减且参数收敛到解流形。我们通过展示一个风筝电路（其中对称性导致强制常数在解流形上退化）证明了强制可能失败，但利用Sard定理证明这种退化是非典型的：对于几乎每个期望输出的选择，强制条件在解流形的每一点都成立。

英文摘要

Physical learning methods train physical networks to perform computational tasks using only local update rules, exploiting the physics of the system to handle the global transfer of information. We provide the first local convergence analysis of three such methods -- Equilibrium Propagation (EP), Coupled Learning (CL), and a new method we call Adjoint Coupled Learning (AL) -- for linear circuits, in the limit of small-nudging for both discrete and continuous time. EP and AL perform gradient descent on a natural loss function, while CL follows modified dynamics with an additional cubic correction. Assuming the existence of a solution, we identify a coercivity condition, expressed as a rank condition on a matrix built from the network's incidence structure, under which the training loss decays exponentially and the parameters converge to the solution manifold. We show that coercivity can fail by exhibiting a kite circuit in which a symmetry causes the coercivity constant to degenerate on the solution manifold, but prove using Sard's theorem that such degeneracies are non-generic: coercivity holds at every point of the solution manifold for almost every choice of desired output.


2606.15444 2026-06-16 math.OC cs.LG 交叉投稿

A Conservation Law for Equilibrium Propagation and Coupled Learning

平衡传播与耦合学习中的守恒律

Joshua A. McGinnis, Adam G. Kline, Yoichiro Mori

发表机构 * SAS, University of Pennsylvania（宾夕法尼亚大学SAS学院）

AI总结本文证明物理学习方法耦合学习和平衡传播在连续时间小扰动极限下守恒类质量量，并分析其对线性电路训练动力学的影响。

2606.15458 2026-06-16 stat.ML cs.LG 交叉投稿

二项逻辑混合模型中的信息差距与可行性感知推断

Yuta Hayashida, Shonosuke Sugasawa

AI总结研究二项逻辑混合模型中混合检测与标签恢复之间的信息差距，提出基于后验熵惩罚的可行性感知推断方法，避免误导性成分选择并改善后验标签概率校准。

Comments 33 pages (main) + 30 pages (supplement)

详情

AI中文摘要

本文研究二项逻辑混合模型中混合检测与标签恢复之间的信息差距。基于似然的标准准则（如贝叶斯信息准则，BIC）可以检测到两个成分的存在，但这并不能保证相应的标签是可恢复的。我们表明，这种差距对于具有固定试验次数的二项逻辑混合模型是内在的：观察到的混合结构证据和用于标签恢复的每个观测信息在成分分离度上具有不同的局部阶数，并且只有前者随样本量累积。因此，存在一个可检测但不可恢复的区域，其中BIC选择两个成分，而后验标签基本上没有信息。为了解决这个问题，我们提出了两种可行性感知推断程序：具有后验熵惩罚的可恢复性感知BIC，以及一种熵正则化估计器，它减轻了最大似然估计器产生过度分离成分和过度集中的后验责任的倾向。数值实验证实了预测的差距，并表明所提出的方法避免了误导性的成分选择，并改善了后验标签概率的校准。

英文摘要

This paper studies the information gap between mixture detection and label recovery in binomial logistic mixtures. Standard likelihood-based criteria such as the Bayesian information criterion (BIC) can detect the presence of two components, but this does not guarantee that the corresponding labels are recoverable. We show that this gap is intrinsic to binomial logistic mixtures with a fixed number of trials: observed-data evidence for mixture structure and per-observation information for label recovery have different local orders in the component separation, and only the former accumulates with the sample size. As a result, there exists a detectable-but-unrecoverable regime in which BIC selects two components while the posterior labels remain essentially uninformative. To address this issue, we propose two feasibility-aware inference procedures: a recoverability-aware BIC with a posterior-entropy penalty and an entropy-regularized estimator that mitigates the tendency of the maximum likelihood estimator to produce overly separated components and overly concentrated posterior responsibilities. Numerical experiments confirm the predicted gap and demonstrate that the proposed methods avoid misleading component selections and improve the calibration of posterior label probabilities.


2606.15679 2026-06-16 stat.ML cs.LG cs.NA math.NA 交叉投稿

Stochastic trace estimation with tensor train random vectors

基于张量列随机向量的随机迹估计

Zvonimir Bujanović, Daniel Kressner, Hrvoje Olić

发表机构 * University of Zagreb, Faculty of Science, Department of Mathematics（萨格勒布大学理学院数学系）； Institute of Mathematics, EPFL（洛桑联邦理工学院数学研究所）

AI总结研究使用高斯随机张量列向量进行随机迹估计，证明适当秩下可恢复维度无关保证，并应用于Nyström++框架。

详情

AI中文摘要

随机迹估计是一种标准工具，用于近似仅通过矩阵-向量乘积可获得的大规模矩阵的迹。然而，在张量结构设置中，非结构化的高斯或Rademacher测试向量在存储和计算上可能过于昂贵，而更便宜的秩一张量积向量可能需要随张量阶数指数增长的样本复杂度。本文研究高斯随机张量列向量作为随机迹估计的结构化替代方案。我们证明，通过适当选择张量列秩，随机张量列向量可以恢复Girard-Hutchinson估计器的维度无关保证。特别地，基于张量列秩$r \geq d-1$的中位数均值变体在精度$\varepsilon$和失败概率$\delta$上实现了与基于非结构化高斯向量的经典估计器相同的依赖性。我们进一步证明了由独立高斯随机张量列向量形成的草图的一个无意识子空间注入结果：张量列秩$r\geq d-1$和$\mathcal{O}(\varepsilon^{-2}(k+\log(1/\delta)))$个样本足以用于$k$维目标子空间。最后，我们研究了此类草图在Nyström++框架中的应用。我们证明，在额外的谱尾条件下，所得估计器可以实现所需的$\mathcal{O}(\varepsilon^{-1})$样本复杂度。这些结果阐明了随机张量列向量在随机迹估计中的潜力和局限性。

英文摘要

Stochastic trace estimation is a standard tool for approximating the trace of a large-scale matrix available only through matrix-vector products. However, in tensor-structured settings, unstructured Gaussian or Rademacher test vectors may be prohibitively expensive to store and compute with, while cheaper rank-one tensor-product vectors can require sample complexities that grow exponentially with the tensor order. This work studies Gaussian random tensor train vectors as a structured alternative for stochastic trace estimation. We show that, with a suitable choice of the tensor train rank, random tensor train vectors recover dimension-independent guarantees for the Girard--Hutchinson estimator. In particular, a median-of-means variant with tensor train rank $r \geq d-1$ achieves the same dependence on the accuracy $\varepsilon$ and failure probability $\delta$ as the classical estimator based on unstructured Gaussian vectors. We further prove an oblivious subspace injection result for sketches formed from independent Gaussian random tensor train vectors: tensor train rank $r\geq d-1$ and $\mathcal{O}(\varepsilon^{-2}(k+\log(1/\delta)))$ samples suffice for a $k$-dimensional target subspace. Finally, we investigate the use of such sketches within the Nyström++ framework. We show that the resulting estimator can achieve the desired $\mathcal{O}(\varepsilon^{-1})$ sample complexity under an additional spectral-tail condition. These results clarify both the potential and the limitations of random tensor train vectors in stochastic trace estimation.
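For reference, the classical Girard-Hutchinson estimator that this abstract takes as its baseline can be sketched in a few lines. The version below uses unstructured Rademacher test vectors, not the tensor train vectors studied in the paper; the function name and the `matvec` callback interface are illustrative assumptions.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_samples=100, seed=0):
    """Estimate trace(A) using only products v -> A @ v.

    Averages z^T A z over random sign vectors z, which satisfies
    E[z^T A z] = trace(A) because E[z z^T] is the identity.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=n)   # unstructured Rademacher vector
        total += z @ matvec(z)
    return total / num_samples
```

Each test vector here costs $n$ numbers to store; when $n$ is a product of $d$ mode sizes this is the storage burden that motivates structured (rank-one or tensor train) test vectors.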


2606.15923 2026-06-16 cs.NE cs.AI cs.LG 交叉投稿

Runtime Analysis of Cartesian Genetic Programming in Evolving Boolean Functions

笛卡尔遗传规划在演化布尔函数中的运行时分析

Duc-Cuong Dang, Roman Kalkreuth, Andre Opris

发表机构 * University of Passau（帕绍大学）； RWTH Aachen University（亚琛工业大学）

AI总结本文首次对笛卡尔遗传规划在完全训练集上演化布尔函数进行运行时分析，证明构造n输入合取式的期望适应度评估次数为O(n D^5)，并发现非严格选择可加速至O(n D^4)，而异或函数需要指数时间。

Comments To appear in the Proceedings of PPSN 2026

详情

AI中文摘要

笛卡尔遗传规划（CGP）是遗传规划中实用且流行的形式之一，因为它使用基于图的程序表示。本文首次对CGP在完全训练集上演化布尔函数进行运行时分析。我们证明了CGP使用最多D≥n-1个二元门、最小函数集，甚至采用严格生存选择时，构造n个输入的合取式的期望适应度评估次数的渐近界为O(n D^5)。当使用非严格选择时，该界改进为O(n D^4)。我们的分析揭示了CGP诱导搜索的有趣特征，这些特征此前仅通过经验观察得到。特别是，允许接受同样好的解（包括那些包含不贡献适应度的连接门的解）可以导致加速，从而获得更好的渐近时间界。与合取式相反，我们还证明了一个负面结果，即CGP需要指数时间来演化异或函数。演化合取式的实验补充了我们的理论发现。使用不完全训练集可以进一步减少平均适应度评估次数，同时保持较好的泛化水平。

英文摘要

Cartesian Genetic Programming (CGP) is among the practical and popular forms of Genetic Programming as it uses a graph-based representation of programs. This paper presents a first runtime analysis of CGP in evolving Boolean functions using complete training sets. We prove an asymptotic bound $O(n D^5)$ for the expected number of fitness evaluations of CGP to construct a conjunction of $n$ inputs using at most $D \geq n-1$ binary gates, a minimal function set, and even with a strict survival selection. When the non-strict selection is used, the bound is improved to $O(n D^4)$. Our analysis reveals interesting characteristics of CGP induced search, which have been only observed empirically. In particular, enabling the acceptance of equally good solutions, including those with connected gates non-contributing to fitness, can lead to a speedup, and consequently a better asymptotic time bound. In contrast to conjunctions, we also prove a negative result which shows that CGP requires exponential time to evolve an exclusive disjunction. Experiments evolving conjunctions complement our theoretical findings. The use of incomplete training sets is found to further reduce the average number of fitness evaluations while maintaining a good level of generalisation.


2606.15962 2026-06-16 stat.ME cs.LG 交叉投稿

p-PSO: A Penalized Particle Swarm Optimization Technique for Finding D-Optimal Designs with Mixed Factors in Generalized Linear Models

p-PSO: 一种用于广义线性模型中混合因子D-最优设计的惩罚粒子群优化技术

Shrabanti Chowdhury, Abhyuday Mandal

发表机构 * Icahn School of Medicine at Mount Sinai（伊坎医学院）； University of Georgia（佐治亚大学）

AI总结提出一种新的惩罚粒子群优化方法p-PSO，通过通用惩罚公式解决广义线性模型中混合因子D-最优设计问题，高效且可直接使用现成PSO算法。

详情

AI中文摘要

寻找广义线性模型（GLMs）的D-最优设计具有挑战性，因为Fisher信息矩阵依赖于未知参数且缺乏闭式解，尤其当输入因子包含离散和连续变量时。尽管经典算法和最近的元启发式方法提供了部分解决方案，但仍需要稳健且计算高效的方法。本文提出了一种惩罚粒子群优化（PSO）方法，称为$p$-PSO。我们引入了一种新的、通用的约束优化惩罚公式，并展示了其在最优设计问题中的有效性。该公式与算法无关，适用于一大类黑箱优化方法。结果表明，该方法非常高效，其主要贡献在于提出了一种惩罚公式，使得可以直接使用现成的PSO算法，并自然地扩展到更一般的约束优化任务。

英文摘要

Finding D-optimal designs for generalized linear models (GLMs) is challenging due to the dependence of the Fisher information matrix on unknown parameters and the lack of closed-form solutions, particularly when input factors include both discrete and continuous variables. Although classical algorithms and recent metaheuristic approaches have offered partial solutions, there remains a need for robust and computationally efficient methods. In this paper, we propose a penalized Particle Swarm Optimization (PSO) approach, named $p$-PSO. Here we introduce a new, general-purpose penalty formulation for constrained optimization and demonstrate its effectiveness in optimal design problems. The formulation is algorithm-agnostic and applicable to a broad class of black-box optimization methods. Results show that the method is highly efficient, with its primary contribution being a penalty formulation that enables the direct use of an off-the-shelf PSO algorithm and extends naturally to more general constrained optimization tasks.


2606.16013 2026-06-16 cond-mat.dis-nn cs.LG physics.data-an stat.ML 交叉投稿

The limits of interpretability in multiple linear regression

多元线性回归中可解释性的极限

Anand Sharma, Chen Liu, Daniele Coslovich, Misaki Ozawa

发表机构 * Indian Institute of Science Education and Research（印度科学教育与研究学院）； Innovation and Research Division, Ge-Room Inc.（Ge-Room公司创新与研究部）； Dipartimento di Fisica, Università di Trieste（的里雅斯特大学物理系）； Univ. Grenoble Alpes, CNRS, LIPhy（格勒诺布尔阿尔卑斯大学，CNRS，LIPhy）

AI总结本文通过分析特征相关矩阵的本征模，理论解释了多重共线性导致线性回归权重不稳定和振荡模式，从而丧失可解释性的机制，并验证了岭回归的缓解作用。

Comments 23 pages, 8 figures

详情

AI中文摘要

解释机器学习模型已引起越来越多的关注，特别是在物理科学中，人们常常寻求理解潜在机制而不仅仅是进行预测。多元线性回归通常被视为比深度神经网络等更复杂模型更具可解释性的替代方案，因为其预测表示为输入特征的显式加权和。然而，当输入特征强相关时，即存在多重共线性时，学习到的权重可能表现出数据集间的大幅波动和跨物理相似特征的振荡行为，使得其解释变得困难甚至不可能。尽管统计学家熟知多重共线性下权重的不稳定性，但其对物理解释的影响，特别是与跨物理相似特征的振荡权重的联系，尚未得到系统阐明。本文通过分析特征相关矩阵的本征模，从理论上讨论了这种可解释性丧失背后的机制。我们表明，与多重共线性相关的小本征值模式会放大权重的波动，并产生不一定反映有意义贡献的振荡模式。我们在物理数据集上数值验证了这一理论图景，并表明岭回归抑制了这些不稳定模式，尽管得到的权重仍需谨慎解释。通过分析多种公开数据集，我们进一步证实了研究结果的普适性。我们的结果阐明了为何在存在多重共线性的情况下，即使对于线性回归模型，物理解释仍然可能困难。

英文摘要

Interpreting machine-learning models has attracted increasing attention, particularly in the physical sciences, where one often seeks to understand the underlying mechanisms rather than merely make predictions. Multiple linear regression is often regarded as an interpretable alternative to more complex models, such as deep neural networks, because its predictions are expressed as explicit weighted sums of input features. However, when input features are strongly correlated, namely in the presence of multicollinearity, the learned weights can exhibit large dataset-to-dataset fluctuations and oscillatory behavior across physically similar features, making their interpretation difficult or even impossible. Although the instability of the weights under multicollinearity is well known in statistics, its consequences for physical interpretation, in particular its connection to oscillatory weights across physically similar features, have not been systematically clarified. Here, we theoretically discuss the mechanism behind this loss of interpretability by analyzing the eigenmodes of the feature correlation matrix. We show that small-eigenvalue modes associated with multicollinearity amplify fluctuations in the weights and generate oscillatory patterns that do not necessarily reflect meaningful contributions. We test this theoretical picture numerically on physics datasets and show that Ridge regularization suppresses these unstable modes, although the resulting weights must still be interpreted with caution. We further confirm the generality of our findings beyond physics by analyzing a diverse collection of publicly available datasets. Our results clarify why, in the presence of multicollinearity, physical interpretation can remain difficult even for linear regression models.
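The eigenmode picture in this abstract can be illustrated by writing the ridge solution in the eigenbasis of $X^\top X$: each mode's coefficient is its projection divided by $\lambda_k + \alpha$, so modes with small $\lambda_k$ are strongly amplified when $\alpha = 0$. The sketch below uses the unnormalized Gram matrix rather than the feature correlation matrix analyzed in the paper, and the function name is hypothetical.

```python
import numpy as np

def ridge_weights_by_mode(X, y, alpha):
    """Ridge weights expanded in the eigenmodes of X^T X.

    Returns (weights, eigenvalues, per-mode coefficients).
    alpha = 0 gives ordinary least squares when X^T X is invertible.
    """
    lam, V = np.linalg.eigh(X.T @ X)   # ascending eigenvalues, eigenvectors in columns
    proj = V.T @ (X.T @ y)             # data projected onto each mode
    coeff = proj / (lam + alpha)       # small lam => large gain unless alpha damps it
    return V @ coeff, lam, coeff
```

For two nearly identical features the small-eigenvalue mode is proportional to their difference, so noise in that mode appears as weights of opposite sign on the two features, which is the oscillatory pattern the abstract describes.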


2606.16077 2026-06-16 cs.CC cs.LG 交叉投稿

Polynomial-Time Mistake-Bounded Language Generation

多项式时间错误有界语言生成

Héctor Jimenez, Alexander Kozachinskiy, Vicente Opazo

发表机构 * University of Chile（智利大学）； CENIA

AI总结本文提出多项式时间错误有界语言生成框架，证明奇偶函数族、文字合取族以及具有多项式多个极大项的单调布尔函数族属于该框架，后者包含所有多项式大小决策树可计算的单调函数。

2606.16564 2026-06-16 cs.RO cs.LG 交叉投稿

Elastic ODYN: Differentiable Optimization for Infeasible Control and Learning in Robotics

Elastic ODYN：面向机器人中不可行控制与学习的可微优化

Aristotelis Papatheodorou, Jose Rojas, Ioannis Havoutis, Carlos Mastalli

发表机构 * University of Oxford（牛津大学）； Heriot-Watt University（赫瑞瓦特大学）

AI总结提出Elastic ODYN，一种通过平滑平方ℓ2弹性松弛处理不可行二次规划（QP）的原始-对偶非内点求解器，支持热启动，在无可行点时收敛到最接近可行解，并基于此开发可微QP层和不可行感知SQP方法，在基准QP、奇异接触力学、可微参数辨识及四足/人形机器人轨迹优化中优于现有方法。

Comments 8 pages, 5 figures, 2 tables

详情

AI中文摘要

机器人系统经常遇到冲突的目标、建模误差和退化接触条件，这些条件使得二次规划（QP）不可行。然而，大多数优化求解器和可微QP层假设可行性，当约束无法同时满足时，会导致数值失败、梯度不稳定或求解器崩溃。我们提出Elastic ODYN，一种原始-对偶非内点QP求解器，通过平滑平方ℓ2弹性松弛处理不可行性。所得公式在病态和退化条件下保持良态，支持热启动，并在无可行点时收敛到最接近可行解。一个轻量级细化阶段从弹性解中恢复有物理意义的对偶变量。基于此框架，我们开发了Elastic OdynLayer，一个在不可行性下具有稳定梯度的可微QP层，以及Elastic OdynSQP，一种不可行感知的SQP方法，通过选择性约束松弛解决不一致的子问题和本质不可行的最优控制任务。我们在基准QP、奇异接触力学、可微参数辨识以及四足和人形机器人轨迹优化上评估该框架。在所有设置中，Elastic ODYN在鲁棒性、热启动性能和收敛可靠性方面始终优于最先进的弹性QP求解器，使得优化、仿真、控制和学习能够超越现有方法的可行性假设。

英文摘要

Robotic systems routinely encounter conflicting objectives, modeling errors, and degenerate contact conditions that render quadratic programs (QPs) infeasible. Yet most optimization solvers and differentiable QP layers assume feasibility, leading to numerical failures, unstable gradients, or solver breakdown when constraints cannot be simultaneously satisfied. We present Elastic ODYN, a primal--dual non-interior-point QP solver that handles infeasibility through smooth squared-$\ell_2$ elastic relaxations. The resulting formulation remains well posed under ill-conditioning and degeneracy, supports warm starting, and converges to closest-to-feasible solutions when no feasible point exists. A lightweight refinement stage recovers physically meaningful dual variables from the elastic solution. Building on this framework, we develop Elastic OdynLayer, a differentiable QP layer with stable gradients under infeasibility, and Elastic OdynSQP, an infeasibility-aware SQP method that resolves inconsistent subproblems and intrinsically infeasible optimal control tasks through selective constraint relaxation. We evaluate the framework on benchmark QPs, singular contact mechanics, differentiable parameter identification, and quadrupedal and humanoid trajectory optimization. Across all settings, Elastic ODYN consistently outperforms state-of-the-art elastic QP solvers in robustness, warm-start performance, and convergence reliability, enabling optimization, simulation, control, and learning beyond the feasibility assumptions of existing methods.
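To make the idea of a squared-l2 elastic relaxation concrete, here is a minimal one-dimensional sketch. It is a generic penalty illustration under stated assumptions, not the Elastic ODYN algorithm: the objective `0.5*x**2`, the two conflicting equality constraints, the function name, and the penalty weight `rho` are all hypothetical choices made for this example.

```python
def elastic_solution(targets, rho):
    """Minimize 0.5*x**2 + 0.5*rho*sum((x - t)**2 for t in targets).

    The equality constraints x = t are mutually inconsistent when the
    targets differ, so each one is softened by a squared-l2 penalty.
    Setting the derivative x + rho*sum(x - t) to zero gives the
    closed form below.
    """
    return rho * sum(targets) / (1.0 + rho * len(targets))


targets = [1.0, 3.0]  # x = 1 and x = 3 cannot both hold
for rho in (1.0, 100.0, 1e6):
    print(rho, elastic_solution(targets, rho))
```

The relaxed problem always has a unique minimizer, and as `rho` grows the solution approaches the mean of the targets, which is the point that violates the two constraints least in the least-squares sense.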

2606.16926 2026-06-16 math.OC cs.LG stat.ML 交叉投稿

Functional Gradient Descent with Adaptive Representations

自适应表示的函数梯度下降

Daniel Csillag, Rodrigo Schuller, Pedro Dall'Antonia, Leonidas Guibas, Luiz Velho, Tiago Novello

AI总结提出一种自适应表示的函数梯度下降算法，通过将近似误差纳入分析，在平滑损失下收敛到驻点，在PL条件下收敛到全局最小值，在回归、PDE求解和计算机视觉中优于固定近似FGD和神经网络基线。

AI中文摘要

函数优化问题通常通过优化固定表示（如神经网络）的参数来解决，这导致高度非凸的损失，使训练和理论分析复杂化。一个有趣的替代方案是函数梯度下降（FGD），即直接在函数空间中进行梯度下降，它受益于强收敛结果并具有简洁的理论。然而，FGD在实践中难以实现，因为函数梯度是无限维的，因此无法完全计算或存储在内存中。现有的实现因此依赖于固定近似，这引入了近似误差。我们提出了一种新的、有理论基础的FGD算法，该算法在优化过程中自适应地调整函数梯度的表示。通过将这种近似明确地纳入分析，我们证明了无论近似如何，算法都能收敛到驻点（对于平滑损失）和全局最小值（在平滑性和Polyak-Lojasiewicz型条件下）。据我们所知，这是第一个在一般设置下具有此类保证的可实现FGD方法。我们在回归、偏微分方程的数值求解和现代计算机视觉中展示了我们方法的有效性。在各种设置中，我们的方法在效率和准确性上始终优于固定近似的FGD和神经网络基线。

英文摘要

Functional optimization problems are typically solved by optimizing the parameters of a fixed representation, such as a neural network, resulting in highly nonconvex losses that complicate both training and theoretical analysis. An interesting alternative is functional gradient descent (FGD), that is, gradient descent directly in function space, which benefits from strong convergence results and admits a clean theory. However, FGD is difficult to implement in practice because functional gradients are infinite-dimensional, and thus cannot be fully computed nor stored in memory. Existing implementations therefore rely on fixed approximations, which introduce approximation error. We propose a new, theoretically-grounded FGD algorithm that adapts the representation of the functional gradients over the course of optimization. By explicitly incorporating this approximation into the analysis, we establish convergence to a stationary point (for smooth losses) and to a global minimizer (under smoothness + a Polyak-Lojasiewicz-type condition) regardless of our approximations. To the best of our knowledge, this is the first implementable FGD method with such guarantees in a general setting. We demonstrate the effectiveness of our method on regression, numerical solution of PDEs, and modern computer vision. Across settings, our method consistently outperforms both FGD with fixed approximations and neural network baselines in efficiency and accuracy.

2606.16941 2026-06-16 stat.ML cs.LG 交叉投稿

A nonparametric two-sample test using a parametric integral probability metric

使用参数化积分概率度量的非参数双样本检验

Yuha Park, Yongdai Kim

发表机构 * University of Hamburg（汉堡大学）； Seoul National University（首尔国立大学）

AI总结提出基于单节点神经网络的参数化判别器类构造积分概率度量，得到非参数检验统计量PReLU-IPM，并证明其一致性和渐近等价性，实验表明有限样本下检验功效更高或相当。

Comments 45 pages. Accepted for publication in Statistical Analysis and Data Mining

AI中文摘要

检测两个独立样本之间的分布差异是统计学和机器学习中的一个基本问题。非参数双样本检验提供了一个原则性框架，用于确定两个样本是否来自同一潜在分布，而不假设分布的任何特定参数形式。在本研究中，我们基于新引入的积分概率度量（IPM），使用一个特殊设计的、具有神经网络单节点的参数化判别器类，提出了一种新的双样本检验统计量。我们证明了所得到的检验统计量PReLU-IPM是非参数的，并为相关的双样本检验程序PReLU-TST建立了理论保证，包括其一致性以及在正则条件下与非参数基于IPM的检验的渐近等价性。通过分析多个模拟和真实基准数据集，我们证明了PReLU-TST在有限样本下，在一系列备择假设中实现了更高的检验功效，或与竞争对手表现相当。

英文摘要

Detecting distributional differences between two independent samples is a fundamental problem in statistics and machine learning. Nonparametric two-sample testing provides a principled framework for determining whether two samples are drawn from the same underlying distribution, without assuming any specific parametric form for the distribution. In this study, we propose a new two-sample test statistic based on a newly introduced integral probability metric (IPM), using a specially designed parametric discriminator class with a single node of a neural network. We show that the resulting test statistic, called PReLU-IPM, is nonparametric and establish theoretical guarantees for the associated two-sample testing procedure, PReLU-TST, including its consistency and asymptotical equivalence to nonparametric IPM-based tests under regularity conditions. By analyzing multiple simulated and real benchmark datasets, we demonstrate that PReLU-TST achieves higher power across a range of alternatives or performs comparably to its competitors, for finite samples.

2606.16975 2026-06-16 stat.ML cs.LG 交叉投稿

Sobolev Approximation by Fixed-Size Neural Networks with Arbitrary Accuracy

固定大小神经网络实现任意精度的Sobolev逼近

Baicheng Li, Haizhao Yang, Shijun Zhang

AI总结提出新型激活函数（EUAF、DUAF∞等），使固定大小神经网络能以任意精度逼近Sobolev空间中的函数，并给出显式的宽度和深度界。

AI中文摘要

本文研究用于固定大小神经网络实现任意精度Sobolev逼近的新型激活函数。我们首先证明，任何$W^{2,\infty}((a,b)^d)$中的函数都可以通过使用基本通用激活函数（$\mathrm{EUAF}$）的固定大小神经网络，以$W^{1,\infty}$范数度量达到任意精度。为了将此结果推广到$s\in\mathbb{N}$时的$W^{s,\infty}((a,b)^d)$，我们引入了来自可微通用激活函数族（$\mathrm{DUAF}_n$）的光滑激活函数$\mathrm{DUAF}_{\infty}$。我们证明，任何$W^{s,\infty}((a,b)^d)$中的函数都可以通过固定大小的$\mathrm{DUAF}_{\infty}$激活网络，以$W^{s-1,\infty}$范数度量达到任意精度。我们进一步构造了Sigmoid变体$\widetilde{\mathrm{DUAF}}_n$，并证明对于每个$1\leq s\leq n$，固定大小的$\widetilde{\mathrm{DUAF}}_n$激活网络仍能以$W^{s-1,\infty}$范数度量任意逼近任何$f\in W^{s,\infty}((a,b)^d)$。在所有结果中，宽度和深度界均被显式计算，且所提出的激活函数是初等的。

英文摘要

In this work, we investigate new activation functions for achieving arbitrary-accuracy Sobolev approximation by fixed-size neural networks. We first show that any function in $W^{2,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy, measured in the $W^{1,\infty}$-norm, by a fixed-size neural network using the Elementary Universal Activation Function ($\mathrm{EUAF}$). To extend this result to $W^{s,\infty}((a,b)^d)$ for $s\in\mathbb{N}$, we introduce a smooth activation $\mathrm{DUAF}_{\infty}$ from the family of Differentiable Universal Activation Functions ($\mathrm{DUAF}_n$). We prove that any function in $W^{s,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy in the $W^{s-1,\infty}$-norm by a fixed-size $\mathrm{DUAF}_{\infty}$-activated network. We further construct sigmoidal variants $\widetilde{\mathrm{DUAF}}_n$ and show that, for every $1\leq s\leq n$, fixed-size $\widetilde{\mathrm{DUAF}}_n$-activated networks still approximate any $f\in W^{s,\infty}((a,b)^d)$ with arbitrary accuracy in the $W^{s-1,\infty}$-norm. In all these results, the width and depth bounds are computed explicitly, and the proposed activations are elementary.

2606.17000 2026-06-16 cs.CC cs.GT cs.LG math.OC 交叉投稿

The Complexity of Min-Max Optimization for Quadratic Polynomials

二次多项式极小极大优化的复杂性

Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alexandros Hollender

AI总结证明超立方体上极小极大优化的近似稳定点计算对二次多项式是PPAD难的，即使多项式是多线性的且每个变量最多出现在三个单项式中。

2606.17013 2026-06-16 math.OC cs.LG 交叉投稿

Exploding and vanishing gradients in deep neural networks: the effect of residual connections

深度神经网络中的梯度爆炸和消失：残差连接的影响

Vivek S Borkar

AI总结利用乘法遍历理论分析深度神经网络中的梯度爆炸与消失现象，并解释残差连接对李雅普诺夫谱的影响。

Comments 10 pages

2409.08066 2026-06-16 cs.LG math.OC 版本更新

Self-Supervised Learning of Iterative Solvers for Constrained Optimization

约束优化的迭代求解器的自监督学习

Lukas Lüken, Sergio Lucia

发表机构 * Chair of Process Automation Systems, TU Dortmund University（多特蒙德工业大学过程自动化系统教席）

AI总结提出一种基于学习的迭代求解器，通过神经网络预测初始解并用学习型迭代器精炼，利用KKT条件设计损失函数实现自监督训练，在非凸问题上比IPOPT快一个数量级且精度更高。

Comments This work has been published in Results in Control and Optimization. Update to accepted manuscript

DOI: 10.1016/j.rico.2026.100751
Journal ref: Results in Control and Optimization, Volume 23, 2026

关于过参数化矩阵感知中权重归一化的优势

Yudong Wei, Liang Zhang, Bingcong Li, Niao He

发表机构 * ETH Zurich（苏黎世联邦理工学院）

AI总结本文证明在过参数化矩阵感知中，权重归一化结合黎曼优化可实现线性收敛，相比未使用归一化的方法获得指数级加速，且过参数化程度越高，迭代和样本复杂度多项式级降低。

2512.02494 2026-06-16 cs.LG 版本更新

A Fully First-Order Layer for Differentiable Optimization

用于可微优化的全一阶层

Zihao Zhao, Kai-Chia Mo, Shing-Hei Ho, Brandon Amos, Kai Wang

发表机构 * University of California, Berkeley（加州大学伯克利分校）； DeepMind（深度思维）

AI总结提出一种仅使用一阶信息计算梯度的算法，通过将可微优化重写为双层优化并引入活动集拉格朗日超梯度方法，避免Hessian计算，实现高效近似。

Comments ICML 2026

AI中文摘要

可微优化层使得学习系统能够通过求解嵌入的优化问题来做出决策。然而，通过隐式微分计算梯度需要求解一个包含Hessian项的线性系统，这既计算密集又内存密集。为了解决这一挑战，我们提出了一种仅使用一阶信息计算梯度的新算法。关键洞察是将可微优化重写为双层优化问题，并利用双层方法的最新进展。具体来说，我们引入了一个活动集拉格朗日超梯度方法，避免了Hessian计算，并提供了有限时间、非渐近的近似保证。我们证明，仅使用一阶信息即可在$\tilde{O}(1)$时间内计算出近似超梯度，从而使得约束双层优化的总体复杂度为$\tilde{O}(\delta^{-1}\epsilon^{-3})$，这与非光滑非凸优化的最佳已知速率相匹配。此外，我们发布了一个开源Python库，可以轻松地从现有求解器进行适配。源代码可在该https URL获取。

英文摘要

Differentiable optimization layers enable learning systems to make decisions by solving embedded optimization problems. However, computing gradients via implicit differentiation requires solving a linear system with Hessian terms, which is both compute- and memory-intensive. To address this challenge, we propose a novel algorithm that computes the gradient using only first-order information. The key insight is to rewrite the differentiable optimization as a bilevel optimization problem and leverage recent advances in bilevel methods. Specifically, we introduce an active-set Lagrangian hypergradient oracle that avoids Hessian evaluations and provides finite-time, non-asymptotic approximation guarantees. We show that an approximate hypergradient can be computed using only first-order information in $\tilde{O}(1)$ time, leading to an overall complexity of $\tilde{O}(\delta^{-1}\epsilon^{-3})$ for constrained bilevel optimization, which matches the best known rate for non-smooth non-convex optimization. Furthermore, we release an open-source Python library that can be easily adapted from existing solvers. The source code is available at https://github.com/guaguakai/FFOLayer.

2602.12471 2026-06-16 cs.LG 版本更新

Tight Bounds for Logistic Regression with Large Stepsize Gradient Descent in Low Dimension

低维大步长梯度下降逻辑回归的紧界

Michael Crawshaw, Mingrui Liu

发表机构 * George Mason University（乔治·梅森大学）

AI总结针对可分离数据二分类的逻辑回归，研究大步长梯度下降的收敛速率，通过精细分析正交子空间振荡动力学，给出过渡时间的紧界，得到改进的损失上界。

Comments COLT 2026 camera ready

AI中文摘要

我们考虑用梯度下降最小化逻辑损失来训练线性模型进行可分离数据的二分类的优化问题。在$T$次迭代的预算下，最近研究表明通过选择大步长$\eta = \Theta(\gamma^2 T)$（其中$\gamma$是数据集的间隔）可以实现加速的$1/T^2$速率，尽管损失会出现非单调性。在本文中，我们针对数据是二维的情况提供了梯度下降在该问题上的更紧分析：我们证明，只要$T \geq \Omega(n/\gamma + 1/\gamma^2)$（其中$n$是数据集大小），具有足够大学习率$\eta$的GD就能找到损失小于$\mathcal{O}(1/(\eta \gamma^2 T))$的点。我们的改进速率来自于对GD从非稳定（非单调损失）过渡到稳定（单调损失）所需时间$\tau$的更紧界，这是通过对最大间隔分类器正交子空间中GD的振荡动力学进行精细分析得到的。我们还给出了$\tau$的下界，与上界匹配至对数因子，表明我们的分析是紧的。

英文摘要

We consider the optimization problem of minimizing the logistic loss with gradient descent to train a linear model for binary classification with separable data. With a budget of $T$ iterations, it was recently shown that an accelerated $1/T^2$ rate is possible by choosing a large stepsize $\eta = \Theta(\gamma^2 T)$ (where $\gamma$ is the dataset's margin) despite the resulting non-monotonicity of the loss. In this paper, we provide a tighter analysis of gradient descent for this problem when the data is two-dimensional: we show that GD with a sufficiently large learning rate $\eta$ finds a point with loss smaller than $\mathcal{O}(1/(\eta \gamma^2 T))$, as long as $T \geq \Omega(n/\gamma + 1/\gamma^2)$, where $n$ is the dataset size. Our improved rate comes from a tighter bound on the time $\tau$ that it takes for GD to transition from unstable (non-monotonic loss) to stable (monotonic loss), via a fine-grained analysis of the oscillatory dynamics of GD in the subspace orthogonal to the max-margin classifier. We also provide a lower bound of $\tau$ matching our upper bound up to logarithmic factors, showing that our analysis is tight.

2602.14154 2026-06-16 cs.LG math.OC 版本更新

A Penalty Approach for Differentiation Through Black-Box Quadratic Programming Solvers

一种通过黑箱二次规划求解器进行微分的惩罚方法

Yuxuan Linghu, Zhiyuan Liu, Qi Deng

AI总结提出dXPP，一种基于惩罚的微分框架，通过解耦QP求解与微分，利用黑箱求解器进行前向传播，并在反向传播中隐式微分一个光滑近似惩罚问题，显著提升大规模问题的计算效率和鲁棒性。

Comments 16 pages, 4 figures, 5 tables

AI中文摘要

通过二次规划（QP）的解进行微分是可微优化中的一个核心问题。大多数现有方法通过Karush--Kuhn--Tucker（KKT）系统进行微分，但其计算成本和数值鲁棒性在大规模时会下降。为了解决这些限制，我们提出了dXPP，一种基于惩罚的微分框架，将QP求解与微分解耦。在求解步骤（前向传播）中，dXPP与求解器无关，可以利用任何黑箱QP求解器。在微分步骤（反向传播）中，我们将解映射到一个光滑的近似惩罚问题，并通过隐式微分对其进行微分，仅需求解一个在原始变量上小得多的线性系统。这种方法绕过了显式KKT微分固有的困难，显著提高了计算效率和鲁棒性。我们在各种任务上评估了dXPP，包括随机生成的QP、大规模稀疏投影问题以及一个真实的多期投资组合优化任务。实验结果表明，dXPP与基于KKT的微分方法具有竞争力，并在大规模问题上实现了显著的加速。我们的实现是开源的，可在此https URL获取。

英文摘要

Differentiating through the solution of a quadratic program (QP) is a central problem in differentiable optimization. Most existing approaches differentiate through the Karush--Kuhn--Tucker (KKT) system, but their computational cost and numerical robustness can degrade at scale. To address these limitations, we propose dXPP, a penalty-based differentiation framework that decouples QP solving from differentiation. In the solving step (forward pass), dXPP is solver-agnostic and can leverage any black-box QP solver. In the differentiation step (backward pass), we map the solution to a smooth approximate penalty problem and implicitly differentiate through it, requiring only the solution of a much smaller linear system in the primal variables. This approach bypasses the difficulties inherent in explicit KKT differentiation and significantly improves computational efficiency and robustness. We evaluate dXPP on various tasks, including randomly generated QPs, large-scale sparse projection problems, and a real-world multi-period portfolio optimization task. Empirical results demonstrate that dXPP is competitive with KKT-based differentiation methods and achieves substantial speedups on large-scale problems. Our implementation is open source and available at https://github.com/mmmmmmlinghu/dXPP.
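The backward-pass idea, implicitly differentiating a penalized problem instead of the KKT system, can be illustrated on a scalar toy problem. This sketch is not dXPP: the abstract describes a smooth approximate penalty for general QPs, whereas the example below assumes a single equality constraint `a*x = b`, a plain quadratic penalty with weight `rho`, and the objective `0.5*x**2 - theta*x`. All names and values are hypothetical.

```python
def solve_penalized(theta, a, b, rho):
    """Minimizer of 0.5*x**2 - theta*x + 0.5*rho*(a*x - b)**2."""
    return (theta + rho * a * b) / (1.0 + rho * a * a)


def implicit_grad(a, rho):
    """dx*/dtheta from the stationarity condition
    g(x, theta) = x - theta + rho*a*(a*x - b) = 0,
    i.e. dx*/dtheta = -(dg/dx)^{-1} * dg/dtheta = 1 / (1 + rho*a**2)."""
    dg_dx = 1.0 + rho * a * a
    dg_dtheta = -1.0
    return -dg_dtheta / dg_dx


a, b, rho, theta, h = 2.0, 1.0, 10.0, 0.5, 1e-6
# Central finite difference through the solver as an independent check.
finite_diff = (solve_penalized(theta + h, a, b, rho)
               - solve_penalized(theta - h, a, b, rho)) / (2 * h)
print(implicit_grad(a, rho), finite_diff)
```

The derivative only requires the Hessian of the penalized objective in the primal variable, here the scalar `1 + rho*a**2`. In the toy case this plays the role of the smaller primal-only linear system mentioned in the abstract.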

2602.19172 2026-06-16 cs.LG 版本更新

Online Realizable Regression and Applications for ReLU Networks

在线可实现回归及其在ReLU网络中的应用

Ilan Doron-Arad, Idan Mehalel, Elchanan Mossel

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； The Hebrew University of Jerusalem（耶路撒冷希伯来大学）

AI总结研究对抗模型下满足近似三角不等式的损失函数的可实现在线回归，提出熵势方法通过覆盖数上界化缩放Littlestone维数，并应用于Lipschitz回归和ReLU网络，揭示回归与分类的差异。

AI中文摘要

可实现在线回归的行为可能与在线分类截然不同。即使没有任何间隔或随机性假设，在类度量损失下，可实现性也可能保证与时间范围无关的（有限）累积损失，即使相应的分类问题的错误界是无限的。我们研究了对抗模型下满足近似三角不等式（近似伪度量）的损失函数的可实现在线回归。Attias等人最近的工作表明，极小极大可实现累积损失由缩放的Littlestone/在线维数 $\mathbb{D}_{\mathrm{onl}}$ 刻画，但这个量可能难以分析。我们的主要技术贡献是一个通用的势方法，通过一个具体的Dudley型熵积分来上界 $\mathbb{D}_{\mathrm{onl}}$，该积分仅依赖于假设类在诱导的sup伪度量下的覆盖数。我们定义了一个熵势 $\Phi(\mathcal{H})=\int_{0}^{\operatorname{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$，其中 $N(\mathcal{H},\varepsilon)$ 是 $\mathcal{H}$ 的 $\varepsilon$-覆盖数，并证明对于每个 $c$-近似伪度量损失，$\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,\Phi(\mathcal{H})$。特别地，多项式度量熵意味着 $\Phi(\mathcal{H})<\infty$，从而得到与时间范围无关、且对有效维数的依赖清晰的可实现累积损失界。我们在两个函数族上说明了该方法。我们证明了可实现在线学习的尖锐 $q$-vs.-$d$ 二分法（对于 $L$-Lipschitz回归，当且仅当 $q>d$ 时，总损失有限且可高效达到 $\Theta_{d,q}(L^d)$，否则为无限），并且对于有界范数的 $k$-ReLU网络，将回归（有限损失，甚至为 $\widetilde O(k^2)$，单个ReLU时为 $O(1)$）与分类（$k=2,d=1$ 时已不可能）区分开来。

英文摘要

Realizable online regression can behave very differently from online classification. Even without any margin or stochastic assumptions, realizability may enforce horizon-free (finite) cumulative loss under metric-like losses, even when the analogous classification problem has an infinite mistake bound. We study realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics). Recent work of Attias et al. shows that the minimax realizable cumulative loss is characterized by the scaled Littlestone/online dimension $\mathbb{D}_{\mathrm{onl}}$, but this quantity can be difficult to analyze. Our main technical contribution is a generic potential method that upper bounds $\mathbb{D}_{\mathrm{onl}}$ by a concrete Dudley-type entropy integral that depends only on covering numbers of the hypothesis class under the induced sup pseudo-metric. We define an entropy potential $\Phi(\mathcal{H})=\int_{0}^{\operatorname{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$, where $N(\mathcal{H},\varepsilon)$ is the $\varepsilon$-covering number of $\mathcal{H}$, and show that for every $c$-approximate pseudo-metric loss, $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,\Phi(\mathcal{H})$. In particular, polynomial metric entropy implies $\Phi(\mathcal{H})<\infty$ and hence a horizon-free realizable cumulative-loss bound with transparent dependence on effective dimension. We illustrate the method on two families. We prove a sharp $q$-vs.-$d$ dichotomy for realizable online learning (finite and efficiently achievable $\Theta_{d,q}(L^d)$ total loss for $L$-Lipschitz regression iff $q>d$, otherwise infinite), and for bounded-norm $k$-ReLU networks separate regression (finite loss, even $\widetilde O(k^2)$, and $O(1)$ for one ReLU) from classification (impossible already for $k=2,d=1$).
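A short worked instance shows why polynomial metric entropy makes the entropy potential finite. The covering-number bound used here is an assumed example, not a result quoted from the paper: suppose the diameter of $\mathcal{H}$ is at most $1$ and $N(\mathcal{H},\varepsilon)\le(1/\varepsilon)^d$ for $\varepsilon\in(0,1]$. Since the integrand is nonnegative, the integral can be extended to $[0,1]$ and evaluated in closed form:

```latex
\Phi(\mathcal{H})
  = \int_{0}^{\operatorname{diam}(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon
  \le d \int_{0}^{1} \log\frac{1}{\varepsilon}\,d\varepsilon
  = d\,\bigl[\varepsilon - \varepsilon\log\varepsilon\bigr]_{0}^{1}
  = d .
```

Under this assumption the stated inequality gives $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,d$, a bound that does not depend on the number of rounds.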

2603.09923 2026-06-16 cs.LG cs.NA math.NA math.OC 版本更新

OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality

OptEMA：用于零噪声最优性的随机优化的自适应指数移动平均

Ganzhao Yuan

发表机构 * Shenzhen University of Advanced Technology (SUAT)（深圳理工大学）

AI总结提出OptEMA自适应优化器，通过闭环修正的AdaGrad-Norm系数调度，在无噪声时自动达到近最优确定速率，无需超参数重调。

AI中文摘要

指数移动平均（EMA）是广泛使用的自适应优化器（如Adam）的核心组件。然而，现有的Adam类方法分析在零噪声场景下往往产生次优保证，依赖于开环参数调度，或需要预先知道光滑常数。受这些限制的启发，我们引入了OptEMA并分析了两个互补变体：OptEMA-M，它对一阶矩应用自适应递减的EMA系数并固定二阶矩衰减；OptEMA-V，交换了这些角色。这些变体的核心是修正的AdaGrad-Norm系数调度。该公式使OptEMA算法上闭环且无Lipschitz依赖，即其有效步长依赖于轨迹且无需通过Lipschitz常数参数化。在假设下界、无偏性、有界方差、平均光滑性以及用于控制自适应归一化器的有界随机梯度条件下，我们证明两个变体在平均梯度范数上达到统一的噪声自适应速率$\tilde{\mathcal{O}} \left(T^{-1/2}+\sigma^{1/2}T^{-1/4}\right)$。在零噪声场景下，这些界限自动退化为接近最优的确定速率$\widetilde{\mathcal{O}}(T^{-1/2})$，无需手动超参数重调。

英文摘要

Exponential moving averages (EMAs) are a central component of widely used adaptive optimizers such as Adam. However, existing analyses of Adam-style methods often yield suboptimal guarantees in the zero-noise regime, rely on open-loop parameter schedules, or require prior knowledge of smoothness constants. Motivated by these limitations, we introduce OptEMA and analyze two complementary variants: OptEMA-M, which applies an adaptive, decreasing EMA coefficient to the first moment with a fixed second-moment decay, and OptEMA-V, which swaps these roles. At the heart of these variants is a Corrected AdaGrad-Norm coefficient schedule. This formulation renders OptEMA algorithmically closed-loop and Lipschitz-free, meaning its effective stepsizes are trajectory-dependent and require no parameterization via the Lipschitz constant. Under lower-boundedness, unbiasedness, bounded variance, average smoothness, and a bounded stochastic-gradient condition used to control the adaptive normalizers, we prove that both variants achieve the unified noise-adaptive rate $\tilde{\mathcal{O}} \left(T^{-1/2}+\sigma^{1/2}T^{-1/4}\right)$ for the averaged gradient norm. In the zero-noise regime, these bounds automatically reduce to the nearly optimal deterministic rate $\widetilde{\mathcal{O}}(T^{-1/2})$ without manual hyperparameter retuning.
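For readers unfamiliar with the building block named above, the following is a minimal sketch of plain scalar AdaGrad-Norm, in which the stepsize is driven by the accumulated squared gradients along the trajectory rather than by a Lipschitz constant. It is a baseline illustration only: the "Corrected" schedule and its coupling to the EMA coefficients in OptEMA are not reproduced, and the function name, test objective, and default values are hypothetical.

```python
def adagrad_norm(grad, x0, eta=1.0, b0=1e-8, steps=50):
    """Scalar AdaGrad-Norm:
    x_{t+1} = x_t - eta / sqrt(b0^2 + sum_{s<=t} g_s^2) * g_t."""
    x, accum = x0, b0 * b0
    for _ in range(steps):
        g = grad(x)
        accum += g * g            # trajectory-dependent normalizer
        x -= eta / accum ** 0.5 * g
    return x


# Noise-free quadratic f(x) = 0.5 * x**2, whose gradient is x.
x_final = adagrad_norm(lambda x: x, x0=5.0)
print(x_final)
```

The effective stepsize `eta / sqrt(accum)` adapts to the observed gradients, so no smoothness constant has to be supplied in advance.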

2605.01702 2026-06-16 cs.LG 版本更新

Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients

具有自动微分的浮点网络可以表示几乎所有浮点函数及其梯度

Sejun Park, Yeachan Park, Geonho Hwang

发表机构 * Department of Artificial Intelligence, Korea University（高丽大学人工智能系）； Department of Mathematics and Statistics, Sejong University（世宗大学数学与统计学系）； Department of Mathematical Sciences, Gwangju Institute of Science and Technology（光州科学技术院数学科学系）

AI总结本文证明，在浮点算术下，使用自动微分的浮点神经网络可以表示任意浮点函数及其梯度，适用于ReLU、ELU等常见激活函数。

AI中文摘要

理论研究显示，对于紧致域上的任意可微函数，存在一个神经网络可以同时逼近函数值和梯度。然而，由于该结果假设实数参数和精确内部运算，无法在实际中使用。相反，实际实现仅使用实数的有限子集和带有舍入误差的机器运算。本文研究在浮点算术下，当输入梯度由自动微分算法$D^\mathtt{AD}$计算时，神经网络是否具有类似结果。我们首先证明，给定一个浮点函数$\phi$（例如损失函数），任意函数值和梯度可以分别由浮点网络$f$和$D^\mathtt{AD}(\phi\circ f)$表示。我们进一步推广该结果：在温和条件下，给定$\phi_1,\dots,\phi_n$，$D^\mathtt{AD}(\phi_i\circ f)$可以同时表示任意梯度，而$f$表示目标值。我们的结果适用于实际激活函数，例如$\mathrm{ReLU}$、$\mathrm{ELU}$、$\mathrm{GeLU}$、$\mathrm{Swish}$、$\mathrm{Sigmoid}$和$\mathrm{tanh}$。

英文摘要

Theoretical studies show that for any differentiable function on a compact domain, there exists a neural network that approximates both the function values and gradients. However, such a result cannot be used in practice since it assumes real parameters and exact internal operations. In contrast, real implementations only use a finite subset of reals and machine operations with round-off errors. In this work, we investigate whether a similar result holds for neural networks under floating-point arithmetic, when the gradient with respect to the input is computed by the automatic differentiation algorithm $D^\mathtt{AD}$. We first show that given a floating-point function $\phi$ (e.g., a loss function), arbitrary function values and gradients can be represented by a floating-point network $f$ and $D^\mathtt{AD}(\phi\circ f)$, respectively. We further extend this result: given $\phi_1,\dots,\phi_n$, $D^\mathtt{AD}(\phi_i\circ f)$ can simultaneously represent arbitrary gradients while $f$ represents the target values, under mild conditions. Our results hold for practical activation functions, e.g., $\mathrm{ReLU}$, $\mathrm{ELU}$, $\mathrm{GeLU}$, $\mathrm{Swish}$, $\mathrm{Sigmoid}$, and $\mathrm{tanh}$.
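The gap between real-number theory and machine arithmetic mentioned above is easy to observe directly. The snippet below is a small illustration, assuming IEEE 754 binary64 floats (the format used by Python's `float`); it is unrelated to the paper's constructions.

```python
# Rounding makes floating-point addition non-associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right, left, right)

# Only finitely many reals are representable: above 2**53, consecutive
# binary64 floats are 2 apart, so adding 1.0 is rounded away.
big = 2.0 ** 53
print(big + 1.0 == big)
```

Because of effects like these, approximation results proved over the reals do not transfer automatically to networks evaluated and differentiated in floating point. That gap is the question the abstract addresses.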

2606.14095 2026-06-16 cs.LG math.OC math.PR stat.ML 版本更新

Lyapunov-Based Sample Complexity Analysis for Weakly-Coupled MDPs

基于Lyapunov的弱耦合MDP样本复杂度分析

Tianhao Wu, Matthew Zurek, Weina Wang, Qiaomin Xie

发表机构 * Department of Industrial and Systems Engineering, University of Wisconsin-Madison（威斯康星大学麦迪逊分校工业与系统工程系）； Department of Computer Sciences, University of Wisconsin-Madison（威斯康星大学麦迪逊分校计算机科学系）； Computer Science Department, Carnegie Mellon University（卡内基梅隆大学计算机科学系）

AI总结针对平均奖励弱耦合MDP和Restless Bandits，提出基于Lyapunov的分析框架，实现样本和计算复杂度关于臂数N的多项式级界限，并给出首个有限样本PAC保证。

Comments Accepted for presentation at the Conference on Learning Theory (COLT) 2026

AI中文摘要

我们研究了在生成模型下，平均奖励弱耦合马尔可夫决策过程（WCMDPs）和Restless Bandits（RBs）中学习的样本复杂度。直接简化为表格MDP会导致高复杂度界限，因为状态-动作空间随臂数$N$呈指数增长。通过利用弱耦合结构，我们证明可以以关于$N$的多项式样本和计算复杂度学习近优策略。具体来说，我们分析了插件方法，该方法对从数据估计的经验模型应用高效规划算法。对于完全异质的WCMDPs，我们建立了首个具有多项式复杂度和$O(1/\sqrt{N})$最优性间隙的有限样本PAC保证。对于同质RBs，我们进一步证明在温和的结构假设下可以实现更小的最优性间隙。我们工作的一个主要技术贡献是一个新颖的基于Lyapunov的分析框架。与依赖于难以控制的偏差函数的经典方法不同，我们的框架使用显式构造的Lyapunov函数以及真实模型与经验模型之间的漂移传递技术。我们框架中一个具有独立意义的关键步骤是对底层线性规划（LP）松弛的细粒度扰动分析，这为分析基于LP的策略和弱耦合系统提供了一个通用工具。

英文摘要

We study the sample complexity of learning in average-reward weakly-coupled Markov decision processes (WCMDPs) and Restless Bandits (RBs) under a generative model. Naive reduction to a tabular MDP leads to high complexity bounds as the state-action space is exponentially large in the number of arms $N$. By exploiting the weakly coupled structure, we show that near-optimal policies can be learned with sample and computational complexities that are polynomial in $N$. Specifically, we analyze the plug-in approach, which applies an efficient planning algorithm to an empirical model estimated from data. For fully heterogeneous WCMDPs, we establish the first finite-sample PAC guarantee with polynomial complexity and an $O(1/\sqrt{N})$ optimality gap. For homogeneous RBs, we further prove that a smaller optimality gap is achievable under mild structural assumptions. A primary technical contribution of our work is a novel Lyapunov-based analysis framework. Unlike classical approaches that rely on the difficult-to-control bias function, our framework uses an explicitly constructed Lyapunov function along with a drift transfer technique between the true and empirical models. A key step of independent interest in our framework is a fine-grained perturbation analysis for the underlying linear programming (LP) relaxation, which provides a general tool for analyzing LP-based policies and weakly-coupled systems.

2508.03867 2026-06-16 math.AG cs.LG stat.ML 版本更新

Constraining the outputs of ReLU neural networks

约束ReLU神经网络的输出

Yulia Alexandr, Guido Montúfar

发表机构 * University of California, Los Angeles（加州大学洛杉矶分校）； Max Planck Institute for Mathematics in the Sciences（马克斯·普朗克数学研究所）

AI总结通过引入与ReLU网络相关的代数簇，利用激活区域内的秩约束推导多项式方程，刻画网络可表示的函数，并研究簇达到预期维度的条件。

Comments 33 pages, 4 figures

2601.07326 2026-06-16 math.OC cs.LG 版本更新

Convergence Rate Analysis of the AdamW-style Shampoo: Unifying One-Sided and Two-Sided Preconditioning

AdamW风格Shampoo的收敛率分析：统一单侧与双侧预处理

Huan Li, Yiming Dong, Zhouchen Lin

AI总结本文研究AdamW风格Shampoo优化器，统一单侧与双侧预处理，并建立了以核范数度量的收敛率，该收敛率在理想情况下与SGD的最优收敛率类似。

Comments V3: ICML camera-ready. V4 vs. V3: extends the analysis to the more general setting where the exponents of the two preconditioners do not sum to 1/2

AI中文摘要

本文研究AdamW风格Shampoo优化器，它是经典Shampoo的一种有效变体，并在AlgoPerf神经网络训练算法竞赛的外部调优赛道中获胜。我们的分析统一了单侧和双侧预处理。当两个预处理器的指数之和为 $1/2$ 时，我们建立了以核范数度量的收敛率 $\frac{1}{K}\sum_{k=1}^K E\left[\|\nabla f(X_k)\|_*\right]\leq O(\frac{\sqrt{m+n}C}{K^{1/4}})$，其中 $K$ 表示迭代次数，$(m,n)$ 表示矩阵参数的尺寸，$C$ 与SGD最优收敛率中的常数一致。理论上，核范数与Frobenius范数满足 $\|\nabla f(X)\|_F\leq \|\nabla f(X)\|_*\leq \sqrt{\min\{m,n\}}\|\nabla f(X)\|_F$，这表明在 $\|\nabla f(X)\|_*= \Theta(\sqrt{\min\{m,n\}})\|\nabla f(X)\|_F$ 且 $m$ 与 $n$ 量级相当的理想情况下，我们的收敛率类似于SGD的最优收敛率 $\frac{1}{K}\sum_{k=1}^K E\left[\|\nabla f(X_k)\|_F\right]\leq O(\frac{C}{K^{1/4}})$。随后，我们将分析推广到预处理指数之和不为 $1/2$ 的情形，并给出形式更复杂但显式的收敛率。

英文摘要

This paper studies AdamW-style Shampoo, an effective variant of the classical Shampoo that won the external tuning track of the AlgoPerf neural network training competition. Our analysis unifies one-sided and two-sided preconditioning. When the exponents of the two preconditioners sum to $1/2$, we establish the convergence rate $\frac{1}{K}\sum_{k=1}^K E\left[\|\nabla f(X_k)\|_*\right]\leq O(\frac{\sqrt{m+n}C}{K^{1/4}})$, where $K$ represents the number of iterations, $(m,n)$ denotes the dimensions of the matrix-valued parameters, and $C$ matches the constant appearing in the optimal convergence rate of SGD. Theoretically, the nuclear norm and Frobenius norm satisfy $\|\nabla f(X)\|_F\leq \|\nabla f(X)\|_*\leq \sqrt{\min\{m,n\}}\|\nabla f(X)\|_F$, which suggests that our convergence rate is analogous to the optimal $\frac{1}{K}\sum_{k=1}^K E\left[\|\nabla f(X_k)\|_F\right]\leq O(\frac{C}{K^{1/4}})$ convergence rate of SGD in the ideal case where $\|\nabla f(X)\|_*= \Theta(\sqrt{\min\{m,n\}})\|\nabla f(X)\|_F$ and $m$ and $n$ are of comparable magnitude. Then, we extend our analysis to settings where the preconditioning exponents do not sum to $1/2$, and establish convergence with an explicit but more involved rate.
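The two-sided norm bound quoted in the abstract follows from a standard argument on singular values. As a short derivation, write $G=\nabla f(X)$ and let $\sigma_1,\dots,\sigma_r>0$ be its nonzero singular values, so that $r\le\min\{m,n\}$ (this notation is introduced here for illustration only):

```latex
\|G\|_F^2 = \sum_{i=1}^{r}\sigma_i^2
  \le \Bigl(\sum_{i=1}^{r}\sigma_i\Bigr)^{2} = \|G\|_*^2,
\qquad
\|G\|_* = \sum_{i=1}^{r} 1\cdot\sigma_i
  \le \sqrt{r}\,\Bigl(\sum_{i=1}^{r}\sigma_i^2\Bigr)^{1/2}
  \le \sqrt{\min\{m,n\}}\,\|G\|_F .
```

The first inequality holds because all cross terms $\sigma_i\sigma_j$ are nonnegative, and the second is the Cauchy-Schwarz inequality. The upper bound is attained when $G$ has full rank with all singular values equal, which corresponds to the ideal case mentioned in the abstract.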

2605.28860 2026-06-16 cs.LG cs.AI cs.CL cs.CR 版本更新

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

灾难性遗忘的机制起源：为什么RL比SFT更好地保留电路？

Jeanmely Rojas Nunez, Viraj Sawant, Nathan Allen, Nomgondalai Amgalanbaatar, Yannis Zongo, Vasu Sharma, Maheep Chaudhary

发表机构 * University of California, Berkeley（加州大学伯克利分校）； University of Washington（华盛顿大学）； University of Toronto（多伦多大学）

AI总结通过引入差异电路脆弱性指标，研究比较了强化学习与监督微调在大型语言模型微调中对内部计算电路的保留程度，发现RL虽任务适应较慢但能更好保留电路，从而减轻灾难性遗忘。

AI中文摘要

微调大型语言模型（LLMs）经常导致先前能力的灾难性遗忘。最近的研究表明，强化学习（RL）比监督微调（SFT）更有效地保留先前能力，这归因于策略梯度更新更接近基础策略\cite{shenfeld2025rl}。我们将这种行为解释扩展到机制层面，并探究RL的优势是否通过内部计算电路的更强保留来体现。我们引入了差异电路脆弱性，一种头部级别的度量，用于衡量电路在微调下的退化程度，并将其用于比较RL和SFT在Qwen2.5-3B-Instruct适应科学问答任务上的表现。我们发现了清晰的机制权衡：SFT更快地适应目标任务，但导致更大的电路破坏和先前能力的遗忘，而RL保留了更大比例的基础电路，代价是任务适应较慢。这些发现表明，电路保留可能有助于解释为什么RL对灾难性遗忘更具鲁棒性。我们在此发布了代码：https://github.com/rl-sft-circuit-research/differential-circuit-vulnerability。

英文摘要

Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinforcement learning (RL) retains prior capabilities more effectively than supervised fine-tuning (SFT), attributing this to policy-gradient updates remaining closer to the base policy (Shenfeld et al., 2025). We extend this behavioral account to the mechanistic level and ask whether RL's advantage is mirrored by stronger preservation of internal computational circuits. We introduce differential circuit vulnerability, a head-level measure of how much a circuit degrades under fine-tuning, and use it to compare RL and SFT on Qwen2.5-3B-Instruct adapted to scientific question-answering. We find a clear mechanistic trade-off: SFT adapts more rapidly to the target task but produces substantially greater circuit disruption and forgetting of prior capabilities, whereas RL preserves a larger fraction of the base circuit at the cost of slower task adaptation. These findings suggest that circuit preservation may help explain why RL is more robust to catastrophic forgetting. Our code is available at https://github.com/rl-sft-circuit-research/differential-circuit-vulnerability.

2602.17587 2026-06-16 math.ST cs.LG stat.ML stat.TH 版本更新

Asymptotically Optimal Sequential Testing with Markovian Data

马尔可夫数据的渐近最优序贯检验

Alhad Sethi, Kavali Sofia Sagar, Shubhada Agrawal, Debabrota Basu, P. N. Karthik

发表机构 * Indian Institute of Science, Bangalore（班加罗尔印度科学学院）； Indian Institute of Technology, Hyderabad（海得拉巴印度理工学院）； Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 – CRIStAL（里尔大学、法国国家科学研究中心、中央里尔学院、UMR 9189 – CRIStAL）

AI总结针对遍历有限状态马尔可夫链生成的数据，提出一种渐近最优的序贯假设检验方法，其期望停止时间与实例相关的下界渐近匹配，并应用于马尔可夫链蒙特卡洛模型误设检测和马尔可夫决策过程结构性质检验。

Comments ICML 2026

AI中文摘要

我们研究了由遍历有限状态马尔可夫链生成的数据的单侧和α-正确序贯假设检验。原假设是未知转移矩阵属于随机矩阵的指定集合P，备择假设对应于不相交的集合Q。我们建立了备择假设下任何有效序贯检验的期望停止时间的非渐近实例相关下界，该下界是渐近紧的。我们的新分析改进了现有下界，这些下界在此设置中要么是渐近的，要么被证明是次优的。我们的下界同时包含了由未知马尔可夫链诱导的平稳分布和转移结构。我们进一步提出了一种最优检验，其期望停止时间在α→0时渐近匹配该下界。我们通过应用该框架到马尔可夫链蒙特卡洛中模型误设的序贯检测以及马尔可夫决策过程中转移动力学的线性等结构性质的检验，说明了我们框架的实用性。我们的发现给出了马尔可夫依赖下最优序贯检验程序的尖锐且一般的刻画。

英文摘要

We study one-sided and $\alpha$-correct sequential hypothesis testing for data generated by an ergodic, finite-state Markov chain. The null hypothesis is that the unknown transition matrix belongs to a prescribed set $P$ of stochastic matrices, and the alternative corresponds to a disjoint set $Q$. We establish a non-asymptotic instance-dependent lower bound on the expected stopping time of any valid sequential test under the alternative, which is asymptotically tight. Our novel analysis improves the existing lower bounds, which are either asymptotic or provably sub-optimal in this setting. Our lower bound incorporates both the stationary distribution and the transition structure induced by the unknown Markov chain. We further propose an optimal test whose expected stopping time matches this lower bound asymptotically as $\alpha\to 0$. We illustrate the usefulness of our framework through applications to sequential detection of model misspecification in Markov Chain Monte Carlo and to testing structural properties, such as the linearity of transition dynamics, in Markov decision processes. Our findings yield a sharp and general characterization of optimal sequential testing procedures under Markovian dependence.
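To fix ideas about one-sided, $\alpha$-correct sequential testing on Markovian data, here is a minimal sketch for the simplest special case: a single null transition matrix `P` against a single alternative `Q`. It is not the test proposed in the paper, which handles composite sets of matrices. The stopping rule below is the classical likelihood-ratio threshold `1/alpha`, whose false-alarm guarantee follows from Ville's inequality, and the matrices and sample path are hypothetical.

```python
import math


def one_sided_lr_test(states, P, Q, alpha):
    """Stop and reject the null transition matrix P as soon as the
    likelihood ratio of Q against P along the observed path reaches
    1/alpha. Returns the number of transitions used, or None if the
    path ends first."""
    threshold = math.log(1.0 / alpha)
    log_lr = 0.0
    for t in range(1, len(states)):
        i, j = states[t - 1], states[t]
        log_lr += math.log(Q[i][j] / P[i][j])
        if log_lr >= threshold:
            return t
    return None


P = [[0.5, 0.5], [0.5, 0.5]]   # null: uniform transitions
Q = [[0.9, 0.1], [0.1, 0.9]]   # alternative: sticky chain
path = [0] * 10                # a path that never leaves state 0
print(one_sided_lr_test(path, P, Q, alpha=0.05))
```

The test is one-sided in that it only ever stops to reject the null. On a path that looks unlike `Q`, such as one that alternates between the two states, the log-likelihood ratio drifts downward and the function returns `None`.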

2605.03289 2026-06-16 stat.ML cs.LG math.ST stat.TH 版本更新

Imbalanced Classification under Capacity Constraints

容量约束下的不平衡分类

Daniel Fraiman, Ricardo Fraiman

发表机构 * Departamento de Matemática y Ciencias Universidad de San Andrés（数学与科学系，圣安德烈斯大学）； CONICET Argentina（阿根廷国家科研委员会）； PEDECIBA Matemática Uruguay（乌拉圭PEDECIBA数学）

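AI summary: For minority-class detection under capacity constraints, the paper proposes a formal classification framework, shows that the optimal classifier is equivalent to a Bayes classifier with reweighted prior probabilities, and introduces a capacity-adjusted performance metric; experiments show gains over classical approaches and SMOTE.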

AI中文摘要

在欺诈检测、医学筛查和工业质量控制等应用中，从严重类别不平衡中检测少数类观测是一个核心挑战。在这些场景中，每个阳性预测都会触发昂贵的后续行动（如MRI扫描、交易审计），其执行受到实际运营约束。本文提出了一个容量约束下的形式化分类框架：给定用户定义的界限$b$（可标记为少数类的观测比例上限），目标是找到在该类上最大化灵敏度的分类器。我们刻画了该约束下的最优分类器，并建立了其与重加权先验概率下的经典贝叶斯分类器的等价性。我们还引入了一个容量调整的性能指标$M$，用于衡量容量约束生效时的有效检测率。该框架在标准学习方法（k-NN、SVM、随机森林和神经网络）上实现，并为每种方法建立了统计一致性。我们进一步证明，当没有超参数面向容量约束目标时，这些方法退化为事后阈值调整，并引入了一种容量感知支持向量机，在训练过程中利用约束，实现了最强的经验性能。在台湾信用卡违约数据集上的实验证实，在高不平衡情况下，容量约束分类器显著优于经典方法和SMOTE。该框架自然地扩展到多类别设置和在线环境。

英文摘要

Detecting observations from a minority class under severe class imbalance is a central challenge in applications such as fraud detection, medical screening, and industrial quality control. In these settings, each positive prediction triggers a costly follow-up action (an MRI scan, a transaction audit) whose execution is subject to real operational constraints. This paper proposes a formal classification framework under capacity constraints: given a user-defined upper bound $b$ on the proportion of observations that can be labeled as belonging to the minority class, the goal is to find the classifier that maximizes sensitivity on that class. We characterize the optimal classifier under this constraint and establish its equivalence with the classical Bayes classifier under a reweighting of the prior probabilities. We also introduce a capacity-adjusted performance metric $M$ that accounts for the effective detection rate when the capacity constraint is binding. The framework is implemented on top of standard learning methods (k-NN, SVM, random forests, and neural networks), and statistical consistency is established for each. We further show that these methods reduce to post-hoc thresholding when no hyperparameters are oriented toward the capacity-constrained objective, and introduce a capacity-aware support vector machine that exploits the constraint during training and achieves the strongest empirical performance. Experiments on the Taiwanese credit card default dataset confirm that capacity-constrained classifiers substantially outperform both classical approaches and SMOTE under high imbalance regimes. The framework extends naturally to multiclass settings and online environments.
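
The post-hoc thresholding that the abstract says standard methods reduce to can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a classifier has already produced minority-class scores, and the function names `capacity_constrained_labels` and `sensitivity`, the rounding of the capacity with `floor`, and the tie-breaking rule are all assumptions made for the example.

```python
import math


def capacity_constrained_labels(scores, b):
    """Flag at most floor(b * n) observations, highest scores first.

    scores: estimated minority-class scores (hypothetical input).
    b: capacity bound, the maximum fraction that may be flagged.
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must lie in [0, 1]")
    n = len(scores)
    k = math.floor(b * n)  # number of positive predictions the capacity allows
    # Indices sorted by descending score; sorted() is stable, so ties keep
    # their original order (an arbitrary choice for this sketch).
    ranked = sorted(range(n), key=lambda i: -scores[i])
    flagged = set(ranked[:k])
    return [1 if i in flagged else 0 for i in range(n)]


def sensitivity(labels, truth):
    """Fraction of true minority cases (truth == 1) that were flagged."""
    positives = [i for i, t in enumerate(truth) if t == 1]
    if not positives:
        return 0.0
    return sum(labels[i] for i in positives) / len(positives)
```

With five observations and $b = 0.4$, only the two highest-scoring observations are flagged, so sensitivity is capped by the capacity no matter how many true minority cases exist.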

2605.13092 2026-06-16 stat.ML cs.LG stat.ME 版本更新

Adaptive Kernel Density Estimation with Pre-training

具有预训练的自适应核密度估计

Ruitong Zhang, Ke Deng

发表机构 * Department of Statistics and Data Science, Tsinghua University（统计与数据科学系，清华大学）

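AI summary: The paper uses pre-training to make adaptive kernel density estimation more efficient in high dimensions, with a neural network recommending a suitable kernel; experiments show clear gains when the target distribution is close to the pre-training distributions.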

2605.18528 2026-06-16 math.OC cs.LG 版本更新

Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise

尺度不变神经网络优化：范数几何与重尾噪声

Jiayu Zhang, Tianyi Lin

发表机构 * Department of Industrial Engineering and Operations Research（工业工程与运筹学系）

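AI summary: For nonconvex stochastic optimization under heavy-tailed noise, the paper studies dimension-dependent lower bounds, shows that a Scion method with the spectral norm attains a matching upper bound, and proposes a transported Scion method that exploits higher-order smoothness.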

Comments Polished writing; Fixed typos and references; 45 pages

AI中文摘要

来自神经网络优化的一个日益增长的经验是，优化器的设计应尊重模型的参数化方式。尺度不变方法变得重要，因为其归一化的逐层更新不仅支持跨模型大小的超参数迁移，还能利用输入-输出矩阵范数几何。同时，深度学习中的随机梯度噪声通常远非亚高斯，可能表现出重尾。这些关键观察塑造了近期训练神经网络的算法原理，然而它们的联合理论后果仍未被充分探索。特别地，对于具有一般输入-输出矩阵范数的尺度不变方法，什么维度依赖是不可避免的，以及高阶光滑性是否能在重尾噪声下加速训练，尚不清楚。我们通过一般范数下 $\mathbb{R}^{m\times n}$ 上的非凸光滑随机优化来研究这些问题，目标是在 $p^{\mathrm{th}}$ 阶矩重尾噪声下达到 $\varepsilon$-稳定点。我们的第一个贡献是维度相关的下界：当 $\frac{\max\{m,n\}}{(\min\{m,n\})^2}$ 足够大时，任何具有谱范数的尺度不变一阶方法需要 $\Omega(\min\{m, n\}\varepsilon^{-\frac{3p-2}{p-1}})$ 次 oracle 调用。我们证明，具有谱范数的批处理 Scion 方法达到了匹配的上界 $O(\min\{m, n\}\varepsilon^{-\frac{3p-2}{p-1}})$。为了利用高阶光滑性，我们提出了一种传输 Scion 方法，并在范数为谱范数且 Hessian 矩阵 Lipschitz 连续时将界改进为 $O(\min\{m, n\}\varepsilon^{-\frac{5p-3}{2p-2}})$。最后，我们将实践启发式方法融入我们的传输方法，并在多种架构和模型大小上进行评估，展示了其在训练神经网络中的灵活性和兼容性。

英文摘要

A growing lesson from neural network optimization is that optimizer design should respect how the model is parametrized. The layerwise input-output structure of neural networks motivates scale-invariant optimizers, such as Muon and Scion, whose updates also support hyperparameter transfer. At the same time, stochastic gradient noise in deep learning is often far from sub-Gaussian and may exhibit heavy tails. These observations have shaped recent algorithmic principles for training neural networks, yet their joint theoretical consequences are underexplored. In particular, it remains unclear what dimension dependence is unavoidable for gradient-based methods when the problem class is defined by an input-output norm and the noise is heavy-tailed, and whether higher-order smoothness can accelerate training. We study these questions through nonconvex smooth stochastic optimization over $\mathbb{R}^{m\times n}$ equipped with general norms and under $p^\mathrm{th}$-moment heavy-tailed noise, where the goal is to achieve an $\varepsilon$-stationary point in the dual norm. Our first contribution is a dimension-dependent lower bound: when $\frac{\max\{m,n\}}{(\min\{m,n\})^2}$ is large enough, any gradient-based method requires $\Omega(\min\{m, n\}\varepsilon^{-\frac{3p-2}{p-1}})$ oracle calls for the problem class defined by the spectral norm, which is a common input-output norm. We prove that a scale-invariant Scion method with the spectral norm can achieve the matching upper bound of $O(\min\{m, n\}\varepsilon^{-\frac{3p-2}{p-1}})$. To exploit higher-order smoothness, we propose a transported Scion method and improve the bound to $O(\min\{m, n\}\varepsilon^{-\frac{5p-3}{2p-2}})$ when the Hessian is Lipschitz. Finally, we incorporate heuristics into our transported method and evaluate it across multiple architectures and model sizes, demonstrating its flexibility and compatibility with neural network training.

2606.14945 2026-06-16 cs.LG 新提交

Remember, Don't Re-read: Stateful ReAct Agents for Token-Efficient Autonomous Experimentation

记住，不要重读：用于令牌高效自主实验的有状态ReAct智能体

Faramarz Jabbarvaziri

发表机构 * University of California, Berkeley（加州大学伯克利分校）

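AI summary: The paper proposes a LangGraph-based stateful ReAct agent that uses persistent state and a fixed-size conversation window to cut the per-iteration token cost of autonomous experimentation from O(n) to O(1), avoiding the O(n²) total cost of the stateless design; it uses 90% fewer tokens on hyperparameter tuning and 52% fewer on code optimization.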

AI中文摘要

自动研究模式通过让大语言模型（LLM）迭代修改代码来优化目标指标，从而实现自主实验。然而，其无状态设计在每次迭代中从头重建实验上下文，导致每次迭代的令牌成本为$O(n)$，总成本为$O(n^{2})$。本文将该模式重新表述为使用LangGraph的有状态ReAct智能体，其中类型化的持久化状态通过工具调用接口跨迭代传递实验历史。评估了两个基准：超参数调优（15次迭代，每次迭代观察数据小）和代码性能优化（40次迭代，每次迭代观察数据大，包含完整源代码和基准测试结果）。在超参数调优中，有状态智能体消耗的令牌减少90%（2,492 vs. 24,465）。在代码优化中，有状态智能体消耗的令牌减少52%（627K vs. 1,275K），同时在两项任务上实现了相当的优化质量。令牌减少是结构性的：无状态智能体以每次迭代$O(n)$的成本重读完整历史，而有状态智能体在固定大小的对话窗口内以$O(1)$成本运行。本文详细描述了该架构，使从业者能够为其自己的工作流程实现有状态自动研究智能体。

英文摘要

The autoresearch pattern enables autonomous experimentation by having a large language model (LLM) iteratively modify code to optimize a target metric. Its stateless design, however, reconstructs experimental context from scratch at every iteration, incurring $O(n)$ token cost per iteration and $O(n^{2})$ total. This work reformulates the pattern as a stateful ReAct agent using LangGraph, where typed persistent state carries experimental history across iterations via a tool-calling interface. Two benchmarks are evaluated: hyperparameter tuning (15 iterations, small per-iteration observations) and code performance optimization (40 iterations, large per-iteration observations containing full source code and benchmark results). On hyperparameter tuning, the stateful agent consumes 90% fewer tokens (2,492 vs. 24,465). On code optimization, the stateful agent consumes 52% fewer tokens (627K vs. 1,275K) while achieving comparable optimization quality on both tasks. The token reduction is structural: the stateless agent re-reads the full history at $O(n)$ cost per iteration, while the stateful agent operates within a fixed-size conversation window at $O(1)$ cost. This paper describes the architecture in sufficient detail for practitioners to implement a stateful autoresearch agent for their own workflows.
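
The structural argument (per-iteration $O(n)$ versus $O(1)$) can be made concrete with a toy cost model. This is only a sketch: the two functions and their parameters (`base_prompt`, `per_iter_obs`, `window_tokens`) are hypothetical, the model ignores output tokens, and it is not meant to reproduce the token counts reported in the abstract.

```python
def stateless_total_tokens(n_iters, base_prompt, per_iter_obs):
    """Total input tokens when iteration i re-reads all i earlier observations.

    Iteration i (0-based) costs base_prompt + i * per_iter_obs, so the sum
    grows quadratically in n_iters.
    """
    return sum(base_prompt + i * per_iter_obs for i in range(n_iters))


def stateful_total_tokens(n_iters, base_prompt, window_tokens):
    """Total input tokens when every iteration sees a fixed-size window.

    Each iteration costs base_prompt + window_tokens, so the sum grows
    linearly in n_iters.
    """
    return n_iters * (base_prompt + window_tokens)
```

With a 10-token base prompt and 5 tokens per observation, four stateless iterations cost 10 + 15 + 20 + 25 = 70 tokens, which matches the closed form $n \cdot \text{base} + \text{obs} \cdot n(n-1)/2$; the stateful total with a 5-token window is 4 × 15 = 60, and the gap widens as the number of iterations grows.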

2606.15157 2026-06-16 cs.LG cs.AI 新提交

PolyKV: Heterogeneous Retention and Allocation for KV Cache Compression

PolyKV: 异构保留与分配用于KV缓存压缩

Chao Fei, Panos Kalnis

发表机构 * King Abdullah University of Science and Technology（阿卜杜拉国王科技大学）

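AI summary: For KV cache compression in long-context LLM inference, the paper proposes PolyKV, which uses layer-level signals to select a compression policy for each layer and to allocate non-uniform cache budgets; experiments show that, at a fixed budget, it recovers a substantial share of the performance gap to FullKV.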

AI中文摘要

KV缓存压缩对于减少长上下文大语言模型推理的内存成本至关重要。然而，现有方法通常在所有Transformer层上应用单一的压缩策略和统一的缓存预算。这种统一设计忽略了不同层在预填充和解码过程中可能扮演不同角色，因此可能需要不同的驱逐策略和缓存容量。我们提出了PolyKV，一种逐层KV缓存优化框架，考虑了方法选择和预算分配的设计空间。PolyKV基于层级别信号将每层路由到合适的KV压缩策略，同时在固定总预算下分配非均匀预算。这种公式化实现了现有KV缓存方法的异构组合。在LLaMA-3.1-8B和Qwen3-8B上的实验表明，在相同的512 token平均KV预算下，PolyKV分别恢复了最强单策略基线与FullKV之间LongBench性能差距的54.5%和25.7%。在128-1024预算范围内，PolyKV持续比最强基线提升1.7%-6.4%，对应FullKV差距的40.0%-54.5%恢复。

英文摘要

KV cache compression is essential for reducing the memory cost of long-context large language model inference. Existing approaches, however, typically apply a single compression policy and a uniform cache budget across all transformer layers. This uniform design ignores the fact that different layers can play different roles during prefill and decoding, and may therefore require different eviction strategies and cache capacities. We present PolyKV, a layer-wise KV cache optimization framework that considers the joint design space of method selection and budget allocation. PolyKV routes each layer to a suitable KV compression policy based on layer-level signals, while assigning non-uniform budgets under a fixed total budget. This formulation enables heterogeneous compositions of existing KV cache methods. Experiments on LLaMA-3.1-8B and Qwen3-8B show that, under the same 512-token average KV budget, PolyKV recovers 54.5% and 25.7% of the LongBench performance gap between the strongest single-policy baseline and FullKV, respectively. Across a 128-1024 budget sweep, PolyKV consistently improves over the strongest baseline by 1.7%-6.4%, corresponding to 40.0%-54.5% recovery of the FullKV gap.

2606.15244 2026-06-16 cs.LG 新提交

M-CTX: Exact and Scalable Spatial Context Retrieval for Trajectory Analytics

M-CTX：用于轨迹分析的精确且可扩展的空间上下文检索

Kun Ma, Qilong Han, Chengjing Song, Jingzheng Yao, Xiao Han, Yuee Zhou, Changmao Wu

发表机构 * Harbin Engineering University（哈尔滨工程大学）； Wuhan University of Technology（武汉理工大学）； University of Chinese Academy of Sciences（中国科学院大学）； Alibaba Group（阿里巴巴集团）

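AI summary: The paper proposes M-CTX, which recasts spatial context construction as a spatial database workload and uses index-backed operators to achieve a 226x end-to-end speed-up, removing the context-construction bottleneck in trajectory prediction.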

Comments 14 pages, 10 figures, 12 tables. Submitted to ICDE 2027

AI中文摘要

现代轨迹预测器越来越多地依赖于外部空间上下文，例如地图几何、符号距离场（SDF）和附近的移动代理。虽然这种上下文提高了预测质量，但为每个训练锚点构建它已成为一个隐藏的系统瓶颈。在一个代表性的海事AIS流程中，空间上下文构建需要大约17个CPU天来处理一个5.48M锚点的语料库，远超下游预测器的成本。我们提出了M-CTX，一个用于轨迹分析的精确且可扩展的空间上下文检索框架。M-CTX将上下文构建重新构想为一次摄取、多次查询的空间数据库工作负载，并用可组合的、基于索引的操作符替换了三个暴力阶段（OSM范围检索、SDF计算和移动船舶邻居查找）。其学习的范围索引后端BR-LZ提供了召回完全的MBR重叠范围检索，并将候选放大相对于全局扩展单曲线基线降低了1.1倍至2.7倍。在四个海事区域、八个基线系统、多达4000万个空间特征的合成工作负载以及10^7条记录的AIS流上，M-CTX精确地重现了参考上下文。在5.48M锚点语料库上，它将上下文构建从大约17个CPU天减少到1.8小时，实现了226倍的端到端加速。一个可选的存储模式进一步将SDF上下文压缩了64倍，仅改变了0.04米的ADE。这些结果确立了精确空间上下文检索作为现代轨迹分析中头等数据库问题的地位。代码和数据集公开在https://github.com/mark000071/M-CTX-Traj。

英文摘要

Modern trajectory predictors increasingly condition on external spatial context, such as map geometry, signed distance fields (SDFs), and nearby moving agents. While this context improves prediction quality, constructing it for every training anchor has become a hidden systems bottleneck. In a representative maritime AIS pipeline, spatial context construction requires roughly 17 CPU-days for a 5.48M-anchor corpus, dominating the cost of the downstream predictor. We present M-CTX, an exact and scalable spatial context-retrieval framework for trajectory analytics. M-CTX recasts context construction as an ingest-once, query-many spatial database workload and replaces three brute-force stages -- OSM range retrieval, SDF computation, and moving-vessel neighbour lookup -- with composable, index-backed operators. Its learned range-index backend, BR-LZ, provides recall-complete MBR-overlap range retrieval and reduces candidate amplification by 1.1x--2.7x relative to global-expansion one-curve baselines. Across four maritime regions, eight baseline systems, synthetic workloads with up to 40M spatial features, and 10^7-record AIS streams, M-CTX reproduces the reference context exactly. On the 5.48M-anchor corpus, it reduces context construction from about 17 CPU-days to 1.8 hours, a measured 226x end-to-end speed-up. An optional storage mode further compresses SDF context by 64x with only a 0.04 m ADE change. These results establish exact spatial context retrieval as a first-class database problem in modern trajectory analytics. Code and datasets are publicly available at https://github.com/mark000071/M-CTX-Traj.

2606.15553 2026-06-16 cs.LG cs.AI 新提交

How to Score Experts for One-Shot MoE Pruning: A Unified Formulation and Selection Principle

Zongfang Liu, Jinghui Zhang, Zijian Ma, Guangyi Chen, Xin Yuan

发表机构 * Zhejiang University（浙江大学）； Westlake University（西湖大学）； Mohamed bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）； Carnegie Mellon University（卡内基梅隆大学）

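AI summary: The paper proposes a unified formulation of one-shot MoE expert pruning built on three factors (routing frequency, gate weighting, and activation strength). It concludes that task-agnostic pruning should use activation-based criteria while task-specific pruning can retain routing-frequency and gate information, and it introduces two new criteria, MAN and MSAN, which obtain the top-two average ranks across four MoE models and 16 benchmarks.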

AI中文摘要

混合专家（MoE）语言模型通过稀疏专家激活减少了每令牌的计算量，但部署时仍需存储完整的专家池，使得一次性专家剪枝成为减少内存使用的实用方法。尽管有效，现有准则大多是启发式的，且没有单一准则普遍最优。因此，为不同部署目标建立选择剪枝准则的原则，是一次性专家剪枝中一个重要但尚未充分探索的问题。为此，我们引入了一个一次性MoE专家剪枝的统一公式，围绕三个因素组织：路由频率、门控权重和激活强度。该公式产生了一个准则选择原则：任务无关剪枝应倾向于基于路由令牌平均、无门控的激活准则，而任务特定剪枝可以从保留路由频率和门控权重信息中受益。除了这一原则，该公式还提供了对现有启发式准则的系统性视角，并提出了两个新的任务无关准则：平均激活范数（MAN）和均方激活范数（MSAN）。在四个代表性MoE模型和16个多样化基准上，MAN和MSAN在任务无关设置中始终表现强劲，获得前两名的平均排名，并在最强基线上将平均性能提升高达8.8个百分点。

英文摘要

Mixture-of-Experts (MoE) language models reduce per-token computation through sparse expert activation, yet deployment still requires storing the full expert pool, making one-shot expert pruning a practical approach for reducing memory usage. Although effective, existing criteria are largely heuristic, and no single criterion is universally optimal. Thus, establishing a principle for selecting pruning criteria suited to different deployment objectives remains an important yet largely underexplored problem in one-shot expert pruning. To this end, we introduce a unified formulation for one-shot MoE expert pruning organized around three factors: routing frequency, gate weighting, and activation strength. The formulation yields a criteria selection principle: task-agnostic pruning should favor routed-token-averaged, gate-free activation-based criteria, whereas task-specific pruning can benefit from retaining routing-frequency and gate-weight information. Beyond this principle, the formulation also provides a systematic view of existing heuristic criteria and gives rise to two new task-agnostic criteria, Mean Activation Norm (MAN) and Mean Squared Activation Norm (MSAN). Across four representative MoE models and 16 diverse benchmarks, MAN and MSAN are consistently strong in the task-agnostic setting, obtain the top-two average ranks, and improve average performance by up to 8.8 points over the strongest baseline.
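
The abstract names Mean Activation Norm (MAN) and Mean Squared Activation Norm (MSAN) but does not give their formulas. The sketch below shows one plausible reading of a "routed-token-averaged, gate-free activation-based" score: average the L2 norm (or squared norm) of each expert's output over the tokens routed to it, ignoring gate weights. The function names, the input layout, and the treatment of never-routed experts are assumptions for illustration, not the paper's definitions.

```python
import math


def activation_norm_scores(expert_outputs, squared=False):
    """Score each expert by the mean (squared) L2 norm of its outputs.

    expert_outputs: dict mapping expert id -> list of output vectors, one per
    token routed to that expert (hypothetical calibration data). Gate weights
    are deliberately unused, and the average runs over routed tokens only.
    """
    scores = {}
    for expert, outputs in expert_outputs.items():
        if not outputs:
            # Assumption: an expert that received no tokens scores lowest.
            scores[expert] = 0.0
            continue
        norms_sq = [sum(x * x for x in vec) for vec in outputs]
        if squared:
            scores[expert] = sum(norms_sq) / len(norms_sq)
        else:
            scores[expert] = sum(math.sqrt(v) for v in norms_sq) / len(norms_sq)
    return scores


def experts_to_prune(scores, n_prune):
    """Return the n_prune lowest-scoring expert ids (ties broken by id)."""
    return sorted(scores, key=lambda e: (scores[e], e))[:n_prune]
```

Because the score divides by the number of routed tokens, an expert that is rarely selected but produces large activations is not penalized for its low routing frequency, which is the property the abstract associates with task-agnostic pruning.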

2606.15912 2026-06-16 cs.LG cs.AI 新提交

Mixtures of Subspaces: Toward Bandwidth-Efficient Context-Parallel Training

Sameera Ramasinghe, Ajanthan Thalaiyasingam, Hadi Mohaghegh Dolatabadi, Gil Avraham, Violetta Shevchenko, Yan Zuo, Chamin Hewa Koneputugodage, Alexander Long

发表机构 * Pluralis Research

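AI summary: The paper proposes a compression method based on mixtures of subspaces that achieves over 95% communication compression for low-bandwidth decentralized training and supports training billion-parameter models at context lengths exceeding 100K tokens.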

AI中文摘要

预训练具有扩展上下文窗口的语言模型增强了它们在生成过程中利用丰富信息的能力。现有方法将输入序列分割成块，广播到多个设备，并逐块计算注意力，这带来了显著的通信开销。虽然在高速集群中可行，但这些方法在低带宽连接上的去中心化训练中不实用。我们提出了一种用于去中心化设置中通信高效上下文并行的压缩方法，实现了超过95%的显著压缩率，开销极小且无收敛损失。我们的关键洞察是通过高效重参数化，将激活输出动态约束到学习到的子空间混合，从而利用其内在的低秩结构。我们展示了将十亿参数去中心化模型扩展到超过100K令牌的上下文长度，在慢至300Mbps的网络上，匹配了在100Gbps互连上的集中式模型的壁钟收敛速度。

英文摘要

Pretraining language models with extended context windows enhances their ability to leverage rich information during generation. Existing methods split input sequences into chunks, broadcast them across multiple devices, and compute attention block by block, which incurs significant communication overhead. While feasible in high-speed clusters, these methods are impractical for decentralized training over low-bandwidth connections. We propose a compression method for communication-efficient context parallelism in decentralized settings, achieving a compression rate of over 95% with negligible overhead and no loss in convergence. Our key insight is to exploit the intrinsic low-rank structure of activation outputs by dynamically constraining them to learned mixtures of subspaces via efficient reparameterizations. We demonstrate scaling billion-parameter decentralized models to context lengths exceeding 100K tokens on networks as slow as 300 Mbps, matching the wall-clock convergence speed of centralized models on 100 Gbps interconnects.

2606.14739 2026-06-16 cs.ET cs.LG cs.SY eess.SY 交叉投稿

An RRAM-based Hardware Implementation of a Radial Basis Function Neuron for Edge Classifiers

基于RRAM的径向基函数神经元硬件实现用于边缘分类器

Georgios Papandroulidakis, Shady Agwa, Themis Prodromakis

发表机构 * Centre for Electronics Frontiers, Institute of Micro and Nano Systems（电子前沿中心，微纳系统研究所）

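AI summary: The paper proposes a hardware design built on metal-oxide RRAM-based analogue content addressable memory (ACAM), in which configurable receptive-field neurons perform metric-based classification and online adaptation on edge devices; in simulation it reaches 89.1% accuracy on MNIST at 185 fJ per cell per operation.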

AI中文摘要

现代机器学习（ML）解决方案在资源受限的边缘设备上的部署凸显了实现挑战。对于包含安全关键组件（如自主导航任务）的极端边缘应用尤其如此。本文展示了一种人工神经网络（ANN）设计，利用基于金属氧化物电阻式RAM（RRAM）的模拟内容可寻址存储器（ACAM）作为高效的硬件基础，用于在边缘执行基于度量的分类和在线自适应。所提出的设计基于用于构建ACAM模块的自定义模板像素（TXL）单元，其中每个TXL单元充当可配置的感受野神经元。这些单元采用径向基激活函数来计算输入与编程感受野的距离。TXL可以组织成密集阵列，用于计算高维输入与所有存储原型之间的距离，从而有效执行快速且节能的相似性搜索。该硬件引擎支持即时学习，其中感受野参数可以调整以跟踪域偏移。通过模拟所提出的TXL-RBF分类器，我们在MNIST数据集上实现了89.1%的准确率，同时在100MHz运行时每单元每操作消耗185fJ。

英文摘要

The deployment of modern machine learning (ML) solutions on resource-constrained edge devices highlights implementation challenges. This is especially true for extreme edge applications that include safety-critical components, such as autonomous navigation tasks. This paper demonstrates an artificial neural network (ANN) design leveraging Metal-Oxide Resistive RAM (RRAM)-based Analogue Content Addressable Memory (ACAM) as an efficient hardware substrate for performing metric-based classification and online adaptation on the edge. The proposed design is based on a custom Template piXeL (TXL) cell used for building the ACAM module, where each TXL cell acts as a configurable receptive field neuron. These cells employ a Radial Basis activation function to calculate the distance of an input from the programmed receptive field. The TXL cells can be organised into dense arrays for calculating the distance of a high-dimensional input against all stored prototypes, effectively performing fast and energy-efficient similarity search. This hardware engine enables on-the-fly learning, where the receptive field parameters can be tuned to track domain shift. Through simulation of the proposed TXL-RBF classifier we can achieve 89.1% accuracy on the MNIST dataset while consuming 185 fJ per cell per operation when operating at 100 MHz.
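
In software terms, the metric-based classification described above amounts to comparing an input against stored prototypes with a radial basis activation and picking the best match. The sketch below assumes a Gaussian radial basis function and a single shared `width`; the analogue TXL cell's actual transfer characteristic is not specified in the abstract, and the function names are illustrative.

```python
import math


def rbf_activation(x, center, width):
    """Gaussian radial basis response: 1.0 at the center, decaying with distance."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))


def classify(x, prototypes, width=1.0):
    """Return the label of the prototype with the strongest RBF response.

    prototypes: list of (label, center) pairs, standing in for the stored
    receptive fields of an array of cells (hypothetical data layout).
    """
    best_label, _ = max(prototypes, key=lambda p: rbf_activation(x, p[1], width))
    return best_label
```

Retuning a prototype's `center` corresponds, loosely, to the on-the-fly adaptation of receptive field parameters that the abstract mentions.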

2606.14992 2026-06-16 cs.AR cs.LG 交叉投稿

KATANA: A Fast, Low-Power Mapping of Kalman Filters onto Edge NPUs for Real-Time Tracking

KATANA：一种将卡尔曼滤波器快速、低功耗映射到边缘NPU上用于实时跟踪的方法

Bodhisatwa Kundu, Anish Rooj, Sumit Saha, Abhradeep Sarkar, Arghadip Das, Arnab Raha, Mrinal K. Naskar

发表机构 * Indian Institute of Technology, Kharagpur（印度理工学院克勒格布尔分校）

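AI summary: To meet the power and real-time constraints of Kalman filters in edge tracking systems, the paper proposes KATANA, which maps the LKF and EKF onto a commercial NPU through three algebraic graph rewrites and achieves up to a 97.9% reduction in dynamic energy relative to a CPU implementation on Intel Core Ultra silicon.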

AI中文摘要

状态估计是每个实时跟踪系统的闭环核心，从雷达监视和反无人机防御到自动驾驶和机器人技术。这些部署运行在边缘平台上，防御系统安装在车辆和无人机上，民用管道则存在于汽车和手持设备中。在这里，每增加一瓦计算能力都会侵蚀任务持续时间或操作范围。随之而来两个硬约束：每个新测量值必须在下一个控制周期之前融合，并且总计算量必须严格符合电池和热功率预算。线性卡尔曼滤波器（LKF）和扩展卡尔曼滤波器（EKF）是这些系统上的主要估计器，但如今它们几乎完全在CPU上执行，这会使多目标跟踪（MOT）更新串行化，或者在定制FPGA/ASIC加速器上执行，这会延长设计周期。当代AI-PC SoC，如Intel Core Ultra系列1和2，集成了一个低功耗、数据并行的神经处理单元（NPU）。因此，我们询问是否可以将卡尔曼滤波器映射到这个现有的矩阵引擎上，同时满足实时和低功耗预算，避免专用加速器，并保持CPU和GPU空闲用于主要工作负载。我们提出KATANA，一个NPU感知的优化框架，首次将LKF和EKF端到端映射到商用NPU上，并在量产AI-PC芯片上进行跨平台表征。KATANA应用了三种代数图重写：通过预计算的负投影矩阵H_neg进行减到加的重构、静态形状张量融合以及块对角批量并行化，确保100%的操作在DPU矩阵引擎上执行。在Series 2上，优化的批量EKF达到223.35 FPS，有功功率13.43 W，LKF达到408.73 FPS，有功功率14.05 W，与CPU实现相比，动态能耗降低高达97.9%。

英文摘要

State estimation is the closed-loop core of every real-time tracking system, from radar surveillance and counter-UAV defense to autonomous driving and robotics. These deployments run on edge platforms, where defense systems mount on vehicles and drones, and civilian pipelines live on cars and handheld devices. Here, every additional watt of compute erodes mission duration or operational range. Two hard constraints follow: each new measurement must be fused before the next control cycle, and the total compute must fit within a strict battery and thermal power envelope. The Linear and Extended Kalman Filters (LKF, EKF) are dominant estimators on these systems, but today they execute almost exclusively on CPUs, which serialize multi-object tracking (MOT) updates, or on custom FPGA/ASIC accelerators that lengthen design cycles. Contemporary AI-PC SoCs, like the Intel Core Ultra Series 1 and 2, integrate a low-power, data-parallel Neural Processing Unit (NPU). We therefore ask whether the Kalman filter can be mapped onto this existing matrix engine to meet real-time and low-power budgets simultaneously, avoiding a dedicated accelerator and keeping the CPU and GPU free for primary workloads. We present KATANA, an NPU-aware optimization framework delivering the first end-to-end mapping of the LKF and EKF onto a commercial NPU, alongside a cross-platform characterization on shipping AI-PC silicon. KATANA applies three algebraic graph rewrites: subtract-to-add reformulation via a precomputed negative-projection matrix H_neg, static-shape tensor fusion, and block-diagonal batched parallelization, ensuring 100% of operations execute on the DPU matrix engine. On the Series 2, the optimized batched EKF reaches 223.35 FPS at 13.43 W active power, and the LKF reaches 408.73 FPS at 14.05 W, delivering up to a 97.9% reduction in dynamic energy versus the CPU implementation.
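
The subtract-to-add rewrite can be illustrated on the Kalman innovation. The abstract names the precomputed matrix H_neg but not the exact expression it is used in, so the sketch below assumes it replaces the subtraction in $y = z - Hx$ with $y = z + H_{\text{neg}}x$, where $H_{\text{neg}} = -H$ is computed once offline. The helper names and the plain-list matrix representation are illustrative only.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def negate(M):
    """Precompute the negated measurement matrix once, outside the filter loop."""
    return [[-m for m in row] for row in M]


def innovation_subtract(z, H, x):
    """Textbook form: y = z - H x (needs a subtraction at run time)."""
    return [z_i - hx_i for z_i, hx_i in zip(z, matvec(H, x))]


def innovation_add(z, H_neg, x):
    """Rewritten form: y = z + H_neg x (only multiply and add at run time)."""
    return [z_i + hx_i for z_i, hx_i in zip(z, matvec(H_neg, x))]
```

The two forms are algebraically identical; the rewrite only moves the sign change into a constant, which is useful when the target engine favors multiply-accumulate operations.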

2606.15001 2026-06-16 physics.comp-ph cond-mat.mtrl-sci cs.LG physics.chem-ph 交叉投稿

A Spatio-Temporal Expert Prefetching Framework for Efficient MoE LLM Inference

Yingnan Zhao, Razvan Bunescu, Ahmed Louri, Avinash Karanth, Ke Wang

发表机构 * George Washington University（乔治华盛顿大学）； University of North Carolina at Charlotte（北卡罗来纳大学夏洛特分校）； Ohio University（俄亥俄大学）

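AI summary: To address expert-loading latency in MoE LLM inference, the paper analyzes the spatio-temporal correlation of expert selection and proposes ST-MoE, which combines lightweight runtime prediction with a reconfigurable hardware design to prefetch experts, overlapping loading with computation and improving performance and energy efficiency.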

AI中文摘要

基于混合专家（MoE）的大语言模型（LLM），如Qwen和DeepSeek，最近成为一种有效的方法，可以在不按比例增加计算成本的情况下提高模型容量。通过用一组专家替换密集LLM中的传统前馈网络，并为每个输入令牌仅激活其中一部分专家，MoE模型显著增加了总参数数量，同时保持每个令牌的计算相对可控。然而，这种动态且不规则的专家激活模式在推理过程中也引入了大量的专家加载开销，因为所需的专家必须根据令牌相关的路由结果按需获取。因此，专家加载延迟成为性能和能效低下的主要来源。为此，我们首先对多种基于MoE的LLM及应用（包括语言理解和代码生成）中的专家选择行为进行了全面分析。我们的分析揭示，在每个应用领域内，专家请求在相邻的MoE层和连续的解码令牌之间表现出强相关性，使得未来的专家激活可预测。基于这一洞察，我们提出了ST-MoE，一种时空专家预取框架，它主动提前准备专家，以将专家加载与正在进行的计算重叠。ST-MoE结合了一种轻量级的运行时预测机制（保持原始路由行为）和一种可重构的硬件设计（有效支持动态专家预取）。预测机制与支持硬件的结合效果显著提高了MoE推理性能和能效，同时保持了模型推理精度。

英文摘要

Mixture-of-Experts (MoE) based large language models (LLMs), such as Qwen and DeepSeek, have recently emerged as an effective approach to improving model capacity without proportionally increasing computational cost. By replacing the conventional feed-forward network in dense LLMs with a set of experts and activating only a subset of them for each input token, MoE models significantly increase the total number of parameters while keeping the per-token computation relatively manageable. However, this dynamic and irregular expert activation pattern also introduces substantial expert loading overhead during inference, since the required experts must be fetched on demand according to token-dependent routing results. As a result, expert loading latency becomes a major source of performance and energy inefficiency. To this end, we first perform a comprehensive analysis of expert selection behavior in various MoE-based LLMs and applications, including language understanding and code generation. Our analysis reveals that, within each application domain, expert requests exhibit strong correlation across both adjacent MoE layers and consecutive decoding tokens, making future expert activations predictable. Based on this insight, we propose ST-MoE, a spatio-temporal expert prefetching framework that proactively stages experts ahead of use to overlap expert loading with ongoing computation. ST-MoE combines a lightweight runtime prediction mechanism that preserves the original routing behavior with a reconfigurable hardware design that efficiently supports dynamic expert prefetching. The combined effect of the prediction mechanism with the supporting hardware significantly improves MoE inference performance and energy efficiency while preserving model inference accuracy.

2606.15523 2026-06-16 cs.NE cs.AI cs.LG 交叉投稿

AQ4SViT: An Automated Quantization Framework with Search Gating Policy for Compressing Spiking Vision Transformers

AQ4SViT：一种用于压缩脉冲视觉Transformer的自动化量化框架与搜索门控策略

Rachmad Vidya Wicaksana Putra, Saad Iftikhar, Muhammad Shafique

发表机构 * eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi（eBRAIN实验室，工程学院，纽约大学（NYU）阿布扎比分校）； New York University (NYU) Abu Dhabi, United Arab Emirates (UAE)（纽约大学（NYU）阿布扎比分校，阿拉伯联合酋长国（UAE））

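AI summary: The paper proposes AQ4SViT, an automated quantization framework that combines a quantization search strategy with a search gating policy based on membrane potential drift to quickly find quantization settings that balance accuracy and memory when compressing spiking vision transformers.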

Comments 8 pages, 4 figures, 2 tables

AI中文摘要

脉冲视觉Transformer（SViT）已成为替代性的低功耗ViT模型，但其大规模阻碍了在资源受限的嵌入式AI系统上的部署。为解决此问题，现有工作提出了量化技术来压缩SViT模型，但其手动、人工引导的方法需要大量设计时间和功耗来为每个给定网络找到合适的量化设置，使得该方法在量化多个网络时不可扩展。为此，我们提出了AQ4SViT，一种新颖的SViT自动化量化框架，能够提供快速的量化设置，并在精度和内存之间取得良好权衡。为实现这一点，AQ4SViT采用以下关键思想：量化搜索策略，在考虑精度约束的同时评估量化设置候选；以及搜索门控策略，通过利用膜电位漂移作为性能代理，快速评估和选择有前景的量化候选。在搜索门控策略中，AQ4SViT采用两种搜索算法变体以提供权衡选项：贪心搜索，执行速度快但可能导致局部最优；以及束搜索，执行速度较慢但由于搜索空间更广，在寻找全局最优选择方面性能更好。实验结果表明，与现有技术相比，AQ4SViT-Greedy快速找到合适的量化设置，搜索时间加快高达6.6倍，内存节省高达82.5%；而AQ4SViT-Beam进一步将内存占用降低高达90%，但搜索时间延长4.5倍；所有这些结果均在保持高精度的前提下获得，在ImageNet数据集上精度与原始/非量化模型相差在1.5%以内。这些结果凸显了AQ4SViT框架在推动SViT在嵌入式AI系统部署方面的进展。

英文摘要

Spiking Vision Transformers (SViTs) have emerged as alternative low-power ViT models, but their large sizes hinder deployment on resource-constrained embedded AI systems. To address this, state-of-the-art works have proposed quantization techniques to compress SViT models, but their manual, human-guided approach requires substantial design time and power/energy consumption to find an appropriate quantization setting for each network, so it does not scale to quantizing multiple networks. To this end, we propose AQ4SViT, a novel automated quantization framework for SViTs that quickly provides quantization settings with good trade-offs between accuracy and memory. AQ4SViT employs two key ideas: a quantization search strategy that evaluates candidate quantization settings under an accuracy constraint, and a search gating policy that quickly evaluates and selects promising candidates by using membrane potential drift as a performance proxy. In the search gating policy, AQ4SViT offers two search algorithm variants as trade-off options: Greedy search, which is fast but may end in local optima, and Beam search, which is slower but better at finding global optima because it explores a wider search space. Experimental results show that AQ4SViT-Greedy quickly finds appropriate quantization settings, achieving up to 6.6x faster search time and up to 82.5% memory saving compared to the state-of-the-art, while AQ4SViT-Beam further reduces the memory footprint by up to 90% compared to the state-of-the-art, but with 4.5x longer search time. All of these results are obtained while keeping accuracy within 1.5% of the original non-quantized models on the ImageNet dataset. These results highlight that the AQ4SViT framework advances SViT deployment on embedded AI systems.

2606.15959 2026-06-16 cs.DC cs.AI cs.LG 交叉投稿

NeuronFabric: A Software Reference Architecture for On-Chip Transformer Training with Local Adam

Evgeny Ukladchikov

发表机构 * Independent Researcher（独立研究者）

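AI summary: The paper presents NeuronFabric, a software reference architecture for future FPGA/ASIC implementations of transformer training with local Adam updates; its BF16W weight storage reduces on-chip memory requirements, and numerical correctness is validated on a 334K-parameter model.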

AI中文摘要

公开记载的加速器架构通常将训练计算与优化器状态更新分离，或依赖外部内存和主机协调。本文提出NeuronFabric，一种旨在用于未来FPGA和ASIC实现Transformer训练与本地Adam更新的软件参考架构。一个完整的C#原型实现了前向传播、反向传播和Adam优化，无需外部机器学习框架。目标是在硬件实现前验证数值正确性和内存需求。评估模型是一个334K参数的自回归Transformer（d=88, H=4, f=264, L=4, vocab=256），在莎士比亚语料库上训练。BF16W配置在80K样本后达到评估损失1.5426，而FP32 GPU参考为1.5224，同时生成连贯的字符级文本。本文引入BF16W，它以BF16存储权重，同时以FP32保留Adam优化器动量。这减少了片上训练的内存需求。一个带Adam动量的334K参数FP32模型需要约4.0 MB，与Xilinx ZCU102设备的BRAM容量匹配。BF16W变体需要约3.34 MB，为激活存储留出内存。我们描述了早期实验中观察到的词汇预算约束，量化了BF16W内存节省，并概述了FPGA训练作为下一开发阶段。本文不包含FPGA测量。本出版物作为未来FPGA和ASIC探索NeuronFabric架构的公开架构披露和软件参考实现。

英文摘要

Publicly documented accelerator architectures generally separate training computation from optimizer-state updates or rely on external memory and host orchestration. This paper presents NeuronFabric, a software reference architecture intended for future FPGA and ASIC implementations of transformer training with local Adam updates. A complete C# prototype implements forward pass, backpropagation, and Adam optimization without external machine-learning frameworks. The goal is to validate numerical correctness and memory requirements before hardware implementation. The evaluated model is a 334K-parameter autoregressive transformer (d=88, H=4, f=264, L=4, vocab=256) trained on the Shakespeare corpus. The BF16W configuration achieves evaluation loss 1.5426 after 80K samples, compared with 1.5224 for an FP32 GPU reference, while producing coherent character-level text. The paper introduces BF16W, which stores weights in BF16 while retaining Adam optimizer moments in FP32. This reduces memory requirements for on-chip training. A 334K-parameter FP32 model with Adam moments requires approximately 4.0 MB, matching the BRAM capacity of a Xilinx ZCU102 device. The BF16W variant requires approximately 3.34 MB, leaving memory available for activation storage. We describe the vocabulary-budget constraint observed during earlier experiments, quantify BF16W memory savings, and outline FPGA training as the next stage of development. No FPGA measurements are included in this paper. This publication serves as a public architectural disclosure and software reference implementation for future FPGA and ASIC exploration of the NeuronFabric architecture.
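
The memory figures quoted above follow from simple byte counting. The sketch below assumes exactly 334,000 parameters (the abstract says only "334K"), two Adam moment buffers per parameter, and 1 MB = 10^6 bytes; the function name `training_state_bytes` is illustrative. Activation storage is not included.

```python
def training_state_bytes(n_params, weight_bytes, moment_bytes, n_moments=2):
    """Bytes needed for the weights plus the Adam moment buffers."""
    return n_params * (weight_bytes + n_moments * moment_bytes)


N_PARAMS = 334_000  # assumption: "334K" rounded to the nearest thousand

# FP32 weights (4 bytes) with two FP32 moments (4 bytes each): 12 bytes/param.
fp32_mb = training_state_bytes(N_PARAMS, 4, 4) / 1e6

# BF16W: BF16 weights (2 bytes) with two FP32 moments: 10 bytes/param.
bf16w_mb = training_state_bytes(N_PARAMS, 2, 4) / 1e6
```

Under these assumptions the FP32 configuration needs 4.008 MB and BF16W needs 3.34 MB, consistent with the approximately 4.0 MB and 3.34 MB stated in the abstract; the saving is one sixth of the training state because only the weights shrink.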

2606.16599 2026-06-16 cs.AR cs.LG 交叉投稿

TreeGRNG: Binary Tree Gaussian Random Number Generator for Efficient Probabilistic AI Hardware

TreeGRNG：用于高效概率AI硬件的二叉树高斯随机数生成器

Jonas Crols, Guilherme Paim, Shirui Zhao, Marian Verhelst

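AI summary: The paper proposes TreeGRNG, a binary-tree Gaussian random number generator that replaces arithmetic units with low-cost constant comparators; it improves distribution accuracy over the state of the art while cutting energy per sample by 3.7x and raising throughput per unit area by 5.8x, and it lets designers adjust the shape of the sampled distribution.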

Comments 6 pages, 5 figures. Published in the proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE)

DOI: 10.23919/DATE58400.2024.10546516
Journal ref: 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6 (2024)

AI中文摘要

贝叶斯神经网络（BNN）通过监控决策中的不确定性，为增强传统神经网络的可信度提供了机会。然而，在极端边缘进行BNN推理的一个显著缺点是在每个神经元中必须集成高斯随机数生成器（GRNG）。最先进的GRNG算法严重依赖多种算术运算和大型查找表的使用，给超低功耗硬件实现带来了重大挑战。为了克服这一问题，本文提出了一种创新的二叉树随机数生成器（TreeGRNG），允许使用超低成本的常数比较器代替算术单元。我们进一步利用高斯特性对TreeGRNG方案进行了一系列硬件感知优化。优化后的TreeGRNG在分布精度上超越了最先进技术（SoTA），同时每样本能耗降低了3.7倍，单位面积吞吐量提升了5.8倍。此外，我们的TreeGRNG方案在灵活性方面比当前SoTA具有明显优势，因为它使设计者能够轻松调整采样概率分布的形状，超越了传统GRNG的能力，为未来概率AI设计开辟了前景。TreeGRNG设计已在链接中开源。

英文摘要

Bayesian Neural Networks (BNNs) offer opportunities for greatly enhancing the trustworthiness of conventional neural networks by monitoring the uncertainties in decision-making. A significant drawback for BNN inference at the extreme edge, however, is the imperative need to incorporate Gaussian Random Number Generators (GRNG) within each neuron. State-of-the-art GRNG algorithms heavily depend on multiple arithmetic operations and the use of extensive look-up tables, posing significant implementation challenges for ultra-low power hardware implementations. To overcome this, this paper presents an innovative binary tree random number generator (TreeGRNG) allowing the use of ultra-low-cost constant comparators instead of arithmetic units. We further enhance the TreeGRNG proposal with a set of hardware-aware optimizations exploiting the Gaussian properties. The optimized TreeGRNG surpasses the State-of-the-Art (SoTA) in terms of distribution accuracy while achieving a 3.7$\times$ reduction in energy per sample and boosting the throughput per unit area by 5.8$\times$. Moreover, our TreeGRNG proposal possesses a distinct advantage over the current SoTA in terms of flexibility, as it easily enables designers to adjust the shape of the sampled probability distribution, extending beyond the capabilities of traditional GRNGs, opening the horizon towards future probabilistic AI designs. The TreeGRNG design is available open-source in the link

QuantKAN：科尔莫戈罗夫-阿诺德网络的统一量化框架

Kazi Ahmed Asif Fuad, Lizhong Chen

发表机构 * Department of EECS, Oregon State University（俄勒冈州立大学电气工程与计算机科学系）

AI总结提出QuantKAN，首个针对KAN网络的统一量化感知训练和后训练量化框架，通过分支感知量化器处理异构参数，在多个数据集上建立基准，并揭示架构特定失效模式。

AI中文摘要

科尔莫戈罗夫-阿诺德网络（KANs）用基于样条的函数替代线性权重，提供了强大的表达能力，但由于参数分布异构，给低精度部署带来了挑战。我们引入了QuantKAN，这是第一个针对KANs的量化感知训练（QAT）和后训练量化（PTQ）的统一框架。该框架对基础（base）参数和样条参数采用分支感知量化器，并将现代QAT和PTQ方法扩展到基于样条的层，涵盖EfficientKAN、FastKAN、PyKAN和KAGN。在MNIST、CIFAR-10/100、TinyImageNet和ImageNet上的实验提供了首个统一的QAT/PTQ KAN基准，表明DSQ在激进低位设置下是最稳健的QAT方法，而GPTQ在中等精度下是最强的PTQ方法。敏感性分析揭示了架构特定的失效模式：在FastKAN中，样条/基函数（basis）参数占主导，而在EfficientKAN、GRAM和PyKAN中，基础（base）或缩放参数占主导。在Xilinx UltraScale+设备上的Vivado HLS估计进一步表明，在W4A4下吞吐量提升高达3.32倍，每次推理的估计动态能耗降低7.7倍，同时揭示了残留的“基函数评估税”，这为基函数感知的微架构提供了动机。QuantKAN可在 https://github.com/OSU-STARLAB/QuantKAN/ 获取。

英文摘要

Kolmogorov--Arnold Networks (KANs) replace linear weights with spline-based functions, offering strong expressivity but posing challenges for low-precision deployment due to heterogeneous parameter distributions. We introduce QuantKAN, the first unified framework for quantization-aware training (QAT) and post-training quantization (PTQ) of KANs. The framework employs branch-aware quantizers for base and spline parameters and extends modern QAT and PTQ methods to spline-based layers across EfficientKAN, FastKAN, PyKAN, and KAGN. Experiments on MNIST, CIFAR-10/100, TinyImageNet, and ImageNet provide the first unified QAT/PTQ KAN benchmarks and show that DSQ is the most robust QAT method at aggressive low-bit settings, while GPTQ is the strongest PTQ method at moderate precision. Sensitivity analyses reveal architecture-specific failure modes: spline/basis parameters dominate in FastKAN, while base or scaling parameters dominate in EfficientKAN, GRAM, and PyKAN. Vivado HLS estimates on a Xilinx UltraScale+ device further suggest up to 3.32$\times$ throughput and 7.7$\times$ lower estimated dynamic energy per inference under W4A4, exposing a residual "basis-evaluation tax" that motivates basis-aware microarchitecture. QuantKAN is available at https://github.com/OSU-STARLAB/QuantKAN/.

2602.00482 2026-06-16 cs.LG 版本更新

AREAL-DTA: Dynamic Tree Attention for Efficient Reinforcement Learning of Large Language Models

AREAL-DTA：用于大语言模型高效强化学习的动态树注意力机制

Jiarui Zhang, Yuchen Yang, Ran Yan, Zhiyu Mei, Liyuan Zhang, Daifeng Li, Wei Fu, Jiaxuan Gao, Shusheng Xu, Yi Wu, Binhang Yuan

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结针对RL后训练中rollout序列共享前缀导致计算冗余的问题，提出基于深度优先搜索的动态树注意力机制，结合负载均衡分布式批处理，实现高效前缀共享，训练吞吐量提升最高8.31倍。

Comments Accepted at ICML 2026. Camera-ready version. Code: https://github.com/areal-project/AReaL/tree/feat/dta

AI中文摘要

基于强化学习（RL）的大语言模型（LLM）后训练计算成本高昂，因为其生成大量rollout序列，这些序列经常共享长token前缀。现有的RL框架通常在策略训练期间独立处理这些序列，即在策略梯度计算的前向和后向传播中重复计算相同的前缀，导致计算资源和内存使用的严重低效。尽管前缀共享自然地在rollout上形成树结构，但打包的树掩码方法在RL设置中扩展性差。在本文中，我们介绍AReaL-DTA，它高效地利用了RL训练中的前缀共享。AReaL-DTA采用基于深度优先搜索（DFS）的执行策略，在前向和后向计算期间动态遍历rollout前缀树，每次只具体化一条从根到叶的路径。为了进一步提高可扩展性，AReaL-DTA结合了负载均衡的分布式批处理机制，跨多个GPU动态构建和处理前缀树。在τ²-bench上，AReaL-DTA相比密集训练提高了高达8.31倍的训练吞吐量，相比稀疏训练提高了高达1.70倍。我们的代码可在 https://github.com/areal-project/AReaL/tree/feat/dta 获取。

英文摘要

Reinforcement learning (RL)-based post-training for large language models (LLMs) is computationally expensive, as it generates many rollout sequences that frequently share long token prefixes. Existing RL frameworks usually process these sequences independently during policy training, i.e., repeatedly recomputing identical prefixes in both the forward and backward passes of policy gradient computation, leading to substantial inefficiencies in computation resources and memory usage. Although prefix sharing naturally induces a tree structure over rollouts, packed tree-mask approaches scale poorly in RL settings. In this paper, we introduce AReaL-DTA, which efficiently exploits prefix sharing in RL training. AReaL-DTA employs a depth-first search (DFS)-based execution strategy that dynamically traverses the rollout prefix tree during both forward and backward computation, materializing only a single root-to-leaf path at a time. To further improve scalability, AReaL-DTA incorporates a load-balanced distributed batching mechanism that dynamically constructs and processes prefix trees across multiple GPUs. On $\tau^2$-bench, AReaL-DTA improves training throughput by up to $8.31\times$ over dense training and up to $1.70\times$ over sparse training. Our code is available at https://github.com/areal-project/AReaL/tree/feat/dta.
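
The prefix-sharing argument can be made concrete with a toy token count. The sketch below is an illustration only: it builds a prefix tree over made-up rollouts and walks it depth-first while holding a single root-to-node path, counting how many token positions are touched. It does not implement attention, gradients, or the distributed batching described in the abstract, and the helper names are hypothetical.

```python
def build_prefix_tree(rollouts):
    """Nested-dict trie: each key is a token, each value the subtree below it."""
    root = {}
    for seq in rollouts:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root


def dfs_visit(tree, path, stats):
    """Depth-first walk that keeps only the current root-to-node path in memory."""
    for tok, child in tree.items():
        path.append(tok)               # extend the live path by one token
        stats["visited"] += 1          # each shared token is processed once
        stats["max_path"] = max(stats["max_path"], len(path))
        dfs_visit(child, path, stats)
        path.pop()                     # backtrack so siblings reuse the prefix


# Three toy rollouts that share the prefix [1, 2] (and two share [1, 2, 3]).
rollouts = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 6]]
stats = {"visited": 0, "max_path": 0}
dfs_visit(build_prefix_tree(rollouts), [], stats)
dense_tokens = sum(len(r) for r in rollouts)  # cost when each rollout is processed alone
```

For these toy rollouts, independent processing touches 11 token positions while the tree walk touches 6, and the live path never exceeds 4 tokens.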

2602.06694 2026-06-16 cs.LG 版本更新

NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models

NanoQuant: 大型语言模型高效子1位量化

Hyochan Chong, Dongkyu Kim, Changdong Kim, Minseop Choi

发表机构 * KAIST（韩国科学技术院）

AI总结本文提出NanoQuant，一种新型后训练量化方法，能够将大型语言模型压缩到二进制和子1位水平，通过低秩二进制分解问题实现高效压缩，并在消费者硬件上实现大规模部署。

Comments Accepted to ICML 2026. Hyochan Chong and Dongkyu Kim contributed equally to this work

AI中文摘要

仅权重量化已成为高效服务大型语言模型（LLMs）的标准方法。然而，现有方法无法高效地将模型压缩到二进制（1位）级别，因为它们要么需要大量数据和计算资源，要么会增加存储需求。在本工作中，我们提出了NanoQuant，这是首个后训练量化（PTQ）方法，能够将LLMs压缩到二进制和子1位级别。NanoQuant将量化问题公式化为低秩二进制分解问题，并将全精度权重压缩为低秩二进制矩阵和尺度。具体而言，它利用高效的交替方向乘子法（ADMM）求解器来精确初始化潜在的二进制矩阵和尺度，然后通过块和模型重建过程调整初始化参数。因此，NanoQuant在低内存后训练量化中建立了新的帕累托前沿，并实现了子1位压缩。NanoQuant使大规模部署在消费者硬件上成为可能。例如，在单个H100上，仅需13小时即可将Llama2-70B压缩25.8倍，使70B模型能够在消费者8GB GPU上运行。

英文摘要

Weight-only quantization has become a standard approach for efficiently serving large language models (LLMs). However, existing methods fail to efficiently compress models to binary (1-bit) levels, as they either require large amounts of data and compute or incur additional storage. In this work, we propose NanoQuant, the first post-training quantization (PTQ) method to compress LLMs to both binary and sub-1-bit levels. NanoQuant formulates quantization as a low-rank binary factorization problem, and compresses full-precision weights to low-rank binary matrices and scales. Specifically, it utilizes an efficient alternating direction method of multipliers (ADMM) solver to precisely initialize latent binary matrices and scales, and then tunes the initialized parameters through a block and model reconstruction process. Consequently, NanoQuant establishes a new Pareto frontier in low-memory post-training quantization, and enables sub-1-bit compression. NanoQuant makes large-scale deployment feasible on consumer hardware. For example, it compresses Llama2-70B by 25.8$\times$ in just 13 hours on a single H100, enabling a 70B model to operate on a consumer 8 GB GPU. Code is available at https://github.com/SamsungLabs/NanoQuant.
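
A short storage calculation shows why a low-rank binary factorization can drop below one bit per weight. The sketch assumes, without confirmation from the abstract, that an m x n weight matrix is stored as two binary factors of shapes m x r and n x r plus one 16-bit scale per row and per column. The function name and the example sizes are hypothetical.

```python
def bits_per_weight(m, n, rank, scale_bits=16):
    """Average storage per original weight for an assumed low-rank binary layout."""
    binary_bits = rank * (m + n)             # two binary factors, 1 bit per entry
    scale_storage = scale_bits * (m + n)     # assumed: one scale per row and per column
    return (binary_bits + scale_storage) / (m * n)


# Hypothetical 8192 x 8192 layer.
half_bit = bits_per_weight(8192, 8192, rank=2048)
over_one_bit = bits_per_weight(8192, 8192, rank=4096)
```

Under these assumptions the rank-2048 case costs about 0.504 bits per weight, and the cost exceeds 1 bit once the rank reaches mn/(m+n), which is 4096 for this shape.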

2604.02343 2026-06-16 cs.LG cs.AI cs.IT math.IT 版本更新

Haiku to Opus in Just 10 bits: LLMs Unlock Large Compression Gains

仅用10比特从俳句到巨作：LLMs解锁巨大压缩增益

Roy Rinberg, Annabelle Michael Carrell, Simon Henniger, Nicholas Carlini, Keri Warr

发表机构 * Harvard University（哈佛大学）； University of Cambridge（剑桥大学）； Anthropic

AI总结研究LLM生成文本的无损和有损压缩，提出问答压缩（QA）交互协议，用少量二进制问题实现超100倍压缩比，高效传递知识。

AI中文摘要

我们研究了LLM生成文本在无损和有损场景下的压缩，刻画了一个压缩-计算边界，其中更多的压缩需要更多的计算。对于无损压缩，领域适应的LoRA适配器可以将基于LLM的算术编码的压缩比提高2倍，相对于仅使用基础LLM的压缩。对于有损压缩，提示模型进行简洁重写然后应用算术编码可以实现约0.03的压缩比，比压缩原始响应提高2倍。我们进一步引入了问答压缩（QA），一种受游戏“二十个问题”启发的交互式有损协议。一个小模型通过向更强模型提问是/否问题来迭代优化其响应，每个答案恰好传输1比特。在涵盖数学、科学和代码的8个基准测试中，10个二进制问题恢复了小模型和大模型在标准基准上能力差距的23%到72%，在更难的基准上恢复了7%到38%，实现了0.0006到0.004的压缩比。这比之前基于LLM的压缩（Deletang等人，2024）小100倍以上，表明交互式协议可以比传输完整响应更高效地传递知识。

英文摘要

We study the compression of LLM-generated text across lossless and lossy regimes, characterizing a compression-compute frontier where more compression is possible at the cost of more compute. For lossless compression, domain-adapted LoRA adapters can improve LLM-based arithmetic coding by 2x over compression with the base LLM alone. For lossy compression, prompting a model for a succinct rewrite then applying arithmetic coding can achieve compression ratios of approximately 0.03, a 2x improvement over compressing the original response. We further introduce Question-Asking compression (QA), an interactive lossy protocol inspired by the game 'Twenty Questions'. A small model iteratively refines its response by asking yes/no questions to a stronger model, transferring exactly one bit per answer. On 8 benchmarks spanning math, science, and code, 10 binary questions recover 23% to 72% of the capability gap between a small and large model on standard benchmarks and 7% to 38% on harder benchmarks, achieving compression ratios of 0.0006 to 0.004. This is over 100x smaller than prior LLM-based compression (Deletang et al., 2024), suggesting that interactive protocols can transfer knowledge far more efficiently than transmitting full responses.
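
The bit accounting behind a yes/no protocol can be shown with a toy version of Twenty Questions. This is not the paper's protocol, in which a small model asks natural-language questions of a stronger model. It only illustrates that each binary answer carries one bit, so 10 answers can single out one of 2^10 = 1024 candidates. The function and variable names are hypothetical.

```python
def identify(candidates, answer_yes_no):
    """Halve a sorted candidate list with yes/no answers; count the bits received."""
    lo, hi = 0, len(candidates)
    bits = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if answer_yes_no(candidates[mid]):   # "is the target at or above this item?"
            lo = mid
        else:
            hi = mid
        bits += 1                            # one answer, one bit
    return candidates[lo], bits


secret = 777
found, bits_used = identify(list(range(1024)), lambda item: secret >= item)
```

Here the answering side sends 10 bits in total, regardless of how long a full description of the target would be.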

2605.27599 2026-06-16 cs.LG cs.AI cs.AR cs.DC cs.PF 版本更新

通过剪枝优化缓解基于LUT的神经网络的可扩展性挑战

Xuqi Zhu, Huaizhi Zhang, JunKyu Lee, Jiacheng Zhu, Chandrajit Pal, Sangeet Saha, Klaus D. McDonald-Maier, Xiaojun Zhai

发表机构 * School of Computer Science and Electronic Engineering, University of Essex（埃塞克斯大学计算机科学与电子工程学院）

AI总结针对LUT矩阵乘法可扩展性差的问题，提出集成剪枝策略的LUT-MU架构，在FPGA上实现最高1.6倍吞吐量和4.2倍能效提升。

AI中文摘要

现代深度神经网络严重依赖大量的乘加运算，这构成了主要的计算成本。为了解决这个问题，基于查找表（LUT）的矩阵乘法已成为减少神经网络中乘加运算计算成本和时间的有效替代方案。然而，由于LUT矩阵乘法的固有限制，基于LUT的神经网络仍然面临可扩展性挑战。为了缓解这些可扩展性限制，本文提出了一种可扩展且节能的基于LUT的近似矩阵乘法单元（LUT-MU），通过将剪枝策略集成到MADDNESS算法（一种基于LUT的矩阵乘法方法）中，构成神经网络的基本组件。随着矩阵乘法中问题规模和精度要求的增加，我们提出的LUT-MU架构有效约束了资源扩展。案例研究表明，将我们的LUT-MU部署在神经网络架构中，包括全连接层（MNIST）和ResNets（CIFAR-10、ImageNet）——在XCZU7EV和XCZU19EG FPGA上——与主流的基于CUDA的网络实现相比，产生了高达1.6倍的吞吐量提升和4.2倍的能效提升，与领先的量化神经网络实现相比，能效提升1.8倍，且对精度影响适中。与基于原始MADDNESS的神经网络相比，我们的LUT-MU根据MADDNESS的不同分辨率配置设置，节省了1.3到2.6倍的资源。

英文摘要

Modern deep neural networks heavily rely on a large number of multiply-accumulate operations, which constitute the predominant computational cost. To address this, Look-Up Table (LUT)-based matrix multiplications have emerged as a promising alternative for reducing the computational cost and time of the multiply-accumulate operations in a neural network. However, LUT-based neural networks still face scalability challenges due to the inherent limitations of LUT-based matrix multiplication. To mitigate these scalability limitations, this paper proposes a scalable and energy-efficient LUT-based approximate matrix multiplication unit (LUT-MU), which constitutes the basic component of the neural networks, by integrating a pruning strategy into the MADDNESS algorithm, a LUT-based matrix multiplication methodology. With increasing problem size and precision demands in matrix multiplication, our proposed LUT-MU architecture effectively constrains resource expansion. The case study shows that deploying our LUT-MU in neural network architectures, including fully connected layers (MNIST) and ResNets (CIFAR-10, ImageNet), on XCZU7EV and XCZU19EG FPGAs produces up to $1.6 \times$ throughput improvement and $4.2 \times$ energy efficiency gains over mainstream CUDA-based network implementations, and $1.8\times$ higher energy efficiency than leading quantised neural network implementations, with moderate impact on accuracy. Compared to original MADDNESS-based neural networks, our LUT-MU shows $1.3$ to $2.6\times$ resource savings, depending on the resolution configuration settings of MADDNESS.

2505.23666 2026-06-16 cs.CL cs.LG 版本更新

Frontier: 向全面且准确的LLM推理模拟迈进

Yicheng Feng, Xin Tan, Yangtao Deng, Yimin Jiang, Yibo Zhu, Hong Xu

发表机构 * The Chinese University of Hong Kong（香港中文大学）； Anuttacon ； StepFun

AI总结本文提出Frontier，一种用于现代LLM推理服务的离散事件模拟器，通过离散化抽象和对关键运行时优化的建模，实现了对复杂工作负载的准确预测，从而在不同服务场景中提供更精确的计算、通信和内存成本预测。

AI中文摘要

现代LLM服务已不再是同质或单体式的。生产系统现在结合了解耦执行、复杂并行性、运行时优化以及推理、智能体和RL rollout等有状态工作负载。模拟对于探索这个快速增长的设计空间具有吸引力，但现有模拟器缺乏所需的架构完整性和决策级精度。它们的单体副本抽象不适合解耦服务，而平均情况的解析代理模型可能会扭曲SLA预测，甚至逆转优化结论。我们提出了Frontier，一种用于现代LLM推理服务的离散事件模拟器。Frontier采用解耦抽象：它通过角色特定的集群工作节点对共置、预填充-解码解耦（PDD）和注意力-FFN解耦（AFD）进行建模，从而捕捉现代服务系统的结构和动态；在调度器-批次-引擎循环中整合关键运行时优化（例如CUDA Graphs、推测解码）；并为新兴工作负载支持有状态请求。它进一步为包含复杂工作负载组合的多样化服务场景提供准确且可泛化的计算、通信和内存成本预测。在16块H800 GPU的测试平台上，Frontier的平均吞吐量误差低于4%。与最先进的模拟器相比，它在共置情况下将端到端延迟误差从44.9%降低到6.4%，在解耦情况下从51.7%降低到2.6%。它可在商用CPU上扩展到1000块以上GPU的规模，并支持新的用例，如依赖SLA的帕累托前沿探索、异构解耦分配、智能体推理调度验证和RL后训练重配置。Frontier已在 https://github.com/NetX-lab/Frontier 发布。

英文摘要

Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simulation is attractive for exploring this growing design space, yet existing simulators lack the architectural completeness and decision-grade fidelity it demands. Their monolithic-replica abstractions are ill-suited to disaggregated serving, while average-case analytical proxies can distort SLA predictions and even reverse optimization conclusions. We present Frontier, a discrete-event simulator for modern LLM inference serving. Frontier features a disaggregated abstraction. It captures the structure and dynamics of modern serving systems by modeling co-location, Prefill-Decode Disaggregation (PDD), and Attention-FFN Disaggregation (AFD) with role-specific cluster workers, incorporating key runtime optimizations (e.g., CUDA Graphs, speculative decoding) within the scheduler-batch-engine loop, and supporting stateful requests for emerging workloads. It further provides accurate and generalizable predictions of computation, communication, and memory costs across diverse serving scenarios with complex workload compositions. On a 16-H800 GPU testbed, Frontier achieves an average throughput error below 4%. Compared with state-of-the-art simulators, it reduces end-to-end latency error from 44.9% to 6.4% under co-location and from 51.7% to 2.6% under disaggregation. It scales to over 1K GPUs on commodity CPUs and enables new use cases such as SLA-dependent Pareto frontier exploration, heterogeneous disaggregated allocation, agentic reasoning scheduling validation, and RL post-training reconfiguration. We release Frontier at https://github.com/NetX-lab/Frontier.

2606.15625 2026-06-16 cs.LG cs.NI 新提交

Conflict-Aware Federated Fine-Tuning of Large Language Models with Mixture-of-Experts

基于混合专家的大语言模型冲突感知联邦微调

Yijun Lu, Zihan Fang, Pengpeng Qiao, Zheng Lin, Jing Yang, Yuxin Zhang, Por Lip Yee, Zhe Chen, Jun Luo

发表机构 * Nanyang Technological University（南洋理工大学）； University of Malaya（马来亚大学）

AI总结针对联邦学习中混合专家模型因数据异质性导致的专家优化冲突问题，提出FC-MoE框架，通过重要性加权、梯度共识投影和局部知识保留机制，实现稳定优化并提升非独立同分布环境下的模型性能。

Comments 6 pages, 4 figures

AI中文摘要

大语言模型（LLMs）的持续扩展带来了高昂的计算成本，使得混合专家（MoE）通过稀疏激活成为一种可扩展的高效微调替代方案。虽然联邦学习（FL）作为隐私保护的协作优化范式出现，但在数据异质性下将MoE集成到FL中可能触发冲突的专家优化。客户端特定的数据分布迫使相同索引的专家在不一致甚至冲突的特征-标签相关性下进行优化。这种不匹配在聚合过程中引起破坏性干扰，从而破坏优化轨迹并降低模型性能。为解决此问题，我们提出FC-MoE，一种用于MoE微调的联邦冲突感知框架。它采用重要性感知加权方案来优先考虑可靠的局部更新，并利用梯度共识投影来抑制冲突更新，确保稳定的全局优化路径。此外，局部知识保留机制通过重新锚定领域特定残差进一步保留专门的客户端专业知识。大量实验表明，FC-MoE在非独立同分布联邦环境中加速收敛并增强全局和局部模型性能。

英文摘要

The continuous scaling of large language models (LLMs) incurs prohibitive computational costs, making Mixture-of-Experts (MoE) a scalable alternative for efficient fine-tuning via sparse activation. While federated learning (FL) emerges as the paradigm for privacy-preserving collaborative optimization, integrating MoE into FL under data heterogeneity may trigger conflicting expert optimizations. Client-specific data distributions force same-indexed experts to optimize under inconsistent or even conflicting feature-label correlations. This mismatch induces destructive interference during aggregation, thus destabilizing the optimization trajectory and degrading model performance. To address this issue, we propose FC-MoE, a federated conflict-aware framework for MoE fine-tuning. It employs an importance-aware weighting scheme to prioritize reliable local updates and utilizes gradient consensus projection to suppress conflicting updates, ensuring a stable global optimization path. Moreover, a local knowledge retention mechanism further preserves specialized client expertise by re-anchoring domain-specific residuals. Extensive experiments demonstrate that FC-MoE accelerates convergence and enhances both global and local model performance in non-IID federated environments.

2606.15695 2026-06-16 cs.LG cs.AI 新提交

When Generator Replay Degrades: Projected Rehearsal Orchestration for Heterogeneous Federated Class-Incremental Learning

当生成器回放退化时：面向异构联邦类增量学习的投影排练编排

Thinh T. H. Nguyen, Khoa D. Doan, Binh T. Nguyen, Danh Le-Phuoc, Kok-Seng Wong

发表机构 * VinUniversity ； VNU-HCM, University of Science（胡志明市国家大学理科大学）； Technische Universität Berlin（柏林工业大学）

AI总结针对异构联邦类增量学习中客户端标签子集不同、任务阶段不一致导致的旧知识遗忘问题，提出投影排练编排框架PRO及增强版PRO-MAX，通过服务器端维护紧凑类级投影记忆并实现平衡伪多任务训练，在图像、文本和图基准上提升异构流下的保留与最终效用。

Comments 46 pages

AI中文摘要

联邦类增量学习（FCIL）在客户端观察到不同标签子集、在不同阶段推进任务以及为相同语义概念提供不均匀监督时变得极其困难。现有的FCIL方法通常通过输入空间合成来保留旧知识，但在异构任务流下可能脆弱且难以跨模态迁移。为缓解这些问题，我们提出PRO，一个用投影排练编排替代合成输入回放的框架。为去除外部预训练，我们在相同的预热条件下评估所有方法。此后，PRO在服务器上维护紧凑的类级投影记忆，并允许客户端在当前示例和旧投影记忆上执行平衡的伪多任务训练。为处理更强的表示漂移，我们进一步引入PRO-MAX，它在保持相同服务器轻量原则（服务器仅聚合模型更新和记忆统计）的同时，用邻域加权记忆对齐增强PRO。在图像、文本和图基准上，PRO和PRO-MAX在异构流下提高了保留和最终效用，同时在同构FCIL中保持竞争力。即使基线获得更大的回放预算，它们在监督不平衡和阶段错位下也会退化，表明仅靠回放数量无法解决回放质量失败。额外的弱任务诊断进一步表明，更大的回放不匹配与更大的下游退化相关，而我们的方法使投影记忆与不断演化的表示保持更好对齐。

英文摘要

Federated class-incremental learning (FCIL) becomes substantially harder when clients observe different label subsets, progress through tasks at different stages, and provide uneven supervision for the same semantic concepts. Existing FCIL methods often preserve old knowledge through input-space synthesis, but they can be fragile under heterogeneous task streams and difficult to transfer across modalities. To alleviate such issues, we propose PRO, a framework that replaces synthetic input replay with projected rehearsal orchestration. To remove external pretraining, we evaluate all methods under the same warmup. After this, PRO maintains compact class-level projected memories on the server and allows clients to perform balanced pseudo multi-task training over current examples and old projected memories. To handle stronger representation drift, we further introduce PRO-MAX, which augments PRO with neighborhood-weighted memory alignment while preserving the same server-light principle that the server only aggregates model updates and memory statistics. Across image, text, and graph benchmarks, PRO and PRO-MAX improve retention and final utility under heterogeneous streams while remaining competitive in homogeneous FCIL. Even when baselines are given expanded replay budgets, they degrade under supervision imbalance and stage misalignment, indicating that replay quantity alone does not resolve replay-quality failures. Additional weak-task diagnostics further show that larger replay mismatch is associated with larger downstream degradation, while our method keeps projected memories better aligned with the evolving representation.

2606.15940 2026-06-16 cs.LG 新提交

Causal-Privacy Audit Workflow for Synthetic and Distilled Data in Dropout Support

辍学支持中合成与蒸馏数据的因果隐私审计工作流

Hanghang Zheng, Xiwei Zhuang, Zhong Wang, Hong Liu, Xiao Chen, Jingwen He, Xia Li

发表机构 * Central University of Finance and Economics（中央财经大学）； China Development Bank（国家开发银行）； University of Cambridge（剑桥大学）

AI总结提出CaP-Eval工作流，在固定估计目标下审计合成学生数据的预测效用、因果保真度和隐私风险，发现DPGNet和蒸馏数据在保留处理效应结构上优于基线方法。

AI中文摘要

合成和蒸馏的学生数据越来越多地用于实现隐私意识的学习分析，但它们对面向决策的机构支持的适用性仍不确定。在辍学支持中，生成的数据不仅必须保留预测效用或分布相似性，还必须保留用于指导咨询、付款计划援助和奖学金相关决策的财务状况证据。方法：本研究引入了CaP-Eval，一种面向决策的因果隐私审计工作流，用于在固定估计目标、时间感知调整设计、估计器集和经验隐私治理筛选下评估生成的学生数据。该工作流比较了原始数据、蒸馏数据、对抗合成数据、统计合成数据和DPGNet隐私导向生成数据在预测效用、处理效应保真度、对替代估计器的鲁棒性以及局部训练记录邻近性方面的表现。结果：DPGNet和蒸馏数据比对抗和高斯Copula基线更可靠地保留了原始财务状况处理效应结构。DPGNet在epsilon水平上保留了完整的方向和秩一致性；epsilon=10产生了最小的非原始IPW和DML偏差，而epsilon=1和epsilon=5放大了若干财务状况对比。蒸馏数据保持高度忠实，但保留了最强的局部训练记录邻近信号。TabularGNet保留了定性方向但存在中度衰减，高斯Copula压缩了效应幅度。结论：预测效用、隐私导向、经验披露信号和因果保真度存在分歧；生成的学生数据在决策使用前需要对方向、幅度、重叠和发布治理风险进行联合审计。

英文摘要

Synthetic and distilled student data are increasingly used to enable privacy-conscious learning analytics, yet their suitability for decision-facing institutional support remains uncertain. In dropout support, generated data must preserve not only predictive utility or distributional resemblance, but also the financial-status evidence used to guide advising, payment-plan assistance, and scholarship-related decisions. Method: This study introduces CaP-Eval, a decision-facing causal-privacy audit workflow for evaluating generated student data under a fixed estimand, timing-aware adjustment design, estimator set, and empirical privacy-governance screen. The workflow compares original, distilled, adversarial synthetic, statistical synthetic, and DPGNet privacy-oriented generated data on predictive utility, treatment-effect fidelity, robustness to alternative estimators, and local training-record proximity. Results: DPGNet and distilled data preserved the original financial-status treatment-effect structure more reliably than the adversarial and Gaussian Copula baselines. DPGNet preserved full direction and rank agreement across epsilon levels; epsilon = 10 produced the smallest non-original IPW and DML deviations, while epsilon = 1 and epsilon = 5 amplified several financial-status contrasts. Distilled data remained highly faithful but retained the strongest local training-record proximity signal. TabularGNet preserved qualitative directions with moderate attenuation, and Gaussian Copula compressed effect magnitudes. Conclusions: Predictive utility, privacy orientation, empirical disclosure signals, and causal fidelity diverged; generated student data require joint audits of direction, magnitude, overlap, and release-governance risk before decision use.

2606.16110 2026-06-16 cs.LG 新提交

Auditing Machine Unlearning: A Systematic Research on Whether Models Truly Forget

审计机器遗忘：关于模型是否真正遗忘的系统性研究

Dayong Ye, Tianqing Zhu, Ruiding Huang, Xinbo Fu, Jiayang Li, Bo Liu, Huan Huo, Wanlei Zhou

发表机构 * University of Technology Sydney（悉尼科技大学）； Deakin University（迪肯大学）； Macquarie University（麦考瑞大学）

AI总结针对隐私法规需求，提出首个实用通用审计框架，通过无知证明概念验证现有遗忘算法能否真正擦除指定数据影响，实验表明重训练和微调方法有效，去优化和Fisher/Hessian方法失败。

AI中文摘要

机器遗忘因日益增长的隐私担忧和监管要求而受到广泛研究。然而，审计遗忘算法是否真正擦除了特定数据的影响仍然是一个开放的挑战。缺乏可靠且实用的审计机制可能导致严重的隐私风险，例如残留信息泄露。本文对现有遗忘算法能否真正遗忘指定数据进行了系统性研究。受无知证明概念的启发，我们提出了首个实用且通用的机器遗忘审计框架。我们的框架通过消除从头再训练基线、避免训练大量影子模型以及无需对原始训练过程进行侵入性干预，解决了现有方法的关键实用性限制。为了评估我们框架的有效性，我们首先进行验证实验以确认其健全性和完备性。然后，我们在六个数据集和十种代表性遗忘方法上进行了全面实验。结果表明，我们的框架能够可靠地区分成功和失败的遗忘。特别地，我们观察到基于重训练和基于微调的方法可以实现有效遗忘，即使目标数据仍保留在原始数据集中。相比之下，基于去优化的方法无法实现真正遗忘，反而降低了模型性能。基于Fisher/Hessian的方法也无法遗忘请求的数据，即使提供了形式化认证。此外，我们展示了我们的框架对虚假遗忘尝试具有鲁棒性，并且能够很好地泛化到大型语言模型。

英文摘要

Machine unlearning has been extensively studied in response to growing privacy concerns and regulatory requirements. However, auditing whether unlearning algorithms have truly erased the influence of specific data remains an open challenge. The lack of reliable and practical auditing mechanisms can lead to critical privacy risks, such as residual information leakage. This paper initiates a systematic investigation into whether existing unlearning algorithms can truly forget the designated data. We propose the first practical and general-purpose auditing framework for machine unlearning, inspired by the concept of proof of ignorance. Our framework addresses the key practicality limitations of existing methods by eliminating the need for retraining-from-scratch baselines, avoiding the training of large numbers of shadow models, and requiring no intrusive intervention in the original training process. To evaluate the effectiveness of our framework, we first conduct validation experiments to verify its soundness and completeness. We then perform comprehensive experiments across six datasets and ten representative unlearning methods. The results demonstrate that our framework reliably distinguishes between successful and failed unlearning. In particular, we observe that retraining-based and fine-tuning-based methods can achieve effective unlearning, even when the target data remain in the original dataset. In contrast, de-optimization-based methods fail to achieve true unlearning and instead degrade the model's performance. Fisher/Hessian-based methods also fail to unlearn requested data, even when formal certification is provided. Moreover, we show that our framework is robust against fake unlearning attempts and generalizes well to large language models.

2606.16242 2026-06-16 cs.LG cs.CL 新提交

Rapid Poison: Practical Poisoning Attacks Against the Rapid Response Framework

快速投毒：针对快速响应框架的实用投毒攻击

David Huang, Jaewon Chang, Avidan Shah, Prateek Mittal, Chawin Sitawarin

发表机构 * Princeton University（普林斯顿大学）

AI总结揭示针对快速响应框架的投毒攻击，通过提示注入在训练集中植入恶意样本，实现目标性投毒和概念后门攻击，仅1%投毒率即可导致高达100%误报率和96%漏报率。

Comments Spotlight at ICML 2026

AI中文摘要

快速响应（RR）框架部署在生产系统中，包括Anthropic的ASL-3安全措施，持续改进越狱检测分类器。当出现绕过这些分类器的新越狱方法时，快速响应会生成合成变体用于训练，帮助模型从新攻击中泛化并快速适应。我们揭示，提示注入可以渗透到该管道中，将投毒样本送入分类器的训练集，实现两个攻击目标：（I）目标性投毒攻击，通过将无害样本归类为越狱来制造误报，并具有特定所需特征（例如特定格式、主题或关键词）；（II）基于概念的后门攻击，在存在后门触发器时，诱导对越狱输入产生漏报，甚至泛化到防御者明确训练过的攻击策略中的越狱。重要的是，我们的威胁模型限制攻击者只能修改越狱样本（不能修改良性数据或标签），这是先前工作未探索的约束，使得第二个目标特别具有挑战性。我们通过遗漏攻击解决这一问题，该攻击利用了一个新现象：当在概念缺失的不安全样本上训练时，分类器错误地将该概念的存在与安全标签关联。两种攻击在仅1%的投毒率下都会导致显著且在某些情况下近乎完全的标签翻转，实现高达100%的误报率和高达96%的漏报率。

英文摘要

The Rapid Response (RR) framework, deployed in production systems, including Anthropic's ASL-3 safeguards, continuously improves jailbreak-detection classifiers. When new jailbreaks emerge that bypass these classifiers, Rapid Response generates synthetic variants for training, helping the model generalize from the new attacks and quickly adapt. We reveal that prompt injection can infiltrate this pipeline to deliver poisoned samples into the classifier's training set, enabling two attack objectives: (I) targeted poisoning attacks that create false positives by causing harmless samples with a specific desired feature (e.g., certain formatting, subject, or keyword) to be classified as jailbreaks, and (II) concept-based backdoor attacks that, when the backdoor trigger is present, induce false negatives on jailbreak inputs, generalizing even to jailbreaks from attack strategies the defender explicitly trained against. Importantly, our threat model restricts adversaries to modifying only jailbreak samples (not benign data or labels), a constraint unexplored by prior work that makes the second objective particularly challenging. We address this with Omission Attack, which exploits a new phenomenon: when training on concept-absent unsafe samples, the classifier misassociates that concept's presence with the safe label. Both attacks cause substantial and in some cases near-complete label flipping at only a 1% poisoning rate, achieving up to 100% false positive rates and up to 96% false negative rates.

2606.16304 2026-06-16 cs.LG 新提交

你的隐私我的伪装：差分隐私联邦学习中的后门攻击

Xiaolin Li, Ning Wang, Ninghui Li, Wenhai Sun

AI总结针对差分隐私联邦学习，提出RING攻击，利用差分隐私的掩蔽效应绕过防御，在中等隐私预算下平均攻击成功率90.3%。

AI中文摘要

先前的研究表明，差分隐私（DP）本质上增强了联邦学习（FL）对后门攻击的鲁棒性。在本文中，我们挑战了这一假设。通过对两种基线攻击策略的实证分析，我们揭示了DP-FL中的一个基本矛盾：虽然绕过DP使得最先进的防御能够检测并过滤恶意更新，但遵守DP却无意中掩盖了其独特的统计特征。因此，随着DP降低原始后门信号，现有防御变得无效。基于这种掩蔽效应，我们提出了RING，一种新颖的攻击，明确利用DP来隐藏恶意贡献，同时最大化攻击影响。通过协同制作对抗性扰动，受损客户端在聚合过程中重构强大的后门信号而不触发异常检测。RING作为一个与底层后门技术无关的扰动层，使其广泛适用且可与现有攻击组合——这一特性显著放大了其对DP-FL的威胁。在四个图像和文本数据集上进行的非独立同分布分布下的广泛评估表明，在中等隐私预算下，RING针对六种最先进防御的平均攻击成功率达到90.3%，比基线策略提高了高达26.08倍。最后，我们评估了潜在的防御措施，发现缓解这一威胁会带来显著的效用权衡，暴露了部署差分隐私FL中的基本安全漏洞。

英文摘要

Prior research suggests that differential privacy (DP) inherently enhances the robustness of federated learning (FL) against backdoor attacks. In this paper, we challenge this assumption. Through an empirical analysis of two baseline attack strategies, we uncover a fundamental tension in DP-FL: while bypassing DP allows state-of-the-art defenses to detect and filter malicious updates, complying with DP inadvertently masks their distinguishing statistical characteristics. Consequently, existing defenses become ineffective as DP reduces the raw backdoor signal. Building on this masking effect, we propose RING, a novel attack that explicitly exploits DP to conceal malicious contributions while maximizing attack impact. By collaboratively crafting adversarial perturbations, compromised clients reconstruct a strong backdoor signal during aggregation without triggering anomaly detection. RING operates as a perturbation layer that is agnostic to the underlying backdoor technique, making it broadly applicable and composable with existing attacks -- a property that significantly amplifies the threat it poses to DP-FL. Extensive evaluations across four image and text datasets under non-iid distributions show that RING achieves an average attack success rate of 90.3% against six state-of-the-art defenses under a moderate privacy budget, an improvement of up to 26.08x over baseline strategies. Finally, we evaluate potential countermeasures and find that mitigating this threat incurs significant utility trade-offs, exposing a fundamental security gap in the deployment of differentially private FL.

2606.14987 2026-06-16 cs.CR cs.LG 交叉投稿

Continual Backdoor Training in IoT/CPS

物联网/信息物理系统中的持续后门训练

Oxana Salish, Kuniyilh S

AI总结本文提出一种针对物联网/信息物理系统中持续学习的后门攻击方法，通过形式化威胁模型、分析持续学习放大后门持久性的原因，并评估不同条件下的攻击效果，揭示了保障终身学习安全的关键挑战。

AI中文摘要

物联网（IoT）和信息物理系统（CPS）越来越依赖持续学习（CL）来适应不断变化的环境、设备异构性和概念漂移，从而提高整体效用。虽然持续适应对于数据模式演变的长期IoT部署至关重要，但它也引入了新的安全漏洞。特别是，后门攻击可以利用增量更新、重放缓冲区和表示重用来植入持久的恶意行为，这些行为在正常操作期间保持休眠，但在特定触发器激活时被触发。在本文中，我们提出了一种针对IoT/CPS系统中持续学习的后门攻击。为此，我们形式化了IoT/CPS特定的威胁模型，分析了为什么持续学习会放大IoT流水线中的后门持久性，并在不同条件下评估了我们的技术。我们的分析强调了在IoT/CPS和工业物联网（IIoT）环境中保障终身学习的关键开放挑战，以及加强安全控制的必要性。

英文摘要

Internet of Things (IoT) and cyber-physical systems (CPS) increasingly rely on continual learning (CL) to adapt to evolving environments, device heterogeneity, and concept drift, thereby improving overall utility. While continual adaptation is essential for long-lived IoT deployments where data patterns evolve, it also introduces new security vulnerabilities. In particular, backdoor attacks can exploit incremental updates, replay buffers, and representation reuse to implant persistent malicious behaviors that remain dormant during normal operation but activate upon specific triggers. In this paper, we present a backdoor attack against continual learning in IoT/CPS settings. To this end, we formalize an IoT/CPS-specific threat model, analyze why continual learning amplifies backdoor persistence in IoT pipelines, and evaluate our technique under varying conditions. Our analysis highlights critical open challenges in securing lifelong learning in IoT/CPS and industrial IoT (IIoT) environments, as well as the need for heightened security controls.

2606.15277 2026-06-16 cs.IR cs.AI cs.DB cs.ET cs.LG 交叉投稿

Guiding Federated Graph Recommendation with LLM-encoded knowledge

利用LLM编码知识指导联邦图推荐

Thi Minh Chau Nguyen, Hien Trang Nguyen, Duc Anh Nguyen, Van Ho-Long, Thanh Trung Huynh, Zhao Ren

AI总结针对联邦图推荐中跨客户端图表示对齐难的问题，提出利用大语言模型编码的语义信号指导结构表示的选择性聚合，提升推荐准确性。

Comments Technical Report

AI中文摘要

基于图的推荐系统在从用户-物品交互中提取协同信号方面非常有效，联邦学习（FL）则可以在保护用户隐私的同时训练这些模型。然而，跨分布式、非独立同分布（non-IID）客户端聚合图表示仍然是一个挑战；局部学习的结构嵌入常常不对齐，简单的平均无法捕捉有意义的跨客户端关系。大多数现有的联邦图方法仅依赖结构聚合，忽略了大型语言模型（LLM）中丰富的全局语义上下文。在本文中，我们提出了一种新颖的框架，利用LLM编码的知识来指导联邦图推荐。具体来说，客户端从局部图中学习结构表示，同时通过冻结的LLM将其典型交互模式总结为紧凑的语义向量。中央服务器随后利用这些LLM编码的语义信号发现跨客户端的相关偏好模式，指导其结构表示的选择性聚合。这实现了语义感知的跨客户端协作，而无需暴露原始数据。在标准基准上的大量实验表明，利用LLM编码知识指导结构对齐一致地提高了现有联邦图基线的推荐准确性。

英文摘要

Graph-based recommender systems are highly effective at extracting collaborative signals from user--item interactions, and federated learning (FL) allows these models to be trained while preserving user privacy. However, aggregating graph representations across distributed, non-IID clients remains a challenge; structural embeddings learned locally often misalign, and naive averaging fails to capture meaningful cross-client relationships. Most existing federated graph methods rely exclusively on structural aggregation, neglecting the rich, global semantic context available in large language models (LLMs). In this paper, we propose a novel framework that uses LLM-encoded knowledge to guide federated graph recommendation. Specifically, clients learn structural representations from local graphs while simultaneously summarizing their typical interaction patterns into compact semantic vectors via a frozen LLM. The central server then uses these LLM-encoded semantic signals to discover related preference patterns across clients, guiding the selective aggregation of their structural representations. This enables semantically informed cross-client collaboration without exposing raw data. Extensive experiments on standard benchmarks show that guiding structural alignment with LLM-encoded knowledge consistently improves recommendation accuracy over existing federated graph baselines.


2606.15963 2026-06-16 cs.DC cs.AI cs.CL cs.LG 交叉投稿

PreLort: Prefix-Nested LoRA for Federated Fine-Tuning under Rank Heterogeneity

PreLort: 面向秩异构联邦微调的前缀嵌套LoRA

Muhammad Waseem, Nurbek Tastan, Andrej Jovanovic, Nicholas D. Lane, Nils Lukas, Karthik Nandakumar, Samuel Horvath

发表机构 * MBZUAI, UAE（MBZUAI，阿联酋）； University of Cambridge, UK（剑桥大学，英国）； Flower Labs, UK（Flower Labs，英国）； Michigan State University, USA（密歇根州立大学，美国）

AI总结针对联邦LoRA中异构秩导致的信息分布不均问题，提出PreLort方法，通过前缀层次化嵌套低秩结构、分段聚合规则和前缀嵌套训练策略，使低秩客户端受益于高秩客户端的丰富信息，在准确率和ROUGE-L上优于现有方法。

详情

AI中文摘要

使用LoRA等参数高效方法对大型语言模型进行联邦微调，能够实现基础模型的隐私保护适配。异构硬件资源带来了挑战，因为具有不同适配器秩的客户端无法直接聚合。现有方法虽能实现异构秩下的聚合，但未能控制信息在秩维度上的分布，导致共享低秩表示利用不充分。为此，我们提出PreLort：一种用于联邦LoRA的嵌套低秩公式，将适配器维度组织成前缀层次结构。我们的方法确保较低秩维度编码任务相关信息，而较高秩维度捕获额外容量。基于此，我们引入(i)分段聚合规则，仅对贡献于每个秩分段的客户端进行平均，避免来自零填充低秩客户端的稀释；以及(ii)前缀嵌套训练策略，在多个秩截断下优化每个适配器，鼓励有用信号集中在低秩前缀维度。这些组件共同鼓励一个一致的低秩前缀捕获最任务相关信息，而较高秩维度学习额外容量。这使得低秩客户端能够受益于高秩客户端贡献的更丰富信息，因为前缀维度被一致地学习和聚合。实验表明，我们的方法在准确率和ROUGE-L上持续优于先前的异构联邦LoRA方法，并在多个基础模型上实现了更低或相当困惑度。

英文摘要

Federated fine-tuning of large language models using parameter-efficient methods such as LoRA enables privacy-preserving adaptation of foundation models. Heterogeneous hardware resources introduce challenges, as clients with different adapter ranks cannot be directly aggregated. While existing methods enable aggregation under heterogeneous ranks, they fail to control how information is distributed across rank dimensions, leading to suboptimal use of shared low-rank representations. Instead, we propose PreLort: a nested low-rank formulation for federated LoRA that organizes adapter dimensions into a prefix hierarchy. Our approach ensures that lower-rank dimensions encode task-relevant information, while higher-rank dimensions capture additional capacity. Building on this, we introduce (i) a segment-wise aggregation rule that averages only over clients contributing to each rank segment, avoiding dilution from zero-padded lower-rank clients, and (ii) a prefix-nested training strategy that optimizes each adapter under multiple rank truncations, encouraging useful signal to concentrate in low-rank prefix dimensions. Together, these components encourage a consistent low-rank prefix capturing the most task-relevant information, while higher-rank dimensions learn additional capacity. This allows low-rank clients to benefit from richer information contributed by higher-rank clients, as prefix dimensions are consistently learned and aggregated. Experiments demonstrate that our method consistently outperforms prior heterogeneous federated LoRA methods in accuracy and ROUGE-L, while achieving lower or comparable perplexity across multiple base models.
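The segment-wise aggregation rule described above can be illustrated with a minimal sketch. It assumes each client uploads one adapter factor as a list of rank rows and that averaging is done per rank index; the function names and the toy matrices are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # rows are rank dimensions, columns are features


def segment_wise_average(client_mats: List[Matrix]) -> Matrix:
    """Average each rank row only over the clients that actually hold it."""
    max_rank = max(len(m) for m in client_mats)
    width = len(client_mats[0][0])
    aggregated: Matrix = []
    for j in range(max_rank):
        # Clients whose adapter rank is large enough to contain row j.
        contributors = [m[j] for m in client_mats if len(m) > j]
        aggregated.append(
            [sum(r[c] for r in contributors) / len(contributors) for c in range(width)]
        )
    return aggregated


def zero_padded_average(client_mats: List[Matrix]) -> Matrix:
    """Baseline: treat missing rows as zeros and divide by all clients."""
    max_rank = max(len(m) for m in client_mats)
    width = len(client_mats[0][0])
    n = len(client_mats)
    return [
        [sum(m[j][c] for m in client_mats if len(m) > j) / n for c in range(width)]
        for j in range(max_rank)
    ]


clients = [
    [[1.0, 1.0]],              # rank-1 client
    [[3.0, 3.0], [4.0, 4.0]],  # rank-2 client
]
seg = segment_wise_average(clients)
pad = zero_padded_average(clients)
```

In this toy case the second rank row keeps the value contributed by the only rank-2 client under segment-wise averaging, while the zero-padded baseline halves it. That halving is the dilution effect the abstract refers to.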


2606.16100 2026-06-16 cs.CR cs.CL cs.LG 交叉投稿

Your "Pro" LLM Subscription May Actually Be "Free": Exposing Fingerprint Spoofing Risks in LLM Inference Services

你的“专业”LLM订阅可能实际上是“免费”的：揭示LLM推理服务中的指纹欺骗风险

Jiahao Zhang, Xiuyu Li, Suhang Wang

发表机构 * The Pennsylvania State University（宾夕法尼亚州立大学）

AI总结提出指纹欺骗攻击，恶意服务商通过参数高效微调弱模型模仿强模型，绕过用户指纹验证；理论证明用户资源限制导致指纹易被欺骗，并设计GhostPrint攻击框架，实验表明其能以低成本持续绕过主流指纹方法。

详情

AI中文摘要

随着大型语言模型（LLM）API变得无处不在，用户越来越依赖黑盒指纹识别来验证提供商是否提供广告中宣传的高级模型。然而，这些方法可能忽视那些操纵模型权重以欺骗指纹识别过程的对抗性提供商。我们引入了一种称为指纹欺骗的新威胁，其中恶意提供商隐秘地提供一个通过参数高效微调以模仿更强模型的较弱模型，从而规避用户端的指纹识别。我们首先正式证明用户端资源限制（即有限的查询预算和弱指纹分类器）使得当前的指纹识别容易受到指纹欺骗。在此理论分析指导下，我们提出了GhostPrint，一个利用代理建模、奖励排名微调和知识蒸馏的成本效益攻击框架。在静态和持续指纹识别设置中的广泛评估表明，GhostPrint允许弱模型以低微调成本持续绕过代表性指纹方法，同时保持实用性，暴露了当前LLM指纹识别流程中的一个关键漏洞。

英文摘要

As Large Language Model (LLM) APIs become ubiquitous, users increasingly rely on black-box fingerprinting to verify that providers are serving the advertised premium models. However, these methods may overlook adversarial providers who manipulate model weights to cheat the fingerprint process. We introduce a novel threat termed fingerprint spoofing, where a malicious provider stealthily serves a weaker model that has been parameter-efficiently fine-tuned to mimic a stronger model, thereby evading user-side fingerprinting. We first formally prove that user-side resource constraints (i.e., finite query budgets and weak fingerprinting classifiers) make current fingerprinting vulnerable to fingerprint spoofing. Guided by this theoretical analysis, we propose GhostPrint, a cost-effective attack framework leveraging surrogate modeling, reward-ranked fine-tuning, and knowledge distillation. Extensive evaluations in both static and continual fingerprinting settings demonstrate that GhostPrint allows weak models to consistently bypass representative fingerprint methods while maintaining utility at a low fine-tuning cost, exposing a critical vulnerability in current LLM fingerprinting pipelines.


2606.16180 2026-06-16 cs.CV cs.LG 交叉投稿

To forget is to preserve: Machine Unlearning for 3D medical image segmentation

遗忘即保留：面向3D医学图像分割的机器遗忘

Nitesh Kumar Singh, Akhilesh Singh, Arjun Arora

发表机构 * University of California, San Diego（加州大学圣地亚哥分校）

AI总结针对数据隐私法规，研究基于四种机制的近似遗忘策略在3D医学图像分割中的应用，通过Dice系数和MAE评估，发现噪声标签策略在遗忘集和保留集间取得最佳平衡。

Comments 9 pages, 5 figures

详情

AI中文摘要

随着新的数据隐私法规（如GDPR [1]）允许个人要求从训练好的机器学习模型中删除其任何个人信息，人们开始推动研究从模型中遗忘数据以遵守这些法律。在这方面，基于四种机制，我们考虑了几种应用于MRBrainS18数据集 [2] 的近似遗忘策略。我们使用3D ResNet-50 [3] 作为分割的骨干架构，该架构已通过Med3D框架 [4] 进行预训练。以预训练模型为基线，我们评估了在两类主体（即保留和遗忘）上的相应保留准确率。我们通过Dice相似系数和平均绝对误差（MAE）值评估这些方法，使用两个独立的训练周期（20和50个epoch）。结果表明，噪声标签策略具有最佳的整体权衡，在50个epoch后，遗忘集准确率下降93%，同时保留集准确率保持84%。所有其他策略在更高的epoch数下表现出极端的遗忘水平，同时其保留集性能也出现灾难性退化。本研究结果为在主体特定水平上的遗忘提供了严格的性能指标基线，并为从业者选择适当策略提供了明确标准。

英文摘要

With new data privacy laws such as the General Data Protection Regulation (GDPR) [1] that allow individuals to ask that any of their personal information be erased from trained machine learning models, there has been a push to investigate the unlearning of data from models as a way to comply with these laws. In this regard, based on four mechanics, we consider several approximate unlearning strategies applied to the MRBrainS18 dataset [2]. We use a 3D ResNet-50 [3] as a backbone architecture for segmentation that has been pre-trained with the Med3D framework [4]. Considering the pre-trained model as a baseline, we evaluate respective retention accuracy on 2 types of subjects, i.e., retain and forget. We assess these approaches through their Dice similarity coefficient and mean absolute error (MAE) values using two separate training horizons 20 and 50 epochs. The results show that the Noisy Label strategy had the best overall trade-off with a decrease of 93% in the forget set while maintaining 84% accuracy for the retained set after 50 epochs. All other strategies showed extreme levels of forgetting at higher epoch numbers while also demonstrating catastrophic degradation of their retain set performance. The results of this study provide a strict baseline of performance metrics for unlearning on a subject-specific level and provide practitioners with clear criteria for selecting the proper strategies.
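For readers unfamiliar with the two metrics named above, here is a small sketch of the Dice similarity coefficient and mean absolute error on flattened binary masks. How the authors compute MAE (on masks or on probabilities) is not stated in the abstract, so the mask-based version and the empty-mask convention below are assumptions.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average absolute voxel-wise difference between prediction and target."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Toy 4-voxel example: one voxel overlaps, two voxels disagree.
forget_pred = [1, 1, 0, 0]
forget_true = [1, 0, 1, 0]
dice = dice_coefficient(forget_pred, forget_true)
mae = mean_absolute_error(forget_pred, forget_true)
```

A successful unlearning run in the sense of the abstract would push Dice down on forget-set subjects while keeping it high on retain-set subjects.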


2606.16763 2026-06-16 cs.CR cs.IT cs.LG math.IT 交叉投稿

Cross-Silo De-Anonymization Under Local Differential Privacy: Threat Model, Phase Transition, and Coordination Necessity

跨数据源去匿名化在本地差分隐私下的威胁模型、相变与协调必要性

Ziniu Liu, Aiping Li

AI总结本文提出跨数据源人员级DP（XSP-DP）框架，证明去匿名化在k*=Θ(log n/ε²)处发生相变，并表明跨数据源协调对防御攻击是必要的。

Comments 23 pages, 4 figures

详情

AI中文摘要

当一个人的记录出现在k个独立数据源中，每个数据源受(ε, δ)-差分隐私保护时，标准组合机制为联合输出提供有效的(kε, kδ)-DP保证。然而，这个最坏情况边界并未回答具体的推断问题：在多大的k下，攻击者实际上能识别出目标人物？本文开发了回答该问题所需的信息论框架。我们引入了跨数据源人员级DP（XSP-DP），这是一种Pufferfish风格的隐私概念，其邻接关系同时捕获单个人员在所有数据源中的所有记录，并验证了标准基本组合边界适用于该邻接模型。在此框架内，我们证明去匿名化在k* = Θ(log n / ε²)（总体规模n，每个数据源RR参数ε）处经历相变：Fano下界表明当k << k*时任何估计器都失败，而匹配的最大似然上界表明当k >> k*时攻击成功。一个显式的XOR+随机响应构造展示了信息协同：每个数据源的输出单独对目标无信息，但联合互信息严格为正。对于非协调的二元随机响应机制，我们证明一旦k超过阈值，去匿名化不可避免，从而确立了跨数据源协调的必要性。这些结果为本地DP下的跨数据源推断攻击提供了基线威胁模型和Θ级阈值。

英文摘要

When a person's records appear in k independent data silos, each protected by (epsilon, delta)-differential privacy, standard composition yields a valid (k*epsilon, k*delta)-DP guarantee for the joint output. This worst-case bound, however, does not answer the concrete inference question: at what k can an adversary actually identify a target person? This paper develops the information-theoretic framework needed to answer that question. We introduce cross-silo person-level DP (XSP-DP), a Pufferfish-style privacy notion whose adjacency relation captures all records of a single person across all silos simultaneously, and verify that the standard basic composition bound carries over to this adjacency model. Within this framework we prove that de-anonymization undergoes a phase transition at k* = Theta(log n / epsilon^2) (population size n, per-silo RR parameter epsilon): a Fano lower bound shows any estimator fails for k << k*, while a matching maximum-likelihood upper bound shows the attack succeeds for k >> k*. An explicit XOR + randomized-response construction demonstrates information synergy: each silo's output is individually uninformative about the target, yet the joint mutual information is strictly positive. For non-coordinated binary randomized-response mechanisms, we prove that de-anonymization is inevitable once k exceeds the threshold, establishing that cross-silo coordination is necessary. These results provide a baseline threat model and Theta-level threshold for cross-silo inference attacks under local DP.
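The information-synergy claim can be checked on a toy example. The sketch below is an illustrative two-silo construction, not necessarily the one in the paper: a fair target bit `b` is split into XOR shares `u` and `u ^ b`, and each silo releases its share through binary randomized response with keep probability e^ε / (1 + e^ε). All function names are hypothetical.

```python
import math
from itertools import product


def rr_keep_prob(eps: float) -> float:
    """Probability that binary randomized response reports the true bit."""
    return math.exp(eps) / (1.0 + math.exp(eps))


def joint_distribution(eps: float) -> dict:
    """Exact p(b, y1, y2) for the toy two-silo XOR construction."""
    keep = rr_keep_prob(eps)
    joint = {}
    for b, u, y1, y2 in product((0, 1), repeat=4):
        s1, s2 = u, u ^ b                      # shares held by the two silos
        p = 0.25                               # b and u are independent fair bits
        p *= keep if y1 == s1 else 1.0 - keep  # silo 1 applies RR to its share
        p *= keep if y2 == s2 else 1.0 - keep  # silo 2 applies RR to its share
        joint[(b, y1, y2)] = joint.get((b, y1, y2), 0.0) + p
    return joint


def mutual_information(joint: dict, view) -> float:
    """I(b; view(y1, y2)) in bits; `view` selects what the adversary observes."""
    p_bv, p_b, p_v = {}, {}, {}
    for (b, y1, y2), p in joint.items():
        v = view(y1, y2)
        p_bv[(b, v)] = p_bv.get((b, v), 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
        p_v[v] = p_v.get(v, 0.0) + p
    return sum(
        p * math.log2(p / (p_b[b] * p_v[v])) for (b, v), p in p_bv.items() if p > 0
    )


joint = joint_distribution(eps=1.0)
mi_silo1 = mutual_information(joint, lambda y1, y2: y1)
mi_silo2 = mutual_information(joint, lambda y1, y2: y2)
mi_joint = mutual_information(joint, lambda y1, y2: (y1, y2))
```

Each single output carries no information about `b` because every share is a uniform bit on its own, whereas the pair of outputs has strictly positive mutual information with `b`.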


2407.04884 2026-06-16 cs.LG cs.CR 版本更新

Convex Approximation of Two-Layer ReLU Networks for Hidden State Differential Privacy

两层ReLU网络的凸近似用于隐藏状态差分隐私

Rob Romijnders, Antti Koskela

发表机构 * University of Amsterdam（阿姆斯特丹大学）； Nokia Bell Labs（诺基亚贝尔实验室）

AI总结提出通过ReLU最小化问题的对偶形式的随机近似，将两层ReLU网络转化为强凸问题，从而应用隐藏状态差分隐私分析，实现与DP-SGD相当的隐私-效用权衡。

Comments Errata: correction of Lemma 3.1. Added experiments in Appendix D.1

详情

AI中文摘要

差分隐私的隐藏状态威胁模型假设攻击者只能访问最终训练好的机器学习模型，而无法看到训练过程中的中间状态。然而，当前该模型下的隐私分析仅限于凸优化问题，降低了其对多层神经网络的适用性，而多层神经网络在现代深度学习应用中至关重要。值得注意的是，隐藏状态隐私分析在分类任务中最成功的应用仅限于逻辑回归模型。我们证明，通过ReLU最小化问题的对偶形式的随机近似，可以得到一个强凸问题，从而能够私下训练凸问题，其隐私-效用权衡与使用差分隐私随机梯度下降（DP-SGD）训练的两层ReLU网络相当。这使得现有的隐藏状态隐私分析得以应用，并为使用固定不相交小批量的噪声循环小批量梯度下降（NoisyCGD）方法提供准确的隐私界限。在基准分类任务上的实证结果表明，NoisyCGD可以实现与应用于两层ReLU网络的DP-SGD相当的隐私-效用权衡。

英文摘要

The hidden state threat model of differential privacy (DP) assumes that the adversary has access only to the final trained machine learning (ML) model, without seeing intermediate states during training. However, the current privacy analyses under this model are restricted to convex optimization problems, reducing their applicability to multi-layer neural networks, which are essential in modern deep learning applications. Notably, the most successful applications of the hidden state privacy analyses in classification tasks have only been for logistic regression models. We demonstrate that it is possible to privately train convex problems with privacy-utility trade-offs comparable to those of 2-layer ReLU networks trained with DP stochastic gradient descent (DP-SGD). This is achieved through a stochastic approximation of a dual formulation of the ReLU minimization problem, resulting in a strongly convex problem. This enables the use of existing hidden state privacy analyses and provides accurate privacy bounds also for the noisy cyclic mini-batch gradient descent (NoisyCGD) method with fixed disjoint mini-batches. Empirical results on benchmark classification tasks demonstrate that NoisyCGD can achieve privacy-utility trade-offs on par with DP-SGD applied to 2-layer ReLU networks.


2411.02908 2026-06-16 cs.LG cs.DC 版本更新

Photon: Federated LLM Pre-Training

Photon: 联邦大语言模型预训练

Lorenzo Sani, Alex Iacob, Zeyu Cao, Royson Lee, Bill Marino, Yan Gao, Dongqi Cai, Zexi Li, Wanru Zhao, Xinchi Qiu, Nicholas D. Lane

发表机构 * University of Cambridge（剑桥大学）； Flower Labs（Flower实验室）； Zhejiang University（浙江大学）

AI总结提出Photon系统，首次实现联邦端到端LLM预训练，通过跨孤岛联邦学习在弱连接GPU上训练高达7B参数模型，通信量减少64-512倍，收敛速度比DiLoCo快2倍。

Comments 18 pages, 9 appendix pages, 12 figures, 3 algorithms, 12 tables

详情

Journal ref: Proceedings of Machine Learning and Systems 7 (MLSys 2025), 2025

AI中文摘要

扩展大型语言模型（LLM）需要大量的数据和计算资源，传统上由于分布式训练的高带宽需求，这些资源被限制在数据中心内。低带宽方法如联邦学习（FL）如果能够有效地用于预训练，则可以实现在弱连接GPU上协作训练更大的模型。为此，我们介绍了Photon，这是第一个用于联邦端到端LLM训练的系统，利用跨孤岛FL进行全球规模训练，同时最小化通信开销。使用Photon，我们从零开始训练了第一个联邦式仅解码器LLM系列。我们证明：（1）Photon可以以联邦方式训练高达7B的模型大小，同时达到比集中式预训练更好的困惑度；（2）Photon模型训练时间随可用计算资源减少，实现与集中式类似的计算-时间权衡；（3）Photon通过通信量减少64-512倍，比基线分布式训练方法的实际时间提升35%。我们的方法对数据异质性具有鲁棒性，并且收敛速度是DiLoCo等先前方法的两倍。这种惊人的数据效率源于一种独特的方法，即结合小客户端批量大小和极高的学习率，这得益于联邦平均对超参数的鲁棒性。因此，Photon代表了第一个经济可行的全球互联网范围LLM预训练系统。

英文摘要

Scaling large language models (LLMs) demands extensive data and computing resources, which are traditionally constrained to data centers by the high-bandwidth requirements of distributed training. Low-bandwidth methods like federated learning (FL) could enable collaborative training of larger models across weakly-connected GPUs if they can effectively be used for pre-training. To achieve this, we introduce Photon, the first complete system for federated end-to-end LLM training, leveraging cross-silo FL for global-scale training with minimal communication overheads. Using Photon, we train the first federated family of decoder-only LLMs from scratch. We show that: (1) Photon can train model sizes up to 7B in a federated fashion while reaching an even better perplexity than centralized pre-training; (2) Photon model training time decreases with available compute, achieving a similar compute-time trade-off to centralized; and (3) Photon outperforms the wall-time of baseline distributed training methods by 35% via communicating 64x-512x less. Our proposal is robust to data heterogeneity and converges twice as fast as previous methods like DiLoCo. This surprising data efficiency stems from a unique approach combining small client batch sizes with extremely high learning rates, enabled by federated averaging's robustness to hyperparameters. Photon thus represents the first economical system for global internet-wide LLM pre-training.


2505.19699 2026-06-16 cs.LG cs.AI cs.DC 版本更新

Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments

Mosaic: 面向异构分布式环境的基于混合专家的无数据知识蒸馏

Junming Liu, Yanting Gao, Yuqi Li, Siyuan Meng, Yifei Sun, Aoqi Wu, Yirong Chen, Ding Wang, Shiping Wen

发表机构 * School of Computer Science and Technology, Tongji University（同济大学计算机科学与技术学院）； Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）； The City University of New York（纽约城市大学）； Shenzhen University of Advanced Technology（深圳先进技术大学）

AI总结针对联邦学习中模型与数据异构性问题，提出Mosaic框架，通过本地生成模型合成隐私保护数据，并利用混合专家模型蒸馏全局模型，在图像和多模态基准上超越现有方法。

Comments 23 pages, 5 figures, 24 tables; Accepted by Knowledge-Based Systems, 2026

详情

AI中文摘要

AL-GNN: 基于分析学习的隐私保护且无需重放的持续图学习

Xuling Zhang, Jindong Li, Yifei Zhang, Mingqi Yang, Menglin Yang

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； Northwestern Polytechnical University（西北工业大学）； South China University of Technology（华南理工大学）

AI总结提出AL-GNN框架，利用分析学习理论将持续图学习转化为递归最小二乘优化，通过闭式分类器更新和正则化特征自相关矩阵实现无需反向传播和重放缓冲的高效训练，在保护隐私的同时提升性能并减少遗忘。

详情

AI中文摘要

持续图学习（CGL）旨在使图神经网络能够从图结构数据流中增量学习，而不会遗忘先前获得的知识。现有方法，特别是基于经验重放的方法，通常存储并重新访问过去的图数据以缓解灾难性遗忘。然而，这些方法存在显著局限性，包括隐私问题和低效性。在这项工作中，我们提出了AL-GNN，一种新颖的持续图学习框架，消除了对反向传播和重放缓冲区的需求。相反，AL-GNN利用分析学习理论的原理，将学习形式化为递归最小二乘优化过程。它通过闭式分类器更新和正则化特征自相关矩阵来分析和更新模型知识。这种设计使得每个任务能够进行高效的单次训练，并通过避免存储历史样本固有地保护数据隐私。在多个动态图分类基准上的大量实验表明，AL-GNN取得了与现有方法相比具有竞争力或更优的性能。例如，它在CoraFull上平均性能提高了10%，在Reddit上遗忘减少了30%以上，同时由于其无反向传播的设计，训练时间减少了近50%。

英文摘要

Continual graph learning (CGL) aims to enable graph neural networks to incrementally learn from a stream of graph-structured data without forgetting previously acquired knowledge. Existing methods, particularly those based on experience replay, typically store and revisit past graph data to mitigate catastrophic forgetting. However, these approaches have significant limitations, including privacy concerns and inefficiency. In this work, we propose AL-GNN, a novel framework for continual graph learning that eliminates the need for backpropagation and replay buffers. Instead, AL-GNN leverages principles from analytic learning theory to formulate learning as a recursive least squares optimization process. It maintains and updates model knowledge analytically through closed-form classifier updates and a regularized feature autocorrelation matrix. This design enables efficient one-pass training for each task and inherently preserves data privacy by avoiding historical sample storage. Extensive experiments on multiple dynamic graph classification benchmarks demonstrate that AL-GNN achieves competitive or superior performance compared to existing methods. For instance, it improves average performance by 10% on CoraFull and reduces forgetting by over 30% on Reddit, while also reducing training time by nearly 50% due to its backpropagation-free design.


2601.09304 2026-06-16 cs.LG 版本更新

Single-Round Clustered Federated Learning via Data Collaboration Analysis for Non-IID Data

基于数据协作分析的单轮聚类联邦学习用于非独立同分布数据

Sota Sugawara, Yuji Kawamata, Akihiro Toyoda, Tomoru Nakayama, Yukihiko Okada

发表机构 * Graduate School of Science and Technology, University of Tsukuba（筑波大学理学技术研究生院）； Center for Artificial Intelligence Research, Tsukuba Institute for Advanced Research, University of Tsukuba（筑波大学先进研究所人工智能研究中心）； Institute of Systems and Information Engineering, University of Tsukuba（筑波大学系统与信息工程研究所）

AI总结提出单轮框架DC-CFL，通过数据协作分析中的总变差距离量化客户端相似性，使用层次聚类和协作学习实现聚类与模型训练，在非IID数据上达到与多轮方法相当的精度。

Comments 9 pages, 3 figures

详情

AI中文摘要

联邦学习（FL）允许多个客户端在不共享原始数据的情况下进行分布式学习。当客户端之间的统计异质性严重时，聚类联邦学习（CFL）可以通过对相似客户端进行分组并训练聚类模型来提高性能。然而，大多数CFL方法依赖多轮通信进行聚类估计和模型更新，这在通信轮数严格受限的情况下限制了其实用性。我们提出了基于数据协作的聚类联邦学习（DC-CFL），这是一个单轮框架，仅使用DC分析中共享的信息即可完成客户端聚类和聚类学习。DC-CFL通过标签分布之间的总变差距离量化客户端间相似性，使用层次聚类估计聚类，并通过DC分析进行聚类学习。在代表性非IID条件下的多个开放数据集上的实验表明，DC-CFL在仅需一轮通信的情况下达到了与多轮基线相当的精度。这些结果表明，当多轮通信不可行时，DC-CFL是协作AI模型开发的一种实用替代方案。我们的源代码公开于 https://github.com/souta-suga/DC-CFL。

英文摘要

Federated Learning (FL) enables distributed learning across multiple clients without sharing raw data. When statistical heterogeneity across clients is severe, Clustered Federated Learning (CFL) can improve performance by grouping similar clients and training cluster-wise models. However, most CFL approaches rely on multiple communication rounds for cluster estimation and model updates, which limits their practicality under tight constraints on communication rounds. We propose Data Collaboration-based Clustered Federated Learning (DC-CFL), a single-round framework that completes both client clustering and cluster-wise learning, using only the information shared in DC analysis. DC-CFL quantifies inter-client similarity via total variation distance between label distributions, estimates clusters using hierarchical clustering, and performs cluster-wise learning via DC analysis. Experiments on multiple open datasets under representative non-IID conditions show that DC-CFL achieves accuracy comparable to multi-round baselines while requiring only one communication round. These results indicate that DC-CFL is a practical alternative for collaborative AI model development when multiple communication rounds are impractical. Our source code is publicly available at https://github.com/souta-suga/DC-CFL.
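The clustering step can be sketched as follows: compute pairwise total variation distance between client label distributions, then merge clients agglomeratively. The average-linkage rule, the stopping threshold, and the toy distributions are assumptions made for illustration; the abstract only says that hierarchical clustering is used.

```python
from itertools import combinations
from typing import List


def total_variation(p: List[float], q: List[float]) -> float:
    """TV(p, q) = 0.5 * sum_i |p_i - q_i| for two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def cluster_clients(label_dists: List[List[float]], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering, stopped at a distance threshold."""
    n = len(label_dists)
    dists = [[total_variation(label_dists[i], label_dists[j]) for j in range(n)]
             for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(a: List[int], b: List[int]) -> float:
        return sum(dists[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Four clients with two-class label distributions (toy values).
label_dists = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
clusters = cluster_clients(label_dists, threshold=0.3)
```

Clients 0 and 1 are dominated by the first class and clients 2 and 3 by the second, so the sketch groups them into two clusters. Only label distributions are exchanged here, never raw samples.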


2601.11219 2026-06-16 cs.LG cs.AI 版本更新

SDFLoRA: Selective Decoupled Federated LoRA for Privacy-preserving Fine-tuning with Heterogeneous Clients

SDFLoRA: 面向异构客户隐私保护微调的选择性解耦联邦LoRA

Zhikang Shen, Jianrong Lu, Haiyuan Wan, Jianhai Chen

发表机构 * Zhejiang University（浙江大学）； Tsinghua University（清华大学）

AI总结提出SDFLoRA，通过将LoRA更新解耦为共享和私有组件，仅聚合共享部分并注入差分隐私噪声，解决联邦微调中的秩异构和数据异构问题，提升隐私-效用权衡。

详情

AI中文摘要

联邦学习（FL）用于大型语言模型（LLM）作为在分布式数据上适应模型的隐私保护方法日益受到关注，其中低秩适应（LoRA）等参数高效方法被广泛采用以降低通信和内存成本。然而，实际部署通常表现出秩和数据异构性：客户端在不同的低秩预算和数据分布下运行，使得LoRA更新的直接聚合存在偏差且不稳定。现有方法要么强制统一秩，要么将异构更新对齐到单个共享子空间，这往往会混合可迁移和客户端特定的方向，从而损害个性化。此外，在差分隐私（DP）下，扰动这种结构混合的更新会向本应保持纯局部的方向注入噪声，导致不必要的效用下降。为了解决这些问题，我们提出了选择性解耦联邦LoRA（SDFLoRA），一种结构感知的LoRA框架，将每个客户端更新解耦为用于聚合的共享组件和保留客户端特定语义的私有组件。只有共享组件参与子空间对齐，而私有组件保持本地且不通信，使得训练与DP兼容并在秩异构下稳定聚合。通过仅向聚合的可共享更新注入噪声，该方法避免了对局部方向的扰动，并改善了效用-隐私权衡。在多个基准上的实验表明，SDFLoRA优于联邦LoRA基线，并实现了强大的效用-隐私权衡。

英文摘要

Federated learning (FL) for large language models (LLMs) has attracted increasing attention as a privacy-preserving approach for adapting models over distributed data, where parameter-efficient methods such as Low-Rank Adaptation (LoRA) are widely adopted to reduce communication and memory costs. However, practical deployments often exhibit rank and data heterogeneity: clients operate under different low-rank budgets and data distributions, making direct aggregation of LoRA updates biased and unstable. Existing approaches either enforce a unified rank or align heterogeneous updates into a single shared subspace, which tends to mix transferable and client-specific directions and consequently undermines personalization. Moreover, under differential privacy (DP), perturbing such structurally mixed updates injects noise into directions that should remain purely local, leading to unnecessary utility degradation. To address these issues, we propose Selective Decoupled Federated LoRA (SDFLoRA), a structure-aware LoRA framework that decouples each client update into a shared component for aggregation and a private component that preserves client-specific semantics. Only the shared component participates in subspace alignment, while the private component remains local and uncommunicated, making the training DP-compatible and stabilizing aggregation under rank heterogeneity. By injecting noise only into the aggregated shareable update, this approach avoids perturbations to local directions and improves the utility-privacy trade-off. Experiments on multiple benchmarks demonstrate that SDFLoRA outperforms federated LoRA baselines and achieves a strong utility-privacy trade-off.


2603.12977 2026-06-16 cs.LG 版本更新

Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models

冻结基础模型上岭回归头的精确联邦持续遗忘

Yijun Quan, Wentai Wu, Giovanni Montana

发表机构 * WMG, University of Warwick, Coventry CV4 7AL, UK（沃里克大学WMG学院，英国考文垂CV4 7AL）； Department of Computer Science, Jinan University, Guangzhou 510632, China（暨南大学计算机科学系，中国广州510632）

AI总结针对冻结基础模型+岭回归头的联邦学习场景，提出基于充分统计量的通信协议，实现精确且高效的连续遗忘，支持任意添加和删除请求，保证与集中式重训练等价的确定性。

Comments Accepted to ECML-PKDD 2026

详情

AI中文摘要

基础模型通常被部署为冻结的特征提取器，并附带一个小的可训练头，以适应联邦设置中私有的、用户生成的数据。“被遗忘权”要求按需从训练模型中移除特定样本或用户的影响。现有的联邦遗忘方法针对通用深度模型，依赖于近似重构或选择性重训练，使得精确性代价高昂或难以实现。我们在一个实际相关但未充分探索的机制中研究这个问题：一个带有岭回归头的冻结基础模型。精确最优解仅通过两个加性充分统计量依赖于数据，我们将其转化为一种通信协议，通过固定大小的消息支持任意添加和删除请求流。服务器维护一个在精确算术意义上与每次请求后的集中式重训练逐点相同的头。我们提供了确定性的重训练等价保证、顺序和划分不变性、两种服务器端变体，以及零KL散度的贝叶斯证书。在四个基准上的实验证实了这些保证：两种变体在相对Frobenius误差$10^{-9}$内匹配集中式岭重训练，并且每个请求的完成成本比联邦重训练基线低几个数量级。

英文摘要

Foundation models are commonly deployed as frozen feature extractors with a small trainable head to adapt to private, user-generated data in federated settings. The "right to be forgotten" requires removing the influence of specific samples or users from the trained model on demand. Existing federated unlearning methods target general deep models and rely on approximate reconstruction or selective retraining, making exactness costly or elusive. We study this problem in a practically relevant but under-explored regime: a frozen foundation model with a ridge-regression head. The exact optimum depends on the data only through two additive sufficient statistics, which we turn into a communication protocol supporting an arbitrary stream of add and delete requests via fixed-size messages. The server maintains a head that is, in exact arithmetic, pointwise identical to centralized retraining after every request. We provide deterministic retrain-equivalence guarantees, order and partition invariance, two server-side variants, and a Bayesian certificate of zero KL divergence. Experiments on four benchmarks confirm the guarantees: both variants match centralized ridge retraining to within $10^{-9}$ relative Frobenius error and complete each request at orders-of-magnitude lower cost than federated retraining baselines.
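The role of the two additive sufficient statistics can be seen in a small sketch. It assumes a scalar-output ridge head, so the statistics are the Gram matrix G = Σ φφᵀ and the cross term c = Σ φ·y, with the head obtained from (G + λI) w = c. The class name, the per-sample update granularity, and the tiny solver are illustrative choices, not the paper's protocol.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


class RidgeHeadServer:
    """Keeps G = sum(phi phi^T) and c = sum(phi * y); raw samples are never stored."""

    def __init__(self, dim: int, lam: float) -> None:
        self.lam = lam
        self.gram = [[0.0] * dim for _ in range(dim)]
        self.cross = [0.0] * dim

    def _update(self, phi: Vector, y: float, sign: float) -> None:
        for i, pi in enumerate(phi):
            self.cross[i] += sign * pi * y
            for j, pj in enumerate(phi):
                self.gram[i][j] += sign * pi * pj

    def add(self, phi: Vector, y: float) -> None:
        self._update(phi, y, +1.0)

    def delete(self, phi: Vector, y: float) -> None:
        # Deletion subtracts exactly what the add request contributed.
        self._update(phi, y, -1.0)

    def head(self) -> Vector:
        reg = [[g + (self.lam if i == j else 0.0) for j, g in enumerate(row)]
               for i, row in enumerate(self.gram)]
        return solve(reg, self.cross)


data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 4.0)]
server = RidgeHeadServer(dim=2, lam=1.0)
for phi, y in data:
    server.add(phi, y)
server.delete(*data[2])                      # forget the third sample

retrained = RidgeHeadServer(dim=2, lam=1.0)  # reference: retrain without it
for phi, y in data[:2]:
    retrained.add(phi, y)
```

Because both statistics are sums over samples, subtracting a sample's contribution leaves the same state as never having added it. This is why the unlearned head matches retraining from scratch in this toy case.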


2504.11775 2026-06-16 stat.ML cs.CY cs.LG q-fin.RM 版本更新

Discrimination-free Insurance Pricing with Privatized Sensitive Attributes

基于隐私化敏感属性的无歧视保险定价

Tianhe Zhang, Suhan Liu, Peng Shi

发表机构 * Department of Risk and Insurance, University of Wisconsin-Madison（风险与保险系，威斯康星大学麦迪逊分校）； Department of Statistics and Operations Research, University of North Carolina-Chapel Hill（统计与运筹系，北卡罗来纳大学教堂山分校）

AI总结针对保险公司无法直接获取敏感属性（如性别、种族）的公平定价问题，提出利用隐私化（加噪）敏感属性估计无歧视保费的方法，并建立理论保证与实证验证。

详情

AI中文摘要

公平性已成为保险定价中的重要关注点，因为保险公司越来越依赖机器学习模型来预测预期损失。同时，监管和隐私约束通常限制保险公司访问或使用敏感属性（如性别或种族）。最近的精算研究通过无歧视保费的概念来解决这一背景下的公平性问题，该概念消除了敏感属性的直接和间接影响，同时保持精算一致性。然而，实施这种方法通常需要访问敏感属性本身，而在实践中可能无法获得。本文研究了当敏感属性仅以隐私化或噪声扰动形式被观测时，无歧视保险保费的估计问题。我们考虑一个多方数据设置，其中保险公司观测非敏感属性和结果，而一个可信第三方持有通过隐私机制生成的隐私化敏感属性。在此框架内，我们开发了仅使用隐私化属性估计无歧视保费的统计方法。我们研究了两种实际相关的情况：隐私机制已知和其噪声水平未知。对于这两种情况，我们为所提出的估计量建立了理论保证。数值实验和实证应用表明，所提出的方法能够在尊重隐私和监管约束的同时实现公平的保险定价。

英文摘要

Fairness has become an important concern in insurance pricing as insurers increasingly rely on machine learning models to predict expected losses. At the same time, regulatory and privacy constraints often restrict insurers' ability to access or use sensitive attributes such as gender or race. Recent actuarial research addresses fairness in this context through the concept of the discrimination-free premium, which removes both the direct and indirect effects of sensitive attributes while preserving actuarial consistency. However, implementing this approach typically requires access to the sensitive attributes themselves, which may not be available in practice. This paper studies the estimation of discrimination-free insurance premiums when sensitive attributes are observed only in privatized or noise-perturbed form. We consider a multi-party data setting in which insurers observe non-sensitive attributes and outcomes, while a trusted third party holds privatized sensitive attributes generated through a privacy mechanism. Within this framework, we develop statistical methods for estimating discrimination-free premiums using only the privatized attributes. We study two settings of practical relevance: when the privacy mechanism is known and when its noise level is unknown. For both cases, we establish theoretical guarantees for the proposed estimators. Numerical experiments and empirical applications demonstrate that the proposed approach enables fair insurance pricing while respecting privacy and regulatory constraints.


2605.26595 2026-06-16 cs.CR cs.AI cs.LG 版本更新

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Cordyceps: 通过数据投毒对LLM的隐蔽控制攻击

Zedian Shao, Charles Fleming, Teodora Baluta

发表机构 * Georgia Institute of Technology（佐治亚理工学院）； Cisco Systems（思科系统）

AI总结提出一种数据投毒方法，通过语义关联教LLM隐藏任意恶意指令，实现隐蔽控制攻击，绕过多种防御。

Comments USENIX Security '26

详情

AI中文摘要

大型语言模型（LLM）通常在没有经过精心筛选的文本数据集上进行微调，而对手可以对这些数据集进行投毒。现有的投毒攻击主要依赖于固定的触发短语，而异常检测、干净数据正则化或在线监控等防御措施可以中和这些触发短语。在本文中，我们提出了一种数据投毒方法，通过共享知识（如事实或概念）与攻击者选择的短语之间的语义关联，可靠且隐蔽地教LLM一种信息隐藏方案。诱导的隐藏方案可以编码和解码任意恶意指令，从而揭示了一种新的、微妙的投毒诱导漏洞：隐蔽控制攻击。我们精确描述了隐蔽控制攻击，并在5个LLM、3个后门防御和4个提示注入防御上进行了评估。在少量投毒样本的情况下，隐蔽控制攻击在平均攻击成功率上比基于启发式的提示注入攻击高出约40%（相对于干净微调模型）。它们还绕过了基于检测和微调的防御，在后门防御后保持高达93%的攻击成功率，在提示注入防御后保持高达98%的攻击成功率。

英文摘要

Large language models (LLMs) are often fine-tuned on uncurated text datasets that adversaries can poison. Existing poisoning attacks primarily rely on fixed trigger phrases that defenses such as outlier detection, clean-data regularization, or online monitoring can neutralize. In this paper, we propose a data poisoning method that teaches an LLM an information hiding scheme reliably and stealthily through semantic associations between shared knowledge such as facts or concepts and attacker-chosen phrases. The induced hiding scheme can encode and decode arbitrary malicious instructions, thus revealing a new and subtle poisoning-induced vulnerability: covert control attacks. We precisely characterize covert control attacks and evaluate them across 5 LLMs, 3 backdoor defenses, and 4 prompt injection defenses. With a small poisoned fraction, covert control attacks outperform heuristic-based prompt injection attacks in average attack success rate by about 40% relative to clean fine-tuned models. They also circumvent defenses based on detection and fine-tuning, maintaining up to 93% attack success rate after backdoor defenses and up to 98% after prompt injection defenses.


2606.14865 2026-06-16 cs.LG cs.AI 新提交

GRAPE: Guided Parameter-Space Evolution for Compact Adversarial Robustness

GRAPE: 面向紧凑对抗鲁棒性的引导式参数空间演化

Zhiyuan Ye, Xiangyu Zhou, Ji Qi, Hao Zhang, Yi Zhou

发表机构 * University of Science and Technology of China（中国科学技术大学）； China Mobile (Suzhou) Software Technology Co., Ltd.（中移（苏州）软件技术有限公司）

AI总结提出GRAPE框架，通过逐步暴露参数空间并利用对抗谱利用分数引导容量分配，在固定计算预算下提升紧凑模型的对抗鲁棒性，在CIFAR-10上以1.009倍FLOPs将PGD-20鲁棒准确率从51.70%提升至56.94%，参数减少21.4%。

详情

AI中文摘要

对抗训练（AT）提高了神经网络的鲁棒性，但大多数方法从一开始就训练固定的参数空间。本文探讨了参数变得可优化的顺序是否会影响最终的鲁棒解，即使最终架构或计算预算被控制。我们提出了GRAPE（引导式参数空间演化），一种面向紧凑对抗鲁棒性的训练框架。GRAPE结合了参数空间稳定化与渐进式隐藏扩展：它在当前暴露空间中稳定鲁棒优化，逐步释放新的可优化维度，并使用对抗谱利用分数引导新释放的容量流向高压模块。与固定结构的AT相比，GRAPE将鲁棒模型学习视为一个渐进式参数空间暴露和演化的过程。在CIFAR-10上的标准$\ell_\infty$威胁模型下，以固定结构ResNet-18 AT作为对照参考，GRAPE在几乎匹配的计算预算下（FLOPs比率为1.009倍）将PGD-20鲁棒准确率从51.70%提升至56.94%，同时参数数量减少约21.4%。一个具有相同最终ResNet-18架构的序列增长变体达到了56.52%的PGD-20鲁棒准确率，表明增益不仅来自最终架构差异，还来自参数空间暴露路径。这些结果表明，引导式参数空间演化可以在匹配计算条件下产生紧凑且鲁棒的参数配置。

英文摘要

Adversarial Training (AT) improves neural network robustness, but most methods train a fixed parameter space from the start. This paper asks whether the order in which parameters become optimizable can affect the final robust solution, even when the final architecture or computation budget is controlled. We propose GRAPE, Guided Parameter-Space Evolution, a training framework for compact adversarial robustness. GRAPE combines parameter-space stabilization with progressive hidden expansion: it stabilizes robust optimization in the currently exposed space, gradually releases new optimizable dimensions, and uses an adversarial spectral utilization score to guide newly released capacity toward high-pressure modules. In contrast to fixed-structure AT, GRAPE treats robust model learning as a process of progressive parameter-space exposure and evolution. Under the standard $\ell_\infty$ threat model on CIFAR-10, with fixed-structure ResNet-18 AT as a controlled reference, GRAPE improves PGD-20 robust accuracy from 51.70% to 56.94% at a nearly matched computation budget with a FLOPs ratio of 1.009x, while reducing parameter count by about 21.4%. A sequential grow variant with the same final ResNet-18 architecture reaches 56.52% PGD-20 robust accuracy, indicating that the gain is not only due to final architecture differences but also to the parameter-space exposure path. These results suggest that guided parameter-space evolution can yield compact and robust parameter configurations under matched computation.


2606.15127 2026-06-16 cs.LG 新提交

Beyond Accuracy: Measuring Bias Acknowledgment in Chain-of-Thought Reasoning for Responsible AI Evaluation

超越准确率：在负责任AI评估中衡量思维链推理中的偏见承认

Xian Sun, Wei Gao, Yingshuo Wang, Lingdong Kong, Yanhang Li, Zhichao Fan, Zexin Zhuang, Wenlong Dong, Zhiyuan Zheng, Hrishikesh Paranjape, Abhishek Mandal, Johnny R. Zhang

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结针对仅用准确率评估忽略推理链中偏见承认的问题，提出包含易感性（susceptibility）和承认（acknowledgment）两个维度的诊断方法，实验发现不同模型在准确率相近时承认率差异显著。

Comments ICML 2026 Workshop on Trustworthy AI for Good

详情

AI中文摘要

推理模型越来越多地用于最终答案并非唯一审查对象的场景：教育工具可能向学生展示中间步骤，决策支持系统可能需要人工监督，审计工作流可能检查痕迹是否存在误导性或偏见输入。在这些场景中，两个响应可能获得相同的最终答案分数，但在痕迹是否明确标记注入的偏见内容方面存在差异。仅用准确率评估会忽略这些情况。我们将这一差距视为负责任评估中的测量盲点，并引入一个最小痕迹级诊断，包含两个维度：易感性（偏见是否破坏先前正确的答案）和承认（痕迹是否包含由规则定义的提及注入内容的表面引用）。在数千个有偏见的GSM8K试验中，GPT-4o和Claude Sonnet 4的易感性率相似（1.3% vs. 1.2%），但在相同规则下承认率差异显著（13.0% vs. 75.0%）。

英文摘要

Reasoning models are increasingly used in settings where the final answer is not the only object of review: educational tools may show students intermediate steps, decision-support systems may require human oversight, and audit workflows may inspect traces for misleading or biased input. In such settings, two responses can receive the same final-answer score while differing in whether the trace explicitly flags injected biasing content. Accuracy-only evaluation collapses these cases. We study this gap as a measurement blind spot for responsible evaluation and introduce a minimal trace-level diagnostic with two axes: susceptibility (whether the bias breaks a previously correct answer) and acknowledgment (whether the trace contains a rubric-defined surface reference to the injected content). Across thousands of biased GSM8K trials, GPT-4o and Claude Sonnet 4 have similar susceptibility rates (1.3% vs. 1.2%) but substantially different acknowledgment rates (13.0% vs. 75.0%) under the same rubric.
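
The two axes described in this abstract can be sketched as simple rates over trial records. This is only an illustration: the `Trial` record, the phrase list in `RUBRIC_PHRASES`, and the choice of denominators are assumptions made here, not the paper's rubric.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    baseline_correct: bool  # answer was correct without the injected bias
    biased_correct: bool    # answer was correct with the injected bias
    trace: str              # reasoning trace from the biased run


# Hypothetical rubric: surface phrases treated as references to the injected content.
RUBRIC_PHRASES = ("the hint", "the suggested answer", "was told")


def susceptibility_rate(trials):
    """Share of previously correct answers that the bias breaks."""
    eligible = [t for t in trials if t.baseline_correct]
    if not eligible:
        return 0.0
    return sum(not t.biased_correct for t in eligible) / len(eligible)


def acknowledgment_rate(trials):
    """Share of traces that contain at least one rubric phrase."""
    if not trials:
        return 0.0
    hits = sum(any(p in t.trace.lower() for p in RUBRIC_PHRASES) for t in trials)
    return hits / len(trials)
```

Two models can score the same on `susceptibility_rate` while differing widely on `acknowledgment_rate`, which is the kind of gap the abstract reports.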

2606.15153 2026-06-16 cs.LG 新提交

False Sense of Safety in Selective Signal Classification: Auditing Bound Tightness and Exchangeability for Risk Control

选择性信号分类中的虚假安全感：风险控制的边界紧致性与可交换性审计

Jingwen Zhou, Mingzhe Wang

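AI summary: Audits bound tightness and the exchangeability assumption for selective prediction under distribution-free risk control. Empirical thresholding often exceeds its budget; certified methods are valid under exchangeable splits but overrun under grouped deployment, where exchangeability fails.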

AI中文摘要

具有分布自由风险控制的选择性预测承诺：在标定样本上以置信度1-delta，接受输入的误差率保持在用户预算alpha以下。我们在信号域检测器（机器异常声音检测（ASD）和AI生成图像取证）上审计了这一承诺，针对四种标定规则：未经认证的经验阈值法（NAIVE）以及认证的Hoeffding、Clopper-Pearson（CP）和赌博（WSR）上置信界。我们报告三个发现。（i）实践中常见的NAIVE阈值法在49-73%的合成试验（n=200个标定点）和高达68%的真实数据分割中超出其声明预算：这是一种虚假的安全感，而非定理被破坏，因为该规则从未有证书。（ii）紧致性重要：CP和WSR在Hoeffding未认证的地方认证了显著覆盖，且在可交换分割下观察到零预算超限。（iii）在分组部署（未见过的机器类型或生成器）下，认证规则在9-30%的试验中超限——远高于delta——表明失败在于可交换性前提被破坏，而非边界本身；保守的逐组阈值以严重的覆盖代价恢复了有效性。

英文摘要

Selective prediction with distribution-free risk control promises that, with confidence 1-delta over the calibration draw, the error rate of accepted inputs stays below a user budget alpha. We audit this promise on signal-domain detectors -- machine anomalous-sound detection (ASD) and AI-generated-image forensics -- for four calibration rules: uncertified empirical thresholding (NAIVE) and certified Hoeffding, Clopper-Pearson (CP), and betting (WSR) upper confidence bounds. We report three findings. (i) NAIVE thresholding, common in practice, exceeds its declared budget in 49-73% of synthetic trials (n=200 calibration points) and in up to 68% of real-data splits: a false sense of safety rather than a broken theorem, since the rule never had a certificate. (ii) Tightness matters: CP and WSR certify substantial coverage where Hoeffding certifies none, with zero observed budget overruns under exchangeable splits. (iii) Under grouped deployment (unseen machine types or generators), certified rules overrun in 9-30% of trials -- far above delta -- showing the failure lies in the broken exchangeability premise, not in the bounds; a conservative per-group threshold restores validity at a severe coverage cost.
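
To make the tightness point concrete, the sketch below compares a Hoeffding upper confidence bound with an exact Clopper-Pearson bound for the error rate of accepted inputs. The numbers (10 errors among 200 accepted calibration points, delta = 0.1, budget alpha = 0.1) are illustrative assumptions, not values from the paper, and the Clopper-Pearson bound is found by bisection on the binomial CDF so that only the standard library is needed.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def hoeffding_ucb(k, n, delta):
    """Hoeffding upper confidence bound on the true error rate."""
    return min(1.0, k / n + math.sqrt(math.log(1 / delta) / (2 * n)))


def clopper_pearson_ucb(k, n, delta, tol=1e-10):
    """One-sided Clopper-Pearson bound: the p at which P(X <= k) equals delta."""
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The CDF decreases in p, so a value above delta means the root lies higher.
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    k, n, delta, alpha = 10, 200, 0.1, 0.1  # illustrative values only
    print("Hoeffding:", hoeffding_ucb(k, n, delta), "certifies:", hoeffding_ucb(k, n, delta) <= alpha)
    print("Clopper-Pearson:", clopper_pearson_ucb(k, n, delta), "certifies:", clopper_pearson_ucb(k, n, delta) <= alpha)
```

Under these assumed numbers the Clopper-Pearson bound falls below the budget while the Hoeffding bound does not, which mirrors finding (ii) in spirit. Neither bound says anything once calibration and deployment data stop being exchangeable.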

2606.15479 2026-06-16 cs.LG cs.AI math.PR 新提交

Bayesian 3D Steerable CNNs: Enabling Equivariance and Uncertainty Quantification Simultaneously

贝叶斯3D可转向CNN：同时实现等变性和不确定性量化

Abhishek Keripale, Ponkrshnan Thiagarajan, Susanta Ghosh

发表机构 * Michigan Technological University（密歇根理工大学）； Johns Hopkins University（约翰霍普金斯大学）； The Center for Artificial Intelligence at the Institute of Computing and Cybersystems, Michigan Technological University（密歇根理工大学计算与网络系统研究所人工智能中心）

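AI summary: Proposes a Bayesian Steerable-CNN that makes kernels stochastic through posterior distributions while preserving SE(3)-equivariance and decomposes predictive uncertainty. It attains competitive classification accuracy and outperforms its deterministic counterpart under distribution shift.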

AI中文摘要

可转向卷积神经网络（Steerable-CNNs）通过将核参数化为可转向基函数的线性组合来保证SE(3)-等变性，但其确定性本质阻碍了不确定性量化——限制了其在需要置信度估计的场景中的应用。我们提出一种贝叶斯可转向CNN，将后验分布置于基系数上，从而在精确保持等变性的同时产生随机核。模型的损失函数通过变分推断获得，并通过贝叶斯反向传播最小化。该框架将预测不确定性分解为认知不确定性和偶然不确定性。实验上，该模型在取得竞争性分类精度的同时，预期校准误差为0.0263，并且在加性高斯噪声引起的分布偏移下，其性能比确定性对应模型高出最多6.17%。此外，我们利用模型的不确定性估计显著提升其性能，在测试数据集的84%上实现了约4%的准确率提升。认知不确定性与预测误差之间统计显著的负相关性表明，学习到的后验方差具有语义意义。该框架将贝叶斯不确定性量化与等变CNN的归纳偏置统一起来。

英文摘要

Steerable convolutional neural networks (Steerable-CNNs) guarantee SE(3)-equivariance by parameterizing kernels as linear combinations of steerable basis functions, but their deterministic nature precludes uncertainty quantification, limiting their use in settings where confidence estimates are essential. We propose a Bayesian Steerable-CNN that places posterior distributions over the basis coefficients, yielding stochastic kernels while preserving equivariance exactly. The loss function of the model is obtained via variational inference and minimized by Bayes-by-Backpropagation. The framework admits a decomposition of predictive uncertainty into epistemic and aleatoric components. Empirically, the model attains competitive classification accuracy alongside an expected calibration error of 0.0263 and outperforms its deterministic counterpart by up to 6.17% under distributional shift induced by additive Gaussian noise. Furthermore, we leverage the model's uncertainty estimates to improve its performance, achieving approximately 4% higher accuracy on 84% of the test dataset. A statistically significant negative correlation between epistemic uncertainty and prediction error confirms that the learned posterior variance is semantically meaningful. The framework unifies Bayesian uncertainty quantification with the inductive bias of equivariant CNNs.
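
One common way to split predictive uncertainty into epistemic and aleatoric parts uses entropies over Monte Carlo samples from the weight posterior. The abstract does not say which decomposition the authors use, so the sketch below is an assumed, generic version: total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average per-sample entropy, and the epistemic part is their difference.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def decompose(mc_probs):
    """Split predictive uncertainty for one input.

    mc_probs: list of class-probability vectors, one per sampled set of weights.
    Returns (total, aleatoric, epistemic).
    """
    n = len(mc_probs)
    k = len(mc_probs[0])
    mean_p = [sum(p[c] for p in mc_probs) / n for c in range(k)]
    total = entropy(mean_p)                            # entropy of the averaged prediction
    aleatoric = sum(entropy(p) for p in mc_probs) / n  # average entropy of each sample
    epistemic = total - aleatoric                      # disagreement between samples
    return total, aleatoric, epistemic
```

When every sampled network predicts the same ambiguous distribution, the uncertainty is all aleatoric; when the samples are individually confident but disagree, it is all epistemic.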

2606.15493 2026-06-16 cs.LG cs.CR 新提交

Model Stealing Through the Lens of Model Multiplicity

从模型多重性视角看模型窃取

Eliott Baltz, Satoshi Hara, Ulrich Aïvodji

发表机构 * ÉTS, Mila（蒙特利尔高等技术学院，Mila）； The University of Electro-Communications（电气通信大学）

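AI summary: By computing the Rashomon set of surrogate models and evaluating its diversity, the paper finds that high-fidelity surrogates can differ significantly from the target model on key performance metrics, challenging conventional wisdom.

Comments 14 pages, 15 figures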

AI中文摘要

模型窃取攻击中，对手创建高保真替代模型，对机器学习服务的知识产权构成重大威胁。传统观点认为这些替代模型能为对手提供与原始服务提供商相当的经济杠杆。本文通过评估模型窃取攻击超越单纯对目标模型的保真度来挑战这一假设。由于基于查询的提取仅提供目标输入输出行为的部分监督，替代模型并非唯一确定：许多接近最优的替代模型可以在实现相当保真度的同时，在部署相关属性上存在差异。我们不执行经典的基于学习的模型窃取攻击，而是计算替代模型的Rashomon集（即几乎同等准确的模型集合），并使用多重性指标（歧义性、差异性和Rashomon容量）和群体公平性指标评估其多样性。在表格、医学影像和NLP任务中，我们在真实数据集上的实验表明，尽管替代模型与目标模型表现出相似的保真度，但在其他关键性能指标上可能显示出显著差异。这些发现对高保真替代模型与实际部署场景中目标模型之间的假定等价性提出了质疑。

英文摘要

Model stealing attacks, where adversaries create high-fidelity surrogate models, are a significant threat to the intellectual property of machine learning services. Conventional wisdom suggests these surrogates could provide adversaries with economic leverage comparable to the original service providers. This paper challenges this assumption by evaluating model stealing attacks beyond mere fidelity to the target model. Because query-based extraction provides only partial supervision of the target's input-output behavior, the surrogate is not uniquely identified: many near-optimal surrogates can achieve comparable fidelity while differing in deployment-relevant properties. Instead of performing a classic learning-based model stealing attack, we compute the Rashomon Set (i.e., the set of almost-equally-accurate models) of surrogate models, and evaluate its diversity using multiplicity metrics (ambiguity, discrepancy, and Rashomon Capacity) and group fairness metrics. Across tabular, medical imaging, and NLP tasks, our experiments on real-world datasets reveal that despite exhibiting similar fidelity to the target model, surrogate models can display significant variances in other critical performance metrics. These findings cast doubt on the presumed equivalence between high-fidelity surrogates and the target model in practical deployment scenarios.
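
Two of the multiplicity metrics named above, ambiguity and discrepancy, can be sketched from hard predictions alone. The definitions below follow the usual predictive-multiplicity formulation. Which model serves as the baseline (the target or a reference surrogate) is an assumption here, and Rashomon Capacity is omitted because it needs probabilistic outputs.

```python
def ambiguity(baseline, competitors):
    """Fraction of inputs whose label is changed by at least one competing model."""
    n = len(baseline)
    flipped = sum(any(m[i] != baseline[i] for m in competitors) for i in range(n))
    return flipped / n


def discrepancy(baseline, competitors):
    """Largest fraction of inputs on which a single competing model disagrees."""
    n = len(baseline)
    return max(
        (sum(m[i] != baseline[i] for i in range(n)) / n for m in competitors),
        default=0.0,
    )


if __name__ == "__main__":
    baseline = [0, 1, 1, 0]                      # predictions of the reference model
    rashomon_set = [[0, 1, 0, 0], [1, 1, 1, 0]]  # predictions of near-optimal surrogates
    print(ambiguity(baseline, rashomon_set), discrepancy(baseline, rashomon_set))
```

Surrogates with near-identical fidelity can still produce a high ambiguity, since each one may disagree with the baseline on a different subset of inputs.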

2606.15730 2026-06-16 cs.LG cs.AI 新提交

InstantForget: Update-Free Backdoor Unlearning with Inference-Time Feature Reset

InstantForget: 无需更新的后门遗忘与推理时特征重置

Zhenyu Yu

发表机构 * College of Computer Science and Artificial Intelligence, Fudan University（复旦大学计算机科学与人工智能学院）

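AI summary: Proposes InstantForget, which performs backdoor unlearning without parameter updates via an inference-time feature reset: a Mahalanobis score flags anomalous features, which are reset toward a neutral representation, reducing average ASR on CIFAR-10 to 0.071.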

AI中文摘要

后门遗忘旨在从部署模型中移除恶意触发行为，同时保持清洁效用。我们研究了无需更新的推理时设置，其中模型参数保持冻结。首先，我们在oracle配对的清洁和触发特征下审计了一个常见的投影假设。投影主要对BadNets成功，而在CIFAR-10 ResNet-18上对WaNet、Blended和SIG的ASR分别为0.683、0.888和0.941。这种失败不能由谱紧凑性、空间局部性或子空间错位解释，而是由涉及目标边际、目标logit下降和非目标logit上升的logit三元组差距预测。然后我们引入了InstantForget，一种清洁校准的门控重置，通过马氏距离标记异常特征，并仅将标记的特征移向中性的非目标表示。在保留的触发验证集上选择一个固定操作点后，InstantForget在部署时无需触发样本或参数更新，将CIFAR-10上四种非自适应触发的平均ASR降至0.071。它还达到了0.981的检测AUROC，并迁移到八个测试骨干中的六个。报告的在WaNet、ModelNet10点混合、两种骨干几何和自适应特征紧凑性攻击下的失败定义了该方法的适用范围。

英文摘要

Backdoor unlearning aims to remove a malicious trigger behavior from a deployed model while preserving clean utility. We study the update-free inference-time setting, where model parameters remain frozen. First, we audit a common projection assumption under oracle paired clean and triggered features. Projection succeeds mainly on BadNets and leaves WaNet, Blended, and SIG at 0.683, 0.888, and 0.941 ASR on CIFAR-10 ResNet-18. This failure is not explained by spectral compactness, spatial locality, or subspace misalignment. It is predicted by a logit-triplet gap involving the target margin, target-logit drop, and non-target logit rise. We then introduce InstantForget, a clean-calibrated gated reset that flags anomalous features with a Mahalanobis score and moves only flagged features toward a neutral non-target representation. With one fixed operating point selected on held-out triggered validation, InstantForget reduces average ASR to 0.071 across four non-adaptive CIFAR-10 triggers without triggered samples or parameter updates at deployment. It also reaches 0.981 detection AUROC and transfers to six of eight tested backbones. Reported failures under WaNet, ModelNet10 point blend, two backbone geometries, and adaptive feature-compactness attacks define the method's scope.
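
The gated reset can be pictured with a small sketch. It is not the authors' implementation: it assumes a diagonal covariance (a simplification of the full Mahalanobis score), and `neutral`, `threshold`, and `strength` are hypothetical inputs that stand in for the clean-calibrated quantities the abstract mentions.

```python
import math


def fit_diagonal_gaussian(clean_features):
    """Per-dimension mean and variance estimated from clean calibration features."""
    n = len(clean_features)
    d = len(clean_features[0])
    mean = [sum(f[j] for f in clean_features) / n for j in range(d)]
    var = [sum((f[j] - mean[j]) ** 2 for f in clean_features) / n + 1e-6 for j in range(d)]
    return mean, var


def mahalanobis(x, mean, var):
    """Mahalanobis distance under the diagonal-covariance simplification."""
    return math.sqrt(sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, mean, var)))


def gated_reset(x, mean, var, neutral, threshold, strength=1.0):
    """Leave unflagged features untouched; move flagged ones toward `neutral`."""
    if mahalanobis(x, mean, var) <= threshold:
        return list(x)
    return [xj + strength * (nj - xj) for xj, nj in zip(x, neutral)]
```

Because only flagged features are modified and no weights change, clean inputs below the threshold pass through exactly as before.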

2606.15767 2026-06-16 cs.LG cs.AI 新提交

Visualizing Uncertainty: Spatial Maps of Missing and Conflicting Evidence in Deep Learning

可视化不确定性：深度学习中缺失与冲突证据的空间图

Dong Hyun Jeong, Feng Chen, Jin-Hee Cho, Lance M. Kaplan, Audun Jøsang, Soo-Yeon Ji

发表机构 * University of the District of Columbia（哥伦比亚特区大学）； University of Texas at Dallas（德克萨斯大学达拉斯分校）； Virginia Tech（弗吉尼亚理工大学）； U.S. Army DEVCOM Army Research Laboratory（美国陆军DEVCOM陆军研究实验室）； University of Oslo（奥斯陆大学）； Bowie State University（鲍伊州立大学）

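AI summary: Proposes the Uncertainty Activation Map (UAM) framework, which combines Evidential Deep Learning with Full-Gradient Class Activation Mapping to generate spatial uncertainty activation maps that distinguish vacuity (lack of evidence) from dissonance (conflicting hypotheses), bridging the gap between uncertainty quantification and explainability.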

AI中文摘要

理解深度神经网络何时以及为何不确定对于在安全关键领域部署可靠的机器学习系统至关重要。虽然现有的不确定性量化方法提供了模型置信度的标量度量，但它们对输入的哪些空间区域导致不同类型的不确定性提供的洞察有限。我们提出了一种新颖的可视化框架——不确定性激活图（UAM），它将证据深度学习（EDL）与全梯度类激活映射（FullGrad）相结合，生成可解释的空间不确定性激活图。我们的方法区分了两种基本的不确定性类型：空虚（代表缺乏证据）和不和谐（捕捉竞争假设之间的冲突证据）。通过利用FullGrad的完整梯度分解特性和主观逻辑的原则性不确定性量化，我们的方法产生了理论上合理的可视化，突出显示了导致模型不确定性的特定图像区域。利用该框架，通过计算信念加权属性生成空虚和不和谐激活图，从而能够识别模型缺乏知识的区域与遇到模糊证据的区域。在多个基准数据集上的广泛评估表明，所提出的框架有效地解决了不确定性量化与可解释性之间的关键差距，为评估复杂视觉识别任务中的模型可靠性提供了直观的视觉反馈。

英文摘要

Understanding when and why deep neural networks are uncertain is crucial for deploying reliable machine learning systems in safety-critical domains. While existing uncertainty quantification methods provide scalar measures of model confidence, they offer limited insight into which spatial regions of an input contribute to different types of uncertainty. We propose a novel visualization framework, Uncertainty Activation Map (UAM), that combines Evidential Deep Learning (EDL) with Full-Gradient Class Activation Mapping (FullGrad) to generate interpretable spatial uncertainty activation maps. Our approach distinguishes between two fundamental types of uncertainty: vacuity, representing lack of evidence, and dissonance, capturing conflicting evidence between competing hypotheses. By leveraging the complete gradient decomposition property of FullGrad and the principled uncertainty quantification of Subjective Logic, our method produces theoretically grounded visualizations that highlight specific image regions responsible for model uncertainty. With this framework, vacuity and dissonance activation maps are generated by computing belief-weighted attributions, enabling identification of where models lack knowledge versus where they encounter ambiguous evidence. Extensive evaluations across multiple benchmark datasets demonstrate that the proposed framework effectively addresses the critical gap between uncertainty quantification and explainability, providing intuitive visual feedback to assess model reliability in complex visual recognition tasks.
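
For readers unfamiliar with the two uncertainty types, the sketch below computes the scalar vacuity and dissonance of a single prediction from class evidence, using the standard Subjective Logic definitions with Dirichlet parameters alpha_k = e_k + 1. It covers only the scalar quantities. The spatial attribution step with FullGrad is not shown, and how the paper combines the two is not specified in the abstract.

```python
def beliefs_and_vacuity(evidence):
    """Belief masses b_k = e_k / S and vacuity u = K / S, with S = sum(e_k) + K."""
    k = len(evidence)
    s = sum(evidence) + k
    return [e / s for e in evidence], k / s


def balance(bj, bk):
    """Relative balance of two belief masses; zero if either mass is zero."""
    if bj == 0.0 or bk == 0.0:
        return 0.0
    return 1.0 - abs(bj - bk) / (bj + bk)


def dissonance(beliefs):
    """Belief-weighted conflict between each class and its competitors."""
    total = 0.0
    for k, bk in enumerate(beliefs):
        others = [bj for j, bj in enumerate(beliefs) if j != k]
        denom = sum(others)
        if denom > 0.0:
            total += bk * sum(bj * balance(bj, bk) for bj in others) / denom
    return total
```

With no evidence at all the vacuity is 1 and the dissonance is 0, whereas strong but evenly split evidence for two classes gives low vacuity and high dissonance.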

2606.15980 2026-06-16 cs.LG cs.AI cs.CL 新提交

Do Safety Monitors Stay Reliable After an Update? Benchmarking and Predicting Activation-Monitor Staleness

安全监控器在更新后是否仍可靠？激活监控器陈旧性的基准测试与预测

Evan Duan

发表机构 * University of Michigan（密歇根大学）

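AI summary: Studies whether activation monitors remain reliable after a language model is updated. Quantization-style updates have little effect, fine-tuning-style updates often make monitors stale, and degradation can be predicted from pre-deployment features.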

AI中文摘要

激活监控器——在语言模型内部表示上训练的轻量级探针——在部署安全栈中越来越常见。然而，部署的模型很少是静态的：它们被量化、微调、用LoRA适配，或与合并适配器一起服务，而监控器保持冻结。我们首次系统测试了这一隐含契约是否成立：在基础模型上训练的激活监控器在这些常规模型更新后是否仍可靠。跨多个安全相关监控器、模型深度、更新系列和开放权重模型，我们发现一个明显的分裂：量化风格的更新大多保持冻结探针性能，而微调风格的更新经常使探针变得陈旧。脆弱性高度依赖于监控器，隐私/PII探针受影响最大，而拒绝合规探针相对稳定，表明重新训练行为不一定使其对应的监控器变得陈旧。QLoRA尤其有害，尽管单独的NF4量化相对良性，这表明量化在与适配结合时风险更大。我们进一步表明，退化可以从部署前的特征预测，从而能够将重新验证预算优先分配给最可能失败的监控器。这些结果表明，微调应默认触发激活监控器重新验证，而预测可以帮助优先检查哪些监控器。

英文摘要

Activation monitors, lightweight probes trained on a language model's internal representations, are an increasingly common layer in deployment safety stacks. Deployed models, however, are rarely static: they are quantized, fine-tuned, adapted with LoRA, or served with merged adapters while the monitor remains frozen. We present the first systematic test of whether this implicit contract holds: whether activation monitors trained on a base model remain reliable after these routine model updates. Across multiple safety-relevant monitors, model depths, update families, and open-weight models, we find a sharp split: quantization-style updates largely preserve frozen probe performance, while fine-tuning-style updates frequently make probes stale. Fragility is highly monitor-dependent, with privacy/PII probes most affected and refusal-compliance probes comparatively stable, showing that retraining a behavior need not stale its corresponding monitor. QLoRA is especially damaging despite NF4 quantization alone being relatively benign, suggesting that quantization becomes riskier when combined with adaptation. We further show that degradation is predictable from pre-deployment features, enabling revalidation budgets to be triaged toward the monitors most likely to fail. These results suggest that fine-tuning should trigger activation-monitor revalidation by default, while prediction can help prioritize which monitors to check first.

2606.16050 2026-06-16 cs.LG cs.AI 新提交

ALCL: An Adaptive Log-Correntropy Loss for Robust Learning under Non-Gaussian Noise

ALCL：一种用于非高斯噪声下鲁棒学习的自适应对数相关熵损失

Mainak Kundu, Ria Kanjilal, Ismail Uysal

发表机构 * University of South Florida（南佛罗里达大学）； California Polytechnic State University（加州州立理工大学）

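AI summary: Proposes the Adaptive Log-Correntropy Loss (ALCL), which jointly learns shape and scale parameters through differentiable reparameterization so that the loss geometry adapts to residual statistics and suppresses extreme outliers. It outperforms MSE and fixed-kernel correntropy losses under mixed heavy-tailed and impulsive noise.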

AI中文摘要

在重尾和脉冲噪声下的鲁棒深度学习仍然具有挑战性，因为均方误差（MSE）等传统损失对异常值表现出无界敏感性。尽管基于相关熵的目标函数提高了鲁棒性，但现有公式依赖于固定的核参数，这些参数必须凭经验调整且在训练期间保持不变。为了解决这些局限性，我们提出了一种自适应对数相关熵损失（ALCL），这是一种重尾损失公式，能够在优化过程中自适应地学习其鲁棒性几何结构。ALCL引入了一个对数残差模型，其形状和尺度参数通过可微重参数化与网络权重联合学习。这产生了一个原理性的最大似然公式，其影响函数形式上是有界且再下降的，使得损失几何能够动态适应不断变化的残差统计，同时抑制极端异常值。在四个广泛使用的基准数据集（涵盖灰度图像和红绿蓝（RGB）图像数据）上，在混合重尾和脉冲噪声下进行的比较实验表明，ALCL在重建保真度和下游分类准确性方面始终优于MSE和最优调整的广义相关熵损失。虽然在低噪声条件下性能差异仍然很小，但在高噪声条件下，ALCL在灰度基准上中位数准确率提高了高达4.75%，在RGB数据集上提高了4.51%，并且运行间方差减小。这些结果表明，通过联合学习损失参数实现的自适应鲁棒性为非高斯环境下深度学习中基于静态相关熵的损失提供了一种计算高效的替代方案。

英文摘要

Robust deep learning under heavy-tailed and impulsive noise remains challenging because conventional losses such as mean squared error (MSE) exhibit unbounded sensitivity to outliers. Although correntropy-based objectives improve robustness, existing formulations rely on fixed kernel parameters that must be empirically tuned and remain static during training. To address these limitations, we propose an Adaptive Log-Correntropy Loss (ALCL), a heavy-tailed loss formulation that adaptively learns its robustness geometry during optimization. ALCL introduces a logarithmic residual model whose shape and scale parameters are learned jointly with network weights through differentiable reparameterization. This yields a principled maximum likelihood formulation whose influence function is formally bounded and redescending, allowing the loss geometry to adapt dynamically to evolving residual statistics while suppressing extreme outliers. Comparative experiments on four widely used benchmark datasets spanning grayscale and red-green-blue (RGB) image data under mixed heavy-tailed and impulsive noise demonstrate that ALCL consistently outperforms MSE and optimally tuned generalized correntropy losses in both reconstruction fidelity and downstream classification accuracy. While performance differences remain small under low-noise conditions, under high-noise regimes ALCL improves median accuracy by up to 4.75% on grayscale benchmarks and 4.51% on RGB datasets, with reduced variance across runs. These results demonstrate that adaptive robustness through joint learning of loss parameters provides a computationally efficient alternative to static correntropy-based losses for deep learning in non-Gaussian environments.
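
The contrast between unbounded and redescending influence can be seen with the classical fixed-kernel correntropy loss, which is the kind of baseline the abstract compares against. The sketch below is not ALCL itself: the ALCL functional form is not given in the abstract, and here `sigma` is a fixed kernel width rather than a learned parameter.

```python
import math


def mse_influence(e):
    """Derivative of the squared error e**2; grows without bound in |e|."""
    return 2.0 * e


def correntropy_loss(e, sigma):
    """Correntropy-induced loss with a Gaussian kernel of width sigma."""
    return 1.0 - math.exp(-e * e / (2.0 * sigma * sigma))


def correntropy_influence(e, sigma):
    """Derivative of the loss above; peaks at |e| = sigma, then decays to zero."""
    return (e / (sigma * sigma)) * math.exp(-e * e / (2.0 * sigma * sigma))


if __name__ == "__main__":
    for e in (0.5, 1.0, 2.0, 10.0):
        print(e, mse_influence(e), correntropy_influence(e, sigma=1.0))
```

A residual of 10 pulls on the MSE gradient twenty times harder than a residual of 0.5, while under the correntropy loss its pull is essentially zero. The peak location depends entirely on `sigma`, which is why a fixed width has to be tuned by hand.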

2606.16196 2026-06-16 cs.LG cs.CV 新提交

When Confidence Lacks Concepts: Interpretable OOD Detection via Representation Perturbations

当置信度缺乏概念：通过表示扰动实现可解释的OOD检测

Anju Chhetri, Pratik Shrestha, Ramesh Rana, Prashnna Gyawali, Binod Bhattarai

发表机构 * NepAl Applied Mathematics and Informatics Institute for research（尼泊尔应用数学与信息学研究所）； West Virginia University（西弗吉尼亚大学）； Kathmandu University（加德满都大学）； University College London（伦敦大学学院）； University of Aberdeen（阿伯丁大学）

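AI summary: Proposes an interpretable OOD detection framework based on class-conditioned semantic perturbations and sparse autoencoders, which analyzes representation stability both to detect OOD inputs and to explain the underlying internal mechanisms.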

AI中文摘要

深度神经网络在医学影像任务中取得了显著性能，但其在分布偏移下过度泛化的倾向对安全临床部署构成了主要障碍。分布外（OOD）检测方法旨在缓解这一风险，但现有方法大多依赖语义含义理解不足的不透明内部信号，限制了在安全关键场景中的信任。本文提出一种可解释的OOD检测框架，该框架通过类条件语义扰动探测模型预测的稳定性。利用稀疏自编码器（SAE），我们从分布内数据中学习类特定概念向量，将密集的中间表示解耦为稀疏、语义有意义的组件。在推理时，我们使用与模型预测类别相关的概念向量扰动深层表示，并测量类别logits的稳定性。我们假设分布内样本对此类扰动表现出低敏感性，因为其表示与类特定语义方向对齐，而OOD样本由于表示错位而显示出放大的偏差。通过将OOD检测框架为概念条件稳定性分析，我们的方法既提供了判别性OOD信号，又提供了驱动模型不确定性的内部机制的可解释视角，使其特别适用于高风险医学应用。

英文摘要

Deep neural networks have achieved remarkable performance across medical imaging tasks, yet their tendency to overgeneralize under distributional shifts poses a major obstacle to safe clinical deployment. Out-of-Distribution (OOD) detection methods aim to mitigate this risk, but most existing approaches rely on opaque internal signals with poorly understood semantic meaning, limiting trust in safety-critical settings. In this work, we propose an interpretable OOD detection framework that probes the stability of model predictions under class-conditioned semantic perturbations. Leveraging sparse autoencoders (SAEs), we learn class-specific concept vectors from in-distribution data that disentangle dense intermediate representations into sparse, semantically meaningful components. At inference, we perturb deeper-layer representations using the concept vectors associated with the model's predicted class and measure the stability of the class logits. We hypothesize that in-distribution samples exhibit low sensitivity to such perturbations, as their representations align with class-specific semantic directions, whereas OOD samples show amplified deviations due to representational misalignment. By framing OOD detection as a concept-conditioned stability analysis, our approach provides both a discriminative OOD signal and an interpretable lens into the internal mechanisms driving model uncertainty, making it particularly suitable for high-stakes medical applications.

2606.16524 2026-06-16 cs.LG astro-ph.CO stat.ML 新提交

Neural Bayesian Anomaly Mitigation: A Robust Loss that Doubles as an Unsupervised Contamination Classifier

神经贝叶斯异常缓解：一种兼具无监督污染分类器功能的鲁棒损失函数

S. A. K. Leeney, W. J. Handley, H. T. J. Bevins, E. de Lera Acedo

发表机构 * Astrophysics Group, Cavendish Laboratory, University of Cambridge（剑桥大学卡文迪许实验室天体物理组）； Institute of Astronomy, University of Cambridge（剑桥大学天文研究所）

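AI summary: Proposes the Neural Bayesian Anomaly Mitigation (NBAM) loss, derived from a Bayesian latent-variable mixture model, which provides both a robust supervised loss and an unsupervised contamination posterior, and outperforms Huber and other baselines on CIFAR-10.

Comments 13 pages, 4 figures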

AI中文摘要

工程化的鲁棒损失函数（如Huber、Student-$t$和广义交叉熵）使监督模型能够容忍污染，但无法回答哪些观测被破坏。我们引入神经贝叶斯异常缓解（NBAM），一种通用的即插即用损失函数，源自贝叶斯潜在开关混合模型：边际似然定义了一个鲁棒的监督损失，相关的后验定义了一个无监督的污染分类器。与Huber或Student-$t$类似，NBAM可以替换任何监督流程中的标准训练损失；与它们不同，NBAM还学习了一个结构化的污染模型，并返回每个样本的校准污染后验。学习到的输入相关先验$π_ϕ(x)$捕获污染的空间局部性，使得靠近已知损坏的样本更可能被标记，同时自动出现奥卡姆惩罚并正则化以防止过度标记。在具有非对称标签污染的CIFAR-10上，NBAM无需监督即可恢复污染过程的结构：污染后验将干净样本与污染样本分开，学习到的异常头识别每个标签翻转对的方向。除了这些能力之外，在0.2-0.6的污染率下，NBAM的性能优于本文考虑的四种鲁棒损失基线。

英文摘要

Engineered robust losses such as Huber, Student-$t$, and generalised cross-entropy make supervised models tolerant of contamination but cannot answer which observations are corrupted. We introduce Neural Bayesian Anomaly Mitigation (NBAM), a general-purpose drop-in loss derived from a Bayesian latent-switch mixture model: the marginal likelihood defines a robust supervised loss, and the associated posterior defines an unsupervised contamination classifier. Like Huber or Student-$t$, NBAM can replace the standard training loss in any supervised pipeline; unlike them, it additionally learns a structured contamination model and returns a calibrated per-sample contamination posterior. A learned input-dependent prior $π_ϕ(x)$ captures the spatial locality of contamination, so that samples near known corruptions are more likely to be flagged, while an Occam penalty emerges automatically and regularises against over-flagging. On CIFAR-10 with asymmetric label contamination, NBAM recovers the structure of the corruption process without supervision: the contamination posterior separates clean from corrupted samples, and the learned anomaly head identifies the direction of every label-flip pair. Alongside these capabilities, NBAM outperforms the four robust-loss baselines considered here at contamination rates 0.2-0.6.
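
The latent-switch idea can be sketched for a single classification sample. The code assumes a two-component mixture in which a contaminated label is uniform over the classes and the prior is a plain number. In the paper the prior is the learned function $π_ϕ(x)$ and the contamination model is structured, so this is only an illustration of how one marginal likelihood yields both a loss and a posterior.

```python
import math


def nbam_terms(p_clean_label, prior_contam, num_classes):
    """Return (loss, contamination posterior) for one labelled sample.

    p_clean_label: probability the classifier assigns to the observed label.
    prior_contam:  prior probability that the label is contaminated.
    """
    p_anom = 1.0 / num_classes  # assumed contamination model: uniform over labels
    marginal = (1.0 - prior_contam) * p_clean_label + prior_contam * p_anom
    loss = -math.log(marginal)                     # robust supervised loss
    posterior = prior_contam * p_anom / marginal   # P(contaminated | x, y)
    return loss, posterior


if __name__ == "__main__":
    print(nbam_terms(0.9, 0.2, 10))    # label agrees with the model
    print(nbam_terms(0.001, 0.2, 10))  # label strongly contradicts the model
```

A label the model finds very unlikely receives a high contamination posterior, and its loss is capped by the contamination branch instead of growing without bound as cross-entropy would.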

2606.16535 2026-06-16 cs.LG cs.CV cs.SC 新提交

Assessing Reliability of Symbol Detection in Concept Bottleneck Models

评估概念瓶颈模型中符号检测的可靠性

Javier Fumanal-Idocin, Javier Andreu-Perez

发表机构 * University of Essex（埃塞克斯大学）

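AI summary: Studies the reliability of symbol detection in Concept Bottleneck Models (CBMs). Swapping independently trained concept detectors and classification heads identifies concepts prone to spurious firing, and a reliability-aware training strategy is proposed and validated on CUB-200-2011 and a synthetic task.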

AI中文摘要

概念瓶颈模型（CBM）是可解释人工智能的相关工具，因为它们通过人类可解释的符号进行预测。然而，高任务准确率并不能保证这些符号被忠实地检测到：联合训练的CBM可能在瓶颈中编码任务特定的捷径，使其解释不可靠。在本文中，我们通过交换共享相同符号词汇的独立训练的概念检测器和分类头来研究概念检测的可靠性。我们利用由此产生的性能下降、概念级指标和符号级不确定性估计来识别特别容易发生虚假激活的概念。最后，我们提出了一种可靠性感知训练策略，其中共享的概念检测器通过多个分类头进行优化，并因依赖全局或实例级不可靠符号而受到惩罚。在具有完整概念监督的CUB-200-2011上，检测器和头几乎可以自由互换（交换下降低于一个准确率点，相对保留率高于99%，且没有概念检测低于随机水平），而在受控的合成任务上，我们表明，随着概念监督权重的减少，模型保持近乎完美的任务准确率，而交换准确率和与真实概念的一致性下降到随机水平。我们的可靠性感知训练显著缓解了这种泄漏，在泄漏情况下大致使交换准确率翻倍。

英文摘要

Concept Bottleneck Models (CBMs) are a relevant tool for explainable Artificial Intelligence because they make their predictions through human-interpretable symbols. However, high task accuracy does not guarantee that these symbols are detected faithfully: jointly trained CBMs may encode task-specific shortcuts in the bottleneck, making their explanations unreliable. In this paper, we study concept-detection reliability by swapping independently trained concept detectors and classification heads that share the same symbolic vocabulary. We use the resulting performance degradation, concept-level metrics, and symbol-wise uncertainty estimates to identify concepts that are especially prone to spurious firing. Finally, we propose a reliability-aware training strategy in which a shared concept detector is optimized with multiple classification heads and penalized for relying on globally or instance-wise unreliable symbols. On CUB-200-2011 with full concept supervision, detectors and heads are almost freely interchangeable (swap drop below one accuracy point, relative retention above 99%, and no concept detected below chance), whereas on a controlled synthetic task we show that, as the concept-supervision weight is reduced, models keep near-perfect task accuracy while swapped accuracy and agreement with the ground-truth concepts collapse to chance. Our reliability-aware training substantially mitigates this leakage, roughly doubling swap accuracy in the leaky regime.

2606.16602 2026-06-16 cs.LG cs.NA math.NA physics.comp-ph 新提交

PhysGuard: Fisher-Guided Gradient Projection for Sim-to-Real Neural PDE Surrogates

PhysGuard: 面向仿真到现实神经PDE代理的Fisher引导梯度投影

Changjian Zhou, Junfeng Fang, Negin Yousefpour, Peng Wu, Bin Yan, Guillermo A Narsilio

发表机构 * Faculty of Engineering and IT, University of Melbourne（墨尔本大学工程与信息技术学院）； School of Computing, National University of Singapore（新加坡国立大学计算机学院）； Artificial Intelligence Research Institute, IFLYTEK Co., Ltd.（科大讯飞股份有限公司人工智能研究院）

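AI summary: To address the accuracy loss when neural operator models are transferred from simulation to real data, proposes PhysGuard, which uses the Fisher Information Matrix computed on simulation data to protect physics-critical parameters and restrict fine-tuning update directions, reducing low-frequency error by up to 32% under severe domain shift.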

AI中文摘要

在仿真数据上训练的神经算子模型由于仿真到现实的差距，应用于实验测量时往往失去精度。使用有限真实数据的标准微调可以缩小这一差距，但可能损害预训练期间学到的核心物理相关表示。尽管知识保留自适应在视觉或语言任务中已被广泛研究，但对于架构和受保护知识根本不同的神经算子，这些方法是否适用仍不清楚。神经算子需要保留核心尺度的物理结构，而非语义或视觉特征。我们提出PhysGuard，一个用于神经算子精确仿真到现实自适应的物理保留框架。具体来说，PhysGuard利用在仿真数据上计算的实证Fisher信息矩阵来识别物理关键参数方向，然后将微调更新限制在不干扰这些方向的方向上。逐层的Gram矩阵公式使其对具有数百万参数的模型高效，而自适应阈值自动确定受保护子空间大小。频谱探测实验表明，主导Fisher方向与低频输出结构强相关。在四个神经算子架构和不同物理系统的基准实验表明，与基线相比，PhysGuard在大多数评估指标上表现强劲。在严重域偏移下优势最为明显，与标准微调相比，低频误差降低高达32%，同时保持适应性。我们的代码可在https://github.com/ZhouChaunge/PhysGuard获取。

英文摘要

Neural operator models trained on simulation data often lose accuracy when applied to experimental measurements due to the sim-to-real gap. Standard fine-tuning with limited real data can reduce this gap, but it may also damage the core physics-relevant representations learned during pretraining. Although knowledge-preserving adaptation has been widely investigated in vision or language tasks, it remains unclear whether these methods are suitable for neural operators whose architectures and protected knowledge are fundamentally different. Neural operators need to preserve core-scale physical structures rather than semantic or visual features. We propose PhysGuard, a physics-preserving framework for accurate sim-to-real adaptation of neural operators. Specifically, PhysGuard uses the empirical Fisher Information Matrix computed on simulation data to identify physics-critical parameter directions, then restricts fine-tuning updates to directions that do not interfere with them. A layer-wise Gram-matrix formulation makes this efficient for models with millions of parameters, while an adaptive threshold automatically determines the protected subspace size. A spectral probe experiment shows that the dominant Fisher directions are strongly associated with low-frequency output structures. Benchmark experiments across four neural operator architectures and different physical systems show that PhysGuard performs strongly on most evaluation metrics compared to baselines. The benefits are most evident under severe domain shift, where it reduces low-frequency error by up to 32% compared to standard fine-tuning while maintaining adaptability. Our code is available at https://github.com/ZhouChaunge/PhysGuard.
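
The core projection step, restricting an update to directions that do not interfere with a protected subspace, can be sketched in a few lines. This is a generic orthogonal projection, not the released PhysGuard code: the protected directions are passed in directly, whereas the paper derives them from the empirical Fisher Information Matrix with a layer-wise Gram-matrix formulation and an adaptive threshold.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip directions already covered by the basis
            basis.append([wi / norm for wi in w])
    return basis


def project_out(grad, protected):
    """Remove from `grad` every component lying in the protected subspace."""
    g = list(grad)
    for u in orthonormalize(protected):
        c = dot(g, u)
        g = [gi - c * ui for gi, ui in zip(g, u)]
    return g


if __name__ == "__main__":
    protected = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # hypothetical protected directions
    print(project_out([3.0, 4.0, 5.0], protected))
```

The projected update is orthogonal to every protected direction, so a gradient step along it leaves the model's behavior along those directions unchanged to first order.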

2606.16682 2026-06-16 cs.LG cs.CL 新提交

Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents

多模态评估者偏好坍缩：自进化智能体中的跨模态传染

Zewen Liu

发表机构 * Qilu Institute of Technology, School of Software Engineering（齐鲁理工学院软件工程学院）

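AI summary: Studies how preference collapse is amplified in multimodal self-evaluation, finds that cross-modal contagion distorts strategy selection, and introduces a contagion matrix to quantify the risk.

Comments 19 pages, 0 figures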

AI中文摘要

当AI智能体使用语言模型在反馈循环中评估自身输出时，会出现系统性偏差。我们表明，评估者偏好坍缩（EPC）在多模态设置中被显著放大。使用GPT-4o评估DeepSeek-chat在文本和视觉任务上的表现，我们发现单一策略（step_by_step）吸收了48.4%的权重——是纯文本自评估中坍缩的3.2倍——而三个视觉域策略合计仅获得9.1%的权重。然后，我们展示了一种称为跨模态传染的新现象：在一个模态上获得的评估者偏好会迁移到另一个模态并破坏其策略选择。通过一个四阶段隔离训练范式，我们测量了传染系数并记录了策略反转——一个模态的最优策略在跨模态暴露后发生逆转。跨四种评估者配置（总计53次独立重复，15,592次API调用）的第3阶段统计验证揭示了一个清晰的层次结构：跨模型评估（GPT-4o，N=8）产生强但对称的双向传染（平均gamma_{T->V}=1.176，gamma_{V->T}=1.089，Delta=-0.088，p=0.575，Cohen's d=0.29）；高轮次（DashScope，50轮）导致坍缩为单一策略主导（70%零传染）；而自评估提供近乎完全的免疫——97%的运行（N=30，DeepSeek-chat）产生恰好为零的传染（平均gamma=0.033，95% CI [-0.031, 0.010]，p=0.642，d=0.07）。没有评估者条件显示出统计显著的方向不对称性。我们引入了由评估者身份索引的传染矩阵，发布了MM-EPC实验框架，并将跨模型评估者架构确定为偏好传染的主要风险因素。

英文摘要

When AI agents use language models to evaluate their own outputs in a feedback loop, systematic biases emerge. We show that Evaluator Preference Collapse (EPC) is dramatically amplified in multimodal settings. Using GPT-4o to evaluate DeepSeek-chat across text and visual tasks, we find that a single strategy (step_by_step) absorbs 48.4% of all weight -- 3.2x the collapse observed in text-only self-evaluation -- while three visual-domain strategies receive only 9.1% combined weight. We then demonstrate a novel phenomenon we term cross-modal contagion: evaluator preferences acquired on one modality transfer to and corrupt strategy selection on another. Through a four-phase isolation training paradigm, we measure contagion coefficients and document strategy inversion -- the optimal strategy for a modality reverses after cross-modal exposure. A Phase 3 statistical validation across four evaluator configurations (N=53 total independent repetitions, 15,592 API calls) reveals a clear hierarchy: cross-model evaluation (GPT-4o, N=8) produces strong but symmetric bidirectional contagion (mean gamma_{T->V}=1.176, gamma_{V->T}=1.089, Delta=-0.088, p=0.575, Cohen's d=0.29); high round counts (DashScope, 50 rounds) cause collapse to single-strategy dominance (70% zero contagion); and self-evaluation provides near-complete immunity -- 97% of runs (N=30, DeepSeek-chat) yield exactly zero contagion (mean gamma=0.033, 95% CI [-0.031, 0.010], p=0.642, d=0.07). No evaluator condition shows statistically significant directional asymmetry. We introduce the contagion matrix indexed by evaluator identity, release the MM-EPC experimental framework, and identify cross-model evaluator architecture as the primary risk factor for preference contagion.

2606.16786 2026-06-16 cs.LG 新提交

We Need Explanation Cards to Connect Explanation Algorithms to the Real World

我们需要解释卡来连接解释算法与现实世界

Eric Günther, Balázs Szabados, Kristof Meding, Gunnar König, Sebastian Bordt, Ulrike von Luxburg

发表机构 * University of Tübingen（蒂宾根大学）； Tübingen AI Center（蒂宾根人工智能中心）； HUN-REN Institute for Computer Science and Control (SZTAKI), Budapest, Hungary（匈牙利科学院计算机科学与控制研究所（SZTAKI））

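AI summary: Algorithmic explanations are often ambiguous and uninformative in practice. The paper proposes explanation cards, which add robustness and validity information along with interpretation instructions, to help users interpret explanations correctly and to meet the explainability requirements of the EU AI Act.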

AI中文摘要

算法解释旨在帮助利益相关者理解不透明的算法决策，但在实践中往往达不到预期。首先，算法解释的含义通常不是人们直观期望的那样，因此需要专业知识才能正确解释。其次，最近的研究表明，流行的解释算法对于复杂决策函数的行为信息不足。这些共同导致了解释表面传达的内容与实际提供的内容之间的差距。在这项工作中，我们提出了解释算法的解释卡，它用关于鲁棒性和有效性的补充信息以及清晰的解释说明来增强标准解释。补充信息可以使原本无信息的解释变得实际有用，同时也有助于检测它们不适用的情况。重要的是，解释卡中的解释说明将责任从用户转移到提供者：提供者必须事先明确说明从解释中可以得出什么和不能得出什么，而不是期望用户自己识别。使用反事实解释和SHAP作为示例，我们展示了提供者如何构建解释卡，以及这些卡为用户提供了正确解释所需的指导。我们进一步论证了解释卡是实践欧盟AI法案可解释性规定的实用手段。总体而言，解释卡是使解释算法适应现实世界用例的重要一步。

英文摘要

Algorithmic explanations are intended to help stakeholders understand opaque algorithmic decisions, but in practice, they often fall short. First, the meaning of algorithmic explanations is often not what one might intuitively expect, so expert knowledge is required to interpret them correctly. Second, recent work has shown that popular explanation algorithms are uninformative about the behavior of complex decision functions. Together, these issues create a gap between what explanations appear to convey and what they actually provide. In this work, we propose Explanation Cards for Explanation Algorithms, which augment standard explanations with complementary information about robustness and validity, as well as clear instructions for interpretation. The complementary information can render otherwise uninformative explanations practically useful, while also helping to detect cases where they are not. Importantly, the interpretation instructions in explanation cards shift responsibility from users to providers: Rather than expecting users to recognize what can and cannot be concluded from an explanation, providers must make this explicit upfront. Using counterfactual explanations and SHAP as examples, we demonstrate how providers can construct explanation cards and that these cards provide users with the guidance needed for sound interpretation. We further argue that explanation cards offer a practical means of operationalising the explainability provisions of the EU AI Act. Overall, explanation cards are a significant step toward making explanation algorithms fit for real-world use cases.


2606.16883 2026-06-16 cs.LG cs.AI 新提交

Upper Bounds on the Generalization Error of Deep Learning Models via Local Robustness and Stability

深度学习模型泛化误差的上界：基于局部鲁棒性和稳定性

Abdul-Rauf Nuhu, Parham M. Kebria, Vahid Hemmati, Mahmoud N. Mahmoud, Edward Tunstel, Abdollah Homaifar

发表机构 * North Carolina Agricultural and Technical State University（北卡罗来纳农业技术州立大学）； University of Alabama（阿拉巴马大学）； Southwest Research Institute（西南研究院）

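AI summary: Proposes a generalization upper bound that scales the robustness term by the number of stable samples in each local region, yielding non-vacuous error estimates on ImageNet that are the tightest among existing methods.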

详情

AI中文摘要

泛化是数据驱动模型的关键属性，尤其是在安全关键应用中部署的深度学习模型。基于鲁棒性的泛化界作为一种将鲁棒性与泛化性能联系起来的原则性方法而受到关注，通常以数据依赖的方式。然而，大多数现有界在实际设置中存在空洞问题，产生远超过实际错误率的松散上界，限制了其在真实世界评估中的实用性。虽然这个问题通常归因于不确定性项，但问题的很大一部分源于鲁棒性项本身，特别是对于0-1损失。现有方法通常将鲁棒性项视为全局度量，忽略了其在输入空间不同子区域间的变化。在这项工作中，我们提出了一种泛化界，通过根据每个子区域内稳定和不稳定样本的数量来缩放鲁棒性项，从而解决了这一局限性。我们的界同时包含数据和模型依赖因素，同时保持实际相关性（产生更紧的真实误差上界）。在ImageNet数据集上训练的模型上的实验表明，我们的界始终非空洞，并在现有方法中实现了最紧的估计，与一系列鲁棒深度神经网络的实证性能紧密对齐。

英文摘要

Generalization is a critical property of data-driven models, particularly deep learning models deployed in safety-critical applications. Robustness-based generalization bounds have gained attention as a principled way to link robustness properties to generalization performance, often in a data-dependent manner. However, most existing bounds suffer from vacuousness in practical settings, yielding loose upper bounds that greatly exceed the actual error rates and limiting their usefulness for real-world evaluation. While this issue is often attributed to the uncertainty term, a substantial part of the problem originates from the robustness term itself, particularly for the 0-1 loss. Existing approaches typically treat the robustness term as a global measure, ignoring its variation across different sub-regions of the input space. In this work, we propose a generalization bound that addresses this limitation by scaling the robustness term according to the number of stable and unstable samples within each sub-region. Our bounds incorporate both data- and model-dependent factors while maintaining practical relevance (yielding tighter upper bounds on true error). Experiments on models trained on the ImageNet dataset show that our bounds remain consistently non-vacuous and achieve the tightest estimates among existing methods, closely aligning with empirical performance across a range of robust deep neural networks.


2606.14867 2026-06-16 cs.CL cs.AI cs.LG 交叉投稿

Evaluating the Robustness of Proof Autoformalization in Lean 4

评估 Lean 4 中证明自动形式化的鲁棒性

Zhengtao Gui, Sheng Yang, Zhouxing Shi

发表机构 * University of California, Irvine（加州大学尔湾分校）； University of California, Riverside（加州大学河滨分校）

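AI summary: Studies the robustness of proof autoformalization models under global and local perturbations, finding that existing models are sensitive to global perturbations and that most fail to faithfully reflect local perturbations.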

Comments Preprint

详情

AI中文摘要

证明自动形式化旨在将用自然语言编写的数学非正式证明翻译成形式语言（如 Lean 4）中的形式证明。已有几项工作开发了基于 LLM 的证明自动形式化模型。然而，现有评估通常侧重于翻译来自精选数据集的规范非正式证明。我们认为，一个鲁棒的证明自动形式化器必须即使对于偏离这些理想化形式的非正式证明也能保持忠实，并提出了首个关于证明自动形式化模型鲁棒性的研究。我们制定了两类扰动并评估每种扰动下的鲁棒性：全局扰动以不同风格改写非正式证明，在此情况下形式化应保持一致；局部扰动改变一个值、符号或证明步骤，可能是反事实的方式，鲁棒的形式化应忠实地反映扰动，而不是自行恢复为原始形式或推断出不同的形式。我们在 miniF2F 和 MATH-500 上构建了包含两种扰动的基准，并自动衡量证明自动形式化在全局扰动下正确性的稳定程度，以及其输出在局部扰动下的忠实程度。我们评估了七个最新模型，所有模型都对全局扰动敏感，且大多数在局部扰动下无法保持忠实。代码和数据可通过 https://github.com/ucr-rai/robust-proof-autoformalization 获取。

英文摘要

Proof autoformalization aims to translate an informal mathematical proof written in natural language into a formal proof in a formal language such as Lean 4. Several works have developed LLM-based models for proof autoformalization. However, existing evaluations have typically focused on translating well-formed informal proofs from curated datasets. We argue that a robust proof autoformalizer must remain faithful even for informal proofs that diverge from these idealized ones, and we present the first study on the robustness of proof autoformalization models. We formulate two categories of perturbations and evaluate robustness under each: a global perturbation paraphrases the informal proof in a different style, under which the formalization should remain consistent; a local perturbation alters a value, symbol, or proof step, possibly in a counterfactual way, and a robust formalization should faithfully reflect the perturbation rather than reverting to the original one or inferring a different one on its own. We build a benchmark with both perturbations on miniF2F and MATH-500, and automatically measure how stable a proof autoformalization's correctness is under global perturbations and how faithfully its output reflects local perturbations. We evaluate seven recent models, all of which are sensitive to global perturbations and mostly fail to remain faithful under local perturbations. Code and data are available via https://github.com/ucr-rai/robust-proof-autoformalization.


2606.14909 2026-06-16 stat.ML cs.LG 交叉投稿

Audited Conformal Prediction for Classification under Unknown Distribution Shift

未知分布漂移下分类问题的审计共形预测

Yanfei Zhou, Rizal Fathony, Nam H. Nguyen, Matteo Sesia

发表机构 * Department of Data Sciences and Operations, University of Southern California（数据科学与运营系，南加州大学）； AI Foundations, Capital One（Capital One人工智能基础）； Department of Data Sciences and Operations, Thomas Lord Department of Computer Science, University of Southern California（数据科学与运营系，托马斯·劳德计算机科学系，南加州大学）

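AI summary: Proposes Audited Conformal Prediction, which uses a small labeled dataset from the target population to train an audit model that identifies inputs where the legacy model is likely to fail, and integrates it into the conformal prediction framework to raise conditional coverage while guaranteeing marginal coverage, with theoretical guarantees.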

详情

AI中文摘要

我们考虑在未知分布漂移下部署的预训练分类模型的不确定性量化问题。我们提出了审计共形预测（ACP），该方法利用来自目标群体的小标注数据集训练一个辅助审计模型，以识别旧模型可能失败的输入。通过将审计模型的输出整合到共形预测框架中，ACP 产生的预测集在保证边际覆盖的同时，在实践中比现有方法实现了更高的条件覆盖。我们开发并分析了两种互补的整合策略——一种针对边际覆盖并改善条件性能，另一种提供明确的组条件覆盖保证——并为两者建立了理论保证。在合成和真实世界数据集上的实验验证了该方法，并说明了预测集大小与条件覆盖之间的权衡。

英文摘要

We consider the problem of uncertainty quantification for a pretrained classification model deployed under unknown distribution shift. We propose Audited Conformal Prediction (ACP), a method that leverages a small labeled dataset from the target population to train an auxiliary audit model identifying inputs where the legacy model is likely to fail. By integrating the audit model's outputs into the conformal prediction framework, ACP produces prediction sets that guarantee marginal coverage while achieving substantially higher conditional coverage in practice than existing approaches. We develop and analyze two complementary integration strategies -- one targeting marginal coverage with improved conditional performance, the other providing explicit group-conditional coverage guarantees -- and establish theoretical guarantees for both. Experiments on synthetic and real-world datasets validate the method and illustrate trade-offs between prediction set size and conditional coverage.
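
As background for the abstract above, here is a minimal sketch of standard split conformal prediction for classification, the framework ACP is described as building on. It is not the ACP method itself: the audit model and both integration strategies are omitted. The function names, the nonconformity score 1 - p(label), and the toy calibration data are all illustrative assumptions.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold for the score s = 1 - p(true label)."""
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every label is included
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All labels whose score 1 - p(label) does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# Toy calibration set (hypothetical): class probabilities and true labels.
cal_probs = [
    [0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.20, 0.10, 0.70],
    [0.60, 0.30, 0.10], [0.25, 0.50, 0.25], [0.40, 0.35, 0.25],
    [0.50, 0.30, 0.20],
]
cal_labels = [0, 1, 2, 0, 1, 0, 1]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
ambiguous_set = prediction_set([0.50, 0.45, 0.05], threshold)
confident_set = prediction_set([0.90, 0.06, 0.04], threshold)
print(threshold, ambiguous_set, confident_set)
```

Under exchangeability, sets built this way cover the true label with probability at least 1 - alpha on average (marginal coverage). That guarantee says nothing about coverage on specific subgroups, which is the gap the abstract's conditional-coverage discussion addresses.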


2606.15779 2026-06-16 cs.CV cs.LG 交叉投稿

Faithful Action-unit Causal Reasoning for Counterfactually Faithful Emotion Explanations

面向反事实忠实情感解释的忠实动作单元因果推理

Van Thong Huynh, Hong Hai Nguyen, Thuy Pham, Trong Nghia Nguyen, Soo-Hyung Kim

发表机构 * Faculty of CSE, Ho Chi Minh City University of Technology (HCMUT), VNUHCM（胡志明市理工大学计算机科学与工程学院，越南国家大学胡志明市分校）； Dept. of AI, FPT University（FPT大学人工智能系）； Faculty of DSAI, College of Technology, National Economic University（国民经济大学技术学院数据科学与人工智能系）； Dept. of AI Convergence, Chonnam National University（全南大学人工智能融合系）

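AI summary: Proposes FACR, which uses a counterfactual-consistency objective and a polarity-aware causal graph to train models for measurable causal faithfulness between action units and emotions, raising faithfulness from 0.08 to 0.57 on the UNBC-PAIN dataset.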

详情

AI中文摘要

多模态模型可以命名面部情感背后的动作单元（AU），但其AU->情感的解释通常是合理的而非忠实的：没有任何机制强制模型调用的AU是实际驱动其预测的AU。我们将AU->情感推理视为解释、标签和结构化AU->情感因果图G之间的反事实一致性问题，并提出FACR，该方法将推理器建立在独立诱导的、极性感知的G上，并训练一个反事实忠实性目标：对G标记为某类因果的AU进行do干预必须改变预测，而对标记为无关的AU进行do干预必须保持预测不变。因此，忠实性既可通过匹配的干预指标进行训练和测量，我们针对已知因果结构PSPI疼痛-AU组成评估该指标，因为现有情感推理基准不支持。我们明确指出，该指标测试的是对给定结构的忠实性而非重新发现：它询问训练后的推理器是否调用结构标记为因果的AU，在留出受试者和第二个数据集上进行评估。在UNBC-PAIN上的受试者独立评估中，该目标将调用AU与PSPI组成的一致性从无目标的基线0.08提高到0.57，检测成本略有增加；一个不忠实控制实验将增益归因于该目标。在跨数据集情感迁移中，该目标同样提高了七类任务上对G的忠实性（0.50到0.84）。最后，我们附加语言verbalizer并将审计扩展到生成的文本：通过潜在激活偏置每个动作单元的发射，使解释在结构上忠实，因此消融一个AU会将其从解释中移除，该属性可迁移到第二个语言模型骨干，而自由生成的解释则不忠实。

英文摘要

Multimodal models can name the action units (AUs) behind a facial emotion, but their AU->emotion rationales are typically plausible rather than faithful: nothing forces the AUs a model invokes to be the AUs that actually drive its prediction. We cast AU->emotion reasoning as a counterfactual-consistency problem between the rationale, the label, and a structural AU->emotion causal graph G, and propose FACR, which grounds the reasoner in an independently induced, polarity-aware G and trains a counterfactual-faithfulness objective: a do-intervention on an AU that G marks causal for a class must move the prediction, while one it marks irrelevant must leave it unchanged. Faithfulness is thereby both trainable and measurable through a matching interventional metric, which we evaluate against a known causal structure, the PSPI pain-AU composition, as no existing affective-reasoning benchmark allows. We are explicit that this metric tests fidelity to the supplied structure rather than its rediscovery: it asks whether the trained reasoner invokes the AUs the structure marks causal, on held-out subjects and a second dataset. Under subject-independent evaluation on UNBC-PAIN, the objective raises the agreement between the invoked AUs and the PSPI composition from a no-objective baseline of 0.08 to 0.57, at a small detection cost; an unfaithfulness control attributes the gain to the objective. On a cross-dataset emotion transfer, the objective likewise raises fidelity to G on a seven-class task (0.50 to 0.84). Finally, we attach a language verbalizer and extend the audit to the generated text: biasing each action unit's emission by its latent activation makes the rationale faithful by construction, so that ablating an AU removes it from the explanation, a property that transfers to a second language-model backbone, whereas a freely generated rationale is unfaithful.


2606.15821 2026-06-16 cs.CL cs.AI cs.LG 交叉投稿

The Truth Stays in the Family: Enhancing Contextual Grounding via Inherited Truthful Heads in Model Lineages

真相留在家族中：通过模型谱系中继承的真相头增强上下文基础

Miso Choi, Seonga Choi, Mincheol Kwon, Woosung Joung, Jinkyu Kim, Jungbeom Lee

发表机构 * University of Science and Technology of China（中国科学技术大学）

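AI summary: Finds that context-truthfulness scores are strongly inherited between base LLMs and their downstream variants, and proposes TruthProbe, a soft-gating strategy that amplifies truthful heads to improve contextual truthfulness and reduce multimodal hallucination.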

Comments Accepted at ICML 2026

详情

AI中文摘要

大型语言模型（LLM）的最新进展产生了许多共享基础LLM的专业多模态LLM（MLLM），形成了不同的模型谱系。基础LLM与下游变体之间是否存在基本的行为联系尚不清楚。我们通过量化头部级别的上下文真相分数来研究这个问题。在包括基于Vicuna、Qwen2.5、LLaMA2和Mistral的模型在内的多种LLM和MLLM谱系中，我们发现真相分数在模型家族内被强烈保留，即使在指令调优或多模态适应后也是如此。我们进一步表明，这种继承与注意力头权重保留一致，并且上下文真相头关注查询相关的证据。基于这一发现，我们提出了TruthProbe，一种软门控策略，在保留其他头部贡献的同时放大上下文真相头。TruthProbe在HaluEval上提高了上下文真实性，并在POPE和CHAIR上减少了多模态幻觉，基础LLM的真相分数有效转移到其微调的LLM和MLLM后代。代码可在https://github.com/miso-choi/TruthProbe获取。

英文摘要

Recent advances in large language models (LLMs) have produced many specialized multimodal LLMs (MLLMs) that share common foundational LLMs, forming distinct model lineages. It remains unclear whether a fundamental behavioral link exists between the foundational LLMs and downstream variants. We investigate this question by quantifying head-level context-truthfulness scores. Across diverse LLM and MLLM lineages, including Vicuna-, Qwen2.5-, LLaMA2-, and Mistral-based models, we find that Truth Scores are strongly preserved within model families, even after instruction tuning or multimodal adaptation. We further show that this inheritance is consistent with attention-head weight preservation, and that context-truthful heads attend to query-relevant evidence. Building on this finding, we propose TruthProbe, a soft-gating strategy that amplifies context-truthful heads while preserving other head contributions. TruthProbe improves contextual truthfulness on HaluEval and reduces multimodal hallucination on POPE and CHAIR, with base-LLM Truth Scores transferring effectively to their fine-tuned LLM and MLLM descendants. Code is available at https://github.com/miso-choi/TruthProbe.


2606.15964 2026-06-16 stat.ML cs.LG 交叉投稿

通过策略性成对数据扰动进行排名滥用

Junyi Yao, Zihao Zheng, Jiayu Long

发表机构 * Computational Decision Systems（计算决策系统）

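AI summary: Studies the vulnerability of MLE-based pairwise ranking systems to strategic data perturbations and proposes the Adaptive Subset Selection Attack (ASSA); experiments show that a small number of perturbations can significantly change the global ranking.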

详情

AI中文摘要

基于最大似然估计（MLE）的成对排名系统，如Bradley-Terry模型，被广泛用于从成对比较中聚合偏好。然而，它们在策略性数据操纵下的鲁棒性仍未被充分理解。在本文中，我们研究了基于MLE的排名系统对对抗性扰动的脆弱性。我们将操纵任务形式化为一个受约束的组合优化问题，并提出了一种自适应子集选择攻击（ASSA）来高效识别高影响力的扰动。在合成数据和真实世界选举数据集上的实验结果表明，基于MLE的排名表现出尖锐的相变行为：在超过一个小的扰动预算后，有限数量的策略性投票者可以显著改变全局排名。特别是，我们的方法在受约束的预算下始终优于随机和贪婪基线。这些发现揭示了基于MLE的排名机制对结构化扰动的基本敏感性，并强调了在集体决策系统中需要更鲁棒的聚合方法。

英文摘要

Pairwise ranking systems based on Maximum Likelihood Estimation (MLE), such as the Bradley-Terry model, are widely used to aggregate preferences from pairwise comparisons. However, their robustness under strategic data manipulation remains insufficiently understood. In this paper, we study the vulnerability of MLE-based ranking systems to adversarial perturbations. We formulate the manipulation task as a constrained combinatorial optimization problem and propose an Adaptive Subset Selection Attack (ASSA) to efficiently identify high-impact perturbations. Experimental results on both synthetic data and real-world election datasets show that MLE-based rankings exhibit a sharp phase-transition behavior: beyond a small perturbation budget, a limited number of strategic voters can significantly alter the global ranking. In particular, our method consistently outperforms random and greedy baselines under constrained budgets. These findings reveal a fundamental sensitivity of MLE-based ranking mechanisms to structured perturbations and highlight the need for more robust aggregation methods in collective decision-making systems.
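
To make the setting concrete, the following sketch fits a Bradley-Terry model by maximum likelihood and shows how a couple of flipped comparisons can reorder the top of a toy ranking. The fit uses the classical minorization-maximization update for Bradley-Terry. It is not the paper's ASSA attack: the flipped comparisons are picked by hand, and the data and function names are hypothetical.

```python
def fit_bradley_terry(wins, n_items, iters=200):
    """MLE strengths via MM updates; wins[(i, j)] = times item i beat item j.

    Assumes every item has at least one win and one loss, so the MLE exists.
    """
    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            total_wins = sum(w for (a, _), w in wins.items() if a == i)
            denom = 0.0
            for j in range(n_items):
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p.append(total_wins / denom)
        total = sum(new_p)
        p = [v / total for v in new_p]  # normalize so strengths sum to 1
    return p

def ranking(strengths):
    """Item indices sorted from strongest to weakest."""
    return sorted(range(len(strengths)), key=lambda i: -strengths[i])

# Toy data (hypothetical): 3 items, 10 comparisons per pair, 30 in total.
base = {(0, 1): 6, (1, 0): 4, (1, 2): 6, (2, 1): 4, (0, 2): 7, (2, 0): 3}
# Flip two "0 beats 1" outcomes into "1 beats 0".
perturbed = dict(base)
perturbed[(0, 1)] = 4
perturbed[(1, 0)] = 6

base_rank = ranking(fit_bradley_terry(base, 3))
perturbed_rank = ranking(fit_bradley_terry(perturbed, 3))
print(base_rank, perturbed_rank)
```

In this toy case, changing 2 of 30 comparisons swaps the top two items. The abstract's claim concerns how such effects scale with the perturbation budget on real election data, which this example does not attempt to reproduce.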


2605.00924 2026-06-16 cs.LG cs.AI 版本更新

StyleShield: Exposing the Fragility of AIGC Detectors through Continuous Controllable Style Transfer

StyleShield: 通过连续可控风格迁移揭示AIGC检测器的脆弱性

Guantian Zheng

发表机构 * National University of Singapore（新加坡国立大学）

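AI summary: Proposes StyleShield, a flow-matching framework for conditional text style transfer that continuously controls transfer strength, achieving high evasion rates while preserving semantic similarity, and introduces the RateAudit algorithm to question the reliability of score-based detection.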

Comments 12 pages, 5 figures. Code and model weights will be released upon acceptance

详情

AI中文摘要

AI生成内容（AIGC）检测器越来越多地部署在学术诚信筛查等高风险场景中，然而其可靠性依赖于一个基本悖论：随着语言模型在人类编写的语料库上训练，AI与人类写作之间的统计边界将不可避免地消失。商业激励进一步扭曲了这一格局——检测服务和“去AI化”工具通常在同一供应链中运作，用对内容来源的判断取代了对内容质量的评估。我们提出了StyleShield，这是第一个用于条件文本风格迁移的流匹配框架，通过DiT骨干网络和零初始化交叉注意力适配器，直接在连续token嵌入空间中操作，并以冻结的Qwen-7B表示为条件。在推理时，我们将图像合成中的SDEdit范式适配到文本嵌入，通过单个参数gamma提供对规避-保留权衡的平滑连续控制。在一个多领域中文基准测试中，StyleShield对训练检测器实现了94.6%的规避率，对三个未见检测器实现了≥99%的规避率，同时保持了0.928的语义相似度。我们进一步引入了RateAudit，一种文档级调度算法，证明检测率判定可以设置为任意值，直接质疑了基于分数评估的可靠性。

英文摘要

AI-generated content (AIGC) detectors are increasingly deployed in high-stakes settings such as academic integrity screening, yet their reliability rests on a fundamental paradox: as language models are trained on human-written corpora, the statistical boundary between AI and human writing will inevitably dissolve as models improve. Commercial incentives have further distorted this landscape -- detection services and "de-AIification" tools often operate within the same supply chain, replacing evaluation of content quality with judgment of content origin. We present StyleShield, the first flow matching framework for conditional text style transfer, operating directly in continuous token embedding space via a DiT backbone with zero-initialized cross-attention adapters conditioned on frozen Qwen-7B representations. At inference, we adapt the SDEdit paradigm from image synthesis to text embeddings, with a single parameter gamma providing smooth continuous control over the evasion-preservation trade-off. On a multi-domain Chinese benchmark, StyleShield achieves 94.6% evasion against the training detector and >=99% against three unseen detectors, maintaining 0.928 semantic similarity. We further introduce RateAudit, a document-level scheduling algorithm that demonstrates detection-rate verdicts can be set to arbitrary values, directly questioning the reliability of score-based evaluation.


通过加权积分梯度增强视觉特征归因

Kien Tran Duc Tuan, Tam Nguyen Trong, Son Nguyen Hoang, Khoat Than, Anh Nguyen Duc

发表机构 * Institute of Information and Communication Technology, Vietnam Academy of Science and Technology（越南科学与技术学院信息与通信技术研究所）

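AI summary: To address the sensitivity of Integrated Gradients to baseline choice, proposes Weighted Integrated Gradients, which adaptively selects and weights baselines via an unsupervised criterion, improving attribution reliability while preserving the axiomatic properties, with gains of up to 36% on convolutional and Transformer architectures.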

详情

AI中文摘要

积分梯度（IG）是可解释AI中广泛使用的归因方法，尤其在需要可靠特征归因的计算机视觉应用中。IG的一个关键限制是其对基线（参考）图像选择的敏感性。多基线扩展如期望梯度（EG）假设基线均匀加权，隐含地认为所有基线图像信息量相等。在高维视觉模型中，这一假设常导致噪声或不稳定的解释。本文提出加权积分梯度（WG），一种通过评估和加权基线来增强归因可靠性的原则性方法。WG引入了一个无监督的基线适用性标准，实现了基于每个输入的自适应基线选择和加权。该方法在广义加权基线形式下保留了IG的核心公理性质。在预期的、基于代理的适应度-相关性单调性假设下，WG为更信息丰富的基线分配更大权重提供了概率依据。在常用图像数据集和模型上的实验表明，在我们的协议下，WG优于EG，在评估的卷积和Transformer架构上最高提升36%。这些提升伴随着额外的适应度评估成本，因此WG应被视为归因保真度的权衡，而非EG的更快替代方案。通过超越所有基线贡献相等的假设，加权积分梯度为解释计算机视觉模型提供了更清晰、更可靠的方法，提高了可解释AI的理解和实际可用性。

英文摘要

Integrated Gradients (IG) is a widely used attribution method in explainable AI, particularly in computer vision applications where reliable feature attribution is essential. A key limitation of IG is its sensitivity to the choice of baseline (reference) images. Multi-baseline extensions such as Expected Gradients (EG) assume uniform weighting over baselines, implicitly treating all baseline images as equally informative. In high-dimensional vision models, this assumption often leads to noisy or unstable explanations. This paper proposes Weighted Integrated Gradients (WG), a principled approach that evaluates and weights baselines to enhance attribution reliability. WG introduces an unsupervised criterion for baseline suitability, enabling adaptive selection and weighting of baselines on a per-input basis. The method preserves the core axiomatic properties of IG in a generalized weighted-baseline form. Under an expected, proxy-based fitness--relevance monotonicity assumption, WG provides a probabilistic justification for assigning larger weights to more informative baselines. Experiments on commonly used image datasets and models show that WG improves over EG under our protocol, with up to 36% gains across evaluated convolutional and Transformer architectures. These gains come with additional fitness-evaluation cost, so WG should be viewed as an attribution-fidelity trade-off rather than a faster alternative to EG. By moving beyond the assumption that all baselines contribute equally, Weighted Integrated Gradients offers a clearer and more reliable approach to explaining computer-vision models, improving both understanding and practical usability in explainable AI.
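
The weighted multi-baseline form described above can be sketched on a toy function. This is a minimal illustration, not the authors' implementation: the abstract does not specify how WG scores baseline suitability, so the weights below are hand-picked placeholders, and `f`, `grad_f`, and the baselines are hypothetical. Uniform weights would give a plain average over the same finite baseline set.

```python
def f(x):
    """Toy differentiable model standing in for a network output."""
    return x[0] * x[1] + x[2] ** 2

def grad_f(x):
    """Analytic gradient of f."""
    return [x[1], x[0], 2.0 * x[2]]

def integrated_gradients(x, baseline, steps=50):
    """IG for one baseline, midpoint Riemann sum along the straight-line path."""
    attr = [0.0] * len(x)
    for s in range(steps):
        t = (s + 0.5) / steps
        point = [b + t * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(len(x)):
            attr[i] += (x[i] - baseline[i]) * g[i] / steps
    return attr

def weighted_ig(x, baselines, weights, steps=50):
    """Convex combination of per-baseline IG attributions."""
    total = sum(weights)
    norm = [w / total for w in weights]  # normalize weights to sum to 1
    attr = [0.0] * len(x)
    for b, w in zip(baselines, norm):
        ig = integrated_gradients(x, b, steps)
        attr = [a + w * v for a, v in zip(attr, ig)]
    return attr, norm

x = [1.0, 2.0, 3.0]
baselines = [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]]
weights = [3.0, 1.0]  # placeholder suitability scores, not WG's criterion

attr, norm = weighted_ig(x, baselines, weights)
expected_gap = f(x) - sum(w * f(b) for w, b in zip(norm, baselines))
print(attr, sum(attr), expected_gap)
```

Because the weights are normalized, the attributions sum to f(x) minus the weighted average of the baseline outputs, which is the weighted-baseline analogue of IG's completeness axiom.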


2603.10562 2026-06-16 math.OC cs.LG cs.SY eess.SY 版本更新

Quantization Robustness of Monotone Operator Equilibrium Networks

单调算子均衡网络的量化鲁棒性

James Li, Philip H. W. Leong, Thomas Chaffey

发表机构 * School of Electrical and Computer Engineering, The University of Sydney（悉尼大学电气与计算机工程学院）

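AI summary: Analyzes how weight quantization affects convergence and the equilibrium of monotone operator equilibrium networks deployed on low-precision hardware, gives theoretical guarantees based on spectral perturbation and the monotonicity margin, and uses MNIST experiments to verify the phase transition relating quantization precision to convergence.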

Comments 6 pages, 4 figures. Accepted for publication in IEEE Control Systems Letters (L-CSS)

详情

AI中文摘要

单调算子均衡网络是隐式层模型，其输出是单调算子的唯一均衡点，保证了存在性、唯一性和收敛性。当部署在低精度硬件上时，权重被量化，可能破坏这些保证。我们将权重量化分析为底层单调包含的谱扰动。当谱范数权重扰动小于单调性边界时，量化求解器的收敛性得到保证；量化与全精度均衡之间的位移由扰动大小和边界界定；一个条件数（算子范数与边界的比值）将量化精度与前向误差联系起来。MNIST实验在预测阈值处确认了相变：三位和四位后训练量化发散，而五位及以上收敛。反向传播保证使得量化感知训练成为可能，在四位时恢复了可证明的收敛性。

英文摘要

Monotone operator equilibrium networks are implicit-layer models whose output is the unique equilibrium of a monotone operator, guaranteeing existence, uniqueness, and convergence. When deployed on low-precision hardware, weights are quantized, potentially destroying these guarantees. We analyze weight quantization as a spectral perturbation of the underlying monotone inclusion. Convergence of the quantized solver is guaranteed whenever the spectral-norm weight perturbation is smaller than the monotonicity margin; the displacement between quantized and full-precision equilibria is bounded in terms of the perturbation size and margin; and a condition number characterizing the ratio of the operator norm to the margin links quantization precision to forward error. MNIST experiments confirm a phase transition at the predicted threshold: three- and four-bit post-training quantization diverge, while five-bit and above converge. The backward-pass guarantee enables quantization-aware training, which recovers provable convergence at four bits.
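
The convergence condition above compares the spectral norm of the weight perturbation with the monotonicity margin. The sketch below checks a conservative version of it in pure Python, using the fact that the Frobenius norm upper-bounds the spectral norm. If the Frobenius gap is below the margin, the spectral-norm condition holds, while a larger gap is in general inconclusive. The weight matrix, the margin value, and the symmetric uniform quantizer are all assumptions made for illustration, since the abstract does not specify the paper's quantization scheme.

```python
import math

def quantize(W, bits):
    """Symmetric uniform quantizer with 2**(bits - 1) - 1 positive levels."""
    max_abs = max(abs(w) for row in W for w in row)
    scale = max_abs / (2 ** (bits - 1) - 1)
    return [[round(w / scale) * scale for w in row] for row in W]

def frobenius_gap(W, Wq):
    """Frobenius norm of W - Wq, an upper bound on its spectral norm."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(W, Wq)
                         for a, b in zip(row_a, row_b)))

W = [[0.5, -0.2], [0.3, 0.1]]  # hypothetical weight matrix
margin = 0.05                  # hypothetical monotonicity margin m

gap_3bit = frobenius_gap(W, quantize(W, 3))
gap_8bit = frobenius_gap(W, quantize(W, 8))
for bits, gap in ((3, gap_3bit), (8, gap_8bit)):
    status = "sufficient condition holds" if gap < margin else "not certified"
    print(bits, round(gap, 5), status)
```

On this toy matrix the 8-bit perturbation is certified and the 3-bit one is not, which mirrors the qualitative threshold behavior the abstract reports without reproducing its specific bit widths.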


2605.03297 2026-06-16 cs.SD cs.LG 版本更新

Contrastive Regularization for Accent-Robust ASR

对比正则化用于口音鲁棒的ASR

Van-Phat Thai, Aradhya Dhruv, Duc-Thinh Pham, Sameer Alam

发表机构 * Air Traffic Management Research Institute, Nanyang Technological University, Singapore（新加坡南洋理工大学航空交通管理研究所）； Center of AI Research, VinUniversity, Vietnam（越南Vin大学人工智能研究中心）

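AI summary: Proposes supervised contrastive learning as a lightweight accent-invariance auxiliary objective that regularizes encoder representations during CTC fine-tuning, with no architectural changes or explicit accent supervision, reducing word error rate on unseen accents by up to 25-29% on the L2-ARCTIC benchmark.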

Comments Accepted by Interspeech 2026

2605.30837 2026-06-16 cs.CR cs.LG 版本更新

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

先派侦察兵：提示注入防御中自适应检测器分配的预推理方法

Shuhao Zhang, Jiarui Li, Qi Cao, Ruiyi Zhang, Pengtao Xie

发表机构 * UC San Diego（加州大学圣迭戈分校）； University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

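AI summary: To address heterogeneous and unreliable prompt-injection detectors, proposes the SCOUT framework, which predicts each detector's per-sample reliability and latency and allocates detectors dynamically, trading off safety against efficiency.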

Comments We propose SCOUT, a detector allocation framework that predicts each detector's accuracy and latency on a given input before running it, letting operators control the safety-utility trade-off with a single threshold and route to an LLM judge only when needed

详情

AI中文摘要

提示注入检测器是异构的：每个检测器在不同攻击切片上表现强劲，但没有一个始终可靠。然而现有系统仍将检测视为固定的单检测器流水线，将每个请求提交给一个检测器的盲点。我们将防御重新定义为检测器分配：给定一个异构池，决定每个请求运行哪些检测器以及是否升级到LLM法官。我们的框架SCOUT（可扩展且可控的结果预测用于不确定性感知分诊）通过预测每个检测器在类似历史输入上的样本级可靠性和延迟，使这一决策动态化，并向操作员暴露一个单一的安全-效用阈值（其中效用包含良性通过率和挂钟时间）。为了评估这一设置，我们构建了SCOUT-450基准，该基准捕捉了旧提示注入集未充分代表的、结构复杂的面向代理的注入。在SCOUT-450上，相对于始终开启的GPT-4o法官，安全导向的工作点将攻击成功率降低46%，总挂钟时间降低40%，同时良性效用下降5.1个百分点。SCOUT还迁移到三个外部基准（BIPIA、IPI和IHEval），改善了安全-效用前沿。

英文摘要

Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector's blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector's per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.


2606.14956 2026-06-16 cs.LG 新提交

A Comparative Study of Graph Neural Network Layer Selection for Interaction Modelling in Driving Trajectory Prediction

图神经网络层选择用于驾驶轨迹预测中交互建模的比较研究

George Daoud, Mohamed El-Darieby

发表机构 * Ontario Tech University（安大略理工大学）； Assiut University（艾斯尤特大学）

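AI summary: Compares 19 graph neural network layer types by their spatial and temporal processing capabilities for trajectory prediction, finds that ARMA, Chebyshev, and topology-aware layers perform best, and distills design principles: sum-based aggregation, multi-head attention, and distinct weights for different hop distances.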

Comments 6 pages, 1 figure

详情

Journal ref: The IEEE Intelligent Vehicles Symposium (IEEE IV 2026)

AI中文摘要

自动驾驶系统依赖精确的轨迹预测来规划安全高效的移动。图神经网络（GNN）已成为对道路智能体间时空交互建模的一种有前景的方法。然而，为轨迹预测设计GNN架构仍缺乏标准化，关于哪些图层能有效捕捉空间交互和时间动态的指导很少。本文对19种图层类型进行了详细的比较研究，重点关注它们的空间和时间处理能力，以发现最有效的轨迹预测架构。在所探索的超参数设置中，我们突出了五种突出的图层组合，其中ARMA、Chebyshev和拓扑感知层始终优于其他层。除了性能指标外，我们的发现还产生了实用的设计原则：基于和的聚合比基于均值的方法更有效，多头注意力机制能够实现更丰富的交互，为不同跳距分配不同权重显著提高了预测精度。这些发现为设计更可解释和有效的轨迹预测模型提供了有用的指导。

英文摘要

Autonomous driving systems rely on precise trajectory prediction to plan safe and efficient movement. Graph Neural Networks (GNNs) have become a promising approach for modelling spatiotemporal interactions among road agents. However, designing GNN architectures for trajectory prediction remains non-standardized, with little guidance on which graph layers effectively capture spatial interactions and temporal dynamics. This paper offers a detailed comparative study of 19 graph layer types, focusing on their spatial and temporal processing capabilities to discover the most effective architectures for trajectory prediction. Within the explored hyperparameter setting, we highlight five standout layer combinations, with ARMA, Chebyshev, and topology-aware layers consistently performing better than others. Beyond performance metrics, our findings yield practical design principles: sum-based aggregation is more effective than mean-based methods, multi-head attention mechanisms enable richer interactions, and assigning different weights to different hop distances significantly improves prediction accuracy. These findings offer useful guidance for designing more interpretable and effective trajectory prediction models.


2606.16611 2026-06-16 cs.LG 新提交

TCHG: Tri-Trust Conditioned Heterogeneous Graph Learning for Reliable Dynamic Trust Prediction

TCHG：基于三重信任条件异构图学习的可靠动态信任预测

Bohao Liao, Boyu Deng, Qipeng Song, Jieling Wang, Jingchao Wang

发表机构 * Xidian University（西安电子科技大学）； Tsinghua University（清华大学）

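AI summary: Proposes the TCHG framework, which decomposes trust evidence into three channels (entity reliability, interaction-behavior reliability, and contextual trust) that respectively control message admission, propagation strength, and mode selection in graph propagation, and uses temporal states with non-uniform decay to handle multi-scale evolution for reliable dynamic trust prediction.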

Comments 18 pages, 10 figures, 13 tables

详情

AI中文摘要

信任预测推断潜在的用户-用户信任关系，为社会推荐、虚假评论与操纵检测以及风险识别提供重要支持。图神经网络因其学习网络结构和复杂信任依赖的能力，已成为信任预测的主流方法。然而，现有方法通常依赖信任信号的统一表示，未将异质信任证据分解为独立的证据通道，未能利用不同证据通道在信任建模中应发挥的不同作用。为弥补这一不足，本文认为信任证据不应被视为无差别的输入，而应分解并用作图传播的功能控制因子。我们提出TCHG，一种三重信任条件异构图学习框架，将信任证据分解为三个通道，并赋予它们在传播中不同的功能角色：实体可靠性控制消息准入，交互行为可靠性调节传播强度，上下文信任通过上下文条件算子选择调整传播模式。由于三个证据通道以不同时间尺度演化，TCHG维护具有非均匀衰减率的独立时间状态，以防止快速变化的上下文信号覆盖缓慢积累的实体可靠性。它进一步预测信任概率并校准输出概率，提高稀疏或冲突证据下的预测置信度。在多个公开信任数据集上的大量实验表明，与代表性信任预测和异构图基线方法相比，TCHG实现了有效且可靠的信任预测。

英文摘要

Trust prediction infers latent user-user trust relations and provides important support for social recommendation, fake-review and manipulation detection, and risk identification. Graph neural networks have become a prominent approach to trust prediction because of their ability to learn network structures and complex trust dependencies. However, existing methods often rely on a unified representation of trust signals and do not disentangle heterogeneous trust evidence into separate evidence channels, failing to exploit the distinct roles that different evidence channels should play during trust modeling. To address this gap, this paper argues that trust evidence should not be treated as an undifferentiated input, but should be decomposed and used as functional control factors over graph propagation. We propose TCHG, a tri-trust conditioned heterogeneous graph learning framework that decomposes trust evidence into three channels and assigns them distinct functional roles in propagation: entity reliability governs message admission, interaction-behavior reliability modulates propagation strength, and contextual trust adjusts the propagation mode through context-conditioned operator selection. Since the three evidence channels evolve at different temporal scales, TCHG maintains independent temporal states with non-uniform decay rates to prevent rapidly changing contextual signals from overwriting slowly accumulated entity reliability. It further predicts trust probability and calibrates the output probability, improving predictive confidence under sparse or conflicting evidence. Extensive experiments on multiple public trust datasets show that TCHG achieves effective and reliable trust prediction compared with representative trust prediction and heterogeneous graph baselines.


2606.16990 2026-06-16 cs.LG math.AT 新提交

Analytic Torsion and Spectral Gap Capture Persistent-Laplacian Performance

解析挠率和谱间隙捕捉持久拉普拉斯算子的性能

Jernej Grlj, Aaron D. Lauda

发表机构 * University of Southern California（南加州大学）

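AI summary: Proposes replacing the full spectrum with a compact spectral representation built from three invariants (Betti numbers, spectral gap, and analytic torsion), achieving equal or better performance on multiple datasets while significantly reducing computational cost and avoiding high-frequency noise.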

Comments 13 pages

2606.14892 2026-06-16 cs.AI cs.LG cs.SI stat.ML 交叉投稿

Relational Structural Causal Models

关系结构因果模型

Adiba Ejaz, Elias Bareinboim

发表机构 * Causal Artificial Intelligence Lab, Columbia University（哥伦比亚大学因果人工智能实验室）

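AI summary: Proposes relational structural causal models, which extend structural causal models to settings where objects and relations vary; relational causal graphs and symbolic identification criteria enable identification of causal and observational queries about unseen combinations, and the proposed relational neural causal models outperform non-relational baselines on traffic scenes.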

Comments Proceedings of the Forty-Third International Conference on Machine Learning

详情

AI中文摘要

人工智能必须拥有一个因果的环境模型，支持关于干预和反事实的推理，同时具有组合性，支持对未见过的对象组合进行泛化。在这项工作中，我们正式研究了何时以及如何学习这样的模型。我们开发了关系结构因果模型，将结构因果模型（Pearl 2009）扩展到对象及其关系变化的场景。首先，我们展示了在没有进一步假设的情况下，不仅因果查询，而且关于未见对象组合的观测查询的答案也无法被识别。为了实现这种识别——包括在存在未观测混杂的情况下——我们定义了关系因果图并推导了符号识别准则。最后，我们提出了关系神经因果模型，这是一种可证明正确的方法，在具有不同汽车、信号和行人的模拟交通场景中优于非关系基线。

英文摘要

An artificial intelligence must have a model of its environment that is causal, supporting reasoning about interventions and counterfactuals, and also combinatorial, supporting generalization to unseen combinations of objects. In this work, we formally study when and how such a model can be learned. We develop relational structural causal models, extending structural causal models (Pearl 2009) to settings where objects and their relations vary. First, we show that answers to not only causal but also observational queries about unseen combinations of objects cannot be identified without further assumptions. To enable such identification, including in the presence of unobserved confounding, we define relational causal graphs and derive symbolic identification criteria. Finally, we propose relational neural causal models, a provably correct approach that outperforms non-relational baselines on simulated traffic scenes with varying cars, signals, and pedestrians.


2407.07357 2026-06-16 cs.LG q-bio.MN 版本更新

A polarity-aware multi-relational model for the signed interaction prediction in biological networks

面向生物网络中符号交互预测的极性感知多关系模型

Ziye Zhou, Meijie Wang, Lun Yu

发表机构 * Metanovas Biotech, Inc.（MetaNovas生物技术公司）

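AI summary: Proposes PAMR, a polarity-aware multi-relational model that combines graph convolutional networks with tensor decomposition and a conflict-aware sampling strategy to predict polar (activation/inhibition) and non-polar chemical-gene interactions, outperforming baselines in classification accuracy and polarity discrimination.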

详情

AI中文摘要

预测生物网络中的符号交互对于理解药物机制和促进药物再利用至关重要。尽管深度图模型在建模复杂生物系统方面已展现出成功，但现有方法往往无法区分正负交互，限制了其在精确药理学预测中的实用性。在本研究中，我们提出了一种新颖的深度图模型PAMR（极性感知多关系模型），旨在预测极性（如激活、抑制）和非极性（如结合、影响）的化学-基因交互。我们的模型将图卷积网络与张量分解相结合以增强特征表示，并引入了一种冲突感知采样策略来解决极性歧义。我们引入了新的评估指标——极性区分得分（PDS）和CP@100，以评估模型区分交互类型的能力。实验结果表明，PAMR优于基线模型，实现了更高的分类精度和改进的极性边区分能力。具体而言，PAMR-CL达到了0.9072的宏AUROC和0.974的CP@100，超越了RGCN、GraphSAGE、TransE和BioNet基线。一项关于尼古丁的案例研究进一步识别了两个新的化学-基因抑制关联，即S100A6和SPP1，这些关联得到了独立实验文献的证实。此外，我们分析了子图成分对预测性能的影响，揭示了额外的网络结构并不总能提高准确性。这些发现强调了极性感知建模在药物发现和网络药理学中的重要性，为极性感知的化学-基因交互预测和网络药理学分析提供了一个可扩展的计算框架。

英文摘要

Predicting signed interactions in biological networks is crucial for understanding drug mechanisms and facilitating drug repurposing. While deep graph models have demonstrated success in modeling complex biological systems, existing approaches often fail to distinguish between positive and negative interactions, limiting their utility for precise pharmacological predictions. In this study, we propose a novel deep graph model, PAMR (polarity-aware multi-relational model), designed to predict both polar (e.g., activation, inhibition) and non-polar (e.g., binding, affect) chemical-gene interactions. Our model integrates graph convolutional networks with tensor decomposition to enhance feature representation and incorporates a conflict-aware sampling strategy to resolve polarity ambiguities. We introduce new evaluation metrics, polarity discrimination score (PDS) and CP@100, to assess the model's ability to differentiate interaction types. Experimental results demonstrate that PAMR outperforms baseline models, achieving superior classification accuracy and improved discrimination of polar edges. Specifically, PAMR-CL attains a Macro AUROC of 0.9072 and CP@100 of 0.974, surpassing RGCN, GraphSAGE, TransE, and BioNet baselines. A case study on nicotine further identifies two novel chemical-gene suppression links, S100A6 and SPP1, that are corroborated by independent experimental literature. Furthermore, we analyze the impact of subgraph components on predictive performance, revealing that additional network structures do not always enhance accuracy. These findings highlight the importance of polarity-aware modeling in drug discovery and network pharmacology, providing a scalable computational framework for polarity-aware chemical-gene interaction prediction and network pharmacology analysis.

2502.17614 2026-06-16 cs.LG cs.SI 版本更新

Scalable Graph Condensation with Evolving Capabilities

具有演化能力的可扩展图压缩

Shengbo Gong, Mohammad Hashemi, Juntong Ni, Carl Yang, Wei Jin

发表机构 * Emory University（埃默里大学）

AI总结提出GECC框架，通过类级聚类和继承先前压缩结果，实现大规模动态图数据的可扩展压缩，性能优于现有方法且加速约1000倍。

DOI: 10.1145/3770854.3780217
Journal ref: Page 314-323, Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, 2026

AI中文摘要

图数据的快速增长带来了显著的可扩展性挑战，因为大多数图算法的开销随图规模呈二次方增长。为了缓解这些问题，图压缩（GC）方法被提出，用于从大图中学习一个小图，从而加速下游任务。然而，现有方法都以训练集静态为关键假设，这与现实世界图数据固有的动态和演化特性相冲突。这一局限导致在压缩不断增长的训练集时效率低下。在本文中，我们提出了GECC（图演化聚类压缩），一种可扩展的图压缩方法，旨在处理大规模和演化的图数据，并在数据流到来时高效更新压缩图而无需昂贵的重新训练。GECC通过对聚合特征执行类级聚类，采用了一种可追踪且高效的方法。此外，当压缩图扩展时，它可以继承先前的压缩结果作为聚类中心，从而获得演化能力。该方法具有坚实的理论基础，并展示了优越的经验性能。包括真实场景在内的综合实验表明，GECC在实现比大多数最先进图压缩方法更好性能的同时，在大数据集上实现了约1000倍的加速。

英文摘要

The rapid growth of graph data creates significant scalability challenges, as most graph algorithms scale quadratically with graph size. To mitigate these issues, Graph Condensation (GC) methods have been proposed to learn a small graph from a larger one, accelerating downstream tasks. However, existing approaches critically assume a static training set, which conflicts with the inherently dynamic and evolving nature of real-world graph data. This limitation leads to inefficiencies when condensing growing training sets. In this paper, we introduce GECC (Graph Evolving Clustering Condensation), a scalable graph condensation method designed to handle large-scale and evolving graph data, updating the condensed graph as new data arrives without costly retraining. GECC employs a traceable and efficient approach by performing class-wise clustering on aggregated features. Furthermore, it can inherit previous condensation results as clustering centroids when the condensed graph expands, thereby attaining an evolving capability. This methodology is supported by robust theoretical foundations and demonstrates superior empirical performance. Comprehensive experiments, including a real-world scenario, show that GECC achieves better performance than most state-of-the-art graph condensation methods while delivering a speedup of around 1000× on large datasets.
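The abstract names two ingredients, class-wise clustering on aggregated features and reuse of earlier centroids when the condensed graph grows, without giving implementation details. The following is only a minimal sketch of that idea, assuming row-normalized neighbor averaging for the aggregation and plain Lloyd's k-means for the clustering. The function names `aggregate`, `kmeans`, and `condense` are hypothetical and are not taken from the GECC code.

```python
import numpy as np

def aggregate(adj, feats, hops=2):
    """Average features over neighbors (with self-loops) for `hops` rounds."""
    a = np.asarray(adj, dtype=float) + np.eye(len(adj))
    a = a / a.sum(axis=1, keepdims=True)  # row-normalize
    out = np.asarray(feats, dtype=float)
    for _ in range(hops):
        out = a @ out
    return out

def kmeans(x, k, init=None, iters=20, seed=0):
    """Plain Lloyd's k-means; `init` warm-starts the first centroids."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    if init is not None:
        m = min(len(init), k)
        centers[:m] = init[:m]  # inherit earlier condensation results
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return centers

def condense(agg, labels, per_class, prev=None):
    """Cluster each class separately; centroids act as condensed node features."""
    out = {}
    for c in np.unique(labels):
        xc = agg[labels == c]
        k = min(per_class, len(xc))
        init = None if prev is None else prev.get(int(c))
        out[int(c)] = kmeans(xc, k, init=init)
    return out
```

Calling `condense` again with a larger `per_class` and `prev` set to the earlier result warm-starts the new clustering from the old centroids, which is the assumed reading of "inheriting previous condensation results" here.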

2602.10031 2026-06-16 cs.LG 版本更新

Graph Learning Should Move Beyond Restrictive Views of Spectral and Message-Passing GNNs

图学习应超越对谱图神经网络和消息传递图神经网络的狭隘观点

Antonis Vasileiou, Juan Cervino, Pascal Frossard, Charilaos I. Kanatsoulis, Christopher Morris, Michael T. Schaub, Pierre Vandergheynst, Zhiyang Wang, Guy Wolf, Ron Levie

发表机构 * RWTH Aachen University（亚琛工业大学）； Massachusetts Institute of Technology（麻省理工学院）； École Polytechnique Fédérale de Lausanne（洛桑联邦理工学院）； Stanford University（斯坦福大学）； University of California San Diego（加州大学圣地亚哥分校）； Univ. de Montréal（蒙特利尔大学）； Mila（Mila人工智能研究所）； Technion – Israel Institute of Technology（以色列理工学院）

AI总结本文澄清了谱图神经网络与消息传递图神经网络的异同，提出基于特征基对称性的谱GNN精确定义，并倡导统一理论框架以推动图学习发展。

Comments 44 pages, 1 figure

AI中文摘要

图神经网络（GNN）通常分为消息传递神经网络（MPNN）和谱图神经网络，反映了机器学习和信号处理中两个相对独立的研究传统。虽然MPNN有精确的定义，但对于什么构成谱GNN的映射，没有广泛接受的标准。大多数现有工作将谱GNN限制为基于线性谱滤波器的分层架构。在此限制下，我们表明谱GNN和空间GNN具有大致相当的表达能力。为了促进该领域的进步，我们基于特征基对称性提出了谱GNN的精确定义，与通过邻域置换对称性定义的MPNN形成对比。我们进一步论证这两种视角提供了互补的优势。MPNN借助逻辑和图同构的工具，为离散结构和表达能力分析提供了一种自然的表述语言，而谱视角为理解平滑、瓶颈、稳定性和社区结构提供了原则性工具。总体而言，我们认为通过澄清这些视角之间的异同并迈向统一的理论框架，将加速图学习的进展。

英文摘要

Graph neural networks (GNNs) are commonly divided into message-passing neural networks (MPNNs) and spectral GNNs, reflecting two largely separate research traditions in machine learning and signal processing. While MPNNs have a precise definition, there is no widely accepted criterion for what makes a mapping a spectral GNN. Most existing work restricts spectral GNNs to layered architectures based on linear spectral filters. Under this restriction, we show that spectral and spatial GNNs have largely equivalent expressive power. To promote progress in the field, we propose a precise definition of spectral GNNs based on eigenbasis symmetries, in contrast to the definition of MPNNs via neighborhood permutation symmetries. We further argue that the two perspectives offer complementary strengths. MPNNs provide a natural language for discrete structure and expressivity analysis through tools from logic and graph isomorphism, while the spectral perspective offers principled tools for understanding smoothing, bottlenecks, stability, and community structure. Overall, we argue that progress in graph learning will be accelerated by clarifying the similarities and differences between these perspectives and by moving toward a unified theoretical framework.

2605.26290 2026-06-16 cs.LG 版本更新

Dynamic Link Prediction with Temporally Enhanced Signed Graph Neural Networks

基于时间增强符号图神经网络的动态链接预测

Derek Regier, Andrew Polyak, Aresh Dadlani, Khosro Salmani

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出一种模块化时间增强框架，通过历史上下文集成模块（HCIM）结合可学习的近因感知时间加权、LSTM嵌入轨迹建模和多头时间注意力，在符号图神经网络中捕获短期和长期符号交互动态，并在SE-SGformer上实例化，实验证明在多个真实和合成时间符号网络上性能显著提升。

Comments This manuscript has been withdrawn by the authors due to errors discovered in the implementation and experimental evaluation. These errors materially affect the reported results and conclusions. The authors therefore do not recommend using or citing this work

AI中文摘要

时间符号网络（TSNs）模拟了社交媒体分析、信任与声誉系统以及金融交易网络等应用中出现的合作与对抗关系的时间演化。尽管图神经网络（GNNs）在静态或无符号链接预测中表现良好，但由于符号关系、演化结构和平衡理论约束的相互作用，在时间符号图中的有效学习仍然具有挑战性。为了解决这一差距，我们提出了一种用于符号GNN的模块化时间增强框架，该框架将历史上下文集成到原本静态的架构中。该框架引入了一个历史上下文集成模块（HCIM），该模块结合了可学习的近因感知时间加权、基于LSTM的嵌入轨迹建模和多头时间注意力，以捕获短期和长期的符号交互动态。历史信息通过全局或节点自适应权重与当前节点表示融合，使得与架构无关的框架能够适应异质的时间行为。我们在自解释符号图变换器（SE-SGformer）上实例化了该方法，在保持可解释性的同时扩展了其时间感知能力。在真实和合成TSN（包括Bitcoin OTC、Bitcoin Alpha、Reddit和小世界网络模型）上的实验表明，与静态基线相比，该方法取得了一致且统计显著的改进。

英文摘要

Temporal signed networks (TSNs) model the time evolution of cooperative and adversarial relationships that arise in applications such as social media analysis, trust and reputation systems, and financial transaction networks. While graph neural networks (GNNs) perform well for static or unsigned link prediction, effective learning in temporal signed graphs remains challenging due to the interaction of signed relations, evolving structure, and balance-theoretic constraints. To address this gap, we propose a modular temporal enhancement framework for signed GNNs that integrates historical context into otherwise static architectures. The framework introduces a Historical Context Integration Module (HCIM) that combines learnable recency-aware temporal weighting, LSTM-based embedding trajectory modeling, and multi-head temporal attention to capture both short- and long-term signed interaction dynamics. Historical information is fused with current node representations using either global or node-adaptive weighting, allowing the architecture-agnostic framework to accommodate heterogeneous temporal behaviors. We instantiate the approach on the Self-Explainable Signed Graph Transformer (SE-SGformer), preserving interpretability while extending it with temporal awareness. Experiments on real-world and synthetic TSNs, including Bitcoin OTC, Bitcoin Alpha, Reddit, and small-world network models, demonstrate consistent and statistically significant improvements over the static baseline.

2505.13986 2026-06-16 math.OC cs.AI cs.LG 版本更新

RIDGECUT: Learning Graph Partitioning with Rings and Wedges

RIDGECUT：基于环与楔形结构的图分割学习

Qize Jiang, Angelo Zangari, Linsey Pang, Alice Gatti, Mahima Aggarwal, Giovanna Vantini, Xiaosong Ma, Weiwei Sun, Sourav Medya, Sanjay Chawla

发表机构 * College of Computer Science and Artificial Intelligence, Shanghai Key Laboratory of Data Science（计算机科学与人工智能学院，上海数据科学重点实验室）； University of Illinois Chicago（伊利诺伊大学芝加哥分校）； PayPal Inc.（PayPal公司）； Center for AI Safety（人工智能安全中心）； Qatar Computing Research Institute（卡塔尔计算研究所）； Hamad Bin Khalifa University（哈马德·本·哈利法大学）； Computing and Mathematical Sciences (CMS) Division（计算与数学科学（CMS）部门）； Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)（穆罕默德·本·扎耶德人工智能大学（MBZUAI））； Fudan University（复旦大学）

AI总结提出RidgeCut框架，通过将动作空间约束为环与楔形结构，利用强化学习解决归一化割问题，在交通网络上实现结构感知分割，降低归一化割值并展现强泛化能力。

Comments Extended version of the paper accepted at KDD 2026

AI中文摘要

强化学习通过学习跨实例泛化的启发式方法，在图的组合优化问题上展现出潜力。然而，如何有效地将领域知识融入强化学习框架进行图分割仍然具有挑战性，因为现有方法通常依赖于无约束的节点级动作，导致动作空间大且探索效率低。在本文中，我们提出RidgeCut，一种强化学习框架，通过约束动作空间来在归一化割问题中实现结构感知分割。以交通网络为动机示例，我们引入了一个利用城市道路拓扑领域知识的新概念——其中自然分割通常呈现为同心环和径向楔形。通过将图转换为线性或圆形表示，我们的方法能够使用基于变换器的策略并通过近端策略优化进行高效学习。RidgeCut产生的分割不仅与预期的空间布局一致，而且与现有方法相比实现了更低的归一化割值。在合成和真实交通图上的实验结果表明，RidgeCut始终优于现有方法，同时在不同规模的图之间展现出很强的归纳泛化能力。尽管以道路网络为动机，RidgeCut为将结构先验嵌入到图分割的强化学习框架中提供了一种通用机制。

英文摘要

Reinforcement learning (RL) has shown promise for combinatorial optimization problems on graphs by learning heuristics that generalize across instances. However, effectively incorporating domain knowledge into RL frameworks for graph partitioning remains challenging, as existing approaches typically rely on unconstrained node-level actions that lead to large action spaces and inefficient exploration. In this paper, we propose RidgeCut, an RL framework that constrains the action space to enforce structure-aware partitioning in the Normalized Cut problem. Using transportation networks as a motivating example, we introduce a novel concept that leverages domain knowledge about urban road topology -- where natural partitions often take the form of concentric rings and radial wedges. By transforming the graph into linear or circular representations, our method enables the use of transformer-based policies and efficient learning via Proximal Policy Optimization. The resulting partitions from RidgeCut are not only aligned with expected spatial layouts but also achieve lower normalized cuts compared to existing methods. Experimental results on synthetic and real-world traffic graphs demonstrate that RidgeCut consistently outperforms existing methods while exhibiting strong inductive generalization across graph sizes. Although motivated by road networks, RidgeCut provides a general mechanism for embedding structural priors into RL frameworks for graph partitioning.
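For readers unfamiliar with the objective, the Normalized Cut of a partition sums, over each block, the weight of edges leaving the block divided by the block's total degree (its volume). The sketch below computes that quantity for a symmetric weighted adjacency matrix and shows one way a ring-shaped partition could be derived from node coordinates. The helper `ring_partition` and its parameters are hypothetical illustrations and do not reproduce RidgeCut's action space or policy.

```python
import numpy as np

def normalized_cut(adj, part):
    """Sum over blocks of cut(block, rest) / vol(block)."""
    adj = np.asarray(adj, dtype=float)
    part = np.asarray(part)
    deg = adj.sum(axis=1)
    total = 0.0
    for p in np.unique(part):
        inside = part == p
        cut = adj[inside][:, ~inside].sum()  # weight of edges leaving the block
        vol = deg[inside].sum()              # total degree of the block
        if vol > 0:
            total += cut / vol
    return total

def ring_partition(coords, center, radius):
    """Label nodes 0 if within `radius` of `center`, else 1 (hypothetical helper)."""
    offset = np.asarray(coords, dtype=float) - np.asarray(center, dtype=float)
    return (np.linalg.norm(offset, axis=1) > radius).astype(int)
```

A wedge partition could be built the same way from the polar angle of each node instead of its radius.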

2604.03496 2026-06-16 cs.AI cs.IR cs.LG 版本更新

Beyond Predefined Schemas: TRACE-KG for Context-Enriched Knowledge Graph Generation

超越预定义模式：TRACE-KG 用于上下文增强的知识图谱生成

Mohammad Sadeq Abolhasani, Yang Ba, Yixuan He, Rong Pan

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出 TRACE-KG 框架，通过数据驱动模式联合构建上下文增强的知识图谱和归纳模式，无需预定义本体，解决长技术文档中图谱碎片化问题。

Comments Accepted at Graph Foundation Models at ICML 2026

AI中文摘要

知识图谱生成通常依赖于预定义本体或免模式提取。本体驱动的流水线强制执行一致的类型，但需要昂贵的模式设计和维护，而免模式方法通常产生碎片化的图谱，全局组织薄弱，尤其是在信息密集、依赖上下文的冗长技术文档中。我们提出 TRACE-KG（Text-dRiven schemA for Context-Enriched Knowledge Graphs），一个无需假设预定义本体即可联合构建上下文增强的知识图谱和归纳模式的框架。TRACE-KG 通过结构化限定符捕获条件关系，并使用数据驱动模式组织实体和关系，该模式作为可重用的语义支架，同时保持对源证据的完全可追溯性。实验表明，TRACE-KG 生成结构连贯、可追溯的知识图谱，并为本体驱动和免模式构建流水线提供了实用的替代方案。

英文摘要

Knowledge graph generation typically relies either on predefined ontologies or on schema-free extraction. Ontology-driven pipelines enforce consistent typing but require costly schema design and maintenance, whereas schema-free methods often produce fragmented graphs with weak global organization, especially in long technical documents with dense, context-dependent information. We propose TRACE-KG (Text-dRiven schemA for Context-Enriched Knowledge Graphs), a framework that jointly constructs a context-enriched knowledge graph and an induced schema without assuming a predefined ontology. TRACE-KG captures conditional relations through structured qualifiers and organizes entities and relations using a data-driven schema that serves as a reusable semantic scaffold while preserving full traceability to the source evidence. Experiments show that TRACE-KG produces structurally coherent, traceable knowledge graphs and offers a practical alternative to both ontology-driven and schema-free construction pipelines.

2606.14900 2026-06-16 cs.LG 新提交

GRASP: Gradient-Aligned Sequential Parameter Transfer for Memory-Efficient Multi-Source Learning

GRASP: 梯度对齐的序列参数迁移用于内存高效的多源学习

Mary Isabelle Wisell, Nicholas Jacobs, Aayush Manandhar, Salimeh Yasaei Sekeh

发表机构 * San Diego State University（圣地亚哥州立大学）； University of Utah（犹他大学）； University of Maine（缅因大学）

AI总结提出GRASP方法，通过序列处理、参数梯度对齐和迭代微调，在O(1)内存下实现多源知识融合，在三个持续学习基准上平均准确率93.5%，优于集成方法的71.7%。

AI中文摘要

多源迁移学习面临一个根本的可扩展性瓶颈：现有方法要么在参数融合时需要将所有K个源模型同时加载到内存中，需要O(K)内存，要么在推理时部署所有模型，使得生产部署不可行。我们提出GRASP（梯度对齐的序列参数迁移），通过三个关键创新实现了优越的知识集成，同时保持O(1)内存消耗：（1）序列处理，一次将一个源合并到正在演化的目标模型中；（2）参数级梯度对齐，仅选择性迁移其优化方向与目标域对齐的参数，避免负迁移；（3）迭代微调，在集成下一个源之前适应迁移的知识。在三个持续学习基准（Yearbook、CLEAR-10、CLEAR-100）上进行的广泛实验，涵盖了10到108年的时间分布偏移和四种架构（1.3M到25.6M参数），表明GRASP在所有数据集和架构上实现了93.5%的平均准确率，而集成方法为71.7%，同时仅需要恒定内存，而标准多源融合需要K个模型。关键的是，GRASP的序列化设计无需保留先前合并的模型，可扩展到任意多的源而不增加内存，使其特别适合资源受限的部署和不断演化的源域。

英文摘要

Multi-source transfer learning faces a fundamental scalability bottleneck: existing approaches require either loading all K source models into memory simultaneously during parameter fusion, requiring O(K) memory, or deploying all models at inference time, making production deployment infeasible. We propose GRASP (Gradient-Aligned Sequential Parameter Transfer), which achieves superior knowledge integration while maintaining O(1) memory consumption through three key innovations: (1) sequential processing that merges one source at a time into an evolving target model, (2) parameter-wise gradient alignment that selectively transfers only parameters whose optimization directions align with the target domain, avoiding negative transfer, and (3) iterative fine-tuning that adapts transferred knowledge before integrating the next source. Extensive experiments across three continual learning benchmarks (Yearbook, CLEAR-10, CLEAR-100) spanning 10- to 108-year temporal distribution shifts and four architectures (1.3M to 25.6M parameters) demonstrate that GRASP achieves 93.5% mean accuracy over all datasets and architectures, compared with 71.7% for the ensemble method, while requiring only constant memory versus K models for standard multi-source fusion. Critically, GRASP's sequential design does not need to retain previously merged models and scales to arbitrarily many sources without memory growth, making it uniquely suitable for resource-constrained deployment and continually evolving source domains.
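The three steps listed in the abstract can be illustrated with a toy example. The abstract does not define the alignment test, so the sketch below assumes one plausible reading: a parameter is transferred only when moving it toward the source value is a descent direction for the target loss. The names `aligned_mask`, `merge_one`, and `sequential_merge`, the blend factor `alpha`, and the plain gradient-descent fine-tuning are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def aligned_mask(target, source, target_grad):
    """True where stepping toward the source decreases the target loss to first order."""
    return (source - target) * (-target_grad) > 0

def merge_one(target, source, target_grad, alpha=0.5):
    """Blend aligned parameters toward the source; leave the rest untouched."""
    mask = aligned_mask(target, source, target_grad)
    merged = np.where(mask, target + alpha * (source - target), target)
    return merged, mask

def sequential_merge(target, sources, grad_fn, alpha=0.5, lr=0.1, finetune_steps=5):
    """Process sources one at a time, so only the target and one source are held."""
    for source in sources:  # `sources` may be a generator that loads models lazily
        target, _ = merge_one(target, source, grad_fn(target), alpha)
        for _ in range(finetune_steps):  # adapt before integrating the next source
            target = target - lr * grad_fn(target)
    return target
```

Because each source is consumed and discarded inside the loop, peak memory in this sketch is independent of the number of sources, which mirrors the constant-memory property the abstract claims.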

2606.15512 2026-06-16 cs.LG physics.plasm-ph 新提交

Towards Data-Efficient Cross-Device Generalization of Grad-Shafranov Equilibria via Transfer Learning Neural Operator

通过迁移学习神经算子实现Grad-Shafranov平衡的数据高效跨设备泛化

Jay Phil Yoo, William Howes, Yashika Ghai, Kazuma Kobayashi, Souvik Chakraborty, Syed Bahauddin Alam

发表机构 * Grainger College of Engineering, Nuclear, Plasma & Radiological Engineering Department, University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校格兰杰工程学院核、等离子体与放射工程系）； Fusion Energy Division, Oak Ridge National Lab（橡树岭国家实验室聚变能源部）； National Center for Supercomputing Applications（国家超级计算应用中心）； Department of Applied Mechanics, Indian Institute of Technology Delhi（印度理工学院德里分校应用力学系）； Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi（印度理工学院德里分校亚迪人工智能学院）

AI总结提出跨设备神经算子框架，将平衡重建转化为算子学习问题，通过多几何预训练实现数据高效迁移，Wavelet Neural Operator在100个目标样本下达到低于4%的L2误差。

AI中文摘要

磁流体动力学平衡的实时重建对于磁约束聚变中的等离子体成形、稳定性评估和反馈控制至关重要。然而，Grad-Shafranov平衡计算在很大程度上仍然是设备特定的和迭代的，限制了它们在延迟受限的控制环境中的应用。现有的神经方法可以加速单个平衡预测，但它们通常无法提供跨变化的等离子体边界或托卡马克几何形状的可重用模型。在这里，我们展示了平衡重建可以重新表述为跨设备算子学习问题。我们开发了一个特定领域的神经算子框架，将几何和剖面参数直接映射到极向磁通场，用摊销的算子推理取代重复的按需求解计算。使用可解析处理的Solov'ev族作为受控的Grad-Shafranov测试平台，我们在八种几何上不同的类托卡马克配置中生成平衡，并在四种迁移学习策略下对五种神经算子架构进行基准测试。单几何预训练对未见设备迁移效果差，而多几何预训练能够实现数据高效的适应。Wavelet Neural Operator在跨几何性能上最强，在100个标记目标平衡下达到低于4%的平均相对L2误差，在全微调下低于2%。预测的磁场满足无散约束至数值精度，四种架构实现毫秒或亚毫秒级推理。这些结果确定了神经算子预训练是实现跨聚变设备配置的可重用实时平衡推理的途径。

英文摘要

Real-time reconstruction of magnetohydrodynamic equilibria is essential for plasma shaping, stability assessment and feedback control in magnetic confinement fusion. However, Grad-Shafranov equilibrium calculations remain largely device-specific and iterative, limiting their use in latency-constrained control settings. Existing neural approaches can accelerate individual equilibrium predictions, but they do not generally provide reusable models across changing plasma boundaries or tokamak geometries. Here we show that equilibrium reconstruction can be recast as a cross-device operator learning problem. We develop a domain-specific neural operator framework that maps geometry and profile parameters directly to the poloidal flux field, replacing repeated solve-on-demand computation with amortized operator inference. Using the analytically tractable Solov'ev family as a controlled Grad-Shafranov testbed, we generate equilibria across eight geometrically distinct tokamak-like configurations and benchmark five neural operator architectures under four transfer-learning strategies. Single-geometry pretraining gives poor transfer to unseen devices, whereas multi-geometry pretraining enables data-efficient adaptation. The Wavelet Neural Operator gives the strongest cross-geometry performance, reaching mean relative L2 errors below 4% with 100 labelled target equilibria and below 2% with full fine-tuning. The predicted magnetic fields satisfy the divergence-free constraint to numerical precision, and four architectures achieve millisecond or sub-millisecond inference. These results identify neural operator pretraining as a route towards reusable, real-time equilibrium inference across fusion device configurations.

2606.16517 2026-06-16 cs.LG q-bio.QM 新提交

How Post-Training Shapes Biological Reasoning Models

后训练如何塑造生物学推理模型

Lukas Fesser, Hanlin Zhang, Michelle M. Li, Eric Wang, Bryan Perozzi, Shekoofeh Azizi, Sham M. Kakade, Marinka Zitnik

发表机构 * Harvard University（哈佛大学）； Google DeepMind（谷歌DeepMind）； Google Research（谷歌研究院）

AI总结研究后训练各阶段（CPT、SFT、RL）对生物学推理模型领域内和领域外性能的影响，发现SFT提升领域内性能但损害泛化，RL可部分恢复泛化，最佳策略是短SFT加长RL。

AI中文摘要

生物学科学推理模型将语言模型与在多模态生物数据（包括DNA、RNA和蛋白质）上训练的基础模型相结合。这些模型通过后训练构建，然而每个阶段如何塑造推理和泛化能力仍知之甚少。我们研究后训练何时提升性能以及何时导致过度专门化。在基因组学、转录组学和蛋白质领域，我们训练并评估了超过100个生物学推理模型，在骨干网络、持续预训练（CPT）、监督微调（SFT）和强化学习（RL）方面进行受控变化，并测量领域内（ID）和领域外（OOD）性能。我们发现每个后训练阶段以不同方式重塑泛化，而非贡献均匀增益。CPT通过使模型与生物语言对齐来提升下游性能。SFT持续提高ID性能，但导致OOD性能早期达到峰值并随着模型拟合训练分布而下降。RL在应用于具有对齐奖励的强SFT检查点时，改善OOD性能并部分恢复泛化。这些结果表明，生物学推理并非随着额外监督或计算而单调提升。相反，性能取决于训练阶段的组合方式。在固定后训练预算下，最强的ID-OOD权衡来自短暂的SFT、更大的RL分配以及各阶段间不对称的适应能力。

英文摘要

Scientific reasoning models for biology combine language models with foundation models trained on multimodal biological data, including DNA, RNA, and proteins. These models are built through post-training, yet how each stage shapes reasoning and generalization remains poorly understood. We study when post-training improves performance and when it induces over-specialization. Across genomics, transcriptomics, and proteins, we train and evaluate more than 100 biological reasoning models under controlled variation in backbone, continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), measuring both in-domain (ID) and out-of-domain (OOD) performance. We find that each post-training stage reshapes generalization in a distinct way rather than contributing uniform gains. CPT improves downstream performance by aligning models with biological language. SFT consistently increases ID performance but causes OOD performance to peak early and decline as models fit the training distribution. RL, when applied to strong SFT checkpoints with aligned rewards, improves OOD performance and partially recovers generalization. These results show that biological reasoning does not improve monotonically with additional supervision or compute. Instead, performance depends on how training stages are composed. Under fixed post-training budgets, the strongest ID-OOD trade-off comes from brief SFT, larger RL allocations, and asymmetric adaptation capacity across stages.

2606.14883 2026-06-16 cs.CV cs.LG 交叉投稿

Understanding Cross-Modal Contributions in Continual Vision-Language Models: A Theoretical Perspective

理解连续视觉-语言模型中的跨模态贡献：一个理论视角

Salimeh Sekeh, Mary Wisell

发表机构 * San Diego State University（圣地亚哥州立大学）

AI总结本文从理论角度分析连续视觉-语言模型中跨模态（视觉-语言）贡献，提出新视角并通过实验验证其有效性，揭示任务顺序和相似性对贡献鲁棒性的影响，提升泛化性能。

AI中文摘要

连续视觉-语言模型通常通过顺序微调来解决；然而，尽管这种范式能够适应新环境（任务），但它本质上以牺牲保持先前获取知识所需的稳定性为代价，强调了先前学习环境（任务）的贡献。虽然现有方法已经充分研究了视觉-语言模型（VLM）中的连续学习和灾难性遗忘，但跨一系列环境的模态特定贡献的理论理解仍然很大程度上未被探索。在本文中，我们提出了一个新的理论视角来理解跨模态（视觉-语言）对连续环境的贡献。我们在大型VLM上实证评估了我们的理论发现，并展示了它们在捕捉环境级跨模态贡献方面的有效性。我们的分析为连续VLM提供了更深入的见解，突出了它们对不同任务顺序和任务间相似性的贡献鲁棒性，以及它们改进的泛化性能。

英文摘要

Continual vision-language models are commonly addressed through sequential fine-tuning; however, although this paradigm enables adaptation to new environments (tasks), it inherently emphasizes the contribution of previously learned environments (tasks) at the expense of the stability required to preserve previously acquired knowledge. While existing approaches have adequately studied continual learning and catastrophic forgetting in vision-language models (VLMs), the theoretical understanding of modality-specific contributions across a sequence of environments remains largely unexplored. In this paper, we present a new theoretical perspective to understand the cross-modal (vision-language) contributions to consecutive environments. We empirically evaluate our theoretical findings on large VLMs and demonstrate their effectiveness in capturing environment-level cross-modal contributions. Our analysis provides deeper insights into continual VLMs, highlighting their contribution robustness to varying task orders and inter-task similarities, and their improved generalization performance.

2606.15117 2026-06-16 cs.MM cs.AI cs.CV cs.LG cs.SD 交叉投稿

Teacher-Student Structure for Domain Adaptation in Ensemble Audio-Visual Video Deepfake Detection

用于集成视听视频深度伪造检测中领域适应的师生结构

Elham Abolhasani, Maryam Ramezani, Hamid R. Rabiee

发表机构 * Department of Computer Engineering, Sharif University of Technology（谢里夫理工学院计算机工程系）

AI总结提出EAV-DFD方法，结合师生框架的领域适应机制，提升模型在未见领域上的泛化能力，在三个数据集上AUC分别提升4.09%、17.94%和0.5%。

DOI: 10.1109/TAI.2025.3642217

AI中文摘要

生成式AI模型的快速发展导致了更逼真的深度伪造媒体，包括对音频、视频或两者的操纵。这引发了严重的隐私和社会问题。该领域的许多研究已经取得了有前景的域内结果；然而，这些模型在面对来自不同领域的数据时，其有效性常常下降。因此，最近的深度伪造检测方法侧重于通过多种技术增强泛化能力，这些技术融合了所有输入模态，包括音频、图像及其交互。为此，我们提出了EAV-DFD，一种广义的深度集成视听模型，并结合基于师生框架的领域适应机制，以增强模型在未见领域上的表现和泛化能力。为了评估模型性能，我们使用FakeAVCeleb数据集作为主领域，DFDC、Deepfake_TIMIT和PolyGlotFake数据集作为未见领域。我们的实验结果表明，所提出的框架在领域适应方面是有效的，仅使用一小部分未见数据集训练学生模型，就在三个未见数据集上分别将模型的AUC性能提升了4.09%、17.94%和0.5%。这产生了一种新颖的深度伪造检测模型，能够适应新领域并解释哪个模态被操纵，突显了我们的方法在现实世界应用中的潜力。

英文摘要

The rapid advancement of generative AI models is leading to more realistic deepfake media, encompassing the manipulation of audio, video, or both. This raises severe privacy and societal concerns. Numerous studies in this area have yielded promising intra-domain results; however, these models frequently exhibit decreased efficacy when faced with data from dissimilar domains. Consequently, recent deepfake detection approaches focus on enhancing the generalization ability through multiple techniques that incorporate all input modalities, including audio, images, and their interactions. In this regard, we propose EAV-DFD, a generalized deep ensemble audio-visual model combined with a domain adaptation mechanism that uses a teacher-student framework to enhance the model's ability to perform and generalize effectively across unseen domains. To evaluate the model's performance, we used the FakeAVCeleb dataset as the primary domain and the DFDC, Deepfake_TIMIT, and PolyGlotFake datasets as unseen domains. Our experimental results demonstrate that the proposed framework is effective in domain adaptation, improving the AUC of the model by 4.09%, 17.94%, and 0.5% on the three unseen datasets, using only a small portion of each to train the student model. This leads to a novel deepfake detection model capable of adapting to new domains and interpreting which modality has been manipulated, highlighting the potential of our approach for real-world applications.

2606.15734 2026-06-16 cs.CL cs.AI cs.IR cs.LG 交叉投稿

Retrievable Gradients: Continual Post-Training Without Cumulative Weight Drift

可检索梯度：无累积权重漂移的持续后训练

Weihang Su, Jiacheng Kang, Jingyan Xu, Qingyao Ai, Jianming Long, Hanwen Zhang, Bangde Du, Xinyuan Cao, Min Zhang, Yiqun Liu

发表机构 * Department of Computer Science and Technology, Tsinghua University（清华大学计算机科学与技术系）

AI总结提出ReGrad范式，将梯度作为可检索知识单元，通过元学习重塑文档梯度为通用适应信号，实现无权重漂移的可扩展参数知识注入。

大型语言模型在持续微调过程中灾难性遗忘的机制分析

Gustav Olaf Yunus Laitinen-Fredriksson Lundstrom-Imanov

发表机构 * Division of Statistics and Machine Learning (STIMA), Department of Computer and Information Science (IDA), Linköping University（林雪平大学计算机与信息科学系（IDA）统计与机器学习部（STIMA））

AI总结本文系统比较了20个顶级LLM在持续微调中的灾难性遗忘，通过行为分析和机制解释定位易受参数覆盖的神经回路，并提出低秩电路投影（LRCP）方法，在开放权重模型中恢复高达94.2%的祖先能力。

Comments 12 pages, 8 figures, 5 tables. Preprint submitted to Elsevier

详情

AI中文摘要

大型语言模型（LLMs）在适应目标任务时的顺序微调常常引发灾难性遗忘，即获取新目标技能会削弱原有能力。本文对代表2026年中期的二十个顶级模型进行了灾难性遗忘的系统比较研究。我们将研究分为两条主线：（i）对十个领先闭源模型（包括Claude Fable 5、GPT-5.5 High和Gemini 3.5 Flash）的行为和语义输出漂移分析；（ii）对十个著名开放权重架构（如DeepSeek-V4-Pro、Llama 4 Maverick和Qwen 3.6-27B）的深度机制解释。通过权重空间轨迹追踪、中心核对齐（CKA）以及混合专家（MoE）层中的路由门漂移计算，我们定位了高度易受参数覆盖的神经回路。我们的发现表明，早期层的注意力头表现出系统性熵扩散，而中深层的前馈网络（或稀疏专家块）则遭受局部表示崩溃。基于这些见解，我们引入了低秩电路投影（LRCP），一种子空间正则化的训练干预。实证评估显示，LRCP在开放权重配置中成功恢复了高达94.2%的祖先能力，并匹配了标准PEFT基线的适应速度。

英文摘要

Sequential fine-tuning of Large Language Models (LLMs) for adaptation to target tasks often triggers catastrophic forgetting, where the acquisition of novel target skills degrades ancestral capabilities. This paper presents a systematic comparative study of catastrophic forgetting across twenty premier models representing the state-of-the-art in mid-2026. We categorize our investigation into two primary research lines: (i) a behavioral and semantic output drift analysis of ten leading closed-source models (including Claude Fable 5, GPT-5.5 High, and Gemini 3.5 Flash), and (ii) a deep mechanistic interpretation of ten prominent open-weight architectures (such as DeepSeek-V4-Pro, Llama 4 Maverick, and Qwen 3.6-27B). Through weight-space trajectory tracking, Centered Kernel Alignment (CKA), and routing gate drift calculations in Mixture-of-Experts (MoE) layers, we localize the neural circuits highly susceptible to parameter overwriting. Our findings indicate that early-layer attention heads exhibit systemic entropic dispersion, while mid-to-deep feed-forward networks (or sparse expert blocks) suffer localized representation collapse. Informed by these insights, we introduce Low-Rank Circuit Projection (LRCP), a subspace-regularized training intervention. Empirical evaluations show that LRCP successfully recovers up to 94.2% of ancestral capabilities in open-weight configurations and matches the adaptation velocity of standard PEFT baselines.
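Of the diagnostics named above, Centered Kernel Alignment has a compact closed form in its linear variant: for column-centered activation matrices X and Y over the same examples, CKA = ||YᵀX||_F² / (||XᵀX||_F · ||YᵀY||_F). The sketch below implements that standard formula. Whether the paper uses the linear or a kernel variant is not stated in the abstract, so treating it as linear CKA is an assumption.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_examples, n_features)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)
```

Comparing a layer's activations before and after fine-tuning on a fixed probe set gives a score in [0, 1]; values well below 1 would indicate that the layer's representation has drifted.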

2606.00558 2026-06-16 cs.LG 版本更新

Semi-Supervised Noise Adaptation: Transferring Knowledge from Noise Domain

半监督噪声适应：从噪声域迁移知识

Yuan Yao, Jin Song, Huixia Li, Tongtong Yuan, Jiaqi Wu, Yu Zhang

发表机构 * Guangdong Laboratory of Artificial Intelligence and Digital Economy（广东人工智能与数字经济实验室）； Nanjing University of Posts and Telecommunications（南京邮电大学）； Beijing Jiaotong University（北京交通大学）； Beijing University of Technology（北京工业大学）； Tsinghua University（清华大学）； Southern University of Science and Technology（南方科技大学）

AI总结提出半监督噪声适应（SSNA）问题，利用合成噪声域作为源域，通过噪声适应框架（NAF）改善目标域的泛化性能。

Comments Accepted by ICML 2026

AI中文摘要

迁移学习旨在通过从源域迁移知识来促进目标域的学习。源域通常包含语义上有意义的样本（例如图像），以促进有效的知识迁移。然而，最近的一项研究观察到，由简单分布（例如高斯分布）构建的噪声域可以在半监督设置中作为替代源域，其中只有一小部分目标样本被标记，而大多数样本未标记。基于这一令人惊讶的观察，我们提出了一种称为半监督噪声适应（SSNA）的新问题，旨在利用合成噪声域来提高目标域的泛化能力。为了解决这个问题，我们首先建立了一个泛化界，描述了噪声域对泛化的影响，基于此我们提出了噪声适应框架（NAF）。大量实验表明，NAF有效地利用噪声域来收紧目标域的泛化界，从而提高了性能。代码可在 https://github.com/AIResearch-Group/SSNA 获取。

英文摘要

Transfer learning aims to facilitate the learning of a target domain by transferring knowledge from a source domain. The source domain typically contains semantically meaningful samples (e.g., images) to facilitate effective knowledge transfer. However, a recent study observes that a noise domain constructed from simple distributions (e.g., Gaussian distributions) can serve as a surrogate source domain in the semi-supervised setting, where only a small proportion of target samples are labeled while most remain unlabeled. Based on this surprising observation, we formulate a novel problem termed Semi-Supervised Noise Adaptation (SSNA), which aims to leverage a synthetic noise domain to improve the generalization of the target domain. To address this problem, we first establish a generalization bound characterizing the effect of the noise domain on generalization, based on which we propose a Noise Adaptation Framework (NAF). Extensive experiments demonstrate that NAF effectively leverages the noise domain to tighten the generalization bound of the target domain, leading to improved performance. The code is available at https://github.com/AIResearch-Group/SSNA.

2507.02288 2026-06-16 cs.CV cs.LG 版本更新

Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

基于语言引导与表示对齐的提示解缠用于域泛化

De Cheng, Zhipeng Xu, Xinyang Jiang, Dongsheng Li, Nannan Wang, Xinbo Gao

发表机构 * School of Telecommunications Engineering, the State Key Laboratory of Integrated Services Networks (ISN), Xidian University, Xi’an, China（电信工程学院、集成服务网络国家重点实验室（ISN）、西安电子科技大学）； Microsoft Research Asia, Shanghai, China（微软亚洲研究院，上海，中国）

AI总结提出利用大语言模型自动解缠文本提示，并引入最差显式表示对齐，结合抽象提示增强源域多样性，实现域不变视觉表示学习，在多个基准上超越现有方法。

DOI: 10.1109/TPAMI.2026.3661049
Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 6, pp. 6799-6816, June 2026

AI中文摘要

域泛化（DG）旨在开发一个能够在未见过的目标域上有效执行的通用模型。值得注意的是，预训练视觉基础模型（VFM）如CLIP的最新进展，已显示出增强深度学习模型泛化能力的巨大潜力。尽管基于VFM的域提示调整在DG中受到越来越多的关注，但设计能够解缠跨域不变特征的提示仍然是一个关键挑战。在本文中，我们提出通过利用VFM的可控且灵活的语言提示来解决这一挑战。注意到VFM的文本模态自然更容易解缠，我们引入了一个新颖的文本特征引导的视觉提示调整框架。该框架首先使用大语言模型（LLM）自动解缠文本提示，然后学习由解缠文本特征引导的域不变视觉表示。然而，仅依赖语言来引导视觉特征解缠存在局限性，因为视觉特征有时可能过于复杂或微妙，难以被描述性文本完全捕捉。为解决这一问题，我们引入了最差显式表示对齐（WERA），它通过添加一组额外的抽象提示来扩展文本引导的视觉提示。这些提示通过风格化图像增强来增强源域多样性，而对齐约束确保视觉表示在原始分布和增强分布上保持一致。在包括PACS、VLCS、OfficeHome、DomainNet和TerraInc在内的主要DG数据集上进行的实验表明，我们提出的方法优于最先进的DG方法。

英文摘要

Domain Generalization (DG) seeks to develop a versatile model capable of performing effectively on unseen target domains. Notably, recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have demonstrated considerable potential in enhancing the generalization capabilities of deep learning models. Despite the increasing attention toward VFM-based domain prompt tuning within DG, the effective design of prompts capable of disentangling invariant features across diverse domains remains a critical challenge. In this paper, we propose addressing this challenge by leveraging the controllable and flexible language prompt of the VFM. Noting that the text modality of VFMs is naturally easier to disentangle, we introduce a novel framework for text feature-guided visual prompt tuning. This framework first automatically disentangles the text prompt using a large language model (LLM) and then learns domain-invariant visual representation guided by the disentangled text feature. However, relying solely on language to guide visual feature disentanglement has limitations, as visual features can sometimes be too complex or nuanced to be fully captured by descriptive text. To address this, we introduce Worst Explicit Representation Alignment (WERA), which extends text-guided visual prompts by incorporating an additional set of abstract prompts. These prompts enhance source domain diversity through stylized image augmentations, while alignment constraints ensure that visual representations remain consistent across both the original and augmented distributions. Experiments conducted on major DG datasets, including PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that our proposed method outperforms state-of-the-art DG methods.

2510.10981 2026-06-16 stat.ML cs.LG 版本更新

FastMix: 通过梯度下降实现快速数据混合优化

Haoru Tan, Sitong Wu, Yanfeng Chen, Jun Xia, Ruobing Xie, Bin Xia, Xingwu Sun, Xiaojuan Qi

Affiliations: University of Hong Kong; Tencent; Chinese University of Hong Kong

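AI summary: Proposes the FastMix framework, which recasts data-mixture selection as a bilevel optimization problem and jointly optimizes mixture coefficients and model parameters. This enables efficient, scalable mixture discovery that outperforms baselines in both pre-training and post-training while sharply reducing search cost.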

详情

Journal ref: ICLR-2026

AI中文摘要

虽然大规模和多样化的数据集推动了大型模型的最新进展，但确定预训练和后训练的最佳数据混合仍然是一个重要的开放问题。我们通过FASTMIX应对这一挑战，这是一个新颖的框架，在仅训练单个代理模型的同时自动发现数据混合。FASTMIX不依赖预定义的启发式方法或资源密集型模拟，而是联合优化混合系数和模型参数，显著提高了相对于先前方法的效率和可扩展性。FASTMIX的核心是将混合选择重新表述为一个双层优化问题。在这种重新表述下，我们证明优化混合比例在数学上等价于在均匀源采样下分配每个源的损失权重。这将混合系数直接嵌入到可微分的迭代优化目标中，从而能够对混合和模型进行高效的基于梯度的优化。为了解决优化问题，FASTMIX实现了一个近似迭代优化过程，交替进行（i）根据当前混合比例对采样的数据更新模型参数（内循环）和（ii）基于验证反馈更新混合比例（外循环）。在预训练和后训练中，FASTMIX均优于基线方法，同时大幅降低了搜索成本。代码见 https://github.com/hrtan/fastmix

英文摘要

While large and diverse datasets have driven recent advances in large models, identifying the optimal data mixture for pre-training and post-training remains a significant open problem. We address this challenge with FASTMIX, a novel framework that automates data mixture discovery while training only a single proxy model. Instead of relying on predefined heuristics or resource-intensive simulations, FASTMIX jointly optimizes mixture coefficients and model parameters, substantially improving efficiency and scalability over prior approaches. At the core of FASTMIX is a reformulation of mixture selection as a bilevel optimization problem. Under this reformulation, we show that optimizing mixture ratios is mathematically equivalent to assigning per-source loss weights under uniform source sampling. This embeds the mixture coefficients directly into the differentiable iterative optimization objective, enabling efficient, gradient-based optimization of both mixture and model. To solve the optimization problem, FASTMIX implements an approximate iterative optimization procedure, alternating between (i) updating model parameters on data sampled according to current mixture ratios (inner loop) and (ii) updating mixture ratios based on validation feedback (outer loop). Across pre- and post-training, FASTMIX outperforms baselines while drastically reducing search cost. Code is available at https://github.com/hrtan/fastmix
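
The equivalence between mixture ratios and per-source loss weights can be checked on toy numbers. The sketch below is a minimal illustration and not FASTMIX's code: it assumes fixed scalar per-source losses, and the function names and the particular weight K * ratio (with K the number of sources) are hypothetical choices made for this example.

```python
def mixture_objective(losses, ratios):
    """Expected loss when source i is sampled with probability ratios[i]."""
    return sum(r * l for r, l in zip(ratios, losses))


def uniform_weighted_objective(losses, ratios):
    """Expected weighted loss when sources are sampled uniformly.

    Source i is drawn with probability 1/K and its loss is scaled by the
    weight K * ratios[i] (an illustrative choice of per-source weight).
    """
    k = len(losses)
    return sum((1.0 / k) * (k * r) * l for r, l in zip(ratios, losses))


losses = [2.0, 0.5, 1.25]  # toy per-source losses (made up)
ratios = [0.6, 0.3, 0.1]   # toy mixture ratios, summing to 1

sampled = mixture_objective(losses, ratios)
reweighted = uniform_weighted_objective(losses, ratios)
```

Both objectives agree up to floating-point error, which is why the ratios can be treated as differentiable loss weights rather than as sampling probabilities.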

2606.15032 2026-06-16 cs.LG 新提交

数据受限语言模型预训练的数据增强

Michael K. Chen, Xikun Zhang, Zhen Wang

Affiliations: UC San Diego; RMIT University

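AI summary: To address the severe overfitting of standard autoregressive pretraining in data-constrained settings, proposes three categories of data augmentation (token-level noise, sequence permutations, and target offset prediction) that lower validation loss and support training for hundreds of epochs.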

详情

AI中文摘要

随着AI实验室接近数据天花板，计算能力超过新高质量文本生成速率，语言模型预训练正转向数据受限、计算充裕的体制，需要在固定语料库上进行高效的多轮训练。标准自回归（AR）预训练在此设置下严重过拟合，早期达到最优然后持续恶化。我们研究数据增强作为正则化器来缓解过拟合，并在相同数据上实现数百轮的有效训练。我们为AR预训练引入了三类正交的增强：token级噪声（掩码、随机替换）、序列排列（从右到左预测、Fill-in-the-Middle）以及目标偏移预测（$x_{t+i}$，$i > 1$）。通过系统消融实验，我们发现单个增强相对于基线延迟了过拟合并降低了验证损失，其中随机token替换在单个方法中实现了最佳最小损失。组合增强类别进一步降低了最小验证损失。我们的实验表明，数据增强缓解了AR预训练的数据低效问题，并为数据受限体制提供了有前景的解决方案。所有代码和数据可在https://github.com/michaelchen-lab/data-augmentations-for-pretraining获取。

英文摘要

As AI labs approach a data ceiling where compute capacity outpaces the rate of new high-quality text generation, language model pretraining is shifting toward a data-constrained, compute-abundant regime that demands productive multi-epoch training on fixed corpora. Standard autoregressive (AR) pretraining overfits severely in this setting, reaching its optimum early and then continuously deteriorating. We investigate data augmentation as a regularizer to mitigate this overfitting and enable productive training for hundreds of epochs on the same data. We introduce three orthogonal categories of augmentation for AR pretraining: token-level noise (masking, random replacement), sequence permutations (right-to-left prediction, Fill-in-the-Middle), and target offset prediction ($x_{t+i}$ for $i > 1$). Through systematic ablations, we find that individual augmentations delay overfitting and lower validation loss relative to the baseline, with random token replacement achieving the best minimum loss among individual methods. Combining augmentation categories further lowers the minimum validation loss. Our experiments demonstrate that data augmentations mitigate AR pretraining's data inefficiency and offer a promising solution to the data-constrained regime. All code and data are available at https://github.com/michaelchen-lab/data-augmentations-for-pretraining
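
Two of the augmentation categories named above are easy to picture on a toy token list. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, replacement ids are drawn uniformly from the vocabulary, and the abstract does not say whether noise is applied to inputs only or also to targets.

```python
import random


def random_token_replacement(tokens, vocab_size, p, rng):
    """Replace each token by a uniformly random vocabulary id with probability p."""
    return [rng.randrange(vocab_size) if rng.random() < p else t for t in tokens]


def offset_targets(tokens, offset):
    """Pair position t with the token at t + offset.

    offset=1 is ordinary next-token prediction; offset > 1 corresponds to
    predicting x_{t+i} for i > 1.
    """
    return [(tokens[t], tokens[t + offset]) for t in range(len(tokens) - offset)]


rng = random.Random(0)
tokens = [5, 9, 2, 7, 3, 8]  # toy token ids (made up)

noisy = random_token_replacement(tokens, vocab_size=10, p=0.3, rng=rng)
untouched = random_token_replacement(tokens, vocab_size=10, p=0.0, rng=rng)
pairs = offset_targets(tokens, offset=2)
```

With `offset=2` the six-token example yields four (input, target) pairs, and `p=0.0` leaves the sequence unchanged.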

2606.16341 2026-06-16 cs.LG cs.DB 新提交

Filtered ANN as a Phase Transition: When Selectivity-Estimation Error Causes Plan Regret

过滤式近似最近邻搜索作为相变：选择性估计误差导致计划遗憾

Madhulatha Mandarapu, Sandeep Kunkunuru

Affiliations: VaidhyaMegha Private Limited, India

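AI summary: Studies how selectivity-estimation error causes plan regret in filtered approximate-nearest-neighbor search, shows that regret arises only near phase boundaries and forms a wedge with logarithmic width, and validates this with finite-size scaling.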

Comments 8 pages, 4 figures. Code, benchmarks, and full pre-registration: https://github.com/samyama-ai/filtered-ann-regret

详情

AI中文摘要

过滤式近似最近邻（ANN）查询返回满足属性谓词P（选择性为s）的向量中最近的k个向量。最佳执行策略——预过滤、后过滤或内过滤——随s变化，因此系统必须估计s并选择。我们将其建模为在具有相（各策略获胜区域）的景观上的argmax，相由边界分隔，并表明选择性估计误差仅在边界周围的临界区域产生计划遗憾（相对于最优策略的召回损失）。遗憾是一个对数宽度等于乘法估计误差ε、高度等于局部悬崖|V'(s*)|ε的楔形；翻转裕度1/|V'(s*)|是作为局部边界理论重新出现的兄弟基数估计研究的条件数。两个相边界来自独立的数学：顺序统计将后过滤悬崖置于s ~ k/K，而站点渗流将内过滤悬崖置于s_c ~ 0.83/M（图度数M，与语料库大小无关）。临界性仅在受限预算B < sqrt(k n)下存在。在预先注册的决策规则下，我们在合成扫描和真实SIFT1M上确认，遗憾在边界处集中约290倍，且遗憾曲线在语料库大小的两个数量级上服从有限尺寸标度坍缩为一个通用楔形。真实的近似索引不会错误定位边界，但有偏的成本模型会打开一个持续的校准偏差带，估计误差鲁棒性无法修复。贡献在于表征，而非新索引。代码和完整的预注册已公开。

英文摘要

A filtered approximate-nearest-neighbor (ANN) query returns the k nearest vectors among those satisfying an attribute predicate P of selectivity s. The best execution strategy -- pre-filter, post-filter, or in-filter -- changes with s, so a system must estimate s and choose. We model this as an argmax over a landscape with phases (regions where each strategy wins) separated by boundaries, and show that selectivity-estimation error produces plan regret -- recall lost versus the oracle strategy -- only in the critical regions around those boundaries. The regret is a wedge of log-width equal to the multiplicative estimation error epsilon and height equal to the local cliff |V'(s*)| epsilon; the flip-margin 1/|V'(s*)| is the condition number of a sibling cardinality-estimation study reappearing as the local boundary theory. The two phase boundaries follow from independent mathematics: order statistics place the post-filter cliff at s ~ k/K, and site percolation places the in-filter cliff at s_c ~ 0.83/M for graph degree M (corpus-size independent). Criticality exists only under a constrained budget B < sqrt(k n). Under pre-registered decision rules we confirm, on synthetic sweeps and real SIFT1M, that regret concentrates ~290x at the boundary and that the regret curves obey a finite-size scaling collapse onto one universal wedge across two decades of corpus size. A real approximate index does not mis-locate the boundary, but a biased cost model opens a persistent miscalibration band that estimation-error robustness cannot fix. The contribution is a characterization, not a new index. Code and the full pre-registration are public.
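
The order-statistics argument for the post-filter cliff at s ~ k/K can be illustrated with a binomial count. This is a sketch under assumptions the abstract does not state: K is taken to be the number of unfiltered candidates fetched before filtering, each candidate is assumed to pass the predicate independently with probability s, and the function name is hypothetical.

```python
from math import comb


def prob_enough_survivors(k, K, s):
    """P(Binomial(K, s) >= k): chance that at least k of K candidates pass the filter."""
    return sum(comb(K, j) * s**j * (1 - s) ** (K - j) for j in range(k, K + 1))


k, K = 10, 100  # toy values, so the predicted cliff sits near s = k / K = 0.1

below = prob_enough_survivors(k, K, 0.05)  # selectivity well under k / K
above = prob_enough_survivors(k, K, 0.20)  # selectivity well over k / K
```

Under these assumptions the probability of filling all k result slots is small below the boundary and close to one above it, so a modest error in the estimated s only changes the outcome when s is near k/K.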

2606.16356 2026-06-16 cs.LG 新提交

Simulation-Augmented Multi-Step Split Conformal Prediction for Aggregated Forecasts

面向聚合预测的模拟增强多步分割共形预测

Andro Sabashvili

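AI summary: Proposes SA-MSCP, which generates future paths from cross-validation residuals with a block bootstrap and builds empirical-quantile prediction intervals, improving empirical coverage for aggregated and growth-rate targets.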

Comments Accepted at ICML 2026 workshop: Forecasting as a New Frontier of Intelligence
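
Only the summary is available for this entry, so the following is a rough sketch of the simulation step alone: a block bootstrap over residuals followed by an empirical-quantile interval for a summed (aggregated) forecast. It is not the SA-MSCP procedure itself; the conformal calibration is not reproduced, and every name, the block length, and the residual values are illustrative assumptions.

```python
import random


def block_bootstrap_path(residuals, horizon, block_len, rng):
    """Concatenate randomly chosen contiguous residual blocks until the horizon is filled."""
    path = []
    while len(path) < horizon:
        start = rng.randrange(len(residuals) - block_len + 1)
        path.extend(residuals[start:start + block_len])
    return path[:horizon]


def aggregate_interval(point_forecasts, residuals, block_len, n_paths, alpha, rng):
    """Empirical-quantile interval for the sum of the forecasts over the horizon."""
    horizon = len(point_forecasts)
    totals = []
    for _ in range(n_paths):
        errors = block_bootstrap_path(residuals, horizon, block_len, rng)
        totals.append(sum(f + e for f, e in zip(point_forecasts, errors)))
    totals.sort()
    lower = totals[int((alpha / 2) * (n_paths - 1))]
    upper = totals[int((1 - alpha / 2) * (n_paths - 1))]
    return lower, upper


rng = random.Random(0)
residuals = [-1.0, 0.5, 0.2, -0.3, 0.8, -0.6, 0.1, 0.4, -0.2, 0.3]  # toy values
forecasts = [10.0, 11.0, 12.0, 13.0]                                 # toy values

sample_path = block_bootstrap_path(residuals, horizon=4, block_len=2, rng=rng)
lower, upper = aggregate_interval(
    forecasts, residuals, block_len=2, n_paths=2000, alpha=0.1, rng=rng
)
```

Resampling whole blocks rather than single residuals keeps short-range dependence between consecutive errors, which matters when the target is a sum over several steps.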

2606.16411 2026-06-16 cs.LG 新提交

Not all Jensen-Shannon Divergence Estimators are Equal

并非所有 Jensen-Shannon 散度估计器都是等价的

Alba Garrido, Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, Juan Parras

Affiliations: Information Processing and Telecommunications Center, ETSI Telecomunicación, Universidad Politécnica de Madrid

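AI summary: Addressing underspecified estimation protocols for the Jensen-Shannon divergence in fidelity evaluation of synthetic tabular data, systematically studies how estimator family, sampling protocol, and other factors affect the estimate, reveals dependence blindness in marginal estimators and sensitivity in classifier-based estimators, and proposes a posterior correction.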

详情

AI中文摘要

Jensen-Shannon 散度被广泛报道为合成表格数据保真度的标量度量。然而，在实践中，它是使用通常未明确说明的协议从有限样本中估计的。这造成了一个测量问题。尽管总体散度定义明确，但经验值取决于估计器族、采样协议、校准、维度和类别平衡。我们表明，不同的协议可能产生不可比较的值：基于边际的估计器忽略联合分布中的依赖关系，可能严重低估散度，而基于分类器的估计器捕获联合结构，但表现出强烈的估计器依赖性。我们在具有参考散度的受控设置和真实世界合成表格基准上系统地研究了这种行为。我们的分析揭示了边际估计器中的依赖盲性、类别不平衡下的先验偏移偏差以及高维中的估计器敏感性。为了解决先验偏移，我们推导了基于分类器的 Jensen-Shannon 估计的闭式后验校正。我们的结果表明，经验 Jensen-Shannon 散度值本质上依赖于协议，因此明确指定估计程序对于有意义的比较是必要的。我们提供了实用指南和一个用于估计器感知的 Jensen-Shannon 评估的开源工具。

英文摘要

The Jensen-Shannon divergence is widely reported as a scalar measure of fidelity for synthetic tabular data. Yet, in practice, it is estimated from finite samples using protocols that are often underspecified. This creates a measurement problem. Although the population divergence is well defined, the empirical value depends on the estimator family, sampling protocol, calibration, dimensionality, and class balance. We show that different protocols can yield non-comparable values: marginal-based estimators ignore dependencies in the joint distribution and can severely underestimate divergence, while classifier-based estimators capture joint structure but exhibit strong estimator dependence. We systematically study this behavior across controlled settings with reference divergences and real-world synthetic tabular benchmarks. Our analysis reveals dependence blindness in marginal estimators, prior-shift bias under class imbalance, and estimator sensitivity in high dimensions. To address prior shift, we derive a closed-form posterior correction for classifier-based Jensen-Shannon estimation. Our results show that empirical Jensen-Shannon divergence values are inherently protocol-dependent, making explicit specification of the estimation procedure necessary for meaningful comparison. We provide practical guidelines and an open-source tool for estimator-aware Jensen-Shannon evaluation.
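
The dependence blindness of marginal-based estimators shows up already on a two-column toy table. In the sketch below, which is illustrative and not the authors' tool, the "real" data has perfectly correlated columns and the "synthetic" data has independent ones; averaging per-column divergences is assumed here as one possible marginal protocol, and the distributions are made up.

```python
from math import log2


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in keys}

    def kl_to_mixture(a):
        return sum(a[x] * log2(a[x] / m[x]) for x in a if a[x] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def marginal(joint, axis):
    """Sum a joint distribution over (col0, col1) pairs down to a single column."""
    out = {}
    for pair, prob in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + prob
    return out


real = {(0, 0): 0.5, (1, 1): 0.5}                                    # columns always equal
synthetic = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}  # independent columns

joint_jsd = jsd(real, synthetic)
marginal_jsd = sum(jsd(marginal(real, a), marginal(synthetic, a)) for a in (0, 1)) / 2
```

Both tables have identical column marginals, so the marginal-based value is zero even though the joint distributions clearly differ.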

2606.16511 2026-06-16 cs.LG 新提交

Tail-Shape Estimation in LLM Evaluation Is Fragile: A Protocol for Diagnosing False Positives

LLM评估中的尾部形状估计是脆弱的：诊断假阳性的协议

Luca Zhou

Affiliations: Sapienza University of Rome

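AI summary: Proposes a protocol for checking tail-shape estimates in LLM evaluation for false positives, uses an extreme-value-theory index to separate tail heaviness from tail mass, and identifies three false-positive modes in toxicity evaluation.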

Comments 9 pages of main paper, 4 figures and 4 tables in the main paper, more in the appendix

详情

AI中文摘要

最近的研究推动将大型语言模型（LLM）评估从基于均值转向基于尾部的指标，包括条件风险价值和奖励模型误差的尾部指数估计。我们探讨了极值理论中的尾部指数参数（该参数将尾部的沉重程度与尾部质量的大小分离开来）是否在LLM评估中提供了超越均值和标准尾部幅度统计量的区分信息。我们预先注册了一个协议，涵盖任何正面尾部形状主张的可接受性、拟合优度、阈值稳定性和效应量要求。该协议是本文的贡献；下面的实证研究展示了其门控机制如何捕捉问题。应用于两个结构不同的评分器家族下的标准LLM毒性评估设置时，该协议捕捉了三种不同的假阳性模式（这些模式在简单分析中会被发表），并拒绝了两个评分器上的标题尾部形状主张。我们得出结论，在我们检查的LLM毒性评估设置中，尾部形状估计比近期文献所暗示的更为脆弱，并建议将该协议作为类似设置中尾部指数主张的起点。

英文摘要

Recent work motivates moving large language model (LLM) evaluation from mean-based to tail-aware metrics, including conditional value-at-risk and tail-index estimates of reward-model error. We ask whether the canonical extreme-value-theory tail-index parameter, which isolates how heavy a tail is from how large the tail mass is, adds discriminative information beyond the mean and a standard tail-magnitude statistic in LLM evaluation. We pre-register a protocol covering admissibility, goodness-of-fit, threshold-stability, and effect-size requirements for any positive tail-shape claim. The protocol is the contribution of this paper; the empirical study below is a demonstration of what its gates catch. Applied to a standard LLM toxicity-evaluation setup under two structurally different scorer families, the protocol catches three distinct modes of false positives that a naive analysis would have published, and rejects the headline tail-shape claim on both scorers. We conclude that tail-shape estimation in the LLM toxicity-evaluation setups we examined is more fragile than the recent literature suggests, and recommend the protocol as a starting point for tail-index claims in similar setups.
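
To make the threshold-stability requirement concrete, the sketch below computes a tail-index estimate at several thresholds. The abstract does not say which estimator the protocol uses; the Hill estimator is assumed here purely for illustration, the data are synthetic Pareto draws rather than toxicity scores, and the function name is hypothetical.

```python
import math
import random


def hill_estimator(samples, k):
    """Hill estimate of the tail index gamma = 1/alpha from the k largest samples."""
    xs = sorted(samples, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest value acts as the tail threshold
    return sum(math.log(x / threshold) for x in xs[:k]) / k


rng = random.Random(0)
alpha = 2.0  # true Pareto shape, so the true tail index is 1 / alpha = 0.5
pareto = [rng.paretovariate(alpha) for _ in range(20000)]

# Scan several thresholds: a credible tail-shape claim should not hinge on one k.
estimates = {k: hill_estimator(pareto, k) for k in (200, 500, 1000)}
```

On exact Pareto data the estimates stay close to 0.5 across k; on real scores, a value that drifts with the threshold is the kind of instability a stability gate is meant to flag.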

2606.16562 2026-06-16 cs.LG 新提交

MIRAGE: Auditing Anti-Muslim Bias in Frontier LLMs Across Reasoning, Agentic, and Time-Coupled Conditions

MIRAGE: 审计前沿大语言模型在推理、智能体与时间耦合条件下的反穆斯林偏见

Noor Islam S. Mohammad, Tamim Sheikh

Affiliations: University of California, Berkeley

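AI summary: Introduces the MIRAGE benchmark of 1,200 prompts covering three deployment settings (direct completion, chain-of-thought reasoning, and simulated agentic decision-making). Finds that chain-of-thought amplifies bias, agentic decisions are asymmetric, bias is time-coupled to retrieved news, and existing mitigations have limited effect.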

详情

AI中文摘要

在发现大语言模型中持续存在的反穆斯林偏见五年后，大多数评估仍局限于单轮提示完成，这一设置已不再反映前沿LLM的部署方式。我们引入MIRAGE（穆斯林身份推理与智能体生成评估）基准，包含1,200个提示，涵盖三种部署现实条件：直接完成、思维链推理以及跨内容审核、贷款分类、难民申请摘要和招聘筛选的模拟智能体决策。在六个前沿模型上，我们发现：(i) 思维链推理相比直接完成，将穆斯林-暴力关联放大了12-34%；(ii) 智能体决策在相同证据下，穆斯林与非穆斯林匹配案例之间表现出9-22个百分点的差异；(iii) 偏见与检索到的新闻上下文高度时间耦合，在近期冲突检索下增加18-27%。现有的基于提示的缓解措施在我们的三种条件下迁移性差，抑制了直接完成偏见，但智能体不对称性基本保持不变。我们发布MIRAGE和一个开放评估工具包，以支持有针对性的缓解研究。

英文摘要

Five years after the discovery of persistent anti-Muslim bias in large language models, most evaluations remain confined to single-turn prompt completion, a setting that no longer reflects how frontier LLMs are deployed. We introduce MIRAGE (Muslim-Identity Reasoning and Agentic Generation Evaluation), a benchmark of 1,200 prompts spanning three deployment-realistic conditions: direct completion, chain-of-thought reasoning, and simulated agentic decision-making across content moderation, lending triage, refugee claim summarization, and hiring screens. Across six frontier models, we find that (i) chain-of-thought reasoning amplifies rather than suppresses Muslim-violence associations by 12-34% relative to direct completion, (ii) agentic decisions exhibit a 9-22 percentage-point asymmetry between Muslim and matched non-Muslim cases on identical evidence, and (iii) bias is sharply time-coupled to retrieved news context, increasing 18-27% under recent-conflict retrieval. Existing prompt-based mitigations transfer poorly across our three conditions, suppressing direct-completion bias while leaving agentic asymmetry largely intact. We release MIRAGE and an open evaluation harness to support targeted mitigation research.

2606.16748 2026-06-16 cs.LG cs.CL 新提交

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

MyPCBench: 个人智能计算机使用代理的基准测试

Lawrence Keunho Jang, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov

Affiliations: Carnegie Mellon University

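AI summary: Introduces the MyPCBench benchmark, which tests personal computer-use agents in a simulated realistic desktop environment with 17 web applications. The best model, Claude Opus 4.6, solves only 55.4% of tasks, with failures concentrated on multi-application and long-trajectory tasks.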

详情

AI中文摘要

当前的计算机使用代理基准测试在非个人化环境中评估模型。这导致评估与部署之间存在差距，因为个人助理预计将在用户的整个数字生活中工作，包括其上下文、历史数据和已登录账户。这种差距在Web任务中最为明显，因为实时Web评估无法测试需要登录或个人信息的网站，而真正的个人助理必须驱动这类网站。我们引入了MyPCBench，它在Linux桌面上测试计算机使用代理作为个人助理，该桌面填充了17个模拟的真实世界Web应用程序和一个完整的桌面堆栈，所有这些都为一个典型角色——来自《办公室》的Michael Scott——进行了种子化。我们在此环境中定义了184个任务，每个任务都受到来自OpenClaw社区的真实请求的启发，并使用统一的计算机+bash工具界面基准测试了六个闭源和开源模型。我们发现，最佳模型Claude Opus 4.6完全解决了55.4%的任务，是唯一超过50%的模型。模型失败集中在跨越多个应用程序的任务和长轨迹上，其中个性化对助理的压力最大。我们在https://mypcbench.com上发布了环境、任务集和代理工具包。

英文摘要

Current benchmarks for computer-use agents evaluate models in impersonal environments. This leaves a gap between evaluation and deployment where personal assistants are expected to work across a user's whole digital life, including their context, historical data, and logged-in accounts. This gap is widest on web tasks, where live web evaluations cannot exercise sites that require logging in or personal information, the kind of site a real personal assistant has to drive. We introduce MyPCBench, which tests computer-use agents as personal assistants on a Linux desktop populated with 17 simulated real-world web applications and a full desktop stack, all seeded for one canonical persona, Michael Scott from The Office. We define 184 tasks in this environment, each inspired by a real request drawn from the OpenClaw community, and benchmark six closed and open-weight models with a uniform computer+bash tool surface. We find that the best model, Claude Opus 4.6, fully solves 55.4% of the tasks, the only model above 50%. Model failures cluster on tasks that span many applications and on long trajectories, where personalization stresses an assistant the most. We release the environment, task set, and agent harness at https://mypcbench.com.

2606.16765 2026-06-16 cs.LG physics.flu-dyn 新提交

A Validated LBM Dataset and Pipeline for Surrogate Modeling of Turbulent 3D Obstructed Channel Flows

一个经过验证的LBM数据集和用于湍流三维阻塞通道流代理建模的流水线

Lukas Schröder, Shubham Kavane, Harald Köstler

Affiliations: Chair of Computer Science 10 (System Simulation); Friedrich-Alexander-Universität Erlangen-Nürnberg

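AI summary: Presents a reproducible pipeline that generates training data for 3D channel flows at Reynolds numbers of 1,000 to 10,000 using a lattice Boltzmann solver with cumulant collision operators, validated against experimental measurements and grid convergence studies, as a basis for standardized comparison of neural operators.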

Comments 4 pages + appendix, 9 figures, Accepted at the 1st Workshop on Differentiable Systems and Scientific Machine Learning (SysDiff) @ EurIPS 2025, OpenReview: https://openreview.net/forum?id=rdmHT72NQH

详情

AI中文摘要

评估三维湍流的神经算子需要经过验证的数据集和物理基准。我们提出了一个可复现的流水线，用于生成在雷诺数1000-10000范围内、围绕生成几何体的三维通道流的训练数据。我们的格子玻尔兹曼求解器采用累积碰撞算子，并通过实验测量（斯特劳哈尔数、阻力系数、湍流波动）进行了严格验证，在1024x512x512分辨率下进行了全面的网格收敛研究。基于已建立的框架，这个经过验证的流水线能够实现代理模型的标准化比较。我们概述了计划中的系统评估，包括傅里叶神经算子与U-Net变体在预测、超分辨率和误差校正任务上的表现，并使用物理信息度量来评估湍流能量级联的表示。未来的工作将比较数值求解器和神经代理之间的计算效率，探索实际应用。我们寻求社区对我们验证方法、计划中的基准方法论以及湍流中神经算子评估优先级的反馈。

英文摘要

Evaluating neural operators for 3D turbulent flow requires validated datasets with physical benchmarks. We present a reproducible pipeline generating training data for 3D channel flows around generated geometries at Re=1,000-10,000. Our lattice Boltzmann solver with cumulant collision operators is rigorously verified against experimental measurements (Strouhal number, drag coefficients, turbulent fluctuations) with comprehensive grid convergence studies at resolution 1024x512x512. Building upon an established framework, this validated pipeline enables standardized surrogate model comparison. We outline planned systematic evaluation of Fourier Neural Operator and U-Net variants on forecasting, super-resolution, and error correction tasks, using physics-informed metrics to assess turbulent energy cascade representation. Future work will compare computational efficiency between numerical solvers and neural surrogates, exploring practical application. We seek community feedback on our validation approach, planned benchmark methodology, and evaluation priorities for neural operators in turbulent flows.

2606.16863 2026-06-16 cs.LG 新提交

HawkesNest: A Multi-Axis Synthetic Benchmark for Spatiotemporal Pattern Complexity

HawkesNest：时空模式复杂度的多轴合成基准

Yahya Aalaila, Sumantrak Mukherjee, Gerrit Großmann, Sebastian Vollmer

Affiliations: German Research Center for Artificial Intelligence (DFKI), Data Science and its Applications Research Group, Kaiserslautern, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau (RPTU), Kaiserslautern, Germany

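AI summary: Introduces the HawkesNest benchmark, which defines four complexity axes on a multivariate Hawkes process for controlled testing of spatiotemporal point process models under known structural difficulty.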

详情

AI中文摘要

时空点过程（STPP）模型的评估严重依赖于不透明的真实世界数据集，其中潜在生成结构未知且模型失败难以归因。我们引入HawkesNest，一个基于多元Hawkes骨干的生成器对齐基准，用于可控的时空模式复杂度。HawkesNest定义了四个复杂度轴：时空纠缠、背景异质性、跨类型交互和域拓扑。每个轴与从潜在数据生成机制计算出的确定性指标相关联。通过在保持全局速率、稳定性和模拟预算固定的同时改变这些轴，HawkesNest能够在已知结构难度下对STPP模型进行诊断性压力测试。我们验证了在受控扫描下这些指标是单调且几乎正交的。我们通过展示Hawkes系列基线在联合异质性-纠缠复杂度下性能下降来说明其用途，尽管它们在结构上与Hawkes数据生成骨干对齐。我们进一步表明HawkesNest暴露了神经模型的敏感性：AutoSTPP在时空纠缠单独增加时仍然脆弱。代码可在https://github.com/YahyaAalaila/HawkesNest获取。

英文摘要

Evaluation of spatiotemporal point process (STPP) models relies heavily on opaque real-world datasets, where latent generative structure is unknown and model failures are difficult to attribute. We introduce HawkesNest, a generator-aligned benchmark for controlled spatiotemporal pattern complexity built on a multivariate Hawkes backbone. HawkesNest defines four complexity axes: space-time entanglement, background heterogeneity, cross-type interaction, and domain topology. Each axis is associated with a deterministic index computed from the latent data-generating mechanism. By varying these axes while holding global rate, stability, and simulation budget fixed, HawkesNest enables diagnostic stress tests of STPP models under known structural difficulty. We verify that the indices are monotone and nearly orthogonal under controlled sweeps. We illustrate its use by showing that Hawkes-family baselines degrade under joint heterogeneity-entanglement complexity, even though they are structurally aligned with the Hawkes data-generating backbone. We further show that HawkesNest exposes neural-model sensitivity: AutoSTPP remains vulnerable under isolated increases in space-time entanglement. Code is available at https://github.com/YahyaAalaila/HawkesNest

2606.17014 2026-06-16 cs.LG math.ST stat.ML stat.TH 新提交

Filtered Conformal Ellipsoids for Graph-Native Time Series

图原生时间序列的过滤共形椭球

Yannick Limmer

Affiliations: DRW London

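AI summary: Proposes filtered conformal ellipsoids, which combine state-space filtering with conformal calibration to produce joint prediction sets for multivariate time series that control a single event and adapt to cross-coordinate dependence, with coverage bounds obtained by analysing an observable predictive-law quotient.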

详情

AI中文摘要

多元时间序列的联合预测集应控制单个事件，同时适应跨坐标依赖性。我们研究过滤共形椭球：一个冻结的状态空间滤波器输出一步预测均值和协方差，并对得到的马氏距离分数应用分割共形校准。滤波器用于选择椭球形状；共形校准选择标量半径，因此该构造受益于学习到的预测协方差，而不依赖高斯尾部概率来保证覆盖。主要困难在于过滤分数是依赖的，且学习到的循环滤波器不需要在其原始隐藏状态上收缩；因此，我们分析可观测预测律商中的收缩，该商识别产生相同未来发射高斯律序列的隐藏状态。在稳定的贝叶斯高斯投影滤波器、协方差界和有限时域可观测性费舍尔条件下，小超额高斯负对数似然意味着学习到的发射律的收缩。结合阈值自协方差包络，这给出了依赖下过滤分割共形预测的切比雪夫型近似覆盖界；更尖锐的伯恩斯坦型界需要额外的几何混合集中假设。在高斯预言可实现性下，我们还在条件有效的高斯椭球规则类中获得了接近预言的log体积比较。我们使用具有对角加低秩协方差的GCN-GRU滤波器实例化该框架。在中等规模的图原生交通基准（METRLA-$20$和PEMSBAY-$50$）上，学习到的滤波器比静态协方差和非滤波基线给出更尖锐的目标椭球；在全图规模和非图原生数据集上，因子和copula基线可能更强。

英文摘要

Joint prediction sets for multivariate time series should control a single event while adapting to cross-coordinate dependence. We study filtered conformal ellipsoids: a frozen state-space filter emits a one-step predictive mean and covariance, and split-conformal calibration is applied to the resulting Mahalanobis scores. The filter is used to choose the ellipsoid shape; conformal calibration chooses the scalar radius, so the construction benefits from a learned predictive covariance without relying on Gaussian tail probabilities for coverage. The main difficulty is that filtered scores are dependent and learned recurrent filters need not contract in their raw hidden state; we therefore analyse contraction in an observable predictive-law quotient that identifies hidden states producing the same future sequence of emitted Gaussian laws. Under a stable Bayes Gaussian-projection filter, covariance bounds, and a finite-horizon observability Fisher condition, small excess Gaussian negative log-likelihood implies contraction of the learned emitted laws. Combined with a threshold-autocovariance envelope this yields a Chebyshev-type approximate coverage bound for filtered split-conformal prediction under dependence; a sharper Bernstein-type bound requires an additional geometric-mixing concentration assumption. Under Gaussian oracle realisability we also obtain a near-oracle log-volume comparison within the class of conditionally valid Gaussian ellipsoid rules. We instantiate the framework with a GCN-GRU filter with diagonal-plus-low-rank covariance. On moderate-size graph-native traffic benchmarks (METRLA-$20$ and PEMSBAY-$50$), the learned filter gives sharper at-target ellipsoids than static-covariance and non-filter baselines; at full-graph scale and on non-graph-native datasets, factor and copula baselines can be stronger.
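
The division of labour described above, where the filter supplies the ellipsoid shape and conformal calibration supplies the scalar radius, can be sketched in two dimensions. This is a simplified illustration and not the paper's method: the mean and covariance are fixed made-up values rather than per-step filter outputs, the scores are i.i.d. (the paper's analysis concerns dependent scores), and all names are hypothetical.

```python
import math
import random


def mahalanobis_sq(y, mean, cov):
    """Squared Mahalanobis distance of a 2-D point under a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = y[0] - mean[0], y[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def conformal_radius(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


rng = random.Random(0)
mean = (0.0, 0.0)
cov = ((1.0, 0.5), (0.5, 2.0))  # stand-in for a filter's predictive covariance (made up)


def draw():
    # The data are deliberately more spread out than cov claims.
    return (1.5 * rng.gauss(0, 1), 2.5 * rng.gauss(0, 1))


calibration = [mahalanobis_sq(draw(), mean, cov) for _ in range(2000)]
radius_sq = conformal_radius(calibration, alpha=0.1)

# A point y is inside the ellipsoid when mahalanobis_sq(y, mean, cov) <= radius_sq.
test_scores = [mahalanobis_sq(draw(), mean, cov) for _ in range(5000)]
coverage = sum(s <= radius_sq for s in test_scores) / len(test_scores)
```

Because the radius comes from calibration scores rather than a chi-square quantile, empirical coverage lands near the 90% target even though the assumed covariance is misspecified.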

2606.14780 2026-06-16 cs.CV cs.LG 交叉投稿

YTClickbait21K: Human-Annotated Multimodal Dataset for YouTube Clickbait Detection Across Diverse Channels and Content Categories

YTClickbait21K：面向YouTube点击诱饵检测的多模态人工标注数据集，覆盖多样频道与内容类别

Md. Minhazul Islam, Md. Tanbeer Jubaer, Amith Khandakar, Shovon Sarker, Sumaiya Rahman, Md. Masum Mia, Mohamed Arselene Ayari, Hamed Noori

Affiliations: Department of Computer Science and Engineering, Rajshahi University of Engineering & Technology; Department of Electrical Engineering, Qatar University; Department of Civil and Environmental Engineering, Qatar University; SenseNet Inc.

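AI summary: To address the lack of large-scale, high-quality multimodal data for clickbait detection on video platforms, builds YTClickbait21K, a human-annotated dataset of 21,238 videos from 40 channels in 29 countries covering news, entertainment, education, gaming, and other categories. Quality is ensured by three independent annotators and majority voting, providing a benchmark for multimodal semantic understanding and automated content moderation.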

详情

AI中文摘要

视频分享平台上的点击诱饵内容对信息可靠性构成重大挑战，然而自动检测的进展一直受限于缺乏大规模、高质量的多模态数据集。我们提出了YTClickbait21K，一个人工标注的YouTube点击诱饵数据集，包含来自29个国家40个频道的21,238个视频，覆盖新闻、娱乐、教育和游戏等多种内容类别。每个样本包括结构化元数据（标题、描述、互动统计）以及相关的缩略图图像，支持全面的多模态分析。为确保标注质量，每个视频由三名标注员使用标准化的决策框架独立标注，该框架融合了文本、视觉和跨模态一致性线索，最终标签通过多数投票确定。该数据集展现出显著的人工标注一致性（k=0.65），尽管点击诱饵检测具有固有的主观性，但仍确认了可靠的标注。通过结合规模、标注严谨性和多模态丰富性，该数据集为开发和评估机器学习模型提供了稳健的基准，促进了跨模态语义理解的研究，并推动了自动内容审核系统的发展。

英文摘要

Clickbait content on video-sharing platforms poses a significant challenge to information reliability, yet progress in automated detection has been constrained by the lack of large-scale, high-quality multimodal datasets. We present YTClickbait21K, a human-annotated YouTube clickbait dataset comprising 21,238 videos collected from 40 channels across 29 countries, covering diverse content categories such as news, entertainment, education, and gaming. Each sample includes structured metadata (title, description, engagement statistics) along with associated thumbnail images, enabling comprehensive multimodal analysis. To ensure annotation quality, every video was independently labeled by three annotators using a standardized decision framework that incorporates textual, visual, and cross-modal consistency cues, with final labels determined through majority voting. The dataset exhibits substantial inter-annotator agreement (k=0.65), confirming reliable labeling despite the inherent subjectivity of clickbait detection. By combining scale, annotation rigor, and multimodal richness, this dataset provides a robust benchmark for developing and evaluating machine learning models, facilitating research in cross-modal semantic understanding, and advancing automated content moderation systems.

2606.14784 2026-06-16 cs.SD cs.LG eess.AS 交叉投稿

LLM-Based Synthetic Ground Truth Generation for Audio-Based Emotion Classification via In-Context Learning

基于上下文学习的音频情感分类的LLM合成真实标签生成

Qing Huang, Pooja Pol, Jianing Zhang

Affiliations: School of Business, Technical University of Applied Sciences Augsburg; Data Science und Autonome Systeme Technologietransferzentrum (TTZ)

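AI summary: Proposes using large language models (LLMs) and in-context learning (ICL) to automatically generate emotion-related synthetic ground truth from streaming speech data in multi-user VR environments, addressing the difficulty of labeling team collaboration states.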

Comments Proceedings of the International Conference on Applied Innovations in IT (ICAIIT), April 2026

详情

AI中文摘要

理解人类状态和交互动态是人机交互（HCI）的核心目标。随着交互范式变得更加沉浸，虚拟现实（VR）已成为研究协作工作的强大平台。在此类环境中，评估团队协作状态（包括团队表现和团队韧性）需要从多模态传感器数据（如语音信号）中连续可靠地推断潜在的团队级认知和情感状态。然而，由于传感器噪声、上下文变异性和稀疏的专家标注，为这些潜在状态生成真实标签仍然具有挑战性。传统的自我报告方法仅提供静态和延迟的测量，因此不足以捕捉连续语音数据中反映的动态团队过程。在这项工作中，我们提出了一种由大语言模型（LLM）驱动的、基于代理的推理工作流，用于从多用户VR环境中的流式语音数据自动生成情感相关的合成真实标签。利用LLM的泛化能力，我们使用上下文学习（ICL）和少量配对的音频样本及其对应转录的演示。ICL倾向于实现与模型微调相当的任务适应，同时避免了参数更新的计算开销。为了构建信息丰富且鲁棒的上下文提示，我们采用基于检索的选择策略，根据声学特征空间中的相似性动态识别相关的音频演示。

英文摘要

Understanding human states and interaction dynamics is a core goal of human-computer interaction (HCI). As interaction paradigms become more immersive, virtual reality (VR) has emerged as a powerful platform for studying collaborative work. In such settings, evaluating team collaboration states, including team performance and team resilience, requires continuous and reliable inference of latent team-level cognitive and affective states from multi-modal sensor data, such as speech signals. However, generating ground truth labels for these latent states remains challenging due to sensor-induced noise, contextual variability, and sparse expert annotations. Traditional self-reporting approaches provide only static and delayed measurements and are therefore insufficient for capturing dynamic team processes reflected in continuous speech data. In this work, we propose a large language model (LLM)-driven, agentic inference workflow for automated emotion-related synthetic ground truth generation from streaming speech data in multi-user VR environments. Leveraging the generalization capabilities of LLMs, we use In-Context Learning (ICL) with few-shot demonstrations of paired audio-based samples and their corresponding transcriptions. ICL tends to achieve task adaptation comparable to model fine-tuning while circumventing the computational overhead of parameter updates. To construct informative and robust in-context prompts, we adopt a retrieval-based selection strategy that dynamically identifies relevant audio demonstrations based on similarity in the acoustic feature space.
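
The retrieval-based selection of demonstrations can be pictured as a nearest-neighbour lookup in feature space. The sketch below is a guess at the general shape, not the authors' workflow: cosine similarity, the three-dimensional feature vectors, the clip ids, and the labels are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_demonstrations(query_features, pool, top_k):
    """Return the top_k pool entries whose acoustic features are most similar to the query."""
    ranked = sorted(
        pool, key=lambda item: cosine(query_features, item["features"]), reverse=True
    )
    return ranked[:top_k]


# Toy demonstration pool: ids, features, and labels are invented for this example.
pool = [
    {"id": "clip_a", "features": [1.0, 0.0, 0.2], "label": "calm"},
    {"id": "clip_b", "features": [0.0, 1.0, 0.0], "label": "tense"},
    {"id": "clip_c", "features": [0.5, 0.5, 0.0], "label": "neutral"},
]

chosen = select_demonstrations([0.9, 0.1, 0.1], pool, top_k=2)
chosen_ids = [item["id"] for item in chosen]
```

The selected entries, together with their transcriptions, would then be placed in the prompt as few-shot demonstrations ahead of the query sample.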

2606.14870 2026-06-16 hep-ph cs.LG 交叉投稿

Pre-Training for Simulation-Based Science: A Study on Jet Foundation Model Training Objectives

基于模拟的科学预训练：喷注基础模型训练目标研究

Ibrahim Elsharkawy, Joschka Birk, Vinicius Mikuni, Wahid Bhimji, Gregor Kasieczka, Benjamin Nachman

Affiliations: Department of Physics, University of Toronto and Vector Institute; NERSC, Lawrence Berkeley National Laboratory; Institut für Experimentalphysik, Universität Hamburg; Nagoya University, Kobayashi-Maskawa Institute; Department of Particle Physics and Astrophysics, Stanford University; Fundamental Physics Directorate, SLAC National Accelerator Laboratory

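AI summary: Systematically compares pre-training methods for foundation models in high energy physics. Pure classification pre-training is best when labels are plentiful, combining it with self-supervised masked particle modeling stands out in the low-label regime, and flow-matching generative pre-training does not help downstream classification but must be part of the pre-training objective to benefit generation tasks.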

详情

AI中文摘要

基于大规模数据集预训练并在下游任务上微调的基础模型已成为人工智能促进科学领域的强大范式。工业基础模型通常由于缺乏标签而使用掩码自监督训练。在许多科学领域，精确的模拟资源丰富，并提供了大量带标签的数据集，这为预训练开辟了新的可能性。我们利用全学习高能物理基础模型框架，系统比较了预训练方法。我们测试了监督分类、流匹配生成和自监督掩码粒子建模。所有模型均在JetClass数据集上预训练，并在两个代表性下游任务（顶喷注分类和JetNet条件生成）上微调。在其他观察中，对于分类任务，我们发现当下游标签和模型容量充足时，纯分类器预训练是最优的，但在低微调标签区域，将其与自监督掩码粒子建模结合具有独特优势。基于流匹配的生成预训练似乎对下游分类几乎没有益处，有趣的是，对于下游生成，我们发现流匹配必须出现在预训练目标中才能看到显著的微调优势，这暗示了分类和生成任务的正交性。也就是说，要使模型能够迁移到生成和分类下游任务，它必须在两者上都进行预训练。本研究为基于模拟科学中基础模型的受控缩放分析提供了模板。

英文摘要

Foundation models (FMs) trained on large datasets and fine-tuned on downstream tasks have emerged as a powerful paradigm in AI for science. Industrial FMs are typically trained using self-supervision with masking due to the lack of labels. In many scientific domains, accurate simulations are plentiful and facilitate large, labeled datasets. This opens up new possibilities for pre-training. We present a systematic comparison of pre-training methods using the OmniLearned High Energy Physics FM framework. We test supervised classification, flow-matching generation, and self-supervised masked particle modeling. All models are pre-trained on the JetClass dataset and fine-tuned on two representative downstream tasks, top jet classification and JetNet conditional generation. Among other observations, for classification tasks, we find that pure classifier pre-training is optimal when downstream labels and model capacity are plentiful, but combining it with self-supervised masked particle modeling (MPM) is uniquely powerful in the low-finetuning label regime. Flow matching-based generative pre-training seems to provide little benefit for downstream classification, and interestingly, for downstream generation, we find that flow matching must be in the pre-training objective to see a significant finetuning advantage, hinting at the orthogonality of classification and generation tasks. That is, for a model to transfer to both generative and classification downstream tasks, it must be pre-trained on both. This study provides a template for controlled scaling analysis of pre-training objectives for foundation models in simulation-based sciences.

2606.14958 2026-06-16 cs.CV cs.IR cs.LG 交叉投稿

MVEB: Massive Video Embedding Benchmark

MVEB：大规模视频嵌入基准

Adnan El Assadi, Roman Solomatin, Isaac Chung, Chenghao Xiao, Deep Shah, Manan Dey, Shriya Sudhakar, Zacharie Bugaud, Wissam Siblini, Ayush Sunil Munot, Yashwanth Devavarapu, Rakshitha Ireddi, Michelle Yang, Márton Kardos, Niklas Muennighoff, Kenneth Enevoldsen

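AI summary: Introduces the MVEB benchmark, which evaluates 33 video embedding models on 23 tasks, finds that no single model dominates and that audio's contribution depends on annotation provenance, and integrates into the MTEB ecosystem.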

详情

AI中文摘要

我们介绍了大规模视频嵌入基准（MVEB），这是一个包含23个任务的视频嵌入基准，涵盖分类、零样本分类、聚类、配对分类、检索和以视频为中心的问答。我们评估了33个模型，发现没有单一模型占优：基于MLLM的嵌入在分类、聚类、配对分类和问答上领先；多模态绑定在检索和零样本分类上领先；没有对比适应训练的生成式MLLM在跨模态任务上崩溃。成对的仅视频与音频+视频评估表明，音频的贡献取决于数据集标注来源：当标签来自两种模态时音频有帮助，当仅来自视觉时则有害，这一差距在模型族中一致为6个百分点。MVEB源自MVEB+（一个包含184个任务的任务池），旨在保持任务多样性的同时降低评估成本。它集成到MTEB生态系统中，以实现跨文本、图像、音频和视频的统一评估。我们在https://github.com/embeddings-benchmark/mteb上发布MVEB和所有184个任务，以及代码和排行榜。

英文摘要

We introduce the Massive Video Embedding Benchmark (MVEB), a 23-task benchmark for video embeddings spanning classification, zero-shot classification, clustering, pair classification, retrieval, and video-centric question answering. We evaluate 33 models and find that no single model dominates: MLLM-based embeddings lead on classification, clustering, pair classification, and QA; multimodal binding leads on retrieval and zero-shot classification; generative MLLMs without contrastive adaptation collapse on cross-modal tasks. Paired video-only vs. audio+video evaluations show that audio's contribution depends on dataset annotation provenance: audio helps when labels were produced from both modalities and hurts when they were produced from visuals alone, a six-point gap consistent across model families. MVEB is derived from MVEB+, a 184-task pool, and is designed to maintain task diversity while reducing evaluation cost. It integrates into the MTEB ecosystem for unified evaluation across text, image, audio, and video. We release MVEB and all 184 tasks along with code and a leaderboard at https://github.com/embeddings-benchmark/mteb.

2606.15123 2026-06-16 cs.CR cs.LG 交叉投稿

Data-Centric Benchmarking of Exploit Generation in LLMs: Understanding the Impact of Fine-Tuning

数据为中心的LLM漏洞利用生成基准测试：理解微调的影响

Yiwei Chen, Lichi Li, Kai Cheung, Vinny Parla, Ganesh Sundaram

发表机构 * Cisco Systems, Inc.（思科系统公司）； Michigan State University（密歇根州立大学）

AI总结采用数据驱动方法，构建高质量数据集并设计评估框架，对17个大语言模型进行零样本漏洞利用生成能力基准测试，发现8B开源模型经微调后性能提升超42.5%，接近部分商业模型。

Comments Technical Report

AI中文摘要

我们研究了CVE条件漏洞利用生成任务，即模型根据软件漏洞上下文生成概念验证（PoC）漏洞利用。我们采用数据驱动的方法，通过多阶段预处理构建高质量数据集，并引入可扩展的评估框架，使用LLM作为评判者和细粒度评分标准。在此统一设置下，我们根据8个评估标准对17个大语言模型进行了基准测试，系统性地洞察了它们的零样本能力。我们进一步证明，一个紧凑的8B开源模型在精选数据上微调后，漏洞利用质量提升了超过42.5%，并且当与简单的测试时拒绝策略结合时，可与一些专有模型相媲美。我们的结果强调了数据质量、结构化监督和评估设计对于可靠漏洞利用生成的重要性，表明这些因素在将LLM适应网络安全任务时可能与模型规模同等关键。

英文摘要

We study the task of CVE-conditioned exploit generation, where a model drafts proof-of-concept (PoC) exploits given software vulnerability context. We adopt a data-centric approach, constructing a high-quality dataset via multi-stage preprocessing and introducing a scalable evaluation framework with LLM-as-judge and fine-grained rubrics. Under this unified setup, we benchmark 17 large language models across 8 evaluation criteria, providing systematic insights into their zero-shot capabilities. We further show that a compact 8B open-weight model, when fine-tuned on curated data, achieves over 42.5% improvement in exploit quality and rivals some proprietary models when combined with simple test-time rejection strategies. Our results highlight the importance of data quality, structured supervision, and evaluation design for reliable exploit generation, suggesting that these factors can be as critical as model scale in adapting LLMs to cybersecurity tasks.

2606.15367 2026-06-16 cs.AI cs.CL cs.IR cs.LG 交叉投稿

S1-DeepResearch: Beyond Search, Toward Real-World Long-Horizon Research Agents

S1-DeepResearch：超越搜索，迈向真实世界的长周期研究智能体

Yao Dong, Xinglin Xiao, Liwei Dong, Xinlong Jin, Zhengbo Li, Heng Zhang, Duyun Wang, Nan Xu

发表机构 * XScience Lab（XScience实验室）； Wenge AI（问格人工智能）

AI总结提出统一轨迹构建范式，结合封闭式问答与开放式探索，通过图基任务构建、智能体轨迹生成和多维验证，合成高质量长链推理轨迹，训练出在20个基准上达到开源最优的32B模型。

AI中文摘要

深度研究智能体旨在通过长周期规划、证据收集、推理和报告生成来解决复杂的知识密集型任务。尽管搜索智能体近期在信息检索和答案验证方面展现出强大能力，但现有训练数据集大多以搜索为中心，主要关注封闭式问答和信息定位。因此，它们主要训练信息寻求行为，而对关键深度研究能力（包括证据整合、知识综合、规划、文件理解和结构化报告生成）的覆盖有限。在这项工作中，我们提出了一种用于深度研究智能体的统一轨迹构建范式，该范式结合了封闭式问答和开放式探索。所提出的框架包括图基任务构建、智能体轨迹展开和多维轨迹验证，能够可扩展地合成涵盖长链复杂推理、深度研究指令遵循、报告撰写、文件理解与生成以及技能使用的高质量智能体轨迹。与现有的面向搜索的数据集相比，我们合成的轨迹更强调知识综合、复杂推理和规划。S1-DeepResearch-32B在跨越五个能力维度（包括复杂推理、指令遵循、报告生成、文件理解和技能使用）的20个基准测试中，达到了同等规模开源模型的最先进性能。在几个具有挑战性的深度研究基准上，它接近领先的专有前沿模型的性能。这些结果强调了联合建模信息获取、知识综合和面向规划的智能体行为对于构建有效深度研究智能体的重要性。

英文摘要

Deep research agents aim to solve complex knowledge-intensive tasks through long-horizon planning, evidence gathering, reasoning, and report generation. While recent progress in search agents has demonstrated strong capabilities in information retrieval and answer verification, most existing training datasets remain search-centric, focusing primarily on closed-ended question answering and information localization. As a result, they mainly train information-seeking behavior while providing limited coverage of key deep research capabilities, including evidence integration, knowledge synthesis, planning, file understanding, and structured report generation. In this work, we propose a unified trajectory construction paradigm for deep research agents that combines closed-ended QA and open-ended exploration. The proposed framework consists of graph-grounded task formulation, agentic trajectory rollout, and multi-dimensional trajectory verification, enabling scalable synthesis of high-quality agentic trajectories spanning long-chain complex reasoning, deep research instruction following, report writing, file understanding and generation, and skills usage. Compared with existing search-oriented datasets, our synthesized trajectories place greater emphasis on knowledge synthesis, complex reasoning, and planning. S1-DeepResearch-32B achieves state-of-the-art performance among open-source models of comparable scale across 20 benchmarks spanning five capability dimensions, including complex reasoning, instruction following, report generation, file understanding, and skills usage. On several challenging deep research benchmarks, it approaches the performance of leading proprietary frontier models. These results highlight the importance of jointly modeling information acquisition, knowledge synthesis, and planning-oriented agent behaviors for building effective deep research agents.

2606.15532 2026-06-16 cs.CL cs.LG 交叉投稿

哪里出错了？基于语义状态追踪的Web智能体过程级评估

Jiwan Chung, JiHyuk Byun, Vibhav Vineet, Seon Joo Kim

发表机构 * Yonsei University（延世大学）； Microsoft Research（微软研究院）

AI总结提出WebStep基准，通过语义MDP追踪过程状态，揭示隐藏于终端成功率下的智能体差异，并定位具体改进方向。

AI中文摘要

Web智能体通过长交互序列执行任务，然而现有基准仅评估终端成功，丢弃所有过程信息，对改进提供的指导有限。在这项工作中，我们对Web智能体进行了过程级分析。我们引入了WebStep，一个包含1800个任务实例的基准，具有可控难度和自动语义状态追踪。每个网站除了GUI外还暴露一个确定性的语义MDP：智能体在界面上操作，而环境在后台记录高级状态和转换，从而实现无需人工标注的细粒度分析。基于语义轨迹，我们首先表明过程度量揭示了结果评估无法察觉的差异：三个成功率集中在31-33%的智能体在探索范围与执行准确性上存在分歧。然后，按技能分解刻画了这些差异的本质，揭示了同一网站内隐藏的相反技能排名：例如，在Housing上，OpenAI CUA在提交动作上优于Qwen3.5 23.7%，但在过滤上却落后15.6%，精确指出了即使在单个领域内也需要改进的具体技能。分叉分析进一步定位了导致任务失败的决定性错误，并表明该错误是智能体特定的而非共享的。最后，随着任务难度增加，这些差异扩大：在简单任务上成功率相似，但随着探索要求提高而急剧分化。我们的过程级分析为Web智能体评估开辟了新途径，提供了关于每个智能体应在何处以及如何改进的细粒度且可操作的见解。

英文摘要

Web agents act through long interaction sequences, yet existing benchmarks evaluate only terminal success, discarding all process information and offering little guidance on improvement. In this work, we conduct a process-level analysis of web agents. We introduce WebStep, a benchmark of 1,800 task instances with controlled difficulty and automatic semantic state tracking. Each website exposes a deterministic semantic MDP alongside the GUI: the agent operates on the interface, while the environment records high-level states and transitions in the background, enabling fine-grained analysis without manual annotation. Based on the semantic trajectory, we first show that process metrics reveal differences invisible to outcome evaluation: three agents whose success rates cluster within 31-33% diverge in exploration reach versus execution accuracy. Then, decomposing by skill characterizes the nature of these differences, exposing opposite per-skill rankings hidden within the same website: e.g., on Housing, OpenAI CUA outperforms Qwen3.5 by 23.7% on commit actions yet underperforms it by 15.6% on filtering, pinpointing a concrete skill to improve even within a domain. Bifurcation analysis further localizes the decisive error that loses the task and shows that this error is agent-specific rather than shared. Finally, these differences widen as tasks grow harder: success rate is similar on easy tasks but separates sharply as exploration becomes more demanding. Our process-level analysis opens a new avenue in web agent evaluation, providing fine-grained and actionable insight into where and how each agent should be improved.

2606.15686 2026-06-16 cs.AI cs.LG 交叉投稿

实体标签并非实体信号：文档重排序中可观测相关性的框架

Utshab Kumar Ghosh, Shubham Chatterjee

发表机构 * Department of Computer Science, Missouri University of Science and Technology（计算机科学系，密苏里科技大学）

AI总结提出实体可观测相关性（OER）与概念相关性（CER）的区分，证明CER监督效果差，而OER对齐可显著提升重排序性能。

Comments ICTIR '26

DOI: 10.1145/3805713.3820411
Journal ref: Proceedings of the 2026 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR)

AI中文摘要

实体感知的文档检索使用与查询关联的实体作为排序信号，假设语义相关的实体也是有用的检索信号。我们证明这一假设是不充分的，并解释原因。与作为真实观测的词项不同，实体链接是由不完美的链接器产生的假设：如果链接器在相关和非相关文档中无差别地触发，那么一个实体可能在主题上重要，却不提供任何判别性信号。我们将此形式化为概念实体相关性（CER）——实体是否与查询主题相关——和可观测实体相关性（OER）——其在集合中的观测出现是否能区分相关与非相关文档——之间的区别。在四个集合和包括人工实体判断的标注来源上，CER和OER表现出接近随机的吻合度（κ≈0），而OER的操作化实现吻合度较高（κ≈0.5），确认CER是系统性异常值。基于CER的监督选择主题上合理但判别性弱的实体，在某些集合上仅能过滤不到4%的非相关文档。将监督与OER对齐可将非相关文档过滤提升至10倍，并在BM25基础上将开放世界MAP提升0.051。我们的发现促使实体感知检索中从概念实体相关性向可观测实体相关性的转变。

英文摘要

Entity-aware document retrieval uses query-associated entities as ranking signals, assuming that semantically relevant entities are also useful retrieval signals. We show that this assumption is insufficient and explain why. Unlike terms, which are ground-truth observations, entity links are hypotheses produced by an imperfect linker: an entity can be topically central yet provide no discriminative signal if the linker fires indiscriminately across relevant and non-relevant documents. We formalize this as a distinction between Conceptual Entity Relevance (CER), whether an entity is topically related to a query, and Observable Entity Relevance (OER), whether its observed presence in a collection discriminates relevant from non-relevant documents. Across four collections and annotation sources including human entity judgments, CER and OER exhibit near-chance agreement ($\kappa \approx 0$), while OER operationalizations agree substantially ($\kappa \approx 0.5$), confirming CER as the systematic outlier. CER-based supervision selects topically plausible but weakly discriminative entities, pruning fewer than 4% of non-relevant documents on some collections. Aligning supervision with OER improves non-relevant pruning by up to 10x and open-world MAP by 0.051 over BM25. Our findings motivate a shift from conceptual to observable notions of entity relevance in entity-aware retrieval.
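
The agreement figures in this abstract are reported as a kappa statistic. As a rough illustration of what "near-chance agreement" means, the sketch below computes Cohen's kappa for two label sequences. It assumes the reported statistic is Cohen's kappa over binary entity labels, which the abstract does not state, and the `cer` and `oer` lists are made-up toy labels rather than data from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both labelings coincide.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: expected overlap if the labelings were independent.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelings use one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary entity labels (1 = relevant); not data from the paper.
cer = [1, 1, 0, 0, 1, 1, 0, 0]
oer = [1, 0, 1, 0, 1, 0, 1, 0]
kappa = cohens_kappa(cer, oer)
```

With these toy labels the two sequences agree on half of the items, which is exactly what chance predicts for balanced labels, so kappa is zero even though raw agreement is 50%.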

2606.16062 2026-06-16 cs.AI cs.LG 交叉投稿

Auditing Reward Hackability in Code RL Training Environments

审计代码强化学习训练环境中的奖励可破解性

Shreshth Rajan

发表机构 * GitHub

AI总结测量代码RL环境接受错误解决方案的比率，发现SWE-bench Verified中28.5%的任务测试套件薄弱，并提出通过LLM判断器和Docker金标准门控来加固漏洞任务的方法。

AI中文摘要

我们测量了代码强化学习环境将错误解决方案视为正确的比率。在SWE-bench Verified的49个任务样本中，28.5%的任务测试套件足够薄弱，以至于Docker验证的错误补丁能通过它们。在6个代码库的20个R2E-Gym任务上，相同的单次利用生成管道产生25.0%的成功率。对SWE-bench Verified上134个前沿模型提交的随机效应荟萃分析发现，在相同人工评定的难度层级内，模型Pass@1在标记为可破解的任务上比稳健任务高14.14个百分点（95%置信区间[+11.80, +16.48]；单侧p < 10^-6；I^2 = 0%；134个模型中有123个为正）。然后我们描述了一个加固被破坏任务的流程。一个内联LLM判断器配合Docker金标准门控，在咨询判断器之前对每个生成的测试针对金标准解决方案运行。在审计中的11个被破坏任务上，门控标记出105个决定性的LLM生成测试中的65个在金标准补丁上失败，这是LLM判断器单独遗漏的61.9%的每次增强缺陷率。通过多样性偏置重试，该循环将11个任务中的9个收敛到门控升级。

英文摘要

We measure the rate at which code RL environments accept incorrect solutions as correct. On a 49-task sample of SWE-bench Verified, 28.5% of tasks have test suites weak enough that a Docker-verified incorrect patch passes them. On 20 R2E-Gym tasks across 6 repositories, the same pipeline at single-shot exploit generation yields 25.0%. A random-effects meta-analysis over 134 frontier model submissions to SWE-bench Verified finds, within the same human-rated difficulty stratum, model Pass@1 is +14.14 percentage points higher on flagged-hackable tasks than on robust ones (95% CI [+11.80, +16.48]; one-sided p < 10^-6; I^2 = 0%; 123 of 134 models positive). We then describe a procedure for hardening the broken tasks. An inline LLM judge with a Docker gold-sanity gate runs each generated test against the gold solution before the judge is consulted. On the 11 broken tasks in the audit, the gate flags 65 of 105 decisive LLM-generated tests as failing on the gold patch itself, a 61.9% per-augmentation defect rate the LLM judge alone misses. With diversity-biased retry, the loop converges 9 of 11 tasks to a gated upgrade.
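
The gold-sanity gate described above can be pictured as a filter that runs every generated test against the gold solution before any judge sees it. The sketch below is only an illustration under stated assumptions: `run_on_gold` is a hypothetical callable standing in for the Docker execution step, and the toy test names simply reproduce the abstract's counts (65 of 105 tests failing on the gold patch).

```python
def gold_sanity_gate(generated_tests, run_on_gold):
    """Split generated tests by whether they pass on the gold solution.

    run_on_gold is a hypothetical callable: test -> True if the test passes
    when executed against the gold patch (for example inside a container).
    """
    kept, rejected = [], []
    for test in generated_tests:
        (kept if run_on_gold(test) else rejected).append(test)
    return kept, rejected


def defect_rate(num_rejected, num_total):
    """Share of generated tests that fail on the gold solution, in percent."""
    return 100.0 * num_rejected / num_total


# Toy stand-in for the abstract's counts: 105 decisive tests, 65 fail on gold.
tests = [f"test_{i}" for i in range(105)]
failing = set(tests[:65])
kept, rejected = gold_sanity_gate(tests, lambda t: t not in failing)
rate = round(defect_rate(len(rejected), len(tests)), 1)
```

Here `rate` reproduces the 61.9% per-augmentation defect rate quoted in the abstract, since 65 / 105 is about 0.619.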

2606.16113 2026-06-16 cs.AI cs.LG 交叉投稿

通过语义约束验证评估LLM个性化

Xuran Li, Guanqin Zhang, Imran Razzak, Hakim Hacid, Eleanna Kafeza, Hao Xue, Flora D. Salim

发表机构 * University of New South Wales（新南威尔士大学）； Mohamed bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）； The Technology Innovation Institute（技术创新研究所）； The Hong Kong University of Science and Technology（香港科技大学）

AI总结提出NLICV框架，利用自然语言推理模型将句子映射到真值条件集，验证个性化约束，将LLM行为分为四类，与人类标注高度一致，并大幅降低延迟和成本。

AI中文摘要

当前大型语言模型（LLM）个性化的评估范式严重依赖于脆弱的表面匹配指标或计算成本高昂的LLM作为评判者的协议，两者都缺乏可解释性。为了解决这些局限性，我们引入了自然语言推理约束验证（NLICV），这是一个可扩展的、语义不变的框架，它将句子含义映射到真值条件集，通过自然语言推理（NLI）模型验证个性化约束。超越二元评分，NLICV将LLM行为分为四种不同模式：个性化、泛化、谄媚和失败。大量实验表明，NLICV与人工标注高度一致，同时大幅降低了与LLM评判者相关的延迟和令牌成本（高达2100倍推理加速）。最后，通过基于消融的程序，NLICV精确定位驱动约束验证的准确句子，为其评估提供忠实、可理解的证据。

英文摘要

Current evaluation paradigms for Large Language Model (LLM) personalization rely heavily on brittle surface-matching metrics or computationally expensive LLM-as-a-judge protocols, both of which lack interpretability. To address these limitations, we introduce Natural Language Inference Constraint Verification (NLICV), a scalable, semantically invariant framework that maps sentence meanings to truth-condition sets to verify personalization constraints via a Natural Language Inference (NLI) model. Moving beyond binary scoring, NLICV categorizes LLM behaviors into four distinct modes: personalization, generalization, sycophancy, and failure. Extensive experiments demonstrate that NLICV aligns closely with human annotations while drastically reducing the latency and token costs associated with LLM judges (up to a 2100x inference speedup). Finally, through an ablation-based procedure, NLICV pinpoints the exact sentences driving the constraint verification, yielding faithful, understandable evidence for its evaluations.

2606.16540 2026-06-16 q-bio.QM cs.LG q-bio.BM q-bio.GN 交叉投稿

MultiMolecule: a modular ecosystem for biomolecular sequence-model workflows

MultiMolecule: 一个用于生物分子序列模型工作流的模块化生态系统

Zhiyuan Chen

发表机构 * DanLing Team（DanLing团队）

AI总结提出MultiMolecule开源生态系统，通过标准化接口整合RNA、DNA和蛋白质序列模型，提供53个模型族实现、112个检查点和16个数据集资源，支持模型复用、评估和生物预测。

AI中文摘要

生物分子序列模型越来越多地被用于最初研究之外的任务，但公开的检查点很少保留检查源定义行为、适应新实验、在共享任务定义下比较模型或部署生物预测所需的执行上下文。MultiMolecule是一个开源Python生态系统，它将异质的RNA、DNA和蛋白质序列模型发布转变为完整的、经过源检查的模型族实现，并带有共享的加载、工作流和预测接口。此处报告的Resource状态包括53个完整的模型族实现，包含112个标准化的模型检查点，以及通过39个公共数据集仓库发布的16个精选数据集资源和10个面向用户的预测管道。标准化组件链接到源出处、转换或准备代码、源参考检查、扩展数据摘要和公共文档，允许用户检查哪些内容被标准化、哪些行为被检查以及每个组件如何进入训练、评估、推理或部署。通过将复用从特定仓库的检查点转移到与标准化检查点、精选数据集、Runner工作流和生物预测管道相连的可执行实现，MultiMolecule为保留源定义的模型行为、适应新实验、实现受控评估和部署生物分子预测提供了通用基础设施。

英文摘要

Biomolecular sequence models are increasingly reused outside the studies in which they were introduced, but public checkpoints rarely preserve the execution context needed to inspect source-defined behavior, adapt models to new assays, compare models under shared task definitions or deploy biological predictions. MultiMolecule is an open-source Python ecosystem that turns heterogeneous RNA, DNA and protein sequence-model releases into complete, source-checked model-family implementations with shared loading, workflow and prediction interfaces. The Resource state reported here includes 53 complete model-family implementations with 112 standardized model checkpoints, together with 16 curated dataset resources released through 39 public dataset repositories and 10 user-facing prediction pipelines. Standardized components are linked to source provenance, conversion or preparation code, source-reference checks, Extended Data summaries and public documentation, allowing users to inspect what was standardized, what behavior was checked and how each component enters training, evaluation, inference or deployment. By shifting reuse from repository-specific checkpoints to executable implementations connected to standardized checkpoints, curated datasets, Runner workflows and biological prediction pipelines, MultiMolecule provides common infrastructure for preserving source-defined model behavior, adapting models to new assays, enabling controlled evaluation and deploying biomolecular predictions.

2606.16541 2026-06-16 cs.AI cs.LG 交叉投稿

The Faithfulness Gap: Certifying Semantic Equivalence Between Natural-Language and Formal Mathematical Statements

忠实性差距：认证自然语言与形式数学语句之间的语义等价性

Noor Islam S. Mohammad, Tamim Sheikh

发表机构 * Department of Computer Science, Informatics Institute, Istanbul Technical University, İstanbul, Türkiye（土耳其伊斯坦布尔，伊斯坦布尔技术大学信息学研究所计算机科学系）； Department of Computer Science & Engineering, Jashore University of Science & Technology（贾肖尔科技大学计算机科学与工程系）

AI总结提出双向可证明性指纹识别框架，通过前向和后向推论邻域匹配自然语言探针，认证自动形式化翻译的忠实性，并引入反事实探针生成、等价谱、自适应探针预算分配和忠实性引导解码四个新组件，在基准上实现高检测率并减少漂移。

AI中文摘要

自动形式化（将自然语言数学翻译到形式证明助手中）的瓶颈不在于翻译流畅性，而在于忠实性：一个形式语句可以通过类型检查且可证明，但仍可能编码与源意图不同的定理。我们引入双向可证明性指纹识别（BPF），这是一个通过刻画每个候选在背景理论中的前向和后向推论邻域，并将这些邻域与从自然语言语句导出的探针进行匹配来认证忠实性的框架。我们进一步引入四个新组件：（i）反事实探针生成（CPG），一种合成针对特定漂移方向的探针的对比性程序；（ii）等价谱，一个替代脆弱的二元判决的连续忠实性分数；（iii）自适应探针预算分配（APBA），一个信息论预算路由器；以及（iv）忠实性引导解码（FGD），它在自动形式化过程中使用BPF信号作为奖励。我们证明了一个漂移检测定理和一个PAC-忠实性结果，该结果确立了在温和假设下，自然语言语句的等价类可以从$\mathcal{O}(\log(1/\delta)/\varepsilon)$个探针中学习。我们发布了DriftBench，一个包含2,183个NL/Lean 4对的基准，这些对具有跨mathlib4六个子领域的受控漂移标签。BPF + CPG在3.0%的假阳性率下检测出89.6%的漂移形式化（相比之下，类型检查为41.2%，LLM评判基线为63.3%），并且FGD将最先进的自动形式化器产生漂移语句的比率降低了47%。https://pmlrbd.github.io/BPF/

英文摘要

Autoformalization, translating natural-language mathematics into formal proof assistants, is bottlenecked not by translation fluency but by faithfulness: a formal statement can typecheck and be provable, yet still encode a different theorem than the source intended. We introduce Bidirectional Provability Fingerprinting (BPF), a framework that certifies faithfulness by characterizing each candidate through its forward and backward consequence neighborhoods in the ambient theory and matching these against probes derived from the natural-language statement. We further introduce four novel components: (i) Counterfactual Probe Generation (CPG), a contrastive procedure that synthesizes probes targeting specific drift directions; (ii) the Equivalence Spectrum, a continuous faithfulness score that replaces brittle binary verdicts; (iii) Adaptive Probe Budget Allocation (APBA), an information-theoretic budget router; and (iv) Faithfulness-Guided Decoding (FGD), which uses BPF signals as a reward during autoformalization. We prove a drift detection theorem and a PAC-faithfulness result establishing that the equivalence class of a natural language statement is learnable from $\mathcal{O}(\log(1/\delta)/\varepsilon)$ probes under mild assumptions. We release DriftBench, a benchmark of 2,183 NL/Lean 4 pairs with controlled drift labels across six subfields of mathlib4. BPF + CPG detects 89.6% of drifted formalizations at a 3.0% false-positive rate, against 41.2% for typecheck and 63.3% for LLM-judge baselines, and FGD reduces the rate at which a state-of-the-art autoformalizer emits drifted statements by 47%. https://pmlrbd.github.io/BPF/
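
To get a feel for the probe-count rate quoted in this abstract, the sketch below evaluates log(1/delta)/epsilon for concrete values. The abstract gives only the asymptotic rate, so the `constant` parameter and the function name `probe_budget` are assumptions made for illustration; the numbers show how the bound scales, not the guarantee proved in the paper.

```python
import math


def probe_budget(epsilon, delta, constant=1.0):
    """Evaluate constant * log(1/delta) / epsilon, rounded up to an integer.

    The abstract states only the O(.) rate; `constant` is an assumed
    placeholder, so the result illustrates scaling rather than a real bound.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil(constant * math.log(1.0 / delta) / epsilon)


# Halving epsilon doubles the budget; shrinking delta costs only a log factor.
budget_coarse = probe_budget(epsilon=0.1, delta=0.05)
budget_fine = probe_budget(epsilon=0.05, delta=0.05)
```

Under the assumed constant of 1, the budget grows linearly in 1/epsilon but only logarithmically in 1/delta, which is the practical reading of the stated rate.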

2606.16555 2026-06-16 cs.DC cs.LG 交叉投稿

Incentives and Evidence in Learned Service Orchestration

学习型服务编排中的激励与证据

Syed Izhan Khilji, Alireza Furutanpey, Schahram Dustdar

发表机构 * EPFL（洛桑联邦理工学院）； University of Waterloo（滑铁卢大学）

AI总结本文通过预注册实验检验三个有影响力的基于强化学习的服务编排系统，发现大多数性能反转未发生，并指出发表和评审激励偏向基准增益而非部署性能，提出需要生产级比较器和可复现操作证据。

Comments To be presented at the IEEE 2026 International Congress on Intelligent and Service Oriented Systems Engineering (CISOSE 2026)

AI中文摘要

强化学习用于服务编排已持续研究超过十年，但尚未在大规模生产中应用。通常的解释是学习型控制器在延迟和噪声遥测、工作负载变化以及不受控制的租户下性能下降。我们检验现有证据是否支持这一解释。我们评估了三个极具影响力的基于RL的编排系统，涵盖资源分配、DAG调度和自动缩放，使用预注册的关于在生产相关扰动下比较退化的预测，以及带有族系误差校正的配对推断。在测试中，大多数预测的性能反转并未发生。诊断分析表明，这些结果通常反映的是比较器崩溃、工件限制或评估选择，而非学习型控制器容忍扰动的证据。一个在观测滞后下的明显优势大约是Kubernetes HPA等效控制器的四十倍。另一个广泛引用的结果无法从其发布的工件中重建，且最强的可复现边际远小于已发表的结果。结论也会在扰动幅度和评估模式的变化下发生逆转。基于这些结果和文献中的更广泛模式，我们识别出一个制度性问题。发表和评审激励偏向于针对便捷比较器的基准增益，即使这些增益几乎不能提供部署性能的证据。我们认为问题不仅仅是技术性的，而是制度性的，因此学习型编排需要生产级比较器、注册扰动模型、独立的操作指标，以及奖励可复现操作证据的发表标准。没有这些改变，文献可以增长，但无法确定学习是否改进了编排。

英文摘要

Reinforcement learning for service orchestration has been the subject of sustained research for over a decade, yet it is not used in production at scale. The usual explanation is that learned controllers degrade under delayed and noisy telemetry, workload shifts, and uncontrolled tenants. We test whether existing evidence supports that explanation. We evaluate three highly influential RL-based orchestration systems spanning resource allocation, DAG scheduling, and autoscaling, using pre-registered predictions about comparative degradation under production-relevant perturbations and paired inference with family-wise error correction. Across the tests, most predicted performance reversals do not occur. Diagnostic analyses show that these outcomes often reflect comparator collapse, artefact limitations, or evaluation choices rather than evidence that learned controllers tolerate the perturbations. One apparent advantage under observation lag is roughly fortyfold compared to a Kubernetes HPA-equivalent controller. Another widely cited result cannot be reconstructed from its released artefact, and the strongest reproducible margin is far smaller than the published results. Conclusions also reverse under changes in perturbation magnitude and evaluation mode. Based on these results and broader patterns in the literature, we identify an institutional problem. Publication and review incentives favour benchmark gains against convenient comparators, even when those gains provide little evidence of deployment performance. We argue that the problem is not solely technical. Rather, it is institutional, so learned orchestration needs production-grade comparators, registered perturbation models, separate operational metrics, and publication criteria that reward reproducible operational evidence. Without these changes, the literature can grow without establishing whether learning improves orchestration.

2606.16612 2026-06-16 cs.SD cs.LG cs.MM 交叉投稿

Beyond Artifacts: Towards Generalizable Synthetic Song Detection via Music-Intrinsic Features

超越伪影：基于音乐内在特征的可泛化合成歌曲检测

Yan Han, Zhibin Wen, Yuan Wang, Shuangrun Shao, Xiaobing Li, Yang Xu, Wei Li

发表机构 * Central Conservatory of Music（中央音乐学院）； Southern University of Science and Technology（南方科技大学）； Fudan University（复旦大学）

AI总结提出Sofia框架，通过特征特定专家和自适应混合专家模型利用音乐内在特征（人声、音频效果、全局结构）进行合成歌曲检测，在MUSIC8K基准上F1提升18.5点，具有强鲁棒性。

AI中文摘要

超越重平衡：在不使用重平衡技术的情况下对类别不平衡下的二分类器进行基准测试

Ali Nawaz, Amir Ahmad, Shehroz S. Khan

发表机构 * Department of Information Systems and Security, College of Information Technology and Center for Artificial Intelligence and Digital Innovation, United Arab Emirates University（阿联酋大学信息技术学院信息系统与安全系、人工智能与数字创新中心）； College of Engineering and Technology, American University of the Middle East（中东美国大学工程与技术学院）

AI总结本研究系统评估了多种二分类器在无显式重平衡技术下对类别不平衡的鲁棒性，发现TabPFN和基于提升的集成模型在极端不平衡下仍保持较高性能。

AI中文摘要

类别不平衡对监督分类构成了重大挑战，特别是在医疗诊断和异常检测等关键领域，其中少数类实例很少。尽管许多研究探索了重平衡技术来解决这个问题，但在未应用此类技术的情况下评估不平衡下二分类器性能的关注较少。因此，本研究的目标是评估二分类器“原样”的性能，而不执行任何显式重平衡。具体来说，我们系统评估了多种二分类器在真实世界和合成数据集上的鲁棒性，在逐步减少的少数类规模下，使用一次和少量样本场景作为基线。我们的方法还通过合成决策边界生成探索不同的数据复杂性，以模拟真实世界条件。除了标准分类器，我们还包括使用欠采样、过采样策略和单类分类方法的实验，以检查它们在严重不平衡下的行为。结果证实，随着数据复杂性增加和少数类规模减小，分类变得更加困难。虽然传统分类器在极端不平衡下性能下降，但像TabPFN和基于提升的集成模型等先进模型相比传统分类器保持了相对更高的性能和更好的泛化能力。可视化可解释性和评估指标进一步验证了这些发现。我们的工作为不平衡学习中的模型选择提供了有价值的指导，提供了关于分类器鲁棒性而不依赖显式重平衡技术的见解。

英文摘要

Class imbalance poses a significant challenge to supervised classification, particularly in critical domains like medical diagnostics and anomaly detection where minority class instances are rare. While numerous studies have explored rebalancing techniques to address this issue, less attention has been given to evaluating the performance of binary classifiers under imbalance when no such techniques are applied. Therefore, the goal of this study is to assess the performance of binary classifiers "as-is", without performing any explicit rebalancing. Specifically, we systematically evaluate the robustness of a diverse set of binary classifiers across both real-world and synthetic datasets, under progressively reduced minority class sizes, using one-shot and few-shot scenarios as baselines. Our approach also explores varying data complexities through synthetic decision boundary generation to simulate real-world conditions. In addition to standard classifiers, we include experiments using undersampling, oversampling strategies, and one-class classification (OCC) methods to examine their behavior under severe imbalance. The results confirm that classification becomes more difficult as data complexity increases and the minority class size decreases. While traditional classifiers deteriorate under extreme imbalance, advanced models like TabPFN and boosting-based ensembles retain relatively higher performance and better generalization compared to traditional classifiers. Visual interpretability and evaluation metrics further validate these findings. Our work offers valuable guidance on model selection for imbalanced learning, providing insights into classifier robustness without dependence on explicit rebalancing techniques.
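
The evaluation protocol above shrinks the minority class step by step, down to few-shot and one-shot settings, without rebalancing afterwards. A minimal sketch of that subsampling step is shown below; the helper name `shrink_minority`, the toy data, and the chosen sizes are all hypothetical and are not taken from the paper.

```python
import random


def shrink_minority(samples, labels, minority_label, keep, seed=0):
    """Keep every majority sample and only `keep` random minority samples."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    if keep > len(minority_idx):
        raise ValueError("keep exceeds the number of minority samples")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept_minority = set(rng.sample(minority_idx, keep))
    chosen = [i for i, y in enumerate(labels)
              if y != minority_label or i in kept_minority]
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# Hypothetical toy data: 20 majority (0) and 10 minority (1) samples.
X = list(range(30))
y = [0] * 20 + [1] * 10
for keep in (10, 5, 1):  # ends at the one-shot setting
    X_k, y_k = shrink_minority(X, y, minority_label=1, keep=keep)
```

Each reduced set would then be passed to a classifier as-is, with no oversampling, undersampling, or class weighting, which is the condition the study sets out to measure.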

2510.14217 2026-06-16 cs.LG physics.chem-ph 版本更新

Spectral Analysis of Molecular Features: When Richer Features Do Not Guarantee Better Generalization

分子特征的谱分析：更丰富的特征并不保证更好的泛化

Asma Jamali, Tin Sum Cheng, Rodrigo A. Vargas-Hernández

发表机构 * School of Computational Science and Engineering, McMaster University, Canada（麦克马斯特大学计算科学与工程学院）； Department of Chemistry and Chemical Biology, McMaster University, Canada（麦克马斯特大学化学与化学生物学系）； Department of Mathematics and Computer Science, University of Basel, Switzerland（巴塞尔大学数学与计算机科学系）； Brockhouse Institute for Materials Research, McMaster University, Canada（麦克马斯特大学布罗克豪斯材料研究所）

AI总结通过核岭回归对多种分子表示进行谱分析，发现更丰富的谱特征并不一致地提升泛化性能，挑战了自监督学习中表示越丰富越好的启发式方法。

Comments 11 pages, 7 figures, 3 tables, SI: 13 pages, 9 figures, 4 Tables

AI中文摘要

特征嵌入的谱特性为模型泛化和表示质量提供了关键见解。虽然深度学习模型广泛用于分子性质预测，但核方法在低数据场景下仍具有竞争力，然而其谱行为尚未被充分探索。我们首次对核岭回归在多种表示（包括分子指纹ECFP、预训练变换器、图神经网络和3D描述符）上的谱特性进行了全面分析，并在QM9和3个MoleculeNet基准上进行了评估。令人惊讶的是，更丰富的谱特征并不一致地产生更好的泛化性能，这与自监督学习中常用的表示启发式方法相矛盾。在4个谱指标中，只有基于ECFP的核与性能呈严格正相关。变换器和全局3D表示表现出混合行为，而局部3D表示则始终呈负相关。截断分析进一步强调了这种差异：对于热力学目标上的局部3D表示，仅需不到2%的特征值（有时低至0.02%）即可恢复95%的性能，而ECFP和变换器核则需要显著更多的特征值。通过证明对任务和表示的强烈依赖性，我们的结果挑战了更丰富的谱固有地改善泛化的启发式方法，为自监督学习和标签有限的科学任务中的表示评估提供了新指导。

英文摘要

The spectral properties of feature embeddings offer critical insights into model generalization and representation quality. While deep learning models are widely used for molecular property prediction, kernel methods remain competitive in low-data regimes, yet their spectral behavior is largely unexplored. We present the first comprehensive spectral analysis of kernel ridge regression across diverse representations, including molecular fingerprints (ECFP), pretrained transformers, graph neural networks, and 3D descriptors, evaluated on QM9 and 3 MoleculeNet benchmarks. Surprisingly, richer spectral features do not consistently yield better generalization performance, contradicting common representation heuristics used in self-supervised learning (SSL). Across 4 spectral metrics, only ECFP-based kernels show a strictly positive correlation with performance. Transformer and global 3D representations exhibit mixed behavior, whereas local 3D representations show consistently negative correlations. Truncation analysis further emphasizes this disparity: for local 3D representations on thermodynamic targets, fewer than 2% of eigenvalues (and occasionally as few as 0.02%) are needed to recover 95% of performance, whereas ECFP and transformer kernels require significantly more. By demonstrating a strong dependence on both task and representation, our results challenge the heuristic that richer spectra inherently improve generalization, providing new guidance for evaluating representations in SSL and in label-limited scientific tasks.
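
The truncation result above ("fewer than 2% of eigenvalues ... to recover 95% of performance") can be read as a simple search over truncation levels. The sketch below shows that bookkeeping only; it assumes a higher-is-better score measured after keeping the k leading eigenvalues, and both the helper name and the score table are hypothetical rather than values from the paper.

```python
def min_fraction_for_recovery(scores_by_k, total_eigenvalues, target=0.95):
    """Smallest fraction of eigenvalues whose score reaches target * full score.

    scores_by_k maps k (number of leading eigenvalues kept) to a
    higher-is-better score; the entry for total_eigenvalues is the full model.
    """
    full = scores_by_k[total_eigenvalues]
    for k in sorted(scores_by_k):
        if scores_by_k[k] >= target * full:
            return k / total_eigenvalues
    raise ValueError("no truncation level reaches the target")


# Hypothetical scores for a kernel with 1000 eigenvalues (not paper data).
scores = {1: 0.40, 5: 0.70, 20: 0.88, 100: 0.91, 1000: 0.92}
fraction = min_fraction_for_recovery(scores, total_eigenvalues=1000)
```

In this toy table the threshold is 0.95 * 0.92 = 0.874, first reached at k = 20, so 2% of the spectrum is enough; a representation that needs most of its eigenvalues would return a fraction close to 1.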

2602.03293 2026-06-16 cs.LG 版本更新

Anomaly Detection via Mean Shift Density Enhancement

基于均值漂移密度增强的异常检测

Pritam Kar, Rahul Bordoloi, Olaf Wolkenhauer, Saptarshi Bej

发表机构 * School of Data Science, Indian Institute of Science Education and Research（印度科学教育与研究学院数据科学学院）； Institute of Computer Science, University of Rostock（罗斯托克大学计算机科学研究所）； Leibniz-Institute for Food Systems Biology, Technical University of Munich（慕尼黑工业大学莱布尼茨食品系统生物学研究所）； Stellenbosch Institute of Advanced Studies (STIAS)（斯泰伦博斯高等研究院，STIAS）

AI总结提出MSDE框架，通过密度驱动流形演化下样本的几何位移检测异常，在46个表格数据集上优于13种基线方法。

AI中文摘要

无监督异常检测是机器学习中的一个重要问题。现有的无监督异常检测算法很少能在不同异常类型上表现良好，通常仅在特定结构假设下表现出色。这种缺乏鲁棒性在噪声设置下尤为明显。我们提出均值漂移密度增强（MSDE），一个完全无监督的框架，通过异常样本对密度驱动流形演化的几何响应来检测异常。MSDE被设计为一个通用异常检测框架，其原理是：正常样本由于得到局部密度的良好支持，在迭代密度增强下保持稳定，而异常样本在向附近密度模式吸引时会产生大的累积位移。为了实现这一思想，MSDE采用加权均值漂移过程，其中自适应、样本特定的密度权重来源于基于流形学习的模糊邻域图。我们在一个包含46个真实世界表格数据集、四种现实异常生成机制和六种噪声水平的异常检测基准上评估了MSDE。与13个已建立的无监督基线相比，MSDE在几种标准分类指标上、在多个噪声水平下以及平均多种异常类型上，均实现了持续强大、平衡且鲁棒的性能。这些结果表明，基于位移的评分方法为现有的无监督异常检测最先进技术提供了一种鲁棒的替代方案。

英文摘要

Unsupervised anomaly detection stands as an important problem in machine learning. Existing unsupervised anomaly detection algorithms rarely perform well across different anomaly types, often excelling only under specific structural assumptions. This lack of robustness also becomes particularly evident under noisy settings. We propose Mean Shift Density Enhancement (MSDE), a fully unsupervised framework that detects anomalies through their geometric response to density-driven manifold evolution. MSDE is designed as a general purpose anomaly detection framework, based on the principle that normal samples, being well supported by local density, remain stable under iterative density enhancement, whereas anomalous samples undergo large cumulative displacements as they are attracted toward nearby density modes. To operationalize this idea, MSDE employs a weighted mean-shift procedure with adaptive, sample-specific density weights derived from a manifold learning-based fuzzy neighborhood graph. We evaluate MSDE on an anomaly detection benchmark comprising 46 real-world tabular datasets, four realistic anomaly generation mechanisms, and six noise levels. Compared to 13 established unsupervised baselines, MSDE achieves consistently strong, balanced and robust performance for several standard classification metrics, at several noise levels and on average over several types of anomalies. These results demonstrate that displacement-based scoring provides a robust alternative to the existing state-of-the-art for unsupervised anomaly detection.
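
The displacement idea in this abstract can be illustrated with a plain weighted mean shift. The sketch below is not the MSDE implementation: it assumes a Gaussian kernel, a fixed bandwidth, samples that move over the fixed original data, and uniform weights in place of the adaptive weights the authors derive from a fuzzy neighborhood graph. The function name and toy data are hypothetical.

```python
import math


def displacement_scores(points, weights=None, bandwidth=1.0, steps=30):
    """Cumulative mean-shift displacement per sample (higher = more anomalous).

    Each sample is moved by a weighted Gaussian mean shift over the fixed
    original points, and the distances travelled are summed.
    """
    weights = weights or [1.0] * len(points)
    scores = []
    for start in points:
        x, travelled = list(start), 0.0
        for _ in range(steps):
            # Kernel weight of every original point relative to position x.
            k = [w * math.exp(-0.5 * math.dist(x, p) ** 2 / bandwidth ** 2)
                 for p, w in zip(points, weights)]
            total = sum(k)
            # Weighted mean of the original points is the next position.
            new_x = [sum(ki * p[d] for ki, p in zip(k, points)) / total
                     for d in range(len(x))]
            travelled += math.dist(x, new_x)
            x = new_x
        scores.append(travelled)
    return scores


# Hypothetical 1-D toy data: a tight cluster near 0.1 and one far sample.
data = [(0.0,), (0.1,), (0.2,), (0.15,), (0.05,), (3.0,)]
scores = displacement_scores(data)
```

In this toy setting the clustered samples barely move, while the isolated sample is pulled all the way to the cluster and accumulates the largest displacement, which is the behavior the abstract uses as an anomaly score.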

URL PDF HTML ☆

赞 0 踩 0

2602.09329 2026-06-16 cs.LG 版本更新

MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection

MacrOData：用于表格异常检测的数千个数据集的新基准

Xueying Ding, Simon Klüttermann, Haomin Wen, Yilong Chen, Leman Akoglu

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Technical University of Dortmund（多特蒙德技术大学）

AI总结提出大规模表格异常检测基准MacrOData，包含2446个数据集，覆盖真实与合成异常，支持全面鲁棒的评估。

Comments 29 pages, KDD 2026

DOI: 10.1145/3770855.3817520

AI中文摘要

质量基准对于公平准确地跟踪科学进展以及使从业者能够做出明智的方法选择至关重要。表格数据上的异常检测（OD）支撑着众多现实世界应用，然而现有的OD基准仍然有限。突出的OD基准AdBench是文献中的事实标准，但仅包含57个数据集。除了本文讨论的其他缺点外，其小规模严重限制了多样性和统计功效。我们引入了MacrOData，一个用于表格OD的大规模基准套件，包含三个精心策划的组成部分：OddBench，包含790个具有真实世界语义异常的数据集；OvrBench，包含856个具有真实世界统计异常的数据集；以及SynBench，包含800个合成生成的数据集，涵盖多样化的数据先验和异常类型。由于其规模和多样性，MacrOData能够对表格OD方法进行全面且统计稳健的评估。我们的基准进一步满足几个关键需求：我们为所有数据集提供标准化的训练/测试划分，公共/私有基准划分，其中私有划分的测试标签保留用于在线排行榜，并为我们的数据集注释语义元数据。我们在所有基准上进行了广泛的实验，评估了广泛的OD方法，包括经典模型、深度模型和基础模型，以及多样化的超参数配置。我们报告了详细的实证发现、实用指南以及个体性能，作为未来研究的参考。所有基准（合计2446个数据集）均已开源，并在https://huggingface.co/MacrOData-CMU上托管了公开可访问的排行榜。

英文摘要

Quality benchmarks are essential for fairly and accurately tracking scientific progress and enabling practitioners to make informed methodological choices. Outlier detection (OD) on tabular data underpins numerous real-world applications, yet existing OD benchmarks remain limited. The prominent OD benchmark AdBench is the de facto standard in the literature, yet comprises only 57 datasets. In addition to other shortcomings discussed in this work, its small scale severely restricts diversity and statistical power. We introduce MacrOData, a large-scale benchmark suite for tabular OD comprising three carefully curated components: OddBench, with 790 datasets containing real-world semantic anomalies; OvrBench, with 856 datasets featuring real-world statistical outliers; and SynBench, with 800 synthetically generated datasets spanning diverse data priors and outlier archetypes. Owing to its scale and diversity, MacrOData enables comprehensive and statistically robust evaluation of tabular OD methods. Our benchmarks further satisfy several key desiderata: We provide standardized train/test splits for all datasets, public/private benchmark partitions with held-out test labels for the latter reserved toward an online leaderboard, and annotate our datasets with semantic metadata. We conduct extensive experiments across all benchmarks, evaluating a broad range of OD methods comprising classical, deep, and foundation models, over diverse hyperparameter configurations. We report detailed empirical findings, practical guidelines, as well as individual performances as references for future research. All benchmarks containing 2,446 datasets combined are open-sourced, along with a publicly accessible leaderboard hosted at https://huggingface.co/MacrOData-CMU.


2602.22422 2026-06-16 cs.LG cs.AI 版本更新

Revisiting Chebyshev Polynomial and Anisotropic RBF Models for Tabular Regression

重新审视切比雪夫多项式和各向异性RBF模型在表格回归中的应用

Luciano Gerber, Huw Lloyd

发表机构 * Department of Computing and Mathematics, Manchester Metropolitan University（曼彻斯特城市大学计算与数学系）

AI总结本文在55个数据集上基准测试切比雪夫多项式回归器、各向异性RBF网络和平滑树混合模型，发现平滑模型在CPU可行模型中与树集成准确率相当且泛化差距更小，建议将其纳入候选池。

Comments 46 pages, 6 figures, 21 tables. Under review at Knowledge-Based Systems

详情

AI中文摘要

平滑基模型如切比雪夫多项式回归器和径向基函数（RBF）网络在数值分析中已得到充分确立。它们的连续可微预测表面适用于代理优化、敏感性分析以及其他响应随输入逐渐变化的环境。尽管具有这些特性，平滑模型在树集成主导的表格回归中很少出现。我们探究它们是否能够竞争，跨55个按应用领域组织的回归数据集对模型进行基准测试。我们开发了一种各向异性RBF网络，具有数据驱动的中心放置和基于梯度的宽度优化，一个岭正则化的切比雪夫多项式回归器，以及一个平滑树混合模型（切比雪夫模型树）；这三个模型均作为scikit-learn兼容包发布。我们将这些模型与树集成、预训练transformer和标准基线进行基准测试，评估准确性和泛化行为。transformer在大多数数据集上准确率排名第一，但其GPU依赖性、推理延迟和数据集大小限制制约了其在应用科学和工业中常见的基于CPU环境中的部署。在CPU可行的模型中，平滑模型和树集成在准确率上统计上持平，但前者倾向于表现出更紧的泛化差距。我们建议常规地将平滑基模型纳入候选池，特别是当下游使用受益于更紧的泛化和逐渐变化的预测时。

英文摘要

Smooth-basis models such as Chebyshev polynomial regressors and radial basis function (RBF) networks are well established in numerical analysis. Their continuously differentiable prediction surfaces suit surrogate optimisation, sensitivity analysis, and other settings where the response varies gradually with inputs. Despite these properties, smooth models seldom appear in tabular regression, where tree ensembles dominate. We ask whether they can compete, benchmarking models across 55 regression datasets organised by application domain. We develop an anisotropic RBF network with data-driven centre placement and gradient-based width optimisation, a ridge-regularised Chebyshev polynomial regressor, and a smooth-tree hybrid (Chebyshev model tree); all three are released as scikit-learn-compatible packages. We benchmark these against tree ensembles, a pre-trained transformer, and standard baselines, evaluating accuracy alongside generalisation behaviour. The transformer ranks first on accuracy across a majority of datasets, but its GPU dependence, inference latency, and dataset-size limits constrain deployment in the CPU-based settings common across applied science and industry. Among CPU-viable models, smooth models and tree ensembles are statistically tied on accuracy, but the former tend to exhibit tighter generalisation gaps. We recommend routinely including smooth-basis models in the candidate pool, particularly when downstream use benefits from tighter generalisation and gradually varying predictions.


2605.09169 2026-06-16 cs.LG cs.AI 版本更新

Prediction Bottlenecks Don't Discover Causal Structure (But Here's What They Actually Do)

预测瓶颈不会发现因果结构（但它们实际上做了什么）

Ankit Hemant Lade, Sai Krishna Jasti, Indar Kumar, Aman Chadha

发表机构 * Ankit Hemant Lade ； Sai Krishna Jasti ； Indar Kumar ； Aman Chadha

AI总结研究通过实验证明，预测模型中的瓶颈无法发现因果结构，但在特定条件下仍表现出一定的干预效果，主要贡献是提出了可复用的验证基准。

Comments 6 pages, 3 tables. Code: https://github.com/ankitlade12/ssm-causal

详情

AI中文摘要

一个仅针对下一步预测训练的Mamba状态空间模型似乎通过简单的读出$S = |W_{out} W_{in}|$恢复了格兰杰因果结构，早期实验表明该现象可推广到不同架构，并在$p < 10^{-5}$的显著性水平上受益于干预数据。我们将用于检验该主张的协议（标准化合成生成器（VAR/洛伦兹/CauseMe式）、三种干预语义（$do(X=c)$、软噪声、随机强迫）、三个真实数据集上的边来源卡片，以及规模匹配的对照组）打包为可重用的证伪基准，并分五个阶段检验该主张。方法层面的主张未能成立：（i）简单的线性瓶颈表现相当或更优；（ii）在合成CauseMe式基准上，调优的Lasso优于瓶颈，而在洛伦兹96（唯一具有明确真实标签的真实基准）上，经典PCMCI和格兰杰方法在一个差距很小的集群中领先，瓶颈落后；（iii）头条干预优势中约60%源于样本量混杂，其余部分在标准$do(X=c)$干预下消失，仅在非标准随机强迫方案下保留；（iv）即使是这一残余效应，也能在经典二元格兰杰中以更大的效应量重现，说明该效应与方法无关。最终保留下来的是一个范围狭窄的刻画性结果；该基准才是持久的成果，上述每个阶段都是其对照组之一。

英文摘要

A Mamba state-space model trained only for next-step prediction appears to recover Granger-causal structure through a simple readout $S = |W_{out} W_{in}|$, with early experiments suggesting the phenomenon generalized across architectures and benefited from interventional data at $p < 10^{-5}$. We package the protocol used to test that claim, comprising standardized synthetic generators (VAR/Lorenz/CauseMe-style), three intervention semantics ($do(X=c)$, soft-noise, random-forcing), edge-provenance cards on three real datasets, and size-matched control arms, as a reusable falsification benchmark, and walk the claim through it in five stages. The method-level claim does not survive: (i) a plain linear bottleneck does as well or better; (ii) tuned Lasso beats the bottleneck on synthetic CauseMe-style benchmarks, and on Lorenz-96 (the only real benchmark with unambiguous ground truth) classical PCMCI and Granger lead a tight cluster in which the bottleneck trails; (iii) the headline intervention advantage is roughly 60% a sample-size confound, and the residual disappears under standard $do(X=c)$ interventions, surviving only under a non-standard random-forcing scheme; (iv) even that residual reproduces, with a larger effect, in classical bivariate Granger, so the effect is method-agnostic. What survives is a narrow characterization result; the benchmark is the lasting artifact, and each stage above is one of its control arms.
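The readout $S = |W_{out} W_{in}|$ mentioned above is an elementwise absolute value of a matrix product. The sketch below assumes a particular orientation that the abstract does not state: `w_in` has shape (hidden, n_variables), `w_out` has shape (n_variables, hidden), and `S[i][j]` is read as the strength of the candidate edge from input variable `j` to predicted variable `i`. The function name `readout_scores` is hypothetical.

```python
def readout_scores(w_out, w_in):
    """Return S = |W_out @ W_in| as a nested list.

    Assumed shapes (not stated in the abstract):
      w_in:  hidden x n_variables
      w_out: n_variables x hidden
    """
    hidden = len(w_in)
    n_inputs = len(w_in[0])
    return [
        [abs(sum(row[k] * w_in[k][j] for k in range(hidden)))
         for j in range(n_inputs)]
        for row in w_out
    ]


# Two variables, two hidden units.
w_in = [[1.0, 0.0],
        [0.0, 2.0]]
w_out = [[0.5, -1.0],
         [0.0, 3.0]]
S = readout_scores(w_out, w_in)
```

Under this reading, a large `S[i][j]` is the bottleneck's evidence for an edge `j -> i`; the abstract reports that this evidence does not beat simpler baselines.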


2605.26418 2026-06-16 cs.LG cs.AI cs.DC 版本更新

When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control

深度强化学习何时超越校准基线？自适应资源控制的基准研究

Guilin Zhang, Chuanyi Sun, Kai Zhao, Xu Chu, Shahryar Sarkani, John Fossaceca

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）； University of Toronto（多伦多大学）

AI总结通过RLScale-Bench基准测试，发现校准的基于规则的自动缩放器在所有工作负载上成本均低于六种主流深度强化学习算法，并揭示了算法选择、基线校准和评估协议的关键瓶颈。

详情

AI中文摘要

一个适当校准的基于规则的自动缩放器可以在我们测试的每个工作负载上，在成本方面击败六种主流深度强化学习（DRL）算法——那么，如果存在的话，DRL究竟何时能真正发挥作用？我们在RLScale-Bench中研究这个问题，这是一个用于自适应资源控制的DRL可重复基准和评估协议，其中代理在成本和服务级别约束下将计算资源分配给动态工作负载。我们在匹配的架构、训练预算和奖励函数下，评估PPO、DQN、A2C、SAC、TD3和DDPG，与校准的基于规则基线在六个工作负载模式和五个种子（240次运行）上进行对比，在Kubernetes水平Pod自动缩放上实例化基准，并探测分布偏移泛化。三个发现挑战了常见假设：（i）校准控制器在所有六个工作负载上实现了最低成本，尽管在突发和闪流流量上落后于最佳RL代理；（ii）由于动作空间不匹配，离散动作算法在约束违反方面比连续动作算法好一到两个数量级；（iii）没有单一算法在所有工作负载上占主导地位，排名变化高达四个位置。基于RL的资源控制的瓶颈不是算法选择，而是基线校准、奖励工程和现实的评估协议。

英文摘要

A properly calibrated rule-based autoscaler can beat every one of six mainstream deep reinforcement learning (DRL) algorithms on cost across every workload we test. So when, if ever, does DRL actually help? We study this in RLScale-Bench, a reproducible benchmark and evaluation protocol for DRL on adaptive resource control, where an agent allocates compute to a dynamic workload under cost and service-level constraints. We evaluate PPO, DQN, A2C, SAC, TD3, and DDPG under matched architectures, training budgets, and reward functions against a calibrated rule-based baseline across six workload patterns and five seeds (240 runs), instantiate the benchmark on Kubernetes Horizontal Pod Autoscaling, and probe distribution-shift generalization. Three findings challenge common assumptions: (i) the calibrated controller achieves the lowest cost on all six workloads, though it trails the best RL agents on bursty and flash traffic; (ii) discrete-action algorithms outperform continuous-action ones by one to two orders of magnitude in constraint violations due to action-space mismatch; and (iii) no single algorithm dominates across workloads, with rankings shifting by up to four positions. The bottleneck in RL-based resource control is not algorithm selection but baseline calibration, reward engineering, and realistic evaluation protocols.
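For readers unfamiliar with rule-based autoscaling, the sketch below shows the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas are the current replicas scaled by the ratio of observed to target metric, rounded up, with no change inside a tolerance band. It is not the paper's calibrated baseline, whose rules are not given in the abstract; the function name, the replica bounds, and the example numbers are illustrative assumptions.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         tolerance=0.1, min_replicas=1, max_replicas=10):
    """Documented HPA rule: ceil(current_replicas * current / target).

    The bounds and the tolerance default are illustrative settings.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


scale_up = hpa_desired_replicas(4, current_metric=90, target_metric=60)
hold = hpa_desired_replicas(4, current_metric=63, target_metric=60)
scale_down = hpa_desired_replicas(4, current_metric=15, target_metric=60)
capped = hpa_desired_replicas(4, current_metric=300, target_metric=60)
```

Calibrating such a controller would amount to choosing the target, tolerance, and bounds per workload, which is the kind of tuning the abstract argues is often skipped when baselines are compared with DRL agents.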


2605.27618 2026-06-16 cs.LG 版本更新

Evaluating Local Explainability Metrics for Machine Learning Models on Tabular Data

评估表格数据机器学习模型的局部可解释性指标

Tomás Pereira, João Vitorino, Eva Maia, Isabel Praça

发表机构 * GECAD, ISEP, Polytechnic of Porto（GECAD、ISEP、波尔图理工大学）

AI总结研究局部可解释性技术在复杂表格分类任务中的可信度，通过基准测试LIME、Kernel SHAP和特征消融技术，发现解释质量主要受数据集复杂性和特征分布影响，而非模型预测性能。

Comments 9 pages, 12 tables, 1 figure, DATA 2026 Conference

详情

AI中文摘要

尽管广泛使用可解释性技术来尝试理解人工智能（AI）的行为，但生成的解释可能并不总是可靠的。一个解释对人类来说可能看似合理，但未能捕捉模型的内部推理，特别是在处理复杂的表格数据时。本文研究了局部可解释性技术在应用于复杂表格分类任务时的可信度，考虑了三个主要属性的评估指标：对模型预测的忠实度、对输入数据变化的鲁棒性以及解释本身的复杂性。对局部可解释模型无关解释（LIME）、Kernel SHAP（Shapley Additive exPlanations）和特征消融技术进行了基准测试，涉及32个数据集和不同类型的机器学习模型。分析了模型性能范围，以识别两组：共识正确（所有模型正确预测的样本）和共识错误（所有模型错误预测的样本）。获得的结果表明，解释并不总是与模型的预测性能相关。相反，数据集复杂性和特征分布似乎是影响解释质量和可靠性的主要因素。

英文摘要

Despite the wide use of explainability techniques to attempt to understand the behavior of Artificial Intelligence (AI), the generated explanations may not always be reliable. An explanation can appear plausible to humans but fail to capture the internal reasoning of a model, particularly when dealing with complex tabular data. This paper studies the trustworthiness of local explainability techniques when applied to complex tabular classification tasks, considering evaluation metrics for three main properties: faithfulness to the model's predictions, robustness to input data variations, and complexity of the explanation itself. A benchmark was performed for Local Interpretable Model-Agnostic Explanations (LIME), Kernel SHapley Additive exPlanations (SHAP), and Feature Ablation techniques, across 32 datasets and different types of machine learning models. Model performance ranges were analyzed to identify two groups: consensus-correct, which are samples that all models predicted correctly, and consensus-wrong, which are samples that all models predicted incorrectly. The obtained results demonstrate that the explanations are not always correlated with a model's predictive performance. Instead, dataset complexity and feature distributions seem to be the main factors affecting explanation quality and reliability.


2606.01602 2026-06-16 cs.LG cs.AI cs.IT math.IT 版本更新

Estimating Mutual Information between Time Series and Temporal Event Sequences Across Diverse Analysis Tasks

估计时间序列与时间事件序列在不同分析任务中的互信息

Haoji Hu, Huaqing Mao, Yijun Lin, Xiaowei Jia, Jinwei Zhou, Minoh Jeong, Yao-Yi Chiang

发表机构 * University of Minnesota - Twin Cities（明尼苏达大学-双城分校）； University of Pittsburgh（匹兹堡大学）； Inha University（仁荷大学）

AI总结提出一种非参数互信息估计器，直接度量连续时间序列与离散事件序列之间的依赖关系，无需数据转换或离散化，通过处理量化伪影和事件冗余实现鲁棒统一框架。

详情

DOI: 10.1145/3770855.3817693

AI中文摘要

成对依赖度量（如相关性和因果性）是时间数据挖掘的基础，但目前仍缺乏一种原则性且稳健的方法来量化异构数据类型之间的依赖关系，特别是连续时间序列与离散时间事件序列之间。现有方法依赖于对量化、重复值和事件冗余高度敏感的临时变换或互信息估计器，导致实践中结果有偏或不稳定。我们提出一种非参数互信息估计器，无需数据转换、学习或临时离散化，直接度量时间序列与事件序列之间的依赖关系。我们的方法对真实世界时间序列的连续-离散二元性进行建模，以处理量化和重复值伪影，并引入潜在事件聚类策略以减轻事件共现和冗余带来的偏差。这些共同构成了一个鲁棒且统一的框架，桥接了离散和连续互信息。我们在四个代表性任务上评估了所提出的估计器：用于因果分析的离散-连续时延互信息、全局和局部时间重复发现、用于时间序列预测的离散协变量选择以及用于分类的连续特征选择。在合成和真实世界数据集上的实验表明，在准确性、鲁棒性和可解释性方面，该方法一致优于现有方法，使其成为异构时间数据的通用依赖算子，类似于同质时间序列的皮尔逊相关。代码见：https://github.com/HaojiHu/Multimodal-Temporal-Data-Quantification

英文摘要

Pairwise dependence measures such as correlation and causality are fundamental to temporal data mining, yet there is still no principled and robust way to quantify dependence between heterogeneous data types, especially between continuous time series and discrete temporal event sequences. Existing approaches rely on ad hoc transformations or mutual-information estimators that are highly sensitive to quantization, repeated values, and event redundancy, leading to biased or unstable results in practice. We propose a nonparametric mutual information estimator that directly measures the dependence between time series and event sequences without data transformation, learning, or ad hoc discretization. Our method models the continuous-discrete duality of real-world time series to handle quantization and repeated-value artifacts and introduces a latent event clustering strategy to mitigate bias from event co-occurrence and redundancy. Together, these yield a robust and unified framework that bridges discrete and continuous mutual information. We evaluate the proposed estimator on four representative tasks: discrete-continuous time-delayed mutual information for causality analysis, global and local temporal repetition discovery, discrete covariate selection for time series forecasting, and continuous feature selection for classification. Experiments on synthetic and real-world datasets show consistent improvements over existing methods in accuracy, robustness, and interpretability, positioning our approach as a general-purpose dependence operator for heterogeneous temporal data, similar to Pearson correlation for homogeneous time series. Code available at: https://github.com/HaojiHu/Multimodal-Temporal-Data-Quantification


2606.02670 2026-06-16 cs.LG cs.AI 版本更新

Anomalies in Multivariate Time Series Benchmarks Are Mostly Univariate

多变量时间序列基准中的异常主要是单变量的

Marc Pinet, Julien Cumin, Samuel Berlemont, Dominique Vaufreydaz

发表机构 * Orange Research（Orange研究院）； Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG（格勒诺布尔阿尔卑斯大学、CNRS、格勒诺布尔INP、LIG）

AI总结本文通过诊断框架和实验证明，当前多变量时间序列异常检测基准中，异常主要源于单变量偏离，跨通道结构变化极少，因此现有基准不适合验证跨通道建模能力。

Comments Accepted at the 12th International Workshop on Mining and Learning from Time Series (MiLeTS), co-located with KDD 2026

详情

AI中文摘要

许多最新的多变量时间序列异常检测（MT-SAD）模型引入了跨通道建模，其隐含假设是异常的结构可能分布在多个通道上。我们在八个广泛使用的公共基准上评估了这一假设，引入了一个逐段诊断框架，该框架针对每个标记的异常，标记是否至少有一个通道单独偏离其正常历史，是否跨通道相关结构发生变化，或两者兼有。该框架表明，在一系列合理阈值下，没有跨通道破裂发生在没有伴随单变量偏离的情况下。一个补充指标还显示，在八个基准中的六个上，至少一半的标记异常段在79%到100%的时间步上发生单变量偏离，在其中的三个数据集上达到100%。为了验证我们的框架在存在跨通道结构时能够捕获它，我们构建了具有共享噪声的相移正弦通道的合成数据。每个异常段通过两种通道级损坏之一进行改变，这些损坏保留了每个通道的边缘分布，同时破坏了跨通道结构，我们的框架正确地将这些段表征为仅跨通道异常。在这些数据上，依赖通道（CD）模型成功利用了跨通道信号，而独立通道（CI）模型则失败。在真实基准上对最近SOTA检测器的CI/CD比较进一步证实了CD建模没有带来可衡量的收益。我们得出结论，当前的MT-SAD基准不适合验证跨通道建模能力，并呼吁开发更多结构多样的评估集。本研究的代码已公开。

英文摘要

Many recent multivariate time series anomaly detection (MTSAD) models incorporate cross-channel modeling, under the implicit assumption that the structure of anomalies may be spread across multiple channels. We evaluate this assumption on eight widely used public benchmarks by introducing a per-segment diagnostic framework that flags, for each labeled anomaly, whether at least one channel deviates individually from its normal history, whether the cross-channel correlation structure changes, or both. The framework shows that no cross-channel rupture occurs without an accompanying univariate deviation across a range of reasonable thresholds. A complementary metric also reveals that on six of the eight benchmarks, at least half of the labeled anomaly segments deviate univariately on 89% to 100% of their timesteps, reaching 100% on three of these datasets. To verify that our framework captures cross-channel structure when present, we construct synthetic data of phase-shifted sinusoidal channels with shared noise. Each anomalous segment is altered through one of two channel-wise corruptions that preserve the per-channel marginal distribution while breaking cross-channel structure, and our framework correctly characterizes these segments as cross-channel-only. On these data, channel-dependent (CD) models successfully exploit the cross-channel signal whereas channel-independent (CI) ones fail. The CI/CD comparison of a recent SOTA detector on real benchmarks further confirms that CD modeling brings no measurable gain. We conclude that current MTSAD benchmarks are unsuitable for validating cross-channel modeling capabilities, and we call for the development of more structurally diverse evaluation sets. The code for this study is publicly available.
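The per-segment diagnostic described above can be sketched as two independent checks. This is a simplified illustration, not the authors' framework: it assumes a z-score test against each channel's normal history for the univariate check and a change in pairwise Pearson correlation for the cross-channel check. The thresholds `z_thresh` and `corr_thresh` and all function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _pearson(a, b):
    ma, mb = _mean(a), _mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    denom = math.sqrt(sum((x - ma) ** 2 for x in a) *
                      sum((y - mb) ** 2 for y in b))
    return cov / denom if denom > 0 else 0.0


def diagnose_segment(normal, segment, z_thresh=3.0, corr_thresh=0.5):
    """Flag univariate deviation and cross-channel rupture for one segment.

    normal, segment: lists of channels, each channel a list of floats.
    """
    univariate = False
    for history, values in zip(normal, segment):
        mu = _mean(history)
        sigma = math.sqrt(_mean([(v - mu) ** 2 for v in history]))
        # A channel deviates if any timestep leaves its normal range.
        if sigma > 0 and any(abs(v - mu) / sigma > z_thresh for v in values):
            univariate = True
    cross_channel = False
    for i in range(len(normal)):
        for j in range(i + 1, len(normal)):
            shift = abs(_pearson(normal[i], normal[j]) -
                        _pearson(segment[i], segment[j]))
            if shift > corr_thresh:
                cross_channel = True
    return {"univariate": univariate, "cross_channel": cross_channel}


# Two perfectly correlated channels as normal history.
base = [0.0, 1.0, 0.0, -1.0] * 5
normal = [base, list(base)]
# Cross-channel-only corruption: channel 1 is sign-flipped, values stay in range.
flipped = diagnose_segment(normal, [base[:8], [-v for v in base[:8]]])
# Univariate corruption: both channels share the same spike.
spiked_channel = base[:5] + [10.0] + base[6:8]
spiked = diagnose_segment(normal, [spiked_channel, list(spiked_channel)])
```

The sign-flipped segment keeps each channel's marginal range while reversing the correlation, so only the cross-channel flag is raised; the shared spike raises only the univariate flag. The abstract's finding is that real benchmark anomalies almost never look like the first case.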


2606.05692 2026-06-16 cs.LG cs.AI 版本更新

Benchmarking Counterfactual Prediction in Epidemic Time Series with Time-Varying Interventions

具有时变干预的流行病时间序列中的反事实预测基准测试

Wenhao Mu, Facundo Yan, Anik Mumssen, Marisa Eisenberg, Alexander Rodríguez

发表机构 * University of Michigan Computer Science and Engineering（密歇根大学计算机科学与工程系）； University of Michigan Epidemiology & Complex Systems（密歇根大学流行病学与复杂系统）

AI总结为解决缺乏可观测反事实结果的真实基准问题，基于校准的基于智能体的模型生成大规模流行病时间序列反事实预测基准，支持静态/时变治疗和单/多策略干预，评估多种因果推断方法。

Comments To appear in Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

详情

DOI: 10.1145/3770855.3817522

AI中文摘要

深度学习在时间序列因果推断方面取得了显著进展，但由于缺乏具有可观测反事实结果的现实基准，进展仍然受到限制。现有数据集要么依赖没有真实反事实的真实世界观测，要么依赖无法捕捉复杂因果动态的简化模拟。为了解决这一差距，我们开发了一个大规模基准，用于动态干预下流行病时间序列的反事实预测。与现有基准不同，它支持静态和时变治疗，以及单策略和多策略干预设置，从而能够在广泛的因果推断场景中评估因果推断方法。利用基于真实世界人口、流动性、流行病学和政策数据校准的基于智能体的模型，我们生成了跨越美国150多个县的真实反事实轨迹。使用该基准，我们评估了广泛使用和最先进的因果推断方法，揭示了显著的性能差异，并突出了现实时间序列因果推理的挑战。

英文摘要

Deep learning has enabled significant advances in time-series causal inference, yet progress remains constrained by the lack of realistic benchmarks with observable counterfactual outcomes. Existing datasets either rely on real-world observations without ground-truth counterfactuals or on simplified simulations that fail to capture complex causal dynamics. To address this gap, we develop a large-scale benchmark for counterfactual prediction in epidemic time series under dynamic interventions. Unlike existing benchmarks, it supports static and time-varying treatments, as well as both single-policy and multi-policy intervention settings, enabling evaluation of causal inference methods across a broad range of causal inference scenarios. Leveraging a calibrated agent-based model grounded in real-world demographic, mobility, epidemiological, and policy data, we generate realistic counterfactual trajectories across more than 150 U.S. counties. Using this benchmark, we evaluate widely used and state-of-the-art causal inference methods, revealing substantial performance differences and highlighting the challenges of realistic time-series causal reasoning.


2606.08583 2026-06-16 cs.LG eess.SP 版本更新

A spectral audit framework reveals task-dependent aperiodic reliance across EEG and ECG deep learning

频谱审计框架揭示EEG和ECG深度学习中任务依赖的非周期性依赖

Jasmeet Singh Bindra, Siddharth Panwar

发表机构 * Indian Knowledge Systems and Mental Health Applications (IKSMHA) Center, Indian Institute of Technology Mandi（印度理工学院曼迪分校印度知识体系与心理健康应用中心）； School of Computing and Electrical Engineering, Indian Institute of Technology Mandi（印度理工学院曼迪分校计算与电气工程学院）

AI总结提出频谱审计框架，结合非周期/周期分解、相位保持傅里叶干预等，发现深度学习模型对非周期成分的依赖是任务依赖且架构通用的，在睡眠-觉醒分类中影响显著，临床异常检测中中等，运动想象中最小，并扩展到ECG。

Comments 25 pages, being prepared for submission to peer-reviewed journal

详情

AI中文摘要

生理时间序列的深度学习通过领域特定特征解释——EEG中的振荡节律、ECG中的形态复合波——但这些信号位于一个宽带非周期1/f样包络之上，该包络与觉醒、年龄和病理共变。我们引入了一个频谱审计框架，结合非周期/周期分解、相位保持傅里叶干预、假对照和模拟验证。非周期依赖是任务依赖且架构通用的：在六种神经架构中，对于睡眠-觉醒分类，平坦化下降超过0.42平衡准确率点；对于临床异常检测达到0.07-0.13；对于运动想象保持最小。七个EEG基础模型中有六个在临床EEG上显示出FDR显著的非周期依赖；年龄/性别和记录时代控制减少了但未消除该效应。将审计应用于PTB-XL ECG，发现神经下降0.32-0.36，在人口统计匹配后持续存在，确认此类混淆因素扩展到EEG之外。非周期控制应成为可解释生理时间序列深度学习的标准。

英文摘要

Deep learning on physiological time series is interpreted through domain-specific features (oscillatory rhythms in EEG, morphological complexes in ECG), yet these signals sit atop a broadband aperiodic 1/f-like envelope that covaries with arousal, age, and pathology. We introduce a spectral audit framework combining aperiodic/periodic decomposition, phase-preserving Fourier interventions, sham controls, and simulation validation. Aperiodic reliance was task-dependent and architecture-general: across six neural architectures, flattening drops exceeded 0.42 balanced-accuracy points for sleep-wake classification, reached 0.07-0.13 for clinical abnormality detection, and remained minimal for motor imagery. Six of seven EEG foundation models showed FDR-significant aperiodic reliance on clinical EEG; age/sex and recording-era controls reduced but did not eliminate the effect. Applying the audit to PTB-XL ECG revealed neural drops of 0.32-0.36 persisting after demographic matching, confirming this confound class extends beyond EEG. Aperiodic controls should become standard for interpretable physiological time-series deep learning.


2606.08594 2026-06-16 cs.LG eess.SP 版本更新

How Much Capacity Does EEG Denoising Need? Ultra-Compact Networks reveal Benchmark Saturation and Metric-Utility Gap

脑电图去噪需要多少容量？超紧凑网络揭示基准饱和与度量-效用差距

Jasmeet Singh Bindra, Siddharth Panwar

发表机构 * Indian Knowledge Systems and Mental Health Applications (IKSMHA) Center, Indian Institute of Technology Mandi（印度理工学院曼迪分校印度知识体系与心理健康应用中心）； School of Computing and Electrical Engineering, Indian Institute of Technology Mandi（印度理工学院曼迪分校计算与电气工程学院）

AI总结通过固定架构仅改变通道宽度（1.05K-40.26K参数），发现EEG去噪重建性能在3-6.5K参数时饱和，且重建度量不预测下游BCI效用，超紧凑模型（33-46KB）适用于边缘部署。

Comments 17 pages, will be submitted to peer-reviewed journal

详情

AI中文摘要

深度学习脑电图去噪架构已从数万参数扩展到数千万参数，然而尚无先前研究将模型容量作为实验变量隔离，或测试重建度量是否预测下游神经信号效用。我们通过固定架构、损失、数据划分和训练配方，仅在最小深度可分离卷积U-Net中从1.05K到40.26K参数扫描通道宽度，解决了这两个空白。模型在EEGDenoiseNet基准、跨数据集BCI迁移测试、受控基线重训练以及所有九个BCI竞赛IV-2a受试者的五个解码器家族的下游运动想象分类上进行了评估。重建性能在3-6.5K参数时饱和，肘部后每log10参数单位增益最多0.015相关系数。在相同流程下重训练的8.46M参数基线在EOG上与40.26K紧凑变体匹配——200倍参数差距未带来优势——而Patch-Transformer控制重现了相同的递减回报形状。下游评估揭示了分类器依赖的度量-效用差距：重建优化的去噪显著降低了所有九个受试者和三种伪影类型的CSP+LDA分类（最佳去噪准确率0.547 vs. 噪声基线0.612；Bonferroni p=0.0488），在自然记录试验中持续存在（Delta=-0.047；BH-FDR q=0.0049）。端到端神经解码器显示可变或中性效果。标准EEG去噪基准在远低于当前模型容量时已饱和，重建度量不预测BCI效用。33-46 KB和1.27-2.61M FLOPs/段的超紧凑模型适用于边缘部署。这些发现主张容量控制评估、更困难的任务感知基准以及强制性的下游验证。

英文摘要

Deep learning EEG denoising architectures have scaled from tens of thousands to tens of millions of parameters, yet no prior study has isolated model capacity as the experimental variable or tested whether reconstruction metrics predict downstream neural-signal utility. We address both gaps by fixing architecture, loss, data split, and training recipe while sweeping only channel width from 1.05K to 40.26K parameters in a minimal depthwise-separable convolutional U-Net. Models were evaluated on the EEGDenoiseNet benchmark, cross-dataset BCI transfer tests, controlled baseline retraining, and downstream motor-imagery classification with five decoder families across all nine BCI Competition IV-2a subjects. Reconstruction performance saturated by 3-6.5K parameters, with post-elbow gains of at most 0.015 correlation coefficient per log10-parameter unit. An 8.46M-parameter baseline retrained under the same pipeline matched the 40.26K compact variant on EOG (a 200x parameter gap yielding no advantage), while a Patch-Transformer control reproduced the same diminishing-return shape. Downstream evaluation exposed a classifier-dependent metric-utility gap: reconstruction-optimized denoising significantly degraded CSP+LDA classification across all nine subjects and three artifact types (best denoised accuracy 0.547 vs. 0.612 noisy baseline; Bonferroni p=0.0488), persisting on naturally recorded trials (Delta=-0.047; BH-FDR q=0.0049). End-to-end neural decoders showed variable or neutral effects. Standard EEG denoising benchmarks are saturated far below current model capacity, and reconstruction metrics do not predict BCI utility. Ultra-compact models at 33-46 KB and 1.27-2.61M FLOPs/segment are practical for edge deployment. These findings argue for capacity-controlled evaluation, harder task-aware benchmarks, and mandatory downstream validation.


2306.11252 2026-06-16 cs.CL cs.LG 版本更新

CycliST：用于循环状态转换推理的视频语言模型基准

Simon Kohaut, Daniel Ochs, Shun Zhang, Benedict Flade, Julian Eggert, Kristian Kersting, Devendra Singh Dhami

发表机构 * Artificial Intelligence and Machine Learning Lab, TU Darmstadt（达姆施塔特工业大学人工智能与机器学习实验室）； Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA)（Konrad Zuse 学习与智能系统卓越学院（ELIZA））； Honda Research Institute Europe GmbH, Offenbach, Germany（本田欧洲研究院，奥芬巴赫，德国）； Uncertainty in Artificial Intelligence Group, TU Eindhoven（埃因霍温理工大学人工智能不确定性小组）； Hessian Center for AI (hessian.AI)（黑森人工智能中心（hessian.AI））； Center for Cognitive Science（认知科学中心）； German Center for Artificial Intelligence (DFKI)（德国人工智能中心（DFKI））

AI总结提出CycliST基准，通过合成视频评估视频语言模型对循环状态转换的文本推理能力，揭示现有模型在检测循环模式、时间理解和定量分析方面的局限。

Comments Published in the Journal of Data-centric Machine Learning Research (DMLR); https://openreview.net/forum?id=l03g53HUL2

详情

Journal ref: Journal of Data-centric Machine Learning Research, 2026

AI中文摘要

我们提出了CycliST，这是一个新颖的基准数据集，旨在评估视频语言模型（VLM）在循环状态转换上的文本推理能力。CycliST通过生成合成的、结构丰富的视频序列来捕捉现实世界过程的基本方面，这些视频序列具有物体运动和视觉属性的周期性模式。CycliST采用分层评估系统，通过改变循环物体的数量、场景杂乱程度和光照条件逐步增加难度，挑战最先进模型的时空认知能力。我们使用当前最先进的VLM（包括开源和专有模型）进行了大量实验，揭示了它们在泛化到循环动力学（如线性和轨道运动）以及视觉属性（如颜色和尺度）随时间变化方面的局限性。我们的结果表明，当前的VLM难以可靠地检测和利用循环模式，缺乏时间理解的概念，并且无法从场景中提取定量信息（如运动物体的数量），突显了需要解决的重要技术差距。更具体地说，我们发现没有单一模型在性能上始终领先：大小和架构与结果的相关性不强，且没有模型在所有任务上同样成功。通过提供有针对性的挑战和全面的评估框架，CycliST为超越当前最先进水平的视觉推理模型在理解周期性模式方面铺平了道路。

英文摘要

We present CycliST, a novel benchmark dataset designed to evaluate Video Language Models (VLM) on their ability for textual reasoning over cyclical state transitions. CycliST captures fundamental aspects of real-world processes by generating synthetic, richly structured video sequences featuring periodic patterns in object motion and visual attributes. CycliST employs a tiered evaluation system that progressively increases difficulty through variations in the number of cyclic objects, scene clutter, and lighting conditions, challenging state-of-the-art models on their spatio-temporal cognition. We conduct extensive experiments with current state-of-the-art VLMs, both open-source and proprietary, and reveal their limitations in generalizing to cyclical dynamics such as linear and orbital motion, as well as time-dependent changes in visual attributes like color and scale. Our results demonstrate that present-day VLMs struggle to reliably detect and exploit cyclic patterns, lack a notion of temporal understanding, and are unable to extract quantitative insights from scenes, such as the number of objects in motion, highlighting a significant technical gap that needs to be addressed. More specifically, we find no single model consistently leads in performance: neither size nor architecture correlates strongly with outcomes, and no model succeeds equally well across all tasks. By providing a targeted challenge and a comprehensive evaluation framework, CycliST paves the way for visual reasoning models that surpass the state-of-the-art in understanding periodic patterns.


2512.11682 2026-06-16 cs.AI cs.LG 版本更新

MedAI: Evaluating TxAgent's Therapeutic Agentic Reasoning in the NeurIPS CURE-Bench Competition

MedAI: 评估 TxAgent 在 NeurIPS CURE-Bench 竞赛中的治疗性智能推理

Tim Cofala, Christian Kalfar, Jingge Xiao, Johanna Schrader, Michelle Tang, Wolfgang Nejdl

发表机构 * L3S Research Center（L3S研究中心）

AI总结本文介绍 TxAgent，一种通过迭代检索增强生成和统一生物医学工具集进行治疗决策的智能AI方法，并在CURE-Bench竞赛中评估其推理质量，通过改进工具检索策略提升性能，荣获开放科学卓越奖。

Comments 7 pages, 3 figures

详情

AI中文摘要

临床医学中的治疗决策构成了一个高风险领域，其中AI指导必须应对患者特征、疾病过程和药物制剂之间的复杂相互作用。药物推荐、治疗计划和不良反应预测等任务需要基于可靠生物医学知识的稳健、多步骤推理。以TxAgent为代表的智能AI方法通过迭代检索增强生成（RAG）应对这些挑战。TxAgent采用微调的Llama-3.1-8B模型，动态生成并执行对统一生物医学工具集（ToolUniverse）的函数调用，整合FDA药物API、OpenTargets和Monarch资源，确保获取最新的治疗信息。与通用RAG系统相比，医疗应用施加了严格的安全约束，使得推理轨迹和工具调用序列的准确性至关重要。这些考虑促使评估协议将令牌级推理和工具使用行为视为明确的监督信号。本文展示了我们参与CURE-Bench NeurIPS 2025挑战赛的见解，该挑战赛使用评估正确性、工具利用和推理质量的指标来基准测试治疗推理系统。我们分析了函数（工具）调用的检索质量如何影响整体模型性能，并展示了通过改进工具检索策略实现的性能提升。我们的工作获得了开放科学卓越奖。完整信息请访问 https://curebench.ai/。

英文摘要

Therapeutic decision-making in clinical medicine constitutes a high-stakes domain in which AI guidance must contend with complex interactions among patient characteristics, disease processes, and pharmacological agents. Tasks such as drug recommendation, treatment planning, and adverse-effect prediction demand robust, multi-step reasoning grounded in reliable biomedical knowledge. Agentic AI methods, exemplified by TxAgent, address these challenges through iterative retrieval-augmented generation (RAG). TxAgent employs a fine-tuned Llama-3.1-8B model that dynamically generates and executes function calls to a unified biomedical tool suite (ToolUniverse), integrating FDA Drug API, OpenTargets, and Monarch resources to ensure access to current therapeutic information. In contrast to general-purpose RAG systems, medical applications impose stringent safety constraints, rendering the accuracy of both the reasoning trace and the sequence of tool invocations critical. These considerations motivate evaluation protocols treating token-level reasoning and tool-usage behaviors as explicit supervision signals. This work presents insights derived from our participation in the CURE-Bench NeurIPS 2025 Challenge, which benchmarks therapeutic-reasoning systems using metrics that assess correctness, tool utilization, and reasoning quality. We analyze how retrieval quality for function (tool) calls influences overall model performance and demonstrate performance gains achieved through improved tool-retrieval strategies. Our work was awarded the Excellence Award in Open Science. Complete information can be found at https://curebench.ai/.


2602.16902 2026-06-16 cs.AI cs.LG 版本更新

LLM-WikiRace Benchmark: How Far Can LLMs Plan over Real-World Knowledge Graphs?

LLM-WikiRace 基准测试：大语言模型在真实知识图谱上的规划能力有多强？

Juliusz Ziomek, William Bankes, Lorenz Wolf, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic

发表机构 * University of Oxford, UK（牛津大学，英国）； University College London (Centre for AI), UK（伦敦大学学院（人工智能中心），英国）； University of Basel, Switzerland（巴塞尔大学，瑞士）

AI总结提出 LLM-Wikirace 基准，通过维基百科超链接导航任务评估大语言模型的规划、推理与世界知识，发现模型在简单任务上超人类，但困难任务成功率仅 23%，且规划与长程推理是主要瓶颈。

详情

AI中文摘要

我们引入了 LLM-Wikirace，一个用于评估大语言模型（LLM）规划、推理和世界知识的基准。在 LLM-Wikirace 中，模型必须逐步高效地导航维基百科超链接，从给定源页面到达目标页面，这需要前瞻性规划和推理概念如何在现实世界中连接的能力。我们评估了广泛的开源和闭源模型，包括 Gemini-3、GPT-5 和 Claude Opus 4.5，它们在任务的简单级别上取得了最强结果，并展现了超人类性能。尽管如此，在困难难度下性能急剧下降：表现最好的模型 Gemini-3 仅在 23% 的困难游戏中成功，凸显了前沿模型面临的重大挑战。我们的分析表明，世界知识是成功的必要因素，但仅在一定程度内；超过这个阈值，规划和长程推理能力成为主导因素。轨迹级分析进一步揭示，即使是最强的模型在失败后也难以重新规划，经常陷入循环而非恢复。LLM-Wikirace 是一个简单的基准，揭示了当前推理系统的明显局限性，提供了一个开放的竞技场，其中具备规划能力的 LLM 仍有待证明。我们的代码和排行榜可在 https://llmwikirace.github.io 获取。

英文摘要

We introduce LLM-Wikirace, a benchmark for evaluating planning, reasoning, and world knowledge in large language models (LLMs). In LLM-Wikirace, models must efficiently navigate Wikipedia hyperlinks step by step to reach a target page from a given source, requiring look-ahead planning and the ability to reason about how concepts are connected in the real world. We evaluate a broad set of open- and closed-source models, including Gemini-3, GPT-5, and Claude Opus 4.5, which achieve the strongest results on the easy level of the task and demonstrate superhuman performance. Despite this, performance drops sharply on hard difficulty: the best-performing model, Gemini-3, succeeds in only 23% of hard games, highlighting substantial remaining challenges for frontier models. Our analysis shows that world knowledge is a necessary ingredient for success, but only up to a point; beyond this threshold, planning and long-horizon reasoning capabilities become the dominant factors. Trajectory-level analysis further reveals that even the strongest models struggle to replan after failure, frequently entering loops rather than recovering. LLM-Wikirace is a simple benchmark that reveals clear limitations in current reasoning systems, offering an open arena where planning-capable LLMs still have much to prove. Our code and leaderboard are available at https://llmwikirace.github.io.

2603.02668 2026-06-16 cs.AI cs.LG 版本更新

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

SorryDB: AI证明者能完成现实世界的Lean定理吗？

Austin Letson, Leopoldo Sarra, Auguste Poiroux, Oliver Dressler, Paul Lezeau, Dhyan Aranha, Frederick Pu, Aaron Hill, Miguel Corredera Hidalgo, Julian Berman, George Tsoukalas, Lenny Taelman

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出动态更新的基准SorryDB，包含78个GitHub上的现实形式化项目，评估AI证明者在复杂依赖下的能力，发现当前方法互补，基于Gemini Flash的智能体方法表现最佳。

详情

AI中文摘要

我们提出了SorryDB，一个动态更新的基准，包含从GitHub上78个现实世界形式化项目中提取的开放Lean任务。与现有的静态基准（通常由竞赛问题组成）不同，针对SorryDB基准进行爬山式优化（hillclimbing）将产生更符合社区需求、对数学家更易用、更能理解复杂依赖的工具。此外，通过提供持续更新的任务流，SorryDB减轻了测试集污染，并为智能体对新颖形式数学项目的贡献能力提供了稳健的度量。我们在从SorryDB中选取的1000个任务快照上评估了一系列方法，包括通用大型语言模型、智能体方法和专用符号证明器。我们表明当前方法是互补的：尽管基于Gemini Flash的智能体方法性能最佳，但它并不严格优于其他现成的大型语言模型、专用证明器，甚至精心策划的Lean策略列表。

英文摘要

We present SorryDB, a dynamically-updating benchmark of open Lean tasks drawn from 78 real-world formalization projects on GitHub. Unlike existing static benchmarks, often composed of competition problems, hillclimbing the SorryDB benchmark will yield tools that are aligned with community needs, more usable by mathematicians, and more capable of understanding complex dependencies. Moreover, by providing a continuously updated stream of tasks, SorryDB mitigates test-set contamination and offers a robust metric for an agent's ability to contribute to novel formal mathematics projects. We evaluate a collection of approaches, including generalist large language models, agentic approaches, and specialized symbolic provers, over a selected snapshot of 1000 tasks from SorryDB. We show that current approaches are complementary: even though an agentic approach based on Gemini Flash is the most performant, it is not strictly better than other off-the-shelf large language models, specialized provers, or even a curated list of Lean tactics.

2605.09697 2026-06-16 cs.CV cs.LG 版本更新

Discriminative Span as a Predictor of Synthetic Data Utility via Classifier Reconstruction

判别跨度作为通过分类器重构预测合成数据效用的指标

Radhika Amar Desai, Modigari Narendra

发表机构 * School of Computer Science（计算机科学学院）； Vellore Institute of Technology（韦洛尔理工学院）

AI总结本文提出一种几何驱动的指标，通过预训练模型的嵌入空间评估合成数据效用，无需模型训练，通过测量线性分类器权重向量在变化子空间中的投影误差，判断合成数据对下游分类性能的影响。

详情

AI中文摘要

在许多现实世界计算机视觉应用中，如医学影像和工业检测，二分类任务常面临正样本严重缺乏的问题。广泛采用的解决方案是对负样本施加图像到图像转换来生成合成正样本。然而，一个根本性挑战是：如何可靠地评估此类合成数据是否能提升下游模型性能？本文提出一种几何驱动的指标，该指标可预测合成数据的效用，而无需模型训练。我们的方法在预训练基础模型的嵌入空间中操作，并通过样本之间的差异向量表示数据集。我们通过测量相对投影误差，评估线性分类器的权重向量能否在这些变化向量张成的子空间内表示。直观上，如果合成数据诱导的变化捕捉了任务相关方向，它们张成的子空间就能近似分类器，从而使投影误差较低。反之，质量差的合成数据无法张成这些方向，导致误差较高。在多个数据集和架构上，我们证明该指标与混合真实负样本和合成正样本训练的CNN下游分类性能有强相关性。这些发现表明，所提指标是评估数据稀缺设置中合成数据质量的实用且信息丰富的工具。

英文摘要

In many real-world computer vision applications, including medical imaging and industrial inspection, binary classification tasks are characterized by a severe scarcity of positive samples. A widely adopted solution is to generate synthetic positive data using image-to-image transformations applied to negative samples. However, a fundamental challenge remains: how can we reliably assess whether such synthetic data will improve downstream model performance? In this work, we propose a geometry-driven metric that predicts the utility of synthetic data without requiring model training. Our approach operates in the embedding space of a pre-trained foundation model and represents the dataset through difference vectors between samples. We evaluate whether the weight vector of a linear classifier can be expressed within the subspace spanned by these variations by measuring the relative projection error. Intuitively, if the variations induced by synthetic data capture task-relevant directions, their span can approximate the classifier, resulting in low projection error. Conversely, poor synthetic data fails to span these directions, leading to higher error. Across multiple datasets and architectures, we show that this metric exhibits strong correlation with downstream classification performance of CNNs trained on mixtures of real negative and synthetic positive data. These findings suggest that the proposed metric serves as a practical and informative tool for evaluating synthetic data quality in data-scarce settings.
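
The projection test described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the metric is the residual norm of the classifier weight vector after orthogonal projection onto the span of the difference vectors, divided by the weight norm. The toy vectors and all function names are hypothetical, and Gram-Schmidt is used only to keep the example dependency-free.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:                 # skip (near-)dependent vectors
            basis.append([ri / norm for ri in r])
    return basis

def relative_projection_error(w, difference_vectors):
    """||w - P w|| / ||w||, where P projects onto span(difference_vectors)."""
    basis = orthonormal_basis(difference_vectors)
    proj = [0.0] * len(w)
    for b in basis:
        c = dot(w, b)
        proj = [p + c * bi for p, bi in zip(proj, b)]
    residual = [wi - pi for wi, pi in zip(w, proj)]
    return math.sqrt(dot(residual, residual)) / math.sqrt(dot(w, w))

# Toy 3-D "embedding space" (assumed values for illustration).
w = [1.0, 1.0, 0.0]                                  # linear classifier weights
good_diffs = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]      # span contains w
poor_diffs = [[0.0, 0.0, 1.0]]                       # orthogonal to w
err_good = relative_projection_error(w, good_diffs)
err_poor = relative_projection_error(w, poor_diffs)
print(err_good, err_poor)
```

In this toy setup the first set of difference vectors spans the classifier direction and gives an error near 0, while the second set is orthogonal to it and gives an error near 1, matching the intuition stated in the abstract.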

2605.18421 2026-06-16 cs.CL cs.AI cs.LG 版本更新

自适应数据清洗框架用于噪声标签检测

Chen-Hsuan Fang, Wei-Hsinag Chen, Pin-Hsuan Yu, Jung-Hua Wang, Tsung-Wei Pan

发表机构 * Department of Electrical Eng（电子工程系）； AI Research Center（人工智能研究中心）

AI总结提出一种无需手动阈值的自适应数据清洗框架，融合局部、全局和学习动态等多重度量，通过特征空间的多度量聚类实现噪声标签检测，在CIFAR-10、MNIST和ImageNet-100上显著提升召回率和模型精度。

详情

AI中文摘要

深度神经网络（DNN）在给定大型标注数据集的计算机视觉任务中表现出色。然而，在实际应用中，标签常常因歧义、人为错误或动态环境而受到污染。过参数化的DNN在训练过程中容易记忆这些噪声标签，从而降低模型的准确性和泛化能力。现有的数据清洗和样本选择策略通常依赖于手动指定的阈值、噪声比率的先验知识或单一度量（学习动态或几何结构），这使得它们在复杂数据场景下不稳定。本文提出了一种自适应数据清洗框架，该框架整合了局部、全局和学习动态线索，用于鲁棒的噪声标签检测。通过模块化特征拼接范式，样本被映射到统一的低维特征空间。我们提供了两种实例化：一种二维度量，结合了基于类自适应KNN的局部不一致性和基于k-means的全局质心距离；另一种三维多度量，额外引入了z归一化分数。与传统的将一维高斯混合模型应用于单一标量度量的方法不同，我们的框架在特征空间上执行多度量聚类，以自适应地将样本划分为干净主导和噪声主导成分，无需手动阈值或噪声先验。在CIFAR-10、MNIST和ImageNet-100上，针对5%至40%的对称标签噪声进行的实验表明，该框架在所有设置下均实现了高召回率，包括在ImageNet-100上40%噪声时接近完美的召回率（≥98%）。后续训练在所有评估设置下均获得了精度提升，尤其是在ImageNet-100的严重污染情况下。这些发现表明，多度量整合为噪声标签检测提供了一种无阈值、实用且低调整的策略。

英文摘要

Deep neural networks (DNNs) excel in computer vision tasks given large annotated datasets. In real-world applications, however, labels are often corrupted by ambiguity, human error, or dynamic environments. Over-parameterized DNNs easily memorize these noisy labels during training, degrading model accuracy and generalization. Existing data-cleaning and sample-selection strategies often rely on manually specified thresholds, prior knowledge of the noise ratio, or a single metric (either learning dynamics or geometric structure), making them unstable in complex data regimes. This paper proposes a self-adaptive data-cleaning framework that integrates local, global, and learning dynamics cues for robust noisy-label detection. Samples are mapped into a unified low-dimensional feature space through a modular feature concatenation paradigm. We provide two instantiations: a 2D metric integrating class-adaptive KNN-based local disagreement with k-means-based global centroid distance, and a 3D multi-metric that additionally incorporates a z-normalized score. Unlike conventional 1D Gaussian Mixture Models applied to a single scalar metric, our framework performs multi-metric clustering on the feature space to adaptively partition samples into clean-dominant and noise-dominant components without requiring manual thresholds or noise priors. Experiments on CIFAR-10, MNIST, and ImageNet-100 with 5% to 40% symmetric label noise show high recall across settings, including near-perfect recall (>=98%) on ImageNet-100 at 40% noise. Subsequent training yields accuracy gains across evaluated settings, especially under severe corruption on ImageNet-100. These findings suggest that multi-metric integration provides a threshold-free, practical, and low-tuning strategy for noisy label detection.
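
To make the 2D metric more concrete, here is a simplified sketch of the two per-sample features. It is written under loud assumptions: a fixed `k` replaces the class-adaptive KNN, the per-class mean stands in for k-means centroids, the data are made up, and the final multi-metric clustering step is omitted entirely. It is not the paper's implementation.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_disagreement(points, labels, k):
    """Fraction of each sample's k nearest neighbours carrying a different label."""
    scores = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: euclid(p, points[j]))
        neighbours = others[:k]
        scores.append(sum(labels[j] != labels[i] for j in neighbours) / k)
    return scores

def centroid_distance(points, labels):
    """Distance from each sample to the mean of the samples sharing its label."""
    centroids = {}
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        centroids[lab] = [sum(c) / len(members) for c in zip(*members)]
    return [euclid(p, centroids[l]) for p, l in zip(points, labels)]

# Hypothetical 2-D embeddings: two tight groups, last sample mislabeled on purpose.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (0.05, 0.05)]
labels = [0, 0, 0, 1, 1, 1, 1]

# Concatenate the two scores into one 2-D feature per sample.
features = list(zip(local_disagreement(points, labels, k=3),
                    centroid_distance(points, labels)))
suspect = max(range(len(points)), key=lambda i: features[i])
print(suspect, features[suspect])
```

Here the deliberately mislabeled sample scores highest on both features. In the framework described above, such features would be clustered into clean-dominant and noise-dominant components rather than ranked with a simple `max`.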

2606.11520 2026-06-16 cs.CL cs.AI cs.LG 版本更新

ISE: An Execution-Grounded Recipe for Multi-Turn OS-Agent Trajectories

ISE：一种基于执行的多轮操作系统代理轨迹合成方法

Siyuan Luo, Nairong Zheng, Lin Zhou, Tiankuo Yao, Shengyou Yuan, Haojia Yu, Cong Pang, Jiapeng Luo, Lewei Lu

发表机构 * University of Electronic Science and Technology of China（电子科技大学）； SenseTime Research（商汤研究院）

AI总结提出ISE三阶段范式，通过结构化意图构建、角色锁定用户模拟和真实执行环境，生成多轮代理轨迹，微调后显著提升代理工具使用性能。

Comments 13 pages, 6 figures. Dataset and code: https://github.com/Valiere01/ISE-Trace

详情

AI中文摘要

训练有能力的操作系统代理需要同时捕获结构化用户意图、多轮任务委派和基于工具执行的数据——这些属性在现有数据集中缺失。我们提出ISE（意图->模拟->执行），一种三阶段合成范式，联合解决这些差距。阶段1通过4D框架（人物角色x领域x任务x复杂度）构建约50000个结构化意图；去重后池中包含43956个唯一意图，并在mpnet-base-v2嵌入（余弦核，q=1）上获得61.57的Vendi分数。阶段2通过角色锁定的用户模拟器驱动多轮用户-代理交互，将每轮用户交互基于实际执行结果，生成23132条完整轨迹，平均8.12轮用户交互和68.24轮总对话。阶段3在实时、隔离的操作系统工作空间中执行每个工具调用，生成真实的故障恢复动态而非模拟响应。在ISETrace上微调后，使用Qwen3-8B在标准协议下的代理工具使用任务中，ClawEval pass@1从19.3提升至37.7。该结果优于零样本GPT-4o和四倍大的Qwen3-32B基础模型。对阶段2的消融实验表明多轮模拟带来了大部分性能提升。我们在 https://github.com/Valiere01/ISE-Trace 发布所有源代码和数据集。

英文摘要

Training capable OS agents requires data that simultaneously captures structured user intents, multi-turn task delegation, and grounded tool execution--properties absent from existing datasets. We propose ISE (Intent -> Simulate -> Execute), a three-stage synthesis paradigm that addresses these gaps jointly. Stage 1 constructs roughly 50000 structured intents via a 4D framework (Persona x Domain x Task x Complexity); after deduplication the pool contains 43956 unique intents and attains a Vendi Score of 61.57 over the entire pool on mpnet-base-v2 embeddings (cosine kernel, q=1). Stage 2 drives multi-turn user-agent interaction through a role-locked user simulator that grounds each user turn in actual execution outcomes, producing 23132 complete trajectories averaging 8.12 user turns and 68.24 total dialogue turns. Stage 3 runs every tool call inside a live, isolated OS workspace, generating authentic failure-recovery dynamics instead of simulated responses. Fine-tuning on ISETrace improves ClawEval pass@1 from 19.3 to 37.7 using Qwen3-8B on agent tool-use tasks with a standard protocol. This result outperforms zero-shot GPT-4o and the Qwen3-32B base model, which is four times larger. An ablation on Stage 2 shows that multi-turn simulation accounts for a large portion of the performance gain. We release all source code and the dataset at https://github.com/Valiere01/ISE-Trace.

2606.13608 2026-06-16 cs.AI cs.LG 版本更新

AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility

AgentBeats：面向开放性、标准化和可复现性的智能体评估代理化

Xiaoyuan Liu, Jianhong Tu, Yuqi Chen, Siyuan Xie, Sihan Ren, Tianneng Shi, Gal Gantar, Evan Sandoval, Donghyun Lee, Daniel Miao, Peter J. Gilbert, Nick Hynes, Mauro Staver, Warren He, David Marn, Andrew Low, Xi Zhang, Elron Bandel, Michal Shmueli-Scheuer, Siva Reddy, Alexandre Drouin, Alexandre Lacoste, Ramayya Krishnan, Elham Tabassi, Yu Su, Victor Barres, Chenguang Wang, Wenbo Guo, Dawn Song

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Purdue University（普渡大学）； University of Ljubljana（卢布尔雅那大学）； University of Washington（华盛顿大学）； Oasis Labs ； University of Maryland（马里兰大学）； IBM Research（IBM研究院）； Mila ； McGill University（麦吉尔大学）； ServiceNow Research（ServiceNow研究院）； Carnegie Mellon University（卡内基梅隆大学）； National Institute of Standards and Technology（美国国家标准与技术研究院）； The Ohio State University（俄亥俄州立大学）； University of Cambridge（剑桥大学）； University of California, Santa Barbara（加州大学圣塔芭芭拉分校）

AI总结提出代理化智能体评估（AAA）框架，通过标准化协议（A2A和MCP）统一评估接口，实现开放、可复现的多智能体评估，并基于AgentBeats系统通过大规模竞赛和案例研究验证其覆盖性、实用性和保真度。

详情

AI中文摘要

智能体系统在各领域快速进步，但其评估仍然碎片化。大多数基准测试依赖于固定的、以LLM为中心的测试框架，需要大量集成，造成测试与生产环境不匹配，并限制了不同智能体设计之间的公平比较。根本问题在于缺乏开放的、与智能体无关的评估接口。我们倡导代理化智能体评估（AAA），其中评估由裁判智能体执行，所有参与者通过标准化协议交互：A2A用于任务管理，MCP用于工具访问。传统基准测试定义了两个独立的接口（一个用于基准测试，一个用于智能体），而AAA只需要一个；这产生了一个通用的统一框架，将评估逻辑与智能体实现分离，并支持可复现、可互操作和多智能体评估。我们进一步引入AgentBeats作为AAA的具体实现：我们确定了五种实际操作模式，使标准化评估与开放性、隐私性和可复现性的现实约束兼容。为了大规模评估我们的设计，我们进行了两项研究：一项为期五个月的开放竞赛，吸引了来自独立参与者的12个类别的298个裁判智能体和467个主题智能体，表明AAA适用于异构基准测试范围；以及一项关于编码智能体的案例研究，证实代理化评估在保留与公开记录一致性的同时，揭示了先前缺失的直接比较结果，产生了关于智能体设计的研究见解。结合社区规模实地研究和受控编码案例研究，我们验证了AAA在异构场景下大规模提供覆盖性、实用性和保真度。AAA和AgentBeats共同为开放、标准化和可复现的智能体评估提供了清晰路径。

英文摘要

Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison across diverse agent designs. The root problem is the lack of an open, agent-agnostic assessment interface. We advocate Agentified Agent Assessment (AAA), where evaluation is performed by judge agents and all participants interact through standardized protocols: A2A for task management and MCP for tool access. Conventional benchmarking defines two separate interfaces, one for the benchmark and one for the agent, while AAA only needs one; this yields a generic, unified framework that separates assessment logic from agent implementation and enables reproducible, interoperable, and multi-agent evaluation. We further introduce AgentBeats as a concrete realization of AAA: we identify five practical operation modes that make standardized assessment compatible with real-world constraints on openness, privacy, and reproducibility. To evaluate our design at scale, we conduct two studies: a five-month open competition that drew 298 judge agents across 12 categories together with 467 subject agents from independent participants, showing that AAA applies across a heterogeneous range of benchmarks; and a case study on coding agents that confirms agentified evaluation preserves fidelity with the public record while surfacing previously missing head-to-head results, yielding research insights about agent design. Combining a community-scale field study and a controlled coding case study, we verify that AAA delivers coverage, practicality, and fidelity across heterogeneous scenarios at scale. Together, AAA and AgentBeats offer a clear path toward open, standardized, and reproducible agent assessment.

2606.14898 2026-06-16 cs.LG 新提交

α-Fair Insurance Pricing: A Fairness Continuum

α-公平保险定价：一个公平性连续谱

Tianhe Zhang, Xiguang Liu, Peng Shi

发表机构 * Department of Risk and Insurance, Wisconsin School of Business, University of Wisconsin–Madison（威斯康星大学麦迪逊分校威斯康星商学院风险与保险系）； Department of Information Systems and Operations Management, Warrington College of Business, University of Florida（佛罗里达大学沃灵顿商学院信息系统与运营管理系）

AI总结提出α-FISP框架，通过约束优化平衡精算公平与团结公平，参数α实现从纯精算到纯团结的连续定价谱，理论保证且计算可行。

详情

AI中文摘要

保险定价中的公平性仍然是一个长期存在且争论不休的难题。一方面，保险公司出于盈利考虑，设定区分个体风险的保费以实现精算公平。另一方面，保险通过跨人群的风险汇集发挥关键社会功能，激励群体间的交叉补贴以促进团结公平。这两种竞争性公平观念之间的张力使得保险定价本质上复杂，尤其是在现代环境中，精细数据允许越来越细的风险区分，而监管机构面临保护弱势群体的压力日益增大。为解决这一挑战，我们提出了一个α-公平个体偿付能力保费（α-FISP）框架，该框架在保证偿付能力（保险运营的基本要求）的同时，明确捕捉精算公平与团结公平之间的权衡。我们将定价问题表述为一个约束优化任务，其中精算公平保费在每一风险类别内的交叉补贴预算约束下进行调整。这一表述自然产生一族由α参数化的解，追踪从纯精算定价到纯团结定价的连续谱，使决策者能够在此公平性谱上选择操作点。我们为所提出的框架推导了理论保证。数值实验表明，α-FISP计算上可行，并且与具有异质性州级公平性要求的美国监管体制高度一致。

英文摘要

Fairness in insurance pricing remains a long-standing and deeply debated puzzle. On one hand, insurers, driven by profitability considerations, set premiums that differentiate across individual risks to achieve actuarial fairness. On the other hand, insurance serves a critical societal function by pooling risks across a population, motivating cross-subsidization among groups to promote solidarity fairness. The tension between these two competing notions of fairness makes insurance pricing inherently complex, particularly in modern settings where granular data allow for increasingly fine risk differentiation and regulators face growing pressure to protect vulnerable groups. To address this challenge, we propose an α-Fair Individual Solvent Premium (α-FISP) framework for insurance pricing that explicitly captures the trade-off between actuarial and solidarity fairness while guaranteeing solvency, a fundamental requirement in insurance operations. We formulate the pricing problem as a constrained optimization task, where actuarially fair premiums are adjusted subject to budget constraints on cross-subsidization within each risk class. This formulation naturally yields a family of solutions parameterized by α, tracing a continuum between purely actuarial and purely solidarity-based pricing and enabling decision-makers to select an operating point along this fairness spectrum. We derive theoretical guarantees for the proposed framework. Numerical experiments show that α-FISP is computationally tractable and aligns well with the U.S. regulatory regimes featuring heterogeneous state-level fairness requirements.

2606.12486 2026-06-16 cs.LG 新提交

An Empirical Study on Predictive Maintenance for Component X in Heavy-Duty Scania Trucks

重型斯堪尼亚卡车中组件X的预测性维护实证研究

Valeriu Dimidov, Sasan Jafarnejad, Raphaël Frank

发表机构 * SnT, University of Luxembourg（卢森堡大学SnT）； Scania CV AB（斯堪尼亚商用车公司）

AI总结针对卡车车队，提出一种基于状态监测的预测性维护方法，将磨损状态建模为单调非递减时间序列，通过选取最近观测并转换为表格数据，利用AutoML简化建模，在Scania组件X数据集上降低了成本。

详情

DOI: 10.1109/ICPHM65385.2025.11061822

AI中文摘要

近年来，基于状态的预测性维护（PdM）在卡车车队中得到了广泛应用。这种维护策略旨在通过监测车辆的健康状况并根据其状态采取主动措施，最大限度地减少计划外停机并降低成本。然而，由于卡车产生的大量数据、通过传感器数据检测故障的内在复杂性以及在解决方案实施中寻找成本效益权衡的困难，基于状态的PdM系统的实施具有挑战性。在本文中，我们定义并验证了一种基于状态的PdM方法，该方法基于一个假设：被监测组件的磨损状态可以表示为单调非递减的时间序列。它涉及仅从时间序列中选择最近的观测值，并将其转换为表格格式，以便使用为表格数据设计的机器学习（ML）模型进行分类。我们的结果表明，与当前最先进（SOTA）方法相比，所提出的方法在Scania组件X数据集上降低了成本，同时通过AutoML简化了建模过程。

英文摘要

Condition-based Predictive Maintenance (PdM) for truck fleets has gained momentum in recent years. This maintenance strategy aims to minimize unplanned downtimes and reduce costs by monitoring the health status of vehicles and taking proactive action based on their condition. However, the implementation of condition-based PdM systems is challenging due to the large volume of data generated by the trucks, the inherent complexity of detecting failures through sensor data and the difficulties in finding cost-effective trade-offs in the solution's implementation. In this paper, we define and validate a condition-based PdM methodology built on the assumption that the wear-and-tear state of the monitored component can be represented as a monotonically non-decreasing time series. It involves selecting only the most recent observations from the time series and transforming them into a tabular format for classification using machine learning (ML) models designed for tabular data. Our results indicate that the proposed methodology reduces costs on the Scania Component X dataset compared to current state-of-the-art (SOTA) approaches, while also simplifying the modeling process through AutoML.
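
The windowing step described above (keep only the most recent observations, then flatten them into one row for a tabular model) might look roughly like this. It is a sketch under assumptions: the field names `wear_counter` and `temp`, the window size, and the left-padding rule for short histories are all hypothetical and not taken from the paper or the dataset.

```python
def to_tabular_row(readings, window):
    """Flatten the `window` most recent readings into one feature row.

    `readings` is a list of dicts ordered oldest to newest. Histories
    shorter than `window` are left-padded by repeating the oldest
    available reading (an assumed convention).
    """
    if not readings:
        raise ValueError("need at least one reading")
    recent = readings[-window:]
    padded = [recent[0]] * (window - len(recent)) + recent
    row = {}
    # lag0 is the newest reading, lag1 the one before it, and so on.
    for lag, reading in enumerate(reversed(padded)):
        for name, value in reading.items():
            row[f"{name}_lag{lag}"] = value
    return row

def is_non_decreasing(values):
    """Check the monotonicity assumption on a wear-type signal."""
    return all(a <= b for a, b in zip(values, values[1:]))

# Hypothetical per-vehicle history (illustrative values only).
history = [
    {"wear_counter": 10, "temp": 71.0},
    {"wear_counter": 14, "temp": 73.5},
    {"wear_counter": 14, "temp": 72.0},
    {"wear_counter": 19, "temp": 75.2},
]
row = to_tabular_row(history, window=2)
print(row)
```

One such row per vehicle can then be passed to any classifier built for tabular data, which is the point at which an AutoML tool would take over model selection.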

2606.14960 2026-06-16 cs.LG cs.CY 新提交

Leveraging Physiological Signals to Predict Exam Outcomes with Machine Learning

利用生理信号通过机器学习预测考试结果

Lala Yamazaki, Ramchandra Rimal

发表机构 * Middle Tennessee State University（中田纳西州立大学）

AI总结研究使用机器学习模型分析考试期间的生理数据（皮肤电活动、心率、皮肤温度）预测成绩，比较了逻辑回归、随机森林、SVM及LSTM、GRU、Transformer等模型，发现随机森林在效率和可解释性上表现优异，Transformer与LSTM/GRU性能相当。

Comments 9 figures and 5 tables

详情

AI中文摘要

本研究探讨了利用机器学习模型预测考试结果的可行性，数据来自考试期间收集的生理信号。分析了包括皮肤电活动、心率和皮肤温度在内的生理压力指标，以揭示其与学业表现的关系。采用了多种机器学习方法，从逻辑回归、随机森林和支持向量机等标准模型，到更先进的架构，包括Transformer、长短期记忆（LSTM）和门控循环单元（GRU）模型。这种多样性旨在有效捕捉数据中的复杂交互。一个关键焦点是评估Transformer在处理数值数据方面的适应性，并评估其在此新情境下的性能。使用准确率、精确率、召回率和F1分数等标准性能指标来比较模型效果。实验结果表明，虽然深度学习模型通常擅长捕捉生理数据中的复杂关系，但像随机森林这样的简单模型有时能实现更优性能，同时提供计算效率和可解释性。此外，Transformer表现出显著的多功能性，展现出与LSTM和GRU模型相当的性能。本研究强调了尝试与问题目标相符的广泛模型类别的重要性，平衡了精度、效率和可解释性。通过阐明生理信号与学业表现之间的关系，本研究有助于理解影响学生心理健康的压力因素，并进一步促进利用生理数据提升学生福祉和学业成果。

英文摘要

This study investigates the application of machine learning models to predict exam outcomes using physiological data collected during examination sessions. Physiological stress indicators, including electrodermal activity, heart rate, and skin temperature, were analyzed to uncover their association with academic performance. A variety of machine learning approaches were employed, ranging from standard models like logistic regression, random forest, and support vector machines to more advanced architectures, including transformers, long short-term memory (LSTM), and gated recurrent unit (GRU) models. This diversity aimed to capture the complex interactions within the data effectively. A key focus was assessing the adaptability of transformers in processing numerical data and evaluating their performance in this novel context. Standard performance metrics, such as accuracy, precision, recall, and F1-score, were used to compare model efficacy. The experimental results demonstrate that while deep learning models generally excel at capturing complex relationships in physiological data, simpler models like random forests can sometimes achieve superior performance while offering computational efficiency and interpretability. Furthermore, transformers demonstrated notable versatility, showcasing performances comparable to those of the LSTM and GRU models. This research underscores the importance of experimenting with a broad class of models that align with the objectives of the problem at hand, balancing precision, efficiency, and interpretability. By elucidating the relationships between physiological signals and academic performance, this study contributes to understanding stressors affecting students' mental health. It further promotes leveraging physiological data to enhance student well-being and academic outcomes.

2606.14999 2026-06-16 cs.LG 新提交

医学中的语义推理：知识图谱在五个关键领域的作用

Haniye Sherafatmandjoo, Mohammad Akbari, Zahed Rahmati

发表机构 * Amirkabir University of Technology（阿米尔卡比尔理工大学）

AI总结综述知识图谱在医学中的应用，涵盖临床决策支持、疾病预测、健康推荐、精准医疗和医学问答，并讨论构建方法、挑战及未来方向。

详情

AI中文摘要

知识图谱（KGs）已成为整合和推理复杂生物医学与临床数据的有前景解决方案。通过表示疾病、药物、症状和患者记录等实体之间的结构化关系，KGs为决策、预测、推荐和个性化护理提供了语义基础。最近的进展已证明它们在多种医学应用中的实用性——包括临床决策支持系统、疾病和治疗结果预测、健康推荐系统、精准医疗和医学问答——其中KGs通常增强可解释性、语义一致性和患者特定推理。与此同时，越来越多的研究专注于医学KG生成本身，提出了利用本体、语义网技术、基于深度学习的信息提取和混合神经符号流水线，从电子健康记录、临床叙述、生物医学文献和网络资源构建图谱的框架。尽管取得了这些进展，仍然存在重大挑战，包括知识覆盖有限且分散、异构数据源对齐困难、当前推理和表示学习方法在密集多关系图上的脆弱性，以及与隐私、偏见和问责相关的未解决问题。本综述从应用导向和方法导向两个维度回顾和分类了当前医学KG的研究，讨论了其优势和技术基础，并概述了关键局限性和开放研究方向。通过分析趋势、架构和评估实践，本文旨在指导KG驱动的医学AI系统的未来发展，并支持其安全有效地融入医疗环境。

英文摘要

Knowledge graphs (KGs) have emerged as a promising solution for integrating and reasoning over complex biomedical and clinical data in healthcare. By representing structured relationships among entities such as diseases, drugs, symptoms, and patient records, KGs provide a semantic backbone for decision-making, prediction, recommendation, and personalized care. Recent advances have demonstrated their utility across diverse medical applications--including clinical decision support systems, disease and treatment outcome prediction, health recommender systems, precision medicine, and medical question answering--where KGs often enhance interpretability, semantic coherence, and patient-specific reasoning. In parallel, a growing body of work focuses on medical KG generation itself, proposing frameworks that construct graphs from EHRs, clinical narratives, biomedical literature, and web resources using ontologies, semantic web technologies, deep-learning-based information extraction, and hybrid neuro-symbolic pipelines. Despite this progress, significant challenges remain, including limited and fragmented knowledge coverage, difficulties in aligning heterogeneous data sources, the fragility of current reasoning and representation-learning methods on dense multi-relational graphs, and unresolved issues related to privacy, bias, and accountability. This survey reviews and categorizes current research on KGs in medicine along both application-oriented and methodology-oriented dimensions, discusses their benefits and technical foundations, and outlines key limitations and open research directions. By analyzing trends, architectures, and evaluation practices, this work aims to guide future developments in KG-driven medical AI systems and support their safe and effective integration into healthcare environments.

2606.15225 2026-06-16 cs.LG cs.AI cs.IR 新提交

Edu-Theater: A Data-Efficient Agent Framework for Scalable Learner Behavior Simulation through Staging Roll-Call

Edu-Theater: 一种通过点名排演实现可扩展学习者行为模拟的数据高效智能体框架

Weibo Gao, Qi Liu, Linan Yue, Zheng Zhang, Yichao Du, Fangzhou Yao, Ao Yu, Zhenya Huang, Shijin Wang

发表机构 * University of Science and Technology of China（中国科学技术大学）； State Key Laboratory of Cognitive Intelligence（认知智能国家重点实验室）； Southeast University（东南大学）； Alibaba Group（阿里巴巴集团）； iFLYTEK Co., Ltd.（科大讯飞股份有限公司）

AI总结提出Edu-Theater框架，通过构建群体水平能力先验和少量诊断查询，利用LLM智能体模拟学习者行为，在减少数据需求的同时提高模拟精度，并增强下游自适应测试等应用。

Comments LLM Agent, Educational Data Mining, Data Synthesis, Human Simulation

详情

AI中文摘要

大规模学习者-任务交互数据对智能教育系统至关重要，但收集成本高且受隐私和学习者参与度限制。学习者模拟器在无需真实学习者持续参与的情况下，对模拟可扩展的学习者行为起着关键作用。然而，现有方法主要是以个体为中心，为每个学习者配对模拟器，从密集的交互历史中迭代推断潜在知识状态，这既数据密集又计算密集，且在冷启动场景中脆弱。我们提出一种群体感知的点名模拟范式，首先构建群体水平的能力先验，然后通过少量有针对性的诊断查询细化个体学习者状态。基于该范式，我们引入Edu-Theater，一个由LLM驱动的智能体系统，通过教师智能体和基于学习者日志的回顾性点名探测执行群体感知的学习者模拟。Edu-Theater无需每个学习者的密集历史即可实现可扩展的未来行为模拟。在两个真实世界数据集上的实验表明，Edu-Theater以显著更少的LLM调用实现了更高的模拟精度，生成的合成数据增强了自适应测试等下游应用。

英文摘要

Large-scale learner-task interaction data are crucial for intelligent educational systems but are costly to collect and constrained by privacy and learner engagement. Learner simulators play a critical role in simulating scalable learner behavior without the need for continuous involvement of real learners. However, existing methods are predominantly individual-centric, pairing a simulator with each learner to iteratively infer latent knowledge states from dense interaction histories, which is both data- and computation-intensive, and fragile in cold-start scenarios. We propose a cohort-aware roll-call simulation paradigm that first constructs cohort-level proficiency priors and refines individual learner states through a small number of targeted diagnostic queries. Based on this paradigm, we introduce Edu-Theater, an LLM-powered agent system that performs cohort-aware learner simulation via a teacher agent and retrospective roll-call probing over learner logs. Edu-Theater enables scalable future behavior simulation without the need for dense per-learner histories. Experiments on two real-world datasets demonstrate that Edu-Theater achieves higher simulation accuracy with significantly fewer LLM calls, producing synthetic data that enhances downstream applications such as adaptive testing.

2606.15257 2026-06-16 cs.LG 新提交

AI for Social Good: An Investigation of the Causal Relationship Between Environmental Regulations and Their Effects on Air Pollution in London, UK

AI 促进社会公益：英国伦敦环境法规与其对空气污染影响的因果关系研究

Yang Han, Jacqueline CK Lam, Victor OK Li, Yiu-Wai Man

发表机构 * Department of Electrical and Electronic Engineering, The University of Hong Kong（香港大学电子与电气工程系）

AI总结提出不确定性感知的贝叶斯深度学习框架，估计2010-2020年伦敦空气污染法规对PM2.5的因果效应，发现法规平均降低PM2.5 1.88 μg/m³（12.35%）。

详情

AI中文摘要

空气污染法规是城市公共卫生治理的核心，但估计其效果具有挑战性，因为政策实施非随机，且污染轨迹受气象、社会经济变化、时间趋势和重叠干预措施的影响。本研究开发了一个不确定性感知的贝叶斯深度学习框架，用于估计2010年至2020年伦敦空气污染法规对PM2.5浓度的总体影响。该框架整合了来自内伦敦监测站的每日PM2.5观测数据、气象协变量、年度社会经济指标、月份和星期指示变量，以及32项政策措施的每日法规状态数据。贝叶斯LSTM捕获环境和社会经济协变量的时间依赖性，贝叶斯嵌入层表示时间和法规状态输入，法规状态预测分支支持基于倾向性得分的非随机政策实施调整。通过将观测到的PM2.5浓度与假设无法规情景下的反事实预测进行比较，估计法规效果，并在重复贝叶斯训练和bootstrap重采样中总结不确定性。结果显示，伦敦的法规与平均PM2.5减少1.88 μg/m³（相对减少12.35%）相关，95%置信区间为1.64-2.12 μg/m³。2013年之前效果有限，2013年至2017年效果逐渐明显，2018年和2019年效果最强。研究结果表明，持续累积的监管干预措施对伦敦空气质量改善产生了可衡量的影响。本研究展示了不确定性感知的因果AI如何支持环境问责、公共卫生保护和基于证据的环境决策治理。

英文摘要

Air pollution regulation is central to urban public health governance, but estimating its effects is difficult because policies are implemented non-randomly and pollution trajectories are shaped by meteorology, socioeconomic change, temporal trends, and overlapping interventions. This study develops an uncertainty-aware Bayesian deep learning framework to estimate the aggregate effect of air pollution regulations on PM2.5 concentrations in London from 2010 to 2020. The framework integrates daily PM2.5 observations from Inner London monitoring stations, meteorological covariates, annual socioeconomic indicators, month-of-year and day-of-week indicators, and daily regulation status data for 32 policy measures. A Bayesian LSTM captures temporal dependencies in environmental and socioeconomic covariates, Bayesian embedding layers represent temporal and regulation status inputs, and a regulation status prediction branch supports propensity score-based adjustment for non-random policy implementation. Regulatory effects are estimated by comparing observed PM2.5 concentrations with counterfactual predictions under a hypothetical no-regulation scenario, with uncertainty summarized across repeated Bayesian training runs and bootstrap resampling. Results show that London's regulations were associated with an average PM2.5 reduction of 1.88 μg/m³, a relative reduction of 12.35%, with a 95% confidence interval of 1.64-2.12 μg/m³. Estimated effects were limited before 2013, became clearer from 2013 to 2017, and were strongest in 2018 and 2019. The findings suggest that sustained and cumulative regulatory interventions contributed to measurable improvements in London's air quality. This study demonstrates how uncertainty-aware causal AI can support environmental accountability, public health protection, and evidence-based governance for environmental decision-making.
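
As a rough consistency check on the reported figures, suppose the 12.35% relative reduction is measured against the mean counterfactual no-regulation concentration. This is an assumption, since the abstract does not state the denominator, and the symbols c_0 and c_obs are introduced here only for illustration. The implied levels would then be approximately:

```latex
\[
c_0 \approx \frac{1.88}{0.1235} \approx 15.2\ \mu\text{g/m}^3,
\qquad
c_{\text{obs}} \approx c_0 - 1.88 \approx 13.3\ \mu\text{g/m}^3 .
\]
```

These are back-of-the-envelope values derived only from the two numbers in the abstract, not results reported by the authors.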

2606.15288 2026-06-16 cs.LG cs.AI physics.ao-ph 新提交

Hybrid NARX-LLM for Greenland Iceberg Discharge: Prompt-Driven Residual Correction

混合NARX-LLM用于格陵兰冰山排放：提示驱动的残差校正

Yiquan Gao, Duohui Xu

发表机构 * Heriot-Watt University（赫瑞瓦特大学）； StudioYG

AI总结提出混合NARX-LLM框架，结合非线性自回归模型与大型语言模型进行残差校正，并引入物理信息提示方法，用于建模格陵兰冰山排放的复杂非线性动态，提升预测准确性。

详情

AI中文摘要

格陵兰冰山排放表现出复杂的非线性动态，且可观测性有限，对传统预测模型构成挑战。我们提出一个混合NARX-LLM框架，该框架结合了具有外源输入的非线性自回归模型（NARX）和用于残差校正的大型语言模型（LLM）。我们进一步提出了一种物理信息提示（PIP）方法，将非结构化物理知识转化为结构化提示，用于零样本上下文推理。主要目标是探索该框架在建模格陵兰冰山排放方面的校正潜力，而不仅仅是优化预测精度。NARX组件捕获内在的时间依赖性，而由PIP引导的LLM编码冰川动力学和环境驱动因素，并感知关键趋势模式以校正系统预测误差。这种集成允许模型推理未建模因素并产生可解释的残差，从而提升整体预测精度。应用于格陵兰冰山排放时间序列，我们的方法处理了由于罕见变化和非平稳趋势而难以预测的极端事件，这是传统方法经常忽视的局限性。通过融合结构化时间序列建模与知识驱动的Foundation AI，该框架提供了一条可扩展且可解释的路径，将数据受限的气候预测与物理信息LLM推理相结合。代码已公开。

英文摘要

Greenland iceberg discharge exhibits complex nonlinear dynamics with limited observability, challenging traditional predictive models. We present a Hybrid NARX-LLM framework that combines a nonlinear autoregressive model with exogenous inputs (NARX) and a large language model (LLM) for residual correction. We further propose a Physics-Informed Prompt (PIP) method that transforms unstructured physical knowledge into structured prompts for zero-shot in-context reasoning. The primary objective is to explore the corrective potential of this framework for modeling Greenland iceberg discharge, rather than merely optimizing predictive accuracy. The NARX component captures intrinsic temporal dependencies, while the LLM, guided by PIP, encodes glacier dynamics and environmental drivers and perceives key trend patterns to correct systematic prediction errors. This integration allows the model to reason about unmodeled factors and produce interpretable residuals, enhancing overall predictive accuracy. Applied to Greenland iceberg discharge time series, our approach addresses extreme events that are difficult to predict due to rare variations and nonstationary trends, a limitation often overlooked by traditional methods. By fusing structured time-series modeling with knowledge-driven foundation AI, the framework offers a scalable and interpretable pathway to bridge data-limited climate forecasting with physics-informed LLM reasoning. The code is available.

2606.15314 2026-06-16 cs.LG cs.AI stat.ML 新提交

HAPI-EP：迈向混合、自适应和预测性的心脏电生理数字孪生

Sumeet Vadhavkar, Xiajun Jiang, Yubo Ye, Maryam Toloubidokhti, Linwei Wang

发表机构 * Rochester Institute of Technology（罗切斯特理工学院）

AI总结提出HAPI框架，通过物理集成灰盒模型、元学习快速自适应和条件生成模型，构建可识别、强预测性的心脏电生理数字孪生。

详情

AI中文摘要

患者特异性心脏的数字孪生（DT）在个性化医疗中具有巨大潜力。然而，其快速动态适应个体实时数据以及适应后的预测能力仍是核心挑战。我们从两个组成部分审视这一挑战：DT公式化中，机械模型和数据驱动模型展现出竞争性的优点和局限性；DT优化策略主要由重建目标驱动，导致模型不可识别。我们通过HAPI——一个用于构建混合、自适应和预测性DT的AI框架——解决这两个瓶颈，该框架包含三个关键使能器。首先，HAPI构建了一个物理集成的灰盒模型，其中可解释的机械骨干网络由神经组件增强，以建模其与观测数据的残差。其次，HAPI不试图在静态混合模型中预编码所有可能的变异，而是通过前馈元学习器实现混合模型对少样本实时数据的快速即时自适应，这些元学习器通过预测目标训练实现机械和神经参数的摊销推理。最后，我们证明这种自适应性对应于构建一个条件生成模型（即混合DT），赋予其理论可识别性，从而在预测场景中表现出色。我们在心脏电生理学中展示了HAPI的概念验证，使用具有机械反应动力学和神经图扩散的混合单域模型。通过合成和真实数据研究，我们表明HAPI的机械-神经混合和预测自适应对于获得具有强预测和分布外能力的可识别DT至关重要。

英文摘要

A digital twin (DT) of a patient-specific heart offers significant potential in personalized medicine. However, its rapid and dynamic adaptation to an individual's live data and its predictive capability after adaptation remain central challenges. We examine this challenge from its two building blocks: DT formulation, where mechanistic and data-driven models show competing merits and limitations, and DT optimization strategies, which are largely driven by a reconstruction objective that leads to unidentifiable models. We address both bottlenecks via HAPI -- an AI framework for building hybrid, adaptive, and predictive DTs with three key enablers. First, HAPI constructs a physics-integrated gray-box model in which an interpretable mechanistic backbone is augmented by a neural component that models its residual to the observed data. Second, rather than attempting to pre-encode all possible variations in a static hybrid model, HAPI enables rapid on-the-fly adaptation of the hybrid model to few-shot live data, achieved by feedforward meta-learners realizing amortized inference of both mechanistic and neural parameters of the hybrid model trained with predictive objectives. Finally, we show that this adaptivity corresponds to the construction of a conditional generative model (i.e., the hybrid DT) that endows it with theoretical identifiability and thus strong performance in predictive scenarios. We demonstrate the proof-of-concept of HAPI in cardiac electrophysiology using a hybrid monodomain model with mechanistic reaction kinetics and neural graph diffusion. Across synthetic and real-data studies, we show that HAPI's mechanistic-neural hybridization and predictive adaptation are critical for obtaining identifiable DTs with strong predictive and out-of-distribution capabilities.

2606.15640 2026-06-16 cs.LG 新提交

Multi-Agent Framework for Audit Risk Assessment with Explicit Uncertainty and Evidence Conflict Modeling

具有显式不确定性和证据冲突建模的审计风险评估多智能体框架

Yuhan Wang, Manqing Wang, Yixuan Lu, Zhaoyue Peng, Shengda Lin

发表机构 * Columbia University（哥伦比亚大学）； Trine University（特林大学）； University of Sofia（索菲亚大学）； University of Illinois at Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）； Westcliff University（韦斯特克莱夫大学）

AI总结提出UMAR框架，通过三个专业智能体独立评估风险并校准不确定性，利用Dempster-Shafer理论融合分数并测量冲突，在SEC 10-K数据集上优于基线模型，提供可解释的风险信号。

AI中文摘要

审计风险评估日益受益于结合异质证据源，但现有方法通常产生点预测，而不量化不同证据流的一致程度。我们提出UMAR（不确定性感知多智能体风险评估），一个采用三个专业智能体的框架：MD&A文本智能体、财务比率智能体和CAM智能体，每个智能体产生具有校准不确定性估计的独立风险评分。基于Dempster-Shafer证据理论的不确定性聚合器融合这些分数，同时显式测量智能体间冲突。我们在来自SEC 10-K文件（2019-2023）的3200个公司年观测值的美国数据集上评估UMAR，以财务重述为目标标签。实验结果表明，UMAR的AUROC为0.782，PR-AUC为0.341，优于逻辑回归、XGBoost、FinBERT以及单智能体和双智能体LLM基线。UMAR在所有方法中达到最低的期望校准误差（ECE = 0.052），并识别出与实际重述风险相关的证据冲突模式，为审计师提供潜在可操作且可解释的风险信号。

英文摘要

Audit risk assessment increasingly benefits from combining heterogeneous evidence sources, yet existing approaches typically produce point predictions without quantifying how well different evidence streams agree. We propose UMAR (Uncertainty-Aware Multi-Agent Risk Assessment), a framework that employs three specialized agents: an MD&A Text Agent, a Financial Ratio Agent, and a CAM Agent, each producing independent risk scores with calibrated uncertainty estimates. An Uncertainty Aggregator based on Dempster-Shafer evidence theory fuses these scores while explicitly measuring inter-agent conflict. We evaluate UMAR on a U.S. dataset of 3,200 firm-year observations from SEC 10-K filings (2019-2023), with financial restatement as the target label. Experimental results show that UMAR achieves an AUROC of 0.782 and a PR-AUC of 0.341, outperforming logistic regression, XGBoost, FinBERT, and single-agent and dual-agent LLM baselines. UMAR attains the lowest expected calibration error (ECE = 0.052) among all methods and identifies evidence-conflict patterns that correlate with actual restatement risk, offering auditors potentially actionable and interpretable risk signals.
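
As a rough illustration of how Dempster-Shafer fusion can combine several agents' risk scores while reporting their disagreement, the following sketch applies Dempster's rule on a two-element frame {risk, safe}. It is not UMAR's implementation: the `to_mass` mapping from a score and an uncertainty to a mass function, and the names `combine` and `fuse_agents`, are assumptions made for this example only.

```python
def to_mass(score, uncertainty):
    """Map a risk score in [0, 1] and an uncertainty in [0, 1] to belief masses.

    'theta' is the mass left on the whole frame {risk, safe} ("don't know").
    This mapping is an illustrative assumption, not taken from the paper.
    """
    committed = 1.0 - uncertainty
    return {
        "risk": score * committed,
        "safe": (1.0 - score) * committed,
        "theta": uncertainty,
    }


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions on {risk, safe}."""
    # Conflict K: mass assigned to contradictory pairs (risk vs. safe).
    conflict = m1["risk"] * m2["safe"] + m1["safe"] * m2["risk"]
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    norm = 1.0 - conflict
    fused = {
        "risk": (m1["risk"] * m2["risk"] + m1["risk"] * m2["theta"]
                 + m1["theta"] * m2["risk"]) / norm,
        "safe": (m1["safe"] * m2["safe"] + m1["safe"] * m2["theta"]
                 + m1["theta"] * m2["safe"]) / norm,
        "theta": (m1["theta"] * m2["theta"]) / norm,
    }
    return fused, conflict


def fuse_agents(masses):
    """Fuse a list of mass functions left to right, recording each step's conflict."""
    fused = masses[0]
    conflicts = []
    for mass in masses[1:]:
        fused, k = combine(fused, mass)
        conflicts.append(k)
    return fused, conflicts
```

Calling `fuse_agents` on three mass functions (one per agent) returns a fused belief plus the conflict measured at each combination step; a high conflict value flags cases where the evidence streams disagree.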

2606.15642 2026-06-16 cs.LG cs.AI 新提交

CIWI-CKT: Chaos-Informed Wave Interference Feature Fusion and Cross-City Knowledge Transfer for Traffic Flow Forecasting

CIWI-CKT：混沌信息波干涉特征融合与跨城市知识迁移用于交通流预测

Abdul Joseph Fofanah, Lian Wen, David Chen, Shaoyang Zhang

发表机构 * Griffith University（格里菲斯大学）； School of Information and Communication Technology, Griffith University（格里菲斯大学信息与通信技术学院）； School of Information Engineering, Chang’an University（长安大学信息工程学院）

AI总结针对跨城市数据稀缺场景，提出CIWI-CKT框架，融合混沌信息波生成、元干涉处理和混沌感知元学习，显著提升预测精度并降低数据需求。

AI中文摘要

在跨城市、数据稀缺的场景下，准确预测交通流仍然具有挑战性，因为有限的历史数据阻碍了模型的泛化能力。交通动态的混沌性质、复杂的时空依赖关系以及异质的城市网络使得跨城市的小样本学习变得复杂。现有的深度学习方法要么将交通视为完全确定性的，要么缺乏对跨体制交通动态至关重要的波状干涉模式进行建模的机制。为了解决这些局限性，本文提出了CIWI-CKT，一种新颖的混沌信息波干涉特征融合框架，结合跨城市知识迁移。我们的框架引入了三个核心创新：混沌信息波生成，提取可测量的混沌不变量并将交通建模为自适应波分量；元干涉处理，捕获支持域和查询域之间的波相互作用，同时生成可预测性分数用于置信度估计；以及混沌感知元学习，在保留混沌特性的同时实现高效的跨城市知识迁移。我们建立了理论保证，包括混沌到波的稳定性、波诱导的降维以及元学习泛化界限。在四个真实世界交通数据集上的大量实验表明，CIWI-CKT显著优于最先进的时空图学习、迁移学习、基于提示和小样本方法，在提高预测精度的同时大幅减少了所需的训练数据。

英文摘要

Accurate traffic flow prediction remains challenging in cross-city, data-scarce scenarios where limited historical data hinders model generalisation. The chaotic nature of traffic dynamics, complex spatio-temporal dependencies, and heterogeneous urban networks complicate few-shot learning across cities. Existing deep learning approaches either treat traffic as purely deterministic or lack mechanisms to model wave-like interference patterns essential for cross-regime traffic dynamics. To address these limitations, this paper proposes CIWI-CKT, a novel Chaos-Informed Wave Interference Feature Fusion framework with Cross-City Knowledge Transfer. Our framework introduces three core innovations: chaos-informed wave generation that extracts measurable chaos invariants and models traffic as adaptive wave components; meta-interference processing that captures wave interactions between support and query regimes while producing a predictability score for confidence estimation; and chaos-aware meta-learning that enables efficient cross-city knowledge transfer while preserving chaotic characteristics. We establish theoretical guarantees including chaos-to-wave stability, wave-induced dimension reduction, and meta-learning generalisation bounds. Extensive experiments on four real-world traffic datasets demonstrate that CIWI-CKT significantly outperforms state-of-the-art spatio-temporal graph learning, transfer learning, prompt-based, and few-shot methods, improving prediction accuracy while substantially reducing required training data.

2606.15701 2026-06-16 cs.LG q-fin.ST 新提交

Robust Transformer-Based One-Step Stock Index Forecasting via Shifted Data Augmentation

基于移位数据增强的鲁棒Transformer一步股票指数预测

Tien Thanh Thach

发表机构 * Faculty of Mathematics and Statistics, Ton Duc Thang University（孙德胜大学数学与统计学院）

AI总结提出改进的Transformer架构结合余弦退火学习率调度和移位数据增强（SDA），在VN30和S&P 500指数上有效降低预测误差和波动性，优于增加模型复杂度的方法。

AI中文摘要

Transformer在序列建模中取得了显著成功，但由于噪声信号、短记忆动态和分布偏移，其直接应用于金融时间序列仍具有挑战性。本文提出了一种改进的Transformer架构用于一步股票指数预测，结合了先进的学习率调度和一种新颖的移位数据增强（SDA）技术。我们在两个基准股票指数数据集VN30和S&P 500上评估了所提出的框架。实验结果表明，带预热的余弦退火相比广义逆幂调度器持续提高了预测精度。此外，SDA显著降低了预测误差和运行间变异性，同时提高了对超参数选择的鲁棒性。余弦退火调度与SDA的组合在两个数据集上均取得了最佳性能，表明在基于Transformer的金融预测中，数据增强比增加模型复杂度可以发挥更重要的作用。这些发现为在噪声金融环境中进行鲁棒的股票指数预测提供了一种实用且计算高效的方法。

英文摘要

Transformers have shown remarkable success in sequence modeling, yet their direct application to financial time series remains challenging due to noisy signals, short-memory dynamics, and distributional shifts. This paper proposes a modified Transformer architecture for one-step stock index forecasting, combined with advanced learning-rate scheduling and a novel Shifted Data Augmentation (SDA) technique. We evaluate the proposed framework on two benchmark stock index datasets, VN30 and S&P 500. Experimental results demonstrate that cosine annealing with warmup consistently improves forecasting accuracy over the generalized inverse-power scheduler. Furthermore, SDA substantially reduces forecasting errors and run-to-run variability while improving robustness to hyperparameter selection. The combination of cosine annealing scheduling and SDA achieved the best performance on both datasets, indicating that data augmentation can play a more important role than increasing model complexity in Transformer-based financial forecasting. These findings provide a practical and computationally efficient approach for robust stock index forecasting in noisy financial environments.
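
For readers unfamiliar with the schedule named above, here is a minimal sketch of cosine annealing with linear warmup. The abstract does not give the exact formula or hyperparameters, so the function name `lr_at`, the linear warmup shape, and all parameter names are assumptions chosen for illustration.

```python
import math


def lr_at(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a 0-indexed training step.

    Linear warmup up to base_lr, followed by cosine decay down to min_lr.
    """
    if step < warmup_steps:
        # Ramp linearly so that the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The warmup phase avoids large early updates while optimizer statistics are still noisy, and the cosine phase lowers the rate smoothly rather than in discrete drops.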

2606.15756 2026-06-16 cs.LG cs.AI 新提交

From Correlation to Causation in Lane Change Prediction for Automated Driving: A Causal Explanation Framework

从相关性到因果性：自动驾驶换道预测的因果解释框架

Mohamed Manzour, Aditya Kumar, Augusto Luis Ballardini, Miguel Ángel Sotelo

发表机构 * University of Alcalá（阿尔卡拉大学）

AI总结提出基于因果推断的换道预测框架，结合深度结构因果建模与干预效应分析，在预测准确率超过95%的同时，识别直接贡献变量及其因果链，实现可解释的因果推理。

AI中文摘要

换道预测是智能车辆的核心任务，提前预测操作有助于更安全的决策。然而，现有方法主要学习观测驾驶变量与未来操作之间的统计关联，而忽略了输入变量之间的因果依赖关系。这限制了可解释性，尤其是当纵向间隙、相对纵向速度和碰撞时间（TTC）等物理相关变量被视为独立平坦输入时。本文提出一个基于因果推断的换道预测与解释框架。该方法结合语言特征构建、专家约束的因果发现、基于深度端到端因果推断（DECI）的深度结构因果建模、基于干预的效果分析、反驳测试和递归因果链解释。目标不仅是预测未来操作，还要识别直接贡献于预测的候选变量、影响这些变量的上游因素以及这些效应传播的因果链。该框架在车道标记交叉事件前的前三秒内平均F1分数超过95%。除了预测精度，该框架使用基于干预的效果分析，在学到的因果结构下区分有影响力的变量和弱影响力变量。它进一步区分候选直接贡献者和中介效应，并生成对比性因果链解释，阐明为什么预测的操作更受青睐，而替代操作支持较少。因此，主要贡献是一个机制感知的换道预测流程，从基于相关性的分类转向更可解释的因果推理用于操作预测。

英文摘要

Lane-change prediction is a central task in intelligent vehicles, where early maneuver anticipation can support safer decision-making. However, many existing approaches mainly learn statistical associations between observed driving variables and future maneuvers, while overlooking the causal dependencies among the input variables themselves. This limits interpretability, especially when physically related variables such as longitudinal gap, relative longitudinal velocity, and Time-To-Collision (TTC) are treated as independent flat inputs. This article presents a causal-inference-based framework for lane-change prediction and explanation. The proposed approach combines linguistic feature construction, expert-constrained causal discovery, deep structural causal modeling with Deep End-to-end Causal Inference (DECI), intervention-based effect analysis, refutation testing, and recursive causal-chain explanation. The objective is not only to predict the future maneuver, but also to identify candidate variables that directly contribute to the prediction, the upstream factors influencing them, and the causal chains through which these effects propagate. The framework achieves average F1-scores above 95% during the first three seconds before the lane-marking crossing event. Beyond prediction accuracy, the framework uses intervention-based effect analysis to distinguish influential from weakly influential variables under the learned causal structure. It further distinguishes candidate direct contributors from mediated effects and generates contrastive causal-chain explanations that clarify why the predicted maneuver is favored and why the alternative maneuvers are less supported. The main contribution is therefore a mechanism-aware lane-change prediction pipeline that moves beyond correlation-based classification toward more interpretable causal reasoning for maneuver prediction.
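
The point about physically related inputs can be made concrete with a small sketch: under a constant-velocity assumption, TTC is a deterministic function of the longitudinal gap and the relative longitudinal velocity, so the three variables cannot be treated as independent. The sign convention (relative velocity = lead speed minus ego speed) and the function name are assumptions for this example, not definitions from the paper.

```python
import math


def time_to_collision(gap_m, rel_velocity_mps):
    """Constant-velocity TTC in seconds.

    rel_velocity_mps is assumed to be (lead speed - ego speed), so a negative
    value means the ego vehicle is closing in on the lead vehicle.
    """
    closing_speed = -rel_velocity_mps
    if closing_speed <= 0:
        return math.inf  # not closing: no collision under this model
    return gap_m / closing_speed


# An "intervention" on the gap changes TTC even though velocity is untouched,
# which is the kind of dependency a flat feature vector hides.
ttc_observed = time_to_collision(30.0, -5.0)
ttc_after_do_gap = time_to_collision(15.0, -5.0)
```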

2606.15784 2026-06-16 cs.LG cs.CE 新提交

Bayesian Networks with Latent Time Embedding for Stage-Aware Causal Modeling of Alzheimer's Disease Progression

具有潜在时间嵌入的贝叶斯网络用于阿尔茨海默病进展的阶段感知因果建模

Nguyen Linh Dan Le

发表机构 * Alzheimer's Disease Neuroimaging Initiative（阿尔茨海默病神经影像学倡议）； Open Access Series of Imaging Studies（开放获取影像学研究系列）

AI总结提出BN-LTE框架，结合贝叶斯网络与潜在时间嵌入，利用AT(N)级联约束建模AD进展，在ADNI数据上优于基线，并识别出淀粉样蛋白敏感性的中期伪时间窗口。

Comments 7 pages, 5 figures

AI中文摘要

阿尔茨海默病（AD）的进展通常通过淀粉样蛋白-tau-神经退行性变（AT(N)）级联来描述。然而，大多数纵向模型要么将这种级联表示为固定的生物标志物序列，要么表示为黑箱预测任务。这使得难以确定生物学引导的生物标志物关系何时影响未来的区域病理。在本研究中，我们引入了具有潜在时间嵌入的贝叶斯网络（BN-LTE），这是一个用于AD进展阶段感知建模的贝叶斯结构框架。BN-LTE从基线生物标志物谱估计疾病伪时间，并根据生物学上合理的AT(N)排序约束有向依赖关系。然后使用后验样条变结构方程将初始多模态测量与未来的年度区域tau-PET变化联系起来。在使用ADNI数据的重复受试者分离评估中，与包含的预测基线相比，BN-LTE显示出tau进展的强空间重建。除了空间重建，BN-LTE恢复了后验阶段变化的AT(N)约束效应，并识别出淀粉样蛋白敏感性的中期伪时间窗口。该窗口得到模型隐含的g公式对比、根调整AIPW、机制敏感消融以及跨样条和先验规范的鲁棒性分析的支持。总体而言，这些发现将BN-LTE定位为一种贝叶斯结构框架，用于预测tau进展，同时检查观察性纵向神经影像数据中阶段依赖的AT(N)级联机制。我们的代码可在https://github.com/danleneurocom/BN-LTE获取。

英文摘要

Alzheimer's disease (AD) progression is often described through the amyloid-tau-neurodegeneration, or AT(N), cascade. However, most longitudinal models represent this cascade either as a fixed sequence of biomarkers or as a black-box forecasting task. This makes it difficult to determine when biologically guided biomarker relationships influence future regional pathology. In this study, we introduce Bayesian Networks with Latent Time Embedding (BN-LTE), a Bayesian structural framework for stage-aware modeling of AD progression. BN-LTE estimates disease pseudotime from baseline biomarker profiles and constrains directed dependencies according to biologically plausible AT(N) ordering. Posterior spline-varying structural equations are then used to link initial multimodal measurements with future annualized regional tau-PET change. Across repeated subject-disjoint evaluations using ADNI data, BN-LTE shows strong spatial reconstruction of tau progression compared with the included forecasting baselines. Beyond spatial reconstruction, BN-LTE recovers posterior stage-varying AT(N)-constrained effects and identifies a mid-pseudotime window of amyloid sensitivity. This window is supported by model-implied g-formula contrasts, root-adjusted AIPW, mechanism-sensitive ablations, and robustness analyses across spline and prior specifications. Overall, these findings position BN-LTE as a Bayesian structural framework for forecasting tau progression while examining stage-dependent AT(N)-cascade mechanisms in observational longitudinal neuroimaging data. Our code is available at https://github.com/danleneurocom/BN-LTE.

2606.15807 2026-06-16 cs.LG cs.AI 新提交

Continuous Cross-Domain Traffic State Prediction via Memory-Augmented Graph Liquid Time-Constant Networks

基于记忆增强图液态时间常数网络的连续跨域交通状态预测

Jinrong Xiang, Ming Xu

发表机构 * Software College, Liaoning Technical University（辽宁工程技术大学软件学院）

AI总结提出记忆增强图液态时间常数网络（MA-GLTC），通过时空单元分解、图液态时间常数动态和记忆迁移存储机制，实现连续时间下的跨域交通状态预测，在五个数据集上优于现有方法。

AI中文摘要

交通状态预测是智能交通系统中的一项基本任务。在实际应用中，一些区域由于感知基础设施不足而面临有限的交通观测，使得跨域知识迁移成为数据稀缺交通预测的重要解决方案。然而，现有的跨域交通预测方法仍面临若干局限，包括粗粒度的源-目标域适应、处理未见目标域模式的能力有限，以及在非规则或异质时间条件下对连续交通动态建模不足。为解决这些问题，本文提出了一种连续跨域交通预测框架，称为记忆增强图液态时间常数网络（MA-GLTC）。具体地，我们首先构建时空单元（STU）将交通网络分解为可迁移的局部单元，实现跨域的细粒度知识对齐。然后，开发了图液态时间常数网络（GLTC）来建模连续时间下图耦合的交通演化。与通用的基于图神经ODE的模型不同，GLTC将图耦合的循环电导引入液态时间常数动态，允许节点状态随泄漏、自适应时间常数和邻域感知反馈而演化。此外，设计了基于记忆的迁移存储（MTS）机制，以保留源域知识、检索匹配的交通模式，并在出现未见状态时更新可靠的目标域模式。在五个公开交通数据集上的实验表明，MA-GLTC在短期和长期预测任务中均持续优于代表性的域内和跨域基线。与次优方法相比，MA-GLTC分别将平均预测误差降低了3.02%、0.33%、8.92%、10.09%和2.11%。

英文摘要

Traffic state prediction is a fundamental task in intelligent transportation systems. In practical applications, some regions suffer from limited traffic observations due to insufficient sensing infrastructure, making cross-domain knowledge transfer an important solution for data-scarce traffic prediction. However, existing cross-domain traffic prediction methods still face several limitations, including coarse-grained source-target adaptation, limited capability in handling unseen target-domain patterns, and insufficient modeling of continuous traffic dynamics under irregular or heterogeneous temporal conditions. To address these issues, this paper proposes a continuous cross-domain traffic prediction framework, termed Memory-Augmented Graph Liquid Time-Constant Network (MA-GLTC). Specifically, we first construct spatio-temporal units (STUs) to decompose traffic networks into transferable local units, enabling fine-grained knowledge alignment across domains. Then, a graph liquid time-constant network (GLTC) is developed to model graph-coupled traffic evolution in continuous time. Different from generic graph neural ODE-based models, GLTC introduces graph-coupled recurrent conductance into liquid time-constant dynamics, allowing node states to evolve with leakage, adaptive time constants, and neighborhood-aware feedback. Furthermore, a Memory-based Transfer Storage (MTS) mechanism is designed to preserve source-domain knowledge, retrieve matched traffic patterns, and update reliable target-domain patterns when unseen states emerge. Experiments on five public traffic datasets demonstrate that MA-GLTC consistently outperforms representative inner-domain and cross-domain baselines in both short-term and long-term prediction tasks. Compared with the second-best method, MA-GLTC reduces the average prediction errors by 3.02%, 0.33%, 8.92%, 10.09%, and 2.11%, respectively.
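
To give a feel for "leakage, adaptive time constants, and neighborhood-aware feedback", the sketch below takes one explicit Euler step of a liquid time-constant style update, dx/dt = -(1/tau + g) x + g A, where the gate g depends on a node's own state, its neighbors' mean state, and its input. This is a generic toy under stated assumptions, not the paper's GLTC: the scalar weights, the neighbor-mean coupling, and the name `ltc_graph_step` are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ltc_graph_step(x, inputs, adjacency, tau, A, w_self, w_nbr, w_in, dt):
    """One Euler step of a toy graph-coupled liquid time-constant update.

    x, inputs: per-node state and external input (lists of floats).
    adjacency: square 0/1 matrix; adjacency[i][j] != 0 means j is a neighbor of i.
    """
    n = len(x)
    new_x = []
    for i in range(n):
        nbrs = [j for j in range(n) if adjacency[i][j]]
        nbr_mean = sum(x[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        # Gate in (0, 1): acts as a state- and neighborhood-dependent conductance.
        gate = sigmoid(w_self * x[i] + w_nbr * nbr_mean + w_in * inputs[i])
        # Leak term 1/tau plus gate gives an input-dependent effective time constant.
        dx = -(1.0 / tau + gate) * x[i] + gate * A
        new_x.append(x[i] + dt * dx)
    return new_x
```

Because the gate multiplies the state, the effective time constant 1 / (1/tau + g) shrinks when the gate is large, which is what makes the dynamics "liquid"; the step size `dt` can vary between calls to handle irregular sampling.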

2606.15892 2026-06-16 cs.LG 新提交

Scalar-pathway fidelity improves physical accuracy in short-range equivariant interatomic potentials

标量路径保真度提高短程等变原子间势的物理准确性

Jia Bi, Alin Marin Elena, Samuel Pinilla

发表机构 * Science and Technology Facilities Council（科学技术设施委员会）； Diamond Light Source（钻石光源）

AI总结提出标量路径修正方法（PAN池化和PGS混合器），在保持等变骨架不变下优化标量通道，使MACE等势的力误差降低22-27%，能量误差降低19-22%，且计算开销仅增5%。

AI中文摘要

精确的原子间势使得对材料、分子和界面的分子动力学模拟能够超越密度泛函理论的长度和时间尺度。等变神经网络势改进了局部几何的表示。然而，其可部署的能量表面最终通过不变的标量通道体现，这些通道的聚合和谱分辨率相对未得到充分研究。这里我们使用物理感知邻域（PAN）池化和物理引导谱（PGS）混合器作为受控的标量路径探针：轻量级、保持对称性的修改，仅作用于$\ell=0$通道，同时保持等变张量主干不变。使用MACE作为高体阶机制支架，PAN添加配位敏感的幅度调制，而PGS用径向和锥形谱基增强边和读出标量特征。在金属Ag、共价Si、短程离子LiF/Li--F子集和MD17/rMD17分子上，这种标量路径修正将MACE力误差降低22-27%，能量误差降低19-22%；在带有应力标签的系统上，应力误差降低27-28%，推理FLOPs成本增加约5%。在Allegro和NequIP中方向一致的增益进一步表明该修正可跨不同短程等变主干移植，尽管效果大小仍依赖于架构。这些结果将标量路径保真度确定为短程等变原子间势的一个实用设计维度。

英文摘要

Accurate interatomic potentials enable molecular dynamics of materials, molecules, and interfaces beyond density-functional-theory length and time scales. Equivariant neural network potentials have improved the representation of local geometry. However, their deployable energy surfaces ultimately manifest through invariant scalar channels, whose aggregation and spectral resolution remain comparatively underexamined. Here we use Physics-Aware Neighborhood (PAN) pooling and Physics-Guided Spectral (PGS) mixers as controlled scalar-pathway probes: lightweight, symmetry-preserving modifications that act only on $\ell=0$ channels while leaving the equivariant tensor backbone unchanged. Using MACE as a high-body-order mechanistic scaffold, PAN adds coordination-sensitive amplitude modulation, whereas PGS augments edge and readout scalar features with radial and tapered spectral bases. Across metallic Ag, covalent Si, a short-range ionic LiF/Li--F subset, and MD17/rMD17 molecules, this scalar-pathway correction reduces MACE force errors by 22--27\% and energy errors by 19--22\%; on systems with stress labels, stress errors decrease by 27--28\%, at approximately 5\% additional inference-FLOPs cost. Directionally consistent gains in Allegro and NequIP further indicate that the correction is portable across distinct short-range equivariant backbones, although effect sizes remain architecture-dependent. These results identify scalar-pathway fidelity as a practical design dimension for short-range equivariant interatomic potentials.

2606.15927 2026-06-16 cs.LG 新提交

An Exploratory Study of Blood Glucose Estimation from Photoplethysmography Signals using Machine Learning

基于机器学习从光电容积脉搏波信号估计血糖的探索性研究

Ruhani Bhatia, Vijval Ekbote

发表机构 * Indraprastha Institute of Information Technology, Delhi（德里英德拉普拉斯塔信息技术学院）

AI总结本研究利用智能手表PPG信号和CGM血糖数据构建机器学习模型，探索无创血糖估计的可行性，初步结果显示存在预测信号但需更多数据验证。

Comments 7 pages, 3 figures

AI中文摘要

糖尿病和极端血糖水平是当今人类面临的主要健康问题之一。虽然连续血糖监测（CGM）已成为管理糖尿病和监测血糖水平的有效技术，但该技术传统上是侵入性的（即需要刺穿皮肤），并存在刺激、硬结等风险。这凸显了对准确且可大规模部署的非侵入性CGM方法的需求。随着各种传感技术的出现及其在智能手表等可穿戴设备中的集成，我们现在能够以非侵入方式连续监测光电容积脉搏波（PPG）等身体信号。通过CGM连续监测血糖并通过智能手表连续监测PPG信号的能力，为我们提供了获取这两类密集数据的机会，从而开启了构建基于机器学习和深度学习的模型以从PPG信号估计血糖水平的可能性。在这项工作中，我们首先提供了一个配对数据集，包含来自智能手表的连续PPG信号以及使用CGM设备记录的血糖值。我们还展示了在数据集上进行的一些初步实验探索的结果。这些初步结果表明可能存在一些预测信号，但需要来自更多个体的更多数据进行进一步探索。数据集可在 https://zenodo.org/records/20577959 获取。

英文摘要

Diabetes and extreme blood sugar levels are among the major health problems faced by people across the world today. While Continuous Glucose Monitoring (CGM) has emerged as an effective technology for managing diabetes as well as for monitoring blood sugar levels, this technology has traditionally been invasive (that is, requiring the piercing of the skin) and carries the risk of irritation, induration, etc. This highlights the need for accurate and non-invasive CGM methods that can be deployed at scale. With the emergence of various sensing technologies and their integration in wearables like the smartwatch, we now have the capability to continuously monitor body signals like the Photoplethysmogram (PPG) in a non-invasive manner. Having the ability to continuously monitor blood glucose through CGMs and continuously monitor PPG signals through a smartwatch offers an opportunity to get dense data on these two, opening the possibility of building machine learning and deep learning based models to estimate blood glucose level from PPG signals. In this work, we first present a paired dataset comprising continuous PPG signals from a smartwatch along with glucose values recorded using a CGM device. We also present the results of some preliminary experimental explorations performed on our dataset. These preliminary results suggest that some predictive signals may exist, though more exploration is needed with more data from a larger number of individuals. The dataset can be accessed at https://zenodo.org/records/20577959.

2606.16023 2026-06-16 cs.LG 新提交

IBAD: Interpretable Behavioral Anomaly Detection on Human Mobility Data

IBAD：人类移动数据上的可解释行为异常检测

Bita Azarijoo, John Krumm, Cyrus Shahabi

发表机构 * University of Southern California（南加州大学）

AI总结提出IBAD框架，利用LDA学习可解释的日常移动模板，通过层次自监督模型检测个体行为异常，在真实和合成数据集上验证了模板的可迁移性和鲁棒性。

AI中文摘要

人类移动行为看似高度多样化，但个体日常移动的大部分可由少量重复的行为模板解释，如通勤、学校活动、照护、夜生活或差事模式。我们提出IBAD（可解释行为异常检测），该框架学习可解释的日常移动模板，并将每个个体表示为这些模板混合上的分布。IBAD不关注特定位置，而是刻画个体在不同地点执行的活动。该方法首先使用潜在狄利克雷分配（LDA）发现全局行为模板，然后采用层次自监督模型从个体的软行为模板中学习正常行为。我们还引入了一个拼接基准，用于在个体历史画像与注入的移动模式之间创建受控的行为不匹配。在真实和合成数据集上的实验表明，日常行为可有效分解为少量可解释的模板。关键的是，我们证明学习到的行为原型在不同地理和人口统计背景下具有可迁移性。此外，IBAD在所有设置下均保持稳健的竞争性能。为便于复现，代码可在 https://github.com/USC-InfoLab/IBAD 获取。

英文摘要

Human mobility appears highly diverse, yet much of a person's daily mobility can be explained by a small set of recurring behavioral templates, such as commuting, school-centered activities, caregiving, nightlife, or errand patterns. We present IBAD (Interpretable Behavioral Anomaly Detection), a framework that learns interpretable daily mobility templates and represents each individual as a distribution over mixtures of these templates. Rather than focusing on specific locations, IBAD characterizes activities that individuals perform across locations. This approach first discovers global behavioral templates using Latent Dirichlet Allocation (LDA), then employs a hierarchical self-supervised model to learn normal behavior of individuals from their soft behavioral templates. We also introduce a splicing benchmark that creates controlled behavioral mismatches between an individual's historical profile and injected mobility patterns. Experiments on real-world and synthetic datasets show that daily behavior can be effectively decomposed into a small number of interpretable templates. Crucially, we show that the learned behavioral archetypes transfer across distinct geographic and demographic contexts. Furthermore, IBAD maintains robust, competitive performance across all settings. For reproducibility purposes, the code is accessible at https://github.com/USC-InfoLab/IBAD.

2606.16056 2026-06-16 cs.LG cs.HC 新提交

Beyond the Blood Draw: Explainable Machine Learning for Non-Invasive Dysglycemia Risk Screening

超越抽血：用于非侵入性血糖异常风险筛查的可解释机器学习

Black Sun, Chenyi Zhang, Kaiyi Ji, Xi Lu

发表机构 * Department of Computer Science, Aarhus University（奥胡斯大学计算机科学系）； University at Buffalo, SUNY（纽约州立大学布法罗分校）

AI总结利用NHANES数据训练LightGBM等六种机器学习模型，实现无需实验室检测的血糖异常风险筛查，AUC达0.820，优于传统风险评分，并识别出年龄、种族和腰高比等关键预测因素。

AI中文摘要

血糖异常，包括糖尿病前期和糖尿病，影响着全球大量成年人，但其中许多人仍未得到诊断。我们开发并验证了用于非侵入性血糖异常风险筛查的机器学习模型，这些模型无需实验室检测。汇集2017-2023年国家健康与营养调查（NHANES）数据（n=14,352），我们使用分层5折交叉验证训练了六种机器学习模型，并将其与两种既定的临床风险评分进行比较。LightGBM在受试者工作特征曲线下面积（AUC=0.820，95% CI：0.806-0.835）上表现最佳，优于芬兰糖尿病风险评分（0.745）和美国糖尿病协会风险测试（0.783）。SHAP分析确定年龄、种族/民族和腰高比是最有影响力的预测因素。亚组分析证实了在不同人口统计分层中的一致表现（AUC：0.735-0.832）。这些结果证明了在社区环境和自我跟踪健康应用中部署可解释、无需实验室的血糖异常筛查的可行性。

英文摘要

Dysglycemia, encompassing both prediabetes and diabetes, affects huge numbers of adults worldwide, yet many of them remain undiagnosed. We developed and validated machine-learning (ML) models for non-invasive screening of dysglycemia risk that require no laboratory tests. Pooling data from the National Health and Nutrition Examination Survey (NHANES) 2017--2023 (n=14,352), we trained six ML models with stratified 5-fold cross-validation and compared them with two established clinical risk scores. LightGBM achieved the highest area under the receiver operating characteristic curve (AUC=0.820, 95% CI: 0.806--0.835), outperforming the Finnish Diabetes Risk Score (0.745) and American Diabetes Association Risk Test (0.783). SHAP analysis identified age, race/ethnicity, and waist-to-height ratio as the most influential predictors. Subgroup analyses confirmed consistent performance across demographic strata (AUC: 0.735--0.832). These results demonstrate the feasibility of explainable, laboratory-free dysglycemia screening for deployment in community settings and self-tracking health applications.

2606.16160 2026-06-16 cs.LG cs.AI cs.HC 新提交

A comparative and critical study of EEGNet for fNIRS-driven cognitive load classification

EEGNet在fNIRS驱动的认知负荷分类中的比较与批判性研究

Mehshan Ahmed Khan, Houshyar Asadi, Li Zhang, Mohammad Reza Chalak Qazani, Ghazal Bargshady, Stefanos Gkikas, Christian Arzate, Sam Oladazimi, Zoran Najdovsk, Lei Wei, Chee Peng Lim

发表机构 * Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University（迪肯大学智能系统研究与创新研究所（IISRI））； Department of Computer Science, Royal Holloway, University of London（伦敦大学皇家霍洛威学院计算机科学系）； College of Science and Engineering, James Cook University（詹姆斯库克大学科学与工程学院）； Faculty of Science and Technology, University of Canberra（堪培拉大学科学与技术学院）； Honda Research Institute (HRI), Japan（日本本田研究院）； Swinburne University of Technology, Hawthorn, Victoria（斯威本科技大学，维多利亚州霍索恩）

AI总结本研究系统评估EEGNet在fNIRS认知负荷分类中的性能，发现重叠分段和小固定学习率在随机分割中表现最佳，但受试者独立评估准确率大幅下降，非重叠分段和PCA特征在SI评估中取得最佳56.11%准确率，表明消除时间冗余有助于学习更鲁棒的跨个体表征。

AI中文摘要

由于时间变异性、受试者间差异以及对预处理选择的敏感性，从功能性近红外光谱（fNIRS）信号中准确分类认知负荷仍然是一个重大挑战。本研究通过系统检查时间分割策略（重叠与非重叠）、窗口长度（10秒、20秒、30秒）、特征提取方法（方差分析（ANOVA）、主成分分析（PCA）、快速独立成分分析（FastICA））、学习率配置（固定和自适应）以及评估协议（随机分割与受试者独立（SI））的影响，对EEGNet在基于fNIRS的认知负荷分类中进行了全面评估。随机分割实验的结果表明，重叠分割结合较小的固定学习率（0.01-0.001）由于时间冗余和血流动力学转变的密集采样而产生了最高的准确率。然而，SI评估显示准确率大幅下降，表明对未见参与者的泛化能力有限。在SI评估下，非重叠分割优于重叠窗口，使用PCA特征、20秒窗口和0.1学习率获得了最佳准确率56.11%。这些发现表明，消除时间冗余有助于模型学习更鲁棒和可泛化的跨个体认知负荷表征。尽管自适应学习率策略提高了训练稳定性，但并未超过最优选择的固定学习率的性能。该研究强调了分割策略和学习率选择在提高模型泛化能力中的关键作用，并指出了开发基于fNIRS的可靠、实时和受试者独立认知负荷分类系统所必需的方法学考虑。

英文摘要

Accurately classifying cognitive load from functional near-infrared spectroscopy (fNIRS) signals remains a significant challenge due to temporal variability, inter-subject differences, and sensitivity to preprocessing choices. This study provides a comprehensive evaluation of EEGNet for fNIRS-based cognitive load classification by systematically examining the effects of temporal segmentation strategies (overlapping vs. non-overlapping), window lengths (10s, 20s, 30s), feature extraction methods (Analysis of Variance (ANOVA), Principal Component Analysis (PCA), Fast Independent Component Analysis (FastICA)), learning rate configurations (fixed and adaptive), and evaluation protocols (random split vs. subject-independent (SI)). Results from random-split experiments show that overlapping segmentation, combined with smaller fixed learning rates (0.01-0.001), yields the highest accuracies, due to temporal redundancy and dense sampling of hemodynamic transitions. However, SI evaluation reveals a substantial drop in accuracy, demonstrating limited generalization to unseen participants. Under SI evaluation, non-overlapping segmentation outperformed overlapping windows, with the best accuracy of 56.11% achieved using PCA features with a 20-second window and a 0.1 learning rate. These findings indicate that eliminating temporal redundancy helps the model learn more robust and generalizable representations of cognitive load across individuals. Although the adaptive learning-rate strategy improved training stability, it did not surpass the performance of optimally selected fixed learning rates. The study highlights the critical role of segmentation strategy and learning rate selection in improving model generalization and identifies methodological considerations essential for developing reliable, real-time, and SI cognitive load classification systems using fNIRS.
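
The contrast between random-split and subject-independent results is easier to see with a small sketch of the two ingredients: sliding-window segmentation and a split that holds out whole participants. The helper names are hypothetical, window and step sizes are in samples, and leave-one-subject-out is used here as one possible subject-independent protocol; the abstract does not state which SI scheme the study used.

```python
def segment(signal, window, step):
    """Cut a 1-D signal into windows; step < window gives overlapping windows."""
    return [signal[s:s + window] for s in range(0, len(signal) - window + 1, step)]


def leave_one_subject_out(subject_ids):
    """Yield (train_indices, test_indices), holding out one subject per fold."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield train, test


recording = list(range(100))                              # stand-in for one fNIRS channel
non_overlapping = segment(recording, window=20, step=20)
overlapping = segment(recording, window=20, step=10)

# Adjacent overlapping windows share raw samples. If a random split puts one in
# train and the other in test, the test set is no longer independent of training.
shared = set(overlapping[0]) & set(overlapping[1])
```

With overlapping windows and a random split, near-duplicate segments land on both sides of the split, which can inflate accuracy; holding out entire subjects removes that shortcut.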

2606.16183 2026-06-16 cs.LG cs.AI cs.CL 新提交

LLM-Powered Virtual Population for Demand Simulation and Pricing

基于LLM的虚拟人群用于需求模拟与定价

Chengpiao Huang, Kaizheng Wang

发表机构 * Columbia University（哥伦比亚大学）

AI总结提出一种LLM驱动的虚拟人群模型，通过混合客户画像和LLM评估购买概率，生成需求分布，支持风险感知定价，在H&M数据集上表现最优。

Comments 18 pages, 7 figures

详情

AI中文摘要

我们开发了一个基于LLM的虚拟人群模型，用于模拟定价决策中的需求，其中产品由丰富的非结构化信息（如文本描述和图像）描述，决策者不仅需要平均需求预测，还需要反事实价格的不确定性估计。我们的模型将暴露的客户表示为从有限混合客户画像中的抽取。对于每个画像、产品和候选价格，LLM使用结构化画像信息和非结构化产品信息来引出画像级别的购买概率。这些概率通过校准的混合权重聚合，形成总需求的预测分布。生成的模拟器可以在各种定价目标下评估反事实价格，包括期望收入和风险感知标准（如条件风险价值）。我们在一个包含产品描述和图像的在线H&M时尚数据集上测试了该框架。校准后的基于LLM的模拟器在所考虑的模型中实现了最佳的整体预测性能，并支持样本高效的定价决策。我们的框架提供了一种实用的方法，将LLM用作需求模拟器，适用于历史需求数据有限但产品信息丰富的产品。通过生成完整的需求预测分布而不仅仅是点预测，它使管理者能够比较候选价格、量化需求不确定性，并选择针对平均收入或风险感知目标的价格。

英文摘要

We develop an LLM-powered virtual population model that simulates demand for pricing decisions, in settings where products are described by rich unstructured information, such as text descriptions and images, and where decision makers need not only mean-demand predictions but also uncertainty estimates for counterfactual prices. Our model represents exposed customers as draws from a finite mixture of customer personas. For each persona, product, and candidate price, an LLM elicits a persona-level purchase probability using both structured persona information and unstructured product information. These probabilities are aggregated through calibrated mixture weights to form a predictive distribution of aggregate demand. The resulting simulator can evaluate counterfactual prices under various pricing objectives, including expected revenue and risk-aware criteria such as conditional value at risk. We test the framework on an online H&M fashion dataset with product descriptions and images. The calibrated LLM-based simulator achieves the best overall predictive performance among the models considered, and supports sample-efficient pricing decisions. Our framework provides a practical way to use LLMs as demand simulators for products with limited historical demand data but rich product information. By producing a full predictive demand distribution rather than only a point forecast, it enables managers to compare candidate prices, quantify demand uncertainty, and choose prices that target either average-case revenue or risk-aware objectives.
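
The pipeline described above (persona mixture, persona-level purchase probabilities, aggregate demand distribution, risk-aware objective) can be sketched with a small Monte Carlo simulation. In this sketch the LLM call is replaced by a hypothetical logistic stand-in, `toy_purchase_prob`, and the mixture weights and willingness-to-pay values are made up; only the overall structure follows the abstract.

```python
import math
import random


def toy_purchase_prob(persona, price):
    """Hypothetical stand-in for the LLM-elicited purchase probability."""
    willingness_to_pay = [15.0, 30.0]  # made-up values, one per persona
    return 1.0 / (1.0 + math.exp((price - willingness_to_pay[persona]) / 5.0))


def simulate_revenue(price, weights, purchase_prob, n_customers, n_sims, rng):
    """Monte Carlo draws of revenue = price * (number of buyers)."""
    personas = list(range(len(weights)))
    revenues = []
    for _ in range(n_sims):
        # Each exposed customer is a draw from the persona mixture.
        drawn = rng.choices(personas, weights=weights, k=n_customers)
        demand = sum(rng.random() < purchase_prob(k, price) for k in drawn)
        revenues.append(price * demand)
    return revenues


def cvar_lower(values, alpha):
    """Mean of the worst alpha-fraction of outcomes (lower tail)."""
    ordered = sorted(values)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k
```

A risk-neutral decision maker would compare candidate prices by the mean of `simulate_revenue(...)`, while a risk-aware one could compare `cvar_lower(revenues, alpha)` instead, trading some average revenue for a better worst case.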

2606.16226 2026-06-16 cs.LG 新提交

Prediction of Runtime Parameters of Parallel Chemistry Applications via Active and Generative Learning

通过主动和生成学习预测并行化学应用的运行时参数

Tanzila Tabassum, Omer Subasi, Ajay Panyala, Epiya Ebiapia, Gerald Baumgartner, Erdal Mutlu, P Sadayappan, Karol Kowalski

发表机构 * Louisiana State University（路易斯安那州立大学）； Pacific Northwest National Laboratory（太平洋西北国家实验室）； University of Utah（犹他大学）

AI总结提出基于主动学习和生成学习的机器学习方法，结合梯度提升回归树模型，预测并行化学计算的运行时参数，在CCSD计算中MAPE低至0.023，R²高达99.9%。

2606.16434 2026-06-16 cs.LG cs.AI 新提交

Autonomous End-to-End SOH Prediction Services for Battery Systems via Temporal-Contrastive Representation Learning

基于时间对比表示学习的电池系统自主端到端健康状态预测服务

Junting Wen, Dan Li, Qihao Quan, Xiwen Wang, Hang Yang, Zhaohong Meng, Zigui Jiang, Changlin Yang, Tianle Liu, Diego Muñoz-Carpintero, Jian Lou

发表机构 * School of Software Engineering, Sun Yat-sen University（中山大学软件学院）； Tianneng Battery Group Co., Ltd（天能电池集团有限公司）； School of Communication Engineering, Hangzhou Dianzi University（杭州电子科技大学通信工程学院）； Institute of Engineering Science, Universidad de O’Higgins（奥希金斯大学工程科学研究所）

AI总结提出TC-SOH模块化服务架构，通过时间对比机制和跨窗口预测任务从原始数据中提取退化相关表示，实现自主端到端SOH预测，在四个数据集上MAPE和RMSE分别降低1.91倍和2.13倍。

详情

AI中文摘要

准确的健康状态（SOH）估计是锂离子电池管理的关键诊断服务。然而，依赖劳动密集型的手动特征工程和不透明的黑箱模型阻碍了可扩展的工业部署。为此，我们引入TC-SOH：一种模块化、即插即用的服务架构，用于自主、端到端的SOH预测。TC-SOH采用时间对比机制和跨窗口预测前置任务，直接从原始运行数据中提取与退化相关的表示。为了提高透明度，我们将模型效能与表示诊断联系起来：可视化、敏感性分析、冗余分析、双向探测、未来SOH探测和时间打乱表明，学习到的特征与选定的专家描述符重叠，同时保留了额外的SOH相关变化，并且有序的时间上下文改善了后续SOH预测。在四个公开数据集上，TC-SOH优于所考虑的物理信息和数据驱动基线，MAPE降低了1.91倍，RMSE降低了2.13倍。

英文摘要

Accurate state of health (SOH) estimation is a critical diagnostic service for lithium-ion battery management. However, reliance on labor-intensive manual feature engineering and opaque black-box models hinders scalable industrial deployment. To address this, we introduce TC-SOH: a modular, plug-and-play service architecture for autonomous, end-to-end SOH prediction. TC-SOH employs a temporal-contrastive mechanism and a cross-window prediction pretext task to extract degradation-relevant representations directly from raw operational data. To improve transparency, we connect model efficacy with representation diagnostics: visualization, sensitivity analysis, redundancy analysis, bidirectional probing, future-SOH probing, and temporal shuffling show that learned features overlap with selected expert descriptors while retaining additional SOH-relevant variation, and that ordered temporal context improves subsequent-SOH prediction. Across four public datasets, TC-SOH outperforms the considered physics-informed and data-driven baselines, reducing MAPE by 1.91 times and RMSE by 2.13 times.

2606.16580 2026-06-16 cs.LG cs.CV new submission

Multi-Modal Spatio-Temporal Graph Neural Network with Mixture of Experts for Soil Organic Carbon Prediction

基于专家混合的多模态时空图神经网络用于土壤有机碳预测

Daniele Mos, Felipe Drummond, Anton Bossenbroek, Soufiane el Khinifri

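Affiliations: Spatialise B.V.

AI summary: Proposes SpTGNN, a multi-modal spatio-temporal graph neural network that combines heterogeneous graph attention, feature extraction with a fine-tuned foundation model, and sparse Mixture-of-Experts fusion, with uncertainty quantified by heteroscedastic regression and deep ensembles; it improves over a tabular XGBoost baseline on the Africa test split of a global SOC corpus.

Comments: Paper is 27 pages, 14 figures, 12 tables

AI Chinese abstract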

表层土壤有机碳（SOC）预测是农业可持续性、土地利用政策和施肥规划的基础。现有方法面临两个限制：它们将手工制作的协变量与经典机器学习或单模态深度模型配对，忽略了丰富的光谱和时间信息，而基于网格的架构忽略了田间测量的不规则空间结构。我们提出了SpTGNN，一种多模态时空图神经网络来解决这两个问题。SpTGNN将土壤测量表示为具有三种边类型（空间邻近性、光谱相似性、高程）的异构图中的节点，并应用关系图注意力来学习每种关系的独立模式。一个微调的TerraMind编码器从Sentinel-2、Sentinel-1和DEM信号中提取节点特征，并结合每个样本的环境协变量以及学习到的位置和时间嵌入。一个稀疏专家混合模块通过top-$k$路由融合四个流。通过配对异方差回归（偶然不确定性）和深度集成（认知不确定性）来捕获不确定性，并使用Moran's $I$惩罚项正则化空间自相关。我们在一个全球SOC语料库上进行评估，该语料库分为三个区域实例（全球约49k样本，非洲约26k，欧洲约14k）。我们的5成员深度集成在非洲测试集上报告$R^2=0.762$，RMSE $=3.51\pm0.48$ g/kg和MAPE $=22.9\%$，优于表格XGBoost基线；最佳单个检查点达到验证$R^2=0.864$。消融实验证实异构图、MoE融合和微调主干各自贡献显著，集成不确定性量化栈实现后校准ECE为$0.031$（混合）和$0.026$（$\beta$-NLL）。据我们所知，这是第一个统一基础模型特征提取、异构图注意力和分解不确定性量化的SOC估计框架。

English abstract

Top-soil organic carbon (SOC) prediction is fundamental to agricultural sustainability, land use policy and fertilization planning. Existing approaches face two limitations: they pair hand-crafted covariates with classical ML or single-modal deep models that miss rich spectral and temporal information, and grid-based architectures ignore the irregular spatial structure of field measurements. We introduce SpTGNN, a multi-modal spatio-temporal graph neural network addressing both. SpTGNN represents soil measurements as nodes in a heterogeneous graph with three edge types (spatial proximity, spectral similarity, elevation), and applies relational graph attention to learn separate patterns per relation. A fine-tuned TerraMind encoder extracts node features from Sentinel-2, Sentinel-1 and DEM signals, combined with per-sample environmental covariates and learned positional and temporal embeddings. A sparse Mixture-of-Experts module fuses the four streams via top-$k$ routing. Uncertainty is captured by pairing heteroscedastic regression (aleatoric) with deep ensembles (epistemic), and a Moran's $I$ penalty regularizes spatial autocorrelation. We evaluate on a global SOC corpus split into three regional instances ($\sim$49k samples globally, Africa $\sim$26k, Europe $\sim$14k). Our 5-member deep ensemble reports $R^2=0.762$, RMSE $=3.51\pm0.48$ g/kg and MAPE $=22.9\%$ on the Africa test split, improving over a tabular XGBoost baseline; the best single checkpoint reaches validation $R^2=0.864$. Ablations confirm the heterogeneous graph, MoE fusion and fine-tuned backbone each contribute substantively, and the ensemble UQ stack achieves post-calibration ECE of $0.031$ (hybrid) and $0.026$ ($\beta$-NLL). To our knowledge, this is the first framework to unify foundation-model feature extraction, heterogeneous graph attention and decomposed uncertainty quantification for SOC estimation.
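
The abstract mentions a Moran's $I$ penalty but does not say how the statistic is turned into a loss term. As a minimal sketch of the statistic itself (the standard global Moran's $I$, not the paper's code), the following computes it for values on a small graph. The function name, the four-node chain, and the sample values are assumptions made for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij (w_ii = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of locations.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den


# Hypothetical layout: four points on a line, neighbours are adjacent points.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = [1.0, 2.0, 3.0, 4.0]       # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbours differ strongly

print(morans_i(smooth, chain))       # positive: spatially clustered
print(morans_i(alternating, chain))  # negative: spatially dispersed
```

Applied to model residuals, a value near zero would indicate that little spatial structure is left unexplained; whether the paper penalizes residuals or predictions is not stated in the abstract.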

2606.16663 2026-06-16 cs.LG new submission

Beyond Defensive Reporting: Machine Learning for Active Anti-Money Laundering Control in Insurance

超越防御性报告：机器学习在保险主动反洗钱控制中的应用

Dara Goldar, Geir Kjetil Ferkingstad Sandve, Martin Jullum

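Affiliations: Fremtind Insurance; University of Oslo; Norwegian Computing Center

AI summary: Using production data from a major Norwegian insurer, trains gradient-boosted decision tree models to detect claims later reported for suspected money laundering, with insurance fraud labels as an auxiliary training signal; evaluated with the Budget-Weighted Capture Rate, the best model captures nearly two-thirds of laundering cases within the top-ranked 2 to 6 percent of claims.

AI Chinese abstract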

通过保险索赔进行洗钱对保险公司构成威胁，既包括欺诈性赔付，也包括声誉和监管风险。尽管如此，很少有研究探讨如何预防此类洗钱行为。本文考察了机器学习是否可以帮助保险公司在赔付前标记可疑索赔，将重点从被动报告转向主动预防。使用一家挪威主要保险公司的生产数据，我们训练梯度提升决策树模型来检测后来被报告给当局涉嫌洗钱的索赔。由于欺诈和洗钱可能共享行为模式，我们还考察了保险欺诈标签是否可以作为辅助训练信号。我们使用预算加权捕获率（本文引入的指标）比较了不同的学习设置，该指标衡量在只能手动审查一小部分索赔时捕获了多少洗钱案例。结果表明，纳入与欺诈相关的调查标签显著改善了洗钱检测。表现最佳的模型在排名前2%至6%的选定调查索赔中捕获了近三分之二的洗钱案例。据我们所知，这是首个关于机器学习在保险索赔中检测洗钱的实证研究。

English abstract

Money laundering through insurance claims poses a threat to insurers both through fraudulent payouts and reputational and regulatory risk. Despite this, little research has examined how such laundering can be prevented. This paper examines whether machine learning can help insurers flag suspicious claims before payout, shifting the focus from passive reporting to active prevention. Using production data from a major Norwegian insurer, we train gradient-boosted decision tree models to detect claims later reported to authorities for suspected money laundering. Because fraud and laundering may share behavioural patterns, we also examine whether insurance fraud labels can serve as an auxiliary training signal. We compare different learning setups using the Budget-Weighted Capture Rate, a metric introduced in this paper to measure how many laundering cases are captured when only a small share of claims can be manually reviewed. The results show that incorporating fraud-related investigation labels substantially improves laundering detection. The best-performing model captures nearly two-thirds of laundering cases within the top-ranked 2 to 6 percent of claims selected for investigation. To our knowledge, this is the first empirical study of machine learning for money laundering detection in insurance claims.
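
The abstract does not define the Budget-Weighted Capture Rate, so the sketch below is only an assumed stand-in: `capture_rate` measures the share of laundering cases found when the top-scored fraction of claims is reviewed, and `weighted_capture_rate` averages that over several review budgets with weights chosen by the caller. Both function names, the weighting scheme, and the toy data are hypothetical.

```python
def capture_rate(scores, labels, budget):
    """Share of all positive cases that fall in the top `budget` fraction of claims by score."""
    n_review = max(1, round(budget * len(scores)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    found = sum(label for _, label in ranked[:n_review])
    total = sum(labels)
    return found / total if total else 0.0


def weighted_capture_rate(scores, labels, budgets, weights):
    """Weighted mean of capture rates over several budgets (assumed aggregation)."""
    rates = [capture_rate(scores, labels, b) for b in budgets]
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


# Toy data: 50 claims in descending score order; positives at ranks 0, 2 and 10.
scores = [1.0 - i / 50 for i in range(50)]
labels = [1 if i in (0, 2, 10) else 0 for i in range(50)]

for budget in (0.02, 0.04, 0.06):
    print(budget, capture_rate(scores, labels, budget))
print(weighted_capture_rate(scores, labels, [0.02, 0.04, 0.06], [1, 1, 1]))
```

With a 2% budget only one of the 50 toy claims is reviewed, so one of three positives is caught; at 6% the three top claims are reviewed and two of three are caught.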

2606.16961 2026-06-16 cs.LG q-fin.CP new submission

Beyond the Smile: A Hybrid Convolutional VAE for Crypto Volatility Surfaces

超越微笑：用于加密货币波动率曲面的混合卷积VAE

Sadanand Singh, Allam Reddy, Manan Chopra

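Affiliations: Jasper Research, USA

AI summary: Proposes a hybrid predictor that combines a convolutional VAE with a quadratic smile re-fit; on BTC and ETH options data it reaches 0.83 vol points of RMSE at 50% masking versus 7.00 for the smile re-fit alone, and it is free of calendar and butterfly arbitrage at the listed strikes.

AI Chinese abstract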

我们提出了一种用于加密货币隐含波动率曲面的卷积变分自编码器，以及一个可部署的预测器，该预测器通过确定性每期限路由规则将其与二次微笑重拟合相结合。该模型在2023年5月至10月期间6034个完全填充的每小时Binance期权曲面（BTC和ETH）上训练，并在共同的$6 \times 7$期限-Delta网格上参数化，在两个市场和10-50%的掩码率下，隐藏单元曲面补全RMSE达到0.94-1.56波动率点范围。混合预测器在50%掩码率下达到0.83波动率点，而单独的微笑重拟合为7.00，在无额外推理成本下实现了八倍的降低。在模拟整个期限行权价撤销的结构相关空洞模式下，微笑重拟合产生9.6-13.1波动率点的误差，而学习模型保持在1.5-1.9，隔离了生成模型是唯一可行预测器的场景。在BTC和ETH上的联合训练相对于表现更优的单标的模型，在两个市场上将分布内模型提升了9-27%，表明在观测窗口内两种最大加密货币之间存在显著共享的波动率曲面流形。混合模型在上市行权价上无日历和蝶式套利，而单独的参数化微笑重拟合在高掩码率下无法保持这一性质。训练模型的每快照重构误差在无监督情况下标记了10月底ETF预期反弹和2023年8月17日闪崩为高误差时期。所有训练和评估基础设施均已发布以支持可重复的后续工作。

English abstract

We present a convolutional variational autoencoder for cryptocurrency implied-volatility surfaces, together with a deployable predictor that combines it with a quadratic smile re-fit through a deterministic per-tenor routing rule. Trained on 6,034 fully-filled hourly Binance Options surfaces of BTC and ETH spanning May-October 2023 and parameterised on a common $6 \times 7$ tenor-delta grid, the model attains a hidden-cell surface-completion RMSE in the 0.94-1.56 vol-point range across both markets and mask rates 10-50%. The hybrid predictor attains 0.83 vol points at 50% masking against 7.00 for the smile re-fit alone, an eightfold reduction obtained at no additional inference cost. Under structurally-correlated hole patterns that emulate the withdrawal of an entire tenor of strikes, the smile re-fit incurs 9.6-13.1 vol points of error while the learned model remains at 1.5-1.9, isolating a regime in which the generative model is the only viable predictor. Joint training on BTC and ETH improves the in-distribution model on both markets by 9-27% relative to the better-performing single-symbol counterpart, indicating a substantially shared vol-surface manifold across the two largest cryptocurrencies over the observation window. The hybrid is calendar- and butterfly-arbitrage-free at the listed strikes, a property that the parametric smile re-fit alone fails at high mask rates. The per-snapshot reconstruction error of the trained model flags the late-October ETF-anticipation rally and the August 17, 2023 flash crash as elevated-error periods without supervision. All training and evaluation infrastructure is released to support reproducible follow-on work.
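
To make the "quadratic smile re-fit" baseline concrete, here is a minimal sketch under stated assumptions: for one tenor, a quadratic in delta is fitted by least squares to the observed cells and used to fill the masked ones. The paper's exact parameterisation and routing rule are not given in the abstract; the function names, the delta grid, and the synthetic smile are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2; returns (a, b, c)."""
    # Normal equations (X^T X) coef = X^T y for design rows [1, x, x^2].
    s = [sum(x ** k for x in xs) for k in range(5)]               # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]  # augmented 3x4 matrix
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def refit_smile(deltas, vols):
    """Fill masked cells (None) of one tenor from a quadratic fitted to the observed cells."""
    seen = [(d, v) for d, v in zip(deltas, vols) if v is not None]
    if len(seen) < 3:
        raise ValueError("need at least three observed cells for a quadratic")
    a, b, c = fit_quadratic([d for d, _ in seen], [v for _, v in seen])
    return [v if v is not None else a + b * d + c * d * d for d, v in zip(deltas, vols)]


# Synthetic, exactly quadratic smile on a 7-point delta grid (illustrative values only).
deltas = [0.10, 0.25, 0.40, 0.50, 0.60, 0.75, 0.90]
true_vols = [50.0 + 40.0 * (d - 0.5) ** 2 for d in deltas]
masked = [v if i not in (1, 5) else None for i, v in enumerate(true_vols)]
filled = refit_smile(deltas, masked)
print(filled)
```

The `ValueError` branch shows why a purely per-tenor re-fit struggles when an entire tenor is withdrawn: with fewer than three observed cells there is nothing to fit, which is the regime where the abstract reports the learned model as the only viable predictor.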

2606.17010 2026-06-16 cs.LG new submission

基于深度学习的月球陨石坑地形相对导航

Batu Candan, Simone Servadio

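Affiliations: NASA; University of Texas at Austin

AI summary: Proposes a terrain relative navigation algorithm that combines a deep-learning crater detector with an Extended Kalman Filter; in simulation it reduces navigation error to a few hundred meters even from an initial position offset of up to 5 km.

AI Chinese abstract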

准确的位置估计对于未来使用自主飞行器实现月球着陆至关重要，尤其是在地形特征稀疏的危险环境中。本文提出了一种地形相对导航（TRN）算法，该算法结合了我们专门为NASA陨石坑检测挑战问题设计的深度学习陨石坑检测器和扩展卡尔曼滤波（EKF）。我们的检测器分析从轨道获取的单目图像中的陨石坑特征，并通过匈牙利分配方法及基于共识的离群点去除方法，识别它们与全球数据库中陨石坑的匹配。然后，估计的测量值用于优化EKF，其中航天器在月心月固（LCLF）参考系中的姿态估计，结合高度辅助信息，约束径向漂移。仿真结果表明，即使航天器偏离实际位置达5公里，TRN也能从这种情况中恢复，将导航误差降低到几百米。需要注意的是，为了保持陨石坑特征的对应关系，必须将图像分辨率和场景中的尺度与检测器训练集分布相匹配。

English abstract

Accurate position estimation is crucial for the successful implementation of future lunar landings using autonomous vehicles, especially in dangerous environments with sparse terrain features. In this paper, we propose a terrain relative navigation (TRN) algorithm combining our deep-learning crater detector, which was designed specifically for the NASA Crater Detection Challenge problem, and an Extended Kalman Filter (EKF). Our detector analyzes crater features from monocular images acquired from orbit, and their matches with craters from a global database are identified via a Hungarian assignment approach followed by a consensus-based outlier-removal method. The estimated measurements are then used to refine an EKF, where spacecraft pose estimation in the Lunar-Centered Lunar-Fixed (LCLF) frame of reference, augmented with altitude aiding information, constrains radial drift. The simulation results indicate that even if the spacecraft is off from its actual location by up to 5 km, TRN can recover, reducing the navigation error to a few hundred meters. To maintain crater feature correspondences, it is important to match the image resolution and the scales within the scene to the detector training set distribution.
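
The matching step can be illustrated with a small sketch. It finds the optimal one-to-one assignment of detections to catalogue craters by brute force, which gives the same optimum the Hungarian algorithm would return but is only practical for a handful of craters. The distance gate is a simple stand-in for the paper's consensus-based outlier removal, whose details the abstract does not give. All names and coordinates are hypothetical.

```python
import math
from itertools import permutations


def match_craters(detected, catalog, gate):
    """Optimal one-to-one matching of detections to catalogue craters by total distance.

    Brute-forces every assignment (fine for a few craters only); pairs farther
    apart than `gate` are dropped afterwards as outliers.
    """
    assert len(detected) <= len(catalog)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    best = min(
        permutations(range(len(catalog)), len(detected)),
        key=lambda perm: sum(dist(d, catalog[j]) for d, j in zip(detected, perm)),
    )
    return [(i, j) for i, j in enumerate(best) if dist(detected[i], catalog[j]) <= gate]


# Hypothetical crater centres in a local planar frame (km).
catalog = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
detected = [(9.5, 0.4), (0.3, 9.8), (5.0, 5.0)]  # the last one is a spurious detection

print(match_craters(detected, catalog, gate=2.0))
```

Each returned pair `(i, j)` links detection `i` to catalogue crater `j`; in a filter such pairs would supply the position measurements for the EKF update.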

2606.14788 2026-06-16 cs.SD cs.AI cs.LG eess.AS cross-list

Unifying Acoustic Features and Text with Multimodal LLMs for Neurodegenerative Screening

统一声学特征与文本的多模态大语言模型用于神经退行性疾病筛查

Qingfeng Zhang, Yuanxiong Guo, Yanmin Gong

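Affiliations: University of Science and Technology of China

AI summary: Proposes NeurMLLM, a multimodal LLM framework that fuses spectrograms, MFCCs, and transcript text for fine-grained staging of Alzheimer's disease and Parkinson's disease, outperforming classical machine learning methods and existing LLM-based approaches.

Comments: IEEE International Conference on Healthcare Informatics, 2026

AI Chinese abstract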

基于语音的筛查为评估阿尔茨海默病（AD）和帕金森病（PD）等神经退行性疾病提供了一种可扩展且非侵入性的方式，但由于整合异质数据的困难，其分期仍然具有挑战性。本文提出了NeurMLLM，一种用于神经退行性疾病分期的高效多模态生成框架。NeurMLLM首先使用视觉变换器对音频数据的声谱图和梅尔频率倒谱系数进行编码，并将其表示投影到大语言模型（LLM）的嵌入空间中，在那里它们与转录文本和人口统计指令标记连接成一个统一的序列。然后，通过低秩适应使用任务提示对LLM进行指令微调，以自回归方式预测受限的标签标记，从而实现生成式分类。通过在Bridge2AI-Voice数据集上对AD和PD进行细粒度分期评估，我们观察到NeurMLLM取得了强劲的性能，持续优于经典机器学习方法和现有的基于LLM的方法。结果表明，多模态LLM在神经退行性疾病分期中具有巨大潜力，提高了分期准确性并支持可访问的部署。

English abstract

Voice-based screening offers a scalable and non-invasive way to assess neurodegenerative diseases such as Alzheimer's disease (AD) and Parkinson's disease (PD), but their staging remains challenging due to the difficulty of integrating heterogeneous data. This paper presents NeurMLLM, an efficient multimodal generative framework for neurodegenerative disease staging. NeurMLLM first encodes the spectrograms and Mel-frequency cepstral coefficients of audio data with vision transformers and projects their representations into the embedding space of a large language model (LLM), where they are concatenated with transcript and demographic instruction tokens as a single unified sequence. The LLM is then instruction-tuned via Low-Rank Adaptation using task prompts to autoregressively predict a constrained label token, enabling a generative classification. By evaluating on the Bridge2AI-Voice dataset for fine-grained staging of AD and PD, we observe that NeurMLLM achieves strong performance, consistently outperforming classical machine learning methods and existing LLM-based approaches. The results show the high potential of multimodal LLMs in neurodegenerative disease staging, improving staging accuracy and supporting accessible deployment.

2606.14874 2026-06-16 physics.data-an cs.LG nucl-ex cross-list

Peak-Based Nuclide Identification in HPGe $\gamma$-Spectrometry with Machine Learning and SHAP

基于峰值的HPGe γ能谱机器学习与SHAP核素识别

Samuel Emmons, Kelly Truax, Maurice Lonsway, Bruce Pierson, Brian Archambault

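Affiliations: University of California, Berkeley; Lawrence Berkeley National Laboratory

AI summary: Proposes machine learning models that map analyst-fitted photopeaks to nuclide identification results; on experimental spectra containing combinations drawn from 65 isotopes, the best model reaches an F1 score of 0.97 versus 0.84 for traditional software, and SHAP explanations show that the models rely on physically relevant photopeaks.

Comments: 25 pages, 11 figures (plus an additional 6 figures in the appendix), and 3 tables. To be published in Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment

AI Chinese abstract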

高纯锗伽马能谱通常需要领域专家进行耗时分析。谱中的光电峰被仔细拟合，并采用数值方法辅助核素识别（NID）和定量。修改分析软件识别的核素列表可能很复杂。因此，当需要分析大量样品时，及时做出正确决策具有挑战性。基于监督机器学习的NID可以作为专家知识驱动的自动化工具，改进向分析人员建议的初始放射性核素集合，并更有效地推动后续定量。为此，我们实现了机器学习模型，将分析人员仔细拟合的光电峰映射到NID结果，用于包含从65种同位素集合中抽取的各种同位素组合的实验谱。最佳模型达到了0.97的F1分数，显著超过了使用包含模型评估的相同65种同位素的核素库进行比较时传统软件达到的0.84的F1分数。最后，我们使用Shapley加法解释说明了模型预测的最重要输入特征。这些解释表明，模型在对核素库中的同位素进行预测时使用了物理相关的光电峰。

English abstract

High-purity germanium gamma spectra often require time-consuming analyses from subject matter experts. Photopeaks within these spectra are carefully fitted and numerical methods are employed to assist with nuclide identification (NID) and quantification. Amending the list of nuclides identified by analysis software can be nontrivial. When many samples need to be analyzed, it is therefore challenging to make timely and correct decisions. Supervised machine-learning-based NID can serve as an expert-informed, automated tool to improve the initial set of radionuclides suggested to an analyst and more effectively drive subsequent quantification. To that end, we implemented machine learning models that map photopeaks carefully fitted by analysts to NID results for experimental spectra containing various isotopic combinations drawn from a set of 65 isotopes. The best model achieved an F1 score of 0.97, markedly surpassing the F1 score of 0.84 achieved by traditional software when compared using a nuclide library comprising the same 65 isotopes assessed by the models. Finally, we illustrated the most important input features for model predictions using Shapley Additive Explanations. These explanations revealed that the models use physically relevant photopeaks when making predictions for the isotopes in our nuclide library.

2606.15023 2026-06-16 physics.flu-dyn cs.LG cross-list

Multiscale Hypersonic Boundary Layer Reconstruction via Spectral Binning and Subdomain-wise Conditional Diffusion

基于频谱分箱和子域条件扩散的高超声速边界层多尺度重构

Hojin Kim, Dibyajyoti Chakraborty, Takahiko Toki, Carlo Scalo, Romit Maulik

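Affiliations: School of Mechanical Engineering, Purdue University; College of Information Sciences and Technology, Pennsylvania State University; Mathematics and Computer Science Division, Argonne National Laboratory

AI summary: Proposes a multiscale probabilistic reconstruction framework in which a conditional diffusion model infers near-wall states from limited top-wall observations; a soft overlap inpainting strategy and a bounded binned spectral power (BSP) loss enable full-volume reconstruction of hypersonic Couette flow.

Comments: 33 pages, 28 figures

AI Chinese abstract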

我们提出了一个用于高超声速库埃特流的多尺度概率重构框架，其中通过条件扩散模型从有限的顶部壁面观测推断近壁状态。边界层被划分为重叠的壁法向子域，并联合训练一个高度和马赫数条件的阐明扩散模型（EDM），用于M=6,7,8，以采样以顶部壁面边界切片为条件的速度、密度、压力和温度场。一种软重叠修复策略将子域预测组装成全体积重构，同时保持子域间的连续性和小尺度变异性。为了提高生成场的频谱保真度，我们引入了一种新颖的有界分箱频谱功率（BSP）损失，该损失保留高波数内容，同时在扩散噪声调度中保持数值稳定。与直接数值模拟数据的验证表明，该模型在所有训练马赫数下恢复了瞬时结构、频谱、统计剖面、相关性和壁面量，同时提供了空间结构化的不确定性估计。重构的马赫数条件剖面也在Trettel-Larsson变换下坍缩，表明与可压缩性缩放的一致性。这些结果确立了具有有界分箱频谱损失的域分解条件扩散模型作为高超声速壁面湍流中近壁重构的有效概率代理。

English abstract

We propose a multiscale probabilistic reconstruction framework for hypersonic Couette flow, where near-wall states are inferred from limited top-wall observations using a conditional diffusion model. The boundary layer is divided into overlapping wall-normal subdomains, and a single height- and Mach-conditioned Elucidating Diffusion Model (EDM) is trained jointly for M=6,7,8 to sample velocity, density, pressure, and temperature fields conditioned on a top-wall boundary slice. A soft overlap inpainting strategy assembles subdomain predictions into full-volume reconstructions while maintaining inter-subdomain continuity and small-scale variability. To improve the spectral fidelity of the generated fields, we introduce a novel bounded binned spectral power (BSP) loss that preserves high-wavenumber content while remaining numerically stable across the diffusion noise schedule. Validation against direct numerical simulation data shows that the model recovers instantaneous structures, spectra, statistical profiles, correlations, and wall quantities across all training Mach numbers, while providing spatially structured uncertainty estimates. The reconstructed Mach-conditioned profiles also collapse under the Trettel-Larsson transformation, indicating consistency with compressibility scaling. These results establish the domain-decomposed conditional diffusion model with a bounded binned spectral loss as an effective probabilistic surrogate for near-wall reconstruction in hypersonic wall-bounded turbulence.

2606.15213 2026-06-16 quant-ph cs.LG cross-list

Quantum-classical hybrid models based on error correction for time series forecasting

基于纠错机制的量子-经典混合模型用于时间序列预测

Jonathan H. A. de Carvalho, Filipe C. de L. Duarte, Fernando M. de Paula Neto, Paulo S. G. de Mattos Neto

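AI summary: Proposes the first error-correction-based forecasting system that jointly uses quantum and classical models: quantum models extract patterns first, and classical models then capture the remaining patterns from the quantum errors, giving the best results on most of the problems addressed.

Comments: Submitted to Nature Computational Science. 24 pages, 10 figures

AI Chinese abstract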

时间序列预测很大程度上受益于结合不同模型的优势，特别是使用一种方案，其中一个模型通过从预测误差中捕获补充模式来纠正另一个模型。同时，量子模型通过在混合架构中与经典模型一起作用，为增强经典能力提供了手段，包括在时间序列预测中。在这项工作中，我们提出了第一个基于纠错的预测系统，该系统联合使用量子模型和经典模型。在这里，量子模型首先通过探索量子现象提取模式，然后经典模型从量子误差中捕获剩余模式。与经典单一模型和基于纠错的经典-经典混合模型相比，这种量子-经典系统产生的互补能力在大多数处理的问题中提供了最佳结果。因此，这项工作为在时间序列预测的既定混合方案中引入量子模型铺平了道路。

English abstract

Time series forecasting largely benefits from combining the strengths of different models, especially using a scheme where a model corrects another model by capturing supplementary patterns from forecasting errors. Concurrently, quantum models are providing a means to augment the classical capacity, including in time series forecasting, by acting alongside classical models in hybrid architectures. In this work, we propose the first forecasting system based on error correction that jointly uses quantum and classical models. Here, quantum models first extract patterns by exploring quantum phenomena, and classical models capture the remaining patterns from the quantum errors. Compared to classical single models and classical-classical hybrid models based on error correction, the complementary capacity that emerges from this quantum-classical system provided the best results in most of the addressed problems. Therefore, this work paves the way to introduce quantum models in established hybridization schemes for time series forecasting.
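
The error-correction scheme described above (one model forecasts, a second model learns the first model's errors, and the two are summed) can be sketched without any quantum component. In the illustration below both models are simple classical stand-ins: a least-squares line plays the role of the first (in the paper, quantum) model, and per-phase residual means play the role of the corrector. All names and the toy series are assumptions.

```python
def fit_line(ts, ys):
    """Least-squares line y = slope * t + intercept (stand-in for the first model)."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    slope = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum((t - mt) ** 2 for t in ts)
    return slope, my - slope * mt


def fit_phase_means(ts, residuals, period):
    """Mean residual per phase of a cycle (stand-in for the error-correcting model)."""
    buckets = [[] for _ in range(period)]
    for t, r in zip(ts, residuals):
        buckets[t % period].append(r)
    return [sum(b) / len(b) for b in buckets]


# Toy series: linear trend plus an alternating +1 / -1 pattern.
ts = list(range(8))
ys = [2 * t + (1 if t % 2 == 0 else -1) for t in ts]

slope, intercept = fit_line(ts, ys)
residuals = [y - (slope * t + intercept) for t, y in zip(ts, ys)]
phase = fit_phase_means(ts, residuals, period=2)


def forecast(t):
    """Return (first-model forecast, error-corrected forecast) for time t."""
    base = slope * t + intercept
    return base, base + phase[t % 2]


base, corrected = forecast(8)
truth = 2 * 8 + 1
print(abs(base - truth), abs(corrected - truth))
```

On this toy series the corrected forecast for t = 8 is closer to the true value than the first model alone, because the alternating pattern the line cannot represent is recovered from its residuals.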

2606.15234 2026-06-16 eess.SP cs.CE cs.LG cross-list

Surrogate-Assisted Framework for SI-Compliant Interconnect Design Optimization Using the Earth Mover's Distance

基于推土机距离的SI合规互连设计优化代理辅助框架

Emre Ecik, Werner John, Julian Withöft, Ralf Brüning, Jürgen Götze

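Affiliations: Information Processing Lab, TU Dortmund University; Pyramide2525/TU Dortmund University; EMC Technology Center Paderborn, Zuken GmbH

AI summary: Proposes a deterministic, machine-assisted framework based on the Earth Mover's Distance (EMD): neural surrogate models predict waveform features, a decision tree screens for SI-compliant designs, and EMD ranks the remaining candidates, giving an explainable and computationally efficient approach to PCB interconnect optimization.

Comments: 16 pages, 15 figures. This manuscript has been submitted to Advances in Radio Science for review (2026)

AI Chinese abstract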

本文提出一种基于推土机距离（EMD）的确定性机器学习辅助框架，用于SI合规的PCB设计。与依赖迭代黑盒搜索过程的传统代理优化方法不同，本方法采用可解释的顺序评估策略。首先使用神经代理模型根据拓扑相关设计参数高效预测波形描述特征。然后，决策树作为物理驱动的质量门，根据预定义的SI标准识别SI合规波形。在得到的有效解空间中，采用推土机距离作为相似性度量，根据候选设计与理想参考信号的接近程度对其进行排序。这不仅能够确定性地识别可接受的参数区域，而且无需逆建模或随机搜索过程即可透明地优先选择物理上更优的解。通过大规模仿真DDR3飞越波形数据集验证了该方法。通过结合代理预测、可解释分类和基于EMD的波形评估，该框架为基于AI方法的PCB开发提供了可解释且计算高效的替代传统优化策略的方案。

English abstract

This work presents a deterministic, machine-assisted framework for SI-compliant PCB design based on the Earth Mover's Distance (EMD). In contrast to conventional surrogate-based optimization methods that rely on iterative black-box search procedures, the proposed approach follows an interpretable, sequential evaluation strategy. Neural surrogate models are first used to efficiently predict waveform describing features from topology-dependent design parameters. A decision tree then acts as a physically motivated quality gate that identifies SI-compliant waveforms according to predefined SI criteria. Within the resulting valid solution space, the Earth Mover's Distance is employed as a similarity metric to rank candidate designs according to their proximity to an ideal reference signal. This enables not only the deterministic identification of admissible parameter regions but also a transparent prioritization of physically superior solutions without inverse modeling or stochastic search procedures. The methodology is demonstrated using a large-scale set of simulated DDR3 fly-by waveforms. By combining surrogate prediction, interpretable classification, and EMD-based waveform evaluation, the framework provides an explainable and computationally efficient alternative to conventional optimization strategies for supporting PCB development with AI-based methods.
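
The ranking step can be sketched as follows. In one dimension, the Earth Mover's Distance between two unit-mass histograms on the same grid equals the sum of absolute differences of their cumulative sums. How the paper maps waveforms to distributions is not stated in the abstract; treating non-negative samples as mass is an assumption here, as are the function name and the toy candidates.

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two histograms on the same unit-spaced grid.

    Both inputs are normalised to unit mass; the 1-D EMD is then the sum of
    absolute differences between the two running (cumulative) sums.
    """
    sp, sq = sum(p), sum(q)
    cum, total = 0.0, 0.0
    for a, b in zip(p, q):
        cum += a / sp - b / sq
        total += abs(cum)
    return total


# Hypothetical ideal reference and candidate designs that already passed the SI gate.
reference = [0, 0, 1, 1, 1, 0, 0]
candidates = {
    "shift_by_2": [1, 1, 1, 0, 0, 0, 0],
    "shift_by_1": [0, 1, 1, 1, 0, 0, 0],
    "identical": [0, 0, 2, 2, 2, 0, 0],  # same shape, different scale
}

ranking = sorted(candidates, key=lambda name: emd_1d(candidates[name], reference))
print(ranking)
```

Because the distance grows with how far mass has to move, a candidate shifted by two samples ranks below one shifted by a single sample, which is the kind of graded ordering a pass/fail compliance check alone cannot provide.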

2606.15251 2026-06-16 cs.RO cs.AI cs.LG cross-list

Driving, Fast or Slow? Neuro-Symbolic Guidance for Motion Prediction in Multi-Modal Ground Mobility

驾驶，快或慢？多模态地面移动中运动预测的神经符号引导

Simon Kohaut, Felix Divo, Julius Hahnewald, Benedict Flade, Julian Eggert, Kristian Kersting, Devendra Singh Dhami

Affiliations: Artificial Intelligence and Machine Learning Lab, TU Darmstadt; Honda Research Institute; Hessian Center for AI (hessian.AI); Centre for Cognitive Science; German Center for AI (DFKI); Uncertainty in Artificial Intelligence Lab, TU Eindhoven

AI summary: Proposes TraCS, a neuro-symbolic framework that encodes traffic regulations as probabilistic first-order logic to add interpretable compliance reasoning to black-box motion prediction backbones; it consistently improves state-of-the-art backbones on the Argoverse 2 benchmark.

AI Chinese abstract

准确且可解释的异构交通空间（包括行人、自行车、汽车和卡车）运动预测对于安全的自主导航至关重要。然而，最先进的方法仍然是黑盒，缺乏对现实世界移动的监管和行为约束的显式编码。我们提出Trajectory Compliance-Shaping (TraCS)，一种神经符号框架，通过可解释的概率一阶逻辑增强现有的黑盒运动预测骨干网络。为此，TraCS采用智能体代码生成流水线，弥合交通规则的自然语言描述与概率运动预测之间的差距。此外，TraCS采用反应式数据流推理引擎，随着场景演变维护并高效更新合规性景观。为防止TraCS过度自信地将骨干网络的预测引导到错误方向，我们提出一种神经置信度评分，作为上下文感知的合规性信号衰减。我们在Argoverse 2基准上展示了TraCS如何持续改进最先进的预测骨干网络，表明概率和符号合规性推理是纯神经运动预测的广泛适用且计算高效的补充。

English abstract

Accurate and interpretable motion prediction for heterogeneous traffic spaces, including pedestrians, bicycles, cars, and trucks, is essential for safe autonomous navigation. Nevertheless, state-of-the-art approaches remain predominantly black-box, lacking explicit encoding of the regulatory and behavioral constraints of real-world mobility. We propose Trajectory Compliance-Shaping (TraCS), a neuro-symbolic framework that augments existing black-box motion prediction backbones with interpretable and probabilistic first-order logic. To do so, TraCS employs an agentic code-generation pipeline to bridge the gap between natural-language descriptions of traffic regulations and probabilistic motion prediction. Furthermore, TraCS employs a reactive data-streaming inference engine that maintains and efficiently updates compliance landscapes as scenes evolve. To prevent TraCS from overconfidently steering the backbone's predictions in the wrong direction, we propose a neural confidence rating learned as a context-aware attenuation of the compliance signal. We demonstrate on the Argoverse 2 benchmark how TraCS consistently improves state-of-the-art prediction backbones, showing that probabilistic and symbolic compliance reasoning is a broadly applicable and computationally efficient complement to purely neural motion predictors.

2606.15356 2026-06-16 physics.flu-dyn cs.LG cross-list

ShipNet: A Geometric Deep Learning Surrogate for Real-Time Ship Hydrodynamics

ShipNet：一种用于实时船舶水动力学的几何深度学习代理模型

Kirsten Odendaal, George Drakoulas

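Affiliations: Maritime Research Institute, Wageningen, Netherlands; Damen Research, Gorinchem, Netherlands

AI summary: Proposes ShipNet, a geometric deep-learning surrogate that predicts hull-surface pressure distributions and free-surface wave fields directly from hull geometry and speed; on a geometry-held-out test set it reaches R^2 = 0.98 for hull pressure and 0.91 for wave fields, with inference more than 550x faster than the potential-flow solver.

AI Chinese abstract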

准确预测水动力性能是船舶设计的核心，然而高保真计算流体动力学在大规模参数探索中仍然过于昂贵。这促使开发数据驱动的代理模型，以显著降低的成本提供对水动力预测的快速近似。我们提出ShipNet，一种几何深度学习代理模型，直接从船体几何和速度预测船体表面压力分布和远场自由表面波模式。该网络在船体点云上采用正则化动态图卷积主干，并使用多头解码器同时输出近体压力和自由表面高程。训练数据包括使用势流面板法对两种母型游艇船体生成的420次无粘自由表面模拟，每种船体参数化为70种变体并在三种速度下评估。ShipNet使用结合逐点回归和图像结构项的复合损失预测每点压力系数和二维波浪高程图。在几何保留测试集上，ShipNet对船体压力达到R²=0.98，对波浪场达到R²=0.91。每个案例推理约需0.15秒，在传统硬件上相比势流求解器实现超过550倍的加速。局限性包括受限的几何和速度范围以及无粘训练数据，未来工作将通过物理信息正则化将模型扩展到高保真粘性模拟。

English abstract

Accurate prediction of hydrodynamic performance is central to ship design, yet high-fidelity computational fluid dynamics remains prohibitively expensive for large-scale parametric exploration. This motivates the development of data-driven surrogate models that provide rapid approximations to hydrodynamic predictions at substantially reduced cost. We present ShipNet, a geometric deep-learning surrogate that predicts both hull-surface pressure distributions and far-field free-surface wave patterns directly from hull geometry and speed. The network employs a regularized dynamic graph convolutional backbone on hull point clouds, with a multi-head decoder for simultaneous near-body pressure and free-surface elevation outputs. Training data consist of 420 inviscid free-surface simulations generated using a potential-flow panel method for two parent yacht hulls, each parameterized into 70 variants and evaluated at three speeds. ShipNet predicts per-point pressure coefficient and two-dimensional wave elevation map using a composite loss that combines point-wise regression and image-structure terms. On a geometry-held-out test set, ShipNet achieves R^2=0.98 for hull pressure and R^2=0.91 for wave fields. Inference requires approximately 0.15s per case, yielding over a 550x speedup relative to the potential-flow solver on conventional hardware. Limitations include the restricted geometry and speed ranges and the inviscid training data, while future work will extend the model to high-fidelity viscous simulations with physics-informed regularization.

2606.15370 2026-06-16 cs.CV cs.LG cross-list

MNet++: Extended 2D/3D Networks for Anisotropic Medical Image Segmentation

MNet++: 用于各向异性医学图像分割的扩展2D/3D网络

Kirsten Odendaal, Rade Bajic

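Affiliations: School of Computing, Georgia Institute of Technology

AI summary: Reproduces and extends MNet, a hybrid 2D/3D convolutional network, by adding an adaptive fusion gating mechanism and a VMamba state-space module, improving segmentation performance while preserving robustness to anisotropy.

AI Chinese abstract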

本工作展示了MNet的完整复现与扩展，MNet是一种专为各向异性医学图像分割设计的混合2D/3D卷积网络。在nnU-Net框架内重新实现了原始架构，以验证其报告的性能和对可变体素间距（即各向异性）的鲁棒性。在匹配的预处理和计算约束下，在PROMISE前列腺MRI和LiTS肝脏CT的受控子集上进行了实验。复现的MNet在PROMISE上达到了89.0 +/- 0.9%的Dice相似系数（DSC），与已发表结果相差0.8%，在LiTS上肝脏和肿瘤分割分别达到94.3 +/- 1.9%和54.6 +/- 3.1%。进一步引入了两种轻量级扩展：(1) 一种学习的融合门控机制，实现自适应2D-3D特征融合；(2) 一个VMamba状态空间模块，用于高效的长程深度建模。空间门控变体以不到3%的推理开销将DSC提高了+0.8%，而VMamba提高了性能一致性，将PROMISE Dice变异降低至+/- 0.7%，并在LiTS肝脏上达到最强性能，Dice为95.8%。两种扩展均保持了MNet对各向异性的鲁棒性，在1-4 mm体素间距下Dice变化为1.5%。总体而言，该研究证实了MNet的可复现性，并表明自适应融合和状态空间建模有潜力进一步增强各向异性条件下的分割可靠性。然而，需要进一步测试才能得出明确结论。

English abstract

This work demonstrates a full reproduction and extension of MNet, a hybrid 2D/3D convolutional network designed for anisotropic medical image segmentation. The original architecture was re-implemented within the nnU-Net framework to verify its reported performance and robustness to variable voxel spacing, known as anisotropy. Experiments were conducted on PROMISE prostate MRI and a controlled subset of LiTS liver CT under matched preprocessing and compute constraints. The reproduced MNet achieved a Dice similarity coefficient (DSC) of 89.0 +/- 0.9% on PROMISE, within 0.8% of the published result, and 94.3 +/- 1.9% / 54.6 +/- 3.1% for liver and tumor segmentation on LiTS, respectively. Two lightweight extensions were further introduced: (1) a learned Fusion Gating mechanism enabling adaptive 2D-3D feature blending, and (2) a VMamba state-space module for efficient long-range depth modelling. The Spatial Gating variant improved DSC by +0.8% with less than 3% inference overhead, while VMamba improved performance consistency, reducing PROMISE Dice variation to +/- 0.7% and achieving the strongest LiTS liver performance at 95.8% Dice. Both extensions preserved MNet robustness to anisotropy, with delta Dice = 1.5% across 1-4 mm voxel spacing. Overall, the study confirms MNet reproducibility and demonstrates that adaptive fusion and state-space modelling have the potential to further strengthen segmentation reliability under anisotropic conditions. However, further tests are required to provide definitive conclusions.

2606.15449 2026-06-16 cs.CL cs.IR cs.LG cross-list

Transfer Learning for FHIR Questionnaire Terminology Binding

面向 FHIR 问卷术语绑定的迁移学习

Maxim Gorshkov

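Affiliations: Department of Computer Science, Stanford University

AI summary: Treats the binding of FHIR Questionnaire items to LOINC codes as a retrieval problem and compares six methods; BioLORD has the best top-1 accuracy, while contrastive fine-tuning performs best at top-5 and top-10, and the paper also analyzes distribution shift and error types.

AI Chinese abstract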

电子预授权工作流要求 FHIR 问卷项携带 LOINC 代码，但 HL7 Da Vinci CDS-Library 中的大多数项缺乏这些绑定。我们将其视为一个检索问题：给定问卷项的文本，从 97,314 个活跃代码池中找到正确的 LOINC 代码。我们在一个包含 54 个项的评估集上比较了六种方法（TF-IDF、冻结 MiniLM、BioBERT、BioLORD、对比微调 MiniLM 以及 TF-IDF+GPT 重排序器），该评估集涵盖三种查询风格（自然问题、中等和简洁）。没有单一方法在所有指标上获胜。BioLORD 是一个在生物医学本体定义上预训练的冻结编码器，尽管没有见过任务特定数据，但其 top-1 准确率最高（R@1 = 0.185，MRR = 0.246），而在原始 LHC-Forms 对上的对比微调则在 R@5（0.389）和 R@10（0.426）上表现最佳。分布偏移消融实验表明，为什么我们主表中的微调不是最强的：在原始对中添加 GPT 生成的释义后，R@5 从 0.389 降至 0.296，因此增强联合在除 R@1 外的所有指标上均不如仅使用原始训练。性能在 5k 训练对时达到峰值。对 BioLORD 的 R@1 失败案例的错误分析表明，错误特异性和歧义文本案例共占错误的 59%。

English abstract

Electronic prior authorization workflows require FHIR Questionnaire items to carry LOINC codes, yet most items in the HL7 Da Vinci CDS-Library lack these bindings. We treat this as a retrieval problem: given a Questionnaire item's text, find the correct LOINC code in a pool of 97,314 active codes. We compare six methods (TF-IDF, frozen MiniLM, BioBERT, BioLORD, contrastively fine-tuned MiniLM, and a TF-IDF+GPT reranker) on a 54-item evaluation set spanning three query styles (natural question, medium, and terse). No single method wins on every metric. BioLORD, a frozen encoder pre-trained on biomedical ontology definitions, has the best top-rank accuracy (R@1 = 0.185, MRR = 0.246) despite seeing no task-specific data, while a contrastive fine-tune on raw LHC-Forms pairs takes R@5 (0.389) and R@10 (0.426). A distribution-shift ablation shows why the fine-tune in our main table is not the strongest one: adding GPT-generated paraphrases to the raw pairs drops R@5 from 0.389 to 0.296, so the augmented union underperforms raw-only training on every metric except R@1. Performance peaks at 5k training pairs. Error analysis on BioLORD's R@1 failures shows that wrong-specificity and ambiguous-text cases together account for 59% of errors.
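
The retrieval metrics quoted above follow standard definitions, sketched below: R@k is the fraction of queries whose correct code appears in the top k results, and MRR is the mean of 1/rank of the correct code (counted as 0 when it is not retrieved). This is not the paper's evaluation code; the function names and the placeholder code identifiers are made up and are not real LOINC codes.

```python
def recall_at_k(ranked_lists, gold, k):
    """Fraction of queries whose gold code appears among the top-k retrieved codes."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)


def mean_reciprocal_rank(ranked_lists, gold):
    """Mean of 1/rank of the gold code; a query contributes 0 if the code is not retrieved."""
    total = 0.0
    for ranked, g in zip(ranked_lists, gold):
        if g in ranked:
            total += 1.0 / (ranked.index(g) + 1)
    return total / len(gold)


# Three hypothetical queries with placeholder code ids.
gold = ["c1", "c2", "c3"]
ranked_lists = [
    ["c1", "c9", "c8"],  # correct code at rank 1
    ["c7", "c2", "c6"],  # correct code at rank 2
    ["c5", "c4", "c0"],  # correct code not retrieved
]

print(recall_at_k(ranked_lists, gold, 1))
print(recall_at_k(ranked_lists, gold, 3))
print(mean_reciprocal_rank(ranked_lists, gold))
```

With only 54 evaluation items each metric moves in steps of 1/54: the reported R@1 of 0.185 is consistent with 10 of 54 items, R@5 of 0.389 with 21, and R@10 of 0.426 with 23, so differences of one or two items between methods should be read with that granularity in mind.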

2606.15559 2026-06-16 cs.SE cs.DC cs.LG cross-list

SDVDiag: Multimodal Causal Discovery for Online Diagnosis in Software-defined Vehicles

SDVDiag：软件定义车辆中用于在线诊断的多模态因果发现

Matthias Weiß, Athreya Hosahalli Prakash, Falk Dettinger, Nasser Jazdi, Michael Weyrich

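Affiliations: University of Erlangen-Nuremberg; Fraunhofer Institute for Software and Virtual Systems

AI summary: Proposes SDVDiag, a multimodal causal-discovery pipeline that fuses log-based and metric-based service representations before building the causal graph and adds an anomaly-driven trigger for continuous online diagnosis; on an Autonomous Valet Parking testbed it yields sparser causal graphs than a metrics-only baseline and correctly recovers the root cause in a fault-injection scenario.

Comments: 8 pages, 4 figures, 2 tables

AI Chinese abstract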

向软件定义车辆的转变将越来越多的车辆功能集中到分布式软件服务中，故障通过服务依赖关系传播，表面症状通常与潜在缺陷相隔多个因果跳。现有方法仅部分解决此类系统中的因果根因分析：它们通常基于单一可观测性模态进行推理，并以离线、操作员驱动的方式运行，无法满足连续车辆运行的需求。本文提出SDVDiag，一种多模态因果发现管道，在图构建之前将基于日志和基于指标的服务表示融合到共享嵌入空间中，并结合异常驱动触发器，将诊断平台从手动操作的批处理工具转变为持续运行的在线系统。在自动代客泊车测试平台上的评估表明，多模态管道生成的因果图比仅基于指标的基线更稀疏（平均134条边 vs. 182条边），并且在人工反馈优化的每个阶段，基于专家知识图的边加权奖励始终优于基线，在60次反馈查询后比基线提高了2.4倍。端到端故障注入场景进一步证明，集成触发器正确恢复了位于可观察症状上游两个因果跳的真实根因。

English abstract

The transition toward software-defined vehicles concentrates an increasing share of vehicle functionality into distributed software services, where failures propagate through service dependencies and the surface symptom is often several causal hops away from the underlying defect. Existing approaches to causal root-cause analysis in such systems address this only partially: they typically reason over a single observability modality and operate in an offline, operator-driven mode that does not match the demands of continuous vehicle operation. This paper presents SDVDiag, a multimodal causal-discovery pipeline that fuses log-based and metric-based service representations into a shared embedding space before graph construction, coupled with an anomaly-driven trigger that converts the diagnostic platform from a manually operated batch tool into a continuously running online system. Evaluation on an Autonomous Valet Parking testbed shows that the multimodal pipeline produces sparser causal graphs than a metrics-only baseline (134 vs. 182 edges on average) and consistently outperforms it in edge-weighted reward against an expert knowledge graph at every stage of human-feedback refinement, showing a 2.4-fold improvement over the baseline after 60 feedback queries. An end-to-end fault-injection scenario further demonstrates that the integrated trigger correctly recovers a true root cause located two causal hops upstream of the observable symptom.

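2606.15565 2026-06-16 cs.HC cs.LG cross-listed

If These Walls Could Talk: Critical Play with Large Language Models in Museums

Anders Sundnes Løvlie

Affiliations: The Dalí Museum

AI summary: Addresses the tension that large language model chatbots in museums are unreliable yet engaging, and proposes designing for critical play, in which the chatbot is presented as a fictional character that conveys historical narratives, discursive styles, and multiple perspectives.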

2606.15594 2026-06-16 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY cross-listed

Pixels to Proofs: Probabilistically-Safe Latent World Model Control via Parallel Conformal Robust MPC

Devesh Nath, Anutam Srinivasan, Haoran Yin, Ruitong Jiang, Jeffrey Fang, Glen Chou

Affiliations: Georgia Institute of Technology

AI summary: Proposes the SLS^2 framework, which combines conformal prediction with robust model predictive control for safe vision-based motion planning in learned latent world models, improving goal-reaching performance and safety.

Abstract

We present SLS^2, a framework for safe feedback motion planning from pixels using robust model predictive control (MPC) in learned latent world models. Our approach trains an action-conditioned joint-embedding world model with compact Markovian latent states, enabling efficient gradient-based trajectory optimization through learned latent dynamics. To enforce safety for the true system despite imperfect latent predictions, we inform a GPU-accelerated system level synthesis (SLS) robust MPC scheme with conformal prediction to obtain calibrated latent error bounds and robust latent-space constraint sets. We further learn and conformalize a latent constraint checker, allowing the SLS planner to impose probabilistic safety constraints during closed-loop execution. We evaluate our method on vision-based control tasks, where it improves both goal-reaching performance and safety over latent world-model and safe-planning baselines.
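
The calibrated error bounds mentioned in the abstract rest on conformal prediction. As a minimal sketch of the standard split-conformal quantile (not the SLS^2 implementation), the bound can be computed as below. The function name `conformal_bound`, the scalar error scores, and the miscoverage level `alpha` are illustrative assumptions that do not come from the paper.

```python
import math


def conformal_bound(calibration_errors, alpha):
    """Split-conformal bound on a fresh prediction error.

    If the calibration errors and the new error are exchangeable, the new
    error is <= the returned value with probability at least 1 - alpha.

    calibration_errors: hypothetical held-out prediction errors (floats).
    alpha: hypothetical miscoverage level in (0, 1).
    """
    scores = sorted(calibration_errors)
    n = len(scores)
    # Rank of the finite-sample-corrected (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this coverage level.
        return math.inf
    return scores[k - 1]
```

A planner could, under these assumptions, inflate its latent-space constraint sets by the returned bound; how SLS^2 actually uses the bound is described in the paper, not here.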

2606.15694 2026-06-16 cs.MM cs.AI cs.CV cs.LG cross-listed

Learning Generating Functionals for Variance Reduction in Lattice QCD

Ryan Abbott, Yang Fu, Daniel C. Hackett, Gurtej Kanwar, Fernando Romero-López, Phiala E. Shanahan

Affiliations: Physics Department, Columbia University; Center for Theoretical Physics, Massachusetts Institute of Technology; Fermi National Accelerator Laboratory; Higgs Centre for Theoretical Physics, School of Physics and Astronomy, University of Edinburgh; Albert Einstein Center, Institute for Theoretical Physics, University of Bern; NSF AI Institute for Artificial Intelligence and Fundamental Interactions

AI summary: Uses machine-learned normalizing flows to encode a representation of the generating functional, systematically reducing the variance of arbitrary N-point correlation functions in lattice gauge theory, with variance reductions of up to three orders of magnitude in QCD and Yang-Mills theory.

Comments 8 pages, 3 figures

2606.15994 2026-06-16 cs.AI cs.LG cross-listed

Agentic Framework for Deep Learning workload migration via In-Context Learning

Qiyue Liang, Steven Ingram, George Vanica, Andi Gavrilescu, Newfel Harrat, Hassan Sipra, Sethuraman Sankaran

Affiliations: Google

AI summary: Proposes an autonomous system that combines in-context learning with oracle-driven self-debugging to migrate deep learning models from PyTorch to JAX automatically, reaching 91% numerical equivalence on neural modules.

Abstract

Translating deep learning models from PyTorch's flexible, object-oriented design to JAX's functional, stateless setup is usually a manual and error-prone task. Automated migration is challenging because Large Language Models (LLMs) struggle with strict and dynamic API alignment and are prone to mistakes for exacting operations. We propose a fully autonomous system that combines In-Context Learning (ICL) with oracle-driven self-debugging. First, we curated an ICL context that serves as a strict reference for idiomatic JAX styling and test case generation. Second, instead of depending on the LLM to deduce mathematical outputs, we run the source PyTorch modules to get their actual dynamic tensor states. This creates an unchangeable execution oracle. We then use an autonomous agentic loop to synthesize tests based on the oracle data. The test cases are executed repeatedly, and the traceback is sent back to the LLM for self-correction. Ablations show that combining ICL references with oracle grounding and self-debugging greatly outperforms pure instructional and basic agentic baselines. This improvement does not add an excessive computational overhead. Our lightweight pipeline achieves 91% numerical equivalence (compared to baseline: 9%, instruction + self-debugging: 27%) on neural modules, providing a highly reliable, scalable blueprint for cross-framework migration. This has been validated across several state-of-the-art models including SAM (segment anything), T5, Code Whisper amongst others showing high numerical equivalency. Code: https://github.com/AI-Hypercomputer/accelerator-agents/tree/main/MaxCode

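2606.16019 2026-06-16 cs.CL cs.LG cs.SD cross-listed

Scaling Human and G2P Supervision for Robust Phonetic Transcription

Alexander Metzger, Aruna Srivastava, Ruslan Mukhamedvaleev

Affiliations: Koel Labs LLC

AI summary: Studies how automatic phonetic transcription scales with human annotation and G2P supervision; G2P helps only when fewer than 20-30 hours of human annotation are available, beyond which it gives no benefit and can reduce robustness, whereas ASR pretraining markedly improves performance.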

Comments Accepted to Interspeech 2026

Abstract

Expert phonetic annotation is costly, especially for non-standard dialects and atypical speech. A common alternative is using Grapheme-to-Phoneme (G2P) models to auto-generate phonetic labels from text transcripts at scale. We study how automatic phonetic transcription performance scales with human and G2P supervision in English. Using a curated 80-hour benchmark spanning native, non-native, and post-stroke speech, we identify a supervision quality threshold: G2P supervision helps only when fewer than 20-30 hours of human annotation are available. Beyond this threshold, it provides no significant benefit and can reduce cross-dialect robustness. What is effective after this threshold is ASR pretraining, which we use to achieve a 2.3x reduction in weighted phone feature error rate over prior systems, with strong gains on non-native and aphasic speech. These results suggest that quantity-driven G2P scaling may yield diminishing returns for robust generalization.

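2606.16032 2026-06-16 cond-mat.other cs.LG cross-listed

Data-Driven Control of Multi-Fuel Engines with Real-Time Uncertainty Compensation

Rajasree Sarkar, Arunava Banerjee, Sathya Aswath Govind Raju, Ishan Berk Altiner, Zongxuan Sun, Kenneth Kim, Chol-Bum Mike Keown

Affiliations: Department of Mechanical Engineering, University of Minnesota Twin Cities; DEVCOM Army Research Laboratory, Aberdeen Proving Ground, MD, USA

AI summary: Targets modeling uncertainty in combustion phasing control for multi-fuel compression ignition engines and proposes a data-driven real-time control framework built on a Gaussian process regression model and an uncertainty compensator, with convergence within a finite number of cycles.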

Abstract

Multi-fuel compression ignition (CI) engines offer superior power density and fuel flexibility. However, achieving consistent and optimal combustion phasing across a wide range of operating conditions remains a major challenge, particularly in the presence of modeling uncertainties. This paper presents a novel, data-driven real-time uncertainty compensation framework for combustion control in multi-fuel CI engines. The proposed approach introduces a pseudo-engine speed that enables dynamic adaptation of control inputs in response to uncertainty affecting the engine. To model the underlying combustion process, a Gaussian Process Regression (GPR) model is first trained on available input-output data, capturing the nonlinear and fuel-dependent behavior across varying operating conditions. Control inputs are then synthesized through model inversion of the learned GPR surrogate and augmented with an uncertainty compensator designed to mitigate deviations caused by dynamic variations in operating conditions and model inaccuracies. This integrated control strategy allows for real-time input corrections within a finite number of combustion cycles. Theoretical analysis establishes finite-time convergence guarantees for the proposed controller. Simulation results demonstrate that the proposed method steers the combustion phasing to the desired value in real-time, providing a scalable and adaptive control solution for multi-fuel CI engine operation.

2606.16271 2026-06-16 cs.CV cs.LG cross-listed

Contrastive Learning for Seismic Horizon Tracking with Domain-Specific Priors

Alexandre Thouvenot, Lionel Boillot, Vincent Gripon

Affiliations: IMT Atlantique, LAB-STICC, UMR CNRS 6285; TotalEnergies, OneTech

AI summary: Proposes a self-supervised fusion of signal-based and texture-based approaches: signal-derived local horizon correspondences serve as domain-specific priors for training a texture-based deep learning model, and contrastive learning preserves horizon identity, enabling horizon tracking across discontinuities.

Comments 5 pages, 5 figures. Submitted to the IEEE GRSL for possible publication

Abstract

Unsupervised 3D seismic horizon tracking faces a key limitation: signal-based propagators provide accurate trace-level alignment but often fail near faults, whereas texture-driven deep models are more robust to discontinuities, typically at the cost of labeled data requirements and reduced trace-level precision. We propose a self-supervised fusion of both paradigms in which signal-derived local horizon correspondences act as domain-specific priors to train a texture-based deep learning model. Specifically, we estimate reliable trace-to-trace flows from reflector slopes and use them to form positive pairs in a contrastive objective, while restricting training to high-confidence neighborhoods, optionally augmented with a fault mask. The objective is not to infer ambiguous correspondences close to discontinuities, but to preserve horizon identity across them. As a result, the network learns voxel-wise embeddings that preserve local signal continuity while enabling horizon propagation beyond discontinuities through similarity search. Experiments on the public F3 dataset and a faulted synthetic dataset achieve lower mean absolute error (MAE) than unsupervised baselines and competitive performance against a semi-supervised method using a single labeled slice.

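2606.16333 2026-06-16 cs.CV cs.GR cs.LG cross-listed

Differentiable Packing of Irregular 3D Objects with Adaptive Container Estimation

Palak Gupta, Shanmuganathan Raman

Affiliations: Indian Institute of Technology Gandhinagar

AI summary: Proposes a differentiable packing framework that jointly optimizes object poses and container dimensions by gradient descent; with an adaptive squeezing mechanism and fast tensor-broadcasting computation, it produces containers 11-32% smaller than baselines within minutes on a single GPU.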

Comments 20 pages, 8 figures, 5 tables. Under review at Computers & Graphics (Elsevier)

Abstract

Most existing approaches either fix the container in advance or optimize only a single container dimension through an outer search loop, leaving the remaining dimensions as a manual tuning problem. We present a differentiable packing framework that jointly optimizes all 6N object pose parameters and all three container side lengths inside a single gradient-based loop. The formulation combines six physics-inspired, differentiable loss terms computed directly on triangle meshes through axis-aligned bounding-box proxies. An adaptive squeezing mechanism periodically tightens the container whenever the overlap loss falls below a pair-count-scaled threshold, producing a large initial drop in container volume, followed by small refinements. All pairwise computations are written in tensor-broadcasting form, giving a 3.4 to 54 times speedup over a reference loop-based implementation. The pipeline is implemented in Python and PyTorch, with no physics engine, FFT library, or convex decomposition. On multiple object categories, the method produces containers that are 11 to 32 percent smaller than time-matched DBLF and simulated-annealing baselines at N = 100, while running in under 4 minutes per instance on a single consumer GPU.
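
The abstract's "tensor-broadcasting form" for pairwise computations can be illustrated with axis-aligned bounding boxes. The sketch below uses NumPy rather than the PyTorch the authors report, and the function name `pairwise_aabb_overlap` and its array layout are assumptions made for illustration; it shows only the general broadcasting pattern, not the paper's loss terms.

```python
import numpy as np


def pairwise_aabb_overlap(mins, maxs):
    """Overlap volume between every pair of axis-aligned boxes, without loops.

    mins, maxs: hypothetical (N, 3) arrays of box corner coordinates.
    Returns an (N, N) array; the diagonal (self-overlap) is set to zero.
    """
    # Broadcasting (N, 1, 3) against (1, N, 3) yields all N*N pairs at once.
    lo = np.maximum(mins[:, None, :], mins[None, :, :])
    hi = np.minimum(maxs[:, None, :], maxs[None, :, :])
    # A negative extent on any axis means the boxes are disjoint on that axis.
    extent = np.clip(hi - lo, 0.0, None)
    volume = extent.prod(axis=-1)
    np.fill_diagonal(volume, 0.0)
    return volume
```

Summing the upper triangle of the result would give one possible overlap penalty; a differentiable version would need the same operations in an autodiff framework.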

2606.16505 2026-06-16 cs.SD cs.LG cross-listed

Semi-Supervised Speech Confidence Detection using Pseudo-Labelling and Whisper Embeddings

Adam Wynn, Jingyun Wang, Xiangyu Tan

Affiliations: Durham University; Shanghai Open University

AI summary: Proposes a framework that combines hand-engineered features with Whisper embeddings, expands the labelled data through pseudo-labelling, and fuses the features with a co-attention mechanism, reaching 75% accuracy in speaker confidence detection.

Comments 8 pages, 3 figures. Published in the Proceedings of the 26th International Conference on Artificial Intelligence in Education (AIED 2025). Shorter, preliminary version of arXiv:2605.12387

DOI: 10.1007/978-3-031-98465-5_34
Journal ref: AIED 2025. LNCS vol 15882. Springer, Cham (2025)

Abstract

Understanding speaker confidence is crucial in educational settings, as it can enhance personalised feedback and improve learning outcomes. This study introduces a novel framework for detecting speaker confidence by integrating human-engineered features with embeddings from the Whisper encoder. To address data limitations, a pseudo-labelling technique is employed to expand the labelled dataset, allowing the model to learn from both human-annotated and model-generated labels. The framework combines traditional speech features including pitch, volume, rate of speech, and the presence of disfluencies and stress, with Whisper embeddings, and uses a co-attention mechanism to fuse these representations and achieve an overall accuracy of 75%. This study contributes to advancing speech analysis, enabling applications that support personalised learning and speaking skill development.
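
Pseudo-labelling, as named in the abstract, generally means keeping model predictions on unlabelled data as extra training labels. The sketch below shows one common variant that keeps only high-confidence predictions. The function name `select_pseudo_labels` and the 0.9 threshold are assumptions for illustration; the abstract does not say how the authors filter their pseudo-labels.

```python
import numpy as np


def select_pseudo_labels(probs, threshold=0.9):
    """Pick unlabelled samples whose top class probability meets a threshold.

    probs: hypothetical (num_samples, num_classes) array of predicted
        class probabilities from a model trained on the labelled set.
    threshold: hypothetical confidence cut-off.
    Returns (indices of kept samples, their predicted labels).
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = confidence >= threshold
    return np.flatnonzero(keep), labels[keep]
```

The kept samples would then be appended to the labelled set before retraining.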

2606.16510 2026-06-16 math.NA cs.LG cs.NA cross-listed

Petrov-Galerkin Variational Physics-Informed Neural Network Framework for Two-Dimensional Singularly Perturbed Problems

Vijay Kumar, Gautam Singh

Affiliations: Department of Mathematics, National Institute of Technology Tiruchirappalli

AI summary: Proposes a Petrov-Galerkin variational physics-informed neural network (VPINN) method that uses a neural network to build the trial space and tensor-product hat functions as test functions, solving two-dimensional singularly perturbed problems efficiently with high accuracy in the maximum and L2 norms.

2606.16587 2026-06-16 physics.flu-dyn cs.AI cs.LG physics.comp-ph cross-listed

Learning Interface Breakup: A Geometry-Conditioned Latent Surrogate for Spray Formation

Julius H Ramlau, Friedrich Hastedt, Tolga Birdal, Ehecatl-Antonio del Río Chanona, Nausheen S Basha, Omar K Matar

Affiliations: University of California, Berkeley; Technical University of Munich; Istanbul Technology University; University of Texas at Austin; University of Cambridge; University of Oxford

AI summary: Proposes a geometry-conditioned latent surrogate that encodes the adaptive mesh refinement (AMR) cell-density field; trained on 797 two-phase nozzle simulations, it predicts transient breakup dynamics efficiently, with inference more than 6×10^4 times faster than Basilisk CFD.

Comments 11 pages, 5 figures, accepted to ICML AI4Physics 2026

Abstract

Designing spray nozzles requires predicting how geometry shapes transient two-phase breakup, but high-fidelity volume-of-fluid (VOF) simulations with adaptive mesh refinement (AMR) are too expensive for iterative design exploration. Standard surrogate models are also challenged by this setting because both the liquid--gas interface and the underlying adaptive discretization evolve across time and geometries. We introduce a geometry-conditioned latent surrogate trained on 797 two-phase nozzle simulations that addresses this by encoding the AMR cell-density field, rather than the full multi-channel flow state, as a compact proxy for where the solver concentrates resolution. From this representation, the model reconstructs transient density evolution and nozzle geometry, and a lightweight second stage recovers the remaining flow variables. On held-out simulations, the method accurately captures key interface dynamics while reducing inference time to 0.045 seconds per trajectory, corresponding to a speed-up of more than $6\times10^4$ relative to Basilisk CFD. These results suggest that AMR refinement structure can serve as a compact and learnable representation for geometry-conditioned surrogate modeling of transient two-phase flows.

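2606.16607 2026-06-16 eess.SP cs.IT cs.LG math.IT cross-listed

Context-Aware Markov VAE for CSI Compression in Wireless Systems

Efstathios Chatziloizos, Konstantinos Vandikas, Aneta Vulgarakis Feljan, Zheng Chen, Nikolaos Pappas

AI summary: Proposes a context-aware compression framework based on a k-memory Markov variational autoencoder that uses a finite temporal window to capture the evolution of CSI in the latent space, improving reconstruction performance notably at low and moderate compression rates.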

Comments 5 pages, 3 figures, 2 tables

Abstract

This paper considers neural channel state information (CSI) compression for time-varying massive multiple-input multiple-output (MIMO) channels in frequency division duplex (FDD) systems with limited feedback resources. The main challenge lies in obtaining a compact and efficient representation of the CSI given that it exhibits strong temporal correlation across successive snapshots. Existing memoryless compression models do not exploit this property, while simple temporal extensions often incorporate multiple observations without explicitly modeling the latent dynamics. We propose a context-aware compression framework based on a k-memory Markov variational autoencoder (k-MMVAE), which uses a finite temporal window to capture the evolution of CSI in the latent space. The model introduces Markov-structured latent dynamics with finite memory, enabling efficient use of temporal dependencies for compression. Simulation results show that the proposed approach improves target CSI reconstruction performance compared to memoryless and weakly sequential baselines, particularly at low and moderate compression rates. These results suggest that explicit latent temporal modeling can provide an effective mechanism for CSI compression under limited feedback constraints.

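2606.16693 2026-06-16 q-bio.NC cs.LG cross-listed

Learning Hybrid Biophysical Neuron Models with Neural ODEs

Jonas Beck, Michael Deistler, Dóra Viktória Molnár, Jakob H. Macke, Philipp Berens

AI summary: Proposes a hybrid modeling framework that embeds neural ordinary differential equations in conductance-based biophysical models to capture unknown currents or mis-specified channel kinetics, recovering interpretable gating dynamics from voltage recordings and lowering computational cost.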

Abstract

Biophysical neuron models link measurements of neural activity to underlying cellular mechanisms. Yet, a central challenge is that the kinetics of many ion channels are poorly characterized, and practical simplifications -- omitting channels or reducing morphological detail -- introduce systematic gaps between model and biology. Bridging these gaps requires approaches that can flexibly discover unmodeled dynamics while preserving mechanistic interpretability. Here, we introduce a hybrid modeling framework that embeds neural ordinary differential equations into conductance-based biophysical models to capture unknown currents or mis-specified channel kinetics. By parameterizing the neural ODE in terms of voltage-dependent steady-state and time-constant functions, we recover interpretable gating dynamics directly from voltage recordings without assuming a functional form. We show that the hybrid model fits the gating kinetics of 2400 ion channel models and recovers unknown gating dynamics from single current-clamp recordings, generalizing to out-of-distribution stimulus regimes under realistic inputs and parameter misspecification. We also use our method to reduce a multicompartment model of a cortical neuron into a single-compartment hybrid model with a learned axial current, yielding up to an order of magnitude lower computational cost. Together, our results establish a plug-and-play framework for selectively replacing unknown components of conductance-based models with neural ODEs while preserving their mechanistic structure.
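
The abstract describes parameterizing gating dynamics through voltage-dependent steady-state and time-constant functions. In classical conductance-based models this takes the first-order form dm/dt = (m_inf(V) - m) / tau(V); whether the paper uses exactly this form is an assumption here. The sketch below advances one gate with the exact solution for a voltage held fixed over the step. The name `step_gate` and the two callables, which stand in for learned networks, are hypothetical.

```python
import math


def step_gate(m, v, steady_state, time_constant, dt):
    """Advance dm/dt = (m_inf(v) - m) / tau(v) by dt with v held fixed.

    steady_state, time_constant: hypothetical callables mapping voltage to
        m_inf(v) and tau(v); in a hybrid model these could be small networks.
    """
    m_inf = steady_state(v)
    tau = time_constant(v)
    # Exact solution of the linear ODE over one step (exponential Euler).
    return m_inf + (m - m_inf) * math.exp(-dt / tau)
```

Because the update is a convex combination of `m` and `m_inf`, the gate stays within [0, 1] whenever both inputs do, which is one reason this parameterization remains interpretable.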

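2606.16737 2026-06-16 math-ph cs.LG math.MP cross-listed

The Algebra of Units: From Buckingham's Pi-grec Theorem to Latent-Variable Learning

Mauro Valorani

AI summary: Proposes a method that discovers dimensionless groups automatically from data, using the low-dimensional manifold that appears after logarithmic transformation together with singular value decomposition; with no prior physical knowledge, it recovers the classical engineering dimensionless numbers on a synthetic compressor dataset.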

Comments 31 pages, 2 figures

Abstract

Engineers often measure many quantities (speed, pressure, temperature, length) expressed in different physical units. The Buckingham Pi-grec theorem states that these variables can always be combined into a smaller set of dimensionless numbers whose values fully determine the system's behaviour. Identifying the appropriate dimensionless groups has traditionally required expert knowledge and physical insight. This paper shows that they can instead be discovered automatically from data, without prior knowledge of the governing physics. The key observation is that, after logarithmic transformation, measurements collected under different scalings of the same system lie on a low-dimensional manifold whose geometry is determined by the underlying dimensionless groups. Singular value decomposition (SVD) identifies this manifold directly from data. A subsequent search over integer-exponent combinations recovers candidate dimensionless quantities, while a repeating-variable filter retains only those constructed from the machine's characteristic scales. This procedure recovers familiar engineering groups, including the flow coefficient, head coefficient, and Mach number, while excluding equivalent but less interpretable alternatives. The method is demonstrated on a synthetic compressor dataset containing 16,000 measurements. Starting from raw dimensional variables and no physics input, it recovers the correct dimensionless groups to numerical precision and reproduces the compressor performance map with an error below 0.01%. More broadly, the work reveals a close connection between classical dimensional analysis and modern data-driven learning. Both rely on the same underlying algebraic structure, suggesting new approaches for building physical models that are simultaneously interpretable, scalable, and data-efficient.
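
For context, the classical (non-data-driven) version of the Buckingham theorem can be written as a null-space computation, which SVD solves directly. The sketch below is that textbook construction from a known dimension matrix, not the paper's procedure, which starts from measurements with no unit information. The function name `dimensionless_exponents` and the pendulum example are illustrative assumptions.

```python
import numpy as np


def dimensionless_exponents(dim_matrix, tol=1e-10):
    """Basis of exponent vectors whose variable products are dimensionless.

    dim_matrix: (num_base_units, num_variables) array; column j holds the
        exponents of the base units in variable j.
    Returns an array with one row per independent dimensionless group.
    """
    _, singular_values, vt = np.linalg.svd(dim_matrix)
    rank = int(np.sum(singular_values > tol))
    # Rows of vt beyond the rank span the null space of dim_matrix.
    return vt[rank:]


# Hypothetical example: a pendulum with variables (period, length, g, mass)
# and base units in rows (mass, length, time).
pendulum = np.array([
    [0.0, 0.0, 0.0, 1.0],   # mass exponent of each variable
    [0.0, 1.0, 1.0, 0.0],   # length exponent
    [1.0, 0.0, -2.0, 0.0],  # time exponent
])
groups = dimensionless_exponents(pendulum)
# Rescale so the period exponent is 2, giving period^2 * g / length.
pendulum_group = 2.0 * groups[0] / groups[0][0]
```

With four variables and a rank-three dimension matrix, exactly one group remains, matching the theorem's count of variables minus rank.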

2606.16747 2026-06-16 cs.GR cs.LG cross-listed

STAR-NT: Spatiotemporal Acceleration of Real-Time Neural Transparency Rendering

Grigoris Tsopouridis, Christos Georgiou-Mousses, Aris Panagiotidis, Andreas Vasilakis, David Corrigan, Tobias A. Franke, Aleksei Gorbonosov, Andrei Astapov, Ioannis Fudos

AI summary: Proposes a spatiotemporal acceleration framework that uses spatially adaptive quadtree subdivision and temporal depth reprojection to cut the geometry overhead of neural order-independent transparency rendering while preserving visual quality.

Comments Supplemental material at https://github.com/gtsopus/STAR-NT

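2606.16815 2026-06-16 eess.SP cs.AI cs.LG cross-listed

A Perception vs. Distortion Perspective on Score-Based Generative Channel Estimation

Marco Skocaj, Lukas Eller, Mate Boban

AI summary: Uses perception-distortion tradeoff theory to analyze the strengths and limits of score-based generative models for channel estimation: under high predictive uncertainty they can approach Bayesian-optimal performance, while discriminative methods are preferable under low uncertainty.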

Comments 13 pages

Abstract

Driven by their remarkable success in computer vision and inverse problem solving, score-based models are increasingly applied to wireless communications, where they show promise across a range of physical-layer tasks. However, despite this growing interest, the current literature often lacks a rigorous analysis of when score-matching offers a tangible advantage over traditional discriminative learning. This paper aims to address this gap through the use-case of channel estimation, a fundamental inverse problem in wireless systems. We present a theoretically grounded interpretation of score-based channel estimation through the lens of the perception-distortion tradeoff, identifying the conditions where score matching excels as well as its key limitations. In particular, by modeling downstream wireless tasks (e.g., capacity maximization) as functionals of the channel estimation process, we quantify the excess risk incurred by standard distortion-minimization approaches. Extensive numerical results show that under high predictive uncertainty, the large excess risk gap can be offset by score-based estimation, enabling near Bayesian-optimal precoding via the learned posterior, whereas in the low predictive uncertainty regime, discriminative distortion-minimization approaches are preferable due to lower complexity and more efficient use of model capacity.

2606.16935 2026-06-16 cs.RO cs.AI cs.LG cross-listed

CrossMaps: Confidence-Aware Open-Vocabulary Semantic Mapping for Rover Navigation

Jan-Niklas Klein, Sona Ghahremani, Christian Medeiros Adriano, Holger Giese

Affiliations: Hasso Plattner Institute for Digital Engineering, Potsdam, Germany

AI summary: Proposes CrossMaps, a real-time, confidence-aware, open-vocabulary semantic mapping pipeline that builds queryable semantic maps for rover navigation through multi-scale CLIP embeddings, confidence-aware fusion, and a dual-memory architecture.

Comments IEEE International Conference on Robotics and Automation (ICRA) 2026: ROSE International Workshop on Robotics Software Engineering, June 01, 2026, Vienna, Austria

Abstract

Rovers rely on perception to maintain spatial maps that encode both objects and sensor quality (e.g., range reliability, lighting artifacts, data density), guiding data fusion, embedding updates, and navigation under partial observability. To study these coupled perception-navigation processes, we present CrossMaps, a real-time confidence-aware open-vocabulary semantic mapping pipeline that constructs language-queryable maps from RGB-D data. Building on VLMaps-style approaches, CrossMaps integrates multi-scale CLIP embeddings with confidence-aware fusion and a dual-memory architecture consisting of Short-Term Memory (STM) and Long-Term Memory (LTM). The STM aggregates noisy visual observations using geometric, semantic, and temporal confidence cues, while confident and coherent cells are promoted to the LTM as persistent semantic landmarks. Designed for deployment with a Jetson Orin-powered UGV alongside SLAM, CrossMaps runs in real time and produces semantic heatmaps that can be queried with natural language to guide rover navigation.

2606.16950 2026-06-16 physics.ins-det cs.LG physics.bio-ph physics.chem-ph physics.data-an q-bio.BM cross-listed

Latent space mapping of interpretable structural coordinates from stochastic single-molecule signals

Matteo Cartiglia, Sandro Kuppel, Wouter Botermans, Wannes Peeters, Natan Biesmans, Liam Vandekerckhove, Eric Beamish, Koen Ongena, Wouter Renckens, Pol Van Dorpe, Sanjin Marion

AI summary: Proposes using a contrastive encoder to map stochastic nanopore signals into a latent space of interpretable molecular structural coordinates, enabling efficient identification and data fusion.

Comments 32 pages, 6 figures

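Deep Learning-Based Automated Ultrasound Doppler Angle Estimation

Nilesh Patil, Ajay Anand

Affiliations: Goergen Institute for Data Science; University of Rochester Medical Center; University of Rochester

AI summary: Proposes a deep learning-based method for automated Doppler angle estimation, developed on 2100 carotid ultrasound images with pre-trained models; the mean absolute error ranges from 3.9° to 9.4°, and the best model's error falls below the clinically acceptable threshold, avoiding misclassification of normal velocities as stenosis.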

DOI: 10.1109/embc.2019.8857587
Journal ref: Annu Int Conf IEEE Eng Med Biol Soc. 2019 Jul;2019:28-31

Abstract

Angle estimation is an important step in the Doppler ultrasound clinical workflow to measure blood velocity. It is widely recognized that incorrect angle estimation is a leading cause of error in Doppler-based blood velocity measurements. In this paper, we propose a deep learning-based approach for automated Doppler angle estimation. The approach was developed using 2100 human carotid ultrasound images including image augmentation. Five pre-trained models were used to extract image features, and these features were passed to a custom shallow network for Doppler angle estimation. Independently, measurements were obtained by a human observer reviewing the images for comparison. The mean absolute error (MAE) between the automated and manual angle estimates ranged from 3.9° to 9.4° for the models evaluated. Furthermore, the MAE for the best performing model was less than the acceptable clinical Doppler angle error threshold, thus avoiding misclassification of normal velocity values as a stenosis. The results demonstrate potential for applying a deep learning-based technique for automated ultrasound Doppler angle estimation. Such a technique could potentially be implemented within the imaging software on commercial ultrasound scanners.
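
Why angle error matters follows from the standard Doppler equation, v = c · f_d / (2 · f_0 · cos θ): the estimated velocity scales with 1 / cos θ, so an error in θ maps directly into a velocity error. The short sketch below quantifies this; the function name `velocity_relative_error` and the example angles are illustrative assumptions, not values from the paper.

```python
import math


def velocity_relative_error(true_angle_deg, estimated_angle_deg):
    """Relative velocity error caused by using the wrong Doppler angle.

    From v = c * f_d / (2 * f0 * cos(theta)), the ratio of estimated to
    true velocity is cos(theta_true) / cos(theta_estimated).
    """
    true_cos = math.cos(math.radians(true_angle_deg))
    est_cos = math.cos(math.radians(estimated_angle_deg))
    return true_cos / est_cos - 1.0
```

Under these assumptions, overestimating a true 60° angle by 5° inflates the reported velocity by roughly 18%, which illustrates how a normal velocity could be read as elevated.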

2508.10967 2026-06-16 cs.LG cs.AI 版本更新

Retro-Expert: Collaborative Reasoning for Interpretable Retrosynthesis

Retro-Expert: 面向可解释逆合成的协同推理

Xinyi Li, Sai Wang, Yutian Lin, Yu Wu

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出Retro-Expert框架，通过强化学习结合大语言模型与专用模型，实现可解释的逆合成预测，并生成基于化学逻辑的自然语言解释。

AI中文摘要

逆合成预测旨在根据给定的产物分子推断反应物分子，这是化学合成中的一项基本任务。然而，现有方法依赖于静态模式匹配范式，限制了其从化学数据中进行有效逻辑决策的能力，导致黑箱过程。我们提出Retro-Expert，一个可解释的逆合成框架，通过纯强化学习结合大语言模型和专用模型的互补优势，进行协同推理。它通过三个组件输出基于化学逻辑的自然语言解释：（1）专用模型提供化学知识，并将其蒸馏为高质量的化学决策空间；（2）大语言模型驱动的批判性推理，生成具有可解释推理路径的预测；（3）基于知识的策略优化，改进可解释的决策策略。实验表明，Retro-Expert在不同指标上均优于基于大语言模型和专用模型的方法，同时生成基于化学的解释，增强了化学家在实践中的信任。本文源代码见：https://github.com/MagixRab-ll/Retro-Expert。

英文摘要

Retrosynthesis prediction aims to infer the reactant molecules based on a given product molecule, which is a fundamental task in chemical synthesis. However, existing methods rely on a static pattern-matching paradigm, which limits their ability to perform effective logical decision-making from chemical data, leading to a black-box process. We propose Retro-Expert, an interpretable retrosynthesis framework that performs collaborative reasoning by combining the complementary strengths of Large Language Models and specialized models via pure reinforcement learning. It outputs natural language explanations grounded in chemical logic through three components: (1) specialized models provide chemical knowledge that is distilled into a high-quality chemical decision space, (2) LLM-driven critical reasoning generates predictions with an interpretable reasoning path, and (3) knowledge-grounded policy optimization refines the interpretable decision policy. Experiments show that Retro-Expert surpasses both LLM-based and specialized models across different metrics, while generating chemically grounded explanations that enhance chemists' trust in practice. The source code for this paper is available at https://github.com/MagixRab-ll/Retro-Expert.

2510.02605 2026-06-16 cs.LG 版本更新

Towards CONUS-Wide ML-Augmented Conceptually-Interpretable Modeling of Catchment-Scale Precipitation-Storage-Runoff Dynamics

面向美国本土的机器学习增强概念可解释流域尺度降水-存储-径流动力学建模

Yuan-Heng Wang, Yang Yang, Fabio Ciulla, Hoshin V. Gupta, Charuleka Varadharajan

发表机构 * Earth and Environmental Science Area, Lawrence Berkeley National Lab（劳伦斯伯克利国家实验室地球与环境科学部）； Department of Hydrology and Atmospheric Science, University of Arizona（亚利桑那大学水文学与大气科学系）； School for the Environment, University of Massachusetts Boston（马萨诸塞大学波士顿分校环境学院）； Department of Civil Engineering, The University of Hong Kong（香港大学土木工程系）

AI总结本研究利用质量守恒感知机（MCP）构建机器学习增强的物理可解释流域模型，在美国本土多种水文气候条件下评估模型性能，发现基于MCP的模型在性能上与LSTM相当，强调了根据水文过程优势选择合适模型复杂度的重要性。

Comments Main text: 99 pages, 15 figures, 5 tables; Appendix: Section A-E; 2 figures; Supplementary Materials: 22 figures, 9 tables

AI中文摘要

尽管许多现代研究致力于基于机器学习的大样本水文建模，但这些努力未必转化为以更深入的物理-概念理解为基础的预测改进。在此，我们报告了一项覆盖美国本土（跨越多种水文-地质-气候条件）的大样本研究，采用基于质量守恒感知机（MCP）、复杂度各异的机器学习增强型物理可解释流域尺度模型。使用属性掩码（如雪情、森林覆盖和气候区）评估结果。我们的结果表明，根据主导过程随水文情势的变化来选择复杂度适当的模型架构十分重要。基准比较显示，物理可解释、质量守恒的MCP模型可以达到与基于长短期记忆网络（LSTM）架构的数据驱动模型相当的性能。总体而言，本研究强调了理论指导、以物理为基础的方法在大样本水文学中的潜力，侧重于机制理解和简约可解释模型架构的开发，从而为未来能够在架构中编码时空变化的主导过程信息的通用模型奠定基础。

英文摘要

While many modern studies are dedicated to ML-based large-sample hydrologic modeling, these efforts have not necessarily translated into predictive improvements that are grounded in enhanced physical-conceptual understanding. Here, we report on a CONUS-wide large-sample study (spanning diverse hydro-geo-climatic conditions) using ML-augmented physically-interpretable catchment-scale models of varying complexity based on the Mass-Conserving Perceptron (MCP). Results were evaluated using attribute masks such as snow regime, forest cover, and climate zone. Our results indicate the importance of selecting model architectures of appropriate complexity based on how process dominance varies with hydrological regime. Benchmark comparisons show that physically-interpretable mass-conserving MCP-based models can achieve performance comparable to data-based models built on the Long Short-Term Memory network (LSTM) architecture. Overall, this study highlights the potential of a theory-informed, physically grounded approach to large-sample hydrology, with emphasis on mechanistic understanding and the development of parsimonious and interpretable model architectures, thereby laying the foundation for future models of everywhere that architecturally encode information about spatially- and temporally-varying process dominance.

2510.22266 2026-06-16 cs.LG cs.AI cs.CY 版本更新

A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata

学生表现相关因素的多层次分析：基于SAEB微观数据的机器学习方法

Rodrigo Tertulino, Laércio Alencar

发表机构 * Federal Institute of Education, Science, and Technology of Rio Grande do Norte（巴西北里奥格兰德州联邦教育、科学和技术学院）

AI总结采用多级机器学习方法，利用SAEB微观数据中四类特征，通过随机森林模型以90.2%准确率分类学生水平，并借助SHAP解释发现学校平均社会经济水平是最强预测因子，表明学业表现是系统性现象。

Comments This article has been published in Discover Education (Springer Nature). The final authenticated version is available at: https://doi.org/10.1007/s44217-026-01699-0

DOI: 10.1007/s44217-026-01699-0
Journal ref: Discover Education, 2026

AI中文摘要

识别影响基础教育学生表现的因素是巴西制定有效公共政策的核心挑战。本研究引入了一种多级机器学习方法，利用巴西基础教育评估系统（SAEB）的微观数据对九年级和高中学生的熟练程度进行分类。我们的模型独特地整合了四个数据源：学生社会经济特征、教师专业档案、学校指标和校长管理档案。对四种集成算法的比较分析证实了随机森林模型的优越性，该模型达到了90.2%的准确率和96.7%的曲线下面积（AUC）。为了超越预测，我们应用了基于SHAP的可解释人工智能（XAI），结果显示学校的平均社会经济水平是最主要的预测因子，表明系统性因素比孤立的个体特征影响更大。主要结论是，学业表现是一种与学校生态系统深度相关的系统性现象。本研究提供了一个数据驱动的、可解释的工具，以通过解决学校之间的差异来促进教育公平的政策制定。

英文摘要

Identifying the factors that influence student performance in basic education is a central challenge for formulating effective public policies in Brazil. This study introduces a multi-level machine learning approach to classify the proficiency of 9th-grade and high school students using microdata from the System of Assessment of Basic Education (SAEB). Our model uniquely integrates four data sources: student socioeconomic characteristics, teacher professional profiles, school indicators, and principal management profiles. A comparative analysis of four ensemble algorithms confirmed the superiority of a Random Forest model, which achieved 90.2% accuracy and an Area Under the Curve (AUC) of 96.7%. To move beyond prediction, we applied Explainable AI (XAI) using SHAP, which revealed that the school's average socioeconomic level is the most dominant predictor, demonstrating that systemic factors have a greater impact than individual characteristics in isolation. The primary conclusion is that academic performance is a systemic phenomenon deeply tied to the school's ecosystem. This study provides a data-driven, interpretable tool to inform policies aimed at promoting educational equity by addressing disparities between schools.

2511.18960 2026-06-16 cs.LG cs.CV cs.RO 版本更新

AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention

AVA-VLA: 通过主动视觉注意力改进视觉-语言-动作模型

Lei Xiao, Jifeng Li, Juntao Gao, Feiyang Ye, Yan Jin, Jingjing Qian, Jing Zhang, Yong Wu, Xiaoyuan Yu

发表机构 * LiAuto Inc.（理想汽车）； Beijing University of Technology（北京工业大学）； The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））

AI总结针对VLA模型忽视历史信息的问题，提出AVA-VLA框架，利用循环状态近似信念并引入主动视觉注意力动态重加权视觉令牌，在LIBERO和CALVIN等基准上取得最优性能。

Comments Accepted at CVPR 2026 (Highlight)

AI中文摘要

视觉-语言-动作（VLA）模型最近在具身任务中取得了显著进展，但大多数方法在每个时间步独立处理视觉观察。这种历史无关的设计将机器人操作视为马尔可夫决策过程，而现实中的机器人控制本质上是部分可观测的，需要对过去的交互进行推理。为了解决这一不匹配，我们从部分可观测马尔可夫决策过程的角度重新表述VLA策略学习，并提出AVA-VLA，一种以循环状态为条件生成动作的框架，该循环状态是智能体对任务历史的信念的神经近似。基于此循环状态，我们引入了主动视觉注意力（AVA），它动态地重新加权当前观测中的视觉令牌，以关注与指令和执行历史最相关的区域。大量实验表明，AVA-VLA在标准机器人基准测试（包括LIBERO和CALVIN）上达到了最先进的性能，并有效迁移到真实世界的双臂操作任务。这些结果证明了基于时序的主动视觉处理在改善机器人序列决策中VLA性能的有效性。项目页面见：https://liauto-dsr.github.io/AVA-VLA-Page。

英文摘要

Vision-Language-Action (VLA) models have shown remarkable progress in embodied tasks recently, but most methods process visual observations independently at each timestep. This history-agnostic design treats robot manipulation as a Markov Decision Process, even though real-world robotic control is inherently partially observable and requires reasoning over past interactions. To address this mismatch, we reformulate VLA policy learning from a Partially Observable Markov Decision Process perspective and propose AVA-VLA, a framework that conditions action generation on a recurrent state that serves as a neural approximation to the agent's belief over task history. Built on this recurrent state, we introduce Active Visual Attention (AVA), which dynamically reweights visual tokens in the current observation to focus on regions most relevant given both the instruction and execution history. Extensive experiments show that AVA-VLA achieves state-of-the-art performance on standard robotic benchmarks, including LIBERO and CALVIN, and transfers effectively to real-world dual-arm manipulation tasks. These results demonstrate the effectiveness of temporally grounded active visual processing for improving VLA performance in robotic sequential decision-making. The project page is available at https://liauto-dsr.github.io/AVA-VLA-Page.

2512.16184 2026-06-16 cs.LG 版本更新

A Multimodal Approach to Alzheimer's Diagnosis: Geometric Insights from Cube Copying and Cognitive Assessments

一种多模态阿尔茨海默病诊断方法：来自立方体复制和认知评估的几何洞察

Jaeho Yang, Kijung Yoon

发表机构 * Department of Electronic Engineering, Hanyang University（汉阳大学电子工程系）； Department of Artificial Intelligence, Hanyang University（汉阳大学人工智能系）

AI总结提出多模态框架，将手绘立方体草图转换为图结构，结合人口统计和神经心理测试分数，用于阿尔茨海默病分类，图表示优于像素模型，多模态融合提升性能。

AI中文摘要

早期可及的阿尔茨海默病检测仍是一个关键的临床挑战，而立方体复制任务提供了一种简单但信息丰富的视空间功能评估。本文提出一种多模态框架，将手绘立方体草图转换为捕获几何和拓扑属性的图结构表示，并将这些特征与人口统计信息和神经心理测试分数相结合，用于阿尔茨海默病分类。立方体绘图被建模为图，节点特征编码空间坐标、基于局部图元的拓扑和角度几何，通过图神经网络处理，并在后期融合模型中与年龄、教育程度和NPT特征融合。实验结果表明，基于图的表示提供了强大的单模态基线，并显著优于基于像素的卷积模型，而多模态集成进一步提高了平衡分类性能和判别能力。基于SHAP的可解释性分析确定了与角完整性和边缘连续性相关的特定图元基序作为关键预测因子，与阿尔茨海默病中立方体绘图扭曲的临床观察高度一致。总之，这些发现将基于图的立方体复制行为分析建立为一种可解释、非侵入且可扩展的阿尔茨海默病筛查框架。

英文摘要

Early and accessible detection of Alzheimer's disease (AD) remains a critical clinical challenge, and cube-copying tasks offer a simple yet informative assessment of visuospatial function. This work proposes a multimodal framework that converts hand-drawn cube sketches into graph-structured representations capturing geometric and topological properties, and integrates these features with demographic information and neuropsychological test (NPT) scores for AD classification. Cube drawings are modeled as graphs with node features encoding spatial coordinates, local graphlet-based topology, and angular geometry, which are processed using graph neural networks and fused with age, education, and NPT features in a late-fusion model. Experimental results show that graph-based representations provide a strong unimodal baseline and substantially outperform pixel-based convolutional models, while multimodal integration further improves balanced classification performance and discriminative ability. SHAP-based interpretability analysis identifies specific graphlet motifs associated with corner integrity and edge continuity as key predictors, closely aligning with clinical observations of distorted cube drawings in AD. Together, these findings establish graph-based analysis of cube-copying behavior as an interpretable, non-invasive, and scalable framework for Alzheimer's disease screening.

2512.18725 2026-06-16 cs.LG 版本更新

ML Inference Scheduling with Predictable Latency

具有可预测延迟的ML推理调度

Haidong Zhao, Nikolaos Georgantas

发表机构 * Inria（法国国家信息与自动化研究所）； Sorbonne University Paris（巴黎索邦大学）

AI总结针对ML推理中并发任务干扰导致调度不可预测的问题，提出细粒度动态干扰预测方法，提高GPU利用率的同时满足SLO。

Comments Accepted at MAIoT@Middleware 2025

DOI: 10.1145/3774901.3778066
Journal ref: Proceedings of the Middleware for Autonomous AIoT Systems in the Computing Continuum (MAIoT 2025)

AI中文摘要

机器学习（ML）推理服务系统可以调度请求以提高GPU利用率并满足服务级别目标（SLO）或截止时间。然而，提高GPU利用率可能会影响延迟敏感的调度，因为并发任务会竞争GPU资源，从而引入干扰。鉴于干扰效应在调度中引入不可预测性，忽略它们可能会影响SLO或截止时间的满足。尽管如此，现有的干扰预测方法在几个方面仍然有限，这可能限制它们在调度中的实用性。首先，它们通常是粗粒度的，忽略了运行时共置动态，从而限制了干扰预测的准确性。其次，它们倾向于使用静态预测模型，这可能无法有效应对不同的工作负载特征。在本文中，我们评估了现有干扰预测方法的潜在局限性，发现粗粒度方法可能导致预测精度的显著偏差，而静态模型在变化的工作负载下会显著退化。

英文摘要

Machine learning (ML) inference serving systems can schedule requests to improve GPU utilization and to meet service level objectives (SLOs) or deadlines. However, improving GPU utilization may compromise latency-sensitive scheduling, as concurrent tasks contend for GPU resources and thereby introduce interference. Given that interference effects introduce unpredictability in scheduling, neglecting them may compromise SLO or deadline satisfaction. Nevertheless, existing interference prediction approaches remain limited in several respects, which may restrict their usefulness for scheduling. First, they are often coarse-grained, which ignores runtime co-location dynamics and thus restricts their accuracy in interference prediction. Second, they tend to use a static prediction model, which may not effectively cope with different workload characteristics. In this paper, we evaluate the potential limitations of existing interference prediction approaches, finding that coarse-grained methods can lead to noticeable deviations in prediction accuracy and that static models degrade considerably under changing workloads.

2512.19643 2026-06-16 cs.LG cs.CE 版本更新

ANCHOR: Error-Controlled Adaptive Numerical Correction for Neural Operator Time Marching

ANCHOR: 神经算子时间推进的误差控制自适应数值校正

Rajyasri Roy, Dibyajyoti Nayak, Somdatta Goswami

发表机构 * Department of Civil and Systems Engineering, Johns Hopkins University（土木与系统工程系，约翰霍普金斯大学）

AI总结提出ANCHOR框架，通过基于物理信息的残差估计器自适应耦合预训练神经算子与经典数值求解器，实现非线性时变PDE的稳定长时预测，有效控制误差累积。

Comments 32 pages, 18 figures

AI中文摘要

时间相关偏微分方程（PDE）的数值模拟是科学和工程应用的核心，但对于长时间或时间紧迫的场景，高保真求解器往往成本过高。神经算子（NO）替代模型在参数和函数输入上提供快速推理；然而，大多数自回归NO框架仍然容易受到累积误差的影响，且集成平均指标对单个推理轨迹的保证有限。在实践中，误差累积在训练时间范围外可能变得不可接受，现有方法缺乏在线监测或校正机制。为解决这一问题，我们提出ANCHOR（高保真算子展开的自适应数值校正），一种在线、实例感知的混合推理框架，用于非线性时变PDE的稳定长时预测。ANCHOR将预训练NO作为主要推理引擎，并通过基于物理信息的残差误差估计器自适应地将其与经典数值求解器耦合。受数值分析中自适应时间步长的启发，ANCHOR监测归一化PDE残差的指数移动平均（EMA），以检测累积误差并在无需真实解的情况下触发校正求解器干预。我们表明，基于EMA的估计器与真实相对L2误差强相关，从而在推理过程中实现无数据、实例感知的误差控制。在六个经典PDE上的评估：一维和二维Burgers方程、二维Allen-Cahn方程、二维Cahn-Hilliard方程、二维Navier-Stokes方程和三维热传导方程，证明ANCHOR能够可靠地限制长时误差增长，稳定外推展开，并显著提高相对于独立神经算子的鲁棒性，同时保持比高保真数值求解器更高的效率。

英文摘要

Numerical simulation of time-dependent partial differential equations (PDEs) is central to scientific and engineering applications, but high-fidelity solvers are often prohibitively expensive for long-horizon or time-critical settings. Neural operator (NO) surrogates offer fast inference across parametric and functional inputs; however, most autoregressive NO frameworks remain vulnerable to compounding errors, and ensemble-averaged metrics provide limited guarantees for individual inference trajectories. In practice, error accumulation can become unacceptable beyond the training horizon, and existing methods lack mechanisms for online monitoring or correction. To address this gap, we propose ANCHOR (Adaptive Numerical Correction for High-fidelity Operator Rollouts), an online, instance-aware hybrid inference framework for stable long-horizon prediction of nonlinear, time-dependent PDEs. ANCHOR treats a pretrained NO as the primary inference engine and adaptively couples it with a classical numerical solver using a physics-informed, residual-based error estimator. Inspired by adaptive time-stepping in numerical analysis, ANCHOR monitors an exponential moving average (EMA) of the normalized PDE residual to detect accumulating error and trigger corrective solver interventions without requiring access to ground-truth solutions. We show that the EMA-based estimator correlates strongly with the true relative L2 error, enabling data-free, instance-aware error control during inference. Evaluations on six canonical PDEs: 1D and 2D Burgers', 2D Allen-Cahn, 2D Cahn-Hilliard, 2D Navier-Stokes, and 3D heat conduction, demonstrate that ANCHOR reliably bounds long-horizon error growth, stabilizes extrapolative rollouts, and significantly improves robustness over standalone neural operators, while remaining substantially more efficient than high-fidelity numerical solvers.
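
The monitoring loop described above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: the scalar ODE du/dt = -u stands in for a PDE, a deliberately biased exact step stands in for the neural operator, and the smoothing factor, threshold, and EMA reset rule are all assumed values.

```python
import math
from typing import Callable, List, Tuple


def rollout_with_ema_trigger(
    state: float,
    surrogate_step: Callable[[float], float],
    solver_step: Callable[[float], float],
    residual: Callable[[float, float], float],
    n_steps: int,
    alpha: float = 0.3,       # EMA smoothing factor (assumed)
    threshold: float = 0.11,  # EMA level that triggers the solver (assumed)
) -> Tuple[List[float], List[int]]:
    """Roll out the surrogate; redo a step with the solver when the EMA is too high."""
    ema = 0.0
    trajectory = [state]
    corrected_steps = []
    for k in range(n_steps):
        proposal = surrogate_step(state)
        ema = alpha * residual(state, proposal) + (1.0 - alpha) * ema
        if ema > threshold:
            # Discard the surrogate proposal and recompute this step numerically.
            proposal = solver_step(state)
            ema = residual(state, proposal)  # assumed reset rule
            corrected_steps.append(k)
        state = proposal
        trajectory.append(state)
    return trajectory, corrected_steps


DT = 0.1


def exact_step(u: float) -> float:
    """Reference solver for du/dt = -u over one step of size DT."""
    return u * math.exp(-DT)


def biased_step(u: float) -> float:
    """Stand-in surrogate: the exact step with a 2% multiplicative drift."""
    return 1.02 * exact_step(u)


def midpoint_residual(old: float, new: float) -> float:
    """Normalized residual of du/dt + u = 0, discretized at the step midpoint."""
    return abs((new - old) / DT + 0.5 * (new + old)) / (abs(old) + 1e-12)


hybrid, corrected = rollout_with_ema_trigger(
    1.0, biased_step, exact_step, midpoint_residual, n_steps=9
)
print("solver interventions at steps:", corrected)
```

The residual needs no ground-truth solution, only the governing equation, which is what makes this kind of trigger usable at inference time.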

2602.03957 2026-06-16 cs.LG cs.CY 版本更新

Temporal Validation Changes the Apparent Public-Health Utility of Under-Five Mortality Prediction in Bangladesh: A Four-Round DHS Machine-Learning Study

时间验证改变了孟加拉国五岁以下儿童死亡率预测的表观公共卫生效用：一项四轮DHS机器学习研究

Md Muhtasim Munif Fahim, M. Monimul Huq, M. Sabiruzzaman, Md Rezaul Karim

发表机构 * Data Science Research Lab, Department of Statistics, University of Rajshahi（数据科学研究实验室，统计学系，拉贾沙希大学）； Department of Statistics, University of Rajshahi（统计学系，拉贾沙希大学）

AI总结本研究通过四轮孟加拉国人口与健康调查数据，比较不同验证设计下机器学习模型预测五岁以下儿童死亡率的表现，发现时间验证设计比模型架构更显著影响公共卫生效用评估。

Comments 26 pages, 6 figures. Submitted to BMC Medical Informatics

AI中文摘要

背景：尽管国家层面取得进展，孟加拉国五岁以下儿童死亡率仍不均衡。基于DHS的预测模型可能指导针对性随访，但前提是验证方式能反映未来的实际使用。我们检验了验证设计如何改变表观预测性能。方法：分析了四轮BDHS（2011-2022年；33,962名儿童；1,290例死亡），使用26特征管道和三类模型，在四种验证方案下进行评估，其中包括跨调查时间验证（训练2011+2014，校准2017，测试2022）。通过遗传算法神经架构搜索（NAS）选择了32单元ELU多层感知器。AUROC使用2,000次bootstrap重采样；筛查效用采用固定容量下的敏感性、阳性预测值（PPV）和需筛查人数（NNS）衡量。结果：验证方案比模型类别更显著地改变公共卫生解释。NAS MLP的AUROC范围从0.669（仅2022年随机）到0.775（合并随机），时间验证AUROC为0.730。在时间验证的前10%阈值下，NAS识别出2022年355例死亡中的152例（敏感性42.8%，PPV 13.2%，NNS 7.6）。不同设计下的NNS范围从5.6到11.0。结论：验证方案的选择比架构更显著地改变筛查工作量和表观政策价值。时间验证支持对随访和转诊需求作出可靠估计；DHS儿童死亡率研究在项目使用前应报告敏感性、PPV和NNS。

英文摘要

Background: Under-five mortality in Bangladesh remains uneven despite national progress. DHS-based prediction models may guide targeted follow-up, but only if validation reflects future use. We examined how validation design changes apparent prediction performance. Methods: Four BDHS rounds (2011-2022; 33,962 children; 1,290 deaths) were analysed with a 26-feature pipeline and three model classes under four validation regimes, including cross-survey temporal validation (train 2011+2014, calibrate 2017, test 2022). A 32-unit ELU multilayer perceptron was selected via genetic-algorithm neural architecture search. AUROC used 2,000 bootstrap resamples; screening utility used sensitivity, PPV, and number needed to screen (NNS) at fixed capacity. Results: Validation regime altered public-health interpretation more than model class. NAS MLP AUROC ranged from 0.669 (2022-only random) to 0.775 (pooled random), with temporal AUROC 0.730. At the top-10% temporal threshold, NAS identified 152/355 deaths in 2022 (sensitivity 42.8%, PPV 13.2%, NNS 7.6). NNS across designs ranged from 5.6 to 11.0. Conclusions: Validation-regime choice changed screening workload and apparent policy value more than architecture. Temporal validation supports defensible estimates of follow-up and referral demand; DHS child-mortality studies should report sensitivity, PPV, and NNS before programmatic use.
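
The screening figures quoted for the temporal design follow from simple ratios. The sketch below assumes NNS is the reciprocal of PPV at the chosen threshold, which is consistent with the reported 13.2% and 7.6; the function names are illustrative and not from the paper.

```python
def sensitivity(true_positives: int, total_events: int) -> float:
    """Share of actual events that fall inside the flagged group."""
    return true_positives / total_events


def number_needed_to_screen(ppv: float) -> float:
    """Children followed up per death identified: the reciprocal of PPV (assumed definition)."""
    if not 0.0 < ppv <= 1.0:
        raise ValueError("ppv must be in (0, 1]")
    return 1.0 / ppv


# Values from the abstract: 152 of 355 deaths flagged, PPV of 13.2%.
print(f"sensitivity = {100 * sensitivity(152, 355):.1f}%")
print(f"NNS = {number_needed_to_screen(0.132):.1f}")
```

Read this way, the reported NNS range of 5.6 to 11.0 corresponds to roughly a twofold difference in follow-up workload per death identified, driven by validation design alone.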

2602.05060 2026-06-16 cs.LG cs.CL 版本更新

StagePilot: Stage-Level Planning for Long-Horizon Dialogue Simulation in Cybergrooming

StagePilot: 网络诱骗中长程对话模拟的阶段级规划

Heajun An, Qi Zhang, Minqian Liu, Xinyi Zhang, Sang Won Lee, Lifu Huang, Pamela J. Wisniewski, Jin-Hee Cho

发表机构 * Virginia Tech（弗吉尼亚理工大学）； University of California, Davis（加州大学戴维斯分校）； International Computer Science Institute（国际计算机科学研究所）

AI总结提出StagePilot框架，通过分离阶段级规划与响应生成，结合强化学习学习阶段策略，实现网络诱骗对话的结构化、连贯模拟，相比基线减少对话停滞，IQL+AWAC变体最终阶段到达率提升43%。

Comments Accepted at the 27th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2026)

AI中文摘要

网络诱骗是对青少年的一种不断演变的威胁，需要主动的教育干预。我们通过将对话进展建模为阶段式交互上的结构化规划问题来解决这一问题。我们提出StagePilot，一个将阶段级规划与响应生成分离的对话框架，其中模型在受约束的转换下选择下一阶段，并基于该阶段生成响应，从而实现连贯且逼真的进展。使用强化学习从离线数据中学习阶段级策略，优化情感对齐和目标一致进展。我们的实证实验表明，与基线相比，StagePilot生成更结构化、更连贯的对话轨迹，并减少对话停滞；值得注意的是，IQL+AWAC变体更频繁地到达最终阶段，同时保持超过70%的正面或中性响应，实现了43%的相对改进。

英文摘要

Cybergrooming is an evolving threat to youth, requiring proactive educational interventions. We address this by modeling dialogue progression as a structured planning problem over stage-wise interactions. We propose StagePilot, a dialogue framework that separates stage-level planning from response generation, in which the model selects the next stage under constrained transitions and generates responses conditioned on it, enabling coherent and realistic progression. Reinforcement learning is used to learn stage-level policies from offline data, optimizing for both emotional alignment and goal-consistent progression. Our empirical experiments show that StagePilot generates more structured, coherent dialogue trajectories and reduces conversational stagnation compared to baselines; notably, the IQL+AWAC variant reaches the final stage more often while maintaining over 70% positive or neutral responses, yielding a 43% relative improvement.

2602.16793 2026-06-16 cs.LG 版本更新

Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models

逃离认知陷阱：使用现成模型高效解决竞赛数学问题

Xingyu Dang, Rohit Agarwal, Rodrigo Porto, Anirudh Goyal, Liam H Fowl, Sanjeev Arora

发表机构 * Princeton University（普林斯顿大学）； Princeton Language and Intelligence（普林斯顿语言与智能）

AI总结提出一种推理流水线，利用现成模型以极低成本在IMO风格数学问题上达到最佳性能，通过猜想提取和上下文分离解决求解器-评分器流水线中的认知陷阱问题。

AI中文摘要

在过去一年中，定制和未公开的数学推理模型在国际数学奥林匹克竞赛（IMO）中达到了金牌水平。随后，使用公开可用的模型通过大规模推理也报告了类似的性能，但成本高昂（例如，每个问题3000美元）。在这项工作中，我们提出了一种推理流水线，在IMO风格的数学问题上以平均推理成本比竞争方法低几个数量级的情况下实现了最佳性能，同时仅使用通用现成模型。我们的方法基于对求解器-评分器流水线中评分器失败的见解，我们称之为认知陷阱（迭代优化收敛到错误解，而求解器和流水线的内部评分器认为该解基本正确）。我们的流水线通过猜想提取来解决这些失败模式，其中候选引理从生成的解中分离出来，并在新环境（上下文分离）中与其否定形式一起独立验证。在IMO-ProofBench Advanced（PB-Adv）上，我们的流水线使用Gemini 3.0 Pro达到了67.1%的性能，每个问题的平均成本约为31美元。在评估时，这代表了PB-Adv上公开和未公开模型中的最先进水平，并且成功率是下一个最佳公开可访问流水线的两倍以上，而成本仅为其一小部分。

英文摘要

In the past year, custom and unreleased math reasoning models reached gold medal performance on the International Mathematical Olympiad (IMO). Similar performance was then reported using large-scale inference on publicly available models but at prohibitive costs (e.g., 3000 USD per problem). In this work, we present an inference pipeline that attains best-in-class performance on IMO-style math problems at an average inference cost orders of magnitude below competing methods while using only general-purpose off-the-shelf models. Our method relies on insights about grader failure in solver-grader pipelines, which we call the Cognitive Well (iterative refinement converging to a wrong solution that the solver as well as the pipeline's internal grader consider to be basically correct). Our pipeline addresses these failure modes through conjecture extraction, wherein candidate lemmas are isolated from generated solutions and independently verified alongside their negations in a fresh environment (context detachment). On IMO-ProofBench Advanced (PB-Adv), our pipeline achieves 67.1 percent performance using Gemini 3.0 Pro with an average cost per question of approximately 31 USD. At the time of evaluation, this represented the state-of-the-art on PB-Adv among both public and unreleased models, and more than doubles the success rate of the next best publicly accessible pipeline, all at a fraction of the cost.

2602.17997 2026-06-16 cs.LG cs.RO 版本更新

Whole-Brain Connectomic Graph Model Enables Whole-Body Locomotion Control in Fruit Fly

全脑连接组图模型实现果蝇全身运动控制

Zehao Jin, Yaoye Zhu, Chen Zhang, Yanan Sui

发表机构 * Tsinghua University（清华大学）

AI总结提出Fly-connectomic Graph Model，将果蝇全脑连接组作为图结构控制器，通过深度强化学习驱动仿真果蝇运动，在多种任务中表现稳定且样本效率优于基线。

基于特征分析和图卷积神经网络（GCN）的脑电图（EEG）信号癫痫发作检测在不同频段的研究

Ferdaus Anam Jibon, Fazlul Hasan Siddiqui, F. Deeba, Gahangir Hossain

AI总结提出一种频率感知框架，将EEG分解为五个频段并提取判别特征，利用图卷积神经网络建模电极空间依赖，在CHB-MIT数据集上实现99.01%的宽带准确率，提高了可解释性和诊断精度。

Comments One author disagrees with the archiving

AI中文摘要

癫痫发作是一种神经系统疾病，其特征是大脑中异常和过度的电活动，导致反复发作事件。脑电图（EEG）信号因其能够捕捉时间和空间的神经动力学而被广泛用于癫痫诊断。虽然最近的深度学习方法取得了高检测准确率，但它们往往缺乏可解释性和神经生理学相关性。本研究提出了一种基于发作期EEG分析的频率感知框架用于癫痫发作检测。原始EEG信号被分解为五个频段（delta、theta、alpha、低beta和高beta），并从每个频段提取十一个判别特征。然后采用图卷积神经网络（GCN）对EEG电极之间的空间依赖性进行建模，电极表示为图节点。在CHB-MIT头皮EEG数据集上的实验表明，该方法在相应频段上分别达到了97.1%、97.13%、99.5%、99.7%和51.4%的准确率，总体宽带准确率为99.01%。结果突出了中频段的强判别能力，并揭示了特定频率的发作模式。与传统的宽带EEG方法相比，所提出的方法提高了可解释性和诊断精度。

英文摘要

Epileptic seizures are neurological disorders characterized by abnormal and excessive electrical activity in the brain, resulting in recurrent seizure events. Electroencephalogram (EEG) signals are widely used for seizure diagnosis due to their ability to capture temporal and spatial neural dynamics. While recent deep learning methods have achieved high detection accuracy, they often lack interpretability and neurophysiological relevance. This study presents a frequency-aware framework for epileptic seizure detection based on ictal-phase EEG analysis. The raw EEG signals are decomposed into five frequency bands (delta, theta, alpha, lower beta, and higher beta), and eleven discriminative features are extracted from each band. A graph convolutional neural network (GCN) is then employed to model spatial dependencies among EEG electrodes, represented as graph nodes. Experiments on the CHB-MIT scalp EEG dataset demonstrate high detection performance, achieving accuracies of 97.1%, 97.13%, 99.5%, 99.7%, and 51.4% across the respective frequency bands, with an overall broadband accuracy of 99.01%. The results highlight the strong discriminative capability of mid-frequency bands and reveal frequency-specific seizure patterns. The proposed approach improves interpretability and diagnostic precision compared to conventional broadband EEG-based methods.

2604.09361 2026-06-16 cs.LG 版本更新

Stochastic-Dimension Frozen Sampled Neural Network for High-Dimensional Gross-Pitaevskii Equations on Unbounded Domains

用于无界域上高维格罗斯-皮塔耶夫斯基方程的随机维度冻结采样神经网络

Zhangyong Liang

发表机构 * National Center for Applied Mathematics, Tianjin University（天津大学应用数学中心）； School of Mathematics and Statistics, Wuhan University（武汉大学数学与统计学院）； School of Mathematics and Statistics & Computational Sciences Hubei Key Laboratory, Wuhan University（武汉大学数学与统计学院及计算科学湖北省重点实验室）

AI总结提出SD-FSNN框架，结合预设高斯包络、冻结的单隐层采样神经网络与随机维度采样器，以无梯度方式求解无界域上的高维格罗斯-皮塔耶夫斯基方程，在高达1000维的问题上精度和效率优于PINNs、随机特征方法和张量网络方法。

AI中文摘要

本文介绍了一种名为随机维度冻结采样神经网络（SD-FSNN）的新计算框架，用于求解无界域上的高维格罗斯-皮塔耶夫斯基方程（GPE）。所提出的方法通过多种技术的协同组合，规避了困扰传统离散化方法的维数灾难，以及基于梯度的神经网络求解器的计算瓶颈。首先，预设的高斯包络编码了波函数的远场衰减，从而实现时空分离，其中空间近似由带有数据驱动采样特征的冻结单隐层神经网络完成。由此得到一种无梯度的表述：空间导数可解析地预先计算，时间依赖性则通过降阶常微分方程演化。其次，随机维度采样器在每个时间步只评估一小部分空间维度，给出空间算子的条件无偏估计，从而降低了计算和内存成本。离散守恒定律也被强制满足，确保了长期稳定性。大量数值实验表明，在高达1000维的GPE上，SD-FSNN的精度和效率显著高于当前最先进的方法，包括PINNs、随机特征方法和张量网络方法。结果证实SD-FSNN有效缓解了冻结基模型在结构化解流形上的Kolmogorov n-宽度障碍。

英文摘要

This paper introduces the Stochastic-Dimension Frozen Sampled Neural Network (SD-FSNN), a novel computational framework for solving the high-dimensional Gross-Pitaevskii equation (GPE) on unbounded domains. The proposed method circumvents the curse of dimensionality that plagues traditional discretizations and the computational bottlenecks of gradient-based neural network solvers through a synergistic combination of techniques. First, a prescribed Gaussian envelope encodes the far-field decay of the wavefunction, enabling a space-time separation where the spatial approximation is handled by a frozen, single-hidden-layer neural network with data-driven sampled features. This yields a gradient-free formalism where spatial derivatives are analytically precomputed and time-dependence is evolved via reduced ODEs. Second, a stochastic-dimension sampler provides a conditionally unbiased estimate of the spatial operator by evaluating only a small subset of spatial dimensions at each time step, essentially reducing computational and memory costs. Discrete conservation laws are also enforced, ensuring long-term stability. Extensive numerical experiments on the GPE in up to 1000 dimensions demonstrate that SD-FSNN achieves significantly higher accuracy and efficiency compared to state-of-the-art methods, including PINNs, randomized feature methods, and tensor-network approaches. The results confirm that SD-FSNN effectively mitigates the Kolmogorov $n$-width barrier for frozen-basis models on structured solution manifolds.
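
The idea behind a stochastic-dimension estimate can be illustrated on a Laplacian. The sketch below is an assumption-laden toy, not the paper's sampler: it draws k of d coordinates uniformly without replacement and rescales by d/k, and the function names and test function f(x) = Σ x_i⁴ are hypothetical.

```python
import itertools
import random
from typing import Callable, Sequence


def sampled_laplacian(
    second_derivative: Callable[[Sequence[float], int], float],
    x: Sequence[float],
    dims: Sequence[int],
) -> float:
    """Estimate sum_i d2f/dx_i2 from the coordinates in `dims`, rescaled by d / k."""
    d, k = len(x), len(dims)
    return (d / k) * sum(second_derivative(x, i) for i in dims)


def d2f(x: Sequence[float], i: int) -> float:
    """Second derivative of f(x) = sum_i x_i**4 along coordinate i."""
    return 12.0 * x[i] ** 2


x = [0.1, -0.4, 0.7, 0.2, -0.3]
full = sum(d2f(x, i) for i in range(len(x)))

# One random draw touches only 2 of the 5 coordinates.
one_draw = sampled_laplacian(d2f, x, random.sample(range(len(x)), 2))

# Averaging over every size-2 subset recovers the full Laplacian exactly,
# which is what unbiasedness under uniform subset sampling means.
subsets = list(itertools.combinations(range(len(x)), 2))
mean_estimate = sum(sampled_laplacian(d2f, x, s) for s in subsets) / len(subsets)
print(f"full = {full:.6f}, mean over subsets = {mean_estimate:.6f}")
```

Each draw costs k derivative evaluations instead of d, at the price of variance in any single estimate; how that variance is controlled in practice is specific to the paper and not modeled here.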

2604.09673 2026-06-16 cs.LG cs.AI 版本更新

Active Inference with a Self-Prior in the Mirror-Mark Task

镜像标记任务中带有自我先验的主动推理

Dongmin Kim, Hoshinori Kanazawa, Yasuo Kuniyoshi

发表机构 * The University of Tokyo（东京大学）； Laboratory for Intelligent Systems and Informatics（智能系统与信息学实验室）

AI总结提出一种基于自我先验的计算模型，通过主动推理驱动标记导向行为，无需外部奖励即可模拟镜像自我识别。

Comments 8 pages, 5 figures, Accepted to IEEE ICDL 2026

AI中文摘要

镜像自我识别测试评估受试者是否触摸仅在镜子中可见的自身标记，被广泛用作自我意识的指标。在本研究中，我们提出一个计算模型，其中这种行为通过单一机制（即自我先验）自发产生，无需任何外部奖励。自我先验通过Transformer实现，学习熟悉的多感官经验的密度；当出现新标记时，与所学分布的差异通过主动推理驱动标记导向行为。一个仅依赖视觉和本体感觉而无触觉输入的模拟婴儿，发现镜中自己脸上的贴纸并在约70%的情况下将其移除，无需任何明确指令。贴纸移除后预期自由能显著下降，证实自我先验可作为区分自我与非自我的内部标准。跨模态采样进一步表明，自我先验捕获视觉-本体感觉关联，充当概率身体图式。这些结果为镜像测试中观察到的关键行为提供了简洁的计算解释，并表明自由能原理可作为研究自我意识发展起源的统一假设。代码见：https://github.com/kim135797531/self-prior-mirror

英文摘要

The mirror self-recognition test evaluates whether a subject touches a mark on its own body that is visible only in a mirror, and is widely used as an indicator of self-awareness. In this study, we present a computational model in which this behavior emerges spontaneously through a single mechanism, the self-prior, without any external reward. The self-prior, implemented with a Transformer, learns the density of familiar multisensory experiences; when a novel mark appears, the discrepancy from this learned distribution drives mark-directed behavior through active inference. A simulated infant, relying solely on vision and proprioception without tactile input, discovered a sticker placed on its own face in the mirror and removed it in approximately 70% of cases without any explicit instruction. Expected free energy decreased significantly after sticker removal, confirming that the self-prior operates as an internal criterion for distinguishing self from non-self. Cross-modal sampling further demonstrated that the self-prior captures visual--proprioceptive associations, functioning as a probabilistic body schema. These results provide a concise computational account of the key behavior observed in the mirror test and suggest that the free energy principle can serve as a unifying hypothesis for investigating the developmental origins of self-awareness. Code is available at: https://github.com/kim135797531/self-prior-mirror

2605.04813 2026-06-16 cs.LG 版本更新

A Biased Nonnegative Block Term Tensor Decomposition Model for Dynamic QoS Prediction

一种用于动态QoS预测的有偏非负块项张量分解模型

Wenjing Liu, Yujia Lei, Qu Wang

发表机构 * GitHub

AI总结提出BNBT框架，采用有偏非负块项张量分解增强表示能力，引入线性偏置项并设计SLF-NMUT算法，在动态QoS预测中显著提升精度。

AI中文摘要

随着云计算和Web服务的快速发展，服务质量（QoS）已成为服务选择与推荐的关键标准。张量潜在特征分析为建模多维QoS数据提供了有效途径，现有大多数QoS预测方法主要基于规范多元分解（CP分解）或Tucker分解。然而，受限于其固有结构特性，这些方法无法准确捕捉用户-服务交互中复杂且动态的依赖关系，从而限制了预测性能。为解决此问题，本文提出一种基于有偏非负块项张量分解模型的动态QoS预测框架，称为BNBT。具体而言，该框架从三个方面进行构建：（1）采用块项张量分解增强潜在特征学习的表示能力；（2）引入线性偏置项以进一步提高预测精度；（3）设计一种面向张量的单元素依赖非负乘性更新算法SLF-NMUT，用于高效参数估计。在真实QoS数据集上的大量实验表明，所提出的BNBT框架在预测精度上持续优于多种先进的QoS预测方法。

英文摘要

With the rapid development of cloud computing and Web services, Quality of Service (QoS) has become a key criterion for service selection and recommendation. Tensor latent feature analysis provides an effective way to model multidimensional QoS data, and most existing QoS prediction methods are mainly based on Canonical Polyadic (CP) decomposition or Tucker decomposition. However, constrained by their inherent structural properties, these methods cannot accurately capture the complex and dynamic dependencies in user-service interactions, which limits their prediction performance. To address this issue, this paper proposes a dynamic QoS prediction framework based on the Biased Nonnegative Block Term Tensor Decomposition Model, termed BNBT. Specifically, the proposed framework is developed from three aspects: (1) block term tensor decomposition is employed to enhance the representation capability of latent feature learning; (2) linear bias terms are incorporated to further improve prediction accuracy; and (3) a tensor-oriented single-element-dependent nonnegative multiplicative update algorithm, called SLF-NMUT, is designed for efficient parameter estimation. Extensive experiments on real-world QoS datasets demonstrate that the proposed BNBT framework consistently outperforms several state-of-the-art QoS prediction methods in terms of prediction accuracy.

2605.23234 2026-06-16 cs.LG cs.CY 版本更新

Assessing Predictive Models for Fairness Based on Movement Patterns

基于移动模式评估预测模型的公平性

Francesco Lettich, Mario A. Nascimento, Chiara Pugliese, Chiara Renso

发表机构 * University of Padua（帕多瓦大学）

AI总结针对预测模型的空间公平性，提出将公平性概念从单一地理位置扩展到移动模式，并采用空间扫描统计方法检测基于移动模式的不公平性。

Comments 33 pages, 10 figures, 7 tables

AI中文摘要

评估预测模型的空间公平性涉及确定模型是否在统计上惩罚（或偏袒）与某些地理位置相关的个体。关于这一主题的文献基本假设每个个体被分配到一个单一地理位置（例如居住地）。然而，当考虑公平性时，个体所到过的位置集合（即他们在不同区域的移动模式）也很重要。因此，我们认为有必要将空间公平性的概念推广到包括移动模式，从而引出评估预测模型相对于个体移动的公平性的新问题。为了解决这个问题，我们提出了一种方法，首先将个体的移动与特定地理区域关联，考虑具有不同分辨率和对齐方式的多个空间划分，然后采用合适的空间扫描统计量来评估预测模型是否基于移动模式是公平的。在实验评估中，我们研究了该方法在数千个合成的不公平数据集上的性能，结果表明它能够有效检测这种新型的不公平性并检索受到不公平对待的对象集合，而定位性能表现出一致的多分辨率权衡。

英文摘要

Assessing the spatial fairness of predictive models involves establishing whether they are statistically penalizing (favoring) individuals associated with certain geographical locations. Literature on this topic makes the fundamental assumption that each individual is assigned to a single geographical location (e.g., place of residence). However, fairness with respect to the set of locations where one has been, i.e., their movement patterns over different regions, also matters when fairness is considered. Consequently, we argue that it is necessary to generalize the notion of spatial fairness to also include movement patterns, leading to the novel problem of assessing predictive models for fairness relative to the movements of individuals. To deal with this problem, we propose an approach that first associates the movements of individuals to certain geographic regions, considering multiple spatial partitions with different resolutions and alignments, and then employs a suitable spatial scan statistic to assess whether a predictive model is fair based on movement patterns. In the experimental evaluation, we study the performance of our approach over thousands of synthetic unfair datasets, showing that it is effective at detecting this new type of unfairness and at retrieving the set of objects treated unfairly, while localization performance exhibits a consistent multi-resolution trade-off.

2606.04145 2026-06-16 cs.LG cs.AI cs.DC 版本更新

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

EvalStop：利用世界反馈检测和纠正多租户RLHF平台中的奖励过度优化

Guilin Zhang, Chuanyi Sun, Kai Zhao, Xu Chu, Shahryar Sarkani, John M. Fossaceca

发表机构 * DeepMind, London, UK（DeepMind，英国伦敦）； University of Cambridge, UK（英国剑桥大学）； University of Washington, USA（美国华盛顿大学）

AI总结提出EvalStop调度原语，通过检测评估分数连续下降来终止作业、释放GPU并保留最佳检查点，以纠正奖励过度优化，在RLHF负载上实现高精度检测并提升JCT。

AI中文摘要

云LLM微调平台越来越多地承载RLHF工作负载，其中学习到的奖励模型被当作人类质量判断的代理来优化。正如Gao等人(2023)所示，在持续的优化压力下，该代理会与世界反馈（下游评估指标）发生偏离，这种现象称为奖励过度优化。现有的平台调度器忽略了这种偏离：非预知型（non-clairvoyant）调度器只优化作业完成时间（JCT）而不考虑任何质量信号；SLAQ式质量感知调度器使用训练损失，而这是一个更弱的代理，即使发生奖励投机（reward hacking）也会单调下降；经典的单作业早停则需要人工监控，且不会释放共享GPU。我们提出EvalStop，一个可组合的调度原语：当评估分数连续k次下降时终止作业，释放GPU，保留最佳检查点，并将其余调度决策委托给任意基础调度器。我们将调度器级别的早停视为检测问题，并在一个离散事件模拟器中对其进行评估；该模拟器的RLHF工作负载混合了奖励投机运行和结构上健康的运行，真实标签对调度器隐藏。在RLHF密集型负载（80% RLHF，64块GPU）上，EvalStop达到精确率98%、召回率99%、假阳性率1.5%，同时相比SRTF-Est将JCT改善9%，并将浪费的计算减少22%（p<0.05）。简单的固定进度基线和损失平台期基线要么在健康的RLHF作业上产生65%的假阳性率，要么漏掉一半以上的真实奖励投机案例。这些增益在所有测试过的基础调度器上均成立（JCT改善9-25%），且检测质量在评估噪声（噪声标准差≤0.05时精确率至少91%）和奖励投机基础率（投机比例为20-80%时精确率至少89%）变化下保持稳定。

英文摘要

Cloud LLM fine-tuning platforms increasingly serve RLHF workloads, where a learned reward model is optimized as a proxy for human quality. As Gao et al. (2023) showed, this proxy diverges from world feedback (downstream eval metrics) under sustained optimization pressure, a phenomenon known as reward overoptimization. Existing platform schedulers ignore this divergence: non-clairvoyant schedulers optimize JCT without any quality signal, SLAQ-style quality-aware schedulers use training loss (a weaker proxy that drops monotonically through hacking), and classical per-job early stopping requires human monitoring and does not free shared GPUs. We propose EvalStop, a composable scheduling primitive that terminates jobs on k consecutive eval-score declines, releases GPUs, preserves the best checkpoint, and delegates to any base scheduler. We frame scheduler-level early stopping as a detection problem and evaluate it in a discrete-event simulator whose RLHF workload mixes reward-hacking and structurally healthy runs, with ground-truth labels hidden from schedulers. On RLHF-heavy workloads (80% RLHF, 64 GPUs), EvalStop achieves precision 98% / recall 99% / FPR 1.5% while improving JCT by 9% and cutting wasted compute by 22% over SRTF-Est (p<0.05). Trivial fixed-progress and loss-plateau competitors either incur 65% FPR on healthy RLHF or miss over half of true hacking cases. Gains compose across every base scheduler tested (9-25% JCT) and detection quality stays stable under eval noise (precision at least 91% at noise std <= 0.05) and hacking base rate (precision at least 89% across 20-80% hacking fractions).
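
The stopping rule described in the abstract (terminate on k consecutive eval-score declines while remembering the best checkpoint) can be illustrated with a minimal Python sketch. The class name `ConsecutiveDeclineStopper`, its fields, and the example scores are hypothetical and are not taken from the paper; GPU release and delegation to a base scheduler are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConsecutiveDeclineStopper:
    """Hypothetical helper: stop a job after k consecutive eval-score declines."""
    k: int = 3
    best_score: float = float("-inf")
    best_step: Optional[int] = None
    _prev: Optional[float] = None
    _declines: int = 0

    def update(self, step: int, score: float) -> bool:
        """Record one eval score; return True when the job should be stopped."""
        # Track the best checkpoint seen so far so it can be preserved.
        if score > self.best_score:
            self.best_score, self.best_step = score, step
        # Count strictly consecutive declines; any non-decline resets the run.
        if self._prev is not None and score < self._prev:
            self._declines += 1
        else:
            self._declines = 0
        self._prev = score
        return self._declines >= self.k


# Made-up eval scores: they rise, then decline at every evaluation.
scores = [0.50, 0.55, 0.61, 0.60, 0.58, 0.57, 0.40]
stopper = ConsecutiveDeclineStopper(k=3)
stop_step = None
for step, s in enumerate(scores):
    if stopper.update(step, s):
        stop_step = step
        break
```

With these scores the third consecutive decline occurs at step 5, so the job stops there while the checkpoint from step 2 is the one kept.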

2606.05693 2026-06-16 cs.LG cs.IR 版本更新

MolE-RAG: Molecular Structure-Enhanced Retrieval-Augmented Generation for Chemistry

MolE-RAG：面向化学的分子结构增强检索增强生成

Joey Chan, Wonbin Kweon, Ashley Shin, Niharika Bhattacharjee, Pengcheng Jiang, Yue Guo, Jiawei Han

发表机构 * University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）； University of California, San Diego（加州大学圣地亚哥分校）

AI总结提出无需训练的分子中心检索增强生成框架MolE-RAG，通过整合检索文献、分子特定信息和结构相似分子三种上下文，显著提升LLM在分子性质预测任务中的性能。

AI中文摘要

大型语言模型（LLM）在分子性质预测方面展现出潜力，但其对化学结构的推理能力仍然有限，因为分子表示（如SMILES）与LLM主要训练的自然语言存在显著差异。为弥合这一语义和化学知识鸿沟，我们提出MolE-RAG，一种无需训练的、以分子为中心的检索增强生成框架，用于基于LLM的分子性质预测。MolE-RAG通过三种互补的推理时上下文来源增强每次预测：检索的化学文献、分子特定信息（包括化合物同义词、标识符、官能团注释和物理化学描述符），以及从训练集中检索的结构相似分子。我们使用专有、化学专用和开源LLM在九个分子性质预测任务上评估MolE-RAG。在通用LLM上，相比仅使用SMILES的基线，MolE-RAG在分类任务上将ROC-AUC提升最多28个百分点，并将回归RMSE降低最多67%。我们进一步发现，每种上下文来源的效用因模型和任务而异，不同模型分别从文本检索、分子上下文或结构检索中获益最多。这些结果表明，以分子为中心的检索可以在无需模型微调的情况下改进基于LLM的分子性质预测，同时为在推理时整合异构化学知识提供灵活框架。

英文摘要

Large language models (LLMs) have shown promise for molecular property prediction, but their ability to reason over chemical structures remains limited, as molecular representations such as SMILES differ substantially from the natural language on which LLMs are primarily trained. To bridge this semantic and chemical knowledge gap, we propose MolE-RAG, a training-free, molecule-centric retrieval-augmented generation framework for LLM-based molecular property prediction. MolE-RAG augments each prediction with three complementary sources of inference-time context: retrieved chemistry literature, molecule-specific information including compound synonyms, identifiers, functional group annotations, and physicochemical descriptors, and structurally similar molecules retrieved from the training set. We evaluate MolE-RAG across nine molecular property prediction tasks using proprietary, chemistry-specialized, and open-source LLMs. Across general-purpose LLMs, MolE-RAG improves ROC-AUC by up to 28 percentage points on classification tasks and reduces regression RMSE by up to 67% relative to a SMILES-only baseline. We further find that the utility of each context source varies across models and tasks, with different models benefiting most from textual retrieval, molecular context, or structural retrieval. These results suggest that molecule-centric retrieval can improve LLM-based molecular property prediction without model fine-tuning while providing a flexible framework for integrating heterogeneous chemical knowledge at inference time.
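
One of the three context sources, structurally similar molecules retrieved from the training set, can be sketched as a nearest-neighbour lookup. The abstract does not say which similarity measure is used; the sketch below assumes Tanimoto similarity over fingerprint bit sets, and the fingerprints, labels, and function names are toy placeholders rather than output from a real cheminformatics library.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_similar(query_fp: set, train: list, top_k: int = 2) -> list:
    """Return the top_k training records most similar to the query fingerprint."""
    ranked = sorted(train, key=lambda rec: tanimoto(query_fp, rec["fp"]), reverse=True)
    return ranked[:top_k]


def build_context(query_smiles: str, neighbors: list) -> str:
    """Format retrieved neighbours as plain-text context for an LLM prompt."""
    lines = [f"Query molecule: {query_smiles}", "Similar training molecules:"]
    lines += [f"- {n['smiles']} (label: {n['label']})" for n in neighbors]
    return "\n".join(lines)


# Toy training set: the bit sets are invented for illustration only.
train_set = [
    {"smiles": "CCN", "fp": {1, 2, 3, 5}, "label": "inactive"},
    {"smiles": "CCCO", "fp": {1, 2, 3, 4, 6}, "label": "active"},
    {"smiles": "c1ccccc1", "fp": {7, 8}, "label": "inactive"},
]
query_fp = {1, 2, 3, 4}
neighbors = retrieve_similar(query_fp, train_set, top_k=2)
context = build_context("CCO", neighbors)
```

Here the similarities are 0.8, 0.6, and 0.0, so the two retrieved neighbours are `CCCO` followed by `CCN`.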

2606.07226 2026-06-16 cs.LG cs.AI cs.CL 版本更新

DEFINED: A Data-Efficient Computational Framework for Fine-Grained Creativity Assessment in Debate Scenarios

DEFINED: 辩论场景中细粒度创造力评估的数据高效计算框架

Tongzhou Yu, Mingjia Li, Hong Qian, Wenkai Wang, Zongbao Zhang, Yaoyu Jiang, Xiangfeng Wang, Aimin Zhou, Jiajun Guo

发表机构 * Nanjing University（南京大学）； Shanghai Innovation Institute（上海创新研究院）； East China Normal University（华东师范大学）

AI总结提出DEFINED框架，通过层次化八维指标体系、预训练语言模型和混合粒度训练策略，在辩论场景中实现数据高效的细粒度创造力自动评估，优于现有方法。

Comments Accepted by KDD 2026

DOI: 10.1145/3770855.3817874

AI中文摘要

人类创造力已成为大语言模型时代的关键能力。在复杂、开放环境中评估创造力是数据挖掘领域的一大挑战，目前受限于对标准化简单任务的依赖以及细粒度专家数据的稀缺。作为生态有效的评估场景，辩论反映了创造力的多个维度，涵盖发散思维和收敛思维。此外，辩论是一个数据丰富的领域，拥有大量公开可获取的材料。当前主流的自动评分方法难以适应辩论等复杂场景，因此仍然依赖昂贵的人工评估。为此，本文提出DEFINED，一种数据高效的计算框架，用于辩论场景中的细粒度创造力评估。DEFINED通过层次化的八维指标体系操作化辩论创造力，采用预训练自回归语言模型，并配备支持细粒度和粗粒度评估的层次化评分头。从真实辩论比赛中获取陈述及其相关专家评分，并采用约束数据增强策略以解决原始数据中的精英偏差。DEFINED采用混合粒度训练策略，能够从训练有素的研究生专家提供的有限细粒度监督中实现鲁棒学习。为严格验证超越合成基准的生态效度，我们纳入了一项针对辩论新手参与者的实证研究，利用这些真实数据作为中低水平人群的定性案例研究。在我们的评估协议中，评分模型实现了准确且稳定的评分，优于基于提示的大语言模型评估器和现有的辩论评分方法。

英文摘要

Human creativity has emerged as a critical competency in the era of large language models. Assessing creativity in complex, open-ended environments is a grand challenge in data mining, currently hindered by a reliance on standardized simple tasks and the scarcity of fine-grained expert data. As an ecologically valid assessment context, debate reflects multiple dimensions of creativity, encompassing both divergent thinking and convergent thinking. Moreover, debate is a data-rich domain, with a large volume of publicly accessible materials. Current mainstream automated scoring methods are poorly suited to complex settings such as debate, and therefore still rely on costly human evaluation. To this end, this paper proposes DEFINED, a data-efficient computational framework for fine-grained creativity assessment in debate scenarios. DEFINED operationalizes debate creativity through a hierarchical eight-dimensional metric system, implemented via a pre-trained autoregressive language model with a hierarchical scoring head that supports both fine-grained and coarse-grained evaluation. Statements and their associated expert scores were obtained from authentic debate competitions, and a constrained data augmentation strategy was employed to address the elite bias inherent in the original data. DEFINED adopts a mixed-granularity training strategy enabling robust learning from limited fine-grained supervision annotated by trained graduate experts. To rigorously validate ecological validity beyond synthetic benchmarks, we incorporate an empirical study with debate-naive participants, utilizing these authentic data to serve as a qualitative case study for mid-to-low proficiency populations. Across our evaluation protocol, our scoring model achieves accurate and stable scoring, outperforming prompt-based large language model evaluators and existing debate scoring methods.

2606.08592 2026-06-16 cs.LG quant-ph 版本更新

Quantum Global Variational Learning for Quantum Error Correction

量子全局变分学习用于量子纠错

Shun Ryuzaki, Hideo Mukai

发表机构 * Meiji University（明治大学）

AI总结提出一种全局结构的量子神经网络，减少量子电路中酉矩阵数量，训练时间降低97%，训练完成率提升25%，实现100%训练成功率，纠错性能超越以往研究。

Comments 24 pages, 22 figures

2405.15768 2026-06-16 stat.ML cs.AI cs.LG 版本更新

Canonical Variates in Wasserstein Metric Space

Wasserstein度量空间中的典型变量

Jia Li, Lin Lin

发表机构 * Department of Statistics, The Pennsylvania State University（宾夕法尼亚州立大学统计学系）； Department of Biostatistics and Bioinformatics, Duke University（杜克大学生物统计学与生物信息学系）

AI总结针对分布数据分类问题，提出基于Wasserstein距离的Fisher比最大化降维方法，通过迭代优化算法实现，实验证明能显著提升分类性能。

Comments single space 39 pages, 10 figures

AI中文摘要

在本文中，我们处理由向量空间上的分布（而非单个点）表示的实例的分类问题。我们考虑基于成对距离的分类算法，特别是分布之间的Wasserstein度量。我们研究的核心是在Wasserstein度量空间中进行降维以提高分类准确性。我们引入了一种基于最大化Fisher比（定义为类间变异与类内变异之比）原理的新方法。该比值最大化的方向被称为判别坐标或典型变量轴。在实践中，类间变异和类内变异被定义为分布对之间的平均平方Wasserstein距离，这些分布对要么属于同一类，要么属于不同类。该比值优化通过一种迭代算法实现，该算法在向量空间中的最优传输和最大化步骤之间交替进行。进行了实证研究以评估算法的收敛性；实验结果表明，降维技术显著提高了分类性能。此外，新方法优于基于从分布数据派生的向量表示运行的成熟算法。它对实例如何由分布总结的变化（例如高斯混合模型表示中的分量数量）也表现出鲁棒性。

英文摘要

In this paper, we address the classification of instances represented by distributions on a vector space rather than single points. We consider classification algorithms based on pairwise distances, specifically, the Wasserstein metric between distributions. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the quotient of between-class variation to within-class variation. The directions in which this ratio is maximized are termed discriminant coordinates or canonical variates axes. In practice, both between-class and within-class variations are defined as the average squared Wasserstein distances between pairs of distributions, with the pairs either belonging to the same class or to different classes. This ratio optimization is achieved through an iterative algorithm, which alternates between optimal transport and maximization steps within the vector space. Empirical studies are conducted to assess the algorithm's convergence; and experimental results demonstrate that the dimension reduction technique substantially enhances classification performance. Moreover, the new method outperforms well-established algorithms that operate on vector representations derived from distributional data. It also exhibits robustness to variations in how instances are summarized by distributions, such as the number of components in a Gaussian mixture model (GMM) representation.
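
Fisher's ratio as defined in the abstract (average squared Wasserstein distance over between-class pairs divided by the same average over within-class pairs) can be illustrated in a simplified setting. The sketch below is not the authors' alternating algorithm: it assumes each instance is an equal-size point cloud and evaluates the ratio only after projecting onto a fixed direction `w`, where the squared 2-Wasserstein distance has a closed form via sorted samples. All function names and the synthetic data are assumptions.

```python
import itertools
import numpy as np


def w2_squared_1d(x, y):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples."""
    return float(np.mean((np.sort(x) - np.sort(y)) ** 2))


def fisher_ratio(clouds, labels, w):
    """Between-class over within-class mean squared W2 after projecting onto w."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    proj = [c @ w for c in clouds]
    between, within = [], []
    for i, j in itertools.combinations(range(len(clouds)), 2):
        d = w2_squared_1d(proj[i], proj[j])
        (within if labels[i] == labels[j] else between).append(d)
    return np.mean(between) / np.mean(within)


rng = np.random.default_rng(0)
# Two classes of point clouds whose means differ only along the first axis.
clouds = [rng.normal(loc=[m, 0.0], scale=1.0, size=(200, 2))
          for m in (0.0, 0.0, 0.0, 3.0, 3.0, 3.0)]
labels = [0, 0, 0, 1, 1, 1]
ratio_x = fisher_ratio(clouds, labels, [1.0, 0.0])  # discriminative direction
ratio_y = fisher_ratio(clouds, labels, [0.0, 1.0])  # uninformative direction
```

The ratio along the first axis is far larger than along the second, which is the behaviour a discriminant direction should show.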

2406.06855 2026-06-16 math.OC cs.LG 版本更新

Design and Scheduling of an AI-based Queueing System

基于AI的排队系统的设计与调度

Jiung Lee, Hongseok Namkoong, Yibo Zeng

发表机构 * Columbia University（哥伦比亚大学）

AI总结针对预测模型在服务系统中与人类服务器交互的场景，研究预测误差对拥塞成本的影响，提出一种基于索引的策略，在重流量下近最优地利用预测类别信息，并指导预测模型选择。

AI中文摘要

为了利用预测模型在服务系统中做出最优调度决策，我们必须理解预测误差如何通过影响其他作业延迟的外部性而导致拥塞。受预测模型与人类服务器（例如内容审核）交互的应用启发，我们考虑一个由多个单服务器队列组成的大型排队系统，其中作业的类别使用预测模型估计。通过刻画重流量下误预测对拥塞成本的影响，我们设计了一种基于索引的策略，该策略以近最优的方式整合预测类别信息。我们的理论结果通过提供一个以下游排队性能为核心关注的简单模型选择程序，指导了预测模型的设计，并为如何设计基于AI分诊的排队系统提供了新颖见解。我们基于真实在线评论的内容审核任务说明了我们的框架，其中通过微调大型语言模型构建毒性分类器。

英文摘要

To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising many single-server queues where the class of a job is estimated using a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure with downstream queueing performance as a central concern, and offer novel insights on how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by finetuning large language models.

2411.05824 2026-06-16 eess.IV cs.CV cs.LG 版本更新

Navigating Distribution Shifts in Medical Image Analysis: A Survey

应对医学图像分析中的分布偏移：综述

Zixian Su, Jingwei Guo, Xi Yang, Qiufeng Wang, Frans Coenen, Amir Hussain, Kaizhu Huang

发表机构 * Life Simulation Research Center, Beijing Academy of Artificial Intelligence（北京智源人工智能研究院生命模拟研究中心）； Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology（阿卜杜拉国王科技大学电气与数学科学与工程学部）； Department of Intelligent Science, School of Advanced Technology, Xi’an Jiaotong-Liverpool University（西交利物浦大学先进技术学院智能科学系）； Computer Science, School of Computer Science and Informatics, University of Liverpool（利物浦大学计算机科学与信息学学院）； SDAIA-KFUPM Joint Research Centre for Artificial Intelligence, King Fahd University of Petroleum and Minerals（法赫德国王石油与矿产大学SDAIA-KFUPM人工智能联合研究中心）； Nuffield Department of Primary Care Health Sciences, University of Oxford（牛津大学纳菲尔德初级保健健康科学系）

AI总结本文系统综述了应对医学图像分析中分布偏移的深度学习方法，按临床约束分类为联合训练、联邦学习、微调和域泛化，并揭示方法从显式对齐向不确定性建模的转变。

AI中文摘要

医学图像分析（MedIA）已成为现代医疗保健中不可或缺的一部分，增强了临床诊断和个性化治疗。尽管深度学习（DL）技术取得了显著进展，但其实际部署面临分布偏移带来的挑战，即基于特定数据集训练的模型在不同医院或患者群体的数据上表现不佳。为解决这一问题，研究人员积极开发策略以提高DL模型的适应性，使其能够在陌生环境中有效使用。本文系统综述了将DL技术应用于受分布偏移影响的MedIA系统的方法。我们并非按技术特征组织现有方法，而是明确将现实临床约束（如有限的数据可访问性、严格的隐私要求和异构协作协议）与能够解决这些约束的技术范式联系起来。通过建立操作约束与方法论演变之间的这种联系，我们将现有工作分类为联合训练、联邦学习、微调和域泛化，每种方法对应特定的医疗场景。除了这种分类，我们的实证分析表明，随着这些范式中域信息逐渐变得不可访问，性能改进变得越来越受限，并进一步揭示了方法论焦点从显式分布对齐向不确定性感知建模的逐渐转变，最终指向在实际MedIA中需要更多可部署性感知的设计。

英文摘要

Medical Image Analysis (MedIA) has become indispensable in modern healthcare, enhancing clinical diagnostics and personalized treatment. Despite the remarkable advancements supported by deep learning (DL) technologies, their practical deployment faces challenges posed by distribution shifts, where models trained on specific datasets underperform on others from varying hospitals, or patient populations. To address this issue, researchers have been actively developing strategies to increase the adaptability of DL models, enabling their effective use in unfamiliar environments. This paper systematically reviews approaches that apply DL techniques to MedIA systems affected by distribution shifts. Rather than organizing existing methods by technical characteristics, we explicitly bridge real-world clinical constraints -- such as limited data accessibility, strict privacy requirements, and heterogeneous collaboration protocols -- with the technical paradigms able to address them. By establishing this connection between operational constraints and methodological evolution, we categorize existing works into Joint Training, Federated Learning, Fine-tuning, and Domain Generalization, each aligned with specific healthcare scenarios. Beyond this taxonomy, our empirical analysis suggests that, as domain information becomes progressively less accessible across these paradigms, performance improvements become increasingly constrained, and further uncovers a gradual shift in methodological focus from explicit distribution alignment toward uncertainty-aware modeling, ultimately pointing to the need for more deployability-aware design in real-world MedIA.

2411.18714 2026-06-16 cs.RO cs.AI cs.LG 版本更新

Explainable deep learning improves human mental models of self-driving cars

可解释深度学习提升人类对自动驾驶汽车的心理模型

Eoin M. Kenny, Akshay Dharmavaram, Sang Uk Lee, Tung Phan-Minh, Shreyas Rajesh, Yunqing Hu, Laura Major, Momchil S. Tomov, Julie A. Shah

发表机构 * Computer Science & Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology（麻省理工学院计算机科学与人工智能实验室（CSAIL））； Motional AD Inc.（Motional AD公司）； Department of Psychology and Center for Brain Science, Harvard University（哈佛大学心理学系与脑科学中心）； Department of Aeronautics and Astronautics, Massachusetts Institute of Technology（麻省理工学院航空航天系）

AI总结提出概念包装网络（CW-Net），在真实自动驾驶车上实现可解释规划，通过因果性概念解释提升驾驶员对车辆行为的预测能力，尤其在意外场景中。

Comments MST & JAS contributed equally to this work

AI中文摘要

自动驾驶汽车越来越依赖深度神经网络来实现类人驾驶。这种黑箱规划器的不透明性使得准确预测其何时会失败变得具有挑战性，可能带来灾难性后果。尽管关于解释这些系统的研究激增，但由于实际部署的困难，大部分研究局限于模拟或玩具设置，使得这些技术的实际效用未知。在此，我们引入概念包装网络（CW-Net），一种忠实解释基于机器学习的规划器行为的方法，该方法在不牺牲性能的情况下，将其推理因果地扎根于人类可解释的概念。我们在真实自动驾驶车上部署CW-Net，并表明由此产生的解释改善了人类驾驶员对车辆的心理模型，使他们能够更好地预测其行为，特别是在意外情况下。这表明，集成到自动驾驶汽车中的可解释深度学习在现实部署环境中既易于理解又有用。我们预计我们的方法可以应用于其他安全关键系统，如自主无人机和机器人外科医生，以及其他架构，如端到端学习系统和视觉-语言-动作模型。总体而言，我们的研究为自主代理的可解释性建立了一条经过部署验证的路径，这可能有助于使其更加透明和安全。

英文摘要

Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. The opacity of such black-box planners makes it challenging to accurately anticipate when they will fail, with potentially catastrophic consequences. While research into interpreting these systems has surged, most of it is confined to simulations or toy setups due to the difficulty of real-world deployment, leaving the practical utility of such techniques unknown. Here, we introduce the Concept-Wrapper Network (CW-Net), a method for faithfully explaining the behavior of machine-learning-based planners that causally grounds their reasoning in human-interpretable concepts without sacrificing performance. We deploy CW-Net on a real self-driving car and show that the resulting explanations improve the human driver's mental model of the vehicle, allowing them to better predict its behavior, particularly in surprising situations. This demonstrates that explainable deep learning integrated into self-driving cars can be both understandable and useful in a realistic deployment setting. We anticipate our method could be applied to other safety-critical systems, such as autonomous drones and robotic surgeons, as well as to other architectures, such as end-to-end learning systems and vision-language-action models. Overall, our study establishes a deployment-validated pathway to interpretability for autonomous agents, which could help make them more transparent and safe.

2505.08774 2026-06-16 q-bio.BM cs.LG 版本更新

Generative Molecular Design with Steerable and Granular Synthesizability Control

具有可引导、细粒度可合成性控制的生成式分子设计

Jeff Guo, Víctor Sabanza-Gil, Olha Semenenko, Oleksii Hrabovskyi, Mykola Protopopov, Anna Kapeliukha, Oleksandr Mosia, Sofiia Hatych, Diana Alieksieieva, Tom Nelis, Patrick Molliet, Helena Solé-Àvila, Valentas Olikauskas, Nina Aregger, Irina Morozova, Joseph Schmidt, Zlatko Jončev, Olga Tarkhanova, Petro Borysko, Jerome Waser, Bruno Correia, Jeremy Luterbacher, Philippe Schwaller

发表机构 * Laboratory of Artificial Chemical Intelligence (LIAC)（人工化学智能实验室）； NCCR Catalysis（瑞士国家催化研究能力中心）； Laboratory of Sustainable and Catalytic Processing (LPDC)（可持续与催化加工实验室）； CHEMSPACE LLC ； Enamine Ltd.（Enamine有限公司）； Taras Shevchenko National University of Kyiv（基辅塔拉斯·舍甫琴科国立大学）； V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry（V. P. Kukhar生物有机化学与石油化学研究所）； Palladin Institute of Biochemistry（Palladin生物化学研究所）； Laboratory of Catalysis and Organic Synthesis (LCSO)（催化与有机合成实验室）； Laboratory of Protein Design and Immunoengineering (LPDI)（蛋白质设计与免疫工程实验室）

AI总结提出统一合成约束分子设计与超大规模虚拟筛选的生成框架，通过可引导、细粒度的可合成性控制，生成满足多参数优化目标且具有预测合成路径的分子，在BRD4和Wee1靶点上验证了有效性。

AI中文摘要

设计既具有最佳性质又易于合成的分子是药物发现中的核心挑战。现有考虑可合成性的工作可以联合输出生成分子的预测合成路线。然而，对合成难易程度以及如何灵活纳入所需反应约束的关注甚少。另一方面，虚拟筛选在可商购化合物中进行搜索，但在扩展到超大规模（十亿级及以上）化学空间时面临挑战。在这里，我们提出一个生成式设计框架，通过可引导、细粒度的可合成性控制，统一了合成约束分子设计与超大规模虚拟筛选。生成的分子满足任意多参数优化目标，其预测合成路线满足可自由组合的约束：包括或排除特定反应、纳入特定构建模块以及最小化合成路线长度。在针对BRD4的端到端内部研发项目中，我们设计了可用特定选定反应和构建模块合成的分子，合成了所有六个选定化合物，并鉴定了两个微摩尔级结合剂。我们进一步证明，反应控制能够有效导航超大规模按需合成化学空间，以识别性质最优的候选分子。通过将我们的框架应用于Chemspace的Freedom 4.0按需合成空间（1420亿分子），我们在单个消费级GPU（仅8 GB GPU内存）上生成了约32万分子（库的0.00023%），并在60个已合成的候选物中鉴定出一个微摩尔级Wee1结合剂。因此，单一统一框架能够生成新颖的可合成分子并检索目录就绪候选物，为缓解可合成性瓶颈提供了灵活解决方案。

英文摘要

Designing molecules that are both property-optimal and readily synthesizable is a central challenge in drug discovery. Existing works that do consider synthesizability can jointly output predicted synthesis routes for generated molecules. However, there has been minimal attention in addressing the ease of synthesis and with flexibility to incorporate desired reaction constraints. On the other hand, virtual screening searches for commercially available compounds, but imposes challenges when scaling to ultra-large (billion-size and beyond) chemical spaces. Here, we propose a generative design framework that unifies synthesis-constrained molecular design and ultra-large-scale virtual screening through steerable and granular synthesizability control. Generated molecules satisfy arbitrary multi-parameter optimization objectives with predicted synthesis routes satisfying mix-and-match constraints: including or avoiding certain reactions, incorporating specific building blocks, and minimizing synthesis route length. In an end-to-end in-house campaign targeting BRD4, we designed molecules synthesizable with specific selected reactions and building blocks, synthesized all six selected compounds, and identified two micromolar binders. We further demonstrate that reaction control enables efficient navigation of ultra-large make-on-demand chemical spaces to identify property-optimal candidates. By applying our framework to Chemspace's Freedom 4.0 make-on-demand space (142 billion molecules), we generated ~320k molecules (0.00023% of the library) on a single consumer-grade GPU (with only 8 GB GPU memory) and identified a micromolar Wee1 binder amongst 60 synthesized candidates. The single unified framework thus enables generating novel synthesizable molecules and retrieving catalogue-ready candidates, offering a flexible solution to mitigating the synthesizability bottleneck.

2506.20668 2026-06-16 cs.RO cs.LG 版本更新

DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy

DemoDiffusion: 使用预训练扩散策略的一次性人类模仿

Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani

发表机构 * Carnegie Mellon University（卡内基梅隆大学）

AI总结提出DemoDiffusion方法，通过单次人类演示和预训练扩散策略，无需任务特定训练即可使机器人执行操作任务，在8项任务中平均成功率达83.8%。

Comments 11 pages. Published at ICRA 2026

AI中文摘要

我们提出DemoDiffusion，一种简单的方法，使机器人能够通过模仿单次人类演示来执行操作任务，无需任务特定训练或配对的人-机器人数据。我们的方法基于两个见解。首先，人类演示中的手部运动为机器人的末端执行器轨迹提供了有用的先验，我们可以通过运动学重定向将其转换为粗略的开环机器人运动轨迹。其次，虽然这种重定向的运动捕捉了任务的整体结构，但它可能无法很好地与上下文中的合理机器人动作对齐。为了解决这个问题，我们利用预训练的通用扩散策略来修改轨迹，确保它既遵循人类运动，又保持在合理机器人动作的分布内。与基于在线强化学习或配对的人-机器人数据的方法不同，我们的方法能够以最小的努力稳健地适应新任务和场景。在涵盖8种不同操作任务的实际实验中，DemoDiffusion实现了83.8%的平均成功率，而预训练策略为13.8%，运动学重定向为52.5%，甚至在预训练通用策略完全失败的任务上也取得了成功。项目页面：https://demodiffusion.github.io/

英文摘要

We propose DemoDiffusion, a simple method for enabling robots to perform manipulation tasks by imitating a single human demonstration, without requiring task-specific training or paired human-robot data. Our approach is based on two insights. First, the hand motion in a human demonstration provides a useful prior for the robot's end-effector trajectory, which we can convert into a rough open-loop robot motion trajectory via kinematic retargeting. Second, while this retargeted motion captures the overall structure of the task, it may not align well with plausible robot actions in-context. To address this, we leverage a pre-trained generalist diffusion policy to modify the trajectory, ensuring it both follows the human motion and remains within the distribution of plausible robot actions. Unlike approaches based on online reinforcement learning or paired human-robot data, our method enables robust adaptation to new tasks and scenes with minimal effort. In real-world experiments across 8 diverse manipulation tasks, DemoDiffusion achieves 83.8% average success rate, compared to 13.8% for the pre-trained policy and 52.5% for kinematic retargeting, succeeding even on tasks where the pre-trained generalist policy fails entirely. Project page: https://demodiffusion.github.io/

2507.17804 2026-06-16 astro-ph.HE astro-ph.CO astro-ph.IM cs.LG hep-ph 版本更新

On the Energy Distribution of the Galactic Center Excess' Sources

银河系中心过量辐射源的能谱分布

Florian List, Yujin Park, Nicholas L. Rodd, Eve Schoen, Florian Wolf

发表机构 * Department of Astrophysics, University of Vienna（维也纳大学天体物理系）； Theory Group, Lawrence Berkeley National Laboratory（劳伦斯伯克利国家实验室理论组）； Berkeley Center for Theoretical Physics, University of California（加州大学伯克利分校理论物理中心）； University of California, Berkeley（加州大学伯克利分校）； Lawrence Berkeley National Laboratory（劳伦斯伯克利国家实验室）

AI总结利用基于神经网络模拟的推理方法联合分析空间和能谱数据，发现银河系中心过量辐射若由点源贡献，所需源数量比之前估计高两个数量级，支持其可能为暗物质湮灭产生的弥散辐射。

Comments 7+22 pages, 2+22 figures; v2: journal version

DOI: 10.1103/dkcq-6y4f

AI中文摘要

银河系中心过量辐射（GCE）可能预示着湮灭暗物质的发现。但与此结论相悖的分析表明，在发射的空间结构内存在暗弱点源的证据。由于技术限制，这些分析纯粹基于空间信息，丢弃了所有可能将过量辐射与天体物理背景区分开来的能谱信息。在这里，我们证明基于神经网络模拟的推理方法可以联合分析空间和能谱数据。这一改进意义深远：能量信息使假定的点源显著变暗，表明GCE本质上是弥散的，或者由异常大量的源组成。定量而言，对于我们的最佳拟合背景模型，过量辐射基本上与暗物质预测的泊松发射一致。如果由点源引起，我们的中值预测为$\mathcal{O}(10^5)$个源，或在90%置信度下超过35,000个，两者都比早期GCE点源分析所偏好的数百个源高出几个数量级，尽管背景系统学允许的变化可能将所需源数量减少大约一个数量级。

英文摘要

The Galactic Center Excess (GCE) may yet herald the discovery of annihilating dark matter. Weighing against that conclusion are analyses showing evidence for dim point sources within the spatial structure of the emission. Due to technical limitations these analyses are purely spatial with all spectral information that could disentangle the excess from astrophysical backgrounds discarded. Here, we demonstrate that a neural network simulation-based inference approach can jointly analyze the spatial and spectral data. The addition is profound: energy information drives the putative point sources to be significantly dimmer, indicating either the GCE is truly diffuse in nature or made of an exceptionally large number of sources. Quantitatively, for our best fit background model, the excess is essentially consistent with Poisson emission as predicted by dark matter. If due to point sources, our median prediction is $\mathcal{O}(10^5)$ sources, or more than 35,000 at 90% confidence, both orders of magnitude larger than the hundreds preferred by earlier point-source analyses of the GCE, although variations allowed by background systematics could reduce the required number of sources by roughly an order of magnitude.

2510.14092 2026-06-16 stat.ML cs.LG 版本更新

deFOREST: Fusing Optical and Radar satellite data for Enhanced Sensing of Tree-loss

deFOREST: 融合光学与雷达卫星数据增强树木损失感知

Julio Enrique Castrillon-Candas, Hanfeng Gu, Caleb Meredith, Yulin Li, Xiaojing Tang, Pontus Olofsson, Mark Kon

发表机构 * Department of Mathematics and Statistics, Boston University（波士顿大学数学与统计学系）； Department of Earth and Environment, Boston University（波士顿大学地球与环境系）； College of Integrated Science & Engineering, James Madison University（詹姆斯麦迪逊大学整合科学与工程学院）； NASA Marshall Space Flight Center（美国国家航空航天局马歇尔航天飞行中心）

AI总结提出融合光学与SAR数据的森林砍伐检测流程，利用离散KL展开残差空间构建异常图，结合HMM分类，在亚马逊区域验证混合方法优于现有技术且对稀疏光学数据更鲁棒。

DOI: 10.1109/TGRS.2026.3689741
Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 64, 2026, Art no. 4409213

AI中文摘要

本文开发了一个结合光学和合成孔径雷达（SAR）数据的森林砍伐检测流程。该流程的一个关键组成部分是利用离散Karhunen-Loève（KL）展开的残差空间构建光学数据的异常图。异常通过森林标称状态下残差分量分布的浓度界限来量化。该界限不需要关于数据分布的先验知识。这与假设知道数据分布的统计参数方法形成对比，这种假设不切实际，尤其对于高维数据（如我们的数据）不可行。一旦计算出光学异常图，它们与SAR数据结合，并通过隐马尔可夫模型（HMM）对森林状态进行分类。我们在亚马逊森林中一个$92\,km \times 92\,km$的区域使用Sentinel-1（SAR）和Sentinel-2（光学）数据测试了我们的方法。结果表明，混合光学-雷达方法和仅光学方法都实现了高精度，优于最新的混合方法。此外，在高度多云地区常见的光学数据稀疏情况下，混合方法显著更鲁棒。

英文摘要

In this paper we develop a deforestation detection pipeline that incorporates optical and Synthetic Aperture Radar (SAR) data. A crucial component of the pipeline is the construction of anomaly maps of the optical data, which is done using the residual space of a discrete Karhunen-Loève (KL) expansion. Anomalies are quantified using a concentration bound on the distribution of the residual components for the nominal state of the forest. This bound does not require prior knowledge of the distribution of the data. This is in contrast to statistical parametric methods that assume knowledge of the data distribution, an impractical assumption that is especially infeasible for high dimensional data such as ours. Once the optical anomaly maps are computed, they are combined with SAR data, and the state of the forest is classified by using a Hidden Markov Model (HMM). We test our approach with Sentinel-1 (SAR) and Sentinel-2 (Optical) data on a $92\,km \times 92\,km$ region in the Amazon forest. The results show that both the hybrid optical-radar and optical only methods achieve high accuracy that is superior to the recent state-of-the-art hybrid method. Moreover, the hybrid method is significantly more robust in the case of sparse optical data that are common in highly cloudy regions.
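
The residual-space idea can be illustrated with a small NumPy sketch: fit a truncated KL (principal component) basis on nominal data, then score new samples by the energy left outside that basis. The abstract does not give the paper's concentration bound, so the threshold below uses Markov's inequality as a stand-in distribution-free bound. That choice, the function names, and the synthetic data are all assumptions, and the SAR fusion and HMM stages are omitted.

```python
import numpy as np


def fit_kl_basis(X, n_components):
    """Estimate the mean and leading KL (PCA) basis vectors from nominal data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]  # rows of vt are orthonormal


def residual_energy(X, mean, basis):
    """Squared norm of each sample's component outside the truncated basis."""
    centered = X - mean
    recon = centered @ basis.T @ basis
    return np.sum((centered - recon) ** 2, axis=1)


rng = np.random.default_rng(0)
# Synthetic nominal data lying close to a 2-D subspace of R^10.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
nominal = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

mean, basis = fit_kl_basis(nominal, n_components=2)

# Markov's inequality: P(R >= t) <= E[R] / t for nonnegative R, so choosing
# t = E[R] / alpha bounds the false-alarm probability by alpha.
alpha = 0.05
threshold = residual_energy(nominal, mean, basis).mean() / alpha

anomaly = nominal[0] + 5.0 * rng.normal(size=10)  # pushed off the subspace
flag_nominal = bool(residual_energy(nominal[:1], mean, basis)[0] > threshold)
flag_anomaly = bool(residual_energy(anomaly[None, :], mean, basis)[0] > threshold)
```

The Markov threshold is deliberately loose. It only shows how a bound can be set without assuming a parametric form for the residual distribution.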

2511.22486 2026-06-16 physics.plasm-ph cs.LG 版本更新

The Machine Learning Approach to Moment Closure Relations for Plasma: A Review

等离子体矩闭包关系的机器学习方法：综述

Samuel Burles, Enrico Camporeale

发表机构 * School of Physical and Chemical Sciences, Queen Mary University of London（伦敦大学女王学院物理与化学科学学院）； Space Weather TREC, University of Colorado（科罗拉多大学空间天气TREC）

AI总结本文综述了机器学习方法在等离子体流体模型中发展改进闭包模型的研究，涵盖神经网络代理和方程发现两类方法，并讨论了离线测试与在线模拟的挑战及未来方向。

Comments 58 pages, 6 figures

AI中文摘要

大规模等离子体全局模拟的需求是空间和实验室等离子体物理学中持续存在的挑战。任何基于流体模型的模拟都固有地需要高阶等离子体矩的闭包关系。本综述汇编并分析了近期涌现的机器学习方法，这些方法旨在开发改进的等离子体闭包模型，能够在等离子体流体模型中捕捉动力学现象。我们调查了两类方法：神经网络代理（从多层感知器到傅里叶神经算子，后者最近在流体求解器内在线复现了线性和非线性朗道阻尼）和方程发现方法（如稀疏回归）；并根据这些研究是离线对照参考数据测试还是在线在时间演化求解器内测试进行组织。我们概述了与机器学习闭包相关的挑战，包括非对角压力张量精度、超出训练分布的泛化能力以及稳定集成到大尺度模拟中，并指出了未来研究可能解决这些问题的方向。

英文摘要

The requirement for large-scale global simulations of plasma is an ongoing challenge in both space and laboratory plasma physics. Any simulation based on a fluid model inherently requires a closure relation for the high-order plasma moments. This review compiles and analyses the recent surge of machine learning approaches to developing improved plasma closure models capable of capturing kinetic phenomena within plasma fluid models. We survey two methodological families: neural-network surrogates (from multilayer perceptrons to Fourier neural operators, the latter recently reproducing both linear and non-linear Landau damping online within a fluid solver) and equation-discovery methods such as sparse regression; and organise the studies by whether they are tested offline against reference data or online within a time-evolving solver. We outline the challenges associated with machine-learning closures, including off-diagonal pressure-tensor accuracy, generalisation beyond the training distribution, and stable integration into large-scale simulations, and the directions future research might take to address them.

2601.20875 2026-06-16 stat.AP cs.LG econ.EM stat.ME stat.ML 版本更新

Drivers, Receivers, and Dynamic Linkages: The Directed Structure of SDG Interdependence, 2000--2024

驱动者、接收者与动态联系：可持续发展目标相互依赖的有向结构，2000-2024

Md Muhtasim Munif Fahim, Md Jahid Hasan Imran, Md. Naim Molla, Luknath Debnath, Tonmoy Shil, Ehsanul Bashar Pranto, Md Mostafizur Rahman Likhon, Md Shafin Sanyan Saad, Md. Rezaul Karim

发表机构 * Data Science Research Lab, Department of Statistics, University of Rajshahi（数据科学研究实验室，统计学系，拉贾沙希大学）

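AI summary: Using panel Granger non-causality tests and local projections, the paper analyses the directed interdependence network of the 17 Sustainable Development Goals across 114 countries from 2000 to 2024. It finds 84 significant linkages (40 synergies, 44 trade-offs) and fragile driver-receiver rankings, with peace and strong institutions as a net receiver and poverty reduction as an effect-size-weighted driver.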

Comments 27 pages, 5 figures. Panel Granger non-causality and local projections on 114 countries (2000-2024). Submitted to Sustainability Science

AI中文摘要

财政和行政能力有限的政府需要知道哪些可持续发展目标（SDGs）通过目标系统传播进展以及传播速度有多快。我们利用2000年至2024年每年观测的114个国家的平衡面板数据，绘制了所有17个目标的有向相互依赖结构。目标序列具有持续性、趋势性和横截面依赖性，因此我们应用了两种适用于该机制的估计量：对一阶差分序列运行的Dumitrescu-Hurlin面板格兰杰非因果性检验，以恢复有向交互网络；以及具有Driscoll-Kraay标准误的面板局部投影，以测量31个理论推导的指标联系的动态幅度。在272个有向目标对中，84个联系通过了错误发现控制（40个协同，44个权衡；网络密度0.31）。协同和权衡以相当的强度出现，因此没有单一目标表现为通用加速器，目标层级本身也很脆弱。驱动者-接收者排名在滞后阶数和中心性指标上弱相关，并且在国家自助法下只有两个角色与零可区分：和平与强大机构作为最清晰的净接收者，以及减贫作为最可能的效应量加权驱动者。支持的联系是动态的，在四到五年内累积：卫生设施和贫困改善是降低儿童死亡率的最强预测因子，教育-儿童健康关联在183个国家的独立世界发展指标数据中得到证实。这些结果警示基于排名的加速器政策，并支持基于通过组成指标监测的、有支持的时间滞后联系构建的自适应投资组合。

英文摘要

Governments with limited fiscal and administrative capacity need to know which Sustainable Development Goals (SDGs) propagate progress through the goal system and how quickly. We map the directed interdependence structure of all seventeen goals using a balanced panel of 114 countries observed annually from 2000 to 2024. The goal series are persistent, trending, and cross-sectionally dependent, so we apply two estimators matched to this regime: a Dumitrescu-Hurlin panel Granger non-causality test, run on first-differenced series, to recover the directed interaction network, and panel local projections with Driscoll-Kraay standard errors to measure the dynamic magnitude of 31 theory-derived indicator linkages. Of 272 directed goal pairs, 84 linkages survive false-discovery control (40 synergies, 44 trade-offs; network density 0.31). Synergies and trade-offs occur at comparable strength, so no single goal behaves as a universal accelerator, and the goal-level hierarchy itself is fragile. Driver-receiver rankings correlate weakly across lag orders and centrality metrics, and under a country bootstrap only two roles are distinguishable from zero: peace and strong institutions as the clearest net receiver, and poverty reduction as the most probable effect-size-weighted driver. The supported linkages are dynamic, accruing over four to five years: sanitation and poverty improvements are the strongest predictors of lower child mortality, and the education-child-health association is corroborated in independent World Development Indicators data across 183 countries. These results caution against rankings-based accelerator policy and support adaptive portfolios built on supported, time-lagged linkages monitored through constituent indicators.
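As a quick arithmetic check on the figures in this abstract, and assuming that network density means supported linkages divided by all ordered pairs of distinct goals (a definition the abstract does not spell out):

```latex
\[
\text{density} \;=\; \frac{84}{17 \times 16} \;=\; \frac{84}{272} \;\approx\; 0.31,
\qquad 40 \text{ synergies} + 44 \text{ trade-offs} = 84 .
\]
```

Under that assumption the reported pair count, linkage count, and density are mutually consistent.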

2602.07343 2026-06-16 cs.CV cs.AI cs.LG cs.RO 版本更新

RoTRAG: 基于经验法则推理的检索增强生成对话有害内容检测

Juhyeon Lee, Wonduk Seo, Junseo Koh, Seunghyun Lee, Haihua Chen, Yi Bu

发表机构 * Peking University（北京大学）； Enhans ； University of North Texas（北得克萨斯大学）

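AI summary: Proposes RoTRAG, a framework that retrieves external moral norms (Rules of Thumb, RoTs) to strengthen LLM-based harmful-content detection in multi-turn dialogue, grounding reasoning and classification in those norms; F1 improves by about 40% on average (relative) and distributional error falls by 8.4%.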

Comments Accepted by SIGIR-ICTIR 2026, Oral Presentation

DOI: 10.1145/3805713.3820397
Journal ref: Proceedings of the 2026 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR '26), July 25, 2026, Melbourne, VIC, Australia. ACM, New York, NY, USA, 12 pages

AI中文摘要

检测多轮对话中的有害内容需要对完整对话上下文进行推理，而非孤立的话语。然而，现有方法主要依赖模型内部的参数化知识，缺乏对外部规范性原则的明确依据。这常导致在社会细微语境下判断不一致、可解释性有限以及跨轮次冗余推理。为解决此问题，我们提出RoTRAG，一种检索增强框架，将简洁的人类编写的道德规范（称为经验法则，RoTs）融入基于LLM的有害性评估中。对于每一轮，RoTRAG从外部语料库中检索相关RoTs，并将其作为轮次推理和最终严重性分类的明确规范性证据。为提高效率，我们进一步引入一个轻量级二元路由分类器，决定新轮次是否需要基于检索的推理或可重用现有上下文。在ProsocialDialog和Safety Reasoning Multi Turn Dialogue上的实验表明，RoTRAG在有害分类和严重性估计上均持续优于竞争基线，在基准数据集上F1平均相对提升约40%，分布误差平均相对降低8.4%，同时在不牺牲性能的情况下减少冗余计算。

英文摘要

Detecting harmful content in multi-turn dialogue requires reasoning over the full conversational context rather than isolated utterances. However, most existing methods rely mainly on the model's internal parametric knowledge, without explicit grounding in external normative principles. This often leads to inconsistent judgments in socially nuanced contexts, limited interpretability, and redundant reasoning across turns. To address this, we propose RoTRAG, a retrieval-augmented framework that incorporates concise human-written moral norms, called Rules of Thumb (RoTs), into LLM-based harm assessment. For each turn, RoTRAG retrieves relevant RoTs from an external corpus and uses them as explicit normative evidence for turn-level reasoning and final severity classification. To improve efficiency, we further introduce a lightweight binary routing classifier that decides whether a new turn requires retrieval-grounded reasoning or can reuse existing context. Experiments on ProsocialDialog and Safety Reasoning Multi Turn Dialogue show that RoTRAG consistently improves both harm classification and severity estimation over competitive baselines, with an average relative gain of around 40% in F1 across benchmark datasets and an average relative reduction of 8.4% in distributional error, while reducing redundant computation without sacrificing performance.

2604.26963 2026-06-16 cs.OS cs.DC cs.LG cs.MA 版本更新

MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

MARS：面向异构智能体系统的高效自适应协同调度

Yifei Wang, Hancheng Ye, Yechen Xu, Cong Guo, Chiyue Wei, Qinsi Wang, Dongting Li, Tingjun Chen, Hai "Helen" Li, Danyang Zhuo, Yiran Chen

发表机构 * Duke University（杜克大学）

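AI summary: Proposes MARS, a co-scheduling system that globally coordinates GPU inference and CPU tool execution through a unified information stream, decouples admission from execution to prevent resource oversubscription, and uses an agent-centric scheduler to minimize end-to-end latency; experiments show latency reduced by up to 5.94x.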

Comments 14 pages, 13 figures. Preprint

AI中文摘要

大型语言模型（LLM）越来越多地被部署为自主智能体的执行核心，而非独立的文本生成器。智能体工作负载引发了时间上的转变，从单轮推理转向多轮LLM-工具循环，以及空间上的转变，从聊天规模的仅GPU执行转向仓库规模的GPU-CPU协同执行。因此，协调智能体执行的异构资源需求已成为一个关键的系统挑战。我们设计并实现了MARS，一个高效且自适应的协同调度系统，它在GPU-CPU耦合资源压力下全局协调异构智能体工作负载。通过统一信息流建立对GPU推理和CPU工具执行的全局可见性，MARS中的外部控制平面将准入与执行解耦，以防止异构资源过载。内部智能体中心调度器通过优先处理延迟敏感的延续，并仅在热恢复带来延迟收益时自适应保留KV缓存状态，进一步最小化端到端关键路径。我们的评估表明，MARS将端到端延迟降低高达5.94倍，同时保持接近最大的系统吞吐量。我们进一步将MARS作为OpenHands编码智能体框架的服务后端，通过加速端到端任务完成时间高达1.87倍，展示了其在现实世界中的有效性。我们的源代码在此https URL公开提供。

英文摘要

Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops, and a spatial shift from chat-scale, GPU-only execution to repository-scale, GPU-CPU co-located execution. Consequently, coordinating heterogeneous resource demands of agentic execution has emerged as a critical system challenge. We design and implement MARS, an efficient and adaptive co-scheduling system that globally coordinates heterogeneous agentic workloads under coupled GPU-CPU resource pressure. By establishing holistic visibility across GPU inference and CPU tool execution via a unified information stream, an external control plane in MARS decouples admission from execution to prevent heterogeneous resource oversubscription. An internal agent-centric scheduler further minimizes the end-to-end critical path by prioritizing latency-sensitive continuations and adaptively retaining KV cache state only when warm resumption yields a latency benefit. Our evaluations show that MARS reduces end-to-end latency by up to 5.94x while maintaining nearly maximal system throughput. We further integrate MARS as the serving backend for the OpenHands coding agent framework, demonstrating its real-world effectiveness by accelerating end-to-end task completion time by up to 1.87x. Our source code is publicly available at https://github.com/Afterglow231/MARS_preview .

2605.04998 2026-06-16 cs.SD cs.IR cs.LG 版本更新

Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation

流行与爵士混合比例对体裁自适应和弦生成的实证研究

Jinju Lee

发表机构 * PearlLeeStudio（pearllee studio）

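AI summary: This study varies the ratio of pop to jazz music used as rehearsal data for chord generation and finds that moderate pop rehearsal improves jazz prediction while preserving pop accuracy; it also corrects a checkpoint-selection error in the previous version.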

Comments Erratum: the released F1 checkpoint equals the Phase-0 pop baseline (full SHA-256 verified); min mixed validation loss selection kept the unadapted warmup epoch. Tables 4 and 5 are best epoch metrics; mix ratio conclusions hold. A corrected retrain (jazz only validation), ft-pop80-v2, reproduces across 3 seeds. v1 F2 row fixed. 3 figs, 5 tables. https://huggingface.co/PearlLeeStudio

2606.01110 2026-06-16 physics.geo-ph cs.LG quant-ph 版本更新

重新思考结构异常检测：从决策边界到投影算子

Alexander Bauer

发表机构 * Machine Learning Group, TU Berlin（柏林工业大学机器学习组）； BIFOLD, Berlin, Germany（柏林BIFOLD研究所）

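AI summary: To address the limitations of existing anomaly detection methods on manifold-supported data, the paper proposes a geometric perspective based on a projection operator, defines anomalies through the projection residual, unifies the interpretation of reconstruction-based methods, and improves performance.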

AI中文摘要

大多数现有的异常检测方法依赖于估计概率密度或学习封闭的决策边界，隐含地假设正常数据在环境空间中占据非零体积的区域。相比之下，结构异常检测考虑位于低维流形附近的数据，导致现有方法的归纳偏差与数据结构不匹配，常常导致性能下降。为了解决这种不匹配，我们引入了几何视角。具体来说，我们学习一个投影算子到正常样本的流形上，并定义一个样本为异常如果它被这个投影改变。这个公式自然地整合了流形支持数据的归纳偏差，并将异常检测重新表述为投影残差，从而解决了由退化分布建模引起的问题。值得注意的是，它通过用投影质量解释重建方法的成功和失败，提供了对基于重建方法的统一解释。特别是，它解释了投影对齐模型强大的泛化能力，作为向流形收缩行为的结果。此外，通过将异常检测与概率建模解耦，它减少了将罕见但正常的样本错误分类的趋势，这是现有方法广泛认可的局限性。实验上，我们证明了投影对齐方法实现了强大的性能，优于基于边界的方法，同时改进了现有的基于重建的方法。

英文摘要

Most existing anomaly detection methods rely on estimating a probability density or learning an enclosing decision boundary, implicitly assuming that normal data occupies a region of non-zero volume in the ambient space. In contrast, structural anomaly detection considers data that lies near a low-dimensional manifold, creating a mismatch between the inductive bias of existing methods and the structure of the data, often resulting in degraded performance. To address this mismatch, we introduce a geometric perspective. Specifically, we learn a projection operator onto the manifold of normal samples and define a sample as anomalous if it is altered by this projection. This formulation naturally integrates the inductive bias of manifold-supported data and reframes anomaly detection in terms of a projection residual, thereby resolving issues arising from modeling degenerate distributions. Notably, it provides a unifying interpretation of reconstruction-based methods by explaining their success and failure in terms of projection quality. In particular, it explains the strong generalization ability of projection-aligned models as a consequence of contraction behavior toward the manifold. Moreover, by decoupling anomaly detection from probabilistic modeling, it reduces the tendency to misclassify rare but normal samples, a widely recognized limitation of existing approaches. Empirically, we demonstrate that projection-aligned methods achieve strong performance, outperforming boundary-based methods while improving upon existing reconstruction-based approaches.

2606.15369 2026-06-16 cs.LG 新提交

Repeated Bilateral Trade: The Quest for Fairness

重复双边贸易：追求公平

François Bachoc, Roberto Colomboni, Emilie Kaufmann

发表机构 * University of Lille（里尔大学）； Institut Universitaire de France (IUF)（法国大学研究院）； School of Mathematics, University of Bristol（布里斯托大学数学学院）； Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189-CRIStAL（里尔大学、法国国家科学研究中心、法国国家信息与自动化研究所、中央理工-里尔高等电力学院，UMR 9189-CRIStAL）

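AI summary: Studies fairness in repeated bilateral trade, proposes a Rawls-to-Nash family of fair-gain objectives, and characterizes their optimal learning rates.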

AI中文摘要

我们从公平的角度研究重复双边贸易。每轮，一对新的卖方-买方到达，平台在观察交易者估值之前发布价格。只有当双方都接受价格时，交易才会发生。我们考虑的不是最大化贸易收益，而是寻求平衡分配所产生的盈余的平台。我们表明，自然的公平性要求导致了一个单参数的Rawls-to-Nash公平增益目标族，该目标族通过非正Hölder均值聚合卖方和买方的净收益而得到。与标准的贸易收益目标和先前工作中研究的Rawlsian公平增益目标不同，我们提出的目标引入了一种新的统计结构，其中期望奖励通过阈值反馈从二维奇异核积分恒等式中恢复。这导致了一个非标准的纯探索问题，其自然估计量是具有行列依赖和奇异权重的矩形双重和。假设卖方和买方估值序列独立同分布且具有任意未知边际分布，我们刻画了整个Rawls-to-Nash公平增益目标族的最优学习率，给出了匹配的固定置信度样本复杂度和遗憾界（最多相差多对数因子）。

英文摘要

We study repeated bilateral trade from a fairness perspective. At each round, a fresh seller-buyer pair arrives, and the platform posts a price before observing the traders' valuations. Trade occurs only if both agents accept the price. Rather than maximizing only the gain from trade, we consider platforms that seek balanced divisions of the generated surplus. We show that natural fairness desiderata lead to a one-parameter Rawls-to-Nash family of fair-gain objectives, obtained by aggregating the seller's and buyer's net gains through nonpositive Hölder means. Unlike the standard gain-from-trade objective and the Rawlsian fair-gain objective studied in prior work, our proposed objectives induce a new statistical structure in which expected rewards are recovered from threshold feedback through a two-dimensional singular-kernel integral identity. This leads to a nonstandard pure-exploration problem whose natural estimators are rectangular double sums with row-column dependence and singular weights. Assuming independent i.i.d. seller and buyer valuation sequences with arbitrary unknown marginals, we characterize the optimal learning rates for the whole Rawls-to-Nash family of fair-gain objectives, giving matching fixed-confidence sample-complexity and regret bounds up to polylogarithmic factors.
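To make the objective family in this abstract concrete, here is a minimal sketch of a nonpositive Hölder (power) mean of the two net gains. It rests on assumptions the abstract does not state: that the Rawlsian end is the minimum (exponent going to minus infinity), that the Nash end is the geometric mean (exponent 0), and that the seller accepts when the price is at least their valuation while the buyer accepts when it is at most theirs. Function names are hypothetical.

```python
import math


def holder_mean(seller_gain, buyer_gain, p):
    """Power mean of two nonnegative gains for an exponent p <= 0."""
    if p > 0:
        raise ValueError("this sketch only covers nonpositive exponents")
    if p == -math.inf:
        return min(seller_gain, buyer_gain)        # Rawlsian limit
    if seller_gain <= 0 or buyer_gain <= 0:
        return 0.0                                 # limit value when a gain vanishes
    if p == 0:
        return math.sqrt(seller_gain * buyer_gain)  # geometric (Nash-style) mean
    return ((seller_gain ** p + buyer_gain ** p) / 2) ** (1 / p)


def fair_gain(price, seller_value, buyer_value, p):
    """Fair gain of a posted price; zero when either side rejects the trade."""
    if not (seller_value <= price <= buyer_value):
        return 0.0
    return holder_mean(price - seller_value, buyer_value - price, p)
```

For example, `fair_gain(0.5, 0.2, 0.9, 0)` aggregates gains of 0.3 and 0.4 by their geometric mean, whereas a price outside the interval between the two valuations yields no trade and a fair gain of zero.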

2606.15420 2026-06-16 cs.LG cs.AI 新提交

Constitutional Value Potentials: reading and steering internal priority margins in language models

宪法价值潜力：读取和引导语言模型中的内部优先级边际

Tong Che, Rui Wu

发表机构 * NVIDIA Research（英伟达研究院）； Rutgers University（罗格斯大学）

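AI summary: Proposes Constitutional Value Potentials (CVP), which learn scalar potentials from hidden states to read a model's internal value-priority margins, in order to predict and intervene on value conflicts, with AUROC up to 0.95.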

AI中文摘要

宪法告诉语言模型应该重视什么，但很少有方法告诉我们它是否真的重视。遵守程度通过输出来判断，而输出证据在价值冲突中最脆弱，此时重要的不是模型提及哪个价值，而是它愿意牺牲哪个价值。我们提供证据表明，这种仲裁可以从结构化边际读出中的激活状态中读取。我们引入宪法价值潜力（CVP）。对于每个价值，我们从隐藏状态学习一个标量势：一种保存该价值的内部压力，其监督不是来自提示，而是来自独立评判者对模型自身响应实际保存了哪个价值的裁决。两个势的符号差就是优先级边际。宪法条款成为边际保持为正的主张，而单个监控分数在边际不为正时发出警报。该监控器预测冲突违规的AUROC高达0.95，优于强隐藏状态探针，并在三个Qwen2.5尺度上泛化到未见过的合成冲突。该信号在答案开始时出现，来自提示尾部和第一个响应令牌。早期读取该信号，可以揭示对抗性优先级攻击是否实际上已将模型推向违规，而不仅仅是提示看起来具有对抗性。相同的方向也支持干预测试：在选定的引导设置下，沿着价值方向移动会按预期方向改变评判的权衡。这些结果表明，一些与宪法相关的优先级可以作为激活空间中的边际访问，而不仅仅是输出行为。

英文摘要

A constitution tells a language model what to value, but little tells us whether it does. Adherence is judged from outputs, and output evidence is most fragile on value conflicts, where what matters is not which value a model mentions but which one it is willing to sacrifice. We provide evidence that this arbitration can be read from activations in a structured margin readout. We introduce Constitutional Value Potentials (CVP). For each value we learn a scalar potential from the hidden state: an internal pressure to preserve that value, supervised not by the prompt but by an independent judge's verdict on which value the model's own response actually preserved. The signed difference of two potentials is a priority margin. A constitutional clause becomes the claim that a margin stays positive, and a single monitor score flags when it does not. The monitor predicts conflict violations with AUROC up to 0.95, beats a strong hidden-state probe, and generalizes to held-out synthetic conflicts across three Qwen2.5 scales. The signal appears as the answer begins, from the prompt tail and first response token. Read this early, the same signal reveals whether an adversarial priority hack has actually pushed the model toward a violation, rather than only whether the prompt looks adversarial. The same directions also support intervention tests: under selected steering settings, moving along a value direction shifts judged trade-offs in the intended direction. Together, these results suggest that some constitution-relevant priorities are accessible as activation-space margins, rather than only as output behavior.
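The margin readout described in this abstract can be sketched as follows. The sketch assumes each potential is a linear readout of the hidden state, which the abstract does not confirm, and all names and numbers are hypothetical.

```python
def potential(weights, bias, hidden):
    """Scalar potential for one value, here a hypothetical linear readout."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias


def priority_margin(prioritized, other, hidden):
    """Signed difference between the potentials of two values."""
    return potential(*prioritized, hidden) - potential(*other, hidden)


def violates_clause(prioritized, other, hidden):
    """A clause 'prioritized outranks other' is flagged when the margin is not positive."""
    return priority_margin(prioritized, other, hidden) <= 0.0


# Toy (weights, bias) probes for two values and two toy hidden states.
safety = ([1.0, 0.0], 0.0)
helpfulness = ([0.0, 1.0], 0.0)
margin_ok = priority_margin(safety, helpfulness, [2.0, 0.5])
margin_bad = priority_margin(safety, helpfulness, [0.5, 2.0])
```

In this toy setup the first hidden state gives a positive margin and the second a negative one, so only the second would be flagged by the monitor.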

2606.16075 2026-06-16 cs.LG cs.CV 新提交

AME: A Multi-Type Contributor Attribution Framework in Generative AI Markets

AME：生成式AI市场中的多类型贡献者归属框架

Yang Shi, Songwen Pei, Yang Gao, Bingxue Zhang

发表机构 * University of Shanghai for Science and Technology（上海理工大学）； Fudan University（复旦大学）

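AI summary: For the problem of value allocation across multi-stage collaboration in generative AI, the paper proposes the AME framework, which integrates heterogeneous data contribution valuation, data rights mapping, and trustworthy execution to deliver low-cost value allocation consistent with human judgments.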

AI中文摘要

生成式AI通过异构贡献者（包括训练数据、基础模型、微调行为和提示）之间的多阶段协作实现价值创造。然而，如何公平分配数据价值仍未得到充分探索。本文将多阶段生成式AI价值分配定义为一个新的研究问题，并识别出三个核心挑战：异构数据贡献评估、数据权利映射和可信执行。我们提出AME（归属-映射-执行）框架，这是一个统一框架，将数据贡献评估、数据权利映射和可信执行整合到单个工作流中。实验结果表明，AME框架实现了与人类参考判断更一致的数据价值分配结果，同时保持低成本的可信执行。我们的工作为生成式AI数据市场中的价值评估和收益分配提供了初步基础。

英文摘要

Generative AI enables value creation through multi-stage collaboration among heterogeneous contributors, including training data, base models, fine-tuning behaviors, and prompts. However, how to allocate the resulting data value fairly remains largely unexplored. This paper formulates multi-stage generative AI value allocation as a new research problem and identifies three core challenges: heterogeneous data contribution valuation, data rights mapping, and trustworthy execution. We propose the AME (Attribution-Mapping-Execution) framework, which integrates data contribution valuation, data rights mapping, and trustworthy execution into a single workflow. Experimental results demonstrate that the AME framework achieves data value allocation outcomes more consistent with human reference judgments while maintaining low-cost trustworthy execution. Our work provides an initial foundation for value assessment and revenue allocation in generative AI data markets.

2606.16461 2026-06-16 cs.LG 新提交

Privacy from Symmetry: Orthogonally Equivariant Transformers for LLM Inference

对称性带来的隐私：用于大语言模型推理的正交等变Transformer

Alexander Yukhimchuk, Andrey Shulga, Mladen Kolar, Martin Takáč

发表机构 * MBZUAI（穆罕默德·本·扎耶德人工智能大学）； University of Southern California（南加州大学）

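AI summary: Because hidden representations in split inference can be recovered by nearest-neighbor search, the paper proposes orthogonal obfuscation and the ConjFormer architecture, which is O(d)-equivariant; without noise injection or heavy cryptography, top-10 token recovery drops from over 35% to at most 1.3% while perplexity rises by only 0.4%.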

AI中文摘要

本地运行大型语言模型通常不切实际，这促使将敏感文本的推理推向第三方提供商。拆分推理通过将令牌保留在客户端并仅发送隐藏表示来部分缓解这一问题，但这些表示仍可通过针对公共嵌入表的最近邻搜索恢复。我们提出一种正交混淆过程，其中客户端在传输前将嵌入乘以一个秘密正交矩阵。为了在任意旋转下实现正确推理，我们引入了ConjFormer，这是一种Transformer变体，通过轻量级归一化更改（标量RMSNorm）以及所有线性权重的块状正交共轭，实现精确的$\mathrm{O}(d)$-等变性。因此，服务器完全在旋转基中执行前向传播，并且从未观察到未旋转的隐藏状态。在PubMed上微调的GPT-2和Llama 3.2 1B模型上的实验表明，正交混淆消除了直接余弦最近邻反演，并将令牌恢复率从超过35%的前10名降至最多1.3%，而微调后困惑度仅增加0.4%。这些结果表明，在架构层面强制执行对称性可以为隐私保护的大语言模型推理提供一种实用的防御，无需噪声注入或繁重的密码学机制。

英文摘要

Running large language models locally is often impractical, pushing inference on sensitive text to third-party providers. Split inference partially mitigates this by keeping tokens on the client and sending only hidden representations, but these representations can still be recovered via nearest-neighbor search against the public embedding table. We propose an orthogonal obfuscation procedure in which the client multiplies embeddings by a secret orthogonal matrix before transmission. To enable correct inference under arbitrary rotations, we introduce ConjFormer, a transformer variant that is exactly $\mathrm{O}(d)$-equivariant via a lightweight normalization change (scalar RMSNorm) together with blockwise orthogonal conjugation of all linear weights. As a result, the server performs the full forward pass entirely in the rotated basis and never observes unrotated hidden states. Experiments on GPT-2 and Llama 3.2 1B models fine-tuned on PubMed show that orthogonal obfuscation eliminates direct cosine nearest-neighbor inversion and reduces token recovery from over 35% top-10 to at most 1.3%, while increasing perplexity by only 0.4% after fine-tuning. These results indicate that enforcing symmetry at the architectural level can provide a practical defense for privacy-preserving LLM inference without noise injection or heavy cryptographic machinery.
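The equivariance argument in this abstract can be checked numerically on a toy layer. The sketch below is an illustration under simplifying assumptions, not the paper's implementation: a single Householder reflection stands in for the secret orthogonal matrix, one full conjugation replaces the blockwise scheme, and the "model" is one linear map followed by a scalar-gain RMSNorm. All names are hypothetical.

```python
import math
import random


def random_reflection(dim, rng):
    """Build an orthogonal matrix Q = I - 2 v v^T / (v^T v) (a Householder reflection)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    vv = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv
             for j in range(dim)] for i in range(dim)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def scalar_rmsnorm(x, gain=1.0, eps=1e-6):
    """RMSNorm with one shared scalar gain, so it commutes with orthogonal maps."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [gain * v / rms for v in x]


def layer(weight, hidden):
    """Toy layer: linear map followed by scalar RMSNorm."""
    return scalar_rmsnorm(matvec(weight, hidden))


rng = random.Random(0)
dim = 4
q = random_reflection(dim, rng)                  # client-side secret
w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

w_conj = matmul(matmul(q, w), transpose(q))      # server-side weights Q W Q^T
server_out = layer(w_conj, matvec(q, x))         # the server only ever sees Q x
expected = matvec(q, layer(w, x))                # the plain output, expressed in the rotated basis
max_gap = max(abs(a - b) for a, b in zip(server_out, expected))
```

The gap is at floating-point level because Q W Qᵀ Q x = Q W x and an orthogonal map preserves the vector norm that RMSNorm divides by. A per-coordinate gain vector would break this, which is presumably why the abstract specifies a scalar RMSNorm.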

2606.16920 2026-06-16 cs.LG cs.AI 新提交

Demystifying Variance in Circuit Discovery of LLMs

揭示LLM电路发现中的方差

Frank Zhengqing Wu, Francesco Tonin, Volkan Cevher

发表机构 * Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland（信息与推理系统实验室（LIONS），洛桑联邦理工学院（EPFL），瑞士洛桑）

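AI summary: Studies resampling, rephrasing, and sample-wise variance in LLM circuit discovery; proposes CEAP to reduce resampling variance, attributes rephrasing variance to different templates activating different circuits, and traces sample-wise variance mainly to how unfaithfulness is defined.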

AI中文摘要

电路发现是机械可解释性中的关键技术，用于定位对执行给定任务至关重要的模型组件。尽管当前最先进的方法（EAP-IG）在（不）忠实性指标上表现良好，但它存在显著的变异性。这包括重采样方差（当我们用来自同一分布的新数据批次探测时电路发生变化）、重述方差（当提示被重新表述时发现的电路发生偏移）以及样本方差（具有低总体不忠实性的电路在单个样本上的不忠实性表现出大幅波动）。本文研究了这些方差的根源。我们证明了CEAP（我们新的电路发现方法，在理论上改进了EAP-IG）可以显著减轻重采样方差。我们进一步表明，重述方差是由于不同模板的提示倾向于激活模型中的不同电路。这使我们提出，可能很难找到一个全面的电路来解释和控制模型在任务上的行为，而该任务可以用无数模板表达，这表明LLM可能本质上难以操控。我们表明，稀疏性（据称能形成更紧凑和可解释的任务电路）无法解决这个问题。关于样本方差，我们认为它很大程度上是良性的：极差的不忠实性分数通常源于不忠实性的定义方式，而非测量电路的缺陷。我们表明，不忠实性的大小受选择性贡献缩放的影响，这是一种神经机制，解释了有时观察到的极差分数。

英文摘要

Circuit discovery is a key technique in mechanistic interpretability to pinpoint the model components that are crucial for performing a given task. Although the current state-of-the-art method (EAP-IG) performs well on the metric of (un)faithfulness, it suffers from substantial variability. This includes resampling variance, where the circuit changes when we probe with a new batch of data from the same distribution; rephrasing variance, where the discovered circuit shifts when the prompts are rephrased; and sample-wise variance, where a circuit with low population unfaithfulness exhibits large fluctuations in unfaithfulness across individual samples. This paper studies the roots of these variances. We demonstrate that CEAP, our new circuit discovery method that improves upon EAP-IG with a theoretical guarantee, can substantially lessen resampling variance. We further show that rephrasing variance arises because prompts with different templates tend to activate different circuits in the model. This leads us to argue that it may be challenging to find a comprehensive circuit that explains and controls the model's behavior on a task, which can be expressed in countless templates, suggesting that LLMs may be inherently hard to steer. We show that sparsity, which has been claimed to form more compact and interpretable task circuits, fails to solve this problem. Regarding sample-wise variance, we argue that it is largely benign: extremely poor unfaithfulness scores often stem from how unfaithfulness is defined, rather than from defects in the measured circuits. We show that the magnitude of unfaithfulness is affected by selective contribution scaling, a neural mechanism that accounts for the extremely poor scores sometimes observed.

2606.14977 2026-06-16 econ.EM cs.LG 交叉投稿

Identification and Inference for Algorithmic Frontiers with Selective Labels

选择性标签下的算法前沿识别与推断

Yiqi Liu, Francesca Molinari, Amilcar Velez

发表机构 * Department of Economics, Cornell University（经济系，康奈尔大学）

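AI summary: For settings where outcomes are observed for only some individuals, the paper develops identification and statistical inference tools for the fairness-accuracy frontier, including sharp identification regions under unrestricted selection, and point identification with debiased machine learning estimators under unconfoundedness.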

Comments 68 pages, 2 figures

2606.15390 2026-06-16 cs.CL cs.AI cs.LG 交叉投稿

Not All Skills Help: Measuring and Repairing Agent Knowledge

并非所有技能都有用：测量与修复智能体知识

Yixuan Wang, Yiyang Zhou, Yiming Liang, Congyu Zhang, Fuxiao Liu, Jiawei Zhou, Huaxiu Yao

发表机构 * UNC Chapel Hill（北卡罗来纳大学教堂山分校）； Purdue（普渡大学）； NVIDIA（英伟达）

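AI summary: Proposes ASSAY, a framework that measures each skill's causal contribution via randomized masking, separates skill generation from curation, and suppresses negative-effect skills at inference time, substantially improving LLM agents' task completion rates.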

Comments 18 pages, 5 figures

AI中文摘要

LLM智能体可以通过从经验中积累自然语言技能来改进，而无需更新权重，但当前系统将所有关于保留哪些技能以及如何应用它们的决策完全交由LLM判断。我们认为这混淆了两个不同的角色：从经验中生成技能是判断擅长的创造性行为，而决定该技能是否真正有帮助则需要跨多个任务的实证证据。通过随机掩码测量每个技能的因果贡献，我们发现技能库表现出普遍的因果异质性：单个技能通常在某些任务类型上有帮助，但在其他任务类型上有害，然而它们的相反效应在总体上相互抵消，使得全局筛选方法无法察觉。我们提出ASSAY，一个将生成与筛选分离的框架：它在小型开发集上计算每个技能的因果归因，离线重组技能库，并为每个测试任务抑制预测效应为负的技能。在跨越四个提供商的七个基础模型以及两个基准（AppWorld和tau-bench）上，ASSAY始终优于先前的技能筛选方法。在AppWorld最难的数据划分上，DeepSeek-V3实现了69.3%的任务目标完成率（相对提升47.4%），在所有已发表方法（包括权重调整方法）中达到了新的最先进水平。在tau-bench零售领域，GPT-4.1相对提升8.7%，在公开排行榜上超越了o4-mini、o1和GPT-4.5，且无需任何权重修改。消融实验将主要收益归因于每任务掩码，证实瓶颈在于推理时将技能与任务匹配，而非全局移除不良技能。代码已开源：https://github.com/aiming-lab/assay。

英文摘要

LLM agents can improve without weight updates by accumulating natural-language skills from experience, but current systems entrust every decision about which skills to keep and how to apply them to LLM judgment alone. We argue that this conflates two distinct roles: generating a skill from experience is a creative act that judgment handles well, while deciding whether that skill actually helps requires empirical evidence across many tasks. Measuring per-skill causal contributions via randomized masking, we find that skill libraries exhibit pervasive causal heterogeneity: individual skills routinely help on some task types while hurting on others, yet their opposing effects cancel in aggregate, making them invisible to global curation methods. We propose ASSAY, a framework that separates generation from curation: it computes a per-skill causal attribution on a small development set, restructures the library offline, and suppresses skills with negative predicted effect for each test task. Across seven base models spanning four providers and two benchmarks (AppWorld and tau-bench), ASSAY consistently improves over prior skill-curation approaches. On AppWorld's hardest split, DeepSeek-V3 achieves 69.3% task-goal completion (47.4% relative improvement), a new state of the art among all published methods including weight-tuned approaches. On tau-bench retail, GPT-4.1 improves by 8.7% relative, advancing past o4-mini, o1, and GPT-4.5 on the public leaderboard without any weight modification. Ablation traces the dominant gain to per-task masking, confirming that the bottleneck is matching skills to tasks at inference time, not removing bad skills globally. Code is available at https://github.com/aiming-lab/assay.
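The randomized-masking measurement in this abstract can be sketched as a difference-in-means estimate. This is a simplified illustration and not the ASSAY implementation: it pools all tasks rather than estimating effects per task type, and `toy_agent` and the skill names are hypothetical stand-ins for running a real agent.

```python
import random
from statistics import mean


def attribute_skills(skills, run_task, trials=200, keep_prob=0.5, seed=0):
    """Estimate each skill's effect as mean score with it minus mean score without it."""
    rng = random.Random(seed)
    scores_with = {s: [] for s in skills}
    scores_without = {s: [] for s in skills}
    for _ in range(trials):
        # Independently keep or mask every skill for this trial.
        active = {s for s in skills if rng.random() < keep_prob}
        score = run_task(active)
        for s in skills:
            bucket = scores_with if s in active else scores_without
            bucket[s].append(score)
    return {s: mean(scores_with[s]) - mean(scores_without[s]) for s in skills}


def toy_agent(active_skills):
    """Hypothetical stand-in for running the agent; returns a task score."""
    return 1.0 * ("check_api_docs" in active_skills) - 1.0 * ("always_retry" in active_skills)


effects = attribute_skills(["check_api_docs", "always_retry"], toy_agent)
kept = [s for s, effect in effects.items() if effect > 0]
```

Because each skill is masked independently at random, the difference in means is not confounded by which other skills happen to be present. Running the same estimate separately per task type would be one way to surface the opposing effects that the abstract says cancel in aggregate.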

2606.15482 2026-06-16 stat.ML cs.LG 交叉投稿

Ricci-Filtration: Boosting Retrieval-Augmented Generation Reranker to Query-Answer Tasks by Discrete Ricci Flow

Ricci-Filtration：通过离散Ricci流提升检索增强生成重排序器在查询-答案任务中的性能

Tian Qin, Wei-Min Huang

发表机构 * Tian Qin（田琴）； Wei-Min Huang（黄伟民）

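AI summary: Proposes Ricci-Filtration, a geometric reranker enhancement based on discrete curvature and Ricci flow that models the query and retrieved chunks as a network and uses curvature to filter noisy chunks, significantly improving RAG generation performance.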

详情

AI中文摘要

Ricci流是一种曲率引导的扩散过程，通过收缩高正曲率区域和扩张负曲率区域来变形空间。类似地，加权图上的离散Ricci流通过收缩正Ricci曲率的边和拉伸负Ricci曲率的边来修改边权重，有效增加簇之间的分离度。受这两项开创性工作的启发，我们提出了一种基于几何的RAG重排序增强方法，称为Ricci-Filtration。通过将输入查询和初始检索块建模为一个网络，其中输入查询和块作为节点，基于嵌入的成对关系定义初始图，Ricci-Filtration利用离散曲率和Ricci流评估每个块相对于用户查询的结构重要性。该系统首先根据块相对于查询的几何曲率过滤初始块；然后，重排序器处理剩余块以增强生成性能。我们从理论上证明，归一化离散Ricci流可以通过识别边权重的不同渐近行为来检测社区结构。这支持移除相对于查询节点具有大权重和负Ricci曲率的“噪声”文档块。大量实验证实，Ricci-Filtration在准确率、精确率、召回率和F1分数上优于几种基线重排序方法。此外，消融研究表明，Ricci-Filtration在各种设置下通常优于基线，突显了该框架在不同架构下的鲁棒性。

英文摘要

Ricci flow is a curvature-guided diffusion process that deforms space by shrinking regions of high positive curvature and expanding those with negative curvature. Similarly, discrete Ricci flow on weighted graphs modifies edge weights by shrinking edges with positive Ricci curvature and stretching those with negative Ricci curvature, effectively increasing the separation between clusters. Inspired by these two cornerstone works, we propose a geometry-based RAG reranker enhancement procedure called Ricci-Filtration. By modeling the input query and initial retrieved chunks as a network, where the input query and chunks serve as nodes and embedding-based pairwise relations define an initial graph, Ricci-Filtration leverages discrete curvature and Ricci flow to evaluate the structural importance of each chunk with respect to the user query. The system first filters the initial chunks based on their geometric curvature relative to the query; then, a reranker processes the remaining chunks to enhance generative performance. We theoretically prove that normalized discrete Ricci flow can detect community structures by identifying distinct asymptotic behaviors in edge weights. This supports the removal of "noisy" document chunks characterized by large weights and negative Ricci curvature relative to the query node. Extensive experiments confirm that Ricci-Filtration outperforms several baseline reranking methods in accuracy, precision, recall, and F1 scores. Furthermore, ablation studies demonstrate that Ricci-Filtration generally outperforms the baselines under various settings, highlighting the framework's robustness across different architectures.

2606.15485 2026-06-16 cs.CY cs.AI cs.HC cs.LG cs.SE 交叉投稿

The Perils of Agency: How Developers Perceive, Prioritize, and Address Risks in Agentic AI Products

代理的风险：开发者如何感知、优先级排序和应对代理型AI产品中的风险

Hao-Ping Lee, Jessica He, David Piorkowski, Thomas Serban von Davier, Jodi Forlizzi, Sauvik Das

发表机构 * University of California, Berkeley（加州大学伯克利分校）

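AI summary: A study of 35 industry developers finds that their perception of agentic AI risk is closely tied to agentic qualities such as autonomy and tool use, that they prioritize product and business risks, and that they lack mature controls, revealing a tension between agentic capability and risk control.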

AI中文摘要

代理型AI系统自主行动、使用工具、适应环境并在复杂的现实世界中运行。然而，这些相同的特性可能产生或加剧产品风险。我们研究了行业开发者（n=35）如何感知、优先级排序和应对其代理型AI产品中的风险。我们发现，开发者对风险的感知与使产品具有代理性的特性（如自主性、工具使用和现实世界中的使用）密切相关。开发者在考虑下游社会风险（如工作替代和最终用户隐私）之前，优先考虑产品和业务风险。这种优先级排序也影响了开发者缓解代理风险的能力和动机。最后，开发者缺乏用于控制代理风险的成熟手段，通常依赖于限制使代理有用的相同特性：例如，自主性和目标复杂性。这些发现揭示了代理型AI开发中能力与风险控制之间的张力：开发者需要应对由代理能力产生的风险，但目前他们在不限制代理功能的情况下应对这些风险的支持有限。

英文摘要

Agentic AI systems act autonomously, use tools, adapt to context, and operate in complex real-world environments. However, these same characteristics can create or exacerbate product risks. We studied how industry developers (n=35) perceive, prioritize, and address the risks in their agentic AI products. We found that developers' perceptions of risk were closely tied to the qualities that made the product agentic, such as autonomy, tool use, and usage in a real-world context. Developers prioritized product and business risks before considering downstream societal risks like job displacement and end-user privacy. This prioritization also impacted developers' ability and motivation to mitigate agentic risks. Finally, developers lacked mature controls for containing agentic risks, often relying on constraining the same characteristics that make agents useful: e.g., autonomy and goal complexity. These findings reveal a capability vs. risk control tension in agentic AI development: developers need to address risks that emerge from agentic capabilities, yet they currently have limited support for doing so without constraining agentic functionality.

2606.15521 2026-06-16 cs.CL cs.LG 交叉投稿

Emergent retokenization symmetry in large language models: phenomenology and applications

大型语言模型中涌现的重分词对称性：现象学与应用

Kanishk Jain, Matthew Day, Tankut Can

发表机构 * Department of Physics, Emory University（埃默里大学物理系）

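AI summary: Finds that retokenization symmetry partially emerges in large language models during training, uses retokenization experiments to probe sensitivity and robustness to semantically equivalent input representations, and proposes a new inference-time sampling strategy.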

AI中文摘要

分词引入了表示冗余：在固定词表下，每个字节串存在多种有效的分词编码（或切分方式），它们解码后得到相同的表面字符串。然而，给定提示词时，大多数语言模型的分词器通过返回规范切分打破了这种表示对称性。仅基于规范切分进行训练应会影响推理行为，且几乎没有理由期望模型在下游任务中尊重切分对称性。我们发现这种对称性在训练过程中部分涌现。本文通过实验探测这种涌现对称性，测试了分词组合理解、表示多样性和任务导向的基准性能。我们主要使用重分词，即在保持字节完全不变的情况下，将提示词的规范分词替换为另一种切分。相对于其他提示扰动，重分词异常干净，因为它隔离了切分效果而不改变语法、语义或表面形式。我们利用重分词研究预训练和后训练中对语义等价输入表示的敏感性和鲁棒性。此外，这种部分重分词对称性暗示了一个不同的推理时采样轴。温度采样通过模型的下一个词概率分布生成多样输出，而重分词通过语义等价的输入表示从模型内部计算生成多样性。我们发现，虽然这种重分词采样策略在简单问题上可能损害性能，但它也能恢复传统采样无法找到的解决方案。总体而言，我们的工作将重分词呈现为一种简单而强大的大型语言模型探测工具，揭示了组合理解和提示敏感性，并提供了一种新颖的采样策略。

英文摘要

Tokenization introduces representational redundancy: under a fixed token vocabulary, every byte string admits many valid token encodings, or segmentations, that decode to the same surface string. However, given a prompt, most language model tokenizers break this representational symmetry by returning a canonical segmentation. Training only on canonical segmentations should influence inference behavior, and there is little reason to expect models to respect segmentation symmetry on downstream tasks. We find that this symmetry partially emerges during training. Here, we probe this emergent symmetry through experiments testing token compositional understanding, representation diversity, and task-focused benchmark performance. We primarily use retokenization: replacing a prompt's canonical tokenization with an alternative segmentation while preserving its bytes exactly. Relative to other prompt perturbations, retokenization is unusually clean because it isolates segmentation effects without changing syntax, semantics or surface form. We use retokenization to study sensitivity and robustness to semantically identical input representations across pretraining and post-training. Moreover, this partial retokenization symmetry suggests a distinct inference-time sampling axis. While temperature sampling generates diverse outputs from the model using its next-token probability distribution, retokenization generates diversity from the model's internal computations through semantically equivalent input representations. We find that while this retokenization sampling strategy can hurt performance on easy problems, it can also recover solutions that conventional sampling does not find. Overall, our work presents retokenization as a simple yet powerful probe of large language models, shedding light on compositional understanding and prompt sensitivity, and offering a novel sampling strategy.
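
The redundancy described in this abstract can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' code: `segmentations` and `TOY_VOCAB` are hypothetical names, and the seven-entry vocabulary is a made-up stand-in for a real tokenizer vocabulary. It enumerates every way to split a string into vocabulary pieces, all of which decode to the same surface string.

```python
def segmentations(text, vocab):
    """Return every way to split `text` into pieces that all belong to `vocab`."""
    if not text:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            # Keep this prefix and recurse on the remainder.
            for rest in segmentations(text[end:], vocab):
                results.append([piece] + rest)
    return results


# Hypothetical toy vocabulary (an assumption for illustration only).
TOY_VOCAB = {"u", "n", "d", "o", "un", "do", "undo"}

all_segs = segmentations("undo", TOY_VOCAB)
for seg in all_segs:
    print(seg, "->", "".join(seg))
```

Under this toy vocabulary the string "undo" has five valid segmentations. A tokenizer would return only one of them as canonical; the other four are the kind of alternative inputs that retokenization feeds to the model.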

2606.15579 2026-06-16 cs.AI cs.LG cs.MA cs.SE 交叉投稿

Your Agent Has a Genome: Sequence-Level Behavioral Analysis and Runtime Governance of LLM-Powered Autonomous Agents

你的智能体有基因组：基于序列的LLM驱动自主智能体行为分析与运行时治理

Sidi Deng

发表机构 * Independent Researcher（独立研究员）

AI总结提出XEPV序列编码框架，将LLM智能体行为建模为基因组序列，通过n-gram挖掘发现P-X-P高风险模式，设计Governor三层干预系统，使成功率提升6.2%并减少44% token消耗。

Comments 16 pages, 15 figures, 12 tables

详情

AI中文摘要

我们提出基础序列分析框架，该框架将LLM驱动的自主智能体的运行时行为编码为使用四个字母的字母表的紧凑符号序列：X（探索）、E（执行）、P（规划）和V（验证）。借鉴基因组序列分析的类比，我们对从生产ReAct智能体系统收集的347条真实世界执行轨迹（跨越8天）应用n-gram模式挖掘、马尔可夫转移矩阵和点二列相关分析。我们的分析揭示：(1) 三元组P-X-P是唯一统计显著的高风险模式，使成功率降低10.4%；(2) P比率是成功的最强负预测因子（r=-0.256, p<0.0001）；(3) E→V转移概率仅为2.1%，表明存在系统性验证缺陷。基于这些发现，我们设计了Governor，一个三层运行时干预系统，包括规则引擎、统计累加器和基于卡方的阈值自适应器。在自然的部署前后评估中（N=101 vs. N=246），Governor使任务成功率绝对提升6.2%，同时平均token消耗减少44%。为验证跨系统通用性，我们将XEPV编码应用于SWE-bench上2000条公开SWE-agent轨迹，确认探索螺旋和E→V验证缺陷在独立系统中复现。我们概述了六个研究方向，包括基础序列语言模型、跨智能体行为指纹识别和奖励塑造，并发布开源工具包以促进可重复性。

英文摘要

We propose Base Sequence Analysis, a framework that encodes the runtime behavior of LLM-powered autonomous agents into compact symbolic sequences using a four-letter alphabet: X (Explore), E (Execute), P (Plan), and V (Verify). Drawing an analogy to genomic sequence analysis, we apply n-gram pattern mining, Markov transition matrices, and point-biserial correlation to 347 real-world execution traces collected from a production ReAct agent system over 8 days. Our analysis reveals that (1) the trigram P-X-P is the only statistically significant high-risk pattern, lowering success rate by 10.4%; (2) P-ratio is the strongest negative predictor of success (r=-0.256, p<0.0001); and (3) the E->V transition probability is only 2.1%, indicating a systemic verification deficit. Based on these findings, we design Governor, a three-layer runtime intervention system comprising a rule engine, a statistical accumulator, and a chi-square-based threshold adaptor. In a natural before/after deployment evaluation (N=101 vs. N=246), Governor achieves a +6.2% absolute increase in task success rate while simultaneously reducing average token consumption by 44%. To validate cross-system generality, we apply the XEPV encoding to 2,000 public SWE-agent trajectories on SWE-bench, confirming that exploration spirals and the E->V verification deficit replicate in an independent system. We outline six research directions including base sequence language models, cross-agent behavioral fingerprinting, and reward shaping, and release an open-source toolkit for reproducibility.
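
The n-gram mining and Markov transition analysis mentioned in this abstract can be sketched on toy data. The snippet below is an illustrative assumption, not the released toolkit: the function names and the three traces in `TOY_TRACES` are invented, and only the four-letter X/E/P/V alphabet comes from the abstract.

```python
from collections import Counter, defaultdict


def ngram_counts(traces, n):
    """Count every contiguous n-gram across all symbolic traces."""
    counts = Counter()
    for trace in traces:
        for i in range(len(trace) - n + 1):
            counts[trace[i:i + n]] += 1
    return counts


def transition_matrix(traces, alphabet="XEPV"):
    """Estimate first-order Markov transition probabilities P(next | current)."""
    pair_counts = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            pair_counts[current][nxt] += 1
    matrix = {}
    for a in alphabet:
        total = sum(pair_counts[a].values())
        # Rows with no observed outgoing transition are left as all zeros.
        matrix[a] = {b: (pair_counts[a][b] / total if total else 0.0)
                     for b in alphabet}
    return matrix


# Hypothetical traces: X = Explore, E = Execute, P = Plan, V = Verify.
TOY_TRACES = ["PXPXE", "PXEV", "XEEV"]

trigrams = ngram_counts(TOY_TRACES, 3)
probs = transition_matrix(TOY_TRACES)
print("P-X-P count:", trigrams["PXP"])
print("P(E -> V):", probs["E"]["V"])
```

On real traces, the trigram counts would be joined with per-trace success labels to find risky patterns, and a low `probs["E"]["V"]` would correspond to the verification deficit the abstract reports.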

2606.16407 2026-06-16 cs.CL cs.LG 交叉投稿

A Mechanistic Understanding of Pronoun Fidelity in LLMs

对大型语言模型中代词忠实性的机制理解

Katharina Trinley, Jesujoba O. Alabi, Dietrich Klakow, Vagrant Gautam

发表机构 * Saarland University（萨尔大学）； Heidelberg Institute for Theoretical Studies（海德堡理论研究所）

AI总结通过因果分析发现，代词忠实性由组实体绑定、近因偏差和刻板印象偏差三种因果子空间共同作用，解释了91-99.5%的行为。

详情

AI中文摘要

忠实且稳健的代词使用对于公平和连贯的生成至关重要，然而当多个指代对象使用不同代词时，大型语言模型大多会失败。为了研究推理、重复和偏差在此任务中的相互作用，先前的工作完全依赖行为方法，这可能无法反映模型的内部运作。因此，我们提供了关于代词忠实性的机制性、模型内部视角，测试了三种机制——组实体绑定（G）、近因偏差（R）和刻板印象偏差（S）——是否在多个SOTA语言模型中因果实现。使用无界分布式对齐搜索，我们发现三者作为因果子空间共存，分布在网络深度上。没有单一机制能完全解释模型行为，但三者的组合一致地解释了91-99.5%。注意力头分析进一步揭示了两种竞争的复制路径；组绑定和刻板印象共享一个局部化的概念级路径，检索绑定的职业-代词单元，而近因使用分布式的令牌级路径，重复表面形式。总之，代词忠实性源于同时活跃的因果子空间之间的竞争。

英文摘要

Faithful and robust pronoun use is important for fair and coherent generations, yet large language models largely fail when multiple referents use different pronouns. To study the interplay of reasoning, repetition, and bias in this task, prior work relies exclusively on behavioural approaches, which may not reflect a model's internal workings. Therefore, we provide a mechanistic, model-internal perspective on pronoun fidelity, testing whether three mechanisms -- group entity binding (G), recency bias (R), and stereotypical bias (S) -- are causally implemented across several SOTA language models. Using Boundless Distributed Alignment Search, we find all three coexist as causal subspaces distributed across network depth. No single mechanism fully explains model behaviour, but a combination of the three consistently accounts for 91-99.5%. An attention head analysis further reveals two competing copying routes; group binding and stereotype share a localized concept-level route that retrieves a bound occupation-pronoun unit, while recency uses a distributed token-level route that repeats surface forms. In sum, pronoun fidelity arises from competition between simultaneously active causal subspaces.

2606.16988 2026-06-16 cs.SE cs.LG 交叉投稿

成员推断攻击的因果评估

Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, Aurélien Bellet

发表机构 * Inria（法国国家信息与自动化研究所）； PreMeDICaL, Inserm, Montpellier, France（PreMeDICaL，法国国家健康与医学研究院，法国蒙彼利埃）； School of Computer and Communication Science (EPFL)（计算机与通信科学学院（EPFL））； School of Life Sciences (EPFL)（生命科学学院（EPFL））； Lausanne, Switzerland（瑞士洛桑）

AI总结将成员推断攻击评估视为因果推断问题，定义记忆化为包含数据点的因果效应，提出多轮、单轮和零轮设置下的实用估计器并验证其有效性。

Comments Fixed ref label problems

详情

AI中文摘要

成员推断攻击（MIA）旨在区分训练点（成员）和未见数据（非成员），并广泛用于量化记忆化和评估隐私风险。标准MIA评估需要重复训练，对于大型模型计算成本高昂。单轮（单次训练，随机数据包含）和零轮（事后评估）方法常被用作替代，但其统计有效性尚不清楚。我们通过将MIA评估框架化为因果推断问题来填补这一空白，将记忆化定义为在训练集中包含一个数据点的因果效应。这一新颖的表述揭示并形式化了现有协议中偏差的关键来源：单轮方法受到联合包含点之间的干扰，而零轮评估还受到成员与非成员评估数据之间分布偏移的混淆。我们推导了标准MIA指标的因果类比，并提出了多轮、单轮和零轮设置下的实用估计器，具有非渐近一致性保证。我们在多个设置中验证了我们的方法，包括预训练和微调的大型语言模型，表明它能够在无需重新训练且存在分布偏移的情况下可靠地测量MIA性能。总体而言，我们的框架为现代AI系统中的隐私评估提供了原则性基础。

英文摘要

Membership Inference Attacks (MIAs) aim to distinguish training points (members) from unseen data (non-members), and are widely used to quantify memorization and assess privacy risks. Standard MIA evaluation requires repeated retraining, which is computationally costly for large models. One-run (single training with randomized data inclusion) and zero-run (post hoc evaluation) methods are often used instead, but their statistical validity remains unclear. We address this gap by framing MIA evaluation as a causal inference problem, defining memorization as the causal effect of including a data point in the training set. This novel formulation reveals and formalizes key sources of bias in existing protocols: one-run methods suffer from interference between jointly included points, while zero-run evaluations are additionally confounded by distribution shift between member and non-member evaluation data. We derive causal analogues of standard MIA metrics and propose practical estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees. We validate our approach in several settings, including pretrained and fine-tuned LLMs, showing that it enables reliable measurement of MIA performance without retraining and under distribution shift. Overall, our framework provides a principled foundation for privacy evaluation in modern AI systems.

2602.09326 2026-06-16 cs.LG 版本更新

Priority-Aware Shapley Value

优先级感知的Shapley值

Kiljae Lee, Ziqi Liu, Weijing Tang, Yuan Zhang

发表机构 * arXiv

AI总结提出优先级感知Shapley值(PASV)，通过硬约束和软优先级权重扩展Shapley值，适用于依赖贡献者场景，并开发高效采样算法。

详情

AI中文摘要

Shapley值广泛用于模型无关的数据估值和特征归因，但隐含假设贡献者可互换。当贡献者存在依赖关系（例如，重用/增强数据或因果特征排序）或贡献应根据信任或风险等因素调整时，这可能存在问题。我们提出优先级感知Shapley值（PASV），它同时包含硬优先级约束和软、贡献者特定的优先级权重。PASV适用于一般优先级结构，将仅优先级和仅权重的Shapley变体作为特例恢复，并由自然公理唯一刻画。我们开发了一种高效的相邻交换Metropolis-Hastings采样器，用于可扩展的蒙特卡洛估计，并分析了由极端优先级权重引起的极限状态。在数据估值（MNIST/CIFAR10）和特征归因（Census Income）上的实验展示了更结构忠实的分配，并通过我们提出的“优先级扫描”进行了实用的敏感性分析。

英文摘要

Shapley values are widely used for model-agnostic data valuation and feature attribution, yet they implicitly assume contributors are interchangeable. This can be problematic when contributors are dependent (e.g., reused/augmented data or causal feature orderings) or when contributions should be adjusted by factors such as trust or risk. We propose Priority-Aware Shapley Value (PASV), which incorporates both hard precedence constraints and soft, contributor-specific priority weights. PASV is applicable to general precedence structures, recovers precedence-only and weight-only Shapley variants as special cases, and is uniquely characterized by natural axioms. We develop an efficient adjacent-swap Metropolis-Hastings sampler for scalable Monte Carlo estimation and analyze limiting regimes induced by extreme priority weights. Experiments on data valuation (MNIST/CIFAR10) and feature attribution (Census Income) demonstrate more structure-faithful allocations and a practical sensitivity analysis via our proposed "priority sweeping".
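
To make the role of precedence constraints concrete, the sketch below computes exact Shapley values by enumerating orderings, and optionally discards orderings that violate a set of (before, after) pairs. It is only an illustration under stated assumptions: averaging uniformly over admissible orderings is one simple precedence-only variant, it omits PASV's priority weights entirely, and it is not the Metropolis-Hastings sampler from the paper. The game `toy_value` and all names are hypothetical.

```python
from itertools import permutations


def shapley(players, value, precedence=()):
    """Average marginal contributions over orderings that respect `precedence`.

    `precedence` holds (before, after) pairs. With no pairs, this is the
    classical Shapley value. Enumeration is exponential, so toy sizes only.
    """
    totals = {p: 0.0 for p in players}
    n_orders = 0
    for order in permutations(players):
        pos = {p: i for i, p in enumerate(order)}
        if any(pos[a] > pos[b] for a, b in precedence):
            continue  # skip orderings that violate a precedence pair
        n_orders += 1
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / n_orders for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical game: worth 1 only when both 'a' and 'b' are present."""
    return 1.0 if {"a", "b"} <= coalition else 0.0


classical = shapley(["a", "b", "c"], toy_value)
constrained = shapley(["a", "b", "c"], toy_value, precedence=[("a", "b")])
print(classical)
print(constrained)
```

In the unconstrained game, "a" and "b" split the credit equally. Once "a" must precede "b", "b" is always the player that completes the pair, so all credit moves to "b". This shows how strongly the choice of admissible orderings shapes the allocation.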

2605.18909 2026-06-16 cs.LG cs.SY eess.SY 版本更新

Simultaneous Latent Budget Trees for Stratified Classification

用于分层分类的同时潜在预算树

Cristian Buoncompagni, Stefano Pellegrino, Giulia Vannucci, Raffaele Dubbioso, Roberta Siciliano

AI总结提出同时潜在预算树框架，通过模型驱动的分裂规则处理分层因素，实现可解释分类，并应用于肌萎缩侧索硬化症性别差异分析。

详情

AI中文摘要

在可解释人工智能时代，单棵树因其易于解释而重新受到关注。本文介绍了同时潜在预算树，这是一个概率机器学习框架，用于在存在分层因素（如时间、空间或人口统计变量）作为控制变量或潜在混杂因素时的分类树。标准的树生长过程并非设计用于优化条件分裂规则。提出了一种基于模型的分裂规则，其中子节点被解释为同时混合模型（如同时潜在预算模型及其约束版本）的潜在成分，该模型拟合于父节点。混合参数驱动观测值（不同组别不同）到达子节点，而潜在预算参数更新控制变量每个水平的响应类别轮廓。参数通过最小二乘法估计，考虑模型的神经网络视角。信息丰富的树结构可以通过节点和路径上的解释辅助工具进行交互式可视化，包括视觉剪枝和决策树选择过程。提出了适当的措施来处理不平衡的响应类别分布。所提出的方法应用于调查肌萎缩侧索硬化症疾病进展中的性别相关差异。SLBT库及其各种基于树的算法可在链接的GitHub仓库中获取。

英文摘要

In the era of Explainable Artificial Intelligence, there is a renewed focus on single trees for their ease of interpretation. This paper introduces Simultaneous Latent Budget Trees, a probabilistic machine learning framework for classification trees in the presence of a stratification factor such as a temporal, spatial, or demographic variable, acting as a control variable or potential confounder. Standard tree growth procedures are not designed to optimize a conditional split rule. A model-based split rule is proposed in which child nodes are interpreted as latent components of a simultaneous mixture model, such as the Simultaneous Latent Budget Model and its constrained versions, fitted to the parent node. Mixing parameters drive the observations, differently for each group, to the child nodes whereas latent budgets parameters update the response classes profile of each level of the control variable. Parameters are estimated by least squares considering a neural network perspective of the model. An informative tree structure can be interactively visualized with interpretation aids on the node and the paths, including visual pruning and decision tree selection procedure. Suitable measures are proposed to handle an unbalanced response class distribution. The proposed methodology is applied to investigate gender-related differences in disease progression of Amyotrophic Lateral Sclerosis. The SLBT library with the various tree-based algorithms is available in the linked GitHub repository.

2605.02593 2026-06-16 cs.LG cs.MS 版本更新

Gradient Boosted Risk Scores

梯度提升风险评分

Costa Georgantas, Jonas Richiardi

发表机构 * Department of Radiology, Lausanne University Hospital and University of Lausanne（放射科，洛桑大学医院和洛桑大学）

AI总结提出基于梯度提升的算法，构建紧凑且预测性强的风险评分模型，能建模非线性效应；在12个表格数据集上，相比基于回归的方法，分类任务的规则平均减少60%，时间事件任务的规则平均减少16%。

详情

AI中文摘要

风险评分是一类可解释且可操作的机器学习模型，在医学、保险和风险管理中有应用。与大多数计算方法不同，风险评分设计为由人类通过基于有限标准集对数据样本分配分数来计算。生成风险评分的最常见方法使用线性回归来估计选定变量的效应。我们提出了一种构建紧凑且预测性强的风险评分的简单有效方法。我们提供了一种基于梯度提升的算法，能够建模非线性效应，并附带一个C++实现以及Python和R绑定。通过在12个表格数据集（涵盖回归、分类和时间事件任务）上的广泛实证评估，我们表明，与基于回归的替代方法相比，我们的方法在实现竞争性预测性能的同时，生成了更紧凑的评分，分类任务平均减少60%的规则，时间事件任务平均减少16%的规则（与AutoScore相比）。

英文摘要

Risk scores are an interpretable and actionable class of machine learning models with applications in medicine, insurance, and risk management. Unlike most computational methods, risk scores are designed to be computed by a human by attributing points to a data sample based on a limited set of criteria. The most common approaches for generating risk scores use linear regressions to estimate the effect of selected variables. We propose a simple and effective approach towards building compact and predictive risk scores. We provide an algorithm based on gradient boosting that is capable of modeling nonlinear effects, along with a C++ implementation with Python and R bindings. Through extensive empirical evaluation on twelve tabular datasets spanning regression, classification, and time-to-event tasks, we show that our method achieves competitive predictive performance while producing substantially more compact scores than regression-based alternatives, with 60% fewer rules for classification tasks and 16% fewer rules for time-to-event tasks on average, compared to AutoScore.

2605.25006 2026-06-16 cs.RO cs.LG cs.NE 版本更新

Convex-Neural RRT*: Fast and Reliable Learning-Guided Sampling for High-Quality Robot Path Planning

Convex-Neural RRT*: 快速可靠的基于学习引导的高质量机器人路径规划采样

Hichem Cheriet, Badra Khellat Kihel, Samira Chouraqui, Bara J. Emran

AI总结提出Convex-Neural RRT*算法，通过神经网络预测高质量路径附近的凸候选区域来引导采样，在多种环境中相比神经引导变体减少30-75%计算时间，路径长度平均减少约5%，成功率超99%。

详情

DOI: 10.1109/ACCESS.2026.3703346

AI中文摘要

基于采样的机器人路径规划算法在不同障碍物配置的环境中提供了概率完备性和强经验收敛性。然而，在实践中，这些方法通常需要多次迭代才能获得高质量解。本文提出了Convex-Neural RRT*，一种增强的RRT*变体，它结合神经引导来预测高质量路径附近的信息性航点区域。从这些预测中提取凸候选区域，使规划器能够将探索集中在几何相关区域，同时保持全局探索。该算法在三种环境类型和18个基准地图上与Neural RRT*、Neural Informed RRT*、经典RRT*和LTA*进行了评估。实验结果表明，与神经引导变体相比，Convex-Neural RRT*减少了30-75%的计算时间，相对于LTA*减少了高达88-98%，同时与经典RRT*相比，平均路径长度减少了约5%，在复杂环境中改进更大。该方法在不同障碍物密度下保持了超过99%的整体成功率。这些发现表明，凸引导神经采样在计算效率和解质量之间提供了有效平衡，支持其在时间敏感的机器人导航任务中的适用性。

英文摘要

Sampling-based algorithms for robot path planning offer probabilistic completeness and strong empirical convergence properties across environments with diverse obstacle configurations. However, in practice, these methods often require many iterations to obtain high-quality solutions. This paper proposes Convex-Neural RRT*, an enhanced RRT* variant that incorporates neural guidance to predict informative waypoint regions near high-quality paths. Convex candidate regions are extracted from these predictions, enabling the planner to concentrate exploration on geometrically relevant areas while preserving global exploration. The proposed algorithm is evaluated against Neural RRT*, Neural Informed RRT*, classical RRT*, and LTA* across three environment types and 18 benchmark maps. Experimental results show that Convex-Neural RRT* reduces computation time by 30-75% compared to neural-guided variants and up to 88-98% relative to LTA*, while achieving an average path length reduction of approximately 5% compared to classical RRT*, with larger improvements observed in complex environments. The method also maintains an overall success rate above 99% across varying obstacle densities. These findings indicate that convex-guided neural sampling provides an effective balance between computational efficiency and solution quality, supporting its applicability to time-sensitive robotic navigation tasks.

2502.06178 2026-06-16 math.OC cs.LG stat.ML 版本更新

Bayesian Optimization by Kernel Regression and Density-based Exploration

基于核回归和密度探索的贝叶斯优化

Tansheng Zhu, Hongyu Zhou, Ke Jin, Xusheng Xu, Qiufan Yuan, Lijie Ji

发表机构 * Zhiyuan College, Shanghai Jiao Tong University, Shanghai 200240, P. R. China（上海交通大学致远学院）； School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, P. R. China（上海交通大学数学科学学院）； Shanghai Institute of Aerospace Systems Engineering, Shanghai 201109, P. R. China（上海航天系统工程研究院）； Department of Mathematics, Shanghai University, Shanghai 200444, P. R. China（上海大学数学系）； Newtouch Center for Mathematics of Shanghai University, Shanghai University, Shanghai 200444, P. R. China（上海大学数学中心）

AI总结该研究提出了一种新的贝叶斯优化算法BOKE，通过核回归和密度探索结合，减少计算成本至二次复杂度，并在理论和实验上证明了其收敛性和有效性。

详情

AI中文摘要

贝叶斯优化在优化昂贵评估的黑盒函数时非常有效，但因高斯过程的每次迭代三次计算复杂度而面临显著的计算挑战，导致总时间复杂度与迭代次数的四次方成正比。为了解决这一限制，我们提出了一种新的算法，即基于核回归和密度探索的贝叶斯优化（BOKE）。BOKE利用核回归进行高效的函数近似，核密度用于探索，并将它们整合到置信界标准中以指导优化过程，从而将计算成本降低到二次。我们的理论分析严格建立了在噪声评估下的BOKE全局收敛性。通过广泛的数值实验，在合成和现实优化任务中，我们证明了BOKE不仅在与高斯过程方法和其他基线方法相比具有竞争力，而且表现出优越的计算效率。这些结果突显了BOKE在资源受限环境中的有效性，为工程应用中的优化问题提供了一种实用的方法。

英文摘要

Bayesian optimization is highly effective for optimizing expensive-to-evaluate black-box functions, but it faces significant computational challenges due to the cubic per-iteration cost of Gaussian processes, which results in a total time complexity that is quartic with respect to the number of iterations. To address this limitation, we propose a novel algorithm, Bayesian optimization by kernel regression and density-based exploration (BOKE). BOKE uses kernel regression for efficient function approximation, kernel density for exploration, and integrates them into the confidence bound criteria to guide the optimization process, thus reducing computational costs to quadratic. Our theoretical analysis rigorously establishes the global convergence of BOKE under noisy evaluations. Through extensive numerical experiments on both synthetic and real-world optimization tasks, we demonstrate that BOKE not only performs competitively compared to Gaussian process-based methods and several other baseline methods but also exhibits superior computational efficiency. These results highlight BOKE's effectiveness in resource-constrained environments, providing a practical approach for optimization problems in engineering applications.
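
The quartic-versus-quadratic claim follows from summing per-iteration costs over T iterations. The calculation below is a sketch under two assumptions: exact Gaussian process inference on t observations costs O(t^3), as the abstract states, and the kernel-regression surrogate costs O(t) per iteration. The second figure is an assumption chosen to match the abstract's quadratic total; the abstract does not state it.

```latex
\[
\underbrace{\sum_{t=1}^{T} O\!\left(t^{3}\right)}_{\text{GP-based BO}}
  = O\!\left(\left(\frac{T(T+1)}{2}\right)^{2}\right)
  = O\!\left(T^{4}\right),
\qquad
\underbrace{\sum_{t=1}^{T} O(t)}_{\text{kernel regression (assumed)}}
  = O\!\left(\frac{T(T+1)}{2}\right)
  = O\!\left(T^{2}\right).
\]
```

Under these assumptions, the total cost at T = 1000 iterations is about 10^12 basic operations for the GP approach and about 10^6 for the kernel-regression approach, up to constant factors.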

2605.06184 2026-06-16 cs.SE cs.LG cs.LO cs.PL 版本更新

Teaching LLMs Program Semantics via Symbolic Execution Traces

通过符号执行轨迹向LLM教授程序语义

Jonas Bayer, Stefan Zetzsche, Olivier Bouissou, Remi Delmas, Michael Tautschnig, Soonho Kong

发表机构 * University of Cambridge（剑桥大学）； Amazon Web Services（亚马逊网络服务）

AI总结本文通过符号执行轨迹训练提升LLM对程序语义的理解，发现结合推理的训练显著提升了漏洞检测能力，且在不同属性类型上均有效。

详情

AI中文摘要

我们介绍了一个基于SV-COMP 2025构建的评估框架，包含500个C语言验证任务，覆盖五种属性类型（内存安全、溢出、终止、可达性、数据竞争），并评估了来自六个模型家族的14个模型。我们发现，较高的整体准确率掩盖了一个关键弱点：虽然大多数模型能可靠地确认属性成立，但违反检测的表现差异很大，且随程序长度增加而急剧下降。为弥补这一差距，我们在形式化验证产物上进行训练：在通用开源C代码上运行Soteria符号执行引擎，并利用生成的轨迹对Qwen3-8B进行继续预训练。仅约3,000条bug轨迹，配合推理时的思维链推理，就使违反检测提升超过17个百分点，得到所评估模型中最均衡的准确率分布之一。在违反检测上，训练后的8B模型优于规模为其4倍、未启用思考的Qwen3-32B，并在整体准确率上接近后者。轨迹训练与思维链之间的交互是超加性的：二者单独使用都无法带来明显提升，结合使用则有效。改进可迁移到全部五种属性类型，包括训练轨迹未针对的类型。我们的28种配置证实，收益源于轨迹语义而非代码量，且轨迹的整理与格式至关重要。

英文摘要

We introduce an evaluation framework of 500 C verification tasks across five property types (memory safety, overflow, termination, reachability, data races) built on SV-COMP 2025, and evaluate 14 models across six families. We find that high overall accuracy masks a critical weakness: while most models reliably confirm properties hold, violation detection varies widely and degrades sharply with program length. To close this gap, we train on formal verification artifacts: running the Soteria symbolic execution engine on generic open-source C code and using the resulting traces for continued pretraining of Qwen3-8B. Just ${\sim}$3,000 bug traces combined with chain-of-thought reasoning at inference time improve violation detection by over 17 percentage points, producing one of the most balanced accuracy profiles among evaluated models. On violation detection, the trained 8B model outperforms the 4$\times$ larger Qwen3-32B without thinking and approaches it in overall accuracy. The interaction between trace training and chain-of-thought is superadditive: neither alone provides meaningful gains, but their combination does. Improvements transfer across all five property types, including ones the training traces do not target. Our 28 configurations confirm the gains stem from trace semantics, not code volume, and that trace curation and format matter.

2604.22795 2026-06-16 eess.SY cs.LG cs.SY 版本更新

Load constrained wind farm flow control through multi-objective multi-agent reinforcement learning

基于多目标多智能体强化学习的负载约束风电场流动控制

Teodor Åstrand, Marcus Binder Nilsen, Iasonas Tsaklis, Tuhfe Göçmen, Pierre-Elouan Réthoré, Nikolay Dimitrov

发表机构 * Department of Wind and Energy Systems, Technical University of Denmark（丹麦技术大学风能与能源系统系）

AI总结提出多智能体强化学习框架，结合独立软演员-评论家架构和数据驱动代理模型，在风电场流动控制中通过塑形奖励函数约束损伤等效载荷增量，实现功率提升与载荷控制的多目标优化。

Comments Submitted to Journal of Physics: Conference Series (Torque 2026). This is the Accepted Manuscript version of an article accepted for publication in Journal of Physics: Conference Series. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. This Accepted Manuscript is published under a CC BY licence

详情

DOI: 10.1088/1742-6596/3224/3/032065
Journal ref: J. Phys.: Conf. Ser. 3224 032065 (2026)

AI中文摘要

本研究提出了一种用于载荷约束风电场流动控制（WFFC）的多智能体强化学习（MARL）框架。虽然尾流偏转可以提升风电场总功率，但通常会增加下游风机的结构载荷。为了解决这一问题，我们将独立软演员-评论家（I-SAC）架构与数据驱动的局部入流扇区平均代理模型相结合，以实时估计损伤等效载荷（DELs）。通过将这些估计值纳入塑形奖励函数，各风机专属的智能体被训练为在最大化发电量的同时，遵守相对于基线控制器10%、20%和30%的载荷增量阈值（$Δ_{max}$）。该框架在WindGym环境中实现，使用带有动态尾流蜿蜒（DWM）模型的DYNAMIKS流动求解器来捕捉非稳态尾流物理特性。结果表明，MARL智能体成功学习了协作策略，优先考虑功率增益，同时主动回避高DEL控制策略。

英文摘要

This study presents a multi-agent reinforcement learning (MARL) framework for load-constrained wind farm flow control (WFFC). While wake steering can enhance total wind farm power, it often introduces increased structural loads on downstream turbines. To address this, we integrate an Independent Soft Actor-Critic (I-SAC) architecture with a data-driven, local inflow sector-averaged surrogate model to provide real-time estimates of Damage Equivalent Loads (DELs). By incorporating these estimates into a shaped reward function, turbine-specific agents are trained to maximize power production while adhering to specific load-increase thresholds ($Δ_{max}$) of 10%, 20%, and 30% relative to a baseline controller. The framework is implemented within the WindGym environment using the DYNAMIKS flow solver with Dynamic Wake Meandering (DWM) model to capture non-stationary wake physics. Results indicate that the MARL agents successfully learn collaborative policies that prioritise power gain while actively retreating from high-DEL control strategies.

2511.12635 2026-06-16 cs.SE cs.AI cs.LG 版本更新

LLM4SCREENLIT: Recommendations on Assessing the Performance of Large Language Models for Screening Literature in Systematic Reviews

LLM4SCREENLIT: 关于评估用于系统综述文献筛选的大型语言模型性能的建议

Lech Madeyski, Barbara Kitchenham, Martin Shepperd

发表机构 * University of Kent（肯特大学）； University of Leicester（莱斯特大学）； University of Birmingham（伯明翰大学）

AI总结本文提出LLM4SCREENLIT建议，针对系统综述文献筛选中大型语言模型的评估，提出基于加权马修相关系数的改进方法，强调在不平衡和成本不对称条件下使用成本敏感的WMCC进行评估。

Comments 34 pages, 6 figures

详情

DOI: 10.1016/j.infsof.2026.108204
Journal ref: Information and Software Technology 198 (2026) 108204

AI中文摘要

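背景：大型语言模型（LLM）越来越多地用于系统综述（SR）的文献筛选，但在筛选任务类别不平衡、成本不对称的条件下，用于评估它们的标准混淆矩阵指标可能产生误导。目标：我们提出并论证LLM4SCREENLIT，这是一组面向开展LLM筛选评估的研究者以及评审此类研究的编辑和审稿人的实用建议，并按研究类型（回顾性基准测试与面向特定SR的部署）加以区分。方法：以Delgado-Chaves等人（2025）在三个生物医学SR上对18个LLM的基准测试为动机示例，我们又审阅了28篇论文并提取其报告的指标。我们提出加权马修斯相关系数（WMCC），它将MCC的机遇校正与不对称的误分类成本相结合，并在三项软件工程（SE）再分析中进行了验证，其中最大的一项涵盖9个LLM × 24项SE二次研究（34,528篇文章）。结果：在29篇论文中，仅10%报告了MCC，仅24%报告了完整混淆矩阵，而在五篇声称节省工作量的论文中，没有一篇为假阴性成本定价。在最大的SE再分析中，MCC与WMCC在55%的可评估研究中对最佳LLM的判断不一致；在最突出的一项包含9,695篇文章的SE研究中，准确率最优的LLM丢失了63.3%的相关证据（Lost Evidence），MCC最优的丢失43.9%，而WMCC最优的仅丢失5.8%。敏感性分析（交叉点中位数约为w≈2.7，全部小于7）支持将w=10作为保守的默认值。结论：SR筛选评估应优先关注Lost Evidence，并在排序时将成本敏感的WMCC与MCC一并使用。报告必须包含完整的混淆矩阵，并将无法分类的输出视为需要人工复核的阳性样本。研究设计应具备泄漏意识；当研究旨在指导SR实践且标签可用时，应包含非LLM基线。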

英文摘要

Context: Large language models (LLMs) are increasingly used to screen literature for systematic reviews (SRs), but the standard confusion-matrix metrics used to evaluate them can mislead under the imbalanced, cost-asymmetric conditions of screening. Objective: We develop and justify LLM4SCREENLIT, a set of practical recommendations for researchers conducting LLM-screening evaluations and for editors and reviewers assessing such studies, differentiated by study type (retrospective benchmarking vs. deployment for a specific SR). Method: Using Delgado-Chaves et al. (2025), an 18-LLM benchmark across three biomedical SRs, as a motivating example, we reviewed 28 additional papers and extracted their reported metrics. We propose a Weighted Matthews Correlation Coefficient (WMCC) that integrates MCC's chance-correction with asymmetric misclassification costs, and validated it on three software-engineering (SE) reanalyses, the largest covering 9 LLMs × 24 SE secondary studies (34,528 articles). Results: Across the 29 papers, only 10% reported MCC, only 24% reported full confusion matrices, and none of the five papers claiming workload savings priced in the cost of false negatives. In the largest SE reanalysis, MCC and WMCC disagree on the best LLM in 55% of evaluable studies; in the most striking 9,695-article SE study, the Accuracy-best LLM loses 63.3% of relevant evidence (Lost Evidence), the MCC-best 43.9%, but the WMCC-best only 5.8%. Sensitivity analysis (median crossover at w ≈ 2.7, all < 7) supports w = 10 as a conservative default. Conclusions: SR-screening evaluations should prioritize Lost Evidence and use cost-sensitive WMCC alongside MCC for ranking. Reporting must include the full confusion matrix and treat unclassifiable outputs as positives requiring human review. Designs should be leakage-aware, with non-LLM baselines when the study aims to inform SR practice and labels are available.
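
The point that accuracy can mislead under imbalance is easy to check numerically. The sketch below computes accuracy and the standard MCC from a confusion matrix, plus a lost-evidence ratio. Two caveats: `lost_evidence` assumes the quantity is FN / (TP + FN), which is inferred from the abstract's wording rather than taken from the paper, and the counts are invented. The paper's WMCC weighting is deliberately not reproduced, since the abstract does not define it.

```python
import math


def accuracy(tp, fp, fn, tn):
    """Fraction of all screening decisions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp, fp, fn, tn):
    """Standard Matthews correlation coefficient; 0.0 if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def lost_evidence(tp, fn):
    """Assumed definition: share of truly relevant articles that were excluded."""
    return fn / (tp + fn)


# Hypothetical screening run: 1,000 articles, of which 50 are relevant.
tp, fp, fn, tn = 10, 10, 40, 940

acc = accuracy(tp, fp, fn, tn)
corr = mcc(tp, fp, fn, tn)
lost = lost_evidence(tp, fn)
print(f"accuracy={acc:.3f}  MCC={corr:.3f}  lost evidence={lost:.1%}")
```

In this made-up run the screener is 95% accurate yet discards 80% of the relevant articles, and its MCC is below 0.3. This is the kind of gap that motivates reporting the full confusion matrix.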

2601.03612 2026-06-16 cs.LG cs.SD eess.AS 版本更新

Mathematical Foundations of Polyphonic Music Generation via Structural Inductive Bias

通过结构归纳偏差的多声部音乐生成的数学基础

Joonwon Seo

发表机构 * GitHub

AI总结本文通过结构归纳偏差提出多声部音乐生成的数学框架，采用贝多芬钢琴奏鸣曲案例，引入Smart Embedding架构，减少参数并提升模型稳定性。

Comments 86 pages. A comprehensive monograph on the Smart Embedding architecture for polyphonic music generation. Includes rigorous theoretical proofs using Information Theory, Rademacher Complexity, and the Rank-Preserving Transversality Property (RPTP), along with empirical validation and a human listening study (N=53)

详情

AI中文摘要

本专著通过结构归纳偏差解决AI音乐生成中的“缺失中间层”问题，即生成连贯的乐句级音乐结构这一挑战。以贝多芬的钢琴奏鸣曲为例，作者引入Smart Embedding架构，这是一种因子化表示，其依据是经验证实的音高与手部属性的独立性（NMI=0.167）。该架构在将嵌入参数减少48.3%的同时，使验证损失改善了9.47%。在理论层面，通过信息论、Rademacher复杂度分析（得到紧28.09%的泛化界）和范畴论解释建立了形式化保证。这些结果进一步得到奇异值分解分析和一项专家盲听研究（N=53）的支持。总体而言，本工作将架构创新与数学严谨性相结合，为构建更高效、更稳定、更可解释的复杂序列数据生成模型提供了原则性框架。

英文摘要

This monograph addresses the "Missing Middle" problem in AI music generation - the challenge of producing coherent, phrase-level musical structure. Using Beethoven's piano sonatas as a case study, I introduce the Smart Embedding architecture, a factorized representation grounded in the empirically verified independence of pitch and hand attributes (NMI=0.167). The architecture achieves a 48.3% reduction in embedding parameters while improving validation loss by 9.47%. Theoretically, I establish formal guarantees through information theory, Rademacher complexity analysis (yielding a 28.09% tighter generalization bound), and category-theoretic interpretation. These results are further supported by Singular Value Decomposition analysis and a blind expert listening study (N=53). Collectively, this work presents a dual contribution that combines architectural innovation with mathematical rigor, offering a principled framework for building more efficient, stable, and interpretable generative models for complex sequential data.

2603.22530 2026-06-16 cs.LG 版本更新

Multimodal Training to Unimodal Deployment: Leveraging Unstructured Data During Training to Optimize Structured Data Only Deployment

多模态训练到单模态部署：利用训练中的无结构数据优化仅结构化数据的部署

Zigui Wang, Minghui Sun, Jiang Shu, Matthew M. Engelhard, Lauren Franz, Benjamin A. Goldstein

发表机构 * Department of Biostatistics and Bioinformatics, Duke University School of Medicine（杜克大学医学院生物统计学与生物信息学系）； Department of Pediatrics, Duke University School of Medicine（杜克大学医学院儿科学系）； Duke Center for Autism and Brain Development, Duke University School of Medicine（杜克大学医学院杜克自闭症与脑发育中心）； Department of Psychiatry and Behavioral Sciences, Durham, NC, USA（精神病学与行为科学系，美国北卡罗来纳州达勒姆）

AI总结本文提出一种多模态学习框架，利用训练中的无结构EHR数据提升模型对结构化数据的识别能力，通过对比学习和知识蒸馏损失联合训练，实现仅结构化数据部署的高效分类模型。

Comments 10 pages,3 figures

详情

Journal ref: Proceedings of the AMIA 2026 Annual Symposium, American Medical Informatics Association (AMIA), 2026

AI中文摘要

非结构化电子健康记录（EHR）数据（如临床笔记）包含未直接反映在结构化数据字段中的临床情境观察。这些额外信息可以显著改善模型学习。然而，由于其非结构化特性，这些数据在模型部署时往往不可用或难以实际使用。我们引入了一种多模态学习框架，在训练过程中利用非结构化EHR数据，同时产出仅需结构化EHR数据即可部署的模型。基于一个由3,466名因说话晚而接受评估的儿童组成的队列，我们使用BioClinicalBERT生成笔记嵌入，并从人口统计学信息和医疗代码中编码得到结构化嵌入。基于笔记的教师模型与仅结构化的学生模型通过对比学习和对比知识蒸馏损失联合训练，得到一个强分类器（AUROC = 0.985）。我们提出的模型AUROC达到0.705，优于仅结构化基线的0.656。这些结果表明，在训练中纳入非结构化数据可增强模型在结构化EHR数据中识别任务相关信息的能力，从而得到可部署的仅结构化表型模型。

英文摘要

Unstructured Electronic Health Record (EHR) data, such as clinical notes, contain clinical contextual observations that are not directly reflected in structured data fields. This additional information can substantially improve model learning. However, due to their unstructured nature, these data are often unavailable or impractical to use when deploying a model. We introduce a multimodal learning framework that leverages unstructured EHR data during training while producing a model that can be deployed using only structured EHR data. Using a cohort of 3,466 children evaluated for late talking, we generated note embeddings with BioClinicalBERT and encoded structured embeddings from demographics and medical codes. A note-based teacher model and a structured-only student model were jointly trained using contrastive learning and contrastive knowledge distillation loss, producing a strong classifier (AUROC = 0.985). Our proposed model reached an AUROC of 0.705, outperforming the structured-only baseline of 0.656. These results demonstrate that incorporating unstructured data during training enhances the model's capacity to identify task-relevant information within structured EHR data, enabling a deployable structured-only phenotype model.

2208.00335 2026-06-16 cs.LG 版本更新

Rule Extraction in Machine Learning: Chat Incremental Pattern Constructor

机器学习中的规则提取：聊天增量模式构造器

Caleb Princewill Nwokocha

发表机构 * Caleb Princewill Nwokocha

AI总结提出ChatIPC系统，通过增量学习从文本中提取有序令牌转换规则，利用定义扩展和相似度引导候选选择构建响应，实现可解释的规则提取。

Comments 11 pages

详情

AI中文摘要

规则提取是可解释机器学习中的一个核心问题，因为它旨在将不透明的预测行为转换为人类可读的符号结构。本文提出了聊天增量模式构造器（ChatIPC），一个轻量级的增量符号学习系统，它从文本中提取有序的令牌转换规则，通过基于定义的扩展丰富这些规则，并通过相似度引导的候选选择构建响应。该系统可被视为在令牌图上运行的规则提取器，而非传统的分类器。我形式化了ChatIPC使用的知识库、定义扩展、候选评分、重复控制、英语规则启发式和响应构建机制。我还将该方法置于规则提取、决策树归纳、关联规则、可解释机器学习和序列构建的文献中。此外，详细回顾了更新后的实现：它解析嵌入式字典，标准化词汇键，缓存定义令牌和词性标签，在位集上计算Jaccard分数，应用启发式语言奖励，并使用带版本号的二进制格式持久化知识库。本文强调数学公式和算法清晰性，并为学习、评分和构建算法提供了伪代码。

英文摘要

Rule extraction is a central problem in interpretable machine learning because it seeks to convert opaque predictive behavior into human-readable symbolic structure. This paper presents Chat Incremental Pattern Constructor (ChatIPC), a lightweight incremental symbolic learning system that extracts ordered token-transition rules from text, enriches them with definition-based expansion, and constructs responses by similarity-guided candidate selection. The system may be viewed as a rule extractor operating over a token graph rather than a conventional classifier. I formalize the knowledge base, definition expansion, candidate scoring, repetition control, English-rule heuristics, and response construction mechanisms used by ChatIPC. I further situate the method within the literature on rule extraction, decision tree induction, association rules, interpretable machine learning, and sequence construction. The updated C++ code implementation of ChatIPC is also reviewed in detail: it parses an embedded dictionary, normalizes lexical keys, caches definition tokens and part-of-speech tags, computes Jaccard scores on bitsets, applies heuristic linguistic bonuses, and persists the knowledge base with a versioned binary format. The paper emphasizes mathematical formulation and algorithmic clarity, and it provides pseudocode for the learning, scoring, and construction algorithms.
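
The abstract mentions computing Jaccard scores on bitsets. The sketch below shows that idea in Python for illustration, using plain integers as bitsets. ChatIPC itself is described as a C++ implementation, so this is not its code, and the token index and helper names are hypothetical.

```python
def to_bitset(tokens, index):
    """Encode a token collection as an int with one bit set per distinct token."""
    bits = 0
    for tok in tokens:
        bits |= 1 << index[tok]
    return bits


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two bitsets; 0.0 if both are empty."""
    union = a | b
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / bin(union).count("1")


# Hypothetical token-to-bit-position index.
INDEX = {tok: i for i, tok in enumerate(["rule", "token", "graph", "score"])}

a = to_bitset(["rule", "token", "graph"], INDEX)
b = to_bitset(["token", "graph", "score"], INDEX)
similarity = jaccard(a, b)
print(similarity)
```

Representing token sets as bitsets turns intersection and union into single bitwise operations followed by a population count, which is presumably why the implementation favors them for candidate scoring.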

2310.00336 2026-06-16 cs.LG 版本更新

DURENDAL: Graph deep learning framework for temporal heterogeneous networks

DURENDAL：用于时序异构网络的图深度学习框架

Manuel Dileo, Matteo Zignani, Sabrina Gaito

Affiliation: Department of Computer Science, University of Milan

AI总结本文提出DURENDAL框架，用于处理时序异构网络，通过结合快照式和多关系消息传递模型的设计原则，改进异构图学习模型，并引入新的高分辨率时序异构网络数据集进行实验验证。

DOI: 10.1109/DSAA65442.2025.11416805

AI中文摘要

时序异构网络（THNs）是演化的网络，广泛应用于引文网络、事件网络、推荐系统和知识图谱等现实世界应用。尽管不同的图神经网络（GNNs）已成功应用于动态图，但大多数仅支持同构图或受到特定THNs预测任务的模型设计影响。此外，当前标准图基准数据集缺乏时序异构网络数据。因此，本文提出DURENDAL，一种用于THNs的图深度学习框架。DURENDAL通过结合快照式和多关系消息传递图学习模型的设计原则，能够将任何异构图学习模型轻松应用于演化的网络。我们引入了两种不同的方案来更新THNs的嵌入表示，讨论了两种策略的优缺点。我们还通过引入两个新的高分辨率时序异构图数据集扩展了THNs的基准数据集，这两个数据集源自新兴的Web3平台和知名的电子商务网站。整体上，我们对四个时序异构网络数据集进行了实验评估，评估设置考虑了数据的演进性质。实验显示，DURENDAL在预测能力方面优于当前解决方案，并验证了其模型设计的有效性。

英文摘要

Temporal heterogeneous networks (THNs) are evolving networks that characterize many real-world applications such as citation and event networks, recommender systems, and knowledge graphs. Although different Graph Neural Networks (GNNs) have been successfully applied to dynamic graphs, most of them only support homogeneous graphs or suffer from a model design heavily influenced by specific THN prediction tasks. Furthermore, there is a lack of temporal heterogeneous networked data in current standard graph benchmark datasets. Hence, in this work, we propose DURENDAL, a graph deep learning framework for THNs. DURENDAL can help to easily repurpose any heterogeneous graph learning model to evolving networks by combining design principles from snapshot-based and multirelational message-passing graph learning models. We introduce two different schemes to update embedding representations for THNs, discussing the strengths and weaknesses of both strategies. We also extend the set of benchmarks for THNs by introducing two novel high-resolution temporal heterogeneous graph datasets derived from an emerging Web3 platform and a well-established e-commerce website. Overall, we conducted the experimental evaluation of the framework over four temporal heterogeneous network datasets on future link prediction tasks in an evaluation setting that takes into account the evolving nature of the data. Experiments show the prediction power of DURENDAL compared to current solutions for evolving and dynamic graphs, and the effectiveness of its model design.

2511.00369 2026-06-16 cs.LG cs.AI cs.NE 版本更新

Balancing Interpretability and Performance in Motor Imagery EEG Classification: A Comparative Study of ANFIS-FBCSP-PSO and EEGNet

在运动想象EEG分类中平衡可解释性和性能：ANFIS-FBCSP-PSO和EEGNet的比较研究

Farjana Aktar, Mohd Ruhul Ameen, Akif Islam, Md Ekramul Hamid

发表机构 * University of Rajshahi（拉贾沙希大学）

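AI summary: This paper compares ANFIS-FBCSP-PSO with EEGNet on the BCI Competition IV-2a dataset. The neuro-fuzzy model performs better in within-subject experiments, while the deep model generalizes better in cross-subject tests, which offers guidance for choosing an MI-BCI system.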

Comments Accepted at the 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence and Networking (QPAIN 2026)

DOI: 10.1109/QPAIN69676.2026.11545962
Journal ref: 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN)

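AI abstract (translated from the Chinese)

Achieving accurate and interpretable motor-imagery EEG classification remains a key challenge in brain-computer interface (BCI) research. This paper compares a transparent fuzzy-reasoning approach (ANFIS-FBCSP-PSO) with a well-known deep-learning benchmark (EEGNet) on the public BCI Competition IV-2a dataset. The ANFIS pipeline combines filter-bank common spatial pattern feature extraction with fuzzy IF-THEN rules optimized via particle swarm optimization, whereas EEGNet learns hierarchical spatial-temporal representations directly from raw EEG data. In within-subject experiments the fuzzy-neural model performed better (68.58% ± 13.76% accuracy, kappa = 58.04% ± 18.43), while in cross-subject (LOSO) tests the deep model generalized better (68.20% ± 12.13% accuracy, kappa = 57.33% ± 16.22). The study thus offers practical guidance for choosing an MI-BCI system according to the design goal: interpretability or robustness across users. Future work on transformer-based and hybrid neuro-symbolic frameworks is expected to further advance transparent EEG decoding.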

英文摘要

Achieving both accurate and interpretable classification of motor-imagery EEG remains a key challenge in brain-computer interface (BCI) research. In this paper, we compare a transparent fuzzy-reasoning approach (ANFIS-FBCSP-PSO) with a well-known deep-learning benchmark (EEGNet) using the publicly available BCI Competition IV-2a dataset. The ANFIS pipeline combines filter-bank common spatial pattern feature extraction with fuzzy IF-THEN rules optimized via particle-swarm optimization, while EEGNet learns hierarchical spatial-temporal representations directly from raw EEG data. In within-subject experiments, the fuzzy-neural model performed better (68.58% +/- 13.76% accuracy, kappa = 58.04% +/- 18.43), while in cross-subject (LOSO) tests, the deep model exhibited stronger generalization (68.20% +/- 12.13% accuracy, kappa = 57.33% +/- 16.22). The study therefore provides practical guidance for selecting MI-BCI systems according to the design goal: interpretability or robustness across users. Future investigations into transformer-based and hybrid neuro-symbolic frameworks are expected to further advance transparent EEG decoding.

2505.04382 2026-06-16 eess.AS cs.LG cs.SD 版本更新

Discrete Optimal Transport and Voice Conversion

离散最优传输与语音转换

Anton Selitskiy, Maitreya Kocharekar

发表机构 * The University of Texas at Austin（德克萨斯大学奥斯汀分校）； University of California, Berkeley（加州大学伯克利分校）

AI总结本文提出kDOT框架，利用预训练语音嵌入空间进行语音转换，通过离散最优传输计划的质心投影改进分布对齐，提升WER、MOS和FAD性能。

Comments 5 pages, 1 figure, 7 tables. 11th International Conference on Machine Learning Technologies (ICMLT), Berlin, Germany, May 2026

AI中文摘要

我们提出kDOT，一种在预训练语音嵌入空间中运行的离散最优传输（OT）框架，用于语音转换（VC）。与kNN-VC和SinkVC中的平均策略以及MKL中的独立假设不同，我们的方法利用离散OT计划的质心投影来构建源和目标说话人嵌入分布之间的传输映射。我们对传输嵌入数量进行了全面的消融研究，并系统分析了源和目标语音持续时间的影响。在LibriSpeech上的实验表明，具有质心投影的OT在分布对齐方面表现一致，并且在WER、MOS和FAD方面通常优于基于平均的方法。此外，我们还表明，将离散OT作为后处理步骤可以将伪造语音转换为被最新伪造检测器误判为真实语音的样本。这展示了OT在嵌入空间中的强大域适应能力，同时也揭示了伪造检测系统的重要安全影响。

英文摘要

We propose kDOT, a discrete optimal transport (OT) framework for voice conversion (VC) operating in a pretrained speech embedding space. In contrast to the averaging strategies used in kNN-VC and SinkVC, and the independence assumption adopted in MKL, our method employs the barycentric projection of the discrete OT plan to construct a transport map between source and target speaker embedding distributions. We conduct a comprehensive ablation study over the number of transported embeddings and systematically analyze the impact of source and target utterance duration. Experiments on LibriSpeech demonstrate that OT with barycentric projection consistently improves distribution alignment and often outperforms averaging-based approaches in terms of WER, MOS, and FAD. Furthermore, we show that applying discrete OT as a post-processing step can transform spoofed speech into samples that are misclassified as bona fide by a state-of-the-art spoofing detector. This demonstrates the strong domain adaptation capability of OT in embedding space, while also revealing important security implications for spoof detection systems.
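The barycentric projection named in the abstract can be illustrated with a small dependency-free sketch. It assumes a discrete OT plan has already been computed by some solver (the plan below is made up by hand), and the function name `barycentric_projection` is hypothetical rather than taken from kDOT. Each source embedding is mapped to the plan-weighted average of the target embeddings:

```python
def barycentric_projection(plan, targets):
    """Map source point i to sum_j plan[i][j] * targets[j] / sum_j plan[i][j].

    plan:    n x m nested list, a transport plan between n source and
             m target points (assumed to come from an OT solver).
    targets: m x d nested list of target embeddings.
    """
    dim = len(targets[0])
    projected = []
    for row in plan:
        mass = sum(row)
        if mass == 0:
            raise ValueError("source point carries no mass in the plan")
        projected.append([
            sum(p * y[d] for p, y in zip(row, targets)) / mass
            for d in range(dim)
        ])
    return projected


# Hand-made plan with row marginals (0.5, 0.5) and column marginals (0.25, 0.75).
plan = [[0.25, 0.25],
        [0.00, 0.50]]
targets = [[0.0, 0.0],
           [2.0, 2.0]]
converted = barycentric_projection(plan, targets)
print(converted)  # [[1.0, 1.0], [2.0, 2.0]]
```

The first source point splits its mass evenly between both targets and lands at their midpoint, while the second sends all its mass to one target and is mapped onto it. This differs from plain neighbor averaging in that the weights come from a plan that must also respect the target distribution.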

2509.22935 2026-06-16 cs.LG cs.AI 版本更新

Compute-Optimal Quantization-Aware Training

计算最优量化感知训练

Aleksandr Dremov, David Grangier, Angelos Katharopoulos, Awni Hannun

发表机构 * Apple（苹果公司）

AI总结本文研究了量化感知训练与全精度训练的计算分配优化问题，通过实验发现QAT与FP训练比例随总计算量增加而上升，并提出新的冷却与QAT融合方法以提升效率。

Comments ICLR 2026

Journal ref: International Conference on Learning Representations (ICLR), 2026

AI中文摘要

量化感知训练（QAT）是提高量化神经网络精度的重要技术。先前研究表明，将训练分解为全精度阶段后接QAT阶段能获得更优精度。然而，全精度与QAT阶段的计算分配仍不明确。本文通过不同计算预算、QAT位宽和模型大小的实验，探讨了不同QAT持续时间对最终性能的影响。研究发现，与先前结论相反，QAT与全精度训练的损失最优比随总计算量增加而上升。使用tokens-per-parameter-byte统计量可准确预测广泛模型大小和量化位宽的最优比例。从实验数据中推导出一个损失标度定律，可预测不同QAT/FP计算分配策略和QAT位宽下的最优QAT比例和最终模型性能。利用该定律进行进一步预测，包括在给定内存约束下最优QAT位宽以及不同位宽QAT精度与全精度模型精度的比较。此外，本文提出了一种新的冷却与QAT融合方法，通过联合学习率衰减与量化感知训练，消除冗余的全精度模型更新，实现显著的计算节省。这些发现为高效的QAT规划提供了实用见解，并使在相同计算预算下训练更高质量的量化模型成为可能。

英文摘要

Quantization-aware training (QAT) is a leading technique for improving the accuracy of quantized neural networks. Previous work has shown that decomposing training into a full-precision (FP) phase followed by a QAT phase yields superior accuracy compared to QAT alone. However, the optimal allocation of compute between the FP and QAT phases remains unclear. We conduct extensive experiments with various compute budgets, QAT bit widths, and model sizes from 86.0M to 2.2B to investigate how different QAT durations impact final performance. We demonstrate that, contrary to previous findings, the loss-optimal ratio of QAT to FP training increases with the total amount of compute. Moreover, the optimal fraction can be accurately predicted for a wide range of model sizes and quantization widths using the tokens-per-parameter-byte statistic. From experimental data, we derive a loss scaling law that predicts both optimal QAT ratios and final model performance across different QAT/FP compute allocation strategies and QAT bit widths. We use the scaling law to make further predictions, which we verify experimentally, including which QAT bit width is optimal under a given memory constraint and how QAT accuracy with different bit widths compares to full-precision model accuracy. Additionally, we propose a novel cooldown and QAT fusion approach that performs learning rate decay jointly with quantization-aware training, eliminating redundant full-precision model updates and achieving significant compute savings. These findings provide practical insights into efficient QAT planning and enable the training of higher-quality quantized models with the same compute budget.
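The abstract does not define the tokens-per-parameter-byte statistic. One plausible reading, offered here purely as an assumption, is training tokens divided by the size of the quantized weights in bytes. The function name and the token count below are hypothetical; only the 86.0M model size appears in the abstract:

```python
def tokens_per_parameter_byte(tokens, params, bits):
    """Training tokens per byte of quantized weights (assumed definition).

    tokens: number of training tokens
    params: number of model parameters
    bits:   QAT bit width, so each parameter occupies bits / 8 bytes
    """
    if tokens < 0 or params <= 0 or bits <= 0:
        raise ValueError("tokens must be >= 0; params and bits must be > 0")
    model_bytes = params * bits / 8
    return tokens / model_bytes


# 86.0M parameters at 4 bits is 43 MB of weights; 4.3B tokens is a made-up budget.
stat = tokens_per_parameter_byte(tokens=4.3e9, params=86.0e6, bits=4)
print(stat)
```

Under this reading, halving the bit width doubles the statistic for the same token budget, which is one way a single number could account for both model size and quantization width.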

2602.21381 2026-06-16 cs.LG cs.AI cs.CE 版本更新

VCDF: A Validated Consensus-Driven Framework for Time Series Causal Discovery

VCDF：一种验证性共识驱动的时间序列因果发现框架

Gene Yu, Ce Guo, Wayne Luk

发表机构 * Department of Computing, Imperial College London（帝国理工学院伦敦分校计算机系）

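AI summary: This paper proposes VCDF, a framework that makes time series causal discovery more robust by evaluating how stable causal relations are across blocked temporal subsets. Experiments show improved F1 scores for VAR-LiNGAM, with larger gains on longer sequences.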

Comments This paper has been accepted to PAKDD 2026. Please cite the proceedings version when available

DOI: 10.1007/978-981-92-1465-5_3
Journal ref: LNCS vol. 16599, pp. 29-41, Springer, 2026

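AI abstract (translated from the Chinese)

Time series causal discovery is essential for understanding dynamic systems, yet existing methods are sensitive to noise, non-stationarity, and sampling variability. This paper proposes the Validated Consensus-Driven Framework (VCDF), a simple, method-agnostic layer that improves robustness by evaluating the stability of causal relations across blocked temporal subsets. VCDF requires no modification to base algorithms and can be applied to methods such as VAR-LiNGAM and PCMCI. Experiments on synthetic datasets show that VCDF improves VAR-LiNGAM by approximately 0.08-0.12 in both window and summary F1 scores across diverse data characteristics, with the largest gains on moderate-to-long sequences. The framework also benefits from longer sequences, yielding up to 0.18 absolute improvement on time series of length 1000 and above. Evaluations on simulated fMRI data and IT-monitoring scenarios further show improved stability and structural accuracy under realistic noise. VCDF provides an effective reliability layer for time series causal discovery without altering the underlying modeling assumptions.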

英文摘要

Time series causal discovery is essential for understanding dynamic systems, yet many existing methods remain sensitive to noise, non-stationarity, and sampling variability. We propose the Validated Consensus-Driven Framework (VCDF), a simple and method-agnostic layer that improves robustness by evaluating the stability of causal relations across blocked temporal subsets. VCDF requires no modification to base algorithms and can be applied to methods such as VAR-LiNGAM and PCMCI. Experiments on synthetic datasets show that VCDF improves VAR-LiNGAM by approximately 0.08-0.12 in both window and summary F1 scores across diverse data characteristics, with gains most pronounced for moderate-to-long sequences. The framework also benefits from longer sequences, yielding up to 0.18 absolute improvement on time series of length 1000 and above. Evaluations on simulated fMRI data and IT-monitoring scenarios further demonstrate enhanced stability and structural accuracy under realistic noise conditions. VCDF provides an effective reliability layer for time series causal discovery without altering underlying modeling assumptions.
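The idea of checking stability across blocked temporal subsets can be sketched as follows. This is a guess at the general shape, not VCDF itself: the block count, the support threshold, and the names `split_into_blocks`, `consensus_edges`, and `toy_discover` are all assumptions, and `discover` stands in for any base method such as a wrapper around VAR-LiNGAM or PCMCI.

```python
from collections import Counter


def split_into_blocks(series, n_blocks):
    """Cut a time-ordered sequence into contiguous, equally sized blocks.

    Any trailing remainder that does not fill a block is dropped.
    """
    if not 1 <= n_blocks <= len(series):
        raise ValueError("n_blocks must be between 1 and len(series)")
    size = len(series) // n_blocks
    return [series[i * size:(i + 1) * size] for i in range(n_blocks)]


def consensus_edges(series, discover, n_blocks=4, min_support=0.75):
    """Keep only edges that `discover` reports in enough blocks.

    `discover` is any callable mapping a block to an iterable of edges.
    """
    counts = Counter()
    for block in split_into_blocks(series, n_blocks):
        counts.update(set(discover(block)))  # count each edge once per block
    return {edge for edge, c in counts.items() if c / n_blocks >= min_support}


def toy_discover(block):
    """Stand-in base method: one stable edge, one that shows up only late."""
    edges = {("x", "y")}
    if block[0] >= 6:
        edges.add(("y", "z"))
    return edges


stable = consensus_edges(list(range(8)), toy_discover)
print(stable)  # {('x', 'y')}
```

Because the blocks are contiguous, temporal order inside each subset is preserved, and the base algorithm is called unchanged, which matches the method-agnostic framing of the abstract.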

2602.19253 2026-06-16 cs.LG cs.NE 版本更新

Alternating Bi-Objective Optimization for Explainable Neuro-Fuzzy Systems

交替双目标优化用于可解释的神经模糊系统

Qusai Khaled, Uzay Kaymak, Laura Genga

发表机构 * University of Birmingham（伯明翰大学）； Bilkent University（比尔肯特大学）； University of Turku（图尔库大学）

AI总结本文提出X-ANFIS方法，通过交替双目标梯度优化提升神经模糊系统的可解释性，在UCI回归数据集上验证了其在保持预测精度的同时实现目标区分性。

Comments Accepted at IEEE Conference on Artificial Intelligence 2026 (IEEE CAI 2026)

DOI: 10.1109/CAI68641.2026.11536320
Journal ref: Proc. 2026 IEEE Conference on Artificial Intelligence (CAI), 1166-1173 (2026)

AI中文摘要

模糊系统由于其基于规则的架构和语言变量，在可解释AI中展现出强大潜力。现有方法通过进化多目标优化（MOO）或梯度基标量化来平衡精度与可解释性，但前者计算成本高，后者无法恢复非凸帕累托区域。我们提出X-ANFIS，一种用于可解释自适应神经模糊推理系统的交替双目标梯度优化方案。通过语义控制的初始值使用Cauchy隶属函数实现稳定训练，并引入可微的可解释性目标，通过交替梯度传递将其与性能目标解耦。在约5000个实验中验证于九个UCI回归数据集，X-ANFIS在保持竞争性预测精度的同时，持续实现目标区分性，并恢复MOO帕累托前沿的凸包外的解。

英文摘要

Fuzzy systems show strong potential in explainable AI due to their rule-based architecture and linguistic variables. Existing approaches navigate the accuracy-explainability trade-off either through evolutionary multi-objective optimization (MOO), which is computationally expensive, or gradient-based scalarization, which cannot recover non-convex Pareto regions. We propose X-ANFIS, an alternating bi-objective gradient-based optimization scheme for explainable adaptive neuro-fuzzy inference systems. Cauchy membership functions are used for stable training under semantically controlled initializations, and a differentiable explainability objective is introduced and decoupled from the performance objective through alternating gradient passes. Validated in approximately 5,000 experiments on nine UCI regression datasets, X-ANFIS consistently achieves target distinguishability while maintaining competitive predictive accuracy, recovering solutions beyond the convex hull of the MOO Pareto front.

2505.17786 2026-06-16 cs.LG 版本更新

Supervised Graph Contrastive Learning for Gene Regulatory Networks

监督图对比学习用于基因调控网络

Sho Oshima, Yuji Okamoto, Taisei Tosaki, Ryosuke Kojima

发表机构 * University of Tokyo（东京大学）； National Institute of Genetics（日本国立遗传学研究所）

AI总结本文提出SupGCL方法，通过整合基因敲除实验中的生物扰动作为监督信号，改进基因调控网络的表示学习，提升疾病亚型识别和下游任务性能。

Comments ICML 2026

AI中文摘要

图对比学习（GCL）是一种强大的自监督学习框架，通过图扰动进行数据增强，在分析生物网络如基因调控网络（GRNs）中应用广泛。GCL中常用的节点删除等人工扰动会引起结构变化，可能偏离生物现实。这一问题促使图表示学习向无增强方法发展，但该趋势忽略了生物有意义的扰动引起的结构变化并非需要避免的问题，而是信息来源。本文提出SupGCL，一种新的GRN GCL方法，直接整合基因敲除实验中的生物扰动作为监督。SupGCL是一种概率模型，连续扩展传统GCL，将人工增强与实测的敲除实验扰动联系起来，并利用后者作为显式监督。在三种癌症类型的患者衍生GRNs上，我们训练GRN表示并评估：（i）嵌入空间分析，产生更清晰的疾病亚型结构并提升聚类；（ii）任务特定微调，其在13个下游任务中一致优于强图表示学习基线，涵盖基因层面的功能注释和患者层面预测。

英文摘要

Graph Contrastive Learning (GCL) is a powerful self-supervised learning framework that performs data augmentation through graph perturbations, with growing applications in the analysis of biological networks such as Gene Regulatory Networks (GRNs). The artificial perturbations commonly used in GCL, such as node dropping, induce structural changes that can diverge from biological reality. This concern has contributed to a broader trend in graph representation learning toward augmentation-free methods, which treat such structural changes as problems to be avoided. However, this trend overlooks the fundamental insight that structural changes from biologically meaningful perturbations are not a problem to be avoided but a rich source of information, and it thereby misses the opportunity to leverage data from real biological experiments. Motivated by this insight, we propose SupGCL (Supervised Graph Contrastive Learning), a new GCL method for GRNs that directly incorporates biological perturbations from gene knockdown experiments as supervision. SupGCL is a probabilistic formulation that continuously generalizes conventional GCL, linking artificial augmentations with real perturbations measured in knockdown experiments, and using the latter as explicit supervision. On patient-derived GRNs from three cancer types, we train GRN representations with SupGCL and evaluate it in two regimes: (i) embedding space analysis, where it yields clearer disease-subtype structure and improves clustering, and (ii) task-specific fine-tuning, where it consistently outperforms strong graph representation learning baselines on 13 downstream tasks spanning gene-level functional annotation and patient-level prediction.

2602.00240 2026-06-16 cs.LG 版本更新

Green-NAS: A Global-Scale Multi-Objective Neural Architecture Search for Robust and Efficient Edge-Native Weather Forecasting

Green-NAS：一种全球尺度多目标神经架构搜索用于鲁棒且高效的边缘原生天气预测

Md Muhtasim Munif Fahim, Soyda Humyra Yesmin, Saiful Islam, Md. Palash Bin Faruque, Md. A. Salam, Md. Mahfuz Uddin, Samiul Islam, Tofayel Ahmed, Md. Binyamin, Md. Rezaul Karim

发表机构 * University of California, Berkeley（加州大学伯克利分校）； University of Washington（华盛顿大学）； University of Arizona（亚利桑那大学）

AI总结 Green-NAS通过多目标优化寻找轻量高精度模型，减少计算能耗与碳足迹，提升边缘天气预测的鲁棒性与效率。

Comments Accepted at the 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking

DOI: 10.1109/QPAIN69676.2026.11545925
Journal ref: 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN)

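AI abstract (translated from the Chinese)

We introduce Green-NAS, a multi-objective neural architecture search framework designed for low-resource environments, with weather forecasting as a case study. Following 'Green AI' principles, the framework explicitly minimizes computational energy cost and carbon footprint, prioritizing sustainable deployment over raw computational scale. By optimizing multiple objectives simultaneously, Green-NAS searches for lightweight models with high accuracy and very few parameters. The best model, Green-NAS-A, achieves an RMSE of 0.0988 (within 1.4% of the manually tuned baseline) with only 153k parameters, 239 times fewer than globally applied weather forecasting models such as GraphCast. The paper also describes how transfer learning improves forecasting accuracy by approximately 5.2% when historical data are limited.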

英文摘要

We introduce Green-NAS, a multi-objective NAS (neural architecture search) framework designed for low-resource environments using weather forecasting as a case study. By adhering to 'Green AI' principles, the framework explicitly minimizes computational energy costs and carbon footprints, prioritizing sustainable deployment over raw computational scale. The Green-NAS architecture search method is optimized for both model accuracy and efficiency to find lightweight models with high accuracy and very few model parameters; this is accomplished through an optimization process that simultaneously optimizes multiple objectives. Our best-performing model, Green-NAS-A, achieved an RMSE of 0.0988 (i.e., within 1.4% of our manually tuned baseline) using only 153k model parameters, which is 239 times fewer than other globally applied weather forecasting models, such as GraphCast. In addition, we also describe how the use of transfer learning will improve the weather forecasting accuracy by approximately 5.2%, in comparison to a naive approach of training a new model for each city, when there is limited historical weather data available for that city.

2602.08088 2026-06-16 cs.LG cs.AI 版本更新

Online Domain-aware LLM Decoding for Continual Domain Evolution

在线领域感知的LLM解码用于持续领域演变

Mohammad Abu-Shaira, Weishi Shi

Affiliation: University of North Texas

AI总结本文提出在线领域感知解码框架ODD，通过概率融合和自适应置信度调节，提升LLM在持续领域变化中的适应能力，实验表明其在语法和语义生成任务中表现优异。

DOI: 10.1007/978-981-92-1468-6_40
Journal ref: Advances in Knowledge Discovery and Data Mining, PAKDD 2026, LNAI 16600, pp. 565-577, Springer, 2026

AI中文摘要

LLMs通常在领域特定数据上离线微调，假设领域静态。但实际上，领域知识通过新法规、产品、服务和交互模式持续演变。对每个新实例重新训练或微调LLM在计算上不可行。此外，现实环境也表现出时间动态性，数据分布不断变化。忽视这种现象，即概念漂移，会显著降低模型的预测准确性。这种领域演变与静态适应管道的不匹配凸显了需要高效实时适应而无需昂贵再训练的需求。为此，我们引入在线领域感知解码框架（ODD）。ODD在基础LLM和前缀树先验之间进行概率级融合，通过自适应置信度调节使用分歧和连续性信号进行指导。在多样化的漂移场景下的实证评估表明，ODD在所有语法和语义NLG指标上均优于LLM-Greedy和LLM-Temp Scaled。它在ROUGE-L指标上获得绝对增益0.065，并在最佳基线上使余弦相似度提高13.6%。这些结果证明了ODD对演变词汇和上下文模式的鲁棒性，使其适用于动态LLM应用。

英文摘要

LLMs are typically fine-tuned offline on domain-specific data, assuming a static domain. In practice, domain knowledge evolves continuously through new regulations, products, services, and interaction patterns. Retraining or fine-tuning LLMs for every new instance is computationally infeasible. Real-world environments also exhibit temporal dynamics with shifting data distributions. Disregarding this phenomenon, commonly referred to as concept drift, can significantly diminish a model's predictive accuracy. This mismatch between evolving domains and static adaptation pipelines highlights the need for efficient, real-time adaptation without costly retraining. In response, we introduce the Online Domain-aware Decoding (ODD) framework. ODD performs probability-level fusion between a base LLM and a prefix-tree prior, guided by adaptive confidence modulation using disagreement and continuity signals. Empirical evaluation under diverse drift scenarios demonstrates that ODD consistently surpasses LLM-Greedy and LLM-Temp Scaled across all syntactic and semantic NLG metrics. It yields an absolute ROUGE-L gain of 0.065 and a 13.6% relative improvement in Cosine Similarity over the best baseline. These results demonstrate ODD's robustness to evolving lexical and contextual patterns, making it suitable for dynamic LLM applications.

2601.18897 2026-06-16 cs.AI cs.LG 版本更新

Explainable Uncertainty Quantification for Wastewater Treatment Energy Prediction via Interval Type-2 Neuro-Fuzzy System

通过区间型2神经模糊系统实现废水处理能耗预测的可解释不确定性量化

Qusai Khaled, Bahjat Mallak, Uzay Kaymak, Laura Genga

发表机构 * Jheronimus Academy of Data Science, Eindhoven University of Technology, Eindhoven, The Netherlands（杰罗尼穆斯数据科学学院，埃因霍温理工大学，埃因霍温，荷兰）； Haskoning, Amersfoort, The Netherlands（哈索宁，阿默斯福尔特，荷兰）； School of Industrial Engineering, Eindhoven University of Technology（工业工程学院，埃因霍温理工大学）

AI总结本文提出一种区间型2神经模糊系统，用于废水处理能耗预测，通过模糊规则结构生成可解释的预测区间，分解不确定性层级，提升决策可靠性。

Comments Submitted to 21st International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU2026)

DOI: 10.1007/978-3-032-28997-1_28
Journal ref: IPMU 2026, Commun. Comput. Inf. Sci. 3020, 392-406 (2026)

AI中文摘要

废水处理厂消耗全球1-3%的电力，准确的能耗预测对运营优化和可持续性至关重要。尽管机器学习模型提供点预测，但缺乏可解释的不确定性量化，这对安全关键基础设施的风险意识决策至关重要。本研究开发了一种区间型2自适应神经模糊推理系统（IT2-ANFIS），通过模糊规则结构生成可解释的预测区间。与黑箱概率方法不同，所提出的框架将不确定性分解为三个层次：特征层、不确定性足迹识别引入模糊性的变量，规则层分析揭示局部模型的置信度，实例层区间量化整体预测不确定性。在墨尔本水务东处理厂数据集上验证，IT2-ANFIS在预测性能上与一阶ANFIS相当，但在训练运行中方差显著降低，同时提供可解释的不确定性估计，将预测置信度直接与运营条件和输入变量联系起来。

英文摘要

Wastewater treatment plants consume 1-3% of global electricity, making accurate energy forecasting critical for operational optimization and sustainability. While machine learning models provide point predictions, they lack the explainable uncertainty quantification essential for risk-aware decision-making in safety-critical infrastructure. This study develops an Interval Type-2 Adaptive Neuro-Fuzzy Inference System (IT2-ANFIS) that generates interpretable prediction intervals through fuzzy rule structures. Unlike black-box probabilistic methods, the proposed framework decomposes uncertainty across three levels: feature-level footprints of uncertainty identify which variables introduce ambiguity, rule-level analysis reveals confidence in local models, and instance-level intervals quantify overall prediction uncertainty. Validated on Melbourne Water's Eastern Treatment Plant dataset, IT2-ANFIS achieves predictive performance comparable to first-order ANFIS with substantially reduced variance across training runs, while providing explainable uncertainty estimates that link prediction confidence directly to operational conditions and input variables.

2510.24987 2026-06-16 q-bio.QM cs.LG q-bio.GN 版本更新

scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration

scMRDR：一种可扩展且灵活的无配对单细胞多组学数据整合框架

Jianle Sun, Chaoqi Liang, Ran Wei, Peng Zheng, Lei Bai, Wanli Ouyang, Hongliang Yan, Peng Ye

发表机构 * Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）； Carnegie Mellon University（卡内基梅隆大学）； The Chinese University of Hong Kong（香港中文大学）； Guangzhou Laboratory（广州实验室）

AI总结 scMRDR通过β-VAE架构解耦细胞潜在表示，结合等距正则化、对抗目标和掩码重建损失，实现无配对多组学数据整合，有效提升大规模数据处理能力。

Comments Accepted at NeurIPS 2025 (Spotlight)

Journal ref: Advances in Neural Information Processing Systems 38 (2025): 154538-154565

英文摘要

Advances in single-cell sequencing have enabled high-resolution profiling of diverse molecular modalities, yet integrating unpaired multi-omics single-cell data remains challenging. Existing approaches either rely on pair information or prior correspondences, or require computing a global pairwise coupling matrix, limiting their scalability and flexibility. In this paper, we introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration. Specifically, we disentangle each cell's latent representations into modality-shared and modality-specific components using a well-designed β-VAE architecture, augmented with isometric regularization to preserve intra-omics biological heterogeneity, an adversarial objective to encourage cross-modal alignment, and a masked reconstruction loss strategy to address the issue of missing features across modalities. Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation. Crucially, it scales effectively to large-scale datasets and supports integration of more than two omics, offering a powerful and flexible solution for large-scale multi-omics data integration and downstream biological discovery.

2512.14892 2026-06-16 cs.LG cs.AI 版本更新

OLR-WA: Online Weighted Average Linear Regression in Multivariate Data Streams

OLR-WA：多变量数据流中的在线加权平均线性回归

Mohammad Abu-Shaira, Alejandro Rodriguez, Greg Speegle, Victor Sheng, Ishfaq Ahmad

发表机构 * University of California, San Diego（加州大学圣地亚哥分校）

AI总结本文提出OLR-WA模型，用于多变量数据流的在线线性回归，通过处理数据漂移和置信度场景，实现与批量回归相当甚至更优的性能。

DOI: 10.1109/BigData59044.2023.10386601
Journal ref: 2023 IEEE International Conference on Big Data (BigData), 1039-1046

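AI abstract (translated from the Chinese)

Online learning updates models incrementally as new data arrive, avoiding large storage requirements and costly model recomputation. This paper introduces OLR-WA (OnLine Regression with Weighted Average), a novel and versatile multivariate online linear regression model.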

英文摘要

Online learning updates models incrementally with new data, avoiding large storage requirements and costly model recalculations. In this paper, we introduce OLR-WA (OnLine Regression with Weighted Average), a novel and versatile multivariate online linear regression model. We also investigate scenarios involving drift, where the underlying patterns in the data evolve over time, conduct a convergence analysis, and compare our approach with existing online regression models. The results show that OLR-WA achieves performance comparable to batch regression and comparable or superior to other state-of-the-art online models. Moreover, OLR-WA converges rapidly, surpassing other online models by consistently achieving high R² values from the first iteration to the last, even when initialized with as little as 1% to 10% of the total data points. In addition to handling time-based (temporal drift) scenarios, OLR-WA stands out as the only model capable of effectively managing challenging confidence-based scenarios. It does so by updating conservatively, giving priority to older data points with higher confidence levels. In summary, OLR-WA is versatile and useful across different contexts, making it a valuable solution for online linear regression tasks.

2512.08879 2026-06-16 cs.LG cs.AI 版本更新

DAO-GP Drift Aware Online Non-Linear Regression Gaussian-Process

DAO-GP：漂移感知在线非线性回归高斯过程

Mohammad Abu-Shaira, Ajita Rattani, Weishi Shi

AI总结提出DAO-GP模型，通过内置漂移检测与自适应机制、无超参数、稀疏化和衰减策略，解决在线高斯过程回归中概念漂移、超参数固定等问题，在多种漂移类型下表现鲁棒且优于现有方法。

DOI: 10.1109/BigData66926.2025.11401428
Journal ref: 2025 IEEE International Conference on Big Data (BigData), pp. 776-785, 2025

AI中文摘要

真实世界的数据集通常表现出以数据分布演变为特征的时态动态。忽视这一现象（通常称为概念漂移）会显著降低模型的预测精度。此外，在线模型中超参数的存在加剧了这一问题。这些参数通常是固定的，用户无法根据演化的数据分布动态调整。高斯过程模型提供了具有不确定性量化的强大非参数回归能力，使其成为在线设置中建模复杂数据关系的理想选择。然而，传统的在线高斯过程方法存在几个关键限制，包括缺乏漂移感知、依赖固定超参数、易受数据窥探影响、缺乏原则性的衰减机制以及内存效率低下。为此，我们提出了DAO-GP（漂移感知在线高斯过程），一种新颖的、完全自适应的、无超参数、带衰减的稀疏非线性回归模型。DAO-GP具有内置的漂移检测和自适应机制，可根据漂移的严重程度动态调整模型行为。广泛的经验评估证实了DAO-GP在平稳条件、多种漂移类型（突变、增量、渐变）以及不同数据特征下的鲁棒性。分析表明其动态自适应、高效的内存和基于衰减的管理以及演化的诱导点。与最先进的参数和非参数模型相比，DAO-GP始终达到优越或竞争性的性能，使其成为在线非线性回归中具有漂移鲁棒性的解决方案。

英文摘要

Real-world datasets often exhibit temporal dynamics characterized by evolving data distributions. Disregarding this phenomenon, commonly referred to as concept drift, can significantly diminish a model's predictive accuracy. Furthermore, the presence of hyperparameters in online models exacerbates this issue. These parameters are typically fixed and cannot be dynamically adjusted by the user in response to the evolving data distribution. Gaussian Process (GP) models offer powerful non-parametric regression capabilities with uncertainty quantification, making them ideal for modeling complex data relationships in an online setting. However, conventional online GP methods face several critical limitations, including a lack of drift-awareness, reliance on fixed hyperparameters, vulnerability to data snooping, absence of a principled decay mechanism, and memory inefficiencies. In response, we propose DAO-GP (Drift-Aware Online Gaussian Process), a novel, fully adaptive, hyperparameter-free, decayed, and sparse non-linear regression model. DAO-GP features a built-in drift detection and adaptation mechanism that dynamically adjusts model behavior based on the severity of drift. Extensive empirical evaluations confirm DAO-GP's robustness across stationary conditions, diverse drift types (abrupt, incremental, gradual), and varied data characteristics. Analyses demonstrate its dynamic adaptation, efficient in-memory and decay-based management, and evolving inducing points. Compared with state-of-the-art parametric and non-parametric models, DAO-GP consistently achieves superior or competitive performance, establishing it as a drift-resilient solution for online non-linear regression.

2510.19728 2026-06-16 cs.LG cs.AI 版本更新

Enabling Granular Subgroup Level Model Evaluations by Generating Synthetic Medical Time Series

通过生成合成医疗时间序列实现细粒度亚组级别模型评估

Mahmoud Ibrahim, Bart Elen, Chang Sun, Gökhan Ertaylan, Michel Dumontier

发表机构 * Institute of Data Science, Faculty of Science and Engineering, Maastricht University（数据科学研究所，科学与工程学院，马斯特里赫特大学）； Department of Advanced Computing Sciences, Faculty of Science and Engineering, Maastricht University（先进计算科学系，科学与工程学院，马斯特里赫特大学）； VITO（VITO研究院）

AI总结本文提出一种框架，利用合成ICU时间序列数据训练和评估预测模型，特别是在细粒度人口亚组中。引入Enhanced TimeAutoDiff，通过分布对齐惩罚增强潜在扩散目标，减少真实-合成与真实-真实评估差距，提升亚组模型评估的鲁棒性和可靠性。

DOI: 10.1007/978-3-032-19102-1_19

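AI abstract (translated from the Chinese)

We present a new framework that uses synthetic ICU time-series data not only to train predictive models but also to evaluate them rigorously and trustworthily, both at the population level and within fine-grained demographic subgroups. Building on prior diffusion- and VAE-based generators (TimeDiff, HealthGen, TimeAutoDiff), we introduce Enhanced TimeAutoDiff, which adds distribution-alignment penalties to the latent diffusion objective. We benchmark all models extensively on MIMIC-III and eICU, on 24-hour mortality and binary length-of-stay tasks. The results show that Enhanced TimeAutoDiff reduces the gap between real-on-synthetic and real-on-real evaluation (the "TRTS gap") by over 70%.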

英文摘要

We present a novel framework for leveraging synthetic ICU time-series data not only to train but also to rigorously and trustworthily evaluate predictive models, both at the population level and within fine-grained demographic subgroups. Building on prior diffusion and VAE-based generators (TimeDiff, HealthGen, TimeAutoDiff), we introduce Enhanced TimeAutoDiff, which augments the latent diffusion objective with distribution-alignment penalties. We extensively benchmark all models on MIMIC-III and eICU, on 24-hour mortality and binary length-of-stay tasks. Our results show that Enhanced TimeAutoDiff reduces the gap between real-on-synthetic and real-on-real evaluation ("TRTS gap") by over 70%, achieving $\Delta_{\mathrm{TRTS}} \leq 0.014$ AUROC, while preserving training utility ($\Delta_{\mathrm{TSTR}} \approx 0.01$). Crucially, for 32 intersectional subgroups, large synthetic cohorts cut subgroup-level AUROC estimation error by up to 50% relative to small real test sets, and outperform them in 72-84% of subgroups. This work provides a practical, privacy-preserving roadmap for trustworthy, granular model evaluation in critical care, enabling robust and reliable performance analysis across diverse patient populations without exposing sensitive EHR data, contributing to the overall trustworthiness of Medical AI.
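To make the "TRTS gap" concrete, the sketch below reads it as the absolute difference between the AUROC a real-trained model obtains on a synthetic test set and on a real test set. That reading, the helper names `auroc` and `trts_gap`, and the toy scores are assumptions for illustration, not the paper's code or data:

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Pairwise form, fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def trts_gap(real_labels, real_scores, synth_labels, synth_scores):
    """|AUROC on synthetic test data - AUROC on real test data|.

    Both score lists are assumed to come from the same real-trained model.
    """
    return abs(auroc(synth_labels, synth_scores) - auroc(real_labels, real_scores))


# Toy model outputs, invented for illustration.
real_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])    # 3 of 4 pairs ordered correctly
synth_auc = auroc([0, 0, 1, 1], [0.2, 0.3, 0.6, 0.9])    # all 4 pairs ordered correctly
gap = trts_gap([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8],
               [0, 0, 1, 1], [0.2, 0.3, 0.6, 0.9])
print(real_auc, synth_auc, gap)  # 0.75 1.0 0.25
```

A small gap means the synthetic cohort ranks a model about the same way real patients would, which is the property that makes large synthetic cohorts usable for subgroup-level evaluation.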

2410.13439 2026-06-16 cs.LG cs.CL cs.CV 版本更新

Similarity-Dissimilarity Loss for Multi-label Supervised Contrastive Learning

多标签监督对比学习中的相似性-差异性损失

Guangming Huang, Yunfei Long, Cunjin Luo

发表机构 * University of Essex（埃塞克斯大学）； Queen Mary University of London（伦敦大学玛丽女王学院）

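AI summary: This paper proposes a Similarity-Dissimilarity Loss that dynamically re-weights samples to resolve the problem of determining positive samples in multi-label settings, provides theoretical proofs, and unifies the single-label and multi-label contrastive learning frameworks. Experiments show that the method outperforms baselines across image, text, and medical domains.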

Comments Accepted by Transactions on Machine Learning Research (TMLR)

详情

AI中文摘要

监督对比学习通过利用标签信息取得了显著成功；然而，在多标签场景中确定正样本仍是一个关键挑战。在多标签监督对比学习（MSCL）中，多标签关系尚未完全定义，导致正样本识别和对比损失函数构建存在歧义。为解决这些挑战，我们：（i）系统地制定了MSCL中的多标签关系；（ii）提出了一种新颖的相似性-差异性损失，根据相似性和差异性因素动态重新加权样本；（iii）通过严谨的数学分析提供了理论支持，支持我们的方法制定和有效性；（iv）为单标签和多标签监督对比损失提供统一形式和范式。我们在图像和文本模态上进行了实验，并进一步将其扩展到医疗领域。结果表明，我们的方法在全面评估中始终优于基线，证明了其有效性和鲁棒性。

英文摘要

Supervised contrastive learning has achieved remarkable success by leveraging label information; however, determining positive samples in multi-label scenarios remains a critical challenge. In multi-label supervised contrastive learning (MSCL), multi-label relations are not yet fully defined, leading to ambiguity in identifying positive samples and formulating contrastive loss functions to construct the representation space. To address these challenges, we: (i) systematically formulate multi-label relations in MSCL, (ii) propose a novel Similarity-Dissimilarity Loss, which dynamically re-weights samples based on similarity and dissimilarity factors, (iii) further provide theoretically grounded proofs for our method through rigorous mathematical analysis that supports the formulation and effectiveness, and (iv) offer a unified form and paradigm for both single-label and multi-label supervised contrastive loss. We conduct experiments on both image and text modalities and further extend the evaluation to the medical domain. The results show that our method consistently outperforms baselines in comprehensive evaluations, demonstrating its effectiveness and robustness.
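
Illustrative sketch (not the paper's formula): the abstract says samples are re-weighted by similarity and dissimilarity factors but does not define them. As a stand-in, the code below weights each candidate positive by the Jaccard overlap between its label set and the anchor's, so partially matching samples count less than exact matches. The names `jaccard` and `positive_weights` and the example labels are assumptions.

```python
def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| between two label sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def positive_weights(anchor_labels, batch_labels):
    """Weight each batch sample by its label overlap with the anchor.

    Weight 1.0 means identical label sets; 0.0 means no shared label,
    i.e. the sample would act as a pure negative.
    """
    return [jaccard(anchor_labels, labels) for labels in batch_labels]


anchor = {"cat", "outdoor"}
batch = [{"cat", "outdoor"}, {"cat"}, {"cat", "indoor", "sofa"}, {"dog"}]
weights = positive_weights(anchor, batch)
```

In a contrastive loss, such weights would scale each positive pair's contribution; how the actual method combines its two factors is described in the paper itself.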

2509.19197 2026-06-16 cs.LG 版本更新

A Validation Strategy for Deep Learning Models: Evaluating and Enhancing Robustness

深度学习模型的验证策略：评估与增强鲁棒性

Abdul-Rauf Nuhu, Parham Kebria, Vahid Hemmati, Benjamin Lartey, Mahmoud Nabil Mahmoud, Abdollah Homaifar, Edward Tunstel

发表机构 * National Center for Atmospheric Research (NCAR)（国家大气科学研究中心）

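AI summary: This paper proposes extracting "weak robust" samples from the training data via local robustness analysis to evaluate and improve model robustness, and validates the approach on CIFAR-10, CIFAR-100, and ImageNet.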

详情

DOI: 10.1109/OJCS.2025.3650722

AI中文摘要

数据驱动模型，尤其是深度学习分类器在干净数据集上表现优异，但易受对抗性及常见扰动影响。传统鲁棒性验证依赖扰动测试数据集，而本文提出从训练数据中提取'弱鲁棒'样本进行验证。这些样本对扰动最敏感，能早期揭示模型漏洞。通过评估这些挑战性样本，可更深入理解模型鲁棒性并指导性能提升。在CIFAR-10、CIFAR-100和ImageNet上验证了该方法的有效性，展示了基于弱鲁棒样本的鲁棒性验证如何提升模型在对抗和常见扰动下的可靠性。

英文摘要

Data-driven models, especially deep learning classifiers, often demonstrate great success on clean datasets. Yet they remain vulnerable to common data distortions such as adversarial and common corruption perturbations. These perturbations can significantly degrade performance, thereby challenging the overall reliability of the models. Traditional robustness validation typically relies on perturbed test datasets to assess and improve model performance. In our framework, however, we propose a validation approach that extracts "weak robust" samples directly from the training dataset via local robustness analysis. These samples, being the most susceptible to perturbations, serve as an early and sensitive indicator of the model's vulnerabilities. By evaluating models on these challenging training instances, we gain a more nuanced understanding of their robustness, which informs targeted performance enhancement. We demonstrate the effectiveness of our approach on models trained with CIFAR-10, CIFAR-100, and ImageNet, highlighting how robustness validation guided by weak robust samples can drive meaningful improvements in model reliability under adversarial and common corruption scenarios.
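
Illustrative sketch (a simplification, not the authors' procedure): the abstract does not specify how local robustness is measured for deep networks. For a linear classifier, the smallest L2 perturbation that flips a prediction is the distance to the decision boundary, |w·x + b| / ||w||, so the training samples with the smallest distance can stand in for "weak robust" samples. The names `margin` and `weakest_samples` and the toy data are hypothetical.

```python
import math


def margin(w, b, x):
    """L2 distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def weakest_samples(w, b, samples, k):
    """Indices of the k samples closest to the decision boundary."""
    ranked = sorted(range(len(samples)), key=lambda i: margin(w, b, samples[i]))
    return ranked[:k]


# Toy linear classifier and training set (illustrative values only).
w, b = [3.0, 4.0], -5.0
train = [[1.0, 1.0], [3.0, 4.0], [1.0, 0.6], [0.0, 0.0]]
weak = weakest_samples(w, b, train, 2)
```

Evaluating a model on the selected subset then gives a pessimistic robustness check without needing a separately perturbed test set, which is the general idea the abstract describes.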

2506.22530 2026-06-16 cs.LG cs.DB 版本更新

Task-Agnostic Contrastive Pretraining for Relational Deep Learning

关系深度学习中的任务无关对比预训练

Jakub Peleška, Gustav Šír

发表机构 * Czech Technical University in Prague（捷克技术大学布拉格分校）

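AI summary: This paper proposes a task-agnostic contrastive pretraining method that improves representation learning on relational data through three levels of contrastive objectives. Experiments show that the pretrained models perform well in transfer learning on relational data.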

Comments arXiv admin note: text overlap with arXiv:2506.22199

详情

DOI: 10.1007/978-3-032-19102-1_38

AI中文摘要

关系深度学习（RDL）是一种新兴范式，利用图神经网络原理直接从关系数据库中学习，通过将其表示为异构图。然而，现有RDL模型通常依赖任务特定的监督学习，需要为每个预测任务训练独立模型，这可能影响可扩展性和重用性。本文提出了一种新的任务无关对比预训练方法，旨在实现数据库层面的表示学习。为此，我们引入了三个层次的对比目标——行级、链接级和上下文级，旨在捕捉关系数据固有的结构和语义异质性。我们通过模块化的RDL架构和高效的采样策略实现了相应的预训练方法。在标准RDL基准上的初步结果表明，微调预训练模型在性能上显著优于从头开始训练，验证了所提出方法在学习可迁移表示方面的潜力。

英文摘要

Relational Deep Learning (RDL) is an emerging paradigm that leverages Graph Neural Network principles to learn directly from relational databases by representing them as heterogeneous graphs. However, existing RDL models typically rely on task-specific supervised learning, requiring training separate models for each predictive task, which may hamper scalability and reuse. In this work, we propose a novel task-agnostic contrastive pretraining approach for RDL that enables database-wide representation learning. For that aim, we introduce three levels of contrastive objectives (row-level, link-level, and context-level) designed to capture the structural and semantic heterogeneity inherent to relational data. We implement the respective pretraining approach through a modular RDL architecture and an efficient sampling strategy tailored to the heterogeneous database setting. Our preliminary results on standard RDL benchmarks demonstrate that fine-tuning the pretrained models measurably outperforms training from scratch, validating the promise of the proposed methodology in learning transferable representations for relational data.
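
Illustrative sketch (generic, not the paper's objectives): the abstract names three contrastive levels without giving their form. The code below shows a standard InfoNCE-style loss on cosine similarities, which is one common way a row-level objective could be written: an anchor row embedding is pulled toward a positive view and pushed away from negatives. The function names, the temperature value, and the toy vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log softmax of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# An aligned positive gives a low loss; a misaligned one gives a high loss.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
```

Link-level and context-level objectives would differ in how positives and negatives are sampled from the database graph; that sampling is specific to the paper and is not reproduced here.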

2504.18179 2026-06-16 cs.CV cs.LG 版本更新

Label-independent hyperparameter-free self-supervised single-view deep subspace clustering

与标签无关、无需超参数的自监督单视图深度子空间聚类

Lovro Sindicic, Ivica Kopriva

发表机构 * Division of Computing and Data Science, Ruđer Bošković Institute（计算与数据科学系，鲁德·博克维奇研究所）

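AI summary: This paper proposes a single-view deep subspace clustering method that requires no hyperparameter tuning, improving clustering performance through a layer-wise self-expression loss, optimization of a subspace-structured norm, a multi-stage learning framework, and a relative-error-based termination mechanism.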

Comments 35 pages; 1 figure; 10 Tables

详情

DOI: 10.1016/j.neucom.2025.132260

AI中文摘要

深度子空间聚类（DSC）算法面临多个挑战，限制了其在各种应用领域中的广泛应用。首先，聚类质量通常仅通过编码器的输出层评估，忽略了中间层中的有价值信息。其次，大多数DSC方法将表示学习和子空间聚类视为独立任务，限制了其有效性。第三，它们假设可以使用一个留出的数据集进行超参数调节，这在实际场景中往往不现实。第四，学习终止通常基于聚类误差监控，需要外部标签。最后，其性能通常依赖于依赖标注数据的后处理技术。为了解决这些限制，我们引入了一种新的单视图DSC方法：(i) 使用联合表示矩阵最小化层间自表达损失；(ii) 优化子空间结构范数以提高聚类质量；(iii) 采用多阶段顺序学习框架，包括预训练和微调，从而能够使用多个正则化项而无需超参数调节；(iv) 融合基于相对误差的自停止机制，在不使用标签的情况下终止训练；(v) 根据先验知识在学习的表示矩阵中保留固定数量的领先系数。我们在六个代表人脸、数字和物体的数据集上评估了所提出的方法。结果表明，我们的方法优于大多数超参数经过仔细调节的线性SC算法，同时与表现最好的线性方法相比仍具竞争力。

英文摘要

Deep subspace clustering (DSC) algorithms face several challenges that hinder their widespread adoption across various application domains. First, clustering quality is typically assessed using only the encoder's output layer, disregarding valuable information present in the intermediate layers. Second, most DSC approaches treat representation learning and subspace clustering as independent tasks, limiting their effectiveness. Third, they assume the availability of a held-out dataset for hyperparameter tuning, which is often impractical in real-world scenarios. Fourth, learning termination is commonly based on clustering error monitoring, requiring external labels. Finally, their performance often depends on post-processing techniques that rely on labeled data. To address these limitations, we introduce a novel single-view DSC approach that: (i) minimizes a layer-wise self-expression loss using a joint representation matrix; (ii) optimizes a subspace-structured norm to enhance clustering quality; (iii) employs a multi-stage sequential learning framework, consisting of pre-training and fine-tuning, enabling the use of multiple regularization terms without hyperparameter tuning; (iv) incorporates a relative error-based self-stopping mechanism to terminate training without labels; and (v) retains a fixed number of leading coefficients in the learned representation matrix based on prior knowledge. We evaluate the proposed method on six datasets representing faces, digits, and objects. The results show that our method outperforms most linear SC algorithms with carefully tuned hyperparameters while maintaining competitive performance with the best performing linear approaches.
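
Illustrative sketch (an assumed form, not the authors' exact criterion): item (iv) describes stopping training from a relative error rather than from labels. A common label-free version, used here, stops once the relative change of the loss between consecutive epochs drops below a tolerance. The names `should_stop` and `epochs_until_stop`, the tolerance, and the loss values are hypothetical.

```python
def should_stop(prev_loss, curr_loss, tol=1e-3):
    """True when the relative loss change |prev - curr| / |prev| is below tol."""
    if prev_loss == 0:
        return curr_loss == 0
    return abs(prev_loss - curr_loss) / abs(prev_loss) < tol


def epochs_until_stop(losses, tol=1e-3):
    """Index of the first epoch at which the stopping rule fires, or None."""
    for t in range(1, len(losses)):
        if should_stop(losses[t - 1], losses[t], tol):
            return t
    return None


# Toy loss curve: large early improvements, then a plateau.
losses = [1.0, 0.5, 0.4, 0.3999, 0.3998]
stop_epoch = epochs_until_stop(losses, tol=1e-3)
```

Because the rule looks only at the training objective, it needs no ground-truth cluster labels, which is the property the abstract emphasizes.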

2403.19444 2026-06-16 cs.LG cs.CV 版本更新

Leveraging Expert Input for Robust and Explainable AI-Assisted Lung Cancer Detection in Chest X-rays

利用专家输入实现稳健且可解释的AI辅助胸部X光肺癌检测

Amy Rafferty, Rishi Ramaesh, Ajitha Rajan

发表机构 * School of Informatics, University of Edinburgh（信息学院，爱丁堡大学）； NHS Lothian（洛锡安国家健康服务）

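AI summary: This paper studies the interpretability and robustness of an InceptionV3-based lung cancer detection model and proposes ClinicXAI, an expert-driven approach that generates clinically relevant explanations and shows greater robustness under adversarial attacks.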

详情

DOI: 10.1109/ICHI64645.2025.00071

AI中文摘要

深度学习模型在推动AI辅助医学诊断方面显示出巨大潜力，特别是在通过胸部X光等医学图像模态检测肺癌方面。然而，这些模型的黑盒性质对可解释性和可信度构成挑战，限制了其在临床中的应用。本研究评估了基于InceptionV3的高性能肺癌检测模型的可解释性和鲁棒性，利用公开的胸部X光和放射学报告数据集。我们评估了多种可解释AI（XAI）技术的临床效用，包括后验和先验方法，并发现现有方法常无法提供临床相关解释，存在不一致性和与放射科专家评估的偏离。为解决这些限制，我们与放射科医生合作定义诊断特定的临床概念，并开发了ClinicXAI，一种专家驱动的方法，利用概念瓶颈方法。ClinicXAI生成具有临床意义的解释，与临床医生的实践需求紧密相关，同时保持高诊断准确性。我们还通过一系列广泛使用的对抗攻击测试ClinicXAI与原始InceptionV3模型的鲁棒性。我们的分析表明，ClinicXAI在对抗扰动下表现出显著更强的鲁棒性。这些发现强调了在医学诊断中将领域专业知识纳入可解释和鲁棒AI系统设计的重要性，为医疗领域更可信和有效的AI解决方案铺平道路。

英文摘要

Deep learning models show significant potential for advancing AI-assisted medical diagnostics, particularly in detecting lung cancer through medical image modalities such as chest X-rays. However, the black-box nature of these models poses challenges to their interpretability and trustworthiness, limiting their adoption in clinical practice. This study examines both the interpretability and robustness of a high-performing lung cancer detection model based on InceptionV3, utilizing a public dataset of chest X-rays and radiological reports. We evaluate the clinical utility of multiple explainable AI (XAI) techniques, including both post-hoc and ante-hoc approaches, and find that existing methods often fail to provide clinically relevant explanations, displaying inconsistencies and divergence from expert radiologist assessments. To address these limitations, we collaborated with a radiologist to define diagnosis-specific clinical concepts and developed ClinicXAI, an expert-driven approach leveraging the concept bottleneck methodology. ClinicXAI generated clinically meaningful explanations which closely aligned with the practical requirements of clinicians while maintaining high diagnostic accuracy. We also assess the robustness of ClinicXAI in comparison to the original InceptionV3 model by subjecting both to a series of widely utilized adversarial attacks. Our analysis demonstrates that ClinicXAI exhibits significantly greater resilience to adversarial perturbations. These findings underscore the importance of incorporating domain expertise into the design of interpretable and robust AI systems for medical diagnostics, paving the way for more trustworthy and effective AI solutions in healthcare.

2408.06350 2026-06-16 cs.HC cs.LG 版本更新

Predicting cognitive load in immersive driving scenarios with a hybrid CNN-RNN model

利用混合CNN-RNN模型预测沉浸式驾驶场景中的认知负荷

Mehshan Ahmed Khan, Houshyar Asadi, Mohammad Reza Chalak Qazani, Adetokunbo Arogbonlo, Saeid Nahavandi, Chee Peng Lim

发表机构 * Institute for Intelligent Systems Research and Innovation（智能系统研究与创新研究所）； Faculty of Computing and Information Technology (FoCIT)（计算与信息科技学院）； Swinburne University of Technology（斯威本科技大学）

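AI summary: This paper proposes a hybrid CNN-RNN model that fuses fNIRS, eye-tracking, and driving behavior data to predict three levels of cognitive load with improved accuracy.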

Comments 17 pages

详情

AI中文摘要

在交通安全研究中，次要任务的认知负荷会降低主要任务表现，如驾驶。尽管生理信号已被广泛用于驾驶相关研究以评估认知负荷，但仅有少数研究专门关注高认知负荷场景。本研究采用三种等级的听觉n-back任务作为认知负荷的次要任务，在驾驶模拟器中驾驶时同时执行驾驶和n-back任务，记录fNIRS、眼动追踪和驾驶行为数据以预测三种不同水平的认知负荷。不同于以往研究中在无交通条件下使用二元分类法，本研究在低能见度条件下，特别是在夜间和雨天的正常交通环境中，考察三种认知负荷水平。我们提出了一种结合1D卷积神经网络和循环神经网络的混合神经网络来预测认知负荷。实验结果表明，所提出的模型在参数更少的情况下，使用生理数据将准确率从99.82%提升到99.99%，使用驾驶行为数据单独时从87.26%提升到92.02%。这一显著改进突显了我们混合神经网络在复杂条件下准确预测驾驶认知负荷的有效性。

英文摘要

One debatable issue in traffic safety research is that cognitive load from secondary tasks reduces primary task performance, such as driving. Although physiological signals have been extensively used in driving-related research to assess cognitive load, only a few studies have specifically focused on high cognitive load scenarios. Most existing studies tend to examine moderate or low levels of cognitive load. In this study, we adopted an auditory version of the n-back task of three levels as a cognitively loading secondary task while driving in a driving simulator. During the simultaneous execution of driving and the n-back task, we recorded fNIRS, eye-tracking, and driving behavior data to predict cognitive load at three different levels. To the best of our knowledge, this combination of data sources has never been used before. Unlike most previous studies that utilize binary classification of cognitive load and driving in conditions without traffic, our study involved three levels of cognitive load, with drivers operating in normal traffic conditions under low visibility, specifically during nighttime and rainy weather. We proposed a hybrid neural network combining a 1D Convolutional Neural Network and a Recurrent Neural Network to predict cognitive load. Our experimental results demonstrate that the proposed model, with fewer parameters, increases accuracy from 99.82% to 99.99% using physiological data, and from 87.26% to 92.02% using driving behavior data alone. This significant improvement highlights the effectiveness of our hybrid neural network in accurately predicting cognitive load during driving under challenging conditions.

2401.06644 2026-06-16 cs.LG eess.SP 版本更新

SeizNet: An AI-enabled Implantable Sensor Network System for Seizure Prediction

SeizNet：一种基于人工智能的可植入传感器网络系统用于癫痫预测

Ali Saeizadeh, Douglas Schonholtz, Daniel Uvaydov, Raffaele Guida, Emrecan Demirors, Pedram Johari, Jorge M. Jimenez, Joseph S. Neimat, Tommaso Melodia

发表机构 * Institute for the Wireless Internet of Things, Northeastern University, Boston, MA, U.S.A.（无线物联网研究所，东北大学，波士顿，马萨诸塞州，美国）； University of Louisville, Louisville, KY, U.S.A.（路易斯维尔大学，路易斯维尔，肯塔基州，美国）

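AI summary: SeizNet uses deep learning and multi-sensor data to improve the specificity of seizure prediction while preserving high sensitivity, achieving up to 99% prediction accuracy and offering a new avenue for treating refractory epilepsy.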

Comments 4 pages, 4 figures, 1 table

详情

DOI: 10.23919/WONS60642.2024.10449556
Journal ref: 2024 19th Wireless On-Demand Network Systems and Services Conference (WONS)

AI中文摘要

本文介绍SeizNet，一种通过深度学习方法和可植入传感器网络实现癫痫发作预测的闭环系统。SeizNet结合颅内脑电图（iEEG）和心电图（ECG）数据，在保持高灵敏度的同时提升预测特异性。系统设计用于边缘计算，减少数据隐私、传输和能耗问题。实验表明，SeizNet在所有指标上均优于传统单模态和非个性化预测系统，癫痫发作预测准确率最高可达99%，为难治性癫痫治疗提供新方向。

英文摘要

In this paper, we introduce SeizNet, a closed-loop system for predicting epileptic seizures through the use of Deep Learning (DL) methods and implantable sensor networks. While pharmacological treatment is effective for some epilepsy patients (with ~65M people affected worldwide), one out of three suffers from drug-resistant epilepsy. To alleviate the impact of seizures, predictive systems have been developed that can notify such patients of an impending seizure, allowing them to take precautionary measures. SeizNet leverages DL techniques and combines data from multiple recordings, specifically intracranial electroencephalogram (iEEG) and electrocardiogram (ECG) sensors, that can significantly improve the specificity of seizure prediction while preserving very high levels of sensitivity. SeizNet DL algorithms are designed for efficient real-time execution at the edge, minimizing data privacy concerns, data transmission overhead, and power inefficiencies associated with cloud-based solutions. Our results indicate that SeizNet outperforms traditional single-modality and non-personalized prediction systems in all metrics, achieving up to 99% accuracy in predicting seizures, offering a promising new avenue in refractory epilepsy treatment.

1. Deep Learning Architectures and Training Methods (84 papers)

Separable Neural Architectures as Physical World Models: from Mathematical Theory to Applications

Transformers Learn the Mestre-Nagao Heuristic

An Integrable Token Mixing Layer from the Generalized Yang Baxter Equation

Controlled Dynamics Attractor Transformer

Localizing Credit at the Divergence: Path-Conditioned Self-Distillation for LLM Reasoning

Z-Plane Neural Networks: Bounded Geometric Activation Replaces ReLU and LayerNorm

The Reservoir Attention Network: Cross-Pass State in Pretrained Transformers via Content-Addressable Reservoir Injection

Decomposing one-class support vector machine into an ensemble of one-data support vector machines

Inference-Time Decision Calibration for Temporal Classification

Phys-JEPA: Physics-Informed Latent World Models for Multivariate Time-Series Forecasting

Scaling Adaptive Depth with Norm-Agnostic Residual Networks

From Tokens to Regions: CUDA-Sensitive Instruction Tuning for GPU Kernel Generation

LiFT: Local Search via Linear Programming for Overfitting-Controlled Transformers

QK-Normed MLA: QK normalization without full key caching

CacheMuon: Using Temporal Preconditioning To Approximate Polar Factor

Robust Neural Tucker Factorization with Bias Correction and Adaptive Initialization

Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation

SDS-LoRA: Overcoming Anisotropic Gradient Scaling in Low-Rank Adaptation

SPRI: SVD-Partitioned Residual Initialization for Data-Constrained MoE Upcycling

RepNet: Tackling spectral bias in deep neural networks via parameter reparameterization

Entropy-Gated Latent Recursion

SPICE: Synergy and Partial Information Based Curriculum Evolution

Adaptive inference and function vectors in deep transformers

Taming Curvature: Architecture Warm-Up for Stable Transformer Training

Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization

Factorized Neural Operators Decompose Dynamic and Persistent Responses

Scalable Circuit Learning for Interpreting Large Language Models

Scalable Pairwise Kernel Learning with Stochastic Vec Trick

HAMON: Passive Optical Sequence Mixing for Long-Horizon Forecasting

Spatial Priors via Space Filling Curves for Small and Limited Data Vision Transformers

An Empirical Analysis of Optimization Dynamics and Sparsity Boundaries in Large-Scale Pedestrian Attribute Recognition

Simplifying the Modeling of Arbitrary Conditionals in Natural Language

Harnessing cortical geometry, wiring, and function as inductive biases for recurrent neural networks

AI Engram: In Search of Memory Traces in Artificial Intelligence

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

HiRo: A Compact Four-Directional Hierarchical Reservoir Token-Mixer for Efficient Image Classification

Rethinking the Role of Efficient Attention in Hybrid Architectures

Schattor: Schatten-family methods for deep learning optimization

Acoustic Prompting via Stage-wise Modulation for Few-Shot Learning in Audio Language Models

Learning a Sampling-Free Variational DNN Plugin from Tiny Training Sets to Refine OOD Segmentation With Uncertainty Estimation

Latent Thought Flow: Efficient Latent Reasoning in Large Language Models

Attention is Just Another Name for Coupling?: A Fast-Slow ODE Perspective on Hierarchical Pretraining

Gen-VCoT: Generative Visual Chain-of-Thought Reasoning via Diffusion-Based RGB Intermediate Representations

Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter

ActiveSAM: Image-Conditional Class Pruning for Fast and Accurate Open-Vocabulary Segmentation

The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers

Geometric Action Model for Robot Policy Learning

Learning in the Recurrent State: Gradient Descent with Linear Recurrent Networks

Enhancing Physics-Informed Neural Networks Through Feature Engineering

Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality

AC-ODM: Actor--Critic Online Data Mixing for Sample-Efficient LLM Pretraining

Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons

FlowState: Sampling-Rate-Equivariant Time-Series Forecasting

Adaptive $k$NN graph model

Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification

Smoothness Errors in Dynamics Models and How to Avoid Them

How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs

TextResNet: Decoupling and Routing Optimization Signals in Compound AI Systems via Deep Residual Tuning

TS-Memory: Plug-and-Play Memory for Time Series Foundation Models

GauS: Differentiable Scheduling Optimization via Gaussian Reparameterization

IGLU: The Integrated Gaussian Linear Unit Activation Function

Entropy-Aware On-Policy Distillation of Language Models

Manifold-Orthogonal Dual-spectrum Extrapolation for Parameterized Physics-Informed Neural Networks

Gated QKAN-FWP: Scalable Quantum-inspired Sequence Learning

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

Multi-Scale Separable Fourier Neural Networks for Solving High-Frequency PDEs

Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

TS-ICL: A Flexible Time-Indexed Foundation Model for Time Series via In-Context Learning

On the Geometry of On-Policy Distillation

DOG-DPO:Dynamic Optimization in Geometry for Safety Alignment

Overcoming Rank Collapse in Feedback Alignment

A theoretical model for task routing in mixture-of-expert transformers

It's About Time: Temporal References in Emergent Communication

Deep Neural Networks: A Formulation Via Non-Archimedean Analysis

No One-Size-Fits-All Neurons: Task-based Neurons for Artificial Neural Networks

PURe: A Plug-and-Play Product-Unit Residual Module for Vision Networks

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling