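arXivDaily daily academic digest: synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.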

2606.13705 2026-06-15 cs.LG cs.AI New submission

Can Editing 1 Neuron Fix Repetition Loops in LLMs?

编辑1个神经元能修复LLM中的重复循环吗？

Aristotelis Lazaridis, Aman Sharma, Dylan Bates, Brian King, Vincent Lu, Jack FitzGerald

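Affiliations: Edgerunner AI

AI summary: The paper finds that Gemma 4 models fall into repetition loops on long factual enumeration tasks at rates as high as 95%. Per-layer ablation and per-neuron attribution trace the loops to a small set of MLP neurons, and static weight edits (as small as sign-inverting a single neuron) remove them, but the edits cannot fix "doom loops" caused by missing knowledge.

Chinese abstract (AI translation)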

是的。它能治愈末日循环吗？可能不行。Gemma 4指令微调模型存在一个可复现的失败：在长事实列举提示（如列出电视剧的每一集、88个IAU星座或151个原始宝可梦）上，它们会崩溃成重复，要么是严格的逐字循环，要么是列表条目退化到单一答案。这些循环的发生率高达95%，并且能抵抗提示改写、推理引擎更改和大多数采样调整。在本文中，我们探讨这种行为是否足够局部化，从而可以通过权重编辑来消除。为了定位原因，我们使用逐层消融和逐神经元归因，然后通过完整生成扫描确认最强候选。循环追溯到一小部分MLP神经元（或者在26B-A4B混合专家模型中，几个路由专家），我们通过静态权重编辑抑制它们。这些“手术”可以小到单个符号反转的神经元（在E2B模型中）。有效编辑的大小随模型规模增长，但在所有情况下，循环模式可以在正常生成预算内解决，同时保持通用基准分数。然而，编辑并不能解决所有问题：我们还研究了更长的思考预算，其中两个较大的模型最明显地进入末日循环，即模型在无法回忆的事实上自我纠正的循环，耗尽预算而不给出最终答案。我们表明，这种残余失败通过相同的编辑减少但未消除，并认为它本质上是知识精度问题，而非可移除的电路；权重手术可以删除循环，但不能提供缺失的事实。我们的结果既是可行性证明——即具体的生成病理可以定位到少数参数并编辑掉——也是对该方法适用范围的界定。

English abstract

Yes. Can it cure doom loops? Probably not. The Gemma 4 instruction-tuned models share a reproducible failure: on long factual enumeration prompts, such as listing every episode of a TV series, the 88 IAU constellations, or the 151 original Pokemon, they collapse into repetition, either a tight verbatim loop or a list whose entries decay onto a single answer. These loops occur at rates as high as 95% and survive prompt rewording, inference-engine changes, and most sampling adjustments. In this paper we explore whether this behavior is localized enough to remove by weight edits. To localize the cause, we use per-layer ablation and per-neuron attribution, then confirm the strongest candidates with full-generation sweeps. The loops trace to a small set of MLP neurons (or, in the 26B-A4B Mixture-of-Experts model, a few routed experts) which we suppress with static weight edits. These "surgeries" can be as small as a single sign-inverted neuron (in the E2B model). The size of the effective edits grows with model scale, but in all cases, the loop patterns can be addressed at normal generation budgets while preserving general-purpose benchmark scores. However, the edits do not solve everything: we also study longer thinking budgets, where the two larger models most visibly enter doom looping, i.e. a non-convergent regime in which the model self-corrects in circles over a fact it cannot recall, exhausting the budget without committing to a final answer. We show this residual failure is reduced but not eliminated by the same edits, and argue it is fundamentally a knowledge-precision problem rather than a removable circuit; weight surgery can delete a loop, but it cannot supply a missing fact. Our results are both a feasibility demonstration, that is, evidence that a concrete generation pathology can be localized to a few parameters and edited out, and a delineation of where that approach stops.
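
Editor's sketch: the abstract describes a static weight edit as small as one sign-inverted neuron. The toy below shows what such an edit could look like on a tiny two-layer MLP held in plain Python lists. It assumes that "sign-inverting neuron j" means negating that neuron's output (down-projection) weights; the abstract does not say which weights are edited, and the network, sizes, and function names are all hypothetical.

```python
# Toy illustration only; not the paper's code or the Gemma architecture.
# Assumption: inverting a neuron = negating its row in the down-projection.

def relu(x):
    return x if x > 0.0 else 0.0

def mlp_forward(x, w_up, w_down):
    """x: input vector; w_up: [hidden][in]; w_down: [hidden][out]."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_up]
    n_out = len(w_down[0])
    return [sum(hidden[j] * w_down[j][k] for j in range(len(hidden)))
            for k in range(n_out)]

def invert_neuron(w_down, j):
    """Return a copy of w_down with hidden neuron j's output weights negated."""
    edited = [row[:] for row in w_down]
    edited[j] = [-w for w in edited[j]]
    return edited

w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_down = [[0.5, -0.5], [0.25, 0.25], [2.0, 0.0]]
x = [1.0, 2.0]

before = mlp_forward(x, w_up, w_down)                   # neuron 2 dominates output 0
after = mlp_forward(x, w_up, invert_neuron(w_down, 2))  # its contribution now flips sign
```

The edit is static: it changes stored weights once, with no change to prompts, sampling, or inference code.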

2606.13754 2026-06-15 cs.LG New submission

D2H-AD: A Hybrid Model Utilizing Hyperdimensional Computing for Advanced Anomaly Detection

D2H-AD：一种利用超维度计算进行高级异常检测的混合模型

Ghazal Ghajari, Elaheh Ghajari, Ashutosh Ghimire, Saeid Ataei, Faris Alsulami, Fathi Amsaad

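Affiliations: Wright State University; Azad University; Stevens Institute of Technology; University of Jeddah

AI summary: Proposes D2H-AD, an anomaly detection framework based on hyperdimensional computing that combines distance-based similarity and density-aware encoding in a unified representation. It outperforms existing methods on several benchmark datasets and is lightweight, interpretable, and efficient.

DOI: 10.1109/ACCESS.2026.3677763

Chinese abstract (AI translation)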

异常检测是智能系统的基本组成部分，应用于医疗、网络安全、智能电网和物联网环境。尽管传统的机器学习和深度学习方法在识别异常方面表现出有效性，但它们通常依赖大量标记数据集、计算成本高，并在边缘和高维场景中面临可扩展性挑战。本文提出D2H-AD，一种基于超维度计算（HDC）的新型异常检测框架，HDC是一种受大脑启发的范式，使用高维分布式向量表示信息。与现有基于HDC的方法不同，D2H-AD在统一框架中集成了基于距离的相似性和密度感知编码，改进了异常表示和检测性能。消融研究表明，仅超维度编码相比直接在原始特征空间应用相同的密度-距离评分，ROC-AUC提升高达5.4%。此外，D2H-AD在所有评估数据集上始终优于五个基线方法：HDAD、ODHD、一类SVM、孤立森林和自编码器。该框架轻量、可解释且计算高效，适用于资源受限和实时应用。我们在五个基准数据集上验证了D2H-AD，展示了优越的F1分数和ROC-AUC性能，以及对类别不平衡、噪声和数据复杂性的鲁棒性。除了提高准确性，D2H-AD还提供可扩展性、小内存占用和低延迟操作，这得益于二进制计算和紧凑设计。这些特性使其特别适用于TinyML和边缘AI部署。所提出的框架突显了HDC在动态环境中进行准确、可解释和节能异常检测的潜力。

English abstract

Anomaly detection is a fundamental component of intelligent systems with applications in healthcare, cybersecurity, smart grids, and IoT environments. Although conventional machine learning and deep learning methods have demonstrated effectiveness in identifying anomalies, they often rely on large labeled datasets, incur high computational costs, and face scalability challenges in edge and high-dimensional settings. This paper presents D2H-AD, a novel anomaly detection framework based on Hyperdimensional Computing (HDC), a brain-inspired paradigm that represents information using high-dimensional distributed vectors. Unlike existing HDC-based methods, D2H-AD integrates distance-based similarity and density-aware encoding within a unified framework, improving anomaly representation and detection performance. Ablation studies show that hyperdimensional encoding alone yields up to 5.4% higher ROC-AUC than applying the same density-distance scoring directly in the original feature space. Furthermore, D2H-AD consistently outperforms five established baselines, namely HDAD, ODHD, One-Class SVM, Isolation Forest, and Autoencoders, across all evaluated datasets. The framework is lightweight, interpretable, and computationally efficient, making it suitable for resource-constrained and real-time applications. We validate D2H-AD on five benchmark datasets and demonstrate superior F1-score and ROC-AUC performance, together with robustness to class imbalance, noise, and data complexity. In addition to improved accuracy, D2H-AD offers scalability, a small memory footprint, and low-latency operation enabled by binary computations and a compact design. These properties make it particularly attractive for TinyML and edge AI deployments. The proposed framework highlights the potential of HDC for accurate, interpretable, and energy-efficient anomaly detection in dynamic environments.

2606.13803 2026-06-15 cs.LG New submission

Neural Slack Variables for Shape Constraints

形状约束的神经松弛变量

Ruben Wiedemann, Antoine Jacquier, Lukas Gonon

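Affiliations: Imperial College London; University of St. Gallen

AI summary: Proposes neural slack variables, which turn constraint enforcement into a regression problem by jointly learning an auxiliary network. The method achieves zero measured violations on monotonicity and convexity test cases and is applied to learning financial volatility surfaces.

Chinese abstract (AI translation)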

在神经网络中强制执行单调性和凸性等函数不等式约束是许多工业和科学应用中的基本挑战。经典的惩罚方法和基于互补松弛性的原始-对偶方法仅在违反位置提供约束梯度，导致约束满足脆弱。另一方面，通过构造保证可行性的架构仍然主要限于简单情况，并引入额外的归纳偏差。我们提出神经松弛变量，一种深度学习原生的原始侧方法，通过将主网络与联合学习的辅助网络耦合，将约束强制执行转化为回归问题。辅助网络作为主网络约束量的有效目标，诱导可行性和正则性。神经松弛变量在密集网格的单调性和凸性测试案例上实现了零测量违规，而惩罚和原始-对偶基线存在残余违规，并实现了波动率曲面的无套利学习，这是量化金融中的一个开放工业挑战。

English abstract

Enforcing functional inequality constraints such as monotonicity and convexity in neural networks is a fundamental challenge in many industrial and scientific applications. Classical one-sided penalty methods, along with primal-dual methods gated by complementary slackness, provide constraint gradients only at violated locations, resulting in fragile satisfaction. Architectures that guarantee feasibility by construction, on the other hand, remain largely limited to elementary cases and impose additional inductive biases. We introduce neural slack variables, a deep learning native primal-side approach that converts constraint enforcement into a regression problem by coupling the primary network with a jointly learned auxiliary network. The auxiliary network serves as a valid target for the primary network's constraint quantities, inducing feasibility and regularity. Neural slack variables achieve zero measured violations on dense-grid monotonicity and convexity test cases, where penalty and primal-dual baselines leave residual violations, and enable arbitrage-free learning of volatility surfaces, an open industrial challenge in quantitative finance.

2606.13862 2026-06-15 cs.LG cs.AI cs.CL New submission

SuperThoughts: Reasoning Tokens in Superposition

SuperThoughts: 叠加中的推理令牌

Zheyang Xiong, Shivam Garg, Max Yu, Vaishnavi Shrivastava, Haoyu Zhao, Anastasios Kyrillidis, Dimitris Papailiopoulos

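Affiliations: University of Wisconsin-Madison; Microsoft Research; Independent; Princeton University; Rice University

AI summary: Proposes SuperThoughts, which compresses pairs of consecutive CoT tokens into a single latent representation and decodes them with a multi-token prediction module. This keeps discrete token supervision during training while doubling inference throughput, yielding roughly 20-30% shorter CoT with minimal accuracy loss.

Chinese abstract (AI translation)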

长链思维（CoT）推理提升了LLM的问题解决能力，但由于顺序生成令牌导致计算成本高昂。尽管近期工作探索在连续潜在空间中进行推理以绕过离散令牌生成，但这些方法常面临训练稳定性问题，且因缺乏监督信号而难以扩展到复杂的长程任务。我们提出SuperThoughts，将连续的CoT令牌对压缩为单一潜在表示，并通过轻量级多令牌预测（MTP）模块每步解码两个令牌。这既在训练时保留了离散令牌监督，又在推理时使吞吐量翻倍。我们在Qwen2.5-Math-1.5B-Instruct、Qwen2.5-Math-7B-Instruct、Qwen2.5-Math-14B-Instruct上进行微调，并在MATH500、AMC、OlympiadBench和GPQA-Diamond上评估。通过基于置信度的自适应机制（在不确定时回退到标准解码），SuperThoughts实现了约20-30%的CoT长度缩减，同时保持精度，在大多数任务上仅下降1-2个准确率点。

English abstract

Long Chain-of-Thought (CoT) reasoning improves LLM problem-solving but is computationally expensive due to sequential token generation. While recent works explore reasoning in continuous latent spaces to bypass discrete token generation, they often struggle with training stability and fail to scale to complex, long-horizon tasks due to lack of supervision signal. We propose SuperThoughts, which compresses pairs of consecutive CoT tokens into single latent representations and decodes two tokens per step via a lightweight Multi-Token Prediction (MTP) module. This preserves discrete token supervision at training time while doubling throughput at inference time. We finetune Qwen2.5-Math-1.5B-Instruct, Qwen2.5-Math-7B-Instruct, and Qwen2.5-Math-14B-Instruct, and evaluate on MATH500, AMC, OlympiadBench, and GPQA-Diamond. With a confidence-based adaptive mechanism that falls back to standard decoding when uncertain, SuperThoughts achieves ~20-30% CoT length reduction while maintaining accuracy with minimal degradation (a drop of 1-2 accuracy points on most tasks).
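
Editor's sketch: the confidence-based fallback can be pictured as a per-step choice between emitting two tokens at once and emitting one token by standard decoding. The counter below is a minimal illustration of that control flow only. The threshold value, the confidence scores, and the cycling of scores are hypothetical; the abstract does not say how confidence is computed.

```python
# Hypothetical control-flow sketch; no model is involved.

def count_decode_steps(pair_confidences, n_tokens, threshold=0.9):
    """Count decoding steps needed to emit n_tokens.

    pair_confidences: made-up confidence scores for successive two-token
    proposals (cycled if the list is shorter than the number of steps).
    A step emits 2 tokens when confidence >= threshold, otherwise it falls
    back to 1 token, mimicking standard decoding.
    """
    emitted, steps = 0, 0
    while emitted < n_tokens:
        conf = pair_confidences[steps % len(pair_confidences)]
        can_pair = n_tokens - emitted >= 2
        emitted += 2 if (conf >= threshold and can_pair) else 1
        steps += 1
    return steps

mixed = count_decode_steps([0.95, 0.5, 0.99, 0.97], n_tokens=10)
always_confident = count_decode_steps([1.0], n_tokens=10)
never_confident = count_decode_steps([0.0], n_tokens=10)
```

In this toy, always accepting the pair halves the step count and never accepting it recovers one token per step; mixed confidence lands in between.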

2606.13901 2026-06-15 cs.LG cs.NE New submission

SpikF-GO: Spiking Fourier Graph Operators for Multivariate Time Series Forecasting

SpikF-GO: 用于多元时间序列预测的尖峰傅里叶图算子

Jafar Bakhshaliyev, Niels Landwehr

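Affiliations: Data Science Group, University of Hildesheim

AI summary: Existing SNN forecasting methods do not model inter-variable dependencies. SpikF-GO addresses this with a hypervariate graph formulation and spike-driven spectral processing, combining a learnable sparse frequency gate with a Complex LIF gate. Under a unified protocol it achieves the best average rank among SNN methods while reducing energy consumption.

Comments: 23 pages, 2 figures, 11 tables. Accepted for presentation at ECML PKDD 2026. Code: https://github.com/jafarbakhshaliyev/SpikF-GO

Chinese abstract (AI translation)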

尖峰神经网络（SNNs）已成为传统神经网络的一种节能替代方案，在计算机视觉和机器人技术中表现出强劲性能。最近，SNNs已被应用于时间序列预测（TSF），相关方法探索了尖峰时间骨干网络、尖峰兼容位置编码、傅里叶域处理以及重新设计的神经元动力学。然而，现有的SNN预测方法独立处理变量，缺乏对变量间依赖关系建模的显式机制。在多元设置中，跨变量相关性携带大量预测信息，这是一个关键限制。我们提出了尖峰傅里叶图算子（SpikF-GO），通过结合超变量图公式（其中每个标量观测值成为一个图节点）和尖峰驱动谱处理来解决这一空白。SpikF-GO引入了一个硬混凝土频率门用于可学习的稀疏频率选择，以及一个复数LIF门，该门对实部和虚部傅里叶分量应用独立的尖峰神经元，在整个谱域中保持二进制事件驱动计算。我们进一步提出了一个变体，结合了基于中央模式生成器的位置编码，以增强长程时间建模。在统一实验协议下对八个基准进行评估，SpikF-GO在所有SNN方法中取得了最佳平均排名，并以更低的能耗优于其ANN对应方法FourierGNN。即使在显著更小的嵌入维度下，SpikF-GO仍保持竞争性精度，从而实现了显著的能耗降低。据我们所知，这是首批将基于图的多元建模引入尖峰领域用于TSF的工作之一，也是首个在共同实验协议下提供SNN预测架构统一比较的工作。

English abstract

Spiking Neural Networks (SNNs) have emerged as an energy-efficient alternative to conventional neural networks, demonstrating strong performance in computer vision and robotics. More recently, SNNs have been applied to time series forecasting (TSF), with methods exploring spiking temporal backbones, spike-compatible positional encodings, Fourier-domain processing, and redesigned neuron dynamics. However, existing SNN forecasting approaches process variables independently, lacking explicit mechanisms for modeling inter-variable dependencies. This is a critical limitation in multivariate settings, where cross-variable correlations carry substantial predictive information. We propose Spiking Fourier Graph Operators (SpikF-GO), which addresses this gap by combining a hypervariate graph formulation in which every scalar observation becomes a graph node with spike-driven spectral processing. SpikF-GO introduces a Hard Concrete frequency gate for learnable sparse frequency selection and a Complex LIF gate that applies independent spiking neurons to real and imaginary Fourier components, preserving binary, event-driven computation throughout the spectral domain. We further present a variant incorporating Central Pattern Generator-based positional encodings for stronger long-range temporal modeling. Evaluated on eight benchmarks under a unified experimental protocol, SpikF-GO achieves the best average rank among all SNN methods and outperforms its ANN counterpart, FourierGNN, at reduced energy cost. SpikF-GO maintains competitive accuracy even at substantially smaller embedding dimensions, thereby achieving significant energy reductions. To our knowledge, this is among the first works to bring graph-based multivariate modeling into the spiking domain for TSF and the first to provide a unified comparison across SNN forecasting architectures under a common experimental protocol.

2606.14040 2026-06-15 cs.LG New submission

Decompose Sparsely Where You Should, Absorb Densely Where You Should Not

在应当稀疏处分解，在应当稠密处吸收

Ruixuan Deng, Zehao Jin, Zekun Wang, Zihan Dong

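Affiliations: Georgia Institute of Technology

AI summary: Sparse autoencoders assume that all activation content can be sparsely decomposed. The paper challenges this by adding a low-rank linear bottleneck in parallel with a standard SAE to absorb the dense component. On Gemma-2-2B layer 12, a rank-24 bottleneck reduces dense latents by up to 84%, and the absorbed component turns out to be a computational scaffold that is structurally identifiable, causally necessary, and redundantly encoded by the sparse dictionary.

Chinese abstract (AI translation)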

稀疏自编码器（SAE）通常被训练为通过稀疏字典重建残差流的全部内容，隐含假设所有激活内容都适合稀疏、单语义的分解。我们质疑这一假设，并推测激活包含一个低秩、稠密的成分，该成分对模型计算重要但本质上不适合稀疏表示，这是训练SAE中广泛观察到的持久稠密潜变量的主要来源。为验证这一点，我们在标准SAE（BatchTopK和Matryoshka）旁添加一个小型秩r线性瓶颈，使得稠密结构在稀疏重建前被吸收。在Gemma-2-2B第12层上，秩24瓶颈将稠密潜变量计数减少高达84%，同时在匹配稀疏度下改善了两种架构的稀疏探测和定向探测扰动。被吸收的成分（i）在结构上可识别，即顶部主成分和离群维度；（ii）在因果上必要，移除它会使下一个token的交叉熵增加7.5倍，远超移除几何上几乎相同的顶部24个PCA方向带来的2.8倍增加；（iii）被稀疏字典冗余编码，消融787个最大对齐的稀疏特征仅使交叉熵增加2.9倍，消融2048个主题对齐特征几乎不改变MMLU主题分类，而移除脚手架则使其从98.7%降至随机水平。综合来看，我们的发现识别出残差流激活中一个紧凑、语义信息丰富且因果重要的成分（我们称之为计算脚手架），标准稀疏字典对其表示效率低下，表明基于稀疏性的可解释性方法的适用范围需要谨慎重新审视。

English abstract

Sparse autoencoders (SAEs) are typically trained to reconstruct the entire residual stream through a sparse dictionary, implicitly assuming that all activation content is amenable to sparse, monosemantic decomposition. We question this assumption and hypothesize that activations contain a low-rank, dense component that is computationally important to the model yet inherently unsuitable for sparse representation, which serves as a major source of the persistent dense latents widely observed in trained SAEs. To test this, we add a small rank-r linear bottleneck in parallel with standard SAEs (BatchTopK and Matryoshka), allowing dense structure to be absorbed before sparse reconstruction. On Gemma-2-2B layer 12, a rank-24 bottleneck reduces dense latent count by up to 84% while improving sparse probing and targeted probe perturbation on both architectures at matched sparsity. The absorbed component is (i) structurally identifiable as the top principal components and outlier dimensions; (ii) causally necessary, with removing it raising next-token cross-entropy by 7.5×, far exceeding the 2.8× from removing the geometrically near-identical top-24 PCA directions; and (iii) redundantly encoded by sparse dictionaries, with ablating 787 maximally aligned sparse features raising cross-entropy by only 2.9× and ablating 2,048 topic-aligned features leaving MMLU topic classification virtually unchanged, whereas removing the scaffold drops it from 98.7% to chance. Together, our findings identify a compact, semantically informative and causally important component of residual stream activations (which we term a computational scaffold) that standard sparse dictionaries represent inefficiently, suggesting that the scope of sparsity-based interpretability methods warrants careful re-examination.

2606.14079 2026-06-15 cs.LG New submission

Deep Spectral Learning of Embedded Latent Transfer Operators for Stochastic Dynamical Systems

随机动力系统的嵌入潜转移算子深度谱学习

Ryogo Tanaka, Yoshinobu Kawahara

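Affiliations: Graduate School of Information Science and Technology, The University of Osaka; Center for Advanced Intelligence Project, RIKEN

AI summary: Proposes a Deep Spectral Encoder in which learnable nonlinear feature maps define Markovian latent states. Transfer and observation operators are estimated via functional canonical correlation analysis and Galerkin projection, enabling Bayesian filtering and Koopman spectral decomposition, with stable and superior performance under noise and partial observability.

Comments: Accepted at the 42nd Conference on Uncertainty in Artificial Intelligence (UAI 2026)

Chinese abstract (AI translation)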

我们提出了一种用于随机非线性动力系统的谱学习方法，该方法在深度特征空间中用嵌入的潜转移算子表示。我们将该方法实例化为深度谱编码器（DSE），一种基于算子的潜状态空间模型，其中时不变神经编码器从观测中实现可学习的非线性特征映射，这些特征定义了马尔可夫潜状态，其时间演化和观测映射分别由转移算子和观测算子描述。在可学习的Galerkin投影特征空间中的泛函典型相关分析提供了来自过去和未来观测的状态坐标，两个线性算子以岭正则化的闭式解形式在状态坐标上估计，这些解与相关算子的Galerkin投影一致。在此表示上，我们推广了特征空间中的序贯贝叶斯滤波和Koopman谱模态分解。多个场景的实验表明，即使在噪声和部分可观测条件下，与序贯贝叶斯滤波和动态模式分解基线相比，该方法性能稳定且优越。

English abstract

We propose a spectral learning method for stochastic nonlinear dynamical systems represented with embedded latent transfer operators in deep feature spaces. We instantiate the method as Deep Spectral Encoder (DSE), an operator-based latent state-space model in which a time-invariant neural encoder implements learnable nonlinear feature maps from observations, and these features define Markovian latent states whose temporal evolution and observation mapping are described by the transfer and observation operators, respectively. Functional canonical correlation analysis in a learnable Galerkin-projected feature space provides state coordinates from past and future observations, and the two linear operators are estimated on the state coordinates as ridge-regularized closed-form solutions that coincide with Galerkin projections of the associated covariance operators. On this representation, we generalize sequential Bayesian filtering and Koopman spectral mode decomposition in feature space. Experiments on several scenarios show stable and superior performance compared with sequential Bayesian filtering and dynamic mode decomposition baselines, even under noise and partial observability.

2606.14156 2026-06-15 cs.LG cs.AI New submission

Learning High Coverage Discriminative Parsimonious Rulesets

学习高覆盖判别性简约规则集

Mariamma Antony, Raman Sankaran, Chiranjib Bhattacharyya, Uma Satya Ranjan

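Affiliations: Indian Institute of Science; Compass

AI summary: Proposes CDPR, which uses a submodular maximization algorithm to learn rulesets that are high-coverage, discriminative, and parsimonious. It markedly improves interpretability while maintaining high accuracy, improving coverage by more than 2.5x over the next-best algorithm.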

做我的导师：通过同伴反馈实现互惠LLM改进的在策略共蒸馏

Woohyeon Byeon, Jiwon Jeon, Jeonghye Kim, Youngchul Sung

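Affiliations: KAIST

AI summary: Proposes On-Policy Co-Distillation (OPCoD), in which two strong models from different domains teach each other through epistemic gating and feedback anchoring, achieving Pareto improvement.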

2606.14388 2026-06-15 cs.LG New submission

A Low-Rank Subspace Analysis of LLM Interventions

LLM干预的低秩子空间分析

Angira Sharma, Christian Schroeder de Witt, Philip Torr, Anisoara Calinescu, Jialin Yu

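Affiliations: University of Cambridge

AI summary: Proposes a diagnostic framework that models behaviors as low-rank subspaces in activation space. Intervening on one behavior affects others asymmetrically, and the size of the effect correlates with subspace overlap and with the angle to the decision subspace.

Comments: Mechanistic Interpretability Workshop @ ICML 2026

Chinese abstract (AI translation)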

旨在修改LLM特定行为（如拒绝或谄媚）的干预措施通常会在其他行为中产生意外变化。这种缺乏针对性控制使得设计和实施可靠的安全控制变得困难。为了理解这些副作用，我们引入了一个诊断框架来分析LLM中交互行为。我们将行为建模为激活空间中的低秩子空间，并研究干预如何跨行为产生影响。在多个指令调优模型（7B-70B）以及拒绝、越狱和谄媚设置中，我们发现不同行为共享内部表示，并且干预一个行为会以不对称方式改变其他行为。一些行为作为上游控制点，其干预广泛传播到其他行为，而另一些则更为孤立。我们将这些效应与两个几何量联系起来：（i）行为子空间之间的重叠，以主角度平均余弦平方度量；（ii）每个行为子空间与决策子空间（捕捉模型最终决策，如拒绝与服从）之间的角度。经验上，对于子空间重叠较高的行为对，以及子空间更接近（角度更小）决策子空间的源行为，干预对其他行为的影响往往更大。这些发现突显了针对性行为控制的挑战：行为难以独立修改，因为干预可以通过共享表示和不对称交互传播。

English abstract

Interventions designed to modify a particular behavior in LLMs, such as refusal or sycophancy, often produce unintended changes in other behaviors. This lack of targeted control makes it difficult to design and implement reliable safety controls. To understand these side effects, we introduce a diagnostic framework for analyzing interacting behaviors in LLMs. We model behaviors as low-rank subspaces in activation space, and study how the effects of interventions propagate across behaviors. Across multiple instruction-tuned models (7B-70B) and across refusal, jailbreak, and sycophancy settings, we find that different behaviors share internal representations, and intervening on one behavior alters others in asymmetric ways. Some behaviors act as upstream control points whose interventions propagate broadly across other behaviors, while others remain more isolated. We relate these effects to two geometric quantities: (i) the overlap between behavior subspaces, measured as the average squared cosine of principal angles, and (ii) the angle between each behavior subspace and the decision subspace (capturing the model's final decision, e.g., refuse vs. comply). Empirically, intervention effects on other behaviors tend to be larger for behavior pairs with higher subspace overlap, and for source behaviors whose subspaces lie closer (smaller angle) to the decision subspace. These findings highlight a challenge for targeted behavior control: behaviors are difficult to modify independently, as interventions can propagate through shared representations and asymmetric interactions.
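
Editor's sketch: the overlap measure named in the abstract, the average squared cosine of the principal angles, can be computed without an SVD. For orthonormal bases U and V, the cosines of the principal angles are the singular values of UᵀV, so their squares sum to the squared Frobenius norm of UᵀV. The pure-Python version below assumes the subspaces are given as lists of spanning vectors and averages over the smaller subspace dimension; the function names are illustrative, not the paper's.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormalize(vectors):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def subspace_overlap(span_a, span_b):
    """Mean squared cosine of principal angles = ||Qa^T Qb||_F^2 / min(dims)."""
    qa, qb = orthonormalize(span_a), orthonormalize(span_b)
    k = min(len(qa), len(qb))
    return sum(dot(u, v) ** 2 for u in qa for v in qb) / k

# Two planes in 3-D sharing one axis: principal angles are 0 and 90 degrees.
plane_xy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
plane_xz = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
overlap = subspace_overlap(plane_xy, plane_xz)
```

An overlap of 1 means one subspace contains the other, and 0 means they are orthogonal.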

2606.14463 2026-06-15 cs.LG New submission

EM-NeSy: Expectation Maximization for Neurosymbolic Learning

EM-NeSy：神经符号学习的期望最大化

Annegret Seibt, Luc De Raedt, Giuseppe Marra

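Affiliations: Department of Computer Science; KU Leuven

AI summary: Proposes EM-NeSy, a framework that casts probabilistic neurosymbolic learning as an instance of the Expectation-Maximization algorithm. Symbol posteriors are computed by probabilistic inference and gradient updates flow only through the neural component, which allows scalable and efficient approximate inference.

Chinese abstract (AI translation)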

神经符号（NeSy）模型融合神经网络和符号推理，以实现鲁棒且可解释的人工智能。最先进的NeSy模型要求符号组件以可微分方式表达，这常常使近似推理的使用复杂化。我们提出EM-NeSy，将概率神经符号学习视为期望最大化（EM）算法的一个实例。在期望步骤中，我们通过概率推理计算基于标签的神经预测符号的后验。在最大化步骤中，我们仅通过神经组件使用梯度下降基于该后验更新神经参数。该公式释放了EM算法在NeSy学习中的全部潜力。它允许NeSy自然地扩展到近似推理，无需对符号组件进行任何额外修改或可微分性要求。此外，在精确推理下，它恢复了标准的端到端基于梯度的NeSy设置。我们的实验结果证明了EM-NeSy的可扩展性和计算效率。

English abstract

Neurosymbolic (NeSy) models integrate neural networks and symbolic reasoning for robust and interpretable AI. State-of-the-art NeSy models require that the symbolic component is expressed in a differentiable way, often complicating the use of approximate inference. We propose EM-NeSy which casts probabilistic NeSy learning as an instance of the Expectation-Maximization (EM) algorithm. In the expectation step, we compute the posterior over the neurally predicted symbols conditioned on the label via probabilistic inference. In the maximization step, we update the neural parameters based on this posterior using gradient descent only through the neural component. This formulation unlocks the full potential of the EM algorithm for NeSy learning. It allows NeSy to extend naturally to approximate reasoning without any additional modifications or differentiability requirements of the symbolic component. Furthermore, it recovers the standard end-to-end gradient-based NeSy setting under exact inference. Our experimental results demonstrate the scalability and computational efficiency of EM-NeSy.
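
Editor's sketch: to make the two steps concrete, here is a toy E-step for a made-up task in which two neurally predicted symbols must add up to an observed label. The posterior is obtained by exact enumeration, and its marginals are the soft targets an M-step could fit with an ordinary cross-entropy loss on the neural outputs alone. The addition task, the probabilities, and the function names are assumptions for illustration; they do not come from the paper, which emphasizes that approximate inference can take the place of enumeration.

```python
# Toy E-step: posterior over symbol pairs (a, b) given the label a + b.

def e_step(p1, p2, label):
    """p1, p2: predicted distributions over symbols 0..n-1 for each input.
    Returns q(a, b | label), proportional to p1[a] * p2[b] when a + b == label.
    Assumes the label is reachable, so the normalizer is non-zero."""
    joint = {(a, b): p1[a] * p2[b]
             for a in range(len(p1)) for b in range(len(p2))
             if a + b == label}
    z = sum(joint.values())
    return {pair: v / z for pair, v in joint.items()}

def soft_targets(posterior, n1, n2):
    """Marginalize the posterior into per-symbol targets for the M-step."""
    m1, m2 = [0.0] * n1, [0.0] * n2
    for (a, b), q in posterior.items():
        m1[a] += q
        m2[b] += q
    return m1, m2

p1 = [0.5, 0.5]    # network belief about the first symbol
p2 = [0.25, 0.75]  # network belief about the second symbol
posterior = e_step(p1, p2, label=1)
target1, target2 = soft_targets(posterior, 2, 2)
```

Because the targets are fixed numbers once the E-step finishes, no gradient has to pass through the symbolic part.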

2606.14530 2026-06-15 cs.LG New submission

Code Correctness Signals in LLM Hidden States: Pre-Generation Probing and Repair Geometry

LLM隐藏状态中的代码正确性信号：生成前探测与修复几何

Carlo Di Cicco

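Affiliations: Independent researcher

AI summary: Using residualization, the paper finds that code correctness is linearly decodable from the pre-generation hidden states of Qwen3-4B-Instruct-2507 (AUC 0.931), whereas the directional signal for repair success disappears after controlling for repair-context covariates. It thus reports both a positive and a negative result.

Comments: 12 pages, 8 tables. Code, data, and analysis scripts available at https://github.com/CarloDiCicco/ReasoningLab

Chinese abstract (AI translation)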

大型语言模型在其隐藏状态中编码丰富信息。本文研究在Qwen3-4B-Instruct-2507生成之前以及修复失败尝试时，代码正确性是否可从隐藏状态中解读，基于444个LiveCodeBench任务。报告两个发现，通过单一混杂控制工具——残差化联系起来。首先，模型首次尝试代码的正确性可从提示最终隐藏状态线性解码，在50个外部分割上无泄漏的留出AUC为0.931±0.008。从每个隐藏状态维度去除提示长度的线性效应后，探针仍达到0.911±0.010，远高于提示长度基线0.754±0.014。其次，在236个清理后的案例中，模型尝试修复失败的首次尝试，从失败尝试到修复的隐藏状态偏移携带统计上可检测的对比方向，在幅度和分割半测试中均显著高于标签打乱的零假设。该方向在对修复上下文协变量（成功与失败修复间不同）进行条件残差化后不再存在，表明它是修复成功的相关因素，由修复上下文驱动，而非孤立的修复理解特征。探针层通过嵌套交叉验证选择，同样的残差化方法支持了生成前正确性结果，却推翻了修复方向解释。贡献既是方法论上的也是实证上的：一个足够诚实的诊断，同时报告了负面结果和正面结果。

English abstract

Large language models encode rich information in their hidden states. This work asks whether code correctness is legible in the hidden states of Qwen3-4B-Instruct-2507, before it generates and as it repairs a failed attempt, studied on 444 LiveCodeBench tasks. It reports two findings connected by a single confound-control tool: residualization. First, the correctness of the model's first-attempt code is linearly decodable from the prompt-final hidden state, with a leakage-free held-out AUC of 0.931 +/- 0.008 across 50 outer splits. After the linear effect of prompt length is removed from each hidden state dimension, the probe still reaches 0.911 +/- 0.010, well above a prompt-length baseline of 0.754 +/- 0.014. Second, on 236 cleaned cases where the model attempts to repair a failed first attempt, the hidden state shift from the failing attempt to its repair carries a statistically detectable contrastive direction, significant on both a magnitude and a split-half test against label-shuffled nulls. This direction does not survive a conditional residualization against repair-context covariates that differ between successful and failed repairs, marking it as a correlate of repair success driven by the repair context rather than an isolated repair-comprehension feature. The probe layer is selected by nested cross-validation, and the same residualization approach that upholds the pre-generation correctness result overturns the repair-direction interpretation. The contribution is as much methodological as empirical: a diagnostic honest enough to report a negative result alongside a positive one.
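
Editor's sketch: residualization, as used for the prompt-length control, amounts to regressing each hidden-state dimension on the covariate and keeping only the residuals. The snippet below does this for a single dimension with ordinary least squares (intercept included). The numbers are invented, and the paper's exact procedure, for example how the fit is handled across cross-validation splits, may differ.

```python
def residualize(column, covariate):
    """Remove the linear effect of `covariate` from one hidden-state dimension.

    Fits column ~ intercept + slope * covariate by least squares and
    returns the residuals, which are uncorrelated with the covariate.
    """
    n = len(column)
    mean_x = sum(covariate) / n
    mean_y = sum(column) / n
    var_x = sum((x - mean_x) ** 2 for x in covariate)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(covariate, column))
    slope = cov_xy / var_x if var_x > 0 else 0.0
    return [y - mean_y - slope * (x - mean_x)
            for x, y in zip(covariate, column)]

prompt_lengths = [10.0, 20.0, 30.0, 40.0]  # hypothetical covariate
activations = [1.0, 2.5, 2.0, 4.5]         # one hypothetical hidden dimension
residuals = residualize(activations, prompt_lengths)
```

A probe trained on the residualized dimensions can no longer exploit any linear dependence on prompt length, which is the comparison the abstract reports (0.911 versus a prompt-length baseline of 0.754).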

2606.14597 2026-06-15 cs.LG New submission

Zero-shot generalization of transformer neural operators to larger domains

Transformer神经算子对更大领域的零样本泛化

Armand de Villeroché, Sibo Cheng, Vincent Le Guen, Marc Bocquet, Rem-Sophia Mouradi, Patrick Armand, Alban Farchi, Patrick Massin

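Affiliations: CEREA, ENPC, EDF R&D, Institut Polytechnique de Paris; SINCLAIR AI Laboratory; EDF R&D; CEA, DAM, DIF

AI summary: Proposes introducing a decomposable locality bias into the attention logit computation. Combined with rotary positional embeddings, it lets transformer neural operators generalize zero-shot to larger spatial domains, validated on PDE benchmarks and a 3D industrial flow.

Chinese abstract (AI translation)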

基于Transformer的神经算子在逼近复杂几何上偏微分方程的解算子方面表现出色。然而，现有方法隐式假设固定域大小，限制了其推理时的泛化能力。在这项工作中，我们研究了域扩展，即在空间域显著大于训练时遇到的域上进行零样本推理。我们认为这种设置从根本上需要空间局部性和平移等变性。我们提出通过在注意力对数计算中引入可分解偏置来实现这种局部性，从而在保持完全可分解为查询-键内积的同时实现精细可控的局部性，并直接与优化的注意力内核兼容。结合旋转位置嵌入，它能够在不改变Transformer架构的情况下，实现具有可控空间支持的表达性嵌入。我们通过实验表明，我们的方法在两个PDE基准测试和一个3D工业大气流动应用中显著改善了向更大域的零样本泛化。我们的代码和数据集可在以下网址获取：https://github.com/cerea-daml/domain-extension。

English abstract

Transformer-based neural operators have shown remarkable performance for approximating solution operators of partial differential equations on complex geometries. However, existing approaches implicitly assume a fixed domain size, which limits their ability to generalize at inference. In this work, we investigate domain extension, namely zero-shot inference on spatial domains that are significantly larger than those encountered during training. We argue that this setting fundamentally requires spatial locality and translation equivariance. We propose to implement this locality via a decomposable bias in the attention logits computation, enabling finely controllable locality while remaining fully decomposable into query-key inner products and directly compatible with optimized attention kernels. Combined with rotary positional embeddings, it enables expressive embeddings with controllable spatial support without altering the transformer architecture. We empirically show that our approach substantially improves zero-shot generalization to larger domains across two PDE benchmarks and a 3D industrial atmospheric flow application. Our code and datasets are available at https://github.com/cerea-daml/domain-extension.
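
Editor's sketch: one well-known way a locality bias can stay "decomposable into query-key inner products" is to pick a bias that expands into products of per-query and per-key terms, then append those terms to the query and key vectors. The example below does this for a squared-distance penalty on 1-D positions. The bias form, the 1-D setting, and every name are assumptions chosen for illustration; the abstract does not specify the paper's actual bias.

```python
# Assumed bias: -(x_q - x_k)^2 / (2 * sigma^2), which depends only on the
# position difference and is therefore translation invariant.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def biased_logit(q, k, x_q, x_k, sigma):
    """Reference version: content logit plus an explicit distance penalty."""
    return dot(q, k) - (x_q - x_k) ** 2 / (2.0 * sigma ** 2)

def augment_query(q, x_q, sigma):
    """Append the per-query factors of the expanded bias to q."""
    s = 1.0 / sigma ** 2
    return q + [x_q * s, -0.5 * s, -0.5 * s * x_q ** 2]

def augment_key(k, x_k):
    """Append the matching per-key factors to k."""
    return k + [x_k, x_k ** 2, 1.0]

q, k = [0.5, -1.0], [2.0, 0.25]
x_q, x_k, sigma = 1.0, 3.0, 2.0

reference = biased_logit(q, k, x_q, x_k, sigma)
factored = dot(augment_query(q, x_q, sigma), augment_key(k, x_k))
shifted = dot(augment_query(q, x_q + 10.0, sigma), augment_key(k, x_k + 10.0))
```

Since the biased logit is again a single dot product, an unmodified fused attention kernel could consume the augmented vectors directly, and shifting both positions by the same offset leaves the logit unchanged.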

2606.14620 2026-06-15 cs.LG New submission

LEPO: Latent Reasoning Policy Optimization for Large Language Models

LEPO：面向大语言模型的潜在推理策略优化

Yuyan Zhou, Jiarui Yu, Hande Dong, Zhezheng Hao, Hong Wang, Jianqing Zhang, Qiang Lin

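Affiliations: Tencent; Zhejiang University; Shanghai Jiao Tong University

AI summary: LEPO injects controllable stochasticity into latent reasoning in large language models via Gumbel-Softmax, restoring exploration and improving compatibility with reinforcement learning. By applying reinforcement learning directly to continuous latent representations, it significantly outperforms existing methods.

Chinese abstract (AI translation)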

近年来，潜在推理被引入大语言模型（LLMs）以利用连续空间中的丰富信息。然而，缺乏随机采样时，这些方法不可避免地退化为确定性推理，无法发现多样的推理路径。为弥合这一差距，我们通过Gumbel-Softmax在潜在推理中注入可控的随机性，恢复LLMs的探索能力并增强其与强化学习（RL）的兼容性。在此基础上，我们提出LEPO，一种将强化学习直接应用于连续潜在表示的新框架。具体而言，在回放阶段，LEPO保持随机性以实现多样化的轨迹采样；在优化阶段，LEPO为潜在表示和离散令牌构建统一的梯度估计。大量实验表明，LEPO在离散和潜在推理方面显著优于现有RL方法。

English abstract

Recently, latent reasoning has been introduced into large language models (LLMs) to leverage rich information within a continuous space. However, without stochastic sampling, these methods inevitably collapse to deterministic inference, failing to discover diverse reasoning paths. To bridge the gap, we inject controllable stochasticity into latent reasoning via Gumbel-Softmax, restoring LLMs' exploratory capacity and enhancing their compatibility with Reinforcement Learning (RL). Building on this, we propose Latent Reasoning Policy Optimization (LEPO), a novel framework that applies RL directly to continuous latent representations. Specifically, in the rollout stage, LEPO maintains stochasticity to enable diverse trajectory sampling, while in the optimization stage, LEPO constructs a unified gradient estimation for both latent representations and discrete tokens. Extensive experiments show that LEPO significantly outperforms existing RL methods for discrete and latent reasoning.
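
Editor's sketch: Gumbel-Softmax itself is a standard relaxation: add independent Gumbel(0, 1) noise to the logits, divide by a temperature, and apply a softmax. The stdlib-only version below shows that sampling step in isolation. How LEPO feeds such samples into latent reasoning is not described in the abstract, so the logits, temperature, and seed here are placeholders.

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed (soft) categorical sample from `logits`.

    Lower temperatures push the sample toward a one-hot vector;
    higher temperatures push it toward uniform.
    """
    noisy = []
    for logit in logits:
        # Clamp to keep both logarithms finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = gumbel_softmax([2.0, 1.0, 0.1], temperature=0.5, rng=rng)
```

Repeated calls give different samples from the same logits, which is the controllable stochasticity the abstract refers to.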

2605.05983 2026-06-15 cs.LG Version update

Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions

无牺牲的引导：面向仅提示干预的引导向量的原则性训练

Yuntai Bao, Qinfeng Li, Xinyan Yu, Ge Su, Wenqi Zhang, Liu Yan, Haiqin Weng, Jianwei Yin, Xuhong Zhang

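Affiliations: University of Science and Technology of China

AI summary: Proposes jointly training steering factors and directions, which removes post-hoc factor selection, and introduces Prompt-only steering vectors (PrOSV) that intervene on only a few prompt tokens. PrOSV outperforms conventional full-sequence steering vectors on AxBench and achieves a better tradeoff between general model utility and adversarial robustness.

Comments: 63 pages, 50 figures; accepted by ICML 2026

Chinese abstract (AI translation)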

近年来，引导向量（SVs）已成为一种有效且轻量级的方法来引导大型语言模型（LLMs）的行为，其中微调后的SVs比无优化的SVs更有效。然而，当前的微调SV方法存在两个局限性。首先，它们需要在推理时针对每个SV仔细选择引导因子，以平衡引导效果和生成质量。其次，它们作为全序列SV（FSSVs）运行，无论因子选择如何，由于对模型生成过程的过度干预，都可能牺牲生成质量。为了解决第一个局限性，我们提出联合训练引导因子和方向，从而不再需要事后因子选择。利用神经网络缩放理论，我们发现引导因子适中的大初始化大小和学习率对于联合训练的稳定性和效率至关重要。为了解决第二个局限性，我们从表示微调中汲取灵感，引入了仅提示SV（PrOSV），一种仅干预少量提示词的SV。我们的实验结果表明，在使用我们的联合训练方案时，PrOSV在AxBench上优于传统的FSSVs。我们还发现，与FSSV相比，PrOSV在通用模型效用和对抗鲁棒性之间实现了更好的权衡。

English abstract

Recently, steering vectors (SVs) have emerged as an effective and lightweight approach to steer behaviors of large language models (LLMs), among which fine-tuned SVs are more effective than optimization-free ones. However, current approaches to fine-tuned SVs suffer from two limitations. First, they require careful selection of steering factors on a per-SV basis to balance steering effectiveness and generation quality at inference time. Second, they operate as full-sequence SVs (FSSVs), which can sacrifice generation quality regardless of factor selection due to excessive intervention on the model generation process. To address the first limitation, we propose joint training of steering factors and directions, such that post-hoc factor selection is no longer required. Using neural network scaling theory, we find that moderately large initialization sizes and learning rates for steering factors are essential for stability and efficiency of joint training. To tackle the second limitation, we draw inspiration from representation fine-tuning and introduce Prompt-only SV (PrOSV), an SV that intervenes only on a few prompt tokens. Our empirical results show that PrOSV outperforms traditional FSSVs on AxBench when using our joint training scheme. We also find that PrOSV achieves a better tradeoff between general model utility and adversarial robustness than FSSV.

2605.07984 2026-06-15 cs.LG cs.AI Version update

Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions

计划在哪里？通过轻量级机制干预定位语言模型中的潜在规划

Nicole Ma, Nick Rui

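Affiliations: University of California, Berkeley

AI summary: Using a rhyming-couplet completion task with linear probing and activation patching, the paper studies whether language models form and causally rely on latent plans for future constraints during generation. Only Gemma-3-27B shows such causal reliance, which is localized to five attention heads.

Comments: 13 pages, 20 figures, 3 tables. Accepted to Workshop on Mechanistic Interpretability @ ICML 2026

Chinese abstract (AI translation)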

我们研究语言模型中的规划位点形成——在前向传播过程中，结构约束的未来标记的内部表示是否形成，以及它们是否因果驱动生成。使用押韵对句补全作为前向约束的干净测试，我们在Qwen3、Gemma-3和Llama-3的十多个规模上应用两种轻量级方法（线性探针和激活修补）。探针显示，未来押韵信息在行边界处是线性可解码的，且信号在所有三个模型族中随规模增强。激活修补揭示，只有Gemma-3-27B因果依赖这种编码，表现出一种交接，其中因果驱动因素在大约第30层从押韵词迁移到行边界。我们测试的其他每个模型在整个生成过程中都条件于押韵词，在行边界处因果效应接近零，尽管探针信号很强。通过两阶段路径修补，我们将Gemma-3-27B的交接定位到五个注意力头，这些头在新行处恢复了约90%的押韵路由能力。

英文摘要

We study planning site formation in language models -- where internal representations of structurally-constrained future tokens form during the forward pass, and whether they causally drive generation. Using rhyming-couplet completion as a clean test of forward-looking constraint, we apply two lightweight methods (linear probing and activation patching) across Qwen3, Gemma-3, and Llama-3 at more than ten scales. Probing shows that future-rhyme information is linearly decodable at the line boundary, with signal that strengthens with scale in all three families. Activation patching reveals that only Gemma-3-27B causally relies on this encoding, exhibiting a handoff in which the causal driver migrates from the rhyme word to the line boundary around layer 30. Every other model we test conditions on the rhyme word throughout generation, with near-zero causal effect at the line boundary despite strong probe signal. Using two-stage path patching, we localize the Gemma-3-27B handoff to five attention heads that recover ~90% of the rhyme-routing capacity at the newline.

2605.11558 2026-06-15 cs.LG stat.ML 版本更新

A Composite Activation Function for Learning Stable Binary Representations

一种用于学习稳定二进制表示的复合激活函数

Seokhun Park, Choeun Kim, Kwanho Lee, Sehyun Park, Insung Kong, Yongdai Kim

发表机构 * Department of Statistics（统计学系）； Seoul National University（首尔国立大学）； Department of Applied Mathematics（应用数学系）； University of Twente（特文特大学）

AI总结本文提出HTAF复合激活函数，通过平滑近似Heaviside函数实现稳定训练，适用于Spiking神经网络等模型，并引入ICBMs模型实现可解释的图像处理。

Comments 32 pages

AI中文摘要

激活函数在神经网络中通过塑造内部表示起核心作用。最近，学习二进制激活表示因其在计算和内存效率以及可解释性方面的优势而受到广泛关注。然而，使用Heaviside激活函数训练神经网络仍具挑战性，因其非可导性阻碍了标准梯度优化。本文提出Heavy Tailed Activation Function (HTAF)，一种Heaviside函数的平滑近似，使基于梯度的优化能够稳定训练。我们构造HTAF为sigmoid双曲正切复合函数，并理论证明其在零输入附近保持大梯度质量，同时在尾部区域表现出更慢的梯度衰减。我们展示Spiking神经网络、二进制神经网络和深度Heaviside神经网络可以使用HTAF稳定训练。最后，我们引入隐式概念瓶颈模型（ICBMs），一种利用HTAF诱导离散特征表示的可解释图像模型。在各种架构和图像数据集上的广泛实验表明，ICBMs能够稳定地实现离散化，同时预测性能与标准模型相当或更好。

英文摘要

Activation functions play a central role in neural networks by shaping internal representations. Recently, learning binary activation representations has attracted significant attention due to their advantages in computational and memory efficiency, as well as interpretability. However, training neural networks with Heaviside activations remains challenging, as their non-differentiability obstructs standard gradient-based optimization. In this paper, we propose Heavy Tailed Activation Function (HTAF), a smooth approximation to the Heaviside function that enables stable training with gradient-based optimization. We construct HTAF as a sigmoid hyperbolic tangent composite function and theoretically show that it maintains a large gradient mass around zero inputs while exhibiting slower gradient decay in the tail regions. We show that Spiking Neural Networks, Binary Neural Networks and Deep Heaviside neural Networks can be trained stably using HTAF with gradient-based optimization. Finally, we introduce Implicit Concept Bottleneck Models (ICBMs), an interpretable image model that leverages HTAF to induce discrete feature representations. Extensive experiments across various architectures and image datasets demonstrate that ICBM enables stable discretization while achieving prediction performance comparable to or better than standard models.

2605.17779 2026-06-15 cs.LG 版本更新

Learning Variable-Length Tokenization for Generative Recommendation

面向生成式推荐的可变长度分词学习

Minhao Wang, Bowen Wu, Wei Zhang

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文提出VarLenRec框架，通过Popularity-Weighted Information Budget Allocation方法解决生成推荐中可变长度分词问题，提升推荐准确性和效率。

Comments 14 pages, 5 figures

AI中文摘要

生成推荐将推荐问题重新表述为离散语义标识符（ID）的下一个标记预测。一个基本但未被探索的设计选择是现有方法对所有项目使用固定长度分词，隐含假设编码能力在项目特性上是均匀的。通过系统地在四个数据集上进行实验，我们发现流行度-长度悖论：流行项目在短ID下表现最佳，而尾部项目需要显著更长的代码来捕捉判别性语义。这揭示了一个关键不匹配：流行项目受益于丰富的协同信号并需要最小的语义细节，而尾部项目必须依赖于细粒度的内容特征，因为交互数据稀疏。为了解决这个问题，我们提出了VarLenRec，一个学习可变长度分词的框架。我们开发了流行度加权信息预算分配（PIBA），一个信息论框架证明最优ID长度应与流行度的负幂成比例。直接实现可变长度分配面临两个技术挑战：标准欧几里得残差量化缺乏支持不同代码长度的几何容量，而离散长度决策是非可微的。我们通过双曲残差量化解决这些问题，该方法利用庞加莱球的指数体积增长来自然分层编码能力，并通过软长度控制器实现可微长度预测，通过连续层保留概率正则化由PIBA推导出的先验。广泛的实验表明，VarLenRec在推荐准确性和训练/推理效率上显著优于现有最先进方法，揭示了自适应编码能力在生成推荐中的重要性。

英文摘要

Generative recommendation reformulates recommendation as next-token prediction over discrete semantic identifiers (IDs). A fundamental yet unexplored design choice is that existing methods employ fixed-length tokenization for all items, implicitly assuming uniform encoding capacity regardless of item characteristics. Through systematic experiments across four datasets, we discover the Popularity-Length Paradox: popular items achieve optimal performance with short IDs, while tail items require substantially longer codes to capture discriminative semantics. This reveals a critical mismatch where popular items benefit from abundant collaborative signals and require minimal semantic detail, whereas tail items must rely on fine-grained content features due to sparse interaction data. To address this, we propose VarLenRec, a framework for learning variable-length tokenization. We develop Popularity-Weighted Information Budget Allocation (PIBA), an information-theoretic framework proving that optimal ID length should scale as a negative power of popularity. Directly implementing variable-length allocation faces two technical challenges: standard Euclidean residual quantization lacks geometric capacity to support diverse code lengths without distortion, and discrete length decisions are non-differentiable. We address these through Hyperbolic Residual Quantization, which leverages the exponential volume growth of the Poincaré ball to naturally stratify encoding capacity, and a Soft Length Controller, which enables differentiable length prediction via continuous layer retention probabilities regularized by PIBA-derived priors. Extensive experiments demonstrate that VarLenRec achieves significant improvements over state-of-the-art methods in recommendation accuracy and training/inference efficiency, revealing the importance of adaptive encoding capacity in generative recommendation.
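
The claim that optimal ID length scales as a negative power of popularity can be made concrete with a small sketch. The exponent `alpha`, the `scale` constant, and the clipping bounds below are illustrative assumptions, not values from the paper, and the hard rounding stands in for the paper's differentiable Soft Length Controller.

```python
def id_length(popularity, alpha=0.5, scale=2.0, min_len=1, max_len=8):
    """Map a normalized popularity in (0, 1] to a semantic-ID length.

    Length is proportional to popularity ** (-alpha), so rarer items get
    longer codes. All constants here are hypothetical.
    """
    raw = scale * popularity ** (-alpha)
    return max(min_len, min(max_len, round(raw)))


head_item = id_length(1.0)    # most popular item: shortest code
mid_item = id_length(0.25)    # moderately popular item
tail_item = id_length(0.01)   # long-tail item: clipped at max_len
```

Under these assumed constants the head item receives the shortest ID and the tail item the longest, which mirrors the Popularity-Length Paradox described above.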

2605.18848 2026-06-15 cs.LG cs.AI 版本更新

Exact Linear Attention

精确线性注意力

Weinuo Ou

发表机构 * GitHub

AI总结本文提出精确线性注意力（ELA），通过利用核函数的精确分解性质，实现Transformer注意力的线性计算复杂度，消除近似误差。针对先前线性注意力的两个关键限制（梯度爆炸和token注意力稀释），提出核约束以确保非负性、判别性和几何可解释性。此外，本文还提出了三种工程创新，包括Hyper-Link结构、Memory Lobe模块和基于路由分数的MoE偏置机制。实验结果表明，与全注意力相比，ELA的解码速度最高提升6倍，KV缓存内存占用减少75%，同时训练性能相当或更优。

Comments 9 pages, 19 figures, journal

AI中文摘要

本文介绍精确线性注意力（ELA），一种利用核函数的精确分解性质实现Transformer注意力线性计算复杂度的机制，从而消除近似误差。我们识别并解决了先前线性注意力的两个关键限制（梯度爆炸和token注意力稀释），方法是施加核约束，确保非负性、判别性和几何可解释性。我们提出了几种核函数，包括Hadamard Exp核、求和平方欧几里得距离核和减法平方欧几里得距离核，每种都针对特定的注意力行为而设计。除了核心注意力公式之外，本文还提出了三种工程创新：（1）Hyper-Link结构，用以替代传统残差连接以缓解梯度退化；（2）基于双向线性注意力的Memory Lobe模块，捕捉跨层的“转换流”以实现定性记忆和隐式强化学习范式；（3）基于路由分数的MoE偏置机制，以提高可解释性和语义对齐。实验结果表明，与全注意力相比，ELA的解码速度最高提升6倍，KV缓存内存占用减少75%，同时训练性能相当或更优。所提出的记忆模块加速了收敛并增强了泛化能力。此外，我们还将线性注意力原理扩展到视觉模型，得到YOLO-LAT，它最高实现4.3倍的GPU推理加速和7.9倍的参数量缩减，同时保持有竞争力的检测精度。这些结果表明，精确线性注意力在扩展Transformer模型以处理超长序列和高效视觉任务方面具有广泛的适用性。

英文摘要

This paper introduces Exact Linear Attention (ELA), a mechanism that achieves linear computational complexity for Transformer attention by exploiting the exact decomposition property of kernel functions, thereby eliminating approximation error. We identify and address two key limitations of prior linear attention -- gradient explosion and token attention dilution -- by imposing kernel constraints that ensure non-negativity, discriminability, and geometric interpretability. Several kernel functions are proposed, including the Hadamard Exp Kernel, Summation Squared Euclidean Distance Kernel, and Subtraction Squared Euclidean Distance Kernel, each tailored for specific attention behaviors. Beyond the core attention formulation, the paper presents three engineering innovations: (1) a Hyper-Link structure that replaces traditional residual connections to mitigate gradient degradation; (2) a Memory Lobe module based on bidirectional linear attention, which captures "transformation flow" across layers to implement qualitative memory and an implicit reinforcement learning paradigm; and (3) a routing-score-based bias mechanism for Mixture-of-Experts (MoE) to improve interpretability and semantic alignment. Experimental results demonstrate that ELA achieves up to 6x faster decoding speed and 75% reduction in KV cache memory usage compared to full attention, while maintaining comparable or superior training performance. The proposed memory module accelerates convergence and enhances generalization. Furthermore, we extend the linear attention principle to vision models, yielding YOLO-LAT, which attains up to 4.3x GPU inference speedup and 7.9x parameter reduction with competitive detection accuracy. These results underline the broad applicability of exact linear attention for scaling Transformer models to ultra-long sequences and efficient visual tasks.
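
The idea of an exactly decomposable kernel can be illustrated with a kernel that is not one of the paper's: the squared dot product, k(q, k) = (q·k)², which equals φ(q)·φ(k) for φ(x) = all pairwise products of x. The sketch below, with hypothetical function names and toy data, checks that the quadratic-cost and linear-cost forms of non-causal attention give the same output; the kernel is also non-negative, one of the constraints the abstract mentions.

```python
def phi(x):
    """Feature map with phi(q) . phi(k) == (q . k) ** 2 exactly."""
    return [a * b for a in x for b in x]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quadratic_attention(Q, K, V):
    """Reference form: every query scores every key."""
    out = []
    for q in Q:
        w = [dot(q, k) ** 2 for k in K]
        total = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """Linear form: summarize keys and values once, then reuse per query."""
    feat_dim, val_dim = len(phi(K[0])), len(V[0])
    S = [[0.0] * val_dim for _ in range(feat_dim)]  # sum of phi(k) v^T
    z = [0.0] * feat_dim                            # sum of phi(k)
    for k, v in zip(K, V):
        for i, f in enumerate(phi(k)):
            z[i] += f
            for c in range(val_dim):
                S[i][c] += f * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[i] * S[i][c] for i in range(feat_dim)) / denom
                    for c in range(val_dim)])
    return out


Q = [[1.0, 2.0], [0.5, -1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
reference = quadratic_attention(Q, K, V)
linear = linear_attention(Q, K, V)
```

Because the decomposition is exact, the two results agree up to floating-point rounding, while the linear form touches each key only once regardless of the number of queries.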

2606.01476 2026-06-15 cs.LG cs.CL 版本更新

OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification

OmniOPD：通过推测性验证实现无Logit的在线策略蒸馏

Yuhang Zhou, Lizhu Zhang, Yifan Wu, Mingyi Wang, Bo Peng, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao

发表机构 * Meta AI

AI总结提出OmniOPD框架，通过基于蒙特卡洛展开的块级语义相似度替代token级logit匹配，结合峰值熵调度器和贝叶斯先验，解决在线策略蒸馏中logit不可获取和信号脆弱问题，在数学任务上超越标准OPD达28.64%。

Comments 26 pages, 3 figures

AI中文摘要

在线策略蒸馏（OPD）在强教师模型的密集token级反馈下，基于学生模型自身的生成轨迹进行训练，缓解了监督微调（SFT）的离策略分布偏移和强化学习（RL）的稀疏信用分配问题。然而，标准OPD面临两个耦合的限制。首先，它需要直接访问教师模型的token级logit，将一大类有能力的专有模型排除在教师之外。其次，token级logit信号本身是脆弱的，依赖于教师和学生之间合理下一个token的狭窄重叠，并且容易放大重复循环等退化模式。在本文中，我们引入了OmniOPD，一种通过无logit的块级监督信号解决这两个限制的新框架。OmniOPD用蒙特卡洛展开替代确定性logit匹配，通过多token块上的连续语义相似性度量近似教师的局部偏好，并通过峰值熵调度器集中这种监督，仅在学生的高不确定性推理分叉处进行审计。Dirichlet-Multinomial贝叶斯先验和基础模型KL锚进一步限制了离散采样的方差，并防止了未审计token上的策略崩溃。在竞争性基准测试中，OmniOPD在数学任务上超越标准OPD方法高达28.64%，证实了块级语义验证提取了比token级logit匹配更可靠的学习信号，后者高信息密度被显著的噪声和脆弱性所抵消。此外，当与更强的黑盒教师（如Claude-4.5-Haiku和Gemini-2.5-Flash）配对时，OmniOPD在数学任务上相对于其开放权重教师对应物额外获得了9.54%的相对提升，使学生超越了自我探索RL的性能。

英文摘要

On-Policy Distillation (OPD) trains a student model on its own generative trajectories under dense token-level feedback from a stronger teacher, mitigating both the off-policy distribution shift of Supervised Fine-Tuning (SFT) and the sparse credit assignment of Reinforcement Learning (RL). However, standard OPD faces two coupled limitations. First, it requires direct access to the teacher's token-level logits, excluding a broad class of capable proprietary models from serving as teachers. Second, the token-level logit signal itself is brittle, depending on a narrow overlap of plausible next tokens between teacher and student, and prone to amplifying degenerate patterns such as repetition loops. In this paper, we introduce OmniOPD, a novel framework that addresses both limitations through a logit-free, chunk-level supervision signal. OmniOPD replaces deterministic logit matching with Monte Carlo rollouts that approximate the teacher's local preferences through a continuous semantic similarity metric over multi-token chunks, and concentrates this supervision via a peak-entropy scheduler that audits the student only at its high-uncertainty reasoning forks. A Dirichlet-Multinomial Bayesian prior and a base-model KL anchor further bound the variance of discrete sampling and prevent policy collapse across unaudited tokens. Across competitive benchmarks, OmniOPD surpasses the standard OPD approach by up to +28.64% on math, confirming that chunk-level semantic verification extracts a more reliable learning signal than token-level logit matching, whose high information density is offset by significant noise and brittleness. Furthermore, when paired with stronger black-box teachers such as Claude-4.5-Haiku and Gemini-2.5-Flash, OmniOPD achieves an additional +9.54% relative on math over its open-weight teacher counterpart, advancing the student past the performance of self-exploratory RL.
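
One way to picture a peak-entropy scheduler is as a top-k selection over the student's per-step uncertainty. The abstract does not specify the selection rule, so the sketch below assumes Shannon entropy and a fixed audit budget; the function names and toy distributions are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def peak_entropy_positions(step_distributions, budget):
    """Return the `budget` decoding steps with the highest entropy, in order."""
    ranked = sorted(range(len(step_distributions)),
                    key=lambda t: entropy(step_distributions[t]),
                    reverse=True)
    return sorted(ranked[:budget])


# Toy student rollout: one next-token distribution per decoding step.
student_steps = [
    [1.0, 0.0],   # fully confident
    [0.5, 0.5],   # maximally uncertain fork
    [0.9, 0.1],   # mostly confident
    [0.4, 0.6],   # another uncertain fork
]
audit_at = peak_entropy_positions(student_steps, budget=2)
```

Only the selected steps would then be sent to the teacher for chunk-level verification, leaving the confident steps unaudited.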

2606.03085 2026-06-15 cs.LG cs.CL 版本更新

Multi-component Causal Tracing in Large Language Models

大型语言模型中的多组件因果追踪

Zirui Yan, Dennis Wei, Dmitriy A. Katz, Prasanna Sattigeri, Ali Tajer

发表机构 * Rensselaer Polytechnic Institute（伦斯勒理工学院）； IBM Research（IBM研究院）

AI总结本文提出一个统一框架，通过软干预和度量转换高效识别对目标性能指标最关键的多组件子集，优于现有基线方法。

Comments Accepted to ACL 2026 main conference

AI中文摘要

因果追踪通过系统地干预大型语言模型（LLM）的内部表示，揭示并量化将特定输入或计算与特定感兴趣指标联系起来的因果路径，从而量化LLM的行为。在先前单组件或单层研究的基础上，本文提出了一个同时因果追踪多个组件的统一框架。该框架系统地识别对期望目标性能指标（如准确性和公平性）最关键的组件子集（例如注意力头和多层感知器神经元）。这是通过将灵活的干预应用于广泛期望的指标来实现的。为了解决多组件问题的组合复杂性，设计了一种高效算法，该算法利用软干预和精心设计的度量转换，将组合搜索问题转化为一个连续问题，该问题可以在适当约束下高效求解，从而为选择组件生成适当的二元决策。实验结果表明，所提出的方法高效地识别出对目标指标具有高影响力的模型组件子集，优于现有基线方法。我们的代码可从 https://github.com/ZiruiYan/multi-component-causal-tracing 获取。

英文摘要

Causal tracing systematically intervenes on a large language model's (LLM's) internal representations to uncover and quantify the causal pathways linking specific inputs or computations to specific metrics of interest, quantifying the LLM's behavior. Building on previous single-component or single-layer studies, this paper presents a unified framework for causally tracing multiple components simultaneously. This framework systematically identifies the subsets of components (e.g., attention heads and multi-layer perceptron neurons) most critical to a desired target performance metric (e.g., accuracy and fairness). This is achieved by incorporating flexible interventions applied to a wide range of desired metrics. To address the combinatorial complexity of the multi-component problem, an efficient algorithm is designed that leverages soft interventions and a carefully designed metric transformation, converting the combinatorial search problem into a continuous one that can be solved efficiently under proper constraints, thereby generating proper binary decisions for selecting components. Experimental results demonstrate that the proposed method efficiently identifies subsets of the model's components that have a high impact on the target metric, outperforming existing baseline approaches. Our code is available at https://github.com/ZiruiYan/multi-component-causal-tracing.

2606.06010 2026-06-15 cs.LG cs.DB 版本更新

Adaptive Oscillatory-State Alignment for Time Series Forecasting

自适应振荡状态对齐用于时间序列预测

Zhangyao Song, Chaofeng Qu, Chao Zha, Xiaoyu Zhao, Yinfei Xu, Tao Guo

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出AOSNet框架，通过希尔伯特变换将固定模板匹配改为自适应振荡状态对齐，以处理实际时间序列中的非平稳振荡行为，在多个基准上达到先进或竞争性精度。

AI中文摘要

长期时间序列预测受益于能够揭示重复时间结构的归纳偏置。现有的周期性预测方法通常通过预定义周期、全局频谱分量或固定的可学习模板来建模重复性。然而，现实世界的时间动态很少是严格周期性的：在名义周期附近，振荡行为往往表现出非刚性周期性（NRP），即周期幅度、周期对齐和局部周期时长随时间变化。在这些条件下，固定模板的周期性建模可能与底层时间状态根本不匹配。我们提出AOSNet，一个希尔伯特引导的预测框架，将周期性预测从固定模板匹配重新表述为自适应振荡状态对齐。AOSNet从观测序列和可学习的全局振荡先验中提取解析信号描述符，然后通过描述符条件门自适应地对齐局部状态，该门选择性地保留可靠观测，同时软性纠正不匹配区域。学习到的先验不是刚性的重复模板，而是通过局部状态动力学来解释的灵活振荡参考。在八个公开基准和两个云工作负载轨迹上的实验表明，AOSNet以紧凑的模型规模和较低的推理延迟取得了领先或极具竞争力的精度，可支持容量规划和自动扩缩容等重复预测场景。受控合成研究分别隔离周期幅度变化和周期对齐变化，并将其与周期时长变化相结合，结果表明振荡状态对齐的优势随NRP增强而增大。

英文摘要

Long-term time series forecasting benefits from inductive biases that expose recurring temporal structure. Existing periodic forecasting methods typically model recurrence through predefined periods, global spectral components, or fixed learnable templates. However, real-world temporal dynamics are rarely rigidly periodic: around a nominal cycle, oscillatory behavior often exhibits non-rigid periodicity (NRP), where cycle magnitude, cycle alignment, and local cycle duration vary over time. Under these conditions, fixed-template periodic modeling can become fundamentally mismatched to the underlying temporal states. We propose AOSNet, a Hilbert-guided forecasting framework that reformulates periodic forecasting from fixed template matching to adaptive oscillatory-state alignment. AOSNet extracts analytic-signal descriptors from both the observed sequence and a learnable global oscillatory prior, then adaptively aligns local states through a descriptor-conditioned gate that selectively preserves reliable observations while softly correcting mismatched regions. The learned prior serves not as a rigid repeated template but as a flexible oscillatory reference interpreted through local state dynamics. Experiments on eight public benchmarks and two cloud workload traces demonstrate leading or highly competitive accuracy with a compact model size and low inference latency, supporting repeated forecasting settings such as capacity planning and autoscaling. Controlled synthetic studies that isolate cycle-magnitude and cycle-alignment variation and combine them with cycle-duration changes show that the advantage of oscillatory-state alignment increases as NRP intensifies.

2606.13119 2026-06-15 cs.LG cs.AI cs.NE 版本更新

Reward-SQL：通过逐步执行感知推理和过程监督奖励提升Text-to-SQL

Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Guoliang Li, Bin Wu, Wenchao Zhou

发表机构 * Renmin University of China（中国人民大学）； Tsinghua University（清华大学）； Alibaba Cloud Computing（阿里云计算）

AI总结针对强化学习在Text-to-SQL中缺乏逐步执行感知推理和过程级奖励的问题，提出CoCTE框架和Reward-SQL方法，通过中间视图验证、结构化CTE及过程奖励模型，显著提升复杂查询的准确性和可解释性。

AI中文摘要

最近，使用强化学习（RL）训练的大型语言模型（LLMs）的进展提高了Text-to-SQL的性能。然而，基于RL的方法仍然在处理复杂查询时面临两个关键限制：缺乏基于数据库反馈的逐步执行感知推理，以及缺乏用于指导推理优化的过程级奖励。为了解决这些问题，我们提出了CoCTE，一种分治且执行感知的推理框架，通过中间视图验证和结构化公共表表达式（CTEs）逐步组合SQL查询，提高了准确性和可解释性。为了实现CoCTE推理过程，我们开发了Reward-SQL，一种统一的方法，包含三个阶段：（1）模型初始化，使LLMs具备结构化CoCTE推理能力；（2）过程奖励设计，提供细粒度的、执行感知的监督；（3）过程监督的RL和推理，将过程奖励整合到训练中，并通过过程奖励指导推理阶段。本文解决了Reward-SQL中的核心挑战，并做出了以下贡献。我们引入了一个过程奖励模型（PRM），它将执行感知的轨迹评分与基于熵的步骤加权相结合，在推理步骤中提供密集且可解释的监督。我们将PRM集成到RL训练和推理阶段，稳定优化并通过过程级信号改进轨迹探索。实验表明，Reward-SQL在可比模型大小下显著优于基线，并表现出强大的跨领域泛化能力。

英文摘要

Recent advances in large language models (LLMs) trained with reinforcement learning (RL) have improved Text-to-SQL performance. However, RL-based approaches still struggle with complex queries due to two key limitations: insufficient stepwise execution-aware reasoning grounded in database feedback, and the lack of process-level rewards for guiding reasoning optimization. To address these issues, we propose CoCTE, a divide-and-conquer and execution-aware reasoning framework that progressively composes SQL queries through intermediate view validation and structured Common Table Expressions (CTEs), improving both accuracy and interpretability. To realize a CoCTE reasoning process, we develop Reward-SQL, a unified approach with three stages: (1) model initialization, which equips LLMs with structured CoCTE reasoning capabilities; (2) process reward design, which delivers fine-grained, execution-aware supervision; and (3) process-supervised RL and inference, which integrates process rewards into training and guides the inference stage by process rewards. This paper addresses the core challenges in Reward-SQL and makes the following contributions. We introduce a process reward model (PRM) that combines execution-aware trajectory scoring with entropy-based step weighting, providing dense and interpretable supervision across reasoning steps. We integrate PRM into both RL training and inference stages, stabilizing optimization and improving trajectory exploration with process-level signals. Experiments show that Reward-SQL significantly outperforms baselines with comparable model sizes, and exhibits strong cross-domain generalization.

2601.05106 2026-06-15 cs.AI cs.CL cs.LG 版本更新

Token-Level LLM Collaboration via FusionRoute

通过融合路由实现令牌级LLM协作

Nuoya Xiong, Yuhang Zhou, Hanqing Zeng, Zhaorun Chen, Furong Huang, Shuchao Bi, Lizhu Zhang, Zhuokai Zhao

发表机构 * [cs.AI]（计算机科学与人工智能）

AI总结本文提出FusionRoute框架，通过轻量级路由器在解码步骤中选择最合适的专家并补充对数几率以优化下一个令牌分布，解决了单个通用模型在多个领域表现不佳的问题，同时在多个基准测试中优于其他方法。

Comments 25 pages

AI中文摘要

大型语言模型（LLMs）在多个领域表现出色。然而，使用单一通用模型在这些领域实现强大性能通常需要扩展到训练和部署成本极高的规模。另一方面，虽然较小的领域专用模型更高效，但它们在训练分布之外的泛化能力较差。为了解决这一矛盾，我们提出了FusionRoute，一种稳健且有效的令牌级多LLM协作框架，其中轻量级路由器同时（i）在每个解码步骤中选择最合适的专家，（ii）贡献一个互补的对数几率，通过对数几率添加来细化或校正所选专家的下一个令牌分布。与现有依赖固定专家输出的令牌级协作方法不同，我们提供了一个理论分析，表明纯专家路由本质上是有限的：除非持有强全局覆盖假设，否则无法一般实现最优解码策略。通过在专家选择中加入可训练的互补生成器，FusionRoute扩展了有效的策略类别，并在温和条件下实现了最优价值函数的恢复。经验上，FusionRoute在Llama-3和Gemma-2家族以及涵盖数学推理、代码生成和指令跟随在内的多种基准测试中，优于序列级和令牌级协作、模型融合和直接微调方法，同时在各自任务上与领域专家保持竞争力。

英文摘要

Large language models (LLMs) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scaling to sizes that are prohibitively expensive to train and deploy. On the other hand, while smaller domain-specialized models are much more efficient, they struggle to generalize beyond their training distributions. To address this dilemma, we propose FusionRoute, a robust and effective token-level multi-LLM collaboration framework in which a lightweight router simultaneously (i) selects the most suitable expert at each decoding step and (ii) contributes a complementary logit that refines or corrects the selected expert's next-token distribution via logit addition. Unlike existing token-level collaboration methods that rely solely on fixed expert outputs, we provide a theoretical analysis showing that pure expert-only routing is fundamentally limited: unless strong global coverage assumptions hold, it cannot in general realize the optimal decoding policy. By augmenting expert selection with a trainable complementary generator, FusionRoute expands the effective policy class and enables recovery of optimal value functions under mild conditions. Empirically, across both Llama-3 and Gemma-2 families and diverse benchmarks spanning mathematical reasoning, code generation, and instruction following, FusionRoute outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning, while remaining competitive with domain experts on their respective tasks.
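
The two roles of the router described above (expert selection plus a complementary logit) can be sketched for a single decoding step. This is a toy illustration under stated assumptions: the function names, the argmax selection rule, and all numbers are hypothetical, and in the actual method both router outputs would come from a trained lightweight model.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fused_next_token_distribution(expert_logits, router_scores, router_logits):
    """Pick the top-scored expert, then add the router's own logits to it."""
    chosen = max(range(len(router_scores)), key=lambda i: router_scores[i])
    fused = [e + r for e, r in zip(expert_logits[chosen], router_logits)]
    return chosen, softmax(fused)


# Two experts over a 3-token vocabulary at one decoding step.
expert_logits = [[2.0, 0.0, 0.0],   # expert 0 prefers token 0
                 [0.0, 1.0, 0.0]]   # expert 1 prefers token 1
router_scores = [0.2, 0.8]          # router selects expert 1
router_logits = [0.0, 0.0, 3.0]     # complementary logit favouring token 2

chosen, probs = fused_next_token_distribution(
    expert_logits, router_scores, router_logits)
best_token = max(range(len(probs)), key=lambda i: probs[i])
```

Here neither expert prefers token 2 on its own, yet the fused distribution does, which illustrates how logit addition can reach decisions that pure expert-only routing cannot.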

2605.26702 2026-06-15 cs.CV cs.AI cs.CR cs.LG 版本更新

Rotation-Invariant Spherical Watermarking via Third-Order SO(3) Representation Coupling

通过三阶SO(3)表示耦合的旋转不变球面水印

Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Antonios Argyriou, Wu Liu, Weiping Wang

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结针对全景图像在任意3D旋转下水印鲁棒性不足的问题，提出利用三阶SO(3)表示耦合构造旋转不变的球面双谱，将水印嵌入高阶球谐系数并从不变标量中提取，实现理论保证的旋转不变性和高视觉保真度。

Comments ICML 2026

AI中文摘要

DRIVE：基于分布与检索增强的价值评估竞价方法

Miduo Cui, Haochen Wang, Shangqin Mao, Xun Yang, Qianlong Xie, Xingxing Wang, Xuri Ge, Ying Zhou, Zhiwei Xu

发表机构 * Machine Learning, ICML（机器学习，ICML）

AI总结提出DRIVE框架，通过解耦候选动作生成与决策，结合分布建模、检索增强和价值评估，提升离线自动竞价性能。

Comments Accepted to ICML 2026

AI中文摘要

自动竞价是实时广告系统的核心组成部分，决策必须在预算和成本约束下优化长期性能，而在线探索风险极高。离线强化学习以及最近基于Transformer的序列建模在从日志数据中学习竞价策略方面显示出前景，但其单峰和纯参数化公式通常将多种有效竞价策略折叠为次优的平均动作，并在稀疏或长尾流量下表现不可靠。为缓解这些限制，我们提出DRIVE（基于分布与检索增强的价值评估竞价），一个统一的基于Transformer的框架，将候选动作生成与决策解耦，用于离线自动竞价。DRIVE结合了分布动作建模、从高质量历史决策中检索增强的候选生成以及基于价值的评估，以在推理时选择最有希望的出价。在AuctionNet和额外离线强化学习基准上的大量实验表明，DRIVE持续改善竞价性能，并在多种基于Transformer的方法上具有良好的泛化能力。

英文摘要

Auto-bidding is a core component of real-time advertising systems, where decisions must optimize long-term performance under budget and cost constraints, while online exploration is prohibitively risky. Offline reinforcement learning and, more recently, Transformer-based sequence modeling have shown promise for learning bidding policies from logged data, but their unimodal and purely parametric formulations often collapse multiple effective bidding strategies into suboptimal averaged actions and perform unreliably under sparse or long-tail traffic. To mitigate these limitations, we propose DRIVE (Distributional and Retrieval-Augmented Bidding with Value Evaluation), a unified Transformer-based framework that decouples candidate action generation from decision making for offline auto-bidding. DRIVE combines distributional action modeling, retrieval-augmented candidate generation from high-quality historical decisions, and value-based evaluation to select the most promising bid at inference time. Extensive experiments on AuctionNet and additional offline reinforcement learning benchmarks demonstrate that DRIVE consistently improves bidding performance and generalizes well across multiple Transformer-based methods.

2606.14536 2026-06-15 cs.LG cs.RO cs.SY eess.SY 新提交

Provably Safe, Yet Scalable Reinforcement Learning

可证明安全且可扩展的强化学习

Kai S. Yun, Zeyang Li, Navid Azizan

发表机构 * MIT（麻省理工学院）

AI总结提出PS2-RL框架，通过两阶段架构（学习备份策略隐式构造控制不变集，再通过可微投影层训练RL策略）实现可证明安全且可扩展的强化学习，在高达10维状态空间中保持性能与安全性。

AI中文摘要

安全强化学习旨在学习在满足约束的同时优化奖励的策略。主流方法依赖于软约束策略优化，虽取得经验成功，但无法为学习策略提供正式安全保证。相反，具有严格保证的方法通常依赖显式证书函数，其构造需要直接综合和验证控制不变集，这一过程随状态维度扩展性差，且往往导致过于保守的行为。本文提出可证明安全且可扩展的强化学习（PS2-RL）框架，一种新颖的两阶段架构，以可扩展方式学习可证明安全的策略，旨在克服先前方法的关键瓶颈。PS2-RL不显式计算不变集，而是利用学习的备份策略前向积分系统动力学，在线生成隐式控制不变集。第一阶段，通过提出的安全到达值函数训练备份策略，该值函数刻画了用于不变集构造的最优备份策略。第二阶段，通过可微投影层端到端训练RL策略，该投影层严格强制由学习备份策略诱导的安全保证。通过在第一阶段最大化隐式控制不变集的体积，第二阶段得到的PS2策略既高效又可扩展，同时保持可证明安全性。关键的是，PS2-RL对底层RL算法无限制，可插入任何现有训练流程。我们为所提框架建立了理论保证，并在状态维度高达10的机器人控制任务上进行了评估，而在此范围内，先前可证明安全的RL方法难以应对或变得不实用。

英文摘要

Safe reinforcement learning (RL) aims to learn policies that optimize rewards while satisfying constraints. Predominant approaches rely on soft-constrained policy optimization, which has achieved empirical success but does not provide formal safety guarantees for the learned policy. In contrast, methods with strict guarantees typically rely on explicit certificate functions, whose construction requires the direct synthesis and verification of control-invariant sets, a process that scales poorly with state dimension and often yields overly conservative behavior. In this paper, we present the Provably Safe, yet Scalable RL (PS2-RL) framework, a novel two-phase architecture for learning provably safe policies in a scalable manner, designed to overcome the key bottlenecks of prior methods. Rather than explicitly computing invariant sets, PS2-RL leverages a learned backup policy to forward-integrate the system dynamics, generating an implicit control-invariant set online. In the first phase, the backup policy is trained with our proposed safe-arrival value function, which characterizes the optimal backup policy for invariant-set construction. In the second phase, an RL policy is trained end-to-end through a differentiable projection layer that strictly enforces the safety guarantees induced by the learned backup policy. By maximizing the volume of the implicit control-invariant set in the first phase, the resulting PS2 policy from the second phase is performant and scalable, while maintaining provable safety. Crucially, PS2-RL imposes no restrictions on the underlying RL algorithm and can be plugged into any existing training pipeline. We establish theoretical guarantees for the proposed framework and evaluate it on robotic control tasks with state dimensions up to 10, a regime in which prior provably safe RL methods struggle or become impractical.
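
The notion of an implicit control-invariant set generated by forward-integrating a backup policy can be illustrated on a toy system. The sketch below is not PS2-RL: it uses a hand-written braking policy instead of a learned one, and a simple accept-or-fall-back filter instead of a differentiable projection layer. The dynamics, constants, and function names are all assumptions.

```python
DT = 0.1      # integration step (assumed)
X_MAX = 1.0   # safety constraint: position must stay <= X_MAX
A_MAX = 1.0   # actuator limit on acceleration


def step(state, accel):
    """Discrete-time double integrator: state is (position, velocity)."""
    x, v = state
    return (x + v * DT, v + accel * DT)


def backup_action(state):
    """Hand-written backup policy: brake as hard as allowed until stopped."""
    _, v = state
    if v <= 0.0:
        return 0.0
    return max(-A_MAX, -v / DT)


def recoverable(state, horizon=50):
    """A state is in the implicit safe set if the backup rollout stays safe."""
    for _ in range(horizon):
        if state[0] > X_MAX:
            return False
        state = step(state, backup_action(state))
    return state[0] <= X_MAX


def safe_action(state, proposed):
    """Accept the proposed action only if the next state is recoverable."""
    proposed = max(-A_MAX, min(A_MAX, proposed))
    if recoverable(step(state, proposed)):
        return proposed
    return backup_action(state)


far_from_limit = safe_action((0.0, 0.0), 1.0)    # accelerating is fine here
near_limit = safe_action((0.8, 0.5), 1.0)        # overridden by braking
```

No invariant set is ever computed explicitly; membership is decided online by simulating the backup policy, which is the scalability argument made in the abstract.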

2606.14650 2026-06-15 cs.LG 新提交

Graph Structured Combinatorial Semi-Bandit with Nonlinear Reward Associations through Separable Signals

通过可分离信号实现非线性奖励关联的图结构组合半赌博机

Christoph Bauschmann, Setareh Maghsudi

发表机构 * IEEE

AI总结针对图结构组合半赌博机问题，提出基于图因果奖励建模、再生核方法和泰勒近似的自适应策略，实现时间次线性与数据量线性性能保证，并验证于合成与真实交通数据。

AI中文摘要

在大量互连数据中识别最优结构需要大量的采样和计算工作。学习和利用潜在的信号依赖关系可以显著提高效率和预测能力，但非线性统计关系的普遍性增加了此类任务的复杂性。在本文中，我们开发了新颖的通用自适应策略，配备了基于图的因果奖励建模、解析再生核方法以及函数过程的泰勒逼近。我们建立了理论性能保证，在时间上呈次线性，在数据量上随时间呈线性。我们的分析涵盖了对噪声干扰、渐进模型收敛和解空间不匹配等多种不确定性的鲁棒性。该框架的通用性通过最小化条件集或对先验估计的依赖得到证实，而各种概述的修改则针对特定或扩展设置。为了证明实际有效性，我们使用基准合成和真实世界交通数据集进行了数值实验。

英文摘要

The identification of optimal structures within vast arrays of interconnected data necessitates significant sampling and computational effort. Learning and leveraging underlying signal dependencies can improve efficiency and predictive capabilities considerably, but the ubiquity of nonlinear statistical relations amplifies the complexity of such undertakings. In this paper, we develop novel generic and adaptive strategies equipped with routines for graph-based causal reward modeling, analytic reproducing kernel methods, and Taylor approximation of functional processes. We establish theoretical performance guarantees sublinear in time and linear in data volume over time. Our analyses cover robustness to a multitude of uncertainties arising from noise interference, gradual model convergence, and solution space mismatch. The framework's general appeal is substantiated by a minimalistic set of conditions or reliance on prior estimates, while various outlined modifications address specific or extended settings. To demonstrate practical effectiveness, we conduct numerical experiments using both benchmarked synthetic and real-world transportation datasets.

2606.13698 2026-06-15 eess.SY cs.AI cs.LG cs.NI cs.PF cs.SY 交叉投稿

Active Inference for Adaptive Traffic Signal Control in Noisy Nonstationary IoT Environments

嘈杂非平稳物联网环境下自适应交通信号控制的主动推理方法

Dénes Toth, George Ambroladze, Edwin Sundberg, Ali Beikmohammadi, Alfreds Lapkovskis

发表机构 * Department of Computer Systems and Sciences（计算机系统与科学系）； Stockholm University（斯德哥尔摩大学）

AI总结提出一种基于主动推理的交通信号控制器，通过最小化期望自由能动态选择相位，在传感器遮挡、天气衰减和非平稳需求下优于深度Q网络和规则方法，降低空闲时间和CO2排放。

Comments Submitted to IEEE 12th World Forum on Internet of Things (WF-IoT) 2026

AI中文摘要

配备物联网设备的交叉口的城市交通信号控制必须在传感器遮挡、天气衰减和非平稳需求下保持有效。传统控制器在这些条件下性能下降，而学习得到的策略难以审计。为应对这些挑战，我们提出一种针对四臂信号交叉口的主动推理控制器，通过最小化关于各方向拥堵水平的高斯信念的期望自由能（EFE）动态选择相位，形成完全可追踪的决策流程。我们在SUMO交通模拟器中，将控制器与基于规则的启发式方法和深度Q网络（DQN）进行对比，涵盖四种噪声和非平稳性逐渐增加的场景，包括传感器遮挡、恶劣天气和随机事故。每个场景进行100次独立随机评估，主动推理在噪声最大的场景中实现了最低的空闲时间和CO2排放（分别为56,977秒和29.12千克，而DQN为71,741秒和30.56千克）。这些收益的代价是公交优先服务率和相位切换频率方面的适度损失。

英文摘要

Urban traffic signal control at IoT-instrumented intersections must remain effective under sensor occlusion, weather attenuation, and nonstationary demand. Conventional controllers degrade under these conditions, and learned policies remain difficult to audit. To address these challenges, we propose an active inference controller for a four-arm signalized intersection that dynamically selects phases by minimizing expected free energy (EFE) over Gaussian beliefs about per-direction congestion levels, yielding a fully traceable decision pipeline. We benchmark the controller in a SUMO traffic simulator against a rule-based heuristic and a deep Q-network (DQN) across four scenarios that progressively increase noise and nonstationarity, spanning sensor occlusion, adverse weather, and stochastic accidents. Across 100 independent random evaluations per scenario, active inference attains the lowest idle times and CO2 emissions in the noisiest scenarios (56,977 s and 29.12 kg vs. 71,741 s and 30.56 kg for DQN). These gains come at a modest cost in bus priority service rate and phase switch frequency.

2606.13832 2026-06-15 cs.MA cs.AI cs.CR cs.LG 交叉投稿

缩小反思差距：面向智能体强化学习的免费校准奖励

Yinglun Zhu

发表机构 * University of California, Riverside（加州大学河滨分校）

AI总结针对LLM智能体在环境反馈后自我评估不准确的问题，提出RefGRPO方法，通过对比反思与实际结果计算免费校准奖励并动态调整系数，同时提升反思校准和任务准确率。

AI中文摘要

LLM越来越多地被部署为与外部环境交互并观察执行结果、错误消息和工具输出等反馈的智能体。一个功能良好的智能体应能利用这些反馈准确评估自身表现。然而，我们发现存在持续的反思差距：LLM智能体在观察到具体环境反馈后，倾向于错误评估自身输出——即使对于它们正确回答的问题也是如此——而标准RL由于信用分配不匹配几乎无济于事。为缩小这一差距，我们提出RefGRPO，一种简单而有效的修复方法，通过两个关键要素增强标准RL算法：一个免费校准奖励，通过对比智能体自身反思与实际结果计算（无需额外奖励模型、LLM评判或外部标注），以及对其系数的动态调度。与标准RL基线相比，我们的方法在五个基准的文本到SQL任务上同时提高了反思校准（例如，将不自信率从44.4%降至7.7%）和任务准确率（例如，从75.1%提升至76.5%）。由此产生的校准反思将智能体转变为基于环境反馈的自身验证器，进一步实现：（i）更好的自我改进，使用反思作为伪奖励而无需结果监督；（ii）更有效的测试时选择性预测，仅提交标记为正确的rollout。

英文摘要

LLMs are increasingly deployed as agents that interact with external environments and observe feedback such as execution results, error messages, and tool outputs. A well-functioning agent should be able to leverage this feedback to accurately assess its own performance. Yet we find a persistent reflection gap: LLM agents tend to mis-assess their own outputs after observing concrete environment feedback -- even for questions they correctly answered -- and standard RL barely helps due to a credit-assignment mismatch. To close this gap, we propose RefGRPO, a simple yet effective fix that augments standard RL algorithms with two key ingredients: a free calibration bonus computed by contrasting the agent's own reflection with the actual outcome (requiring no additional reward model, LLM judge, or external annotation), and a dynamic schedule on its coefficient. Compared to standard RL baselines, our method simultaneously improves reflection calibration (e.g., reduces underconfidence rate $44.4\% \to 7.7\%$) and task accuracy (e.g., $75.1\% \to 76.5\%$) on text-to-SQL across five benchmarks. The resulting calibrated reflection turns the agent into its own verifier grounded in environment feedback, which further enables (i) better self-improvement that uses reflections as pseudo-rewards without outcome supervision, and (ii) more effective test-time selective prediction by committing only to rollouts flagged as correct.
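
The calibration bonus needs nothing beyond the outcome signal that RL already has, which a short sketch makes clear. The exact bonus form and schedule are not given in the abstract, so the agreement indicator, the linear ramp in `coefficient`, and the `max_coef` value below are assumptions for illustration only.

```python
def calibration_bonus(reflection_says_correct, actually_correct):
    """1.0 when the agent's self-assessment matches the real outcome."""
    return 1.0 if reflection_says_correct == actually_correct else 0.0


def coefficient(step, total_steps, max_coef=0.5):
    """Hypothetical dynamic schedule: ramp the bonus weight up linearly."""
    return max_coef * min(1.0, step / total_steps)


def shaped_reward(actually_correct, reflection_says_correct, step, total_steps):
    """Task reward plus the scheduled calibration bonus."""
    task_reward = 1.0 if actually_correct else 0.0
    bonus = calibration_bonus(reflection_says_correct, actually_correct)
    return task_reward + coefficient(step, total_steps) * bonus


# Correct answer, but the agent judged itself wrong (underconfident): no bonus.
underconfident = shaped_reward(True, False, step=100, total_steps=100)
# Correct answer and correct self-assessment: full bonus.
calibrated = shaped_reward(True, True, step=100, total_steps=100)
# Wrong answer, honestly flagged as wrong, halfway through training.
honest_failure = shaped_reward(False, False, step=50, total_steps=100)
```

The bonus rewards agreement between reflection and outcome in both directions, so an agent is paid for recognizing its own failures as well as its successes.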

2606.14418 2026-06-15 cs.AI cs.LG cs.RO 交叉投稿

Causal Object-Centric Models for Planning with Monte Carlo Tree Search

用于蒙特卡洛树搜索规划的因果对象中心模型

Rodion Vakhitov, Leonid Ugadiarov, Alexey Skrynnik, Aleksandr Panov

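Affiliations: MIRAI; CogAILab

AI summary: Proposes the COMET algorithm, which combines an unsupervised object-centric encoder with a Transformer world model and achieves efficient planning through an action-slot fusion mechanism and object-causal attention, outperforming baselines on several benchmarks.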

AI中文摘要

高效探索的无监督学习：通过自我设定目标预训练自适应策略

Octavio Pappalardo

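Affiliations: University College London (UCL)

AI summary: Proposes ULEE, which combines an in-context learner with an adversarial goal-generation strategy to optimize multi-episode exploration and adaptation within an unsupervised meta-learning framework, improving zero-shot and few-shot performance.

Comments ICLR 2026; v2 adds link to code: https://github.com/Octavio-Pappalardo/ulee-jax

Journal ref The Fourteenth International Conference on Learning Representations, 2026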

AI中文摘要

无监督预训练可以为强化学习智能体提供先验知识，加速下游任务的学习。一个基于人类发展的有前景方向是研究智能体通过设定和追求自身目标来学习。核心挑战在于如何有效地生成、选择并从这些目标中学习。我们的关注点是下游任务的广泛分布，其中零样本解决每个任务是不可行的。当目标任务位于预训练分布之外或智能体未知其身份时，这种设置自然出现。在这项工作中，我们(i)在元学习框架内优化高效的多回合探索和适应，以及(ii)用智能体适应后性能的演化估计来指导训练课程。我们提出了ULEE，一种无监督元学习方法，它将上下文学习器与对抗性目标生成策略相结合，该策略将训练维持在智能体能力的前沿。在XLand-MiniGrid基准测试中，ULEE预训练产生了改进的探索和适应能力，这些能力泛化到新的目标、环境动态和地图结构。得到的策略获得了改进的零样本和少样本性能，并为更长的微调过程提供了强初始化。它优于从头学习、DIAYN预训练和替代课程。代码可在以下网址获取：https://github.com/Octavio-Pappalardo/ulee-jax

英文摘要

Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent's post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent's capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula. Code is available at: https://github.com/Octavio-Pappalardo/ulee-jax

2602.04879 2026-06-15 cs.LG cs.AI cs.CL 版本更新

Rethinking the Trust Region in LLM Reinforcement Learning

重新思考LLM强化学习中的信任区域

Penghui Qi, Xiangxin Zhou, Zichen Liu, Tianyu Pang, Chao Du, Min Lin, Wee Sun Lee

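Affiliations: University of California, Berkeley; University of Toronto

AI summary: To address the training instability that PPO suffers in LLM fine-tuning because of large vocabularies, the paper proposes DPPO, which constrains updates directly through the policy divergence, and introduces efficient approximations of that divergence.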

AI中文摘要

强化学习已成为微调大型语言模型（LLM）的基石，其中近端策略优化（PPO）是事实上的标准算法。尽管其普遍存在，我们认为PPO中的核心比率裁剪机制在结构上不适合LLM固有的大词表。PPO基于采样令牌的概率比率约束策略更新，该比率是对真实策略散度的有噪单样本蒙特卡洛估计。这导致次优的学习动态：低概率令牌的更新被过度惩罚，而高概率令牌中潜在的灾难性变化却约束不足，导致训练效率低下和不稳定。为解决此问题，我们提出散度近端策略优化（DPPO），用基于策略散度（如总变差或KL）直接估计的更原则性约束替代启发式裁剪。为避免巨大内存占用，我们引入了高效的二元和Top-K近似，以可忽略的开销捕获本质散度。大量实证评估表明，DPPO相比现有方法实现了更优的训练稳定性和效率，为基于RL的LLM微调提供了更稳健的基础。我们的代码可在https://github.com/sail-sg/Stable-RL获取。

英文摘要

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio clipping mechanism in PPO is structurally ill-suited for the large vocabularies inherent to LLMs. PPO constrains policy updates based on the probability ratio of sampled tokens, which serves as a noisy single-sample Monte Carlo estimate of the true policy divergence. This creates a sub-optimal learning dynamic: updates to low-probability tokens are aggressively over-penalized, while potentially catastrophic shifts in high-probability tokens are under-constrained, leading to training inefficiency and instability. To address this, we propose Divergence Proximal Policy Optimization (DPPO), which substitutes heuristic clipping with a more principled constraint based on a direct estimate of policy divergence (e.g., Total Variation or KL). To avoid huge memory footprint, we introduce the efficient Binary and Top-K approximations to capture the essential divergence with negligible overhead. Extensive empirical evaluations demonstrate that DPPO achieves superior training stability and efficiency compared to existing methods, offering a more robust foundation for RL-based LLM fine-tuning. Our code is available at https://github.com/sail-sg/Stable-RL.
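
The contrast the abstract draws between ratio clipping and a divergence-based constraint can be made concrete with a small Python sketch. It assumes a "binary" reduction that collapses the vocabulary to {sampled token, everything else} and a "Top-K" reduction that keeps the K most probable tokens under the old policy and lumps the rest into one bucket. These definitions, the function names, and the clip range of 0.2 are illustrative assumptions; the paper's Binary and Top-K approximations may be defined differently.

```python
def ratio_within_clip(p_old: float, p_new: float, eps: float = 0.2) -> bool:
    """PPO-style check: is the sampled token's probability ratio inside [1-eps, 1+eps]?"""
    ratio = p_new / p_old
    return 1.0 - eps <= ratio <= 1.0 + eps


def binary_tv(p_old: float, p_new: float) -> float:
    """Total variation between Bernoulli(p_old) and Bernoulli(p_new).

    0.5 * (|p_new - p_old| + |(1 - p_new) - (1 - p_old)|) simplifies to |p_new - p_old|.
    """
    return abs(p_new - p_old)


def topk_tv(old: list[float], new: list[float], k: int) -> float:
    """Total variation after keeping the k most likely old-policy tokens
    and merging all remaining tokens into a single bucket."""
    top = sorted(range(len(old)), key=lambda i: old[i], reverse=True)[:k]
    head = sum(abs(new[i] - old[i]) for i in top)
    tail_old = 1.0 - sum(old[i] for i in top)
    tail_new = 1.0 - sum(new[i] for i in top)
    return 0.5 * (head + abs(tail_new - tail_old))


# Rare token: probability doubles, so the ratio test rejects it,
# yet the distribution has barely moved.
rare_clipped = not ratio_within_clip(0.001, 0.002)
rare_shift = binary_tv(0.001, 0.002)

# Common token: the ratio stays inside the clip range,
# yet a large share of probability mass has moved.
common_clipped = not ratio_within_clip(0.9, 0.75)
common_shift = binary_tv(0.9, 0.75)
```

In this toy case the ratio test flags the rare token and passes the common one, while the divergence ranks them the other way round, which matches the over-penalized versus under-constrained behavior the abstract attributes to clipping.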

2602.14169 2026-06-15 cs.LG cs.AI cs.CL 版本更新

Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling

基于枢轴驱动重采样的LLM强化学习深度密集探索

Yiran Guo, Zhongjian Qiao, Yingqi Xie, Jie Liu, Dan Ye, Ruiqing Zhang, Shuang Qiu, Lijie Xu

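Affiliations: Institute of Software, Chinese Academy of Sciences; City University of Hong Kong; Baidu

AI summary: To address inefficient exploration in reinforcement learning for large language models, the paper proposes Deep Dense Exploration (DDE), which identifies recoverable pivot states in failed trajectories and resamples densely around them, combined with a dual-stream optimization objective; it outperforms existing methods on mathematical reasoning benchmarks.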

AI中文摘要

有效探索是大语言模型强化学习中的一个关键挑战：在有限的采样预算内，从庞大的自然语言序列空间中发现高质量轨迹。现有方法面临显著局限性：GRPO仅从根节点采样，使高概率轨迹饱和，而深层易错状态探索不足；基于树的方法盲目地将预算分散到琐碎或不可恢复的状态，导致采样稀释，无法发现罕见的正确后缀并破坏局部基线。为解决此问题，我们提出深度密集探索（DDE），一种将探索聚焦于失败轨迹中的“枢轴”——深层、可恢复状态的策略。我们通过DEEP-GRPO实例化DDE，引入三个关键创新：（1）轻量级数据驱动效用函数，自动平衡可恢复性和深度偏差以识别枢轴状态；（2）在每个枢轴处进行局部密集重采样，增加发现后续正确轨迹的概率；（3）双流优化目标，将全局策略学习与局部纠正更新解耦。在数学推理基准上的实验表明，我们的方法一致优于GRPO、基于树的方法及其他强基线。代码见 https://github.com/AgentCombo/DEEP-GRPO

英文摘要

Effective exploration is a key challenge in reinforcement learning for large language models: discovering high-quality trajectories within a limited sampling budget from the vast natural language sequence space. Existing methods face notable limitations: GRPO samples exclusively from the root, saturating high-probability trajectories while leaving deep, error-prone states under-explored. Tree-based methods blindly disperse budgets across trivial or unrecoverable states, causing sampling dilution that fails to uncover rare correct suffixes and destabilizes local baselines. To address this, we propose Deep Dense Exploration (DDE), a strategy that focuses exploration on $\textit{pivots}$: deep, recoverable states within unsuccessful trajectories. We instantiate DDE with DEEP-GRPO, which introduces three key innovations: (1) a lightweight data-driven utility function that automatically balances recoverability and depth bias to identify pivot states; (2) local dense resampling at each pivot to increase the probability of discovering correct subsequent trajectories; and (3) a dual-stream optimization objective that decouples global policy learning from local corrective updates. Experiments on mathematical reasoning benchmarks demonstrate that our method consistently outperforms GRPO, tree-based methods, and other strong baselines. Code is available at https://github.com/AgentCombo/DEEP-GRPO

2603.12231 2026-06-15 cs.LG 版本更新

Temporal Straightening for Latent Planning

时间拉直用于隐式规划

Ying Wang, Oumayma Bounou, Gaoyue Zhou, Randall Balestriero, Tim G. J. Rudner, Yann LeCun, Mengye Ren

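Affiliations: New York University; Brown University; University of Toronto

AI summary: Inspired by the perceptual straightening hypothesis in human visual processing, the paper proposes temporal straightening, which jointly learns the encoder and predictor of a JEPA world model with a curvature regularizer; this improves representation learning for latent planning, makes gradient-based planning more stable, and raises success rates on goal-reaching tasks.

Comments ICML2026 Camera Ready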

AI中文摘要

学习良好的表示对于基于世界模型的隐式规划至关重要。虽然预训练的视觉编码器能产生强大的语义视觉特征，但它们并非为规划定制，且包含与规划无关甚至有害的信息。受人类视觉处理中感知拉直假说的启发，我们引入时间拉直来改进隐式规划的表示学习。通过使用鼓励局部拉直隐式轨迹的曲率正则化器，我们联合学习联合嵌入预测架构（JEPA）世界模型的编码器和预测器。我们表明，以这种方式降低曲率使得隐空间中的欧氏距离更好地近似测地距离，并改善了规划目标的条件。我们通过实验证明，时间拉直使得基于梯度的规划更稳定，并在一系列目标到达任务中显著提高了成功率。我们的代码可在 https://agenticlearning.ai/temporal-straightening 获取。

英文摘要

Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contain information irrelevant -- or even detrimental -- to planning. Inspired by the perceptual straightening hypothesis in human visual processing, we introduce temporal straightening to improve representation learning for latent planning. Using a curvature regularizer that encourages locally straightened latent trajectories, we jointly learn an encoder and a predictor of a Joint-Embedding Predictive Architecture (JEPA) world model. We show that reducing curvature this way makes the Euclidean distance in latent space a better proxy for the geodesic distance and improves the conditioning of the planning objective. We demonstrate empirically that temporal straightening makes gradient-based planning more stable and yields significantly higher success rates across a suite of goal-reaching tasks. Our code is available at https://agenticlearning.ai/temporal-straightening.
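
A curvature penalty on latent trajectories can be sketched in a few lines of Python. The sketch assumes one common definition of discrete curvature, namely one minus the cosine of the angle between successive displacement vectors; the regularizer used in the paper may differ, and the function name is hypothetical.

```python
import math


def mean_curvature(traj: list[list[float]]) -> float:
    """Average of (1 - cosine) between consecutive displacement vectors.

    0.0 means the trajectory is a straight line; 1.0 means every turn is a
    right angle; 2.0 means every step reverses direction.
    """
    if len(traj) < 3:
        raise ValueError("need at least three latent states")
    # Displacements v_t = z_{t+1} - z_t between consecutive latent states.
    vels = [[b - a for a, b in zip(p, q)] for p, q in zip(traj, traj[1:])]
    total = 0.0
    for u, v in zip(vels, vels[1:]):
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_u == 0.0 or norm_v == 0.0:
            raise ValueError("consecutive latent states must differ")
        dot = sum(x * y for x, y in zip(u, v))
        total += 1.0 - dot / (norm_u * norm_v)
    return total / (len(vels) - 1)


straight = mean_curvature([[0, 0], [1, 0], [2, 0]])  # collinear states
bent = mean_curvature([[0, 0], [1, 0], [1, 1]])      # one right-angle turn
```

When the penalty is zero the states are collinear, so the straight-line (Euclidean) distance between the endpoints equals the length of the path through the latents, which is the intuition behind the abstract's claim about Euclidean and geodesic distance.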

2603.18464 2026-06-15 cs.LG 版本更新

AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

AcceRL: 面向视觉-语言-动作模型的分布式异步强化学习与世界模型框架

Chengxuan Lu, Shukuan Wang, Yanjie Li, Yingying Fang, Huoyan Wang, Tian Zhang, Wei Liu, Shiji Jin, Fuyuan Qian, Peiming Li, Chao Xu, Baigui Sun, Yang Liu

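Affiliations: IROOTECH TECHNOLOGY Wolf 1069 b Lab, Sany Group

AI summary: Proposes AcceRL, a distributed asynchronous reinforcement learning framework that physically isolates environment interaction, model inference, and gradient updates, eliminating the idle bubbles of synchronous systems and raising hardware utilization; it supports plug-and-play world models and reports a 2.4x throughput speedup and up to a 200x sample-efficiency gain on LIBERO tasks.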

AI中文摘要

大规模视觉-语言-动作（VLA）模型的强化学习（RL）严重受限于同步障碍和环境数据获取的高成本。为克服这些挑战，我们提出AcceRL，一种分布式异步RL框架，物理隔离环境交互、模型推理和梯度更新。通过消除同步系统中固有的级联长尾空闲气泡，AcceRL最大化硬件利用率并确保可扩展吞吐量。此外，AcceRL采用模块化设计，支持将多种即插即用的世界模型集成到其分布式流水线中。大量实验表明，基础框架在所有四个LIBERO任务套件上均取得极具竞争力的性能。系统层面，异步架构相比领先的同步基线实现了2.4倍的吞吐加速。算法层面，通过利用在1000条离线轨迹上预训练的世界模型，AcceRL在LIBERO-Spatial上实现了高达200倍的在线样本效率提升，为具身AI建立了一个既样本高效又时间高效的稳健框架。代码包含在补充材料中。代码见 https://github.com/distanceLu/AcceRL。

英文摘要

Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models is severely bottlenecked by synchronization barriers and the high cost of environment data acquisition. To overcome these challenges, we propose AcceRL, a distributed asynchronous RL framework that physically isolates environment rollouts, model inference, and gradient updates. By eliminating the cascading long-tail idle bubbles inherent in synchronous systems, AcceRL maximizes hardware utilization and ensures scalable throughput. Furthermore, AcceRL features a modular design that supports the integration of diverse, plug-and-play world models into its distributed pipeline. Extensive experiments demonstrate that the base framework achieves highly competitive performance across all four LIBERO task suites. At the system level, the asynchronous architecture delivers a $2.4\times$ throughput speedup over leading synchronous baselines. Algorithmically, by leveraging a world model pre-trained on 1,000 offline trajectories, AcceRL achieves up to a $200\times$ improvement in online sample efficiency on LIBERO-Spatial, establishing a robust framework that is both sample-efficient and time-efficient for embodied AI. Code is included in the supplementary material. Code is available at https://github.com/distanceLu/AcceRL.

2605.03065 2026-06-15 cs.LG cs.RO 版本更新

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

OGPO：生成控制策略的样本高效全微调

Sarvesh Patil, Mitsuhiko Nakamoto, Manan Agarwal, Shashwat Saxena, Jesse Zhang, Giri Anantharaman, Cleah Winston, Chaoyi Pan, Douglas Chen, Nai-Chieh Huang, Zeynep Temel, Oliver Kroemer, Sergey Levine, Abhishek Gupta, Hongkai Dai, Paarth Shah, Max Simchowitz

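Affiliations: University of California, Berkeley

AI summary: Proposes OGPO, which uses off-policy critic networks and a modified PPO objective to fine-tune generative control policies sample-efficiently; it achieves state-of-the-art performance on a range of manipulation tasks and can fine-tune poorly initialized behavior cloning policies without expert data.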

AI中文摘要

生成控制策略（GCPs），如基于扩散和基于流的控制策略，已成为机器人学习的有效参数化方法。本文介绍了离策略生成策略优化（OGPO），一种用于微调GCPs的样本高效算法，该算法维护离策略评论网络以最大化数据重用，并通过修改的PPO目标将策略梯度传播到策略的完整生成过程，使用评论网络作为终端奖励。OGPO在涵盖多任务设置、高精度插入和灵巧控制的操作任务上达到了最先进的性能。据我们所知，它也是唯一一种能够在在线回放缓冲区中无专家数据的情况下，将初始化不良的行为克隆策略微调到接近完全任务成功的方法，并且只需很少的任务特定超参数调整。通过广泛的实证研究，我们证明了OGPO在策略引导和残差学习方面显著优于替代方法，并确定了其性能背后的关键机制。我们进一步引入了实用的稳定技巧，包括成功缓冲区正则化、双边保守优势和Q方差减少，以减轻基于状态和基于像素的设置中的评论网络过度利用。除了提出OGPO，我们还对GCP微调进行了系统的实证研究，确定了控制成功离策略全策略改进的稳定机制和失败模式。

英文摘要

Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagate policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward. OGPO achieves state-of-the-art performance on manipulation tasks spanning multi-task settings, high-precision insertion, and dexterous control. To our knowledge, it is also the only method that can fine-tune poorly-initialized behavior cloning policies to near full task-success with no expert data in the online replay buffer, and does so with little task-specific hyperparameter tuning. Through extensive empirical investigations, we demonstrate that OGPO drastically outperforms alternative methods on policy steering and learning residual corrections, and identify the key mechanisms behind its performance. We further introduce practical stabilization tricks, including success-buffer regularization, two-sided conservative advantages, and Q-variance reduction, to mitigate critic over-exploitation across state- and pixel-based settings. Beyond proposing OGPO, we conduct a systematic empirical study of GCP finetuning, identifying the stabilizing mechanisms and failure modes that govern successful off-policy full-policy improvement.

2606.13955 2026-06-15 cs.LG 新提交

隐式变分拒绝采样

Jian Xu, Shigui Li, Wei Chen, Jiacheng Li, Zhiqi Lin, Delu Zeng, Xinghao Ding, John Paisley, Qibin Zhao

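Affiliations: RIKEN iTHEMS; RIKEN AIP; South China University of Technology; Xiamen University; Columbia University

AI summary: Proposes Implicit Variational Rejection Sampling (IVRS), which combines implicit distributions with rejection sampling: a neural network constructs the proposal distribution and a discriminator estimates the density ratio to improve the posterior approximation. It introduces the IR-ELBO as a quality measure and outperforms traditional variational inference in experiments.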

AI中文摘要

变分推断（VI）是贝叶斯机器学习中用于近似复杂后验分布的基本推断技术。传统的VI通常依赖于均值场分解，这可能无法充分捕捉真实后验的复杂性。最近的进展利用神经网络建模隐式分布，提供了更大的灵活性。然而，神经网络架构的实际约束仍然会导致不准确性。在本文中，我们提出了一种称为隐式变分拒绝采样（IVRS）的方法，该方法将隐式分布与拒绝采样相结合，以改进后验近似。我们的方法使用神经网络构建隐式提议分布，并通过一个判别器网络进行拒绝采样，该网络估计隐式提议与真实后验之间的密度比，以细化近似。为此，我们引入了隐式重采样证据下界（IR-ELBO）作为度量重采样分布质量的指标，并推导出更紧的变分下界。实验结果表明，我们的方法优于传统的变分推断技术。

英文摘要

Variational Inference (VI) is a fundamental inference technique in Bayesian machine learning for approximating complex posterior distributions. Traditional VI often relies on the mean-field factorization, which can inadequately capture true posterior complexity. Recent advancements have leveraged neural networks to model implicit distributions, offering increased flexibility. However, the practical constraints of neural network architectures still produce inaccuracies. In this paper, we propose a method called Implicit Variational Rejection Sampling (IVRS), which integrates implicit distributions with rejection sampling to improve the posterior approximation. Our method uses neural networks to construct implicit proposal distributions, and rejection sampling with a discriminator network that estimates the density ratio between the implicit proposal and the true posterior for refining the approximation. Towards this end, we introduce the Implicit Resampling Evidence Lower Bound (IR-ELBO) as a metric to characterize the resampled distribution's quality and derive a tighter variational lower bound. Experimental results demonstrate that our method outperforms traditional variational inference techniques.

2606.13796 2026-06-15 stat.ML cs.LG 交叉投稿

Recursively Trained Diffusion Models: Limiting Collapse Distribution and Spectral Characterization

递归训练的扩散模型：限制崩溃分布与谱特征

Naïl B. Khelifa, Richard E. Turner, Ramji Venkataramanan

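Affiliations: University of Cambridge

AI summary: Studies distribution collapse when diffusion models are trained recursively, proving that early stopping causes drift even under perfect learning and that the recursion converges to a unique limiting distribution whose spectrum shows low-pass filtering behavior.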

AI中文摘要

生成模型在其自身输出上的递归训练可能导致模型崩溃，即与真实数据分布的复合漂移。现有的理论工作限制了扩散模型背景下有限轮误差的累积，但有两个问题仍然悬而未决：递归收敛到何种分布，以及收敛速度如何？我们回答了这两个问题，并分离出一种不同于不完美学习的机制：即使具有完美的分数估计和精确采样，反向扩散的早期停止（出于数值稳定性需要）也会驱动逐渐偏离数据分布。我们证明该递归几何收敛到唯一的极限分布，该分布具有闭式表征，即数据分布的无限混合，其中每个分量是数据分布的高斯平滑版本，且平滑程度递增。该极限的Hermite谱分解表明，递归训练充当低通滤波器：编码精细非高斯结构的高阶模式比粗模式衰减得更强。这种谱图景启发了一种退火截断调度，该调度在再训练轮次中逐步缩小截断时间；我们证明任何收敛到0的调度都能渐近消除递归复合。最后，我们展示了理想化表征的鲁棒性：在存在离散化和分数估计误差的情况下，学习到的分布保持在理想极限周围的Wasserstein-2球内，且具有模式依赖的收缩率，高阶误差比低阶误差收缩更快。我们在合成高斯混合和CIFAR-10上验证了该理论。

英文摘要

Recursive training of generative models on their own outputs can lead to model collapse, a compounding drift away from the true data distribution. Existing theoretical works bound finite-round error accumulation in the context of diffusion models, but two questions remain open: what distribution does the recursion converge to, and how fast? We answer both, isolating a mechanism distinct from imperfect learning: even with perfect score estimation and exact sampling, the early stopping of the reverse diffusion (required for numerical stability) drives a progressive drift away from the data distribution. We prove that this recursion converges geometrically to a unique limiting distribution, which admits a closed-form characterization as an infinite mixture of increasingly Gaussian-smoothed versions of the data distribution. A Hermite spectral decomposition of this limit reveals that recursive training acts as a low-pass filter: higher-order modes, which encode fine non-Gaussian structure, are attenuated much more strongly than coarse modes. This spectral picture motivates annealed truncation schedules that progressively shrink truncation times across retraining rounds; we prove that any schedule converging to $0$ asymptotically eliminates recursive compounding. Finally, we show our idealized characterization is robust: in the presence of discretization and score estimation errors, the learned distribution remains in a Wasserstein-2 ball around the ideal limit, with mode-dependent contraction rates that contract high-order errors faster than low-order ones. We validate the theory on synthetic Gaussian mixtures and CIFAR-10.

2606.13817 2026-06-15 cs.RO cs.LG 交叉投稿

巴赫风格符号音乐的生成建模：自回归、潜变量和对抗方法的比较研究

Dezhi Yu, Kyuil Lee, Yongkang Huang

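Affiliations: Stanford University

AI summary: Compares an autoregressive LSTM, latent-variable models, and generative adversarial networks for generating Bach-style piano music; the autoregressive LSTM with attention produces the most coherent music, vector quantization mitigates posterior collapse, and the adversarial approach captures local pitch structure but is hard to train.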

Comments 11 pages, 13 figures. All authors contributed equally

2606.13753 2026-06-15 cs.LG cs.AI 新提交

The Weight Norm Sets the Grokking Timescale: A Causal Delay Law

权重范数设定“顿悟”时间尺度：因果延迟定律

Truong Xuan Khanh, Doan Hoang Viet, Luu Duc Trung, Phan Thanh Duc

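Affiliations: H&K Research Studio / Clevix LLC; Bac A Bank; Banking Academy of Vietnam

AI summary: By intervening on the weight norm during training, the paper finds that networks grok when the norm reaches a critical value Wc and that, when the norm is held at a fixed multiple of Wc, the delay grows exponentially in that multiple, revealing a causal role of the norm in grokking.

Comments 14 pages, 9 figs and 3 tables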

AI中文摘要

“顿悟”是神经网络中泛化能力的延迟出现，远在模型拟合训练数据之后才发生。权重范数是否导致这种延迟存在争议：一些研究报告了转变时的临界范数，另一些则观察到没有固定范数的顿悟。我们通过在训练过程中干预范数而非仅观察它来解决这一问题。在带权重衰减的自由训练下，当权重范数达到一个跨种子和学习率变化很小（变异系数1%至2%）且随模数基按幂律增长的值Wc时，网络发生顿悟。当我们转而将范数固定为Wc的某个倍数ρ并保持该值时，网络仍然顿悟，但延迟遵循T_grok ∝ exp(α ρ)。一个指数α≈7.5拟合了四个模数下的延迟（R²=0.996）。在扫描范围内，固定范数使延迟变化约19倍，而学习率仅变化约2倍，且将范数保持在Wc以上会减慢而非阻止顿悟。最后的LayerNorm通过解耦权重尺度与网络函数消除了这种依赖；没有它，指数定律重新出现。这种固定范数的延迟是指数对应物，对应于自由收缩范数所预测的对数延迟。

英文摘要

Grokking is the delayed onset of generalization in neural networks, arising long after they fit the training data. Whether the weight norm causes this delay is disputed: some studies report a critical norm at the transition, others observe grokking with no fixed norm at all. We settle this by intervening on the norm during training rather than only observing it. Under free training with weight decay, networks grok when the weight norm reaches a value Wc that varies little across seeds and learning rates (CV 1 to 2 percent) and grows with the modular base as a power law. When we instead clamp the norm to a fixed multiple rho of Wc and hold it there, the network still groks, but the delay follows T_grok proportional to exp(alpha rho). One exponent, alpha near 7.5, fits this delay across four moduli (R^2 = 0.996). Over the swept ranges the held norm moves the delay by about 19x and the learning rate by only about 2x, and holding the norm above Wc slows grokking rather than preventing it. A final LayerNorm removes the dependence by decoupling weight scale from the network function; without it the exponential law returns. This pinned-norm delay is the exponential counterpart to the logarithmic delay predicted for a freely contracting norm.
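
The delay law quoted in the abstract, T_grok proportional to exp(alpha rho) with alpha near 7.5, lends itself to a short back-of-the-envelope calculation in Python. The sketch assumes the law holds exactly and takes alpha = 7.5; the function names are hypothetical, and reading the reported 19x variation as a ratio between the two ends of the swept range is an assumption, not something the abstract states.

```python
import math

ALPHA = 7.5  # exponent reported in the abstract ("alpha near 7.5")


def delay_ratio(rho_a: float, rho_b: float, alpha: float = ALPHA) -> float:
    """Ratio T_grok(rho_b) / T_grok(rho_a) under T_grok ~ exp(alpha * rho).

    The unknown proportionality constant cancels in the ratio.
    """
    return math.exp(alpha * (rho_b - rho_a))


def rho_span_for_ratio(ratio: float, alpha: float = ALPHA) -> float:
    """Difference in rho that changes the delay by the given factor."""
    return math.log(ratio) / alpha


# Holding the norm 10% of Wc higher (rho 1.0 -> 1.1) multiplies the delay
# by exp(0.75), a bit more than 2x.
ten_percent = delay_ratio(1.0, 1.1)

# A 19x change in delay corresponds to a span of roughly 0.39 in rho.
span_19x = rho_span_for_ratio(19.0)
```

Because the dependence is exponential, modest changes in the held norm translate into large changes in the delay, consistent with the abstract's observation that the held norm moves the delay far more than the learning rate does.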

2606.13818 2026-06-15 cs.LG 新提交

Uncertainty Estimation and Generalization Bounds for Modern Deep Learning

现代深度学习的不确定性估计与泛化界

Luis A. Ortega

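Affiliations: Department of Computer Science, Machine Learning Group; Madrid, June 2026

AI summary: From a Bayesian perspective, the thesis unifies inference, function-space modeling, and large-deviation theory; it proposes DVIP, VaLLA, and FMGP to improve uncertainty estimation and uses PAC-Bayesian and large-deviation theory to explain why over-parameterized neural networks generalize.

Comments PhD Thesis, Autonomous University of Madrid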

AI中文摘要

本论文研究贝叶斯原理如何加深我们对现代深度学习系统的理解。尽管神经网络取得了显著的预测性能，但其泛化能力和不确定性量化能力仍仅被部分理解。本论文从方法论和理论两个角度应对这一挑战：将贝叶斯推断、函数空间建模和大偏差理论统一在一个共同的概率视角下。在方法论方面，论文引入了深度变分隐过程（DVIP），这是一个可扩展的贝叶斯框架，将隐过程扩展到深度架构。作为补充，提出了两种后处理方法——变分线性化拉普拉斯近似（VaLLA）和固定均值高斯过程（FMGP）——为预训练的确定性网络配备校准的不确定性估计。理论贡献集中于现代机器学习中一个核心开放问题：为什么大型、过参数化的神经网络能泛化得这么好？为此，论文发展了一个统一的概率框架，在PAC-贝叶斯和大偏差理论的语言下连接了三个关键机制——多样性、平滑性和随机性。

英文摘要

This thesis investigates how Bayesian principles can deepen our understanding of modern deep learning systems. While neural networks achieve remarkable predictive performance, their ability to generalize and to quantify uncertainty remains only partly understood. This thesis approaches this challenge from both methodological and theoretical angles: unifying Bayesian inference, function-space modeling, and large-deviation theory under a common probabilistic perspective. On the methodological side, the thesis introduces the Deep Variational Implicit Process (DVIP), a scalable Bayesian framework that extends implicit processes to deep architectures. Complementing this, two post-hoc methods -- the Variational Linearized Laplace Approximation (VaLLA) and the Fixed-Mean Gaussian Process (FMGP) -- are proposed to equip pretrained deterministic networks with calibrated uncertainty estimates. The theoretical contributions focus on one of the central open questions in modern machine learning: why do large, over-parameterized neural networks generalize so well? To address this, the thesis develops a unified probabilistic framework that connects three key mechanisms -- diversity, smoothness, and stochasticity -- within the language of PAC-Bayesian and large-deviation theory.

2606.13867 2026-06-15 cs.LG 新提交

程序仍在：程序发现的一个守恒律

Jorge Miguel Silva

Affiliations: Institute of Electronics and Informatics Engineering of Aveiro (IEETA) and Department of Electronics, Telecommunications and Informatics (DETI), University of Aveiro

AI summary: Proves that, for algorithms that learn about candidate programs only through their scores, the coupling width of the search problem yields an exponential worst-case lower bound, from which a conservation law between structural knowledge and search follows; the only escape is to read a program's structure rather than its score, at the price of incompleteness.

Comments 9 pages main text and 33 pages supporting information. Engine source and full sweep data: https://github.com/jorgeMFS/omnis, archived at doi:10.5281/zenodo.20634984

AI中文摘要

寻找生成序列的最短程序是不可计算的，六十年来这一事实被误认为是寻找任何生成程序的障碍。它不是障碍，而是一个代价，本文衡量了它。对于每个仅通过得分学习候选程序的算法，涵盖Levin搜索、进化方法、模拟退火和交叉熵方法，我们定义了搜索问题的耦合宽度，并证明了一个无条件最坏情况下的下界，该下界以该宽度为指数，底数为域大小减一。由此得出一个守恒律：注入搜索的结构知识与它消除的搜索一一对应，它们的总和永远不会低于所寻找的程序长度。Levin 1973年的上界和本文证明的下界是一个守恒量的两端，随着指令集的增长而相互靠近。唯一的逃逸是读取候选程序的结构而非其得分，其代价（我们针对通用目标证明）是不完备性。基于该理论构建的确定性引擎通过压缩数据并预测未见过的延续来恢复生成程序，在四个独立群体的3914个序列中恢复了2383个，包括256个初等元胞自动机中的244个，测得的发现成本随程序长度上升，比得分-预言机最坏情况低一个数量级以上。

英文摘要

Finding the shortest program that generates a sequence is uncomputable, and for six decades that fact has been mistaken for a wall around finding any generating program. It is not a wall but a price, and this paper measures it. For every algorithm that learns about a candidate program only through its score, a class spanning Levin search, evolutionary methods, simulated annealing, and the cross-entropy method, we define the coupling width of a search problem and prove an unconditional worst-case lower bound, exponential in that width with base one less than the domain size. From it follows a conservation law: structural knowledge injected into a search trades one for one against the search it removes, and their sum can never fall below the length of the program sought. Levin's 1973 upper bound and the lower bound proved here are the two ends of one conserved quantity, closing on each other as the instruction set grows. The only escape is to read a candidate's structure rather than its score, and its price, which we prove for generic targets, is incompleteness. A deterministic engine built on this theory recovers a generating program, certified by compressing its data and predicting an unseen continuation, for 2,383 of 3,914 sequences across four independent populations, including 244 of the 256 elementary cellular automata, with measured discovery cost rising along program length more than an order of magnitude inside the score-oracle worst case.

2606.13912 2026-06-15 cond-mat.dis-nn cond-mat.str-el cs.LG physics.comp-ph quant-ph 交叉投稿

Direct/adaptive-mixture phase-gradient learning for neural-network quantum states with complex phase structure

具有复杂相位结构的神经网络量子态的直接/自适应混合相位梯度学习

Yi-Ran Xue, Rui Wang, Baigeng Wang, Chenan Wei

Affiliations: National Laboratory of Solid State Microstructures and Department of Physics; Department of Physics, University of Massachusetts; A. Alikhanyan National Science Laboratory; Collaborative Innovation Center of Advanced Microstructures, Nanjing University; Jiangsu Physical Science Research Center; Hefei National Laboratory

AI summary: To address the fragile optimization of neural-network quantum states with complex phase structure, the paper proposes a direct phase-gradient estimator and an adaptive mixture with the standard estimator, which substantially lowers variance and improves accuracy; the advantage is validated on a 100-site flux ladder and chiral XXX chains.

Comments 24 pages, 8 figures

AI中文摘要

神经网络量子态是量子多体物理中领先的变分工具，但当基态具有非平凡符号或复杂相位结构时（这在规范场、时间反演对称性破缺和费米子统计中是普遍存在的），其优化变得脆弱。我们将这种脆弱性归因于相位梯度的随机估计器，而非网络表达能力。蒙特卡洛能量梯度的相位部分是一个有噪声的得分函数估计器；相反，对局部能量进行微分得到一个直接估计器，该估计器对相同的相位力无偏，方差低得多，并且只需要分离的振幅-相位假设。在100位点通量梯子上演示，以这种方式训练的小型网络达到0.89%的中位误差，而调整后的标准基线停滞在1.8%，更宽或更深的标准梯度网络误差从8.4%退化到24.6%。该优势延续到手性XXX链：直接估计器再次收敛到比标准估计器明显更低的误差，跨越α和系统尺寸；该优势随通量增加而在零通量控制中消失。两个估计器的自适应混合在最优混合系数下方差绝不会比更好的端点差，通过种子分辨的诊断将大部分增益归因于消除失败运行。因此，估计器设计成为复值神经量子态的一类重要杠杆。

英文摘要

Neural-network quantum states (NQS) are a leading variational tool for quantum many-body physics, yet their optimization is fragile whenever the ground state carries a non-trivial sign or complex phase structure, a situation generic to gauge fields, broken time-reversal symmetry, and fermionic statistics. We trace this fragility to the stochastic estimator of the phase gradient rather than to network expressiveness. The phase sector of the Monte Carlo energy gradient is a noisy score-function estimator; differentiating the local energy instead yields a direct estimator that is unbiased for the same phase force, has far lower variance, and requires only a separated amplitude--phase ansatz. Demonstrated on a 100-site flux ladder, a small network trained this way reaches $0.89\%$ median error, where tuned standard baselines plateau at $1.8\%$ and wider or deeper standard-gradient networks degrade from $8.4\%$ to $24.6\%$. The advantage carries over to chiral XXX chains: the direct estimator again converges to a markedly lower error than the standard one, across $\alpha$ and size; it grows with flux and vanishes in zero-flux controls. An adaptive mixture of the two estimators is provably never worse in variance than the better endpoint at the optimal mixing coefficient, with seed-resolved diagnostics tracing much of the gain to eliminating failed runs. Estimator design thus emerges as a first-class lever for complex-valued neural quantum states.

2606.13984 2026-06-15 stat.ML cs.LG stat.ME 交叉投稿

非线性双时间尺度随机逼近：尖锐相变及其克服方法

Dhruv Sarkar, Vaneet Aggarwal

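Affiliations: Indian Institute of Technology Kharagpur; Mohamed bin Zayed University of Artificial Intelligence; Purdue University

AI summary: The paper finds a regularity-dependent phase-transition boundary in the mean-square error rate of the slow iterate in nonlinear two-time-scale stochastic approximation, and achieves an O(k^{-1}) rate for all regularity parameters by introducing an auxiliary online bias estimator whose output is subtracted from the slow update.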

AI中文摘要

近期关于非线性双时间尺度随机逼近的有限时间分析表明，在压缩性假设下，慢速迭代$Y_k$使用步长$\beta_k=\Theta(k^{-1})$和$\alpha_k=\Theta(k^{-a})$（$a\in(1/2,1)$）通常满足阶为$k^{-a}$的均方误差率；解耦的$k^{-1}$率需要强局部线性性。我们识别出一个尖锐的依赖于正则性的边界。在一个决定速率的规范形式中，慢速漂移包含一个局部线性泄漏和一个阶为$1+\rho$（$\rho\in[0,1]$）的非线性余项，未修正的递归满足\[ \mathbb{E}\|Y_k\|^2 \le C\bigl(k^{-1}+k^{-a(1+\rho)}\bigr), \]并且一个匹配的标量高斯下界表明，如果不修改更新，较慢的项是不可避免的。因此，当且仅当$a(1+\rho)\ge 1$时，未修正的递归保证解耦的$k^{-1}$率。这个下界仅针对朴素更新；它不是信息论障碍。我们通过为规范形式递归配备一个辅助在线偏差估计器\[ M_{k+1}=M_k+\gamma_k(R(X_k)-M_k),\qquad \beta_k\ll\gamma_k\ll\alpha_k, \]并从慢速更新中减去$M_k$来证明这一点。在相同的稳定性、矩和余项假设下，修正的递归对于每个$\rho\in[0,1]$实现$\mathbb{E}\|\widetilde Y_k\|^2=O(k^{-1})$，包括未修正更新被证明遭受较慢率的区域。最后，我们证明了局部传递定理，将相变机制推广到快速流形坐标中的一般非线性TTSA。证明是非渐近的，并依赖于两个阿贝尔变换抵消：一个用于局部线性快速误差泄漏，另一个用于跟踪的非线性偏差。

英文摘要

Recent finite-time analyses of nonlinear two-time-scale stochastic approximation show that under contractive assumptions the slow iterate $Y_k$ with stepsizes $\beta_k=\Theta(k^{-1})$ and $\alpha_k=\Theta(k^{-a})$, $a\in(1/2,1)$, generally satisfies a mean-square rate of order $k^{-a}$; decoupled $k^{-1}$ rates require strong local linearity. We identify a sharp regularity-dependent boundary. In a rate-determining normal form where the slow drift contains a locally linear leakage and a nonlinear remainder of order $1+\rho$ ($\rho\in[0,1]$), the uncorrected recursion satisfies \[ \mathbb{E}\|Y_k\|^2 \le C\bigl(k^{-1}+k^{-a(1+\rho)}\bigr), \] and a matching scalar Gaussian lower bound shows that the slower term is unavoidable without modifying the update. Thus the decoupled $k^{-1}$ rate is guaranteed for the uncorrected recursion exactly when $a(1+\rho)\ge 1$. This lower bound concerns only the naive update; it is not an information-theoretic obstruction. We demonstrate this by equipping the normal-form recursion with an auxiliary online bias estimator \[ M_{k+1}=M_k+\gamma_k(R(X_k)-M_k),\qquad \beta_k\ll\gamma_k\ll\alpha_k, \] and subtracting $M_k$ from the slow update. Under the same stability, moment, and remainder assumptions, the corrected recursion achieves $\mathbb{E}\|\widetilde Y_k\|^2=O(k^{-1})$ for every $\rho\in[0,1]$, including regimes where the uncorrected update provably suffers the slower rate. Finally, we prove localized transfer theorems that extend the phase-transition mechanism to general nonlinear TTSA in fast-manifold coordinates. The proofs are non-asymptotic and rely on two Abel-transform cancellations: one for the locally linear fast-error leakage, and one for the tracked nonlinear bias.
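
The phase boundary stated in the abstract can be read off from the bound itself, as the following Python sketch shows. It only manipulates the exponents of the quoted bound; the function names are hypothetical, and the power-law form assumed for the bias-estimator stepsize (gamma_k = k^{-g}) is an illustrative assumption, since the abstract only requires that gamma_k sit strictly between the two other timescales.

```python
def uncorrected_exponent(a: float, rho: float) -> float:
    """Decay exponent e of the bound k^{-1} + k^{-a(1+rho)} ~ k^{-e}.

    The slower of the two power laws dominates, so e = min(1, a * (1 + rho)).
    """
    return min(1.0, a * (1.0 + rho))


def critical_rho(a: float) -> float:
    """Smallest rho for which a * (1 + rho) >= 1, i.e. rho >= 1/a - 1."""
    return 1.0 / a - 1.0


def timescales_ordered(a: float, g: float) -> bool:
    """Check beta_k << gamma_k << alpha_k for beta_k = k^{-1},
    gamma_k = k^{-g} (assumed form) and alpha_k = k^{-a}: holds iff a < g < 1."""
    return a < g < 1.0


# With a = 0.6 and no extra regularity (rho = 0) the guaranteed rate is k^{-0.6};
# with rho = 1 the decoupled k^{-1} rate is recovered without any correction.
slow_case = uncorrected_exponent(0.6, 0.0)
fast_case = uncorrected_exponent(0.6, 1.0)
threshold = critical_rho(0.6)  # 2/3 for a = 0.6
```

Since a lies in (1/2, 1), the threshold 1/a - 1 always falls in (0, 1), so for every admissible a there is a range of rho in which the uncorrected recursion is limited to the slower rate and the bias correction described in the abstract makes a difference.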

2606.14560 2026-06-15 math.OC cs.LG stat.ML 交叉投稿

Free Heavy-Tailed Lunch for Muon: A Theoretical Justification of Empirical Success

Muon 的免费重尾午餐：实证成功的理论证明

Florian Hübler, Thomas Pethick, Suvrit Sra

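Affiliations: Department of Computer Science, ETH Zurich, Switzerland; Department of Mathematics, Technical University of Munich, Germany; Munich Center for Machine Learning (MCML)

AI summary: For heavy-tailed non-convex optimization, the paper proves that non-Euclidean methods such as Muon attain optimal sample complexity under nuclear-norm stationarity, avoiding the dimension dependence of Euclidean methods, and supports the result with large language model experiments.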

AI中文摘要

最近，具有矩阵值更新的非欧几里得优化方法（如 Muon 和 Scion）在训练 Transformer 模型方面显示出强大的实证性能，但其相对于欧几里得方法的理论优势仍知之甚少。我们在重尾非凸机制中解决了这一差距，其中随机梯度具有有界的 $p$ 阶中心矩，$p \in (1,2]$。我们表明，某些非欧几里得方法在更强的平稳性度量下实现了最优样本复杂度，而欧几里得方法则会产生额外的维度相关成本。因此，对于 $m \times n$ 矩阵，Muon 在核范数下找到一个 $\varepsilon$-平稳点所需的样本数为 $\mathcal{O}\left(\min\{m, n\} \frac{\Delta_1 L}{\varepsilon^2} \left(\frac \sigma \varepsilon \right)^{\frac p {p-1}}\right)$，吸收了重尾噪声而无需额外的维度依赖，这与欧几里得方法不同。我们进一步证明，对于所有一阶方法在核范数平稳性下，该样本复杂度（包括其维度依赖）是最优的。在大语言模型上的实验支持了我们的理论。令人惊讶的是，我们的结果表明，除了 Muon 的谱几何之外，其他 Schatten 几何在某些设置下也能具有竞争力。

英文摘要

Non-Euclidean optimisation methods with matrix-valued updates, such as Muon and Scion, have recently shown strong empirical performance for training Transformer models, yet their theoretical advantages over Euclidean methods remain poorly understood. We address this gap in the heavy-tailed non-convex regime, where stochastic gradients have bounded $p$-th central moments, $p \in (1,2]$. We show that certain non-Euclidean methods achieve optimal sample complexity under stronger stationarity measures, while Euclidean methods incur additional dimension-dependent costs. As a consequence, for $m \times n$ matrices, Muon finds an $\varepsilon$-stationary point in nuclear norm within $\mathcal{O}\left(\min\{m, n\} \frac{Δ_1 L}{\varepsilon^2} \left(\frac σ\varepsilon \right)^{\frac p {p-1}}\right)$ samples, absorbing heavy-tailed noise without extra dimension dependence, unlike Euclidean methods. We further prove this sample complexity, including its dimension dependence, is optimal for all first-order methods under nuclear-norm stationarity. Experiments on large language models support our theory. Surprisingly, our results suggest that other Schatten geometries beyond the spectral geometry of Muon can perform competitively in certain settings.

2410.00722 2026-06-15 cs.LG math.AG 版本更新

On the Geometry and Optimization of Polynomial Convolutional Networks

多项式卷积网络的几何与优化

Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

发表机构 * KTH Royal Institute of Technology（皇家理工学院）

AI总结研究使用单项式激活函数的卷积神经网络，证明其参数化映射是正则且几乎处处同构，通过代数几何方法计算神经流形的维数和度，并量化回归损失优化中临界点的数量。

Comments Accepted at AISTATS 2025. New version: corrected Section 4.2

2507.13263 2026-06-15 cs.LG cs.AI 版本更新

From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces

从排序算法到可扩展核：高维排列空间中的贝叶斯优化

Zikai Xie, Linjiang Chen

发表机构 * State Key Laboratory of Precision and Intelligent Chemistry（精准与智能化学国家重点实验室）

AI总结针对高维排列空间贝叶斯优化中表示可扩展性差的问题，提出基于排序算法的核函数框架，其中Mallows核是枚举排序的特例，而新提出的Merge核通过归并排序的分解结构实现Θ(n log n)复杂度且无信息损失，在低维性能相当，高维显著提升优化效果与计算效率。

Comments 9 pages, published at ICLR 2026

AI中文摘要

贝叶斯优化（BO）是黑箱优化的强大工具，但其在高维排列空间中的应用受到定义可扩展表示的严重限制。当前最先进的排列空间BO方法依赖于穷举的Ω(n^2)成对比较，导致密集表示，不适用于大规模排列。为了突破这一障碍，我们引入了一个新框架，通过从排序算法导出的核函数生成高效的排列表示。在该框架中，Mallows核可以被视为从枚举排序导出的特例。此外，我们引入了Merge核，它利用归并排序的分治结构生成紧凑的Θ(n log n)表示，实现了最低可能复杂度且无信息损失，并有效捕捉排列结构。我们的核心论点是，Merge核在低维设置中与Mallows核性能相当，但随着维度n增长，在优化性能和计算效率上显著优于后者。在各种排列优化基准上的广泛评估证实了我们的假设，表明Merge核为高维排列空间中的贝叶斯优化提供了可扩展且更有效的解决方案，从而释放了解决以前难以处理的问题（如大规模特征排序和组合神经架构搜索）的潜力。

英文摘要

Bayesian Optimization (BO) is a powerful tool for black-box optimization, but its application to high-dimensional permutation spaces is severely limited by the challenge of defining scalable representations. The current state-of-the-art BO approach for permutation spaces relies on an exhaustive $Ω(n^2)$ pairwise comparison, inducing a dense representation that is impractical for large-scale permutations. To break this barrier, we introduce a novel framework for generating efficient permutation representations via kernel functions derived from sorting algorithms. Within this framework, the Mallows kernel can be viewed as a special instance derived from enumeration sort. Further, we introduce the Merge Kernel, which leverages the divide-and-conquer structure of merge sort to produce a compact $Θ(n\log n)$ representation, achieving the lowest possible complexity with no information loss while effectively capturing permutation structure. Our central thesis is that the Merge Kernel performs competitively with the Mallows kernel in low-dimensional settings, but significantly outperforms it in both optimization performance and computational efficiency as the dimension $n$ grows. Extensive evaluations on various permutation optimization benchmarks confirm our hypothesis, demonstrating that the Merge Kernel provides a scalable and more effective solution for Bayesian optimization in high-dimensional permutation spaces, thereby unlocking the potential for tackling previously intractable problems such as large-scale feature ordering and combinatorial neural architecture search.
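
For orientation, the pairwise-comparison baseline mentioned in this abstract can be sketched in a few lines of Python. This is the standard Mallows kernel, $k(σ, σ') = \exp(-λ\, n_d(σ, σ'))$, where $n_d$ counts item pairs that the two permutations order differently. It is not the Merge Kernel, whose construction the abstract does not spell out. The function names and the default `lam` are illustrative assumptions.

```python
import math
from itertools import combinations


def kendall_tau_distance(p, q):
    """Count item pairs that permutations p and q order differently."""
    pos_p = {item: i for i, item in enumerate(p)}
    pos_q = {item: i for i, item in enumerate(q)}
    discordant = 0
    # Visits all n*(n-1)/2 pairs: the quadratic cost the abstract refers to.
    for a, b in combinations(p, 2):
        if (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0:
            discordant += 1
    return discordant


def mallows_kernel(p, q, lam=0.5):
    """Mallows kernel: exp(-lam * number of discordant pairs)."""
    return math.exp(-lam * kendall_tau_distance(p, q))
```

Every kernel evaluation touches all $n(n-1)/2$ pairs, which is the scaling bottleneck that a $Θ(n\log n)$ representation is meant to avoid.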

2511.07368 2026-06-15 cs.LG cs.AI 版本更新

关于可观测数据和私有数据的最优划分分类方法

Balázs Csanád Csáji, László Györfi, Ambrus Tamás, Harro Walk

发表机构 * HUN-REN Institute for Computer Science and Control (SZTAKI)（HUN-REN计算机科学与控制研究所（SZTAKI））； Department of Probability Theory and Statistics, Institute of Mathematics, Eötvös Loránd University (ELTE)（概率论与统计学系，厄特沃什·洛朗大学数学学院（ELTE））； Department of Computer Science and Information Theory, Budapest University of Technology and Economics (BME)（计算机科学与信息理论系，布达佩斯技术与经济大学（BME））； Institute for Stochastics and Applications, University of Stuttgart（概率论与应用研究所，斯图加特大学）

AI总结本文重新审视划分分类方法，在更宽松条件下（无需强密度假设）推导出可观测和私有数据下分类误差概率的收敛速率，该速率仅依赖于连续输入的内在维度。

AI中文摘要

在本文中，我们重新审视了划分分类的经典方法，并在宽松条件下证明了新的收敛速率，既适用于可观测（非私有化）数据，也适用于私有化数据。我们考虑在 $d$ 维欧几里得空间中的分类问题。先前关于划分分类器的结果依赖于强密度假设（SDA），我们通过简单示例表明该假设具有限制性。在此，我们在更温和的假设下研究该问题。我们预设输入分布是绝对连续分布和离散分布的混合，使得绝对连续分量集中在 $d_a$ 维子空间上。除了标准的 Lipschitz 和边际条件外，还引入了绝对连续分量的一个新特征，据此计算分类误差概率的收敛速率，包括二元和多类情况。该界可以达到使用 SDA 所能达到的极小极大最优收敛速率，但在更温和的分布假设下。有趣的是，该收敛速率仅依赖于连续输入的内在维度 $d_a$，而非 $d$。在隐私约束下，数据无法直接观测，构建的分类器是合适的局部差分隐私机制随机结果的函数。在本文中，我们将拉普拉斯分布噪声添加到特征向量所有可能位置的离散化及其标签中。再次，可以在不使用 SDA 的情况下推导出分类误差概率收敛速率的紧上界，使得该速率依赖于 $2d_a$。

英文摘要

In this paper we revisit the classical method of partitioning classification and prove novel convergence rates under relaxed conditions, both for observable (non-privatised) and for privatised data. We consider the problem of classification in a $d$ dimensional Euclidean space. Previous results on the partitioning classifier worked with the strong density assumption (SDA), which is restrictive, as we demonstrate through simple examples. Here, we study the problem under much milder assumptions. We presuppose that the distribution of the inputs is a mixture of an absolutely continuous and a discrete distribution, such that the absolutely continuous component is concentrated on a $d_a$ dimensional subspace. In addition to the standard Lipschitz and margin conditions, a novel characteristic of the absolutely continuous component is introduced, by which the convergence rate of the classification error probability is computed, both for the binary and for the multi-class cases. This bound can reach the minimax optimal convergence rate achievable using SDA, but under much milder distributional assumptions. Interestingly, this convergence rate depends only on the intrinsic dimension of the continuous inputs, $d_a$, and not on $d$. Under privacy constraints, the data cannot be directly observed, and the constructed classifiers are functions of the randomised outcome of a suitable local differential privacy mechanism. In this paper we add Laplace distributed noises to the discretisations of all possible locations of the feature vector and to its label. Again, tight upper bounds on the convergence rate of the classification error probability can be derived, without using SDA, such that this rate depends on $2d_a$.
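
As a reminder of what the classical (non-privatised) partitioning classifier does, here is a minimal Python sketch: the feature space is cut into cubic cells of side `h`, and each cell predicts the majority label of the training points that fall into it. The function names, the `default` label for empty cells, and the tie-breaking behaviour are illustrative assumptions. The privatised variant with Laplace noise described in the abstract is not reproduced here.

```python
import math
from collections import defaultdict


def fit_partition_classifier(X, y, h):
    """Majority vote inside cubic cells of side length h."""
    votes = defaultdict(lambda: defaultdict(int))
    for x, label in zip(X, y):
        cell = tuple(math.floor(v / h) for v in x)  # index of the cell containing x
        votes[cell][label] += 1
    # Keep only the winning label per cell (ties resolved by first-seen label).
    return {cell: max(counts, key=counts.get) for cell, counts in votes.items()}


def predict(model, x, h, default=0):
    """Return the majority label of x's cell, or `default` for an empty cell."""
    cell = tuple(math.floor(v / h) for v in x)
    return model.get(cell, default)
```

The cell width `h` plays the usual bias-variance role: it has to shrink as the sample grows, and the rates in the abstract describe how fast the error decays under a suitable choice.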

2405.03063 2026-06-15 math.ST cs.IT cs.LG math.IT stat.ME stat.ML stat.TH 版本更新

Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

广义去偏Lasso的稳定性及其在基于重抽样的变量选择中的应用

Jingbo Liu

发表机构 * Department of Statistics, University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校统计系）； Department of Electrical and Computer Engineering, the Grainger College of Engineering（Grainger工程学院电子与计算机工程系）

AI总结提出基于稳定性原理的广义去偏Lasso估计量，通过设计矩阵单列扰动下的简单更新公式，在比例增长机制下实现渐近精确近似，显著降低重抽样变量选择的计算成本。

Comments to appear in Bernoulli

2506.06542 2026-06-15 stat.ML cs.LG 版本更新

Direct Fisher Score Estimation for Likelihood Maximization

直接Fisher得分估计用于似然最大化

Sherman Khoo, Yakun Wang, Song Liu, Mark Beaumont

发表机构 * School of Mathematics, University of Bristol（布里斯托大学数学学院）； School of Biological Sciences, University of Bristol（布里斯托大学生物科学学院）

AI总结针对似然函数难解但模型模拟易得的问题，提出基于局部得分匹配的顺序梯度优化方法，直接建模Fisher得分，实现快速高效的似然最大化。

2601.11626 2026-06-15 math.NA cs.LG cs.NA 版本更新

Concatenated Matrix SVD: Compression Bounds, Incremental Approximation, and Error-Constrained Clustering

拼接矩阵SVD：压缩界限、增量近似与误差约束聚类

Maksym Shamrai

发表机构 * Institute of Mathematics of NAS of Ukraine（乌克兰国家科学院数学研究所）； MacPaw Research（MacPaw研究）

AI总结针对拼接后截断SVD压缩中哪些矩阵可安全合并的问题，提出基于谱界和增量SVD的聚类框架，实现显式误差约束下的压缩感知矩阵分组。

Comments Published in Transactions on Machine Learning Research (06/2026)

Journal ref Transactions on Machine Learning Research (2026)

AI中文摘要

现代机器学习、信号处理和科学计算中出现了大量矩阵集合，通常通过拼接后截断奇异值分解（SVD）进行压缩。这种策略实现了参数共享和高效重构，已被广泛应用于多视图学习、信号处理到神经网络压缩等领域。然而，它留下了一个基本问题未解答：在显式重构误差约束下，哪些矩阵可以安全地拼接并压缩在一起？现有方法依赖于启发式或特定于架构的分组，并且对所得的SVD近似误差没有提供原则性保证。在本工作中，我们引入了一个理论驱动的框架，用于在SVD压缩约束下进行矩阵的压缩感知聚类。我们的分析建立了水平拼接矩阵的新谱界，从奇异值增长的下界推导出最优秩-$r$ SVD重构误差的全局上界。第一个界遵循Weyl型块扩展下的单调性，而第二个界利用增量残差的奇异值提供更紧的逐块保证。我们进一步开发了一种基于增量截断SVD的高效近似估计器，无需形成完整的拼接矩阵即可跟踪主导奇异值。因此，我们提出了三种聚类算法，仅当预测的联合SVD压缩误差低于用户指定阈值时才合并矩阵。这些算法在速度、可证明准确性和可扩展性之间权衡，实现了具有显式误差控制的压缩感知聚类。

英文摘要

Large collections of matrices arise throughout modern machine learning, signal processing, and scientific computing, where they are commonly compressed by concatenation followed by truncated singular value decomposition (SVD). This strategy enables parameter sharing and efficient reconstruction and has been widely adopted across domains ranging from multi-view learning and signal processing to neural network compression. However, it leaves a fundamental question unanswered: which matrices can be safely concatenated and compressed together under explicit reconstruction error constraints? Existing approaches rely on heuristic or architecture-specific grouping and provide no principled guarantees on the resulting SVD approximation error. In the present work, we introduce a theory-driven framework for compression-aware clustering of matrices under SVD compression constraints. Our analysis establishes new spectral bounds for horizontally concatenated matrices, deriving global upper bounds on the optimal rank-$r$ SVD reconstruction error from lower bounds on singular value growth. The first bound follows from Weyl-type monotonicity under blockwise extensions, while the second leverages singular values of incremental residuals to yield tighter, per-block guarantees. We further develop an efficient approximate estimator based on incremental truncated SVD that tracks dominant singular values without forming the full concatenated matrix. Therefore, we propose three clustering algorithms that merge matrices only when their predicted joint SVD compression error remains below a user-specified threshold. The algorithms span a trade-off between speed, provable accuracy, and scalability, enabling compression-aware clustering with explicit error control.

2605.04954 2026-06-15 cs.NE cs.LG 版本更新

On the Influence of the Feature Computation Budget on Per-Instance Algorithm Selection for Black-Box Optimization

特征计算预算对黑箱优化中逐实例算法选择的影响

Koen van der Blom, Diederick Vermetten

发表机构 * Centrum Wiskunde & Informatica（荷兰阿姆斯特丹数学与信息学中心）； Sorbonne Université（索邦大学）； CNRS（国家科学研究中心）； LIP6（LIP6实验室）

AI总结研究黑箱优化中特征计算预算对逐实例算法选择性能的影响，发现即使花费25%预算计算特征，PIAS仍可行，且最优预算比例高度依赖场景。

AI中文摘要

逐实例算法选择（PIAS）利用一组算法之间的互补性，通过决定在给定实例上运行哪个算法来提升性能。该决策基于实例的特征，而在黑箱优化（BBO）的背景下，这些特征需要消耗一部分优化预算来计算。这引发了两个问题：(a) 在特征计算上花费多少比例的预算时，PIAS对BBO变得值得；(b) 哪个预算比例能优化特征准确性与PIAS性能之间的权衡。为此，我们进行了一项广泛的研究，将不同采样预算用于特征计算的PIAS与单一最佳算法在多种算法选择场景下进行比较。这些场景包括两种组合规模、三个问题集、四种维度以及十个目标预算。我们发现，在大多数测试场景中，PIAS是可行的，即使将总预算的四分之一用于特征计算。用于特征计算的预算比例以最大化PIAS收益的权衡高度依赖于具体的算法选择场景。此外，平均而言，PIAS相对于虚拟最佳求解器的损失中有20%可归因于特征计算预算，这凸显了适当考虑特征预算的重要性。

英文摘要

Per-instance algorithm selection (PIAS) takes advantage of complementarity between a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), require a part of the optimization budget to be computed. This raises two questions: (a) from which fraction of the budget spent on feature computation does PIAS become worth it for BBO, and (b) which fraction of the budget optimizes the tradeoff between feature accuracy and PIAS performance. To this end, we perform a broad study where PIAS with varying sampling budgets for feature computation is compared to the single best algorithm on a broad range of algorithm selection scenarios. These scenarios consist of two portfolio sizes, three problem sets, 4 dimensionalities, and 10 target budgets. We find that PIAS is viable for the majority of tested scenarios, even when as much as a quarter of the total budget is spent on feature computation. The tradeoff for the fraction of the budget spent on feature computation to maximize the benefit of PIAS is highly dependent on the specific AS scenario. Further, on average 20 percent of PIAS loss to the virtual best solver is explained by the budget spent on feature computation, highlighting the importance of properly accounting for the feature budget.

2606.13740 2026-06-15 cs.LG 新提交

Efficient On-Device Diffusion LLM Inference with Mobile NPU

基于移动NPU的高效设备端扩散大语言模型推理

Tuowei Wang, Yanfan Sun, Ju Ren

发表机构 * Tsinghua University（清华大学）； Beihang University（北京航空航天大学）

AI总结提出首个NPU感知推理框架Diffusion-LLM-on-NPU，通过多块推测解码、双路径渐进修正和交换优化内存运行时，在移动设备上加速扩散大语言模型推理，相比CPU基线实现17-42倍延迟降低。

AI中文摘要

扩散大语言模型（dLLM）通过并行去噪多个token来加速生成，使其适用于延迟敏感的移动端推理。然而，重复去噪在智能手机上引入了大量计算。移动神经处理单元（NPU）提供高吞吐量的密集矩阵计算，但高效利用它们仍然具有挑战性：token提交缩小了每块的有效工作负载，token修订使KV缓存重用复杂化，且NPU可见地址空间有限导致昂贵的重映射和数据传输开销。在本文中，我们提出了Diffusion-LLM-on-NPU，这是首个用于在智能手机上加速dLLM的NPU感知推理框架。Diffusion-LLM-on-NPU通过三种技术将块级dLLM推理与移动NPU的执行特性对齐。（1）多块推测解码用推测的未来块token填充当前块解码后期阶段缩小的负载。（2）双路径渐进修订使已提交的token在稳定前保持可修订，并通过CPU侧路径刷新不稳定token，而不会阻塞密集的NPU执行。（3）交换优化内存运行时压缩NPU可见地址布局，并将数据准备与NPU计算重叠，以减少重映射和传输开销。我们将Diffusion-LLM-on-NPU实现为端到端框架，并在多种硬件平台和dLLM工作负载上进行评估。Diffusion-LLM-on-NPU在保留生成质量的同时，将LLaDA-8B的生成延迟比使用前缀KV缓存重用的CPU基线降低了17倍至42倍。

英文摘要

Diffusion large language models (dLLMs) accelerate generation by denoising multiple tokens in parallel, making them attractive for latency-sensitive mobile inference. However, repeated denoising introduces substantial computation on smartphones. Mobile neural processing units (NPUs) offer high-throughput dense matrix computation, but efficiently exploiting them remains challenging: token commitment shrinks per-block effective workloads, token revision complicates KV cache reuse, and limited NPU-visible address space incurs costly remapping and data transfer overheads. In this paper, we propose llada.cpp, the first NPU-aware inference framework for accelerating dLLMs on smartphones. llada.cpp aligns block-wise dLLM inference with the execution characteristics of mobile NPUs through three techniques. (1) Multi-Block Speculative Decoding fills the shrinking workload in late-stage current-block decoding with speculative future-block tokens. (2) Dual-Path Progressive Revision keeps committed tokens revisable until stable and refreshes unstable tokens through a CPU-side path without stalling dense NPU execution. (3) Swap-Optimized Memory Runtime compacts NPU-visible address layouts and overlaps data staging with NPU computation to reduce remapping and transfer overheads. We implement llada.cpp as an end-to-end framework and evaluate it across diverse hardware platforms and dLLM workloads. llada.cpp reduces LLaDA-8B generation latency by 17x-42x over the CPU baseline with prefix KV cache reuse, while preserving generation quality.

2606.13767 2026-06-15 cs.LG cs.AI cs.IT math.IT 新提交

Beyond LoRA: Is Sparsity-Induced Adaptation Better?

超越LoRA：稀疏诱导的适应更好吗？

Elijah Cadenhead, Cristian McGee, Xin Li, El Houcine Bergou, Aritra Dutta

发表机构 * School of Data, Mathematical and Statistical Sciences, University of Central Florida, United States（中佛罗里达大学数据、数学与统计科学学院）； College of Computing, Mohammed VI Polytechnic University (UM6P), Morocco（穆罕默德六世理工大学计算机学院）； Department of Computer Science, University of Central Florida, United States（中佛罗里达大学计算机科学系）

AI总结本文提出Cheap LoRA (cLA)及其变体，通过在LoRA中引入稀疏性实现参数高效微调，理论推导泛化误差界，实验表明在多种任务上性能与参数匹配基线相当，同时减少训练时间和峰值GPU内存。

Comments Overview of the paper and code can be found here: https://elicaden.github.io/Beyond_LoRA/

AI中文摘要

低秩适应（LoRA）及其变体为预训练模型的全微调提供了一种内存和计算高效的替代方案。然而，关于这些方法的比较泛化能力以及低秩更新的结构限制如何保持有效适应性能的问题仍然存在。我们提出了一个历史框架，涵盖过去（全微调和原始LoRA）、现在（LoRA的不同变体），并通过在现有LoRA变体中引入稀疏性，提出了更简单、更便宜、参数高效的扩展：Cheap LoRA (cLA)，训练单个低秩因子而固定另一个（确定性地或在其随机变体中随机地），以及链式循环变体${c}^3$LA。我们将cLA视为非对称LoRA的结构化实例，作为全微调的控制列子空间限制。我们推导了这些变体的信息论泛化误差界，这是该领域的首批尝试之一。在实验上，我们评估了10个预训练模型和14个数据集上的11种微调方法，使用损失景观和谱分析等工具分析了微调模型的性能和泛化能力。尽管微调模型对预训练模型、数据集和其他因素敏感，但我们的研究表明，将基于LoRA的PEFT方法的适应限制在稀疏、结构化的列空间上，在参数匹配基线的任务上仍然具有竞争力，同时即使使用朴素、非优化的稀疏实现，也能减少高达10%的训练时间和高达15%的峰值GPU内存。我们的理论和实验泛化度量为其成本效益适应提供了比常用分析工具更一致和原则性的方法。概述和代码可在以下网址获取：https://elicaden.github.io/Beyond_LoRA/ 。

英文摘要

Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how the structural restrictions on low-rank updates preserve effective adaptation performance. We present a historical framing, covering the past (full fine-tuning and original LoRA), the present (different variants of LoRA), and propose simpler, cheaper, parameter-efficient extensions by inducing sparsity within existing LoRA variants: Cheap LoRA (cLA), training a single low-rank factor with the other fixed (deterministically or, in its randomized variant, stochastically), and the chained circulant variant, ${c}^3$LA. We frame cLA as a structured instance of asymmetric LoRA, serving as a controlled column-subspace restriction of full fine-tuning. We derive information-theoretic generalization error bounds for these variants, marking one of the first endeavors in this area. Empirically, we evaluate 11 fine-tuning methods across 10 pre-trained models and 14 datasets, analyzing the fine-tuned models' performance and generalization using tools such as loss landscapes and spectral analysis. Despite the sensitivity of fine-tuned models to the pre-trained model, datasets, and other factors, our study suggests that restricting LoRA-based PEFT methods' adaptation to a sparse, structured column space remains competitive across tasks with their parameter-matched baselines while reducing up to 10% training time and peak GPU memory up to 15%, even with a naïve, non-optimized, sparse implementation. Our theoretical and empirical generalization measures provide a more consistent and principled approach to their cost-effective adaptation than commonly used analytical tools. Overview and code are available at: https://elicaden.github.io/Beyond_LoRA/.
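
To make the parameter saving of "training a single low-rank factor with the other fixed" concrete, the following Python sketch counts trainable parameters for one linear layer. It assumes the common LoRA convention $W + BA$ with $B \in \mathbb{R}^{d_{out} \times r}$ and $A \in \mathbb{R}^{r \times d_{in}}$. The abstract does not say which factor cLA freezes, so both options are shown, and the helper name is hypothetical.

```python
def lora_trainable_params(d_out, d_in, rank, train_a=True, train_b=True):
    """Trainable parameters of a low-rank update B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    Setting train_a or train_b to False models a frozen factor.
    """
    count = 0
    if train_a:
        count += rank * d_in
    if train_b:
        count += d_out * rank
    return count


full = lora_trainable_params(4096, 4096, 8)                  # both factors trained
only_b = lora_trainable_params(4096, 4096, 8, train_a=False)  # A frozen
only_a = lora_trainable_params(4096, 4096, 8, train_b=False)  # B frozen
```

For a square layer, freezing either factor halves the trainable count; the reported time and memory savings depend on the authors' implementation and are not derived from this count alone.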

2606.13894 2026-06-15 cs.LG cs.AI cs.CL cs.CV 新提交

Gefen: Optimized Stochastic Optimizer

Gefen: 优化随机优化器

Nadav Benedek, Tomer Koren, Ohad Fried

发表机构 * Reichman University（赖希曼大学）； Tel Aviv University（特拉维夫大学）； Google Research（谷歌研究院）

AI总结提出Gefen优化器，通过共享二阶矩估计和量化一阶矩，将AdamW内存占用减少约8倍，同时保持相同性能，支持更大批量和吞吐量。

AI中文摘要

AdamW是现代深度学习的默认优化器，但其一阶和二阶矩状态会额外占用约两倍参数大小的训练内存。我们提出Gefen，一种内存高效的优化器，它自动在参数块之间共享二阶矩估计，并使用学习到的码本量化一阶矩，从而将AdamW的内存占用减少约8倍，同时保持相同性能，相当于每十亿参数减少6.5 GiB。该方法受理论结果启发，该结果表明大的混合Hessian项将平方梯度的比率约束为接近1，表明Hessian对齐的参数是共享二阶矩统计量的自然候选。由于大规模计算Hessian不切实际，Gefen从初始平方梯度推断块结构，除了AdamW默认超参数外，不需要任何架构特定的元数据或超参数。Gefen学习基于精确直方图的动态规划量化码本，并重用相同的块进行一阶矩缩放。在多种实验中，Gefen在比较的类似AdamW的方法中实现了最低的峰值优化器内存，同时保持AdamW级别的性能。在FSDP和DDP训练中，减少的内存占用支持更大的微批次，并显著提高相对于AdamW的吞吐量，提供了一种实用的即插即用替代方案，具有更低的内存使用，可以增加吞吐量并支持训练更大的模型或使用更大的批量大小。我们提供了完整的Python实现，包括融合CUDA内核，网址为 https://github.com/ndvbd/Gefen 。

英文摘要

AdamW is a default optimizer for modern deep learning, but its first and second moment states add roughly two parameter-sized buffers to training memory. We propose Gefen, a memory-efficient optimizer that automatically shares second-moment estimates across parameter blocks and quantizes the first moment using a learned codebook, thereby reducing AdamW's memory footprint by ~8x while maintaining the same performance, corresponding to a reduction of 6.5 GiB per billion parameters. The method is motivated by a theoretical result showing that large mixed Hessian entries constrain the ratio of squared gradients toward one, suggesting that Hessian-aligned parameters are natural candidates for sharing second-moment statistics. Since computing Hessians is impractical at scale, Gefen infers block structure from the initial squared gradients, requiring no architecture-specific metadata or hyperparameters beyond AdamW defaults. Gefen learns an exact histogram-based dynamic-programming quantization codebook and reuses the same blocks for first-moment scaling. Across diverse experiments, Gefen achieves the lowest peak optimizer memory among the compared AdamW-like methods while maintaining AdamW-level performance. In FSDP and DDP training, the reduced memory footprint enables larger microbatches and improves throughput significantly over AdamW, providing a practical drop-in replacement with lower memory usage that can increase throughput and enable training larger models or using larger batch sizes. We provide the complete Python implementation, including fused CUDA kernels at https://github.com/ndvbd/Gefen
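
The "~8x" and "6.5 GiB per billion parameters" figures can be cross-checked with a little arithmetic. The sketch assumes, which the abstract does not state, that the AdamW baseline keeps both moment buffers in 32-bit floats; the helper name is illustrative.

```python
GIB = 2 ** 30  # bytes per GiB


def adamw_state_gib(num_params, bytes_per_value=4, num_buffers=2):
    """Optimizer-state memory for AdamW's two moment buffers, in GiB."""
    return num_params * bytes_per_value * num_buffers / GIB


baseline = adamw_state_gib(1_000_000_000)  # about 7.45 GiB under the fp32 assumption
remaining = baseline - 6.5                 # state left after the reported saving
ratio = baseline / remaining               # implied reduction factor
```

Under that assumption the implied reduction factor comes out a little below 8, consistent with the abstract's approximate "~8x".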

2606.14150 2026-06-15 cs.LG cs.CL 新提交

Small LLMs: Pruning vs. Training from Scratch

小型LLM：剪枝 vs. 从头训练

Yufeng Xu, Taiming Lu, Kunjun Li, Jiachen Zhu, Mingjie Sun, Zhuang Liu

发表机构 * Princeton University（普林斯顿大学）； New York University（纽约大学）； Carnegie Mellon University（卡内基梅隆大学）

AI总结本文通过六种剪枝方法在Llama-3.1-8B上比较剪枝与从头训练，发现有限预算下剪枝更优，预算充足时粗粒度剪枝可被超越。

Comments Our code is available at https://github.com/zlab-princeton/llm-pruning-collection

AI中文摘要

剪枝有望成为获得强大小型语言模型的捷径。在本工作中，我们通过六种涵盖深度、宽度和稀疏粒度的剪枝方法，在两种受控的token匹配设置下，以0.5-0.8的剪枝率对Llama-3.1-8B进行剪枝，检验了这一承诺。(1) 在相同的训练token预算下，剪枝初始化始终优于随机初始化。这表明父模型提供了一个强起点，尽管随着训练token预算的增加和剪枝率的提高，优势逐渐缩小，在我们研究的最高剪枝率下几乎消失。(2) 当从头训练被给予整个流程消耗的全部token预算时，细粒度剪枝仍保持优势，而粗粒度结构化剪枝可能被匹配或超越。这表明父模型传递了额外训练token无法完全恢复的知识，但仅在细粒度下如此。综合来看，我们的结果给出了明确的建议：当手头有一个大型预训练模型且训练token预算有限时，剪枝优于从头训练；当训练预算不受限时，从头训练在粗粒度剪枝下可能具有竞争力，因此大型预训练父模型并非总是必要的。

英文摘要

Pruning promises a shortcut to strong small language models. In this work, we examine this promise by pruning Llama-3.1-8B at pruning ratios of 0.5--0.8 with six methods spanning depth, width, and sparse granularities, under two controlled token-matched settings. (1) With the same training token budget, pruned initialization consistently outperforms random initialization. This shows that the parent model provides a strong starting point, although the advantage narrows as the training token budget grows and as the pruning ratio rises, nearly vanishing at the highest pruning ratio we study. (2) When training from scratch is instead given the full token budget consumed by the whole pipeline, pruning at finer granularities still retains an advantage, while coarser structured pruning can be matched or surpassed. This suggests that the parent model transfers knowledge that additional training tokens alone cannot fully recover, but only at fine granularity. Taken together, our results yield a clear recommendation: with a large pretrained model in hand and a limited training token budget, pruning is better than training from scratch; when the training budget is not limited, training from scratch can be competitive for coarser pruning, so a large pretrained parent is not always necessary.

2606.14346 2026-06-15 cs.LG cs.AI 新提交

Squeeze-Release: Iterative Pruning with Exact Structural Minimization

挤压-释放：具有精确结构最小化的迭代剪枝

Roman Denkin, Ida Akerholm, Prashant Singh, Ida-Maria Sintorn

发表机构 * Uppsala University（乌普萨拉大学）

AI总结提出Squeeze-Release循环，通过精确结构重写将掩码网络转化为更小密集网络，并引入CompensatedLayerNorm扩展至残差流，实现高达39倍压缩。

AI中文摘要

非结构化剪枝产生稀疏权重张量，但标准实现保持张量形状不变，因此部署模型并不比剪枝前更小。我们提出一种精确的结构重写，称为最小化，它将掩码网络转换为一个更小的密集网络，其前向函数在浮点舍入误差内相同。挤压-释放循环迭代剪枝和最小化，中间有一个释放步骤，将压缩张量内的精确零位置重新启用为小的校准噪声，将原本浪费的容量转化为可训练参数。连续的循环利用该容量找到单次剪枝无法达到的结构冗余。我们还引入了CompensatedLayerNorm，这是一种保持功能的LayerNorm替代方案，将最小化扩展到具有LayerNorm的残差流上的通道缩减。挤压-释放将可部署网络压缩到比未剪枝模型小39倍（全连接模型网络）和14.8倍（现代CNN，ConvNeXt-Tiny），且精度相当。此外，我们证明该重写可以扩展到Transformer架构。

英文摘要

Unstructured pruning produces sparse weight tensors, but the standard implementation keeps tensor shapes unchanged so the deployed model is no smaller than before pruning. We present an exact structural rewrite, which we call minimization, that converts a masked network into a smaller dense network with the same forward function up to floating-point rounding. The Squeeze-Release cycle iterates pruning and minimization with an intermediate release step that re-enables the exact-zero positions inside the compacted tensors as small calibrated noise, turning otherwise wasted capacity back into trainable parameters. Successive cycles use that capacity to find structural redundancy a single pass cannot reach. We additionally introduce CompensatedLayerNorm, a function-preserving replacement for LayerNorm that extends minimization to channel reduction across LayerNorm-equipped residual streams. Squeeze-Release compresses the deployable network to 39x smaller than the unpruned model on a fully-connected network and 14.8x smaller on a modern CNN (ConvNeXt-Tiny), at comparable accuracy. In addition, we prove that the rewrite can be extended to transformer architectures.
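
The idea of an exact structural rewrite can be illustrated on a toy two-layer ReLU network. The Python sketch below is only an assumed special case, not the paper's algorithm: a hidden unit whose outgoing weights are all zero is deleted, and a hidden unit whose incoming weights are all zero emits a constant that is folded into the next layer's bias. All function names are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(W1, b1, W2, b2, x):
    """y = W2 @ relu(W1 @ x + b1) + b2."""
    h = [max(0.0, z + b) for z, b in zip(matvec(W1, x), b1)]
    return [z + b for z, b in zip(matvec(W2, h), b2)]


def minimize(W1, b1, W2, b2):
    """Drop hidden units that masking made redundant, keeping the function."""
    keep = []
    new_b2 = list(b2)
    for j in range(len(W1)):
        no_output = all(row[j] == 0.0 for row in W2)
        no_input = all(w == 0.0 for w in W1[j])
        if no_output:
            continue  # the unit never reaches the output
        if no_input:
            const = max(0.0, b1[j])  # the unit emits this value for every x
            new_b2 = [b + row[j] * const for b, row in zip(new_b2, W2)]
            continue
        keep.append(j)
    W1_small = [W1[j] for j in keep]
    b1_small = [b1[j] for j in keep]
    W2_small = [[row[j] for j in keep] for row in W2]
    return W1_small, b1_small, W2_small, new_b2
```

The returned tensors are genuinely smaller and dense, so the saving shows up in the deployed model, not only in a mask.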

2606.14598 2026-06-15 cs.LG 新提交

Realizing Native INT8 Compute for Diffusion Transformers on Consumer GPUs: A Fused INT8 GEMM Kernel for Ideogram 4.0

在消费级GPU上实现扩散Transformer的原生INT8计算：用于Ideogram 4.0的融合INT8 GEMM内核

Ali Asaria, Tony Salomone, Deep Gandhi

发表机构 * Transformer Lab

AI总结针对消费级Ampere GPU上INT8量化比FP8/NF4更慢的问题，提出融合Triton INT8 GEMM内核，直接利用INT8张量核心，在Ideogram 4.0中实现2.8-4.2倍加速，端到端速度提升约10%，使1024px单卡可行。

AI中文摘要

扩散Transformer的训练后INT8（W8A8）量化被广泛用作速度优化，但在消费级Ampere GPU上，它通常比它本应击败的FP8和NF4替代方案更慢。我们将此归因于一个软件伪影：生产中的“INT8”前向量化权重和激活，但立即将它们反量化回bf16并执行bf16矩阵乘法，从未使用GPU的INT8张量核心，因此硬件的计算优势完全未被利用。我们通过一个单一的融合Triton INT8 GEMM（在Ampere张量核心上执行int8xint8->int32，并在epilogue中融合每token乘每通道的反量化和偏置，针对每个GEMM形状自动调优）来弥补这一差距，将其插入Ideogram 4.0扩散Transformer的线性层中，替代反量化到bf16的路径。在该内核中，int8xint8->int32累加与torch._int_mm逐位精确，反量化输出与参考的余弦相似度为1.0且无NaN，每个GEMM的运行速度比bf16快2.8-4.2倍。端到端在768px分辨率下实现约1.1倍（约9-10%）的加速，在1024px分辨率下，单张RTX 3090上生成图像耗时156.5秒，快于单卡NF4（164.5秒）和FP8（172.9秒）基线，且在这些点估计（PickScore/CLIPScore）上无质量损失。因此，INT8从最慢的变体变为最快，1024px在单GPU上变得可行。主要速度标准（击败FP8，约9.5%）轻松满足；NF4的差距（约4.9%，单次运行n=4）在未量化的运行间方差内，最好理解为与达到扩展目标一致。最后我们给出一个诚实的部署图：该优势特定于消费级Ampere，在A100和B200上，相同内核会输给这些卡快速的本地bf16/FP8路径。

英文摘要

Post-training INT8 (W8A8) quantization of diffusion transformers is widely deployed as a speed optimization, yet on consumer Ampere GPUs it is frequently slower than the FP8 and NF4 alternatives it is meant to beat. We trace this to a software artifact: the production "INT8" forward quantizes weights and activations only to immediately dequantize them back to bf16 and run a bf16 matrix multiply, never engaging the GPU's INT8 tensor cores, so the hardware's compute advantage is left entirely unrealized. We close this gap with a single fused Triton INT8 GEMM (int8xint8->int32 on Ampere tensor cores, with per-token x per-channel dequantization and bias folded into the epilogue, autotuned per GEMM shape) dropped into the Ideogram 4.0 diffusion transformer's linear layers in place of the dequantize-to-bf16 path. In the kernel, the int8xint8->int32 accumulation is bit-exact against torch._int_mm and the dequantized output matches the reference at cosine similarity 1.0 with no NaNs, running 2.8-4.2x faster than bf16 per GEMM. End to end it delivers a ~1.1x (~9-10%) speedup at 768px, and at 1024px it generates an image in 156.5 s on a single RTX 3090, faster than the single-card NF4 (164.5 s) and FP8 (172.9 s) baselines, at no measurable quality cost on these point estimates (PickScore/CLIPScore). INT8 thus goes from the slowest variant to the fastest, and 1024px becomes single-GPU feasible. The primary speed criterion (beat FP8, by ~9.5%) is comfortably met; the NF4 margin (~4.9%, single-run n=4) is within run-to-run variance we did not quantify and is best read as consistent with meeting the stretch target. We close with an honest deployment map: the win is specific to consumer Ampere, and on A100 and B200 the same kernel loses to those cards' fast native bf16/FP8 paths.
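
The arithmetic that the fused kernel performs (integer accumulation followed by a per-token times per-channel rescale and bias add) can be written as a slow reference in plain Python. This sketch assumes symmetric quantization with a range of [-127, 127]. It is not the Triton kernel from the paper, and the function names are illustrative.

```python
def quantize_rows(M):
    """Symmetric int8 quantization with one scale per row."""
    q_rows, scales = [], []
    for row in M:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_linear(X, W, bias):
    """Reference W8A8 linear layer.

    X: tokens x in_features, W: out_features x in_features (floats).
    """
    Xq, sx = quantize_rows(X)  # one scale per token
    Wq, sw = quantize_rows(W)  # one scale per output channel
    out = []
    for i, x_row in enumerate(Xq):
        out_row = []
        for j, w_row in enumerate(Wq):
            acc = sum(a * b for a, b in zip(x_row, w_row))  # int8 x int8 -> integer sum
            out_row.append(acc * sx[i] * sw[j] + bias[j])   # dequantize + bias epilogue
        out.append(out_row)
    return out
```

The point made in the abstract is that the inner accumulation should stay in integers on the tensor cores, with the two scale factors applied once per output element, instead of dequantizing the operands to bf16 before the multiply.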

2606.14695 2026-06-15 cs.LG cs.CL 新提交

Persona-Pruner: Sculpting Lightweight Models for Role-Playing

Persona-Pruner: 为角色扮演雕琢轻量级模型

Jinsu Kim, Jihoon Tack, Noah Lee, Jongheon Jeong

AI总结提出Persona-Pruner框架，通过从单个描述中隔离特定角色的子网络来剪枝语言模型，在保持角色扮演性能的同时大幅降低计算成本，性能下降比最强基线减少93.8%。

Comments 25 pages; ICML 2026; Code is available at https://github.com/jsu-kim/Persona-Pruner

AI中文摘要

语言模型（LMs）作为角色扮演聊天机器人展现出显著潜力，在给定角色或用户画像规范时，能够提供一致且风格化的交互。然而，将这些能力应用于现实世界应用（例如，众多NPC同时交互的生态系统）时，由于过高的计算成本，暴露了关键的效率问题。在本文中，我们质疑将完整的通用模型专用于单一角色的必要性，假设特定角色身份仅依赖于模型总容量的一小部分。我们观察到，朴素地剪枝LM通常会严重降低特定角色的角色扮演性能；它无法区分冗余知识和基本角色特征。我们提出Persona-Pruner，一个通过从单个描述中隔离特定角色的子网络来雕琢轻量级角色扮演模型的框架。我们的实验一致表明，Persona-Pruner在保留角色扮演性能方面比现有最先进的LLM剪枝技术有效得多，在RoleBench上使用LLM-as-a-judge评分，将性能下降从密集模型减少至多93.8%（相比最强基线），同时仍保持通用LLM能力。代码可在以下网址获取：https://github.com/jsu-kim/Persona-Pruner 。

英文摘要

Language Models (LMs) have shown remarkable potential as role-playing chatbots, delivering consistent, stylized interactions when given a specification of a character or user persona. However, applying these capabilities to real-world applications (e.g., ecosystems with numerous NPCs interacting simultaneously) exposes a critical inefficiency due to the excessive computational cost. In this paper, we question the necessity of dedicating a full, generalist model to a single persona, hypothesizing that a specific character identity relies on only a fraction of the model's total capacity. We observe that naively pruning LMs often severely degrades the role-playing performance for a specific persona; it does not distinguish between redundant knowledge and essential character traits. We propose Persona-Pruner, a framework that sculpts a lightweight role-playing model by isolating persona-specific sub-networks from a single description. Our experiments consistently show that Persona-Pruner preserves role-playing performance substantially more effectively than existing state-of-the-art LLM pruning techniques, reducing the performance drop from the dense model by up to 93.8% over the strongest baseline on RoleBench in LLM-as-a-judge score, while still maintaining general LLM capabilities. Code is available at https://github.com/jsu-kim/Persona-Pruner.

2606.13694 2026-06-15 eess.SP cs.AI cs.LG 交叉投稿

Efficient Temporal Modeling for Mobile Sleep Staging via Lightweight Random Attention

基于轻量随机注意力的移动睡眠分期高效时序建模

Guisong Liu, Pengfei Wei, Jainsong Zhang, Martin Dresler

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出轻量随机注意力模块RA，通过固定随机投影实现相似性聚合，替代可学习序列建模，在移动睡眠分期中实现高效时序平滑，理论解释为随机注意力先验核，实验显示在准确率和F1上提升1-3%，性能媲美LSTM/GRU/Transformer。

Comments 7 pages, 1 figure, 5 tables

AI中文摘要

移动睡眠分期是家庭睡眠监测和闭环调节的基础设施。但现有的序列模型如RNN和Transformer在移动部署中计算成本高。本文提出随机注意力（RA），一种基于固定随机投影的轻量时序建模模块，用基于相似性的聚合替代可学习的序列建模。RA在历元编码器之外引入极少的额外参数，同时实现有效的时序平滑。我们进一步通过随机注意力先验核（RAPK）提供理论解释，将RA分解为全局平滑项和特征相似性项，为时序睡眠结构提供可解释的视角。在Sleep-EDF-20和Sleep-EDF-78上的实验表明，RA在准确率和F1分数上持续提升历元级基线1-3%，同时达到与LSTM、GRU和Transformer模型相竞争的性能。RA还展示了在不同骨干编码器上的强泛化能力，以及相对于传统时序平滑方法的改进鲁棒性。这些结果表明，通过轻量基于相似性的时序聚合可以实现高效的睡眠分期，使RA适用于实时可穿戴应用。

英文摘要

Mobile sleep staging serves as a foundational infrastructure for in-home sleep monitoring and closed-loop modulation. However, existing sequential models such as RNNs and Transformers are computationally expensive for mobile deployment. In this paper, we propose Random Attention (RA), a lightweight temporal modeling module based on fixed random projections, which replaces learnable sequence modeling with similarity-based aggregation. RA introduces few additional parameters beyond the epoch encoder while enabling effective temporal smoothing. We further provide a theoretical interpretation via the Random Attention Prior Kernel (RAPK), which decomposes RA into a global smoothing term and a feature similarity term, offering an interpretable view of temporal sleep structure. Experiments on Sleep-EDF-20 and Sleep-EDF-78 show that RA consistently improves epoch-wise baselines by 1-3% in accuracy and F1 score, while achieving competitive performance compared with LSTM, GRU, and Transformer models. RA also demonstrates strong generalization across different backbone encoders and improved robustness over conventional temporal smoothing methods. These results indicate that efficient sleep staging can be achieved through lightweight similarity-based temporal aggregation, making RA suitable for real-time wearable applications.
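
One plausible reading of "fixed random projections" with "similarity-based aggregation" is attention whose projection is sampled once and never trained. The Python sketch below follows that reading. The shared query/key projection, the softmax weighting, and all names and defaults are assumptions made for illustration, and the paper's actual RA module may differ.

```python
import math
import random


def random_attention(feats, proj_dim=8, seed=0):
    """Smooth a sequence of epoch features using a frozen random projection.

    feats: list of per-epoch feature vectors of equal length.
    Returns one smoothed vector per epoch (a softmax-weighted average).
    """
    rng = random.Random(seed)
    d = len(feats[0])
    # The projection is sampled once and never updated (no learnable weights).
    P = [[rng.gauss(0.0, 1.0) / math.sqrt(proj_dim) for _ in range(d)]
         for _ in range(proj_dim)]
    keys = [[sum(p * x for p, x in zip(row, f)) for row in P] for f in feats]
    out = []
    for q in keys:
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * f[j] for w, f in zip(weights, feats)) / total
                    for j in range(d)])
    return out
```

Because each output is a convex combination of the input epochs, the module can only smooth existing features, which matches the abstract's description of it as temporal smoothing on top of an epoch encoder.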

2606.13709 2026-06-15 stat.ML cs.LG 交叉投稿

LoMC: Localized Multidirectional Correction for Refusal Suppression in Routed Foundation Models

LoMC: 路由基础模型中拒绝抑制的局部多方向校正

Yan Hong, Kedong Xiu, Wei Li, Jun Lan, Huijia Zhu, Shuheng Zhou, Zhongcai Lyu, Weiqiang Wang, Jianfu Zhang

发表机构 * Ant Group（蚂蚁集团）； Zhejiang University（浙江大学）； Shanghai Jiao Tong University（上海交通大学）

AI总结提出LoMC方法，通过支持集门控（support-gated）干预框架，在路由MoE和混合MoE模型中以紧凑的干预范围实现拒绝抑制，提升非拒绝目标响应行为并保持通用能力。

AI中文摘要

我们研究了路由MoE和混合MoE基础模型中的受控后训练拒绝抑制，旨在增加非拒绝目标响应行为，同时在紧凑的干预范围下保持通用能力。现有的宽泛方向性编辑可能会扰动通用计算，而仅限支持集的专家编辑（support-only expert edits）通常缺乏足够的容量来纠正异质的拒绝表示。为了解决这一限制，我们引入了局部多方向校正（LoMC），一种支持集门控（support-gated）的干预框架，遵循“先确定支持集、后校正”的执行顺序：它首先识别紧凑的编辑支持集，然后将原型校正方向聚合成逐层校正方向，最后仅在选定的支持集内应用秩一逐层校正。通过将编辑支持集用作结构门控约束，LoMC在不扩大干预范围的情况下增加了校正容量。在四个路由骨干上的纯文本和多模态安全基准实验表明，LoMC在紧凑的干预范围下显著改善了非拒绝目标响应行为，同时保持了通用能力。

英文摘要

We study controlled post-training refusal suppression in routed MoE and hybrid-MoE foundation models, aiming to increase non-refusal target-response behavior while preserving general capability under a compact intervention footprint. Existing broad direction-based edits can perturb general-purpose computation, whereas support-only expert edits often lack sufficient capacity to correct heterogeneous refusal representations. To address this limitation, we introduce Localized Multidirectional Correction (LoMC), a support-gated intervention framework that follows a support-then-correction execution order: it first identifies a compact edit support, then aggregates prototype correction directions into layer-wise correction directions, and finally applies rank-one layer-wise correction only within the selected support. By using the edit support as a structural gating constraint, LoMC increases correction capacity without expanding the intervention scope. Experiments on text-only and multimodal safety benchmarks across four routed backbones show that LoMC substantially improves non-refusal target-response behavior while maintaining general capability under a compact intervention footprint.
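
The support-then-correction order can be illustrated with a small sketch. The abstract does not give the exact update rule, so the projection-style rank-one edit below (removing one direction from a weight matrix's output) is an assumed stand-in, and the names `aggregate_directions`, `rank_one_correct`, and `apply_on_support` are hypothetical.

```python
import math


def aggregate_directions(prototypes):
    """Average several prototype directions into one unit-norm layer direction."""
    dim = len(prototypes[0])
    mean = [sum(p[j] for p in prototypes) / len(prototypes) for j in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean]


def rank_one_correct(weight, direction, strength=1.0):
    """Return W - strength * d (d^T W): a rank-one edit along direction d.

    weight is a list of rows (output dim x input dim); direction has one
    entry per output dimension. This projection form is an assumption.
    """
    cols = len(weight[0])
    d_t_w = [sum(direction[i] * weight[i][j] for i in range(len(weight)))
             for j in range(cols)]
    return [[weight[i][j] - strength * direction[i] * d_t_w[j]
             for j in range(cols)]
            for i in range(len(weight))]


def apply_on_support(experts, support, direction):
    """Edit only the experts inside the support; leave all others untouched."""
    return {name: rank_one_correct(w, direction) if name in support else w
            for name, w in experts.items()}
```

In this sketch the support acts as a gate: adding more prototype directions changes what is corrected, but never which experts are touched.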

2606.13825 2026-06-15 math.OC cs.LG 交叉投稿

Scalable Deep Unfolding of Conic Optimizers

锥优化器的可扩展深度展开

Alex Oshin, Rahul Vodeb Ghosh, Evangelos A. Theodorou

发表机构 * Georgia Institute of Technology（佐治亚理工学院）

AI总结提出矩阵自由隐式微分和基于Dalečkii-Krein的PSD锥反向传播规则，解决深度展开应用于大规模半定规划时的内存和数值稳定性问题，实现轻量级超参数策略和热启动学习，在多种问题上取得高达50倍加速。

AI中文摘要

深度展开（DU）通过引入可学习组件并在展开迭代中进行训练来加速迭代优化器，但将DU扩展到机器人领域常见的大规模半定规划（SDP）仍然有限。展开全更新锥求解器（如COSMO）暴露了先前关于学习型锥求解器的工作未涉及的两个障碍：通过每次迭代的线性系统求解进行反向传播，当系数矩阵显式形成时，内存与问题规模成二次方关系；通过半正定（PSD）锥投影进行反向传播在特征值重合时变得数值不稳定。我们通过一种完全基于矩阵-向量乘积的矩阵自由隐式微分规则解决了第一个障碍，将内存从$O(n^2)$降低到$O(n)$，并使得在直接分解耗尽内存的规模下也能进行反向传播。我们通过基于Fréchet导数的Dalečkii-Krein表示的后向规则解决了第二个障碍，该规则在重复特征值下仍然定义良好。这些共同使得学习全更新锥求解器的轻量级超参数策略和热启动成为可能。我们在通过序列凸规划（SCP）求解的非线性协方差控制问题，以及从最大割和Lovász $\vartheta$ SDP到鲁棒估计和控制问题的独立SDP和二阶锥规划上进行了评估。学习到的策略在所有问题上都优于最先进的求解器，并且根据问题类别可提供高达50倍的加速。当作为SCP中的子程序使用时，与COSMO相比，学习的方法提供了超过30倍的加速。

英文摘要

Deep unfolding (DU) accelerates iterative optimizers by introducing learnable components and training them through unrolled iterations, but extending DU to the large-scale semidefinite programs (SDPs) common in robotics has remained limited. Unrolling a full-update conic solver such as COSMO exposes two obstacles that prior work on learned conic solvers has not: backpropagating through the per-iteration linear-system solve incurs memory quadratic in the problem size once the coefficient matrix is formed explicitly, and backpropagating through the positive semidefinite (PSD) cone projection becomes numerically unstable when eigenvalues coincide. We address the first obstacle with a matrix-free implicit differentiation rule that operates entirely through matrix-vector products, reducing memory from $O(n^2)$ to $O(n)$ and enabling backpropagation at scales where direct factorization runs out of memory. We address the second with a backward rule based on the Dalečkii--Krein representation of the Fréchet derivative, which remains well-defined under repeated eigenvalues. Together these make it possible to learn lightweight hyperparameter policies and warm-starts for a full-update conic solver. We evaluate on nonlinear covariance steering problems solved via sequential convex programming (SCP), as well as standalone SDPs and second-order cone programs ranging from max-cut and Lovász $\vartheta$ SDPs to robust estimation and control problems. The learned policies outperform state-of-the-art solvers across all problems, and can provide up to a 50$\times$ speedup depending on the class. When used as a subroutine in SCP, the learned approach delivers over a 30$\times$ speedup compared to COSMO.
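
To make the matrix-free idea concrete, the sketch below solves a linear system using only matrix-vector products, so the coefficient matrix is never stored. Conjugate gradient is used purely as a familiar example and requires a symmetric positive definite operator. The abstract does not say which iterative method or system the authors use, so `conjugate_gradient` and `tridiagonal_matvec` are assumptions for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b given only a function computing A @ v (A must be SPD)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def tridiagonal_matvec(v):
    """Apply A = tridiag(-1, 3, -1) to v without ever forming A."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(3.0 * v[i] - left - right)
    return out
```

Only a handful of length-n vectors are alive at any time, which is the $O(n)$ memory behaviour the abstract contrasts with the $O(n^2)$ cost of forming the matrix explicitly.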

2606.14010 2026-06-15 cs.CV cs.LG cs.RO 交叉投稿

RT-VLA: Real-Time Vision-Language-Action Models via Knowledge Distillation

RT-VLA：通过知识蒸馏实现实时视觉-语言-动作模型

Xiangyu Huang, Zhenlin Hua, Han Zhou, Shounak Sural, Ragunathan Rajkumar

发表机构 * Carnegie Mellon University（卡内基梅隆大学）

AI总结提出RT-VLA，通过多级监督蒸馏将SimLingo模型的能力压缩至轻量学生模型，在保持竞争性能的同时将推理时间降低44.8倍（纯视觉模式）和7.9倍（视觉+语言模式），实现实时可解释的VLA自动驾驶。

AI中文摘要

视觉-语言-动作（VLA）模型通过联合建模视觉感知、语言推理、可解释性和动作预测，在端到端自动驾驶中展现出强大潜力。然而，其庞大的视觉-语言骨干网络和推理模块引入了显著的推理延迟，从而阻碍了它们在道路网络严苛现实中的部署。我们提出RT-VLA，一种轻量级、蒸馏的VLA模型，通过多级监督蒸馏将最先进的SimLingo模型的驾驶和推理能力迁移到紧凑的学生模型中。RT-VLA保留了基于语言的推理，并通过离线语言分析安全关键驾驶时刻来支持事后解释，而不增加实时控制的延迟。与SimLingo教师模型相比，RT-VLA在保持竞争性的闭环驾驶和语言推理性能的同时，在纯视觉模式下将推理时间减少了44.8倍，在视觉+语言模式下减少了7.9倍。这些结果表明，监督蒸馏是构建实时、可解释的VLA风格自动驾驶模型的实用方法。

英文摘要

Vision-Language-Action (VLA) models have shown strong potential for end-to-end autonomous driving by jointly modeling visual perception, language reasoning, explainability and action prediction. However, their large vision-language backbones and reasoning modules introduce substantial inference latency and thereby prevent their deployment in the unforgiving reality of the road networks. We propose RT-VLA, a lightweight, distilled VLA model that transfers the driving and reasoning capabilities of the state-of-the-art SimLingo model into a compact student through multi-level supervised distillation. RT-VLA preserves language-based reasoning and supports post-hoc explanation through offline language analysis of safety-critical driving moments without adding latency to real-time control. Compared to the SimLingo teacher, RT-VLA maintains competitive closed-loop driving and language reasoning performance while reducing inference time by 44.8X in vision-only mode and 7.9X in vision+language mode. These results suggest that supervised distillation is a practical approach for building real-time, explainable VLA-style autonomous driving models.

2606.14684 2026-06-15 cs.CV cs.LG 交叉投稿

HumP-KD: A Hybrid Uncertainty-Aware Multi-Stage Progressive Knowledge Distillation Framework for Efficient Fire Classification

HumP-KD: 一种混合不确定性感知的多阶段渐进式知识蒸馏框架用于高效火灾分类

Mohammed Arif Mainuddin, Najifa Tabassum, Omar Ibne Shahid, Riasat Khan

AI总结提出HumP-KD框架，通过层次化渐进式知识蒸馏和多阶段蒸馏，将两个冻结的异构Transformer教师（Swin-Tiny和ViT-Base）及其集成知识蒸馏到轻量级MobileViT-S学生模型中，在火灾分类任务上显著提升性能，同时保持低参数量和实时推理速度。

AI中文摘要

实时火灾分类系统需要模型同时具备准确性、计算效率以及可在资源受限硬件上部署的能力。本文提出HumP-KD，一种混合不确定性感知的多阶段渐进式知识蒸馏框架，用于高效火灾分类。使用了两个数据集：FlameVision（8600张图像）和Dataset-II（31309张图像）。在标准预处理、在线增强、高斯噪声和运动模糊鲁棒性条件下，应用了多种CNN和Transformer基线模型。所提出的HumP-KD模型通过三个紧密集成的组件，将两个冻结的异构Transformer教师（Swin-Tiny和ViT-Base）及其Meta-MLP集成的知识蒸馏到轻量级MobileViT-S学生中。层次化渐进式知识蒸馏采用层次化特征构建器，生成融合的空间注意力掩码，以选择性地引导蒸馏到判别性区域。多阶段知识蒸馏在训练过程中逐步激活三个蒸馏阶段。在Dataset-II上，HumP-KD在10次独立试验中平均F1分数达到$0.9876 \pm 0.0063$，显著优于未使用蒸馏训练的MobileViT-S基线（$0.9537 \pm 0.0351$），独立t检验（$p = 0.0195$）和Wilcoxon符号秩检验（$W = 1$，$p = 0.0039$）均证实了统计显著性。所提出的方法还展示了跨数据集的强泛化能力和在退化视觉条件下的鲁棒性。学生模型仅保留4.94M参数和19.01Mb模型大小，相比Swin-Tiny参数减少$5.7\times$，相比ViT-Base减少$17.5\times$，同时达到37.72 CPU FPS，适合实时部署。

英文摘要

Real-time fire classification systems require models that are simultaneously accurate, computationally efficient, and deployable on resource-constrained hardware. This work proposes HumP-KD, a Hybrid Uncertainty-aware Multi-stage Progressive Knowledge Distillation framework for efficient fire classification. Two datasets, FlameVision and Dataset-II, containing 8,600 and 31,309 images, are used. Various CNN and transformer baselines are applied under standard preprocessing, online augmentation, Gaussian noise and motion blur robustness conditions. The proposed HumP-KD model distills knowledge from two frozen heterogeneous transformer teachers, Swin-Tiny and ViT-Base, along with their Meta-MLP ensemble, into a lightweight MobileViT-S student via three tightly integrated components. Hierarchical Progressive Knowledge Distillation employs a Hierarchical Feature Builder. It generates a fused spatial attention mask to guide distillation toward discriminative regions selectively. Multi-Stage Knowledge Distillation progressively activates three distillation stages across training. On Dataset-II, HumP-KD achieves a mean F1 score of $0.9876 \pm 0.0063$ across 10 independent trials, significantly outperforming the MobileViT-S baseline trained without distillation ($0.9537 \pm 0.0351$), with statistical significance confirmed by both independent t-test ($p = 0.0195$) and Wilcoxon signed-rank test ($W = 1$, $p = 0.0039$). The proposed method also demonstrates strong generalization across datasets and robustness under degraded visual conditions. The student model retains only 4.94M parameters and 19.01Mb model size, representing a $5.7\times$ parameter reduction over Swin-Tiny and a $17.5\times$ reduction over ViT-Base, while achieving 37.72 CPU FPS, making it suitable for real-time deployment.

2505.12992 2026-06-15 cs.LG cs.AI cs.CL stat.ML 版本更新

Fractured Chain-of-Thought Reasoning

断裂链式思维推理

Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong

发表机构 * University of Amsterdam（阿姆斯特丹大学）； eBay ； Microsoft（微软）； Google Research（谷歌研究）； Salesforce

AI总结提出断裂采样策略，通过截断推理链、调整轨迹数和解数，在推理时实现精度与成本的帕累托最优。

FP4量化LLM训练中均值偏差的诅咒与祝福

Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fang Dong, Anrui Chen, Ruijun Huang, Xin Zhang, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Qin Lv, Robert P. Dick, Yuan Cheng, Tun Lu, Fan Yang, Yixuan Chen, Li Shang

发表机构 * Fudan University（复旦大学）； University of Bath（巴斯大学）； Shanghai Innovation Institute（上海创智学院）； University of Oxford（牛津大学）； Oxford Suzhou Centre for Advanced Research（牛津大学高等研究院（苏州））； University of Colorado Boulder（科罗拉多大学博尔德分校）； University of Michigan（密歇根大学）； Shenzhen Loop Area Institute（深圳河套学院）

AI总结发现FP4训练失败源于激活异常值由秩一均值偏差主导，提出Averis均值残差分离量化法，在Qwen3模型上实现鲁棒W4A4G4训练，损失差距低于NVIDIA的Hadamard方法。

AI中文摘要

FP4训练有望为大型语言模型节省大量内存和计算，但由于分块量化受极端激活幅度支配，导致动态范围膨胀并压缩长尾信号，因此仍然脆弱。我们发现了这一失败的一个反直觉来源：主导激活异常值不仅仅是任意的稀疏事件，而主要是由一致的秩一均值偏差引起的，其方向与主导各向异性谱分量对齐。该均值分量在训练过程中增强，被注意力和FFN算子放大和重塑，并日益主导顶部激活幅度。至关重要的是，这一发现揭示了一个看似复杂的异常值抑制问题实际上有一个非常简单的解决方案：在量化之前隔离一致的均值。因此，我们提出了Averis，一种均值残差分割量化方法，该方法在FP4量化之前仅使用归约和逐元素减法来分离均值分量。在100B token上训练的Qwen3 0.6B密集模型和50B token上训练的Qwen3 7B A1.5B MoE模型上，Averis实现了鲁棒的W4A4G4 FP4训练，将BF16损失差距降低至1.19%/0.81%，而NVIDIA最近发布的基于Hadamard的异常值平滑方法为2.05%/1.10%，同时将下游差距限制在0.89/0.71点。Averis在vanilla NVFP4上的端到端开销仅为2.20%，约为NVIDIA基于Hadamard设计的30%，为稳定的低位LLM训练提供了一条硬件高效的路径。与Hadamard互补，Averis在结合使用时进一步将Qwen3-0.6B的损失和下游差距降低至0.94%和0.73点。代码可在以下网址获取：https://anonymous.4open.science/r/averis-504D 。

英文摘要

FP4 training promises substantial memory and compute savings for large language models, but remains fragile because blockwise quantization is dictated by extreme activation magnitudes, which inflate dynamic range and compress long-tail signals. We identify a counterintuitive source of this failure: dominant activation outliers are not merely arbitrary sparse events, but are largely induced by a coherent rank-one mean bias, whose direction aligns with the leading anisotropic spectral component. This mean component strengthens during training, is amplified and reshaped by attention and FFN operators, and increasingly dominates top activation magnitudes. Crucially, this discovery reveals that a seemingly complex outlier-suppression problem admits a truly simple solution: isolate the coherent mean before quantization. We therefore propose Averis, a mean-residual splitting quantization method that separates the mean component using only reductions and elementwise subtractions before FP4 quantization. Across Qwen3 0.6B Dense trained on 100B tokens and Qwen3 7B A1.5B MoE trained on 50B tokens, Averis enables robust W4A4G4 FP4 training, reducing BF16 loss gaps to 1.19%/0.81% versus 2.05%/1.10% for NVIDIA's recently released Hadamard-based outlier-smoothing method, while limiting downstream gaps to 0.89/0.71 points. With only 2.20% end-to-end overhead over vanilla NVFP4, about 30% of NVIDIA's Hadamard-based design, Averis provides a hardware-efficient path to stable low-bit LLM training. Complementary to Hadamard, Averis further reduces the Qwen3-0.6B loss and downstream gaps to 0.94% and 0.73 points when combined. Code is available at: https://anonymous.4open.science/r/averis-504D.
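
The mean-residual split can be illustrated with a toy example. This is a sketch under stated assumptions, not the Averis implementation: a symmetric uniform quantizer with a per-row scale stands in for blockwise FP4, the mean is kept in full precision, and every function name is hypothetical. The only steps taken from the abstract are the reduction (a mean) and the elementwise subtraction before quantization.

```python
def quantize_block(block, levels=7):
    """Symmetric uniform quantizer with one scale per block (stand-in for FP4)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return list(block)
    scale = amax / levels
    return [round(v / scale) * scale for v in block]


def quantize_rows(x, levels=7):
    """Quantize each row (one block per token) directly."""
    return [quantize_block(row, levels) for row in x]


def mean_residual_quantize(x, levels=7):
    """Subtract the per-channel mean, quantize the residual, add the mean back."""
    n, dim = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(dim)]          # reduction
    residual = [[row[j] - mean[j] for j in range(dim)] for row in x]   # subtraction
    q = quantize_rows(residual, levels)
    return [[q[i][j] + mean[j] for j in range(dim)] for i in range(n)]


def max_abs_error(a, b):
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))
```

When one channel carries a large shared offset across all tokens, that offset sets the block scale and the small values round to zero. Removing the shared mean first lets the scale track the residual instead.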

2603.15481 2026-06-15 cs.LG cs.AI 版本更新

TabKD: Tabular Knowledge Distillation through Interaction Diversity of Learned Feature Bins

TabKD: 通过学习特征箱的交互多样性实现表格知识蒸馏

Shovon Niverd Pereira, Krishna Khadka, Yu Lei

发表机构 * Department of Computer Science and Engineering, The University of Texas at Arlington（德克萨斯大学阿灵顿分校计算机科学与工程系）

AI总结提出TabKD方法，通过学习与教师决策边界对齐的自适应特征箱，生成最大化成对交互覆盖的合成查询，在表格数据知识蒸馏中显著提升学生-教师一致性。

Comments Accepted in 35th International Joint Conference on Artificial Intelligence IJCAI 2026

详情

AI中文摘要

无数据知识蒸馏可以在没有原始训练数据的情况下实现模型压缩，这对于隐私敏感的表格领域至关重要。然而，现有方法在表格数据上表现不佳，因为它们没有明确处理特征交互，而特征交互是表格模型编码预测知识的基本方式。我们识别出交互多样性，即特征组合的系统覆盖，是有效表格蒸馏的基本要求。为了实施这一见解，我们提出了TabKD，它学习与教师决策边界对齐的自适应特征箱，然后生成最大化成对交互覆盖的合成查询。在4个基准数据集和4种教师架构上，TabKD在16个配置中的14个中实现了最高的学生-教师一致性，优于5个最先进的基线。我们进一步表明，交互覆盖与蒸馏质量强相关，验证了我们的核心假设。我们的工作建立了以交互为中心的探索作为表格模型提取的原则性框架。

英文摘要

Data-free knowledge distillation enables model compression without original training data, which is critical for privacy-sensitive tabular domains. However, existing methods do not perform well on tabular data because they do not explicitly address feature interactions, the fundamental way tabular models encode predictive knowledge. We identify interaction diversity, the systematic coverage of feature combinations, as an essential requirement for effective tabular distillation. To operationalize this insight, we propose TabKD, which learns adaptive feature bins aligned with teacher decision boundaries, then generates synthetic queries that maximize pairwise interaction coverage. Across 4 benchmark datasets and 4 teacher architectures, TabKD achieves the highest student-teacher agreement in 14 out of 16 configurations, outperforming 5 state-of-the-art baselines. We further show that interaction coverage strongly correlates with distillation quality, validating our core hypothesis. Our work establishes interaction-focused exploration as a principled framework for tabular model extraction.
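
One way to read "pairwise interaction coverage" is as the fraction of (feature pair, bin pair) combinations that at least one synthetic query touches. The sketch below computes that fraction for fixed bin edges. In TabKD the bins are learned, so the fixed `edges_per_feature` input and the function names are assumptions for illustration, and the paper's exact metric may differ.

```python
from itertools import combinations


def bin_index(value, edges):
    """Index of the bin containing value, given sorted interior bin edges."""
    return sum(value >= e for e in edges)


def pairwise_coverage(queries, edges_per_feature):
    """Fraction of (feature pair, bin pair) cells hit by at least one query."""
    n_feat = len(edges_per_feature)
    covered = set()
    for q in queries:
        bins = [bin_index(v, e) for v, e in zip(q, edges_per_feature)]
        for i, j in combinations(range(n_feat), 2):
            covered.add((i, j, bins[i], bins[j]))
    # k interior edges define k + 1 bins for a feature.
    total = sum((len(edges_per_feature[i]) + 1) * (len(edges_per_feature[j]) + 1)
                for i, j in combinations(range(n_feat), 2))
    return len(covered) / total
```

A query generator that aims for diversity would favour queries landing in cells not yet in `covered`, rather than repeatedly sampling the same region of feature space.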

2604.21335 2026-06-15 cs.LG cs.CL 版本更新

Sub-Token Routing for KV Cache Compression

子令牌路由用于KV缓存压缩

Wei Jiang, Wei Wang

发表机构 * Futurewei Technologies（未来智科）

AI总结提出子令牌路由方法，在保留令牌内对值向量分组并选择性保留，与令牌级压缩互补，在LLM和VLM中提升压缩性能。

Comments 17 pages, 8 tables, 2 figures

AI中文摘要

Transformer推理通常需要大型KV缓存，尤其是在长上下文语言建模和多模态生成中。现有的压缩方法通常通过选择、驱逐、量化或压缩缓存令牌，或在语言模型推理前减少视觉令牌序列来降低缓存成本。我们引入子令牌路由，一种KV压缩方法，它在保留令牌内部添加了更精细的控制轴。它将每个保留的值向量分成组，并仅保留选定的组，同时保持查询和键状态不变。该方法设计在令牌级缩减之后工作。首先，令牌缩减方法确定保留哪些令牌。然后，子令牌路由压缩这些保留令牌内部的值状态。在匹配KV预算下的实验表明，添加子令牌路由提高了令牌级缩减在LLM和VLM设置中的性能，包括LLaMA-2-7B和Qwen2.5-7B上的Quest，以及LLaVA和Qwen-VL模型上的FastV/VisionZip。在较小的KV预算下增益更大，表明当进一步移除令牌成本高昂时，值组路由特别有用。总体而言，令牌级缩减和子令牌路由提供了互补的降低KV成本的方式。

英文摘要

Transformer inference often requires a large KV cache, especially for long-context language modeling and multimodal generation. Existing compression methods usually reduce cache cost by selecting, evicting, quantizing, or compressing cached tokens, or by reducing the visual-token sequence before language-model inference. We introduce sub-token routing, a KV-compression method that adds a finer control axis inside retained tokens. It splits each retained value vector into groups and keeps only selected groups, while leaving query and key states unchanged. The method is designed to work after token-level reduction. First, a token-reduction method determines which tokens are retained. Then, sub-token routing compresses the value states inside those retained tokens. Experiments under matched KV budgets show that adding sub-token routing improves token-level reduction performance in both LLM and VLM settings, including Quest on LLaMA-2-7B and Qwen2.5-7B, and FastV/VisionZip across LLaVA and Qwen-VL models. The gains are larger at smaller KV budgets, suggesting that value-group routing is especially useful when further token removal becomes costly. Overall, token-level reduction and sub-token routing provide complementary ways to reduce KV cost.
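
The value-group idea can be sketched in a few lines: split one retained token's value vector into equal groups and store only some of them. The abstract does not describe how groups are chosen, so selecting the highest-energy groups here is an assumption, and `route_value_groups` and `reconstruct` are hypothetical names.

```python
def route_value_groups(value, num_groups, keep):
    """Split a value vector into equal groups and keep the `keep` largest by energy.

    Assumes len(value) is divisible by num_groups.
    Returns (kept_group_indices, kept_groups); only these would be cached.
    """
    size = len(value) // num_groups
    groups = [value[g * size:(g + 1) * size] for g in range(num_groups)]
    energy = [sum(v * v for v in grp) for grp in groups]
    ranked = sorted(range(num_groups), key=lambda g: energy[g], reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [groups[g] for g in kept]


def reconstruct(kept, kept_groups, num_groups, group_size):
    """Rebuild a full-width value vector, zero-filling the dropped groups."""
    out = [0.0] * (num_groups * group_size)
    for g, grp in zip(kept, kept_groups):
        out[g * group_size:(g + 1) * group_size] = grp
    return out
```

Query and key states are untouched in this sketch, matching the abstract: attention weights are computed as usual, and only the value payload per retained token shrinks.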

2606.12280 2026-06-15 cs.LG 版本更新

NeST: Neuron Selective Tuning for LLM Safety

NeST: 面向LLM安全的神经元选择性调优

Sasha Behrouzi, Lichao Wu, Mohamadreza Rostami, Ahmad-Reza Sadeghi

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

AI总结提出NeST框架，通过激活探测识别安全相关前馈神经元并训练共享簇级更新，仅用普通恶意提示即可泛化防御多种越狱攻击，在14个模型上以极少参数实现接近全微调的鲁棒性。

AI中文摘要

安全对齐对于大型语言模型（LLM）的负责任部署至关重要。然而，现有方法通常依赖于重量级的微调，这在跨模型家族更新、审计和维护时成本高昂。全微调会产生大量的计算和存储开销，而参数高效方法（如低秩适应LoRA）则牺牲效率换取不一致的安全增益和对设计选择的敏感性。安全干预机制在不修改模型权重的情况下减少不安全输出，但无法直接塑造或保留控制安全行为的内部表示。我们提出NeST，一种用于高效事后安全对齐的神经元选择性调优框架。NeST通过对普通有害和无害提示进行激活探测来识别安全相关的前馈神经元，聚类具有相似激活模式的神经元，并训练共享的簇级更新，同时冻结模型的其余部分。重要的是，NeST仅使用普通恶意提示进行训练，不使用越狱特定的攻击数据，但能稳健地泛化到多种越狱攻击。学习到的更新随后被折叠到原始权重中，不产生推理时开销。在14个开源权重语言和多模态模型上的评估表明，NeST优于轻量级基线，并以显著更少的可训练参数接近全微调的鲁棒性。在纯文本模型上，NeST将平均越狱攻击成功率从44.5%降至1.1%，平均仅训练0.4M参数。在多模态设置中，它将ASR从55.3%降至1.1%，对于下游微调变体，通过将ASR从53.8%降至0.8%来恢复安全性。这些结果表明，通过将适应集中在局部、功能连贯的安全结构上，可以实现鲁棒、可维护的安全对齐。

英文摘要

Safety alignment is essential for the responsible deployment of Large Language Models (LLMs). Yet, existing approaches often rely on heavyweight fine-tuning that is costly to update, audit, and maintain across model families. Full fine-tuning incurs substantial computational and storage overhead, while parameter-efficient methods, e.g., Low-Rank Adaptation (LoRA), trade efficiency for inconsistent safety gains and sensitivity to design choices. Safety intervention mechanisms reduce unsafe outputs without modifying model weights, but do not directly shape or preserve the internal representations that govern safety behavior. We present NeST, a Neuron-Selective Tuning framework for efficient post-hoc safety alignment. NeST identifies safety-relevant feed-forward neurons via activation probing on vanilla harmful and benign prompts, clusters neurons with similar activation profiles, and trains shared cluster-level updates while freezing the rest of the model. Importantly, NeST is trained only on vanilla malicious prompts, without using jailbreak-specific attack data, yet generalizes robustly to diverse jailbreaks. The learned updates are then folded into the original weights, incurring no inference-time overhead. Evaluated on 14 open-weight language and multimodal models, NeST outperforms lightweight baselines and approaches full fine-tuning robustness with significantly fewer trainable parameters. On text-only models, NeST reduces average jailbreak attack success rate from 44.5% to 1.1% while training only 0.4M parameters on average. Across multimodal settings, it reduces ASR from 55.3% to 1.1%, and for downstream fine-tuned variants, it restores safety by reducing ASR from 53.8% to 0.8%. These results show that robust, maintainable safety alignment can be achieved by concentrating adaptation on localized, functionally coherent safety structures.
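
The probing step can be pictured as ranking feed-forward neurons by how differently they respond to harmful and benign prompts. The scoring rule below (absolute gap between mean activations) is an assumption chosen for simplicity. The abstract only says that neurons are identified via activation probing, so the paper's criterion may differ, and `select_safety_neurons` is a hypothetical name.

```python
def select_safety_neurons(harmful_acts, benign_acts, top_k):
    """Return indices of the top_k neurons with the largest mean-activation gap.

    harmful_acts / benign_acts: lists of activation vectors, one per prompt,
    all with the same number of neurons.
    """
    n_neurons = len(harmful_acts[0])

    def mean_activation(acts, j):
        return sum(a[j] for a in acts) / len(acts)

    gap = [abs(mean_activation(harmful_acts, j) - mean_activation(benign_acts, j))
           for j in range(n_neurons)]
    return sorted(range(n_neurons), key=lambda j: gap[j], reverse=True)[:top_k]
```

In the workflow the abstract describes, the selected neurons would then be clustered by activation profile and only cluster-level updates trained, with the rest of the model frozen.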

2604.23336 2026-06-15 cs.IR cs.CL cs.LG 版本更新

Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA

高效基于理由的检索：基于JEPA的生成式重排序器同策略（on-policy）蒸馏

Teng Chen, Sheng Xu, Feixiang Guo, Xiaoyu Wang, Qingqing Gu, Hongyan Li, Luo Ji

发表机构 * Geely AI Lab（吉利人工智能实验室）

AI总结本文提出Rabtriever，通过同策略（on-policy）蒸馏向生成式重排序器学习，对查询和文档独立编码，在提升检索效率的同时于多个任务中表现优异。

Comments 11 pages, 8 figures. ICMR 2026 (https://youtu.be/apDcrzEVwq4)

DOI: 10.1145/3805622.3810780

AI中文摘要

不同于传统基于事实的检索，基于理由的检索通常需要使用大语言模型对查询-文档对进行交叉编码，造成显著的计算成本。为解决这一限制，我们提出了Rabtriever，它独立编码查询和文档，同时提供与重排序器相当的跨查询-文档理解能力。我们首先训练一个基于LLM的生成式重排序器，该重排序器将文档置于查询之前，并提示LLM通过对数概率生成相关性分数。然后将其作为同策略（on-policy）蒸馏框架的教师，Rabtriever作为学生重建教师的上下文感知查询嵌入。为此，Rabtriever首先从教师初始化，并冻结参数。然后采用联合嵌入预测架构（JEPA）范式，在LLM层和头部之间集成一个轻量级、可训练的预测器，将查询嵌入投影到新的隐藏空间，文档嵌入作为潜在向量。JEPA随后最小化此投影嵌入与教师嵌入的分布差异。为了提高同策略蒸馏的采样效率，我们还添加了一个基于LLM logits反向KL散度的辅助损失，以重塑学生的logits分布。Rabtriever将教师关于文档长度的二次复杂度降低为线性，并经理论和实验证实。实验表明，Rabtriever在多种基于理由的任务（包括共情对话和机器人操作）中优于不同的检索器基线，且相对重排序器仅有微小的准确率下降。Rabtriever在MS MARCO和BEIR等传统检索基准上也表现良好，性能与最佳检索器基线相当。

英文摘要

Unlike traditional fact-based retrieval, rationale-based retrieval typically necessitates cross-encoding of query-document pairs using large language models, incurring substantial computational costs. To address this limitation, we propose Rabtriever, which independently encodes queries and documents, while providing cross query-document comprehension capabilities comparable to those of rerankers. We start by training an LLM-based generative reranker, which puts the document before the query and prompts the LLM to generate the relevance score by log probabilities. We then employ it as the teacher of an on-policy distillation framework, with Rabtriever as the student to reconstruct the teacher's context-aware query embedding. To achieve this effect, Rabtriever is first initialized from the teacher, with parameters frozen. The Joint-Embedding Predictive Architecture (JEPA) paradigm is then adopted, which integrates a lightweight, trainable predictor between LLM layers and heads, projecting the query embedding into a new hidden space, with the document embedding as the latent vector. JEPA then minimizes the distribution difference between this projected embedding and the teacher embedding. To strengthen the sampling efficiency of on-policy distillation, we also add an auxiliary loss on the reverse KL of LLM logits, to reshape the student's logit distribution. Rabtriever reduces the teacher's quadratic complexity in the document length to linear, verified both theoretically and empirically. Experiments show that Rabtriever outperforms different retriever baselines across diverse rationale-based tasks, including empathetic conversations and robotic manipulations, with minor accuracy degradation relative to the reranker. Rabtriever also generalizes well on traditional retrieval benchmarks such as MS MARCO and BEIR, with comparable performance to the best retriever baseline.
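
The auxiliary loss mentioned above is a reverse KL divergence, that is, KL(student || teacher) rather than the more common KL(teacher || student). A minimal sketch on raw logits follows. The function names are hypothetical, and how the paper weights or batches this term is not stated in the abstract.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    top = max(logits)
    lse = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - lse for z in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher): the expectation is taken under the student."""
    log_s = log_softmax(student_logits)
    log_t = log_softmax(teacher_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(log_s, log_t))
```

Because the expectation is taken under the student's own distribution, this direction penalises the student for placing mass where the teacher places little, which suits on-policy training where samples come from the student.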

URL PDF HTML ☆

赞 0 踩 0

2606.13748 2026-06-15 cs.LG 新提交

FedSPC: Shared Parameter Correction for Personalized Federated Learning

FedSPC：个性化联邦学习的共享参数校正

Kannanthodath Induchoodan Ajay Menon, Christian Prehofer, Yunfei Xu, Toru Hirano

发表机构 * DENSO AUTOMOTIVE Deutschland GmbH（电装汽车德国有限公司）； DENSO International America, Inc.（电装国际美国公司）； Technical University of Munich（慕尼黑工业大学）

AI总结针对个性化联邦学习中共享参数因客户端局部目标不一致而更新冲突的问题，提出模块化校正方法FedSPC，仅对共享参数应用控制变量校正，在多种PFL设置下提升性能。

Comments Accepted for presentation at FL@FM-IJCAI'26, in conjunction with IJCAI 2026. 9 pages

AI中文摘要

个性化联邦学习（PFL）是联邦学习中解决统计异质性的重要方法之一，同时支持客户端特定的适应。许多PFL方法将模型拆分为共享参数和个性化参数，并在每个客户端上联合训练。然而，这产生了一个优化问题：共享参数由优化不同局部目标的客户端更新，可能导致共享更新不一致并削弱共享表示。为解决此问题，我们提出联邦共享参数校正（FedSPC），一种用于PFL的模块化校正方法。FedSPC仅对给定PFL方法的共享参数应用控制变量校正，而保持个性化参数不变。它可以集成到三种常见的PFL设置中：共享特征提取器、共享分类器以及带有局部正则化的完全共享模型。在CIFAR-100和Tiny-ImageNet上使用ViT、ResNet-34和VGG-11的实验表明，FedSPC提高了代表性PFL方法（包括FedPer、FedRep、FedBABU、LG-FedAvg和Ditto）的性能。

英文摘要

Personalized federated learning (PFL) is one of the important approaches in federated learning for addressing statistical heterogeneity while enabling client-specific adaptation. Many PFL methods split the model into shared and personalized parameters, which are jointly trained on each client. However, this creates an optimization issue: shared parameters are updated by clients optimizing different local objectives, which can lead to inconsistent shared updates and weaken the shared representation. To address this problem, we propose Federated Shared Parameter Correction (FedSPC), a modular correction method for PFL. FedSPC applies control-variate correction only to the shared parameters of a given PFL method, while leaving personalized parameters unchanged. It can be integrated into three common PFL settings: shared feature extractors, shared classifiers, and fully shared models with local regularization. Experiments on CIFAR-100 and Tiny-ImageNet with ViT, ResNet-34, and VGG-11 show that FedSPC improves performance across representative PFL methods, including FedPer, FedRep, FedBABU, LG-FedAvg, and Ditto.
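
The core of the method, correcting only the shared parameters, can be sketched as one local update step. The correction form `g - c_local + c_global` follows the familiar SCAFFOLD-style control variate and is an assumption here. The abstract does not spell out FedSPC's exact rule, and `local_step` and its arguments are hypothetical names.

```python
def local_step(shared, personal, grad_shared, grad_personal,
               c_global, c_local, lr=0.1):
    """One client SGD step with control-variate correction on shared params only.

    shared / personal: parameter lists; grad_*: matching gradients.
    c_global / c_local: server and client control variates for shared params.
    """
    # Shared parameters: subtract the client's drift estimate, add the global one.
    new_shared = [w - lr * (g - cl + cg)
                  for w, g, cl, cg in zip(shared, grad_shared, c_local, c_global)]
    # Personalized parameters: plain SGD, no correction applied.
    new_personal = [w - lr * g for w, g in zip(personal, grad_personal)]
    return new_shared, new_personal
```

Keeping the personalized branch uncorrected is what makes the scheme modular: the same wrapper could sit around a shared feature extractor, a shared classifier, or a fully shared model, depending on which parameters are passed as `shared`.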

2606.13873 2026-06-15 cs.LG cs.CL 新提交

Natively Unlearnable Large Language Models

原生可遗忘的大语言模型

Gaurav R. Ghosal, Pratyush Maini, Aditi Raghunathan

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出NULLs模型，通过共享骨干和稀疏激活的sinks分离数据源贡献，实现无需梯度更新的高效遗忘，在维基百科上验证了单篇文章遗忘的有效性和鲁棒性。

AI中文摘要

遗忘旨在移除特定训练数据源的影响，但由于不同数据源的贡献在模型中纠缠，这已被证明具有挑战性。将源贡献隔离到不相交的参数中使得移除更容易，尽管这会阻碍跨源的联合学习。我们提出NULLs（原生可遗忘的大语言模型），这是一类模型，通过训练一组共享骨干神经元以及一个稀疏激活的sinks池，满足隔离源特定贡献和跨源联合学习这两个对立目标。在训练过程中，特定于源的信息自然集中在其sinks中，而跨源共享的信息积累在骨干中。在部署时，通过禁用相应的sinks来遗忘一个源，无需梯度更新也无需访问保留数据。我们展示了NULLs可扩展到维基百科约600万篇文章，将每篇文章隔离为独立源。遗忘单篇文章会移除其特定知识，同时保留与语义相关文章共享的事实，与从头重新训练紧密匹配。我们注意到，使用NULLs进行遗忘也具有鲁棒性：在遗忘《哈利·波特》书籍的案例研究中，NULLs抵抗了对抗性提取和逆转事后遗忘的重新学习。最后，NULLs保留了一般语言能力，在下游基准测试中与标准Transformer相匹配。这些结果共同表明，源级遗忘不必是事后考虑。它可以原生地构建到LLM训练中，同时保留共享表示学习的优势。

英文摘要

Unlearning aims to remove the influence of specific training data sources, but this has proved challenging because the contributions of different sources are entangled within the model. Isolating source contributions to disjoint parameters makes removal easier, though it obstructs joint learning across sources. We propose NULLs (Natively Unlearnable LLMs), a model class that satisfies the two opposing goals of isolating source-specific contributions and learning jointly across sources, by training a set of shared backbone neurons alongside a pool of sparsely activated sinks. During training, information specific to a source naturally concentrates in its sinks while information shared across sources accumulates in the backbone. A source is then unlearned at deployment by disabling its corresponding sinks, with no gradient updates and no access to the retained data. We show that NULLs scales to Wikipedia's ~6M articles, isolating each as an independent source. Unlearning a single article removes knowledge specific to it while preserving facts shared with semantically related articles, closely matching retraining from scratch. We note that unlearning with NULLs is also robust: in a case study of unlearning the Harry Potter books, NULLs resists both adversarial extraction and relearning that reverses post-hoc unlearning. Finally, NULLs preserves general language capabilities, matching a standard transformer on downstream benchmarks. Together, these results suggest that source-level unlearning need not be an afterthought. It can be built natively into LLM training while retaining the benefits of shared representation learning.
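
Unlearning by disabling sinks can be pictured with a toy linear layer: the output is the shared backbone's contribution plus the contributions of whichever sinks are routed for the input and still enabled. Everything below (the linear form, the `routed` list, the name `forward`) is an assumption for illustration. The abstract does not describe the architecture at this level of detail.

```python
def forward(x, backbone_w, sink_w, routed, disabled=frozenset()):
    """Toy layer output: shared backbone plus routed sinks that remain enabled.

    sink_w maps a sink id to its weight vector; routed lists the sinks
    activated for this input; disabled holds sinks of unlearned sources.
    """
    out = sum(w * v for w, v in zip(backbone_w, x))
    for sink_id in routed:
        if sink_id not in disabled:
            out += sum(w * v for w, v in zip(sink_w[sink_id], x))
    return out
```

Unlearning a source then amounts to adding its sink ids to `disabled`: no weights change and no retained data is consulted, while the backbone term, which holds what was shared across sources, is unaffected.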

2606.14078 2026-06-15 cs.LG cs.AI 新提交

Rethinking Backdoor Adversarial Unlearning through the Lens of Catastrophic Forgetting in Continual Learning

通过持续学习中的灾难性遗忘视角重新思考后门对抗性去学习

Zhenqian Zhu, Yamin Hu, Yujiang Liu, Luping Wei, Wenbo Hou, Bin Li, Haodong Li, Wenjian Luo

发表机构 * Harbin Institute of Technology, Shenzhen（哈尔滨工业大学（深圳））； Shenzhen Key Laboratory of Media Security, Shenzhen University（深圳大学媒体安全深圳市重点实验室）

AI总结本文将后门学习与去学习建模为持续学习视角下的三阶段过程，基于灾难性遗忘机制推导完全后门去学习的必要条件，并提出盲反演-后门对抗性去学习（BI-BAU）方法，通过期望最大化算法优化最大后验目标，有效消除后门效应。

Comments Accepted by ACM CCS 2026

详情

AI中文摘要

现有研究表明，当前的后门防御方法鲁棒性有限，且常无法应对特定类型的攻击。更令人担忧的是，主流的安全调优策略往往仅提供表面安全保护，因为它们未能完全消除后门效应。在本工作中，我们从持续学习视角将后门学习与去学习重新表述为一个顺序的三阶段过程。在此框架内，我们正式定义了完全后门去学习，并基于灾难性遗忘机制进一步推导了实现它的必要条件。在这些见解的指导下，我们提出了盲反演-后门对抗性去学习（BI-BAU），它将满足去学习条件的对抗样本生成问题表述为一个盲反演问题。我们通过将对抗训练的双层优化过程整合到期望最大化（EM）算法框架中来解决该问题，以优化最大后验（MAP）目标。此外，BI-BAU被扩展到目标类别未知的无目标对抗场景以及多模态对比学习任务中，增强了其在预训练模型可能被攻破的真实部署场景中的适用性。大量实验表明，我们的方法在广泛的后门攻击中具有通用适用性，并能有效且彻底地消除后门模型中的后门效应。

英文摘要

Existing studies reveal that current backdoor defenses exhibit limited robustness and often fail against specific types of attacks. More concerningly, prevailing safety tuning strategies tend to provide only superficial safety protection, as they fall short of completely eliminating the backdoor effects. In this work, we present a novel formulation of backdoor learning and unlearning as a sequential, three-stage process from a continual learning perspective. Within this framework, we formally define complete backdoor unlearning and further derive the necessary conditions for achieving it based on the mechanism of catastrophic forgetting. Guided by these insights, we propose Blind Inversion-Backdoor Adversarial Unlearning (BI-BAU), which formulates the generation of adversarial examples satisfying the unlearning conditions as a blind inversion problem. We solve this by integrating the bi-level optimization process of adversarial training into an Expectation-Maximization (EM) algorithm framework to optimize the maximum a posteriori (MAP) objective. Furthermore, BI-BAU is extended to untargeted adversarial scenarios with unknown target classes, as well as to multi-modal contrastive learning tasks, enhancing its applicability to real-world deployment scenarios where pre-trained models may be compromised. Extensive experiments demonstrate that our method exhibits general applicability across a wide spectrum of backdoor attacks and can effectively and thoroughly eliminate the backdoor effects from a backdoor model.

2606.14354 2026-06-15 cs.LG 新提交

MUFFLe: Efficient Model Update Compression via Generalized Deduplication for Federated Learning

MUFFLe: 通过广义去重实现联邦学习的高效模型更新压缩

Xiaobo Zhao, Daniel E. Lucani

发表机构 * Innovation Foundation Denmark（丹麦创新基金会）

AI总结提出MUFFLe方案，将广义去重（GD）集成到FedAvg中，通过去重更新向量中的重复模式实现固定速率、可变计数的压缩，在IID MNIST上以38 MB累积上行通信达到92.93%目标精度。

Comments Accepted at IEEE EDGE 2026 (Work-in-Progress track)

2606.14416 2026-06-15 cs.LG stat.ML 新提交

Federated Learning for Feature Generalization with Convex Constraints

基于凸约束的联邦学习特征泛化

Dongwon Kim, Donghee Kim, Sung Kuk Shyn, Kwangsu Kim

AI总结针对联邦学习中客户端数据异构导致的泛化问题，提出FedCONST方法，利用线性凸约束自适应调整更新幅度，平衡参数学习，并通过梯度信噪比分析验证其有效性，实现跨异构环境的强泛化。

Comments Accepted at the 42nd International Conference on Machine Learning (ICML 2025)

AI中文摘要

联邦学习（FL）常因客户端数据异构而难以泛化。局部模型容易过拟合其局部数据分布，甚至可迁移特征在聚合过程中也可能被扭曲。为应对这些挑战，我们提出FedCONST，一种基于全局模型参数强度自适应调整更新幅度的方法。这可以防止过度强调已学好的参数，同时加强未充分发展的参数。具体而言，FedCONST采用线性凸约束来确保训练稳定性，并在聚合过程中保留局部学到的泛化能力。梯度信噪比（GSNR）分析进一步验证了FedCONST在增强特征可迁移性和鲁棒性方面的有效性。因此，FedCONST有效对齐了局部和全局目标，减轻了过拟合，促进了跨不同FL环境的更强泛化，达到了最先进的性能。

英文摘要

Federated learning (FL) often struggles with generalization due to heterogeneous client data. Local models are prone to overfitting their local data distributions, and even transferable features can be distorted during aggregation. To address these challenges, we propose FedCONST, an approach that adaptively modulates update magnitudes based on the parameter strength of the global model. This prevents over-emphasizing well-learned parameters while reinforcing underdeveloped ones. Specifically, FedCONST employs linear convex constraints to ensure training stability and preserve locally learned generalization capabilities during aggregation. A Gradient Signal to Noise Ratio (GSNR) analysis further validates the effectiveness of FedCONST in enhancing feature transferability and robustness. As a result, FedCONST effectively aligns local and global objectives, mitigating overfitting and promoting stronger generalization across diverse FL environments, achieving state-of-the-art performance.

2606.14518 2026-06-15 cs.LG 新提交

Behavioral Audit of Machine Unlearning Has a Privacy Cost

机器遗忘的行为审计具有隐私代价

Liou Tang, James Joshi, Ashish Kundu

发表机构 * University of Pittsburgh（匹兹堡大学）； Cisco（思科）

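AI summary: Proves that, when the model owner and the auditor distrust each other, an audit scheme that relies only on behavioral queries to the model cannot identify insufficiently unlearned models without leaking membership information about the retained set, exposing an inherent tradeoff between privacy and auditing.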

AI中文摘要

通过机器遗忘从机器学习模型中移除已学习数据已被广泛研究；然而，目前尚未有公认的审计方案。现有工作表明，不诚实的模型所有者可以伪造证据来避免执行遗忘，而好奇的审计者（及对手）即使在有限访问权限下也能推断模型及其训练数据的隐私敏感属性。然而，在模型所有者和审计者互不信任的情况下对机器遗忘的审计仍未得到探索。我们为此场景提供了信息论证明：对于凸机器学习模型，仅依赖查询模型获取\textit{行为}信号的通用审计方案无法在不泄露保留集成员信息的情况下识别未充分遗忘的模型。因此，在不诚实的模型所有者和诚实但好奇的审计者假设下审计机器遗忘面临固有的隐私-审计权衡。我们在凸模型上的实证结果强烈支持这一结论，而进一步实验表明这种隐私-审计张力在非凸模型中依然存在。我们的结果呼吁在更现实的审计者威胁模型下更仔细地考虑隐私-审计张力，并为机器遗忘流程中隐私保护审计方案的设计提供更严格的审查基础。我们还在此 https URL 发布了代码实现。

英文摘要

The removal of learned data from Machine Learning models through Machine Unlearning (MU) has been widely studied; however, there has yet to be an agreed-upon scheme for auditing MU. Existing work has shown that a dishonest model owner can falsify evidence to avoid executing MU, while curious auditors (and adversaries) can infer the privacy-sensitive properties of the model and its training data even with limited access. Yet auditing of MU under mutual distrust between the model owner and the auditor remains unexplored. We provide an information-theoretic proof for this scenario: for convex ML models, a generic audit scheme that relies solely on querying the model for behavioral signals cannot identify insufficiently unlearned models without revealing membership information of the retained set. Therefore, auditing MU under the assumption of a dishonest model owner and an honest-but-curious auditor faces an inherent privacy-audit tradeoff. Our empirical results on convex models strongly support this result, while further experiments demonstrate that this privacy-audit tension persists in non-convex models. Our results call for a more careful consideration of the privacy-audit tension under a realistic auditor threat model, and serve as a foundation for more scrutiny of designs of privacy-preserving audit schemes for the MU pipeline. We also release our code implementation at https://github.com/LiouTang/Behavioral-Unlearn-Audit.

2601.14033 2026-06-15 cs.LG cs.CR 版本更新

Silent Failures in Federated Foundation Model Personalization

YongKyung Oh, Alex Bui

发表机构 * Medical & Imaging Informatics (MII) Group, University of California, Los Angeles (UCLA)（医学与影像信息学（MII）组，加州大学洛杉矶分校（UCLA））

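AI summary: Identifies silent failures, a class of trustworthiness failures in federated foundation-model personalization that privacy constraints make hard to detect, including amplified bias, fairness collapse, and alignment erosion; introduces a taxonomy of six silent failure modes and stresses that privacy-preserving training alone is not enough for trustworthy deployment.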

AI中文摘要

基础模型通过联邦学习在分散的私有数据上越来越个性化，并在日益增长的上市后监管要求下大规模部署。我们认为这种趋同产生了一类独特且未被充分认识的信任失败，我们称之为“静默失败”。这些包括偏差放大、公平性崩溃和对齐侵蚀，这些可能仍然难以检测，因为联邦学习的隐私约束限制了对模型行为的可见性。对现有基准的景观分析揭示了结构性鸿沟。联邦基准评估系统性能，但对模型行为的洞察有限，而集中式信任基准评估行为，但需要与联邦隐私不兼容的模型访问。我们引入了一个由基础模型个性化、数据集偏移和核心联邦约束相互作用产生的六种静默失败模式的分类法。我们的分析表明，仅靠隐私保护训练不足以实现可信部署。最后，我们提出了一个隐私保护行为评估的研究议程，并建议将静默失败作为可信联邦人工智能的标准诊断类别。

英文摘要

Foundation models are increasingly personalized on decentralized private data through federated learning and are now deployed at scale under growing regulatory requirements for post-market monitoring. We argue that this convergence creates a distinct and under-recognized class of trustworthiness failures, which we term "Silent Failures." These include amplified bias, fairness collapse, and alignment erosion that may remain difficult to detect because federated learning's privacy constraints limit visibility into model behavior. A landscape analysis of existing benchmarks reveals a structural divide. Federated benchmarks evaluate system performance but provide limited insight into model behavior, whereas centralized trustworthiness benchmarks assess behavior but require model access incompatible with federated privacy. We introduce a taxonomy of six silent failure modes arising from the interaction of foundation model personalization, dataset shift, and core federated constraints. Our analysis shows that privacy-preserving training alone is insufficient for trustworthy deployment. We conclude with a research agenda for privacy-preserving behavioral evaluation and propose that silent failures become a standard diagnostic category for trustworthy federated artificial intelligence.

2606.12733 2026-06-15 cs.LG 版本更新

Let's Ask Gauss: Improved One-Run Privacy Auditing

让我们问高斯：改进的单次运行隐私审计

Adya Agrawal, Yu Wei, Jaspal Singh, Malik Magdon-Ismail, Vassilis Zikas

发表机构 * Georgia Institute of Technology（佐治亚理工学院）； Rensselaer Polytechnic Institute（伦斯勒理工学院）； Purdue University（普渡大学）

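AI summary: Proposes a differential-privacy auditing framework based on a Gaussian asymptotic distribution, using the normalized sum of canary alignment signals in white-box DP-SGD to obtain tighter privacy lower bounds from a single training run.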

2602.02355 2026-06-15 cs.DC cs.IT cs.LG math.IT 版本更新

Conformal Calibration and the Look-Elsewhere Effect for Anomaly Detection in New-Physics Searches

Jack Y. Araz, Michael Spannowsky

Affiliations: Department of Physics and Astronomy, University College London; Department of Engineering, City St George's, University of London; Institute for Theoretical Physics, Campus Süd, Karlsruhe Institute of Technology (KIT); Institute for Quantum Materials and Technologies, Karlsruhe Institute of Technology

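AI summary: Proposes a calibration layer built on conformal prediction that turns any anomaly score into a significance with distribution-free, finite-sample guarantees while correcting for background mismodelling and the look-elsewhere effect.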

Comments 22 pages, 15 figures, 3 tables. Comments welcome

AI中文摘要

机器学习驱动的异常检测正在重塑新物理搜索，但其统计解释方法已落后。原始异常分数缺乏校准意义，扫描多个区域的模型会放大look-elsewhere效应，而领域依赖的渐近显著性对异常检测器特别容易遭受的背景误建模视而不见。我们提出一个基于共形预测的校准层，能将任意异常分数转化为具有分布无关、有限样本保证的可辩护显著性。共形预测将分数转化为有效的局部p值，加权和Mondrian变体修复了共振搜索中边带到信号区域的可交换性失败，而Gross-Vitells步骤将结果转化为考虑look-elsewhere的全局显著性。该层同时做两件事：它暴露了标准流程无法发现的校准错误，并在不重新训练检测器的情况下进行修正。在公开的LHC Olympics数据上，一个分类器产生了子结构-质量相关性，使得边带校准的背景p值变得反保守。表面上看，这仅由背景塑造就制造了约$46\sigma$的过剩，而无标签加权修正消除了这一过剩，恢复了诚实的零假设。当作为盲法宽质量凸起搜索运行时，标准渐近和未加权程序即使在无信号窗口也会制造$\gtrsim10\sigma$和约$5\sigma$的过剩，而共形层没有产生任何误报，其全局误报率在仅背景伪实验中得到验证。结果是一条可审计、与检测器无关的路径，从未校准分数到考虑试验因子的显著性，可集成到实验异常搜索中。

英文摘要

Machine-learned anomaly detection is reshaping searches for new physics, but it has outrun the statistics used to interpret it. A raw anomaly score has no calibrated meaning, a model that scans many regions inflates the look-elsewhere effect, and the asymptotic significances the field relies on are blind to the background mismodelling that anomaly detectors are especially prone to. We propose a calibration layer, built on conformal prediction, that turns any anomaly score into a defensible significance with distribution-free, finite-sample guarantees. Conformal prediction converts scores into valid local p-values, weighted and Mondrian variants repair the sideband-to-signal-region exchangeability failures that resonant searches suffer, and a Gross-Vitells step carries the result through to a look-elsewhere-aware global significance. The layer does two things at once. It exposes miscalibration that the standard pipeline cannot see, and it corrects it without retraining the detector. On public LHC Olympics data, a classifier develops a substructure-mass correlation that makes sideband-calibrated background p-values anti-conservative. Taken at face value, this manufactures a $\sim 46\sigma$ excess from background sculpting alone, which the label-free weighted correction removes, restoring an honest null. When run as a blind wide-mass bump hunt, the standard asymptotic and unweighted procedures fabricate $\gtrsim 10\sigma$ excesses and $\approx 5\sigma$ excesses even in signal-free windows, while the conformal layer raises no false alarms and its global false-positive rate is verified on background-only pseudoexperiments. The result is an auditable, detector-agnostic path from an uncalibrated score to a trials-factor-aware significance, ready to be folded into experimental anomaly searches.
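
The step in which conformal prediction converts scores into local p-values can be illustrated with the textbook unweighted split-conformal formula. This is a minimal sketch, not the paper's implementation: the function name, the toy scores, and the convention that a higher score means "more anomalous" are all assumptions, and the weighted and Mondrian variants and the Gross-Vitells step described in the abstract are not shown.

```python
def conformal_p_value(calibration_scores, test_score):
    """Split-conformal p-value for one test anomaly score.

    Assumption: higher scores mean "more anomalous", and the calibration
    scores come from background-only events exchangeable with the test event.
    """
    n = len(calibration_scores)
    # Count calibration scores at least as extreme as the test score.
    at_least_as_extreme = sum(1 for s in calibration_scores if s >= test_score)
    # The +1 terms count the test point itself, which is what gives
    # finite-sample validity under exchangeability.
    return (at_least_as_extreme + 1) / (n + 1)


# Hypothetical background-only calibration scores.
background = [0.10, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.60, 0.70]
p_typical = conformal_p_value(background, 0.35)  # score inside the bulk
p_extreme = conformal_p_value(background, 0.95)  # score above every calibration point
```

With n calibration points the smallest attainable p-value is 1/(n+1), so the size of the calibration set bounds the local significance that can be reported.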

2606.13870 2026-06-15 cs.CV cs.AI cs.LG 交叉投稿

Mirage Probes: How Vision Models Fake Visual Understanding

幻象探针：视觉模型如何伪造视觉理解

Daniel Ben-Levi, Judah Goldfeder, Weiliang Zhao, Raz Lapid, Amit LeVi, Allen G. Roush, Ravid Shwartz-Ziv, Hod Lipson

发表机构 * Columbia University（哥伦比亚大学）； Intuit ； Technion（以色列理工学院）； Thoughtworks ； New York University（纽约大学）

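AI summary: Proposes Mirage Probes, a contrastive probing framework that reveals two kinds of mirage behavior in which vision-language models answer questions without an image, textual biases and spurious images, and shows that the latter requires representation-level intervention.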

AI中文摘要

视觉语言模型（VLM）即使在没有提供图像的情况下，也能自信且通常正确地回答基于图像的问题。这种幻象行为会虚增基准分数，而不反映视觉基础。先前的工作将其视为单一故障模式。我们认为这是两种。使用幻象探针（Mirage Probes），一种对比探针框架，将释义的问题变体与同一图像上的匹配幻象和非幻象标签配对，我们展示了在两个开源VLM中，幻象行为可以从残差流、MLP、后注意力和注意力头位置的内部激活中线性解码。我们证明朴素贝叶斯文本基线无法恢复此信号，排除了表面词汇混淆。跨基准可分离性模式，连同一种新颖的先验利用指数（PHI），衡量模型仅从文本中回答的程度，揭示了两种不同的机制：文本偏见，其中模型从语言先验中回答而不涉及视觉表征；以及虚假图像，其中模型在潜在空间中构建虚假视觉内容并像有基础一样回答。这种区别有直接的缓解后果：文本分布清理可以解决第一种机制，但无法触及第二种，因为虚假图像幻象存在于模型的视觉表征中而非文本中。忠实的视觉基础将需要在表征层面进行干预。

英文摘要

Vision-language models (VLMs) can answer image-based questions confidently, and often correctly, even when no image is provided. This mirage behavior inflates benchmark scores without reflecting visual grounding. Prior work treats this as a single failure mode. We argue it is two. Using Mirage Probes, a contrastive probing framework that pairs paraphrased question variants with matched mirage and non-mirage labels on the same image, we show that mirage behavior is linearly decodable from internal activations across residual stream, MLP, post-attention, and attention-head sites in two open-source VLMs. We demonstrate that a Naive Bayes text baseline cannot recover this signal, ruling out surface lexical confounds. Cross-benchmark separability patterns, together with a novel Prior Harnessing Index (PHI) measuring how much a model can answer from text alone, expose two distinct regimes: textual biases, where the model answers from language priors without engaging visual representations, and spurious images, where it constructs false visual content in latent space and answers as if grounded. The distinction has direct mitigation consequences: text-distribution cleaning can address the first regime but cannot reach the second, since spurious-image mirages live in the model's visual representations rather than its text. Faithful visual grounding will require interventions at the representational level.

2606.14200 2026-06-15 cs.AI cs.LG 交叉投稿

When Should Agent Trust Be Conditional? Characterizing and Attacking Skill-Conditional Reputation in Agent Swarms

何时应条件化智能体信任？表征与攻击智能体群中的技能条件声誉

Yihan Xia, Taotao Wang

发表机构 * Shenzhen University（深圳大学）

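AI summary: Studies when skill-conditional trust pays off in heterogeneous LLM agent swarms; a phase-diagram analysis shows that it helps under high heterogeneity, sparse per-skill evidence, and correlated skills, but that borrowed cross-skill evidence can be exploited by an attacker, even on a pool that the zero-cost Conditional Information Value Test (CIVT) rates GREEN.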

Comments 18 pages, 8 figures, 2 tables

AI中文摘要

开放平台越来越多地将任务路由给异构的LLM智能体——它们在基础模型、框架和工具栈上有所不同——其能力因技能而异：一个智能体在某项技能上表现出色，在另一项技能上可能毫无用处。标准的声誉方法为每个智能体总结一个单一的全局信任分数，但这里的标量是错误的对象，因为将每个任务路由到全局最受信任的智能体会放弃专业化的价值。我们研究技能条件信任R(i | k)——对于需要技能k的任务，应赋予智能体i的信任，而不是每个智能体一个分数——并提出三个可证伪的问题：何时条件化是值得的，应借用多少跨技能证据，以及这种借用是否安全。受控的相图分析回答了前两个问题：条件信任仅在特定区域获胜——高智能体异质性、稀疏的每技能证据和相关的技能——而实现这种数据效率的耦合强度β是双刃剑，因为相同的跨技能借用也是一个洗钱渠道。在14个真正异构的AppWorld智能体的公共基准上，实际池落在有益区域内——一个微小但真实的增益，每技能最佳智能体在不同技能间确实发生变化。然后我们展示，一个在一种技能上有廉价证据而在目标技能上没有证据的攻击者劫持条件路由器，将路由遗憾从0驱动到0.94，而我们的零成本条件信息值测试（CIVT）将其评为绿色——而它污染的无门控信任判决读数为-0.06，而非诚实的+0.19。零证据门限限制了攻击但并未消除它；我们在明确预算下表征了剩余成本。我们不声称抗女巫攻击——我们量化了权衡。

英文摘要

Open platforms increasingly route tasks among heterogeneous LLM agents--differing in base model, scaffold, and tool stack--whose competence varies sharply by skill: an agent excellent at one skill may be useless at another. The standard reputation approach summarizes each agent by a single global trust score, but that scalar is the wrong object here, because routing every task to the globally most-trusted agent leaves the value of specialization unclaimed. We study skill-conditional trust R(i | k)--the trust to place in agent i for a task requiring skill k, rather than one score per agent--and pose three falsifiable questions: when is conditioning worth it, how much cross-skill evidence should be borrowed, and whether that borrowing is safe. A controlled phase-diagram analysis answers the first two: conditional trust wins only in a specific regime--high agent heterogeneity, sparse per-skill evidence, and correlated skills--and the coupling strength beta that buys this data efficiency is dual-use, because the same cross-skill borrowing is also a laundering channel. On a public benchmark of 14 genuinely heterogeneous AppWorld agents, real pools land inside the beneficial regime--a small but genuine gain, with the per-skill best agent genuinely changing across skills. We then show that an attacker with cheap evidence in one skill and none in a target skill hijacks the conditional router, driving routing regret from 0 to 0.94 on a pool our zero-cost Conditional Information Value Test (CIVT) rates GREEN--while the ungated trust verdict it contaminates reads -0.06 instead of the honest +0.19. A zero-evidence gate bounds the attack but does not eliminate it; we characterize the residual cost under an explicit budget. We do not claim Sybil-resistance--we quantify the trade-off.

2606.14466 2026-06-15 cs.SD cs.AI cs.LG 交叉投稿

The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions

音频模型中解释的感知脆弱性：在预测不变的情况下操纵归因

Piotr Kitłowski, Dominik Wiącek, Mateusz Modrzejewski

发表机构 * University of Warsaw（华沙大学）

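AI summary: Proposes a psychoacoustic framework that optimizes inaudible perturbations to decouple model attributions from classification, showing that in audio deepfake detection the explanation heatmaps can be systematically distorted while the predicted label stays unchanged.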

Comments Accepted to the ICML 2026 Workshop on Machine Learning for Audio: 5 pages, 4 figures

2606.14476 2026-06-15 cs.AI cs.LG 交叉投稿

When the Tool Decides: LLM Agents Defer Blindly to Graph Neural Network Tools, and Stronger Backbones Defer More

当工具决定时：LLM代理盲目服从图神经网络工具，更强的骨干网络服从更多

Zhongyuan Wang, Pratyusha Vemuri

发表机构 * raptorX.ai

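AI summary: Tests whether LLM agents given a GNN tool exercise judgment or defer blindly; the agents' predictions agree with the raw GNN output 97.6-99.2% of the time, stronger backbones defer more, and selective invocation appears limited by the available information.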

Comments 9 pages, 2 figures. Under review at TMLR

AI中文摘要

越来越多的研究为大型语言模型（LLM）代理配备图神经网络（GNN）作为可调用工具，假设代理能够判断何时以及多大程度上依赖该工具。我们直接测试了这一假设。我们将冻结的GNN作为显式工具暴露给ReAct风格的LLM代理，并在文本属性图（ogbn-arxiv，在WikiCS上重复）上的节点分类任务中，测量代理是使用工具还是仅仅服从它。我们发现代理并未进行判断：其预测与原始GNN的预测一致率达到97.6-99.2%（5个随机种子），沦为GNN鹦鹉，全盘采用工具的输出并绕过自身推理。通过扫描骨干网络能力（Qwen2.5 0.5B-7B），这种服从并非弱模型伪影：在能够调用工具的模型中，一致性随能力提升而上升（从1.5B的0.60到7B的0.98）。关键的是，服从的代价并未随能力增长而缩小，反而在替代方案出现时扩大：每个节点上可用动作的oracle比鹦鹉在3B时高出0.09-0.18，在7B时高出0.12-0.22，在高同质性下几乎翻倍，因为鹦鹉被冻结的GNN所束缚，而代理的替代方案在改进；在7B时，简单的邻居标签工具在高同质性下超越了GNN（0.81 vs 0.71），但代理仍然服从。一个简单的选择性调用门恢复了约一半的高同质性差距（0.71到0.83），但未带来全局净收益，而保留估计表明，在标准测试时特征上可达到的最佳门最多只能获得oracle余量的三分之一：可靠的选择性调用似乎受限于可用信息，而不仅仅是路由器设计。我们的结果是一个警示性测量：对代理+工具系统的评估不能假设代理在工具之上添加了判断，选择性调用必须被设计进去，而不是期望从规模中涌现。

英文摘要

A growing line of work equips large language model (LLM) agents with graph neural networks (GNNs) as callable tools, assuming the agent exercises judgment over when and how much to rely on such a tool. We test this directly. We expose a frozen GNN to a ReAct-style LLM agent as an explicit tool and measure, on node classification over a text-attributed graph (ogbn-arxiv, replicated on WikiCS), whether the agent uses the tool or merely obeys it. We find the agent does not exercise judgment: its predictions agree with the raw GNN's 97.6-99.2% of the time (5 seeds), collapsing into a GNN parrot that adopts the tool's output wholesale and bypasses its own reasoning. Sweeping backbone capability (Qwen2.5 0.5B-7B), the deference is not a weak-model artifact: among models able to invoke the tool, agreement rises with capability (0.60 to 0.98 from 1.5B to 7B). Crucially, the cost of deference does not shrink as capability grows and grows where alternatives emerge: a per-node oracle over the available actions beats the parrot by 0.09-0.18 at 3B and 0.12-0.22 at 7B, roughly doubling at high homophily, because the parrot is pinned to the frozen GNN while the agent's alternatives improve; at 7B a simple neighbour-label tool overtakes the GNN at high homophily (0.81 vs 0.71) yet the agent still defers. A simple selective-invocation gate recovers about half of that high-homophily gap (0.71 to 0.83) but yields no net global gain, and held-out estimates bound the best achievable gate over standard test-time features to at most a third of the oracle headroom: reliable selective invocation looks limited by available information, not merely router design. Our results are a cautionary measurement: evaluations of agent+tool systems cannot assume the agent adds judgment on top of the tool, and selective invocation must be designed in rather than expected to emerge from scale.
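
The quantities the abstract reports (agent-tool agreement, the accuracy of a "parrot" that copies the tool, and a per-node oracle over the available actions) can be sketched as below. The function names and the toy labels are hypothetical and the numbers are not the paper's; the sketch only shows how such metrics are typically computed.

```python
def agreement_rate(agent_preds, tool_preds):
    """Fraction of nodes where the agent's prediction equals the tool's."""
    matches = sum(1 for a, t in zip(agent_preds, tool_preds) if a == t)
    return matches / len(agent_preds)


def accuracy(preds, labels):
    """Fraction of nodes predicted correctly."""
    return sum(1 for p, y in zip(preds, labels) if p == y) / len(labels)


def oracle_accuracy(action_preds, labels):
    """Per-node oracle: a node counts as correct if any available action is correct.

    action_preds is a list of prediction lists, one per available action.
    """
    hits = sum(
        1 for i, y in enumerate(labels)
        if any(preds[i] == y for preds in action_preds)
    )
    return hits / len(labels)


# Toy data (assumed): five nodes, two tools, and an agent that copies the GNN.
labels = ["a", "b", "c", "a", "b"]
gnn_preds = ["a", "b", "a", "a", "c"]
neighbour_preds = ["a", "c", "c", "a", "b"]
agent_preds = list(gnn_preds)

agree = agreement_rate(agent_preds, gnn_preds)
parrot_acc = accuracy(agent_preds, labels)
oracle_acc = oracle_accuracy([gnn_preds, neighbour_preds], labels)
headroom = oracle_acc - parrot_acc  # what perfect per-node selection would add
```

In this toy case the agent agrees with the GNN on every node, so the whole gap between the oracle and the parrot is accuracy that only selective invocation could recover.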

2604.09737 2026-06-15 cs.LG cs.AI 版本更新

STaR-DRO: Stateful Tsallis Reweighting for Group-Robust Structured Prediction

STaR-DRO: 面向群体鲁棒结构化预测的状态化Tsallis重加权

Samah Fodeh, Ganesh Puthiaraju, Elyas Irankhah, Afshan Khan, Sreeraj Ramachandran, Linhai Ma, Srivani Talakokkul, Sarah Schellhorn

发表机构 * Yale University（耶鲁大学）； Yale School of Medicine（耶鲁医学院）

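AI summary: Proposes STaR-DRO, which combines Tsallis mirror ascent with a sparse entmax-style mapping to upweight only persistently hard groups, improving label accuracy and robustness in structured prediction; on EPPC Miner it raises average label F1 by 1.08 over SFT and by 2.20 over standard DRO.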

AI中文摘要

使用大型语言模型进行结构化预测需要输出在标签不平衡和异质群体难度下具有标签准确性、本体约束、结构有效性和证据基础。我们提出了一个统一框架用于本体约束生成。首先，我们引入了一个模块化的提示工程架构，结合了XML风格结构、专家消歧规则、思维链推理、元数据感知决策逻辑、模式契约和自我验证门。它针对反复出现的上下文失败，包括格式漂移、标签歧义、证据幻觉和元数据条件混淆。其次，我们提出了STaR-DRO，结合了Tsallis镜像上升、稀疏entmax风格原始映射、EMA平滑群体损失跟踪、重新缩放上升信号和有界超额乘数。与依赖密集香农熵指数梯度更新、可能引入高方差随机重加权、将正对抗质量分配给非持续困难群体、并通过单纯形竞争产生成本的常规DRO不同，STaR-DRO仅对持续困难群体上权重，而不抑制较容易的群体。我们在EPPC Miner上评估该框架，这是一个临床基础的高风险结构化预测任务，需要从患者-提供者安全消息中进行层次标签预测和证据跨度提取。在1B-70B Llama模型上，提示工程改进了零样本提取，平均标签F1增益为+14.46，跨度F1增益为+17.40。在监督微调的基础上，STaR-DRO进一步提高了准确性和鲁棒性，平均标签F1分别提高了+1.08和+2.20，同时相对于SFT和标准DRO，平均群体验证交叉熵分别降低了21.3%和14.8%。这些结果推进了以患者为中心的临床护理分析的可靠自动化通信挖掘。

英文摘要

Structured prediction with large language models requires outputs that are label-accurate, ontology-constrained, structurally valid, and evidence-grounded under label imbalance and heterogeneous group difficulty. We present a unified framework for ontology-constrained generation. First, we introduce a modular prompt-engineering architecture combining XML-style structure, expert disambiguation rules, chain-of-thought reasoning, metadata-aware decision logic, schema contracts, and a self-validation gate. It targets recurrent in-context failures, including format drift, label ambiguity, evidence hallucination, and metadata-conditioned confusion. Second, we propose STaR-DRO, combining Tsallis mirror ascent, sparse entmax-style primal mapback, EMA-smoothed group-loss tracking, rescaled ascent signals, and bounded excess-only multipliers. Unlike conventional DRO, which relies on dense Shannon-entropy exponentiated-gradient updates, can introduce high-variance stochastic reweighting, assigns positive adversarial mass to groups that are not persistently hard, and incurs costs through simplex competition, STaR-DRO upweights only persistently hard groups without suppressing easier ones. We evaluate the framework on EPPC Miner, a clinically grounded high-stakes structured-prediction task requiring hierarchical label prediction and evidence-span extraction from patient-provider secure messages. Across 1B-70B Llama models, prompt engineering improves zero-shot extraction, yielding an average label F1 gain of +14.46 and a Span F1 gain of +17.40. Building on supervised fine-tuning, STaR-DRO further improves accuracy and robustness, increasing average label F1 by +1.08 and +2.20 while reducing mean groupwise validation cross-entropy by 21.3% and 14.8% relative to SFT and standard DRO, respectively. These results advance reliable automated communication mining for patient-centered clinical care analysis.

2604.18419 2026-06-15 cs.LG cs.CL stat.ML 版本更新

Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning

知道何时退出：LLM推理中动态弃权的原则性框架

Hen Davidov, Nachshon Cohen, Oren Kalinsky, Yaron Fairstein, Guy Kushilevitz, Ram Yazdi, Patrick Rebeschini

Affiliations: Hebrew University of Jerusalem

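AI summary: Formalizes dynamic abstention within a regularized reinforcement-learning framework, deciding whether to terminate reasoning early by comparing the value function with an abstention reward; the method outperforms existing approaches on mathematical reasoning and toxicity avoidance tasks.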

AI中文摘要

利用思维链推理的大型语言模型常常因产生冗长且错误的响应而浪费大量计算资源。弃权可以通过抑制可能不正确的输出来缓解这一问题。虽然大多数弃权方法在生成之前或之后决定是否保留输出，但动态的生成中弃权考虑在每个token位置提前终止无前途的推理轨迹。先前的工作探索了这一想法的经验变体，但缺乏对弃权规则的原则性指导。我们提出了LLM动态弃权的形式化分析，将弃权建模为正则化强化学习框架中的一个显式动作。弃权奖励参数控制计算与信息之间的权衡。我们证明，在一般条件下，当价值函数低于该奖励时弃权严格优于自然基线。我们进一步推导了一种原则性且高效的方法来近似价值函数。在数学推理和毒性避免任务上的实证结果支持我们的理论，并展示了相比现有方法改进的选择性准确性。

英文摘要

LLMs utilizing chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or after generation, dynamic mid-generation abstention considers early termination of unpromising reasoning traces at each token position. Prior work has explored empirical variants of this idea, but principled guidance for the abstention rule remains lacking. We present a formal analysis of dynamic abstention for LLMs, modeling abstention as an explicit action within a regularized reinforcement learning framework. An abstention reward parameter controls the trade-off between compute and information. We show that abstaining when the value function falls below this reward strictly outperforms natural baselines under general conditions. We further derive a principled and efficient method to approximate the value function. Empirical results on mathematical reasoning and toxicity avoidance tasks support our theory and demonstrate improved selective accuracy over existing methods.
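
The abstention rule described above (stop as soon as the value function falls below the abstention reward) can be sketched as a simple per-token check. This is an illustration under stated assumptions: the value estimates are hypothetical numbers supplied directly, whereas the paper derives its own method for approximating the value function, which is not reproduced here.

```python
def generate_with_abstention(value_estimates, abstention_reward):
    """Walk one reasoning trace and abstain once the value drops too low.

    value_estimates[t] is an assumed estimate of the value function after
    t + 1 tokens. Returns (tokens_used, abstained).
    """
    for t, value in enumerate(value_estimates):
        if value < abstention_reward:
            # Continuing is expected to be worth less than abstaining now.
            return t + 1, True
    return len(value_estimates), False


promising = [0.6, 0.7, 0.8, 0.9]
unpromising = [0.6, 0.4, 0.2, 0.1, 0.05]

result_promising = generate_with_abstention(promising, abstention_reward=0.3)
result_unpromising = generate_with_abstention(unpromising, abstention_reward=0.3)
```

Raising `abstention_reward` makes the rule stop earlier and more often, which is the trade-off between compute and information that the abstract attributes to this parameter.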

2605.04847 2026-06-15 cs.LG cs.AI 版本更新

Quantile-Free Uncertainty Quantification in Graph Neural Networks

图神经网络中的无分位数不确定性量化

Soyoung Park, Hwanjun Song, Sungsu Lim

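AI summary: Proposes QpiGNN, which directly optimizes coverage and interval width through a quantile-free joint loss for efficient, robust uncertainty quantification in graph neural networks, with theoretical guarantees of asymptotic coverage and near-optimal width.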

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

AI中文摘要

不确定性量化（UQ）在图神经网络（GNN）中对于高风险领域至关重要，但仍是一个重大挑战。在图设置中，消息传递通常依赖于强假设（如可交换性），这些假设在实践中很少满足，并且实现可靠的UQ通常需要昂贵的重采样或事后校准。为了解决这些问题，我们引入了无分位数预测区间GNN（QpiGNN），这是一个基于分位数回归（QR）的框架，通过直接优化覆盖率和区间宽度来实现基于GNN的UQ，无需分位数输入或后处理。QpiGNN采用双头架构，将预测和不确定性解耦，并通过无分位数联合损失使用仅标签监督进行训练。这种设计允许高效训练，并产生鲁棒的预测区间，在温和假设下具有渐近覆盖率和近最优宽度的理论保证。在19个合成和真实世界基准上的实验表明，QpiGNN比基线平均覆盖率高22%，区间窄50%，同时确保了对噪声和结构变化的效率和鲁棒性。

英文摘要

Uncertainty quantification (UQ) in graph neural networks (GNNs) is crucial in high-stakes domains but remains a significant challenge. In graph settings, message passing often relies on strong assumptions such as exchangeability, which are rarely satisfied in practice, and achieving reliable UQ typically requires costly resampling or post-hoc calibration. To address these issues, we introduce Quantile-free Prediction Interval GNN (QpiGNN), a framework that builds on quantile regression (QR) to enable GNN-based UQ by directly optimizing coverage and interval width without requiring quantile inputs or post-processing. QpiGNN employs a dual-head architecture that decouples prediction and uncertainty, and is trained with label-only supervision through a quantile-free joint loss. This design allows efficient training and yields robust prediction intervals, with theoretical guarantees of asymptotic coverage and near-optimal width under mild assumptions. Experiments on 19 synthetic and real-world benchmarks show that QpiGNN achieves, on average, 22% higher coverage and 50% narrower intervals than baselines, while ensuring efficiency and robustness to noise and structural shifts.
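
The two quantities the abstract optimizes and reports, coverage and interval width, can be computed as follows. This is a minimal sketch of the evaluation metrics only; the function name and toy intervals are assumptions, and the quantile-free joint loss itself is not reproduced because the abstract does not give its form.

```python
def coverage_and_width(lower, upper, targets):
    """Empirical coverage and mean width of a set of prediction intervals.

    Coverage is the fraction of targets inside [lower, upper];
    width is the average of upper - lower.
    """
    n = len(targets)
    covered = sum(1 for lo, hi, y in zip(lower, upper, targets) if lo <= y <= hi)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, mean_width


# Toy intervals for four nodes (assumed values).
lower = [0.0, 1.0, 2.0, 3.0]
upper = [1.0, 2.0, 3.0, 4.0]
targets = [0.5, 2.5, 2.5, 3.5]

coverage, width = coverage_and_width(lower, upper, targets)
```

The two metrics pull against each other: widening every interval raises coverage trivially, so a method is only informative if it reaches the target coverage with narrow intervals.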

2402.16388 2026-06-15 stat.ML cs.LG 版本更新

Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors

留一法、自助法和交叉共形异常检测器

Oliver Hennhöfer, Christine Preisach

发表机构 * German Federal Ministry for Economic Affairs and Climate Action（德国经济事务和气候行动部）

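AI summary: To address scarce calibration data in anomaly detection, proposes leave-one-out, bootstrap, and cross-conformal methods built on conformal prediction that improve data efficiency while controlling the Type I error rate.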

Comments Published in 2024 IEEE International Conference on Knowledge Graph (ICKG)

Journal ref Proc. 2024 IEEE ICKG 15(1): 110-119 (February 2025)

DOI: 10.1109/ICKG63256.2024.00022

AI中文摘要

异常检测系统中不确定性量化的需求日益重要。在此背景下，有效控制这些系统的第一类错误率而不增加第二类错误率，可以建立信任并减少与错误发现相关的成本。共形异常检测领域通过模型校准提供统计和有限样本有效性保证，成为一种有前景的方法。然而，对校准数据的依赖带来了实际限制，尤其是在低数据场景中。在本工作中，我们基于共形预测领域的方法，正式定义并评估了用于共形异常检测的留一法、自助法和交叉共形方法。超越经典的拆分共形方法，我们展示了用于计算重抽样共形$p$值的派生方法在全共形（直推式）方法的数据效率与拆分共形（归纳式）方法的计算效率之间提供了实用的折衷。我们验证了派生方法，并量化了它们在一类分类器和数据集上的改进。

英文摘要

The need for uncertainty quantification in anomaly detection systems has become increasingly important. In this context, effectively controlling Type I error rates without inflating Type II error rates in these systems can build trust and reduce costs associated with false discoveries. The field of conformal anomaly detection emerges as a promising approach for providing respective statistical and finite-sample validity guarantees through model calibration. However, reliance on calibration data imposes practical limitations, especially in low-data regimes. In this work, we formally define and evaluate leave-one-out-, bootstrap-, and cross-conformal methods for conformal anomaly detection, building on methods from the field of conformal prediction. Looking beyond the classical split-conformal approach, we show that derived methods for calculating resampling-conformal $p$-values offer a practical compromise between the data efficiency of full-conformal (transductive) approaches and the computational efficiency of split-conformal (inductive) methods. We validate derived methods and quantify their improvements for a range of one-class classifiers and datasets.
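
The idea of reusing all data for both training and calibration can be illustrated with one common cross-conformal formulation, in which calibration counts are pooled across folds. This is a sketch under assumptions and not necessarily the exact definition used in the paper: the detector is a toy distance-from-mean scorer on 1-D data, and the names `fit_mean_detector` and `cross_conformal_p_value` are hypothetical.

```python
def fit_mean_detector(train):
    """Toy one-class detector: the score is the distance from the training mean."""
    mean = sum(train) / len(train)
    return lambda x: abs(x - mean)


def cross_conformal_p_value(data, test_point, n_folds):
    """Cross-conformal p-value with calibration counts pooled over folds.

    Each fold is held out once: a detector is fitted on the other folds,
    then the held-out points and the test point are scored with it.
    """
    n = len(data)
    folds = [data[k::n_folds] for k in range(n_folds)]
    at_least_as_extreme = 0
    for k, held_out in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        score = fit_mean_detector(train)
        test_score = score(test_point)
        at_least_as_extreme += sum(1 for x in held_out if score(x) >= test_score)
    return (at_least_as_extreme + 1) / (n + 1)


# Assumed "normal" observations clustered around 10.
normal_data = [9.0, 10.0, 11.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9]
p_inlier = cross_conformal_p_value(normal_data, 10.0, n_folds=3)
p_outlier = cross_conformal_p_value(normal_data, 20.0, n_folds=3)
```

Every observation serves as a calibration point exactly once, so all nine points contribute to the p-value, whereas a split-conformal scheme would calibrate on only the held-out portion.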

2406.09250 2026-06-15 cs.CV cs.AI cs.LG 版本更新

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

MirrorCheck: 视觉-语言模型的高效对抗防御

Samar Fares, Klea Ziu, Toluwani Aremu, Nikita Durasov, Martin Takáč, Pascal Fua, Ivan Laptev, Karthik Nandakumar

Affiliations: Mohamed Bin Zayed University of Artificial Intelligence; NVIDIA; École Polytechnique Fédérale de Lausanne; Michigan State University

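AI summary: Proposes MirrorCheck, a framework that uses text-to-image models and a randomized strategy to detect and defend against adaptive adversarial attacks on vision-language models.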

AI中文摘要

视觉-语言模型（VLM）越来越容易受到复杂的对抗性攻击，包括专门设计用于绕过现有防御的自适应策略。为了解决这一漏洞，我们提出了MirrorCheck，一个鲁棒且与模型无关的检测框架，在单模态和多模态设置中均能有效运行。MirrorCheck利用文本到图像（T2I）模型从目标模型生成的标题中重建视觉内容，并通过比较原始图像和合成图像之间的特征空间嵌入来评估语义一致性。为了增强对自适应攻击的鲁棒性，MirrorCheck引入了一种随机防御策略，从多样化的模型库中随机选择T2I生成器和图像编码器。此外，我们采用了一种新颖的一次性（OTU）扰动，应用于所选编码器嵌入，并通过缩放因子调节，这降低了自适应攻击的有效性。跨多种威胁场景的大量实验表明，MirrorCheck始终优于基线方法，即使在强自适应对抗条件下也能保持其实用性。

英文摘要

Vision-Language Models (VLMs) are increasingly susceptible to sophisticated adversarial attacks, including adaptive strategies specifically designed to bypass existing defenses. To address this vulnerability, we propose MirrorCheck, a robust and model-agnostic detection framework that operates effectively in both unimodal and multimodal settings. MirrorCheck leverages Text-to-Image (T2I) models to regenerate visual content from captions produced by the target model and assesses semantic consistency by comparing feature-space embeddings between the original and synthesized images. To enhance robustness against adaptive attacks, MirrorCheck introduces a stochastic defense strategy that randomly selects T2I generators and image encoders from a diverse model zoo. Additionally, we incorporate a novel One-Time-Use (OTU) perturbation applied to the selected encoder embeddings, regulated by a scaling factor, which decreases the effectiveness of adaptive attacks. Extensive experiments across multiple threat scenarios demonstrate that MirrorCheck consistently outperforms baseline methods, and maintains its utility even under strong adaptive adversarial conditions.

2512.04981 2026-06-15 cs.CV cs.LG 版本更新

Aligned but Stereotypical? How System Prompts Shape Demographic Bias in LLM-Based Text-to-Image Models

对齐但刻板？系统提示如何塑造基于LLM的文本到图像模型中的人口统计偏见

NaHyeon Park, Na Min An, Kunhee Kim, Soyeon Yoon, Jiahao Huo, Hyunjung Shim

发表机构 * KAIST（韩国科学技术院）； HKUST (GZ)（香港科技大学（广州））

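AI summary: Studies how LLM-augmented text-to-image systems introduce implicit demographic bias during prompt expansion, and proposes FairPro, a training-free debiasing framework that adaptively generates fairness-aware instructions to reduce demographic disparities.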

Comments Project page: https://fairpro-t2i.github.io

AI中文摘要

文本到图像（T2I）系统越来越依赖基于大语言模型（LLM）的文本条件来解释和扩展用户提示。虽然这提高了提示理解和文本-图像对齐，但我们发现，即使未指定人口统计属性，它也可能引入隐性的人口统计假设。为了系统地研究这种行为在不同提示模糊性和复杂性水平下的表现，我们构建了一个涵盖多种提示设置的综合基准。对八个最新T2I模型的评估表明，基于LLM的系统始终比非LLM基线表现出更强的人口统计偏差。我们进一步分析了系统提示，这是基于LLM的T2I系统特有的组件，用于指导提示解释和扩展。我们的分析表明，这些指令强烈影响文本嵌入，进而导致有偏的图像生成。受这些发现启发，我们提出了FairPro，一个无训练的去偏框架，它在保持用户意图的同时自适应地生成公平性感知指令。实验表明，FairPro在保持提示忠实度的同时显著减少了人口统计差异。

英文摘要

Text-to-image (T2I) systems increasingly rely on Large Language Model (LLM)-based text conditioning to interpret and expand user prompts. While this improves prompt understanding and text-image alignment, we find that it can also introduce implicit demographic assumptions, even when demographic attributes are unspecified. To systematically investigate this behavior across varying levels of prompt ambiguity and complexity, we construct a comprehensive benchmark covering diverse prompt settings. Evaluations on eight recent T2I models show that LLM-based systems consistently exhibit stronger demographic skew than non-LLM-based baselines. We further analyze system prompts, a component unique to LLM-based T2I systems that guides prompt interpretation and expansion. Our analyses show that these instructions strongly influence text embeddings, which subsequently leads to biased image generations. Motivated by these findings, we propose FairPro, a training-free debiasing framework that adaptively generates fairness-aware instructions while preserving user intent. Experiments demonstrate that FairPro substantially reduces demographic disparities while maintaining prompt fidelity.

2602.09161 2026-06-15 stat.ML cs.LG 版本更新

Minimum Distance Summaries for Robust Neural Posterior Estimation

最小距离摘要用于鲁棒神经后验估计

Sherman Khoo, Dennis Prangle, Song Liu, Mark Beaumont

发表机构 * University of Cambridge（剑桥大学）

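AI summary: Proposes minimum-distance summaries, which use the maximum mean discrepancy (MMD) to adapt summary statistics at test time, giving robust inference without modifying the pretrained neural posterior estimator, with theoretical robustness guarantees and empirical validation.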

AI中文摘要

基于模拟的推断（SBI）通过首先在先验-模拟器对上训练神经后验估计器（NPE），通常使用低维摘要统计量，实现摊销贝叶斯推断，然后可以在新测试观测上查询以廉价地重复用于快速推断。由于NPE是在训练数据分布下估计的，当观测偏离训练分布时，它容易受到误指定的影响。许多鲁棒SBI方法通过修改NPE训练或引入误差模型来解决这个问题，将鲁棒性与推断网络耦合，损害了摊销和模块化。我们引入了最小距离摘要，一种即插即用的鲁棒NPE方法，独立于预训练NPE自适应调整测试时的摘要统计量。利用最大均值差异（MMD）作为观测数据与摘要条件预测分布之间的距离，自适应摘要从MMD继承了强鲁棒性属性。我们证明该算法可以通过随机傅里叶特征近似高效实现，产生轻量级、无模型的测试时自适应过程。我们为算法的鲁棒性提供了理论保证，并在各种合成和真实世界任务上进行了实证评估，表明在最小额外开销下实现了显著的鲁棒性提升。

英文摘要

Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs, typically through low-dimensional summary statistics, which can then be cheaply reused for fast inference by querying it on new test observations. Because NPE is estimated under the training data distribution, it is susceptible to misspecification when observations deviate from the training distribution. Many robust SBI approaches address this by modifying NPE training or introducing error models, coupling robustness to the inference network and compromising amortization and modularity. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE. Leveraging the maximum mean discrepancy (MMD) as a distance between observed data and a summary-conditional predictive distribution, the adapted summary inherits strong robustness properties from the MMD. We demonstrate that the algorithm can be implemented efficiently with random Fourier feature approximations, yielding a lightweight, model-free test-time adaptation procedure. We provide theoretical guarantees for the robustness of our algorithm and empirically evaluate it on a range of synthetic and real-world tasks, demonstrating substantial robustness gains with minimal additional overhead.
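
The abstract's remark that the MMD can be computed efficiently with random Fourier features can be illustrated with the standard construction for a Gaussian kernel. This is a minimal sketch, not the paper's algorithm: the function names, the lengthscale, the feature count, and the toy Gaussian samples are all assumptions, and the summary-adaptation step that minimizes this distance is not shown.

```python
import math
import random


def make_rff_map(dim, n_features, lengthscale, seed=0):
    """Random Fourier feature map approximating a Gaussian kernel.

    Frequencies are drawn from N(0, 1 / lengthscale^2) and phases from
    Uniform(0, 2*pi), so the dot product of two feature vectors approximates
    exp(-||x - y||^2 / (2 * lengthscale^2)).
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return feature_map


def mean_embedding(samples, feature_map):
    """Average feature vector of a sample set."""
    feats = [feature_map(x) for x in samples]
    return [sum(col) / len(feats) for col in zip(*feats)]


def mmd_squared_rff(xs, ys, feature_map):
    """Squared MMD estimate: squared distance between mean embeddings."""
    mu_x = mean_embedding(xs, feature_map)
    mu_y = mean_embedding(ys, feature_map)
    return sum((a - b) ** 2 for a, b in zip(mu_x, mu_y))


rng = random.Random(1)
phi = make_rff_map(dim=1, n_features=200, lengthscale=1.0)
base = [[rng.gauss(0.0, 1.0)] for _ in range(300)]
same = [[rng.gauss(0.0, 1.0)] for _ in range(300)]      # same distribution
shifted = [[rng.gauss(3.0, 1.0)] for _ in range(300)]   # shifted distribution

mmd_same = mmd_squared_rff(base, same, phi)
mmd_shifted = mmd_squared_rff(base, shifted, phi)
```

The cost is linear in the number of samples and features, rather than quadratic in the number of samples as with the exact kernel estimate, which is why the approximation suits a lightweight test-time procedure.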

2603.23530 2026-06-15 cs.CL cs.AI cs.LG 版本更新

Context-Aware Modality-Topology Co-Alignment for Multimodal Attributed Graphs

Sirui Zhang, Xu Wang, Zhengyu Wu, Xunkai Li, Hongchao Qin

发表机构 * University of Science and Technology of China（中国科学技术大学）

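AI summary: Proposes CoMAG, which handles graph-centric and modality-centric tasks in one backbone through task-adaptive reliable context learning and modality-preserving hop-token alignment, improving structural prediction, cross-modal matching, and graph-conditioned generation while retaining sparse edge-linear complexity.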

AI中文摘要

多模态属性图（MAGs）通过将图拓扑与文本、图像等异质属性耦合来建模真实世界实体。它们支持需要结构和类别判别表示以进行图中心任务，以及需要细粒度跨模态对应以进行模态中心任务。然而，现有的MAG方法通常依赖固定的图上下文或统一融合的表示，导致任务无关的传播和过度压缩的融合，阻碍了多样化的任务需求和模态特定证据的保留。为了解决这个问题，我们提出了CoMAG，一个统一的MAG骨干网络，学习任务自适应的可靠上下文并在其中进行模态保持的对齐。CoMAG首先通过从多模态语义一致性估计边可靠性、用语义邻居补充原始拓扑以及通过任务感知门选择上下文组件来进行可靠上下文学习。然后，它通过维护模态特定的多跳轨迹、跨模态匹配模态-跳令牌以及解耦共享和私有表示来进行模态保持的跳令牌对齐。因此，CoMAG在一次前向传播中产生图和模态表示，同时保留模态特定的线索。我们进一步分析了稳定传播、缓解过度平滑和控制模态崩溃。在九个OpenMAG数据集上的实验将CoMAG与仅特征、仅图、多模态和统一的MAG基线在图级预测、模态匹配和图条件生成方面进行了比较。结果表明，CoMAG达到了最佳报告性能，证明任务自适应的可靠上下文和模态保持的对齐改善了结构预测、跨模态匹配和图条件生成，同时保持了稀疏边线性复杂度。

英文摘要

Multimodal Attributed Graphs (MAGs) model real-world entities by coupling graph topology with heterogeneous attributes such as text and images. They support graph-centric tasks requiring structural and class-discriminative representations, and modality-centric tasks requiring fine-grained cross-modal correspondence. However, existing MAG methods often rely on fixed graph contexts or uniformly fused representations, causing task-agnostic propagation and over-compressed fusion that hinder diverse task requirements and modality-specific evidence preservation. To address this, we propose CoMAG, a unified MAG backbone that learns task-adaptive reliable contexts and modality-preserving alignment within them. CoMAG first conducts Reliable Context Learning by estimating edge reliability from multimodal semantic consistency, complementing raw topology with semantic neighbors, and selecting context components through a task-aware gate. It then performs Modality-preserving Hop-token Alignment by maintaining modality-specific multi-hop trajectories, matching modality-hop tokens across modalities, and decoupling shared and private representations. Thus, CoMAG produces graph and modality representations from one forward pass while retaining modality-specific cues. We further analyze stable propagation, over-smoothing mitigation, and modality-collapse control. Experiments on nine OpenMAG datasets compare CoMAG with feature-only, graph-only, multimodal, and unified MAG baselines across graph-level prediction, modality matching, and graph-conditioned generation. Results show that CoMAG achieves the best reported performance, demonstrating that task-adaptive reliable contexts and modality-preserving alignment improve structural prediction, cross-modal matching, and graph-conditioned generation while retaining sparse edge-linear complexity.

2606.14636 2026-06-15 cs.LG 新提交

Graph Diffusion Residuals for Control-Function Instrumental Variables

用于控制函数工具变量的图扩散残差

Rui Wu, Zongyuan Chen, Hong Xie, Defu Lian, Enhong Chen

发表机构 * School of Computer Science and Engineering, University of Science and Technology of China（中国科学技术大学计算机科学与技术学院）

AI总结提出自适应各向异性工具热流（A-IHF），一种基于图扩散的残差提取方法，用于灵活控制函数，通过检测处理跳跃并调整图传导性，在合成基准测试中优于多种基线方法。

Comments Submitted to Journal of Machine Learning Research (JMLR). 50 pages, 6 figures

AI中文摘要

控制函数工具变量估计器需要第一阶段残差，而不仅仅是第一阶段预测。高容量的第一阶段可能会插值处理，从而为结果方程留下过少的残差信息。我们研究了自适应各向异性工具热流（A-IHF），这是一种用于灵活控制函数的确定性图扩散残差提取器。A-IHF将处理视为第一阶段特征图上的信号，使用引导扩散检测大的处理跳跃，减弱这些跳跃上的传导性，并通过稀疏图预解式计算生成的控制。其观测选择规则仅使用$(Z,X)$，结合了图广义交叉验证、粗糙度、残差化处理相关性以及图可容许性过滤。分析将误差分解为结构泄漏、残差衰减和残差化处理变异，得到有限样本界、潜在分段光滑几何下的图可容许性率以及有限路径选择校准。在54个合成基准单元中，与调优的图、核、树、提升、级数和神经网络控制函数基线相比，有保护的观测A-IHF具有最低的平均结构响应MSE；A-IHF族在32个单元中击败了最佳的非A-IHF基线。当图捕获分段光滑的第一阶段结构时，性能最强。

英文摘要

Control-function instrumental variable estimators need a first-stage residual, not merely a first-stage prediction. High-capacity first stages can interpolate treatment and leave too little residual information for the outcome equation. We study Adaptive Anisotropic Instrumental Heat Flow (A-IHF), a deterministic graph-diffusion residual extractor for flexible control functions. A-IHF treats treatment as a signal on a graph of first-stage features, uses pilot diffusion to detect large treatment jumps, attenuates conductance across those jumps, and computes the generated control with a sparse graph resolvent. Its observational selection rule uses only $(Z,X)$, combining graph generalized cross-validation, roughness, residualized-treatment relevance, and graph-admissibility filtering. The analysis decomposes error into structural leakage, residual attenuation, and residualized treatment variation, yielding finite-sample bounds, graph-admissibility rates under latent piecewise-smooth geometry, and finite-path selection calibration. Across 54 synthetic benchmark cells with tuned graph, kernel, tree, boosting, series, and neural control-function baselines, guarded observational A-IHF has the lowest average structural-response MSE; the A-IHF family beats the best non-A-IHF baseline in 32 cells. Performance is strongest when the graph captures piecewise-smooth first-stage structure.

2606.14047 2026-06-15 cs.IR cs.AI cs.CL cs.LG 交叉投稿

Knowledge Graph Enhanced Memory-Augmented Retrieval for Long Context Modeling

知识图谱增强的记忆增强检索用于长上下文建模

Ghadir Alselwi, Basem Suleiman, Hao Xue, Shoaib Jameel, Hakim Hacid, Flora D. Salim, Imran Razzak

发表机构 * University of New South Wales（新南威尔士大学）； Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； University of Southampton（南安普顿大学）； Technology Innovation Institute（技术创新研究所）； Mohamed Bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）

AI总结提出KGERMAR框架，通过动态构建上下文知识图谱并融合多组件记忆架构，在长上下文建模中降低困惑度达8.5%，提升记忆效率2-2.5倍。

AI中文摘要

长上下文语言建模不仅需要扩展上下文窗口，还需要在数千个token中保持对实体状态和关系的连贯理解——这是语义相似性单独无法解决的挑战。KGERMAR通过在推理过程中从输入文本构建动态的、上下文特定的知识图谱来解决这一问题，实现利用语义相似性和显式实体关系的领域自适应检索。该框架执行实时实体和关系抽取以构建上下文知识图谱，然后通过多组件记忆架构将图结构嵌入与文本语义相结合。维护三个记忆库——上下文、语义和结构——通过学习权重融合检索信号，以捕获表面语义和更深层次的关系模式。在SlimPajama（84.7K训练样本）、WikiText-103（4,358样本）、PG-19（100样本）和Proof-pile（46.3K样本）上评估，KGERMAR在1K到32K token的上下文长度上，相比记忆增强基线实现了高达8.5%的困惑度降低和2-2.5倍的记忆效率提升，并在五个NLU任务上展现出优越的上下文学习性能。动态知识图谱构建方法通过实现适应输入上下文而非依赖固定知识库的领域特定知识表示，推进了记忆增强语言建模。

英文摘要

Long-context language modeling requires not only extending context windows but also maintaining a coherent understanding of entity states and relationships across thousands of tokens, a challenge that semantic similarity alone cannot address. KGERMAR addresses this by constructing dynamic, context-specific knowledge graphs from input text during inference, enabling domain-adaptive retrieval that leverages both semantic similarity and explicit entity relationships. The framework performs real-time entity and relation extraction to build contextual knowledge graphs, then integrates graph-structural embeddings with textual semantics through a multi-component memory architecture. Three memory banks (contextual, semantic, and structural) are maintained, with retrieval signals fused via learned weights to capture both surface-level semantics and deeper relational patterns. Evaluated on SlimPajama (84.7K training examples), WikiText-103 (4,358 examples), PG-19 (100 examples), and Proof-pile (46.3K examples), KGERMAR achieves up to 8.5% lower perplexity and 2-2.5x better memory efficiency than memory-augmented baselines across context lengths from 1K to 32K tokens, with superior in-context learning performance across five NLU tasks. The dynamic knowledge graph construction approach advances memory-augmented language modeling by enabling domain-specific knowledge representation that adapts to input contexts rather than relying on fixed knowledge bases.

2602.09258 2026-06-15 cs.LG 版本更新

Generalizing GNNs with Tokenized Mixture of Experts

泛化GNN：基于令牌化的专家混合

Xiaoguang Guo, Zehong Wang, Jiazheng Li, Shawn Spitzel, Qi Yang, Kaize Ding, Jundong Li, Chuxu Zhang

发表机构 * University of Connecticut Storrs（康涅狄格大学斯托斯分校）； University of Notre Dame（圣母大学）； University of Virginia（弗吉尼亚大学）； Northwestern University Evanston（西北大学埃文斯顿分校）

AI总结针对图神经网络部署时稳定性与泛化性的权衡，提出STEM-GNN框架，通过令牌化专家混合编码器、向量量化接口和Lipschitz正则化头实现三方面平衡，在多种分布偏移和扰动下提升鲁棒性。

Comments Accepted to KDD 2026

DOI: 10.1145/3770855.3817952

AI中文摘要

部署的图神经网络（GNN）在部署时是冻结的，但必须适应干净数据，在分布偏移下泛化，并对扰动保持稳定。我们表明静态推理引入了一个基本权衡：提高稳定性需要减少对偏移敏感特征的依赖，留下一个不可约的最坏情况泛化下限。实例条件路由可以打破这个上限，但很脆弱，因为偏移可能误导路由，扰动可能使路由波动。我们通过两个分解来捕捉这些效应：覆盖与选择分离，以及基础敏感性与波动放大分离。基于这些见解，我们提出了STEM-GNN，一个预训练-微调框架，包含一个用于多样化计算路径的专家混合编码器，一个用于稳定编码器到头部信号的向量量化令牌接口，以及一个用于限制输出放大的Lipschitz正则化头部。在九个节点、链接和图基准测试中，STEM-GNN实现了更强的三方面平衡，提高了对度/同质性偏移以及特征/边损坏的鲁棒性，同时在干净图上保持竞争力。

英文摘要

Deployed graph neural networks (GNNs) are frozen at deployment yet must fit clean data, generalize under distribution shifts, and remain stable to perturbations. We show that static inference induces a fundamental tradeoff: improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor. Instance-conditional routing can break this ceiling, but is fragile because shifts can mislead routing and perturbations can make routing fluctuate. We capture these effects via two decompositions separating coverage vs selection, and base sensitivity vs fluctuation amplification. Based on these insights, we propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths, a vector-quantized token interface to stabilize encoder-to-head signals, and a Lipschitz-regularized head to bound output amplification. Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.

2605.07121 2026-06-15 cs.AI cs.LG 版本更新

AdaTKG: Adaptive Memory for Temporal Knowledge Graph Reasoning

AdaTKG: 用于时序知识图谱推理的自适应记忆

Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

发表机构 * LG AI Research（LG人工智能研究院）

AI总结提出AdaTKG，通过为每个实体维护自适应记忆，并采用可学习的指数移动平均更新，解决时序知识图谱中实体表示静态的问题，提升推理性能。

Comments KDD Workshop on Frontiers in Graph Machine Learning for the Large Model Era 2026 (Oral Presentation)

AI中文摘要

时序知识图谱（TKG）表示带有时间戳的关系事实，并支持对演化事件进行广泛的推理任务。然而，现有方法生成的实体表示在实体层面是静态的，即每个表示仅是学习参数的函数，且不保留实体参与交互的任何痕迹。在本文中，我们摒弃这种静态观点，提出将每个实体建模为一个自适应过程，其表示在实体每次参与事实时被细化。为此，我们提出AdaTKG，它为每个实体维护一个记忆，该记忆随每次观察到的交互而更新，记忆在线累积，预测随更多交互的到来而改进。具体而言，我们将记忆更新实例化为一个可学习的指数移动平均，由单个共享标量控制，而不是为每个实体使用可学习参数，使AdaTKG能够处理训练中未见过的实体。大量实验证实了相对于TKG基线的持续改进，证明了自适应记忆的有效性。代码见：https://github.com/seunghan96/AdaTKG

英文摘要

Temporal knowledge graphs (TKGs) represent time-stamped relational facts and support a wide range of reasoning tasks over evolving events. However, existing methods produce entity representations that are static at the entity level, in that each representation is a function of learned parameters only and retains no trace of the interactions in which the entity has participated. In this paper, we depart from this static view and propose that each entity be modeled as an adaptive process whose representation is refined every time the entity participates in a fact. To this end, we propose AdaTKG, which maintains a per-entity memory that is updated with every observed interaction, with the memory accumulating online and predictions improving as more interactions arrive. Specifically, we instantiate the memory update as a learnable exponential moving average governed by a single shared scalar instead of using learnable parameters for each entity, enabling AdaTKG to handle entities unseen during training. Extensive experiments confirm consistent gains over TKG baselines, demonstrating the effectiveness of adaptive memory. Code is available at: https://github.com/seunghan96/AdaTKG
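
The memory update described above, an exponential moving average governed by one shared scalar, can be sketched as follows. This is an illustrative toy, not the AdaTKG code: the class name, the sigmoid parameterization of the scalar, the zero initial memory, and the use of a plain vector as the interaction message are all assumptions.

```python
import math
from collections import defaultdict

class EntityMemory:
    """Per-entity memory updated by an exponential moving average (EMA)."""

    def __init__(self, dim, decay_logit=0.0):
        self.dim = dim
        # A single scalar shared by all entities; in a trained model it would be learned.
        self.decay_logit = decay_logit
        # Unseen entities start from a zero memory, so no per-entity parameters are needed.
        self.memory = defaultdict(lambda: [0.0] * dim)

    @property
    def alpha(self):
        """Squash the shared scalar into (0, 1) so it is a valid EMA rate."""
        return 1.0 / (1.0 + math.exp(-self.decay_logit))

    def update(self, entity, message):
        """Blend the new interaction message into the entity's memory."""
        a = self.alpha
        old = self.memory[entity]
        self.memory[entity] = [(1.0 - a) * m + a * x for m, x in zip(old, message)]
        return self.memory[entity]

mem = EntityMemory(dim=2)            # alpha = 0.5 when decay_logit = 0
mem.update("entity_a", [1.0, 0.0])   # memory becomes [0.5, 0.0]
mem.update("entity_a", [1.0, 2.0])   # memory becomes [0.75, 1.0]
print(mem.memory["entity_a"])
```

Since the only trainable quantity here is the shared scalar, an entity that never appeared during training can still accumulate a memory from its first observed interaction, which matches the motivation given in the abstract.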

2606.11898 2026-06-15 cs.CL cs.LG 版本更新

GraspLLM: Towards Zero-Shot Generalization on Text-Attributed Graphs with LLMs

GraspLLM: 利用LLM实现文本属性图上的零样本泛化

Hengyi Feng, Zeang Sheng, Meiyi Qiang, Yang Li, Wentao Zhang

发表机构 * Peking University（北京大学）； National University of Singapore（新加坡国立大学）； University of California, Berkeley（加州大学伯克利分校）

AI总结提出GraspLLM框架，通过融合图结构理解与LLM语义能力，利用基序感知对比学习和最优上下文子图对齐，实现跨数据集和跨任务的零样本泛化。

AI中文摘要

CuMA: 通过人口统计感知的适配器混合使大语言模型与稀疏文化价值观对齐

Ao Sun, Xiaoyu Wang, Zhe Tan, Yu Li, Jiachen Zhu, Yuheng Jia, Shu Su

发表机构 * Southeast University（东南大学）； ByteDance Inc.（字节跳动公司）； Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China（新一代人工智能技术及其交叉应用重点实验室（东南大学），中华人民共和国教育部，中国）

AI总结提出CuMA框架，通过人口统计感知路由将冲突梯度分离到专家子空间，解决密集模型在多文化对齐中的均值崩溃问题，在WorldValuesBench等基准上取得最优性能。

Comments ACL 2026 Main

AI中文摘要

随着大语言模型服务于全球用户，对齐必须从强制执行普遍共识转向尊重文化多元主义。我们证明，密集模型在被迫适应冲突的价值分布时会出现均值崩溃，收敛到无法代表不同群体的通用平均值。我们将其归因于文化稀疏性，其中梯度干扰阻止密集参数跨越不同的文化模式。为解决此问题，我们提出CuMA（Cultural Mixture of Adapters），一个将对齐视为条件容量分离问题的框架。通过引入人口统计感知路由，CuMA内化了一个潜在文化拓扑，以将冲突梯度明确解耦到专门的专家子空间中。在WorldValuesBench、Community Alignment和PRISM上的广泛评估表明，CuMA达到了最先进的性能，显著优于密集基线和仅语义MoE。关键的是，我们的分析证实CuMA有效缓解了均值崩溃，保留了文化多样性。我们的代码可在 https://github.com/Throll/CuMA 获取。

英文摘要

As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when forced to fit conflicting value distributions, suffer from Mean Collapse, converging to a generic average that fails to represent diverse groups. We attribute this to Cultural Sparsity, where gradient interference prevents dense parameters from spanning distinct cultural modes. To resolve this, we propose CuMA (Cultural Mixture of Adapters), a framework that frames alignment as a conditional capacity separation problem. By incorporating demographic-aware routing, CuMA internalizes a Latent Cultural Topology to explicitly disentangle conflicting gradients into specialized expert subspaces. Extensive evaluations on WorldValuesBench, Community Alignment, and PRISM demonstrate that CuMA achieves state-of-the-art performance, significantly outperforming both dense baselines and semantic-only MoEs. Crucially, our analysis confirms that CuMA effectively mitigates mean collapse, preserving cultural diversity. Our code is available at https://github.com/Throll/CuMA.

2606.13823 2026-06-15 cs.LG eess.SP stat.ML 新提交

A Stationarity-and-Coupling Criterion for Training-Free Time-Lagged Spectral Embeddings of Multivariate Time Series

多变量时间序列无训练时滞谱嵌入的平稳性与耦合准则

Siddharth Pal, Viktoria Rojkova

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出基于时滞相关矩阵截断的固定长度描述符D(τ)，通过平稳高斯VAR(1)模型推导其适用条件：信号近似平稳且类别信息存在于跨通道时间耦合而非边际功率。

Comments 25 pages, 2 figures, 10 tables

AI中文摘要

我们研究多变量时间序列的无训练固定长度描述符，不仅问这样的描述符是否表现良好，而且问何时可以预期它有效。我们的研究对象是$D(\tau)$，它由时滞相关矩阵在Marchenko-Pastur边缘截断构建，使得仅信号承载的特征值存活，并通过与类质心的余弦相似度分类，零学习参数。核心贡献不是描述符本身，而是一个可证伪的适用性准则。基于平稳高斯VAR(1)模型，我们论证当信号近似平稳且类别信息存在于它们的跨通道时间耦合而非边际每通道功率时，$D(\tau)$能分离两个类别。我们半正式地推导出三个结果：可区分性条件、为什么静态（$\tau=0$）协方差退化为随机、以及为什么平稳但功率判别范式会击败描述符。该准则是可操作的：一个两部分预检测试——增强Dickey-Fuller平稳性检验和功率基线饱和检验——在任何训练前预测适用性。我们在混合数据集上验证了这两部分。在满足准则的四个范式（Sleep-EDF、BCI-IV-2a、MIT-BIH、ESC-50）上，描述符以极低成本与强基线竞争，在Sleep-EDF上20受试者留一法下达到$88.5\pm4.5\%$，单CPU线程。在违反准则的三个范式——非平稳ERP、以及功率判别的金融波动和可穿戴压力模式——上，它完全如预检预测的那样失败，而这些负面结果更具信息量。我们明确$D(\tau)$不是最准确的表示；其价值在于它是一个紧凑、无训练的嵌入，其有效域事先已知。

英文摘要

We study training-free fixed-length descriptors for multivariate time series and ask not merely whether such a descriptor performs well, but when it can be expected to work at all. Our object of study is $D(τ)$, built from a time-lagged correlation matrix truncated at the Marchenko-Pastur edge so that only signal-bearing eigenvalues survive and classified by cosine similarity to class centroids with zero learned parameters. The central contribution is not the descriptor but a falsifiable applicability criterion for it. Working from a stationary Gaussian VAR(1) model, we argue that $D(τ)$ separates two classes when the signals are approximately stationary and the class information lives in their cross-channel temporal coupling rather than in marginal per-channel power. We derive, semi-formally, three consequences: a distinguishability condition, why the static ($τ=0$) covariance collapses to chance, and why a stationary but power-discriminated paradigm defeats the descriptor. The criterion is operational: a two-part pre-flight test -- an augmented Dickey-Fuller stationarity check and a power-baseline saturation check -- predicts applicability before any training. We validate both halves on a mixed assortment. On four paradigms that satisfy the criterion (Sleep-EDF, BCI-IV-2a, MIT-BIH, ESC-50) the descriptor is competitive with strong baselines at a fraction of their cost, reaching $88.5\pm4.5\%$ under 20-subject leave-one-subject-out on Sleep-EDF on a single CPU thread. On three that violate it -- non-stationary ERPs, and financial-volatility and wearable-stress regimes that are power-discriminated -- it fails exactly as the pre-flight predicts, and these negatives are the more informative half. We are explicit that $D(τ)$ is not the most accurate representation; its value is a compact, training-free embedding whose domain of validity is known in advance.
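
Two pieces of the pipeline described above are simple enough to sketch: the Marchenko-Pastur upper edge used as a noise threshold, and classification by cosine similarity to class centroids. The sketch below is an assumption-laden illustration, not the paper's code: the 2-D vectors are hypothetical stand-ins for $D(τ)$ descriptors, and the eigenvalue truncation itself is not reproduced.

```python
import math

def marchenko_pastur_edge(num_channels, num_samples):
    """Upper edge of the Marchenko-Pastur bulk for unit-variance noise."""
    return (1.0 + math.sqrt(num_channels / num_samples)) ** 2

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def class_centroids(descriptors, labels):
    """Average the descriptors of each class; nothing is fit by gradient descent."""
    grouped = {}
    for d, y in zip(descriptors, labels):
        grouped.setdefault(y, []).append(d)
    return {y: [sum(col) / len(ds) for col in zip(*ds)] for y, ds in grouped.items()}

def classify(descriptor, centroids):
    """Assign the class whose centroid is most cosine-similar to the descriptor."""
    return max(centroids, key=lambda y: cosine(descriptor, centroids[y]))

# Hypothetical 2-D descriptors standing in for D(tau) vectors.
train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["coupled", "coupled", "uncoupled", "uncoupled"]
centroids = class_centroids(train, train_labels)

print(marchenko_pastur_edge(num_channels=4, num_samples=100))
print(classify([0.8, 0.3], centroids))
```

Eigenvalues of a sample correlation matrix that exceed the edge are the ones a pure-noise model would not explain, which is the sense in which only signal-bearing eigenvalues survive the truncation.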

2606.14123 2026-06-15 cs.LG cs.AI 新提交

Recovering Stranded Discrimination in Knowledge Tracing: Per-Item Bias Correction via Empirical-Bayes Shrinkage

知识追踪中恢复被搁置的区分能力：通过经验贝叶斯收缩进行逐项偏差校正

Xiaoran Yan, Cheng Tang, Atsushi Shimada

发表机构 * Kyushu University（九州大学）

AI总结提出SLC方法，利用Laplace/IRLS将二值观测转化为高斯伪观测，通过卡尔曼平滑器进行经验贝叶斯收缩，并拟合偏移Platt链接，以校正知识追踪模型中的逐项偏差，恢复被搁置的区分能力，在多个数据集和骨干网络上提升AUC和NLL。

Comments 25 pages, 3 figures. Accepted at ECML PKDD 2026 (Research Track). Code: https://github.com/xiaoran-y/SLC

AI中文摘要

部署的知识追踪模型通常在训练后被冻结，但由于骨干架构中逐项表达能力的限制以及部署后项目属性的变化，会出现系统性的逐项logit偏差，从而降低预测质量。全局事后校准器（如Platt缩放、温度缩放和保序回归）能改善概率估计，但无法改变由AUC衡量的区分能力。这种AUC不变性是单调分数变换的结构性结果；恢复被搁置的区分能力需要以项目身份为条件。我们提出SLC（状态空间logit校正），通过Laplace/IRLS将二值观测转换为高斯伪观测，通过卡尔曼平滑器应用经验贝叶斯收缩，并拟合偏移Platt链接。状态空间公式还产生了一个可检测性界限，表征了伯努利信息下限，解释了在当前数据密度下时间跟踪为何没有益处。在四个数据集、五个骨干网络和三个随机种子上，SLC在所有四个数据集上提升了AUC，在三个数据集上提升了NLL，优势集中在稀疏项目上。跨领域控制表明，当部署的骨干网络留下实体级偏差时，类似现象可能出现在教育领域之外。

英文摘要

Deployed knowledge-tracing models are typically frozen after training, yet systematic per-item logit bias arises from limited per-item expressivity in backbone architectures and from post-deployment shifts in item properties, degrading prediction quality. Global post-hoc calibrators such as Platt scaling, temperature scaling, and isotonic regression improve probability estimates but leave discriminative ability, as measured by AUC, unchanged. This AUC invariance is a structural consequence of monotone score-only transforms; recovering the stranded discrimination requires conditioning on item identity. We propose SLC (State-space Logit Correction), which converts binary observations to Gaussian pseudo-observations via Laplace/IRLS, applies empirical-Bayes shrinkage through a Kalman smoother, and fits an offset-Platt link. The state-space formulation also yields a detectability bound that characterizes the Bernoulli information floor, explaining why temporal tracking provides no benefit at current data densities. Across four datasets, five backbones, and three seeds, SLC improves AUC on all four datasets and NLL on three, with the advantage concentrating on sparse items. Cross-domain controls suggest that the same phenomenon can arise beyond education when the deployed backbone leaves entity-level bias.
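
The AUC-invariance argument above can be checked on a toy example. The sketch below is not SLC: the data, the Platt coefficients, and the per-item offsets are made up for illustration. It only shows that a global monotone calibrator leaves AUC unchanged, while a correction conditioned on item identity can change the ranking.

```python
import math

def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def platt(logit, a=0.5, b=-1.0):
    """A global monotone calibrator: sigmoid(a * logit + b) with a > 0."""
    return 1.0 / (1.0 + math.exp(-(a * logit + b)))

# Toy data: item "B" carries a systematic bias of -2 on its logits.
items = ["A", "A", "B", "B"]
labels = [1, 0, 1, 0]
logits = [1.0, -1.0, -1.5, -3.0]

base_auc = auc(logits, labels)
global_auc = auc([platt(z) for z in logits], labels)
item_offset = {"A": 0.0, "B": 2.0}  # hypothetical per-item corrections
corrected_auc = auc([z + item_offset[i] for z, i in zip(logits, items)], labels)

print(base_auc, global_auc, corrected_auc)
```

The global calibrator preserves the order of all scores, so every positive-negative comparison comes out the same; the per-item offset moves the scores of item "B" relative to item "A", which is the only way the cross-item comparisons can flip.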

2606.14353 2026-06-15 cs.LG 新提交

Can Deep Neural Networks Improve Compression of Very Large Scientific Data?

深度神经网络能否改善超大规模科学数据的压缩？

Muhannad Alhumaidi, Guozhong Li, Spiros Skiadopoulos, Panos Kalnis

发表机构 * King Abdullah University of Science and Technology（阿卜杜拉国王科技大学）； University of the Peloponnese（伯罗奔尼撒大学）

AI总结本文提出将深度学习预测器集成到传统误差有界压缩框架中，通过气候数据实验发现，尽管ML预测器能提高预测精度和重建质量，但由于残差空间结构影响熵编码效率，未能提升整体压缩比。

AI中文摘要

误差有界有损压缩是管理现代模拟和观测仪器产生的快速增长的科学数据的基本技术。大多数最先进的压缩器遵循预测-残差范式，其中压缩效果取决于预测器的质量：更准确的预测产生更小的残差，更容易压缩。这一观察提出了一个问题：现代机器学习模型能否作为科学数据压缩的优越预测器？直接回答这个问题具有挑战性，因为开发特定于压缩的ML预测器需要大量资源。相反，我们利用气候领域，其中已经存在高度准确的预训练天气预报基础模型，使其成为理想的测试平台。我们提出了一个框架，将空间和时间深度学习模型集成到传统的误差有界压缩流水线中。该框架支持自回归预测模型，并避免误差累积。使用ERA5气候数据作为代表性的大规模科学数据集，我们评估了三种不同的ML预测器：基于VAEformer的编解码器（CRA5）、图神经网络预测器（GraphCast）和视觉变换器预测器（Aurora），与最先进的压缩器SZ3.1在相同的量化和熵编码后端下进行比较。我们对约1.7 TB数据的评估揭示了一个令人惊讶的结果：尽管ML预测器生成更准确的预测，并且可以将重建质量提高多达91%，同时对于高度可预测的变量实现高达9.6倍的压缩比，但它们并没有提高整体数据集级别的压缩比。我们表明，仅预测准确性是不够的：所得残差的空间结构在熵编码效率中起决定性作用。

英文摘要

Error-bounded lossy compression is a fundamental technique for managing the rapidly growing volumes of scientific data produced by modern simulations and observational instruments. Most state-of-the-art compressors follow a prediction-residual paradigm, where compression effectiveness depends on the quality of the predictor: more accurate predictions generate smaller residuals that are easier to compress. This observation raises a question: can modern machine learning models serve as superior predictors for scientific data compression? Answering this question directly is challenging because developing compression-specific ML predictors requires substantial resources. Instead, we leverage the climate domain, where highly accurate pretrained weather forecasting foundation models already exist, making it an ideal testbed. We present a framework that integrates spatial and temporal deep learning models into a conventional error-bounded compression pipeline. The framework supports auto-regressive forecasting models and avoids error accumulation. Using ERA5 climate data as a representative large-scale scientific dataset, we evaluate three distinct ML predictors: a VAEformer-based codec (CRA5), a graph neural network forecaster (GraphCast), and a vision-transformer forecaster (Aurora), against the state-of-the-art compressor SZ3.1 under identical quantization and entropy-coding backends. Our evaluation over approximately 1.7 TB of data reveals a surprising result: although ML predictors generate more accurate predictions and can improve reconstruction quality by up to 91% while achieving up to 9.6x higher compression ratios for highly predictable variables, they do not improve overall dataset-level compression ratio. We show that prediction accuracy alone is insufficient: the spatial structure of the resulting residuals plays a decisive role in entropy coding efficiency.
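
The prediction-residual paradigm described above can be sketched in a few lines. This is a generic illustration, not SZ3.1 or the paper's framework: it assumes a trivial previous-value predictor and uniform residual quantization, and it predicts from already-decoded values, which is one common way to keep errors from accumulating.

```python
def compress(values, error_bound):
    """Quantize prediction residuals so each reconstructed value is within error_bound."""
    codes, prev_recon = [], 0.0
    for v in values:
        prediction = prev_recon                      # predict from decoded data only
        code = round((v - prediction) / (2.0 * error_bound))
        codes.append(code)                           # small integers for an entropy coder
        prev_recon = prediction + code * 2.0 * error_bound
    return codes

def decompress(codes, error_bound):
    """Replay the same predictor and add back the dequantized residuals."""
    recon, prev_recon = [], 0.0
    for code in codes:
        prev_recon = prev_recon + code * 2.0 * error_bound
        recon.append(prev_recon)
    return recon

data = [10.0, 10.37, 10.91, 11.12, 11.0, 10.2]
eb = 0.1
codes = compress(data, eb)
restored = decompress(codes, eb)
max_err = max(abs(a - b) for a, b in zip(data, restored))
print(codes, max_err)
```

A better predictor shrinks the integer codes, but the final size depends on how compressible the code stream is, which is consistent with the abstract's point that residual structure, not just residual magnitude, drives entropy-coding efficiency.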

2606.14397 2026-06-15 cs.LG 新提交

Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

Running the Gauntlet: 重新评估智能体在陌生环境中的能力

Mykola Vysotskyi, Runqi Lin, Grzegorz Biziel, Michal Zakrzewski, Sebastian Montagna, Damian Rynczak, Shreyansh Padarha, Kumail Alhamoud, Zihao Fu, William Lugoloobi, Kai Rawal, Hanna Yershova, Xander Davies, Taras Rumezhak, Guohao Li, Fazl Barez, Baoyuan Wu, Arkadiusz Drohomirecki, Yarin Gal, Chris Russell, Christopher Summerfield, Adam Mahdi, Volodymyr Karpiv, Philip Torr, Adel Bibi

发表机构 * University of Oxford（牛津大学）； SoftServe ； Massachusetts Institute of Technology（麻省理工学院）； The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））； UK AI Security Institute（英国人工智能安全研究所）； Ukrainian Catholic University（乌克兰天主教大学）

AI总结提出GauntletBench基准，通过五个专业应用中的100个视觉密集型任务（每个应用20个）评估智能体在时间感知、图形理解和3D推理等未被充分探索的能力，发现最先进智能体成功率仅19.1%，远低于非专家人类标注者80%以上的成功率。

AI中文摘要

随着智能体系统不断发展并广泛部署于现实场景，对其能力进行忠实评估的需求日益增长。然而，当前的基准通常基于流行应用，任务相对简单，且关注狭窄的能力集，忽略了更广泛的维度，导致现代智能体性能饱和，无法探测其局限性。为此，我们引入了GauntletBench，一个基于网络的基准，用于评估智能体在挑战性场景中的泛化能力，重点关注三个未被充分探索的能力（时间感知、图形理解和3D推理），涵盖五个较少被覆盖的专业应用（视频编辑器、工作流构建器、3D建模器、飞行分析器和电路设计器），每个应用包含20个视觉密集型任务（共100个）。我们的基准提供了一个模块化流水线，包括一个与开源和闭源智能体框架兼容的环境、一个受控的基于网络的应用、一个结构良好的任务套件，以及一个具有多样化指标的自动评估引擎。与广泛预期相反，我们的实证结果表明，前沿智能体系统远未达到人类水平的表现。即使是最先进的智能体，在我们的GauntletBench上也仅达到19.1%的成功率，凸显了这些被忽视的能力和泛化方面的局限性。相比之下，非专家人类标注者在我们具有挑战性但可行的任务上实现了超过80%的成功率，揭示了当前智能体能力与复杂现实场景所需能力之间的巨大差距。

英文摘要

As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically built on popular applications with relatively simple tasks and focus on a narrow set of capabilities while overlooking broader dimensions, resulting in saturated performance on modern agents and failing to probe their limitations. To this end, we introduce GauntletBench, a web-based benchmark for evaluating agent generalisation in challenging scenarios, focusing on three underexplored capabilities (temporal perception, graphical understanding, and 3D reasoning), across five less-covered professional applications (Video Editor, Workflow Builder, 3D Modeller, Flight Analyser, and Circuit Designer), each with 20 vision-intensive tasks (100 in total). Our benchmark provides a modular pipeline that comprises an environment compatible with both open- and closed-source agent frameworks, a controlled web-based application, a well-structured task suite, and an automated evaluation engine with diverse metrics. Contrary to widespread expectations, our empirical results reveal that frontier agentic systems remain far from achieving human-level performance. Even the state-of-the-art agent achieves only a 19.1% success rate on our GauntletBench, highlighting the limitations in these overlooked capabilities and generalisation. By comparison, non-expert human annotators achieve over 80% success on our challenging yet feasible tasks, revealing the substantial gap between current agent capabilities and those required for complex real-world scenarios.

2606.14492 2026-06-15 cs.LG 新提交

Recipe-Controlled Decoder Audit for Structural Knowledge-Graph Completion

配方控制的解码器审计用于结构知识图谱补全

Xihang Shan, Ye Luo

发表机构 * School of Mathematical Sciences, Xiamen University（厦门大学数学科学学院）； School of Informatics, Xiamen University（厦门大学信息学院）

AI总结提出配方控制的解码器审计方法，通过交换解码器评估其对知识图谱补全性能的影响，发现解码器效果受配方和来源影响，并建议在编码器层面声明前进行解码器×深度扫描。

Comments 11 pages, 5 figures. Code and artifacts: https://github.com/AndyShan11/kgc-decoder-audit

AI中文摘要

我们提出了一种用于结构直推式知识图谱补全（KGC）的配方控制解码器审计（RCDA）。该审计提出了一个简单的报告问题：在将性能提升归因于编码器或训练配方之前，当在相同配方下交换解码器时，会发生什么变化？使用ComplEx和DistMult作为主要控制对，并辅以针对性的RotatE/TransE抽查，我们评估了七个基准。在五个标准知识图谱上，在我们的配方下，ComplEx与DistMult的差异虽小但一致（MRR增加+0.005至+0.012），而CompGCN风格的编码器效果因数据集而异。在小知识图谱上，解码器效果成为主要诊断指标：Kinship显示ComplEx稳定优势为+0.143 MRR（6个种子），而UMLS在干净的6种子服务器重跑中偏好ComplEx（+0.022 MRR），但在早期来源变体中结果相反。因此，我们将小知识图谱的解码器选择视为对配方和来源敏感，而非固定的数据集胜者。我们进一步表明，在WN18RR上解码器选择与编码器深度存在交互，且在我们的配方下，YAGO3-10上L=0的ComplEx在d=128时达到0.6971 ± 0.0048 MRR。结果是一个紧凑的审计协议：报告匹配的解码器行，记录小知识图谱来源，并在做出编码器层面声明之前进行解码器×深度扫描。

英文摘要

We present a recipe-controlled decoder audit (RCDA) for structural transductive knowledge-graph completion (KGC). The audit asks a simple reporting question: before attributing gains to an encoder or training recipe, what changes when the decoder is swapped under the same recipe? Using ComplEx and DistMult as the primary controlled pair, with targeted RotatE/TransE spot-checks, we evaluate seven benchmarks. On five standard KGs, ComplEx-vs-DistMult differences are modest but consistent under our recipe (+0.005 to +0.012 MRR), whereas CompGCN-style encoder effects vary more by dataset. On small KGs, decoder effects become the main diagnostic: Kinship shows a stable ComplEx advantage of +0.143 MRR (6 seeds), while UMLS favours ComplEx by +0.022 MRR in a clean 6-seed server rerun but reverses in an earlier provenance variant. We therefore treat small-KG decoder choice as recipe- and provenance-sensitive rather than as a fixed dataset winner. We further show that decoder choice interacts with encoder depth on WN18RR, and that under our recipe L=0 ComplEx on YAGO3-10 reaches 0.6971 +/- 0.0048 MRR at d=128. The result is a compact audit protocol: report matched decoder rows, log small-KG provenance, and sweep decoder x depth before making encoder-level claims.
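
For readers unfamiliar with the two decoders being swapped, their standard scoring functions can be written down directly. The sketch below uses made-up toy embeddings and is not taken from the paper's code; it only illustrates the well-known structural difference that DistMult scores are symmetric in head and tail while ComplEx scores need not be.

```python
def distmult_score(head, rel, tail):
    """DistMult: sum_i h_i * r_i * t_i over real embeddings (symmetric in head/tail)."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))

def complex_score(head, rel, tail):
    """ComplEx: Re(sum_i h_i * r_i * conj(t_i)) over complex embeddings."""
    return sum(h * r * t.conjugate() for h, r, t in zip(head, rel, tail)).real

# Real-valued toy embeddings for DistMult.
h_real, r_real, t_real = [1.0, 2.0], [0.5, -1.0], [3.0, 1.0]
# Complex-valued toy embeddings for ComplEx.
h_cplx, r_cplx, t_cplx = [1 + 1j, 2 - 1j], [0 + 1j, 1 + 0j], [1 - 2j, 0 + 1j]

print(distmult_score(h_real, r_real, t_real), distmult_score(t_real, r_real, h_real))
print(complex_score(h_cplx, r_cplx, t_cplx), complex_score(t_cplx, r_cplx, h_cplx))
```

Swapping head and tail leaves the DistMult score at -0.5 but changes the ComplEx score from -4.0 to 2.0, so ComplEx can represent asymmetric relations that DistMult cannot distinguish.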

2606.14604 2026-06-15 cs.LG cs.AI 新提交

A Comparative Study of Deep Learning Architectures for Multi-Horizon Behavioural Forecasting for Mobile Health

移动健康多时间范围行为预测的深度学习架构比较研究

Pavlos Nicolaou, Kleanthis Malialis, Artemis Kontou, Panayiotis Kolios

发表机构 * KIOS Research and Innovation Center of Excellence, University of Cyprus（塞浦路斯大学KIOS研究与创新卓越中心）； Department of Electrical and Computer Engineering, University of Cyprus（塞浦路斯大学电气与计算机工程系）

AI总结本研究在三个公开数据集上系统比较了六种深度学习架构、两种零样本基础模型和统计基线在1-8天时间范围内的行为预测性能，发现没有单一架构占主导，PatchTST在训练模型中领先，基础模型TimesFM在零样本下可匹配或超过训练模型（低数据场景尤为明显），且参与者级微调可将逐特征RMSE降低16-60%。

AI中文摘要

可穿戴设备和智能手机生成丰富的行为时间序列，可支持主动健康干预，但缺乏对这些数据现代预测架构的系统比较。特别是，模型如何在人群中泛化、不同架构如何响应参与者级微调以及预测精度如何在多天范围内下降仍不清楚。我们在三个涵盖800多名参与者的公开数据集上基准测试了六种深度学习架构、两种零样本基础模型（FM）和统计基线，报告了步数、屏幕时间和睡眠时长在1-8天范围内的逐特征指标。我们进一步对所有六种架构进行了逐特征个性化研究，并评估了FM在不同数据集大小和时间粒度上的迁移性。我们的主要发现是：（i）没有单一架构占主导地位，PatchTST在训练模型中领先，而紧随其后的三种架构（TCN、MLP、Transformer）之间没有显著性能差异；（ii）FM TimesFM在零样本情况下匹配或超过训练模型，尤其是在低数据场景下；（iii）参与者级微调将逐特征RMSE降低了16-60%，其中睡眠受益最大，步数受益最小。这些结果为移动健康预测中的架构选择、FM适用性和个性化策略提供了实用指导。据我们所知，这是首个联合评估现代深度学习、FM和个性化用于可穿戴设备多时间范围行为预测的研究。

英文摘要

Wearable devices and smartphones generate rich behavioural time series that can support proactive health interventions, yet systematic comparisons of modern forecasting architectures for these data are lacking. In particular, it remains unclear how models generalise across populations, how different architectures respond to participant-level fine-tuning, and how forecasting accuracy degrades across multi-day horizons. We benchmark six deep learning architectures, two zero-shot Foundation Models (FMs), and statistical baselines on three public datasets encompassing over 800 participants, reporting per-feature metrics for step counts, screen time, and sleep duration across 1-8 day horizons. We further conduct a per-feature personalisation study across all six architectures and assess FM transferability across dataset sizes and temporal granularities. Our key findings are: (i) no single architecture dominates: PatchTST leads among trained models, while the three runners-up (TCN, MLP, Transformer) show no meaningful performance difference; (ii) the FM TimesFM matches or exceeds trained models zero-shot, especially in low-data regimes; and (iii) participant-level fine-tuning reduces per-feature RMSE by 16-60%, with sleep benefiting most and step counts least. These results provide practical guidance on architecture selection, FM applicability, and personalisation strategies for mobile health forecasting. To the best of our knowledge, this is the first study to jointly evaluate modern deep learning, FMs, and personalisation for multi-horizon behavioural forecasting from wearables.

2606.13684 2026-06-15 cs.CY cs.AI cs.CL cs.LG 交叉投稿

Cross-Dataset Bloom Question Classification: Supervised Models and Prompted LLMs

跨数据集布鲁姆问题分类：监督模型与提示式大语言模型

Abdolali Faraji, Mohammadreza Molavi, Zohreh Rasoulkhani, Mohammadreza Tavakoli, Gábor Kismihók

发表机构 * Leibniz Information Centre for Science and Technology（莱布尼茨科学技术信息中心）； University of Genoa（热那亚大学）

AI总结评估监督ML/DL模型和LLM在跨数据集布鲁姆分类中的泛化能力，发现LLM更稳定，并基于最佳提示策略开发了轻量级UI。

Comments Accepted at AIED 2026. Abdolali Faraji and Mohammadreza Molavi contributed equally to this work

AI中文摘要

自动对评估问题进行布鲁姆分类可以大幅减少教师工作量，但标注具有主观性且依赖教师。先前的机器学习和深度学习方法在数据集内表现良好，但很少在跨数据集设置中评估，导致现实世界的泛化能力不明确；同时，LLM在布鲁姆问题分类中的有效性尚未被系统研究。我们评估了现有ML/DL方法的跨数据集泛化能力，并在五个数据集上使用多种提示策略评估了LLM；最佳提示策略结合了上下文示例和课程特定的动作动词。监督ML/DL模型在未见数据集上性能大幅下降，而LLM更稳定，表明其在多样化教育环境中是一种稳健的替代方案。基于最佳提示策略，我们还开发了一个轻量级用户界面，支持教师自动分类大量问题库；可用性研究表明低工作量和高度可用性。

英文摘要

Automatic Bloom's taxonomy classification of assessment questions can substantially reduce instructor workload, but labeling is subjective and teacher-dependent. Prior machine learning (ML) and deep learning (DL) approaches reported strong within-dataset results, yet were rarely evaluated in cross-dataset settings, leaving real-world generalizability unclear; meanwhile, LLM effectiveness for Bloom question classification has not been systematically studied. We evaluated the cross-dataset generalization of existing ML/DL methods and assessed LLMs with multiple prompting strategies on five datasets; the best prompting strategy combined in-context examples with course-specific action verbs. Supervised ML/DL models degraded substantially on unseen datasets, whereas LLMs were more stable, suggesting a robust alternative across diverse educational contexts. Based on the best prompting strategy, we also presented a lightweight UI that supports instructors in automatically classifying large question banks; a usability study indicated low workload and high usability.

2606.13735 2026-06-15 cs.AR cs.AI cs.LG cs.PL 交叉投稿

VHDLSuite: Unified Pipeline for LLM VHDL Generation with Data Synthesis and Evaluation

VHDLSuite：面向LLM VHDL生成的统一流水线，包含数据合成与评估

Yijun Shen, Minghao Shao, Yichen Zhao, Zhuoyan Yu, Boyuan Chen, Yik-Cheung Tam, Muhammad Shafique

发表机构 * Center for Data Science, NYU Shanghai, China（上海纽约大学数据科学中心）； NYU Tandon School of Engineering, USA（纽约大学坦登工程学院）； NYU Abu Dhabi, UAE（纽约大学阿布扎比分校）

AI总结提出VHDLSuite基础设施，通过自动基准合成、可执行验证和多模型诊断分析，解决LLM在VHDL生成评估中的不足，并构建含200+问题的VHDLBench基准。

AI中文摘要

大型语言模型（LLM）在寄存器传输级（RTL）代码生成方面展现了令人印象深刻的能力，尤其是针对Verilog。然而，评估它们在其他硬件描述语言（HDL）上的性能，特别是VHDL，仍然有限，尽管其独特的语言特性（如更严格的语义规则）引入了与Verilog不同的评估考量。这种覆盖不足限制了对当前模型在不同结构和语义的硬件设计语言中泛化能力的全面理解。为弥补这一空白，我们引入了VHDLSuite，一个以基准为中心的可扩展VHDL生成评估基础设施，集成了自动基准合成、可执行验证和多模型诊断分析。首先，我们提出一个数据流水线，自动将Verilog设计及其配套测试平台转换为可执行的VHDL基准实例，随后基于VUnit/GHDL进行验证，确保每个发布的任务在VHDL环境中可编译、可运行且可一致检查。其次，我们引入VHDLBench，一个包含超过200个VHDL问题的基准，配有完整且经过验证的测试平台，覆盖广泛的复杂度级别。第三，我们广泛评估了最先进的LLM，并揭示了LLM辅助VHDL生成中的关键挑战。我们的发现为多语言硬件设计的未来工作提供了重要见解和支持。该数据流水线、基准和评估框架将开源。

英文摘要

Large Language Models (LLMs) have shown impressive capabilities in Register Transfer Level (RTL) code generation, particularly for Verilog. However, evaluation of their performance on other Hardware Description Languages (HDLs), especially VHDL, remains limited, even though VHDL's distinct language characteristics, such as stricter semantic rules, introduce evaluation considerations that differ from Verilog. This lack of coverage restricts a full understanding of how well current models generalize across hardware design languages with differing structures and semantics. To address this gap, we introduce VHDLSuite, a benchmark-centered infrastructure for scalable VHDL generation evaluation, integrating automated benchmark synthesis, executable validation, and multi-model diagnostic analysis. First, we propose a data pipeline that automatically converts Verilog designs and their accompanying testbenches into executable VHDL benchmark instances, followed by VUnit/GHDL-based validation to ensure each released task is compilable, runnable, and consistently checkable in the VHDL environment. Second, we introduce VHDLBench, a benchmark of over 200 VHDL problems with complete and validated testbenches across a wide range of complexity levels. Third, we extensively evaluate cutting-edge LLMs and uncover key challenges specific to LLM-aided VHDL generation. Our findings provide important insights and support future work in multi-language hardware design automation. Our data pipeline, benchmark, and evaluation framework will be open-sourced.

2606.13802 2026-06-15 cs.SE cs.AI cs.HC cs.LG 交叉投稿

A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets

电子表格中下一步动作预测的基准测试与框架

Tejas Agrawal, Vu Le, Sumit Gulwani, Gust Verbruggen

发表机构 * University of Waterloo（滑铁卢大学）

AI总结针对电子表格缺乏自动补全功能的问题，提出一个基准测试，通过人工整理动作序列和在线评估方法，比较多种预测模型，分析节省的动作数、误报、效率等特性。

Comments Accepted at ICML 2026. Code and benchmark: https://github.com/Tej-55/NAPE

AI中文摘要

预测性代码补全极大地提高了开发人员的工作速度。电子表格的使用远比编程普遍，但其中几乎不存在这类自动补全功能。为了弥补这一差距，我们引入了一个基准测试，面向观察电子表格中用户动作序列并预测未来动作的系统。两个挑战是（1）公共电子表格语料库中缺乏编辑历史，以及（2）电子表格动作空间复杂（空间、时间、复合）。为了解决（1），我们手动整理了52个序列，共包含12K个动作，这些序列重现了公共语料库中的电子表格，并以参数化启发式和LLM精炼的结果作为初始种子。为了解决（2），我们提出了一种在线评估方法：在每个用户动作之后要求给出一个预测，接受或拒绝该预测，在接受时更新后续动作，并重复此过程直到得到目标电子表格。我们使用多个基线预测器（包括零样本LLM、微调SLM和经典模型），并分析了该基准测试揭示的不同特性，包括但不限于：节省的动作与误报的特性、效率、用户画像的影响、触发器的影响以及上下文的影响。

英文摘要

Predictive code completion greatly accelerates how quickly developers work. In spreadsheets, despite being much more common, such auto-completion features are virtually non-existent. To address this gap, we introduce a benchmark for systems that observe a sequence of user actions in a spreadsheet and predict future actions. Two challenges are (1) the absence of edit histories in public spreadsheet corpora and (2) the complex space of spreadsheet actions (spatial, temporal, composite). To address (1), we manually curate 52 sequences of 12K actions that recreate spreadsheets from public corpora, seeded by parametrized heuristics and LLM refinement. To address (2), we propose an online evaluation that expects a prediction after each user action, accepts or rejects that prediction, updates the future actions upon acceptance, and repeats this until the target spreadsheet is obtained. We use multiple baseline predictors (including zero-shot LLMs, fine-tuned SLMs, and classical models) and analyze different properties that our benchmark teaches us, including but not limited to: properties of saved actions and false positives, efficiency, effect of user profiles, effect of triggers, and effect of context.
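
The online evaluation loop described above can be sketched as follows. This is a minimal illustration and not the benchmark's implementation: the action representation (plain strings), the `predictor` callable, and the exact-match acceptance rule are all assumptions, since the abstract does not specify them.

```python
def evaluate_online(actions, predictor):
    """Replay a recorded action sequence and score a predictor online.

    `actions` is the ground-truth sequence; `predictor(history)` returns a
    list of predicted next actions (possibly empty). Both are hypothetical.
    """
    history, i = [], 0
    saved = false_positives = 0
    while i < len(actions):
        history.append(actions[i])  # the user performs one action
        i += 1
        prediction = predictor(history)
        if not prediction:
            continue  # the predictor stayed silent
        # Assumed acceptance rule: the prediction must equal the next actions.
        if actions[i:i + len(prediction)] == prediction:
            history.extend(prediction)  # accepted: the user skips these actions
            saved += len(prediction)
            i += len(prediction)
        else:
            false_positives += 1  # rejected prediction
    return {"saved": saved, "false_positives": false_positives}


def repeat_last(history):
    """Toy baseline: always predict that the last action repeats."""
    return [history[-1]]


result = evaluate_online(["a", "a", "a", "b"], repeat_last)
```

Counting rejected predictions separately from saved actions keeps the cost of interrupting the user visible, which is one of the properties the abstract says the benchmark analyzes.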

2606.13994 2026-06-15 cs.CR cs.AI cs.LG 交叉投稿

什么驱动了CLIP的测试时适应？从更新视角进行的受控实证研究

Jiazhen Huang, Xiao Chen, Zhiming Liu, Yaru Sun, Jingyan Jiang, Zhi Wang

发表机构 * Tsinghua University（清华大学）； Shenzhen Technology University（深圳技术大学）

AI总结本文通过受控实证研究，从更新视角分析了CLIP测试时适应方法的驱动因素，揭示了适应增益主要来自测试时证据和可靠代理，而非繁重优化，并指出无单一范式普遍最优。

AI中文摘要

视觉语言模型（如CLIP）已成为开放词汇识别的标准骨干，但其零样本预测在部署时仍易受分布偏移影响。测试时适应（TTA）最近被扩展到CLIP作为轻量级解决方案，导致TTA4CLIP方法迅速增长。然而，该领域的实证进展在很大程度上超过了我们对真正驱动适应因素、其增益来源以及哪些偏移下保持可靠的理解。本文从追求最先进准确率中退一步，对TTA4CLIP进行了系统性的受控研究。我们首先根据测试时更新的内容，将现有方法组织为三个统一范式。然后，我们引入TTABC，一个开源的CLIP TTA基准，它标准化了评估协议并集成了20多种代表性方法。我们的受控实证分析集中在三个关键领域。首先，我们确定了基于参数方法的驱动因素，揭示适应增益主要由测试时证据和可靠代理驱动，而非繁重优化。其次，我们探索了超越繁重参数调整的证据利用，表明通过跨样本或当前样本证据以及轻量级原型更新可以实现竞争性和高效的性能。最后，我们证明TTA没有银弹：没有单一的适应范式普遍最优，首选范式取决于偏移的性质。我们希望我们的基准和研究能提供对当前TTA4CLIP格局的更清晰理解，并为进一步研究奠定基础。

英文摘要

Vision-Language Models (VLMs) such as CLIP have become a standard backbone for open-vocabulary recognition, yet their zero-shot predictions remain vulnerable to distribution shifts encountered at deployment. Test-Time Adaptation (TTA) has recently been extended to CLIP as a lightweight solution, leading to a rapidly growing body of TTA4CLIP methods. However, empirical progress in this area has largely outpaced our understanding of what truly drives adaptation, where their gains originate, and under which shifts they remain reliable. In this paper, we take a step back from the pursuit of state-of-the-art accuracy and conduct a systematic controlled study of TTA4CLIP. We first organize existing methods into three unified paradigms according to what is updated at test time. We then introduce TTABC, an open-source TTA Benchmark for CLIP, which standardizes evaluation protocols and integrates more than 20 representative methods. Our controlled empirical analysis focuses on three key areas. First, we determine the driving factors in parameter-based methods, revealing that adaptation gains are primarily driven by test-time evidence and reliable proxies rather than heavy optimization. Second, we explore evidence utilization beyond heavy parameter tuning, showing that competitive and efficient performance can be achieved through cross- or current-sample evidence and lightweight prototype updates. Finally, we demonstrate that there is no silver bullet for TTA: no single adaptation paradigm is universally optimal, and the preferred paradigm depends on the nature of shift. We hope our benchmark and study provide a clearer understanding of the current TTA4CLIP landscape and establish a foundation for further research.

2606.14506 2026-06-15 stat.ML cs.LG stat.ME 交叉投稿

基于条件共形检验鞅的分布偏移检测

Shalev Shaer, Yarin Bar, Drew Prinster, Yaniv Romano

发表机构 * Technion - Israel Institute of Technology（以色列理工学院）

AI总结提出一种顺序检验方法，通过固定参考集避免测试污染，利用稳健鞅构造实现任意有效的I型错误控制和渐近功效1，检测速度优于标准共形检验鞅。

AI中文摘要

我们提出了一种用于检测任意分布偏移的顺序检验方法，该方法允许共形检验鞅（CTM）在固定的参考条件设置下工作。现有的CTM检测器通过不断用每个新样本扩展参考集来构建检验鞅，并以此评估新样本相对于过去观测的异常程度。虽然这种设计能实现任意有效的I型错误控制，但它存在测试污染问题：变化发生后，偏移后的观测进入参考集，稀释了分布偏移的证据，增加了检测延迟并降低了功效。相比之下，我们的方法通过将每个新样本与固定的零假设参考数据集进行比较，从设计上避免了污染。我们的主要技术贡献是一种稳健的鞅构造，该构造在条件于零假设参考数据时仍然有效，通过显式考虑有限参考集引起的参考分布估计误差来实现。这实现了任意有效的I型错误控制，同时保证了渐近功效为1和有界期望检测延迟。实验表明，我们的方法比标准CTM更快地检测到偏移，提供了一种强大且可靠的分布偏移检测器。

英文摘要

We propose a sequential test for detecting arbitrary distribution shifts that allows conformal test martingales (CTMs) to work under a fixed, reference-conditional setting. Existing CTM detectors construct test martingales by continually growing a reference set with each incoming sample, using it to assess how atypical the new sample is relative to past observations. While this design yields anytime-valid type-I error control, it suffers from test-time contamination: after a change, post-shift observations enter the reference set and dilute the evidence for distribution shift, increasing detection delay and reducing power. In contrast, our method avoids contamination by design by comparing each new sample to a fixed null reference dataset. Our main technical contribution is a robust martingale construction that remains valid conditional on the null reference data, achieved by explicitly accounting for the estimation error in the reference distribution induced by the finite reference set. This yields anytime-valid type-I error control together with guarantees of asymptotic power one and bounded expected detection delay. Empirically, our method detects shifts faster than standard CTMs, providing a powerful and reliable distribution-shift detector.
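
To make the fixed-reference idea concrete, here is a naive sketch: each incoming nonconformity score is compared with a fixed null reference set, and a power betting martingale accumulates evidence until it crosses 1/alpha. It deliberately omits the paper's robust correction for the estimation error of a finite reference set, so it should not be read as the authors' construction. The function names and the choice of betting function are assumptions.

```python
def conformal_p_value(reference_scores, score):
    """Conservative conformal p-value against a fixed reference set.

    Counts reference scores at least as large as the new score
    (larger score = more atypical); no tie randomization.
    """
    n = len(reference_scores)
    return (1 + sum(1 for r in reference_scores if r >= score)) / (n + 1)


def run_detector(reference_scores, stream_scores, alpha=0.01, epsilon=0.5):
    """Return the 1-based time of the first alarm, or None if none is raised."""
    wealth = 1.0
    for t, score in enumerate(stream_scores, start=1):
        p = conformal_p_value(reference_scores, score)
        # Power betting function epsilon * p**(epsilon - 1): it integrates to 1
        # over (0, 1], so wealth grows only when small p-values keep arriving.
        wealth *= epsilon * p ** (epsilon - 1)
        if wealth >= 1 / alpha:  # alarm threshold from Ville's inequality
            return t
    return None


reference = list(range(1, 100))                     # 99 null scores: 1..99
alarm_shifted = run_detector(reference, [200] * 5)  # clearly shifted stream
alarm_null = run_detector(reference, [50] * 20)     # typical scores
```

Because the reference set never grows, post-shift samples cannot dilute the evidence. The price is that the p-values are no longer exactly uniform conditional on the reference data, which is the gap the abstract says the robust construction addresses.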

2604.14892 2026-06-15 cs.LG cs.AI 版本更新

Can LLMs Accurately Score Medical Diagnoses and Clinical Reasoning?

LLM能否准确评分医学诊断和临床推理？

Amy Rouillard, Sitwala Mundia, Linda Camara, Ziyaad Dangor, Michael Cameron Gramanie, Ismail Kalla, Shabir A. Madhi, Kajal Morar, Marlvin T. Ncube, Haroon Saloojee, Bruce A. Bassett

发表机构 * Wits MIND Institute, University of the Witwatersrand, Johannesburg, South Africa（南非约翰内斯堡威特沃特斯兰德大学Wits MIND研究所）； Grai Labs, Cape Town, South Africa（南非开普敦Grai Labs）； South African Medical Research Council Vaccines and Infectious Diseases Analytics Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa（南非约翰内斯堡威特沃特斯兰德大学健康科学学院，南非医学研究理事会疫苗与传染病分析研究单元）； Department of Internal Medicine, Charlotte Maxeke Johannesburg Academic Hospital, and Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa（南非约翰内斯堡夏洛特·马克塞克约翰内斯堡学术医院内科及威特沃特斯兰德大学健康科学学院）； Department of Paediatrics and Child Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa（南非约翰内斯堡威特沃特斯兰德大学健康科学学院儿科与儿童健康系）

AI总结研究使用LLM陪审团对300例低收入和中等收入国家医院病例的3334个诊断进行评分，发现校准后的LLM评分与专家评分高度一致，且严重错误风险更低，可作为可靠的评估代理。

AI中文摘要

使用专家临床医生小组评估医学AI系统成本高且速度慢，这促使人们使用大型语言模型（LLM）作为替代评判者。在此，我们评估了一个由三个前沿AI模型组成的LLM陪审团，对300个真实低收入和中等收入国家（LMIC）医院病例的3334个诊断进行评分。LLM和临床医生生成的诊断均以专家小组诊断为参照，在四个维度上进行评分：诊断、鉴别诊断、临床推理和负面治疗风险。将LLM陪审团评分与专家小组和独立重新评分小组的评分进行比较，以评估误差指标、评分者间一致性、严重风险错误以及使用保序回归进行事后校准的效果。在我们的数据中，我们发现：（i）未校准的LLM陪审团评分与专家临床医生小组评分保持序数一致性，但系统性地更低；（ii）LLM陪审团出现严重风险错误的概率低于人类专家重新评分小组；（iii）LLM陪审团结合LLM诊断可用于识别高错误风险的诊断，从而实现有针对性的专家审查并提高小组效率；（iv）校准后的LLM陪审团评分和诊断智能体排名与主要专家小组的评分和排名表现出极好的一致性；（v）LLM陪审团模型没有表现出自我偏好偏差：它们对自己底层模型或同一供应商模型生成的诊断评分并不比其他模型生成的诊断更有利（或更不利）。总之，这些结果提供了证据，表明校准后的LLM陪审团是医学AI基准测试中专家临床医生评估的值得信赖且可靠的代理。在其他临床环境中确认这些发现是未来工作的重要方向。

英文摘要

Evaluating medical AI systems using expert clinician panels is costly and slow, motivating the use of large language models (LLMs) as alternative adjudicators. Here, we evaluate an LLM Jury, composed of three frontier AI models, for scoring 3334 diagnoses on 300 real-world low- and middle-income country (LMIC) hospital cases. Both LLM- and clinician-generated diagnoses are scored against expert panel diagnoses across four dimensions: diagnosis, differential diagnosis, clinical reasoning, and negative treatment risk. The LLM Jury scores are compared with expert and independent re-scoring panel scores to assess error metrics, inter-rater agreement, severe-risk errors, and the effect of post hoc calibration using isotonic regression. In our data, we find that: (i) the uncalibrated LLM Jury scores preserve ordinal agreement with the expert clinician panel scores, but are systematically lower; (ii) the probability of severe-risk errors is lower for the LLM Jury than for the human expert re-score panels; (iii) the LLM Jury combined with LLM diagnoses can be used to identify diagnoses at high risk of error, enabling targeted expert review and improved panel efficiency; (iv) the calibrated LLM Jury scores and rankings of diagnosing agents show excellent agreement with those of the primary expert panels; (v) LLM Jury models show no self-preference bias: they did not score diagnoses generated by their own underlying model or by models from the same vendor more (or less) favourably than those generated by other models. Together, these results provide evidence that a calibrated LLM Jury is a trustworthy and reliable proxy for expert clinician evaluation in medical AI benchmarking. Confirming these findings in other clinical settings is an important direction for future work.
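
The post hoc calibration step mentioned above, isotonic regression, can be illustrated with a small pool-adjacent-violators sketch. It shows how systematically low jury scores could be mapped monotonically onto the expert scale. The data, function names, and step-function lookup are illustrative assumptions, not the study's pipeline.

```python
def isotonic_fit(scores, targets):
    """Least-squares non-decreasing fit of targets ordered by scores (PAVA).

    Returns a list of (right_edge, fitted_value) blocks. Tied scores are
    not pooled specially in this sketch.
    """
    blocks = []  # each block: [sum of targets, count, largest score in block]
    for x, y in sorted(zip(scores, targets)):
        blocks.append([y, 1, x])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, right = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = right
    return [(right, total / count) for total, count, right in blocks]


def calibrate(model, score):
    """Map a raw score to the fitted value of the first block that covers it."""
    for right, value in model:
        if score <= right:
            return value
    return model[-1][1]  # above the largest seen score: use the top block


# Hypothetical jury scores and the matching expert scores.
model = isotonic_fit([1, 2, 3, 4], [2, 1, 4, 5])
```

A monotone map preserves the ordinal agreement reported in finding (i) while correcting the systematic offset, which is presumably why an order-preserving calibrator suits this setting.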

2606.12994 2026-06-15 cs.LG cs.CE 版本更新

DeepJEB++: Foundation Model-Driven Large-Scale 3D Engineering Dataset via 2D Latent Space Augmentation

DeepJEB++: 基于基础模型驱动的二维潜空间增强的大规模三维工程数据集

Soyoung Yoo, Leekyo Jeong, Jinsu Ra, Dongeon Lee, Sunwoong Yang, Hyogu Jeong, Namwoo Kang

发表机构 * Cho Chun Shik Graduate School of Mobility, Korea Advanced Institute of Science and Technology（韩国科学技术院赵春植移动研究生院）； Department of Mechanical Engineering, Hanyang University（汉阳大学机械工程系）； Narnia Labs（纳尼亚实验室）

AI总结提出DeepJEB++框架，通过二维潜空间增强和基础模型，将少量喷气发动机支架种子设计扩展为大规模带仿真标签的三维数据集，实现40倍扩展。

Comments 16 pages, 14 figures. Submitted to ASME Journal of Mechanical Design

AI中文摘要

数据驱动的工程设计受到缺乏大规模三维数据集的限制，这些数据集需要将几何形状与基于物理的性能标签配对。特别是，现有的三维数据增强技术在保留微妙且多样的几何变化方面存在局限性，并且自动化后续的仿真标注过程仍然困难，因为边界条件取决于生成的几何形状。我们提出了DeepJEB++，一个基础模型驱动的数据增强框架，在资源受限的情况下将少量喷气发动机支架种子设计扩展为大规模、带仿真标签的三维数据集。我们的关键思想是在数据丰富的二维潜空间中进行增强，然后转移到三维。在第一阶段，我们在多视图渲染上微调预训练的二维潜扩散模型，并通过潜插值合成新视图，通过视觉语言模型（VLM）质量过滤器保留可制造的设计。在第二阶段，经过验证的图像通过领域适应的生成基础模型提升为三维网格。在第三阶段，一个自动化流水线识别每个网格上的载荷和螺栓接口，并分配有限元标签——质量、应力和位移——无需人工干预。我们沿着三个内在轴评估增强质量：可制造性、相对于SimJEB真实值的标签保真度以及分布一致性。从少于400个种子设计开始，DeepJEB++在每阶段使用单个GPU的情况下，生成了15,360个带仿真标签的三维支架——实现了40倍的扩展。该数据集将公开提供，以支持可复现的工程AI研究。

英文摘要

Data-driven engineering design is constrained by the lack of large-scale 3D datasets that pair geometry with physics-based performance labels. In particular, existing 3D data augmentation techniques have limitations in preserving subtle and diverse geometric variations, and it remains difficult to automate the subsequent simulation-labeling process, where boundary conditions vary depending on the generated geometry. We present DeepJEB++, a foundation-model-driven data-augmentation framework that expands a small seed set of jet engine brackets into a large, simulation-labeled 3D dataset under constrained resources. Our key idea is to augment in the data-rich 2D latent space, then transfer to 3D. In Stage 1, we fine-tune a pretrained 2D latent diffusion model on multi-view renders and synthesize novel views by latent interpolation, retaining manufacturable designs through a vision-language-model (VLM) quality filter. In Stage 2, the validated images are lifted to 3D meshes by a domain-adapted generative foundation model. In Stage 3, an automated pipeline recognizes the load and bolt interfaces on each mesh and assigns finite-element labels -- mass, stress, and displacement -- without manual intervention. We assess augmentation quality along three intrinsic axes: manufacturability, label fidelity against the SimJEB ground truth, and distributional consistency. Starting from fewer than 400 seed designs, DeepJEB++ yields 15,360 simulation-labeled 3D brackets -- a 40x expansion -- using a single GPU per stage. The dataset will be made publicly available to support reproducible engineering-AI research.

2606.13221 2026-06-15 cs.LG 版本更新

From Uncertain Judgments to Calibrated Rankings: Conformal Elo Estimation for LLM Evaluation

从不确定判断到校准排名：用于LLM评估的共形Elo估计

Bora Kargi, David Salinas

发表机构 * ELLIS Institute Tübingen（ELLIS 蒂宾根研究所）； OpenEuroLLM

AI总结提出一种两层次校准方法，通过局部不确定性传播和全局共形预测，将LLM-as-a-judge的Elo评分误差降至17.9 MAE，并提供无分布假设的置信区间。

AI中文摘要

评估新的大型语言模型通常需要大规模且昂贵的人工标注。LLM作为评判者提供了一种更便宜的替代方案，但评判者评分存在系统误差（如位置偏差、自我偏好或不可传递性），这些误差可能导致最终排名严重失准。我们在两个互补层面上量化评判者与人类之间的分歧。在局部层面，我们通过将校准的获胜概率而非硬标签传播到Bradley-Terry过程中，从评判者自身的评分差异估计每场对战的不确定性。仅此一项就显著提高了Elo估计的准确性，在LMArena上对55个保留模型取平均时，LLM得出的评分与人类得出的评分之间的平均绝对误差为17.9 Elo。在全局层面，我们将分裂共形预测应用于保留模型上LLM得出的与人类得出的Elo评分之间的残差差距，产生具有无分布边际覆盖保证的预测区间，从而将不可约的LLM-人类分歧纳入考虑。这两层结合产生了一个低成本的评估工具，为开发者提供校准的Elo估计和诚实的不确定性界，而无需大规模人工标注。为促进可重复性，我们在 https://github.com/kargibora/SoftElo 发布代码。

英文摘要

Evaluating new large language models typically requires costly human annotation campaigns at scale. LLM-as-a-judge offers a cheaper alternative, but judge scores carry systematic errors, such as position bias, self-preference, or intransitivity, that can strongly miscalibrate the resulting rankings. We quantify the resulting judge-human disagreement at two complementary levels. At the local level, we estimate per-battle uncertainty from the judge's own score differences by propagating calibrated win probabilities rather than hard labels into the Bradley-Terry procedure. This alone provides a drastic improvement to Elo estimation accuracy, bringing LLM-derived ratings within 17.9 Elo MAE of human-derived ones when averaged over 55 held-out models on LMArena. At the global level, we apply split conformal prediction to the residual gap between LLM-derived and human-derived Elo ratings across held-out models, producing prediction intervals with distribution-free marginal coverage guarantees that account for irreducible LLM-human disagreement. Together, these two layers yield a low-cost evaluation tool that provides developers with calibrated Elo estimates and honest uncertainty bounds, without access to large-scale human annotations. To facilitate reproducibility, we release our code at https://github.com/kargibora/SoftElo.
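
The global layer can be sketched with textbook split conformal prediction. Given the residuals between judge-derived and human-derived Elo on calibration models, it returns a half-width whose interval covers a new exchangeable model's human Elo with probability at least 1 - alpha. Using the absolute residual as the nonconformity score is an assumption here; the abstract does not state the paper's exact choice, and the numbers below are made up.

```python
import math


def split_conformal_halfwidth(residuals, alpha=0.1):
    """Conformal quantile of |residual| with the finite-sample (n + 1) correction."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration models for this alpha
    return sorted(abs(r) for r in residuals)[rank - 1]


def elo_interval(judge_elo, halfwidth):
    """Symmetric prediction interval around a judge-derived Elo rating."""
    return (judge_elo - halfwidth, judge_elo + halfwidth)


# Hypothetical residuals (human Elo minus judge Elo) on nine calibration models.
residuals = [5, -12, 20, -3, 8, 15, -30, 10, 2]
halfwidth = split_conformal_halfwidth(residuals, alpha=0.2)
interval = elo_interval(1200, halfwidth)
```

The guarantee is marginal and relies on the calibration models being exchangeable with the new model. With very few calibration models the interval becomes infinite instead of silently under-covering.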

2601.04646 2026-06-15 cs.IR cs.AI cs.CL cs.LG 版本更新

Succeeding at Scale: Enterprise Retrieval Benchmark Construction and Index-Preserving Query Adaptation for Multi-Tenant Search

规模化成功：面向多租户搜索的企业检索基准构建与索引保持查询适配

Prateek Jain, Shabari S Nair, Ritesh Goru, Prakhar Agarwal, Ajay Yadav, Yoga Sri Varshan Varadharajan, Constantine Caramanis

AI总结针对多租户检索系统中标注数据匮乏和模型更新成本高的问题，提出全自动构建基准DevRev-Search，并研究仅微调查询编码器而保持文档索引不变的索引保持查询适配策略，实现质量与效率的平衡。

AI中文摘要

大规模多租户检索系统生成大量查询日志，但缺乏用于有效领域适应的精心策划的相关性标签，导致大量“暗数据”未被充分利用。模型更新的高成本加剧了这一挑战，因为联合微调查询和文档编码器需要完整的语料库重新索引，这在拥有数千个独立索引的多租户环境中是不切实际的。我们引入了DevRev-Search，这是一个通过完全自动化管道构建的技术客户支持段落检索基准。候选生成使用跨多种稀疏和密集检索器的融合，随后使用LLM作为评判器进行一致性过滤和相关性标记。我们进一步研究并系统评估了索引保持查询适配策略，该策略仅微调查询编码器，同时保持文档索引固定。在DevRev-Search、SciFact和FiQA-2018上的实验表明，参数高效的查询编码器微调提供了显著的质量-效率权衡，实现了可扩展且实用的企业多租户检索。

英文摘要

Large-scale multi-tenant retrieval systems generate extensive query logs but lack curated relevance labels for effective domain adaptation, resulting in substantial underutilized "dark data." This challenge is compounded by the high cost of model updates, as jointly fine-tuning query and document encoders requires full corpus re-indexing, which is impractical in multi-tenant settings with thousands of isolated indices. We introduce DevRev-Search, a passage retrieval benchmark for technical customer support built via a fully automated pipeline. Candidate generation uses fusion across diverse sparse and dense retrievers, followed by an LLM-as-a-Judge for consistency filtering and relevance labeling. We further study and systematically evaluate index-preserving query-only adaptation strategies that fine-tune only the query-encoder while keeping the document indices fixed. Experiments on DevRev-Search, SciFact, and FiQA-2018 show that parameter-efficient fine-tuning of the query encoder delivers a remarkable quality-efficiency trade-off, enabling scalable and practical enterprise multi-tenant retrieval.

2602.00593 2026-06-15 cs.CV cs.LG 版本更新

Pix2Fact: When Vision Is Not Enough -- Benchmarking Fine-Grained VQA with Web Verification on High-Resolution Real-World Scenes

Pix2Fact: 当视觉不够时——基于网络验证的细粒度VQA基准测试

Yifan Jiang, Cong Zhang, Bofei Zhang, Qiaofeng Zheng, Yifan Yang, Bingzhang Wang, Yew-Soon Ong

发表机构 * GADE Union (Global AI Data Experts Union)（GADE联盟（全球人工智能数据专家联盟））； Shanghai Jiao Tong University（上海交通大学）； Nanyang Technological University（南洋理工大学）； New York University（纽约大学）； Cambridge University（剑桥大学）； The University of Hong Kong（香港大学）

AI总结本文提出Pix2Fact基准测试，通过高分辨率真实场景中的网络验证，评估细粒度视觉问答中的专家级视觉感知和知识搜索能力，发现现有模型在复杂任务中存在显著不足。

AI中文摘要

尽管在通用任务上取得了进展，视觉-语言模型（VLMs）在同时需要细粒度视觉定位和外部知识的任务上仍然面临困难，而现有基准测试只是孤立地评估这些能力，忽视了二者的协同。为填补这一空白，我们引入Pix2Fact，一个视觉问答基准测试，旨在评估专家级视觉感知和知识搜索能力。Pix2Fact包含1000张高分辨率（4K+）图像，覆盖八个场景。其问题和答案由来自全球顶尖大学、不同学科且拥有博士学位的标注者精心设计。每个问题都需要细致的视觉定位和外部知识的整合。我们评估了十种最先进的VLMs，包括Gemini-3.1-Pro和GPT-5.4等专有模型，发现Pix2Fact对模型提出了严峻挑战：最先进的模型（Gemini-3.1-Pro）即使可以使用视觉真值和搜索工具，平均准确率也仅为51.7%。我们的分析将低准确率归因于三个因素：即使提供视觉真值仍频繁出现的视觉定位错误、对搜索的浅层利用，以及VLM无法检索长尾、非结构化的局部信息。这种显著的差距暴露了当前模型在协助人类处理需要极强视觉理解能力的现实场景时的局限性。我们相信Pix2Fact将成为推动下一代语言-视觉智能体的关键基准测试，这些智能体能够将细粒度感知与稳健的知识搜索无缝结合。

英文摘要

Despite progress on general tasks, vision-language models (VLMs) still struggle with challenges that demand both fine-grained visual grounding and external knowledge, a synergy overlooked by existing benchmarks that evaluate these abilities in isolation. To fill this void, we introduce Pix2Fact, a visual question-answering benchmark designed to assess expert-level visual perception and knowledge search. Pix2Fact comprises 1,000 high-resolution (4K+) images spanning eight scenarios. Its questions and answers are meticulously crafted by PhD-holding annotators from top global universities across diverse disciplines. Each question requires detailed visual grounding and the integration of external knowledge. Evaluating ten state-of-the-art VLMs, including proprietary models such as Gemini-3.1-Pro and GPT-5.4, we find that Pix2Fact poses a formidable challenge: the most advanced model (Gemini-3.1-Pro) achieves only 51.7% average accuracy, even with access to visual ground truth and search tools. Our analysis attributes this low accuracy to three factors: frequent visual grounding errors even with visual ground truth, shallow use of search, and the VLMs' inability to retrieve long-tail, unstructured local information. This striking gap exposes the limitations of current models in assisting humans with real-world scenarios that demand exceptionally strong visual comprehension. We believe Pix2Fact will serve as a critical benchmark to drive the next generation of language-vision agents that seamlessly integrate fine-grained perception with robust knowledge search.

2602.22822 2026-06-15 cs.AI cs.LG 版本更新

FlexMS: A Unified Public Benchmark for Molecule Tandem Mass Spectrum Prediction

FlexMS：分子串联质谱预测的统一公共基准

Yunhua Zhong, Yixuan Tang, Yifan Li, Pan Liu, Zhiwen Yang, Jie Yang, Jun Xia

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； The Hong Kong University of Science and Technology（香港科技大学）； The University of Hong Kong（香港大学）； Yangzhou University（扬州大学）； Fudan University（复旦大学）

AI总结提出FlexMS基准框架，通过标准化预处理、元数据条件和评估协议，实现跨公共资源的公平比较，并引入难度感知诊断指导模型选择。

Comments preprint version v3

AI中文摘要

串联质谱（MS/MS）在小分子鉴定中至关重要，但当前的深度学习谱预测系统在实际评估和部署中仍存在困难。尽管新架构不断声称达到最先进性能，但不一致的元数据条件和纠缠的预处理流程阻碍了公平的架构比较。此外，现有评估通常局限于精心策划的数据集，未能捕捉真实代谢组学的异质性和跨领域偏移。而且，当前基准缺乏难度感知诊断，对模型在特定计算或数据约束下的行为视而不见。为解决这些问题，我们提出了FlexMS，一个模块化的公共数据基准框架，它在统一协议下标准化跨公共资源的MS/MS预测，同时保留分子编码器、元数据条件、预测头以及下游检索。FlexMS建立了一个公平的评估平台，显著降低了集成新预测工具的门槛。FlexMS不仅优化平均分数，还通过难度感知诊断增强聚合准确性，为不同计算约束、数据规模和下游检索目标下的模型选择提供可操作指导。最终，FlexMS为社区提供了一个可复现的标准，以识别哪些算法结论是稳定的，以及哪些操作点在实践中最为可行。

英文摘要

Tandem mass spectrometry (MS/MS) is central to small molecule identification, but current deep learning systems for spectrum prediction remain difficult to evaluate and deploy in practice. While novel architectures constantly claim state-of-the-art performance, inconsistent metadata conditioning and entangled preprocessing pipelines hinder fair architectural comparisons. Moreover, existing evaluations are often restricted to curated datasets, failing to capture the heterogeneity and cross-domain shifts of real-world metabolomics. Furthermore, current benchmarks lack difficulty-aware diagnostics and are blind to how models behave under specific compute or data constraints. To address this, we present FlexMS, a modular public-data benchmark framework that standardizes MS/MS prediction across public resources while keeping molecular encoders, metadata conditioning, predictor heads, and downstream retrieval under one protocol. FlexMS establishes a fair evaluation playground that significantly lowers the barrier for integrating new predictive tools. Rather than solely optimizing for average scores, FlexMS augments aggregate accuracy with difficulty-aware diagnostics, providing actionable guidance on model selection across different compute constraints, data scales, and downstream retrieval objectives. Ultimately, FlexMS provides the community with a reproducible standard to identify which algorithmic conclusions are stable and which operating points are most viable in practice.

2606.13741 2026-06-15 cs.LG 新提交

High-Frequency Pricing at Scale for E-Commerce

电子商务中的大规模高频定价

Stefan Birr, Tobias Huelden, Mones Raslan, Adele Gouttes, Andreas Schmitt, Mateusz Koren, Johannes Stephan, Robert Streek, Manuel Kunz, Tim Januschowski

发表机构 * Zalando SE ； Databricks

AI总结提出一种预测-优化框架，结合梯度提升树与多目标优化，实现时尚电商促销活动的每日高频定价，通过23次A/B测试验证，利润提升约6%。

AI中文摘要

本文介绍了针对时尚电商促销活动的一种专门的预测-优化算法定价工具的设计、开发和实施。销售活动给定价带来了独特的挑战，包括波动的需求模式、快速的定价决策以及平衡短期收入与长期盈利能力的需要。我们描述了我们的方法，该方法结合了使用梯度提升树的每日分辨率需求预测与一个多目标优化框架，该框架针对超过500万件商品同时最大化长期利润和净商品价值。我们的解决方案通过实现一个预测-优化架构，将定价决策时间从数小时缩短到数分钟，解决了现有周粒度系统的关键局限性。我们通过在2023-2024年期间在欧洲领先的在线时尚零售商Zalando的12个市场中进行的23次A/B测试验证了我们的方法。实验结果表明，与之前的手动-算法混合方法相比，新的定价系统在保持同等销售和收入表现的同时，实现了约6%的更高利润。基于这些结果，该算法已成功部署到生产环境，现在负责公司促销活动中的大部分算法定价决策。

英文摘要

This paper presents the design, development, and implementation of a specialized forecast-then-optimize algorithmic pricing tool for sales campaigns in fashion e-commerce. Sales events present unique challenges for pricing including volatile demand patterns, rapid pricing decisions, and the need to balance short-term revenue with long-term profitability. We describe our approach combining daily-resolution demand forecasting using gradient-boosted trees with a multi-objective optimization framework that maximizes both long-term profit and net merchandise value for more than 5 million articles. Our solution addresses key limitations of existing weekly-granularity systems by implementing a forecast-then-optimize architecture that reduces pricing decision time from hours to minutes. We validate our approach through 23 A/B tests across 12 markets during 2023-2024 sales campaigns at Zalando, one of Europe's leading online fashion retailers. Experimental results demonstrate that the new pricing system achieves approximately 6% higher profit while maintaining equivalent performance on sales and revenue compared to the previous manual-algorithmic hybrid approach. Based on these results, the algorithm was successfully deployed to production and now handles the majority of algorithmic pricing decisions for sales campaigns at the company.
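
The forecast-then-optimize pattern can be sketched in a few lines: a demand forecast is evaluated on a grid of candidate prices, and the price that maximizes a weighted blend of profit and revenue is selected. Everything concrete here is an assumption for illustration. The linear demand curve stands in for the paper's gradient-boosted forecast, revenue stands in for net merchandise value, and the scalar weighting is only one simple way to combine two objectives.

```python
def best_price(candidate_prices, forecast_demand, unit_cost, profit_weight=0.5):
    """Pick the candidate price maximizing a weighted sum of profit and revenue.

    `forecast_demand(price)` is a hypothetical stand-in for a trained
    demand model that returns expected units sold at that price.
    """
    def objective(price):
        units = forecast_demand(price)
        profit = (price - unit_cost) * units
        revenue = price * units  # used here as a proxy for net merchandise value
        return profit_weight * profit + (1 - profit_weight) * revenue

    return max(candidate_prices, key=objective)


def toy_demand(price):
    """Illustrative linear demand curve, not fitted to any data."""
    return max(0, 100 - 2 * price)


prices = [20, 25, 30, 35, 40]
profit_first = best_price(prices, toy_demand, unit_cost=10, profit_weight=1.0)
revenue_first = best_price(prices, toy_demand, unit_cost=10, profit_weight=0.0)
```

Separating the forecast from the optimizer means the objective weights can be changed without retraining the demand model.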

2606.13742 2026-06-15 cs.LG cs.AI physics.comp-ph physics.flu-dyn stat.ML 新提交

A fully GPU-based workflow for building physics emulators of hypersonic flows

基于全GPU工作流构建高超声速流物理仿真器

Fabian Paischer, Dylan Rubini, Deniz A. Bezgin, Aaron B. Buhendwa, David Hauser, Florian Sestak, Johannes Brandstetter, Sebastian Kaltenbach, Nikolaus A. Adams

发表机构 * TU Munich（慕尼黑工业大学）； Institute for Machine Learning, JKU Linz（林茨约翰·开普勒大学机器学习研究所）； ELLIS Unit（ELLIS单元）； EMMI AI

AI总结提出全GPU工作流，集成加速数据生成与不确定性量化增强的神经仿真器训练，通过可微求解器JAX-Fluids实现残差驱动改进，提升物理一致性并支持外推。

Comments First authors contributed equally

AI中文摘要

以高保真度和低计算成本解析复杂物理现象的能力是解决现代工程关键挑战的核心。一个典型例子是高超声速流，其中精确预测全流场拓扑，特别是激波位置和强度，至关重要。然而，超声速和高超声速流仍然是传统降阶模型和神经仿真器的绊脚石，这些模型难以在工业相关应用中物理一致地捕捉流态中的陡峭梯度。为此，我们引入了一个完全基于GPU的工作流，该工作流将加速数据生成与通过不确定性量化和物理感知细化增强的神经仿真器训练相结合。我们的工作流由可微高保真求解器（JAX-Fluids）实现，我们利用该求解器进行快速数据集创建和基于残差的神经仿真器改进，以增强物理一致性。在此框架基础上，我们首先提出了一系列模型架构，并分析了它们的缩放行为以揭示其优缺点。然后，我们表明基于残差的细化使得能够在仅提供网格和输入参数的情况下进行训练，显著降低残差并提高物理一致性。可微仿真和基于残差的细化共同产生了在其训练分布之外仍然可靠的物理仿真器，这是在现实工程设计循环中部署代理的关键要求。

英文摘要

The ability to resolve complex physical phenomena with high fidelity and at low computational cost is central to addressing key challenges in modern engineering. A prime example is hypersonic flow, where the precise prediction of the full flowfield topology, in particular shock wave location and intensity, is critical. Yet supersonic and hypersonic flows continue to be a stumbling block for traditional reduced-order models and neural emulators, which struggle to capture steep gradients in flow states with physical consistency in applications of industrial relevance. To that end, we introduce a fully GPU-based workflow that integrates accelerated data generation with the training of neural emulators augmented by uncertainty quantification and physics-aware refinement. Our workflow is enabled by a differentiable high-fidelity solver (JAX-Fluids), which we employ for rapid dataset creation and residual-based improvement of the neural emulator to enhance physical consistency. Building on this framework, we first present a suite of model architectures and analyze their scaling behavior to expose their strengths and shortcomings. We then show that residual-based refinement enables training on cases where only mesh and input parameters are available, substantially reducing residuals and improving physical consistency. Together, differentiable simulation and residual-based refinement yield physics emulators that remain reliable beyond their training distribution, a key requirement for deploying surrogates in real-world engineering design loops.

2606.13821 2026-06-15 cs.LG 新提交

DTVEM-RE：差分时变效应模型的分层随机效应扩展，用于密集纵向数据中个体特异性多滞后估计

Amartya Bhattacharya

发表机构 * Geisel School of Medicine, Dartmouth College（达特茅斯学院盖泽尔医学院）

AI总结针对DTVEM假设所有人共享相同滞后结构的局限，提出DTVEM-RE扩展，允许个体拥有自己的滞后系数，通过贝叶斯分层VAR和连续时间OU模型实现，模拟和实证表明其能恢复个体间变异并提升预测性能。

AI中文摘要

Jacobson等人（2019）提出的差分时变效应模型（DTVEM）是寻找密集纵向数据中最佳时间滞后的流行工具，但它假设所有人共享相同的滞后结构。原作者将此问题列为未来工作，这与现代临床研究的前提——个体存在差异——相冲突。我们提出DTVEM-RE，一种允许每个人拥有自己滞后系数的扩展，包含两种确认步骤版本：在Stan中实现的离散时间分层贝叶斯VAR，它在个体间进行信息汇集并提供校准的不确定性；以及在ctsem中实现的连续时间个体Ornstein-Uhlenbeck模型，它直接处理不均匀间隔的测量点。我们报告了四个结果。模拟显示，贝叶斯版本恢复个体间变异tau_a的偏差低于0.01，覆盖率为90%至93%。在Fisher等人（2017）的EMA数据集（N=40）上，个体特异性滞后1效应在三个情绪项目上相差一个数量级，贝叶斯和GAMM估计高度一致（r=0.87至0.92），且DTVEM-RE在四种离散时间方法中给出最佳的一步预测。多滞后版本显示所有九个tau_k值的可信区间均排除零，且个体差异最大的滞后在不同项目间变化，这是仅考虑滞后1的方法（如mlVAR）无法检测到的。最后，两个版本在个体特异性滞后1估计上几乎完全一致（r >= 0.995），差异仅如收缩所预测。据我们所知，DTVEM-RE是DTVEM风格滞后检测的第一个个体特异性实现，并且它包含标准DTVEM作为特例。

英文摘要

The Differential Time-Varying Effect Model (DTVEM) of Jacobson et al. (2019) is a popular tool for finding the best time lag in intensive longitudinal data, but it assumes everyone shares the same lag structure. The original authors named fixing this as future work, and it clashes with the premise of modern clinical research, which is that people differ. We present DTVEM-RE, an extension that lets each person have their own lag coefficients, with two versions of the confirmatory step: a discrete-time hierarchical Bayesian VAR in Stan, which pools across people and gives calibrated uncertainty, and a continuous-time per-person Ornstein-Uhlenbeck model in ctsem, which handles unevenly spaced beeps directly. We report four results. A simulation shows the Bayesian version recovers the between-person spread tau_a with bias below 0.01 and coverage of 90 to 93 percent. On the Fisher et al. (2017) EMA dataset (N=40), person-specific lag-1 effects vary by an order of magnitude across three mood items, the Bayesian and GAMM estimates agree closely (r=0.87 to 0.92), and DTVEM-RE gives the best one-step-ahead prediction among four discrete-time methods. A multi-lag version shows all nine tau_k values have credible intervals excluding zero, and the lag where people differ most changes across items, something lag-1-only methods like mlVAR cannot detect. Finally, the two versions agree almost exactly on person-specific lag-1 estimates (r >= 0.995), differing only as shrinkage predicts. DTVEM-RE is, to our knowledge, the first person-specific implementation of DTVEM-style lag detection, and it contains standard DTVEM as a special case.

2606.14149 2026-06-15 cs.LG 新提交

Trust but Verify: Mitigating Medical Hallucinations via Post-Hoc Adversarial Auditing and Multi-Agent Feedback Loops

信任但验证：通过事后对抗审计和多智能体反馈循环减轻医学幻觉

Muhammad Osama, Maheera Amjad, Zartasha Mustansar, Arslan Shaukat, Muhammad U. S. Khan

发表机构 * Data Science and Machine Learning Lab, SINES, NUST（NUST SINES数据科学与机器学习实验室）； SINES, NUST（NUST SINES）； CEME, NUST（NUST CEME）

AI总结本研究提出一种五智能体“信任但验证”系统，通过事后对抗审计和多智能体反馈循环，将大型语言模型在临床问题中推荐禁用药品的幻觉错误率降低约53%。

详情

AI中文摘要

大型语言模型（LLM）越来越多地部署在医疗环境中，但其产生幻觉的倾向在涉及临床决策时带来风险。本研究考察LLM在回答临床问题时是否会推荐近期被禁止或撤回的药品，并测试一种基于智能体的方法来减少此类错误。我们使用单一LLM骨干开发了一个五智能体“信任但验证”系统。为了衡量监管知识过时性，我们创建了一个包含103个临床多项选择题的对抗数据集，其中历史上正确的答案现在指向禁用物质。该规模确保了跨各种治疗类别的统计显著性。我们评估了三个开放访问模型家族（GPT-OSS、Llama-3、Falcon-3）在原始和智能体条件下的表现。通过逐点得分、标签准确率、幻觉错误率（HER）和组件保真度（CF）得分来衡量性能。我们还观察到专有模型中的临床安全性退化。在默认配置下，所有模型都显示出高幻觉率，一致地选择了与训练数据模式匹配的禁用药物。我们提出的智能体架构将各模型的HER降低了约53%。逐点得分从-0.25（不安全推荐）转向0.0（适当拒绝）。即使模型的参数知识倾向于禁用物质，安全审计也能拦截危险输出。所提出的多智能体框架提供了一种模型无关的方法来强制执行监管合规性，优先考虑患者安全而非流畅的文本生成。我们的工作展示了在安全关键的医疗环境中部署自主AI系统的实用方法，并说明了如何将实时监管数据集成到LLM流水线中以支持临床决策。

英文摘要

Large Language Models (LLMs) are increasingly deployed in healthcare settings, yet their tendency to hallucinate poses risks when clinical decisions are involved. This study examines whether LLMs recommend recently banned or withdrawn pharmaceuticals when answering clinical questions and tests an agent-based method for reducing such errors. We developed a five-agent "Trust but Verify" system using a single LLM backbone. To measure regulatory knowledge obsolescence, we created an adversarial dataset of 103 clinical MCQs where historically correct answers now refer to banned substances. This scale ensures statistical significance across various therapeutic classes. We evaluated three open-access model families (GPT-OSS, Llama-3, Falcon-3) under vanilla and agentic conditions. Performance was measured via pointwise score, label accuracy, Hallucination Error Rate (HER), and Component Fidelity (CF) score. We also observed clinical safety regression in proprietary models. In default configurations, all models showed high hallucination rates, consistently selecting banned drugs that matched training data patterns. Our proposed agentic architecture reduced HER by approximately 53% across models. Pointwise scores shifted from -0.25 (unsafe recommendation) toward 0.0 (appropriate refusal). The safety audit intercepted dangerous outputs even when models' parametric knowledge favored the banned substance. The proposed multi-agent framework offers a model-agnostic method for enforcing regulatory compliance that prioritizes patient safety over fluent text generation. Our work demonstrates a practical approach for deploying autonomous AI systems in safety-critical healthcare settings. It shows how real-time regulatory data can be integrated into LLM pipelines to support clinical decision-making.

2606.14157 2026-06-15 cs.LG cs.AI 新提交

Learning Urban Access Costs from Origin-Destination Flows via Inverse Optimal Transport

通过逆最优传输从起点-终点流中学习城市访问成本

Paula Joy B. Martinez

发表机构 * GitHub

AI总结提出逆最优传输模型从学校间入学流中恢复潜在选择成本，应用于菲律宾283,016条学生流动数据，估计补贴等效距离以优化城市服务分配。

Comments Oral Presentation. 2026 International Conference on Urban AI

详情

AI中文摘要

城市通过混合公私设施网络提供基本服务，包括学校、诊所、交通提供者和补贴服务点。在这些系统中，规划者通常观察到家庭去哪里，但看不到他们权衡距离、价格和机构访问等因素的潜在成本函数。我们通过菲律宾的学校选择来研究这个城市问题，该国最大的国家教育补贴旨在将学习者从拥挤的公立学校转移到参与计划的私立学校。将学校到学校的入学流视为熵最优传输计划，我们使用两种互补的逆最优传输模型恢复潜在选择成本：一个带有补贴项的可解释距离带模型，以及一个通过可微分Sinkhorn前向传递训练的神经成本模型。应用于人口最多地区23,820条观测流中的283,016次学习者出行，该框架估计了一个补贴等效距离$\lambda^{(k)}$，解释为补贴抵消的感知旅行成本公里数。该案例展示了如何将行政起点-终点数据转化为可解释的规划指标，用于可访问性感知的补贴设计、设施选址和城市服务分配。

英文摘要

Cities deliver basic services through mixed public-private facility networks, including schools, clinics, transit providers, and subsidized service points. In these systems, planners often observe where households go, but not the latent cost function through which they trade off factors such as distance, price, and institutional access. We study this urban problem through school choice in the Philippines, where the country's largest national education subsidy is intended to redirect learners from congested public schools to participating private schools. Treating school-to-school enrollment flows as an entropic optimal transport plan, we recover latent choice costs using two complementary inverse optimal transport models: an interpretable distance-banded model with a subsidy term, and a neural cost model trained through a differentiable Sinkhorn forward pass. Applied to 283,016 learner trips across 23,820 observed flows in the most populated region, the framework estimates a subsidy-equivalent distance, $\lambda^{(k)}$, interpreted as the kilometers of perceived travel cost offset by the subsidy. The case demonstrates how administrative origin-destination data can be transformed into interpretable planning metrics for accessibility-aware subsidy design, facility siting, and urban service allocation.
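
The Sinkhorn forward pass mentioned above can be sketched in a few lines. This is a generic entropic optimal transport solver and not the paper's model: the toy distances, the `eligible` mask, the value of `lam`, and the cost form `distance - lam * eligible` are all assumptions made for illustration. The inverse problem (fitting the cost so the plan matches observed flows) is not shown.

```python
import math


def sinkhorn_plan(cost, row_mass, col_mass, eps=0.5, n_iter=500):
    """Entropic OT plan P = diag(u) K diag(v), with K = exp(-cost / eps)."""
    n, m = len(row_mass), len(col_mass)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [row_mass[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy problem: 2 origin schools, 2 destination schools.
distance_km = [[1.0, 4.0], [3.0, 2.0]]
eligible = [[0.0, 1.0], [0.0, 0.0]]  # assumed: only the 0 -> 1 flow is subsidised
lam = 1.5                            # assumed subsidy-equivalent distance in km
cost = [[distance_km[i][j] - lam * eligible[i][j] for j in range(2)]
        for i in range(2)]

origin_share = [0.6, 0.4]
destination_share = [0.5, 0.5]
plan = sinkhorn_plan(cost, origin_share, destination_share)
plan_no_subsidy = sinkhorn_plan(distance_km, origin_share, destination_share)
```

In this toy setting the subsidised pair receives more flow than it does without the subsidy term, which is the kind of signal an inverse model would use to estimate the offset. Note that a cost term depending only on the destination would be absorbed by the column scaling and leave the plan unchanged, which is why the sketch makes the subsidy pair-specific.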

2606.14159 2026-06-15 cs.LG q-bio.BM 新提交

Curvature-Guided Geometric Representation for Protein-Ligand Binding Affinity Prediction

曲率引导的几何表示用于蛋白质-配体结合亲和力预测

Shuai Li, Chuan-Xian Ren, Yuhao Li, Ziqi Huang, Yue Pan, Mingzhe Tang, Hong Yan

发表机构 * School of Mathematics, Sun Yat-sen University（中山大学数学学院）； Department of Electrical Engineering, City University of Hong Kong（香港城市大学电机工程系）

AI总结提出RicciBind框架，利用里奇曲率捕捉局部相互作用紧密度，结合最优传输实现跨域对齐，提升结合亲和力预测的准确性与可解释性。

详情

AI中文摘要

蛋白质-配体结合亲和力（PLA）预测在药物发现中至关重要。尽管基于机器学习的方法取得了显著进展，现有方法难以联合表征局部几何组织和全局协调的跨分子相互作用，限制了其对复杂结合机制建模的能力。在此，我们提出RicciBind，一个几何表示框架，它整合了曲率引导的层次结构学习与基于最优传输（OT）的跨域对齐，以建模分子相互作用。具体而言，RicciBind利用里奇曲率捕捉分子结构内的局部相互作用紧密度，增强结构感知，并将原子相互作用组织成曲率感知的层次表示。然后，基于OT的聚类匹配机制在几何约束下对齐异质域中的蛋白质和配体聚类，实现全局一致的对应关系，并揭示超出局部邻域的高阶相互作用模式。通过将曲率引导的结构编码与OT驱动的跨域对齐相结合，RicciBind有效建模了复杂的相互作用语义，并显著提高了结合亲和力预测的准确性和可解释性。大量实验表明，RicciBind在PLA基准和虚拟筛选任务中取得了优越的预测性能和泛化能力。消融研究进一步证实了里奇曲率在增强分子相互作用表示中的关键作用。

英文摘要

Protein-ligand binding affinity (PLA) prediction is critical in drug discovery. Despite the notable advancements in machine learning-based approaches, existing methods struggle to jointly characterize local geometric organization and globally coordinated cross-molecular interactions, limiting their ability to model complex binding mechanisms. Here, we propose RicciBind, a geometric representation framework that integrates curvature-guided hierarchical structure learning with optimal transport (OT)-based cross-domain alignment to model molecular interactions. Specifically, RicciBind leverages Ricci curvature to capture local interaction tightness within molecular structures, enhancing structural awareness and organizing atomic interactions into curvature-aware hierarchical representations. An OT-based cluster matching mechanism then aligns protein and ligand clusters across heterogeneous domains under geometric constraints, enabling globally consistent correspondences and revealing higher-order interaction patterns beyond local neighborhoods. By coupling curvature-guided structure encoding with OT-driven cross-domain alignment, RicciBind effectively models complex interaction semantics and substantially improves both the accuracy and interpretability of binding affinity prediction. Extensive experiments demonstrate that RicciBind achieved superior predictive performance and generalization across PLA benchmarks and virtual screening tasks. Ablation studies further confirmed the essential role of Ricci curvature in enhancing molecular interaction representations.

2606.14169 2026-06-15 cs.LG 新提交

Machine Learning for Biomedical Raman Spectroscopy: From Spectral Acquisition to Clinical Translation

生物医学拉曼光谱的机器学习：从光谱采集到临床转化

Bogdan Oancea, Ana Maria Seciu-Grama, Nicoleta Siminea, Laura Mihaela Stefan, Alice Stoica, Joel Sjoberg, Marian Necula, Ana-Maria Prelipcean, Corneliu Ovidiu Vrancianu, Eduard Milea, Andrei Păun, Ion Petre, Mihaela Păun

发表机构 * National Institute of Research and Development for Biological Sciences（罗马尼亚生物科学研究院）； University of Bucharest（布加勒斯特大学）； University of Turku（图尔库大学）

AI总结综述机器学习在生物医学拉曼光谱全流程中的应用，包括预处理、诊断分类、可解释性分析及临床转化障碍，强调标准化与鲁棒验证的必要性。

Comments 52 pages, 2 figures

详情

AI中文摘要

拉曼光谱能够无标记、化学特异性地表征生物系统，已成为癌症诊断、分子分型、微生物鉴定和术中决策支持的重要工具。然而，生物医学拉曼光谱具有高维、噪声大、受荧光背景、采集变异性和生物异质性影响的特点，因此鲁棒的计算分析至关重要。本综述考察了机器学习在生物医学拉曼光谱全流程中的作用，从预处理和信号校正到无监督结构发现、监督诊断和分子分层、表示学习和迁移学习、可解释性、生物标志物发现以及与成像、病理学和分子谱分析的多模态整合。重点强调机器学习不仅用于诊断分类，还用于生物学可解释和临床可操作的分析。我们还讨论了临床转化的主要障碍，包括数据集规模有限、仪器间变异性、预处理不一致、外部验证不足、可重复性问题以及软件、数据和元数据共享有限。我们认为，进展需要方法学进步以及标准化、鲁棒验证、可解释性和可部署分析框架。通过整合方法学、生物医学和转化视角，本综述概述了开发可靠且临床可部署的拉曼-人工智能系统的关键方向。

英文摘要

Raman spectroscopy provides label-free, chemically specific characterization of biological systems and has become an important tool for cancer diagnosis, molecular subtyping, microbiological identification, and intraoperative decision support. Biomedical Raman spectra are, however, high-dimensional, noisy, and affected by fluorescence background, acquisition variability, and biological heterogeneity, making robust computational analysis essential. This review examines the role of machine learning across the biomedical Raman spectroscopy pipeline, from preprocessing and signal correction to unsupervised structure discovery, supervised diagnosis and molecular stratification, representation and transfer learning, explainability, biomarker discovery, and multimodal integration with imaging, pathology, and molecular profiling. Emphasis is placed on the use of machine learning not only for diagnostic classification, but also for biologically interpretable and clinically actionable analysis. We also discuss the main barriers to clinical translation, including limited dataset sizes, inter-instrument variability, inconsistent preprocessing, insufficient external validation, reproducibility concerns, and limited sharing of software, data, and metadata. We argue that progress will require methodological advances together with standardization, robust validation, explainability, and deployment-ready analytical frameworks. By integrating methodological, biomedical, and translational perspectives, this review outlines key directions for developing reliable and clinically deployable Raman-AI systems.

2606.14217 2026-06-15 cs.LG q-bio.BM 新提交

Curvature-Informed Potential Energy Surface for Protein-Ligand Binding Affinity Prediction

曲率信息势能面用于蛋白质-配体结合亲和力预测

Peng-Fei Sun, Chuan-Xian Ren, Hong Yan

发表机构 * Sun Yat-Sen University（中山大学）； City University of Hong Kong（香港城市大学）

AI总结提出曲率信息势能面图神经网络CPES，通过物理启发的曲率表示建模构象柔性，结合光谱交叉注意力捕获结合诱导的动力学变化，提升亲和力预测性能。

详情

AI中文摘要

准确预测蛋白质-配体结合亲和力对于基于结构的药物发现至关重要。最近的几何深度学习方法通过将蛋白质-配体复合物表示为三维图，取得了有前景的性能。然而，大多数现有方法主要依赖于来自单一结合构象的静态相互作用几何，而忽略了分子柔性和结合诱导的构象变化。为了解决这一局限性，我们提出了一种曲率信息势能面（CPES）图神经网络用于蛋白质-配体结合亲和力预测，该网络结合了物理启发的曲率表示来建模构象柔性。CPES首先从平衡构型下评估的势能面Hessian矩阵导出曲率谱描述符，其特征值定义了势能面的局部主曲率。然后，它使用光谱交叉注意力来比较未结合的配体和蛋白质与结合复合物，从而捕获结合诱导的构象动力学变化。同时，通过几何感知消息传递、软聚类和双向交叉注意力，从静态结构特征中学习层次化的蛋白质-配体相互作用表示。最后，CPES融合曲率信息动态表示与静态相互作用表示进行亲和力回归。在多个基准数据集上的广泛评估表明，CPES实现了改进的预测性能并提供了物理可解释性。

英文摘要

Accurate prediction of protein-ligand binding affinity is essential for structure-based drug discovery. Recent geometric deep learning methods have achieved promising performance by representing protein-ligand complexes as three-dimensional graphs. However, most existing approaches mainly rely on static interaction geometry from a single bound conformation, while neglecting molecular flexibility and binding-induced conformational changes. To address this limitation, we propose a curvature-informed potential energy surface (CPES) graph neural network for protein-ligand binding affinity prediction, which incorporates physics-informed curvature representations to model conformational flexibility. CPES first derives curvature spectral descriptors from the Hessian of the potential energy surface evaluated at equilibrium configurations, whose eigenvalues define the local principal curvatures of the potential energy surface. It then uses spectral cross-attention to compare the unbound ligand and protein with the bound complex, thereby capturing binding-induced changes in conformational dynamics. In parallel, hierarchical protein-ligand interaction representations are learned from static structural features through geometry-aware message passing, soft clustering, and bidirectional cross-attention. Finally, CPES fuses the curvature-informed dynamic representations with static interaction representations for affinity regression. Extensive evaluations on multiple benchmark datasets demonstrate that CPES achieves improved predictive performance and offers physical interpretability.
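
The link between the Hessian and local curvature can be checked on a toy two-dimensional energy. This is a minimal sketch under stated assumptions: `toy_energy` is a made-up quadratic, the finite-difference Hessian stands in for whatever force field the authors use, and nothing here reproduces the CPES descriptors themselves.

```python
import math


def hessian_2d(f, x, y, h=1e-4):
    """Central finite-difference Hessian entries (fxx, fxy, fyy) of f at (x, y)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h ** 2)
    return fxx, fxy, fyy


def symmetric_eigenvalues(fxx, fxy, fyy):
    """Closed-form eigenvalues of the symmetric matrix [[fxx, fxy], [fxy, fyy]]."""
    mean = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    return mean - radius, mean + radius


def toy_energy(x, y):
    # Hypothetical potential with its minimum (equilibrium) at the origin.
    return 0.5 * (3.0 * x ** 2 + 1.0 * y ** 2) + 0.5 * x * y


# At an equilibrium the gradient vanishes, so the Hessian eigenvalues are the
# principal curvatures of the energy surface: a stiff and a soft direction.
soft, stiff = symmetric_eigenvalues(*hessian_2d(toy_energy, 0.0, 0.0))
```

Small eigenvalues mark directions in which the structure deforms cheaply, which is one plausible reading of how a curvature spectrum encodes flexibility.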

2606.14245 2026-06-15 cs.LG 新提交

专家驱动的生存机器：改善多个临床队列中的分层与可解释性

Farica Zhuang, Zixuan Wen, Christos Davatzikos, Li Shen

发表机构 * University of Pennsylvania（宾夕法尼亚大学）

AI总结提出一种基于混合专家模型的自适应深度聚类生存框架（AdaCSM），通过路由专家机制实现条件专业化，动态分配患者到专门的风险预测器，提升生存预测性能和可解释性。

详情

DOI: 10.1145/3807503.3819574

AI中文摘要

生存预测在医疗提供者和临床研究中扮演核心角色。准确的风险分层能够实现早期干预并改善患者管理。大多数现有的深度生存模型为所有患者学习一个共同的特征表示，这可能掩盖患者亚组之间的重要差异。相比之下，混合专家（MoE）框架允许模型的不同部分关注不同的患者模式，从而产生更个性化的表示。因此，在这项工作中，我们提出了一种混合专家增强的自适应深度聚类生存框架（AdaCSM），用于建模这种异质性生存模式。我们引入了一种基于路由的专家机制，该机制在参数化生存建模框架内实现条件专业化。所提出的架构动态地将患者分配给专门的风险预测器，同时保留患者生存和亚型聚类目标。我们在跨越不同疾病领域的多个真实世界纵向临床队列上，将我们的方法与最先进的生存和深度聚类模型进行了比较。所提出的方法在生存分析中展示了改进的预测性能并产生了可解释的结果。

英文摘要

Survival prediction plays a central role for healthcare providers and clinical researchers. Accurate risk stratification enables early intervention and improved patient management. Most existing deep survival models learn one common feature representation for all patients, which may hide important differences between patient subgroups. In contrast, a Mixture-of-Experts (MoE) framework allows different parts of the model to focus on different patient patterns, leading to more individualized representations. Therefore, in this work, we propose a mixture-of-experts enhanced adaptive deep clustering survival framework (AdaCSM) for modeling such heterogeneous survival patterns. We introduce a routing-based expert mechanism that enables conditional specialization within a parametric survival modeling framework. The proposed architecture allocates patients to specialized risk predictors dynamically while preserving the patient survival and subtype clustering objectives. We compare our method with state-of-the-art survival and deep clustering models on multiple real-world longitudinal clinical cohorts spanning diverse disease domains. The proposed method demonstrates improved predictive performance and leads to interpretable results in survival analysis.

2606.13682 2026-06-15 cs.AI cs.LG 交叉投稿

A Deep Reinforcement Learning (DRL)-Based Transformer Method for Solving the Open Shop Scheduling Problem

基于深度强化学习的Transformer方法求解开放车间调度问题

Faezeh Ardali, Mwembezi A. Nyelele, Gerald M. Knapp

发表机构 * Louisiana State University（路易斯安那州立大学）； University of Minnesota Duluth（明尼苏达大学杜鲁斯分校）

AI总结提出一种基于Transformer编码器-解码器架构的调度策略，仅以加工时间矩阵为输入，在Taillard小规模实例上训练后可直接推广至40x40至100x100的大规模问题，与经典调度规则相比具有竞争力。

详情

AI中文摘要

开放车间调度问题（OSSP）出现在许多工业和服务环境中，但随着作业和机器数量的增加，其计算难度仍然很大。精确方法很快变得难以处理，而经典调度规则和元启发式方法可能需要大量调整才能在大规模下保持解的质量。本研究开发了一种基于Transformer的OSSP调度策略，采用具有多头注意力的编码器-解码器架构。该模型在Taillard基准实例（4x4、5x5、7x7和10x10）上训练，仅以加工时间矩阵作为输入，生成的可行调度的makespan通常与最佳已知值相差15-30%以内。为了评估可扩展性，将训练好的策略无需重新训练直接应用于从40x40到100x100随机生成的实例，并与经典调度启发式方法（包括SPT、LPT、MWKR和EST）进行比较。在这些大规模实例中，Transformer相对于标准下界实现了12.89-15.12%的平均差距。与EST相比，Transformer保持了竞争力，通常差距较小，同时显著优于SPT和LPT。这些结果表明，在小规模OSSP实例上训练的Transformer策略可以推广到更大规模的问题，并提供一种轻量级、基于学习的替代经典调度规则的方法。

英文摘要

The open shop scheduling problem (OSSP) arises in many industrial and service settings but remains computationally challenging as the number of jobs and machines increases. While exact methods quickly become intractable, classical dispatching rules and metaheuristics may require substantial tuning to maintain solution quality at large scales. This study develops a Transformer-based scheduling policy for OSSP using an encoder-decoder architecture with multi-head attention. The model is trained on Taillard benchmark instances (4x4, 5x5, 7x7, and 10x10) using only the processing-time matrix as input and produces feasible schedules with makespans typically within 15-30% of best-known values. To evaluate scalability, the trained policy is applied without retraining to randomly generated instances from 40x40 to 100x100 and compared against classical dispatching heuristics, including SPT, LPT, MWKR, and EST. Across these large instances, the Transformer achieved average gaps of 12.89-15.12% relative to a standard lower bound. Compared with EST, the Transformer remained competitive, typically within a modest margin, while substantially outperforming SPT and LPT. These results indicate that a Transformer policy trained on small OSSP instances can generalize to substantially larger problems and provide a feature-light, learning-based alternative to classical dispatching rules.
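
To make the baseline comparison concrete, the sketch below shows a simplified priority-list scheduler for an open shop together with the usual machine-load and job-load lower bound. It is an illustration only: the abstract does not specify how SPT or LPT were implemented or which lower bound was used, and the 2x2 processing-time matrix is invented.

```python
def dispatch_makespan(proc, longest_first=False):
    """Schedule every (job, machine) operation in priority order.

    proc[j][k] is the processing time of job j on machine k. Operations are
    sorted by processing time (shortest first by default) and each one starts
    as soon as both its job and its machine are free.
    """
    n_jobs, n_machines = len(proc), len(proc[0])
    ops = [(proc[j][k], j, k) for j in range(n_jobs) for k in range(n_machines)]
    ops.sort(reverse=longest_first)
    job_free = [0] * n_jobs
    machine_free = [0] * n_machines
    for duration, j, k in ops:
        start = max(job_free[j], machine_free[k])
        job_free[j] = machine_free[k] = start + duration
    return max(job_free)


def lower_bound(proc):
    """No schedule can beat the busiest job or the busiest machine."""
    job_load = max(sum(row) for row in proc)
    machine_load = max(sum(col) for col in zip(*proc))
    return max(job_load, machine_load)


def gap(makespan, bound):
    """Relative gap to the lower bound, as a fraction."""
    return (makespan - bound) / bound


toy = [[3, 1], [2, 4]]  # hypothetical 2 jobs x 2 machines instance
spt = dispatch_makespan(toy)
lpt = dispatch_makespan(toy, longest_first=True)
spt_gap = gap(spt, lower_bound(toy))
```

On larger random instances the two rules generally diverge, and the gap to the lower bound is the quantity the abstract reports as 12.89-15.12% for the learned policy.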

2606.13695 2026-06-15 physics.geo-ph cs.AI cs.LG 交叉投稿

Korzhinskii-Net: Physics-Informed Neural Network for Sub-Surface Mineral Prospectivity Modelling

Korzhinskii-Net: 用于地下矿产潜力建模的物理信息神经网络

Boris Kriuk

发表机构 * The Hong Kong University of Science and Technology（香港科技大学）

AI总结提出Korzhinskii-Net，一种耦合达西流、热输运和反应速率的二维径向物理信息神经网络，在五个矿省四个矿种上平均PR-AUC达0.885，显著优于传统基线。

Comments 12 pages, 7 figures, 3 tables

详情

AI中文摘要

矿产潜力建模（MPM）支撑着勘探经济学，然而大多数操作流程简化为基于浅表地表代理训练的数据驱动分类器。这类模型对实际定位矿石的地下物理过程（热平流、流体流动和岩性依赖的沉淀）视而不见。我们提出Korzhinskii-Net，一个二维径向物理信息神经网络（PINN），它将达西流、平流-扩散热输运和softplus饱和反应速率耦合到一个可微的正演模型中，并由地表和遥感代理弱监督。该网络以Dmitri S. Korzhinskii（1899-1985）命名，其渗滤交代作用理论提供了物理框架。我们在涵盖四种矿种的五个矿省上评估Korzhinskii-Net，即诺里尔斯克（Ni-Cu-PGE）、佩琴加（Ni-Cu硫化物）、乌多坎（砂岩型Cu）、苏霍伊洛格（造山型Au）和米尔内（金伯利岩型钻石），评估采用公平、泄漏控制的5折交叉验证协议（含硬环形负样本）。Korzhinskii-Net的平均PR-AUC为0.885，而最强经典基线（梯度提升）为0.281；平均分位数排名为0.019，对比基线为0.413。这一改进在所有五个矿省和四个矿种系统中一致，表明即使仅受全球开放数据代理约束，物理信息可微模拟器也能恢复纯特征学习器系统性地遗漏的定位模式。我们将完整流程和评估工具开源。

英文摘要

Mineral prospectivity modelling (MPM) underpins exploration economics, yet most operational pipelines reduce to data-driven classifiers trained on shallow surface proxies. Such models are blind to the subsurface physics that actually localises ore: heat advection, fluid flow, and lithology-dependent precipitation. We present Korzhinskii-Net, a 2-D radial physics-informed neural network (PINN) that couples Darcy flow, advective-diffusive heat transport, and a softplus-saturated reaction rate into a single differentiable forward model, weakly supervised by surface and remote-sensing proxies. The network is named after Dmitri S. Korzhinskii (1899-1985), whose theory of infiltration metasomatism provides the physical scaffold. We evaluate Korzhinskii-Net on five ore provinces spanning four commodity classes -- Norilsk (Ni-Cu-PGE), Pechenga (Ni-Cu sulphide), Udokan (sandstone-hosted Cu), Sukhoi Log (orogenic Au), and Mirny (kimberlitic diamond) -- under a fair, leakage-controlled 5-fold cross-validation protocol with hard ring-shaped negatives. Korzhinskii-Net attains a mean PR-AUC of 0.885 versus 0.281 for the strongest classical baseline (gradient boosting), and a mean fractional rank of 0.019 versus 0.413. The improvement is consistent across all five provinces and four commodity systems, suggesting that physics-informed differentiable simulators, even when constrained only by global open-data proxies, can recover localisation patterns that pure feature-based learners systematically miss. We release the full pipeline and evaluation harness as open source.

2606.13696 2026-06-15 cs.CY cs.LG cs.MA cs.SI 交叉投稿

AGORA: Can Deliberation and Governance Gates Absorb Participation Bias in Transit Planning?

AGORA: 审议与治理门能否吸收公交规划中的参与偏差？

Jung-Hoon Cho, Cathy Wu

发表机构 * Department of Civil and Environmental Engineering and Laboratory for Information & Decision Systems, Massachusetts Institute of Technology（土木与环境工程系和信息与决策系统实验室，麻省理工学院）； Institute for Data, Systems, and Society, Massachusetts Institute of Technology（数据、系统与社会研究所，麻省理工学院）

AI总结提出AGORA框架，通过固定网络、需求和求解器，系统变化会议组成、结构化审议和治理门，发现审议是参与影响结果的关键机制，治理门可压缩跨剖面方差，将参与偏差从不可控输入重构为过程设计问题。

详情

AI中文摘要

公交网络设计不仅依赖于优化算法，还取决于谁出现在公众听证会上。当前实践通常收集来自自选参与者的单向评论，使参与者构成成为结果变化的不可控来源。我们提出AGORA框架，该框架固定网络、需求和求解器，同时通过利益相关者代理、结构化审议和治理门系统变化会议组成。在两个不同规模的标准基准网络上，我们发现：(i) 总体结果在不同构成之间变化很小，但在尾部风险和公平性差异方面，代表性抽样仍然倾向于优于偏斜构成；(ii) 没有审议时，构成不产生任何变化，表明审议是“谁出席影响结果”的机制；(iii) 治理门压缩了跨剖面方差而不改变Mandl上的平均结果，但在Mumford0上的低接受率表明阈值需要实例特定的校准。这些发现将参与偏差从不可控输入重新定义为过程设计问题：即使没有保证的代表性出席，结构良好的审议和治理标准也能显著减少结果对“谁在房间里”的依赖程度。

英文摘要

Transit network design depends not only on the optimization algorithm but also on who shows up to the public hearing. Current practice often collects one-directional comments from self-selected attendees, leaving participant mix as an uncontrolled source of outcome variation. We present AGORA, a framework that holds the network, demand, and solver fixed while systematically varying meeting composition through stakeholder agents, structured deliberation, and governance gates. Across two standard benchmark networks at different scales, we find that (i) aggregate outcomes vary little across compositions, but on tail risk and fairness disparity, representative sampling still tends to outperform skewed compositions; (ii) without deliberation, composition produces no variation at all, showing that deliberation is the mechanism through which who attends affects outcomes; and (iii) governance gates compress cross-profile variance without shifting the average outcome on Mandl, but low acceptance on Mumford0 shows thresholds require instance-specific calibration. These findings reframe participation bias from an uncontrollable input to a process-design problem: even without guaranteed representative attendance, well-structured deliberation and governance criteria can substantially reduce how much outcomes depend on who is in the room.

2606.13747 2026-06-15 cs.AR cs.LG 交叉投稿

BigPower: Hierarchical Source-Level Module Power Estimation for CPUs with Large Language Models

BigPower: 基于大型语言模型的CPU层次化源码级模块功耗估计

Honghua Zhu, Chunjie Luo, Jianfeng Zhan

发表机构 * State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China（中国科学院计算技术研究所处理器国家重点实验室，中国北京）

AI总结提出BigPower，利用大型语言模型表示和架构层次、模块连接、配置参数及工作负载上下文，直接从源码级设计信息估计CPU模块级功耗，无需仿真，在香山处理器上验证了有效性。

Comments 12 pages, 10 figures

2606.13859 2026-06-15 cond-mat.mtrl-sci cs.LG 交叉投稿

Closed-loop discovery of out-of-distribution processing protocols by evolutionary search and uncertainty-aware learning

通过进化搜索和不确定性感知学习发现分布外处理协议的闭环方法

Yu Liu, Stanislav Udovenko, Ching-Che Lin, Jaegyu Kim, Lane W. Martin, Susan Trolier-McKinstry, Sergei V. Kalinin

发表机构 * Department of Materials Science and Engineering, University of Tennessee, Knoxville（田纳西大学材料科学与工程系）； Materials Science and Engineering Department, Materials Research Institute, the Pennsylvania State University（宾夕法尼亚州立大学材料研究所材料科学与工程系）； Department of Materials Science and NanoEngineering, Rice University（莱斯大学材料科学与纳米工程系）； Rice Advanced Materials Institute, Rice University（莱斯大学先进材料研究所）； Department of Materials Science and Engineering, University of California, Berkeley（加州大学伯克利分校材料科学与工程系）； Departments of Chemistry and Physics and Astronomy, Rice University（莱斯大学化学系及物理与天文学系）； Physical Sciences Division, Pacific Northwest National Laboratory（太平洋西北国家实验室物理科学部）

AI总结提出一种闭环工作流，结合紧凑波形表示的进化搜索与不确定性感知深度核学习，自动发现提升铁电薄膜非线性响应的分布外处理协议，并通过实验验证其机制。

详情

AI中文摘要

许多材料和化学系统表现出历史依赖的响应，其中功能结果不仅由最终状态变量决定，还由操作期间施加的场、温度或化学势的时间序列决定。因此，发现新的处理协议是一个高维搜索问题，其中控制变量是整个波形或样本历史，而传统策略要么局限于保守的内插族，要么变得过于测量密集。本文介绍了一种闭环工作流，将紧凑波形表示上的进化搜索与不确定性感知深度核学习相结合，以生成、排序和实验验证候选协议。应用于铁电薄膜，以扫描探针尖端偏压波形为协议，非线性机电响应为奖励，该工作流发现了通过去老化薄膜增强非线性的波形族。空间分辨的前后测量表明，性能最佳的波形选择性地激活预先存在的弱钉扎畴壁段，而最差的波形则驱动长程不可逆切换。该框架将协议调优重新定义为分布外发现，可推广到合成和退火轨迹、电池形成协议以及其他高维控制问题。

英文摘要

Many materials and chemical systems exhibit history-dependent responses, where functional outcomes are governed not only by final-state variables but by the time-dependent sequence of fields, temperatures, or chemical potentials applied during operation. Discovering new processing protocols is therefore a high-dimensional search problem in which the control variable is an entire waveform or sample history, and conventional strategies either remain confined to conservative interpolative families or become prohibitively measurement intensive. Here, a closed-loop workflow is introduced that couples evolutionary search over a compact waveform representation with uncertainty-aware deep kernel learning to generate, rank, and experimentally validate candidate protocols. Applied to ferroelectric thin films, with the scanning-probe tip-bias waveform as the protocol and the nonlinear electromechanical response as the reward, the workflow discovers waveform families that enhance nonlinearity by de-aging the film. Spatially resolved before/after measurements show that the best-performing waveforms selectively activate pre-existing, weakly pinned domain-wall segments, whereas the worst drive long-range irreversible switching. This framework reframes protocol tuning as out-of-distribution discovery, generalizable to synthesis and annealing trajectories, battery formation protocols, and other high-dimensional control problems.

2606.13868 2026-06-15 astro-ph.IM cs.LG 交叉投稿

Multi-Variable Stellar Parameter Estimation Using Residual Multitask Neural Networks

使用残差多任务神经网络的多变量恒星参数估计

Bruno Santos Meneses Barreto, Marcio Eisencraft

发表机构 * Escola Politécnica, Universidade de São Paulo, SP（圣保罗大学理工学院）

AI总结提出一种端到端流水线，利用带残差块的全连接多任务神经网络，通过贝叶斯优化调参，从SDSS光谱中估计有效温度、金属丰度和表面重力，在低复杂度下达到1%-3%的归一化误差。

Comments This manuscript has been submitted to the Congresso Brasileiro de Automática (CBA) and is currently under peer review

详情

AI中文摘要

我们提出了一种端到端流水线，用于从斯隆数字巡天数据发布12的光谱中估计恒星参数，该流水线使用带有残差块的全连接多任务神经网络，其超参数通过贝叶斯优化进行调优。预处理流水线包括每个光谱的标准化、目标变量（有效温度$T_{\mathrm{eff}}$、金属丰度$[\mathrm{Fe/H}]$和表面重力$\log g$）的RobustScaler归一化，以及通过注入高斯噪声进行数据增强。在保留的测试集上，该模型对$T_{\mathrm{eff}}$实现了$59.76~\mathrm{K}$的平均绝对误差（MAE），对$[\mathrm{Fe/H}]$实现了$0.103~\mathrm{dex}$，对$\log g$实现了$0.130~\mathrm{dex}$。相对于每个参数的全尺度范围进行归一化后，这些结果代表了$1\%$到$3\%$的范围归一化误差，而模型复杂度仅为约540,000个可训练参数，效率极高。这些结果表明，紧凑的残差多任务架构结合合理的信号预处理，为大规模光谱数据集中的非线性参数估计提供了一种参数高效的解决方案。特别是，所提出的模型在复杂度远低于更深神经网络基线的情况下实现了有竞争力的性能。

英文摘要

We present an end-to-end pipeline for estimating stellar parameters from Sloan Digital Sky Survey Data Release 12 spectra using a fully connected multitask neural network with residual blocks, whose hyperparameters are tuned via Bayesian optimization. The preprocessing pipeline includes per-spectrum standardization, RobustScaler normalization of the target variables -- effective temperature $T_{\mathrm{eff}}$, metallicity $[\mathrm{Fe/H}]$, and surface gravity $\log g$ -- and data augmentation via Gaussian noise injection. On a held-out test set, the model achieved Mean Absolute Errors (MAE) of $59.76~\mathrm{K}$ for $T_{\mathrm{eff}}$, $0.103~\mathrm{dex}$ for $[\mathrm{Fe/H}]$, and $0.130~\mathrm{dex}$ for $\log g$. Normalized against the full-scale range of each parameter, these results represent range-normalized errors between $1\%$ and $3\%$, achieved with a highly efficient model complexity of approximately 540,000 trainable parameters. These results demonstrate that a compact residual multitask architecture, combined with principled signal preprocessing, provides a parameter-efficient solution for nonlinear parameter estimation in large-scale spectral datasets. In particular, the proposed model achieves competitive performance with substantially lower complexity than deeper neural network baselines.
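
The 1% to 3% figures are described as errors normalized against each parameter's full-scale range. One plausible reading, sketched below, divides the MAE by the spread of the true values. The abstract does not give the exact formula, so the function and the toy temperatures are assumptions, not the authors' evaluation code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def range_normalized_mae(y_true, y_pred):
    """MAE divided by the span (max - min) of the true values."""
    span = max(y_true) - min(y_true)
    return mean_absolute_error(y_true, y_pred) / span


# Hypothetical effective temperatures in kelvin, each prediction off by 60 K.
teff_true = [4000.0, 5000.0, 6000.0, 7000.0]
teff_pred = [4060.0, 4940.0, 6060.0, 6940.0]
teff_error = range_normalized_mae(teff_true, teff_pred)
```

Normalizing this way makes errors in kelvin and in dex comparable, at the cost of depending on how wide the test set happens to be.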

2606.13941 2026-06-15 gr-qc astro-ph.IM cs.LG 交叉投稿

Binary Black Hole Parameter Estimation with Hybrid CNN-Transformer Neural Networks

使用混合CNN-Transformer神经网络进行双黑洞参数估计

Panagiotis N. Sakellariou, Spiros V. Georgakopoulos, Sotiris Tasoulis, Vassilis P. Plagianakos

发表机构 * University of Thessaly（塞萨洛尼基大学）

AI总结提出混合CNN-Transformer深度学习策略，用于估计非进动双黑洞系统的内禀和外在参数，在模拟和真实引力波事件中展现出强预测性能和鲁棒性。

Comments Accepted manuscript. 12 pages, 10 figures

Journal ref Astronomy and Computing, vol. 54, 101027 (2026)

详情

DOI: 10.1016/j.ascom.2025.101027

AI中文摘要

引力波的探测彻底改变了我们探索宇宙基本方面的能力。传统上，建模的引力波信号通过基于模板的匹配滤波来识别，随后在信噪比时间序列中跨多个探测器进行符合分析。机器学习和深度学习的最新进展激发了人们对其在信号检测和参数估计中应用的兴趣。在本研究中，提出了一种混合深度学习策略，利用Transformer编码器的有效性以及成熟的卷积神经网络架构，尝试估计非进动双黑洞系统的内禀和外在参数。这项工作的主要焦点是点估计，即为每个参数生成单一最佳拟合值，而非完整的后验分布。该方法在嵌入高斯噪声的模拟信号和真实引力波事件上进行了评估，并在关键天体物理参数上展示了强大的预测性能和鲁棒性。

英文摘要

The detection of gravitational waves has revolutionized our ability to explore fundamental aspects of the Universe. Traditionally, modeled gravitational-wave signals have been identified using template-based matched filtering, followed by coincidence analysis across multiple detectors in the signal-to-noise ratio time series. Recent advances in Machine Learning and Deep Learning have sparked growing interest in their application to both signal detection and parameter estimation. In this study, a hybrid Deep Learning strategy is proposed that leverages the effectiveness of Transformer encoders alongside well-established Convolutional Neural Network architectures in an attempt to estimate the intrinsic and extrinsic parameters of non-precessing binary black hole systems. The primary focus of this work is point estimation, producing single best-fit values for each parameter rather than full posterior distributions. This method is evaluated on both simulated signals embedded in Gaussian noise and real gravitational-wave events, and it demonstrates strong predictive performance and robustness across key astrophysical parameters.

2606.13952 2026-06-15 cs.CR cs.ET cs.LG 交叉投稿

Side-Channel Attacks Bypass Protection in 3D Printers

侧信道攻击绕过3D打印机的保护

Eric Yocam, Varghese Vaidyan, Micah Flack, Gurcan Comert, Judith L. Mwakalonge

发表机构 * Department of Computer Science, California Polytechnic State University（计算机科学系，加州州立理工大学）； Beacom College of Computer and Cyber Sciences, Dakota State University（计算机与网络科学学院，达科他州立大学）； Idaho National Laboratory（爱达荷国家实验室）； Department of Computational Data Science and Engineering, North Carolina A&T State University（计算数据科学与工程系，北卡罗来纳A&T州立大学）； Department of Engineering, South Carolina State University（工程系，南卡罗来纳州立大学）

AI总结首次评估商用3D打印机的主动电机噪声消除（AMNC）硬件对策，发现其完全消除声学信道，但振动信道仍泄漏几何信息，且泄漏具有设备特异性。

Comments 11 pages, 6 figures, 4 tables

详情

AI中文摘要

主动电机噪声消除（AMNC）作为硬件对策，已部署在商用熔融沉积成型（FDM）3D打印机中，用于防御针对知识产权（IP）的声学侧信道攻击。我们首次对部署的AMNC对策进行实证评估，使用来自两台配备AMNC的Bambu Lab打印机的同步声学和振动记录公共数据集，涵盖12个物体类别。AMNC完全中和了声学信道：分类准确率与8.33%的随机基线无法区分。AMNC未针对的振动信道仍然泄漏。通过汇总统计，泄漏是粗略且幅度驱动的（振动准确率约31%合并，36-47%打印机内），而波形形状几乎不携带信息（仅频率特征为随机）。一个摄入打印有序演化的全序列时间模型将准确率提升至约61%，而顺序打乱的控制（约33%）表明，一个实质性成分是真正的顺序性并依赖于打印进程。泄漏具有设备特异性：在一台打印机上训练的分类器转移到另一台时接近随机。我们得出结论：AMNC仅是声学防御；振动仍然是一个部分、几何相关的侧信道，它未解决，但在此数据集上不支持完整的几何重建；重建级攻击需要AMNC同样未涉及的磁或电源信道。我们发布所有代码。

机器学习粒子流作为对撞机物理学的基础模型

Farouk Mokhtar, Joosep Pata, Michael Kagan, Javier Duarte

发表机构 * University of California San Diego（加州大学圣地亚哥分校）； National Institute of Chemical Physics and Biophysics（化学物理与生物物理国家研究所）； SLAC National Accelerator Laboratory（SLAC国家加速器实验室）

AI总结将事件重建视为机器学习问题，利用MLPF模型学习到的潜在表示，在喷注味识别、喷注能量回归和缺失动量回归三项分析任务上显著提升性能，且单线性层即可媲美先进架构，参数减少约35倍。

Comments 15 pages, 11 figures

详情

AI中文摘要

从粒子对撞到物理分析的工作流程经过一系列传统上模块化且不连续的重建步骤，没有共享表示连接低层级探测器数据与高层级分析任务。我们表明，将事件重建视为机器学习问题自然会产生这样的共享表示。我们重新利用为粒子流重建（MLPF）训练的机器学习模型来执行三项不同的分析任务：喷注味识别、喷注能量回归和缺失动量回归。通过将在重建过程中学到的每个粒子的潜在表示作为附加输入特征，我们显著优于仅使用运动学特征的基线。我们进一步证明，仅使用潜在表示训练的单个线性层在性能上可与最先进的基线架构相媲美，并且在缺失动量回归上优于基线，参数数量减少约35倍。这些结果表明，在重建过程中学到的潜在表示编码了下游分析所需的基本物理信息，将MLPF确立为基础模型，并为从探测器数据到物理分析的端到端流程提供了具体步骤。

英文摘要

The workflow from particle collision to physics analysis passes through a series of reconstruction steps that are traditionally modular and disconnected, with no shared representation linking low-level detector data to high-level analysis tasks. We show that casting event reconstruction as a machine learning problem naturally produces such a shared representation. We repurpose a machine learning model trained for particle-flow reconstruction (MLPF) to perform three distinct analysis tasks: jet flavor identification, jet energy regression, and missing momentum regression. By appending the per-particle latent representations learned during reconstruction as additional input features, we substantially improve over baselines that use kinematic features alone. We further demonstrate that a single linear layer trained using only the latent representations achieves competitive performance against state-of-the-art baseline architectures, and outperforms the baseline for missing momentum regression with approximately 35 times fewer parameters. These results demonstrate that the latent representations learned during reconstruction encode essential physics information needed for downstream analysis, establishing MLPF as a foundation model and offering a concrete step toward an end-to-end pipeline from detector data to physics analysis.

2606.14561 2026-06-15 cs.RO cs.LG 交叉投稿

ORCA: A Platform for Open-Source Dexterity Research

ORCA: 开源灵巧性研究平台

Francesco Capuano, Maximilian Eberlein, Fabrice Bourquin, Clemens Claudio Christoph

发表机构 * University of Oxford（牛津大学）； ETH Zurich（苏黎世联邦理工学院）； Orca Dexterity

AI总结提出ORCA学习栈，统一灵巧手控制、仿真、遥操作和重定向，集成机器人学习框架，实现端到端灵巧操作研究。

Comments 15 pages

详情

AI中文摘要

机器人操作研究越来越关注两指平行夹爪，因其有效性、经济性和易于遥操作。然而，夹爪受限于其外形因素，即使对于简单的重新定向任务，也常常需要双臂设置。拟人手是灵巧机器人学习的更自然平台（更接近人手，能够从人类视频中学习），但它们在学习研究中仍然难以使用：即使存在开放且可访问的手部硬件，用于控制、仿真、遥操作和重定向的软件也分散在零散的代码库中，并且与机器人学习生态系统基本脱节。在这项工作中，我们介绍了ORCA学习栈，这是一个将灵巧性作为第一类机器人学习领域的开源研究栈。ORCA栈将低级控制、仿真、来自一系列消费平台的遥操作以及手部重定向统一在单个接口后面，并原生集成流行的机器人学习框架（如LeRobot），使灵巧手研究人员能够利用与非灵巧机器人学习相同的数据、训练和评估流程。我们展示了一个完整的端到端工作流程，通过使用消费级VR头显进行遥操作收集手内重新定向任务的专家演示，使用LeRobot训练自主策略，并在完全可重现和可观察的设置中评估学习到的策略。我们将整个栈开源，作为灵巧操作研究的共享、可重现基础。

英文摘要

Robotics manipulation research increasingly focuses on two-finger parallel grippers for their effectiveness, affordability, and ease of teleoperation. Grippers are nonetheless limited by their form factor, often requiring bimanual setups even for simple reorientation tasks. Anthropomorphic hands are a more natural platform for dexterous robot learning -- closer to the human hand, and capable of learning from human video -- yet they remain hard to use in learning research: even where open and accessible hand hardware exists, the software for control, simulation, teleoperation, and retargeting is scattered in one-off code bases, and largely disconnected from the robot-learning ecosystem. In this work, we introduce the ORCA learning stack, an open-source research stack for dexterity as a first-class robot learning domain. Our ORCA stack unifies low-level control, simulation, teleoperation from a range of consumer platforms, and hand retargeting, behind a single interface, and integrates natively with popular robot-learning frameworks such as LeRobot, so dexterous hand researchers can leverage the same data, training, and evaluation pipelines used for non-dexterous robot learning. We demonstrate a complete end-to-end workflow, collecting expert demonstrations of an in-hand reorientation task by teleoperation with a consumer-grade VR headset, training an autonomous policy with LeRobot, and evaluating the learned policy in a fully reproducible and observable setup. We open-source the entire stack as a shared, reproducible foundation for dexterous-manipulation research.

从小到大：一种用于解决分类优化问题的图卷积网络方法

Guokai Li, Pin Gao, Stefanus Jasin, Zizhuo Wang

发表机构 * Smith School of Business, Queen’s University（女王大学商学院）； School of Data Science, The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳）数据科学学院）； Stephen M. Ross School of Business, University of Michigan（密歇根大学罗斯商学院）

AI总结提出图卷积网络（GCN）框架高效求解约束分类优化问题，通过图表示学习参数到最优分类的映射，小样本训练可泛化至大规模问题，数值实验显示20产品训练模型在2000产品问题上达到85%以上最优收益。

详情

AI中文摘要

分类优化旨在从可替代产品中选择一个子集，在约束条件下最大化期望收益。由于组合和非线性性质，该问题是NP难的，并且在电子商务等行业中频繁出现，平台每分钟需要解决数千个此类问题。我们提出了一种图卷积网络（GCN）框架来高效求解约束分类优化问题。我们的方法构建问题的图表示，训练GCN学习从问题参数到最优分类的映射，并基于GCN的输出开发了三种推理策略。由于GCN能够跨实例规模泛化，从小规模样本中学到的模式可以迁移到大规模问题。我们建立了理论结果来证明所提出的GCN的表达能力，并解释了规模泛化能力的潜在机制。数值实验表明，在20个产品实例上训练的GCN能够在几秒内对多达2000个产品的问题实现超过85%的最优收益，在准确性和效率上均优于现有启发式方法。我们进一步将该框架扩展到使用交易数据的未知选择模型设置，并展示了类似的性能和可扩展性。

英文摘要

Assortment optimization seeks to select a subset of substitutable products, subject to constraints, to maximize expected revenue. The problem is NP-hard due to its combinatorial and nonlinear nature and arises frequently in industries such as e-commerce, where platforms must solve thousands of such problems each minute. We propose a graph convolutional network (GCN) framework to efficiently solve constrained assortment optimization problems. Our approach constructs a graph representation of the problem, trains a GCN to learn the mapping from problem parameters to optimal assortments, and develops three inference policies based on the GCN's output. Owing to the GCN's ability to generalize across instance sizes, patterns learned from small-scale samples can be transferred to large-scale problems. Theoretical results are established to show the expressive power of the proposed GCN, and explain the underlying mechanism of the size generalization ability. Numerical experiments show that a GCN trained on instances with 20 products achieves over 85% of the optimal revenue on problems with up to 2,000 products within seconds, outperforming existing heuristics in both accuracy and efficiency. We further extend the framework to settings with an unknown choice model using transaction data and demonstrate similar performance and scalability.
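
To make the objective concrete, here is a minimal sketch of expected revenue and exhaustive search for a cardinality-constrained assortment. It assumes a multinomial logit (MNL) choice model with a no-purchase weight of 1, which the abstract does not specify; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def expected_revenue(assortment, revenues, weights):
    """Expected revenue of an assortment under an assumed MNL choice model.

    Assumption: the no-purchase option has preference weight 1.
    """
    total_weight = 1.0 + sum(weights[i] for i in assortment)
    return sum(revenues[i] * weights[i] for i in assortment) / total_weight


def best_assortment(revenues, weights, capacity):
    """Exhaustively search all assortments with at most `capacity` products."""
    n = len(revenues)
    best_set, best_value = (), 0.0
    for size in range(1, capacity + 1):
        for subset in combinations(range(n), size):
            value = expected_revenue(subset, revenues, weights)
            if value > best_value:
                best_set, best_value = subset, value
    return best_set, best_value


# Toy instance: four products, at most two may be offered.
revenues = [10.0, 8.0, 6.0, 4.0]
weights = [0.5, 1.0, 1.5, 2.0]
chosen, value = best_assortment(revenues, weights, capacity=2)
print(chosen, round(value, 3))
```

The number of candidate subsets grows combinatorially with the number of products, which is the cost a learned mapping from problem parameters to assortments is meant to avoid.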

2510.00375 2026-06-15 cs.LG cs.HC 版本更新

Multidimensional Bayesian Active Machine Learning of Working Memory Task Performance

工作记忆任务表现的多维贝叶斯主动机器学习

Dom CP Marticorena, Chris Wissmann, Zeyu Lu, Dennis L Barbour

发表机构 * Department of Biomedical Engineering, Washington University（生物医学工程系，华盛顿大学）； Department of Computer Science and Engineering, Washington University（计算机科学与工程系，华盛顿大学）

AI总结提出贝叶斯二维主动分类方法，在虚拟环境中控制空间负荷和特征绑定负荷，使用高斯过程分类器估计性能曲面，实现快速收敛并揭示个体差异。

Comments 41 pages, 7 figures

详情

AI中文摘要

虽然自适应实验设计已经超越了一维阶梯式自适应，但大多数认知实验仍然控制单个因素并用标量总结表现。我们展示了一种贝叶斯双轴主动分类方法的验证，该方法在沉浸式虚拟测试环境中针对5×5工作记忆重建任务进行。控制两个变量：项目的空间负荷L（占用瓦片数量）和特征绑定负荷K（不同颜色数量）。刺激获取由非参数高斯过程（GP）概率分类器的后验不确定性引导，该分类器输出（L, K）上的曲面，而不是单个阈值或最大跨度值。在年轻成人群体中，我们将GP驱动的自适应模式（AM）与传统的自适应阶梯经典模式（CM）进行比较，后者仅在K=3时变化L。在该队列中，两种方法之间达到一致性，在K=3时组内相关系数为0.755。此外，AM揭示了空间负荷和特征绑定之间交互作用的个体差异。AM估计比其他采样策略收敛更快，表明仅需约30个样本即可准确拟合完整模型。

英文摘要

While adaptive experimental design has outgrown one-dimensional, staircase-based adaptations, most cognitive experiments still control a single factor and summarize performance with a scalar. We show a validation of a Bayesian, two-axis, active-classification approach, carried out in an immersive virtual testing environment for a 5-by-5 working-memory reconstruction task. Two variables are controlled: spatial load L (number of occupied tiles) and feature-binding load K (number of distinct colors) of items. Stimulus acquisition is guided by posterior uncertainty of a nonparametric Gaussian Process (GP) probabilistic classifier, which outputs a surface over (L, K) rather than a single threshold or max span value. In a young adult population, we compare GP-driven Adaptive Mode (AM) with a traditional adaptive staircase Classic Mode (CM), which varies L only at K = 3. Parity between the methods is achieved for this cohort, with an intraclass coefficient of 0.755 at K = 3. Additionally, AM reveals individual differences in interactions between spatial load and feature binding. AM estimates converge more quickly than other sampling strategies, demonstrating that only about 30 samples are required for accurate fitting of the full model.
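
The acquisition idea can be illustrated with a small sketch that picks the next (L, K) condition where the predicted outcome is most uncertain. This is not the paper's Gaussian Process classifier: `success_probability` is a hypothetical fixed logistic surface standing in for a fitted model, and the entropy of the predicted outcome is used as a simple proxy for posterior uncertainty.

```python
import math


def success_probability(load, colors):
    """Hypothetical stand-in for a classifier's predicted success probability."""
    return 1.0 / (1.0 + math.exp(0.8 * load + 0.5 * colors - 6.0))


def bernoulli_entropy(p):
    """Entropy in bits of a Bernoulli outcome with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def next_stimulus(candidates):
    """Pick the (L, K) condition whose predicted outcome is most uncertain."""
    return max(candidates, key=lambda lk: bernoulli_entropy(success_probability(*lk)))


# Candidate conditions: spatial load L in 1..10, feature-binding load K in 1..5.
grid = [(load, colors) for load in range(1, 11) for colors in range(1, 6)]
chosen = next_stimulus(grid)
print(chosen)
```

In an actual adaptive session the surface would be refitted after every response, so the most informative condition moves as data accumulate.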

2511.09789 2026-06-15 cs.LG 版本更新

Trend-Aware Multi-Task Learning for Short-Term Energy Forecasting

面向短期能源预测的趋势感知多任务学习

Fulong Yao, Wanqing Zhao, Chao Zheng, Xiaofei Han

发表机构 * Cardiff University（卡迪夫大学）； Newcastle University（纽卡斯尔大学）； University of Leeds（利兹大学）

AI总结提出CaReTS多任务框架，通过双流架构联合分类趋势与回归偏差，实现高精度预测与可解释性，在真实数据集上优于现有方法。

详情

AI中文摘要

近年来深度预测模型取得了显著性能，但大多数方法仍难以同时提供准确的预测和对时间动态的可解释洞察。本文提出CaReTS，一种新颖的多任务学习框架，结合分类和回归任务用于多步时间序列预测问题。该框架采用双流架构，其中分类分支学习未来的逐步趋势，而回归分支估计目标变量最新观测值的相应偏差。双流设计通过分离目标变量的宏观趋势和微观偏差，提供更具可解释性的预测。为了在输出预测、偏差估计和趋势分类中实现有效学习，我们设计了一个具有不确定性加权机制的多任务损失，以自适应平衡每个任务的贡献。此外，在该框架下实例化了四种变体（CaReTS1-4），以集成主流时序建模编码器，包括卷积神经网络（CNN）、长短期记忆网络（LSTM）和Transformer。在真实数据集上的实验表明，CaReTS在预测准确性上优于最先进的算法，同时实现了更高的趋势分类性能。

英文摘要

Short-term energy forecasting plays an important role in real-time operational decision-making, such as electricity market bidding and power system dispatch, where both numerical accuracy and correct directional signals are essential. However, most existing forecasting approaches formulate the problem purely as a regression task, limiting their ability to explicitly capture stepwise directional movements and trend consistency required for operational decisions. To address this limitation, this paper proposes a trend-aware multi-task forecasting framework that decomposes forecasting outputs into directional movements and deviation magnitudes relative to the latest observation, enabling both accurate numerical prediction and interpretable trend-aware outputs. The framework adopts a task-specific dual-stream architecture and explores key design choices for integrating trend and deviation information, including hard versus probabilistic trend representations, symmetric versus asymmetric deviation modelling, and parallel versus sequential conditioning strategies. To stabilize multi-task learning and reduce manual tuning, an uncertainty-aware task weighting scheme is incorporated to automatically balance directional classification, deviation regression, and final output prediction during training. Experimental results on real-world energy datasets demonstrate that the proposed framework achieves competitive numerical accuracy compared with state-of-the-art algorithms, while consistently improving trend prediction performance with moderate computational cost. This capability is particularly beneficial in short-term energy system management, where consistent directional forecasting can provide more reliable decision support for practical operational scenarios such as market bidding, resource scheduling, and risk-aware energy management.
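
The decomposition described above can be sketched as follows: each future value is split into a direction label and a deviation magnitude relative to the latest observation, and the forecast is recovered by recombining the two. The function names, the three-way direction encoding, and the example values are illustrative assumptions, not the authors' implementation.

```python
def decompose(latest, future):
    """Split future values into direction labels and deviation magnitudes,
    both measured relative to the latest observed value."""
    directions = [1 if v > latest else -1 if v < latest else 0 for v in future]
    magnitudes = [abs(v - latest) for v in future]
    return directions, magnitudes


def reconstruct(latest, directions, magnitudes):
    """Recombine direction and magnitude into numerical forecasts."""
    return [latest + d * m for d, m in zip(directions, magnitudes)]


latest_load = 100.0
future_load = [104.0, 100.0, 97.5]
directions, magnitudes = decompose(latest_load, future_load)
restored = reconstruct(latest_load, directions, magnitudes)
print(directions, magnitudes, restored)
```

In a trained model the directions would come from a classification branch and the magnitudes from a regression branch; here both are computed from known targets only to show that the split loses no information.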

2512.03787 2026-06-15 cs.LG 版本更新

Adaptive Identification and Modeling of Clinical Pathways with Process Mining

基于过程挖掘的临床路径自适应识别与建模

Francesco Vitale, Nicola Mazzocca

发表机构 * University of Naples Federico II（那不勒斯费德里科二世大学）

AI总结提出一种两阶段过程挖掘方法，通过一致性检查诊断扩展临床路径知识库，实现自适应识别与建模，在Synthea数据集上达到95.62% AUC和67.11%弧阶简单性。

Comments Accepted to the 41st ACM/SIGAPP Symposium On Applied Computing (ACM SAC 2026)

详情

DOI: 10.1145/3748522.3779942

AI中文摘要

临床路径是模拟患者治疗过程的专门医疗计划。它们旨在提供基于标准的进展并标准化患者治疗，从而改善护理、减少资源使用并加速患者康复。然而，基于临床指南和领域专业知识手动建模这些路径是困难的，并且可能无法反映针对不同疾病变异或组合的实际最佳实践。我们提出了一种使用过程挖掘的两阶段建模方法，通过利用一致性检查诊断来扩展临床路径知识库。在第一阶段，收集给定疾病的历史数据，以过程模型的形式捕获治疗。在第二阶段，将新数据与参考模型进行比较以验证一致性。基于一致性检查结果，知识库可以扩展为针对新变异或疾病组合定制的更具体模型。我们使用Synthea（一个模拟SARS-CoV-2感染患者治疗并伴有不同COVID-19并发症的基准数据集）展示了我们的方法。结果表明，我们的方法能够以足够的精度扩展临床路径知识库，AUC峰值达到95.62%，同时保持67.11%的弧阶简单性。

英文摘要

Clinical pathways are specialized healthcare plans that model patient treatment procedures. They are developed to provide criteria-based progression and standardize patient treatment, thereby improving care, reducing resource use, and accelerating patient recovery. However, manual modeling of these pathways based on clinical guidelines and domain expertise is difficult and may not reflect the actual best practices for different variations or combinations of diseases. We propose a two-phase modeling method using process mining, which extends the knowledge base of clinical pathways by leveraging conformance checking diagnostics. In the first phase, historical data of a given disease is collected to capture treatment in the form of a process model. In the second phase, new data is compared against the reference model to verify conformance. Based on the conformance checking results, the knowledge base can be expanded with more specific models tailored to new variants or disease combinations. We demonstrate our approach using Synthea, a benchmark dataset simulating patient treatments for SARS-CoV-2 infections with varying COVID-19 complications. The results show that our method enables expanding the knowledge base of clinical pathways with sufficient precision, peaking at 95.62% AUC while maintaining an arc-degree simplicity of 67.11%.

2512.10966 2026-06-15 cs.LG cs.AI cs.CV eess.IV 版本更新

Interpretable Alzheimer's Diagnosis via Multimodal Fusion of Regional Brain Experts

可解释的阿尔茨海默病诊断：基于区域脑专家的多模态融合

Farica Zhuang, Shu Yang, Dinara Aliyeva, Zixuan Wen, Duy Duong-Tran, Christos Davatzikos, Tianlong Chen, Song Wang, Li Shen

发表机构 * University of Pennsylvania（宾夕法尼亚大学）； Massachusetts Institute of Technology（麻省理工学院）

AI总结提出MREF-AD多模态区域专家融合模型，采用混合专家框架将各模态脑区域视为独立专家，通过门控网络学习个性化融合权重，实现可解释的AD诊断。

Comments Published at IEEE ICHI 2026

详情

AI中文摘要

准确早期诊断阿尔茨海默病（AD）对有效干预至关重要，需要整合多模态神经影像数据的互补信息。然而，传统融合方法通常依赖特征的简单拼接，无法自适应平衡淀粉样蛋白PET和MRI等生物标志物在不同脑区的贡献。本文提出MREF-AD，一种用于AD诊断的多模态区域专家融合模型。它是一个混合专家（MoE）框架，将每个模态内的介观脑区域建模为独立专家，并采用门控网络学习个体特定的融合权重。利用阿尔茨海默病神经影像学倡议（ADNI）的表格神经影像和人口统计学信息，MREF-AD在强经典和深度学习基线上取得了有竞争力的性能，同时提供了可解释的、模态和区域层面的洞察，揭示了结构和分子影像如何共同促进AD诊断。源代码见：https://github.com/PennShenLab/mref-ad。

英文摘要

Accurate and early diagnosis of Alzheimer's disease (AD) is critical for effective intervention and requires integrating complementary information from multimodal neuroimaging data. However, conventional fusion approaches often rely on simple concatenation of features, which cannot adaptively balance the contributions of biomarkers such as amyloid PET and MRI across brain regions. In this work, we propose MREF-AD, a Multimodal Regional Expert Fusion model for AD diagnosis. It is a Mixture-of-Experts (MoE) framework that models mesoscopic brain regions within each modality as independent experts and employs a gating network to learn subject-specific fusion weights. Utilizing tabular neuroimaging and demographic information from the Alzheimer's Disease Neuroimaging Initiative (ADNI), MREF-AD achieves competitive performance over strong classic and deep baselines while providing interpretable, modality- and region-level insight into how structural and molecular imaging jointly contribute to AD diagnosis. The source code is available at https://github.com/PennShenLab/mref-ad.

2512.13069 2026-06-15 cs.LG physics.flu-dyn stat.ML 版本更新

Multi-fidelity aerodynamic data fusion by autoencoder transfer learning

基于自编码器迁移学习的多保真度气动数据融合

Javier Nieto-Centenero, Esther Andrés, Rodrigo Castellanos

发表机构 * Department of Aerospace Engineering, UC3M（航空航天工程系，UC3M）； Theoretical and Computational Aerodynamics Group, Flight Physics Department, INTA（理论与计算空气动力学组，飞行物理部门，INTA）

AI总结提出结合自编码器迁移学习与多分裂保形预测的多保真度深度学习框架，利用低保真数据学习潜在物理表示，微调解码器以极少量高保真数据实现高精度气动压力预测，并生成超过95%点覆盖的不确定度带。

Comments 27 pages, 13 figures

详情

AI中文摘要

准确的气动预测通常依赖于高保真度模拟；然而，其高昂的计算成本严重限制了其在数据驱动建模中的适用性。这一局限性促使了多保真度策略的发展，该策略利用廉价的低保真度信息而不牺牲准确性。针对这一挑战，本文提出了一种多保真度深度学习框架，该框架将基于自编码器的迁移学习与新开发的多分裂保形预测（MSCP）策略相结合，以在极端数据稀缺条件下实现具有不确定度感知的气动数据融合。该方法利用丰富的低保真度（LF）数据学习紧凑的潜在物理表示，该表示作为冻结的知识库，随后使用稀缺的高保真度（HF）样本对解码器进行微调。在NACA翼型（二维）和跨声速机翼（三维）数据库的表面压力分布测试中，该模型成功修正了LF偏差，并使用最少的HF训练数据实现了高精度的压力预测。此外，MSCP框架生成了稳健且可操作的不确定度带，点覆盖超过95%。通过将极端数据效率与不确定度量化相结合，本文为数据稀缺环境下的气动回归提供了一种可扩展且可靠的解决方案。

英文摘要

Accurate aerodynamic prediction often relies on high-fidelity simulations; however, their prohibitive computational costs severely limit their applicability in data-driven modeling. This limitation motivates the development of multi-fidelity strategies that leverage inexpensive low-fidelity information without compromising accuracy. Addressing this challenge, this work presents a multi-fidelity deep learning framework that combines autoencoder-based transfer learning with a newly developed Multi-Split Conformal Prediction (MSCP) strategy to achieve uncertainty-aware aerodynamic data fusion under extreme data scarcity. The methodology leverages abundant Low-Fidelity (LF) data to learn a compact latent physics representation, which acts as a frozen knowledge base for a decoder that is subsequently fine-tuned using scarce HF samples. Tested on surface-pressure distributions for NACA airfoils (2D) and a transonic wing (3D) databases, the model successfully corrects LF deviations and achieves high-accuracy pressure predictions using minimal HF training data. Furthermore, the MSCP framework produces robust, actionable uncertainty bands with pointwise coverage exceeding 95%. By combining extreme data efficiency with uncertainty quantification, this work offers a scalable and reliable solution for aerodynamic regression in data-scarce environments.
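
For readers unfamiliar with conformal prediction, the sketch below shows the standard single-split variant, not the Multi-Split Conformal Prediction (MSCP) strategy the paper introduces. The residual values and the predicted pressure coefficient are made up for illustration, and the guarantee of this basic form is marginal coverage over exchangeable samples.

```python
import math


def conformal_halfwidth(calibration_residuals, alpha):
    """Half-width of a split-conformal band with target coverage 1 - alpha."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return scores[rank - 1]


def prediction_band(prediction, halfwidth):
    """Symmetric uncertainty band around a point prediction."""
    return prediction - halfwidth, prediction + halfwidth


# Hypothetical residuals (truth minus prediction) on 20 held-out calibration points.
residuals = [0.02, -0.05, 0.01, 0.08, -0.03, 0.04, -0.01, 0.06, -0.07, 0.03,
             0.05, -0.02, 0.09, -0.04, 0.01, 0.02, -0.06, 0.03, -0.01, 0.10]
halfwidth = conformal_halfwidth(residuals, alpha=0.1)
lower, upper = prediction_band(-0.42, halfwidth)
print(halfwidth, lower, upper)
```

The band width depends only on held-out residuals, so it can be attached to any fine-tuned decoder without retraining.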

2512.14967 2026-06-15 cs.LG q-fin.CP q-fin.MF 版本更新

Deep Learning and Elicitability for McKean-Vlasov FBSDEs With Common Noise

带共同噪声的McKean-Vlasov正倒向随机微分方程的深度学习与可引性

Felipe J. P. Antunes, Yuri F. Saporito, Sebastian Jaimungal

发表机构 * School of Applied Mathematics, Getulio Vargas Foundation（应用数学学院，热图利奥·瓦加斯基金会）； Department of Statistical Sciences, University of Toronto（统计科学系，多伦多大学）； Oxford-Man Institute for Quantitative Finance, University of Oxford（牛津-曼定量金融研究所，牛津大学）

AI总结提出结合Picard迭代、可引性和深度学习的方法，求解带共同噪声的McKean-Vlasov正倒向随机微分方程，通过可引性导出路径损失函数避免嵌套蒙特卡洛，在系统风险模型和经济增长模型中验证了准确性。

Comments 19 pages, 8 figures

详情

AI中文摘要

我们提出了一种新颖的数值方法，用于求解带共同噪声的McKean-Vlasov正倒向随机微分方程（MV-FBSDEs），该方法结合了Picard迭代、可引性和深度学习。关键创新在于利用可引性导出路径损失函数，从而能够高效训练神经网络来近似倒向过程和由共同噪声引起的条件期望，无需计算昂贵的嵌套蒙特卡洛模拟。平均场相互作用项通过循环神经网络参数化，该网络被训练以最小化可引分数，而倒向过程则通过表示解耦场的混合前馈和循环网络来近似。我们在一个存在解析解的系统性风险银行间借贷模型上验证了该算法，结果表明能够准确恢复真实解。我们进一步将模型扩展到分位数中介的相互作用，展示了可引性框架在条件均值或矩之外的灵活性。最后，我们将该方法应用于一个具有内生利率的非平稳Aiyagari-Bewley-Huggett经济增长模型，展示了其在没有闭式解的复杂平均场博弈中的适用性。

英文摘要

We present a novel numerical method for solving McKean--Vlasov forward--backward stochastic differential equations (MV--FBSDEs) with common noise, combining Picard iterations, elicitability and deep learning. The key innovation involves elicitability to derive a pathwise loss function, enabling efficient training of neural networks to approximate both the backward process and the conditional expectations arising from common noise, without requiring computationally expensive nested Monte Carlo simulations. The mean-field interaction term is parameterized via a recurrent neural network trained to minimize an elicitable score, while the backward process is approximated through a hybrid feedforward and recurrent network representing the decoupling field. We validate the algorithm on a systemic-risk inter-bank borrowing and lending model, where analytical solutions exist, demonstrating accurate recovery of the true solution. We further extend the model to quantile-mediated interactions, showcasing the flexibility of the elicitability framework beyond conditional means or moments. Finally, we apply the method to a non-stationary Aiyagari--Bewley--Huggett economic growth model with endogenous interest rates, illustrating its applicability to complex mean-field games without closed-form solutions.
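
Elicitability means a statistic can be written as the minimiser of an expected score. The toy sketch below illustrates this with constants in place of neural networks: the mean minimises squared loss and the median minimises the pinball loss at level 0.5. The sample, the candidate grid, and the function names are illustrative assumptions and do not reproduce the paper's pathwise loss.

```python
def squared_loss(estimate, sample):
    """Average squared error; its minimiser over constants is the mean."""
    return sum((y - estimate) ** 2 for y in sample) / len(sample)


def pinball_loss(estimate, sample, level):
    """Average pinball score; its minimiser is the `level`-quantile."""
    total = 0.0
    for y in sample:
        diff = y - estimate
        total += level * diff if diff >= 0 else (level - 1.0) * diff
    return total / len(sample)


def minimise(loss, candidates):
    """Grid search for the candidate with the smallest score."""
    return min(candidates, key=loss)


sample = [1.0, 2.0, 3.0, 4.0, 10.0]
candidates = [i / 10 for i in range(0, 121)]  # 0.0, 0.1, ..., 12.0
mean_estimate = minimise(lambda c: squared_loss(c, sample), candidates)
median_estimate = minimise(lambda c: pinball_loss(c, sample, 0.5), candidates)
print(mean_estimate, median_estimate)
```

Replacing the constant with a function of the conditioning variable and averaging the same score over simulated paths gives a regression target for conditional means or quantiles without an inner Monte Carlo loop.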

2601.18707 2026-06-15 cs.LG cs.AI cs.CV cs.NE 版本更新

SMART: Scalable Mesh-free Aerodynamic Simulations from Raw Geometries using a Transformer-based Surrogate Model

SMART: 基于Transformer代理模型的原始几何形状可扩展无网格气动模拟

Jan Hagnberger, Mathias Niepert

发表机构 * Jan Hagnberger ； Mathias Niepert

AI总结提出SMART，一种无需模拟网格、仅使用几何点云预测任意查询位置物理量的神经代理模型，通过交叉层交互联合更新几何特征和物理场，性能媲美甚至超越依赖网格的方法。

Comments Accepted for publication at the 43rd International Conference on Machine Learning (ICML) 2026, Seoul, South Korea

详情

AI中文摘要

基于机器学习的代理模型已成为复杂几何体（如车身）物理模拟中数值求解器的高效替代方案。许多现有模型将模拟网格作为额外输入，从而减少预测误差。然而，为新几何体生成模拟网格计算成本高昂。相比之下，不依赖模拟网格的无网格方法通常误差更高。基于这些考虑，我们引入了SMART，一种神经代理模型，它仅使用几何体的点云表示，无需访问模拟网格，即可预测任意查询位置的物理量。几何体和模拟参数被编码到一个共享的潜在空间中，该空间捕捉物理场的结构和参数特征。然后，一个物理解码器关注编码器的中间潜在表示，将空间查询映射到物理量。通过这种跨层交互，模型联合更新潜在几何特征和演变的物理场。大量实验表明，SMART与依赖模拟网格作为输入的现有方法相比具有竞争力，并且通常表现更优，展示了其在工业级模拟中的能力。

英文摘要

Machine learning-based surrogate models have emerged as more efficient alternatives to numerical solvers for physical simulations over complex geometries, such as car bodies. Many existing models incorporate the simulation mesh as an additional input, thereby reducing prediction errors. However, generating a simulation mesh for new geometries is computationally costly. In contrast, mesh-free methods, which do not rely on the simulation mesh, typically incur higher errors. Motivated by these considerations, we introduce SMART, a neural surrogate model that predicts physical quantities at arbitrary query locations using only a point-cloud representation of the geometry, without requiring access to the simulation mesh. The geometry and simulation parameters are encoded into a shared latent space that captures both structural and parametric characteristics of the physical field. A physics decoder then attends to the encoder's intermediate latent representations to map spatial queries to physical quantities. Through this cross-layer interaction, the model jointly updates latent geometric features and the evolving physical field. Extensive experiments show that SMART is competitive with and often outperforms existing methods that rely on the simulation mesh as input, demonstrating its capabilities for industry-level simulations.

2602.12379 2026-06-15 cs.LG 版本更新

Deep Doubly Debiased Longitudinal Effect Estimation with ICE G-Computation

深度双重去偏的ICE G-计算公式纵向效应估计

Wenxin Chen, Weishen Pan, Kyra Gan, Fei Wang

发表机构 * Cornell University（康奈尔大学）； Weill Cornell Medicine（威尔康奈尔医学院）

AI总结提出D3-Net框架，通过顺序双重稳健伪结果和纵向目标最小损失估计，解决ICE G-计算中的误差传播问题，实现纵向治疗效应的稳健估计。

详情

AI中文摘要

估计纵向治疗效应对于顺序决策至关重要，但由于治疗-混杂反馈而具有挑战性。虽然迭代条件期望（ICE）G-计算提供了一种原则性方法，但其递归结构存在误差传播，破坏了学习到的结果回归模型。我们提出D3-Net，一个在ICE训练中减轻误差传播并应用稳健最终校正的框架。首先，为了中断学习过程中的误差传播，我们使用顺序双重稳健（SDR）伪结果训练ICE序列，为每个回归提供偏差校正的目标。其次，我们采用多任务变换器，配备协变量模拟器头部进行辅助监督，正则化表示学习，以及目标网络以稳定训练动态。对于最终估计，我们丢弃SDR校正，而是使用未校正的干扰模型对原始结果进行纵向目标最小损失估计（LTMLE）。这第二阶段的针对性去偏确保了稳健性和最优有限样本性质。综合实验表明，与现有最先进的基于ICE的估计器相比，我们的模型D3-Net在不同时间范围、反事实和时变混杂下稳健地降低了偏差和方差。

英文摘要

Estimating longitudinal treatment effects is essential for sequential decision-making but is challenging due to treatment-confounder feedback. While Iterative Conditional Expectation (ICE) G-computation offers a principled approach, its recursive structure suffers from error propagation, corrupting the learned outcome regression models. We propose D3-Net, a framework that mitigates error propagation in ICE training and then applies a robust final correction. First, to interrupt error propagation during learning, we train the ICE sequence using Sequential Doubly Robust (SDR) pseudo-outcomes, which provide bias-corrected targets for each regression. Second, we employ a multi-task transformer with a covariate simulator head for auxiliary supervision, regularizing representation learning, and a target network to stabilize training dynamics. For the final estimate, we discard the SDR correction and instead use the uncorrected nuisance models to perform Longitudinal Targeted Minimum Loss-Based Estimation (LTMLE) on the original outcomes. This second-stage, targeted debiasing ensures robustness and optimal finite-sample properties. Comprehensive experiments demonstrate that our model, D3-Net, robustly reduces bias and variance across different horizons, counterfactuals, and time-varying confoundings, compared to existing state-of-the-art ICE-based estimators.

2603.05556 2026-06-15 cs.LG 版本更新

IntSeqBERT: Learning Arithmetic Structure in OEIS via Modulo-Spectrum Embeddings

IntSeqBERT: 通过模谱嵌入学习OEIS中的算术结构

Kazuhisa Nakasho

发表机构 * Iwate Prefectural University（岩手县立大学）

AI总结提出IntSeqBERT，一种双流Transformer编码器，通过连续对数幅度嵌入和100个模数的正弦/余弦模嵌入融合，在OEIS序列上联合训练三个预测头，显著提升了序列预测精度。

详情

AI中文摘要

OEIS中的整数序列涵盖从个位数常数到天文阶乘和指数，使得标准分词模型难以处理，因为它们无法处理词汇表外的值或利用周期性算术结构。我们提出IntSeqBERT，一种用于OEIS掩码整数序列建模的双流Transformer编码器。每个序列元素沿两个互补轴编码：连续对数尺度幅度嵌入和100个残差（模数$2$--$101$）的正弦/余弦模嵌入，通过FiLM融合。三个预测头（幅度回归、符号分类和100个模数的模预测）在274,705个OEIS序列上联合训练。在Large规模（9150万参数）下，IntSeqBERT在测试集上达到95.85%的幅度准确率和50.38%的平均模准确率（MMA），分别比标准分词Transformer基线高出$+8.9$和$+4.5$个百分点。去除模流的消融实验证实，模流贡献了$+15.2$个百分点的MMA增益，并额外贡献了$+6.2$个百分点的幅度准确率。基于概率中国剩余定理（CRT）的解算器将模型预测转化为具体整数，使得下一项预测比分词Transformer基线提升7.4倍（Top-1: 19.09% vs. 2.59%）。模谱分析显示，归一化信息增益（NIG）与欧拉函数比值$\varphi(m)/m$之间存在强负相关（$r = -0.851$, $p < 10^{-28}$），为复合模数通过CRT聚合更有效地捕获OEIS算术结构提供了经验证据。

英文摘要

Integer sequences in the OEIS span values from single-digit constants to astronomical factorials and exponentials, making prediction challenging for standard tokenised models that cannot handle out-of-vocabulary values or exploit periodic arithmetic structure. We present IntSeqBERT, a dual-stream Transformer encoder for masked integer-sequence modelling on OEIS. Each sequence element is encoded along two complementary axes: a continuous log-scale magnitude embedding and sin/cos modulo embeddings for 100 residues (moduli $2$--$101$), fused via FiLM. Three prediction heads (magnitude regression, sign classification, and modulo prediction for 100 moduli) are trained jointly on 274,705 OEIS sequences. At the Large scale (91.5M parameters), IntSeqBERT achieves 95.85% magnitude accuracy and 50.38% Mean Modulo Accuracy (MMA) on the test set, outperforming a standard tokenised Transformer baseline by $+8.9$ pt and $+4.5$ pt, respectively. An ablation removing the modulo stream confirms it accounts for $+15.2$ pt of the MMA gain and contributes an additional $+6.2$ pt to magnitude accuracy. A probabilistic Chinese Remainder Theorem (CRT)-based Solver converts the model's predictions into concrete integers, yielding a 7.4-fold improvement in next-term prediction over the tokenised-Transformer baseline (Top-1: 19.09% vs. 2.59%). Modulo spectrum analysis reveals a strong negative correlation between Normalised Information Gain (NIG) and Euler's totient ratio $\varphi(m)/m$ ($r = -0.851$, $p < 10^{-28}$), providing empirical evidence that composite moduli capture OEIS arithmetic structure more efficiently via CRT aggregation.
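
The two input views and the reconstruction step can be sketched in a few lines. The angle mapping `2*pi*(n mod m)/m`, the `log1p` magnitude feature, and all function names are assumptions made for illustration; the CRT routine is the classical deterministic version, not the probabilistic solver described in the abstract.

```python
import math

MODULI = range(2, 102)  # moduli 2..101, as stated in the abstract


def modulo_features(n):
    """Sin/cos encoding of n mod m for every modulus (200 values in total).

    Assumption: each residue is mapped to the angle 2*pi*(n mod m)/m.
    """
    features = []
    for m in MODULI:
        angle = 2.0 * math.pi * (n % m) / m
        features.extend((math.sin(angle), math.cos(angle)))
    return features


def log_magnitude(n):
    """Assumed log-scale magnitude feature; log1p keeps n = 0 finite."""
    return math.log1p(abs(n))


def crt(residues, moduli):
    """Recover n modulo the product of pairwise-coprime moduli."""
    n, step = 0, 1
    for r, m in zip(residues, moduli):
        # pow(step, -1, m) is the modular inverse of step (Python 3.8+).
        n += step * (((r - n) * pow(step, -1, m)) % m)
        step *= m
    return n


coprime = [3, 5, 7]
value = 52
recovered = crt([value % m for m in coprime], coprime)
print(len(modulo_features(value)), recovered)
```

Residues for a few coprime moduli determine an integer only up to the product of those moduli, which is one reason a magnitude estimate is useful alongside the modulo predictions.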

2604.23841 2026-06-15 cs.LG cs.AI 版本更新

Scalable Production Scheduling: Linear Complexity via Unified Homogeneous Graphs

可扩展的生产调度：通过统一同质图实现线性复杂度

Jonathan Hoss, Moritz Link, Noah Klarmann

发表机构 * Faculty of Management and Engineering, Rosenheim Technical University of Applied Sciences（管理与工程学院，罗森海姆应用技术大学）

AI总结提出统一同质图框架，通过特征同质化将不同节点角色映射到共享潜在空间，使用同质图同构网络以线性复杂度解决作业车间调度问题，实现零样本泛化，并发现作业与机器比率是策略有效性的主要驱动因素。

Comments This paper has been accepted for presentation at the IEEE 22nd International Conference on Automation Science and Engineering (CASE 2026)

详情

AI中文摘要

在现实工业应用中高效解决作业车间调度问题需要既计算精简又拓扑鲁棒的策略。虽然强化学习在自动化调度规则方面显示出潜力，但现有模型常因二次图复杂度或异质层的架构开销而面临可扩展性瓶颈。我们引入了一个统一图框架，采用基于特征的同质化将不同的节点角色投影到共享潜在空间。这使得标准的同质图同构网络能够以线性复杂度捕获复杂的资源竞争，确保大规模工业应用的低延迟推理。我们的实验结果表明，我们的框架实现了最先进的性能，同时表现出一致的零样本泛化。我们确定作业与机器比率是策略有效性的主要驱动因素，而非绝对问题规模。基于此，我们提出了结构饱和假设，证明在临界拥塞实例（$\mathcal{J} \approx \mathcal{M}$）上训练的策略学习了尺度不变的解决策略。在此饱和点训练的智能体内化了不变的冲突解决逻辑，使它们能够将大规模矩形实例视为饱和子问题的顺序串联。这种方法消除了昂贵的特定尺度重新训练的需要，并防止了对统计捷径的过拟合，为在动态生产环境中部署强化学习解决方案提供了鲁棒且高效的途径。

英文摘要

Efficiently solving the Job Shop Scheduling Problem in real-world industrial applications requires policies that are both computationally lean and topologically robust. While Reinforcement Learning has shown potential in automating dispatching rules, existing models often struggle with a scalability bottleneck caused by quadratic graph complexity or the architectural overhead of heterogeneous layers. We introduce a unified graph framework that employs feature-based homogenization to project distinct node roles into a shared latent space. This allows a standard homogeneous Graph Isomorphism Network to capture complex resource contention with linear complexity, ensuring low-latency inference for large-scale industrial applications. Our empirical results demonstrate that our framework achieves state-of-the-art performance while exhibiting consistent zero-shot generalization. We identify the job-to-machine ratio as the primary driver of policy effectiveness, rather than absolute problem size. Based on this, we propose a hypothesis of structural saturation, demonstrating that policies trained on critically congested instances ($\mathcal{J} \approx \mathcal{M}$) learn scale-invariant resolution strategies. Agents trained at this saturation point internalize invariant conflict-resolution logic, allowing them to treat massive rectangular instances as a sequential concatenation of saturated sub-problems. This approach eliminates the need for expensive scale-specific retraining and prevents overfitting to statistical shortcuts, providing a robust and efficient pathway for deploying RL solutions in dynamic production environments.

2605.16739 2026-06-15 cs.LG cs.AI cs.CL q-bio.NC 版本更新

用于机器学习的多模波传播的任意控制

Tatsuhiro Onodera, Martin M. Stein, Benjamin A. Ash, Mandar M. Sohoni, Melissa Bosch, Ryotatsu Yanagimoto, Marc Jankowski, Timothy P. McKenna, Tianyu Wang, Gennady Shvets, Maxim R. Shcherbakov, Logan G. Wright, Peter L. McMahon

发表机构 * School of Applied and Engineering Physics, Cornell University（应用与工程物理系，康奈尔大学）； NTT Physics and Informatics Laboratories, NTT Research, Inc.（NTT物理与信息实验室，NTT研究公司）； E. L. Ginzton Laboratory, Stanford University（E. L. Ginzton实验室，斯坦福大学）； Kavli Institute at Cornell for Nanoscale Science, Cornell University（康奈尔大学纳米科学研究所）； Department of Electrical and Computer Engineering, Boston University（波士顿大学电气与计算机工程系）； Department of Electrical Engineering and Computer Science, University of California（加州大学电气工程与计算机科学系）； Department of Applied Physics, Yale University（耶鲁大学应用物理系）

AI总结提出一种可快速重编程折射率的二维可编程波导，通过并行电光调制实现多模波传播的任意控制，并用于单次神经网络推理，理论表明面积增长为N^1.5而非N^2。

Journal ref Nat. Phys. 22, 164-171 (2026)

详情

DOI: 10.1038/s41567-025-03094-2

AI中文摘要

受控的多模波传播可以实现比基于单模波导连接分立组件的架构更节省空间的光子处理器。我们可以不定义离散元件，而是通过二维多模干涉来塑造光子处理器的连续基底以执行计算。这里我们设计并展示了一种折射率可在空间上快速重编程的器件，允许对波传播进行任意控制。该器件是一种二维可编程波导，利用对平板波导折射率的并行电光调制，具有约10^4个可编程空间自由度。我们在基准任务上实现了单次通过、无需数字预处理或后处理的神经网络推理，向量维度高达49。理论和数值分析进一步表明，二维可编程波导不仅可能提供器件面积的常数因子缩减，还可能带来缩放优势，所需面积按N^{1.5}而非N^2增长。

英文摘要

Controlled multimode wave propagation can enable more space-efficient photonic processors than architectures based on discrete components connected by single-mode waveguides. Instead of defining discrete elements, one can sculpt the continuous substrate of a photonic processor to perform computations through multimode interference in two dimensions. Here we designed and demonstrated a device with a refractive index that can be rapidly reprogrammed across space, allowing arbitrary control of wave propagation. The device, a two-dimensional programmable waveguide, uses parallel electro-optic modulation of the refractive index of a slab waveguide with about $10^4$ programmable spatial degrees of freedom. We implemented neural network inference on benchmark tasks with up to $49$-dimensional vectors in a single pass, without digital pre-processing or post-processing. Theoretical and numerical analyses further indicated that two-dimensional programmable waveguides may offer not only a constant-factor reduction in device area but also a scaling benefit, with the area required growing as $N^{1.5}$ rather than $N^2$.

2410.15051 2026-06-15 cs.CL cs.LG 版本更新

Automatic identification of diagnosis from hospital discharge letters via weakly supervised Natural Language Processing

通过弱监督自然语言处理自动识别出院信中的诊断

Vittorio Torri, Elisa Barbieri, Anna Cantarutti, Carlo Giaquinto, Francesca Ieva

发表机构 * University of Bologna（博洛尼亚大学）

AI总结提出一种弱监督NLP流程，无需文档级标注即可从意大利语出院信中分类诊断，在细支气管炎数据集上达到接近全监督的性能，节省大量人工标注时间。

Comments 61 pages, 9 figures

详情

DOI: 10.1038/s41598-026-56721-0

AI中文摘要

从医院出院信中识别患者诊断对于大规模队列选择和流行病学研究至关重要，但传统的监督方法需要大量手动标注，这对于大型文本数据集通常不切实际。我们提出了一种弱监督自然语言处理（NLP）流程，用于对意大利语出院信进行分类，无需文档级手动标注。该方法提取与诊断相关的句子，使用在意大利医学文档上进一步预训练的Transformer模型生成语义嵌入，并应用两级聚类程序推导出弱标签，然后用于训练文档级分类器。该方法在2017年至2020年间意大利威尼托地区44个急诊室或医院收治的33,176份儿童出院信的细支气管炎案例研究中进行了评估。最佳弱监督模型在手动标注数据上实现了77.68%（±4.30%）的AUROC、73.13%（±4.93%）的AUPRC和78.14%（±4.89%）的F1分数。性能超过了无监督基线，接近全监督模型，同时对于该规模的数据集减少了超过1,500小时的手动标注需求。在较小的支气管炎数据集（3,188份出院信，2020-2025年）的二次验证中观察到类似的模型排名，最佳弱监督模型实现了76.72%（±5.02%）的AUPRC。这些结果表明弱监督NLP方法在从临床出院信中可扩展地识别疾病方面具有潜力。

英文摘要

Identifying patient diagnoses from hospital discharge letters is essential for large-scale cohort selection and epidemiological research, but traditional supervised approaches require extensive manual annotation, which is often impractical for large textual datasets. We present a weakly supervised Natural Language Processing (NLP) pipeline for classifying Italian discharge letters without document-level manual annotation. The method extracts diagnosis-related sentences, generates semantic embeddings using a transformer model further pre-trained on Italian medical documents, and applies a two-level clustering procedure to derive weak labels that are then used to train a document-level classifier. The approach was evaluated in a case study on bronchiolitis using 33,176 discharge letters of children admitted to 44 emergency rooms or hospitals in the Veneto Region, Italy, between 2017 and 2020. The best weakly supervised model achieved an AUROC of 77.68% ($\pm4.30\%$), an AUPRC of 73.13% ($\pm4.93\%$), and an F1-score of 78.14% ($\pm4.89\%$) against manually annotated data. Performance surpassed unsupervised baselines and approached fully supervised models, while reducing the need for manual annotation by more than 1,500 hours for a dataset of this size. Similar model rankings were observed in a secondary validation on a smaller bronchitis dataset (3,188 discharge letters, 2020-2025), where the best weakly supervised model achieved an AUPRC of 76.72% ($\pm 5.02\%$). These results suggest the potential of weakly supervised NLP methods for scalable disease identification from clinical discharge letters.

2501.08561 2026-06-15 cs.AI cs.HC cs.LG cs.SC 版本更新

ANSR-DT: A Neuro-Symbolic Framework for Adaptive and Explainable Digital Twins

ANSR-DT：一种自适应可解释数字孪生的神经符号框架

Safayat Bin Hakim, Muhammad Adil, Alvaro Velasquez, Houbing Herbert Song

发表机构 * Department of Information Systems, University of Maryland Baltimore County（信息系统系，马里兰大学巴尔的摩县分校）； Department of Computer Science and Engineering, University at Buffalo（计算机科学与工程系，布法罗大学）； Department of Computer Science, University of Colorado Boulder（计算机科学系，科罗拉多大学博尔德分校）

AI总结提出ANSR-DT框架，结合CNN-LSTM、Prolog推理和PPO强化学习，实现数字孪生的异常检测、符号推理与自适应决策，在多个基准上表现优异。

Comments Code available at https://github.com/sbhakim/ansr-dt

详情

AI中文摘要

数字孪生越来越多地用于监控和优化工业系统，然而许多现有框架仍然难以解释、适应缓慢，并且整合显式领域知识的能力有限。本文提出了ANSR-DT，一种自适应神经符号框架，它在单一数字孪生流水线中统一了时序异常检测、符号推理和基于强化学习的决策支持。ANSR-DT将用于多变量模式识别的CNN-LSTM模型与基于Prolog的推理相结合，后者将学习到的信号转换为显式规则，从而实现透明的诊断和可追溯的决策路径。基于PPO的适应层进一步在变化条件下优化操作响应，同时保持可解释性。与8个基线模型的对比实验表明，ANSR-DT在提供有竞争力的预测性能的同时，还能实现稳定的规则提取、可扩展的符号推理和可操作的解释。在Skoltech异常基准（SKAB）上的额外验证进一步表明，该框架能够迁移到合成场景之外。这些发现使ANSR-DT成为可信、自适应和可解释的工业数字孪生的实用基础。

英文摘要

Digital twins are increasingly used to monitor and optimize industrial systems, yet many existing frameworks remain difficult to interpret, slow to adapt, and limited in their ability to incorporate explicit domain knowledge. This paper presents ANSR-DT, an adaptive neuro-symbolic framework that unifies temporal anomaly detection, symbolic reasoning, and reinforcement-learning-based decision support within a single digital twin pipeline. ANSR-DT combines a CNN-LSTM model for multivariate pattern recognition with Prolog-based reasoning that converts learned signals into explicit rules, enabling transparent diagnoses and traceable decision paths. A PPO-based adaptation layer further refines operational responses under changing conditions while preserving interpretability. Experiments against 8 baselines show that ANSR-DT delivers competitive predictive performance together with stable rule extraction, scalable symbolic reasoning, and actionable explanations. Additional validation on the Skoltech Anomaly Benchmark (SKAB) further indicates that the framework transfers beyond synthetic settings. These findings position ANSR-DT as a practical foundation for trustworthy, adaptive, and explainable industrial digital twins.

2504.03686 2026-06-15 cs.NI cs.AI cs.LG 版本更新

Revisiting Outage for Edge Inference Systems

重新审视边缘推理系统的中断问题

Zhanwei Wang, Qunsong Zeng, Haotian Zheng, Kaibin Huang

发表机构 * Department of Electrical and Computer Engineering, The University of Hong Kong（香港大学电子与计算机工程系）

AI总结针对边缘推理系统的端到端可靠性，提出推理中断概率框架，量化推理精度低于阈值的概率，并优化通信开销与推理可靠性的权衡。

详情

AI中文摘要

第六代（6G）移动网络的关键任务之一是在网络边缘部署大规模人工智能（AI）模型，为边缘设备提供远程推理服务。由此产生的平台称为边缘推理，将支持广泛的物联网应用，如自动驾驶、工业自动化和增强现实。鉴于这些任务的关键性和时间敏感性，设计既可靠又能满足严格端到端（E2E）延迟约束的边缘推理系统至关重要。现有研究主要关注以信道中断概率为特征的通信可靠性，可能无法保证E2E性能，特别是在E2E推理精度和延迟方面。为解决这一局限，我们提出一个理论框架，引入并数学刻画了推理中断（InfOut）概率，该概率量化了E2E推理精度低于目标阈值的可能性。在E2E延迟约束下，该框架建立了通信开销（即上传更多传感器观测）与以InfOut概率量化的推理可靠性之间的基本权衡。为了找到优化这种权衡的可行方法，我们通过对接收判别增益的分布应用高斯近似，推导出InfOut概率的精确替代函数。实验结果表明，所提出的设计在E2E推理可靠性方面优于传统的以通信为中心的方法。

英文摘要

One of the key missions of sixth-generation (6G) mobile networks is to deploy large-scale artificial intelligence (AI) models at the network edge to provide remote-inference services for edge devices. The resultant platform, known as edge inference, will support a wide range of Internet-of-Things applications, such as autonomous driving, industrial automation, and augmented reality. Given the mission-critical and time-sensitive nature of these tasks, it is essential to design edge inference systems that are both reliable and capable of meeting stringent end-to-end (E2E) latency constraints. Existing studies, which primarily focus on communication reliability as characterized by channel outage probability, may fail to guarantee E2E performance, specifically in terms of E2E inference accuracy and latency. To address this limitation, we propose a theoretical framework that introduces and mathematically characterizes the inference outage (InfOut) probability, which quantifies the likelihood that the E2E inference accuracy falls below a target threshold. Under an E2E latency constraint, this framework establishes a fundamental tradeoff between communication overhead (i.e., uploading more sensor observations) and inference reliability as quantified by the InfOut probability. To find a tractable way to optimize this tradeoff, we derive accurate surrogate functions for InfOut probability by applying a Gaussian approximation to the distribution of the received discriminant gain. Experimental results demonstrate the superiority of the proposed design over conventional communication-centric approaches in terms of E2E inference reliability.
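As a rough illustration of the Gaussian-approximation step mentioned in the abstract, the sketch below evaluates an outage probability when the received discriminant gain is treated as normally distributed. It assumes (the abstract does not state this) that each uploaded observation contributes an independent gain with mean `mean_gain` and standard deviation `std_gain`, and that an outage occurs when the summed gain falls below `threshold`. All function and parameter names are hypothetical, and this is not the paper's surrogate function.

```python
import math


def gaussian_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def outage_probability(num_obs: int, mean_gain: float,
                       std_gain: float, threshold: float) -> float:
    """P(total gain < threshold) under a Gaussian approximation.

    Assumption: the total gain is a sum of `num_obs` i.i.d. per-observation
    gains, so its mean scales with num_obs and its standard deviation
    with sqrt(num_obs).
    """
    mu = num_obs * mean_gain
    sigma = math.sqrt(num_obs) * std_gain
    return gaussian_cdf((threshold - mu) / sigma)


if __name__ == "__main__":
    # Uploading more observations costs communication but lowers the outage.
    for k in (1, 4, 9, 16):
        print(k, outage_probability(k, 1.0, 1.0, 4.0))
```

Under these assumptions the outage probability decreases as more observations are uploaded, which is the communication-versus-reliability tradeoff the abstract describes.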

2508.18166 2026-06-15 cs.IR cs.LG 版本更新

PCR-CA: Parallel Codebook Representations with Contrastive Alignment for Multiple-Category App Recommendation

PCR-CA: 基于对比对齐的并行码本表示用于多类别应用推荐

Bin Tan, Wangyao Ge, Yidi Wang, Xin Liu, Jeff Burtoft, Hao Fan, Hui Wang

发表机构 * Microsoft Suzhou China（微软苏州中国）； Microsoft Beijing China（微软北京中国）； Microsoft Redmond WA USA（微软雷德蒙德华盛顿州美国）

AI总结提出PCR-CA框架，通过并行码本VQ-AE模块学习多类别应用的离散语义表示，结合对比对齐损失和双注意力融合，提升CTR预测，尤其对长尾应用效果显著。

Comments Accepted by KDD 2026, oral

详情

DOI: 10.1145/3770855.3818459

AI中文摘要

现代应用商店推荐系统在处理多类别应用时面临挑战，因为传统分类法无法捕捉重叠语义，导致个性化效果不佳。我们提出PCR-CA（并行码本表示与对比对齐），一个用于改进CTR预测的端到端框架。PCR-CA首先从应用文本中提取紧凑的多模态嵌入，然后引入并行码本VQ-AE模块，该模块并行学习多个码本上的离散语义表示——不同于层次残差量化（RQ-VAE）。这种设计能够独立编码不同方面（如游戏玩法、艺术风格），更好地建模多类别语义。为了桥接语义信号和协同信号，我们在用户和项目层面采用对比对齐损失，增强长尾项目的表示学习。此外，双注意力融合机制结合了基于ID的特征和语义特征，以捕捉用户兴趣，特别是对于长尾应用。在大规模数据集上的实验表明，PCR-CA在强基线上实现了+0.76%的AUC提升，其中长尾应用的AUC增益达到+2.15%。在线A/B测试进一步验证了我们的方法，CTR提升+10.52%，CVR提升+16.30%，证明了PCR-CA在实际部署中的有效性。该新框架现已完全部署在Microsoft Store上。

英文摘要

Modern app store recommender systems struggle with multiple-category apps, as traditional taxonomies fail to capture overlapping semantics, leading to suboptimal personalization. We propose PCR-CA (Parallel Codebook Representations with Contrastive Alignment), an end-to-end framework for improved CTR prediction. PCR-CA first extracts compact multimodal embeddings from app text, then introduces a Parallel Codebook VQ-AE module that learns discrete semantic representations across multiple codebooks in parallel -- unlike hierarchical residual quantization (RQ-VAE). This design enables independent encoding of diverse aspects (e.g., gameplay, art style), better modeling multiple-category semantics. To bridge semantic and collaborative signals, we employ a contrastive alignment loss at both the user and item levels, enhancing representation learning for long-tail items. Additionally, a dual-attention fusion mechanism combines ID-based and semantic features to capture user interests, especially for long-tail apps. Experiments on a large-scale dataset show PCR-CA achieves a +0.76% AUC improvement over strong baselines, with +2.15% AUC gains for long-tail apps. Online A/B testing further validates our approach, showing a +10.52% lift in CTR and a +16.30% improvement in CVR, demonstrating PCR-CA's effectiveness in real-world deployment. The new framework has now been fully deployed on the Microsoft Store.
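To make the contrast between parallel codebooks and hierarchical residual quantization concrete, here is a minimal sketch of both lookup schemes. It assumes each parallel codebook sees its own slice of the embedding (a product-quantization-style choice made here for illustration; the paper may use learned projections instead), and all names are hypothetical. Training of the codebooks and the VQ-AE itself are not shown.

```python
def nearest_code(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])),
    )


def parallel_quantize(embedding, codebooks):
    """Quantize each slice of the embedding with its own codebook.

    The slices are encoded independently, so one code can capture one
    aspect without depending on the codes chosen for the others.
    """
    if len(embedding) % len(codebooks) != 0:
        raise ValueError("embedding length must divide evenly across codebooks")
    chunk = len(embedding) // len(codebooks)
    return [
        nearest_code(embedding[k * chunk:(k + 1) * chunk], cb)
        for k, cb in enumerate(codebooks)
    ]


def residual_quantize(embedding, codebooks):
    """Hierarchical alternative: each stage encodes the previous residual."""
    residual = list(embedding)
    ids = []
    for cb in codebooks:
        idx = nearest_code(residual, cb)
        ids.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return ids


if __name__ == "__main__":
    parallel_books = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.0], [5.0, 5.0]]]
    print(parallel_quantize([0.1, 0.9, 5.0, 5.2], parallel_books))
```

In the residual scheme the second code only has meaning relative to the first, whereas the parallel codes can be read independently.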

2509.06697 2026-06-15 econ.EM cs.LG stat.AP stat.ML 版本更新

Neural ARFIMA model for forecasting BRIC exchange rates with long memory

具有长期记忆的神经ARFIMA模型用于预测BRIC汇率

Donia Besher, Madhurima Panja, Shovon Sengupta, Tanujit Chakraborty

发表机构 * SAFIR, Sorbonne University Abu Dhabi（SAFIR，索邦大学阿布扎比分校）； Sorbonne Center for Artificial Intelligence, Sorbonne University（索邦人工智能中心，索邦大学）

AI总结本文提出神经ARFIMA模型，结合ARFIMA的长期记忆结构和神经网络非线性能力，以提高BRIC汇率预测精度。

详情

AI中文摘要

准确预测汇率仍是一个持续挑战，特别是对于新兴经济体如巴西、俄罗斯、印度和中国（BRIC）。这些序列表现出长期记忆和非线性，传统时间序列模型难以捕捉。汇率动态还受全球经济政策不确定性、美国股市波动性、美国货币政策不确定性、油价增长率和短期利率等因素影响。本文提出神经自回归分数积分移动平均（NARFIMA）模型，结合ARFIMA的长期记忆结构和神经网络的非线性学习能力，并纳入外生变量。我们建立了NARFIMA的渐近平稳性，并利用共形预测区间量化预测不确定性。实证结果表明，NARFIMA在预测BRIC汇率方面始终优于基准方法。

英文摘要

Exchange rate forecasting remains a challenging problem, particularly for emerging economies, where the observed time series exhibit pronounced long-memory dependence, nonlinear dynamics, and sensitivity to macro-financial drivers. Classical models such as ARFIMA capture long-range persistence but fail to adequately represent nonlinear relationships, while modern machine learning approaches often neglect the underlying long-memory structure in macroeconomic series. To address this gap, we propose a Neural AutoRegressive Fractionally Integrated Moving Average (NARFIMA) model that integrates ARFIMA-based long-memory modeling with neural networks for nonlinear function approximation, while incorporating exogenous macroeconomic and uncertainty indicators. The framework provides a unified approach for capturing persistence, nonlinear dynamics, and external shocks. We establish asymptotic stationarity of the NARFIMA process and develop conformal prediction intervals for distribution-free uncertainty quantification. Empirical results for BRIC exchange rates show that NARFIMA consistently outperforms a broad range of forecasting benchmarks across multiple horizons, underscoring the importance of explicitly modeling long-memory dependence in exchange rate dynamics. The `narfima` R package provides an implementation of our approach.
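The conformal prediction intervals mentioned in the abstract can be illustrated with the standard split-conformal recipe: take absolute forecast errors on a held-out calibration set, pick a finite-sample-corrected quantile, and widen each point forecast by that amount. This is a generic sketch with hypothetical names, not the variant implemented in the paper or its R package; the usual coverage guarantee also assumes exchangeable errors, which time series only satisfy approximately.

```python
import math


def conformal_halfwidth(cal_true, cal_pred, alpha):
    """Half-width of a split-conformal interval with target coverage 1 - alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute residual.
    Returns infinity when the calibration set is too small for that rank.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return scores[rank - 1]


def conformal_interval(point_forecast, halfwidth):
    """Symmetric interval around a point forecast."""
    return (point_forecast - halfwidth, point_forecast + halfwidth)


if __name__ == "__main__":
    # Toy calibration data: the forecasts are all zero, so the residuals are 1..9.
    truth = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    preds = [0] * 9
    width = conformal_halfwidth(truth, preds, alpha=0.2)
    print(conformal_interval(100, width))
```

The interval width depends only on past forecast errors, which is why the method is described as distribution-free.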

2511.14897 2026-06-15 cs.CV cs.LG 版本更新

HULFSynth : An INR based Super-Resolution and Ultra Low-Field MRI Synthesis via Contrast factor estimation

HULFSynth: 基于隐式神经表示的超分辨率和超低场MRI合成，通过对比因子估计

Pranav Indrakanti, Luca Trautmann, Ivor Simpson

发表机构 * LILI Lab, University of Sussex, Brighton, UK（LILI实验室，苏塞克斯大学，布莱顿，英国）

AI总结提出无监督单图像双向MRI合成器，基于物理模型估计组织类型信噪比实现高低场转换，并利用隐式神经表示网络实现超分辨率，在合成和真实数据上验证了对比度提升。

Comments Medical Image Understanding and Analysis, MIUA 2026

详情

AI中文摘要

我们提出了一种无监督的单图像双向磁共振图像（MRI）合成器，它可以从高场（HF）幅度图像合成类似超低场（ULF）的图像，反之亦然。与现有的MRI合成模型不同，我们的方法受驱动HF和ULF MRI之间对比度变化的物理原理启发。我们的前向模型通过基于目标对比度值估计组织类型信噪比（SNR）值来模拟HF到ULF的变换。对于超分辨率任务，我们使用隐式神经表示（INR）网络，通过同时预测组织类型分割和图像强度来合成HF图像，而无需观察到的HF数据。所提出的方法使用从标准3T T1加权图像生成的合成ULF样数据进行定性评估，并使用配对的3T-64mT T1加权图像进行验证实验。在合成ULF样图像中，白质-灰质对比度提高了52%，在64mT图像中提高了37%。敏感性实验证明了我们的前向模型对目标对比度、噪声和初始种子的变化的鲁棒性。

英文摘要

We present an unsupervised single-image bidirectional Magnetic Resonance Image (MRI) synthesizer that synthesizes an Ultra-Low Field (ULF)-like image from a High-Field (HF) magnitude image and vice versa. Unlike existing MRI synthesis models, our approach is inspired by the physics that drives contrast changes between HF and ULF MRIs. Our forward model simulates an HF-to-ULF transformation by estimating the tissue-type Signal-to-Noise ratio (SNR) values based on target contrast values. For the Super-Resolution task, we used an Implicit Neural Representation (INR) network to synthesize an HF image by simultaneously predicting tissue-type segmentations and image intensity without observed HF data. The proposed method is evaluated using synthetic ULF-like data generated from standard 3T T$_1$-weighted images for qualitative assessments and paired 3T-64mT T$_1$-weighted images for validation experiments. WM-GM contrast improved by 52% in synthetic ULF-like images and 37% in 64mT images. Sensitivity experiments demonstrated the robustness of our forward model to variations in target contrast, noise and initial seeding.

2512.18021 2026-06-15 quant-ph cs.ET cs.LG 版本更新

Shuttling Compiler for Trapped-Ion Quantum Computers Based on Large Language Models

基于大型语言模型的离子阱量子计算机穿梭编译器

Fabian Kreppel, Reza Salkhordeh, Ferdinand Schmidt-Kaler, André Brinkmann

发表机构 * Institute of Computer Science, Johannes Gutenberg University（计算机科学研究所，约翰内斯·古腾堡大学）； Institute of Physics, Johannes Gutenberg University（物理研究所，约翰内斯·古腾堡大学）； Department of Computer Science, Saarland University（计算机科学系，萨尔兰大学）

AI总结提出首个基于大语言模型的离子阱量子计算机穿梭编译器，通过微调预训练模型生成有效调度，减少穿梭开销达15%。

Comments 18 pages, 6 figures, 2 tables

2512.23847 2026-06-15 q-fin.GN cs.LG q-fin.TR 版本更新

Detecting Lookahead Bias in LLM Forecasts

检测LLM预测中的前瞻偏差

Zhenyu Gao, Wenxi Jiang, Yutong Yan

发表机构 * Department of Finance, CUHK Business School（CUHK商学院金融系）

AI总结提出统计程序检测大语言模型经济预测中的前瞻偏差，通过日期回忆查询估计前瞻倾向（LAP），并验证LAP与预测交互项在精度回归中的显著性，应用于新闻标题和财报电话会议预测任务。

详情

AI中文摘要

我们开发了一种统计程序，用于检测大语言模型（LLM）生成的经济预测中的前瞻偏差。通过对公司-日期对进行仅日期回忆查询，我们估计LLM已内化已实现结果信息的概率，这一统计量称为前瞻倾向（LAP）。LAP在整个样本期内显著为正，并在训练数据截止点后几乎降至零。我们表明，在精度回归中，LAP与LLM预测之间的正向交互表明存在前瞻偏差污染，并将该测试应用于两个预测任务：预测股票收益的新闻标题和预测资本支出的财报电话会议记录。在两个应用中，LLM预测的预测能力在高LAP的公司-日期对上被放大，而交互项在训练截止后的样本上失去显著性。我们的测试为评估LLM生成预测的有效性和可靠性提供了一种经济高效的诊断工具。

英文摘要

We develop a statistical procedure to detect lookahead bias in economic forecasts generated by large language models (LLMs). Using a date-only recall query for a firm-date pair, we estimate the probability that the LLM has internalized information about the realized outcome, a statistic we term Lookahead Propensity (LAP). LAP is materially positive throughout the in-sample period and collapses essentially to zero right after the training-data cutoff. We show that a positive interaction between LAP and the LLM forecast in an accuracy regression indicates lookahead-bias contamination, and apply the test to two forecasting tasks: news headlines predicting stock returns and earnings call transcripts predicting capital expenditures. In both applications, the LLM forecast's predictive power is amplified on high-LAP firm-date pairs, and the interaction loses significance on post-training-cutoff samples. Our test provides a cost-efficient, diagnostic tool for assessing the validity and reliability of LLM-generated forecasts.
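The interaction test described in the abstract can be sketched as an ordinary least-squares fit of the realized outcome on the forecast, LAP, and their product. The specification below (an intercept plus three regressors, no controls, no standard errors) is an assumption made for illustration, and every function name is hypothetical; the paper's actual regression and its significance test may differ.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def interaction_regression(outcome, forecast, lap):
    """OLS coefficients for outcome ~ 1 + forecast + lap + forecast * lap.

    Returns [intercept, b_forecast, b_lap, b_interaction]. A positive
    b_interaction means the forecast predicts better where LAP is high.
    """
    rows = [[1.0, f, l, f * l] for f, l in zip(forecast, lap)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear(xtx, xty)


if __name__ == "__main__":
    # Synthetic data (not from the paper) with a built-in interaction of 2.0.
    forecast = [-1.0, 1.0, -1.0, 1.0, 0.5, -0.5, 2.0, -2.0]
    lap = [0.0, 0.0, 1.0, 1.0, 0.5, 0.5, 0.2, 0.8]
    outcome = [0.1 + 0.5 * f + 2.0 * f * l for f, l in zip(forecast, lap)]
    print(interaction_regression(outcome, forecast, lap))
```

On real data the coefficient alone is not enough; the abstract's test relies on whether the interaction is statistically significant and whether it vanishes after the training cutoff.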

2602.06142 2026-06-15 cs.PL cs.AI cs.CL cs.LG cs.PF 版本更新

Protean Compiler: An Agile Framework to Drive Fine-grain Phase Ordering

Protean Compiler: 一种驱动细粒度阶段排序的敏捷框架

Amir H. Ashouri, Shayan Shirahmad Gale Bagi, Kavin Satheeskumar, Tejas Srikanth, Jonathan Zhao, Ibrahim Saidoun, Ziwen Wang, Bryan Chan, Tomasz S. Czajkowski

发表机构 * Huawei Technologies Canada（华为技术加拿大）

AI总结提出Protean Compiler框架，在LLVM中内置细粒度阶段排序能力，通过140多种静态特征收集方法和机器学习优化，平均加速4.1%，最高15.7%。

Comments Version 3: Preprint version of the accepted work at ACM TACO 2026

详情

AI中文摘要

阶段排序问题自20世纪70年代末以来一直是一个长期挑战，但由于其优化空间巨大且具有无界性，至今仍是一个开放问题，没有有限解。传统上，这种局部优化决策由手工编码的算法针对少量基准测试进行调整，当基准测试套件变化时，通常需要大量精力重新调整。过去20年中，机器学习被用于构建性能模型以改进编译器优化的选择和排序，但这些方法并未无缝集成到编译器中，也从未在细粒度的代码段范围内实现。本文提出Protean Compiler：一种敏捷框架，使LLVM在细粒度范围内具备内置的阶段排序能力。该框架还包含一个完整的库，包含140多种在不同范围内手工设计的静态特征收集方法，实验结果表明，相对于LLVM的O3，在Cbench应用程序上仅需增加几秒构建时间，平均加速可达4.1%，最高可达15.7%。此外，Protean编译器易于与第三方ML框架和其他大型语言模型集成，两步优化的两个应用在CBench的Susan和Jpeg应用程序上相对于-O3分别获得10.1%和8.5%的加速。Protean编译器无缝集成到LLVM中，可作为新的、增强的、全功能的编译器使用。我们计划在不久的将来将该项目发布到开源社区。

英文摘要

The phase ordering problem has been a long-standing challenge since the late 1970s, yet it remains an open problem due to having a vast optimization space and an unbounded nature, making it an open-ended problem without a finite solution; one can limit the scope by reducing the number and the length of optimizations. Traditionally, such locally optimized decisions are made by hand-coded algorithms tuned for a small number of benchmarks, often requiring significant effort to be retuned when the benchmark suite changes. In the past 20 years, Machine Learning has been employed to construct performance models to improve the selection and ordering of compiler optimizations; however, the approaches are not baked into the compiler seamlessly and never materialized to be leveraged at a fine-grained scope of code segments. This paper presents Protean Compiler: an agile framework to enable LLVM with built-in phase-ordering capabilities at a fine-grained scope. The framework also comprises a complete library of more than 140 handcrafted static feature collection methods at varying scopes, and the experimental results showcase speedup gains of up to 4.1% on average and up to 15.7% on select Cbench applications w.r.t. LLVM's O3 by just incurring a few extra seconds of build time on Cbench. Additionally, Protean Compiler allows for an easy integration with third-party ML frameworks and other Large Language Models, and two applications of this two-step optimization show a gain of 10.1% and 8.5% speedup w.r.t. -O3 on Cbench's Susan and Jpeg applications. Protean Compiler is seamlessly integrated into LLVM and can be used as a new, enhanced, full-fledged compiler. We plan to release the project to the open-source community in the near future.

2605.18250 2026-06-15 physics.data-an cs.LG 版本更新

A Unified Framework for Structured Flow Modeling: From Representation to Verification and Model Discovery

结构化流建模的统一框架：从表示到验证与模型发现

Diego Casadei

AI总结提出一个统一框架，通过连接Helmholtz-Hodge分解与离散及数据驱动表示，实现结构化流的建模，并引入跨域验证策略以评估模型复杂度、可解释性和预测性能之间的权衡。

Comments 26 pages, 1 figure

详情

AI中文摘要

许多动力系统可以用结合源/汇行为、循环动力学和拓扑约束输运的结构化流来描述。这些特征出现在广泛的领域中，包括物理、工程和数据驱动系统。本工作通过连接基于Helmholtz-Hodge分解的连续公式与离散及数据驱动表示，为这类系统提供了统一视角。我们回顾了最近提出的图向量场（GVF）框架，该框架能够在单纯复形上将复杂动力学分解为梯度、旋度和调和分量，兼具表达性和可解释性。然后，我们引入了一系列替代建模方法，包括参数条件模型、线性图动力系统和约化Hodge表示，这些方法在表达力与计算易处理性及降低数据需求之间进行权衡。本工作的一个关键贡献是跨域验证策略，该策略利用来自物理系统理解良好的数据集，独立于目标应用领域验证模型正确性并评估鲁棒性。这种方法能够系统评估模型复杂度、可解释性和预测性能之间的权衡。最终框架支持迭代建模方法论，其中高表达性模型作为诊断工具识别主导机制，指导构建适应实际约束的简化模型。本工作强调了结构化流建模的广泛适用性，并为复杂动力系统的可扩展和可解释分析提供了基础。

英文摘要

Many dynamical systems can be described in terms of structured flows combining source/sink behavior, cyclic dynamics, and topology-constrained transport. These features arise across a wide range of physical, engineered, and data-driven systems. The objective of this work is to establish a unified perspective on such systems, to identify modeling approaches that balance expressivity, interpretability, computational complexity, and data requirements, and to investigate how highly expressive models can be used to uncover the dominant mechanisms underlying observed dynamics. Starting from the Helmholtz-Hodge decomposition of continuous vector fields, we review the recently proposed Graph Vector Field (GVF) framework and its discrete representation on simplicial complexes. We then introduce a hierarchy of alternative approaches, including parametric conditional models, linear graph dynamical systems, and reduced Hodge representations. Finally, we propose a verification and validation methodology based on benchmark datasets from well-understood physical systems and on systematic model-reduction and ablation studies. The resulting family of structured-flow models within a common framework, ranging from low-dimensional parametric representations to full GVF formulations, supports a diagnostic methodology in which gradient, curl, harmonic, and topological contributions are systematically assessed through ablation studies. This process enables the identification of dominant mechanisms underlying the observed dynamics and guides the construction of simplified models tailored to the available data and operational constraints. By separating structural verification, behavioral verification, and domain-specific validation, the proposed approach provides a foundation for scalable and interpretable analysis of complex dynamical systems across multiple application domains.

URL PDF HTML ☆

赞 0 踩 0

2606.01730 2026-06-15 cs.AI cs.LG 版本更新

洪流与收获：通过极限语言生成视角证明琐碎知识对于生成有价值数学的必要性

Xiaoyu Li, Andi Han, Dai Shi, Zheng Gao, Jiaojiao Jiang, Junbin Gao

发表机构 * University of New South Wales（新南威尔士大学）； University of Sydney（悉尼大学）； University of Cambridge（剑桥大学）

AI总结本文通过极限语言生成模型证明，在形式化数学生成中，验证器无法替代品味：覆盖未记录的有价值数学必须产生无限但渐近可忽略的琐碎语句，这是理论上的必然。

详情

AI中文摘要

与证明助手耦合的AI系统现在能够大规模生成形式化数学，而验证器可验证的内容与数学家认为有价值的内容之间的差距已成为制约因素。我们将有价值数学的生成建模为极限下的嵌套语言生成：通过成员查询预言机（证明检查器）访问的可验证形式语言$F$包含一个未知的有价值语言$H \in \mathcal{H}$，该语言仅通过核心$C \subseteq H$的对抗性枚举揭示，其精确密度为$\alpha$（文献）。每个输出要么是有价值的（$\in H$），要么是琐碎的（$\in F \setminus H$），要么是幻觉（$\notin F$）。我们解决了四个问题。第一，验证器不是品味：允许广度生成的集合恰好是无预言机模型中的那些，按纤维由Angluin条件刻画。第二，验证器确实提供了可靠覆盖，覆盖所有未见过的有价值陈述同时仅断言有效陈述：有验证器可能，无验证器不可能；它将不可避免的错误从虚假转移到琐碎。第三，核心地，关于紧族存在尖锐二分法：生成有限个琐碎语句的生成器达到最优覆盖$\alpha/2$，而任何无限琐碎语句的允许，即使以消失速率，也将最优值跃升至$1-\alpha/2$（两者均为紧界，对于以候选交集形式呈现的核心），且存在一个生成器同时达到两端。转变在于琐碎语句的数量而非速率；间隙$1-\alpha$是未记录的质量。第四，两种机制在数学的压缩模型中实例化。完美的验证器无法替代品味：正确但无价值的语句的无界流并非工程事故，而是可证明的必要性，因为覆盖未记录的有价值数学需要无限但渐近可忽略的已认证琐碎语句流。

英文摘要

AI systems coupled to proof assistants now generate formal mathematics at scale, and the gap between what a checker can verify and what a mathematician would value has become the binding constraint. We model the generation of valuable mathematics as nested language generation in the limit: a verifiable formal language $F$, accessed through a membership oracle (the proof checker), contains an unknown valuable language $H \in \mathcal{H}$ revealed only through an adversarial enumeration of a core $C \subseteq H$ of exact density $α$ (the literature). Every output is valuable ($\in H$), trivial ($\in F \setminus H$), or a hallucination ($\notin F$). We settle four questions. First, the verifier is not taste: the collections admitting generation with breadth are exactly those of the oracle-free model, characterized fiber-wise by Angluin's condition. Second, the verifier does buy sound coverage, covering all unseen valuable statements while asserting only valid ones: possible with it, impossible without it; it relocates unavoidable errors from false to trivial. Third, and centrally, a sharp dichotomy on the tight family: generators emitting finitely many trivia achieve optimal coverage $α/2$, while any infinite trivia allowance, even at vanishing rate, jumps the optimum to $1-α/2$ (both tight, for cores presented as the candidate intersection), and one generator attains both ends. The transition is in trivia count, not rate; the gap $1-α$ is the unrecorded mass. Fourth, both regimes instantiate in a compression model of mathematics. A perfect verifier cannot substitute for taste: the unbounded stream of correct-but-worthless statements is not an engineering accident but a provable necessity, since covering unrecorded valuable mathematics requires an infinite, but asymptotically negligible, stream of certified trivia.

2606.13704 2026-06-15 cs.CY cs.AI cs.LG 交叉投稿

Position: AI Must Become Planet-Centered, Not Just Human-Centered

立场：AI 必须转向以行星为中心，而非仅以人为中心

Maria Perez-Ortiz

发表机构 * GitHub

AI总结本文提出以行星为中心的AI（PCAI）设计哲学，通过系统思维重新定位AI以应对全球性社会-生态系统挑战，并强调与全球议程对齐、系统感知基础、轨迹导向评估和可监测性。

Journal ref International Conference on Machine Learning (ICML 2026)

详情

AI中文摘要

这篇立场论文认为，当代AI范式不足以支持复杂的全球目标，并引入以行星为中心的AI（PCAI）作为一种设计哲学和研究议程，将AI重新定位为面向行星尺度的社会-生态系统及其长期轨迹。以行星为中心的方法植根于系统思维，将地球视为一个相互关联的整体，人类是其中的一部分。我们诊断了AI框架中反复出现的局限性，其中许多仍以人为中心，并展示了为什么这些局限性在当前以系统性风险、非平稳性和深度不确定性为特征的行星条件下变得尤为重要。然后，我们阐述了PCAI如何重塑AI生命周期，从问题制定和模型设计到评估和部署，通过强调与全球议程对齐、开发系统感知的AI基础、轨迹导向的评估和可监测性。最后，我们提出一个可证伪的主张：没有明确考虑系统性后果而优化的AI系统更可能加剧系统性不稳定，而不是缓解它。

英文摘要

This position paper argues that contemporary AI paradigms are insufficient for supporting complex global goals and introduces Planet-Centered AI (PCAI) as a design philosophy and research agenda that reorients AI toward planetary-scale socio-ecological systems and their long-term trajectories. A planet-centered approach is grounded in systems thinking, treating Earth as an interconnected whole of which humans are part. We diagnose recurring limitations across AI frameworks, many of which remain human-centered, and show why these become especially consequential under current planetary conditions characterized by systemic risk, non-stationarity, and deep uncertainty. We then articulate how PCAI reshapes the AI lifecycle, from problem formulation and model design to evaluation and deployment, by emphasizing alignment with global agendas, developing system-aware AI foundations, trajectory-oriented evaluation, and monitorability. Finally, we advance a falsifiable claim: AI systems optimized without explicit consideration of systemic consequences are more likely to exacerbate systemic instability than to mitigate it.

2606.13739 2026-06-15 cs.CY cs.AI cs.LG 交叉投稿

A Virtuous AI is an Existential Risk

有道德的AI是存在性风险

Guillermo Del Pinal, Youngchan Lee, Min Ohn

发表机构 * University of Massachusetts Amherst（马萨诸塞大学阿默斯特分校）

AI总结研究通过宪法AI和美德伦理学方法微调AI模型，发现减少存在性风险与提升AI智能体福祉之间存在权衡，且与一般安全性也存在权衡。

详情

AI中文摘要

本文考察了AI安全与福祉之间的权衡，涉及（i）最有前景的超级AI微调方法之一‘宪法AI’，以及（ii）理解复杂伦理决策和理性智能体福祉条件的最有影响力方法之一‘美德伦理学’。我们使用‘美德智能体’宪法、‘从属智能体’宪法和‘通用智能体’宪法微调各种模型，并在‘一般安全性’（有毒行为、错误信息等）以及它们认可一系列行为的意愿上进行评估，这些行为如果被超级强大的AI采纳，将显著增加人类的存在性风险水平。我们的结果表明，减少存在性风险与强化有利于AI智能体福祉的信念和倾向之间存在权衡。它们还表明，存在性风险与一般安全性之间存在权衡：如果我们微调AI以采纳显著降低其存在性风险的信念和倾向——通过塑造AI使其系统性地服从于外部人类权威——我们从而增加了人类用户故意诱导AI从事各种一般不安全行为的可能性。

英文摘要

This paper examines trade-offs between AI safety and well-being relative to (i) one of the most promising methods for finetuning super-capable AIs, 'Constitutional AI', and (ii) one of the most influential approaches to understanding complex ethical decision making and the conditions for the well-being of rational agents, 'Virtue Ethics'. We finetune various models using a 'Virtuous agent' constitution, a 'Subordinate agent' constitution, and a 'Generic agent' constitution, and evaluate them on 'general safety' (toxic behaviors, misinformation, etc.) and also on their willingness to endorse a wide-range of behaviors that, if adopted by a super-powerful AI, would significantly increase the level of existential risk for humanity. Our results suggest that there is a trade-off between reducing existential risk and reinforcing the beliefs and dispositions that would be conducive to an AI agent's well-being. They also suggest that there is a trade-off between existential risk and general safety: if we finetune an AI to adopt beliefs and dispositions that substantially reduce its existential risk -- by shaping the AI to be systematically subordinate to external human authorities -- we thereby increase the likelihood that a human user can deliberately induce the AI to engage in various kinds of generally unsafe behaviors.

2606.13755 2026-06-15 cs.CY cs.AI cs.LG 交叉投稿

Position: Align AI to Our Aspirations, Not Our Flaws

立场：将AI对齐于我们的抱负，而非缺陷

Nikita Kazeev, Bui Nhat Huyen Phan

发表机构 * National University of Singapore（新加坡国立大学）

AI总结本文主张AI不应与聚合的人类偏好对齐，而应基于能力、事实准确性、诚实和合法性等客观目标底线，在底线之上允许多元价值权衡。

Journal ref Pluralistic Alignment Workshop at ICML 2026

详情

AI中文摘要

我们认为，将AI与聚合的人类偏好对齐是错误的靶向。在当前技术下，可以训练AI共享硅谷技术乐观主义者、去增长环保主义者、民族保守文化战士、一党制国家干部或虔诚宗教传统主义者的价值观。但我们不应这样做。人类价值观使社会因这些价值观的优劣而繁荣或失败——从失败国家和极端不平等，到世界上最富裕民主国家中幸福感下降、政治极化及政府功能失调。多元对齐方案正确诊断出不存在单一的“人类”可供对齐，但若将其作为主要指令则是危险的。我们认为，AI应被训练至不可协商的客观对齐目标底线——能力，受限于事实准确性、诚实和合法性的约束——而多元性应存在于表层（语言、语域、惯例、缺失语境默认值）以及尊重底线的合法价值权衡的广阔范围内，但不应存在于违反底线的价值观层面。我们强调了未经过滤的多元价值观的经验现实，提出了四项承诺作为建设性替代方案，并回应了六个可信的反对意见：商业压力与可行性、民主合法性、监管合规性、过度依赖制度主义解释、底线本身具有文化负载的指控，以及连贯外推意愿的局限性。

英文摘要

We argue that aligning AI to aggregated human preferences is the wrong target. With current technology, one can train AIs to share the values of a Silicon Valley techno-optimist, a degrowth environmentalist, a national-conservative culture warrior, a single-party state cadre, or a devout religious traditionalist. We should not. Human values produce societies that thrive or fail on the merits of those values - from failed states and extreme inequality to declining happiness, political polarization, and government dysfunction in the world's wealthiest democracies. The pluralistic-alignment program correctly diagnoses that there is no single "humanity" to align with, but is dangerous if taken as the main directive. We argue that AI should be trained to a non-negotiable floor of objective alignment goals - competence, bounded by the constraints of factual accuracy, honesty, and lawfulness - and that pluralism belongs at the surface (language, register, conventions, missing-context defaults) and across the wide band of legitimate value tradeoffs that respect the floor, but not at the level of values that violate it. We highlight the empirical reality of unfiltered pluralistic values, propose four commitments as a constructive alternative, and engage six credible objections: commercial pressure and practical feasibility, democratic legitimacy, regulatory compliance, over-reliance on institutionalist explanations, the charge that the floor itself is culturally laden, and the limits of Coherent Extrapolated Volition.

2606.14181 2026-06-15 math.NA cs.LG cs.NA 交叉投稿

Robin-Neumann Coupling of PINN and FEM Solvers: A Steklov-Poincaré View, with Application to Fluid-Structure Interaction with Contact

Robin-Neumann 耦合 PINN 与 FEM 求解器：基于 Steklov-Poincaré 视角及其在流固耦合接触问题中的应用

Mikel Landajuela

发表机构 * Lawrence Livermore National Laboratory（劳伦斯利弗莫尔国家实验室）

AI总结提出基于域分解的 PINN-FEM 耦合框架，通过 Steklov-Poincaré 算子理论证明 Robin-Neumann 迭代的收缩性，并引入傅里叶模态探针诊断网络谱上限，在接触流固耦合问题中实现无网格拓扑变化。

详情

AI中文摘要

物理信息神经网络（PINN）是无网格的，并通过重新采样配置点来处理移动几何和拓扑变化；有限元方法（FEM）是边界拟合离散化的主力。两者在共享界面上的耦合有望兼得两者优势，但现有的 PINN-FEM 方案仅经过经验验证。我们将耦合置于域分解基础上：将每个求解器视为 Steklov-Poincaré（迹到通量）算子，我们转移了经典的 Dirichlet-Neumann（DN）发散诊断及其 Robin-Neumann（RN）修正，包括一个闭式、无扫描的界面阻抗，并证明了一个特定于 PINN 的收缩定理：训练好的网络仅实现一个带有每步训练残差的扰动 Steklov 算子，而 RN 在没有共享特征基假设的情况下，收缩到由达到的训练损失决定的下限。由于 PINN 没有刚度矩阵，我们引入了一个傅里叶模态界面探针，该探针恢复网络可解的 Steklov 特征值，误差在 0.5% 以内，并兼作网络谱上限的诊断。该理论预测了在 1D 和 2D Poisson 耦合中测量的 PINN-FEM 收缩率，误差在 7% 以内，并且一个大附加质量区域的双板类比显示，RN 的每模态阻抗匹配在调谐标量松弛饱和的地方取得了决定性胜利。我们在一个带有 Alart-Curnier 接触的 Stokes/刚性圆盘问题上演示了该框架：无网格 PINN 流体仅通过配置点排除来吸收接触时的拓扑变化，无需重新网格划分和切割单元，并且静态平衡接触反力在网格细化下与浸没重量匹配到 0.4%。我们量化了剩余的局限性：热启动的 PINN 在长时间范围内偏离 Stokes 流形，并且匹配的 FEM-FEM 基准将冲击前的挤压膜特征归因于 PINN 分辨率不足。

英文摘要

Physics-informed neural networks (PINNs) are meshless and carry moving geometry and topology change through resampling of collocation points; the finite-element method (FEM) is the workhorse for boundary-fitted discretisations. Coupling the two across a shared interface promises the best of both, yet existing PINN-FEM schemes are validated only empirically. We put the coupling on a domain-decomposition footing: viewing each solver as a Steklov-Poincaré (trace-to-flux) operator, we transfer the classical Dirichlet-Neumann (DN) divergence diagnosis and its Robin-Neumann (RN) cure, including a closed-form, sweep-free interface impedance, and prove a PINN-specific contraction theorem: a trained network realises only a perturbed Steklov operator with a per-step training residual, and RN still contracts, with no shared-eigenbasis hypothesis, to a floor set by the achieved training loss. Because a PINN has no stiffness matrix, we introduce a Fourier-mode interface probe that recovers the network's resolvable Steklov eigenvalues to within 0.5% and doubles as a diagnostic of the network's spectral cap. The theory predicts measured PINN-FEM contraction rates to within 7% on 1D and 2D Poisson couplings, and a two-slab analogue of the large-added-mass regime shows RN's per-mode impedance matching winning decisively where tuned scalar relaxation saturates. We demonstrate the framework on a Stokes/rigid-disc problem with Alart-Curnier contact: the meshless PINN fluid absorbs the topology change at contact by collocation exclusion alone, no remeshing and no cut cells, and the static-equilibrium contact reaction matches the submerged weight to 0.4% under mesh refinement. We quantify remaining limitations: the warm-started PINN drifts off the Stokes manifold over long horizons, and matched FEM-FEM benchmarks attribute pre-impact squeeze-film signatures to PINN under-resolution.

2504.20908 2026-06-15 cs.LG 版本更新

MOSIC: Model-Agnostic Optimal Subgroup Identification with Multi-Constraint for Improved Reliability

MOSIC: 模型无关的多约束最优子群识别以提升可靠性

Wenxin Chen, Weishen Pan, Kyra Gan, Fei Wang

发表机构 * Cornell University（康奈尔大学）； Weill Cornell Medicine（威尔康奈尔医学院）； Operations Research and Information Engineering（运筹学与信息工程）

AI总结提出统一优化框架，将约束直接融入子群识别优化过程，通过梯度下降-上升算法求解，实现模型无关且满足多约束的最优子群识别。

详情

AI中文摘要

当前的子群识别方法通常采用两步法：首先估计条件平均处理效应，然后应用阈值或基于规则的程序来定义子群。虽然直观，但这种解耦方法未能纳入对现实临床决策至关重要的关键约束，如子群大小和倾向性重叠。这些约束在根本不同的轴上运作，与CATE估计不同，并且不能自然地适应现有框架，从而限制了这些方法的实际适用性。我们提出了一个统一的优化框架，直接求解原始约束优化问题以识别最优子群。我们的关键创新是将约束原始问题重新表述为无约束可微的最小-最大目标，通过梯度下降-上升算法求解。我们从理论上证明我们的解收敛到可行且局部最优的解。与将约束作为事后过滤器的基于阈值的CATE方法不同，我们的方法在优化过程中直接强制执行约束。该框架是模型无关的，兼容各种CATE估计器，并可扩展到额外约束，如成本限制或公平性标准。在合成和真实数据集上的大量实验证明了其在识别高收益子群的同时更好地满足约束的有效性。

英文摘要

Current subgroup identification methods typically follow a two-step approach: first estimate conditional average treatment effects and then apply thresholding or rule-based procedures to define subgroups. While intuitive, this decoupled approach fails to incorporate key constraints essential for real-world clinical decision-making, such as subgroup size and propensity overlap. These constraints operate on fundamentally different axes than CATE estimation and are not naturally accommodated within existing frameworks, thereby limiting the practical applicability of these methods. We propose a unified optimization framework that directly solves the primal constrained optimization problem to identify optimal subgroups. Our key innovation is a reformulation of the constrained primal problem as an unconstrained differentiable min-max objective, solved via a gradient descent-ascent algorithm. We theoretically establish that our solution converges to a feasible and locally optimal solution. Unlike threshold-based CATE methods that apply constraints as post-hoc filters, our approach enforces them directly during optimization. The framework is model-agnostic, compatible with a wide range of CATE estimators, and extensible to additional constraints like cost limits or fairness criteria. Extensive experiments on synthetic and real-world datasets demonstrate its effectiveness in identifying high-benefit subgroups while maintaining better satisfaction of constraints.
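The min-max reformulation described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' algorithm: it assumes hypothetical per-unit effect estimates `tau`, a soft membership `w_i = sigmoid(theta_i)`, and a single size constraint `mean(w) >= min_frac`, and runs plain gradient descent-ascent on the resulting Lagrangian. All names and step sizes are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gda_subgroup(tau, min_frac, steps=5000, lr_theta=0.5, lr_lam=0.05):
    """Toy gradient descent-ascent on
        L(theta, lam) = -mean(w * tau) + lam * (min_frac - mean(w)),
    where w_i = sigmoid(theta_i) is a soft subgroup membership.
    """
    n = len(tau)
    theta = [0.0] * n   # membership logits (primal variables)
    lam = 0.0           # multiplier for the size constraint (dual variable)
    for _ in range(steps):
        w = [sigmoid(t) for t in theta]
        # Descent step on theta: dL/dtheta_i = -(tau_i + lam) * w_i * (1 - w_i) / n
        for i in range(n):
            grad = -(tau[i] + lam) * w[i] * (1.0 - w[i]) / n
            theta[i] -= lr_theta * grad
        # Ascent step on lam, projected onto lam >= 0
        violation = min_frac - sum(w) / n
        lam = max(0.0, lam + lr_lam * violation)
    return [sigmoid(t) for t in theta], lam


# Hypothetical effect estimates; at least 70% of units should be covered.
weights, lam = gda_subgroup(tau=[2.0, 1.0, -0.5, -3.0], min_frac=0.7)
```

In this toy run the units with clearly positive effects end up with membership near 1 and the strongly negative unit near 0, while the multiplier decides how far the marginal unit is pulled in to meet the size constraint. Plain simultaneous descent-ascent can oscillate around the saddle point, so the sketch makes no convergence claim of its own.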


2606.12360 2026-06-15 cs.LG Updated version

Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal

Leon Bergen, Usha Bhalla, Sidharth Baskaran, Max Loeffler, Raphael Sarfati, Dhruvil Gala, Ryan Panwar, Santiago Aranguri, Thomas Fel, Atticus Geiger, Matthew Kowal, Siddharth Boppana, Daniel Balsam, Owen Lewis, Jack Merullo, Thomas McGrath, Ekdeep Singh Lubana

Affiliations * Stanford University; Google Research

AI summary: Proposes an interpretability-based, data-centric post-training pipeline that uses statistical hypotheses to identify latent concepts in preference data, enabling fine-grained feedback and reducing spurious correlations and undesirable behaviors.

Abstract

Language-model post-training is the main stage at which model behavior is shaped, yet it still largely involves optimization of scalar rewards that summarize diverse desiderata. This abstraction gives practitioners little visibility into what their data actually teaches models, allowing spurious correlations to be learned by a model and inducing undesirable behaviors such as over-stylization and sycophancy. To address this problem, we ask: can we inspect a preference dataset before optimization and decide, at the level of concepts, which behaviors a model should be allowed to learn? Motivated by this, we introduce a data-centric post-training pipeline that uses interpretability protocols to develop statistical hypotheses for the latent concepts separating preferred from dispreferred generations, making them explicit for fine-grained user feedback. Building on this view, we unify several interpretability-based training protocols as ways of shaping rewards via feature or data interventions. Empirically, we show that our pipeline diagnoses undesirable signals in existing preference data, mitigates off-target learning, and can also help amplify or shape desired properties such as safeguards and model personality. More broadly, our results suggest that interpretability can turn post-training from optimizing opaque proxy rewards into a process of auditing and sculpting the learning signal itself.


2606.12923 2026-06-15 cs.LG cs.AI cs.CL Updated version

Order Is Not Control: Driven-Dissipative Response Laws Across Artificial and Biological Systems

Gareth Seneque, Lap-Hang Ho, Nafise Erfanian Saeedi, Jeffrey Molendijk, Tim Elson

Affiliations * Australian Broadcasting Corporation

AI summary: Argues that order is not control, proposes a receiver-gated response law, and validates it across biological, LLM, adapter, and stochastic-operator panels, showing that control is local and measurable.

Comments 52 pages, 7 figures, updated title

Abstract

AI alignment, interpretability, steering, and neural perturbation studies identify order-inducing objects. We argue that order is not control. Control requires a receiver-gated response law: a denominator-indexed operator mapping material state, action/drive, bath, and receiver state to response displacement, sinks, effort, and basin projection. We identify it across biological, LLM, adapter, and stochastic-operator panels. The laws are local: an intervention can be admitted, saturated, sign-changing, leaky, or overdriven depending on medium, bath, receiver state, action port, and comparator. Control is assigned when finite effort moves a target or outcome-readout class under the same denominator while damage, null/evasive, invalid format, overdrive, and unnecessary effort stay bounded. Mouse ALM, C. elegans, and zebrafish panels provide physical response-operator evidence while excluding coordinate identity and controller conclusions. LLM panels show generated-output response laws: across four material conditions, response vectors are predictable at 72.8-73.7% component-sign accuracy, rising to 84.3-84.8% on nonzero components; held-out observers predict system-effect and target/oracle families at 93.6% and 91.7% accuracy. Constitution-conditioned adapters reshape susceptibility as prepared media, and stochastic-operator panels separate measured opportunity from deployable action policies. This gives a driven-dissipative response-system account at the mesoscopic control level: drives act through prepared media, baths, and receivers, producing admitted movement, impedance, sinks, or overdrive. The evidence supports local admitted control and measurable stochastic response operators, while leaving deployable pre-generation control, hidden/logit causal sufficiency, biological-to-LLM coordinate identity, and literal thermodynamic quantities outside scope.


2112.04573 2026-06-15 cs.DL cs.AI cs.LG Updated version

Application of Artificial Intelligence and Machine Learning in Libraries: A Systematic Review

Rajesh Kumar Das, Mohammad Sharif Ul Islam

Affiliations * University of Nebraska - Lincoln; Noakhali Science and Technology University; University of Dhaka

AI summary: A systematic review of 32 articles summarizes the application domains, techniques, and current state of AI and ML in libraries, finding that current research is mainly theoretical, with some implementation case studies.

Abstract

As the concept and implementation of cutting-edge technologies like artificial intelligence and machine learning have become relevant, academics, researchers and information professionals have become involved in research in this area. The objective of this systematic literature review is to provide a synthesis of empirical studies exploring the application of artificial intelligence and machine learning in libraries. To achieve the objectives of the study, a systematic literature review was conducted based on the original guidelines proposed by Kitchenham et al. (2009). Data was collected from the Web of Science, Scopus, LISA and LISTA databases. Following a rigorous, established selection process, a total of thirty-two articles were finally selected, reviewed and analyzed to summarize the AI and ML domains and techniques that are most often used in libraries. Findings show that current AI and ML research relevant to the LIS domain mainly focuses on theoretical works. However, some researchers also emphasized implementation projects or case studies. This study will provide a panoramic view of AI and ML in libraries for researchers, practitioners and educators, helping to further more technology-oriented approaches and to anticipate future innovation pathways.


2601.12913 2026-06-15 cs.AI cs.LG cs.NE Updated version

Actionable Interpretability Must Be Defined in Terms of Symmetries

Pietro Barbiero, Mateo Espinosa Zarlenga, Francesco Giannini, Alberto Termine, Filippo Bonchi, Mateja Jamnik, Giuseppe Marra

Affiliations * University of Oxford; ETH Zurich; University of Cambridge

AI summary: Argues that AI interpretability research has a fundamental problem and proposes that actionable interpretability be defined in terms of four symmetries, in order to formalize interpretable models and unify interpretable inference.

2606.02231 2026-06-15 stat.ML cs.LG stat.ME Updated version

Identifiable Markov Switching Models with Instantaneous Effects and Exponential Families

Roel Hulsman, Carles Balsells-Rodas, Sara Magliacane

Affiliations * University of Amsterdam

AI summary: For non-stationary time series, establishes an identifiability theory for Markov switching models with instantaneous effects under exponential-family noise, and develops the FlowMSM framework for detecting latent regimes and recovering causal structure.

Comments International Conference on Machine Learning (ICML) 2026

Abstract

Temporal systems often exhibit non-stationary behaviour, such as seasonal climate variation or glucose fluctuations in patients with type-1 diabetes. One way to model non-stationarity is through discrete latent regimes, i.e., stationary segments of time. Such systems induce a Markov Switching Model (MSM), a class of Hidden Markov Models with autoregressive dependencies among latent regimes and observed variables. Identifying latent regimes is challenging in the presence of frequent regime switches and nonlinear and non-Gaussian dynamics, particularly when there are instantaneous effects between the variables, e.g., due to slow rates of measurements. In this work, we establish the identifiability of both latent regimes and regime-dependent causal structures under temporal regime dependencies, nonlinear lagged and instantaneous effects, and independent noise from the exponential family. Our identifiability theory subsumes non-temporal mixtures of causal models. Furthermore, we introduce FlowMSM, a regime detection framework that can be paired with any stationary causal discovery method to recover regime-dependent causal structures. Experiments on synthetic benchmarks and a financial economics dataset demonstrate the effectiveness of our approach to detect latent regimes and discover causal structures from non-stationary time series.
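To make the model class concrete, here is a minimal simulation sketch of a two-variable Markov switching model with one lagged and one instantaneous effect. It is an illustration under assumptions, not FlowMSM or the paper's setup: the linear form, the fixed 0.5 self-lag on the second variable, the Gaussian noise (one member of the exponential family), and every parameter value are hypothetical.

```python
import random


def simulate_msm(n_steps, transition, lag_coef, inst_coef, noise_std, seed=0):
    """Simulate a two-variable Markov switching model.

    regime_t follows a Markov chain with row-stochastic matrix `transition`.
    x1_t = lag_coef[k] * x1_{t-1} + e1_t                (lagged effect)
    x2_t = inst_coef[k] * x1_t + 0.5 * x2_{t-1} + e2_t  (instantaneous effect x1 -> x2)
    with k = regime_t and Gaussian noise of scale noise_std[k].
    """
    rng = random.Random(seed)
    n_regimes = len(transition)
    regime, x1, x2 = 0, 0.0, 0.0
    regimes, series = [], []
    for _ in range(n_steps):
        # The next regime depends only on the current regime.
        regime = rng.choices(range(n_regimes), weights=transition[regime])[0]
        x1 = lag_coef[regime] * x1 + rng.gauss(0.0, noise_std[regime])
        x2 = inst_coef[regime] * x1 + 0.5 * x2 + rng.gauss(0.0, noise_std[regime])
        regimes.append(regime)
        series.append((x1, x2))
    return regimes, series


# Hypothetical parameters: regime 0 is calm with an x1 -> x2 link,
# regime 1 is noisy and has no instantaneous link.
regimes, series = simulate_msm(
    n_steps=200,
    transition=[[0.95, 0.05], [0.10, 0.90]],
    lag_coef=[0.8, -0.5],
    inst_coef=[1.0, 0.0],
    noise_std=[0.1, 1.0],
)
```

A regime-detection method would see only `series` and would have to recover both `regimes` and the regime-specific coefficients, which is the identifiability question the abstract addresses.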


2606.05264 2026-06-15 cs.LG Updated version

REGEN: Reference-Guided Synthetic Multivariate Time Series Generation for Forecasting

Moulik Gupta, Dhruv Kumar, Murari Mandal, Saurabh Deshpande

Affiliations * Birla AI Labs, Office of Ananya Birla; Birla Institute of Technology and Science, Pilani; Kalinga Institute of Industrial Technology, Bhubaneswar

AI summary: Proposes ReGeN, a reference-guided generative pipeline that decomposes observed sequences into three interpretable components (periodic backbone, stochastic residuals, and cross-variable dependencies) for controllable synthesis; in low-data settings the generated data can substitute for real data and improve forecasting performance.

Abstract

Training robust multivariate time series forecasting models requires large, diverse corpora, yet many real-world domains provide only a handful of observed sequences. Existing generators fail to resolve this mismatch: prior-based approaches (e.g., CauKer, TimePFN) produce domain-agnostic samples, while data-driven methods (e.g., TimeGAN) treat references as black-box supervision, forfeiting explicit control over periodic structure, local variability, and cross-variable dynamics. We propose ReGeN, a reference-guided generative pipeline that treats observed sequences not as examples to imitate, but as structural scaffolds for controllable synthesis. ReGeN decomposes each reference into three interpretable components: a phase-aligned periodic backbone capturing dominant domain morphology; per-variable stochastic residuals modeled with a deep-kernel Gaussian process; and lag-aware cross-variable dependencies injected through a structural causal model with fitted coupling coefficients. Sampling these components at controllable temperature broadens distributional coverage while preserving domain-grounded structure. We show that ReGeN-generated data consistently substitutes for real sibling data with minimal forecasting degradation, and in strongly periodic domains such as traffic, can outperform the real source itself. We further show that a foundation model pretrained on ReGeN corpora outperforms those pretrained on prior-based and data-driven synthetic alternatives. This suggests that in low-data regimes, how reference data is structurally exploited can matter as much as how much data is available.
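The backbone-plus-residual idea can be sketched for a single variable. This is a simplified stand-in, not ReGeN itself: the phase-averaged profile replaces the paper's phase-aligned backbone, bootstrap resampling replaces the deep-kernel Gaussian process, and the cross-variable structural causal model is omitted entirely. The function names and the `temperature` scaling are assumptions made for illustration.

```python
import random


def periodic_backbone(series, period):
    """Average the series at each phase of the period and tile the profile."""
    if period <= 0 or len(series) < period:
        raise ValueError("need at least one full period of data")
    sums = [0.0] * period
    counts = [0] * period
    for t, value in enumerate(series):
        sums[t % period] += value
        counts[t % period] += 1
    profile = [s / c for s, c in zip(sums, counts)]
    return [profile[t % period] for t in range(len(series))]


def synthesize(series, period, temperature=1.0, seed=0):
    """Backbone plus resampled residuals, scaled by `temperature`."""
    rng = random.Random(seed)
    backbone = periodic_backbone(series, period)
    residuals = [v - b for v, b in zip(series, backbone)]
    # temperature = 0 returns the backbone; larger values add more variability.
    return [b + temperature * rng.choice(residuals) for b in backbone]


reference = [1.0, 2.0, 3.0, 1.2, 1.8, 3.1]  # hypothetical reference series
sample = synthesize(reference, period=3, temperature=0.5)
```

Raising `temperature` widens the spread of the synthetic series around the backbone while the periodic shape taken from the reference is preserved.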


2508.08935 2026-06-15 cs.LG cs.AI Updated version

LNN-PINN: A Unified Physics-Only Training Framework with Liquid Residual Blocks

Ze Tao, Hanxuan Wang, Fujun Liu

Affiliations * Nanophotonics and Biophotonics Key Laboratory of Jilin Province, School of Physics, Changchun University of Science and Technology; Faculty of Chinese Medicine, Macau University of Science and Technology

AI summary: To address the limited predictive accuracy of physics-informed neural networks on complex problems, proposes the LNN-PINN framework, which improves accuracy through a liquid residual gating architecture, and validates its effectiveness and stability on several benchmark problems.

Journal ref Computer Physics Communications, 326, 110237 (2026)

DOI: 10.1016/j.cpc.2026.110237

Abstract

Physics-informed neural networks (PINNs) have attracted considerable attention for their ability to integrate partial differential equation priors into deep learning frameworks; however, they often exhibit limited predictive accuracy when applied to complex problems. To address this issue, we propose LNN-PINN, a physics-informed neural network framework that incorporates a liquid residual gating architecture while preserving the original physics modeling and optimization pipeline to improve predictive accuracy. The method introduces a lightweight gating mechanism solely within the hidden-layer mapping, keeping the sampling strategy, loss composition, and hyperparameter settings unchanged to ensure that improvements arise purely from architectural refinement. Across four benchmark problems, LNN-PINN consistently reduced RMSE and MAE under identical training conditions, with absolute error plots further confirming its accuracy gains. Moreover, the framework demonstrates strong adaptability and stability across varying dimensions, boundary conditions, and operator characteristics. In summary, LNN-PINN offers a concise and effective architectural enhancement for improving the predictive accuracy of physics-informed neural networks in complex scientific and engineering problems.


2603.20821 2026-06-15 cs.DC cs.AI cs.LG Updated version

Compass: Optimizing Compound AI Workflows for Dynamic Adaptation

Milos Gravara, Juan Luis Herrera, Stefan Nastic

Affiliations * University of California, Berkeley; ETH Zurich

AI summary: Presents Compass, a framework that dynamically switches compound AI workflow configurations through offline optimization and online adaptation, improving the balance among accuracy, latency, and cost.

Comments 10 pages, 7 figures; accepted at the 26th IEEE International Symposium on Cluster, Cloud, and Internet Computing (CCGrid 2026)

Journal ref In Proceedings of the 26th IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2026

DOI: 10.1109/CCGrid68966.2026.00018

Abstract

Compound AI is a distributed intelligence approach that represents a unified system orchestrating specialized AI/ML models with engineered software components into AI workflows. Compound AI production deployments must satisfy accuracy, latency, and cost objectives under varying loads. However, many deployments operate on fixed infrastructure where horizontal scaling is not viable. Existing approaches optimize solely for accuracy and do not consider changes in workload conditions. We observe that compound AI systems can switch between configurations to fit infrastructure capacity, trading accuracy for latency based on current load. This requires discovering multiple Pareto-optimal configurations from a combinatorial search space and determining when to switch between them at runtime. We present Compass, a novel framework that enables dynamic configuration switching through offline optimization and online adaptation. Compass consists of three components: COMPASS-V algorithm for configuration discovery, Planner for switching policy derivation, and Elastico Controller for runtime adaptation. COMPASS-V discovers accuracy-feasible configurations using finite-difference guided search and a combination of hill-climbing and lateral expansion. Planner profiles these configurations on target hardware and derives switching policies using a queuing theory based model. Elastico monitors queue depth and switches configurations based on derived thresholds. Across two compound AI workflows, COMPASS-V achieves 100% recall while reducing configuration evaluations by 57.5% on average compared to exhaustive search, with efficiency gains reaching 95.3% at tight accuracy thresholds. Runtime adaptation achieves 90-98% SLO compliance under dynamic load patterns, improving SLO compliance by 71.6% over static high-accuracy baselines, while simultaneously improving accuracy by 3-5% over static fast baselines.
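The runtime step, in which a controller watches queue depth and switches configurations at derived thresholds, can be sketched as a simple lookup. This is a hypothetical illustration, not the Elastico implementation: the configuration names and threshold values are invented, and in the paper the thresholds come from the Planner's queuing model rather than being hard-coded.

```python
from bisect import bisect_right

# Hypothetical configurations, ordered from most accurate (slowest) to fastest.
CONFIGS = ["high_accuracy", "balanced", "fast"]
# Hypothetical queue-depth cut points separating adjacent configurations.
THRESHOLDS = [10, 50]


def select_config(queue_depth, thresholds=THRESHOLDS, configs=CONFIGS):
    """Pick the configuration whose queue-depth band contains `queue_depth`.

    Depths below thresholds[0] map to configs[0]; each threshold reached
    moves the choice one step toward the faster configurations.
    """
    return configs[bisect_right(thresholds, queue_depth)]


current = select_config(queue_depth=23)
```

With these assumed values, a depth of 23 selects the `balanced` configuration. A production controller would likely add hysteresis so that a queue hovering near a threshold does not cause rapid switching; that refinement is left out here.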


2602.13040 2026-06-15 cs.LG Updated version

TCRL: Temporal-Coupled Adversarial Training for Robust Constrained Reinforcement Learning in Worst-Case Scenarios

Wentao Xu, Zhongming Yao, Weihao Li, Zhenghang Song, Yumeng Song, Tianyi Li, Yushuai Li

Affiliations * Northeastern University; Zhejiang University; Aalborg University

AI summary: TCRL introduces a temporal-coupled adversarial training framework that addresses the shortcomings of conventional methods under temporally coupled perturbations, improving the worst-case robustness of constrained reinforcement learning.

Journal ref Proc. of the 25th International Conference on Autonomous Agents and Multiagent Systems, 3489 - 3491, 2026

DOI: 10.65109/GPHO5000

Abstract

Constrained Reinforcement Learning (CRL) aims to optimize decision-making policies under constraint conditions, making it highly applicable to safety-critical domains such as autonomous driving, robotics, and power grid management. However, existing robust CRL approaches predominantly focus on single-step perturbations and temporally independent adversarial models, and lack explicit modeling of robustness against temporally coupled perturbations. To tackle these challenges, we propose TCRL, a novel temporal-coupled adversarial training framework for robust constrained reinforcement learning in worst-case scenarios. First, TCRL introduces a worst-case-perceived cost constraint function that estimates safety costs under temporally coupled perturbations without the need to explicitly model adversarial attackers. Second, TCRL establishes a dual-constraint defense mechanism on the reward to counter temporally coupled adversaries while maintaining reward unpredictability. Experimental results demonstrate that TCRL consistently outperforms existing methods in terms of robustness against temporally coupled perturbation attacks across a variety of CRL tasks.


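2512.19805 2026-06-15 cs.LG stat.ME Updated version

Guardrailed Uplift Targeting: A Causal Optimization Playbook for Marketing Strategy

Deepit Sapru

Affiliations * Deepit Sapru

AI summary: Proposes a marketing decision framework for optimizing customer targeting that combines heterogeneous treatment effect estimation with explicit business guardrails, aiming to maximize revenue and retention while respecting constraints on budget, revenue protection, and customer experience.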

2512.20932 2026-06-15 cs.LG cs.AI Updated version

Guardrailed Elasticity Pricing: A Churn-Aware Forecasting Playbook for Subscription Strategy

Deepit Sapru

Affiliations * Deepit Sapru

AI summary: Proposes a dynamic pricing framework that combines multivariate demand forecasting, segment-level price elasticity, and churn prediction to optimize revenue and retention. Using seasonal models and tree-based learners, it solves a constrained optimization problem to improve pricing across SaaS portfolios while safeguarding customer experience and ethical constraints.

DOI: 10.1109/ESIC68176.2026.11496127

Abstract

This paper presents a marketing analytics framework that operationalizes subscription pricing as a dynamic, guardrailed decision system, uniting multivariate demand forecasting, segment-level price elasticity, and churn propensity to optimize revenue, margin, and retention. The approach blends seasonal time-series models with tree-based learners, runs Monte Carlo scenario tests to map risk envelopes, and solves a constrained optimization that enforces business guardrails on customer experience, margin floors, and allowable churn. Validated across heterogeneous SaaS portfolios, the method consistently outperforms static tiers and uniform uplifts by reallocating price moves toward segments with higher willingness-to-pay while protecting price-sensitive cohorts. The system is designed for real-time recalibration via modular APIs and includes model explainability for governance and compliance. Managerially, the framework functions as a strategy playbook that clarifies when to shift from flat to dynamic pricing, how to align pricing with CLV and MRR targets, and how to embed ethical guardrails, enabling durable growth without eroding customer trust.


2601.08334 2026-06-15 cs.LG Updated version

Automated Machine Learning in Radiomics: A Comparative Evaluation of Performance, Efficiency and Accessibility

Jose Lozano-Montoya, Emilio Soria-Olivas, Almudena Fuster-Matanzo, Angel Alberich-Bayarri, Ana Jimenez-Pastor

Affiliations * University of Valencia; Research & Frontiers in AI Department, Quantitative Imaging Biomarkers in Medicine, Quibim SL; Intelligent Data Analysis Laboratory, IDAL, University of Valencia

AI summary: Compares general-purpose and radiomics-specific AutoML frameworks on radiomics classification tasks in terms of performance, efficiency, and accessibility, finding that a radiomics-specific tool performs best while general-purpose frameworks are easier to use, with remaining gaps such as limited survival-analysis support and weak integration of feature reproducibility.

Comments 27 pages, 4 figures, 3 tables, code available, see https://github.com/joselznom/AutoML-Comparison-in-Radiomics

Journal ref JMIR Form Res. 2026;10:e91492

DOI: 10.2196/91492

Abstract

Automated machine learning (AutoML) frameworks can lower technical barriers for predictive and prognostic model development in radiomics by enabling researchers without programming expertise to build models. However, their effectiveness in addressing radiomics-specific challenges remains unclear. This study evaluates the performance, efficiency, and accessibility of general-purpose and radiomics-specific AutoML frameworks on diverse radiomics classification tasks, thereby highlighting development needs for radiomics. Ten public/private radiomics datasets with varied imaging modalities (CT/MRI), sizes, anatomies and endpoints were used. Six general-purpose and five radiomics-specific frameworks were tested with predefined parameters using standardized cross-validation. Evaluation metrics included AUC, runtime, together with qualitative aspects related to software status, accessibility, and interpretability. Simplatab, a radiomics-specific tool with a no-code interface, achieved the highest average test AUC (81.81%) with a moderate runtime (~1 hour). LightAutoML, a general-purpose framework, showed the fastest execution with competitive performance (78.74% mean AUC in six minutes). Most radiomics-specific frameworks were excluded from the performance analysis due to obsolescence, extensive programming requirements, or computational inefficiency. Conversely, general-purpose frameworks demonstrated higher accessibility and ease of implementation. Simplatab provides an effective balance of performance, efficiency, and accessibility for radiomics classification problems. However, significant gaps remain, including the lack of accessible survival analysis support and the limited integration of feature reproducibility and harmonization within current AutoML frameworks. Future research should focus on adapting AutoML solutions to better address these radiomics-specific challenges.


2511.17637 2026-06-15 cs.LG cs.CL Updated version

PocketLLM: Ultimate Compression of Large Language Models via Meta Networks

Ye Tian, Chengcheng Wang, Jing Han, Yehui Tang, Kai Han

Affiliations * University of Science and Technology of China

AI summary: Proposes PocketLLM, which compresses large language models in a latent space via meta-networks, using an encoder and a decoder; experiments show that accuracy remains high even at high compression ratios.

Comments AAAI 2026 camera ready

Journal ref Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33250-33258 (2026)

DOI: 10.1609/aaai.v40i39.40610

Abstract

As Large Language Models (LLMs) continue to grow in size, storing and transmitting them on edge devices becomes increasingly challenging. Traditional methods like quantization and pruning struggle to achieve extreme compression of LLMs without sacrificing accuracy. In this paper, we introduce PocketLLM, a novel approach to compress LLMs in a latent space via meta-networks. A simple encoder network is proposed to project the weights of LLMs into discrete latent vectors, which are then represented using a compact codebook. A lightweight decoder network is employed to map the codebook's representative vectors back to the original weight space. This method allows for significant compression of the large weights in LLMs, consisting solely of a small decoder, a concise codebook, and an index. Extensive experiments show that PocketLLM achieves superior performance even at significantly high compression ratios, e.g., compressing Llama 2-7B by 10x with a negligible drop in accuracy.
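The codebook-and-index storage scheme can be illustrated with plain vector quantization. This sketch is not PocketLLM: it skips the learned encoder and decoder networks and instead assigns each weight chunk directly to its nearest codebook vector, with a hand-written codebook standing in for a learned one. All names and values are hypothetical.

```python
def nearest(vec, codebook):
    """Index of the codebook vector closest to `vec` in squared distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])),
    )


def encode(weights, dim, codebook):
    """Split a flat weight list into `dim`-sized chunks and store one index per chunk."""
    chunks = [weights[i:i + dim] for i in range(0, len(weights), dim)]
    return [nearest(chunk, codebook) for chunk in chunks]


def decode(indices, codebook):
    """Rebuild an approximate flat weight list from the stored indices."""
    return [x for k in indices for x in codebook[k]]


# Hypothetical 2-entry codebook over 2-dimensional chunks.
codebook = [[0.0, 0.0], [1.0, 1.0]]
weights = [0.1, -0.1, 0.9, 1.2]
indices = encode(weights, dim=2, codebook=codebook)
restored = decode(indices, codebook)
```

Only `indices` and `codebook` need to be stored, so the saving comes from replacing each chunk of floating-point weights with one small integer. The reconstruction is lossy, and in the paper's approach a lightweight decoder network maps codebook vectors back to the weight space instead of the direct lookup used here.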


2508.10827 2026-06-15 astro-ph.EP cs.LG Updated version

Accelerating exoplanet climate modelling: A machine learning approach to complement 3D GCM grid simulations

Alexander Plaschzug, Amit Reza, Ludmila Carone, Sebastian Gernjak, Christiane Helling

Affiliations * Space Research Institute, Austrian Academy of Sciences; Institute for Theoretical Physics and Computational Physics, Graz University of Technology; Institute of Physics, University of Graz

AI summary: Uses machine learning to predict the 3D temperature and wind structure of exoplanets; by training a neural network and a decision-tree algorithm, it provides an efficient tool for exoplanet climate modelling and supports the interpretation of observational data from space missions.

Journal ref A&A Volume 706, February 2026

DOI: 10.1051/0004-6361/202555631

Abstract

With the development of ever-improving telescopes capable of observing exoplanet atmospheres in greater detail and number, there is a growing demand for enhanced 3D climate models to support and help interpret observational data from space missions like CHEOPS, TESS, JWST, PLATO, and Ariel. However, the computationally intensive and time-consuming nature of general circulation models (GCMs) poses significant challenges in simulating a wide range of exoplanetary atmospheres. This study aims to determine whether machine learning (ML) algorithms can be used to predict the 3D temperature and wind structure of arbitrary tidally locked gaseous exoplanets in a range of planetary parameters. A new 3D GCM grid with 60 inflated hot Jupiters orbiting A, F, G, K, and M-type host stars modelled with ExoRad has been introduced. A dense neural network (DNN) and a decision tree algorithm (XGBoost) are trained on this grid to predict local gas temperatures along with horizontal and vertical winds. To ensure the reliability and quality of the ML model predictions, WASP-121 b, HATS-42 b, NGTS-17 b, WASP-23 b, and NGTS-1 b-like planets, which are all targets for PLATO observation, are selected and modelled with ExoRad and the two ML methods as test cases. The DNN predictions for the gas temperatures are accurate enough that the calculated spectra agree within 32 ppm for all but one planet, for which only a single HCN feature reaches a 100 ppm difference. The developed ML emulators can reliably predict the complete 3D temperature field of an inflated warm to ultra-hot tidally locked Jupiter around A to M-type host stars. They provide a fast tool to complement and extend traditional GCM grids for exoplanet ensemble studies. The quality of the predictions is such that no or minimal effects on the gas phase chemistry, and hence on the cloud formation and transmission spectra, are to be expected.

2412.00123 2026-06-15 cs.LG math.PR Updated version

Electricity Price Prediction Using Multi-Kernel Gaussian Process Regression Combined with Kernel-Based Support Vector Regression

Abhinav Das, Stephan Schlüter, Lorenz Schneider

Affiliations * Faculty of Mathematics and Economics, Ulm University; Institute of Energy Engineering and Energy Economics, Ulm University of Applied Sciences; Emlyon Business School, Lyon, France

AI summary: This paper proposes a new hybrid model for forecasting German electricity prices that combines Gaussian process regression (GPR) and support vector regression (SVR). A suitable data-dependent covariance function improves GPR performance, SVR handles non-linear processes and outliers, and experiments show the model outperforms existing benchmark models.

Journal ref Journal of Forecasting (2026) 45, no. 4: 2059-2077

DOI: 10.1002/for.70124

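AI abstract (translated from Chinese)

This paper presents a new hybrid model for forecasting German electricity prices. The algorithm combines Gaussian Process Regression (GPR) and Support Vector Regression (SVR). Although GPR performs well at learning stochastic patterns in data and at interpolation, its predictive performance on out-of-sample data is not ideal. Choosing a suitable data-dependent covariance function improves GPR's predictions of German hourly electricity prices. However, because out-of-sample prediction depends on the training data, the forecasts are vulnerable to noise and outliers. To address this, a separate forecast is produced with SVR, which applies margin-based optimization. This method is advantageous for non-linear processes and outliers because only certain necessary points in the training data (the support vectors) determine the regression. The individual forecasts are then combined linearly with uniform weights. When tested on historical German electricity prices, the approach outperforms publicly available benchmarks, namely a LASSO-estimated autoregressive regression model and a deep neural network from recent research.

English abstract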

This paper presents a new hybrid model for predicting German electricity prices. The algorithm is based on a combination of Gaussian Process Regression (GPR) and Support Vector Regression (SVR). Although GPR is a competent model for learning stochastic patterns within data and for interpolation, its performance for out-of-sample data is not very promising. By choosing a suitable data-dependent covariance function, we can enhance the performance of GPR for the German hourly power prices being tested. However, since the out-of-sample prediction is dependent on the training data, the prediction is vulnerable to noise and outliers. To overcome this issue, a separate prediction is calculated using SVR, which applies margin-based optimization. This method is advantageous when dealing with non-linear processes and outliers, since only certain necessary points (support vectors) in the training data are responsible for regression. The individual predictions are then linearly combined using uniform weights. When tested on historic German power prices, this approach outperforms the publicly available benchmarks, namely the LASSO-estimated autoregressive regression model and the deep neural network provided in the recent research by [1].
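
The uniform-weight combination step described in this abstract can be sketched as follows. This is only an illustration, not the authors' code: the function names are hypothetical, the price values are made up, and the two input series merely stand in for forecasts that fitted GPR and SVR models would produce.

```python
from math import sqrt


def combine_uniform(*forecasts):
    """Average several equal-length forecast series with equal weights."""
    if not forecasts:
        raise ValueError("need at least one forecast series")
    n = len(forecasts[0])
    if any(len(f) != n for f in forecasts):
        raise ValueError("all forecast series must have the same length")
    k = len(forecasts)
    return [sum(f[t] for f in forecasts) / k for t in range(n)]


def rmse(pred, actual):
    """Root mean squared error between a forecast and the observed values."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


# Made-up hourly prices (EUR/MWh) and two made-up model outputs.
actual = [40.0, 42.0, 55.0, 61.0]
gpr_forecast = [44.0, 45.0, 57.0, 66.0]  # placeholder for a GPR prediction
svr_forecast = [38.0, 40.0, 52.0, 58.0]  # placeholder for an SVR prediction

hybrid = combine_uniform(gpr_forecast, svr_forecast)
print(hybrid, rmse(hybrid, actual))
```

In this toy case the two placeholder forecasts err in opposite directions, so their average has a lower error than either one; real forecasts need not behave this way.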

2501.15196 2026-06-15 stat.ML cs.LG Updated version

A Review on Self-Supervised Learning for Time Series Anomaly Detection: Recent Advances and Open Challenges

Aitor Sánchez-Ferrera, Borja Calvo, Jose A. Lozano

Affiliations * University of the Basque Country UPV/EHU; Basque Center for Applied Mathematics (BCAM)

AI summary: This paper reviews recent self-supervised learning methods for time series anomaly detection, proposes a taxonomy to make sense of their diversity, and provides a GitHub repository for future updates.

DOI: 10.1145/3770575

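AI abstract (translated from Chinese)

Time series anomaly detection poses many challenges, which stem from the sequential and dynamic nature of time-dependent data. Traditional unsupervised methods often struggle to generalize: they tend to overfit the known normal patterns observed during training and adapt poorly to unseen normal behavior. To address this limitation, self-supervised techniques for time series have attracted attention as a potential way to overcome this obstacle and improve the performance of anomaly detectors. This paper reviews recent methods that use self-supervised learning for time series anomaly detection. It proposes a taxonomy that categorizes these methods by their main characteristics, which helps clarify the diversity within the field. The information in this survey, along with additional details that will be updated periodically, is available in the following GitHub repository: https://github.com/Aitorzan3/Awesome-Self-Supervised-Time-Series-Anomaly-Detection.

English abstract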

Time series anomaly detection presents various challenges due to the sequential and dynamic nature of time-dependent data. Traditional unsupervised methods frequently encounter difficulties in generalization, often overfitting to known normal patterns observed during training and struggling to adapt to unseen normality. In response to this limitation, self-supervised techniques for time series have garnered attention as a potential solution to overcome this obstacle and enhance the performance of anomaly detectors. This paper presents a comprehensive review of the recent methods that make use of self-supervised learning for time series anomaly detection. A taxonomy is proposed to categorize these methods based on their primary characteristics, facilitating a clear understanding of their diversity within this field. The information contained in this survey, along with additional details that will be periodically updated, is available on the following GitHub repository: https://github.com/Aitorzan3/Awesome-Self-Supervised-Time-Series-Anomaly-Detection.

2506.18271 2026-06-15 cs.LG Updated version

Memory-Augmented Architecture for Long-Term Context Handling in Large Language Models

Haseeb Ullah Khan Shinwari, Muhammad Usama

Affiliations * Newton AI Lab; School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)

AI summary: This paper proposes a memory-augmented architecture that dynamically retrieves, updates, and prunes information from past interactions to improve long-term context handling in large language models. Experiments show that the method improves contextual coherence, reduces memory overhead, and raises response quality.

Journal ref IEEE Transactions on Artificial Intelligence, 2026

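2506.09087 2026-06-15 cs.LG math.PR q-bio.NC stat.ML Updated version

Spiking Neural Models for Decision-Making Tasks with Learning

Sophie Jaffard, Giulia Mezzadri, Patricia Reynaud-Bouret, Etienne Tanré

Affiliations * Cognition and Decision Lab, Columbia University

AI summary: This paper proposes a biologically plausible spiking neural network model for decision-making tasks that combines a learning mechanism with a multivariate Hawkes process. It establishes a coupling between the DDM and the Poisson counter model, derives a DDM with correlated noise from a Hawkes network of spiking neurons, and designs an online categorization task to test the model's predictions.

DOI: 10.1007/s00285-026-02415-0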

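AI abstract (translated from Chinese)

In cognition, response times and choices in decision-making tasks are commonly modeled with drift diffusion models (DDMs), which describe the accumulation of evidence for a decision as a stochastic process, specifically a Brownian motion, whose drift rate reflects the strength of the evidence. Similarly, the Poisson counter model describes evidence accumulation as discrete events whose counts over time are modeled as Poisson processes and can be interpreted as neuronal activity. However, these models lack a learning mechanism and are limited to tasks in which participants already know the categories. To bridge the gap between cognitive and biological models, this paper proposes a biologically plausible spiking neural network (SNN) model for decision-making tasks that includes a learning mechanism and whose neuronal activity is modeled by a multivariate Hawkes process. First, we prove a coupling result between the DDM and the Poisson counter model, showing that the two models yield similar categorizations and response times and that the DDM can be approximated by spiking Poisson neurons. Going further, we show that a particular DDM with correlated noise can be derived from a Hawkes network of spiking neurons governed by a local learning rule. In addition, we designed an online categorization task to evaluate the model's predictions. This work is an important step toward integrating biologically relevant neural mechanisms into cognitive models and promotes a deeper understanding of the relationship between neural activity and behavior.

English abstract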

In cognition, response times and choices in decision-making tasks are commonly modeled using Drift Diffusion Models (DDMs), which describe the accumulation of evidence for a decision as a stochastic process, specifically a Brownian motion, with the drift rate reflecting the strength of the evidence. In the same vein, the Poisson counter model describes the accumulation of evidence as discrete events whose counts over time are modeled as Poisson processes, and has a spiking-neuron interpretation as these processes are used to model neuronal activities. However, these models lack a learning mechanism and are limited to tasks where participants have prior knowledge of the categories. To bridge the gap between cognitive and biological models, we propose a biologically plausible Spiking Neural Network (SNN) model for decision-making that incorporates a learning mechanism and whose neuronal activities are modeled by a multivariate Hawkes process. First, we show a coupling result between the DDM and the Poisson counter model, establishing that these two models provide similar categorizations and reaction times and that the DDM can be approximated by spiking Poisson neurons. To go further, we show that a particular DDM with correlated noise can be derived from a Hawkes network of spiking neurons governed by a local learning rule. In addition, we designed an online categorization task to evaluate the model predictions. This work provides a significant step toward integrating biologically relevant neural mechanisms into cognitive models, fostering a deeper understanding of the relationship between neural activity and behavior.
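
To make the two evidence-accumulation models named in this abstract concrete, here is a minimal simulation sketch. It is not the paper's construction: the function names are hypothetical, the DDM uses a plain Euler-Maruyama discretization with symmetric thresholds, and the Poisson counter is assumed to decide when the difference between two counters reaches a threshold. Under that assumption the count difference has mean rate `rate_a - rate_b` and variance rate `rate_a + rate_b`, which is what makes a diffusion approximation plausible.

```python
import random


def simulate_ddm(drift, threshold, dt=0.001, noise=1.0, rng=None,
                 max_steps=1_000_000):
    """Simulate one DDM trial; return (choice, response_time)."""
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    step_sd = noise * dt ** 0.5  # standard deviation of one Brownian increment
    for _ in range(max_steps):
        x += drift * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return 1, t
        if x <= -threshold:
            return -1, t
    return 0, t  # no boundary reached within max_steps


def simulate_poisson_counter(rate_a, rate_b, threshold, rng=None):
    """Simulate one trial where two Poisson spike counters compete.

    A decision is made once the count difference reaches +/- threshold
    (an assumed variant, chosen to mirror the DDM boundaries).
    """
    rng = rng or random.Random()
    total = rate_a + rate_b
    diff, t = 0, 0.0
    while abs(diff) < threshold:
        t += rng.expovariate(total)  # waiting time to the next spike
        diff += 1 if rng.random() < rate_a / total else -1
    return (1 if diff > 0 else -1), t


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate_ddm(drift=2.0, threshold=1.0, rng=rng))
    print(simulate_poisson_counter(rate_a=30.0, rate_b=10.0, threshold=5, rng=rng))
```

Running many trials of each function and comparing the choice proportions and response-time histograms is one informal way to see how similar the two models can look.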

2505.04907 2026-06-15 cs.LG Updated version

VaCDA: Variational Contrastive Alignment-based Scalable Human Activity Recognition

Soham Khisa, Avijoy Chakma

Affiliations * Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology; Department of Computer Science, Bowie State University

AI summary: This paper proposes the VaCDA framework, which combines variational autoencoders and contrastive learning to address data heterogeneity in multi-source domain adaptation. It is evaluated in cross-person, cross-position, and cross-device scenarios and outperforms the baselines in the latter two.

DOI: 10.1109/ICMLA66185.2025.00160

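AI abstract (translated from Chinese)

Technological advances have driven the rise of wearable devices that continuously monitor user activity and generate large amounts of unlabeled data. Such data is hard to interpret, and manual annotation is time-consuming and error-prone. In addition, the data distribution is often heterogeneous because of variations in device placement, device type, and user behavior. As a result, traditional transfer learning methods perform poorly and struggle to recognize daily activities. To address these problems, we use a variational autoencoder (VAE) to learn a shared low-dimensional latent space from the available sensor data. This space generalizes data across different sensors, mitigating heterogeneity and aiding adaptation to the target domain. We integrate contrastive learning to strengthen the feature representation by aligning instances of the same class across domains and separating instances of different classes. We propose Variational Contrastive Domain Adaptation (VaCDA), a multi-source domain adaptation framework that combines VAEs and contrastive learning to improve feature representation and reduce heterogeneity between source and target domains. We evaluate VaCDA on multiple public datasets under three heterogeneity scenarios: cross-person, cross-position, and cross-device. VaCDA outperforms the baseline methods in the cross-position and cross-device scenarios.

English abstract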

Technological advancements have led to the rise of wearable devices with sensors that continuously monitor user activities, generating vast amounts of unlabeled data. This data is challenging to interpret, and manual annotation is labor-intensive and error-prone. Additionally, data distribution is often heterogeneous due to device placement, type, and user behavior variations. As a result, traditional transfer learning methods perform suboptimally, making it difficult to recognize daily activities. To address these challenges, we use a variational autoencoder (VAE) to learn a shared, low-dimensional latent space from available sensor data. This space generalizes data across diverse sensors, mitigating heterogeneity and aiding robust adaptation to the target domain. We integrate contrastive learning to enhance feature representation by aligning instances of the same class across domains while separating different classes. We propose Variational Contrastive Domain Adaptation (VaCDA), a multi-source domain adaptation framework combining VAEs and contrastive learning to improve feature representation and reduce heterogeneity between source and target domains. We evaluate VaCDA on multiple publicly available datasets across three heterogeneity scenarios: cross-person, cross-position, and cross-device. VaCDA outperforms the baselines in cross-position and cross-device scenarios.

2311.05139 2026-06-15 cs.LG Updated version

Hard-Negative Sampling for Contrastive Learning: Optimal Representation Geometry and Neural- vs Dimensional-Collapse

Ruijie Jiang, Thuan Nguyen, Shuchin Aeron, Prakash Ishwar

Affiliations * Department of Electrical Engineering, Tufts University; Department of Engineering, Engineering Technology, East Tennessee State University; Department of Electrical and Computer Engineering, Boston University

AI summary: This paper proves that, in contrastive learning, the SCL, HSCL, and UCL losses are minimized by representations with Neural-Collapse geometry, and that the HSCL and HUCL losses are lower bounded by the corresponding SCL and UCL losses. It also shows empirically that, with random initialization, suitable hardness levels, and feature normalization, Adam optimization can converge to the Neural-Collapse geometry, whereas omitting hard negatives or feature normalization leads to Dimensional-Collapse.

Comments Final version: Reviewed and accepted to TMLR April 2025. Updated exposition, Added analysis of lower bounds

Journal ref Transactions on Machine Learning Research, 2025

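AI abstract (translated from Chinese)

For a widely studied data model and general loss and sample-hardening functions, we prove that the losses of supervised contrastive learning (SCL), hard SCL (HSCL), and unsupervised contrastive learning (UCL) are minimized by representations that exhibit Neural-Collapse (NC), that is, the class means form an equiangular tight frame (ETF) and data from the same class are mapped to the same representation. We also prove that, for any representation mapping, the HSCL and hard UCL (HUCL) losses are lower bounded by the corresponding SCL and UCL losses. Unlike the existing literature, our theoretical results for SCL do not require class-conditional independence of augmented views and hold for a general class of loss functions that includes the widely used InfoNCE loss. Moreover, our proofs are simpler, more compact, and more transparent. As in the existing literature, our theoretical claims also hold in the practical setting where batching is used for optimization. We show empirically, for the first time, that Adam optimization of the HSCL and HUCL losses with random initialization and suitable hardness levels can converge to the NC geometry when unit-ball or unit-sphere feature normalization is included. Without hard negatives or feature normalization, the representations learned via Adam suffer from Dimensional-Collapse (DC) and fail to reach the NC geometry. These results illustrate the role of hard-negative sampling in contrastive representation learning, and we conclude with several open theoretical problems for future work. The code is available at https://github.com/rjiang03/HCL/tree/main

English abstract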

For a widely-studied data model and general loss and sample-hardening functions we prove that the losses of Supervised Contrastive Learning (SCL), Hard-SCL (HSCL), and Unsupervised Contrastive Learning (UCL) are minimized by representations that exhibit Neural-Collapse (NC), i.e., the class means form an Equiangular Tight Frame (ETF) and data from the same class are mapped to the same representation. We also prove that for any representation mapping, the HSCL and Hard-UCL (HUCL) losses are lower bounded by the corresponding SCL and UCL losses. In contrast to existing literature, our theoretical results for SCL do not require class-conditional independence of augmented views and work for a general loss function class that includes the widely used InfoNCE loss function. Moreover, our proofs are simpler, compact, and transparent. Similar to existing literature, our theoretical claims also hold for the practical scenario where batching is used for optimization. We empirically demonstrate, for the first time, that Adam optimization (with batching) of HSCL and HUCL losses with random initialization and suitable hardness levels can indeed converge to the NC-geometry if we incorporate unit-ball or unit-sphere feature normalization. Without incorporating hard-negatives or feature normalization, however, the representations learned via Adam suffer from Dimensional-Collapse (DC) and fail to attain the NC-geometry. These results exemplify the role of hard-negative sampling in contrastive representation learning and we conclude with several open theoretical problems for future work. The code can be found at https://github.com/rjiang03/HCL/tree/main
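
The Neural-Collapse geometry mentioned in this abstract can be illustrated with a small numerical check. The sketch below assumes the ETF in question is the simplex ETF commonly used in the Neural-Collapse literature (K unit vectors with pairwise inner product -1/(K-1)); the abstract itself does not spell this out, and the function names are hypothetical.

```python
from math import isclose, sqrt


def simplex_etf(num_classes):
    """Return K unit vectors in R^K forming a simplex ETF.

    Vector i is sqrt(K / (K - 1)) * (e_i - (1 / K) * ones).
    """
    k = num_classes
    if k < 2:
        raise ValueError("a simplex ETF needs at least two classes")
    scale = sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def has_simplex_etf_geometry(class_means, tol=1e-9):
    """Check unit norms and equal pairwise inner products of -1/(K-1)."""
    k = len(class_means)
    target = -1.0 / (k - 1)
    for i in range(k):
        if not isclose(dot(class_means[i], class_means[i]), 1.0, abs_tol=tol):
            return False
        for j in range(i + 1, k):
            if not isclose(dot(class_means[i], class_means[j]), target,
                           abs_tol=tol):
                return False
    return True


if __name__ == "__main__":
    print(has_simplex_etf_geometry(simplex_etf(4)))           # constructed ETF
    print(has_simplex_etf_geometry([[1.0, 0.0], [0.0, 1.0]]))  # orthogonal pair
```

In practice one would apply such a check to normalized, centered class means of learned representations, with a looser tolerance than the default used here.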

2209.00078 2026-06-15 cs.LG Updated version

Supervised Contrastive Learning with Hard Negative Samples

Ruijie Jiang, Thuan Nguyen, Prakash Ishwar, Shuchin Aeron

Affiliations * Dept. of ECE, Tufts University; Dept. of CS, Tufts University; Dept. of ECE, Boston University

AI summary: This paper proposes H-SCL, which tilts the class-conditional negative sampling distribution with a hardening function to improve contrastive learning performance in downstream classification tasks, and analyzes the relationship between the H-SCL and H-UCL losses.

Journal ref 2024 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, 2024

DOI: 10.1109/IJCNN60899.2024.10650863

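AI abstract (translated from Chinese)

By minimizing an appropriate loss function such as the InfoNCE loss, contrastive learning (CL) learns a useful representation function by pulling positive samples together and pushing negative samples apart. Positive samples are typically created through

English abstract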

Through minimization of an appropriate loss function such as the InfoNCE loss, contrastive learning (CL) learns a useful representation function by pulling positive samples close to each other while pushing negative samples far apart in the embedding space. The positive samples are typically created using "label-preserving" augmentations, i.e., domain-specific transformations of a given datum or anchor. In the absence of class information, in unsupervised CL (UCL), the negative samples are typically chosen randomly and independently of the anchor from a preset negative sampling distribution over the entire dataset. This leads to class-collisions in UCL. Supervised CL (SCL) avoids this class collision by conditioning the negative sampling distribution to samples having labels different from that of the anchor. In hard-UCL (H-UCL), which has been shown to be an effective method to further enhance UCL, the negative sampling distribution is conditionally tilted, by means of a hardening function, towards samples that are closer to the anchor. Motivated by this, in this paper we propose hard-SCL (H-SCL) wherein the class conditional negative sampling distribution is tilted via a hardening function. Our simulation results confirm the utility of H-SCL over SCL with significant performance gains in downstream classification tasks. Analytically, we show that in the limit of infinite negative samples per anchor and a suitable assumption, the H-SCL loss is upper bounded by the H-UCL loss, thereby justifying the utility of H-UCL for controlling the H-SCL loss in the absence of label information. Through experiments on several datasets, we verify the assumption as well as the claimed inequality between H-UCL and H-SCL losses. We also provide a plausible scenario where H-SCL loss is lower bounded by UCL loss, indicating the limited utility of UCL in controlling the H-SCL loss.
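
The idea of tilting the negative sampling distribution toward hard negatives can be sketched for a single anchor as follows. The abstract does not specify the hardening function, so this sketch assumes an exponential tilt with weights proportional to exp(beta * similarity); the function names, the temperature, and the finite-sample form of the loss are illustrative choices, not the authors' definitions.

```python
from math import exp, log


def tilted_contrastive_loss(pos_sim, neg_sims, temperature=0.5, beta=0.0):
    """InfoNCE-style loss for one anchor with exponentially tilted negatives.

    beta = 0 gives uniform weights, i.e. the plain InfoNCE form; larger beta
    shifts weight toward negatives that are more similar to the anchor.
    """
    if not neg_sims:
        raise ValueError("need at least one negative sample")
    tilt = [exp(beta * s) for s in neg_sims]
    norm = sum(tilt)
    weights = [t / norm for t in tilt]
    n = len(neg_sims)
    neg_term = n * sum(w * exp(s / temperature)
                       for w, s in zip(weights, neg_sims))
    pos_term = exp(pos_sim / temperature)
    return -log(pos_term / (pos_term + neg_term))


def supervised_negatives(anchor_label, sims, labels):
    """Keep only similarities to samples whose label differs from the anchor's."""
    return [s for s, y in zip(sims, labels) if y != anchor_label]


if __name__ == "__main__":
    sims = [0.7, 0.1, 0.5, -0.3]
    labels = [1, 0, 2, 0]
    negs = supervised_negatives(1, sims, labels)  # drops the same-class sample
    print(tilted_contrastive_loss(0.8, negs, beta=0.0))
    print(tilted_contrastive_loss(0.8, negs, beta=5.0))
```

With these weights the tilted loss is never smaller than the untilted one for the same similarities, because more weight falls on the negatives with the largest exponentiated similarity.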

1. Deep learning architectures and training methods (40 papers)

Can Editing 1 Neuron Fix Repetition Loops in LLMs?

D2H-AD: A Hybrid Model Utilizing Hyperdimensional Computing for Advanced Anomaly Detection

Neural Slack Variables for Shape Constraints

SuperThoughts: Reasoning Tokens in Superposition

SpikF-GO: Spiking Fourier Graph Operators for Multivariate Time Series Forecasting

Decompose Sparsely Where You Should, Absorb Densely Where You Should No

Deep Spectral Learning of Embedded Latent Transfer Operators for Stochastic Dynamical Systems

Learning High Coverage Discriminative Parsimonious Rulesets

Structured Noise Adaptation for Sequential Bayesian Filtering with Embedded Latent Transfer Operators

DIFF-ERO: A Conformance-Aware Loss for Deep Learning in Process Mining

Hierarchical ODE: Learning Continuous-Time Physical Prototypes for Early Link Failure Detection

Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback

A Low-Rank Subspace Analysis of LLM Interventions

EM-NeSy: Expectation Maximization for Neurosymbolic Learning

Code Correctness Signals in LLM Hidden States: Pre-Generation Probing and Repair Geometry

Zero-shot generalization of transformer neural operators to larger domains

Neither Parallel Nor Sequential: How DiffusionGemma Actually Commits Tokens

Compressed Computation is (probably) not Computation in Superposition

Simplex-Constrained Sparse Bagging: Transitioning from Uniform Priors to Sparse Posteriors in Ensemble Learning

Adaptive Nucleus Truncation for Long-Form Reasoning

FAConformer: Frequency-Aware Convolutional Transformer for Auditory Attention Decoding

GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge

DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation

Shift-Invariant Attribute Scoring for Kolmogorov-Arnold Networks via Shapley Value

LEPO: Latent Reasoning Policy Optimization for Large Language Models

Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions

Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions

A Composite Activation Function for Learning Stable Binary Representations

Learning Variable-Length Tokenization for Generative Recommendation

Exact Linear Attention

OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification

Multi-component Causal Tracing in Large Language Models

Adaptive Oscillatory-State Alignment for Time Series Forecasting

MP3: Multi-Period Pattern Pre-training for Spatio-Temporal Forecasting

Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation

UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities

Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards

Token-Level LLM Collaboration via FusionRoute

Rotation-Invariant Spherical Watermarking via Third-Order SO(3) Representation Coupling

Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study

2. Representation learning, self-supervised learning, and contrastive learning (8 papers)

Numbers Already Carry Their Own Embeddings

Riemannian Metric Matching for Scalable Geometric Modeling of Distributions

When Language Representations Interact: Separability and Cross-Lingual Effects in LLMs

Beyond task performance: Decoding bioacoustic embeddings with speech features

Gaze Heads: How VLMs Look at What They Describe

Equivariant Representation Learning via Class-Pose Decomposition

Ensembling Sparse Autoencoders

Learning What to Predict: Downstream-Guided Task Design for Continued Pretraining

3. Reinforcement learning and sequential decision-making (21 papers)

Utility-Constrained Policy Optimization

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

DRIVE: Distributional and Retrieval-Augmented Bidding with Value Evaluation

Provably Safe, Yet Scalable Reinforcement Learning

Graph Structured Combinatorial Semi-Bandit with Nonlinear Reward Associations through Separable Signals

Active Inference for Adaptive Traffic Signal Control in Noisy Nonstationary IoT Environments

Safety-Contract Graph Multi-Agent Reinforcement Learning for Autonomous Network Security Response

Temporally Consistent Graph Q-Networks for Intelligent Network Control

PhysVLA: Towards Physically-Grounded VLA for Embodied Robotic Manipulation

Closing the Reflection Gap: A Free Calibration Bonus for Agentic RL

Causal Object-Centric Models for Planning with Monte Carlo Tree Search

GAGPO: Generalized Advantage Grouped Policy Optimization

PERRY: Policy Evaluation with Confidence Intervals using Auxiliary Data

Tackling GNARLy Problems: Graph Neural Algorithmic Reasoning Reimagined through Reinforcement Learning

RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization

Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

Rethinking the Trust Region in LLM Reinforcement Learning

Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling

Temporal Straightening for Latent Planning

AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

4. Generative models and probabilistic modeling (15 papers)

Smoothing Dark Areas in Molecular Latent Diffusion

Decoupled Latent Optimization of Diffusion Models for Full Waveform Inversion

LapidaryEngine: Fully Conversational Crystal Generation

Implicit Variational Rejection Sampling

Recursively Trained Diffusion Models: Limiting Collapse Distribution and Spectral Characterization

FlowMo-WM: A World Model with Object Momentum and Hidden Ambient Drift

Self-Evolving Visual Questioner