arXivDaily: arXiv daily academic digest, updated Monday to Friday

1. Deep Learning Architectures and Training Methods (39 papers)

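2606.18283 2026-06-18 cs.LG New submission

Gaussian Mixture Attention: Linear-Time Sequence Mixing via Probabilistic Latent Routing

Yongchao Huang, Hassan Raza

AI summary: Proposes Gaussian Mixture Attention (GMA), which replaces pairwise query-key comparison with latent routing through K Gaussian mixture components, giving linear memory scaling for fixed K and results competitive with attention baselines on long-context classification.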

Comments: 55 pages

Abstract

The dense token-to-token interaction pattern of standard dot-product attention remains a central bottleneck in scaling Transformer architectures to long contexts. We introduce Gaussian Mixture Attention (GMA), a probabilistic attention-style sequence mixer that replaces explicit pairwise query-key comparison with routing through $K$ learned Gaussian mixture components. Queries and keys are mapped to posterior responsibility vectors over a shared latent routing space; their overlap defines an implicit responsibility-space affinity, while values are written into and read from a $K$-slot latent memory. By exploiting the associativity of matrix multiplication, GMA avoids materializing the induced $N\times N$ affinity matrix and instead uses two responsibility matrices whose dominant activation storage scales as $\mathcal{O}(NK)$ rather than $\mathcal{O}(N^2)$ for fixed $K$. We formulate bidirectional and causal variants of GMA, provide an end-to-end differentiable parameterization of the Gaussian mixture components, and analyze its responsibility-modulated gradient structure, constrained non-negative low-rank affinity interpretation, and local routing stability. Empirically, GMA exhibits the intended fixed-$K$ linear memory scaling and is competitive with attention-style baselines on long-context classification, while causal GMA improves over tested linear/random-feature attention variants on WikiText-103 but remains behind optimized causal SDPA and Mamba in the current implementation. Analysis of learned responsibilities further shows broad component usage and moderate alignment with surface-form token categories, supporting GMA as a probabilistic, interpretable, fixed-$K$ linear-time attention-style alternative rather than a universal replacement for optimized softmax attention or state-space models.
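
The associativity argument in the abstract can be illustrated with a short NumPy sketch of a bidirectional variant. The isotropic Gaussians, the component parameters shared between queries and keys, the absence of any extra normalization, and all function names are assumptions made for illustration; the paper's actual parameterization may differ.

```python
import numpy as np

def responsibilities(x, means, log_var, log_pi):
    """Posterior responsibilities of K isotropic Gaussians for each row of x.

    x: (N, d) tokens, means: (K, d), log_var and log_pi: (K,).
    Isotropic covariances are an assumption made for this sketch.
    """
    d = x.shape[1]
    # Squared distance from every token to every component mean: (N, K).
    sq_dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    # Log of pi_k * N(x | mu_k, var_k I), dropping the constant shared by all k.
    log_p = log_pi - 0.5 * (sq_dist / np.exp(log_var) + d * log_var)
    log_p = log_p - log_p.max(axis=1, keepdims=True)  # stabilise the softmax
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def gma_bidirectional(q, k, v, means, log_var, log_pi):
    """Bidirectional mixing that never forms the N x N affinity matrix."""
    r_q = responsibilities(q, means, log_var, log_pi)  # (N, K)
    r_k = responsibilities(k, means, log_var, log_pi)  # (N, K)
    memory = r_k.T @ v   # write values into K slots: (K, d_v)
    return r_q @ memory  # read from the slots: (N, d_v)
```

Computing `r_q @ (r_k.T @ v)` instead of `(r_q @ r_k.T) @ v` gives the same result while storing only the two (N, K) responsibility matrices and a (K, d_v) memory, which is where the $\mathcal{O}(NK)$ activation footprint comes from.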

2606.18315 2026-06-18 cs.LG cs.AI New submission

Ghost Attractor Networks: Basin-Structured Dynamical Decoders for Closed-Loop Sequential Generation

Tianyu Wang, Ying Wang, Zhihao Liu, Xi Vincent Wang, Lihui Wang

Affiliations: KTH Royal Institute of Technology; Department of Production Engineering, KTH Royal Institute of Technology; Department of Decision and Control Systems, KTH Royal Institute of Technology

AI summary: Proposes Ghost Attractor Networks, a theoretically derived dynamical decoder that builds a basin-attractor structure for efficient closed-loop sequential generation; on robotic action decoding, a 2.3M-parameter model matches the offline accuracy of a 1.07B-parameter Diffusion Transformer at 32 times lower latency.

Abstract

Sequential output generation with large-scale Transformer and diffusion decoders pays a memory cost that grows with sequence length, plus iterative per-step computation. Replacing them with small feed-forward decoders restores efficiency but produces unstructured latent representations that limit closed-loop control: phase-conditioned action generation and cross-step latent carry-over both require a latent geometry with stable basins. This article proposes Ghost Attractor Networks, a theoretically derived dynamical decoder whose latent evolves under a learned potential with drift and produces a basin-attractor structure by construction. Three desiderata (multi-modality, decoder-level single-pass switching, and constant memory) motivate the potential-drift form, and mode transitions arise as saddle-node bifurcations with ghost-attractor escape. A hierarchical phase-space decomposition separates first-order basin convergence from second-order proprioceptive refinement. Empirically, a Ghost trained end-to-end with a behavioral-cloning and contrastive objective exhibits the predicted gradient-flow contraction in its potential, with the gradient norm decaying by 67 percent across five integration steps on 1430 held-out samples. Ghost is evaluated as a robotic action decoder. A 2.3-million-parameter Ghost matches the offline accuracy of a 1.07-billion-parameter Diffusion Transformer at 462 times fewer parameters and 32 times lower latency, and beats five alternative 2M-parameter decoders (MLP, Neural ODE, CVAE, Transformer, 1-step Diffusion) on offline mean squared error by 5.9 to 29 percent. On the LIBERO-10 closed-loop benchmark, phase conditioning on Ghost's basin-structured latent yields a 13.5 percentage-point success-rate gain over a feed-forward MLP baseline, and persistent-latent ensembling reaches a 95.7 percent final success rate.
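
For readers unfamiliar with the term, the ghost of a saddle-node bifurcation can be seen in the textbook normal form $dx/dt = r + x^2$: for small $r > 0$ the fixed points have just disappeared, yet trajectories still linger near $x = 0$ for a time of order $\pi/\sqrt{r}$ before escaping. The Python sketch below integrates this one-dimensional normal form only; it is not the paper's decoder, and the function names, integration bounds and step size are illustrative assumptions.

```python
import math

def passage_time(r, x_start=-10.0, x_end=10.0, dt=1e-3):
    """Euler-integrate dx/dt = r + x**2 and return the time to go from x_start to x_end."""
    x, t = x_start, 0.0
    while x < x_end:
        x += dt * (r + x * x)  # slow near x = 0 when r is small: the "ghost"
        t += dt
    return t

def exact_passage_time(r, x_start=-10.0, x_end=10.0):
    """Closed-form passage time from separating variables: integral of dx / (r + x**2)."""
    s = math.sqrt(r)
    return (math.atan(x_end / s) - math.atan(x_start / s)) / s
```

Quartering `r` roughly doubles the passage time, which is the $1/\sqrt{r}$ slow-down that makes a ghost behave like a temporary attractor.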

2606.18324 2026-06-18 cs.LG cs.AI New submission

Why SWAVE May Not Be All You Need: A Concept-Evolution Retrospective on Complex-Valued Recurrent Language Models

Ramprasath Ganesaraja, Swathika N, Sahil Dilip Panse

Affiliations: EdgeVerve Systems Limited

AI summary: Reviews the evolution of the complex-valued recurrent language model SWave, exposes flaws in its design assumptions, and contributes theoretical insights such as cos-domination collapse along with engineering principles.

Abstract

SWave is a complex-valued recurrent language model (169.26M parameters, D=384, L=16, T=2048) trained on FineWeb-Edu using 2xH100 NVL. It was designed around three founding premises: that representing language as complex waves rather than real-valued numbers enables richer information encoding; that a Cayley-parameterised unitary transition provides a mathematical guarantee against state decay or explosion; and that a hidden state which rotates rather than shrinks preserves signal integrity over arbitrarily long contexts. The core of SWave evolved substantially across three development phases. The Resonance Head was found to structurally admit imaginary-channel collapse as a global loss minimum (a failure mode we term cos-domination collapse) and was superseded by an untied head with independent real and imaginary embedding tables from the Phase-Associative Memory (PAM) architecture. This resolved the degenerate minimum and enabled stable 200,000-step training (best-step PPL 22.0 at step 89,861). ComplexNorm and the Wave Propagation Scan proved load-bearing throughout all three phases and were retained to the final architecture. ProtectGatedScan was reframed as a structural prior rather than a learned behaviour. The four multi-scale retention concepts showed no measurable improvement under controlled evaluation and were found non-load-bearing. The ComplexGatedUnit was superseded by a real-valued squared-ReLU channel mixer with fewer parameters. The auxiliary training objectives showed no benefit once structural constraints were resolved. The investigation yields a formal characterisation of cos-domination collapse, a parallel scan with a log-space backward pass for numerical stability, six transferable engineering principles for complex-valued recurrent training, and a plan-to-code traceability methodology for catching structural divergences that conventional test suites miss.

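2606.18326 2026-06-18 cs.LG New submission

Neural Network Implementation of the Renormalization Group for Fault Diagnosis with Class Imbalance

Evgeny Nikulchev, Dmitry Ilin

Affiliations: MIREA – Russian Technological University

AI summary: Proposes RGNet, a neural network architecture based on the renormalization group that coarse-grains the feature space hierarchically to handle class imbalance and multidimensional noise; its effectiveness is validated on the AI4I dataset.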

Comments: 8 pages

Abstract

The application of machine learning models in practical tasks faces challenges such as class imbalance and multidimensional noise. This paper proposes RGNet, a neural network architecture based on the concept of the renormalization group (RG), for hierarchical coarse-graining of the feature space. The model sequentially compresses the input dimensionality and concatenates all scales before classification, allowing it to capture both local details and global patterns. The notion of RG-flows is introduced: interpretable low-dimensional representations whose visualization via t-SNE reveals a discrete curvilinear structure, confirming the effectiveness of coarse-graining. Experimental results are presented on the imbalanced AI4I dataset. The obtained results demonstrate that RGNet is a universal, interpretable, and competitive solution for fault prediction in applications with imbalanced classes.

2606.18388 2026-06-18 cs.LG cs.AI cs.CL cs.MA New submission

LLMZero: Discovering Adaptive Training Strategies for RL Post-Training via LLM Agents

Haoyang Fang, Wei Zhu, Boran Han, Alex Zhang, Zhenyu Pan, Shuo Yang, Shuai Zhang, Jiading Gai, Peng Tang, Cuixiong Hu, Xuan Zhu, Huzefa Rangwala, George Karypis, Bernie Wang

Affiliations: Amazon

AI summary: Proposes LLMZero, a system in which LLM agents use tree search to discover adaptive strategies for multi-stage RL post-training; it finds that capacity parameters accumulate monotonically while regularization parameters oscillate, and improves over the base model by 9% to 140% relative on 4 GRPO tasks.

Abstract

RL post-training strategies are dataset-dependent and reveal a recurring empirical pattern: capacity parameters accumulate monotonically across stages, while regularization parameters predominantly oscillate in response to shifting training dynamics. This distinction matters because fixed schedules commit all parameters to fixed trajectories and therefore cannot express the non-stationary exploration-exploitation tradeoffs that regularization must track; the principle provides actionable design rules for multi-stage training. We discover this through LLMZero, a system where LLM agents search over training trajectories via tree search, diagnosing pathologies at each checkpoint and proposing coordinated multi-parameter transitions. Across 4 diverse GRPO tasks, LLMZero discovers strategies that improve over the base model by 9% to 140% relative and over grid search by 6% to 15% relative, consistently outperforming random search and the skill-based agent. The structural principle transfers across tasks, providing an explanation for why discovered strategies take qualitatively different forms yet share similar parameter dynamics.

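2606.18457 2026-06-18 cs.LG New submission

Task-Restricted Symmetries in Recurrent Weight Space

Simon Dräger

Affiliations: Salk Institute for Biological Studies, La Jolla, CA, USA

AI summary: Analyzes one-layer tanh RNNs in ordered real Schur coordinates and finds functional redundancy in the recurrent matrix under the task distribution: selected nonnormal Schur couplings can be removed with little loss in some trained solutions, revealing task-restricted approximate functional invariances.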

Comments: 6 pages, 2 figures. Accepted at the ICML 2026 Workshop on Weight-Space Symmetries

Abstract

Recurrent networks can contain substantial functional redundancy in weight space: changing a recurrent matrix may leave the input-output rollout nearly unchanged on a task distribution, while similar-scale changes can destroy the same behavior. We study this redundancy in one-layer tanh RNNs using ordered real Schur coordinates. The Schur form separates spectral blocks from directed nonnormal couplings, giving a diagnostic basis for structured ablations that keep the input and readout maps fixed. In a fixed-length copy task, selected nonnormal Schur couplings can be removed with little loss in some trained solutions, whereas other couplings are necessary for accurate autonomous replay. Across flip-flop, sine generation, and context-dependent integration, the loss-preserving ablation profile varies across tasks and trained solutions. These results identify candidate approximate functional invariances, not universal symmetries of recurrent weight space. Schur-coordinate ablations provide a practical diagnostic for which structured perturbations preserve a trained recurrent solution and which ones disrupt its computation.

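2606.18487 2026-06-18 cs.LG cs.AI cs.CL New submission

SFT Overtraining Predicts Rank Inversion via Entropy Collapse Under RLVR

Siddharth Aphale, Kelly Liu

Affiliations: Stanford University

AI summary: Finds that SFT overtraining lowers the entropy of the rollout distribution, so the advantage signal in GRPO vanishes and checkpoint rankings invert; proposes a two-stage entropy-based diagnostic that flags high-risk checkpoints early.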

Comments: 14 pages, 6 figures. Accepted at the Deep Learning for Code (DL4C) Workshop at ICML 2026

Abstract

The standard heuristic of selecting the SFT checkpoint with the highest pass@1 for GRPO can fail when SFT compresses the rollout distribution. For binary rewards, the expected within-group advantage variance is $p(1{-}p)(g{-}1)/g$; when early GRPO drives $p$ below $p^*(g)$, most groups have identical rewards and provide no group-relative signal. We study SFT depth ladders for Qwen2.5-Coder-3B and DeepSeek-Coder-6.7B. We test Qwen2.5-Coder-3B across five depths and three seeds, and DeepSeek-Coder-6.7B across four matched depths and three seeds. On Qwen, pre-RL pass@1 rises with SFT depth, but peak GRPO pass@10 falls from $0.806$ to $0.481$ (3-seed mean, $n{=}20$); pre-RL entropy is positively associated with the GRPO outcome ($\rho{=}{+}0.69$). On DeepSeek, pass@1 remains far above $p^*(8){=}0.083$, and GRPO outcomes compress rather than invert. A two-stage diagnostic, combining pre-RL entropy triage with an early GRPO entropy monitor, flags high-risk checkpoints and can stop failing runs early. Simple KL-to-reference regularisation and label-smoothing variants do not rescue the collapsed Qwen checkpoint in our setting, suggesting the failure is not a trivial GRPO hyperparameter artefact.
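
The variance expression quoted in the abstract, and the way groups lose their signal as $p$ shrinks, can be checked with a few lines of Python. The abstract does not define $p^*(g)$; the helper `all_fail_threshold` below is only one guess (the pass rate at which a group of $g$ rollouts fails entirely with probability one half), chosen because it happens to reproduce the quoted value for $g = 8$. All function names are illustrative.

```python
def advantage_variance(p, g):
    """Expected within-group variance of binary rewards: p(1-p)(g-1)/g."""
    return p * (1 - p) * (g - 1) / g

def degenerate_group_fraction(p, g):
    """Probability that all g rollouts in a group share the same binary reward."""
    return p ** g + (1 - p) ** g

def all_fail_threshold(g):
    """Pass rate at which a group is all-fail with probability 1/2.

    Assumed reading of p*(g); the abstract does not state the definition.
    """
    return 1 - 0.5 ** (1 / g)
```

Under this reading `all_fail_threshold(8)` evaluates to roughly 0.083, and for any pass rate below it more than half of the groups of eight carry no group-relative signal.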

2606.18521 2026-06-18 cs.LG cs.AI New submission

Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging

Chenrui Wu, Zexi Li, Jiajun Bu, Jiangchuan Liu, Haishuai Wang

Affiliations: Zhejiang University; Simon Fraser University; The Chinese University of Hong Kong; Zhejiang Key Lab of Accessible Perception and Intelligent Systems

AI summary: Finds that the sparse updates of RLVR models are spread farther apart in parameter space, forming near-orthogonal shortcuts that make merging fragile, and proposes SAR-Merging to address the problem.

Comments: Accepted by KDD 2026

Abstract

Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful post-training paradigm that surpasses Supervised Fine-Tuning (SFT) in eliciting reasoning intelligence and resisting catastrophic forgetting. Recent studies further reveal that RLVR induces highly sparse and off-principal parameter updates compared to SFT. This naturally raises the question: does such sparsity make RLVR models more amenable to model merging? If so, model merging would offer a scalable, training-free path to aggregate diverse reasoning capabilities from independently trained RLVR models. Surprisingly, we find the opposite, uncovering a sparsity curse: the sparse RLVR updates are spread farther apart in parameter space, forming near-orthogonal shortcuts that make aggregation inherently fragile. This is likely rooted in the stochasticity of RL optimization and the diversity of emergent reasoning patterns. Unlike SFT models that converge to shared, flat basins and merge naturally, RLVR models suffer severe degradation under standard merging methods. Through systematic empirical analysis of the update geometry, we characterize the mechanisms behind this failure and propose Sensitivity-aware Resolving Merging (SAR-Merging), a merging recipe tailored for the unique structure of RLVR parameter spaces. SAR-Merging resolves conflicts in overlapping update regions via Fisher Information-based sensitivity arbitration, followed by magnitude-aware sparsification and rescaling to preserve fragile reasoning pathways. Experiments on mathematical and coding benchmarks demonstrate that SAR-Merging substantially outperforms existing merging methods on RLVR models, enabling both single-task enhancement and multi-capability fusion.

2606.18524 2026-06-18 cs.LG New submission

On the Residual Scaling of Looped Transformers: Stability and Transferability

Shaowen Wang, Bingrui Li, Ge Zhang, Wenhao Huang, Shen Yan, Jian Li

Affiliations: Tsinghua University

AI summary: For looped Transformers, argues that the residual scaling factor should be 1/N rather than 1/√L, and derives a factored parameterization for multi-layer blocks that lets hyperparameters transfer from few loops to many.

Comments: 19 pages, 9 figures

Abstract

Looped (weight-tied) Transformers apply a shared residual block $N$ times ($h \leftarrow h + \varepsilon\,f(h)$, same $f$ at each step), increasing effective depth without adding parameters. Prior depth-scaling analyses prescribe $\varepsilon = 1/\!\sqrt{L}$ for depth-$L$ residual networks. We show that this is insufficient for looped architectures: weight sharing makes residual updates correlated across iterations, requiring the stronger scaling $\varepsilon = 1/N$. For multi-layer blocks ($L$ unique layers looped $N$ times), we derive a factored parameterization $\varepsilon = \lambda/(N\!\sqrt{L})$ that separates the two sources of growth: $1/N$ controls the within-layer loop correlation, and $1/\!\sqrt{L}$ controls the across-layer variance. A key consequence is that the optimal learning rate depends only on the number of unique layers $L$, not on the loop count $N$, enabling direct hyperparameter transfer from small to large $N$ without retuning. Experiments on looped Transformers confirm that $1/N$ scaling improves trainability and yields better loss than $1/\!\sqrt{N}$ scaling across loop counts.
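
The contrast between the two scalings shows up even in a deliberately crude toy case, sketched below in Python. It assumes $f(h) = h$, so that every pass through the shared block pushes the state in the same direction (the fully correlated extreme); this choice and the function names are illustrative assumptions, not the paper's setup.

```python
import math

def looped_growth(n_loops, eps):
    """Apply h <- h + eps * f(h) n_loops times with the toy choice f(h) = h."""
    h = 1.0
    for _ in range(n_loops):
        h = h + eps * h  # the same block every step, so updates never cancel
    return h

def compare_scalings(n_loops):
    """Return the final state under eps = 1/N and under eps = 1/sqrt(N)."""
    return (
        looped_growth(n_loops, 1.0 / n_loops),
        looped_growth(n_loops, 1.0 / math.sqrt(n_loops)),
    )
```

With $\varepsilon = 1/N$ the final state is $(1 + 1/N)^N$, which stays below $e$ for every $N$; with $\varepsilon = 1/\sqrt{N}$ it is $(1 + 1/\sqrt{N})^N$, which grows roughly like $\exp(\sqrt{N})$ as the loop count increases.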

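2606.18525 2026-06-18 cs.LG New submission

Hierarchical Attention via Domain Decomposition

Stephan Köhler, Oliver Rheinbach

Affiliations: Faculty of Mathematics and Computer Science

AI summary: Proposes a hierarchical attention mechanism based on two-level overlapping Schwarz domain decomposition, combining local low-rank attention blocks with a coarse attention block to train faster and reach higher accuracy with fewer parameters.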

Comments: 20 pages, 10 figures

Abstract

We propose a hierarchical attention mechanism based on two-level overlapping Schwarz domain decomposition. The method is motivated by the observation that two-level Schwarz domain decomposition methods combine local subdomain corrections with a coarse level that communicates global, long-range information. We test its usefulness in the context of finite-dimensional operator learning using a simple, one-dimensional diffusion problem with homogeneous Dirichlet boundary conditions. Although elementary, this problem provides a controlled sequence-to-sequence setting in which the exact nonlocal solution operator is known. After discretization, learning the solution operator amounts to approximating the inverse of a symmetric positive definite matrix. As a baseline, we use a global softmax-free low-rank attention operator of the form $QK^T$. The proposed construction replaces this dense global factorization by a two-level additive structure: local low-rank attention blocks on overlapping subdomains are combined with a coarse attention block. The resulting operator has the form $$M_{\theta}^{-1} = \Phi Q_0 K_0^T \Phi^T + \sum_{i=1}^{N} R_i^T D_i^{1/2} Q_i K_i^T D_i^{1/2} R_i.$$ Here $R_i$ restricts to an overlapping subdomain, $D_i$ is a partition-of-unity weight, and $\Phi$ is a coarse interpolation (or prolongation) matrix. Numerical experiments for synthetic Fourier right-hand sides indicate that the domain-decomposition attention operator is able to train faster and can give more accurate approximations than a global low-rank attention baseline while using significantly fewer parameters.
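
The operator formula in the abstract can be assembled directly with NumPy, as in the sketch below. It builds dense matrices for clarity, uses inverse-multiplicity weights as one possible partition of unity, and leaves the factors $Q_i$, $K_i$ and the coarse matrix $\Phi$ as plain inputs; these choices and all function names are assumptions for illustration rather than the authors' implementation.

```python
import numpy as np

def restriction(n, idx):
    """Boolean restriction matrix R_i selecting the nodes idx out of n."""
    r = np.zeros((len(idx), n))
    r[np.arange(len(idx)), idx] = 1.0
    return r

def partition_of_unity(n, subdomains):
    """Diagonal weights D_i: one over the number of subdomains covering each node."""
    counts = np.zeros(n)
    for idx in subdomains:
        counts[idx] += 1
    return [1.0 / counts[idx] for idx in subdomains]

def two_level_operator(phi, q0, k0, subdomains, weights, qs, ks):
    """Coarse term plus weighted local low-rank terms, as in the quoted formula."""
    n = phi.shape[0]
    m = phi @ q0 @ k0.T @ phi.T  # coarse block: global, long-range coupling
    for idx, d, q, k in zip(subdomains, weights, qs, ks):
        r = restriction(n, idx)
        d_half = np.diag(np.sqrt(d))
        m += r.T @ d_half @ q @ k.T @ d_half @ r  # local block on subdomain i
    return m
```

With the coarse term switched off, entries coupling nodes that share no subdomain are exactly zero, which is why the coarse block is needed to carry long-range information.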

2606.18627 2026-06-18 cs.LG New submission

PACT: Preserving Anchored Cores in Task-vectors for Model Merging

Ningyuan Shi, Zhipeng Zhou, Hao Wang, Chunyan Miao, Peilin Zhao

Affiliations: Shanghai Jiao Tong University; Nanyang Technological University; The Hong Kong University of Science and Technology (Guangzhou)

AI summary: Proposes PACT, which identifies and preserves Load-Bearing Wall dimensions in the pre-trained weights and anchors task-specific cores within task vectors, addressing task conflicts and performance degradation under the task-vector paradigm and improving model merging.

Comments: 33 pages, 14 figures

Abstract

Model merging has emerged as a training-free alternative to multi-task learning, aiming to combine multiple task-specific fine-tuned models into a single multi-task model. Most existing model merging approaches follow the Task Arithmetic paradigm, which decomposes fine-tuned weights into pre-trained parameters and task vectors, and performs merging exclusively in the task-vector space. The effectiveness of this paradigm implicitly relies on the assumption that task-specific knowledge is encoded solely within task vectors. We argue that this assumption generally does not hold due to the intrinsic task preferences of pre-trained models. Specifically, we identify Load-Bearing Wall (LBW) dimensions: task-critical knowledge that remains embedded in the pre-trained weights rather than being fully transferred into task vectors. We characterize LBW dimensions from both scalar-weight and subspace perspectives, thereby covering the major paradigms of existing model merging methods. Our analysis reveals that, by ignoring LBW dimensions, task-vector-based approaches fail to fully resolve task conflicts and may inadvertently damage task-specific knowledge encoded in the pre-trained model, leading to degradation. To address this issue, we propose PACT, which preserves the anchored task-specific cores (i.e., LBW dimensions) within task vectors by aligning their orthogonal complements with the subspace of the pre-trained weights. These aligned subspace components are then removed from the task vectors before applying existing model merging algorithms. Furthermore, we develop an efficient variant based on randomized SVD to improve scalability. PACT can be seamlessly integrated with existing methods. Extensive experiments across multiple benchmarks demonstrate that PACT consistently enhances mainstream model merging approaches and establishes new state-of-the-art performance.

2606.18676 2026-06-18 cs.LG cs.CV New submission

InTrain: Intrinsic Trainability for Zero-Cost Neural Architecture Search

Qinqin Zhou, Fuhai Chen, Jipeng Wu, Zhiwei Chen, Zhikai Hu, Weiwei Cai

Affiliations: School of Computer and Data Science, Fuzhou University; School of Computer and Data Science, Minjiang University; School of Artificial Intelligence, Nanchang University; Department of Computer Science, Hong Kong Baptist University; School of Interdisciplinary Medicine and Engineering, Harbin Medical University

AI summary: Proposes InTrain, a unified theoretical proxy that formalizes architectural trainability through two synergistic components, geometric capacity and optimization resilience, and reaches ranking correlations on par with ensemble-based methods on NAS benchmarks.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Abstract

Training-free neural architecture search promises efficient discovery of high-performance networks without costly training. However, existing zero-cost proxies rely on fragmented heuristics that fail to capture the fundamental question: what makes an architecture trainable? This paper introduces Intrinsic Trainability (InTrain), a unified theoretical proxy that formalizes trainability as an architectural invariant emerging from two synergistic components: geometric capacity and optimization resilience. We operationalize intrinsic trainability through analysis of neural information processing. Geometric capacity is quantified via the participation ratio of activation covariance eigenspectrum, capturing the effective dimensionality of representation manifolds. Optimization resilience is measured through cumulative gradient health, assessing the robustness of backpropagation across network depth. InTrain synthesizes these dimensions through a scale-invariant multiplicative coupling, which we hypothesize is essential for capturing their synergistic, non-additive relationship. Extensive experiments on standard NAS benchmarks and search spaces demonstrate that InTrain achieves ranking correlations on par with state-of-the-art ensemble-based proxies and outperforms other single-metric methods.
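
The participation ratio mentioned in the abstract is commonly defined as $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ over the eigenvalues $\lambda_i$ of the activation covariance. The NumPy sketch below uses that standard definition; whether InTrain applies further normalization, and how it combines the result with gradient health, is not stated in the abstract, so the function is an illustrative assumption and not the paper's proxy.

```python
import numpy as np

def participation_ratio(activations):
    """Effective dimensionality of activations with shape (samples, features).

    Uses the standard definition (sum of eigenvalues)^2 / (sum of squared
    eigenvalues) of the sample covariance; assumes non-constant activations.
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (activations.shape[0] - 1)
    eig = np.linalg.eigvalsh(cov)  # covariance is symmetric, so eigvalsh applies
    return eig.sum() ** 2 / (eig ** 2).sum()
```

The value ranges from 1, when all variance lies along a single direction, up to the number of features, when variance is spread evenly across orthogonal directions.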

2606.18694 2026-06-18 cs.LG cond-mat.dis-nn cs.CL cs.NE nlin.AO New submission

Attention as Frustrated Synchronization

Joshua Nunley

Affiliations: Cognitive Science Program; Luddy School of Informatics, Computing, and Engineering; Indiana University Bloomington

AI summary: Proposes the Frustrated Synchronization Network (FSN), which implements synchronization-based attention through a complex-valued coupling kernel and a delay term, and outperforms a tuned RoPE-SwiGLU transformer on character-level text and code at the one-million-parameter scale.

Comments: 25 pages, 4 figures. Preliminary report at the 1-10M parameter scale

Abstract

A network of oscillators that synchronizes perfectly computes nothing further, so an attention architecture built from synchronization must locate its computation in structured departures from agreement. We introduce the Frustrated Synchronization Network (FSN), whose token states are phases on a torus and whose entire value pathway is one learned complex coupling kernel over harmonics and a one-step delay. Each component of the kernel is a frustration in the sense of the synchronization literature. The complex phases are static Kuramoto-Sakaguchi frustration angles, the signed harmonics are repulsive Daido components, and the delay term, which couples each token to the successors of the tokens it attends to, is algebraically identical to Kuramoto-Sakaguchi coupling whose frustration angle is the data's own transition, so next-token prediction is implemented as synchronization frustrated by the data. At matched one-million-parameter and training budgets on character-level text and code, the FSN's validation loss is below a tuned RoPE-SwiGLU transformer's at every epoch measured, and the comparison survives training the baseline to convergence: every thirty-epoch enwik8 seed finishes below the transformer's converged fifty-epoch loss of 1.611, and the FSN's completed fifty-epoch runs converge to 1.5953 +/- 0.0014. A variant with every feed-forward block replaced by mean-field coupling to learned collective modes, leaving no multilayer perceptron in the stack, tracks the transformer. On natural text the unfrustrated base layer falls behind the converged transformer at every copy depth, worst on long-range copy events; the kernel reverses the deficit at every depth of four and beyond. Headline comparisons are at the one-million-parameter scale; a scale ladder is complete through four million parameters with the advantage persisting, and remaining arms are marked as in progress.

2606.18844 2026-06-18 cs.LG 新提交

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

从自身错误中学习:为自蒸馏构建可学习的微反思轨迹

Zhilin Huang, Hang Gao, Ziqiang Dong, Yuan Chen, Yifeng Luo, Chujun Qin, Jingyi Wang, Yang Yang, Guanjun Jiang

Affiliations: Qwen Business Unit of Alibaba; Tsinghua University; Peking University

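AI summary: Proposes TAPO, which contrasts correct and incorrect rollouts to construct micro-reflective corrections, moving self-distillation from implicit distributional alignment to explicit trajectory construction; it improves over GRPO on several mathematical reasoning benchmarks.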

AI中文摘要

自蒸馏通过使用模型自身的生成作为训练信号来改进大型语言模型的推理能力,通常通过隐式的logit级对齐来实现,最小化与特权目标分布的KL散度。然而,由于这种监督是通过无控制采样生成的,它无法提供关于模型特定错误的诊断性洞察,也无法针对其个体失败模式提供纠正性指导。因此,模型学习的是模仿特权分布,而不是接收精确指出其推理失败位置和原因的细粒度修正。在本文中,我们提出了轨迹增强策略优化(TAPO),将自蒸馏从隐式分布对齐推进到显式轨迹构建。在强化学习训练期间,模型对同一查询同时产生正确和错误的生成轨迹,TAPO利用这种对比结构来构建微反思修正——新的训练轨迹,保留模型在失败点之前的错误推理,然后插入自然语言诊断和由同一采样组中的正确参考引导的修正推理。由于每条轨迹都锚定在学习者自身的前缀和解决方案上,与基于KL的方法施加的位置级对齐相比,修正信号在更大程度上保留了模型的在策略分布。为了整合这些轨迹,TAPO在模型能力边界引入了难度感知的候选选择,并采用解耦优势估计以防止梯度污染。在AIME 2024、AIME 2025和HMMT 2025上的实验表明,在相同训练步数下,TAPO相比GRPO取得了一致的改进。进一步分析表明,TAPO增强了首次推理和错误纠正的有效性。

英文摘要

Self-distillation improves reasoning in large language models by using the model's own rollouts as training signal, typically through implicit logit-level alignment that minimizes KL divergence toward a privileged target distribution. However, because this supervision is generated via uncontrolled sampling, it provides no diagnostic insight into the model's specific errors or corrective guidance for its individual failure patterns. Consequently, the model learns to imitate a privileged distribution rather than receiving fine-grained corrections that pinpoint where and why its reasoning fails. In this paper, we propose Trajectory-Augmented Policy Optimization (TAPO), which advances self-distillation from implicit distributional alignment to explicit trajectory construction. During RL training, the model produces both correct and incorrect rollouts to the same query, and TAPO leverages this contrastive structure to construct micro-reflective corrections, new training trajectories that retain the model's erroneous reasoning up to the point of failure, then insert a natural-language diagnosis and corrected reasoning guided by a correct reference from the same sampling group. Since each trajectory is anchored in the learner's own prefix and solutions, the corrective signal preserves the model's on-policy distribution to a greater extent than the position-wise alignment imposed by KL-based methods. To integrate these trajectories, TAPO introduces difficulty-aware candidate selection at the model's capability boundary and decoupled advantage estimation to prevent gradient contamination. Experiments on AIME 2024, AIME 2025, and HMMT 2025 show that TAPO achieves consistent improvements over GRPO under the same number of training steps. Further analysis demonstrates that TAPO strengthens both first-pass reasoning and error-correction effectiveness.

2606.18923 2026-06-18 cs.LG 新提交

GrapNet: A Programmable Dynamic-Architecture Neural Graph Substrate

GrapNet: 一种可编程的动态架构神经图基板

Zirong Li

Affiliations: Zirong Li

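AI summary: Proposes GrapNet, a neural substrate that treats the graph as the executable architecture and exposes a programmable interface for structural edits, subgraph freezing, and local audits; it improves accuracy by 12.08 and 3.81 percentage points on Split Fashion-MNIST and Split CIFAR-10, respectively.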

Comments 8 pages, 1 figure, preprint

AI中文摘要

可编程性是固定张量神经网络中缺失的一流接口:编辑关系、冻结子图、审计局部函数或更改执行后端应是对神经程序的操作,而非临时参数手术。GrapNet研究这种图即网络的设置。图是架构和可执行程序,而非输入数据图。每个计算节点拥有其下一层子节点引用和与这些引用对齐的可训练分配向量;删除关系会物理移除子节点引用和相应的分配坐标。结构规则和执行策略位于节点核心之外,因此同一子节点拥有的图可以被增长、冻结、结构编辑、分组为可训练族块、通过注意力在活动关系上路由,或在拓扑稳定后降级为密集快照。GrapNet通过向量值父接口与常规模块组合:密集层、CNN编码器、ResNet特征提取器、注意力块和Transformer表示都可以为每个坐标提供一个感知GrapNode。评估组织为可编程性压力测试套件,而非新的重放基准。在匹配的十种子Split Fashion-MNIST研究中,可塑GrapNet+ER头在相同已见类损失和重放记忆下达到63.16%的已见类准确率,而参数更大的密集MLP+ER为51.08%,配对差值为12.08点,p=1.3e-5。在Split CIFAR-10上使用冻结的ImageNet ResNet-18编码器时,相同基板将在线头比MLP-256提高3.81点,p=0.0026。这些结果支持GrapNet作为可编辑的神经图基板,其核心价值在于具有忠实执行视图的结构可编程性。

英文摘要

Programmability is a missing first-class interface in fixed-tensor neural networks: editing a relation, freezing a subgraph, auditing a local function, or changing the execution backend should be an operation on the neural program rather than ad-hoc parameter surgery. GrapNet studies this graph-as-network setting. The graph is the architecture and executable program, not an input data graph. Each compute node owns its next-layer child references and a trainable allocation vector aligned with those references; deleting a relation physically removes both the child reference and the corresponding allocation coordinate. Structural rules and execution policies live outside the node core, so the same child-owned graph can be grown, frozen, structurally edited, grouped into trainable family blocks, routed by attention over active relations, or lowered to dense snapshots after topology stabilizes. GrapNet composes with conventional modules through a vector-valued parent interface: dense layers, CNN encoders, ResNet feature extractors, attention blocks, and transformer representations can all feed one sensory GrapNode per coordinate. The evaluation is organized as a programmability stress suite rather than as a new replay benchmark. In a matched ten-seed Split Fashion-MNIST study, a plastic GrapNet+ER head reaches 63.16 percent seen-class accuracy versus 51.08 percent for a parameter-larger dense MLP+ER under the same seen-class loss and replay memory, with paired delta 12.08 points and p=1.3e-5. On Split CIFAR-10 with a frozen ImageNet ResNet-18 encoder, the same substrate improves the online head over MLP-256 by 3.81 points, with p=0.0026. These results support GrapNet as an editable neural graph substrate whose core value is structural programmability with faithful execution views.

2606.19120 2026-06-18 cs.LG cs.CV 新提交

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

先看后思:解耦感知与推理以实现抗捷径的多模态在策略自蒸馏

Sihan Wang, Xiyao Liu, Lianqing Liu, Zhi Han

Affiliations: State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences

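AI summary: Proposes ViGOS, a framework that decouples perception from reasoning during MLLM post-training to avoid text shortcuts and strengthen image-grounded behavior.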

Comments 29 pages, 5 figures, 8 tables

AI中文摘要

在策略自蒸馏(OPSD)训练模型在其自身rollouts上,并使用冻结副本提供基于参考目标的密集token级目标。这对于LLM推理效果良好,但直接扩展到多模态大语言模型(MLLMs)可能产生捷径:特权目标可能主要基于文本参考目标而非图像来引导token。我们提出ViGOS,一种视觉引导的OPSD框架用于MLLM后训练。学生首先编写视觉描述,然后推理出最终答案。对于有效rollouts,仅图像的感知教师监督描述,而特权推理教师监督同一学生前缀上的推理和最终答案。仅对无效rollouts使用参考教师以恢复输出格式。在通用视觉-语言、专家推理、视觉数学、空间定位和视觉-语言先验基准测试中,ViGOS保持了OPSD的主要优势,并在易产生捷径的设置中改善了图像引导行为。

英文摘要

On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to multimodal large language models (MLLMs) can create a shortcut: the privileged target may guide tokens mainly based on the text reference target rather than the image. We propose ViGOS, a visually grounded OPSD framework for MLLM post-training. The student first writes a visual description and then reasons toward the final answer. For valid rollouts, an image-only perception teacher supervises the description, while a privileged reasoning teacher supervises the reasoning and final answer on the same student prefix. A reference teacher is used only for invalid rollouts to recover the output format. Across general vision-language, expert reasoning, visual math, spatial grounding, and visual-language-prior benchmarks, ViGOS keeps the main benefits of OPSD and improves image-grounded behavior in shortcut-prone settings.

2606.19138 2026-06-18 cs.LG stat.ML 新提交

INDEQS: Informed Neural controlled Differential EQuationS

INDEQS: 信息引导的神经控制微分方程

Michael Detzel, Gabriel Nobis, Kristiyan Blagov, Juri Schubert, Jackie Ma, Wojciech Samek

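AI summary: Proposes INDEQS, a graph-based NCDE forecasting method that injects prior knowledge of a directed graph at distinct architectural positions, combining inner and outer mixing with adaptive graph convolutions; it outperforms an uninformed NCDE on synthetic and real-world tasks.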

AI中文摘要

神经控制微分方程(NCDE)为时间序列预测提供了强大的连续时间框架,但标准的基于图的扩展通常纯粹从数据中学习空间结构,即使在已知有向图结构的情况下也是如此。我们引入了信息引导的神经控制微分方程(INDEQS),这是一种基于图的NCDE预测方法,在特定的架构位置融入有向图的先验知识。INDEQS将隐藏状态在图节点上的内部混合与向量场和控制之间的外部混合分开,并提供了一种轻量级的图约束变体和一种更具表现力的变体,通过自适应图卷积从数据中学习额外的图连接。为了系统研究图信息在预测中的有益时机,我们在有向图上设计了一个连续平流模拟,生成了具有已知真实流结构的合成时空数据集。然后,我们在两个实际任务上评估INDEQS:水文网络上的河流流量预测和PeMS08上的交通流预测。在这些合成和真实基准测试中,外部信息引导在参数数量相当的情况下,持续改善了无信息NCDE的平均绝对误差,尤其是在较大图上,而内部信息引导在需要严格遵循已知邻接时提供了一种更参数高效的替代方案。离散卷积和连续时间解码器的比较进一步表明,连续解码器在实际任务中提供了更好的准确性和更大的时间灵活性。INDEQS和平流模拟的实现见 https://github.com/Mitchi1/indeqs 。

英文摘要

Neural Controlled Differential Equations (NCDE) provide a powerful continuous-time framework for forecasting time series, but standard graph-based extensions typically learn spatial structure purely from data, even in settings where a directed graph structure is known a priori. We introduce Informed Neural controlled Differential EQuationS (INDEQS), a graph-based NCDE forecasting method that incorporates prior knowledge of a directed graph at distinct architectural positions. INDEQS separates inner mixing of hidden states across graph nodes from outer mixing between vector field and control, and offers both a lightweight graph-constrained variant and a more expressive variant, learning additional graph connections from data via adaptive graph convolutions. To systematically study when graph informedness is beneficial in forecasting, we devise a continuous advection simulation on directed graphs, yielding synthetic spatio-temporal datasets with known ground-truth flow structure. We then evaluate INDEQS on two real-world tasks: river discharge forecasting on a hydrological network and traffic flow prediction on PeMS08. Across these synthetic and real-world benchmarks, outer informedness consistently improves mean absolute error over an uninformed NCDE with comparable parameter count, particularly on larger graphs, while inner informedness offers a more parameter-efficient alternative when strict adherence to a known adjacency is desired. A comparison of discrete convolutional and continuous-time decoders further shows that continuous decoders yield better accuracy and greater temporal flexibility on real-world tasks. An implementation of INDEQS and the advection simulation is available at https://github.com/Mitchi1/indeqs.

2606.18275 2026-06-18 cs.ET cond-mat.mtrl-sci cs.LG 交叉投稿

A physical adaptive material motor unit neural network: a hygromorph composite material machine

一种物理自适应材料运动单元神经网络:潮致变形复合材料机器

Charles de Kergariou, David Correa, Adam W. Perriman, Helmut Hauser, Fabrizio Scarpa

Affiliations: Bristol Composites Institute, School of Civil, Aerospace and Mechanical Engineering, University of Bristol; School of Architecture, University of Waterloo; Research School of Chemistry and John Curtin School of Medical Research, Australian National University; School of Cellular and Molecular Medicine, University of Bristol; School of Engineering Mathematics and Technology, University of Bristol; Bristol Robotics Lab, Bristol, United Kingdom

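AI summary: Proposes a physical adaptive motor unit neural network built from wood- and carbon black-based composites; trained with data-aware backpropagation, it performs dynamic shading control and learns incrementally as the database expands.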

Comments 35 pages, 16 figures

AI中文摘要

新型材料科学的进步使得结构能够通过将记忆和学习能力直接嵌入材料来充当智能机器。我们的工作介绍了一种物理自适应材料运动单元神经网络,利用由木材和炭黑基复合材料组成的新一代可控执行器,这些执行器对温度和相对湿度敏感。这些材料执行器被组装成一种类似肌肉收缩触发的运动单元结构,形成一种能够进行动态遮阳控制的智能机器,例如可用于建筑物。该机器由一个神经网络控制,该网络在超过350个在不同环境条件下收集的实验数据点上进行训练。通过建立一种新的数据感知反向传播训练,我们展示了该机器能够预测遮阳响应,并随着数据库的扩展逐步学习预测适当的行为。我们还展示了该机器优化配置以在两种不同条件下实现相似遮阳输出的能力。

英文摘要

Advances in novel materials science enable structures to function as intelligent machines by embedding memory and learning capabilities directly into materials. Our work introduces a physical adaptive material motor unit neural network, leveraging a new generation of controllable actuators composed of wood- and carbon black-based composites that are sensitive to temperature and relative humidity. These material actuators are assembled into a motor unit-like structure inspired by the triggering of muscle contraction, forming an intelligent machine capable of dynamic shading control that can be used, for example, in buildings. The machine is governed by a neural network trained on over 350 experimental data points collected under diverse environmental conditions. By establishing a new data-aware backpropagation training, we show that the machine predicts shading responses and learns to predict appropriate behaviour incrementally as the database expands. We also demonstrate the ability of the machine to optimise configurations to achieve similar shading outputs under two distinct conditions.

2606.18305 2026-06-18 math.NA cs.LG cs.NA 交叉投稿

Starter-Iterator Neural Operator: A Unified Architecture for High-Fidelity Forward and Inverse PDE Problems

起始迭代神经算子:面向高保真正问题和逆问题的统一架构

Kuilin Qin, Lianfang Wang, Xu Sun, Jiwei Jia, Yu Wang, Yong Wang, Yuping Duan

Affiliations: School of Mathematical Sciences, Beijing Normal University; School of Mathematics, Jilin University; Key Laboratory of Digital Technology in Medical Diagnostics of Zhejiang; School of Physics, Nankai University

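AI summary: Proposes the Starter-Iterator Neural Operator (SINO), which reinterprets the initialization and iteration schemes of traditional iterative methods through neural networks to model spectral and spatiotemporal structure jointly; it improves numerical accuracy and generalization on forward and inverse problems such as the Navier-Stokes and acoustic wave equations.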

AI中文摘要

算子学习是一个新兴的交叉学科领域,融合了机器学习与科学计算。通过映射无限维函数空间,该方法为高维偏微分方程(PDE)提供了高效的代理建模框架。与传统数值求解器相比,它在计算复杂度和逼近精度之间实现了更优的权衡,在实时预测和参数扫描等多查询任务中展现出显著优势。鉴于正演模拟和反演推理对精度的严格要求,以及现有算子学习方法在处理复杂边界或长期演化时的精度瓶颈,我们提出了起始迭代神经算子(SINO)。我们的框架通过神经网络重新诠释传统迭代方法的初始化策略和迭代格式,建立了一种高效的频谱-时空协同建模方法。具体而言,频域初始化模块捕获全局稳定的低频特征,而时域学习模块专注于优化局部解残差,从而有效克服了传统单域建模方法的内在局限性。在典型动力系统(如Navier-Stokes方程和声波方程)以及实际应用(包括超分辨率成像和天气预报)上的大量实验表明,SINO在数值精度、泛化能力和鲁棒性方面均取得了卓越性能。

英文摘要

Operator learning is an emerging interdisciplinary field that integrates machine learning with scientific computing. By mapping infinite-dimensional function spaces, this approach provides an efficient surrogate modeling framework for high-dimensional partial differential equations (PDEs). Compared to traditional numerical solvers, it achieves a superior trade-off between computational complexity and approximation accuracy, demonstrating significant advantages in many-query tasks such as real-time prediction and parameter sweeps. Given the stringent accuracy requirements of both forward simulation and inverse inference, as well as the precision bottlenecks of existing operator learning methods in handling complex boundaries or long-term evolution, we propose the Starter-Iterator Neural Operator (SINO). Our framework reinterprets the initialization strategies and iterative formats of traditional iterative methods through neural networks, establishing an efficient approach for spectral-spatiotemporal collaborative modeling. Specifically, the frequency-domain initialization module captures globally stable low-frequency features, while the time-domain learning module focuses on optimizing local solution residuals, thereby effectively overcoming the inherent limitations of conventional single-domain modeling approaches. Extensive experiments on typical dynamical systems such as the Navier-Stokes equations and acoustic wave equations, as well as practical applications including super-resolution imaging and weather forecasting, demonstrate that SINO achieves outstanding performance in numerical accuracy, generalization capability, and robustness.

2606.18611 2026-06-18 cs.SD cs.AI cs.LG stat.ML 交叉投稿

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

QC-GAN: 一种参数高效的四元数Conformer GAN用于高保真语音增强

Shogo Yamauchi, Hideaki Tamori, Makoto Sakai, Yosuke Yamano, Tohru Nitta

Affiliations: The Asahi Shimbun Company; Tokyo Woman's Christian University

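AI summary: Proposes the parameter-efficient QC-GAN, which combines a Quaternion Conformer generator with MetricGAN-based training and cuts parameters through Hamilton-product weight sharing; with 0.89M parameters it reaches a PESQ of 3.48 on VoiceBank+DEMAND, comparable to models more than twice its size.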

Comments 10 pages, 6 figures and 5 tables. Accepted at Interspeech2026

AI中文摘要

我们提出了一种参数高效的语音增强框架——四元数Conformer GAN(QC-GAN),它将四元数Conformer生成器与基于MetricGAN的训练相结合。汉密尔顿积通过结构化权重共享对幅度和相位进行编码,在减少层参数数量的同时保持其相互依赖性。采用度量学习判别器,通过优化近似感知评估分数来最大化感知质量。在VoiceBank+DEMAND数据集上,QC-GAN仅用0.89M参数就达到了3.48的语音质量感知评估(PESQ)分数,其性能与最先进模型相当,而参数量不到后者的一半。一个35K参数的变体实现了3.23的PESQ分数,以显著更少的参数超越了传统方法。在DNS-Challenge 3数据集上的评估进一步证实了其在真实世界条件下的泛化能力。

英文摘要

We propose a parameter-efficient speech enhancement framework, Quaternion Conformer GAN (QC-GAN), which combines a Quaternion Conformer generator with MetricGAN-based training. The Hamilton product encodes the magnitude and phase via structured weight sharing, reducing the number of layer parameters while preserving their interdependencies. A metric-learning discriminator was employed to maximize perceptual quality by optimizing the approximate perceptual evaluation scores. On the VoiceBank+DEMAND dataset, QC-GAN achieved a Perceptual Evaluation of Speech Quality (PESQ) score of 3.48 with only 0.89M parameters, delivering a performance comparable to state-of-the-art models at less than half their size. A 35K-parameter variant achieved a PESQ score of 3.23, surpassing conventional methods with significantly fewer parameters. Evaluation on the DNS-Challenge 3 dataset further confirmed generalization to real-world conditions.
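The weight sharing that the Hamilton product provides can be made concrete with a small sketch. It shows only the standard quaternion product and its equivalent 4x4 real matrix, not the QC-GAN layers; the function names are illustrative assumptions.

```python
def hamilton_product(q, p):
    """Hamilton product q * p of quaternions given as (real, i, j, k) tuples."""
    a, b, c, d = q
    w, x, y, z = p
    return (
        a * w - b * x - c * y - d * z,
        a * x + b * w + c * z - d * y,
        a * y - b * z + c * w + d * x,
        a * z + b * y - c * x + d * w,
    )

def left_matrix(q):
    """4x4 real matrix that applies left multiplication by q.

    All 16 entries are built from the same 4 numbers (up to sign), which is
    the structured weight sharing: a free 4x4 real map would need 16.
    """
    a, b, c, d = q
    return [
        [a, -b, -c, -d],
        [b, a, -d, c],
        [c, d, a, -b],
        [d, -c, b, a],
    ]

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return tuple(sum(m * v for m, v in zip(row, vector)) for row in matrix)
```

For any weight quaternion `q` and input `p`, `matvec(left_matrix(q), p)` equals `hamilton_product(q, p)`, so a quaternion-valued weight acts as a constrained real-valued linear map with a quarter of the free parameters.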

2606.18759 2026-06-18 cs.CG cs.LG cs.NA math.NA 交叉投稿

A Neural Network Framework for Geodesic-Like Curve Computation on Parametric Surfaces

参数曲面上类测地线曲线计算的神经网络框架

Sheng-Gwo Chen, Chen-Chang Peng

Affiliations: Department of Applied Mathematics, National Chiayi University, Chia-Yi 600, Taiwan

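AI summary: Proposes a framework based on Physics-Informed Neural Networks (PINNs) for efficiently computing geodesic-like curves on parametric surfaces, including multi-surface systems and surfaces of revolution.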

Comments 22 pages, 16 figures, 8 tables

AI中文摘要

类测地线曲线的概念由Chen于2010年提出,作为估计参数曲面上最短路径(测地线)的一种方法,其收敛性已在理论上得到证明。然而,高效的数值计算框架尚未被开发。在本文中,我们提出了一种优雅且高效的方法,通过利用深度学习和物理信息神经网络(PINNs)来计算类测地线曲线。在所提出的框架下,不仅可以高效处理单个参数曲面,还可以稳健地处理一大类复杂参数曲面,包括具有$C^0$或更高连续性的多曲面系统以及旋转曲面。

英文摘要

The concept of geodesic-like curves was introduced by Chen in 2010 as a method for estimating shortest paths (geodesics) on parametric surfaces, with its convergence established theoretically. However, an efficient numerical computational framework has not yet been developed. In this paper, we propose an elegant and efficient approach for computing geodesic-like curves by leveraging deep learning and Physics-Informed Neural Networks (PINNs). Under the proposed framework, not only can single parametric surfaces be handled efficiently, but a broad class of complex parametric surfaces including multi-surface systems with $C^0$ or higher continuity and surfaces of revolution can also be robustly addressed.

2606.18837 2026-06-18 cs.MA cs.AI cs.LG 交叉投稿

Skill-MAS: Evolving Meta-Skill for Automatic Multi-Agent Systems

Skill-MAS: 演化元技能以自动生成多智能体系统

Hehai Lin, Qi Yang, Chengwei Qin

Affiliations: Ant Group; The Hong Kong University of Science and Technology (Guangzhou)

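AI summary: Proposes Skill-MAS, which casts high-level orchestration capability as an evolvable Meta-Skill so that experience is retained without parameter updates; the Meta-Skill is refined through Multi-Trajectory Rollout and Selective Reflection, yielding performance gains across multiple benchmarks and LLMs at a favorable cost.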

AI中文摘要

基于大型语言模型(LLM)的自动多智能体系统(MAS)生成已成为处理复杂任务的关键前沿。然而,现有方法在模型能力和经验保留之间面临两难困境。推理时MAS利用冻结的尖端LLM,但重复相同搜索而不从过去经验中学习。相反,训练时MAS通过梯度更新内化经验,但受限于较小模型的低能力上限,且难以扩展到大型尖端LLM。为弥合这一差距,我们提出Skill-MAS,一种新颖的第三条路径,通过将高层编排能力概念化为可演化的元技能,将经验保留与参数更新解耦。Skill-MAS通过一个封闭优化循环来精炼这种架构知识:(1)多轨迹采样在当前元技能下为每个任务采样行为分布;(2)选择性反思自适应选择优先任务,并应用分层对比分析将系统经验蒸馏为可泛化的策略级原则。在四个复杂基准和四个不同LLM上的大量实验表明,Skill-MAS不仅实现了显著的性能提升,而且保持了良好的成本-性能权衡。进一步分析揭示,演化后的元技能高度鲁棒,并在未见任务和不同LLM之间表现出强迁移性。

英文摘要

Large Language Model (LLM)-based automatic Multi-Agent Systems (MAS) generation has become a crucial frontier for tackling complex tasks. However, existing methods face a dilemma between model capability and experience retention. Inference-time MAS leverages frozen frontier LLMs but repeats identical searches without learning from past experience. Conversely, Training-time MAS internalizes experience via gradient updates but is constrained by the low capability ceiling of smaller models, and is hard to scale to large frontier LLMs. To bridge this gap, we propose Skill-MAS, a novel third path that decouples experience retention from parametric updates by conceptualizing the high-level orchestration capability as an evolvable Meta-Skill. Skill-MAS refines this architectural knowledge through a closed optimization loop: (1) Multi-Trajectory Rollout samples a behavioral distribution for each task under the current Meta-Skill; and (2) Selective Reflection adaptively selects priority tasks and applies hierarchical contrastive analysis to distill systemic experience into generalizable, strategy-level principles. Extensive experiments across four complex benchmarks and four distinct LLMs demonstrate that Skill-MAS not only achieves remarkable performance gains but also maintains a favorable cost-performance trade-off. Further analysis reveals that the evolved Meta-Skills are highly robust and exhibit strong transferability across unseen tasks and different LLMs.

2606.18853 2026-06-18 stat.ML cs.LG 交叉投稿

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

划分路径的核:树集成的统一表示

Nicolas Mahler

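AI summary: Proposes the KPP kernel, which indexes the feature map by forest nodes weighted by a path metric and unifies prediction, exact additive attribution, a deterministic Lipschitz robust radius, and Rademacher risk bounds, giving tree ensembles a geometric framework.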

Comments 31 pages

AI中文摘要

最近的一系列工作将单个决策树重新表述为基于其分裂的工程特征的线性模型,为oracle不等式和特征重要性重解释开辟了途径,但留下了一个开放问题:当通过节点而非分裂索引特征映射时,森林诱导的统一几何对象是什么。本文研究了该对象。KPP通过森林节点索引特征映射,并由路径度量加权,该度量将每个坐标转化为平方欧几里得路径等距嵌入的分量。KPP在承载度量的非对角Gram矩阵下统一了四个支柱:预测、精确加性归因、KPP度量下的确定性Lipschitz鲁棒半径,以及在固定、诚实或交叉拟合条件下的回归和分类的均匀Rademacher风险界。所有概率保证均以表示为条件,并在三种显式条件机制下陈述;鲁棒半径保证在KPP度量下是确定性的,而非原始输入的范数。回归和分类的快速率改进被推测为开放问题,并未声称是定理。

英文摘要

A recent line of work has reframed individual decision trees as linear models on engineered features associated with their splits, opening routes for oracle inequalities and feature-importance reinterpretation, but leaving open the question of what unified geometric object a forest induces when one indexes its feature map by nodes rather than by splits. The present paper studies that object. KPP indexes the feature map by the nodes of the forest, weighted by a path metric that turns each coordinate into a component of a squared-Euclidean path-isometric embedding. KPP unifies four pillars under a single non-diagonal Gram that carries a metric: prediction, exact additive attribution, deterministic Lipschitz robust radius in the KPP metric, and uniform Rademacher risk bounds for regression and classification under fixed, honest, or cross-fit conditioning. All probabilistic guarantees are conditional on the representation and are stated under three explicit conditioning regimes; the robust-radius guarantee is deterministic in the KPP metric rather than in a norm on the raw input. Conjectured fast-rate refinements for both regression and classification are stated as open problems and are not claimed as theorems.

2606.19039 2026-06-18 cs.NE cs.LG cs.SD 交叉投稿

Adaptive Speech-to-Spike Encoding for Spiking Neural Networks

自适应语音到脉冲编码用于脉冲神经网络

Taharim Rahman Anon, Jakaria Islam Emon

Affiliations: PI LLC

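AI summary: Proposes a learnable residual speech-to-spike encoder trained jointly with an R-LIF backbone; it reaches up to 94.97% accuracy on GSC-v2, stays parameter-efficient, and learns task-aligned spike representations.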

Comments Accepted at Interspeech 2026. This version is a preprint

AI中文摘要

连续声学信号与离散事件驱动处理之间的不匹配仍然是神经形态语音处理的基本瓶颈。当前系统通常依赖固定的脉冲编码器,迫使下游脉冲神经网络(SNN)补偿非自适应的输入表示。为了解决这个问题,我们提出了一种可学习的残差语音到脉冲编码器,与循环漏积分点火(R-LIF)骨干网络进行端到端联合训练。我们在Google Speech Commands v2(GSC-v2)基准上验证了该方法,达到了高达94.97%的准确率。值得注意的是,学习到的编码器仍然高度参数高效,其紧凑的35k参数变体达到了89.8%,匹配或超过了需要多一个数量级参数的先前基线。我们以编码器为中心的分析,包括线性探测和梯度残差检查,表明编码器并不追求忠实的信号重建,而是学习任务对齐的脉冲表示,增强了类别可分性。最后,我们通过比较直接反馈对齐(DFA)和替代梯度BPTT在相同架构和训练条件下的表现,对生物启发、硬件友好的信用分配进行了基准测试。我们发现DFA达到了91.5%的准确率,量化了生物启发学习规则在现代神经形态音频中的性能权衡。

英文摘要

The mismatch between continuous acoustic signals and discrete event-driven processing remains a fundamental bottleneck for neuromorphic speech processing. Current systems typically rely on fixed spike encoders, forcing downstream Spiking Neural Networks (SNNs) to compensate for non-adaptive input representations. To address this, we present a learnable residual speech-to-spike encoder jointly trained end-to-end with a Recurrent Leaky Integrate-and-Fire (R-LIF) backbone. We validate this approach on the Google Speech Commands v2 (GSC-v2) benchmark, achieving up to 94.97% accuracy. Notably, the learned encoder remains highly parameter-efficient with a compact 35k-parameter variant that reaches 89.8%, matching or exceeding prior baselines that require an order of magnitude more parameters. Our encoder-focused analysis, including linear probing and gradient-residual inspection, indicates that the encoder does not target faithful signal reconstruction but instead learns task-aligned spike representations that enhance class separability. Finally, we benchmark bio-inspired, hardware-friendly credit assignment by comparing Direct Feedback Alignment (DFA) with surrogate-gradient BPTT under identical architectures and training conditions. We find that DFA reaches 91.5% accuracy, quantifying the performance trade-off of bio-inspired learning rules for modern neuromorphic audio.
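For readers unfamiliar with spike encoding, the sketch below shows a fixed leaky integrate-and-fire (LIF) encoder of the kind the abstract contrasts with its learnable one. It is a generic textbook neuron with reset by subtraction, not the paper's residual encoder or R-LIF backbone; `beta`, `threshold`, and the function name are assumptions for illustration.

```python
def lif_encode(signal, beta=0.9, threshold=1.0):
    """Turn a sequence of real-valued inputs into a binary spike train.

    beta is the membrane leak factor per step; a spike is emitted when the
    membrane potential reaches threshold, after which threshold is subtracted.
    """
    potential = 0.0
    spikes = []
    for x in signal:
        potential = beta * potential + x  # leak, then integrate the input
        if potential >= threshold:
            spikes.append(1)
            potential -= threshold        # soft reset keeps the remainder
        else:
            spikes.append(0)
    return spikes

weak = lif_encode([0.2] * 50)    # low-amplitude input, sparse spikes
strong = lif_encode([0.6] * 50)  # higher amplitude, denser spikes
```

Because `beta` and `threshold` are fixed here, the downstream network has to adapt to whatever spike statistics they produce. A learnable encoder would instead expose such quantities to gradient-based training.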

2606.19101 2026-06-18 eess.SP cs.LG 交叉投稿

Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning

结构优于非线性:面向动力学学习的显式交互架构

Augusto Sarti

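AI summary: Proposes explicit dynamical units built on wave-inspired interaction structures, drawing modeling capability from structural organization rather than expressive nonlinearities; on nonlinear system identification, depth improves representation quality and generalization.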

Comments 11 pages, 2 figures, 2 tables

AI中文摘要

大多数动力学系统的学习架构依赖于通用非线性函数逼近,通常需要高模型复杂度来捕获结构化行为。在这项工作中,我们提出了一种替代范式,其中建模能力主要来源于结构而非表达性非线性。我们引入了一类基于波启发交互结构和内部状态的显式结构化动力学单元。受波计算原理启发,所提出的单元采用严格的因果组织,消除了代数循环,产生无需隐式求解器即可评估的完全显式模型。堆叠此类单元可产生具有涌现层次行为的分层动力学架构。通过非线性系统辨识任务的实验,我们表明即使在有限的参数优化下,深度也能提高表示质量和泛化能力。特别地,所提出的架构即使在仅进行读出层拟合时也能产生信息丰富的内部表示,这表明有用的动力学结构在大量参数优化之前就已从交互的组织中涌现。这些结果表明,结构优先的设计为学习动力学系统提供了一种可行且有效的替代传统黑箱方法,突出了交互结构作为模型表达性主要来源的作用。

英文摘要

Most learning architectures for dynamical systems rely on generic nonlinear function approximation, often requiring high model complexity to capture structured behaviors. In this work, we propose an alternative paradigm in which modeling capability arises primarily from structure rather than from expressive nonlinearities. We introduce a class of explicit structured dynamical units based on wave-inspired interaction structures with internal state. Inspired by wave-based computational principles, the proposed units adopt a strictly causal organization that eliminates algebraic loops, yielding fully explicit models that can be evaluated without implicit solvers. Stacking such units produces layered dynamical architectures with emergent hierarchical behavior. Through experiments on a nonlinear system identification task, we show that depth improves both representation quality and generalization, even under limited parameter optimization. In particular, the proposed architectures produce informative internal representations even under readout-only fitting, indicating that useful dynamical structure emerges from the organization of interactions prior to substantial parameter optimization. These results suggest that structure-first design provides a viable and effective alternative to conventional black-box approaches for learning dynamical systems, highlighting the role of interaction structure as a primary source of model expressivity.

2606.19168 2026-06-18 cs.AI cs.LG 交叉投稿

Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection

超越安全数据:通过定期安全反思实现预训练阶段对齐

Jinhan Li, Kexian Tang, Yihan Xu, Zhuorui Ye, Kaifeng Lyu

Affiliations: Institute for Interdisciplinary Information Sciences, Tsinghua University

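AI summary: Proposes Safety Reflection Pretraining, which inserts short safety reflections into pretraining corpora to give the model self-monitoring capability; experiments show it reduces the success rates of inference-stage and finetuning attacks.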

AI中文摘要

为了实现大型语言模型(LLMs)更深层次的安全对齐,最近的研究探讨了如何将安全干预措施提前到预训练阶段,主要通过过滤不安全数据或将其改写为更安全的形式。我们认为,预训练阶段的对齐应超越使数据安全:LLMs可能将看似良性的知识和能力组合成不安全的行为。为此,我们提出了安全反射预训练,一种预训练阶段的对齐方法,该方法定期在预训练语料中插入简短的安全反思,将自我监控直接集成到语言建模中,建立一种基础能力,随后通过兼容的后训练加以强化。我们在FineWeb-Edu上预训练的1.7B模型上的实验表明,安全反射预训练提高了安全分类准确性,并显著降低了推理阶段和微调攻击的成功率。除了真实世界实验,我们还引入了一个完全受控的合成环境MedSafetyWorld,其中包含清晰的安全定义和推理结构,模型可以轻松地从安全数据中泛化出不安全行为。在MedSafetyWorld中的消融实验进一步表明,与数据过滤和改写相比,安全反射预训练在防止模型根据安全数据泛化出的不安全行为方面具有明显优势。综合来看,我们的发现表明,预训练对齐不仅应使训练数据安全,还应塑造模型可能从安全数据中习得的行为。

英文摘要

To achieve deeper safety alignment for large language models (LLMs), recent efforts have studied how to push safety interventions earlier into the pretraining stage, primarily by filtering unsafe data or rewriting it into safer forms. We argue that pretraining-stage alignment should go beyond making the data safe: LLMs may compose seemingly benign knowledge and capabilities into unsafe behaviors. To this end, we propose Safety Reflection Pretraining, a pretraining-stage alignment method which regularly inserts short safety reflections into pretraining corpora to integrate self-monitoring directly into language modeling, establishing a foundational capability that is subsequently reinforced by compatible post-training. Our experiments with 1.7B models pretrained on FineWeb-Edu show that Safety Reflection Pretraining improves safety classification accuracy and substantially reduces the success rates of inference-stage and finetuning attacks. Complementary to our real-world experiments, we also introduce a fully controlled synthetic environment, MedSafetyWorld, with a clear definition of safety and a reasoning structure under which models can easily generalize unsafe behaviors from safe data. Ablations in MedSafetyWorld further demonstrate a clear advantage of Safety Reflection Pretraining in preventing models from acting on unsafe behaviors generalized from safe data, compared with data filtering and rewriting. Taken together, our findings suggest that pretraining alignment should not only make the training data safe, but also shape the behaviors that models are likely to acquire from safe data.

2606.19279 2026-06-18 cs.AI cs.LG cs.LO math.CT math.LO math.PR 交叉投稿

NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning

NeSyCat Torch:神经符号学习中范畴语义的可微张量实现

Daniel Romero Schellhorn, Till Mossakowski, Björn Gehrke

Affiliations: University of Osnabrück

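AI summary: Proposes NeSyCat Torch, which unifies neurosymbolic semantics through a strong monad and an aggregation structure on truth-values and uses a lazy log-tensor monad for differentiable training; it outperforms LTN and DeepProbLog on MNIST addition.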

AI中文摘要

神经符号语义是碎片化的:经典、模糊、概率和神经系统的真值各自遵循其归纳规则。NeSyCat扩展了ULLER,将它们统一在一个单一的真值归纳定义下,该定义以强单子和真值上的聚合结构为参数。NeSyCat至今缺乏对由神经网络学习的谓词和函数的描述。我们提供NeSyCat Torch作为缺失的环节,通过神经网络解释计算符号,在概率编程和张量后端中实现该框架。我们使用分布单子作为参考语义和度量评估,并辅以一个用于数值稳定、可微训练的单子:对数半环上的惰性对数张量单子。为了高效批量训练,我们还采用了批处理单子。公理即源代码:一次性地用基于单子的do-notation编写,单子绑定执行边缘化,惰性地剪枝不需要的分支。在MNIST加法任务上,我们的HaskTorch、JAX和PyTorch实现在速度和准确性上优于LTN和DeepProbLog,同时几乎达到DeepStochLog的准确性。然而,与DeepStochLog不同,我们保持在一个统一的框架内,适用于许多一阶神经符号方法。即,该构造以单子为参数;例如,用Giry单子实例化它可将方法扩展到连续概率(在此留作未来工作)。

英文摘要

Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic and neural systems each define truth by their own inductive rules. NeSyCat, extending ULLER, subsumes them under a single inductive definition of truth, parametric in a strong monad and an aggregation structure on truth-values. NeSyCat has so far lacked an account of predicates and functions learned by neural networks. We provide NeSyCat Torch as the missing link and interpret computational symbols via neural networks, implementing the framework in probabilistic programming and tensor-based backends. We use the distribution monad for reference semantics and metric evaluation, and complement it by a monad for numerically stable, differentiable training: the lazy log-tensor monad over the log-semiring. For efficient training in batches, we furthermore employ a batch monad. The axioms are the source code: written once in monad-based do-notation, monadic bind performs marginalisation, lazily pruning unneeded branches. On MNIST addition, our HaskTorch, JAX, and PyTorch implementations outperform LTN and DeepProbLog in speed and accuracy, while achieving nearly the accuracy of DeepStochLog. However, unlike DeepStochLog, we stay in a uniform framework that applies to many first-order NeSy approaches. Namely, the construction is parametric in the monad; instantiating it with, e.g., the Giry monad extends the approach to continuous probability (working out a neural representation here is left for future work).
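Marginalisation over the log-semiring, as used for MNIST addition, can be sketched in plain Python. The sketch assumes each digit classifier outputs log-probabilities over 0 to 9 and computes the log-probability of every possible sum. Here semiring addition is log-sum-exp and semiring multiplication is ordinary addition. It is not the paper's lazy log-tensor monad, and the function names are hypothetical.

```python
import math

def log_add(x, y):
    """Semiring addition in log space: log(exp(x) + exp(y)), computed stably."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

def log_sum_distribution(logp_a, logp_b):
    """Log-probabilities of a + b for independent digits a and b.

    Each pair (a, b) contributes logp_a[a] + logp_b[b] (semiring product),
    and all pairs with the same sum are combined with log_add (marginalisation).
    """
    out = [-math.inf] * (len(logp_a) + len(logp_b) - 1)
    for a, la in enumerate(logp_a):
        for b, lb in enumerate(logp_b):
            out[a + b] = log_add(out[a + b], la + lb)
    return out

uniform = [math.log(0.1)] * 10                      # maximally uncertain digit
sum_logp = log_sum_distribution(uniform, uniform)   # 19 entries, sums 0..18
```

Working in log space keeps the products of small probabilities from underflowing, which is the numerical-stability motivation the abstract gives for training over the log-semiring.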

2209.01378 2026-06-18 cs.LG eess.SP q-fin.ST 版本更新

RNN(p) for Power Consumption Forecasting

RNN(p) 用于电力消耗预测

Roberto Baviera, Pietro Manzoni

Affiliations: Politecnico di Milano, Department of Mathematics; University of Edinburgh, Business School

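AI summary: Presents RNN(p) as a generalisation of ARX(p) for forecasting variables with seasonal patterns across multiple time scales; its structured feedbacks enable efficient training strategies, and it achieves high accuracy with interpretability in power consumption forecasting.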

AI中文摘要

一种基本的循环神经网络,它作用于p个时间滞后,称为RNN(p),是线性自回归模型ARX(p)的自然推广。对于在多个时间尺度上显示固有季节模式的变量,如能源、经济和金融时间序列中经常观察到的,它是一个强大的预测工具。RNN(p)模型的结构,以跨时间滞后的结构化反馈为特征,使得设计高效的训练策略成为可能。我们对这些模型的学习算法进行了比较研究,对其计算复杂度和训练性能进行了严格分析。我们展示了RNN(p)模型在电力消耗预测中的两个应用,这是能源领域的一个关键领域,准确的预测为运营和财务决策提供信息。实验结果表明,RNN(p)模型在保持高度可解释性的同时实现了出色的预测精度。这些特性使其非常适合能源市场和其他金融科技应用中的决策,其中可靠的预测在经济中发挥着重要作用。

English abstract

An elementary Recurrent Neural Network that operates on p time lags, called an RNN(p), is the natural generalisation of a linear autoregressive model ARX(p). It is a powerful forecasting tool for variables displaying inherent seasonal patterns across multiple time scales, as is often observed in energy, economic, and financial time series. The architecture of RNN(p) models, characterised by structured feedbacks across time lags, enables the design of efficient training strategies. We conduct a comparative study of learning algorithms for these models, providing a rigorous analysis of their computational complexity and training performance. We present two applications of RNN(p) models in power consumption forecasting, a key domain within the energy sector where accurate forecasts inform both operational and financial decisions. Experimental results show that RNN(p) models achieve excellent forecasting accuracy while maintaining a high degree of interpretability. These features make them well-suited for decision-making in energy markets and other fintech applications where reliable predictions play a significant economic role.
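
The relationship between ARX(p) and an RNN over p lags can be sketched in a few lines. The scalar update below is one possible reading of "an RNN that operates on p time lags", not the paper's definition: the function names, the single hidden unit, and all coefficients are assumptions made for illustration. With an identity activation the recurrent update coincides with the ARX(p) forecast, which is the sense in which the nonlinear model generalises the linear one.

```python
import math


def arx_p_forecast(past_y, x_t, c, a, beta):
    """ARX(p): y_t = c + sum_i a[i] * y_{t-1-i} + beta * x_t (scalar input)."""
    return c + sum(a_i * y for a_i, y in zip(a, past_y)) + beta * x_t


def rnn_p_step(past_h, x_t, b, u, w, activation=math.tanh):
    """Scalar recurrent update over p lags (hypothetical form):
    h_t = activation(b + sum_i u[i] * h_{t-1-i} + w * x_t)."""
    return activation(b + sum(u_i * h for u_i, h in zip(u, past_h)) + w * x_t)


past = [1.0, 0.5, -0.2]      # p = 3 lagged values, most recent first
coeffs = [0.6, 0.2, 0.1]     # illustrative lag coefficients
x_now = 0.3                  # illustrative exogenous input

arx_value = arx_p_forecast(past, x_now, c=0.1, a=coeffs, beta=0.4)
linear_rnn = rnn_p_step(past, x_now, b=0.1, u=coeffs, w=0.4,
                        activation=lambda z: z)
tanh_rnn = rnn_p_step(past, x_now, b=0.1, u=coeffs, w=0.4)
print(arx_value, linear_rnn, tanh_rnn)
```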

2503.01805 2026-06-18 cs.LG cs.AI cs.CL updated version

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers

图任务算法推理中Transformer的深度-宽度权衡

Gilad Yehudai, Clayton Sanford, Maya Bechler-Speicher, Orr Fischer, Ran Gilad-Bachrach, Amir Globerson

Affiliations: Courant Institute of Mathematical Sciences, New York University; Google Research; Meta AI; Bar-Ilan University; Department of Bio-Medical Engineering, Edmond J. Safra Center for Bioinformatics, Tel-Aviv University; Tel Aviv University

AI summary: Studies depth-width trade-offs for Transformers on graph algorithmic tasks, showing that constant depth suffices for many graph problems at linear width while some problems require quadratic width; experiments find tasks where wide models match the accuracy of deep ones while training and running inference faster.

Comments Updated ISF grant number

AI Chinese abstract

Transformer已经彻底改变了机器学习领域。特别是,它们可用于解决复杂的算法问题,包括基于图的任务。在此类算法任务中,一个关键问题是能够实现该任务的Transformer的最小尺寸是多少。最近的工作开始探索图任务的这个问题,表明对于次线性嵌入维度(即模型宽度),对数深度就足够了。然而,我们在这里解决的一个开放问题是,如果允许宽度线性增长而深度保持固定,会发生什么。我们分析了这种情况,并得出了一个令人惊讶的结果:在线性宽度下,常数深度足以解决一系列基于图的问题。这表明宽度的适度增加可以允许更浅的模型,这在推理和训练时间方面是有利的。对于其他问题,我们表明需要二次宽度。我们的结果展示了Transformer实现图算法的复杂而有趣的格局。我们通过实验研究了深度和宽度相对能力之间的这些权衡,并发现宽模型在具有与深模型相同准确度的任务中,由于可并行化的硬件,训练和推理时间更快。

English abstract

Transformers have revolutionized the field of machine learning. In particular, they can be used to solve complex algorithmic problems, including graph-based tasks. In such algorithmic tasks a key question is what is the minimal size of a transformer that can implement the task. Recent work has begun to explore this problem for graph-based tasks, showing that for sub-linear embedding dimension (i.e., model width) logarithmic depth suffices. However, an open question, which we address here, is what happens if width is allowed to grow linearly, while depth is kept fixed. Here we analyze this setting, and provide the surprising result that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference and train time. For other problems, we show that quadratic width is required. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms. We empirically investigate these trade-offs between the relative powers of depth and width and find tasks where wider models have the same accuracy as deep models, while having much faster train and inference time due to parallelizable hardware.

2503.08038 2026-06-18 cs.LG cs.AI cs.CV updated version

Generalized Kullback-Leibler Divergence Loss

广义Kullback-Leibler散度损失

Jiequan Cui, Beier Zhu, Qingshan Xu, Zhuotao Tian, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong

Affiliations: Hefei University of Technology; University of Science and Technology of China; Nanyang Technological University; The Chinese University of Hong Kong; The University of Hong Kong; Harbin Institute of Technology, Shenzhen

AI summary: Proposes the Generalized Kullback-Leibler (GKL) divergence loss: decouples the KL loss into a weighted MSE term and a cross-entropy term with soft labels, breaks its asymmetric optimization property, and adds class-wise global information; reaches state-of-the-art adversarial robustness on RobustBench and competitive knowledge distillation performance.

Comments TPAMI 2026, extension of our NeurIPS paper "Decoupled Kullback-Leibler Divergence Loss". arXiv admin note: substantial text overlap with arXiv:2305.13948

AI Chinese abstract

在本文中,我们深入探讨了Kullback-Leibler (KL) 散度损失,并从数学上证明它等价于由(1)加权均方误差(wMSE)损失和(2)包含软标签的交叉熵损失组成的解耦Kullback-Leibler (DKL) 散度损失。得益于DKL损失的解耦结构,我们确定了两个改进方向。首先,我们通过打破KL损失的不对称优化性质并引入更平滑的权重函数,解决了其在知识蒸馏等场景中的局限性。这一修改有效缓解了优化中的收敛困难,特别是对于软标签中预测分数较高的类别。其次,我们将类别级别的全局信息引入KL/DKL,以减少单个样本带来的偏差。通过这两项改进,我们推导出广义Kullback-Leibler (GKL) 散度损失,并通过在CIFAR-10/100、ImageNet和视觉-语言数据集上进行实验,聚焦于对抗训练和知识蒸馏任务,评估其有效性。具体来说,我们在公开排行榜RobustBench上实现了新的最先进对抗鲁棒性,并在CIFAR/ImageNet模型和CLIP模型上取得了具有竞争力的知识蒸馏性能,展示了其重要的实际价值。我们的代码可在 https://github.com/jiequancui/DKL 获取。

English abstract

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of (1) a weighted Mean Square Error (wMSE) loss and (2) a Cross-Entropy loss incorporating soft labels. Thanks to the decoupled structure of DKL loss, we have identified two areas for improvement. Firstly, we address the limitation of KL loss in scenarios like knowledge distillation by breaking its asymmetric optimization property along with a smoother weight function. This modification effectively alleviates convergence challenges in optimization, particularly for classes with high predicted scores in soft labels. Secondly, we introduce class-wise global information into KL/DKL to reduce bias arising from individual samples. With these two enhancements, we derive the Generalized Kullback-Leibler (GKL) Divergence loss and evaluate its effectiveness by conducting experiments on CIFAR-10/100, ImageNet, and vision-language datasets, focusing on adversarial training, and knowledge distillation tasks. Specifically, we achieve new state-of-the-art adversarial robustness on the public leaderboard -- RobustBench and competitive knowledge distillation performance across CIFAR/ImageNet models and CLIP models, demonstrating the substantial practical merits. Our code is available at https://github.com/jiequancui/DKL.

2506.09046 2026-06-18 cs.LG cs.AI cs.MA updated version

Self-Evolving Multi-Agent Systems via Textual Backpropagation

通过文本反向传播的自进化多智能体系统

Xiaowen Ma, Yunpu Ma, Chenyang Lin, Sikuan Yan, Jinhe Bi, Zixuan Cao, Yijun Tian, Volker Tresp, Hinrich Schuetze

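Affiliations: Ludwig Maximilian University of Munich; Technical University of Munich; Munich Center for Machine Learning; University of Notre Dame

AI summary: Proposes the Agentic Neural Network (ANN) framework, which models multi-agent collaboration as a layered neural network; a forward phase decomposes tasks into subtasks and a backward phase propagates textual feedback so that agents self-evolve their roles, prompts, and coordination, surpassing leading multi-agent baselines on seven benchmark datasets.

AI Chinese abstract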

利用多个大型语言模型(LLM)已被证明对处理复杂、高维任务有效,但当前方法通常依赖静态、手动设计的多智能体配置。为克服这些限制,我们提出Agentic Neural Network(ANN)框架,该框架将多智能体协作概念化为分层神经网络架构。在此设计中,每个智能体作为节点运行,每一层形成一个专注于特定子任务的协作团队。我们的框架遵循两阶段优化策略:(1)前向阶段——受神经网络前向传播启发,任务被动态分解为子任务,并逐层构建具有合适聚合方法的协作智能体团队。(2)反向阶段——模仿反向传播,我们通过迭代反馈优化全局和局部协作,使智能体能够自进化其角色、提示和协调。这种神经符号方法使我们的框架能够在训练后创建新的或专门的智能体团队,在准确性和适应性方面带来显著提升。在七个基准数据集上,我们的工作在相同配置下超越了领先的多智能体基线,显示出持续的性能改进。

English abstract

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. Our framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, our work surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.

2507.01414 2026-06-18 cs.LG updated version

Decomposing Prediction Mechanisms for In-Context Recall

分解上下文召回中的预测机制

Sultan Daniels, Dylan Davis, Dhruv Gautam, Wentinn Liao, Gireeja Ranade, Anant Sahai

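Affiliations: University of California, Berkeley; University of Pennsylvania

AI summary: Designs a new family of toy problems that combine continuous in-context learning with discrete associative recall and finds at least two separate mechanisms with different learning dynamics in transformer models: one uses discrete symbolic labels for associative recall, the other makes Bayesian-style predictions from the previous token and the context.

Comments 45 pages, 47 figures, 2 tables

AI Chinese abstract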

我们引入了一类新的玩具问题,将线性回归风格的连续上下文学习(ICL)特征与离散关联召回相结合。我们在该玩具的样本轨迹上预训练Transformer模型,具体是从随机抽取的线性确定性动力系统中提取的符号标记交错状态观测。我们研究当模型被提示使用相应的上下文标签时,是否能够召回先前在其上下文中见过的序列的状态。仔细观察这个任务,很明显模型必须执行两个功能:(1)识别应召回哪个系统的状态,并将该系统应用于其最后看到的状态;(2)继续应用正确的系统来预测后续状态。训练动态表明,第一个能力在模型训练中后期才出现。令人惊讶的是,第二个能力(继续预测恢复的序列)发展得更早。通过分布外实验和通过边缘剪枝对模型权重的机制分析,我们发现这个玩具问题的下一个token预测涉及至少两个独立的机制。一种机制使用离散符号标签进行关联召回,以预测先前见过的序列恢复的开始。第二种机制在很大程度上与离散符号标签无关,基于前一个token和上下文进行“贝叶斯式”预测。这两种机制具有不同的学习动态。为了确认这种多机制现象(表现为不同的相变)不仅仅是玩具设置的人为产物,我们使用OLMo在ICL翻译任务上的训练检查点观察到了类似的现象:第一个任务token的性能与第二个任务token的性能出现决定性差距。

English abstract

We introduce a new family of toy problems that combine features of linear-regression-style continuous in-context learning (ICL) with discrete associative recall. We pretrain transformer models on sample traces from this toy, specifically symbolically-labeled interleaved state observations from randomly drawn linear deterministic dynamical systems. We study whether the transformer models can recall the state of a sequence previously seen in their context when prompted to do so with the corresponding in-context label. Taking a closer look at this task, it becomes clear that the model must perform two functions: (1) identify which system's state should be recalled and apply that system to its last seen state, and (2) continue to apply the correct system to predict the subsequent states. Training dynamics reveal that the first capability emerges well into a model's training. Surprisingly, the second capability, of continuing the prediction of a resumed sequence, develops much earlier. Via out-of-distribution experiments, and a mechanistic analysis on model weights via edge pruning, we find that next-token prediction for this toy problem involves at least two separate mechanisms. One mechanism uses the discrete symbolic labels to do the associative recall required to predict the start of a resumption of a previously seen sequence. The second mechanism, which is largely agnostic to the discrete symbolic labels, performs a "Bayesian-style" prediction based on the previous token and the context. These two mechanisms have different learning dynamics. To confirm that this multi-mechanism phenomenon (manifesting as separate phase transitions) is not just an artifact of our toy setting, we used OLMo training checkpoints on an ICL translation task and observed a similar phenomenon: a decisive gap between the emergence of first-task-token performance and that of second-task-token performance.

2601.14968 2026-06-18 cs.LG cs.AI updated version

InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement

InstructTime++: 通过隐式特征增强的多模态语言建模进行时间序列分类

Mingyue Cheng, Xiaoyu Tao, Huajian Zhang, Qi Liu, Zhiding Liu, Yucong Luo, Yiheng Chen, Enhong Chen

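Affiliations: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China

AI summary: Reformulates time series classification as a multimodal generative task, bridges the modality gap with a time series discretization module and an alignment projection layer, and adds implicit feature modeling to compensate for the limited inductive bias of language models.

AI Chinese abstract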

大多数现有的时间序列分类方法采用判别范式,将输入序列直接映射到独热编码的类别标签。虽然有效,但这种范式难以融入上下文特征,也无法捕捉类别间的语义关系。为了解决这些局限性,我们提出了InstructTime,一种将时间序列分类重新定义为多模态生成任务的新框架。具体来说,连续的数值序列、上下文文本特征和任务指令被视为多模态输入,而类别标签则通过调优的语言模型作为文本输出生成。为了弥合模态差距,InstructTime引入了一个时间序列离散化模块,将连续序列转换为离散的时间标记,同时结合对齐投影层和生成式自监督预训练策略,以增强跨模态表示对齐。在此框架基础上,我们进一步提出了InstructTime++,通过引入隐式特征建模来扩展InstructTime,以补偿语言模型有限的归纳偏差。InstructTime++利用专门的工具包从原始时间序列和上下文输入中挖掘信息丰富的隐式模式,包括统计特征提取和基于视觉-语言模型的图像描述,并将其转化为文本描述以实现无缝集成。在多个基准数据集上的大量实验证明了InstructTime++的优越性能。

English abstract

Most existing time series classification methods adopt a discriminative paradigm that maps input sequences directly to one-hot encoded class labels. While effective, this paradigm struggles to incorporate contextual features and fails to capture semantic relationships among classes. To address these limitations, we propose InstructTime, a novel framework that reformulates time series classification as a multimodal generative task. Specifically, continuous numerical sequences, contextual textual features, and task instructions are treated as multimodal inputs, while class labels are generated as textual outputs by tuned language models. To bridge the modality gap, InstructTime introduces a time series discretization module that converts continuous sequences into discrete temporal tokens, together with an alignment projection layer and a generative self-supervised pre-training strategy to enhance cross-modal representation alignment. Building upon this framework, we further propose InstructTime++, which extends InstructTime by incorporating implicit feature modeling to compensate for the limited inductive bias of language models. InstructTime++ leverages specialized toolkits to mine informative implicit patterns from raw time series and contextual inputs, including statistical feature extraction and vision-language-based image captioning, and translates them into textual descriptions for seamless integration. Extensive experiments on multiple benchmark datasets demonstrate the superior performance of InstructTime++.

2601.20361 2026-06-18 cs.LG cs.NA math.NA updated version

TINNs: Time-Induced Neural Networks for Solving Time-Dependent PDEs

TINNs:时间诱导神经网络求解时变偏微分方程

Chen-Yang Dai, Che-Chia Chang, Te-Sheng Lin, Ming-Chih Lai, Chieh-Hsin Lai

Affiliations: Department of Applied Mathematics, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan; Institute of Artificial Intelligence Innovation, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan; National Center for Theoretical Sciences, National Taiwan University, Taipei 10617, Taiwan

AI summary: Proposes Time-Induced Neural Networks (TINNs), which parameterize the network weights as a learned function of time so that the spatial representation evolves over time; optimized with a Levenberg-Marquardt method, they show up to 4 times improved relative error and 10 times faster convergence on time-dependent PDEs.

Comments Accepted at ICML 2026. Camera-ready version. Includes appendix

AI Chinese abstract

物理信息神经网络(PINNs)通过学习一个无网格、可微的解来求解时变偏微分方程(PDE),该解可在空间和时间的任意位置进行评估。然而,标准的时空PINNs将时间作为输入,但在所有时间上重用具有共享权重的单一网络,迫使相同的特征表示显著不同的动力学。这种耦合会降低误差性能,并在联合强制执行PDE、边界和初始条件时可能破坏训练稳定性。我们提出时间诱导神经网络(TINNs),一种新颖的架构,将网络权重参数化为时间的可学习函数,允许有效的空间表示随时间演化,同时保持共享结构。由此产生的公式自然产生一个非线性最小二乘问题,我们使用Levenberg-Marquardt方法高效优化。在各种时变PDE上的实验表明,与PINNs和强基线相比,相对误差提高了4倍,收敛速度提高了10倍。

English abstract

Physics-informed neural networks (PINNs) solve time-dependent partial differential equations (PDEs) by learning a mesh-free, differentiable solution that can be evaluated anywhere in space and time. However, standard space-time PINNs take time as an input but reuse a single network with shared weights across all times, forcing the same features to represent markedly different dynamics. This coupling degrades error performance and can destabilize training when enforcing PDE, boundary, and initial constraints jointly. We propose Time-Induced Neural Networks (TINNs), a novel architecture that parameterizes the network weights as a learned function of time, allowing the effective spatial representation to evolve over time while maintaining shared structure. The resulting formulation naturally yields a nonlinear least-squares problem, which we optimize efficiently using a Levenberg-Marquardt method. Experiments on various time-dependent PDEs show up to 4 times improved relative error and 10 times faster convergence compared to PINNs and strong baselines.
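
The core idea, weights that are themselves a function of time rather than time being one more network input, can be sketched as follows. The affine form theta(t) = base + t * slope, the one-hidden-layer network, and every parameter value are assumptions chosen for brevity; the abstract only says the weights are a learned function of time and does not specify its form.

```python
import math


def weights_at(base, slope, t):
    """Hypothetical time parameterisation: theta(t) = base + t * slope."""
    return [b + t * s for b, s in zip(base, slope)]


def tinn_forward(x, t, params):
    """Evaluate a one-hidden-layer network in x whose weights depend on t."""
    w = weights_at(params["w_base"], params["w_slope"], t)
    b = weights_at(params["b_base"], params["b_slope"], t)
    c = weights_at(params["c_base"], params["c_slope"], t)
    return sum(c_j * math.tanh(w_j * x + b_j) for w_j, b_j, c_j in zip(w, b, c))


# Illustrative parameters for two hidden units.
params = {
    "w_base": [1.0, -0.5], "w_slope": [0.3, 0.2],
    "b_base": [0.0, 0.1], "b_slope": [-0.1, 0.0],
    "c_base": [0.7, 1.2], "c_slope": [0.0, -0.4],
}
u_start = tinn_forward(0.5, 0.0, params)  # uses the base weights only
u_later = tinn_forward(0.5, 1.0, params)  # same x, different effective network
print(u_start, u_later)
```

At any fixed t the model is an ordinary network in x alone, so the spatial features are free to change from one time to the next while the base and slope parameters are shared across all times.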

2604.13082 2026-06-18 cs.LG cs.AI updated version

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

算术泛化的长延迟:当学习到的表征超越行为时

Laura Gomezjurado Gonzalez

Affiliations: Stanford University

AI summary: Investigates why generalization is delayed in transformers trained on arithmetic tasks: the encoder learns the relevant structure early, but a decoder bottleneck delays its use; transplanting or freezing a trained encoder speeds up generalization, and the choice of numeral base affects learnability.

Comments 19 pages, 10 figures

AI Chinese abstract

在算法任务上训练的Transformer中的grokking现象以训练集拟合与突然泛化之间的长延迟为特征,但该延迟的来源仍不清楚。在编码器-解码器算术模型中,我们认为这种延迟反映了对已学习结构的有限访问,而非未能首先获得该结构。我们研究一步Collatz预测,发现编码器在最初几千训练步内组织了奇偶性和余数结构,而输出精度在此后数万步内仍接近随机。因果干预支持解码器瓶颈假说。将训练好的编码器移植到新模型中将grokking加速2.75倍,而移植训练好的解码器则有害。冻结收敛的编码器并仅重新训练解码器完全消除了平台期,并达到97.6%的准确率,而联合训练为86.1%。解码器任务的难易取决于数字表示。在15种进制中,那些因数分解与Collatz映射算术对齐的进制(例如24进制)达到99.8%的准确率,而二进制完全失败,因为其表示崩溃且无法恢复。进制的选择作为归纳偏置,控制解码器可利用的局部数字结构量,从而在相同底层任务上产生巨大的可学习性差异。

English abstract

Grokking in transformers trained on algorithmic tasks is characterized by a long delay between training-set fit and abrupt generalization, but the source of that delay remains poorly understood. In encoder-decoder arithmetic models, we argue that this delay reflects limited access to already learned structure rather than failure to acquire that structure in the first place. We study one-step Collatz prediction and find that the encoder organizes parity and residue structure within the first few thousand training steps, while output accuracy remains near chance for tens of thousands more. Causal interventions support the decoder bottleneck hypothesis. Transplanting a trained encoder into a fresh model accelerates grokking by 2.75 times, while transplanting a trained decoder actively hurts. Freezing a converged encoder and retraining only the decoder eliminates the plateau entirely and yields 97.6% accuracy, compared to 86.1% for joint training. What makes the decoder's job harder or easier depends on numeral representation. Across 15 bases, those whose factorization aligns with the Collatz map's arithmetic (e.g., base 24) reach 99.8% accuracy, while binary fails completely because its representations collapse and never recover. The choice of base acts as an inductive bias that controls how much local digit structure the decoder can exploit, producing large differences in learnability from the same underlying task.
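
To make the role of the numeral base concrete, the sketch below applies one Collatz step and prints the digit sequences a model would see in two bases. It assumes the standard map (n/2 for even n, 3n+1 for odd n) and most-significant-first digits; the abstract does not say which variant of the map or which digit order the paper uses.

```python
def collatz_step(n):
    """One step of the standard Collatz map: n/2 if even, 3n+1 if odd."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def to_digits(n, base):
    """Digits of a non-negative integer, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits[::-1]


n = 27
for base in (2, 24):
    print(base, to_digits(n, base), "->", to_digits(collatz_step(n), base))
```

Because 24 is a multiple of both 2 and 3, the last base-24 digit alone determines n mod 2 and n mod 3, whereas the last binary digit carries only parity. That is one way a base can align with the arithmetic of the map.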

2605.11287 2026-06-18 cs.LG cs.AI updated version

Beyond Similarity: Temporal Operator Attention for Time Series Analysis

超越相似性:时间序列分析中的时序操作注意力

Jevon Twitty, Vinh Pham, Nitiwith Rotchanarak, Viresh Pati, Yubin Kim, Shihao Yang, Jiecheng Lu

Affiliations: Georgia Institute of Technology

AI summary: Proposes Temporal Operator Attention (TOA), which augments attention with explicit, learnable sequence-space operators so that it can represent signed and oscillatory transformations, improving performance on time-series forecasting, anomaly detection, and classification.

AI Chinese abstract

时间序列预测中存在一个持久性悖论:结构简单的MLP和线性模型往往优于高容量的Transformer。我们指出,这种差距源于序列建模基本原语的不匹配:尽管许多时间序列动态由全局时间操作符(如滤波和谐波结构)主导,标准注意力却将每个输出构造为输入的凸组合。这限制了其表示带符号和振荡变换的能力,而这些变换对于时间信号处理至关重要。我们正式将这一限制定义为softmax注意力中的单纯形约束混合瓶颈,这对由操作符驱动的时间序列任务尤其具有限制性。为了解决这一问题,我们提出时序操作注意力(TOA),一种通过显式、可学习的序列空间操作符增强注意力的框架,使跨时间的直接带符号混合成为可能,同时保持输入依赖的适应性。为了使密集的N×N操作符实用化,我们引入了随机操作符正则化,一种高方差的dropout机制,它稳定了训练并防止了平凡的记忆。在预测、异常检测和分类基准上,TOA在集成到PatchTST和iTransformer等标准骨干时始终提高了性能,在以重建为主的任务中提升尤为明显。这些结果表明,显式操作符学习是有效时间序列建模的关键要素。

English abstract

A persistent paradox in time-series forecasting is that structurally simple MLP and linear models often outperform high-capacity Transformers. We argue that this gap arises from a mismatch in the sequence-modeling primitive: while many time-series dynamics are governed by global temporal operators (e.g., filtering and harmonic structure), standard attention forms each output as a convex combination of inputs. This restricts its ability to represent signed and oscillatory transformations that are fundamental to temporal signal processing. We formalize this limitation as a simplex-constrained mixing bottleneck in softmax attention, which becomes especially restrictive for operator-driven time-series tasks. To address this, we propose $\textbf{Temporal Operator Attention (TOA)}$, a framework that augments attention with explicit, learnable sequence-space operators, enabling direct signed mixing across time while preserving input-dependent adaptivity. To make dense $N \times N$ operators practical, we introduce Stochastic Operator Regularization, a high-variance dropout mechanism that stabilizes training and prevents trivial memorization. Across forecasting, anomaly detection, and classification benchmarks, TOA consistently improves performance when integrated into standard backbones such as PatchTST and iTransformer, with particularly strong gains in reconstruction-heavy tasks. These results suggest that explicit operator learning is a key ingredient for effective time-series modeling.
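
The simplex constraint is easy to see numerically. In the sketch below (illustrative scores and series, not taken from the paper), a softmax row has non-negative weights that sum to one, so its output stays between the smallest and largest input value, while a first-difference row with weights -1 and +1 produces a value outside that range.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix(weights, values):
    """Weighted combination of a sequence of scalar values."""
    return sum(w * v for w, v in zip(weights, values))


series = [4.0, 5.0, 6.0, 5.0]

attn_row = softmax([0.2, -1.0, 0.5, 2.0])  # convex weights over time steps
attn_out = mix(attn_row, series)           # always within [4.0, 6.0]

diff_row = [0.0, 0.0, -1.0, 1.0]           # signed operator: x_t - x_{t-1}
diff_out = mix(diff_row, series)           # 5.0 - 6.0

print(attn_out, diff_out)
```

No choice of attention scores can reproduce `diff_out` here, which is the kind of signed mixing that an explicit sequence-space operator is meant to supply.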

2606.01249 2026-06-18 cs.LG cs.CL updated version

Trust Region On-Policy Distillation

信任区域在线策略蒸馏

Xingrun Xing, Haoqing Wang, Boyan Gao, Ziheng Li, Yehui Tang

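Affiliations: Samsung Research; University of Oxford; Peking University

AI summary: Proposes Trust Region On-Policy Distillation (TrOPD), which uses credit assignment strategies and trust-region learning to address the training instability caused by teacher-student distribution mismatch, outperforming OPD baselines on mathematical reasoning, code generation, and general-domain benchmarks.

AI Chinese abstract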

在线策略蒸馏(OPD)是大型语言模型(LLM)高效后训练的基本技术,在智能体学习、多任务增强和模型压缩中具有广泛应用。然而,当教师和学生分布差异较大时,OPD训练变得不稳定,因为教师对学生生成token的监督可能产生不可靠的策略梯度,甚至导致优化失败。本文通过信用分配策略解决可靠的在线策略token级监督问题,并提出信任区域在线策略蒸馏(TrOPD)。它具有以下特点:1)信任区域在线策略学习:TrOPD仅在教师提供可靠监督的区域进行OPD,缓解了分布不匹配下K1反向KL估计的优化困难。2)异常值估计:对于异常区域,我们探索梯度裁剪、掩码和前向KL估计,以减少不可靠监督的不利影响。3)离策略引导:学生从教师前缀继续生成,并使用前向KL模仿离策略引导,鼓励向可靠区域进行在线策略探索。实验表明,TrOPD在数学推理、代码生成和通用领域基准上始终优于最先进的OPD基线,包括OPD、EOPD和REOPOLD。

English abstract

On-Policy Distillation (OPD) is a fundamental technique for efficient post-training of large language models (LLMs), with broad applications in agent learning, multi-task enhancement, and model compression. However, OPD training becomes unstable when the teacher and student distributions differ substantially, as teacher supervision on student-generated tokens may yield unreliable policy gradients and even cause optimization failure. This work addresses reliable on-policy token-level supervision through credit assignment strategies, and proposes Trust Region On-Policy Distillation, TrOPD. It features the following characteristics: 1) Trust-Region On-Policy Learning: TrOPD performs OPD only in regions where the teacher provides reliable supervision, mitigating the optimization difficulty of the K1 reverse-KL estimator under distribution mismatch. 2) Outlier Estimation: For outlier regions, we explore gradient clipping, masking, and forward-KL estimation to reduce the adverse effects of unreliable supervision. 3) Off-Policy Guidance: The student continues generation from teacher prefixes and uses forward KL to imitate off-policy guidance, encouraging on-policy exploration toward reliable regions. Experiments show that TrOPD consistently outperforms SoTA OPD baselines, including OPD, EOPD, and REOPOLD, across mathematical reasoning, code generation, and general-domain benchmarks.
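
A rough sketch of the token-level quantities involved: the k1 estimator of the reverse KL is the log-ratio between student and teacher probabilities on tokens sampled from the student, and a trust region can be expressed as a mask over tokens. The masking rule below (keep tokens whose absolute log-ratio is at most `max_gap`) is a hypothetical stand-in, since the abstract does not state TrOPD's actual criterion, and the log-probabilities are made-up numbers.

```python
def k1_reverse_kl(student_logps, teacher_logps):
    """Per-token k1 estimate of KL(student || teacher) on student samples:
    log pi_student(y_t) - log pi_teacher(y_t)."""
    return [s - t for s, t in zip(student_logps, teacher_logps)]


def trust_region_mask(k1_values, max_gap):
    """Hypothetical rule: trust a token only if |log-ratio| <= max_gap."""
    return [abs(k) <= max_gap for k in k1_values]


def masked_mean(values, mask):
    """Average over trusted tokens; 0.0 if no token is trusted."""
    kept = [v for v, m in zip(values, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


# Log-probabilities of three student-sampled tokens (illustrative numbers).
student_logps = [-0.5, -1.0, -0.2]
teacher_logps = [-0.7, -9.0, -0.3]  # the teacher finds the second token very unlikely

k1 = k1_reverse_kl(student_logps, teacher_logps)
mask = trust_region_mask(k1, max_gap=2.0)
loss = masked_mean(k1, mask)
print(k1, mask, loss)
```

Without the mask, the single outlier token would dominate the averaged signal, which illustrates the instability under distribution mismatch that the abstract attributes to the K1 reverse-KL estimator.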

2606.06564 2026-06-18 cs.LG cs.AI updated version

HAARES Half-Split Residual Basis Routing for Deep Transformers

WAV:面向深度仅解码器Transformer的多分辨率块残差路由

Kehan Wang

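Affiliations: Chongqing University

AI summary: Proposes WAV v1, which augments residual routing by adding directional detail bases (a phase basis and a split basis) to each block; it outperforms the baselines in deep Transformers, reaching lower validation loss on TinyStories and Text8 at 48 layers.

Comments 6 pages, 4 figures, 3 tables

AI Chinese abstract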

残差连接对于训练深度Transformer至关重要,但标准的PreNorm残差流以固定的单位权重聚合子层更新。最近的注意力残差用内容相关的深度路由替代了这种固定累积,而块注意力残差通过对块级残差摘要进行路由使机制高效。然而,单个块摘要仅存储块内的低频总残差位移,丢弃了方向性结构,例如注意力与MLP的不平衡以及早期与晚期块的动态。我们提出WAV v1,一种用于仅解码器Transformer的轻量级多分辨率残差路由方法。WAV v1不是仅通过累积残差和来表示每个块,而是为每个块增加两个方向性细节基:一个对比注意力和MLP更新的相位基,以及一个对比早期和晚期子层更新的分裂基。这些基与标准块摘要一起通过相同的深度softmax混合器进行路由,而负细节源初始化和分离的RMS匹配稳定了训练。在字符级TinyStories和Text8语言建模中,WAV v1显示出明显的深度相关优势。尽管在12层时并非始终有益,但在24层时变得有竞争力,并在48层时优于所有基线。在48层时,WAV v1将TinyStories上的验证损失从0.4960降至0.4738,Text8上从0.9363降至0.9305,且额外参数可忽略。这些结果表明,方向性残差细节(而不仅仅是块级和)对于在更深Transformer中扩展残差路由很重要。

English abstract

Block-level residual routing makes learned residual aggregation practical by routing over block summaries, but each summary compresses an ordered sequence of attention and MLP updates into one cumulative vector. We propose HAARES, a lightweight residual basis router that keeps the cumulative block source and adds one half-split detail basis, computed as the difference between first-half and second-half residual updates. The detail basis is RMS-matched and updated online, exposing coarse intra-block trajectory information without dense sublayer-level routing. Across OpenWebText, cross-domain character-level benchmarks, and BPE-tokenized OpenWebText, the empirical pattern is depth-dependent: gains are small or mixed at shallow depth and most reliable in 48-layer models. In the 201M 48-layer setting, HAARES improves over Block AttnRes across all three seeds, while a 453M two-seed probe shows the same direction. Ablations rule out source duplication, random signed details, fixed detail-source biases, or block-count changes alone. Cost analysis shows that the method is FLOP-light but not wall-clock-free: it adds memory and routing overhead, yet its relative arithmetic cost is amortized as width grows and earlier convergence can reduce time-to-target.
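
The two sources described in this abstract can be sketched directly from its wording: the cumulative source is the sum of a block's residual updates, and the detail basis is the first-half sum minus the second-half sum, rescaled to the same RMS. The function names, the `eps` constant, and the choice to match the detail's RMS to the cumulative source's RMS are assumptions; the abstract does not spell out the RMS-matching rule or the online update.

```python
import math


def vec_sum(vectors):
    """Element-wise sum of a non-empty list of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]


def rms(vector):
    """Root mean square of a vector."""
    return math.sqrt(sum(x * x for x in vector) / len(vector))


def half_split_sources(updates, eps=1e-8):
    """Return (cumulative, detail) for one block of residual updates.

    cumulative = first_half + second_half
    detail     = first_half - second_half, rescaled to the cumulative RMS
    """
    half = len(updates) // 2
    first = vec_sum(updates[:half])
    second = vec_sum(updates[half:])
    cumulative = [a + b for a, b in zip(first, second)]
    detail = [a - b for a, b in zip(first, second)]
    scale = rms(cumulative) / (rms(detail) + eps)
    return cumulative, [scale * d for d in detail]


# Four sublayer updates of width 2 in one block (illustrative numbers).
updates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
cumulative, detail = half_split_sources(updates)
print(cumulative, detail)
```

The cumulative vector alone cannot tell whether the block's displacement came early or late; the sign and size of the detail vector carry that coarse ordering information.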

2606.02800 2026-06-18 cs.CV cs.AI cs.LG cs.MM cs.RO updated version

Cosmos 3: Omnimodal World Models for Physical AI

Cosmos 3:面向物理AI的全模态世界模型

NVIDIA, :, Aditi, Niket Agarwal, Arslan Ali, Jon Allen, Martin Antolini, Adeline Aubame, Alisson Azzolini, Junjie Bai, Maciej Bala, Yogesh Balaji, Josh Bapst, Aarti Basant, Mukesh Beladiya, Mohammad Qazim Bhat, Zaid Pervaiz Bhat, Dan Blick, Vanni Brighella, Han Cai, Tiffany Cai, Eric Cameracci, Jiaxin Cao, Yulong Cao, Mark Carlson, Carlos Casanova, Ting-Yun Chang, Yan Chang, Yu-Wei Chao, Prithvijit Chattopadhyay, Roshan Chaudhari, Chieh-Yun Chen, Junyu Chen, Ke Chen, Qizhi Chen, Wenkai Chen, Xiaotong Chen, Yu Chen, An-Chieh Cheng, Click Cheng, Xiu Chia, Jeana Choi, Chaeyeon Chung, Wenyan Cong, Yin Cui, Magdalena Dadela, Nalin Dadhich, Wenliang Dai, Joyjit Daw, Alperen Degirmenci, Rodrigo Vieira Del Monte, Robert Denomme, Sameer Dharur, Marco Di Lucca, Ke Ding, Wenhao Ding, Yifan Ding, Yuzhu Dong, Nicole Drumheller, Yilun Du, Aigul Dzhumamuratova, Aleksandr Efitorov, Hamid Eghbalzadeh, Naomi Eigbe, Imad El Hanafi, Hassan Eslami, Benedikt Falk, Jiaojiao Fan, Jim Fan, Amol Fasale, Sergiy Fefilatyev, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Vikram Fugro, Prashant Gaikwad, TJ Galda, Katelyn Gao, Yihuai Gao, Wenhang Ge, Sreyan Ghosh, Arushi Goel, Vivek Goel, Akash Gokul, Rama Govindaraju, Jinwei Gu, Miguel Guerrero, Elfie Guo, Aryaman Gupta, Siddharth Gururani, Hugo Hadfield, Song Han, Ankur Handa, Zekun Hao, Mohammad Harrim, Ali Hassani, Nathan Hayes-Roth, Yufan He, Chris Helvig, Cyrus Hogg, Madison Huang, Michael Huang, Sophia Huang, Yufan Huang, Jacob Huffman, DeLesley Hutchins, Suneel Indupuru, Boris Ivanovic, Arihant Jain, Joel Jang, Ryan Ji, Yanan Jian, Dongfu Jiang, Jingyi Jin, Atharva Joshi, Nikhilesh Joshi, Pranjali Joshi, Andy Ju, Jaehun Jung, Weiwei Kang, Scott Kassekert, Jan Kautz, Ashna Khetan, Julia Kiczka, Slawek Kierat, Gwanghyun Kim, Kuno Kim, Sunny Kim, Kezhi Kong, Xin Kong, Zhifeng Kong, Tomasz Kornuta, Egor Krivov, Hui Kuang, Saurav Kumar, Chia-Wen Kuo, George Kurian, Wojciech Kutak, JF Lafleche, Himangshu Lahkar, Omar Laymoun, Jayjun 
Lee, Sanggil Lee, Gabriele Leone, Boyi Li, Freya Li, Jiajun Li, Jinfeng Li, Ling Li, Pengcheng Li, Shangru Li, Tingle Li, Xiaolong Li, Xuan Li, Zhaoshuo Li, Zhiqi Li, Hao Liang, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Ming-Yu Liu, Sifei Liu, Zihan Liu, Hai Loc Lu, Xiangyu Lu, Alice Luo, Ruipu Luo, Wenjie Luo, Jiangran Lyu, Martin Ding Ma, Nic Ma, Qianli Ma, Dawid Majchrowski, Louis Marcoux, Miguel Martin, Qing Miao, Ashkan Mirzaei, Shreyas Misra, Kaichun Mo, Durra Mohsin, Hyejin Moon, Pawel Morkisz, Saeid Motiian, Kirill Motkov, Seungjun Nah, Yashraj Narang, Deepak Narayanan, Thabang Ngazimbi, Julian Ouyang, Shubham Pachori, David Page, Yatian Pang, Sehwi Park, Mahesh Patekar, Mostofa Patwary, Marco Pavone, Trung Pham, Wei Ping, Soha Pouya, Shrimai Prabhumoye, Varun Praveen, Delin Qu, Hesam Rabeti, Morteza Ramezanali, Marilyn Reeb, Xuanchi Ren, Kristen Rumley, Wojciech Rymer, Jun Saito, Yeongho Seol, John Shao, Piyush Shekdar, Tianwei Shen, Humphrey Shi, Min Shi, Stella Shi, Kevin Shih, Mohammad Shoeybi, Mateusz Sieniawski, Shuran Song, Alexander Sotelo, Amir Sotoodeh, Sunil Srinivasa, Vignesh Srinivasakumar, Bartosz Stefaniak, Rahul Heinrich Steiger, Shangkun Sun, Jiaxiang Tang, Shitao Tang, Yangyang Tang, Yue Tang, Tolou Tavakkoli, Kayley Ting, Krzysztof Tomala, Wei-Cheng Tseng, Jibin Varghese, Sergei Vasilev, Thomas Volk, Raju Wagwani, Roger Waleffe, Andrew Z. 
Wang, Boxiang Wang, Haoxiang Wang, Qiao Wang, Shihao Wang, Shijie Wang, Ting-Chun Wang, Yan Wang, Yu Wang, Rohit Watve, David Wehr, Fangyin Wei, Xinshuo Weng, Jay Zhangjie Wu, Kedi Wu, Hongchi Xia, Summer Xiao, Tianjun Xiao, Kevin Xie, Daguang Xu, Jiashu Xu, Mengyao Xu, Ruqing Xu, Xingqian Xu, Yao Xu, Dinghao Yang, Dong Yang, Hans Yang, Xiaodong Yang, Xuning Yang, Yichu Yang, Yurong You, Zhiding Yu, Hao Yuan, Simon Yuen, Xiaohui Zeng, Pengcuo Zeren, Cindy Zha, Haotian Zhang, Jenny Zhang, Jing Zhang, Liangkai Zhang, Paris Zhang, Shun Zhang, Xuanmeng Zhang, Zhizheng Zhang, Ann Zhao, Yilin Zhao, Yuliya Zhautouskaya, Charles Zhou, Fengzhe Zhou, Shilin Zhu, Yuke Zhu, Dima Zhylko, Artur Zolkowski

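Affiliations: NVIDIA

AI summary: Presents Cosmos 3, a family of omnimodal world models built on a unified mixture-of-transformers architecture that jointly processes and generates language, image, video, audio, and action sequences; it reports new state-of-the-art results across understanding and generation tasks and offers a scalable, general-purpose backbone for embodied agents.

AI Chinese abstract

我们介绍了Cosmos 3,一个全模态世界模型家族,设计用于在统一的混合Transformer架构中联合处理和生成语言、图像、视频、音频和动作序列。通过支持高度灵活的输入输出配置,Cosmos 3无缝统一了物理AI的关键模态——有效地将视觉语言模型、视频生成器、世界模拟器和世界动作模型整合到一个框架中。我们的评估表明,Cosmos 3在一系列多样化的理解和生成任务中确立了新的最优水平,展示了全模态世界模型作为具身智能体可扩展、通用骨干的能力。我们的后训练Cosmos 3模型在技术报告撰写时被Artificial Analysis评为最佳开源文本到图像和图像到视频模型,并被RoboArena评为最佳策略模型。为了加速物理AI领域的开放研究和部署,我们在Linux基金会的OpenMDW-1.1许可证下提供我们的代码、模型检查点、策划的合成数据集和评估基准,网址为 https://github.com/nvidia/cosmos 和 https://huggingface.co/collections/nvidia/cosmos3。项目网站位于 https://research.nvidia.com/labs/cosmos-lab/cosmos3。

English abstract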

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI -- effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3.

2. Representation Learning, Self-Supervision, and Contrastive Learning (11 papers)

2606.18383 2026-06-18 cs.LG cs.CL new submission

From Sparse Features to Trustworthy Proxies: Certifying SAE-Based Interpretability

从稀疏特征到可信代理:认证基于SAE的可解释性

Dibyanayan Bandyopadhyay, Asif Ekbal

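Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology Patna

AI summary: Proposes a post-hoc generalization framework that certifies a language model via a sparse proxy (its SAE reconstruction), derives an upper bound on the base model's expected risk, shows the bound is non-vacuous on GPT-2 Small, Gemma-2B, and Llama-3-8B, and finds that later layers are easier to certify and that the decomposition separates semantic alignment from mere statistical sparsity.

AI Chinese abstract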

稀疏自编码器(SAE)越来越多地被用于从语言模型(LM)中提取可解释特征,但一个核心问题仍然存在:基于SAE的解释何时可以被视为底层冻结LM的忠实视图?我们通过一个后验泛化框架来研究这个问题,该框架通过稀疏代理来认证LM,稀疏代理是通过将原生隐藏激活替换为其预训练的SAE重建而获得的。我们的框架使用四个可测量量推导出基础模型期望风险的上界:代理风险、SAE重建差距、概念池不匹配和稀疏复杂度。我们将此证书解释为解释忠实性的操作标准。特别地,非平凡界表明提取的稀疏特征保留了有意义的预测信息,而小的重建和匹配误差表明代理在行为上接近原始模型。实验上,我们展示了在GPT-2 Small、Gemma-2B和Llama-3-8B上,该界在实际样本量下变得非平凡。对Llama-3-8B的详细逐层分析揭示了强烈的深度依赖性,较深层变得更容易认证,这与更强的局部保真度和更弱的下游误差放大相关。最后,通过特征洗牌消融,我们展示了分解区分了真正的语义对齐与单纯的统计稀疏性,为基于SAE的解释何时变得不太可靠提供了有用的诊断。

English abstract

Sparse autoencoders (SAEs) are increasingly used to extract interpretable features from language models (LMs), yet a central question remains: when can an SAE-based explanation be treated as a faithful view of an underlying frozen LM? We study this through a post-hoc generalization framework that certifies the LM via a sparse proxy, obtained by replacing a native hidden activation with its pretrained SAE reconstruction. Our framework derives an upper bound on the base model's expected risk using four measurable quantities: proxy risk, SAE reconstruction gap, concept-pool mismatch, and sparse complexity. We interpret this certificate as an operational criterion for explanatory faithfulness. In particular, a non-vacuous bound indicates that the extracted sparse features retain meaningful predictive information, while small reconstruction and mismatch errors indicate that the proxy remains behaviorally close to the original model. Empirically, we show that the bound becomes non-vacuous on GPT-2 Small, Gemma-2B, and Llama-3-8B at practical sample sizes. A detailed layerwise analysis of Llama-3-8B reveals a strong depth dependence, with later layers becoming much easier to certify, associated with both stronger local fidelity and weaker downstream error amplification. Finally, through feature-shuffling ablations, we show that the decomposition distinguishes genuine semantic alignment from mere statistical sparsity, providing a useful diagnostic for when SAE-based explanations become less reliable.

2606.18390 2026-06-18 cs.LG q-bio.QM 新提交

MOLAR: Learning Multimodal Molecular Representations from Noisy Labels

MOLAR: 从噪声标签中学习多模态分子表示

Yingxu Wang, Kunyu Zhang, Nan Yin, Yu Li, Eran Segal

发表机构 * Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学) Zhengzhou University(郑州大学) The Education University of Hong Kong(香港教育大学) The Chinese University of Hong Kong(香港中文大学) Weizmann Institute of Science(魏茨曼科学研究所)

AI总结 提出MOLAR框架,通过分离干净属性推断与标签观测,利用图与文本模态的残差证据,从噪声标签中学习多模态分子表示,在自然噪声和标签翻转基准上优于基线方法。

AI中文摘要

动机:噪声标签是分子属性预测中的常见挑战,因为分子注释通常来自实验分析、人工整理的数据库或弱注释流程,而非直接观测到的干净生物状态。将记录标签视为可靠监督会导致模型记忆损坏的观测并学习误导性的分子证据。在多模态分子表示学习中,图-文本融合或对齐可能放大此问题,从而跨模态传播标签引起的错误。结果:我们提出MOLAR,一个从噪声标签中学习多模态分子表示的噪声感知框架。MOLAR将潜在干净属性推断与记录标签观测分离:图和文本视图为干净属性分布贡献残差证据,一个分类标签观测通道将此分布映射到记录标签用于训练。该公式从模型中推导出后验标签可靠性和模态特定的分子证据。在自然噪声分子基准和受控标签翻转基准上的实验表明,MOLAR始终优于代表性基线。可视化分析进一步表明MOLAR提供了可解释的可靠性和模态证据诊断。

英文摘要

Motivation: Noisy labels are a common challenge in molecular property prediction because molecular annotations are often obtained from assays, curated databases, or weak annotation pipelines rather than directly observed clean biological states. Treating recorded labels as reliable supervision can cause models to memorize corrupted observations and learn misleading molecular evidence. In multimodal molecular representation learning, this issue can be amplified by graph-text fusion or alignment, which may propagate label-induced errors across modalities. Results: We propose MOLAR, a noise-aware framework for learning multimodal molecular representations from noisy labels. MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model. Experiments on naturally noisy molecular benchmarks and controlled label-flipping benchmarks show that MOLAR consistently outperforms representative baselines. Visualization analyses further show that MOLAR provides interpretable reliability and modality-evidence diagnostics.

2606.18688 2026-06-18 cs.LG cs.AI 新提交

Dual-Channel Grounded World Modeling (DCGWM): Structural Prevention of Objective Interference Collapse via Heterogeneous External Grounding with Inward-Only Gradient Flow

双通道接地世界建模 (DCGWM):通过异构外部接地与内向梯度流结构性防止目标干扰崩溃

Akshay Hazare

发表机构 * Independent Researcher(独立研究者)

AI总结 提出双通道接地世界建模(DCGWM),通过分区潜空间和内向梯度流,结构性防止联合嵌入预测架构中多目标接地导致的目标干扰崩溃。

Comments Position paper. Experimental validation in progress

AI中文摘要

联合嵌入预测架构(JEPAs)是世界模型表示学习的主要方法。我们识别出基于JEPA的世界模型在接地于两种性质不同的外部信号时存在一种失败模式:物理动力学(稀疏、高幅度、满足约束的梯度修正)和社会行为动力学(扩散、分布匹配的修正)。我们将其称为目标干扰崩溃(OIC):我们认为在共享潜空间中的联合学习会导致主导通道系统地崩溃从属通道的表示子空间,且仅通过损失加权无法解决。我们提出双通道接地世界建模(DCGWM),通过分区潜空间(物理子空间Z_p,行为子空间Z_b)和内向梯度流,从结构上防止OIC。物理接地通道通过VICReg风格的对齐到物理测量仅更新Z_p;社会行为接地通道通过对齐到涌现多智能体模拟的轨迹仅更新Z_b。通道间接口模块在任务级别耦合子空间,而不产生跨子空间梯度。非对称接地 adherence 损失通过硬铰链惩罚物理违反和软KL惩罚行为发散来惩罚 rollout 漂移。生成渲染层在架构上与潜世界模型隔离。我们给出三个理论结果:分区消除了与OIC相关的梯度干扰路径;每个接地子空间从其对齐目标继承抗崩溃保证;在生成目标几何形状的假设下,生成隔离是必要的。本文建立了问题表述和架构;实验验证正在进行中,将在未来修订中报告。

英文摘要

Joint Embedding Predictive Architectures (JEPAs) are a leading approach to world model representation learning. We identify a failure mode in JEPA-based world models grounded against two qualitatively distinct external signals: physical dynamics (sparse, high-magnitude, constraint-satisfying gradient corrections) and social-behavioral dynamics (diffuse, distribution-matching corrections). We term this Objective Interference Collapse (OIC): we argue that joint learning in a shared latent space causes the dominant channel to systematically collapse the subordinate channel's representational subspace, in a manner not resolvable by loss weighting alone. We propose Dual-Channel Grounded World Modeling (DCGWM), designed to structurally prevent OIC through a partitioned latent space (physical subspace Z_p, behavioral subspace Z_b) with inward-only gradient flow. A Physical Grounding Channel updates only Z_p via VICReg-style alignment to physical measurements; a Social-Behavioral Grounding Channel updates only Z_b via alignment to trajectories from an emergent multi-agent simulation. An Inter-Channel Interface Module couples the subspaces at the task level without cross-subspace gradients. An Asymmetric Grounding Adherence Loss penalizes rollout drift with a hard hinge for physical violations and a soft KL for behavioral divergence. A Generative Rendering Layer is architecturally isolated from the latent world model. We present three theoretical results: the partition removes the gradient-interference pathway implicated in OIC; each grounded subspace inherits anti-collapse guarantees from its alignment objective; and generative isolation is necessary under a stated assumption on the generative objective's geometry. This manuscript establishes the problem formulation and architecture; experimental validation is ongoing and will be reported in a future revision.

2606.18703 2026-06-18 cs.LG q-bio.QM 新提交

Contextualizing Biological Language Models across Modalities via Logit-Space Contrastive Alignment

通过logit空间对比对齐实现生物学语言模型的跨模态上下文化

Yanjun Shao, Yundi Chen, Yashvi Patel, Aurelien Pelissier, María Rodríguez Martínez

发表机构 * Biomedical Informatics and Data Science, Yale School of Medicine(耶鲁医学院生物医学信息学与数据科学)

AI总结 提出LOGICA框架,在输出logit空间进行对比学习,通过门控跨模态适配器保留预训练似然接口,实现跨不同词汇表模型的上下文条件预测,在蛋白质-配体结合、TCR-肽活性和药物耐药性预测任务上超越现有方法。

AI中文摘要

预训练的生物学语言模型通过掩码标记预测暴露每个标记的概率分布,提供序列设计、变异评分和机制解释所依赖的似然接口。然而,这些分布是从广泛的无标注语料中学习得到的,并未自然地以任务特定的生物学上下文(如相互作用伙伴、细胞环境或治疗干预)为条件。现有的上下文匹配方法通常通过池化嵌入、对比潜在空间或任务特定的预测头来扭曲这一接口。我们提出了LOGICA(logit空间对比对齐),一种用于上下文条件预测的框架,直接在输出logit空间中进行对比学习。通过与每个模型的原生标记头兼容的门控跨模态适配器,LOGICA保留了预训练的似然接口,并将上下文化的标记对数似然转换为匹配分数。对齐是通过上下文敏感的标记概率来定义的,而不是共享嵌入空间中的邻近性,从而能够从具有不同词汇表的模型之间的稀疏配对数据中学习,无需共享分词器或解码器。LOGICA特别适用于突变局部变异排序,其中比较简化为扰动位点上突变标记的上下文条件似然。在蛋白质-配体结合、TCR-肽活性和药物条件耐药性预测中,LOGICA优于先前的最先进方法,包括匹配的潜在对比和条件MLM基线,同时保留了用于解释和生成的标记级接口。在保留基因的单突变药物耐药性预测中,LOGICA将AUC从接近随机的潜在空间基线约0.55提高到约0.65。

英文摘要

Pretrained biological language models expose per-token probability distributions through masked-token prediction, providing the likelihood interface central to sequence design, variant scoring, and mechanistic interpretation. Yet these distributions are learned from broad unlabeled corpora and are not naturally conditioned on task-specific biological contexts such as interaction partners, cellular environments, or therapeutic interventions. Existing contextual matching methods often distort this interface through pooled embeddings, contrastive latent spaces, or task-specific prediction heads. We introduce LOGICA (Logit-space Contrastive Alignment), a framework for context-conditioned prediction that performs contrastive learning directly in output-logit space. Using gated cross-modal adapters compatible with each model's native token head, LOGICA preserves the pretrained likelihood interface and converts contextualized token log-likelihoods into matching scores. Alignment is defined through context-sensitive token probabilities rather than proximity in a shared embedding space, enabling learning from sparse paired data across models with distinct vocabularies, without a shared tokenizer or decoder. LOGICA is particularly effective for mutation-local variant ranking, where comparisons reduce to context-conditioned likelihoods of mutant tokens at perturbed sites. Across protein--ligand binding, TCR--peptide activity, and drug-conditioned resistance prediction, LOGICA improves over prior state-of-the-art methods, including matched latent-contrastive and conditional MLM baselines, while retaining a token-level interface for interpretation and generation. On held-out-gene single-mutation drug-resistance prediction, LOGICA improves AUC from near-random latent-space baselines of $\sim$0.55 to $\sim$0.65.

2606.18961 2026-06-18 cs.LG 新提交

Be Your Own Teacher: Steering Protein Language Models via Unsupervised Reward Optimization

做自己的老师:通过无监督奖励优化引导蛋白质语言模型

Lanqing Li, Shentong Mo, Yang Yu, Pheng-Ann Heng

发表机构 * The Chinese University of Hong Kong(香港中文大学) MBZUAI(穆罕默德·本·扎耶德人工智能大学) Hong Kong University of Science and Technology(香港科技大学)

AI总结 提出无监督奖励优化框架,结合模型不确定性和语义一致性作为代理奖励,通过SRO和BRO算法优化PLMs,在无标签数据下实现可控蛋白质生成,性能接近有监督方法。

Comments 24 pages, 2 figures, 13 tables

AI中文摘要

蛋白质语言模型(PLMs)已成为可控生物分子设计的有力工具,但其后训练适应通常依赖于昂贵的湿实验验证或精心策划的偏好数据集。为了克服这一监督瓶颈,我们引入了PLMs的无监督奖励优化,这是一个无需真实标签即可实现可引导蛋白质生成的综合框架。我们的关键见解是,任务无关的奖励(将内在模型不确定性与由蛋白质表示模型指导的外在语义一致性相结合)在基础模型和温度设置中与可控性度量表现出强相关性。基于这一发现,我们提出了两种离线算法:软奖励优化(SRO)和二值化奖励优化(BRO),它们有效地最大化由这些代理奖励诱导的经典RLHF目标。在组合性分布外提示上的大量实验表明,两种方法均显著优于竞争基线(DPO、KTO),同时在多个采样温度、模型规模和蛋白质家族中接近理想性能。此外,使用无监督奖励微调的PLMs在pass@k评估中相比其基础模型能够实现持续更高的覆盖率。通过使PLMs能够利用自身生成的体验进行自我改进,我们的框架为在标签偏好或实验反馈稀缺或不可用的环境中实现可控生物分子设计提供了一条可扩展的途径。

英文摘要

Protein language models (PLMs) have emerged as powerful tools for controllable biomolecular design, yet their post-training adaptation typically relies on costly wet-lab validation or curated preference datasets. To overcome this supervision bottleneck, we introduce unsupervised reward optimization of PLMs, a comprehensive framework for steerable protein generation without ground-truth labels. Our key insight is that task-agnostic rewards, which combine intrinsic model uncertainty with extrinsic semantic consistency informed by protein representation models, exhibit strong correlation with controllability measures across base models and temperature regimes. Building upon this discovery, we propose two offline algorithms: Soft Reward Optimization (SRO) and Binarized Reward Optimization (BRO), which effectively maximize the classical RLHF objective induced by these proxy rewards. Extensive experiments on compositional out-of-distribution prompts demonstrate that both methods significantly outperform competitive baselines (DPO, KTO), while approaching oracle performance across multiple sampling temperatures, model scales and protein families. Moreover, PLMs fine-tuned with unsupervised rewards can achieve consistently higher coverage compared to their base model in pass@k evaluations. By enabling self-improvement of PLMs through their own generated experience, our framework provides a scalable pathway toward controllable biomolecular design in settings where labeled preferences or experimental feedback are scarce or unavailable.

2606.18520 2026-06-18 stat.ML cs.CG cs.CL cs.DS cs.IR cs.LG 交叉投稿

Compact Geometric Representations of Hierarchies

层次结构的紧凑几何表示

Prashant Gokhale, Piotr Indyk, Yuhao Liu, Sandeep Silwal, Tony Chang Wang, Haike Xu

发表机构 * UW-Madison(威斯康星大学麦迪逊分校) MIT(麻省理工学院)

AI总结 研究如何用低维几何嵌入表示有向无环图中的祖先-后代关系,提出基于树宽等结构参数的维度上界和下界,并在真实数据集上验证了紧凑性。

Comments Published at the 39th Annual Conference on Learning Theory (COLT) 2026. 22 Pages

AI中文摘要

计算数据的几何表示是现代机器学习的基石,通常通过训练双编码器将查询和文档映射到共享嵌入空间来实现。You等人[NeurIPS '25]的最新工作将这种方法扩展到层次检索,其中相关性由有向无环图(DAG)中的祖先-后代关系决定。虽然先前的工作表明当后代数量较少时存在有效嵌入,但这些界限对于深层层次结构会严重退化,所需维度与节点总数相当。在本文中,我们研究了更一般图类的紧凑可达性嵌入,并提供了使用维度依赖于结构图参数的嵌入来表示层次结构的理论保证。我们证明,对于任何有向树,存在常数维度3的可达性嵌入,与树的大小或深度无关。我们将这一结果推广到以树宽$t$为特征的图,构造了维度为$O(t \log n)$的嵌入,其中$n$是节点数。作为这些上界的补充,我们提供了匹配或接近匹配的下界,表明对于一般DAG,维度$\Omega(n)$是必要的,而对于树宽为$t$的图,需要$\Omega(t/\log(n/t))$的维度。我们还获得了由DAG中交叉边数量参数化的上界和下界。此外,我们展示了我们的嵌入可以在真实世界数据集上构建,并且与先前具有理论保证的嵌入相比,在高召回率情况下维度小得多。

英文摘要

Computing geometric representations of data is a cornerstone of modern machine learning, typically achieved by training dual encoders which map queries and documents into a shared embedding space. Recent work of You et al. [NeurIPS '25] has extended this approach to hierarchical retrieval, where relevance is determined by the ancestor-descendant relationships in a Directed Acyclic Graph (DAG). While previous work has shown that valid embeddings exist when the number of descendants is small, these bounds degrade significantly for deep hierarchies, requiring dimensions as large as the total number of nodes. In this paper, we investigate compact reachability embeddings for more general graph classes and provide theoretical guarantees for representing hierarchies using embeddings whose dimension depends on structural graph parameters. We prove that for any directed tree, there exists a reachability embedding in constant dimension 3, independent of the tree's size or depth. We generalize this result to graphs characterized by treewidth $t$, constructing embeddings of dimension $O(t \log n)$, where $n$ is the number of nodes. Complementing these upper bounds, we provide matching or near-matching lower bounds, showing that dimension $\Omega(n)$ is necessary for general DAGs and $\Omega(t/\log(n/t))$ is required for graphs of treewidth $t$. We also obtain upper and lower bounds parameterized by the number of cross-edges in the DAG. We additionally show that our embeddings can be constructed on real-world datasets, and that they give much smaller dimensions in high recall regimes compared to prior embeddings with theoretical guarantees.

2606.19249 2026-06-18 cs.CV cs.LG 交叉投稿

Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory

Transformer几何观测站TGO-I:谱几何观测站

Kaustubh Kapil, Kishor P. Upla

发表机构 * Sardar Vallabhbhai National Institute of Technology (SVNIT), Surat, India(印度苏拉特萨达尔·瓦拉巴伊国家理工学院(SVNIT))

AI总结 提出TGO框架,通过分析ViT表示的谱几何(有效秩、稳定秩、参与比、谱熵、谱平坦度、谱各向异性等),发现训练过程中维度利用增加、各向异性降低、谱熵和参与比上升,最终CLS标记表示具有最高有效维度和最低各向异性。

AI中文摘要

尽管Vision Transformers(ViTs)被广泛采用并在众多计算机视觉应用中取得成功,对其维度和表示几何的基本理解仍然相对未被充分探索。为了弥补这一差距,我们引入了Transformer几何观测站(TGO),这是一个系统的实验和分析流程框架,旨在研究Vision Transformers的表示几何和动态。TGO-I是该框架的第一部分,专注于ViT表示的谱几何。使用在ImageNet-100上训练的ViT-Small/16模型,我们分析了训练过程中的有效秩、稳定秩、参与比、谱熵、谱平坦度、谱各向异性、协方差结构、特征谱和奇异值谱。我们的结果揭示了维度利用的一致增加,伴随着各向异性降低、谱熵增加、参与比增加以及逐渐平坦的特征谱。与常见的直觉(即训练应将信息集中到少数主导方向)相反,我们观察到方差在表示维度上的逐渐重新分布。这一现象在最终的CLS标记表示中尤为明显,该表示在网络中表现出最高的有效维度和最低的各向异性。

英文摘要

Despite the widespread adoption of Vision Transformers (ViTs) and their success across numerous computer vision applications, the fundamental understanding of their dimensional and representational geometry remains relatively underexplored. To address this gap, we introduce Transformer Geometry Observatory (TGO), a systematic framework of experiments and analysis pipelines designed to investigate the representational geometry and dynamics of Vision Transformers. TGO-I, the first installment of the framework, focuses on the spectral geometry of ViT representations. Using a ViT-Small/16 model trained on ImageNet-100, we analyze Effective Rank, Stable Rank, Participation Ratio, Spectral Entropy, Spectral Flatness, Spectral Anisotropy, covariance structure, eigenspectra, and singular value spectra throughout training. Our results reveal a consistent increase in dimensional utilization, accompanied by decreasing anisotropy, increasing spectral entropy, increasing participation ratio, and progressively flatter eigenspectra. Contrary to the common intuition that training should concentrate information into a small number of dominant directions, we observe a progressive redistribution of variance across representational dimensions. This phenomenon is particularly pronounced in the final CLS token representation, which exhibits the highest effective dimensionality and lowest anisotropy within the network.
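
As a rough illustration of several of the spectral quantities named in this abstract, the following Python sketch computes them from a feature matrix. The function name `spectral_metrics` and the specific definitions used (entropy-based effective rank, eigenvalue participation ratio, stable rank as the eigenvalue sum over the largest eigenvalue) are common conventions assumed here; the abstract does not state the paper's exact definitions, which may differ.

```python
import numpy as np


def spectral_metrics(features: np.ndarray) -> dict:
    """Spectral summary of an (n_samples, n_dims) feature matrix.

    Hypothetical helper: the definitions below are common conventions,
    not confirmed to be the ones used in the paper.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    # Squared singular values of the centered features are proportional
    # to the eigenvalues of the feature covariance matrix.
    sing = np.linalg.svd(centered, compute_uv=False)
    eig = sing ** 2
    p = eig / eig.sum()          # normalized spectrum, sums to 1
    nz = p[p > 0]                # drop exact zeros before taking logs
    entropy = float(-(nz * np.log(nz)).sum())
    return {
        "spectral_entropy": entropy,
        # exp of the spectral entropy
        "effective_rank": float(np.exp(entropy)),
        # (sum of eigenvalues)^2 / (sum of squared eigenvalues)
        "participation_ratio": float(eig.sum() ** 2 / (eig ** 2).sum()),
        # squared Frobenius norm over squared spectral norm
        "stable_rank": float(eig.sum() / eig.max()),
    }
```

Under these definitions, a flatter eigenspectrum raises all four values together, which matches the direction of the trends the abstract reports; features confined to a single direction give values near 1 for the three rank-like quantities.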

2406.07775 2026-06-18 cs.LG 版本更新

Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices

基于自注意力的非线性基变换用于动态光纤传输矩阵的紧凑潜在空间建模

Yijie Zheng, Robert J. Kilpatrick, David B. Phillips, George S. D. Gordon

发表机构 * Optics and Photonics research group, University of Nottingham, UK(诺丁汉大学光学与光子学研究组,英国) University of Exeter, UK(埃克塞特大学,英国) State Key Laboratory of Extreme Photonics and Instrumentation, College of Optical Science and Engineering International Research Center for Advanced Photonics, Zhejiang University, Hangzhou, China(极端光子学与仪器国家重点实验室,浙江大学光科学与工程学院,国际先进光子学研究中心,中国杭州) Research Center for Humanoid Sensing, Zhejiang Lab, Hangzhou, China(类人感知研究中心,浙江实验室,中国杭州)

AI总结 提出使用自注意力层动态变换光纤矩阵的坐标表示到紧凑基,实现低维表示,在多个数据集上验证了基稀疏性(参与比0.01-0.11)和低重建误差(<10%)。

AI中文摘要

多模光纤是头发丝粗细的玻璃丝,能高效传输光。它们有望实现下一代医用内窥镜,在体内深处提供前所未有的亚细胞图像分辨率。然而,将光限制在这样的光纤中意味着图像在传输过程中固有地被打乱。传统上,通过预先校准特定光纤如何打乱光并求解表示光纤物理模型的静态线性矩阵方程来补偿这种打乱。然而,随着技术向实际部署发展,解扰过程必须考虑由于移动和温度变化等因素导致的光纤对光影响的矩阵的动态变化,以及由于光纤尖端在体内不可及而产生的非线性。这种复杂、动态和非线性行为非常适合用神经网络近似,但大多数领先的图像重建网络依赖卷积层,这些层假设相邻像素之间存在强相关性,这种强归纳偏置不适用于光纤矩阵,因为光纤矩阵可以用具有长程相关性的任意坐标表示来表达。我们引入了一个新概念,使用自注意力层将变化的光纤矩阵的坐标表示动态变换到允许紧凑、低维表示的基,适合进一步处理。我们在不同的光纤矩阵数据集上展示了该方法的有效性。我们展示了我们的模型在变换基上显著提高了光纤基的稀疏性,以参与比p作为稀疏性度量,介于0.01和0.11之间。此外,我们展示了这些变换后的表示允许以<10%的重建误差重建原始矩阵,证明了可逆性。

英文摘要

Multimode optical fibres are hair-thin strands of glass that efficiently transport light. They promise next-generation medical endoscopes that provide unprecedented sub-cellular image resolution deep inside the body. However, confining light to such fibres means that images are inherently scrambled in transit. Conventionally, this scrambling has been compensated by pre-calibrating how a specific fibre scrambles light and solving a stationary linear matrix equation that represents a physical model of the fibre. However, as the technology develops towards real-world deployment, the unscrambling process must account for dynamic changes in the matrix representing the fibre's effect on light, due to factors such as movement and temperature shifts, and non-linearities resulting from the inaccessibility of the fibre tip when inside the body. Such complex, dynamic and nonlinear behaviour is well-suited to approximation by neural networks, but most leading image reconstruction networks rely on convolutional layers, which assume strong correlations between adjacent pixels. This strong inductive bias is inappropriate for fibre matrices, which may be expressed in a range of arbitrary coordinate representations with long-range correlations. We introduce a new concept that uses self-attention layers to dynamically transform the coordinate representations of varying fibre matrices to a basis that admits compact, low-dimensional representations suitable for further processing. We demonstrate the effectiveness of this approach on diverse fibre matrix datasets. We show that our models significantly improve the sparsity of fibre matrices in their transformed bases, with a participation ratio $p$, used as the measure of sparsity, of between 0.01 and 0.11. Further, we show that these transformed representations admit reconstruction of the original matrices with < 10% reconstruction error, demonstrating invertibility.

2605.10840 2026-06-18 cs.LG cs.AI q-bio.QM 版本更新

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

Clin-JEPA:一种多阶段协同训练框架,用于EHR患者轨迹的联合嵌入预测预训练

Yixuan Yang, Mehak Arora, Ryan Zhang, Baraa Abed, Junseob Kim, Tilendra Choudhary, Md Hassanuzzaman, Kevin Zhu, Ayman Ali, Chengkun Yang, Alasdair Edward Gent, Victor Moas, Rishikesan Kamaleswaran

发表机构 * Duke University(杜克大学)

AI总结 本文提出Clin-JEPA框架,通过多阶段预训练稳定协同训练编码器和预测器,解决EHR数据中联合嵌入预测的挑战,实现多任务下游任务的高性能表现。

Comments 16 pages, 4 figures, 8 tables. Code: https://github.com/YeungYathin/Clin-JEPA

AI中文摘要

我们介绍了Clin-JEPA,一种用于EHR患者轨迹的联合嵌入预测(JEPA)预训练的多阶段协同训练框架。JEPA架构已在机器人领域实现了潜在空间规划,并在视觉领域实现了高质量的表示学习,但将其扩展到EHR数据,以获得一个能够同时预测患者轨迹并服务于多种下游风险预测任务且无需逐任务微调的单一主干,仍是一个开放性挑战。现有的JEPA框架要么在预训练后丢弃预测器(I-JEPA,V-JEPA),要么在冻结的预训练编码器上训练预测器(V-JEPA 2-AC),导致编码器无法感知保留下来的预测器在推理时必须使用的rollout信号;在共享JEPA预测目标下协同训练编码器和预测器将提供这种基础,但朴素的协同训练不稳定,表示崩溃和在线/目标漂移会导致自回归rollout发散。Clin-JEPA的五阶段预训练课程(预测器预热、联合细化、EMA目标对齐、硬同步和预测器最终化)逐阶段解决每种失败模式,稳定地协同训练基于Qwen3-8B的编码器和一个具有9200万参数的潜在轨迹预测器。在MIMIC-IV ICU数据上,三个独立评估支持该框架:(1)潜在$\ell_1$ rollout漂移在48小时范围内唯一收敛(-15.7%),而基线和消融实验均发散(+3%至+4951%);(2)编码器学习了具有临床区分性的潜在几何结构(病情恶化患者群体在潜在空间中的位移是稳定患者的4.83倍,而基线编码器仅为≤2.62倍);(3)单一主干在多任务下游评估中优于强大的表格和序列基线。Clin-JEPA在ICareFM EEP上达到平均AUROC 0.851,在8个二元风险任务上达到0.883(比基线平均分别高0.038和0.041)。

英文摘要

We present Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive (JEPA) pretraining on EHR patient trajectories. JEPA architectures have enabled latent-space planning in robotics and high-quality representation learning in vision, but extending the paradigm to EHR data -- to obtain a single backbone that simultaneously forecasts patient trajectories and serves diverse downstream risk-prediction tasks without per-task fine-tuning -- remains an open challenge. Existing JEPA frameworks either discard the predictor after pretraining (I-JEPA, V-JEPA) or train it on a frozen pretrained encoder (V-JEPA 2-AC), leaving the encoder unaware of the rollout signal that the retained predictor must use at inference; co-training the encoder and predictor under a shared JEPA prediction objective would supply this grounding, but naïve co-training is unstable, with representation collapse and online/target drift causing autoregressive rollout to diverge. Clin-JEPA's five-phase pretraining curriculum -- predictor warmup, joint refinement, EMA target alignment, hard sync, and predictor finalization -- addresses each failure mode by phase, stably co-training a Qwen3-8B-based encoder and a 92M-parameter latent trajectory predictor. On MIMIC-IV ICU data, three independent evaluations support the framework: (1) latent $\ell_1$ rollout drift uniquely converges ($-$15.7%) over 48-hour horizons while baselines and ablations diverge (+3% to +4951%); (2) the encoder learns a clinically discriminative latent geometry (deteriorating-patient cohorts displace 4.83$\times$ further than stable patients in latent space, vs $\leq$2.62$\times$ for baseline encoders); (3) a single backbone outperforms strong tabular and sequence baselines on multi-task downstream evaluation. Clin-JEPA achieves mean AUROC 0.851 on ICareFM EEP and 0.883 on 8 binary risk tasks (+0.038 and +0.041 vs baseline average).

2606.12629 2026-06-18 cs.LG cs.AI 版本更新

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

Bag of Dims:通过维度级符号模式实现无需训练的机制可解释性

Varun Reddy Nalagatla

发表机构 * Amazon Web Services(亚马逊云服务)

AI总结 本文提出Bag of Dims框架,证明Transformer隐藏状态的标准基即可作为无需训练的特征基,通过维度符号模式编码语义,并在涵盖语言、视觉和音频的七个模型上验证了其有效性。

Comments 22 pages, 5 figures, 27 tables

AI中文摘要

我们表明,Transformer隐藏状态的标准基已经提供了一个无需训练、架构通用的特征基。单个维度通过其符号(+/-1)编码语义内容,通过其幅度编码置信度,充当独立的二进制寄存器;一个特征是具有一致符号模式的维度子集,通过统计符号一致的数量来读取,无需学习旋转。我们在涵盖语言(Qwen 3.5-4B、Gemma 3-4B、Mistral 7B、Qwen3-32B)、视觉(DINOv2、ViT-Base)和音频(AST)的七个模型上验证了这一Bag of Dims框架。仅符号就携带预测性内容:单位幅度的符号模式通过LM头保留了60-93%的top-5下一个token准确率,而无需解码器的汉明评分达到80-90%的top-4096准确率。基于单token缓存(每个token一次前向传播,无上下文,无标签),我们通过符号一致性以0.97-0.99的AUC检测出175个类别;训练过的探针仅增加+0.018 AUC,并收敛到轴对齐的权重。这些特征具有因果作用:它们在K/V注意力投影后依然保留,可追溯到写入它们的FFN神经元联盟(随机权重对照从未复现这一点),并且在实时前向传播中翻转某一特征的符号会在四个语言模型上抑制其对应概念,且该干预是幅度匹配的、概念特异的。各维度始终保持独立(成对互信息低于0.006比特)。该结构并非语言所特有:相同的逐维度符号出现在自监督视觉(DINOv2,12个ImageNet超类中的9个)、监督视觉(ViT-Base,11/12)和音频(AST,50/50个ESC-50类别)中,因此它反映的是Transformer训练的一般性质,而非语言建模目标。标准基已足以进行特征读取,只需一次前向传播,无需优化,无需GPU天数。开放问题从寻找正确的旋转转变为梳理每个维度所编码的内容。

英文摘要

We show the standard basis of transformer hidden states already provides a training-free, architecture-general feature basis. Individual dimensions encode semantic content via their signs (+/-1) and confidence via their magnitudes, acting as independent binary registers; a feature is a subset of dimensions with a consistent sign pattern, read by counting sign agreements with no learned rotation. We validate this Bag of Dims framework across seven models spanning language (Qwen 3.5-4B, Gemma 3-4B, Mistral 7B, Qwen3-32B), vision (DINOv2, ViT-Base), and audio (AST). Signs alone carry predictive content: unit-magnitude sign patterns preserve 60-93% top-5 next-token accuracy through the LM head, and decoder-free Hamming scoring reaches 80-90% top-4096. From a single-token cache (one forward pass per token, no context, no labels), we detect 175 categories at AUC 0.97-0.99 by sign agreement; a trained probe adds only +0.018 AUC and converges to axis-aligned weights. These features are causally operative: they survive the K/V attention projections, trace to the FFN neuron coalitions that write them (random-weight controls never reproduce this), and flipping a feature's signs during the live forward pass suppresses its concept across four language models, magnitude-matched and concept-specific. Dimensions stay independent throughout (pairwise mutual information below 0.006 bits). The structure is not specific to language: the same per-dimension signs appear in self-supervised vision (DINOv2, 9/12 ImageNet superclasses), supervised vision (ViT-Base, 11/12), and audio (AST, 50/50 ESC-50 categories), so it reflects transformer training in general, not the language-modeling objective. The standard basis already suffices for feature reading at one forward pass, no optimization, no GPU-days. The open problem shifts from finding the right rotation to cataloging what each dimension encodes.
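
The sign-agreement readout described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `sign_pattern` and `sign_agreement` and the `min_consistency` threshold are hypothetical and do not come from the paper, which may select dimensions and score agreement differently.

```python
import numpy as np


def sign_pattern(anchors: np.ndarray, min_consistency: float = 0.9):
    """Pick dimensions whose sign is consistent across anchor vectors.

    anchors: (n_anchors, d) hidden states for examples of one category.
    min_consistency is a hypothetical threshold on |mean sign|.
    Returns (dims, pattern): selected indices and their +/-1 signs.
    """
    mean_sign = np.sign(anchors).mean(axis=0)
    dims = np.flatnonzero(np.abs(mean_sign) >= min_consistency)
    return dims, np.sign(mean_sign[dims])


def sign_agreement(hidden: np.ndarray, dims: np.ndarray, pattern: np.ndarray) -> float:
    """Fraction of the selected dimensions whose sign in `hidden` matches.

    Magnitudes are ignored and no rotation or projection is learned.
    """
    if dims.size == 0:
        return 0.0  # no consistent dimensions were found
    return float((np.sign(hidden[dims]) == pattern).mean())
```

A score near 1 means the hidden state shares the category's sign pattern on the selected dimensions. One minus this score is the normalized Hamming distance between the two sign vectors restricted to those dimensions.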

2603.11417 2026-06-18 cs.CV cs.LG 版本更新

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

端到端自动驾驶中的零样本跨城市泛化:自监督与监督表示

Fatemeh Naeinian, Ali Hamza, Haoran Zhu, Anna Choromanska

发表机构 * Department of Electrical and Computer Engineering, NYU Tandon School of Engineering(电气工程系,纽约大学Tandon工程学院)

AI总结 研究端到端自动驾驶模型在跨城市零样本迁移中的泛化能力,发现自监督预训练(如I-JEPA、DINOv2、MAE)相比监督预训练能显著减少位移和碰撞退化,提升闭环评估中的分布外PDMS。

AI中文摘要

端到端自动驾驶模型通常使用监督的ImageNet预训练骨干网络在多城市数据集上训练,但其泛化到未见城市的能力尚未得到充分检验。当训练和评估数据在地理上混合时,模型可能隐含地依赖城市特定线索,掩盖了在真实世界域偏移下泛化到新位置时可能出现的失败模式。在这项工作中,我们将零样本跨城市迁移定义为端到端自动驾驶的受控表示级压力测试,并探究视觉预训练如何影响地理域偏移下的迁移行为。我们通过将自监督骨干网络I-JEPA、DINOv2和MAE集成到规划框架中进行了全面研究。我们在nuScenes上的开环设置和NAVSIM上的闭环评估协议中,在严格的地理划分下评估性能。我们的实验揭示了当模型在不同道路拓扑、交通规则和视觉环境的城市间迁移时存在显著的泛化差距。在开环评估中,监督骨干网络在城市间迁移时表现出严重退化,而某些领域特定的自监督方法可以显著减少位移和碰撞退化。在闭环评估中,自监督预训练在多个单城市训练设置中提高了平均分布外PDMS。我们的结果提供了经验证据,表明表示学习影响跨城市规划的鲁棒性,并促使将零样本地理迁移作为评估端到端自动驾驶系统的重要压力测试。

英文摘要

End-to-end autonomous driving models are typically trained on multi-city datasets using supervised ImageNet-pretrained backbones, yet their ability to generalize to unseen cities remains largely unexamined. When training and evaluation data are geographically mixed, models may implicitly rely on city-specific cues, masking failure modes that would occur under real-world domain shifts when generalizing to new locations. In this work, we formulate zero-shot cross-city transfer as a controlled representation-level stress test for end-to-end autonomous driving and ask how visual pretraining affects transfer behavior under geographic domain shift. We conduct a comprehensive study by integrating self-supervised backbones I-JEPA, DINOv2, and MAE into planning frameworks. We evaluate performance under strict geographic splits on nuScenes in the open-loop setting and on NAVSIM in the closed-loop evaluation protocol. Our experiments reveal a substantial generalization gap when transferring models across cities with different road topologies, traffic conventions, and visual environments. In open-loop evaluation, a supervised backbone exhibits severe degradation when transferring between cities, yet some domain-specific self-supervised methods can substantially reduce both displacement and collision degradation. In closed-loop evaluation, self-supervised pretraining improves average out-of-distribution PDMS in several single-city training settings. Our results provide empirical evidence that representation learning influences the robustness of cross-city planning and motivate zero-shot geographic transfer as an important stress test for evaluating end-to-end autonomous driving systems.

3. 强化学习与序列决策 26 篇

2606.18284 2026-06-18 cs.LG cs.AI cs.CL 新提交

Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier

打破求解器瓶颈:在可学习前沿训练任务生成器

Lorenz Wolf, Connor Watts, Roger Creus Castanyer, Geoffrey Bradway, Maxwill Lin, Augustine N. Mavor-Parker, Matthew Daborn-Sargent

发表机构 * Vmax Goodfire AI

AI总结 提出PROPEL框架,通过训练轻量级激活探针作为求解率代理,在无需重复求解器评估的情况下优化任务生成器,使生成任务集中在可学习前沿,提升数学、代码和软件工程任务的有效性。

Comments 30 pages, 9 figures, 12 tables

AI中文摘要

通过强化学习训练智能体的限制资源日益成为前沿任务供给:有效、可求解且刚好足够困难以训练当前模型的任务。随着推理和智能体模型的改进,固定任务分布趋于饱和,而天真的合成生成产生琐碎、不可能或不适定的任务。用强化学习训练任务生成器以优化有效性和可学习性可以解决这一瓶颈,但直接优化需要对每个候选任务进行重复求解器评估。对于软件工程任务,单次评估可能耗时数十分钟;求解器在环的生成器训练是不可行的。我们提出PROPEL,一个求解器摊销框架,用于在目标求解率下训练任务生成器。PROPEL在一次性标注的生成任务和求解器结果语料库上训练一个轻量级激活探针。该探针从冻结的生成器参考模型预测目标求解器的通过率,并在生成器优化期间作为求解率的代理,将生成器评估简化为单次前向传播。在多种模型规模下的数学、代码和软件工程任务中,PROPEL将生成任务转向目标求解率:对于编程,在可学习前沿生成的任务从$10.1\% \rightarrow 20.0\%$(针对Qwen2.5-3B-Instruct求解器)和从$5.3\% \rightarrow 12.6\%$(针对Qwen2.5-7B-Instruct求解器)。对于软件工程,PROPEL将目标求解率下的生成份额从$9.8\% \rightarrow 19.6\%$(针对Qwen3.5-27B在探针和生成器训练期间未见过的仓库)。

英文摘要

The limiting resource for training agents via reinforcement learning (RL) is increasingly frontier task supply: valid, solvable tasks just difficult enough to train the current model. As reasoning and agentic models improve, fixed task distributions saturate, while naive synthetic generation yields tasks that are trivial, impossible, or ill-posed. Training a task generator with RL to optimize validity and learnability can address this bottleneck, but direct optimization requires repeated solver rollouts per candidate. For software-engineering (SWE) tasks, a single rollout can take tens of minutes; solver-in-the-loop generator training is intractable. We introduce PROPEL, a solver-amortized framework for training task generators at the targeted solve rate. PROPEL trains a lightweight activation probe on a one-time labeled corpus of generated tasks and solver outcomes. The probe predicts target-solver pass rate from a frozen generator reference model and serves as a proxy for solve rate during generator optimization, reducing generator evaluation to a single forward pass. Across math, code, and software-engineering at multiple model scales, PROPEL shifts generation toward the targeted solve rate: for coding, tasks generated at the learnable frontier increase from $10.1\% \rightarrow 20.0\%$ for a Qwen2.5-3B-Instruct solver and from $5.3\% \rightarrow 12.6\%$ for a Qwen2.5-7B-Instruct solver. For SWE, PROPEL increases the share of generations at the targeted solve rate from $9.8\% \rightarrow 19.6\%$ for Qwen3.5-27B on repositories not seen during training of probe and generator.

2606.18308 2026-06-18 cs.LG cs.AI 新提交

TRIDENT: Breaking the Hybrid-Safety-Physics Coupling for Provably Safe Multi-Agent Reinforcement Learning

TRIDENT: 打破混合安全-物理耦合以实现可证明安全的多智能体强化学习

Zijie Meng, Ziwei Li, Yufei Liu, Zhiyu Li, Jiyuan Liu, Wenhua Nie, Bingcai Wei, Miao Zhang

发表机构 * Peking University(北京大学) Xiamen University(厦门大学) National Taiwan University(国立台湾大学) WHU(武汉大学) THU / Jimei University(清华大学 / 集美大学)

AI总结 针对混合离散-连续动作、训练时安全约束和物理动力学形成的耦合问题,提出TRIDENT框架,通过Richardson-Romberg梯度校正、Lyapunov约束序列信任域更新和物理信息残差评论家,实现可证明的安全收敛,显著降低训练违规并提升奖励。

Comments 16 pages, 4 figures

AI中文摘要

网络化信息物理系统中的安全协调迫使学习算法同时处理混合离散-连续动作、严格的训练时安全约束和物理支配的动力学。我们证明这三个特征形成了一个有向偏差循环,击败了任何现成模块的朴素组合,并将其形式化为一个三向耦合引理。然后我们引入TRIDENT,这是第一个MARL框架,其三个组件被共同设计以消除每个泄漏:一个将Gumbel-Softmax偏差从O(tau)降低到O(tau^2)的Richardson-Romberg梯度校正,一个强制每次迭代可行性的Lyapunov约束顺序信任域更新,以及一个分解价值而非奖励的物理信息残差评论家。我们证明了以O~(1/sqrt(K))的收敛速率达到约束纳什均衡,以及O(sqrt(K))的累积违规界。在多无人机移动边缘计算、自主交叉口管理和混合SMAC变体上,TRIDENT相比MADDPG减少了95.5%的训练时违规,相比MACPO减少了76.3%,同时相比最强的无约束基线提高了13.5%的奖励。

英文摘要

Safe coordination in networked cyber-physical systems forces learning algorithms to simultaneously handle hybrid discrete-continuous actions, hard training-time safety constraints, and physics-governed dynamics. We show that these three features form a directed cycle of biases that defeats any naive composition of off-the-shelf modules, and formalize this as a three-way coupling lemma. We then introduce TRIDENT, the first MARL framework whose three components are co-designed to cancel each leak: a Richardson-Romberg gradient correction reducing Gumbel-Softmax bias from O(tau) to O(tau^2), a Lyapunov-constrained sequential trust-region update enforcing per-iterate feasibility, and a physics-informed residual critic that decomposes value rather than reward. We prove an O~(1/sqrt(K)) convergence rate to a constrained Nash equilibrium and an O(sqrt(K)) cumulative-violation bound. On multi-UAV mobile-edge computing, autonomous intersection management, and a hybrid SMAC variant, TRIDENT cuts training-time violations by 95.5% over MADDPG and 76.3% over MACPO, while improving reward by 13.5% over the strongest unconstrained baseline.
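
The reduction of bias from O(tau) to O(tau^2) mentioned in this abstract is the kind of gain that Richardson extrapolation gives in general. As a sketch only, assume the relaxed gradient estimator $g(\tau)$ at temperature $\tau$ has a smooth expansion around the unbiased gradient $g^\ast$ with leading coefficient $c_1$. The symbols $g$, $g^\ast$, and $c_1$ are illustrative and not taken from the paper, whose exact correction may differ.

```latex
\begin{align*}
g(\tau)   &= g^\ast + c_1 \tau + O(\tau^2), \\
g(\tau/2) &= g^\ast + \tfrac{1}{2} c_1 \tau + O(\tau^2), \\
2\,g(\tau/2) - g(\tau) &= g^\ast + O(\tau^2).
\end{align*}
```

The first-order terms cancel in the combination on the last line, at the cost of evaluating the estimator at two temperatures.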

2606.18327 2026-06-18 cs.LG cs.AI 新提交

Self-CTRL: Self-Consistency Training with Reinforcement Learning

Self-CTRL:基于强化学习的自一致性训练

Itamar Pres, Laura Ruis, Melat Ghebreselassie, Belinda Z. Li, Jacob Andreas

发表机构 * MIT CSAIL(麻省理工学院计算机科学与人工智能实验室)

AI总结 提出Self-CTRL方法,通过强化学习优化语言模型自我解释与行为之间的一致性,在概率推理和宪法AI任务上显著提升一致性和安全性。

Comments 34 pages, 12 figures, includes appendices

详情
AI中文摘要

能够忠实描述自身行为的语言模型(LMs)更容易被用户审计、理解和信任。本文描述了基于强化学习的自一致性训练(Self-CTRL),该方法通过更新解释以更好地预测行为或更新行为以更好地匹配解释,优化LM的自我解释与相关输入行为之间的一致性。我们在两个领域应用该方法。首先,研究一个形式化概率推理任务,其中LM必须学习模仿一组有偏采样器,并评估其报告相关偏差的能力。我们发现,一致性训练将自我报告和行为测量的潜在偏差之间的相关性从$R^2=0.24$提高到$R^2=0.64$(在保留分布上),匹配直接真实标签监督的泛化能力。其次,研究一个宪法AI领域,其中LM必须描述何时拒绝或遵守用户请求。在此,Self-CTRL产生忠实描述模型在保留请求上行为的规则,将第三方审计模型的拒绝预测从$36\%$提高到$92\%$。另一方面,行为更新改善了对齐,将HarmBench失败率从$15.0\%$降低到$0.5\%$,而不会显著增加对无害提示的拒绝。通过对齐解释和行为,我们的工作为训练更安全、更透明、更可控的AI模型提供了通用方法。

英文摘要

Language models (LMs) that faithfully describe their own behavior can more easily be audited, understood, and trusted by users. This paper describes Self-Consistency Training with Reinforcement Learning (Self-CTRL), a method that optimizes for consistency between an LM's self-explanations and its behavior on related inputs by updating explanations to better predict behavior or updating behavior to better match explanations. We apply our method in two domains. First, we study a formal probabilistic reasoning task in which LMs must learn to imitate a family of biased samplers and are evaluated on their ability to report the associated biases. We find that consistency training improves the correlation between self-reported and behaviorally-measured latent biases from $R^2=0.24$ to $R^2=0.64$ on a set of held-out distributions, matching the generalization of direct ground-truth supervision. Second, we study a constitutional AI domain in which LMs must describe when they will refuse or comply with user requests. Here, Self-CTRL produces rules that faithfully describe the model's behavior on held-out requests, improving the refusal predictions of a third-party auditor model from $36\%$ to $92\%$. In the other direction, behavior updates improve alignment, reducing HarmBench failure rate from $15.0\%$ to $0.5\%$ without substantially increasing refusal on harmless prompts. By aligning explanations and behavior, our work provides a general recipe for training AI models to be safer, more transparent, and more controllable.

2606.18469 2026-06-18 cs.LG cs.AI 新提交

Structured Representation Learning with Locally Linear Embeddings and Adaptive Feature Fusion

基于局部线性嵌入与自适应特征融合的结构化表示学习

Somjit Nath, Jackson J Cone, Derek Nowrouzezahrai, Samira Ebrahimi Kahou

发表机构 * Mila – Quebec AI Institute(米拉-魁北克人工智能研究所)

AI总结 受神经科学启发,提出一种强化学习框架,利用局部线性嵌入捕捉状态局部结构,并通过注意力机制自适应融合动态与奖励特征,提升学习效率。

Comments Published in Transactions on Machine Learning Research (04/2026)

详情
AI中文摘要

神经科学研究揭示,大脑通过利用结构化的低维流形和自适应门控机制动态融合多源信息来编码复杂行为。受这些原理启发,我们提出了一种新颖的强化学习(RL)框架,鼓励分离动态特定和奖励特定特征,直接类比神经回路如何分离和整合信息以实现高效决策。我们的方法利用局部线性嵌入(LLE)来捕捉许多环境中固有的局部线性结构,反映神经群体活动中观察到的局部平滑性,同时通过标准RL目标推导奖励特定特征。一种类似于皮层门控的注意力机制,在逐状态基础上自适应地融合这些互补表示。在基准任务上的实验结果表明,我们的方法基于神经科学原理,相比传统RL方法提高了学习效率和整体性能,凸显了显式建模局部状态结构和自适应特征选择(如生物系统中观察到的)的优势。

英文摘要

Neuroscientific research has revealed that the brain encodes complex behaviors by leveraging structured, low-dimensional manifolds and dynamically fusing multiple sources of information through adaptive gating mechanisms. Inspired by these principles, we propose a novel reinforcement learning (RL) framework that encourages the disentanglement of dynamics-specific and reward-specific features, drawing direct parallels to how neural circuits separate and integrate information for efficient decision-making. Our approach leverages locally linear embeddings (LLEs) to capture the intrinsic, locally linear structure inherent in many environments, mirroring the local smoothness observed in neural population activity, while concurrently deriving reward-specific features through the standard RL objective. An attention mechanism, analogous to cortical gating, adaptively fuses these complementary representations on a per-state basis. Experimental results on benchmark tasks demonstrate that our method, grounded in neuroscientific principles, improves learning efficiency and overall performance compared to conventional RL approaches, highlighting the benefits of explicitly modeling local state structures and adaptive feature selection as observed in biological systems.

2606.18503 2026-06-18 cs.LG stat.ML 新提交

Quantum Annealing Enhanced Reinforcement Learning for Accurate Remaining Useful Lifetime Prediction

量子退火增强强化学习用于精确剩余使用寿命预测

Manoranjan Gandhudi, Arunkumar V., G. R. Anil, Gangadharan G. R

发表机构 * Central University of Karnataka(卡纳塔克中央大学) University College of Engineering, Anna University(安娜大学工程学院) AIONOS India Pvt Ltd(AIONOS印度私人有限公司) National Institute of Technology Tiruchirappalli(蒂鲁吉拉帕利国立理工学院)

AI总结 提出量子退火增强Q学习框架,通过将Q值更新编码为QUBO问题并利用量子退火采样实现随机动作选择,解决高维非凸空间中的收敛问题,在C-MAPSS和工业数据集上显著优于基线方法。

Comments 29 pages, 6 figures, 12 tables

详情
AI中文摘要

剩余使用寿命(RUL)估计是预测性维护的核心,意外故障的成本可能远超资产本身。统计退化模型忽略了真实系统的强非线性,而数据驱动模型在高维非凸搜索空间中常收敛到次优解。我们提出量子退火增强Q学习(QAQL)框架,将量子退火的采样行为与Q学习的序列决策相结合。每个Q值更新被编码为一个小的二次无约束二元优化(QUBO)问题,其基态对应贪婪动作;退火器不是作为确定性优化器,而是在多次读取中返回一个近最优动作的分布,这种随机动作选择提供了探索,从而抑制了在非线性退化轨迹上的过早收敛。QUBO在D-Wave Advantage系统上通过子式嵌入(minor embedding)求解,退火器被嵌入强化学习循环中,而非训练后附加。我们在两个公开基准上验证了QAQL:NASA C-MAPSS涡扇发动机数据集和一个设备群预测性维护数据集。在多次独立运行和六个误差指标上平均,QAQL优于本研究考虑的经典和量子基线,具有统计显著性改进。结果表明,量子退火是工业预测性维护应用中强化学习循环内一个可用的(而非仅理论上的)优化器。

英文摘要

Remaining useful life (RUL) estimation is central to predictive maintenance, where an unplanned failure can cost far more than the asset itself. Statistical degradation models miss the strong nonlinearity of real systems, and data-driven models often converge to suboptimal solutions in high-dimensional, non-convex search spaces. We propose a Quantum Annealing enhanced Q-Learning (QAQL) framework that couples the sampling behaviour of quantum annealing with the sequential decision making of Q-learning. Each Q-value update is encoded as a small quadratic unconstrained binary optimization (QUBO) whose ground state is the greedy action; rather than acting as a deterministic optimizer, the annealer returns a distribution over near-optimal actions across many reads, and this stochastic action selection supplies the exploration that curbs premature convergence on nonlinear degradation trajectories. The QUBO is solved on the D-Wave Advantage system using minor embedding, with the annealer woven into the reinforcement-learning loop rather than bolted on after training. We validate QAQL on two public benchmarks: the NASA C-MAPSS turbofan engine datasets and a device-fleet predictive maintenance dataset. Averaged over many independent runs and across six error metrics, QAQL outperforms the classical and quantum baselines considered in this study, with statistically significant improvements. The results indicate that quantum annealing is a usable, not merely theoretical, optimizer inside a reinforcement-learning loop for industrial predictive-maintenance applications.
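
The idea of a QUBO whose ground state is the greedy action can be illustrated with a one-hot encoding. This is a minimal sketch under stated assumptions, not the authors' encoding: the penalty construction, `build_qubo`, `sample_actions`, and the Boltzmann sampler that stands in for annealer reads are all hypothetical, and no D-Wave API is used.

```python
import itertools
import math
import random


def build_qubo(q_values, penalty=None):
    """One-hot QUBO: minimise -sum(Q_i x_i) + P * (sum(x_i) - 1)^2 (constant dropped)."""
    n = len(q_values)
    if penalty is None:
        # Assumption: a penalty above 2 * max|Q| keeps the ground state one-hot.
        penalty = 2.0 * max(abs(q) for q in q_values) + 1.0
    qubo = {}
    for i in range(n):
        qubo[(i, i)] = -q_values[i] - penalty      # linear terms on the diagonal
        for j in range(i + 1, n):
            qubo[(i, j)] = 2.0 * penalty           # pairwise one-hot penalty
    return qubo


def energy(qubo, bits):
    """QUBO energy of a 0/1 assignment."""
    return sum(w * bits[i] * bits[j] for (i, j), w in qubo.items())


def sample_actions(q_values, num_reads=200, temperature=0.5, seed=0):
    """Classical stand-in for annealer reads: Boltzmann sampling over all bitstrings."""
    rng = random.Random(seed)
    qubo = build_qubo(q_values)
    states = list(itertools.product((0, 1), repeat=len(q_values)))
    energies = [energy(qubo, s) for s in states]
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    reads = rng.choices(states, weights=weights, k=num_reads)
    ground = states[energies.index(e_min)]
    return ground, reads
```

With `q_values = [0.1, 0.9, 0.4]` the ground state selects action 1, while the individual reads spread over low-energy states, which is the kind of stochastic selection the abstract describes as a source of exploration.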

2606.18537 2026-06-18 cs.LG 新提交

Do as the Romans Do: Learning Universal Behaviors from Heterogeneous Agents

入乡随俗:从异构智能体学习通用行为

Caleb Chang, Davin Win Kyi, Natasha Jaques, Karen Leung

发表机构 * University of Washington(华盛顿大学) NVIDIA(英伟达)

AI总结 提出GRID方法,从追求不同目标的异构示范者中提取通用奖励,训练通用智能体以学习环境通用能力,避免模式平均偏差,提升下游任务微调效率。

详情
AI中文摘要

人类通常通过观察他人来获取新技能,因为观察到的行为隐含地揭示了如何在环境中行动。然而,从异构群体中获得的观察会引入冲突的行为信号,使得难以确定哪些行为值得模仿。我们通过通用奖励推断与解耦(GRID)来解决这一挑战,这是一种从追求不同目标的异构示范者群体中提取普遍有用行为的社会学习方法。GRID将每个智能体的奖励函数分解为通用奖励(捕捉所有智能体共享的行为)和特定奖励(捕捉个体偏好和目标)。仅基于通用奖励进行训练提供了一种通用预训练的新范式。它产生了一个通用智能体,该智能体内化了通用的环境能力,如安全性和基本任务熟练度,而不会出现困扰标准从示范学习技术的模式平均偏差。这个通用智能体作为微调到下游任务(包括训练中未见过的偏好)的优越先验。在合成基函数分解、多智能体Craftax和连续自动驾驶模拟器(Highway-Env)上的实验证实,GRID以语义上有意义的方式成功解耦了奖励结构,优于标准的从示范学习基线,并实现了更高效和稳定的特化。

英文摘要

Humans often acquire new skills by observing others, since observed behaviors implicitly reveal how to act in an environment. However, observations drawn from a heterogeneous population introduce conflicting behavioral signals, making it difficult to determine which behaviors are worth imitating. We address this challenge with General Reward Inference and Disentanglement (GRID), a social learning method that extracts universally useful behaviors from a heterogeneous population of demonstrators pursuing different goals. GRID decomposes per-agent reward functions into a general reward, capturing behaviors shared across all agents, and specific rewards, capturing individual preferences and objectives. Training exclusively on the general reward provides a new paradigm of generalist pretraining. It yields a generalist agent that internalizes universal environmental competencies, such as safety and basic task proficiency, without the mode-averaging bias that afflicts standard learning from demonstration techniques. This generalist serves as a superior prior for fine-tuning to downstream tasks, including preferences unseen during training. Experiments across a synthetic basis function decomposition, multi-agent Craftax, and a continuous autonomous driving simulator (Highway-Env) confirm that GRID successfully disentangles reward structure in a semantically meaningful way, outperforms standard learning from demonstration baselines, and enables more efficient and stable specialization.

2606.18785 2026-06-18 cs.LG cs.AI 新提交

Bayesian Anytime Pareto Set Identification for Multi-Objective Multi-Armed Bandits

贝叶斯任意时间帕累托集识别用于多目标多臂老虎机

Lennert Saerens, Bram Silue, Eleni Litsa, Peter Vrancx, Pieter Libin

发表机构 * imec Data Science Institute, Interuniversity Institute of Biostatistics and Statistical Bioinformatics, UHasselt(哈瑟尔特大学生物统计学与统计生物信息学跨大学研究所数据科学研究所)

AI总结 提出首个任意时间多目标多臂老虎机算法Top-Two帕累托前沿汤普森采样(TTPFTS),用于帕累托集识别,在合成环境和超大型分子库中验证有效性,并引入不确定性量化指标。

Comments 26 pages, 13 figures

详情
AI中文摘要

识别帕累托最优解对于支持多目标决策至关重要。我们首次提出了一种用于帕累托集识别问题的任意时间多目标多臂老虎机算法,采用贝叶斯方法:Top-Two帕累托前沿汤普森采样(TTPFTS)。我们在合成环境中将TTPFTS与最先进的固定预算帕累托集识别算法进行基准测试。接下来,我们通过高效探索超大型按需合成分子库,在具有挑战性的多目标分子发现场景中展示了其实用性。此外,我们引入了一种新颖的不确定性量化指标,用于估计算法在预测帕累托集上的置信度。我们证明该指标有效代理真实性能,为监控复杂环境中的学习进度提供了一种稳健的方法。最后,我们用算法渐近正确性的理论证明补充了这些实证发现。

英文摘要

Identifying Pareto optimal solutions is critical to support multi-objective decision-making. We introduce the first anytime Multi-Objective Multi-Armed Bandit algorithm for the Pareto Set Identification problem, taking a Bayesian approach: Top-Two Pareto Front Thompson Sampling (TTPFTS). We benchmark TTPFTS against state-of-the-art fixed-budget Pareto Set Identification algorithms on synthetic environments. Next, we demonstrate its practical utility in a challenging multi-objective molecular discovery setting by efficiently exploring an ultra-large synthesis-on-demand molecular library. Furthermore, we introduce a novel uncertainty quantification metric that estimates our algorithm's confidence in the predicted Pareto set. We demonstrate that this metric effectively proxies true performance, yielding a robust methodology for monitoring learning progress in complex settings. Finally, we complement these empirical findings with a theoretical proof of the algorithm's asymptotic correctness.
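
As a rough illustration of Bayesian Pareto set identification, the sketch below draws one Thompson-style sample of each arm's mean vector and returns the arms that are non-dominated under that draw. It assumes independent Gaussian posteriors per objective; the function names are hypothetical and the top-two selection logic of TTPFTS is not reproduced.

```python
import random


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_set(means):
    """Indices of arms whose mean vectors are not dominated by any other arm."""
    return [
        i for i, m in enumerate(means)
        if not any(dominates(other, m) for j, other in enumerate(means) if j != i)
    ]


def sampled_pareto_set(post_mean, post_std, rng):
    """Draw one posterior sample per arm and objective, then take its Pareto set."""
    draw = [
        [rng.gauss(m, s) for m, s in zip(arm_mean, arm_std)]
        for arm_mean, arm_std in zip(post_mean, post_std)
    ]
    return pareto_set(draw)
```

Repeating `sampled_pareto_set` many times and counting how often each arm appears gives a simple, informal view of posterior confidence in the predicted Pareto set.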

2606.18810 2026-06-18 cs.LG cs.AI 新提交

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

从自身解中学习:面向可验证奖励强化学习的自条件化信用分配

Yingyu Shan, Yuhang Guo, Zihao Cheng, Zeming Liu, Xiangrong Zhu, Xinyi Wang, Jiashu Yao, Wei Lin, Hongru Wang, Heyan Huang

发表机构 * Beijing Institute of Technology(北京理工大学) Beihang University(北京航空航天大学) Independent Researcher(独立研究者)

AI总结 提出SC-GRPO方法,利用自条件化分布间的KL散度作为GRPO梯度的乘性权重,实现细粒度信用分配,在数学、代码和智能体任务上平均提升8.1%。

详情
AI中文摘要

具有可验证奖励的强化学习(RLVR)在训练LLMs进行推理任务方面取得了显著进展,但代表性方法如GRPO对所有token分配统一信用,浪费了常规token上的梯度,同时低估了关键推理步骤。现有的token级信用分配方法需要超出模型自身rollout的资源。GRPO变体依赖于过程奖励模型或真实答案。知识蒸馏通过每个token的散度分配信用,但需要外部教师(在线策略蒸馏)或特权信息(在线策略自蒸馏)。然而,这些依赖性限制了在纯RLVR设置中的适用性。我们观察到,将模型以其自身验证过的轨迹为条件,会在原始分布和条件分布之间诱导出可测量的每token KL散度,并证明当存在多个验证过的轨迹时,从由验证过的轨迹构建的自教师进行蒸馏会导致不可行的加权平均解。我们提出SC-GRPO(自条件化GRPO),它使用前述KL散度作为GRPO梯度的乘性权重。在涵盖数学、代码和智能体任务的五个基准上,SC-GRPO一致优于GRPO 8.1%,优于DAPO 5.9%,并具有更强的分布外性能。此外,SC-GRPO实现了比OPD更高的性能。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit across all tokens, wasting gradient on routine tokens while under-crediting pivotal reasoning steps. Existing token-level credit assignment methods require resources beyond the model's own rollouts. GRPO variants rely on process reward models or ground-truth answers. Knowledge distillation assigns credit through per-token divergence but requires external teachers (On-Policy Distillation, OPD) or privileged information (On-Policy Self Distillation). These dependencies limit applicability in the pure RLVR setting. We observe that conditioning the model on its own verified trajectories induces a measurable per-token KL divergence between the original and conditioned distributions, and prove that distilling from a self-teacher constructed from verified trajectories leads to infeasible weighted-average solutions when multiple verified trajectories exist. We propose SC-GRPO (Self-Conditioned GRPO), which uses this per-token KL divergence as a multiplicative weight on GRPO gradients. Across five benchmarks spanning math, code, and agentic tasks, SC-GRPO consistently outperforms GRPO by 8.1% and DAPO by 5.9%, with stronger out-of-distribution (OOD) performance. Moreover, SC-GRPO achieves higher performance than OPD.
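
The per-token weighting can be sketched as follows. This is an illustration only: the KL direction (conditioned against original), the normalisation of weights to mean 1, and the names `kl_divergence` and `kl_weighted_advantages` are assumptions, since the abstract does not specify them.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete next-token distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def kl_weighted_advantages(cond_dists, base_dists, advantage):
    """Scale one sequence-level advantage by a per-token KL weight (mean weight = 1)."""
    kls = [kl_divergence(c, b) for c, b in zip(cond_dists, base_dists)]
    mean_kl = sum(kls) / len(kls)
    if mean_kl == 0.0:
        # No divergence anywhere: fall back to uniform credit.
        return [advantage] * len(kls)
    return [advantage * k / mean_kl for k in kls]
```

Tokens where conditioning on a verified trajectory changes the distribution most receive the largest share of the advantage, while tokens with unchanged distributions receive almost none.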

2606.18812 2026-06-18 cs.LG cs.AI 新提交

Reinforcement Learning Foundation Models Should Already Be A Thing

强化学习基础模型本应已经存在

Abdelrahman Zighem, Jill-Jênn Vie

发表机构 * École normale supérieure de Paris, PSL University, Paris, France(巴黎高等师范学院,PSL大学,法国巴黎) Soda team, Inria Saclay, Palaiseau, France(Soda团队,法国国家信息与自动化研究所萨克雷中心,法国帕莱索)

AI总结 提出通过合成MDP构建强化学习基础模型,利用固定大小的充分统计量使注意力架构适用,在线和离线实验均优于传统算法。

详情
AI中文摘要

语言和视觉的基础模型由互联网规模的数据驱动,而结构化领域(表格预测、时间序列预测、图学习、强化学习)则不然。替代方案是合成数据,它将负担从收集转移到先验设计。这种先验已经存在于许多结构化任务中:TabPFN及其后续工作通过一个在合成贝叶斯先验上预训练的Transformer解决表格分类问题。我们提出两点。\textbf{首先},强化学习是明显的空白:采样一个合成MDP与采样一个合成表格数据集一样可行,然而没有上下文强化学习工作将先验设计作为主要目标。\textbf{其次},MDP允许一个固定大小的充分统计量,独立于观察到的回合且形状为表格形式,这使得它们直接适用于用于表格基础模型的基于注意力的架构,只需将策略头替换监督目标。这些共同定义了强化学习基础模型的议程。作为概念验证,我们完全在合成MDP上训练一个模型,并表明,无需任务特定的调优,它就能在上下文中解决留出的表格基准,包括在线和离线:在线时,使用比UCB-VI和表格Q-learning少得多的回合;离线时,与VI-LCB竞争。

英文摘要

Foundation models for language and vision are powered by internet-scale data, while structured domains (tabular prediction, time-series forecasting, graph learning, reinforcement learning) are not. The substitute is synthetic data, which shifts the burden from collection to prior design. Such priors already exist for many structured tasks: TabPFN and its successors solve tabular classification with a transformer pretrained on a synthetic Bayesian prior. We make two points. \textbf{First}, reinforcement learning is the conspicuous gap: sampling a synthetic MDP is as feasible as sampling a synthetic tabular dataset, yet no in-context RL work treats prior design as a primary objective. \textbf{Second}, MDPs admit a fixed-size sufficient statistic, independent of the episodes observed and tabular in shape, which makes them directly amenable to the attention-based architectures used for tabular foundation models, with a policy head replacing the supervised target. Together these define the agenda for an RL foundation model. As a proof of concept, we train one model entirely on synthetic MDPs and show that, with no task-specific tuning, it solves held-out tabular benchmarks in context, both online and offline: online, in far fewer episodes than UCB-VI and tabular Q-learning, and offline, competitively with VI-LCB.
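
To make the "fixed-size sufficient statistic" concrete, one common choice for tabular MDPs is a table of transition counts and reward sums. The sketch below assumes that choice; the abstract does not say which statistic the authors use, and `sufficient_statistic` is a hypothetical name.

```python
def sufficient_statistic(transitions, num_states, num_actions):
    """Summarise (s, a, r, s_next) tuples as count and reward-sum tables.

    The output shape depends only on num_states and num_actions,
    not on how many episodes or transitions were observed.
    """
    counts = [
        [[0] * num_states for _ in range(num_actions)]
        for _ in range(num_states)
    ]
    reward_sums = [[0.0] * num_actions for _ in range(num_states)]
    for s, a, r, s_next in transitions:
        counts[s][a][s_next] += 1
        reward_sums[s][a] += r
    return counts, reward_sums
```

Because the table has one row per state-action pair, it resembles the input of a tabular foundation model, which is the connection the abstract draws.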

2606.18820 2026-06-18 cs.LG cs.AI 新提交

Maturing Markov Decision Processes: Decision Making under Increasing Information and Shrinking Action Sets

成熟马尔可夫决策过程:信息增加与动作集缩小下的决策制定

Jiaxi Liu, Aiping Yang, Yuhang Yang, Shuqi Zhang, Zewei Dong, Jiangming Yang, Xuebin Chen

发表机构 * Ant International(蚂蚁国际) School of Economics, Sichuan University(四川大学经济学院) School of Economics, Fudan University(复旦大学经济学院)

AI总结 针对决策过程中信息增加与动作集缩小的不对称性,提出成熟马尔可夫决策过程(MMDP)框架,并基于过期动作优先级原则开发结构感知强化学习方法,实验证明其能提升学习效率。

Comments 25 pages, 9 figures

详情
AI中文摘要

序列决策问题通常表现出信息和决策灵活性的不对称演化:随着决策周期的展开,智能体获得更丰富的信息,而由于操作截止、承诺或资源约束,可行动作逐渐过期。标准的MDP公式通常将这种结构扁平化为阶段相关的状态描述和动作掩码,从而掩盖了嵌套的信息-动作不对称性,而这种不对称性决定了哪些决策是紧急的、哪些可以推迟。我们引入了成熟马尔可夫决策过程(MMDP),这是一种围绕这种信息-动作不对称性构建的公式。我们通过一个过期动作优先级原则来刻画其关键后果之一,该原则识别出必须在下一阶段之前解决的动作。受此结构启发,我们开发了一个结构感知的强化学习框架,包括阶段感知的策略设计、过期动作抽象以及带有蒸馏的搜索增强学习。在受控的多供应商补货问题、复杂度递增的简化现金管理环境以及生产级模拟器上的实验表明,显式建模这种不对称性可以提高学习效率,并且随着决策问题的规模扩大,其价值日益增加。

英文摘要

Sequential decision problems often exhibit an asymmetric evolution of information and decision flexibility: as a decision cycle unfolds, the agent receives richer information while feasible actions expire due to operational cutoffs, commitments, or resource constraints. Standard MDP formulations typically flatten this structure into stage-dependent state descriptions and action masks, thereby obscuring the nested information--action asymmetry that determines which decisions are urgent and which can be deferred. We introduce Maturing Markov Decision Processes (MMDPs), a formulation built around this information--action asymmetry. We characterize one of its key consequences through an expiring-action priority principle, which identifies the actions that must be resolved before the next stage. Motivated by this structure, we develop a structure-aware reinforcement learning framework with stage-aware policy design, expiring-action abstraction, and search-augmented learning with distillation. Experiments on a controlled multi-supplier replenishment problem, simplified cash-management environments of increasing complexity, and a production-scale simulator show that explicitly modeling this asymmetry improves learning efficiency and becomes increasingly valuable as decision problems scale.

2606.18910 2026-06-18 cs.LG cs.CL 新提交

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

REVES:通过修订与验证增强的测试时扩展训练

Yuanxin Liu, Ruida Zhou, Xinyan Zhao, Amr Sharaf, Hongzhou Lin, Arijit Biswas, Mohammad Ghavamzadeh, Zhaoran Wang, Mingyi Hong

发表机构 * Northwestern University(西北大学) Amazon AGI(亚马逊人工智能实验室) Qualcomm AI Research(高通人工智能研究) University of Minnesota(明尼苏达大学)

AI总结 提出REVES框架,通过将中间步骤的“接近正确”答案转化为解耦的修订和验证提示,实现高效的离策略数据生成,提升大语言模型的多步推理能力,在LiveCodeBench上比强化学习基线高6.5分。

详情
AI中文摘要

通过顺序修订进行测试时扩展已成为增强大语言模型(LLM)推理能力的强大范式。然而,标准的后训练方法主要优化单次目标,与多步推理动态存在根本性不匹配。虽然最近的工作将其视为多轮强化学习(RL),但传统方法直接优化多步轨迹,未能进一步利用模型可以从纠正中学习的中间步骤中的高质量错误。我们提出了一个两阶段迭代框架,交替进行在线数据/提示增强和策略优化。通过将成功恢复轨迹中的中间步骤(“接近正确”答案)转化为解耦的修订和验证提示,我们的方法将训练集中在有效的答案转换和错误识别上。与标准的多轮RL相比,这种方法实现了高效的离策略数据生成,并减少了长程采样的计算开销。在LiveCodeBench上,使用公开可用的测试用例作为反馈,我们观察到比RL基线高6.5分,比标准多轮训练高4.0分。除了编码,我们的方法在圆填充问题上达到了先前报告的SOTA结果,同时使用了最小的基础模型(4B)和远少于更大进化搜索系统的采样次数。在真实验证下的数学结果进一步证实了改进的纠正能力。该方法还泛化到分布外的约束满足谜题,如n皇后和迷你数独,其中正确性完全由问题约束定义。代码可在 https://github.com/yxliu02/REVES.git 获取。

英文摘要

Test-time scaling via sequential revision has emerged as a powerful paradigm for enhancing Large Language Model (LLM) reasoning. However, standard post-training methods primarily optimize single-shot objectives, creating a fundamental misalignment with multi-step inference dynamics. While recent work treats this as multi-turn reinforcement learning (RL), conventional approaches optimize over the multi-step trajectories directly, failing to exploit the high-quality mistakes in intermediate steps, which the model can learn from by correcting them. We propose a two-stage iterative framework that alternates between online data/prompt augmentation and policy optimization. By converting the intermediate steps ("near-miss" answers) in the successful recovery trajectories into decoupled revision and verification prompts, our approach concentrates training on both effective answer transformation and error identification. This approach enables efficient off-policy data generation and reduces the computational overhead of long-horizon sampling compared to standard multi-turn RL. On LiveCodeBench, using publicly available test cases as feedback, we observe gains of +6.5 points over the RL baseline and +4.0 points over standard multi-turn training. Beyond coding, our approach matches the previously reported SOTA result on circle packing while using the smallest base model (4B) and far fewer rollouts than the much larger evolutionary search systems. Math results under ground-truth verification further confirm improved correction ability. It also generalizes to out-of-distribution constraint-satisfaction puzzles such as n_queens and mini_sudoku, where correctness is defined entirely by problem constraints. Code is available at https://github.com/yxliu02/REVES.git.

2606.18963 2026-06-18 cs.LG 新提交

Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

无环境奖励的固定通道感知事件流在线奖惩学习

Zirong Li

发表机构 * Zirong Li

AI总结 提出OHIRL框架,在无标量奖励下通过固定通道感知流进行在线奖惩学习,利用内部轨迹评估器推断感知维度的效价,在XOR任务和CartPole等控制任务中达到高准确率。

Comments 9 pages, 5 figures, 6 tables; 13-page technical supplement

详情
AI中文摘要

我们研究当环境不提供标量奖励或评估标签时的在线奖惩学习。在每一步,智能体仅接收一个固定通道的感知数据包,诸如疼痛、能量、接触、损伤或认知错误等量被视为感知维度,其效价必须从转移后果中推断。OHIRL分离了四个角色:M_psi学习下一数据包预测,D_omega建模残差动力学,C_eta是一个固定的内部转移后轨迹评估器,B_xi学习使用由此产生的价值证据进行后续策略更新和动作评分。C_eta采用恢复正性、持久/增长负性的残差调节取向;系数来源审计显示,等单元、原始等值和随机单调变体保留了超过92%的已发布顶级动作排名,而符号反转保留了0%。无奖励协议暴露观察转移,同时隐藏环境奖励、延迟外部评估器、成功标签和动作好坏标签。条件误差分解将B_xi的证据估计误差与残差策略优化误差分离。在2x2-XOR数据包任务中,药物和辣椒在视觉XOR上下文中获得相反的价值,并且相同的疼痛或辣度增加可能根据后果结构为正或负;B_xi达到0.952的平衡奖励符号准确率。在完整的在线交错审计中,M_psi达到留出R2=0.907,B_xi达到0.940的符号准确率,策略达到0.979的最优动作准确率,而即时数据包分数、预测误差奖励、打乱目标、零奖励和误差减少控制均崩溃。隐藏奖励的CartPole和Taxi控制、公共上下文无泄漏审计以及模块角色消融进一步测试了信息边界和组件必要性。

英文摘要

We study online reward-punishment learning when the environment provides no scalar reward or evaluative label. At each step the agent receives only a fixed-channel perceptual packet, and quantities such as pain, energy, contact, damage, or cognitive error are treated as perceptual dimensions whose valence must be inferred from transition consequences. OHIRL separates four roles: M_psi learns next-packet prediction, D_omega models residual dynamics, C_eta is a fixed internal post-transition trajectory evaluator, and B_xi learns to use the resulting value evidence for later policy updates and action scoring. C_eta uses a recovery-positive and persistence/growth-negative residual-regulation orientation; a coefficient-origin audit shows that equal-unit, raw-equal, and random monotone variants preserve more than 92% of the released top-action rankings, while sign inversion preserves 0%. The reward-free protocol exposes observation transitions while withholding environment rewards, delayed external evaluators, success labels, and action-goodness labels. A conditional error decomposition separates B_xi evidence-estimation error from residual policy-optimization error. In a 2x2-XOR packet task, medicine and chili acquire opposite value under visual XOR contexts, and the same pain or spice increase can be positive or negative depending on consequence structure; B_xi reaches 0.952 balanced reward-sign accuracy. In a full online-interleaved audit, M_psi reaches holdout R2=0.907, B_xi reaches 0.940 sign accuracy, and the policy reaches 0.979 optimal-action accuracy, while immediate packet scores, prediction-error rewards, shuffled targets, zero reward, and error-reduction controls collapse. Hidden-reward CartPole and Taxi controls, public-context no-leakage audits, and module-role ablations further test information boundaries and component necessity.

2606.19134 2026-06-18 cs.LG cs.AI 新提交

Pareto Q-Learning with Reward Machines

带奖励机的帕累托Q学习

Arnaud Lequen, Clément Legrand-Lixon, Léo Saulières

AI总结 提出PQLRM算法,结合帕累托Q学习和奖励机,在多目标强化学习中高效逼近帕累托前沿,并处理非马尔可夫奖励。

Comments Accepted at the ICAPS 2026 Workshop on Bridging the Gap Between AI Planning and (Reinforcement) Learning (PRL)

详情
AI中文摘要

我们提出了带奖励机的帕累托Q学习(PQLRM),这是一种用于任务的多目标强化学习算法,其奖励结构由一组奖励机(RMs)指定。PQLRM结合了帕累托Q学习(PQL)(该方法维护向量值Q估计的集合以逼近帕累托前沿)和带奖励机的Q学习(QRM)的增强(该方法利用奖励信号的因子化自动机结构)。这产生了一种多策略算法,在非马尔可夫、RM编码的奖励下保持样本效率。实验表明,PQLRM比应用于叉积MDP的朴素PQL基线收敛更快,并且可以合成QRM无法获得的帕累托最优策略。

英文摘要

We present Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm for tasks whose reward structure is specified by a set of reward machines (RMs). PQLRM combines Pareto Q-Learning (PQL), which maintains sets of vector-valued Q-estimates to approximate the Pareto front, with enhancements from Q-Learning with Reward Machines (QRM), which exploits the factored automaton structure of the reward signal. This yields a multi-policy algorithm that remains sample-efficient under non-Markovian, RM-encoded rewards. Experimental trials show that PQLRM converges faster than a naive PQL baseline applied to the cross-product MDP and can synthesize Pareto-optimal policies that QRM cannot.
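
A reward machine can be pictured as a small automaton that tracks task progress and emits rewards on its transitions, which is what makes the reward non-Markovian in the environment state alone. The sketch below is a generic illustration with hypothetical names and a toy "coffee then office" task. It is not the PQLRM algorithm.

```python
class RewardMachine:
    """Finite automaton mapping (machine state, event label) to (next state, reward)."""

    def __init__(self, initial, transitions):
        self.initial = initial
        self.transitions = transitions  # dict: (state, label) -> (next_state, reward)
        self.state = initial

    def reset(self):
        self.state = self.initial

    def step(self, label):
        # Unlisted (state, label) pairs leave the machine where it is with zero reward.
        next_state, reward = self.transitions.get((self.state, label), (self.state, 0.0))
        self.state = next_state
        return reward


def vector_reward(machines, label):
    """One reward component per machine, giving a multi-objective reward vector."""
    return tuple(m.step(label) for m in machines)


# Objective 1: fetch coffee, then reach the office. Objective 2: reach the office at all.
deliver = RewardMachine("u0", {("u0", "coffee"): ("u1", 0.0), ("u1", "office"): ("u2", 1.0)})
visit = RewardMachine("v0", {("v0", "office"): ("v1", 1.0)})
```

Feeding the same label `"office"` at different points in a trajectory yields different reward vectors, depending on what each machine has already seen.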

2606.19199 2026-06-18 cs.LG cs.AI 新提交

Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times

预测关键因素:面向决策的强化学习用于未知离开时间的受控电动汽车充电

Giuseppe Gabriele, Fabio Pavirani, Seyed Soroush Karimi Madahi, Chris Develder

发表机构 * Ghent University -- imec(根特大学 -- imec)

AI总结 针对电动汽车充电中离开时间未知导致强化学习策略效果差的问题,提出面向决策的强化学习框架,联合训练预测器与控制器,实现端到端优化,使总奖励最高提升14%,未供应能量减少55%。

Comments ACM e-Energy 2026; 5 pages, 1 figure, 1 table

详情
AI中文摘要

近年来电动汽车的普及给电力系统带来了挑战,包括峰值需求增加和潜在的电网不稳定。基于强化学习的智能充电控制可以通过从历史数据中学习时间和上下文模式来缓解这些问题。然而,在现实场景中,关键特征(如离开时间)通常不可用。这使得强化学习智能体更难学习和执行有效的充电策略。为了减轻这种不确定性,训练好的预测器可以从可用数据中近似未知特征。然而,由于这些预测模型通常针对准确性(而非对下游智能体决策质量的影响)进行训练,它们的误差可能会传播并阻碍使用预测的控制器的整体性能。为了避免这种情况,我们提出了一种面向决策的强化学习(DF-RL)框架,其中预测器是端到端训练的,即通过强化学习智能体采取的充电策略动作的反馈。这种预测器和控制器的联合训练最终产生了更高质量的动作:与其他基线相比,我们提出的DF-RL方法产生了更优的充电决策;相对于没有离开时间预测的强化学习方法,总奖励最多提高14%,未供应能量(即由于电动汽车已离开而未能进行的充电)减少55%。

英文摘要

The recent growth of EV adoption poses challenges for power systems, including increased peak demand and potential grid instability. Smart control of EV charging -- e.g., based on reinforcement learning (RL) -- can alleviate these issues by learning temporal and contextual patterns from historical data. Yet, in real-world scenarios, key features, such as departure time, often are unavailable. This, in turn, makes it harder for an RL agent to learn and execute an effective charging policy. To mitigate this uncertainty, a trained forecaster can approximate the unknown features from available data. However, since these forecasting models are typically trained for accuracy (rather than their impact on a downstream agent's decision quality), their errors may propagate and hinder the overall performance of a controller that is using the forecasts. To avoid this, we propose a decision-focused RL (DF-RL) framework in which the forecaster is trained end-to-end, i.e., with feedback from the charging policy actions taken by the RL agent. Such joint training of both the forecaster and controller ultimately results in higher-quality actions: our proposed DF-RL method yields superior charging decisions compared to other baselines, achieving up to a 14% improvement in total reward and a 55% reduction of unsupplied energy (i.e., charging that failed to happen because the EV already left), relative to the RL method without departure time forecasting.

2606.19236 2026-06-18 cs.LG cs.AI cs.CL 新提交

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

STARE: 基于惊讶度的令牌级优势重加权以实现策略熵稳定性

Haipeng Luo, Qingfeng Sun, Songli Wu, Can Xu, Wenfeng Deng, Han Hu, Yansong Tang

发表机构 * Shenzhen International Graduate School, Tsinghua University(清华大学深圳国际研究生院) Tencent Hunyuan(腾讯混元)

AI总结 针对GRPO等RL算法中策略熵崩溃问题,提出STARE方法,通过惊讶度分位数识别熵关键令牌并重加权其优势,结合目标熵闭环门控稳定熵,在1.5B-32B模型和多种任务上实现稳定训练,AIME24/25准确率提升4%-8%。

Comments LLM, Reinforcement Learning

详情
AI中文摘要

基于可验证奖励的强化学习算法(如GRPO)已成为LLMs复杂推理的主流后训练范式,但通常在训练中遭受策略熵崩溃。我们对GRPO下的令牌级熵动态进行一阶梯度分析,识别出令牌级信用分配不匹配:每个令牌的熵变化分解为轨迹级优势与下一个令牌分布上的熵敏感函数的乘积,产生优势-惊讶度四象限结构和近临界性质。受此启发,我们提出STARE(基于惊讶度的令牌级优势重加权以实现策略熵稳定性),该方法通过批次内惊讶度分位数识别熵关键令牌子集,选择性重加权其有效优势,并引入目标熵闭环门控以实现稳定的熵调节。在1.5B至32B的模型规模以及三个任务族(短思维链、长思维链和多轮工具使用)上,STARE在数千步内维持稳定的RL训练,同时将策略熵保持在目标带内。在AIME24和AIME25上,STARE在平均准确率上比DAPO和其他竞争基线高出4%-8%,反思令牌和响应长度同步增长,表明持续探索-利用平衡进一步释放了RL训练潜力。代码可在 https://github.com/hp-luo/STARE 获取。

英文摘要

Reinforcement Learning with Verifiable Rewards algorithms such as GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a first-order gradient analysis of token-level entropy dynamics under GRPO and identify a token-level credit assignment mismatch: the per-token entropy variation decomposes into the product of the trajectory-level advantage and an entropy sensitivity function over the next-token distribution, yielding an advantage-surprisal four-quadrant structure and a near-criticality property. Motivated by this analysis, we propose STARE (Surprisal-guided Token-level Advantage Reweighting for policy Entropy stability), which identifies entropy-critical token subsets via batch-internal surprisal quantiles, selectively reweights their effective advantages, and incorporates a target-entropy closed-loop gate for stable entropy regulation. Across model scales from 1.5B to 32B and three task families (Short CoT, Long CoT, and Multi-Turn Tool Use), STARE sustains stable RL training over thousands of steps while maintaining policy entropy within the target band. On AIME24 and AIME25, STARE outperforms DAPO and other competitive baselines by 4%-8% in average accuracy, with reflection tokens and response length growing in tandem, indicating sustained exploration-exploitation balance that further unlocks RL training potential. Code is available at https://github.com/hp-luo/STARE.
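
The selection of tokens by batch-internal surprisal quantiles can be sketched as below. This is a simplified illustration: the quantile level, the fixed `scale` factor, and the function name are hypothetical, and the four-quadrant rule and target-entropy gate from the abstract are not modelled.

```python
import math


def reweight_by_surprisal(token_probs, advantages, quantile=0.75, scale=0.5):
    """Scale the advantage of tokens whose surprisal is at or above a batch quantile.

    token_probs: probability the policy assigned to each sampled token.
    advantages:  per-token copies of the trajectory-level advantage.
    """
    surprisals = [-math.log(p) for p in token_probs]
    ranked = sorted(surprisals)
    index = min(len(ranked) - 1, math.floor(quantile * len(ranked)))
    threshold = ranked[index]
    return [
        adv * scale if s >= threshold else adv
        for adv, s in zip(advantages, surprisals)
    ]
```

In a closed-loop variant, `scale` would be adjusted during training according to the gap between measured and target policy entropy. That controller is left out here.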

2606.19328 2026-06-18 cs.LG cs.AI cs.RO 新提交

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

UBP2: 不确定性平衡的偏好规划用于高效基于偏好的强化学习

Mohamed Nabail, Leo Cheng, Jingmin Wang, Nicholas Rhinehart

发表机构 * Learning, Embodied Autonomy, and Forecasting (LEAF) Lab, University of Toronto(多伦多大学学习、具身自主与预测(LEAF)实验室)

AI总结 提出UBP2方法,通过联合推理奖励、动力学和值函数的不确定性来主动引导探索,在Meta-World基准上显著提高了样本效率。

详情
AI中文摘要

基于偏好的强化学习提供了一种从行为的成对比较中学习奖励模型的方法,绕过了显式奖励设计的需求。然而,现有方法通常依赖于被动数据收集,并且样本效率低下,尤其是在学习的早期阶段。我们引入了一种基于模型的方法,通过联合推理奖励、动力学和值函数的不确定性来主动引导探索。我们的方法,不确定性平衡的偏好规划(UBP2),使用奖励、动力学和值函数模型的集成,根据结合了期望奖励、终值和认知不确定性的统一评分来评估候选轨迹。在此目标下的规划产生了利用和信息获取之间的显式权衡,无需临时的探索启发式。在标准正则性假设下,我们为有限时域和无限时域设置建立了次线性遗憾保证。实验上,在Meta-World基准上的实验表明,UBP2比无模型的基于偏好的方法和非乐观的基于模型的基线方法实现了显著更高的样本效率。

英文摘要

Preference-based RL provides an approach to learning reward models from pairwise comparisons of behaviors, bypassing the need for explicit reward design. However, existing methods typically rely on passive data collection and suffer from poor sample efficiency, especially during the early stages of learning. We introduce a model-based approach that actively directs exploration by jointly reasoning over uncertainties in the reward, dynamics, and value functions. Our method, Uncertainty-Balanced Preference Planning (UBP2), uses ensembles of reward, dynamics, and value function models to evaluate candidate trajectories according to a unified score that combines expected reward, terminal value, and epistemic uncertainty. Planning under this objective yields an explicit tradeoff between exploitation and information acquisition without requiring ad hoc exploration heuristics. Under standard regularity assumptions, we establish sublinear regret guarantees for both finite-horizon and infinite-horizon settings. Empirically, experiments on the Meta-World benchmark show UBP2 achieves substantially higher sample efficiency than model-free preference-based methods and non-optimistic model-based baselines.
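
A unified score of this kind can be sketched with ensemble disagreement as a proxy for epistemic uncertainty. The additive form, the bonus weight `beta`, and the function names are assumptions made for illustration. The dynamics ensemble is omitted and the paper's exact objective may differ.

```python
import statistics


def trajectory_score(reward_preds, value_preds, beta=1.0):
    """Optimistic score for one candidate trajectory.

    reward_preds: total predicted reward along the trajectory, one entry per ensemble member.
    value_preds:  predicted terminal value, one entry per ensemble member.
    """
    returns = [r + v for r, v in zip(reward_preds, value_preds)]
    mean_return = statistics.fmean(returns)   # expected reward plus terminal value
    epistemic = statistics.pstdev(returns)    # ensemble disagreement
    return mean_return + beta * epistemic


def plan(candidates, beta=1.0):
    """Pick the candidate (name -> (reward_preds, value_preds)) with the best score."""
    return max(candidates, key=lambda name: trajectory_score(*candidates[name], beta=beta))
```

Setting `beta=0` recovers pure exploitation, while larger values steer planning toward trajectories on which the ensemble disagrees.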

2606.18438 2026-06-18 math.OC cs.LG 交叉投稿

Sequential Hiring of Contingent Workers Through Learning-Based Optimization

基于学习优化的临时工顺序雇佣

Chris Lee, Xiuli Chao, Izak Duenyas

发表机构 * Department of Industrial and Operations Engineering, University of Michigan(工业与运营工程系,密歇根大学) Ross School of Business, University of Michigan(罗斯商学院,密歇根大学)

AI总结 针对临时工场景中工人产能和劳动力供给的不确定性,提出DR-UCB策略,通过学习周期顺序决策替换与雇佣,实现累积利润最大化,并证明其遗憾下界匹配。

详情
AI中文摘要

在本文中,我们研究了临时工场景下工人产出和劳动力供给均存在不确定性的顺序劳动力管理问题。企业通过维持固定规模的活跃团队并随时间学习工人生产力,以最大化累积利润。我们强调该问题中的两个关键运营摩擦:替换工人成本高昂,且工人可能因先前工作承诺、日程限制或入职流程等原因无法立即到岗。因此,雇佣决策仅在随机延迟后生效。我们将该问题建模为具有切换成本和延迟动作的随机多臂多选(multi-play)赌博机,并开发了一种基于学习的雇佣策略DR-UCB(DelayedReplacement-UCB),该策略通过学习周期顺序做出替换和雇佣决策。在每个周期中,该策略使用实时生产数据确定何时启动劳动力变更以及替换和雇佣哪些工人。我们证明,所提策略的主阶遗憾在其对时间范围的依赖上与下界匹配。数值实验表明,DR-UCB优于基准策略。

英文摘要

In this paper, we study a sequential workforce management problem in a contingent labor setting with uncertainty in both worker production and labor supply. A firm seeks to maximize cumulative profit by maintaining an active team of fixed size while learning worker productivity over time. We emphasize two critical operational frictions in this problem: replacing workers is costly, and workers may not be available immediately for hiring because of, for example, prior job commitments, scheduling constraints, or onboarding procedures. Thus, hiring decisions take effect only after a random delay. We formulate this problem as a stochastic multi-play bandit with costly switching and delayed actions, and develop a learning-based hiring policy, DR-UCB (DelayedReplacement-UCB), that makes replacement and hiring decisions sequentially through learning cycles. In each cycle, the policy uses real-time production data to determine when to initiate workforce changes and which workers to replace and hire. We show that the leading-order regret of the proposed policy matches its lower bound in its dependence on the time horizon. Our numerical experiments show that DR-UCB outperforms benchmark policies.
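As background for the multi-play bandit formulation, the sketch below shows a plain UCB1-style team selection: each worker gets an optimistic index and the top `team_size` workers are chosen. It deliberately ignores switching costs and hiring delays, which are the parts DR-UCB adds, so it should not be read as the paper's policy. All function and parameter names are hypothetical.

```python
import math

def ucb_index(mean, pulls, t):
    """Standard UCB1 index; workers never observed get +inf so they are tried first."""
    if pulls == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / pulls)

def select_team(means, pulls, t, team_size):
    """Pick the team_size workers with the largest UCB indices."""
    order = sorted(range(len(means)),
                   key=lambda i: ucb_index(means[i], pulls[i], t),
                   reverse=True)
    return sorted(order[:team_size])

# Worker 2 has never been observed, so it enters the team alongside the best known worker.
team = select_team(means=[0.9, 0.5, 0.2], pulls=[10, 10, 0], t=20, team_size=2)
```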

2606.18514 2026-06-18 cs.RO cs.LG 交叉投稿

N(CO)$^2$: Neural Combinatorial Optimization with Chance Constraints to Solve Stochastic Orienteering

N(CO)$^2$: 基于机会约束的神经组合优化求解随机定向问题

Anas Saeed, Marcos Abel Zuzuárregui, Stefano Carpin

发表机构 * Department of Computer Science and Engineering, University of California, Merced(加州大学默塞德分校计算机科学与工程系)

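AI summary: Proposes the N(CO)$^2$ framework, which uses reinforcement learning to solve the stochastic orienteering problem without hand-crafted heuristics, optimizing path selection under uncertainty with performance competitive with a state-of-the-art MILP.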

详情
Journal ref
In Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE), 2025
AI中文摘要

神经组合优化(NCO)通过学习启发式,为求解复杂图优化问题提供了一种有前景的替代传统启发式方法的方法。这类问题在自动化领域频繁出现,可用于建模多种应用。虽然NCO在确定性组合优化问题上已被广泛研究,但只有少数工作旨在解决随机组合优化问题。本文提出N(CO)$^2$:基于机会约束的神经组合优化,用于求解随机定向问题(SOP),无需手工设计的启发式。通过集成强化学习(RL)框架,模型在不确定性下优化路径选择,有效平衡探索与利用。实验结果表明,我们的方法在多种SOP实例上具有良好的泛化能力,与最先进的混合整数线性规划(MILP)相比性能具有竞争力。所提方法减少了启发式设计的人力投入,同时在不确定环境中实现自适应和高效的决策。

英文摘要

Neural combinatorial optimization (NCO) offers a promising alternative to traditional heuristic-based methods for solving complex graph optimization problems by proposing to learn heuristics through data. This class of problems frequently arises in automation, as it can be used to model a variety of applications. While NCO has been extensively studied for deterministic combinatorial optimization problems, there are only a few works that aim to solve stochastic combinatorial optimization problems. In this work, we present N(CO)$^2$: Neural Combinatorial Optimization with Chance cOnstraints to solve the Stochastic Orienteering Problem (SOP) without the use of hand-crafted heuristics. By integrating a reinforcement learning (RL) framework, the model optimizes path selection under uncertainty, effectively balancing exploration and exploitation. Empirical results demonstrate that our method generalizes well across diverse SOP instances, achieving competitive performance compared to the state-of-the-art mixed-integer linear program (MILP) for the task. The proposed approach reduces human effort in heuristic design while enabling adaptive and efficient decision-making in uncertain environments.
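A chance constraint of the kind used in stochastic orienteering requires that a path stays within its budget with probability at least 1 - delta. The sketch below checks such a constraint by Monte Carlo sampling. The exponential edge-cost model and every name in the code are illustrative assumptions, not the paper's setup.

```python
import random

def satisfies_chance_constraint(edge_means, budget, delta, n_samples=10_000, seed=0):
    """Estimate P(total travel cost <= budget) and compare it with 1 - delta.

    Edge costs are drawn as independent exponentials with the given means
    (an illustrative assumption, not the paper's noise model).
    """
    rng = random.Random(seed)
    within = 0
    for _ in range(n_samples):
        cost = sum(rng.expovariate(1.0 / m) for m in edge_means)
        within += cost <= budget
    prob = within / n_samples
    return prob >= 1.0 - delta, prob

# A generous budget passes the constraint; a tiny budget fails it.
ok_loose, p_loose = satisfies_chance_constraint([1.0, 1.0], budget=100.0, delta=0.1)
ok_tight, p_tight = satisfies_chance_constraint([1.0, 1.0], budget=0.01, delta=0.1)
```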

2606.18531 2026-06-18 stat.ML cs.LG 交叉投稿

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

轨迹级监督何时允许高效的离线强化学习?

Xuanfei Ren, Tengyang Xie

发表机构 * University of Wisconsin-Madison(威斯康星大学麦迪逊分校)

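AI summary: Develops a statistical theory of offline reinforcement learning when policy optimization uses only trajectory-level outcomes (such as cumulative returns or preferences), proposes the OPAC algorithm with sample-complexity guarantees, and identifies statistical barriers that arise under nonlinear aggregation objectives.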

Comments 69 pages

详情
AI中文摘要

离线强化学习通常在过程级奖励监督下进行分析,然而许多序列决策数据集仅记录轨迹级结果。我们发展了从这种结果级监督进行离线策略优化的统计理论。首先研究规范设置,其中目标仍是期望累积奖励,但每个离线轨迹仅提供一个标量标签,其条件均值是累积回报。我们提出OPAC,一种悲观演员-评论家算法,它学习潜在奖励模型并从轨迹级标签优化策略。我们证明了阶为$\widetilde O(H^2\sqrt{C_{sa}(\pi^\star)/n})$的高概率保证和匹配的下界,刻画了用单个轨迹级标签替代过程级奖励的尖锐统计代价。然后我们将该原理扩展到基于偏好的反馈,在偏好模型常数范围内保留了领先的视界和可集中性依赖。最后,我们研究广义基于结果的离线强化学习,其中监督和目标都是由潜在每步奖励的非线性聚合引起的轨迹级量。该问题通常不可学习:对于全成功目标,即使具有确定性转移和常数可集中性,任何离线学习器可能需要$\Omega(2^H)$个轨迹。然后我们通过两个结构系数$\kappa_\mu(\sigma)$和$\chi_\mu(\sigma)$识别出一个可处理的区域,这两个系数捕捉了结果聚合和广义贝尔曼更新中的信息损失,在此区域广义OPAC实现了多项式样本复杂度。我们的结果共同描绘了何时结果级监督能够实现样本高效的离线控制,以及何时缺失过程级奖励会带来根本性的统计障碍。

英文摘要

Offline reinforcement learning is typically analyzed under process-level reward supervision, yet many sequential decision datasets record only trajectory-level outcomes. We develop a statistical theory for offline policy optimization from such outcome-level supervision. We first study the canonical setting where the target remains the expected cumulative reward, but each offline trajectory provides only a scalar label whose conditional mean is the cumulative return. We propose OPAC, a pessimistic actor-critic algorithm that learns a latent reward model and optimizes a policy from trajectory-level labels. We prove a high-probability guarantee of order $\widetilde O(H^2\sqrt{C_{sa}(\pi^\star)/n})$ and a matching lower bound, characterizing the sharp statistical cost of replacing process-level rewards with one trajectory-level label. We then extend the principle to preference-based feedback, preserving the leading horizon and concentrability dependence up to preference-model constants. Finally, we study generalized outcome-based offline RL, where both the supervision and the objective are trajectory-level quantities induced by a nonlinear aggregation of latent per-step rewards. This problem is not learnable in general: for all-success objectives, any offline learner may require $\Omega(2^H)$ trajectories even with deterministic transitions and constant concentrability. We then identify a tractable regime through two structural coefficients, $\kappa_\mu(\sigma)$ and $\chi_\mu(\sigma)$, capturing information loss in outcome aggregation and generalized Bellman updates, under which generalized OPAC achieves polynomial sample complexity. Together, our results delineate when outcome-level supervision enables sample-efficient offline control and when missing process-level rewards create fundamental statistical barriers.
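To make the canonical setting concrete, the sketch below fits a latent per-step reward from one scalar label per trajectory. It assumes a linear reward in known features and noiseless labels, and it covers only the reward-fitting step, not the pessimistic actor-critic procedure of OPAC. The function `fit_latent_reward` and all variable names are hypothetical.

```python
import numpy as np

def fit_latent_reward(features, labels):
    """Least-squares fit of per-step reward weights from trajectory-level labels.

    features: array (n_traj, horizon, dim), per-step features phi(s_h, a_h)
    labels:   array (n_traj,), one scalar label per trajectory
    Assumes a linear reward r(s, a) = phi(s, a) @ theta (illustrative only).
    """
    summed = features.sum(axis=1)  # (n_traj, dim): features summed over the horizon
    theta, *_ = np.linalg.lstsq(summed, labels, rcond=None)
    return theta

rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])
phi = rng.normal(size=(500, 10, 3))   # 500 trajectories, horizon 10, 3 features
y = phi.sum(axis=1) @ true_theta      # noiseless cumulative returns as labels
theta_hat = fit_latent_reward(phi, y)
```

Because the return is linear in the summed features, per-step weights are identifiable here from trajectory-level labels alone; the nonlinear aggregation case in the abstract is exactly where this argument stops working.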

2606.18598 2026-06-18 cs.AI cs.LG 交叉投稿

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

在地质、需求和定价不确定性下优化锂生产决策:多目标决策的POMDP框架

Anna C. Edmonds, Mansur M. Arief, Robert J. Moss, Mykel J. Kochenderfer, Jef Caers

发表机构 * Computer Science Department, Stanford University(斯坦福大学计算机科学系) Aeronautics and Astronautics Department, Stanford University(斯坦福大学航空与航天系) Earth and Planetary Sciences Department, Stanford University(斯坦福大学地球与行星科学系)

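AI summary: Proposes a POMDP framework that optimizes lithium mining decisions through belief-state planning, adapts dynamically to price uncertainty, and achieves higher demand fulfillment and more balanced economic and environmental outcomes.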

Comments 24 pages, 14 tables, 4 figures

详情
AI中文摘要

锂生产中的决策制定具有挑战性,无论是从投资者角度还是战略生产角度。决定开采哪些矿山以及何时开采,不仅涉及地质和价格不确定性,还涉及提取方法选择的复杂性,从直接锂提取到硬岩开采。先前的工作探索了该问题的模型和优化采矿决策的不同方法;这些模型没有考虑定价不确定性、需求不确定性或提取锂的不同采矿技术。将不同的定价模型和提取技术纳入这些模型,可以制定更稳健的策略,不仅决定何时何地开采矿山,还决定采用哪种生产方法。我们将问题表述为部分可观测马尔可夫决策过程(POMDP),并使用信念状态规划方法求解以获得最优决策。我们的研究表明,POMDP求解器通过信念状态规划和显式不确定性管理,动态适应变化的锂价格机制(静态、线性、指数和随机),优于受人类启发的启发式方法。通过优化勘探、生产和技术选择的顺序,该框架在所有定价和矿床情景下,在项目生命周期内实现了更高的需求满足以及更平衡的经济与环境结果。

英文摘要

Decision making in lithium production is challenging, whether from an investor's perspective or a strategic production standpoint. Determining which mines to open and when to open them involves not only geological and price uncertainties, but also complexities around the choice of extraction method, from direct lithium extraction to hard rock mining. Prior work explored models of this problem and different methods to optimize mining decisions; these models did not account for uncertainty in pricing, uncertainty in demand, or different mining technologies to extract lithium. Incorporating different pricing models and extraction technologies into these models enables more robust strategies for determining not only when and where to open a mine, but also which method of production to pursue. We frame the problem as a partially observable Markov decision process (POMDP) and solve it using belief-state planning methods to obtain optimal decisions. In our study, we show that POMDP solvers outperform human-inspired heuristics by dynamically adapting to shifting lithium price regimes (static, linear, exponential, and stochastic) through belief-state planning and explicit uncertainty management. By optimally sequencing exploration, production, and technology choice, the framework achieves higher demand fulfillment and more balanced economic and environmental outcomes over the project's lifetime across all pricing and deposit scenarios.
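Belief-state planning rests on a Bayes-filter update of the distribution over hidden states. The sketch below shows one such update for a discrete POMDP. The two-state "deposit quality" example and its probabilities are invented for illustration and do not come from the paper.

```python
import numpy as np

def belief_update(belief, transition, obs_likelihood):
    """One Bayes-filter step for a discrete POMDP under a fixed action.

    belief:         (n_states,) prior over hidden states
    transition:     (n_states, n_states), transition[s, s2] = P(s2 | s, action)
    obs_likelihood: (n_states,), P(observation | s2, action)
    """
    predicted = belief @ transition            # predict: push the belief through the dynamics
    unnormalized = predicted * obs_likelihood  # correct: weight by the observation likelihood
    return unnormalized / unnormalized.sum()

# Hypothetical two-state deposit quality, "poor" (0) vs "rich" (1), with static geology.
prior = np.array([0.5, 0.5])
static = np.eye(2)
good_sample = np.array([0.2, 0.8])  # assumed P(good core sample | state)
posterior = belief_update(prior, static, good_sample)
```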

2606.19069 2026-06-18 eess.SY cs.LG cs.SY 交叉投稿

Model-Free Reinforcement Learning Control for Resilient Cyber-Physical Systems

面向弹性信息物理系统的无模型强化学习控制

Hugo O. Garcés, Alejandro J. Rojas, Bernardo A. Hernández, Andrés Escalona, Jonathan M. Palma, Md. Rezwan Parvez, Bhushan Gopaluni, Sirish L. Shah

发表机构 * Departamento de Ingeniería Eléctrica, Universidad de Concepción, Concepción, Chile; Department of Electrical & Computer Engineering, University of Alberta, Edmonton, T6G 1H9, Alberta, AB, Canada; Department of Chemical Biological Engineering, University of British Columbia, Vancouver, BC V6T 1Z3, Canada; Department of Chemical & Materials Engineering, University of Alberta, Edmonton, T6G 1H9, Alberta, AB, Canada

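AI summary: Compares model-free controllers on a nonlinear system under cyberattacks (false data injection and denial of service) across four reinforcement learning reward types. The Lyapunov reward gives the best resilience with low tracking error, the exponential reward offers a good trade-off under moderate training conditions, and the progressive and linear rewards converge faster but are less robust.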

Comments Accepted to the 23rd IFAC World Congress 2026

详情
AI中文摘要

本文比较了无模型控制器在遭受网络攻击(包括虚假数据注入和拒绝服务攻击)的非线性系统上的性能。分析了四种强化学习奖励类型的准确性、成本和弹性。结果表明,Lyapunov奖励在低跟踪误差下提供最佳弹性。指数模式在中等训练条件下也提供了良好的折衷,具有可接受的弹性。渐进和线性奖励收敛更快,但鲁棒性较差。强化学习模型预测控制器(RL-MPC)表现出强稳态弹性,但需要更长的训练时间;强化学习比例-积分-微分控制器(RL-PID)更快,训练时间显著减少。近端策略优化(PPO)优于深度确定性策略梯度(DDPG),关键绩效指标(KPI)方差显著降低。本研究旨在强调精心设计的强化学习奖励如何提高性能和对网络威胁的弹性。

英文摘要

This paper compares the performance of model-free controllers on a nonlinear system under cyberattacks, including false data injection and denial-of-service attacks. Four RL reward types are analyzed for accuracy, cost, and resilience. Results show that the Lyapunov reward offers the best resilience with low tracking error. Exponential mode also provides good trade-offs with acceptable resilience under moderate training conditions. Progressive and linear rewards converge faster but are less robust. RL-MPCs show strong steady-state resilience but require longer training times; RL-PID controllers are faster with significantly less training time. Proximal Policy Optimization outperforms Deep Deterministic Policy Gradient with a significant reduction in KPI variance. This study serves to highlight how well-designed RL rewards can improve performance and resilience against cyber threats.

2507.17786 2026-06-18 cs.LG 版本更新

Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation

强化学习加速气动外形优化

Florian Sobieczky, Alfredo Lopez, Erika Dudkin, Christopher Lackner, Matthias Hochsteger, Bernhard Scheichl, Helmut Sobieczky

发表机构 * Software Competence Center Hagenberg (SCCH)(软件竞争力中心哈根贝格) Institut für Strömungsmechanik und Wärmeübertragung, TU Wien(流体力学与传热研究所,维也纳技术大学) CERBSim GmbH(CERBSim公司)

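AI summary: Proposes an RL-based adaptive optimization algorithm that uses a surrogate-based, actor-critic policy evaluation MCMC approach and temporarily freezes some parameters to reduce dimensionality, accelerating aerodynamic shape optimization; a simple fluid-dynamical example shows that the method supports interpretation via feature importance scores.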

详情
AI中文摘要

我们引入了一种基于强化学习(RL)的自适应优化算法,用于气动外形优化,重点关注降维。这里应用RL的形式是一种基于代理的、演员-评论家策略评估的MCMC方法,允许对部分待优化参数进行时间上的“冻结”。目标是尽量减少计算量,并利用观察到的优化结果来解释所发现的极值点在实现所需流场中的作用。通过围绕作为真实值的中间CFD模拟进行一系列局部优化的参数变化,如果(a)参数必须驻留的局部邻域足够大,能够与网格大小的步长及其大量模拟相竞争,并且(b)对这些邻域所需的奖励和成本估计足够准确,以实现良好的逐步参数自适应,则可以加速全局优化。我们给出了一个简单流体动力学问题的例子,在该问题上,该方法允许在特征重要性评分意义上进行解释。

英文摘要

We introduce a reinforcement learning (RL) based adaptive optimization algorithm for aerodynamic shape optimization focused on dimensionality reduction. The form in which RL is applied here is that of a surrogate-based, actor-critic policy evaluation MCMC approach allowing for temporal 'freezing' of some of the parameters to be optimized. The goals are to minimize computational effort, and to use the observed optimization results for interpretation of the discovered extrema in terms of their role in achieving the desired flow-field. By a sequence of local optimized parameter changes around intermediate CFD simulations acting as ground truth, it is possible to speed up the global optimization if (a) the local neighbourhoods of the parameters in which the changed parameters must reside are sufficiently large to compete with the grid-sized steps and its large number of simulations, and (b) the estimates of the rewards and costs on these neighbourhoods necessary for a good step-wise parameter adaption are sufficiently accurate. We give an example of a simple fluid-dynamical problem on which the method allows interpretation in the sense of a feature importance scoring.

2604.03208 2026-06-18 cs.LG 版本更新

Hierarchical Planning with Latent World Models

基于潜在世界模型的分层规划

Wancong Zhang, Basile Terver, Artem Zholus, Soham Chitnis, Harsh Sutaria, Mido Assran, Randall Balestriero, Amir Bar, Adrien Bardes, Yann LeCun, Nicolas Ballas

发表机构 * FAIR at Meta(Meta旗下的FAIR) New York University(纽约大学) Mila - Québec AI Institute(魁北克AI研究院) Brown University(布朗大学)

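AI summary: Proposes the HWM architecture, which performs hierarchical model predictive control using latent world models at multiple temporal scales linked by latent matching, addressing the compounding prediction errors and exponential search cost that make single-level planning fail on long-horizon tasks.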

详情
AI中文摘要

世界模型是通过规划实现零样本具身控制的一条有前景的路径。然而,现有的世界模型规划器在长时域、多阶段任务中面临困难:预测误差累积,且朴素搜索的复杂度随规划时域呈指数增长。分层方法通过将任务分解为更短、可处理的子问题来缓解这两个问题;然而,先前的分层方法要么将控制摊销为任务特定的策略(分层强化学习),要么假设低维状态和已知动力学(经典分层MPC)。我们提出了基于潜在世界模型的分层规划(HWM),这是一种直接在仅通过下一潜在预测训练的视觉世界模型上进行分层模型预测控制(MPC)的架构和规划范式。HWM在共享潜在空间内学习多个时间尺度的世界模型,因此长时域模型的预测通过潜在匹配作为短时域模型的子目标,无需任务特定的奖励、技能学习或分层策略。为了保持长时域搜索的可处理性,HWM学习了一个动作编码器,将原始动作块压缩为潜在宏动作。在真实世界的Franka操作中,HWM从单个目标图像中完成拾取和放置的成功率为70%,而单层规划的成功率为0%。在模拟的推操作和迷宫导航任务中,HWM在长时域任务上持续提升性能,同时所需规划计算量最多减少3倍。

英文摘要

World models are a promising path to zero-shot embodied control through planning. However, existing world model planners struggle on long-horizon, multi-stage tasks: prediction errors compound and naive search is exponential in the planning horizon. Hierarchy mitigates both by decomposing tasks into shorter, tractable subproblems; yet prior hierarchical approaches either amortize control into task-specific policies (hierarchical RL) or assume low-dimensional states and known dynamics (classical hierarchical MPC). We present Hierarchical Planning with Latent World Models (HWM), an architecture and planning paradigm for hierarchical model predictive control (MPC) directly on visual world models trained solely via next-latent prediction. HWM learns world models at multiple temporal scales within a shared latent space, so predictions from the long-horizon model serve as subgoals for the short-horizon model via latent matching, without task-specific rewards, skill learning, or hierarchical policies. To keep long-horizon search tractable, HWM learns an action encoder that compresses primitive action chunks into latent macro-actions. On real-world Franka manipulation, HWM solves pick-and-place from a single goal image at 70% success vs. 0% for single-level planning. Across simulated push manipulation and maze navigation, HWM consistently improves performance on long-horizon tasks while requiring up to 3x less planning compute.

2605.22142 2026-06-18 cs.LG cs.AI 版本更新

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

部分可观测性下知识图谱的短期到长期记忆转移

Taewoon Kim, Vincent François-Lavet, Michael Cochez

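AI summary: Studies short-term-to-long-term memory transfer for knowledge graphs under partial observability and proposes a neuro-symbolic value-based method that decides whether to keep or drop each observed triple before long-term insertion; on the RoomKG benchmark it outperforms symbolic and neural baselines.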

详情
AI中文摘要

在部分可观测性下的强化学习需要决定保留哪些信息,但大多数基于记忆的方法并未显式建模符号观察的短期到长期转移。我们在时序知识图谱记忆的设定下研究这一转移过程,并将其建模为一个神经符号的基于价值的决策问题:对于每个观察到的三元组,智能体在长期插入前决定保留还是丢弃。为处理可变大小的短期缓冲区,我们采用了一种逐项Q学习设计,使用共享参数,并在连续步骤间匹配的项目上进行实用的时间差分更新。在长期记忆容量为128的RoomKG基准测试中,学习到的转移决策优于符号和神经基线,包括带有时间注释的符号基线和基于历史的LSTM/Transformer基线。在转移策略消融实验中,一个仅使用局部短期信息的轻量级变体表现最佳;步骤层面的行为显示,策略保留与导航和查询相关的事实,同时丢弃价值较低的候选事实,支持在内存限制下做出显式且可解释的记忆决策。

英文摘要

Reinforcement learning under partial observability requires deciding what information to retain, yet most memory-based approaches do not explicitly model short-term-to-long-term transfer of symbolic observations. We study this transfer process in a temporal knowledge-graph memory setting and cast it as a neuro-symbolic value-based decision problem: for each observed triple, the agent chooses whether to keep or drop it before long-term insertion. To handle variable-sized short-term buffers, we use a per-item Q-learning design with shared parameters and a practical temporal-difference update over matched items across consecutive steps. On the RoomKG benchmark at long-term memory capacity 128, learned transfer decisions outperform symbolic and neural baselines, including symbolic baselines with temporal annotations and history-based LSTM/Transformer baselines. Across transfer-policy ablations, a lightweight local short-term-only variant performs best, and step-level behavior shows that the policy keeps navigation- and query-relevant facts while discarding lower-value candidate facts, supporting explicit and interpretable memory decisions under memory constraints.

2606.12808 2026-06-18 cs.LG cs.AI 版本更新

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

SymQNet: 低延迟自适应哈密顿量学习的摊销获取

Yash Vardhan Tomar, Dheeraj Peddireddy

发表机构 * University of California, Berkeley(加州大学伯克利分校)

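AI summary: Proposes SymQNet, an amortized reinforcement-learning method that learns a posterior-conditioned acquisition policy offline and runs a fast policy forward pass online, substantially reducing acquisition latency in adaptive Hamiltonian learning.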

详情
AI中文摘要

自适应哈密顿量学习对于校准和表征量子设备至关重要。在自适应控制器中,选择下一个实验本身就是一次计算。贝叶斯设计规则在每次后验更新后重新计算,这一步可能需要几秒钟。在数百次测量(shot)中,这些秒数累积成为自适应性的显著墙钟时间成本。我们引入SymQNet,一种用于低延迟自适应哈密顿量学习的摊销强化学习方法。SymQNet离线学习后验条件获取策略,然后在线使用快速策略前向传播,同时保留贝叶斯后验反馈。在横向场伊辛基准测试中,相对于有界Fisher信息搜索和有界两步贝叶斯主动学习(BALD),SymQNet显著降低了获取延迟。在五量子比特时,相对于这些在线基线,它将仅含获取步骤的决策延迟分别降低了$47.1\times$和$72.6\times$;在十二量子比特时,SymQNet的完整模拟步骤需要$1.02$秒,而有界两步BALD需要$13.27$秒。总体而言,我们表明学习得到的获取策略可以使自适应哈密顿量学习对于重复的低延迟工作负载变得实用。

英文摘要

Adaptive Hamiltonian learning is central to calibrating and characterizing quantum devices. In an adaptive controller, choosing the next experiment is itself a computation. Bayesian design rules are recomputed after every posterior update, and that step can take seconds. Across hundreds of shots, those seconds become a significant wall-clock cost for adaptivity. We introduce SymQNet, an amortized reinforcement-learning approach for low-latency adaptive Hamiltonian learning. SymQNet learns a posterior-conditioned acquisition policy offline, then uses a fast policy forward pass online while retaining Bayesian posterior feedback. On transverse-field Ising benchmarks, SymQNet substantially reduces acquisition latency relative to bounded Fisher-information search and bounded two-step Bayesian active learning by disagreement (BALD). At five qubits, it reduces acquisition-only decision latency by $47.1\times$ and $72.6\times$ relative to these online baselines; at twelve qubits, full simulated steps take $1.02$ s for SymQNet versus $13.27$ s for bounded two-step BALD. Overall, we show that learned acquisition can make adaptive Hamiltonian learning practical for repeated low-latency workloads.

2511.00802 2026-06-18 cs.SE cs.CL cs.LG 版本更新

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

GrowthHacker: 使用代码修改型LLM代理的自动离线策略评估优化

Jie JW Wu, Ayanda Patrick Herlihy, Ahmad Saleem Mirza, Ali Afoud, Fatemeh Fard

发表机构 * Michigan Technological University, Houghton(密歇根技术大学) Birmingham City University(伯明翰城市大学) University of British Columbia, Kelowna(不列颠哥伦比亚大学, 肯洛纳)

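AI summary: Proposes the GrowthHacker benchmark, in which LLM agents autonomously and iteratively modify code to optimize off-policy evaluation (OPE) implementations; evaluates several agent frameworks on Open Bandit Pipeline and Scope-RL, and shows that LLM-based agents can act as automated growth hackers that continuously improve OPE systems.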

Comments Accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM), 2026

详情
AI中文摘要

随着数据驱动开发的广泛采用,在线A/B测试已成为衡量新技术效果的既定方法。然而,部署在线实验需要设计、实现和部署资源,并可能对用户产生负面影响(例如,不安全或不道德的结果),同时需要数周的数据收集。为了解决这一问题,离线策略评估(OPE)或离线A/B测试这一日益增长的研究领域,使用先前收集的日志数据离线评估新技术。OPE也是强化学习中的一个基本问题,在在线测试昂贵或风险高的领域(如医疗保健、推荐系统、教育和机器人技术)中非常重要。尽管代码生成大语言模型(LLM)和代理工作流取得了进展,但关于LLM和基于LLM的代理是否以及如何自动优化OPE实现,我们知之甚少。我们提出了GrowthHacker,这是一个基准测试,用于在大规模公共数据集上评估基线LLM和基于LLM的代理。GrowthHacker自主迭代修改代码,运行OPE,并使用指标指导后续优化。我们在Open Bandit Pipeline(OBP)和Scope-RL上评估方法,并开发了一个双代理框架,该框架解决了现有框架的局限性,同时降低了复杂性。在两个库中,双代理显示出最高的可靠性(98.1%-100%成功率)和正向结果率(78%),正向结果的中位改进为4.4%;CrewAI实现了最高的平均改进(37.9%),并且是唯一没有极端值失败的框架。AutoGen和Default各达到65%的正向结果率。这些结果证明了使用基于LLM的代理作为自动“增长黑客”持续改进OPE系统的可行性,对在手动优化成本高昂的情况下扩展数据驱动决策具有重要意义。

英文摘要

With data-driven development now widely adopted, online A/B testing is an established method for measuring the effects of new technologies. However, deploying online experiments demands resources for design, implementation, and deployment, and may negatively impact users (e.g., unsafe or unethical outcomes) while requiring weeks of data collection. To address this, the growing research area of off-policy evaluation (OPE), or offline A/B testing, assesses new technologies offline using previously collected logged data. OPE is also a fundamental problem in reinforcement learning and is important where online testing is expensive or risky, such as healthcare, recommender systems, education, and robotics. Despite advances in code-generation large language models (LLMs) and agentic workflows, little is known about whether and how LLMs and LLM-based agents can automatically optimize OPE implementations. We propose GrowthHacker, a benchmark that evaluates baseline LLMs and LLM-based agents on large-scale public datasets. GrowthHacker autonomously and iteratively modifies code, runs OPE, and uses the metrics to guide subsequent optimization. We evaluate methods on Open Bandit Pipeline (OBP) and Scope-RL, and develop a two_agent framework that addresses limitations of existing frameworks while reducing complexity. Across both libraries, two_agent shows the highest reliability (98.1%-100% success rate) and positive-outcome rate (78%), with a median improvement of 4.4% among positive outcomes; CrewAI achieves the highest average improvement (37.9%) and is the only framework with zero extreme-value failures. AutoGen and Default each reach 65% positive-outcome rates. These results establish the feasibility of using LLM-based agents as automated "growth hackers" to continuously improve OPE systems, with implications for scaling data-driven decision-making where manual optimization is expensive.
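For readers unfamiliar with OPE, the sketch below shows the basic inverse propensity scoring (IPS) estimator that such pipelines build on. It is written with plain NumPy rather than the Open Bandit Pipeline or Scope-RL APIs, and the function name and toy data are hypothetical.

```python
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    """Inverse propensity scoring estimate of the target policy's value.

    rewards:       (n,) observed rewards for the logged actions
    logging_probs: (n,) probability the logging policy gave to each logged action
    target_probs:  (n,) probability the evaluated policy gives to the same action
    """
    weights = target_probs / logging_probs  # importance weights
    return float(np.mean(weights * rewards))

rewards = np.array([1.0, 0.0, 1.0, 0.0])
logging_p = np.array([0.5, 0.5, 0.5, 0.5])  # uniform logging over two actions
target_p = np.array([1.0, 0.0, 1.0, 0.0])   # target always picks the rewarded action
value = ips_estimate(rewards, logging_p, target_p)
```

The logged data has an average reward of 0.5, while the reweighted estimate for the target policy is higher, because that policy only ever takes the action that was rewarded in the log.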

4. 生成模型与概率建模 23 篇

2606.18509 2026-06-18 cs.LG stat.ML 新提交

Concept Modulation Models: A Unified Framework for Identifiability and Extrapolation

概念调制模型:可识别性与外推的统一框架

Soheun Yi, Yizhou Lu, Chandler Squires, Pradeep Ravikumar

发表机构 * Department of Statistics and Data Science, Carnegie Mellon University(卡内基梅隆大学统计与数据科学系) Machine Learning Department, Carnegie Mellon University(卡内基梅隆大学机器学习系)

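AI summary: Proposes concept modulation models (CMMs), which unify identifiability and extrapolation analysis for conditional latent variable models through attribute potentials, lift transition-based identifiability to conditional settings, and yield algebraic extrapolation criteria.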

详情
AI中文摘要

条件潜变量模型中的可靠泛化需要理解可识别性和外推:观测属性间的变化如何决定潜在结构,以及该结构如何决定未见属性上的分布。然而,现有的可识别性和外推保证大多是模型特定的,在非线性ICA、因果表示学习、扰动建模及相关条件潜变量模型中分别进行分析。我们引入概念调制模型(CMMs),这是一类属性索引的条件生成模型,其结构为$A\to \Lambda \to C\to X$,其中属性选择调制器,调制器诱导潜在概念法则,概念生成观测特征。CMMs通过展示观测属性上的特征一致性诱导受CMM类约束的潜在概念转移,将基于转移的可识别性提升至条件设置。我们通过属性势(属性条件概念法则之间的对数密度比)表达这些约束,将通用提升步骤与模型特定的刚性论证分离。相同的势控制外推:当且仅当传输的属性势恒等式扩展到这些属性时,未见属性上的一致性成立。这导出了代数外推准则,识别出几个现有可识别性和外推结果背后的共同基于势的证明对象,并且当与这些工作中的模型特定刚性论证结合时,恢复了它们所述的结论。

英文摘要

Reliable generalization in conditional latent variable models requires understanding both identifiability and extrapolation: how observed variation across attributes determines latent structure, and how that structure determines distributions at unseen attributes. However, existing identifiability and extrapolation guarantees are largely model-specific, with separate analyses in nonlinear ICA, causal representation learning, perturbation modeling, and related conditional latent variable models. We introduce concept modulation models (CMMs), an attribute-indexed class of conditional generative models with structure $A\to \Lambda \to C\to X$, where attributes select modulators, modulators induce latent concept laws, and concepts generate observed features. CMMs lift transition-based identifiability to conditional settings by showing that feature agreement on observed attributes induces a latent concept transition constrained by the CMM class. We express these constraints through attribute potentials, log-density ratios between attribute-conditioned concept laws, separating the generic lifting step from model-specific rigidity arguments. The same potentials control extrapolation: agreement at unseen attributes holds exactly when the transported attribute-potential identities extend to those attributes. This yields algebraic extrapolation criteria, identifies the common potential-based proof objects behind several existing identifiability and extrapolation results, and, when combined with the model-specific rigidity arguments in those works, recovers their stated conclusions.

2606.18898 2026-06-18 cs.LG 新提交

Anomaly Detection for Sparse and Irregular Multivariate Time Series with Latent SDEs

基于潜在随机微分方程的稀疏不规则多元时间序列异常检测

Martin Uray, Dominik Geng, Florian Graf, Stefan Huber, Roland Kwitt

发表机构 * Josef Ressel Centre for Intelligent and Secure Industrial Automation, University of Applied Sciences, Salzburg, Austria(约瑟夫·雷斯尔智能与安全工业自动化中心,应用科学大学,萨尔茨堡,奥地利) University of Salzburg, Austria(萨尔茨堡大学,奥地利)

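AI summary: For sparse, irregularly sampled real-world multivariate time series, proposes a generative approach based on latent SDEs that projects observations onto a continuous-time stochastic dynamical system, handles missing observations and irregular sampling, and captures cyclic behavior; it ranks first among the compared baselines on six benchmark datasets.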

Comments Preprint

详情
AI中文摘要

多元时间序列异常检测(MTSAD)在工业监控、网络安全或医疗保健等广泛应用领域至关重要。现实世界的数据通常是稀疏的、不规则采样的或部分观测的,但现有方法假设时间序列均匀采样。我们提出了一种基于潜在随机微分方程的生成方法,将观测到的时间序列投影到一个连续时间随机动力系统上,能够直接处理缺失观测和不规则采样,同时自然捕获许多现实世界用例固有的可能循环行为。在六个异常基准数据集上的实验表明,我们提出的方法在现有最先进基线中排名第一。我们进一步证明,在严重数据稀疏性下,我们的方法保持鲁棒性,而测试的基线方法性能显著下降。这些结果突显了潜在随机微分方程作为多元时间序列异常检测的自然归纳偏置,尤其是在存在现实世界不规则性的情况下。

英文摘要

Multivariate time series anomaly detection (MTSAD) is critical for a wide range of application areas, such as industrial monitoring, cybersecurity, or healthcare. Real-world data is often sparse, irregularly sampled or partially observed, yet existing methods assume uniformly sampled time series. We propose a generative approach based on Latent SDEs that projects the observed time series on a continuous-time stochastic dynamical system, directly being able to handle missing observations and irregular sampling, while also naturally capturing possible cyclic behavior that many real-world use cases inherently possess. Experiments on six anomaly benchmark datasets show that our proposed method ranks first among state-of-the-art baselines. We further demonstrate that our method remains robust under severe data sparsity, while performance significantly degrades for the tested baseline methods. These results highlight latent SDEs as a natural inductive bias for anomaly detection in multivariate time series, especially in presence of real-world irregularities.
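One reason a continuous-time latent model copes with irregular sampling is that an SDE can be integrated over whatever time gaps the data provides. The sketch below simulates an SDE with the Euler-Maruyama scheme on an uneven grid. The drift and diffusion functions are hand-written placeholders, whereas in a latent SDE model they would be learned networks; a real implementation would also typically integrate on a finer internal grid.

```python
import numpy as np

def euler_maruyama(z0, times, drift, diffusion, rng):
    """Simulate dz = drift(z, t) dt + diffusion(z, t) dW on an arbitrary time grid."""
    z = np.empty((len(times), len(z0)))
    z[0] = z0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]                      # step sizes may differ
        dw = rng.normal(scale=np.sqrt(dt), size=len(z0))  # Brownian increment over dt
        z[k] = (z[k - 1]
                + drift(z[k - 1], times[k - 1]) * dt
                + diffusion(z[k - 1], times[k - 1]) * dw)
    return z

rng = np.random.default_rng(0)
obs_times = np.array([0.0, 0.1, 0.15, 0.9, 1.0])  # irregular observation times
path = euler_maruyama(np.array([1.0, 0.0]), obs_times,
                      drift=lambda z, t: -z,       # hypothetical mean-reverting drift
                      diffusion=lambda z, t: 0.1,  # hypothetical constant noise scale
                      rng=rng)
```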

2606.18997 2026-06-18 cs.LG 新提交

DIPHINE: Diffusion-based $\Phi$-ID Neural Estimator

DIPHINE: 基于扩散的 $\Phi$-ID 神经估计器

Simon Pedro Galeano Munoz, Mustapha Bounoua, Giulio Franzese, Pietro Michiardi, Maurizio Filippone

发表机构 * KAUST(阿卜杜拉国王科技大学) EURECOM(欧雷康)

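AI summary: Proposes DIPHINE, the first diffusion-based neural estimator of Integrated Information Decomposition ($\Phi$ID) for continuous non-Gaussian dynamical systems; a single amortized network jointly estimates all required mutual information terms, and Möbius inversion recovers the sixteen atoms.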

详情
AI中文摘要

揭示真实世界复杂系统的真实信息架构需要厘清其组件如何随时间独特存储、冗余共享和协同整合信息。集成信息分解($\Phi$ID)是一个框架,用于将多变量系统的信息动态分解为十六个非重叠原子,这些原子表征冗余、独特和协同的信息存储、传输和整合模式。现有的计算 $\Phi$ID 的方法仅限于高斯或离散系统,阻碍了其在连续非高斯动力系统中的应用。我们通过提出 DIPHINE(基于扩散的 $\Phi$ID 神经估计器)来解决这一限制,这是首个利用基于分数的扩散模型从单个摊销网络中联合估计 $\Phi$ID 所需的所有互信息项的神经估计器,并通过 Möbius 逆变换恢复十六个原子。我们提供了通过逆变换的误差传播的理论分析,表明从互信息到原子的映射的雅可比矩阵是整数值的,并且协同到协同原子被证明是最难估计的。我们在合成基准上展示了准确恢复真实原子,与已建立的互信息估计器相比具有优越性能,并在涉及真实数据的应用中无需任何分布假设即可提取生理上可解释的信息动态结构。

英文摘要

Uncovering the true informational architecture of real-world complex systems requires disentangling how their components uniquely store, redundantly share, and synergistically integrate information over time. Integrated Information Decomposition ($\Phi$ID) is a framework for decomposing the information dynamics of multivariate systems into sixteen non-overlapping atoms that characterize redundant, unique, and synergistic modes of information storage, transfer, and integration. Existing methods to compute $\Phi$ID are restricted to Gaussian or discrete systems, preventing its application to continuous non-Gaussian dynamical systems. We address this limitation by proposing DIPHINE (Diffusion-based $\Phi$-ID Neural Estimator), the first neural estimator that leverages score-based diffusion models to jointly estimate all the mutual information terms required by $\Phi$ID from a single amortized network, recovering the sixteen atoms through Möbius inversion. We provide a theoretical analysis of error propagation through the inversion, showing that the Jacobian of the mapping from mutual informations to atoms is integer-valued and that the synergy-to-synergy atom is provably the hardest to estimate. We demonstrate accurate recovery of ground-truth atoms on synthetic benchmarks, superior performance compared to established mutual information estimators, and the ability to extract physiologically interpretable information-dynamic structure on an application involving real data without any distributional assumptions.
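Möbius inversion recovers per-element "atoms" from cumulative sums over a partially ordered set, using only integer coefficients. The sketch below demonstrates this on the simplest case, the lattice of subsets of a two-element set. This is a deliberate simplification: the decomposition in the paper works on a different and larger lattice, and the values used here are invented.

```python
from itertools import combinations

def subsets(s):
    """All subsets of the frozenset s."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def mobius_invert(cumulative):
    """Recover atoms g from f(S) = sum of g(T) over all subsets T of S."""
    atoms = {}
    for s in cumulative:
        # On the subset lattice the Mobius coefficients are +1 or -1.
        atoms[s] = sum((-1) ** (len(s) - len(t)) * cumulative[t] for t in subsets(s))
    return atoms

# Hypothetical atoms on the subsets of {a, b}, and their cumulative sums.
g = {frozenset(): 0.0, frozenset("a"): 0.5, frozenset("b"): 0.25, frozenset("ab"): 1.0}
f = {s: sum(g[t] for t in subsets(s)) for s in g}
recovered = mobius_invert(f)
```

Each atom is a signed sum of cumulative quantities with integer coefficients, which is why estimation errors in the inputs propagate linearly to the atoms.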

2606.19162 2026-06-18 cs.LG cs.CV 新提交

The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

奖励一直就在你的数据中:用判别器引导的强化学习纠正流匹配

Nicolas Beltran-Velez, Felix Friedrich, Zhang Xiaofeng, Reyhane Askari-Hemmat, Xiaochuang Han, Adriana Romero-Soriano, Michal Drozdzal

发表机构 * FAIR at Meta(Meta FAIR) Columbia University(哥伦比亚大学) McGill University(麦吉尔大学) Canada CIFAR AI Chair(加拿大CIFAR人工智能主席)

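AI summary: To address visual defects in flow-matching models caused by the mismatch between the training loss and sample quality, proposes Discriminator-Guided RL (DRL), which uses the logit of a discriminator trained in a pretrained representation space as the reward; it substantially improves guidance-free FID and semantic-space FD and also improves preference alignment.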

Comments 84 pages, including appendices

详情
AI中文摘要

得分匹配和流匹配模型通常依赖基于偏好的强化学习来实现两个目的:与主观偏好对齐,以及令人惊讶地恢复视觉真实性和连贯对象结构等属性——而这些属性本应通过匹配训练从数据本身学习。我们认为这反映了结构上的不匹配。匹配损失衡量训练时边缘分布下速度或得分场的$\ell_2$回归误差,这一代理指标与决定推理时样本质量的视觉和语义属性对齐不良。给定一个与这些属性对齐的奖励,强化学习通过评估模型自身生成的样本并直接遵循奖励景观来规避不匹配。挑战在于如何在不依赖人类偏好的情况下获得这样的奖励,因为人类偏好昂贵且会将数据真实性与标注者倾向混为一谈。我们提出判别器引导的强化学习(DRL)。DRL训练一个判别器,在预训练表示空间中区分数据样本和基础模型样本,并将其logit作为KL正则化强化学习中的奖励。预训练空间将判别器限制在感知有意义的方向上,而logit估计数据与模型之间的对数似然比,这是针对数据分布的最优奖励。在SiT、JiT、REPA和RAE上,DRL降低了无引导FID(例如,SiT上从9.38降至2.62)和语义空间FD(例如,SiT上DINOv3从88.2降至19.3),在所有骨干网络上均有一致提升,并且在没有经过偏好奖励训练的情况下改善了人类偏好奖励。在后续基于偏好的后训练中,DRL还在偏好奖励与图像保真度之间产生了更好的帕累托前沿,在提高对齐度的同时减少了过饱和和过亮等低级伪影。

英文摘要

Score- and flow-matching models often rely on preference-based reinforcement learning for two purposes: aligning with subjective preferences and, surprisingly, recovering properties such as visual realism and coherent object structure that matching-based training is intended to learn from the data itself. We argue that this reflects a structural mismatch. Matching losses measure $\ell_2$ regression error on the velocity or score field under training-time marginals, a proxy poorly aligned with the visual and semantic properties that determine sample quality at inference. Given a reward aligned with these properties, RL sidesteps the mismatch by evaluating the model on its own samples and following the reward landscape directly. The challenge is to obtain such a reward without relying on human preferences, which are expensive and conflate data realism with annotator inclinations. We propose Discriminator-Guided RL (DRL). DRL trains a discriminator to separate data from base-model samples in a pretrained representation space and uses its logit as the reward in KL-regularized RL. The pretrained space restricts the discriminator to perceptually meaningful directions, and the logit estimates the log-likelihood ratio between data and model, which is the optimal reward for targeting the data distribution. Across SiT, JiT, REPA, and RAE, DRL reduces guidance-free FID (e.g., $9.38 \to 2.62$ on SiT) and semantic-space FD (e.g., $88.2 \to 19.3$ on DINOv3 for SiT), with consistent gains across all backbones, and improves human-preference rewards without training on them. It also yields a better Pareto frontier between preference reward and image fidelity under subsequent preference-based post-training, increasing alignment while reducing low-level artifacts such as oversaturation and excessive brightness.
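The statement that the discriminator logit is the optimal reward for targeting the data distribution can be sketched as a short derivation. The sketch assumes a Bayes-optimal discriminator trained on balanced classes, a KL weight of 1, and densities taken in sample space rather than in the pretrained representation space. The symbols $p_{\text{data}}$, $p_\theta$, $D^\star$, $r$, and $\pi^\star$ are notation introduced here for illustration and are not necessarily the paper's.

```latex
% Bayes-optimal discriminator between data and base-model samples (balanced classes)
D^\star(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_\theta(x)},
\qquad
r(x) := \operatorname{logit} D^\star(x) = \log \frac{p_{\text{data}}(x)}{p_\theta(x)}.

% KL-regularized RL around the base model p_\theta with weight \beta
\pi^\star = \arg\max_\pi \; \mathbb{E}_{x\sim\pi}[r(x)] - \beta\,\mathrm{KL}(\pi \,\|\, p_\theta)
\;\Longrightarrow\;
\pi^\star(x) \propto p_\theta(x)\, e^{r(x)/\beta}.

% With \beta = 1 the base density cancels and the optimum is the data distribution
\pi^\star(x) \propto p_\theta(x)\,\frac{p_{\text{data}}(x)}{p_\theta(x)} = p_{\text{data}}(x).
```

In practice the discriminator is only an estimate and the KL weight need not be 1, so the learned policy would approximate the data distribution rather than match it.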

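2606.19264 2026-06-18 cs.LG cs.CL new submission

Structured Inference with Large Language Gibbs

大语言吉布斯结构化推理

Sanghyeok Choi, Henry Gouk, Esmeralda S. Whitammer

AI summary Proposes Large Language Gibbs, which uses an LLM's conditional distributions as transition operators for structured probabilistic inference. Iteratively resampling variables avoids order-dependent bias, and the approach is validated on synthetic distributions, consistent reasoning, and Bayesian structure learning.

Comments Code: https://github.com/hyeok9855/large-language-gibbs

Details
AI Chinese abstract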

大型语言模型(LLMs)中编码的知识可以作为描述复杂世界变量的结构化推理的基础,但以概率一致的方式访问这些知识构成了一个困难的推理问题。我们提出了大语言吉布斯,一种结构化概率推理方案,它使用LLM的条件分布作为转移算子。不是通过单次自回归生成来采样结构化对象,而是利用LLM的下一个标记条件分布,在给定其他变量的条件下迭代地重采样单个变量。这种方法避免了顺序依赖偏差,并产生一个反映所有局部条件分布之间折衷的平稳分布。我们将这种方法应用于从合成分布中采样、一致性推理任务和贝叶斯结构学习。结果表明,在通过噪声LLM条件分布可访问的世界先验下,MCMC中使用LLM条件分布是用于结构化概率推理的一次性生成的实际替代方案。

English abstract

The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a difficult inference problem. We propose Large Language Gibbs, a scheme for structured probabilistic inference that uses conditional distributions of an LLM as transition operators. Rather than sampling structured objects through single-pass autoregressive generation, we iteratively resample individual variables conditioned on others using an LLM's next-token conditionals. This approach avoids order-dependent biases and produces a stationary distribution that reflects a compromise between all local conditionals. We apply this approach to sampling from synthetic distributions, consistent reasoning tasks, and Bayesian structure learning. The results suggest that the use of LLM conditionals in MCMC is a practical alternative to one-pass generation for structured probabilistic inference under a world prior accessible through noisy LLM conditionals.
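The resampling loop described in this abstract can be illustrated with a toy systematic-scan Gibbs sampler. This is a minimal sketch, not the authors' implementation: the two functions `p_a_given_b` and `p_b_given_a` are hypothetical lookup tables standing in for the next-token conditionals an LLM would supply, and the binary variables `a` and `b` are invented for illustration.

```python
import random
from collections import Counter


# Hypothetical stand-ins for LLM conditionals: each returns
# P(variable = 1 | the other variable). A real system would query a model here.
def p_a_given_b(b: int) -> float:
    return 0.9 if b == 1 else 0.2


def p_b_given_a(a: int) -> float:
    return 0.8 if a == 1 else 0.1


def gibbs(num_steps: int, burn_in: int, seed: int = 0) -> Counter:
    """Resample each variable in turn, conditioned on the current value of the other."""
    rng = random.Random(seed)
    a, b = 0, 0
    counts = Counter()
    for step in range(num_steps):
        a = 1 if rng.random() < p_a_given_b(b) else 0
        b = 1 if rng.random() < p_b_given_a(a) else 0
        if step >= burn_in:
            counts[(a, b)] += 1  # record the joint state after a full sweep
    return counts


counts = gibbs(num_steps=20000, burn_in=1000)
total = sum(counts.values())
freqs = {state: n / total for state, n in sorted(counts.items())}
print(freqs)
```

The two tables above need not be consistent with any single joint distribution. The chain still settles into a stationary distribution, which is the sense in which the abstract speaks of a compromise between local conditionals.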

2606.19315 2026-06-18 cs.LG new submission

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

Diffusion-Proof:超越自回归生成的正式定理证明配方

Ruida Wang, Rui Pan, Pengcheng Wang, Shizhe Diao, Tong Zhang

Affiliations * University of Illinois Urbana-Champaign; NVIDIA

AI summary Proposes the Diffusion-Proof framework, described as the first to apply diffusion language models to formal theorem proving. Using whole-proof generation and local correction, it reports absolute gains of 1.61% on ProofNet and 6.14% on MiniF2F, and solves one IMO problem that DeepSeek-Prover-V2-7B could not.

Details
AI Chinese abstract

近年来,增强大型语言模型(LLMs)的形式数学推理能力已成为数学和计算机科学社区的关键焦点。虽然在使用最先进的自回归(AR)LLMs进行形式定理证明方面取得了显著进展,但这些模型存在固有局限性。它们的下一个词预测生成方法可能因长程连贯性挑战和长序列错误累积而导致次优性能。最近,扩散LLMs(dLLMs)通过多词块的迭代去噪生成文本,提供了一种有前景的替代方案。然而,dLLMs在形式数学中的应用(其中保持长程连贯性至关重要)仍然研究不足。为解决上述挑战,我们提出了**Diffusion-Proof**,据我们所知,这是第一个训练和应用dLLMs进行形式定理证明的框架。我们的框架包含两种模型的训练和推理方法。第一个是*dLLM-Prover-7B*,它执行具有长程连贯策略使用的全证明写作。第二个是*dLLM-Corrector-7B*,这是一种新颖的大块扩散校正模型。它利用dLLMs的填充能力,使用双向信息进行局部证明校正。大量实验表明,**Diffusion-Proof**相对显著优于在同一数据集上训练的AR LLM基线。与基线相比,**Diffusion-Proof**在ProofNet-Test和MiniF2F-Test基准上分别实现了**1.61%**和**6.14%**的绝对提升。值得注意的是,**Diffusion-Proof**成功解决了一个更先进的思考模型DeepSeek-Prover-V2-7B无法解决的IMO问题,展示了dLLMs在形式定理证明中的独特优势。

English abstract

Enhancing the formal math reasoning capabilities of Large Language Models (LLMs) has become a key focus in both mathematical and computer science communities in recent years. While significant progress has been made in using state-of-the-art Auto-Regressive (AR) LLMs for formal theorem proving, these models suffer from inherent limitations. Their next-token prediction generation methods may yield suboptimal performance due to the challenges of long-range coherence and the compounding of errors over long sequences. Recent advancements in diffusion LLMs (dLLMs), which generate text through iterative denoising of a multi-token block, offer a promising alternative. However, the application of dLLMs to formal mathematics, where maintaining long-range coherence is critical, remains largely understudied. To address the challenges above, we propose **Diffusion-Proof**, to the best of our knowledge the first framework to train and apply dLLMs for formal theorem proving. Our framework contains training and inference methods for two models. The first is *dLLM-Prover-7B*, which performs whole-proof writing with long-range coherent tactic usage. The second is *dLLM-Corrector-7B*, a novel large-block diffusion-based correction model. It leverages the in-filling capabilities of dLLMs to perform local proof correction using bi-directional information. Extensive experiments demonstrate that **Diffusion-Proof** significantly outperforms the AR LLM baseline trained on the same dataset. **Diffusion-Proof** achieves an absolute improvement of **1.61%** on ProofNet-Test and **6.14%** on MiniF2F-Test compared to the baseline. Notably, **Diffusion-Proof** successfully resolves one IMO problem that the more advanced thinking model DeepSeek-Prover-V2-7B could not solve, showcasing the unique advantage of dLLMs in formal theorem proving.

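2606.18290 2026-06-18 cond-mat.stat-mech cs.LG eess.SP cross-list

Stochastic Thermodynamics and SDE-based Generative Models

随机热力学与基于SDE的生成模型

Yaowen Zhang

Affiliations * GitHub

AI summary Within the framework of stochastic thermodynamics, this note defines trajectory-level work, heat, and entropy production for SDE-based generative models such as diffusion models and the Schrödinger bridge, and generalizes the Jarzynski identity and a second-law-like inequality.

Details
AI Chinese abstract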

基于SDE的生成模型,包括扩散模型和薛定谔桥,在信号处理任务中有着广泛的应用,如语音增强、图像恢复和时间序列生成。本文在随机热力学的背景下为这类模型提出了一个建模框架。本文的主要结果是功、热和熵产生的轨迹层面定义,以及一个推广的Jarzynski恒等式和一个类第二定律不等式。所提出的框架扩展了原始的Jarzynski设置,以适应时间依赖的浴温和非保守驱动力。这种热力学视角可能从非平衡统计力学的角度加深我们对扩散模型和薛定谔桥的理解。

English abstract

SDE-based generative models, including diffusion models and the Schrödinger bridge, have found broad applications in signal processing tasks such as speech enhancement, image restoration, and time-series generation. This note presents a modeling framework for such models within the context of stochastic thermodynamics. The main results of this note are trajectory-level definitions of work, heat, and entropy production, along with a generalized Jarzynski identity and a second-law-like inequality. The proposed framework extends the original Jarzynski setup to accommodate time-dependent bath temperature and nonconservative driving forces. This thermodynamic perspective may deepen our understanding of diffusion models and the Schrödinger bridge from a nonequilibrium statistical mechanics viewpoint.
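For orientation, the classical Jarzynski identity that the note sets out to generalize can be stated as follows. This is the textbook form, assuming a constant bath temperature $T$ and conservative driving; the abstract does not spell out the generalized version, so none is reproduced here. The symbols $W$ (work along a trajectory), $\Delta F$ (free-energy difference), and $\beta = 1/(k_B T)$ are standard notation rather than the note's own.

```latex
\begin{align*}
\big\langle e^{-\beta W} \big\rangle &= e^{-\beta\,\Delta F},
\qquad \beta = \frac{1}{k_B T}, \\
\langle W \rangle &\ge \Delta F
\quad \text{(by Jensen's inequality applied to the convex function } e^{-\beta W}\text{)}.
\end{align*}
```

The second line is the second-law-like statement: the average work over trajectories cannot fall below the free-energy difference, even though individual trajectories may.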

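2606.18354 2026-06-18 eess.IV cs.LG cross-list

Structural MRI Synthesis for Alzheimer's Disease via Conditional Diffusion on Anatomical Masks

基于解剖掩膜条件扩散的阿尔茨海默病结构MRI合成

Muge Zhang, Muhammad Ali Khaliq, Jamal Alsakran, Byeong Kil Lee, Jeeho Ryoo

Affiliations * Fairleigh Dickinson University; University of Colorado at Colorado Springs

AI summary To address the difficulty of capturing subtle anatomical changes when synthesizing structural MRI for Alzheimer's disease, the paper extends the Med-DDPM conditional diffusion model to generate 3D structural MRI conditioned on anatomical segmentation masks. Segmentation models trained on synthetic data reach Dice scores comparable to those trained on real data, and training on hybrid data improves performance markedly.

Details
Journal ref
2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR)
AI Chinese abstract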

生成式机器学习模型的最新进展显著改善了医学成像,为数据增强、隐私保护和模型泛化提供了有前景的解决方案。然而,由于神经退行性病变相关的细微、区域特异性和渐进性解剖变化,合成阿尔茨海默病(AD)的高质量结构MRI数据仍然具有挑战性。在本文中,我们将最初为脑肿瘤合成设计的Med-DDPM条件扩散模型扩展,以生成专门针对AD的3D结构MRI。我们采用Med-DDPM,因为与其他生成模型相比,它具有稳定的结构和保真度,特别适合捕捉AD特征的细微解剖变化。我们的方法以来自ADNI数据集的解剖分割掩膜为条件,将关键的AD相关脑结构纳入生成过程。我们通过在真实、合成和混合数据集上训练分割模型,系统评估了合成图像的质量和实用性。实验结果表明,仅在合成数据上训练的分割模型达到了与真实数据训练(0.6513)相当的Dice分数(0.6532),同时召回率显著提高。值得注意的是,在混合数据集(混合真实和合成图像)上训练的模型优于真实和纯合成基线,Dice分数达到0.7244。这些发现强调了条件扩散模型在生成解剖准确、AD特异性合成MRI方面的成功应用,并突出了它们在增强训练数据可用性、提高诊断准确性和促进神经影像研究可重复性方面的潜力。

English abstract

Recent advances in generative machine learning models have significantly improved medical imaging, offering promising solutions for data augmentation, privacy preservation, and improved model generalization. However, synthesizing high-quality structural MRI data for Alzheimer's Disease (AD) remains challenging due to the subtle, region-specific, and progressive anatomical changes associated with neurodegeneration. In this paper, we extend the Med-DDPM conditional diffusion model -- originally designed for brain tumor synthesis -- to generate 3D structural MRIs specifically tailored to AD. We adopted Med-DDPM due to its established stability and structural fidelity compared to other generative models, which makes it particularly suitable for capturing the subtle anatomical changes characteristic of AD. Our approach conditions the diffusion process on anatomical segmentation masks derived from the ADNI dataset, incorporating key AD-relevant brain structures into the generation process. We systematically evaluate the quality and utility of the synthetic images by training segmentation models on real, synthetic, and hybrid (mixed) datasets. Experimental results demonstrate that segmentation models trained exclusively on synthetic data achieve comparable Dice scores (0.6532) to those trained on real data (0.6513), while exhibiting significantly enhanced recall. Notably, models trained on hybrid datasets (mixing real and synthetic images) outperform both real and synthetic-only baselines, achieving a Dice score of 0.7244. These findings underscore the successful use of conditional diffusion models for generating anatomically accurate, AD-specific synthetic MRIs, and highlight their potential for enhancing training data availability, improving diagnostic accuracy, and promoting research reproducibility in neuroimaging studies.
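The Dice scores quoted above (0.6532, 0.6513, 0.7244) measure the overlap between a predicted and a reference segmentation mask. A minimal sketch of the metric follows, assuming binary masks flattened to 0/1 sequences; the function name `dice_score` and the toy masks are illustrative and not taken from the paper's code.

```python
def dice_score(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


# Toy flattened masks: 2 voxels overlap, each mask has 3 foreground voxels.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
score = dice_score(pred, target)
print(round(score, 4))
```

Because the denominator counts both masks, Dice penalizes missed foreground and spurious foreground alike, which is why it can stay flat while recall alone rises.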

2606.18790 2026-06-18 cs.SD cs.AI cs.LG cross-list

Closing the Loop: PID Feedback Control for Interpretable Activation Steering in Symbolic Music Generation

闭环:用于符号音乐生成中可解释激活引导的PID反馈控制

Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos, Themos Stafylakis

Affiliations * Athens University of Economics and Business; Orfium Research; Hellenic Mediterranean University; Archimedes / Athena Research Center

AI summary Proposes an inference-time activation steering framework based on PID feedback control. It extracts latent directions for pitch and duration with the difference-in-means method and decouples multi-attribute steering with Gram-Schmidt orthogonalization, enabling fine-grained, interpretable attribute modulation in symbolic music generation.

Comments Accepted at Learning to Listen: ICML 2026 Workshop on Machine Learning for Audio (43rd International Conference on Machine Learning - ICMLMLA26), 4 pages main (11 total), 2 figures

Details
AI Chinese abstract

基于Transformer的架构在生成复杂符号序列方面取得了显著进展,但在实现对离散信号属性的细粒度、可解释控制方面仍存在明显差距。本文研究了多轨音乐Transformer(MMT)的机制可解释性,并提出了一种无需重新训练即可通过推理时激活引导实现确定性属性调制的框架。利用差分均值(DiffMean)方法,我们在残差流中分离出信号属性(特别是音高和时长)的潜在方向。我们验证了该领域的线性表示假设,实现了引导幅度与属性偏移之间的高相关性。为了解决多属性引导中固有的特征纠缠问题,我们引入了一种利用Gram-Schmidt正交化的双引导框架。实验结果表明,与朴素向量加法相比,这种几何解耦减少了概念干扰和信号退化,即使在强自回归条件下也能实现独立的确定性控制。

English abstract

Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the Difference-in-Means (DiffMean) methodology, we isolate latent directions for signal attributes, specifically Pitch and Duration, within the residual stream. We validate the Linear Representation Hypothesis in this domain, achieving high correlation between steering magnitude and attribute shift. To address the inherent feature entanglement in multi-attribute steering, we introduce a Dual Steering framework utilizing Gram-Schmidt Orthogonalization. Experimental results demonstrate that this geometric decoupling reduces conceptual interference and signal degradation compared to naive vector addition, enabling independent deterministic control even against strong autoregressive conditioning.
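The two geometric steps named in this abstract, difference-in-means direction extraction and Gram-Schmidt decoupling, can be sketched on toy vectors. Everything below is an illustrative assumption: the 3-dimensional "activations", the helper names, and the steering strengths are invented, and the sketch does not model the PID controller mentioned in the title or any part of the MMT model.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def diff_mean(positive, negative):
    """Difference-in-means direction: mean(positive) - mean(negative)."""
    return [a - b for a, b in zip(mean(positive), mean(negative))]


def orthogonalize(v, u):
    """One Gram-Schmidt step: remove from v its projection onto u."""
    scale = dot(v, u) / dot(u, u)
    return [a - scale * b for a, b in zip(v, u)]


def steer(hidden, directions, strengths):
    """Add each steering direction to the hidden state with its own strength."""
    out = list(hidden)
    for direction, strength in zip(directions, strengths):
        out = [x + strength * d for x, d in zip(out, direction)]
    return out


# Hypothetical residual-stream activations for contrasting attribute values.
high_pitch = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
low_pitch = [[0.0, 1.0, 0.0], [2.0, 1.0, 0.0]]
long_duration = [[1.0, 3.0, 1.0], [3.0, 5.0, 1.0]]
short_duration = [[0.0, 1.0, 1.0], [2.0, 3.0, 1.0]]

pitch_dir = diff_mean(high_pitch, low_pitch)
raw_duration_dir = diff_mean(long_duration, short_duration)  # overlaps with pitch_dir
duration_dir = orthogonalize(raw_duration_dir, pitch_dir)    # overlap removed

steered = steer([0.0, 0.0, 0.0], [pitch_dir, duration_dir], [1.0, 0.5])
print(pitch_dir, duration_dir, steered)
```

After orthogonalization, pushing along `duration_dir` no longer moves the state along `pitch_dir`, which is the decoupling that naive addition of the raw directions lacks.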

2606.18856 2026-06-18 cs.CL cs.LG cross-list

Approximate Structured Diffusion for Sequence Labelling

近似结构化扩散用于序列标注

Nicolas Floquet, Joseph Le Roux, Nadi Tomeh

Affiliations * Université Sorbonne Paris Nord, CNRS, Laboratoire d’Informatique de Paris Nord, LIPN

AI summary Proposes a diffusion-based training method for conditional random fields (CRFs) that captures long-range dependencies by conditioning on noised labels. Combined with approximate inference, it achieves a 16.5% error reduction on POS tagging.

Details
AI Chinese abstract

序列标注是自然语言处理(NLP)的核心任务,涉及为输入句子的每个标记分配一个标签。从机器学习的角度来看,序列标注通常被建模为由神经网络参数化的线性链条件随机场(CRF)。虽然这种方法在经验上取得了良好结果,但CRF假设有限的决策跨度(例如标签二元组),这可能会限制其表达能力,并在需要长距离依赖时损害性能。我们证明可以利用扩散来训练一个以整个标签序列为条件的CRF,但条件是标签的噪声版本。实验表明,该方法结合近似CRF推理,在词性标注任务上实现了16.5%的错误率降低,提高了标签准确性。

English abstract

Sequence labelling, a core task of Natural Language Processing (NLP), consists in assigning each token of an input sentence a label. From a Machine Learning point of view, sequence labelling is often cast as a Linear-Chain Conditional Random Field (CRF) parametrised by a neural network. While this approach gives good empirical results, CRFs assume a finite decision span (e.g., label bigrams), which can limit their expressivity and hurt performance when long-range dependencies are required. We show that diffusion can be leveraged to train a CRF conditioned on an entire label sequence, with the caveat that the condition is on a noisy version of the labels. We show experimentally that this method, in conjunction with approximate CRF inference, improves label accuracy with a 16.5% error reduction for POS-tagging.

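2606.19005 2026-06-18 cs.CL cs.LG cross-list

Sumi: Open Uniform Diffusion Language Model from Scratch

Sumi: 从头训练的开放均匀扩散语言模型

Mengyu Ye, Keito Kudo, Wataru Ikeda, Ryosuke Matsuda, Keisuke Sakaguchi, Jun Suzuki

Affiliations * Tohoku University

AI summary Presents Sumi, a 7B-parameter uniform diffusion language model pretrained from scratch on 1.5T tokens. It performs comparably to autoregressive models of the same scale, and all resources are openly released.

Details
AI Chinese abstract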

扩散模型已成为自回归模型的有前途的替代方案。其中,均匀扩散语言模型(UDLM)允许在任何步骤更新任何token,原则上能够实现更灵活的生成。然而,目前还没有从零开始预训练的大参数规模和大token预算的UDLM。自回归建模和掩码扩散建模已经拥有大规模的可供社区研究和构建的模型;而均匀扩散模型则没有。大规模从头预训练的UDLM将为研究缩放行为、生成动态、可控性以及与现有自回归和掩码扩散模型的权衡提供一个干净的参考点。为此,我们引入了Sumi(日语中“墨水”的意思),一个完全开放的70亿参数均匀扩散语言模型,从零开始在1.5T tokens上预训练。Sumi在知识、推理和编码基准测试中与在可比token预算下训练的自回归模型表现相当,但在常识基准测试中表现较差,其中我们以教育为主的数据混合可能是原因之一。我们发布了模型权重、检查点和完整的训练方案,包括在公开可用的语料库上的数据混合的完整规范。我们希望这次发布能使社区研究大规模原生均匀扩散,并促进对其尚未很好理解的方面的研究。

English abstract

Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parameter scale and large token budget. Both autoregressive modeling and masked diffusion modeling already have capable models at scale that the community can study and build on; uniform diffusion has none. A scratch-pretrained UDLM at scale would provide a clean reference point for studying scaling behavior, generation dynamics, controllability, and trade-offs against established autoregressive and masked diffusion models. To this end, we introduce Sumi ("ink" in Japanese), a fully open 7B uniform diffusion language model pretrained from scratch on 1.5T tokens. Sumi performs competitively with autoregressive models trained at comparable token budgets on knowledge, reasoning, and coding benchmarks, while under-performing on commonsense benchmarks, where our education-heavy data mixture is a likely contributor. We release our model weights, checkpoints, and full training recipe, including a complete specification of the data mixture over publicly available corpora. We hope this release enables the community to study native uniform diffusion at scale and catalyzes work on its as-yet poorly understood aspects.

2602.11467 2026-06-18 cs.LG updated version

PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling

PRISM:一种用于可解释形状建模的三维概率神经表示

Yining Jiao, Sreekalyani Bhamidi, Carlton Jude Zdanski, Julia S Kimbell, Andrew Prince, Cameron P Worden, Samuel Kirse, Christopher Rutter, Benjamin H Shields, Jisan Mahmud, Marc Niethammer

Affiliations * Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA; Department of Computer Science, University of California San Diego, La Jolla, USA; School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, USA

AI summary Proposes the PRISM framework, which combines implicit neural representations with uncertainty-aware statistical shape analysis. A closed-form Fisher information metric enables efficient local temporal uncertainty quantification, and the method performs well on shape evolution, personalized prediction, and anomaly detection tasks.

Comments ICML 2026, camera-ready version, 24 pages

Details
AI Chinese abstract

理解解剖形状如何响应发育协变量而演变——并量化其空间变化的不确定性——在医疗保健研究中至关重要。现有方法通常依赖于忽略空间异质性动态的全局时间扭曲公式。我们引入PRISM,一种新颖的框架,将隐式神经表示与不确定性感知统计形状分析相结合。PRISM建模给定协变量下形状的条件分布,提供总体均值和协变量依赖不确定性在任意位置的空间连续估计。一个关键的理论贡献是封闭形式的Fisher信息度量,通过自动微分实现高效、解析可处理的局部时间不确定性量化。在三个合成数据集和一个临床数据集上的实验表明,PRISM在统一框架内从建模形状演化到个性化形状预测和异常检测等多样化任务中表现出色,同时提供可解释且临床有意义的不确定性估计。

English abstract

Understanding how anatomical shapes evolve in response to developmental covariates - and quantifying their spatially varying uncertainties - is critical in healthcare research. Existing approaches typically rely on global time-warping formulations that ignore spatially heterogeneous dynamics. We introduce PRISM, a novel framework that bridges implicit neural representations with uncertainty-aware statistical shape analysis. PRISM models the conditional distribution of shapes given covariates, providing spatially continuous estimates of both the population mean and covariate-dependent uncertainty at arbitrary locations. A key theoretical contribution is a closed-form Fisher Information metric that enables efficient, analytically tractable local temporal uncertainty quantification via automatic differentiation. Experiments on three synthetic datasets and one clinical dataset demonstrate PRISM's strong performance across diverse tasks - from modeling shape evolution to personalized shape prediction and anomaly detection - within a unified framework, while providing interpretable and clinically meaningful uncertainty estimates.

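2603.10718 2026-06-18 cs.LG updated version

Riemannian MeanFlow for One-Step Generation on Manifolds

Riemannian MeanFlow用于流形上的单步生成

Zichen Zhong, Haoliang Sun, Yukun Zhao, Yongshun Gong, Yilong Yin

Affiliations * School of Software, Shandong University, Jinan, China

AI summary Proposes Riemannian MeanFlow (RMF), which defines an average-velocity field via parallel transport and derives a Riemannian MeanFlow identity linking average and instantaneous velocities. This enables one-step generation in location-dependent tangent spaces on manifolds, with an improved quality-efficiency trade-off and lower sampling cost.

Comments ICML 2026

Details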

English abstract

Flow Matching enables simulation-free training of generative models on Riemannian manifolds, yet sampling typically still relies on numerically integrating a probability-flow ODE. We propose Riemannian MeanFlow (RMF), extending MeanFlow to manifold-valued generation where velocities lie in location-dependent tangent spaces. RMF defines an average-velocity field via parallel transport and derives a Riemannian MeanFlow identity that links average and instantaneous velocities for intrinsic supervision. We make this identity practical in a log-map tangent representation, avoiding trajectory simulation and heavy geometric computations. For stable optimization, we decompose the RMF objective into two terms and apply conflict-aware multi-task learning to mitigate gradient interference. RMF also supports conditional generation via classifier-free guidance. Experiments on spheres, tori, SO(3), and SE(3) demonstrate competitive one-step sampling with improved quality-efficiency trade-offs and substantially reduced sampling cost.
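To make the link between average and instantaneous velocities concrete, here is the flat (Euclidean) version of the identity, derived in two steps. This is a sketch of the Euclidean case only; the abstract states that the Riemannian version replaces plain averaging with parallel transport, and that form is not reproduced here. The symbols $u$ (average velocity), $v$ (instantaneous velocity), $z_t$ (state at time $t$), and $r < t$ are notation assumed for illustration.

```latex
\begin{align*}
(t - r)\, u(z_t, r, t) &= \int_r^t v(z_\tau, \tau)\, d\tau
&& \text{(definition of the average velocity)} \\
u(z_t, r, t) + (t - r)\, \frac{d}{dt} u(z_t, r, t) &= v(z_t, t)
&& \text{(differentiate both sides in } t\text{)} \\
u(z_t, r, t) &= v(z_t, t) - (t - r)\, \frac{d}{dt} u(z_t, r, t).
\end{align*}
```

The last line expresses the average velocity without any integral along the trajectory, which is what makes simulation-free training and one-step sampling possible in the flat case.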

2604.04342 2026-06-18 cs.LG stat.ML updated version

Generative models for decision-making under distributional shift

分布偏移下决策的生成模型

Xiuyuan Cheng, Yunqin Zhu, Yao Xie

Affiliations * Department of Mathematics, Duke University; H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology

AI summary Presents a unified framework based on flow- and score-based generative models that uses tools such as transport maps and velocity fields to handle decision problems under distributional shift, supporting robustness, conditional distribution generation, and uncertainty quantification.

Comments INFORMS TutORials in Operations Research, 2026

Details
AI Chinese abstract

许多数据驱动的决策问题使用从历史数据估计的名义分布来制定,而性能最终由可能发生偏移、依赖于上下文、部分观测或由压力引起的部署分布决定。本教程介绍了现代生成模型,特别是基于流和分数的方法,作为构建决策相关分布的数学工具。从运筹学的角度来看,它们的主要价值不在于无约束的样本合成,而在于通过传输映射、速度场、分数场和引导随机动力学来表示和变换分布。我们提出了一个基于前推映射、连续性、Fokker-Planck方程、Wasserstein几何和概率空间优化的统一框架。在此框架内,生成模型可用于学习名义不确定性、构建用于鲁棒性的受压或最不利分布,以及在侧信息和部分观测下生成条件或后验分布。我们还强调了代表性的理论保证,包括迭代流模型的前向-反向收敛、传输映射空间中的一阶极小极大分析,以及具有生成先验的后验采样的误差传递界。本教程为在分布偏移下使用生成模型进行场景生成、鲁棒决策、不确定性量化及相关问题提供了原则性的介绍。

English abstract

Many data-driven decision problems are formulated using a nominal distribution estimated from historical data, while performance is ultimately determined by a deployment distribution that may be shifted, context-dependent, partially observed, or stress-induced. This tutorial presents modern generative models, particularly flow- and score-based methods, as mathematical tools for constructing decision-relevant distributions. From an operations research perspective, their primary value lies not in unconstrained sample synthesis but in representing and transforming distributions through transport maps, velocity fields, score fields, and guided stochastic dynamics. We present a unified framework based on pushforward maps, continuity, Fokker-Planck equations, Wasserstein geometry, and optimization in probability space. Within this framework, generative models can be used to learn nominal uncertainty, construct stressed or least-favorable distributions for robustness, and produce conditional or posterior distributions under side information and partial observation. We also highlight representative theoretical guarantees, including forward-reverse convergence for iterative flow models, first-order minimax analysis in transport-map space, and error-transfer bounds for posterior sampling with generative priors. The tutorial provides a principled introduction to using generative models for scenario generation, robust decision-making, uncertainty quantification, and related problems under distributional shift.

2605.17232 2026-06-18 cs.LG math.ST stat.ML stat.TH updated version

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

离散扩散模型的维度无关收敛性:伴随方程诱导了正确的空间

Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis

Affiliations * Department of Mathematics; Oden Institute School of Data Science and Society; UCLA; University of Texas at Austin; UNC Chapel Hill; Computational and Applied Sciences Group; Department of Mathematics and Statistics; SRI International; University of Massachusetts Amherst

AI summary Proposes a unified adjoint-equation-based framework that gives dimension-free convergence guarantees in any integral probability metric (IPM), overcoming the limitations of traditional KL- and TV-based analyses on large state spaces.

Details
AI Chinese abstract

离散扩散已成为生成建模中的领先框架,广泛应用于语言、视觉和生物学等领域。然而,现有的收敛理论存在根本性局限。基于KL的分析在奇异先验如掩码分布下会发散,而总变差(TV)的界依赖于状态空间大小S,并在现代语言任务中变得无效,因为词汇表包含数以万计的标记。我们开发了一种统一的基于伴随方程的框架,建立了任何积分概率度量(IPM)下的维度无关收敛保证。到目前为止,我们的界是首个完全不依赖S且适用于掩码和均匀先验的。重要的是,我们的理论仅依赖于一个标准的速率矩阵正则性假设,并且兼容时间非齐次调度。四个新颖的技术推动了我们的改进:通过伴随方程在可观测空间中工作而不是直接处理概率测度,一种产生任何IPM界正则性分析,一种耦合论证在均匀转移下去除S依赖性,以及一种分数-边际抵消技术在掩码转移下去除S依赖性。因此,我们的框架与先前分析显著不同,并避免了路径空间-KL和现有TV方法的不足。除了收敛界外,我们的框架还提供了一种灵活的工具包,用于进一步理论研究离散扩散模型。

English abstract

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and applies to general priors. Five novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and score-marginal cancellation and exit-routing techniques that remove $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models, including principled choices of loss functions and dimension-free step complexity.

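2605.30920 2026-06-18 cs.LG updated version

Unsupervised Diffusion Solver for Combinatorial Optimization via Combinatorial Adjoint Matching

通过组合伴随匹配实现组合优化的无监督扩散求解器

Shengyu Feng, Tarun Suresh, Yiming Yang

Affiliations * Language Technologies Institute, Carnegie Mellon University; University of Illinois Urbana-Champaign

AI summary Proposes the Combinatorial Adjoint Matching (CAM) framework, which uses discrete adjoint dynamics and a stochastic control formulation to train discrete diffusion solvers without supervision, reaching performance competitive with supervised methods on a range of combinatorial optimization problems.

Comments ICML26

Details
AI Chinese abstract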

基于扩散的神经求解器在组合优化(CO)中显示出强大潜力,但现有方法通常依赖于使用大量近最优解进行监督训练。在这项工作中,我们将基于伴随的轨迹优化方法扩展到离散组合域。我们将基于扩散的CO表述为连续时间马尔可夫链上的随机控制问题,并引入离散伴随动力学,用于通过离散生成轨迹传播优化信号。基于这一表述,我们提出了组合伴随匹配(CAM),一种用于离散扩散求解器的无监督训练框架,具有结构化和低方差的轨迹级优化信号。实验上,CAM在多种组合优化问题上始终优于现有的无监督扩散基线,并与强大的监督扩散求解器甚至传统求解器性能相当。我们的代码可在 https://github.com/Shengyu-Feng/CAM 获取。

English abstract

Diffusion-based neural solvers have shown strong promise for combinatorial optimization (CO), but existing methods typically rely on supervised training with large collections of near-optimal solutions. In this work, we extend adjoint-based trajectory optimization methods to discrete combinatorial domains. We formulate diffusion-based CO as a stochastic control problem over Continuous-Time Markov Chains and introduce discrete adjoint dynamics for propagating optimization signals through discrete generative trajectories. Building on this formulation, we propose Combinatorial Adjoint Matching (CAM), an unsupervised training framework for discrete diffusion solvers with structured and low-variance trajectory-level optimization signals. Empirically, CAM consistently outperforms existing unsupervised diffusion baselines and achieves performance competitive with strong supervised diffusion solvers and even traditional solvers across diverse combinatorial optimization problems. Our code is available at https://github.com/Shengyu-Feng/CAM.

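2606.10466 2026-06-18 cs.LG cs.AI updated version

UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation

UPLOTS: 一种用于约束时间序列生成的统一预训练语言模型

Du Yin, Hao Xue, Jinliang Deng, Yang Yang, Shuang Ao, Arian Prabowo, Flora Salim

Affiliations * University of New South Wales; HKUST(GZ); BUAA

AI summary Proposes UPLOTS, a prompt-guided framework built on a unified pretrained language model. With dynamic multi-dataset loss re-weighting and prompt-to-pattern mapping, it supports constrained time-series generation across domains, and its generalization and data-augmentation benefits are validated on four benchmarks.

Details
AI Chinese abstract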

在时间序列生成中,现有方法通常为每个数据集手工设计或训练单独的模型,这阻碍了它们的可扩展性,并且未能利用跨领域的共享时间结构。为了解决这种碎片化问题,我们提出了UPLOTS,一种统一的、提示引导的语言模型框架,用于跨不同领域的约束时间序列生成。UPLOTS不是构建任务特定的模型,而是利用一个由学习到的约束提示引导的单一预训练transformer骨干网络,从而能够按需生成并精确控制模式。一个关键创新是我们的动态多数据集损失重加权和提示到模式映射,这使得UPLOTS能够在训练期间内化多样化的时间结构,并在推理时有条件地生成它们。我们在四个真实世界基准和多个约束设置(包括峰值周期、日历、负载水平和波动性模式)上评估了UPLOTS。额外的保留约束组合和下游预测实验进一步表明,UPLOTS能够泛化到原始峰值模式设置之外,并在真实数据稀缺的情况下改进数据增强。我们的代码和基线可在匿名GitHub仓库获取:https://anonymous.4open.science/r/UPLOTS-6C36。

English abstract

In time-series generation, existing approaches typically handcraft or train a separate model for each dataset, which hinders their scalability and fails to leverage shared temporal structures across domains. To address this fragmentation, we propose UPLOTS, a Unified, Prompt-guided Language model framework fOr constrained Time-Series Generation across diverse domains. Instead of building task-specific models, UPLOTS leverages a single pre-trained transformer backbone guided by learned constraint prompts, enabling on-demand generation with precise pattern control. One key innovation is our dynamic multi-dataset loss re-weighting and prompt-to-pattern mapping, which allows UPLOTS to internalize diverse temporal structures during training and conditionally generate them at inference. We evaluate UPLOTS on four real-world benchmarks and multiple constraint settings, including peak-period, calendar, load-level, and volatility patterns. Additional held-out constraint-combination and downstream forecasting experiments further demonstrate that UPLOTS generalizes beyond the original peak-pattern setting and improves data augmentation under scarce real-data regimes. Our code and baselines are available at an anonymous GitHub repo: https://anonymous.4open.science/r/UPLOTS-6C36.

2606.13795 2026-06-18 cs.LG updated version

DiPOD: Diffusion Policy Optimization without Drifting Apart

无漂移扩散策略优化

Haozhe Jiang, Haiwen Feng, Pieter Abbeel, Jiantao Jiao, Angjoo Kanazawa, Nika Haghtalab

Affiliations * University of California, Berkeley; Simons Institute for the Theory of Computing; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley

AI summary To address the instability of diffusion policy-gradient methods, proposes the DiPOD framework, which interleaves self-distillation with policy-improving gradient updates to maintain tight-bound behavior, yielding stable and efficient policy optimization.

Comments Project page: astro-eric.github.io/blogs/dipod/ Code: https://github.com/Astro-Eric/DiPOD-release

Details
AI Chinese abstract

RL后训练对于改进扩散策略越来越关键,但现有的扩散策略梯度方法往往不稳定,无法实现可靠的策略改进。我们确定原因是双重漂移现象:优化变分代理可能导致ELBO与真实对数似然分离,从而使产生的代理策略梯度与期望回报的真实策略梯度不对齐。我们提出\textbf{DiPOD},一种扩散策略优化框架,通过将自蒸馏与策略改进梯度更新交替进行,在整个训练过程中维持紧界行为。这导致了一个简单实用的算法:在每个扩散策略梯度更新中增加一个在策略ELBO正则化项。在扩散语言模型后训练和连续控制扩散策略中,DiPOD显著稳定了训练,并达到了比先前方法更高的奖励。

English abstract

RL post-training has become increasingly pivotal for improving diffusion policies, but existing diffusion policy-gradient methods are often unstable and cannot achieve reliable policy improvement. We identify the cause as the double-drift phenomenon: optimizing a variational surrogate can let the ELBO separate from the true log-likelihood, which then makes the resulting proxy policy gradient misaligned with the true policy gradient of expected return. We propose \textbf{DiPOD}, a diffusion policy optimization framework that maintains tight-bound behavior throughout training by interleaving self-distillation with policy-improving gradient updates. This leads to a simple and practical algorithm: augmenting each diffusion policy-gradient update with an on-policy ELBO regularizer. Across diffusion language model post-training and continuous-control diffusion policies, DiPOD substantially stabilizes training and reaches higher rewards than previous methods.

2502.07531 2026-06-18 cs.CV cs.AI cs.LG cs.MM updated version

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

VidCRAFT3: 面向图像到视频生成的相机、物体与光照控制

Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, Yanwei Fu

Affiliations * School of Data Science, Fudan University; Shanghai Innovation Institute; Zhejiang University; Huawei Noah’s Ark Lab; Westlake University; School of Data Science and MOE Frontiers Center for Brain Science, Fudan University; Fudan ISTBI–ZJNU Algorithm Centre for Brain-inspired Intelligence, Zhejiang Normal University

AI summary Proposes the VidCRAFT3 framework, which explicitly models cross-factor interactions among geometry, motion, and lighting to allow independent or joint control of camera motion, object motion, and lighting direction, achieving state-of-the-art control precision and visual consistency.

Comments Accepted to TVCG 2026

Details
AI Chinese abstract

可控图像到视频(I2V)生成将参考图像转换为由用户指定控制信号引导的连贯视频。虽然对相机运动、物体运动和光照的精确控制对于高保真创作至关重要,但现有方法通常独立处理这些因素,忽视了动态场景中视角、几何和光照之间的物理耦合,导致同时变化时出现阴影不匹配和透视漂移等视觉不一致问题。我们提出了VidCRAFT3,一个统一且灵活的I2V框架,显式建模几何、运动和光照之间的跨因素交互,实现对相机运动、物体运动和光照方向的独立或联合控制。Image2Cloud提供显式的3D几何先验以实现精确的相机运动控制。ObjMotionNet将稀疏物体轨迹编码为多尺度运动特征,以引导逼真的物体运动。空间三重注意力变压器通过光照交叉注意力整合光照方向,实现一致的重光照。为了解决联合标注数据的稀缺性,我们构建了VideoLightingDirection(VLD)数据集,包含精确的逐帧光照方向标注,并引入三阶段渐进训练策略,使得无需完全联合标注即可实现鲁棒学习。大量实验表明,VidCRAFT3在多种场景下的控制精度和视觉一致性上达到了最先进水平。

English abstract

Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential for high-fidelity creation, existing methods often treat these factors independently. This overlooks the physical coupling among viewpoint, geometry, and illumination in dynamic scenes, leading to visual inconsistencies such as mismatched shadows and perspective drift under simultaneous changes. We present VidCRAFT3, a unified and flexible I2V framework that explicitly models cross-factor interactions among geometry, motion, and illumination, enabling both independent and joint control over camera motion, object motion, and lighting direction. Image2Cloud provides explicit 3D geometric priors for accurate camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale motion features to guide realistic object motion. A Spatial Triple-Attention Transformer integrates lighting direction through lighting cross-attention for consistent relighting. To address the scarcity of jointly annotated data, we construct the VideoLightingDirection (VLD) dataset with accurate per-frame lighting direction annotations, and introduce a three-stage progressive training strategy that enables robust learning without fully joint annotations. Extensive experiments demonstrate that VidCRAFT3 achieves state-of-the-art performance in control precision and visual coherence across diverse scenarios.

2602.23006 2026-06-18 stat.ML cs.LG 版本更新

Regular Fourier Features for Nonstationary Gaussian Processes

非平稳高斯过程的规则傅里叶特征

Arsalan Jawaid, Abdullah Karatas, Jörg Seewig

发表机构 * Institute of Measurement and Sensor Technology University of Kaiserslautern-Landau(测量与传感器技术研究所 凯泽斯劳滕-兰道大学) Independent Researcher(独立研究者)

AI总结 提出规则傅里叶特征方法,通过直接离散化谱表示避免概率假设,实现非平稳高斯过程的低秩近似,并扩展至核学习。

Comments 11 pages (9 main + 2 suppl.), 5 figures, 2 tables

详情
AI中文摘要

模拟高斯过程需要从高维高斯分布中采样,其计算复杂度随采样点数量呈三次方增长。谱方法通过利用傅里叶表示并将谱密度视为适用于蒙特卡洛近似的概率分布来应对这一挑战。尽管这种概率解释对平稳过程有效,但对于非平稳情况则过于严格,因为非平稳过程的谱密度通常不是概率测度。我们针对可调和过程提出规则傅里叶特征以避免这一限制。我们的方法直接离散化谱表示,保留谱权重之间的相关结构,无需概率假设。在有限谱支撑假设下,这产生了一个高效的低秩近似,该近似一致且半正定。当谱密度未知时,该框架自然地扩展到基于数据的核学习。我们在局部平稳和可调和混合核(后者具有复值谱密度)上演示了该方法,并将核学习扩展应用于真实和合成数据。

英文摘要

Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample locations. Spectral methods address this challenge by exploiting the Fourier representation and treating the spectral density as a probability distribution suitable for Monte Carlo approximation. Although this probabilistic interpretation is valid for stationary processes, it is overly restrictive for the nonstationary case, where spectral densities are generally not probability measures. We propose regular Fourier features for harmonizable processes to avoid this limitation. Our method discretizes the spectral representation directly, preserving the correlation structure among spectral weights without requiring probability assumptions. Under a finite-spectral-support assumption, this yields an efficient low-rank approximation that is consistent and positive semi-definite by construction. When the spectral density is unknown, the framework extends naturally to kernel learning from data. We demonstrate the method on locally stationary and harmonizable mixture kernels, the latter with a complex-valued spectral density, and apply the kernel-learning extension to real and synthetic data.
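
The idea of discretizing the spectral representation directly, instead of sampling frequencies at random, can be illustrated in the simplest possible setting. The sketch below is not the authors' method for harmonizable (nonstationary) processes; it assumes a stationary one-dimensional squared-exponential kernel, and the function name `regular_fourier_features`, the cutoff `omega_max`, and the grid size `n_freq` are hypothetical choices made for illustration. Frequencies are placed on a regular midpoint grid over an assumed finite spectral support, and each feature is weighted by the spectral density times the grid spacing.

```python
import numpy as np

def regular_fourier_features(x, lengthscale=1.0, omega_max=8.0, n_freq=200):
    """Deterministic Fourier features for a 1-D squared-exponential kernel.

    Frequencies lie on a regular midpoint grid over [-omega_max, omega_max]
    (an assumed finite spectral support). Each frequency is weighted by the
    spectral density times the grid spacing, i.e. a quadrature rule.
    """
    x = np.asarray(x, dtype=float)
    d_omega = 2.0 * omega_max / n_freq
    omega = -omega_max + (np.arange(n_freq) + 0.5) * d_omega
    # Spectral density of k(tau) = exp(-tau^2 / (2 l^2)), normalised so that
    # k(tau) = integral of S(omega) * exp(i * omega * tau) d omega.
    density = lengthscale / np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (lengthscale * omega) ** 2)
    scale = np.sqrt(density * d_omega)
    phase = np.outer(x, omega)
    # Cosine and sine features together reproduce cos(omega * (x - x')).
    return np.hstack([scale * np.cos(phase), scale * np.sin(phase)])

x = np.linspace(0.0, 5.0, 50)
phi = regular_fourier_features(x)
k_approx = phi @ phi.T  # low-rank, positive semi-definite by construction
k_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
max_abs_error = np.max(np.abs(k_approx - k_exact))
```

Because the approximate kernel is an explicit Gram matrix `phi @ phi.T`, it is positive semi-definite regardless of how coarse the grid is; the grid only controls accuracy.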

2605.27478 2026-06-18 stat.ML cs.LG math.PR 版本更新

Triangular-Reference Schrödinger Bridges for Time Series Generation

三角参考薛定谔桥用于时间序列生成

Gabriele Bocchi

发表机构 * Arakne S.r.l.(阿拉克内公司)

AI总结 提出三角参考薛定谔桥框架,通过区间冻结的退化扩散参考和层次化潜在波动率结构,实现时间序列的保守生成,并保持熵最小化的变分核心。

详情
AI中文摘要

我们引入了用于时间序列的三角参考薛定谔桥(TR-SBTS),这是SBTS框架的一种保守扩展,其中布朗参考被替换为区间冻结的、可能退化的扩散参考,在潜在波动率水平的层次上呈三角形。该构造是在增广状态空间上的单一熵投影,变分约束在时间和潜在水平上联合施加,并通过相对熵的分解层次展开。SBTS的变分核心得以保留:熵最小化器是参考的h-变换,在每个冻结区间上,最优动力学在活跃协方差方向的仿射叶上具有对数梯度漂移公式,即使冻结协方差是秩亏的也成立。我们建立了冻结近似的稳定性以及相应正则化核估计量的收敛性。该构造通过一个有限维条件映射实现,该映射由三种互补的过去约简组成——块PCR摘要、由运行时冻结协方差累积量诱导的过去增量的参考感知马氏核,以及在同一参考度量下的过去窗口WLS漂移回归器——以及一个耦合的状态-协方差桥步骤,其中每个潜在水平为上一水平产生动态参考,并由协方差描述符总结;该构造在数值实验上进行了评估。

英文摘要

Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the \(h\)-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form \(b^\star(t,x)=A\,\nabla\log H(t,x)\), intrinsic to the active covariance directions when the frozen covariance \(A\) is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya--Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments.

2605.28690 2026-06-18 quant-ph cs.LG 版本更新

Latent-Conditioned Parameterized Quantum Circuits as Universal Approximators for Distributions over Quantum States

潜在条件参数化量子电路作为量子态分布的通用近似器

Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

发表机构 * Quantum Laboratory, Fujitsu Research, Fujitsu Limited(Fujitsu 研究所量子实验室, Fujitsu 有限公司)

AI总结 提出潜在条件参数化量子电路(LPQC),通过经典神经网络将潜在变量映射到量子电路参数,证明其在1-Wasserstein距离下是密度算子概率测度的通用近似器,并引入多模态潜在先验和专家混合电路架构缓解贫瘠高原问题。

Comments 21 pages, 11 figures (fix the proof and update appendix for barren plateaus analysis)

详情
AI中文摘要

量子模拟、量子化学和量子机器学习中的许多应用不仅需要单个量子态,还需要表征目标系统异质性的量子态系综。在变分和容错设置中,逐个状态地准备这样的系综是不可行的,这激发了生成式建模方法。我们引入了潜在条件参数化量子电路(LPQC),这是一种混合量子-经典框架,其中经典神经网络将从先验分布中采样的潜在变量映射到参数化量子电路的参数。我们证明了LPQC在1-Wasserstein距离下是密度算子概率测度的通用近似器,将经典通用近似定理扩展到量子分布设置。我们还引入了多模态潜在先验和专家混合电路架构,并表明它在优化过程中经验性地缓解了贫瘠高原问题。数值实验在合成多簇混合量子态系综和QM9衍生的3D分子结构系综上验证了该框架。在这些任务中,LPQC优于最近的量子生成基线,同时与典型的经典基线相比,在输出维度大幅降低的情况下保持竞争力。通过利用潜在空间中的经典表达能力,LPQC为量子生成建模提供了一条可行的途径。

英文摘要

Many applications in quantum simulation, quantum chemistry, and quantum machine learning require not a single quantum state but an ensemble of states characterizing the heterogeneity of a target system. Preparing such ensembles state-by-state is prohibitive in both variational and fault-tolerant settings, thereby motivating a generative modeling approach. We introduce latent-conditioned parameterized quantum circuits (LPQCs), a hybrid quantum-classical framework in which classical neural networks map a latent variable sampled from a prior distribution to the parameters of a parameterized quantum circuit. We prove that LPQCs are universal approximators for probability measures over density operators in the 1-Wasserstein distance, extending classical universal approximation theorems to the quantum-distribution setting. We additionally introduce a multimodal latent prior and a mixture-of-experts circuit architecture, and show empirically that the latent-conditioned parameterization alleviates the barren plateau problem during optimization, a behavior for which we provide rigorous partial guarantees. Numerical experiments validate the framework on a synthetic multi-cluster ensemble of mixed quantum states and on a QM9-derived ensemble of 3-D molecular structures. In these tasks, LPQC outperforms recent quantum generative baselines and matches the generation quality of a classical neural-network baseline, while requiring an output dimension that grows only linearly with the number of qubits rather than exponentially. By leveraging classical expressivity in the latent space, LPQCs offer a tractable route to quantum generative modeling.

2606.17491 2026-06-18 stat.ML cs.LG stat.ME 版本更新

A Bayesian Boolean Matrix Factorization with Application to Copy Number Analysis in Cancer

贝叶斯布尔矩阵分解及其在癌症拷贝数分析中的应用

Adolphus Wagala, Mehmet Samur, Giovanni Parmigiani

发表机构 * Department of Data Science, Dana-Farber Cancer Institute(数据科学部,达纳-法伯癌症研究所) Department of Biostatistics, Harvard T.H. Chan School of Public Health(生物统计学系,哈佛大学陈曾熙公共卫生学院)

AI总结 提出贝叶斯布尔矩阵分解(BBMF)模型,通过全共轭生成模型和稀疏先验实现布尔约束下的可解释因子分解,并应用于多发性骨髓瘤的染色体臂拷贝数变异分析,揭示肿瘤异质性的离散潜在结构。

详情
AI中文摘要

二值数据分解很常见,但实值方法忽略了离散性并产生难以解释的因子。布尔矩阵分解(BooMF)通过逻辑与和或运算将二值矩阵分解为两个低秩二值矩阵,将数据表示为可解释模式的布尔析取。在癌症基因组学中,BooMF可以揭示可能驱动肿瘤演化的协调特征变化,这与旋转或加性分解不同。大多数现有的BooMF方法是启发式的、贪婪的、对初始化敏感、容易陷入局部最优,并且不支持原则性的模型选择或不确定性量化。我们引入了贝叶斯布尔矩阵分解(BBMF),这是一个具有稀疏诱导先验的全共轭生成模型。它强制执行布尔约束,产生具有一致不确定性量化的可解释潜在因子,并允许具有封闭形式全条件分布的吉布斯采样。由于癌症演化通常涉及广泛、近乎同时的染色体数目变化(例如,全基因组复制后伴随不稳定性和选择),布尔分解比加性模型更自然地捕捉这些模式。应用于多发性骨髓瘤的臂级拷贝数变异数据(其中条目指示染色体臂扩增的存在/缺失),BBMF找到了一小组可解释的双团,将患者子集与反复共变的染色体臂联系起来,提供了肿瘤异质性的紧凑、生物学上有意义的总结,并展示了BBMF在复杂二值数据中发现离散潜在结构的实用性。

英文摘要

Binary data factorization is common, but real-valued methods ignore discreteness and yield hard-to-interpret factors. Boolean Matrix Factorization (BooMF) instead decomposes a binary matrix into two lower-rank binary matrices via logical AND and OR, expressing the data as a Boolean disjunction of interpretable patterns. In cancer genomics, BooMF can reveal coordinated feature changes that may drive tumor evolution, unlike rotational or additive decompositions. Most existing BooMF methods are heuristic, greedy, sensitive to initialization, prone to local optima, and do not support principled model selection or uncertainty quantification. We introduce Bayesian Boolean Matrix Factorization (BBMF), a fully conjugate generative model with sparsity-inducing priors. It enforces Boolean constraints, yields interpretable latent factors with coherent uncertainty quantification, and admits Gibbs sampling with closed-form full conditionals. Because cancer evolution often involves widespread, near-simultaneous chromosome-number changes (e.g., whole-genome duplication followed by instability and selection), Boolean factorizations capture these patterns more naturally than additive models. Applied to arm-level copy-number alteration data in multiple myeloma, where entries indicate presence/absence of chromosomal-arm amplifications, BBMF finds a small set of interpretable bicliques linking patient subsets to recurrently co-altered chromosomal arms, providing a compact, biologically meaningful summary of tumor heterogeneity and demonstrating BBMF's utility for uncovering discrete latent structure in complex binary data.
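
The Boolean product that BooMF is built on (logical AND inside each pattern, logical OR across patterns) can be written in a few lines. This is only a sketch of the product itself, not of the Bayesian model or its Gibbs sampler; the helper name `boolean_product` and the toy matrices are assumptions made for illustration.

```python
import numpy as np

def boolean_product(u, v):
    """Boolean matrix product: X[i, j] = OR over k of (U[i, k] AND V[k, j])."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    # The integer product counts how many patterns cover each cell; the
    # Boolean product only records whether at least one pattern does.
    return (u.astype(int) @ v.astype(int)) > 0

# Rows of u: which patterns each sample carries.
# Rows of v: which features each pattern switches on.
u = np.array([[1, 0], [1, 1], [0, 1]])
v = np.array([[1, 1, 0, 0], [0, 1, 1, 0]])
x = boolean_product(u, v)
additive = u @ v  # additive counterpart, for comparison
```

In the toy example the second sample carries both patterns, so the additive product puts a 2 in the shared feature while the Boolean product keeps it at 1, which is the sense in which overlapping patterns stay interpretable as a disjunction.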

5. 优化、泛化与理论分析 30 篇

2606.18303 2026-06-18 cs.LG cs.AI 新提交

A Link between Shock-wave Theory and Symmetry-reduced Stochastic Gradient Descent for Artificial Neural Networks

冲击波理论与人工神经网络对称约化随机梯度下降之间的联系

Taiki Miyagawa

发表机构 * NEC Corporation(NEC公司)

AI总结 本文通过微分几何、李群和流体力学,建立了冲击波理论与对称商化随机梯度下降学习动力学之间的显式数学联系,并应用于多种神经网络架构。

Comments Accepted to the 35th International Conference on Artificial Neural Networks (ICANN) 2026

详情
AI中文摘要

我们利用微分几何、李群理论和流体力学,在冲击波理论与随机梯度下降的对称商化学习动力学之间建立了显式的数学联系。具体而言,在商化参数对称性并应用局部熵粗粒化后,有效动力学满足商流形上的粘性Hamilton-Jacobi方程。此外,假设原始参数动力学可由商空间上的梯度场概括,粗粒化损失函数的梯度服从Burgers型方程,且可严格建立激波形成。我们将该理论应用于多层感知机、卷积神经网络、Transformer和平均场网络,并证明它们满足Hamilton-Jacobi或Burgers型方程。我们推测该框架也能为深度学习提供实用的诊断工具。在诸如Transformer等架构中,原始参数范数常因对称冗余而失真,可能产生误导,而对称校正的商可观测量为监测、预测和控制训练阶段转变提供了原则性基础。

英文摘要

We develop a mathematically explicit link between shock-wave theory and the symmetry-quotiented learning dynamics of stochastic gradient descent, drawing on differential geometry, Lie group theory, and fluid mechanics. Specifically, after quotienting parameter symmetries and applying local-entropy coarse-graining, the effective dynamics satisfy a viscous Hamilton--Jacobi equation on the quotient manifold. Moreover, under the assumption that the raw parameter dynamics can be summarized by a gradient field on the quotiented space, the gradient of the coarse-grained loss function obeys a Burgers-type equation, and shock formation can be established rigorously. We apply our theory to multilayer perceptrons, convolutional neural networks, Transformers, and mean-field networks, and show that they obey the Hamilton--Jacobi or Burgers-type equations. We conjecture that this framework also yields practical diagnostics for deep learning. In architectures such as Transformers, raw parameter norms are often distorted by symmetry redundancy and may therefore be misleading, whereas symmetry-corrected quotient observables provide a principled basis for monitoring, forecasting, and controlling training-phase transitions.

2606.18306 2026-06-18 cs.LG stat.ML 新提交

Fisher Width: A Geometric Measure of Complexity on Statistical Manifolds

Fisher宽度:统计流形上的几何复杂度度量

Vu Khac Ky

发表机构 * Department of Mathematics, FPT University(FPT大学数学系)

AI总结 提出Fisher宽度作为统计流形上高斯宽度的类比,利用Fisher信息度量局部几何,并证明其保持高斯宽度的关键性质,应用于Fisher-Lipschitz假设类的泛化界。

Comments 48 pages, 3 figures

详情
AI中文摘要

高斯宽度是高维概率、压缩感知、凸优化和学习理论中的一个核心几何复杂度度量。它量化了集合沿随机方向的平均延伸程度,从而捕捉了约束集、假设类和下降锥的有效维度。然而,这一概念本质上是欧几里得的。统计模型则具有由Fisher信息度量诱导的自然黎曼几何,其中方向根据统计可区分性而非环境欧几里得长度进行缩放。我们引入了Fisher宽度,即统计流形上高斯宽度的Fisher几何类比。在参数点$\theta$处,Fisher宽度将欧几里得恒等替换为局部度量张量$G(\theta)^{1/2}$,测量Fisher重缩放集的高斯宽度。这使得所得量对局部统计曲率敏感,且在光滑重参数化下不变。我们发展了Fisher宽度的基本理论,表明它保留了高斯宽度的关键结构特征,包括集中性、度量扰动稳定性以及与欧几里得基线的谱比较界,同时捕捉了欧几里得度量无法察觉的各向异性几何效应。作为应用,我们证明了Fisher-Lipschitz假设类的泛化界,并提出了可计算的估计量,在MNIST上对三个模型类进行了实证评估。Fisher宽度之于统计流形,正如高斯宽度之于欧几里得凸体。这项工作为研究弯曲统计流形上的复杂性和学习奠定了基础。

英文摘要

Gaussian width is a central geometric complexity measure in high-dimensional probability, compressed sensing, convex optimization, and learning theory. It quantifies the average extent of a set along random directions, thereby capturing the effective dimension of constraint sets, hypothesis classes, and descent cones. However, this notion is intrinsically Euclidean. Statistical models instead carry a natural Riemannian geometry induced by the Fisher information metric, where directions are scaled according to statistical distinguishability rather than ambient Euclidean length. We introduce Fisher width, a Fisher-geometric analogue of Gaussian width for statistical manifolds. At a parameter point $θ$, Fisher width replaces the Euclidean identity by the local metric tensor $G(θ)^{1/2}$, measuring the Gaussian width of the Fisher-rescaled set. This makes the resulting quantity sensitive to local statistical curvature and invariant under smooth reparameterizations. We develop the basic theory of Fisher width, showing that it retains key structural features of Gaussian width, including concentration, metric perturbation stability, and spectral comparison bounds with the Euclidean baseline, while also capturing anisotropic geometric effects invisible to Euclidean measures. As an application, we prove a generalization bound for Fisher-Lipschitz hypothesis classes and propose computable estimators, which we evaluate empirically on MNIST across three model classes. Fisher width is to statistical manifolds what Gaussian width is to Euclidean convex bodies. This work lays the foundation for studying complexity and learning on curved statistical manifolds.
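
Read literally, the abstract describes Fisher width as the Gaussian width of the set after rescaling by $G(\theta)^{1/2}$. A minimal Monte Carlo sketch of that reading, for a finite point set, is given below. The function names, the sample count, and the toy Fisher matrix are assumptions made for illustration; the paper's formal definition and its proposed estimators may differ.

```python
import numpy as np

def gaussian_width(points, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max over t in T of <g, t>] for a finite set T."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, points.shape[1]))
    return float(np.mean(np.max(g @ points.T, axis=1)))

def fisher_width(points, fisher_matrix, n_samples=20000, seed=0):
    """Gaussian width of the set rescaled by the square root of the metric."""
    # Symmetric PSD square root via eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(fisher_matrix)
    sqrt_g = (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T
    # sqrt_g is symmetric, so each row t becomes (sqrt_g @ t) transposed.
    return gaussian_width(points @ sqrt_g, n_samples, seed)

points = np.array([[1.0, 0.0], [-1.0, 0.0]])  # T = {+e1, -e1}
fisher = np.diag([4.0, 1.0])                  # hypothetical local metric
euclidean = gaussian_width(points)
rescaled = fisher_width(points, fisher)
```

For this toy set the Euclidean width is $\mathbb{E}|g_1| = \sqrt{2/\pi}$, and stretching the first coordinate by $\sqrt{4} = 2$ doubles it, which shows the anisotropy a Euclidean measure would miss.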

2606.18420 2026-06-18 cs.LG q-bio.QM stat.ML 新提交

Measurement noise limits the advantage of nonlinear models over linear models in biomedical prediction

测量噪声限制了非线性模型在生物医学预测中相对于线性模型的优势

Marc-Andre Schulz, Kerstin Ritter

发表机构 * Hertie Institute for AI in Brain Health, University of Tübingen(赫蒂人工智能脑健康研究所,图宾根大学) Tübingen AI Center, University of Tübingen(图宾根人工智能中心,图宾根大学) Department of Psychiatry and Neurosciences, Charité – Universitätsmedizin Berlin(精神病学与神经科学系,柏林夏里特医学院) Bernstein Center for Computational Neuroscience, Berlin(伯恩斯坦计算神经科学中心,柏林) German Center for Mental Health (DZPG), partner site Tübingen(德国心理健康中心(DZPG),图宾根合作站点)

AI总结 本文指出,在生物医学表格数据中,测量噪声会削弱非线性结构,导致非线性模型与线性模型性能相当,并提出了一个精确的超额风险恒等式,揭示了测量可靠性、样本量和特征表示三个条件必须同时满足才能体现非线性优势。

详情
AI中文摘要

在生物医学表格数据上,诸如深度网络、梯度提升树和核方法等灵活模型,在给定相同特征的情况下,反复被线性回归和逻辑回归匹配或击败。通常的反应是将其视为模型方面的不足,需要通过更多数据、更好的架构或调参来修复,假设非线性结构存在而模型未能捕捉到。我们认为,当限制因素是测量而非模型时(这在生物医学中经常发生),这些修复无法奏效。加性噪声模糊了群体最优预测器,并且由于模糊在去除函数的广泛形状之前先去除精细、快速变化的细节,它比线性结构更快地抹去非线性结构。一个k阶交互作用被特征可靠性的k次幂衰减,而线性部分只衰减一次。在生物医学测量典型的可靠性下,即使底层生物学是强非线性的,非线性优势也可能消失,并且噪声所移除的部分无法通过更大的队列或更灵活的模型恢复,只能通过更好的测量。非线性是隐藏的,而非缺失,线性模型与灵活模型之间的平局本身并不能对生物学做出定论。这些片段是经典的,来自测量误差统计、心理测量学和高斯分析,我们将它们组合成一个精确的超额风险恒等式。测量可靠性是与样本量和特征表示并列的三个条件之一,必须对齐才能使灵活模型发挥作用,而它们共同只留下一个狭窄的窗口,大多数生物医学任务落在此窗口之外。在140个英国生物银行任务中,灵活模型与线性模型之间的差距(如果存在)带有预测的噪声特征,并且这三个条件可以通过干预而非仅通过基准测试来分离。

英文摘要

On biomedical tabular data, flexible models such as deep networks, gradient-boosted trees, and kernel methods are repeatedly matched or beaten by linear and logistic regression given the same features. The usual reaction is to treat this as a model-side shortfall, to be fixed with more data, a better architecture, or tuning, on the assumption that the nonlinear structure is there and the model has failed to capture it. We argue that these fixes cannot help when the binding limit is the measurement rather than the model, as it frequently is in biomedicine. Additive noise blurs the population-optimal predictor, and because blurring removes a function's fine, rapidly varying detail before its broad shape, it erases nonlinear structure faster than linear structure. A degree-$k$ interaction is attenuated by the $k$-th power of feature reliability, while the linear part is attenuated only once. At the reliabilities typical of biomedical measurement, the nonlinear advantage can vanish even when the underlying biology is strongly nonlinear, and what the noise removes cannot be recovered by a larger cohort or a more flexible model, only by better measurement. The nonlinearity is hidden, not absent, and a tie between linear and flexible models is not by itself a verdict on the biology. These pieces are classical, drawn from measurement-error statistics, psychometrics, and Gaussian analysis, and we assemble them into an exact excess-risk identity. Measurement reliability is one of three conditions, alongside sample size and feature representation, that must align for a flexible model to help, and together they leave only a narrow window that most biomedical tasks fall outside. Across 140 UK Biobank tasks, the gap between flexible and linear models, where it exists, carries the predicted noise signature, and the three conditions can be separated by intervention but not by a benchmark alone.
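
The claim that a degree-$k$ interaction is attenuated by the $k$-th power of reliability can be checked numerically in the simplest case. The sketch below is an illustration under stated assumptions, not the paper's excess-risk identity: two independent standard-normal features, independent additive Gaussian noise, reliability defined as true variance over observed variance, and $k = 2$. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
noise_var = 1.0
reliability = 1.0 / (1.0 + noise_var)  # true variance / observed variance

z = rng.standard_normal((n, 2))                            # true features
x = z + np.sqrt(noise_var) * rng.standard_normal((n, 2))   # noisy measurements

def ols_slope(feature, target):
    """Slope of a one-variable least-squares fit with intercept."""
    feature = feature - feature.mean()
    target = target - target.mean()
    return float(feature @ target / (feature @ feature))

# Linear signal z1, predicted from its noisy measurement x1.
linear_slope = ols_slope(x[:, 0], z[:, 0])
# Degree-2 interaction z1*z2, predicted from the noisy product x1*x2.
interaction_slope = ols_slope(x[:, 0] * x[:, 1], z[:, 0] * z[:, 1])
```

With reliability 0.5, the linear slope lands near 0.5 while the interaction slope lands near 0.25, the square of the reliability. Adding more samples tightens both estimates but does not move them toward 1, in line with the abstract's point that a larger cohort cannot restore what the noise removed.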

2606.18465 2026-06-18 cs.LG cs.AI 新提交

What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy

权重范数在Grokking中控制什么?交叉熵下的对数尺度中介作用

Truong Xuan Khanh

发表机构 * H&K Research Studio, Clevix LLC

AI总结 本文通过固定权重范数并改变输出温度,发现Grokking延迟主要由对数尺度(logit scale)决定,权重范数仅通过影响对数尺度间接起作用。

Comments 16 pages, 10 tables and 4 figures. Code and data to reproduce all numbers, tables, and figures: https://github.com/ClevixLab/grokking-logit-scale

详情
AI中文摘要

Grokking,即从记忆到泛化的延迟跳跃,通常与权重范数相关:范数越小,泛化越早。我们探究范数实际控制什么。通过钳位固定权重范数并仅改变输出温度,我们在交叉熵下将Grokking延迟滑动到其整个范数诱导范围;将有效对数尺度匹配回基线可恢复两个模数下约85%的延迟。在范数和温度的网格上,延迟仅由对数尺度决定($R^2 = 0.97$),范数仅额外贡献1-2%。该效应依赖于损失函数:在均方误差下,对数尺度被固定,范数通过不同路径起作用。记忆控制、float64 softmax崩溃审计和无LayerNorm的Transformer均指向同一通道。从同一状态分叉,延迟遵循钳位的范数值而非钳位操作本身,这排除了重缩放伪影。近端变量是对数尺度及其驱动的softmax饱和;权重范数仅是上游手柄。所有数字、表格和图表均可从发布的代码和数据中复现。

英文摘要

Grokking, the delayed jump from memorization to generalization, is usually tied to the weight norm: a smaller norm generalizes sooner. We ask what the norm actually controls. Holding the weight norm fixed by clamping and varying only an output temperature, we slide the grokking delay across its entire norm-induced range under cross-entropy; matching the effective logit scale back to baseline recovers about 85% of the delay at two moduli. Across a grid of norms and temperatures the delay collapses onto the logit scale alone ($R^2 = 0.97$), with the norm adding 1-2% beyond it. The effect is loss-dependent: under mean-squared error the logit scale is pinned and the norm acts through a different route. A memorization control, a float64 softmax-collapse audit, and a no-LayerNorm transformer point to the same channel. Forking arms from one identical state, the delay follows the held norm value and not the clamp operation, which closes a rescaling-artifact concern. The proximal variable is the logit scale and the softmax saturation it drives; the weight norm is only an upstream handle. All numbers, tables, and figures reproduce from released code and data.

2606.18538 2026-06-18 cs.LG stat.ML 新提交

Effects of sparsity and superposition on loss in simple autoencoders

稀疏性与叠加对简单自编码器损失的影响

Mriganka Basu Roy Chowdhury, Eric McLaughlin Weiner

发表机构 * Department of Statistics, UC Berkeley(加州大学伯克利分校统计学系) Department of Materials Science, UC Berkeley(加州大学伯克利分校材料科学系)

AI总结 研究神经网络中多语义性源于叠加现象,通过数学分析稀疏输入下自编码器的L2重构损失上下界,验证并扩展了Elhage等人的实证结果。

Comments 16 pages, 3 figures

详情
AI中文摘要

神经网络机械可解释性的主要困难之一是出现多语义性,即每个神经元通常负责多个不同任务,阻碍了对其功能的清晰解释。Elhage等人(2022)的开创性论文认为,这是由于叠加现象,即神经网络将不同特征表示为低维空间中的非正交方向,这种策略可以在不牺牲保真度的情况下实现更大的数据压缩,因为输入向量具有特征稀疏性。Elhage等人(2022)在一个相当自然且简单的具有稀疏输入的自编码器中实证验证了这些假设。本文的贡献在于分析叠加现象发生和最优性的数学基础,同时严格证实了他们的一些发现。特别地,我们为幂激活函数提供了L2重构损失的上界和下界,在非常稀疏的情况下是紧的。文末还包含一个简短的开放问题列表。

英文摘要

One of the major difficulties in the mechanistic interpretability of neural networks is the occurrence of polysemanticity, which suggests that each neuron is typically responsible for multiple different tasks, impeding a clean interpretation of its function. The seminal paper of Elhage et al. (2022) argues that this occurs due to superposition, a phenomenon where the neural network represents distinct features as non-orthogonal directions in a lower-dimensional space, a strategy that allows much greater compression of the data without sacrificing fidelity due to the feature sparsity of input vectors. Elhage et al. (2022) empirically validate these hypotheses in a rather natural and simple autoencoder with sparse inputs. The contribution of the present work is to analyze the mathematical basis for the occurrence and optimality of superposition, while rigorously corroborating some of their findings. In particular, we provide upper and lower bounds for the L2 reconstruction loss, tight in the very sparse regime, for power activation functions. A short list of interesting open problems is also included at the end.

2606.18778 2026-06-18 cs.LG stat.ML 新提交

Online Distributional Prediction via Latent Cluster Geometry Under Drift and Corruption

漂移与数据污染下基于潜在簇几何的在线分布预测

Navyansh Mahla, Prateek Chanda, Ganesh Ramakrishnan

发表机构 * Indian Institute of Technology, Bombay(印度理工学院,孟买)

AI总结 针对非平稳流中的在线分布预测问题,提出一种基于潜在簇几何的吉布斯准后验方法,通过可逆跳跃MCMC采样变维后验,并引入重启变体应对漂移,在亚线性污染预算和运输代价下实现亚线性Wasserstein遗憾。

详情
AI中文摘要

非平稳流中的在线学习通常被表述为跟踪点估计,但许多应用需要预测完整的数据生成分布。我们研究漂移和对抗性数据污染下的在线分布预测。我们的方法通过潜在簇几何表示每个候选律:一个可变大小的中心配置,组织概率质量并诱导预测分布。这些配置上的吉布斯准后验通过后验平均产生在线预测器,所得变维后验可通过可逆跳跃MCMC采样。因此,该方法避免了指定参数化流律,同时保留了用于不确定性、正则化和比较的结构化潜在空间。我们通过累积Wasserstein-1遗憾相对于时变真实律来评估性能。分析分离了两种效应:污染扰动基于损失的后验更新,而漂移使长时域后验记忆过时。我们通过一个重启变体来解决后者,该变体在时间上局部化相同的准贝叶斯更新。所得的高概率界分解为PAC-Bayesian复杂度项、对污染敏感的后验扰动项以及由\(A_T^{\mathrm{OT}}=\sum_{t=2}^T W_2^2(p_{t-1}^*,p_t^*)\)驱动的动态最优传输项。在有界支撑、稳定潜在几何、预测映射正则性、预言可实现性、局部化重启窗口、亚线性传输作用和亚线性污染预算下,重启预测器实现了亚线性累积Wasserstein遗憾。这些保证不需要对流、漂移机制或污染过程进行参数化建模。

英文摘要

Online learning in non-stationary streams is often formulated as tracking a point estimate, but many applications require predicting the full data-generating distribution. We study online distributional prediction under drift and adversarial corruption. Our approach represents each candidate law through a latent cluster geometry: a variable-size configuration of centers that organizes probability mass and induces a predictive distribution. A Gibbs quasi-posterior over these configurations yields an online predictor by posterior averaging, and the resulting variable-dimensional posterior can be sampled with reversible-jump MCMC. The method therefore avoids specifying a parametric streaming law while retaining a structured latent space for uncertainty, regularization, and comparison. We evaluate performance by cumulative Wasserstein-1 regret against the time-varying true law. The analysis separates two effects: corruption perturbs the loss-based posterior update, whereas drift makes long-horizon posterior memory stale. We address the latter with a restarted variant that temporally localizes the same quasi-Bayesian update. The resulting high-probability bounds decompose into a PAC-Bayesian complexity term, a corruption-sensitive posterior perturbation term, and a dynamic optimal-transport term driven by \(A_T^{\mathrm{OT}}=\sum_{t=2}^T W_2^2(p_{t-1}^*,p_t^*)\). Under bounded support, stable latent geometry, predictive-map regularity, oracle realizability, localized restart windows, sublinear transport action, and sublinear corruption budget, the restarted predictor achieves sublinear cumulative Wasserstein regret. These guarantees require no parametric model for the stream, drift mechanism, or corruption process.

2606.18834 2026-06-18 cs.LG 新提交

Identifying Structural Biases from Causal Mechanism Shifts

从因果机制变化中识别结构性偏差

Praharsh Nanavati, Jilles Vreeken, David Kaltenpoth

发表机构 * CISPA Helmholtz Center for Information Security(CISPA赫尔姆霍茨信息安全中心)

AI总结 提出利用环境间机制变化识别隐藏混淆和选择偏差,基于互信息构建可检验准则,并设计StruBI算法,在合成和真实数据上显著优于现有方法。

详情
AI中文摘要

因果发现方法通常假设所有数据独立同分布(i.i.d.),且系统中没有未测量的变量影响。在实践中,这些假设经常被违反,导致推断不准确。在本文中,我们研究如何从因果机制变化中识别隐藏混淆和选择偏差。特别地,我们表明结构性偏差会导致依赖的机制变化。也就是说,通过考虑在不同环境下的数据中哪些变量的机制发生了变化,我们可以判断哪些变量是无偏的,哪些受到隐藏混淆的影响,哪些正在经历选择偏差。我们将此形式化为一个基于互信息的经验可检验准则,并展示在哪些条件下它能识别结构性偏差。为了判断哪些节点受到何种偏差的影响,我们引入了StruBI算法。在合成和真实数据上的实验表明,StruBI在实践中表现良好,准确恢复了受影响的变量集和偏差类型,以较大优势超越了现有技术水平。

英文摘要

Causal discovery methods commonly assume that all data is independently and identically distributed (i.i.d.) and that there are no unmeasured variables affecting the system. In practice, these assumptions are often violated, leading to inaccurate inference. In this paper, we study how to identify hidden confounding and selection biases from causal mechanism shifts. In particular, we show that structural biases lead to dependent mechanism shifts. That is, by considering for which variables the mechanisms change given data from different environments, we can tell which variables are unbiased, which are subject to hidden confounding, and which are undergoing selection bias. We formalize this into an empirically testable criterion based on mutual information, and show under which conditions it identifies structural biases. To tell which nodes are subject to what kind of bias, we introduce the StruBI algorithm. Experiments on synthetic and real-world data show that StruBI works well in practice, accurately recovering affected variable sets and types of biases, outperforming the state-of-the-art by a wide margin.

2606.18918 2026-06-18 cs.LG cs.CC 新提交

Some Complexity Results for Robustness Verification for Binarized Neural Networks

二值化神经网络鲁棒性验证的一些复杂性结果

Harshit Goyal, Sudakshina Dutta

发表机构 * Indian Institute of Technology Goa(印度理工学院Goa)

AI总结 本文通过从布尔可满足性问题归约证明二值化神经网络的可满足性是NP完全的,并利用均匀遮挡导致的网络输出分段常数结构,提出多项式时间鲁棒性检查算法。

详情
AI中文摘要

本文研究了二值化神经网络(BNNs)验证问题的计算复杂性,其中激活函数(有时权重)是二值的。我们分析了两个问题:可满足性和均匀图像遮挡下的鲁棒性。我们通过从布尔可满足性问题(SAT)归约证明BNN可满足性是NP完全的,并且均匀遮挡在网络输出中诱导出分段常数结构,从而实现了多项式时间的鲁棒性检查算法。

英文摘要

This paper studies the computational complexity of verification problems for Binarized Neural Networks (BNNs), where activations (and sometimes weights) are binary. We analyze two problems: satisfiability and robustness under uniform image occlusion. We show that BNN satisfiability is NP-complete via a reduction from the Boolean satisfiability problem (SAT), and that uniform occlusion induces a piecewise-constant structure in the network output, enabling a polynomial-time robustness-checking algorithm.

2606.19036 2026-06-18 cs.LG 新提交

Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

稀疏混合专家模型中不连续性的几何与随机分析

Tho Tran Huu, Huu-Tuan Nguyen, Thien-Hai Nguyen, Nhat-Tri Ho, Viet-Hoang Tran, Tho Quan, Tan Minh Nguyen

发表机构 * Department of Mathematics, National University of Singapore, Singapore(新加坡国立大学数学系) Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), VNU-HCM, Ho Chi Minh City, Vietnam(胡志明市技术大学计算机科学与工程学院)

AI总结 本文对稀疏混合专家模型中的不连续性进行几何与随机分析,分类不连续阶数,建立渐近体积估计,证明随机路径几乎必然击中一阶不连续,并提出低开销平滑机制以提升性能。

Comments ICML 2026 Spotlight

详情
AI中文摘要

稀疏混合专家(SMoE)架构现已广泛应用于最先进的语言和视觉模型中,其中条件路由允许扩展到非常大的网络。然而,正是这种Top-$k$专家选择使得条件路由成为可能,同时也导致SMoE映射本质上不连续。在这些不连续曲面附近,即使任意接近的输入也可能激活截然不同的专家集,从而产生显著不同的输出。本文对这些不连续性进行了严格的几何和随机分析。首先,我们根据切换事件中并列专家的数量对不连续性进行阶数分类。利用测度论切片论证,我们建立了加厚不连续曲面的渐近体积估计,表明低阶不连续集占主导地位,而高阶不连续集占据的体积相对极小。接着,通过扩散过程对输入空间中的随机扰动建模,我们证明路径最终会遇到不连续,并且首次击中几乎必然发生在阶数为1的不连续上,同时给出了显式的有限时间概率界。我们进一步推导了占据时间界,量化了随机路径在每个不连续阶数邻域内停留的时长。这些理论结果表明输入更可能位于低阶不连续附近。受此启发,我们提出一种简单的平滑机制,可直接应用于现有SMoE,在接近不连续处软性地整合专家;我们的分析保证增加的额外计算开销很小,同时在不连续附近提供局部平滑,跨语言和视觉任务的实验表明,平滑不仅增强了SMoE映射的连续性,还提升了经验性能。

英文摘要

Sparse Mixture-of-Experts (SMoE) architectures are now widely deployed in state-of-the-art language and vision models, where conditional routing allows scaling to very large networks. However, this very Top-$k$ expert selection that enables conditional routing also renders the SMoE map inherently discontinuous. In the vicinity of these discontinuity surfaces, even inputs that are arbitrarily close may activate substantially different sets of experts, resulting in significantly different outputs. In this work we give a rigorous geometric and stochastic analysis of these discontinuities. We first classify them by order, determined by the number of tied experts at a switching event. Using measure-theoretic slicing arguments, we establish asymptotic volume estimates for the thickened discontinuity surfaces, showing that lower-order discontinuity sets dominate, whereas higher-order ones occupy a vanishingly small relative volume. Next, modeling random perturbations in the input space via a diffusion process, we prove that the path eventually encounters a discontinuity, and moreover that the first hit almost surely occurs on an order-1 discontinuity with explicit finite-time probability bounds. We further derive occupation-time bounds that quantify the duration the random path spends in the neighborhoods of each discontinuity order. These theoretical results imply that inputs are more likely to lie near lower-order discontinuities. Motivated by this insight, we propose a simple smoothing mechanism that can be directly applied to existing SMoEs, softly incorporating experts near discontinuities; our analysis guarantees that the added computational overhead remains small while providing localized smoothing near discontinuities, and experiments across language and vision tasks show that smoothing not only enforces continuity of the SMoE map but also enhances empirical performance.
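
The discontinuity described above is easy to reproduce in a toy router. The sketch below assumes the smallest possible case, Top-1 selection between two constant experts with a linear gate on a scalar input; the function `top1_moe`, the gate weights, and the expert values are all hypothetical, and the paper's smoothing mechanism is not implemented here.

```python
import numpy as np

def top1_moe(x, gate_weights, expert_values):
    """Route a scalar input to the single expert with the largest gate logit."""
    logits = gate_weights * x
    chosen = int(np.argmax(logits))
    return expert_values[chosen], chosen

gate_weights = np.array([1.0, -1.0])   # the two logits tie exactly at x = 0
expert_values = np.array([1.0, -1.0])  # two constant "experts"

# Two inputs on opposite sides of the tie, only 2e-9 apart.
out_right, expert_right = top1_moe(1e-9, gate_weights, expert_values)
out_left, expert_left = top1_moe(-1e-9, gate_weights, expert_values)
jump = abs(out_right - out_left)
```

The tie at `x = 0` involves two experts, which matches the lowest-order switching event in the abstract's classification: an arbitrarily small input change flips the selected expert and moves the output by a fixed amount.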

2606.19105 2026-06-18 cs.LG stat.ML 新提交

Smoothness-Based Derandomization of PAC-Bayes Bounds

基于光滑性的PAC-Bayes界去随机化

Alexandre Lemire Paquin, Brahim Chaib-Draa, Philippe Giguère

发表机构 * Department of Computer Science and Software Engineering(计算机科学与软件工程系) Université Laval(拉瓦尔大学)

AI总结 利用损失和预测器的光滑性,将Gibbs预测器去随机化为后验均值处的确定性预测器,通过Jensen间隙类的Rademacher复杂度控制泛化界,并导出涉及参数雅可比和海森矩阵的正则化器。

详情
AI中文摘要

我们研究光滑损失函数的PAC-Bayes去随机化。我们的目标是通过利用损失和预测器类的光滑性,获得对确定性预测器以高概率成立的泛化界。我们表明,从Gibbs预测器到后验均值处的确定性预测器的转换有一个精确的代价,由Jensen间隙类的泛化间隙给出。我们通过其Rademacher复杂度控制该类,从而得到涉及以参数雅可比和得分图的海森矩阵表示的平坦度量的确定性预测器界。该框架适用于有界和无界光滑损失函数,并将结果专门应用于线性预测器和光滑神经网络。最后,理论中出现的雅可比和海森矩阵量激发了一个实用的正则化器。对于BatchNorm网络,我们通过将BatchNorm变换折叠到相邻的仿射权重中,相对于有效的BatchNorm权重计算该正则化器。在CIFAR-10上的实验说明了该正则化器在不同批量大小下的行为。

英文摘要

We study PAC-Bayes derandomization for smooth loss functions. Our goal is to obtain generalization bounds that hold with high probability for deterministic predictors by exploiting smoothness properties of both the loss and the predictor class. We show that passing from the Gibbs predictor to the deterministic predictor at the posterior mean has a precise cost, given by the generalization gap of the Jensen gap class. We control this class through its Rademacher complexity, leading to bounds for deterministic predictors that involve flatness quantities expressed in terms of parameter Jacobians and Hessians of the score map. The framework applies to both bounded and unbounded smooth loss functions, and we specialize the results to linear predictors and smooth neural networks. Finally, the Jacobian and Hessian quantities appearing in the theory motivate a practical regularizer. For BatchNorm networks, we compute this regularizer with respect to effective BatchNorm weights obtained by folding the BatchNorm transformation into the adjacent affine weights. Experiments on CIFAR-10 illustrate the behavior of this regularizer under different batch sizes.

2606.19145 2026-06-18 cs.LG cs.AI cs.SY eess.SY 新提交

OrthoReg: Orthogonal Regularization for Hybrid Symbolic-Neural Dynamical Systems

OrthoReg:混合符号-神经动力系统的正交正则化

Till Richter, Niki Kilbertus

发表机构 * Technical University of Munich(慕尼黑工业大学) Helmholtz Munich(亥姆霍兹慕尼黑中心)

AI总结 针对混合建模中神经部分可能重复学习符号结构导致模型冗余的问题,提出正交正则化方法OrthoReg,直接惩罚符号与神经组件间的重叠,实现互补分解,提升符号恢复和分布外行为。

详情
AI中文摘要

动力系统是建模自然世界的基础,然而建模过程中存在持续的权衡:手动指定的机械模型设计上可解释但通常过于简单且设定错误;相反,灵活的数据驱动神经方法缺乏物理洞察。混合建模旨在通过结合指定的或基于符号的物理组件与灵活的神经网络来兼顾两者优势。然而,一个关键挑战是神经组件可能重新学习机械部分,产生冗余且不可解释的模型,特别是当符号结构本身是从数据中发现时。基于标准$L^2$正则化的现有方法依赖于投影论证,但当符号组件通过稀疏发现学习时,该论证失效,允许神经增强与符号结构重叠。我们引入\textbf{OrthoReg}(正交正则化),直接惩罚符号与神经组件之间的重叠,防止符号结构被神经残差吸收。这产生互补分解:符号部分捕捉库能表达的内容,神经部分捕捉剩余内容。在存在部分库不匹配的基准动力系统上,OrthoReg改善了符号恢复和分布外行为。

英文摘要

Dynamical systems are fundamental to modeling the natural world, yet modeling them involves a persistent trade-off: manually prescribed mechanistic models are interpretable by design but often overly simplistic and misspecified; in contrast, flexible data-driven neural methods lack physical insight. Hybrid modeling aims for the best of both worlds by combining a prescribed or symbolic, physics-based component with a flexible neural network. A critical challenge, however, is that the neural component may relearn mechanistic parts, yielding redundant and uninterpretable models, especially when the symbolic structure itself is discovered from data. Existing methods based on standard $L^2$ regularization rely on a projection argument that breaks when the symbolic component is learned through sparse discovery, allowing the neural augmentation to overlap with symbolic structure. We introduce \textbf{OrthoReg} (Orthogonal Regularization), which directly penalizes overlap between the symbolic and neural components, preventing symbolic structure from being absorbed by the neural residual. This yields a complementary decomposition: the symbolic part captures what the library can express, and the neural part captures what remains. On benchmark dynamical systems with partial library mismatch, OrthoReg improves symbolic recovery and out-of-distribution behavior.
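
The overlap penalty described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the regularizer, so the squared cosine similarity used here, and the names `overlap_penalty`, `hybrid_loss`, and `weight`, are assumptions for illustration only.

```python
import math

def overlap_penalty(symbolic_out, neural_out, eps=1e-12):
    """Squared cosine similarity between the two components' outputs on a batch.

    Assumed form of the penalty; the paper's regularizer may differ.
    """
    dot = sum(s * n for s, n in zip(symbolic_out, neural_out))
    norm_s = math.sqrt(sum(s * s for s in symbolic_out))
    norm_n = math.sqrt(sum(n * n for n in neural_out))
    cos = dot / (norm_s * norm_n + eps)
    return cos * cos

def hybrid_loss(target, symbolic_out, neural_out, weight=1.0):
    """Mean squared error of the summed prediction plus the weighted overlap penalty."""
    mse = sum((t - (s + n)) ** 2
              for t, s, n in zip(target, symbolic_out, neural_out)) / len(target)
    return mse + weight * overlap_penalty(symbolic_out, neural_out)
```

In this sketch the penalty is zero when the neural output is orthogonal to the symbolic output on the batch and approaches one when the two are collinear, which is the redundancy the abstract aims to discourage.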

2606.19179 2026-06-18 cs.LG cs.AI math.OC stat.ML 新提交

Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods

随机动量方法的计算效率与串行运行时间权衡

Depen Morwani, Alexandru Meterez, Pranav Nair, Sham Kakade

发表机构 * Harvard University(哈佛大学) Kempner Institute at Harvard University(哈佛大学凯普纳研究所)

AI总结 研究随机动量方法(如重球法和加速SGD)在一致线性回归中的批次大小权衡,证明重球法不改善SGD的计算效率前沿但允许更大批次减少串行运行时间,而加速SGD的计算效率与串行运行时间权衡依赖于谱衰减。

详情
AI中文摘要

随机动量方法,如重球法(HB)、Nesterov动量以及加速SGD(ASGD)的变体[Kidambi等人,2018],在现代训练中被广泛使用,但其随机优势取决于两个不同的量:串行运行时间(达到目标精度所需的迭代次数)和计算效率(CE,总梯度查询或FLOP成本的倒数)。只有当收缩间隙随批次大小线性增长时,更大的批次才能在不损害CE的情况下减少串行运行时间。我们研究了一致线性回归(具有高斯协变量)的随机HB和ASGD,并证明了其批次大小权衡的有限维离散时间下界。我们的第一个结果表明,HB不会改善任意谱下SGD的CE前沿;相反,它在更大的批次大小窗口内保持SGD级别的CE,允许更大的批次减少串行运行时间,直到HB达到其确定性加速尺度。这个窗口可能比SGD临界批次大小大$\sqrt{\kappa}$倍。对于ASGD,情况更依赖于谱:对于快速衰减的幂律谱,ASGD改善了小批次下的CE(相对于HB/SGD),但随着批次大小增加,它牺牲了这种CE优势以换取改进的串行运行时间。合成线性回归实验验证了这些定性区域,包括慢衰减谱下ASGD和HB的近乎重叠,以及快速衰减谱下预测的CE-串行权衡。

英文摘要

Stochastic momentum methods such as heavy ball (HB), Nesterov momentum, and variants of Accelerated SGD (ASGD) [Kidambi et al., 2018] are widely used in modern training, but their stochastic benefits depend on two distinct quantities: serial runtime, the number of iterations needed to reach a target accuracy, and compute efficiency (CE), the inverse total gradient-query or FLOP cost. Larger batches reduce serial runtime without hurting CE only when the contraction gap grows linearly with batch size. We study stochastic HB and ASGD for consistent linear regression with Gaussian covariates and prove finite-dimensional, discrete-time lower bounds on their batch-size tradeoffs. Our first result shows that HB does not improve the CE frontier over SGD for arbitrary spectra; rather, it preserves SGD-level CE over a larger batch-size window, allowing larger batches to reduce serial runtime until HB reaches its deterministic accelerated scale. This window can be a factor $\sqrt{\kappa}$ larger than the SGD critical batch size. For ASGD, the picture is more spectrum-dependent: for rapidly decaying power-law spectra, ASGD improves small-batch CE over HB/SGD, but as batch size grows it trades this CE advantage for improved serial runtime. Synthetic linear-regression experiments verify these qualitative regimes, including near-overlap of ASGD and HB for slowly decaying spectra and the predicted CE--serial tradeoff for rapidly decaying spectra.

2606.18515 2026-06-18 quant-ph cs.LG stat.ML 交叉投稿

Exponentially many initializations to avoid barren plateaus

指数多个初始化以避免贫瘠高原

Ankit Kulshrestha, Ricard Puig, Diego García-Martín, Lukasz Cincio, Ilya Safro, Zoë Holmes, M. Cerezo

发表机构 * Fujitsu Research of America, Santa Clara, CA 95054, USA(美国富士通研究院) University of Delaware, Newark, DE 19716, USA(特拉华大学) Department for Quantum Information and Computation at Kepler (QUICK), Johannes Kepler University, Linz, Austria(约翰内斯·开普勒大学量子信息与计算系) Information Sciences, Los Alamos National Laboratory, Los Alamos, NM 87545, USA(洛斯阿拉莫斯国家实验室信息科学部)

AI总结 提出一阶矩框架诊断初始化能否逃离完全集中的贫瘠高原不动点,发现避免贫瘠高原的初始化策略高度非唯一,存在指数多个不等价族,且不同初始化导致不同极小值。

Comments 18 + 27 pages, 5+4 figures, 1 Table

详情
AI中文摘要

贫瘠高原被描述为一种平均情况现象:选择一个拟设,天真地初始化,然后集中随之而来。这导致了一种普遍观点,即贫瘠高原的潜在治愈方法仅仅是更仔细地初始化参数。在这里,我们表明情况更为微妙。我们引入了一个一阶矩框架,该框架提供了一个简单的算子级诊断,用于判断初始化何时可能逃离完全集中的贫瘠高原不动点,并用于比较不同初始化策略引起的偏差。我们的框架恢复了几种已知的初始化方案,如恒等初始化和高斯初始化,但也表明避免贫瘠高原是高度非唯一的。实际上,许多平移、有偏和非对称的参数分布可以避免集中,并且这些选择不必等价。事实上,我们的结果表明,可以生成指数多个不等价的初始化策略族。然后,我们的数值实验表明,不同一阶矩不同的初始化可能导致不同的达到极小值,这表明通过智能初始化避免贫瘠高原可以将指数集中问题转化为从众多选项中选择正确可训练口袋的挑战。

英文摘要

Barren plateaus are stated as an average-case phenomenon: pick an ansatz, initialize it naively, and concentration follows. This has led to the common view that a potential cure for barren plateaus is simply to initialize the parameters more carefully. Here we show that the situation is subtler. We introduce a first-moment framework that gives a simple operator-level diagnostic for when an initialization may escape the fully concentrated barren-plateau fixed point, and for comparing the biases induced by different initialization strategies. Our framework recovers several known initialization schemes such as identity and Gaussian initialization, but also shows that barren-plateau avoidance is highly non-unique. Indeed, many shifted, biased, and non-symmetric parameter distributions can avoid concentration, and these choices need not be equivalent. In fact, our results show that one can generate exponentially many families of inequivalent initialization strategies. Then, our numerics indicate that different first-moment-distinct initializations can lead to different attained minima, suggesting that avoiding barren plateaus via smart initializations can trade the exponential concentration problem for the challenge of selecting the right trainable pocket amongst many options.

2606.18527 2026-06-18 stat.ML cs.LG 交叉投稿

Toward Simultaneously Optimal Regret in U-Calibration

面向同时最优遗憾的U-校准

Rafael Frongillo, Haipeng Luo, Nishant A. Mehta, Jon Schneider

发表机构 * University of Colorado Boulder(科罗拉多大学博尔德分校) University of Southern California(南加州大学) Google Research(谷歌研究)

AI总结 提出一种基于自和谐噪声的FTPL变体,实现对所有有界适当损失的最优$\tilde O(\sqrt{T})$遗憾和对光滑损失的对数遗憾。

Comments 30 pages; to appear at COLT 2026

详情
AI中文摘要

U-校准研究在线预测算法,其预测可被任何未知下游智能体使用,同时保证对所有适当损失函数的次线性遗憾。现有U-校准算法对每个有界适当损失实现了最坏情况最优的$O(\sqrt{T})$遗憾,但它们未能适应更简单的损失:如我们所示,即使对于平方损失等光滑损失,它们也会产生$\Omega(\sqrt{T})$遗憾,而不是最优的$O(\log T)$遗憾。在这项工作中,我们表明这一局限性并非固有。具体来说,我们设计了一个单一的预测算法,同时对所有有界适当损失实现$\tilde O(\sqrt{T})$遗憾,并对所有有界光滑适当损失实现$O(\log T)$遗憾。更一般地,我们的算法还对于相对于对数障碍光滑的损失(包括几个非Lipschitz例子)实现了对数遗憾。我们的方法基于一种新颖的跟随扰动领导者(FTPL)变体,其中使用自和谐噪声直接在预测空间中应用扰动。由于这种噪声的复杂性质,所得分析也大大偏离了先前的FTPL分析,可能具有独立意义。

英文摘要

U-calibration studies online forecasting algorithms whose predictions can be consumed by any unknown downstream agent, guaranteeing sublinear regret simultaneously for all proper loss functions. Existing U-calibration algorithms achieve worst-case optimal $O(\sqrt{T})$ regret for every bounded proper loss, but they fail to adapt to easier losses: as we show, even for smooth losses such as squared loss, they incur $\Omega(\sqrt{T})$ regret instead of the optimal $O(\log T)$ regret. In this work, we show that this limitation is not inherent. Specifically, we design a single forecast algorithm that simultaneously achieves $\tilde O(\sqrt{T})$ regret for every bounded proper loss and $O(\log T)$ regret for every bounded smooth proper loss. More generally, our algorithm also attains logarithmic regret for losses that are smooth relative to the log-barrier, which include several non-Lipschitz examples. Our approach is based on a novel variant of Follow-the-Perturbed-Leader (FTPL) in which perturbations are applied directly in the prediction space using self-concordant noise. The resulting analysis also departs substantially from prior FTPL analyses due to the complex nature of this noise and may be of independent interest.

2606.18679 2026-06-18 cs.DS cs.GT cs.LG math.OC 交叉投稿

Fair Online Resource Allocation

公平在线资源分配

Christopher En, Yuri Faenza, Andrea Lodi, Gonzalo Muñoz

发表机构 * Columbia University, IEOR Department(哥伦比亚大学工业工程与运筹学系) Cornell Tech(康奈尔科技学院) Universidad de Chile(智利大学)

AI总结 研究在线资源分配中的公平性问题,提出基于对偶镜像下降的算法,在批次内强制执行公平约束,实现亚线性遗憾,并通过难民数据验证了福利与公平的权衡。

Comments 3 pages, 4 figures. To appear in the proceedings of EC 2026

详情
AI中文摘要

我们研究公平在线资源分配问题,其动机源于难民安置和航班调度等应用,其中代理顺序到达并必须分配到容量有限的设施。我们引入一个模型,在资源约束和Lipschitz公平性要求下最大化整体福利,该要求确保同一批次中到达的相似代理获得相似的预期结果。我们首先分析离线问题,证明最优公平分配的价值至少是最优不公平分配的$\Omega(1/\gamma)$倍,其中$\gamma$是公平系数,从而界定了公平的代价。对于在线设置,我们提出一种基于对偶镜像下降的算法,该算法在估计最优对偶变量的同时,在批次内强制执行公平约束。我们证明该算法相对于最优离线流体基准实现了亚线性遗憾。最后,我们使用难民经济项目的真实数据验证了理论结果,展示了算法的性能,并考察了福利最大化与公平执行之间的权衡。

英文摘要

We study the problem of fair online resource allocation, motivated by applications such as refugee resettlement and airline scheduling, where agents arrive sequentially and must be assigned to facilities with limited capacities. We introduce a model that maximizes the overall welfare subject to resource constraints and a Lipschitz fairness requirement, which ensures that similar agents arriving in the same batch receive similar expected outcomes. We first analyze the offline problem, proving that the value of the optimal fair allocation is at least an $\Omega(1/\gamma)$ fraction of the optimal unfair allocation, where $\gamma$ is the fairness coefficient, thereby bounding the price of fairness. For the online setting, we propose an algorithm based on dual mirror descent that enforces fairness constraints within batches while estimating optimal dual variables. We prove that this algorithm achieves sublinear regret relative to the optimal offline fluid benchmark. Finally, we validate our theoretical results using real-world data from the Refugee Economies Programme, demonstrating the algorithm's performance and examining the trade-offs between welfare maximization and fairness enforcement.

2606.18807 2026-06-18 cs.DS cs.LG 交叉投稿

Learning Augmented Exact Exponential Algorithms

学习增强的精确指数时间算法

Tatiana Belova, Yuriy Dementiev, Danil Sagunov

发表机构 * ITMO University(ITMO大学)

AI总结 提出一种通用方法,利用略优于随机猜测的噪声预测器,可证明地减少NP难子集选择问题的搜索空间,运行时间加速随预测质量平滑扩展,且仅需预测的成对独立性或无需知道预测器精度。

详情
AI中文摘要

学习增强算法领域已经证明,机器学习预测可以在广泛的问题中绕过最坏情况下的下界。然而,到目前为止,关注点几乎完全集中在多项式时间算法上,其中预测改进了竞争比、近似保证或运行时间。在本文中,我们提出了一个问题:预测能否推动NP难问题的精确指数时间算法的前沿?我们通过提出一种通用方法对此问题给出肯定回答,该方法增强了一整类用于各种子集选择问题的最先进精确算法。我们表明,一个仅略优于随机猜测的噪声预测器足以可证明地减少搜索空间,并且由此产生的运行时间加速随预测质量平滑扩展。重要的是,我们的算法仅需要预测的成对独立性,或者,不需要知道预测器的精度——这两种设置都比通常假设的更弱且更现实。

英文摘要

The field of learning-augmented algorithms has demonstrated that machine-learned predictions can bypass worst-case lower bounds across a wide range of problems. So far, however, the focus has been almost exclusively on polynomial-time algorithms, where predictions improve competitive ratios, approximation guarantees, or running times. In this paper, we raise the question of whether predictions can push the frontier of exact exponential-time algorithms for NP-hard problems. We answer this question affirmatively by proposing a general approach that augments an entire family of state-of-the-art exact algorithms for a variety of subset selection problems. We show that a noisy predictor that is only marginally better than random guessing suffices to provably reduce the search space, and that the resulting runtime speedup scales smoothly with the prediction quality. Importantly, our algorithms require only pairwise independence of predictions or, alternatively, do not require the knowledge of the predictor's accuracy - both strictly weaker and more realistic settings than typically assumed.

2606.18993 2026-06-18 stat.ML cs.LG stat.ME 交叉投稿

Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

基于自适应投注的序列核条件独立性检验

Zheng He, Danica J. Sutherland

AI总结 提出一种对估计误差更鲁棒的序列条件独立性检验方法,通过自适应优化核条件独立性统计量、归一化及截断平移校准,在合成与真实数据上控制第一类错误并保持高功效。

Comments Published at ICML 2026: https://openreview.net/forum?id=vUMdIyTs9c

详情
AI中文摘要

检验条件独立性是基础但本质上困难的问题:在没有额外假设的情况下,通常无法控制第一类错误。“Model-X”范式通过假设精确知道相关条件分布来解决这一困难。虽然经典的一次性检验有时可以容忍对该假设的小偏差,但现有的序列条件独立性检验通常要求精确知道Model-X条件分布,这使得当必须估计该分布时它们变得脆弱。我们提出了一种新方法,对这类估计误差具有更强的鲁棒性。我们的方法将投注检验(testing-by-betting)应用于自适应优化的核条件独立性统计量,并结合归一化方案和截断-移位校准策略。这些修改大大减少了第一类错误膨胀,同时在高维合成基准和现实世界公平性任务中保持了高功效,优于现有的序列Model-X方法。代码可在 https://github.com/he-zh/SKCI 获取。

英文摘要

Testing conditional independence is fundamental yet intrinsically difficult: without additional assumptions, Type I error control is impossible in general. The "Model-X" paradigm addresses this difficulty by assuming exact knowledge of a relevant conditional distribution. While small deviations from this assumption can sometimes be tolerated in classical one-shot testing, existing sequential conditional independence tests typically require the Model-X conditional to be known exactly, making them fragile when it must instead be estimated. We propose a new approach that is substantially more robust to such estimation error. Our method applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, together with a normalization scheme and a truncate-and-shift calibration strategy. These modifications greatly reduce Type I error inflation while preserving high power across high-dimensional synthetic benchmarks and real-world fairness tasks, outperforming existing sequential Model-X approaches. Code is available at https://github.com/he-zh/SKCI.
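
For readers unfamiliar with testing-by-betting, the generic mechanism can be sketched as follows. This is not the authors' kernel statistic or their truncate-and-shift calibration: the function name `betting_test`, the fixed bet size, and the assumption that each payoff lies in [-1, 1] with nonpositive conditional mean under the null are all illustrative.

```python
def betting_test(payoffs, alpha=0.05, bet=0.5):
    """Return the 1-based step at which the null is rejected, or None.

    Assumes each payoff lies in [-1, 1] and has conditional mean <= 0 under
    the null, so the wealth process is a nonnegative supermartingale and
    Ville's inequality bounds the Type I error by alpha. `bet` must be in [0, 1].
    """
    wealth = 1.0
    for step, payoff in enumerate(payoffs, start=1):
        if not -1.0 <= payoff <= 1.0:
            raise ValueError("payoffs must lie in [-1, 1]")
        wealth *= 1.0 + bet * payoff   # stake a fraction of current wealth
        if wealth >= 1.0 / alpha:      # enough evidence against the null
            return step
    return None
```

A sequential test of this kind can stop as soon as the wealth crosses 1/alpha. A practical method would also adapt the bet over time, which the fixed `bet` here does not attempt.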

2606.19117 2026-06-18 stat.ME cs.LG econ.EM stat.ML 交叉投稿

Wasserstein Policy Learning for Distributional Outcomes

Wasserstein 策略学习用于分布性结果

Yiyan Huang, Cheuk Hang Leung, Qi Wu, Zhiheng Zhang

AI总结 针对分布值结果,提出基于Wasserstein重心和效用泛函的策略学习框架,使用IPW和DR估计器,证明遗憾率由策略类复杂度主导,并给出极小极大(minimax)下界。

Comments Accepted by The 39th Annual Conference on Learning Theory (COLT 2026)

详情
AI中文摘要

离线策略学习在因果推断中受到越来越多的关注。主要目标是学习一个策略(个体化治疗规则),作为从协变量到治疗的映射,以最大化定义为标量值潜在结果均值的经验福利。在本文中,我们研究具有分布值结果的离线策略学习,其中每个潜在结果是$\mathbb{R}$上的概率测度,奖励通过应用于诱导结果分布的Wasserstein重心的效用泛函来定义。我们基于逆概率加权(IPW)和双稳健(DR)估计器为策略学习框架建立了统计保证。通过处理组合策略类和无限维分位数域乘积上的具有挑战性的均匀偏差,我们证明了有限样本遗憾具有主导依赖$\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$。在一维Wasserstein设定下,并在所述正则条件下,主导遗憾率仍由策略类复杂度控制。此外,我们提供了一个极小极大(minimax)下界,建立了对$N$和$\mathrm{N\text{-}dim}(\Pi)$主导依赖的尖锐性。

英文摘要

Offline policy learning has received growing attention in causal inference. The primary objective is to learn a policy (individualized treatment rule) as a mapping from covariates to treatment that maximizes the empirical welfare defined as the mean of scalar-valued potential outcomes. In this paper, we study offline policy learning with distribution-valued outcomes, where each potential outcome is a probability measure on $\mathbb{R}$ and the reward is defined through a utility functional applied to the Wasserstein barycenter of induced outcome distributions. We establish statistical guarantees for the policy learning framework based on both Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators. By handling the challenging uniform deviation over the product of the combinatorial policy class and the infinite-dimensional quantile domain, we prove that the finite-sample regret has leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$. In the one-dimensional Wasserstein setting and under the stated regularity conditions, the leading regret rate is still governed by the policy-class complexity. Moreover, we provide a minimax lower bound establishing the sharpness of the leading dependence on $N$ and $\mathrm{N\text{-}dim}(\Pi)$.

2606.19147 2026-06-18 stat.ML cs.LG math.ST stat.TH 交叉投稿

On Local Population-Risk Certificates

论局部总体风险证书

Mingzhi Song

发表机构 * Department of Mathematics, The University of Hong Kong(香港大学数学系)

AI总结 本文提出局部总体风险增量证书,用于在模型更新时提供风险控制,通过双边置信带判断更新是否接受。

Comments 35 pages, 6 figures

详情
AI中文摘要

本文为当前模型周围的总体风险增量开发了局部证书。对于局部候选集 \(\mathcal D\),该证书是 \(P({\ell_{\theta+v}-\ell_\theta})\) 在 \(v\in\mathcal D\) 上的双边置信带。作为应用,该置信带的上端点产生了一个风险控制的更新规则:仅当认证的上端点为非正时,更新被接受;否则保留当前模型。

英文摘要

This paper develops local certificates for population-risk increments around a current model. For a local candidate set \(\mathcal D\), the certificate is a two-sided confidence band for \(P({\ell_{\theta+v}-\ell_\theta})\) over \(v\in\mathcal D\). As an application, the upper endpoint of this band yields a risk-controlled update rule: an update is accepted only when its certified upper endpoint is nonpositive; otherwise the current model is retained.
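
The accept-or-retain rule can be sketched for a single fixed candidate update. The paper's band is uniform over the candidate set; this sketch instead assumes one candidate, held-out per-sample losses, and loss increments bounded in [-bound, bound], and it uses a plain Hoeffding interval. The name `certified_update` and its parameters are hypothetical.

```python
import math

def certified_update(loss_new, loss_old, delta=0.05, bound=1.0):
    """Accept the update only if the upper endpoint of the increment band is <= 0.

    loss_new / loss_old: per-sample losses of the candidate and current model
    on the same held-out samples. Returns (accept, (lower, upper)).
    """
    diffs = [a - b for a, b in zip(loss_new, loss_old)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Two-sided Hoeffding half-width for increments in [-bound, bound].
    half_width = bound * math.sqrt(2.0 * math.log(2.0 / delta) / n)
    lower, upper = mean - half_width, mean + half_width
    return upper <= 0.0, (lower, upper)
```

With few samples the band is wide, so the rule retains the current model even when the empirical increment is zero or slightly negative.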

2606.19212 2026-06-18 stat.ML cs.LG 交叉投稿

Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

语义对抗攻击的广义特征值几何

Martin Anthony, Kaveh Salehzadeh Nobari

AI总结 提出一种连续局部模型,通过矩阵束$(A,B)$的最大广义特征值量化语义对抗攻击性,并给出预测翻转条件、攻击性证书及VC界。

详情
AI中文摘要

最近的实证工作表明,语义等价的释义可以欺骗金融情感分类器:尽管释义在强参考嵌入下保持与原文接近,但它可能足以改变目标模型的表示,从而改变预测类别。现有的鲁棒性理论要么假设单模型威胁模型,要么主要关注实证攻击算法。我们开发了一个连续局部模型来描述语义释义扰动,该模型捕捉了这种双模型结构。我们证明,在代理模型预算下,目标表示的最坏情况局部位移由从两个嵌入映射的雅可比矩阵构造的矩阵束$(A,B)$的最大广义特征值控制。由此产生的攻击性指标$\lambda^*(x)$是局部释义几何和所选嵌入器固有的,为仿射读出提供了闭式预测翻转条件,并支持保守的总体和有限样本攻击性证书。为了对仿射读出的类别进行统一控制,我们推导了二元攻击性指标的无分布VC界,以及基于攻击性调整边界的尺度敏感边界,该边界从标准分类器边界中减去局部几何惩罚。我们还将连续理论与离散释义搜索联系起来,识别出成功与不成功的有限搜索之间的不对称性,并给出了离散和连续设置一致时的覆盖条件。最后,我们提出了一个使用软令牌松弛和生成的释义集的实证验证框架,以评估部署的金融文本分类器上的局部特征值几何、预测翻转条件和有限搜索近似。

英文摘要

Recent empirical work shows that semantically equivalent paraphrases can fool financial sentiment classifiers: although a paraphrase remains close to the original under a strong reference embedding, it may shift the target model's representation enough to change the predicted class. Existing robustness theory either assumes a single-model threat model or focuses mainly on empirical attack algorithms. We develop a continuous local model of semantic paraphrase perturbations that captures this two-model structure. We show that the worst-case local displacement of the target representation, subject to a proxy-model budget, is governed by the largest generalised eigenvalue of a matrix pencil $(A,B)$ constructed from the Jacobians of the two embedding maps. The resulting attackability index $\lambda^*(x)$ is intrinsic to the local paraphrase geometry and the chosen embedders, yields a closed-form prediction-flip condition for affine readouts, and supports conservative population and finite-sample attackability certificates. For uniform control over classes of affine readouts, we derive a distribution-free VC bound for binary attackability indicators and a scale-sensitive margin bound based on an attackability-adjusted margin that subtracts a local geometric penalty from the standard classifier margin. We also connect the continuous theory to discrete paraphrase search, identify an asymmetry between successful and unsuccessful finite searches, and give a covering condition under which the discrete and continuous settings agree. Finally, we propose an empirical verification framework using soft-token relaxations and generated paraphrase sets to assess the local eigenvalue geometry, prediction-flip condition, and finite-search approximation on a deployed financial-text classifier.
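
A largest generalised eigenvalue of this kind can be computed as in the sketch below. The abstract only says the pencil is built from the two Jacobians; taking $A = J_t^\top J_t$ and $B = J_p^\top J_p$ (target and proxy Jacobians) is an assumption here, as are the function and argument names. It also assumes the proxy Jacobian has full column rank so that $B$ is positive definite.

```python
import numpy as np

def attackability_index(jac_target, jac_proxy):
    """Largest generalised eigenvalue of (A, B), A = Jt^T Jt and B = Jp^T Jp.

    Equals the maximum of ||Jt d||^2 subject to ||Jp d||^2 <= 1.
    """
    A = jac_target.T @ jac_target
    B = jac_proxy.T @ jac_proxy
    L = np.linalg.cholesky(B)          # B = L L^T, requires B positive definite
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T            # symmetric, same eigenvalues as the pencil
    return float(np.linalg.eigvalsh(M)[-1])   # eigvalsh sorts ascending
```

Reducing the pencil to a standard symmetric eigenproblem through the Cholesky factor keeps the sketch within NumPy. A library routine for symmetric-definite generalised eigenproblems would be the more robust choice when $B$ is poorly conditioned.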

2602.11557 2026-06-18 cs.LG stat.ML 版本更新

The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient

小批量随机梯度最陡下降的隐式偏差

Jichu Li, Xuan Tang, Difan Zou

AI总结 研究小批量随机最陡下降在多类分类中的隐式偏差,揭示批大小、动量和方差缩减对最大间隔行为和收敛率的影响,并证明动量可实现小批量收敛,方差缩减可恢复全批量隐式偏差。

详情
AI中文摘要

多种广泛使用的优化方法,如SignSGD和Muon,可以被解释为在不同范数诱导几何下的最陡下降实例。在这项工作中,我们研究了多类分类中小批量随机最陡下降的隐式偏差,刻画了批大小、动量和方差缩减如何在一般逐项和Schatten-$p$范数下塑造极限最大间隔行为和收敛率。我们证明,在没有动量时,最坏情况下的收敛和成功分类只能通过全批量梯度保证。相反,动量通过批量-动量权衡使得小批量收敛到近似最大间隔解成为可能,尽管会减慢收敛速度。该方法提供了完全显式、与维度无关的收敛率,优于先前的结果。此外,我们证明方差缩减可以恢复任意批大小下的精确全批量隐式偏差,尽管收敛速度较慢。最后,我们进一步研究了无动量的单批量最陡下降,并通过一个具体数据示例揭示了其收敛到根本不同偏差的特性,这揭示了纯随机更新的一个关键局限性。总体而言,我们的统一分析阐明了随机优化何时与全批量行为一致,并为更深入地探索随机梯度最陡下降算法的训练行为铺平了道路。

英文摘要

A variety of widely used optimization methods like SignSGD and Muon can be interpreted as instances of steepest descent under different norm-induced geometries. In this work, we study the implicit bias of mini-batch stochastic steepest descent in multi-class classification, characterizing how batch size, momentum, and variance reduction shape the limiting max-margin behavior and convergence rates under general entry-wise and Schatten-$p$ norms. We show that, without momentum, worst-case convergence and successful classification can only be guaranteed with full-batch gradient. In contrast, momentum enables small-batch convergence to an approximate max-margin solution through a batch-momentum trade-off, though it slows convergence. This approach provides fully explicit, dimension-free rates that improve upon prior results. Moreover, we prove that variance reduction can recover the exact full-batch implicit bias for any batch size, albeit at a slower convergence rate. Finally, we further investigate the batch-size-one steepest descent without momentum, and reveal its convergence to a fundamentally different bias via a concrete data example, which reveals a key limitation of purely stochastic updates. Overall, our unified analysis clarifies when stochastic optimization aligns with full-batch behavior, and paves the way for deeper explorations of the training behavior of stochastic gradient steepest descent algorithms.

2411.16206 2026-06-18 cs.LG cs.AI cs.NE 版本更新

Scalable Batch Bayesian Optimization Via Subspace Acquisition Functions

可扩展的批量贝叶斯优化:基于子空间采集函数

Dawei Zhan, Zhaoxi Zeng, Shuoxiao Wei, Ping Wu

发表机构 * School of Computing and Artificial Intelligence(计算与人工智能学院)

AI总结 提出通过从原始问题的轴对齐子空间中各选一点来扩展贝叶斯优化至大规模批量评估,显著加速收敛,与十种批量算法相比极具竞争力。

详情
Journal ref
ACM Transactions on Evolutionary Learning and Optimization, 2026
AI中文摘要

将贝叶斯优化扩展到批量评估可以使设计者充分利用并行计算技术。然而,当前大多数批量方法在批量大小增大时扩展性不佳,优化效率往往下降。为解决此问题,本文提出一种简单高效的方法,将贝叶斯优化扩展到大规模批量评估。与现有批量方法不同,新方法的思想是从原始问题中抽取一批轴对齐子空间,并使用现有采集函数从每个子空间中选择一个点。数值实验表明,与顺序贝叶斯优化算法相比,我们提出的方法显著加速收敛,并且与十种批量贝叶斯优化算法相比表现非常有竞争力。我们提出的方法的实现可在 https://github.com/zhandawei/SubSpace_Acquisition_Functions 获取。

英文摘要

Extending Bayesian optimization to batch evaluation can enable the designer to make the most of parallel computing technology. However, most current batch approaches do not scale well with the batch size. That is, their optimization efficiency often deteriorates as the batch size increases. To address this issue, we propose a simple and efficient approach to extend Bayesian optimization to large-scale batch evaluation in this work. Unlike existing batch approaches, the new approach draws a batch of axis-aligned subspaces of the original problem and selects one point from each subspace using existing acquisition functions. Numerical experiments show that our proposed approach speeds up convergence significantly when compared with the sequential Bayesian optimization algorithm, and performs very competitively when compared with ten batch Bayesian optimization algorithms. The implementation of our proposed approach is available at https://github.com/zhandawei/SubSpace_Acquisition_Functions.
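
The subspace idea can be sketched without a surrogate model. The abstract does not say how coordinates outside a subspace are set or how the acquisition is maximized, so this sketch assumes they stay at the incumbent and uses plain random search. The acquisition is any user-supplied callable, and all names and defaults are hypothetical rather than taken from the linked repository.

```python
import random

def subspace_batch(acquisition, incumbent, batch_size, subspace_dim,
                   n_candidates=200, bounds=(0.0, 1.0), seed=0):
    """Pick one point per random axis-aligned subspace by random search.

    Returns (batch, subspaces): the chosen points and the coordinate
    indices that were free to vary for each of them.
    """
    rng = random.Random(seed)
    d = len(incumbent)
    batch, subspaces = [], []
    for _ in range(batch_size):
        dims = sorted(rng.sample(range(d), subspace_dim))
        best_point, best_value = None, float("-inf")
        for _ in range(n_candidates):
            point = list(incumbent)        # coordinates outside the subspace stay fixed
            for j in dims:
                point[j] = rng.uniform(*bounds)
            value = acquisition(point)
            if value > best_value:
                best_point, best_value = point, value
        batch.append(best_point)
        subspaces.append(dims)
    return batch, subspaces
```

Each batch member solves a low-dimensional acquisition problem independently, so the per-point work does not grow with the batch size.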

2506.08764 2026-06-18 cs.LG 版本更新

On the Stability of the Jacobian Matrix in Deep Neural Networks

深度神经网络中雅可比矩阵的稳定性

Benjamin Dadoun, Soufiane Hayou, Hanan Salam, Mohamed El Amine Seddik, Pierre Youssef

AI总结 本文利用随机矩阵理论,建立了深度神经网络中雅可比矩阵谱稳定性的通用定理,适用于稀疏和非独立同分布权重,扩展了初始化方案的理论基础。

Comments 21 pages, 28 figures; the main theorem was wrong (again) and is now corrected

详情
AI中文摘要

深度神经网络随着深度增加容易出现梯度爆炸或消失,这一现象与输入-输出雅可比矩阵的谱行为密切相关。先前的工作确定了确保雅可比稳定性的关键初始化方案,但这些分析通常局限于具有独立同分布权重的全连接网络。在这项工作中,我们显著超越了这些限制:我们建立了一个适用于深度神经网络的通用稳定性定理,该定理能够处理稀疏性(例如由剪枝引入的)以及非独立同分布、弱相关权重(例如由训练引起的)。我们的结果依赖于随机矩阵理论的最新进展,并为更广泛类别的网络模型提供了谱稳定性的严格保证。这扩展了具有结构化和依赖随机性的现代神经网络中初始化方案的理论基础。

英文摘要

Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g. induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.

2509.14969 2026-06-18 cs.LG math.OC stat.ML 版本更新

Stochastic Adaptive Gradient Descent Without Descent

无需下降的随机自适应梯度下降

Jean-François Aujol, Jérémie Bigot, Camille Castera

发表机构 * Univ. Bordeaux CNRS, Bordeaux INP, IMB, UMR 5251(波尔多大学 CNRS,波尔多 INP,IMB,UMR 5251)

AI总结 提出一种无需超参数调优的随机梯度自适应步长策略,利用一阶随机Oracle的局部几何信息,理论证明收敛性,实验与调优基线竞争。

详情
AI中文摘要

我们引入了一种新的自适应步长策略,用于使用随机梯度的凸优化,该策略仅通过一阶随机Oracle利用目标函数的局部几何信息,无需任何超参数调优。该方法源于对“无需下降的自适应梯度下降”(Adaptive Gradient Descent Without Descent)方法的一种有理论依据的随机化推广。我们证明了在多种假设下,使用我们的步长的随机梯度下降的收敛性,并展示了它在经验上可与调优基线相竞争。

英文摘要

We introduce a new adaptive step-size strategy for convex optimization with stochastic gradient that exploits the local geometry of the objective function only by means of a first-order stochastic oracle and without any hyper-parameter tuning. The method comes from a theoretically-grounded adaptation of the Adaptive Gradient Descent Without Descent method to the stochastic setting. We prove the convergence of stochastic gradient descent with our step-size under various assumptions, and we show that it empirically competes against tuned baselines.
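
For context, here is a sketch of the deterministic method that the abstract says it adapts, Adaptive Gradient Descent Without Descent. The step-size rule is written from memory of that earlier method, not from this paper, and the stochastic variant proposed here is not reproduced. The name `adgd` and the defaults are illustrative.

```python
import numpy as np

def adgd(grad, x0, n_iters=300, step0=1e-3):
    """Deterministic adaptive gradient descent with a curvature-based step size."""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    step_prev, theta = step0, np.inf
    x = x_prev - step_prev * g_prev
    for _ in range(n_iters):
        g = grad(x)
        dg = np.linalg.norm(g - g_prev)
        if dg == 0.0:                      # identical gradients: no curvature estimate
            break
        local = np.linalg.norm(x - x_prev) / (2.0 * dg)   # local inverse-curvature estimate
        # Cap step growth by sqrt(1 + theta); otherwise follow the local estimate.
        step = min(np.sqrt(1.0 + theta) * step_prev, local)
        theta = step / step_prev
        x_prev, g_prev, step_prev = x, g, step
        x = x - step * g
    return x
```

The rule needs no Lipschitz constant or line search. It uses only successive iterates and gradients, which is the property the abstract refers to as avoiding hyper-parameter tuning.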

2602.14789 2026-06-18 cs.LG stat.ML 版本更新

On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials

关于GD和SGD中非线性动力学的稳定性:超越二次势能

Rotem Mulayoff, Sebastian U. Stich

发表机构 * CISPA Helmholtz Center for Information Security(CISPA赫尔姆霍兹信息安全中心)

AI总结 研究梯度下降和随机梯度下降中非线性项对动力学稳定性的影响,推导了多元设置下稳定振荡的精确条件,并发现SGD的稳定性由单个不稳定批次决定。

Comments Accepted to COLT 2026

详情
AI中文摘要

训练过程中迭代的动力稳定性在确定优化算法所获得的极小值方面起着关键作用。例如,梯度下降(GD)的稳定解对应于平坦极小值,而平坦极小值被认为具有有利特征。虽然先前的工作通常依赖线性化来确定稳定性,但线性化动力学是否忠实捕捉完整的非线性行为仍不清楚。最近的研究表明,GD可能在线性不稳定的极小值附近稳定振荡,并在步长衰减后收敛,这表明线性分析可能具有误导性。在这项工作中,我们明确研究了非线性项的影响。具体而言,我们在多元设置下推导了GD在极小值附近稳定振荡的精确准则。我们的条件依赖于高阶导数,推广了现有结果。将分析扩展到随机梯度下降(SGD),我们表明即使只有单个批次不稳定,非线性动力学也可能在期望上发散。这意味着稳定性可能由单个不稳定振荡的批次决定,而不是线性分析所暗示的平均效应。最后,我们证明如果所有批次都是线性稳定的,则SGD的非线性动力学在期望上是稳定的。

英文摘要

The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.

2605.04267 2026-06-18 cs.LG cs.NE math.OC 版本更新

QUIVER: Cost-Aware Adaptive Preference Querying in Surrogate-Assisted Evolutionary Multi-Objective Optimization

QUIVER: 代理辅助多目标进化优化中的成本自适应偏好查询

Florian A. D. Burnat

发表机构 * University of Warwick(沃里克大学) Warwick Business School(沃里克商学院)

AI总结 提出QUIVER方法,通过自适应选择目标评估与异质偏好查询(成对偏好陈述与无差异调整),在代理辅助多目标优化中最小化决策遗憾,实验显示在WFG难题上效用遗憾降低25%。

Comments Accepted at Genetic and Evolutionary Computation Conference (GECCO '26)

详情
AI中文摘要

交互式多目标优化系统面临预算分配困境:资源可用于昂贵的目标评估,或用于引出决策者偏好以识别帕累托集的相关区域。此外,偏好引出本身跨越具有不同信息内容和认知负担的模态,从廉价、嘈杂的成对偏好陈述(PS)到更丰富但成本更高的无差异调整(IA)。我们研究了未知标量化下的成本感知优化,并引入了QUIVER(查询信息价值估计遗憾),这是一种代理辅助的进化多目标优化器,可自适应地在目标评估和异质偏好查询之间进行选择。在每一步,QUIVER通过最大化每单位总成本的预期决策质量改进来选择下一个动作。在合成决策者模型下的DTLZ和WFG基准测试中,QUIVER在具有挑战性的WFG问题上实现了最低的最终效用遗憾(WFG4上效用遗憾为2.14,WFG9上为2.82:比基线提高25%),优于所有单模态基线。我们分析了PS和IA的最优混合如何适应问题难度:在简单问题(DTLZ2)上,QUIVER选择80%的PS查询;在困难问题(WFG9)上,它转向35%的IA查询。这种自适应模态选择展示了成本感知偏好学习的实际应用。

英文摘要

Interactive multi-objective optimization systems face a budget allocation dilemma: one can spend resources on expensive objective evaluations or on eliciting decision-maker preferences that identify the relevant region of the Pareto set. Moreover, preference elicitation itself spans modalities with different information content and cognitive burden, ranging from cheap, noisy pairwise preference statements (PS) to richer but costlier indifference adjustments (IA). We study cost-aware optimization under an unknown scalarization and introduce QUIVER (Query-Informed Value Estimation for Regret), a surrogate-assisted evolutionary multi-objective optimizer that adaptively chooses between objective evaluations and heterogeneous preference queries. At each step, QUIVER selects the next action by maximizing the expected decision-quality improvement per unit total cost. Across DTLZ and WFG benchmarks under synthetic decision-maker models, QUIVER achieves the lowest final utility regret on challenging WFG problems (utility regret of 2.14 on WFG4, 2.82 on WFG9: a 25% improvement over baselines), outperforming all single-modality baselines. We analyze how the optimal mix of PS and IA adapts to problem difficulty: on easy problems (DTLZ2), QUIVER selects 80% PS queries; on hard problems (WFG9), it shifts to 35% IA queries. This adaptive modality selection demonstrates cost-aware preference learning in action.

2505.15215 2026-06-18 stat.ML cs.LG stat.ME 版本更新

Clustering and Pruning in Causal Data Fusion

因果数据融合中的聚类与剪枝

Otto Tabell, Santtu Tikka, Juha Karvanen

发表机构 * Department of Mathematics and Statistics(数学与统计学系)

AI总结 针对多数据源因果融合中变量增多导致计算复杂的问题,提出剪枝和聚类预处理方法,基于小图推断大图中因果效应的可识别性并给出识别函数。

详情
AI中文摘要

数据融合,即结合观测数据和实验数据的过程,可以使得原本不可识别的因果效应变得可识别。尽管针对特定场景已经开发了识别算法,但do-calculus仍然是因果数据融合的唯一通用工具,特别是当某些变量存在于部分数据源而其他数据源中没有时。然而,基于do-calculus的方法可能随着变量数量增加和因果图复杂度增长而面临计算挑战。因此,有必要在保留必要特征的同时减小此类模型的规模。为此,我们提出将剪枝(移除不必要的变量)和聚类(合并变量)作为因果数据融合的预处理操作。我们将先前关于单一数据源的结果进行推广,并推导出在多数据源情况下应用剪枝和聚类的条件。我们给出了基于较小图推断较大图中因果效应可识别性或不可识别性的充分条件,并展示了如何为可识别的因果效应获得相应的识别函数。来自流行病学和社会科学的例子展示了这些结果的应用。

英文摘要

Data fusion, the process of combining observational and experimental data, can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. However, approaches based on do-calculus may encounter computational challenges as the number of variables increases and the causal graph grows in complexity. Consequently, there exists a need to reduce the size of such models while preserving the essential features. For this purpose, we propose pruning (removing unnecessary variables) and clustering (combining variables) as preprocessing operations for causal data fusion. We generalize earlier results on a single data source and derive conditions for applying pruning and clustering in the case of multiple data sources. We give sufficient conditions for inferring the identifiability or non-identifiability of a causal effect in a larger graph based on a smaller graph and show how to obtain the corresponding identifying functional for identifiable causal effects. Examples from epidemiology and social science demonstrate the use of the results.

2509.03734 2026-06-18 cs.DS cs.LG 版本更新

How fast can you find a good hypothesis?

你能多快找到一个好的假设?

Anders Aamand, Maryam Aliakbarpour, Justin Y. Chen, Sandeep Silwal

发表机构 * BARC, University of Copenhagen(哥本哈根大学基础算法研究中心) Rice University(莱斯大学) MIT(麻省理工学院) University of Wisconsin-Madison(威斯康星大学麦迪逊分校)

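AI summary Studies hypothesis selection: gives a poly(n)-time algorithm that outputs a mixture of the hypotheses and achieves approximation factor C = 3 - 2/n, and improves the running time of proper algorithms to Õ(n/(δε²)).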

Comments Abstract abridged to meet arxiv requirements. This is the full version of a paper appearing at COLT 2026

AI中文摘要

在假设选择问题中,我们可以对有限候选分布(假设)集合 $\mathcal{H} = \{H_1, \ldots, H_n\}$ 进行采样和查询,并获得来自未知分布 $P$ 的样本,两者都定义在域 $\mathcal{X}$ 上。目标是输出一个分布 $Q$,使其到 $P$ 的距离与 $\mathcal{H}$ 中最近假设的距离相当。具体来说,如果最小距离是 $\mathsf{OPT}$,我们希望输出 $Q$,使得以至少 $1-\delta$ 的概率,其到 $P$ 的总变差距离至多为 $C \cdot \mathsf{OPT} + \varepsilon$。对于恰当(proper)算法(其中 $Q \in \mathcal{H}$),最优近似为 $C=3$,使用来自 $P$ 的 $\Theta(\log(n/\delta)/\varepsilon^2)$ 个样本;对于非恰当(improper)算法(其中 $Q$ 不一定在 $\mathcal{H}$ 中),最优近似为 $C=2$,使用来自 $P$ 的 $\tilde{\Theta}(\log(n/\delta)/\varepsilon^2)$ 个样本。在非恰当设置中,达到 $C=2$ 的算法 [Bousquet, Braverman, Kol, Efremenko, Moran, FOCS 2021] 的运行时间随 $|\mathcal{X}|$ 多项式增长,因此对于实值分布,它无法在有限时间内运行。改进运行时间的一个有希望的途径是考虑输出假设混合 $Q$ 的非恰当算法,因为这样的分布可以用 $n$ 个内存字表示。我们证明 (1) 一个下界:除非样本数量是 $|\mathcal{X}|$ 的多项式,否则任何输出混合的算法都无法实现比 $C = 3-2/n$ 更好的近似,以及 (2) 一个运行时间为 $\text{poly}(n)$ 并达到相同近似保证的算法。在恰当设置中,[Aliakbarpour, Bun, Smith, NeurIPS 2024] 提供了一个 $C=3$ 且运行时间为 $\tilde{O}(n/(\delta^3\varepsilon^3))$ 的算法。我们将时间复杂度改进为 $\tilde{O}(n/(\delta \varepsilon^2))$,显著减少了对置信度和误差参数的依赖。

英文摘要

In the hypothesis selection problem, we are given sample and query access to a finite set of candidate distributions (hypotheses), $\mathcal{H} = \{H_1, \ldots, H_n\}$, and samples from an unknown distribution $P$, both over a domain $\mathcal{X}$. The goal is to output a distribution $Q$ whose distance to $P$ is comparable to that of the nearest hypothesis in $\mathcal{H}$. Specifically, if the minimum distance is $\mathsf{OPT}$, we aim to output $Q$ such that, with probability at least $1-\delta$, its total variation distance to $P$ is at most $C \cdot \mathsf{OPT} + \varepsilon$. The optimal approximation for proper algorithms (where $Q \in \mathcal{H}$) is $C=3$ using $\Theta(\log(n/\delta)/\varepsilon^2)$ samples from $P$, and for improper algorithms (where $Q$ is not necessarily in $\mathcal{H}$) it is $C=2$ using $\tilde{\Theta}(\log(n/\delta)/\varepsilon^2)$ samples from $P$. In the improper setting, the algorithm achieving $C=2$ [Bousquet, Braverman, Kol, Efremenko, Moran, FOCS 2021] runs in time which grows polynomially with $|\mathcal{X}|$, so it does not run in finite time for real-valued distributions. A promising path towards improved runtime is to consider improper algorithms which output a mixture $Q$ of the hypotheses, as such a distribution can be represented in $n$ words of memory. We show (1) a lower bound that no algorithm which outputs a mixture can achieve approximation better than $C = 3-2/n$ unless the number of samples is polynomial in $|\mathcal{X}|$, as well as (2) an algorithm which runs in time $\text{poly}(n)$ and achieves the same approximation guarantee. In the proper setting, [Aliakbarpour, Bun, Smith, NeurIPS 2024] provided an algorithm with $C=3$ running in $\tilde{O}(n/(\delta^3\varepsilon^3))$ time. We improve this time complexity to $\tilde{O}(n/(\delta\varepsilon^2))$, significantly reducing the dependence on the confidence and error parameters.

2603.04895 2026-06-18 stat.ML cs.LG math.OC 版本更新

How Does the ReLU Activation Affect the Implicit Bias of Gradient Descent on High-dimensional Neural Network Regression?

ReLU激活函数如何影响高维神经网络回归中梯度下降的隐式偏差?

Kuo-Wei Lai, Guanghui Wang, Molei Tao, Vidya Muthukumar

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

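AI summary Uses a primal-dual analysis to study the implicit bias of gradient descent with the squared loss on a shallow ReLU model under high-dimensional random data, showing that with high probability it approximates the minimum-ℓ2-norm solution up to a gap of Θ(√(n/‖λ‖₁)).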

Comments 66 pages

AI中文摘要

过度参数化的机器学习模型(包括神经网络)通常会导致欠定的训练目标,具有多个全局最小值。隐式偏差指的是通过常见优化算法(如梯度下降)达到的极限全局最小值。在本文中,我们刻画了在高维随机特征上使用平方损失训练浅层ReLU模型时梯度下降的隐式偏差。先前的工作(Vardi和Shamir,2021)表明,在最坏情况下隐式偏差不存在,或者在完全正交数据下恰好对应于最小ℓ2范数插值解(Boursier等人,2022)。我们的工作介于这两个极端之间,并表明,对于足够高维的随机数据,隐式偏差以高概率近似最小ℓ2范数解,差距为Θ(√(n/||λ||₁)),其中n是训练样本数,λ表示数据协方差矩阵的谱。我们的结果通过一种新颖的原始-对偶分析获得,该分析仔细跟踪了预测、数据跨度系数及其相互作用的演变,并表明ReLU激活模式在随机数据上以高概率迅速稳定。

英文摘要

Overparameterized ML models, including neural networks, typically induce underdetermined training objectives with multiple global minima. The implicit bias refers to the limiting global minimum that is attained by a common optimization algorithm, such as gradient descent (GD). In this paper, we characterize the implicit bias of GD for training a shallow ReLU model with the squared loss on high-dimensional random features. Prior work showed that the implicit bias does not exist in the worst case (Vardi and Shamir, 2021), or corresponds exactly to the minimum-$\ell_2$-norm interpolating solution under exactly orthogonal data (Boursier et al., 2022). Our work interpolates between these two extremes and shows that, for sufficiently high-dimensional random data, the implicit bias approximates the minimum-$\ell_2$-norm solution with high probability, with a gap of order $\Theta(\sqrt{n/\|\lambda\|_1})$, where $n$ is the number of training examples and $\lambda$ denotes the spectrum of the data covariance matrix. Our results are obtained through a novel primal-dual analysis that carefully tracks the evolution of predictions, data-span coefficients, and their interactions, and they show that the ReLU activation pattern quickly stabilizes with high probability over random data.
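The minimum-$\ell_2$-norm interpolating solution that this abstract uses as its reference point can be made concrete in the simplest setting. The sketch below assumes a plain linear model (not the shallow ReLU model analyzed in the paper) and two training samples, and relies on the classical fact that gradient descent started from zero on an underdetermined least-squares problem converges to the minimum-norm interpolant. The data and function names are made up for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_norm_interpolant(X, y):
    """w = X^T (X X^T)^{-1} y for exactly two samples, via an explicit 2x2 inverse."""
    a, b, d = dot(X[0], X[0]), dot(X[0], X[1]), dot(X[1], X[1])
    det = a * d - b * b
    alpha0 = (d * y[0] - b * y[1]) / det
    alpha1 = (-b * y[0] + a * y[1]) / det
    return [alpha0 * x0 + alpha1 * x1 for x0, x1 in zip(X[0], X[1])]


def gradient_descent(X, y, lr, steps):
    """GD on 0.5 * sum_i (x_i . w - y_i)^2, started from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        res = [dot(x, w) - t for x, t in zip(X, y)]
        grad = [sum(r * x[j] for r, x in zip(res, X)) for j in range(len(w))]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # 2 samples, 3 features: underdetermined
y = [1.0, 2.0]
closed_form = min_norm_interpolant(X, y)
gd_solution = gradient_descent(X, y, lr=0.1, steps=500)
```

In this linear case the two solutions coincide; the abstract's result concerns how far the ReLU model's limit can deviate from this reference.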

2605.20726 2026-06-18 stat.ME cs.LG stat.ML 版本更新

Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

在符合推断中对虚假发现比例的处处有效界

Ziang Song, Ying Jin, Emmanuel J. Candès

发表机构 * Department of Statistics, Stanford University(斯坦福大学统计学系) Department of Statistics and Data Science, University of Pennsylvania(宾夕法尼亚大学统计学与数据科学系) Department of Mathematics, Stanford University(斯坦福大学数学系)

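AI summary Proposes everywhere-valid bounds on the false discovery proportion (FDP) in multiple testing, built from a high-probability envelope so that the guarantee holds under arbitrary post hoc threshold selection, with applications to outlier detection and conformal selection.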

Comments 34 pages, 12 figures. Code available at https://github.com/sza919/everywhere-valid-fdp-bounds-in-conformal-inference

AI中文摘要

将符合推断应用于多重检验问题(如异常检测和候选选择)的现代方法,通常需要选出符合p值低于某一阈值的测试样本。此类方法的质量通常用虚假发现比例(FDP)来衡量,即错误选择所占的比例。现有方法通常借助Benjamini-Hochberg过程等手段控制FDP的期望值。这种做法无法为实际实现的FDP提供高概率界,而且一旦拒绝阈值是在查看数据之后才选定的,其统计保证就不再成立。本文建立了对所有可能的拒绝阈值同时成立的有限样本、分布无关的FDP上界,从而允许任意的事后阈值选择。同时有效性是通过从零假设符合p值的联合分布中采样、为其经验分布函数构造高概率包络来实现的。此外,我们的框架允许从业者调节包络的形状,从而在主要关注的拒绝区域内得到紧的界。我们利用这种灵活的方法推导出异常检测和符合选择的同时FDP上界。合成数据和真实数据实验表明,所得到的界既有效,又远不如现有方法给出的界保守。

英文摘要

Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.
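The two quantities this abstract builds on, conformal p-values and the realized FDP at a rejection threshold, can be sketched as below. This is a minimal illustration that assumes the standard one-sided conformal p-value with larger scores meaning "more outlying"; the scores and null labels are made up, and the paper's simultaneous upper bound (the envelope construction) is not reproduced here.

```python
def conformal_p_values(calib_scores, test_scores):
    """p_j = (1 + #{i : calib_i >= test_j}) / (n + 1), computed from inlier calibration scores."""
    n = len(calib_scores)
    return [(1 + sum(c >= s for c in calib_scores)) / (n + 1) for s in test_scores]


def realized_fdp(p_values, is_null, threshold):
    """Fraction of rejections (p <= threshold) that are true nulls; 0 if nothing is rejected."""
    rejected = [null for p, null in zip(p_values, is_null) if p <= threshold]
    return sum(rejected) / max(1, len(rejected))


calib = [0.1, 0.2, 0.3, 0.4]    # made-up nonconformity scores of known inliers
test = [0.05, 0.5, 0.35]        # made-up test scores
is_null = [True, False, True]   # ground truth, known only in simulations

p_vals = conformal_p_values(calib, test)
fdp_tight = realized_fdp(p_vals, is_null, threshold=0.2)
fdp_loose = realized_fdp(p_vals, is_null, threshold=0.4)
```

The realized FDP changes with the threshold, which is why a guarantee that holds for every threshold at once matters when the threshold is chosen after looking at the data.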

6. 高效学习、压缩与部署 18 篇

2606.18286 2026-06-18 cs.LG 新提交

CODEBLOCK: Learning to Supervise Code at the Right Granularity

CODEBLOCK: 学习在正确的粒度上监督代码

Zhijie Deng, Ling Li, Jinlong Pang, Kaiqin Hu, Qi Xuan, Zhaowei Zhu, Jiaheng Wei

发表机构 * Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) UC Santa Cruz(加州大学圣克鲁兹分校) Ant Group(蚂蚁集团) BAIA, ZJUT(浙江工业大学智能信息处理实验室) D5Data.ai

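AI summary Proposes CodeBlock, a sparse supervision framework that selects structure-complete code blocks rather than isolated tokens; using only 1.9% of supervised response tokens, it achieves stronger average pass@1 than full-token SFT across six code-generation benchmarks.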

AI中文摘要

代码大语言模型的监督微调通常对所有响应token应用统一的交叉熵损失,隐含假设每个token提供同等有用的学习信号。最近的token级选择方法通过仅监督高价值token挑战了自然语言SFT中的这一假设。然而,直接将token级掩码迁移到代码可能会破坏语法和语义连贯的程序单元,因为代码依赖于结构完整性和定义-使用关系。因此,我们提出CodeBlock,一个结构感知的稀疏监督框架,选择结构完整的代码证据而非孤立token。CodeBlock首先选择高质量的指令-响应对,然后将代码响应划分为语法连贯的编码项,通过聚合核心逻辑token上的广义交叉熵来估计其效用,并使用数据流可达性和桥接信号重新排序,以优先传播或连接重要程序依赖的块。在训练期间,完整响应仍作为上下文可用,但损失仅应用于选定的代码项和信息性自然语言token。在六个代码生成基准上的实验表明,CodeBlock在仅使用1.9%的监督响应token的情况下,实现了比全tokenSFT和竞争性选择基线更强的平均pass@1。

英文摘要

Supervised fine-tuning of code LLMs typically applies uniform cross-entropy loss to all response tokens, implicitly assuming that every token provides equally useful learning signal. Recent token-level selection methods challenge this assumption in natural-language SFT by supervising only high-value tokens. However, directly transferring token-level masking to code can break syntactically and semantically coherent program units, because code depends on structural completeness and definition-use relations. We therefore propose CodeBlock, a structure-aware sparse supervision framework that selects structure-complete code evidence rather than isolated tokens. CodeBlock first selects high-quality instruction-response pairs, then partitions code responses into syntactically coherent coding items, estimates their utility by aggregating generalized cross-entropy over core logic tokens, and reranks them with data-flow reach and bridge signals to prioritize blocks that propagate or connect important program dependencies. During training, the full response remains available as context, while loss is applied only to selected code items and informative natural-language tokens. Experiments on six code-generation benchmarks show that CodeBlock achieves stronger average pass@1 than full-token SFT and competitive selection baselines, while using only 1.9% of supervised response tokens.
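The training-time mechanism described in this abstract (the full response stays in context, but the loss is applied only to selected tokens) amounts to a masked cross-entropy. The sketch below shows just that masking step; how CodeBlock actually chooses the mask (block partitioning, utility estimation, data-flow reranking) is not modeled, and the probabilities and mask are made up.

```python
import math


def masked_cross_entropy(token_probs, loss_mask):
    """Mean negative log-likelihood over tokens whose mask entry is truthy."""
    selected = [-math.log(p) for p, m in zip(token_probs, loss_mask) if m]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)


# Model probability assigned to each reference token (hypothetical values).
probs = [0.9, 0.5, 0.25, 0.8]
# 1 = token belongs to a selected block and contributes to the loss.
mask = [0, 1, 1, 0]

sparse_loss = masked_cross_entropy(probs, mask)
full_loss = masked_cross_entropy(probs, [1, 1, 1, 1])
```

Here `sparse_loss` averages only the second and third tokens, while `full_loss` corresponds to uniform full-token supervision.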

2606.18304 2026-06-18 cs.LG cs.AI 新提交

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

基于归因引导和覆盖最大化的结构MoE剪枝

Yifu Ding, Jiacheng Wang, Ge Yang, Yongcheng Jing, Jinyang Guo, Xianglong Liu, Dacheng Tao

发表机构 * School of Computer Science and Engineering, Beihang University(北京航空航天大学计算机科学与工程学院) School of Artificial Intelligence, Beihang University(北京航空航天大学人工智能学院) Nanyang Technological University(南洋理工大学)

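AI summary To address the coarse granularity of expert-level pruning in MoE models, proposes an attribution-guided, coverage-maximizing structural pruning framework that casts prune-ratio allocation as a channel-score coverage maximization problem; combined with 4-bit quantization it preserves accuracy under 50% or 25% structured pruning and reduces the memory footprint of Qwen3-30B-A3B by 5.27×.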

Comments 9 pages, 5 figures. Submitted to ICML 2026

AI中文摘要

混合专家(MoE)模型在计算上高效扩展,但由于其巨大的内存占用和推理开销,部署成本仍然很高。先前的压缩方法主要在专家级别操作,要么移除整个专家,要么通过粗粒度的重要性分数对专家进行排序。然而,这种专家级别的决策通常过于粗糙,无法捕捉细粒度的冗余,导致剪枝预算分配不当和压缩效果有限。为了解决这个问题,我们观察到MoE专家内的信息高度集中在一小部分通道中,即使在被认为重要的专家中也存在大量冗余。基于这一观察,我们提出了一种针对MoE模型量身定制的结构剪枝框架。我们的方法将剪枝比例分配重新表述为通道分数覆盖最大化问题,并使用基于归因的近似方法高效求解。在DeepSeek和Qwen MoE模型上的实验表明,我们的方法在结合4位量化时,在50%或25%的结构化剪枝下仍能保持模型精度。在Qwen3-30B-A3B上,我们的方法将内存占用减少了5.27倍,并在各种基准测试中持续优于最先进的基线方法。

英文摘要

Mixture-of-Experts (MoE) models scale compute efficiently, yet remain expensive to deploy due to their substantial memory footprint and inference overhead. Prior compression methods mainly operate at the expert level, either removing entire experts or ranking experts by coarse-grained importance scores. However, such expert-wise decisions are often too coarse to capture fine-grained redundancy, leading to misallocated pruning budgets and limited compression. To address this problem, we observe that information within MoE experts is highly concentrated in a small subset of channels, leaving substantial redundancy even in experts deemed important. Based on this observation, we propose a structural pruning framework tailored for MoE models. Our method reformulates prune-ratio allocation as a channel-score coverage maximization problem and solves it efficiently using an attribution-based approximation. Experiments on DeepSeek and Qwen MoE models show that our method preserves model accuracy under 50% or 25% structured pruning when combined with 4-bit quantization. On Qwen3-30B-A3B, our approach reduces memory footprint by 5.27$\times$ and consistently outperforms state-of-the-art baselines across diverse benchmarks.

2606.18431 2026-06-18 cs.LG cs.DC 新提交

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

超越预测:面向LLM推理的尾延迟感知调度

Yueying Li, Yuanfan Chen, Jiayang Chen, Esha Choukse, Haoran Qiu, G. Edward Suh, Rodrigo Fonseca, Ziv Scully, Udit Gupta

发表机构 * Cornell University, Computer Science Department(康奈尔大学计算机科学系) Cornell University, Electrical and Computer Engineering Department(康奈尔大学电气与计算机工程系) Cornell University, Operations Research and Information Engineering Department(康奈尔大学运筹学与信息工程系) Microsoft Azure System Research(微软Azure系统研究) NVIDIA Corporation(英伟达公司)

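AI summary To address the fragility of length-prediction-based scheduling for LLM inference under distribution shift and its limited control over tail latency, proposes a prediction-free, distribution-aware scheduling framework that uses lightweight statistical signals for soft priority boosting together with cache-aware preemption, reducing P99 TTLT by up to 35-50% relative to SRPT with perfect length knowledge and TTFT by 34-47% across workloads.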

Journal ref
Forty-Third International Conference on Machine Learning (2026)
AI中文摘要

LLM服务表现出极端的长度可变性,使得基于大小的调度在实践中变得困难。最近的LLM调度器使用预测的解码长度或排名来近似SJF/SRPT,并主要报告均值中心指标如TTFT和TBT。我们表明,这些预测驱动的策略在分布偏移、突发到达和GPU内存压力下可能脆弱,同时对主导用户体验的尾延迟(P90-P99)控制有限,即使拥有完美的解码长度知识。我们引入了一个分布感知、无预测的调度框架,用由轻量统计信号驱动的软优先级提升取代显式长度预测。我们的设计协同优化调度和缓存感知抢占,以考虑跨工作负载混合的内存耦合解码动态。在生产环境和开源轨迹上的评估表明,相对于具有完美长度知识的SRPT,我们的方法将P99 TTLT降低了高达35-50%,并在各种工作负载(包括推理密集型和聊天密集型任务)上将TTFT降低了34-47%。这些结果证明了在在线LLM服务中优化尾延迟的稳健替代方案。

英文摘要

LLM serving exhibits extreme length variability, making size-based scheduling difficult in practice. Recent LLM schedulers approximate SJF/SRPT using predicted decode lengths or ranks and primarily report mean-centric metrics such as TTFT and TBT. We show that these prediction-driven policies can be fragile under distribution shifts, bursty arrivals, and GPU memory pressure, while offering limited control over the tail latency (P90-P99) that dominates user experience, even with perfect decode-length knowledge. We introduce a distribution-aware, prediction-free scheduling framework that replaces explicit length prediction with soft priority boosting driven by lightweight statistical signals. Our design co-optimizes scheduling and cache-aware preemption to account for memory-coupled decode dynamics across workload mixes. Evaluated on production and open-source traces, our method reduces P99 TTLT by up to 35-50% relative to SRPT with perfect length knowledge and reduces TTFT by 34-47% across workloads, including reasoning-heavy and chat-heavy tasks. These results demonstrate a robust alternative for optimizing tail latency in online LLM serving.

2606.18650 2026-06-18 cs.LG 新提交

BLADE: Scalable Bi-level Adaptive Data Selection for LLM Training

BLADE: 面向LLM训练的可扩展双层自适应数据选择

Jiaxing Wang, Deping Xiang, Jin Xu, Zirui Liu, Zicheng Zhang, Guoqiang Gong, Jun Fang, Chao Liu, Pengzhang Liu, Tongxuan Liu, Ke Zhang, Qixia Jiang

发表机构 * University of Oxford(牛津大学) Renmin University of China(中国人民大学) University of Chinese Academy of Sciences(中国科学院大学)

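AI summary Proposes BLADE, which uses Lagrange multipliers to recast bi-level data selection as a penalized single-level objective, avoiding inverse-Hessian computation and yielding a dynamic reference model; the formulation comes with a first-order convergence guarantee and outperforms existing data selection baselines in experiments.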

AI中文摘要

随着大语言模型(LLM)数据集规模扩展到数万亿token,数据选择已成为过滤无信息噪声和构建自适应学习轨迹的关键前沿。除了静态启发式过滤,LLM训练的高级数据选择方法主要遵循两种范式,每种都有根本性局限。基于影响的方法提供了原则性的双层目标,但需要难以处理的逆Hessian计算,而超额损失方法计算高效但依赖静态参考模型,该模型在训练过程中与不断演化的代理模型失配。我们提出BLADE(双层自适应数据选择),一种无Hessian的数据选择框架。BLADE通过拉格朗日乘子将基于影响的方法背后的双层优化问题重新表述为惩罚单层目标,避免了逆Hessian计算,同时揭示了与基于超额损失的数据选择之间的原则性联系。所得目标恢复了超额损失形式,但用与训练同步的动态参考模型替代了静态参考模型。理论上,我们证明该惩罚公式保证一阶收敛。为了实现高效的在线批次选择,我们将BLADE实例化为一种无记忆随机块坐标Frank-Wolfe算法。大量实验表明,BLADE始终优于最先进的数据选择基线,为LLM训练提供了实用方案。

英文摘要

As Large Language Model (LLM) datasets scale to trillions of tokens, data selection has emerged as a critical frontier to filter out uninformative noise and construct adaptive learning trajectories. Beyond static heuristic filtering, advanced data selection methods for LLM training largely follow two paradigms, each with fundamental limitations. Influence-based methods provide principled bi-level objectives but require intractable inverse-Hessian computations, while excess-loss methods are computationally efficient but rely on a static reference model that becomes misaligned with the evolving proxy model during training. We propose BLADE (Bi-Level Adaptive Data sElection), a Hessian-free framework for data selection. BLADE reformulates the bi-level optimization problem underlying influence-based methods as a penalized single-level objective via Lagrange multipliers, avoiding inverse-Hessian computation while revealing a principled connection to excess-loss based data selection. The resulting objective recovers an excess-loss form but replaces the static reference model with a dynamic one that stays synchronized with training. Theoretically, we prove that this penalized formulation guarantees first-order convergence. For efficient online batch selection, we instantiate BLADE as a memoryless randomized block-coordinate Frank-Wolfe algorithm. Extensive experiments show that BLADE consistently outperforms state-of-the-art data selection baselines, providing a practical recipe for LLM training.
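The excess-loss paradigm that this abstract contrasts with influence-based selection can be sketched in a few lines: score each example by how much worse the model being trained does than a reference model, and keep the top-scoring examples. This is the generic static-reference heuristic, not BLADE's algorithm; in BLADE's description the reference would stay synchronized with training, which in this sketch would mean recomputing `reference_losses` as training proceeds. All numbers and names are illustrative.

```python
def select_by_excess_loss(proxy_losses, reference_losses, k):
    """Return the indices of the k examples with the largest (proxy - reference) loss."""
    excess = [p - r for p, r in zip(proxy_losses, reference_losses)]
    ranked = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)
    return sorted(ranked[:k])


proxy_losses = [2.0, 1.0, 3.0, 0.5]      # per-example loss under the model being trained
reference_losses = [1.5, 1.2, 1.0, 0.4]  # per-example loss under the reference model

chosen = select_by_excess_loss(proxy_losses, reference_losses, k=2)
```

Example 2 has the largest gap (2.0) and example 0 the next largest (0.5), so those two are selected for the batch.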

2606.18691 2026-06-18 cs.LG cond-mat.mtrl-sci 新提交

Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning

通过稀疏性促进微调实现等变材料基础模型的鲁棒和可解释适应

Youngwoo Cho, Seunghoon Yi, Wooil Yang, Sungmo Kang, Young-woo Son, Jaegul Choo, Joonseok Lee, Soo Kyung Kim, Hongkee Yoon

发表机构 * KAIST(韩国科学技术院) Seoul National Univ.(首尔国立大学) KIAS(韩国高等科学院) Ewha Womans Univ.(梨花女子大学) Kangwon National Univ.(江原国立大学)

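AI summary Proposes a sparsity-promoting fine-tuning method that exploits the structure of E(3)-equivariant materials foundation models to update parameters selectively; on energy and force prediction it matches or surpasses full fine-tuning while updating about 3% of parameters, and it generalizes to tasks such as magnetic moment prediction with physically interpretable sparsity patterns.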

Comments Accepted by ICLR 2026

AI中文摘要

预训练的材料基础模型,或机器学习原子间势,利用通用的物理化学知识有效逼近势能面。然而,由于物理化学多样性以及实际计算设置与构建预训练数据所用设置之间的不匹配,它们通常需要特定领域的校准。为了解决这个问题,我们提出了一种稀疏性促进的微调方法,通过利用E(3)等变材料基础模型的结构特性选择性更新模型参数。在跨分子和晶体基准的能量和力预测任务上,我们的方法匹配或超越了全微调和等变低秩适应,同时仅更新约3%的参数,在某些情况下甚至低至约0.5%。除了能量和力校准,我们进一步通过将方法应用于磁矩预测和磁感知总能量建模来展示任务泛化性。最后,稀疏模式分析揭示了物理可解释的特征,例如过渡金属系统中增强的d轨道贡献。总体而言,我们的结果确立了稀疏性促进微调作为等变材料基础模型领域专业化的灵活且可解释的方法。

英文摘要

Pre-trained materials foundation models, or machine learning interatomic potentials, leverage general physicochemical knowledge to effectively approximate potential energy surfaces. However, they often require domain-specific calibration due to physicochemical diversity as well as mismatches between practical computational settings and those used in constructing the pre-training data. To address this, we propose a sparsity-promoting fine-tuning method that selectively updates model parameters by exploiting the structural properties of E(3)-equivariant materials foundation models. On energy and force prediction tasks across molecular and crystalline benchmarks, our method matches or surpasses full fine-tuning and equivariant low-rank adaptation while updating only $\sim$3% of parameters, and in some cases as little as $\sim$0.5%. Beyond energy and force calibration, we further demonstrate task generalizability by applying our method to magnetic moment prediction and magnetism-aware total energy modeling. Finally, analysis of sparsity patterns reveals physically interpretable signatures, such as enhanced $d$-orbital contributions in transition metal systems. Overall, our results establish sparsity-promoting fine-tuning as a flexible and interpretable method for domain specialization of equivariant materials foundation models.

2606.18967 2026-06-18 cs.LG 新提交

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

EfficientRollout: 面向强化学习推演的系统感知自推测解码

Minseo Kim, Minjae Lee, Seunghyuk Oh, Kevin Galim, Donghoon Kim, Coleman Hooper, Harman Singh, Amir Gholami, Hyung Il Koo, Wonjun Kang

发表机构 * FuriosaAI University of California, Berkeley(加州大学伯克利分校)

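AI summary To address the autoregressive decoding latency bottleneck in RL rollouts, proposes a system-aware self-speculative decoding framework that pairs a quantized self-drafter with a system-aware speculation toggle policy, reducing rollout and end-to-end latency while preserving final model quality.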

Comments Project Page: https://github.com/furiosa-ai/EfficientRollout

AI中文摘要

强化学习(RL)已成为LLM的代表性后训练范式,赋予其强大的推理和智能体能力。然而,推演生成仍是主要的延迟瓶颈,因为自回归采样顺序解码响应,且少量长尾生成往往决定完成时间。推测解码(SD)为缓解此瓶颈提供了自然途径,它是一种用于服务固定LLM的成熟技术,通过快速草拟token并经由并行验证接受它们来降低延迟,同时保持目标模型分布。但其实际加速效果无法直接迁移到RL推演:(i)不断变化的目标策略使得任何固定草拟者与策略输出分布日益不匹配;(ii)推演解码过程中活跃批次大小缩小,解码从计算受限转向内存受限,此时并行验证可利用未充分利用的计算资源。因此,加速RL推演既需要草拟者在长序列、高温生成下对演化策略保持有效,也需要系统感知地使用SD以避开计算受限状态。我们提出EfficientRollout,一个系统感知的自推测解码框架,旨在解决RL推演中的这一差距。EfficientRollout从目标模型诱导量化草拟者(即自推测解码),使其与演化策略保持耦合,无需单独的草拟者预训练或在线适应。它进一步协调系统感知的SD切换策略与接受感知的草稿长度自适应,仅在有益状态下进行推测,同时使草拟预算与演化中的草拟者质量匹配。相对于经过加速的自回归推演基线,EfficientRollout分别将推演和端到端延迟降低高达19.6%和12.7%,同时保持最终模型质量。

英文摘要

Reinforcement learning (RL) has become a representative post-training paradigm for LLMs, enabling strong reasoning and agentic capabilities. However, rollout generation remains a dominant latency bottleneck because autoregressive sampling decodes responses sequentially and a small number of long-tailed generations often determine completion time. Speculative decoding (SD) offers a natural way to address this bottleneck, as it is a well-established technique for serving fixed LLMs that reduces latency by rapidly drafting tokens and accepting them through parallel verification while preserving the target-model distribution. However, its practical speedups do not directly carry over to RL rollouts: (i) the evolving target policy makes any fixed drafter increasingly mismatched with the policy's output distribution; and (ii) active batch sizes shrink throughout rollout decoding, shifting decoding from compute-bound to memory-bound regimes where parallel verification can exploit underutilized compute. Therefore, accelerating RL rollouts requires both a drafter that remains effective under long, high-temperature generations from an evolving policy and system-aware use of SD that avoids compute-bound regimes. We present EfficientRollout, a system-aware self-SD framework designed to address this gap for RL rollouts. EfficientRollout induces a quantized drafter from the target model (i.e. self-speculative decoding), keeping it coupled to the evolving policy without separate drafter pretraining or online adaptation. It further coordinates a system-aware SD toggle policy with acceptance-aware draft-length adaptation, enabling speculation only in beneficial regimes while matching the drafting budget to evolving drafter quality. EfficientRollout reduces rollout and end-to-end latency by up to 19.6% and 12.7%, respectively, over an accelerated AR rollout baseline, while preserving final model quality.
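The property this abstract relies on, that speculative decoding "preserves the target-model distribution", comes from the standard accept/resample rule of speculative sampling. The sketch below shows that rule for a single token with fixed toy distributions; it does not model EfficientRollout's quantized drafter, toggle policy, or draft-length adaptation, and `p`, `q`, and the function names are illustrative.

```python
import random


def acceptance_prob(p, q, x):
    """Probability of accepting drafted token x: min(1, p[x] / q[x])."""
    return min(1.0, p[x] / q[x])


def residual(p, q):
    """Distribution to resample from after a rejection: normalized max(p - q, 0)."""
    diff = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(diff)
    return [d / total for d in diff]


def speculative_step(p, q, rng):
    """Draft a token from q, then accept it or resample so the output follows p."""
    x = rng.choices(range(len(q)), weights=q)[0]
    if rng.random() < acceptance_prob(p, q, x):
        return x
    # A rejection implies p[x] < q[x] for the drafted x, so the residual has positive mass.
    return rng.choices(range(len(p)), weights=residual(p, q))[0]


p = [0.5, 0.3, 0.2]  # target-model distribution (toy values)
q = [0.2, 0.5, 0.3]  # drafter distribution (toy values)
rng = random.Random(0)
samples = [speculative_step(p, q, rng) for _ in range(20000)]
frequencies = [samples.count(t) / len(samples) for t in range(len(p))]
```

The empirical `frequencies` should track `p` rather than `q`, while the more the drafter drifts from the target, the more often tokens are rejected, which is the mismatch problem the abstract attributes to a fixed drafter under an evolving policy.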

2606.19025 2026-06-18 cs.LG cs.AI cs.DC cs.SY eess.SY 新提交

FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs

FoMoE: 打破全副本壁垒的专家混合联邦系统

Lorenzo Sani, Zeyu Cao, Meghdad Kurmanji, Alex Iacob, Andrej Jovanovic, Yan Gao, Wanru Zhao, Nicholas D. Lane

发表机构 * DeepSeek-AI

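AI summary Proposes FoMoE, a system that breaks the full-replica paradigm by partitioning expert layers across workers; with partial expert replication and a skip-token mechanism it reduces communication cost by up to 1.42x over efficient baselines and improves throughput by up to 1.4x.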

AI中文摘要

预训练大型语言模型(LLMs)通常需要大规模基础设施,配备紧密耦合的硬件加速器。虽然增加模型和数据集规模仍是性能的主要驱动力,但专家混合(MoE)架构最近通过将参数数量与计算成本解耦,取得了最先进的结果。这种效率使得在受限计算预算下训练大规模模型成为可能,但通常需要单个数据中心的高速互连。为了克服这些物理限制,最近的方法如DiLoCo和Photon使用低通信数据并行方法,使得能够在地理分布、弱连接的数据中心之间进行扩展。然而,这些方法存在根本性的低效问题:它们需要在每个站点拥有完整的模型副本,这带来了高昂的内存约束和通信开销。在这项工作中,我们引入了FoMoE,一个通过跨工作节点分区专家层来打破全副本范式的系统。我们证明FoMoE:(I)通过部分专家复制,在所研究的场景中,相比高效基线降低了高达1.42倍的通信成本,相比DDP降低了45.44倍;(II)通过一种新颖的跳跃令牌机制,实现了高达1.4倍的经验吞吐量加速;(III)在训练代理场景中展示了稳定的路由,并通过系统建模将通信/内存优势推广到100B规模的配置。

英文摘要

Pre-training Large Language Models (LLMs) typically demands large-scale infrastructure with tightly coupled hardware accelerators. While increasing model and dataset scale remains the dominant driver of performance, Mixture-of-Experts (MoE) architectures have recently achieved state-of-the-art results by decoupling parameter count from computational cost. This efficiency enables training massive models on constrained compute budgets, yet it typically requires the high-speed interconnects of a single datacenter. To overcome these physical limits, recent approaches such as DiLoCo and Photon use low-communication data-parallel methods to enable scaling across geographically distributed, weakly connected data centers. However, these methods suffer from a fundamental inefficiency: they require full model replicas at every site, which imposes prohibitive memory constraints and communication overheads. In this work, we introduce FoMoE, a system that breaks the full-replica paradigm by partitioning expert layers across workers. We demonstrate that FoMoE: (I) reduces communication costs by up to 1.42x over efficient baselines and 45.44x over DDP via partial expert replication in the studied regimes; (II) achieves empirical throughput speedups of up to 1.4x through a novel skip-token mechanism; and (III) shows stable routing in the trained proxy regimes and projects the communication/memory benefits to 100B-scale configurations through system modelling.

2606.19150 2026-06-18 cs.LG 新提交

Complementary Attention Head Pruning for Efficient Transformers

互补注意力头剪枝用于高效Transformer

Yaniv Livertovsky, Shahar Somin, Gonen Singer

发表机构 * Bar-Ilan University(巴伊兰大学)

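AI summary Proposes CAHP, which frames attention-head selection as a global graph-theoretic problem, uses graph clustering with information-theoretic distances to retain complementary heads, determines the number of retained heads automatically, and outperforms competitive baselines on SST-5 and MNLI.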

Comments 9 pages, 4 figures, 3 tables. Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2026

AI中文摘要

基于Transformer的模型在自然语言处理中的显著成功源于架构的规模化,这导致大量参数并阻碍了在资源受限环境中的部署。虽然结构化剪枝提供了一条压缩路径,但现有的最先进方法通常依赖于基于梯度的重要性排序或随机门控,这些方法存在不稳定性、结构退化以及需要大量手动超参数调整的问题。在本文中,我们引入了CAHP(互补注意力头剪枝),一种新颖的事后框架,将头选择重新定义为全局图论问题。CAHP不是孤立地评估头,而是利用基于图的聚类结合信息论距离度量来识别并保留一组拓扑多样化的互补注意力头。无需预定义稀疏度或剪枝比例,该框架通过识别递减的边际性能曲线自动确定各层中保留的注意力头数量,其中根据所选多项式次数,剪除额外头会导致性能急剧下降。在SST-5和MNLI基准上跨不同Transformer模型规模的广泛评估表明,CAHP始终优于竞争基线,特别是在高压缩率情况下。此外,我们的结构分析表明,CAHP避免了基于梯度的剪枝方法的“邻近偏差”(倾向于主要保留靠近输出层的头),而是保留了模型中间层中功能关键的注意力头集合。

英文摘要

The remarkable success of Transformer-based models in natural language processing stems from architectural scaling, which leads to a large number of parameters and hinders deployment in resource-constrained environments. While structured pruning offers a pathway to compression, existing state-of-the-art methods often rely on gradient-based importance ranking or stochastic gating, which suffer from instability, structural degeneration, and the need for extensive manual hyperparameter tuning. In this paper, we introduce CAHP (Complementary Attention Head Pruning), a novel post-hoc framework that redefines head selection as a global graph-theoretical problem. Rather than evaluating heads in isolation, CAHP utilizes graph-based clustering combined with information-theoretic distance measures to identify and preserve a topologically diverse subset of complementary attention heads. Without requiring a predefined sparsity level or pruning ratio, the framework automatically determines the number of selected attention heads across layers by identifying a diminishing marginal performance curve, where pruning additional heads leads to a sharp degradation in performance, as determined by the chosen polynomial degree. Extensive evaluations on the SST-5 and MNLI benchmarks, across different Transformer model scales, demonstrate that CAHP consistently outperforms competitive baselines, particularly in high-compression regimes. Furthermore, our structural analysis shows that CAHP avoids the "proximity bias" of gradient-based pruning methods, which tend to preserve heads mainly in layers close to the output, and instead retains a functionally critical set of attention heads in the model's intermediate layers.

2606.16290 2026-06-18 cs.LG cs.AI 新提交

An affordable hardware-aware neural architecture search for deploying convolutional neural networks on ultra-low-power computing platforms

一种经济实惠的硬件感知神经架构搜索,用于在超低功耗计算平台上部署卷积神经网络

Andrea Mattia Garavagno, Edoardo Ragusa, Antonio Frisoli, Paolo Gastaldo

发表机构 * University of Genoa(热那亚大学) Scuola Superiore Sant’Anna(圣安娜高等研究学院)

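AI summary Proposes a lightweight hardware-aware neural architecture search that generates tiny CNNs able to run on ultra-low-power microcontrollers, keeping the search cheap enough to run on embedded devices while preserving state-of-the-art classification accuracy.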

Journal ref
IEEE Sensors Letters, vol. 8, no. 5, pp. 1-4, May 2024
AI中文摘要

硬件感知神经架构搜索(HW-NAS)通过自动设计能够满足预置硬件约束的神经架构,使得卷积神经网络(CNN)能够集成到微控制器设备中。然而,最先进的HW-NAS针对的是高性能微控制器,其功耗无法满足传感节点的要求。本文提出了一种HW-NAS方法,生成可在超低功耗微控制器上运行的微型CNN,其搜索过程轻量级,甚至可以在嵌入式设备上执行。在三个著名的微型计算机视觉基准测试上的实证结果表明,所提出的HW-NAS能够在保持最先进分类精度的同时生成微型CNN。

英文摘要

Hardware-aware neural architecture search (HW-NAS) allows the integration of Convolutional Neural Networks (CNNs) in microcontroller devices by automatically designing neural architectures that fit prearranged hardware constraints. However, state-of-the-art HW-NAS methods target high-performance microcontrollers, whose power consumption does not meet the requirements of sensing nodes. This work presents an HW-NAS method that generates tiny CNNs able to run on ultra-low-power microcontrollers, with a search procedure lightweight enough to be executed even on embedded devices. Empirical results on three well-known benchmarks for tiny computer vision show that the proposed HW-NAS generates tiny CNNs while preserving state-of-the-art classification accuracy.

2606.18463 2026-06-18 cs.DC cs.LG cs.NA math.NA stat.ML 交叉投稿

Mixed-Precision Communication-Avoiding SGD for Generalized Linear Models on GPUs

面向GPU上广义线性模型的混合精度通信避免SGD

Aditya Devarakonda, Irene Simó Muñoz, Giulia Guidi

发表机构 * Department of Computer Science, Wake Forest University(维克森林大学计算机科学系) Department of Computer Science, Cornell University(康奈尔大学计算机科学系)

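AI summary Proposes mixed-precision communication-avoiding SGD (CA-SGD); a finite-precision error analysis decomposes the precision choices into nine independent parts, and on NVIDIA A100 GPUs the method reaches 5.1-6.8x speedup over FP32 SGD while matching its loss within 0.5%.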

AI中文摘要

分布式随机梯度下降(SGD)受限于通信而非计算,因为每次迭代都需要跨进程进行AllReduce。通信避免SGD(CA-SGD)通过将$s$次连续的AllReduce替换为单个$sb\times sb$ Gram矩阵的AllReduce,将通信开销分摊到$s$次迭代中,以更多的计算和带宽换取更少的同步点。现代GPU配备矩阵硬件和低精度格式,通过加速Gram GEMM和缩减BF16流量来抵消这一开销。我们研究了NVIDIA GPU上针对广义线性模型的混合精度CA-SGD。我们的有限精度分析将一次CA-SGD外迭代的局部舍入误差分解为九个独立的精度选择,仅通过低精度单元舍入误差依赖于硬件,因此所得方案原则上可跨GPU代际迁移。该方案将输入矩阵和边缘向量以低精度存储,从低精度输入计算Gram矩阵并采用高精度累加,以高精度通信该矩阵,并以高精度执行内部递推和权重更新。在NERSC Perlmutter A100 GPU上,混合精度CA-SGD在逻辑回归、线性回归和泊松问题上的损失与FP32 SGD相差在0.5%以内,并在epsilon、SUSY、HIGGS、synth和Poisson-synth数据集上达到5.1-6.8倍于FP32 SGD的加速。我们的软件可在以下网址获取:https://doi.org/10.5281/zenodo.20448273

英文摘要

Distributed stochastic gradient descent (SGD) is limited by communication rather than computation, since each iteration requires an AllReduce across processes. Communication-avoiding SGD (CA-SGD) amortizes communication over $s$ iterations by replacing $s$ consecutive AllReduces with a single AllReduce of an $sb\times sb$ Gram matrix, trading more computation and bandwidth for fewer synchronization points. Modern GPUs with matrix hardware and reduced-precision formats offset this by accelerating the Gram GEMM and shrinking BF16 traffic. We study mixed-precision CA-SGD for generalized linear models on NVIDIA GPUs. Our finite-precision analysis decomposes the local rounding error of one CA-SGD outer iteration into nine independent precision choices, depending on the hardware only through its low-precision unit roundoffs, so the resulting recipes transfer in principle across GPU generations. The recipe stores the input matrix and margin vector in low precision, computes the Gram matrix from low-precision inputs with high-precision accumulation, communicates it in high precision, and performs the inner recurrence and weight updates in high precision. On NERSC Perlmutter A100 GPUs, mixed-precision CA-SGD matches FP32 SGD loss within $0.5\%$ on logistic, linear, and Poisson problems and reaches $5.1$--$6.8\times$ speedup over FP32 SGD on epsilon, SUSY, HIGGS, synth, and Poisson-synth. Our software is available at https://doi.org/10.5281/zenodo.20448273
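The communication-avoiding idea in this abstract (replace $s$ per-step reductions with one reduction of an $sb\times sb$ Gram matrix, then run an inner recurrence) can be sketched for least squares on a single process. This is an illustrative reconstruction under stated assumptions, not the authors' code: it uses squared loss only, ordinary Python floats throughout (so the mixed-precision recipe is not modeled), and no actual AllReduce. In a distributed run with features partitioned across processes, `gram` and `margin0` would be the quantities reduced once per block.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgd_steps(X, y, w, lr, b):
    """Plain mini-batch SGD for least squares over consecutive batches of size b."""
    w = list(w)
    for start in range(0, len(X), b):
        rows = range(start, start + b)
        res = [dot(X[t], w) - y[t] for t in rows]  # residuals at the current w
        for t, r in zip(rows, res):
            w = [wi - lr / b * r * xi for wi, xi in zip(w, X[t])]
    return w


def ca_sgd_steps(X, y, w, lr, b):
    """Same updates, but residuals come from a Gram-matrix recurrence instead of w."""
    n = len(X)  # n = s * b rows in the block
    gram = [[dot(X[t], X[u]) for u in range(n)] for t in range(n)]
    margin0 = [dot(X[t], w) for t in range(n)]  # X w at the start of the block
    res = [0.0] * n
    for start in range(0, n, b):
        for t in range(start, start + b):
            # Effect of all earlier steps on this row's margin, via the Gram matrix.
            correction = sum(gram[t][u] * res[u] for u in range(start))
            res[t] = margin0[t] - lr / b * correction - y[t]
    out = list(w)
    for t in range(n):  # apply the s deferred weight updates at once
        out = [wi - lr / b * res[t] * xi for wi, xi in zip(out, X[t])]
    return out


X = [[1.0, 2.0, 0.5], [0.0, 1.0, -1.0], [2.0, -1.0, 0.0], [1.0, 1.0, 1.0]]
y = [1.0, 0.0, 2.0, -1.0]
w0 = [0.1, -0.2, 0.3]
plain = sgd_steps(X, y, w0, lr=0.05, b=2)
avoiding = ca_sgd_steps(X, y, w0, lr=0.05, b=2)
```

Both routines produce the same weights up to rounding, which is the sense in which the reformulation trades extra computation (the Gram matrix) for fewer synchronization points.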

2606.19004 2026-06-18 cs.DC cs.AI cs.LG 交叉投稿

Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training

Spotlight: 协同种子探索与抢占式GPU用于DiT强化学习后训练

Ruiqi Lai, Dakai An, Wei Gao, Ju Huang, Siran Yang, Jiamang Wang, Lin Qu, Dmitrii Ustiugov, Wei Wang

发表机构 * NTU Singapore(南洋理工大学) Hong Kong University of Science and Technology(香港科技大学) Alibaba Group(阿里巴巴集团)

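AI summary To reduce the high cost of RL post-training for DiTs, proposes Spotlight, which exploits the tolerance of seed exploration to stale weights and fast reconfiguration of SP groups to train efficiently on spot GPUs, reaching the target validation score 4x faster and cutting total cost by 1.4-6.4x.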

详情
AI中文摘要

扩散Transformer(DiT)的强化学习(RL)后训练成本极高,需要数千块高端GPU。现有工作探索了两个降低成本的方向:种子探索通过选择高对比度样本来改善训练收敛,但增加了关键路径的计算量;抢占式GPU提供69-77%的成本降低,但在训练期间处于空闲状态,因为DiT rollout几乎同时完成,这阻止了类似LLM的rollout与训练流水线化。抢占式GPU的抢占进一步破坏了序列并行(SP)组,导致GPU拓扑碎片化。我们提出了Spotlight,这是第一个利用抢占式GPU进行DiT RL后训练的系统。Spotlight基于我们设计的两个关键洞察:(1)我们证明探索可以容忍过时的模型权重,因为使用前一次迭代模型权重的探索保留了随机种子的相对排序,允许探索在训练期间在空闲的抢占式GPU上运行。(2)SP重配置可以重用节点内状态,将组恢复时间从分钟级缩短到亚秒级启动。基于这些洞察,Spotlight引入了三种技术:基于bandit的探索规划器,在训练时间预算内最大化奖励方差;弹性序列并行,通过持久调度器和节点内权重复制动态重配置SP组;以及抢占感知的拉取式请求调度器,平衡负载并在抢占时提交进行中的状态。我们在开源RL平台ROLL上实现了Spotlight,并在Qwen-Image后训练上进行了评估。Spotlight达到相同目标验证分数的速度比基线快4倍,总成本降低1.4-6.4倍,同时在分辨率512×512和1280×1280的DeepSeek-OCR和Geneval数据集上实现了更优的图像质量。

英文摘要

Reinforcement learning (RL) post-training of Diffusion Transformers (DiTs) is prohibitively expensive, requiring thousands of high-end GPUs. Existing works explore two directions to reduce cost: seed exploration improves training convergence by selecting high-contrast samples, yet adds compute to the critical path; spot GPUs offer 69--77\% lower cost, yet sit idle during training because DiT rollouts finish nearly simultaneously, which prevents LLM-style pipelining of rollout with training. Spot preemptions further break Sequence Parallelism (SP) groups, fragmenting GPU topology. We present Spotlight, the first system that harvests spot GPUs for DiT RL post-training. Spotlight rests on two key insights: (1) exploration can tolerate stale model weights, because exploring with the previous iteration's weights preserves the relative ranking of random seeds, so exploration can run on idle spot GPUs during training; (2) SP reconfiguration can reuse on-node state, reducing group recovery from minutes to sub-second launches. Built on these insights, Spotlight introduces three techniques: a bandit-based exploration planner that maximizes reward variance within the training time budget, elastic sequence parallelism that reconfigures SP groups on the fly via persistent schedulers and intra-node weight copying, and a preemption-aware pull-based request scheduler that balances load and commits in-flight state upon preemption. We implement Spotlight on the open-source RL platform ROLL and evaluate it on Qwen-Image post-training. Spotlight reaches the same target validation score $4\times$ faster than baselines, reducing total cost by $1.4$--$6.4\times$ while achieving superior image quality on DeepSeek-OCR and Geneval datasets with resolution $512\times512$ and $1280\times1280$.

2606.14824 2026-06-18 cs.AR cs.AI cs.LG 交叉投稿

Running hardware-aware neural architecture search on embedded devices under 512MB of RAM

在512MB内存下的嵌入式设备上运行硬件感知的神经架构搜索

Andrea Mattia Garavagno, Edoardo Ragusa, Paolo Gastaldo, Antonio Frisoli

发表机构 * University of Bologna(博洛尼亚大学) Politecnico di Milano(米兰理工学院)

AI总结 提出一种在资源受限的嵌入式设备上直接运行的硬件感知神经架构搜索方法,生成针对低端MCU的微型CNN,在Visual Wake Word数据集上达到最先进水平。

详情
Journal ref
2024 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2024, pp. 1-2
AI中文摘要

本文提出了一种新颖的硬件感知神经架构搜索(HW NAS)方法,该方法考虑了运行它的计算平台上的可用资源,使其能够在各种嵌入式设备上执行。所提出的HW NAS生成针对低端微控制器单元(MCU)的微型卷积神经网络(CNN),这些MCU通常用于物联网(IoT)或可穿戴机器人领域,从而开辟了新的应用场景。网关可以运行它来根据获取的数据定制CNN的架构,而无需使用外部服务器,从而确保隐私。所提出的技术在Visual Wake Word数据集(一个标准的TinyML基准)上的多个人体识别任务中,在多个嵌入式设备上取得了最先进的结果。

英文摘要

This document proposes a novel approach to hardware-aware neural architecture search (HW NAS) that considers the resources available on the computing platform running it, enabling its execution on various embedded devices. The presented HW NAS produces tiny convolutional neural networks (CNNs) targeting low-end microcontroller units (MCUs), typically involved in the Internet of Things (IoT) or wearable robotics, opening new use cases. A gateway could run it to tailor CNNs' architecture on the acquired data without using external servers, ensuring privacy. The proposed technique achieves state-of-the-art results in the human-recognition tasks on the Visual Wake Word dataset, a standard TinyML benchmark, on several embedded devices.

2509.22020 2026-06-18 cs.LG 版本更新

Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models

面向天气基础模型的任务自适应参数高效微调

Shilei Cao, Hehai Lin, Jiashun Cheng, Yang Liu, Guowen Li, Xuehe Wang, Juepeng Zheng, Haoyuan Liang, Meng Jin, Chengwei Qin, Hong Cheng, Haohuan Fu

发表机构 * Sun Yat-sen University(中山大学) The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) The Hong Kong University of Science and Technology(香港科技大学) The Chinese University of Hong Kong(香港中文大学) National Supercomputing Center in Shenzhen(深圳国家超算中心) Huawei Technologies Co., Ltd(华为技术有限公司) Tsinghua University(清华大学)

AI总结 提出WeatherPEFT框架,通过任务自适应动态提示和随机Fisher引导自适应选择,在天气下游任务上以更少参数达到全微调性能。

详情
AI中文摘要

尽管机器学习的最新进展使天气基础模型(WFM)在多种下游任务中具备了强大的泛化能力,但随着模型规模扩大,计算需求不断攀升,实际部署愈发困难。当前为视觉或语言任务设计的参数高效微调(PEFT)方法无法应对天气下游任务的独特挑战,如变量异质性、分辨率多样性和时空覆盖变化,导致在WFM上性能欠佳。为弥补这一差距,我们提出WeatherPEFT,一种新颖的PEFT框架,包含两项协同创新。首先,在前向传播中,任务自适应动态提示(TADP)通过内部和外部模式提取,将编码器中的嵌入权重动态注入预训练骨干网络的输入令牌,实现针对特定下游任务的上下文感知特征重校准。其次,在反向传播中,随机Fisher引导自适应选择(SFAS)不仅利用Fisher信息识别并更新最关键的任务参数,从而保留不变的预训练知识,还引入随机性以稳定选择过程。我们在三个下游任务上验证了WeatherPEFT的有效性和效率:在这些任务上,现有PEFT方法与全微调相比存在显著差距,而WeatherPEFT使用更少的可训练参数达到了与全微调相当的性能。本工作代码见 https://github.com/ShileiCao/WeatherPEFT

英文摘要

While recent advances in machine learning have equipped Weather Foundation Models (WFMs) with substantial generalization capabilities across diverse downstream tasks, the escalating computational requirements associated with their expanding scale increasingly hinder practical deployment. Current Parameter-Efficient Fine-Tuning (PEFT) methods, designed for vision or language tasks, fail to address the unique challenges of weather downstream tasks, such as variable heterogeneity, resolution diversity, and spatiotemporal coverage variations, leading to suboptimal performance when applied to WFMs. To bridge this gap, we introduce WeatherPEFT, a novel PEFT framework for WFMs incorporating two synergistic innovations. First, during the forward pass, Task-Adaptive Dynamic Prompting (TADP) dynamically injects the embedding weights within the encoder to the input tokens of the pre-trained backbone via internal and external pattern extraction, enabling context-aware feature recalibration for specific downstream tasks. Furthermore, during backpropagation, Stochastic Fisher-Guided Adaptive Selection (SFAS) not only leverages Fisher information to identify and update the most task-critical parameters, thereby preserving invariant pre-trained knowledge, but also introduces randomness to stabilize the selection. We demonstrate the effectiveness and efficiency of WeatherPEFT on three downstream tasks, where existing PEFT methods show significant gaps versus Full-Tuning, and WeatherPEFT achieves performance parity with Full-Tuning using fewer trainable parameters. The code of this work is available at https://github.com/ShileiCao/WeatherPEFT.

2601.21626 2026-06-18 cs.LG cs.AI 版本更新

HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning

HeRo-Q: 通过Hessian条件化实现稳定低比特量化的通用框架

Jinhao Zhang, Yunquan Zhang, Zicheng Yan, Boyang Zhang, Jun Sun, Daning Cheng

发表机构 * Beijing University of Posts and Telecommunications(北京邮电大学) Institute of Computing Technology, Chinese Academy of Sciences(中国科学院计算技术研究所) University of Science and Technology of China(中国科学技术大学) Zhejiang Lab(浙江实验室) Peng Cheng Laboratory(鹏城实验室)

AI总结 针对后训练量化中“低误差、高损失”的矛盾,提出HeRo-Q算法,通过轻量可学习的旋转压缩矩阵重塑损失景观,降低最大Hessian特征值,增强对量化噪声的鲁棒性,在Llama和Qwen模型上优于现有方法。

详情
AI中文摘要

后训练量化(PTQ)是一种主流的模型压缩技术,但由于其仅专注于最小化量化误差,常常导致矛盾的“低误差、高损失”现象。根本原因在于LLM损失景观的Hessian矩阵:少数高曲率方向对扰动极其敏感。为了解决这个问题,我们提出了Hessian鲁棒量化(HeRo-Q)算法,该算法在量化前对权重空间应用一个轻量级、可学习的旋转压缩矩阵。这个联合框架通过降低最大的Hessian特征值来重塑损失景观,从而显著增强对量化噪声的鲁棒性。HeRo-Q不需要修改架构,计算开销可忽略不计,并且可以无缝集成到现有的PTQ流程中。在Llama和Qwen模型上的实验表明,HeRo-Q不仅在标准W4A8设置下持续优于包括GPTQ、AWQ和SpinQuant在内的最先进方法,而且在极具挑战性的W3A16超低比特场景中表现出色,将Llama3 8B在GSM8K上的准确率提升至70.15%,并有效避免了激进量化中常见的逻辑崩溃。

英文摘要

Post-Training Quantization (PTQ), a mainstream model compression technique, often leads to the paradoxical 'low error, high loss' phenomenon because it focuses solely on minimizing quantization error. The root cause lies in the Hessian matrix of the LLM loss landscape: a few high-curvature directions are extremely sensitive to perturbations. To address this, we propose the Hessian Robust Quantization (HeRo-Q) algorithm, which applies a lightweight, learnable rotation-compression matrix to the weight space prior to quantization. This joint framework reshapes the loss landscape by reducing the largest Hessian eigenvalue, thereby significantly enhancing robustness to quantization noise. HeRo-Q requires no architectural modifications, incurs negligible computational overhead, and integrates seamlessly into existing PTQ pipelines. Experiments on Llama and Qwen models show that HeRo-Q consistently outperforms state-of-the-art methods including GPTQ, AWQ, and SpinQuant, not only achieving superior performance under standard W4A8 settings but also excelling in the highly challenging W3A16 ultra-low-bit regime, where it boosts GSM8K accuracy on Llama3 8B to 70.15\% and effectively avoids the logical collapse commonly seen in aggressive quantization.

2602.00161 2026-06-18 cs.LG cs.AI cs.CL quant-ph 版本更新

LLM Compression by Block Removal with Constrained Binary Optimization

通过带约束二进制优化的块移除进行LLM压缩

David Jansen, Roman Rausch, Ali Hashemi, David Montero, Román Orús

发表机构 * Multiverse Computing(多维计算公司) Donostia International Physics Center(多斯蒂亚国际物理中心) Ikerbasque Foundation for Science(伊克尔巴斯克科学基金会)

AI总结 提出将大语言模型块移除压缩问题建模为约束二进制优化,映射到Ising玻璃系统,实现高效排序和高质量非连续块移除,在50%压缩时MMLU提升近23个百分点,且计算高效、通用性强。

Comments 16 pages, 3 figures

详情
AI中文摘要

在本文中,我们将通过最优删除Transformer块(“块移除”)来压缩大语言模型(LLM)的问题,表述为一个约束二进制优化(CBO)问题,该问题可以映射到物理系统(Ising玻璃),其能量是下游模型性能的强代理。这种表述使得能够高效地对大量候选块移除配置进行排序,产生许多高质量、非平凡的解决方案,而不仅仅是移除连续区域。我们的方法在深度压缩场景中表现强劲,例如在Llama-3.3-70B-Instruct的50%压缩中,与其他最先进的块移除方法相比,我们在MMLU基准上取得了近23个百分点的提升。对于较轻的压缩,它在多个基准上与这些方法表现相当,适用于Llama-3.1-8B-Instruct、Qwen3-14B(重训练前后)以及Llama-3.3-70B-Instruct。该方法计算效率高,仅需在校准数据集上对少数活跃参数进行前向和反向传播。此外,我们证明,当无法精确求解CBO问题时,使用良好的启发式求解器可以在可忽略的运行时间内提供在下游任务上表现良好的解决方案。该方法可以轻松应用于任何架构。我们在最近的NVIDIA-Nemotron-3-Nano-30B-A3B-FP8模型上展示了这种通用性,该模型具有高度不均匀且具有挑战性的块结构,并且在移除2个注意力层或3个混合专家层时,我们在AIME25和GPQA上超越了最先进水平。

英文摘要

In this paper, we formulate the compression of large language models (LLMs) by optimally deleting transformer blocks (``block removal'') as a constrained binary optimization (CBO) problem that can be mapped to a physical system (Ising glass), whose energies are a strong proxy for downstream model performance. This formulation enables an efficient ranking of a large number of candidate block-removal configurations yielding many high-quality, non-trivial solutions beyond those only removing consecutive regions. Our method performs strongly in the deep compression regime, such as for 50% compression of Llama-3.3-70B-Instruct, where we achieve an almost 23 percentage point increase on the MMLU benchmark compared to other state-of-the-art (SOTA) block-removal methods. For lighter compression, it performs on par with those methods across several benchmarks for Llama-3.1-8B-Instruct, Qwen3-14B (both before and after retraining), as well as Llama-3.3-70B-Instruct. The approach is computationally efficient and requires only forward and backward passes on a calibration dataset for a few active parameters. Additionally, we demonstrate that using good heuristic solvers for the CBO problem provides solutions that perform well on downstream tasks in negligible runtime when it is unfeasible to solve the problem exactly. The method can be readily applied to any architecture. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure, and where we outperform SOTA for AIME25 and GPQA when removing either 2 attention layers or 3 mixture-of-experts layers.
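
The constrained binary formulation can be illustrated with a toy brute-force ranking. This is a sketch under stated assumptions, not the paper's method: the Ising-style energy form (a per-block cost `h` plus a pairwise coupling `J`), the coefficient values, and the function name `best_block_removals` are all hypothetical, and exhaustive enumeration only works at toy sizes where the paper would use an exact or heuristic solver.

```python
from itertools import combinations

def best_block_removals(h, J, k, top=3):
    """Rank all ways to remove exactly k blocks by an Ising-style energy."""
    n = len(h)
    ranked = []
    # Enumerating k-subsets enforces the cardinality constraint sum(x) == k.
    for removed in combinations(range(n), k):
        energy = sum(h[i] for i in removed)
        energy += sum(J[i][j] for i, j in combinations(removed, 2))
        ranked.append((energy, removed))
    ranked.sort()
    return ranked[:top]

# Hypothetical coefficients for a 5-block model: h[i] is the cost of removing
# block i alone, J[i][j] the extra cost of removing blocks i and j together.
h = [0.9, 0.2, 0.3, 0.25, 0.8]
J = [[0.0, 0.1, 0.0, 0.0, 0.0],
     [0.1, 0.0, 0.6, 0.0, 0.0],
     [0.0, 0.6, 0.0, 0.1, 0.0],
     [0.0, 0.0, 0.1, 0.0, 0.1],
     [0.0, 0.0, 0.0, 0.1, 0.0]]

best = best_block_removals(h, J, k=2)
```

With these made-up numbers the lowest-energy choice removes blocks 1 and 3, a non-consecutive pair: the large coupling between blocks 1 and 2 makes the contiguous option more expensive.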

2512.12850 2026-06-18 cs.AR cs.LG cs.SY eess.SY hep-ex 版本更新

KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation

KANELÉ:基于Kolmogorov-Arnold网络的高效LUT评估

Duc Hoang, Aarush Gupta, Philip Harris

发表机构 * Massachusetts Institute of Technology(麻省理工学院)

AI总结 提出KANELÉ框架,利用Kolmogorov-Arnold网络(KAN)的独特性质,通过量化与剪枝协同优化,首次系统实现FPGA上的高效LUT映射,相比先前方法加速高达2700倍并节省大量资源。

Comments International Symposium on Field-Programmable Gate Arrays 2026 (ISFPGA'2026)

详情
AI中文摘要

低延迟、资源高效的FPGA神经网络推理对于需要实时能力和低功耗的应用至关重要。基于查找表(LUT)的神经网络是一种常见解决方案,结合了强大的表示能力和高效的FPGA实现。在这项工作中,我们介绍了KANELÉ,一个利用Kolmogorov-Arnold网络(KAN)独特性质进行FPGA部署的框架。与传统的多层感知器(MLP)不同,KAN使用可学习的一维样条作为边缘激活函数,其域固定,这种结构天然适合离散化和高效的LUT映射。我们提出了第一个在FPGA上实现KAN的系统设计流程,通过量化与剪枝协同优化训练,以实现紧凑、高吞吐量和低延迟的KAN架构。我们的结果表明,与先前的KAN-on-FPGA方法相比,加速高达2700倍,并节省了数量级的资源。此外,KANELÉ在广泛使用的基准测试中匹配或超越了其他基于LUT的架构,特别是在涉及符号或物理公式的任务中,同时平衡了FPGA硬件上的资源使用。最后,我们通过将框架扩展到实时、高能效的控制系统,展示了其多功能性。

英文摘要

Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANELÉ, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANELÉ matches or surpasses other LUT-based architectures on widely used benchmarks, particularly for tasks involving symbolic or physical formulas, while balancing resource usage across FPGA hardware. Finally, we showcase the versatility of the framework by extending it to real-time, power-efficient control systems.
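
Why a one-dimensional activation on a fixed domain maps naturally to a lookup table can be shown in a few lines. This is a minimal sketch, not the KANELÉ design flow: `math.sin` stands in for a learned spline, and `build_lut`, `lut_eval`, the domain `[-1, 1]`, and the 6-bit table size are assumptions made for the example.

```python
import math

def build_lut(fn, lo, hi, bits):
    """Tabulate fn at 2**bits evenly spaced points of the fixed domain [lo, hi]."""
    size = 1 << bits
    step = (hi - lo) / (size - 1)
    return [fn(lo + i * step) for i in range(size)]

def lut_eval(lut, lo, hi, x):
    """Nearest-entry lookup; inputs outside the domain are clamped to it."""
    size = len(lut)
    pos = (min(max(x, lo), hi) - lo) / (hi - lo) * (size - 1)
    return lut[int(pos + 0.5)]

def edge_fn(x):
    # Stand-in for a learned 1-D spline activation (hypothetical).
    return math.sin(3.0 * x)

lut = build_lut(edge_fn, -1.0, 1.0, bits=6)
test_points = [-1.0 + 2.0 * i / 999 for i in range(1000)]
max_err = max(abs(lut_eval(lut, -1.0, 1.0, x) - edge_fn(x)) for x in test_points)
```

Because the domain is fixed, the index computation is a scale and a round, and the table size (here 64 entries) directly trades memory for approximation error.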

2602.02056 2026-06-18 cs.AR cs.LG cs.SY eess.SY stat.ML 版本更新

Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks

基于Kolmogorov-Arnold网络中样条局部性的超快片上在线学习

Duc Hoang, Aarush Gupta, Philip Harris

发表机构 * MIT(麻省理工学院)

AI总结 针对量子计算和核聚变控制等高频系统对亚微秒级在线学习的需求,提出利用Kolmogorov-Arnold网络的B样条局部性实现稀疏更新和定点量化鲁棒性,在FPGA上实现比MLP更高效、更具表达力的超快在线学习。

Comments Forty-Third International Conference on Machine Learning (ICML'26)

详情
AI中文摘要

超快在线学习对于高频系统(如量子计算和核聚变控制)至关重要,这些系统中的自适应必须在亚微秒时间尺度内发生。满足这些需求需要在严格的内存约束下进行低延迟、固定精度的计算,而传统的多层感知器(MLP)在这种条件下既低效又数值不稳定。我们识别了Kolmogorov-Arnold网络(KAN)与这些约束相符的关键特性。具体来说,我们表明:(i)利用B样条局部性的KAN更新是稀疏的,从而实现优越的片上资源缩放;(ii)KAN对定点量化具有固有的鲁棒性。通过在现场可编程门阵列(FPGA,一种代表性的片上计算平台)上实现定点在线训练,我们证明基于KAN的在线学习器在一系列低延迟和资源受限的任务中比MLP显著更高效且更具表达力。据我们所知,这项工作首次展示了在亚微秒延迟下的无模型在线学习。

英文摘要

Ultrafast online learning is essential for high-frequency systems, such as controls for quantum computing and nuclear fusion, where adaptation must occur on sub-microsecond timescales. Meeting these requirements demands low-latency, fixed-precision computation under strict memory constraints, a regime in which conventional Multi-Layer Perceptrons (MLPs) are both inefficient and numerically unstable. We identify key properties of Kolmogorov-Arnold Networks (KANs) that align with these constraints. Specifically, we show that: (i) KAN updates exploiting B-spline locality are sparse, enabling superior on-chip resource scaling, and (ii) KANs are inherently robust to fixed-point quantization. By implementing fixed-point online training on Field-Programmable Gate Arrays (FPGAs), a representative platform for on-chip computation, we demonstrate that KAN-based online learners are significantly more efficient and expressive than MLPs across a range of low-latency and resource-constrained tasks. To our knowledge, this work is the first to demonstrate model-free online learning at sub-microsecond latencies.
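
The sparsity claim follows from a standard property of B-splines: at any input, only degree + 1 basis functions are nonzero, so a gradient step writes only that many coefficients. The sketch below uses degree-1 (piecewise-linear) B-splines on a uniform grid to keep the arithmetic short; the function names, grid size, and learning rate are assumptions for illustration, and floating point is used here where the paper works in fixed point.

```python
def active_basis(x, lo, hi, n_coef):
    """Indices and weights of the two linear B-splines that can be nonzero at x."""
    pos = (min(max(x, lo), hi) - lo) / (hi - lo) * (n_coef - 1)
    i = min(int(pos), n_coef - 2)
    u = pos - i
    return (i, 1.0 - u), (i + 1, u)

def spline_eval(coef, x, lo, hi):
    return sum(coef[i] * w for i, w in active_basis(x, lo, hi, len(coef)))

def online_update(coef, x, target, lo, hi, lr):
    """One SGD step on squared error; returns the indices that were written."""
    err = spline_eval(coef, x, lo, hi) - target
    touched = []
    for i, w in active_basis(x, lo, hi, len(coef)):
        coef[i] -= lr * err * w   # only the active coefficients change
        touched.append(i)
    return touched

coef = [0.0] * 16
touched = online_update(coef, x=0.3, target=1.0, lo=-1.0, hi=1.0, lr=0.5)
n_changed = sum(1 for c in coef if c != 0.0)
```

Of the 16 coefficients, the update writes two and leaves the rest untouched, whereas a dense MLP layer would update every weight on each sample.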

2606.04404 2026-06-18 stat.ML cs.LG 版本更新

Knockoffs-based False Discovery Rate Control and Simplification for Deep Neural Networks

基于Knockoffs的深度神经网络错误发现率控制与简化

Wenyu Liao, Yiqing Shi, Fang Xie

发表机构 * bnbu.edu.cn(北师香港浸会大学)

AI总结 本文基于knockoff方法和正则化神经网络,提出了三种在控制错误发现率条件下的变量筛选方法(单层过滤、多层过滤、变量权重聚合过滤),以简化深度神经网络并降低计算复杂度。

详情
AI中文摘要

深度神经网络是机器学习中广泛使用的框架,已广泛应用于各个领域。然而,深度神经网络通常涉及大量参数和输入,其中许多可能与目标或真实输出无关。这些参数和输入变量不仅增加了计算复杂度,还导致了额外的计算成本。解决这一问题的一种方法是knockoff方法,该方法在高维回归中已被证明能有效控制错误发现率。基于knockoff方法和正则化神经网络,本文提出了三种在控制错误发现率条件下的变量筛选方法:单层过滤、多层过滤、变量权重聚合过滤。与现有算法相比,我们发现我们的算法表现出令人满意的性能。

英文摘要

Deep neural networks are a widely used machine learning framework applied across many fields. However, they often involve a large number of parameters and inputs, many of which may be irrelevant to the goal or true output. These parameters and input variables increase both computational complexity and computational cost. One solution to this problem is knockoff methods, which have proven successful in controlling false discovery rates in high-dimensional regression. Building on knockoff methods and a regularised neural network, this paper proposes three variable screening methods that control the false discovery rate: a one-layer filter, a multi-layer filter, and a variable weight aggregation filter. Compared with existing algorithms, our algorithms show satisfactory performance.
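
For readers unfamiliar with knockoff filtering, the selection step can be sketched as follows. The threshold rule is the standard knockoff+ rule from the knockoff literature; how the statistics `W` are derived from network weights is specific to the paper and is not reproduced here, so the values below are hypothetical.

```python
def knockoff_plus_threshold(W, q):
    """Smallest t among the |W_j| with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in sorted({abs(w) for w in W if w != 0}):
        negatives = sum(w <= -t for w in W)
        positives = sum(w >= t for w in W)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")   # nothing can be selected at this FDR level

def select(W, q):
    """Indices of variables whose statistic clears the threshold."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]

# Hypothetical statistics: large positive values favour the real variable,
# negative values favour its knockoff copy.
W = [5.1, 4.2, -0.3, 3.9, 0.2, 3.3, -0.1, 2.8, 0.4, 2.5, 4.4]
threshold = knockoff_plus_threshold(W, q=0.2)
selected = select(W, q=0.2)
```

Negative statistics act as an estimate of how many false positives sit above the threshold, which is what lets the rule bound the false discovery rate at the target level `q`.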

7. 联邦学习、隐私与安全 14 篇

2606.18309 2026-06-18 cs.LG cs.AI 新提交

SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector

SAGE: 保留感知的最终遗忘向量事后净化

Jingyuan Zhang, Yucheng Bai, Peixi Wen, Zhehao Huang, Zhengbao He, Hanling Tian, Xinwen Cheng, Haiyin Ran, Xiaolin Huang

发表机构 * Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University(上海交通大学图像处理与模式识别研究所)

AI总结 提出SAGE方法,通过事后净化最终更新向量,在不重新运行原始遗忘流程的情况下,缓解大语言模型遗忘与保留能力之间的权衡。

详情
AI中文摘要

大语言模型(LLM)遗忘旨在移除不良知识或行为,同时保留已有能力。当前的遗忘方法都涉及遗忘与保留之间的权衡。我们发现,保留激活偏差也可用于量化遗忘方法对保留造成的损害,而无需考虑遗忘过程的具体实现。这使得我们能够通过事后方法恢复任何遗忘方法的保留性能。因此,我们提出一种互补的事后设置,在不重新运行原始遗忘流程的情况下净化最终更新向量。在该设置中,我们设计了SAGE(光谱激活-几何净化),一种对最终遗忘更新的源无关修正。SAGE从一个小型保留代理收集真实模块输入,提取其主导激活几何结构,并求解一个闭式源锚定优化目标,该目标抑制与高能保留方向对齐的更新分量,同时保留源方法的遗忘载体。在多种遗忘方法、模型规模和基准测试中,SAGE持续缓解保留-遗忘权衡,将最终向量的事后净化识别为机器遗忘中一个实用且未被充分探索的维度。

英文摘要

Large Language Model (LLM) unlearning aims to remove undesirable knowledge or behaviors while preserving retained capabilities. Current unlearning methods all involve a trade-off between unlearning and retention. We have found that the retention activation bias can also be used to quantify the damage an unlearning method inflicts on retention, without considering the specific implementation of the unlearning process. This allows us to restore retention performance for any unlearning method using a post-hoc approach. Therefore, we propose a complementary post-hoc setting to sanitize the final update vector without rerunning the original unlearning pipeline. In this setting, we design SAGE, Spectral Activation-GEometry Sanitization, a source-agnostic correction for final unlearning updates. SAGE collects real module inputs from a small retain proxy, extracts their dominant activation geometry, and solves a source-anchored optimization objective in closed form, which suppresses update components aligned with high-energy retained directions while preserving the source method's forgetting carrier. Across multiple unlearning methods, model scales, and benchmarks, SAGE consistently relieves the retain-forget trade-off, identifying post-hoc sanitization of final vectors as a practical and underexplored axis for machine unlearning.
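
The idea of suppressing update components aligned with high-energy retained directions can be sketched with a hard projection. This is a simplification, not SAGE itself: the paper solves a source-anchored objective in closed form, whereas the sketch removes the top-`k` principal input directions outright. The function name, the shapes, and the synthetic data are assumptions.

```python
import numpy as np

def project_out_retain_directions(delta_W, retain_inputs, k):
    """Remove the part of an update that acts on the top-k retain input directions."""
    # Rows of Vt are the principal directions of the retain activations.
    _, _, Vt = np.linalg.svd(retain_inputs, full_matrices=False)
    V_k = Vt[:k].T                                   # shape (d_in, k)
    return delta_W - (delta_W @ V_k) @ V_k.T

rng = np.random.default_rng(0)
d_in, d_out, n = 16, 8, 200
basis = rng.standard_normal((3, d_in))               # retain inputs span 3 directions
retain_inputs = rng.standard_normal((n, 3)) @ basis  # synthetic module inputs
delta_W = rng.standard_normal((d_out, d_in))         # stand-in unlearning update

clean = project_out_retain_directions(delta_W, retain_inputs, k=3)
before = np.linalg.norm(retain_inputs @ delta_W.T)   # effect on retain outputs
after = np.linalg.norm(retain_inputs @ clean.T)
```

After projection the update no longer changes the module's output on the retain inputs, while its action on directions outside that subspace is left as it was.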

2606.18384 2026-06-18 cs.LG cs.DC 新提交

SCOPE-FL: A Strategy-proof Chain-based Optimal pareto efficient Federated Learning System

SCOPE-FL:一种防策略、基于链的最优帕累托高效联邦学习系统

Seyed Salar Ghazi, Kaiwen Zhang, Mehdi feizi, Hans-Arno Jacobsen

发表机构 * École de Technologie Supérieure (ÉTS)(高等技术学院) Ferdowsi University of Mashhad(菲尔多西大学) University of Toronto(多伦多大学)

AI总结 针对分层联邦学习中客户端选择策略缺乏帕累托效率和防策略性导致整体福利下降的问题,提出SCOPE-FL框架,采用顶级交易循环算法同时保证帕累托最优和防策略性,并通过区块链智能合约实现奖励分配。

详情
AI中文摘要

分层联邦学习(HFL)能够在分布式设备间实现可扩展的协作模型训练,同时保护数据隐私。然而,现有的HFL客户端选择机制存在根本性的策略低效问题。通过优先考虑稳定性而非帕累托效率(PE),它们产生次优的资源分配;并且由于缺乏防策略性(SP),参与者有动机歪曲其真实偏好。这两种失败在实践中都会在帕累托意义上降低系统整体福利。为解决这一问题,我们提出SCOPE-FL(防策略、基于链的最优帕累托高效联邦学习),一种同步HFL框架,将客户端选择建模为双边学校选择问题,通过顶级交易循环(TTC)算法求解,同时保证PE和SP。对于奖励分配,SCOPE-FL采用基于一轮重建(OR)的可扩展沙普利值近似,确保补偿与每个客户端的贡献成比例。整个机制通过区块链智能合约执行,为SP保证在实践中成立提供了防篡改环境。在MNIST、Fashion-MNIST和CIFAR-10上的综合评估表明,SCOPE-FL在模型准确率、收敛速度和奖励效率方面优于现有最先进方法(包括DA、IAS等),同时通信延迟与DA相当,区块链开销在大规模下显著低于DA。

英文摘要

Hierarchical Federated Learning (HFL) enables scalable collaborative model training across distributed devices while preserving data privacy. However, existing HFL client selection mechanisms suffer from a fundamental strategic inefficiency. By prioritizing stability over Pareto efficiency (PE), they produce suboptimal resource allocations, and without strategy-proofness (SP), participants are incentivized to misrepresent their true preferences; both failures degrade overall system welfare in the Pareto sense. To address this, we propose SCOPE-FL (Strategy-proof Chain-based Optimal pareto efficient Federated Learning), a synchronous HFL framework that formulates client selection as a two-sided school choice problem solved through the Top Trading Cycle (TTC) algorithm, which simultaneously guarantees PE and SP. For reward distribution, SCOPE-FL employs a scalable Shapley value approximation based on One-Round Reconstruction (OR), ensuring compensation proportional to each client's contribution. The entire mechanism executes via blockchain smart contracts, providing the tamper-proof environment required for the SP guarantees to hold in practice. A comprehensive evaluation on MNIST, Fashion-MNIST, and CIFAR-10 demonstrates that SCOPE-FL outperforms state-of-the-art approaches, including DA, IAS, and other methods, across model accuracy, convergence rate, and reward efficiency, while achieving communication latency comparable to DA and blockchain overhead significantly lower than DA at scale.
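
The Top Trading Cycle mechanism is easiest to see in its classic one-sided form, where each agent starts out owning one item. The sketch below implements that textbook housing-market version; the two-sided school-choice variant used for client selection in SCOPE-FL adds priorities and capacities that are not modelled here, and the agent names and preferences are made up.

```python
def top_trading_cycles(prefs):
    """prefs[a] lists items (named after their initial owner) from best to worst
    and must include a's own item. Returns a dict agent -> item received."""
    remaining = set(prefs)
    assignment = {}
    while remaining:
        # Each remaining agent points at the owner of its best remaining item.
        points_to = {a: next(i for i in prefs[a] if i in remaining)
                     for a in remaining}
        # Follow pointers until an agent repeats; that closes a cycle.
        path, a = [], next(iter(remaining))
        while a not in path:
            path.append(a)
            a = points_to[a]
        cycle = path[path.index(a):]
        # Everyone in the cycle receives the item they point at and leaves.
        for member in cycle:
            assignment[member] = points_to[member]
        remaining -= set(cycle)
    return assignment

prefs = {
    "A": ["B", "C", "A"],
    "B": ["A", "C", "B"],
    "C": ["A", "B", "C"],
}
result = top_trading_cycles(prefs)
```

Here A and B swap in the first round, and C, whose preferred items are gone, keeps its own. No agent could have obtained a better item by reporting different preferences, which is the strategy-proofness property the abstract relies on.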

2606.18518 2026-06-18 cs.LG cs.AI 新提交

PSyGenTAB: A Privacy-Preserving Framework for Synthetic Clinical Tabular Data Generation via Constrained Optimization

PSyGenTAB:通过约束优化生成合成临床表格数据的隐私保护框架

Arshia Ilaty, Hossein Shirazi, Manasi Chitale, Kedar Hegde, Dhanalakshmi Ramesh, Rashmi S. Manjunath, Amir Rahmani, Hajar Homayouni

发表机构 * San Diego State University(圣地亚哥州立大学) University of California, Irvine(加利福尼亚大学尔湾分校)

AI总结 提出PSyGenTAB框架,将合成医疗数据生成建模为约束优化问题,通过增强拉格朗日方法嵌入可配置隐私约束,在保证隐私阈值的同时最大化临床数据效用,实验表明合成数据训练的模型性能与真实数据相当。

Comments 20 pages

详情
AI中文摘要

由于机构壁垒和严格的隐私法规(如HIPAA和GDPR),医疗AI的发展受到高质量临床数据获取限制。合成数据生成提供了一种潜在解决方案,但现有方法缺乏明确管理隐私-效用权衡的原则性机制,常常退化临床有意义的模式或面临患者重识别风险。我们提出PSyGenTAB,一个隐私保护生成框架,将合成医疗数据生成建模为使用增强拉格朗日方法求解的约束优化问题。通过将可配置的隐私约束直接嵌入模型训练,PSyGenTAB在最大化临床数据效用的同时强制执行最低隐私阈值。在多个临床驱动的基准测试中,PSyGenTAB保留了可靠健康AI所需的特征间临床关系和少数类诊断模式。使用“合成训练、真实测试”和“真实训练、合成测试”协议的下游评估表明,在合成数据上训练的模型达到了与真实患者记录训练模型相当的性能。隐私审计进一步证明了精确记录复制的减少和对成员推理攻击的强大抵抗力。这些结果确立了PSyGenTAB作为平衡合成医疗数据中隐私保护和临床效用的原则性框架,支持安全的跨机构AI开发。

英文摘要

The development of medical AI is constrained by limited access to high-quality clinical data due to institutional silos and strict privacy regulations such as HIPAA and GDPR. Synthetic data generation offers a potential solution, but existing methods lack principled mechanisms to explicitly manage the privacy-utility trade-off, often degrading clinically meaningful patterns or risking patient re-identification. We present PSyGenTAB, a privacy-preserving generative framework that formulates synthetic healthcare data generation as a constrained optimization problem solved using the Augmented Lagrangian Method. By embedding configurable privacy constraints directly into model training, PSyGenTAB enforces minimum privacy thresholds while maximizing clinical data utility. Across multiple clinically motivated benchmarks, PSyGenTAB preserves inter-feature clinical relationships and minority-class diagnostic patterns essential for reliable health AI. Downstream evaluation using Train-on-Synthetic, Test-on-Real and Train-on-Real, Test-on-Synthetic protocols shows that models trained on synthetic data achieve performance comparable to those trained on real patient records. Privacy auditing further demonstrates reduced exact record reproduction and strong resilience to membership inference attacks. These results establish PSyGenTAB as a principled framework for balancing privacy protection and clinical utility in synthetic healthcare data, supporting secure cross-institutional AI development.
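
The Augmented Lagrangian Method named above can be illustrated on a one-variable toy problem. This is a generic textbook sketch, not PSyGenTAB's training loop: the quadratic objective stands in for a utility loss, the constraint `x <= 1` stands in for a privacy threshold, and `rho`, the step size, and the iteration counts are arbitrary choices.

```python
def augmented_lagrangian(grad_f, g, grad_g, x, rho=10.0, outer=15, inner=200, lr=0.05):
    """Minimise f(x) subject to g(x) <= 0 for scalar x."""
    lam = 0.0
    for _ in range(outer):
        # Inner loop: gradient descent on the augmented Lagrangian in x.
        for _ in range(inner):
            mult = max(0.0, lam + rho * g(x))   # zero while the constraint is slack
            x -= lr * (grad_f(x) + mult * grad_g(x))
        # Outer step: multiplier update for an inequality constraint.
        lam = max(0.0, lam + rho * g(x))
    return x, lam

# Toy problem: minimise (x - 2)^2 subject to x - 1 <= 0.
x_opt, lam_opt = augmented_lagrangian(
    grad_f=lambda x: 2.0 * (x - 2.0),   # gradient of the stand-in utility loss
    g=lambda x: x - 1.0,                # stand-in constraint, feasible when <= 0
    grad_g=lambda x: 1.0,
    x=0.0,
)
```

The unconstrained minimiser `x = 2` violates the constraint, so the method settles on the boundary `x = 1`, and the multiplier converges to 2, the price of tightening the constraint. In a generative model the same loop would run over network parameters with measured privacy metrics in place of `g`.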

2606.18773 2026-06-18 cs.LG cs.AI 新提交

Private Learning with Public Feature Conditioning

基于公共特征条件化的私有学习

Shuli Jiang, Walid Krichene, Nicolas Mayoraz

发表机构 * Microsoft(微软) Google Research(谷歌研究院)

AI总结 针对标签差分隐私回归问题,提出Cond-DP方法,利用公共特征矩阵的结构信息构造条件化矩阵以加速优化,在凸、强凸和非凸设置下提供收敛保证,并在线性回归中实现比DPSGD更快的收敛速度。

Comments Proceedings of the 43rd International Conference on Machine Learning (ICML 2026). 26 pages, 9 figures

详情
AI中文摘要

我们研究了每个数据样本包含公共、非敏感特征的设置下的差分隐私(DP)回归问题——这在推荐和广告系统等应用中很常见。虽然这种标签DP或半敏感特征设置主要在分类背景下进行了探索,但有效的回归方法仍未被充分研究。我们提出了Cond-DP,一种DPSGD的条件化变体,它利用公共特征矩阵的结构来改善隐私约束下的优化。受这些公共特征通常表现出快速衰减谱的观察启发,Cond-DP引入了一个数据驱动的条件化矩阵来重塑优化景观并加速收敛。我们为凸、强凸和非凸设置提供了收敛保证,并将标准DPSGD作为条件化矩阵为单位矩阵时的特例。我们展示了如何直接从公共特征为Cond-DP构造有效的条件化矩阵,从而在私有线性回归中实现比DPSGD更快的收敛速度,且不增加额外的隐私成本。实验表明,在标签DP下,使用该条件化矩阵的Cond-DP在多种数据集和模型架构上持续优于最先进的基线方法,展示了强大且稳健的实际性能。

英文摘要

We study differentially private (DP) regression in settings where each data sample includes public, non-sensitive features -- common in applications such as recommendation and advertising systems. While such label-DP or semi-sensitive-feature settings have been primarily explored in the context of classification, effective approaches for regression remain underexplored. We introduce Cond-DP, a conditioned variant of DPSGD that leverages the structure of public feature matrices to improve optimization under privacy constraints. Motivated by the observation that these public features often exhibit rapidly decaying spectra, Cond-DP incorporates a data-driven conditioning matrix to reshape the optimization landscape and accelerate convergence. We provide convergence guarantees for convex, strongly convex, and non-convex settings, and recover standard DPSGD as a special case when the conditioning matrix is the identity. We show how to construct an effective conditioning matrix for Cond-DP directly from public features, enabling provably faster convergence than DPSGD in private linear regression without incurring additional privacy cost. Empirically, Cond-DP with this conditioning matrix consistently outperforms state-of-the-art baselines across a wide range of datasets and model architectures under label DP, demonstrating strong and robust performance in practice.
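
A conditioned noisy-gradient step of the kind described above might look like the following for linear regression. This is a sketch under several assumptions, not the Cond-DP algorithm: the choice `P = (X^T X / n)^{-1}`, the per-example clipping of full gradients, and the function name are guesses for illustration, and the noise scale is not calibrated to any privacy budget.

```python
import numpy as np

def conditioned_dp_step(w, X, y, P, clip, sigma, lr, rng):
    """One clipped, noised gradient step, multiplied by a conditioning matrix P."""
    residual = X @ w - y
    grads = X * residual[:, None]                 # per-example grads of 0.5*(x.w - y)^2
    norms = np.linalg.norm(grads, axis=1)
    scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    noisy_sum = (grads * scale[:, None]).sum(axis=0)
    noisy_sum = noisy_sum + sigma * clip * rng.standard_normal(w.shape)
    # P depends only on public features, so applying it is post-processing.
    return w - lr * P @ (noisy_sum / len(y))

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.standard_normal((n, d))                   # public features
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)   # private labels
P = np.linalg.inv(X.T @ X / n)                    # hypothetical conditioning matrix

# Sanity check of the algebra with clipping and noise switched off.
w1 = conditioned_dp_step(np.zeros(d), X, y, P, clip=1e9, sigma=0.0, lr=1.0, rng=rng)
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]
```

With noise and clipping disabled, one conditioned step lands on the least-squares solution, which shows how a well-chosen `P` can remove the dependence on the feature spectrum. Setting `P` to the identity gives the usual DPSGD-shaped update, matching the special case mentioned in the abstract.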

2606.19220 2026-06-18 cs.LG cs.AI 新提交

Machine Unlearning for the XGBoost Model with Network Intrusion Datasets

面向网络入侵数据集的XGBoost模型机器遗忘

Diana Magalhães, Eva Maia, João Vitorino, Isabel Praça

发表机构 * GECAD, ISEP, Polytechnic of Porto(波尔图理工学院工程学院GECAD研究所)

AI总结 针对XGBoost模型提出XGBoost-Forget遗忘方法,在表格型网络入侵数据集上实现高效遗忘,保持模型性能的同时显著提升遗忘速度。

Comments 12 pages, 7 tables, WorldCist'26 Conference

详情
AI中文摘要

机器遗忘(MU)已成为一种从训练模型中移除特定数据点而无需完全重新训练的重要技术。然而,现有大多数MU研究集中于深度学习和图像数据,在网络入侵检测领域存在空白,该领域严重依赖表格数据。本文引入XGBoost-Forget,一种针对XGBoost模型的遗忘方法,以填补这一空白。该方法在两个表格型网络入侵(NI)数据集IoT-23和GeNIS上进行了评估,使用多个指标衡量模型性能、遗忘效率和遗忘质量。结果表明,XGBoost-Forget在保持接近原始模型的预测性能的同时,提供了显著更快的遗忘速度,展示了其在表格型NI场景中用于MU的潜力。

英文摘要

Machine Unlearning (MU) has emerged as an important technique for removing specific data points from trained models without requiring full retraining. However, most existing MU research focuses on deep learning and image data, leaving a gap in the domain of network intrusion detection, which relies heavily on tabular data. This work introduces XGBoost-Forget, an unlearning approach for the XGBoost model, to address this gap. The approach is evaluated on two tabular Network Intrusion (NI) datasets, IoT-23 and GeNIS, using multiple metrics to assess model performance, unlearning efficiency, and forgetting quality. The results show that XGBoost-Forget maintains predictive performance close to the original model while providing significantly faster unlearning, demonstrating its potential for MU in tabular NI settings.

2606.19222 2026-06-18 cs.LG cs.AI 新提交

Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

机制引导的选择性遗忘:针对RLVR诱导的推理

Chenyu Zhou, Qiliang Jiang, Shuning Wu, Xu Zhou

发表机构 * School of Engineering, Institute of Science Tokyo, Japan(东京科学大学工学院) College of Control Science and Engineering, Zhejiang University, China(浙江大学控制科学与工程学院) Department of Electrical and Computer Engineering, National University of Singapore, Singapore(新加坡国立大学电气与计算机工程系)

AI总结 提出MAST方法,通过机制引导选择性更新参数,在遗忘RLVR诱导的推理行为时,显著降低对保留性能的附带损害。

Comments 15 pages, 4 figures, 7 tables

详情
AI中文摘要

我们提出MAST(机制对齐选择性目标),一种机制引导的方法,用于遗忘RLVR诱导的推理,其附带损害远低于标准全参数更新。在Qwen2.5-Math-1.5B和Qwen3-1.7B-Base的匹配SFT/RLVR检查点上,SFT到RLVR的增量在token级delta-log-probability上与SFT更新显著不同,而全参数梯度上升仅通过破坏保留的MATH和GSM8K来实现遗忘。MAST根据离主能量、更新幅度和遗忘梯度耦合幅度对注意力投影张量进行排序,然后仅更新排名最高的子集。在主模型上,MAST诱导了统计上显著的目标遗忘(MATH遗忘从45/150降至37/150;McNemar p=0.0078),同时保留了GSM8K(+0.8个百分点)和MATH保留(-0.5个百分点)。该优势在不同种子、NPO/SimNPO目标以及Qwen3上均得到复现,在Qwen3上MAST保留了GSM8K,而全参数遗忘导致其崩溃。

英文摘要

We propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially lower collateral damage than standard full-parameter updates. In matched SFT/RLVR checkpoints on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, the SFT-to-RLVR increment differs sharply from the SFT update in token-level delta-log-probability, and full-parameter gradient ascent forgets only by damaging retain MATH and GSM8K. MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling magnitude, then updates only the top-ranked subset. On the primary model, MAST induces statistically significant target forgetting (MATH forget 45/150 to 37/150; McNemar p=0.0078) while preserving GSM8K (+0.8 pp) and MATH retain (-0.5 pp). The advantage reproduces across seeds, NPO/SimNPO objectives, and Qwen3, where MAST preserves GSM8K while full-parameter unlearning collapses it.
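
The reported McNemar p-value can be checked with a few lines of arithmetic. The abstract gives only the totals (45/150 falling to 37/150), so the discordant-pair counts used below are an inference: 8 items flipping from correct to incorrect and none flipping back is the split with a net change of 8 that reproduces p = 0.0078 under the exact two-sided test.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n, k = b + c, min(b, c)
    # Binomial(n, 0.5) tail probability of a split at least this lopsided.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Assumed discordant counts (not stated in the abstract).
p_value = mcnemar_exact(8, 0)      # 2 * (1/2)**8 = 0.0078125
p_alt = mcnemar_exact(9, 1)        # another split with net change 8
```

A split of 9 and 1 would give a noticeably larger p-value, so the reported figure suggests the forgetting was one-directional on the evaluated items.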

2606.19262 2026-06-18 cs.LG 新提交

Detecting Hidden ML Training With Zero-Overhead Telemetry

使用零开销遥测检测隐藏的机器学习训练

Robi Rahman, Sabiha Tajdari

发表机构 * Machine Intelligence Research Institute(机器智能研究所) University of Virginia(弗吉尼亚大学)

AI总结 本文评估了仅使用零开销、隐私保护的NVML遥测(内容无关信号)对GPU工作负载分类的对抗鲁棒性,开发了一个分类器,在识别训练工作负载时达到98.2%的二元准确率,并对最具挑战性的意外工作负载达到43-87%的准确率。

Comments Technical AI Governance Research workshop at ICML 2026

详情
AI中文摘要

硬件支持的GPU工作负载监控是许多AI计算治理方案的基础,但如果开发者能够击败监控机制,这些方案将不可行。我们评估了仅使用零开销、隐私保护的NVML遥测(内容无关信号,观察计算的物理效应而不访问模型权重、训练数据或超参数)的GPU工作负载分类的对抗鲁棒性。在5轮监控-逃避迭代中,我们在跨越4代架构的9个GPU模型上评估了20种逃避策略家族。我们开发了一个分类器,在整个语料库上识别训练工作负载时达到98.2%的二元准确率,并在最具挑战性的意外工作负载上(即使它们被对抗性伪装)达到43-87%的准确率。

英文摘要

Hardware-enabled monitoring of GPU workloads underpins many proposals for AI compute governance, but if developers can defeat monitoring mechanisms, such schemes are unworkable. We evaluate the adversarial robustness of GPU workload classification using only zero-overhead, privacy-preserving NVML telemetry: content-agnostic signals that observe physical effects of computation without accessing model weights, training data, or hyperparameters. Across 5 rounds of monitor-evader iteration, we evaluate 20 evasion strategy families on 9 GPU models spanning 4 architecture generations. We develop a classifier that achieves 98.2% binary accuracy at identifying training workloads across the whole corpus, and 43-87% accuracy against the most challenging unexpected workloads even when they are adversarially disguised.

2606.18312 2026-06-18 cs.CR cs.DC cs.LG 交叉投稿

TIGER: Inverting Transformer Gradients via Embedding-Subspace Distance Optimization

TIGER:通过嵌入子空间距离优化反转Transformer梯度

William Kalikman, Ivo Petrov, Dimitar I. Dimitrov, Martin Vechev

Affiliations ETH Zürich; INSAIT, Sofia University "St. Kliment Ohridski"

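AI summary Proposes TIGER, an attack that turns the subspace signal into a differentiable objective and directly optimizes token embeddings to minimize their distance to the subspace, improving reconstruction quality and runtime on encoder models and robustness to differential privacy on decoder models.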

Comments 16 pages, 13 pages main text

AI中文摘要

联邦学习允许多个客户端通过向中央服务器发送梯度更新来联合训练共享模型,同时保持原始输入在本地。然而,先前的梯度反转攻击表明,这些更新可以泄露足够的信息来重建客户端输入。现有的针对Transformer的攻击要么优化虚拟输入以匹配真实的客户端更新,这对于现代模型来说成本高昂且不稳定;要么利用注意力梯度的低秩性来识别包含真实层嵌入的子空间,然后对候选令牌进行离散成员测试。然而,这种令牌测试在数值噪声(例如来自量化或差分隐私)下很脆弱,并且对于具有非因果注意力的编码器模型扩展性差。我们引入了TIGER,一种连续的梯度反转攻击,它将这种子空间信号转化为可微目标。TIGER不是搜索令牌或匹配完整梯度,而是直接优化令牌嵌入以最小化它们到子空间的距离。我们的实验表明,在仅编码器模型上,TIGER在重建质量和运行时间上均显著优于现有攻击;而在解码器模型上,TIGER比先前基于子空间的攻击更鲁棒,从而在受差分隐私保护的联邦学习设置中实现了首次成功的重建。

英文摘要

Federated learning allows multiple clients to jointly train a shared model by sending gradient updates to a central server while keeping raw inputs local. However, prior gradient inversion attacks show that these updates can reveal enough information to reconstruct client inputs. Existing attacks on transformers either optimize dummy inputs to match the true client updates, which is costly and unstable for modern models, or exploit the low rank of attention gradients to identify a subspace containing the true layer embeddings, followed by a discrete membership test for candidate tokens. However, this token test is brittle under numerical noise, e.g., from quantization or Differential Privacy (DP), and scales poorly for encoder models with non-causal attention. We introduce TIGER, a continuous gradient inversion attack that turns this subspace signal into a differentiable objective. Instead of searching over tokens or matching full gradients, TIGER directly optimizes token embeddings to minimize their distance to the subspace. Our experiments demonstrate that on encoder-only models, TIGER substantially improves both reconstruction quality and runtime over existing attacks, while on decoder models, TIGER is more robust than prior subspace-based attacks, enabling the first successful reconstructions in DP-defended federated learning settings.
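The idea of minimizing an embedding's distance to a subspace can be illustrated on a toy example. The sketch below is not the TIGER attack itself: it assumes an orthonormal basis is already known and ignores the transformer layers, the recovery of the subspace from gradients, and the mapping back to tokens. All names (`project`, `subspace_distance`, `descend`) are hypothetical.

```python
import math

def project(x, basis):
    """Orthogonal projection of x onto span(basis); basis rows must be orthonormal."""
    proj = [0.0] * len(x)
    for b in basis:
        coeff = sum(xi * bi for xi, bi in zip(x, b))
        proj = [p + coeff * bi for p, bi in zip(proj, b)]
    return proj

def subspace_distance(x, basis):
    """Euclidean distance from x to span(basis)."""
    p = project(x, basis)
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))

def descend(x, basis, lr=0.25, steps=20):
    """Gradient descent on 0.5 * distance^2, whose gradient is the residual x - P x."""
    for _ in range(steps):
        p = project(x, basis)
        x = [xi - lr * (xi - pi) for xi, pi in zip(x, p)]
    return x

basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # toy 2-D subspace of R^3
x0 = [0.3, -0.2, 1.0]                         # candidate embedding
x1 = descend(x0, basis)
print(subspace_distance(x0, basis), subspace_distance(x1, basis))
```

Because the objective is smooth in the embedding, small numerical noise shifts the minimum only slightly, which is the intuition behind preferring a continuous objective to a hard membership test.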

2606.19023 2026-06-18 cs.CR cs.LG Cross-list

Lifecycle-Aware Dynamic Analysis for Secure ML Model Execution

生命周期感知的动态分析用于安全ML模型执行

Gabriele Digregorio, Marco Di Gennaro, Francesco Pastore, Stefano Zanero, Stefano Longari, Michele Carminati

Affiliations Politecnico di Milano

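AI summary Proposes Moat, a dynamic lifecycle-aware approach that detects malicious behavior by monitoring the structured interactions between each phase of model execution and the host system, achieving a close-to-zero false-positive rate across multiple frameworks.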

AI中文摘要

对预训练机器学习(ML)模型的日益依赖引入了新的攻击面。最近的漏洞表明,恶意行为可以嵌入模型工件中,常常绕过现有防御。当前的模型扫描解决方案主要依赖于静态的、特定格式的规则或已知的攻击签名,这限制了它们跨框架泛化和检测新型利用路径的能力。相比之下,我们提出了一种解决方案,专注于攻击对执行模型的宿主系统产生的影响,并基于关于ML模型执行的基本直觉。特别地,我们观察到ML模型在定义良好的生命周期阶段内运行,并且在每个阶段内,与宿主系统的交互是高度结构化和可预测的。我们将这些直觉转化为Moat,一种用于安全ML模型执行的动态生命周期感知方法,并在我们的参考实现Re-Moat中实例化此设计。我们使用来自Hugging Face Hub的77,974个真实世界模型工件、来自CVE的31个概念验证(PoC)以及来自最先进数据集的334个模型,在多个ML框架上评估Re-Moat,并将其与最先进的模型扫描解决方案进行比较。我们的结果表明,我们的方法检测到所有评估的攻击类别,同时保持接近零的误报率,验证了我们的直觉并激励了用于安全ML模型执行的动态分析。

英文摘要

The growing reliance on pre-trained Machine Learning (ML) models has introduced new attack surfaces. Recent vulnerabilities demonstrate that malicious behavior can be embedded within model artifacts, often bypassing existing defenses. Current model-scanning solutions primarily rely on static, format-specific rules or known attack signatures, which limit their ability to generalize across frameworks and to detect novel exploitation paths. In contrast, we propose a solution that focuses on the effects an attack has on the host system executing the model and builds on foundational intuitions about ML model execution. In particular, we observe that ML models operate within well-defined lifecycle phases and that, within each phase, interactions with the host system are highly structured and predictable. We translate these intuitions into Moat, a dynamic lifecycle-aware approach for securing ML model execution, and instantiate this design in Re-Moat, our reference implementation. We evaluate Re-Moat across multiple ML frameworks using 77,974 real-world model artifacts from the Hugging Face Hub, 31 Proofs-of-Concept (PoCs) from CVEs, and 334 models from a state-of-the-art dataset, and compare it against state-of-the-art model-scanning solutions. Our results show that our approach detects all evaluated attack classes while maintaining a close-to-zero false-positive rate, validating our intuitions and motivating dynamic analysis for securing ML model execution.

2606.19129 2026-06-18 cs.CR cs.LG Cross-list

Giskard: Byzantine Robust and Confidential Aggregation for Large-Scale Decentralized Learning

Giskard: 大规模去中心化学习中的拜占庭鲁棒与机密聚合

Ousmane Touat, César Sabater, Mohamed Maouche, Sonia Ben Mokhtar

Affiliations INSA Lyon, LIRIS, CNRS; INRIA, INSA Lyon

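AI summary Addresses the challenge of guaranteeing confidentiality and Byzantine resilience at the same time in decentralized learning. The Giskard protocol computes an approximate median aggregate using a tree of committees and BGW-style MPC, lowering per-party communication complexity while preserving model utility in experiments with up to one million participants.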

Comments 17 pages, with appendix

AI中文摘要

在去中心化学习中同时处理机密性和拜占庭行为是一个具有挑战性的问题。实际上,在去中心化学习中,客户端在本地保留数据的同时训练机器学习模型,并与一组邻居共享其模型参数或梯度。虽然强制机密性需要隐藏交换的模型参数/梯度(例如,通过使用密码学技术),但处理拜占庭贡献通常需要检查后者。因此,大多数研究工作分别处理这些目标。最近的一系列工作提出使用安全多方计算(MPC)来实现对模型投毒攻击的鲁棒聚合器,从而同时保证机密性和拜占庭鲁棒性。然而,这些解决方案扩展性差:它们要么要求参与者之间进行全对全通信,要么将整个计算委托给一个小子集,其计算和通信负载随网络规模成比例增长。在本文中,我们提出了Giskard,一种用于机密且拜占庭鲁棒的去中心化聚合协议。Giskard将$n$个参与方组织成一个大小为$O(\log n)$的委员会树,并通过在值域上进行委员会适应的分布式二分搜索来评估坐标-wise近似中位数,在每个委员会内使用BGW风格的MPC。我们通过理论证明其安全性和机密性,并通过涉及多达一百万个参与者的广泛实验来评估Giskard。与其最接近的竞争对手相比,Giskard渐近地降低了每方通信复杂度,同时在多达$n/4$个拜占庭参与方下表现出相当的模型效用。

英文摘要

Dealing simultaneously with confidentiality and Byzantine behaviors in decentralized learning is a challenging problem. Indeed, in decentralized learning, clients train a machine learning model while keeping their data local and share their model parameters or gradients with a set of neighbors. While enforcing confidentiality calls for hiding the exchanged model parameters/gradients (e.g., by using cryptographic techniques), dealing with Byzantine contributions often requires inspecting the latter. Hence, most research works address these objectives separately. A recent line of work proposes to employ secure multi-party computation (MPC) to implement robust aggregators against model poisoning, thereby enforcing both confidentiality and Byzantine resilience. However, these solutions scale poorly: they either require all-to-all communication between participants or delegate the entire computation to a small subset, whose computational and communication load grows proportionally with the size of the network. In this paper, we present Giskard, a protocol for confidential and Byzantine-robust decentralized aggregation. Giskard organizes $n$ parties into a tree of committees of size $O(\log n)$ and evaluates a coordinate-wise approximate median via a committee-adapted distributed binary search over the value domain, using BGW-style MPC within each committee. We assess Giskard both theoretically, by proving its security and confidentiality properties, and experimentally, through experiments involving up to one million participants. Compared to its closest competitors, Giskard reduces per-party communication complexity asymptotically while exhibiting comparable model utility under up to $n/4$ Byzantine parties.
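The coordinate-wise approximate median via binary search over the value domain can be sketched in the clear, without committees or MPC. The version below is a plain single-machine illustration under stated assumptions: the value range `[lo, hi]` is known in advance, and each bisection step needs only a count of values at or below the midpoint. Whether Giskard's committees compute exactly this count is an assumption, and the function names are hypothetical.

```python
def approx_median(values, lo, hi, tol=1e-3):
    """Approximate median of values in [lo, hi] by bisecting the value domain."""
    n = len(values)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # The only data-dependent quantity is a count, which could be
        # computed jointly instead of revealing the raw values.
        below = sum(1 for v in values if v <= mid)
        if 2 * below >= n:
            hi = mid      # at least half the values are <= mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def coordinate_wise_median(updates, lo=-1.0, hi=1.0, tol=1e-3):
    """Apply the bisection independently to every coordinate of the updates."""
    dim = len(updates[0])
    return [approx_median([u[j] for u in updates], lo, hi, tol) for j in range(dim)]

honest = [[0.10, -0.20], [0.12, -0.18], [0.11, -0.22]]
byzantine = [[1.0, 1.0]]            # one outlier out of four parties
agg = coordinate_wise_median(honest + byzantine)
print(agg)
```

The outlier shifts the result by at most one rank among the honest values, which is why a median is a natural robust aggregator when the Byzantine fraction stays below one half.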

2502.10239 2026-06-18 cs.LG cs.AI Updated version

Efficient Zeroth-Order Federated Finetuning of Language Models on Resource-Constrained Devices

资源受限设备上语言模型的高效零阶联邦微调

Mohamed Aboelenien Ahmed, Kilian Pfeiffer, Ramin Khalili, Heba Khdr, Jörg Henkel

Affiliations Karlsruhe Institute of Technology; Huawei; Heisenberg Research Center (Munich), Germany

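AI summary Proposes a zeroth-order federated finetuning method that splits the model into consecutive blocks, allocates more perturbations to the second block, and reuses intermediate activations to cut the number of forward evaluations, reducing computation by up to 3× relative to other zeroth-order methods while retaining the memory and communication benefits over first-order techniques.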

Comments Published at TMLR

AI中文摘要

联邦学习是一种有前景的范式,可以在分布式数据源上微调大型语言模型,同时保护数据隐私。然而,在边缘设备上微调如此大的模型由于资源需求高而具有挑战性。零阶优化通过有限差分近似估计梯度,依赖于模型参数随机扰动下的函数评估。因此,与任务对齐的零阶优化提供了一种潜在解决方案,允许仅使用前向传播(推理级内存需求和低通信开销)进行微调,但存在收敛慢和计算需求高的问题。在本文中,我们提出了一种新的基于零阶优化的方法,应用更高效的技术来减少使用大量扰动带来的计算需求,同时保留其收敛优势。这是通过将模型分成连续的块,并为第二块分配更多扰动来实现的,从而能够高效复用中间激活,以更少的前向评估更新整个网络。我们在RoBERTa-large、OPT1.3B、LLaMa-3-3.2B模型上的评估显示,与其他基于零阶优化的技术相比,计算量最多减少3倍,同时保留了相对于一阶联邦学习技术的内存和通信优势。

英文摘要

Federated Learning (FL) is a promising paradigm for finetuning Large Language Models (LLMs) across distributed data sources while preserving data privacy. However, finetuning such large models is challenging on edge devices due to their high resource demands. Zeroth-order Optimization (ZO) estimates gradients through finite-difference approximations, which rely on function evaluations under random perturbations of the model parameters. Consequently, ZO with task alignment provides a potential solution, allowing finetuning using only forward passes with inference-level memory requirements and low communication overhead, but it suffers from slow convergence and higher computational demand. In this paper, we propose a new ZO-based method that applies a more efficient technique to reduce the computational demand associated with using a large number of perturbations while preserving their convergence benefits. This is achieved by splitting the model into consecutive blocks and allocating a higher number of perturbations to the second block, enabling efficient reuse of intermediate activations to update the full network with fewer forward evaluations. Our evaluation on the RoBERTa-large, OPT1.3B, and LLaMa-3-3.2B models shows up to a $3\times$ reduction in computation compared to other ZO-based techniques, while retaining the memory and communication benefits over first-order federated learning techniques.
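The finite-difference estimator that zeroth-order methods build on can be written in a few lines. The sketch below shows only the generic two-point estimator averaged over random directions; it does not implement the paper's block splitting or activation reuse, and `zo_gradient` and `loss` are hypothetical names on a toy quadratic.

```python
import random

def zo_gradient(f, x, num_perturbations=8, eps=1e-3, rng=None):
    """Average of two-point finite-difference estimates along random directions."""
    rng = rng or random.Random(0)
    grad = [0.0] * len(x)
    for _ in range(num_perturbations):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        x_plus = [xi + eps * ui for xi, ui in zip(x, u)]
        x_minus = [xi - eps * ui for xi, ui in zip(x, u)]
        # Two forward evaluations per perturbation, no backward pass.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * eps)
        grad = [g + slope * ui / num_perturbations for g, ui in zip(grad, u)]
    return grad

def loss(x):
    return sum(xi * xi for xi in x)   # toy objective with true gradient 2x

x = [1.0, -2.0, 0.5]
estimate = zo_gradient(loss, x, num_perturbations=2000)
print(estimate)
```

Each perturbation costs two forward passes, so the estimate sharpens only as the number of perturbations grows. That cost is what motivates reusing intermediate activations across perturbations.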

2502.17748 2026-06-18 cs.LG cs.CR Updated version

FinP: Fairness-in-Privacy in Federated Learning by Addressing Disparities in Privacy Risk

FinP:联邦学习中通过解决隐私风险差异实现隐私公平性

Tianyu Zhao, Mahmoud Srewa, Salma Elmalaki

Affiliations University of California, Irvine

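AI summary Targets the uneven distribution of privacy risk in federated learning. The FinP framework combines server-side adaptive aggregation with client-side regularization to mitigate Source Inference Attack risk, reducing privacy-exposure disparity by up to 57.14% while keeping model utility comparable to baselines.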

Comments To appear in PoPETS 2026 Issue 4. Privacy Enhancing Technology Symposium (PETS) 2026

AI中文摘要

联邦学习(FL)固有地缓解了大规模数据集中化风险;然而,其隐私保护并非均匀分布——使得脆弱个体不成比例地暴露于复杂的隐私攻击之下。关键的是,以人为中心的FL环境中的统计异质性常常导致隐私风险的不公平分布,尤其影响那些敏感属性或行为使其成为异常值的个体。为解决这一关键差距,我们引入了FinP,这是一个新颖的框架,旨在通过减轻客户端对源推理攻击(SIA)的过度脆弱性来形式化和实施隐私公平性。FinP实施了一种双管齐下的防御策略,同时解决隐私差异的症状和根本原因,确保没有一组客户端承担过度的隐私负担。它结合了服务器端自适应聚合机制(根据客户端的估计隐私风险动态加权其贡献)和客户端正则化技术(抑制导致独特数据记忆的局部过拟合)。在FEMNIST、人类活动识别(HAR)和CIFAR-10数据集上的广泛实证评估表明,FinP有效地将隐私公平性与主要任务效用对齐。值得注意的是,FinP成功减轻了SIA风险并减少了隐私暴露差异,证明了强大的隐私公平性保证无需牺牲模型效用。最终,FinP通过将脆弱性差异降低高达57.14%,同时将全局模型效用保持在标准联邦基线±1.75%的微小范围内,建立了公平的隐私保护。

英文摘要

Federated Learning (FL) inherently mitigates mass data centralization risks; however, its privacy protections are not equally distributed, leaving vulnerable individuals disproportionately exposed to sophisticated privacy attacks. Crucially, statistical heterogeneity in human-centric FL environments often results in an inequitable distribution of privacy risks, particularly affecting those whose sensitive attributes or behaviors make them outliers. To address this critical gap, we introduce FinP, a novel framework designed to formalize and enforce fairness-in-privacy by mitigating disproportionate client vulnerability to Source Inference Attacks (SIA). FinP operationalizes a two-pronged defense strategy that tackles both the symptoms and root causes of privacy disparity, ensuring that no group of clients bears an excessive privacy burden. It combines a server-side adaptive aggregation mechanism, which dynamically weights client contributions based on their estimated privacy risk, with a client-side regularization technique to curb localized overfitting that drives unique data memorization. Extensive empirical evaluations on FEMNIST, Human Activity Recognition (HAR), and CIFAR-10 datasets demonstrate that FinP effectively aligns privacy fairness with primary task utility. Notably, FinP successfully mitigates SIA risks and reduces disparities in privacy exposure, establishing that strong fairness-in-privacy guarantees need not compromise model utility. Ultimately, FinP establishes equitable privacy protections by reducing vulnerability disparities by up to 57.14%, while preserving global model utility within a marginal ±1.75% of standard federated baselines.

2507.04219 2026-06-18 cs.LG cs.AI Updated version

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

模型崩溃不是错误,而是大语言模型机器遗忘中的一种特性

Yan Scholten, Sophie Xhonneux, Leo Schwinn, Stephan Günnemann

Affiliations Dept. of Computer Science & Munich Data Science Institute, Technical University of Munich; Mila, Université de Montréal

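AI summary Proposes Partial Model Collapse (PMC), which achieves unlearning by deliberately triggering distribution collapse on the targeted data without optimizing on the unlearning targets, removing private information effectively while preserving model utility.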

Comments Accepted at ICLR 2026

AI中文摘要

当前大语言模型的遗忘方法将待移除的私有信息纳入微调数据,并直接在这些信息上进行优化。我们认为这不仅可能强化对敏感数据的暴露,而且从根本上违背了最小化其使用的原则。作为补救,我们提出了一种新颖的遗忘方法:部分模型崩溃(PMC),该方法的遗忘目标函数中不需要包含待遗忘的数据。我们的方法受到最近观察的启发:让生成模型在其自身生成的内容上训练会导致分布崩溃,从而有效移除模型输出中的信息。我们的核心见解是,可以针对我们旨在移除的数据故意触发模型崩溃,从而将其用于机器遗忘。我们从理论上分析并表明该方法收敛到期望结果,即模型遗忘了待移除的数据。我们通过实验证明,PMC克服了现有显式在遗忘目标上优化的遗忘方法的四个关键限制,并在保持通用模型效用的同时更有效地从模型输出中移除私有信息。总体而言,我们的贡献代表了向更全面、更符合现实隐私约束的遗忘迈出的重要一步。代码可在 https://www.cs.cit.tum.de/daml/partial-model-collapse/ 获取。

英文摘要

Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We argue this not only risks reinforcing exposure to sensitive data, but also fundamentally contradicts the principle of minimizing its use. As a remedy, we propose a novel unlearning method, Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective. Our approach is inspired by recent observations that training generative models on their own generations leads to distribution collapse, effectively removing information from model outputs. Our central insight is that model collapse can be leveraged for machine unlearning by deliberately triggering it for data we aim to remove. We show theoretically that our approach converges to the desired outcome, i.e., the model unlearns the data targeted for removal. We empirically demonstrate that PMC overcomes four key limitations of existing unlearning methods that explicitly optimize on unlearning targets, and more effectively removes private information from model outputs while preserving general model utility. Overall, our contributions represent an important step toward more comprehensive unlearning that better aligns with real-world privacy constraints. Code available at https://www.cs.cit.tum.de/daml/partial-model-collapse/.

2605.21115 2026-06-18 cs.DC cs.LG Updated version

Automated Byzantine-Resilient Clustered Decentralized Federated Learning for Battery Intelligence in Connected EVs

自动化抗拜占庭攻击的集群化去中心化联邦学习用于连接电动车的电池智能

Mouhamed Amine Bouchiha, Abdelaziz Amara Korba, Yacine Ghamri-Doudane

Affiliations SAMOVAR, Télécom SudParis; Department of Computer Science, German University of Technology in Oman (GUtech); L3i, La Rochelle University

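AI summary Proposes ABC-DFL, an automated Byzantine-resilient clustered decentralized federated learning framework for battery intelligence in connected EVs. A dynamic Quorum Byzantine Fault Tolerance (QBFT) protocol and an oracle-based aggregation layer improve trust, security, and automation, and the FLECA protocol mitigates Byzantine attacks by filtering malicious updates with an adaptive threshold.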

Comments 16 pages, 8 figures

AI中文摘要

联邦学习(FL)已成为智能交通系统(ITS)中管理电动汽车(EV)电池数据的一种有前景的范式,能够支持异常检测和容量估计等隐私保护任务。然而,大多数现有框架依赖于集中式聚合方案,这在安全性和信任方面存在关键限制。为了应对这些挑战,我们提出了ABC-DFL,一种面向网联电动汽车的自动化抗拜占庭攻击的集群化去中心化联邦学习(C-DFL)框架。所提出的激励驱动的C-DFL系统用开放许可的区块链取代中央服务器,采用新的动态Quorum拜占庭容错(QBFT)协议和基于预言机(oracle)的聚合层,以增强信任、安全和自动化。ABC-DFL的核心是FLECA(Filtered Layered Enhanced Clustering Aggregation,过滤分层增强聚类聚合),一种稳健的分层聚合协议,它让每个EV根据各更新与其参考模型更新的偏差,使用自适应阈值过滤恶意更新,从而缓解拜占庭攻击。预言机节点负责跨组聚合,利用稳健的聚类来隔离和聚合来自可信EV组的模型更新。全面的实验评估显示,FLECA在良性条件下的收敛性与FedProx相当,并在自适应对抗场景中显著优于现有防御措施,攻击影响评分低于0.10。此外,多项多任务模型学习实验验证了激励机制的有效性和公平性。最后,链上和链下基准测试验证了ABC-DFL的实用性。

英文摘要

Federated learning (FL) has emerged as a promising paradigm for managing electric vehicle (EV) battery data in intelligent transportation systems (ITS), enabling privacy-preserving tasks such as anomaly detection and capacity estimation. However, most existing frameworks rely on centralized aggregation schemes, which pose critical limitations in terms of security and trust. To address these challenges, we propose ABC-DFL, an automated Byzantine-resilient clustered decentralized federated learning (C-DFL) framework for connected EVs. The proposed incentive-driven C-DFL system replaces the central server with an open-permissioned blockchain, featuring a new dynamic Quorum Byzantine Fault Tolerance (QBFT) protocol and an oracle-based aggregation layer, to enhance trust, security, and automation. At the core of ABC-DFL lies FLECA (Filtered Layered Enhanced Clustering Aggregation), a robust hierarchical aggregation protocol that mitigates Byzantine attacks by having each EV filter malicious updates using an adaptive threshold based on deviations from its reference model update. Oracle nodes, responsible for inter-group aggregation, employ robust clustering to isolate and aggregate model updates from trustworthy EV groups. Comprehensive experimental evaluations demonstrate that FLECA matches FedProx convergence under benign conditions and significantly outperforms existing defenses with attack impact scores below 0.10 in adaptive adversarial scenarios. Furthermore, several learning experiments with multitask models confirm the effectiveness and fairness of the incentive mechanism. Finally, on-chain and off-chain benchmarks validate the practicality of ABC-DFL.
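Filtering updates by their deviation from a reference update can be illustrated with a small sketch. The abstract does not specify FLECA's threshold rule, so the rule used here (median distance plus `k` times the median absolute deviation) is a hypothetical stand-in, as are the function names and toy values.

```python
import math
from statistics import median

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filter_updates(reference, updates, k=3.0):
    """Keep updates whose distance to the reference stays under an adaptive threshold.

    The threshold (median distance + k * median absolute deviation) is a
    hypothetical choice; the abstract does not specify FLECA's rule.
    """
    dists = [l2(reference, u) for u in updates]
    med = median(dists)
    mad = median([abs(d - med) for d in dists])
    threshold = med + k * mad
    return [u for u, d in zip(updates, dists) if d <= threshold]

def mean_update(updates):
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]

reference = [0.10, -0.10]           # this client's own reference update
updates = [[0.11, -0.09], [0.09, -0.12], [0.12, -0.10], [5.0, 5.0]]
kept = filter_updates(reference, updates)
print(len(kept), mean_update(kept))
```

Because the threshold is derived from the observed distances rather than fixed in advance, it tightens when honest updates agree closely and loosens when they are naturally spread out.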

8. Robustness, Uncertainty, and Trustworthy Learning 17 papers

2606.18322 2026-06-18 cs.LG cs.AI New submission

SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior

SAE干预不可靠:干预后抑制行为的恢复

Mingyue Cui, Linghui Shen, Xingyi Yang

Affiliations The Hong Kong Polytechnic University

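AI summary Finds that Sparse Autoencoder (SAE) feature interventions can suppress a behavior yet leave a recoverable failure mode: optimized residual perturbations restore the original behavior, exposing a gap between feature-level control and behavioral completeness.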

Comments Code: https://github.com/Mingyuee88/sae-post-intervention-recovery, Project page: https://mingyuee88.github.io/sae-post-intervention-recovery/

AI中文摘要

稀疏自编码器(SAE)将残差流激活分解为可解释特征。最近的潜在空间防御越来越依赖这些分解,假设识别出的“不安全”SAE特征可作为监控和干预的可操作手柄。在这种范式下,固定特定有害特征预期能可靠地防止模型不当行为。然而,我们表明这种成功可能隐藏一种可恢复的失败模式:固定可能阻止行为的一条可见路径,但并未消除行为本身。我们将这种脆弱性形式化为干预后恢复,这是一个受约束的残差空间优化问题。从干预后的残差状态开始,我们优化残差扰动以恢复干预前的行为,同时保持目标SAE特征的干预后值。即使在干预在优化和生成过程中保持活跃的强威胁模型下,恢复仍然可能。为了排除恢复仅仅是撤销干预的可能性,我们使用编码器正交更新进行单层干预,并在跨层设置中使用相应的特征图雅可比矩阵。在TPP、遗忘、IOI和拒绝引导实验中,这种压力测试揭示了尽管特征级干预成功,行为仍可恢复。特别是在安全关键的拒绝引导设置中,我们在有效样本上实现了95.8%的恢复率,同时将防御特征的相对漂移保持在0.131,远低于基于后缀的基线。恢复路径归因分析进一步将这种恢复定位到SAE重建残差,即SAE未解释的组件。这些结果暴露了特征级控制与行为完整性之间的差距:SAE特征可以支持因果干预,但控制它们并不能保证对底层行为的控制。

英文摘要

Sparse Autoencoders (SAEs) decompose residual-stream activations into interpretable features. Recent latent-space defenses increasingly rely on these decompositions, assuming that identified "unsafe" SAE features serve as actionable handles for monitoring and intervention. In this paradigm, clamping a specific harmful feature is expected to reliably prevent model misbehavior. However, we show that this success may hide a recoverable failure mode: the clamp may block one visible route to a behavior without eliminating the behavior itself. We formulate this vulnerability as post-intervention recovery, a constrained residual-space optimization problem. Starting from the post-intervention residual state, we optimize residual perturbations to recover the pre-intervention behavior while preserving the post-intervention values of the targeted SAE features. Even under a strong threat model where the intervention remains active throughout optimization and generation, recovery remains possible. To rule out that recovery simply undoes the intervention, we use encoder-orthogonal updates for single-layer interventions and the corresponding feature-map Jacobian in the cross-layer setting. Across TPP, unlearning, IOI, and refusal steering experiments, this stress test reveals recoverable behavior despite successful feature-level intervention. Especially in the safety-critical refusal-steering setting, we achieve a 95.8% recovery rate on valid samples while keeping defended-feature relative drift to 0.131, substantially below suffix-based baselines. A recovery-path attribution analysis further localizes this recovery to the SAE reconstruction residual, the component left unexplained by the SAE. These results expose a gap between feature-level control and behavioral completeness: SAE features can support causal intervention, but controlling them does not guarantee control over the underlying behavior.
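An encoder-orthogonal update can be shown for the simplest case of one clamped feature. The sketch assumes the feature's pre-activation is an affine function of the residual (a dot product with one encoder row plus a bias), so removing the component of an update along that row leaves the pre-activation unchanged. The vectors and function names are hypothetical toy values, not the paper's setup.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encoder_orthogonal(delta, w_enc):
    """Remove from delta its component along the encoder direction w_enc."""
    scale = dot(delta, w_enc) / dot(w_enc, w_enc)
    return [d - scale * w for d, w in zip(delta, w_enc)]

def feature_preactivation(residual, w_enc, bias=0.0):
    return dot(residual, w_enc) + bias

w_enc = [0.6, 0.8, 0.0]          # encoder row of the clamped feature (toy values)
residual = [1.0, -0.5, 2.0]      # post-intervention residual state
raw_step = [0.3, 0.1, -0.4]      # unconstrained recovery update

step = encoder_orthogonal(raw_step, w_enc)
moved = [r + s for r, s in zip(residual, step)]
print(feature_preactivation(residual, w_enc), feature_preactivation(moved, w_enc))
```

With several defended features, the same idea would project onto the orthogonal complement of all their encoder rows. The residual still moves, but the monitored features cannot see the change.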

2606.18418 2026-06-18 cs.LG New submission

P$^2$CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations

P$^2$CE: 模型无关的可行帕累托最优反事实解释

Arthur Hendricks Mendes de Oliveira, Giovani Valdrighi, Marcos Medeiros Raimundo

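AI summary Proposes P$^2$CE, an algorithm that uses an isolation forest outlier detector and SHAP values to generate plausible, Pareto-optimal counterfactual explanations, balancing feasibility, plausibility, and computational efficiency.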

Comments Under review in the Machine Learning journal

AI中文摘要

机器学习算法在社会应用中的日益普及引发了对公平性和透明度的担忧,从而推动了反事实解释的发展。这些解释通过提供可操作的输入特征更改,帮助个人理解并可能改变在贷款申请、工作选择等领域的不利决策。现有方法往往难以平衡可行性、合理性和计算效率。为此,我们提出了P$^2$CE,一种生成可行帕累托最优反事实解释的算法,为用户提供不同可行性概念之间的多样化最优权衡。P$^2$CE使用辅助隔离森林异常检测器确保解释符合数据分布,并利用SHAP值在短时间内获得最优结果,与底层模型无关。我们在三个数据集上进行了实证评估,结果表明,与相关技术相比,该算法在解决方案质量和计算效率方面均表现出优越性能。

英文摘要

The increasing use of machine learning algorithms in social applications has raised concerns about fairness and transparency, leading to the development of counterfactual explanations. These explanations help individuals understand and potentially alter unfavorable decisions in areas such as loan applications and job selection by providing actionable changes to input features that would lead to a desired outcome. Existing methods often struggle to balance feasibility, plausibility, and computational efficiency. To address this, we introduce P$^2$CE, an algorithm for generating plausible Pareto-optimal counterfactual explanations, offering users a diverse set of optimal trade-offs between different notions of feasibility. P$^2$CE employs an auxiliary isolation forest outlier detector to ensure that explanations are in accordance with the data distribution and leverages SHAP values to obtain optimal results with short computing times, regardless of the underlying model. Our algorithm was empirically evaluated on three datasets, demonstrating superior performance in terms of both solution quality and computational efficiency compared to related techniques.
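Pareto optimality over several cost notions reduces to a dominance check between candidates. The sketch below filters a list of candidate counterfactuals down to the non-dominated ones. The two cost notions and the candidate values are hypothetical, and the plausibility check (isolation forest) and the SHAP-guided search from the abstract are not shown.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every cost and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the candidates whose cost vectors are not dominated by any other."""
    return [
        c for c in candidates
        if not any(dominates(o["costs"], c["costs"]) for o in candidates if o is not c)
    ]

# Each candidate: a proposed feature change and its costs under two hypothetical
# feasibility notions (number of features changed, total normalized change).
candidates = [
    {"change": {"income": 500}, "costs": (1, 0.40)},
    {"change": {"income": 200, "debt": -100}, "costs": (2, 0.25)},
    {"change": {"income": 600}, "costs": (1, 0.55)},
    {"change": {"income": 300, "debt": -300}, "costs": (2, 0.50)},
]
front = pareto_front(candidates)
print([c["costs"] for c in front])
```

Two candidates survive because each is better on one cost: one changes fewer features, the other changes them by less. That is the kind of trade-off a user is then asked to choose between.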

2606.18430 2026-06-18 cs.LG cs.CR New submission

Signature filtering: a lightweight enhancement for statistical watermark detection in large language models

签名过滤:大型语言模型中统计水印检测的轻量级增强方法

Chih-Duo Hong, Yen-Pang Chen, Fang Yu

Affiliations National Chengchi University

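AI summary Proposes signature filtering, a module that removes signature tokens that interfere with watermark detection, raising detection rates in weak-signal and low-entropy settings from 8-31% to 78-99% while keeping the false-positive rate controllable.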

AI中文摘要

统计水印帮助组织归因大型语言模型(LLM)的输出,但现有检测器在水印信号弱、文本重复或水印被编辑时往往表现不佳。我们提出签名过滤,一种检测时模块,在不修改水印嵌入和文本生成的情况下增强水印检测。它学习一小部分“签名”令牌,这些令牌的存在会使水印测试不可靠,并在检测前移除这些令牌。通过在小训练集上求解混合整数线性规划获得签名,约束条件最大化真阳性率。我们还推导了在几种攻击者模型(色盲、颜色自适应和分布相关)下的有限样本和渐近界。在四个知名水印家族(Kgw、Sweet、Unigram、Exp)、四个基准语料库(C4、MBPP、HumanEval、Code-Search-Net)和六个LLM(Opt-1.3b、Opt-6.7b、Llama2-13b、Llama3.1-8b、Qwen2.5-14b、Phi-3-medium-14b)上,2-gram和3-gram签名在弱信号和低熵设置下将检测率从无过滤时的8-31%提升至78-99%,同时保持假阳性率可控且通常可忽略。在压力测试中,我们打乱句子并稀释、删除和替换25-50%的令牌,针对Kgw风格水印的2-gram过滤器保留了大部分干净文本的检测增益,通常匹配或超越先进的WinMax水印检测器。因此,签名过滤提供了一种简单、可扩展且模型无关的附加组件,以加强信息处理工作流中LLM文本基于水印的来源检查。

英文摘要

Statistical watermarks help organizations attribute large language model (LLM) outputs, yet existing detectors often struggle when watermark signals are weak, texts are repetitive, or watermarks are edited. We propose signature filtering, a detection-time module that enhances watermark detection without modifying watermark embedding and text generation. It learns a small set of "signature" tokens whose presence makes watermark tests unreliable, and removes these tokens before detection. The signatures are obtained by solving a mixed-integer linear program on a small training set, with constraints that maximize the true positive rate. We additionally derive finite-sample and asymptotic bounds under several attacker models (color-blind, color-adaptive, and distributionally correlated). On four well-known watermark families (Kgw, Sweet, Unigram, Exp), four benchmark corpora (C4, MBPP, HumanEval, Code-Search-Net), and six LLMs (Opt-1.3b, Opt-6.7b, Llama2-13b, Llama3.1-8b, Qwen2.5-14b, Phi-3-medium-14b), 2- and 3-gram signatures raise detection rates in weak-signal and low-entropy settings from 8-31% without filtering to 78-99% with filtering, while keeping false positives controllable and often negligible. In stress tests where we scramble sentences and perturb 25-50% of tokens by dilution, deletions, and substitutions, 2-gram filters for Kgw-style watermarks preserve most of the clean-text detection gains, often matching or outperforming the advanced WinMax watermark detector. Signature filtering thus provides a simple, scalable, and model-agnostic add-on to strengthen watermark-based provenance checks for LLM text in information processing workflows.
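The effect of removing signature n-grams before a watermark test can be sketched with a toy green-list detector. The example assumes a fixed green list (closest in spirit to a Unigram-style watermark) and the standard one-proportion z-score. The signature set is hand-picked rather than learned by a mixed-integer program, and dropping both tokens of a matched 2-gram is an assumption about how removal works.

```python
import math

def remove_signatures(tokens, signatures):
    """Drop every token that is part of a listed 2-gram signature."""
    drop = set()
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) in signatures:
            drop.update((i, i + 1))
    return [t for i, t in enumerate(tokens) if i not in drop]

def green_z_score(tokens, green, gamma=0.5):
    """One-proportion z-score for the share of green-list tokens."""
    n = len(tokens)
    hits = sum(1 for t in tokens if t in green)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))

green = {"alpha", "gamma", "delta", "omega"}        # toy fixed green list
signatures = {("def", "main"), ("return", "0")}     # toy low-entropy 2-grams
text = "def main alpha gamma delta return 0 omega alpha gamma".split()

filtered = remove_signatures(text, signatures)
print(green_z_score(text, green), green_z_score(filtered, green))
```

Boilerplate tokens that the generator had little freedom to choose dilute the green-token share, so removing them raises the z-score on watermarked text without touching the embedding side.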

2606.18454 2026-06-18 cs.LG cs.AI New submission

Veriphi: Attack-Guided Neural Network Verification with Dataset-Dependent Training Methods

Veriphi: 基于攻击引导的神经网络验证与数据集依赖训练方法

Pratik Deshmukh, Kartik Arya, Vasili Savin

Affiliations TU Wien

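AI summary Presents Veriphi, a system that combines fast adversarial attacks with α,β-CROWN formal bound certification. Experiments show that training-method effectiveness depends on the dataset: IBP works on MNIST but provides negligible certification on CIFAR-10, where PGD adversarial training reaches 94% certification at small perturbations. Attack-guided falsification yields a 5x verification speedup.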

Comments 17 Pages, 8 Figures

AI中文摘要

我们提出Veriphi,一个GPU加速的神经网络验证系统,它使用α,β-CROWN方法将快速对抗攻击与形式化边界认证相结合。通过在MNIST和CIFAR-10上使用三种训练方法(标准、对抗、认证)进行系统实验,我们证明了训练方法的有效性从根本上依赖于数据集。区间边界传播(IBP)在简单的MNIST(784维)上达到78%的认证准确率,但在更复杂的CIFAR-10数据集上提供的认证性能可忽略不计,而在小扰动下PGD对抗训练以94%的认证率占主导地位。我们通过攻击引导的伪造实现了5倍的验证加速,并将我们的方法扩展到生产规模模型(1.058亿参数),用于实际航空航天物流优化。我们的结果挑战了认证训练普遍优于对抗训练的假设,表明上下文对于验证策略选择至关重要。

英文摘要

We present Veriphi, a GPU-accelerated neural network verification system that combines fast adversarial attacks with formal bound certification using alpha,beta-CROWN methods. Through systematic experiments on MNIST and CIFAR-10 using three training methodologies (standard, adversarial, certified), we demonstrate that training method effectiveness is fundamentally dataset-dependent. Interval Bound Propagation (IBP) achieves 78% certified accuracy on simple MNIST (784 dimensions) but provides negligible certification performance on the more complex CIFAR-10 dataset, where PGD adversarial training dominates with 94% certification at small perturbations. We achieve 5x verification speedup through attack-guided falsification and scale our approach to production-size models (105.8M parameters) for real-world aerospace logistics optimization. Our results challenge the assumption that certified training universally outperforms adversarial training, showing context matters critically for verification strategy selection.
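Interval Bound Propagation, mentioned above as one of the certified-training methods, pushes an input box through the network layer by layer. The sketch below is plain IBP on a tiny hand-written network, not α,β-CROWN and not Veriphi's pipeline. The weights, the input, and the reading of the final output as a logit margin are hypothetical.

```python
def linear_bounds(lower, upper, weights, bias):
    """Propagate an input box [lower, upper] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A positive weight maps the lower input bound to the lower output
            # bound; a negative weight swaps the roles.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi

def relu_bounds(lower, upper):
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

x, eps = [1.0, 2.0], 0.1
lo, hi = [v - eps for v in x], [v + eps for v in x]
lo, hi = relu_bounds(*linear_bounds(lo, hi, [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]))
lo, hi = linear_bounds(lo, hi, [[-1.0, 2.0]], [0.0])   # toy logit margin
print(lo, hi)
```

If the lower bound on the margin is positive, no input inside the box can flip the prediction. The bound is sound but can be loose, which is one reason a fast attack that finds a counterexample first can save the cost of a full verification run.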

2606.18697 2026-06-18 cs.LG cs.CR cs.RO New submission

Stealthy World Model Manipulation via Data Poisoning

通过数据投毒进行隐蔽的世界模型操纵

Yibin Hu, Xiaolin Sun, Zizhan Zheng

Affiliations Department of Computer Science

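AI summary Proposes SWAAP, a two-stage data poisoning framework that manipulates learned world models: first-order bilevel optimization finds a harmful target model, and stealth-constrained gradient matching realizes it. The attack substantially degrades planning performance and evades the evaluated non-adaptive defenses.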

Comments 41 pages, 8 figures, 11 tables. Submitted to NeurIPS 2026

AI中文摘要

基于模型的学习智能体使用学习到的世界模型来预测未来状态、规划行动并适应新环境。然而,从收集的经验中更新世界模型的过程创造了一个训练时攻击面:对抗性投毒的微调轨迹可以操纵学习到的动力学,从而破坏下游规划。在本文中,我们提出了SWAAP,这是第一个针对学习到的世界模型的两阶段数据投毒框架。在第一阶段,SWAAP利用过渡梯度定理实现的一阶双层优化,识别出一个有害的目标世界模型,该模型在规划下诱导低回报行为,同时保持接近干净动力学。在第二阶段,SWAAP通过隐蔽约束的梯度匹配实现该目标,仅修改有限比例的微调过渡目标,使得诱导的训练梯度将受害者模型引向对抗目标,同时预测误差正则化器鼓励投毒目标保持接近世界模型的自然近似误差。为了评估攻击的隐蔽性,我们在投毒管道的三个阶段评估了防御和可检测性:投毒过渡的预训练检测、微调期间的鲁棒训练以及测试时对结果世界模型的监控。在多种连续控制任务中,SWAAP导致显著的性能下降,同时保持投毒过渡接近干净数据,并规避了评估的非自适应残差/CUSUM/TRIM风格防御。这些结果揭示了世界模型适应管道中的实际漏洞,并强调了需要保护世界模型训练数据和所学动力学的鲁棒性方法。

英文摘要

Model-based learning agents use learned world models to predict future states, plan actions, and adapt to new environments. However, the process of updating world models from collected experience creates a training-time attack surface: adversarially poisoned fine-tuning trajectories can manipulate the learned dynamics and thereby corrupt downstream planning. In this paper, we propose SWAAP, the first two-stage data poisoning framework for learned world models. In the first stage, SWAAP identifies a harmful target world model that induces low-return behavior under planning while remaining close to clean dynamics, using first-order bilevel optimization enabled by a transition-gradient theorem. In the second stage, SWAAP realizes this target through stealth-constrained gradient matching, modifying only a limited fraction of fine-tuning transition targets so that the induced training gradients steer the victim model toward the adversarial target, while a prediction-error regularizer encourages the poisoned targets to remain close to the world model's natural approximation error. To assess attack stealthiness, we evaluate defenses and detectability across three stages of the poisoning pipeline: pre-training detection of poisoned transitions, robust training during fine-tuning, and test-time monitoring of the resulting world model. Across diverse continuous-control tasks, SWAAP causes substantial performance degradation while keeping poisoned transitions close to clean data and evading the evaluated non-adaptive residual/CUSUM/TRIM-style defenses. These results reveal a practical vulnerability in world-model adaptation pipelines and highlight the need for robustness methods that protect both world-model training data and learned dynamics.

2606.18832 2026-06-18 cs.LG cs.AI New submission

Target-confidence Recourse Using tSeTlin machines: TRUST

使用Tsetlin机器的目标置信度追索:TRUST

K. Darshana Abeyrathna, Sara El Mekkaoui, Nils Enric Canut Taugbøl, Anuja Vats

Affiliations Group Research and Development, Det Norske Veritas (DNV)

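AI summary Proposes TRUST, a framework that uses a Probabilistic Tsetlin Machine and Bayesian optimization to search directly for the minimal input change that meets a user-specified confidence target, producing more robust and interpretable counterfactual explanations.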

AI中文摘要

反事实解释被广泛用于高风险决策系统中的算法追索。大多数现有方法寻求最小化改变输入以翻转模型决策。然而,决策者通常不仅依赖预测标签,还依赖置信度阈值和风险边际。刚好越过决策边界的反事实在噪声或模型变化下可能脆弱且不稳定。本文提出使用Tsetlin机器的目标置信度追索(TRUST),一种用户明确指定追索所需预测置信度的框架。TRUST不是先生成反事实再评估置信度,而是直接搜索满足用户定义置信度目标的最小变化,从而在成本、置信度和鲁棒性方面比较追索选项。我们使用概率Tsetlin机器(PTM)结合贝叶斯优化实例化TRUST。PTM基于概率子句的结构将预测置信度与决策规则的稳定性联系起来。我们表明,满足相同规则的反事实在可靠性上可能差异很大,取决于它们满足这些规则的安全程度,揭示了决策是由稳健还是脆弱的子句激活支持的。在合成和真实数据集上的实验表明,目标置信度反事实比传统的基于边界的方法产生更稳健和可解释的追索。在多个基准测试中,TRUST实现了完美的鲁棒性,同时保持较低的追索成本,包括在Haberman数据集上以0.92置信度达到0.10的L2距离。通过显式控制置信度和暴露规则级稳定性,TRUST为高风险决策支持提供了可操作的追索。

英文摘要

Counterfactual explanations are widely used to provide algorithmic recourse in high-stakes decision-making systems. Most existing methods seek the smallest change to an input that flips a model's decision. However, decision-makers often rely not only on predicted labels but also on confidence thresholds and risk margins. Counterfactuals that barely cross a decision boundary can be fragile and unstable under noise or model variation. In this paper, we propose Target-confidence Recourse Using tSeTlin machines (TRUST), a framework in which users explicitly specify the desired prediction confidence for recourse. Rather than generating counterfactuals and evaluating confidence afterward, TRUST directly searches for minimal changes that satisfy a user-defined confidence target, enabling comparison of recourse options in terms of cost, confidence, and robustness. We instantiate TRUST using a Probabilistic Tsetlin Machine (PTM) combined with Bayesian optimization. The probabilistic clause-based structure of PTM links prediction confidence to the stability of decision rules. We show that counterfactuals satisfying the same rules can still differ substantially in reliability depending on how securely they satisfy those rules, revealing whether decisions are supported by robust or fragile clause activations. Experiments on synthetic and real-world datasets demonstrate that target-confidence counterfactuals produce more robust and interpretable recourse than conventional boundary-based approaches. Across multiple benchmarks, TRUST achieves perfect robustness while maintaining low recourse cost, including an L2 distance of 0.10 on the Haberman dataset at 0.92 confidence. By explicitly controlling confidence and exposing rule-level stability, TRUST provides actionable recourse for high-stakes decision support.
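The difference between flipping a decision and reaching a target confidence can be made concrete on a model where the answer is available in closed form. The sketch swaps in a plain logistic regression for the Probabilistic Tsetlin Machine and replaces Bayesian optimization with the exact formula for the smallest L2 change. The weights and inputs are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def confidence(x, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def recourse_to_confidence(x, w, b, target):
    """Smallest L2 change that lifts a logistic model's confidence to `target`."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    needed = math.log(target / (1.0 - target))      # logit of the target confidence
    if score >= needed:
        return list(x)                               # already confident enough
    step = (needed - score) / sum(wi * wi for wi in w)
    return [xi + step * wi for xi, wi in zip(x, w)]  # move along w, the shortest path

w, b = [2.0, -1.0], -0.5
x = [0.0, 1.0]                                       # currently rejected
x_flip = recourse_to_confidence(x, w, b, target=0.5)
x_safe = recourse_to_confidence(x, w, b, target=0.92)
print(confidence(x_flip, w, b), confidence(x_safe, w, b))
```

The boundary-crossing counterfactual sits at exactly 0.5 confidence, so any noise can undo it. The target-confidence one costs a larger change but leaves a margin, which is the trade-off the framework asks users to set explicitly.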

2606.18839 2026-06-18 cs.LG cs.CV New submission

Semantic Robustness Certification for Vision-Language Models

视觉语言模型的语义鲁棒性认证

Peiyu Yang, Paul Montague, Feng Liu, Andrew C. Cullen, Amardeep Kaur, Christopher Leckie, Sarah M. Erfani

Affiliations School of Computing & Information Systems, University of Melbourne, Australia

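AI summary Proposes the first framework for certifying the robustness of vision-language models to semantic-level variations (e.g., shape, size, and style) without additional data for each variation. It uses text prompts as semantic proxies and characterizes the decision boundary in closed form to certify the extent intervals over which the predicted class stays unchanged.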

Comments Accepted to ICML

详情
AI中文摘要

视觉语言模型(VLM)现在被广泛用于下游任务。然而,现实世界的应用常常使VLM面临由语义变化(例如形状、大小和风格)引起的分布偏移。鲁棒性认证确定当对输入应用变换时模型的预测是否改变。虽然大多数认证框架研究输入的几何或像素级变换,但本文提出了一种新颖的框架,能够在语义级变换下认证VLM的鲁棒性。利用VLM的开放词汇能力,我们使用文本提示作为语义代理来构建由控制语义变化程度的范围参数化的变换。通过以封闭形式表征VLM决策边界,我们的框架定量地认证了在语义变换下预测类别保持不变的范围区间。我们的框架是第一个在语义级变化下认证VLM鲁棒性而无需为每种变化提供额外数据的框架,使其易于应用。在合成数据和真实数据上的实验表明,我们的框架能够在各种场景下认证针对多种语义变化的鲁棒性。

英文摘要

Vision-language models (VLMs) are now widely used in downstream tasks. However, real-world applications often expose VLMs to distribution shifts induced by semantic variation (e.g., shape, size, and style). Robustness certification determines if a model's prediction changes when transformations are applied to its input. While most certification frameworks study geometric or pixel-level transformations over inputs, this work proposes a novel framework that enables certifying VLM robustness under semantic-level transformations. Leveraging the open-vocabulary capability of VLMs, we use text prompts as semantic proxies to construct transformations parameterized by an extent that controls the degree of semantic variation. By characterizing the VLM decision boundary in closed form, our framework quantitatively certifies extent intervals for which the predicted class remains unchanged under the semantic transformation. Our framework is the first to certify VLM robustness under semantic-level variations without requiring additional data for each variation, making it practical to apply. Experiments on both synthetic and real-world data show that our framework enables certifying robustness under diverse semantic variations across scenarios.
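
To make "certifying extent intervals in closed form" concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: the image embedding moves linearly along a text-derived direction, `z(t) = z + t * d`, and the class logits are unnormalized dot products `W @ z(t)`. Under those assumptions each class margin is linear in `t`, so the interval of extents that keeps the prediction fixed is obtained by solving a handful of linear inequalities. All names are hypothetical.

```python
import numpy as np

def certified_extent_interval(z, d, W):
    """Return (predicted class, lo, hi) such that argmax(W @ (z + t*d))
    equals the original prediction for every t strictly between lo and hi.

    Assumes a linear head and a linear semantic shift; real VLMs normalize
    embeddings, which changes the algebra.
    """
    z, d, W = (np.asarray(a, dtype=float) for a in (z, d, W))
    c = int(np.argmax(W @ z))
    lo, hi = -np.inf, np.inf
    for j in range(W.shape[0]):
        if j == c:
            continue
        a = (W[c] - W[j]) @ z   # margin at t = 0
        b = (W[c] - W[j]) @ d   # rate of change of the margin with t
        if b < 0:               # margin shrinks as t grows: upper bound
            hi = min(hi, -a / b)
        elif b > 0:             # margin shrinks as t decreases: lower bound
            lo = max(lo, -a / b)
    return c, lo, hi

# Toy example: three classes, shift direction pushes class 0 toward class 1.
W = np.eye(3)
z = np.array([2.0, 1.0, 0.0])
d = np.array([-1.0, 1.0, 0.0])
print(certified_extent_interval(z, d, W))
```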

2606.18867 2026-06-18 cs.LG cs.CY stat.ML 新提交

Strategic Feature Selection

战略特征选择

Jivat Neet Kaur, Pratik Patil, Divya Shanmugam, Emma Pierson, Michael I. Jordan, Nika Haghtalab, Meena Jagadeesan, Ahmed Alaa, Serena Wang

Affiliations * University of California, Berkeley; University of Texas, Austin; Cornell Tech; Stanford University; University of Pennsylvania; Harvard University; Inria, Paris

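AI summary Studies strategic classification through feature selection and ridge regularization, finds that excluding features based on manipulability alone is generally suboptimal, and proposes an algorithm that jointly chooses the feature set and the regularization level, illustrated on a healthcare payments benchmark.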

详情
AI中文摘要

当算法预测器在高风险领域(如医疗)中指导资源分配时,这些预测器必须考虑输入特征的战略操纵。典型的解决方案是重新设计预测器本身以明确考虑战略互动。然而在实践中,决策者通常受限于调整现有预测管道中的粗粒度杠杆。例如,医疗组织通常根据感知的可操纵性选择排除哪些特征,同时使用标准正则化程序来收缩保留特征的系数。在这项工作中,我们通过特征选择及其与岭正则化的相互作用,启动了对战略分类的形式化研究。我们的主要发现是,仅基于可操纵性排除单个特征通常是次优的。我们提供了在最优正则化下特征子集性能的细粒度刻画,为政策设计提供了新的见解。受此刻画启发,我们开发了一种实用算法,用于联合选择特征集和岭正则化水平。通过一个关于医疗支付基准的真实世界案例研究,我们说明了我们的算法如何指导实践中粗粒度政策杠杆的设计。我们的结果为减轻算法决策系统中战略行为的影响提供了一个有原则的、实用的框架。

英文摘要

When algorithmic predictors inform resource allocation in high-stakes domains such as healthcare, these predictors must account for strategic manipulation of input features. The typical solution is to redesign the predictor itself to explicitly account for strategic interactions. In practice, however, decision makers are often constrained to adjusting coarser levers within existing prediction pipelines. For example, healthcare organizations often select which features to exclude based on perceived manipulability, while using standard regularization procedures to shrink the coefficients of retained features. In this work, we initiate a formal study of strategic classification through feature selection and its interaction with ridge regularization. Our main finding is that excluding individual features based on their manipulability alone is generally suboptimal. We provide a fine-grained characterization of the performance of a feature subset under optimal regularization, yielding new insights for policy design. Motivated by this characterization, we develop a practical algorithm for jointly choosing the feature set and the level of ridge regularization. Through a real-world case study on a healthcare payments benchmark, we illustrate how our algorithm can guide the design of coarse policy levers in practice. Our results provide a principled, practical framework for mitigating the effects of strategic behavior in algorithmic decision-making systems.

2606.18467 2026-06-18 stat.ML cs.LG 交叉投稿

ToolChain-CRC: Conformal Risk Control for Agentic AI Under Retrieval and Tool-Use Drift

ToolChain-CRC: 检索与工具使用漂移下代理型AI的共形风险控制

Jeffery Opoku, David Banahene

Affiliations * The University of Texas Rio Grande Valley; Florida International University

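AI summary Addresses risk control for retrieval-augmented and tool-using agents under drift. ToolChain-CRC builds a trajectory-level risk score and calibrates an accept-or-intervene rule, giving provable trajectory-level risk control.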

Comments 26 pages, 11 figures

详情
AI中文摘要

现代AI代理检索文档、调用工具、检查中间信息,然后产生最终答案或行动。这产生了一个仅从最终答案无法察觉的风险控制问题。即使检索薄弱、工具输出错误或早期步骤缺乏支持,最终响应也可能看起来可接受。我们提出ToolChain-CRC,一种针对漂移下检索增强和工具使用代理的共形风险控制方法。该方法将每次代理运行视为动作、观察和最终输出的完整轨迹。它构建步骤级风险评分,将其组合成轨迹风险评分,校准接受或干预规则,并添加一个随时报警,可在最终答案前停止风险运行。我们在可交换校准运行下证明了轨迹级风险控制,给出了具有可审计常数的漂移感知扩展,并通过超鞅构造证明了随时升级规则。实验涵盖合成工具链漂移、RAG/工具使用压力测试、基于SQuAD的公共检索任务、无API代理问答案例研究、消融实验、目标风险敏感性检查、20种子鲁棒性检查、漂移边界审计以及实时RAG/工具使用代理基准。在这些设置中,仅基于最终答案的校准可能遗漏检索和工具故障,而轨迹级校准将接受轨迹的风险保持在目标之下。

英文摘要

Modern AI agents retrieve documents, call tools, check intermediate information, and then produce a final answer or action. This creates a risk-control problem that is not visible from the final answer alone. A final response may look acceptable even when the retrieval was weak, a tool output was wrong, or an earlier step was unsupported. We propose ToolChain-CRC, a conformal risk-control method for retrieval-augmented and tool-using agents under drift. The method treats each agent run as a full trajectory of actions, observations, and final output. It builds step-level risk scores, combines them into a trajectory risk score, calibrates an accept-or-intervene rule, and adds an anytime alarm that can stop risky runs before the final answer. We prove trajectory-level risk control under exchangeable calibration runs, give a drift-aware extension with auditable constants, and prove an anytime escalation rule through a supermartingale construction. Experiments cover synthetic tool-chain drift, RAG/tool-use stress tests, public SQuAD-derived retrieval tasks, an API-free agentic QA case study, ablations, target-risk sensitivity checks, 20-seed robustness checks, a drift-margin audit, and a live RAG/tool-use agent benchmark. Across these settings, final-answer-only calibration can miss retrieval and tool failures, while trajectory-level calibration keeps accepted-trajectory risk below the target.
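
The calibration step can be sketched with the standard conformal risk control recipe for a loss bounded by 1. The aggregation rule (taking the maximum step-level score) and the function names are assumptions made for illustration; the abstract does not say how step scores are combined, and the drift-aware extension and anytime alarm are not shown. Note that this sketch bounds the probability that a run is both accepted and bad, which may differ from the exact risk notion used in the paper.

```python
import numpy as np

def trajectory_score(step_scores):
    """Combine step-level risk scores into one trajectory score.
    Using the max is an assumption: one weak step flags the whole run."""
    return max(step_scores)

def calibrate_threshold(scores, bad, alpha):
    """Largest threshold lam (among observed scores) such that
    (sum_i bad_i * 1{score_i <= lam} + 1) / (n + 1) <= alpha.

    scores: trajectory risk scores of n exchangeable calibration runs.
    bad:    1 if the run's outcome was unacceptable, else 0.
    Runs with score <= lam are accepted; the rest trigger intervention.
    Returns -inf when no threshold meets the target (intervene on everything).
    """
    scores = np.asarray(scores, dtype=float)
    bad = np.asarray(bad, dtype=float)
    n = len(scores)
    best = -np.inf
    for lam in np.unique(scores):          # sorted ascending
        bound = (np.sum(bad * (scores <= lam)) + 1.0) / (n + 1.0)
        if bound > alpha:                  # loss only grows with lam, so stop
            break
        best = lam
    return best

# Toy calibration set (hypothetical numbers).
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cal_bad = [0, 0, 0, 0, 0, 1, 0, 1, 1]
lam = calibrate_threshold(cal_scores, cal_bad, alpha=0.2)
new_run = trajectory_score([0.15, 0.75, 0.30])
print("threshold:", lam, "accept new run:", new_run <= lam)
```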

2606.18530 2026-06-18 cs.CR cs.CL cs.LG 交叉投稿

Evaluating Prompting-Based Defenses Against Domain-Camouflaged Injection Attacks

评估基于提示的防御策略对抗领域伪装注入攻击

Aaditya Pai

Affiliations * Data Science Institute

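AI summary Evaluates five prompting-based defenses (including paraphrasing and spotlighting) against domain-camouflaged injection attacks across three model families and three deployment domains. Paraphrasing is the most consistently effective, reducing camouflage attack success rate by 55-84% depending on the model.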

Comments 9 pages, 4 figures, 4 tables; under review at the AdvML-Frontiers x CoTMA workshop, COLM 2026

详情
AI中文摘要

领域伪装注入攻击使用领域特定词汇将恶意指令嵌入检索内容中,从而逃避依赖句法注入标记的标准检测器。当检测失败时,从业者需要知道哪些防御架构能降低攻击成功率。我们评估了五种基于提示的防御方法(重点标记、释义、提示夹层以及两种组合)对抗领域伪装注入攻击,涉及三个模型家族(Claude Haiku、Llama 3.1 8B、Gemini 2.0 Flash)和三个部署领域(金融、法律、通用),共进行3,510次试验。在代理处理之前对检索内容进行释义是最一致有效的防御方法,根据模型不同,可将伪装攻击成功率降低55-84%,并且在所有测试模型上均实现了比我们的Llama Guard 4配置更低的攻击成功率。防御效果强烈依赖于模型:重点标记在Claude Haiku上将攻击成功率减半,但在Llama 3.1 8B上没有任何益处。金融领域部署面临最高的残余风险,基线攻击成功率为26-33%,在较弱模型上没有任何基于提示的防御能完全消除威胁。这些结果首次系统评估了专门针对伪装类注入攻击的基于提示的防御方法,并为从业者建立了基于基准的建议。所有任务均使用合成构建的专业文档;这些基准排名是否能推广到真实企业文档仍是一个开放问题。

英文摘要

Domain-camouflaged injection attacks embed malicious instructions in retrieved content using domain-appropriate vocabulary, evading standard detectors that rely on syntactic injection markers. When detection fails, practitioners need to know which defense architectures reduce attack success. We evaluate five prompting-based defenses (spotlighting, paraphrasing, prompt sandwiching, and two combinations) against domain-camouflaged injection across three model families (Claude Haiku, Llama 3.1 8B, Gemini 2.0 Flash) and three deployment domains (financial, legal, general) using 3,510 trials. Paraphrasing retrieved content before agent processing is the most consistently effective defense in this benchmark, reducing camouflage attack success rate by 55-84% depending on model, and achieves lower attack success rates than our Llama Guard 4 configuration on every model tested. Defense effectiveness is strongly model-dependent: spotlighting halves attack success on Claude Haiku but provides no benefit on Llama 3.1 8B. Financial domain deployments face the highest residual risk at 26-33% baseline attack success rate, with no prompting-based defense fully eliminating the threat on weaker models. These results provide the first systematic evaluation of prompting-based defenses specifically against camouflage-class injection attacks and establish benchmark-based recommendations for practitioners. All tasks use synthetically constructed professional documents; whether these benchmark rankings generalize to real enterprise documents remains an open question.

2606.18860 2026-06-18 cs.CV cs.LG 交叉投稿

Quantification of Uncertainty with Adversarial Models in Medical Image Segmentation

医学图像分割中对抗模型的不确定性量化

Hana Jebril, Thomas Pinetz, Günter Klambauer, Hrvoje Bogunović

Affiliations * Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria; Comprehensive Center for AI in Medicine, Medical University of Vienna, Austria; ELLIS Unit Linz, LIT AI Lab and Institute for Machine Learning, Johannes Kepler University Linz, Austria; Institute for Machine Learning, Johannes Kepler University Linz, Austria; Clinical Research Center for Medical AI, Johannes Kepler University Linz, Austria

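AI summary Proposes QUAM-SM, a post-hoc framework that uses targeted adversarial search to identify fragile pixels, quantifies uncertainty while disentangling epistemic from aleatoric uncertainty, and outperforms standard and recent uncertainty estimation methods on two public datasets.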

Comments Accepted at MICCAI 2026

详情
AI中文摘要

可靠的像素级不确定性量化具有通过实现高保真纵向监测和区分真实病理变化与伪影来改变临床工作流程的潜力。理想情况下,这些模型提供关键治疗计划和手术干预所需的稳定性。然而,标准深度学习模型常常遭受校准不良,产生过度自信的预测,掩盖了微妙病理边界处的潜在脆弱性。为了解决这个问题,我们提出了QUAM-SM,一种使用针对性对抗搜索来识别“对抗脆弱”像素的后处理框架。通过主动寻找暴露预测不稳定性的扰动,我们的方法突出了决策最容易被翻转的区域。重要的是,该框架将认知不确定性与偶然不确定性分离。在两个具有多个专家标注的公开数据集上的实验表明,QUAM-SM在可靠性和边界敏感性方面优于标准和最新的不确定性估计方法。代码可在以下网址获取:https://github.com/HanaJebril/quam_sm

英文摘要

Reliable pixel-level uncertainty quantification holds the potential to transform clinical workflows by enabling high-fidelity longitudinal monitoring and distinguishing true pathological changes from artifacts. Ideally, these models provide the stability required for critical treatment planning and surgical intervention. However, standard deep learning models often suffer from miscalibration, yielding overconfident predictions that mask underlying vulnerabilities at subtle pathological boundaries. To address this, we propose QUAM-SM, a post-hoc framework using targeted adversarial search to identify "adversarially fragile" pixels. By actively seeking perturbations that expose predictive instability, our method highlights regions where decisions are most vulnerable to being flipped. Importantly, the framework disentangles epistemic uncertainty from aleatoric uncertainty. Experiments on two public datasets with multiple expert annotations demonstrate that QUAM-SM outperforms both standard and recent uncertainty estimation approaches in terms of reliability and boundary sensitivity. Code is available at https://github.com/HanaJebril/quam_sm

2606.19300 2026-06-18 cs.CV cs.LG 交叉投稿

Confidence is Not Reliability: Rethinking MC Dropout in Brain Tumour Segmentation

置信度不等于可靠性:重新思考脑肿瘤分割中的MC Dropout

Xin Ci Wong, Duygu Sarikaya, Kieran Zucker, Marc De Kamps, Nishant Ravikumar

Affiliations * Centre for Doctoral Training in AI for Medical Diagnosis and Care, School of Computing, University of Leeds; School of Computer Science, University of Leeds

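AI summary Using MC Dropout uncertainty estimation, shows that strong global uncertainty-error alignment (AUROC ≈ 0.97) can mask severe miscalibration in critical sub-regions such as enhancing tumour (ECE = 0.915), so sub-region-specific calibration assessment is essential for clinical safety.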

Comments Accepted for MIUA2016

详情
AI中文摘要

多参数MRI中的胶质瘤分割是治疗计划的关键组成部分。一个在治疗关键子区域上静默失败的分割模型会带来患者安全风险,而Dice分数等基于重叠的指标无法暴露这种风险。我们探究通过蒙特卡洛(MC)Dropout进行的体素级不确定性估计能否可靠地识别临床关键子区域中的分割错误,以及校准失败模式是否仅从标准报告指标中可检测。在126名BraTS21患者的两模型实证案例研究中,我们评估了高性能预训练SegResNet和本地训练的带有残差单元的UNet(UNet-Res)。MC dropout保持了分割准确性($|\Delta \text{Dice}|$ $<0.01$),同时实现了强不确定性-误差对齐(熵(H)的AUROC $\approx$0.97),表明不确定性正确地将错误体素排在正确体素之上。基于熵的患者分层识别出一个高不确定性亚组,其分割性能显著较低(全肿瘤Dice中位数$0.835$ vs. $0.925$),支持不确定性作为实用的分诊信号。然而,全局对齐可能掩盖重要的区域特异性差异。尽管AUROC相似,UNet-Res在增强肿瘤熵上接近零($0.054$),期望校准误差(ECE)为$0.915$,Dice仅为$0.714$,表明在最临床关键子区域上置信度严重误校准,这是标准Dice和AUROC报告无法发现的失败模式。这些发现表明,强不确定性-误差对齐对于临床安全是必要但不充分的:在选择临床部署模型时,子区域特异性校准评估必须伴随AUROC评估。

英文摘要

Glioma segmentation in multiparametric MRI is a critical component of treatment planning. A segmentation model that fails silently on treatment-critical sub-regions represents a patient safety risk that overlap-based metrics such as Dice scores cannot expose. We ask whether voxel-level uncertainty estimation via Monte Carlo (MC) Dropout can reliably identify segmentation errors in clinically critical sub-regions, and whether calibration failure modes are detectable from standard reporting metrics alone. In an empirical two-model case study on 126 BraTS21 patients, we evaluate a high-performance pretrained SegResNet and a locally trained UNet with residual units (UNet-Res). MC dropout preserved segmentation accuracy ($|\Delta\text{Dice}| < 0.01$) while achieving strong uncertainty-error alignment (AUROC for entropy (H) $\approx 0.97$), indicating uncertainty correctly ranks erroneous voxels above correct ones. Entropy-based patient stratification identified a high-uncertainty subgroup with substantially lower segmentation performance (median whole-tumour Dice $0.835$ vs. $0.925$), supporting uncertainty as a practical triage signal. However, global alignment can mask important region-specific differences. Despite similar AUROC, UNet-Res exhibited near-zero enhancing tumour entropy ($0.054$) and Expected Calibration Error (ECE) of $0.915$, with a Dice of only $0.714$, indicating severely miscalibrated confidence on the most clinically critical sub-region, a failure mode invisible to standard Dice and AUROC reporting. These findings demonstrate that strong uncertainty-error alignment is necessary but insufficient for clinical safety: sub-region-specific calibration assessment must accompany AUROC evaluation when selecting models for clinical deployment.
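
The two quantities the abstract contrasts, predictive entropy from MC Dropout and Expected Calibration Error, can be computed as in the sketch below. It assumes the stochastic forward passes have already been run and stacked into an array; the array layout, the bin count, and the function names are illustrative choices rather than the paper's evaluation code.

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Entropy (nats) of the mean prediction.
    mc_probs: array of shape (T, N, C) holding softmax outputs from
    T stochastic (dropout-enabled) passes over N voxels and C classes."""
    mean_probs = mc_probs.mean(axis=0)
    return -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=1)

def expected_calibration_error(confidence, correct, n_bins=10):
    """Standard binned ECE: weighted mean of |accuracy - confidence| per bin."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap   # in_bin.mean() is the bin's weight
    return ece

# Toy case: every pass agrees and is very confident, but most voxels are wrong.
conf = np.array([0.95, 0.95, 0.95, 0.95])
hit = np.array([1, 0, 0, 0])
print("ECE:", expected_calibration_error(conf, hit))
```

The toy case mirrors the failure mode described above: when all passes agree, entropy is close to zero, yet confidence can still sit far from accuracy, which only a calibration metric exposes.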

2504.14798 2026-06-18 cs.LG cs.CV 版本更新

RUB: Evaluating Residual Knowledge in Unlearned Models

RUB: 评估未学习模型中的残留知识

Hao Xuan, Xingyu Li

Affiliations * Electrical and Computer Engineering, University of Alberta

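AI summary Proposes the principle of Robust Unlearning and a unified benchmark, RUB, which uses the Unlearning Mapping Attack (UMA) to detect residual information and reveals that existing unlearning methods remain vulnerable under adversarial evaluation.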

详情
Journal ref
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2026, pages 8550-8559
AI中文摘要

机器未学习(MUL)已成为隐私保护和内容监管的关键机制,然而当前技术往往无法保证完全移除敏感信息。虽然现有工作大多关注验证未学习的执行,但它们忽略了模型在面对对抗性恢复遗忘知识尝试时是否保持鲁棒性的关键问题。在这项工作中,我们倡导鲁棒未学习原则,要求模型既与重新训练的模型不可区分,又能抵御多样化的对抗威胁。为实例化这一原则,我们提出了一个统一基准RUB(鲁棒未学习基准),系统评估未学习算法在分类、图像到图像重建和文本到图像合成中的鲁棒性。在此框架内,我们引入未学习映射攻击(UMA)作为检测残留信息的通用方法,并展示现有攻击策略如何适应此框架,只要它们符合通用UMA框架。我们在判别式和生成式任务上的实验表明,最先进的未学习方法在这些评估下仍然脆弱,即使通过了标准验证指标。通过将鲁棒性定位为核心标准并提供对抗评估基准,我们希望RUB能为更可靠和安全的未学习实践铺平道路。RUB中的代码库和模型检查点将公开发布。

英文摘要

Machine Unlearning (MUL) has emerged as a key mechanism for privacy protection and content regulation, yet current techniques often fail to guarantee the complete removal of sensitive information. While most existing works focus on verifying the execution of unlearning, they overlook the critical question of whether models remain robust against adversarial attempts to recover forgotten knowledge. In this work, we advocate for the principle of Robust Unlearning, which requires models to be both indistinguishable from retrained counterparts and resilient against diverse adversarial threats. To instantiate this principle, we propose a unified benchmark, RUB (Robust Unlearning Benchmark), that systematically evaluates the robustness of unlearning algorithms across classification, image-to-image reconstruction, and text-to-image synthesis. Within this framework, we introduce the Unlearning Mapping Attack (UMA) as a generalizable method to detect residual information, and demonstrate how existing attack strategies can be adapted to it, provided they conform to the generic UMA formulation. Our experiments across discriminative and generative tasks reveal that state-of-the-art unlearning methods remain vulnerable under these evaluations, even when passing standard verification metrics. By positioning robustness as the central criterion and providing a benchmark for adversarial evaluation, we hope RUB paves the way toward more reliable and secure unlearning practices. The codebase and model checkpoints in RUB will be published.

2505.03646 2026-06-18 cs.LG cs.AI cs.CV 版本更新

Revealing Hidden Vulnerabilities in Autoencoders through Gradient Signal Restoration

通过梯度信号恢复揭示自编码器中的隐藏漏洞

Chethan Krishnamurthy Ramanaik, Arjun Roy, Tobias Callies, Eirini Ntoutsi

Affiliations * University of the Bundeswehr Munich

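AI summary Addresses the overestimation of autoencoder robustness caused by vanishing gradients in adversarial attacks. The proposed GRILL framework restores the gradient signal, significantly increasing attack effectiveness and exposing hidden vulnerabilities.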

详情
AI中文摘要

深度自编码器(AE)的对抗鲁棒性受到的关注远少于判别模型,尽管其压缩的潜在表示会导致病态映射,从而放大小的输入扰动并破坏重建稳定性。现有的AE白盒攻击通过优化范数有界的对抗扰动以最大化重建损失,往往收敛到次优扰动,从而可能高估AE的鲁棒性。我们表明,这种限制与通过病态层反向传播时对抗损失梯度消失有关,这些病态层的中间权重矩阵具有接近零的奇异值。为了解决这个问题,我们提出了GRILL(病态层中的梯度信号恢复)框架,旨在减轻梯度退化并提高编码器-解码器架构中对抗鲁棒性评估的可靠性。GRILL旨在缓解优化过程中的对抗梯度退化,使攻击能够在固定范数约束下更好地逼近高失真扰动。通过在多种AE架构上的广泛实验,包括样本特定和通用攻击,以及标准和自适应攻击设置,我们表明GRILL显著提高了攻击有效性,从而暴露了现有攻击限制所隐藏的漏洞。除了AE之外,我们提供了初步证据表明现代多模态编码器-解码器架构也存在类似的漏洞。

英文摘要

Adversarial robustness of deep autoencoders (AEs) has received less attention than that of discriminative models, although their compressed latent representations induce ill-conditioned mappings that can amplify small input perturbations and destabilize reconstructions. Existing white-box attacks for AEs, which optimize norm-bounded adversarial perturbations to maximize reconstruction damage, often converge to suboptimal perturbations, thereby potentially overstating AE robustness. We show that this limitation is linked to vanishing adversarial loss gradients during backpropagation through ill-conditioned layers, associated with near-zero singular values in their intermediate weight matrices. To address this, we propose GRILL (Gradient Signal Restoration in Ill-Conditioned Layers), a framework that mitigates adversarial gradient degradation during optimization, enabling attacks to better approximate high-distortion perturbations under fixed norm constraints and improving the reliability of adversarial robustness evaluation in encoder-decoder architectures. Through extensive experiments across multiple AE architectures, under both sample-specific and universal attacks, as well as standard and adaptive attack settings, we show that GRILL significantly increases attack effectiveness, thereby exposing vulnerabilities hidden by existing attack limitations. Beyond AEs, we provide preliminary evidence that modern multimodal encoder-decoder architectures exhibit similar vulnerabilities.
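
The link between near-zero singular values and vanishing attack gradients can be seen on a stack of purely linear layers. This sketch only illustrates the failure mode; it is not the GRILL procedure, which the abstract does not spell out, and the layer sizes and values are made up for the demonstration.

```python
import numpy as np

def input_gradient(weights, grad_out):
    """Backpropagate grad_out through a stack of linear layers y = W x.
    Each layer multiplies the gradient by W.T, so components along
    directions with tiny singular values are scaled down at every layer."""
    g = np.asarray(grad_out, dtype=float)
    for W in reversed(weights):
        g = W.T @ g
    return g

def smallest_singular_value(W):
    return float(np.linalg.svd(W, compute_uv=False).min())

# One healthy direction (singular value 1) and one nearly collapsed (1e-6).
W = np.diag([1.0, 1e-6])
g_healthy = input_gradient([W, W], [1.0, 0.0])
g_collapsed = input_gradient([W, W], [0.0, 1.0])
print("sigma_min:", smallest_singular_value(W))
print("healthy direction gradient norm:", np.linalg.norm(g_healthy))
print("collapsed direction gradient norm:", np.linalg.norm(g_collapsed))
```

After two such layers the gradient along the collapsed direction is twelve orders of magnitude smaller than along the healthy one, so a gradient-based attack would barely move in that direction even if the reconstruction is sensitive to it.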

2606.16214 2026-06-18 cs.LG cs.AI 版本更新

Calibrated Sampling-Free Uncertainty Estimation in Bayesian Deep Learning

贝叶斯深度学习中的校准无采样不确定性估计

Tobias Jan Wieczorek, Leon de Andrade, Thomas Möllenhoff, Marcus Rohrbach

Affiliations * TU Darmstadt & hessian.AI, Darmstadt, Germany; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

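AI summary Proposes Calibrated Variance Propagation (CVP), which combines a new propagation method for normalization layers, recent techniques for handling activation functions, and a light calibration step to estimate uncertainty in a single forward pass, with accuracy comparable to MC sampling on transformers and CNNs at a fraction of the cost.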

详情
AI中文摘要

现代深度学习模型仍然以过度自信而闻名,限制了它们在高风险应用中的可靠性。贝叶斯方法通过学习模型参数的分布来应对这一问题,最近的进展使得在大规模架构上以与AdamW相当的成本实现这一目标成为可能。然而,测试时仍存在一个挑战:预测必须对从后验中采样的权重进行多次前向传播的平均,这代价高昂。方差传播提供了一种高效的替代方案,在单次前向传播中计算每层不确定性的解析近似。虽然此类技术对MLP有效,但由于现代架构的深度增加和层类型多样性,其扩展仍然具有挑战性。为填补这一空白,我们提出了校准方差传播(CVP),它引入了一种新的归一化层传播方法,结合了处理激活函数的近期技术,并通过轻量校准步骤吸收残差误差。CVP在Transformer和CNN上产生与MC采样相当准确的不确定性估计,而成本仅为极小部分。与先前的方差传播工作相比,CVP在BEiT-3上对视觉推理(NLVR2)的$0.5\%$风险覆盖率从$8.2\%$提高到$14.6\%$,在ViLT上对VQAv2从$2.6\%$提高到$10.8\%$,且增益扩展到卷积架构。

英文摘要

Modern deep learning models remain notoriously prone to overconfidence, limiting their reliability in high-stakes applications. Bayesian methods aim to counter this by learning a distribution over model parameters, and recent advances now make this feasible for large-scale architectures at costs comparable to AdamW. However, a challenge remains at test time: predictions must be averaged across many forward passes with weights sampled from the posterior, which is prohibitively expensive. Variance propagation offers an efficient alternative, computing layer-wise analytical approximations of uncertainty in a single forward pass. While such techniques are effective for MLPs, their extension to modern architectures remains challenging, due to increased depth and diversity of layer types. To fill this gap, we propose Calibrated Variance Propagation (CVP), which introduces a new propagation method for normalization layers, combines it with recent techniques for handling activation functions, and absorbs residual error through a light calibration step. CVP yields comparably accurate uncertainty estimates to MC sampling across transformers and CNNs, at a fraction of the cost. Against prior variance propagation work, CVP improves coverage at $0.5\%$ risk from $8.2\%$ to $14.6\%$ with BEiT-3 on Visual Reasoning (NLVR2) and from $2.6\%$ to $10.8\%$ with ViLT on VQAv2, with gains extending to convolutional architectures.
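
For readers unfamiliar with variance propagation, the sketch below shows the generic idea for a linear layer followed by a ReLU, assuming independent Gaussian weights and activations (a mean-field approximation). It is textbook moment matching, not CVP itself: the paper's normalization-layer rule and calibration step are not described in the abstract and are not reproduced here.

```python
import numpy as np
from math import erf, sqrt, pi

_erf = np.vectorize(erf)

def linear_moments(m_in, v_in, W_mean, W_var, b):
    """Mean and variance of y = W x + b for independent W_ij ~ (W_mean, W_var)
    and x_j ~ (m_in, v_in). Uses Var(w x) = mu_w^2 v + s_w^2 (m^2 + v)."""
    m_out = W_mean @ m_in + b
    v_out = (W_mean ** 2) @ v_in + W_var @ (m_in ** 2 + v_in)
    return m_out, v_out

def relu_moments(m, v):
    """Mean and variance of max(0, x) for x ~ N(m, v), element-wise."""
    m = np.asarray(m, dtype=float)
    v = np.asarray(v, dtype=float)
    s = np.sqrt(np.maximum(v, 1e-12))
    a = m / s
    cdf = 0.5 * (1.0 + _erf(a / sqrt(2.0)))
    pdf = np.exp(-0.5 * a ** 2) / sqrt(2.0 * pi)
    mean = m * cdf + s * pdf
    second = (m ** 2 + v) * cdf + m * s * pdf
    return mean, np.maximum(second - mean ** 2, 0.0)

# One pass yields a mean and a variance per unit, with no weight sampling.
m1, v1 = linear_moments(np.array([1.0, 1.0]), np.array([0.0, 0.0]),
                        np.array([[1.0, 2.0]]), np.array([[0.1, 0.2]]),
                        np.array([0.0]))
m2, v2 = relu_moments(m1, v1)
print(m1, v1, m2, v2)
```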

2508.02158 2026-06-18 cs.IT cs.CR cs.DS cs.LG math.IT math.ST stat.TH 版本更新

Robust Detection of Planted Subgraphs in Semi-Random Models

半随机模型中植入子图的鲁棒检测

Dor Elimelech, Wasim Huleihel

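AI summary Studies planted subgraph detection under a semi-random model. With an adversary present, detection is information-theoretically impossible for subgraphs of strongly sub-logarithmic maximum density, while the statistical limits for super-logarithmic density remain essentially unchanged. The paper also designs a computationally efficient, robust detection algorithm.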

Comments 38 pages, 2 figures

详情
AI中文摘要

在Erdös-Rényi随机图中检测植入子图已被广泛研究,产生了丰富的刻画统计和计算阈值的结果。然而,大多数先前的工作假设纯随机生成模型,使得所得算法在面对现实扰动时可能脆弱。本文开创性地研究了植入子图检测问题的半随机模型,其中允许对抗者在图被揭示给统计学家之前移除植入子图外的边。关键的是,统计学家仍然不知道哪些边被移除,这给推理任务带来了根本性挑战。我们建立了该半随机模型下检测的基本统计极限,揭示了尖锐的二分性。具体而言,对于具有强次对数最大密度的植入子图,在存在对抗者的情况下检测在信息论上变得不可能——尽管在经典随机模型中某些植入子图是可能的。与此形成鲜明对比的是,对于具有超对数密度的子图,统计极限基本保持不变;我们证明最优(尽管计算上不可行)的似然比检验仍然是鲁棒的。在这些统计边界之外,我们设计了一种新的计算高效且鲁棒的检测算法,并为其性能提供了严格的统计保证。我们的结果为植入子图检测建立了第一个鲁棒框架,并为半随机模型、计算-统计权衡和图推理问题中的鲁棒性研究开辟了新方向。

英文摘要

Detection of planted subgraphs in Erdős-Rényi random graphs has been extensively studied, leading to a rich body of results characterizing both statistical and computational thresholds. However, most prior work assumes a purely random generative model, making the resulting algorithms potentially fragile in the face of real-world perturbations. In this work, we initiate the study of semi-random models for the planted subgraph detection problem, wherein an adversary is allowed to remove edges outside the planted subgraph before the graph is revealed to the statistician. Crucially, the statistician remains unaware of which edges have been removed, introducing fundamental challenges to the inference task. We establish fundamental statistical limits for detection under this semi-random model, revealing a sharp dichotomy. Specifically, for planted subgraphs with strongly sub-logarithmic maximum density, detection becomes information-theoretically impossible in the presence of an adversary, despite being possible for some planted subgraphs in the classical random model. In stark contrast, for subgraphs with super-logarithmic density, the statistical limits remain essentially unchanged; we prove that the optimal (albeit computationally intractable) likelihood ratio test remains robust. Beyond these statistical boundaries, we design a new computationally efficient and robust detection algorithm, and provide rigorous statistical guarantees for its performance. Our results establish the first robust framework for planted subgraph detection and open new directions in the study of semi-random models, computational-statistical trade-offs, and robustness in graph inference problems.

2602.21160 2026-06-18 stat.ML cs.LG stat.AP stat.ME 版本更新

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

不仅多少,而且何处:将认知不确定性分解为每类贡献

Mame Diarra Toure, David A. Stephens

Affiliations * Department of Mathematics and Statistics

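AI summary Standard epistemic uncertainty measures in safety-critical classification cannot distinguish between classes. The paper decomposes mutual information into a per-class vector $C_k$ via a second-order Taylor expansion, with a $1/\mu_k$ weighting that corrects boundary suppression, and validates it on selective prediction for diabetic retinopathy, out-of-distribution detection, and a label-noise study.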

Comments 8 pages, 17 figures. Accepted at UAI 2026

详情
Journal ref
Forty-Second Annual Conference on Uncertainty in Artificial Intelligence, 2026. https://openreview.net/forum?id=cxuWscJmAr
AI中文摘要

在安全关键分类中,失败的代价往往是不对称的,然而贝叶斯深度学习用单个标量——互信息(MI)来总结认知不确定性,这无法区分模型的无知涉及良性类别还是安全关键类别。我们将MI分解为每类向量$C_k(x)=\sigma_k^{2}/(2\mu_k)$,其中$\mu_k{=}\mathbb{E}[p_k]$,$\sigma_k^2{=}\mathrm{Var}[p_k]$,计算基于后验样本。该分解来自熵的二阶泰勒展开;$1/\mu_k$加权校正了边界抑制,使$C_k$在稀有类别和常见类别之间具有可比性。根据构造,$\sum_k C_k \approx \mathrm{MI}$,并且伴随的偏度诊断标志可识别近似退化的输入。在刻画$C_k$的公理性质后,我们在三个任务上验证了它:(i)糖尿病视网膜病变的选择性预测,其中关键类别的$C_k$相比MI降低了34.7%的选择性风险,相比方差基线降低了56.2%;(ii)临床和图像基准上的分布外检测,其中$\sum_k C_k$取得了最高的AUROC,并且每类视角暴露了MI无法察觉的不对称偏移;(iii)受控的标签噪声研究,其中在端到端贝叶斯训练下,$\sum_k C_k$对注入的偶然噪声的敏感性低于MI,而在迁移学习下两种度量均退化。在所有任务中,后验近似的质量对不确定性的影响至少与度量选择本身一样强,这表明不确定性如何通过网络传播与其如何被度量同等重要。

英文摘要

In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k{=}\mathbb{E}[p_k]$ and $\sigma_k^2{=}\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7% over MI and 56.2% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
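
The per-class decomposition is easy to compute from posterior samples. The sketch below implements $C_k = \sigma_k^2 / (2\mu_k)$ as stated in the abstract and compares its sum with the exact mutual information; using the population variance (`ddof=0`) and the small epsilon guards are assumptions of this sketch, and the skewness diagnostic is not shown.

```python
import numpy as np

def per_class_epistemic(probs):
    """C_k = Var[p_k] / (2 * E[p_k]) for one input.
    probs: array of shape (S, K), class probabilities from S posterior samples."""
    mu = probs.mean(axis=0)
    var = probs.var(axis=0)          # population variance across samples
    return var / (2.0 * np.maximum(mu, 1e-12))

def mutual_information(probs):
    """Exact MI: entropy of the mean prediction minus mean per-sample entropy."""
    mu = probs.mean(axis=0)
    entropy_of_mean = -np.sum(mu * np.log(mu + 1e-12))
    mean_entropy = -np.mean(np.sum(probs * np.log(probs + 1e-12), axis=1))
    return entropy_of_mean - mean_entropy

# Two posterior samples that disagree mildly on a binary problem.
samples = np.array([[0.6, 0.4],
                    [0.4, 0.6]])
c = per_class_epistemic(samples)
print("C_k:", c, "sum:", c.sum(), "MI:", mutual_information(samples))
```

The approximation comes from expanding $-p\ln p$ around $\mu_k$: its second derivative is $-1/\mu_k$, so the expected entropy term drops by about $\sigma_k^2/(2\mu_k)$ per class, and summing those gaps recovers MI to second order.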

9. Graph Learning and Structured Data 8 papers

2606.18317 2026-06-18 cs.LG 新提交

Enhanced Graph Neural Networks using K-Hop Gaussian Diffusion

使用K跳高斯扩散增强图神经网络

Xuling Zhang, Peng Wang, Daiyan Li, Aoran Huang, Zeiwei Chen, Yongkui Yang

Affiliations * Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Southern University of Science and Technology

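AI summary Proposes a K-hop Gaussian diffusion kernel as a preprocessing module that balances local and global information through multi-hop diffusion with Gaussian weighting, outperforming conventional message passing and existing diffusion methods on noisy or structurally complex graphs.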

Comments 5 pages, 3 figures

详情
AI中文摘要

大多数图神经网络核心依赖于图卷积,通常实现为直接(单跳)邻居之间的消息传递。在许多现实世界的图中,边可能带有噪声或定义不明确,限制了信息传播到局部邻域。现有的扩散核,如个性化PageRank和热核,通过全局传播缓解了这个问题,但仍然难以处理复杂的局部结构和远距离节点噪声。为了解决这些限制,我们提出了一种K跳高斯扩散核作为图数据的预处理模块。KHG引入了多跳扩散,并对远程节点进行高斯加权,在应用标准GNN之前平衡局部和全局信息传播。在多个基准数据集上的实验表明,KHG显著优于传统的消息传递GNN,以及PPR和热核扩散,特别是在噪声或结构复杂的图中。

英文摘要

At their core, most graph neural networks (GNNs) rely on graph convolutions, typically implemented as message passing between direct (single-hop) neighbors. In many real-world graphs, edges can be noisy or poorly defined, which restricts information propagation to local neighborhoods. Existing diffusion kernels, such as Personalized PageRank (PPR) and the Heat Kernel, alleviate this issue through global propagation, but still struggle with complex local structures and noise from distant nodes. To address these limitations, we propose a K-Hop Gaussian (KHG) diffusion kernel as a preprocessing module for graph data. KHG introduces multi-hop diffusion with Gaussian weighting for remote nodes, balancing local and global information propagation before applying standard GNNs. Experiments on multiple benchmark datasets demonstrate that KHG significantly outperforms traditional message-passing GNNs, as well as PPR and Heat Kernel diffusion, particularly in noisy or structurally complex graphs.
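
One plausible reading of a K-hop diffusion kernel with Gaussian weighting is a truncated sum of powers of a normalized adjacency matrix, with hop weights that decay as a Gaussian in the hop count. The exact weighting, normalization, and use of self-loops in KHG are not given in the abstract, so everything below (including the parameter `sigma`) is an assumption for illustration.

```python
import numpy as np

def khop_gaussian_diffusion(adj, K=3, sigma=1.0):
    """S = sum_{k=0..K} w_k T^k with w_k proportional to exp(-k^2 / (2 sigma^2)).
    T is the symmetrically normalized adjacency with self-loops.
    The dense result can replace the adjacency fed to a standard GNN."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    T = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    hops = np.arange(K + 1)
    weights = np.exp(-hops ** 2 / (2.0 * sigma ** 2))
    weights /= weights.sum()                      # weights sum to 1
    S = np.zeros((n, n))
    power = np.eye(n)                             # T^0
    for w in weights:
        S += w * power
        power = power @ T
    return S

# Path graph 0 - 1 - 2: nodes 0 and 2 are two hops apart.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
S = khop_gaussian_diffusion(path, K=2, sigma=1.0)
print(np.round(S, 3))
```

A small `sigma` concentrates the weight on nearby hops and a large one spreads it toward distant hops, which is one way to trade local against global propagation.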

2606.18444 2026-06-18 cs.LG cs.AI 新提交

TMR-GGNN: Credit Card Fraud Detection based on Time-Aware Multi-Relational Guided Graph Neural Network

TMR-GGNN:基于时间感知多关系引导图神经网络的信用卡欺诈检测

Rohit Tewari, Shubhankar Shilpi, Navin Chhibber, Devendra Singh Parmar, Sunil Khemka, Piyush Ranjan

Affiliations * Unysis Truist Banks Infinity Tech Group Technical Product; Fairfax, USA; Atlanta, USA; Sunnyvale, USA; Persistent Systems IEEE Vice Chair AeroSpace Chapter; Discover Financial Services; Edison, USA

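AI summary Proposes the TMR-GGNN framework, which models heterogeneous entity interactions over temporal windows, constructs a dynamic multi-relational graph, and combines a time-aware attention mechanism with a contrastive-learning decoder. A composite loss combining InfoNCE and Focal Loss addresses class imbalance and evolving fraud patterns.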

Comments 2025 2nd International Conference on Software, Systems and Information Technology (SSITCON), 7 pages

详情
AI中文摘要

近年来,由于高度不平衡的数据、不断演变的欺诈模式以及交易实体间复杂的关联结构,信用卡欺诈检测面临重大挑战。为解决这些问题,本研究提出了一种名为时间感知多关系引导图神经网络(TMR-GGNN)的新框架。具体而言,所提出的TMR-GGNN通过建模客户、商户、设备和IP在时间窗口内的异构交互,扩展了编码器-解码器图神经网络(GNN)架构。随后,该TMR-GGNN方法构建了一个动态的多关系图,并在编码器中引入时间感知关系注意力机制,以基于时间邻近性和语义上下文自适应地权衡交易相关性。因此,解码器采用对比学习模块来区分真实和合成的交易模式,同时提高模型对罕见欺诈案例的泛化能力。此外,为有效管理严重的类别不平衡并强调判别性学习,引入了结合基于信息噪声对比估计(InfoNCE)的对比损失与Focal Loss的复合损失函数。这种集成有助于改进欺诈识别,同时减少假阴性。

英文摘要

In recent years, credit card fraud detection has faced significant challenges due to highly imbalanced data, evolving fraud patterns, and complex relational structures among transaction entities. To address these issues, this research proposes a novel framework called Time-Aware Multi-Relational Guided Graph Neural Network (TMR-GGNN). TMR-GGNN extends the encoder-decoder Graph Neural Network (GNN) architecture by modeling heterogeneous interactions across customers, merchants, devices, and IPs over temporal windows. It constructs a dynamic, multi-relational graph and incorporates a time-aware relational attention mechanism within the encoder to adaptively weigh transaction relevance based on temporal proximity and semantic context. The decoder employs a contrastive learning module to distinguish between real and synthesized transaction patterns while improving the model's generalization to rare fraud cases. Additionally, to manage severe class imbalance and emphasize discriminative learning, a composite loss function is introduced that combines an Information Noise Contrastive Estimation (InfoNCE) based contrastive loss with Focal Loss. This combination helps improve fraud identification while reducing false negatives.
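
A composite of InfoNCE and Focal Loss can be written in a few lines. The sketch below uses the standard definitions of both losses; how the paper weights them, forms positive pairs, or sets the temperature is not stated in the abstract, so the mixing weight `lam` and all default values are hypothetical.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss. p: predicted fraud probability, y: labels in {0, 1}.
    The (1 - p_t)^gamma factor down-weights easy, well-classified examples."""
    p = np.clip(np.asarray(p, dtype=float), 1e-7, 1.0 - 1e-7)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE with in-batch negatives: row i of `positives` is the positive
    for row i of `anchors`; every other row serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    b = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))

def composite_loss(p, y, anchors, positives, lam=0.5):
    """Focal loss plus lam times InfoNCE (lam is a hypothetical mixing weight)."""
    return focal_loss(p, y) + lam * info_nce(anchors, positives)

emb = np.eye(2)
print(composite_loss([0.9, 0.2], [1, 0], emb, emb))
```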

2606.18621 2026-06-18 cs.LG 新提交

Towards Anomaly Detection on Relational Data

面向关系数据的异常检测

Shiyuan Li, Yunfeng Zhao, Yue Tan, Qingfeng Chen, Yixin Liu, Shirui Pan

Affiliations * Griffith University; Guangxi University

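AI summary Proposes RelAD, a framework that detects attribute anomalies and abnormal connection patterns in relational data through conditional sparse-gated attribute reconstruction and dual-view multi-relational edge reconstruction, outperforming existing methods on 6 benchmark datasets.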

详情
AI中文摘要

关系数据库广泛应用于现实系统中管理结构化数据。从这类关系数据中检测异常对于识别欺诈、风险和异常行为至关重要,但尚未得到充分探索。关键挑战在于关系数据的内在复杂性:多表属性是高维且异质的,使得稀疏的异常线索容易被正常或无关信息淹没;异常还可能表现为跨不同外键关系的异常连接模式,而现有的表格和图异常检测方法难以捕捉。为解决这些问题,我们提出RelAD,一个基于重建的框架,从属性和关系边重建中捕捉异常。RelAD包含两个核心模块:条件稀疏门控属性重建,抑制冗余的多表属性并强调异常语义块;以及双视图多关系边重建,从内在和行为实体画像中检测关系特定的异常连接。得到的属性和关系信号通过轻量级融合模块整合,产生最终异常分数。我们进一步构建了6个具有系统性异常的基准数据集,大量实验表明RelAD在取得竞争性效率的同时,始终优于其他基线方法。

英文摘要

Relational databases are widely used for managing structured data in real-world systems. Detecting anomalies from such relational data is crucial for identifying fraud, risks, and abnormal behaviors, yet remains under-explored. The key challenges lie in the intrinsic complexity of relational data: multi-table attributes are high-dimensional and heterogeneous, so sparse abnormal clues are easily overwhelmed by normal or irrelevant information; and anomalies may further manifest as abnormal connection patterns across different foreign-key relations, which existing tabular and graph anomaly detection methods are ill-suited to capture. To address these challenges, we propose RelAD, a reconstruction-based framework that captures anomalies from both attribute and relational edge reconstruction. RelAD contains two core modules: conditional sparse-gated attribute reconstruction, which suppresses redundant multi-table attributes and emphasizes abnormal semantic blocks, and dual-view multi-relational edge reconstruction, which detects relation-specific abnormal connections from both intrinsic and behavioral entity profiles. The resulting attribute and relational signals are integrated through a lightweight fusion module to produce the final anomaly score. We further construct 6 benchmark datasets with systematic anomalies, on which extensive experiments show that RelAD consistently outperforms other baselines while achieving competitive efficiency.

2606.19185 2026-06-18 cs.LG 新提交

AGDN: Learning to Solve Traveling Salesman Problem with Anisotropic Graph Diffusion Network

AGDN:利用各向异性图扩散网络学习求解旅行商问题

Bolin Shen, Ziwei Huang, Zhiguang Cao, Yushun Dong

发表机构 * Florida State University(佛罗里达州立大学) Singapore Management University(新加坡管理大学)

AI总结 提出各向异性图扩散网络(AGDN),通过MixScore转移矩阵和各向异性扩散策略,有效利用图结构信息求解旅行商问题,在多种实例规模和分布上优于现有方法。

Comments Accepted at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

详情
AI中文摘要

旅行商问题(TSP)是组合优化的基石,出现在许多实际场景中。尽管基于图的学习方法已被探索用于TSP,但如何更有效地利用图结构的问题仍然悬而未决。我们提出了各向异性图扩散网络(AGDN),一种新的图神经网络框架,旨在求解TSP。我们的方法解决了两个核心难点:(1)完全连接TSP图中缺乏信息丰富的拓扑先验,以及(2)在常用的图稀疏化技术后,最优解中丢失连接节点。为了克服这些问题,我们构建了一个MixScore转移矩阵,将节点相似性与成对距离相结合,并开发了一种各向异性图扩散策略,支持跨多跳的高效信息交换。涵盖不同实例规模和节点分布的全面实验表明,AGDN在保持计算时间竞争力的同时,始终优于现有方法。此外,AGDN能够很好地泛化到训练期间未见的问题规模和分布。实现代码已公开在:https://github.com/LabRAI/AGDN。

英文摘要

The Traveling Salesman Problem (TSP) is a cornerstone of combinatorial optimization and arises in many practical scenarios. Although graph-based learning approaches have been explored for TSP, the question of how to exploit graph structure more effectively remains open. We present the Anisotropic Graph Diffusion Network (AGDN), a new Graph Neural Network framework designed to solve TSP. Our method tackles two central difficulties: (1) the lack of an informative topological prior in fully connected TSP graphs, and (2) the loss of nodes that are connected in the optimal solution once commonly used graph sparsification techniques are applied. To overcome these issues, we construct a MixScore transition matrix that merges node similarity with pairwise distance, and we develop an anisotropic graph diffusion strategy that supports efficient information exchange across multiple hops. Comprehensive experiments spanning diverse instance sizes and node distributions show that AGDN consistently outperforms existing methods while keeping computation time competitive. Furthermore, AGDN generalizes well to problem sizes and distributions beyond those seen during training. The implementation is publicly available at: https://github.com/LabRAI/AGDN.
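
The second difficulty named in the abstract (sparsification removing connections that the optimal tour needs) can be illustrated on a toy instance. The sketch below is not AGDN and does not use the MixScore matrix; it assumes plain k-nearest-neighbour sparsification and a hypothetical six-city instance, and the helper names are made up for illustration.

```python
import itertools
import math


def tour_length(order, pts):
    """Total length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[order[i]], pts[order[(i + 1) % len(order)]])
               for i in range(len(order)))


def optimal_tour_edges(pts):
    """Brute-force the shortest closed tour; only feasible for tiny instances."""
    n = len(pts)
    best = min(((0,) + p for p in itertools.permutations(range(1, n))),
               key=lambda order: tour_length(order, pts))
    return {frozenset((best[i], best[(i + 1) % n])) for i in range(n)}


def knn_edges(pts, k):
    """Undirected edge set keeping each node's k nearest neighbours."""
    edges = set()
    for i, p in enumerate(pts):
        nearest = sorted((j for j in range(len(pts)) if j != i),
                         key=lambda j: math.dist(p, pts[j]))[:k]
        edges.update(frozenset((i, j)) for j in nearest)
    return edges


# Hypothetical instance: two tight clusters that any tour must bridge.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
tour = optimal_tour_edges(points)
for k in (2, len(points) - 1):
    dropped = tour - knn_edges(points, k)
    print(f"k={k}: {len(dropped)} optimal-tour edge(s) missing after sparsification")
```

With a small k every node keeps only neighbours inside its own cluster, so the long bridging edges that the optimal tour requires disappear from the sparsified graph. Multi-hop information exchange, as described in the abstract, is one way to compensate for this.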

2606.19303 2026-06-18 cs.LG 新提交

P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network for Deep Spatiotemporal Super-resolution

P-K-GCN:物理增强的Koopman图卷积网络用于深度时空超分辨率

Xizhuo Zhang, Zekai Wang, Fei Liu, Bing Yao

发表机构 * Department of Industrial & Systems Engineering, The University of Tennessee, Knoxville(田纳西大学诺克斯维尔分校工业与系统工程系) Charles F. Dolan School of Business, Fairfield University(费尔菲尔德大学查尔斯·F·多兰商学院) Department of Electrical Engineering & Computer Science, The University of Tennessee, Knoxville(田纳西大学诺克斯维尔分校电气工程与计算机科学系)

AI总结 提出P-K-GCN,结合样条GCN和Koopman算子理论,在非规则几何上实现时空超分辨率,并通过物理损失和理论分析保证误差降低。

详情
AI中文摘要

高保真时空动力学模拟计算成本高昂,因此需要高效的超分辨率技术从粗粒度输入重建高分辨率数据。传统数据驱动方法缺乏物理约束,而简单的物理信息学习难以处理不规则空间几何和复杂时间演化。为解决这些问题,我们提出了一种物理增强的Koopman图卷积网络(P-K-GCN),用于不规则几何上的时空超分辨率。具体地,首先设计了一个基于连续样条的GCN,直接从粗粒度图中提取空间依赖关系,并引入Koopman算子理论将非线性动力学投影到紧凑的潜空间,其中时间演化被线性化。其次,我们通过基于物理的损失增强优化目标,迫使数据驱动重建遵循物理定律,以提高预测保真度和鲁棒性。最后,我们提供了严格的理论分析,证明物理增强和Koopman正则化通过减少Rademacher复杂度和收紧泛化界,数学上保证了超分辨率误差的降低。我们在从稀疏低分辨率测量重建三维心脏几何上的高分辨率心脏电动力学上评估了我们的框架。数值实验表明,我们的方法相比基线模型实现了更高的精度。

英文摘要

High-fidelity simulation of spatiotemporal dynamics is computationally prohibitive, necessitating efficient super-resolution techniques to reconstruct high-resolution data from coarse-grained inputs. Traditional data-driven methods often lack physical constraints, and simple physics-informed learning struggles with irregular spatial geometries and intricately evolving temporal dynamics. To tackle these challenges, we propose a Physics-augmented Koopman-enhanced Graph Convolutional Network (P-K-GCN) for spatiotemporal super-resolution on irregular geometries. Specifically, a continuous spline-based GCN is first designed to extract spatial dependencies directly from the coarse graph, and Koopman operator theory is incorporated to project the nonlinear dynamics into a compact latent space where temporal progression is linearized. Second, we augment the optimization objective with a physics-based loss that forces the data-driven reconstructions to adhere to physical laws, improving predictive fidelity and robustness. Finally, we provide a rigorous theoretical analysis, establishing that the physics augmentation and Koopman regularization mathematically guarantee a reduction in super-resolution error by diminishing Rademacher complexity and tightening generalization bounds. We evaluate our framework on reconstructing spatially high-resolution cardiac electrodynamics across a 3D heart geometry from sparse low-resolution measurements. Numerical experiments demonstrate that our method achieves superior accuracy compared to baseline models.
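
The phrase "a compact latent space where temporal progression is linearized" can be written out as follows. This is a generic Koopman-autoencoder sketch under assumed notation, not the paper's exact formulation: the encoder $\phi$, decoder $\psi$, Koopman matrix $K$, and weight $\lambda$ are hypothetical symbols introduced here.

```latex
% Hypothetical notation: \phi = encoder (e.g. the spline-based GCN),
% \psi = decoder, K = finite-dimensional Koopman matrix,
% \lambda = weight of the physics-based loss.
\begin{align}
  z_t &= \phi\!\left(x_t^{\text{coarse}}\right), \\
  z_{t+1} &= K z_t \quad\Longrightarrow\quad z_{t+k} = K^{k} z_t, \\
  \hat{x}_{t+k}^{\text{fine}} &= \psi\!\left(K^{k} z_t\right), \\
  \mathcal{L} &= \mathcal{L}_{\text{data}} + \lambda\, \mathcal{L}_{\text{physics}}.
\end{align}
```

The nonlinear dynamics live in $\phi$ and $\psi$, while advancing time in the latent space reduces to repeated multiplication by $K$.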

2504.04739 2026-06-18 cs.LG cs.CY 版本更新

UST-GNN: A Unified Spatial--Topological Graph Neural Network Framework for Urban Analytics--Demonstrated through a Case Study on Urban Health Prediction

UST-GNN:面向城市分析的空间-拓扑统一图神经网络框架——以城市健康预测为例

Minwei Zhao, Sanja Scepanovic, Stephen Law, Ivica Obadic, Cai Wu, Daniele Quercia

发表机构 * University College London(伦敦大学学院) The Hong Kong University of Science(香港科学大学) Nokia Bell Labs(诺基亚贝尔实验室) Technical University of Munich(慕尼黑技术大学) University of Oxford(牛津大学)

AI总结 提出UST-GNN框架,整合邻域连通性、异质城市特征和位置嵌入,在大伦敦4835个邻域的健康预测中,严格空间交叉验证下R²提升8.4-13.2%,并引入主成分模块解释嵌入。

详情
AI中文摘要

理解社会、人口、环境与空间因素如何共同塑造城市结果,对于可持续城市发展和循证政策至关重要。传统统计方法往往难以捕捉复杂的非线性关系,而许多机器学习方法忽视了城市系统中空间自相关和网络拓扑的共同作用。近期GeoAI的进展仅部分解决了这些挑战,通常将空间效应、图结构、评估和可解释性分开处理。我们提出\textbf{UST-GNN},一个统一的空间-拓扑图神经网络框架,将邻域连通性、异质城市特征和位置/区位嵌入整合到单一表示中。使用MedSAT数据集(包含大伦敦4835个邻域的150多个环境和社会人口变量及六种处方结果),UST-GNN在严格空间交叉验证下,比强统计基线、地理增强基线和图机器学习基线表现更优,样本外$R^2$提升8.4-13.2%。我们进一步引入轻量级主成分模块,从地理角度解释学习到的节点嵌入,并将其与政策相关的协变量联系起来。结果分析恢复了已知模式,为有争议的关联提供了新视角,并揭示了值得进一步因果研究的新预测因子。这些发现共同证明了基于图的空间机器学习在城市健康分析、环境不平等评估和循证城市政策中的价值。除预测增益外,UST-GNN提供了一个统一的GeoAI分析流程,可嵌入城市数字孪生工作流,用于情景测试、监测和数据驱动的决策,以建设更健康、更可持续的城市。

英文摘要

Understanding how social, demographic, environmental, and spatial factors jointly shape urban outcomes is essential for sustainable urban development and evidence-based policy. Traditional statistical approaches often struggle to capture complex non-linear relationships, while many machine learning methods overlook the joint roles of spatial autocorrelation and network topology in urban systems. Recent advances in GeoAI have addressed these challenges only partially, often treating spatial effects, graph structure, evaluation, and interpretability separately. We present \textbf{UST-GNN}, a unified spatial--topological graph neural network framework that integrates neighbourhood connectivity, heterogeneous urban features, and positional/locational embeddings into a single representation. Using the MedSAT dataset, which contains over 150 environmental and socio-demographic variables and six prescription outcomes across 4,835 neighbourhoods in Greater London, UST-GNN outperforms strong statistical, geographically enhanced, and graph Machine Learning baselines, improving out-of-sample $R^2$ by 8.4--13.2\% under strict spatial cross-validation. We further introduce a lightweight principal-component module to interpret learned node embeddings geographically and relate them to policy-relevant covariates. The resulting analyses recover established patterns, offer new perspectives on debated associations, and reveal novel predictors warranting further causal investigation. Together, these findings demonstrate the value of graph-based spatial machine learning for urban health analytics, environmental inequality assessment, and evidence-based urban policy. Beyond predictive gains, UST-GNN provides a unified GeoAI analytical pipeline that can be embedded into urban digital twin workflows for scenario testing, monitoring, and data-informed decision-making for healthier, more sustainable cities.
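
The abstract reports results "under strict spatial cross-validation" without giving the exact scheme. One common variant is block cross-validation, where nearby neighbourhoods always land in the same fold so that spatial autocorrelation cannot leak between training and test data. The following is a minimal sketch under that assumption; the grid cell size, the coordinates, and the function name are hypothetical.

```python
from collections import defaultdict


def spatial_block_folds(coords, cell_size, n_folds):
    """Assign each point to a fold by grid cell so that a cell never straddles folds."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Round-robin over sorted cells keeps the assignment deterministic.
    for rank, cell in enumerate(sorted(cells)):
        folds[rank % n_folds].extend(cells[cell])
    return folds


# Hypothetical neighbourhood centroids (arbitrary units).
centroids = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (1.7, 1.9), (0.1, 1.2), (2.4, 2.2)]
folds = spatial_block_folds(centroids, cell_size=1.0, n_folds=3)
for k, test_idx in enumerate(folds):
    train_idx = [i for f in folds if f is not test_idx for i in f]
    print(f"fold {k}: test={test_idx} train={train_idx}")
```

Random K-fold splitting would scatter adjacent neighbourhoods across folds and tend to inflate out-of-sample $R^2$; blocking by location is the stricter test.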

2606.15633 2026-06-18 cs.LG 版本更新

Formalizing and Mitigating Structural Distortion in LLM Attention for Graph Reasoning

形式化并缓解大语言模型注意力中的结构失真以实现图推理

Donald Loveland, Puja Trivedi, Ari Weinstein, Edward W Huang, Danai Koutra

发表机构 * University of Michigan(密歇根大学) Amazon(亚马逊)

AI总结 本文形式化了大语言模型处理文本属性图时因图线性化导致的结构失真机制,并提出轻量级推理时修改方法GaLA,通过校正注意力偏差提升零样本图推理性能。

Comments Accepted to KDD 2026

详情
AI中文摘要

大语言模型(LLM)在文本属性图(TAG)推理中展现出潜力。然而,将LLM应用于图需要将其结构线性化为序列,这引入了根源于图带宽问题的失真。虽然这种失真已被证明会降低性能,但通常归因于提示设计或模型规模,其潜在机制尚不清楚。在这项工作中,我们展示了旋转位置嵌入如何将图线性化转化为与带宽相关的注意力衰减,从而抑制了在序列化序列中被迫相距较远的图相邻节点之间的注意力。这将基于LLM的图推理的焦点从提示工程和规模缩放转向纠正注意力错位。受此分析启发,我们提出了图对齐语言注意力(GaLA),一种轻量级的、推理时修改LLM的方法。GaLA将注意力偏向图相邻节点,同时保留LLM的序列归纳偏差。在TAG基准测试中,GaLA以可忽略的开销提升了性能,表明失真是基于LLM的图推理中可纠正的瓶颈。

英文摘要

Large Language Models (LLMs) have shown promise for reasoning over Text-Attributed Graphs (TAGs). However, applying LLMs to graphs requires linearizing their structure into sequences, introducing distortion rooted in the graph bandwidth problem. While this distortion has been shown to degrade performance, it is often attributed to prompt design or model scale, leaving the underlying mechanism unclear. In this work, we show \textit{how} rotary positional embeddings turn graph linearization into bandwidth-dependent attention decay, suppressing attention between graph-adjacent nodes that are forced far apart in the serialized sequence. This shifts the focus of LLM-based graph reasoning from prompt engineering and scaling toward correcting attention misalignment. Motivated by this analysis, we propose \textbf{G}raph-\textbf{a}ligned \textbf{L}anguage \textbf{A}ttention (\textbf{GaLA}), a lightweight, inference-time modification for LLMs. GaLA biases attention toward graph-adjacent nodes while preserving the LLM's sequential inductive biases. Across TAG benchmarks, GaLA improves performance with negligible overhead, demonstrating that distortion is a correctable bottleneck in LLM-based graph reasoning.
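
The idea of biasing attention toward graph-adjacent nodes can be sketched as an additive term on the attention logits before the softmax. This is an illustrative stand-in, not GaLA's actual formulation: the constant bias, the logits, and the function names are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(logits, adjacent, bias):
    """Add a constant bias to logits of graph-adjacent positions, then normalize."""
    return softmax([v + (bias if is_adj else 0.0)
                    for v, is_adj in zip(logits, adjacent)])


# Hypothetical query node attending over 5 serialized positions. Position 4 is a
# graph neighbour placed far away in the sequence, so its raw logit has decayed.
raw_logits = [2.0, 1.5, 1.0, 0.5, -1.0]
is_neighbour = [False, False, False, False, True]

plain = softmax(raw_logits)
aligned = biased_attention(raw_logits, is_neighbour, bias=2.0)
print("weight on distant neighbour:", round(plain[4], 3), "->", round(aligned[4], 3))
```

Because the bias is added rather than substituted, the original ordering among non-adjacent positions is untouched, which loosely mirrors the abstract's point about preserving the model's sequential inductive biases.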

2505.12369 2026-06-18 cs.AI cs.LG cs.LO 版本更新

Fully Geometric Multi-Hop Reasoning on Knowledge Graphs with Transitive Relations

知识图谱上具有传递关系的全几何多跳推理

Fernando Zhapa-Camacho, Robert Hoehndorf

发表机构 * KAUST Center of Excellence for Smart Health (KCSH)(智能健康卓越中心) KAUST Center of Excellence for Generative AI(生成人工智能卓越中心)

AI总结 提出GeometrE方法,将逻辑操作映射为纯几何变换,并引入传递损失函数,在保持可解释性的同时提升多跳推理性能。

Comments Accepted at ESWC 2026

详情
Journal ref
The Semantic Web. ESWC 2026. Lecture Notes in Computer Science, vol 16549. Springer, Cham (2026)
AI中文摘要

知识图谱上的多跳逻辑推理需要将逻辑语义忠实地映射到潜在空间。当前的几何嵌入方法通过将实体映射到几何区域、逻辑操作映射到潜在变换,在此任务上表现出有效性。虽然几何嵌入可以为查询回答提供直接的可解释性框架,但当前方法仅利用了实体的几何构造,未能将逻辑操作映射为纯几何变换,而是使用神经组件来学习这些操作。另一方面,纯神经方法优于几何方法,但在潜在空间中缺乏可解释性。我们提出了GeometrE,一种用于多跳推理的几何嵌入方法,它将每个逻辑操作映射为潜在空间中的纯几何操作。此外,我们引入了一个传递损失函数,并表明与现有方法不同,它可以保留对所有a,b,c的逻辑规则:r(a,b)和r(b,c) -> r(a,c)。我们的实验表明,GeometrE优于当前最先进的几何方法,并在标准基准数据集上与现有的神经方法保持竞争力。

英文摘要

Multi-hop logical reasoning on knowledge graphs requires faithfully mapping the logical semantics to latent space. Current geometric embedding methods have proven useful for this task by mapping entities to geometric regions and logical operations to latent transformations. While a geometric embedding can provide a direct interpretability framework for query answering, current methods have only leveraged the geometric construction of entities, failing to map logical operations to pure geometric transformations and, instead, using neural components to learn these operations. On the other hand, purely neural-based methods outperform geometric methods, but they lack interpretability in the latent space. We introduce GeometrE, a geometric embedding method for multi-hop reasoning that maps every logical operation to a purely geometric operation in the latent space. Additionally, we introduce a transitive loss function and show that, unlike existing methods, it can preserve the logical rule for all a,b,c: r(a,b) and r(b,c) -> r(a,c). Our experiments show that GeometrE outperforms current state-of-the-art geometric methods and remains competitive with existing neural-based methods on standard benchmark datasets.
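
The transitivity rule quoted in the abstract, written formally, together with one simple geometric reading of it. The second line is an illustration only and is an assumption made here, not GeometrE's construction: it supposes a relation is modelled by an ordering of a hypothetical scalar embedding $e(\cdot)$.

```latex
% Rule from the abstract:
\forall a, b, c:\quad r(a,b) \land r(b,c) \rightarrow r(a,c)

% Illustrative assumption (hypothetical embedding e): if r(x,y) holds exactly
% when e(x) \le e(y), the rule follows from transitivity of \le:
e(a) \le e(b) \;\land\; e(b) \le e(c) \;\Longrightarrow\; e(a) \le e(c)
```

A learned neural operator offers no such structural guarantee, which is the contrast the abstract draws between purely geometric and neural components.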

10. 迁移、元学习与持续学习 7 篇

2606.19164 2026-06-18 cs.LG cs.AI 新提交

Essential Subspace Merging for Multi-Task Learning

多任务学习的本质子空间合并

Longhua Li, Lei Qi, Xin Geng, Qi Tian

发表机构 * School of Computer Science and Engineering, Southeast University(东南大学计算机科学与工程学院) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education(教育部新一代人工智能技术及其跨学科应用重点实验室(东南大学)) Huawei Inc.(华为公司)

AI总结 提出本质子空间分解(ESD)和合并(ESM/ESM++)方法,通过正交化任务更新的主成分来减少多任务合并中的干扰,无需训练即可实现高效多任务学习。

详情
AI中文摘要

模型合并旨在通过将多个从同一预训练检查点微调得到的模型的能力集成到一个单一模型中,从而实现多任务学习。其核心挑战是任务特定参数更新之间的任务间干扰。在本文中,我们分析了任务更新引起的输出偏移,并观察到它们的能量集中在少数主方向上。我们将这些方向张成的子空间称为本质子空间。相比之下,大多数剩余方向携带的任务相关能量很少,但它们在多个任务更新中的累积会在合并过程中引起严重干扰。受此观察启发,我们提出了本质子空间分解(ESD),它根据激活偏移的主成分分解每个任务更新。基于ESD,我们引入了本质子空间合并(ESM),一种无需训练的静态合并方法,它将本质成分正交化并融合成一个紧凑的多任务模型。我们进一步将ESM扩展到ESM++,一种无需训练的动态合并方法,它将任务特定残差分解为低秩专家,并在前向推理过程中通过基于原型的路由选择最相关的专家。跨多个任务集和模型规模的大量实验表明,ESM和ESM++在减少任务间干扰的同时有效保留了任务知识。

英文摘要

Model merging aims to enable multi-task learning by integrating the capabilities of multiple models fine-tuned from the same pre-trained checkpoint into a single model. Its core challenge is inter-task interference among task-specific parameter updates. In this paper, we analyze the output shifts induced by task updates and observe that their energy is concentrated in a small number of principal directions. We call the subspace spanned by these directions the essential subspace. In contrast, most remaining directions carry little task-relevant energy, but their accumulation across multiple task updates can cause severe interference during merging. Motivated by this observation, we propose Essential Subspace Decomposition (ESD), which decomposes each task update according to the principal components of its activation shift. Based on ESD, we introduce Essential Subspace Merging (ESM), a training-free static merging method that orthogonalizes and fuses essential components into one compact multi-task model. We further extend ESM to ESM++, a training-free dynamic merging method that decomposes task-specific residuals into low-rank experts and selects the most relevant expert through prototype-based routing during forward inference. Extensive experiments across multiple task sets and model scales demonstrate that ESM and ESM++ effectively preserve task knowledge while reducing inter-task interference.
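
One plausible way to write down "decomposes each task update according to the principal components of its activation shift" is sketched below. The notation is assumed for illustration and may differ from the paper's definitions: $W_0$ is the pre-trained weight matrix, $W_t$ the weights fine-tuned on task $t$, $X_t$ a matrix of input activations on task-$t$ data, and $k$ the number of retained directions.

```latex
% Hypothetical notation; a sketch, not the paper's exact definition of ESD.
\begin{align}
  \Delta W_t &= W_t - W_0, \\
  \Delta W_t X_t &= U_t \Sigma_t V_t^{\top}
    && \text{(SVD of the output shift)}, \\
  \Delta W_t^{\mathrm{ess}} &= U_{t,k} U_{t,k}^{\top}\, \Delta W_t,
  \qquad
  \Delta W_t^{\mathrm{res}} = \Delta W_t - \Delta W_t^{\mathrm{ess}}, \\
  \rho_t(k) &= \frac{\sum_{i \le k} \sigma_{t,i}^{2}}{\sum_{i} \sigma_{t,i}^{2}}
    && \text{(fraction of shift energy captured)}.
\end{align}
```

The abstract's observation corresponds to $\rho_t(k)$ approaching 1 for small $k$, while the residual parts from many tasks are what accumulate into interference.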

2606.18567 2026-06-18 stat.ML cs.LG stat.AP stat.ME 交叉投稿

Bridging Data Gaps in Structural Fragility Modeling through Transfer Learning: Methodology and Case Studies

通过迁移学习弥合结构易损性建模中的数据空白:方法与案例研究

Narges Saeednejad, Jamie Ellen Padgett

发表机构 * Department of Civil and Environmental Engineering, Rice University(莱斯大学土木与环境工程系) Ken Kennedy Institute, Rice University(莱斯大学肯·肯尼迪研究所)

AI总结 提出以方法为中心的迁移学习框架,解决领域偏移、类别不平衡和目标标签稀缺问题,通过三个案例验证其在低数据场景下提升失效检测与预测稳定性的有效性。

Comments 24 pages, 12 figures

详情
AI中文摘要

本文提出了一个以方法为中心的迁移学习框架,用于在领域偏移、类别不平衡和目标标签稀缺的情况下进行易损性自适应,同时保持工程可解释性并支持不确定性下的决策。通过三个互补的案例研究展示了四种迁移学习策略(基于实例、基于参数、分层贝叶斯和多源):(i) 基于实例的迁移学习通过重要性加权,利用卡特里娜飓风观测数据演示了沿海桥梁易损性;(ii) 基于参数的迁移学习结合分层贝叶斯迁移学习,实现了跨层的部分合并和后验不确定性量化,利用伊恩飓风观测数据演示了住宅建筑易损性;(iii) 多源迁移学习融合多个分析易损性模型,学习源权重并进行正则化的目标域自适应,利用2001年尼斯夸利地震观测数据演示了地震桥梁易损性。在这些案例研究中,直接迁移源模型(即使用现有最先进模型)在领域偏移和严重类别不平衡下失败,而有针对性的自适应在低数据场景下显著提高了失效检测和预测稳定性。这些发现强调了在开发和自适应易损性模型时,需要对诊断、策略选择和不确定性报告提供系统指导。

英文摘要

This paper presents a methodology-centered transfer learning framework for fragility adaptation under domain shift, class imbalance, and scarce target labels while preserving engineering interpretability and supporting decision-making under uncertainty. Four transfer learning strategies (instance-based, parameter-based, hierarchical Bayesian, and multi-source) are demonstrated through three complementary case studies: (i) instance-based transfer learning via importance weighting, demonstrated on coastal bridge fragility using Hurricane Katrina observations; (ii) parameter-based transfer learning together with hierarchical Bayesian transfer learning, enabling partial pooling across strata and posterior uncertainty quantification, demonstrated on residential building fragility using Hurricane Ian observations; and (iii) multi-source transfer learning that fuses multiple analytical fragility models with learned source weights and regularized target-domain adaptation, demonstrated on seismic bridge fragility using observations from the 2001 Nisqually earthquake. Across these case studies, direct transfer of source models (i.e. using existing state-of-the-art models) fails under domain shift and severe class imbalance, while targeted adaptation substantially improves failure detection and predictive stability in low-data regimes. These findings highlight the need for systematic guidance on diagnostics, strategy selection, and uncertainty reporting when developing and adapting fragility models.

2506.14126 2026-06-18 cs.LG cs.AI 版本更新

From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging

从记忆到参数干扰:过度训练专家如何损害模型合并

Stefan Horoi, Guy Wolf, Eugene Belilovsky, Gintare Karolina Dziugaite

发表机构 * Concordia University(康考迪亚大学) Mila -- Québec AI Institute(魁北克人工智能研究所) Google DeepMind(谷歌DeepMind)

AI总结 本文研究专家模型微调过度对模型合并的影响,发现长时间微调导致记忆困难样本,造成参数干扰,降低合并性能,并提出任务相关的早停策略改善合并效果。

Comments Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

详情
AI中文摘要

现代深度学习日益以使用开放权重基础模型为特征,这些模型可以在专门数据集上进行微调。这导致了专家模型和适配器的激增,通常通过HuggingFace和AdapterHub等平台共享。模型合并最近成为一种有效利用这些现有资源的方法,使得能够组合不同模型检查点的能力。因此,形成了一种自然的流程来利用迁移学习的好处并分摊沉没训练成本:模型在通用数据上预训练,在特定任务上微调,然后合并多个检查点以获得更强大的模型。一个普遍假设是,该流程中某一阶段的改进会向下游传播,从而在后续步骤中带来收益。在这项工作中,我们通过研究专家微调如何影响模型合并来挑战这一假设。我们表明,针对个体性能优化的专家长时间微调会导致跨视觉和语言模态、多种模型规模以及完全微调和LoRA适配模型的合并性能下降。我们将这种退化追溯到对一小部分困难样本的记忆,这些样本主导了微调后期步骤。这会导致负参数干扰,并编码在合并过程中被遗忘的知识。最后,我们证明任务相关的激进早停策略可以显著改善模型合并性能。

英文摘要

Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and adapters, often shared via platforms like HuggingFace and AdapterHub. Model merging has recently emerged as an effective way to leverage these existing resources, enabling the composition of capabilities from different model checkpoints. A natural pipeline has thus formed to harness the benefits of transfer learning and amortize sunk training costs: models are pre-trained on general data, fine-tuned on specific tasks, and then multiple checkpoints are merged to obtain a more capable model. A prevailing assumption is that improvements at one stage of this pipeline propagate downstream, leading to gains at subsequent steps. In this work, we challenge that assumption by examining how expert fine-tuning affects model merging. We show that long fine-tuning of experts that optimizes for their individual performance leads to degraded merging performance across vision and language modalities, multiple model scales, and both fully fine-tuned and LoRA-adapted models. We trace this degradation to the memorization of a small set of difficult examples that dominate late fine-tuning steps. This causes negative parameter interference and encodes knowledge that is forgotten during merging. Finally, we demonstrate that task-dependent aggressive early stopping strategies can significantly improve model merging performance.

2602.09234 2026-06-18 cs.LG cs.AI 版本更新

Do Neural Networks Lose Plasticity in a Gradually Changing World?

神经网络在渐变世界中会失去可塑性吗?

Tianhui Liu, Lili Mou

发表机构 * Dept. Computing Science & Alberta Machine Intelligence Institute (Amii), University of Alberta Canada CIFAR AI Chair

AI总结 研究任务转换的突然性对神经网络可塑性损失的影响,通过输入/输出插值和任务采样模拟渐变环境,理论和实验表明可塑性损失严重程度与任务转换突然性密切相关,渐变环境下可显著减轻。

详情
AI中文摘要

持续学习已成为机器学习的热门话题。最近的研究发现了一个有趣的现象,称为可塑性丧失,指的是神经网络逐渐失去学习新任务的能力。然而,现有的可塑性研究很大程度上依赖于具有突然任务转换的基准测试,而没有检验突然性本身是否导致了观察到的可塑性损失。在本文中,我们通过输入/输出插值和任务采样模拟逐渐变化的环境,研究了转换突然性的作用。我们进行了理论和实证分析,表明可塑性损失的严重程度与任务转换的突然性密切相关,并且在环境逐渐变化时可以显著降低。

英文摘要

Continual learning has become a trending topic in machine learning. Recent studies have discovered an interesting phenomenon called loss of plasticity, referring to neural networks gradually losing the ability to learn new tasks. However, existing plasticity research largely relies on benchmarks with abrupt task transitions, without examining whether the abruptness itself contributes to the observed plasticity loss. In this paper, we investigate the role of transition abruptness by simulating gradually changing environments through input/output interpolation and task sampling. We perform theoretical and empirical analysis, showing that the severity of plasticity loss is closely tied to the abruptness of task transitions, and can be substantially reduced when the environment changes gradually.
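
The contrast between abrupt and gradual task transitions can be sketched with output interpolation. The linear ramp and midpoint switch below are assumptions chosen for illustration; the paper's actual schedules and its task-sampling variant are not given in the abstract.

```python
def interpolate(a, b, alpha):
    """Convex combination (1 - alpha) * a + alpha * b, element-wise."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def transition_schedule(n_steps, abrupt):
    """Mixing weights: a hard switch at the midpoint, or a linear ramp from 0 to 1."""
    if abrupt:
        return [0.0 if t < n_steps // 2 else 1.0 for t in range(n_steps)]
    return [t / (n_steps - 1) for t in range(n_steps)]


# Hypothetical targets for the same input under task A and task B.
target_a, target_b = [1.0, 0.0], [0.0, 1.0]
for name, abrupt in (("abrupt", True), ("gradual", False)):
    alphas = transition_schedule(5, abrupt)
    targets = [interpolate(target_a, target_b, a) for a in alphas]
    largest_jump = max(abs(alphas[t + 1] - alphas[t]) for t in range(len(alphas) - 1))
    print(name, targets, "largest per-step change:", largest_jump)
```

Both schedules start at task A and end at task B; only the largest per-step change differs, which is the quantity the abstract ties to the severity of plasticity loss.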

2303.18031 2026-06-18 cs.CV cs.AI cs.LG 版本更新

Simple Domain Generalization Methods are Strong Baselines for Open Domain Generalization

简单域泛化方法是开放域泛化的强基线

Masashi Noguchi, Shinichi Shirakawa

发表机构 * Graduate School of Environment and Information Sciences(环境与信息科学研究生院) Yokohama National University(横滨国立大学) Faculty of Environment(环境学系)

AI总结 本文评估现有域泛化方法在开放域泛化中的表现,发现简单方法CORAL和MMD与复杂方法DAML竞争力相当,并通过集成学习和Dirichlet混合数据增强简单扩展后性能接近DAML且计算成本更低。

Comments Accepted at IJCNN 2024. The code used in the experiments is available at https://github.com/shiralab/OpenDG-Eval

详情
AI中文摘要

在现实应用中,机器学习模型需要处理开放集识别(OSR),即在推理过程中出现未知类别,同时还要处理域偏移,即训练和推理阶段数据分布不同。域泛化(DG)旨在处理推理阶段目标域在模型训练期间不可访问的域偏移情况。开放域泛化(ODG)同时考虑DG和OSR。域增强元学习(DAML)是一种针对ODG的方法,但其学习过程复杂。相比之下,尽管已提出多种DG方法,但它们尚未在ODG场景下进行评估。在本研究中,我们全面评估了现有DG方法在ODG中的表现,并表明两种简单的DG方法——相关对齐(CORAL)和最大均值差异(MMD)——在多种情况下与DAML具有竞争力。此外,我们通过引入DAML中使用的技术(如集成学习和Dirichlet混合数据增强)提出了CORAL和MMD的简单扩展。实验评估表明,扩展后的CORAL和MMD可以以较低的计算成本达到与DAML相当的性能。这表明简单的DG方法及其简单扩展是ODG的强基线。

英文摘要

In real-world applications, a machine learning model is required to handle open-set recognition (OSR), where unknown classes appear during inference, in addition to domain shift, where the data distribution differs between the training and inference phases. Domain generalization (DG) aims to handle the domain shift situation where the target domain of the inference phase is inaccessible during model training. Open domain generalization (ODG) considers both DG and OSR. Domain-augmented meta-learning (DAML) is a method targeting ODG; however, it has a complicated learning process. By contrast, although various DG methods have been proposed, they have not been evaluated in ODG situations. In this study, we comprehensively evaluate the existing DG methods in ODG and show that two simple DG methods, CORrelation ALignment (CORAL) and maximum mean discrepancy (MMD), are competitive with DAML in several cases. In addition, we propose simple extensions of CORAL and MMD by introducing the techniques used in DAML, such as ensemble learning and Dirichlet mixup data augmentation. The experimental evaluation demonstrates that the extended CORAL and MMD can perform comparably to DAML with lower computational costs. This suggests that the simple DG methods and their simple extensions are strong baselines for ODG.
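
For reference, the two alignment penalties named in the abstract are usually written as below. These are the standard textbook forms for a pair of domains $S$ and $T$; how the paper combines them across several source domains is not stated in the abstract, so that part is left open.

```latex
% C_S, C_T: feature covariance matrices of the two domains; d: feature dimension;
% \varphi: feature map of a reproducing-kernel Hilbert space \mathcal{H}.
\begin{align}
  \ell_{\mathrm{CORAL}} &= \frac{1}{4d^{2}} \left\lVert C_S - C_T \right\rVert_F^{2}, \\
  \mathrm{MMD}^{2}(S, T) &= \left\lVert \frac{1}{n_S} \sum_{i=1}^{n_S} \varphi\!\left(x_i^{S}\right)
      - \frac{1}{n_T} \sum_{j=1}^{n_T} \varphi\!\left(x_j^{T}\right) \right\rVert_{\mathcal{H}}^{2}.
\end{align}
```

Both are single differentiable terms added to the classification loss, which is the sense in which the abstract calls them simple compared with a meta-learning procedure.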

2510.15551 2026-06-18 cs.CL cs.AI cs.LG 版本更新

Rethinking Cross-lingual Gaps from a Statistical Viewpoint

从统计视角重新思考跨语言差距

Vihari Piratla, Purvam Jain, Darshan Singh, Trevor Cohn, Preethi Jyothi, Partha Talukdar

发表机构 * Google DeepMind(谷歌DeepMind)

AI总结 提出跨语言差距源于目标语言响应方差,通过形式化偏差和无偏误差,并采用推理时集成方法降低方差,使跨语言迁移得分提升8%-50%以上。

Comments 30 pages

详情
AI中文摘要

任何知识片段通常以一种或少数几种自然语言表达在网页或大型语料库中。大型语言模型(LLMs)通过从源语言获取知识,并在使用目标语言查询时使其可访问,从而充当桥梁。跨语言差距是指使用目标语言而非源语言查询知识时准确率的下降。现有研究侧重于导致跨语言差距的建模或训练失败。在这项工作中,我们采取另一种视角来表征跨语言错误的性质,并假设目标语言中响应的方差是造成这一差距的关键原因。我们首次将跨语言差距形式化为有偏误差和无偏误差。通过多种控制方差并减少跨语言差距的推理时干预,我们实证验证了我们的假设。我们展示了几种测试时集成方法,这些方法降低了响应方差,从而将源-目标迁移得分提高了多达12个绝对百分点,在各种LLMs上实现了8%到超过50%的相对提升。

英文摘要

Any piece of knowledge is usually expressed in one or a handful of natural languages on the web or in any large corpus. Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it accessible when queried using target languages. A cross-lingual gap is a drop in accuracy incurred when querying knowledge in a target language rather than the source language. Existing research has focused on modeling or training failures leading to cross-lingual gaps. In this work, we take an alternative view to characterize the nature of cross-lingual error, and hypothesize that the variance of responses in the target language is a key cause of this gap. For the first time, we formalize the cross-lingual gap in terms of biased and unbiased errors. We empirically validate our hypothesis through multiple inference-time interventions that control variance and reduce the cross-lingual gap. We demonstrate a few test-time ensemble methods that reduce response variance, and thereby improve source-target transfer scores by up to 12 absolute points, yielding relative gains of 8% to over 50% across various LLMs.
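
The claim that ensembling at test time reduces response variance can be illustrated with plain majority voting. This is a generic sketch, not one of the paper's specific ensemble methods. The accuracy formula assumes independent samples and the simplified case where every wrong sample gives the same wrong answer; the per-sample accuracy of 0.6 is a made-up number.

```python
from collections import Counter
from math import comb


def majority_vote(responses):
    """Return the most frequent response; ties resolve to the earliest seen."""
    return Counter(responses).most_common(1)[0][0]


def majority_accuracy(p_correct, n_samples):
    """Probability that a strict majority of n independent samples is correct,
    in the simplified case where every wrong sample gives the same wrong answer."""
    return sum(comb(n_samples, k) * p_correct**k * (1 - p_correct)**(n_samples - k)
               for k in range(n_samples // 2 + 1, n_samples + 1))


# Hypothetical target-language samples for one query.
samples = ["Paris", "Lyon", "Paris", "Paris", "Lyon"]
print(majority_vote(samples))
for n in (1, 5, 9):
    print(n, round(majority_accuracy(0.6, n), 3))
```

Voting only helps when the error is mostly variance: if the model is consistently wrong in the target language (a biased error, in the abstract's terms), more samples reinforce the wrong answer.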

2602.17187 2026-06-18 stat.ML cs.LG 版本更新

Anti-causal domain generalization: Leveraging unlabeled data

反因果域泛化:利用无标签数据

Sorawit Saengkyongam, Juan L. Gamella, Andrew C. Miller, Jonas Peters, Nicolai Meinshausen, Christina Heinze-Deml

发表机构 * Apple(苹果公司) ETH Zürich(苏黎世联邦理工学院)

AI总结 针对反因果设置下的域泛化问题,提出利用无标签数据估计环境扰动方向,通过惩罚模型对协变量均值和协方差变化的敏感性实现鲁棒性,并提供最坏情况最优性保证。

Comments Accepted at the International Conference on Machine Learning (ICML) 2026

详情
AI中文摘要

域泛化问题关注的是学习在部署到新的、未见过的环境时对分布变化具有鲁棒性的预测模型。现有方法通常需要来自多个训练环境的标记数据,这在标记数据稀缺时限制了它们的适用性。在这项工作中,我们研究了反因果设置下的域泛化,其中结果导致观察到的协变量。在这种结构下,影响协变量的环境扰动不会传播到结果,这促使我们对模型对这些扰动的敏感性进行正则化。关键在于,估计这些扰动方向不需要标签,使我们能够利用来自多个环境的无标签数据。我们提出了两种方法,分别惩罚模型对跨环境协变量均值和协方差变化的敏感性,并证明这些方法在特定环境类别下具有最坏情况最优性保证。最后,我们在一个受控物理系统和一个生理信号数据集上展示了我们方法的实证性能。

英文摘要

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.
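
A penalty on sensitivity to cross-environment mean shifts can be sketched for a linear predictor as follows. The exact regularizer in the paper is not given in the abstract; the squared projection of each environment's mean shift onto the weight vector is an assumption made here, and the data are hypothetical. Note that no labels are used anywhere.

```python
def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mean_shift_penalty(weights, env_covariates):
    """Sum over environments of (w . (mu_e - mu_bar))^2, using covariates only."""
    env_means = [mean(x) for x in env_covariates]
    grand_mean = mean(env_means)
    penalty = 0.0
    for mu in env_means:
        shift = [m - g for m, g in zip(mu, grand_mean)]
        penalty += sum(w * s for w, s in zip(weights, shift)) ** 2
    return penalty


# Hypothetical unlabeled covariates from two environments: only feature 1 shifts.
env_a = [[0.0, 1.0], [1.0, 1.0]]
env_b = [[0.0, 3.0], [1.0, 3.0]]
print(mean_shift_penalty([1.0, 0.0], [env_a, env_b]))  # ignores the shifting feature
print(mean_shift_penalty([0.0, 1.0], [env_a, env_b]))  # relies on the shifting feature
```

A predictor that leans on the shifting feature is penalized, while one that ignores it is not. In the anti-causal setting this is reasonable because, as the abstract notes, such perturbations do not propagate to the outcome.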

11. 数据集、基准与评测 39 篇

2606.18307 2026-06-18 cs.LG cs.AI 新提交

DRIFT: Refining Instruction Data via On-Policy Data Attribution

DRIFT: 通过在线策略数据归因优化指令数据

Zefan Wang, Lincheng Li, Tianyu Yu, Yuan Yao

发表机构 * Tsinghua University(清华大学)

AI总结 提出DRIFT方法,利用在线策略影响函数解决标准影响函数在指令微调数据归因中的近邻偏差和梯度范数偏差问题,通过模型自身生成作为验证目标,提升7B模型性能上限。

详情
AI中文摘要

优化监督微调(SFT)的训练数据分布决定了大型语言模型(LLMs)的能力。虽然现有的数据筛选方法在有限预算下加速训练方面表现出色,但它们不太适合提升能力上限。这里的挑战不再是识别一个保持性能的较小子集,而是将数据分布优化为最能提升最终模型的实例。为了解决这个问题,我们探索了使用影响函数(IF)进行实例级数据归因。我们发现标准IF公式在此设置中存在两个结构限制:由离策略验证目标引起的近邻偏差,以及对梯度范数的严重偏向。我们提出了DRIFT(通过在线策略影响函数进行数据优化用于监督微调)。DRIFT不依赖外部参考数据,而是利用模型的在线策略生成作为验证目标,这在经验上最小化了参数近邻偏差,并更好地符合IF的局部邻域假设。它进一步基于轨迹正确性应用符号加权,并针对梯度操纵问题对影响分数进行去偏,使得少量验证查询能够作为可靠锚点来归因整个数据集。在7B参数指令和推理模型上的实验表明,DRIFT持续提升了两者的性能上限,优于现有的数据筛选基线。

英文摘要

Optimizing the training data distribution for Supervised Fine-Tuning (SFT) dictates the capability of Large Language Models (LLMs). While existing data curation methods excel at accelerating training under constrained budgets, they are less suited to elevating the capability upper bound. The challenge here is no longer to identify a smaller subset that preserves performance, but to refine the data distribution toward instances most capable of improving the final model. To address this problem, we explore instance-level data attribution using Influence Functions (IF). We identify that standard IF formulations struggle in this setting due to two structural limitations: a proximity gap caused by off-policy validation targets, and a severe bias towards gradient norm. We propose DRIFT (Data Refinement via On-Policy Influence Functions for Supervised Fine-Tuning). Instead of relying on external reference data, DRIFT utilizes the model's on-policy rollouts as validation targets, which empirically minimizes the parameter proximity gap and better aligns with the local neighborhood assumption of IF. It further applies signed weighting based on trajectory correctness and debiases influence scores against the gradient hacking issue, allowing a small set of validation queries to act as reliable anchors for attributing the full dataset. Experiments on 7B-parameter instruction and reasoning models show that DRIFT consistently raises the performance ceiling on both, outperforming existing data curation baselines.
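
The "severe bias towards gradient norm" can be seen in a first-order influence proxy, where a training example is scored by the dot product of its loss gradient with a validation gradient. The sketch below drops the inverse-Hessian term of classical influence functions and uses cosine normalization as a stand-in for debiasing; neither choice is claimed to be DRIFT's actual procedure, and the model and data are hypothetical.

```python
import math


def grad_squared_error(w, x, y):
    """Gradient of (w . x - y)^2 with respect to w."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [2.0 * residual * xi for xi in x]


def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def influence(g_train, g_val, normalize):
    """First-order influence proxy: gradient alignment, optionally norm-free (cosine)."""
    score = dot(g_train, g_val)
    if normalize:
        score /= math.sqrt(dot(g_train, g_train)) * math.sqrt(dot(g_val, g_val))
    return score


w = [0.0, 0.0]                                        # hypothetical current parameters
g_val = grad_squared_error(w, [1.0, 0.0], 1.0)        # validation target
g_aligned = grad_squared_error(w, [1.0, 0.0], 1.0)    # well-aligned training example
g_loud = grad_squared_error(w, [1.0, 3.0], 10.0)      # large gradient, weaker alignment

for name, g in (("aligned", g_aligned), ("loud", g_loud)):
    print(name, influence(g, g_val, False), round(influence(g, g_val, True), 3))
```

The raw score ranks the large-gradient example first even though it points mostly in another direction; after normalization the well-aligned example wins. That reversal is the kind of bias the abstract says must be removed before scores can be used for data selection.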

2606.18338 2026-06-18 cs.LG astro-ph.EP astro-ph.IM 新提交

ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets

ThousandWorlds:用于潜在宜居系外行星气候模拟的基准

Edward T. Stevenson, Mei Ting Mak, Eric Wolf, Denis E. Sergeev, Tobi Hammond, N. J. Mayne, Miles Cranmer

发表机构 * University of Cambridge(剑桥大学) University of Oxford(牛津大学) University of Colorado Boulder(科罗拉多大学博尔德分校) University of Bristol(布里斯托大学) Purdue University(普渡大学) University of Exeter(埃克塞特大学)

AI总结 为加速系外行星气候模拟,提出ThousandWorlds基准数据集,包含五个全球气候模型的约1800次模拟,用于评估机器学习模拟器在低数据、多模拟器参数到场回归任务中的性能。

Comments 10 pages main text, 26 pages references/appendix, plus NeurIPS checklist. Data at https://doi.org/10.57967/hf/8695. Code at https://github.com/edstevenson/ThousandWorlds

详情
AI中文摘要

寻找地球以外生命将依赖于探测潜在宜居系外行星大气中的微弱特征。解释这些特征需要了解宿主行星的气候:同一分子可能在一颗行星上标志着生命,而在另一颗行星上则是非生物化学的结果。全球气候模型(GCM)提供了这种理解,但单次运行可能需要多达数百万核心小时和大量领域专家时间。机器学习模拟器可以消除这一瓶颈,但由于缺乏经过整理的多模型系外气候数据集,进展受到限制。我们介绍了ThousandWorlds,这是一个为系外气候模拟以及更广泛的低数据、多模拟器、参数到场回归任务设计的ML就绪基准数据集。该数据集包含来自五个GCM的大约1800次模拟,将八个行星参数映射到三维大气场,包括温度、湿度、风、云和辐射。三个嵌套子集定义了逐步增加的挑战:单模拟器回归、具有完整观测的多模拟器回归以及具有结构化缺失的多模拟器回归。我们提出了两个评估协议:一个用于方法排名,另一个用于衡量相对于GCM自身分歧的性能。我们评估了七种基线方法,涵盖简单方法、深度学习和高斯过程。基于GP的方法表现最佳,表明ThousandWorlds揭示了一个现成深度学习尚未成功的领域。数据:https://doi.org/10.57967/hf/8695。代码:https://github.com/edstevenson/ThousandWorlds。

英文摘要

The search for life beyond Earth will depend on detecting faint signatures in the atmospheres of potentially habitable exoplanets. Interpreting those signatures requires understanding the host planet's climate: the same molecule may signal life on one planet and abiotic chemistry on another. Global climate models (GCMs) provide this understanding, but individual runs can require up to millions of core-hours and substantial domain expert time. Machine-learning emulators could remove this bottleneck, but progress has been limited by the absence of a curated, multi-model exoclimate dataset. We introduce ThousandWorlds, an ML-ready benchmark for exoclimate emulation and for the broader regime of low-data, multi-simulator, parameter-to-field regression. The dataset contains approximately 1800 simulations from five GCMs, mapping eight planet parameters to 3D atmospheric fields including temperature, humidity, winds, clouds, and radiation. Three nested subsets define progressively harder challenges: single-simulator regression, multi-simulator regression with complete observations, and multi-simulator regression with structured missingness. We propose two evaluation protocols: one for ranking methods, and one that measures performance relative to the disagreement between GCMs themselves. We evaluate seven baselines spanning simple methods, deep learning, and Gaussian processes. GP-based methods perform best, suggesting that ThousandWorlds exposes a regime where off-the-shelf deep learning does not yet succeed. Data: https://doi.org/10.57967/hf/8695. Code: https://github.com/edstevenson/ThousandWorlds.

2606.18367 2026-06-18 cs.LG 新提交

Do Time Series Foundation Model Benchmarks Hide Regime-Dependent Failures? Evidence from Traffic Speed Forecasting

时间序列基础模型基准是否隐藏了依赖于状态的失败?来自交通速度预测的证据

Yingshuo Wang, Xian Sun, Lingdong Kong, Wei Gao, Yanhang Li, Zhichao Fan, Zexin Zhuang

Affiliations * University of California, Berkeley; Duke University; National University of Singapore; Northeastern University; University of Illinois Urbana-Champaign; Southern Methodist University

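AI summary Proposes regime-stratified evaluation and finds that the accuracy and prediction-interval coverage of time series foundation models drop sharply during traffic regime transitions; also proposes bimodal mixture augmentation to improve transition-regime coverage.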

Comments 5 pages, 2 figures. Accepted at the Workshop on Forecasting as a New Frontier of Intelligence, ICML 2026

详情
AI中文摘要

标准基准使用聚合指标评估时间序列基础模型(TSFMs),但这可能掩盖关键运行状态下的严重失败。我们引入了状态分层评估,并将其应用于两个标准交通速度基准上的三个TSFMs。交通在自由流和拥堵状态之间表现出突然的状态切换,在转换期间产生双峰速度分布。当我们按交通状态分层时,准确率和预测区间覆盖率在转换期间急剧下降:转换状态的MAE达到11 mph(而总体为3 mph),90%预测区间的经验覆盖率低至55%。这些失败在聚合指标中不可见,因为自由流观测主导了样本。一个简单的历史条件基线(从每个传感器的训练分布中采样)实现了比任何TSFM更好的转换覆盖率,但总体准确率差得多。我们提出了双峰混合增强(BMA),一种后处理方法,将TSFM预测与历史分布知识相结合,在保持TSFM准确率的同时接近历史基线的转换覆盖率。我们的结果表明,TSFM基准应纳入状态感知评估,以揭示聚合指标隐藏的失败。

英文摘要

Standard benchmarks evaluate time series foundation models (TSFMs) using aggregate metrics, but these can mask severe failures in critical operating regimes. We introduce regime-stratified evaluation and apply it to three TSFMs on two standard traffic speed benchmarks. Traffic exhibits abrupt regime switching between free-flow and congested states, producing bimodal speed distributions during transitions. When we stratify by traffic regime, both accuracy and prediction-interval coverage degrade sharply during transitions: transition-regime MAE reaches 11 mph (versus 3 mph overall), and empirical coverage of 90% prediction intervals drops as low as 55%. These failures are invisible in aggregate metrics because free-flow observations dominate the sample. A simple historical conditional baseline (sampling from per-sensor training distributions) achieves better transition coverage than any TSFM, but has far worse overall accuracy. We propose bimodal mixture augmentation (BMA), a post-hoc method that combines TSFM forecasts with historical distributional knowledge, approaching the historical baseline's transition coverage while preserving the TSFM's accuracy. Our results suggest that TSFM benchmarks should incorporate regime-aware evaluation to surface failures that aggregate metrics hide.
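The regime-stratified evaluation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stratified_metrics`, the regime labels, and the toy speeds are all assumptions, and how regimes are assigned to observations is left outside the sketch.

```python
from collections import defaultdict


def stratified_metrics(y_true, y_pred, lower, upper, regimes):
    """Return {regime: (mae, coverage, n)} plus an "overall" entry.

    coverage is the fraction of observations inside [lower, upper].
    """
    groups = defaultdict(list)
    for y, p, lo, hi, regime in zip(y_true, y_pred, lower, upper, regimes):
        row = (abs(y - p), lo <= y <= hi)
        groups[regime].append(row)
        groups["overall"].append(row)
    result = {}
    for regime, rows in groups.items():
        n = len(rows)
        mae = sum(err for err, _ in rows) / n
        coverage = sum(1 for _, inside in rows if inside) / n
        result[regime] = (mae, coverage, n)
    return result


# Toy speeds in mph (made up for illustration, not taken from the paper).
y_true = [65, 64, 66, 63, 30, 55]
y_pred = [64, 65, 65, 64, 50, 40]
lower = [60, 60, 60, 60, 45, 35]
upper = [70, 70, 70, 70, 55, 45]
regimes = ["free_flow"] * 4 + ["transition"] * 2

metrics = stratified_metrics(y_true, y_pred, lower, upper, regimes)
for regime, (mae, coverage, n) in metrics.items():
    print(f"{regime:>10}: MAE={mae:.2f}  coverage={coverage:.2f}  n={n}")
```

In the toy data the overall MAE (6.5) sits far below the transition-regime MAE (17.5) because free-flow rows dominate the sample, which is the masking effect the abstract describes.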

2606.18451 2026-06-18 cs.LG 新提交

A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short)

跨模型VLM评判协议用于单图像3D网格质量(以及为什么廉价代理方法不足)

Ali Asaria, Tony Salomone, Deep Gandhi

Affiliations * Transformer Lab

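AI summary Proposes a reproducible VLM-judge protocol for evaluating single-image 3D mesh quality and finds that cheap proxies such as geometry validity and render-space CLIP similarity cannot substitute for the VLM judge.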

详情
AI中文摘要

单图像到3D生成器正在快速改进,但目前没有公认的、无需人工的方法来判断生成的网格是否优于另一个。从业者通常依赖廉价的自动代理方法(渲染空间的CLIP相似性和网格几何有效性统计),但这些方法在多大程度上跟踪感知质量尚未确定。我们做出两项贡献。首先,我们提出并验证了一个可重复的VLM评判评估协议:一个固定的24视角无头渲染装置、两个独立的视觉语言评判家族,以及一个强制的位置偏差校正,该校正查询两种呈现顺序并仅保留顺序一致的判决。两个评判家族彼此高度一致(Cohen's kappa = 0.66),远高于随机一致性基线。其次,以该协议为参考,我们证明廉价代理方法无法替代它。几何有效性平均而言仅是一个弱信号(因为,如我们所示,它是双峰的),且低于我们预先注册的目标,而渲染CLIP则处于随机水平。一个学习的Bradley-Terry头部坍缩到一个单一流形统计量(给渲染CLIP赋予负权重),并且与仅几何方法完全匹配,因此学习特征权重毫无收益。该代理方法也是双峰的:在具有可见几何缺陷的对比中显著高于随机水平,但在模糊对比中处于随机水平,这与几何有效性仅在缺陷视觉显著时跟踪评判者的行为一致。因此,我们推荐VLM评判协议作为在测试条件下(Google Scanned Objects上的两个前馈生成器,采用面丢失退化机制)可靠且可重复的评估器,并建议不要将几何/CLIP代理方法作为优化目标。

英文摘要

Single-image-to-3D generators are improving quickly, but there is no agreed, human-free way to tell whether one generated mesh is better than another. Practitioners commonly rely on cheap automatic proxies (render-space CLIP similarity and mesh geometry-validity statistics), yet how well these track perceived quality is unestablished. We make two contributions. First, we propose and validate a reproducible VLM-judge evaluation protocol: a fixed 24-view headless render rig, two independent vision-language judge families, and a mandatory position-bias correction that queries both presentation orders and keeps only order-consistent verdicts. The two judge families agree substantially with each other (Cohen's kappa = 0.66), well above the chance-agreement floor. Second, using this protocol as the reference, we show the cheap proxies do not substitute for it. Geometry validity is only a weak signal on average (because, as we show, it is bimodal) and stays below our pre-registered target, while render-CLIP is at chance. A learned Bradley-Terry head collapses onto a single manifoldness statistic (giving render-CLIP a negative weight) and matches geometry-only exactly, so learning the feature weights buys nothing. The proxy is also bimodal: it is significantly above chance on contrasts with visible geometric defects but at chance on ambiguous contrasts, consistent with geometry validity tracking the judge only when the defect is visually salient. We therefore recommend the VLM-judge protocol as a reliable, reproducible evaluator under the conditions tested (two feed-forward generators on Google Scanned Objects, with a face-drop degradation regime) and advise against geometry/CLIP proxies as optimization targets.
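Two ingredients of the protocol, the order-consistency filter and the inter-judge Cohen's kappa, can be illustrated with a short sketch. The function names and the toy verdicts are hypothetical, and the abstract does not say exactly which comparisons enter the kappa computation; here it is computed on pairs for which both judges gave an order-consistent verdict, which is an assumption.

```python
def order_consistent_verdict(winner_when_a_first, winner_when_b_first):
    """Keep a verdict only if the judge picks the same mesh in both orders.

    Both arguments name the preferred mesh ("A" or "B"), not its position.
    Returns the agreed winner, or None if the verdict flipped with the order.
    """
    if winner_when_a_first == winner_when_b_first:
        return winner_when_a_first
    return None


def cohens_kappa(labels_1, labels_2):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(labels_1)
    categories = set(labels_1) | set(labels_2)
    observed = sum(a == b for a, b in zip(labels_1, labels_2)) / n
    expected = sum(
        (labels_1.count(c) / n) * (labels_2.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return float("nan")  # undefined when both raters use one category
    return (observed - expected) / (1.0 - expected)


# Hypothetical raw verdicts: (winner with A shown first, winner with B shown first)
judge_1_raw = [("A", "A"), ("A", "A"), ("B", "B"), ("B", "B"),
               ("A", "A"), ("B", "B"), ("A", "B")]
judge_2_raw = [("A", "A"), ("A", "A"), ("B", "B"), ("A", "A"),
               ("A", "A"), ("B", "B"), ("B", "B")]

judge_1 = [order_consistent_verdict(*v) for v in judge_1_raw]
judge_2 = [order_consistent_verdict(*v) for v in judge_2_raw]

# Keep only comparisons where both judges were order-consistent.
kept = [(a, b) for a, b in zip(judge_1, judge_2) if a is not None and b is not None]
kappa = cohens_kappa([a for a, _ in kept], [b for _, b in kept])
print(f"kept {len(kept)} of {len(judge_1)} comparisons, kappa = {kappa:.3f}")
```

The last toy comparison is dropped because the first judge's verdict flips with presentation order, so kappa is computed on the remaining six.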

2606.18539 2026-06-18 cs.LG stat.ML 新提交

TS-Fault: Benchmarking Time Series Forecasters Against Structural Faults

TS-Fault: 针对结构性故障的时间序列预测器基准测试

Yuyang Zhao, Lian Xu, Hao Miao, Chenxi Liu, Hao Xue

Affiliations * Ray-zyy

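AI summary Proposes the TS-Fault benchmark, which evaluates the robustness of time series forecasting models under parameterized fault scenarios organized along two axes (observation- vs mechanism-level, univariate vs multivariate); finds that clean-data accuracy anti-correlates with robustness, that mechanism-level faults reshuffle rankings, and that foundation models are the most fragile.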

详情
AI中文摘要

时间序列预测(TSF)支撑着能源、交通、金融和医疗等领域的关键决策,然而TSF模型几乎普遍通过在干净保留数据上的单一数字(如平均误差)进行排名,隐含假设该数字能预测部署可靠性。但实际故障并非独立同分布噪声,而是具有时间形状的结构化事件、断裂的跨变量依赖、伴随缺失的机制变化以及跨传感管道的因果传播。将TSF鲁棒性视为数据质量问题,我们提出TS-Fault,一个在显式、参数化且具有可控语义难度的故障场景下评估预测模型的基准。TS-Fault将重复出现的故障沿两个正交轴(观测级 vs 机制级;单变量 vs 多变量)组织为四种模式,并通过统一重要性评分将每种故障注入最关键的预测窗口。该设计使得鲁棒性能够针对模型实际依赖的结构进行测试,而非简化为通用噪声敏感性。我们在6个数据集、4种模式和5个难度级别上,采用配对干净/损坏协议评估了21个模型。结果揭示了三个与常见排行榜直觉相悖的发现:(i)干净数据准确性与鲁棒性负相关;(ii)干净排名在观测级故障下保持不变,但在机制级故障下重新洗牌;(iii)所有灾难性故障均发生在机制级故障下,基础模型在干净数据上准确率最高但表现出最大的脆弱性。代码已公开于该URL。

英文摘要

Time series forecasting (TSF) underpins consequential decisions in energy, transportation, finance, and healthcare, yet TSF models are almost universally ranked by a single number (e.g., average error) on clean held-out data, under the implicit assumption that it predicts deployed reliability. However, real faults are not i.i.d. noise but structured events with temporal shape, broken cross-variable dependencies, regime change coupled with missingness, and causal propagation across a sensing pipeline. Treating TSF robustness as a data-quality problem, we present TS-Fault, a benchmark that evaluates forecasting models under explicit, parameterized fault scenarios with controllable semantic difficulty. TS-Fault organizes recurring failures into four modes along two orthogonal axes (observation- vs mechanism-level; univariate vs multivariate) and injects each fault into the most prediction-critical window via a unified importance score. This design enables robustness to be tested against the structures models actually rely on, rather than reduced to generic noise sensitivity. We evaluate 21 models across 6 datasets, 4 modes, and 5 difficulty levels under a paired clean/corrupt protocol. The results reveal three findings that contradict common leaderboard intuition: (i) clean-data accuracy anti-correlates with robustness; (ii) clean rankings are preserved under observation-level faults but reshuffled under mechanism-level faults; and (iii) all catastrophic failures occur under mechanism-level faults, with foundation models achieving the highest clean-data accuracy yet exhibiting the greatest fragility. The code is publicly available at https://github.com/Ray-zyy/TS-Fault.

2606.18640 2026-06-18 cs.LG q-bio.QM 新提交

MetaboNet-Bench: A Multi-modal Benchmark for Glucose Forecasting in Type 1 Diabetes

MetaboNet-Bench:1型糖尿病血糖预测的多模态基准

Nathaniel Jeffries, Miriam Wolff, Sam Royston, Elizabeth Healey, Caleb Mayer, David Klonoff, Michael Snyder, Tao Wang

Affiliations * Department of Genetics, Stanford University School of Medicine; Replica Health; Boston Children’s Hospital, Harvard Medical School; Diabetes Research Institute, Mills-Peninsula Medical Center

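AI summary Addresses the lack of a standardized evaluation benchmark for glucose forecasting algorithms in type 1 diabetes by proposing MetaboNet-Bench, a multimodal benchmark that integrates glucose, insulin, and carbohydrate data, and compares several models to assess how multimodal data affects model performance.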

Comments main content in 10 pages with 5 figures; supplementary section with 11 more pages and 5 more figures

详情
AI中文摘要

血糖预测算法是1型糖尿病血糖控制管理的重要方面。迄今为止,研究社区已经开发了大量预测算法和模型。然而,公认的是,缺乏标准化的模型性能评估基准使得公平比较变得困难,并阻碍了进一步的创新,因此基准标准化迫在眉睫。此外,许多已发表的血糖预测算法仅限于CGM数据,忽略了其他多模态信号,如胰岛素剂量和碳水化合物摄入。在此,我们介绍MetaboNet-Bench,这是一个针对1型糖尿病患者的多模态血糖预测基准,它提供了一个可扩展的开源评估框架,用于比较利用血糖、胰岛素和碳水化合物数据的血糖预测算法。然后,我们通过基准测试几个最近发布的血糖预测模型和一个自定义的多模态时间序列模型(代表不同的模型架构)来展示其实用性。结果表明,添加数据模态的好处取决于模型的复杂性,并且纳入更多临床指标有助于识别未来研究中有意义的空白。

英文摘要

Glucose forecasting algorithms are an important aspect of glycemic control management in type 1 diabetes. So far, the research community has developed numerous algorithms and models for forecasting. However, it is well-recognized that the lack of standardized model performance evaluation benchmarks makes fair comparison difficult and hinders further innovation, and thus benchmark standardization is in urgent need. Furthermore, many published glucose forecasting algorithms are limited to CGM data alone, ignoring other multimodal signals such as insulin dosing and carbohydrate intake. Here, we introduce MetaboNet-Bench, a benchmark for multimodal glucose forecasting for patients with type 1 diabetes that provides an extensible open-source evaluation framework for comparison of glucose forecasting algorithms that leverage glucose, insulin, and carbohydrate data. We then demonstrate its utility by benchmarking several recently published glucose forecasting models and a custom multimodal time-series model, representing different model architectures. The results show that the benefit of adding data modalities is conditioned on the complexity of the model and that incorporating more clinical metrics helps identify meaningful gaps to fill for future research.

2606.18677 2026-06-18 cs.LG cs.AI 新提交

Bounded Context Management for Tabular Foundation Models on Stream Learning

表格基础模型在流学习中的有界上下文管理

Jinmo Lee, Doyun Choi, Moongi Choi, Jaemin Yoo

Affiliations * Seoul National University; KAIST

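AI summary Targets distribution shift in tabular stream learning with CURE, a context-management policy that combines uncertainty-gated admission with redundancy-aware eviction; across seven streams it achieves up to 27.0% relative improvement over classical stream learners.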

Comments Accepted as a spotlight oral (top 5%) at the 2nd ICML Workshop on Foundation Models for Structured Data (FMSD@ICML2026)

详情
AI中文摘要

表格流学习需要在分布漂移下对顺序到达的样本进行预测。虽然标准方法通过更新模型状态来适应,但表格基础模型(TFMs)以上下文方式基于标记上下文进行预测,使其成为流学习的自然替代方案。这便将挑战从如何更新模型转移到如何管理上下文。我们提出一种未来信息视角,为上下文管理导出三个实际需求:保留最近样本、保留不确定样本、移除冗余样本。我们将这些需求实例化为CURE(通过不确定性感知准入和冗余感知驱逐的上下文管理),一种具有熵门控准入和冗余感知驱逐的上下文管理策略。在七个流上,CURE相比经典流学习器相对提升高达27.0%,在多个TFM骨干上保持鲁棒,并在其他策略变体中排名第一。代码和数据集可在该https URL获取。

英文摘要

Tabular stream learning requires predictions on sequentially arriving examples under distribution shift. While standard methods adapt by updating model states, tabular foundation models (TFMs) make predictions conditioned on a labeled context in an in-context manner, making them a natural alternative for stream learning. This shifts the challenge from how to update the model to how to manage the context. We propose a future information view that yields three practical requirements for context management: preserve recent examples, retain uncertain examples, and remove redundant examples. We instantiate these requirements as CURE (Context management via Uncertainty-aware admission and Redundancy aware Eviction), a context-managing policy with entropy-gated admission and redundancy-aware eviction. Across seven streams, CURE shows up to 27.0% relative improvement over classical stream learners, remains robust across multiple TFM backbones, and ranks first among other policy variants. Code and datasets are available at https://github.com/morcellinus/CURE-ICML-FMSD.
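To make the three requirements (preserve recent, retain uncertain, remove redundant) concrete, here is a toy bounded context buffer. It is only a sketch under stated assumptions and not the CURE algorithm itself: the class name `BoundedContext`, the fixed entropy threshold, and the "drop the older member of the closest same-label pair" eviction rule are all illustrative choices that the abstract does not specify.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class BoundedContext:
    """Toy labeled-context buffer with a hard size limit (hypothetical design)."""

    def __init__(self, capacity, entropy_threshold):
        self.capacity = capacity
        self.entropy_threshold = entropy_threshold
        self.items = []  # (features, label) pairs, oldest first

    def offer(self, x, y, probs):
        """Admit (x, y) only if the model was uncertain about x.

        probs is the model's predictive distribution for x before the label
        y was revealed. Returns True if the example was admitted.
        """
        if entropy(probs) < self.entropy_threshold:
            return False
        self.items.append((x, y))
        if len(self.items) > self.capacity:
            self._evict()
        return True

    def _evict(self):
        """Drop the older member of the closest same-label pair.

        Falls back to dropping the oldest item if no two items share a label.
        """
        best = None  # (distance, index of the older item in the pair)
        for i in range(len(self.items)):
            for j in range(i + 1, len(self.items)):
                (xi, yi), (xj, yj) = self.items[i], self.items[j]
                if yi != yj:
                    continue
                d = math.dist(xi, xj)
                if best is None or d < best[0]:
                    best = (d, i)
        self.items.pop(best[1] if best is not None else 0)


ctx = BoundedContext(capacity=2, entropy_threshold=0.5)
ctx.offer((0.0, 0.0), 0, [0.5, 0.5])    # uncertain: admitted
ctx.offer((0.1, 0.0), 0, [0.5, 0.5])    # uncertain: admitted
ctx.offer((5.0, 5.0), 1, [0.99, 0.01])  # confident: rejected
ctx.offer((5.0, 5.0), 1, [0.6, 0.4])    # uncertain: admitted, triggers eviction
print(ctx.items)
```

Evicting the older member of a near-duplicate pair keeps the buffer biased toward recent examples while removing redundancy; the pairwise search is quadratic in the buffer size, which is acceptable only because the context is bounded.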

2606.18774 2026-06-18 cs.LG 新提交

RouteJudge: An Open Platform for Reproducible and Preference-Aware LLM Routing

RouteJudge: 一个可复现且偏好感知的LLM路由开放平台

Guannan Lai, Haoran Hu, Han-Jia Ye

Affiliations * School of Artificial Intelligence, Nanjing University; National Key Laboratory for Novel Software Technology, Nanjing University; SinapisAI

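AI summary Proposes the RouteJudge platform, which evaluates the decision quality of LLM routing strategies through anonymous pairwise comparisons, and releases the ORBIT toolbox to standardize routing workflows, supporting reproducible and preference-aware routing evaluation.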

Comments Accepted by Pluralistic Alignment Workshop at ICML 2026

详情
AI中文摘要

我们提出RouteJudge,一个用于LLM路由系统的在线成对偏好评估框架,并提供一个公开平台(https://...)。与模型级别的响应评估不同,RouteJudge关注路由器级别的决策质量。对于每个用户查询,多个路由策略在相同的模型池和预算约束下独立推荐候选模型。然后通过匿名成对比较将所选模型的响应呈现给用户,由此产生的用户偏好归因于比较响应背后的路由策略。每条评估记录存储查询、路由决策、模型响应、偏好标签、成本、延迟和任务元数据,从而支持对LLM路由器进行偏好感知、成本感知和任务条件分析。为了支持RouteJudge中路由方法的持续扩展,我们进一步发布了ORBIT(最优路由与预算推理工具箱),这是一个模块化且可扩展的工具箱,标准化了LLM路由的端到端工作流。ORBIT为基准加载、查询表示、路由器实现、预算感知评估和方法比较提供了统一接口,允许研究人员在一致的协议下开发和评估路由算法。它同时作为RouteJudge的提交和集成层:研究人员可以在ORBIT中实现路由方法,在现有路由基准上验证它们,并提交兼容的路由器进行在线偏好评估。ORBIT的代码可在https://...获取。

英文摘要

We present RouteJudge, an online pairwise preference evaluation framework for LLM routing systems, with a public platform available at https://routejudge.cn. Different from model-level response evaluation, RouteJudge focuses on router-level decision quality. For each user query, multiple routing strategies independently recommend candidate models under the same model pool and budget constraints. The selected model responses are then presented to users through anonymous pairwise comparisons, and the resulting user preferences are attributed back to the routing strategies behind the compared responses. Each evaluation record stores the query, routing decisions, model responses, preference labels, cost, latency, and task metadata, enabling preference-aware, cost-aware, and task-conditioned analysis of LLM routers. To support the continuous expansion of routing methods in RouteJudge, we further release ORBIT (Optimal Routing and Budgeted Inference Toolbox), a modular and extensible toolbox that standardizes the end-to-end workflow of LLM routing. ORBIT provides unified interfaces for benchmark loading, query representation, router implementation, budget-aware evaluation, and method comparison, allowing researchers to develop and evaluate routing algorithms under consistent protocols. It also serves as the submission and integration layer for RouteJudge: researchers can implement routing methods within ORBIT, validate them on existing routing benchmarks, and submit compatible routers for online preference-based evaluation. The code of ORBIT is available at https://github.com/AIGNLAI/LAMDA-ORBIT.
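The attribution step, where a user's preference between two responses is credited to the routing strategies that selected them, might look like the sketch below. The record fields (`router_a`, `router_b`, `preference`) and router names are hypothetical and do not reflect RouteJudge's or ORBIT's actual schema; a plain win rate with ties counted as half a win is used here as the simplest possible aggregate.

```python
from collections import Counter


def router_win_rates(records):
    """Aggregate pairwise preferences into a per-router win rate.

    Each record names the two routers whose selected responses were compared
    and the user's preference: "a", "b", or "tie" (a tie counts as half a win).
    """
    wins, games = Counter(), Counter()
    for rec in records:
        a, b, pref = rec["router_a"], rec["router_b"], rec["preference"]
        games[a] += 1
        games[b] += 1
        if pref == "a":
            wins[a] += 1
        elif pref == "b":
            wins[b] += 1
        else:
            wins[a] += 0.5
            wins[b] += 0.5
    return {router: wins[router] / games[router] for router in games}


# Hypothetical evaluation records.
records = [
    {"router_a": "cheap_first", "router_b": "quality_first", "preference": "b"},
    {"router_a": "cheap_first", "router_b": "random", "preference": "a"},
    {"router_a": "quality_first", "router_b": "random", "preference": "tie"},
]
rates = router_win_rates(records)
for router, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{router}: {rate:.2f}")
```

Because each record also stores cost, latency, and task metadata according to the abstract, the same loop could be filtered or grouped by those fields to obtain cost-aware or task-conditioned comparisons.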

2606.18829 2026-06-18 cs.LG cs.CL 新提交

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

GateMem:多主体共享内存代理中的内存治理基准

Zhe Ren, Yibo Yang, Yimeng Chen, Zijun Zhao, Benshuo Fu, Zhihao Shu, Bingjie Zhang, Yangyang Xu, Dandan Guo, Shuicheng Yan

Affiliations * School of Artificial Intelligence, Jilin University; Shanghai Jiao Tong University; King Abdullah University of Science and Technology (KAUST); Tsinghua University; National University of Singapore

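AI summary Proposes the GateMem benchmark, which evaluates governance of multi-principal shared-memory agents along three dimensions (utility, access control, and forgetting), and finds that no existing method satisfies all three simultaneously.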

Comments 24 pages, 8 figures. Code and dataset are available at https://github.com/rzhub/GateMem and https://huggingface.co/datasets/Ray368/GateMem

详情
AI中文摘要

LLM代理的内存基准主要假设单用户设置,而医院、工作场所、校园和家庭中的共享助手研究不足。在这些部署中,多个主体写入公共内存池并根据不同角色、范围和关系进行查询,因此内存质量需要治理和召回。我们引入GateMem,一个多主体共享内存代理的基准。GateMem联合评估合法长期请求的效用(含状态更新)、跨上下文授权边界的访问控制,以及显式删除请求后的主动遗忘。它涵盖医疗、办公、教育和家庭领域,包含长形式多方情节、增量内存注入、隐藏检查点、结构化评判和泄漏目标注释。在多种基线和骨干模型上,没有方法能同时实现强效用、鲁棒访问控制和可靠遗忘。长上下文提示通常以高令牌成本获得最佳治理分数,而基于检索和外部内存的方法降低成本但仍泄漏未授权或已删除信息。这些结果表明,当前内存代理远未达到可靠的共享机构部署水平。

英文摘要

Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple principals write to a common memory pool and query it under different roles, scopes, and relationships, so memory quality requires governance as well as recall. We introduce GateMem, a benchmark for multi-principal shared-memory agents. GateMem jointly evaluates utility for legitimate long-horizon requests with state updates, access control across contextual authorization boundaries, and agent-facing active forgetting after explicit deletion requests. It spans medical, office, education, and household domains, with long-form multi-party episodes, incremental memory injection, hidden checkpoints, structured judging, and leak-target annotations. Across diverse baselines and backbone models, no method simultaneously achieves strong utility, robust access control, and reliable forgetting. Long-context prompting often yields the best governance score at high token cost, while retrieval-based and external-memory methods reduce cost yet still leak unauthorized or deleted information. These results show current memory agents remain far from reliable shared institutional deployment.

2606.18833 2026-06-18 cs.LG 新提交

Seed-Guided Semi-Supervised Clustering by A-Contrario Anomaly Detection

基于A-Contrario异常检测的种子引导半监督聚类

Nassir Mohammad

Affiliations * Cyber Innovation Lab, Airbus, Newport, UK

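AI summary Proposes a semi-supervised clustering framework based on the statistical duality between grouping and anomaly detection; using a-contrario reasoning and the Perception algorithm, it initializes clusters from seed labels and iteratively excludes anomalous points, achieving robust clustering with strong performance from few seeds.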

详情
AI中文摘要

本文介绍了一种基于分组原则与异常检测之间统计对偶性的半监督聚类框架。我们解决了噪声环境中鲁棒聚类定义的挑战——在该任务中,划分算法往往过度分配离群点,而基于密度的方法仍对启发式全局参数敏感。借鉴\textit{a-contrario}统计推理和格式塔邻近原则,我们将聚类定义为相对于均匀随机性零假设不包含任何异常点的最大数据点子集。该方法的核心是感知算法,该算法利用基于期望的原则性阈值($\mathbb{E} < 1$)来识别异常点,无需手动参数调整。通过将聚类视为异常检测的对偶问题,我们采用迭代的“通过排除进行聚类”机制。该算法由种子引导,利用最少的用户提供标签来初始化鲁棒的聚类中位数并形成初始组,随后通过接纳非异常点进行扩展。这种方法自然地隔离了边缘点、孤立噪声和新兴的未知聚类。我们在合成和真实基准数据集上评估了该方法,包括通过原始、线性降维和邻域保持嵌入表示的图像和文本数据集。结果表明,在每个聚类仅使用10-30个种子的情况下,所提出的方法在实用的低调优基准测试协议下实现了具有竞争力且通常非常强的性能,同时在固定种子聚类数和迭代次数下,对观测数和维度均保持线性可扩展性。

英文摘要

This paper introduces a semi-supervised clustering framework grounded in the statistical duality between grouping principles and anomaly detection. We address the challenge of robust cluster definition in noisy environments -- a task where partitioning algorithms often over-assign outliers and density-based methods remain sensitive to heuristic global parameters. Drawing on \textit{a-contrario} statistical reasoning and Gestalt proximity principles, we define a cluster as a maximal subset of data points containing no anomalies relative to a null hypothesis of uniform randomness. Central to this approach is the Perception algorithm, which utilises a principled expectation-based threshold ($\mathbb{E} < 1$) to identify outliers without manual parameter tuning. By treating clustering as the dual of anomaly detection, we employ an iterative ``clustering-by-exclusion'' mechanism. The algorithm is seed-guided, leveraging minimal user-provided labels to initialise robust cluster medians and form initial groups, which are subsequently expanded by admitting non-anomalous points. This approach naturally isolates fringe points, isolated noise, and emerging unknown clusters. We evaluate the method on synthetic and real-world benchmarks, including image and text datasets represented through raw, linear-reduced, and neighbourhood-preserving embeddings. Results demonstrate that with as few as 10--30 seeds per cluster, the proposed method achieves competitive and often very strong performance under a practical low-tuning benchmarking protocol, while maintaining linear scalability with respect to both observations and dimensionality for a fixed number of seeded clusters and iterations.

2606.18970 2026-06-18 cs.LG cs.AI cs.CV 新提交

A Controlled Benchmark of Quantum-Latent GAN Augmentation for Brain MRI

脑MRI的量子潜GAN增强的受控基准测试

Syed Mujtaba Haider, Silvia Figini

Affiliations * Department of Mathematics; Department of Political and Social Sciences

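AI summary Uses a controlled benchmark to compare quantum and classical generators for brain-MRI data augmentation; neither significantly outperforms real-data-only training, and the quantum generator shows no additional advantage.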

Comments This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

医学图像分类常受限于有限的标注数据,因此生成式增强被提出;最近,量子生成模型被用于此目的,并经常报告准确率提升。然而,这些声称通常基于单次训练运行,未匹配量子与经典生成器的参数预算,也未表征任何收益出现的数据范围。我们提出了一个受控基准测试,隔离量子生成器对脑MRI增强的贡献。图像被编码到KL正则化的潜在空间中,在该空间中,使用变分量子生成器或参数数量几乎相同的经典生成器(1648 vs. 1632)训练带有梯度惩罚的条件Wasserstein GAN。合成样本被解码并用于增强预训练分类器,覆盖从5%到100%的标注数据比例,通过八个随机种子进行配对显著性检验(多重比较校正)以及集内多样性和潜在分布分析。在所有比例下,没有增强变体显著优于仅用真实数据训练,且量子与经典生成器在统计上无法区分。任何低数据优势表现为正则化而非忠实的数据扩展:合成样本分布外移,并且在数据稀缺时严重模式崩溃,而量子生成器并不比经典生成器更多样化。我们发布该协议作为医学成像中量子生成增强严格评估的测试平台。

英文摘要

Medical image classification is often constrained by limited labeled data, motivating generative augmentation; recently, quantum generative models have been proposed for this purpose, frequently reporting accuracy gains. However, such claims are typically based on single training runs, do not match the parameter budgets of the quantum and classical generators, and do not characterize the data regime in which any benefit appears. We present a controlled benchmark that isolates the contribution of a quantum generator to brain-MRI augmentation. Images are encoded into a KL-regularized latent space in which a conditional Wasserstein GAN with gradient penalty is trained using either a variational quantum generator or a classical generator of near-identical parameter count (1648 vs. 1632). Synthetic samples are decoded and used to augment a pretrained classifier across labeled data fractions from 5% to 100%, evaluated over eight random seeds with paired significance testing (with multiple-comparison correction) and with intraset diversity and latent-distribution analyses. Across all fractions, no augmentation variant significantly outperforms real-data-only training, and the quantum and classical generators are statistically indistinguishable. Any low-data benefit behaves as regularization rather than faithful data expansion: synthetic samples are off distribution and severely mode collapsed precisely where data is scarce, and the quantum generator is no more diverse than its classical counterpart. We release the protocol as a testbed for rigorous evaluation of quantum generative augmentation in medical imaging.
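Paired significance testing over seeds with a multiple-comparison correction can be sketched in a few lines. The abstract does not name the test or the correction, so the choices below are assumptions: an exact paired sign-flip permutation test (feasible because eight seeds give only 2^8 = 256 sign patterns) and the Holm step-down adjustment. The accuracy values are made up for illustration.

```python
from itertools import product


def paired_sign_flip_pvalue(a, b):
    """Exact two-sided paired permutation test on the mean difference.

    a and b hold one score per seed for two training variants.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical accuracies over eight seeds (not the paper's numbers).
real_only = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80]
quantum_aug = [0.82, 0.78, 0.83, 0.81, 0.81, 0.79, 0.80, 0.80]
classic_aug = [0.80, 0.80, 0.82, 0.81, 0.82, 0.77, 0.82, 0.79]

raw = [
    paired_sign_flip_pvalue(quantum_aug, real_only),
    paired_sign_flip_pvalue(classic_aug, real_only),
    paired_sign_flip_pvalue(quantum_aug, classic_aug),
]
print("raw p-values:     ", [round(p, 3) for p in raw])
print("Holm-adjusted:    ", [round(p, 3) for p in holm_adjust(raw)])
```

With eight seeds the smallest attainable two-sided p-value of this test is 2/256, which bounds how many comparisons can survive correction at a given significance level.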

2606.19297 2026-06-18 cs.LG cs.RO 新提交

Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

VLA 甚至知道基础知识吗?衡量视觉-语言-动作模型中的常识和世界知识保留

Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin, Nikita Kurlaev, Daria Pugacheva, Albina Burlova, Mikhail Kolosov, Denis Shepelev, Andrey Kuznetsov, Elena Tutubalina, Aleksandr I. Panov, Alexey K. Kovalev, Vlad Shakhuro

Affiliations * CogAI Lab; FusionBrain Lab; IAI MSU; Lomonosov MSU; NUST MISIS; Applied AI Institute; HSE University; Generalizable AI Systems; ISP RAS; MIRAI Domain-specific NLP Group

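AI summary Proposes the Act2Answer protocol, which evaluates knowledge retention in VLA models by having them answer through actions; finds that the models perform well on simple concepts but show gaps on richer semantic categories, and that VQA co-training is associated with better knowledge retention.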

Comments Project page: https://tttonyalpha.github.io/act2answer/

详情
AI中文摘要

具身视觉-语言-动作(VLA)模型通常通过在机器人数据上微调强大的预训练 VLM 获得,但目前尚不清楚它们在适应后保留了多少常识和事实知识。在知识敏感任务上的失败是模糊的,混淆了知识缺失与低级控制泛化能力差。我们引入 Act2Answer,一种轻量级协议,通过要求智能体通过动作来回答,将 VLM 知识基准适配到 VLA 评估。每个问题变成一个简短的桌面场景,其中智能体执行单个物体放置动作以选择候选答案,从而产生动作基础的、减少控制混淆的成功率。我们在不同的常识和世界知识类别中策划了这样的环境测试套件,并引入逐层意图探测以定位 VLM 骨干和动作头中与答案相关的信息。在对 7 个 VLA 模型和 9 个 VLM 基线的大规模研究中,我们系统地跨类别对模型进行排名,发现 VLA 在简单概念上表现稳健,但在更丰富的语义类别上相对于其源 VLM 显示出更大的差距,VQA 联合训练与更好的知识保留相关,并且答案相关信号在 VLA 中间层达到峰值,但在上层减弱。Act2Answer 可在以下网址获取:此 https URL。

英文摘要

Embodied Vision-Language-Action (VLA) models are typically obtained by fine-tuning powerful pretrained VLMs on robotics data, yet it is unclear how much commonsense and factual knowledge they retain after adaptation. Failures on knowledge-sensitive tasks are ambiguous, conflating missing knowledge with poor generalization of low-level control. We introduce Act2Answer, a lightweight protocol that adapts VLM knowledge benchmarks to VLA evaluation by requiring agents to answer through action. Each question becomes a short tabletop episode where the agent performs a single object-placement action to select among candidate answers, yielding an action-grounded success rate with reduced control confounds. We curate a test suite of such environments across diverse commonsense and world-knowledge categories and introduce layerwise intent probing to localize answer-relevant information across the VLM backbone and action head. In a large-scale study of 7 VLA models and 9 VLM baselines, we systematically rank models across categories, finding that VLAs show solid performance on simple concepts while exhibiting larger gaps on richer semantic categories relative to their source VLMs, that VQA co-training is associated with better knowledge retention, and that answer-relevant signals peak in middle VLA layers but attenuate in upper layers. Act2Answer is available at https://tttonyalpha.github.io/act2answer/.

2606.18267 2026-06-18 cs.SI cs.LG cs.NE 交叉投稿

Graph Instance Landscapes: When Structural Similarity Does (Not) Reflect Shortest-Path Performance

图实例景观:当结构相似性(不)反映最短路径性能时

Maryam Gholami Shiri, Ivana Krminac, Marko Djukanović, Sašo Džeroski, Eva Tuba, Tome Eftimov

Affiliations * Jožef Stefan Institute; Ljubljana, Slovenia; Jožef Stefan International Postgraduate School; University of Banja Luka; Faculty of Natural Science and Mathematics; University of Nova Gorica; Institute of Information Sciences (IZUM); Trinity University

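AI summary Embeds graphs into a low-cost structural feature space and clusters them to analyze how shortest-path algorithm performance varies across structural regions, finding that structural similarity does not guarantee performance similarity.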

Comments Preprint version of a paper accepted at the 2026 IEEE Congress on Evolutionary Computation (IEEE CEC 2026)

详情
AI中文摘要

最短路径算法的基准测试通常基于异构图集上的聚合性能,这限制了对不同搜索范式如何响应实例结构的理解。我们采用实例景观视角进行图基准测试,将图嵌入到低成本的结构特征空间中,并将其聚类为结构相似的区域。研究了三个基准套件:加权 Erdős--Rényi 图、随机几何(无线)图和真实世界道路网络。我们评估了四种代表性的最短路径求解器,涵盖无信息精确搜索(Dijkstra)、双向精确搜索(双向 Dijkstra)、启发式引导精确搜索(A$^{*}$)和基于双端队列的策略(DEQ)。在多种特征选择方案下分析聚类鲁棒性,并使用非参数检验比较不同景观区域内的运行时间分布。虽然生成器参数诱导出稳定的结构区域,但我们发现特征空间相似性并不一定意味着性能相似:即使在相同的景观区域内,也经常观察到显著的运行时间变化。合并套件分析进一步表明,不同的基准族占据大部分不相交的区域。这些结果突出了结构景观用于最短路径算法结构感知基准测试的潜力和局限性。

英文摘要

Benchmarking shortest-path algorithms is commonly based on aggregate performance over heterogeneous graph sets, which limits insight into how different search paradigms react to instance structure. We adopt an instance-landscape view of graph benchmarking by embedding graphs into a low-cost structural feature space and clustering them into regions of similar structure. Three benchmark suites are studied: weighted Erdős--Rényi graphs, random geometric (wireless) graphs, and real-world road networks. We evaluate four representative shortest-path solvers spanning uninformed exact search (Dijkstra), bidirectional exact search (bidirectional Dijkstra), heuristic-guided exact search (A$^{*}$), and deque-based strategies (DEQ). Clustering robustness is analyzed under multiple feature-selection schemes, and runtime distributions are compared across landscape regions using non-parametric tests. While generator parameters induce stable structural regions, we find that feature-space similarity does not necessarily imply performance similarity: significant runtime shifts are frequently observed even within the same landscape region. A merged-suite analysis further shows that different benchmark families occupy largely disjoint regions. These results highlight both the potential and the limits of structural landscapes for the structure-aware benchmarking of shortest-path algorithms.

2606.18281 2026-06-18 stat.AP cs.LG stat.ML 交叉投稿

A Guide to Estimating Conditional Average Treatment Effects in Competing Risks Settings

竞争风险背景下条件平均处理效应估计指南

Daniel Klippert, Sarah Friedrich, Markus Pauly

Affiliations * Department of Statistics, TU Dortmund University; Research Center Trustworthy Data Science and Security, University Alliance Ruhr (UA Ruhr); Institute for Mathematics, University of Augsburg

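AI summary For competing-risks survival data, compares six meta-learners for estimating conditional average treatment effects in simulations to guide model selection, and provides the R package crsurvlearners, which implements them.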

详情
AI中文摘要

条件平均处理效应(CATE)是个性化医疗中治疗决策的核心。在竞争风险背景下,从生存数据估计CATE允许对特定感兴趣事件的治疗效果进行患者特异性评估,同时适当考虑替代事件类型。在存在合并症的情况下,这种区分至关重要,因为竞争死亡原因可能混淆治疗效果。本文聚焦于右删失生存时间和二元治疗,研究CATE定义为在固定时间点上感兴趣事件绝对风险的协变量条件差异。为此,我们研究了元学习器,这些学习器将机器学习算法适应于竞争风险场景中的CATE估计。我们系统比较了六种元学习器,结合Cox回归或随机生存森林进行风险建模,以及弹性网回归或随机森林进行直接CATE建模。为提供模型选择的实践指导,我们在多种模拟设置中评估其性能,这些设置在风险复杂性、治疗异质性、治疗分配、事件类型分布和删失方面有所不同。为促进应用,我们提供R包crsurvlearners,实现了所有考虑的方法。

英文摘要

Conditional average treatment effects (CATEs) are central to treatment decision-making in personalized medicine. In competing risks settings, estimating CATEs from survival data allows for patient-specific assessments of treatment effectiveness for a specific event of interest while properly accounting for alternative event types. This distinction is essential in the presence of comorbidities, where competing causes of death may otherwise confound the therapeutic benefit. Focusing on right-censored survival times with binary treatment, we examine CATEs defined as covariate-conditional differences in the absolute risk for the event of interest at a fixed time. To this end, we study meta-learners which adapt machine learning algorithms for CATE estimation in competing risks scenarios. We systematically compare six meta-learners, combining Cox regression or random survival forests for risk modeling with elastic net regression or random forests for direct CATE modeling. To provide practical guidance on model selection, we evaluate their performance in multiple simulation settings, that differ in hazard complexity, treatment heterogeneity, treatment assignment, event type distribution and censoring. To facilitate applied use, we provide the R package, crsurvlearners, which implements all considered approaches.
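The estimand described in the abstract can be written out explicitly. The potential-outcome notation below is an assumption made for illustration and is not taken from the paper: $T^{(a)}$ and $\varepsilon^{(a)}$ denote the event time and event type under treatment $a \in \{0, 1\}$, event type 1 is the event of interest, $X$ is the covariate vector, and $t^\ast$ is the fixed time.

```latex
\tau(x; t^\ast)
  = P\bigl(T^{(1)} \le t^\ast,\ \varepsilon^{(1)} = 1 \mid X = x\bigr)
  - P\bigl(T^{(0)} \le t^\ast,\ \varepsilon^{(0)} = 1 \mid X = x\bigr)
```

Each term is a covariate-conditional cumulative incidence for the event of interest, so competing events enter by limiting how much probability mass the event of interest can accumulate by $t^\ast$ rather than being treated as censoring.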

2606.18302 2026-06-18 q-bio.OT cs.LG 交叉投稿

Protein-Based Fish Species Identification: Dataset, Models, and Insights from Native Bangladeshi Fish

基于蛋白质的鱼类物种识别:孟加拉本土鱼类的数据集、模型与见解

Md Nasiat Hasan Fahim, Md. Abid Ullah Muhib, Mohammad Shahidur Rahman

Affiliations * Shahjalal University of Science

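AI summary Builds the first protein sequence dataset for native Bangladeshi fish species, systematically evaluates seven architectures, and proposes MotifCNN-Transformer+TA-PE, a lightweight hybrid model that is more practical than the large protein language model ProtBERT in resource-constrained settings, with an accuracy gap to ProtBERT that is not statistically significant.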

Comments Published in 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN). © 2026 IEEE. Personal use of this material is permitted

详情
Journal ref
2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN)
AI中文摘要

在孟加拉国,正确识别鱼类物种对于粮食安全、经济发展和气候适应性至关重要。蛋白质序列直接反映功能和进化约束,对物种认证和生物多样性监测具有重要意义。然而,目前尚无针对孟加拉本土鱼类物种的蛋白质序列识别基准。本研究通过引入首个包含9种孟加拉本土鱼类2845条高质量蛋白质序列的精选数据集来填补这一空白。我们还通过对七种架构范式进行系统基准测试,建立了该领域首个蛋白质序列分类基线。此外,我们提出了一种实用的新型混合架构——MotifCNN与具有末端感知位置编码的Transformer(MotifCNN-Transformer+TA-PE)。该新架构实现了79.80%的准确率和0.80的宏F1分数。最高准确率83.04%由微调的蛋白质语言模型ProtBERT取得,该模型有4.2亿参数,需要双16GB GPU进行推理。根据McNemar检验,ProtBERT相比我们的MotifCNN-Transformer+TA-PE的3.24%准确率提升在统计上不显著(p = 0.1120)。在九类中的六类上,我们的新架构在每类识别中优于ProtBERT。此外,我们的MotifCNN-Transformer+TA-PE比ProtBERT快约5倍,小42倍,支持16倍更大的批处理大小,且无需GPU推理,使其在资源受限地区(如孟加拉农村)部署更为实用。除此之外,我们的基础性工作展示了系统发育关系对序列相似性的影响,并为南亚蛋白质依赖型经济中的渔业管理、食品认证和生物多样性保护建立了途径。

英文摘要

Correct identification of fish species is highly significant for food security, economic development, and climate resilience in Bangladesh. Protein sequences directly reflect functional and evolutionary constraints which are important for species authentication and biodiversity monitoring. Yet there exists no benchmark for native Bangladeshi fish species identification from protein sequence. In this study, we addressed this gap by introducing the first curated dataset for nine native Bangladeshi fish species of 2845 high quality protein sequences. We also established the first protein sequence classification baseline for this domain through a systematic benchmarking of seven architectural paradigms. Moreover, we propose a realistic deployable novel hybrid architecture of MotifCNN and Transformer with Terminal-Aware Positional-Encoding (MotifCNN-Transformer+TA-PE). Our novel architecture achieves 79.80% accuracy with macro-F1 of 0.80. The highest 83.04% accuracy is achieved by finetuned protein language model ProtBERT that has 420M parameters and requires dual 16GB GPUs for inference. According to McNemar's test, ProtBERT's 3.24% accuracy gain over our MotifCNN-Transformer+TA-PE is statistically insignificant (p = 0.1120). Our novel architecture beats it among six of the nine classes in per class identification. Also our MotifCNN-Transformer+TA-PE is approximately 5x faster, 42x smaller, and supports 16x larger batch size than ProtBERT and has GPU free inference, making it more practical for deployment in resources constrained areas such as rural Bangladesh. Beyond this, our foundational work shows effects of phylogenetic relationships on sequence similarity and establishes pathways for fisheries management, food authentication and biodiversity conservation in South Asia's protein dependent economy.
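McNemar's test, used in the abstract to compare the two classifiers, depends only on the test items where exactly one model is correct. The sketch below uses the exact binomial form; the abstract does not say which variant the authors applied, and the discordant counts shown are hypothetical, not the paper's.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: items model 1 classifies correctly and model 2 incorrectly.
    c: items model 2 classifies correctly and model 1 incorrectly.
    Under the null hypothesis each discordant item is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts between two classifiers.
print(mcnemar_exact(1, 9))  # small p-value: the models differ
print(mcnemar_exact(5, 5))  # 1.0: no evidence of a difference
```

Items that both models get right or both get wrong carry no information about which model is better, which is why a few points of accuracy difference on a small test set can fail to reach significance.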

2606.18436 2026-06-18 stat.ML cs.LG 交叉投稿

Pointwise is Pointless? A Multimodal Ablation Study for Precipitation Nowcasting with Graph Neural Networks

逐点是否无意义?基于图神经网络的降水临近预报的多模态消融研究

Ophélia Miralles, Máté Mile, Christoffer Artturi, Thomas Nipen, Ivar Seierstad

发表机构 * Norwegian Meteorological Institute(挪威气象研究所)

AI总结 本研究通过多模态图神经网络系统,消融分析雷达、数值预报、地面观测、卫星数据及训练损失对降水临近预报的影响,发现各模态分别改善不同方面,点观测虽提升局部但需结合损失函数和不确定性表示才能优化雷达场。

详情
AI中文摘要

稀疏点观测在降水临近预报中日益可用,但尚不清楚它们能在多大程度上改善密集雷达场预报。我们通过北欧雷达区域的多模态图神经网络临近预报系统部分回答了这个问题。该模型预测未来两小时内每五分钟的降雨率,并采用雷达历史、MEPS数值天气预报、Netatmo地面观测、MSG卫星通道、随机噪声和基于CRPS的集合损失的不同组合进行训练。本研究设计为对操作相关信源和训练目标的消融。我们比较了仅雷达、NWP信息、站点信息、卫星信息、噪声增强和基于CRPS的配置,使用雷达网格、站点位置、降雨起始的互补诊断,以及oracle、位移和幅度评分。结果表明,每个信源改善了预报问题的不同方面。MEPS稳定了仅雷达外推,Netatmo观测改善了局部站点和起始诊断,卫星预测因子减少了某些站点级偏差,但在确定性使用时可能过早激活降雨。基于CRPS的配置提供了最一致的雷达网格增益,而卫星与CRPS的组合设置给出了最佳的整体oracle/DAS评分。这些结果不支持点观测对临近预报无用的结论,但表明局部观测技能和空间相干雷达场技能是不同的目标。实际意义是,稀疏观测可以提供有用的局部约束,但它们对雷达类场的益处取决于训练损失、不确定性表示以及观测支持在模型中的编码方式。

英文摘要

Sparse point observations are increasingly available for precipitation nowcasting, but it is unclear how much they improve dense radar-field forecasts. We partially address this question with a multimodal graph neural network nowcasting system over the Nordic radar domain. The model predicts rain rate every five minutes up to two hours ahead and is trained with different combinations of radar history, MEPS numerical weather prediction, Netatmo surface observations, MSG satellite channels, stochastic noise, and CRPS-based ensemble losses. The study is designed as an ablation of operationally relevant information sources and training objectives. We compare radar-only, NWP-informed, station-informed, satellite-informed, noise-augmented, and CRPS-based configurations using complementary diagnostics on the radar grid, at station locations, for rain onset, and through oracle, displacement, and amplitude scores. The results show that each source improves a different part of the forecast problem. MEPS stabilises radar-only extrapolation, Netatmo observations improve local station and onset diagnostics, and satellite predictors reduce some station-level biases but may activate rain too early when used deterministically. CRPS-based configurations provide the most consistent radar-grid gains, while the combined satellite and CRPS setup gives the best overall oracle/DAS score. These results do not support the conclusion that point observations are uninformative for nowcasting, but they show that local observational skill and spatially coherent radar-field skill are distinct targets. The practical implication is that sparse observations can provide useful local constraints, but their benefit for radar-like fields depends on the training loss, uncertainty representation, and how observation support is encoded in the model.
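
The CRPS-based ensemble losses mentioned above can be illustrated with the standard empirical CRPS estimator for a finite ensemble. This is a minimal sketch under the assumption of a scalar target (for example, rain rate at one grid cell and lead time); the abstract does not specify the exact loss variant used in the paper, and the function name and sample values are hypothetical.

```python
def ensemble_crps(members, obs):
    """Empirical CRPS of an ensemble forecast against one observation.

    CRPS = mean_i |x_i - y| - 0.5 * mean_{i,j} |x_i - x_j|
    The first term rewards accuracy; the second credits ensemble spread.
    """
    m = len(members)
    accuracy = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread


# Hypothetical rain rates (mm/h) from a 4-member ensemble and one observation.
print(ensemble_crps([0.0, 0.4, 1.2, 2.0], obs=0.8))
```

With a single member the spread term vanishes and the score reduces to the absolute error, which is one way to see why CRPS generalises a deterministic loss.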

2606.18557 2026-06-18 cs.AI cs.LG cs.LO 交叉投稿

DeFAb: A Verifiable Benchmark for Defeasible Abduction in Foundation Models

DeFAb:基础模型中可废止溯因的可验证基准

Patrick Cooper, Alvaro Velasquez

发表机构 * University of Colorado Boulder(科罗拉多大学博尔德分校)

AI总结 提出DeFAb基准,通过将知识库转换为可验证的溯因实例,评估基础模型在可废止推理中的创造力与理论推理能力,发现前沿模型准确率远低于符号求解器。

Comments 33 pages, 14 figures, 23 tables. Dataset: https://huggingface.co/datasets/PatrickAllenCooper/DeFAb ; code and evaluation harness: https://github.com/PatrickAllenCooper/blanc

详情
AI中文摘要

一个基于规则的逻辑求解器在不到50微秒内以100%的准确率解决了我们基准中的每个实例;而最佳前沿语言模型最高仅达65%,在渲染鲁棒评估下(四种表面渲染的最坏情况)降至23.5%。我们引入DeFAb(可废止溯因基准),这是一个数据集和生成流水线,将四十年来公共资助的知识库转换为形式化的可废止溯因实例:构建通过覆盖默认值来解释异常、同时保留无关期望的假设。由于每个假设必须通过多项式时间检查(有效推导、保守性和最小性),DeFAb将逻辑严谨性作为衡量创造性和理论推理的工具,评分的是理论修正的规范构建,而非流畅但破坏理论的散文。该流水线将分类层次结构(OpenCyc、YAGO、Wikidata)与行为属性图(ConceptNet、UMLS)配对,从18个来源生成372,648+个实例,涉及33.75M条实例化规则,分为三个级别,并具有多项式时间可验证的金标准。四个前沿模型未能可靠内化可废止推理:渲染鲁棒的Level 2准确率为7.8-23.5%;思维链方差(约36个百分点)超过任何模型间差距;匹配的污染控制隔离出+19.4个百分点的Level 3差距。我们进一步发布了DeFAb-Hard(235个实例的Level 3难度变体;最佳模型53.3% vs 符号100%)和CONJURE(一个内核验证的变革性创造力变体,包含560个Lean 4/Mathlib实例,其金标准答案是证明内核先前未包含的定义,验证器无需评判者;试点发现零新概念)。同一验证器还可作为偏好优化(DPO、RLVR/GRPO)的精确奖励。基于MIT许可发布于 https://huggingface.co/datasets/PatrickAllenCooper/DeFAb。

英文摘要

A rule-based logic solver resolves every instance in our benchmark in under 50 microseconds with 100% accuracy; the best frontier language model reaches 65% at best and drops to 23.5% under rendering-robust evaluation (worst case over four surface renderings). We introduce DeFAb (Defeasible Abduction Benchmark), a dataset and generation pipeline that converts four decades of publicly funded knowledge bases into formally grounded instances for defeasible abduction: constructing hypotheses that explain anomalies by overriding defaults while preserving unrelated expectations. Because every hypothesis must pass polynomial-time checks for valid derivation, conservativity, and minimality, DeFAb makes logical rigor the instrument for measuring creativity and theoretical reasoning, scoring the disciplined construction of theory revisions rather than fluent but theory-destroying prose. The pipeline pairs taxonomic hierarchies (OpenCyc, YAGO, Wikidata) with behavioral property graphs (ConceptNet, UMLS) to produce 372,648+ instances across 33.75M materialized rules from 18 sources, in three levels with polynomial-time verifiable gold standards. Four frontier models do not reliably internalize defeasible reasoning: rendering-robust Level 2 accuracy is 7.8-23.5%; chain-of-thought variance (~36 pp) exceeds any inter-model gap; and a matched contamination control isolates a +19.4 pp Level 3 gap. We further release DeFAb-Hard (a 235-instance Level 3 difficulty variant; best model 53.3% vs 100% symbolic) and CONJURE (a kernel-verified transformative-creativity variant of 560 Lean 4/Mathlib instances whose gold answers are definitions the proof kernel did not previously contain, judge-free verifier; a pilot finds zero novel concepts). The same verifier doubles as an exact reward for preference optimization (DPO, RLVR/GRPO). Released under MIT at https://huggingface.co/datasets/PatrickAllenCooper/DeFAb.

2606.18686 2026-06-18 cs.AI cs.CL cs.LG 交叉投稿

ForecastBench-Sim: A Simulated-World Forecasting Benchmark

ForecastBench-Sim:一个模拟世界预测基准

Jaeho Lee, Nick Merrill, Ezra Karger

发表机构 * Forecasting Research Institute(预测研究所)

AI总结 提出基于Freeciv游戏模拟的预测基准ForecastBench-Sim,通过游戏回滚生成可控、即时可解的预测问题,用于评估AI系统的概率推理能力。

Comments 15 pages, 5 main figures, 6 appendix figures. Spotlight presentation at Forecasting as a New Frontier of Intelligence / Workshop on AI Forecasting, ICML 2026

详情
AI中文摘要

通用AI系统的预测基准通常继承现实世界的约束:结果缓慢显现、尾部事件罕见、反事实问题难以评分。我们引入ForecastBench-Sim,一个基于Freeciv(一款以文明系列为模型的回合制策略游戏)游戏回滚的模拟世界预测基准。预测者接收固定的世界报告(当前游戏状态的结构化快照),并回答关于隐藏未来状态的问题;然后基准继续模拟并对预测进行评分。由于世界是模拟的,同一设置可以生成任意时间跨度的连续或二元预测问题、用于条件或因果问题的配对干预世界,以及罕见或破坏性结果的已解决示例。我们描述了基准流程、问题族、评分协议和发布工件,并报告了来自模型评估和匿名人工试点的验证切片。ForecastBench-Sim旨在通过提供受控、即时可解的任务来补充现实世界预测基准,用于研究动态世界状态下的概率推理。

英文摘要

Forecasting benchmarks for general-purpose AI systems usually inherit the constraints of the real world: outcomes resolve slowly, tail events are rare, and counterfactual questions are difficult to score. We introduce ForecastBench-Sim, a simulated-world forecasting benchmark built on game rollouts from Freeciv, a turn-based strategy game modelled on the Civilization series. Forecasters receive a fixed world report (a structured snapshot of the current game state) and answer questions about hidden future states; the benchmark then continues the simulation and scores forecasts. Because the world is simulated, the same setup can generate continuous or binary forecasting questions at arbitrary time horizons, paired intervention worlds for conditional or causal questions, and resolved examples of rare or disruptive outcomes. We describe the benchmark pipeline, question families, scoring protocol, and release artifacts, and report validation slices from model evaluations and an anonymized human pilot. ForecastBench-Sim is intended to complement real-world forecasting benchmarks by providing controlled, immediately resolvable tasks for studying probabilistic reasoning under dynamic world states.

2606.18729 2026-06-18 stat.ML cs.LG 交叉投稿

TimeLAVA: Learning-Agnostic Data Valuation for Time Series

TimeLAVA: 时间序列的学习无关数据估值

Wenqin Liu, Weizhi Quan, Aoqi Zuo, Erdun Gao, Vu Nguyen, Dino Sejdinovic, Howard Bondell, Mingming Gong

发表机构 * School of Mathematics and Statistics, The University of Melbourne(墨尔本大学数学与统计学学院) Statistics, The University of Melbourne(墨尔本大学统计学系) Statistics, University of Sydney(悉尼大学统计学系) Responsible AI Research Centre, Australian Institute for Machine Learning(澳大利亚机器学习研究所负责任人工智能研究中心) Amazon(亚马逊) School of Mathematical Sciences, Adelaide University(阿德莱德大学数学科学学院) Department of Machine Learning, MBZUAI(MBZUAI机器学习系)

AI总结 提出TimeLAVA,一种学习无关框架,通过小波变换和最优传输评估时间序列片段对分布差异的边际贡献,无需模型训练,在异常检测、数据剪枝和标签噪声检测中优于现有方法。

Comments 34 pages

详情
Journal ref
ICML 2026
AI中文摘要

数据估值量化单个样本的内在质量,以实现原则性的数据整理、质量控制和鲁棒学习。对于医疗、金融和工业监控等关键领域的时间序列,有效的估值方法至关重要但基本缺乏。现有方法要么依赖于模型,限制了其泛化性,要么针对独立同分布数据设计,因此无法捕捉序列数据固有的时间依赖性、多尺度模式和非平稳动态。我们引入了TimeLAVA,一种学习无关框架,通过评估时间片段对最小化评估数据与参考数据之间分布差异的边际贡献来估值。其核心是一种新颖的基于选择性小波的Wasserstein差异,结合了用于时间定位的多尺度小波变换和用于对分布偏移具有鲁棒性的非平衡最优传输。通过敏感性分析高效计算片段值,无需模型训练,并聚合成逐点得分。我们提供了将估值与模型无关泛化联系起来的理论保证,并证明了对异常值污染的有界敏感性。在异常检测、数据剪枝和标签噪声检测上的大量实验表明,TimeLAVA在多样化的真实世界数据集上产生了比现有方法显著更具信息量的价值分数。

英文摘要

Data valuation quantifies the intrinsic quality of individual samples to enable principled data curation, quality control, and robust learning. For time series in critical domains such as healthcare, finance, and industrial monitoring, effective valuation methods are essential yet fundamentally lacking. Existing approaches are either model-dependent, limiting their generalizability, or designed for i.i.d. data and thus fail to capture temporal dependencies, multi-scale patterns, and non-stationary dynamics inherent to sequential data. We introduce TimeLAVA, a learning-agnostic framework that values temporal segments by their marginal contribution to minimizing distributional discrepancy between evaluated and reference data. At its core is a novel Selective Wavelet-based Wasserstein discrepancy combining multi-scale wavelet transforms for temporal localization with unbalanced optimal transport for robustness to distributional shifts. Segment values are efficiently computed via sensitivity analysis without requiring model training and aggregated into point-wise scores. We provide theoretical guarantees linking valuation to model-agnostic generalization and prove bounded sensitivity to outlier contamination. Extensive experiments across anomaly detection, data pruning, and label noise detection demonstrate that TimeLAVA produces significantly more informative value scores than existing methods on diverse real-world datasets.

2606.18750 2026-06-18 stat.AP cs.LG 交叉投稿

Ensuring Trustworthy Online A/B Testing: Addressing Five Key Questions on CUPED

确保可信的在线A/B测试:解决关于CUPED的五个关键问题

Yu Zhang, Bokui Wan, Yongli Qin, Jinyong Ma, Yifan Guo

AI总结 本文系统解决CUPED应用中五个常见但被忽视的问题,包括最优调整规范、回归调整有效性、鲁棒方差估计,并扩展到多臂实验和两阶段抽样设计,通过理论分析和实验验证提供可靠方法,已在字节跳动平台部署。

Comments 15 pages, 3 figures

详情
AI中文摘要

A/B测试已成为大规模在线实验中数据驱动决策的金标准,为功能发布、定价优化和用户体验提升提供关键指导。为最大化统计灵敏度,许多科技公司常规使用实验前数据控制实验(CUPED),该技术实现大幅方差缩减,同时保持平均处理效应估计的无偏性。尽管被广泛采用,CUPED的几个关键方法和实践细节仍未充分探索。本文系统解决了关于CUPED应用的五个常见但被忽视的问题。首先,我们提供各种后CUPED估计量的比较分析,以确定最优调整规范。其次,我们评估基于回归的调整的有效性,并描述为此类框架定制的鲁棒方差估计方法。最后,我们将研究扩展到复杂但常见的场景,包括多臂实验和两阶段抽样设计。我们的发现表明,在这些设置中,天真地依赖标准方差估计量可能导致严重误导的推断。通过提供严格的理论见解和广泛的实验验证,本工作加深了对CUPED的概念理解。值得注意的是,推荐的方法已成功部署并集成到字节跳动的实验平台中。

英文摘要

A/B testing has become the gold standard for data-driven decision-making in large-scale online experimentation, providing critical guidance for feature launch, pricing optimization, and user experience enhancement. To maximize statistical sensitivity, many technology companies routinely employ Controlled-experiment Using Pre-Experiment Data (CUPED), a technique that achieves substantial variance reduction while preserving the unbiasedness of estimating the average treatment effect. Despite its widespread adoption, several critical methodological and practical nuances of CUPED remain underexplored. This paper systematically addresses five frequently encountered yet overlooked questions regarding the application of CUPED. First, we provide a comparative analysis of various post-CUPED estimators to identify the optimal adjustment specification. Second, we evaluate the validity of regression-based adjustments and delineate robust variance estimation methods tailored for such frameworks. Finally, we extend our investigation to complex but common scenarios, including multi-arm experiments and two-stage sampling designs. Our findings reveal that in these settings, naive reliance on standard variance estimators can lead to severely misleading inferences. By offering rigorous theoretical insights and extensive experimental validation, this work deepens the conceptual understanding of CUPED. Notably, the recommended methodologies have been successfully deployed and integrated into ByteDance's experimentation platform.
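
For readers unfamiliar with CUPED, the basic adjustment can be sketched in a few lines. This is a minimal sketch assuming a single pre-experiment covariate and the textbook coefficient theta = Cov(Y, X) / Var(X). It does not reproduce the post-CUPED estimators, regression adjustments, or variance corrections the paper analyses, and the function name and data are hypothetical.

```python
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return (adjusted_metric, theta) for metric y and pre-experiment covariate x.

    y_adj = y - theta * (x - mean(x)), with theta = Cov(y, x) / Var(x).
    The adjustment leaves the sample mean of y unchanged and removes the
    part of its variance that is linearly explained by x.
    Assumes x is not constant (Var(x) > 0).
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    theta = cov / variance(x)
    adjusted = [b - theta * (a - mx) for a, b in zip(x, y)]
    return adjusted, theta


# Hypothetical per-user data: x is the pre-period metric, y the in-experiment metric.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
y_adj, theta = cuped_adjust(y, x)
print(theta, variance(y), variance(y_adj))
```

In a real A/B test theta is typically estimated on pooled data from all arms and the treatment effect is the difference of adjusted means. The abstract's warning is that the variance of that difference needs care in multi-arm and two-stage sampling designs.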

2606.18972 2026-06-18 stat.ML cs.LG 交叉投稿

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

FOSC-X: 一种用于从聚类层次结构中提取最优局部切割和非水平聚类的扩展框架

Connor Simpson, Ricardo J. G. B. Campello

AI总结 提出FOSC-X框架,通过动态规划从层次聚类树中提取前M个全局最优的局部非水平切割聚类,支持聚类数约束,在线性时间内保证最优排序。

详情
AI中文摘要

从层次结构中提取平坦聚类解是实际聚类分析中的常见任务,可表述为优化问题。现有方法侧重于寻找单个最优解。我们引入FOSC-X,一个从层次聚类树的局部非水平切割中提取前M个全局最优平坦聚类的框架,同时可选地对聚类数量施加约束。这使得能够自动识别多个高质量替代聚类,捕捉层次结构的不同方面。无约束时,利用子树内局部最优部分候选可组合成全局最优解并自动确定聚类数的性质,通过动态规划在多项式时间内求解前M问题。然而,这可能导致聚类数最终不理想——例如,在特定应用领域中过大而失去意义或难以实际分析。施加聚类数约束破坏了无约束动态规划方法的最优性性质,因为局部最优部分候选可能不再能组合成可行的全局最优解。FOSC-X通过一种动态规划策略应对这一挑战,该策略使用可行性的下界和上界维护紧凑的可行候选集,同时剪枝不可行或占优的组合。所得方法保证在有无聚类数约束下,均以聚类节点数和数据集大小的线性时间复杂度获得前M个解的最优排序。实验表明,FOSC-X能有效揭示单解提取方法忽略的替代聚类结构。

英文摘要

Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable -- e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.
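
The unconstrained dynamic programme that the abstract builds on can be sketched for the single-best case: at each node, keep either the node itself as one cluster or the best selections from its subtrees, whichever scores higher. This is a minimal sketch assuming an additive per-cluster quality score. It is not the top-M or cluster-count-constrained FOSC-X algorithm, and the tree, scores, and function name are hypothetical.

```python
def best_flat_clustering(tree, score, root):
    """Return (total_score, selected_nodes) for the subtree rooted at `root`.

    tree:  dict mapping a node to the list of its children (leaves may be absent).
    score: dict mapping every node to its cluster quality (assumed additive).
    """
    children = tree.get(root, [])
    if not children:
        return score[root], [root]

    # Combine the locally optimal selections of all child subtrees.
    child_total, child_nodes = 0.0, []
    for child in children:
        sub_total, sub_nodes = best_flat_clustering(tree, score, child)
        child_total += sub_total
        child_nodes += sub_nodes

    # Keep the node as one cluster unless its descendants do strictly better.
    if score[root] >= child_total:
        return score[root], [root]
    return child_total, child_nodes


# Hypothetical hierarchy: r -> (a, b), a -> (a1, a2).
tree = {"r": ["a", "b"], "a": ["a1", "a2"]}
score = {"r": 1.0, "a": 3.0, "a1": 1.0, "a2": 1.5, "b": 2.0}
print(best_flat_clustering(tree, score, "r"))
```

The selected nodes sit at different depths of the tree, which is what makes the cut local and non-horizontal. The number of clusters falls out of the optimisation rather than being fixed in advance, which is the behaviour that cluster-count constraints disrupt.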

2606.19057 2026-06-18 stat.ML cs.LG stat.CO stat.ME 交叉投稿

Quantifying and Auditing LLM Evaluation via Positive--Unlabeled Learning

通过正-无标签学习量化与审计大语言模型评估

Zilong Zhang, Yi-Ting Hung, Lei Ding, Chi-Kuang Yeh

AI总结 针对大语言模型作为评估者存在的系统性偏差(如冗长偏好),提出基于部分最优传输的几何审计框架,利用少量人工验证正样本校正偏差,无需重训练即可提升与人类偏好的一致性。

详情
AI中文摘要

大语言模型(LLM)越来越多地被用作可扩展评估的评判者,然而这种LLM作为评判者的系统表现出与语义质量脱节的系统性偏差,最显著的是冗长偏差。同时,人工监督成本高昂且通常具有选择性,产生可靠的正向判断,但大多数输出未被标记且质量可能参差不齐。我们将选择性人工监督下的LLM评估形式化为一个正-无标签学习问题,并提出了一个基于部分最优传输的几何审计框架。通过在固定嵌入空间中将一小部分人工验证的正样本与可靠的无标签输出子集对齐,我们的方法识别出与人类一致的偏好,并在无需重新训练的情况下纠正有偏的评判者。实验表明,该方法提高了与人类偏好的一致性,增强了对呈现偏差的鲁棒性,并提供了可解释的置信度估计,为现有的LLM作为评判者流程提供了一种可扩展且统计上有依据的替代方案。

英文摘要

Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive-unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human-verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human-consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM-as-a-judge pipelines.

2606.19184 2026-06-18 cs.CV cs.LG 交叉投稿

When AUC Misleads: Polarization-Aware Evaluation of Deepfake Detectors under Domain Shift

当AUC误导:域偏移下深度伪造检测器的极化感知评估

Dat Nguyen, Cosmin Radoi, Romain Hermary, Marcella Astrid, Nesryne Mejri, Enjie Ghorbel, Djamila Aouada

发表机构 * Cristal Laboratory, National School of Computer Sciences, University of Manouba(马努巴大学国家计算机科学学院Cristal实验室)

AI总结 针对现有AUC评估无法反映真实场景中混合数据源和不同伪影类型的问题,提出Cross-dataset AUC(Cross-AUC)指标,通过平均每域AUC并引入预测极化度量(Wasserstein距离)来评估域偏移鲁棒性,实验证明其有效性。

详情
AI中文摘要

生成式AI的最新进展,如扩散模型和换脸工具,使得创建高度逼真的深度伪造成为可能,导致了包括金融欺诈和非自愿色情内容在内的现实危害。为此,深度伪造检测成为一个活跃的研究领域,近期方法越来越关注提高对未见操作的泛化能力。这通常通过跨多个数据集分别测量的ROC曲线下面积(AUC)来评估。然而,这种评估未能反映检测器面对混合数据源和不同伪影类型的真实场景。为解决这一局限,我们引入一种新指标——跨数据集AUC(Cross-AUC),该指标平均每域AUC并加入预测极化度量,以考虑对域偏移的鲁棒性。极化程度通过类别分数分布之间的Wasserstein距离量化。Cross-AUC不仅更真实地评估深度伪造检测器在域偏移下的泛化能力,而且具有可解释性,因为它能更好地解释性能下降的原因。在七个基准数据集上的实验证明了其实用性。

英文摘要

Recent advances in generative AI, such as diffusion models and face-swapping tools, have enabled the creation of highly realistic deepfakes, leading to real-world harms including financial fraud and non-consensual explicit content. In response, deepfake detection has become an active research area, with recent methods increasingly focusing on improving generalization to unseen manipulations. This is typically evaluated using the Area Under the ROC Curve (AUC) measured separately across multiple datasets. However, such an evaluation fails to reflect real-world scenarios where detectors face a mixture of data sources and varying artifact types. To address this limitation, we introduce a novel metric, Cross-dataset AUC (Cross-AUC) that averages per-domain AUCs with a measure of prediction polarization for taking into account the robustness to domain shift. The polarization extent is quantified by the Wasserstein Distance between class score distributions. Cross-AUC not only assesses the generalization capabilities of deepfake detectors under domain shifts more realistically, but it is also interpretable as it better explains the reason behind a drop in performance. Experiments performed on seven benchmark datasets demonstrate its practical relevance.
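
The two ingredients named in the abstract, a per-domain AUC and a Wasserstein distance between class score distributions, can be computed as sketched below. The abstract does not give the formula that combines them into Cross-AUC, so this sketch only reports both quantities per domain. The function names and scores are hypothetical, and higher scores are assumed to mean "fake".

```python
from bisect import bisect_right


def auc(real_scores, fake_scores):
    """AUC as the probability that a fake sample outscores a real one (ties count 0.5)."""
    wins = sum((f > r) + 0.5 * (f == r) for f in fake_scores for r in real_scores)
    return wins / (len(fake_scores) * len(real_scores))


def wasserstein_1d(u, v):
    """1-D Wasserstein-1 distance: area between the two empirical CDFs."""
    u, v = sorted(u), sorted(v)
    points = sorted(set(u + v))
    dist = 0.0
    for left, right in zip(points, points[1:]):
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        dist += abs(cdf_u - cdf_v) * (right - left)
    return dist


def per_domain_report(domains):
    """Map each domain name to (AUC, distance between real and fake score distributions)."""
    return {
        name: (auc(real, fake), wasserstein_1d(real, fake))
        for name, (real, fake) in domains.items()
    }


# Hypothetical detector scores for two datasets: (real_scores, fake_scores).
domains = {
    "dataset_A": ([0.1, 0.2, 0.3], [0.7, 0.8, 0.9]),
    "dataset_B": ([0.45, 0.5, 0.55], [0.5, 0.55, 0.6]),
}
for name, (domain_auc, separation) in per_domain_report(domains).items():
    print(name, round(domain_auc, 3), round(separation, 3))
```

A detector can rank samples well within a domain while the gap between its real and fake scores shrinks or shifts from one domain to another. A single decision threshold on mixed data would be affected by that, which a per-dataset AUC alone does not reveal.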

2606.19245 2026-06-18 cs.AI cs.LG 交叉投稿

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

TxBench-PP:分析AI代理在小分子临床前药理学中的表现

Hannah Le, Ramesh Ramasamy, Alex Urrutia, Mahsa Yazdani, Tim Proctor, Kenny Workman

发表机构 * LatchBio

AI总结 提出TxBench-PP基准,用于评估AI代理从真实实验数据中恢复临床前药理学结论的能力,测试显示最强配置Claude Opus 4.8 / Pi仅通过59.3%的端点尝试。

详情
AI中文摘要

人工智能(AI)代理有望通过压缩解释和决策循环来加速药物发现,但实际部署需要基于现实程序决策的可信评估。我们引入了TherapeuticsBench临床前药理学(TxBench-PP),这是一个针对小分子临床前药理学的可验证基准,也是更广泛的TherapeuticsBench在药物发现阶段和治疗模式中的首个聚焦切片。TxBench-PP测试代理是否能够从真实实验数据中恢复准确的结论,而非从文献中记忆的事实。该基准包含100个评估,按程序阶段、实验类型和任务结构索引,涵盖作用机制(MoA)和药效学(PD)推理、化合物-靶点结合、因果靶点验证、可开发性与安全性以及转化疗效。代理接收现实的工作流程快照,在编码环境中检查文件,并返回确定性评分的结构化答案。在16个模型-工具配置(包括11个模型和4,800条轨迹)中,没有系统能够可靠地恢复临床前药理学决策。最强配置Claude Opus 4.8 / Pi通过了59.3%的端点尝试(178/300;95% CI, 51.1-67.6),其次是GPT-5.5 / Pi,为55.3%(166/300;47.0-63.6)。

英文摘要

Artificial intelligence (AI) agents promise to accelerate drug discovery by compressing interpretation and decision-making loops, but practical deployment requires trusted evaluation on realistic program decisions. We introduce TherapeuticsBench Preclinical Pharmacology (TxBench-PP), a verifiable benchmark for small-molecule preclinical pharmacology and the first focused slice of a broader TherapeuticsBench effort across drug-discovery stages and therapeutic modalities. TxBench-PP tests whether agents can recover accurate conclusions from real-world assay data rather than memorized facts from literature. The benchmark contains 100 evaluations indexed by program stage, assay type, and task structure, spanning mechanism-of-action (MoA) and pharmacodynamic (PD) reasoning, compound-target engagement, causal target validation, developability and safety, and translational efficacy. Agents receive realistic workflow snapshots, inspect files in a coding environment, and return structured answers graded deterministically. Across 16 model-harness configurations, comprising 11 models and 4,800 trajectories, no system reliably recovered preclinical pharmacology decisions. The strongest configuration, Claude Opus 4.8 / Pi, passed 59.3% of endpoint attempts (178/300; 95% CI, 51.1-67.6), followed by GPT-5.5 / Pi at 55.3% (166/300; 47.0-63.6).

2606.19334 2026-06-18 cs.CL cs.CY cs.LG 交叉投稿

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

用LOCUS解放法律:美国地方条例语料库

Denis Peskoff, Joe Barrow, Christopher Vu, Diag Davenport

发表机构 * UC Berkeley(加州大学伯克利分校) School of Information(信息学院) Independent(独立研究者)

AI总结 为解决美国地方条例缺乏机器可读语料的问题,构建了包含9239个市县条例的LOCUS语料库,并训练ModernBERT分类器以分析法律透明度等维度。

Comments 14 pages, 6 figures

详情
AI中文摘要

法律人工智能的进展越来越依赖于大规模获取权威法律文本。然而,美国法律中最具影响力的层级之一——地方条例——在很大程度上仍然缺失于现有的机器可读语料库中。地方法规管辖着分区、住房、商业许可、公共卫生、噪音、动物控制以及许多其他日常监管领域,但它们分散在专为人类浏览而非批量研究访问设计的供应商平台上。我们引入了LOCUS——美国地方条例语料库——一个全面的语料库和县级统一访问层,用于美国市和县条例。原始语料库可向研究人员发布,几乎涵盖了所有公开可用的市和县条例。由此产生的原始语料库包含来自9239个城市和县的法规。一个较小的县级统一LOCUS访问层覆盖了美国3144个县中最大的2309个,覆盖了大部分人口。我们使用OCR来处理使法律无法成为公共资源的各种文档格式。我们发布了带有覆盖元数据的语料库,以支持可重复性、下游法律AI研究以及逐步扩展对地方法律的机器可读访问。我们训练了一系列基于ModernBERT的分类器和评分器,以便从多个维度分析美国地方法律,例如不透明性和家长式作风,这些维度以前从未在此规模上研究过。LOCUS-v1及其衍生模型可在以下网址获取:https://huggingface.co/datasets/LocalLaws/LOCUS-v1

英文摘要

Progress in legal AI increasingly depends on access to authoritative legal text at scale. Yet one of the most consequential layers of American law remains largely absent from existing machine-readable corpora: local ordinances. Local codes govern zoning, housing, business licensing, public health, noise, animal control, and many other domains of everyday regulation, but they are fragmented across vendor platforms designed for human browsing rather than bulk research access. We introduce LOCUS - the Local Ordinance Corpus for the United States - a comprehensive corpus and county-harmonized access layer for U.S. municipal and county ordinance codes. The raw corpus, available for release to researchers, represents nearly all publicly available municipal and county ordinance codes. The resulting raw corpus contains codes from 9,239 cities and counties. A smaller county-harmonized LOCUS access layer provides coverage for the largest 2,309 of 3,144 U.S. counties, accounting for a majority of the population. We use OCR to handle the myriad of document formats that have kept the law from being a public resource. We release the corpus with coverage metadata to support reproducibility, downstream legal AI research, and the incremental expansion of machine-readable access to local law. We train a collection of ModernBERT-based classifiers and scorers to facilitate analyzing U.S. local law among several dimensions, such as opacity and paternalism, that have not previously been studied at this scale. LOCUS-v1 and its derivative models are available at: https://huggingface.co/datasets/LocalLaws/LOCUS-v1

2406.14399 2026-06-18 cs.LG cs.CV physics.ao-ph stat.ML 版本更新

Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

面向全球站点业务天气预报的物理信息时间序列模型基准测试

Tao Han, Zhibin Wen, Zhenghao Chen, Dazhao Du, Song Guo, Lei Bai

发表机构 * Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong SAR China(香港科技大学计算机科学与工程系) Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China(南方科技大学计算机科学与工程系) School of Computer and Information Sciences, University of Newcastle, Newcastle, Australia(纽卡斯尔大学计算机与信息科学学院) Hangzhou Innovation Institute of Beihang University, Hangzhou, China(北京航空航天大学杭州创新研究院) Shanghai Artificial Intelligence Laboratory, Shanghai, China(上海人工智能实验室)

AI总结 提出大规模观测数据集WEATHER-5K和物理信息模型PhysicsFormer,通过压力-风对齐和能量感知平滑损失增强物理一致性,在多个天气变量和极端事件预测上评估学术模型与业务系统的差距。

Comments Accepted by ICML 2026

详情
AI中文摘要

时间序列预测(TSF)模型的发展常受限于缺乏全面的数据集,尤其是在全球站点天气预报(GSWF)中,现有数据集规模小、时间短且空间稀疏。为解决这一问题,我们引入了WEATHER-5K,一个大规模观测天气数据集,能更好地反映真实世界条件,支持改进模型训练和评估。尽管最近的TSF方法在基准测试上表现良好,但在捕捉复杂天气动态和极端事件方面落后于业务数值天气预报系统。我们提出了PhysicsFormer,一种物理信息预测模型,结合动态核心与Transformer残差来预测未来天气状态。通过压力-风对齐和能量感知平滑损失强制物理一致性,确保在捕捉复杂时间模式的同时保持合理的动力学。我们将PhysicsFormer及其他TSF模型与业务系统在多个天气变量、极端事件预测和模型复杂度上进行基准测试,全面评估学术TSF模型与业务预报之间的差距。数据集和基准测试实现可在以下网址获取:https://github.com/taohan10200/WEATHER-5K。

英文摘要

The development of Time-Series Forecasting (TSF) models is often constrained by the lack of comprehensive datasets, especially in Global Station Weather Forecasting (GSWF), where existing datasets are small, temporally short, and spatially sparse. To address this, we introduce WEATHER-5K, a large-scale observational weather dataset that better reflects real-world conditions, supporting improved model training and evaluation. While recent TSF methods perform well on benchmarks, they lag behind operational Numerical Weather Prediction systems in capturing complex weather dynamics and extreme events. We propose PhysicsFormer, a physics-informed forecasting model combining a dynamic core with a Transformer residual to predict future weather states. Physical consistency is enforced via pressure-wind alignment and energy-aware smoothness losses, ensuring plausible dynamics while capturing complex temporal patterns. We benchmark PhysicsFormer and other TSF models against operational systems across several weather variables, extreme event prediction, and model complexity, providing a comprehensive assessment of the gap between academic TSF models and operational forecasting. The dataset and benchmark implementation are available at: https://github.com/taohan10200/WEATHER-5K.

2508.20330 2026-06-18 cs.LG 版本更新

FORGE: Foundational Optimization Representations from Graph Embeddings

FORGE:基于图嵌入的基础优化表示

Zohair Shafi, Serdar Kadioglu

发表机构 * Khoury College of Computer Science Northeastern University(东北大学Khoury计算机科学学院) AI Center of Excellence, Fidelity Investments(富达投资人工智能卓越中心) Department of Computer Science, Brown University(布朗大学计算机科学系)

AI总结 提出FORGE框架,通过无监督预训练向量量化图自编码器学习混合整数规划实例的通用表示,无需求解器或最优解,在下游任务中提升求解器性能并超越现有方法。

Comments Published in TMLR

详情
AI中文摘要

组合优化问题在科学和工程中无处不在。然而,基于学习的加速组合优化方法通常需要求解大量困难实例来收集训练数据,导致显著的计算成本。现有的学习方法需要为每个问题分布和每个下游任务训练专用模型,严重限制了其可扩展性和泛化能力。我们提出Forge:基于图嵌入的基础优化表示,这是一个框架,它在大规模、多样化的混合整数规划(MIP)实例集合上以无监督方式预训练向量量化图自编码器,不依赖优化求解器或最优解。向量量化产生离散的代码分配,作为表示优化实例的词汇表。我们在无监督和有监督设置下评估Forge。在无监督设置中,Forge嵌入能有效聚类跨问题领域和规模的未见实例。在有监督设置中,我们微调Forge嵌入,并展示单个预训练模型既有助于预测用于割生成的整数性间隙,也有助于预测用于搜索指导的变量提示,且适用于多个问题和规模分布。在这两个任务中,我们提升了商业优化求解器的性能,并超越了最先进的基于学习的方法。最后,我们开源训练代码、预训练Forge权重和多个MIP分布的嵌入,以促进优化问题表示学习的进一步研究:https://skadio.github.io/forge/

英文摘要

Combinatorial optimization problems are ubiquitous in science and engineering. Still, learning-based approaches to accelerate combinatorial optimization often require solving a large number of difficult instances to collect training data, incurring significant computational cost. Existing learning-based methods require training dedicated models for each problem distribution and each downstream task, severely limiting their scalability and generalization. We introduce Forge: Foundational Optimization Representations from Graph Embeddings, a framework that pre-trains a vector-quantized graph autoencoder on a large, diverse collection of mixed-integer programming (MIP) instances in an unsupervised manner, without relying on optimization solvers or optimal solutions. Vector quantization produces discrete code assignments that serve as a vocabulary for representing optimization instances. We evaluate Forge in both unsupervised and supervised settings. In the unsupervised setting, Forge embeddings effectively cluster unseen instances across problem domains and sizes. In the supervised setting, we fine-tune Forge embeddings and show that a single pre-trained model helps predict both the integrality gap for cut generation and variable hints for search guidance across multiple problem and size distributions. In both tasks, we improve the performance of a commercial optimization solver and outperform state-of-the-art learning-based methods. Finally, we open-source our training code, pre-trained Forge weights, and embeddings for multiple MIP distributions to foster further research in representation learning for optimization problems: https://skadio.github.io/forge/
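
The vector-quantization step, which turns continuous embeddings into discrete codes that act as a vocabulary, can be illustrated with a nearest-codebook lookup. This is a minimal sketch of generic vector quantization with a fixed codebook. It is not the Forge architecture or its training procedure, and the function name, codebook, and embeddings are hypothetical.

```python
def quantize(embeddings, codebook):
    """Assign each embedding the index of its nearest codebook vector.

    Distance is squared Euclidean; ties go to the lowest index.
    """
    codes = []
    for vec in embeddings:
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
        codes.append(dists.index(min(dists)))
    return codes


# Hypothetical 2-D codebook with three entries and four node embeddings.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
embeddings = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.8], [0.6, 0.7]]
print(quantize(embeddings, codebook))
```

In a learned quantizer the codebook is trained jointly with the encoder. The resulting code indices can then be counted or pooled per instance, much like tokens in a document.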

2509.02555 2026-06-18 cs.LG cs.AI cs.NE 版本更新

Surrogate Benchmarks for Model Merging Optimization

模型合并优化的替代基准

Rio Akizuki, Yuya Kudo, Nozomu Yoshinari, Yoichi Hirose, Toshiyuki Nishimoto, Kento Uchida, Shinichi Shirakawa

发表机构 * Yokohama National University(横滨国立大学)

AI总结 针对模型合并超参数优化计算成本高的问题,构建替代基准以低成本预测合并模型性能并模拟优化算法行为。

Comments AutoML 2025 Non-Archival Content Track. The code of the surrogate benchmark is available at https://github.com/shiralab/SMM-Bench

详情
AI中文摘要

模型合并技术旨在将多个模型的能力整合到一个模型中。大多数模型合并技术都有超参数,其设置会影响合并模型的性能。由于现有几项工作表明,调整模型合并中的超参数可以增强合并结果,因此为模型合并开发超参数优化算法是一个有前景的方向。然而,其优化过程计算成本高昂,特别是在合并大型语言模型时。在这项工作中,我们为合并超参数的优化开发了替代基准,以实现低成本的算法开发和性能比较。我们定义了两个搜索空间并收集数据样本,以构建替代模型来预测合并模型在给定超参数下的性能。我们证明了我们的基准能够很好地预测合并模型的性能,并模拟优化算法的行为。

英文摘要

Model merging techniques aim to integrate the abilities of multiple models into a single model. Most model merging techniques have hyperparameters, and their setting affects the performance of the merged model. Because several existing works show that tuning hyperparameters in model merging can enhance the merging outcome, developing hyperparameter optimization algorithms for model merging is a promising direction. However, its optimization process is computationally expensive, particularly in merging LLMs. In this work, we develop surrogate benchmarks for optimization of the merging hyperparameters to realize algorithm development and performance comparison at low cost. We define two search spaces and collect data samples to construct surrogate models to predict the performance of a merged model from a hyperparameter. We demonstrate that our benchmarks can predict the performance of merged models well and simulate optimization algorithm behaviors.

2509.22363 2026-06-18 cs.LG eess.AS 版本更新

Investigating Faithfulness in Large Audio Language Models

大型音频语言模型中的忠实性研究

Pooneh Mousavi, Lovenya Jain, Mirco Ravanelli, Cem Subakan

发表机构 * Concordia University(康科迪亚大学) Mila - Quebec AI Institute(魁北克人工智能研究院) Université Laval(拉瓦尔大学) Birla Institute of Technology and Science, Pilani(比拉理工学院和科学学院,皮兰尼)

AI总结 提出系统框架评估大型音频语言模型在推理链忠实性上的表现,定义三个音频忠实性标准,并通过基准测试发现模型推理与音频输入存在脱节。

Comments Accepted to Interspeech 2026

详情
AI中文摘要

大型音频语言模型(LALMs)将音频编码器与预训练的大型语言模型集成,以执行复杂的多模态推理任务。虽然这些模型可以生成思维链(CoT)解释,但这些推理链的忠实性仍不清楚。在这项工作中,我们提出了一个系统框架来评估LALMs中CoT在输入音频和最终模型预测方面的忠实性。我们定义了音频忠实性的三个标准:无幻觉、整体性和专注聆听。我们还引入了一个基于音频和CoT干预的基准来评估忠实性\footnote{基准测试界面和评估结果可在以下网址获取:https://poonehmousavi.github.io/faithfulness/}。在Audio Flamingo 3和Qwen2.5-Omni上的实验表明存在潜在的多模态脱节:推理通常与最终预测一致,但并不总是强烈基于音频,并且可能容易受到幻觉或对抗性扰动的影响。

英文摘要

Large Audio Language Models (LALMs) integrate audio encoders with pretrained Large Language Models to perform complex multimodal reasoning tasks. While these models can generate Chain-of-Thought (CoT) explanations, the faithfulness of these reasoning chains remains unclear. In this work, we propose a systematic framework to evaluate CoT faithfulness in LALMs with respect to both the input audio and the final model prediction. We define three criteria for audio faithfulness: hallucination-free, holistic, and attentive listening. We also introduce a benchmark based on both audio and CoT interventions to assess faithfulness\footnote{The benchmarking interface and evaluation results are available at https://poonehmousavi.github.io/faithfulness/}. Experiments on Audio Flamingo 3 and Qwen2.5-Omni suggest a potential multimodal disconnect: reasoning often aligns with the final prediction but is not always strongly grounded in the audio and can be vulnerable to hallucinations or adversarial perturbations.

2605.07022 2026-06-18 cs.LG 版本更新

Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale

自驱动数据集:从2000万篇论文到大规模精细化生物医学知识

Haydn Jones, Yimeng Zeng, Alden Rose, Li S. Yifei, Yining Huang, Kaiwen Wu, Jiaming Liang, Maggie Ziyu Huan, Yoseph Barash, Cesar de la Fuente-Nunez, Osbert Bastani, Zachary Ives, Mark Yatskar, Jacob R. Gardner

发表机构 * Department of Computer and Information Science, University of Pennsylvania(宾夕法尼亚大学计算机与信息科学系) Department of Genetics, University of Pennsylvania(宾夕法尼亚大学遗传学系) Departments of Bioengineering and Chemical and Biomolecular Engineering, University of Pennsylvania(宾夕法尼亚大学生物工程与化学与生物分子工程系)

AI总结 本文提出通过PubMed自动生成结构化数据集,实现更大规模、更精细和更准确的生物医学知识,展示Starling系统在多个任务中生成大规模数据集并提升准确性。

AI中文摘要

人工编纂的生物医学仓库在生物活性、基因组学和化学领域昂贵且滞后于原始文献,丢弃实验背景,掩盖了评估数据正确性和覆盖范围所需的细微差别。我们证明PubMed本身可以被自动且经济地转化为结构化数据集,这些数据集比它们取代的编纂数据库更大、更细致和更准确。我们提出了三个耦合贡献:(1)基于九个生物医学本体的LLM实体标记流水线,能够在包含2250万篇论文和2500亿个token的PubMed语料库中标记45亿个实体,跨19个类别;(2)混合稀疏密集检索支持在标记语料库上执行实体过滤的语义查询;(3)Starling,一个多代理深度研究系统,仅给定自然语言任务描述,即可设计精度和召回率目标的检索过滤器,诱导提取模式,并输出具有丰富细节字段和支持段落的结构化记录。在六个任务中——血脑屏障渗透性、口服生物利用度、急性毒性(LD50)、基因疾病关联、蛋白质亚细胞定位和化学反应——Starling生成约630万条记录(每任务91K至3M条);其中一些是目前最大的公开数据集。前沿模型对我们的提取的拒绝率在0.6-7.7%之间,远低于我们在广泛使用的编纂数据集上测量的错误率(例如,BBB_Martins为16.5%,Bioavailability_Ma为7.3%)。除了规模和准确性外,支持段落还携带了表格数据库所丢弃的细微差别——例如,口服生物利用度可能取决于进食与否的状态。共同,语料库、检索和代理为AI驱动的治疗设计建立了基础。代码和数据集:https://github.com/starling-labs/starling.

英文摘要

Manually curated biomedical repositories -- spanning bioactivity, genomics, and chemistry -- are expensive to maintain, lag behind primary literature, and discard experimental context, obscuring nuances needed to assess data correctness and coverage. We show that PubMed itself can be autonomously and cost-effectively turned into structured datasets that are larger, more nuanced, and more accurate than the curated databases they replace. We present three coupled contributions: (1) an LLM-based entity-tagging pipeline, grounded in nine biomedical ontologies, that tags 4.5B entities across 19 categories in a 22.5M-paper, 2.5T-token PubMed corpus; (2) hybrid sparse-dense retrieval supporting entity-filtered semantic queries over the tagged corpus; and (3) Starling, a multi-agent deep research system that, given only a natural-language task description, designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with nuance-rich fields and supporting passages. Across six tasks -- blood-brain barrier permeability, oral bioavailability, acute toxicity (LD50), gene-disease associations, protein subcellular localization, and chemical reactions -- Starling produces ~6.3M records (91K-3M per task); several are, to our knowledge, the largest public datasets for their property. Frontier-model rejection of our extractions is 0.6-7.7% across tasks, far below error rates we measure on widely used curated counterparts (e.g., 16.5% on BBB_Martins, 7.3% on Bioavailability_Ma). Beyond scale and accuracy, the supporting passages carry nuance tabular databases discard -- e.g., oral bioavailability may depend on fed vs. fasted state. Together, the corpus, retrieval, and agent establish a foundation for AI-driven therapeutic design. Code and datasets: https://github.com/starling-labs/starling.

2606.07591 2026-06-18 cs.LG cs.AI cs.CL 版本更新

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

ResearchClawBench: 端到端自主科学研究基准

Wanghan Xu, Shuo Li, Tianlin Ye, Qinglong Cao, Yixin Chen, Hengjian Gao, Yiheng Wang, Qi Li, Kun Li, Sheng Xu, Shengdu Chai, Fangchen Yu, Xiangyu Zhao, Zhangrui Zhao, Weijie Ma, Zijie Guo, Koutian Wu, Haoyu Zhou, Haoxiang Yin, Lixue Cheng, Chaofan Hu, Haoxuan Li, Lu Mi, Xuxuan Xie, Yifan Zhou, Ruizhe Chen, Zhiwang Zhou, Xingjian Guo, Yuhao Zhou, Xuming He, Shengyuan Xu, Xinyu Gu, Jiamin Wu, Mianxin Liu, Chunfeng Song, Fenghua Ling, Dongzhan Zhou, Shixiang Tang, Yuqiang Li, Mao Su, Peng Ye, Siqi Sun, Bin Wang, Xue Yang, Zhenfei Yin, Tianfan Fu, Guangtao Zhai, Wanli Ouyang, Bo Zhang, Lei Bai, Wenlong Zhang

发表机构 * Shanghai Artificial Intelligence Laboratory(上海人工智能实验室)

AI总结 提出ResearchClawBench基准,包含10个领域40个任务,通过多模态评分标准评估自主科研能力,最强智能体仅得21.5分,揭示当前系统在实验协议、证据匹配和科学核心方面的不足。

AI中文摘要

AI编码智能体越来越多地用于科学工作,但其端到端自主研究能力仍然难以验证。我们提出了ResearchClawBench,一个用于评估自主科学研究的基准,涵盖来自10个科学领域的40个任务。每个任务基于一篇真实发表论文,提供相关文献和原始数据,并在评估期间隐藏目标论文。专家策划的多模态评分标准将目标科学制品分解为加权标准,从而能够评估目标论文级别的重新发现,同时为新发现留出空间。我们在统一协议下评估了七个自主研究(auto-research)智能体,并通过轻量级ResearchHarness评估了十七个原生LLM。当前系统远未达到可靠的重新发现:最强的自主智能体Claude Code平均得分为21.5,最强的ResearchHarness LLM Claude-Opus-4.7平均得分为20.7,LLM前沿均值仅为26.5。错误分析表明,失败集中在实验协议不匹配、证据不匹配和缺失科学核心。ResearchClawBench为衡量自主科学研究进展提供了一个可复现的评估前沿。

英文摘要

AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capability remains difficult to verify. We present ResearchClawBench, a benchmark for evaluating autonomous scientific research across 40 tasks from 10 scientific domains. Each task is grounded in a real published paper, provides related literature and raw data, and hides the target paper during evaluation. Expert-curated multimodal rubrics decompose the target scientific artifacts into weighted criteria, enabling evaluation of target-paper-level re-discovery while leaving room for new discovery. We evaluate seven autonomous research (auto-research) agents under a unified protocol and seventeen native LLMs through the lightweight ResearchHarness. Current systems remain far from reliable re-discovery: the strongest autonomous agent, Claude Code, averages 21.5, and the strongest ResearchHarness LLM, Claude-Opus-4.7, averages 20.7, with an LLM frontier mean of only 26.5. Error analysis shows that failures concentrate in experimental protocol mismatch, evidence mismatch, and missing scientific core. ResearchClawBench provides a reproducible evaluation frontier for measuring progress toward autonomous scientific research.

2407.18245 2026-06-18 cs.CV cs.LG 版本更新

VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset

VGGHeads: 基于大规模合成数据集的3D多头部对齐

Orest Kupyn, Eugene Khvedchenia, Christian Rupprecht

发表机构 * University of Oxford(牛津大学) Piñata Farms Ukrainian Catholic University(乌克兰天主教大学)

AI总结 提出VGGHeads,一个由扩散模型生成的大规模合成数据集,用于单步同时进行头部检测和3D网格重建,在真实图像上表现优异。

AI中文摘要

人类头部检测、关键点估计和3D头部模型拟合是许多应用中的基本任务。然而,传统的真实世界数据集常常存在偏差、隐私和伦理问题,并且是在实验室环境中记录的,这使得训练出的模型难以泛化。在这里,我们介绍VGGHeads:一个使用扩散模型生成的大规模合成数据集,用于人类头部检测和3D网格估计。我们的数据集包含超过100万张高分辨率图像,每张图像都标注了详细的3D头部网格、面部标志和边界框。利用这个数据集,我们引入了一种新的模型架构,能够从单张图像中单步同时进行头部检测和头部网格重建。通过广泛的实验评估,我们证明了在我们的合成数据上训练的模型在真实图像上取得了强劲的性能。此外,我们数据集的多样性使其适用于广泛的任务,提供了人类头部的通用和全面表示。

英文摘要

Human head detection, keypoint estimation, and 3D head model fitting are essential tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they have been recorded in laboratory environments, which makes it difficult for trained models to generalize. Here, we introduce VGGHeads -- a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step. Through extensive experimental evaluations, we demonstrate that models trained on our synthetic data achieve strong performance on real images. Furthermore, the versatility of our dataset makes it applicable across a broad spectrum of tasks, offering a general and comprehensive representation of human heads.

2507.07156 2026-06-18 stat.ML cs.CG cs.LG math.AT 版本更新

Unreduced Persistence Diagrams for Topological Machine Learning

未约简持久图在拓扑机器学习中的应用

Nicole Abreu, Parker B. Edwards, Francis Motta

发表机构 * Department of Mathematics and Statistics, Florida Atlantic University, Boca Raton, FL(佛罗里达大西洋大学数学与统计学系,佛罗里达州博卡拉顿)

AI总结 研究未约简边界矩阵生成的拓扑特征向量在机器学习中的性能,发现其与完全约简持久图性能相当甚至更优,且计算内存需求低一个数量级。

Comments Substantially expanded to include additional ML and software benchmark experiments. 11 figures, 4 tables, 20 pages (without appendix and references)

AI中文摘要

基于持久同调特征训练的监督机器学习流程在实验中被观察到忽略了持久图中包含的大量信息。然而,计算持久图通常是此类流程中计算最密集的步骤。为了探索这一动态,我们引入了几种从未约简边界矩阵生成拓扑特征向量的方法,并研究了它们的理论和计算性质。我们比较了基于未约简持久图向量化的流程与基于完全约简持久图向量化的流程在多种数据和任务类型上的性能。结果表明,基于未约简图构建的持久图训练的模型在某些任务上可以与基于完全约简图训练的模型表现相当,甚至更优。我们还对一个计算未约简图的算法进行了计算性能基准测试,该算法是Ripser的大幅修改版本。这些计算是可并行的,并且平均所需内存比计算完全持久图少一个数量级。我们的结果表明,利用未约简边界矩阵中包含信息的机器学习流程可能在计算成本和性能方面受益。

英文摘要

Supervised machine learning pipelines trained on features derived from persistent homology have been experimentally observed to ignore much of the information contained in a persistence diagram. Computing persistence diagrams is often the most computationally demanding step in such a pipeline, however. To explore this dynamic, we introduce several methods to generate topological feature vectors from unreduced boundary matrices and investigate their theoretical and computational properties. We compare the performance of pipelines trained on vectorizations of unreduced PDs to vectorizations of fully-reduced PDs across several data and task types. Our results indicate that models trained on PDs built from unreduced diagrams can perform on par with, and on some tasks even outperform, those trained on fully-reduced diagrams. We also benchmark the computational performance of an algorithm for computing unreduced diagrams, which was implemented as a heavily modified version of Ripser. These computations are parallelizable and require an order of magnitude less memory on average compared to computing full persistence diagrams. Our results suggest that machine learning pipelines which incorporate topology-based features may benefit in terms of computational cost and performance by utilizing information contained in unreduced boundary matrices.

2604.06367 2026-06-18 cs.CR cs.AI cs.LG 版本更新

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

WebSP-Eval:在网站安全与隐私任务上评估网络代理

Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz

发表机构 * University of Wisconsin-Madison(威斯康星大学麦迪逊分校)

AI总结 提出WebSP-Eval框架,通过200个任务实例和自动化评估器,测试多模态大模型在网站安全与隐私任务上的表现,发现状态UI元素(如开关)导致超过45%的任务失败。

Comments Accepted at PETS 2026. Project Page: https://wiscprivacy.com/webspeval/

AI中文摘要

网络代理自动化浏览器任务,从简单的表单填写到复杂的工作流程(如订购杂货)。虽然当前的基准测试评估通用性能(如WebArena)或针对恶意行为的安全性(如SafeArena),但没有现有框架评估代理成功执行面向用户的网站安全和隐私任务的能力,例如管理cookie偏好、配置隐私敏感账户设置或撤销非活动会话。为填补这一空白,我们引入了WebSP-Eval,一个用于衡量网络代理在网站安全和隐私任务上性能的评估框架。WebSP-Eval包括:1)一个手动制作的任务数据集,涵盖28个网站的200个任务实例;2)一个强大的代理系统,支持使用自定义Google Chrome扩展在多次运行中进行账户和初始状态管理;以及3)一个自动化评估器。我们使用最先进的多模态大语言模型评估了总共8个网络代理实例,对网站、任务类别和UI元素进行了细粒度分析。我们的评估显示,当前模型在可靠解决网站安全和隐私任务方面自主探索能力有限,并且在特定任务类别和网站上表现困难。关键的是,我们发现状态UI元素是代理失败的主要原因,其中开关导致许多模型超过45%的任务失败。

英文摘要

Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully execute user-facing website security and privacy tasks, such as managing cookie preferences, configuring privacy-sensitive account settings, or revoking inactive sessions. To address this gap, we introduce WebSP-Eval, an evaluation framework for measuring web agent performance on website security and privacy tasks. WebSP-Eval comprises 1) a manually crafted task dataset of 200 task instances across 28 websites; 2) a robust agentic system supporting account and initial state management across runs using a custom Google Chrome extension; and 3) an automated evaluator. We evaluate a total of 8 web agent instantiations using state-of-the-art multimodal large language models, conducting a fine-grained analysis across websites, task categories, and UI elements. Our evaluation reveals that current models suffer from limited autonomous exploration capabilities to reliably solve website security and privacy tasks, and struggle with specific task categories and websites. Crucially, we identify stateful UI elements as a primary reason for agent failure, with toggles causing more than 45% task failure across many models.

2604.20822 2026-06-18 cs.CV cs.LG 版本更新

Global Offshore Wind Infrastructure: Deployment and Operational Dynamics from Dense Sentinel-1 Time Series

全球海上风电基础设施:基于密集Sentinel-1时间序列的部署与运行动态

Thorsten Hoeser, Felix Bachofer, Claudia Kuenzer

发表机构 * Earth Observation Center (EOC), German Aerospace Center (DLR)(地球观测中心(EOC),德国航空航天中心(DLR)) Institute for Geography and Geology, University of Wuerzburg(维尔茨堡大学地理与地质研究所)

AI总结 提出全球Sentinel-1 SAR时间序列数据集,通过目标检测和规则分类器识别海上风电基础设施的部署与运行阶段,支持全球尺度动态分析。

Comments 29 pages, 18 figures

AI中文摘要

海上风电行业正在快速扩张,增加了对全球范围内基础设施部署和运行进行独立、高时间分辨率监测的需求。虽然基于地球观测的海上风电基础设施测绘在空间定位方面已经成熟,但现有的开放数据集缺乏关于建设和运行动态的时间密集且语义精细的信息。我们引入了一个全球Sentinel-1合成孔径雷达(SAR)时间序列数据语料库,该语料库解析了2016年第一季度至2025年第一季度海上风电基础设施的部署和运行阶段。基于更新的目标检测工作流程,我们在检测到的基础设施位置编译了15,606条时间序列,共有14,840,637个事件作为分析就绪的一维SAR后向散射剖面,每个剖面对应一次Sentinel-1采集和一个位置。为了便于直接使用和基准测试,我们发布了(i)分析就绪的一维SAR剖面,(ii)由基于规则的分类器生成的事件级基线语义标签,以及(iii)包含553条时间序列和328,657个事件标签的专家标注基准数据集。基线分类器在事件评估中实现了0.84的宏F1分数,在折叠编辑相似性-质量阈值曲线下面积(AUC)为0.785,表明时间一致性。我们证明,由此产生的语料库支持全球尺度的部署动态分析、区域部署模式差异的识别、船只交互和运行事件,并为开发和比较海上风电基础设施监测的时间序列分类方法提供了参考。

英文摘要

The offshore wind energy sector is expanding rapidly, increasing the need for independent, high-temporal-resolution monitoring of infrastructure deployment and operation at global scale. While Earth Observation based offshore wind infrastructure mapping has matured for spatial localization, existing open datasets lack temporally dense and semantically fine-grained information on construction and operational dynamics. We introduce a global Sentinel-1 synthetic aperture radar (SAR) time series data corpus that resolves deployment and operational phases of offshore wind infrastructure from 2016Q1 to 2025Q1. Building on an updated object detection workflow, we compile 15,606 time series at detected infrastructure locations, with overall 14,840,637 events as analysis-ready 1D SAR backscatter profiles, one profile per Sentinel-1 acquisition and location. To enable direct use and benchmarking, we release (i) the analysis ready 1D SAR profiles, (ii) event-level baseline semantic labels generated by a rule-based classifier, and (iii) an expert-annotated benchmark dataset of 553 time series with 328,657 event labels. The baseline classifier achieves a macro F1 score of 0.84 in event-wise evaluation and an area under the collapsed edit similarity-quality threshold curve (AUC) of 0.785, indicating temporal coherence. We demonstrate that the resulting corpus supports global-scale analyses of deployment dynamics, the identification of differences in regional deployment patterns, vessel interactions, and operational events, and provides a reference for developing and comparing time series classification methods for offshore wind infrastructure monitoring.
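The macro F1 score quoted above averages per-class F1 with equal weight per class, so rare event types count as much as common ones. The following is a minimal sketch of that metric; the function name and the toy labels are illustrative assumptions and are not taken from the paper or its released data.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either sequence."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred), key=str)
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical event labels, for illustration only.
truth = ["construction", "construction", "operation", "operation"]
guess = ["construction", "operation", "operation", "operation"]
print(round(macro_f1(truth, guess), 4))
```

Because every class contributes equally, a classifier that does well only on the dominant event type is penalised more than it would be under plain accuracy.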

2604.28076 2026-06-18 cs.CL cs.AI cs.LG 版本更新

TopBench: A Benchmark for Implicit Predictive Reasoning in Tabular Question Answering

TopBench:表格问答中隐式预测推理的基准

An-Yang Ji, Jun-Peng Jiang, De-Chuan Zhan, Han-Jia Ye

发表机构 * School of Artificial Intelligence, Nanjing University, China(人工智能学院,南京大学,中国) National Key Laboratory for Novel Software Technology, Nanjing University, China(新型软件技术国家重点实验室,南京大学,中国)

AI总结 提出TopBench基准,包含779个样本和四个子任务,评估大语言模型在表格问答中识别隐式预测意图并进行可靠推理的能力,发现当前模型在意图识别上存在困难。

AI中文摘要

大型语言模型(LLM)推动了表格问答的发展,其中大多数查询可以通过提取信息或简单聚合来回答。然而,一类常见的现实世界查询是隐式预测性的,需要从历史模式中推断未观察到的答案,而不仅仅是检索。这些查询带来了两个挑战:识别潜在意图和对大规模表格进行可靠的预测推理。为了评估LLM在带有隐式预测任务的表格问答中的表现,我们引入了TopBench,一个包含779个样本的基准,涵盖四个子任务,从单点预测到决策制定、处理效应分析和复杂过滤,要求模型生成涵盖推理文本和结构化表格的输出。我们在基于文本和代理工作流下评估了多种模型。实验表明,当前模型通常在意图识别上存在困难,默认进行查找。更深入的分析发现,准确的意图消歧是引导这些预测行为的前提。此外,提升预测精度的上限需要整合更复杂的建模或推理能力。

英文摘要

Large Language Models (LLMs) have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation. However, a common class of real-world queries is implicitly predictive, requiring the inference of unobserved answers from historical patterns rather than mere retrieval. These queries introduce two challenges: recognizing latent intent and reliable predictive reasoning over massive tables. To assess LLMs in such Tabular questiOn answering with implicit Prediction tasks, we introduce TopBench, a benchmark consisting of 779 samples across four sub-tasks, ranging from single-point prediction to decision making, treatment effect analysis, and complex filtering, requiring models to generate outputs spanning reasoning text and structured tables. We evaluate diverse models under both text-based and agentic workflows. Experiments reveal that current models often struggle with intent recognition, defaulting to just lookups. Deeper analysis identifies that accurate intent disambiguation serves as the prerequisite for leading these predictive behaviors. Furthermore, elevating the upper bound of prediction precision requires the integration of more sophisticated modeling or reasoning capabilities.

2605.03460 2026-06-18 cs.AI cs.LG 版本更新

FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models

FinSTaR:面向时间序列推理模型的金融推理

Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, Soonyoung Lee, Wonbin Ahn

发表机构 * LG AI Research(LG人工智能研究)

AI总结 针对时间序列推理模型在金融领域的失效问题,提出基于2x2能力分类法的FinSTaR模型,通过Compute-in-CoT和Scenario-Aware CoT策略在FinTSR-Bench基准上达到78.9%平均准确率。

Comments KDD Workshop on SciSoc Agents & LLMs 2026 (Oral Presentation)

AI中文摘要

时间序列推理模型在通用领域表现出色,但在具有独特特征的金融领域却持续失败。我们提出一个通用的2x2能力分类法,通过交叉1)单实体与多实体分析,以及2)当前状态评估与未来行为预测来划分TSRM能力。我们在金融领域实例化该分类法——其中确定性评估与随机性预测的区分尤为关键——形成十个金融推理任务,并基于标普股票构建FinTSR-Bench基准。为此,我们提出FinSTaR(金融时间序列思考与推理),在FinTSR-Bench上训练,并针对每个类别采用不同的思维链策略。对于评估(确定性,即可从可观测数据计算得出),我们采用Compute-in-CoT,一种程序化思维链,使模型能够直接从原始价格推导答案。对于预测(本质上是随机的,即受不可观测因素影响),我们采用场景感知思维链,在做出判断前生成多种场景,模拟金融分析师在不确定性下的推理方式。所提方法在FinTSR-Bench上达到78.9%的平均准确率,显著优于LLM和TSRM基线。此外,我们展示了四个能力类别通过联合训练具有互补性和相互增强性,并且场景感知思维链相比标准思维链持续提升预测准确率。代码已公开:https://github.com/seunghan96/FinSTaR。

英文摘要

Time series (TS) reasoning models (TSRMs) have shown promising capabilities in general domains, yet they consistently fail in the financial domain, which exhibits unique characteristics. We propose a general 2 x 2 capability taxonomy for TSRMs by crossing 1) single-entity vs. multi-entity analysis with 2) assessment of the current state vs. prediction of future behavior. We instantiate this taxonomy in the financial domain -- where the distinction between deterministic assessment and stochastic prediction is particularly critical -- as ten financial reasoning tasks, forming the FinTSR-Bench benchmark based on S&P stocks. To this end, we propose FinSTaR (Financial Time Series Thinking and Reasoning), trained on FinTSR-Bench with distinct chain-of-thought (CoT) strategies tailored to each category. For assessment, which is deterministic (i.e., computable from observable data), we employ Compute-in-CoT, a programmatic CoT that enables models to derive answers directly from raw prices. For prediction, which is inherently stochastic (i.e., subject to unobservable factors), we adopt Scenario-Aware CoT, which generates diverse scenarios before making a judgment, mirroring how financial analysts reason under uncertainty. The proposed method achieves 78.9% average accuracy on FinTSR-Bench, substantially outperforming LLM and TSRM baselines. Furthermore, we show that the four capability categories are complementary and mutually reinforcing through joint training, and that Scenario-Aware CoT consistently improves prediction accuracy over standard CoT. Code is available at https://github.com/seunghan96/FinSTaR.

2606.16000 2026-06-18 cs.CL cs.LG 版本更新

GRACE-DS: a Guarded Reward-guided Agent Correction Environment in Data Science

GRACE-DS:数据科学中的受保护奖励引导智能体修正环境

Aleksandr Tsymbalov, Danis Zaripov, Artem Epifanov, Anastasiya Palienko

发表机构 * ITMO University(ITMO大学) HSE University(高等经济学院)

AI总结 提出GRACE-DS,一个用于评估LLM驱动的AutoML智能体在部署前性能的隔离环境,通过隐藏的可执行验证器衡量预测性能、泄漏避免、可重复性等指标,实验证明其灵活迭代交互模式优于基线方法。

AI中文摘要

我们介绍了GRACE-DS,一个数据科学中的受保护奖励引导智能体修正环境,用于对LLM驱动的AutoML智能体进行部署前评估。GRACE-DS是一组在隔离环境中的评估指标,可应用于特定组织的表格ML任务。它将智能体暴露于现实的工作流阶段,从规划和数据检查到特征工程、模型开发、验证、代码修复直至最终提交,同时隐藏的可执行验证器不仅衡量最终预测性能,还衡量泄漏避免、可重复性、协议有效性、修正行为和奖励对齐。最强的结构化机制——灵活迭代交互(我们的方法)——实现了比单次生成、非结构化交互和基于重启的基线更高的端到端归一化隐藏测试质量,同时提高了协议有效完成率。经过7000多个回合的验证,这些结果确立了GRACE-DS作为评估基于LLM的AutoML智能体在生产类条件下按照组织特定要求执行机器学习工作流能力的稳健平台。

英文摘要

We introduce GRACE-DS, a Guarded Reward-guided Agent Correction Environment in Data Science for pre-deployment evaluation of LLM-powered AutoML agents. GRACE-DS is a set of evaluation metrics in an isolated environment that can be applied to tabular ML tasks specific to a particular organization. It exposes agents to realistic workflow stages, from planning and data inspection through feature engineering, model development, validation, and code repair to final submission, while hidden executable validators measure not only final predictive performance but also leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment. The strongest structured regime, flexible iterative interaction (our approach), achieves higher end-to-end normalized hidden-test quality than single-shot generation, unstructured interaction, and restart-based baselines, while also improving protocol-valid completion. Validated across more than 7,000 episodes, these results establish GRACE-DS as a robust platform for assessing the capacity of LLM-based AutoML agents to execute machine learning workflows under production-like conditions and in accordance with organization-specific requirements.

2410.15595 2026-06-18 cs.AI cs.CL cs.LG 版本更新

A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

直接偏好优化综述:数据集、理论、变体及应用

Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Zongrui Li, Ruirui Lei, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu

发表机构 * Zhejiang University(浙江大学) Nanyang Technological University(南洋理工大学) Alibaba Group(阿里巴巴集团)

AI总结 综述直接偏好优化(DPO)在理论、变体、数据集和应用方面的进展,指出其作为RL-free替代方案的潜力与局限,并提出未来研究方向。

Comments Accepted by TPAMI 2026. Project page: https://github.com/Mr-Loevan/DPO-Survey

AI中文摘要

随着大语言模型(LLMs)的快速发展,将策略模型与人类偏好对齐变得日益关键。直接偏好优化(DPO)作为一种有前景的对齐方法,作为从人类反馈中强化学习(RLHF)的无RL替代方案而出现。尽管DPO取得了各种进展并存在固有局限性,但文献中目前缺乏对这些方面的深入综述。在这项工作中,我们对DPO中的挑战和机遇进行了全面回顾,涵盖理论分析、变体、相关偏好数据集和应用。具体而言,我们基于关键研究问题对近期DPO研究进行分类,以提供对DPO当前格局的透彻理解。此外,我们提出了几个未来研究方向,为研究社区提供模型对齐的见解。相关论文的更新合集见 https://github.com/Mr-Loevan/DPO-Survey。

英文摘要

With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an RL-free alternative to Reinforcement Learning from Human Feedback (RLHF). Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature. In this work, we present a comprehensive review of the challenges and opportunities in DPO, covering theoretical analyses, variants, relevant preference datasets, and applications. Specifically, we categorize recent studies on DPO based on key research questions to provide a thorough understanding of DPO's current landscape. Additionally, we propose several future research directions to offer insights on model alignment for the research community. An updated collection of relevant papers can be found on https://github.com/Mr-Loevan/DPO-Survey.
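For orientation, the objective that makes DPO "RL-free" can be written in one line. The abstract does not state it; what follows is the standard form from the original DPO formulation, where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ a frozen reference policy, $(x, y_w, y_l)$ a prompt with preferred and dispreferred responses, $\sigma$ the logistic function, and $\beta$ a temperature-like hyperparameter. Variants covered by the survey may modify individual terms.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

The loss depends only on log-probability ratios of the two responses, so it can be minimised with ordinary supervised gradient steps on preference pairs, without fitting a separate reward model or running a policy-gradient loop.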

12. 机器学习应用 71 篇

2606.18287 2026-06-18 cs.LG 新提交

Artemis: Anatomy-Resolved inTervention for Eliminating Multimodal NeuroImage confounderS

Artemis: 解剖分辨的干预方法用于消除多模态神经影像混杂因素

Siyuan Dai, Yang Du, Kun Zhao, Zhusuyi Chen, Heng Huang, Paul Thompson, Chao Shi, Haoteng Tang, Liang Zhan

发表机构 * University of Pittsburgh(匹兹堡大学) University of Maryland(马里兰大学) University of Southern California(南加州大学) Binghamton University(宾汉姆顿大学) University of Texas Rio Grande Valley(德克萨斯大学里奥格兰德河谷分校)

AI总结 提出Artemis框架,通过区域级因果干预学习特定脑区的混杂因素表示,消除fMRI和DTI多模态神经影像中人口统计学混杂因素对GNN的影响,在三个基准上提升性能。

Comments 11 pages, 8 figures

AI中文摘要

多模态神经影像学整合了来自fMRI的功能连接和来自DTI的结构连接,使得使用图神经网络对脑网络进行无创分析成为可能。然而,年龄和性别等人口统计学因素系统地混淆了脑连接与临床结果之间的关系,导致GNN利用虚假捷径而非学习因果不变表示。尽管最近的因果GNN方法在图建模层面引入因果关系,但其因果机制仍然是领域无关的,没有考虑临床神经影像数据中固有的真实世界混杂因素。此外,脑网络是基于图谱分区构建的,每个区域对人口统计学因素表现出不同的敏感性,因此需要区域感知的调整。我们提出了Artemis,一个区域级因果框架,通过在每个脑区域独立进行因果干预,使用轻量级参数学习区域特定的混杂因素表示,从而弥合了这一差距。我们的调整综合利用多模态功能和结构特征进行图推理,作为一个与任意GNN骨干兼容的插件模块。在三个基准(用于疾病诊断的ADNI、用于痴呆分期的OASIS和用于性别分类的HCP)上的实验表明,与代表性的基于GNN的基线相比,该方法具有一致的改进。多项支持实验进一步证明了统计显著性和神经科学可解释性。

英文摘要

Multimodal neuroimaging, integrating functional connectivity from fMRI and structural connectivity from DTI, enables non-invasive analysis of brain networks using graph neural networks. However, demographic factors such as age and sex systematically confound the relationship between brain connectivity and clinical outcomes, causing GNNs to exploit spurious shortcuts rather than learning causally invariant representations. While recent causal GNN methods introduce causality at the graph-modeling level, their causal mechanisms remain domain-agnostic without accounting for the real-world confounders inherent in clinical neuroimaging data. Moreover, brain networks are constructed from atlas-based parcellations where each region exhibits distinct sensitivity to demographic factors, necessitating region-aware adjustment. We propose Artemis, a region-level causal framework that bridges this gap with causal intervention at each brain region independently by learning region-specific confounder representations with lightweight parameters. Our adjustment comprehensively utilizes the multimodal functional and structural features for graph reasoning and serves as a plug-in module compatible with arbitrary GNN backbones. Experiments on three benchmarks, ADNI for disease diagnosis, OASIS for dementia staging, and HCP for sex classification, demonstrate consistent improvements over representative GNN-based baselines. Multiple supporting experiments further demonstrate statistical significance and neuroscientific interpretability.

2606.18316 2026-06-18 cs.LG 新提交

A Survey on Data-Driven Models for Soil Moisture Regression and Classification

基于数据驱动的土壤湿度回归与分类模型综述

Ilektra Tsimpidi, George Georgoulas, Vidya Sumathy, George Nikolakopoulos

发表机构 * Electrical Engineering, University of Technology, Sweden(电气工程,理工大学,瑞典)

AI总结 综述了基于AI的土壤湿度建模方法,分为五类:统计时间序列、地统计、经典机器学习、深度学习和概率/贝叶斯方法,利用多源数据实现回归或分类。

Comments 14 pages, 3 figures, AIAI 2026 Conference

AI中文摘要

土壤湿度(SM)建模构成一个复杂的时空学习问题,其特点是非线性环境相互作用、异构数据源和有限的地面观测。基于物理的方法,如水量平衡模型,依赖于明确的水文方程和高质量的输入,但其计算成本和可扩展性限制阻碍了大规模部署。数据驱动的人工智能(AI)方法已成为灵活的替代方案,能够以较少的建模假设提取土壤湿度与环境变量之间的经验关系。本文对基于AI的土壤湿度估计和分类模型进行了结构化综述。现有方法被组织为五类:(a)统计时间序列模型,(b)地统计方法,(c)经典机器学习(ML)模型,(d)深度学习(DL)模型和(e)概率/贝叶斯方法。这些模型利用历史土壤湿度记录、气象变量、植被指数、地形、土壤特征和地理位置数据来执行回归或分类任务。

英文摘要

Soil Moisture (SM) modelling constitutes a complex spatiotemporal learning problem characterised by nonlinear environmental interactions, heterogeneous data sources, and limited ground observations. Physics-based approaches, such as water balance models, rely on explicit hydrological equations and high-quality inputs, but their computational cost and scalability limitations restrict large-scale deployment. Data-driven artificial intelligence (AI) methods have emerged as flexible alternatives, enabling the extraction of empirical relationships between soil moisture and environmental variables with reduced modelling assumptions. This work presents a structured survey of AI-based models for soil moisture estimation and classification. Existing approaches are organised into five categories: (a) statistical time-series models, (b) geostatistical methods, (c) classical machine learning (ML) models, (d) deep learning (DL) models, and (e) probabilistic/Bayesian methods. These models leverage historical soil moisture records, meteorological variables, vegetation indices, topography, soil characteristics, and geolocation data to perform regression or classification tasks.

2606.18319 2026-06-18 cs.LG cs.AI cs.HC cs.SE 新提交

ASTRA: A Scalable Next-Generation ATCO Training Simulator with Autonomous Simpilots

ASTRA:一种具有自主模拟飞行员的可扩展下一代空中交通管制员训练模拟器

Ethan Chew, Enjia Wu, Iruss Eng Wei Yeow, Ian Weiqin Lim, Ranen Sim, Brandon Koh Ziheng, Kaleb Nim, Caden Toh Jun Yi, Wei Dong Soin, Darius Kai Keat Koh, Galen King Yu Tay, Prannaya Gupta, Jonathan Ee Fang Koong, Yong Zhi Lim

发表机构 * Air Emerging Technologies High-Speed Experimentations and Research (AETHER), RSAF Agile Innovation Digital (RAiD), Republic of Singapore Air Force(新加坡共和国空军敏捷创新数字实验室空中新兴技术高速实验与研究)

AI总结 提出ASTRA模拟器,通过微调ASR将词错误率降至23.45%,并集成AI评估框架,实现可扩展的标准化ATCO训练。

AI中文摘要

空中交通管制员(ATCO)对于确保空中交通的安全、有序和高效至关重要,但培训能力受到依赖专门的人类培训师(称为模拟飞行员)的限制,这些培训师必须在模拟空域中扮演飞行员和ATCO的双重角色。现有的自动化解决方案依赖于西方中心的语音模型,这些模型在新加坡的运营环境中表现不佳,现成的系统在新加坡口音的航空语音上词错误率(WER)高达107.80%。我们引入了ASTRA,一个端到端的训练模拟器,通过一个流水线自动化这些模拟飞行员角色,该流水线转录ATCO语音、解释指令,并使用本地适应的语音模型生成适当的飞行员和ATCO响应。我们微调的自动语音识别(ASR)流水线将WER降低到23.45%,在该领域显著优于现有方法。除了交通模拟,ASTRA还集成了一个AI辅助的性能评估框架,该框架评估受训者的无线电通信的准确性、简洁性和完整性,优化后得分分别为91.7%、88.2%和86.9%。基于DSPy和Unsloth等开源基础,这种方法实现了可扩展、标准化的ATCO评估,同时减少了教师的工作量。

英文摘要

Air Traffic Control Operators (ATCOs) are vital in ensuring the safe, orderly, and efficient flow of air traffic, yet training capacity is constrained by reliance on specialized human trainers known as simpilots, who must role-play both pilots and ATCOs in a simulated airspace. Existing automated solutions rely on Western-centric speech models that perform poorly in Singaporean operational contexts, with off-the-shelf systems exhibiting Word Error Rates (WER) of up to 107.80% on Singaporean-accented aviation speech. We introduce ASTRA, an end-to-end training simulator that automates these simpilot roles through a pipeline that transcribes ATCO speech, interprets instructions, and generates appropriate pilot and ATCO responses using locally adapted voice models. Our fine-tuned Automatic Speech Recognition (ASR) pipeline reduces WER to 23.45%, substantially outperforming existing approaches in this domain. Beyond traffic simulation, ASTRA incorporates an AI-assisted performance evaluation framework that assesses trainee radiotelephony communications across accuracy, brevity, and completeness, achieving post-optimization scores of 91.7%, 88.2%, and 86.9%, respectively. Built on open-source foundations such as DSPy and Unsloth, this approach enables scalable, standardized ATCO assessment while reducing instructor workload.
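A Word Error Rate above 100%, such as the 107.80% quoted above, is possible because WER divides the number of substitutions, deletions, and insertions by the length of the reference transcript, and insertions are not bounded by that length. The sketch below computes WER with a standard word-level edit distance; the function name and the example phrases are illustrative assumptions and do not come from the ASTRA pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: four inserted words against a two-word reference.
print(word_error_rate("turn left", "please turn to the left now"))  # 2.0, i.e. 200%
```

A recogniser that hallucinates extra words on short radiotelephony phrases can therefore score worse than one that outputs nothing at all, since an empty hypothesis yields a WER of exactly 100%.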

2606.18479 2026-06-18 cs.LG cs.CY 新提交

The Illusion of Improvement: Reject Inference Strategies in Credit Scoring

改进的幻觉:信用评分中的拒绝推断策略

Bruno Scarone, Ricardo Baeza-Yates

发表机构 * Northeastern University(东北大学) KTH Royal Institute of Technology(瑞典皇家理工学院)

AI总结 研究揭示拒绝推断方法在信用评分中因反馈循环导致评估指标误导,提出通过少量探索打破循环并诊断问题。

Comments Accepted to ECML PKDD 2026 (Research Track)

详情
AI中文摘要

拒绝推断方法被广泛用于减轻信用评分中的生存偏差,但其有效性仍不明确。我们系统评估了几种此类方法,并发现一个结构性失败模式:在自然的再训练循环中,模型的准确率提升而召回率崩溃,造成改进的幻觉,使从业者认为系统在变好,而实际上其拒绝质量——正确筛选出违约者的能力——在恶化。然后,我们提出一种受控探索策略,无需统计假设即可打破反馈循环:贷款方故意批准一部分被拒绝的申请人,并观察他们的真实结果。我们表明,准确率和拒绝质量在是否探索上给出相反的建议:准确率倾向于不探索,而拒绝质量随探索提高,证实标准评估指标在选择性偏差下具有误导性。即使极低的探索率(2-5%)在我们的实验中也足以以近乎零成本诊断反馈循环的严重性。我们的发现在两种机器学习方法和三个真实数据集上一致,表明标准评估协议不足以评估在生存偏差下训练的模型。

英文摘要

Reject inference methods are widely used to mitigate survival bias in credit scoring, yet their effectiveness remains poorly understood. We systematically evaluate several such methods and uncover a structural failure mode: in a natural retraining cycle, models whose accuracy improves while recall collapses create an illusion of improvement that leads practitioners to believe the system is getting better when, in fact, its rejection quality -- the ability to correctly screen out defaulters -- is deteriorating. We then propose a controlled exploration strategy that breaks the feedback loop without statistical assumptions: the lender deliberately approves a fraction of rejected applicants and observes their true outcomes. We show that accuracy and rejection quality give opposite recommendations on whether to explore: accuracy favors no exploration, while rejection quality improves with it, confirming that standard evaluation metrics are misleading under selection bias. Even minimal exploration rates (2-5%) prove sufficient in our experiments to diagnose the severity of the feedback loop at near-zero cost. Our findings are consistent across two machine learning methods and three real-world datasets, and suggest that standard evaluation protocols are inadequate for assessing models trained under survival bias.
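The claim that accuracy can improve while recall collapses is easy to reproduce on an imbalanced toy population. The sketch below assumes recall is measured on the defaulter class (label 1) and uses invented numbers, 10 defaulters among 100 applicants; none of the figures come from the paper's experiments.

```python
from typing import Sequence, Tuple


def accuracy_and_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, recall on the positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return correct / len(y_true), recall


# Toy population: 10 defaulters (1) followed by 90 good payers (0).
y_true = [1] * 10 + [0] * 90

# Earlier model: catches 8 of 10 defaulters but wrongly flags 20 good payers.
earlier = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 70
# Later model: flags almost nobody, so it catches only 1 defaulter.
later = [1] * 1 + [0] * 9 + [0] * 90

print(accuracy_and_recall(y_true, earlier))  # accuracy 0.78, recall 0.8
print(accuracy_and_recall(y_true, later))    # accuracy 0.91, recall 0.1
```

In this toy case the later model looks better on accuracy only because good payers dominate the population, while its ability to screen out defaulters has nearly vanished.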

2606.18506 2026-06-18 cs.LG eess.SP stat.AP 新提交

Beyond AHI: An Interpretable Causal-Discovery-Guided Framework for Sleep Recovery in Connected Health

超越AHI:一种可解释的因果发现引导的睡眠恢复框架在互联健康中的应用

Saba A. Farahani, Elahe Khatibi, Manoj Vishwanath, Amir M. Rahmani, Hung Cao

发表机构 * University of California, Irvine(加州大学尔湾分校)

AI总结 提出一种可解释的因果发现引导框架,从多模态PSG中推导层次化睡眠恢复评分(SRS),在两大队列中SRS与感知恢复的关联强度最高可达AHI的2.5倍。

Comments 6 pages, 2 figures, 2 tables. Accepted at the 2nd Workshop on Sensing and Computing for Smart and Connected Health (SCH), co-located with IEEE/ACM CHASE 2026

AI中文摘要

客观睡眠评估依赖于多导睡眠图(PSG),但临床影响通常更好地反映在患者报告结局(PROs)如嗜睡和疲劳中。现有的总结指标,包括呼吸暂停低通气指数(AHI),对功能恢复背后的多域生理学提供的洞察有限。我们提出了一种可解释的、因果发现引导的框架,用于从多模态PSG中推导层次化睡眠恢复评分(SRS)。利用两个大型人群队列(MESA: n=1540; MrOS: n=825),我们应用有向无环图(DAG)学习来识别候选生理驱动因素,涵盖呼吸负担、缺氧负担、睡眠碎片化、睡眠结构和自主神经调节。尽管源自临床PSG,这些域自然映射到互联健康技术中日益可用的传感流,包括可穿戴心电图、血氧测定和睡眠阶段估计设备。为了保持机制合理性,我们引入了一个两阶段筛选过程,结合基于生理学的约束和受约束的LLM辅助审计,以识别和消除结构混杂因素以及构造重叠变量。跨队列,这五个域作为与恢复相关的重复生理域出现,所得SRS与感知恢复的关联强度高达AHI的2.5倍。通过将多模态睡眠生理学与以患者为中心的结果通过可解释、偏差感知和域结构化的框架联系起来,这项工作为临床睡眠研究和新兴智能互联健康环境中的恢复建模提供了实用基础。

英文摘要

Objective sleep assessment relies on polysomnography (PSG), yet clinical impact is often better reflected in patient-reported outcomes (PROs) such as sleepiness and fatigue. Existing summary indices, including the Apnea-Hypopnea Index (AHI), provide limited insight into the multidomain physiology underlying functional recovery. We propose an interpretable, causal-discovery--guided framework for deriving a hierarchical Sleep Recovery Score (SRS) from multimodal PSG. Using two large population cohorts (MESA: n=1540; MrOS: n=825), we apply directed acyclic graph (DAG) learning to identify candidate physiological drivers spanning respiratory burden, hypoxic burden, sleep fragmentation, sleep architecture, and autonomic regulation. Although derived from clinical PSG, these domains map naturally to sensing streams increasingly available in connected health technologies, including wearable ECG, oximetry, and sleep-stage estimation devices. To preserve mechanistic plausibility, we introduce a two-stage screening process that combines physiology-based constraints with constrained LLM-assisted auditing to identify and remove structural confounders and construct-overlapping variables. Across cohorts, these five domains emerge as recurrent physiological domains associated with recovery, and the resulting SRS shows up to 2.5$\times$ stronger alignment with perceived recovery than AHI. By linking multimodal sleep physiology to patient-centered outcomes through an interpretable, bias-aware, and domain-structured framework, this work provides a practical foundation for recovery modeling across both clinical sleep studies and emerging smart and connected health settings.

2606.18561 2026-06-18 cs.LG cs.AI 新提交

Correcting Sensor-Induced Distribution Drift with Wasserstein Adversarial Learning

使用Wasserstein对抗学习校正传感器引起的分布漂移

Saraa Ali, Vladimir Bocharnikov, Fedor Ratnikov, Mikhail Hushchyn, Artem Ryzhikov, Denis Derkach

发表机构 * Laboratory of Methods for Big Data Analysis, HSE University(大数据分析方法实验室,高等经济大学)

AI总结 提出WGAN方法,通过可学习的校准变换将变化检测器响应分布映射回参考分布,在探测器模型和模拟量能器数据上验证了恢复老化系数和改善能量分布一致性的能力。

Comments This is a preprint sent to Nuclear Science and Techniques journal

AI中文摘要

记录数据的质量取决于采集数据的传感器系统的稳定性。传感器运动和老化会降低下游数据驱动方法的性能和稳定性。我们提出了一种受Wasserstein-GAN启发的无监督方法,用于推断物理可解释的变换参数,这些参数将变化的检测器响应分布映射回标称参考分布。与标准生成建模不同,生成器被用作可学习的校准变换,其可训练权重代表所寻求的参数,而评判器(critic)通过Wasserstein目标提供分布距离信号。我们在具有受控层偏移的跟踪探测器玩具模型上验证了该方法,并展示了其在具有单元老化效应的高粒度Geant4模拟量能器数据上的应用。该方法恢复了单个单元的老化系数,且与真实值相关,并改善了校准后分布与参考分布在能量总和上的一致性,同时随着通道间噪声水平的增加而表现出预期的退化。这些结果表明,在退化参数的直接标签不可用的情况下,对抗性分布匹配可以作为校准策略的数据驱动组件。

英文摘要

The quality of recorded data depends on the stability of the sensor system that acquires it. Sensor motion and aging can degrade the performance and stability of downstream data-driven methods. We present a Wasserstein-GAN-inspired approach for unsupervised inference of physically interpretable transformation parameters that map a changed detector response distribution back to a nominal reference distribution. In contrast to standard generative modeling, the generator is used as a learnable calibration transformation whose trainable weights represent the sought parameters, while the critic provides a distributional distance signal via the Wasserstein objective. We validate the approach on a tracking-detector toy model with controlled layer shifts and demonstrate its application on high-granularity Geant4-simulated calorimeter data with cell-wise aging effects. The method recovers aging coefficients for individual cells with correlation to ground truth and improves agreement between calibrated and reference energy-sum distributions, while exhibiting the expected degradation at increasing channel-to-channel noise levels. These results indicate that adversarial distribution matching can serve as a data-driven component of calibration strategies in settings where direct labels for degradation parameters are unavailable.
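The core idea, fitting a calibration parameter by matching the calibrated distribution to a reference distribution, can be illustrated in one dimension. The sketch below deliberately swaps the adversarial critic for the closed-form 1-D Wasserstein-1 distance between sorted samples and replaces gradient training with a grid search; the single multiplicative gain, the simulated 20% response loss, and all function names are assumptions made for this example, not the paper's setup.

```python
import numpy as np

def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D samples (sorted-sample formula)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean(np.abs(a - b)))

def fit_gain(degraded, reference, candidates):
    """Pick the gain that minimizes W1(gain * degraded, reference)."""
    distances = [wasserstein_1d(g * degraded, reference) for g in candidates]
    return float(candidates[int(np.argmin(distances))])

rng = np.random.default_rng(0)
reference = rng.normal(10.0, 1.0, size=5000)        # nominal response
degraded = 0.8 * rng.normal(10.0, 1.0, size=5000)   # simulated aging (assumed)
gains = np.linspace(1.0, 1.5, 51)
best_gain = fit_gain(degraded, reference, gains)    # expected near 1 / 0.8
```

No per-event labels are used: only the two sample sets are compared, which is the property the abstract emphasizes for settings where degradation parameters are not directly labeled.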

2606.18571 2026-06-18 cs.LG cs.CL cs.SD eess.AS 新提交

Fair Cognitive Impairment Detection Through Unlearning

通过去学习实现公平的认知障碍检测

William Nguyen, Jiali Cheng, Hadi Amiri

发表机构 * University of Massachusetts Lowell, USA(马萨诸塞大学洛厄尔分校)

AI总结 提出一种多模态框架,结合跨模态融合和梯度反转去学习,减少人口统计信息对轻度认知障碍检测的偏见,在跨语言数据集上缩小性能差距。

Comments Interspeech 2026

AI中文摘要

轻度认知障碍(MCI)是一种以记忆、语言或思维能力显著下降为特征的医学状况。从自发语音中检测MCI对于可扩展的筛查具有前景。然而,学习模型常常利用与标签相关的人口统计线索,导致不同亚组之间存在较大的性能差距。我们提出了一种多模态框架,结合了(i)模态间(语音、文本和图像)的跨模型融合,以及(ii)使用梯度反转的去学习,该技术阻止共享嵌入编码与任务无关的人口统计属性。在多语言基准TAUKADIAL和PREPARE上的评估表明,我们的方法在MCI分类上优于最先进的多语言和多模态基线,同时显著缩小了患者亚组(性别和语言)之间的性能差距。我们进一步分析了跨数据集的迁移,表明人口统计去学习有助于学习更鲁棒的MCI检测表示。

英文摘要

Mild Cognitive Impairment (MCI) is a medical condition characterized by a noticeable decline in memory, language, or thinking abilities. MCI detection from spontaneous speech is promising for scalable screening. However, learned models often exploit demographic cues correlated with labels, resulting in a large performance gap across subgroups. We present a multimodal framework that combines (i) cross-model fusion between modalities (speech, text, and image), and (ii) unlearning using gradient reversal that discourages the shared embedding from encoding task-irrelevant demographic attributes. Evaluated on the multilingual benchmarks TAUKADIAL and PREPARE, our method outperforms the state-of-the-art multilingual and multimodal baseline in MCI classification while substantially reducing the performance gap across patient subgroups (sex and language). We further analyze transfer across datasets, showing that demographic unlearning helps learn more robust representations for MCI detection.

2606.18672 2026-06-18 cs.LG cs.AI q-bio.GN 新提交

scGTN: Deep Siamese Graph Transformer Network for Single-cell RNA Sequencing Clustering

scGTN:用于单细胞RNA测序聚类的深度孪生图变换网络

Jinke Wu, Yifan Wang, Siyu Yi, Caiyang Yu, Ziyue Qiao, Nan Yin, Jiancheng Lv, Wei Ju

发表机构 * Sichuan University(四川大学) University of International Business and Economics(对外经济贸易大学) Great Bay University(大湾区大学) The Education University of Hong Kong(香港教育大学)

AI总结 提出scGTN框架,通过孪生图变换网络整合基因表达与细胞间结构信息,利用最优传输策略进行自监督聚类,在多个数据集上优于现有方法。

Comments Accepted by Proceedings of the Thirty-Fifth International Joint Conference on Artificial Intelligence (IJCAI 2026)

AI中文摘要

单细胞RNA测序(scRNA-seq)在表征细胞水平基因表达、识别细胞类型以及促进对细胞异质性的理解中起着关键作用。尽管scRNA-seq数据聚类取得了显著进展,但我们认为当前方法常常忽略scRNA-seq数据固有的稀疏性和噪声,以及复杂的细胞间结构信息。为此,本文提出了一种基于深度孪生图变换网络(称为scGTN)的新型单细胞RNA-seq聚类框架,该框架明确整合了基因表达谱和细胞间结构依赖关系以进行细胞聚类。具体而言,我们将scRNA-seq数据建模为图,并构建两个增强图视图作为双视图以捕获互补的细胞间信息。然后,采用孪生图变换网络显式整合最短路径信息和节点间距离,以捕获细胞间更丰富的结构关系。最后,我们采用最优传输策略以自监督方式指导细胞聚类。在多个基准scRNA-seq数据集上的大量实验表明,我们的scGTN始终优于现有方法。我们的代码可在以下网址获取:https://github.com/...(原文链接)。

英文摘要

Single-cell RNA sequencing (scRNA-seq) plays a pivotal role in characterizing gene expression at the cellular level, enabling the identification of cell types and advancing the understanding of cellular heterogeneity. Despite the significant progress in scRNA-seq data clustering, we argue that current methods always ignore the sparsity and noise, as well as the complex intercellular structural information inherent in scRNA-seq data. Toward this end, in this paper, we propose a novel single-cell RNA-seq clustering framework via deep Siamese Graph Transformer Network (termed scGTN), which explicitly integrates gene expression profiles and intercellular structural dependencies for cell clustering. In particular, we formulate scRNA-seq data as a graph and construct two augmented graph views that serve as dual views to capture complementary intercellular information. Then, a Siamese graph transformer network is employed to explicitly incorporate shortest-path information and node-wise distances for capturing richer structural relationships between cells. Finally, we employ an optimal transport strategy to guide the cell clustering in a self-supervised manner. Extensive experiments on multiple benchmark scRNA-seq datasets demonstrate that our scGTN consistently outperforms existing methods. Our code is available at https://github.com/W-RMSL/scGTN.
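One common way to use optimal transport for self-supervised clustering is to turn a cell-to-cluster score matrix into soft assignments with balanced cluster sizes via Sinkhorn iterations. The abstract does not specify the formulation, so the sketch below is only an assumed, generic variant; `sinkhorn_assign`, the uniform marginals, and the default `epsilon` are illustrative choices.

```python
import numpy as np

def sinkhorn_assign(scores, epsilon=0.5, n_iters=200):
    """Soft cluster assignments with approximately equal cluster sizes.

    scores: (n_cells, n_clusters) similarity matrix (assumed input).
    Returns a matrix whose rows each sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    # Entropic kernel; subtracting the max keeps the exponent stable.
    q = np.exp((scores - scores.max()) / epsilon)
    for _ in range(n_iters):
        q *= (1.0 / k) / q.sum(axis=0, keepdims=True)   # balance clusters
        q *= (1.0 / n) / q.sum(axis=1, keepdims=True)   # one unit of mass per cell
    return q * n

rng = np.random.default_rng(0)
assignments = sinkhorn_assign(rng.random((12, 3)))
```

Smaller `epsilon` values give sharper assignments but converge more slowly and can underflow when scores span a wide range; the resulting assignments would typically serve as training targets for the encoder.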

2606.18713 2026-06-18 cs.LG physics.comp-ph 新提交

Trainable Photonic Measurement for Physics-Informed PDE Learning

可训练光子测量用于物理信息偏微分方程学习

Jiale Linghu, Hao Dong, Yangshuai Wang

发表机构 * Xidian University(西安电子科技大学) National University of Singapore(新加坡国立大学)

AI总结 提出一种光子量子神经场,将坐标编码为可训练光学相位,通过多光子Fock空间干涉混合并从光子数测量解码,作为物理信息残差最小化的可训练表示,在七种PDE基准上展示相位复杂度转变,在最困难区域误差最多低一个数量级,且可训练参数仅约为经典基线的四分之一。

AI中文摘要

光子量子机器学习提供了一条从相位、干涉和测量构建可训练物理表示的途径。然而,其在科学机器学习中的作用仍基本未被探索。物理信息神经场提供了一个自然设置,因为微分方程需要保留相位、频率和导数结构的试验空间。这里我们引入一种光子量子神经场,其中坐标成为可训练光学相位,通过多光子Fock空间干涉混合,并从光子数测量解码。光子电路本身作为神经场表示进行优化,而非固定特征图或硬件加速器。因此,光子测量是一种可训练表示,在此基础上最小化物理信息残差。在七个椭圆、波动、非线性色散和逆PDE基准测试中,我们观察到相位复杂度转变:经典坐标和傅里叶特征网络在平滑区域足够,而光子场在残差导数放大相位失配时最准确。在最困难区域,它给出最低误差,差距达一个数量级,且可训练参数约为经典基线的四分之一。冻结和打乱控制以及噪声压力测试将这一增益归因于学习到的干涉和在复合扰动下稳定的Fock概率读出。这些结果将光子量子测量识别为科学机器学习的一种表示学习原理。

英文摘要

Photonic quantum machine learning offers a route to trainable physical representations built from phase, interference and measurement. However, its role in scientific machine learning remains largely unexplored. Physics-informed neural fields provide a natural setting, because differential equations require trial spaces that preserve phase, frequency and derivative structure. Here we introduce a photonic quantum neural field in which coordinates become trainable optical phases, are mixed by multi-photon Fock-space interference and are decoded from photon-number measurements. The photonic circuit is optimized as the neural-field representation itself, not as a fixed feature map or hardware accelerator. Photonic measurement is therefore a trainable representation on which the physics-informed residual is minimized. Across seven elliptic, wave, nonlinear dispersive and inverse PDE benchmarks, we observe a phase-complexity transition: classical coordinate and Fourier-feature networks suffice in smooth regimes, whereas the photonic field is most accurate when residual derivatives amplify phase mismatch. In the hardest regimes it gives the lowest errors, with margins reaching an order of magnitude and about one quarter of the trainable parameters of classical baselines. Frozen and shuffled controls, together with noise stress tests, attribute this gain to learned interference and stable Fock-probability readout under compound perturbations. These results identify photonic quantum measurement as a representation-learning principle for scientific machine learning.

2606.18726 2026-06-18 cs.LG cs.AI 新提交

Graph Grounded Cross Attention Transformer Neural Network for Structurally Constrained Full Event Sequence Generation in Predictive Process Monitoring

基于图锚定交叉注意力Transformer神经网络的预测过程监控中结构约束完整事件序列生成

Fang Wang, Ernesto Damiani

发表机构 * Department of Computer Science, University of Milan(米兰大学计算机科学系)

AI总结 提出图锚定交叉注意力Transformer(GGATN),通过全局过程图作为结构化记忆、Transformer自注意力编码序列位置、图锚定交叉注意力注入过程拓扑,结合维特比式图约束解码,一次性生成完整事件序列,在六个基准日志上优于LLM基线。

Comments 40 pages

AI中文摘要

结构约束的事件序列生成仍然具有挑战性,因为生成的路径必须保持转移可行性、时间顺序、终止和属性一致性。在预测过程监控(PPM)中,这一挑战表现为完整事件序列生成,而现有工作主要处理子任务,如下一个活动、剩余时间、结果和属性预测。本文提出了图锚定交叉注意力Transformer神经网络(GGATN)用于这一统一的PPM任务。GGATN使用全局过程图作为结构化活动记忆,通过Transformer自注意力对序列位置进行上下文化,并通过图锚定交叉注意力注入过程拓扑。与自回归解码不同,GGATN一次性生成活动、时间戳、长度以及事件级和序列级属性,随后进行维特比风格的图约束解码以获得可行路径和显式终止。在六个基准事件日志上的实验表明,其生成质量优于局部指令提示的LLM基线。GGATN在序列相似性、Damerau-Levenshtein相似性、基于二元组的控制流相似性和持续时间分布方面取得了强劲性能,同时保持零幻觉活动和零序列级属性不一致。消融分析证实了全局图编码器作为稳定的结构先验。可解释性分析展示了图结构、序列上下文、反馈细化和约束解码如何塑造生成过程。

英文摘要

Structurally constrained event sequence generation remains challenging because generated paths must preserve transition feasibility, temporal order, termination, and attribute consistency. In predictive process monitoring (PPM), this challenge appears as full event sequence generation, whereas existing work mainly addresses component tasks such as next activity, remaining time, outcome, and attribute prediction. This paper proposes the Graph Grounded Cross Attention Transformer Neural Network (GGATN) for this unified PPM task. GGATN uses a global process graph as structured activity memory, contextualizes sequence positions through Transformer self attention, and injects process topology through graph grounded cross attention. Unlike autoregressive decoding, GGATN generates activities, timestamps, length, and event level and sequence level attributes in a single pass, followed by Viterbi style graph constrained decoding for feasible paths and explicit termination. Experiments on six benchmark event logs show more reliable generation quality than local instruction prompted LLM baselines. GGATN achieves strong performance on sequence similarity, Damerau Levenshtein similarity, bigram based control flow similarity, and duration distribution, while maintaining zero hallucinated activities and zero sequence level attribute inconsistency. Ablation analyses confirm the global graph encoder as a stable structural prior. Interpretability analyses show how graph structure, sequence context, feedback refinement, and constrained decoding shape generation.
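Viterbi-style graph-constrained decoding can be sketched as a dynamic program that only follows transitions present in the process graph. The code below is a generic illustration, not GGATN's decoder: the input layout (one dictionary of log-probabilities per position, each covering every activity), the fixed `start` and `end` activities, and the function name are assumptions.

```python
import math

def constrained_viterbi(log_probs, allowed, start, end):
    """Most likely activity path that follows only allowed transitions.

    log_probs: list over positions of {activity: log-probability}.
    allowed: {activity: set of activities that may directly follow it}.
    Returns the best path as a list, or None if no feasible path exists.
    """
    activities = list(log_probs[0])
    best = {a: (log_probs[0][a] if a == start else -math.inf) for a in activities}
    back = []
    for step in log_probs[1:]:
        new_best, pointers = {}, {}
        for cur in activities:
            # Best predecessor among activities allowed to precede `cur`.
            prev = max(
                (p for p in activities if cur in allowed.get(p, ())),
                key=lambda p: best[p],
                default=None,
            )
            if prev is None or best[prev] == -math.inf:
                new_best[cur], pointers[cur] = -math.inf, None
            else:
                new_best[cur], pointers[cur] = best[prev] + step[cur], prev
        best = new_best
        back.append(pointers)
    if best.get(end, -math.inf) == -math.inf:
        return None
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

allowed = {"A": {"B"}, "B": {"B", "C"}, "C": set()}
scores = [
    {"A": math.log(0.9), "B": math.log(0.05), "C": math.log(0.05)},
    {"A": math.log(0.1), "B": math.log(0.3), "C": math.log(0.6)},
    {"A": math.log(0.1), "B": math.log(0.2), "C": math.log(0.7)},
]
path = constrained_viterbi(scores, allowed, start="A", end="C")
```

In this toy case the per-position argmax would give A, C, C, which uses two transitions that are absent from the graph; the constrained decoder returns A, B, C instead, and the required `end` activity makes termination explicit.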

2606.18732 2026-06-18 cs.LG cs.CV 新提交

Low-Cost Neuromorphic Fall Detection Using Synthetic Event Data and Hybrid SNNs

低成本神经形态跌倒检测:使用合成事件数据和混合SNN

Guillermo Rojas, Gonzalo Soto, Daniel Yunge

发表机构 * School of Electrical Engineering Pontificia Universidad Católica de Valparaíso, Chile(瓦尔帕莱索天主教大学电气工程学院)

AI总结 提出混合SNN-CNN模型,从智能手机视频合成事件相机数据,实现高效准确的跌倒检测。

Comments 4 pages, 6 figures, presented at ICONS 2025 during the Poster Session, but not published

AI中文摘要

本工作提出了混合模型,将脉冲神经网络(SNN)与卷积神经网络(CNN)组件集成,以从传统智能手机视频生成的模拟事件相机数据(动态视觉传感器,DVS)中学习。主要针对人类跌倒检测,该方法通过将视频帧转换为事件数据,利用SNN的能效和时空处理能力。通过多个数据集上的模拟评估所提出的模型,并将其性能与传统机器学习模型进行比较。结果表明,在不牺牲准确性的情况下显著提高了效率,强调了将SNN和DVS技术结合用于现实环境中复杂任务的潜力。

英文摘要

This work presents the development of hybrid models that integrate spiking neural networks (SNNs) with components of convolutional neural networks (CNNs) to learn from simulated event-based camera data (Dynamic Vision Sensor, DVS) generated from conventional smartphone videos. Aimed primarily at human fall detection, the approach leverages the energy efficiency and spatio-temporal processing capabilities of SNNs by converting video frames into event-based data. The proposed models are evaluated through simulations on multiple datasets, comparing their performance to that of traditional machine learning models. Results demonstrate significant gains in efficiency without sacrificing accuracy, underscoring the potential of combining SNNs and DVS technology for complex tasks in real-world environments.
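A common idealized model for turning video frames into DVS-like events emits an event whenever a pixel's log-intensity changes by more than a contrast threshold since its last event. The abstract does not say which simulator or settings the authors used, so the sketch below is an assumption-laden illustration; `frames_to_events`, the threshold of 0.2, and the grayscale `[0, 1]` input are all hypothetical.

```python
import numpy as np

def frames_to_events(frames, threshold=0.2, eps=1e-3):
    """Convert grayscale frames into a list of (t, y, x, polarity) events.

    frames: array of shape (T, H, W) with intensities in [0, 1].
    polarity is +1 for brightening and -1 for darkening.
    """
    log_frames = np.log(np.asarray(frames, dtype=float) + eps)
    reference = log_frames[0].copy()   # log-intensity at each pixel's last event
    events = []
    for t in range(1, len(log_frames)):
        diff = log_frames[t] - reference
        fired = np.abs(diff) >= threshold
        for y, x in zip(*np.nonzero(fired)):
            events.append((t, int(y), int(x), 1 if diff[y, x] > 0 else -1))
        # Reset the reference only where an event fired.
        reference[fired] = log_frames[t][fired]
    return events

frames = np.full((3, 2, 2), 0.5)
frames[1:, 0, 0] = 1.0                 # one pixel brightens and stays bright
events = frames_to_events(frames)
```

Static regions produce no events, which is the source of the sparsity that spiking networks can exploit.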

2606.18857 2026-06-18 cs.LG physics.ao-ph 新提交

Investigating Inductive Biases for Machine Learning Emulation of Sudden Stratospheric Warmings in Idealised Isca Simulations

研究理想化Isca模拟中平流层突然增温的机器学习模拟的归纳偏差

Oskar Bohn Lassen, Simon Driscoll, Stephen I. Thomson, Sebastian Schemm, Francisco C. Pereira

发表机构 * Technical University of Denmark(丹麦技术大学) University of Cambridge(剑桥大学) University of Exeter(埃克塞特大学)

AI总结 测试不同架构的归纳偏差对模拟平流层突然增温动力学的影响,发现三维垂直耦合是关键,但低预测误差不保证物理一致性。

AI中文摘要

机器学习模拟器越来越多地用于天气预报,并有可能通过学习动态重要的可预测性来源,将技能扩展到次季节到季节时间尺度。一个关键挑战是模型能否利用可预测性锚点,例如平流层变率,这些锚点在超出短期超前时间时影响对流层环流。我们使用配对的理想化Isca模拟测试架构归纳偏差如何影响对平流层突然增温(SSW)动力学的模拟,这些模拟仅在施加的波-2加热扰动上有所不同。在用于一步预测的卷积、变换器和基于图的架构中,当平流层动态安静时,模型差异不大,但当类似SSW的变率活跃时,差异显著扩大。我们的结果确定显式三维垂直耦合是机器学习模拟平流层动力学的关键归纳偏差。然而,Eliassen-Palm通量诊断表明,低预测误差并不能保证物理上真实的波-平均流相互作用,平流层波驱动结构中仍存在相干误差。

英文摘要

Machine-learning emulators are increasingly used for weather prediction and have the potential to extend skill on subseasonal-to-seasonal timescales by learning dynamically important sources of predictability. A key challenge is whether the models can exploit predictability anchors, such as stratospheric variability, that influence tropospheric circulation beyond short lead times. We test how architectural inductive bias affects emulation of sudden stratospheric warming (SSW) dynamics using paired idealised Isca simulations that differ only in an imposed wave-2 heating perturbation. Across convolutional, transformer, and graph-based architectures trained for one-step prediction, model differences are modest when the stratosphere is dynamically quiet but widen substantially when SSW-like variability is active. Our results identify explicit three-dimensional vertical coupling as a key inductive bias for machine-learning emulation of stratospheric dynamics. However, Eliassen-Palm flux diagnostics show that low forecast error does not guarantee physically faithful wave-mean-flow interaction, with coherent errors remaining in stratospheric wave-driving structure.

2606.18864 2026-06-18 cs.LG cs.AI 新提交

Scaling Learning-based AEB with Massive Unlabeled Data

基于大规模无标签数据的可扩展学习型自动紧急制动

Xiangyu Wang, Yang Zhan, Mengxiang Hao, Chuanchuan Zhong, Yansong Jia, Junjie Zhang, Yu Han, Xin Jiang, Zhen Cao, Ying Wang, Yulun Song, Zhitao Xu

发表机构 * Li Auto

AI总结 提出稳定元反馈半监督学习框架,通过噪声感知解耦和运动学门控伪标签,利用大规模无标签数据提升自动紧急制动性能,实现超100:1正误触发比和35%无事故里程提升。

Comments Accepted for presentation at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

AI中文摘要

本文研究如何在生产约束下,利用大规模无标签车队数据扩展基于学习的自动紧急制动(AEB)。我们的方法基于元反馈半监督学习(MF-SSL),其中教师模型为无标签驾驶数据生成伪标签,并使用小型有标签锚定集作为安全关键反馈进行更新。在生产中,锚定歧义和有标签-无标签不匹配会放大系统性的伪标签错误,导致误触发。我们提出了一种稳定的MF-SSL框架,包括:(i) 噪声感知解耦,从教师监督更新路径中移除易产生歧义的锚定;(ii) 运动学门控伪标签,结合教师冲突惩罚,抑制无标签数据上由不匹配引起的风险幻觉,同时保持广泛覆盖。大量实验表明,随着无标签数据从1M扩展到1B窗口,模型性能持续提升,在保持舒适性的同时提高了安全性。经过1B数据训练的学生模型已部署到数十万辆车辆上,并在超过10^9公里的行驶中得到验证,实现了超过100:1的正误触发比,且相比仅基于规则的基线,无事故行驶里程提升了35%。

英文摘要

This paper studies how to scale learning-based automatic emergency braking (AEB) with massive unlabeled fleet data under production constraints. Our approach is based on meta-feedback semi-supervised learning (MF-SSL), where a teacher generates pseudo labels for unlabeled driving data and is updated using a small labeled anchor set as safety-critical feedback. In production, anchor ambiguity and labeled-unlabeled mismatch can amplify systematic pseudo-label errors, leading to spurious triggers. We propose a stabilized MF-SSL framework with (i) Noise-Aware Decoupling, which removes ambiguity-prone anchors from the teacher's supervised update path, and (ii) kinematics-gated pseudo-labeling with a teacher conflict penalty to suppress mismatch-induced risk hallucinations on unlabeled data while maintaining broad coverage. Extensive experiments show consistent gains as unlabeled data scale from 1M to 1B windows, improving safety while keeping comfort stable. The 1B-trained student model is deployed to hundreds of thousands of vehicles and validated over $10^9$ km of driving, achieving a positive-to-false activation ratio exceeding 100:1 and a 35% improvement in accident-free driving mileage over a production rule-only baseline.

2606.18882 2026-06-18 cs.LG cs.AI eess.SP 新提交

Domain-Shift Aware Neural Networks for Unbalance Characterization in Rotating Systems

面向旋转系统不平衡表征的域偏移感知神经网络

Bernardo Feijó Junqueira, Claudio Kiyoshi Umezu, Bruno Bilhar Karaziack, Tomaz Junior, Daniel Alves Castello

发表机构 * Springer Nature

AI总结 提出域偏移感知神经网络,通过最大均值差异策略对齐源域与目标域特征,解决变工况下旋转轴不平衡质量估计的回归问题,实验证明该方法在域偏移未知时显著提升预测精度。

AI中文摘要

本文研究了域偏移感知神经网络在回归任务中的应用,旨在估计不同运行条件下旋转轴的不平衡质量。实验数据来自一个测试台,其中主轴上安装有带不平衡质量的法兰,在不同转速下驱动,同时可选择性地激活副轴以引入域差异。不平衡质量固定在径向距离上,使用三轴加速度计记录系统的动态响应。质量估计的逆问题在域自适应框架中提出,网络采用最大均值差异策略进行训练,以对齐源域和目标域的特征表示。结果表明,显式处理域偏移能有效提高预测精度,尤其是在系统的物理行为和域偏移来源不完全已知且超出训练条件的情况下。这些发现凸显了域偏移感知模型在结构健康监测回归任务中的潜力。

英文摘要

This work investigates the application of a domain-shift aware neural network for regression tasks aimed at estimating unbalance masses in rotating shafts under varying operating conditions. Experimental data were collected from a test rig in which a primary shaft, equipped with a flange carrying unbalanced masses, was driven at different rotational speeds, while a secondary shaft could be optionally activated to introduce domain discrepancy. The unbalance masses were positioned at a fixed radial distance, and the dynamic response of the system was recorded using triaxial accelerometers. The inverse problem of mass estimation is formulated within a domain adaptation framework, where the network is trained with a maximum mean discrepancy strategy to align feature representations across source and target distributions. The results demonstrate the effectiveness of explicitly addressing domain shift in improving prediction accuracy, especially when the system's physical behavior and sources of domain discrepancy are not fully known and fall outside the training conditions. These findings highlight the potential of domain-shift aware models for regression tasks in Structural Health Monitoring.
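The maximum mean discrepancy (MMD) term used to align source and target features can be sketched with a Gaussian kernel. This is the standard biased estimator of squared MMD, shown only as an illustration; the single fixed bandwidth, the function names, and the synthetic features are assumptions, and the paper's kernel choice is not stated in the abstract.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))

def mmd2(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between two feature sets."""
    k_ss = rbf_kernel(source, source, bandwidth).mean()
    k_tt = rbf_kernel(target, target, bandwidth).mean()
    k_st = rbf_kernel(source, target, bandwidth).mean()
    return float(k_ss + k_tt - 2.0 * k_st)

rng = np.random.default_rng(0)
source_features = rng.normal(size=(200, 4))
near = mmd2(source_features, source_features + 0.1)
far = mmd2(source_features, source_features + 3.0)
```

In a domain adaptation setup, a term like `mmd2(source_features, target_features)` would be added to the regression loss, so that the feature extractor is penalized when the two operating conditions produce distinguishable representations.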

2606.18933 2026-06-18 cs.LG cs.IR stat.ME 新提交

Zero-Shot Active Feature Acquisition via LLM-Elicitation

基于LLM引出(elicitation)的零样本主动特征获取

Binyamin Perets, Natalie Mendelson, Shiran Vainberg, Yehuda Chowers, Shai Shen-Orr, Shie Mannor

发表机构 * Faculty of EE, Technion(技术学院电子工程系) Faculty of Medicine, Technion(技术学院医学院) CytoReason NVIDIA

AI总结 提出通过LLM引出(elicitation)马尔可夫随机场充分统计量的零样本主动特征获取框架,解决数据标注不足问题,在IBD患者队列上其top-$k$获取策略在最困难的患者上显著优于现有方法。

AI中文摘要

主动特征获取(AFA)顺序选择要观察的特征以达成分类或排序决策。其主要局限性在于依赖大量标注数据来拟合指导获取的概率模型。大型语言模型(LLM)提供无监督的领域知识,但作为序列规划者表现不佳。要求其同时知晓和决策会混淆最好分开的能力。这里,我们通过有纪律的引出(elicitation)开发了一个零样本AFA框架:仅要求LLM返回其可被信任返回的内容,即马尔可夫随机场(MRF)的充分统计量——一元偏差和成对协变。我们将该框架应用于两个场景:二分类和top-$k$识别。实践中,LLM可靠地仅返回判别性统计量,即区分类别而非孤立每个类别的统计量,这阻碍了经典AFA。我们应用最大熵闭包来解决这种规范模糊性。我们在炎症性肠病(IBD)患者队列上进行评估,这是一个活跃的临床环境,其中诊断模糊性和患者异质性阻碍了稳定的治疗策略。我们的框架在真实标签和其自身提取的信念上均优于LLM。在最关键的地方,即最困难的患者上,我们的top-$k$获取策略显著优于所有现有方法。

英文摘要

Active feature acquisition (AFA) sequentially selects which features to observe to reach a classification or ranking decision. Its central limitation is reliance on a large amount of labeled data to fit probabilistic models guiding acquisition. Large language models (LLMs) supply unsupervised domain knowledge, but are poor sequential planners. Asking one to both know and decide conflates capabilities best kept separate. Here, we develop a framework for zero-shot AFA through disciplined elicitation: asking the LLM only for what it can be trusted to return, the unary deviations and pairwise co-variations that are the sufficient statistics of a Markov random field (MRF). We apply our framework to two settings: binary classification and top-$k$ identification. In practice, the LLM reliably returns only discriminative statistics, what distinguishes the classes rather than each class in isolation, which precludes classical AFA. We apply a maximum-entropy closure that resolves this gauge ambiguity. We evaluate on a cohort of Inflammatory Bowel Disease (IBD) patients, an active clinical setting where diagnostic ambiguity and patient heterogeneity obstruct stable treatment strategies. Our framework outperforms the LLM both on real labels and on its own extracted beliefs. Where it matters most, on the hardest patients, our top-$k$ acquisition policy markedly outperforms all existing methods.

2606.19026 2026-06-18 cs.LG cs.AI physics.ao-ph 新提交

A Hybrid LSTM--Vision Transformer Architecture for Predicting HRRR Forecast Errors

混合LSTM-视觉Transformer架构用于预测HRRR预报误差

David Aaron Evans, Jay C. Rothenberger, Kara J. Sulia, Nick P. Bassill, Chris D. Thorncroft

发表机构 * Atmospheric Sciences Research Center, University at Albany, SUNY(纽约州立大学奥尔巴尼分校大气科学研究中心) University of Oklahoma(俄克拉荷马大学) State Weather Risk Communication Center, University at Albany, SUNY(纽约州立大学奥尔巴尼分校州天气风险沟通中心)

AI总结 提出LSTM-ViT混合框架,结合地表观测时序与大气廓线,预测HRRR降水、风速和温度预报误差,相比基线LSTM性能提升,尤其降水误差预测技能提高约两倍。

Comments This manuscript is a preprint and has been submitted for peer review to the Artificial Intelligence for the Earth Systems journal. The content is subject to change based on the outcome of the peer review process and should not be considered final or definitive. Copyright in this Work may be transferred without further notice

AI中文摘要

高分辨率数值天气预报(NWP)系统中的预报误差通常与未解析的边界层(PBL)过程、对流、地形诱导环流以及其他垂直结构的大气现象有关。先前的研究表明,长短期记忆(LSTM)网络可以利用中尺度观测成功预测高分辨率快速刷新(HRRR)模型的预报误差,但我们认为性能下降与复杂垂直大气演化时期有关。为解决这一局限,我们开发了一种混合LSTM-视觉Transformer(LSTM-ViT)框架,将来自地表观测的时间序列学习与来自纽约州中尺度剖面仪网络的垂直大气廓线相结合。LSTM-ViT框架被训练用于预测单个中尺度站点上HRRR的逐时降水、10米风速和2米温度预报误差。在所有三个预测变量中,相对于基线LSTM架构,引入剖面仪导出的大气结构提高了预报误差预测技能,最大提升出现在较短的预报提前期和PBL活动增强期间。对于降水预报误差,改进尤为显著,LSTM-ViT框架相对于基线LSTM实现了约两倍的预测技能提升,同时更好地捕捉了对流驱动的误差演变并减少了与PBL过程相关的退化。这些结果表明,将时间序列学习与垂直注意力机制相结合,为改进业务NWP系统中的预报误差预测提供了一条具有物理意义的途径。我们的研究为预报员提供了关于模型偏差和预报置信度的增强指导。

英文摘要

Forecast errors in high-resolution numerical weather prediction (NWP) systems are often linked to unresolved planetary boundary layer (PBL) processes, convection, terrain-induced circulations, and other vertically structured atmospheric phenomena. Previous work demonstrated that Long Short-Term Memory (LSTM) networks can successfully predict forecast errors in the High-Resolution Rapid Refresh (HRRR) model using mesonet observations, but we believe performance degradation is linked to periods of complex vertical atmospheric evolution. To address this limitation, we develop a hybrid LSTM-Vision Transformer (LSTM-ViT) framework that combines temporal sequence learning from surface observations with atmospheric profiles from the New York State Mesonet profiler network. The LSTM-ViT framework is trained to predict HRRR hourly precipitation, 10 m wind speed, and 2 m temperature forecast errors at individual mesonet stations. Across all three predictors, incorporation of profiler-derived atmospheric structure improves forecast error prediction skill relative to the baseline LSTM architecture, with the largest gains occurring at shorter forecast lead times and during periods of enhanced PBL activity. Improvements are particularly pronounced for precipitation forecast error, where the LSTM-ViT framework achieves approximately a twofold increase in predictive skill relative to the baseline LSTM while better capturing convectively driven error evolution and reducing degradation associated with PBL processes. These results demonstrate that combining temporal sequence learning with vertically informed attention mechanisms provides a physically meaningful pathway for improving forecast error prediction in operational NWP systems. Our research offers forecasters enhanced guidance regarding model bias and forecast confidence.

2606.19108 2026-06-18 cs.LG 新提交

JourneyFormer: Encoding Airbnb Guest Journey with Sequence Modeling

JourneyFormer: 使用序列建模编码Airbnb客人旅程

Daochen Zha, Chun How Tan, Xin Liu, Bin Xu, Han Zhao, Xiaowei Liu, Tracy Yu, Hui Gao, Huiji Gao, Liwei He, Stephanie Moyerman, Sanjeev Katariya

发表机构 * Airbnb

AI总结 针对Airbnb中客人序列长、探索性强且标签稀疏的问题,提出JourneyFormer序列建模解决方案,通过优化数据选择、ID嵌入、模型架构和标签归因,并在两个生产面上通过在线A/B测试验证了其有效性。

Comments Accepted by KDD 2026

AI中文摘要

序列建模因其能够建模用户历史行为并推断用户意图,在推荐和排序算法中越来越受欢迎。尽管理论简单,但由于序列的复杂性和稀疏标签,序列模型在生产中的实际部署并非易事。例如,在Airbnb中,客人序列通常较长、具有探索性且复杂,我们关注的是稀疏的预订标签。因此,我们经常需要在数据和建模方面做出各种设计决策,以在有效性和可扩展性之间取得平衡。本文深入探讨了这些生产挑战,并部署了JourneyFormer,一种用于Airbnb搜索排序的序列建模解决方案。我们详细介绍了关键的设计考虑,涵盖客人事件选择、ID嵌入、模型架构和标签归因等方面。此外,我们描述了几种加速模型训练和推理的定制策略。JourneyFormer已成功部署在Airbnb的生产环境中,其有效性和影响不仅通过改进的离线排序指标得到证明,而且通过两个生产面上的在线A/B测试在关键业务指标上取得了显著提升。

英文摘要

Sequence modeling has become increasingly popular in recommendation and ranking algorithms, owing to its capacity to model users' historical behaviors and infer user intentions. Despite its theoretical simplicity, the practical deployment of a sequence model in production is non-trivial due to complexity of the sequence and sparse labels. For example, in Airbnb, guest sequences are often long, exploratory and complex, and we focus on booking labels, which are sparse. As such, we are often required to make various design decisions regarding data and modeling to strike a balance between effectiveness and scalability. This work delved into these production challenges and deployed JourneyFormer, a sequence modeling solution for search ranking at Airbnb. We detail crucial design considerations, covering aspects such as guest event selection, ID embeddings, model architecture, and label attribution. Additionally, we describe several tailored strategies to accelerate model training and inference. JourneyFormer has been successfully deployed within Airbnb's production, where its effectiveness and impact have been evidenced not only by improved offline ranking metrics but also by significant gains in key business metrics through online A/B testing across 2 production surfaces.

2606.19140 2026-06-18 cs.LG 新提交

ChronoSurv: A Clinical Pathway-Guided Graph Framework for Multimodal Survival Analysis

ChronoSurv:一种临床路径引导的多模态生存分析图框架

Hugo Miccinilli, Theo Di Piazza

发表机构 * Université Paris-Saclay, CentraleSupélec, MICS, France(巴黎-萨克雷大学,中央理工-高等电力学院,MICS,法国) University of Lyon, INSA Lyon, CREATIS, France(里昂大学,INSA里昂,CREATIS,法国)

AI总结 提出ChronoSurv,一种基于有向图的多模态生存分析框架,通过层次化拓扑和异质消息传递建模临床轨迹,在头颈癌数据集上取得最优判别性能与可靠校准。

Comments Accepted at MICCAI 2026. Submitted version due to embargo

AI中文摘要

准确的生存预测对于头颈癌的个性化治疗计划至关重要,但由于多模态临床数据的异质性和高维性,这仍然具有挑战性。虽然深度生存模型在预测性能上优于经典统计方法,但现有方法通常依赖于静态融合策略或时间无关建模,限制了其捕捉结构化临床工作流程的能力。在这项工作中,我们提出了ChronoSurv,一种用于多模态生存分析的异质层次有向图框架。ChronoSurv使用与关键诊断步骤对齐的有向图,将患者护理表示为进展感知的临床轨迹。层次拓扑包含细粒度、粗粒度和全局表示,进一步支持对缺失模态的灵活适应,而异质消息传递则建模了跨模态和临床步骤的复杂非对称关系。在两个公共数据集上的实验结果表明,ChronoSurv在保持统计可靠校准的同时,实现了最先进的判别性能。全面的消融研究进一步证实了每个架构组件的贡献,突出了轨迹感知图建模在多模态生存预测中的潜力。

英文摘要

Accurate survival prediction is essential for personalized treatment planning in head and neck cancer, yet remains challenging due to the heterogeneous and high-dimensional nature of multimodal clinical data. While deep survival models have improved predictive performance over classical statistical approaches, existing methods typically rely on static fusion strategies or temporally agnostic modeling, limiting their ability to capture structured clinical workflows. In this work, we propose ChronoSurv, a heterogeneous hierarchical directed graph framework for multimodal survival analysis. ChronoSurv represents patient care as a progression-aware clinical trajectory using directed graphs aligned with key diagnostic steps. A hierarchical topology incorporates fine-grained, coarse, and global representations, further supporting flexible adaptation to missing modalities, while heterogeneous message passing models complex and asymmetric relationships across modalities and clinical steps. Experimental results on two public datasets demonstrate that ChronoSurv achieves state-of-the-art discriminative performance while maintaining statistically reliable calibration. Comprehensive ablation studies further confirm the contribution of each architectural component, highlighting the potential of trajectory-aware graph modeling for multimodal survival prediction.

2606.19230 2026-06-18 cs.LG cs.HC stat.ML 新提交

A Human-in-the-Loop Bayesian Optimization Framework for Constraint-Aware Bioprocess Development

一种面向约束感知的生物过程开发的人机协同贝叶斯优化框架

Samuel Stricker, Claus Wirnsperger, Alessandro Butté, Laura Helleckes, Gonzalo Guillén Gosálbez, Antonio del Rio Chanona, Mehmet Mercangöz

发表机构 * Imperial College London(伦敦帝国理工学院) DataHow AG ETH Zurich(苏黎世联邦理工学院)

AI总结 提出一种扩展的帕累托前沿引导采样框架,通过将高斯过程代理的约束满足概率和鲁棒性作为多目标优化目标,结合交互式仪表盘实现人机协同的约束感知生物过程优化。

AI中文摘要

本文提出了帕累托前沿引导采样(PFGS)的一种扩展,这是一种人机协同(HitL)贝叶斯优化(BO)框架,其中高斯过程(GP)代理导出的量被重新表述为多目标优化问题的目标,得到的帕累托前沿暴露给领域专家进行交互式候选选择,而不是返回单一的自动推荐。该框架在两个方向上进行了扩展:约束优化通过将满足输出规格限的后验概率作为显式的帕累托目标来处理,该概率从GP后验分布解析计算得到;鲁棒优化通过蒙特卡洛采样策略来处理,该策略估计在用户定义的输入扰动变异性下的期望下置信性能,捕捉在可能的实现偏差下的性能退化。由此产生的多维帕累托表示通过交互式仪表盘上的成对二维投影同时显示预测性能、模型不确定性、概率约束满足和输入鲁棒性之间的权衡,使得选择标准能够随着代理模型的改进和开发目标的演变而迭代细化。该框架在一个八维的补料分批中国仓鼠卵巢(CHO)细胞培养模拟器上进行了展示,证明了系统性地识别高性能、满足可行性且对扰动具有鲁棒性的操作条件,并说明了专家定义的需求如何提供原则性的停止标准并支持实验资源的明智分配。

英文摘要

This work presents an extension to Pareto Front Guided Sampling (PFGS), a Human-in-the-Loop (HitL) Bayesian Optimization (BO) framework in which Gaussian process (GP) surrogate-derived quantities are reformulated as objectives of a multi-objective optimization problem, and the resulting Pareto front is exposed to a domain expert for interactive candidate selection rather than returning a single automated recommendation. The framework is extended in two directions: constrained optimization is addressed by incorporating the posterior probability of satisfying output specification limits as an explicit Pareto objective, computed analytically from the GP posterior distribution; robust optimization is addressed by a Monte Carlo sampling strategy that estimates expected lower-confidence performance over a user-defined variability of input perturbations, capturing performance degradation under likely implementation deviations. The resulting multi-dimensional Pareto representation renders trade-offs between predicted performance, model uncertainty, probabilistic constraint satisfaction, and input robustness simultaneously visible through pairwise two-dimensional projections on an interactive dashboard, enabling selection criteria to be iteratively refined as the surrogate model improves and development objectives evolve. The framework is showcased on an eight-dimensional fed-batch Chinese Hamster Ovary (CHO) cell culture simulator demonstrating systematic identification of high-performing, feasibility-compliant, and perturbation-resilient operating conditions, and illustrating how expert-defined requirements provide a principled stopping criterion and support informed allocation of experimental resources.
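
The analytic constraint probability and the Pareto filtering described in this abstract can be sketched as follows. This is a minimal illustration, assuming a Gaussian posterior at each candidate and that every objective is maximised; the function names, the candidate values, and the specification limits are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def prob_within_limits(mean, std, lower, upper):
    """P(lower <= y <= upper) when the GP posterior at a point is N(mean, std^2)."""
    posterior = NormalDist(mu=mean, sigma=std)
    return posterior.cdf(upper) - posterior.cdf(lower)


def pareto_front(points):
    """Keep the non-dominated points, assuming every objective is maximised."""
    def dominates(a, b):
        no_worse = all(x >= y for x, y in zip(a, b))
        better = any(x > y for x, y in zip(a, b))
        return no_worse and better

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: predicted performance plus the posterior mean and
# standard deviation of one constrained output with limits [0.0, 2.0].
candidates = [
    {"performance": 5.0, "c_mean": 1.0, "c_std": 0.5},
    {"performance": 6.0, "c_mean": 2.5, "c_std": 0.5},
    {"performance": 4.0, "c_mean": 1.2, "c_std": 0.5},
]
objectives = [
    (c["performance"], prob_within_limits(c["c_mean"], c["c_std"], 0.0, 2.0))
    for c in candidates
]
front = pareto_front(objectives)
print(front)
```

In this toy setup the third candidate is dominated by the first (lower performance and lower feasibility probability), so only two candidates would be left for an expert to choose between.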

2606.19255 2026-06-18 cs.LG 新提交

SCAN: Enhance Time Series Anomaly Detection via Multi-Scale Neighborhood-Centered Clustering

SCAN: 通过多尺度邻域中心聚类增强时间序列异常检测

Xingze Zheng, Hanyin Cheng, Siyuan Wang, Yiting Hao, Peng Chen, Yuan Jun, Yang Shu

发表机构 * East China Normal University(华东师范大学) APPLab, Huawei(华为2012应用实验室) Huawei(华为)

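AI summary: Proposes SCAN, which enhances reconstruction-based anomaly detection with multi-scale clustering: at the representation level it integrates cluster centers of normal patterns to constrain reconstruction, at the anomaly-criterion level it combines cluster membership probability with reconstruction error, and it uses neighborhood-centered representations to improve clustering, reaching state-of-the-art results on multiple real-world datasets.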

详情
AI中文摘要

时间序列异常检测在广泛的现实应用中扮演着关键角色。基于重建的方法已成为主流范式,但它们面临过度泛化和欠泛化问题,且难以平衡。为了解决这一问题,我们引入多尺度聚类来增强基于重建的方法。在表示层面,我们整合正常模式的聚类中心表示,以约束模型针对代表性正常模式进行重建,防止强大能力和表示能力的主导。在异常判据层面,我们基于聚类成员概率推导异常置信度分数,并将其与重建误差结合,提供双重检测标准。此外,聚类中心表示和异常置信度分数的有效性取决于聚类性能。因此,我们提取邻域中心表示用于多视图聚类,以提高聚类性能。在来自不同应用领域的多个真实数据集上的大量实验表明,SCAN达到了最先进的性能。

英文摘要

Time series anomaly detection plays a crucial role in a wide range of real-world applications. Reconstruction-based methods have become the mainstream paradigm, but they suffer from both over-generalization and under-generalization, which are difficult to balance. To address this, we introduce multi-scale clustering to enhance reconstruction-based methods. At the representation level, we integrate the cluster center representations of normal patterns to constrain the model to reconstruct representative normal patterns, preventing its strong capacity and representation capability from dominating. At the anomaly criterion level, we derive an anomaly confidence score from the cluster membership probability and combine it with the reconstruction error, providing dual criteria for detection. Furthermore, the effectiveness of the cluster center representations and the anomaly confidence score depends on clustering performance. Accordingly, we extract neighborhood-centered representations for multi-view clustering to improve clustering performance. Extensive experiments on multiple real-world datasets from diverse application domains demonstrate the state-of-the-art performance of SCAN.

2606.19292 2026-06-18 cs.LG 新提交

Risk Stratification for ICU Delirium using Pervasive Ambient Sensing Information

使用普适环境感知信息进行ICU谵妄风险分层

Jiaqing Zhang, Sabyasachi Bandyopadhyay, Miguel Contreras, Jessica Sena, Yuanfang Ren, Andrea Davidson, Ziyuan Guan, Tezcan Ozrazgat-Baslanti, Subhash Nerella, Azra Bihorac, Parisa Rashidi

发表机构 * University of Florida(佛罗里达大学) Stanford University(斯坦福大学)

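AI summary: Uses ambient sound and light-intensity data with efficient sequential neural network models to predict delirium risk in ICU patients; sound is the dominant predictor, adding light improves short-term prediction, and the AUC reaches 0.80.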

详情
AI中文摘要

谵妄是重症监护室(ICU)中常见且严重的并发症,与发病率增加、住院时间延长和医疗成本升高相关。尽管其普遍存在,早期预测和预防仍具挑战性。环境因素如环境声音和光照可能影响谵妄的发生,但在风险评估中常被忽视。在本研究中,我们检验了光照强度和声压级是否能在多个预测时间窗口内独立预测谵妄。我们评估了四种高效的序列神经网络模型,这些模型基于来自9个ICU的309名患者的数据,用于预测10种预测窗口大小的谵妄。我们使用Shapley Additive Explanations分析报告了特征重要性和影响方向。卷积模型实现了最强的区分能力,在声音数据和组合数据上的AUC均为0.80。声音特征是整体上的主要预测因子。将声音与光照结合改善了短期(<1周)预测,组合模型在感知期后立即分配最高风险。这些发现表明,被动环境感知,尤其是声音,可以为谵妄风险评估增加临床上有意义、可解释的信号,并为丰富多模态ICU预测和预防策略提供实用途径。

英文摘要

Delirium is a common and serious complication in the Intensive Care Unit (ICU), associated with increased morbidity, prolonged hospital stays, and higher healthcare costs. Despite its prevalence, early prediction and prevention remain challenging. Environmental factors such as ambient sound and light may influence the onset of delirium, yet they are often overlooked in risk assessments. In this study, we examined whether light intensity and sound pressure levels can independently predict delirium across multiple prediction horizons. We evaluated four efficient sequential neural network models on data collected from 9 ICUs across 309 patients to predict delirium for 10 prediction-window sizes. We reported feature importance and direction of influence using Shapley Additive Explanations analysis. The convolutional model achieved the strongest discrimination, with AUC = 0.80 on sound data and on combined data. Sound features were the dominant predictors overall. Integrating sound with light improved short-term ($<1$ week) prediction, with the combined model assigning the highest risk immediately after the sensing period. These findings suggest that passive ambient sensing, especially sound, can add a clinically meaningful, interpretable signal for delirium risk estimation and offer a practical pathway to enrich multimodal ICU prediction and prevention strategies.
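
For readers less familiar with the AUC figure quoted in this abstract, the following sketch computes it as the fraction of (positive, negative) patient pairs that a model ranks correctly. The labels and risk scores are made up for illustration, and the function name is hypothetical; it is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up risk scores for five patients (1 = developed delirium).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Five of the six positive-negative pairs are ordered correctly here, so the toy AUC is 5/6. An AUC of 0.80 means roughly four in five such pairs are ranked correctly.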

2606.17077 2026-06-18 physics.chem-ph cs.AI cs.LG quant-ph 交叉投稿

Comprehensive pKa Data Augmentation from Limited Real Data through an Engineered Models-Quantum Framework

基于工程化模型-量子框架从有限真实数据中全面增强pKa数据

Wang Rui, Liu Dinghao

发表机构 * Department of Chemistry, Tsinghua University(清华大学化学系) Department of Chemical Engineering, Tsinghua University(清华大学化学工程系) School of Science, China Pharmaceutical University(中国药科大学理学院)

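AI summary: To address the sparsity of pKa data, proposes a quantum-assisted molecular generation method that combines predictions from optimized machine-learning models with sampling on a quantum annealer, achieving extreme-value sampling on coherent Ising machines.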

详情
AI中文摘要

质子解离常数(pKa)对于功能分子发现和分子建模至关重要。基于已建立的最大实验pKa数据库iBonD,我们和其他研究人员开发了多种方法,包括基于机器学习的经验预测和高精度能量计算。尽管如此,高质量pKa数据的快速增强仍然受到根本性限制。作为这项工作的一部分,我们使用一组经过广泛优化的机器学习模型,对未标记分子数据集进行了大规模基于回归的pKa预测。结果表明,由于未标记分子数据集的特征分布,pKa数据分布近似正态,尾部区域样本极度稀缺。尽管这种增强对于提高整体数据可用性和预测建模非常有价值,但对于高效发现具有广谱pKa性质的分子仍然不足。为了解决这个问题,我们探索从广阔的化学空间中定向生成具有稀疏pKa性质的分子。鉴于传统的连续潜在空间VAE-RNN分子生成方法稳定性不足,且在补充稀疏数据方面未能显示出明显优势,我们设计并实现了一种量子辅助的稀疏pKa分子生成。在模拟量子退火器上验证了可行性,并在物理相干伊辛机(CIM)上进一步实现了优越的极端值采样。(未完待续)

英文摘要

Proton dissociation constants (pKa) are critical for functional molecule discovery and molecular modeling. Building on iBonD, the largest established experimental pKa database, we and other researchers have developed several methods, including machine-learning-based empirical prediction and high-accuracy energy calculations. Despite this foundation, the rapid augmentation of high-quality pKa data remains fundamentally constrained. As part of this work, we performed large-scale regression-based pKa prediction on unlabeled molecular datasets using a collection of extensively optimized machine-learning models. The results indicate that, owing to the feature distributions of the unlabeled molecular datasets, the pKa data distribution is approximately normal, with an extreme scarcity of tail-region samples. Although such augmentation is highly valuable for improving overall data availability and predictive modeling, it remains insufficient for efficiently discovering molecules with broad-spectrum pKa properties. To address this, we explore the targeted generation of molecules with sparse pKa properties from the vast chemical space. Given that traditional continuous-latent-space VAE-RNN methods for molecular generation suffer from insufficient stability and fail to demonstrate clear advantages in complementing sparse data, we design and implement a quantum-assisted method for sparse-pKa molecular generation. Feasibility is validated on a simulated quantum annealer, and superior extreme-value sampling is further achieved on physical coherent Ising machines (CIMs). (to be continued)

2601.23018 2026-06-18 cs.HC cs.AI cs.LG 交叉投稿

Integrating Multi-Label Classification and Generative AI for Scalable Analysis of User Feedback

整合多标签分类与生成式AI实现用户反馈的可扩展分析

Sandra Loop, Erik Bertram, Sebastian Juhl, Martin Schrepp

发表机构 * SAP SE(SAP公司) Hochschule Fresenius Heidelberg(弗赖辛大学海德堡分校) University of Missouri(密苏里大学)

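AI summary: Proposes combining supervised multi-label classification with generative AI to process large volumes of user comments efficiently, assigning topic labels automatically and generating summaries, and finds that sentiment analysis does not reliably reflect product satisfaction.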

Comments 8 pages, 2 figures, submitted to Springer Nature

详情
AI中文摘要

在高度竞争的软件市场中,用户体验(UX)评估对于确保软件质量和促进产品长期成功至关重要。此类UX评估通常将标准化问卷的定量指标与通过开放式问题收集的定性反馈相结合。虽然开放式反馈为改进提供了有价值的见解,并有助于解释定量结果,但分析大量用户评论具有挑战性且耗时。在本文中,我们介绍了一家大型软件公司在长期UX测量项目中开发的技术,以高效处理和解释大量用户评论。为了提供收集到的评论的高层概述,我们采用监督机器学习方法,为每条评论分配有意义的预定义主题标签。此外,我们展示了如何利用生成式AI(GenAI)创建简洁且信息丰富的用户反馈摘要,促进向组织尤其是高层管理人员有效传达发现。最后,我们研究了用户评论中表达的情感是否可以作为整体产品满意度的指标。我们的结果表明,仅凭情感分析并不能可靠地反映用户满意度。相反,产品满意度需要在调查中明确评估,以衡量用户对产品的感知。

英文摘要

In highly competitive software markets, user experience (UX) evaluation is crucial for ensuring software quality and fostering long-term product success. Such UX evaluations typically combine quantitative metrics from standardized questionnaires with qualitative feedback collected through open-ended questions. While open-ended feedback offers valuable insights for improvement and helps explain quantitative results, analyzing large volumes of user comments is challenging and time-consuming. In this paper, we present techniques developed during a long-term UX measurement project at a major software company to efficiently process and interpret extensive volumes of user comments. To provide a high-level overview of the collected comments, we employ a supervised machine learning approach that assigns meaningful, pre-defined topic labels to each comment. Additionally, we demonstrate how generative AI (GenAI) can be leveraged to create concise and informative summaries of user feedback, facilitating effective communication of findings to the organization and especially upper management. Finally, we investigate whether the sentiment expressed in user comments can serve as an indicator for overall product satisfaction. Our results show that sentiment analysis alone does not reliably reflect user satisfaction. Instead, product satisfaction needs to be assessed explicitly in surveys to measure the user's perception of the product.

2606.03745 2026-06-18 hep-ph cs.LG hep-ex physics.data-an 交叉投稿

Predicting the Neutrino Mass Ordering Using Neural Networks

利用神经网络预测中微子质量顺序

T. J. C. Bezerra, L. Asquith, E. Bannister, W. Shorrock

发表机构 * Department of Physics and Astronomy, University of Sussex(苏塞克斯大学物理与天文学系)

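AI summary: For the neutrino mass ordering, a central open problem in particle physics, proposes a machine-learning approach based on a feed-forward neural-network classifier trained on synthetic long-baseline datasets; compared with standard χ² and log L methods it performs comparably and can serve as an independent cross-check.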

Comments 11 pages, 7 figures

详情
AI中文摘要

确定中微子质量顺序仍是粒子物理中的一个核心开放问题。虽然下一代长基线实验有望解决这一问题,但当前数据提供的灵敏度有限,因为正常顺序和倒置顺序之间的谱差异细微且与参数简并纠缠。我们研究了一种用于质量顺序确定的机器学习策略,使用前馈神经网络分类器,该分类器在合成长基线数据集上训练,这些数据集由三味振荡概率、物质效应和统计涨落生成。我们使用常见的判别指标(包括接收者操作特征曲线)将分类器与标准χ²和logL方法进行评估,以量化灵敏度并说明如何选择操作点以优先考虑纯度或效率。我们发现,在所研究的场景中,神经网络实现了与常规拟合相当的性能,为已有分析提供了灵活、独立的交叉检验。该框架可以扩展以包含系统不确定性并探索振荡参数的联合推断,也可作为在中微子物理中引入机器学习方法的教学工具。

英文摘要

Determining the neutrino mass ordering remains a central open problem in particle physics. While next-generation long-baseline experiments are expected to resolve this question, current data provide limited sensitivity because the spectral differences between normal and inverted ordering are subtle and entangled with parameter degeneracies. We investigate a machine-learning strategy for mass-ordering determination using a feed-forward neural-network classifier trained on synthetic long-baseline datasets generated with three-flavour oscillation probabilities, matter effects, and statistical fluctuations. We evaluate the classifier against standard $χ^2$ and $\log\mathcal{L}$ approaches using common discrimination metrics, including receiver-operating-characteristic curves, to quantify sensitivity and to illustrate how operating points can be selected to prioritise purity or efficiency. We find that the neural network achieves performance comparable to conventional fits for the scenarios studied, providing a flexible, independent cross-check of established analyses. The framework can be extended to incorporate systematic uncertainties and to explore joint inference of oscillation parameters, and it may also serve as a pedagogical tool for introducing machine-learning methods in neutrino physics.
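
The trade-off between purity and efficiency at different operating points, mentioned in this abstract, can be illustrated with a short sketch. The labels, classifier scores, and thresholds below are invented for illustration (1 stands for one ordering hypothesis treated as "signal"); the function name is hypothetical and this is not the authors' analysis code.

```python
def purity_efficiency(labels, scores, threshold):
    """Purity = TP / (TP + FP); efficiency = TP / (TP + FN) at a score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    purity = tp / (tp + fp) if tp + fp else float("nan")
    efficiency = tp / (tp + fn) if tp + fn else float("nan")
    return purity, efficiency


# Invented classifier outputs for eight pseudo-experiments.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

loose = purity_efficiency(labels, scores, 0.5)
tight = purity_efficiency(labels, scores, 0.75)
print(loose, tight)
```

Raising the threshold from 0.5 to 0.75 removes the one false positive in this toy sample, so purity rises while efficiency falls. That is the choice an analyst makes when picking an operating point on the ROC curve.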

2606.18271 2026-06-18 cs.AI cs.LG 交叉投稿

NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation

NAVI-Orbital:用于自主地球观测的零样本视觉语言模型的首次在轨演示

Juan Manuel Delfa Victoria, Taran Cyriac John, Andrew W. Herson

发表机构 * NASA Jet Propulsion Laboratory (JPL)(美国宇航局喷气推进实验室) Loft Orbital(Loft Orbital公司)

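AI summary: Presents the NAVI-Orbital system, which achieves the first autonomous multi-modal inference by a vision-language model onboard a Low Earth Orbit satellite and addresses the data-downlink bottleneck through semantic compression.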

Comments 17 pages, 47 figures

详情
AI中文摘要

随着地球观测数据的生成速度超过下行链路带宽和人在回路处理能力,星载采集与可操作地面情报之间的差距日益扩大。本文介绍NAVI-Orbital,一个部署在低地球轨道(LEO)航天器上的软件系统。2026年4月16日,NAVI-Orbital实现了据作者所知首次在轨演示,即视觉语言模型完全在星上进行自主多模态推理。NAVI-Orbital使用本地视觉语言模型(Gemma 3)对每个捕获场景进行分类,生成其内容及特征间关系的文本描述,并通过自然语言对话响应操作员的后续查询。该系统通过纯英语提示替代传统指令序列进行任务重定向,并由基于图的状态机(LangGraph)编排,协调用于检测和对话的专用代理。地面基准测试(在7,960张图像的精选AID基准上准确率达88.16%)、Flatsat验证以及实时在轨捕获的新获取、未见过的地球图像(包括未校正的YAM-9图像,在星上通过硬件加速GPU推理处理且未对飞行仪器进行微调)的结果表明,在卫星级边缘计算机上运行基础模型是可行的,通过星上地球观测的语义压缩,颠覆了传统的先采集后全部下传的带宽模式。

英文摘要

As Earth Observation data generation outpaces downlink bandwidth and human-in-the-loop processing, a widening gap has emerged between onboard collection and actionable ground intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft. On April 16, 2026, NAVI-Orbital achieved what is, to the authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. NAVI-Orbital uses a local vision-language model (Gemma 3) to classify each captured scene, produce a text description of its content and the relationships between its features, and respond to operator follow-up via natural-language dialogue. The system is re-tasked through plain-English prompts in place of conventional command sequences, and is orchestrated by a graph-based state machine (LangGraph) coordinating dedicated agents for detection and dialogue. Results across ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live in-orbit captures of newly acquired, previously unseen Earth imagery (including uncorrected YAM-9 imagery, processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument) demonstrate the feasibility of running foundation models on satellite-class edge computers to invert the conventional acquire-then-downlink-everything bandwidth profile through semantic compression of Earth observations in-orbit.

2606.18323 2026-06-18 cs.SD cs.LG 交叉投稿

Reliable Neural-Codec Text-to-Speech by ASR Self-Verification and Distillation: Near-Zero Catastrophic Failures Across Models and Codecs

通过ASR自验证与蒸馏实现可靠的神经编解码文本转语音:跨模型与编解码器的近零灾难性失败

Ali Asaria, Tony Salomone, Deep Gandhi

发表机构 * Transformer Lab

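AI summary: Targets the stochastic catastrophic failures (silence, early stopping, repetition, or hallucination) of open autoregressive neural-codec TTS models with a format-robust metric based on an ASR round-trip; best-of-N self-verification drives the failure rate to near zero, and distillation transfers the robustness to single-shot decoding, closing about 52-58% of failures at no test-time cost.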

详情
AI中文摘要

开放自回归神经编解码文本转语音(TTS)模型在典型输入上表现优异,但会出现随机灾难性失败:在相当一部分话语中,它们会发出静音、提前终止或陷入重复或幻觉内容。我们表明这种失败模式可以廉价地消除。在单一格式鲁棒度量(通过ASR往返的灾难性失败率)下,最佳N ASR自验证将失败率降至近零:在标准语料库(LibriSpeech)上N=2时未观察到失败,在困难提示集上N=4时也未观察到。这不是单一模型的假象:该减少在四个开放编解码TTS系统和三个神经编解码器(XCodec2、SNAC、Mimi)上复现,其中三个系统在N=2时达到近零下限。然后,通过将自验证行为蒸馏到模型中,我们在推理时免费实现了修复,这恢复了单次解码中的大部分鲁棒性,在无测试代价下关闭了困难输入上约52-58%的失败。蒸馏增益集中在需要的地方(困难输入);在已经可靠的散文上,没有改进空间且无检测到变化。一项受控比较添加了一个干净的负面结果:离线直接偏好优化(DPO/IPO)并未优于普通监督蒸馏,而在线迭代变体虽有前景但在我们的评估规模下统计上不显著。我们诚实地报告了唯一抵抗的模型(一个更大的Llasa,其中规模并未明显帮助)以及一个罕见词能力上限,该上限无法通过任何自蒸馏方法克服。

英文摘要

Open autoregressive neural-codec text-to-speech (TTS) models sound excellent on typical inputs yet suffer stochastic catastrophic failures: on a meaningful fraction of utterances they emit silence, terminate early, or collapse into repetitive or hallucinated content. We show this failure mode is cheap to remove. Under a single format-robust metric (a catastrophic-failure rate via an ASR round-trip), best-of-N ASR self-verification drives failures to near-zero: no observed failures remain by N=2 on a standard corpus (LibriSpeech) and by N=4 on a hard prompt set. This is not an artifact of one model: the reduction replicates across four open codec-TTS systems and three neural codecs (XCodec2, SNAC, Mimi), reaching the near-zero floor by N=2 on three of the four. We then make the fix free at inference time by distilling the self-verified behaviour into the model, which recovers much of the robustness in single-shot decoding, closing ~52-58% of the failure mass on hard inputs at no test-time cost. The distillation gain concentrates where it is needed (hard inputs); on already-reliable prose there is no headroom and no detectable change. A controlled comparison adds a clean negative: offline direct preference optimization (DPO/IPO) does not beat plain supervised distillation, and an online iterative variant is promising but not statistically separable at our evaluation size. We report honestly the one model that resists (a larger Llasa where scale did not obviously help) and a rare-word capability ceiling that no self-distillation method overcomes.
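
The best-of-N ASR self-verification loop described in this abstract might look roughly like the sketch below. It is a minimal illustration under stated assumptions: `synthesize` and `transcribe` are hypothetical callables standing in for a TTS model and an ASR model, the word-error-rate threshold of 0.5 is an arbitrary placeholder for "catastrophic failure", and the early-exit rule is a guess at one reasonable selection policy, not the authors' exact procedure.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                          # deletion
                           cur[j - 1] + 1,                       # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)


def best_of_n(text, synthesize, transcribe, n, max_wer=0.5):
    """Sample up to n utterances and keep the one whose transcript best matches text."""
    best_audio, best_wer = None, float("inf")
    for _ in range(n):
        audio = synthesize(text)
        wer = word_error_rate(text, transcribe(audio))
        if wer < best_wer:
            best_audio, best_wer = audio, wer
        if best_wer <= max_wer:  # good enough: stop sampling early
            break
    return best_audio, best_wer


# Toy stand-ins: the first attempt is "silence", the second is correct.
attempts = iter(["", "the quick brown fox"])
audio, wer = best_of_n(
    "the quick brown fox",
    synthesize=lambda text: next(attempts),  # hypothetical TTS call
    transcribe=lambda audio: audio,          # hypothetical ASR call
    n=2,
)
print(audio, wer)
```

In the toy run the silent first sample scores a word error rate of 1.0 and is rejected, and the second sample passes. In a real system the stand-ins would be replaced by actual model calls.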

2606.18429 2026-06-18 cs.CV cs.AI cs.LG 交叉投稿

CAOA -- Completion-Assisted Object-CAD Alignment

CAOA -- 补全辅助的物体-CAD对齐

Hiranya Garbha Kumar, Minhas Kamal, Balakrishnan Prabhakaran

发表机构 * University at Albany(奥尔巴尼大学)

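AI summary: Proposes CAOA, which combines semantically aware point cloud completion with symmetry-aware relative pose estimation, achieves a 17% accuracy improvement on Scan2CAD, and releases the S2C-Completion dataset.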

Comments GitHub: https://github.com/MinhasKamal/CAOA

详情
Journal ref
Thirteenth International Conference on 3D Vision (3DV), 2026
AI中文摘要

准确地将CAD模型与室内RGB-D扫描中的对应物体对齐是3D语义重建的核心挑战。该任务需要估计9自由度(DoF)位姿——位置、旋转和三轴尺度——但受到噪声和不完整扫描以及导致几何畸变的分割误差的阻碍。我们提出补全辅助的物体-CAD对齐(CAOA),该方法将语义和上下文感知的点云补全模块与对称感知的相对位姿估计算法相结合,实现CAD模型与扫描物体的精确对齐。现有的补全方法通常在合成数据集上训练和评估,往往难以泛化到真实扫描。为弥合这一差距,我们引入了一种针对室内场景的合成数据生成策略,通过与广泛使用的补全数据集进行定量比较,验证了其显著减小合成到真实领域差距的效果。此外,我们发布了S2C-Completion,一个来自Scan2CAD的超过8500个物体-CAD对的专家标注数据集,用于真实室内单物体补全,并作为该任务的新基准。对于物体-CAD对齐,我们通过对称感知损失融入对称信息,提高了对对称模糊的鲁棒性。在Scan2CAD基准上,CAOA相比最先进方法实现了17%的精度提升。

英文摘要

Accurately aligning CAD models to their corresponding objects in indoor RGB-D scans is a central challenge in 3D semantic reconstruction. The task requires estimating a 9-Degree-of-Freedom (DoF) pose (position, rotation, and scale along three axes) but is hindered by noisy and incomplete scans, as well as segmentation errors that cause geometric distortions. We present Completion-Assisted Object-CAD Alignment (CAOA), a method that integrates a semantically and contextually aware point cloud completion module with a symmetry-aware relative pose estimation algorithm, enabling precise alignment of CAD models to scanned objects. Existing completion methods are typically trained and evaluated on synthetic datasets, which often fail to generalize to real-world scans. To bridge this gap, we introduce a synthetic data generation strategy tailored to indoor scenes, significantly reducing the synthetic-to-real domain gap, as validated through quantitative comparisons with widely used completion datasets. In addition, we release S2C-Completion, an expert-annotated dataset of over 8,500 object-CAD pairs from Scan2CAD, created for real-world indoor single-object completion and intended as a new benchmark for this task. For object-CAD alignment, we incorporate symmetry information via a symmetry-aware loss, improving robustness to symmetric ambiguities. On the Scan2CAD benchmark, CAOA achieves a 17% accuracy improvement over state-of-the-art methods.
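
To make the 9-DoF pose in this abstract concrete, the sketch below applies three scale factors, three rotation angles, and a three-component translation to a CAD point. The Z-Y-X Euler convention and the scale, then rotate, then translate order are assumptions chosen for illustration; the paper may parameterise the pose differently, and the function names are hypothetical.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation R = Rz(rz) @ Ry(ry) @ Rx(rx), angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(point, scale, angles, translation):
    """Map a CAD-frame point into the scan frame: scale, then rotate, then translate."""
    scaled = [s * p for s, p in zip(scale, point)]
    rot = rotation_matrix(*angles)
    rotated = [sum(rot[i][j] * scaled[j] for j in range(3)) for i in range(3)]
    return [r + t for r, t in zip(rotated, translation)]


# 3 scale + 3 rotation + 3 translation parameters = 9 degrees of freedom.
aligned = apply_pose(
    point=(1.0, 0.0, 0.0),
    scale=(2.0, 1.0, 1.0),
    angles=(0.0, 0.0, math.pi / 2),  # quarter turn about the z axis
    translation=(0.0, 0.0, 1.0),
)
print([round(v, 6) for v in aligned])
```

The example point is stretched along x, turned onto the y axis, and lifted by one unit, which is the kind of transform an alignment method has to recover from a partial scan.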

2606.18464 2026-06-18 astro-ph.IM astro-ph.EP cs.LG 交叉投稿

Modeling Doppler Shifts in Radial-Velocity Data with Deep Learning toward Earth-mass Exoplanet Detection

利用深度学习建模径向速度数据中的多普勒频移以探测地球质量系外行星

Isidro Gómez-Vargas, Xavier Dumusque, Yinan Zhao, Khaled Al Moulla, Michael Cretignier

发表机构 * Department of Astronomy, University of Geneva, 51 chemin de Pegasi, 1290 Versoix, Switzerland. Instituto de Astrofísica de Andalucía (CSIC), Glorieta de la Astronomía s/n, E-18008 Granada, Spain. Institute of Space Sciences (CSIC), Carrer de Can Magrans s/n, E-08193 Barcelona, Spain. Department of Astronomy, University of Texas at Austin, 2515 Speedway, Austin, TX 78712, USA. Instituto de Astrofísica e Ciências do Espaço, Universidade do Porto, CAUP, Rua das Estrelas, 4150-762 Porto, Portugal. Department of Physics, University of Oxford, OX13RH Oxford, UK.

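AI summary: To counter contamination from stellar activity, proposes a framework that combines physics-motivated spectral representations with deep learning, optimized with cross-validation and a genetic algorithm; it reliably recovers planetary signals with amplitudes ≥25 cm/s and periods of 10-550 days, and is released as the Python package doppleriann.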

Comments 20 pages, 14 figures. Accepted for publication in Astronomy & Astrophysics

详情
AI中文摘要

由于恒星活动的影响,在恒星径向速度测量中探测由地球质量行星引起的微小多普勒频移仍然极具挑战性。许多在模拟数据上表现良好的深度学习方法难以可靠地应用于真实恒星光谱。本工作的目标是开发一种深度学习框架,使其能够泛化到真实、未见过的光谱,并提高径向速度数据中地球质量行星的可探测性。我们在注入行星信号的HARPS-N太阳光谱上训练人工神经网络,使用基于通量和谱线形成温度的物理驱动光谱表示,以及它们的速度梯度。探索了两种训练策略:留出测试和交叉验证。通过基于遗传算法的超参数优化增强模型鲁棒性,并使用蒙特卡洛dropout量化预测不确定性。在交叉验证策略下,我们最精确的神经网络模型能够可靠地恢复振幅≥25 cm/s、周期在10到550天之间的行星信号的振幅、相位和轨道周期。此外,在所有测试案例中,成功恢复的信号对应于多普勒频移预测周期图中最显著的峰值。基于温度的光谱壳表示始终优于基于通量的壳。我们还发布了实现该框架的Python包doppleriann。我们的结果表明,将物理驱动的光谱表示与深度学习相结合,为从真实观测的径向速度数据中探测地球质量行星提供了一条有前景的途径,该建模框架既具有物理基础又具有统计严谨性,并包含了不确定性量化和优化的训练策略。

英文摘要

Detecting the tiny Doppler shifts induced by Earth-mass planets in stellar radial-velocity measurements remains extremely challenging due to stellar activity. Many deep-learning methods performing well on simulated data remain difficult to apply reliably on real stellar spectra. The aim of this work is to develop a deep-learning framework that generalizes to real, unseen spectra and improves the detectability of Earth-mass planets in radial-velocity data. We train artificial neural networks on HARPS-N solar spectra with injected planetary signals, using physics-motivated spectral representations based on flux and line-formation temperature, together with their velocity gradients. Two training strategies are explored: hold-out testing and cross-validation. Model robustness is enhanced through genetic-algorithm-based hyperparameter optimization, and predictive uncertainty is quantified using Monte Carlo dropout. Our most precise neural network model reliably retrieves, under the cross-validation strategy, the amplitudes, phases, and orbital periods of planetary signals with amplitudes greater than or equal to 25 cm/s and periods between 10 and 550 days. In addition, in all cases tested here, the successfully recovered signals correspond to the most significant peaks in the periodograms of the Doppler-shift predictions. Temperature-based spectral-shell representations consistently outperform flux-based shells. We also release doppleriann, a Python package implementing the proposed framework. Our results demonstrate that combining physically motivated spectral representations with deep learning provides a promising pathway toward the detection of Earth-mass planets in radial-velocity data from real observations, supported by a modeling framework that is both physically grounded and statistically rigorous, incorporating uncertainty quantification and optimized training strategies.
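
To give a sense of scale for the 25 cm/s amplitude quoted in this abstract, the sketch below evaluates the non-relativistic Doppler relation, delta_lambda / lambda = v / c. The 550 nm rest wavelength is an arbitrary example line chosen for illustration and does not come from the paper.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact, by definition of the metre


def fractional_doppler_shift(radial_velocity_m_per_s):
    """Non-relativistic Doppler shift: delta_lambda / lambda = v / c."""
    return radial_velocity_m_per_s / SPEED_OF_LIGHT_M_PER_S


def wavelength_shift_m(rest_wavelength_m, radial_velocity_m_per_s):
    """Absolute wavelength shift of a spectral line, in metres."""
    return rest_wavelength_m * fractional_doppler_shift(radial_velocity_m_per_s)


amplitude = 0.25  # 25 cm/s, the smallest amplitude quoted in the abstract
shift = fractional_doppler_shift(amplitude)
line_shift = wavelength_shift_m(550e-9, amplitude)  # 550 nm: arbitrary example line
print(f"{shift:.3e}", f"{line_shift:.3e}")
```

The fractional shift comes out below one part in a billion, which is why stellar activity can so easily mask a signal of this size.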

2606.18698 2026-06-18 cs.RO cs.AI cs.LG 交叉投稿

Leveraging Energy Features for Surface Classification with Deep Learning: A Comparative Analysis Across Three Independent Datasets

利用能量特征进行基于深度学习的表面分类:三个独立数据集的比较分析

Alexander Belyaev, Oleg Kushnarev

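AI summary: Evaluates energy features as a standalone or supplementary modality for surface classification, comparing several deep learning architectures on three datasets; CNNs perform best, energy-only features reach 85-90% accuracy, combining them with inertial features reaches 96-99%, and energy features consistently add 1-2% accuracy.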

详情
AI中文摘要

基于能量的方法在移动机器人表面分类中仍是一个相对未被充分研究的途径,尽管在受限环境中取得了有希望的结果。本研究评估了使用能量衍生特征作为独立分类模态或作为惯性数据补充输入的可行性。在三个公开数据集上进行了全面评估,比较了现代深度学习架构(包括循环神经网络、卷积神经网络、仅编码器变压器和Mamba状态空间模型)在自动超参数调整和输入序列长度优化下的性能。模型在所有评估数据集上均实现了比先前报道值更高的准确率,其中卷积神经网络取得了最高的整体性能。当仅依赖基于能量的特征时,模型分类准确率在85-90%范围内,比与惯性特征结合时(96-99%)低约5-10%。用能量特征增强惯性数据导致平均准确率持续提高1-2%。这些发现表明,仅依赖能量特征的分类器为独立部署提供了足够的准确性,同时在与其它感知模态结合使用时也提供了一致的增益。

英文摘要

The energy-based method remains a comparatively underexamined approach for surface classification in mobile robotics, despite promising results in constrained environments. This study evaluated the viability of using energy-derived features as either a standalone classification modality or as supplementary input to inertial data. A comprehensive evaluation was conducted across three publicly available datasets, comparing the performance of modern deep learning architectures including recurrent neural networks, convolutional neural networks, encoder-only transformers, and Mamba state-space models, under automated hyperparameter tuning and input sequence length optimization. The models achieved higher accuracy than previously reported values on all evaluated datasets, with the convolutional neural network yielding the highest overall performance. When relying exclusively on energy-based features, the models attained classification accuracies in the range of 85-90%, approximately 5-10% lower than those achieved when combined with inertial features (96-99%). Augmenting inertial data with energy features resulted in a consistent mean accuracy improvement of 1-2%. These findings indicate that classifiers relying solely on energy features offer sufficient accuracy for standalone deployment, while also providing a consistent gain when used in combination with other sensing modalities.

2606.18723 2026-06-18 cs.CV cs.LG 交叉投稿

Clinically Aligned Geometry Constraints for Robust IVUS Vessel Boundary Segmentation

临床对齐的几何约束用于鲁棒的IVUS血管边界分割

Yunshu Chen, Litao Yang, Giuseppe Di Giovanni, Jordan Tan, Deval Mehta, Andrew Lin, Derek Chew, Masasi Fujino, Julie Butters, Stephen Nicholls, Zongyuan Ge, Kyung Hoon Cho

发表机构 * AIM For Health Lab, Monash University(莫纳什大学AIM健康实验室) Department of Data Science and Artificial Intelligence, Faculty of IT, Monash University(莫纳什大学信息技术学院数据科学与人工智能系) Monash University Victorian Heart Institute(莫纳什大学维多利亚心脏研究所) School of Computing Technologies, RMIT University(皇家墨尔本理工大学计算技术学院) National Cerebral and Cardiovascular Center(国立循环器病研究中心) Department of Cardiology, Chonnam National University Hospital and Medical School(全南大学医院和医学院心脏病学系)

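AI summary: Proposes the GeoCat network, which uses dual encoders and a differentiable geometry consistency loss to reduce boundary drift and topology errors in IVUS segmentation and to improve the accuracy of clinical geometric measurements.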

Comments MICCAI2026 Accepted

详情
AI中文摘要

血管内超声(IVUS)管腔和外弹性膜(EEM)分割对于定量评估冠状动脉斑块负荷至关重要。管腔或EEM勾画的误差会直接传播到斑块面积、斑块负荷和几何测量中。然而,优先考虑重叠分数的标准方法常常遭受边界漂移和拓扑错误,导致临床测量不准确。我们提出GeoCat,一个几何一致性网络,使用双笛卡尔-极坐标编码器,结合跨域注意力和时间融合,处理5帧IVUS片段。可微的几何一致性损失直接监督临床相关描述符,包括直径、方向和横截面积。该模型在来自146名患者的12,242张标注帧上训练,这些帧使用两种商用IVUS系统采集。我们使用分割准确性和斑块相关临床指标评估性能,包括Dice/IoU、边界测量(95HD(mm)、ASSD)、拓扑违规率和临床几何误差(dmax/dmin、角度和面积)。在我们的数据集上,GeoCat实现了0.93的Dice,将95HD降低到0.14 mm,并将拓扑违规率降低到1.0%。重要的是,它显著提高了几何保真度,产生0.13-0.16 mm的直径误差和约8度的角度误差,支持可靠的斑块负荷量化。

英文摘要

Intravascular ultrasound (IVUS) lumen and external elastic membrane (EEM) segmentation is important for quantitative coronary plaque burden assessment. Errors in lumen or EEM delineation directly propagate to plaque area, plaque burden and geometric measurements. However, standard methods prioritising overlap scores often suffer from boundary drift and topology errors, leading to inaccurate clinical measurements. We present GeoCat, a geometry-consistent network that processes 5-frame IVUS clips using dual Cartesian-polar encoders with cross-domain attention and temporal fusion. A differentiable geometry consistency loss directly supervises clinically relevant descriptors including diameters, orientations, and cross-sectional areas. The model is trained on 12,242 annotated frames from 146 patients acquired with two commercial IVUS systems. We evaluate performance using both segmentation accuracy and plaque-relevant clinical metrics, including Dice/IoU, boundary measures (95HD in mm, ASSD), topology violation rate, and clinical geometry errors (dmax/dmin, angles, and areas). On our dataset, GeoCat achieves a Dice of 0.93, reduces 95HD to 0.14 mm, and lowers topology violations to 1.0%. Importantly, it significantly improves geometric fidelity, yielding diameter errors of 0.13-0.16 mm and angular errors of ~8 degrees, supporting reliable plaque burden quantification.
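
The overlap scores and the way a delineation error propagates into plaque burden, both mentioned in this abstract, can be illustrated with a small sketch. It uses the standard definitions Dice = 2|A∩B| / (|A| + |B|), IoU = |A∩B| / |A∪B|, and plaque burden = (EEM area - lumen area) / EEM area. The masks and areas are toy values, and the function names are hypothetical.

```python
def dice_and_iou(predicted, reference):
    """Overlap scores for two masks given as sets of (row, col) pixels."""
    overlap = len(predicted & reference)
    total = len(predicted) + len(reference)
    union = len(predicted | reference)
    dice = 2.0 * overlap / total if total else 1.0
    iou = overlap / union if union else 1.0
    return dice, iou


def plaque_burden_percent(eem_area_mm2, lumen_area_mm2):
    """Plaque burden = (EEM area - lumen area) / EEM area, in percent."""
    if eem_area_mm2 <= 0:
        raise ValueError("EEM area must be positive")
    return 100.0 * (eem_area_mm2 - lumen_area_mm2) / eem_area_mm2


# Toy masks: three pixels each, two in common.
predicted = {(0, 0), (0, 1), (1, 0)}
reference = {(0, 0), (0, 1), (1, 1)}
dice, iou = dice_and_iou(predicted, reference)

# Toy areas: under-segmenting the lumen by 1 mm^2 inflates the plaque burden.
true_burden = plaque_burden_percent(16.0, 8.0)
biased_burden = plaque_burden_percent(16.0, 7.0)
print(round(dice, 3), iou, true_burden, biased_burden)
```

With these toy numbers a 1 mm² lumen error moves the plaque burden from 50% to 56.25%, which shows why supervising geometry directly can matter more clinically than a small gain in overlap score.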

2606.18734 2026-06-18 eess.SP cs.LG 交叉投稿

Point-Cloud-Assistant Localized Statistical Channel Prediction by Tangent Gaussian Splatting

点云辅助的切线高斯溅射局部统计信道预测

Ye Xue, Yiheng Wang, Xinhua Shao, Qi Yan, Shutao Zhang, Tsung-Hui Chang

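AI summary: Proposes the point-cloud-assisted tangent Gaussian splatting (PC-TGS) framework, which fuses sparse radio measurements with dense LiDAR geometry to extrapolate the angular power spectrum to unmeasured grids, enabling efficient channel prediction in large-scale wireless digital twins.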

详情
AI中文摘要

准确、特定地点的信道信息对于优化下一代无线网络至关重要。在各种方法中,局部统计信道建模(LSCM)通过从参考信号接收功率(RSRP)测量中建模信道多径角功率谱(APS),已成为一种针对高效网络优化的最先进方法。然而,尽管其有效性,LSCM无法在绝大多数没有测量值的位置预测APS,这严重限制了其在大规模真实场景中的适用性。为了解决这一挑战,我们提出了\emph{点云辅助切线高斯溅射}(PC-TGS),这是第一个通过将稀疏无线电测量与密集的基于LiDAR的几何信息相结合,将APS\emph{外推}到未测量室外网格的框架。PC-TGS将环境散射体表示为各向异性的3D高斯分布,通过原始点云的松弛均值重新参数化进行初始化和细化。切线平面投影将每个高斯分布精确映射到局部角度域,而深度感知的电磁溅射过程聚合它们的贡献。为了确保实际部署,我们推导了用于APS bin积分的闭式高斯加权平均(GWA),并提供了可证明的误差界。在LiDAR扫描的城市规模数据集(500万个点,6310个RSRP样本)上的评估表明,与最先进的基线相比,PC-TGS在APS和RSRP预测性能上更优,并且在外推APS任务中推理时间更快。这些结果突显了PC-TGS在大规模无线数字孪生中实现几何感知和数据高效信道预测的潜力。

英文摘要

Accurate, site-specific channel information is crucial for optimizing next-generation wireless networks. Among various approaches, localized statistical channel modeling (LSCM), which models the channel multipath angular power spectrum (APS) from reference signal received power (RSRP) measurements, has emerged as a state-of-the-art method tailored for efficient network optimization. However, despite its effectiveness, LSCM cannot predict APS at the vast majority of locations where no measurements are available, which significantly restricts its applicability in large-scale, real-world scenarios. To address this challenge, we present \emph{point-cloud-assisted tangent Gaussian splatting} (PC-TGS), the first framework to \emph{extrapolate} APS to unmeasured outdoor grids by integrating sparse radio measurements with dense LiDAR-based geometry. PC-TGS represents environmental scatterers as anisotropic 3D Gaussians, initialized and refined through a relaxed-mean reparameterization of the raw point cloud. A tangent-plane projection accurately maps each Gaussian into the local angular domain, while a depth-aware electromagnetic splatting process aggregates their contributions. To ensure practical deployment, we derive a closed-form Gaussian-weighted average (GWA) for APS bin integration and provide a provable error bound. Evaluations on a LiDAR-scanned city-scale dataset (5M points, 6,310 RSRP samples) demonstrate that PC-TGS achieves better APS and RSRP prediction performance than state-of-the-art baselines and faster inference on the APS extrapolation task. These results highlight the potential of PC-TGS to enable geometry-aware and data-efficient channel prediction in large-scale wireless digital twins.

2606.18824 2026-06-18 cs.CV cs.LG 交叉投稿

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

他们将去哪里?从自我中心视频建模多模态行人机动

Yuxuan Xie, Nicolas Pugeault, Chongfeng Wei, Hubert P. H. Shum, Edmond S. L. Ho

发表机构 * School of Computing Science, University of Glasgow(格拉斯哥大学计算机科学学院) James Watt School of Engineering, University of Glasgow(格拉斯哥大学詹姆斯·瓦特工程学院) Department of Computer Science, Durham University(杜伦大学计算机科学系)

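AI summary: Proposes the MMPM framework, which uses a behavior-aware interaction module and a CVAE-based mode-aware trajectory predictor to model the crossing and non-crossing modes separately, improving multimodal trajectory prediction from an ego-centric view.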

Comments Accepted at The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2026

详情
AI中文摘要

从自我中心摄像头进行行人轨迹预测具有挑战性,因为它依赖于与车辆和场景上下文的复杂交互以及行人的意图。通过建模行人历史与未来轨迹的相关性和意图,通常会产生多模态(即多个模式)分布。现有的随机预测器通常从单一单峰分布中采样多个未来轨迹,这可能导致次优的“混合模式”轨迹,这些轨迹位于不同的运动模式之间,并在真实场景中变得不合理。在本文中,我们提出MMPM,一种模态感知框架,基于行人的过马路行为将未来轨迹分布分别建模为语义上有意义的模式。MMPM由两个模块组成:行为感知行人交互模块(PIM),通过引入注视、头部和手势来联合捕捉行人-车辆和行人-环境交互;以及基于CVAE的模态感知轨迹预测器(MTP)模块,分别对过马路和不过马路两种模式的未来轨迹分布进行建模。基于查询的解码器进一步在解码过程中强制执行模态一致性。在PIE和JAAD数据集上的实验表明,我们的方法超越了最先进的基线。我们提出的MTP是模型无关的,可以集成到现有框架如BiTrap-NP和SGNet-ED中,以进一步提高未来轨迹预测性能。我们还引入了一种数据驱动的验证协议,将预测与时空一致的真实轨迹匹配,展示了相比先前工作改进的逐帧位移误差。

英文摘要

Pedestrian trajectory prediction from an ego-centric camera is challenging since it depends on complex interactions with vehicles and scene context, as well as the intention of the pedestrian. Modelling the correlation and intent captured in the pedestrian's historical and future trajectories usually yields a multimodal (i.e., multiple-mode) distribution. Existing stochastic predictors often sample multiple futures from a single unimodal distribution, which can yield sub-optimal 'mixed-mode' trajectories that lie between distinct motion patterns and become implausible in real scenes. In this paper, we propose MMPM, a mode-aware framework that separately models future trajectory distributions into semantically meaningful modes based on the pedestrian's crossing behavior. MMPM consists of two modules: a behavior-aware Pedestrian Interaction Module (PIM) that jointly captures pedestrian-vehicle and pedestrian-environment interactions by introducing gaze, head and hand gestures, and a CVAE-based Mode-aware Trajectory Predictor (MTP) module that models the future trajectory distributions for two modes, crossing and not crossing the road, separately. A query-based decoder further enforces mode consistency during decoding. Experiments on PIE and JAAD datasets show that our method surpasses state-of-the-art baselines. Our proposed MTP is model-agnostic and can be integrated into existing frameworks such as BiTrap-NP and SGNet-ED to further improve future trajectory prediction performance. We additionally introduce a data-driven validation protocol that matches predictions to spatio-temporally consistent ground-truth trajectories, demonstrating improved frame-wise displacement errors over previous work.

2606.18876 2026-06-18 cs.CV cs.LG 交叉投稿

Test-Time Adaptation in Optical Coherence Tomography Using Trajectory-Aligned Time-Independent Flow

光学相干断层扫描中基于轨迹对齐的时间无关流的测试时自适应

Veit Hucke, Thomas Pinetz, Gregor Reiter, Ursula Schmidt-Erfurth, Hrvoje Bogunović

发表机构 * Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria(人工智能研究所、医学数据科学中心、维也纳医学大学,奥地利) Comprehensive Center for Artificial Intelligence in Medicine, Medical University of Vienna, Austria(医学人工智能综合中心、维也纳医学大学,奥地利) Department of Ophthalmology and Optometry, Medical University of Vienna, Austria(眼科与视光学部、维也纳医学大学,奥地利) Laboratory for Ophthalmic Image Analysis, Medical University of Vienna, Austria(眼科图像分析实验室、维也纳医学大学,奥地利)

AI总结 提出一种基于流匹配的测试时自适应方法,通过直方图匹配和去除时间条件,生成高质量替代图像,在AMD分割中达到最优性能。

Comments Accepted in MICCAI

AI中文摘要

光学相干断层扫描(OCT)在眼科中至关重要,但图像质量不一致,尤其是在低成本设备中,阻碍了自动化分析。为了解决这个问题,我们引入了一种基于流匹配的测试时自适应方法,从噪声输入生成高质量替代图像。通常,测试数据和训练数据之间的域差距会导致去噪过程中像素分布不匹配。我们通过将测试图像的直方图与合成参考轨迹匹配来克服这一问题,成功地将输入与预期分布对齐。此外,我们移除了网络的时间条件,以考虑真实世界噪声分布的轻微偏差。我们的方法在分割年龄相关性黄斑变性(AMD)两个阶段的关键生物标志物方面达到了最先进的性能。代码地址:https://github.com/Veit21/tta-flow。

英文摘要

Optical coherence tomography (OCT) is essential in ophthalmology, but inconsistent image quality, especially in low-cost devices, hinders automated analysis. To address this, we introduce a flow-matching-based test-time adaptation method that generates high-quality surrogate images from noisy inputs. Typically, domain gaps between test and training data cause pixel distribution mismatches during the denoising process. We overcome this by matching the test image's histogram to synthetic reference trajectories, successfully aligning the input with expected distributions. Additionally, we remove the network's time conditioning to account for slight deviations in real-world noise distributions. Our approach achieves state-of-the-art performance in segmenting critical biomarkers for two stages of Age-related Macular Degeneration (AMD). Code is available at https://github.com/Veit21/tta-flow.
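
The histogram-matching step mentioned in this abstract can be illustrated with a minimal sketch. It shows plain rank-based histogram matching on flat lists of pixel values, not the authors' alignment to synthetic reference trajectories; the function name `match_histogram` and the sample values are assumptions made for illustration only.

```python
def match_histogram(source, reference):
    """Remap source values so that their ranks take on the reference's quantiles."""
    n, m = len(source), len(reference)
    ref_sorted = sorted(reference)
    # Indices of the source pixels, ordered from darkest to brightest.
    order = sorted(range(n), key=lambda i: source[i])
    matched = [0.0] * n
    for rank, idx in enumerate(order):
        # Quantile of this pixel within the source, mapped to a reference index.
        q = rank / (n - 1) if n > 1 else 0.0
        matched[idx] = ref_sorted[round(q * (m - 1))]
    return matched


# Hypothetical pixel intensities: a noisy test image and a reference distribution.
noisy = [0.9, 0.1, 0.5, 0.7]
reference = [10, 20, 30, 40]
result = match_histogram(noisy, reference)
```

Each source pixel keeps its rank but takes a value drawn from the reference distribution, which is the sense in which an input is aligned with an expected pixel distribution.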

2606.18932 2026-06-18 astro-ph.EP astro-ph.IM cs.AI cs.LG 交叉投稿

TransitNet: A Compact Attention-Augmented Deep Learning Framework for Low-SNR Transit Blind Searches

TransitNet: 一种用于低信噪比凌星盲搜索的紧凑型注意力增强深度学习框架

Xingchen Yan, Jian Ge, Qingtian Liu, Kevin Willis, Quanquan Hu, Jiapeng Zhu

发表机构 * Shanghai Astronomical Observatory, Shanghai 200030, China(上海天文台,上海200030,中国) University of Chinese Academy of Sciences, Yanqi Lake Campus, East Road 1, Huairou, Beijing 101408, China(中国科学院大学,雁栖湖校区,东路1号,北京101408,中国) Science Talent Training Center, Gainesville, FL, 32606 USA(科学人才培训中心,佛罗里达州盖恩斯维尔,32606美国)

AI总结 提出紧凑型注意力增强深度学习框架TransitNet,用于低信噪比凌星盲搜索,在SNR 6-8范围内达到95.2%准确率,恢复率93.0%,远超TLS和BLS,且模型仅1.5 MB,推理速度提升12-25倍。

Comments 24 pages, 23 figures, 3 tables, submitted to MNRAS

AI中文摘要

受中长周期地球大小行星观测不完整性的启发,我们提出了TransitNet,一种用于低信噪比凌星盲搜索的紧凑型注意力增强深度学习框架。为了实现盲搜索条件下现实的方法开发和客观的阈值校准,我们开发了一个统一的数据集构建、基准测试和阈值选择框架。在由未见过的Kepler目标构建的恢复基准测试中,TransitNet在具有挑战性的信噪比6-8范围内达到了95.2%的准确率,并优于TLS和BLS,ROC-AUC和PR-AP值分别为0.974和0.982。在一次注入的地球大小和亚地球大小凌星恢复实验中,TransitNet实现了93.0%的恢复率,显著超过TLS(63.1%)和BLS(60.0%)。除了检测,TransitNet还提供了基于注意力的凌星窗口和中点估计。在一个独立评估集上,97.4%的注入凌星被估计的凌星窗口完全覆盖。应用于真实的Kepler观测,该模型成功恢复了所有34个选定的已确认Kepler行星,平均绝对凌星中点误差为1.24小时。该模型结合了约1.5 MB的紧凑体积和高推理效率,相对于CPU-TLS加速约12-25倍,相对于CPU-BLS加速约4-5倍。这些结果表明,TransitNet在测试范围内为低信噪比凌星盲搜索提供了一个准确、可扩展且计算高效的框架,并激励其扩展到更长周期的地球大小行星搜索。

英文摘要

Motivated by the observational incompleteness of intermediate-to-long-period Earth-size planets, we present TransitNet, a compact attention-augmented deep-learning framework for low-SNR transit blind searches. To enable realistic method development and objective threshold calibration under blind-search conditions, we develop a unified dataset construction, benchmarking, and threshold-selection framework. On recovery benchmarks constructed from unseen Kepler targets, TransitNet attains 95.2 percent accuracy in the challenging SNR range of 6 to 8 and outperforms both TLS and BLS, achieving ROC-AUC and PR-AP values of 0.974 and 0.982, respectively. In an injected Earth-size and sub-Earth-size transit recovery experiment, TransitNet achieves a recovery rate of 93.0 percent, substantially exceeding those of TLS (63.1 percent) and BLS (60.0 percent). In addition to detection, TransitNet provides attention-based estimates of transit windows and midpoints. On an independent evaluation set, 97.4 percent of injected transits are fully covered by the estimated transit window. Applied to real Kepler observations, the model successfully recovers all 34 selected confirmed Kepler planets, with a mean absolute transit midpoint error of 1.24 hours. The model combines a compact footprint of about 1.5 MB with high inference efficiency, yielding speed-ups of about 12 to 25 times relative to CPU-TLS and about 4 to 5 times relative to CPU-BLS. These results demonstrate that TransitNet provides an accurate, scalable, and computationally efficient framework for low-SNR transit blind searches in the tested regime and motivate its extension to longer-period Earth-size planet searches.

2606.19092 2026-06-18 stat.AP cs.LG 交叉投稿

Context-Aware Optimization of Follow-Up Intervals for Type 2 Diabetes Care Using Markov Decision Processes

使用马尔可夫决策过程对2型糖尿病护理随访间隔进行上下文感知优化

Parisa Lotfibagha, Kristen Miller, William J. Gallagher, Elizabeth B. Selden, Muge Capan

AI总结 提出上下文马尔可夫决策过程模型,利用电子健康记录数据为2型糖尿病患者优化个性化随访间隔,识别低风险和高风险亚群,相比固定间隔策略显著降低预期累积成本。

AI中文摘要

慢性病管理依赖于定期的医患互动来跟踪疾病进展和控制。对于2型糖尿病,当前指南对所有患者规定固定的初级保健随访间隔,忽略了临床轨迹和患者特征的异质性。本研究引入上下文马尔可夫决策过程模型,利用来自10个初级保健诊所的22,154名2型糖尿病患者的电子健康记录数据,优化亚群特定的随访间隔决策。上下文通过以下方式识别:i) 利用主成分分析对代表个体健康轨迹的变量进行降维,以及ii) 通过主成分和额外的患者层面特征使用聚类将患者分配到上下文中。出现了两个不同的上下文,分别代表低风险和高风险亚群。CMDP导出的策略建议:(i) 如果当前就诊的实验室值未测量,则在1个月内随访;(ii) 对于实验室值升高或近期住院,最多3个月;(iii) 对于持续血糖控制,6至12个月,高风险上下文患者的随访间隔更短。最优策略实现了比基准更低的预期累积成本(例如,在高共病上下文中,相对于美国糖尿病协会类似的固定间隔随访策略,CMDP策略降低了约34.8%的成本;在低共病上下文中降低了约6.4%)。这些发现展示了上下文感知方法如何为适应性随访策略提供信息,并有可能通过综合机器学习和概率决策模型来推进初级保健中的慢性病管理。

英文摘要

Chronic disease management relies on regular patient-provider interactions to follow up on disease progression and control. For Type 2 Diabetes (T2D), current guidelines prescribe fixed time intervals between subsequent primary care visits for all patients, overlooking heterogeneity in clinical trajectories and patient characteristics. This study introduces a Contextual Markov Decision Process (CMDP) model to optimize subpopulation-specific follow-up interval decisions using Electronic Health Record (EHR) data from 22,154 T2D patients across 10 primary care clinics. Contexts are identified by: i) dimensionality reduction of variables representing the individual health trajectories utilizing Principal Component Analysis, and ii) assigning patients to contexts via principal components and additional patient-level features using clustering. Two distinct contexts emerged, representing a lower- and a higher-risk subpopulation. CMDP-derived policies recommend: (i) follow-up within 1 month if the lab value at the current visit is unmeasured; (ii) up to 3 months for elevated lab values or recent hospitalizations; and (iii) 6 to 12 months for sustained glycemic control, with shorter follow-up intervals for patients in the high-risk context. The optimal policies achieved lower expected cumulative cost than benchmarks (e.g., in the higher-comorbidity context, the CMDP policy reduced cost by about 34.8%, and in the lower-comorbidity context by about 6.4%, relative to an American Diabetes Association-like fixed interval follow-up policy). These findings demonstrate how context-aware approaches can inform adaptive follow-up strategies, and have the potential to advance chronic care management in primary care by synthesizing machine learning and probabilistic decision models.
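
To make the decision model concrete, here is a toy sketch of cost-minimizing value iteration on a two-state MDP. Every number below (states, costs, transition probabilities, discount factor) is a hypothetical placeholder and not taken from the paper; a contextual MDP would, under this reading, solve one such model per patient context.

```python
GAMMA = 0.9  # hypothetical discount factor
STATES = ["controlled", "elevated"]
ACTIONS = ["short", "long"]  # short or long follow-up interval

# Hypothetical immediate costs for each (state, action) pair.
COST = {
    ("controlled", "short"): 2.0, ("controlled", "long"): 1.0,
    ("elevated", "short"): 5.0, ("elevated", "long"): 10.0,
}
# Hypothetical transition probabilities P[(state, action)][next_state].
P = {
    ("controlled", "short"): {"controlled": 0.9, "elevated": 0.1},
    ("controlled", "long"): {"controlled": 0.8, "elevated": 0.2},
    ("elevated", "short"): {"controlled": 0.6, "elevated": 0.4},
    ("elevated", "long"): {"controlled": 0.2, "elevated": 0.8},
}


def q_value(V, s, a):
    """Expected discounted cost of taking action a in state s, then following V."""
    return COST[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in P[(s, a)].items())


def value_iteration(tol=1e-9):
    """Iterate the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: min(q_value(V, s, a) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    policy = {s: min(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}
    return V, policy


values, policy = value_iteration()
```

With these placeholder numbers the resulting policy schedules the long interval for the controlled state and the short interval for the elevated state, mirroring the qualitative pattern the abstract describes.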

2606.19118 2026-06-18 cs.AI cs.LG econ.GN q-fin.EC 交叉投稿

Analysing drivers and interdependencies in European electricity markets using XAI

使用XAI分析欧洲电力市场的驱动因素与相互依赖性

Antoine Pesenti, Aidan O'Sullivan

发表机构 * UCL Energy Institute, University College London, UK(伦敦大学学院能源研究所,英国)

AI总结 结合深度神经网络与可解释人工智能(XAI)技术,利用SHAP和SSHAP框架分析39个欧洲竞价区的电价决定因素,发现可再生能源(尤其是太阳能)对电价形成具有重要作用,天然气价格仍是主导驱动因素,且互联互通显著影响价格动态。

Comments 12 pages

AI中文摘要

电力市场本质上是复杂系统,具有强非线性、高维交互以及跨区域日益增长的相互依赖性。虽然深度神经网络(DNN)在电价预测方面表现出强大的能力,但其缺乏可解释性限制了其在理解电价形成潜在驱动因素方面的实用性。本文通过将DNN模型与可解释人工智能(XAI)技术相结合,分析了39个欧洲竞价区电价的决定因素,填补了这一空白。我们采用SHAP(SHapley Additive exPlanations)量化特征贡献,并应用和扩展了SSHAP(一种聚合框架)以提高高维设置下的可解释性。分析表明,可再生能源(尤其是太阳能)在电价形成中发挥着不成比例的重要作用,尽管其在总发电量中占比较低。天然气价格仍然是跨电力市场的主导且一致的驱动因素,而互联互通显著影响价格动态,凸显了欧洲电力系统的强相互依赖性。此外,我们构建了一个合成性的全欧盟电力市场,以探索完全一体化单一价格市场的反事实情景。

英文摘要

Electricity markets are inherently complex systems characterised by strong nonlinearities, high-dimensional interactions, and increasing interdependence across regions. While deep neural networks (DNNs) have demonstrated strong predictive capabilities for electricity prices, their lack of interpretability limits their usefulness for understanding the underlying drivers of price formation. This paper addresses this gap by combining DNN models with explainable artificial intelligence (XAI) techniques to analyse the determinants of electricity prices across 39 European bidding zones. We employ SHAP (SHapley Additive exPlanations) to quantify feature contributions, and we apply and extend SSHAP, an aggregation framework, to improve interpretability in high-dimensional settings. The analysis identifies that renewable energy sources, particularly solar, play a disproportionately important role in price formation despite their lower share in total power generation. Gas prices remain a dominant and consistent driver across electricity markets, while interconnections significantly shape price dynamics, highlighting the strong interdependence of European electricity systems. In addition, a synthetic EU-wide electricity market is constructed to explore the counterfactual scenario of a fully integrated market with a single price.
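
The Shapley attribution idea behind SHAP can be sketched with an exact computation on a tiny model. This is not the SHAP library and not the paper's DNN or SSHAP aggregation: the price function `toy_price`, its coefficients, and the single-baseline convention are illustrative assumptions, and exact enumeration over all feature orderings is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value
            # and credit the resulting change in the output to feature i.
            current[i] = x[i]
            new = model(current)
            phi[i] += new - prev
            prev = new
    return [p / len(orders) for p in phi]


def toy_price(features):
    """Hypothetical price model: gas raises the price, solar lowers it."""
    gas, solar, imports = features
    return 2.0 * gas - 1.5 * solar + 0.5 * gas * imports


x = [30.0, 10.0, 4.0]
baseline = [20.0, 0.0, 0.0]
phi = shapley_values(toy_price, x, baseline)
```

The attributions sum to the difference between the model output at `x` and at the baseline, which is the efficiency property that makes Shapley values usable as additive explanations.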

2606.19149 2026-06-18 cs.CR cs.LG 交叉投稿

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

OpenAnt:通过代码分解、对抗性验证和动态测试实现LLM驱动的漏洞发现

Nahum Korda, Gadi Evron

AI总结 提出OpenAnt系统,结合静态分析与LLM推理,通过代码分解、对抗性验证和动态测试三阶段流水线,在降低误报率的同时发现未知漏洞。

AI中文摘要

在大型代码库中自动发现漏洞仍然具有挑战性:传统静态分析误报率高,而模糊测试等动态方法需要大量基础设施且通常针对狭窄的漏洞类别。大型语言模型(LLM)的最新进展使得对程序行为进行语义推理成为可能,但将LLM应用于仓库级安全分析会引入上下文管理、成本和验证方面的挑战。我们提出了OpenAnt,一个开源漏洞发现系统,它在多阶段流水线中集成了静态程序分析与基于LLM的推理。OpenAnt引入了三种关键技术。首先,代码库被分解为自包含的分析单元,并通过从外部入口点的可达性进行过滤,将分析面减少高达97%,同时保留与攻击相关的代码。其次,候选漏洞通过受限攻击者模拟进行对抗性验证,其中模型在现实攻击者能力下评估可利用性。第三,通过动态验证确认发现结果,其中自动生成利用环境,在沙箱容器中执行,并在使用后丢弃。在包括OpenSSL、WordPress和Flowise在内的广泛使用的开源项目上的评估表明,这种架构可以识别先前未知的漏洞,同时保持可管理的分析成本并大幅减少误报。我们的结果表明,结合语义推理与利用验证的闭环漏洞发现流水线,为可扩展的自动化安全分析提供了一条实用路径。OpenAnt已在Apache 2.0许可下开源,网址为https://github.com/knostic/OpenAnt。

英文摘要

Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.
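
The reachability filter described as the first technique can be sketched as a breadth-first search over a call graph. This is an illustrative assumption about how such a filter might look, not OpenAnt's actual implementation; the call graph, the function names in it, and `reachable_units` are all hypothetical.

```python
from collections import deque


def reachable_units(call_graph, entry_points):
    """Return every unit reachable from the external entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        unit = queue.popleft()
        for callee in call_graph.get(unit, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph: unit name -> units it calls.
call_graph = {
    "http_handler": ["parse_request"],
    "parse_request": ["decode_body"],
    "decode_body": [],
    "cli_debug_tool": ["dump_state"],
    "dump_state": [],
    "legacy_helper": ["decode_body"],
}
kept = reachable_units(call_graph, ["http_handler"])
dropped = set(call_graph) - kept
```

Units that no external entry point can reach (here the debug tool and the unused helper) drop out of the analysis surface, while everything on an attacker-reachable path is kept.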

2606.19186 2026-06-18 cs.RO cs.LG 交叉投稿

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

学习标注延迟和误报AEB事件:针对极端类别不平衡和非对称标签噪声的实用系统

Mengxiang Hao, Xin Jiang, Xinghao Huang, Wenliang Su, Zhiteng Wang, Junjie Rao, Xiaotian Yang, Wei Liao, Chengyu Han, Gen Liang, Yulun Song, Zhitao Xu, Xianpeng Lang

发表机构 * Li Auto(理想汽车)

AI总结 提出首个自动化AEB标注框架,通过特定数据增强和噪声抑制技术,解决极端类别不平衡和非对称标签噪声问题,将延迟/误报触发召回率提升80%,人工工作量减少50%。

Comments 8 pages, 5 figures, accepted by IEEE International Conference on Robotics and Automation (ICRA)

Journal ref
2026 IEEE International Conference on Robotics and Automation (ICRA)
AI中文摘要

自主紧急制动(AEB)优化依赖于准确标注的真实世界触发事件,特别是揭示系统缺陷的罕见但关键的延迟和误报AEB触发事件。然而,这些少数样本在每天数千次触发事件中占比不到5%,使得大规模人工标注成本过高。我们提出了首个自动化AEB标注框架来解决这一问题。在开发过程中,我们识别出两个严重损害延迟/误报触发标注准确性的基本挑战:(1)极端类别不平衡,其中延迟/误报触发被真实触发淹没;(2)非对称标签噪声,其中误标注的多数样本(真实触发)抑制了少数样本(延迟/误报触发)的学习。为克服这些挑战,我们提出两项关键创新:(1)特定数据增强,通过操纵焦点目标属性、移植自车动态和掩蔽非焦点代理来合成逼真样本;(2)噪声抑制,使用稳定硬度估计和探针引导的自适应阈值来清理误标注的真实触发样本。关键的是,我们将模型部署为具有全栈架构的实用标注系统,从每天数千个AEB事件中高效识别关键的延迟/误报触发。生产结果表明,延迟/误报触发的召回率提高了80%,人工工作量减少了50%。除了直接收益,该系统通过积累高质量标注实现持续自我改进,为车载AEB系统优化奠定了必要的数据基础。

英文摘要

Autonomous Emergency Braking (AEB) optimization relies on accurately annotated real-world trigger events, particularly rare but critical delayed and false AEB triggers that expose system deficiencies. However, these minority samples comprise less than 5% of thousands of daily triggers, making manual annotation prohibitively expensive at scale. We present the first automated AEB annotation framework to address this problem. During development, we identified two fundamental challenges that severely impair delayed/false trigger annotation accuracy: (1) extreme class imbalance, where delayed/false triggers are overwhelmed by true triggers; (2) asymmetric label noise, where mislabeled majority samples (true triggers) suppress the learning of minority samples (delayed/false triggers). To overcome these challenges, we propose two key innovations: (1) specific data augmentation that synthesizes realistic samples by manipulating focal target attributes, transplanting ego-vehicle dynamics, and masking non-focal agents; (2) noise suppression using stable hardness estimation and a probe-guided adaptive threshold to clean mislabeled true trigger samples. Crucially, we deploy our model as a practical annotation system with full-stack architecture, efficiently identifying critical delayed/false triggers from thousands of daily AEB events. Production results demonstrate 80% improvement in recall of delayed/false triggers and 50% reduction in manual workload. Beyond immediate gains, the system enables continuous self-improvement through accumulated high-quality annotations, establishing a necessary data foundation for on-vehicle AEB system optimization.

2606.19251 2026-06-18 physics.comp-ph cs.LG physics.flu-dyn 交叉投稿

Acceleration of an algebraic multigrid pressure solver using graph neural networks

使用图神经网络加速代数多重网格压力求解器

Eric Chillón, Artur K. Lidtke, Nguyen Anh Khoa Doan, Bernat Font

发表机构 * Faculty of Mechanical Engineering, Delft University of Technology, The Netherlands(荷兰代尔夫特理工大学机械工程学院) Maritime Research Institute Netherlands, The Netherlands(荷兰海事研究院) Department of Aeronautics, Imperial College London, United Kingdom(英国伦敦帝国理工学院航空系)

AI总结 提出一种基于图卷积同构网络的代数多重网格平滑器,通过预测最优多项式系数构造稀疏伪逆算子,减少V-cycle迭代次数,在非结构化网格上实现4%-37%的加速,并泛化至训练时未见的大规模网格。

Comments 23 pages, 11 figures

AI中文摘要

求解压力-泊松方程仍然是非结构化不可压缩流求解器的主要计算瓶颈,这主要是由于传统线性求解器对网格不规则性固有的敏感性。本文引入了一种数据驱动的代数多重网格(AMG)平滑器,该平滑器使用改进的图卷积同构网络(GCIN)。图神经网络预测最优多项式系数,以在不同网格拓扑上构造稀疏伪逆算子。优化系数以减少每次V-cycle迭代后的残差。通过直接从稀疏系数矩阵捕获系统的代数结构,所提出的方法在适应非结构化网格中的局部各向异性的同时,保持了求解器的线性性。我们的框架通过减少达到给定容差所需的V-cycle次数,并在不同基准测试中实现4%到37%的墙钟加速,展示了显著的性能提升。值得注意的是,该模型在比训练时所见大128倍的网格上保持效率,并在未见过的工业相关问题上(如AirfRANS数据集)加速求解器收敛,表现出鲁棒的泛化能力。

英文摘要

Solving the pressure-Poisson equation remains the main computational bottleneck in incompressible unstructured flow solvers, primarily due to the inherent sensitivity of traditional linear solvers to mesh irregularities. This work introduces a data-driven algebraic multigrid (AMG) smoother that uses a modified graph convolutional isomorphism network (GCIN). The graph neural network predicts optimal polynomial coefficients to construct a sparse pseudo-inverse operator across diverse grid topologies. The coefficients are optimized to reduce the residual after each V-cycle iteration. By directly capturing the algebraic structure of the system from the sparse coefficient matrix, the proposed method maintains the solver's linearity while adapting to local anisotropies in unstructured grids. Our framework demonstrates significant performance gains by reducing the number of V-cycles required for a given tolerance and delivering wall-clock speedups from 4% to 37% across diverse benchmarks. Notably, the model exhibits robust generalization by maintaining efficiency on meshes up to 128 times larger than those seen in training, and by accelerating the solver's convergence on unseen industry-relevant problems such as the AirfRANS dataset.
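
A polynomial smoother of the general kind this abstract refers to can be sketched as x <- x + p(A) r, where r = b - A x and p(A) = c_0 I + c_1 A + ... approximates the inverse of A. In the paper the coefficients are predicted by a graph neural network; in this sketch they are fixed by hand, and the 1D Poisson matrix, the coefficient values, and all function names are illustrative assumptions.

```python
def poisson_matvec(v):
    """Apply the 1D Poisson matrix tridiag(-1, 2, -1) to a vector."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def residual(x, b, matvec):
    """Residual r = b - A x."""
    return [bi - ai for bi, ai in zip(b, matvec(x))]


def polynomial_smooth(x, b, coeffs, matvec):
    """One smoothing step x <- x + sum_k coeffs[k] * A^k r."""
    power = residual(x, b, matvec)  # holds A^k r, starting at k = 0
    correction = [0.0] * len(x)
    for c in coeffs:
        correction = [ci + c * pi for ci, pi in zip(correction, power)]
        power = matvec(power)
    return [xi + ci for xi, ci in zip(x, correction)]


def norm(v):
    return sum(vi * vi for vi in v) ** 0.5


b = [1.0] * 8
x0 = [0.0] * 8
# Hand-picked coefficients: 1 - 0.5*t + 0.0625*t**2 == (1 - 0.25*t)**2,
# so one step equals two damped Richardson sweeps with weight 0.25.
coeffs = [0.5, -0.0625]
x1 = polynomial_smooth(x0, b, coeffs, poisson_matvec)
before = norm(residual(x0, b, poisson_matvec))
after = norm(residual(x1, b, poisson_matvec))
```

Because the update is a fixed polynomial in A applied to the residual, the smoother stays linear, which matches the abstract's point about preserving the solver's linearity.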

2606.19253 2026-06-18 cs.CV cs.AI cs.LG cs.RO 交叉投稿

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

OneCanvas: 通过全景重投影实现3D场景理解

Bartłomiej Baranowski, Dave Zhenyu Chen, Matthias Nießner

发表机构 * Technical University of Munich(慕尼黑工业大学) Huawei(华为)

AI总结 提出OneCanvas方法,将多视图补丁特征聚合到全景画布上,利用深度和相机位姿进行重投影,无需复杂几何编码器或大量训练,在SQA3D等基准上达到最先进精度。

Comments Project page: https://baranowskibrt.github.io/onecanvas/

AI中文摘要

现有的视觉语言模型(VLM)中的3D场景理解方法要么依赖复杂的、模型特定的几何编码器,要么为了追求空间推理而需要大量的训练预算。相反,OneCanvas将所有视图的补丁特征聚合到一个单一的等距柱状全景画布上。具体来说,每个补丁利用其深度和相机位姿被反投影到3D世界坐标,然后根据从画布原点看到的该点的连续经度和纬度放置在画布上,无需对重叠视图进行光栅化或聚合。补丁的度量坐标的3D位置嵌入被添加到其特征中,从而恢复了将世界位置压缩到角度画布坐标时丢失的深度。因此,来自所有帧的补丁共享一个空间坐标系,无需融合或对主干网络进行重大架构修改。预训练的VLM将此表示视为普通图像。由于画布可以以任何感兴趣的姿态为中心,相同的表示直接支持从特定视角进行情境推理,这是机器人和具身AI中的常见需求。得益于这种表示,我们还可以引入空间预训练课程:通过程序化地将从真实图像中提取的对象的补丁特征放置在原本空白的画布上的选定3D世界位置,我们生成了涵盖广泛空间推理任务的即时监督,并控制答案分布以减少空间推理捷径。OneCanvas在SQA3D和VSI-Bench上达到了最先进的准确率,并在SPBench上泛化到分布外数据,其训练计算量比最强竞争方法少一个数量级。

英文摘要

Existing approaches to 3D scene understanding in Vision-Language Models (VLMs) either rely on complex, model-specific geometry encoders or large training budgets in pursuit of spatial reasoning. Instead, OneCanvas aggregates patch features from all views onto a single equirectangular panoramic canvas. Namely, each patch is unprojected to a 3D world coordinate using its depth and camera pose, then placed on the canvas at the continuous longitude and latitude of that point as seen from the canvas origin, with no rasterization or aggregation across overlapping views. A 3D position embedding of the patch's metric coordinates is added to its feature, restoring the depth lost when collapsing the world position to an angular canvas coordinate. Patches from all frames thus share one spatial coordinate system with no fusion or major architectural modifications of the backbone. The pretrained VLM consumes this representation as if it were an ordinary image. Because the canvas can be centered on any pose of interest, the same representation directly supports situated reasoning from a specific viewpoint, a common requirement in robotics and embodied AI. Thanks to this representation, we can also introduce a spatial pretraining curriculum: by procedurally placing patch features of objects, drawn from real images, at chosen 3D world positions on an otherwise empty canvas, we generate on-the-fly supervision spanning a broad range of spatial reasoning tasks, with answer distributions controlled to reduce spatial reasoning shortcuts. OneCanvas achieves state-of-the-art accuracy on SQA3D and VSI-Bench, and generalizes to out-of-distribution data on SPBench, using an order of magnitude less training compute than the strongest competing methods.
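
The patch placement described above (unproject with depth and pose, then read off longitude and latitude from the canvas origin) can be sketched as follows. The pinhole intrinsics, the camera-to-world pose convention, the axis convention (x right, y down, z forward), and all function names are assumptions for illustration and may differ from the authors' implementation.

```python
import math


def unproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with metric depth -> 3D point in camera coordinates."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def to_world(p_cam, rotation, translation):
    """Apply an assumed camera-to-world pose (3x3 rotation, 3-vector translation)."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def canvas_coords(p_world, origin, width, height):
    """Continuous equirectangular (column, row) of a world point seen from origin."""
    dx, dy, dz = (p - o for p, o in zip(p_world, origin))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    lon = math.atan2(dx, dz)   # longitude in [-pi, pi], zero straight ahead
    lat = math.asin(-dy / r)   # latitude in [-pi/2, pi/2], positive upward
    col = (lon / (2.0 * math.pi) + 0.5) * width
    row = (0.5 - lat / math.pi) * height
    return col, row


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# A patch at the image center, 5 m away, seen by a camera at the canvas origin.
p_cam = unproject(320.0, 240.0, 5.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
p_world = to_world(p_cam, identity, (0.0, 0.0, 0.0))
col, row = canvas_coords(p_world, (0.0, 0.0, 0.0), width=1024, height=512)
```

The radial distance `r` is discarded by the angular mapping, which is why the abstract adds a 3D position embedding to each patch feature to restore depth.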

2606.19302 2026-06-18 physics.ao-ph cs.LG 交叉投稿

Optimal scenario design for climate emulation

气候模拟的最优情景设计

Christopher B. Womack, Shahine Bouabid, Andrei Sokolov, Popat Salunke, Glenn Flierl, Sebastian D. Eastham, Noelle E. Selin

发表机构 * Department of Aeronautics and Astronautics, Massachusetts Institute of Technology(航空与航天系,麻省理工学院) Center for Sustainability Science and Strategy, Massachusetts Institute of Technology(可持续科学与战略中心,麻省理工学院) Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology(地球、大气与行星科学系,麻省理工学院) Brahmal Vasudevan Institute for Sustainable Aviation, Department of Aeronautics, Imperial College London(可持续航空研究所,帝国理工学院伦敦校区) Institute for Data, Systems, and Society, Massachusetts Institute of Technology(数据、系统与社会研究所,麻省理工学院)

AI总结 针对气候模拟器泛化能力受限的问题,提出通过可微简单气候模型优化训练数据情景,使小数据集训练的模拟器性能优于标准情景集。

AI中文摘要

随着深度学习在物理系统中的普及,改进泛化性的努力主要集中在设计嵌入物理约束的架构上。然而,对于机器学习替代气候模型(模拟器),我们表明现有情景中用于生成训练数据的低结构多样性限制了预测能力。在此,我们研究是否可以优化训练数据集本身以提高泛化性。我们引入一种方法创建数据集,使模拟器能够泛化到训练数据中未出现的新结构情景。我们使用可微简单气候模型(SCM)计算模拟器损失对训练数据扰动的敏感性,迭代更新训练数据以最大化模拟器技能。对于SCM,以这种方式优化的一个情景训练出的模拟器优于在六个标准ScenarioMIP路径上训练的模拟器。尽管训练数据集更小,但我们实现了更高的预测技能,发现我们的模拟器成功隔离了不同气候强迫因子(如温室气体与气溶胶)的独特物理行为,而无需单强迫运行。然后我们证明,使用SCM优化的情景驱动中等复杂度气候模型时,产生的训练数据集比在ScenarioMIP输出上训练得到更熟练的模拟器。我们的结果表明,在运行全尺度气候模型的计算受限环境中,生成少量动态丰富的情景比扩展传统排放路径集对模拟和表征系统响应具有更大的边际价值。

英文摘要

As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, for machine-learning surrogate climate models (emulators), we show that the low structural diversity in existing scenarios commonly used to generate training data places a ceiling on predictive skill. Here, we examine whether training datasets themselves can be optimized to improve generalization. We introduce a method to create datasets that produce emulators capable of generalizing to new, structurally different scenarios absent from the training data. We use a differentiable Simple Climate Model (SCM) to calculate the sensitivity of emulator loss to perturbations in the training data, iteratively updating the training data to maximize emulator skill. For an SCM, training on one scenario optimized in this fashion outperforms an emulator trained on six standard ScenarioMIP pathways. We achieve this higher predictive skill despite training on a smaller dataset, finding that our emulator successfully isolates distinct physical behaviors of different climate forcing agents (e.g., greenhouse gases vs. aerosols) without single-forcing runs. We then demonstrate that scenarios optimized using an SCM, when used to drive an intermediate-complexity climate model, produce a training dataset that yields a more skillful emulator than training on ScenarioMIP outputs. Our results suggest that, in the compute-constrained environment of running full-scale climate models, generating a small number of dynamically rich scenarios provides greater marginal value for emulation and characterizing system responses than expanding the suite of traditional emissions pathways.

2606.19329 2026-06-18 astro-ph.IM cs.LG 交叉投稿

The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning

钱德拉-盖亚对应体星表:利用机器学习解决钱德拉源星表中X射线源与盖亚源的多重匹配歧义

V. Samuel Pérez-Díaz, Vinay L. Kashyap, Joshua D. Ingram, David Fouhey, Juan Rafael Martínez-Galarza, Pavlos Protopapas, Jeremy J. Drake, Dong-Woo Kim, Cecilia Garraffo

发表机构 * Center for Astrophysics Harvard & Smithsonian, 60 Garden St, Cambridge MA 02138, USA Harvard John A. Paulson School of Engineering Universidad del Rosario, School of Engineering, Science The NSF AI Institute for Artificial Intelligence New York University, Courant Institute, 60 5th Avenue, New York NY, USA Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 New College of Florida, 5800 Bayshore Road, Sarasota, FL 34243, USA Astrophysics Laboratory, 3251 Hanover St, Palo Alto, CA 94304, USA

AI总结 提出结合源属性(星等、颜色、距离)的机器学习框架,解决钱德拉源星表与盖亚源星表的交叉匹配歧义,为约11.3万个X射线源找到对应体,并识别约2万个假匹配。

Comments Accepted to The Astrophysical Journal. Website: https://www.samuelperezdi.com/chandragaia/

AI中文摘要

我们提出了一个框架,用于将钱德拉源星表(CSC v2.1)中的源与盖亚数据发布3中的光学源进行交叉匹配。与纯空间方法不同,我们使用源属性(如星等、颜色和距离)来识别真实对应体、检测偶然重合,并在存在多个合理候选者时解决歧义。我们使用NWAY(一种考虑位置误差和源密度的贝叶斯交叉匹配框架)定义高置信度匹配的训练集。我们在两个星表的多种特征上训练梯度提升分类器(LightGBM)。在约25.4万个独特X射线源中,我们为约11.3万个源找到了对应体,其中约7000个源存在多个合理对应体。对于约2万个基于分离的交叉匹配能找到匹配的源,我们未找到对应体,并将其中的一半归因于偶然重合。我们在钱德拉猎户座超深项目(COUP)上验证了该流程,机器学习匹配在不使用任何位置信息的情况下再现了NWAY交叉匹配的95%。我们发布了约11.3万个钱德拉-盖亚对应体的星表,以及约7000个替代匹配和约2万个歧义NWAY关联,以支持未来对钱德拉和盖亚均可探测到的源进行种群研究。我们讨论了局限性,并提供了该框架的泛化版本,适用于其他交叉匹配场景。

英文摘要

We present a framework to cross-match sources from the Chandra Source Catalog (CSC v2.1) with optical sources from Gaia Data Release 3. Unlike purely spatial approaches, we use source properties such as magnitudes, colors, and distances to identify true counterparts, detect chance coincidences, and resolve ambiguities when multiple plausible candidates exist. We define a training set of high-confidence matches using NWAY, a Bayesian cross-matching framework that accounts for positional errors and source densities. We train a gradient-boosted classifier (LightGBM) on a variety of features from both catalogs. Of the ~$254$k unique X-ray sources, we find counterparts for ~$113$k sources, of which plausible multiple counterparts are found for ~$7$k. We find no counterparts for ~$20$k sources for which separation-based cross-matching does find a match, and attribute half of these to chance coincidences. We validate the pipeline on the Chandra Orion Ultradeep Project (COUP), where the machine-learning matches reproduce 95% of NWAY cross-matches without using any positional information. We release a catalog of the ~$113$k Chandra-Gaia counterparts, together with ~$7$k alternative matches and ~$20$k ambiguous NWAY associations, supporting future population studies of sources detectable by both Chandra and Gaia. We discuss limitations and provide a generalization of the framework that is applicable in other cross-matching scenarios.

2509.24725 2026-06-18 cs.LG cs.AI 版本更新

Q-Net: Queue Length Estimation via Kalman-based Neural Networks

Q-Net:基于卡尔曼神经网络的队列长度估计

Ting Gao, Elvin Isufi, Winnie Daamen, Erik-Sander Smits, Serge Hoogendoorn

发表机构 * University of Amsterdam(阿姆斯特丹大学) Delft University of Technology(代尔夫特理工大学)

AI总结 本文提出Q-Net框架,通过结合卡尔曼滤波与神经网络,解决信号交叉口队列长度估计中的数据融合问题,提升空间转移性和实时性,实现无需昂贵传感设备的准确队列估计。

AI中文摘要

估计信号交叉口的队列长度一直是交通管理中的长期挑战。尽管有两类隐私保护的数据源可用,车辆流的部分可观测性仍使这一任务变得复杂:(i) 接近停止线的环形检测器提供的车辆计数汇总数据,以及 (ii) 提供路段平均速度测量的汇总浮动汽车数据 (aFCD)。然而,如何将这些具有不同空间和时间分辨率的数据源整合用于队列长度估计仍不清楚。为此,本文提出Q-Net:一种基于状态空间形式的队列估计框架。该设计解决了队列建模中的关键挑战,如违反交通守恒假设。Q-Net遵循卡尔曼预测-更新结构,并在状态演变和测量模型中保持物理可解释性。Q-Net使用AI增强的卡尔曼滤波器从数据中学习时间变化的增益动态。该框架支持实时实现,并通过将aFCD测量分组为固定大小的局部组来提高空间转移性,使可学习参数的数量与路段长度无关。在荷兰鹿特丹城市主干道的评估显示,Q-Net优于基线方法,能够准确追踪队列的形成和消散,并缓解aFCD引起的延迟。通过结合数据效率、可解释性、实时适用性和空间转移性,Q-Net在无需昂贵的传感基础设施(如摄像头或雷达)的情况下实现了准确的队列长度估计。

英文摘要

Estimating queue lengths at signalized intersections is a long-standing challenge in traffic management. Partial observability of vehicle flows complicates this task despite the availability of two privacy-preserving data sources: (i) aggregated vehicle counts from loop detectors near stop lines, and (ii) aggregated floating car data (aFCD) that provide segment-wise average speed measurements. However, how to integrate these sources with differing spatial and temporal resolutions for queue length estimation is rather unclear. Addressing this question, we present Q-Net: a queue estimation framework built upon a state-space formulation. This design addresses key challenges in queue modeling, such as violations of traffic conservation assumptions. Q-Net follows the Kalman predict-update structure and maintains physical interpretability in both the state evolution and measurement models. Q-Net uses an AI-augmented Kalman filter to learn time-varying gain dynamics from data. The framework supports real-time implementation and improves spatial transferability by grouping aFCD measurements into fixed-size local groups, making the number of learnable parameters independent of section length. Evaluations on urban main roads in Rotterdam, the Netherlands, show that Q-Net outperforms baseline methods, tracks queue formation and dissipation accurately, and mitigates aFCD-induced delays. By combining data efficiency, interpretability, real-time applicability, and spatial transferability, Q-Net makes accurate queue length estimation possible without costly sensing infrastructure like cameras or radar.
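
The Kalman predict-update structure that Q-Net follows can be sketched in scalar form. This is the classical filter with an analytically computed gain, whereas the abstract describes learning the gain dynamics from data; the state model (queue length changed by net inflow), the noise variances, and the sample numbers are illustrative assumptions.

```python
def kalman_step(x, P, u, z, Q, R):
    """One predict-update cycle for a scalar queue-length state.

    x, P: previous estimate and its variance
    u:    net inflow (arrivals minus departures) over the interval
    z:    noisy measurement of the queue length
    Q, R: process and measurement noise variances
    """
    # Predict: conservation-style state evolution.
    x_pred = x + u
    P_pred = P + Q
    # Update: blend the prediction and the measurement using the Kalman gain.
    K = P_pred / (P_pred + R)
    x_new = x_pred + K * (z - x_pred)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new, K


# Hypothetical numbers: 5 queued vehicles, 2 net arrivals, measurement says 9.
x_new, P_new, K = kalman_step(x=5.0, P=1.0, u=2.0, z=9.0, Q=1.0, R=2.0)
```

In this example the gain is 0.5, so the updated estimate lands halfway between the prediction of 7 vehicles and the measurement of 9.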

2307.05623 2026-06-18 cs.LG cs.AI 版本更新

A DeepLearning Framework for Dynamic Estimation of Origin-Destination Sequence

一种用于动态估计起点-终点序列的深度学习框架

Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng

发表机构 * School of Data Science University of Science(数据科学学院 中国科学技术大学) Yangtze River Delta Information Intelligence Innovation Research Institute, China(长江三角洲信息智能创新研究院)

AI总结 针对OD矩阵估计中的欠定性和滞后性问题,提出集成深度学习方法,利用神经网络推断OD序列结构并引导数值优化,实验证明能有效提供时空约束。

Comments 11 pages,25 figures

AI中文摘要

OD矩阵估计是交通领域的一个关键问题。主要方法利用交通传感器测量信息(如交通计数)来估计由OD矩阵表示的交通需求。该问题分为两类:静态OD矩阵估计和动态OD矩阵序列(简称OD序列)估计。上述两类都面临由大量待估参数和不足的约束信息引起的欠定性问题。此外,OD序列估计还面临滞后挑战:由于拥堵等不同交通状况,同一车辆在相同观测时段内会出现在不同路段,导致相同的OD需求对应不同的行程。为此,本文提出一种集成方法,利用深度学习方法推断OD序列的结构,并利用结构约束指导传统数值优化。实验表明,神经网络能有效推断OD序列的结构,并为数值优化提供实用的约束以获得更好的结果。此外,实验表明,所提供的结构信息不仅包含对OD矩阵空间结构的约束,还提供了对OD序列时间结构的约束,很好地解决了滞后问题的影响。

英文摘要

OD matrix estimation is a critical problem in the transportation domain. The principal method uses information measured by traffic sensors, such as traffic counts, to estimate the traffic demand represented by the OD matrix. The problem is divided into two categories: static OD matrix estimation and dynamic OD matrix sequence (OD sequence for short) estimation. Both face the underdetermination problem caused by the large number of parameters to estimate and insufficient constraint information. In addition, OD sequence estimation also faces the lag challenge: due to different traffic conditions such as congestion, identical vehicles will appear on different road sections during the same observation period, so that identical OD demands correspond to different trips. To this end, this paper proposes an integrated method, which uses deep learning methods to infer the structure of the OD sequence and uses structural constraints to guide traditional numerical optimization. Our experiments show that the neural network (NN) can effectively infer the structure of the OD sequence and provide practical constraints for numerical optimization to obtain better results. Moreover, the experiments show that the provided structural information contains constraints not only on the spatial structure of OD matrices but also on the temporal structure of the OD sequence, which mitigates the effect of the lag problem well.

2506.13196 2026-06-18 cs.LG 版本更新

KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

KEPLA:一种用于精确预测蛋白质-配体结合亲和力的知识增强深度学习框架

Han Liu, Keyan Ding, Peilin Chen, Yinwei Wei, Liqiang Nie, Dapeng Wu, Shiqi Wang

发表机构 * Department of Computer Science, City University of Hong Kong(香港城市大学计算机科学系) ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University(浙江大学杭州国际科技创新中心) School of Software, Shandong University(山东大学软件学院) College of Informatics, Harbin Institute of Technology (Shenzhen)(哈尔滨工业大学(深圳)计算机学院)

AI总结 提出KEPLA框架,通过整合基因本体和配体属性的先验知识,利用全局表示对齐与局部交叉注意力,提升蛋白质-配体结合亲和力预测的准确性,在多个基准数据集上超越现有方法。

详情
AI中文摘要

准确预测蛋白质-配体结合亲和力对药物发现至关重要。尽管最近的深度学习方法已展现出有希望的结果,但它们通常仅依赖蛋白质和配体的结构特征,忽略了与结合亲和力相关的宝贵生化知识。为解决这一局限,我们提出KEPLA,一种新颖的深度学习框架,明确整合来自基因本体和配体属性的先验知识以增强预测性能。KEPLA以蛋白质序列和配体分子图作为输入,并优化两个互补目标:(1)将全局表示与知识图谱关系对齐,以捕获领域特定的生化见解;(2)利用局部表示之间的交叉注意力构建细粒度联合嵌入用于预测。在两个基准数据集上的域内和跨域场景实验表明,KEPLA始终优于最先进的基线方法。此外,基于知识图谱关系和交叉注意力图的可解释性分析为潜在的预测机制提供了有价值的见解。

英文摘要

Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking their valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.

2508.09191 2026-06-18 cs.LG cs.AI 版本更新

From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization

从数值到标记:一种基于符号离散化的LLM驱动上下文感知时间序列预测框架

Xiaoyu Tao, Shilong Zhang, Mingyue Cheng, Daoyu Wang, Tingyue Pan, Bokai Pan, Changqing Zhang, Shijin Wang

发表机构 * State Key Laboratory of Cognitive Intelligence(认知智能国家重点实验室) University of Science and Technology of China(中国科学技术大学) College of Intelligence and Computing(智能科学与计算学院) iFLYTEK Research(iFLYTEK研究院)

AI总结 提出TokenCast框架,利用大语言模型通过符号离散化将连续时间序列转化为标记,与上下文文本对齐,实现上下文感知的预测,实验证明有效。

详情
AI中文摘要

时间序列预测在能源、医疗和金融等关键应用领域支持决策中起着重要作用。尽管近期取得了进展,但由于将历史数值序列与通常包含非结构化文本数据的上下文特征整合的挑战,预测精度仍然有限。为了解决这一挑战,我们提出了TokenCast,一个由大语言模型(LLM)驱动的框架,利用基于语言的符号表示作为上下文感知时间序列预测的统一中介。具体来说,TokenCast采用离散分词器将连续数值序列转化为时间标记,实现与基于语言输入的结构对齐。为了有效弥合模态之间的语义差距,时间和上下文标记通过预训练的LLM嵌入到共享表示空间中,并通过生成目标进一步优化。基于这一统一语义空间,对齐的LLM随后以监督方式进行微调,以预测未来的时间标记,然后解码回原始数值空间。在真实世界数据集上的大量实验证明了我们框架的有效性,并突显了其作为上下文感知时间序列预测生成框架的潜力。代码可在 https://github.com/Xiaoyu-Tao/TokenCast 获取。

英文摘要

Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. Despite recent advances, forecasting accuracy remains limited due to the challenge of integrating historical numerical sequences with contextual features, which often comprise unstructured textual data. To address this challenge, we propose TokenCast, a large language model (LLM) driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs. To effectively bridge the semantic gap between modalities, both temporal and contextual tokens are embedded into a shared representation space via a pre-trained LLM, further optimized with generative objectives. Building upon this unified semantic space, the aligned LLM is subsequently fine-tuned in a supervised manner to predict future temporal tokens, which are then decoded back into the original numerical space. Extensive experiments on real-world datasets demonstrate the effectiveness of our framework and highlight its potential as a generative framework for context-aware time series forecasting. The code is available at https://github.com/Xiaoyu-Tao/TokenCast.
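
The tokenization step described above (continuous values in, discrete temporal tokens out, then decoding back to numbers) can be illustrated with a deliberately simple stand-in. The sketch below uses equal-width binning; this is an assumption made for illustration only, since the abstract does not specify how TokenCast's discrete tokenizer works, and the names `fit_bins`, `encode`, and `decode` are hypothetical.

```python
from typing import List, Tuple


def fit_bins(series: List[float], num_bins: int) -> Tuple[float, float]:
    """Return (low, width) of equal-width bins covering the series."""
    low, high = min(series), max(series)
    # Fall back to width 1.0 when the series is constant (zero range).
    width = (high - low) / num_bins or 1.0
    return low, width


def encode(series: List[float], low: float, width: float, num_bins: int) -> List[int]:
    """Map each value to a token id in [0, num_bins - 1], clamping outliers."""
    return [max(0, min(int((x - low) / width), num_bins - 1)) for x in series]


def decode(tokens: List[int], low: float, width: float) -> List[float]:
    """Map each token id back to the centre of its bin."""
    return [low + (t + 0.5) * width for t in tokens]
```

With in-range inputs, the round trip loses at most half a bin width per value, which is the quantization error any such discrete representation has to trade off against vocabulary size.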

2511.05221 2026-06-18 cs.LG q-bio.NC 版本更新

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

ActiTect:通过标准化体动记录进行REM睡眠行为障碍筛查的通用机器学习流程

David Bertram, Anja Ophey, Sinah Röttgen, Konstantin Kufer, Gereon R. Fink, Elke Kalbe, Clint Hansen, Walter Maetzler, Maximilian Kapsecker, Lara M. Reimer, Stephan Jonas, Andreas T. Damgaard, Natasha B. Bertelsen, Casper Skjaerbaek, Per Borghammer, Karolien Groenewald, Pietro-Luca Ratti, Michele T. Hu, Noémie Moreau, Michael Sommerauer, Katarzyna Bozek

发表机构 * Faculty of Mathematics and Natural Sciences, University of Cologne, Germany(科隆大学数学与自然科学学院,德国) Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany(科隆大学医学院与科隆大学医院生物医学信息学研究所,德国) Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany(科隆分子医学中心(CMMC),科隆大学医学院与科隆大学医院,德国) Medical Psychology | Neuropsychology and Gender Studies, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany(科隆大学医学院与科隆大学医院医学心理学 | 神经心理学与性别研究,德国) Cognitive Neuroscience, Institute for Neuroscience and Medicine, INM-3, Research Center Juelich, Germany(认知神经科学,神经科学与医学研究所,于利希研究中心,德国) Department of Neurology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany(科隆大学医学院与科隆大学医院神经科,德国) Center of Neurology, Department of Parkinson, Sleep and Movement Disorders, University Hospital Bonn, University of Bonn, Germany(神经科中心,帕金森、睡眠与运动障碍部门,波恩大学医院,德国) German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany(德国神经退行性疾病研究中心(DZNE),波恩,德国) Cluster of Excellence for Aging and Aging-Associated Diseases (CECAD), University of Cologne, Germany(老龄化与相关疾病卓越中心(CECAD),科隆大学,德国) Department of Neurology, University Medical Center Schleswig-Holstein, Campus Kiel and Kiel University, Germany(神经科,石勒苏益格-荷尔斯泰因大学医院,基尔校区和基尔大学,德国) Department of Informatics, Technical University of Munich, Germany(信息学院,慕尼黑工业大学,德国) Institute for Digital Medicine, University Hospital Bonn, Germany(数字医学研究所,波恩大学医院,德国) Lundbeck Foundation Parkinson’s Disease Research Center (PACE), Aarhus University, Denmark(灵北基金会帕金森病研究中心(PACE),奥胡斯大学,丹麦) Department of Nuclear Medicine, Aarhus University Hospital, Denmark(核医学部,奥胡斯大学医院,丹麦) Department of Electrical and Computer Engineering, Aarhus University, Denmark(电气与计算机工程系,奥胡斯大学,丹麦) Oxford Parkinson’s Disease Centre and Division of Neurology, Nuffield Department of Clinical Neurosciences, University of Oxford, UK(牛津帕金森病中心与神经科,牛津大学临床神经科学系,英国)

AI总结 提出ActiTect,一个全自动开源机器学习工具,通过标准化预处理和睡眠-觉醒检测,从体动记录中识别RBD,在多个独立队列中验证了泛化能力(AUROC 0.84-0.94)。

Comments 37 pages including Supplementary Information, 4 core figures, 1 supplementary figure. (v2: fixed a typo in Table 3 and made minor text edits; v3: post review)

详情
Journal ref
npj Digital Medicine (2026)
AI中文摘要

孤立性快速眼动睡眠行为障碍(iRBD)是α-突触核蛋白病的主要前驱标志,通常先于帕金森病、路易体痴呆或多系统萎缩的临床发作。虽然腕戴式体动记录仪通过捕捉异常夜间运动在大规模筛查中具有检测RBD的巨大潜力,但缺乏可靠高效的分析流程则无法使用。本研究提出了ActiTect,一个全自动开源机器学习工具,用于从体动记录中识别RBD。为确保跨异构采集设置的泛化能力,我们的流程包括稳健的预处理和自动睡眠-觉醒检测,以协调多设备数据并提取表征活动模式的生理可解释运动特征。模型开发基于78名个体的队列,在嵌套交叉验证下表现出强大的区分能力(AUROC = 0.95)。在盲法本地测试集(n = 31,AUROC = 0.86)和两个独立外部队列(n = 113,AUROC = 0.84;n = 57,AUROC = 0.94)上验证了泛化性。为评估现实世界鲁棒性,跨内部和外部队列的留一数据集交叉验证显示出一致的性能(AUROC范围 = 0.84-0.89)。补充稳定性分析表明,关键预测特征在数据集中保持可重复性,支持最终合并的多中心模型作为更广泛部署的稳健预训练资源。通过开源且易于使用,我们的工具促进了广泛采用,并促进了独立验证和协作改进,从而推动该领域向使用可穿戴设备的统一且可泛化的RBD检测模型发展。

英文摘要

Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $\alpha$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.
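
The leave-one-dataset-out protocol mentioned above holds out one whole cohort per fold and trains on all the others, so every test fold comes from an acquisition setting the model has not seen. A minimal sketch of the split logic follows; the function name, cohort names, and participant ids are hypothetical and do not come from the ActiTect code base.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_dataset_out(
    cohorts: Dict[str, List[str]]
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_name, train_ids, test_ids), one fold per cohort."""
    for held_out in cohorts:
        # Training set: every participant from every other cohort.
        train_ids = [
            pid
            for name, ids in cohorts.items()
            if name != held_out
            for pid in ids
        ]
        yield held_out, train_ids, list(cohorts[held_out])


# Illustrative cohorts (made-up ids, not the study's data).
example_cohorts = {
    "local": ["a1", "a2"],
    "external_1": ["b1"],
    "external_2": ["c1", "c2"],
}
folds = list(leave_one_dataset_out(example_cohorts))
```

Unlike random k-fold splitting, no participant from the held-out cohort ever appears in training, which is what makes the resulting AUROC range a statement about cross-site generalization.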

2602.19591 2026-06-18 cs.LG cs.AI 版本更新

Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks

使用异构图神经网络检测高潜力中小企业

Yijiashun Qi, Hanzhe Guo, Yijiazhen Qi

发表机构 * University of Michigan(密歇根大学) The University of Hong Kong(香港大学)

AI总结 提出SME-HGT异构图Transformer框架,利用公开数据构建包含公司、研究主题和政府机构的异构图,预测SBIR第一阶段获奖者能否进入第二阶段,AUPRC达0.621,优于基线模型。

Comments accepted by (ICIIS 2026)

详情
AI中文摘要

中小企业占美国企业的99.9%,贡献44%的经济活动,但系统性地识别高潜力中小企业仍是一个开放挑战。我们提出了SME-HGT,一个异构图Transformer框架,仅使用公开数据预测哪些SBIR第一阶段获奖者将进入第二阶段资助。我们构建了一个异构图,包含32,268个公司节点、124个研究主题节点和13个政府机构节点,通过约99,000条边连接三种语义关系类型。SME-HGT在时间分割测试集上达到0.621±0.003的AUPRC,在五个随机种子上优于MLP基线(0.590±0.002)和R-GCN(0.608±0.013)。在筛选深度为100家公司时,SME-HGT达到89.6%的精确率,比随机选择提升2.14倍。我们的时间评估协议防止信息泄露,对公开数据的依赖确保了可重复性。这些结果表明,公司、研究主题和资助机构之间的关系结构为中小企业潜力评估提供了有意义的信号,对政策制定者和早期投资者具有启示意义。

英文摘要

Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity, yet systematically identifying high-potential SMEs remains an open challenge. We introduce SME-HGT, a Heterogeneous Graph Transformer framework that predicts which SBIR Phase I awardees will advance to Phase II funding using exclusively public data. We construct a heterogeneous graph with 32,268 company nodes, 124 research topic nodes, and 13 government agency nodes connected by approximately 99,000 edges across three semantic relation types. SME-HGT achieves an AUPRC of 0.621 ± 0.003 on a temporally-split test set, outperforming an MLP baseline (0.590 ± 0.002) and R-GCN (0.608 ± 0.013) across five random seeds. At a screening depth of 100 companies, SME-HGT attains 89.6% precision with a 2.14× lift over random selection. Our temporal evaluation protocol prevents information leakage, and our reliance on public data ensures reproducibility. These results demonstrate that relational structure among firms, research topics, and funding agencies provides meaningful signal for SME potential assessment, with implications for policymakers and early-stage investors.
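
The screening figures above can be related with a little arithmetic. The sketch below computes precision at depth k and lift over random selection, taking lift to be precision divided by the positive base rate (the usual definition, assumed here). The helper names and toy data are illustrative assumptions, not the paper's code. Under that definition, the reported 89.6% precision and 2.14-fold lift would imply a Phase II base rate of roughly 0.896 / 2.14 ≈ 0.42 in the test set; that value is inferred here, not stated in the abstract.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of positives among the k highest-scored items."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


def lift_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Precision at k divided by the overall positive rate."""
    base_rate = sum(labels) / len(labels)
    return precision_at_k(scores, labels, k) / base_rate


# Base rate implied by the two reported numbers (an inference, see above).
implied_base_rate = 0.896 / 2.14
```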

2605.10083 2026-06-18 cs.LG 版本更新

Unlocking air traffic flow prediction through microscopic aircraft-state modeling

通过微观飞机状态建模解锁空交通流量预测

Bin Wang, Anqi Liu, Jiangtao Zhao, Hina Birahmani, Yanyong Huang, Peilan He, Guiyuan Jiang, Feng Hong, Yanwei Yu, Yuanyuan Hou, Tianrui Li

发表机构 * Faculty of Information Science and Engineering(信息科学与工程学院) Ocean University of China(中国海洋大学) Sanya Oceanographic Institution(三亚海洋研究所) Joint Laboratory of Data Science and Business Intelligence(数据科学与商务智能联合实验室) Southwestern University of Finance and Economics(西南财经大学) The Affiliated Hospital of Qingdao University(青岛大学附属医院) School of Computing and Artificial Intelligence(计算机与人工智能学院)

AI总结 本文提出AeroSense模型,通过微观飞机状态直接预测未来区域交通流量,提升高密度交通下的预测精度,替代传统时间序列方法。

详情
AI中文摘要

终端空域短期空中交通流量预测对主动空中交通管理至关重要。现有方法主要将交通流量建模为聚合时间序列,尽管交通动态由飞机状态和连续空域中的相互作用决定。此类聚合掩盖了包括飞机运动学、边界相互作用和控制意图在内的细粒度信息。本文提出AeroSense,一种从即时空域情况中的动态飞机状态集直接预测未来交通流量的状态到流量建模框架。通过建立从微观飞机状态到未来区域交通流量的端到端映射,AeroSense在保持飞机级动态的同时,自然适应变化的交通密度,而无需依赖历史回溯窗口。在大规模真实数据集上的实验表明,AeroSense在高密度交通期间比基于聚合的预测方法具有持续的预测精度提升。这些发现表明,即时空域情况为传统基于时间序列的交通预测范式提供了有效的替代方案。

英文摘要

Short-term air traffic flow prediction in terminal airspace is essential for proactive air traffic management. Existing approaches predominantly model traffic flow as aggregated time series. However, traffic dynamics are governed by aircraft states and their interactions in continuous airspace. Such aggregation obscures fine-grained information, including aircraft kinematics, boundary interactions, and control intent. Here we present AeroSense, a state-to-flow modeling paradigm that predicts future traffic flow directly from instantaneous airspace situations represented as dynamic sets of aircraft states derived from ADS-B trajectories. By establishing an end-to-end mapping from microscopic aircraft states to future regional traffic flow, AeroSense preserves aircraft-level dynamics while naturally accommodating varying traffic density without relying on historical look-back windows. Experiments on a large-scale real-world dataset show that AeroSense exhibits admirable predictive accuracy and robustness over aggregation-based forecasting approaches, particularly during high-density traffic periods. These findings suggest that aircraft-state situation modeling provides a promising alternative to conventional time-series forecasting in air traffic flow management.

2605.13566 2026-06-18 cs.LG 版本更新

Spatiotemporal downscaling and nowcasting of urban land surface temperatures with deep neural networks

基于深度神经网络的城市地表温度时空降尺度与临近预报

Solomiia Kurchaba, Angela Meyer

发表机构 * Department of Geoscience and Remote Sensing(地球科学与遥感系) Delft University of Technology(代尔夫特理工大学) School of Engineering and Computer Science(工程与计算机科学学院) Bern University of Applied Sciences(伯尔尼应用科学大学)

AI总结 本文提出利用深度神经网络结合静止和极轨卫星数据,实现高时空分辨率的城市地表温度场估计与现在预报,提升城市气候与生态研究的精度与时效性。

Comments Paper after publication in IEEE Access

详情
Journal ref
IEEE Access, vol. 14, pp. 85134-85151, 2026
AI中文摘要

地表温度(LST)是多种应用的关键变量,如城市气候和生态研究。然而,现有卫星衍生的LST产品只能提供高空间分辨率或高时间分辨率之一,导致两者之间存在根本性权衡。为解决这一权衡,我们结合静止和极轨卫星的观测数据,提供高空间和高时间分辨率(1公里,15分钟间隔)的LST场。我们展示了其在日内LST预报中的应用。为了估计高时空分辨率的LST场,训练了一个U-Net模型,将SEVIRI/MSG(3公里,15分钟分辨率)的LST场映射到在空间和时间上与之匹配的Terra/Aqua MODIS(1公里,每天4次过境)LST场。所提出的模型在人口超过100万的欧洲大城市的LST数据上训练,在留出测试集上达到RMSE=1.92°C和接近零的偏差MBE=0.01°C。作为第二步,我们提出基于ConvLSTM架构的LST临近预报模型,在降尺度后的LST场上训练,预报时效为15至75分钟。该临近预报模型优于持续性基准和气候滚动中位数基准,在所考虑的预报时效下,RMSE为0.57至1.15°C,偏差范围为-0.1至0.14°C。此外,针对独立MODIS过境的额外验证确认了其稳健性能。我们的高时空分辨率LST预报模型可直接应用于基于卫星的业务化LST监测。

英文摘要

Land Surface Temperature (LST) is a key variable for various applications, such as urban climate and ecology studies. Yet, existing satellite-derived LST products provide either high spatial or high temporal resolution, resulting in a fundamental trade-off between the two. To address this trade-off, we combine observations from a geostationary and a polar orbiting satellite and provide LST fields at high spatial and high temporal resolution (1 km at 15-min intervals). We demonstrate their application for intraday forecasting of LSTs. To estimate LST fields at high spatiotemporal resolution, a U-Net model is trained to map LST fields from SEVIRI/MSG (3 km and 15 min resolution) to LST fields from Terra/Aqua MODIS (1 km, 4 overpasses per day) that are collocated in space and time. The presented model has been trained on LSTs across large European cities with a population exceeding 1 million inhabitants, and achieves an RMSE = $1.92$°C and near-zero bias MBE = $0.01$°C on the hold-out test set. As a second step, we present an LST nowcasting model based on the ConvLSTM architecture, trained on downscaled LST fields with forecast lead times of 15 to 75 minutes. The nowcasting model outperforms both a persistence benchmark and a climatological rolling-median benchmark, with RMSEs of $0.57$ to $1.15$°C for the considered lead times and biases ranging from $-0.1$ to $0.14$°C. An additional validation conducted against independent MODIS overpasses confirms robust performance. Our LST forecast model at high spatiotemporal resolution is directly applicable to operational satellite-based LST monitoring.
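
The two error measures quoted above and the persistence benchmark are simple to state in code. The sketch below is a minimal illustration on a one-dimensional series; the function names and the sign convention for MBE (forecast minus observation) are assumptions, and the paper evaluates two-dimensional LST fields, not scalar series.

```python
import math
from typing import List, Sequence


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between forecasts and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mbe(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean bias error; positive means the forecast runs warm (assumed convention)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def persistence_forecast(series: List[float], lead_steps: int) -> List[float]:
    """Forecast for time t + lead_steps is the value observed at time t.

    The result aligns element-wise with series[lead_steps:]; lead_steps >= 1.
    """
    return series[:-lead_steps]
```

At the 15-minute sampling described above, a 15-minute lead time corresponds to `lead_steps=1` and a 75-minute lead time to `lead_steps=5`. A learned nowcasting model is useful only if its RMSE falls below the value this baseline achieves at the same lead time.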

2605.21528 2026-06-18 cs.LG cs.AI 版本更新

A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction

可重复的基于日志的自动机器学习框架用于医疗风险预测中的可解释流水线优化

Rui Huang, Lican Huang

发表机构 * School of Basic Medicine, Hangzhou Normal University(杭州师范大学基础医学院) Research Department, Hangzhou Domain Zones Technology Co.Ltd.(杭州域区技术有限公司)

AI总结 本文提出了一种可重复的基于日志的自动机器学习框架,用于医疗风险预测中的可解释流水线优化,通过分析组件属性、交互和冗余性,提高了模型性能和稳定性。

详情
AI中文摘要

准确且可重复的疾病风险预测仍然具有挑战性,由于异质特征、有限样本和严重的类别不平衡。本研究引入了yvsoucom-iterkit,一种确定性和基于日志的自动化机器学习框架,将流水线优化完全可重复地建模为配置级系统。每个流水线被编码为可追溯的日志实体,使能够分析组件属性、交互、相似性和跨种子鲁棒性。在超过18,000个流水线配置上对Pima Indians糖尿病和中风数据集的实验揭示了一个结构化且部分冗余的搜索空间,其中性能由一小部分相互作用的组件决定。随机森林重要性分析显示,增强(0.454)、模型选择(0.198)和不平衡处理(0.101)是Pima数据集的关键驱动因素,而不平衡处理主导中风(0.406)。组件相似性分析显示强冗余性,特征选择变体(biMax-biMean)表现出低RMS距离(0.0252),混合匹配无增强(0.0279),TomekLinks与无不平衡处理对齐(0.0325),而高斯噪声与无增强的差异更大(0.10)。该框架使用集成模型(加权F1 0.89,宏F1 0.88在Pima;加权F1 0.94在中风)实现了强且稳定的性能,而宏F1在中风上较低(0.67)由于类别不平衡。跨种子分析揭示了性能-鲁棒性权衡,集成模型的变异性低于SVM。这些结果表明,有效的AutoML优化可以聚焦于一组高影响的组件。

英文摘要

Accurate disease risk prediction is challenged by heterogeneous features, limited data, and class imbalance. This study presents yvsoucom-iterkit, a deterministic AutoML framework that models pipeline optimization as a configuration-level system with full reproducibility and traceable execution logs, enabling systematic analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured yet partially redundant search space, where performance is dominated by a small subset of interacting components. Ensemble models achieve stable performance, reaching a Weighted-F1 of 0.89 on Pima and 0.94 on Stroke. Macro-F1 reaches approximately 0.88 on Pima but drops to 0.6560 on Stroke due to severe imbalance. Cross-seed experiments show that ensembles reduce variance compared to single models. Friedman testing ($p < 0.05$) confirms significant ranking differences across configurations. Based on analysis of component attribution, interaction, and similarity, optimal configuration design reveals dataset-dependent behavior. For the Pima dataset, computational efficiency benefits from simplified search spaces where redundant components can be removed, with split ratio playing a key role. In contrast, the Stroke dataset requires enhanced imbalance-aware strategies, where RandomOverSampler improves Macro-F1 from 0.6560 to 0.6766. These findings demonstrate that effective AutoML optimization is achieved through optimal configuration design, where carefully constraining the search space to high-impact components can improve performance, stability, and interpretability while reducing unnecessary search complexity.

2606.07622 2026-06-18 cs.LG stat.AP 版本更新

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

机场航站楼登机口与安检点旅客排队预测

Juhwan Lee, Seokbin Yoon, Keumjin Lee, Hojong Baik, Seyeon Jung

发表机构 * Korea Aerospace University(韩国航空大学) Korea Airports Corporation(韩国机场公社)

AI总结 提出基于Transformer的框架,利用历史队列长度、等待时间和旅客吞吐量数据,预测登机口和安检点未来两小时的队列长度与等待时间,支持主动排队管理。

Comments 10 pages, 6 figures, accepted at DASC 2026

详情
AI中文摘要

准确的机场航站楼旅客排队预测对于高效的离港运营至关重要,因为它能够实现主动的拥堵管理。然而,时变的旅客需求以及多个离港设施中异构的设施使用情况使得预测具有挑战性。在这项工作中,我们提出了一种旅客排队预测框架,该框架从运营数据中学习历史旅客流量模式。所提出的模型采用基于Transformer的架构,利用过去登机口和安检点的队列长度和等待时间,以及值机岛的旅客吞吐量,来捕捉时间依赖性和设施间相关性。学习到的表示被映射到两个设施特定的MLP头部,以预测登机口和安检点的队列长度和等待时间。实验结果表明,该模型能够准确预测未来两小时内的排队情况。所提出的方法为机场航站楼运营中的主动排队管理和人员重新分配提供了实用的实时决策支持。

英文摘要

Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific prediction heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.

2204.14224 2026-06-18 cs.CV cs.LG eess.IV 版本更新

Investigation of Neural Network Methods for Reconstruction and Classification of Texture Images Under Conditions of Incomplete Information

不完全信息条件下纹理图像重建与分类的神经网络方法研究

Galymzhan Abdimanap, Kairat Bostanbekov, Abdelrahman Abdallah, Anel Alimova, Darkhan Kurmangaliyev, Daniyar Nurseitov, Tatyana Dedova, Larissa Balakay, Serik Nurakynov

发表机构 * Satbayev University(萨特巴耶夫大学) Institute of Ionosphere LLP(电离层研究所) Information Technology Department(信息技术部门) Assiut University(阿西乌特大学)

AI总结 提出结合目标检测、GAN(CRA)修复和Transformer/CNN分类的端到端框架,发现重建质量高(PSNR 28.7dB)但分类准确率仅53%,通过置信度混合集成将MCA从48%提升至58%,揭示生成模型产生语义模糊特征的问题。

Comments IEEE ACCESS

详情
AI中文摘要

异质自然纹理的自动化分析常因物理损伤和数据丢失而受阻,这对计算机视觉构成了重大挑战。虽然深度学习在受控环境中已显示出成功,但其在信息不完全条件下对复杂地质材料的应用仍未被充分探索。本研究提出了一个用于高分辨率岩心样本图像修复和分类的集成框架。我们设计了一个端到端流水线,利用目标检测进行样本分割,随后使用具有上下文残差聚合(CRA)的生成对抗网络(GAN)进行图像修复,以重建缺失的高频细节。接着,我们在重建数据上评估了现代基于Transformer(Swin、ViT)和CNN架构的性能。实验揭示了重建质量与下游效用之间的关键分歧:尽管结构保真度高(PSNR 28.7 dB,FID 74.01),分类准确率却停滞在53%。为了改善少数类检测,我们提出了一种基于置信度的混合集成方法,将MCA从48%提升至58%。这些结果凸显了当前最先进生成模型的局限性,它们可能产生视觉上合理但语义模糊的特征(“幻觉”),从而混淆分类器。本工作深入探讨了图像重建质量与分类性能之间的依赖关系,为无损检测和材料科学领域的未来研究提供了可复现的基线。鉴于井间准确率仍处于49-53%范围,我们将所得到的系统定位为岩相解释的决策支持和筛选工具,而非完全自主的分类器。代码可在以下网址获取:https://github.com/GalymzhanAbdimanap/Lithology_recognition

英文摘要

The automated analysis of heterogeneous natural textures is frequently hindered by physical damage and data loss, presenting a significant challenge to computer vision. While deep learning has shown success in controlled environments, its application to complex geological materials under conditions of incomplete information remains underexplored. This study presents an integrated framework for the inpainting and classification of high-resolution core sample images. We propose an end-to-end pipeline that utilizes object detection for sample segmentation, followed by image inpainting using Generative Adversarial Networks (GANs) with Contextual Residual Aggregation (CRA) to reconstruct missing high-frequency details. Subsequently, we evaluate the performance of modern Transformer-based (Swin, ViT) and CNN architectures on the reconstructed data. Our experiments revealed a critical divergence between reconstruction quality and downstream utility: despite high structural fidelity (PSNR 28.7 dB, FID 74.01), classification accuracy plateaued at 53%. To improve minority-class detection, we propose a confidence-based hybrid ensemble that raises MCA from 48% to 58%. These results highlight the limitations of current state-of-the-art generative models, which may produce visually plausible but semantically ambiguous features ("hallucinations") that confound classifiers. This work provides insights into the dependencies between image reconstruction quality and classification performance, offering a reproducible baseline for future research in non-destructive testing and material science. Given that cross-well accuracy remains in the 49-53% range, we position the resulting system as a decision-support and screening tool for lithofacies interpretation rather than as a fully autonomous classifier. The code is available at https://github.com/GalymzhanAbdimanap/Lithology_recognition

2508.10178 2026-06-18 q-bio.QM cs.LG 版本更新

Estimating carbon pools in the European Shelf sea environment: replacing reanalysis by model-informed machine learning?

估算欧洲陆架海环境中的碳库:用模型指导的机器学习替代再分析?

Jozef Skakala

发表机构 * Plymouth Marine Laboratory(普利茅斯海洋实验室) National Centre for Earth Observation(国家地球观测中心)

AI总结 提出用深度集成神经网络学习可观测变量与海洋碳库的关系,以低成本替代昂贵再分析,在西北欧陆架海实现高效碳库预测并提供不确定性。

Comments 37 pages, 9 figures (+ 3 in the appendix), v3 - published version

详情
Journal ref
JGR - Machine Learning and Computation 3 (2026)
AI中文摘要

陆架海对经济和碳循环至关重要,但碳库观测往往稀疏或高度不确定。碳再分析(无论是同化叶绿素a等代理变量还是直接同化碳)可提供替代方案,但运行成本高昂。我们提出使用计算成本低的神经网络集成(即深度集成)来学习直接可观测(大气、河流和海洋)变量与海洋碳库之间的关系,该关系来自一个物理-生物地球化学耦合模型。深度集成在西北欧陆架海(NWES)物理-生物地球化学模型自由运行模拟上训练。训练后,使用来自NWES再分析的输入而非自由运行来运行深度集成,证明它能高效预测多个NWES碳库(如碎屑、浮游动物、异养细菌),且与再分析的一致性远优于自由运行,同时提供不确定性信息。我们进一步表明,当深度集成直接由同化到再分析中的观测驱动时,其表现同样良好,但碳库只能预测在观测位置和时间。我们关注结果的可解释性,并展示了深度集成在未来气候假设情景中的潜在应用。我们认为,模型指导的机器学习为昂贵的再分析提供了可行的替代方案,并可在观测缺失和/或高度不确定的地方补充观测。

英文摘要

Shelf seas are important for the economy and the carbon cycle, but shelf sea observations for carbon pools are often sparse, or highly uncertain. An alternative can be provided by carbon reanalyses (whether assimilating proxy variables, such as chlorophyll-$a$, or directly carbon), but these are often expensive to run. We propose to use a computationally cheap ensemble of neural networks (i.e. deep ensemble) to learn the relationship between the directly observable (atmospheric, riverine and ocean) variables and marine carbon pools from a coupled physics-biogeochemistry model. The deep ensemble was trained on a North-West European Shelf (NWES) physical-biogeochemistry model free run simulation. After training, the deep ensemble was run using inputs from the NWES reanalysis instead of the free run, demonstrating that it can efficiently predict several NWES carbon pools (e.g., detritus, zooplankton, heterotrophic bacteria) in much better agreement with the reanalysis than the free run, while also providing uncertainty information. We further show that the deep ensemble performs similarly well when it is driven directly by the observations assimilated into the reanalysis, with the limitation that carbon pools can then be predicted only at the observed locations and times. We focus on explainability of the results and demonstrate potential use of the deep ensembles for future climate what-if scenarios. We suggest that model-informed machine learning presents a viable alternative to expensive reanalyses and could complement observations, wherever they are missing and/or highly uncertain.
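
The uncertainty information mentioned above comes from the spread among ensemble members: each network makes its own prediction, the mean serves as the estimate, and the standard deviation as a rough uncertainty. The sketch below shows only that aggregation step; the members are stand-in callables chosen for illustration, not trained networks, and `ensemble_predict` is a hypothetical name.

```python
import statistics
from typing import Callable, Sequence, Tuple


def ensemble_predict(
    members: Sequence[Callable[[float], float]], x: float
) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of member predictions at x."""
    preds = [member(x) for member in members]
    return statistics.mean(preds), statistics.stdev(preds)


# Stand-in members that disagree slightly, as independently trained
# networks typically would (illustrative only).
toy_members = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 1.0,
    lambda x: 2.0 * x - 1.0,
]
mean_pred, spread = ensemble_predict(toy_members, 3.0)
```

The spread is informative only where members were free to disagree, so it tends to grow for inputs far from the training distribution. `statistics.stdev` needs at least two members.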

2511.00366 2026-06-18 stat.ML cs.CE cs.LG 版本更新

A Streaming Sparse Cholesky Method for Derivative-Informed Gaussian Process Surrogates Within Digital Twin Applications

面向数字孪生应用中导数信息高斯过程代理的流式稀疏Cholesky方法

Shridhar Vashishtha, Krishna Prasath Logakannan, Jacob Hochhalter, Shandian Zhe, Robert M. Kirby

发表机构 * Department of Mechanical Engineering, University of Utah, Salt Lake City, UT 84112, USA Kahlert School of Computing, University of Utah, Salt Lake City, UT 84112, USA Scientific Computing & Imaging Institute, University of Utah, Salt Lake City, UT 84112, USA

AI总结 提出一种流式稀疏Cholesky方法,通过动态更新和导数信息增强高斯过程代理,降低协方差矩阵维度,实现数字孪生中飞机结构性能的实时预测。

详情
AI中文摘要

数字孪生被开发用于模拟特定物理资产(或孪生体)的行为,它们可以由高保真基于物理的模型或代理组成。高精度代理通常优于多物理场模型,因为它们能够实时预测物理孪生体的未来状态。为了适应特定的物理孪生体,必须使用来自该物理孪生体的在役数据更新数字孪生模型。在本文中,我们结合并扩展了几项先前与代理相关的进展,旨在展示一个端到端的数字孪生(DT)解决方案,用于预测飞机结构(物理资产)的性能。为此,我们将高斯过程(GP)模型扩展到包含导数数据,以提高精度,并通过动态更新来吸收在役期间的物理孪生体数据。然而,包含导数数据会带来协方差矩阵维度增加的过高成本。我们通过改进的动态稀疏Cholesky线性系统求解器规避了这个问题。数值实验表明,导数增强的稀疏Cholesky GP方法在动态数据添加时产生了改进的模型预测精度。最后,我们在一个数字孪生框架内演示了所开发的算法,用于模拟航空航天飞行器中的疲劳裂纹扩展,从而通过我们组装的工程系统展示了数字孪生技术如何在实践中结合。

英文摘要

Digital twins are developed to model the behavior of a specific physical asset (or twin), and they can consist of high-fidelity physics-based models or surrogates. A highly accurate surrogate is often preferred over multi-physics models because it enables forecasting the physical twin's future state in real time. To adapt to a specific physical twin, the digital twin model must be updated using in-service data from that physical twin. In this paper, we combine and extend several previous surrogate-related advancements with the goal of demonstrating an end-to-end digital twin (DT) solution for predicting the performance of an aircraft structure (the physical asset). To this end, we extend Gaussian process (GP) models to include derivative data, for improved accuracy, with dynamic updating to ingest physical twin data during service. Including derivative data, however, comes at the prohibitive cost of an increased covariance matrix dimension. We circumvent this issue through our modified dynamic sparse Cholesky linear system solver. Numerical experiments demonstrate that the derivative-enhanced sparse Cholesky GP method yields improved prediction accuracy as data are added dynamically. Lastly, we demonstrate the developed algorithm within a DT framework to model fatigue crack growth in an aerospace vehicle, thereby exhibiting through our assembled engineered system how digital twin technologies can be combined in practice.
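
Why a streaming update helps can be seen from the textbook block identity for Cholesky factors: when a new observation extends the covariance matrix $K$ by one row and column, the existing factor $L$ is reused and only one new row is computed. The sketch below implements that dense, one-point-at-a-time update in plain Python. It is not the paper's sparse solver, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def forward_solve(L: Matrix, b: List[float]) -> List[float]:
    """Solve L y = b for a lower-triangular L stored as full square rows."""
    y: List[float] = []
    for i, row in enumerate(L):
        partial = sum(row[j] * y[j] for j in range(i))
        y.append((b[i] - partial) / row[i])
    return y


def cholesky_append(L: Matrix, k_new: List[float], kappa: float) -> Matrix:
    """Given L with L L^T = K, return the factor of [[K, k], [k^T, kappa]].

    k_new holds covariances between the new point and the existing points;
    kappa is the new point's own variance.
    """
    new_row = forward_solve(L, k_new)
    d_squared = kappa - sum(v * v for v in new_row)
    if d_squared <= 0.0:
        raise ValueError("extended matrix is not positive definite")
    # Pad existing rows with a zero column, then add the new bottom row.
    return [row + [0.0] for row in L] + [new_row + [math.sqrt(d_squared)]]
```

Each append costs one triangular solve, O(n^2) for a dense factor, instead of the O(n^3) needed to refactorize from scratch. With derivative observations, each new sample contributes several rows, which would be appended one after another in this scheme.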

2511.19468 2026-06-18 cs.DC cs.ET cs.LG physics.space-ph 版本更新

Towards a future space-based, highly scalable AI infrastructure system design

面向未来天基、高度可扩展的AI基础设施系统设计

Blaise Agüera y Arcas, Travis Beals, Maria Biggs, Jessica V. Bloom, Thomas Fischbacher, Konstantin Gromov, Urs Köster, Rishiraj Pravahan, James Manyika

发表机构 * Google(谷歌)

AI总结 本文探索利用卫星集群、太阳能板、自由空间光通信和TPU芯片构建天基机器学习计算系统,并分析辐射测试、发射成本等可行性。

Comments 18 pages, 4 figures. v2: Cleaned up references. Improved rough estimates. Fixed typos. Re-ran radiation test with improved methods

详情
AI中文摘要

如果AI是一种基础通用技术,我们应该预期对AI计算和能源的需求将持续增长。太阳是太阳系中最大的能源来源,因此值得考虑未来的AI基础设施如何最有效地利用这种能量。本文探索了用于太空机器学习的可扩展计算系统,该系统使用配备太阳能板的卫星群、自由空间光通信的星间链路以及谷歌张量处理单元(TPU)加速芯片。为了促进高带宽、低延迟的星间通信,卫星将近距离飞行。我们通过一个半径为1公里的81颗卫星集群说明了编队飞行的基本方法,并描述了一种使用基于高精度ML模型来控制大规模星座的方法。Trillium TPU经过了辐射测试。它们在总电离剂量相当于5年任务寿命的情况下存活,没有永久性故障,并针对位翻转错误进行了表征。发射成本是整体系统成本的关键部分;学习曲线分析表明,到2030年代中期,发射到近地轨道(LEO)的成本可能达到$\lesssim$200美元/公斤。

英文摘要

If AI is a foundational general-purpose technology, we should anticipate that demand for AI compute -- and energy -- will continue to grow. The Sun is by far the largest energy source in our solar system, and thus it warrants consideration how future AI infrastructure could most efficiently tap into that power. This work explores a scalable compute system for machine learning in space, using fleets of satellites equipped with solar arrays, inter-satellite links using free-space optics, and Google tensor processing unit (TPU) accelerator chips. To facilitate high-bandwidth, low-latency inter-satellite communication, the satellites would be flown in close proximity. We illustrate the basic approach to formation flight via an 81-satellite cluster of 1 km radius, and describe an approach for using high-precision ML-based models to control large-scale constellations. Trillium TPUs are radiation tested. They survive a total ionizing dose equivalent to a 5 year mission life without permanent failures, and are characterized for bit-flip errors. Launch costs are a critical part of overall system cost; a learning curve analysis suggests launch to low-Earth orbit (LEO) may reach $\lesssim$\$200/kg by the mid-2030s.

2603.15988 2026-06-18 eess.AS cs.AI cs.LG 版本更新

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

无中生有:面向构音障碍语音严重程度鲁棒估计的数据增强

Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson

发表机构 * University of Illinois Urbana-Champaign, IL, USA(伊利诺伊大学厄巴纳-香槟分校) Korea Advanced Institute of Science & Technology, KR(韩国科学技术院)

AI总结 提出三阶段框架,利用未标注构音障碍语音和典型语音数据集,通过教师模型生成伪标签、标签感知对比学习预训练和微调,在五个未见数据集上平均SRCC达0.761,显著优于现有方法。

Comments Accepted to Interspeech 2026 Long Paper Track

详情
AI中文摘要

构音障碍语音质量评估(DSQA)对于临床诊断和包容性语音技术至关重要。然而,主观评估成本高且难以规模化,而标注数据的稀缺限制了鲁棒的客观建模。为解决这一问题,我们提出了一个三阶段框架,利用未标注的构音障碍语音和大规模典型语音数据集来扩展训练。教师模型首先生成未标注样本的伪标签,然后使用标签感知对比学习策略进行弱监督预训练,使模型暴露于多样化的说话者和声学条件。预训练模型随后针对下游DSQA任务进行微调。在跨越多种病因和语言的五个未见数据集上的实验证明了我们方法的鲁棒性。我们的基于Whisper的基线显著优于SOTA DSQA预测器(如SpICE),完整框架在未见测试数据集上实现了平均SRCC为0.761。

英文摘要

Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.
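The headline metric above, SRCC (Spearman rank correlation coefficient), measures how well predicted severity scores preserve the ordering of reference ratings. A minimal from-scratch sketch follows; the example scores are invented for illustration and are not data from the paper, and the function assumes neither input is constant.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def srcc(predicted, reference):
    """Spearman correlation = Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(predicted), average_ranks(reference)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Invented example: model scores vs. clinician severity levels.
print(srcc([0.2, 0.9, 0.5, 0.7], [1, 4, 3, 2]))
```

Because only ranks matter, the metric is insensitive to any monotonic rescaling of the model's output, which suits severity scales that differ across datasets.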

2603.29247 2026-06-18 cs.CL cs.AI cs.LG 版本更新

MemRerank: Preference Memory for Personalized Product Reranking

MemRerank:用于个性化产品重排序的偏好记忆

Zhiyuan Peng, Xuyang Wu, Huaixiao Tou, Yi Fang, Yu Gong

发表机构 * Santa Clara University(圣克拉拉大学) Independent Researcher(独立研究者)

AI总结 提出MemRerank框架,通过强化学习将用户购买历史提炼为查询无关的偏好记忆,用于LLM购物代理的个性化重排序,在1-in-5选择任务中准确率提升高达10.61个百分点。

Comments correct author name in metadata

详情
AI中文摘要

基于LLM的购物代理越来越依赖长购买历史和多轮交互来实现个性化,然而,由于噪声、长度和相关性不匹配,将原始历史简单地附加到提示中通常效果不佳。我们提出MemRerank,一个偏好记忆框架,将用户购买历史提炼为简洁、查询无关的信号,用于个性化产品重排序。为了研究这个问题,我们构建了一个端到端的基准测试和评估框架,围绕基于LLM的\textbf{1-in-5}选择任务,该任务同时衡量记忆质量和下游重排序效用。我们进一步使用强化学习(RL)训练记忆提取器,以下游重排序性能作为监督。使用两个基于LLM的重排序器进行的实验表明,MemRerank始终优于无记忆、原始历史和现成记忆基线,在1-in-5准确率上提高了高达\textbf{+10.61}个绝对百分点。这些结果表明,显式偏好记忆是代理型电子商务系统中个性化的一种实用且有效的构建模块。

英文摘要

LLM-based shopping agents increasingly rely on long purchase histories and multi-turn interactions for personalization, yet naively appending raw history to prompts is often ineffective due to noise, length, and relevance mismatch. We propose MemRerank, a preference memory framework that distills user purchase history into concise, query-independent signals for personalized product reranking. To study this problem, we build an end-to-end benchmark and evaluation framework centered on an LLM-based \textbf{1-in-5} selection task, which measures both memory quality and downstream reranking utility. We further train the memory extractor with reinforcement learning (RL), using downstream reranking performance as supervision. Experiments with two LLM-based rerankers show that MemRerank consistently outperforms no-memory, raw-history, and off-the-shelf memory baselines, yielding up to \textbf{+10.61} absolute points in 1-in-5 accuracy. These results suggest that explicit preference memory is a practical and effective building block for personalization in agentic e-commerce systems.

2604.00730 2026-06-18 cs.CY cs.AI cs.LG cs.SE 版本更新

A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch

基于CEFR启发的模糊C均值分类框架:自动化评估Scratch编程技能

Ricardo Hidalgo-Aragón, Jesús M. González-Barahona, Gregorio Robles

发表机构 * Universidad Rey Juan Carlos(胡安·卡洛斯国王大学)

AI总结 提出一种基于CEFR的Scratch项目评估框架,使用模糊C均值聚类对200万+项目分级,识别B2瓶颈并引入分类确定性指标以平衡自动反馈与人工审核。

Comments Best Paper Award CSEDU 2026 -Minor change FPC fix-

详情
AI中文摘要

背景:学校、培训平台和技术公司日益需要以透明、可重复的方法大规模评估编程能力,以支持个性化学习路径。目标:本研究引入一个与欧洲共同语言参考标准(CEFR)一致的Scratch项目评估教学框架,为学生和教师提供通用能力等级,并为课程设计提供可行见解。方法:我们对通过Dr.Scratch评估的2,008,246个Scratch项目应用模糊C均值聚类,实施序数准则将聚类映射到CEFR等级(A1-C2),并引入增强分类指标,用于识别过渡学习者、实现持续进度跟踪,并量化分类确定性以平衡自动反馈与教师评审。影响:该框架能够诊断系统性课程缺口,特别是“B2瓶颈”(由于整合逻辑、同步和数据表示所带来的认知负荷,仅13.3%的学习者处于该等级),同时提供基于确定性的触发机制以进行人工干预。

英文摘要

Context: Schools, training platforms, and technology firms increasingly need to assess programming proficiency at scale with transparent, reproducible methods that support personalized learning pathways. Objective: This study introduces a pedagogical framework for Scratch project assessment, aligned with the Common European Framework of Reference (CEFR), providing universal competency levels for students and teachers alongside actionable insights for curriculum design. Method: We apply Fuzzy C-Means clustering to 2,008,246 Scratch projects evaluated via Dr.Scratch, implementing an ordinal criterion to map clusters to CEFR levels (A1-C2), and introducing enhanced classification metrics that identify transitional learners, enable continuous progress tracking, and quantify classification certainty to balance automated feedback with instructor review. Impact: The framework enables diagnosis of systemic curriculum gaps, notably a "B2 bottleneck" where only 13.3% of learners reside due to the cognitive load of integrating Logic, Synchronization, and Data Representation, while providing certainty-based triggers for human intervention.
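The Fuzzy C-Means step described above assigns each project a graded membership in every cluster rather than a single hard label, which is what makes a certainty measure possible. The sketch below uses the standard FCM membership formula on a one-dimensional score. The cluster centers, the rule of sorting centers to obtain CEFR order, the certainty definition (gap between the two largest memberships), and the 0.2 threshold are all hypothetical choices for illustration, not the paper's settings.

```python
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def fcm_memberships(score, centers, m=2.0):
    """Standard FCM membership of a scalar score in each cluster (fuzzifier m)."""
    dists = [abs(score - c) for c in centers]
    if 0.0 in dists:
        # A score sitting exactly on a center belongs fully to that cluster.
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]

def classify(score, centers, transitional_gap=0.2):
    """Return (level, certainty, is_transitional) for one project score.

    Assumes exactly six centers; sorting them stands in for an ordinal
    criterion mapping clusters to A1..C2 (an illustrative assumption).
    """
    ordered = sorted(centers)
    u = fcm_memberships(score, ordered)
    ranked = sorted(range(len(u)), key=lambda k: u[k], reverse=True)
    certainty = u[ranked[0]] - u[ranked[1]]
    return CEFR_LEVELS[ranked[0]], certainty, certainty < transitional_gap

centers = [3.0, 6.0, 9.0, 12.0, 15.0, 18.0]  # hypothetical cluster centers
print(classify(9.0, centers))   # score on a center: full certainty
print(classify(10.5, centers))  # score midway between two centers: transitional
```

A low certainty value is the kind of signal that could route a project to instructor review instead of automated feedback.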

2604.03275 2026-06-18 physics.ao-ph cs.AI cs.LG 版本更新

IPSL-AID: Generative Diffusion Models for Climate Downscaling from Global to Regional Scales

IPSL-AID:用于从全球到区域尺度气候降尺度的生成扩散模型

Kishanthan Kingston, Olivier Boucher, Freddy Bouchet, Pierre Chapel, Rosemary Eade, Jean-Francois Lamarque, Redouane Lguensat, Kazem Ardaneh

发表机构 * Climate Modeling Center(气候建模中心) Sorbonne University(索邦大学) CNRS(法国国家科学研究中心) IPSL Paris(巴黎) France(法国)

AI总结 提出基于去噪扩散概率模型的IPSL-AID工具,利用ERA5再分析数据从粗分辨率输入生成0.25°温度、风和降水场,并建模细尺度特征概率分布以量化不确定性,准确重建统计分布、极端事件和空间结构。

Comments 17 pages, 12 figures, submitted to Climate Informatique 2026, to appear in Environmental Data Science

详情
AI中文摘要

有效的气候变化适应和减缓策略需要高分辨率预测来指导战略决策。传统的全球气候模型通常以150至200公里的分辨率运行,缺乏表示关键区域过程的能力。IPSL-AID是一种基于去噪扩散概率模型的全球到区域降尺度工具,旨在解决这一限制。该工具在ERA5再分析数据上训练,利用粗分辨率输入及其时空上下文生成0.25°分辨率的温度、风和降水场。它还建模细尺度特征的概率分布,以产生用于不确定性量化的合理情景。该模型准确重建了统计分布,包括极端事件、功率谱和空间结构。这项工作突出了生成扩散模型在高效气候降尺度及不确定性量化方面的潜力。

英文摘要

Effective adaptation and mitigation strategies for climate change require high-resolution projections to inform strategic decision-making. Conventional global climate models, which typically operate at resolutions of 150 to 200 kilometers, lack the capacity to represent essential regional processes. IPSL-AID is a global to regional downscaling tool based on a denoising diffusion probabilistic model designed to address this limitation. Trained on ERA5 reanalysis data, it generates 0.25 degree resolution fields for temperature, wind, and precipitation using coarse inputs and their spatiotemporal context. It also models probability distributions of fine-scale features to produce plausible scenarios for uncertainty quantification. The model accurately reconstructs statistical distributions, including extreme events, power spectra, and spatial structures. This work highlights the potential of generative diffusion models for efficient climate downscaling with uncertainty quantification.

2604.14906 2026-06-18 physics.bio-ph cs.LG 版本更新

Unraveling the Mechanism of Drug Binding to SARS-CoV-2 RNA Pseudoknot with Thermodynamics-Driven Machine Learning

用热力学驱动的机器学习揭示药物与SARS-CoV-2 RNA假结的结合机制

Mariia Ivonina, Jakub Rydzewski

发表机构 * Platform of Inter/Transdisciplinary Energy Research, Kyushu University(九州大学跨学科/超学科能源研究平台) Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University(尼古拉·哥白尼大学物理、天文学与信息学学院物理研究所)

AI总结 本研究利用热力学驱动的机器学习方法(光谱映射)从全原子分子动力学轨迹中学习集体变量,揭示了配体结合对SARS-CoV-2 RNA假结拓扑选择性去稳定化的机制,并发现质子化状态是模拟RNA靶向药物作用的关键因素。

详情
AI中文摘要

SARS-CoV-2 RNA中的假结二级结构通过$-1$程序性核糖体移码($-1$ PRF)调控蛋白质合成,该机制使病毒能从重叠阅读框产生结构蛋白和非结构蛋白。该假结表现出穿线和非穿线两种长寿命拓扑结构。配体结合对其折叠的影响是开发$-1$ PRF小分子抑制剂的关键过程。通过引入捕捉相应最慢动力学模式的集体变量(CVs),可以促进通过无偏分子动力学(MD)模拟理解这一过程。这里,我们使用光谱映射(SM),一种热力学驱动的机器学习技术,直接从SARS-CoV-2 RNA假结与$-1$ PRF抑制剂莫拉沙星及其两种结构类似物(中性和离子化形式)复合物的全原子MD轨迹中学习这样的CVs。从学习到的CVs导出的自由能景观(FELs)表明,配体诱导的去稳定化是拓扑选择性的。在穿线假结中,抑制剂去稳定化S2茎,而在非穿线假结中,去稳定化发生在S1和S3茎。此外,每个配体重塑FEL的程度与实验报道的抗病毒效力相匹配,而质子化状态在相同RNA拓扑内定性地改变动力学。总体而言,我们的结果显示了假结拓扑、配体类型和质子化状态如何共同影响病毒RNA的慢构象动力学,并确立了生理质子化作为模拟RNA靶向药物作用的关键因素。

英文摘要

The pseudoknot secondary structure in SARS-CoV-2 RNA is essential for regulating protein synthesis through $-1$ programmed ribosomal frameshifting ($-1$ PRF), a mechanism that allows the virus to generate both structural and non-structural proteins from overlapping reading frames. This pseudoknot exhibits both threaded and unthreaded long-lived topologies. The influence of ligand binding on its folding is a process critical for the development of $-1$ PRF small-molecule inhibitors. Understanding this process through unbiased molecular dynamics (MD) simulations can be facilitated by introducing collective variables (CVs) that capture the corresponding slowest dynamical modes. Here, we use spectral map (SM), a thermodynamics-driven machine learning technique, to learn such CVs directly from all-atom MD trajectories of the SARS-CoV-2 RNA pseudoknot in complex with the $-1$ PRF inhibitor merafloxacin and its two structural analogs in neutral and ionized forms. Free-energy landscapes (FELs) derived from the learned CVs indicate that ligand-induced destabilization is topology-selective. In the threaded pseudoknot, the inhibitors destabilize the S2 stem, while in the unthreaded pseudoknot, destabilization occurs in the S1 and S3 stems. Furthermore, the extent to which each ligand reshapes the FEL matches experimentally reported antiviral potency, whereas the protonation state qualitatively alters dynamics within the same RNA topology. Overall, our results show how pseudoknot topology, ligand type, and protonation state collectively influence the slow conformational dynamics of viral RNA and establish physiological protonation as a critical factor for modeling RNA-targeted drug action.

2604.22476 2026-06-18 cs.CV cs.LG 版本更新

All Eyes on the Workflow: Automated and Efficient Event Discovery from Video Streams

全神贯注于工作流:从视频流中自动高效发现事件

Marco Pegoraro, Jonas Seng, Dustin Heller, Wil M. P. van der Aalst, Kristian Kersting

发表机构 * Chair of Process and Data Science, RWTH Aachen University(过程与数据科学教授席位,亚琛工业大学) Artificial Intelligence & Machine Learning Lab, Technical University of Darmstadt(人工智能与机器学习实验室,达姆施塔特技术大学)

AI总结 提出SnapLog方法,利用图像嵌入和帧间相似矩阵进行时间分割,结合广义少样本分类从视频中提取事件数据,生成可解释的带标签时间戳帧序列。

Comments 18 pages, 6 figures, 1 table, 27 references

详情
AI中文摘要

业务流程管理和流程挖掘等学科通过基于记录的事件数据发现流程见解来帮助组织。然而,流程分析的一个障碍是数据多模态性:例如,视频形式的数据不能直接解释为事件。现有方法依赖于活动标签字典作为输入,无法提供逐帧标签解释,或依赖于过时的计算机视觉技术。在这项工作中,我们提出了SnapLog,一种通过使用图像嵌入将帧转换为特征向量,并通过帧间相似矩阵进行时间分割来从视频中提取事件数据的方法。然后使用广义少样本分类为视频片段分配标签,生成可解释为事件的带标签、带时间戳的帧子序列。传统的流程挖掘技术可用于分析结果数据。我们表明,我们的方法生成的日志准确反映了视频中的流程。

英文摘要

Disciplines such as business process management and process mining aid organizations by discovering insights about processes on the basis of recorded event data. However, an obstacle to process analysis is data multi-modality: for instance, data in video form are not directly interpretable as events. Existing approaches rely on a dictionary of activity labels as input, cannot provide frame-by-frame labeling explanations, or rely on superseded computer vision techniques. In this work, we present SnapLog, an approach to extract event data from videos by converting frames to feature vectors using image embeddings and performing temporal segmentation through frame-wise similarity matrices. A generalized few-shot classification is then used to assign labels to the video segments, yielding labeled, timestamped sub-sequences of frames that are interpretable as events. Conventional process mining techniques can be used to analyze the resulting data. We show that our approach produces logs that accurately reflect the process in the videos.
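The temporal segmentation idea above (frame embeddings compared through a frame-wise similarity matrix) can be sketched in a few lines. This is a deliberately simplified stand-in, not SnapLog's algorithm: it reads only the entries of the cosine-similarity matrix that compare adjacent frames and cuts wherever similarity drops below a hypothetical threshold. The toy two-dimensional embeddings are invented, and a non-empty list of non-zero vectors is assumed.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity_matrix(embeddings):
    """Frame-wise similarity matrix: entry [i][j] compares frames i and j."""
    return [[cosine(a, b) for b in embeddings] for a in embeddings]

def segment(embeddings, threshold=0.8):
    """Return (first_frame, last_frame) pairs, cutting where similarity drops."""
    sim = similarity_matrix(embeddings)
    segments, start = [], 0
    for t in range(1, len(embeddings)):
        if sim[t - 1][t] < threshold:  # adjacent frames look different: new segment
            segments.append((start, t - 1))
            start = t
    segments.append((start, len(embeddings) - 1))
    return segments

# Toy embeddings: two frames of one activity, then two frames of another.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(segment(frames))
```

Each resulting segment would then receive an activity label and a timestamp range, turning it into a candidate event for a process-mining log.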

2605.22845 2026-06-18 cs.CE cs.LG 版本更新

A finite-element-inspired bipartite graph learned simulator for manufacturability assessment in large-deformation sheet forming

受有限元启发的二分图学习模拟器:用于大变形板材成形的可制造性评估

Yingxue Zhao, Haoran Li, Haosu Zhou, Tobias Pfaff, Nan Li

发表机构 * Dyson School of Design Engineering(戴森设计工程学院) Imperial College London(伦敦帝国理工学院) NVIDIA(英伟达)

AI总结 提出交叉注意力二分图神经网络(CAtt-BiGNN),通过节点-单元二分图结构和边感知交叉注意力机制,实现大变形板材成形中节点位移增量和单元减薄量的耦合预测。

详情
AI中文摘要

显式动力学有限元(FE)模拟被广泛用于大变形工程分析,但在设计空间探索和优化过程中,重复模拟的成本仍然很高。在显式有限元分析中,节点运动学量与单元级变形度量通过耦合的节点-单元更新共同演化。这促使人们研究图学习模拟器,用以近似单步有限元状态转移并进行自回归展开。然而,许多基于网格的图代理模型以节点为中心,使得单元级变量以及原生的节点-单元信息交换难以直接表示。本文提出CAttBiGNN,一种基于交叉注意力的二分图神经网络,用于节点-单元耦合学习。该图将有限元网格的节点和单元表示为由有向节点-单元边连接的不同实体,从而能够在各自原生的离散域上预测节点位移增量和单元级变形状态。边感知交叉注意力处理器利用几何边嵌入来调制有向的节点-单元消息传递。对于较大的图,CAttBiUGNN将二分处理器与图下采样和上采样相结合,以改善长程信息传播。该方法在穹顶形冷成形和角形热成形基准上进行了评估。与以节点为中心的基线以及二分图和注意力消融变体的比较表明,在自回归展开过程中,节点位移和单元减薄预测的精度与平衡性均有所提高。结果表明,所提出的受有限元启发的学习模拟器能够支持面向可制造性的场预测,以及大变形板材成形中的高效设计空间探索。

英文摘要

Explicit dynamic finite element (FE) simulations are widely used for large deformation engineering analysis, but repeated simulations remain costly during design space exploration and optimisation. In explicit FE analysis, nodal kinematics and element level deformation measures evolve through coupled node element updates. This motivates graph learned simulators that approximate one step FE state transitions and roll them out autoregressively. However, many mesh based graph surrogates are node centred, which makes element level variables and native nodal elemental exchange less direct to represent. This work proposes CAttBiGNN, a cross attention based bipartite graph neural network for coupled nodal elemental learning. The graph represents FE mesh nodes and elements as distinct entities linked by directed node element edges, enabling nodal displacement increments and element level deformation states to be predicted on their native discretisation domains. An edge aware cross attention processor uses geometric edge embeddings to modulate directional node element message passing. For larger graphs, CAttBiUGNN combines the bipartite processor with graph downsampling and upsampling to improve long-range information propagation. The method is evaluated on dome shaped cold forming and corner shaped hot forming benchmarks. Comparisons with node centred baselines and bipartite and attention ablations show improved accuracy and balance in nodal displacement and elemental thinning prediction during autoregressive rollout. The results indicate that the proposed finite element inspired learned simulator can support manufacturability oriented field prediction and efficient design space exploration in large deformation sheet material forming.
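The bipartite representation described above treats mesh nodes and elements as two distinct vertex sets joined by directed edges. A minimal sketch of building those edge lists from element connectivity is shown below; the function name and the toy two-triangle mesh are hypothetical, and the paper's actual feature construction is not reproduced here.

```python
def build_bipartite_edges(elements):
    """Return directed node->element and element->node edge lists.

    `elements` is a list in which position e holds the tuple of mesh node
    indices connected by element e.
    """
    node_to_elem = [(n, e) for e, nodes in enumerate(elements) for n in nodes]
    elem_to_node = [(e, n) for n, e in node_to_elem]
    return node_to_elem, elem_to_node

# Two triangular shell elements sharing the edge between nodes 1 and 2.
elements = [(0, 1, 2), (1, 3, 2)]
n2e, e2n = build_bipartite_edges(elements)
print(n2e)
print(e2n)
```

Keeping both edge directions explicit lets a message-passing layer update element states from nodes and node states from elements in separate steps, mirroring the coupled node-element updates of explicit FE analysis.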

2605.26631 2026-06-18 stat.AP cs.LG 版本更新

Data-driven sparse identification of governing PDEs via knockoff filters and multi-criteria trade-offs

基于Knockoff滤波器与多准则权衡的数据驱动稀疏识别控制偏微分方程

Pongpisit Thanasutives, Naichang Ke, Yoshinobu Kawahara

发表机构 * RIKEN Center for Advanced Intelligence Project (AIP)(RIKEN先进人工智能项目中心) The University of Osaka(大阪大学)

AI总结 提出KO-PDE-IDENT框架,通过模型-X knockoff滤波器控制错误发现率,结合递归特征消除和多准则决策,从噪声数据中稀疏识别偏微分方程。

Comments 44 pages, 5 figures, 11 tables

详情
AI中文摘要

我们提出KO-PDE-IDENT,一个用于识别简洁偏微分方程(PDE)并控制错误发现率(FDR)的数据驱动框架。从噪声观测中发现PDE常常受到候选项之间极端多重共线性的阻碍,这导致典型的稀疏回归方法选择虚假项。为了解决这个问题,KO-PDE-IDENT首先通过具有有限样本FDR控制的模型-X knockoff滤波器挖掘潜在候选项的支持集,然后对存活的PDE备选方案进行细化和排序。该框架整合了三个组成部分。首先,通过将$\ell_{0}$约束的自适应最佳子集选择与SHapley Additive exPlanations(SHAP)相结合,构建knockoff特征统计量,产生有效且计算高效的差异统计量。其次,递归特征消除(RFE)过程去除边际贡献可省略的项,并通过knockoff扰动假设检验评估统计必要性。第三,最终模型选择被表述为一个多准则决策(MCDM)问题,其中最优控制方程是在预测精度、模型复杂度和系数不确定性等广泛准则之间取得最佳平衡的备选方案。我们在严重噪声污染下对五个经典PDE验证了KO-PDE-IDENT。实验结果表明,我们的框架可以精确恢复真实的PDE结构,消除错误发现同时保留所有真实潜在项,且系数估计误差低。

英文摘要

We propose KO-PDE-IDENT, a data-driven framework for identifying parsimonious partial differential equations (PDEs) with false discovery rate (FDR) control. PDE discovery from noisy observations is often hindered by extreme multicollinearity among candidate terms, which causes typical sparse-regression methods to select spurious terms. To address this problem, KO-PDE-IDENT initially mines a support set of potential candidate terms via model-X knockoff filters with finite-sample FDR control, then refines and ranks the surviving PDE alternatives. The framework integrates three components. First, knockoff feature statistics are constructed by coupling $\ell_{0}$-constrained adaptive best-subset selection with SHapley Additive exPlanations (SHAP), yielding an effective and computationally efficient difference statistic. Second, a recursive feature elimination (RFE) procedure removes terms whose marginal contributions are dispensable and assesses statistical necessity through knockoff-perturbed hypothesis testing. Third, the final model selection is formulated as a multi-criteria decision-making (MCDM) problem, where the optimal governing equation is the alternative that best balances a wide range of criteria such as predictive accuracy, model complexity and coefficient uncertainty. We evaluate KO-PDE-IDENT on five canonical PDEs under severe noise corruption. Empirical results show that our framework can exactly recover the true PDE structure, eliminating false discoveries while retaining all true underlying terms, with low coefficient estimation error.
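The FDR-controlled support selection above rests on the knockoff filter's data-dependent threshold. The sketch below implements the standard knockoff+ rule: given one statistic W_j per candidate term (large positive values favor the real term over its knockoff), pick the smallest threshold whose estimated false discovery proportion is at most q. The W values are invented for illustration; in the paper they come from best-subset selection combined with SHAP, which is not reproduced here.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    candidates = sorted({abs(v) for v in w if v != 0})
    for t in candidates:
        negatives = sum(1 for v in w if v <= -t)
        positives = sum(1 for v in w if v >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")  # no threshold meets the target FDR level

def select_terms(w, q):
    """Indices of candidate terms kept at target FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, v in enumerate(w) if v >= t]

# Invented statistics for seven candidate PDE terms.
w = [5.0, 4.0, 3.0, -1.0, 2.0, 0.5, -0.3]
print(select_terms(w, q=0.25))
print(select_terms(w, q=0.20))
```

Negative statistics act as a built-in estimate of how many false positives sit above the threshold, which is why a stricter q can empty the selection entirely when few candidates are available.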

2606.06133 2026-06-18 cs.SE cs.AI cs.LG cs.LO 版本更新

TLA-Prover: Verifiable TLA+ Specification Synthesis via Preference-Optimized Low-Rank Adaptation

TLA-Prover: 通过偏好优化低秩适配实现可验证的 TLA+ 规范合成

Eric Spencer, Arslan Bisharat, Brian Ortiz, Khushboo Bhadauria, TaiNing Wang, George K. Thiruvathukal, Konstantin Laufer, Mohammed Abuhamad

发表机构 * Department of Computer Science, Loyola University Chicago(洛约拉芝加哥大学计算机科学系)

AI总结 提出 TLA-Prover 模型,结合监督微调和基于修复的组相对策略优化,在 TLC 模型检查器上实现 TLA+ 规范合成,Gold/Diamond 级别通过率达 30%,约为未调优基线的 3.5 倍。

Comments 12 pages, 5 tables, 3 figures. Accepted at the 21st International Conference on Software Technologies (ICSOFT 2026)

详情
AI中文摘要

TLA+ 是一种用于验证分布式系统和安全关键协议的正式规范语言。大型语言模型(LLM)生成的 TLA+ 规范常常因语义原因无法通过 TLC 模型检查器。在 25 个 LLM 中,最佳公开基线的语法解析成功率为 26.6%,语义模型检查通过率为 8.6%。我们提出了 TLA-Prover,一个 200 亿参数的 TLA+ 规范合成模型。训练结合了在已验证示例上的监督微调(SFT)和基于修复的组相对策略优化(GRPO)。在 GRPO 阶段,模型学习修复自身被拒绝的规范。我们还从相同的 SFT 检查点训练了一个直接偏好优化(DPO)变体作为消融实验。TLC 直接提供奖励信号,无需学习奖励模型。每个输出分为四个等级:青铜(解析通过)、银(无警告)、金(通过 TLC)和钻石。要达到钻石级,模型的正确性属性会被自动微小修改;TLC 必须检测到违反。如果 TLC 仍然通过,则该属性始终为真且无贡献;输出无法达到钻石级。在一个保留的 30 问题基准上,TLA-Prover 在金级和钻石级均达到 9/30(即 pass@1 = 30%)。这大约是未调优基线 8.6% 的 3.5 倍。DPO 变体在钻石级达到 20%。金级和钻石级在每个检查点都一致;这防止了平凡属性失败模式。

英文摘要

TLA+ is a formal specification language for verifying distributed systems and safety-critical protocols. Large language models (LLMs) frequently produce TLA+ specifications that fail the TLC model checker for semantic reasons. Across 25 LLMs, the best public baseline is 26.6% syntactic parse and 8.6% semantic model-check. We present TLA-Prover, a 20-billion-parameter model for TLA+ specification synthesis. Training combines supervised fine-tuning (SFT) on verified examples with repair-based group-relative policy optimization (GRPO). In the GRPO stage, the model learns to fix its own rejected specifications. We also train a direct preference optimization (DPO) variant from the same SFT checkpoint as an ablation. TLC provides the reward signal directly, with no learned reward model. Four tiers grade each output: Bronze (parses), Silver (no warnings), Gold (passes TLC), and Diamond. To reach Diamond, the model's correctness property is automatically altered in a small way; TLC must then detect a violation. If TLC still passes, the property was always-true and contributes nothing; the output fails Diamond. TLA-Prover reaches 9/30 (i.e. pass@1 = 30%) at both Gold and Diamond on a held-out 30-problem benchmark. This is roughly 3.5x the 8.6% untuned baseline. The DPO variant reaches 20% at Diamond. Gold and Diamond coincide at every checkpoint; this prevents the trivial-property failure mode.
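The four-tier grading scheme above can be sketched as a chain of checks. The version below assumes the tiers are cumulative (each requires all earlier ones) and takes the four check outcomes as plain booleans; it does not invoke TLC or any TLA+ tooling, and the function and parameter names are hypothetical.

```python
TIERS = ["Fail", "Bronze", "Silver", "Gold", "Diamond"]

def grade(parses, no_warnings, passes_tlc, mutant_violation_detected):
    """Return the highest tier reached, assuming tiers are cumulative.

    mutant_violation_detected: True if TLC reports a violation after the
    correctness property is slightly altered, i.e. the property is not trivial.
    """
    tier = 0
    for ok in (parses, no_warnings, passes_tlc, mutant_violation_detected):
        if not ok:
            break
        tier += 1
    return TIERS[tier]

print(grade(True, True, True, False))  # passes TLC, but the property is vacuous
print(grade(True, True, True, True))

# Arithmetic behind the reported numbers.
pass_at_1 = 9 / 30
print(pass_at_1, round(pass_at_1 * 100 / 8.6, 2))
```

The last line reproduces the abstract's comparison: 30% against the 8.6% baseline is a ratio of about 3.49, reported as roughly 3.5x.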

2606.08206 2026-06-18 cs.CV cs.LG 版本更新

SegmentAnyTreeV2: Scaling Transformer-Based Tree Instance Segmentation Across Sensors, Platforms, and Forests

SegmentAnyTreeV2:跨传感器、平台和森林的基于Transformer的树木实例分割扩展

Maciej Wielgosz, Stefano Puliti, Rasmus Astrup

发表机构 * Norwegian Institute of Bioeconomy Research (NIBIO)(挪威生物经济研究所(NIBIO))

AI总结 提出SegmentAnyTreeV2,一种传感器和平台无关的森林点云语义与实例分割框架,结合Point Transformer v3骨干网络、轻量语义头和树木交叉注意力掩码解码器,在FOR-instance v3基准上达到90.5%精度和80.2%召回率,并展现出强跨域泛化能力。

Comments 25 pages, 6 figures, 10 tables, Corrected bibliography metadata and minor typographical issues; results unchanged

详情
AI中文摘要

我们提出SegmentAnyTreeV2,一种传感器和平台无关的森林点云语义与实例分割框架。该模型结合了基于序列化的Point Transformer v3骨干网络、轻量级语义头以及专注于树木的交叉注意力掩码解码器。语义预测将实例解码限制在树木类体素上,而实例感知的查询初始化、一对多种子监督和非对称掩码评分改善了密集和结构复杂林分中的分离效果。我们进一步引入了FOR-instance v3,一个扩展的基准数据集,包含427个场景和26,496棵标注树木,涵盖不同生物群落、森林结构和LiDAR平台。在FOR-instanceV2测试集上,SegmentAnyTreeV2实现了90.5%的精度、80.2%的召回率、85.0%的F1分数、90.7%的覆盖率和87.6%的语义mIoU,在实例检测和掩码完整性方面均优于以往基于学习的方法。在独立站点上的零样本评估进一步证明了其强大的跨域泛化能力。

英文摘要

We present SegmentAnyTreeV2, a sensor- and platform-agnostic framework for semantic and instance segmentation of forest point clouds. The model combines a serialization-based Point Transformer v3 backbone with a lightweight semantic head and a tree-focused cross-attention mask decoder. Semantic predictions restrict instance decoding to tree-class voxels, while instance-aware query initialization, one-to-many seed supervision, and asymmetric mask scoring improve separation in dense and structurally complex stands. We further introduce FOR-instance v3, an expanded benchmark comprising 427 scenes and 26,496 annotated trees across diverse biomes, forest structures, and LiDAR platforms. On the FOR-instanceV2 test split, SegmentAnyTreeV2 achieves 90.5% precision, 80.2% recall, 85.0% F1, 90.7% coverage, and 87.6% semantic mIoU, outperforming previous learning-based methods in both instance detection and mask completeness. Zero-shot evaluation on independent sites further demonstrates strong cross-domain generalization.
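As a quick consistency check on the reported detection metrics, F1 is the harmonic mean of precision and recall. The snippet assumes the reported F1 is computed from the same aggregate precision and recall, which the abstract does not state explicitly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

print(round(100 * f1_score(0.905, 0.802), 1))  # 85.0
```

The result agrees with the 85.0% F1 quoted in the abstract.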

2606.11615 2026-06-18 cs.CV cs.CR cs.LG 版本更新

Adv-TGD: Adversarial Text-Guided Diffusion for Face Recognition Impersonation Attacks

Adv-TGD:面向人脸识别冒充攻击的对抗性文本引导扩散

Omid Ahmadieh, Nima Karimian

发表机构 * University of South Florida, Bellini College of Artificial Intelligence, Cybersecurity and Computing(南佛罗里达大学贝利尼人工智能、网络安全与计算学院)

AI总结 提出Adv-TGD框架,利用Stable Diffusion和LoRA微调生成逼真对抗人脸,在保持视觉质量的同时实现高成功率身份冒充攻击,平均ASR达85.90%。

详情
AI中文摘要

人脸识别(FR)技术的广泛普及引发了严重的隐私担忧,因为面部数据可能在未经同意的情况下被利用。为了解决这一挑战,我们提出了Adv-TGD,一个生成式对抗攻击框架,能够合成逼真的人脸,冒充目标身份并欺骗人脸识别系统。基于Stable Diffusion v2.1,Adv-TGD对每个样本进行LoRA微调,以简洁的文本提示为条件,生成自然但具有对抗性操控的身份。与传统的身份攻击方法不同,我们的方法在固定时间步的去噪过程中为每个源-目标对优化轻量级交叉注意力适配器。潜在混合受到面部局部热图掩码的约束,以确保空间精确的身份操控,同时保留非敏感区域。我们引入了一个复合目标,结合了掩码epsilon-MSE重建、FR嵌入空间中的阈值化身份差异、方向特征对齐和源相似性抑制,以平衡对抗攻击和视觉真实性。可选地,LLaVA生成的属性提示增强了细粒度语义细节,而不会重新引入身份线索。在黑盒评估协议下,Adv-TGD在IR152、IRSE50、MobileFace和FaceNet上平均攻击成功率(ASR)达到85.90%,分别比语义SOTA基线Adv-CPG、基于扩散的化妆方法DiffAIM和基于噪声的P3-Mask高出6.25、3和16个百分点。尽管攻击效果强劲,Adv-TGD仍保持了高视觉保真度(PSNR = 28.18 dB,SSIM = 0.981)。此外,我们通过成功将其扩展到野外数据集(LADN)、通用对象分类(ImageNet)和基于Transformer的扩散模型(FLUX.1),展示了我们框架的灵活性。

英文摘要

The widespread adoption of face recognition (FR) technologies raises serious privacy concerns, as facial data can be exploited without consent. To address this challenge, we propose Adv-TGD, a generative adversarial attack framework that synthesizes photorealistic faces capable of impersonating target identities and deceiving face recognition systems. Built upon Stable Diffusion v2.1, Adv-TGD performs per-sample LoRA fine-tuning conditioned on concise textual prompts to generate natural yet adversarially manipulated identities. Unlike conventional identity attack approaches, our method optimizes lightweight cross-attention adapters for each source-target pair within a fixed-timestep denoising process. Latent blending is constrained by a face-local heatmap mask to ensure spatially precise identity manipulation while preserving non-sensitive regions. We introduce a composite objective that integrates masked epsilon-MSE reconstruction, thresholded identity divergence in FR embedding space, directional feature alignment, and source-similarity suppression to balance adversarial attack and visual realism. Optionally, LLaVA-generated attribute prompts enhance fine-grained semantic details without reintroducing identity cues. Under the black-box evaluation protocol, Adv-TGD attains an average attack success rate (ASR) of 85.90% across IR152, IRSE50, MobileFace, and FaceNet, surpassing the semantic SOTA baseline Adv-CPG by 6.25 points, the diffusion-based makeup method DiffAIM by 3 points, and the noise-based P3-Mask by 16 points. Despite its strong attack efficacy, Adv-TGD preserves high visual fidelity (PSNR = 28.18 dB, SSIM = 0.981). Furthermore, we demonstrate the flexibility of our framework by successfully extending it to in-the-wild datasets (LADN), general object classification (ImageNet), and transformer-based diffusion models (FLUX.1).

2606.12816 2026-06-18 quant-ph cs.ET cs.LG 版本更新

Graph Reinforcement Learning for Calibration-Aware Quantum Circuit Routing

图强化学习用于校准感知的量子电路路由

Yash Vardhan Tomar, Dheeraj Peddireddy

发表机构 * University of California, Berkeley(加州大学伯克利分校) National Institute of Standards and Technology(国家标准与技术研究院)

AI总结 提出一种利用图强化学习进行校准感知的量子电路路由方法,通过IBM Heron r2校准数据选择SWAP操作,在MQT Bench电路上平均保真度达0.727,优于SABRE-best20的0.440。

详情
AI中文摘要

量子电路路由是在为噪声中等规模量子处理器编译程序时的关键步骤。通过标准开销指标看似高效的路由,在通过校准不良的耦合器时仍可能损失保真度。我们研究了一种校准感知的图强化学习路由器,该路由器使用当天的IBM Heron r2校准数据来选择硬件边缘SWAP。我们使用近端策略优化训练策略,并通过九个慕尼黑量子工具包(MQT)基准电路和三个校准快照的精确模拟保真度进行评估。在这些评估中,合并的平均精确保真度为$0.727$,而SABRE-best20为$0.440$,目标感知SABRE为$0.481$。保真度增益伴随着更高的路由双量子比特计数,并集中在5q和8q电路系列中;在固定树动作图下,所有10q系列都倾向于SABRE-best20。总体而言,我们的结果表明,校准感知的学习路由可以超越基于门计数的编译,提高保真度。

英文摘要

Quantum circuit routing is a key step in compiling programs for noisy intermediate-scale quantum processors. Routes that appear efficient by standard overhead metrics can still lose fidelity when they pass through poorly calibrated couplers. We study a calibration-aware graph reinforcement-learning router that uses same-day IBM Heron r2 calibration data to choose hardware-edge SWAPs. We train the policy with proximal policy optimization and evaluate it with exact simulated fidelity across nine Munich Quantum Toolkit (MQT) Bench circuits and three calibration snapshots. Across these evaluations, pooled mean exact fidelity is $0.727$, compared with $0.440$ for SABRE-best20 and $0.481$ for target-aware SABRE. We observed that fidelity gains came with higher routed two-qubit counts and were concentrated in 5 qubit and 8 qubit circuit families; under the fixed tree action graph, all 10 qubit families favored SABRE-best20. Overall, our results show that calibration-aware learned routing can improve fidelity beyond gate-count-driven compilation.
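The point that a route can look cheap by gate count yet lose fidelity on a poorly calibrated coupler can be illustrated with a simple multiplicative error model. The sketch assumes independent gate errors and that each SWAP decomposes into three native two-qubit gates; the error rates are invented, and this is not the paper's reward function or simulator.

```python
def swap_path_fidelity(edge_errors, gates_per_swap=3):
    """Estimated success probability of SWAPs along a path of couplers.

    edge_errors: two-qubit gate error rate of each coupler used, one SWAP each.
    """
    fidelity = 1.0
    for err in edge_errors:
        fidelity *= (1.0 - err) ** gates_per_swap
    return fidelity

short_route = [0.05]         # one SWAP over a poorly calibrated coupler
long_route = [0.005, 0.005]  # two SWAPs over well calibrated couplers

print(swap_path_fidelity(short_route))
print(swap_path_fidelity(long_route))
```

Under these assumed rates the longer route keeps more fidelity despite using twice as many SWAPs, which matches the abstract's observation that fidelity gains came with higher routed two-qubit counts.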

2606.17276 2026-06-18 cs.IR cs.LG 版本更新

On the Memorization Behavior of LLMs in Generative Recommendation: Observations, Implications, and Training Strategies

LLM在生成式推荐中的记忆行为:观察、启示与训练策略

Sunwoo Kim, Sunkyung Lee, Clark Mingxuan Ju, Donald Loveland, Bhuvesh Kumar, Kijung Shin, Neil Shah, Liam Collins

发表机构 * KAIST(韩国科学技术院) Sungkyunkwan University(成均馆大学) Snap Inc.(Snap公司)

AI总结 研究LLM在生成式推荐中的记忆倾向,发现其过度依赖一跳记忆,提出IIRG训练策略以学习多跳协同与语义关系,显著提升对非一跳记忆用户的推荐效果。

详情
AI中文摘要

生成式推荐(GR)已成为推荐系统的一个有前景的方向。最近,大型语言模型(LLM)越来越多地被用于GR,因为其丰富的预训练知识有望帮助它们泛化到传统以记忆为导向的基线所能捕捉的常见用户行为模式之外。然而,现有的基于LLM的GR工作很大程度上忽略了LLM众所周知的记忆倾向,如果这种倾向存在于为GR微调的LLM中,将限制它们对预训练知识的利用。在这项工作中,我们通过检查一跳记忆(即模型推荐训练数据中项目的直接后继项目)来研究这一担忧。我们表明,LLM比非LLM的GR模型更频繁地这样做——事实上,它们相对于GR基线的大部分增益实际上来自那些目标项目可以通过一跳记忆预测的用户。我们直觉认为,提高剩余用户的性能需要LLM学习更丰富的项目-项目关系,超越一跳转换。为此,我们提出了IIRG,一种新颖的训练策略,教导LLM捕获:(1)从用户序列中跨多跳的项目共现导出的协同关系,以及(2)具有相似主题的项目之间的语义关系,这两者都可以作为有用的推荐信号。我们表明,IIRG显著优于仅使用标准下一项目预测训练的LLM,尤其是对于那些测试项目在训练时的一跳转换中未覆盖的用户,增益尤为显著。

英文摘要

Generative recommendation (GR) has emerged as a promising direction for recommender systems. Recently, large language models (LLMs) have been increasingly adopted for GR, as their rich pretrained knowledge is expected to help them generalize beyond common user behavior patterns that traditional memorization-oriented baselines can capture. However, existing LLM-based GR works largely ignore LLMs' well-known tendency to memorize, which, if present in LLMs fine-tuned for GR, would restrict their utilization of pretrained knowledge. In this work, we investigate this concern by examining one-hop memorization, where a model recommends items that are direct successors of items in the training data. We show that LLMs do this more than non-LLM-based GR models; in fact, the vast majority of their gains over GR baselines are actually on users whose target items can be predicted through one-hop memorization. We intuit that improving performance on the remaining users requires LLMs to learn richer item-item relations beyond one-hop transitions. To achieve this, we propose IIRG, a novel training strategy that teaches LLMs to capture: (1) collaborative relations derived from item co-occurrences across multiple hops in user sequences, and (2) semantic relations among items with similar themes, both of which can serve as useful recommendation signals. We show that IIRG significantly improves over LLMs trained solely with standard next-item prediction, with especially large gains for users whose test items are not covered by train-time one-hop transitions.
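The notion of one-hop coverage used above can be made concrete with a small sketch. It collects every directly consecutive item pair from the training sequences and then checks whether a test user's target item directly followed that user's most recent item somewhere in training. Anchoring the check on the last history item is an assumption made here for illustration; the paper's exact definition may differ.

```python
def one_hop_transitions(train_sequences):
    """Set of (item, next_item) pairs that occur consecutively in training."""
    return {(a, b) for seq in train_sequences for a, b in zip(seq, seq[1:])}

def is_one_hop_covered(history, target, transitions):
    """True if `target` directly followed the user's last item in training data."""
    return bool(history) and (history[-1], target) in transitions

train = [["a", "b", "c"], ["b", "d"]]
transitions = one_hop_transitions(train)

print(is_one_hop_covered(["x", "b"], "d", transitions))  # True: b -> d was seen
print(is_one_hop_covered(["x", "a"], "c", transitions))  # False: a -> c needs two hops
```

Splitting test users by this flag separates the cases a memorizing model can answer from those that require multi-hop or semantic item relations.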

2606.17846 2026-06-18 cs.RO cs.CV cs.LG 版本更新

Qwen-RobotManip Technical Report: Alignment Unlocks Scale for Robotic Manipulation Foundation Models

Qwen-RobotManip 技术报告:对齐解锁机器人操作基础模型的规模

Haoqi Yuan, Zhixuan Liang, Anzhe Chen, Ye Wang, Haoyang Li, Pei Lin, Yiyang Huang, Zixing Lei, Tong Zhang, Jiazhao Zhang, Jie Zhang, Jingyang Fan, Gengze Zhou, Qihang Peng, Chenxu Lv, Xiaoyue Chen, An Yang, Fei Huang, Junyang Lin, Dayiheng Liu, Jingren Zhou, Chenfei Wu, Xiong-Hui Chen

发表机构 * Qwen Team(Qwen团队)

AI总结 提出 Qwen-RobotManip,通过统一的对齐框架(表示、运动和行为维度)实现多源异构操作数据的大规模协同训练,构建约38,100小时预训练语料,在零样本指令跟随、跨本体迁移等泛化能力上超越先前模型。

Comments 44 pages

详情
AI中文摘要

语言和多模态基础模型通过统一公式对齐异构数据并大规模训练,实现了强大的泛化能力。在本报告中,我们研究这种扩展方法是否可以应用于机器人操作以实现真正的泛化。这具有挑战性,因为与文本不同,操作数据本质上是异构的、收集成本高且多样性狭窄,使得对齐和规模同时变得困难。我们提出了 Qwen-RobotManip,一个基于 Qwen-VL 构建的可泛化视觉-语言-动作基础模型。Qwen-RobotManip 引入了一个跨操作表示、运动和行为维度的统一对齐框架,使大规模多源训练变得一致而非冲突。这种对齐能力进而使 Qwen-RobotManip 能够吸收以前训练方案无法维持规模的操作数据。一个人到机器人合成流水线将第一人称手部演示转换为跨15个平台的机器人轨迹,一个严格的策展流水线协调异构数据集。仅使用开源数据集和人类视频,无需专有数据收集,Qwen-RobotManip 构建了约38,100小时的预训练语料,并展现出涌现的泛化能力,包括零样本指令跟随、对扰动的鲁棒性、反应性错误恢复和跨本体迁移。我们发现标准基准无法捕捉预训练质量,因此采用了包括 RoboCasa365、LIBERO-Plus、EBench、RoboTwin-Clean2Rand、RoboTwin-IF 和 RoboTwin-XE 在内的 OOD 设置。Qwen-RobotManip 在所有 OOD 设置中显著优于先前最先进的模型(包括 π0.5),在 RoboChallenge 中排名第一,相对改进20%,并在包括 AgileX ALOHA、Franka、UR 和 ARX 在内的真实机器人平台上得到验证。

英文摘要

Foundation models in language and multimodality achieve strong generalization by aligning heterogeneous data under a unified formulation and training at scale. In this report, we investigate whether this scaling recipe can be applied to robotic manipulation to achieve genuine generalization. This is challenging because, unlike text, manipulation data is heterogeneous by nature, expensive to collect, and narrow in diversity, making alignment and scale simultaneously difficult. We present Qwen-RobotManip, a generalizable Vision-Language-Action foundation model built on Qwen-VL. Qwen-RobotManip introduces a unified alignment framework across the representation, motion, and behavioral dimensions of manipulation, making large-scale multi-source training coherent rather than conflicting. This alignment capability in turn enables Qwen-RobotManip to absorb manipulation data at a scale that prior training regimes could not sustain. A human-to-robot synthesis pipeline converts egocentric hand demonstrations into robot trajectories across 15 platforms, and a rigorous curation pipeline harmonizes heterogeneous datasets. Using only open-source datasets and human videos without proprietary data collection, Qwen-RobotManip constructs a ~38,100-hour pretraining corpus and exhibits emergent generalization capabilities, including zero-shot instruction following, robustness to perturbations, reactive error recovery, and cross-embodiment transfer. We find that standard benchmarks fail to capture pretraining quality and instead adopt OOD settings including RoboCasa365, LIBERO-Plus, EBench, RoboTwin-Clean2Rand, RoboTwin-IF, and RoboTwin-XE. Qwen-RobotManip substantially outperforms prior state-of-the-art models, including $\pi_{0.5}$, across all OOD settings, ranks 1st in RoboChallenge with a 20% relative improvement, and is validated on real-robot platforms including AgileX ALOHA, Franka, UR, and ARX.

2606.18105 2026-06-18 cs.NI cs.LG 版本更新

OmniPlan: An Adaptive Framework for Timely and Near-Optimal Network Planning Optimization

OmniPlan:一种用于及时且近乎最优的网络规划优化的自适应框架

Longlong Zhu, Jiashuo Yu, Zedi Chen, Yuhan Wu, Zhifan Jiang, Yuchen Xian, Yimeng Liu, Jiajie Su, Shaopeng Zhou, Xingyuan Li, Hongyan Liu, Xuan Liu, Dong Zhang, Chunming Wu, Xiang Chen

发表机构 * Zhejiang University(浙江大学) Fuzhou University(福州大学) Yangzhou University(扬州大学) The State Key Laboratory of Blockchain and Data Security(区块链与数据安全国家重点实验室) College of Computer Science and Technology(计算机科学与技术学院)

AI总结 提出OmniPlan自适应框架,利用大语言模型解析用户意图,通过混合专家架构动态选择MIP求解器、启发式算法或深度强化学习模型,实现网络规划优化的及时性与近乎最优性,在分布式机器学习推理卸载任务中延迟降低97.8%,资源消耗降低11.5%。

Comments Accepted by ACM KDD 2026

详情
AI中文摘要

网络规划优化是跨多个领域(包括交通系统、通信网络和电网)的基本问题。它需要在复杂约束下同时优化多个相互竞争的目标。现有的网络规划优化框架依赖混合整数规划(MIP)求解器、启发式算法和深度强化学习(DRL)模型来计算规划决策。然而,它们缺乏对多样化和动态用户意图的有效适应性,从而导致执行时间与最优性之间的权衡。在本文中,我们提出OmniPlan,一种自适应框架,在网络规划优化中同时实现及时性和近乎最优性。为了实现现有解决方案所缺乏的适应性,OmniPlan采用基于大语言模型(LLM)的解释器,将异构的自然语言意图转换为统一且可量化的用户偏好向量。然后,它采用混合专家架构,集成MIP求解器、启发式算法和DRL模型作为专门专家,OmniPlan通过动态选择及时且近乎最优的专家来适应多样化的意图。最后,它包含一个基于DRL的专家配置模块,该模块微调优化目标权重,使规划决策与用户特定偏好对齐。我们使用代表性的真实工作负载(即分布式机器学习(ML))评估OmniPlan,其中我们利用OmniPlan将广泛的ML推理任务(例如决策树、SVM、朴素贝叶斯、XGBoost和随机森林)卸载到硬件设备网络。我们在真实测试平台上的实验表明,OmniPlan为真实ML推理任务实现了近乎最优且低执行时间的卸载,延迟降低高达97.8%,网络设备资源消耗降低高达11.5%。

英文摘要

Network planning optimization is a fundamental problem across diverse domains, including transportation systems, communication networks, and power grids. It requires simultaneous optimization of multiple competing objectives under complex constraints. Existing network planning optimization frameworks rely on mixed integer programming (MIP) solvers, heuristics, and deep reinforcement learning (DRL) models to compute planning decisions. However, they lack effective adaptability to diverse and dynamic user intents, thus leading to the trade-off between execution time and optimality. In this paper, we propose OmniPlan, an adaptive framework that achieves both timeliness and near-optimality in network planning optimization. To achieve the adaptability lacking in existing solutions, OmniPlan employs a large language model (LLM)-based interpreter to convert heterogeneous natural-language intents into a unified and quantifiable user-preference vector. Then it employs a mixture-of-experts architecture that integrates MIP solvers, heuristics, and DRL models as specialized experts, where OmniPlan adapts to diverse intents by dynamically selecting timely and near-optimal experts. Finally, it incorporates a DRL-based expert configuration module that fine-tunes optimization objective weights to align planning decisions with user-specific preferences. We evaluate OmniPlan with a representative real-world workload, i.e., distributed machine learning (ML), where we leverage OmniPlan to offload a wide spectrum of ML inference tasks, e.g., decision trees, SVM, naive Bayes, XGBoost, and random forests, onto a network of hardware devices. Our experiments on a real-world testbed indicate that OmniPlan achieves near-optimal and low-execution-time offloading for real-world ML inference tasks, reducing latency by up to 97.8\% and network device resource consumption by up to 11.5\%.

13. 其他/综合机器学习 22 篇

2606.19317 2026-06-18 cs.LG cs.AI 新提交

Explaining Attention with Program Synthesis

用程序合成解释注意力机制

Amiri Hayes, Belinda Li, Jacob Andreas

发表机构 * NJIT(新泽西理工学院) MIT EECS(麻省理工学院电气工程与计算机科学系) MIT CSAIL(麻省理工学院计算机科学与人工智能实验室)

AI总结 提出用可执行程序近似深度网络组件行为的方法,针对Transformer注意力头,通过生成Python程序再现注意力模式,实现可解释性。

详情
AI中文摘要

可解释深度学习研究的一个长期目标是,用人类可理解的符号描述取代不透明的神经计算。本文提出了一种用可执行程序近似深度网络组件行为的方法。我们专注于Transformer语言模型中的注意力头。对于给定的注意力头,我们首先在一组随机选择的训练样本上计算其关联的注意力矩阵。接着,我们向预训练语言模型提供这些矩阵的摘要,并指示它生成一组Python程序,这些程序仅根据输入句子中的文本即可再现相关的注意力模式。最后,我们根据最终程序集在保留输入上预测行为的效果对程序进行重新排序。我们证明,少于1000个这样的生成程序即可再现GPT-2、TinyLlama-1.1B和Llama-3B中注意力头的注意力模式,在TinyStories上平均交并比相似度超过75%。此外,最佳匹配程序可以替代神经注意力头而不会显著影响模型行为:在三个模型中用程序替代25%的注意力头仅导致平均困惑度增加16%,同时在各种下游问答基准上保持性能。这项工作为使用人类可读、可执行的代码逆向工程Transformer模型中的注意力头提供了一个可扩展的流程,推动了神经模型向符号透明性的发展。

英文摘要

A longstanding goal of research on interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. In this paper, we propose an approach for approximating the behavior of components of deep networks with executable programs. We focus on attention heads in transformer language models. For a given head, we first compute its associated attention matrices on a collection of randomly selected training examples. Next, we prompt a pre-trained language model with a summary of these matrices, and instruct it to generate a set of Python programs that can reproduce the associated attention patterns given only text from the input sentence. Finally, we re-rank programs according to how well our final set of programs predict behavior on held-out inputs. We demonstrate that a set of fewer than 1,000 such generated programs can reproduce the attention patterns of heads in GPT-2, TinyLlama-1.1B, and Llama-3B, achieving an average Intersection-over-Union similarity above 75% on TinyStories. Moreover, the best-fit programs can replace neural attention heads without substantially affecting model behavior: replacing 25% of attention heads with programmatic surrogates across the three models incurs only a 16% average perplexity increase, while maintaining performance on a variety of downstream question answering benchmarks. This work contributes a scalable pipeline for reverse-engineering attention heads in transformer models using human-readable, executable code, advancing a path toward symbolic transparency in neural models.
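The evaluation loop described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how attention matrices are binarized or exactly how the Intersection-over-Union is computed, so `binarize`, its 0.5 threshold, and the `previous_token_program` heuristic are hypothetical stand-ins and not the authors' pipeline.

```python
def previous_token_program(tokens):
    """Hypothetical candidate program: each token attends to the token before it.

    Returns a set of (query_index, key_index) pairs; token 0 attends to itself.
    """
    return {(i, max(i - 1, 0)) for i in range(len(tokens))}


def binarize(attention, threshold=0.5):
    """Turn an attention matrix into the set of (query, key) pairs above a threshold.

    The threshold is an assumption made for this sketch.
    """
    return {
        (i, j)
        for i, row in enumerate(attention)
        for j, weight in enumerate(row)
        if weight >= threshold
    }


def iou(predicted, observed):
    """Intersection-over-Union of two sets of (query, key) pairs."""
    union = predicted | observed
    return len(predicted & observed) / len(union) if union else 1.0


tokens = ["the", "cat", "sat"]
# Toy causal attention matrix (rows sum to 1); not taken from any real model.
attention = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.2, 0.1, 0.7],
]
score = iou(previous_token_program(tokens), binarize(attention))
print(score)  # 0.5: two of the four distinct pairs are shared
```

A re-ranking step in this spirit would compute such a score for every generated program on held-out sentences and keep the best-scoring ones.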

2606.18535 2026-06-18 stat.ME cs.LG math.ST stat.TH 交叉投稿

Shrinkage priors for Bayesian Substitute Confounders

贝叶斯替代混杂因子的收缩先验

Yordan P. Raykov, Hengrui Luo, Justin D. Strait, Wasiur R. KhudaBukhsh

发表机构 * School of Mathematical Sciences, University of Nottingham, Nottingham, UK(诺丁汉大学数学科学学院) Department of Statistics, Rice University, USA(莱斯大学统计学系) Lawrence Berkeley National Laboratory, USA(劳伦斯伯克利国家实验室) Statistical Sciences Group, Los Alamos National Laboratory, USA(洛斯阿拉莫斯国家实验室统计科学组)

AI总结 针对多原因观察研究中替代混杂因子过度编码问题,提出贝叶斯因子分配框架,利用收缩先验学习稀疏替代混杂因子,保持粗粒度多原因依赖,并证明后验集中性和重叠保持几何性质,实现潜在结果的一致性估计。

详情
AI中文摘要

多原因观察研究通过原因间的依赖结构包含关于未测量混杂的信息。然而,对未观测混杂的直接插补通常比学习一个低维替代得分更复杂,该得分保留了稳定因果调整所需的共享分配变异。去混杂因子(Wang and Blei, 2019)及相关替代混杂因子方法利用了这一思想,但灵活的分配模型可以拟合原因的联合分布,同时产生过度编码处理向量、破坏重叠或捕获单原因变异的得分。我们开发了一个贝叶斯因子分配框架,用于学习稀疏替代混杂因子,该框架通过收缩先验保留粗粒度的多原因依赖。该理论在后验集中性、因子得分收缩和保留重叠的分配几何层面进行阐述,因此不依赖于特定的收缩先验。在这些条件下,当相应的潜变量识别假设成立时,所提出的回归调整估计量对平均潜在结果是一致的。收缩先验为潜在结构学习提供了自然工具:它们倾向于由多个原因支持的低维因子,阻止有效的单原因因子,并通过渐进收缩诱导潜在因子的排序。合成实验说明了信号强度、结果有效性和几何感知正则化的作用。在阿尔茨海默病神经影像学倡议(ADNI)基线分析中,稀疏替代得分恢复了对侵入性脑脊液生物标志物直接条件调整的大部分效果,而重叠崩溃诊断则识别出拟合因子何时简化为单个观测测量。

英文摘要

Multi-cause observational studies contain information about unmeasured confounding through the dependence structure among causes. However, literal imputation of the unobserved confounder is often more complex than learning a lower-dimensional substitute score that preserves the shared assignment variation needed for stable causal adjustment. The deconfounder (Wang and Blei, 2019) and related substitute confounder methods exploit this idea, but flexible assignment models can fit the joint distribution of the causes while producing scores that over-encode the treatment vector, collapse overlap, or capture single-cause variation. We develop a Bayesian factor assignment framework for learning sparse substitute confounders that retain coarse multi-cause dependence with shrinkage priors. The theory is stated at the level of posterior concentration, factor score contraction, and overlap-preserving assignment geometry and therefore does not rely on a particular shrinkage prior. Under these conditions, the proposed regression-adjusted estimators are consistent for mean potential outcomes when the corresponding latent variable identification assumptions hold. Shrinkage priors provide a natural tool for latent structural learning: they favour low-dimensional factors supported by multiple causes, discourage effectively single-cause factors, and induce an ordering of the latent factors through progressive shrinkage. Synthetic experiments illustrate the roles of signal strength, outcome validity, and geometry-aware regularization. In an Alzheimer's Disease Neuroimaging Initiative (ADNI) baseline analysis, sparse substitute scores recover much of the adjustment obtained by directly conditioning on invasive cerebrospinal-fluid biomarkers, while collapse diagnostics identify when fitted factors reduce to individual observed measurements.

2606.19270 2026-06-18 eess.IV cs.LG physics.med-ph 交叉投稿

Beyond Algorithms: Conceptual Innovation in Medical Imaging AI

超越算法:医学影像人工智能中的概念创新

Mark A. Anastasio

发表机构 * Mallinckrodt Institute of Radiology and Department of Electrical & Systems Engineering, Washington University in St. Louis(马林克罗德特放射医学研究所和电气与系统工程系,华盛顿大学圣路易斯分校)

AI总结 本文区分算法创新与概念创新,指出当前激励结构过度奖励算法新颖性而忽视概念贡献,通过医学影像AI案例展示概念不足导致的错位目标与有限临床影响,并提出促进概念创新的建议。

详情
AI中文摘要

人工智能推动了医学影像研究的快速发展,产生了日益复杂的算法,并在基准任务上稳步改进。然而,这种以算法为中心的发展轨迹也揭示了一个日益加剧的不平衡:虽然计算方法快速进步,但定义成像任务、评估指标和临床意义的概念基础有时仍未得到充分审视。在这篇观点文章中,我们区分了算法创新(专注于在固定问题定义内改进计算实现和性能)与概念创新(重新定义提出的问题、衡量成功的方式以及方法在临床上的相关性)。我们认为,当前的激励结构、培训路径和发表规范不成比例地奖励算法新颖性,尤其是对早期职业研究者而言,而有时低估了对科学成熟和临床转化至关重要的概念贡献。通过医学影像AI的代表性例子,我们展示了概念基础不足如何导致目标错位、泛化脆弱以及现实世界影响有限。最后,我们为研究者、导师、审稿人和期刊提出了可操作的建议,以更好地识别、支持和整合概念创新与算法进步。

英文摘要

Artificial intelligence has driven rapid progress in medical imaging research, producing increasingly sophisticated algorithms and steady improvements on benchmark tasks. However, this algorithm-centric trajectory has also revealed a growing imbalance: while computational methods advance rapidly, the conceptual foundations that define imaging tasks, evaluation metrics, and clinical meaning sometimes remain underexamined. In this Perspective, we distinguish algorithmic innovation, which focuses on improving computational implementations and performance within a fixed problem definition, from conceptual innovation, which reframes what problems are posed, how success is measured, and why an approach is clinically relevant. We argue that prevailing incentive structures, training pathways, and publication norms disproportionately reward algorithmic novelty, particularly for early-career researchers, while at times undervaluing conceptual contributions that are essential for scientific maturation and clinical translation. Through representative examples from medical imaging AI, we show how insufficient conceptual grounding can lead to misaligned objectives, fragile generalization, and limited real-world impact. We conclude with actionable recommendations for researchers, mentors, reviewers, and journals to better recognize, support, and integrate conceptual innovation alongside algorithmic advances.

2412.16468 2026-06-18 cs.LG 版本更新

The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

通往人工超级智能之路:超级对齐的全面综述

HyunJin Kim, DongHyun Ryu, Xiaoyuan Yi, Jing Yao, Jianxun Lian, Muhua Huang, Shitong Duan, JinYeong Bak, Xing Xie

发表机构 * Microsoft Research Asia(微软亚洲研究院) Sungkyunkwan University(成均馆大学) Stanford University(斯坦福大学) Fudan University(复旦大学)

AI总结 本文综述了超级对齐问题,通过分析可扩展监督范式(夹层、自我增强和弱到强泛化)及其局限性,探讨了监督、控制和管理人工超级智能的挑战与路径。

Comments 24 pages

详情
AI中文摘要

大型语言模型(LLMs)的出现引发了关于人工超级智能(ASI)的讨论,这是一种假设性的、超越人类智能的AI系统。尽管ASI仍处于假设阶段且远超出当前AI能力,但讨论其潜力、探索其可行性和潜在风险对于未来AI系统的发展至关重要。超级对齐的概念源于可扩展监督,后者研究当直接人类监督不足时如何监督日益强大的AI系统。本文聚焦于超级对齐问题:“监督、控制和管理人工超级智能的过程”。我们首先回顾可扩展监督范式——夹层、自我增强和弱到强泛化,然后通过可能性和不可能性的视角分析当前范式的局限性,讨论关键挑战,并提出未来AI系统安全持续改进的路径。

英文摘要

The emergence of large language models (LLMs) has sparked discussion on Artificial Superintelligence (ASI), a hypothetical AI system that surpasses human intelligence. Although ASI remains hypothetical and far beyond current AI capabilities, discussing its potential and exploring its feasibility and potential risks is critical for the development of future AI systems. The idea of superalignment originates from scalable oversight, which studies how to supervise increasingly capable AI systems when direct human supervision becomes insufficient. In this paper, we focus on the superalignment problem: "The process of supervising, controlling, and governing artificial superintelligence." We first review scalable oversight paradigms -- Sandwiching, Self-Enhancement, and Weak-to-Strong Generalization -- then analyze the limitations of current paradigms through the lens of possibility and impossibility, discuss key challenges, and propose pathways for the safe and continual improvement of future AI systems.

2605.08934 2026-06-18 cs.LG 版本更新

From Mechanistic to Compositional Interpretability

从机制到组合可解释性

Ward Gauderis, Thomas Dooms, Steven T. Homer, Kola Ayonrinde, Geraint A. Wiggins

发表机构 * UK AI Security Institute(英国人工智能安全研究所)

AI总结 本文提出组合可解释性框架,通过范畴论原理解决机制可解释性无法客观验证的问题,将解释质量分解为忠实度和复杂度,引入压缩细化方法实现模型简化,理论证明简洁性准则保障人类对齐的解释。

详情
AI中文摘要

机制可解释性旨在将神经模型学到的计算结构逆向工程为人类可理解的组件,从而解释模型行为;但由于缺乏形式化框架,机制解释无法被客观地验证、比较或组合。本文引入组合可解释性,这是一个基于组合性和最小描述长度原则的范畴论框架。组合解释由一对语法映射和语义映射构成,二者必须可交换,以保证模型分解与其观测行为之间的一致性。我们将解释质量分解为忠实度和复杂度两个度量,从而把可解释性表述为约束优化问题,并引入压缩细化方法,在不改变模型功能的前提下将其系统地重构为更简单的部分。最后,我们推导出一个简约性准则,在该准则下,语法压缩在理论上能保证得到更简洁、与人类对齐的解释。该框架将主流的机制方法归为细化的子类,并阐明了为何它们的可压缩性启发式往往与人类可解释性一致。本文为自动化发现和评估机制解释提供了可度量、可优化的蓝图。

英文摘要

Mechanistic interpretability aims to explain neural model behaviour by reverse-engineering learned computational structure into human-understandable components. Without a formal framework, however, mechanistic explanations cannot be objectively verified, compared, or composed. We introduce compositional interpretability, a category-theoretic framework grounded in the principles of compositionality and minimum description length. Compositional interpretations are pairs of syntactic and semantic mappings that must commute to enforce consistency between a model's decomposition and its observed behaviour. We deconstruct explanation quality into measures of faithfulness and complexity to cast interpretability as a constrained optimisation problem, and introduce compressive refinement to systematically restructure models into simpler parts without altering their function. Finally, we derive a parsimony criterion under which syntactic compression theoretically guarantees more concise, human-aligned explanations. Our framework situates prominent mechanistic methods as subclasses of refinement, and clarifies why their compressibility heuristics tend to align with human interpretability. Our work provides a measurable, optimisable blueprint for automating the discovery and evaluation of mechanistic explanations.

2410.21258 2026-06-18 quant-ph cs.CC cs.LG 版本更新

Provable quantum speedups for computing persistence in topological data analysis

可证明的量子加速用于拓扑数据分析中的持久性计算

Casper Gyurik, Alexander Schmidhuber, Robbie King, Vedran Dunjko, Ryu Hayakawa

发表机构 * applied Quantum algorithms (aQa), Leiden University, 2300 RA Leiden, The Netherlands; Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, USA; Department of Computing; Yukawa Institute for Theoretical Physics & The Hakubi Center, Kyoto University, Japan

AI总结 提出一种高效量子算法,用于判断拓扑数据分析中洞的持久性,并证明该问题为BQP_1-hard,暗示在标准复杂性假设下存在指数级量子加速。

Comments 17 pages

详情
Journal ref
PRX Quantum 7, 020361 (2026)
AI中文摘要

拓扑数据分析(TDA)旨在通过检查数据拓扑中空洞的数量和持久性,从数据集中提取对噪声鲁棒的特征。我们为与TDA核心任务密切相关的一个计算问题提供了高效的量子算法——判断给定空洞是否在不同长度尺度上持续存在。此外,我们证明该问题本身是$\mathsf{BQP}_1$-hard的,意味着经典解决方案极不可能;这与所有先前的TDA量子方法形成对比,在这些方法中,问题对于量子计算机也是难解的,或者严格的经典困难性证明仍然悬而未决。这一结果表明,在标准复杂性理论假设下,该问题存在指数级的量子加速。我们的方法依赖于将空洞的持久性编码到引导稀疏哈密顿量问题的一个变体中,其中引导态由空洞的调和代表元构造而成。

英文摘要

Topological data analysis (TDA) aims to extract noise-robust features from a data set by examining the number and persistence of holes in its topology. We provide an efficient quantum algorithm for a computational problem closely related to a core task in TDA -- determining whether a given hole persists across different length scales. Further, we prove the problem itself is $\mathsf{BQP}_1$-hard, implying that a classical solution is extremely unlikely; this stands in contrast to all previous quantum approaches to TDA, where the problems were also intractable for quantum computers, or where a rigorous proof of classical hardness still remains open. This result implies an exponential quantum speedup for this problem under standard complexity-theoretic assumptions. Our approach relies on encoding the persistence of a hole in a variant of the guided sparse Hamiltonian problem, where the guiding state is constructed from a harmonic representative of the hole.

2604.23716 2026-06-18 cs.AI cs.IT cs.LG cs.MA math.IT 版本更新

Information-Theoretic Measures in AI: A Practical Decision Guide

人工智能中的信息论度量:实用决策指南

Nikolaos Al. Papadopoulos, Konstantinos E. Psannis

发表机构 * Department of Applied Informatics, University of Macedonia(马其顿大学应用信息系)

AI总结 本文为七种信息论度量提供实用决策框架,围绕每个度量的三个关键问题:回答的问题与AI场景、适合的估计器、最危险的误用,并附有流程图和决策表。

Comments 25 pages, 2 tables, 1 figure. Submitted to Entropy (MDPI)

详情
AI中文摘要

信息论(IT)度量在人工智能中无处不在:熵驱动决策树分裂和不确定性量化,交叉熵是默认的分类损失,互信息支撑表示学习和特征选择,转移熵揭示动态系统中的有向影响。第二类较不成熟的度量——整合信息(Phi)、有效信息(EI)和自主性——已出现用于表征智能体复杂性。尽管被广泛采用,度量选择常常与估计器假设、失败模式和安全的推断主张脱节。本文为所有七种度量提供了一个实用决策框架,围绕每个度量的三个指导性问题组织:(i)该度量回答什么问题,在何种AI背景下;(ii)哪种估计器适合数据类型和维度;(iii)最危险的误用是什么。该框架通过两个互补的人工制品实现:度量选择流程图和主决策表。我们涵盖每个度量的AI/ML和决策智能体应用领域,并使用标准化桥接框将IT量与认知构造联系起来。三个工作示例展示了该框架在具体从业者场景中的应用,涵盖表示学习、时间影响分析和进化智能体复杂性。

英文摘要

Information-theoretic (IT) measures are ubiquitous in artificial intelligence: entropy drives decision-tree splits and uncertainty quantification, cross-entropy is the default classification loss, mutual information underpins representation learning and feature selection, and transfer entropy reveals directed influence in dynamical systems. A second, less consolidated family of measures, integrated information (Phi), effective information (EI), and autonomy, has emerged for characterizing agent complexity. Despite wide adoption, measure selection is often decoupled from estimator assumptions, failure modes, and safe inferential claims. This paper provides a practical decision framework for all seven measures, organized around three prescriptive questions for each: (i) what question does the measure answer and in which AI context; (ii) which estimator is appropriate for the data type and dimensionality; and (iii) what is the most dangerous misuse. The framework is operationalized in two complementary artifacts: a measure-selection flowchart and a master decision table. We cover both AI/ML and decision-making agent application domains per measure, with standardized Bridge Boxes linking IT quantities to cognitive constructs. Three worked examples illustrate the framework on concrete practitioner scenarios spanning representation learning, temporal influence analysis, and evolved agent complexity.
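As a concrete reference point for the first family of measures named above, the following Python sketch computes entropy, cross-entropy, and mutual information for small discrete distributions. It assumes the probabilities are already known; it is not the paper's decision framework, and it does not cover transfer entropy, Phi, EI, or autonomy. The function names are chosen here for illustration.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits of a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits; q must be positive wherever p is."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """Mutual information I(X; Y) in bits from a joint probability table."""
    px = [sum(row) for row in joint]          # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (columns)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


fair_coin = [0.5, 0.5]
perfectly_coupled = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]
print(entropy(fair_coin))                     # 1 bit
print(mutual_information(perfectly_coupled))  # 1 bit: X determines Y
print(mutual_information(independent))        # 0 bits: X tells nothing about Y
```

When the probabilities are instead estimated by counting a finite sample, these plug-in formulas become biased, which is one reason estimator choice matters as much as measure choice.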

2605.17131 2026-06-18 cs.CV cs.AI cs.LG 版本更新

A Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

点云分类与分割的深度学习架构综述

Minhas Kamal, Hiranya Garbha Kumar, Balakrishnan Prabhakaran

发表机构 * State University of New York at Albany(纽约州立大学阿尔巴尼分校)

AI总结 本文系统性地探讨了点云分类和分割中的深度学习架构,分析了点云数据的结构特性,分类了不同架构的工作,并评估了其在主流基准上的性能,同时指出了开放挑战和未来方向。

Comments We reviewed a decade of advancements in point cloud processing: trace the evolution of the field from its foundational roots to the modern SOTA, analyze how diverse architectures overcome the inherent geometric challenges of 3D data, and map out critical research gaps alongside promising future directions. GitHub: https://github.com/MinhasKamal/DeepLearningForPointCloud

详情
Journal ref
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2026
AI中文摘要

点云因其简洁性和几何保真度而成为表示3D形状和场景最广泛采用的格式。然而,其固有的无序和不规则性质,加剧了传感器噪声和遮挡的影响,给基于机器学习的方法带来了独特的挑战。为应对这些问题,已开发出多种策略,包括转换为有序格式、提取局部几何特征以及基于排列不变或自注意力的处理方法。在本文中,我们的重点是深度学习模型在3D视觉三个基本任务中的应用:点云分类、部分分割和语义分割。我们首先正式定义点云数据,然后深入讨论其结构特性。接着,我们根据其骨干结构对重要工作进行分类,并评估其在流行基准上的性能。除了经验比较外,我们还提供了架构创新和局限性的见解。我们还概述了3D点云理解中的开放挑战和有前途的未来方向。

英文摘要

Point cloud stands as the most widely adopted format for representing 3D shapes and scenes due to its simplicity and geometric fidelity. However, its inherent unordered and irregular nature, exacerbated by sensor noise and occlusions, introduces unique challenges for machine learning based methodologies. To combat these issues, diverse strategies have been developed, including converting to a format that has orderliness, extracting local geometry, and permutation-invariant or self-attention-based processing. In this paper, our focus is directed towards deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We begin by formally defining point cloud data, followed by an in-depth discussion on its structural characteristics. Then, we categorize notable works based on their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions for 3D point cloud understanding.
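The permutation-invariant processing mentioned above can be illustrated with a minimal Python sketch: apply the same feature map to every point, then aggregate with a symmetric function such as a maximum. The feature map `shared_point_features` is a hypothetical hand-written stand-in for a learned per-point network, and the sketch is not tied to any specific architecture reviewed in the survey.

```python
def shared_point_features(point):
    """Hypothetical per-point feature map applied identically to every point."""
    x, y, z = point
    return [x + y + z, x * x + y * y + z * z, max(x, y, z)]


def global_feature(points):
    """Symmetric max pooling over per-point features; the result ignores point order."""
    features = [shared_point_features(p) for p in points]
    return [max(column) for column in zip(*features)]


cloud = [(0.0, 1.0, 2.0), (1.0, 0.0, -1.0), (3.0, 3.0, 0.5)]
shuffled = [cloud[2], cloud[0], cloud[1]]
print(global_feature(cloud) == global_feature(shuffled))  # True
```

Because the maximum does not depend on the order of its arguments, any reordering of the input points yields the same global descriptor.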

2605.25929 2026-06-18 cs.MA cs.LG 版本更新

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

多智能体系统是专家混合:谁成为影响者?

Franka Bause, Jonas Niederle, Martin Pawelczyk, Rebekka Burkholz

发表机构 * CISPA Helmholtz Center for Information Security(CISPA亥姆霍兹信息安全中心) Faculty of Computer Science, University of Vienna(维也纳大学计算机科学系)

AI总结 本文通过Friedkin-Johnsen意见动力学模型分析多智能体LLM协商机制,揭示输入依赖的FJ参数使系统成为专家混合,并探讨基于自信度、感知自信度和初始观点对齐的影响者形成机制。

Comments Accepted at the 2nd Workshop on Compositional Learning at ICML 2026

详情
AI中文摘要

多智能体LLM协商的有效性不仅取决于智能体的个体预测,还取决于它们如何沟通和协作。我们通过Friedkin-Johnsen (FJ)意见动力学的视角研究这一机制,这是一个可处理的模型,用于分析多智能体系统中的固执、影响力和意见变化,并捕捉经验观察到的协商模式。我们表明FJ参数是输入依赖的,将多智能体协商转变为专家混合。这一视角意味着,当路由反映智能体能力时,多智能体系统可以胜过单个智能体和静态集成。由于能力在实践中是潜在的,我们分析了影响力如何通过可观察的代理建立:智能体的自我评估自信度、感知自信度以及与其他智能体观点的初始对齐。

英文摘要

The effectiveness of multi-agent LLM deliberation depends not only on the agents' individual predictions, but also on how they communicate and collaborate. We study this mechanism through the lens of Friedkin-Johnsen (FJ) opinion dynamics, a tractable model for analyzing stubbornness, influence, and opinion change in multi-agent systems that captures empirically observed deliberation patterns. We show that the FJ parameters are input-dependent, turning multi-agent deliberation into a mixture of experts. This perspective implies that multi-agent systems can outperform single agents and static ensembles when routing reflects agent competence. Since competence is latent in practice, we analyze how influence is established through observable proxies: agents' self-assessed confidence, their perceived confidence, and initial alignment with other agents' views.
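For readers unfamiliar with Friedkin-Johnsen dynamics, here is a minimal Python sketch of the standard update rule, in which each agent mixes its neighbors' current opinions with its own initial opinion. The parameterization by a per-agent stubbornness `s_i` (one minus the usual susceptibility) and the toy values are assumptions made for this example; how the paper fits these parameters to LLM deliberation is not described in the abstract.

```python
def fj_step(x, x0, weights, stubbornness):
    """One Friedkin-Johnsen update.

    x_i(t+1) = (1 - s_i) * sum_j W_ij * x_j(t) + s_i * x_i(0)
    """
    n = len(x)
    return [
        (1.0 - stubbornness[i]) * sum(weights[i][j] * x[j] for j in range(n))
        + stubbornness[i] * x0[i]
        for i in range(n)
    ]


def fj_run(x0, weights, stubbornness, steps=200):
    """Iterate the update from the initial opinions x0 for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = fj_step(x, x0, weights, stubbornness)
    return x


# Two agents with row-stochastic influence weights (toy values).
initial = [0.0, 1.0]
weights = [[0.5, 0.5], [0.5, 0.5]]
# Agent 0 never moves (s = 1); agent 1 is fully open to influence (s = 0).
final = fj_run(initial, weights, stubbornness=[1.0, 0.0])
print(final)
```

In this toy run the open agent is pulled to the stubborn agent's opinion, so the stubborn agent acts as the influencer. If the stubbornness values and weights depended on the input question, the final opinion would become an input-dependent weighting of the agents' initial answers, which is the mixture-of-experts reading the abstract describes.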

2606.17454 2026-06-18 cs.AI cs.LG 版本更新

Dissecting model behavior through agent trajectories

通过智能体轨迹剖析模型行为

Gaurav Gupta, Vatshank Chaturvedi, Jun Huan, Anoop Deoras

发表机构 * AWS AI Labs(AWS人工智能实验室)

AI总结 本文提出“意图-执行差距”概念,并设计Simple Strands Agent(SSA)框架,通过分析138k条轨迹揭示模型在自主问题解决中的行为差异。

Comments 106 pages, 50 Figures, 16 Tables

详情
AI中文摘要

AI智能体性能不仅仅是一个建模问题,它本质上是一个系统问题。模型的高级能力通过智能体框架(harness)实现。因此,模型假设与框架行为之间的差距很容易阻止模型的全部能力转化为智能体性能。我们将此形式化为“意图-执行差距”:模型意图与框架执行之间的不匹配,反之亦然。我们认为,最小化这种意图-执行差距与框架设计的其他方面(如工具和执行循环)同样重要。为了说明这种框架-模型对齐的影响,我们开发了一个简单且可定制的框架,称为“Simple Strands Agent”(SSA)。SSA旨在找到跨不同模型家族(如Claude、Gemini、GPT、Grok、Qwen)通用的常见模式,以及少量模型特定的偏好。我们做出两个贡献:(i)我们在流行的智能体基准测试(SWE-Pro、SWE-Verified和Terminal-Bench-2)上复现或改进了不同模型提供商家族报告的pass@1性能;(ii)基于对SSA生成的138k条轨迹的分析,我们超越了前沿模型之间通常相对均匀的pass@1数字。通过在代码状态空间中表示智能体轨迹,我们观察到问题解决行为中的模型级差异。更细粒度的指标,如编辑频率、测试活动和阶段转换,揭示了单个模型如何在自主问题解决的不同阶段分配努力。

英文摘要

AI agent performance is not just a modeling problem, it is fundamentally a systems problem. The advanced capabilities of models are realized through agent harnesses. Therefore, a gap between model assumptions and harness behavior can easily prevent the model's full capabilities from translating into agent performance. We formalize this as the `intent-execution' gap: the mismatch between what the model intends and what the harness executes, and vice versa. We argue that minimizing this intent-execution gap is as important as other aspects of harness design such as tools and execution loops. To illustrate the impact of this harness-model alignment, we develop a simple and customizable harness called `Simple Strands Agent' (SSA). SSA aims to find the bulk of common patterns which generalize across different model families (such as Claude, Gemini, GPT, Grok, Qwen), as well as a small number of model-specific preferences. We make two contributions: (i) we reproduce or improve on the pass@1 performance reported by diverse model-provider families on popular agentic benchmarks (SWE-Pro, SWE-Verified and Terminal-Bench-2), and (ii) building on an analysis of 138k trajectories generated by SSA, we look beyond the pass@1 numbers which tend to be relatively even across frontier models. By representing agent trajectories in code state-spaces, we observe model-level differences in problem-solving behavior. Finer-grained metrics such as edit frequency, testing activity, and phase-transitions reveal how individual models allocate effort across different stages of autonomous problem solving.

2510.15300 2026-06-18 cs.LG 版本更新

DFCA: Decentralized Federated Clustering Algorithm

Jonas Kirch, Sebastian Becker, Tiago Koketsu Rodrigues, Stefan Harmeling

发表机构 * Fraunhofer Institute for Software and Systems Engineering(弗劳恩霍夫软件与系统工程研究所) Lamarr Institute for Machine Learning and AI(拉马尔人工智能与机器学习研究所)

详情
英文摘要

Clustered Federated Learning has emerged as an effective approach for handling heterogeneous data across clients by partitioning them into clusters with similar or identical data distributions. However, most existing methods, including the Iterative Federated Clustering Algorithm (IFCA), rely on a central server to coordinate model updates, which creates a bottleneck and a single point of failure, limiting their applicability in more realistic decentralized learning settings. In this work, we introduce DFCA, a fully decentralized clustered FL algorithm that enables clients to collaboratively train cluster-specific models without central coordination. DFCA uses a sequential running average to aggregate models from neighbors as updates arrive, providing a communication-efficient alternative to batch aggregation while maintaining clustering performance. Our experiments on various datasets demonstrate that DFCA outperforms other decentralized algorithms and performs comparably to centralized IFCA, even under sparse connectivity, highlighting its robustness and practicality for dynamic real-world decentralized networks.
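The sequential running average mentioned above can be illustrated with the standard incremental-mean update, which folds in each neighbor model as it arrives instead of storing a whole batch. This Python sketch is an assumption-laden illustration: the class name `RunningAverage` is invented here, models are flat lists of floats, and the abstract does not specify how DFCA weights neighbors or handles cluster assignment.

```python
class RunningAverage:
    """Incrementally averages parameter vectors without keeping them all in memory."""

    def __init__(self, dim):
        self.mean = [0.0] * dim
        self.count = 0

    def update(self, model):
        """Fold one newly received model into the running mean."""
        self.count += 1
        self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, model)]
        return self.mean


aggregator = RunningAverage(dim=2)
# Neighbor models arriving one at a time (toy parameter vectors).
for neighbor_model in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    aggregator.update(neighbor_model)
print(aggregator.mean)  # [3.0, 4.0], the mean of the three models
```

Only the current mean and a counter are kept, so memory use does not grow with the number of neighbors.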

2601.18637 2026-06-18 quant-ph cs.LG stat.ML 版本更新

Universality of Many-body Projected Ensemble for Learning Quantum Data Distribution

Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

发表机构 * Quantum Laboratory, Fujitsu Research, Fujitsu Limited, Kawasaki, Kanagawa 211-8588, Japan(富士通株式会社富士通研究所量子实验室,日本神奈川县川崎市 211-8588)

Comments 21 pages, 6 figures (added Github repository)

详情
Journal ref
IJCNN 2026
英文摘要

Generating quantum data by learning the underlying quantum distribution poses challenges in both theoretical and practical scenarios, yet it is a critical task for understanding quantum systems. A fundamental question in quantum machine learning (QML) is the universality of approximation: whether a parameterized QML model can approximate any quantum distribution. We address this question by proving a universality theorem for the Many-body Projected Ensemble (MPE) framework, a method for quantum state design that uses a single many-body wave function to prepare random states. This demonstrates that MPE can approximate any distribution of pure states within a 1-Wasserstein distance error. This theorem provides a rigorous guarantee of universal expressivity, addressing key theoretical gaps in QML. For practicality, we propose an Incremental MPE variant with layer-wise training to improve the trainability. Numerical experiments on clustered quantum states and quantum chemistry datasets validate MPE's efficacy in learning complex quantum data distributions.

2405.14273 2026-06-18 cs.LG cs.AI math.OC 版本更新

Exact Solution to Data-Driven Inverse Optimization of MILPs in Finite Time via Gradient-Based Methods

通过基于梯度的方法在有限时间内精确求解混合整数线性规划的数据驱动逆优化问题

Akira Kitaoka

发表机构 * NEC Corporation(日本电气株式会社)

AI总结 本文研究混合整数线性规划中的数据驱动逆优化问题,揭示了次优性损失的几何结构,证明基于梯度的优化方法可在有限次迭代内达到与观测数据的完全一致,并给出了投影次梯度下降法迭代次数的上界。

Comments 66 pages; comments are welcome

详情
AI中文摘要

数据驱动逆优化问题(DDIOP)是估计能够解释观测到的最优解数据的目标函数参数(权重)的问题,出现在包括混合整数线性规划(MILP)在内的许多应用中。在MILP的逆优化中,特征的预测误差关于权重不连续,因此难以直接应用基于梯度的优化方法。本文聚焦于次优性损失,该损失当且仅当权重与观测数据完全一致时取得最小值零。我们揭示了该损失的几何结构:它是凸的分段线性函数,并且与观测数据完全一致的权重集合具有正的“厚度”,而不是单个点或薄边界。利用这一结构,我们证明了以下结论。第一,包括投影次梯度下降法在内的一大类基于梯度的优化方法可在有限次迭代内达到与观测数据的完全一致(即在有限时间内获得精确解)。第二,对于投影次梯度下降法,我们给出了达到完全一致所需迭代次数的显式上界。第三,当正向问题是整数线性规划(ILP)时,我们将该上界表示为完全显式的迭代次数,它仅由样本数、特征维度和约束系数矩阵的结构决定(例如,若系数矩阵是全单模矩阵,则迭代次数被显式地限制为样本数平方和维度的多项式)。通过数值实验,我们验证了这种有限步达到的行为。

英文摘要

A data-driven inverse optimization problem (DDIOP) is the problem of estimating the objective-function parameters (weights) that explain observed optimal-solution data, and it arises in many applications, including mixed integer linear programming (MILP). In inverse optimization for MILPs, the prediction error of the features is discontinuous with respect to the weights, so applying gradient-based optimization directly is difficult. In this paper we focus on the suboptimality loss. This loss attains its minimum value, zero, if and only if the weights are exactly consistent with the observed data. We reveal a geometric structure of this loss -- it is convex and piecewise linear, and moreover the set of weights that are exactly consistent with the observed data has a positive ``thickness'' rather than being a single point or a thin boundary -- and use it to show the following. First, a broad class of gradient-based optimization methods, including projected subgradient descent, reaches exact consistency with the observed data in finitely many iterations (an exact solution is obtained in finite time). Second, for projected subgradient descent we give an explicit upper bound on the number of iterations needed to reach exact consistency. Third, when the forward problem is an integer linear program (ILP), we give this upper bound as a fully explicit iteration count determined solely by the number of samples, the dimension of the features, and the structure of the constraint coefficient matrix. Through numerical experiments, we confirm this finite-step attainment behavior.
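The suboptimality loss and the projected subgradient iteration can be sketched on a tiny instance. The Python below is an illustration under several assumptions that are not taken from the paper: the forward problem is a maximization over an explicitly enumerated feasible set, weights are constrained to the probability simplex, and the step size and stopping tolerance are arbitrary. For one observation the loss is max over feasible x of w·x minus w·x_obs, which is convex and piecewise linear in w, and x*(w) minus x_obs is a subgradient.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    cumulative, theta = 0.0, 0.0
    for k, u in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u
        t = (cumulative - 1.0) / k
        if u - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def suboptimality_loss(w, feasible, observed):
    """Return (loss, maximizer); the loss is zero iff the observed point is optimal for w."""
    best = max(feasible, key=lambda x: dot(w, x))
    return dot(w, best) - dot(w, observed), best


def fit_weights(w, feasible, observed, step=0.1, max_iters=100, tol=1e-12):
    """Projected subgradient descent on the suboptimality loss."""
    for iteration in range(max_iters):
        loss, best = suboptimality_loss(w, feasible, observed)
        if loss <= tol:  # observed decision is (numerically) optimal: stop
            return w, iteration
        subgradient = [b - o for b, o in zip(best, observed)]
        w = project_simplex([wi - step * gi for wi, gi in zip(w, subgradient)])
    return w, max_iters


# Toy forward problem: maximize w.x over binary x with x1 + x2 <= 1.
feasible = [(0, 0), (1, 0), (0, 1)]
observed = (0, 1)
w, iterations = fit_weights([0.9, 0.1], feasible, observed)
print(w, iterations)
```

Starting from weights that prefer the wrong decision, the iterates reach a weight vector under which the observed decision is optimal after a handful of steps and then stop, which mirrors the finite-step behavior the abstract describes.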

2407.00449 2026-06-18 cs.LG cs.AI cs.NE 版本更新

Fully tensorial approach to hypercomplex-valued neural networks

Agnieszka Niemczynowicz, Radosław Antoni Kycia

发表机构 * Faculty of Computer Science and Mathematics, Cracow University of Technology(克拉科夫技术大学计算机科学与数学系)

Comments 23 pages, 3 figures

详情
Journal ref
Information Sciences, 2026, 123796
英文摘要

A fully tensorial theoretical framework for hypercomplex-valued neural networks is presented. The proposed approach enables neural network architectures to operate on data defined over arbitrary finite-dimensional algebras. The central observation is that algebra multiplication can be represented by a rank-three tensor, which allows all algebraic operations in neural network layers to be formulated in terms of standard tensor contractions, permutations, and reshaping operations. This tensor-based formulation provides a unified and dimension-independent description of hypercomplex-valued dense and convolutional layers and is directly compatible with modern deep learning libraries supporting optimized tensor operations. The proposed framework recovers existing constructions for four-dimensional algebras as a special case. Within this setting, a tensor-based version of the universal approximation theorem for single-layer hypercomplex-valued perceptrons is established under mild non-degeneracy assumptions on the underlying algebra, thereby providing a rigorous theoretical foundation for the considered class of neural networks.
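The central observation, that algebra multiplication is a rank-three tensor, can be checked directly on the quaternions. The Python sketch below builds the structure tensor from the quaternion multiplication table and computes products by contraction. It is an illustrative example and not the authors' implementation; `TABLE`, `T`, and `multiply` are names chosen here.

```python
# Quaternion multiplication table for the basis (1, i, j, k):
# entry [a][b] = (sign, c) means e_a * e_b = sign * e_c.
TABLE = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],
    [(1, 1), (-1, 0), (1, 3), (-1, 2)],
    [(1, 2), (-1, 3), (-1, 0), (1, 1)],
    [(1, 3), (1, 2), (-1, 1), (-1, 0)],
]

DIM = 4
# Rank-three structure tensor: T[a][b][c] is the coefficient of e_c in e_a * e_b.
T = [[[0] * DIM for _ in range(DIM)] for _ in range(DIM)]
for a in range(DIM):
    for b in range(DIM):
        sign, c = TABLE[a][b]
        T[a][b][c] = sign


def multiply(x, y):
    """Algebra product as a contraction: (x * y)_c = sum over a, b of x_a * y_b * T[a][b][c]."""
    return [
        sum(x[a] * y[b] * T[a][b][c] for a in range(DIM) for b in range(DIM))
        for c in range(DIM)
    ]


print(multiply([0, 1, 0, 0], [0, 0, 1, 0]))  # i * j = k  -> [0, 0, 0, 1]
print(multiply([0, 0, 1, 0], [0, 1, 0, 0]))  # j * i = -k -> [0, 0, 0, -1]
```

Swapping in a different table changes the algebra without touching `multiply`, which is the dimension-independence the abstract emphasizes. In a tensor library the same contraction is a single call, for example `numpy.einsum("a,b,abc->c", x, y, T)`.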

2512.17696 2026-06-18 cs.LG stat.ME stat.ML Updated version

Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting

Yuri Calleo

Affiliations * Unimercatorum

Details
English abstract

The modeling of high-dimensional spatio-temporal processes presents a fundamental dichotomy between the probabilistic rigor of classical geostatistics and the flexible, high-capacity representations of deep learning. While Gaussian processes offer theoretical consistency and exact uncertainty quantification, their prohibitive computational scaling renders them impractical for massive sensor networks. Conversely, modern transformer architectures excel at sequence modeling but inherently lack a geometric inductive bias, treating spatial sensors as permutation-invariant tokens without a native understanding of distance. In this work, we propose a spatially-informed transformer, a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. By formally decomposing the attention structure into a stationary physical prior and a non-stationary data-driven residual, we impose a soft topological constraint that favors spatially proximal interactions while retaining the capacity to model complex dynamics. We demonstrate the phenomenon of ``Deep Variography'', where the network successfully recovers the true spatial decay parameters of the underlying process end-to-end via backpropagation. Extensive experiments on synthetic Gaussian random fields and real-world traffic benchmarks confirm that our method outperforms state-of-the-art graph neural networks. Furthermore, rigorous statistical validation confirms that the proposed method delivers not only superior predictive accuracy but also well-calibrated probabilistic forecasts, effectively bridging the gap between physics-aware modeling and data-driven learning.
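One way to picture a covariance prior inside self-attention is as an additive, distance-dependent bias on the attention logits. The sketch below is only an assumption about the general shape of such a mechanism, not the paper's formulation: it uses the log of an exponential kernel `exp(-distance / range_param)` as the stationary prior and the usual scaled dot product as the data-driven term. The function name `spatial_attention` and the parameter `range_param` are hypothetical.

```python
import math


def spatial_attention(queries, keys, coords, range_param):
    """Row-wise softmax over (scaled dot product) + (log exponential-kernel prior)."""
    d = len(queries[0])
    weights = []
    for q, ci in zip(queries, coords):
        logits = []
        for k, cj in zip(keys, coords):
            data_term = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            # log of exp(-dist / range_param): nearby sensors receive a larger prior.
            prior_term = -math.dist(ci, cj) / range_param
            logits.append(data_term + prior_term)
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights


# Three sensors on a line; zero queries and keys isolate the effect of the prior.
COORDS = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
ZEROS = [[0.0, 0.0] for _ in COORDS]
attn = spatial_attention(ZEROS, ZEROS, COORDS, range_param=1.0)
```

With the data term switched off, each row of `attn` decays with distance from the query sensor. In a trained model `range_param` would be a learnable parameter, which is the quantity a variogram-style analysis would try to recover.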

2508.06406 2026-06-18 cs.DC cs.LG Updated version

Blockchain-Enabled Federated Learning

Murtaza Rangwala, KR Venugopal, Rajkumar Buyya

Affiliations * Quantum Cloud and Distributed Systems (qCLOUDS) Lab, School of Computing and Information Systems, The University of Melbourne, Australia; Department of Computer Science and Engineering, University of Visvesvaraya College of Engineering, Bangalore University, India

Comments 32 pages, 6 figures, chapter for edited book (Federated Learning: Foundations and Applications)

Details
English abstract

Blockchain-enabled federated learning (BCFL) addresses fundamental challenges of trust, privacy, and coordination in collaborative AI systems. This chapter provides a comprehensive architectural analysis of BCFL systems through a systematic four-dimensional taxonomy examining coordination structures, consensus mechanisms, storage architectures, and trust models. We analyze design patterns from blockchain-verified centralized coordination to fully decentralized peer-to-peer networks, evaluating trade-offs in scalability, security, and performance. Through detailed examination of consensus mechanisms designed for federated learning contexts, including Proof of Quality and Proof of Federated Learning, we demonstrate how computational work can be repurposed from arbitrary cryptographic puzzles to productive machine learning tasks. The chapter addresses critical storage challenges by examining multi-tier architectures that balance blockchain's transaction constraints with neural networks' large parameter requirements while maintaining cryptographic integrity. A technical case study of the TrustMesh framework illustrates practical implementation considerations in BCFL systems through distributed image classification training, demonstrating effective collaborative learning across IoT devices with highly non-IID data distributions while maintaining complete transparency and fault tolerance. Analysis of real-world deployments across healthcare consortiums, financial services, and IoT security applications validates the practical viability of BCFL systems, achieving performance comparable to centralized approaches while providing enhanced security guarantees and enabling new models of trustless collaborative intelligence.

2508.20275 2026-06-18 cs.LG cs.CL q-bio.QM Updated version

A Systematic Review on the Generative AI Applications in Human Medical Genomics

Anton Changalidis, Yury Barbitoff, Yulia Nasykhova, Andrey Glotov

Affiliations * Dpt. of Genomic Medicine; D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology

Comments 31 pages, 5 figures

Details
Journal ref
Frontiers in Genetics 16 (2026) 1694070
English abstract

Although traditional statistical techniques and machine learning methods have contributed significantly to genetics and, in particular, inherited disease diagnosis, they often struggle with complex, high-dimensional data, a challenge now addressed by state-of-the-art deep learning models. Large language models (LLMs), based on transformer architectures, have excelled in tasks requiring contextual comprehension of unstructured medical data. This systematic review examines the role of LLMs in the genetic research and diagnostics of both rare and common diseases. An automated keyword-based search of PubMed, bioRxiv, medRxiv, and arXiv was conducted, targeting studies on LLM applications in diagnostics and education within genetics and removing irrelevant or outdated models. A total of 172 studies were analyzed, highlighting applications in genomic variant identification, annotation, and interpretation, as well as medical imaging advancements through vision transformers. Key findings indicate that while transformer-based models significantly advance disease and risk stratification, variant interpretation, medical imaging analysis, and report generation, major challenges persist in integrating multimodal data (genomic sequences, imaging, and clinical records) into unified and clinically robust pipelines, facing limitations in generalizability and practical implementation in clinical settings. This review provides a comprehensive classification and assessment of the current capabilities and limitations of LLMs in transforming hereditary disease diagnostics and supporting genetic education, serving as a guide to navigate this rapidly evolving field.

2503.01163 2026-06-18 cs.AI cs.CL cs.HC cs.LG cs.NE Updated version

Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers

Rin Ashizawa, Yoichi Hirose, Nozomu Yoshinari, Kento Uchida, Shinichi Shirakawa

Affiliations * Yokohama National University

Comments Accepted to ACL 2025 Findings

Details
English abstract

Prompt optimization aims to search for effective prompts that enhance the performance of large language models (LLMs). Although existing prompt optimization methods have discovered effective prompts, these prompts often differ from the sophisticated prompts carefully designed by human experts. Prompt design strategies, representing best practices for improving prompt performance, can be key to improving prompt optimization. Recently, a method termed the Autonomous Prompt Engineering Toolbox (APET) has incorporated various prompt design strategies into the prompt optimization process. In APET, the LLM must implicitly select and apply the appropriate strategies, because prompt design strategies can have negative effects. This implicit selection may be suboptimal due to the limited optimization capabilities of LLMs. This paper introduces Optimizing Prompts with sTrategy Selection (OPTS), which implements explicit selection mechanisms for prompt design. We propose three mechanisms, including a Thompson sampling-based approach, and integrate them into EvoPrompt, a well-known prompt optimizer. Experiments optimizing prompts for two LLMs, Llama-3-8B-Instruct and GPT-4o mini, were conducted using BIG-Bench Hard. Our results show that the selection of prompt design strategies improves the performance of EvoPrompt, and the Thompson sampling-based mechanism achieves the best overall results. Our experimental code is provided at https://github.com/shiralab/OPTS .
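To make the idea of explicit strategy selection concrete, here is a minimal Beta-Bernoulli Thompson sampling sketch. It is not the OPTS code: the class `ThompsonStrategySelector`, the strategy names, the binary "did the new prompt improve" reward, and the simulated improvement rates are all assumptions chosen for illustration.

```python
import random


class ThompsonStrategySelector:
    """Keeps a Beta posterior per strategy and picks the one with the best sample."""

    def __init__(self, strategies, seed=0):
        self.strategies = list(strategies)
        self.successes = {s: 1 for s in self.strategies}  # Beta(1, 1) prior
        self.failures = {s: 1 for s in self.strategies}
        self.rng = random.Random(seed)

    def select(self):
        samples = {
            s: self.rng.betavariate(self.successes[s], self.failures[s])
            for s in self.strategies
        }
        return max(samples, key=samples.get)

    def update(self, strategy, improved):
        if improved:
            self.successes[strategy] += 1
        else:
            self.failures[strategy] += 1


def simulate(selector, improvement_rates, rounds, seed=1):
    """Stand-in for a prompt optimizer loop: the reward is drawn from a hidden rate."""
    rng = random.Random(seed)
    counts = {s: 0 for s in selector.strategies}
    for _ in range(rounds):
        strategy = selector.select()
        improved = rng.random() < improvement_rates[strategy]
        selector.update(strategy, improved)
        counts[strategy] += 1
    return counts


# Hypothetical strategies and hidden improvement rates, for illustration only.
RATES = {"strategy_a": 0.1, "strategy_b": 0.2, "strategy_c": 0.9}
counts = simulate(ThompsonStrategySelector(RATES), RATES, rounds=2000)
```

In a real optimizer, `improved` would come from evaluating the rewritten prompt on a development set rather than from a simulated rate. The selector concentrates its choices on strategies that have helped while still occasionally sampling the others.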

2502.15376 2026-06-18 cs.LG cond-mat.mes-hall Updated version

Learning Chern Numbers of Topological Insulators with Gauge Equivariant Neural Networks

Longde Huang, Oleksandr Balabanov, Hampus Linander, Mats Granath, Daniel Persson, Jan E. Gerken

Affiliations * Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg; Department of Physics, Stockholm University, AlbaNova University Center; VERSES AI Research Lab, Los Angeles, USA; Department of Physics, University of Gothenburg

Details
Journal ref
Advances in Neural Information Processing Systems 38 (NeurIPS 2025)
English abstract

Equivariant network architectures are a well-established tool for predicting invariant or equivariant quantities. However, almost all learning problems considered in this context feature a global symmetry, i.e. each point of the underlying space is transformed with the same group element, as opposed to a local ``gauge'' symmetry, where each point is transformed with a different group element, exponentially enlarging the size of the symmetry group. Gauge equivariant networks have so far mainly been applied to problems in quantum chromodynamics. Here, we introduce a novel application domain for gauge-equivariant networks in the theory of topological condensed matter physics. We use gauge equivariant networks to predict topological invariants (Chern numbers) of multiband topological insulators. The gauge symmetry of the network guarantees that the predicted quantity is a topological invariant. We introduce a novel gauge equivariant normalization layer to stabilize the training and prove a universal approximation theorem for our setup. We train on samples with trivial Chern number only but show that our models generalize to samples with non-trivial Chern number. We provide various ablations of our setup. Our code is available at https://github.com/sitronsea/GENet/tree/main.

2410.23503 2026-06-18 cs.LG Updated version

Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices

Santino Nanini, Mariem Abid, Yassir Mamouni, Arnaud Wiedemann, Philippe Jouvet, Stephane Bourassa

Affiliations * SADC-CDSS IA PEDIATRICS, CHU Sainte-Justine, Montreal, Canada; Solutions Applicare AI Inc., Montreal, Canada; Université de Montréal, Canada; MEDINT CBRNE Group, Montreal, Canada

Comments 12 figures, 12 tables and 39 pages

Details
Journal ref
Diagnostics 14 (2024) 2763
English abstract

This paper presents the development of machine learning (ML) models to predict hypoxemia severity during emergency triage, especially in Chemical, Biological, Radiological, Nuclear, and Explosive (CBRNE) events, using physiological data from medical-grade sensors. Gradient Boosting Models (GBMs; XGBoost, LightGBM, CatBoost) and sequential models (LSTM, GRU) were trained on physiological and demographic data from the MIMIC-III and IV datasets. A robust preprocessing pipeline addressed missing data and class imbalances and incorporated synthetic data flagged with masks. GBMs outperformed sequential models in terms of training speed, interpretability, and reliability, making them well-suited for real-time decision-making. While their performance was comparable to that of sequential models, the GBMs used score features from six physiological variables derived from the enhanced National Early Warning Score (NEWS) 2, which we termed NEWS2+. This approach significantly improved prediction accuracy. While sequential models handled temporal data well, their performance gains did not justify the higher computational cost. A 5-minute prediction window was chosen for timely intervention, with minute-level interpolations standardizing the data. Feature importance analysis highlighted the significant role of mask and score features in enhancing both transparency and performance. Temporal dependencies proved to be less critical, as GBMs were able to capture key patterns effectively without relying on them. This study highlights ML's potential to improve triage and reduce alarm fatigue. Future work will integrate data from multiple hospitals to enhance model generalizability across clinical settings.

2211.01960 2026-06-18 q-bio.NC cs.HC cs.LG Updated version

FingerFlex: Inferring Finger Trajectories from ECoG signals

Vladislav Lomtev, Alexander Kovalev, Alexey Timchenko

Affiliations * Bauman Moscow State Technical University; ALVI Labs; Brain Dynamics Group, Higher School of Economics; University of Tuebingen

Comments 6 pages, 3 figures, 4 tables. Preprint. Under review

Details
Journal ref
10.1109/IEEECONF58974.2023.10405112
English abstract

Motor brain-computer interface (BCI) development relies critically on neural time series decoding algorithms. Recent advances in deep learning architectures allow for automatic feature selection to approximate higher-order dependencies in data. This article presents the FingerFlex model, a convolutional encoder-decoder architecture adapted for finger movement regression on electrocorticographic (ECoG) brain data. State-of-the-art performance was achieved on the publicly available BCI Competition IV dataset 4, with a correlation coefficient between true and predicted trajectories of up to 0.74. The presented method provides the opportunity for developing fully functional, high-precision cortical motor brain-computer interfaces.
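The reported metric is a correlation coefficient between true and predicted finger trajectories. Assuming this means the standard Pearson correlation computed per trajectory (the abstract does not spell out the exact evaluation protocol), it can be sketched as follows. The function name and the toy trajectories are hypothetical.

```python
import math


def pearson_correlation(true_values, predicted_values):
    """Pearson correlation between two equal-length sequences of samples."""
    n = len(true_values)
    mean_t = sum(true_values) / n
    mean_p = sum(predicted_values) / n
    cov = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(true_values, predicted_values))
    var_t = sum((t - mean_t) ** 2 for t in true_values)
    var_p = sum((p - mean_p) ** 2 for p in predicted_values)
    return cov / math.sqrt(var_t * var_p)


# Toy trajectories: the prediction is a scaled and shifted copy of the truth.
true_flexion = [0.0, 1.0, 2.0, 3.0]
predicted_flexion = [1.0, 3.0, 5.0, 7.0]
r = pearson_correlation(true_flexion, predicted_flexion)
```

Because the metric is invariant to scale and offset, a prediction that tracks the shape of the movement scores highly even if its amplitude is off, which is worth keeping in mind when reading a single correlation figure.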

1909.13203 2026-06-18 cs.LG stat.ML Updated version

Learning transport cost from subset correspondence

Ruishan Liu, Akshay Balsubramani, James Zou

Affiliations * Department of Electrical Engineering; Department of Genetics; Stanford University; Department of Biomedical Data Science

Details
Journal ref
International Conference on Learning Representations (ICLR 2020)
English abstract

Learning to align multiple datasets is an important problem with many applications, and it is especially useful when we need to integrate multiple experiments or correct for confounding. Optimal transport (OT) is a principled approach to aligning datasets, but a key challenge in applying OT is that we need to specify a transport cost function that accurately captures how the two datasets are related. Reliable cost functions are typically not available, and practitioners often resort to a hand-crafted or Euclidean cost even when it may not be appropriate. In this work, we investigate how to learn the cost function using a small amount of side information, which is often available. The side information we consider captures subset correspondence, i.e., certain subsets of points in the two datasets are known to be related. For example, we may have some images labeled as cars in both datasets, or we may have a common annotated cell type in single-cell data from two batches. We develop an end-to-end optimizer (OT-SI) that differentiates through the Sinkhorn algorithm and effectively learns a suitable cost function from side information. In systematic experiments on images, marriage-matching, and single-cell RNA-seq, our method substantially outperforms state-of-the-art benchmarks.
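For readers unfamiliar with the Sinkhorn algorithm mentioned above, the following minimal sketch shows its forward iterations for a fixed cost matrix. It does not learn the cost and is not OT-SI itself. The function name `sinkhorn`, the regularization strength `epsilon`, the iteration count, and the toy cost and marginals are assumptions for illustration.

```python
import math


def sinkhorn(cost, a, b, epsilon=0.5, iterations=200):
    """Entropy-regularized transport plan with row marginals a and column marginals b."""
    n, m = len(a), len(b)
    # Gibbs kernel: cheaper pairs get larger entries.
    K = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the target marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy problem: matching points cost 0, mismatched points cost 1.
COST = [[0.0, 1.0], [1.0, 0.0]]
A_MARGINAL = [0.3, 0.7]
B_MARGINAL = [0.6, 0.4]
plan = sinkhorn(COST, A_MARGINAL, B_MARGINAL)
```

Every step is a composition of exponentials, sums, and divisions, so the resulting plan is a differentiable function of the cost entries. That property is what makes it possible to backpropagate through the iterations in an automatic differentiation framework and adjust the cost, as the abstract describes.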