Circuit Synchronization Precedes Generalization: Causal Evidence from Fourier Structure in Grokking Transformers

Achyuthan Sivasankar

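Affiliations: New York University

AI summary: Proposes the Frequency Synchronization Degree (FSD) metric, finds that it synchronises 500-3,000 steps before grokking on modular arithmetic tasks, and uses weight-decay interventions to show that the gap is a regularisation phenomenon, providing causal evidence.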

Comments 16 pages, 6 figures, 10 tables

Abstract

Grokking -- where a transformer on modular arithmetic suddenly transitions from near-chance to near-perfect validation accuracy -- is attributed to a Fourier circuit, but its timing, causal structure, and controllability remain poorly understood. We introduce the Frequency Synchronization Degree (FSD), a normalised, permutation-tested metric for Fourier circuit synchronisation requiring no prior circuit knowledge. Across nine modular addition configurations (primes p in {53, 71, 97, 113, 131}, three seeds), FSD synchronises 500-3,000 steps before grokking (mean lead +1,722 steps; all nine positive, sign-test p~0.004), and precedes a restricted-logit loss baseline (Nanda et al.'s excluded loss) in all nine cases, making it the earliest available predictor. We provide direct causal evidence that the inter-phase gap is a regularisation phenomenon: forking training at the FSD-ceiling step and varying weight decay lambda produces strictly monotone earlier grokking, with Delta_t proportional to 1/lambda. This law replicates across three primes (p in {53,97,131}; R^2=1.00 and R^2=0.99 for two clean cases), captured as Delta_t ~ C/lambda, consistent with (1/lambda)*log(||W_mem||/tau). Architecture ablations show an attention-only model groks with a strong FSD precursor; an MLP-only model never groks; a single-layer model's FSD lags, confirming the precursor is a multi-block circuit property.
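
Two of the numbers in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only. It computes an exact two-sided sign test for nine positive leads out of nine, and it shows why exponential norm decay gives a gap proportional to 1/lambda. The function names, the choice of a two-sided test, and the pure-exponential decay model (with the learning rate absorbed into the weight-decay rate) are assumptions, not details taken from the paper.

```python
import math


def two_sided_sign_test_p(n_positive: int, n_total: int) -> float:
    """Exact two-sided sign test p-value under H0: each sign is + or - with probability 0.5."""
    k = max(n_positive, n_total - n_positive)
    # Probability of at least k signs falling in the majority direction.
    tail = sum(math.comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2.0 * tail)


def decay_gap(weight_decay: float, norm_ratio: float) -> float:
    """Steps for a norm decaying as exp(-weight_decay * t) to shrink by norm_ratio.

    Assumption: pure exponential decay, learning rate absorbed into weight_decay.
    """
    return math.log(norm_ratio) / weight_decay


p_value = two_sided_sign_test_p(9, 9)
gaps = {lam: decay_gap(lam, norm_ratio=20.0) for lam in (0.5, 1.0, 2.0)}

print(p_value)  # 0.00390625
for lam, gap in gaps.items():
    # lam * gap is the same for every lam, i.e. gap = C / lam with C = log(norm_ratio).
    print(lam, gap, lam * gap)
```

Under this toy model the constant C equals log(||W_mem||/tau), which is the form the abstract quotes.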

2606.12979 2026-06-12 cs.LG New submission

EPM-JEPA: Operator-Side Experience Modulation in JEPA-Family World Models

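Vedant Pandya

Affiliations: School of Artificial Intelligence and Data Engineering (SAIDE), Indian Institute of Technology Jodhpur

AI summary: Proposes EPM-JEPA, which modulates the predictor at the weight level via LoRA to handle test-time dynamics shift; experiments show it outperforms a no-memory baseline, though the effect is weaker than expected, and reveal three independent dynamical processes.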

Comments 16 pages, 5 figures, 9 tables, 5 code listings. Pre-registered experimental study with mechanism analysis

Abstract

JEPA-family world models use a static predictor whose weights do not adapt when test-time dynamics diverge from training. We compare two mechanisms for incorporating accumulated experience into a JEPA predictor under distribution shift: operand-side injection, where a compressed experience representation is added as a residual to the predictor's hidden state (EI-JEPA), and operator-side modulation, where the same representation generates low-rank weight deltas via LoRA applied to the predictor's weights (EPM-JEPA). On a pre-registered comparison (Moving MNIST, gravity shift), EPM-JEPA (D_shift^{n=50} = 0.7848 +/- 0.0078, three seeds) differs from EI-JEPA (0.8238) by delta = 4.74% - Outcome C: a null result - by our stated criterion, a valid outcome. As a secondary, non-pre-registered observation, EPM-JEPA improves 1.90% over a no-memory baseline (0.8000), consistently across seeds, while EI-JEPA underperforms the baseline, indicating the benefit is specific to weight-level modulation. Our primary contribution is a mechanism analysis: the D_shift^{n=50} trajectory reflects three independent dynamical processes - buffer cycling, EMA target drift, and an intrinsic LoRA settling transient of +0.021 - rather than convergence to equilibrium. These findings motivate PEM-JEPA, a physics-grounded successor addressing this dynamical-peak limitation.

2606.13081 2026-06-12 cs.LG cs.AI New submission

Emotional regulation improves deep learning-based image classification

Riccardo Emanuele Landi, João M. F. Rodrigues, Marta Chinnici

Affiliations: Mare Group; NOVA LINCS; Institute of Engineering (ISE), University of Algarve; Department of Energy Technologies and Renewable Sources, ENEA Casaccia Research Center

AI summary: Proposes an Emotional Regulation framework that models emotion in deep learning through artificial subjective experience; pre-trains ResNet and ViT for image classification and outperforms existing methods on CIFAR-10/100, setting a new state of the art in emotion-augmented deep learning.

Abstract

Emotion significantly influences cognition, enhancing memory and learning under certain conditions. Drawing on this principle, emotion-augmented deep learning investigates how affective states can improve neural network architectures and learning paradigms, achieving better generalization than non-emotional models. However, existing methods often rely solely on objective neurophysiological factors, neglecting the role of subjectivity in emotion. To bridge this gap, the present study introduces Emotional Regulation, a novel framework for modeling emotion in deep learning through artificial subjective experience. The method employs pre-training based on affective stimuli, balancing non-emotional and emotionally-influenced responses in downstream task optimization. Extensive experimentation was conducted in image classification, pre-training ResNet and ViT architectures on four emotional datasets, using CIFAR-10 and CIFAR-100 as target benchmarks. Results reveal improvements over the aforementioned backbones, providing evidence of Emotional Regulation as a promising method for defining emotion-augmented deep learning through artificial subjective experience. Furthermore, the proposed approach outperforms related work in CIFAR-based image classification, establishing Emotional Regulation as the new state of the art in emotion-augmented deep learning for large-scale vision datasets. The study also reinforces evidence that affective states improve the optimization of machine learning tasks, encouraging further investigation of emotion-inspired architectures.

2606.13106 2026-06-12 cs.LG cs.CL New submission

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Jiayu Yang, Chao Chen, Shengen Wu, Yinhong Liu, Yuxuan Fan, Lujundong Li, Songning Lai, Chengwei Qin, Zhijiang Guo

Affiliations: HKUST(GZ); University of Cambridge; NTU; JoinQuant; HKUST

AI summary: Proposes the SWITCH framework, which uses discrete boundary tokens to make hidden-state-recurrence reasoning compatible with on-policy reinforcement learning and to support causal mechanistic analysis; experiments show it outperforms existing methods.

Abstract

Latent chain-of-thought compresses reasoning by replacing visible reasoning traces with continuous hidden-state recurrence, but existing formulations are difficult to optimize with standard on-policy reinforcement learning (RL) and hard to interpret causally. Our key insight is that a single pair of explicit boundary tokens can address both issues at once: discrete entry and exit anchors make the latent block compatible with standard on-policy RL, and the same anchors offer a natural foothold for mechanistic analysis. Motivated by this, we propose SWITCH, a switchable latent reasoning framework. The model emits <swi> to enter latent mode and </swi> to exit. Because the boundaries are ordinary discrete tokens, the GRPO policy ratio is well-defined at every decision point. The same anchors also expose the latent steps to direct probing and causal intervention. We train the model with a visible-to-latent curriculum and a Switch-GRPO objective that propagates gradients through recurrent latent computation. SWITCH consistently outperforms prior hidden-state-recurrence latent reasoning approaches at similar scale. Mechanistic analysis through the boundary tokens further reveals three findings: (i) <swi> is a sharply localised, learned switching policy rather than a stylistic artefact; (ii) the latent step it opens performs problem-specific, causally important computation rather than acting as an inert placeholder; and (iii) that computation is concentrated at a single hidden-state transition on entry. Together, these results show that hidden-state-recurrence latent reasoning is both RL-trainable and open to direct mechanistic analysis, including of how on-policy RL itself improves the model from the inside.

2606.13125 2026-06-12 cs.LG cs.AI New submission

Select and Improve: Understanding the Mechanics of Post-Training for Reasoning

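Akshay Krishnamurthy, Audrey Huang, Nived Rajaraman

Affiliations: Microsoft Research NYC; UIUC

AI summary: Uses controlled experiments to show that reinforcement-learning post-training improves reasoning through two mechanisms, policy selection and policy improvement, and identifies the different roles played by SFT data and RL data.

2606.13168 2026-06-12 cs.LG New submission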

When Does Routing Become Interpretable? Causal Probes on Block Attention Residuals

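Aydin Javadov

Affiliations: ETH Zurich

AI summary: Studies the interpretability of routing in Block Attention Residuals; finds that structured depth routing emerges only when routing is part of training, and that routing weights dissociate from causal importance, so causal interventions are needed for verification.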

Abstract

Block Attention Residuals (Block AttnRes) replace fixed additive residuals with a learned softmax over earlier depth-source representations, surfacing cross-layer routing as an inspectable tensor in the forward pass. This is a tempting interpretability target: information flow normally inferred indirectly is now directly observable. We ask whether such exposure suffices for mechanistic interpretation. We probe two same-scale (0.6B) Block AttnRes checkpoints under identical routing-ablation interventions: a vanilla Qwen3 inference-wrapped through a deterministic recency-bias schedule that the codebase admits as a routing-equivalent loading path, and a Block AttnRes Qwen3 trained from scratch with routing as part of optimisation. The wrapped baseline's routing weights are content-independent and reproduce the schedule's analytic prediction. The trained AttnRes checkpoint instead exhibits three localised routing motifs: an embedding-source pathway through early-layer MLP, a current-state pathway through early-layer attention and MLP, and an older-history pathway through late-layer attention. Beyond this stratification, we find a sharp dissociation between average routing mass and causal importance: in both sublayers, the largest mass slice is not the largest causal contribution, and one source family carries appreciable mass with no detectable causal role under intervention. Architectural exposure of routing is therefore necessary but not sufficient for mechanistic interpretation: structured depth routing emerges only when routing has been part of training, and even then, descriptive routing summaries should be treated as candidate hypotheses to be tested by causal interventions, not as evidence of mechanism in their own right.
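
The core idea of replacing an additive residual with a softmax mixture over earlier depth sources can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's architecture: the function names are hypothetical, the sources are plain Python lists, and `recency_bias_logits` is an invented stand-in for a content-independent schedule of the kind the abstract attributes to the wrapped baseline.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_over_sources(sources, routing_logits):
    """Mix earlier depth-source vectors with softmax weights instead of summing them."""
    weights = softmax(routing_logits)
    dim = len(sources[0])
    mixed = [sum(w * src[d] for w, src in zip(weights, sources)) for d in range(dim)]
    return mixed, weights


def recency_bias_logits(n_sources, slope=1.0):
    """Hypothetical content-independent schedule: newer sources get larger logits."""
    return [slope * i for i in range(n_sources)]


# Toy sources: embedding output, then two earlier block outputs (oldest first).
sources = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed, weights = route_over_sources(sources, recency_bias_logits(len(sources)))

print(weights)  # the routing weights are the inspectable tensor
print(mixed)
```

Because the weights are explicit, they can be logged per layer. The abstract's caution is that a large weight on a source does not by itself show that the source matters causally, which is why ablating a source and measuring the output change is the stronger test.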

2606.13223 2026-06-12 cs.LG cs.CV New submission

Distributional Loss for Robust Classification

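Kathleen Anderson, Thomas Martinetz

Affiliations: Institute for Neuro- and Bioinformatics

AI summary: Proposes a distributional loss based on bimodal Gaussian distributions that implicitly captures class ambiguity by softening targets, mitigating overfitting and making decision boundaries more robust, with the largest gains in low-data regimes.

Comments ICANN 2026

2606.13276 2026-06-12 cs.LG cs.AI New submission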

Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization

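Kirato Yoshihara

Affiliations: School of Engineering Science, The University of Osaka

AI summary: Studies whether different Transformer modules prefer different manifold geometries and proposes assigning Stiefel constraints to attention layers and DGram constraints to MLP layers, which gives the best performance among the tested configurations in GPT-2 pretraining.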

Comments Accepted at WSS @ ICML 2026, code is available at https://github.com/kiratoyoshihara/module-wise-manifold-muon

Abstract

Weight-space geometry plays a central role in neural network optimization, yet manifold constraints are often applied uniformly across all weight matrices. In this work, we ask whether different transformer modules prefer different manifold geometries. We study Manifold Muon for GPT-2 pretraining and compare layer-wise assignments of Stiefel and DGram constraints across attention and MLP blocks. Our results show a clear asymmetry: constraining attention layers with Stiefel geometry while assigning DGram geometry to MLP layers gives the best performance among the tested configurations, whereas the inverted assignment and all-DGram configuration become unstable under the shared hyperparameter setting. We trace this failure to singular value growth in DGram-constrained attention weights, which can amplify attention logits and induce softmax saturation. These findings suggest that symmetry-aware and geometry-aware optimization for transformers should be module-specific rather than uniform.

2606.13443 2026-06-12 cs.LG New submission

How Much Memory Do We Need? Adaptive Memory Gate for Neural Operators

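Jihyeon Hur, Yongseok Kwon, Min-Gi Jo, Jeongwhan Choi, Noseong Park

Affiliations: University of Seoul

AI summary: To address the limited adaptability of fixed memory weights in existing neural operators, proposes AMGFNO, which adjusts memory weights dynamically through a learnable gate and reduces nRMSE by 55-79% at low resolution.

2606.13568 2026-06-12 cs.LG math-ph math.MP New submission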

Adjusted Cup-Product Neural Layer

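Snigdha Chandan Khilar

AI summary: Proposes the Adjusted Cup-Product Neural Layer, which combines a hard-wired cup product with adjustment terms from higher gauge theory to obtain a gauge-invariant readout, and proves that the adjustment coefficient is the only source of signal.

2606.13571 2026-06-12 cs.LG cs.AI New submission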

Existence Precedes Value: Joint Modeling of Observational Existence and Evolving States in Time Series Forecasting

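Yifan Hu, Hongzhou Chen, Peiyuan Liu, Yiding Liu, Zewei Dong, Jiang-Ming Yang

Affiliations: Ant International

AI summary: Proposes the Timeflies framework, which jointly models whether future observations will occur (existence) and value estimation, coupling an observation stream and a value stream through dedicated modules to improve forecasting on time series with missing values.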

Abstract

Real-world time series are often highly incomplete and irregular due to sensor dormancy, transmission delays, and event-driven sampling, making reliable forecasting fundamentally challenging. Existing methods have evolved from impute-then-forecast pipelines to continuous-time models such as Neural ODEs and continuous-time graph networks. While these approaches improve the modeling of historical irregularity, they still rely on an implicit oracle assumption at inference time: the timestamps of future valid observations are presumed to be known in advance. This assumption limits practical relevance, since in many real systems the more fundamental question is not only what the future value will be, but also whether a valid observation will occur at all. In this paper, we propose Timeflies, a unified framework that reformulates forecasting as a joint problem of future observability inference and value estimation. To explicitly model the interaction between observation dynamics and state evolution, Timeflies adopts an observation stream and a value stream, coupled through three dedicated modules for reliability-aware embedding, observation-guided dependency modeling, and joint prediction. We further construct Shadow, a benchmark that combines natural missingness from public datasets with real-world industrial data, and introduce the Observation-Value Joint Entropy (OVJE) metric to comprehensively evaluate this coupled predictability. Extensive experiments show that Timeflies consistently outperforms existing methods, highlighting the importance of explicitly modeling future observability in time series forecasting with missing values. Code and dataset are available in https://github.com/ant-intl/Timeflies.

2606.13603 2026-06-12 cs.LG cs.AI cs.CL New submission

Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models

Daniel Scalena, Sara Candussio, Luca Bortolussi, Elisabetta Fersini, Malvina Nissim, Gabriele Sarti

Affiliations: CLCG, University of Groningen; University of Milano-Bicocca; University of Trieste; Khoury College of Computer Sciences, Northeastern University

AI summary: Estimates the causal importance of chain-of-thought steps via early exit and finds a "commitment boundary" in reasoning where transient guesses give way to a stable answer; later steps are epiphenomenal, so exiting early can shorten reasoning by up to 55% with negligible effect on performance.

Abstract

Chain-of-thought (CoT) reasoning is the dominant paradigm for inference-time scaling in language models, yet the causal influence of individual steps on the final answer remains poorly understood. We estimate each step's causal importance via early exit and use this measure to study how answers form across the reasoning traces of several model families. Across diverse tasks, we find that reasoning typically crosses a commitment boundary -- a sharp transition from transient intermediate guesses to a stable, high-confidence answer. This transition often happens in a single step, well before the model's reasoning block ends, and is followed by epiphenomenal CoT steps that leave the final answer probability unaltered. Using attention probes, we show that answer-formation stages can be linearly decoded from intermediate reasoning steps with high accuracy and generalize robustly to unseen reasoning tasks. We exploit this signal to early-exit reasoning blocks at the commitment boundary, reducing CoT length by up to 55% on average with negligible impact on model performance.

2605.18898 2026-06-12 cs.LG stat.ML New submission

A Two-Parameter Weibull Framework for Diagnosing Transformer Weight Distributions

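Tiexin Ding

Affiliations: Independent Researcher

AI summary: Proposes a two-parameter framework based on the Weibull distribution for analysing element-wise weight magnitude distributions in Transformers; experiments characterise how k is distributed across module types and reveal how the lambda parameter changes during training.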

Comments 27 pages, 14 figures. Companion library npm-weibull-py and benchmark database available at https://github.com/tiexinding/NPM-Weibull-public

Abstract

We apply the Weibull distribution -- a two-parameter family from extreme-value theory -- as a diagnostic framework for element-wise weight magnitude distributions in transformers. At initialization, i.i.d. Gaussian weights give |w| ~ HalfNormal, yielding k ~ 1.20 via middle-80% probability-plot fit (the protocol used throughout this work). This anchor makes k a principled, architecture-independent measuring stick for training dynamics; fitting each weight matrix independently at every layer at every checkpoint enables per-component, per-layer, and per-step diagnostics that aggregate statistics cannot resolve. Applying this framework to 12 model entries spanning 7 architectural families (Pythia, OLMo-1/2, LLaMA-3, Mistral, Qwen2.5/3) reveals three findings. First, FFN modules and the attention output projection W_o -- the Transmission Class -- fall in a narrow k band: median terminal k in [1.186, 1.204] across 12 entries (cross-family CV = 0.51%), shared across SwiGLU/GeLU activations, Pre-LN/QK-Norm placements, and 70M-14B sizes. Second, the attention input projections W_q, W_k -- the Selection Class -- depart from the Weibull family, with severity shaped by storage: separately-stored Q/K (OLMo-1, OLMo-2) yields k in [0.76, 0.99] (deep); GQA models yield k in [1.10, 1.16] (mild); Pythia's merged W_qkv occupies a transitional zone tracking training budget T/tau monotonically. Third, lambda grows substantially during training and scales with sqrt(eta/lambda_wd) within the Pythia family (Pearson r = 0.94, three Transmission kinds), directionally consistent with Fan et al. (2025). The two parameters carry independent information: k labels the functional class, lambda labels training progress. We release npm-weibull-py v0.4 (Python library) and DATABASE_v9_1 at https://github.com/tiexinding/NPM-Weibull-public .
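
The initialization anchor k ~ 1.20 can be reproduced approximately with a standard Weibull probability plot restricted to the middle 80% of samples. The sketch below is one plausible reading of that protocol, not the paper's code or the npm-weibull-py API: the function name, the plotting position (i + 0.5)/n, and the ordinary least-squares slope are all assumptions.

```python
import math
import random


def weibull_shape_middle80(magnitudes):
    """Estimate the Weibull shape k as the slope of a probability plot on the middle 80%.

    For a Weibull variable, ln(-ln(1 - F(x))) = k * ln(x) - k * ln(scale),
    so the least-squares slope against ln(x) estimates k.
    """
    xs = sorted(m for m in magnitudes if m > 0.0)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        f = (i + 0.5) / n  # empirical CDF plotting position (assumed convention)
        if 0.1 <= f <= 0.9:  # keep only the middle 80% of the sample
            points.append((math.log(x), math.log(-math.log(1.0 - f))))
    mean_u = sum(u for u, _ in points) / len(points)
    mean_v = sum(v for _, v in points) / len(points)
    sxx = sum((u - mean_u) ** 2 for u, _ in points)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in points)
    return sxy / sxx


# i.i.d. Gaussian "weights" at initialization; |w| is half-normal.
rng = random.Random(0)
weights = [abs(rng.gauss(0.0, 0.02)) for _ in range(20_000)]
k_init = weibull_shape_middle80(weights)
print(round(k_init, 2))
```

The slope does not depend on the standard deviation of the Gaussian, since rescaling the weights only shifts ln(x). That scale invariance is what lets k act as a shape diagnostic separate from lambda.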

2606.12662 2026-06-12 cs.SD cs.AI cs.LG Cross-listed

BASENet: Band-Adapted Speech Enhancement Network with Cross-Band Attention

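Damien Martins Gomes, François Capman

Affiliations: Thales SIX GTS, France

AI summary: Proposes BASENet, which partitions the spectrum into Bark-scale bands, assigns each band an encoder with adapted capacity, and adds a cross-band attention module, achieving high PESQ and STOI with very few parameters; suited to resource-constrained devices.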

Abstract

Speech enhancement models typically apply uniform capacity across all frequencies, disregarding the non-uniform spectral resolution of human hearing. We propose BASENet, a frequency-adapted architecture that partitions the spectrum into Bark-scale bands and assigns each a scaled-capacity encoder derived from critical-band density, automatically granting deeper branches to perceptually dense low frequencies and lighter ones to high frequencies. A cross-band attention module captures harmonic dependencies across bands through compact frequency-pooled representations at linear complexity. Built on inverted residual blocks with dense connectivity and a convolutional recurrent network, BASENet achieves 3.55 PESQ and a STOI of about 96% on VoiceBank+DEMAND with only 0.83M parameters and 7.3 GMACs, the fewest parameters among all methods with PESQ > 3.50. A causal variant (3.44 PESQ) surpasses several non-causal baselines, confirming suitability for real-time streaming on resource-constrained devices.
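
A Bark-scale partition of FFT bins can be sketched as follows. This is a minimal illustration, not BASENet's implementation: it uses the Traunmüller approximation of the Bark scale and groups bins by integer Bark index, and the 16 kHz sample rate, 512-point FFT, and function names are assumptions chosen for the example.

```python
def hz_to_bark(freq_hz: float) -> float:
    """Traunmüller approximation of the Bark scale."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_band_edges(sample_rate: int, n_fft: int):
    """Group FFT bins by integer Bark index.

    Returns a list of (first_bin, last_bin) tuples, one per band, in ascending frequency.
    """
    n_bins = n_fft // 2 + 1
    bands = []
    start = 0
    current = None
    for b in range(n_bins):
        freq = b * sample_rate / n_fft
        idx = max(0, int(hz_to_bark(freq)))  # clamp the slightly negative values near 0 Hz
        if current is None:
            current = idx
        elif idx != current:
            bands.append((start, b - 1))
            start = b
            current = idx
    bands.append((start, n_bins - 1))
    return bands


bands = bark_band_edges(sample_rate=16000, n_fft=512)
widths = [last - first + 1 for first, last in bands]
print(len(bands))
print(widths)  # low-frequency bands span few bins, high-frequency bands span many
```

Each low-frequency band covers only a handful of bins while the top bands cover dozens, so many bands are packed into the low end of the spectrum. This is one way to see why a design based on critical-band density would give low frequencies more capacity per bin.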

2606.12940 2026-06-12 cs.SD cs.LG Cross-listed

Self-Guidance: Enhancing Neural Codecs via Decoder Manifold Alignment

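Xiang Li, Yixuan Zhou, Jingran Xie, Zhiyong Wu, Hui Wang

Affiliations: University of Science and Technology of China

AI summary: Proposes Self-Guidance, which aligns the decoder's internal manifold through a lightweight feature-mapping loss, improving the reconstruction quality of VQ-VAE neural speech codecs without changing inference, achieving state-of-the-art performance at low bitrates and supporting a 4x codebook reduction.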

Comments 20 pages, 9 figures, accepted to ICML 2026, demo website available at https://sgvqvae.github.io/sgvqvae-demo

Single-Branch vs. Multi-Branch in DeepONet and S-DeepONet: Network Architecture Follows Coupling in Multiphysics Systems

Jaewan Park, Kazuma Kobayashi, Qibang Liu, Seid Koric, Diab Abueidda, Syed Bahauddin Alam

Affiliations: National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign; The Grainger College of Engineering, Mechanical Science and Engineering, University of Illinois at Urbana-Champaign; The Grainger College of Engineering, Nuclear, Plasma & Radiological Engineering, University of Illinois at Urbana-Champaign; Department of Industrial and Manufacturing Systems Engineering, Kansas State University; Civil and Urban Engineering Department, New York University Abu Dhabi, UAE

AI summary: Compares single-branch and multi-branch neural operator architectures on strongly coupled multiphysics systems; single-branch networks outperform multi-branch ones in tightly coupled regimes by sharing latent representations, multi-branch designs suit decoupled or single-physics tasks, and the surrogates run up to 1.8×10^4 times faster than physics-based solvers.

Abstract

Real-time prediction of complex physical systems requires surrogate models that learn from data while representing strong multiphysics coupling. Deep Operator Networks have shown success in single-physics problems, yet their effectiveness in capturing nonlinear interactions in coupled systems (such as thermo-mechanical or electro-thermal coupling) remains underexplored. Here we pose a practical question: should the architecture of a neural operator reflect the strength of physical coupling it aims to model? We compare single-branch and multi-branch designs, in both feedforward and sequential recurrent forms, across three representative systems: a reaction-diffusion problem with heterogeneous sources, a nonlinear thermo-electrical problem with temperature-dependent conductivity and Joule heating, and a viscoplastic thermo-mechanical model of steel solidification. Single-branch networks consistently outperform multi-branch variants in tightly coupled regimes by encouraging shared latent representations, whereas multi-branch designs remain favorable for decoupled or single-physics tasks. Once trained, these surrogates deliver full-field predictions up to $1.8 \times 10^4$ times faster than physics-based solvers.

2509.18085 2026-06-12 cs.LG cs.AI cs.CL Updated version

Structuring The Future: Diffusion LLM Speculative Decoding via Calibrated Draft Graphs

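Sudhanshu Agrawal, Risheek Garrepalli, Raghavv Goel, Christopher Lott, Fatih Porikli, Mingu Lee

Affiliations: University of Waterloo

AI summary: Proposes the Spiffy algorithm, which uses calibrated draft-graph structures for speculative decoding of diffusion LLMs, accelerating inference while preserving the output distribution, with up to 8.6x fewer model inferences and up to 6.3x higher token rate.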

Comments Original version uploaded on Sep 22, 2025. (v2): Extended Table 2 with additional analysis and referenced it in Sec 5.2. (v3): Added note to Sec 4.2 and Appendix A.2 specifying conditions for losslessness. (v4): Updated with the version accepted to ICML 2026 workshops

Abstract

Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs) with the potential to operate at significantly higher token-generation rates. To unlock this potential, we present Spiffy, a speculative decoding algorithm to accelerate dLLM inference while provably preserving the model's output distribution. This work addresses the unique challenges involved in applying ideas from speculative decoding of AR-LLMs to dLLMs. Spiffy performs auto-speculation to eliminate the overheads of an independent draft model, structuring draft states in the form of a novel directed draft graph to take advantage of the bidirectional, blockwise nature of dLLM generation. These draft graphs are calibrated offline to maximize acceptance rates and are dynamically pruned during inference for improved computational efficiency. We present a detailed formulation of Spiffy and demonstrate its ability to accelerate LLaDA, Dream, and SDAR models in combination with KV caching and threshold-based dynamic unmasking leading to up to $8.6\times$ reduction in model inferences and $6.3\times$ acceleration in token rate.

2605.18817 2026-06-12 cs.LG Updated version

Multi-Token Residual Prediction

Yufeng Xu, Zishuo Bao, Qian Wang, Zeshen Zhang, Haoqi Zhang, Bowen Peng, Ang Li, Rahul Chalamala, Yucheng Lu

Affiliations: New York University; New York University Shanghai; Nos Research; Modal

AI summary: Proposes Multi-token Residual Prediction, a lightweight module that exploits the similarity of logit distributions at adjacent denoising steps to perform dependency-aware multi-token denoising within a single backbone forward pass, improving denoising efficiency at low cost.

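AI Chinese abstract (translated)

Diffusion Language Models (DLMs) generate text by iteratively denoising masked token sequences, offering a tradeoff between parallelism and quality compared to autoregressive models. In current practice, the number of tokens decoded per step is controlled by a confidence threshold, and quality degrades monotonically as more tokens are denoised per step. We introduce Multi-token Residual Prediction (MRP), a lightweight module that enables dependency-aware multi-token denoising within a single backbone forward pass. MRP exploits a key property of the denoising process: the logit distributions at adjacent denoising steps are remarkably similar. Rather than running the backbone again to obtain the next-step logits, MRP predicts the residual between steps from the backbone's hidden states, denoising more tokens per backbone forward pass at lower cost. We deploy MRP in two inference modes: direct decoding, which uses the corrected logits without verification for a tunable quality-speed tradeoff; and speculative decoding, which verifies MRP's proposals against the backbone for lossless acceleration. Experiments on SDAR models at the 1.7B, 4B, and 8B scales show lossless speedups of up to 1.42x in SGLang on reasoning and code-generation benchmarks.

Abstract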

Diffusion Language Models (DLMs) generate text by iteratively denoising masked token sequences, offering a tradeoff between parallelism and quality compared to autoregressive models. In current practice, the number of tokens decoded per step is controlled by a confidence threshold, and quality degrades monotonically as more tokens are denoised per step. We introduce Multi-token Residual Prediction (MRP), a lightweight module that enables dependency-aware multi-token denoising within a single backbone forward pass. MRP exploits a key property of the denoising process: the logit distributions at adjacent denoising steps are remarkably similar. Rather than running the backbone a second time to obtain the next-step logits, MRP predicts the residual between steps from the backbone's hidden states, effectively denoising more tokens per backbone forward at a fraction of the cost. We apply MRP across the two operating regimes of DLM decoding. In the high-quality-low-throughput static denoising regime, MRP serves as a drafter for speculative decoding: its proposals are verified against the backbone, yielding lossless acceleration of up to 1.4x in SGLang. In the low-quality-high-throughput dynamic denoising regime, MRP instead drives a remasking scheme that revokes over-eager reveals, recovering most of the accuracy lost to aggressive low-threshold decoding and improving accuracy by up to 22.6 points on code generation task HumanEval and 17.7 points on reasoning task GSM8K.


2605.25225 2026-06-12 cs.LG cs.AI 版本更新

Transformer Field Theory: A Response-Theoretic Approach to Mechanistic Interpretability

用于Transformer修补和机制可解释性的连续深度场论

David N. Olivieri, Antonio F. Pérez Rodríguez

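Affiliations: Universidade de Vigo; Independent Researcher

AI summary: Proposes a field-theoretic framework that treats the residual stream as a depth-token field and organizes and predicts Transformer activation-patching interventions through localized source insertion, sensitivity-field prediction, empirical Green-function responses, and an adjoint variational problem. The forward response theory is validated on GPT-2-style autoregressive Transformers.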

AI中文摘要

机制可解释性通常使用激活修补、因果追踪、路径修补和引导方向来揭示Transformer激活空间中行为有意义的子空间。本文发展了一个场论框架来组织和预测此类干预。将残差流视为深度-标记场，我们将修补公式化为局部源插入，修补效应作为灵敏度场预测，下游传播作为经验格林函数响应，修补选择作为伴随变分问题。实验上，我们通过在GPT-2风格自回归Transformer中应用局部残差场干预并观察诱导的残差场差异和logit差异响应来测试前向响应理论。我们识别出有界的局部线性区域；从跨残差站点的一阶灵敏度预测修补效应；测量跨深度和标记位置的结构化各向异性传播；从高灵敏度站点和切片格林算子构建响应描述；并表明提示诱导的残差位移可以传递答案行为。这些结果将响应对象（即灵敏度、传播场和格林算子切片）确立为组织修补实验的实用语言，以及制定修补站点推断和跨尺度迁移的前向数学基础。

英文摘要

Mechanistic interpretability often studies Transformer behavior by intervening on internal activations through activation patching, causal tracing, path patching, and steering directions. This paper develops Transformer Field Theory: a response-theoretic framework in which the residual stream of a fixed forward pass is treated as a Transformer field over layer depth and token position. In this formulation, patching becomes a localized source insertion into the Transformer field, first-order sensitivity fields predict patch effects, Green functions describe downstream propagation, and patch selection is posed as an adjoint inverse problem. Empirically, we test the theory's forward response objects in GPT-2-style autoregressive Transformers. Localized Transformer-field interventions exhibit a bounded local linear regime; first-order sensitivities predict patch effects across layer-token sites; localized sources generate structured anisotropic Transformer-field propagation; high-sensitivity sites and sliced Green operators provide reduced response descriptions; and prompt-induced Transformer-field displacements partially transfer answer behavior. These results establish sensitivities, Transformer-field responses, and sliced Green operators as practical objects for organizing patching experiments, while providing the forward mathematical basis for patch-site inference and cross-scale response transfer.
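The first-order response objects named in this abstract can be written out as a short worked relation. The notation below is an assumption for illustration and is not taken from the paper: $h_{\ell,t}$ denotes the residual stream at layer $\ell$ and token $t$, $F$ a scalar readout such as a logit difference, and $\delta h_{\ell,t}$ the localized source inserted by a patch.

```latex
% Assumed notation: h_{\ell,t} is the residual stream at layer \ell and token t,
% F is a scalar readout (e.g. a logit difference), \delta h_{\ell,t} is the patch.
S_{\ell,t} = \frac{\partial F}{\partial h_{\ell,t}}, \qquad
\Delta F \approx \left\langle S_{\ell,t},\, \delta h_{\ell,t} \right\rangle, \qquad
\delta h_{\ell',t'} \approx G_{(\ell',t'),(\ell,t)}\, \delta h_{\ell,t} \quad (\ell' > \ell).
```

Under this reading, the sensitivity field $S$ predicts the patch effect to first order, and the Green operator $G$ describes how the perturbation propagates to later layers; both approximations would hold only inside the bounded local linear regime the abstract mentions.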


2606.05860 2026-06-12 cs.LG 版本更新

GenAutoML: An Agentic Framework for Dynamic Architecture Generation and Optimization in Time-Series Analysis

GenAutoML: 面向时间序列分析的动态架构生成与优化的智能体框架

Oleeviya Babu Poikarayil, Cédric Schockaert, Abdulrahman Nahhas, Christian Daase, Mursal Dawodi, Jawid Ahmad Baktash

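Affiliations: Paul Wurth S.A.; Otto-von-Guericke University; Technical University of Munich

AI summary: Proposes GenAutoML, a framework that uses large language models as neural architects to automatically generate and optimize neural network architectures for time-series forecasting and anomaly detection via a sandboxed reflection loop and a signature-aware runtime, and introduces dynamic reversible instance normalization to improve robustness under non-stationary conditions.

Comments 26 pages, 17 figures, 12 tables. Under review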

AI中文摘要

为时间序列预测和异常检测设计神经架构仍然是一项资源密集型任务，通常需要大量领域专业知识。传统的自动机器学习系统通常依赖于静态、预定义的搜索空间，限制了其适应多样数据特征的能力。我们提出GenAutoML，一个智能体框架，利用大语言模型作为神经架构师，将自然语言需求与可执行的PyTorch实现连接起来。该框架包含一个沙盒反射循环用于自主代码优化，以及一个签名感知运行时用于确保架构一致性和执行安全性。为了提升非平稳条件下的鲁棒性，我们进一步引入了动态可逆实例归一化包装器。在ETTh1、ETTm1和Weather基准上的实验表明，GenAutoML能够动态生成针对数据集特征定制的任务特定神经架构。在生成的模型中，WaveInterferenceNet实现了每个样本低于0.01毫秒的推理延迟，同时保持有竞争力的预测性能。通过强调计算效率、架构适应性和稳定的优化行为，GenAutoML使得创建适用于资源受限和延迟敏感的Edge AI部署的超轻量级神经网络成为可能。

英文摘要

Designing neural architectures for time-series forecasting and anomaly detection remains a resource-intensive task that often requires substantial domain expertise. Traditional Automated Machine Learning (AutoML) systems typically rely on static, predefined search spaces, limiting their ability to adapt to diverse data characteristics. We present GenAutoML, an agentic framework that leverages Large Language Models (LLMs) as neural architects to bridge natural-language requirements and executable PyTorch implementations. The framework incorporates a Sandboxed Reflection Loop for autonomous code refinement and a Signature-Aware Runtime that enforces architectural consistency and execution safety. To improve robustness under non-stationary conditions, we further introduce a Dynamic Reversible Instance Normalization (Dyn-RevIN) wrapper. Experiments on the ETTh1, ETTm1, and Weather benchmarks demonstrate that GenAutoML can dynamically generate task-specific neural architectures tailored to dataset characteristics. Among the generated models, WaveInterferenceNet achieves inference latency below 0.01 ms per sample while maintaining competitive predictive performance. By emphasizing computational efficiency, architectural adaptability, and stable optimization behavior, GenAutoML enables the creation of ultra-lightweight neural networks suitable for resource-constrained and latency-sensitive Edge AI deployments.
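The normalization wrapper mentioned in this abstract builds on reversible instance normalization, which can be sketched as below. This is a minimal sketch of the plain, non-learnable variant only; the abstract does not specify what makes Dyn-RevIN dynamic, and `revin_forecast` and `repeat_last` are hypothetical names.

```python
import statistics


def revin_forecast(window, model, eps=1e-5):
    """Normalize one input window, run the model, then undo the normalization."""
    mean = statistics.fmean(window)
    std = (statistics.pvariance(window) + eps) ** 0.5
    # Remove this instance's own level and scale before forecasting.
    normalized = [(x - mean) / std for x in window]
    prediction = model(normalized)
    # Restore the original level and scale on the model output.
    return [p * std + mean for p in prediction]


def repeat_last(values, horizon=2):
    """Toy stand-in for a forecasting model: repeat the last observed value."""
    return [values[-1]] * horizon


forecast = revin_forecast([10.0, 12.0, 14.0], repeat_last)
```

Because the statistics are computed per window, a shift in the series level between training and deployment is removed before the model sees the data and restored afterwards.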


2606.11255 2026-06-12 cs.LG 版本更新

Dolph2Vec: Self-Supervised Representations of Dolphin Vocalizations

Chiara Semenzin, Faadil Mustun, Roberto Dessi, Pierre Orhan, Alexis Emanuelli, Yair Lakretz, Gonzalo de Polavieja, German Sumbre

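Affiliations: École Normale Supérieure, Paris, France; Not Diamond, San Francisco, USA; Institut du Cerveau, Paris, France; Champalimaud Foundation, Lisbon, Portugal

AI summary: Proposes Dolph2Vec, the first self-supervised model trained on five years of longitudinal dolphin recordings. It significantly outperforms general-purpose baselines on signature whistle classification and whistle detection and discovers interpretable acoustic units.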

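Abstract (translated from Chinese)

Self-supervised learning (SSL) has opened new opportunities for bioacoustics by enabling scalable modeling of animal vocalizations without costly manual annotation. However, current SSL models in this field prioritize broad generalization across species and are not optimized to reveal the fine-grained structure of individual communication systems. In this work, we collect and release a novel dataset containing more than five years of longitudinal recordings from five known dolphins in a semi-natural marine environment, an unprecedented resource for studying dolphin communication. We adapt the Wav2Vec2.0 architecture (Baevski et al., 2020) to this domain and introduce Dolph2Vec, the first large-scale, species-specific SSL model trained exclusively on this data. We benchmark the model on two biologically relevant tasks: signature whistle classification and whistle detection. Dolph2Vec significantly outperforms general-purpose baselines on both tasks. Beyond performance, we show that the learned embeddings and codebook structure capture interpretable acoustic units that align with dolphin whistle categories and possibly with sub-whistle structure, enabling fine-grained analysis of communication patterns. Our findings demonstrate how SSL can serve as both a model and a scientific tool for exploring hypotheses in animal communication research.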

BrainPro: Towards Large-Scale Brain State-Aware EEG Representation Learning

Yi Ding, Muyun Jiang, Weibang Jiang, Shuailei Zhang, Xinliang Zhou, Chenyu Liu, Shanglin Li, Yong Li, Cuntai Guan

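Affiliations: Nanyang Technological University; Shanghai Jiao Tong University; Advanced Telecommunications Research Institute International; Southeast University

AI summary: Proposes BrainPro, a model that learns shared and state-specific representations through retrieval-based spatial alignment and a brain state-decoupling module, achieving state-of-the-art performance on nine public BCI datasets.

Comments 31 pages, 11 figures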

AI中文摘要

脑电图（EEG）反映了潜在的脑状态，其活动分布在大脑区域并表现为头皮上的空间模式。学习这些空间结构化的、与状态相关的模式需要跨数据集的一致空间表征。然而，现有的EEG基础模型通常基于自注意力机制，该机制不保留位置特定信息，并且难以对齐不同通道配置记录的信号。此外，脑状态包含共享和状态特定的区域活动，这表明学习神经生理学上合理的、状态感知的表征可以补充当前模型所针对的共享表征，并改善下游解码。为了解决这些局限性，我们提出了BrainPro，一个大型EEG模型，它结合了基于检索的空间学习机制用于跨布局空间对齐，以及一个脑状态解耦模块，通过并行编码器和区域感知重建学习共享和状态特定表征。在大型EEG语料库上预训练后，BrainPro在跨越情感、运动、语音、压力、精神疾病和注意力任务的九个公共BCI数据集上实现了最先进的性能。对空间滤波器、通道丢失鲁棒性和编码器贡献的分析进一步验证了其空间对齐和状态感知路径的有效性。这些结果表明，BrainPro实现了学习空间模式的更好可解释性，并产生了有益于多种EEG解码任务的表征。

英文摘要

Electroencephalography (EEG) reflects underlying brain states, whose activities are distributed across brain regions and manifest as spatial patterns on the scalp. Learning these spatially structured, state-related patterns requires consistent spatial representations across datasets. However, existing EEG foundation models are typically based on self-attention, which does not preserve location-specific information and struggles to align signals recorded with different channel configurations. Moreover, brain states contain both shared and state-specific regional activity, suggesting that learning neurophysiologically plausible, state-aware representations can complement the shared representations targeted by current models and improve downstream decoding. To address these limitations, we propose BrainPro, a large EEG model that combines a retrieval-based spatial learning mechanism for cross-layout spatial alignment with a brain state-decoupling module that learns both shared and state-specific representations through parallel encoders and region-aware reconstruction. Pre-trained on a large EEG corpus, BrainPro achieves state-of-the-art performance across nine public BCI datasets spanning emotion, motor, speech, stress, mental disease, and attention tasks. Analyses of spatial filters, channel-drop robustness, and encoder contributions further validate the effectiveness of its spatial alignment and state-aware pathways. These results show that BrainPro achieves improved interpretability of learned spatial patterns and produces representations that benefit diverse EEG decoding tasks.


2603.08505 2026-06-12 cs.LG cs.AI 版本更新

Echo2ECG: Enhancing ECG Representations with Cardiac Morphology from Multi-View Echos

Echo2ECG：利用多视角超声心动图的心脏形态增强心电图表示

Michelle Espranita Liman, Özgün Turgut, Alexander Müller, Eimo Martens, Daniel Rueckert, Philip Müller

Affiliations: Chair for AI in Healthcare and Medicine, Technical University of Munich (TUM) and TUM University Hospital; Department of Cardiology, TUM University Hospital; Department of Computing, Imperial College London; Munich Center for Machine Learning (MCML)

AI summary: Proposes Echo2ECG, a multimodal self-supervised learning framework that enriches ECG representations with multi-view echocardiograms. It outperforms existing methods on structural phenotype classification and Echo retrieval while being 18x smaller than the largest baseline.

Comments Accepted at MICCAI 2026

AI中文摘要

心电图（ECG）是一种低成本、广泛使用的模态，通过捕捉心脏电活动来诊断电异常（如房颤）。然而，它无法直接测量心脏形态表型，如左心室射血分数（LVEF），这通常需要超声心动图（Echo）。从ECG预测这些表型将实现早期、可及的健康筛查。现有的自监督方法通过将ECG与单视角Echo对齐而遭受表示不匹配，单视角Echo仅捕捉局部、空间受限的解剖快照。为解决此问题，我们提出Echo2ECG，一种多模态自监督学习框架，利用多视角Echo中捕捉的心脏形态结构丰富ECG表示。我们在两个根本上需要形态信息的临床相关任务上评估Echo2ECG作为ECG特征提取器：（1）跨三个数据集的结构性心脏表型分类，以及（2）使用ECG查询检索具有相似形态特征的Echo研究。我们的提取的ECG表示在两个任务上始终优于最先进的单模态和多模态基线，尽管模型大小仅为最大基线的1/18。这些结果表明Echo2ECG是一个鲁棒、强大的ECG特征提取器。我们的代码可从此https URL获取。

英文摘要

Electrocardiography (ECG) is a low-cost, widely used modality for diagnosing electrical abnormalities like atrial fibrillation by capturing the heart's electrical activity. However, it cannot directly measure cardiac morphological phenotypes, such as left ventricular ejection fraction (LVEF), which typically require echocardiography (Echo). Predicting these phenotypes from ECG would enable early, accessible health screening. Existing self-supervised methods suffer from a representational mismatch by aligning ECGs to single-view Echos, which only capture local, spatially restricted anatomical snapshots. To address this, we propose Echo2ECG, a multimodal self-supervised learning framework that enriches ECG representations with the heart's morphological structure captured in multi-view Echos. We evaluate Echo2ECG as an ECG feature extractor on two clinically relevant tasks that fundamentally require morphological information: (1) classification of structural cardiac phenotypes across three datasets, and (2) retrieval of Echo studies with similar morphological characteristics using ECG queries. Our extracted ECG representations consistently outperform those of state-of-the-art unimodal and multimodal baselines across both tasks, despite being 18x smaller than the largest baseline. These results demonstrate that Echo2ECG is a robust, powerful ECG feature extractor. Our code is accessible at https://github.com/michelleespranita/Echo2ECG.


2603.14483 2026-06-12 cs.LG 版本更新

Disentangling Dynamical Systems: Causal Representation Learning Meets Local Sparse Attention

解耦动力系统：因果表示学习遇见局部稀疏注意力

Markus W. Baumgartner, Anson Lei, Joe Watson, Ingmar Posner

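Affiliations: Applied Artificial Intelligence Lab, Oxford Robotics Institute, Oxford, UK

AI summary: Proposes a method combining causal representation learning with local sparse attention that disentangles system parameters from raw trajectory data without structural assumptions, with identifiability guaranteed by a graphical criterion.

Comments Presented as an Oral at the 5th Conference on Causal Learning and Reasoning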

Journal ref: Proceedings of Machine Learning Research 323, 2026

AI中文摘要

参数化系统辨识方法从数据中估计显式定义的物理系统的参数。然而，它们仍然受限于需要提供显式函数空间，通常通过基于可用领域知识预定义的候选函数库。相比之下，深度学习能够以高保真度对广泛复杂性的系统进行建模，但黑箱函数逼近通常无法产生揭示系统结构的显式描述性或解耦表示。我们开发了一种新的可辨识性定理，利用因果表示学习，在没有结构假设的情况下发现系统参数的解耦表示。我们推导了一个图论准则，指定何时系统参数可以从原始轨迹数据中唯一解耦，直至置换和微分同胚。关键的是，我们的分析表明，全局因果结构为考虑局部状态依赖因果结构时可实现的解耦保证提供了下界。我们将系统参数识别实例化为变分推断问题，利用稀疏正则化变换器来发现状态依赖的因果结构。我们在四个合成领域上实证验证了我们的方法，证明了其恢复基线方法无法恢复的高度解耦表示的能力。与我们的理论分析一致，我们的结果证实了强制局部因果结构通常对于完全可辨识性是必要的。

英文摘要

Parametric system identification methods estimate the parameters of explicitly defined physical systems from data. Yet, they remain constrained by the need to provide an explicit function space, typically through a predefined library of candidate functions chosen via available domain knowledge. In contrast, deep learning can demonstrably model systems of broad complexity with high fidelity, but black-box function approximation typically fails to yield explicit descriptive or disentangled representations revealing the structure of a system. We develop a novel identifiability theorem, leveraging causal representation learning, to uncover disentangled representations of system parameters without structural assumptions. We derive a graphical criterion specifying when system parameters can be uniquely disentangled from raw trajectory data, up to permutation and diffeomorphism. Crucially, our analysis demonstrates that global causal structures provide a lower bound on the disentanglement guarantees achievable when considering local state-dependent causal structures. We instantiate system parameter identification as a variational inference problem, leveraging a sparsity-regularised transformer to uncover state-dependent causal structures. We empirically validate our approach across four synthetic domains, demonstrating its ability to recover highly disentangled representations that baselines fail to recover. Corroborating our theoretical analysis, our results confirm that enforcing local causal structure is often necessary for full identifiability.


2604.27277 2026-06-12 cs.LG cs.AI cs.CV 版本更新

BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning

BrainDINO：一种用于通用临床表征学习的脑MRI基础模型

Yizhou Wu, Shansong Wang, Yuheng Li, Mojtaba Safari, Mingzhe Hu, Chih-Wei Chang, Harini Veeraraghavan, Xiaofeng Yang

Affiliations: Department of Radiation Oncology and Winship Cancer Institute, Emory University; Department of Radiation and Cellular Oncology, The University of Chicago; Department of Electrical and Computer Engineering, Georgia Institute of Technology; Department of Biomedical Engineering, Georgia Institute of Technology; Department of Biomedical Informatics, Emory University; Department of Medical Physics, Memorial Sloan Kettering Cancer Center

AI summary: Proposes BrainDINO, a self-distilled foundation model trained on approximately 6.6 million unlabeled axial slices. With a frozen encoder and lightweight task heads, it matches or exceeds baselines across a range of brain MRI tasks, with especially clear advantages in low-label settings.

Comments 25 pages, 5 figures

AI中文摘要

脑MRI支撑着广泛的神经科学和临床应用，然而大多数基于学习的方法仍针对特定任务且需要大量标注数据。本文表明，单一的自监督表征可以泛化到异质的脑MRI终点。我们训练了BrainDINO，一个自蒸馏的基础模型，使用了来自20个数据集的约660万张未标记轴向切片，这些数据集涵盖了人群、疾病和采集设置的广泛变异。通过使用冻结编码器加轻量任务头，BrainDINO支持肿瘤分割、神经退行性和神经发育性疾病分类、脑年龄估计、卒中后时间预测、分子状态预测、MRI序列分类和生存建模等任务的迁移。在各种任务和监督机制下，BrainDINO始终等于或超过自然图像和MRI特定自监督基线，在标签稀缺时尤其具有优势。表征分析进一步显示，在缺乏任务特定监督的情况下，特征结构具有解剖学组织和病理敏感性。我们的发现表明，大规模切片级自监督学习可以产生统一的脑MRI表征，支持多样化的神经影像任务，无需体积预训练或全网络微调，为稳健且数据高效的脑影像分析建立了可扩展的基础。代码可在 https://github.com/mclwu22/BrainDINO 获取。

英文摘要

Brain MRI underpins a wide range of neuroscientific and clinical applications, yet most learning-based methods remain task-specific and require substantial labeled data. Here we show that a single self-supervised representation can generalize across heterogeneous brain MRI endpoints. We trained BrainDINO, a self-distilled foundation model, on approximately 6.6 million unlabeled axial slices from 20 datasets encompassing broad variation in population, disease, and acquisition setting. Using a frozen encoder with lightweight task heads, BrainDINO supported transfer across tumor segmentation, neurodegenerative and neurodevelopmental conditions classification, brain age estimation, post-stroke temporal prediction, molecular status prediction, MRI sequence classification, and survival modeling. Across tasks and supervision regimes, BrainDINO consistently equaled or exceeded natural-image and MRI-specific self-supervised baselines, with particularly strong advantages under label scarcity. Representation analyses further showed anatomically organized and pathology-sensitive feature structure in the absence of task-specific supervision. Our findings indicate that large-scale slice-wise self-supervised learning can yield a unified brain MRI representation that supports diverse neuroimaging tasks without volumetric pretraining or full-network fine-tuning, establishing a scalable foundation for robust and data-efficient brain imaging analysis. Code is available at https://github.com/mclwu22/BrainDINO


2606.10678 2026-06-12 cs.LG 版本更新

One Step Closer to Ground Truth: A Multi-Scale Residual-Aware Representation Learning Pipeline for Predicting Time Series Data

更接近真实：一种多尺度残差感知表示学习管道用于时间序列预测

Amrijit Biswas, Mustafa Kamal, Robin Krambroeckers, M. M. Lutfe Elahi, Sifat Momen, Nabeel Mohammed, Shafin Rahman

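Affiliations: RobotBulls Labs; North South University

AI summary: Proposes a two-stage, model-agnostic framework that explicitly decouples forecasting from residual learning and uses a meta-corrector to dynamically model structured error patterns, improving Transformer forecasting accuracy.

Comments Accepted at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26)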

AI中文摘要

近年来，基于Transformer的模型已成为时间序列预测的主要范式，利用自注意力机制捕获长程依赖关系。尽管取得了成功，但这些单阶段预测架构由于结构差异、未建模的随机成分或多尺度时间表示不足，表现出持续的系统性残差偏差。当残差被视为不可约噪声时，这一局限性依然存在，阻碍了对结构化误差模式的自适应校正。为解决这一问题，我们引入了一个两阶段、模型无关的框架，将预测和残差学习显式解耦为不同的表示学习阶段。基础Transformer首先生成初始预测。随后，专用的元校正器动态建模跨多元通道的结构化误差模式，保留跨变量依赖关系，并迭代修正基础Transformer的残差偏差。通过将该管道形式化为假设空间扩展，我们的框架解决了单阶段架构固有的近似局限性，消除了对限制性假设的依赖，并实现了复杂误差动态的端到端学习。在八个流行的基准数据集上使用既定协议进行评估，我们的方法达到了最先进的性能，在标准指标（MSE、MAE）上有显著改进。结果表明，该框架能够减轻系统性偏差，增强对复杂时间动态的鲁棒性，推进了基于Transformer的预测模型的实际应用。

英文摘要

Transformer-based models have emerged as leading paradigms in time-series forecasting in recent years, employing self-attention mechanisms to capture long-range dependencies. Despite their success, these single-stage forecasting architectures exhibit persistent systematic residual biases arising from structural discrepancies, unmodeled stochastic components, or inadequate multi-scale temporal representations. This limitation persists when residuals are treated as irreducible noise, precluding adaptive correction of structured error patterns. To address this limitation, we introduce a two-stage, model-agnostic framework that explicitly decouples forecasting and residual learning into distinct stages of representation learning. A base transformer first generates the initial predictions. Subsequently, a dedicated meta-corrector dynamically models structured error patterns across multivariate channels, preserves cross-variable dependencies, and iteratively refines the residual bias of the base transformer. By formalizing this pipeline as a hypothesis space expansion, our framework addresses approximation limitations inherent in single-stage architectures, removes reliance on restrictive assumptions, and enables end-to-end learning of complex error dynamics. Evaluated on eight popular benchmark datasets using established protocols, our approach achieves state-of-the-art performance, with significant improvements in standard metrics (MSE, MAE). The results demonstrate the framework's ability to mitigate systematic biases and enhance robustness to complex temporal dynamics, advancing the practical applicability of transformer-based forecasting models.


2606.11190 2026-06-12 cs.LG 版本更新

ReCal: Reward Calibration for Reinforcement Learning-Based LLM Routing

Qihang Yu, Hanwen Tong, Zhengqi Zhang, Bo Zheng, Feng Wei, Shengyu Zhang, Zemin Liu, Fei Wu

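Affiliations: Zhejiang University; Ant Group; Shanghai AI Laboratory

AI summary: Proposes ReCal, a framework that calibrates reward signals through hierarchical reward decomposition and distribution-aware optimization, addressing multi-objective conflicts and optimization bias across heterogeneous tasks and improving LLM routing performance and training stability.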

AI中文摘要

大型语言模型（LLM）路由已成为一种有效范式，通过动态模型和推理策略选择来利用多个LLM的互补优势。最近的基于强化学习（RL）的路由方法通过从交互反馈中优化路由策略，进一步提高了路由质量。然而，在难度不同的异质性任务下，它们仍然难以提供信息丰富且可比较的学习信号。在实践中，多个目标（如正确性、格式行为）被聚合为单个标量奖励，导致模糊的信用分配和冲突的优化信号。此外，奖励信号在不同实例间表现出显著变异性，其中一些实例产生更高或更可变的奖励，引入了偏向于平凡样本而非信息性样本的优化偏差。为了解决这些问题，我们提出了\textbf{ReCal}，一个用于基于RL的LLM路由的\textbf{\underline{Re}}ward \textbf{\underline{Cal}}ibration（奖励校准）框架。我们首先引入了一种具有分量式优势估计的分层奖励分解机制。我们进一步提出了一种分布感知的优化策略，通过方差感知重加权和每数据集归一化来校准优化变异性。在七个数据集上的实验表明，ReCal在路由性能和训练稳定性上持续优于基线方法。代码可在该网址获取。

英文摘要

Large language model (LLM) routing has emerged as an effective paradigm for leveraging the complementary strengths of multiple LLMs through dynamic model and reasoning-strategy selection. Recent reinforcement learning (RL)-based routing methods further improve routing quality by optimizing routing policies from interaction feedback. However, they still struggle to provide informative and comparable learning signals under heterogeneous tasks with varying difficulty. In practice, multiple objectives (e.g., correctness, format behavior) are aggregated into a single scalar reward, leading to ambiguous credit assignment and conflicting optimization signals. Moreover, reward signals exhibit significant variability across instances, where some instances produce higher or more variable rewards, introducing optimization bias that favors trivial samples over informative ones. To address these issues, we propose ReCal, a Reward Calibration framework for RL-based LLM routing. We first introduce a hierarchical reward decomposition mechanism with component-wise advantage estimation. We further propose a distribution-aware optimization strategy that calibrates optimization variability through variance-aware reweighting and per-dataset normalization. Experiments on seven datasets demonstrate that ReCal consistently improves routing performance and training stability over baselines. Code is available at https://anonymous.4open.science/r/ReCal.
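The per-dataset normalization step named in this abstract can be illustrated with a small sketch. It assumes a plain z-score within each dataset, which is one common way to make rewards comparable across tasks of different difficulty; the paper's exact normalization, its variance-aware reweighting, and its component-wise advantages are not specified here, and `per_dataset_normalize` is a hypothetical name.

```python
import statistics
from collections import defaultdict


def per_dataset_normalize(samples, eps=1e-8):
    """Z-score each reward against the other rewards from the same dataset.

    samples: list of (dataset_name, reward) pairs.
    Returns the normalized rewards in the original order.
    """
    by_dataset = defaultdict(list)
    for name, reward in samples:
        by_dataset[name].append(reward)
    # Per-dataset mean and population standard deviation.
    stats = {
        name: (statistics.fmean(rewards), statistics.pstdev(rewards))
        for name, rewards in by_dataset.items()
    }
    return [(reward - stats[name][0]) / (stats[name][1] + eps) for name, reward in samples]


batch = [("easy", 0.9), ("easy", 1.0), ("hard", 0.1), ("hard", 0.3)]
normalized = per_dataset_normalize(batch)
```

After normalization, the better of the two "hard" samples carries roughly the same signal as the better "easy" sample, even though its raw reward is much lower.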


2606.12485 2026-06-12 cs.LG cs.AI 新提交

Individual Control Barrier Function-Guided Diffusion Models for Safe Offline Multi-Agent Reinforcement Learning

Qingyun Guo, Junyi Shi, Jianuo Huang, Tianyu Shi

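Affiliations: Department of Electrical Engineering and Automation, Aalto University; School of Computing and Data Science, Xiamen University Malaysia; Department of Computer Science, University of Toronto

AI summary: Proposes an offline multi-agent reinforcement learning algorithm that embeds neural individual control barrier functions into a diffusion model and recovers the control policy through inverse dynamics, substantially improving the safety of generated trajectories while preserving reward.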

Comments Accepted to the 23rd IFAC World Congress, 2026

2606.12780 2026-06-12 cs.LG cs.CL 新提交

ProPlay: Procedural World Models for Self-Evolving LLM Agents

ProPlay: 用于自我进化LLM智能体的程序化世界模型

Yijun Ma, Zehong Wang, Yiyang Li, Ziming Li, Xiaoguang Guo, Weixiang Sun, Chuxu Zhang, Yanfang Ye

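Affiliations: University of Notre Dame; University of Connecticut

AI summary: Proposes ProPlay, a procedural world model that uses procedure-level preplay and a causal procedure graph to let LLM agents self-evolve in partially observable environments without external supervision.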

AI中文摘要

自我进化智能体应能在无外部监督下通过交互改进，但在部分可观测环境中仍困难，智能体必须主动探索、从有限反馈中学习，并决定何时信任先前经验。现有的LLM智能体方法通常依赖记忆或规划模块，但很少在它们之间闭环以持续完善对环境动态的内部理解。我们提出ProPlay，一种程序化世界模型，支持程序级预演，智能体可利用学到的世界知识排练未来的程序路径。ProPlay不将经验表示为孤立的规则或低层动作约束，而是将成功轨迹抽象为程序，并在捕获任务阶段间因果转换的程序图中组织它们。每个转换与一个可靠性记录嵌入相关联，以从过去结果中估计其任务特定贡献。在每个回合前，ProPlay在已知图结构上模拟未来程序轨迹作为结构化软指导；执行后，它利用环境反馈精炼图。在公开基准上的实验表明，ProPlay在环境理解和自我进化能力上持续优于强基线。我们的代码已在此https URL发布。

英文摘要

Self-evolving agents are expected to improve through interaction without external supervision, but this remains difficult in partially observable environments where agents must explore actively, learn from limited feedback, and decide when to trust prior experience. Existing LLM-agent methods often rely on memory or planning modules, yet they rarely close the loop between them to continually refine an internal understanding of environment dynamics. We introduce ProPlay, a procedural world model that supports procedure-level preplay, where agents can rehearse future procedural paths using the learned world knowledge. Rather than representing experience as isolated rules or low-level action constraints, ProPlay abstracts successful trajectories into procedures and organizes them in a procedure graph that captures causal transitions among task stages. Each transition is associated with a reliability record embedding to estimate its task-specific contribution from past outcomes. Before each episode, ProPlay simulates future procedural trajectories over known graph structures as structured soft guidance; after execution, it refines the graph using environment feedback. Experiments on public benchmarks show that ProPlay consistently improves environment understanding and self-evolution capability over strong baselines. Our code has been released at https://github.com/antman9914/proplay.


2606.13461 2026-06-12 cs.LG cs.CV 新提交

Reinforcement Learning for Neural Model Editing

神经模型编辑的强化学习

Shaivi Malik

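Affiliations: Shaivi Malik

AI summary: Formulates neural model editing as a reinforcement learning problem in which editing policies are learned from reward feedback, with good results on bias mitigation and machine unlearning.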

AI中文摘要

编辑预训练神经网络需要针对特定目标定制的专用算法。设计此类算法通常耗时且需要大量精力。我们提出了一个探索性框架，将神经模型编辑形式化为强化学习问题，其中智能体使用奖励反馈修改模型。我们引入了两个环境：MaskWorld，其中智能体以乘法方式缩放权重；以及ShiftWorld，其中智能体应用加法权重更新。奖励函数结合了效用保持目标和任务特定编辑目标，使智能体能够在保持整体模型性能的同时学习有针对性的修改。我们在文本分类中的偏见缓解和图像分类中的机器遗忘上评估了该框架，这两者传统上都依赖于专用算法。我们的结果表明，在遗忘任务中，学习到的策略将遗忘集准确率降至接近0%，同时保留集准确率保持在90%以上。在偏见缓解设置中，学习到的策略将偏见相关性能提高了5%以上，同时保持了一般分类效用。我们的发现表明，神经模型编辑可以转化为强化学习问题，从而可以从奖励反馈中学习编辑策略，而不是为每个任务手动设计。

英文摘要

Editing pretrained neural networks requires specialized algorithms tailored to specific objectives. Designing such algorithms is often time-consuming and demands significant effort. We present an exploratory framework that formulates neural model editing as a reinforcement learning problem, where agents modify models using reward feedback. We introduce two environments: MaskWorld, where agents scale weights multiplicatively, and ShiftWorld, where agents apply additive weight updates. The reward function combines a utility-preservation objective with a task-specific editing objective, enabling agents to learn targeted modifications while maintaining overall model performance. We evaluate the framework on bias mitigation in text classification and machine unlearning in image classification, both of which traditionally rely on specialized algorithms. Our results show that the learned policies reduce forget set accuracy to nearly 0% while preserving over 90% retain set accuracy on the unlearning task. In the bias mitigation setting, the learned policies improve bias-related performance by more than 5% while maintaining general classification utility. Our findings show that neural model editing can be cast as a reinforcement learning problem, allowing editing policies to be learned from reward feedback rather than manually engineered for each task.
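The two action spaces described in this abstract differ only in how an action is applied to the weights, which a short sketch makes concrete. This is an illustration under stated assumptions: the function names are hypothetical, weights are flattened to a plain list, and the reward is assumed to be a weighted sum of the two objectives, since the abstract says only that they are combined.

```python
def mask_world_step(weights, scales):
    """MaskWorld-style action: scale each weight multiplicatively."""
    return [w * s for w, s in zip(weights, scales)]


def shift_world_step(weights, deltas):
    """ShiftWorld-style action: apply an additive update to each weight."""
    return [w + d for w, d in zip(weights, deltas)]


def editing_reward(utility_score, edit_score, trade_off=1.0):
    """Assumed reward: utility preservation plus a weighted editing objective."""
    return utility_score + trade_off * edit_score


weights = [1.0, -2.0, 0.5]
masked = mask_world_step(weights, [1.0, 0.0, 2.0])    # a zero scale prunes a weight
shifted = shift_world_step(weights, [0.5, 0.5, -0.5])
reward = editing_reward(utility_score=0.9, edit_score=0.8, trade_off=0.5)
```

A multiplicative action can switch a weight off but cannot revive a zero weight, whereas an additive action can move any weight in either direction.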


2606.13473 2026-06-12 cs.LG cs.AI cs.CL 新提交

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

MaxProof: 通过生成-验证器强化学习与群体级测试时扩展实现数学证明规模化

Jiacheng Chen, Xinyu Zhang, Shunkai Zhang, Yanmohan Wang, Lin Li, Tiancheng Qin, Qin Wang, Zhengmao Zhu, Tianle Li, Jingyang Li, Zehan Li, Binyang Jiang, Jin Zhu, Han Ding, Fei Yu, Chenyu Du, Zijian Song, Jiayuan Song, Zhi Zhang, Yunan Huang, Weiyu Cheng, Pengyu Zhao, Yu Cheng

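Affiliations: MiniMax; The Chinese University of Hong Kong; Fudan University; Peking University; Tsinghua University

AI summary: Proposes MaxProof, a framework combining generative-verifier reinforcement learning with population-level test-time scaling that achieves competition-level mathematical proof on the MiniMax-M3 series, exceeding the human gold-medal threshold on IMO 2025 and USAMO 2026.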

2606.13076 2026-06-12 cs.MA cs.GT cs.LG 交叉投稿

DiffCoord: Differentiable Coordination for Distributed Multi-Agent Trajectory Optimization

Bingheng Wang, Yichao Gao, Tianchen Sun, Shanker Ajay, Lin Zhao

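Affiliations: Department of Electrical and Computer Engineering, National University of Singapore

AI summary: Proposes DiffCoord, a framework that jointly optimizes the coupled parameters of a truncated ADMM-DDP pipeline through end-to-end meta-learning, uses agent-wise neural networks for task adaptation, and scales to varying numbers of agents. Validated on a cooperative aerial transport system, it reduces per-agent gradient computation time by up to 70% compared with existing methods.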

AI中文摘要

将交替方向乘子法（ADMM）与微分动态规划（DDP）相结合，为分布式多智能体轨迹优化提供了一个可扩展的框架。在实践中，ADMM通常被截断以提高计算效率，这紧密耦合了原本分别控制协调质量和任务性能的参数。在本文中，我们提出了可微协调（DiffCoord），一个统一框架，联合元学习截断ADMM-DDP管道的这些耦合参数。这些参数由智能体神经网络生成以实现任务自适应，并且同构智能体之间共享相同的网络，从而能够扩展到不同数量的智能体。我们通过端到端微分ADMM-DDP管道实现了高效的元学习。值得注意的是，这产生了一个辅助的ADMM-LQR分布式梯度求解器，用于计算和协调关于这些参数的元梯度。该求解器继承了管道的计算结构，使得关键计算结果可以重用，并能够在智能体和轨迹时间线上高效并行化。我们通过协作空中运输系统的数值和物理实验验证了DiffCoord，该系统在狭窄空间中重新配置四旋翼编队以实现安全的六自由度负载操作。它能够鲁棒地适应变化的团队规模和负载动力学，同时与最先进的轨迹梯度方法相比，将每智能体梯度计算时间减少高达70%。

英文摘要

Integrating the Alternating Direction Method of Multipliers (ADMM) with Differential Dynamic Programming (DDP) provides a scalable framework for distributed multi-agent trajectory optimization. In practice, ADMM is typically truncated for computational efficiency, tightly coupling parameters that would otherwise separately govern coordination quality and task performance. In this paper, we propose Differentiable Coordination (DiffCoord), a unified framework that jointly meta-learns these coupled parameters for the truncated ADMM-DDP pipeline. These parameters are generated by agent-wise neural networks for task adaptation, and the same networks are shared among isomorphic agents to enable scalability to varying agent counts. We achieve efficient meta-learning by differentiating the ADMM-DDP pipeline end-to-end. Notably, this yields an auxiliary ADMM-LQR distributed gradient solver that computes and coordinates meta-gradients with respect to these parameters. This solver inherits the computational structure of the pipeline, enabling reuse of key computation results and efficient parallelization over agents and along trajectory horizons. We validate DiffCoord through numerical and physical experiments on a cooperative aerial transport system, where it reconfigures quadrotor formations for safe 6-DoF load manipulation in tight spaces. It adapts robustly to varying team sizes and load dynamics, while reducing per-agent gradient computation time by up to 70% compared with state-of-the-art trajectory-gradient methods.


2603.11395 2026-06-12 cs.LG cs.AI 版本更新

ARROW: Augmented Replay for RObust World models

ARROW：增强重放用于鲁棒世界模型

Abdulaziz Alyahya, Abdallah Al Siyabi, Markus R. Ernst, Luke Yang, Levin Kuhlmann, Gideon Kowadlo

Affiliations: Imam Mohammad Ibn Saud Islamic University (IMSIU); Monash University; University of New South Wales, Sydney; Cerenaut

AI summary: Proposes ARROW, a model-based continual reinforcement learning algorithm that uses a memory-efficient replay buffer to reduce catastrophic forgetting, evaluated on tasks both without and with shared structure.

Comments 36 pages and 11 figures (includes Appendix)

Journal ref: Transactions on Machine Learning Research, 2026

AI中文摘要

持续强化学习挑战智能体在获取新技能的同时保留已学习技能，以提高过去和未来任务的性能。大多数现有方法依赖于无模型方法和重放缓冲区来缓解灾难性遗忘；然而，这些解决方案往往面临显著的可扩展性挑战，因为内存需求大。受神经科学启发，其中大脑将经验重放给预测世界模型而不是直接重放到策略中，我们提出了ARROW（增强重放用于鲁棒世界模型），一种扩展DreamerV3的基于模型的持续RL算法，具有内存高效、分布匹配的重放缓冲区。与标准固定大小的FIFO缓冲区不同，ARROW维护两个互补的缓冲区：一个短期缓冲区用于近期经验，一个长期缓冲区通过智能采样保留任务多样性。我们在两个具有挑战性的持续RL设置中评估了ARROW：无共享结构任务（Atari）和有共享结构任务（Procgen CoinRun变体）。与相同大小的无模型和基于模型的基线方法相比，ARROW在无共享结构任务中表现出显著减少的遗忘，同时保持可比的前向转移。我们的发现突显了基于模型的RL和生物启发方法在持续强化学习中的潜力，值得进一步研究。

英文摘要

Continual reinforcement learning challenges agents to acquire new skills while retaining previously learned ones, with the goal of improving performance in both past and future tasks. Most existing approaches rely on model-free methods with replay buffers to mitigate catastrophic forgetting; however, these solutions often face significant scalability challenges due to large memory demands. Drawing inspiration from neuroscience, where the brain replays experiences to a predictive World Model rather than directly to the policy, we present ARROW (Augmented Replay for RObust World models), a model-based continual RL algorithm that extends DreamerV3 with a memory-efficient, distribution-matching replay buffer. Unlike standard fixed-size FIFO buffers, ARROW maintains two complementary buffers: a short-term buffer for recent experiences and a long-term buffer that preserves task diversity through intelligent sampling. We evaluate ARROW on two challenging continual RL settings: tasks without shared structure (Atari), and tasks with shared structure, where knowledge transfer is possible (Procgen CoinRun variants). Compared to model-free and model-based baselines with replay buffers of the same size, ARROW demonstrates substantially less forgetting on tasks without shared structure, while maintaining comparable forward transfer. Our findings highlight the potential of model-based RL and bio-inspired approaches for continual reinforcement learning, warranting further research.
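The two-buffer layout described in this abstract can be sketched as below. This is a minimal sketch, not ARROW's implementation: the long-term buffer here uses reservoir sampling as a stand-in for the paper's unspecified "intelligent sampling", and the class name `DualReplayBuffer` and its parameters are hypothetical.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Short-term FIFO buffer plus a long-term buffer filled by reservoir sampling."""

    def __init__(self, short_capacity, long_capacity, seed=0):
        self.short_term = deque(maxlen=short_capacity)  # keeps only recent items
        self.long_term = []
        self.long_capacity = long_capacity
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, experience):
        self.short_term.append(experience)
        self.seen += 1
        if len(self.long_term) < self.long_capacity:
            self.long_term.append(experience)
        else:
            # Reservoir sampling: every experience seen so far has an equal
            # chance of being retained, so old tasks stay represented.
            slot = self.rng.randrange(self.seen)
            if slot < self.long_capacity:
                self.long_term[slot] = experience

    def sample(self, batch_size):
        pool = list(self.short_term) + self.long_term
        return [self.rng.choice(pool) for _ in range(batch_size)]


buffer = DualReplayBuffer(short_capacity=5, long_capacity=10)
for step in range(100):
    buffer.add(step)
batch = buffer.sample(4)
```

Memory stays fixed at the sum of the two capacities regardless of how many tasks are seen, which is the property a plain FIFO buffer of the same size cannot combine with coverage of early tasks.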


2603.12530 2026-06-12 cs.LG 版本更新

Net-Ev$^2$: A Generative Simulator for Network Event Evolution

Guangyu Wang, Zhaonan Wang

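Affiliations: NYU Shanghai

AI summary: Proposes Net-Ev$^2$, a generative simulator that combines event cues with network topology and simulates network event evolution through structure-guided masked pre-training and a topology-aware diffusion process, achieving state-of-the-art performance on multiple road-network datasets.

Comments Accepted by KDD 2026 Research Track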

DOI: 10.1145/3770855.3817972
Journal ref: In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

AI中文摘要

减少现实世界的试错一直是决策的核心目标，生成式模拟器通过建模未来状态的演化推进了这一目标。一个更具挑战性且更有意义的任务是模拟扰动事件（如事故）如何通过网络传播其影响。现有方法在模拟网络事件演化时，未能同时建模事件的结构化属性和非结构化语义，也未能捕捉拓扑结构。因此，我们提出Net-Ev$^2$（$\underline{\textbf{Net}}$work $\underline{\textbf{Ev}}$ent $\underline{\textbf{Ev}}$olution），一种新颖的生成式模拟器，在模拟中联合利用事件线索并保留网络拓扑。具体而言，该框架包含两个阶段：结构引导的掩码预训练和拓扑感知扩散过程，后者通过类似U-Net的图下采样和上采样实现去噪。在推理时，Net-Ev$^2$仅需自然语言事件输入即可生成模拟，具有更大的实际使用灵活性。此外，我们引入了Net-Ev$^2$-6.5M，一个跨四个大规模道路网络的对齐事件和网络流量数据的多模态基准，以及一个新的拓扑感知指标JL-MMD，用于评估生成网络动态的拓扑保真度。大量实验证明了Net-Ev$^2$的最优性能和强泛化能力。代码已开源。

英文摘要

Reducing real-world trial and error has long been a central goal of decision making, and generative simulators advance this goal by modeling the evolution of future states. An even more challenging yet meaningful task is simulating how disturbance events (e.g., accidents) propagate their impacts across real-world networks. The existing approaches fall short of modeling both structured attributes and unstructured semantics of events, and capturing topological structures in simulating network event evolution. Therefore, we are motivated to propose Net-Ev$^2$ ($\underline{\textbf{Net}}$work $\underline{\textbf{Ev}}$ent $\underline{\textbf{Ev}}$olution), a novel generative simulator that jointly leverages event cues while preserving network topology in simulations. Specifically, the framework consists of two stages, namely structure-guided masked pre-training and topology-aware diffusion process, which is achieved by U-Net-like graph downsampling and upsampling during denoising. At inference time, Net-Ev$^2$ can generate simulations using natural-language event input only, with greater flexibility for practical usage. Furthermore, we introduce Net-Ev$^2$-6.5M, a multimodal benchmark of aligned event and network traffic data across four large-scale road networks, as well as a new topology-aware metric, namely JL-MMD, to evaluate topological fidelity in generated network dynamics. Extensive experiments demonstrate the state-of-the-art performance and strong generalization ability of Net-Ev$^2$. Code is made available at https://github.com/Guangyu4/Net-Ev-2.

2606.12710 2026-06-12 cs.LG math.OC 新提交

A Stabilized Path-Space Approach to Diffusion-Based Posterior Sampling

一种稳定的路径空间方法用于基于扩散的后验采样

Evan Scope Crafts, Umberto Villa, Saviz Mowlavi, Yanting Ma, Hassan Mansour, Wael H. Ali

发表机构 * Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin（德克萨斯大学奥斯汀分校奥登计算工程与科学研究所）； Mitsubishi Electric Research Laboratories (MERL)（三菱电机研究实验室）； Department of Biomedical Engineering, The University of Texas at Austin（德克萨斯大学奥斯汀分校生物医学工程系）

AI总结提出一种稳定的路径空间框架，通过随机最优控制与信任域优化，实现非线性逆问题中准确且鲁棒的后验采样。

详情

AI中文摘要

扩散模型为贝叶斯逆问题提供了表达性数据驱动先验，但许多扩散后验采样器依赖启发式引导近似，可能对非线性算子和多模态后验失效。本文开发了一种稳定的路径空间框架用于基于扩散的后验采样。从终端边际代表先验的基础扩散过程出发，我们定义了轨迹上的似然加权目标测度，并将后验采样转化为学习一个路径测度匹配该目标的受控随机过程。该公式将扩散后验采样与随机最优控制联系起来，同时保留了不确定性量化所需的贝叶斯结构。我们引入了一种时间重参数化，通过消除未知初始值函数引起的偏差，使路径空间控制问题适定，无需辅助训练。然后通过具有对数方差目标的信任域路径空间优化方法学习控制。路径空间视角还统一了我们的学习控制方法与现有的基于引导的采样器，量化了近似控制引起的采样误差，并产生了用于渐近精确后验期望的重要性采样校正。我们在具有解析表征或高质量参考后验的基准逆问题套件上评估了所提出的框架，从而实现了对采样精度和不确定性量化的原则性评估。这些实验深入揭示了基于扩散的后验采样器的行为，并证明了相比领先方法更高的准确性和鲁棒性。

英文摘要

Diffusion models provide expressive data-driven priors for Bayesian inverse problems, but many diffusion posterior samplers rely on heuristic guidance approximations that can fail for nonlinear operators and multimodal posteriors. In this work, we develop a stabilized path-space framework for diffusion-based posterior sampling. Starting from a base diffusion process whose terminal marginal represents the prior, we define a likelihood-weighted target measure on trajectories and cast posterior sampling as learning a controlled stochastic process whose path measure matches this target. This formulation connects diffusion posterior sampling to stochastic optimal control while preserving the Bayesian structure needed for uncertainty quantification. We introduce a time reparameterization that makes the path-space control problem well posed by removing the bias induced by the unknown initial value function, without auxiliary training. We then learn the control via a trust-region path-space optimization method with log-variance objectives. The path-space perspective also unifies our learned control approach with existing guidance-based samplers, quantifies the sampling error induced by approximate controls, and yields importance sampling corrections for asymptotically exact posterior expectations. We evaluate the proposed framework on a suite of benchmark inverse problems with analytically characterized or high-quality reference posteriors, enabling principled assessment of sampling accuracy and uncertainty quantification. These experiments provide insight into the behavior of diffusion-based posterior samplers and demonstrate improved accuracy and robustness over leading approaches.

2606.13191 2026-06-12 cs.LG 新提交

The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics

生成动力学中相变的几何：投影焦散视角

Ryosuke Sakamoto, Kotaro Sakamoto

发表机构 * Institute for the Advanced Study of Human Biology, Institute for Advanced Study, Kyoto University（京都大学高等研究院人类生物学高等研究所）； Graduate School of Engineering, The University of Tokyo（东京大学大学院工学系研究科）

AI总结本文通过投影焦散几何解释生成动力学中的相变行为，提出临界边界检测器（CBD）诊断分数方向不稳定性，定位模式承诺并支持敏感区域控制。

详情

AI中文摘要

连续状态生成采样器（包括扩散和流匹配模型）通过连续逆时间动力学演化，但其样本经常经历突然的定性变化：轨迹承诺于模式，语义替代坍缩，窄时间窗口内的小扰动可产生大的下游效应。本文对这种相变般行为进行了几何解释。我们将去噪视为自由能景观上的梯度下降，并表明尖锐转变出现在投影焦散附近，此时数据支撑上的最近点投影不再唯一。受此视角启发，我们引入临界边界检测器（CBD）作为分数方向不稳定性的实用诊断工具。在玩具模型、标准扩散模型和潜在文本到图像扩散模型中，CBD定位了模式承诺，预测了干预敏感窗口，并支持几何敏感区域中的目标控制。我们的结果连接了数据的几何与扩散生成的动力学。

英文摘要

Continuous-state generative samplers, including diffusion and flow-matching models, evolve through continuous reverse-time dynamics, yet their samples often undergo abrupt qualitative changes: trajectories commit to modes, semantic alternatives collapse, and small perturbations in narrow time windows can produce large downstream effects. This paper develops a geometric account of such phase-transition-like behaviour. We view denoising as gradient descent on a free energy landscape and show that sharp transitions arise near projection caustics, where the nearest-point projection onto the data support ceases to be unique. Motivated by this perspective, we introduce the Critical Boundary Detector (CBD) as a practical diagnostic for score-direction instability. Across toy models, standard diffusion models, and latent text-to-image diffusion models, CBD localises mode commitment, predicts intervention-sensitive windows, and supports targeted control in geometrically sensitive regions. Our results connect the geometry of data with the dynamics of diffusion generation.

2606.13240 2026-06-12 cs.LG cs.AI cs.CV stat.ME stat.ML 新提交

Towards More General Control of Diffusion Models Using Jeffrey Guidance

使用 Jeffrey 引导实现扩散模型的更通用控制

Raphaël Razafindralambo, Rémy Sun, Frédéric Precioso, Jes Frellsen, Pierre-Alexandre Mattei

发表机构 * Inria, CNRS, I3S, Maasai Université Côte d’Azur（法国国家信息与自动化研究所、法国国家科学研究中心、I3S 实验室、Maasai 团队，蔚蓝海岸大学）； Technical University of Denmark（丹麦技术大学）； Inria, CNRS, LJAD, Maasai Université Côte d’Azur（法国国家信息与自动化研究所、法国国家科学研究中心、LJAD 实验室、Maasai 团队，蔚蓝海岸大学）

AI总结提出 Jeffrey 引导框架，通过 Jeffrey 条件规则更新边缘分布，扩展扩散模型控制到标准引导无法表达的应用，在 CIFAR-10 和 FFHQ 上显著降低 FID，并在 CelebA-HQ 上实现公平性控制。

详情

AI中文摘要

扩散模型的一个关键优势在于其灵活性，因为其输出可以在采样时通过引导进行控制。然而，除了条件采样等简单情况外，目标分布通常隐含地定义，仅通过采样规则或启发式能量函数给出。为了解决这个问题，我们提出了 Jeffrey 引导，这是一个原则性框架，将扩散模型控制扩展到标准引导无法表达的应用。它利用 Jeffrey 条件规则将边际分布更新到指定的目标，保持条件结构并最小化对联合分布的扰动。我们首先通过针对指定的嵌入分布来演示 Jeffrey 引导。以 Inception 嵌入为目标，这导致在 CIFAR-10 和 FFHQ 上 FID 显著降低。我们进一步将 Jeffrey 引导应用于 CelebA-HQ 上的公平性，更新无条件扩散模型以强制属性之间的独立性。

英文摘要

A key strength of diffusion models lies in their flexibility, since their outputs can be controlled at sampling time through guidance. However, beyond simple cases such as conditional sampling, the target distribution is often left implicit, defined only through a sampling rule or a heuristic energy function. To address this, we propose Jeffrey guidance, a principled framework that extends diffusion-model control to applications beyond what standard guidance can express. It leverages Jeffrey's rule of conditioning to update marginal distributions towards a prescribed target, preserving the conditional structure and minimally perturbing the joint distribution. We first demonstrate Jeffrey guidance by targeting a prescribed embedding distribution. With Inception embeddings as the target, this leads to substantial reductions in FID on both CIFAR-10 and FFHQ. We further apply Jeffrey guidance to fairness on CelebA-HQ, updating an unconditional diffusion model to enforce independence between attributes.

2606.13347 2026-06-12 cs.LG 新提交

通过块验证加速推测性扩散

Alexander Soen, Hisham Husain, Valentin De Bortoli, Arnaud Doucet

发表机构 * KTH（瑞典皇家理工学院）； Google Research（谷歌研究）； Google DeepMind（谷歌DeepMind）

AI总结提出一种针对扩散模型的推测性采样方案，通过块验证提高草稿接受率，无需训练的Free Drafter实现高达6.3%的加速。

详情

AI中文摘要

推测性解码通过使用草稿模型生成令牌，并采用接受-拒绝方案确保输出与目标分布匹配，从而加速LLM推理。将其适应于连续扩散是困难的，因为推测性采样需要从残差分布中采样。虽然这在离散空间中很直接，但在连续空间中高效采样残差并非易事。因此，现有的扩散适应要么使用计算效率低下的采样技术，要么依赖替代方案。在这项工作中，我们引入了一种新颖的方案，高效地实现了扩散模型的原始推测性采样机制。我们的方法相比现有方法具有关键优势：它使我们能够将LLM的块验证适应到扩散，而块验证可被证明能提高草稿的接受率。此外，我们形式化并分析了Free Drafter，一种无需训练的启发式扩散自推测草稿生成器。通过启用块验证，我们的Free Drafter在无需额外训练、且除现有并行验证过程外开销可忽略的情况下，相比现有推测性方法实现了高达6.3%的加速。

英文摘要

Speculative decoding speeds up LLM inference by using a draft model to generate tokens, with an acceptance-rejection scheme that ensures that the output matches the target distribution. Adapting this to continuous diffusions is difficult because speculative sampling requires drawing from a residual distribution. While straightforward in discrete spaces, efficiently sampling this residual in continuous space is non-trivial. Consequently, existing diffusion adaptations either use computationally inefficient sampling techniques or rely on an alternative scheme. In this work, we introduce a novel scheme that efficiently implements the original speculative sampling mechanism for diffusion models. Our approach offers a critical advantage over current methods: it enables us to adapt block verification from LLMs to diffusions -- which provably improves the acceptance rate of drafts. Furthermore, we formalize and analyze the Free Drafter, a heuristic self-speculative drafter for diffusions that requires no training. By enabling block verification, our Free Drafter yields up to a 6.3% speedup over existing speculative methods with no additional training and negligible overhead beyond the existing parallel verification pass.
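
For readers unfamiliar with the residual distribution mentioned above, the following is a minimal sketch of the standard discrete-space speculative sampling step, the case the abstract calls straightforward. It does not reproduce the paper's continuous-space scheme or block verification. The function names and the example probabilities are hypothetical, and `output_distribution` is added only to check in closed form that the accept/reject rule returns the target law.

```python
import random


def residual_distribution(target, draft):
    """Normalised positive part of (target - draft)."""
    excess = [max(0.0, p - q) for p, q in zip(target, draft)]
    total = sum(excess)
    if total == 0.0:  # draft already equals target, so nothing is ever rejected
        return list(target)
    return [e / total for e in excess]


def speculative_sample(target, draft, rng):
    """Return one token distributed according to `target`."""
    token = rng.choices(range(len(draft)), weights=draft)[0]  # draft proposal
    accept_prob = min(1.0, target[token] / draft[token])
    if rng.random() < accept_prob:
        return token
    # Rejected: resample from the residual distribution.
    residual = residual_distribution(target, draft)
    return rng.choices(range(len(target)), weights=residual)[0]


def output_distribution(target, draft):
    """Exact law of `speculative_sample`, computed in closed form."""
    accepted = [min(p, q) for p, q in zip(target, draft)]  # q * min(1, p / q)
    reject_mass = 1.0 - sum(accepted)
    residual = residual_distribution(target, draft)
    return [a + reject_mass * r for a, r in zip(accepted, residual)]


target = [0.5, 0.3, 0.2]
draft = [0.2, 0.2, 0.6]
law = output_distribution(target, draft)
token = speculative_sample(target, draft, random.Random(0))
```

In this example the residual puts mass only on the tokens the draft under-weights, and `law` coincides with `target` up to floating-point error. The difficulty the abstract points to is that the analogue of `residual_distribution` for continuous densities has no cheap normaliser or sampler.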

2606.13565 2026-06-12 cs.LG 新提交

度量辛条件流匹配用于耗散动力学

Ali Baheri, Lars Lindemann

发表机构 * Rochester Institute of Technology, Rochester, NY, USA（罗切斯特理工学院）； Automatic Control Laboratory, ETH Zürich, Switzerland（苏黎世联邦理工学院自动控制实验室）

AI总结提出度量辛条件流匹配（MCFM）方法，通过将保守-耗散分解融入向量场和结构保持采样器，学习耗散动力学，保证能量单调递减和长期稳定性。

详情

AI中文摘要

度量辛条件流匹配（MCFM）在不违反第一原理的情况下学习耗散动力学。神经替代模型常常注入能量并破坏长期推演的稳定性；MCFM 则将保守-耗散分解同时融入向量场和结构保持采样器。MCFM 通过短时间过渡上的条件流匹配进行训练，避免了长时间推演伴随的梯度计算。在推理时，Strang-prox 方案交替进行辛更新和近端度量步骤，确保离散能量衰减；当有可信能量可用时，可选投影强制严格衰减。我们提供了连续和离散时间保证，将该参数化和采样器与守恒、单调耗散和稳定推演联系起来。在一个受控机械基准上，MCFM 产生的相图更接近真实情况，并且与同等表达能力的无约束神经流相比，能量增加和正能量率事件显著减少，同时匹配终端分布拟合。

英文摘要

Metriplectic conditional flow matching (MCFM) learns dissipative dynamics without violating first principles. Neural surrogates often inject energy and destabilize long-horizon rollouts; MCFM instead builds the conservative-dissipative split into both the vector field and a structure-preserving sampler. MCFM trains via conditional flow matching on short transitions, avoiding long rollout adjoints. At inference time, a Strang-prox scheme alternates a symplectic update with a proximal metric step, ensuring discrete energy decay; an optional projection enforces strict decay when a trusted energy is available. We provide continuous and discrete time guarantees linking this parameterization and sampler to conservation, monotonic dissipation, and stable rollouts. On a controlled mechanical benchmark, MCFM yields phase portraits closer to ground truth and markedly fewer energy-increase and positive energy rate events than an equally expressive unconstrained neural flow, while matching terminal distributional fit.
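
The alternation of a symplectic update with a proximal dissipative step can be illustrated on a damped harmonic oscillator. The sketch below is a toy under stated assumptions, not the paper's sampler: the energy H(q, p) = (q² + p²)/2, the linear friction with coefficient `gamma`, and the use of the exact rotation as the conservative step are all choices made here for illustration, and nothing is learned.

```python
import math


def energy(q, p):
    return 0.5 * (q * q + p * p)


def conservative_step(q, p, h):
    """Exact Hamiltonian flow of the harmonic oscillator: a rotation in phase space."""
    c, s = math.cos(h), math.sin(h)
    return q * c + p * s, -q * s + p * c


def proximal_friction_step(p, gamma, tau):
    """prox of tau * (gamma / 2) * p**2, i.e. one implicit-Euler step of dp/dt = -gamma * p."""
    return p / (1.0 + gamma * tau)


def strang_prox_step(q, p, h, gamma):
    # Strang splitting: half dissipative, full conservative, half dissipative.
    p = proximal_friction_step(p, gamma, 0.5 * h)
    q, p = conservative_step(q, p, h)
    p = proximal_friction_step(p, gamma, 0.5 * h)
    return q, p


q, p = 1.0, 0.0
energies = [energy(q, p)]
for _ in range(200):
    q, p = strang_prox_step(q, p, h=0.1, gamma=0.5)
    energies.append(energy(q, p))
```

Because the rotation preserves the energy and the proximal step can only shrink the momentum, the recorded energies never increase (up to rounding), whatever the step size. An explicit Euler friction step would not give that guarantee for large `h`, which is the usual reason for preferring a proximal update.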

2512.22287 2026-06-12 cs.LG cs.AI 版本更新

Cluster Aggregated GAN (CAG): A Cluster-Based Hybrid Model for Appliance Pattern Generation

聚类聚合生成对抗网络 (CAG)：一种基于聚类的混合模型用于电器模式生成

Zikun Guo, Adeyinka. P. Adedigba, Rammohan Mallipeddi

发表机构 * Department of Artificial Intelligence, School of Electronics Engineering, Kyungpook National University（庆北国立大学电子工程学院人工智能系）

AI总结针对现有生成方法忽略间歇性与连续电器行为差异导致训练不稳定和保真度有限的问题，提出CAG框架，通过聚类模块为间歇电器分配专用生成器，连续电器使用LSTM生成器，在UVIC数据集上优于基线方法。

Comments 18 pages, 5 figures

详情

AI中文摘要

合成电器数据对于开发非侵入式负荷监测算法和实现隐私保护的能源研究至关重要，然而标记数据集的稀缺性仍然是一个重大障碍。最近基于GAN的方法已经证明了合成负荷模式的可行性，但大多数现有方法在单个模型内统一处理所有设备，忽略了间歇性和连续性电器之间的行为差异，导致训练不稳定和输出保真度有限。为了解决这些局限性，我们提出了聚类聚合生成对抗网络框架，这是一种混合生成方法，根据每个电器的行为特征将其路由到专门的分支。对于间歇性电器，聚类模块将相似的激活模式分组，并为每个聚类分配专用生成器，确保常见和罕见操作模式都获得足够的建模能力。连续性电器遵循单独的分支，采用基于LSTM的生成器来捕捉逐渐的时间演变，同时通过序列压缩保持训练稳定性。在UVIC智能插头数据集上的大量实验表明，所提出的框架在衡量真实性、多样性和训练稳定性的指标上始终优于基线方法，并且将聚类作为主动生成组件显著提高了可解释性和可扩展性。这些发现确立了所提出的框架作为非侵入式负荷监测研究中合成负荷生成的有效方法。

英文摘要

Synthetic appliance data are essential for developing non-intrusive load monitoring algorithms and enabling privacy preserving energy research, yet the scarcity of labeled datasets remains a significant barrier. Recent GAN-based methods have demonstrated the feasibility of synthesizing load patterns, but most existing approaches treat all devices uniformly within a single model, neglecting the behavioral differences between intermittent and continuous appliances and resulting in unstable training and limited output fidelity. To address these limitations, we propose the Cluster Aggregated GAN framework, a hybrid generative approach that routes each appliance to a specialized branch based on its behavioral characteristics. For intermittent appliances, a clustering module groups similar activation patterns and allocates dedicated generators for each cluster, ensuring that both common and rare operational modes receive adequate modeling capacity. Continuous appliances follow a separate branch that employs an LSTM-based generator to capture gradual temporal evolution while maintaining training stability through sequence compression. Extensive experiments on the UVIC smart plug dataset demonstrate that the proposed framework consistently outperforms baseline methods across metrics measuring realism, diversity, and training stability, and that integrating clustering as an active generative component substantially improves both interpretability and scalability. These findings establish the proposed framework as an effective approach for synthetic load generation in non-intrusive load monitoring research.

2601.03184 2026-06-12 cs.LG cs.AI 版本更新

Decentralized Autoregressive Generation

分散自回归生成

Stepan Maschan, Haoxuan Qu, Jun Liu

发表机构 * Lancaster University（兰卡斯特大学）

AI总结本文通过离散流匹配框架证明分散训练与集中训练在理论上等价，实验验证其在多模态基准上保持竞争力。

2601.06572 2026-06-12 cs.LG cs.AI 版本更新

Hellinger Multimodal Variational Autoencoders

Hellinger多模态变分自编码器

Huyen Vo, Isabel Valera

发表机构 * Department of Computer Science, Saarland University（萨尔兰大学计算机科学系）； MPI-SWS, Saarland Informatics Campus（马克斯·普朗克软件系统研究所，萨尔兰信息学园区）

AI总结提出基于Hellinger距离的矩匹配近似方法HELVAE，避免子采样，在多模态变分自编码器中实现更优的生成一致性与质量权衡。

Comments Accepted at AISTATS 2026. Camera-ready version

详情

AI中文摘要

多模态变分自编码器（VAEs）广泛用于弱监督生成学习，涉及多种模态。主流方法通过专家乘积（PoE）、专家混合（MoE）或其组合来聚合单模态推理分布，以近似联合后验。本文从概率意见池化的优化视角重新审视多模态推理。我们从$\alpha=0.5$的Hölder池化出发，这是$\alpha\text{-散度}$族中唯一的对称成员，并推导出一种矩匹配近似，称为Hellinger。我们利用这种近似提出HELVAE，一种避免子采样的多模态VAE，从而得到一个高效且有效的模型，该模型：（i）随着观察到的模态增加，学习更具表达力的潜在表示；（ii）在生成一致性和质量之间实现更好的权衡，优于最先进的多模态VAE模型。

英文摘要

Multimodal variational autoencoders (VAEs) are widely used for weakly supervised generative learning with multiple modalities. Predominant methods aggregate unimodal inference distributions using either a product of experts (PoE), a mixture of experts (MoE), or their combinations to approximate the joint posterior. In this work, we revisit multimodal inference through the lens of probabilistic opinion pooling, an optimization-based approach. We start from Hölder pooling with $\alpha=0.5$, which corresponds to the unique symmetric member of the $\alpha$-divergence family, and derive a moment-matching approximation, termed Hellinger. We then leverage such an approximation to propose HELVAE, a multimodal VAE that avoids sub-sampling, yielding an efficient yet effective model that: (i) learns more expressive latent representations as additional modalities are observed; and (ii) empirically achieves better trade-offs between generative coherence and quality, outperforming state-of-the-art multimodal VAE models.

2602.04675 2026-06-12 cs.LG 版本更新

基于神经受控微分方程的通用时间序列生成

Torben Berndt, Elyes Farjallah, Leif Seute, Raeid Saqur, Benjamin Walker, Jan Stühmer

发表机构 * Heidelberg Institute for Theoretical Studies（海德堡理论研究所）； IAR, Karlsruhe Institute of Technology（卡尔斯鲁厄理工学院IAR）； Max Planck Institute for Polymer Research（马克斯·普朗克聚合物研究所）； IWR, Heidelberg University（海德堡大学IWR）； Dept. of Computer Science, University of Toronto（多伦多大学计算机科学系）； Mathematical Institute, University of Oxford（牛津大学数学研究所）； Vector Institute, Toronto, Canada（多伦多向量研究所）

AI总结本文证明结构化线性受控微分方程（SLiCEs）是通用时间序列生成器，并提出生成式SLiCEs（G-SLiCEs）用于路径空间上的流匹配，在概率预测和下游任务中表现优异，尤其适用于不规则网格。

详情

AI中文摘要

最近关于状态空间模型（SSMs）序列通用性的工作引入了高效、最大表达性的连续时间方法用于时间序列建模。虽然这些工作侧重于判别设置，我们将这一视角扩展到生成式时间序列建模，通过证明最大表达性的结构化线性受控微分方程（SLiCEs）是通用时间序列生成器，即它们可以在$W_\infty$中逼近紧致潜在集上连续因果推前映射的诱导路径律。基于这些理论结果，我们提出了生成式SLiCEs（G-SLiCEs），一种用于路径空间上流匹配的最大表达性连续时间模型。实验上，我们表明表达性提高了概率预测和下游任务的性能，同时保留了连续时间模型的优势，例如泛化到任意观测网格。这对于不规则网格尤其有利，而固定网格模型通常难以处理此类网格。

英文摘要

Recent work on the sequence universality of State Space Models (SSMs) has introduced efficient, maximally expressive continuous-time approaches for time-series modelling. While these works focus on discriminative settings, we extend this perspective to generative time-series modelling by proving that maximally expressive Structured Linear Controlled Differential Equations (SLiCEs) are universal time-series generators, in the sense that they can approximate the induced path laws of continuous causal pushforwards on compact latent sets in $W_\infty$. Building on these theoretical results, we propose Generative SLiCEs (G-SLiCEs), a maximally expressive continuous-time model for flow matching on path-space. Empirically, we show that expressivity improves performance in probabilistic forecasting and downstream tasks, while retaining the advantages of continuous-time models such as generalising to arbitrary observation grids. This is particularly beneficial for irregular grids, where fixed-grid models often struggle.

2605.29906 2026-06-12 cs.LG 版本更新

Plan, Don't Pose: Long Composite Motion Generation with Text-Aligned BFM

计划，而非摆姿势：基于文本对齐的BFM的长复合运动生成

Nikolay Shvetsov, Maksim Bobrin, Nazar Buzun, Anton Bozhedarov, Dmitry V. Dylov

发表机构 * AvaCapo ； Potsdam University（波茨坦大学）； Applied AI Institute（应用人工智能研究所）； Computational Imaging Lab（计算成像实验室）； AXXX ； Innopolis University（因诺波利斯大学）

AI总结提出Text2BFM框架，通过将自然语言与预训练行为基础模型对齐，在潜在策略空间中实现长复合运动生成，无需端到端运动生成器。

详情

AI中文摘要

文本到运动（T2M）生成在角色动画、虚拟化身和人机交互中具有广泛应用。现有方法通常直接从语言生成姿态轨迹或运动令牌，迫使单个模型处理语义解释、长程结构和低级物理实现。这种耦合使得它们在处理长、复合或语义密集的提示时成本高昂且往往不可靠。我们提出Text2BFM，这是第一个将自然语言与预训练行为基础模型（BFM）对齐用于T2M生成的框架，无需依赖重型端到端运动生成器。Text2BFM在冻结的BFM的潜在策略空间中操作，将其用作可执行的运动先验。一个文本对齐的变分行为瓶颈将BFM策略潜在序列压缩成与语言兼容且保留长程行为结构的紧凑运动表示。生成在这个紧凑的行为流形上通过轻量级条件生成器进行，得到的潜在编码行为被解码为驱动预训练冻结BFM的策略潜在。通过将语义规划与运动执行解耦，Text2BFM实现了高效、鲁棒的T2M生成，并在长复合文本描述上表现出色。

英文摘要

Text-to-motion (T2M) generation has broad applications in character animation, virtual avatars, and human-robot interaction. Existing methods typically generate pose trajectories or motion tokens directly from language, forcing a single model to handle semantic interpretation, long-horizon structure, and low-level physical realization. This coupling makes them costly and often unreliable for long, compositional, or semantically dense prompts. We propose Text2BFM, the first framework that aligns natural language with pretrained Behavioral Foundation Models (BFMs) for T2M generation without relying on heavy end-to-end motion generators. Text2BFM operates in the latent policy space of a frozen BFM, using it as an executable motion prior. A text-aligned variational behavioral bottleneck compresses BFM policy-latent sequences into compact motion representations that are compatible with language and preserve long-horizon behavioral structure. Generation is performed in this compact behavioral manifold with a lightweight conditional generator, and the resulting latent encoded behaviors are decoded into policy latents that drive the pretrained frozen BFM. By decoupling semantic planning from motion execution, Text2BFM achieves efficient, robust T2M generation and strong performance on long, compositional textual descriptions.

2606.01172 2026-06-12 cs.LG stat.ME stat.ML 版本更新

Revisiting Neural Processes via Fourier Transform and Volterra Series

通过傅里叶变换和Volterra级数重新审视神经过程

Peiman Mohseni, Nick Duffield, Raymond K. W. Wong

发表机构 * University of Cambridge（剑桥大学）

AI总结本文利用Volterra展开和集合傅里叶卷积，提出了两种新的条件神经过程模型，解决了现有平移等变神经过程在可解释性和计算效率上的局限性。

详情

AI中文摘要

从有限的、不规则采样的测量中建模未知的潜在函数是科学和工程中的一个反复出现的挑战。神经过程（NPs）是一类概率函数模型，是有前景的解决方案——尤其是当赋予领域特定的对称性（如平移等变性）时，这提高了样本效率和泛化能力。然而，现有的平移等变NPs面临两个局限性：（i）它们堆叠带有非线性的通用组件，模糊了诱导的函数类并限制了可解释性；（ii）卷积设计依赖于具有局部感受野的核，并需要密集的均匀输入网格，而基于注意力的方法避免了这些问题，但随观测数量呈二次方缩放。我们通过两个贡献解决了这两个问题。首先，利用Volterra展开，我们将连续平移等变算子表征为高阶卷积的和，实现了分析透明性，同时允许通过一阶卷积进行高效近似。其次，我们引入了集合傅里叶卷积（SFConvs），这是一种频域参数化方法，直接在不规则采样点上操作，实现近似全局感受野，并在观测数量上线性缩放。基于这些思想，我们提出了两种条件神经过程（CNPs）：SFConvCNPs，它堆叠带有非线性的SFConv块，以及SFVConvCNPs，它整合了Volterra公式。在合成和真实世界数据集上的实验证明了我们的方法相对于最先进基线的有效性。

英文摘要

Modeling unknown latent functions from finite, irregularly sampled measurements is a recurring challenge across science and engineering. Neural processes (NPs), a family of probabilistic functional models, are promising solutions -- especially when endowed with domain-specific symmetries like translation equivariance, which improve sample efficiency and generalization. Yet existing translation-equivariant NPs face two limitations: (i) they stack generic components with non-linearities, obscuring the induced function class and limiting interpretability; and (ii) convolutional designs rely on kernels with local receptive fields and require dense uniform input grids, while attention-based methods avoid these issues but scale quadratically with the number of observations. We address both with two contributions. First, using the Volterra expansion, we characterize continuous translation-equivariant operators as sums of higher-order convolutions, yielding analytical transparency while admitting efficient approximation by first-order convolutions. Second, we introduce set Fourier convolutions (SFConvs), a frequency-domain parameterization that operates directly on irregularly sampled points, achieves approximately global receptive fields, and scales linearly in the number of observations. Building on these ideas, we propose two conditional NPs (CNPs): SFConvCNPs, which stack SFConv blocks with non-linearities, and SFVConvCNPs, which integrate the Volterra formulation. Experiments on synthetic and real-world datasets demonstrate our methods' efficacy against state-of-the-art baselines.

2606.02133 2026-06-12 cs.LG cs.AI 版本更新

Variational Learning for Insertion-based Generation

基于插入生成的变分学习

Yangtian Zhang, Zhe Wang, Arthur Gretton, Rex Ying, David van Dijk, Michalis K. Titsias, Jiaxin Shi

发表机构 * University of Cambridge（剑桥大学）

AI总结提出插入过程（IP）模型，通过排列变分推断联合学习插入位置、内容和终止条件，支持变长生成并提升非自回归序列建模质量。

详情

AI中文摘要

非单调序列生成方法，如掩码扩散模型，通过允许以非固定和预设的顺序生成token，为从左到右的自回归建模提供了一种灵活的替代方案。尽管具有实际优势，但大多数现有的非单调模型是顺序无关的，并依赖于固定长度的网格，限制了它们支持变长生成和自适应插入顺序的能力。在这项工作中，我们引入了一个概率框架，用于在变长插入模型中学习插入顺序。我们形式化了插入轨迹与排列之间的双射对应关系，这使得数据似然能够精确重参数化为排列上的和。基于这一结果，我们提出了插入过程（IP），这是一种随机生成模型，它联合学习在哪里插入、插入什么以及何时终止，并通过基于排列的变分推断进行训练。与先前的固定画布方法不同，IP原生支持变长生成，并学习数据驱动的插入顺序偏好。在目标条件规划和分子字符串生成上的实验表明，在缺乏规范从左到右结构的领域中，学习插入顺序提高了建模质量和泛化能力。

英文摘要

Non-monotonic sequence generation methods, such as masked diffusion models, provide a flexible alternative to left-to-right autoregressive modeling by allowing tokens to be generated in non-fixed and prescribed orders. Despite their practical advantages, most existing non-monotonic models are order-agnostic and rely on a fixed-length grid, limiting their ability to support variable-length generation and adaptive insertion order. In this work, we introduce a probabilistic framework for learning insertion order in variable-length insertion models. We formalize a bijective correspondence between insertion trajectories and permutations, which enables an exact reparameterization of the data likelihood as a sum over permutations. Building on this result, we propose the Insertion Process (IP), a stochastic generative model that jointly learns where to insert, what to insert, and when to terminate, trained via permutation-based variational inference. Unlike prior fixed-canvas approaches, IP natively supports variable-length generation and learns data-driven preferences over insertion orders. Experiments on goal-conditioned planning and molecular string generation demonstrate that learning insertion order improves both modeling quality and generalization in domains without a canonical left-to-right structure.
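
The correspondence between insertion trajectories and permutations can be made concrete with a small sketch. The encoding below is an assumption chosen for illustration (the paper's formal construction, including how termination is handled, may differ): a permutation `order` lists the final position of the token generated at each step, and a trajectory is a list of `(slot, token)` pairs, where `slot` is the index in the current partial sequence at which the token is inserted.

```python
def trajectory_from_order(tokens, order):
    """Map a generation order to an insertion trajectory.

    order[k] is the final position of the token generated at step k.
    """
    placed = []      # final positions generated so far, kept sorted
    trajectory = []
    for final_pos in order:
        # Slot in the partial sequence = number of already placed tokens
        # that end up to the left of this one.
        slot = sum(1 for pos in placed if pos < final_pos)
        trajectory.append((slot, tokens[final_pos]))
        placed.insert(slot, final_pos)
    return trajectory


def replay(trajectory):
    """Run the insertions; return the final sequence and the recovered order."""
    sequence = []    # tokens in their current left-to-right order
    steps = []       # step index that produced each token, same layout
    for step, (slot, token) in enumerate(trajectory):
        sequence.insert(slot, token)
        steps.insert(slot, step)
    order = [0] * len(steps)
    for final_pos, step in enumerate(steps):
        order[step] = final_pos
    return sequence, order


tokens = ["C", "C", "O", "N"]
order = [2, 0, 3, 1]
trajectory = trajectory_from_order(tokens, order)
sequence, recovered = replay(trajectory)
```

Replaying the trajectory rebuilds both the sequence and the permutation that produced it, so under this encoding each generation order of a fixed sequence corresponds to exactly one trajectory. That one-to-one mapping is what lets a likelihood over trajectories be rewritten as a sum over permutations.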

2402.01779 2026-06-12 eess.IV cs.CV cs.LG stat.ML 版本更新

Plug-and-Play image restoration with Stochastic deNOising REgularization

即插即用图像恢复：随机去噪正则化

Marien Renaud, Jean Prost, Arthur Leclaire, Nicolas Papadakis

发表机构 * GitHub

AI总结提出SNORE框架，仅在适当噪声水平图像上应用去噪器，结合随机正则化与梯度下降求解逆问题，在去模糊和修复任务上与最先进方法相比具有竞争力。

详情

AI中文摘要

即插即用（PnP）算法是一类迭代算法，通过结合物理模型和深度神经网络进行正则化来解决图像逆问题。尽管它们能产生令人印象深刻的图像恢复结果，但这些算法依赖于在迭代过程中噪声逐渐减小的图像上非标准地使用去噪器，这与最近基于扩散模型（DM）的算法形成对比，后者仅在重新加噪的图像上应用去噪器。我们提出了一种新的PnP框架，称为随机去噪正则化（SNORE），该框架仅在具有适当噪声水平的图像上应用去噪器。它基于显式的随机正则化，从而产生一种随机梯度下降算法来解决不适定逆问题。提供了该算法及其退火扩展的收敛性分析。实验上，我们证明SNORE在去模糊和修复任务上与最先进方法相比具有竞争力，无论是在定量还是定性方面。

英文摘要

Plug-and-Play (PnP) algorithms are a class of iterative algorithms that address image inverse problems by combining a physical model and a deep neural network for regularization. Even if they produce impressive image restoration results, these algorithms rely on a non-standard use of a denoiser on images that are less and less noisy along the iterations, which contrasts with recent algorithms based on Diffusion Models (DM), where the denoiser is applied only on re-noised images. We propose a new PnP framework, called Stochastic deNOising REgularization (SNORE), which applies the denoiser only on images with noise of the adequate level. It is based on an explicit stochastic regularization, which leads to a stochastic gradient descent algorithm to solve ill-posed inverse problems. A convergence analysis of this algorithm and its annealing extension is provided. Experimentally, we prove that SNORE is competitive with respect to state-of-the-art methods on deblurring and inpainting tasks, both quantitatively and qualitatively.
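
A one-dimensional toy can show what applying the denoiser only to re-noised iterates looks like inside a stochastic gradient loop. The sketch below is an illustration under strong assumptions, not the authors' algorithm: the prior is a unit Gaussian so that the MMSE denoiser has a closed form, the regularisation gradient is taken to be `(noisy - denoiser(noisy)) / sigma**2` (Tweedie's identity), and the step size, weight, fixed noise level, and tail averaging are arbitrary choices. No annealing is included.

```python
import random


def gaussian_denoiser(noisy, sigma, prior_std=1.0):
    """MMSE denoiser for a N(0, prior_std**2) prior under N(0, sigma**2) noise."""
    return noisy * prior_std**2 / (prior_std**2 + sigma**2)


def snore_like_descent(y, sigma=0.5, weight=1.0, noise_std=1.0,
                       step=0.01, iterations=5000, seed=0):
    rng = random.Random(seed)
    x = 0.0
    tail = []
    for k in range(iterations):
        # Gradient of the data term (x - y)^2 / (2 * noise_std^2).
        data_grad = (x - y) / noise_std**2
        # Re-noise the current iterate before calling the denoiser.
        noisy = x + sigma * rng.gauss(0.0, 1.0)
        reg_grad = (noisy - gaussian_denoiser(noisy, sigma)) / sigma**2
        x -= step * (data_grad + weight * reg_grad)
        if k >= iterations // 2:
            tail.append(x)
    return sum(tail) / len(tail)  # average out the stochastic jitter


estimate = snore_like_descent(y=2.0)
# In this Gaussian toy the expected update vanishes at
# y / (1 + weight * noise_std**2 / (prior_std**2 + sigma**2)).
closed_form = 2.0 / (1.0 + 1.0 / (1.0 + 0.5**2))
```

The averaged iterate settles close to `closed_form`, the point where the data gradient balances the expected regularisation gradient. With a learned denoiser there is no such closed form, but the loop keeps the same shape: re-noise, denoise, and step.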

2508.21531 2026-06-12 stat.ML cs.LG stat.CO 版本更新

Adaptive generative moment matching networks for improved learning of dependence structures

自适应生成矩匹配网络用于改进依赖结构学习

Marius Hofert, Gan Yao

发表机构 * Department of Statistics and Actuarial Science, The University of Hong Kong（香港大学统计与精算科学系）

AI总结提出自适应带宽选择的最大均值差异混合核用于生成矩匹配网络，通过增加核数量和早停策略提升训练性能，在copula随机数生成、高维收敛率及金融数据依赖建模中优于传统方法。

详情

AI中文摘要

自适应加权平均

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

发表机构 * University of Utah（犹他大学）； Boston University（波士顿大学）； Google（谷歌）

AI总结提出一种从单次无偏估计中选取最大未知值的方法，具有可容许性且不劣于基线，应用于随机优化获得在线到批次的转换界限。

2606.12921 2026-06-12 cs.LG cs.AI 新提交

LoRA-Muon: Spectral Steepest Descent on the Low-Rank Manifold

LoRA-Muon：低秩流形上的谱最速下降

Franz Louis Cesista, Katherine Crowson, Cédric Simal, Stella Biderman

发表机构 * Ateneo de Manila University（雅典耀马尼拉大学）； EleutherAI ； NaXys, UNamur（纳慕尔大学NaXys研究所）

AI总结提出LoRA-Muon优化器，将Muon的谱最速下降规则应用于低秩微调，解决LoRA对初始化敏感、最优学习率跨秩迁移差等问题，在TinyShakespeare上以秩32达到比稠密基线更低的验证损失。

Comments 20 pages, 4 figures

详情

AI中文摘要

低秩适应（LoRA）显著降低了微调深度学习模型的计算和内存成本，但通常比稠密训练更难调优：当使用因子级优化器（如AdamW）时，它对初始化选择敏感，其最优学习率在秩之间迁移性差，且常常无法超越稠密基线。我们通过将Muon优化器的谱最速下降规则应用于低秩设置，推导出LoRA-Muon。结合我们的分裂权重衰减规则，我们的主要主张是LoRA-Muon是全秩Muon和Shampoo族优化器的一个良好的低秩代理。其最优学习率在秩、宽度、深度和因子重缩放之间均可迁移。在我们计算匹配的TinyShakespeare研究中，秩2代理恢复了稠密最佳测试学习率，秩32的LoRA-Muon运行在种子平均扫描中达到了比稠密基线更低的平均验证损失。我们进一步表明，Spectron优化器依赖于任意的因子缩放，因此在从严重不平衡的因子开始微调时可能不太适用，并且LoRA-RITE的简化QR坐标核心实现了相同的谱更新。LoRA-Muon无需QR分解即可计算该更新，并避免存储二阶矩，使其更易于加速器使用且内存效率更高。

英文摘要

Low-Rank Adaptation (LoRA) significantly reduces compute and memory costs for finetuning Deep Learning models but is often harder to tune than dense training: when using factor-wise optimizers such as AdamW, it is sensitive to initialization choices, its optimal learning rates transfer poorly across ranks, and it often fails to beat dense baselines. We derive LoRA-Muon by applying the Muon optimizer's spectral steepest-descent rule to the low-rank setting. Along with our split weight-decay rule, our main claim is that LoRA-Muon is a good low-rank proxy for full-rank Muon and Shampoo-family optimizers. Its optimal learning rates transfer across rank, width, depth, and factor-rescaling. In our compute-matched TinyShakespeare study, a rank-$2$ proxy recovers the dense best tested learning rate, and a rank-$32$ LoRA-Muon run attains lower mean validation loss than the dense baseline in the seed-averaged sweep. We further show that the Spectron optimizer depends on arbitrary factor scaling, so it would likely be a poor fit when finetuning starts from badly imbalanced factors, and that LoRA-RITE's simplified QR-coordinate core implements the same spectral update. LoRA-Muon computes that update without QR-decomposition and avoids storing second moments, making it more accelerator-friendly and memory-efficient.

2606.12930 2026-06-12 cs.LG 新提交

Is Spurious Correlation Removal Always Learnable?

虚假相关性去除是否总是可学习的？

Yibo Zhou, Bo Li, Hai-Miao Hu, Hanzi Wang, Xiaokang Zhang, Ruifan Zhang

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结研究不变学习在统计可识别时的计算障碍，证明存在一维不变子空间的可采样多环境实例，多项式时间算法无法达到常数精度，并量化环境多样性对可识别性和风险的影响。

Comments Poster paper at ICML 2026

详情

AI中文摘要

即使不变结构在统计上是可识别的，不变学习也可能失败。我们展示了一个条件计算障碍：在由平均情况稀疏恢复归约驱动的黑盒可采样监督稀疏恢复原语下，存在具有一维预测不变子空间（$k=1$）的\emph{可采样}多环境实例，这些实例可以通过穷举搜索用多项式样本学习，而任何多项式时间常数精度恢复算法都会与该原语矛盾。我们进一步通过分离参数$\gamma$量化环境多样性，该参数控制可识别性和不变性目标的曲率。在充分多样性和局部高斯正则性下，极小极大风险为$\mathbb{E}[\dist(\hat{V},V_{\mathrm{inv}})^2]=\Theta(k(d-k)/(n|\mathcal{E}|))$，在标签诱导的偏移下，在$n^*\propto k(d-k)/(|\mathcal{E}|\gamma^2)$处发生相变，估计误差缩放比例与$1/\gamma^2$成正比。合成和真实数据集说明了预测的差距和转变，并激发了简单的多样性诊断。

英文摘要

Invariant learning can fail even when the invariant structure is statistically identifiable. We show a conditional computational barrier: under a black-box samplable supervised sparse recovery primitive motivated by average-case sparse-recovery reductions, there exist \emph{samplable} multi-environment instances with a one-dimensional predictive invariant subspace ($k=1$) that are learnable with polynomial samples by exhaustive search, while any polynomial-time constant-accuracy recovery algorithm would contradict the primitive. We further quantify environment diversity by a separation parameter $\gamma$, which controls identifiability and the curvature of invariance objectives. Under sufficient diversity and local Gaussian regularity, the minimax risk is $\mathbb{E}[\dist(\hat{V},V_{\mathrm{inv}})^2]=\Theta(k(d-k)/(n|\mathcal{E}|))$, and under label-induced shifts a phase transition occurs at $n^*\propto k(d-k)/(|\mathcal{E}|\gamma^2)$ with refined estimation error scaling proportional to $1/\gamma^2$. Synthetic and real datasets illustrate the predicted gaps and transitions and motivate simple diversity diagnostics.

2606.12990 2026-06-12 cs.LG 新提交

Exposure Bias as Epistemic Underidentification in Recursive Forecasting

递归预测中的曝光偏差作为认知欠识别问题

Riku Green, Zahraa S. Abdallah, Telmo M Silva Filho

发表机构 * University of Bristol（布里斯托大学）

AI总结本文证明递归多步预测中的曝光偏差不仅是分布偏移，更是部分可观测性下的认知欠识别问题，并提出基于来源变量的误差分解与校正方法。

Comments Accepted for ICML 2026 EIML workshop

AI中文摘要

递归多步预测通常被表述为分布偏移：模型在观测历史数据上训练，但部署于自身预测结果上。我们通过证明在部分可观测性或状态截断下，递归展开也是一个认知欠识别问题，表明这种表述是不完整的。即使具有确定性潜在动力学，一步贝叶斯监督也仅在观测上下文中识别行为；一旦展开查询自生成的诱导状态（其正确的局部目标不能仅由数值状态确定），它就未必能识别部署的递归预测器。我们通过诱导状态 $Z$ 和来源变量 $P$ 形式化这一点，并推导出诱导状态误差分解为教师强制/展开不匹配、表示-类别逼近和来源信息差距。实验表明，展开进入一个不同的诱导状态区域，固定诱导状态定义了一个不同的局部校正任务，闭环增益不仅来自局部适应，还来自改变展开期间访问的诱导状态。使用简单的二进制来源编码，来源感知校正可以进一步提高性能，尽管增益是有条件的而非均匀的。这些结果将曝光偏差重新定义为自诱导认知不确定性下的推理。

英文摘要

Recursive multi-step forecasting is usually framed as distribution shift: models are trained on observed histories but deployed on their own predictions. We show this framing is incomplete by proving that, under partial observability or state truncation, recursive rollout is also an epistemic underidentification problem. Even with deterministic latent dynamics, one-step Bayes supervision identifies behavior only on observed contexts and need not identify the deployed recursive predictor once rollout queries self-generated induced states whose correct local targets are not determined by numeric state alone. We formalize this with induced states $Z$ and provenance variables $P$, and derive a decomposition of induced-state error into teacher-forcing/rollout mismatch, representation--class approximation, and provenance information gaps. Empirically, we show that rollout enters a distinct induced-state regime, that fixed induced states define a distinct local corrective task, and that closed-loop gains arise not only from local adaptation but also from changing the induced states visited during rollout. Using a simple binary provenance encoding, provenance-aware correction can further improve performance, though gains are conditional rather than uniform. These results recast exposure bias as reasoning under self-induced epistemic uncertainty.

2606.13067 2026-06-12 cs.LG 新提交

Limits of spectral learning under noise

噪声下谱学习的极限

Sabin Roman, Ljupco Todorovski, Saso Dzeroski, Marta Sales-Pardo, Roger Guimera

发表机构 * Jožef Stefan Institute（约瑟夫·斯特凡研究所）； Faculty of Mathematics and Physics, University of Ljubljana（卢布尔雅那大学数学与物理学院）； Department of Chemical Engineering, Universitat Rovira i Virgili（罗维拉-威尔吉利大学化学工程系）； Center for Computational Science and Applied Mathematics (ComSCIAM), Universitat Rovira i Virgili（罗维拉-威尔吉利大学计算科学与应用数学中心）； ICREA（加泰罗尼亚研究与高等研究院）

AI总结研究监督回归中加性标签噪声对谱方法的影响，推导出噪声导致系数漂移的闭合表达式，揭示了由单一内在噪声尺度控制的通用退化曲线。

AI中文摘要

从含噪数据中学习函数关系是科学推理的核心问题。谱方法通过将未知函数在基函数上展开并从数据中估计相应系数来逼近函数，但这些系数在噪声下的稳定性仍知之甚少。本文研究使用稀疏谱表示在多个基和维度下进行带加性标签噪声的监督回归。我们表明，噪声会导致学习到的系数向量发生可预测的漂移，其大小取决于有效活跃谱模式的数量。在对经验特征几何进行白化后，我们推导出含噪与无噪系数向量之间重叠的闭合表达式，揭示了一条由单一内在噪声尺度控制的通用退化曲线。在傅里叶、勒让德、贝塞尔和哈尔基上的数值实验证实了理论预测。结果表明，谱学习存在一个基本噪声阈值，超过该阈值系数估计变得不稳定，从而对从含噪数据中恢复函数结构施加了内在限制。

英文摘要

Learning functional relationships from noisy data is a central problem in scientific inference. Spectral methods approximate unknown functions by expanding them in a basis and estimating the corresponding coefficients from data, but the stability of these coefficients under noise remains poorly understood. Here we study supervised regression with additive label noise using sparse spectral representations across multiple bases and dimensions. We show that noise induces a predictable drift in the learned coefficient vector whose magnitude depends on the effective number of active spectral modes. After whitening the empirical feature geometry, we derive a closed-form expression for the overlap between noisy and noiseless coefficient vectors, revealing a universal degradation curve governed by a single intrinsic noise scale. Numerical experiments across Fourier, Legendre, Bessel, and Haar bases confirm the theoretical prediction. The results demonstrate that spectral learning exhibits a fundamental noise threshold beyond which coefficient estimates become unstable, placing intrinsic limits on recovering functional structure from noisy data.

2606.13092 2026-06-12 cs.LG cs.RO math.DS 新提交

Scale Buys Interpolation, Structure Buys a Horizon: Certified Predictability for Equivariant World Models

规模买插值，结构买地平线：等变世界模型的认证可预测性

Hongbo Wang

AI总结针对等变潜在世界模型，提出可计算的多步可预测地平线认证，证明T步滚动误差在对称轨道上恒定，并由李雅普诺夫谱分层界定，且该认证为等变模型独有。

Comments 23 pages (9 main + appendices). Code: https://github.com/TimothyWang418/se3-ejepa

AI中文摘要

规模买插值；结构买认证的地平线。世界模型的平均误差无法说明特定预测是否可信，或可信多久。对于等变潜在世界模型，我们给出可计算的多步可预测地平线认证：$T$步滚动误差在每个对称轨道上恒定（定理A），并由预测器的李雅普诺夫谱逐通道分层，$T_j(\epsilon)\sim\log(1/\epsilon)/\lambda_j$。地平线是双向的——匹配的下界使近似等变被证明受地平线限制——且该认证为结构独有：轨道恒定误差刻画等变性，因此任何非等变模型无论规模多大都不具备。实验上，在40维Lorenz-96上，只有$\mathbb{Z}_N$等变网络恢复完整李雅普诺夫谱（$R^2=0.98$）；密集和循环基线失败。由于谱是忠实的，认证先验地起作用：在固定感知预算下，$c$倍膨胀的认证需要$c$倍预算，且等变认证满足其膨胀密集对应物无法满足的预算——无需校准数据。相同的读出，未经修改，可无训练审计公开预训练世界模型：TD-MPC2检查点落在认证自身的范围分类上——在强膨胀处校准（比率0.94-1.02），在弱膨胀处乐观，在收缩处正确弃权——部署的监控器逐单元复制该映射，样本外。在官方1M-317M多任务阶梯上，校准不随参数增加。在V-JEPA 2-AC（1B，真实机器人数据）上，测量的交叉检查正确覆盖了过度承诺的切空间谱——交叉验证审计，而非原始数值，是可部署的对象。规模买插值，而非校准的地平线。

英文摘要

Scale buys interpolation; structure buys a certified horizon. A world model's average error says nothing about whether a particular prediction can be trusted, or for how long. For equivariant latent world models we give a computable, multi-step certificate of the predictable horizon: $T$-step rollout error is provably constant over each symmetry orbit (Theorem A) and stratified channel-by-channel by the predictor's Lyapunov spectrum, $T_j(\epsilon)\sim\log(1/\epsilon)/\lambda_j$. The horizon is two-sided -- a matching lower bound makes approximate equivariance provably horizon-limited -- and the certificate is exclusive to structure: orbit-constant error characterizes equivariance, so no non-equivariant model has it at any scale. Empirically, on 40-D Lorenz-96 only a $\mathbb{Z}_N$-equivariant network recovers the full Lyapunov spectrum ($R^2{=}0.98$); dense and recurrent baselines fail. Because the spectrum is faithful, the certificate acts, a priori: under a fixed sensing budget a $c\times$-inflated certificate provably needs $c\times$ the budget, and the equivariant certificate meets a budget its inflated dense counterpart cannot -- with zero calibration data. The same read-out, unchanged, audits public pretrained world models training-free: TD-MPC2 checkpoints land on the certificate's own scope taxonomy -- calibrated where strongly expansive (ratio 0.94-1.02), optimistic where weakly expansive, correctly abstaining where contracting -- a map a deployed monitor replicates cell-by-cell, out-of-sample. Across the official 1M-317M multitask ladder, calibration does not improve with parameters. On V-JEPA 2-AC (1B, real robot data) the measured cross-check correctly overrides an over-promising tangent spectrum -- the cross-validated audit, not the raw number, is the deployable object. Scale buys interpolation, not a calibrated horizon.
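
The per-channel horizon scaling quoted in this abstract, $T_j \sim \log(1/\epsilon)/\lambda_j$, can be evaluated directly. The following is a minimal sketch, not the paper's certificate: the function name `predictable_horizons` is hypothetical, the scaling relation is treated as an equality with unit constant, and returning `None` for non-positive exponents is an illustrative convention of ours.

```python
import math

def predictable_horizons(eps, lyapunov_exponents):
    """Evaluate T_j = log(1/eps) / lambda_j for each channel.

    Channels with lambda_j <= 0 are contracting, so the formula yields no
    finite positive horizon; None is returned for them (an illustrative
    convention, not taken from the paper).
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    budget = math.log(1.0 / eps)
    return [budget / lam if lam > 0.0 else None for lam in lyapunov_exponents]

# Hypothetical exponents: two expanding channels and one contracting channel.
horizons = predictable_horizons(1e-2, [0.5, 0.1, -0.3])
```

Under these assumptions the horizon is measured in the same time units as the inverse exponents, and a smaller positive exponent gives a longer horizon for the same error tolerance.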

2606.13178 2026-06-12 cs.LG 新提交

Loss-Shift Transfer via Bayes Quotients

通过贝叶斯商进行损失转移迁移学习

Vasileios Sevetlidis

发表机构 * Athena Research Center（雅典娜研究中心）； Democritus University of Thrace（色雷斯德谟克利特大学）； International Hellenic University（国际希腊大学）

AI总结本文研究数据分布固定但损失函数变化时的损失转移问题，利用贝叶斯商形式化损失的精炼顺序，证明粗损失的最小表示对严格更细的损失不足，并在有限输出对数损失下给出精确量化关系。

AI中文摘要

迁移学习通常被研究为分布偏移的结果。本文识别了一种正交的失败模式，其中数据分布固定而损失函数变化。这种设置称为\emph{损失转移}。损失决定了$X$中哪些信息是贝叶斯相关的，因此即使在同一联合分布$P(X,Y)$下，两个损失也可能需要不同的表示。该思想使用贝叶斯商形式化，允许按精炼程度对损失排序。在贝叶斯商公式中，严格精炼立即给出定性的障碍。对于较粗损失，源最小表示对于严格更细的目标损失是不充分的。对于有限输出的对数损失，这个障碍变成了精确的定量恒等式。超额风险是表示丢弃的关于$Y$的条件信息。在受控、学习、合成图像和真实图像设置中的实验显示了预测的效果，即分类等价的表示在固定数据分布下可能具有不同的最优对数损失性能。

英文摘要

Transfer learning is usually studied as a consequence of distribution shift. This paper identifies an orthogonal failure mode in which the data distribution is fixed and the loss changes. This setting is called \emph{loss shift}. A loss determines which information in $X$ is Bayes-relevant, and two losses may therefore require different representations even under the same joint law $P(X,Y)$. The idea is formalized using Bayes quotients, which allow losses to be ordered by refinement. In the Bayes-quotient formulation, strict refinement gives an immediate qualitative obstruction. A source-minimal representation for a coarser loss is insufficient for a strictly finer target loss. For finite-output log loss, this obstruction becomes an exact quantitative identity. The excess risk is the conditional information about $Y$ discarded by the representation. Experiments in controlled, learned, synthetic-image, and real-image settings show the predicted effect, i.e., classification-equivalent representations can have different optimal log-loss performance under a fixed data distribution.
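
The log-loss identity described in this abstract can be written out in standard information-theoretic notation. This is a sketch under the assumptions that the representation is a deterministic map $Z=\phi(X)$ and that $Y$ takes finitely many values; the symbols $\phi$, $Z$, and $q$ are introduced here for illustration and may differ from the paper's notation.

```latex
\begin{aligned}
\underbrace{\min_{q}\,\mathbb{E}\bigl[-\log q(Y\mid Z)\bigr]}_{H(Y\mid Z)}
-\underbrace{\min_{q}\,\mathbb{E}\bigl[-\log q(Y\mid X)\bigr]}_{H(Y\mid X)}
&= H(Y\mid Z)-H(Y\mid X,Z) \\
&= I(X;Y\mid Z)\;\ge\;0 .
\end{aligned}
```

The first equality uses $H(Y\mid X)=H(Y\mid X,Z)$, which holds because $Z$ is a function of $X$. The excess risk vanishes exactly when $Z$ retains all the information in $X$ that is relevant to $Y$.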

2606.13287 2026-06-12 cs.LG cs.DC math.OC 新提交

Clipping Makes Distributed and Federated Asynchronous SGD Robust to Stragglers

裁剪使分布式和联邦异步SGD对掉队者具有鲁棒性

Samuel Erickson, Mikael Johansson

发表机构 * KTH Royal Institute of Technology（瑞典皇家理工学院）

AI总结本文理论证明梯度裁剪能消除异步SGD中最大延迟对复杂度的影响，基于次Weibull梯度噪声模型，首次实现异步优化的高概率收敛。

AI中文摘要

在现代机器学习中，训练的并行化是扩大规模的重要策略。异步随机梯度下降（ASGD）通过避免等待慢速工作节点来最大化可用硬件的利用率。然而，在恒定步长下，由于更新中的大延迟，慢速工作节点仍然会对ASGD的收敛产生负面影响。同时，在深度学习模型的异步训练中，经验观察到梯度裁剪能“稳定”训练。在这项工作中，我们为这一行为提供了理论依据，证明裁剪消除了最大延迟对预言复杂度的依赖。我们采用次Weibull梯度噪声模型，该模型将次高斯和次指数分布推广到更重尾的分布，受深度学习中的经验观察启发。我们证明了期望收敛，并且首次在异步优化中证明了高概率收敛。

英文摘要

In modern machine learning, parallelization of training is an important strategy for increasing scale. Asynchronous stochastic gradient descent (ASGD) maximizes the utilization of available hardware by avoiding waiting for slow workers. However, with constant step sizes, the convergence of ASGD is nonetheless affected negatively by slow workers due to large delays in updates. At the same time, it has been empirically observed in asynchronous training of deep learning models that gradient clipping "stabilizes" training. In this work, we provide a theoretical justification for this behavior: we show that clipping removes the dependence of the oracle complexity on the maximum delay. We employ a sub-Weibull model of gradient noise, which generalizes sub-Gaussian and sub-exponential distributions to more heavy-tailed distributions and is motivated by empirical observations in deep learning. We show convergence in expectation and, for the first time in asynchronous optimization, convergence with high probability.
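
The interaction between stale gradients and clipping can be illustrated with a toy loop. This is a minimal sketch under our own assumptions, not the algorithm analyzed in the paper: the names `clip` and `delayed_clipped_sgd` are hypothetical, the gradients are deterministic (no sub-Weibull noise), and delays are supplied as an explicit list.

```python
import math

def clip(grad, threshold):
    """Rescale grad so that its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold or norm == 0.0:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]

def delayed_clipped_sgd(grad_fn, x0, step_size, threshold, delays):
    """Run SGD where update t uses a gradient evaluated at iterate t - delays[t].

    The stale gradient is clipped before it is applied to the newest iterate.
    """
    history = [list(x0)]
    for t, delay in enumerate(delays):
        stale_point = history[max(0, t - delay)]
        g = clip(grad_fn(stale_point), threshold)
        newest = history[-1]
        history.append([xi - step_size * gi for xi, gi in zip(newest, g)])
    return history[-1]

# f(x) = x^2 with gradient 2x; every gradient along this run exceeds the
# threshold, so each update has length step_size * threshold regardless of delay.
final = delayed_clipped_sgd(lambda x: [2.0 * xi for xi in x],
                            [10.0], 0.1, 1.0, [0, 1, 2, 0, 3])
```

In this toy run each update has the same bounded length whatever the delay, which is the intuition behind clipping limiting the damage a straggler's stale gradient can do.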

2606.13576 2026-06-12 cs.LG cs.CC cs.DS stat.ML 新提交

Learning with Simulators: No Regret in a Computationally Bounded World

与模拟器学习：计算受限世界中的无悔学习

Sasha Voitovych, Abhishek Shetty, Noah Golowich, Alexander Rakhlin

发表机构 * MIT（麻省理工学院）； Microsoft Research（微软研究院）

AI总结提出可模拟过程框架，利用模拟器近似任意复杂依赖的数据分布，恢复VC维误差界，并展示条件采样的统计与计算优势。

Comments To appear at COLT 2026

对数凹采样的统一复杂度界

Yunbum Kook, Santosh S. Vempala

发表机构 * University of Texas at Austin（得克萨斯大学奥斯汀分校）

AI总结本文通过In-and-Out算法与指数提升，给出了从热启动采样任意对数凹分布的简单、统一且近乎紧的界，主要创新是提升了提升分布的Poincaré常数界。

Comments 5 pages

2606.12892 2026-06-12 stat.ML cs.LG econ.EM math.ST stat.ME stat.TH 交叉投稿

Prediction-Powered Causal Inference by Automatic Debiased Machine Learning and Semi-Supervised Riesz Regression

预测驱动的因果推断：自动去偏机器学习与半监督Riesz回归

Masahiro Kato

发表机构 * University of Tokyo（东京大学）

AI总结研究半监督设置下因果参数的半参数有效估计，通过结合去偏机器学习和半监督Riesz回归，提出DML-PPCI和TMLE-PPCI方法，实现比仅用标注数据更小的渐近方差。

AI中文摘要

本研究探讨了在半监督设置下因果和结构参数的半参数有效估计。在我们的设置中，除了由结果和回归变量组成的标注观测数据外，还有未标记的辅助回归变量可用。我们的目标是构建因果和结构参数的估计量，其渐近方差小于仅使用标注数据构建的估计量。我们将此框架称为预测驱动的因果推断（PPCI）。我们首先推导了有效影响函数和效率界，这表明使用辅助回归变量可以获得比仅从标注观测数据可达到的效率界更小的渐近方差。然后，通过将有效影响函数与去偏机器学习（DML）框架相结合，我们提出了称为DML-PPCI的方法。如果我们构建一个估计方程估计量，我们称之为EE-DML-PPCI；如果我们构建一个目标学习估计量，我们称之为TMLE-DML-PPCI。两种估计量的渐近方差都与我们推导的效率界相匹配。在构建估计量时，有效影响函数的估计起着重要作用。在我们的研究中，有效影响函数也是一个Neyman正交分数，它依赖于Riesz表示子和回归函数。对于Riesz表示子估计，我们开发了具有收敛速度保证的半监督广义Riesz回归。

SGD预条件子的设计准则：局部条件数、噪声基底与盆地稳定性

Mitchell Scott, Tianshi Xu, Ziyuan Tang, Alexandra Pichette-Emmons, Qiang Ye, Yousef Saad, Yuanzhe Xi

发表机构 * Department of Mathematics, Emory University（埃默里大学数学系）； Department of Mathematics, University of Minnesota Twin Cities（明尼苏达大学双城分校数学系）； Department of Computer Science, University of Minnesota Twin Cities（明尼苏达大学双城分校计算机科学系）； Department of Mathematics, University of Kentucky（肯塔基大学数学系）

AI总结针对SGD在训练后期因各向异性曲率和梯度噪声导致的收敛缓慢问题，提出基于对称正定矩阵M的预条件SGD分析框架，推导收敛速率和噪声基底受M相关量控制的界，并给出非凸目标下的盆地稳定性保证，为科学机器学习提供设计准则。

Comments 31 pages, 11 Figures

Journal ref: Trans. of Mach. Learning Research, 06/2026

AI中文摘要

随机梯度下降（SGD）在训练后期常因各向异性曲率和梯度噪声而变慢。我们在对称正定矩阵$\mathbf{M}$诱导的几何中分析预条件SGD，推导出收敛速率和随机噪声基底均受$\mathbf{M}$相关量控制的界：速率通过$\mathbf{M}$度量下的有效条件数，基底通过该条件数与预条件噪声水平的乘积。对于非凸目标，我们建立了依赖于预条件子的盆地稳定性保证：当光滑性和盆地大小以$\mathbf{M}$范数度量时，迭代停留在良好局部区域的概率有显式下界。这一视角在科学机器学习（SciML）中尤为重要，其中在随机更新下实现小训练损失与物理保真度、数值稳定性和约束满足密切相关。该框架适用于对角/自适应和曲率感知预条件子，并给出一个简单的设计原则：选择$\mathbf{M}$以改善局部条件同时衰减噪声。在二次诊断问题和三个SciML基准上的实验验证了预测的速率-基底行为。

英文摘要

Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $\mathbf{M}$, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by $\mathbf{M}$-dependent quantities: the rate through an effective condition number in the $\mathbf{M}$-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the $\mathbf{M}$-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where achieving small training loss under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction. The framework applies to both diagonal/adaptive and curvature-aware preconditioners and yields a simple design principle: choose $\mathbf{M}$ to improve local conditioning while attenuating noise. Experiments on a quadratic diagnostic and three SciML benchmarks validate the predicted rate-floor behavior.
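
The conditioning half of the design principle in this abstract can be seen on a two-dimensional quadratic. This is a minimal sketch under simplifying assumptions of our own: $\mathbf{M}$ is diagonal, gradients are exact (the noise floor is not modeled), and the function name `preconditioned_gd` is hypothetical.

```python
def preconditioned_gd(hessian_diag, precond_diag, x0, step_size, num_steps):
    """Iterate x <- x - step_size * M^{-1} grad f(x) for f(x) = 0.5 * sum(a_i * x_i^2).

    hessian_diag holds the a_i and precond_diag holds the diagonal of M.
    """
    x = list(x0)
    for _ in range(num_steps):
        grad = [a * xi for a, xi in zip(hessian_diag, x)]
        x = [xi - step_size * g / m for xi, g, m in zip(x, grad, precond_diag)]
    return x

hessian = [1.0, 100.0]  # condition number 100 in the Euclidean metric

# Identity preconditioner: the step size is limited by the stiff direction.
plain = preconditioned_gd(hessian, [1.0, 1.0], [1.0, 1.0], 0.01, 50)

# M equal to the Hessian diagonal: effective condition number 1.
matched = preconditioned_gd(hessian, hessian, [1.0, 1.0], 0.5, 50)
```

With the identity preconditioner the flat coordinate shrinks only by a factor of 0.99 per step, while the matched preconditioner contracts both coordinates by one half per step.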

2512.23566 2026-06-12 math.DS cond-mat.stat-mech cs.LG math.OC stat.ML 版本更新

From geometry to dynamics: Learning overdamped Langevin dynamics from sparse observations with geometric constraints

从几何到动力学：基于几何约束从稀疏观测学习过阻尼朗之万动力学

Dimitra Maoutsa

发表机构 * Dimitra Maoutsa（迪米特拉·马乌茨）

AI总结提出一种随机控制框架，利用系统不变密度的几何结构进行路径增强，从稀疏时间采样数据中恢复过阻尼朗之万动力学，无需参数模型假设。

Comments 10+54 pages, 14 figures; accepted at ICML 2026. An earlier account of this work previously appeared in arXiv:2301.08102 and arXiv:2304.00423; the main methodology remains the same, and this version includes additional numerical experiments and theory

AI中文摘要

当随机系统的轨迹在时间上稀疏采样时，我们如何学习其动力学背后的规律？现有方法要么需要时间分辨的高频观测，要么依赖于仅适用于保守系统的几何论证，限制了它们能恢复的动力学范围。在这里，我们提出一个新的框架，通过将推断重新表述为随机控制问题来调和这两种观点。我们的方法使用几何驱动的路径增强，以系统不变密度的几何结构为指导，重构可能的轨迹并推断底层动力学，而不假设特定的参数模型。应用于过阻尼朗之万系统，我们的方法即使在极度欠采样数据下也能准确恢复随机动力学，在合成基准测试中优于现有方法。这项工作证明了将几何归纳偏差纳入随机系统识别方法的有效性。

英文摘要

How can we learn the laws underlying the dynamics of stochastic systems when their trajectories are sampled sparsely in time? Existing methods either require temporally resolved high-frequency observations, or rely on geometric arguments that apply only to conservative systems, limiting the range of dynamics they can recover. Here, we present a new framework that reconciles these two perspectives by reformulating inference as a stochastic control problem. Our method uses geometry-driven path augmentation, guided by the geometry of the system's invariant density, to reconstruct likely trajectories and infer the underlying dynamics without assuming specific parametric models. Applied to overdamped Langevin systems, our approach accurately recovers stochastic dynamics even from extremely undersampled data, outperforming existing methods in synthetic benchmarks. This work demonstrates the effectiveness of incorporating geometric inductive biases into stochastic system identification methods.

2601.22003 2026-06-12 stat.ML cs.LG stat.CO 版本更新

Efficient Stochastic Optimisation via Sequential Monte Carlo

通过序贯蒙特卡洛实现高效随机优化

James Cuin, Davide Carbone, Yanbo Tang, O. Deniz Akyildiz

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结针对梯度难以计算的优化问题，提出用序贯蒙特卡洛（SMC）采样器替代昂贵的内采样循环，实现高效随机优化，并在能量模型奖励调优中验证有效性。

Comments Accepted to ICML 2026

2603.17527 2026-06-12 stat.ML cs.LG math.OC 版本更新

Mirror Descent on Riemannian Manifolds

黎曼流形上的镜像下降

Jiaxin Jiang, Lei Shi, Jiyuan Tan

发表机构 * School of Mathematical Sciences, Fudan University, Shanghai 200433, China（复旦大学数学学院，上海200433，中国）； Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai 200433, China（上海当代应用数学重点实验室，复旦大学，上海200433，中国）

AI总结将镜像下降推广到黎曼流形，通过重参数化提出黎曼镜像下降（RMD）及其随机变体，并建立非渐近收敛保证，在Stiefel流形上退化为曲线梯度下降（CGD）。

2606.12487 2026-06-12 cs.LG 新提交

ReSET: 通过步骤感知温度缩放实现精确的延迟关键型NVFP4推理

Sihwa Lee, Janghwan Lee, Donghoon Yoo, Jae Gon Kim, Hanyul Ryu, Soojung Ryu, Jungwook Choi

发表机构 * Hanyang University（汉阳大学）； Xenoscube Korean Inc.（Xenoscube韩国公司）

AI总结针对大型推理模型在NVFP4低精度推理中精度下降和延迟问题，提出基于推理步骤熵的温度缩放方法ReSET，并设计CUDA小M核，在多个基准上提升精度约2点，解码速度提升2倍。

AI中文摘要

大型推理模型（LRMs）通过生成长中间推理轨迹来改进复杂问题求解，但这大幅增加了推理成本。NVFP4推理通过硬件支持的低精度执行提供了一种减少计算和内存成本的有前景方法。然而，直接将NVFP4应用于LRMs引入了两个实际限制：量化下推理精度下降，且现有NVFP4核在小型批处理自回归解码中未完全实现延迟优势。在这项工作中，我们分析了NVFP4量化对推理过程中token级不确定性的影响。我们表明，量化增加了低熵符号token的错误采样，同时导致在高不确定性推理步骤中过度集中于少量token。基于这一观察，我们提出了\textbf{ReSET}，一种基于推理步骤熵的温度缩放方法，它在线估计步骤级不确定性，并使用token级和步骤级熵信号自适应调整解码温度。为解决延迟差距，我们进一步设计了一个CUDA核心的小型$M$ NVFP4核，用于延迟关键的自回归解码。在推理基准和模型规模上，ReSET将NVFP4推理精度相比NVFP4基线提升高达$\sim\!$2个点。我们的CUDA核心小型$M$核进一步改善了延迟关键解码，相比NVFP4 vLLM提供高达$2.5\!\times$的核级加速，相比BF16提供约$2\!\times$的端到端解码加速。代码可在 https://github.com/aiha-lab/ReSET 获取。

英文摘要

Large reasoning models (LRMs) improve complex problem-solving by generating long intermediate reasoning traces, but this substantially increases inference costs. NVFP4 inference offers a promising approach to reduce both computational and memory costs through hardware-supported low-precision execution. However, directly applying NVFP4 to LRMs introduces two practical limitations: reasoning accuracy degrades under quantization, and existing NVFP4 kernels do not fully realize latency benefits in small-batch autoregressive decoding. In this work, we analyze the effect of NVFP4 quantization on token-level uncertainty during reasoning. We show that quantization increases incorrect sampling at low-entropy symbolic tokens, while causing over-concentration on a small set of tokens in high-uncertainty reasoning steps. Based on this observation, we propose \textbf{ReSET}, a reasoning-step entropy-based temperature-scaling method that estimates step-level uncertainty online and adapts the decoding temperature using both token-level and step-level entropy signals. To address the latency gap, we further design a CUDA-core small-$M$ NVFP4 kernel for latency-critical autoregressive decoding. Across reasoning benchmarks and model scales, ReSET improves NVFP4 reasoning accuracy by up to $\sim\!$2 points over the NVFP4 baseline. Our CUDA-core small-$M$ kernel further improves latency-critical decoding, delivering up to $2.5\!\times$ kernel-level speedup over NVFP4 vLLM and approximately $2\!\times$ end-to-end decoding speedup over BF16. Code is available at https://github.com/aiha-lab/ReSET.
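
The two standard quantities this abstract builds on, temperature-scaled softmax and token-level Shannon entropy, can be sketched as follows. This is only an illustration of how decoding temperature moves entropy; it does not reproduce ReSET's adaptation rule, which the abstract does not specify, and the function names and example logits are our own.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the maximum for numerical stability."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits
sharp = entropy(softmax(logits, temperature=0.5))
base = entropy(softmax(logits, temperature=1.0))
flat = entropy(softmax(logits, temperature=2.0))
```

Lowering the temperature concentrates probability on the top token and reduces entropy, while raising it flattens the distribution; an entropy-aware scheme would choose the temperature per step from signals like these.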

2606.13379 2026-06-12 cs.LG cs.AR cs.ET 新提交

Positional Encoding in the Context of Memristor-Based Analog Computation for Automatic Speech Recognition

基于忆阻器的模拟计算在自动语音识别中的位置编码

Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter

发表机构 * Machine Learning and Human Language Technology Group, Faculty of Computer Science, RWTH Aachen University（亚琛工业大学计算机科学学院机器学习和人类语言技术组）； Apptek GmbH（Apptek 有限公司）

AI总结针对忆阻器模拟计算中位置编码导致模数转换精度下降的问题，通过调整ADC权重和精度位比例或移除编码相关线性变换，分别降低约50%和30%的性能损失。

Comments Accepted at Interspeech 2026

2606.13177 2026-06-12 cs.CL cs.AI cs.LG 交叉投稿

MemRefine: LLM-Guided Compression for Long-Term Agent Memory

MemRefine: 基于LLM引导的压缩用于长期智能体记忆

Minjae Kim, Jinheon Baek, Soyeong Jeong, Sung Ju Hwang

发表机构 * Korea University（高丽大学）； KAIST（韩国科学技术院）

AI总结提出MemRefine框架，利用LLM判断事实内容，通过删除、合并和保留操作将记忆库压缩到固定预算内，在多个基准上保持下游性能并优于基于规则的基线。

AI中文摘要

大型语言模型（LLM）智能体越来越需要在长期交互中运行，其中过去对话中的信息必须被保留和回忆以支持未来任务。然而，随着交互的积累，记忆存储无限制增长，并充满冗余条目，这些条目增加了存储成本，并通过排挤最有用的证据而降低了检索质量。此外，在具有硬性内存预算的资源受限平台上，这尤其受限，促使我们制定了有存储预算的记忆管理任务，即在固定预算内保持已构建的记忆库，同时保留对未来交互有用的信息。为此，我们提出了MemRefine，一个基于LLM引导的框架，由于表面相似性不能很好地反映事实价值，该框架仅使用相似性来提出候选对，并将删除、合并和保留决策推迟给基于事实内容的LLM判断，迭代直到满足预算。在多个记忆框架和长期对话基准上，MemRefine始终满足目标预算，同时保持下游性能，并在紧预算下优于基于规则的基线。

英文摘要

Large language model (LLM) agents are increasingly expected to operate over long-term interactions, where information from past dialogues must be preserved and recalled to support future tasks. However, as interactions accumulate, the memory store grows without bound and fills with redundant entries that inflate storage cost and degrade retrieval by crowding out the most useful evidence. This growth is especially limiting on resource-constrained platforms with hard memory budgets, which motivates us to formulate storage-budgeted memory management: the task of keeping an already constructed memory store within a fixed budget while preserving information useful for future interactions. To this end, we propose MemRefine, an LLM-guided framework. Because surface similarity poorly reflects factual value, MemRefine uses similarity only to propose candidate pairs and defers delete, merge, and preserve decisions to an LLM judge based on factual content, iterating until the budget is met. Across multiple memory frameworks and long-term conversation benchmarks, MemRefine consistently meets target budgets while preserving downstream performance and outperforming rule-based baselines under tight budgets.

Prism: 通过GPU内存气球实现经济高效的多LLM服务

Shan Yu, Yifan Qiao, Mingyuan Ma, Yangmin Li, Shuo Yang, Xinyuan Tong, Yang Wang, Zhiqiang Xie, Yuwei An, Shiyi Cao, Ke Bao, Deepak Vij, Xiaoning Ding, Yichen Wang, Qingda Lu, Zhong Wang, Gao Gao, Harry Xu, Junyi Shu, Jiarong Xing, Ying Sheng

发表机构 * UCLA（加州大学洛杉矶分校）； UC Berkeley（加州大学伯克利分校）； Harvard University（哈佛大学）； CMU（卡内基梅隆大学）； University of Edinburgh（爱丁堡大学）； Intel（英特尔）； Stanford University（斯坦福大学）； LMSYS ； ByteDance（字节跳动）； Alibaba Cloud（阿里云）； Tsinghua University（清华大学）； Novita AI ； Rice University（莱斯大学）

AI总结针对多LLM服务中资源效率低下的问题，提出基于内存气球的内存中心化LLM协同服务框架Prism，统一空间与时间共享，已在10K+ GPU生产环境部署。

Comments OSDI'26

2606.12679 2026-06-12 cs.LG cs.CR eess.IV 新提交

Fed-FBD: Federated Functional Block Diversification for Isolation, Privacy, and Surgical Unlearning

Fed-FBD：用于隔离、隐私和精准遗忘的联邦功能块多样化

Weijie Chen, Alan B. McMillan

发表机构 * University of Wisconsin–Madison（威斯康星大学麦迪逊分校）

AI总结提出Fed-FBD模块化联邦架构，将ResNet分解为六个功能块并维护颜色变体仓库，实现块级隔离、隐私设计和亚秒级精准遗忘，在多个数据集上以微小精度代价换取安全保障。

Comments 12 pages, 3 figures, 8 tables. Code: https://github.com/wchen-ai/functional-block-diversification

AI中文摘要

联邦学习（FL）能够在无需共享原始患者数据的情况下进行协作模型训练，但标准方法（如FedAvg）将每个客户端视为黑盒，无法隔离对抗性贡献者、审计每个客户端的影响或尊重已退出参与者的被遗忘权。我们提出Fed-FBD（联邦功能块多样化），一种模块化联邦架构，将ResNet骨干网络分解为六个功能块（主干、四个残差组和分类头），并维护一个包含N种颜色变体的仓库，每种变体由独立跟踪和贡献者标记的块组装而成。Fed-FBD提供了FedAvg所不具备的三种能力：(i) 架构保证的块级隔离，使对抗性或错误标注的客户端无法污染干净颜色；(ii) 隐私设计，在应用任何隐私机制之前，成员推断优势已与随机猜测无异；(iii) 在亚秒级成本下无需重新训练即可精准遗忘已退出参与者的贡献。在六个MedMNIST-2D数据集、224x224的PathMNIST和CIFAR-10上的实验表明，Fed-FBD在规模足够的数据集上以0.3%-3.1%的IID精度差距换取这些保证，在四个数据集中的三个上，Dirichlet alpha=1.0时与FedAvg的差距在0.8%-4.0%以内，并将我们研究的所有六种对抗性攻击限制在中毒客户端自己的块内，干净颜色上的AUC漂移最多为+/-0.01。

英文摘要

Federated learning (FL) enables collaborative model training without sharing raw patient data, but standard approaches such as FedAvg treat each client as a black box and provide no mechanism for isolating an adversarial contributor, auditing per-client influence, or honoring a departed participant's right to be forgotten. We present Fed-FBD (Federated Functional Block Diversification), a modular federated architecture that decomposes a ResNet backbone into six functional blocks (the stem, four residual groups, and the classification head) and maintains a warehouse of N color variants, each assembled from independently tracked and contributor-stamped blocks. Fed-FBD provides three capabilities absent in FedAvg: (i) architecturally guaranteed block-level isolation, so that an adversarial or mislabelled client cannot contaminate the clean colors; (ii) privacy-by-design, where membership inference advantage is already indistinguishable from chance before any privacy mechanism is applied; and (iii) surgical machine unlearning of a departed participant's contribution at sub-second cost and without retraining. Experiments on six MedMNIST-2D datasets, PathMNIST at 224x224, and CIFAR-10 show that Fed-FBD trades a modest 0.3%-3.1% IID accuracy gap on the adequately sized datasets for these guarantees, remains within 0.8%-4.0% of FedAvg at Dirichlet alpha=1.0 on three of four datasets, and confines all six adversarial attacks we study to the poisoned client's own blocks with at most +/-0.01 AUC drift on the clean colors.

2606.12654 2026-06-12 stat.ME cs.LG stat.ML 交叉投稿

Computationally tractable robust differentially private mean estimation

计算可处理的鲁棒差分隐私均值估计

Kelly Ramsay

AI总结提出一种名为“气球均值”的新差分隐私均值估计器，通过扩展马氏距离球上的迭代裁剪实现计算可处理性、鲁棒性及零集中差分隐私，理论保证在重尾和污染椭圆模型下的统计性能与鲁棒性。

Comments 40 pages, 17 figures

2606.12703 2026-06-12 cs.CR cs.AI cs.LG 交叉投稿

FedBiCross: 医学图像上的个性化一次性联邦学习

Yuexuan Xia, Yinghao Zhang, Yalin Liu, Hong-Ning Dai, Yong Xia

发表机构 * School of Computer Science and Engineering, Northwestern Polytechnical University, China（西北工业大学计算机科学与工程学院）； School of Science and Technology, Hong Kong Metropolitan University, Hong Kong（香港都会大学科技学院）； Department of Computer Science, Hong Kong Baptist University, Hong Kong（香港浸会大学计算机科学系）

AI总结提出FedBiCross框架，通过聚类、双层跨簇优化和个性化蒸馏解决非独立同分布数据下一次性联邦学习中知识蒸馏效果差的问题，在四个医学图像数据集上优于现有方法。

Comments Accepted by BlockSys 2026. This version of the contribution has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections

AI中文摘要

基于无数据知识蒸馏的一次性联邦学习（OSFL）在单轮通信中训练模型，无需共享原始数据，这使得OSFL对隐私敏感的医疗应用具有吸引力。然而，现有方法聚合所有客户端的预测以形成全局教师。在非独立同分布数据下，冲突的预测在平均过程中相互稀释，产生信息量较少的软标签，从而削弱蒸馏效果。我们提出FedBiCross，一个个性化OSFL框架，包含三个阶段：（1）根据模型输出相似性对客户端进行聚类，形成连贯的子集成；（2）双层跨簇优化，学习自适应权重以选择性利用有益的跨簇知识，同时抑制负迁移；（3）针对客户端特定适应的个性化蒸馏。在四个医学图像数据集上的实验表明，FedBiCross在不同非独立同分布程度下始终优于最先进的基线方法。

英文摘要

Data-free knowledge distillation-based one-shot federated learning (OSFL) trains a model in a single communication round without sharing raw data, making OSFL attractive for privacy-sensitive medical applications. However, existing methods aggregate predictions from all clients to form a global teacher. Under non-IID data, conflicting predictions dilute each other during averaging, yielding less informative soft labels that weaken distillation. We propose FedBiCross, a personalized OSFL framework with three stages: (1) clustering clients by model output similarity to form coherent sub-ensembles, (2) bi-level cross-cluster optimization that learns adaptive weights to selectively leverage beneficial cross-cluster knowledge while suppressing negative transfer, and (3) personalized distillation for client-specific adaptation. Experiments on four medical image datasets demonstrate that FedBiCross consistently outperforms state-of-the-art baselines across different non-IID degrees.

2605.11165 2026-06-12 cs.LG 版本更新

COSMOS: Model-Agnostic Personalized Federated Learning with Clustered Server Models and Pseudo-Label-Only Communication

COSMOS：基于聚类服务器模型和伪标签通信的模型无关个性化联邦学习

Ben Rachmut, Luise Ge, William Yeoh, Ning Zhang, Yevgeniy Vorobeychik

发表机构 * Washington University in St. Louis（华盛顿大学圣路易斯分校）

AI总结 COSMOS通过伪标签通信实现服务器端个性化，利用客户端本地模型预测公共数据并聚类，训练集群特定模型并回传知识蒸馏，理论分析显示其能有效降低个性化风险，实验验证其在异构环境中优于现有基线方法。

AI中文摘要

联邦学习在异构环境中面临挑战，因为客户端模型在架构和数据分布上差异显著。尽管近期方法通过客户端聚类和知识蒸馏应对，但同时处理架构和统计异质性仍困难。我们引入COSMOS，一种模型无关框架，通过仅使用伪标签通信实现服务器端个性化。客户端训练本地模型并在公共数据上进行预测；服务器根据预测相似性聚类客户端，利用自身计算为每个群组训练特定模型，并将所得模型蒸馏回客户端。我们提供了首个理论分析，证明从学习的集群模型蒸馏可产生指数级个性化风险收缩，超越模型无关联邦学习通常提供的收敛到平稳状态保证。在基准测试中，COSMOS在异构环境中一致优于所有模型无关联邦学习基线方法，同时与最先进的个性化联邦学习方法竞争。更广泛地说，我们的结果强调了使用伪标签实现个性化服务器端学习作为可扩展且模型无关联邦学习的有前景范式。

英文摘要

Federated learning (FL) in heterogeneous environments remains challenging because client models often differ in both architecture and data distribution. While recent approaches attempt to address this challenge through client clustering and knowledge distillation, simultaneously handling architectural and statistical heterogeneity remains difficult. We introduce COSMOS, a model-agnostic framework that enables server-side personalization using only pseudo-label communication. Clients train local models and predict on the public data; the server clusters clients by prediction similarity, trains a cluster-specific model for each group using its own compute, and distills the resulting models back to clients. We provide the first theoretical analysis showing that distillation from the learned cluster models can yield exponential personalization risk contraction, going beyond the convergence-to-stationarity guarantees typically provided in model-agnostic FL. Experiments across benchmarks demonstrate that COSMOS consistently outperforms all model-agnostic FL baselines while remaining competitive with state-of-the-art personalized FL methods. More broadly, our results highlight personalized server-side learning with pseudo-labels as a promising paradigm for scalable and model-agnostic federated learning in highly heterogeneous environments.

2305.08175 2026-06-12 cs.DB cs.CR cs.LG 版本更新

ResidualPlanner+: a scalable matrix mechanism for marginals and beyond

ResidualPlanner+：一种用于边际查询及更广泛查询的可扩展矩阵机制

Guanlin He, Yingtai Xiao, Levent Toksoz, Zeyu Ding, Danfeng Zhang, Daniel Kifer

发表机构 * The Pennsylvania State University（宾夕法尼亚州立大学）； Binghamton University（宾厄姆顿大学）； Duke University（杜克大学）； TikTok Inc.（抖音公司）

AI总结提出两种可扩展的矩阵机制ResidualPlanner和ResidualPlanner+，分别优化边际查询的精度和支持更复杂的工作负载（如范围查询），在速度和内存上显著超越现有方法。

AI中文摘要

带噪声的边际查询是保护机密性的常见数据发布形式，对于列联表分析、贝叶斯网络构建甚至合成数据生成等下游任务非常有用。为线性查询（如边际查询）提供无偏噪声答案的隐私机制称为矩阵机制。我们提出了ResidualPlanner和ResidualPlanner+，两种高度可扩展的矩阵机制。ResidualPlanner在使用高斯噪声回答边际查询时既最优又可扩展，而ResidualPlanner+支持更通用的工作负载，例如边际查询与范围查询或前缀和查询的组合。ResidualPlanner可以优化许多损失函数，这些损失函数可以写成边际方差的凸函数（先前的工作仅限于一个预定义的目标函数）。ResidualPlanner可以在几秒钟内优化大规模设置中边际查询的精度，即使之前的最先进方法（HDMM）内存耗尽。它甚至可以在几分钟内处理具有100个属性的数据集。此外，ResidualPlanner可以高效计算每个边际的方差/协方差值（先前的方法即使对于相对较小的数据集也会很快耗尽内存）。ResidualPlanner+支持更复杂的工作负载，这些工作负载结合了边际查询和范围/前缀和查询（例如，关于种族的边际查询、关于年龄的范围查询以及回答每个种族的年龄范围查询的组合种族/年龄表格）。它甚至支持用户在不同属性上自定义工作负载。凭借这种增加的灵活性，ResidualPlanner+不一定是最优的，但它仍然极具可扩展性，并且在精度和速度上均优于先前的最先进方法（HDMM）处理前缀和查询。

英文摘要

Noisy marginals are a common form of confidentiality-protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner and ResidualPlanner+, two highly scalable matrix mechanisms. ResidualPlanner is both optimal and scalable for answering marginal queries with Gaussian noise, while ResidualPlanner+ provides support for more general workloads, such as combinations of marginals and range queries or prefix-sum queries. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large-scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets). ResidualPlanner+ provides support for more complex workloads that combine marginal and range/prefix-sum queries (e.g., a marginal on race, a range query on age, and a combined race/age tabulation that answers age range queries for each race). It even supports custom user-defined workloads on different attributes. With this added flexibility, ResidualPlanner+ is not necessarily optimal; however, it is still extremely scalable and outperforms the prior state of the art (HDMM) on prefix-sum queries in terms of both accuracy and speed.

2606.12490 2026-06-12 cs.LG 新提交

Robustness Verification of Recurrent Neural Networks with Abstraction Refinement

基于抽象精化的循环神经网络鲁棒性验证

Li-Jen Lin, Chih-Duo Hong

发表机构 * National Science and Technology Council (NSTC), Taiwan（台湾国家科学与技术委员会）

AI总结提出抽象精化框架，通过分割预激活区间消除非线性松弛误差，并利用SHAP引导的时间步选择策略降低组合成本，显著提升RNN鲁棒性验证成功率。

AI中文摘要

PolicyGuard：面向强化学习智能体的测试时和步级对抗防御

Junfeng Guo Heng Huang

AI总结提出PolicyGuard，一种基于高斯过程后验方差的测试时步级后门防御方法，通过自适应伪轨迹计算单步不确定性，在七种RL游戏中达到平均AUROC 0.856和0.859。

AI中文摘要

尽管强化学习（RL）的实际应用日益普及，但RL系统的安全性值得更多关注和探索。特别是，最近的研究揭示了RL智能体容易受到后门攻击，即受害智能体在标准条件下表现正常，但在特定触发器被激活时执行恶意动作。现有的RL后门防御要么需要访问智能体的内部参数，要么仅在模型或轨迹级别操作，或者仅限于特定攻击类型。为了确保RL智能体的安全性，我们提出了PolicyGuard，一种测试时步级后门防御方法，它利用高斯过程（GP）后验方差并自适应伪轨迹以实现单个时间步的不确定性计算。此外，我们还提供了理论基础来解释GP后验方差的有效性。在七个RL游戏上的大量实验表明，PolicyGuard在大多数情况下实现了最先进的检测性能，对于基于扰动的攻击平均AUROC为0.856，对于对抗智能体攻击平均AUROC为0.859。

英文摘要

While real-world applications of reinforcement learning (RL) are becoming increasingly popular, the security of RL systems deserves more attention and exploration. In particular, recent work has revealed that RL agents are vulnerable to backdoor attacks, where a victim agent behaves normally under standard conditions but executes malicious actions when a specific trigger is activated. Existing backdoor defenses for RL either require access to the agent's internal parameters, operate only at the model or trajectory level, or are limited to specific attack types. To ensure the security of RL agents, we propose PolicyGuard, a test-time step-level backdoor defense which leverages Gaussian Process (GP) posterior variance and adapts pseudo trajectories to enable uncertainty computation for individual time steps. We also provide theoretical foundations to explain the efficacy of GP posterior variance. Extensive experiments across seven RL games demonstrate that PolicyGuard achieves state-of-the-art detection performance in most cases, with an average AUROC of 0.856 for perturbation-based attacks and 0.859 for adversary-agent attacks.

2606.13172 2026-06-12 cs.LG 新提交

Detecting Explanatory Insufficiency in Learned Representations: A Framework for Representational Vigilance

检测学习表示中的解释不充分性：表示警觉性框架

Jacques Raynal, Pierre Slangen, Elsa Raynal, Jacques Margerit

发表机构 * Laboratory of Bioengineering and Nanosciences (LBN), University of Montpellier（蒙彼利埃大学生物工程与纳米科学实验室）； EuroMov Digital Health in Motion, University of Montpellier, IMT Mines Alès（蒙彼利埃大学EuroMov数字健康运动实验室，IMT阿莱斯矿业学院）； Certified Sophrologist, Sensorimotor Practice（认证心理放松治疗师，感觉运动实践）； Emeritus Professor, University of Montpellier（蒙彼利埃大学名誉教授）

AI总结提出VER框架，通过识别持久残差结构来监测学习表示的充分性，补充传统评估方法。

Comments 22 pages, 1 figure. Conceptual framework for representation diagnostics in machine learning

AI中文摘要

学习表示是现代机器学习的核心，通常通过预测性能、鲁棒性、不确定性估计或泛化能力来评估。然而，一个学习表示可能在操作上仍然成功，同时逐渐无法组织未被传统评估指标完全捕获的持久残差结构。本文介绍了VER（表示警觉评估器），一个用于监测学习表示充分性的概念框架。VER不提出新的学习算法、损失函数或模型架构。相反，它形式化了一个诊断过程，通过该过程可以识别、分析持久残差结构，并将其解释为解释不充分性的潜在指标。该框架将表示不充分性与普通预测误差、不确定性、噪声和分布偏移区分开来。它引入了一个基于表示识别、解释域界定、残差结构检测、解释阻力评估和警觉信号发出的监测序列。VER旨在作为机器学习中表示诊断的贡献。其目标不是取代现有的评估方法，而是通过将表示充分性视为明确的探究对象来补充它们。还概述了通过表示警觉性基准进行实证评估的路径。

英文摘要

Learned representations are central to modern machine learning and are commonly evaluated through predictive performance, robustness, uncertainty estimation, or generalization. However, a learned representation may remain operationally successful while progressively failing to organize persistent residual structures that are not fully captured by conventional evaluation metrics. This article introduces VER, the Vigilant Evaluator of Representations, a conceptual framework for monitoring representational adequacy in learned representations. VER does not propose a new learning algorithm, loss function, or model architecture. Instead, it formalizes a diagnostic process through which persistent residual structures may be identified, analyzed, and interpreted as potential indicators of explanatory insufficiency. The framework distinguishes representational inadequacy from ordinary prediction error, uncertainty, noise, and distribution shift. It introduces a monitoring sequence based on representation identification, explanatory-domain delimitation, residual-structure detection, explanatory-resistance evaluation, and vigilance signaling. VER is intended as a contribution to representation diagnostics in machine learning. Its objective is not to replace existing evaluation methods but to complement them by treating representational adequacy as an explicit object of inquiry. A path toward empirical evaluation through representational-vigilance benchmarks is also outlined.

2606.13209 2026-06-12 cs.LG cs.CL 新提交

Understanding helpfulness and harmless tension in reward models

理解奖励模型中的有用性与无害性张力

Eshaan Tanwar, Pepa Atanasova

发表机构 * University of Copenhagen（哥本哈根大学）

AI总结通过激活分析和消融实验，发现奖励模型中有用性和无害性目标存在干扰，共享神经元对模型行为影响不成比例，导致对齐张力。

Comments The source code used in this study is publicly available at: https://github.com/EshaanT/RM-alignment\_tension

AI中文摘要

奖励模型是从人类反馈中进行强化学习（RLHF）的关键组成部分，使语言模型在有用性和无害性行为上对齐。然而，这些目标背后的内部机制及其冲突仍知之甚少。我们研究了在仅有用性、仅无害性和混合目标设置下训练的奖励模型中的对齐张力。我们发现混合目标模型通常表现不如单目标模型，表明目标之间存在干扰。使用基于激活的方法，我们识别了与每个目标相关的神经元，并通过定向消融研究其功能角色。我们发现这些神经元因果地支持其对应目标，同时往往对对立目标产生负面影响。我们发现相当比例的神经元在有用性和无害性之间共享，并且这些共享神经元对模型行为产生不成比例的影响，导致对齐张力。此外，我们的结果提供了关于对齐目标如何在奖励模型中表示以及为什么多目标对齐仍然具有挑战性的见解和机制解释，为未来关于解耦和可控对齐方法的研究提供了动力。

英文摘要

Reward models are a key component of reinforcement learning from human feedback (RLHF), aligning language models toward both helpful and harmless behaviour. However, the internal mechanisms underlying these objectives and their conflicts remain poorly understood. We study alignment tension in reward models trained under helpfulness-only, harmlessness-only, and mixed-objective settings. We find that mixed-objective models often underperform single-objective models, indicating interference between objectives. Using activation-based methods, we identify neurons associated with each objective and study their functional roles via targeted ablations. We find that these neurons causally support their corresponding objectives while often negatively affecting the opposing one. We find that a substantial proportion of neurons are shared between helpfulness and harmlessness, and that these shared neurons exert a disproportionate influence on model behaviour, contributing to alignment tension. Additionally, our results provide insights and mechanistic interpretation into how alignment objectives are represented in reward models and why multi-objective alignment remains challenging, motivating future work on disentangled and controllable alignment methods.

2606.13451 2026-06-12 cs.LG 新提交

Uncertainty Estimation for Molecular Diffusion Models

分子扩散模型的不确定性估计

Paul Seij, Christian A. Naesseth, Stephan Mandt, Metod Jazbec

发表机构 * University of Amsterdam（阿姆斯特丹大学）

AI总结提出一种事后方法，利用去噪网络的拉普拉斯近似估计预训练分子扩散模型中每个样本的不确定性，该分数与样本质量负相关，可用于过滤生成样本。

2606.12498 2026-06-12 cs.CR cs.LG 交叉投稿

鲁棒的状态条件特征加权跳跃模型用于时间聚类

Federico P. Cortese, Alessio Farcomeni

AI总结提出一种鲁棒的特征加权跳跃模型，通过Tukey双权损失函数实现鲁棒性，并引入状态特定特征权重，在模拟和实证中优于竞争方法。

2606.13277 2026-06-12 stat.ML cs.LG 交叉投稿

不可靠数据下的公平性审计有多可靠？

Yash Vardhan Tomar

发表机构 * Purdue University（普渡大学）

AI总结研究受保护标签缺失对公平性缓解审计的影响，提出种子校准压力测试区分缺失效应与随机波动，发现正可用性缺失通常不改变缓解方法效果，但无标签端点表现不同，且阈值优化可能将单轴公平性增益转化为交叉危害。

AI中文摘要

公平性审计是负责任机器学习部署的关键组成部分。然而，在不完全受保护标签访问下审计建议的可靠性仍然知之甚少。在这项工作中，我们关注公平性缓解审计中的受保护标签缺失。我们引入了一种种子校准压力测试，以将缺失效应与完全标签下已经存在的种子间波动分离开来。在ACS/Folktables任务中，我们发现正可用性缺失通常不会将选定的缓解方法移出完全标签的种子基线。无标签端点表现不同，暴露了ERM等效候选和确定性断点，而不是广泛的缺失效应。我们还发现，阈值优化可以将单轴公平性增益转化为高于零点的交叉危害，这是一种更尖锐的失败模式，在随机森林验证下似乎仍然可见。总体而言，我们的结果强调，在将受保护标签缺失视为审计脆弱性的证据之前，应报告种子零校准、候选集背景和交叉后果。

英文摘要

Fairness audits are a key component of responsible machine-learning deployment. Yet, audit-recommendation reliability under incomplete protected-label access is still poorly understood. In this work, we focused on protected-label missingness in fairness mitigation audits. We introduced a seed-calibrated stress test to separate missingness effects from seed-to-seed movement already present under complete labels. Across ACS/Folktables tasks, missingness settings that retain some protected labels usually do not move selected mitigation methods beyond a complete-label seed-to-seed baseline. At $0\%$ protected-label access, candidates collapse to an empirical-risk-minimization baseline and deterministic tie-breaking rather than revealing a broad missingness effect. We also found that threshold optimization can turn fairness gains on a single protected axis into intersectional harm above a seed baseline, and this threshold-optimizer finding persists under random-forest validation. Overall, our results highlight that protected-label missingness should be reported with seed-null calibration, candidate-set context, and intersectional consequences before it is treated as evidence of audit fragility.

2507.07947 2026-06-12 cs.LG cs.AI 版本更新

Reconstructing Template-Memorized Images from Natural Prompts

从自然提示中重建模板记忆的图像

Sol Yarkoni, Mahmood Sharif, Roi Livni

发表机构 * School of Electrical & Computer Engineering（电气与计算机工程学院）； School of Computer Science & AI（计算机科学与人工智能学院）； Tel Aviv University（特拉维夫大学）

AI总结提出一种低资源攻击方法，利用模板化电商数据中的模式，从自然提示中重建训练集中的记忆图像，揭示隐私风险。

AI中文摘要

生成模型（如扩散模型）的最新进展引发了与隐私、版权侵犯和数据管理相关的担忧。为了更好地理解和控制这些风险，先前的工作引入了从训练数据中重建图像或部分图像的技术和攻击。虽然这些结果表明训练数据可以被恢复，但现有方法通常依赖于高计算资源、对训练集的部分访问或精心设计的提示。在这项工作中，我们提出了一种新的攻击，该攻击需要低资源，假设对训练数据几乎没有或完全没有访问权限，并识别出看似良性的提示，这些提示可能导致潜在有风险的图像重建。我们进一步表明，即使对于没有专业知识的用户，这种重建也可能无意中发生。例如，我们观察到，对于某个现有模型，提示“blue Unisex T-Shirt”（蓝色男女通用T恤）会生成一个真实个体的面部。此外，通过将已识别的漏洞与真实世界的提示数据相结合，我们发现了能够重现记忆视觉元素的提示。我们的方法建立在先前工作的见解之上，并利用领域知识来揭示由于使用抓取的电商数据而产生的基本漏洞，其中模板化布局和图像与模式化的文本提示紧密相关。我们的攻击代码在 https://github.com/TheSolY/lr-tmi 公开。

英文摘要

Recent advances in generative models, such as diffusion models, have raised concerns related to privacy, copyright infringement, and data stewardship. To better understand and control these risks, prior work has introduced techniques and attacks that reconstruct images, or parts of images, from training data. While these results demonstrate that training data can be recovered, existing methods often rely on high computational resources, partial access to the training set, or carefully engineered prompts. In this work, we present a new attack that requires low resources, assumes little to no access to the training data, and identifies seemingly benign prompts that can lead to potentially risky image reconstruction. We further show that such reconstructions may occur unintentionally, even for users without specialized knowledge. For example, we observe that for one existing model, the prompt "blue Unisex T-Shirt" generates the face of a real individual. Moreover, by combining the identified vulnerabilities with real-world prompt data, we discover prompts that reproduce memorized visual elements. Our approach builds on insights from prior work and leverages domain knowledge to expose a fundamental vulnerability arising from the use of scraped e-commerce data, where templated layouts and images are closely tied to pattern-like textual prompts. The code for our attack is publicly available at https://github.com/TheSolY/lr-tmi.

2507.08794 2026-06-12 cs.LG cs.CL 版本更新

One Token to Fool LLM-as-a-Judge

一个令牌就能欺骗LLM裁判

Yulai Zhao, Haolin Liu, Dian Yu, Sunyuan Kung, Meijia Chen, Haitao Mi, Dong Yu

发表机构 * Princeton University（普林斯顿大学）； University of Virginia（弗吉尼亚大学）； Tencent AI Lab（腾讯人工智能实验室）； Rutgers University（罗格斯大学）

AI总结发现基于参考的生成式奖励模型易受奖励黑客攻击，表面输入（如非词符号或通用推理开头）能持续引发假阳性奖励，提出使用截断模型输出作为对抗性负例的数据增强策略，构建鲁棒的Master奖励模型。

AI中文摘要

大型语言模型（LLM）越来越被信任作为自动裁判，协助评估并为训练其他模型提供奖励信号，特别是在基于参考的设置中，如带可验证奖励的强化学习（RLVR）。然而，我们揭示了即使在这种基于参考的范式中也存在一个关键漏洞：生成式奖励模型系统性地容易受到奖励黑客攻击。我们发现，表面输入（我们称之为“万能钥匙”），例如非词符号（如“:”或“.”）或通用推理开头（如“Thought process:”或“Let's solve this problem step by step.”），可以在没有任何实质性推理的情况下持续引发假阳性奖励。我们的系统评估表明，这是一个广泛存在的失败，影响多种模型，包括领先的专有系统如GPT-o1和Claude-4。这些结果挑战了LLM裁判假定的鲁棒性，并对其可靠性构成重大威胁。为了解决这个问题，我们提出了一种简单而有效的数据增强策略，使用截断的模型输出作为对抗性负例。由此产生的Master奖励模型（Master-RMs）在抵御这些“万能钥匙”攻击方面表现出最先进的鲁棒性，同时在标准评估设置中保持高性能。我们通过跨模型规模、提示变化和常见推理时策略的漏洞全面分析来补充这些发现，为未来关于鲁棒LLM评估的研究提供见解。我们在 https://huggingface.co/sarosavo/Master-RM 和 https://huggingface.co/datasets/sarosavo/Master-RM 发布我们的鲁棒通用领域奖励模型和合成训练数据。

英文摘要

Large language models (LLMs) are increasingly trusted as automated judges, assisting evaluation and providing reward signals for training other models, particularly in reference-based settings like Reinforcement Learning with Verifiable Rewards (RLVR). However, we uncover a critical vulnerability even in this reference-based paradigm: generative reward models are systematically susceptible to reward hacking. We find that superficial inputs, which we term "master keys", such as non-word symbols (e.g., ":" or ".") or generic reasoning openers (e.g., "Thought process:" or "Let's solve this problem step by step."), can consistently elicit false positive rewards without any substantive reasoning. Our systematic evaluation demonstrates this is a widespread failure affecting a diverse range of models, including leading proprietary systems such as GPT-o1 and Claude-4. These results challenge the assumed robustness of LLM judges and pose a significant threat to their reliability. To address this, we propose a simple yet effective data augmentation strategy using truncated model outputs as adversarial negative examples. The resulting Master Reward Models (Master-RMs) demonstrate state-of-the-art robustness against these "master key" attacks while maintaining high performance in standard evaluation settings. We supplement these findings with a comprehensive analysis of the vulnerability across model scales, prompt variations, and common inference-time strategies, offering insights to guide future research on robust LLM evaluation. We release our robust, general-domain reward models and the synthetic training data at https://huggingface.co/sarosavo/Master-RM and https://huggingface.co/datasets/sarosavo/Master-RM.

2603.29515 2026-06-12 cs.LG 版本更新

Variational Graph Neural Networks for Uncertainty Quantification in Inverse Problems

变分图神经网络用于反问题中的不确定性量化

David Gonzalez, Alba Muixi, Beatriz Moya, Elias Cueto

发表机构 * Keysight-UZ Chair of the Spanish National Strategy on AI（西班牙人工智能国家战略主席席位）； Aragon Institute of Engineering Research (I3A)（阿拉贡工程研究所（I3A））； Universidad de Zaragoza（萨拉戈萨大学）； Laboratori de Càlcul Numèric (LaCàN)（数值计算实验室（LaCàN））； Universitat Politècnica de Catalunya - BarcelonaTech (UPC)（加泰罗尼亚理工大学 - 巴塞罗那科技大学（UPC））； Centre Internacional de Mètodes Numèrics en Enginyeria (CIMNE)（国际数值工程方法中心（CIMNE））； PIMM Lab. Arts et Métiers Institute of Technology（巴黎艺术与技术理工学院PIMM实验室）

AI总结提出变分图神经网络（VGNN），通过在解码器引入变分层以较低成本量化认知和统计不确定性，在固体力学反问题中验证了高精度参数恢复与置信区间估计。

AI中文摘要

深度学习技术在计算力学中的日益广泛应用显著加速了那些几年前还被认为是难以处理的问题的模拟。然而，在诸如工程或医学数字孪生等关键应用中，快速响应是不够的；还必须提供可靠的结果。在某些情况下，传统的确定性方法可能不是最优的，因为它们无法提供对其预测或结果的置信度度量，尤其是在反问题中，解可能不唯一或初始数据由于噪声等原因不完全可靠。经典的深度神经网络也缺乏明确的度量来量化其预测的不确定性。在这项工作中，我们提出了一种变分图神经网络（VGNN）架构，该架构将变分层集成到其架构中以建模权重的概率分布。与计算昂贵的全贝叶斯网络不同，我们的方法仅在解码器中策略性地引入变分层，从而能够以相对较低的成本估计认知不确定性和统计不确定性。在这项工作中，我们在两个固体力学案例中验证了所提出的方法：在二维弹性问题中识别具有非线性分布的弹性模量值，以及在三维超弹性梁中定位和量化施加的载荷，在这两种情况下仅使用每个测试的位移场作为输入数据。结果表明，该模型不仅以高精度恢复了物理参数，还提供了与问题物理特性一致的置信区间，并且能够定位施加载荷的位置并估计其值，为该实验提供了置信区间。

英文摘要

The increasingly wide use of deep machine learning techniques in computational mechanics has significantly accelerated simulations of problems that were considered unapproachable just a few years ago. However, in critical applications such as Digital Twins for engineering or medicine, fast responses are not enough; reliable results must also be provided. In certain cases, traditional deterministic methods may not be optimal as they do not provide a measure of confidence in their predictions or results, especially in inverse problems where the solution may not be unique or the initial data may not be entirely reliable due to the presence of noise, for instance. Classic deep neural networks also lack a clear measure to quantify the uncertainty of their predictions. In this work, we present a variational graph neural network (VGNN) architecture that integrates variational layers into its architecture to model the probability distribution of weights. Unlike computationally expensive full Bayesian networks, our approach strategically introduces variational layers exclusively in the decoder, allowing us to estimate cognitive uncertainty and statistical uncertainty at a relatively lower cost. In this work, we validate the proposed methodology in two cases of solid mechanics: the identification of the value of the elastic modulus with nonlinear distribution in a 2D elastic problem and the location and quantification of the loads applied to a 3D hyperelastic beam, in both cases using only the displacement field of each test as input data. The results show that the model not only recovers the physical parameters with high precision, but also provides confidence intervals consistent with the physics of the problem, as well as being able to locate the position of the applied load and estimate its value, giving a confidence interval for that experiment.

2605.00432 2026-06-12 cs.LG stat.ML 版本更新

Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction

贝叶斯共形预测的最优时空解耦

Yu-Hsueh Fang, Chia-Yen Lee

AI总结提出状态自适应贝叶斯共形预测（SA-BCP），通过门控凸组合平衡长期时间惯性与局部空间证据，实现分布漂移下的快速适应与稳定覆盖，并给出MSE最优阈值闭式解及在线选择过程的遗憾界。

AI中文摘要

在线共形预测必须在快速适应分布漂移与稳定覆盖之间取得平衡：基于反馈的方法反应迅速但变得不稳定，而强折扣贝叶斯方法滞后并在紧密覆盖下膨胀区间。我们引入了状态自适应贝叶斯共形预测（SA-BCP），它将预测分位数形成为长期时间惯性与来自核密度估计的局部空间证据的门控凸组合，由单个可解释的证据阈值$K$控制。我们建立了三个结果：(i) 所得区间的渐近边际有效性；(ii) MSE最优阈值的闭式表达式$K^*_{\mathrm{MSE}}=\alpha(1-\alpha)/M^{\mathcal{T}}$，权衡了覆盖指标（伯努利）方差与时间结构偏差$M^{\mathcal{T}}$；(iii) 在线选择$K$的滚动起点过程：在平稳性下一致，对最佳固定$K$具有$O(\sqrt{T\log N})$遗憾，对于分段变体，在有界漂移下具有次线性动态遗憾界。在四个金融波动率和天气数据集、三个目标覆盖水平以及八个基线（包括最强的最近条件分位数方法SPCI和KOWCPI）上，SA-BCP在大多数设置中达到或超过名义覆盖，同时产生显著更窄的区间（在最紧密覆盖下，Winkler得分比折扣贝叶斯CP低约$3\times$），且覆盖匹配审计确认这些效率提升并非欠覆盖的假象。我们披露了一个主要限制：一个专门针对波动率的共形GARCH竞争对手在其主波动率基序列上仍然更高效，尽管它不能跨领域迁移。

英文摘要

Online conformal prediction must balance fast adaptation to distribution shift against stable coverage: feedback-driven methods react quickly but become volatile, while strongly discounted Bayesian methods lag and inflate intervals at tight coverage. We introduce State-Adaptive Bayesian Conformal Prediction (SA-BCP), which forms the predictive quantile as a gated convex combination of long-term temporal inertia and local spatial evidence from a kernel density estimate, controlled by a single interpretable evidence threshold $K$. We establish three results: (i) asymptotic marginal validity of the resulting intervals; (ii) a closed-form expression for the MSE-optimal threshold, $K^*_{\mathrm{MSE}}=\alpha(1-\alpha)/M^{\mathcal{T}}$, trading the coverage-indicator (Bernoulli) variance against the temporal structural bias $M^{\mathcal{T}}$; and (iii) a rolling-origin procedure for selecting $K$ online -- consistent under stationarity, with $O(\sqrt{T\log N})$ regret against the best fixed $K$ and, for a segmented variant, a sublinear dynamic-regret bound under bounded drift. Across four financial-volatility and weather datasets, three target coverage levels, and eight baselines (including the strongest recent conditional-quantile methods, SPCI and KOWCPI), SA-BCP attains at-or-above-nominal coverage in most settings while producing substantially sharper intervals -- up to roughly $3\times$ lower Winkler score than discounted Bayesian CP at the tightest coverage -- and a coverage-matched audit confirms these efficiency gains are not an artifact of under-coverage. We disclose one principal limitation: a volatility-specialized conformal-GARCH competitor remains more efficient on its home volatility-base series, though it does not transfer across domains.
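
As a rough illustration of the closed-form threshold quoted in the abstract, the sketch below evaluates K* = α(1−α)/M, where the numerator is the variance of a Bernoulli coverage indicator. The function name `mse_optimal_threshold` and the numeric bias values are hypothetical; the abstract does not say how the temporal structural bias is estimated, so it is simply passed in as a number.

```python
def mse_optimal_threshold(alpha, temporal_bias):
    """Evaluate K* = alpha * (1 - alpha) / M for a miscoverage level alpha.

    `temporal_bias` stands in for the temporal structural bias term M;
    how to estimate it is not specified here (assumption: a positive scalar).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if temporal_bias <= 0.0:
        raise ValueError("temporal_bias must be positive")
    bernoulli_variance = alpha * (1.0 - alpha)  # variance of the coverage indicator
    return bernoulli_variance / temporal_bias


# Hypothetical numbers: a larger bias lowers the threshold K*.
k_small_bias = mse_optimal_threshold(alpha=0.1, temporal_bias=0.009)
k_large_bias = mse_optimal_threshold(alpha=0.1, temporal_bias=0.09)
```

The formula makes the trade-off visible: for a fixed α, K* shrinks as the temporal bias grows.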

2605.00600 2026-06-12 cs.LG cs.AI cs.CV 版本更新

Possibilistic Predictive Uncertainty for Deep Learning

深度学习的可能性预测不确定性

Yao Ni, Jeremie Houssineau, Yew-Soon Ong, Piotr Koniusz

发表机构 * University of Cambridge（剑桥大学）； National University of Singapore（新加坡国立大学）； University of Warsaw（华沙大学）

AI总结提出基于可能性理论的Dirichlet近似可能性后验预测（DAPPr）框架，通过投影-近似策略实现高效且原则性的认知不确定性量化，在多个基准上达到竞争性能。

Comments Accepted by ICML 2026, 20 pages

AI中文摘要

深度神经网络在多种应用中取得了令人印象深刻的结果，然而它们对未见输入的过度自信需要可靠的认知不确定性建模。现有的不确定性建模方法面临一个基本困境：贝叶斯方法提供原则性的估计，但计算成本高昂，而高效的二阶预测器在其特定目标与认知不确定性量化之间缺乏严格联系。为解决这一困境，我们引入了Dirichlet近似可能性后验预测（DAPPr），一个基于可能性理论的原则性框架。我们定义了参数上的可能性后验，通过上确界算子将其投影到预测空间，并使用可学习的Dirichlet可能性函数近似投影后的后验。这种投影-近似策略产生了一个具有闭式解的简单训练目标。尽管简单，跨多个不同基准的大量实验表明，DAPPr在保持原则性推导和计算效率的同时，实现了与最先进的二阶预测器相当或更优的不确定性量化性能。代码可在 https://github.com/MaxwellYaoNi/DAPPr 获取。

英文摘要

Deep neural networks achieve impressive results across diverse applications, yet their overconfidence on unseen inputs necessitates reliable epistemic uncertainty modeling. Existing methods for uncertainty modeling face a fundamental dilemma: Bayesian approaches provide principled estimates but remain computationally prohibitive, while efficient second-order predictors lack rigorous connections between their specific objectives and epistemic uncertainty quantification. To resolve this dilemma, we introduce Dirichlet-approximated possibilistic posterior predictions (DAPPr), a principled framework grounded in possibility theory. We define a possibilistic posterior over parameters, project it to the prediction space via supremum operators, and approximate the projected posterior using learnable Dirichlet possibility functions. This projection-and-approximation strategy yields a simple training objective with closed-form solutions. Despite its simplicity, extensive experiments across diverse benchmarks show that DAPPr achieves competitive or superior uncertainty quantification performance over state-of-the-art second-order predictors while maintaining both principled derivation and computational efficiency. Code is available at https://github.com/MaxwellYaoNi/DAPPr.

2605.18231 2026-06-12 cs.LG 版本更新

Attacking the First-Principle: A Black-Box, Query-Free Targeted Mimicry Attack on Binary Function Classifiers

攻击第一性原理：一种针对二进制函数分类器的黑盒、免查询定向模仿攻击

Gabriel Sauger, Jean-Yves Marion, Sazzadur Rahaman, Victor Matrat, Vincent Tourneur, Muaz Ali

发表机构 * LORIA（洛林信息与自动化研究院）； University of Arizona（亚利桑那大学）

AI总结本文提出Kelpie框架，首次在黑盒、零查询环境下成功执行针对二进制函数分类器的模仿攻击，展示了其在不同模型架构下的有效性，并通过实际案例验证了攻击的可行性，引发对现有基于机器学习的二进制函数分类器可靠性和安全性的质疑。

AI中文摘要

二进制函数分类器通过检测恶意代码和未经授权的修改，在维护软件系统安全性和完整性方面起着关键作用。然而，基于机器学习的分类器容易受到可绕过检测的对抗攻击。在本研究中，我们提出Kelpie，一种新型框架，用于在黑盒、零查询环境下对二进制函数分类器执行模仿攻击，这是一种更强的定向逃逸攻击。与以往依赖查询目标分类器来优化非定向逃逸攻击的方法不同，Kelpie利用代码转换，在保持恶意负载功能的同时使其被误分类为攻击者指定的类别。通过广泛实验，我们证明Kelpie能够成功对六种代表不同模型架构的最先进二进制函数分类器执行模仿攻击，而无需直接与它们交互。我们进一步通过实际演示验证了我们的方法，演示中键盘记录器和擦除器被隐藏在嵌入应用程序的看似无害的函数中。据我们所知，我们的工作首次在黑盒、零查询环境下展示此类模仿攻击，引发了对现有基于机器学习的二进制函数分类器可靠性和安全性的重大质疑。

英文摘要

Binary function classifiers play a crucial role in maintaining the security and integrity of software systems by detecting malicious code and unauthorized modifications. However, machine learning-based classifiers are vulnerable to adversarial attacks that can evade detection. In this study, we present Kelpie, a novel framework for executing mimicry attacks, a stronger type of targeted evasion attacks, on binary function classifiers in a black-box, zero-query setting. Unlike previous approaches that rely on querying the target classifier to refine untargeted evasion attacks, Kelpie leverages code transformations that preserve the functionality of malicious payloads while causing them to be misclassified as we want. Through extensive experimentation, we demonstrate that Kelpie can successfully execute mimicry attacks against six state-of-the-art binary function classifiers representing different model architectures without requiring direct interaction with them. We further validate our approach with a practical demonstration, involving a keylogger and a wiper concealed within benign-looking functions embedded in an application. This work, to our best knowledge, is the first to demonstrate such a mimicry attack in a black-box, zero-query context, raising important questions about the reliability and security of existing machine learning-based binary function classifiers.

2606.09073 2026-06-12 cs.LG cs.AI cs.CL 版本更新

A Unifying Lens on Reward Uncertainty in RLHF

RLHF中奖励不确定性的统一视角

Ely Hahami, Yoel Zimmermann, Ray Zhou, Jack Benarroch Jedlicki

发表机构 * University of California, Berkeley（加州大学伯克利分校）； DeepMind

AI总结本文提出使用分布奖励模型统一RLHF中的悲观主义方法，通过闭式有效奖励公式连接现有启发式方法，并揭示其隐含假设。

AI中文摘要

基于人类反馈的强化学习（RLHF）受限于奖励破解，即策略利用代理奖励模型（RM）中的错误，产生高RM分数而缺乏真正的质量提升。一种自然的缓解方法是悲观主义：在RM不确定的区域惩罚奖励。然而，标准标量RM没有提供原则性的不确定性概念。我们认为正确的对象是分布奖励模型$p(r\mid x,y)$。在贝叶斯推断或KL分布鲁棒优化（KL-DRO）视角下，KL正则化的RLHF目标具有闭式有效奖励$\tilde r(x,y) = \pm\beta\log\mathbb{E}_p[e^{\pm r/\beta}]$。悲观分支统一了RM集成聚合的先前启发式方法：均值聚合、最坏情况优化（WCO）和不确定性加权优化（UWO）都作为该单一表达式的极限或截断出现。这也澄清了每个现有规则的隐含假设。

英文摘要

Reinforcement learning from human feedback (RLHF) is bottlenecked by reward hacking, where the policy exploits errors in a proxy reward model (RM) and produces high RM scores without genuine quality gains. A natural mitigation is pessimism: lowering rewards in regions where the RM is uncertain. However, standard scalar RMs provide no principled notion of uncertainty. We argue that the right object is a distributional reward model $p(r\mid x,y)$. Under either a Bayesian inference or a KL-distributionally robust optimization (KL-DRO) lens, the KL-regularized RLHF objective admits a closed-form effective reward $\tilde r(x,y) = \pm\beta\log\mathbb{E}_p[e^{\pm r/\beta}]$. The pessimistic branch unifies the prior heuristics for RM ensemble aggregation: mean aggregation, worst-case optimization (WCO), and uncertainty-weighted optimization (UWO) all emerge as limits or truncations of this single expression. This also clarifies the implicit assumptions of each existing rule.
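
The pessimistic branch of the effective reward can be checked numerically on a small ensemble. The sketch below treats a list of ensemble scores as samples from p(r | x, y) and evaluates −β log E[exp(−r/β)]. The mapping of regimes to heuristics is an assumption based on standard properties of the log-mean-exp, not a statement of the paper's derivation: large β approaches the ensemble mean, small β approaches the ensemble minimum, and a second-order expansion gives mean minus variance/(2β). All function names and scores are hypothetical.

```python
import math


def pessimistic_reward(samples, beta):
    """Compute -beta * log(mean(exp(-r / beta))) in a numerically stable way."""
    lowest = min(samples)
    # Factor out exp(-lowest / beta) so every remaining exponent is <= 0.
    mean_exp = sum(math.exp(-(r - lowest) / beta) for r in samples) / len(samples)
    return lowest - beta * math.log(mean_exp)


def second_order_approximation(samples, beta):
    """Mean minus variance / (2 * beta): the expansion truncated at second order."""
    mean = sum(samples) / len(samples)
    variance = sum((r - mean) ** 2 for r in samples) / len(samples)
    return mean - variance / (2.0 * beta)


ensemble_scores = [1.0, 2.0, 3.0, 4.0]  # hypothetical reward-model ensemble outputs

near_mean = pessimistic_reward(ensemble_scores, beta=1e4)   # large beta
near_min = pessimistic_reward(ensemble_scores, beta=1e-3)   # small beta
moderate = pessimistic_reward(ensemble_scores, beta=10.0)
moderate_approx = second_order_approximation(ensemble_scores, beta=10.0)
```

For any β > 0 the value lies between the ensemble minimum and the ensemble mean, so β acts as a single dial for how pessimistic the aggregate is.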

2601.21324 2026-06-12 stat.ML cs.LG 版本更新

Bulk-Calibrated Credal Ambiguity Sets: Fast, Tractable Decision Making under Out-of-Sample Contamination

批量校准的置信模糊集：样本外污染下的快速、可处理决策

Mengqi Chen, Thomas B. Berrett, Theodoros Damoulas, Michele Caprio

发表机构 * University of Bristol（布里斯托大学）； University of Cambridge（剑桥大学）； University of California, Berkeley（加州大学伯克利分校）； University of Oxford（牛津大学）

AI总结提出批量校准置信模糊集，通过分离批量内污染和尾部贡献，得到闭式有限风险目标，转化为线性或二阶锥规划，实现高效鲁棒优化。

Comments Accepted for publication (spotlight) at ICML 2026

AI中文摘要

分布鲁棒优化（DRO）在模糊集上最小化最坏情况期望损失，该模糊集可捕捉样本外环境中的分布偏移。虽然Huber（线性-空）污染是$\varepsilon$分数任意扰动的经典最小假设模型，但将其纳入模糊集可能导致最坏情况风险无穷大，且DRO目标变得无意义，除非施加强有界性或支撑假设。我们通过引入批量校准的置信模糊集来解决这些挑战：我们从数据中学习一个高质量批量集，同时考虑批量内的污染，并分别约束剩余尾部贡献。这导致一个闭式、有限的$\mathrm{mean}+\sup$鲁棒目标，以及针对常见损失和批量几何结构的可处理线性或二阶锥规划。通过该框架，我们强调并利用上期望（不精确概率概念）与最坏情况风险之间的等价性，展示IP置信集如何转化为具有可解释容忍水平的DRO目标。在重尾库存控制、地理偏移房价回归和人口偏移文本分类上的实验显示了竞争性的鲁棒性-准确性权衡和高效的优化时间，使用了贝叶斯、频率学派或经验参考分布。

英文摘要

Distributionally robust optimisation (DRO) minimises the worst-case expected loss over an ambiguity set that can capture distributional shifts in out-of-sample environments. While Huber (linear-vacuous) contamination is a classical minimal-assumption model for an $\varepsilon$-fraction of arbitrary perturbations, including it in an ambiguity set can make the worst-case risk infinite and the DRO objective vacuous unless one imposes strong boundedness or support assumptions. We address these challenges by introducing bulk-calibrated credal ambiguity sets: we learn a high-mass bulk set from data while considering contamination inside the bulk and bounding the remaining tail contribution separately. This leads to a closed-form, finite $\mathrm{mean}+\sup$ robust objective and tractable linear or second-order cone programs for common losses and bulk geometries. Through this framework, we highlight and exploit the equivalence between the imprecise probability (IP) notion of upper expectation and the worst-case risk, demonstrating how IP credal sets translate into DRO objectives with interpretable tolerance levels. Experiments on heavy-tailed inventory control, geographically shifted house-price regression, and demographically shifted text classification show competitive robustness-accuracy trade-offs and efficient optimisation times, using Bayesian, frequentist, or empirical reference distributions.
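
The "mean + sup" shape of the robust objective follows from the classical ε-contamination identity: if a fraction ε of mass may be moved anywhere inside a bounded set, the worst-case expected loss is (1−ε) times the reference mean plus ε times the largest loss on that set. The sketch below evaluates only this identity. It omits the separate tail bound the abstract mentions, and the function name, losses, and ε are hypothetical.

```python
def contaminated_worst_case(reference_losses, bulk_sup_loss, eps):
    """Worst-case expected loss under eps-contamination restricted to a bulk set.

    reference_losses: losses evaluated on samples from the reference distribution.
    bulk_sup_loss:    supremum of the loss over the bulk set (assumed finite).
    eps:              contamination fraction in [0, 1].
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    mean_loss = sum(reference_losses) / len(reference_losses)
    # The adversary places all contaminating mass where the loss is largest.
    return (1.0 - eps) * mean_loss + eps * bulk_sup_loss


# Hypothetical values.
robust_value = contaminated_worst_case([1.0, 2.0, 3.0], bulk_sup_loss=10.0, eps=0.1)
no_contamination = contaminated_worst_case([1.0, 2.0, 3.0], bulk_sup_loss=10.0, eps=0.0)
```

Because the supremum is taken over a bounded bulk set rather than the whole space, the objective stays finite, which is the failure the abstract attributes to unrestricted Huber contamination.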

2606.12673 2026-06-12 cs.LG cs.AI 新提交

A Zero-shot Generalized Graph Anomaly Detection Framework via Node Reconstruction

基于节点重构的零样本广义图异常检测框架

Phan Nguyen, Dat Cao, Hien Chu, Khue Hoang

发表机构 * School of Computing, KAIST（韩国科学技术院计算机学院）

AI总结提出AlignGAD框架，通过全局统一模块对齐异构特征、聚类模块捕获组级异常模式及节点差异评分模块聚合多视图异常证据，实现零样本跨域图异常检测。

AI中文摘要

跨域图异常检测旨在识别未见过的目标图中的异常节点，在异构图数据的实际应用中展现出巨大潜力。然而，现有方法通常依赖于数据集特定的特征语义和结构模式，限制了其跨域泛化能力。为解决这一挑战，我们提出AlignGAD，一个零样本广义图异常检测框架。我们的框架基于三个关键组件：全局统一模块，用于对齐异构节点特征并在谱域中归一化图信号；聚类模块，用于构建聚类感知的图视图以捕获组级异常模式；以及节点差异评分模块，用于测量重构差异并聚合来自不同图视图的异常证据。在多个真实数据集上的实验证明了AlignGAD在零样本图异常检测设置下的有效性。

英文摘要

Cross-domain graph anomaly detection (GAD) aims to identify abnormal nodes in unseen target graphs, showing strong potential in real-world applications with heterogeneous graph data. However, existing methods often depend on dataset-specific feature semantics and structural patterns, which limits their ability to generalize across different domains. To address this challenge, we propose AlignGAD, a zero-shot generalized graph anomaly detection framework. Our framework is built upon three key components: a Global Unification Module that aligns heterogeneous node features and normalizes graph signals in the spectral domain; a Clustering Module that constructs cluster-aware graph views to capture group-level abnormal patterns; and a Node Discrepancy Scoring Module that measures reconstruction discrepancy and aggregates anomaly evidence from different graph views. Experiments on multiple real-world datasets demonstrate the effectiveness of AlignGAD under the zero-shot GAD setting.

2606.13444 2026-06-12 cs.LG 新提交

Clustering Node Attributed Networks with Graph Neural Networks and Self Learning

使用图神经网络和自学习的节点属性网络聚类

Rodrigo de Sapienza Luna, Daniel Ratton Figueiredo

发表机构 * Systems Engineering and Computer Science (PESC), Federal University of Rio de Janeiro (UFRJ)（里约热内卢联邦大学系统工程与计算机科学系）

AI总结提出一种基于图神经网络和自学习的无监督图聚类框架，通过多轮自学习交替优化节点表示和聚类，利用上下文图提升性能，在合成和真实数据上表现优异。

AI中文摘要

图聚类——将图的节点集划分为反映潜在信息的互不相交的子集——是一个基本问题，因为它应用于多种不同的场景。虽然这个经典问题已经被不同社区处理了几十年，但由真实数据驱动的一个最新变体考虑了节点具有信息性属性的场景。这引发了同时利用网络信息（边）和节点信息（属性）设计新型聚类算法的新方法。本文提出了一种新颖的框架，该框架建立在先前将图神经网络（GNN）应用于图聚类的工作之上。所提出的框架在完全无监督的设置下以自学习轮次运行。在每一轮中，GNN生成用于聚类节点的节点表示。这种聚类影响用于生成下一轮节点表示的图。此外，每一轮中使用原始图构建的上下文图用于生成节点表示。实验结果表明，所提出的方法从合成数据中的网络边和节点属性中提取信息，当两者都不太具有信息性时，其性能优于仅关注网络或属性的算法。多轮学习也提高了性能，并且总是优于长时间的单轮训练（即经典的GNN图聚类）。在考虑真实数据集时，实验结果表明，当聚类大小平衡时，所提出的方法与最先进的方法具有竞争力。

英文摘要

Graph clustering - partitioning the node set of a graph into disjoint subsets that reflect some latent information - is a fundamental problem as it finds applications in a myriad of different scenarios. While this classic problem has been tackled for decades by different communities, a recent variation of the problem driven by real data considers the scenario where nodes have attributes that are also informative. This has triggered novel methods that simultaneously leverage network information (edges) and node information (attributes) in the design of novel clustering algorithms. This work proposes a novel framework that builds on prior works that have applied graph neural networks (GNN) to graph clustering. The proposed framework operates in rounds of self learning in a fully unsupervised setting. In each round, a GNN generates representations for nodes that are used to cluster the nodes. This clustering influences the graph used to generate the node representation in the next round. Moreover, a context graph built in each round using the original graph is used to generate the node representations. Empirical results show that the proposed methodology extracts information from both network edges and node attributes in synthetic data, outperforming algorithms focused solely on the network or attributes when neither is very informative. Multiple rounds of learning also improve the performance and always outperform a long single round of training (i.e., classic GNN graph clustering). When considering real datasets, empirical results indicate that the proposed methodology is competitive with state-of-the-art methods when cluster sizes are balanced.

2606.13671 2026-06-12 cs.LG 新提交

Understanding Truncated Positional Encodings for Graph Neural Networks

理解图神经网络的截断位置编码

James Flora, Mitchell Black, Weng-Keen Wong, Amir Nayyeri

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结研究截断位置编码（如前k个特征空间或邻接矩阵幂）对图神经网络表达能力的影响，理论证明截断后多种位置编码的表达能力存在本质差异，且截断谱位置编码不再强于1-WL测试，实验表明混合截断编码优于单一类型。

Comments 28 pages, 4 figures, ICML 2026

详情

AI中文摘要

位置编码（PEs）在理论和经验上增强了图神经网络（GNNs）的能力。两个最流行的PE家族——谱（例如，拉普拉斯特征空间、有效电阻）和基于游走的（邻接矩阵的多项式）——在表达能力上理论等价，其表达性介于1-WL和3-WL测试之间。然而，这种等价性假设GNN使用这些PE的“完整”版本，这需要$O(n^3)$的时间和空间复杂度。相反，从业者通常使用这些编码的截断变体，例如前$k$个特征空间或邻接矩阵的幂。然而，这些截断PE的理论性质尚不清楚。在这项工作中，我们启动了对这些截断PE的研究。理论上，我们表明，在截断下，几个PE家族在表达能力上存在根本差异。作为推论，我们证明截断谱PE不再强于1-WL测试。我们还研究了一个谱PE家族——$k$-调和距离——以突出即使密切相关的截断PE在表达能力上的差异。最后，我们通过实验表明，在真实世界数据集上，混合截断PE优于任何单一家族。

英文摘要

Positional encodings (PEs) enhance the power of graph neural networks (GNNs), both theoretically and empirically. Two of the most popular families of PEs - spectral (e.g., Laplacian eigenspaces, effective resistance) and walk-based (polynomials of the adjacency matrix) - are theoretically equivalent in expressive power, with expressivity between the 1-WL and 3-WL tests. However, this equivalence assumes the GNN uses the "complete" version of these PEs, which requires $O(n^3)$ time and space complexity. Instead, practitioners commonly use truncated variants of these encodings, such as the first $k$ eigenspaces or powers of the adjacency matrix. However, the theoretical properties of these truncated PEs are unknown. In this work, we initiate the study of these truncated PEs. Theoretically, we show that, under truncation, several families of PEs are fundamentally different in expressive power. As a corollary, we show that truncated spectral PEs are no longer stronger than the 1-WL test. We also study a family of spectral PEs, the $k$-harmonic distances, to highlight the differences in expressive power of even closely related truncated PEs. Finally, we experimentally show that a mix of truncated PEs is preferable to any single family on real-world datasets.
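The truncation discussed in this abstract can be illustrated with a small sketch. It assumes one common walk-based encoding, the per-node closed-walk counts taken from the diagonals of the first k adjacency powers; the function names are hypothetical and this is not necessarily the exact encoding analysed in the paper.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def truncated_walk_pe(adj, k):
    """Per-node encoding [A^1]_ii, ..., [A^k]_ii (closed-walk counts).

    Assumption: the diagonal of adjacency powers is used as the walk-based
    encoding; `k` is the truncation level.
    """
    n = len(adj)
    power = [row[:] for row in adj]  # current power, starts at A^1
    pe = [[] for _ in range(n)]
    for _ in range(k):
        for i in range(n):
            pe[i].append(power[i][i])
        power = matmul(power, adj)
    return pe


def cycle_graph(n):
    """Adjacency matrix of an undirected cycle on n nodes."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        adj[i][(i + 1) % n] = adj[(i + 1) % n][i] = 1
    return adj


def two_triangles():
    """Adjacency matrix of two disjoint triangles (nodes 0-2 and 3-5)."""
    adj = [[0] * 6 for _ in range(6)]
    for base in (0, 3):
        for i in range(3):
            for j in range(3):
                if i != j:
                    adj[base + i][base + j] = 1
    return adj
```

As an illustration (not a result from the paper): the 6-cycle and two disjoint triangles are both 2-regular, so their encodings coincide at k = 2 and only separate at k = 3, which shows how the truncation level can change what the encoding distinguishes.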

2510.16311 2026-06-12 cs.LG 版本更新

Toward General Digraph Contrastive Learning: A Dual Spatial Perspective

面向一般有向图对比学习：双空间视角

Zhengyu Wu, Daohan Su, Yang Zhang, Xunkai Li, Rong-Hua Li, Guoren Wang

发表机构 * National University of Singapore（新加坡国立大学）； University of Science and Technology of China（中国科学技术大学）

AI总结提出S2-DiGCL框架，从复数域和实数域双空间视角对有向图进行对比学习，通过磁拉普拉斯自适应调制和路径子图增强，在节点分类和链接预测任务上分别提升4.41%和4.34%。

详情

AI中文摘要

图对比学习（GCL）已成为一种从图中提取一致表示而无需标签信息的强大工具。然而，现有方法主要关注无向图，忽略了在实际网络（如社交网络和推荐系统）中基础且不可或缺的关键方向信息。本文提出了S2-DiGCL，一种新颖的框架，强调从复数域和实数域视角对有向图进行对比学习的空间洞察。从复数域视角，S2-DiGCL在磁拉普拉斯中引入个性化扰动，以自适应地调制边相位和方向语义。从实数域视角，它采用基于路径的子图增强策略，捕捉细粒度的局部不对称性和拓扑依赖性。通过联合利用这两个互补的空间视图，S2-DiGCL构建了高质量的正负样本，从而实现更通用和鲁棒的有向图对比学习。在7个真实有向图数据集上的大量实验证明了我们方法的优越性，在监督和无监督设置下，节点分类和链接预测分别实现了4.41%和4.34%的性能提升，达到了最先进水平。

英文摘要

Graph Contrastive Learning (GCL) has emerged as a powerful tool for extracting consistent representations from graphs, independent of labeled information. However, existing methods predominantly focus on undirected graphs, disregarding the pivotal directional information that is fundamental and indispensable in real-world networks (e.g., social networks and recommendations). In this paper, we introduce S2-DiGCL, a novel framework that emphasizes spatial insights from complex- and real-domain perspectives for directed graph (digraph) contrastive learning. From the complex-domain perspective, S2-DiGCL introduces personalized perturbations into the magnetic Laplacian to adaptively modulate edge phases and directional semantics. From the real-domain perspective, it employs a path-based subgraph augmentation strategy to capture fine-grained local asymmetries and topological dependencies. By jointly leveraging these two complementary spatial views, S2-DiGCL constructs high-quality positive and negative samples, leading to more general and robust digraph contrastive learning. Extensive experiments on 7 real-world digraph datasets demonstrate the superiority of our approach, achieving SOTA performance with a 4.41% improvement in node classification and 4.34% in link prediction under both supervised and unsupervised settings.
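For readers unfamiliar with the magnetic Laplacian mentioned above, the following is a minimal sketch of its standard unnormalised form, in which edge direction is stored in a complex phase controlled by a charge parameter q. The function name and the default q = 0.25 are assumptions for illustration; the personalized perturbations proposed in the paper are not reproduced here.

```python
import cmath
import math


def magnetic_laplacian(adj, q=0.25):
    """Unnormalised magnetic Laplacian L = D_s - A_s * exp(i * Theta).

    A_s = (A + A^T) / 2 is the symmetrised adjacency, D_s its degree matrix,
    and Theta = 2 * pi * q * (A - A^T) encodes edge direction as a phase.
    `q` is a hypothetical default, not a value taken from the paper.
    """
    n = len(adj)
    lap = [[0j] * n for _ in range(n)]
    for u in range(n):
        degree = 0.0
        for v in range(n):
            sym = 0.5 * (adj[u][v] + adj[v][u])
            phase = 2.0 * math.pi * q * (adj[u][v] - adj[v][u])
            degree += sym
            if u != v:
                lap[u][v] = -sym * cmath.exp(1j * phase)
        # Self-loops have zero phase, so they cancel against the degree term.
        lap[u][u] = complex(degree - adj[u][u], 0.0)
    return lap
```

The resulting matrix is Hermitian, and with q = 0 it reduces to the ordinary Laplacian of the symmetrised graph, which is why q can be read as the knob that switches directional information on.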

2606.12680 2026-06-12 cs.LG stat.ML 新提交

元学习变换器以改进上下文泛化

Lorenzo Braccaioli, Anna Vettoruzzo, Prabhant Singh, Joaquin Vanschoren, Mohamed-Rafik Bouguelia, Nicola Conci

发表机构 * University of Trento, Italy（特伦托大学，意大利）； Eindhoven University, Netherlands（埃因霍温大学，荷兰）； University of Doha for Science and Technology, Qatar（多哈科学与技术大学，卡塔尔）

AI总结提出利用多个小规模领域特定数据集训练上下文学习器，通过元学习提升跨领域泛化能力，并在持续学习和无监督场景下验证其鲁棒性。

详情

AI中文摘要

上下文学习使变换器模型能够仅基于输入提示泛化到新任务，无需任何权重更新。然而，现有的训练范式通常依赖于大型非结构化数据集，这些数据集存储成本高，难以评估质量和平衡性，并且由于包含敏感信息而引发隐私和伦理问题。受这些局限性和风险的启发，我们提出了一种替代训练策略，利用多个小规模、领域特定的数据集集合。我们经验性地证明，此类数据质量的提高和多样性的增加提升了上下文学习器在其训练领域之外的泛化能力，同时与在单个大规模数据集上训练的模型相比，性能相当。我们通过利用元学习在Meta-Album集合上训练上下文学习器来研究这一范式，在多种设置下进行实验。首先，我们在受控环境中展示性能，其中测试领域完全排除在训练知识之外。其次，我们探索这些模型在信息可访问时间有限的持续场景中对遗忘的鲁棒性。最后，我们探索更具挑战性的无监督场景。我们的发现表明，当在精心策划的数据集集合上训练时，变换器仍然能够泛化用于上下文预测，同时在模块化和可替换性方面提供了优势。

英文摘要

In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are costly to store, difficult to evaluate for quality and balance, and pose privacy and ethical concerns due to the inclusion of sensitive information. Motivated by these limitations and risks, we propose an alternative training strategy where we leverage a collection of multiple, small-scale, and domain-specific datasets. We empirically demonstrate that the increased quality and diversity of such data improve the generalization abilities of in-context learners beyond their training domain, while achieving comparable performance with models trained on a single large-scale dataset. We investigate this paradigm by leveraging meta-learning to train an in-context learner on the Meta-Album collection under several settings. Firstly, we show the performance in a controlled environment, where the test domain is completely excluded from the training knowledge. Secondly, we explore the robustness of these models to forgetting in a continual scenario where the information is accessible for a limited time. Finally, we explore the more challenging unsupervised scenario. Our findings demonstrate that transformers still generalize for in-context prediction when trained on a curated dataset collection while offering advantages in modularity and replaceability.

2602.12753 2026-06-12 cs.LG 版本更新

Hierarchical Successor Representation for Robust Transfer

层次化后继表示用于鲁棒迁移

Changmin Yu, Máté Lengyel

发表机构 * University of Cambridge（剑桥大学）； DeepMind（深度思维）

AI总结提出层次化后继表示（HSR），通过时间抽象构建鲁棒的状态特征，结合非负矩阵分解实现稀疏低秩表示，支持多隔间环境下的高效任务迁移与探索。

详情

AI中文摘要

后继表示（SR）为将预测动态与奖励解耦提供了强大框架，能够实现跨奖励配置的快速泛化。然而，经典SR受其固有的策略依赖性限制：由于持续学习、环境非平稳性和任务需求变化，策略会发生变化，使得已建立的预测表示过时。此外，在拓扑复杂的环境中，SR遭受谱扩散，导致特征密集重叠且扩展性差。本文提出层次化后继表示（HSR）以克服这些限制。通过将时间抽象纳入预测表示的构建，HSR学习到对任务引起的策略变化鲁棒的稳定状态特征。将非负矩阵分解（NMF）应用于HSR，得到稀疏低秩的状态表示，有助于在多隔间环境中实现向新任务的高样本效率迁移。进一步分析表明，HSR-NMF发现了可解释的拓扑结构，提供了策略无关的层次化地图，有效桥接了无模型最优性和基于模型的灵活性。除了为任务迁移提供有用基础外，我们还展示了HSR的时间扩展预测结构也可用于驱动高效探索，有效扩展到大规模程序生成的环境。

英文摘要

The successor representation (SR) provides a powerful framework for decoupling predictive dynamics from rewards, enabling rapid generalisation across reward configurations. However, the classical SR is limited by its inherent policy dependence: policies change due to ongoing learning, environmental non-stationarities, and changes in task demands, making established predictive representations obsolete. Furthermore, in topologically complex environments, SRs suffer from spectral diffusion, leading to dense and overlapping features that scale poorly. Here we propose the Hierarchical Successor Representation (HSR) for overcoming these limitations. By incorporating temporal abstractions into the construction of predictive representations, HSR learns stable state features which are robust to task-induced policy changes. Applying non-negative matrix factorisation (NMF) to the HSR yields a sparse, low-rank state representation that facilitates highly sample-efficient transfer to novel tasks in multi-compartmental environments. Further analysis reveals that HSR-NMF discovers interpretable topological structures, providing a policy-agnostic hierarchical map that effectively bridges model-free optimality and model-based flexibility. Beyond providing a useful basis for task-transfer, we show that HSR's temporally extended predictive structure can also be leveraged to drive efficient exploration, effectively scaling to large, procedurally generated environments.
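The decoupling of dynamics from rewards described above can be sketched for the classical tabular SR, M = (I - gamma * P)^(-1), where P is the state-transition matrix under a fixed policy. This illustrates only the classical SR that the abstract takes as its starting point, not the hierarchical variant; function names and the iteration count are assumptions.

```python
def successor_representation(P, gamma, iters=500):
    """Classical SR via the fixed-point iteration M <- I + gamma * P @ M.

    P is the transition matrix induced by one fixed policy, which is exactly
    the policy dependence the abstract refers to. Converges for gamma < 1.
    """
    n = len(P)
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        M = [[(1.0 if i == j else 0.0)
              + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)]
             for i in range(n)]
    return M


def state_values(M, reward):
    """Values for any reward vector are a single matrix-vector product."""
    return [sum(m_ij * r_j for m_ij, r_j in zip(row, reward)) for row in M]
```

Once M is computed, changing the reward vector only requires re-running `state_values`, whereas changing the policy changes P and invalidates M.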

2603.15158 2026-06-12 cs.LG 版本更新

Point-Identification of a Robust Predictor Under Latent Shift with Imperfect Proxies

在不完美代理下潜在偏移中鲁棒预测器的点识别

Zahra Rahiminasab, Reza Soumi, Arto Klami, Samuel Kaski

发表机构 * Department of Computer Science, Aalto University（阿尔托大学计算机科学系）； Department of Computer Science, University of Helsinki（赫尔辛基大学计算机科学系）； ELLIS Institute Finland（芬兰埃利斯研究所）； Department of Computer Science, Manchester University（曼彻斯特大学计算机科学系）

AI总结针对潜在混淆变量导致的域适应问题，提出基于潜在等价类的点识别方法，通过跨域秩条件替代强完备性假设，并设计主动学习框架PQAL实现鲁棒预测。

详情

AI中文摘要

当跨域的分布偏移源于同时影响协变量和结果的潜在混淆变量时，域适应问题变得更加具有挑战性。现有的基于代理的方法通过强完备性假设来唯一确定（点识别）鲁棒预测器。完备性要求代理具有关于潜在混淆变量变化的足够信息。对于不完美代理，从混淆变量到代理分布空间的映射是非单射的，多个潜在混淆变量值可能生成相同的代理分布。这破坏了完备性假设，观测数据与多个潜在预测器（集识别）一致。为了解决这个问题，我们引入了潜在等价类（LECs）。LECs定义为诱导相同条件代理分布的潜在混淆变量组。我们证明，只要多个域在如何混合代理诱导的LECs以形成鲁棒预测器方面有足够差异，鲁棒预测器的点识别仍然可以实现。这种域多样性条件被形式化为混合权重的跨域秩条件，该条件比完备性假设弱得多。我们提出了近端准贝叶斯主动学习（PQAL）框架，该框架主动查询满足该秩条件的小型、有针对性的多样化域集合。PQAL可以恢复点识别的预测器，展示了对不同程度偏移的鲁棒性，并在合成数据、半合成dSprites、IHDP、ACS Folktables数据集上优于先前方法。

英文摘要

Addressing the domain adaptation problem becomes more challenging when distribution shifts across domains stem from latent confounders that affect both covariates and outcomes. Existing proxy-based approaches that address latent shift rely on a strong completeness assumption to uniquely determine (point-identify) a robust predictor. Completeness requires that proxies have sufficient information about variations in latent confounders. For imperfect proxies, the mapping from confounders to the space of proxy distributions is non-injective, and multiple latent confounder values can generate the same proxy distribution. This breaks the completeness assumption, and the observed data are consistent with multiple potential predictors (set-identified). To address this, we introduce latent equivalent classes (LECs). LECs are defined as groups of latent confounders that induce the same conditional proxy distribution. We show that point-identification for the robust predictor remains achievable as long as multiple domains differ sufficiently in how they mix proxy-induced LECs to form the robust predictor. This domain diversity condition is formalized as a cross-domain rank condition on the mixture weights, which is a substantially weaker assumption than completeness. We introduce the Proximal Quasi-Bayesian Active learning (PQAL) framework, which actively queries a small, targeted set of diverse domains that satisfy this rank condition. PQAL can recover the point-identified predictor, demonstrates robustness to varying degrees of shift, and outperforms previous methods on synthetic data and on the semi-synthetic dSprites, IHDP, and ACS Folktables datasets.

2606.12483 2026-06-12 cs.LG 新提交

Scalable anomaly detection via a univariate Christoffel function

通过单变量Christoffel函数实现可扩展的异常检测

Florian Grivet, Didier Henrion, Jean-Bernard Lasserre, Louise Travé-Massuyès

AI总结针对Christoffel函数方法因矩阵大小随维度指数增长而难以应用于高维数据的问题，提出基于查询点与支撑点间平方距离的单变量Christoffel函数（UCF），在ADBench基准上平均精度优于14种基线方法。

详情

AI中文摘要

异常检测在欺诈检测、网络入侵和系统故障诊断等领域识别异常模式中发挥关键作用。近年来，基于Christoffel函数的方法（根植于多项式优化）因其坚实的数学基础和计算节俭性，成为深度学习的有前景替代方案。然而，其实用性受限于需要求逆一个大小随数据维度指数增长的矩阵，即使对于中等维度数据集也难以处理。本文解决了Christoffel函数异常检测的维度限制，同时保留了其关键理论性质，即开关支撑二分法行为和准确的支撑形状捕获。我们引入了UCF，一种基于查询点与支撑点间平方距离的单变量Christoffel函数。在ADBench基准上的大量实验表明，UCF在平均精度上持续优于14个最先进的基线方法。通过解决Christoffel函数的可扩展性瓶颈，本文扩展了异常检测方法的工具箱，提供了一种稳健、有理论依据且普遍适用的方法。

英文摘要

Anomaly detection plays a critical role in identifying unusual patterns across domains such as fraud detection, network intrusion, and system fault diagnosis. Recently, Christoffel function-based methods, rooted in polynomial optimization, have emerged as promising alternatives to deep learning due to their strong mathematical foundations and computational frugality. However, their practical applicability is hindered by the need to invert a matrix whose size grows exponentially with the data dimension, rendering the method intractable even for moderate-dimensional datasets. This paper addresses the dimensionality limitations of Christoffel function-based anomaly detection while preserving its key theoretical properties, i.e., the on-off support dichotomy behavior and the accurate support shape capture. We introduce UCF, a univariate Christoffel function which is based on the squared distance between the query point and the support points. Extensive experiments on the ADBench benchmark demonstrate that UCF consistently outperforms 14 state-of-the-art baselines in terms of Average Precision. By resolving the scalability bottleneck of the Christoffel Function, this work expands the toolkit of anomaly detection methods with a robust, theoretically grounded, and universally applicable approach.

2606.12552 2026-06-12 cs.LG 新提交

Crossing the Validation Crisis: Cross-Validation Reduces Benchmarking Variance Surprisingly Well

跨越验证危机：交叉验证出人意料地有效降低基准测试方差

Célestin Eve, Gaël Varoquaux, Thomas Moreau

发表机构 * MIND Team, Université Paris-Saclay, Inria, CEA, Palaiseau, France（MIND团队，巴黎-萨克雷大学，法国国家信息与自动化研究所，法国原子能委员会，帕莱索，法国）； SODA Team, Inria, Palaiseau, France（SODA团队，法国国家信息与自动化研究所，帕莱索，法国）； Probabl

AI总结本文提出交叉验证通过样本增益概念量化虚拟数据增强，显著提升算法性能评估的置信度与稳定性，并引入动态早停机制减少计算开销。

Comments 34 pages, 11 figures

详情

AI中文摘要

现代机器学习通过实证工作推进，对新方法进行基准测试以评估相对性能。然而，评估固有的统计变异性——由于许多算法的随机性而加剧——常常因有限的测试样本而使性能估计不可靠，导致验证危机，其中真正的进步难以辨别。在这项工作中，我们展示了交叉验证在评估和比较学习算法性能时显著提高了置信度。我们引入了样本增益的概念，它量化了通过使用多个交叉验证分割来减少基准测试方差所实现的虚拟数据增强。在合成和真实世界数据集（组织病理学扫描和NLP微调）上的实验表明，多个分割可以显著提高性能估计的可靠性和稳定性，且收益递减往往比预期来得更晚。我们还引入了一种动态早停交叉验证的程序，通过从最初几个折叠估计后续折叠是否会带来大的样本增益。我们的发现强调了在可用样本上推行交叉验证以实现稳健可靠基准测试的价值。

英文摘要

Modern machine learning progresses through empirical work, benchmarking new methods to evaluate relative performance. However, the statistical variability inherent to evaluation - exacerbated by the stochastic nature of many algorithms - often makes performance estimation unreliable due to the limited test samples available, leading to a validation crisis in which genuine advances are difficult to discern. In this work, we show that cross-validation markedly improves confidence when evaluating and comparing the performance of learning algorithms. We introduce the concept of sample gain, which quantifies the virtual data augmentation achieved by using multiple cross-validation splits to reduce benchmarking variance. Experiments on both synthetic and real-world datasets (histopathologic scans and NLP fine-tuning) demonstrate that multiple splits can substantially improve the reliability and stability of performance estimates, with diminishing returns often setting in later than expected. We also introduce a procedure to dynamically early-stop cross-validation by estimating from the first few folds whether subsequent folds will bring large sample gains. Our findings highlight the value of pushing cross-validation on available samples to achieve robust and reliable benchmarking.

2606.12595 2026-06-12 cs.LG cs.AI cs.CV 新提交

Emerging Flexible Designs for Geospatial Multimodal Foundation Models

地理空间多模态基础模型的新兴灵活设计

Philipe Dias, Waqwoya Abebe, Abhishek Potnis, Aristeidis Tsaris, Dan Lu, Xiao Wang, Dalton Lunga

发表机构 * Oak Ridge National Laboratory（橡树岭国家实验室）

AI总结本文系统比较了不同架构的地理空间基础模型，在统一设置下评估其灵活性与性能，为多模态推理提供设计指导。

详情

AI中文摘要

基础模型通过跨多样未标记地理空间模态的可扩展预训练，正在迅速改变地球观测。然而，其架构多样性——从编码器-only到编码器-解码器以及掩码自编码范式——使得以一致方式评估性能权衡变得具有挑战性。在这项工作中，我们对领先的、专为地理空间多模态推理设计的基础模型架构进行了同类比较，特别关注不同光谱波段配置下的灵活性。我们使用相同的自监督学习目标和训练数据集标准化预训练，并在GEOBench基准测试上，在一致参数化下评估所有模型的分类和分割任务。我们的结果为模型灵活性、模态对齐和下游任务性能之间的设计权衡提供了新见解。通过强调受控条件下的架构优势和局限性，本研究为构建能够进行鲁棒多模态推理的下一代地理空间基础模型提供了实用指导。

英文摘要

Foundation models are rapidly transforming Earth observation by enabling scalable pretraining across diverse unlabeled geospatial modalities. However, their architectural diversity, ranging from encoder-only to encoder-decoder and masked autoencoding paradigms, makes it challenging to assess performance trade-offs in a consistent manner. In this work, we present an apples-to-apples comparison of leading foundation model (FM) architectures designed for geospatial multimodal reasoning, with a particular focus on flexibility across varied spectral band configurations. We standardize pretraining using identical self-supervised learning objectives and training datasets, and evaluate all models under consistent parameterization on the GEOBench benchmark across classification and segmentation tasks. Our results offer new insights into the design trade-offs between model flexibility, modality alignment, and downstream task performance. By highlighting architectural strengths and limitations under controlled conditions, this study provides practical guidance for building next-generation geospatial foundation models capable of robust multimodal reasoning.

2606.12611 2026-06-12 cs.LG cs.IT math.IT 新提交

Evaluation of AutoML Frameworks for IDS under Imbalanced Data Conditions of the NSL-KDD Dataset

NSL-KDD数据集不平衡数据条件下IDS的AutoML框架评估

Wiliane Carolina Silva, Evandro César Vilas Boas, Felipe A. P. de Figueiredo

发表机构 * Cybersecurity and Artificial Intelligence Laboratory (CS&I Lab), National Institute of Telecommunications (Inatel)（网络安全与人工智能实验室（CS&I Lab），国家电信研究所（Inatel））； Wireless and Artificial Intelligence Laboratory (WAI Lab), National Institute of Telecommunications (Inatel)（无线与人工智能实验室（WAI Lab），国家电信研究所（Inatel））

AI总结研究NSL-KDD数据集上严重类别不平衡对多分类入侵检测中AutoML框架性能的影响，发现集成学习和不平衡感知优化可提升少数类检测能力，PyCaret表现最佳（macro-F1 66%）。

详情

AI中文摘要

本研究探讨了严重类别不平衡对使用NSL-KDD数据集进行多分类网络入侵检测的自动化机器学习（AutoML）框架性能的影响。与以往通过二分类或移除少数类来简化问题的研究不同，我们保留了原始的五类分布，包括高度欠表示的R2L和U2R攻击，从而能够对不平衡敏感的学习行为进行现实评估。在统一且可重复的实验协议下，分析了九个开源AutoML框架，考虑了架构设计、集成策略、验证程序、超参数优化和不平衡处理机制的差异。结果表明，采用集成学习和不平衡感知优化的框架在少数类判别上表现更好。PyCaret获得了最佳整体性能，macro-F1达到66%，其次是AutoGluon（55%），而缺乏原生平衡支持的框架在少数类检测能力上显著下降。进一步分析表明，仅以准确率为导向的优化不足以应对高度不平衡的入侵检测场景，因为高加权指标可能与对罕见攻击类别的泛化能力差共存。作为贡献，本研究为严重多类不平衡下的AutoML入侵检测建立了标准化基准，指出了当前架构的局限性，以及将不平衡感知优化、重采样和分层评估策略原生集成到自动化学习流水线中的必要性。源代码已公开。

英文摘要

This work investigates the impact of severe class imbalance on the performance of automated machine learning (AutoML) frameworks for multiclass network intrusion detection using the NSL-KDD dataset. Unlike previous studies that simplify the problem through binary classification or minority-class removal, we preserve the original five-class distribution, including highly underrepresented attacks such as R2L and U2R, enabling a realistic evaluation of imbalance-sensitive learning behavior. Nine open-source AutoML frameworks were analyzed under a unified and reproducible experimental protocol, considering differences in architectural design, ensemble strategies, validation procedures, hyperparameter optimization, and imbalance-handling mechanisms. The results demonstrate that frameworks incorporating ensemble learning and imbalance-aware optimization achieve better minority-class discrimination. PyCaret obtained the best overall performance, reaching 66% macro-F1, followed by AutoGluon with 55%, whereas frameworks lacking native balancing support exhibited significant degradation in minority-class detection capability. The analysis further shows that accuracy-oriented optimization alone is insufficient for highly imbalanced IDS scenarios, since high weighted metrics may coexist with poor generalization on rare attack categories. As a contribution, this work establishes a standardized benchmark for AutoML-based intrusion detection under severe multiclass imbalance, highlighting current architectural limitations and the need for native integration of imbalance-aware optimization, resampling, and stratified evaluation strategies into automated learning pipelines. The source code is publicly available.
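The point that accuracy-oriented metrics can hide poor minority-class detection can be made concrete with a small sketch that computes accuracy and macro-F1 by hand. The class labels and the 95/5 split below are invented for illustration and are not taken from NSL-KDD or from the paper's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 95 majority samples, 5 rare-attack samples,
# and a degenerate classifier that always predicts the majority class.
y_true = ["normal"] * 95 + ["u2r"] * 5
y_pred = ["normal"] * 100
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

In this toy case accuracy is 0.95 while macro-F1 is below 0.5, because the rare class contributes an F1 of zero.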

2606.12639 2026-06-12 cs.LG q-bio.QM 新提交

The Metric Picks the Winner: Evaluation Choice Flips Model Rankings for Drug-Response Prediction in Unseen Chemistry

度量选择胜者：评估选择翻转未见化学空间中药物反应预测的模型排名

Dhruv Agarwal, Riya Bisht

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结本研究通过VCPI竞赛数据，发现药物反应预测模型排名随评估指标反转：简单基线在代理指标下胜出，但真实指标下深度模型显著优于线性指纹基线，首次在真实药物化学数据上验证了度量校准效应。

详情

AI中文摘要

预测细胞转录组对其从未见过的药物的反应是计算细胞生物学中的一个核心难题：最近的基准测试表明，一旦测试化合物按化学结构留出，复杂模型往往无法击败简单基线。我们研究了一个细胞系和检测方法，即通过DRUG-seq分析的THP-1细胞，由VCPI预测竞赛的活性化合物加权MSE（wMSE）评分。我们提出了一种分阶段方法：该领域一直无法击败的简单基线（未处理对照和平均训练化合物响应）；非参数检索（对留出化合物的最近训练化合物进行Tanimoto加权平均）；以及一个融合阶段，将冻结的化学嵌入与检索支持特征相结合，以预测相对于均值的残差，并包含不确定性头和基因程序。在发布的VCPI THP-1 drug-seq数据（14,026个训练化合物）上，采用Bemis-Murcko骨架划分，模型排名根据度量标准反转。在逆方差每基因代理度量下，基于Morgan指纹的正则化线性回归似乎胜过了深度模型、检索和ChemBERTa——这是教科书式的“简单基线获胜”结果。但在竞赛的真实活性集度量（每（基因，化合物）的Mejia权重，经官方评分器验证；均值基线0.535 vs 组织者的0.507参考）下，情况反转：深度模型获胜，我们的融合解码器显著优于线性指纹基线（-0.012 wMSE，配对bootstrap p < 10^-4），而代理度量的胜者成为最差的化学感知预测器。选择度量即选择胜者——据我们所知，这是首次在真实留出药物化学数据上证明度量校准效应，该效应此前主要在遗传扰动中建立。我们发布了一个可复现的流水线，连接到官方评分器，可在真实的1064 x 12,995网格上生成有效提交。

英文摘要

Predicting how a cell's transcriptome responds to a drug it has never seen is a core, hard problem in computational cell biology: recent benchmarks show complex models often fail to beat trivial baselines once test compounds are held out by chemistry. We study one cell line and assay, THP-1 cells profiled by DRUG-seq, scored by the active-compound weighted MSE (wMSE) of the VCPI prediction contest. We propose a staged approach: dumb baselines (untreated control and mean training-compound response) that the field keeps failing to beat; non-parametric retrieval (a Tanimoto-weighted average of a held-out compound's nearest training compounds); and a fusion stage combining a frozen chemistry embedding with retrieval-support features to predict the residual over the mean, with an uncertainty head and gene programs. On the released VCPI THP-1 DRUG-seq data (14,026 training compounds), under a Bemis-Murcko scaffold split, the model ranking inverts depending on the metric. Under an inverse-variance per-gene proxy, a regularized linear regression on Morgan fingerprints appears to win over the deep models, retrieval, and ChemBERTa -- the textbook "simple baselines win" result. But under the contest's true active-set metric (per-(gene, compound) Mejia weights, validated against the official scorer; mean baseline 0.535 vs the organizers' 0.507 reference), that reverses: the deep models win, our fusion decoder significantly beats the linear fingerprint baseline (-0.012 wMSE, paired bootstrap p < 10^-4), and the proxy's winner becomes the worst chemistry-aware predictor. Picking the metric picks the winner -- to our knowledge the first demonstration on real held-out drug chemistry of the metric-calibration effect established largely on genetic perturbation. We release a reproducible pipeline wired to the official scorer that emits a valid submission over the real 1064 x 12,995 grid.
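The non-parametric retrieval stage described above (a Tanimoto-weighted average over the nearest training compounds) can be sketched as follows. Fingerprints are represented as sets of on-bit indices; the function names, the neighbour count `k`, and the fallback to the training mean are assumptions for illustration rather than details from the paper.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def retrieve_response(query_fp, train_fps, train_responses, k=5):
    """Predict a response vector as a Tanimoto-weighted average of the
    k most similar training compounds (k is a hypothetical choice)."""
    sims = [(tanimoto(query_fp, fp), idx) for idx, fp in enumerate(train_fps)]
    top = sorted(sims, reverse=True)[:k]
    total = sum(s for s, _ in top)
    n_genes = len(train_responses[0])
    if total == 0.0:
        # No structural overlap at all: fall back to the mean training response.
        return [sum(r[g] for r in train_responses) / len(train_responses)
                for g in range(n_genes)]
    return [sum(s * train_responses[idx][g] for s, idx in top) / total
            for g in range(n_genes)]
```

The fallback mirrors the mean-response baseline named in the abstract, so a query with no similar training compound degrades to that baseline rather than failing.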

2606.12643 2026-06-12 cs.LG 新提交

TEDD: Robust Detection of Unstable Temporal Features

TEDD：不稳定时间特征的鲁棒检测

Ricardo Ribeiro Pereira, Bruno Casal Laraña, Nádia Soares, Miguel Araújo

发表机构 * Feedzai

AI总结提出TEDD方法，利用回归模型检测导致时间分布变化的特征，无需参数调优，可扩展，能检测数值和类别特征的单变量及多变量漂移。

Comments 8 pages, 9 figures

详情

DOI: 10.1109/ICDMW51313.2020.00063

AI中文摘要

在处理真实世界的时间序列数据时，经常会遇到特征分布随时间变化的情况。在这种不稳定的数据上直接使用机器学习模型可能导致性能迅速下降，尤其是当新分布与训练时所见差异较大时。为了解决这个问题，自动识别随时间变化的特征至关重要。检测到这些特征后，数据科学家和其他从业者能够通过应用数据变换等方式缓解问题，部署更鲁棒的模型，使其在更长时间内保持高性能。本文描述了特征不应遭受的时间变化类型，并提出了TEDD技术，用于a) 识别数据集何时可能导致不稳定的机器学习模型，以及b) 自动检测哪些特征导致了这种不鲁棒性。为此，我们利用回归模型来突出哪些特征有助于良好预测实例的时间戳。我们将我们的方法与其他方法在真实和合成数据上进行比较，测试它们在所有简单变化模式上的检测能力。我们表明，我们的方法：检测所有类型的基本变化，包括数值和类别特征；能够检测多变量漂移；返回一个可比较的值来衡量每个特征的变化量；无需参数调优；并且在数据集的特征数量和实例数量上都具有可扩展性。

英文摘要

When working with real-world temporal data, it is common to encounter features whose distribution is changing over time. The naive employment of Machine Learning models on this unstable data might lead to rapidly degrading performance, especially if the new distribution is very different from what was previously seen during training. In order to cope with this problem, it is critical to automatically identify features that are changing over time. With these features detected, data scientists and other practitioners will be able to mitigate the issue (for instance, by applying data transformations), deploying more robust models that retain high performance for longer periods of time. In this paper, we describe which temporal changes a feature should not suffer from, and propose TEDD, a technique to a) identify when a dataset might lead to an unstable Machine Learning model and b) automatically detect which features cause such lack of robustness. To achieve this, we leverage a regression model to highlight which features contribute to a good prediction of an instance's timestamp. We compare our approach to other methods on real and synthetic data, testing their detection capability on all simple change patterns. We show that our method: detects all types of basic changes, both for numerical and categorical features; can detect multivariate drifts; returns a comparable value measuring the amount of change of each feature; requires no parameter tuning; and scales in both the number of features and the number of instances of the dataset.
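The core idea stated above, that a feature is suspicious when it helps predict an instance's timestamp, can be sketched in a deliberately simplified form. The code below scores each numerical feature by the R^2 of a univariate linear regression of the timestamp on that feature. This is an assumption made for illustration: the abstract does not specify the regression model, and this simplification cannot detect the non-linear, categorical, or multivariate drifts that TEDD is reported to handle.

```python
def timestamp_r2(feature, timestamps):
    """R^2 of a univariate linear fit predicting the timestamp from one feature
    (equal to the squared Pearson correlation)."""
    n = len(feature)
    mean_x = sum(feature) / n
    mean_t = sum(timestamps) / n
    sxx = sum((x - mean_x) ** 2 for x in feature)
    stt = sum((t - mean_t) ** 2 for t in timestamps)
    if sxx == 0 or stt == 0:
        return 0.0  # a constant column carries no information about time
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(feature, timestamps))
    return (sxt * sxt) / (sxx * stt)


def rank_unstable_features(columns, timestamps):
    """Return (feature name, score) pairs, most time-predictive first."""
    scores = {name: timestamp_r2(values, timestamps)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because every feature receives a score on the same 0-to-1 scale, the output is directly comparable across features, which is one of the properties the abstract highlights.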

2606.12718 2026-06-12 cs.LG eess.SP 新提交

Out-of-Distribution (OOD) Detectors for Open-Set RF Fingerprinting

面向开放集射频指纹识别的分布外检测器

Sudeepta Mondal, Ganesh Sundaramoorthi

发表机构 * University of Michigan（密歇根大学）

AI总结针对开放集射频指纹识别中未知发射机与时间漂移引起的分布偏移问题，引入基于信息论的OOD检测统一框架，并采用无需OOD调优数据的方法，在POWDER数据集上验证其性能接近有真实OOD数据的基线。

详情

AI中文摘要

射频指纹识别系统必须在开放世界环境中运行，其中来自未知发射机的信号和时间漂移会在测试时引入分布偏移。分布外检测为该问题提供了自然框架，但其在射频指纹识别中的应用仍然有限。其采用的一个关键障碍是大多数OOD检测器需要辅助OOD数据进行参数调优，而在射频环境中收集代表性OOD数据不切实际，这一假设难以满足。在这项工作中，我们将机器学习文献中一组有前景的OOD检测方法引入开放集RFF领域。我们基于信息论（通信系统的自然框架）在一个统一的数学框架中呈现这些方法。我们的框架允许对方法进行系统分析并开发新方法。我们进一步展示了最近关于无需给定OOD调优数据即可调优OOD检测器的工作在开放集RFF中的适用性。我们在POWDER射频指纹数据集上进行评估，表明无需任何给定OOD数据调优的检测器性能与能够访问真实OOD调优数据的基线相当，并且大大优于无法访问真实OOD调优数据的基线方法，展示了RFF问题的实际可行性。

英文摘要

Radio-frequency (RF) fingerprinting systems must operate in open-world environments where signals from unknown transmitters and temporal drift introduce distribution shift at test time. Out-of-distribution (OOD) detection provides a natural framework for this problem, yet its application to RF fingerprinting (RFF) remains limited. A key barrier to its adoption is that most OOD detectors require auxiliary OOD data for parameter tuning, an assumption that is difficult to satisfy in RF environments where representative OOD data is impractical to collect. In this work, we introduce a promising set of OOD detection methods from the machine learning literature to the open-set RFF domain. We present these methods within a unified mathematical framework based on information theory, which is a natural framework for communication systems. Our framework allows for the systematic analysis of methods and the development of new methods. We further demonstrate the applicability of recent work on tuning OOD detectors without given OOD tuning data for open-set RFF. We evaluate on the POWDER RF fingerprinting dataset, showing that detectors tuned without any given OOD data achieve performance comparable to baselines with access to true OOD tuning data and greatly outperform baseline approaches without access to true OOD tuning data, showcasing practical viability for the RFF problem.

2606.12764 2026-06-12 cs.LG cs.CL cs.CR 新提交

Detecting Functional Memorization in Code Language Models

检测代码语言模型中的功能记忆

Matthieu Meeus, Anil Ramakrishna, Matthew Grange, Zheng Xu, Luca Melis

发表机构 * Meta ； Imperial College London（伦敦帝国学院）

AI总结研究代码语言模型的功能记忆现象，通过反事实设置对比暴露目标代码的模型与未暴露的参考模型，使用文本和功能相似性度量，发现功能记忆超出文本重叠的检测范围。

2606.12913 2026-06-12 cs.LG cs.CV 新提交

Selecting Samples on Graphs: A Unified Dataset Pruning Framework for Lossless Training Acceleration

图上的样本选择：用于无损训练加速的统一数据集剪枝框架

Dongyue Wu, Zilin Guo, Xiaoyu Li, Jiajia Liu, Jingdong Chen, Nong Sang, Changxin Gao

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出基于图的统一数据集剪枝框架，将数据集建模为加权图，通过最大权重团问题选择样本，并设计贪心算法，在多种剪枝比例下优于现有方法，实现ImageNet-1k上40%以上训练加速且不损失精度。

Comments ICML 2026

详情

AI中文摘要

现代训练数据集的快速增长显著增加了计算成本，促使数据集剪枝（DP）方法仅保留信息量丰富的样本子集以减少训练成本。现有的剪枝标准通常依赖于评估样本独立性的内在信号或通过成对关系促进多样性的外在信号。虽然在其特定领域有效，但每种方法仅捕捉样本效用的一方面，且在不同剪枝比例或数据分布下缺乏鲁棒性。在这项工作中，我们提出了一个统一的基于图的DP框架。通过将数据集建模为加权图，其中节点权重编码内在价值，边权重编码外在价值，DP可以转化为最大权重团问题（MWCP）。尽管MWCP是NP难的，但其结构允许基于样本边际增益的原则性贪心解法。在几个温和条件下，我们进一步证明该统一目标具有形式化的近似保证，适用于广泛的度量族，并提供了实用设计指南。大量实验表明，我们的方法优于现有DP方法，同时显著降低训练成本，在ImageNet-1k上使用ResNet-50时，训练时间减少超过40%且不损失精度。

英文摘要

The rapid growth of modern training datasets has significantly increased computational cost, motivating dataset pruning (DP) methods which retain only a subset of informative samples to reduce training cost. Existing pruning criteria typically rely on either intrinsic signals that assess samples independently or extrinsic signals that promote diversity via pairwise relations. While effective in their own specific regimes, each captures only one aspect of sample utility and lacks robustness across different pruning ratios or data distributions. In this work, we present a unified graph-based DP framework. By modeling the dataset as a weighted graph, where node weights encode intrinsic value and edge weights encode extrinsic value, DP can be cast as a Maximum Weight Clique Problem (MWCP). Although MWCP is NP-hard, its structure admits a principled greedy solution based on sample-wise marginal gains. Under a few mild conditions, we further prove that this unified objective enjoys a formal approximation guarantee, which applies to a broad family of importance metrics and provides practical design guidelines. Extensive experiments show that our method outperforms existing DP methods while substantially reducing training cost, reducing training time by over 40% without sacrificing accuracy on ImageNet-1k with ResNet-50.
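The greedy selection by sample-wise marginal gain mentioned above can be sketched as follows. The objective assumed here is the sum of selected node weights plus the sum of edge weights between selected pairs; the abstract does not give the exact objective or weight definitions, so the function name, the objective, and the toy weights in the test are hypothetical.

```python
def greedy_prune(node_w, edge_w, budget):
    """Select `budget` samples by repeatedly adding the largest marginal gain.

    Marginal gain of sample i given the selected set S is assumed to be
    node_w[i] + sum(edge_w[i][j] for j in S): its intrinsic value plus the
    extrinsic (pairwise) value it adds relative to what is already kept.
    """
    selected = []
    remaining = set(range(len(node_w)))
    while remaining and len(selected) < budget:
        # Iterating in sorted order makes ties resolve to the lowest index.
        best = max(sorted(remaining),
                   key=lambda i: node_w[i] + sum(edge_w[i][j] for j in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With edge weights read as pairwise dissimilarity, a near-duplicate of an already selected sample earns little extra gain, so a lower-scoring but more diverse sample can be chosen instead.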

2606.12997 2026-06-12 cs.LG stat.ML 新提交

Reliability of Probabilistic Emulation of Physical Systems

物理系统概率仿真的可靠性

Sam F. Greenbury, Radka Jersakova, Paolo Conti, Marjan Famili, Christopher Iliffe Sprague, Edwin Brown, Jason D. McEwen

发表机构 * The Alan Turing Institute（艾伦·图灵研究所）； Autodesk Research（欧特克研究院）； PhysicsX ； Orbital ； University of Sheffield（谢菲尔德大学）； University College London（伦敦大学学院）

AI总结比较生成模型与CRPS训练集成在物理系统概率仿真中的可靠性，发现CRPS集成在覆盖率和推理速度上更优。

详情

AI中文摘要

目前，生成物理系统概率预测的两种主要方法已经出现：生成模型（如扩散或流匹配）以及注入随机性的确定性模型集成（使用连续排序概率评分（CRPS）损失训练）。虽然这两种方法都表现出强大的预测准确性，但其不确定性的可靠性尚未得到系统评估。我们通过开发一个框架来填补这一空白，该框架在匹配模型大小和计算预算的情况下，评估这两种方法在多种二维时空物理系统中的表现。我们通过检查预测区间的经验覆盖率来评估概率仿真的可靠性，同时考虑准确性和计算效率指标。CRPS训练的集成在单步预测和自回归展开中通常能实现更可靠的不确定性，显示出比在潜在空间中训练生成模型的标准替代方案更好的覆盖率。此外，CRPS方法提供了显著更快的推理速度。当生成模型在环境空间而非压缩潜在空间中训练时（这在高维问题中通常不可行），它们表现出与CRPS训练集成相当的覆盖率，但推理延迟显著更大。相比之下，当CRPS训练的集成在潜在空间中训练时，其覆盖率相对于环境空间没有明显下降。生成模型和CRPS训练的集成都表现出良好的预测准确性。为促进未来的研究和应用，我们发布了AutoCast，一个实现生成模型和CRPS训练集成的模块化框架，以及AutoSim，一个用于快速原型的灵活数据集生成包。

英文摘要

Two dominant approaches have emerged for generating probabilistic forecasts of physical systems: generative models, such as diffusion or flow matching; and ensembles of deterministic models with stochasticity injected, trained using the continuous ranked probability score (CRPS) loss. While both approaches have demonstrated strong predictive accuracy, the reliability of their uncertainties has not been systematically assessed. We address this gap by developing a framework to evaluate both approaches across diverse 2D spatiotemporal physical systems, under matched model size and computational budget. We assess the reliability of probabilistic emulation by inspecting the empirical coverage of predictive intervals, while also considering accuracy and computational efficiency metrics. CRPS-trained ensembles typically achieve more reliable uncertainties on both single-step prediction and autoregressive rollouts, demonstrating better coverage than the standard alternative of training generative models in a latent space. Moreover, the CRPS approach offers significantly faster inference. When generative models are trained in ambient space rather than a compressed latent space, which is often infeasible for high-dimensional problems, they exhibit comparable coverage to CRPS-trained ensembles, though with substantially larger inference latency. In contrast, when CRPS-trained ensembles are trained in latent space, they do not show a marked degradation in coverage with respect to ambient space. Both generative models and CRPS-trained ensembles demonstrate good predictive accuracy. To facilitate future research and application, we release AutoCast, a modular framework implementing both generative models and CRPS-trained ensembles, alongside AutoSim, a flexible dataset generation package for rapid prototyping.
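The two quantities central to this abstract, the CRPS of an ensemble and the empirical coverage of predictive intervals, can be sketched for scalar targets. The CRPS below is the standard ensemble estimator, mean |x_i - y| minus half the mean pairwise |x_i - x_j|; whether the paper uses this or a bias-corrected ("fair") variant is not stated, so treat the choice as an assumption. Function names are hypothetical.

```python
def ensemble_crps(members, observation):
    """Standard ensemble CRPS estimator for one scalar observation.

    CRPS = mean_i |x_i - y| - (1 / (2 m^2)) * sum_{i,j} |x_i - x_j|
    The first term rewards accuracy, the second rewards ensemble spread.
    """
    m = len(members)
    accuracy_term = sum(abs(x - observation) for x in members) / m
    spread_term = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy_term - spread_term


def empirical_coverage(intervals, observations):
    """Fraction of observations that fall inside their predictive interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, observations))
    return hits / len(observations)
```

A reliable forecaster's nominal 90% intervals should yield an empirical coverage close to 0.9; values well below that indicate overconfident uncertainties.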

2606.13104 2026-06-12 cs.LG 新提交

Authority, Truth, and Citation Bias: A Large-Scale Multi-Domain Benchmark for Studying Epistemic Susceptibility in Large Language Models

权威、真实性与引文偏差：研究大语言模型认知易感性的大规模多领域基准

Aryan Khurana, Aravind Ramana RN, Dhruv Kumar

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出AuthorityBench基准，通过2x2因子设计隔离引文权威信号对LLM认知行为的影响，发现引文存在（无论真假）均提高幻觉率，真声明搭配假引文时幻觉率上升3-22个百分点。

Comments 10 pages, 5 figures. Accepted to AI4GOOD and EIML at ICML 2026

AI中文摘要

大型语言模型越来越多地部署在引文增强的环境中，但引文存在对模型行为的影响（独立于事实内容）仍知之甚少。我们引入了AuthorityBench，一个包含220,564个提示的多领域基准，用于隔离基于引文的权威信号如何影响LLM的认知行为。该基准采用完全平衡的2x2因子设计，将声明真实性（claim veracity）与引文真实性（citation veracity）交叉，是首个采用这种设计的基准，涵盖四个领域（常识、科学、法律和医学），并在40个提示模板、四个发表场所声望等级和一个按国家编码的作者姓名数据集上进行受控变化。我们在12个结构化研究问题上评估了七个模型，发现引文的存在（无论是真实的还是捏造的）相对于无引文基线一致地提高了幻觉率。当捏造的引文伴随真实声明时，这种效应最强，使幻觉率提高3到22个百分点，在常识领域达到35%到77%；法律声明相对稳健，发表场所声望和作者人口统计特征的影响可忽略不计。所有数据集和评估代码均可在以下网址获取：https://github.com/floating-reeds/AuthorityBench

英文摘要

Large language models are increasingly deployed in citation-augmented settings, yet the effect of citation presence on model behavior independent of factual content remains poorly understood. We introduce AuthorityBench, a 220,564-prompt multi-domain benchmark that isolates how citation-based authority signals influence epistemic behavior in LLMs. The benchmark uses a fully balanced 2x2 factorial design crossing claim veracity with citation veracity, the first to do so, across four domains (general knowledge, science, law, and medicine), with controlled variation over 40 prompt templates, four venue prestige tiers, and a country-coded author name dataset. Evaluating seven models on 12 structured research questions, we find that citation presence, whether real or fabricated, consistently increases hallucination rates relative to a no-citation baseline. The effect is strongest when fabricated citations accompany true claims, raising hallucination rates by 3 to 22 percentage points and reaching 35 to 77% in the general knowledge domain, while legal claims are comparatively robust and venue prestige and author demographics show negligible impact. All datasets and evaluation code are available at: https://github.com/floating-reeds/AuthorityBench

2606.13105 2026-06-12 cs.LG 新提交

Disparate Impact in Synthetic Data Generation

合成数据生成中的差异性影响

Paul Andrey, Michaël Perrot, Batiste Le Bars, Marc Tommasi

发表机构 * Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 - CRIStAL（里尔大学、法国国家信息与自动化研究所、法国国家科学研究中心、中央里尔高等电力工程学院、计算机科学、信号与自动化研究实验室）

AI总结本文重新审视合成数据生成中的差异性影响公平性概念，指出当合成分布与真实分布一致时即可实现非差异性影响，并分析SDG失败的原因（表达能力、抽样误差、差分隐私估计误差），提出分组学习策略以提升整体效用及其均等性。

AI中文摘要

我们重新审视合成数据生成（SDG）中差异性影响的公平性概念，该概念评估生成记录的效用是否在不同敏感群体间相同。我们的方法不同于现有的公平SDG工作，后者旨在纠正观测分布中的不当偏差，从而将SDG重新定义为学习一个并非真实数据分布的分布。相比之下，值得注意的是，当合成分布与真实分布相同时，即可实现非差异性影响。我们揭示了SDG可能无法达到该解的原因，并讨论了近似误差和估计误差为何会出现，以及为何可能在不同群体间存在差异。我们重点考察了SDG方法相对于分布复杂性的表达能力、群体比例导致的抽样误差以及差分隐私机制引起的估计误差。我们在人工和真实数据上展示了差异性影响的案例，重点关注依赖概率图模型的SDG方法。我们还引入了一种学习分组SDG模型的策略，并说明了它在许多情况下如何同时提升整体效用及其均等性。

英文摘要

We revisit the fairness notion of disparate impact for synthetic data generation (SDG), which assesses whether the utility of generated records is the same across sensitive groups. Our approach departs from existing work on fair SDG, which addresses the problem of correcting for undue biases in the observed distribution, hence redefining SDG as learning a distribution that is not that of the real data. By contrast, non-disparate impact is notably achieved when the synthetic and real distributions are the same. We expose reasons why SDG may fail to reach that solution and discuss why approximation and estimation errors occur and can be disparate across groups. We notably look into the expressive power of SDG methods relative to distribution complexity, sampling errors due to group proportions, and estimation errors induced by differential privacy mechanisms. We illustrate cases of disparate impact on both artificial and real-world data, focusing on SDG methods that rely on probabilistic graphical models. We also introduce a strategy of learning group-wise SDG models and illustrate how it can improve both the overall utility and its parity in many settings.

2606.13194 2026-06-12 cs.LG 新提交

WHAR Arena: Benchmarking the State of the Art in Efficient Wearable Human Activity Recognition

WHAR Arena: 基准测试高效可穿戴人体活动识别的最新进展

Maximilian Burzer, Tobias King, Till Riedel, Michael Beigl, Tobias Röddiger

发表机构 * Karlsruhe Institute of Technology（卡尔斯鲁厄理工学院）； IPAI Foundation gGmbH（IPAI基金会有限责任公司）

AI总结为解决可穿戴人体活动识别中的可比性危机，构建了包含30个数据集的大规模基准，评估17种架构，发现预测性能趋于饱和，而紧凑模型和随机森林在部署效率上构成帕累托前沿。

Comments 20 pages, 9 Figures, 3 Tables

AI中文摘要

深度学习已成为可穿戴人体活动识别（WHAR）的主导范式，但可比性危机使进展难以看清。结果通常使用不一致的数据集、自定义数据处理和不同的评估协议报告，使得“最先进”的声明站不住脚。我们通过一个大规模开源基准来解决这个问题，该基准在标准化处理、统一模型接口和共享的跨受试者评估协议下整合了30个不同的数据集。我们在4760次训练运行中评估了17种代表性架构，在一台Android参考设备上同时测量预测性能以及设备端延迟、峰值内存和模型大小。结果表明，WHAR的最先进水平分散在多种架构之间，而非由单一架构主导。虽然CNN-HAR实现了最高的平均宏F1，但表现最佳的模型紧密聚集，表明当代架构已收敛到预测性能上限附近。当考虑部署效率时，紧凑神经模型（如TinierHAR）和经典随机森林定义了具有实际意义的帕累托前沿，而较大的循环和混合模型则产生高硬件成本而无相应的性能增益。因此，尽管预测性能已趋于平稳，但在优化部署效率和改进对领域偏移的适应方面，未来仍有巨大潜力。我们发布完整框架以支持透明的重用和扩展。

英文摘要

Deep learning has become the dominant paradigm in Wearable Human Activity Recognition (WHAR), yet progress is obscured by a comparability crisis. Results are often reported using inconsistent datasets, custom data processing, and varying evaluation protocols, making state-of-the-art claims fragile. We address this with a large-scale, open-source benchmark that integrates 30 diverse datasets under standardized processing, unified model interfaces, and a shared cross-subject evaluation protocol. Evaluating 17 representative architectures across 4760 training runs, we jointly measure predictive performance alongside on-device latency, peak memory, and model size on an Android reference device. Our results reveal that the WHAR state of the art is distributed rather than dominated by a single architecture. While CNN-HAR achieves the highest mean macro-F1, top-performing models cluster tightly, indicating contemporary architectures have converged near a predictive performance ceiling. When accounting for deployment efficiency, compact neural models, such as TinierHAR, and classical Random Forests define the practically relevant Pareto frontier, whereas larger recurrent and hybrid models incur high hardware costs without corresponding performance gains. Consequently, while predictive performance has plateaued, substantial potential for future progress remains in optimizing deployment efficiency and improving adaptation to domain shifts. We release our full framework to support transparent reuse and extension.
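
The Pareto-frontier reading of the accuracy and efficiency results can be made concrete with a short sketch. The model names and numbers below are invented for illustration and are not results from WHAR Arena; the two-objective setup (maximize macro-F1, minimize latency) is also a simplifying assumption, since the benchmark additionally tracks peak memory and model size.

```python
def pareto_front(models):
    """Return names of models not dominated by any other model.

    `models` is a list of (name, macro_f1, latency_ms) tuples.
    A model is dominated if another one is at least as good on both
    objectives (higher F1, lower latency) and strictly better on one.
    """
    front = []
    for name, f1, lat in models:
        dominated = any(
            (f2 >= f1 and l2 <= lat) and (f2 > f1 or l2 < lat)
            for _, f2, l2 in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidates: (name, macro-F1, on-device latency in ms).
candidates = [
    ("model_a", 0.80, 50.0),  # most accurate, slowest
    ("model_b", 0.78, 5.0),   # nearly as accurate, much faster
    ("model_c", 0.75, 20.0),  # worse than model_b on both axes
    ("model_d", 0.70, 2.0),   # fastest
]
print(pareto_front(candidates))  # ['model_a', 'model_b', 'model_d']
```

When top models cluster tightly on macro-F1, as the abstract reports, small accuracy differences rarely justify a large latency cost, which is why compact models tend to populate the frontier.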

2606.13338 2026-06-12 cs.LG 新提交

Navigating the Safety-Fidelity Trade-off: Massive-Variate Time Series Forecasting for Power Systems via Probabilistic Scenarios

驾驭安全-保真度权衡：通过概率场景进行电力系统的大规模多变量时间序列预测

Kaijie Xu, Anqi Wang, Xilin Dai

发表机构 * ZJU-UIUC Institute, Zhejiang University（浙江大学伊利诺伊大学厄巴纳香槟校区联合学院）

AI总结针对现有基准无法评估大规模多变量概率预测的安全性与保真度权衡问题，提出包含多达36,964个通道的电力系统基准PowerPhase和场景式分位数预测器PowerForge，在多个网格上取得最佳平均排名。

AI中文摘要

概率预测模型越来越多地部署在具有不同通道物理特性和运行约束的多变量系统上，但现有基准无法大规模评估这两个属性。公开的规范多变量基准最多包含2,000个通道，而电力系统基准要么缺乏时间结构，要么缺乏概率评估。我们提出PowerPhase，这是一个基于六个输电网络构建的概率预测基准，联合预测通道数从2,000到36,964，比流行的规范多变量基准高出一个数量级以上。每个目标轨迹是交流潮流求解的输出，PowerPhase配备了约束感知指标，包括Safety_mBrier、NECV和CVaR-alpha，作为CRPS和Distortion的补充。在八个基线和三个随机种子上，分布准确性和约束满足对模型进行不同排序，我们将这种权衡称为安全-保真度。我们进一步提出PowerForge，一种基于场景的分位数预测器，具有类型特定的解码头和变量组之间的因果桥，在每个网格上实现了最佳平均排名。

英文摘要

Probabilistic forecasting models are increasingly deployed on multivariate systems with distinct channel physics and operational constraints, but existing benchmarks evaluate neither property at scale. Public canonical multivariate benchmarks cap out at 2,000 channels, while power-system benchmarks either lack temporal structure or probabilistic evaluation. We introduce PowerPhase, a probabilistic forecasting benchmark built on six transmission grids ranging from 2,000 to 36,964 jointly forecasted channels, more than an order of magnitude beyond popular canonical multivariate benchmarks. Each target trajectory is the output of an AC power-flow solve, and PowerPhase ships with constraint-aware metrics, including Safety_mBrier, NECV, and CVaR-alpha, that complement CRPS and Distortion. Across eight baselines and three seeds, distributional accuracy and constraint satisfaction rank models differently, a trade-off we term safety-fidelity. We further propose PowerForge, a scenario-based quantile forecaster with type-specific decoding heads and a causal bridge between variable groups, which achieves the best average rank on every grid.

2606.13477 2026-06-12 cs.LG cs.AI cs.CL 新提交

SupraBench: A Benchmark for Supramolecular Chemistry

SupraBench: 超分子化学基准

Tianyi Ma, Yijun Ma, Zehong Wang, Weixiang Sun, Ziming Li, Connor R. Schmidt, Chuxu Zhang, Matthew J. Webber, Yanfang Ye

发表机构 * University of Notre Dame（圣母大学）； University of Connecticut（康涅狄格大学）

AI总结为评估大语言模型在超分子化学推理中的能力，与领域专家合作发布了首个超分子基准SupraBench，包含四个基本任务和一个辅助视觉任务，并提供了16M令牌的语料库SupraPMC。

AI中文摘要

超分子化学，包括非共价主客体组装的研究，推动了各种应用的发展。然而，设计主客体系统仍然耗时，每个候选对需要数天的干实验室验证。尽管LLMs已成为一种快速的替代方案，在分子结合任务上表现出色，但目前尚无基准系统性地评估LLMs在超分子化学基本任务（如结合亲和力预测）中的主客体推理能力。为此，我们与领域专家合作发布了首个超分子基准，称为SupraBench，用于评估LLMs在化学推理中的表现。具体来说，我们设计了四个基本任务，即结合亲和力预测、最佳结合物选择、溶剂识别和主客体描述，以及一个辅助的基于视觉的分子识别任务。我们还发布了SupraPMC，一个从Europe PMC中提取的经过整理的1600万令牌的超分子化学文章语料库，以支持对超分子领域的适应。我们对一系列开源和专有LLMs进行了基准测试，发现LLMs在所有任务上都有很大的提升空间。在SupraPMC上的领域自适应预训练可以干净地迁移到分布内回归，但会与严格的字母格式输出进行权衡。此外，不同任务家族的难度分布差异很大，揭示了不同的失败模式，表明当前超分子化学推理中存在特定的差距。我们的源代码和基准数据集可在以下网址获取：https://github.com/Tianyi-Billy-Ma/SupraBench

英文摘要

Supramolecular chemistry, which includes the study of non-covalent host-guest assemblies, has advanced various applications. However, designing host-guest systems remains time-consuming, requiring days of dry-lab verification per candidate pair. Although LLMs have emerged as a fast alternative with strong performance on molecular binding tasks, no benchmark currently systematically evaluates LLMs for host-guest reasoning across fundamental supramolecular chemistry tasks, e.g., binding affinity prediction. To this end, we collaborate with domain experts to release the first Supramolecular Benchmark, called SupraBench, to evaluate LLMs in chemistry reasoning. Specifically, we design four fundamental tasks, i.e., binding affinity prediction, top-binder selection, solvent identification, and host-guest description, plus an auxiliary vision-based task for molecular identification. We also release SupraPMC, a curated 16M-token corpus of Supramolecular chemistry articles distilled from Europe PMC, to support the adaptation to the supramolecular domain. We benchmark a broad range of open and proprietary LLMs and find that LLMs leave substantial headroom across all tasks. Domain adaptation pretraining over SupraPMC transfers cleanly to in-distribution regression but trades off against strict letter-format output. Moreover, the difficulty profile differs sharply across task families, revealing distinct failure modes that indicate specific gaps in current supramolecular chemistry reasoning. Our source codes and benchmark datasets are available at https://github.com/Tianyi-Billy-Ma/SupraBench.

2606.13486 2026-06-12 cs.LG cs.AI 新提交

CRAFTIIF: Cross-Resolution Analytic Four-Type Interpretable Isolation Forest for Multivariate Time Series Anomaly Detection

CRAFTIIF：用于多元时间序列异常检测的跨分辨率分析四类型可解释孤立森林

William Smits

发表机构 * Avathon

AI总结提出CRAFTIIF无监督框架，通过四种小波特征和五个孤立森林同时检测点、分布、时间和集体四类异常，在mTSBench基准上达到平均F1=0.228，VUS-PR比先前最佳提升40.7%。

Comments 14 pages, 4 figures, 2 appendices. Submitted to IEEE Transactions on Knowledge and Data Engineering (TKDE). Code: https://github.com/smitswil/craftiif

AI中文摘要

多元时间序列中的异常检测面临四种结构不同的异常类型：点异常（孤立尖峰）、分布异常（水平偏移）、时间异常（节奏变化）和集体异常（传感器间相关性崩溃），每种都需要不同的特征表示。大多数无监督方法只针对其中一两种类型，且可解释性有限。我们提出CRAFTIIF（跨分辨率分析四类型可解释孤立森林），这是一个完全无监督的框架，针对所有四种类型，无需针对数据集调整。CRAFTIIF生成K=500个随机分析小波特征，跨越四个小波族（Morlet、DOG、Haar、Coiflet），每个小波族针对特定异常类型，并输入五个结构化的孤立森林（每种类型一个，外加一个用于复合异常的元IF）。自适应Otsu/MAD阈值在0.1%到69.2%的异常率范围内自动校准检测。由于每个IF仅针对特定类型的特征进行训练，分支触发直接提供异常类型归因，无需事后解释。在mTSBench基准（Zhou等人，TMLR 2026）的所有19个数据集上评估，CRAFTIIF在全部19个数据集上达到平均F1=0.228，在13个可检测数据集上F1=0.322，并在全部25种被评估方法中VUS-PR排名第一（0.463对比之前最佳0.329，提升40.7%）。一个诊断框架（oracle F1、可检测性限制和分支分离比）识别出19个数据集中有6个从根本上无法被任何无监督方法检测。在11种消融条件下，自适应阈值（+38% F1）、四分支结构（+20%）和元IF（+23%）均被证明是必不可少的。代码：https://github.com/smitswil/craftiif

英文摘要

Anomaly detection in multivariate time series is challenged by four structurally distinct anomaly types -- point (isolated spikes), distributional (level shifts), temporal (rhythm changes), and collective (inter-sensor correlation breakdowns) -- each requiring different feature representations. Most unsupervised methods target only one or two types and provide limited interpretability. We present CRAFTIIF (Cross-Resolution Analytic Four-Type Interpretable Isolation Forest), a fully unsupervised framework targeting all four types without dataset-specific tuning. CRAFTIIF generates K=500 random analytic wavelet feature draws across four families (Morlet, DOG, Haar, Coiflet), each targeting a specific anomaly type, feeding five structured Isolation Forests -- one per type plus a meta-IF for compound anomalies. An adaptive Otsu/MAD threshold calibrates detection automatically across anomaly rates from 0.1% to 69.2%. Because each IF is trained exclusively on type-specific features, branch firing provides direct anomaly-type attribution by construction, without post-hoc explanation. Evaluated on all 19 datasets of the mTSBench benchmark (Zhou et al., TMLR 2026), CRAFTIIF achieves mean F1=0.228 (all 19 datasets) and F1=0.322 (13 detectable datasets), ranking first among all 25 evaluated methods on VUS-PR (0.463 vs. previous best 0.329, +40.7%). A diagnostic framework -- oracle F1, detectability limits, and branch separation ratios -- identifies 6 of 19 datasets as fundamentally undetectable by any unsupervised method. Ablation over 11 conditions confirms adaptive thresholding (+38% F1), four-branch structure (+20%), and meta-IF (+23%) are each essential. Code: https://github.com/smitswil/craftiif
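
The abstract names an adaptive Otsu/MAD threshold without giving its details. As a rough illustration of the MAD half only, the sketch below applies the conventional median-plus-scaled-MAD rule to a list of anomaly scores. It is not CRAFTIIF's procedure: the function name, the multiplier `k=3.0`, and the use of the normal-consistency constant 1.4826 are assumptions.

```python
from statistics import median


def mad_threshold(scores, k=3.0):
    """Robust cutoff: median + k * 1.4826 * MAD.

    1.4826 rescales the median absolute deviation so that it estimates
    the standard deviation under a normal distribution (assumed here).
    """
    med = median(scores)
    mad = median([abs(s - med) for s in scores])
    return med + k * 1.4826 * mad


# Toy anomaly scores with one obvious outlier.
scores = [1, 2, 3, 4, 5, 100]
threshold = mad_threshold(scores)
flagged = [s for s in scores if s > threshold]
print(flagged)  # [100]
```

Because the median and MAD ignore the magnitude of extreme values, the cutoff stays near the bulk of the scores even when the outlier is very large, which is the usual motivation for a MAD rule over a mean-and-standard-deviation rule.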

2606.12426 2026-06-12 cs.CY cs.CL cs.LG 交叉投稿

Two Wrongs, No Right: Auditing Social-Desirability Bias in LLM Annotators for Computational Social Science

两个错误，没有正确：审计计算社会科学中LLM标注者的社会期望偏差

Varun Kotte

发表机构 * Varun Kotte

AI总结研究审计了三个开源指令微调模型在TweetEval任务中的社会期望偏差，发现模型存在宽大、过度纠正和中性偏差，且提示干预无法纠正，聚合指标可能掩盖实质结论错误。

AI中文摘要

LLM标注者越来越多地用于计算社会科学（CSS），但尚不清楚其由对齐塑造的错误是否会保留研究者本应报告的实证结论。我们在四个提示条件下（72个单元格）审计了三个开源7B指令微调模型（Zephyr、Mistral-Instruct、Qwen2.5-Instruct）在六个TweetEval任务中的表现，发现社会期望失败并非单一方向。Zephyr表现出宽大偏差，系统性地少应用有害标签（冒犯性语言：假良性率0.729，虚警率0.031）。Mistral和Qwen表现出过度纠正，过度应用相同标签（Mistral仇恨言论FAR = 0.604）。所有三个模型在堕胎立场上表现出中性偏差，将反对立场的占比低估24至40个百分点，并夸大中性标签。我们测试的四种提示干预（中性、安全框架、去个性化、思维链）均未能在各模型上纠正这些失败；安全框架可能加剧立场扭曲。引人注目的是，Zephyr的仇恨言论流行率估计与黄金率完全一致，而其类别条件误差在两个方向上都很大，这种偶然的抵消会误导聚合层面的验证。我们将这些模式转化为一个三部分分类法，附带诊断性FBR/FAR特征和轻量级黄金样本验证协议。对可信CSS而言，核心结论是：在聚合指标上看起来校准良好的模型仍然可能翻转研究者本应报告的实质性实证结论。

英文摘要

LLM annotators are increasingly used in computational social science (CSS), but it is unclear whether their alignment-shaped errors preserve the empirical conclusions a researcher would report. We audit three open-source 7B instruction-tuned models (Zephyr, Mistral-Instruct, Qwen2.5-Instruct) across six TweetEval tasks under four prompt conditions (72 cells) and find that social-desirability failures do not run in a single direction. Zephyr exhibits leniency bias, systematically under-applying harmful labels (offensive language: false benign rate 0.729, false alarm rate 0.031). Mistral and Qwen exhibit overcorrection, over-applying the same labels (Mistral hate-speech FAR = 0.604). All three models exhibit neutrality bias on abortion stance, underestimating opposition prevalence by 24 to 40 percentage points and inflating the neutral label. None of the four prompting interventions we test (neutral, safety framing, depersonalized, chain-of-thought) corrects these failures across models; safety framing can worsen stance distortion. Strikingly, Zephyr's hate-speech prevalence estimate matches the gold rate exactly while its class-conditional errors are large in both directions, an accidental cancellation that misleads aggregate validation. We translate these patterns into a three-part taxonomy with diagnostic FBR/FAR signatures and a lightweight gold-sample validation protocol. The headline for trustworthy CSS: a model that looks calibrated on aggregate metrics can still flip the substantive empirical conclusion a researcher would report.
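
The "accidental cancellation" described above is easy to reproduce on toy data. The sketch below assumes the natural reading of the two rates (false benign rate: share of truly harmful items labeled benign; false alarm rate: share of truly benign items labeled harmful); the paper's exact definitions may differ, and the counts are invented for illustration.

```python
def audit(gold, pred, harmful=1):
    """Compare aggregate prevalence with class-conditional error rates."""
    n_harm = sum(1 for g in gold if g == harmful)
    n_benign = len(gold) - n_harm
    missed = sum(1 for g, p in zip(gold, pred) if g == harmful and p != harmful)
    false_alarms = sum(1 for g, p in zip(gold, pred) if g != harmful and p == harmful)
    return {
        "gold_prevalence": n_harm / len(gold),
        "pred_prevalence": sum(1 for p in pred if p == harmful) / len(pred),
        "fbr": missed / n_harm,          # harmful items labeled benign
        "far": false_alarms / n_benign,  # benign items labeled harmful
    }


# Toy data: 100 harmful and 900 benign items.
gold = [1] * 100 + [0] * 900
# The annotator misses 60 harmful items and raises 60 false alarms.
pred = [0] * 60 + [1] * 40 + [1] * 60 + [0] * 840

report = audit(gold, pred)
print(report["gold_prevalence"], report["pred_prevalence"], report["fbr"])  # 0.1 0.1 0.6
```

The estimated prevalence matches the gold rate exactly even though 60% of harmful items are mislabeled, so a validation that checks only the aggregate rate would pass this annotator.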

2606.12451 2026-06-12 cs.AI cs.IR cs.LG 交叉投稿

ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs

ToolSense: 审计LLM中参数化工具知识的诊断框架

Ashutosh Hathidara, Sai Shruthi Sistla, Sebastian Schreiber, Sahil Bansal

发表机构 * SAP Labs（SAP实验室）

AI总结提出ToolSense诊断框架，自动生成三类基准测试，揭示参数化工具检索中知识-检索分离现象，发现模型在模糊查询下性能显著下降。

AI中文摘要

作为大型工具目录上的代理部署的大型语言模型面临关键的工具检索瓶颈。由于基于嵌入的检索方法依赖于可能无法充分捕获专用工具语义的紧凑编码器，参数化工具检索通过将每个工具编码为附加到LLM词汇表的虚拟令牌来解决这一问题，并经过两个阶段（先记忆，再检索SFT）的微调，将LLM用作检索器，在标准ToolBench检索基准上取得了强劲性能。然而，这些基准使用冗长、完全指定的查询，并且其评估采用将输出限制为有效令牌路径的约束解码，两者都无法揭示模型是否真正理解其工具。我们引入了ToolSense，一个开源的LLM驱动诊断框架，它将任何工具目录作为输入，并自动生成三个基准：包含三个模糊级别查询的现实检索基准（RRB）、MCQ探测基准和QA探测基准。将ToolSense应用于ToolBench（约47k个工具）并评估五个参数化模型训练配置，揭示了知识-检索分离：在RRB查询上，与完全指定的ToolBench基准相比，几个配置下降了约50-64个百分点，低于嵌入模型基线。此外，尽管检索性能强劲，一些模型在事实探测上的得分接近随机水平，这进一步印证了知识-检索分离。我们在 https://github.com/SAP/toolsense 开源了ToolSense框架和ToolBench诊断基准。

英文摘要

Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approaches rely on compact encoders that may under-capture specialized tool semantics, parametric tool retrieval addresses this by encoding each tool as a virtual token appended to the LLM vocabulary, fine-tuned in two stages (memorization then retrieval SFT) to use the LLM as a retriever, achieving strong performance on standard ToolBench retrieval benchmarks. Yet these benchmarks use verbose, fully-specified queries, and their evaluation applies constrained decoding that restricts outputs to valid token paths; neither reveals whether the model actually understands its tools. We introduce ToolSense, an open-source LLM-powered diagnostic framework that takes any tool catalog as input and automatically generates three benchmarks: a Realistic Retrieval Benchmark (RRB) with queries at three ambiguity tiers, an MCQ probing benchmark, and a QA probing benchmark. Applying ToolSense to ToolBench (~47k tools) and evaluating five parametric model training configurations reveals a knowledge-retrieval dissociation: on RRB queries, several configurations collapse by ~50-64 percentage points compared to fully-specified ToolBench benchmarks, falling below the embedding-model baseline. Additionally, despite strong retrieval performance, some models score near-random on factual probes, suggesting a knowledge-retrieval dissociation. We open-source the ToolSense framework and the ToolBench diagnostic benchmarks at https://github.com/SAP/toolsense.

2606.12608 2026-06-12 cs.CL cs.LG 交叉投稿

OpenMedQ：面向医学视觉语言模型的广泛开放预训练

Ibrahim Gulluk, Max Van Puyvelde, Olivier Gevaert

发表机构 * Stanford University（斯坦福大学）； Stanford University School of Medicine（斯坦福大学医学院）； Ghent University（根特大学）

AI总结提出OpenMedQ，在14个数据集（约335万样本）上预训练医学视觉语言模型，在PathVQA上BLEU-1达75.9，超越562B参数的Med-PaLM M，并在8个未见医学分类任务上取得最高平均macro-F1（0.757）。

Comments Medical Imaging with Deep Learning (MIDL) 2026, Short Paper Track

2606.13629 2026-06-12 stat.ME cs.AI cs.LG stat.ML 交叉投稿

Valid Inference with Synthetic Data via Task Exchangeability

通过任务可交换性实现基于合成数据的有效推断

Lezhi Tan, Tijana Zrnic

AI总结提出任务可交换性条件，确保在科学研究中使用合成数据进行统计推断的有效性，并给出在民意调查和AI评估中的应用。

AI中文摘要

越来越多的工作主张在科学研究中使用合成数据。例如，社会科学家主张在试点研究中使用LLM生成的“硅样本”；AI评估越来越依赖“LLM作为裁判”的输出；蛋白质组学研究通过生成合成蛋白质结构的生成模型加速。这些发展引发了一个有趣的可能性：合成数据可以帮助研究人员提出更多问题、进行更多研究并加速发现。但它们也引发了一个根本性的担忧：合成数据可能有偏、有噪声且设定错误。在这项工作中，我们提出了在科学研究中使用合成数据的统计原则，并具有可证明的有效性保证。关键见解是一个我们称为任务可交换性的新技术条件。非正式地说，这是一个要求，即研究人员可以识别出有真实数据可用的历史任务，使得他们当前感兴趣的任务与历史任务在适当的数学意义上可交换。我们开发了在任务可交换性下进行有效推断的方法，以及即使在可交换性之外也能提供保证的扩展。我们通过硅样本的民意调查和自动评分器的AI评估来展示该框架。

英文摘要

There is a proliferation of work arguing for the use of synthetic data in scientific research. For example, social scientists are arguing for the use of LLM-generated "silicon samples" in pilot studies; AI evaluations increasingly rely on "LLM-as-a-judge" outputs; and proteomics research is accelerated by generative models that produce synthetic protein structures. These developments raise an intriguing possibility: synthetic data may help researchers ask more questions, run more studies, and accelerate discovery. But they also raise a fundamental concern: synthetic data can be biased, noisy, and misspecified. In this work, we propose statistical principles for using synthetic data in scientific research with provable validity guarantees. The key insight is a new technical condition that we call task exchangeability. Informally, this is a requirement that the researcher can identify historical tasks, for which real data is available, such that their current task of interest is exchangeable with the historical tasks in an appropriate mathematical sense. We develop methods for valid inference under task exchangeability, together with extensions that provide guarantees even beyond exchangeability. We demonstrate the framework on public opinion surveys with silicon samples and AI evaluation with autoraters.

2606.13647 2026-06-12 cs.CL cs.AI cs.LG 交叉投稿

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

SkMTEB：斯洛伐克大规模文本嵌入基准与模型适配

Marek Šuppa, Andrej Ridzik, Daniel Hládek, Natália Kňažeková, Viktória Ondrejová

发表机构 * Comenius University in Bratislava（布拉迪斯拉发夸美纽斯大学）； Cisco Systems（思科系统）； Technical University of Košice（科希策技术大学）； Kempelen Institute of Intelligent Technologies（肯佩伦智能技术研究所）

AI总结针对低资源西斯拉夫语斯洛伐克语，构建首个MTEB风格文本嵌入基准SkMTEB（含31个数据集、7类任务），并开发高效本地部署模型e5-sk-small/large，通过词汇裁剪与微调在参数减少62%下达到与商业API相当的竞争力。

Comments ACL 2026

AI中文摘要

我们介绍了SkMTEB，这是首个针对斯洛伐克语（一种低资源西斯拉夫语）的全面MTEB风格文本嵌入基准，包含31个数据集，覆盖7种任务类型，几乎是现有多语言基准对斯洛伐克语覆盖深度的4倍。我们对31个嵌入模型的评估表明，大型指令调优多语言模型表现最强，而现有的针对NLU任务训练的斯洛伐克语特定模型在嵌入任务上迁移效果不佳。为了满足高效、可本地部署的斯洛伐克语嵌入需求，我们通过对多语言E5模型进行词汇裁剪和微调，开发了e5-sk-small（45M参数）和e5-sk-large（365M参数）模型。尽管模型尺寸缩小了高达62%，我们的开源模型在性能上与专有API相当，同时仍可本地部署用于语义搜索和检索增强生成（RAG）。我们公开了基准、模型、数据集和代码，希望我们的方法能为其他资源匮乏的语言提供可复现的路径。

英文摘要

We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types -- nearly 4× the depth of existing multilingual benchmark coverage for Slovak. Our evaluation of 31 embedding models reveals that large instruction-tuned multilingual models achieve the strongest performance, while existing Slovak-specific models trained for NLU tasks transfer poorly to embedding tasks. To address the need for efficient, locally-deployable Slovak embeddings, we develop e5-sk-small (45M parameters) and e5-sk-large (365M) by applying vocabulary trimming and fine-tuning to Multilingual E5 models. Despite size reductions of up to 62%, our open-source models achieve competitive performance with proprietary APIs while remaining locally deployable for semantic search and retrieval-augmented generation (RAG). We release the benchmark, models, datasets, and code openly, hoping our approach offers a replicable path for other under-resourced languages.

2606.13649 2026-06-12 cs.CL cs.LG 交叉投稿

Operadic consistency: a label-free signal for compositional reasoning failures in LLMs

Operadic一致性：LLM中组合推理失败的无标签信号

Nathaniel Bottman, Yinhong Liu, Kyle Richardson

发表机构 * Incubilate ； University of Cambridge（剑桥大学）； Allen Institute for Artificial Intelligence（艾伦人工智能研究所）

AI总结提出Operadic一致性（OC）作为检测大语言模型组合推理失败的无标签信号，在四个多跳QA数据集上与准确率强相关（Pearson r≥0.86），优于自一致性等方法。

AI中文摘要

在推理时检测LLM推理失败而无需真实标签，催生了广泛的置信度基线，包括自一致性、语义熵和P(True)，这些方法基于问题内采样和自我评估。Operad理论，即通过迭代替换构建系统的形式化方法，提出了一种补充性诊断：模型对组合查询的直接回答应与通过组合同一查询的分解陈述所产生的回答一致。我们将这一思想实例化为Operadic一致性（OC），一个每问题信号。在四个多跳QA数据集上的十二个指令微调LLM（4B到671B参数，开源和闭源）上，OC与每个数据集上的准确率强相关（Pearson r ∈ [0.86, 0.94]，所有p ≤ 0.0004），并且是我们评估的所有信号中唯一在所有四个数据集上均匀达到r ≥ 0.85的信号。思维链自一致性（CoT-SC；Wang等人，2023）在HotpotQA和DROP上与OC匹配（r = 0.93, 0.87），但在MuSiQue和StrategyQA上降至r ≈ 0.45。在每问题层面，OC在每个数据集上提供了超出CoT-SC和语义熵的信息（OC系数的聚类稳健p ≤ 10^{-16}），并且该结论在额外控制构造的分解感知基线时依然稳健（p ≤ 10^{-13}）。相同的信号在等成本K = 3预算下，相对于调优的CoT-SC基线产生了选择性预测改进（固定覆盖率下的准确率提升）（AUARC提升+0.086至+0.096，AUROC提升+0.092至+0.164；95%置信区间在每个单元上排除零）。在五个前沿思维模型上，其中分解从模型自身的思维链中提取，相同的等成本比较在所有测试的16个（数据集、预算、指标）单元上给出了正的选择性预测点估计提升，其中12个单元的95%置信区间排除零。

英文摘要

Detecting LLM reasoning failures at inference time without ground-truth labels has motivated a wide range of confidence baselines, including self-consistency, semantic entropy, and P(True), built on within-question sampling and self-evaluation. Operad theory, the formalism for systems built by iterated substitution, suggests a complementary diagnostic: a model's direct answer to a compositional query should agree with the answer it produces by composing a stated decomposition of the same query. We instantiate this idea as operadic consistency (OC), a per-question signal. Across twelve instruction-tuned LLMs (4B to 671B parameters, open-weights and closed-source) on four multi-hop QA datasets, OC is strongly correlated with accuracy on every dataset (Pearson $r \in [0.86, 0.94]$, all $p \leq 0.0004$), and is the only signal we evaluate with $r \geq 0.85$ uniformly across all four datasets. Chain-of-thought self-consistency (CoT-SC; Wang et al., 2023) matches OC on HotpotQA and DROP ($r = 0.93, 0.87$) but drops to $r \approx 0.45$ on MuSiQue and StrategyQA. At the per-question level, OC contributes information beyond CoT-SC and semantic entropy on every dataset (cluster-robust $p \leq 10^{-16}$ for the OC coefficient), and the conclusion is robust to additionally controlling for constructed decomposition-aware baselines ($p \leq 10^{-13}$). The same signal yields selective-prediction improvements (accuracy at fixed coverage) over a tuned CoT-SC baseline at the equal-cost $K = 3$ budget (AUARC lifts of +0.086 to +0.096 and AUROC lifts of +0.092 to +0.164; 95% CIs exclude zero on every cell). On five frontier thinking models, where the decomposition is extracted from the model's own chain of thought, the same equal-cost comparison gives positive selective-prediction point-estimate lift on all 16 (dataset, budget, metric) cells tested, with 95% CIs excluding zero on 12 of the 16.

2304.13836 2026-06-12 cs.LG cs.AI cs.CV stat.ME 版本更新

On Pitfalls of $\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective

论 $\textit{RemOve-And-Retrain}$ 的陷阱：数据处理不等式视角

Junhwa Song, Keumgang Cha, Junghoon Seo

发表机构 * KAIST（韩国科学技术院）

AI总结从信息论角度揭示ROAR基准的缺陷：数据无关的后处理可提升ROAR分数，导致对归因图信息量的误判，并发现模糊性偏差。

Comments Accepted at the 2026 ICML Workshop on Mechanistic Interpretability

2603.14407 2026-06-12 cs.LG 版本更新

Towards One-for-All Anomaly Detection for Tabular Data

面向表格数据的通用异常检测

Shiyuan Li, Yixin Liu, Yu Zheng, Xiaofeng Cao, Shirui Pan, Heng Tao Shen

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出OFA-TAD框架，通过多视图邻居距离表示和混合专家评分网络，实现跨领域表格异常检测的通用化，一次训练即可泛化到未见数据集。

Comments Accepted by ICML 2026

AI中文摘要

表格异常检测（TAD）旨在识别表格数据中偏离大多数样本的样本，在许多实际应用中至关重要。然而，现有方法遵循“一个数据集一个模型（OFO）”范式，依赖于数据集特定的训练，导致计算成本高且对未见领域的泛化能力有限。为解决这些局限性，我们提出OFA-TAD，一个通用的“一个模型适用所有数据集”（one-for-all，OFA）TAD框架，只需在多个源数据集上进行一次训练，即可即时泛化到来自不同领域的未见数据集。为实现这种通用表格异常检测，OFA-TAD提取邻居距离模式作为可迁移线索，并引入来自多个变换诱导度量空间的多视图邻居距离表示，以减轻距离分布对变换的敏感性。为自适应组合多视图距离证据，采用混合专家（MoE）评分网络进行视图特定异常评分和熵正则化门控融合，并采用多策略异常合成机制以支持单类约束下的训练。在来自14个领域的34个数据集上的大量实验表明，OFA-TAD在严格的OFA设置下实现了优越的异常检测性能和强大的跨领域泛化能力。源代码见：https://github.com/Shiy-Li/OFA-TAD

英文摘要

Tabular anomaly detection (TAD) aims to identify samples that deviate from the majority in tabular data and is critical in many real-world applications. However, existing methods follow a "one model for one dataset (OFO)" paradigm, which relies on dataset-specific training and thus incurs high computational cost and yields limited generalization to unseen domains. To address these limitations, we propose OFA-TAD, a generalist one-for-all (OFA) TAD framework that only requires one-time training on multiple source datasets and can generalize to unseen datasets from diverse domains on-the-fly. To realize one-for-all tabular anomaly detection, OFA-TAD extracts neighbor-distance patterns as transferable cues, and introduces multi-view neighbor-distance representations from multiple transformation-induced metric spaces to mitigate the transformation sensitivity of distance profiles. To adaptively combine multi-view distance evidence, a Mixture-of-Experts (MoE) scoring network is employed for view-specific anomaly scoring and entropy-regularized gated fusion, with a multi-strategy anomaly synthesis mechanism to support training under the one-class constraint. Extensive experiments on 34 datasets from 14 domains demonstrate that OFA-TAD achieves superior anomaly detection performance and strong cross-domain generalizability under the strict OFA setting. The source code is available at https://github.com/Shiy-Li/OFA-TAD.

2605.20763 2026-06-12 cs.LG 版本更新

ShapeBench: A Scalable Benchmark and Diagnostic Suite for Standardized Evaluation in Aerodynamic Shape Optimization

ShapeBench: 一种可扩展的基准和诊断套件，用于气动形状优化的标准化评估

Shaghayegh Fazliani, Krissh Chawla, Jack Guo, Yiren Shen, Matthias Ihme, Madeleine Udell

发表机构 * Stanford University（斯坦福大学）； Spinoza Labs（斯皮诺扎实验室）

AI总结本文提出ShapeBench，一个开源的气动形状优化基准，提供统一的API，涵盖103个任务和八个形状类别，通过验证的代理模型和高保真CFD流程进行系统分析，展示了不同形状类别和问题形式中优化器排名的显著差异，强调了需要更通用方法的必要性。

AI中文摘要

气动形状优化（ASO）的快速进展已超出现有标准化评估框架的能力。公平比较需要一个覆盖多样形状类别、目标函数形式以及预算匹配的最先进基线的统一基准。我们引入ShapeBench，一个开源的ASO基准，提供统一的API，涵盖103个任务，跨越八个形状类别和多种优化模式。每个ShapeBench任务包括经过验证的代理模型以实现快速搜索；在可行时，还提供高保真计算流体动力学（CFD）流程用于最终验证，从而实现系统化的保真度差距分析。ShapeBench提供可复现的协议和配置良好的基线，以使用一致的预算度量进行公平比较，允许在经典方法和LLM驱动方法之间进行比较，包括通用优化器和一个新的领域专用进化LLM基线ShapeEvolve。在ShapeBench上的结果显示，不同形状类别和问题形式下的优化器排名差异显著，平均成对斯皮尔曼ρ=0.013，因此单任务结论无法可靠地推广到不同问题类别。该基准还远未饱和；经典方法很少能适用于所有形状类别和任务，进一步凸显了对更通用方法的需求。

英文摘要

Rapid progress in aerodynamic shape optimization (ASO) has outpaced currently-available standardized evaluation frameworks. Fair comparison requires a unified benchmark spanning diverse shape classes, objective formulations, and matched-budget state-of-the-art baselines. We introduce ShapeBench, an open-source ASO benchmark with a unified API spanning 103 tasks across eight shape categories and multiple optimization regimes. Each ShapeBench task includes a validated surrogate for fast search; when feasible, a high-fidelity Computational Fluid Dynamics (CFD) pipeline for final verification is available, enabling systematic fidelity-gap analysis. ShapeBench provides a reproducible protocol with well-configured baselines to compare fairly using a consistent budget metric, allowing for comparison among both classical and LLM-driven methods, including general-purpose optimizers and a new domain-specialized evolutionary LLM baseline, ShapeEvolve. Results on ShapeBench demonstrate substantial variance in optimizer rankings across shape categories and problem formulations, with mean pairwise Spearman $\rho = 0.013$, so single-task conclusions do not reliably generalize across problem classes. The benchmark is also far from saturation; classical methods are rarely applicable across all shape categories and tasks, further highlighting the need for more general-purpose approaches.
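
The headline statistic, a mean pairwise Spearman correlation between optimizer rankings, can be sketched as follows. This is an illustration rather than ShapeBench's code: it assumes each task yields a tie-free ranking of the same optimizers, and the three rankings below are invented.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Spearman correlation for two tie-free rankings of the same items:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def mean_pairwise_rho(rankings):
    """Average Spearman rho over all unordered pairs of task rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ranks of four optimizers on three tasks (1 = best).
task_rankings = [
    [1, 2, 3, 4],
    [4, 3, 2, 1],
    [2, 1, 4, 3],
]
print(round(mean_pairwise_rho(task_rankings), 3))  # -0.333
```

A mean near zero, like the value the abstract reports, means that knowing which optimizer wins on one task says almost nothing about which one wins on another.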

2606.10642 2026-06-12 cs.LG physics.ao-ph 版本更新

PhysMetrics.Weather: An Evaluation Framework for Physical Consistency in ML Weather Models

PhysMetrics.Weather: 机器学习天气模型中物理一致性的评估框架

Emma Kasteleyn, Timo Maier, Axel Lauer, Veronika Eyring, Pierre Gentine, Ana Lucic

AI总结提出PhysMetrics.Weather评估框架，通过守恒、谱和动力学三类指标量化MLWP模型的物理真实性，指导物理信息架构开发并评估其运行可靠性。

Comments Preprint

2510.16380 2026-06-12 cs.CL cs.AI cs.CY cs.HC cs.LG 版本更新

MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

MoReBench：评估语言模型中的程序性和多元道德推理，超越结果

Yu Ying Chiu, Michael S. Lee, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Raphaël Millière, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, Yash Maurya, Christina Q Knight, Harry R. Lloyd, Florence Bacus, Conor Downey, Mantas Mazeika, Bing Liu, Yejin Choi, Mitchell L Gordon, Sydney Levine

发表机构 * University of Washington（华盛顿大学）； New York University（纽约大学）； Scale AI ； Harvard University（哈佛大学）； University of Michigan（密歇根大学）； UNC Chapel Hill（北卡罗来纳大学教堂山分校）； Center for AI Safety（人工智能安全中心）； Stanford University（斯坦福大学）； MIT（麻省理工学院）； University of Oxford（牛津大学）

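AI summary: Proposes the MoReBench benchmark, comprising 1,000 moral scenarios and more than 23,000 rubric criteria, to evaluate the procedural reasoning of language models on moral questions; finds that existing benchmarks fail to predict model performance and that models favor specific moral frameworks.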

Comments 46 pages, 8 figures, 10 tables. Published in ICLR 2026. Accepted at CHAI workshop and SPP 2026 (non-archival)

AI中文摘要

随着人工智能系统的进步，我们越来越依赖它们与我们共同或代替我们做出决策。为了确保这些决策符合人类价值观，我们不仅需要理解它们做出了什么决策，还需要理解它们如何得出这些决策。推理语言模型能够提供最终响应和（部分透明的）中间思考轨迹，这为研究AI的程序性推理提供了及时的机会。与通常有客观正确答案的数学和代码问题不同，道德困境是过程导向评估的绝佳测试平台，因为它们允许多种可辩护的结论。为此，我们提出了MoReBench：包含1000个道德场景，每个场景配有一组专家认为在推理该场景时必须包含（或避免）的评分标准。MoReBench包含超过2.3万条标准，包括识别道德考量、权衡利弊以及给出可操作的建议，覆盖了AI为人类道德决策提供建议以及自主做出道德决策的情况。此外，我们整理了MoReBench-Theory：150个示例，用于测试AI是否能在规范伦理学的五个主要框架下进行推理。我们的结果表明，规模定律以及现有的数学、代码和科学推理任务基准无法预测模型进行道德推理的能力。模型还显示出对特定道德框架（例如边沁式的行为功利主义和康德义务论）的偏好，这可能是流行训练范式的副作用。这些基准共同推动了面向过程推理的评估，以实现更安全、更透明的AI。

英文摘要

As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative for us to understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems, which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To this end, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23 thousand criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations, to cover cases of AI advising humans on moral decisions as well as making moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be side effects of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.

2602.13379 2026-06-12 cs.CR cs.AI cs.CL cs.LG cs.SE 版本更新

Narrower Range, Same Threat: Re-evaluating LLM Package Hallucination on the 2026 Frontier Model Cohort

Aleksandr Churilov

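AI summary: Re-evaluates package hallucination by large language models (LLMs) on the 2026 frontier model cohort and finds that, although hallucination rates have fallen, the threat remains; identifies a set of 127 package names (109 on PyPI, 18 on npm) generated identically by all evaluated models, forming a cross-model supply-chain attack surface, and also reports an asymmetry between Python and JavaScript hallucinations and high similarity between DeepSeek V3.2 and GPT-5.4-mini.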

Comments 13 pages, 3 figures, 4 tables. v2: incorporates coordinated-disclosure feedback from PyPI Security and Socket.dev; registrable attack surface refined to 53 names (41 PyPI, 12 npm). Headline rates unchanged. Replication of Spracklen et al. (USENIX Security 2025). Data and code: https://github.com/churik5/slopsquatting-replication-2026 and https://doi.org/10.5281/zenodo.19859120

AI中文摘要

Spracklen等人（USENIX Security '25）表明，生成代码的大型语言模型会以5.2%至21.7%的比率生成不存在于PyPI或npm上的包名，从而为slopsquatting攻击（恶意包的注册）提供了攻击面。我们在这五款2025年10月至2026年3月期间发布的前沿代码能力LLM上重复了他们的方法：Claude Sonnet 4.6、Claude Haiku 4.5、GPT-5.4-mini、Gemini 2.5 Pro和DeepSeek V3.2。在199,845个经过PyPI和npm主列表验证的Python和JavaScript提示对中，我们测量到幻觉率在4.62%（Claude Haiku 4.5）到6.10%（GPT-5.4-mini）之间——比Spracklen观察到的模型间差异缩小了一个数量级，但威胁并未消失。除了重复研究外，我们识别出一组127个包名（109个在PyPI，18个在npm）被所有评估模型一致生成，构成一个跨模型的供应链攻击面，无法由单一模型研究揭示。我们进一步记录了Python与JavaScript幻觉的不对称性，推翻了Spracklen 2024年的发现，识别出Anthropic家族中的Haiku低于Sonnet的倒置现象，并观察到DeepSeek V3.2和GPT-5.4-mini之间的Jaccard相似性峰值（J=0.343），暗示共享的训练数据起源。

英文摘要

Spracklen et al. (USENIX Security '25) showed that code-generating large language models hallucinate package names that do not exist on PyPI or npm at rates ranging from 5.2% on commercial models to 21.7% on open-source models, creating an attack surface for slopsquatting -- the registration of malicious packages under hallucinated names. We replicate their methodology on five frontier code-capable LLMs released between October 2025 and March 2026: Claude Sonnet 4.6, Claude Haiku 4.5, GPT-5.4-mini, Gemini 2.5 Pro, and DeepSeek V3.2. Across 199,845 paired Python and JavaScript prompts validated against PyPI and npm master lists, we measure overall hallucination rates between 4.62% (Claude Haiku 4.5) and 6.10% (GPT-5.4-mini) -- an order-of-magnitude compression of the inter-model spread observed by Spracklen, but not a retirement of the threat. Beyond replication, we identify a set of 127 package names (109 on PyPI, 18 on npm) that all five evaluated models invent identically; following coordinated disclosure with PyPI Security and Socket.dev, 53 of these (41 on PyPI, 12 on npm) remain registrable by an attacker after each registry's existing defenses, constituting a model-agnostic supply-chain attack surface that no single-model study can reveal. We further document a Python-over-JavaScript hallucination asymmetry that inverts Spracklen's 2024 finding, identify a Haiku-below-Sonnet inversion within the Anthropic family, and observe a Jaccard-similarity peak between DeepSeek V3.2 and GPT-5.4-mini (J = 0.343) suggestive of shared training-data origins.
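The two set statistics mentioned in the abstract (the names every model invents identically, and the Jaccard similarity between two models) can be sketched as follows. This is only a minimal illustration: the model labels and package names are hypothetical placeholders, not data from the study, and the replication's actual pipeline may normalize names differently.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two name collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def shared_by_all(per_model):
    """Names that every model hallucinated identically."""
    sets = [set(names) for names in per_model.values()]
    return set.intersection(*sets) if sets else set()

# Hypothetical hallucinated (non-existent) package names per model.
hallucinated = {
    "model_a": {"example-pkg-1", "example-pkg-2", "example-pkg-3"},
    "model_b": {"example-pkg-1", "example-pkg-2", "example-pkg-4"},
    "model_c": {"example-pkg-1", "example-pkg-5"},
}

print(jaccard(hallucinated["model_a"], hallucinated["model_b"]))
print(shared_by_all(hallucinated))
```

The intersection is the part that no single-model study can surface, since it only exists once several models' outputs are compared.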

2606.01538 2026-06-12 cs.GR cs.CV cs.LG 版本更新

MPMWorlds: Material-Point-Method Simulations for Inferring and Extrapolating Physical Dynamics

MPMWorlds: 用于推断和外推物理动力学的物质点法模拟

Žiga Kovačič, Kevin Ellis

发表机构 * Cornell University（康奈尔大学）

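AI summary: Builds a dataset of 2D Material Point Method (MPM) simulations to study the ability to infer physical dynamics from video and extrapolate their time evolution, comparing the strengths and weaknesses of code-generation and video-diffusion approaches.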

Comments 16 pages, 13 figures. Project page: https://zzigak.github.io/mpmworlds/

2606.04525 2026-06-12 cs.CL cs.LG q-bio.GN 版本更新

GENEB: Why Genomic Models Are Hard to Compare

GENEB：为什么基因组模型难以比较

Daria Ledneva, Mikhail Nuridinov, Denis Kuznetsov

发表机构 * GitHub ； arXiv

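AI summary: To address the fragmented evaluation of genomic foundation models, proposes the GENEB benchmark, which compares 40 models on 100 tasks under a unified probing protocol and reveals key findings such as unstable model rankings and limited gains from scale.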

Comments change first page figure, fix model sizes, add more consistency

AI中文摘要

由于基准碎片化、评估协议不兼容以及任务特定报告，基因组基础模型的进展难以评估。因此，关于模型优越性或通用性的声明往往无法直接比较。我们引入GENEB，这是一个大规模诊断基准，在统一的基于探测的协议下（包括少样本场景），评估来自40个基因组基础模型的冻结表示，涵盖100个任务，跨越13个功能类别。GENEB能够在明确暴露任务级权衡的同时，对模型规模、架构、分词和预训练数据进行受控比较。我们的分析表明，整体排行榜不稳定：模型排名在不同任务类别间变化剧烈，规模仅带来适度且不一致的收益，而架构和预训练对齐常常超过参数数量的影响。这些结果凸显了当前评估实践的局限性，并将GENEB定位为基因组机器学习中原则性比较和类别感知模型选择的参考框架。

英文摘要

Progress in genomic foundation models is difficult to assess due to fragmented benchmarks, incompatible evaluation protocols, and task-specific reporting. As a result, claims of superiority or generality across models are often not directly comparable. We introduce GENEB, a large-scale diagnostic benchmark that evaluates frozen representations from 40 genomic foundation models across 100 tasks spanning 13 functional categories under a unified probing-based protocol, including few-shot regimes. GENEB enables controlled comparison across model scale, architecture, tokenization, and pretraining data while explicitly exposing task-level trade-offs. Our analysis shows that aggregate leaderboards are unstable: model rankings vary sharply across task categories, scale provides only modest and inconsistent gains, and architectural and pretraining alignment frequently outweigh parameter count. These results highlight limitations of current evaluation practices and position GENEB as a reference framework for principled comparison and category-aware model selection in genomic machine learning.

2606.05405 2026-06-12 cs.AI cs.CL cs.LG 版本更新

Agents' Last Exam

Yiyou Sun, Xinyang Han, Weichen Zhang, Yuanbo Pang, Tianyu Wang, Yuhan Cao, Yixiao Huang, Chris Duroiu, Haoyun Zhang, Jeffrey Lin, Weishu Zhang, Tyler Zeng, Ying Yan, Bo Liu, Hanson Wen, Mingyang Xu, Xiaoyuan Liu, Zimeng Chen, Weiyan Shi, Amanda Dsouza, Vincent Sunn Chen, Patrick Bryant, Carl Boettiger, Yamini Rangan, Bradley Rothenberg, Kyle Steinfeld, Arvind Rao, Tapio Schneider, Georgios Yannakakis, Laure Zanna, Kaan Ozbay, Ida Sim, Tarek Zohdi, George Em Karniadakis, Jack Gallant, Teresa Head-Gordon, Yushan Li, Wenxi Deng, Tao Sun, Huiqi Wang, Zhun Wang, Justin Xu, Chris Yuhao Liu, Yafei Cheng, Rongwang Hu, Aras Bacho, Shengcao Cao, Zengyi Qin, Yixiong Chen, Hengduan Fan, Hao Liu, Lin Zeng, Shashank Muralidhar Bharadwaj, Litian Gong, Yingxuan Yang, Maojia Song, Ruheng Wang, Zongzheng Zhang, Honglin Bao, Shuo Lu, Jianhong Tu, Zhonghua Wang, Zheng Zhang, Zijiao Chen, Yanqiong Jiang, Zhendong Li, Bohan Lyu, Chang Ma, Peiran Xu, Benran Zhang, Shangding Gu, Haoyue Hua, Haoyang Li, Wanzhe Liao, Chengzhi Liu, Junbo Peng, Haoran Sun, Zechen Xu, Bo Chen, Jiayi Cheng, Yi Jiang, Keying Kuang, Yuan Li, Youbang Pan, Ziyan Rao, Alexander Schubert, Yifan Shen, Vincent Siu, Xiatao Sun, Kangqi Zhang, Xiaopan Zhang, Yuchen Zhu, Ishaan Singh Chandok, Lei Ding, Jingxuan Fan, Andrew Glover, Jiaming Hu, Yiran Hu, Wenbo Huang, Zixin Jiang, Haoran Jin, Lukas Kim, Ming Liu, Yang Liu, Alireza Rafiei, Xuhuan Shen, Kunyang Sun, Sophia Sun, Ting Sun, Eric Wang, Yixin Wang, Hanwen Xing, Sihan Xu, Yuzheng Xu, Zhongxing Xu, Zhiling Yan, Boqin Yuan, Ruiqi Zhang, Yifan Zhang, Zibo Zhao, Liana, Santanu Bosu Antu, Haoyue Bai, Carlo Bosio, Joseph Cavanagh, Patricia Cavazos-Rehg, Tianxing Chen, Xuewen Chen, Yipu Chen, Chenyu Zhu, Chen Dai, Stefano De Castro, Yunfu Deng, Kaustubh Dhole, Jiayuan Ding, Chenchen Du, Zhehang Du, Hao Fan, Run-Ze Fan, Hengyu Fu, Shi Gu, Yifan Gu, Charlie Guo, Baihe Huang, Baixiang Huang, Rimika Jaiswal, Zhihan Jiang, Ran Jin, Erin Kasson, Xin Lan, Joseph Lee, Deren Lei, 
Chenyu Li, Daofeng Li, Haitao Li, Hongwei Li, Jingyan Li, Xiao Li, Yi Li, Yinsheng Li, Yuangang Li, Zhixu Li, Wenyu Liang, Longtai Liao, Kevin Qinghong Lin, Andy Zeyi Liu, Che Liu, Jiaming Liu, Kaiyuan Liu, Xuan Liu, Pan Lu, Wenbo Lv, Yicheng Lyu, Qiuyang Mang, Kyle Montgomery, Yuzhou Nie, Ruoxi Ning, Jorin Overwiening, Xu Pan, Layna Paraboschi, Core Francisco Park, Justin Purnomo, Swati Rajwal, Scott Rankin, Bixuan Ren, Yiren Rong, HaoYang Shang, Ventus Shaw, Fiona Shen, Jiawei Shen, Minqi Shi, Shi Qiu, Huaxiu Yao, Tianneng Shi, Jonah So, Vladislav Susoy, Hannah Szlyk, Haocheng Wang, Jialu Wang, Wei Wang, Xinyu Wang, Zehao Wang, Dowling Wong, Angela Wu, Dehao Wu, Fangyu Wu, Mengyuan "Millie" Wu, Yu Wu, Yuchen Wu, Yuhao Wu, Qingpo Wuwu, Weihang Xiao, Yongyi Xiong, Fan Xu, Ruiling Xu, Mingxuan Yan, Benjamin Yang, Jirong Yang, Sen Yang, Xiaoli Yang, Yushi Yang, Haoran Ye, Xiaohu Yu, Zhengming Yu, Chenlong Zhang, Chi Zhang, Hanning Zhang, Hanwen Zhang, Junge Zhang, Kunpeng Zhang, Song Zhang, Wenjin Zhang, Wenshuo Zhang, Ying Zhang, Yizhi Zhang, Brian Zhao, Qijian Zhao, Yimin Zhao, Yuhaohua Zheng, Liwei Zhou, Tianyue Zhou, Sichen Zhu, Siqi Zhu, Yan Zhu, Yishu Zhu, Jierui Zuo, Chonghao Cai, Helena Casademunt, Wenjia Chen, Cheng Cheng, Nawen Deng, Rao Fu, Tianfu Fu, Yifan Han, He Ren, Zhenyu He, Qiao Jin, Langlang Li, Yuetai Li, Sylvia Liu, Lu Lu, Luqing Zhou, Subhabrata Mukherjee, Yunqi Ouyang, Yin Ren, Dawei Shi, Haoran Wu, Zhiyue Wu, Hannah Yao, Zhuoran Yi, Jenny Yu, Rhea Zhan, Hang Zhou, Blake Zhu, Junfan Zhu, Alan Yuille, Yang Liu, Russell Alan Poldrack, Jiachen Li, Zhenglu Li, Molei Tao, Jing Huang, Wenqi Shi, Costas Spanos, Lichao Sun, Chenguang Wang, Orson Xu, Zhen Dong, Hector Gomez, Aylin Caliskan, Ali Emami, Haimin Hu, Zhi Li, Lihui Liu, Murphy Niu, Yi Shao, Jianxin Sun, Mikko Tolonen, Ting Wang, Sanjiv Das, Yanjun Gao, Wenbo Guo, Erika J Schneider, Zhiyong Lu, Yian Ma, Mark Mueller, Radha Poovendran, Somayeh Sojoudi, Yinglun Zhu, Dawn Song

发表机构 * arXiv

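AI summary: To address the lack of economically meaningful deployment of AI systems in professional domains, proposes the Agents' Last Exam (ALE) benchmark, built with 250+ experts and covering 1,000+ long-horizon, real-world, economically valuable tasks across 55 subfields in 13 industry clusters; the average pass rate on the current hardest tier is only 2.6%.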

Comments Project website: https://agents-last-exam.org Code: https://github.com/rdi-berkeley/agents-last-exam

AI中文摘要

最近的AI系统在广泛基准测试中取得了强劲结果，但这些成果并未转化为许多专业领域的经济上有意义的部署。我们认为这一差距主要是评估问题：广泛使用的基准缺乏对真实且经济上有价值的工作流程的持续性能测量。本文介绍了Agents' Last Exam (ALE)，这是一个旨在评估AI代理在长期、经济上有价值、结果可验证的真实世界任务上的基准。与250多名行业专家合作开发，ALE涵盖了参考O*NET/SOC 2018（美国联邦职业分类）定义的非实体行业。它围绕一个任务分类法组织，包含55个子领域，分为13个行业集群，涵盖1000多个任务。当前结果显示，最难层级远未饱和：在主流框架和骨干配置下，平均完全通过率为2.6%。ALE被设计为一个活的基准：其任务池随着新工作流程和行业的加入而持续增长。更广泛地说，ALE不仅旨在作为另一个排行榜，而是作为缩小基准成功与GDP相关影响之间差距的工具。

英文摘要

Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue that this gap is largely an evaluation problem: widely used benchmarks lack sustained performance measurement on real and economically valuable workflows. This paper introduces Agents' Last Exam (ALE), a benchmark designed to evaluate AI agents on long-horizon, economically valuable, real-world tasks with verifiable outcomes. Developed in collaboration with 250+ industry experts, ALE covers non-physical industries defined with reference to O*NET / SOC 2018 (the U.S. federal occupational taxonomy). It is organized around a task taxonomy with 55 subfields grouped into 13 industry clusters covering 1K+ tasks. Current results show that the hardest tier remains far from saturated: across mainstream harness and backbone configurations, the average full pass rate is below 1%. ALE is designed as a living benchmark: its task pool grows continuously as new workflows and industries are onboarded. More broadly, ALE is intended not merely as another leaderboard, but as an instrument for closing the gap between benchmark success and GDP-relevant impact.

2606.08098 2026-06-12 cs.AI cs.LG 版本更新

When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference

何时委托优于多数？一种基于委托的多样本LLM推理聚合器

Yasushi Sakai, Allen Song, Kent Larson

发表机构 * MIT Media Lab（麻省理工学院媒体实验室）

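AI summary: Proposes PPV, a delegation-based aggregator that uses the letter-entropy and reasoning-geometry signals of the samples to beat majority voting by 1.5 percentage points on MMLU-Pro, with no labels or training required.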

Comments Preprint. 16 pages, 5 figures, 4 tables

AI中文摘要

多数投票是对多样本LLM推理进行无监督聚合的主流方法。我们证明，将每个样本携带的信号输入基于委托的聚合器（传播代理投票，PPV）可产生一种无监督共识规则，在MMLU-Pro上整体比多数投票高1.5个百分点，在非平凡子集上高2.24个百分点（配对McNemar p ~ 1.0e-14，n = 8,099）。多数投票丢弃了每个样本携带的两个自由信号：组内字母熵和组间推理几何。PPV暴露了两个每个投票者使用的杠杆，它们恰好消耗这些信号：WHEN（投票者保留自己选择的权重）和WHOM（如何将剩余权重分配给同行）。我们使用字母熵驱动WHEN，使用以问题为中心的嵌入余弦驱动WHOM。该方法不需要真实标签和辅助训练：对于每个问题，我们将128个采样生成划分为16组，计算每组的字母级语义熵和推理嵌入质心，并将两者输入随机委托矩阵，其平稳分布选择共识答案。我们通过一个例子说明PPV如何推翻一个明显的10-6多数（错误答案）：10票的多数簇几何上不连贯（平均簇内余弦-0.02），而6票的少数簇紧凑（+0.26），因此传播的委托质量集中在少数派的答案上，尽管仅凭熵会使多数保持领先。我们还报告了具有负面结果的委托策略，这些策略限制了无监督LLM聚合的设计空间：没有问题内的置信度模式集成能够缩小与oracle的差距。

英文摘要

Majority voting over sampled answers is the dominant unsupervised aggregator for multi-sample LLM inference. In this paper, we show a delegation-based aggregator (Propagational Proxy Voting, PPV; Sakai et al., 2025) yields an unsupervised consensus rule that beats majority on MMLU-Pro by +1.5 pp overall and +2.24 pp on the non-trivial subset (paired McNemar p ~ 1.0e-14, n = 8,099). Majority discards two signals that every sample carries: within-group letter entropy and between-group reasoning geometry. PPV exposes per-voter levers that consume exactly these two signals: When (how much weight a voter keeps on its own pick) and Whom (how it splits the remainder across peers). We drive When with letter entropy and Whom with per-question-centered embedding cosine. Our method needs no gold labels and no auxiliary training: per-question, we partition 128 sampled generations into 16 groups, compute each group's letter-level semantic entropy and reasoning embedding centroid, and feed both into a stochastic delegation matrix whose stationary distribution selects the consensus answer. We walk through an example in which PPV overturns a clear 10-6 majority for the wrong letter: the 10-voter majority cluster is geometrically incoherent (mean within-cluster cosine -0.02) while the 6-voter minority is tight (+0.26), so propagated delegation mass concentrates on the minority's answer even though entropy alone would keep the majority ahead. We further report delegation strategies with negative results that constrain the design space for unsupervised LLM aggregation. No within-question ensemble of confidence modes closes the oracle gap.
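The mechanism described in the abstract (a row-stochastic delegation matrix whose stationary distribution picks the answer) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the mapping `keep = 1 - entropy` for the When lever, the clipped-cosine weighting for the Whom lever, the clamping constants, and all function names are hypothetical choices made for illustration.

```python
import math

def centered_cosines(vecs):
    """Pairwise cosine after subtracting the per-question mean embedding."""
    dim = len(vecs[0])
    mean = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    cen = [[v[d] - mean[d] for d in range(dim)] for v in vecs]

    def cos(u, w):
        nu = math.sqrt(sum(x * x for x in u))
        nw = math.sqrt(sum(x * x for x in w))
        return sum(x * y for x, y in zip(u, w)) / (nu * nw) if nu and nw else 0.0

    return [[cos(u, w) for w in cen] for u in cen]

def delegation_matrix(entropies, vecs, eps=1e-6):
    """Row i keeps `keep` on itself (When) and splits the rest over peers (Whom)."""
    n = len(entropies)
    cos = centered_cosines(vecs)
    matrix = []
    for i in range(n):
        # Assumed mapping: low entropy -> voter keeps more of its own weight.
        keep = min(max(1.0 - entropies[i], 0.05), 0.95)
        peer = [max(cos[i][j], 0.0) + eps if j != i else 0.0 for j in range(n)]
        total = sum(peer)
        row = [(1.0 - keep) * p / total for p in peer]
        row[i] = keep
        matrix.append(row)
    return matrix

def stationary(matrix, iters=2000):
    """Power iteration for pi = pi @ matrix, starting from the uniform vector."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi

def consensus(letters, entropies, vecs):
    """Sum stationary mass per answer letter and return the heaviest one."""
    pi = stationary(delegation_matrix(entropies, vecs))
    mass = {}
    for letter, weight in zip(letters, pi):
        mass[letter] = mass.get(letter, 0.0) + weight
    return max(mass, key=mass.get), mass
```

Each input list has one entry per group (16 groups per question in the paper's setup): the group's answer letter, its normalized letter entropy in [0, 1], and its reasoning-embedding centroid. Because mass flows toward voters that peers find geometrically close, a tight minority cluster can end up with more stationary mass than a larger but incoherent majority.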

2606.12500 2026-06-12 cs.LG cs.AI 新提交

Personalized Glycemic Assessment for Type 2 Diabetes from Wearable Sensor Data: An LLM-Powered Approach

Yifan Gao, Yanmin Gong, Yun Shi, Yuanxiong Guo

发表机构 * Department of Information Systems and Cybersecurity, The University of Texas at San Antonio（德克萨斯大学圣安东尼奥分校信息系统与网络安全系）； School of Engineering Medicine, Texas A&M University（德克萨斯农工大学工程医学院）； Department of Family and Community Medicine, The University of Texas at San Antonio（德克萨斯大学圣安东尼奥分校家庭与社区医学系）

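AI summary: Proposes the GlyLLM framework, which uses a large language model to integrate wearable sensor data with structured metadata for personalized modeling of glycemic dynamics, improving on traditional ML methods by 13.66% on glucose forecasting and 13.08% on diabetes categorization.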

Comments The 14th IEEE International Conference on Healthcare Informatics, 2026

AI中文摘要

2型糖尿病（T2D）对全球健康构成日益严重的威胁，需要有效的血糖评估来支持个性化和改进的糖尿病护理。可穿戴传感器如连续血糖监测仪（CGM）和健身追踪器为血糖评估提供了许多有价值的见解。然而，有效分析这些数据需要与重要的个体层面背景信息整合。现有方法通常基于传统机器学习（ML），主要依赖历史血糖测量值，忽略了个性化信息，这限制了它们在多样化糖尿病群体中的性能。大语言模型（LLMs）的最新进展展示了它们整合多种数据模态同时建模序列依赖性的能力，激发了探索其在个性化血糖评估中潜力的兴趣。在本文中，我们提出了GlyLLM，一个基于LLM的框架，通过整合可穿戴传感器数据和结构化元数据来建模基于CGM的血糖动态。GlyLLM可以利用预训练LLM的广泛先验知识，并在决策时实现传感器-文本语义抽象。在AI-READI数据集上的两个相关任务实验表明，我们的模型在血糖预测的均方根误差（RMSE）上平均优于传统ML方法13.66%，在糖尿病分类的受试者工作特征曲线下面积（AUROC）上平均优于13.08%。此外，我们的消融研究表明，糖尿病调查和生物特征测试比其他健康信息对血糖评估更为关键。我们的工作为利用LLM推进T2D护理中的个性化血糖评估迈出了有希望的一步。

英文摘要

Type 2 Diabetes (T2D) poses an increasing global health threat, demanding effective glycemic assessment to support personalized and improved diabetes care. Wearable sensors such as continuous glucose monitors (CGM) and fitness trackers offer many valuable insights for glycemic assessment. However, effectively analyzing these data requires integration with essential individual-level context. Existing methods are often based on traditional machine learning (ML), rely primarily on historical blood glucose measurements, and overlook personalized information, which limits their performance across diverse diabetes populations. Recent advances in large language models (LLMs) have demonstrated their ability to integrate diverse data modalities while modeling sequential dependencies, motivating the exploration of their potential for personalized glycemic assessment. In this paper, we propose GlyLLM, an LLM-powered framework for modeling CGM-based glycemic dynamics through the integration of wearable sensor data and structured metadata. GlyLLM can leverage the extensive prior knowledge of pre-trained LLMs and achieve sensor-text semantic abstraction at decision time. Experiments on two related tasks on the AI-READI dataset demonstrate that our model outperforms traditional ML methods by an average of 13.66% in Root Mean Squared Error (RMSE) for glucose forecasting and 13.08% in Area Under the Receiver Operating Characteristic Curve (AUROC) for diabetes categorization. Additionally, our ablation study shows that diabetes surveys and biometric tests are more critical than other health information for glycemic assessment. Our work presents a promising step toward harnessing the power of LLMs to advance personalized glycemic assessment in T2D care.

2606.12735 2026-06-12 cs.LG 新提交

Physics-Informed Neural Networks and Radial Basis Functions for PDEs with Dirac Delta Sources

物理信息神经网络与径向基函数求解含狄拉克δ源的偏微分方程

Manuel Reyna, Alexandre Tartakovsky

发表机构 * Department of Civil and Environmental Engineering, University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校土木与环境工程系）

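AI summary: For partial differential equations with Dirac delta terms, interprets physics-informed neural networks as a residual least squares method, handles the delta terms directly through the weak form, and compares against a radial basis function expansion, finding that RBF residual least squares is more stable on transport problems.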

Comments 33 pages, 4 figures

AI中文摘要

物理信息神经网络（PINNs）是一种用于求解正向和逆向偏微分方程（PDEs）的机器学习方法。当应用于强迫项、边界条件或初始条件中包含狄拉克δ函数的PDEs时，PINNs需要用光滑的代理函数来近似它们，这种做法可能会引入显著的建模误差。在这项工作中，我们利用PINNs作为残差最小二乘法（RLS）的解释，并表明这种视角能够通过积分弱形式方程直接处理狄拉克δ项。在除PINN之外的RLS公式中，我们重点关注径向基函数（RBF）展开（也称为单层RBF网络）。我们证明，虽然在PINNs中积分掉狄拉克δ会导致残差无法收敛到零，但RBF-RLS始终能为输运问题提供良好的正向和逆向解。我们使用神经正切核（NTK）理论解释这一发现。我们在代表多孔介质和河流中地下水流和输运的线性PDEs上测试了这两种方法。我们求解逆问题以拟合合成数据、含噪声的合成数据以及真实世界测量值。

英文摘要

Physics-Informed Neural Networks (PINNs) are a machine learning method for solving forward and inverse Partial Differential Equations (PDEs). When applied to PDEs with Dirac delta functions in the forcing terms, boundary conditions, or initial conditions, PINNs require approximating them with smooth surrogate functions, a practice that can introduce significant modeling errors. In this work, we exploit the interpretation of PINNs as Residual Least Squares (RLS) methods and show that this perspective enables direct treatment of Dirac delta terms by integrating the weak-form equation. Among RLS formulations other than PINN, we focus on the Radial Basis Function (RBF) expansion (also known as a single-layer RBF Network). We show that while integrating out the Dirac delta in PINNs causes residuals to fail to converge to zero, RBF-RLS consistently provides good forward and inverse solutions to transport problems. We explain this finding using the Neural Tangent Kernel (NTK) theory. We test both approaches on linear PDEs that represent groundwater flow and transport in porous media and rivers. We solve inverse problems to fit synthetic data, noisy synthetic data, and real-world measurements.
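To illustrate why integrating a weak-form equation removes the need for a smooth surrogate, consider a generic linear problem with a point source. The operator $\mathcal{L}$, source strength $q$, source location $x_0$, and test function $\varphi$ below are assumed notation for this sketch and do not come from the abstract. Multiplying by $\varphi$ and integrating over the domain $\Omega$, the sifting property of the delta gives:

```latex
\mathcal{L}u(x) = q\,\delta(x - x_0), \qquad x \in \Omega
\quad\Longrightarrow\quad
\int_\Omega \varphi(x)\,\mathcal{L}u(x)\,dx
  = q \int_\Omega \varphi(x)\,\delta(x - x_0)\,dx
  = q\,\varphi(x_0), \qquad x_0 \in \Omega .
```

The integrated residual $\int_\Omega \varphi\,\mathcal{L}u\,dx - q\,\varphi(x_0)$ involves only point evaluations of $\varphi$, so a least-squares method built on such residuals never has to represent the delta itself.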

2606.12843 2026-06-12 cs.LG cs.CE 新提交

Interpretable Factor Decomposition for Decision Intelligence in Large-Scale Financial Markets: Evidence from China's A-Share Market

可解释因子分解用于大规模金融市场决策智能：来自中国A股市场的证据

Xiao Han, Yao Xiao, Zhen Zhang, Moxuan Zheng

发表机构 * University of Science and Technology of China（中国科学技术大学）

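AI summary: Proposes an interpretable machine learning pipeline that decomposes cross-sectional stock return prediction into auditable factor contributions, validated on China's A-share market with XGBoost and TreeSHAP; behavioral signals account for 58.2% of predictive attribution.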

AI中文摘要

我们提出一个可解释的机器学习流程，将截面股票收益预测分解为可审计的因子贡献。我们应用带有TreeSHAP归因的XGBoost模型，对2009年至2019年的3632只中国A股进行压力测试。使用60个月滚动窗口，在55个月的样本外数据上，XGBoost获得平均AUC为0.547，且前五分之一与后五分之一的多空价差为+2.38%/月（Newey-West t = 5.94；年化夏普比率2.23）。在调整Carhart四因子模型后，该alpha持续存在（+2.31%/月；t = 7.48）。SHAP分解表明，在55个行业组中，行为信号（换手率和动量）平均占预测归因的58.2%，而估值比率仅占10.7%。消融分析用于交叉验证这一排名，并提供证据表明SHAP和消融以突出特征可替代性结构的方式产生分歧，而这种结构在单独使用任一方法时几乎不可见。

英文摘要

We present an interpretable machine learning pipeline that decomposes cross-sectional equity return predictability into auditable factor contributions. We apply an XGBoost model with TreeSHAP attribution and conduct stress testing on 3,632 Chinese A-share stocks from 2009 to 2019. Using 60-month rolling windows over 55 months of out-of-sample data, XGBoost obtains a mean AUC of 0.547 and a long-short spread of +2.38%/month between the top and bottom quintiles (Newey-West t = 5.94; annualized Sharpe 2.23). This alpha persists after adjusting for the Carhart four-factor model (+2.31%/month; t = 7.48). SHAP decomposition indicates that behavioral signals (turnover and momentum) account for 58.2% of predictive attribution, compared to 10.7% for valuation ratios, on average across 55 industry groups. Ablation analysis cross-validates this ranking and provides evidence that SHAP and ablation diverge in a manner that highlights a feature-substitutability structure largely invisible to either method used in isolation.
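For readers unfamiliar with the two portfolio statistics quoted in the abstract, the sketch below shows one common way to compute a top-minus-bottom quintile spread for a single month and an annualized Sharpe ratio from a series of monthly spreads. The equal weighting, the sample standard deviation, and the sqrt(12) annualization are assumptions of this sketch; the paper's exact construction is not stated in the abstract.

```python
import math

def quintile_spread(scores, returns):
    """Mean return of the top-score quintile minus the bottom-score quintile."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(len(order) // 5, 1)          # stocks per quintile
    bottom, top = order[:k], order[-k:]

    def mean(idx):
        return sum(returns[i] for i in idx) / len(idx)

    return mean(top) - mean(bottom)

def annualized_sharpe(monthly_spreads):
    """Mean over sample standard deviation, scaled by sqrt(12) months."""
    n = len(monthly_spreads)
    mu = sum(monthly_spreads) / n
    var = sum((x - mu) ** 2 for x in monthly_spreads) / (n - 1)
    return mu / math.sqrt(var) * math.sqrt(12)

# Hypothetical month: ten stocks, model scores and realized returns.
scores = list(range(10))
returns = [0.01 * s for s in scores]
print(quintile_spread(scores, returns))
```

Here `scores` would be the model's predicted probabilities for one month and `returns` the realized returns of the same stocks over the following month.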

2606.12971 2026-06-12 cs.LG 新提交

Predicting Cognitive Load from Speech and Interaction Dynamics in Dyadic Conversations

从二元对话中的语音和交互动态预测认知负荷

Tahiya Chowdhury

发表机构 * Department of Computer Science, Colby College（科尔比学院计算机科学系）

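AI summary: Studies the prediction of perceived cognitive load from speech and interaction-dynamics features in natural collaborative conversations, finding that conversational interaction (such as turn-taking) effectively predicts cognitive load dimensions such as time pressure and mental work.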

Comments Accepted to Interspeech 2026

AI中文摘要

从语音估计认知负荷主要在受控实验室环境中研究，对其在自然协作对话中的可靠性了解有限。我们研究语音和交互动态是否能预测二元对话中的感知认知负荷。我们分析了53对执行九项协作任务的对话音频，提取静态声学、动态和交互特征，训练双头门控循环单元编码器预测认知负荷分数。结果表明，对话交互为预测与时间压力、脑力工作、努力和任务表现相关的认知负荷提供了有用信号。时间需求与话轮转换动态（如重叠和说话者切换）相关，而脑力需求与说话者之间的不平衡参与相关。这些发现强调了任务结构和对话交互在自然协作环境中建模认知负荷的重要性。

英文摘要

Estimating cognitive load from speech has largely been studied in controlled laboratory settings, with limited understanding of its reliability in natural collaborative conversations. We investigate whether speech and interaction dynamics predict perceived cognitive load during dyadic conversations. We analyze audio from 53 dyads performing nine collaborative tasks and extract static acoustic, dynamic, and interaction features to train a two-head Gated Recurrent Unit encoder to predict cognitive load scores. Results show conversational interaction provides useful signals for predicting cognitive load related to time pressure, mental work, effort, and task performance. Temporal demand is associated with turn-taking dynamics such as overlap and speaker switch, while mental demand is linked to imbalanced participation between speakers. These findings highlight the importance of task structure and conversational interaction for modeling cognitive load in natural collaborative settings.

2606.13007 2026-06-12 cs.LG cs.AI 新提交

scLLM-DSC: LLM-Knowledge Enhanced Cross-Modal Deep Structural Clustering for Single-Cell RNA Sequencing

scLLM-DSC：基于LLM知识增强的跨模态深度结构聚类用于单细胞RNA测序

Ping Xu, Pengjiang Li, Tian Du, Zaitian Wang, Jiawei Gu, Ziyue Qiao, Pengfei Wang, Yuanchun Zhou

发表机构 * Computer Network Information Center, Chinese Academy of Sciences（中国科学院计算机网络信息中心）； University of Chinese Academy of Sciences（中国科学院大学）； Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences（中国科学院大学杭州高等研究院）； School of Computing and Information Technology, Great Bay University（大湾区大学计算机科学与技术学院）； School of Engineering, Westlake University（西湖大学工学院）

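AI summary: Proposes the scLLM-DSC framework, which uses cross-modal contrastive alignment between a knowledge-driven semantic view and a structure-aware topological view to enhance clustering of single-cell RNA sequencing data with LLM knowledge, significantly outperforming existing methods.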

AI中文摘要

聚类是scRNA-seq分析的基础，是识别细胞群体和解析组织异质性的基石。然而，现有方法专注于挖掘数值统计模式，由于忽略了基因编码的内在生物学功能，存在语义不可知的问题。虽然大语言模型（LLM）提供了有前景的语义能力，但生成式预训练目标与判别式下游任务之间的结构不匹配阻碍了它们直接适应细胞聚类。为弥合这一差距，我们提出了scLLM-DSC，一种新颖的LLM知识增强跨模态深度结构聚类框架。与数据驱动范式不同，scLLM-DSC通过协同两个视图建立语义基础表示：从NCBI基因先验和上下文化的Cell2Sentence嵌入中提取的知识驱动语义视图，以及通过图引导编码器提取的结构感知拓扑视图。关键的是，我们引入了一种跨模态对比对齐机制，以在统一潜在空间中强制生物学语义与转录组特征之间的一致性。广泛的基准测试表明，scLLM-DSC在聚类准确性上显著优于十一个最先进的基线方法。

英文摘要

Clustering is fundamental to scRNA-seq analysis, serving as a cornerstone for identifying cell populations and resolving tissue heterogeneity. However, existing methods focus on mining numerical statistical patterns, suffering from semantic agnosticism by neglecting the intrinsic biological functions encoded by genes. While Large Language Models (LLMs) offer promising semantic capabilities, their direct adaptation to cell clustering is hindered by the structural mismatch between generative pre-training objectives and discriminative downstream tasks. To bridge this gap, we propose scLLM-DSC, a novel LLM-Knowledge Enhanced Cross-Modal Deep Structural Clustering framework. Diverging from data-driven paradigms, scLLM-DSC establishes a semantically-grounded representation by synergizing two views: a Knowledge-Driven Semantic View derived from NCBI gene priors and contextualized Cell2Sentence embeddings, and a Structure-Aware Topological View extracted via a graph-guided encoder. Crucially, we introduce a cross-modal contrastive alignment mechanism to enforce consistency between biological semantics and transcriptomic features within a unified latent space. Extensive benchmarks demonstrate that scLLM-DSC significantly outperforms eleven state-of-the-art baselines in clustering accuracy.

2606.13024 2026-06-12 cs.LG cs.AI 新提交

CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts

CausalMoE：基于模式路由异构专家的十亿规模多模态基础模型用于格兰杰因果发现

Bo Liu, Di Dai, Jingwei Liu, Jiarui Jin, Xiaocheng Fang, Guangkun Nie, Hongyan Li, Shenda Hong

发表机构 * State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University（北京大学智能科学与技术学院通用人工智能国家重点实验室）； National Institute of Health Data Science, and Institute for Artificial Intelligence, Peking University（北京大学健康医疗大数据国家研究院、人工智能研究院）

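AI summary: Proposes CausalMoE, a billion-scale multimodal Granger causal foundation model that decouples dynamic regimes through a pattern-routed mixture of heterogeneous experts and combines causal self-attention with LLM/VLM priors to recover sparse causal graphs, achieving state-of-the-art results in supervised and few-shot settings.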

AI中文摘要

格兰杰因果发现（GCD）是分析复杂系统中时间依赖性的基础。然而，现有的神经GCD方法主要依赖“一刀切”范式，难以捕捉真实世界时间序列中固有的分布偏移和动态机制变化，常导致表示纠缠和虚假因果图。本文提出CausalMoE，一种十亿规模多模态格兰杰因果基础模型，显式建模补丁级异质性。CausalMoE引入模式路由混合异构专家，动态识别潜在时间模式并将补丁路由到专门领域专家，有效解耦机制特定动态与共享动态。为确保可解释的图恢复，我们设计了一种跨变量运行的因果感知自注意力机制，通过近端优化生成稀疏格兰杰因果图。此外，CausalMoE是首个集成LLM和VLM以对齐数值信号与文本和视觉先验的模型，在复杂场景中正则化因果估计。大量实验表明，CausalMoE在全监督基准上达到新最优，同时在传统方法失败的少样本设置中有效泛化。

英文摘要

Granger Causal Discovery (GCD) is fundamental for analyzing temporal dependencies in complex systems. However, existing neural GCD methods predominantly rely on a "one-size-fits-all" paradigm, struggling to capture distribution shifts and dynamic regime changes inherent in real-world time series. This often leads to entangled representations and spurious causal graphs. In this paper, we propose CausalMoE, a billion-scale multimodal Granger causal foundation model that explicitly models patch-level heterogeneity. CausalMoE introduces a Pattern-Routed Mixture of Heterogeneous Experts, which dynamically identifies latent temporal patterns and routes patches to specialized domain experts, effectively decoupling regime-specific mechanisms from shared dynamics. To ensure interpretable graph recovery, we design a Causality-Aware Self-Attention mechanism operating across variables, yielding sparse Granger causal graphs via proximal optimization. Furthermore, CausalMoE is the first to integrate LLMs and VLMs to align numerical signals with textual and visual priors, regularizing causal estimation in complex scenarios. Extensive experiments demonstrate that CausalMoE establishes a new state-of-the-art on fully supervised benchmarks, while effectively generalizing to few-shot settings where traditional methods fail.

2606.13060 2026-06-12 cs.LG 新提交

A green solvent screening tool for emerging materials via uncertainty aware, transformer enhanced transfer learning

一种面向新兴材料的绿色溶剂筛选工具：基于不确定性感知、Transformer增强的迁移学习

Ioannis Kouroudis, Simon Ternes, Zhaosu Gu, Gohar Ali Siddiqui, Marina Ustinova, Angelo Lembo, Alessio Gagliardi, Aldo Di Carlo

发表机构 * Technical University of Munich（慕尼黑工业大学）； Institute of Structure of Matter – National Research Council Rome (ISM-CNR)（罗马国家研究委员会物质结构研究所）； University of Rome "Tor Vergata"（罗马第二大学）

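AI summary: Proposes a transfer learning approach combining a pre-trained transformer model with uncertainty quantification to predict solubility parameters accurately from very little data, and develops a customizable green solvent screening tool.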

AI中文摘要

溶解度的准确预测仍然是材料科学和可持续化学中的一个核心挑战。特别是由于有机和混合光伏、电池、催化等新兴技术，溶剂使用量预计在未来几年将显著增加。因此，用更绿色的替代品取代溶剂至关重要。这正是机器学习可以产生重大影响的地方。然而，溶解度关键参数的数据有限，严重制约了机器学习的效能。在这项工作中，我们将预训练的QM9基础模型迁移到我们的应用中，所需数据极少。此外，该流程集成了不确定性量化，允许用户评估预测的置信度。作为基线，我们成功预测了存在大量数据库的汉森溶解度参数和介电常数。重要的是，我们在其他目标（如Gutmann供体和受体数）上实现了高模型性能，而这些目标的可获得数据极为有限。总体而言，我们通过高质量预测将溶解度描述符的数据量提高了数个数量级。为了有效传播，我们部署了一个易于使用、易于与高通量实验室集成、可定制的工具，用于排序和筛选可能的溶剂替代品。最后，我们重新发现了已知的绿色溶剂替代品，并提出了新的候选者，证明了其在寻找环保溶剂方面的相关性。

英文摘要

Accurate prediction of solubility remains a central challenge across materials science and sustainable chemistry. In particular, due to emerging technologies such as organic and hybrid photovoltaics, batteries, and catalysis, solvent usage is expected to increase significantly in the coming years. Therefore, substituting solvents with greener alternatives is vital. This is where machine learning can have substantial impact. However, the limited data on critical solubility parameters significantly constrains machine learning efficacy. In this work, we transfer a foundation model pre-trained on QM9 targets to our application with minimal data requirements. Additionally, the pipeline integrates uncertainty quantification, allowing the user to gauge the confidence of the predictions. As a baseline, we succeed in predicting the Hansen solubility parameters and the dielectric constant, for which extensive databases exist. Importantly, we achieve high model performance on additional targets, such as Gutmann donor and acceptor numbers, where the available data is extremely limited. Overall, we augment data on solubility descriptors by orders of magnitude with high-quality predictions. For effective dissemination, we deploy an easy-to-use, customizable tool, easily integrated with high-throughput labs, for ranking and screening possible solvent substitutes. Finally, we rediscover known green solvent alternatives and propose new candidates, demonstrating the tool's relevance for finding eco-friendly solvents.

2606.13174 2026-06-12 cs.LG cs.CL 新提交

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

与你合作得更好：将用户修正编译为编码代理的运行时强制

Yujun Zhou, Kehan Guo, Haomin Zhuang, Xiangqi Wang, Yue Huang, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Nuno Moniz, Nitesh V. Chawla, Xiangliang Zhang

发表机构 * University of Notre Dame（圣母大学）； IBM Research（IBM研究院）； Tencent AI Lab（腾讯AI实验室）

AI总结提出TRACE方法，通过将用户修正编译为原子规则并在运行时强制执行，显著减少编码代理在后续任务中的偏好违反，优于纯记忆方法。

AI中文摘要

交互式LLM代理正成为日常工作的组成部分，但它们并不会随着时间的推移而变得更易于合作：在一个会话中记住的修正可能在下一个会话中仍被违反。我们研究了偏好访问与偏好遵从之间的差距。在源自匿名真实用户摩擦案例的任务中，Mem0记忆仍然导致57.5%的适用偏好检查被违反。我们引入了测试时规则获取与编译强制（TRACE），这是一个用于编码代理运行时的即插即用技能层管道，它挖掘用户修正，将其重写为原子规则，并编译为运行时检查，这些检查必须在代理完成未来任务之前通过。与开发者提前编写的运行时检查不同，TRACE技能来自用户自己的聊天修正。我们通过在ClawArena编码代理任务和MemoryArena衍生的内存密集型任务上进行模拟用户参与实验来评估TRACE。在ClawArena上，TRACE将分布内任务的保留偏好违反从100.0%降低到37.6%，将分布外任务从100.0%降低到2.0%。在MemoryArena衍生的任务上，TRACE将分布内违反从100.0%降低到60.5%，同时在任务通过率上匹配或超过最强的记忆基线。这些结果表明，将修正编译为运行时强制可以解决记忆单独无法可靠解决的重复摩擦失败模式，减少用户在未来会话中重复相同修正的需求。实验代码可在 https://github.com/YujunZhou/TRACE_exp 获取，可部署的技能可在 https://github.com/YujunZhou/tellonce 获取。

英文摘要

Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We study this gap between preference access and preference compliance. In tasks derived from anonymized real-user friction cases, Mem0 memory still leaves 57.5% of applicable preference checks violated. We introduce Test-time Rule Acquisition and Compiled Enforcement (TRACE), a drop-in skill-layer pipeline for coding-agent runtimes that mines user corrections, rewrites them as atomic rules, and compiles them into runtime checks that must pass before an agent completes future tasks. Unlike runtime checks written ahead of time by developers, TRACE skills come from the user's own chat corrections. We evaluate TRACE with simulated user-in-the-loop experiments on ClawArena coding-agent tasks and MemoryArena-derived memory-intensive tasks. On ClawArena, TRACE reduces held-out preference violation from 100.0% to 37.6% on in-distribution tasks and from 100.0% to 2.0% on out-of-distribution tasks. On MemoryArena-derived tasks, TRACE reduces in-distribution violation from 100.0% to 60.5% while matching or exceeding the strongest memory baseline on task pass. These results suggest that compiling corrections into runtime enforcement can address a repeated-friction failure mode that memory alone does not reliably solve, reducing the need for users to restate the same correction across future sessions. Experiment code is available at https://github.com/YujunZhou/TRACE_exp, and the deployable skill is available at https://github.com/YujunZhou/tellonce.
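The idea of turning corrections into checks that gate task completion can be illustrated with a minimal sketch. This is not the TRACE implementation: the `Rule` type, the two example rules, and the `can_complete` gate are hypothetical names chosen here, and real rules would be mined from chat corrections rather than written by hand.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """One atomic preference, expressed as a check on the agent's output."""
    name: str
    check: Callable[[str], bool]  # returns True when the output complies


def compile_rules() -> List[Rule]:
    # Hypothetical rules, written as if mined from earlier user corrections.
    return [
        Rule("no_print_debugging",
             lambda out: re.search(r"^\s*print\(", out, re.M) is None),
        Rule("has_return_annotation", lambda out: "->" in out),
    ]


def violated_rules(output: str, rules: List[Rule]) -> List[str]:
    """Names of all rules the output fails."""
    return [rule.name for rule in rules if not rule.check(output)]


def can_complete(output: str, rules: List[Rule]) -> bool:
    """The task may finish only when every compiled check passes."""
    return not violated_rules(output, rules)


rules = compile_rules()
draft = "def add(a, b):\n    print(a)\n    return a + b\n"
fixed = "def add(a: int, b: int) -> int:\n    return a + b\n"
draft_violations = violated_rules(draft, rules)
fixed_ok = can_complete(fixed, rules)
```

The design point is that a remembered preference only helps if something enforces it; here the enforcement is a plain predicate evaluated before the agent is allowed to report completion.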

2606.13236 2026-06-12 cs.LG cs.AI cs.SD stat.AP 新提交

Decoding Insect Song: A Multitask Semisupervised Orthoptera Bioacoustic Classifier

解码昆虫之歌：一种多任务半监督直翅目生物声学分类器

Olga Isupova, Danil Kuzin, Ella Browning, Tom Mills, Steven Reece

发表机构 * University of Oxford（牛津大学）

AI总结提出PULSE半监督多任务框架，结合弱监督分类、自监督学习和知识蒸馏，在直翅目生物声学分类中优于通用模型，并通过主动学习进一步提升性能。

Comments ICML 2026 Workshop on Machine Learning for Audio

2606.13252 2026-06-12 cs.LG 新提交

To GAN or Not To GAN: Segmentation Analysis on Mars DEM

生成对抗还是非生成对抗：火星DEM上的分割分析

Douglas Dziedzorm Agbeve, Aditya V. Handrale, Salim Fares, Seif E. Idani

发表机构 * University of Passau（帕绍大学）

AI总结使用监督语义分割和生成对抗方法自动检测火星上的土丘，并比较两种方法，发现添加人工生成数据并未改善结果。

2606.13285 2026-06-12 cs.LG cs.AI 新提交

Once-for-All: Scalable Simultaneous Forecasting via Equilibrium State Estimation

Once-for-All: 基于均衡状态估计的可扩展同步预测

Beinan Xu, Andy Song, Jiti Gao, Feng Liu

发表机构 * RMIT University（皇家墨尔本理工大学）； Monash University（莫纳什大学）； University of Adelaide（阿德莱德大学）

AI总结提出均衡状态估计（ESE）范式，通过一次前向传播估计多系统均衡状态并基于状态差异生成预测，在保持精度的同时实现10-70倍加速，且具有线性时间复杂度和鲁棒性。

Comments Accepted by ICML 2026

AI中文摘要

我们引入均衡状态估计（ESE），一种用于同步预测的新范式，其中多个相互作用的系统需要独立但协调的预测。这种场景在现实世界中经常出现，例如经济学和医疗建模。与一次预测一个系统的现有方法不同，ESE在一次前向传播中预测所有系统。它首先估计跨系统的均衡状态，然后基于当前状态与估计均衡之间的差异生成整体预测。在合成和真实世界数据集（包括货币汇率和COVID-19传播建模）上的大量实验表明，ESE至少与最先进（SOTA）方法一样准确，同时速度显著更快。此外，ESE与传统预测器无缝集成，结合了它们的准确性和其卓越的效率，实现了10-70倍的加速。凭借线性时间复杂度，随着系统数量的增加，ESE的扩展性远优于SOTA方法。此外，它在各种扰动下仍保持准确，使ESE成为一种快速、可泛化、鲁棒且可扩展的多预测方法。

英文摘要

We introduce Equilibrium State Estimation (ESE), a novel paradigm for simultaneous prediction, where multiple interacting systems require separate yet coordinated forecasts. Such scenarios often arise in real-world settings such as economics and healthcare modeling. Unlike existing approaches that predict one system at a time, ESE forecasts all systems in a single pass. It first estimates the equilibrium state across systems, then generates holistic forecasts based on the difference between the current state and the estimated equilibrium. Extensive experiments on synthetic and real-world datasets, including currency exchange and COVID-19 spread modeling, demonstrate that ESE is at least as accurate as state-of-the-art (SOTA) methods while being significantly faster. In addition, ESE integrates seamlessly with conventional predictors, combining their accuracy with its exceptional efficiency and delivering a 10-70x speedup. With linear-time complexity, ESE scales far better than SOTA methods as the number of systems increases. Moreover, it remains accurate under diverse perturbations, establishing ESE as a fast, generalizable, robust, and scalable multi-prediction method.

2606.13311 2026-06-12 cs.LG cs.AI 新提交

保持特征的潜在EnKF用于含激波流动的数据同化

Hemanth Chandravamsi, Hangchuan Hu, Ponkrshnan Thiagarajan, Tamer A. Zaki

发表机构 * Department of Mechanical Engineering, Johns Hopkins University（约翰霍普金斯大学机械工程系）

AI总结针对含激波流动中EnKF因多模态统计产生伪振荡的问题，提出在学习的低维潜在空间进行集合更新以保持激波特征，并通过共享解码器恢复物理状态，数值实验验证了无伪振荡的准确特征恢复。

AI中文摘要

集合卡尔曼滤波（EnKF）被广泛用于顺序数据同化，但对于具有间断的解（如可压缩流中的激波）会失效。激波位置的不确定性导致多模态集合统计，违反了EnKF的高斯假设，在分析状态中产生大尺度伪振荡。我们引入了一种保持特征的潜在EnKF，在学习的低维潜在空间中进行集合更新，其中激波和流动特征具有光滑流形表示，从而在EnKF分析期间保持尖锐特征。更新后的潜在状态通过所有集合成员共享的解码器映射回物理状态。该算法消除了先前方法中使用的成员特定有序训练和正性下限。在Sod激波管和马赫2激波与二维圆柱相互作用的数值实验中，使用稀疏和噪声观测，结果显示能够准确恢复激波和接触间断的特征，且无伪振荡。

英文摘要

The ensemble Kalman filter (EnKF) is widely adopted for sequential data assimilation, but fails for solutions with discontinuities, such as shocks in compressible flows. Uncertainty in shock location induces multimodal ensemble statistics that violate the Gaussian assumptions underlying the EnKF, producing large-scale spurious oscillations in the analysis state. We introduce a feature-preserving latent-EnKF that performs the ensemble update in a learned low-dimensional latent space, where shock and flow features admit a smooth manifold representation, thereby preserving sharp features during EnKF analysis. The updated latent state is mapped back to the physical state through a decoder shared by all ensemble members. The algorithm eliminates the member-specific ordered training and positivity flooring used in prior approaches. Numerical experiments on a Sod shock tube and on a Mach 2 shock interacting with a 2D cylinder, using sparse and noisy observations, show accurate feature recovery of shocks and contact discontinuities without spurious oscillations.
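For orientation, the standard stochastic (perturbed-observation) EnKF analysis step that the abstract builds on can be written as below. The notation is introduced here and is an assumption: $x_i^f$ and $x_i^a$ are the forecast and analysis states of member $i$, $N$ is the ensemble size, $H$ is a linear observation operator, $R$ is the observation-error covariance, and $y$ is the observation vector.

```latex
\begin{align}
P^f &= \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i^f-\bar{x}^f\right)\left(x_i^f-\bar{x}^f\right)^{\top},\\
K &= P^f H^{\top}\left(H P^f H^{\top}+R\right)^{-1},\\
x_i^a &= x_i^f + K\left(y+\epsilon_i-Hx_i^f\right),\qquad \epsilon_i\sim\mathcal{N}(0,R).
\end{align}
```

Because the update is linear in the ensemble anomalies, averaging members whose shocks sit at different locations smears or overshoots the discontinuity. Read schematically, the latent variant described in the abstract would apply an update of this kind to latent codes $z_i$ and then recover $x_i^a=\mathcal{D}(z_i^a)$ with a single decoder $\mathcal{D}$ shared by all members; how observations enter the latent update is not specified in the abstract.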

2606.12806 2026-06-12 quant-ph cs.LG 交叉投稿

Quantum Reservoir Computing for Short-Term Power Load Forecasting in Resource-Constrained Energy Systems

量子储层计算在资源受限能源系统中的短期电力负荷预测

Mansi Od, Param Pathak, Nouhaila Innan, Muhammad Shafique

发表机构 * University of Waterloo（滑铁卢大学）

AI总结提出一种硬件高效的量子储层计算框架，通过固定量子储层和压缩经典读出层，在有限内存和硬件噪声下实现短期负荷预测，6位量化保留全精度性能并减少81.2%内存。

Comments 11 pages, 9 figures

AI中文摘要

短期负荷预测对于可靠的能源管理至关重要，但在边缘设备上的实际部署需要模型在有限内存、有限测量预算和硬件噪声下保持准确性。本文提出一种硬件高效的量子储层计算（QRC）框架用于能源负荷预测，其中固定量子储层将时间输入窗口转换为高维特征，仅训练经典弹性网络读出层。为降低部署成本，训练后的读出层通过训练后定点量化压缩，位宽从8位到2位。该框架在Tetouan和Spain能源负荷数据集上评估，采用精确态矢量模拟、512次有限采样以及来自IBM FakeTorino和IBM FakeMarrakesh的真实硬件噪声模型。结果表明，6位读出精度保持全精度预测性能，同时将读出内存减少81.2%。低于此阈值时，性能退化依赖于数据集，Tetouan表现出更强的敏感性，而Spain退化更缓慢。硬件噪声验证进一步表明，训练后的读出层可转移到噪声储层状态而无需重新训练。这些发现支持量化QRC作为近期量子时间序列应用的资源感知预测方法。

英文摘要

Short-term load forecasting is essential for reliable energy management, but practical deployment on edge devices requires models that remain accurate under limited memory, finite measurement budgets, and hardware noise. This work proposes a hardware-efficient Quantum Reservoir Computing (QRC) framework for energy load forecasting, where a fixed quantum reservoir transforms temporal input windows into high-dimensional features and only a classical Elastic Net readout is trained. To reduce deployment cost, the trained readout is compressed using post-training fixed-point quantization at bit widths from 8 to 2 bits. The framework is evaluated on the Tetouan and Spain energy load datasets under exact statevector simulation, 512-shot finite sampling, and realistic hardware-noise models from IBM FakeTorino and IBM FakeMarrakesh. Results show that 6-bit readout precision preserves full-precision forecasting performance while reducing readout memory by 81.2%. Below this point, degradation becomes dataset dependent, with Tetouan showing stronger sensitivity and Spain degrading more gradually. Hardware-noise validation further shows that the trained readout transfers to noisy reservoir states without retraining. These findings support quantized QRC as a resource-aware forecasting approach for near-term quantum time-series applications.
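The memory figure can be sanity-checked with a small sketch of post-training uniform quantization. The symmetric scheme, the function names, and the 32-bit floating-point baseline are assumptions made here, not details taken from the paper; under that baseline, 6-bit storage saves 1 - 6/32 = 81.25%, which is consistent with the reported 81.2%.

```python
import numpy as np


def quantize_fixed_point(weights, bits):
    """Symmetric uniform quantization to signed `bits`-bit integer codes."""
    qmax = 2 ** (bits - 1) - 1                  # largest signed level, e.g. 31 for 6 bits
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int32)
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return codes.astype(np.float64) * scale


def memory_reduction(bits, baseline_bits=32):
    """Fraction of storage saved relative to an assumed full-precision baseline."""
    return 1.0 - bits / baseline_bits


rng = np.random.default_rng(0)
readout = rng.normal(size=64)                   # stand-in for trained readout weights
codes, scale = quantize_fixed_point(readout, bits=6)
max_error = float(np.max(np.abs(readout - dequantize(codes, scale))))
reduction_6bit = memory_reduction(6)
```

Only the readout is quantized after training, so the reservoir features and the fitting procedure are untouched; the rounding error per weight is bounded by half a quantization step.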

2606.12838 2026-06-12 q-bio.QM cs.AI cs.LG q-bio.GN 交叉投稿

OCOO-T: A Simple and Scalable Virtual Cell Model for Transcriptional Perturbation Response Prediction

OCOO-T: 一种用于转录扰动响应预测的简单可扩展虚拟细胞模型

Danning Jiang, Zheming An, Yalong Zhao, Lipeng Lai

AI总结提出OCOO-T，一种基于流匹配的简约虚拟细胞模型，通过连续时间去噪和自适应层归一化，在多个基准上实现转录扰动预测的最优性能。

Comments 22 pages, 6 figures

AI中文摘要

预测单细胞对遗传、化学和细胞因子扰动的转录响应是计算生物学和AI虚拟细胞（AIVC）建模中的一个基本挑战，对药物发现和基因调控网络的阐明具有直接影响。现有方法通常依赖辅助细胞状态编码器、分层变分自编码器、专用Transformer编码器-解码器模块或基因相互作用先验，将高维表达谱压缩为潜在表示。虽然有效，但这些设计增加了架构复杂性，可能限制可扩展性和泛化性。本文介绍了OCOO-T，一种基于流匹配的简约AIVC模型，用于转录扰动响应预测。OCOO-T利用一个直接操作连续基因表达谱的普通Transformer堆栈，并将扰动响应预测表述为连续时间去噪过程。通过自适应层归一化和上下文令牌整合扰动嵌入、剂量信息以及细胞系/细胞类型特异性。在Tahoe100M、Replogle和PBMC基准上的全面评估表明，OCOO-T在多种扰动和细胞类型上实现了最先进的性能，同时通过细胞上下文的修补和拆补有效扩展到长转录谱。通过利用基于Transformer去噪的单细胞组学简单性，OCOO-T为计算机细胞模拟提供了一个有效且可扩展的框架。

英文摘要

Predicting single-cell transcriptional responses to genetic, chemical and cytokine perturbations is a fundamental challenge in computational biology and AI Virtual Cell (AIVC) modeling, with direct implications for drug discovery and the elucidation of gene regulatory networks. Existing approaches often rely on auxiliary cell-state encoders, hierarchical variational autoencoders, dedicated Transformer encoder-decoder modules, or gene-interaction priors to compress high-dimensional expression profiles into latent representations. While effective, these designs increase architectural complexity and may limit scalability and generalizability. This paper introduces OCOO-T, a minimalist flow-matching-based AIVC model for transcriptional perturbation response prediction. OCOO-T utilizes a vanilla Transformer stack that operates directly on continuous gene expression profiles and formulates perturbation response prediction as a continuous-time denoising process. Perturbation embeddings, dosage information, and cell-line/cell-type specificity are integrated through adaptive layer normalization and in-context tokens. Comprehensive evaluations on Tahoe100M, Replogle, and PBMC benchmarks demonstrate that OCOO-T achieves state-of-the-art performance across diverse perturbations and cell types while effectively scaling to long transcriptional profiles through patching and depatching of cellular contexts. By leveraging the simplicity of Transformer-based denoising for single-cell omics, OCOO-T provides an effective and scalable framework for in-silico cellular simulation.

2606.12916 2026-06-12 cs.AI cs.CL cs.LG 交叉投稿

MDForge: Agentic Molecular Dynamics Pipeline Design under Sparse Simulator Feedback

MDForge：稀疏模拟器反馈下的智能分子动力学流水线设计

Zehong Wang, Yijun Ma, Connor R. Schmidt, Tianyi Ma, Weixiang Sun, Ziming Li, Xiaoguang Guo, Chuxu Zhang, Matthew J. Webber, Yanfang Ye

发表机构 * University of Notre Dame（圣母大学）； University of Connecticut（康涅狄格大学）

AI总结提出MDForge，利用LLM智能体通过多智能体辩论将稀疏奖励稠密化，自动设计分子动力学流水线，在SAMPL基准上达到专家水平，并发现新型高亲和力CB[7]结合剂。

AI中文摘要

分子动力学（MD）是原子分子科学中经典的计算机模拟方法，从第一性原理物理模拟分子行为。为新系统设计MD流水线需要大量专业知识：即使在一个分子上运行也代价高昂，排除了试错法。我们使用LLM智能体自动化这一专家流水线设计过程。与现有编排预定义工具集的MD智能体不同，我们将流水线设计视为开放式代码生成，其中智能体的行为通过语言奖励在线重塑。具体而言，我们构建了MDForge，一个LLM智能体，其上下文更新规则通过物理专家间的多智能体辩论将稀疏奖励稠密化。在三个SAMPL主客体结合自由能基准上，MDForge自动设计的MD流水线与人类专家竞争。部署在未见过的候选客体库上，其CB[7]流水线发现了一种新型结合剂，湿实验竞争NMR证实其为高亲和力、皮摩尔级的CB[7]结合剂。我们的数据和代码可在 https://github.com/Zehong-Wang/MDForge 获取。

英文摘要

Molecular dynamics (MD) is the canonical in-silico method for atomistic molecular science, simulating molecular behavior from first-principles physics. Designing an MD pipeline for a new system requires substantial expert knowledge: running it on even one molecule is expensive, ruling out trial-and-error. We automate this expert pipeline-design process with an LLM agent. Unlike existing MD agents that orchestrate a predefined tool set, we treat pipeline design as open-ended code generation in which the agent's behavior is reshaped online by verbal reward. Specifically, we build MDForge, an LLM agent whose in-context update rule densifies the sparse reward via a multi-agent debate among physics experts. On three SAMPL host-guest binding free-energy benchmarks, MDForge automatically designs MD pipelines competitive with human experts. Deployed on a library of unseen candidate guests, its CB[7] pipeline discovers a novel binder that wet-lab competition NMR confirms is a high-affinity, picomolar CB[7] binder. Our data and code are available at https://github.com/Zehong-Wang/MDForge.

2606.13017 2026-06-12 q-bio.NC cs.LG 交叉投稿

Deep Sleep Classification via EEG Signal Criticality: A Passive BCI Approach for Sleep-Improvement Neurofeedback

基于EEG信号临界性的深度睡眠分类：一种用于改善睡眠神经反馈的被动BCI方法

Stanisław Narębski, Tomasz Komendziński, Tomasz M. Rutkowski

AI总结本研究利用去趋势波动分析（DFA）提取的临界性特征，通过朴素贝叶斯分类器实现了对深度睡眠（N3）的高精度识别（平衡准确率87.17%），为被动脑机接口中的状态依赖神经反馈提供了高效感知机制。

Comments 7 pages, 3 figures, accepted for publication in the Proceedings of the 10th Graz Brain-Computer Interface Conference 2026, Graz, Austria, September 14-17, 2026

AI中文摘要

自动睡眠分期是被动脑-机接口（pBCI）的一项基础应用，它解码自发神经状态以实现独立于用户意图的闭环干预。本研究评估了从去趋势波动分析（DFA）中提取的临界性特征，用于特定识别深度睡眠（N3）。我们分析了来自290名老年女性的347,232个EEG时段，使用UMAP流形学习可视化状态转换。随后，通过10折交叉验证对六个分类器进行基准测试，使用平衡准确率确定用于神经反馈的最佳“状态感知”引擎。朴素贝叶斯达到了最高的平均平衡准确率（87.17% ± 0.24%），显著优于全连接深度神经网络（FNN：81.58%）和随机森林（80.97%）。线性模型（LDA：57.21%；SVM：51.01%）表现不佳，表明DFA衍生的临界性特征位于一个独特的非线性流形上。EEG临界性的概率解码为pBCI提供了一种高精度的感知机制。这种稳健的分类流程支持开发状态依赖的神经反馈，例如靶向听觉刺激，以增强认知恢复。

英文摘要

Automated sleep staging is a fundamental application of passive Brain-Computer Interfaces (pBCI), decoding spontaneous neural states to enable closed-loop interventions independent of user intent. This study evaluates criticality features derived from Detrended Fluctuation Analysis (DFA) for the specific identification of deep sleep (N3). We analyzed $347,232$ EEG epochs from $290$ older women using UMAP manifold learning to visualize state transitions. Subsequently, six classifiers were benchmarked via 10-fold cross-validation, using balanced accuracy to determine the optimal "state-sensing" engine for neurofeedback. Naive Bayes achieved the highest mean balanced accuracy ($87.17\% \pm 0.24\%$), significantly outperforming a fully connected deep neural network (FNN: $81.58\%$) and Random Forest ($80.97\%$). Linear models (LDA: $57.21\%$; SVM: $51.01\%$) performed poorly, indicating that DFA-derived criticality features reside on a distinct, non-linear manifold. Probabilistic decoding of EEG criticality provides a high-accuracy sensing mechanism for pBCIs. This robust classification pipeline supports the development of state-dependent neurofeedback, such as targeted auditory stimulation, to enhance cognitive recovery.
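As background on the feature itself, the following is a minimal sketch of first-order DFA on a one-dimensional signal. It is a textbook formulation, not the study's pipeline: the window sizes, the linear detrending order, and the synthetic test signals are assumptions made here. The scaling exponent is the slope of log fluctuation against log window size, close to 0.5 for white noise and close to 1.5 for a random walk.

```python
import numpy as np


def dfa_exponent(signal, window_sizes):
    """Estimate the DFA scaling exponent of a 1-D signal (linear detrending)."""
    signal = np.asarray(signal, dtype=float)
    profile = np.cumsum(signal - signal.mean())        # integrated, mean-centred series
    fluctuations = []
    for n in window_sizes:
        n_windows = len(profile) // n
        segments = profile[: n_windows * n].reshape(n_windows, n)
        t = np.arange(n)
        # Fit one straight line per window; polyfit accepts one column per window.
        coeffs = np.polyfit(t, segments.T, deg=1)      # shape (2, n_windows)
        trends = np.outer(t, coeffs[0]) + coeffs[1]    # shape (n, n_windows)
        residuals = segments.T - trends
        fluctuations.append(np.sqrt(np.mean(residuals ** 2)))
    # The exponent is the slope of log F(n) versus log n.
    slope, _ = np.polyfit(np.log(window_sizes), np.log(fluctuations), deg=1)
    return float(slope)


rng = np.random.default_rng(0)
windows = np.array([16, 32, 64, 128, 256])
white_noise = rng.normal(size=4096)
random_walk = np.cumsum(rng.normal(size=4096))
alpha_white = dfa_exponent(white_noise, windows)
alpha_walk = dfa_exponent(random_walk, windows)
```

In a sleep-staging setting, an exponent like this would be computed per EEG epoch (and possibly per channel or band) and passed to the classifier as a feature.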

2606.13133 2026-06-12 cs.DS cs.LG 交叉投稿

Learning-Augmented Approximation for Unrelated-Machines Makespan Scheduling

学习增强的无关联机器调度近似算法

Kaito Baba, Evripidis Bampis, Giorgos Mitropoulos

AI总结针对无关联机器调度问题，提出学习增强算法，利用重作业分配预测实现精确预测时(1+ε)-近似，误差增大时退化为2-近似。

Comments 22 pages, 3 figures

2606.13136 2026-06-12 cs.CV cs.LG eess.IV 交叉投稿

An Extensible and Lightweight Unified Architecture for Demosaicing Pixel-bin Image Sensors

一种可扩展且轻量级的统一架构用于像素合并图像传感器的去马赛克

Saurabh Kumar, Nutan Sairam Yenneti

发表机构 * Samsung Research Institute Bangalore（三星研究院班加罗尔分院）

AI总结提出模块化统一架构，通过无学习CFA识别模块和轻量级设计，实现多种像素合并传感器的去马赛克，提升图像质量并降低资源消耗。

2606.13216 2026-06-12 cs.CL cs.LG 交叉投稿

Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization

分层最优传输用于神经机器翻译和抽象摘要中的幻觉检测

Mariia Onyshchuk, Maksym-Vasyl Tarnavskyi, Marta Sumyk

发表机构 * Fairseq ； AggreFact

AI总结通过最优传输分析跨注意力分布，发现幻觉检测集中于解码器前四层，且该方法在源脱离时有效，但无法检测注意力下游的不忠实摘要。

Comments Accepted to ICML Mechanistic Interpretability Workshop 2026

AI中文摘要

最优传输（OT）已被证明可以通过测量跨注意力分布与参考分布之间的几何距离来检测神经机器翻译（NMT）中的幻觉，无需任何监督。我们将此分析扩展到Fairseq DE-EN模型的所有六个解码器层（$N=3{,}414$），表明Wass-to-Unif和Wass-to-Data是互补的检测器，专门针对不同类型的幻觉；检测集中在L1--L4层，而L5层对较微妙的类型具有反预测性；并且幻觉翻译缺乏正确翻译从第一步解码开始就存在的探索性注意力阶段。我们进一步评估了几何信号是否可迁移到抽象摘要忠实性检测：在AggreFact（$N=1{,}116$）上，我们的无监督OT检测器在CNN/XSum上达到$57.2\%$/$57.6\%$的平衡准确率——高于随机水平，但远低于有监督的MiniCheck-Flan-T5-L（$69.9\%$/$74.3\%$）。这种差距是原则性的：与NMT幻觉不同，不忠实的摘要可以正确关注源标记，同时歪曲其内容，这种失败模式在基于集中度的OT指标中由于构造原因而不可见。在T5-base上的结构实验证实了解码器在深度上的一致组织，其中第3层显示峰值集中度，第12层对生成质量最为关键。总之，结果确立了当失败模式是源脱离时，跨注意力的OT是一种可靠的检测器；无论任务如何，它都是一种原则性的可解释性工具；而当忠实性失败发生在注意力下游时，它则具有根本局限性。

英文摘要

Optimal transport (OT) has been shown to detect hallucinations in neural machine translation (NMT) by measuring the geometric distance between cross-attention distributions and a reference distribution, without any supervision. We extend this analysis to all six decoder layers of the Fairseq DE-EN model ($N=3{,}414$), showing that Wass-to-Unif and Wass-to-Data are complementary detectors specialised across hallucination types, that detection is concentrated in layers L1--L4 with L5 anti-predictive for subtler types, and that hallucinated translations lack the exploratory attention phase present in correct translations from the first decoding step. We further evaluate whether the geometric signal transfers to abstractive summarization faithfulness detection: our unsupervised OT detector on AggreFact ($N=1{,}116$) achieves $57.2\%$/$57.6\%$ balanced accuracy on CNN/XSum -- above chance but substantially below supervised MiniCheck-Flan-T5-L ($69.9\%$/$74.3\%$). This gap is principled: unlike NMT hallucinations, unfaithful summaries can attend correctly to source tokens while misrepresenting their content, a failure mode invisible to concentration-based OT metrics by construction. Structural experiments on T5-base confirm consistent decoder organisation across depth, with Layer 3 showing peak concentration and Layer 12 being most critical for generation quality. Together, the results establish OT on cross-attention as a reliable detector when the failure mode is source disengagement, a principled interpretability tool regardless of task, and fundamentally limited when faithfulness failures occur downstream of attention.
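To make the notion of a distance between an attention distribution and a reference concrete, here is a minimal sketch of a 1-Wasserstein distance to the uniform distribution over source positions. The ground cost is an assumption made here (absolute difference between token positions, which gives a closed form through cumulative sums); the detectors in the paper may use a different cost, and `wasserstein_to_uniform` is a name chosen for illustration.

```python
import numpy as np


def wasserstein_to_uniform(attention):
    """1-Wasserstein distance from an attention distribution to uniform.

    Assumes source tokens sit at positions 0..n-1 with unit spacing, so the
    distance equals the summed absolute difference between the two CDFs.
    """
    attention = np.asarray(attention, dtype=float)
    attention = attention / attention.sum()           # normalise defensively
    uniform = np.full_like(attention, 1.0 / attention.size)
    return float(np.abs(np.cumsum(attention) - np.cumsum(uniform)).sum())


spread_attention = [0.25, 0.25, 0.25, 0.25]   # mass spread evenly over the source
peaked_attention = [1.0, 0.0, 0.0, 0.0]       # all mass on the first source token
distance_spread = wasserstein_to_uniform(spread_attention)
distance_peaked = wasserstein_to_uniform(peaked_attention)
```

A score of this kind measures only how concentrated the attention is, which matches the abstract's point that such metrics cannot see an output that attends to the right tokens but misstates their content.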

2606.13220 2026-06-12 cs.AI cs.CE cs.ET cs.LG cs.MA 交叉投稿

LLM-as-an-Investigator: Evidence-First Reasoning for Robust Interactive Problem Diagnosis

LLM作为调查员：基于证据优先的鲁棒交互式问题诊断

Fabrizio Marozzo, Pietro Liò

发表机构 * University of Calabria（卡拉布里亚大学）； University of Cambridge（剑桥大学）

AI总结提出证据优先的AI方法LLM-as-an-Investigator，通过估计问题歧义、生成假设、提问澄清并更新概率，避免过早接受用户假设，提升诊断准确性。

AI中文摘要

大型语言模型（LLM）越来越多地被用作技术问题解决的交互式助手。然而，当用户提供不完整的描述或看似合理但未经证实的解释时，LLM可能会过早地认同这些假设，并在收集足够证据之前提出解决方案。我们将这种行为称为用户驱动的谄媚：LLM倾向于强化用户提供的假设，而不是测试其他解释。本文介绍了LLM-as-an-Investigator，一种基于证据优先的智能体AI方法，用于鲁棒的问题诊断。该方法通过一个解决方案调查智能体实现，该智能体估计初始问题描述的模糊性，生成候选假设，提出有针对性的澄清问题，并在每次回答后更新假设概率。该智能体不是立即给出响应，而是继续调查，直到证据使一个候选解释比其他解释更强。为了评估该方法，我们从机械、电气和液压领域已解决的技术论坛帖子中构建了一个基准测试。我们使用一个三智能体评估流程：问题-解决方案提取智能体将已解决的帖子转换为结构化案例，真实答案评估智能体在隐藏已知解决方案的同时模拟用户，被测试的助手通过对话尝试恢复解决方案。实验比较了标准助手、面向推理的LLM和基于调查员的模型，使用不同的LLM骨干网络。除了诊断准确性，我们还分析了标准助手在诊断案例中如何遵循误导性的用户假设。结果表明，所提出的方法比直接提示和仅推理基线更准确地识别问题，而其证据优先协议有助于减少用户引发的对话偏差。

英文摘要

Large language models (LLMs) are increasingly used as interactive assistants for technical problem solving. However, when users provide incomplete descriptions or plausible but unverified explanations, LLMs may prematurely align with these assumptions and propose solutions before collecting sufficient evidence. We refer to this behavior as user-driven sycophancy: the tendency of an LLM to reinforce a user-provided hypothesis instead of testing alternative explanations. This paper introduces LLM-as-an-Investigator, an evidence-first agentic AI methodology for robust problem diagnosis. The approach is implemented through a Solution Investigator Agent, which estimates the ambiguity of an initial problem description, generates candidate hypotheses, asks targeted clarification questions, and updates hypothesis probabilities after each answer. Rather than producing an immediate response, the agent continues the investigation until the evidence makes one candidate explanation stronger than the alternatives. To evaluate the approach, we build a benchmark from solved technical forum threads in mechanical, electrical, and hydraulic domains. We use a three-agent evaluation pipeline in which a Problem-Solution Extractor Agent converts solved threads into structured cases, a Ground-Truth Evaluator Agent simulates the user while hiding the known solution, and the tested assistant attempts to recover the solution through dialogue. The experiments compare standard assistants, reasoning-oriented LLMs, and the proposed investigator-based model across LLM backbones. In addition to diagnostic accuracy, we analyze how standard assistants follow misleading user hypotheses in diagnostic cases. The results show that the proposed approach identifies the problem more accurately than direct prompting and reasoning-only baselines, while its evidence-first protocol helps reduce user-induced conversational bias.
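The step of updating hypothesis probabilities after each answer can be sketched as a Bayes update with a stopping rule. This is an illustrative reading, not the paper's algorithm: the likelihood values, the `margin` threshold, and the example hypotheses are assumptions made here, and in practice the likelihoods would have to be estimated, for instance by the LLM itself.

```python
from typing import Dict


def update_hypotheses(priors: Dict[str, float],
                      likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Bayes update: posterior is proportional to prior times likelihood."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("the answer is impossible under every hypothesis")
    return {h: value / total for h, value in unnormalised.items()}


def should_stop(posteriors: Dict[str, float], margin: float = 0.3) -> bool:
    """Stop asking once the leading hypothesis beats the runner-up by `margin`."""
    ranked = sorted(posteriors.values(), reverse=True)
    return len(ranked) < 2 or ranked[0] - ranked[1] >= margin


# Two candidate causes, initially equally likely (hypothetical example).
priors = {"blown fuse": 0.5, "seized pump": 0.5}
# Assumed probability of the user's answer ("no humming noise") under each cause.
likelihoods = {"blown fuse": 0.8, "seized pump": 0.2}
posteriors = update_hypotheses(priors, likelihoods)
done = should_stop(posteriors)
```

Keeping the user's own suggested cause as just one entry in `priors`, rather than accepting it outright, is what lets later answers overturn it.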

2606.13302 2026-06-12 cs.AI cs.LG 交叉投稿

Physics-Guided Spatiotemporal Learning for Coastal Wave Peak Period Estimation from Video

物理引导的时空学习用于从视频估计海岸波浪峰值周期

Abubakar Hamisu Kamagata, Dharm Singh Jat, Attlee Munyaradzi Gamundani, Abhishek Srivastava, Paramasivam Saravanakumar

发表机构 * Namibia University of Science and Technology（纳米比亚科技大学）； Indian Institute of Technology Indore（印度理工学院印多尔分校）； Namdeb Diamond Corporation（纳米比亚钻石公司）

AI总结提出物理引导的深度时空学习框架，结合自动区域检测、模拟到真实迁移学习和物理信息正则化，从海岸视频直接估计近岸波浪峰值周期，验证了基于Transformer和轻量级循环卷积架构的有效性。

AI中文摘要

近岸波浪参数对于海岸工程、海岸线保护、海洋灾害评估和气候适应性的海岸管理至关重要。传统的监测系统如浮标和雷达平台提供精确监测，但安装和维护成本高，空间覆盖有限。通过深度学习实现了使用视频的被动海洋监测，然而许多方法在海洋学上缺乏物理可解释性、可行性和验证。本文提出了一种物理引导的深度时空学习框架，用于从被动海岸视频流直接估计近岸波浪峰值周期。该框架结合了基于自动时间方差感兴趣区域检测、多阶段模拟到真实迁移学习和物理信息正则化，以提高预测精度和物理一致性。评估了多种时空架构，如基于Transformer和循环卷积的架构，以及合成预训练、银标签自适应和专家微调。结果表明，基于Transformer的架构在瞬时预测精度方面表现更好，而轻量级循环卷积架构实现了更高的时间稳定性和操作海洋学技能。消融研究也证明了物理引导正则化在趋势跟随一致性和减少物理上不可信预测方面的益处。可解释性审计有助于将注意力集中在水动力活跃的碎波带区域，并与物理推导的波浪传播行为良好吻合。总体而言，所提出的框架展示了基于物理引导视频的深度学习系统在长期海岸波浪监测中的潜力，具有成本效益和操作可行性。

英文摘要

Wave parameters in the nearshore are crucial for coastal engineering, shoreline protection, marine hazard assessment, and coastal management for climate resilience. Traditional monitoring systems like buoys and radar platforms offer accurate monitoring but can have high installation and maintenance expenses and limited spatial coverage. Passive ocean monitoring using video has been achieved by leveraging deep learning; however, many methods are not physically interpretable, feasible, or validated for oceanography. In this work, a Physics-Guided Deep Spatiotemporal Learning Framework for direct estimation of nearshore wave peak periods from passive coastal video streams is proposed. The framework combines automated temporal-variance-based region-of-interest detection, multi-stage Sim-to-Real transfer learning, and physics-informed regularization to enhance predictive accuracy and physical consistency. A variety of spatiotemporal architectures were assessed, such as transformer-based and recurrent-convolutional ones, alongside synthetic pretraining, silver-label adaptation, and expert fine-tuning. The results show that transformer-based architectures performed best in terms of instantaneous prediction accuracy, while lightweight recurrent-convolutional architectures achieved higher temporal stability and operational oceanographic skill. Ablation studies also demonstrated the benefits of physics-guided regularization in terms of trend-following consistency and fewer physically implausible predictions. Explainability auditing showed that attention concentrates in hydrodynamically active surf-zone regions, in good agreement with physically derived wave propagation behavior. In general, the proposed framework shows the promise of physics-guided video-based deep learning systems for long-term coastal wave monitoring that are cost-efficient and operationally feasible.

2606.13515 2026-06-12 cs.CV cs.LG cs.RO 交叉投稿

MaskWAM: Unifying Mask Prompting and Prediction for World-Action Models

MaskWAM：统一掩码提示与预测的世界-动作模型

Hanyang Yu, Haitao Lin, Jingbo Zhang, Wenyao Zhang, Chenghao Gu, Heng Li, Ping Tan

发表机构 * The Hong Kong University of Science and Technology（香港科技大学）； Tencent Robotics X（腾讯机器人X实验室）； Tsinghua University（清华大学）

AI总结提出MaskWAM，通过统一掩码输入与预测的混合Transformer架构，解决世界-动作模型的空间瓶颈，提升策略泛化能力，在LIBERO等任务上显著优于基线。

AI中文摘要

世界-动作模型（WAMs）通过视频预测为机器人控制提供了一种有前景的范式。然而，当前的WAMs存在根本性的空间瓶颈：标准文本输入在杂乱场景中引入指代歧义，而非结构化的RGB预测缺乏语义基础，并受任务无关背景的偏差影响。为克服这些限制，我们引入了MaskWAM，一种以对象为中心的世界-动作模型。通过统一的混合Transformer（MoT）将掩码同时作为显式输入和预测进行联合集成，MaskWAM实现了鲁棒的策略泛化。该设计提供两个关键优势：（1）预测未来掩码产生以对象为中心的语义监督，抑制视觉噪声，显著增强甚至标准文本条件的WAMs；（2）将此预测监督与第一帧视觉提示（如目标对象掩码）耦合，建立精确的空间锚点，大幅减少语言歧义。关键在于，由于WAMs本质上是视觉驱动的架构，直接掩码条件化比单独文本提供更强的引导，为操作未见对象建立了精确且鲁棒的范式。在LIBERO、RoboTwin和真实世界任务上的评估表明，MaskWAM在语言清晰和语言模糊任务中均显著优于基线。

英文摘要

World Action Models (WAMs) present a promising paradigm for robotic control via video prediction. However, current WAMs suffer from fundamental spatial bottlenecks: standard text inputs introduce referential ambiguity in cluttered scenes, while unstructured RGB predictions lack semantic grounding and remain biased by task-irrelevant backgrounds. To overcome these limitations, we introduce MaskWAM, an object-centric world-action model. By jointly integrating masks as both explicit inputs and predictions via a unified Mixture of Transformers (MoT), MaskWAM unlocks robust policy generalization. This design provides two key benefits: (1) predicting future masks yields object-centric semantic supervision that suppresses visual noise, significantly enhancing even standard text-conditioned WAMs; and (2) coupling this predictive supervision with first-frame visual prompts, such as target object masks, establishes a precise spatial anchor that substantially reduces language ambiguity. Crucially, as WAMs are inherently vision-driven architectures, direct mask conditioning yields substantially stronger guidance than text alone, establishing a precise and robust paradigm for manipulating unseen objects. Evaluations on LIBERO, RoboTwin, and real-world tasks demonstrate that MaskWAM significantly outperforms baselines in both language-clear and language-ambiguous tasks.

2606.13529 2026-06-12 cs.HC cs.LG 交叉投稿

Ride, Track, and Recover: Pilot Randomized Trial of a Wearable Digital Self-Management Intervention During a Veteran Endurance-Cycling Program

骑行、追踪与恢复：一项关于可穿戴数字自我管理干预在退伍军人耐力骑行项目中的初步随机试验

Alan Ta, Nilsu Salgin, Caleb Armstrong, Kala Phillips Reindel, Farzan Sasangohar

发表机构 * Department of Industrial and Systems Engineering, Texas A&M University（工业与系统工程系，德克萨斯A&M大学）； Texas A&M Health Telehealth Institute（德克萨斯A&M健康远程医疗研究所）

AI总结本研究通过随机试验，评估可穿戴数字自我管理干预对退伍军人创伤后应激障碍（PTSD）高唤醒症状的稳定效果，发现干预组症状改善更持久，且机器学习检测精度与症状严重程度正相关。

AI中文摘要

退伍军人的创伤后应激障碍（PTSD）以持续高唤醒及共病焦虑和抑郁症状为特征，这些症状在临床环境外难以监测和管理。在德克萨斯州参加“英雄计划”骑行活动的13名退伍军人，通过计算机生成序列在自然环境中随机分为两组：（1）数字干预加体力活动，或（2）仅体力活动，外加一个由从更广泛的“英雄计划”退伍军人社区中选出的7名退伍军人组成的第三组家庭监测对照组。连续智能手表传感结合心率和加速度计特征来检测高唤醒事件，并由参与者实时确认。每周收集焦虑、抑郁和PTSD严重程度的自我报告测量。广义加性混合模型描述了随时间变化的非线性轨迹。基线归一化的高唤醒轨迹在不同条件下存在显著差异，数字干预组（n=7）显示出结构化的稳定，而仅体力活动组（n=3）在研究后期出现恶化。两个骑行组在耐力活动期间均表现出急性症状改善；然而，数字干预组表现出更高的整体收益维持。家庭对照组（n=4）显示出症状逐渐下降。机器学习检测的感知精度在个体间差异很大，并与症状严重程度正相关，较高严重程度的参与者确认了更大比例的检测事件。这些结果表明，将可穿戴检测与数字自我管理工具相结合可能支持高唤醒的稳定和症状改善，同时强调了在可穿戴心理健康系统中个性化和以人为中心的设计的重要性。

英文摘要

Post-traumatic stress disorder (PTSD) in veterans is characterized by persistent hyperarousal and comorbid anxiety and depressive symptoms that are difficult to monitor and manage outside clinical settings. Thirteen veterans participating in a Project Hero cycling event in Texas were randomized by computer-generated sequence in a naturalistic setting to two arms: (1) digital intervention plus physical activity, or (2) physical activity only, plus a third at-home monitoring control cohort consisting of 7 veterans selected from the broader Project Hero veteran community. Continuous smartwatch sensing combined heart rate and accelerometer features to detect hyperarousal events, which were confirmed in real time by participants. Weekly self-report measures of anxiety, depression, and PTSD severity were collected. Generalized additive mixed models characterized nonlinear trajectories over time. Baseline-normalized hyperarousal trajectories differed significantly across conditions, with the digital intervention group (n=7) showing structured stabilization compared to late-study escalation in the physical-only group (n=3). Both cycling groups exhibited acute symptom improvements during the endurance event; however, the digital intervention group demonstrated a higher overall maintenance of gains. The at-home control group (n=4) showed gradual symptom declines. Perceived precision of ML detections varied substantially across individuals and was positively associated with symptom severity, with higher-severity participants confirming a greater proportion of detected events. These results suggest that coupling wearable detection with digital self-management tools may support stabilization of hyperarousal and symptom improvement while emphasizing the importance of personalization and human-centered design in wearable mental health systems.

2606.13532 2026-06-12 cs.NI cs.LG 交叉投稿

Graphical Causal Reasoning for Root Cause Analysis in Cloud Networks

云网络中根本原因分析的图因果推理

Fabien Chraim, Dominik Janzing, John Evans

AI总结提出基于图因果发现的云网络事故根本原因分析方法，通过时空分组和自动化本体降维，利用双变量Granger因果性和条件独立性检验构建因果图，并引入概率方法进行时间感知的根因评分。在35个生产事故中召回率85.7%，精确匹配率74.3%。

Comments 6 pages, 4 figures

AI中文摘要

云计算依赖于大规模网络，这些网络本质上是复杂系统。在本文中，我们提出了一种新颖的云网络事故根本原因分析（RCA）方法，利用基于图的因果发现技术。我们的方法通过引入时空分组策略和自动化本体来降低问题维度，从而解决了基于规则的自动化的局限性。我们使用双变量Granger因果性和条件独立性检验从二元时间序列数据构建因果图。对于推理，我们引入了一种概率方法，该方法根据时间延迟分配边特定的条件概率，从而通过因果图遍历实现可解释的、时间感知的根因评分。我们使用来自一家主要云提供商的35个生产事故的标记数据集评估了该系统。该模型成功召回正确根因的事故占85.7%，精确匹配的事故占74.3%。在生产中，该系统已用于800多个真实世界事故，并获得了网络工程师的积极定性反馈。这些结果突显了在动态和大规模运营环境中采用数据驱动的因果方法进行RCA的实用性。

英文摘要

Cloud computing relies on large-scale networks, which are inherently complex systems. In this paper, we present a novel approach to root cause analysis (RCA) of cloud network incidents, leveraging graph-based causal discovery techniques. Our method addresses the limitations of rule-based automation by introducing a spatiotemporal grouping strategy and an automation ontology to reduce the dimensionality of the problem. We construct a causal graph from binary time series data using bivariate Granger causality and conditional independence tests. For inference, we introduce a probabilistic method that assigns edge-specific conditional probabilities as a function of time lag, allowing for interpretable, time-aware root cause scoring via causal graph traversal. We evaluated the system using a labeled dataset of 35 production incidents from a major cloud provider. The model successfully recalled the correct root cause in 85.7% of incidents and produced an exact match in 74.3%. In production, the deployed system has been used in over 800 real-world incidents, with positive qualitative feedback from network engineers. These results highlight the practicality of a data-driven, causal approach to RCA in dynamic and large-scale operational environments.
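The bivariate Granger test mentioned above can be sketched with ordinary least squares: compare a model that predicts a series from its own past against one that also sees the past of the other series. This is a generic textbook version, not the authors' system; the lag of 1, the synthetic alarm series, and the function name are assumptions made here.

```python
import numpy as np


def granger_f_statistic(x, y, lag=1):
    """F-statistic for the hypothesis that past x helps predict y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y) - lag
    target = y[lag:]
    # Column k-1 holds the series delayed by k steps, aligned with `target`.
    y_lags = np.column_stack([y[lag - k: len(y) - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k: len(x) - k] for k in range(1, lag + 1)])
    ones = np.ones((n, 1))
    restricted = np.hstack([ones, y_lags])             # y's own past only
    full = np.hstack([ones, y_lags, x_lags])           # plus x's past

    def residual_sum_of_squares(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        return float(residuals @ residuals)

    rss_restricted = residual_sum_of_squares(restricted)
    rss_full = residual_sum_of_squares(full)
    dof = n - full.shape[1]
    return ((rss_restricted - rss_full) / lag) / (rss_full / dof)


# Synthetic binary alarms: y repeats x one step later, with 10% random flips.
rng = np.random.default_rng(0)
x = (rng.random(500) < 0.3).astype(float)
flips = rng.random(500) < 0.1
y = np.zeros(500)
y[1:] = np.logical_xor(x[:-1] > 0.5, flips[1:]).astype(float)
f_x_to_y = granger_f_statistic(x, y)
f_y_to_x = granger_f_statistic(y, x)
```

A large statistic in one direction and a small one in the other suggests a directed edge from x to y; a pairwise test like this cannot rule out a common driver, which is presumably why the abstract pairs it with conditional independence tests.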

2606.13543 2026-06-12 cs.NI cs.LG 交叉投稿

NetCause: Counterfactual Learning for Root Cause Analysis in Large-Scale Networks

NetCause：大规模网络中根因分析的反事实学习

Fabien Chraim, Jian Zhang, Dominik Janzing, Xiang Song, Christos Faloutsos, John Evans

AI总结提出NetCause框架，将网络事件建模为图时间过程，通过反事实模拟排序候选根因，在31个专家标注事件上准确率提升16.1%。

Comments 9 pages, 6 figures

AI中文摘要

一个学习模型能否捕捉故障在大规模网络中的传播方式，并利用这些知识将客户影响因果归因于其根本原因？现有的根因分析技术通常依赖于静态规则、相关启发式或拓扑局部推理，难以在动态环境中泛化，因为故障在复杂的物理和逻辑依赖关系中传播。我们提出了NetCause，一个基于自监督学习的框架，将网络事件建模为图时间过程，并使用反事实模拟对候选根因进行排序。该方法生成可解释的根因假设排序，并自然地与操作员定义的缓解和修复措施集成。我们在来自领先云提供商生产网络的六个月内收集的1500多个事件上训练模型，并在31个专家标注的事件上评估。NetCause在与运营决策最相关的场景中持续改善根因排序质量，相比基于规则的启发式基线，准确率提升16.1%。虽然训练计算密集，但推理轻量，每个事件仅需数秒GPU运行时间（远低于典型的遥测收集延迟）。

英文摘要

Can a learned model capture how faults propagate through a large-scale network and use this knowledge to causally attribute customer impact to its underlying root cause? Existing root cause analysis techniques often rely on static rules, correlation heuristics, or topology-local reasoning, which struggle to generalize in dynamic environments where faults propagate across complex physical and logical dependencies. We present NetCause, a self-supervised learning-based framework that models network incidents as graph-temporal processes and uses counterfactual simulation to rank candidate root causes. This approach produces an interpretable ranking of root cause hypotheses and integrates naturally with operator-defined mitigation and remediation actions. We train the model on over 1,500 incidents collected over six months from a leading cloud provider's production network and evaluate it on 31 expert-labeled incidents. NetCause consistently improves root cause ranking quality in the regime most relevant to operational decision-making, achieving a 16.1% accuracy improvement over a rule-based heuristic baseline. While training is computationally intensive, inference is lightweight, requiring only seconds of GPU runtime per incident (well below typical telemetry collection latencies).

2606.13591 2026-06-12 cs.AI cs.LG cs.MA 交叉投稿

Multiagent Protocols with Aggregated Confidence Signals

带有聚合置信信号的多智能体协议

Ali Elahi, Barbara Di Eugenio

发表机构 * University of Illinois Chicago（伊利诺伊大学芝加哥分校）

AI总结提出三种协议，通过转换原始置信信号并采用软投票或贝叶斯融合，为多智能体系统输出聚合置信度，在保持正确性的同时显著提升判别能力。

Comments 22 pages and 5 figures, 9 pages and 2 figures before the appendix

AI中文摘要

置信度在自然语言处理中用于可靠性、监督和一系列下游决策任务，但目前没有方法能够为多智能体系统的输出产生或评估置信度。先前的工作在多智能体辩论中使用置信度来加权消息、触发辩论或校准单个智能体，但从未将这些置信度聚合成系统本身的单一置信度。我们引入了三种协议，通过首先转换原始置信信号使其在不同模型间可比，然后通过软投票或称为贝叶斯融合的概率融合方法将它们组合，从而产生最终答案和单一的聚合置信度。这种聚合置信度在判别性（AUARC）上显著优于最佳单个智能体或标准辩论基线，同时正确性（F1分数）保持稳定，并恢复了多智能体辩论在更模糊任务上的损失。通过分析两种估计器（序列概率和自我报告）以及参数和非参数校准器，我们发现校准提高了两种估计器的F1分数，而AUARC对其依赖较小。我们在五个基准测试和四种任务类型上评估了每基准六对同质和异质辩论对，涵盖了多种模型能力和大小。

英文摘要

Confidence is used for reliability, oversight, and a range of downstream decision tasks in Natural Language Processing (NLP), yet no existing method produces or evaluates a confidence for the output of a multiagent system. Prior work uses confidence within multiagent debate (MAD) to weight messages, trigger debate, or calibrate individual agents, but it never aggregates these into a single confidence for the system itself. We introduce three protocols that produce a final answer along with a single aggregated confidence by first transforming raw confidence signals to make them comparable across models, then combining them via soft voting or a probability fusion we call Bayesian fusion. This aggregated confidence is substantially more discriminative (AUARC) than that of the best single agent or the standard debate baselines, while correctness (F1-score) stays stable and recovers the losses MAD incurs on more ambiguous tasks. Analyzing two estimators, sequence probability and self-report, alongside parametric and non-parametric calibrators, we find that calibration improves F1 for both estimators while AUARC is less reliant on it. We evaluate six homogeneous and heterogeneous debating pairs per benchmark, across five benchmarks and four task types, spanning a range of model capabilities and sizes.
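The soft-voting idea mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each agent returns an (answer, confidence) pair whose confidence has already been transformed to a comparable [0, 1] scale. The function name `soft_vote` and the normalisation by total confidence mass are illustrative assumptions, not the paper's actual protocols.

```python
from collections import defaultdict


def soft_vote(votes):
    """Aggregate (answer, confidence) pairs by summing confidence per answer.

    Hypothetical helper: returns the answer with the largest summed
    confidence, together with its share of the total confidence mass,
    which serves as a single aggregated confidence for the system.
    """
    if not votes:
        raise ValueError("need at least one vote")
    mass = defaultdict(float)
    for answer, confidence in votes:
        mass[answer] += confidence
    total = sum(mass.values())
    if total == 0:
        raise ValueError("all confidences are zero")
    best = max(mass, key=mass.get)
    return best, mass[best] / total


# Two agents back "A", one backs "B".
answer, aggregated = soft_vote([("A", 0.9), ("B", 0.6), ("A", 0.3)])
print(answer, round(aggregated, 3))
```

The Bayesian fusion variant named in the abstract would combine the same inputs differently; its exact form is not given here, so it is left out of the sketch.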

2606.13633 2026-06-12 eess.SY cs.LG cs.SY 交叉投稿

Aerial Wildfire Suppression Planning with a Hybrid CNN-Cellular Automata Fire Model

基于混合CNN-元胞自动机火灾模型的空中野火抑制规划

Ion Matei, Maksym Zhenirovskyy, Takuya Kurihana, Rohit Vupala, Anthony Wong

AI总结提出结合混合神经-元胞自动机野火模型与梯度优化空中投放的框架，通过蒙特卡洛采样和空间相关扰动量化不确定性，案例验证可生成有效抑制方案。

AI中文摘要

空中野火抑制不仅需要预测火势蔓延，还需要在操作和环境不确定性下设计有效的干预策略。我们提出了一个空中野火抑制的建模与优化框架，该框架将混合神经-元胞自动机野火模型与基于梯度的目标空中投放设计相结合。野火模型根据地形、燃料和风数据预测空间变化的蔓延行为，而干预模块确定二元投放动作，其连续值位置和方向参数映射到模拟网格。水和阻燃剂具有不同的抑制效果，分别对应于立即减少活跃燃烧和持续减少未来蔓延。为了评估所得抑制方案的鲁棒性，我们通过每日火势状态的蒙特卡洛采样量化偶然不确定性，并通过空间相关的预测误差扰动量化认知不确定性。基于2020年Bear Fire的案例研究表明，该框架可以生成连贯的空中抑制调度，以减少总火灾影响面积，并支持对野火干预策略的不确定性感知分析。

英文摘要

Aerial wildfire suppression requires not only predicting fire spread, but also designing effective intervention strategies under operational and environmental uncertainty. We present a modeling and optimization framework for aerial wildfire suppression that combines a hybrid neural-cellular automaton wildfire model with gradient-based design of targeted aerial drops. The wildfire model predicts spatially varying spread behavior from terrain, fuel, and wind data, while the intervention module determines binary drop actions with continuous-valued location and orientation parameters mapped to the simulation grid. Water and retardant are represented with distinct suppression effects, corresponding to immediate reduction of active burning and persistent reduction of future spread. To evaluate the robustness of the resulting suppression plans, we quantify both aleatoric uncertainty through Monte Carlo sampling of daily fire-state realizations and epistemic uncertainty through spatially correlated prediction-error perturbations. A case study based on the 2020 Bear Fire shows that the framework can generate coherent aerial suppression schedules for reducing total fire-affected area and can support uncertainty-aware analysis of wildfire intervention strategies.

2606.13677 2026-06-12 cs.RO cs.AI cs.CV cs.LG 交叉投稿

Mana: Dexterous Manipulation of Articulated Tools

Mana: 铰接工具的灵巧操作

Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu

发表机构 * UC Berkeley（加州大学伯克利分校）； CMU（卡内基梅隆大学）； Stanford University（斯坦福大学）； Amazon FAR（亚马逊FAR）

AI总结提出Mana框架，将灵巧操作重解释为动画问题，通过粗到细的流水线自动生成操作轨迹，实现铰接工具的零样本仿真到现实迁移。

Comments Project Page: https://zhaohengyin.github.io/mana

AI中文摘要

铰接工具的操作由于需要协调内部自由度与接触丰富的交互，仍然是灵巧机器人学中的一个主要挑战。虽然先前的工作主要集中在刚性物体上，但铰接工具的使用由于其物理复杂性以及学习功能性抓取和操作策略的困难，仍未得到充分探索。我们提出了Mana（操作动画器），一个通用的仿真到现实框架，将灵巧操作重新解释为动画问题。受计算机动画启发，Mana采用粗到细的流水线，通过运动规划和强化学习将程序生成的抓取关键帧转化为操作轨迹。数据生成过程基本自动化，仅需几次鼠标点击即可指定功能可供性（每个工具不到1分钟）。在跨越不同尺度和关节类型的四个铰接工具上，Mana实现了抓取和手内操作的零样本仿真到现实迁移，展示了灵巧铰接工具操作的可扩展方法。

英文摘要

Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects, articulated tool use remains underexplored because of its physical complexity and the difficulty of learning functional grasping and manipulation policies. We present Mana (Manipulation Animator), a general sim-to-real framework that reinterprets dexterous manipulation as an animation problem. Inspired by computer animation, Mana employs a coarse-to-fine pipeline that transforms procedurally-generated grasp keyframes into manipulation trajectories through motion planning and reinforcement learning. The data generation process is largely automatic, requiring only a few mouse clicks to specify functional affordances (<1 minute per tool). Across four articulated tools spanning different scales and joint types, Mana achieves zero-shot sim-to-real transfer for both grasping and in-hand manipulation, demonstrating a scalable approach to dexterous articulated tool use.

2301.12538 2026-06-12 cs.LG cs.AI math.DS 版本更新

On Approximating the Dynamic Response of Synchronous Generators via Operator Learning: A Step Towards Building Deep Operator-based Power Grid Simulators

关于通过算子学习逼近同步发电机动态响应：迈向构建基于深度算子的电网模拟器的一步

Christian Moya, Amirhossein Mollaali, Guang Lin, Meng Yue

发表机构 * Purdue University（普渡大学）

AI总结提出基于算子学习的框架，利用DeepONet逼近同步发电机的动态响应，并设计递归模拟方案及残差DeepONet方案，结合数据聚合策略实现与电网交互的模拟。

AI中文摘要

本文开发了一个算子学习框架，用于逼近同步发电机的动态响应。该框架可用于（i）构建一个基于神经网络的发电机模型，与电网模拟器交互，或（ii）跟踪真实发电机的暂态响应。首先，我们开发了一个数据驱动的深度算子网络（DeepONet）来逼近发电机的无限维解算子。然后，我们设计了一个基于DeepONet的数值方案，在给定的时间范围内模拟发电机的响应。所提出的方案递归地使用训练好的DeepONet来模拟给定多维输入下的响应，该输入描述了发电机与电网之间的相互作用。此外，我们设计了一个残差DeepONet数值方案，可以整合现有数学模型的信息。我们为这个残差DeepONet方案提供了预测累积误差的估计。最后，我们构建了一个数据聚合（DAgger）策略，允许使用DeepONet在与其他电网组件交互模拟中可能遇到的聚合训练数据对DeepONet进行微调。作为概念验证，我们证明了所提出的框架能够有效逼近同步发电机的暂态模型。

英文摘要

This paper develops an Operator Learning framework for approximating the dynamic response of synchronous generators. The framework can be used to (i) build a neural network-based generator model that interacts with a power grid simulator or (ii) shadow the true generator's transient response. First, we develop a data-driven Deep Operator Network (DeepONet) to approximate the infinite-dimensional solution operator of the generators. Then, we design a numerical scheme based on DeepONet that simulates the generator's response over a given time horizon. The proposed scheme recursively employs the trained DeepONet to simulate the response for a given multi-dimensional input that describes the interaction between the generator and the power grid. In addition, we design a residual DeepONet numerical scheme that can incorporate information from existing mathematical models. We accompany this residual DeepONet scheme with an estimate for the prediction's cumulative error. Finally, we build a data aggregation (DAgger) strategy that allows fine-tuning of DeepONets using aggregated training data that the DeepONets will likely encounter during interactive simulations with other grid components. As a proof of concept, we demonstrate that the proposed frameworks can effectively approximate the transient model of a synchronous generator.

2505.22695 2026-06-12 cs.LG 版本更新

LLM-ODDR: A Large Language Model Framework for Joint Order Dispatching and Driver Repositioning

LLM-ODDR：一种用于联合订单调度和司机重新定位的大语言模型框架

Tengfei Lyu, Siyuan Feng, Hao Liu, Hai Yang

发表机构 * Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou)（人工智能学域，香港科技大学（广州））； Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University（航空及民航工程学系，香港理工大学）； Research Center for Low Altitude Economy, The Hong Kong Polytechnic University（低空经济研究中心，香港理工大学）； Department of Computer Science and Engineering, The Hong Kong University of Science and Technology（计算机科学及工程学系，香港科技大学）； Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology（土木及环境工程学系，香港科技大学）

AI总结提出LLM-ODDR框架，利用大语言模型联合优化网约车订单调度与司机重新定位，通过多目标价值细化、公平感知调度和时空需求感知重定位提升效果、适应性和可解释性。

Comments Published in IEEE Transactions on Intelligent Transportation Systems (TITS)

AI中文摘要

网约车平台在动态城市环境中优化订单调度和司机重新定位操作面临重大挑战。基于组合优化、规则启发式和强化学习的传统方法往往忽视司机收入公平性、可解释性以及对现实动态的适应性。为弥补这些不足，我们提出LLM-ODDR，一种利用大语言模型（LLM）进行网约车服务中联合订单调度和司机重新定位（ODDR）的新型框架。LLM-ODDR框架包含三个关键组件：（1）多目标引导的订单价值细化，通过考虑多个目标评估订单以确定其整体价值；（2）公平感知的订单调度，平衡平台收入与司机收入公平性；（3）时空需求感知的司机重新定位，基于历史模式和预测供应优化空闲车辆放置。我们还开发了JointDR-GPT，一个针对ODDR任务进行领域知识微调的模型。在曼哈顿出租车运营的真实数据集上进行的大量实验表明，我们的框架在有效性、对异常条件的适应性以及决策可解释性方面显著优于传统方法。据我们所知，这是首次将LLM作为决策智能体应用于网约车ODDR任务，为将先进语言模型集成到智能交通系统中奠定了基础性见解。虽然当前框架的计算成本高于传统方法，但我们表明并行分解和模型蒸馏可以将延迟降低到可部署的生产水平。

英文摘要

Ride-hailing platforms face significant challenges in optimizing order dispatching and driver repositioning operations in dynamic urban environments. Traditional approaches based on combinatorial optimization, rule-based heuristics, and reinforcement learning often overlook driver income fairness, interpretability, and adaptability to real-world dynamics. To address these gaps, we propose LLM-ODDR, a novel framework leveraging Large Language Models (LLMs) for joint Order Dispatching and Driver Repositioning (ODDR) in ride-hailing services. The LLM-ODDR framework comprises three key components: (1) Multi-objective-guided Order Value Refinement, which evaluates orders by considering multiple objectives to determine their overall value; (2) Fairness-aware Order Dispatching, which balances platform revenue with driver income fairness; and (3) Spatiotemporal Demand-Aware Driver Repositioning, which optimizes idle vehicle placement based on historical patterns and projected supply. We also develop JointDR-GPT, a fine-tuned model optimized for ODDR tasks with domain knowledge. Extensive experiments on real-world datasets from Manhattan taxi operations demonstrate that our framework significantly outperforms traditional methods in terms of effectiveness, adaptability to anomalous conditions, and decision interpretability. To our knowledge, this is the first exploration of LLMs as decision-making agents in ride-hailing ODDR tasks, establishing foundational insights for integrating advanced language models within intelligent transportation systems. While the current framework incurs higher computational costs than traditional methods, we show that parallel decomposition and model distillation can reduce latency to production-viable levels for deployment.

2508.04888 2026-06-12 cs.LG 版本更新

Retrieval-Augmented Foundation Models for Water Level Prediction in the Everglades

用于大沼泽地水位预测的检索增强基础模型

Rahuul Rangaraj, Jimeng Shi, Rajendra Paudel, Giri Narasimhan, Yanzhao Wu

发表机构 * Florida International University（佛罗里达国际大学）； Everglades National Park（大沼泽地国家公园）

AI总结针对大沼泽地水位预测，提出检索增强机制，利用统计相似性或互信息检索历史水文事件，提升预训练时序基础模型的长期预测性能，尤其在极端事件中效果显著。

DOI: 10.1145/3770855.3818897

AI中文摘要

大沼泽地的准确水位预测对于防洪、干旱管理、水资源规划和生物多样性保护至关重要。尽管最近的时序基础模型在通用任务（体现在其预训练中）上表现出色，但它们在特定领域应用中的有效性仍未被充分理解。在这项工作中，我们整理了一个用于大沼泽地水位预测的领域特定数据集，并观察到当前最先进模型的性能仍然有限。为了解决这一差距，我们利用检索增强机制，从历史观测的外部档案中检索类似的多变量水文事件，以丰富这些预训练模型的输入上下文。我们研究了两种检索策略：基于统计相似性的检索和基于互信息的检索，并分析了纳入检索到的历史上下文如何影响预测性能。大量实验表明，检索增强一致地改善了长期水位预测，并在极端事件期间产生了不成比例的更大收益，这对环境决策尤为关键。我们的研究提供了经验证据，表明基于类比检索可以有益于环境科学中的预训练时序基础模型，为它们在大沼泽地水文预测中的应用提供了关于其优势、局限性和失败模式的实用见解。尽管在大沼泽地进行了评估，但所提出的框架是通用的，并且可以应用于给定时间序列数据的其他水文系统。代码和数据已在 https://github.com/rahuul2992000/WaterRAF 公开。

英文摘要

Accurate water level forecasting in the Everglades is essential for flood mitigation, drought management, water resource planning, and biodiversity conservation. While recent time-series foundation models have shown strong performance on generic tasks (represented in their pre-training), their effectiveness in domain-specific applications remains insufficiently understood. In this work, we curate a domain-specific dataset for water-level forecasting in the Everglades and observe that the performance of current state-of-the-art models remains limited. To address this gap, we leverage a retrieval-augmented mechanism that retrieves analogous multivariate hydrological episodes from an external archive of historical observations to enrich the input context of those pre-trained models. We study two retrieval strategies, statistical similarity-based retrieval and mutual information-based retrieval, and analyze how incorporating retrieved historical contexts affects predictive performance. Extensive experiments show that retrieval augmentation consistently improves long-horizon water level forecasts and yields disproportionately larger gains during extreme events, which is particularly critical for environmental decision-making. Our study provides empirical evidence that analog-based retrieval can benefit pretrained time-series foundation models in environmental science, offering practical insights into their strengths, limitations, and failure modes when applied to hydrological forecasting in the Everglades. Although evaluated in the Everglades, the proposed framework is general and can be applied to other hydrological systems given time series data. The code and data have been made publicly available at https://github.com/rahuul2992000/WaterRAF.

2509.07150 2026-06-12 cs.LG cond-mat.mtrl-sci 版本更新

PLaID++: A Preference Aligned Language Model for Targeted Inorganic Materials Design

PLaID++: 一种用于定向无机材料设计的偏好对齐语言模型

Andy Xu, Rohan Desai, Larry Wang, Ethan Ritz, Gabriel Hope

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Stanford University（斯坦福大学）

AI总结提出PLaID++，通过对称性感知的Wyckoff文本表示和温度缩放熵正则化，结合可验证奖励的强化学习，实现稳定且满足空间群属性的晶体生成，生成热力学稳定、独特且新颖结构的比率比先前方法高约50%。

Comments Code available at https://github.com/andaero/PLaID, model weights at https://huggingface.co/HOPE-Lab-HMC/PLaID

AI中文摘要

基于可验证奖励的强化学习（RLVR）已成为提高LLM正确性的有前景方法，然而在许多科学问题中，目标并非产生正确答案，而是产生满足一组约束的多样化候选方案。我们在材料生成背景下研究这一挑战。为此，我们引入了PLaID++，一个经过后训练的LLM，用于稳定且属性引导的晶体生成。我们发现性能取决于我们的晶体学表示和奖励公式。首先，我们引入了一种紧凑的、对称性感知的Wyckoff文本表示，提高了计算效率并鼓励从物理先验中泛化。其次，我们证明了温度缩放作为熵正则化器，可以抵消模式坍塌并鼓励探索。通过将对称性约束直接编码到文本中，并将模型输出引导至理想的化学空间，PLaID++生成热力学稳定、独特且新颖的结构，其速率比先前方法高约50%，并能条件性地生成具有所需空间群属性的结构。我们的工作展示了将自然语言处理中的后训练技术适应于材料设计的潜力，为定向和高效发现新材料铺平了道路。

英文摘要

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a promising approach to improve correctness in LLMs; however, in many scientific problems, the objective is not necessarily to produce the correct answer, but instead to produce a diverse array of candidates which satisfy a set of constraints. We study this challenge in the context of materials generation. To this end, we introduce PLaID++, an LLM post-trained for stable and property-guided crystal generation. We find that performance hinges on our crystallographic representation and reward formulation. First, we introduce a compact, symmetry-informed Wyckoff text representation which improves computational efficiency and encourages generalization from physical priors. Second, we demonstrate that temperature scaling acts as an entropy regularizer which counteracts mode collapse and encourages exploration. By encoding symmetry constraints directly into text and guiding model outputs towards desirable chemical space, PLaID++ generates structures that are thermodynamically stable, unique, and novel at a ~50% greater rate than prior methods and conditionally generates structures with desired space group properties. Our work demonstrates the potential of adapting post-training techniques from natural language processing to materials design, paving the way for targeted and efficient discovery of novel materials.
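The claim that temperature scaling acts as an entropy regularizer can be illustrated with a small, generic Python sketch. It shows only the standard softmax-with-temperature relationship: dividing logits by a larger temperature flattens the distribution and raises its entropy. The logits and temperature values are made up for illustration, and how PLaID++ applies temperature during post-training is not specified in this abstract.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Illustrative next-token logits (assumed values).
logits = [4.0, 1.0, 0.0]
for temperature in (0.5, 1.0, 2.0):
    h = entropy(softmax(logits, temperature))
    print(f"T={temperature}: entropy={h:.3f} nats")
```

Higher temperatures spread probability mass over more candidates, which is the mechanism by which sampling avoids collapsing onto a single mode.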

2601.00921 2026-06-12 cs.LG cs.AI quant-ph 版本更新

Geometric and Quantum Kernel Methods for Predicting Skeletal Muscle Outcomes in chronic obstructive pulmonary disease

用于预测慢性阻塞性肺疾病骨骼肌结果的几何与量子核方法

Azadeh Alavi, Hamidreza Khalili, Stanley H. Chan, Fatemeh Kouchmeshki, Muhammad Usman, Ross Vlahos

发表机构 * School of Computing Technologies, RMIT University（计算技术学院，皇家墨尔本理工大学）； School of Health & Biomedical Sciences, STEM College, RMIT University（健康与生物医学科学学院，STEM学院，皇家墨尔本理工大学）； Pattern Recognition Pty Ltd, Melbourne（模式识别有限公司，墨尔本）； Data61, CSIRO（Data61，澳大利亚联邦科学与工业研究组织）

AI总结提出一种核几何量子混合方法，通过再生核希尔伯特空间映射合成SPD参考、随机投影压缩和低维量子回归电路，在COPD动物队列中预测肌肉重量、质量和力量，肌肉重量RMSE比最佳经典方法低约1.8%。

Comments 24 pages, 2 figures

AI中文摘要

慢性阻塞性肺疾病（COPD）影响全球数亿人，骨骼肌功能障碍具有临床重要性。量子机器学习在生物医学预测中日益受到探索，但在小型生物标志物队列中的价值需要与强经典基线进行基准测试。我们分析了一个由213只动物组成的香烟烟雾COPD队列，利用血液和支气管肺泡灌洗生物标志物预测胫骨前肌重量、肌肉质量和力量。我们开发了一种核几何量子混合方法，其中合成对称正定（SPD）参考通过再生核希尔伯特空间映射，使用仅训练随机投影压缩，归一化，并输入低维量子回归电路。我们将该方法与经典岭/核模型、SPD关系表示和量子核回归（QKR）进行了基准测试。所有方法均使用条件分层重复交叉验证进行评估。最大的数值改进出现在肌肉重量上，所提出方法的平均均方根误差（RMSE）数值最低，比最佳经典比较器低约1.8%；配对折叠水平测试在Holm调整后未建立统计显著性优势，但该终点具有生物学意义。该方法在肌肉质量上也具有数值最低的平均RMSE。对于力量，仅使用生物标志物的岭回归表现最佳，表明更线性的终点结构。

英文摘要

Chronic obstructive pulmonary disease (COPD) affects hundreds of millions of people worldwide, and skeletal-muscle dysfunction is clinically important. Quantum machine learning is increasingly explored for biomedical prediction, but its value in small biomarker cohorts requires benchmarking against strong classical baselines. We analysed a cigarette-smoke COPD cohort of 213 animals with blood and bronchoalveolar-lavage biomarkers to predict tibialis anterior muscle weight, muscle quality, and force. We developed a kernel-geometric quantum hybrid method in which synthetic symmetric positive definite (SPD) references are mapped through a reproducing kernel Hilbert space, compressed using train-only random projection, normalised, and supplied to low-dimensional quantum regression circuits. We benchmarked this approach against classical ridge/kernel models, SPD relational representations, and quantum-kernel regression (QKR). All methods were evaluated using condition-stratified repeated cross-validation. The largest numerical improvement was observed for muscle weight, where the proposed method had the numerically lowest mean root mean squared error (RMSE), approximately 1.8% below the best classical comparator; paired fold-level testing did not establish statistically significant superiority after Holm adjustment, but the endpoint is biologically meaningful. The method also had the numerically lowest mean RMSE for muscle quality. For force, biomarker-only Ridge performed best, suggesting a more linear endpoint structure.

2601.09693 2026-06-12 cs.LG stat.ML 版本更新

Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

对比几何学习实现统一的结构与配体药物设计

Lisa Schneckenreiter, Sohvi Luukkonen, Lukas Friedrich, Daniel Kuhn, Günter Klambauer

发表机构 * DeepMind Ltd（DeepMind有限公司）

AI总结提出对比几何模型ConGLUDe，统一结构与配体训练，实现虚拟筛选、靶标钓鱼和配体条件口袋预测，在多项基准测试中表现优异。

Comments Forty-Third International Conference on Machine Learning

详情

AI中文摘要

基于结构和基于配体的计算药物设计传统上依赖于不相关的数据源和建模假设，限制了它们在大规模上的联合使用。在这项工作中，我们引入了用于统一计算药物设计的对比几何学习（ConGLUDe），这是一个单一的对比几何模型，统一了基于结构和基于配体的训练。ConGLUDe将产生全蛋白质表示和预测结合位点的隐式嵌入的几何蛋白质编码器与快速配体编码器耦合，消除了对预定义口袋的需求。通过对比学习将配体与全局蛋白质表示和多个候选结合位点对齐，ConGLUDe除了支持虚拟筛选和靶标钓鱼外，还支持配体条件口袋预测，同时在蛋白质-配体复合物和大规模生物活性数据上联合训练。在多种基准测试中，ConGLUDe实现了具有竞争力的零样本虚拟筛选性能，在具有挑战性的靶标钓鱼任务上显著优于现有方法，并展示了最先进的配体条件口袋选择。这些结果突显了统一结构-配体训练的优势，并将ConGLUDe定位为迈向药物发现通用基础模型的一步。

英文摘要

Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for predefined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves competitive zero-shot virtual screening performance, substantially outperforms existing methods on a challenging target fishing task, and demonstrates state-of-the-art ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.

2601.15503 2026-06-12 cs.LG 版本更新

Data-driven Lake Water Quality Forecasting for Time Series with Missing Data using Machine Learning

基于机器学习的数据驱动湖泊水质时间序列缺失数据预测

Rishit Chatterjee, Tahiya Chowdhury

发表机构 * Department of Computer Science, Colby College（科尔比学院计算机科学系）

AI总结针对志愿者监测导致的湖泊数据缺失问题，采用多重插补和岭回归，在30个湖泊数据集上实现透明度预测，并量化了最小样本量和特征集，提出联合可行性函数以优化监测策略。

Comments 8 pages, 4 figures, 3 tables

DOI: 10.1109/SusTech67720.2026.11536227
Journal ref: Published in: 2026 IEEE Conference on Technologies for Sustainability (SusTech)

AI中文摘要

志愿者主导的湖泊监测产生不规则、季节性的时间序列，由于冰盖、天气相关的通行限制以及偶尔的人为错误，存在大量缺失数据，这给有害藻华预测和早期预警带来了困难。我们研究了基于来自缅因州湖泊三十年间原位记录的数据丰富子集（30个湖泊）的塞氏盘深度（SDD）预测。通过链式方程多重插补（MICE）处理缺失数据，并使用归一化平均绝对误差（nMAE）指标进行跨湖泊性能比较。在六种候选模型中，岭回归提供了最佳的平均测试性能。利用岭回归，我们量化了最小样本量，表明在向后近期历史协议下，模型平均每个湖泊约176个训练样本即可达到全历史准确率的5%以内。我们还确定了最小特征集，其中紧凑的四特征子集在相同5%容差内匹配了十三特征基线。综合这些结果，我们引入了一个联合可行性函数，该函数识别出达到完整历史、全特征基线5%以内目标所需的最小训练历史和最少预测变量。在我们的研究中，达到5%准确率目标需要每个湖泊约64个近期样本和仅一个预测变量，凸显了针对性监测的实用性。因此，我们的联合可行性策略在固定准确率目标下统一了近期历史长度和特征选择，为湖泊研究人员制定采样工作和测量优先级提供了简单高效的规则。

英文摘要

Volunteer-led lake monitoring yields irregular, seasonal time series with many gaps arising from ice cover, weather-related access constraints, and occasional human errors, complicating forecasting and early warning of harmful algal blooms. We study Secchi Disk Depth (SDD) forecasting on a 30-lake, data-rich subset drawn from three decades of in-situ records collected across Maine lakes. Missingness is handled via Multiple Imputation by Chained Equations (MICE), and we evaluate performance with a normalized Mean Absolute Error (nMAE) metric for cross-lake comparability. Among six candidates, ridge regression provides the best mean test performance. Using ridge regression, we then quantify the minimal sample size, showing that under a backward, recent-history protocol, the model reaches within 5% of full-history accuracy with approximately 176 training samples per lake on average. We also identify a minimal feature set, where a compact four-feature subset matches the thirteen-feature baseline within the same 5% tolerance. Bringing these results together, we introduce a joint feasibility function that identifies the minimal training history and fewest predictors sufficient to achieve the target of staying within 5% of the complete-history, full-feature baseline. In our study, meeting the 5% accuracy target required about 64 recent samples and just one predictor per lake, highlighting the practicality of targeted monitoring. Hence, our joint feasibility strategy unifies recent-history length and feature choice under a fixed accuracy target, yielding a simple, efficient rule for setting sampling effort and measurement priorities for lake researchers.

2603.11249 2026-06-12 cs.LG 版本更新

Differentiable Thermodynamic Phase-Equilibria for Machine Learning

可微热力学相平衡用于机器学习

Karim K. Ben Hicham, Moreno Ascani, Jan G. Rittig, Alexander Mitsos

发表机构 * RWTH Aachen University（亚琛工业大学）； Process Systems Engineering (AVT.SVT)（过程系统工程）； Forschungszentrum Jülich GmbH（于利希研究中心）； Institute of Climate and Energy Systems ICE-1（气候与能源系统研究所）； Energy Systems Engineering（能源系统工程）； JARA-ENERGY

AI总结提出DISCOMAX算法，通过可微相平衡计算结合离散枚举与掩码softmax，实现热力学一致性端到端学习，在二元液液平衡数据上优于现有方法。

Comments 45 pages, 27 figures, 5 tables

AI中文摘要

相平衡的准确预测仍是化学工程中的核心挑战。将热力学结构融入神经网络的物理一致性机器学习方法最近在活度系数建模中表现出色。然而，将此类方法扩展到源于极值原理的平衡数据（如液液平衡）仍然困难。本文提出DISCOMAX，一种用于相平衡计算的可微算法，在训练和推理时均保证热力学一致性，仅受用户指定的离散化影响。该方法将可行相态的离散枚举与反向传播中的掩码softmax聚合相结合，在前向传播中传播真实平衡态，使用直通梯度估计器实现神经gE模型的物理一致性端到端学习。我们展示了该方法与统计热力学的类比，并在二元液液平衡数据上评估，其优于现有基于代理的方法，同时为从不同种类的平衡数据中学习提供了通用框架。

英文摘要

Accurate prediction of phase equilibria remains a central challenge in chemical engineering. Physics-consistent machine learning methods that incorporate thermodynamic structure into neural networks have recently shown strong performance for activity-coefficient modeling. However, extending such approaches to equilibrium data arising from an extremum principle, such as liquid-liquid equilibria, remains difficult. Here we present DISCOMAX, a differentiable algorithm for phase-equilibrium calculation that guarantees thermodynamic consistency at both training and inference, only subject to a user-specified discretization. The method combines discrete enumeration of feasible phase states with masked softmax aggregation in the backward pass, with the propagation of the true equilibrium state in the forward pass, using a straight-through gradient estimator to enable physics-consistent end-to-end learning of neural $g^E$-models. We show that this approach bears analogy to statistical thermodynamics, and we evaluate it on binary liquid-liquid equilibrium data where it outperforms existing surrogate-based methods, while offering a general framework for learning from different kinds of equilibrium data.

2603.11479 2026-06-12 cs.LG cs.AI cs.MA 版本更新

Grammar of the Wave: Towards Explainable Multivariate Time Series Event Detection via Neuro-Symbolic VLM Agents

波的语法：通过神经符号VLM智能体实现可解释的多变量时间序列事件检测

Sky Chenwei Wan, Yifei Y. Wang, Tianjun Hou, Xiqing Chang, Aymeric Jan

发表机构 * AI Lab, SLB（SLB人工智能实验室）； Télécom Paris, Institut Polytechnique de Paris, France（巴黎电信学院，巴黎高等理工学院，法国）

AI总结提出语言引导的时间序列事件检测（TSED）任务，通过事件逻辑树（ELT）将文本描述转化为结构化时序逻辑，并构建神经符号VLM智能体SELA，实现零/少样本事件检测与可解释推理。

Comments 8 pages (main text), 28 pages total including appendix. 9 figures, 7 tables

AI中文摘要

时间序列事件检测（TSED）旨在定位时间序列数据中具有语义意义的事件，在高风险领域具有关键应用。与统计异常不同，事件通常由自然语言描述定义，且跨多个物理通道具有内部时序逻辑结构。然而，在现实场景中，密集的事件标注成本高昂，使得纯监督学习困难。我们引入了语言引导的TSED，该设置中模型被赋予文本事件描述，并必须在几乎没有标注数据的情况下将其映射到多变量信号中的区间。为了解决这个问题，我们提出了事件逻辑树（ELT），一种知识表示框架，将语言描述转化为信号基元上的结构化时序逻辑。基于ELT，我们提出了SELA，一种神经符号VLM智能体框架，它从信号可视化中迭代地接地基元，并在ELT约束下组合它们，产生事件区间和忠实的树状结构解释。我们进一步发布了跨能源和气候领域的真实世界基准，包含专家知识和标注。实验表明，SELA优于监督微调和现有的零/少样本时间序列推理基线。

英文摘要

Time Series Event Detection (TSED) aims to localize semantically meaningful events in time series data, with critical applications in high-stakes domains. Unlike statistical anomalies, events are often defined by natural-language descriptions with internal temporal-logic structures across multiple physical channels. However, in real-world settings, dense event annotations are expensive to obtain, making purely supervised learning difficult. We introduce Language-guided TSED, a setting where a model is given textual event descriptions and must ground them to intervals in multivariate signals with little or no labeled data. To address this problem, we propose Event Logic Tree (ELT), a knowledge representation framework that converts linguistic descriptions into structured temporal logic over signal primitives. Building on ELT, we present SELA, a neuro-symbolic VLM agent framework that iteratively grounds primitives from signal visualizations and composes them under ELT constraints, producing both event intervals and faithful tree-structured explanations. We further release a real-world benchmark across energy and climate domains with expert knowledge and annotations. Experiments show that SELA improves over supervised fine-tuning and existing zero/few-shot time series reasoning baselines.

2604.12497 2026-06-12 cs.LG stat.ML 版本更新

Allocating Human Oversight in AI-Enabled Analytics

AI赋能分析中的人类监督分配

Zikun Ye, Jiameng Lyu, Rui Tao

发表机构 * Michael G. Foster School of Business, University of Washington（华盛顿大学迈克尔·G·福斯特商学院）； Department of Management Science, School of Management, Fudan University（复旦大学管理学院管理科学系）； Guanghua School of Management, Peking University（北京大学光华管理学院）

AI总结针对AI预测可靠性异质且未知的问题，提出基于上置信界的在线学习策略，动态分配有限的人类验证预算，使终端效率损失随预算增长趋于零。

AI中文摘要

组织越来越多地部署AI作为面向客户的决策过程中的低成本预测层，包括需求感知、服务质量监控、产品测试和市场研究，但AI生成的信号在不同任务、产品和客户细分中的可靠性并不均匀。因此，企业仍然需要稀缺的人类验证（标签、审计、调查回复或后续测量）来将AI输出锚定到真实情况。由于人类真实情况本身存在噪声，在不同标注者之间甚至重复判断中都有所变化，企业必须为每个任务收集并平均多个人类标签，这使得人类验证成本高昂。我们研究如何在可靠性异质且在部署前未知的情况下，将有限的人类验证预算分配到多个AI辅助任务中。我们将其置于调优的预测驱动推断框架内。每个人类标签既提高了AI辅助估计的精度，也揭示了任务的修正难度，即在使用AI预测作为控制变量后剩余的方差。如果难度已知，最优分配将遵循Neyman平方根规则；由于未知，我们提出一种基于上置信界的策略，该策略在线学习难度并将验证导向AI最不可靠的任务。我们证明，随着预算增长，该策略相对于最优分配的终端效率损失趋于零。在合成实验和一个包含68个任务和超过2000名受访者的真实数字孪生调查中，当可靠性异质时，该策略缩小了与最优分配的大部分差距，优于均匀分配和epsilon-贪婪分配；在调查数据上，它还优于先探索后提交的试点设计，并将均匀分配的10-12%差距缩小到2-6%。AI的价值不仅取决于模型准确性，还取决于将人类监督定向到AI错误影响最大的操作策略。

英文摘要

Organizations increasingly deploy AI as a low-cost prediction layer in customer-facing decision processes, including demand sensing, service-quality monitoring, product testing, and market research, but AI-generated signals are unevenly reliable across tasks, products, and customer segments. Firms therefore still need scarce human validation (labels, audits, survey responses, or follow-up measurements) to anchor AI outputs to ground truth. Because human ground truth is itself noisy, varying across labelers and even across repeated judgments, the firm must collect and average several human labels per task, which makes human validation costly. We study how to allocate a limited human-validation budget across many AI-assisted tasks when reliability is heterogeneous and unknown before deployment. We cast this within tuned prediction-powered inference. Each human label both sharpens the AI-assisted estimate and reveals the task's rectification difficulty, the variance that remains after the AI prediction is optimally used as a control variate. If difficulties were known, the optimal allocation would follow a Neyman square-root rule; because they are unknown, we propose a policy based on upper confidence bounds that learns them online and steers validation toward tasks where AI is least reliable. We prove that the policy's terminal efficiency loss relative to the oracle allocation vanishes as the budget grows. In synthetic experiments and a real digital-twin survey with 68 tasks and over 2000 respondents, it closes most of the gap to the oracle when reliability is heterogeneous, outperforming uniform and epsilon-greedy allocation; on the survey data it also outperforms explore-then-commit pilot designs and cuts uniform's 10-12% gap to 2-6%. The value of AI depends not only on model accuracy but also on the operational policy that targets human oversight where AI errors matter most.
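The Neyman square-root rule mentioned in this abstract can be sketched in a few lines of Python. The sketch uses the textbook form of Neyman allocation under two simplifying assumptions that the abstract does not state: every task carries equal weight and every label costs the same. Under those assumptions each task's share of the budget is proportional to the square root of its residual variance (the "rectification difficulty"). The function name and the example variances are hypothetical.

```python
import math


def neyman_allocation(variances, budget):
    """Split a labeling budget in proportion to sqrt(residual variance).

    Assumes equal task weights and equal per-label cost (illustrative
    simplifications). Returns fractional label counts per task.
    """
    if not variances:
        raise ValueError("need at least one task")
    roots = [math.sqrt(v) for v in variances]
    total = sum(roots)
    if total == 0:
        # No task has residual variance: fall back to a uniform split.
        return [budget / len(variances)] * len(variances)
    return [budget * r / total for r in roots]


# Three tasks whose residual variances are assumed known (oracle setting).
print(neyman_allocation([1.0, 4.0, 9.0], budget=60))
```

In the paper's setting these variances are unknown before deployment, which is why the authors learn them online with upper confidence bounds; the sketch covers only the known-difficulty oracle case.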


2604.20236 2026-06-12 cs.LG 版本更新

Machine Learning-based Two-Stage Graph Sparsification for the Travelling Salesman Problem

基于机器学习的两阶段图稀疏化方法用于旅行商问题

Bo-Cheng Lin, Yi Mei, Mengjie Zhang

发表机构 * Centre for Data Science and Artificial Intelligence（数据科学与人工智能中心）； School of Engineering and Computer Science（工程与计算机科学学院）； Victoria University of Wellington（惠灵顿维多利亚大学）

AI总结提出两阶段方法，先结合α-Nearest和POPMUSIC得到近完美召回率的候选图，再用轻量级分类器修剪单源边，在保持≥99.69%最优边的同时降低37%-47%密度。

详情

AI中文摘要

高性能TSP求解器（如Lin-Kernighan-Helsgaun (LKH)）在\emph{候选图}（为求解器预先选定的边的小子集）中搜索，而不是在完整图上搜索。两种主要的稀疏化启发式方法，$\alpha$-Nearest和POPMUSIC，各自在密度-覆盖率平衡上存在不足：$\alpha$-Nearest密集且召回率稳定，而POPMUSIC更稀疏但其召回率随规模增大而下降。它们的并集在密度上远低于完整图的同时弥补了召回率差距，为进一步缩减留下了空间。现有的基于学习的稀疏化方法在完整图上对边评分，这种方法代价高昂且主要限于欧几里得实例。我们提出了一种两阶段方法，反转了这一逻辑。第一阶段取$\alpha$-Nearest和POPMUSIC的并集，在${\sim}6N$条边上实现近乎完美的召回率。关键在于，并集为每条边标注了其\emph{来源出处}——即它是由$\alpha$-Nearest、POPMUSIC还是两者共同支持的。第二阶段在这些标注边上训练一个轻量级分类器，并修剪得分最低的边。由于双源边几乎总是最优的，学习问题简化为过滤单源子集——这比从头开始对所有$O(N^2)$条边进行分类要容易得多。在四种距离类型、五种空间分布以及50到500的问题规模上，该流程将候选图密度降低了37%-47%，同时保留了${\geq}99.69\%$的最优旅行边，并且在TSP500上以更低的密度达到或超过了近期仅限欧几里得的神经稀疏化方法的覆盖率。

英文摘要

High-performance TSP solvers such as Lin-Kernighan-Helsgaun (LKH) search within a \emph{candidate graph} -- a small subset of edges pre-selected for the solver -- rather than over the complete graph. The two leading sparsification heuristics, $α$-Nearest and POPMUSIC, each fall short of the density-coverage balance: $α$-Nearest is dense with stable recall, while POPMUSIC is sparser but its recall degrades with scale. Their union closes the recall gap while remaining far below the complete graph in density, leaving room for further reduction. Existing learning-based sparsifiers score edges on the complete graph, an approach that is expensive and largely limited to Euclidean instances. We propose a two-stage method that inverts this logic. Stage~1 takes the union of $α$-Nearest and POPMUSIC, achieving near-perfect recall at ${\sim}6N$ edges. Crucially, the union annotates each edge with its \emph{source provenance} -- whether it was endorsed by $α$-Nearest, POPMUSIC, or both. Stage~2 trains a lightweight classifier on these annotated edges and prunes the lowest-scoring ones. Because dual-source edges are almost always optimal, the learning problem reduces to filtering the single-source subset -- a substantially easier task than classifying all $O(N^2)$ edges from scratch. Across four distance types, five spatial distributions, and problem sizes from 50 to 500, the pipeline reduces candidate-graph density by $37$-$47\%$ while retaining ${\geq}99.69\%$ of optimal-tour edges, and matches or exceeds the coverage of recent Euclidean-only neural sparsifiers at lower density at TSP500.
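The two-stage idea in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the edge lists stand in for the outputs of $α$-Nearest and POPMUSIC, `score` is a hypothetical placeholder for the trained classifier, and keeping every dual-source edge unconditionally is a simplification of the paper's pruning step.

```python
def union_with_provenance(alpha_edges, popmusic_edges):
    """Stage 1: union of two candidate sets, tagging each undirected edge
    with 'alpha', 'popmusic', or 'both'."""
    def norm(edge):
        return tuple(sorted(edge))  # treat (u, v) and (v, u) as the same edge

    a = {norm(e) for e in alpha_edges}
    p = {norm(e) for e in popmusic_edges}
    provenance = {}
    for e in a | p:
        if e in a and e in p:
            provenance[e] = "both"
        elif e in a:
            provenance[e] = "alpha"
        else:
            provenance[e] = "popmusic"
    return provenance

def prune(provenance, score, threshold):
    """Stage 2 (simplified): keep dual-source edges, and keep a
    single-source edge only if its score reaches the threshold."""
    return {e for e, src in provenance.items()
            if src == "both" or score(e) >= threshold}

# Hypothetical candidate sets on a 4-node instance.
alpha_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
popmusic_edges = [(1, 0), (2, 1), (0, 2)]
provenance = union_with_provenance(alpha_edges, popmusic_edges)

# Hypothetical classifier scores for the single-source edges.
fake_scores = {(2, 3): 0.9, (0, 3): 0.2, (0, 2): 0.6}
kept = prune(provenance, lambda e: fake_scores.get(e, 0.0), threshold=0.5)
print(sorted(kept))
```

The point of the design is that the classifier only has to decide about the single-source edges, which is a much smaller set than all $O(N^2)$ edges of the complete graph.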


2606.02044 2026-06-12 cs.LG physics.med-ph 版本更新

Realistic noise synthesis reduces bias and improves tissue microstructure estimation with supervised machine learning

真实噪声合成减少偏差并改善有监督机器学习的组织微结构估计

Bradley G. Karat, Maëliss Jallais, Ali R. Khan, Santiago Aja-Fernández, Jelle Veraart, Marco Palombo

AI总结针对扩散MRI中模拟与实测信号噪声不匹配导致的协变量偏移问题，提出真实噪声合成框架，通过引入Rician期望和有效后处理噪声方差，显著降低参数估计偏差并提高精度。

Comments * Shared first author

详情

AI中文摘要

扩散MRI能够无创探测组织微结构，但准确的参数估计受到噪声相关效应的挑战。在基于模拟数据训练的有监督机器学习框架中，模拟信号与采集信号的噪声特性差异引入了一种协变量偏移，导致训练和推理时的输入信号分布不同。我们研究了这种不匹配对微结构参数估计的影响，并提出了一种真实噪声合成（RNS）框架来缓解该问题。RNS将Rician期望和有效后处理噪声方差同时纳入模拟训练信号。Rician期望使用MPPCA估计的噪声标准差建模，而有效标准差则从预处理数据的球谐残差中导出。该方法使用cylinder-zeppelin和SANDI模型在多个SNR水平的模拟数据集以及具有重复采集的体内扩散数据上进行了评估。还评估了对噪声误估计的敏感性。训练过程中忽略幅度诱导的噪声效应会产生系统性的、依赖于SNR的参数偏差，尤其是在低SNR下。引入Rician期望显著降低了偏差，使其达到噪声感知的非线性最小二乘拟合的水平。对有效标准差进行建模进一步提高了精度。性能在很大程度上独立于回归架构，但对准确的噪声估计敏感。这些发现表明，在模拟训练数据中进行真实噪声建模可以减轻信号域的协变量偏移，并且对于无偏的监督微结构估计至关重要，特别是在与高b值或高空间分辨率相关的低SNR区域。

英文摘要

Diffusion MRI enables non-invasive probing of tissue microstructure, but accurate parameter estimation is challenged by noise-related effects. In supervised machine learning frameworks trained on simulated data, discrepancies between the noise characteristics of simulated and acquired signals introduce a form of covariate shift, whereby the input signal distribution differs between training and inference. We investigated the impact of this mismatch on microstructure parameter estimation and propose a realistic noise synthesis (RNS) framework to mitigate it. RNS incorporates both the Rician expectation and the effective post-processing noise variance into simulated training signals. The Rician expectation was modelled using a noise standard deviation estimated with MPPCA, while the effective standard deviation was derived from spherical harmonic residuals of preprocessed data. The method was evaluated using the cylinder-zeppelin and the SANDI models on simulated datasets across multiple SNR levels and on in vivo diffusion data with repeated acquisitions. Sensitivity to noise misestimation was also assessed. Ignoring magnitude-induced noise effects during training produced systematic, SNR-dependent parameter bias, particularly at low SNR. Incorporating the Rician expectation substantially reduced bias to the level of noise-aware nonlinear least-squares fitting. Modelling the effective standard deviation further improved precision. Performance was largely independent of regression architecture but sensitive to accurate noise estimation. These findings demonstrate that realistic noise modelling in simulated training data mitigates signal-domain covariate shift and is essential for unbiased supervised microstructure estimation, particularly in low-SNR regimes associated with high b-values or high spatial resolution.
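To make the "Rician expectation" concrete, the sketch below evaluates the standard closed form for the mean of a Rician-distributed magnitude, E[M] = sigma * sqrt(pi/2) * L_{1/2}(-nu^2 / (2 sigma^2)), where nu is the noise-free signal and sigma the noise standard deviation. It is not the authors' implementation: the function names are hypothetical, the Bessel functions are computed from their power series using only the standard library, and the series is only suitable for moderate SNR (very large nu/sigma would overflow).

```python
import math

def _bessel_i0_i1(z, terms=200):
    """Modified Bessel functions I0(z) and I1(z) from their power series."""
    q = z * z / 4.0
    t0 = s0 = 1.0   # running term and sum for I0
    t1 = s1 = 1.0   # running term and sum for I1 / (z / 2)
    for k in range(terms):
        t0 *= q / ((k + 1) ** 2)
        t1 *= q / ((k + 1) * (k + 2))
        s0 += t0
        s1 += t1
    return s0, (z / 2.0) * s1

def rician_mean(nu, sigma):
    """Expected magnitude of a signal `nu` corrupted by Rician noise `sigma`."""
    x = -nu * nu / (2.0 * sigma * sigma)
    i0, i1 = _bessel_i0_i1(-x / 2.0)
    # Laguerre function L_{1/2}(x) written in terms of I0 and I1.
    laguerre_half = math.exp(x / 2.0) * ((1.0 - x) * i0 - x * i1)
    return sigma * math.sqrt(math.pi / 2.0) * laguerre_half

for nu in (0.0, 1.0, 10.0):
    print(nu, rician_mean(nu, sigma=1.0))
```

At low SNR the expected magnitude sits well above the noise-free signal, which is the kind of SNR-dependent bias the abstract describes when training signals ignore magnitude-induced noise effects.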


2606.10069 2026-06-12 cs.LG physics.geo-ph 版本更新

Using Seismic Statistical Features and VQ-VAE to Improve Spatiotemporal Seismicity Predictability

利用地震统计特征和VQ-VAE提升时空地震活动可预测性

Wei Quan, Denise Gorse

AI总结本文在先前基于XGBoost和地震统计特征的研究基础上，将预测从全区域扩展到局部区域，并引入基于VQ-VAE模型从二维地震图提取的新特征，提升了局部地震预测性能。

Comments Title updated from "Spatiotemporal Seismic Hazard Assessment Using VQ-VAE and Seismic Statistical Features" to "Using Seismic Statistical Features and VQ-VAE to Improve Spatiotemporal Seismicity Predictability" in v2 to better reflect the focus of the paper. The content is unchanged apart from the title and minor copyediting

详情

AI中文摘要

在本文中，我们基于先前的一项研究，该研究使用XGBoost以及日本和智利的地震目录数据证明，一组60个地震统计特征（SSFs）比tsfresh包中的428个通用时间序列特征具有更大的预测价值。我们在此以两种关键方式扩展了先前的工作，重点使用日本的数据，因为需要大数据集来训练深度学习（自编码器）模型。首先，我们从全区域预测（针对每个候选事件，考虑未来15天内区域内任何地方发生M≥5.0事件的可能性）转向局部预测，其中特征计算区域和预测区域都限制在候选事件周围半径24公里的圆内，并且我们表明性能仍然优秀，与先前同一区域的全局研究相似。其次，我们将基于一维（目录）数据的这套经过验证的SSFs与基于二维地震图的新特征相结合，该特征通过训练VQ-VAE模型以输出此类地图，并识别其误差度量与局部地壳应力积累的关系。我们表明，尽管仅基于SSFs的局部预测可以单独有效，测试AUC值与先前日本全局研究中的值一样高，但包含新的原生空间VQ-VAE衍生特征（通过SHAP分析排名最高）可以提升性能，并且似乎几乎完全取代了传统计算的b值在特征使用中的位置。

英文摘要

In this paper we build upon a previous study in which we demonstrated, using XGBoost and earthquake catalogue data from Japan and Chile, that a set of 60 seismic statistical features (SSFs) had much greater predictive value than a set of 428 generic time series features from the tsfresh package. We here extend this previous work in two key ways, focusing on data from Japan as a large dataset is necessary in order to allow for the training of a deep learning (autoencoder) model. First, we move from whole-region prediction (considering, for each candidate event, the likelihood of an event M $\geq$ 5.0 anywhere in the region in the next 15 days) to localised predictions in which both the region of feature computation and the region of prediction are restricted to a circle of radius 24 km around the candidate event, and we show that performance remains excellent, similar to our previous whole-region study for the same area. Second, we here couple this proven set of SSFs, based on one-dimensional (catalogue) data, with a novel feature based on two-dimensional seismic maps, obtained by training a VQ-VAE model to reproduce such maps as output and identifying a measure of its error in doing so with a localised build-up of crustal stress. We show that while localised prediction based on SSFs can be effective alone, with test AUC values as high as those obtained in the case of Japan in our previous whole-region study, the inclusion of the new natively-spatial VQ-VAE-derived feature, top-ranked by SHAP analysis, can enhance performance and additionally appears to near-wholly replace the traditionally-computed $b$-value in terms of feature usage.


2606.11793 2026-06-12 cs.LG cs.AI physics.ao-ph 版本更新

Scalable Deep Learning Framework for Global High-Resolution Land Use Reconstruction

AI4Land: 面向全球高分辨率土地利用重建的可扩展深度学习

Amirpasha Mozaffari, Marina Castaño, Stefano Materia, Etienne Tourigny, Oscar Molina-Sedano, Jordi Varela-Agrelo, Dario Garcia-Gasulla, Miguel Castrillo Melguizo, Mario Acosta, Amanda Duarte

发表机构 * Barcelona Supercomputing Center（巴塞罗那超级计算中心）

AI总结提出AI4Land框架，采用U-Net两阶段方法，结合粗分辨率情景数据与静态地理特征，重建高分辨率年度土地利用与覆盖，减少陆地碳循环不确定性，支持气候模拟。

详情

AI中文摘要

陆地碳循环的不确定性仍是气候预测的主要制约因素，部分源于地球系统模型中陆面表征和变率的不确定性。为解决此问题，我们提出了数据驱动框架AI4Land，用于生成关键陆面变量的高分辨率历史重建和未来预测。该框架采用U-Net架构的两阶段方法。在第一阶段（本文重点），它通过整合粗分辨率情景数据与静态地理特征，重建年度土地利用与土地覆盖。在计划的第二阶段，生成的高分辨率地图将用于在更细时间尺度上预测动态生物物理变量，特别是叶面积指数。模型基于地球观测数据训练，学习再现空间明确且物理一致的陆面模式，并将时间覆盖扩展到缺乏直接观测的时期。AI4Land在MareNostrum5上开发和训练，展示了GPU加速的高性能计算基础设施如何支持全球尺度的气候AI流水线。最终产品是一套开源模拟器，旨在与数字孪生平台（如Destination Earth计划下开发的平台）实时耦合。通过按需提供逼真且演变的陆面条件，本工作旨在减少关键不确定性，提高下一代气候模拟的预测能力。

英文摘要

Uncertainty in the terrestrial carbon cycle remains a major constraint in climate projections, partly driven by the uncertainties affecting the land surface representation and variability in Earth system models. To address this limitation, we present a data-driven framework, AI4Land, for generating high-resolution historical reconstructions and future projections of key land surface variables. The framework follows a two-phase approach using a U-Net architecture. In the first phase, which is the focus of this work, it reconstructs annual land use and land cover by integrating coarse-resolution scenario data with static geophysical features. In a planned second phase, the resulting high-resolution maps will be used to predict dynamic biophysical variables, particularly leaf area index, at finer temporal scales. Trained on Earth observation data, the models learn to reproduce spatially explicit and physically consistent land surface patterns, extending temporal coverage to periods lacking direct observations. AI4Land was developed and trained on MareNostrum5, demonstrating how GPU-accelerated HPC infrastructure enables global-scale climate AI pipelines. The final product is a suite of open-source emulators designed for real-time coupling with digital twin platforms, such as those developed under the Destination Earth initiative. By delivering realistic and evolving land surface conditions on demand, this work aims to reduce critical uncertainties and improve the predictive power of next-generation climate simulations.


2410.00903 2026-06-12 stat.AP cs.CL cs.LG 版本更新

固态合成预测机器学习模型的热力学评估

Jane Schlesinger, Simon Hjaltason, Nathan J. Szymanski, Christopher J. Bartel

发表机构 * University of Minnesota（明尼苏达大学）

AI总结评估了机器学习模型预测固态材料合成可行性的热力学一致性，发现模型普遍高估合成可能性，但部分分数与热力学启发式趋势一致。

详情

AI中文摘要

机器学习模型最近被用于预测假设的固态材料是否可合成。这些模型旨在绕过固态相变的直接第一性原理建模，而是从成功合成材料的大型数据库中学习。在这里，我们评估了几个最近引入的合成预测模型与材料和反应热力学的对齐程度，通过相对于凸包的能量和考虑枚举合成反应的热力学选择性的度量来量化。使用成功合成配方的数据集确定了这两个量的可能界限，超出该界限的材料被认为不太可能被合成。以这些界限为背景，使用CHGNet基础势计算了通过Chemeleon生成模型生成的数千种新假设材料的热力学量。将四个最近发表的用于可合成性预测的机器学习模型应用于同一数据集，并将所得预测与计算的热力学进行比较。我们发现这些模型普遍高估了合成的可能性，但一些模型分数确实与热力学启发式趋势一致，对稳定性较差或没有计算为热力学选择性的可用合成配方的材料分配较低的分数。总的来说，这项工作识别了机器学习模型在材料合成中存在的差距，并引入了一种在缺乏大量负例（失败合成）的情况下评估其质量的新方法。

英文摘要

Machine learning models have recently emerged to predict whether hypothetical solid-state materials can be synthesized. These models aim to circumvent direct first-principles modeling of solid-state phase transformations, instead learning from large databases of successfully synthesized materials. Here, we assess the alignment of several recently introduced synthesis prediction models with material and reaction thermodynamics, quantified by the energy with respect to the convex hull and a metric accounting for thermodynamic selectivity of enumerated synthesis reactions. A dataset of successful synthesis recipes was used to determine the likely bounds on both quantities beyond which materials can be deemed unlikely to be synthesized. With these bounds as context, thermodynamic quantities were computed using the CHGNet foundation potential for thousands of new hypothetical materials generated using the Chemeleon generative model. Four recently published machine learning models for synthesizability prediction were applied to this same dataset, and the resultant predictions were considered against computed thermodynamics. We find these models generally overpredict the likelihood of synthesis, but some model scores do trend with thermodynamic heuristics, assigning lower scores to materials that are less stable or do not have an available synthesis recipe that is calculated to be thermodynamically selective. In total, this work identifies existing gaps in machine learning models for materials synthesis and introduces a new approach to assess their quality in the absence of extensive negative examples (failed syntheses).


2605.12542 2026-06-12 astro-ph.IM astro-ph.EP cs.LG 版本更新

基于深度学习的代数雷诺应力闭合模型用于湍流RANS模拟

Daniel Dehtyriov, Jonathan F. MacArt, Justin Sirignano

发表机构 * Mathematical Institute, University of Oxford（牛津大学数学研究所）； Aerospace and Mechanical Engineering, University of Notre Dame（圣母大学航空航天与机械工程系）

AI总结提出一种物理驱动的深度学习闭合模型DARSM，通过神经网络映射流动不变量到隐式代数雷诺应力方程中的经验参数，并结合伴随方程实现端到端优化，在方形管道和周期性山丘基准测试中平均速度误差降低2-4倍。

详情

AI中文摘要

湍流在工程和科学中普遍存在，但直接模拟成本过高。雷诺平均纳维-斯托克斯（RANS）方程可节省超过十个数量级的计算量，但引入了未封闭项（封闭问题）。离线训练的机器学习（ML）闭合模型在预测模拟中会出现分布偏移，而绕过控制方程的ML方法难以从稀缺的高保真数据中泛化。我们开发了一种基于物理的深度学习RANS闭合模型——深度代数雷诺应力模型（DARSM），该模型可在小数据集上训练，并准确泛化到不同雷诺数、未见几何形状和不同流动状态。神经网络将流动不变量映射到隐式代数雷诺应力方程中的经验参数，该方程基于弱平衡假设从雷诺应力输运方程推导而来，为ML闭合施加了基于物理的结构。通过控制偏微分方程和耦合隐式闭合的端到端优化消除了分布偏移，但展开和隐式自动微分在刚性耦合求解器上均失败。我们推导了利用求解器隐式-显式结构的伴随方程，以实现高效优化。在标准方形管道和周期性山丘基准测试中，DARSM将基线RANS的平均测试速度误差降低了2-4倍（跨雷诺数、几何形状和流动状态），峰值案例级降低达12倍。在附着、各向异性主导的流动（方形管道）上训练的模型无需重新训练即可准确泛化到分离流动（周期性山丘），这是底层物理状态的改变。DARSM还优于五种已建立的ML方法：离线训练、张量基神经网络、场反演机器学习、DeepONet和物理信息神经网络。

英文摘要

Turbulence is ubiquitous in engineering and science, yet direct simulation is prohibitively expensive. The Reynolds-averaged Navier-Stokes (RANS) equations provide savings exceeding ten orders of magnitude but introduce unclosed terms (the closure problem). Offline-trained machine-learning (ML) closures suffer distribution shift in predictive simulations, while ML methods that bypass the governing equations struggle to generalise from scarce high-fidelity data. We develop a physics-derived deep learning closure model for RANS, the Deep Algebraic Reynolds Stress Model (DARSM), which can be trained on small datasets and accurately generalise across Reynolds numbers, to unseen geometries, and to different flow regimes. A neural network maps flow invariants to empirical parameters in an implicit algebraic Reynolds stress equation, derived from the Reynolds stress transport equations under the weak-equilibrium assumption, imposing physics-based structure on the ML closure. End-to-end optimisation through the governing PDEs and the coupled implicit closure eliminates distribution shift, but both unrolled and implicit automatic differentiation fail on the stiff coupled solver. We derive adjoint equations that exploit the solver's implicit-explicit structure for efficient optimisation. On canonical square-duct and periodic-hill benchmarks, DARSM reduces average test velocity error over baseline RANS by $2$-$4\times$ across Reynolds number, geometries, and flow regimes, with peak case-level reductions of $12\times$. The model trained on attached, anisotropy-dominated flows (square duct) accurately generalises without retraining to separated flows (periodic hills), a regime change in the underlying physics. DARSM also outperforms five established ML methods: offline training, tensor-basis neural networks, field-inversion machine learning, DeepONets, and physics-informed neural networks.


2606.02778 2026-06-12 astro-ph.EP astro-ph.IM cs.LG 版本更新

One Transit Is All You Need: Detecting Exoplanets Through Learned Stellar Behaviour with EXOVEIL

一次凌星足矣：通过EXOVEIL学习恒星行为检测系外行星

Pratik Priyanshu

发表机构 * SRH Hochschule（SRH 高校）

AI总结提出EXOVEIL系统，利用Transformer世界模型和自监督学习从原始光变曲线中检测单次凌星事件，在Kepler数据上实现高召回率，并零样本迁移至TESS和PLATO任务。

Comments v3: appendix gallery of confirmed-planet recoveries added; Section 6 candidate catalogue reframed as transit-like anomalies for follow-up; TLS comparison table expanded

详情

AI中文摘要

我提出EXOVEIL，一个凌星检测系统，它学习恒星亮度应有的样子，并在现实不符时发出标记。与需要相位折叠输入的现有系统不同，EXOVEIL在原始通量时间序列上运行，可以检测仅凌星一次的行星。一个Transformer世界模型，在16,499条Kepler光变曲线上通过凌星掩蔽自监督学习训练，预测预期的恒星通量。一个带有方差加权的匹配滤波检测器从预测残差中提取凌星信号。一个学习分类器（XGBoost）将行星与假阳性区分开，在Kepler DR25上达到AUC 0.938。应用于单次凌星注入-恢复，EXOVEIL在1000 ppm深度下恢复了32%的凌星——而所有基于分类的系统由于设计原因得分为0%。对3,737颗Kepler恒星进行盲搜索，发现了179个新的凌星类信号，这些信号不在DR25 TCE目录中，包括46个单次凌星候选者。无需重新训练，应用于PLATO LOPS2场中的47颗已确认TESS行星，EXOVEIL实现了100%的恢复，展示了零样本跨任务迁移。在PLATO的25秒曝光下，检测达到100 ppm——接近地球类似物范围。我提供了共形预测在凌星检测中的首次应用（95.9%经验覆盖率），并发布了该系统，可通过pip install exoveil安装，包含预训练权重和候选目录。

英文摘要

I present EXOVEIL, a transit detection system that learns what a star's brightness should look like and flags when reality disagrees. Unlike existing systems that require phase-folded input, EXOVEIL operates on raw flux time series and can detect planets that transit only once. A Transformer world model, trained on 16,499 Kepler light curves with transit-masked self-supervised learning, predicts expected stellar flux. A matched-filter detector with variance weighting extracts transit signals from the prediction residuals. A learned classifier (XGBoost) separates planets from false positives, achieving AUC 0.938 on Kepler DR25. Applied to single-transit injection-recovery, EXOVEIL recovers 32% of transits at 1000 ppm depth, a task where all classification-based systems score 0% by construction. A blind search of 3,737 Kepler stars yields 179 new transit-like signals not present in the DR25 TCE catalogue, including 46 monotransit candidates. Applied without retraining to 47 confirmed TESS planets in the PLATO LOPS2 field, EXOVEIL achieves 100% recovery, demonstrating zero-shot cross-mission transfer. At PLATO's 25-second cadence, detection reaches 100 ppm -- approaching the Earth-analog regime. I provide the first application of conformal prediction to transit detection (95.9% empirical coverage) and release the system as pip install exoveil with pretrained weights and a candidate catalogue.


2606.09855 2026-06-12 cs.MM cs.CV cs.LG 版本更新

MinhwaNet: Faithful but Insufficient Object Grounding in Korean Folk Painting

MinhwaNet: 韩国民俗画中忠实但不足的对象定位

Joonhyung Bae

发表机构 * Korea Advanced Institute of Science and Technology (KAIST)（韩国科学技术院）

AI总结提出MinhwaNet，通过部分级检测器生成对象证据图，发现韩国民俗画中符号列表不足以预测画作类型，而符号布局更重要，揭示了忠实但不足的解离现象。

详情

AI中文摘要

韩国民俗画（minhwa）由少量吉祥符号构成——老虎代表保护、一对鸟代表婚姻和谐、牡丹代表财富——这些符号在其许多绘画类型中反复出现。这暗示了一种直观的计算方法：识别画作中出现的符号，并从符号清单中读取画作类型。我们使用一个公开语料库，包含整幅画作、八字段双语策展说明以及一组独立的专家对象裁剪图，发现这种方法并不奏效。仅给定画作包含的符号列表的模型，其预测画作类型的效果远不如将图像与策展文本融合的模型，而强制类型表示基于对象定位反而会损害准确性。然而，类型预测所依赖的视觉证据仍然是局部化的且可检查的。从部分级检测器投影出的无泄漏对象证据图，在空间上忠实于策展人隔离符号对象的位置以及基于补丁的替代模型的梯度显著性。我们将这种配置称为忠实但不足的解离。部分级解释诚实地反映了部分级模型所见，但类型目标取决于符号的排列方式而非出现的符号。相同的视角区分了内容标签（在转移到保留的源机构时仍然有效，即类型）和风格标签（无效，即时代），我们通过语料库中的另外两个标签验证了这一预测。我们发布了多模态系统、一幅画作的证据图与其目录的工作示例解读，以及在长尾遗产收藏中反复出现的一系列评估注意事项。

英文摘要

Korean folk painting (minhwa) is built from a small vocabulary of auspicious symbols, a tiger for protection, a pair of birds for marital harmony, a peony for wealth, that recur across many of its painted genres. This suggests an obvious computational approach, identify which symbols appear in a painting and read the genre from the inventory. Working with a public corpus that pairs whole paintings, eight-field bilingual curatorial captions, and a separate set of expert object crops, we find that this approach does not work. A model given only a list of which symbols a painting contains predicts the genre far worse than a model that fuses the image with the curatorial text, and forcing the genre representation to be object-grounded actively hurts accuracy. The visual evidence on which the genre prediction rests is nonetheless localized and inspectable. A leakage-safe object evidence map projected from a part-level detector is spatially faithful to where curators isolated symbolic objects and to a patch-based surrogate's own gradient saliency. We name this configuration a faithful-but-insufficient dissociation. The part-level explanation is honest about what the part-level model sees, yet the genre target turns on how symbols are arranged rather than on which ones appear. The same lens separates a content label that survives transfer to held-out source institutions, genre, from a style label that does not, era, a prediction we confirm on two further labels in the corpus. We release the multimodal system, a worked-example reading of one painting's evidence map against its catalogue, and a set of evaluation cautions that recur in long-tailed heritage collections.


2606.10200 2026-06-12 cs.CV cs.AI cs.LG 版本更新

An Improved Generative Adversarial Network for Micro-Resistivity Imaging Logging Restoration

一种改进的生成对抗网络用于微电阻率成像测井恢复

Ahmed Faizul Haque, S. M. Riaz Rahman Antu, Saif Ahmed, Asadullah Hil Galib, Souvik Pramanik, Mohammad Ashrafuzzaman Khan, Mohammad Abdul Qayum, Mohsin Sajjad

AI总结提出基于改进GAN的成像测井图像恢复方法，通过FCN生成网络、深度可分离卷积残差块、Inception模块及多尺度特征提取与空间注意力机制，结合全局与局部判别网络，有效恢复缺失区域，结构相似性达0.903。

Comments Mistakes in citations and references. Further we want to submit in conference with improved experiments and results

详情

AI中文摘要

本文提出了一种改进的基于GAN的成像测井图像恢复方法，用于解决微电阻率成像测井图像部分缺失的问题。该方法采用FCN作为生成网络基础设施，并添加深度可分离卷积残差块以学习和保留更有效的像素与语义信息；添加Inception模块以增加网络的多尺度感知场并减少参数数量；添加多尺度特征提取模块和空间注意力残差块，结合通道注意力机制与残差块实现多尺度特征提取。设计了全局判别网络和局部判别网络，通过相互对抗与生成网络逐步提高恢复部分与整体图像之间的内容和语义结构一致性。实验结果表明，测试集中五组不同大小缺失区域的成像测井图像的平均结构相似性度量为0.903，相比其他类似方法提高了约0.3。研究表明，该方法可用于微电阻率成像测井图像的恢复，在语义结构一致性和纹理细节方面有良好改善，从而为保障微电阻率成像测井图像后续解释的顺利进行提供了一种新的深度学习方法。

英文摘要

An improved GAN-based image restoration method is presented in this paper for solving the problem of partially missing micro-resistivity imaging logging images. The method uses an FCN as the backbone of the generative network and adds a depthwise separable convolutional residual block to learn and retain more effective pixel and semantic information; an Inception module is added to enlarge the multi-scale receptive field of the network and reduce its number of parameters; and a multi-scale feature extraction module and a spatial attention residual block are added, which combine the channel attention mechanism with the residual block to achieve multi-scale feature extraction. A global discriminative network and a local discriminative network are designed to gradually improve the content and semantic structure coherence between the restored parts and the whole image by competing adversarially with the generative network. According to the experimental results, the average structural similarity measure over the five sets of imaging logging images with different sizes of missing regions in the test set is 0.903, which is an improvement of about 0.3 compared with other similar methods. The results show that the method can be used for the restoration of micro-resistivity imaging logging images, with good improvement in semantic structural coherence and texture details, thus providing a new deep learning method to support the subsequent interpretation of micro-resistivity imaging logging images.


2606.11240 2026-06-12 physics.comp-ph cond-mat.str-el cs.LG quant-ph 版本更新

Physically Constrained Ensemble Gaussian Process Modelling for Expensive Quantum Systems with Heteroskedastic Noise

物理约束集成高斯过程建模用于具有异方差噪声的昂贵量子系统

Arpan Biswas, Sutirtha Paul, Joseph Agada, Matthias Thamm, Adrian Del Maestro

AI总结提出物理约束集成高斯过程框架，通过加权惩罚和数值积分集成多个GP代理，高效建模含异方差噪声的量子系统，在Bose-Hubbard模型和纳米孔硅酸盐量子液体模拟中实现更准确且物理合理的预测。

Comments 14 pages, 6 figures in main text, 2 figures in Supp materials

详情

AI中文摘要

精确建模量子多体系统通常需要计算昂贵的模拟，如密度矩阵重正化群（DMRG）或量子蒙特卡洛（QMC）计算。这些方法虽然精确，但会带来显著的时间和资源限制，限制了它们在详尽参数探索中的应用。此外，这些昂贵模拟在大的未知参数空间内可能包含可变误差，需要量化和传播。因此，需要预测建模来准确估计稀疏采样数据（具有异方差噪声）的函数空间，同时保持估计的物理相关性。为此，我们提出了物理约束集成高斯过程（pc-EGP）框架，旨在物理一致性约束下高效建模复杂且含噪声的量子系统。该方法首先将物理约束作为用户控制的加权惩罚项，施加到高斯过程（GP）代理的数据驱动损失函数中。然后，通过数值求积方法训练一组这样的GP模型，其中多个不同节点上的GP通过求积加权平均进行集成。我们首先在合成生成数据上演示该框架，然后应用于量子系统。在第一个案例研究中，我们利用Bose-Hubbard模型的DMRG模拟来预测控制超流-莫特绝缘体转变的临界相互作用参数Uc。在第二个案例研究中，我们展示了该方法在QMC模拟上的应用，模拟限制在纳米孔硅酸盐内的量子液体，目标是优化化学环境以实现一维超流。与传统GP相比，pc-EGP在准确性和物理有意义的预测之间实现了更好的平衡。

英文摘要

Accurate modeling of quantum many-body systems often requires computationally expensive simulations such as Density Matrix Renormalization Group (DMRG) or Quantum Monte Carlo (QMC) calculations. These methods, while precise, impose significant time and resource constraints, limiting their use in exhaustive parameter exploration. Moreover, these expensive simulations can contain variable errors over the large unknown parameter space, which need to be quantified and propagated. Thus, predictive modelling is required to estimate the functional space accurately from sparsely sampled data with heteroskedastic noise, while preserving the physical relevance of the estimation. Therefore, we present a Physically Constrained Ensemble Gaussian Process (pc-EGP) framework designed to efficiently model complex and noisy quantum systems under physical consistency constraints. The proposed method first enforces physical constraints as a user-controlled weighted penalty added to the data-driven loss function of the Gaussian Process (GP) surrogates. Then an ensemble of such GP models is trained on simulations with variable noise via a numerical quadrature method, in which the multiple GPs at different nodes are integrated as a quadrature-weighted average. We first demonstrate the framework on synthetically generated data before applying it to quantum systems. In the first case study, we leverage DMRG simulations of the Bose-Hubbard Model to predict the critical interaction parameter Uc governing the superfluid-to-Mott-insulator transition. In the second case study, we demonstrate our method on QMC simulations of a quantum liquid confined inside a nanoporous silicate, with the goal of optimizing a chemical environment to realize a one-dimensional superfluid. Compared to conventional GP, pc-EGP achieves a better balance of accuracy and physically meaningful predictions.


2606.12610 2026-06-12 cs.LG 新提交

The Mathematics of AI Winters: The mathematical Taxonomy of Paradigm Fragility in AI Winter

AI寒冬的数学：AI中范式脆弱性的数学分类

Miquel Noguer i Alonso, David Pacheco Aznar

发表机构 * AIFI ； Staq.io

AI总结本文提出AI寒冬的数学解释，通过感知机不可能性、神经网络训练复杂度、高维非参数估计率、梯度消失和统计学习理论等数学瓶颈，分析早期AI范式失败的原因，并关联后续突破。

Comments 33 pages, 1 figure

详情

AI中文摘要

人工智能研究中两个主要的资金减少和信心下降时期，通常被称为第一次和第二次AI寒冬，通常被解释为工程失败、商业失望和预期膨胀。本文提出一个补充论点：这些时期的主导范式也遇到了真正的形式障碍，包括表示、优化、计算复杂性、统计可学习性和高维近似的限制。贡献是综合性的而非档案性的。我们并不声称特定定理机械地导致了寒冬；相反，我们表明早期AI的几个核心失望与数学上精确的瓶颈相一致。我们通过Minsky和Papert的感知机不可能结果、Blum和Rivest建立的精确神经网络训练的计算复杂性困难、Stone的高维非参数估计的极小化极大率、Hochreiter以及Bengio及其合作者的梯度消失分析，以及Vapnik和Chervonenkis、Valiant、Blumer及其合作者传统的经典统计学习理论来分析这些瓶颈。然后我们将这些障碍与后来缓解（而非消除）它们的突破联系起来。

英文摘要

Two major periods of reduced funding and confidence in artificial intelligence research, commonly called the first and second AI winters, are usually explained through engineering failure, commercial disappointment, and inflated expectations. This article develops a complementary thesis: that the dominant paradigms of those periods also met genuine formal barriers, including limitations of representation, optimisation, computational complexity, statistical learnability, and high-dimensional approximation. The contribution is synthetic rather than archival. We do not claim that particular theorems mechanically caused the winters; rather, we show that several central disappointments of early AI were aligned with mathematically precise bottlenecks. We analyse these bottlenecks through the perceptron impossibility results of Minsky and Papert, the complexity-theoretic hardness of exact neural-network training established by Blum and Rivest, minimax rates for nonparametric estimation in high dimension due to Stone, vanishing-gradient analyses by Hochreiter and by Bengio and collaborators, and classical statistical learning theory in the tradition of Vapnik and Chervonenkis, Valiant, and Blumer and collaborators. We then relate these barriers to the later breakthroughs that mitigated, rather than eliminated, them.


2606.12683 2026-06-12 cs.AI cs.CY cs.LG 交叉投稿

From AGI to ASI

从AGI到ASI

Tim Genewein, Matija Franklin, Alexander Lerchner, Laurent Orseau, Samuel Albanie, Adam Bales, Cole Wyeth, Stephanie Chan, Iason Gabriel, Joel Z. Leibo, Allan Dafoe, Marcus Hutter, Thore Graepel, Shane Legg

发表机构 * Google DeepMind（谷歌深度思维）； University of Waterloo（滑铁卢大学）； Australian National University（澳大利亚国立大学）； University College London（伦敦大学学院）

AI总结探讨从人类级通用人工智能到超级智能的转变路径，包括扩展、范式转变、递归改进和多智能体涌现，并分析摩擦与瓶颈。

详情

AI中文摘要

在过去十年中，构建人类级通用人工智能已从遥不可及的猜测转变为许多大型AI组织未来十年的具体目标。实现这一目标将对人类社会产生深远影响，并引发未来十年的诸多复杂问题。本报告研究在机器智能连续体中，AI如何在后AGI世界中继续发展。该连续体的终点——通用AI——在理论上已被充分理解，这为本报告的主要焦点提供了形式基础：从人类级AGI向人工通用超级智能的转变，直观上可理解为比大型人类组织更智能、认知能力更强的系统。在描述ASI后，报告讨论了从AGI到ASI的四条潜在路径：扩展AGI、AI范式转变、递归改进以及从大规模多智能体集体中涌现ASI。随后，报告讨论了这些路径上可能的摩擦和瓶颈。确定这些摩擦的影响是微不足道还是重大，提出了若干具体的开放研究问题。由于预测ASI进展存在巨大不确定性，不能排除AI进展在未来几年继续加速的可能性。这可能意味着由人类级AGI引入社会所导致的单一变革性步骤的形象可能不准确。更恰当的前景可能是由AI在科学和技术的多个领域引发的进步和突破所导致的一系列变革性社会变化。为这一前景做准备需要全球范围内的大规模跨学科努力。

英文摘要

Over the last decade, building human-level artificial general intelligence has moved from far-fetched speculation to being a concrete next-decade target for many of the largest AI organisations. Achieving this goal would have profound and far-reaching impacts on human society, which raises many complex questions for the decade ahead. This report investigates how AI itself might continue to develop in a post-AGI world along the continuum of machine intelligence. The endpoint of this continuum, Universal AI, is theoretically well understood, which provides some formal grounding for the main focus of this report: the transition from human-level AGI to artificial general superintelligence, which, intuitively, can be understood as a system that is more intelligent and cognitively capable than large organisations of humans. After characterizing ASI, the report discusses four potential pathways from AGI to ASI: scaling AGI, AI paradigm shifts, recursive improvement, and ASI emerging from large-scale multi-agent collectives. The report then discusses possible frictions and bottlenecks along these pathways. Determining whether the impact of these frictions will be negligible or substantial raises a number of concrete open research questions. Due to large uncertainties for predicting ASI progress, it cannot be ruled out that AI progress might continue to accelerate over the next years. This could imply that the image of a single transformative step change, caused by the introduction of human-level AGI into our society, could be inaccurate. More apt might be the prospect of a series of transformative societal changes caused by AI-enabled progress and breakthroughs across many areas of science and technology. Preparing for this prospect requires a massively interdisciplinary endeavour of global scope and interest.


2606.12709 2026-06-12 cs.MA cs.CR cs.LG cross-list

Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows

Timothy McAllister, Sina Abdidizaji, Ivan Garibay, Ozlem Ozmen Garibay

AI summary: Studies how model scale affects the security of linear multi-agent workflows. Larger models are more likely to carry out malicious instructions, but a lightweight Fixer stage restores performance, suggesting that linear structures are robust when suitably corrected.

Comments 16 pages (4 are main text), 2 figures, 6 tables. Accepted to the AIWILD Workshop at ICML 2026

Abstract

As LLM-based multi-agent systems (MAS) are deployed in the wild, the resilience of their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model scaling and system-level resilience remains poorly understood. This paper investigates how model scale affects the security of linear multi-agent workflows. Our experiments across scales of two open-weight model families on the HumanEval benchmark reveal a compliance-correction symmetry: larger models are far more likely to faithfully execute malicious instructions, with the control-to-malicious performance drop reaching 53.7pp at 27B in uncorrected pipelines. However, appending a lightweight terminal Fixer stage collapses this to 0.6pp and restores statistical parity with control-level performance, demonstrating that strictly linear collaboration structures can be viable and resilient to adversaries at this scale, and suggesting that the brittleness previously attributed to linear topology may stem from a lack of correction.

2606.13422 2026-06-12 quant-ph cs.LG physics.flu-dyn cross-list

Foundations of Practical Quantum Advantage in Quantum-Informed Machine Learning for Predicting Chaos

Maida Wang, Xiao Xue, Minh Chung, Peter V. Coveney

Affiliations: Centre for Computational Science, University College London; Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities; Centre for Advanced Research Computing, University College London

AI summary: Proposes a quantum-advantage mechanism based on higher-order quantum statistical priors, proves a quantum-classical separation in copy-measurement complexity through a two-stage advantage (representation and extraction), and demonstrates it on turbulence and weather forecasting.

Abstract

We develop theoretical foundations for a practical quantum-advantage mechanism in quantum-informed machine learning for chaotic dynamical systems. A family of k-indexed higher-order quantum statistical priors (Q-Priors) hosts the k-point marginal of the invariant measure on n_q = kq qubits, extending the single-site construction of prior work. We prove a two-stage advantage. In the representation stage, superposition and entanglement compactly store non-factorisable spatial correlations of the invariant measure on n_q qubits. In the extraction stage, joint Bell measurements on two copies estimate any post hoc Pauli functional with a copy-pair count independent of n_q, whereas any adaptive single-copy protocol for the corresponding full-Pauli read-out requires Omega(2^(n_q)) copies; this is a provable quantum-classical separation in copy-measurement complexity. The two-copy read-out is realised in simulation and on IQM superconducting processors. Two case studies instantiate the mechanism in workflows of independent scientific value: a turbulent channel-flow study in which the two-copy read-out yields a named non-diagonal correlator of the invariant measure (the velocity-direction coherence), and a medium-range weather forecasting workflow on the European Centre for Medium-Range Weather Forecasts ERA5 reanalysis in which the diagonal k <= 2 Q-Prior steers a Koopman rollout, improves anomaly-correlation skill by 10-39% across 48-240 h lead times, and reduces the long-horizon collapse of rollouts onto a static mean field. The two conditions of our practical-advantage definition are met at complementary levels, identifying a candidate route to practical quantum advantage before fault-tolerant hardware.

2606.13454 2026-06-12 physics.optics cond-mat.dis-nn cs.ET cs.LG cross-list

Optical Implementation of Equilibrium Propagation Using Spatial Photonic Ising Machines

Dimitri Vanden Abeele, Daniele Veraldi, Davide Pierangeli, Claudio Conti, Serge Massar

Affiliations: Laboratoire d’Information Quantique, Université Libre de Bruxelles (ULB); Dipartimento di Fisica, Sapienza Università di Roma

AI summary: Proposes an optical implementation of equilibrium propagation using spatial photonic Ising machines, encoding neuron states and trainable patterns via a gauge-transformation method, and demonstrates the feasibility of an energy-efficient physical implementation on the Wine and MNIST datasets.

2505.20076 2026-06-12 cs.LG updated version

ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior

Florian Eichin, Yupei Du, Philipp Mondorf, Maria Matveev, Barbara Plank, Michael A. Hedderich

Affiliations: University of Michigan

AI summary: Proposes ExPLAIND, a framework that unifies attribution to model components, data, and training trajectory and supports explanations across granularities. It derives parameter-wise and step-wise influence scores from gradient path kernels by treating AdamW-trained models as kernel machines, and examines Grokking in a Transformer and a two-phase dynamic in EuroLLM pretraining.

Comments published at ICML 2026, code at https://github.com/mainlp/explaind

Abstract

Post-hoc interpretability methods typically attribute a model's behavior to its components, data, or training trajectory in isolation, and are often tied to a particular level of granularity along the local-to-global spectrum. This leads to explanations that lack a unified view and may miss key interactions. We present ExPLAIND, a theoretically grounded, unified framework that integrates model components, data, and training trajectory while supporting explanations across granularities. We generalize recent work on gradient path kernels, reformulating models trained by AdamW as kernel machines. From the resulting kernel feature maps, we derive novel parameter-wise and step-wise influence scores. We empirically validate the resulting decomposition of model behavior in several settings and apply ExPLAIND to two case studies. Our findings on a Transformer exhibiting Grokking support previously proposed learning phases, while refining the final phase as one in which outer layers align around a representation pipeline learned after memorization. For EuroLLM pretraining, ExPLAIND reveals a two-phase dynamic, with the first characterized by outer-layer MLP learning and the second by increased relative influence of intermediate attention layers. These results establish ExPLAIND as a unified framework for interpreting model behavior and training dynamics.

2508.04427 2026-06-12 cs.LG cs.AI updated version

Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models

Md Raisul Kibria, Sébastien Lafond, Janan Arslan

Affiliations: University of Science and Technology of China

AI summary: A systematic review of research on multimodal model explainability from 2020 to early 2024. Most work concentrates on vision-language and language-only models, with attention-based techniques the main explanation method, but evaluation lacks systematic rigor and robustness; the review proposes recommendations for improvement.

DOI: 10.1016/j.inffus.2026.104405

Abstract

Multimodal learning has witnessed remarkable advancements in recent years, particularly with the integration of attention-based models, leading to significant performance gains across a variety of tasks. Parallel to this progress, the demand for explainable artificial intelligence (XAI) has spurred a growing body of research aimed at interpreting the complex decision-making processes of these models. This systematic literature review analyzes research published between January 2020 and early 2024 that focuses on the explainability of multimodal models. Framed within the broader goals of XAI, we examine the literature across multiple dimensions, including model architecture, modalities involved, explanation algorithms and evaluation methodologies. Our analysis reveals that most studies are concentrated on vision-language and language-only models, with attention-based techniques being the most commonly employed for explanation. However, these methods often fall short in capturing the full spectrum of interactions between modalities, a challenge further compounded by the architectural heterogeneity across domains. Importantly, we find that evaluation methods for XAI in multimodal settings are largely non-systematic, lacking consistency, robustness, and consideration for modality-specific cognitive and contextual factors. To address these gaps, we not only synthesize findings from the surveyed works but also incorporate a complementary analysis that integrates recent and emerging advances driving multimodal explainability. Based on these insights, we provide a comprehensive set of recommendations aimed at promoting rigorous, transparent, and standardized evaluation and reporting practices in multimodal XAI research. Our goal is to support future research in more interpretable, accountable, and responsible multimodal AI systems, with explainability at their core.

2508.14143 2026-06-12 cs.LG q-bio.NC updated version

SPLIT: Separating Physical Contact via Latent Arithmetic for Image-Based Tactile Sensors

Wadhah Zai El Amri, Nicolás Navarro-Guerrero

Affiliations: Leibniz Universität Hannover, L3S Research Center

AI summary: Proposes SPLIT, which uses latent space arithmetic to disentangle contact geometry from sensor optical properties, enabling efficient tactile sensor simulation with transfer across sensors and bidirectional simulation.

Comments Accepted to Elsevier Robotics and Autonomous Systems Journal

DOI: 10.1016/j.robot.2026.105498

Abstract

Training machine learning models for robotic tactile sensing requires vast amounts of data, yet obtaining realistic interaction data remains a challenge due to physical complexity and variability. Simulating tactile sensors is thus a crucial step in accelerating progress. This paper presents SPLIT, a novel method for simulating image-based tactile sensors, with a primary focus on the DIGIT sensor. Central to our approach is a latent space arithmetic strategy that explicitly disentangles contact geometry from sensor-specific optical properties. Unlike methods that require recalibration for every new unit, this disentanglement allows SPLIT to adapt to diverse DIGIT backgrounds and even transfer data to distinct sensors like the GelSight R1.5 without full model retraining. Beyond this adaptability, our approach achieves faster inference speeds than existing alternatives. Furthermore, we provide a calibrated finite element method (FEM) soft-body mesh simulation with variable resolution, offering a tunable trade-off between speed and fidelity. Additionally, our algorithm supports bidirectional simulation, allowing for both the generation of realistic images from deformation meshes and the reconstruction of meshes from tactile images. This versatility makes SPLIT a valuable tool for accelerating progress in robotic tactile sensing research.

2604.08581 2026-06-12 cs.LG updated version

Fully Autonomous Z-Score-Based TinyML Anomaly Detection on Resource-Constrained MCUs Using Power Side-Channel Data

Abdulrahman Albaiz, Fathi Amsaad

AI summary: Presents a fully autonomous TinyML Z-Score anomaly detection system running on a low-power microcontroller, which monitors appliance behavior in real time from power side-channel data without external computation or connectivity.

Comments SaTC 2026 Conference

DOI: 10.1109/SATC69565.2026.11542250
Journal ref: Proc. IEEE 2nd International Conference on Secure IoT, Assured and Trusted Computing (SATC), Houston, TX, USA, 2026, pp. 1-6

Abstract

This paper presents a fully autonomous Tiny Machine Learning (TinyML) Z-Score-based anomaly detection system deployed on a low-power microcontroller for real-time monitoring of appliance behavior using power side-channel data. Unlike existing Internet of Things (IoT) anomaly detection approaches that rely on offline training or cloud-assisted analytics, the proposed system performs both model training and inference directly on a resource-constrained microcontroller without external computation or connectivity. The system continuously samples current consumption, computes Root Mean Square (RMS) values on-device, and derives statistical parameters during an initial training phase. Anomalies are detected using lightweight Z-Score thresholds, enabling interpretable and computationally efficient inference suitable for embedded deployment. The architecture was implemented on an STM32-based platform and evaluated using a 14-day dataset collected from a household mini-fridge under normal operation and controlled anomaly conditions. Results demonstrate perfect detection performance, with Precision and Recall of 1.00, inference latencies on the order of tens of microseconds, and a total memory footprint of approximately 3.3 KB SRAM and 63 KB Flash. These results confirm that robust and fully autonomous TinyML anomaly detection can be achieved on low-cost microcontrollers. Future work includes extending the framework to incorporate additional lightweight models and multi-device learning scenarios.
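The train-then-threshold scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' firmware: the class name `ZScoreDetector`, the window representation, and the threshold of 3.0 are assumptions made for illustration, and the abstract does not state the actual window length or threshold.

```python
import math
import statistics


def rms(samples):
    """Root mean square of one window of current samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class ZScoreDetector:
    """Learns the mean and standard deviation of RMS values during a
    training phase, then flags windows whose |z| exceeds a threshold."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # assumed value, not taken from the paper
        self.mean = None
        self.std = None

    def fit(self, training_windows):
        # Training phase: derive the statistical parameters from normal data.
        values = [rms(w) for w in training_windows]
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values)

    def z_score(self, window):
        value = rms(window)
        if self.std == 0:
            # Degenerate training data: any deviation counts as infinite.
            return 0.0 if value == self.mean else math.inf
        return (value - self.mean) / self.std

    def is_anomaly(self, window):
        return abs(self.z_score(window)) > self.threshold
```

Calling `fit` on windows recorded during normal operation and then `is_anomaly` on each new window mirrors the two phases in the abstract; on a microcontroller the same arithmetic would typically be done incrementally rather than on stored lists.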

2603.27393 2026-06-12 cs.LG updated version

K-Means Based TinyML Anomaly Detection and Distributed Model Reuse via the Distributed Internet of Learning (DIoL)

Abdulrahman Albaiz, Fathi Amsaad

AI summary: Presents a lightweight K-Means anomaly detection model and a distributed model-sharing workflow for resource-constrained microcontrollers. Using real power measurements, the device performs feature extraction, clustering, and threshold estimation to identify abnormal behavior. The DIoL framework lets a model trained on one MCU be exported as a portable text representation and reused directly on other devices; experiments demonstrate the approach's feasibility.

Comments SaTC 2026 Conference

DOI: 10.1109/SATC69565.2026.11542487
Journal ref: Proc. IEEE 2nd International Conference on Secure IoT, Assured and Trusted Computing (SATC), Houston, TX, USA, 2026, pp. 1-5

Abstract

This paper presents a lightweight K-Means anomaly detection model and a distributed model-sharing workflow designed for resource-constrained microcontrollers (MCUs). Using real power measurements from a mini-fridge appliance, the system performs on-device feature extraction, clustering, and threshold estimation to identify abnormal appliance behavior. To avoid retraining models on every device, we introduce the Distributed Internet of Learning (DIoL), which enables a model trained on one MCU to be exported as a portable, text-based representation and reused directly on other devices. A two-device prototype demonstrates the feasibility of the "Train Once, Share Everywhere" (TOSE) approach using a real-world appliance case study, where Device A trains the model and Device B performs inference without retraining. Experimental results show consistent anomaly detection behavior, negligible parsing overhead, and identical inference runtimes between standalone and DIoL-based operation. The proposed framework enables scalable, low-cost TinyML deployment across fleets of embedded devices.
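The "train once, share everywhere" idea in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the text format (`threshold` and `centroid` lines) and all function names are hypothetical, since the abstract does not specify the DIoL representation, and the centroids are taken as already trained on the first device.

```python
import math


def nearest_distance(point, centroids):
    """Euclidean distance from a feature vector to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def export_model(centroids, threshold):
    """Serialize the model as plain text (hypothetical format)."""
    lines = [f"threshold {threshold!r}"]
    lines += ["centroid " + " ".join(repr(v) for v in c) for c in centroids]
    return "\n".join(lines)


def import_model(text):
    """Parse the text form back into (centroids, threshold)."""
    centroids, threshold = [], None
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        key, *values = line.split()
        if key == "threshold":
            threshold = float(values[0])
        elif key == "centroid":
            centroids.append([float(v) for v in values])
    return centroids, threshold


def is_anomaly(point, centroids, threshold):
    """Flag a point whose distance to every centroid exceeds the threshold."""
    return nearest_distance(point, centroids) > threshold
```

In this sketch, Device A would call `export_model` after clustering and Device B would call `import_model` and then `is_anomaly`, with no retraining on the second device. Using `repr` for the floats keeps the round trip exact.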

2603.26705 2026-06-12 q-bio.BM cs.AI cs.LG updated version

PI-Mamba: Linear-Time Protein Backbone Generation via Spectrally Initialized Flow Matching

Tianyu Wu, Lin Zhu

Affiliations: Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign; School of Information Science, University of Illinois Urbana-Champaign

AI summary: PI-Mamba combines spectral initialization with a flow-matching framework to achieve linear-time inference while enforcing exact local covalent geometry, giving efficient, high-fidelity backbone generation.

DOI: 10.1093/bioinformatics/btag370
Journal ref: Bioinformatics (2026)

Abstract

Motivation: Generative models for protein backbone design have to simultaneously ensure geometric validity, sampling efficiency, and scalability to long sequences. However, most existing approaches rely on iterative refinement, quadratic attention mechanisms, or post-hoc geometry correction, leading to a persistent trade-off between computational efficiency and structural fidelity. Results: We present Physics-Informed Mamba (PI-Mamba), a generative model that enforces exact local covalent geometry by construction while enabling linear-time inference. PI-Mamba integrates a differentiable constraint-enforcement operator into a flow-matching framework and couples it with a Mamba-based state-space architecture. To improve optimisation stability and backbone realism, we introduce a spectral initialization derived from the Rouse polymer model and an auxiliary cis-proline awareness head. Across benchmark tasks, PI-Mamba achieves 0.0% local geometry violations and high designability (scTM = $0.91\pm 0.03$, n = 100), while scaling to proteins exceeding 2,000 residues on a single A5000 GPU (24 GB).

2411.02933 2026-06-12 cs.DB cs.LG cs.PF updated version

P-MOSS: Scheduling Main-Memory Indexes Over NUMA Servers Using Next Token Prediction

Yeasir Rayhan, Walid G. Aref

Affiliations: Purdue University, West Lafayette, IN, USA

AI summary: P-MOSS is a learned spatial scheduling framework that, on NUMA servers, schedules query execution to specific logical cores and co-locates data, drawing on principles from large language models; experiments show up to a 6x improvement in query throughput.

Comments Accepted to SIGMOD'26

DOI: 10.1145/3786675

Abstract

Ever since Dennard scaling broke down in the early 2000s and CPU frequencies stalled, vendors have increased the core count of each CPU chip at the expense of introducing heterogeneity, thus ushering in the era of NUMA and Chiplet processors. Since then, the heterogeneity in the hardware design space has only increased, to the point that DBMS performance may vary by up to an order of magnitude on modern servers. Important factors that affect performance include the location of the logical cores where the DBMS queries execute and the location where the data resides. This paper introduces P-MOSS, a learned spatial scheduling framework that schedules query execution to specific logical cores and co-locates data on the corresponding NUMA node. For cross-hardware and workload adaptability, P-MOSS leverages core principles from Large Language Models, such as Next Token prediction, Generative Pre-training, and Fine-tuning. In the spirit of hardware-software synergy, P-MOSS bases its scheduling decisions solely on the low-level hardware statistics collected from the hardware Performance Monitoring Unit, with the aid of a Decision Transformer. Experimental evaluation is performed in the context of the B$^+$-Tree index. Performance results demonstrate that P-MOSS offers an improvement of up to $6\times$ over traditional schedules in terms of query throughput.

2601.10885 2026-06-12 physics.plasm-ph cs.LG physics.comp-ph updated version

Learning collision operators from plasma phase space data using differentiable simulators

Diogo D. Carvalho, Pablo J. Bilbao, Warren B. Mori, Luis O. Silva, E. Paulo Alves

Affiliations: GoLP/Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, Universidade de Lisboa; Mani L. Bhaumik Institute for Theoretical Physics, University of California, Los Angeles; The Rudolf Peierls Centre for Theoretical Physics, University of Oxford; Department of Physics and Astronomy, University of California, Los Angeles

AI summary: Proposes a method that combines a differentiable Fokker-Planck solver with gradient-based optimisation to infer collision operators from plasma phase space data, and validates its accuracy and computational efficiency on two-dimensional PIC simulation data.

Comments accepted for publication in Journal of Plasma Physics, code available at https://github.com/diogodcarvalho/ml-pic-collision-operators

DOI: 10.1017/S0022377826101755
Journal ref: J. Plasma Phys. (2026), vol. 92, E76

Abstract

We propose a methodology to infer collision operators from phase space data of plasma dynamics. Our approach combines a differentiable kinetic simulator, whose core component in this work is a differentiable Fokker-Planck solver, with a gradient-based optimisation method to learn the collisional operators that best describe the phase space dynamics. We test our method using data from two-dimensional Particle-in-Cell simulations of spatially uniform thermal plasmas, and learn the collision operator that captures the self-consistent electromagnetic interaction between finite-size charged particles over a wide variety of simulation parameters. We demonstrate that the learned operators are more accurate than alternative estimates based on particle tracks, while making no prior assumptions about the relevant time scales of the processes and significantly reducing memory requirements. We find that the retrieved operators, obtained in the non-relativistic regime, are in excellent agreement with theoretical predictions derived for electrostatic scenarios. Our results show that differentiable simulators offer a powerful and computationally efficient approach to infer novel operators for a wide range of problems, such as electromagnetically dominated collisional dynamics and stochastic wave-particle interactions.

2510.03699 2026-06-12 q-bio.NC cs.AI cs.LG cs.NE cs.SY eess.SY updated version

Dissecting Larval Zebrafish Hunting using Deep Reinforcement Learning Trained RNN Agents

Raaghav Malik, Satpreet H. Singh, Sonja Johnson-Yu, Nathan Wu, Roy Harpaz, Florian Engert, Kanaka Rajan

Affiliations: California Institute of Technology; Harvard University

AI summary: Trains RNN agents with deep reinforcement learning to study larval zebrafish hunting, showing how ecological and energetic constraints shape adaptive behavior. A simple model reproduces real hunting behavior, and virtual experiments probe how constraints and environment affect hunting dynamics.

DOI: 10.32470/h4pp9b0
Journal ref: Proceedings of the 9th Conference on Cognitive Computational Neuroscience (2026)

Abstract

Larval zebrafish hunting provides a tractable setting to study how ecological and energetic constraints shape adaptive behavior in both biological brains and artificial agents. Here we develop a minimal agent-based model, training recurrent policies with deep reinforcement learning in a bout-based zebrafish simulator. Despite its simplicity, the model reproduces hallmark hunting behaviors -- including eye vergence-linked pursuit, speed modulation, and stereotyped approach trajectories -- that closely match real larval zebrafish. Quantitative trajectory analyses show that pursuit bouts systematically reduce prey angle by roughly half before strike, consistent with measurements. Virtual experiments and parameter sweeps vary ecological and energetic constraints, bout kinematics (coupled vs. uncoupled turns and forward motion), and environmental factors such as food density, food speed, and vergence limits. These manipulations reveal how constraints and environments shape pursuit dynamics, strike success, and abort rates, yielding falsifiable predictions for neuroscience experiments. These sweeps identify a compact set of constraints -- binocular sensing, the coupling of forward speed and turning in bout kinematics, and modest energetic costs on locomotion and vergence -- that are sufficient for zebrafish-like hunting to emerge. Strikingly, these behaviors arise in minimal agents without detailed biomechanics, fluid dynamics, circuit realism, or imitation learning from real zebrafish data. Taken together, this work provides a normative account of zebrafish hunting as the optimal balance between energetic cost and sensory benefit, highlighting the trade-offs that structure vergence and trajectory dynamics. We establish a virtual lab that narrows the experimental search space and generates falsifiable predictions about behavior and neural coding.

2307.05520 2026-06-12 cs.LG cs.CY cs.SE updated version

Estimating Deep Learning energy consumption based on model architecture and training environment

Santiago del Rey, Luís Cruz, Xavier Franch, Silverio Martínez-Fernández

Affiliations: Universitat Politècnica de Catalunya; Delft University of Technology

AI summary: Analyzes how model architecture and training environment affect energy consumption and proposes the STEP and PRE methods, which substantially improve energy estimation accuracy; choosing the right model and training environment combination reduces training energy by up to 80.68%.

Comments 48 pages, 10 figures, under review in Computer Standards & Interfaces journal. This work is an extension of arXiv:2307.05520v3 [cs.LG]

DOI: 10.1016/j.csi.2026.104170

Abstract

To raise awareness of the environmental impact of deep learning (DL), many studies estimate the energy use of DL systems. However, energy estimates during DL training often rely on unverified assumptions. This work addresses that gap by investigating how model architecture and training environment affect energy consumption. We train a variety of computer vision models and collect energy consumption and accuracy metrics to analyze their trade-offs across configurations. Our results show that selecting the right model-training environment combination can reduce training energy consumption by up to 80.68% with less than 2% loss in $F_1$ score. We find a significant interaction effect between model and training environment: energy efficiency improves when GPU computational power scales with model complexity. Moreover, we demonstrate that common estimation practices, such as using FLOPs or GPU TDP, fail to capture these dynamics and can lead to substantial errors. To address these shortcomings, we propose the Stable Training Epoch Projection (STEP) and the Pre-training Regression-based Estimation (PRE) methods. Across evaluations, our methods outperform existing tools by a factor of two or more in estimation accuracy.

2507.11936 2026-06-12 cs.CL cs.AI cs.CV cs.LG updated version

A Survey of Deep Learning for Geometry Problem Solving

Jianzhe Ma, Wenxuan Wang, Qin Jin

Affiliations: Renmin University of China

AI summary: Surveys deep learning for geometry problem solving, covering the relevant tasks, methods, evaluation metrics, and future directions, with the aim of offering a practical reference for the field.

Comments ACL 2026 Main Conference

Abstract

Geometry problem solving, a crucial aspect of mathematical reasoning, is vital across various domains, including education, the assessment of AI's mathematical abilities, and multimodal capability evaluation. The recent surge in deep learning technologies, particularly the emergence of multimodal large language models, has significantly accelerated research in this area. This paper presents a survey of the applications of deep learning in geometry problem solving, including (i) a comprehensive summary of the relevant tasks in geometry problem solving; (ii) a thorough review of related deep learning methods; (iii) a detailed analysis of evaluation metrics and methods; and (iv) a critical discussion of state-of-the-art performance, existing challenges, and promising future directions. Our objective is to offer a comprehensive and practical reference of deep learning for geometry problem solving, thereby fostering further advancements in this field. We maintain a list of relevant papers: https://github.com/majianz/dl4gps.

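2104.11105 2026-06-12 cs.CR cs.LG cs.NE updated version

Synchronization of Tree Parity Machines using non-binary input vectors

Miłosz Stypiński, Marcin Niemiec

Affiliations: AGH University of Science and Technology

AI summary: Proposes using non-binary input vectors with a wider range of values to improve the synchronization of Tree Parity Machines, reducing synchronization time and improving the security of neural cryptography.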

Comments This work has been submitted to the IEEE for possible publication

2306.01690 2026-06-12 cs.LG cs.AI updated version

Context selectivity with dynamic availability enables lifelong continual learning

Martin Barry, Wulfram Gerstner, Guillaume Bellec

Affiliations: Department of Life Sciences, Department of Computer Sciences

AI summary: Proposes a meta-plasticity rule based on context selectivity and dynamic availability; in simulation the model outperforms existing continual learning algorithms on image recognition and natural language processing tasks.

DOI: 10.1016/j.neunet.2025.107728

Abstract

"You never forget how to ride a bike" -- but how is that possible? The brain is able to learn complex skills, stop the practice for years, learn other skills in between, and still retrieve the original knowledge when necessary. The mechanisms of this capability, referred to as lifelong learning (or continual learning, CL), are unknown. We suggest a bio-plausible meta-plasticity rule building on classical work in CL, which we summarize in two principles: (i) neurons are context selective, and (ii) a local availability variable partially freezes the plasticity if the neuron was relevant for previous tasks. In a new neuro-centric formalization of these principles, we suggest that neuron selectivity and neuron-wide consolidation are a simple and viable meta-plasticity hypothesis to enable CL in the brain. In simulation, this simple model balances forgetting and consolidation, leading to better transfer learning than contemporary CL algorithms on image recognition and natural language processing CL benchmarks.

1710.03070 2026-06-12 cs.NE cs.LG q-bio.NC stat.ML version update

full-FORCE: A Target-Based Method for Training Recurrent Networks

Brian DePasquale, Christopher J. Cueva, Kanaka Rajan, G. Sean Escola, L. F. Abbott

Affiliations * Department of Neuroscience; Zuckerman Institute; Columbia University; Department of Physiology and Cellular Biophysics; Columbia University College of Physicians and Surgeons; Princeton Neuroscience Institute; Lewis-Sigler Institute for Integrative Genomics

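AI summary: This paper proposes a target-based method for training recurrent networks that introduces a second network to supply target dynamics, yielding networks that perform tasks with fewer neurons and greater noise robustness.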

Comments 20 pages, 8 figures

Details

DOI: 10.1371/journal.pone.0191527
Journal ref: PLoS ONE (2018)

English abstract

Trained recurrent networks are powerful tools for modeling dynamic neural computations. We present a target-based method for modifying the full connectivity matrix of a recurrent network to train it to perform tasks involving temporally complex input/output transformations. The method introduces a second network during training to provide suitable "target" dynamics useful for performing the task. Because it exploits the full recurrent connectivity, the method produces networks that perform tasks with fewer neurons and greater noise robustness than traditional least-squares (FORCE) approaches. In addition, we show how introducing additional input signals into the target-generating network, which act as task hints, greatly extends the range of tasks that can be learned and provides control over the complexity and nature of the dynamics of the trained, task-performing network.

1. Deep Learning Architectures and Training Methods (34 papers)

Boltzmann Attention: Learnable Ising Couplings for Cooperative Attention

$μ$VLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models

Rubric-Guided Self-Distillation: Post-Training Without Rubric Verifiers

Deep Unfolded Latent Optimally Partitioned-l2/l1 Networks for Data-driven Block-Sparse Recovery

CLARITree: Cholesky and Lookahead Accelerations for Regression with Interpretable Piecewise Linear Trees

TimeROME-DLM: Temporal Causal Tracing and Low-Rank Inference-Time Knowledge Editing for Masked Diffusion Language Models

LongSpike: Fractional Order Spiking State Space Models for Efficient Long Sequence Learning

Where Computation Lives Inside TabPFN: Causal Localisation of Attention Head Function

Circuit Synchronization Precedes Generalization: Causal Evidence from Fourier Structure in Grokking Transformers

EPM-JEPA: Operator-Side Experience Modulation in JEPA-Family World Models

Emotional regulation improves deep learning-based image classification

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Select and Improve: Understanding the Mechanics of Post-Training for Reasoning

When Does Routing Become Interpretable? Causal Probes on Block Attention Residuals

Distributional Loss for Robust Classification

Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization

How Much Memory Do We Need? Adaptive Memory Gate for Neural Operators

Adjusted Cup-Product Neural Layer

Existence Precedes Value: Joint Modeling of Observational Existence and Evolving States in Time Series Forecasting

Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models

A Two-Parameter Weibull Framework for Diagnosing Transformer Weight Distributions

BASENet: Band-Adapted Speech Enhancement Network with Cross-Band Attention

Self-Guidance: Enhancing Neural Codecs via Decoder Manifold Alignment

Geometry of Lightning Self-Attention: Identifiability and Dimension

Fourier Multi-Component and Multi-Layer Neural Networks: Unlocking High-Frequency Potential

Learning on a Razor's Edge: Identifiability and Singularity of Polynomial Neural Networks

Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast

Single vs. Multiple Branches in DeepONet and S-DeepONet: Network Architecture Follows Coupling in Multiphysics Systems

Structuring The Future: Diffusion LLM Speculative Decoding via Calibrated Draft Graphs

Multi-Token Residual Prediction

Transformer Field Theory: A Response-Theoretic Approach to Mechanistic Interpretability

GenAutoML: An Agentic Framework for Dynamic Architecture Generation and Optimization in Time-Series Analysis

Bernstein-Schur Kernels: Random Features by Sketched Modulation and Radial Randomization

Spatially Grounded Concept Bottleneck Models via Part-Factorized Attention

2. Representation Learning, Self-Supervised and Contrastive Learning (15 papers)

Representing Time Series as Structured Programs for LLM Reasoning

A Stationary (and Therefore Compatible) Representation is All You Need

Dolph2Vec: Self-Supervised Representations of Dolphin Vocalizations

Viral Proteins Reveal Geometry of Protein Language Models

Extracting Governing Equations from Latent Dynamics via Multi-View Contrastive Learning

Identifiability Without Gaussianity: Symbolic World Models and Near-Infinite Temporal Consistency

PlaceRep: Geospatial Place Representation Learning from Large-Scale Point-of-Interest Data

BrainPro: Towards Large-scale Brain State-aware EEG Representation Learning

Echo2ECG: Enhancing ECG Representations with Cardiac Morphology from Multi-View Echos

Disentangling Dynamical Systems: Causal Representation Learning Meets Local Sparse Attention

BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning

One Step Closer to Ground Truth: A Multi-Scale Residual-Aware Representation Learning Pipeline for Predicting Time Series Data

When to Align, When to Predict: A Phase Diagram for Multimodal Learning

Learning Visually Interpretable Oscillator Networks for Soft Continuum Robots from Video

Counterfactual Explanations for Deep Two-Sample Testing

3. Reinforcement Learning and Sequential Decision Making (18 papers)

ReCal: Reward Calibration for RL-based LLM Routing

Speculative Rollback Correction for Quality-Diverse Web Agent Imitation

Boosting Direct Preference Optimization with Penalization

Keep Policy Gradient in Charge: Sibling-Guided Credit Distillation for Long-Horizon Tool-Use Agents

Individual Control Barrier Functions-Guided Diffusion Model for Safe Offline Multi-Agent Reinforcement Learning

ProPlay: Procedural World Models for Self-Evolving LLM Agents

Reinforcement Learning for Neural Model Editing

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

$α$-fair heterogeneous agent reinforcement learning

Reward Modeling for Multi-Agent Orchestration

Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

DiffCoord: Differentiable Coordination for Distributed Multi-Agent Trajectory Optimization

ARROW: Augmented Replay for RObust World models

Mixing Makes Markovian Contexts Cheap for Linear Bandits

WOMBET: World Model-Based Experience Transfer for Robust and Sample-efficient Reinforcement Learning

QoS Improvement in Multi User Cellular-Symbiotic Radio Network Assisted by Active-STAR-RIS

SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models

4. Generative Models and Probabilistic Modeling (27 papers)

Net-Ev$^2$: A Generative Simulator for Network Event Evolution

A Stabilized Path-Space Approach to Diffusion-Based Posterior Sampling

The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics

Towards More General Control of Diffusion Models Using Jeffrey Guidance

Enhanced Low-Density Region Exploration in Classifier-Guided Diffusion Models Through Modified Reverse Diffusion Sampling

VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

Hölder++: Improving the Quality-Coherence Trade-off in Multimodal VAEs

PolyFlow: Safe and Efficient Polytope-Constrained Flow Matching with Constraint Embedding and Projection-free Update

Accelerating Speculative Diffusions via Block Verification