arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.02661 2026-06-03 eess.IV cs.AI cs.LG

Learning to Refine: Spectral-Decoupled Iterative Refinement Framework for Precipitation Nowcasting

Yunlong Zhou, Chen Zhao, Danyang Peng, Fanfan Ji, Xiao-Tong Yuan

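Affiliations: National University of Singapore

AI summary: Proposes the Spectral-Decoupled Iterative Refinement (SDIR) framework, which uses a dual-path design (SFG-Former and FR-Refiner) and a physically consistent power spectral density loss to perform progressive frequency-decoupled refinement for precipitation nowcasting within a deterministic framework. It removes blurring and hallucinations, outperforming existing methods in spatial accuracy while achieving spectral fidelity competitive with diffusion-based methods.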

Comments 21 pages, 10 figures, accepted at ICML 2026

Abstract

Accurate precipitation nowcasting is vital for disaster mitigation, but deep learning methods face a key trade-off: regression models produce over-smoothed, spectrally decaying predictions that blur convective details and violate turbulence power laws; diffusion models generate realistic yet unanchored hallucinations lacking physical grounding. We propose Spectral-Decoupled Iterative Refinement (SDIR), a deterministic framework that reformulates nowcasting as progressive frequency-decoupled refinement. SDIR first extracts a stable low-frequency synoptic skeleton, then iteratively refines high-frequency textures under physical constraints, eliminating both blurring and hallucinations. It features a dual-path design: the Synoptic Frequency-Guided Former (SFG-Former) with Scale-Adaptive Transformers for global structure, and the Fourier Residual Refiner (FR-Refiner) with Scale-Conditioned Fourier Neural Operators for fine residuals. A Physically Consistent Power Spectral Density (PCPSD) loss with dynamic masking enforces a turbulence-consistent spectral distribution. Experiments on three benchmarks show SDIR significantly outperforms SOTA methods in spatial accuracy while achieving spectral fidelity competitive with diffusion-based methods, enabling reliable high-resolution operational nowcasting. Code link: https://github.com/RuntimeWarning/SDIR.
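
The split between a low-frequency skeleton and a high-frequency residual described in this abstract can be illustrated on a toy one-dimensional signal. This is a minimal sketch and not the SDIR implementation: the naive DFT, the `split_bands` helper, and the `cutoff` parameter are illustrative assumptions, and the actual model works on 2D fields with learned networks.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), fine for a toy signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT, returning the real part (inputs here are real signals)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def split_bands(x, cutoff):
    """Split x into a low-frequency part and a high-frequency residual.

    `cutoff` (hypothetical parameter) is the largest frequency index kept
    in the low band; the residual is whatever the low band does not explain.
    """
    n = len(x)
    X = dft(x)
    # Keep index k and its mirror n - k so the low band stays real-valued.
    low_spec = [X[k] if min(k, n - k) <= cutoff else 0 for k in range(n)]
    low = idft(low_spec)
    high = [a - b for a, b in zip(x, low)]
    return low, high


# Toy usage: a slow wave plus a weaker fast wave.
signal = [math.cos(2 * math.pi * t / 16) + 0.3 * math.cos(2 * math.pi * 6 * t / 16)
          for t in range(16)]
low, high = split_bands(signal, cutoff=2)
```

By construction the two parts sum back to the input, so a refinement stage that only edits `high` cannot disturb the large-scale structure held in `low`.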

2606.02642 2026-06-03 eess.AS cs.AI cs.CV cs.LG cs.MM cs.SD

SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models

Chenshuang Zhang, Kyeong Seon Kim, Chengxin Liu, Tae-Hyun Oh

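Affiliations: KAIST (Korea Advanced Institute of Science and Technology)

AI summary: Addresses speech-vision hallucination in audio-visual large language models by introducing the SVHalluc benchmark, which evaluates along semantic and temporal dimensions how well models align speech content with visual signals, and finds that existing models are limited in cross-modal understanding.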

Comments Accepted at CVPR 2026

Abstract

Despite the success of audio-visual large language models (LLMs), they can produce plausible but ungrounded outputs, termed hallucinations. Existing benchmarks focus on environmental sounds (e.g., dog barking) to indicate event occurrence. In contrast, human speech carries fundamentally different, rich semantics and temporal structures, yet it remains unexplored whether current models can accurately align speech content with corresponding visual signals. In this work, we show that speech content can induce hallucinations in audio-visual LLMs. To systematically study this, we introduce SVHalluc, the first comprehensive benchmark for evaluating speech-vision hallucination in audio-visual LLMs. Our benchmark diagnoses speech-vision hallucinations from two critical and complementary aspects: semantic and temporal. Experimental results demonstrate that state-of-the-art open-source audio-visual LLMs struggle to align speech content with corresponding visual signals, with near-random accuracy on multiple tasks. In contrast, Gemini 2.5 Pro significantly outperforms the open-source models. Our analysis suggests that their failures stem from limited ability in cross-modality understanding, despite strong performance in single-modality perception. Our work uncovers a new and fundamental limitation of current audio-visual LLMs and highlights the need for speech-grounded video comprehension. Project page: https://chenshuang-zhang.github.io/projects/svhalluc/.

2606.02639 2026-06-03 eess.IV cs.AI cs.CV

Sparse-View Lung Nodule Volumetry from Digitally Reconstructed Radiographs via AReT: Anatomy-Regularized TensoRF

Spoorthi M, Suja Palaniswamy

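Affiliations: Amrita University

AI summary: Identifies and resolves a problem with TensoRF's default density shift when applied to X-ray attenuation fields, and proposes AReT, an anatomy-regularized tensorial radiance field framework that achieves stable volumetric reconstruction of lung nodules from only three orthogonal X-ray projections, with high accuracy on the LIDC-IDRI dataset.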

Abstract

We identify and resolve a previously unreported failure mode in TensoRF when applied to X-ray attenuation fields: the default density shift of -10, originally introduced for RGB scene reconstruction, suppresses density gradients and prevents sparse-view medical reconstruction regardless of learning rate or regularization strategy. Setting the density shift to zero restores gradient flow and enables stable volumetric reconstruction of pulmonary nodules from only three orthogonal X-ray projections. Building on this, we propose AReT, an anatomy-regularized tensorial radiance field framework for lung nodule reconstruction using coronal, sagittal, and axial projections from the LIDC-IDRI dataset (19 patients, radiologist-annotated nodules). Unlike existing NeRF approaches requiring dense multi-view acquisition, AReT is designed for sparse-view thoracic imaging and incorporates chest-anatomy-aware regularization combining L1 sparsity and total variation smoothness. A systematic comparison across 11 reconstruction strategies shows anatomy-aware regularization consistently outperforms generative-prior-guided approaches. Evaluated against radiologist consensus segmentations, AReT achieves Pearson r=0.983 (p<0.0001) for clinically actionable nodules >=10 mm (n=14), median absolute volumetric error of 11.4%, near-zero systematic bias of -77.3 mm^3, and 8.4x improvement over spherical volume approximation.
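
Why a density shift of -10 can suppress gradients is easy to see numerically. The sketch below assumes the density is produced by a softplus activation applied to the raw field value plus the shift (the abstract does not state the activation, so this form is an assumption); the function names are illustrative only.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Logistic function, which is the derivative of softplus."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def density(raw, shift):
    """Assumed density activation: softplus(raw + shift)."""
    return softplus(raw + shift)


def density_grad(raw, shift):
    """Gradient of the assumed density with respect to the raw field value."""
    return sigmoid(raw + shift)


# Compare the two shifts at a freshly initialised field value of 0.
for shift in (-10.0, 0.0):
    print(shift, density(0.0, shift), density_grad(0.0, shift))
```

Under this assumption, a raw value near zero with shift -10 passes back a gradient of roughly 4.5e-5, while shift 0 passes back 0.5, which is consistent with the abstract's claim that zeroing the shift restores gradient flow.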

2606.02631 2026-06-03 eess.AS cs.AI cs.CV cs.LG cs.SD

Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals

Shenghao Ding

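Affiliations: Yet Another AI

AI summary: Studies whether audio, images, and video can share a unified wavelet token schema. Using a continuous-token model built on a Haar DWT/IDWT frontend, it reports results on several datasets that support the feasibility of a unified schema and analyzes the effects of latent capacity and metadata.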

Comments 12 pages, 3 figures

Abstract

This paper studies whether audio, images, and video can share a common wavelet token schema rather than relying on separate modality-specific latent grids. It introduces a preliminary continuous-token model built around a one-level Haar DWT/IDWT frontend, a shared coefficient-token layout, optional structural metadata, lightweight modality value adapters, and a shared token-wise encoder-decoder trunk. On Speech Commands, EuroSAT RGB, and DAVIS 2017 data, a dense shared model reaches 39.92 dB audio, 29.37 dB image, and 23.93 dB video PSNR. A matched-rate sweep under continuous latent scalar budgets indicates that the visual gains are not explained solely by latent capacity, while also showing that additive metadata embeddings are not a universal source of improvement. Finally, fixed-rate energy selection provides a strong non-parametric baseline: energy_global improves average PSNR over uniform selection by 16.73 dB for audio, 16.90 dB for images, and 15.86 dB for video under compressed keep ratios. Masked sparse training reaches 34.45 dB video PSNR with 50% of dense tokens. The results support a unified wavelet token schema and sparse token interface, while stopping short of establishing a universal discrete vocabulary.
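
A one-level Haar DWT/IDWT and an energy-based coefficient selection can be sketched in a few lines. This is a minimal 1D illustration under stated assumptions, not the paper's frontend: `keep_top_energy` is a hypothetical stand-in for the `energy_global` selection named in the abstract, whose exact definition is not given there.

```python
import math


def haar_dwt(x):
    """One-level orthonormal Haar transform of an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def keep_top_energy(coeffs, keep_ratio):
    """Zero all but the highest-energy coefficients (hypothetical helper)."""
    k = max(1, int(round(keep_ratio * len(coeffs))))
    order = sorted(range(len(coeffs)), key=lambda i: coeffs[i] ** 2, reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


# Toy usage: transform, keep half of the coefficients, reconstruct.
signal = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 0.0]
approx, detail = haar_dwt(signal)
sparse = keep_top_energy(approx + detail, keep_ratio=0.5)
half = len(approx)
reconstruction = haar_idwt(sparse[:half], sparse[half:])
```

Because the transform is orthonormal, the energy of the dropped coefficients equals the squared reconstruction error, which is one reason energy-based selection is a natural non-parametric baseline.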

2606.02615 2026-06-03 eess.AS cs.AI cs.SD

FSA-GRPO: Teaching Auditory LLMs to Use Few-shot Demonstrations

Haolong Zheng, Siyin Wang, Xulin Fan, Zengrui Jin, Mark Hasegawa-Johnson

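Affiliations: University of Illinois Urbana-Champaign; Tsinghua University

AI summary: Proposes FSA-GRPO, an RL-based post-training method whose specially designed reward encourages the model to use few-shot demonstrations, strengthening few-shot adaptation and yielding gains in children's speech recognition, speech translation, and audio understanding.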

Abstract

Few-shot prompting provides an effective way to adapt auditory large language models to low-resource tasks such as children's speech recognition. However, most auditory large language models are not explicitly trained to perform inference in this demonstration-conditioned format, limiting the extent to which they can benefit from few-shot prompting. To address this limitation, we introduce Few-Shot Aware GRPO (FSA-GRPO), an RL-based post-training recipe that uses a specially designed reward to encourage the model to leverage few-shot demonstrations, thereby strengthening its few-shot adaptation ability. Notably, training with only high-resource adult ASR data improves the model's general few-shot adaptation ability, yielding gains not only in children's speech recognition but also in speech translation and audio understanding. We further study data selection and auxiliary reward weighting to identify an effective training recipe. Our experiments show that when in-domain data are unavailable or cannot be used for training, FSA-GRPO is more effective than direct tuning on related out-of-domain data.
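
The abstract does not spell out the few-shot-aware reward, so the sketch below shows only the generic group-relative advantage step that GRPO-style methods apply to whatever reward is used. The reward values are made up for illustration, and `group_relative_advantages` is a hypothetical helper name.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one group of sampled responses.

    Each response's advantage is its reward minus the group mean, divided
    by the group standard deviation (eps guards against a zero spread).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy usage: four sampled responses to the same few-shot prompt.
advantages = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
```

Responses that score above the group average receive a positive advantage and are reinforced; a reward that favors outputs making use of the demonstrations would therefore push the policy toward using them.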

2606.03878 2026-06-03 stat.ML cs.LG

Privacy-Robust Incrementality Measurement for Advertising Systems under Signal Loss

Prashant Shekhar, Caroline Howard

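Affiliations: Department of Mathematics, Embry-Riddle Aeronautical University

AI summary: Addresses signal loss caused by privacy-preserving reporting systems with a robust causal decision framework that projects the observation-compatible experimental worlds onto the incrementality functional and gives a sharp decision frontier, returning a certified, rejected, or unresolved incrementality decision.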

Abstract

Advertising platforms use randomized lift tests to measure incrementality, but privacy-preserving reporting systems degrade the observed signal through match-rate loss, linkability loss, attribution-window loss, aggregation-threshold suppression, randomized reporting noise, and segment-heterogeneous signal loss. This paper formulates privacy-constrained advertising measurement as a robust causal decision problem under these signal losses. Given a randomized experiment and an ambiguity set for privacy-induced degradation, the framework projects the observation-compatible fiber of clean/unfiltered experimental worlds onto the incrementality functional and returns a certified, rejected, or unresolved decision. The main result gives a sharp decision frontier. Reports outside the frontier support uniformly valid certification or rejection, whereas reports inside it contain too little information for any method to uniformly distinguish above-threshold incrementality from non-incrementality. Supporting results give finite-sample certification, sample-complexity guarantees, a minimax lower bound showing that signal loss reduces effective information, and a reporting-granularity tradeoff. On 2.0M Criteo Uplift rows and the 64K-row Hillstrom email experiment, clean conversion lift is positive in both datasets, with lifts of 0.00112 and 0.00495, respectively. Population certification survives mild degradation in Criteo and severe degradation in Hillstrom, while all considered finite-sample stress settings in both datasets remain unresolved after simultaneous uncertainty and reporting noise are included. Overall, this work contributes a decision-theoretic layer for privacy-aware incrementality measurement whose output is the strongest causal claim justified by degraded ads signals.
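
The three-way outcome (certified, rejected, unresolved) can be illustrated on a clean, undegraded lift test. The sketch below is a deliberate simplification: it uses a plain normal-approximation interval for the difference in conversion rates and does not model the privacy-induced degradation or the ambiguity set that the paper's framework handles. The function name, counts, and threshold are hypothetical.

```python
import math


def lift_decision(conv_treated, n_treated, conv_control, n_control,
                  threshold=0.0, z=1.96):
    """Classify a lift test as certified, rejected, or unresolved.

    Lift is the treated conversion rate minus the control conversion rate.
    The interval is a simple Wald interval (an assumption of this sketch).
    """
    p_t = conv_treated / n_treated
    p_c = conv_control / n_control
    lift = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
    low, high = lift - z * se, lift + z * se
    if low > threshold:
        decision = "certified"    # whole interval above the threshold
    elif high < threshold:
        decision = "rejected"     # whole interval below the threshold
    else:
        decision = "unresolved"   # interval straddles the threshold
    return lift, (low, high), decision


# Toy usage: the same rates with ten times fewer users become unresolved.
large = lift_decision(600, 10_000, 500, 10_000)
small = lift_decision(60, 1_000, 50, 1_000)
```

Any additional uncertainty, such as the reporting noise discussed in the abstract, would widen the interval and push more experiments into the unresolved region.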

2606.03820 2026-06-03 stat.ML cs.LG

A Quantitative Approximation Framework for Flow Distillation in Diffusion Models

Weiguo Gao, Ming Li, Lei Shi, Hanfei Zhou

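Affiliations: School of Mathematical Sciences, Fudan University; Shanghai Key Laboratory of Contemporary Applied Mathematics, Fudan University

AI summary: Develops a quantitative approximation framework for flow distillation in diffusion models that treats few-step sampling as error propagation under compositions of learned flow maps. Theory and experiments indicate that a stability-balanced non-uniform time grid reduces end-to-end relative MSE by up to 51.9%.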

Abstract

We develop a quantitative approximation framework for diffusion distillation, viewing few-step sampling as error propagation under compositions of learned flow maps. Focusing on trajectory distillation for the probability-flow ODE, we show that local approximation errors can be strongly amplified in low-noise multimodal regimes, where the underlying dynamics become stiff. In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck setting, we separate two core difficulties: approximating the time-dependent score field and controlling the dynamical amplification governed by the time-integrated Jacobian bound of the probability-flow ODE. On the approximation side, we prove constructive $L^p(p_t)$ guarantees showing that ReLU-ReQU networks approximate the Gaussian-mixture score uniformly over time, with depth and width scaling polylogarithmically in the target accuracy and explicitly with the mixture geometry. On the stability side, we derive an explicit bound $L(t)$ for the spatial Lipschitz constant of the probability-flow velocity and convert it into a flow map stability estimate governed by $\int_s^t L(u)\,du$, making late-time amplification in stiff regimes computable. Building on these estimates, we prove that deep residual compositions efficiently approximate the long-horizon transport, with global error controlled by the stability amplification factor, and identify a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable. The resulting theory yields a stability-balanced non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate. Experiments support the prediction and reduce end-to-end relative MSE by up to 51.9% with 8 segments compared with uniform grids.
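
"Uniform partitioning in the cumulative stability coordinate" can be sketched directly: integrate a Lipschitz bound over time, then place grid points at equal increments of that integral. The code below is a minimal sketch under stated assumptions; the bound passed in the usage example is a made-up stiff profile, not the explicit bound derived in the paper, and the function name and `fine` resolution parameter are hypothetical.

```python
def stability_balanced_grid(lipschitz, t0, t1, segments, fine=10_000):
    """Time grid with equal increments of S(t) = integral of lipschitz(u) du.

    `lipschitz` must be strictly positive on [t0, t1] so that S is
    strictly increasing. S is approximated with the trapezoid rule.
    """
    ts = [t0 + (t1 - t0) * i / fine for i in range(fine + 1)]
    cum = [0.0]
    for i in range(1, fine + 1):
        step = ts[i] - ts[i - 1]
        cum.append(cum[-1] + 0.5 * (lipschitz(ts[i - 1]) + lipschitz(ts[i])) * step)

    grid = [t0]
    j = 0
    for k in range(1, segments):
        target = cum[-1] * k / segments
        while cum[j + 1] < target:
            j += 1
        # Linear interpolation inside the fine cell that contains the target.
        w = (target - cum[j]) / (cum[j + 1] - cum[j])
        grid.append(ts[j] + w * (ts[j + 1] - ts[j]))
    grid.append(t1)
    return grid


# Toy usage: a hypothetical bound that blows up near t = 0.
grid = stability_balanced_grid(lambda t: 1.0 / (t + 0.01), 0.0, 1.0, segments=8)
```

With a bound that is large near one end of the interval, the grid places short segments there and long segments where the dynamics are benign, so each segment carries a comparable amplification budget. A constant bound recovers the uniform grid.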

2606.03736 2026-06-03 stat.ML cs.LG

Resource-Constrained Adaptive Inference for Sequential Pricing

Ruicheng Ao, Jiashuo Jiang, David Simchi-Levi

Affiliations: Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02139; Department of Industrial Engineering and Decision Analytics, Hong Kong University of Science and Technology, Hong Kong; Department of Civil and Environmental Engineering and Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139

AI summary: Addresses the problem that resource constraints can make fixed-price inference infeasible by proposing a target-aware pricing controller that certifies feasible target bands and logs continuous local densities, yielding studentized intervals via localized debiasing, and analyzes the resulting regret-information accounting.

Abstract

Resource-constrained pricing controllers can make fixed-price inference impossible: the controller's resource state may remove the target price neighborhood from the feasible set, even when every realized action has a known positive density. We formalize this support-exclusion failure through a local non-identification result and a realized information clock. We then design a target-aware pricing controller that certifies feasible target bands and logs continuous local densities. Localized debiasing gives studentized intervals whose width is governed by this clock. The resulting regret--information accounting, stated up to pilot re-solving error, shows that cheap exploration can be insufficient for inference: polynomial target mass gives polynomial rates, while a pure $1/t$ target branch does not yield shrinking fixed-target intervals without additional local movement. Experiments show calibration in certified bands and diagnostic abstention when the resource state collapses target support.

2606.03600 2026-06-03 stat.ML cs.LG

Set-Preserving Calibration from Conformal P-Values to E-Values

Nabil Alami, Jad Zakharia, Souhaib Ben Taieb

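Affiliations: ETH Zurich

AI summary: Addresses the limitations of p-to-e conversion in conformal prediction by proposing a set-preserving P2E calibrator that converts conformal p-values into e-values without changing the prediction set, attaining the desired coverage and improved efficiency in cross-conformal prediction and conformal aggregation.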

Abstract

Standard conformal prediction (CP) procedures are typically formulated in terms of p-values, but reliance on p-values alone limits flexibility, for example, when combining dependent evidence across models or data splits. Recent work has explored e-value formulations for conformal inference, yet a direct connection between p- and e-value formulations in CP has been missing, especially regarding their statistical efficiency. We first identify limitations of classical p-to-e calibrators in the CP setting, showing that they are not set-preserving and can lead to overly conservative prediction sets. To address this, we propose a novel P2E calibrator that converts conformal p-values into e-values without altering the prediction set induced by the original conformal p-value. We establish both theoretically and empirically that our calibrator can yield significant efficiency gains over existing p-to-e calibrators. This e-value formulation enables principled use of recent advances in e-value merging and randomization, where we demonstrate its impact in two applications: cross-conformal prediction (CCP), whose variants typically provide only approximate $1-2\alpha$ coverage, and conformal aggregation (CA). In both cases, our e-value-based methods satisfy the desired $1-\alpha$ coverage guarantee while improving efficiency over standard baselines. More broadly, our approach expands the flexibility of CP and opens new directions for efficient, distribution-free uncertainty quantification.
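
The claim that classical p-to-e calibrators are not set-preserving can be checked on a small example. The sketch below uses the standard conformal p-value and the well-known power calibrator $e = \kappa p^{\kappa-1}$ as the "classical" baseline; it does not reproduce the paper's P2E calibrator, and the scores, `kappa`, and `alpha` values are illustrative assumptions.

```python
def conformal_p_value(cal_scores, test_score):
    """Conformal p-value: rank of the test nonconformity score among calibration scores."""
    n = len(cal_scores)
    return (1 + sum(s >= test_score for s in cal_scores)) / (n + 1)


def power_calibrator(p, kappa=0.5):
    """Classical p-to-e calibrator e = kappa * p**(kappa - 1), with 0 < kappa < 1."""
    return kappa * p ** (kappa - 1.0)


def in_set_by_p(p, alpha):
    """A candidate label stays in the prediction set if its p-value exceeds alpha."""
    return p > alpha


def in_set_by_e(e, alpha):
    """A candidate label stays in the set if its e-value is below 1 / alpha."""
    return e < 1.0 / alpha


# Toy usage: 99 calibration scores and one fairly extreme candidate label.
alpha = 0.1
p = conformal_p_value(list(range(1, 100)), test_score=95)
e = power_calibrator(p)
excluded_by_p = not in_set_by_p(p, alpha)
excluded_by_e = not in_set_by_e(e, alpha)
```

Here the p-value rule excludes the candidate while the calibrated e-value rule keeps it, so the e-value prediction set is larger. With `kappa = 0.5` and `alpha = 0.1`, the e-value rule only excludes candidates whose p-value is at most $(\alpha\kappa)^{1/(1-\kappa)} = 0.0025$, which is the kind of conservativeness the abstract describes.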

2606.03574 2026-06-03 stat.ML cs.LG

Few-Shot Prediction for Pulsar Noise with Long Short-Term Memory Network

Qingye Tang, Dechao An, Haoran Peng, Yuqi Ouyang

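Affiliations: Sichuan University, College of Computer Science; Sichuan University, College of Physics

AI summary: Addresses the scarcity of pulsar timing data with an LSTM network optimized by model-agnostic meta-learning, which adapts quickly to new frequency domains from only a few ground-truth timing residuals, with hyperparameters tuned automatically by particle swarm optimization. On the IPTA dataset it gives accurate predictions using 10% of the target-domain data for fine-tuning.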

Abstract

This work proposes a novel solution for predicting pulsar timing residuals from limited data, addressing the critical challenge of data scarcity across spin-frequency subgroups of millisecond pulsars in pulsar timing array (PTA) datasets. The proposed solution applies a Long Short-Term Memory (LSTM) network optimized with the model-agnostic meta-learning algorithm, enabling rapid adaptation to a new frequency domain by fine-tuning the LSTM network on only a few ground-truth timing residuals. A particle swarm optimization algorithm is also used for automatic hyperparameter optimization, leading to improved prediction accuracy. Our solution, evaluated on the second data release of the International Pulsar Timing Array (IPTA), demonstrates robust generalization with accurate predictions on three metrics across high-frequency test domains, while requiring only 10% of the timing residuals from these domains for model fine-tuning. Furthermore, our lightweight structure requires only 16.86 MB of CPU memory and 18 milliseconds for single-step residual prediction. These characteristics make our solution well suited to real-world applications where effective, real-time prediction of pulsar timing residuals is essential, particularly in resource-constrained environments with limited computational power, memory, or energy.

2606.03553 2026-06-03 stat.ML cs.LG math.OC

A Robust Optimization Approach to Sparse Principal Component Analysis

David Vävinggren, Francis Bach, André M. H. Teixeira, Dave Zachariah, Antônio H. Ribeiro

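Affiliations: Uppsala University, Sweden; PSL Research University / INRIA, France; Science for Life Laboratory, Sweden

AI summary: Proposes AdvPCA, which obtains sparsity through robust optimization by introducing worst-case latent-space perturbations into the reconstruction objective, and gives a closed-form reduction and an iterative algorithm.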

Abstract

While principal component analysis (PCA) is a fundamental tool for dimensionality reduction, its dense representations make it ill-suited for high-dimensional data. Existing methods address this by promoting sparsity through explicit $\ell_1$-penalties, but these are not obvious to tune due to the unsupervised nature of the task. In contrast, we propose Adversarial PCA (AdvPCA), which leverages robust optimization to achieve sparsity by optimizing the reconstruction objective against bounded, worst-case latent space perturbations. We show that this formulation admits a closed-form reduction, leading to a practical iterative algorithm that alternates between adversarial linear regression-style updates for the sparse encoder and orthogonal updates for the decoder. By theoretically characterizing the solution, we derive a data-adaptive parameterization that allows the algorithm to perform effectively out of the box. We validate these claims through numerical experiments on synthetic and real-world genomics data.

2606.03292 2026-06-03 stat.ML cs.LG

Combining Statistical Features and Deep Encodings for Rehearsal-Based Class-Incremental Time Series Classification

Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca Pilar Díaz-Redondo

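Affiliations: atlanTTic – ICLAB, Universidade de Vigo; Centro Tecnolóxico de Telecomunicacións de Galicia (GRADIANT); Universidade de Vigo

AI summary: Proposes a dual-stream feature extraction pipeline (combining deep temporal embeddings from a pre-trained frozen foundation model with statistical features) for class-incremental continual learning on multivariate time series, achieving competitive average accuracy and low forgetting rates on five benchmark datasets.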

Abstract

Many systems used in real-world environments require adding new categories and incorporating new information without forgetting what was previously learnt by the classification model. This is known as class-incremental continual learning, and in the case of multivariate time series, it is further complicated by the temporal structure of the data. In this paper, we present a novel approach to class-incremental continual learning for the classification of multivariate time series data, based on a dual-stream feature extraction pipeline that combines deep temporal embedding features generated by a pre-trained frozen foundation model with statistical features. Evaluated on five benchmark datasets, the proposed system achieves competitive average accuracy across all datasets while maintaining low forgetting rates across all experimental configurations.

2606.03245 2026-06-03 stat.ML cs.LG

Hierarchies of Calibration: Classification meets Regression

Johannes Resin, Lu Yang, Tilmann Gneiting

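Affiliations: Goethe University Frankfurt; University of Minnesota; Heidelberg Institute for Theoretical Studies; Karlsruhe Institute of Technology

AI summary: Reviews, extends, and bridges notions of calibration for classification and regression tasks, focusing on the hierarchical relations among them. It introduces modal calibration for nominal outcomes and distinguishes full, partial, and average calibration in that setting.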

Abstract

Concepts of calibration formalize the compatibility between probabilistic predictions and the respective outcomes. In a nutshell, the outcomes ought to be indistinguishable from random draws from the predictive distributions. In this paper, we review, extend, and bridge notions of calibration that have been proposed for classification and regression tasks. Particular emphasis is given to hierarchical relations between the various notions, as they apply to general real-valued data, continuous outcomes, count data, nominal classes, and binary outcomes. To highlight a number of contributions, we introduce the notion of modal calibration for nominal outcomes, we distinguish full, partial, and average calibration in this setting, and we show that double probability integral transform (PIT) calibration is logically independent of previously proposed concepts of calibration for discrete outcomes. Furthermore, we generalize extant results on concepts of calibration that are expressed in terms of properties or functionals of the predictive distributions, such as means, quantiles, or event probabilities. Throughout the paper, we illustrate the concepts and their hierarchical relations in worked examples, and we provide algorithmic tools that support the construction of instructive examples and counterexamples.

2606.03217 2026-06-03 stat.ML cond-mat.dis-nn cs.LG

An Asymptotic Theory of Chain-of-Thought in In-Context Learning

Kaito Takanami, Cengiz Pehlevan

Affiliations: Department of Physics, Graduate School of Science, The University of Tokyo; John A. Paulson School of Engineering and Applied Sciences, Harvard University; Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University; Center for Brain Science, Harvard University

AI summary: Uses high-dimensional random matrix theory to derive an exact formula for the generalization error of chain-of-thought in-context learning for linear regression, revealing phase transitions in terms of reasoning depth, pretraining data amount, and context length.

Abstract

Chain-of-thought (CoT) reasoning has become a widely used mechanism for eliciting multi-step reasoning in large language models by generating intermediate reasoning steps at inference time. Yet the scaling behavior of generalization with CoT depth remains poorly understood. To address this question, we study a theoretically solvable model of CoT for in-context weight prediction in linear regression, where test-time reasoning is represented as an iterative refinement of the weight-parameter estimate. Using tools from random matrix theory under high-dimensional asymptotics, we derive an exact formula for the generalization error as a function of reasoning depth, pretraining data amount, and context length. Our analysis reveals a sharp phase transition separating exponential and polynomial improvement, saturation, and overthinking, and characterizes how the optimal reasoning depth scales. We further show that deeper reasoning is most effective with sufficiently rich pretraining and in-context information, whereas limited pretraining or context makes longer reasoning prone to error amplification or saturation. We also validate these predictions through experiments on fully learned linear attention and softmax attention models. Our results provide a unified theoretical account of how test-time CoT depth affects generalization.
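
"Test-time reasoning as iterative refinement of the weight estimate" can be pictured as repeated correction steps on the in-context examples. The sketch below assumes a plain gradient-descent update on the in-context least-squares loss; the abstract does not state the update rule, so this form, the function name, and the toy data are all assumptions.

```python
def refine_weights(X, y, depth, lr=1.0):
    """Run `depth` refinement steps w <- w + lr * X^T (y - X w) / n.

    Returns the list of weight estimates after each step, so the error can
    be inspected as a function of reasoning depth.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    history = []
    for _ in range(depth):
        residual = [yi - sum(wj * xj for wj, xj in zip(w, xi))
                    for xi, yi in zip(X, y)]
        grad = [sum(r * xi[j] for r, xi in zip(residual, X)) / n
                for j in range(d)]
        w = [wj + lr * g for wj, g in zip(w, grad)]
        history.append(list(w))
    return history


# Toy usage: four noiseless in-context examples generated by w* = (2, -1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
y = [2.0, -1.0, 1.0, 3.0]
estimates = refine_weights(X, y, depth=10)
```

In this noiseless, well-conditioned toy case each step shrinks the error by a constant factor, which corresponds to the exponential-improvement regime. The saturation and overthinking regimes mentioned in the abstract depend on limited pretraining or context and are not modeled in this sketch.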

2606.02909 2026-06-03 stat.ML cs.LG

Scalable Derivative Gaussian Processes via Exact Gradient Reduction

Hyunseok Seung, Matthias Katzfuss

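Affiliations: Department of Statistics, University of Wisconsin–Madison

AI summary: Proposes TERA, which uses exact gradient reduction to lower the computational cost of derivative Gaussian processes from O(n^3 d^3) to O(d m^2 + m^6) per target, enabling scalable inference in high-dimensional spaces.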

Abstract

Gradient observations can substantially improve Gaussian process (GP) surrogates, particularly in high-dimensional settings where function evaluations are expensive. However, exact inference with $n$ function values and $n$ full gradients in $d$ dimensions scales cubically in the joint state size, imposing an intractable $\mathcal{O}(n^3 d^3)$ computational bottleneck. We introduce TERA, a highly scalable derivative GP method based on target-specific exact gradient reduction. We prove that for stationary kernels, the gradient components orthogonal to the directions connecting the target and conditioning points are conditionally independent of the target function value; consequently, the exact conditional density is fully characterized by at most $m^2$ directional derivatives once a conditioning set of size $m$ is specified. By using these reduced, dimension-free conditionals as local factors in a Vecchia approximation, TERA effectively decouples $n$ and $d$ from the dense matrix inversion. This reduces the per-target evaluation cost to $\mathcal{O}(dm^2 + m^6)$ time and $\mathcal{O}(dm^2 + m^4)$ memory, leaving the underlying derivative GP model mathematically unchanged. Empirical evaluations demonstrate that TERA achieves state-of-the-art predictive accuracy while operating orders of magnitude faster than standard derivative GPs. Crucially, both computation time and peak GPU memory remain essentially flat with respect to $d$, enabling highly scalable inference in high-dimensional spaces.

2606.02740 2026-06-03 stat.ML cs.LG

ScoreStop: Gradient-based early stopping using functional score tests

ScoreStop: 使用函数得分检验的基于梯度的早停方法

Oliver J. Hines, Christian L. Hines

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出ScoreStop方法,通过函数得分检验在每次迭代中检验当前预测器是否为总体风险最小化器,从而在梯度提升决策树中实现基于梯度的早期停止,避免过拟合。

Comments Presented at the International Conference on Machine Learning 2026 Workshop on Hypothesis Testing

详情
AI中文摘要

梯度提升决策树需要停止规则以避免过拟合。标准规则监控验证损失,如果损失在固定的耐心期内没有改善则停止。然而,耐心参数没有可解释的尺度,验证损失可能带有噪声或由用户指定的梯度隐式定义。我们提出ScoreStop,一种基于梯度的早期停止规则,将每次迭代的停止决策视为检验当前预测器是否为总体风险最小化器的原假设。我们使用在验证数据上计算的函数得分检验,其统计量在更新方向上具有尺度不变性,并且在原假设下具有已知的渐近分布。由于我们的检验使用梯度而非损失值,相同的构造适用于隐式损失(如LambdaRank)和通过影响函数的数据依赖损失(如Cox回归)。在合成实验和真实数据基准测试中,我们展示了ScoreStop与基于损失的方法相比具有竞争力。

英文摘要

Gradient boosted decision trees require a stopping rule to avoid overfitting. The standard rule monitors a validation loss and stops if the loss fails to improve for a fixed patience period. However, the patience parameter has no interpretable scale and validation losses can be noisy or implicitly defined by a user-specified gradient. We propose ScoreStop, a gradient-based early-stopping rule that casts the stopping decision at each iteration as a test of the null hypothesis that the current predictor is the population risk minimizer. We use a functional score test, computed on validation data, with a statistic that is scale-invariant in the update direction, with a known asymptotic distribution under the null. Because our test uses gradients rather than loss values, the same construction applies to implicit losses such as LambdaRank, and data-dependent losses such as Cox regression via influence functions. In synthetic experiments and real-data benchmarks, we show that ScoreStop is competitive with loss-based methods.
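The patience rule that the abstract contrasts against can be sketched in a few lines. This is a minimal illustration of the loss-based baseline only, not of ScoreStop's score test; the function name `patience_stop`, the `min_delta` parameter, and the loss values are hypothetical.

```python
def patience_stop(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based iteration at which the patience rule stops,
    or None if it never triggers on the given losses."""
    best = float("inf")
    stale = 0  # consecutive iterations without improvement
    for i, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


# Hypothetical validation losses with a small bump before a later improvement.
losses = [1.00, 0.80, 0.70, 0.71, 0.72, 0.69, 0.70, 0.71, 0.72]
stop_short = patience_stop(losses, patience=2)  # stops at iteration 4
stop_long = patience_stop(losses, patience=3)   # stops at iteration 8
```

With `patience=2` the rule halts at iteration 4, before the lower loss at iteration 5 is seen; with `patience=3` it runs to iteration 8. The outcome hinges on a parameter with no natural scale, which is the sensitivity the abstract points to.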

2606.02664 2026-06-03 stat.ML cs.LG

State-Coupled Volatility in Latent Dynamical Systems: Recovery Under Partial Observation

潜变量动力系统中的状态耦合波动性:部分观测下的恢复

Imani Beckett

发表机构 * The Herbert Wertheim School of Public Health and Human Longevity Science(赫伯特·韦特海姆公共卫生与人类长寿科学学院) University of California San Diego(加州大学圣地亚哥分校)

AI总结 提出状态耦合随机波动框架,利用粒子期望最大化算法在部分观测下估计潜变量过程方差与平衡点位移的关系,并通过仿真验证了恢复与检测性能。

Comments 40 pages, 16 figures

详情
AI中文摘要

潜状态空间模型广泛用于研究部分观测的动力系统,但大多数公式假设过程变异性与潜状态位置无关。然而,在许多生物、行为和生理系统中,变异性可能系统地依赖于潜在动力状态,产生恒定方差模型无法捕捉的结构化随机性。我们引入了一个状态耦合随机波动框架,其中潜过程方差取决于与潜平衡点的位移。为了在部分观测下估计这种关系,我们开发了一种粒子期望最大化程序,结合了引导粒子滤波和反向轨迹平滑。该模型包含一个耦合参数 $\gamma$,用于量化潜状态位置与过程变异性之间的关联强度。一个大规模仿真基准评估了在不同耦合强度、观测噪声水平、轨迹长度和持续性机制下的恢复和检测性能。与基于观测状态的异方差代理相比,所提出的框架一致地减少了恢复偏差,在强耦合下改进最大。恢复性能随着潜持续性的增加而提高,而检测性能在广泛条件下保持竞争力,并随着观测噪声的增加而变得更加有利。综合来看,结果表明当明确建模潜状态结构时,可以在部分观测下识别和估计状态耦合波动性。该框架为研究状态依赖变异性以及评估结构化随机性是否提供超出平均状态轨迹所包含的系统动力学信息提供了实用的方法论基础。

英文摘要

Latent state-space models are widely used to study partially observed dynamical systems, yet most formulations assume that process variability is independent of latent-state position. In many biological, behavioral, and physiological systems, however, variability may depend systematically on the underlying dynamical state, producing structured stochasticity that is not captured by constant-variance models. We introduce a state-coupled stochastic volatility framework in which latent process variance depends on displacement from a latent equilibrium. To estimate this relationship under partial observation, we develop a particle expectation-maximization procedure combining bootstrap particle filtering and backward trajectory smoothing. The model includes a coupling parameter, $\gamma$, that quantifies the strength of association between latent-state position and process variability. A large-scale simulation benchmark evaluated recovery and detection performance across varying coupling strengths, observation noise levels, trajectory lengths, and persistence regimes. The proposed framework consistently reduced recovery bias relative to an observed-state heteroskedastic proxy, with the largest improvements occurring under strong coupling. Recovery performance improved with increasing latent persistence, while detection performance remained competitive across a broad range of conditions and became increasingly advantageous as observation noise increased. Taken together, the results demonstrate that state-coupled volatility can be identified and estimated under partial observation when latent-state structure is explicitly modeled. The framework provides a practical methodological foundation for studying state-dependent variability and evaluating whether structured stochasticity contributes information about system dynamics beyond that contained in mean-state trajectories alone.
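To make "process variance depends on displacement from a latent equilibrium" concrete, the sketch below simulates one such model. The functional form is an assumption made here for illustration (variance `sigma0**2 * (1 + gamma * d**2)` with `d` the displacement, layered on an AR(1) mean reversion); the abstract does not give the paper's exact specification, and all names and default values are hypothetical.

```python
import math
import random


def process_sd(x, mu=0.0, sigma0=0.5, gamma=0.8):
    """Assumed link: process s.d. grows with squared displacement from mu.
    gamma = 0 recovers a constant-variance model."""
    d = x - mu
    return sigma0 * math.sqrt(1.0 + gamma * d * d)


def simulate(T, phi=0.8, mu=0.0, sigma0=0.5, gamma=0.8, obs_sd=0.3, seed=0):
    """Simulate latent states and their noisy observations (partial observation)."""
    rng = random.Random(seed)
    x = mu
    latent, observed = [], []
    for _ in range(T):
        sd = process_sd(x, mu, sigma0, gamma)       # state-coupled volatility
        x = mu + phi * (x - mu) + rng.gauss(0.0, sd)  # mean-reverting latent step
        latent.append(x)
        observed.append(x + rng.gauss(0.0, obs_sd))  # only this series is seen
    return latent, observed


latent, observed = simulate(T=200)
```

Estimating `gamma` from `observed` alone is the hard part the paper addresses with particle EM; this sketch covers only the generative side.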

2606.02645 2026-06-03 stat.ML cs.AI cs.LG

Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics

目标更新可能稳定线性Q学习:周期性和软动态

Donghwan Lee

发表机构 * School of Electrical Engineering, KAIST(韩国科学技术院电气工程学院)

AI总结 本文通过精确的切换线性系统动力学和联合谱半径分析,证明了在特定谱和步长条件下,周期性硬目标更新和软目标更新可以保证线性Q学习收敛到精确的投影Q-Bellman解。

详情
AI中文摘要

Q学习中的周期性目标更新和actor-critic方法中的软目标更新是经验上公认的稳定机制,但其精确的理论解释仍不完整。本文针对线性函数逼近的Q学习(线性Q学习),利用Bellman最大值引起的精确切换线性系统(SLS)动力学以及由此产生的切换矩阵族的联合谱半径(JSR),对这些机制进行了严格而精确的分析。尽管线性Q学习通常可能无法收敛,但我们证明,在明确的谱和步长条件下,周期性硬目标更新和软目标更新可以保证收敛到精确的投影Q-Bellman解。主要分析针对确定性线性Q学习进行,其中目标更新机制最为透明。一旦为均值递归建立了相应的JSR证书,随机强化学习设置可以通过将确定性模式替换为采样随机模式并添加相应的随机噪声分析来处理。

英文摘要

Periodic target updates in Q-learning and soft target updates in actor-critic methods are empirically well established stabilization mechanisms, but their precise theoretical explanation is still incomplete. This paper gives a rigorous and exact analysis of these mechanisms for Q-learning with linear function approximation (linear Q-learning) using the exact switched linear system (SLS) dynamics induced by the Bellman maximum and the joint spectral radius (JSR) of the resulting switching matrix families. Although linear Q-learning can fail to converge in general, we prove that, under explicit spectral and step-size conditions, periodic hard target updates and soft target updates can guarantee convergence to the exact projected Q-Bellman solution. The main analysis is carried out for deterministic linear Q-learning, where the target-update mechanism is most transparent. Once the corresponding JSR certificate is established for the mean recursion, the stochastic reinforcement-learning setting can be treated by replacing deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.
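The two target-update mechanisms named in the abstract can be written as small parameter-vector operations. This is a generic sketch of periodic hard updates and soft (Polyak-style) updates as commonly used in practice, not the paper's analysis; the function names and the `period` and `tau` values are illustrative assumptions.

```python
def hard_update(theta, target, step, period):
    """Periodic hard update: copy the online parameters into the target
    every `period` steps, otherwise leave the target unchanged."""
    return list(theta) if step % period == 0 else target


def soft_update(theta, target, tau):
    """Soft update: move each target parameter a fraction tau toward theta."""
    return [(1.0 - tau) * t + tau * w for w, t in zip(theta, target)]


# Hypothetical online parameters held fixed to show how the targets track them.
theta = [1.0, 2.0]
hard_target = [0.0, 0.0]
soft_target = [0.0, 0.0]
for step in range(1, 11):
    hard_target = hard_update(theta, hard_target, step, period=5)
    soft_target = soft_update(theta, soft_target, tau=0.1)
```

After ten steps the hard target equals `theta` exactly (it was copied at steps 5 and 10), while the soft target has closed only part of the gap, since each step shrinks the remaining distance by a factor of `1 - tau`.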

2606.02632 2026-06-03 stat.ML cs.AI cs.CY cs.LG econ.EM stat.AP

Position: Prioritize Identifying Structure, Not Complex Models, for Scientific Discovery

立场:优先识别结构,而非复杂模型,以促进科学发现

Tyler H. McCormick

发表机构 * GitHub

AI总结 本文论证现代机器学习在高维代理机制下存在通用欠定性,提出“机制性机器学习”的具体标准,以确保以LLM为中心的工作流真正支持科学而非模拟科学。

Comments Will appear as a position paper in ICML

详情
AI中文摘要

现代机器学习(ML)和人工智能(AI)模型,特别是大型语言模型(LLMs),越来越多地被用于从观测数据中生成科学假设和机制解释。这篇立场论文认为,在现代ML擅长的高维代理机制中,机制性学习通常是欠定的:许多不相容的机制在数据支撑上诱导出本质上相同的观测关系,因此预测成功和连贯的解释并不足以作为机制发现的证据。这种欠定性在大型语言模型(LLMs)中变得尤为危险,因为它们倾向于将大量等价的解释类压缩成一个流畅的叙述。本文提出了“机制性机器学习”的具体标准,并论证如果以LLM为中心的工作流要支持科学而非仅仅模拟科学,这些标准是必要的。

英文摘要

Modern Machine Learning (ML) and Artificial Intelligence (AI) models, especially large language models (LLMs), are increasingly used to generate scientific hypotheses and mechanistic explanations from observational data. This position paper argues that in the high-dimensional proxy regimes where modern ML excels, mechanistic learning is generically underdetermined: many incompatible mechanisms induce essentially the same observational relationships on the support of the data, so predictive success and coherent explanations are insufficient evidence of mechanism discovery. This underdetermination becomes uniquely hazardous with large language models (LLMs), which tend to collapse large equivalence classes of explanations into a single fluent narrative. This paper proposes concrete standards for ``mechanistic ML,'' and argues these norms are necessary if LLM-centered workflows are to support science rather than merely simulate it.

2606.02592 2026-06-03 stat.AP cs.AI

Tracking Urban Atmospheric Pollutants using Sentinel-5P Satellite Data

利用Sentinel-5P卫星数据追踪城市大气污染物

Alice Gomez-Cantos, Henry O. Velesaca

发表机构 * Facultad de Ciencias Naturales y Matemáticas, Escuela Superior Politécnica del Litoral, ESPOL, Campus Gustavo Galindo, Km. 30.5 Vía Perimetral, Guayaquil, 090902, Ecuador(自然科学与数学学院,滨海高等理工学院,ESPOL,古斯塔沃·加林多校区,环城路30.5公里处,瓜亚基尔,090902,厄瓜多尔) Software Engineering Department, Research Center for Information and Communication Technologies (CITIC-UGR), University of Granada, 18071, Granada, Spain(软件工程系,信息与通信技术研究中心(CITIC-UGR),格拉纳达大学,18071,格拉纳达,西班牙)

AI总结 提出基于Sentinel-5P/TROPOMI卫星对流层柱观测的框架,通过中位数和高百分位数等分布指标及K-means聚类,在厄瓜多尔瓜亚斯省尺度上表征城市NO2污染背景与极端值,为数据稀缺地区提供可解释、可扩展的空气质量评估工具。

详情
AI中文摘要

城市二氧化氮($NO_2$)是燃烧相关空气污染的关键指标,在城市中表现出强烈的时空变异性。本研究提出一个基于卫星的框架,利用Sentinel-5P/TROPOMI的对流层柱观测数据,追踪厄瓜多尔瓜亚斯省的城市$NO_2$污染。该方法不估计地表浓度,而是强调稳健的分布指标,包括中位数和上尾百分位数($P_{90}$、$P_{95}$和$P_{99}$),以表征县尺度上的背景条件和局部污染极端值。多年卫星观测数据按年汇总,并使用无监督K-means聚类分析,以识别无预定义阈值的特征污染模式。结果表明,高度城市化的县持续表现出较高的极端$NO_2$值和更大的变异性,而城市化程度较低的地区则呈现较低且更均匀的模式。所提出的方法为数据稀缺地区仅使用卫星观测提供了一种可解释且可扩展的城市空气质量评估工具。该实现已在GitHub上公开,网址为 https://hvelesaca.github.io/sentinel-5P-clustering/。

英文摘要

Urban nitrogen dioxide ($NO_2$) is a key indicator of combustion-related air pollution and exhibits strong spatial and temporal variability in cities. This study presents a satellite-based framework for tracking urban $NO_2$ pollution using tropospheric column observations from Sentinel-5P/TROPOMI over Guayas Province, Ecuador. Rather than estimating surface concentrations, the methodology emphasizes robust distributional metrics, including the median and upper-tail percentiles ($P_{90}$, $P_{95}$, and $P_{99}$), to characterize background conditions and localized pollution extremes at the canton scale. Multi-year satellite observations are aggregated annually and analyzed using unsupervised K-means clustering to identify characteristic pollution regimes without predefined thresholds. Results show that highly urbanized cantons consistently exhibit elevated extreme $NO_2$ values and greater variability, while less urbanized areas display lower and more homogeneous patterns. The proposed approach provides an interpretable and scalable tool for urban air-quality assessment in data-scarce regions using satellite observations alone. The implementation is publicly available on GitHub https://hvelesaca.github.io/sentinel-5P-clustering/.
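The distributional metrics named in the abstract (median, $P_{90}$, $P_{95}$, $P_{99}$) can be computed per canton as sketched below. The linear-interpolation percentile convention, the function names, and the sample values are assumptions for illustration; the paper's exact convention and its K-means step are not reproduced here.

```python
import statistics


def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a non-empty sequence."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * p / 100.0   # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def canton_summary(no2_columns):
    """Background (median) and upper-tail metrics for one canton and year."""
    return {
        "median": statistics.median(no2_columns),
        "P90": percentile(no2_columns, 90),
        "P95": percentile(no2_columns, 95),
        "P99": percentile(no2_columns, 99),
    }


# Hypothetical tropospheric-column values in arbitrary units.
summary = canton_summary(list(range(101)))
```

Each canton's summary vector could then serve as one feature row for a clustering step; the median captures background conditions while the upper percentiles capture the localized extremes.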

2606.03184 2026-06-03 q-fin.CP cs.LG q-fin.ST

FinStressTS: A Parametric Synthetic Benchmark for Time-Series Forecasting in Finance

FinStressTS: 金融时间序列预测的参数化合成基准

Jiaze Sun, Kelvin J. L. Koa, Ruiyang Ni, Yize Liu, Haonan Chen, Ke-Wei Huang

发表机构 * National University of Singapore(新加坡国立大学) Asian Institute of Digital Finance(亚洲数字金融研究所) Nanyang Technological University(南洋理工大学)

AI总结 针对金融预测中信号弱、机制复杂的问题,提出FinStressTS合成基准,通过30个诊断环境系统评估15种模型在点预测与概率预测上的表现,揭示模型性能对数据机制的依赖性。

Comments KDD 2026 (Oral)

详情
AI中文摘要

金融预测因信噪比低、潜在因子、重尾、机制转换和跳跃而困难。真实世界基准提供的故障归因有限:研究人员可以观察到表现不佳,但往往无法隔离原因,因为机制不可观察且纠缠。真实金融数据仅揭示一条实现路径,使得评估尾部风险校准或数据效率变得困难。我们引入FinStressTS,一个机制感知的合成基准,将模型行为与受控的结构原因联系起来。FinStressTS包含围绕六个机制族(波动率聚类、多尺度持续性、重尾冲击、机制转换、自激跳跃和零膨胀过程)的30个诊断环境。我们评估两个任务:点预测(使用五种设置下的NMAE)和概率预测(在已知数据生成机制下使用CRPS)。我们对15个模型进行基准测试,从经典方法(HAR、VAR)到Transformer预测器(PatchTST、iTransformer)和深度概率架构(DeepAR、TSFlow),并使用学习曲线衡量样本效率。我们的评估揭示了三个见解。首先,性能依赖于机制:自回归和线性模型在多个波动率、尾部和跳跃驱动的环境中具有很强的竞争力,并且通常优于基于Transformer的模型。其次,分布对齐很重要:诸如DeepAR之类的参数化概率模型在平稳设置中校准良好,而灵活模型在分布变为多模态或稀疏时可能有所帮助。第三,神经网络模型通常需要更多数据才能匹配简单基线,主要在学习潜在机制或复杂分布时获得更大收益。FinStressTS提供了一个用于诊断故障模式和推进风险感知预测的开放框架。

英文摘要

Financial forecasting is difficult due to low signal-to-noise ratios, latent factors, heavy tails, regime shifts, and jumps. Real-world benchmarks offer limited failure attribution: researchers can observe underperformance, but often cannot isolate why because mechanisms are unobservable and entangled. Real financial data reveal only one realized path, making it difficult to assess tail-risk calibration or data efficiency. We introduce FinStressTS, a mechanism-aware synthetic benchmark that links model behavior to controlled structural causes. FinStressTS comprises 30 diagnostic environments around six mechanism families: volatility clustering, multi-scale persistence, heavy-tailed shocks, regime switching, self-exciting jumps, and zero-inflated processes. We evaluate two tasks: point forecasting, using NMAE across five settings, and probabilistic forecasting, using CRPS under known data-generating mechanisms. We benchmark 15 models, from classical methods (HAR, VAR) to Transformer forecasters (PatchTST, iTransformer) and deep probabilistic architectures (DeepAR, TSFlow), and use learning curves to measure sample efficiency. Our evaluation reveals three insights. First, performance is mechanism-dependent: autoregressive and linear models are highly competitive, and often outperform Transformer-based models, in several volatility-, tail-, and jump-driven environments. Second, distributional alignment matters: parametric probabilistic models such as DeepAR calibrate well in stationary settings, while flexible models can help when distributions become multimodal or sparse. Third, neural models often require more data to match simple baselines, with larger gains mainly when learning latent regimes or complex distributions. FinStressTS provides an open framework for diagnosing failure modes and advancing risk-aware forecasting.

2606.02937 2026-06-03 q-bio.NC cs.CV

BEAST3D: Animal behavioral analysis and neural encoding from multi-view video via Gaussian splatting

BEAST3D: 通过高斯泼溅从多视角视频进行动物行为分析与神经编码

Yanchen Wang, Lenny Aharon, Wangshu Zhu, Kyle Daruwalla, Linghua Zhang, Jiaru Zou, Selmaan Chettih, Helen Hou, Liam Paninski, Matthew R Whiteway

发表机构 * Columbia University(哥伦比亚大学) Cold Spring Harbor(冷泉港) Stanford University(斯坦福大学)

AI总结 提出BEAST3D自监督预训练框架,利用未标注的多视角视频通过3D高斯泼溅重建和动物分割,学习3D视觉表征,有效应用于新视角合成、多视角姿态估计和神经编码。

详情
AI中文摘要

多视角视频记录越来越多地用于捕捉实验环境中动物的3D运动,但从这些记录中提取丰富的3D表示仍然具有挑战性。有监督的姿态估计需要大量手动标注,而在通用场景数据集上训练的通用3D重建模型无法适用于实验室实验的专业图像和稀疏视角设置。我们通过BEAST3D解决了这些限制,这是一个自监督预训练框架,从未标注的、已校准的多视角视频中学习3D视觉表示。BEAST3D使用视觉变换器预测3D高斯泼溅,通过可微渲染重建保留视角,同时将动物从背景中分割出来。BEAST3D通过直接以已知相机参数为条件,仅用四个视角即可重建3D结构——这与通用模型不同,后者必须从实验室环境中很少有的密集重叠视角估计相机几何。通过在四个物种上的全面评估,我们证明BEAST3D产生丰富的、视角不变的特征,这些特征有效地迁移到三个下游任务:新视角合成(验证了学习到的3D表示的质量)、多视角姿态估计(提供了行为分析中广泛使用的稀疏关键点轨迹)和神经编码(将3D行为特征与同时记录的神经活动相关联)。因此,BEAST3D建立了一个利用现代多视角实验室记录中3D结构的行为分析多功能框架。

英文摘要

Multi-view video recordings are increasingly used to capture the 3D movements of animals in experimental settings, yet extracting rich 3D representations from these recordings remains challenging. Supervised pose estimation requires extensive manual annotation, while general-purpose 3D reconstruction models trained on generic scene datasets fail on the specialized imagery and sparse-view setting of laboratory experiments. We address these limitations with BEAST3D, a self-supervised pretraining framework that learns 3D visual representations from unlabeled, calibrated multi-view video. BEAST3D uses a vision transformer to predict 3D Gaussian splats that reconstruct held-out views through differentiable rendering, while simultaneously segmenting the animal from the background. BEAST3D reconstructs 3D structure with as few as four views by conditioning directly on known camera parameters--unlike general-purpose models, which must estimate camera geometry from dense overlapping viewpoints that are seldom available in lab settings. Through comprehensive evaluation across four species, we demonstrate that BEAST3D produces rich, viewpoint-invariant features that transfer effectively to three downstream tasks: novel view synthesis, which validates the quality of the learned 3D representations; multi-view pose estimation, which provides the sparse keypoint trajectories widely used in behavioral analysis; and neural encoding, which relates 3D behavioral features to simultaneously recorded neural activity. BEAST3D thus establishes a versatile framework for behavioral analysis that leverages 3D structure in modern multi-view laboratory recordings.

2606.02629 2026-06-03 q-bio.QM cs.AI cs.LG

Enhancing Protein-Protein Interaction Prediction with Hierarchical Motif-based Multimodal Protein Embedding

基于层次化基序的多模态蛋白质嵌入增强蛋白质-蛋白质相互作用预测

Zaifei Yang, Samuel Ping-Man Choi, James Kwok

发表机构 * National University of Singapore(新加坡国立大学) University of California, Los Angeles(加州大学洛杉矶分校)

AI总结 提出MMM-PPI模型,通过层次化基序的多模态编码(微观、中观、宏观三尺度)整合序列、结构和功能信息,提升蛋白质-蛋白质相互作用预测性能。

详情
AI中文摘要

蛋白质-蛋白质相互作用(PPIs)对许多生物过程至关重要。然而,现有的PPI预测方法存在两个主要局限性:它们忽略了蛋白质的层次组织,特别是关键调控PPIs的中观尺度基序,并且未能有效整合序列、结构和功能模态。为了解决这些局限性,我们提出了MMM-PPI,一种基于层次化基序的多模态蛋白质编码器用于PPI预测,该编码器以自底向上的多模态方式在三个尺度上构建PPI嵌入。在微观尺度上,我们编码三种模态的残基特征;在中观尺度上,一种新颖的多模态基序编码器将残基聚合成空间感知的基序嵌入;在宏观尺度上,一种多模态蛋白质编码器通过联合建模基序重要性和模态间相关性将基序整合为蛋白质嵌入。预训练的编码器可直接用于大规模PPI预测。在多个PPI数据集上的大量实验表明,MMM-PPI优于最先进的多标签PPI预测模型,特别是在具有挑战性的数据划分和有限数据场景下。代码见 https://github.com/yzf-code/MMM-PPI。

英文摘要

Protein-protein interactions (PPIs) are essential for many biological processes. However, existing PPI prediction approaches suffer from two major limitations: they overlook the hierarchical organization of proteins, particularly meso-scale motifs that critically regulate PPIs, and fail to effectively integrate sequence, structure, and function modalities. To address these limitations, we propose MMM-PPI, a Hierarchical Motif-based Multi-Modal protein Encoder for PPI Prediction that constructs PPI embeddings in a bottom-up multi-modal manner across three scales. At the micro-scale, we encode three modal residue features; at the meso-scale, a novel multimodal motif encoder aggregates residues into spatially-informed motif embeddings; at the macro-scale, a multimodal protein encoder integrates motifs into protein embeddings by jointly modeling motif importance and inter-modal correlations. The pre-trained encoder can be used off-the-shelf for large-scale PPI prediction. Extensive experiments on multiple PPI datasets show that MMM-PPI outperforms state-of-the-art multi-label PPI prediction models, particularly under challenging data partitions and limited data scenarios. Codes are in https://github.com/yzf-code/MMM-PPI.

2606.02624 2026-06-03 q-bio.QM cs.AI cs.LG

TadA-Bench: A Million-Variant Benchmark for Future-Round Discovery Toward Agentic Protein Engineering

TadA-Bench:面向智能蛋白质工程的未来轮次发现的百万变异基准

Jin Gao, Juntu Zhao, Zirui Zeng, Jiaqi Shen, Junhao Shi, Dukun Zhao, Yuming Lu, Dequan Wang

发表机构 * Tsinghua University(清华大学)

AI总结 TadA-Bench 是一个基于31轮TadA定向进化的百万变异湿实验回放基准,通过定义固定数据回放任务来评估模型在未见过的未来轮次中排序变异的能力,并引入Seq2Graph统一标签,揭示进化覆盖度比局部数据密度更重要。

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026). Data: https://huggingface.co/datasets/JinGao/TadABench-1M . Code: https://github.com/shiyegao/TadABench-1M

详情
AI中文摘要

人工智能用于科学发现正进入智能体时代,蛋白质工程系统应优先考虑未来的湿实验,而不仅仅是拟合静态测量。我们引入了TadA-Bench,这是一个来自31轮TadA定向进化的百万变异湿实验回放基准,用于面向智能蛋白质工程的未来轮次发现。TadA-Bench保留了实验的时间顺序,并定义了一个固定数据回放任务:给定早期的实验轮次,模型对仅出现在后期轮次中的变异进行排序。它提供了对齐的DNA、RNA和蛋白质视图,并使用Seq2Graph(一种基于图的标签统一流程)来将嘈杂的富集测量结果协调为一致的跨轮次活性标签。随机分割控制显示强插值能力,但未来轮次排序和有限预算候选选择则弱得多。控制分析表明,进化覆盖度比局部数据密度更具信息性,将TadA-Bench定位为面向智能蛋白质工程的未来轮次发现的可重复湿实验回放基底;数据和代码已在Hugging Face和GitHub上发布。

英文摘要

AI for scientific discovery is entering an agentic era, where protein-engineering systems are expected to prioritize future wet-lab experiments rather than merely fit static measurements. We introduce TadA-Bench, a million-variant wet-lab replay benchmark from 31 TadA directed-evolution rounds for future-round discovery toward agentic protein engineering. TadA-Bench preserves the campaign chronology and defines a fixed-data replay task: given earlier experimental rounds, models rank variants that appear only in later rounds. It provides aligned DNA, RNA, and protein views, and uses Seq2Graph, a graph-based label-unification pipeline, to reconcile noisy enrichment measurements into consistent cross-round activity labels. Random-split controls show strong interpolation, but future-round ranking and finite-budget candidate selection are much weaker. Controlled analyses suggest that evolutionary coverage is more informative than local data density, positioning TadA-Bench as a reproducible wet-lab replay substrate for future-round discovery toward agentic protein engineering; the data and code are released on Hugging Face and GitHub.

2606.03946 2026-06-03 cs.DB cs.LG cs.LO

MLSkip: Data Skipping for ML Filters via Lightweight Metadata

MLSkip: 通过轻量级元数据实现ML过滤器的数据跳过

Mihail Stoian, Mark Gerarts, Pascal Ginter, Andreas Zimmerer, Jan Van den Bussche, Andreas Kipf

发表机构 * University of Technology Nuremberg(纽伦堡工业大学) Hasselt University(哈塞尔特大学) Technical University of Munich(慕尼黑工业大学)

AI总结 针对ML过滤器无法应用传统数据跳过技术的问题,提出利用Parquet默认的min-max元数据以及增强的二维凸包元数据结构,实现高效的谓词剪枝,平均剪枝效果达38.31%。

详情
AI中文摘要

数据库厂商最近发布了可用于过滤器谓词的AI函数。由于这些函数通常依赖于昂贵且黑盒的ML模型,它们带来了新的数据管理挑战。具体而言,针对整数和字符串数据的传统数据跳过技术无法适用于这种新型过滤器。实际上,目前还没有已知的机制用于剪枝不合格的行组,例如从blob存储读取文件时。在这项工作中,我们首次研究了ML过滤器的数据跳过技术。我们论证了Parquet默认的min-max元数据足以实现剪枝。为此,我们联系了两条研究路线:(i) 最近提出的ML模型查询语言和(ii) 神经网络验证。我们在ReLU架构上的初步结果表明,在TPC-H和TPC-DS表上,选择性低于0.1%的过滤器的平均剪枝效果为27.4%。最后,受空间连接研究的启发,我们提出了一种增强的元数据结构:一个有大小限制的二维凸包,验证工具可以更好地利用它,将剪枝效果提高到38.31%,同时每个行组和列对最多占用45字节。我们观察到在DuckDB中相对于PyTorch的端到端加速比为1.07倍。

英文摘要

Database vendors recently released AI functions that can be used in filter predicates. As such functions often rely on costly, black-box ML models, they unveil new data management challenges. Concretely, traditional data skipping techniques for integer and string data fail to be applicable to the new filter type. Indeed, there is no known mechanism for pruning non-qualifying row groups, e.g., when reading files from blob storage. In this work, we initiate the study of data skipping techniques for ML filters. We make the case that Parquet's default min-max metadata is enough to enable pruning. To this end, we draw connections to two lines of research: (i) the recently proposed query language for ML models and (ii) neural network verification. Our preliminary results on ReLU architectures show that on tables from TPC-H and TPC-DS, the average pruning effectiveness for filters of selectivity below 0.1% amounts to 27.4%. Finally, inspired by research on spatial joins, we propose an enhanced metadata structure: a size-bounded 2D convex hull that verification tools can make better use of, increasing the pruning effectiveness to 38.31%, while occupying at most 45 bytes per row group and column pair. We observe an end-to-end speedup of 1.07$\times$ over PyTorch in DuckDB.
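One way to see how min-max metadata can prune row groups under a ReLU model is interval bound propagation: push the per-column `[min, max]` box through the network and skip the row group if the output upper bound cannot satisfy the predicate. This is a simplified stand-in for the verification tools the abstract refers to, not the authors' pipeline; the weights, threshold, and function names are hypothetical.

```python
def interval_affine(lo, hi, W, b):
    """Propagate an axis-aligned box through y = W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = h = bias
        for w, a, c in zip(row, lo, hi):
            if w >= 0:
                l += w * a
                h += w * c
            else:  # a negative weight swaps which endpoint gives the bound
                l += w * c
                h += w * a
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi


def can_skip(col_min, col_max, layers, threshold):
    """True if model(x) < threshold for every x in the box, so no row in the
    row group can satisfy the filter `model(x) >= threshold`."""
    lo, hi = list(col_min), list(col_max)
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lo = [max(0.0, v) for v in lo]
            hi = [max(0.0, v) for v in hi]
    return hi[0] < threshold


# Hypothetical 2-input, 2-hidden-unit, 1-output ReLU network.
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]), ([[1.0, 1.0]], [0.0])]
skip_a = can_skip([0.0, 2.0], [1.0, 3.0], layers, threshold=2.0)  # True
skip_b = can_skip([4.0, 0.0], [6.0, 1.0], layers, threshold=2.0)  # False
```

The bounds are sound but loose, so a `False` result only means the row group must be scanned. Tighter input regions than a box would cut that looseness, which is consistent with the abstract's move from min-max boxes to a 2D convex hull.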

2606.03935 2026-06-03 cs.NE cs.LG

Quadratic integrate-and-fire neurons exhibit less fragmented loss landscapes and outperform leaky integrate-and-fire neurons in spike-based gradient descent

二次整合-放电神经元表现出更少的碎片化损失景观,并在基于脉冲的梯度下降中优于漏电整合-放电神经元

Carlo Wenig, Raoul-Martin Memmesheimer, Christian Klos

发表机构 * University of Bonn(波恩大学) University of Tübingen(图宾根大学) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校)

AI总结 通过对比LIF和QIF神经元在Spiking Heidelberg Digits数据集上的表现,发现QIF神经元具有更平滑的损失景观和梯度,从而在脉冲神经网络训练中表现更优。

Comments 9 pages, 5 figures (main part)

详情
AI中文摘要

训练脉冲神经网络对于模拟生物神经网络以及神经形态计算至关重要。然而,对于广泛使用的漏电整合-放电(LIF)神经元,任意小的参数变化都可能引起脉冲的出现或消失,从而破坏后续活动,导致在精确的基于脉冲的梯度下降过程中出现不稳定的神经表征和永久沉默的神经元。最近的研究表明,包括二次整合-放电(QIF)神经元在内的一类神经元模型避免了这些不连续性,并实现了连续甚至平滑的基于脉冲的梯度下降。然而,尚不清楚这些优势是否能转化为实际应用。在这里,我们通过在流行的Spiking Heidelberg Digits数据集上对LIF和QIF神经元网络进行受控比较,证明了它们确实如此。具体来说,第一步,我们进行了彻底的超参数搜索以优化两种模型,揭示了QIF神经元的明显性能优势。第二步,我们可视化了损失和梯度景观。与其较差的性能一致,我们发现LIF神经元的损失景观(本身不连续)显得更加碎片化,相关梯度更加不稳定。对单个样本景观的分析表明,这些特征源于脉冲时间顺序的变化,这常常导致破坏性的脉冲出现或消失。总体而言,我们的结果主张在梯度下降训练中用具有连续脉冲动态的神经元模型(如QIF神经元)替代LIF神经元。

英文摘要

The ability to train spiking neural networks is essential for modeling biological neural networks as well as for neuromorphic computing. However, for the extensively used leaky integrate-and-fire (LIF) neurons, arbitrarily small parameter changes can induce spike (dis)appearances that disrupt subsequent activity, leading to unstable neural representations and permanently silent neurons during exact spike-based gradient descent. Recent work shows that a class of neuron models, which includes the quadratic integrate-and-fire (QIF) neuron, avoids these discontinuities and enables continuous and even smooth spike-based gradient descent. However, it remains unclear whether these advantages translate into practice. Here, we demonstrate that they do so via a controlled comparison between networks of LIF and QIF neurons on the popular Spiking Heidelberg Digits dataset. Specifically, in a first step, we perform a thorough hyperparameter search to optimize both models, revealing a clear performance advantage of QIF neurons. In a second step, we visualize the loss and gradient landscapes. Consistent with their inferior performance, we find that the loss landscapes of LIF neurons, which are discontinuous, appear more fragmented and the related gradients more erratic. An analysis of the landscapes of single samples indicates that these features arise from changes in the temporal order of spikes, which often cause disruptive spike (dis)appearances. Overall, our results advocate replacing LIF neurons with neuron models exhibiting continuous spiking dynamics, such as QIF neurons, for gradient descent training.

2606.03926 2026-06-03 cs.HC cs.LG

DiffUNet^2: Bidirectional Prediction, Probabilistic Generation and Collaborative Visual Discovery for Scientific Data

DiffUNet^2: 科学数据的双向预测、概率生成与协同视觉发现

Mengdi Chu, Jiaxin Yang, Angus G. Forbes, Nathan Debardeleben, Earl Lawrence, Ayan Biswas, Han-Wei Shen

发表机构 * Ohio State University(俄亥俄州立大学) NVIDIA(英伟达) Los Alamos National Laboratory(洛斯阿拉莫斯国家实验室)

AI总结 提出基于扩散模型的条件生成框架DiffUNet^2,实现时间序列的双向任意步预测与概率分布捕获,并结合交互式可视化支持科学探索。

Comments 12 pages, 20 figures

详情
AI中文摘要

对科学现象进行时间演化建模对于分析和推理至关重要,然而大多数机器学习方法仅提供确定性的前向预测,忽略了多种可能的结果,且很少支持反向推理,限制了它们在科学工作流中的实用性。我们提出了一个将基于扩散的生成建模与交互式视觉分析相结合的框架,用于科学探索。我们引入了DiffUNet^2,一种条件扩散模型,能够实现跨时间的双向、任意到任意生成,并捕获系统可能演化的分布。基于该模型,我们的交互式系统支持分支时间线探索、用户引导的状态编辑和概率空间导航,使科学家能够主动探索替代假设,而非被动观察预测。我们在5个不同科学领域的数据集上评估了该模型,验证了其预测准确性和概率空间集成质量。与领域专家合作,我们证明了该方法在支持实际科学时间数据分析工作流中的有效性。通过集成建模与视觉交互,我们的方法使科学家能够交互式地探索系统动力学,将生成模型转化为假设驱动的科学分析工具。

英文摘要

Modeling temporal evolution is important to analyzing and reasoning about scientific phenomena, yet most machine learning methods provide deterministic forward predictions that overlook multiple plausible outcomes and rarely support backward reasoning, limiting their usefulness in practical scientific workflows. We present a framework that integrates diffusion-based generative modeling with interactive visual analytics for scientific exploration. We introduce DiffUNet^2, a conditional diffusion model that enables bidirectional, any-to-any generation across time and captures distributions of plausible system evolutions. Built upon the model, our interactive system supports branching timeline exploration, user-guided state editing, and probability-space navigation, enabling scientists to actively explore alternative hypotheses rather than passively observe predictions. We evaluate the model on 5 datasets across different scientific domains to validate its predictive accuracy and probability-space ensemble quality. In collaboration with domain experts, we demonstrate the effectiveness of our approach in supporting practical scientific temporal data analysis workflows. By integrating modeling and visual interaction, our approach enables scientists to interactively explore system dynamics, transforming generative models into tools for hypothesis-driven scientific analysis.

2606.03919 2026-06-03 cs.SI cs.CY cs.DL cs.LG physics.soc-ph

Forecasting Conceptual Diffusion in Science: The Case of Quantum Computing

预测科学中的概念扩散:以量子计算为例

Thomas Maillart, Thibaut Chataing, David Dosu, Paul Bagourd, Julian Jang-Jaccard, Alain Mermoud

发表机构 * Geneva School of Economics and Management, University of Geneva(日内瓦经济管理学院,日内瓦大学) Faculty of Medicine, University of Geneva(日内瓦大学医学院) Open Quantum Institute, CERN(开放量子研究所,欧洲核子研究中心) armasuisse Science + Technology(armasuisse 科学与技术)

AI总结 通过构建时间分辨的概念共现网络并训练LightGBM模型,研究量子计算领域概念的内生巩固与外生扩散的可预测性,发现外生扩散和熵具有强可预测性(R²高达0.78),而内生巩固在量子计算中几乎不可预测,但在神经植入领域显著上升(R²=0.83),表明概念扩散受语义和引用环境中的稳定结构规律支配。

Comments 19 pages, 5 figures, 6 tables. Code and manuscript sources: https://github.com/wazaahhh/breakthroughs-diffusion . An earlier version was presented at the Global Tech Mining Conference (GTM) 2026 (submission #117)

详情
AI中文摘要

理解和预测科学变化需要能够区分科学概念的内生巩固和外生扩散的模型。利用OpenAlex中量子计算概念子树,我们构建了一个时间分辨的概念共现网络,并追踪每个概念对的上游引用谱系和下游扩散。我们在分布和多样性感知特征上训练LightGBM模型,以预测四个结果:内生巩固、外生扩散、它们的比率以及扩散熵。在控制科学整体出版增长后,内生巩固在主要的量子计算基准中基本不可预测。相比之下,外生扩散和熵具有很强的可预测性(R²高达0.78),并且由上游异质性、引用广度和分布离散度驱动,如SHAP分析所示;在机器人、先进材料和神经植入上的重复验证证实,外生扩散仍然是跨领域排名最高的目标(测试R²约0.60-0.87),而内生可预测性在神经植入中显著上升(测试R²=0.83),表明量子计算的不对称性并非普遍适用。案例研究表明,尖锐的熵增加与新概念前沿的开启同时发生,而熵崩溃则标志着技术趋同或范式更替。这些结果表明,概念扩散受嵌入语义和引用环境中的稳定结构规律支配。通过识别跨领域采纳的早期基于多样性的信号,该方法为快速发展的研究领域中的预期科学计量学、技术预见和创新导向政策分析提供了可扩展的基础。

英文摘要

Understanding and anticipating scientific change requires models that distinguish between endogenous consolidation and exogenous diffusion of scientific concepts. Using the quantum computing subtree of concepts in OpenAlex, we construct a temporally resolved concept co-occurrence network and track each concept pair through its upstream citation lineage and downstream diffusion. We train LightGBM models on distributional and diversity-aware features to predict four outcomes: endogenous reinforcement, exogenous diffusion, their ratio, and diffusion entropy. After controlling for overall publication growth of the scientific body, endogenous reinforcement proves largely unpredictable in the primary quantum-computing benchmark. In contrast, exogenous diffusion and entropy are strongly predictable ($R^2$ up to $0.78$) and are driven by upstream heterogeneity, citation breadth, and distributional dispersion, as shown by SHAP analyses; replications on robotics, advanced materials, and neuro implants confirm that exogenous diffusion remains the top-ranked target across fields ($R^2_{\text{test}} \sim 0.60$-$0.87$), while endogenous predictability rises markedly in neuro implants ($R^2_{\text{test}} = 0.83$), indicating that the quantum-computing asymmetry does not generalise uniformly. Case studies reveal that sharp entropy increases coincide with the opening of new conceptual frontiers, while entropy collapses signal technological convergence or paradigm displacement. These results demonstrate that conceptual diffusion is governed by stable structural regularities embedded in semantic and citation environments. By identifying early diversity-based signals of cross-domain uptake, the approach provides a scalable foundation for anticipatory scientometrics, technology foresight, and innovation-oriented policy analysis in rapidly evolving research fields.

2606.03910 2026-06-03 cs.PF cs.AI cs.DC cs.NI

NetKV: Network-Aware Decode Instance Selection for Disaggregated LLM Inference

NetKV: 面向分解式LLM推理的网络感知解码实例选择

Mubarak Adetunji Ojewale

发表机构 * Cloud Competency Centre, National College of Ireland(爱尔兰国家学院云能力中心)

AI总结 针对分解式LLM推理中KV缓存传输导致的首令牌时间增加问题,提出网络成本感知调度器NetKV,通过贪心算法选择解码实例,在64-GPU胖树模拟器上平均降低TTFT达21.2%。

详情
AI中文摘要

分解式LLM推理迫使KV缓存在解码开始前穿越数据中心网络,因此传输时间直接计入首令牌时间(TTFT)预算。当前调度器仅根据计算负载和前缀缓存局部性进行路由,忽略了预填充和解码实例之间的拓扑距离和动态拥塞。我们通过一个轻量级的运营方到调度器接口(网络成本预言机)来弥补这一差距,并证明忽略网络项会导致仅缓存感知的调度在上下文长度增长时任意次优。NetKV是一个每请求O(|D|)的贪心算法,它使用该预言机,其层级排名对过时遥测数据具有可证明的鲁棒性。在由Mooncake轨迹驱动的64-GPU四层胖树模拟器上,NetKV相比轮询调度平均降低TTFT达21.2%,相比调优的缓存+负载感知调度器降低17.6%,将SLO达标率提升最多20.1个百分点,并在所有测试条件下将令牌间时间开销保持在0.5毫秒以下,无需对传输、推理引擎或硬件进行任何更改。

英文摘要

Disaggregated LLM inference forces the KV cache to traverse the datacenter network before decoding begins, so transfer time enters directly into the Time to First Token (TTFT) budget. Current schedulers route on compute load and prefix-cache locality alone, ignoring the topological distance and dynamic congestion between prefill and decode instances. We close this gap with a thin operator-to-scheduler interface, the network cost oracle, and we prove that ignoring the network term renders cache-aware-only scheduling arbitrarily suboptimal as context length grows. NetKV, the O(|D|) per-request greedy that consumes this oracle, has tier rankings that are provably robust to stale telemetry. On a 64-GPU four-tier fat-tree simulator driven by Mooncake traces, NetKV reduces mean TTFT by up to 21.2% over round-robin and 17.6% over a tuned cache+load-aware scheduler, lifts SLO attainment by up to 20.1 percentage points, and keeps the Time Between Tokens overhead below 0.5 ms in every condition tested, with no changes to the transport, inference engine, or hardware.
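An O(|D|) per-request greedy over decode instances can be sketched as a single pass that picks the lowest estimated cost. The cost model below (KV transfer time plus queueing delay) and every field name are assumptions for illustration; the abstract does not specify NetKV's actual oracle interface or tier ranking.

```python
def select_decode_instance(kv_bytes, instances):
    """One pass over the candidate decode instances (O(|D|)).

    Each instance is a dict with hypothetical fields:
      name        - identifier
      bandwidth   - estimated bytes/s from the prefill node (network oracle)
      queue_delay - estimated seconds of compute queueing
    """
    best_name, best_cost = None, float("inf")
    for inst in instances:
        transfer_s = kv_bytes / inst["bandwidth"]  # network term
        cost = transfer_s + inst["queue_delay"]    # plus load term
        if cost < best_cost:
            best_name, best_cost = inst["name"], cost
    return best_name


# Hypothetical candidates: a busy same-rack instance and an idle distant one.
instances = [
    {"name": "same-rack", "bandwidth": 50e9, "queue_delay": 0.10},
    {"name": "cross-pod", "bandwidth": 5e9, "queue_delay": 0.02},
]
short_ctx = select_decode_instance(1e7, instances)  # "cross-pod"
long_ctx = select_decode_instance(2e9, instances)   # "same-rack"
```

For a small KV cache the idle but distant instance wins; once the cache reaches gigabytes the transfer term dominates and the nearby instance wins. This mirrors the abstract's claim that ignoring the network term becomes costlier as context length grows.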

2606.03907 2026-06-03 cs.SE cs.AI cs.HC

The Impact of Configuring Agentic AI Coding Tools on Build-vs-Buy Decisions: A Study Protocol

配置智能体AI编码工具对构建vs购买决策的影响:一项研究协议

Jai Lal Lulla, Matthias Galster, Jie M. Zhang, Sebastian Baltes, Christoph Treude

Affiliations * Singapore Management University, Singapore; University of Bamberg, Germany; King’s College London, United Kingdom; Heidelberg University, Germany

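AI summary A controlled experimental protocol for studying how configuration mechanisms affect the build-versus-buy behavior of agentic AI coding tools such as Claude Code and OpenAI Codex. The authors will release a reusable benchmark dataset and analysis pipeline.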

Comments 14 pages, 1 table. Accepted at the 20th International Symposium on Empirical Software Engineering and Measurement (ESEM 2026), Registered Reports track

AI Chinese abstract

智能体AI编码工具以越来越高的自主性编写代码,并在此过程中决定何时导入库以及何时从头实现功能。这些决策——是从头构建功能还是购买外部库(以下称为构建vs购买)——对软件安全性、许可合规性、性能和长期可维护性有直接影响。然而,尚无受控实验研究探讨智能体AI编码工具中构建vs购买决策的支配因素。配置机制,即开发人员根据项目或工作流程定制智能体AI编码工具行为的手段,是实践者影响这些决策的主要方式之一。但尚不清楚哪些配置机制最有效地影响构建vs购买决策。我们提出了一项预注册协议,研究配置机制如何改变两种流行的智能体AI编码工具(Claude Code和OpenAI Codex)中的构建vs购买行为。我们将执行来自阶段性项目基准的受控编程任务,每个任务围绕可识别的构建vs购买点构建,并操纵提供给每个工具的配置,范围从无配置、包含软偏好和明确禁止的上下文文件,到技能(可自主发现的指令)、支持MCP的库发现工具和权限控制,测量工具选择的库、是否披露新引入的库以及这些披露是否完整准确。九个预注册假设构成了该协议。生成的基准数据集和分析流程将作为可复用工件发布,用于评估智能体AI编码工具中的构建vs购买行为。

English abstract

Agentic AI coding tools write code with increasing autonomy and in doing so decide when to import a library and when to implement functionality from scratch. These decisions, whether to build functionality from scratch or buy into an external library, hereafter build-versus-buy, carry direct consequences for software security, licensing compliance, performance, and long-term maintainability. Yet no controlled experimental study has examined what governs build-versus-buy decisions in agentic AI coding tools. Configuration mechanisms, i.e., the means by which developers tailor agentic AI coding tool behavior to a project or workflow, are one of the primary means by which practitioners can influence these decisions. However, it is unclear which configuration mechanisms influence build-versus-buy decisions most effectively. We present a pre-registered protocol to study how configuration mechanisms alter build-versus-buy behavior in two popular agentic AI coding tools: Claude Code and OpenAI Codex. We will execute controlled programming tasks drawn from a benchmark of staged projects, each constructed around identifiable build-versus-buy points, and will manipulate the configuration supplied to each tool, ranging from no configuration, through context files with soft preferences and explicit prohibitions, to Skills (instructions that can be autonomously discovered), MCP-enabled library discovery tools, and permission controls, measuring which libraries the tool selects, whether it discloses newly introduced libraries, and whether those disclosures are complete and accurate. Nine pre-registered hypotheses structure the protocol. The resulting benchmark dataset and analysis pipeline will be released as a reusable artifact for evaluating build-versus-buy behavior in agentic AI coding tools.